Blog
All Blog Posts | Next Post | Previous Post
Add AI superpower to your Delphi & C++Builder apps part 3: multimodal LLM use
Today
This is part 3 of our blog series on adding AI superpower to your Delphi & C++Builder apps. We already had the first article on basic usage of LLMs and the second article about using function calling with LLMs. In these first two articles, we dealt with textual information. In this third installment, we shift to multimodal LLMs. That is LLMs with the capabilities to deal also with other information than "simple prompts". In other words, providing files as context for the LLMs that contain ngimages, video, audio, documents ...
Embracing Multimodal LLMs in Delphi: Describe, Compare, Extract, Summarize, Translate All in One
AI has quickly moved beyond just text generation. With the rise of multimodal large language models (LLMs), Delphi developers can now leverage image understanding, OCR, file summarization, and translation all with minimal code and maximum flexibility. And thanks to the TTMSFNCCloudAI component, switching between AI providers like OpenAI, Claude, Mistral, Gemini, DeepSeek, Ollama, Grok, or Perplexity becomes seamless.
Why Multimodal Matters
Traditional LLMs focused on text. Todays advanced models can process both text and images, enabling workflows such as:
-
Automatically describing image content
-
Performing OCR on photos or scanned documents
-
Comparing two pictures and identifying visual differences
-
Summarizing lengthy documents
-
Translating files between languages
All of these tasks are achievable with the same API structure, just by adjusting context instructions. And best of all, you remain in control of the backend AI servicewhether hosted or local.
A Unified Approach with TTMSFNCCloudAI
Heres how you use it:
1. Describe an Image
Whether its a scenic photo or a complex chart, supported AI models can return a natural language summary of whats in the image.Here is an example showing an amazing result, that it even detected a half readable bottle label and could correctly identify it as Jules Mumm champagne!
2. Compare Two Pictures
Ideal for visual regression tests, UI comparisons, or even spotting differences in scanned documents or maps. In our testing, the Claude LLM seemed to provide the most accurate and knowledgable answer.
3. Perform OCR (Optical Character Recognition)
Forget hard-coded OCR libraries just describe the task and let the LLM handle everything. Here the test performed was with a picture taken from the back of the "Delphi Component Design" by the late Danny Thorpe (I had the honor to meet a few times back in Scotts Valley). Here credits go to OpenAI that was not only extremely accurate but was also smart enough to see the two column layout and properly put the text under each other. Up till the ISBN number of the book, everything is correct.
4. Summarize a Text File
Perfect for making sense of long reports, log files, or any dense document.
5. Translate Text
Build multilingual applications with just a few lines of Delphi code.
Abstracting the Complexity
One of the biggest strengths of TTMSFNCCloudAI is abstraction. You don't need to learn every provider's API or worry about changing your code when switching services. The interface stays the same. Just configure your model and endpoint.
This allows developers to:
-
Prototype with OpenAI, then move to Claude for privacy
-
Use local models with Ollama during development
-
Compare results from Gemini or Grok with just a config change
Vision Models Required
Note: Some providers require specific models that support image understanding. For example:
-
Ollama: Only models like
llava
orbakllava
support vision -
Grok and Mistral: Need to be paired with multimodal-capable backends
-
Claude, OpenAI (GPT-4o), and Gemini Pro Vision support image input natively
Always ensure the model you choose understands the data type you're sending.
A Future-Proof Way to Integrate AI
With TTMSFNCCloudAI, you're not locked into one vendor or use case. You build once, and switch as needed. The multimodal revolution is here, and Delphi developers now have a first-class way to participate.
Start experimenting. Start integrating. Start building smarter Delphi apps today.
Explore TTMSFNCCloudAI and redefine how your applications interact with the world.
In upcoming articles, well dive deeper into RAG, agents, MCP servers & clients.
If you have an active TMS ALL-ACCESS license, you can now get also access to the first test version of TMS AI Studio that uses the TTMSFNCCloudAI
component but also has everything on board to let you build MCP servers and clients.
Register now to participate in this testing via this landing page.
Bruno Fierens

This blog post has received 2 comments.


Bruno Fierens
All Blog Posts | Next Post | Previous Post
Carlomagno Antonello