Add AI superpower to your Delphi & C++Builder apps part 3: multimodal LLM use

Friday, May 23, 2025

This is part 3 of our blog series on adding AI superpower to your Delphi & C++Builder apps. The first article covered basic usage of LLMs and the second covered function calling with LLMs. Both dealt with textual information only. In this third installment, we shift to multimodal LLMs: LLMs that can handle information beyond simple text prompts. In other words, you can provide files as context for the LLM, containing images, video, audio, documents ...

Embracing Multimodal LLMs in Delphi: Describe, Compare, Extract, Summarize, Translate — All in One

AI has quickly moved beyond just text generation. With the rise of multimodal large language models (LLMs), Delphi developers can now leverage image understanding, OCR, file summarization, and translation — all with minimal code and maximum flexibility. And thanks to the TTMSFNCCloudAI component, switching between AI providers like OpenAI, Claude, Mistral, Gemini, DeepSeek, Ollama, Grok, or Perplexity becomes seamless.

Why Multimodal Matters

Traditional LLMs focused on text. Today’s advanced models can process both text and images, enabling workflows such as:

Automatically describing image content
Performing OCR on photos or scanned documents
Comparing two pictures and identifying visual differences
Summarizing lengthy documents
Translating files between languages

All of these tasks follow the same API structure: only the attached files and the context instruction change. And best of all, you remain in control of the backend AI service, whether hosted or local.
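
Since only the instruction text changes between the tasks below, it can help to keep the prompts in one place. The following is a minimal sketch in plain Delphi RTL code; the unit name `AITaskPrompts`, the `TAITask` enum and the `TaskInstruction` function are hypothetical helpers, not part of TTMSFNCCloudAI. The instruction strings are the ones used in the examples in this article.

```delphi
unit AITaskPrompts;

// Hypothetical helper unit, not part of the TMS FNC Cloud AI API.
// It only keeps the instruction texts used in this article in one place.

interface

type
  // One value per task shown in this article
  TAITask = (atDescribe, atCompare, atOCR, atSummarize, atTranslate);

// Returns the context instruction for Task.
// TargetLanguage is only used for atTranslate (e.g. 'german').
function TaskInstruction(Task: TAITask; const TargetLanguage: string): string;

implementation

function TaskInstruction(Task: TAITask; const TargetLanguage: string): string;
begin
  case Task of
    atDescribe:  Result := 'describe the picture';
    atCompare:   Result := 'compare the two pictures and describe the differences';
    atOCR:       Result := 'extract the text from the picture';
    atSummarize: Result := 'summarize this text for me in one paragraph';
    atTranslate: Result := 'translate the text to ' + TargetLanguage;
  else
    Result := '';  // not reached for the values above; avoids an undefined Result
  end;
end;

end.
```

With this unit in your `uses` clause, the OCR example would set its prompt with `TMSFNCCloudAI1.Context.Text := TaskInstruction(atOCR, '');`. Keeping the texts together makes it easier to tune a prompt once and compare how different providers respond to it.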

A Unified Approach with TTMSFNCCloudAI

Here’s how you use it:

1. Describe an Image

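```delphi
// Start from a clean request: remove files from a previous call
TMSFNCCloudAI1.Files.Clear;
// Attach the image to analyze
TMSFNCCloudAI1.AddFile(ImageFileName, aiftImage);
// Set the instruction (prompt) for the model
TMSFNCCloudAI1.Context.Clear;
TMSFNCCloudAI1.Context.Text := 'describe the picture';
// Send the request to the configured AI service
TMSFNCCloudAI1.Execute;
```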

Whether it's a scenic photo or a complex chart, supported AI models can return a natural-language description of what's in the image. In our test, the result was amazing: the model even detected a half-readable bottle label and correctly identified it as Jules Mumm champagne!

2. Compare Two Pictures

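```delphi
// Start from a clean request: remove files from a previous call
TMSFNCCloudAI1.Files.Clear;
// Attach the two images to compare
TMSFNCCloudAI1.AddFile(ImageFileName1, aiftImage);
TMSFNCCloudAI1.AddFile(ImageFileName2, aiftImage);
// Set the instruction (prompt) for the model
TMSFNCCloudAI1.Context.Clear;
TMSFNCCloudAI1.Context.Text := 'compare the two pictures and describe the differences';
// Send the request to the configured AI service
TMSFNCCloudAI1.Execute;
```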

Ideal for visual regression tests, UI comparisons, or even spotting differences in scanned documents or maps. In our testing, Claude seemed to provide the most accurate and knowledgeable answer.

3. Perform OCR (Optical Character Recognition)

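```delphi
// Start from a clean request: remove files from a previous call
TMSFNCCloudAI1.Files.Clear;
// Attach the photo or scan that contains the text
TMSFNCCloudAI1.AddFile(ImageFileName, aiftImage);
// Ask the model to transcribe the text it sees
TMSFNCCloudAI1.Context.Clear;
TMSFNCCloudAI1.Context.Text := 'extract the text from the picture';
// Send the request to the configured AI service
TMSFNCCloudAI1.Execute;
```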

Forget dedicated OCR libraries: just describe the task and let the LLM handle everything. For this test, we used a photo of the back cover of "Delphi Component Design" by the late Danny Thorpe (whom I had the honor of meeting a few times back in Scotts Valley). Here, credit goes to OpenAI: it was not only extremely accurate but also smart enough to recognize the two-column layout and output the text of both columns one below the other. Everything, right down to the book's ISBN, was transcribed correctly.

4. Summarize a Text File

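```delphi
// Start from a clean request: remove files from a previous call
TMSFNCCloudAI1.Files.Clear;
// Attach the text file (note: aiftText instead of aiftImage)
TMSFNCCloudAI1.AddFile(TextFileName, aiftText);
// Set the instruction (prompt) for the model
TMSFNCCloudAI1.Context.Clear;
TMSFNCCloudAI1.Context.Text := 'summarize this text for me in one paragraph';
// Send the request to the configured AI service
TMSFNCCloudAI1.Execute;
```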

Perfect for making sense of long reports, log files, or any dense document.

5. Translate Text

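```delphi
// Start from a clean request: remove files from a previous call
TMSFNCCloudAI1.Files.Clear;
// Attach the text file to translate
TMSFNCCloudAI1.AddFile(TextFileName, aiftText);
// Name the target language in the instruction
TMSFNCCloudAI1.Context.Clear;
TMSFNCCloudAI1.Context.Text := 'translate the text to german';
// Send the request to the configured AI service
TMSFNCCloudAI1.Execute;
```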

Build multilingual applications with just a few lines of Delphi code.

Abstracting the Complexity

One of the biggest strengths of TTMSFNCCloudAI is abstraction. You don't need to learn every provider's API or worry about changing your code when switching services. The interface stays the same. Just configure your model and endpoint.

This allows developers to:

Prototype with OpenAI, then move to Claude for privacy
Use local models with Ollama during development
Compare results from Gemini or Grok with just a config change

Vision Models Required

Note: Not every model can process images, so some providers require you to select a specific model that supports image understanding. For example:

Ollama: use a vision-capable model such as llava or bakllava
Grok and Mistral: select one of their vision-capable (multimodal) models
Claude, OpenAI (GPT-4o), and Gemini support image input natively

Always ensure the model you choose understands the data type you're sending.
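
One way to apply this rule in code is to classify a file before attaching it and refuse images when the configured model is text-only. The following is a minimal sketch in plain Delphi RTL code under stated assumptions: the unit `AIFileKinds`, the `TAIFileKind` enum and both functions are hypothetical; `TAIFileKind` only mirrors the `aiftImage`/`aiftText` values used with `AddFile` above and is not the component's own type; the extension lists are illustrative; and `AModelSupportsVision` is a flag you maintain yourself per configured model (for example True for an Ollama llava model). Nothing here queries the provider.

```delphi
unit AIFileKinds;

// Hypothetical helper unit, not part of TMS FNC Cloud AI.
// TAIFileKind mirrors the aiftImage / aiftText values passed to AddFile
// in this article; map it to the component's own values in your code.

interface

type
  TAIFileKind = (fkUnknown, fkImage, fkText);

// Classifies a file by its extension (illustrative lists only)
function ClassifyFile(const FileName: string): TAIFileKind;

// True when the file may be sent to a model with the given capability.
// AModelSupportsVision is set by you for the model you configured.
function CanSendFile(const FileName: string; AModelSupportsVision: Boolean): Boolean;

implementation

uses
  System.SysUtils, System.StrUtils;

function ClassifyFile(const FileName: string): TAIFileKind;
var
  Ext: string;
begin
  // ExtractFileExt includes the dot, e.g. '.png'; compare case-insensitively
  Ext := LowerCase(ExtractFileExt(FileName));
  if MatchStr(Ext, ['.png', '.jpg', '.jpeg', '.gif', '.webp', '.bmp']) then
    Result := fkImage
  else if MatchStr(Ext, ['.txt', '.md', '.csv', '.log']) then
    Result := fkText
  else
    Result := fkUnknown;
end;

function CanSendFile(const FileName: string; AModelSupportsVision: Boolean): Boolean;
begin
  case ClassifyFile(FileName) of
    fkImage: Result := AModelSupportsVision;  // images need a vision model
    fkText:  Result := True;                  // assumed acceptable for any model
  else
    Result := False;                          // unknown type: do not send
  end;
end;

end.
```

Checking up front gives the user a clear message in your own UI instead of a provider error or a meaningless answer from a model that never saw the image.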

A Future-Proof Way to Integrate AI

With TTMSFNCCloudAI, you're not locked into one vendor or use case. You build once, and switch as needed. The multimodal revolution is here, and Delphi developers now have a first-class way to participate.

Start experimenting. Start integrating. Start building smarter Delphi apps today.

Explore TTMSFNCCloudAI and redefine how your applications interact with the world.

In upcoming articles, we’ll dive deeper into RAG, agents, MCP servers & clients.
If you have an active TMS ALL-ACCESS license, you can now also get access to the first test version of TMS AI Studio, which uses the TTMSFNCCloudAI component and also includes everything you need to build MCP servers and clients.
Register now to participate in this testing via this landing page.

Bruno Fierens

Add AI superpower to your Delphi & C++Builder apps part 1

Add AI superpower to your Delphi & C++Builder apps part 2: function calling

Add AI superpower to your Delphi & C++Builder apps part 3: multimodal LLM use

Add AI superpower to your Delphi & C++Builder apps part 4: create MCP servers

Add AI superpower to your Delphi & C++Builder apps part 5: create your MCP client

Add AI superpower to your Delphi & C++Builder apps part 6: RAG

This blog post has received 2 comments.

1. Friday, May 23, 2025 at 2:33:55 PM

WoW great! .. will there be any examples with Assistant AI?

Carlomagno Antonello

2. Friday, May 23, 2025 at 2:52:37 PM

Yes, working on it

Bruno Fierens
