Connect to llama.cpp servers from Delphi with TMS AI Studio v1.6
We are pleased to announce the release of TMS AI Studio v1.6.0.0, introducing support for llama.cpp as a service in TTMSMCPCloudAI. With this update, you can seamlessly connect to a llama.cpp server and integrate it into your existing AI workflow, without changing your components or development model!
The Advantages of Running Models Locally
Local AI is no longer an all-or-nothing choice between huge models and nothing at all. Smaller, specialized models have improved significantly. For many application scenarios, such as summarization, rewriting, extraction, classification, or domain-specific assistance, a smaller model can be surprisingly capable, especially when combined with good prompt design and tool integration.
Running models locally is becoming a practical and strategic choice, because the benefits are clear:
- Cost: Cloud AI services are powerful, but usage-based pricing can be difficult to predict in real applications, especially once AI features are used frequently by end users. Running models locally can eliminate these costs entirely and makes AI usage easier to scale without surprises.
- Privacy and control: Many applications handle sensitive user data, such as documents, internal notes, customer information, or proprietary knowledge. Even with trusted cloud providers, some customers and industries require that prompts and context never leave the local machine or internal network.
- Offline availability: This matters not only for fully disconnected environments, but also for reliability. A local model keeps working even if a cloud service is down, rate-limited, or temporarily unreachable.
Ollama or llama.cpp
Ollama uses llama.cpp internally, and TTMSMCPCloudAI already supports Ollama as an AI service. So why add llama.cpp as a separate service, and which one is better?
Our goal is simple: provide flexibility. There is no universally better option. Ollama focuses on convenience and ease of setup, while llama.cpp offers deeper configurability and often better performance.
Both Ollama and llama.cpp can run models purely on the CPU, without requiring a dedicated GPU. For many smaller workloads, this is already sufficient and avoids additional hardware investment. With a consumer GPU, models can run at least 5× faster, making local hosting suitable for chat, coding assistance, and larger workloads.
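Part of why switching runtimes is painless is that both expose an OpenAI-compatible HTTP API: llama.cpp through its bundled llama-server, Ollama through its built-in compatibility layer. A client then only needs a different base URL. The Python sketch below illustrates the idea outside of Delphi; the ports are the common defaults (8080 for llama-server, 11434 for Ollama) and the model name is a placeholder, so adjust both for your setup.

```python
import json
import urllib.request

# OpenAI-compatible base URLs for the two local runtimes.
# These are the usual default ports; adjust for your configuration.
BASE_URLS = {
    "llama.cpp": "http://127.0.0.1:8080/v1",
    "ollama": "http://127.0.0.1:11434/v1",
}

def build_chat_request(prompt, model="local-model", max_tokens=128):
    """Build an OpenAI-style chat completion payload.
    The same body works against either runtime; only the URL differs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(runtime, prompt, model="local-model"):
    """Send the request to the selected local runtime.
    Requires a llama-server or Ollama instance to actually be running."""
    body = json.dumps(build_chat_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        BASE_URLS[runtime] + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Moving from one runtime to the other is then just `chat("llama.cpp", ...)` instead of `chat("ollama", ...)`, which mirrors how TTMSMCPCloudAI can treat both as interchangeable services.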
We ran some models ourselves on our Windows office machines to better understand the differences and to see how far you can get with average hardware. We used the prebuilt llama.cpp binaries and observed the following patterns:
|  | CPU (Intel i7-9700) | GPU (AMD 7900 XT) |
| --- | --- | --- |
| Speed winner | llama.cpp | No consistent winner |
| Speed difference | llama.cpp consistently 2-4% faster than Ollama | 13-30% difference either way |
| Stability | llama.cpp stable; Ollama occasional freezes | llama.cpp stable; Ollama occasional errors |
| Impact of output length | Longer outputs reduce speed in both | No impact |
The differences largely depend on the hardware, workload, and configuration. Systems running Linux with an NVIDIA GPU might deliver better results due to driver maturity and broader optimization support. However, the safest approach is to test both in your own environment and choose the one that fits your setup best!
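If you want to test both runtimes, a small probe can first tell you which ones are actually reachable on a machine, since both answer `GET /v1/models` on their OpenAI-compatible endpoint. This is a minimal sketch assuming the usual default ports (8080 for llama-server, 11434 for Ollama); the probe is injectable so the selection logic is testable without a live server.

```python
import urllib.request
import urllib.error

# Candidate runtimes with their usual default ports (assumptions;
# adjust if your servers are configured differently).
CANDIDATES = {
    "llama.cpp": "http://127.0.0.1:8080",
    "ollama": "http://127.0.0.1:11434",
}

def http_probe(base_url, timeout=2.0):
    """Return True if base_url answers GET /v1/models with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def detect_runtimes(probe=http_probe, candidates=CANDIDATES):
    """Return the names of the candidate runtimes that respond."""
    return [name for name, url in candidates.items() if probe(url)]
```

An application could use this to fall back from one local runtime to the other, or to report clearly which server the user still needs to start.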
Conclusion
By supporting multiple local runtimes, TMS AI Studio gives you greater control over deployment decisions, helping you address key requirements such as cost, data privacy, offline availability, and infrastructure independence. You can continue using the same workflow and component interface while gaining even more flexibility in how and where your models run.
As local AI continues to evolve, our goal remains the same: providing developers with practical, flexible, and production-ready tools to integrate AI into real-world applications.
Tunde Keller
Related Blog Posts
- Add AI superpower to your Delphi & C++Builder apps - Part 1: intro
- Add AI superpower to your Delphi & C++Builder apps - Part 2: function calling
- Add AI superpower to your Delphi & C++Builder apps - Part 3: multimodal LLM use
- Add AI superpower to your Delphi & C++Builder apps - Part 4: create MCP servers
- Add AI superpower to your Delphi & C++Builder apps - Part 5: create your MCP client
- Add AI superpower to your Delphi & C++Builder apps - Part 6: RAG
- Introducing TMS AI Studio: Your Complete AI Development Toolkit for Delphi
- Automatic invoice data extraction in Delphi apps via AI
- AI based scheduling in classic Delphi desktop apps
- Voice-Controlled Maps in Delphi with TMS AI Studio + OpenAI TTS/STT
- Creating an n8n Workflow to use a Logging MCP Server
- Supercharging Delphi Apps with TMS AI Studio v1.2 Toolsets: Fine-Grained AI Function Control
- AI-powered HTML Reports with Embedded Browser Visualization
- Additional audio transcribing support in TMS AI Studio v1.2.3.0 and more ...
- Introducing Attributes Support for MCP Servers in Delphi
- Using AI Services securely in TMS AI Studio
- Automate StellarDS database operations with AI via MCP
- TMS AI Studio v1.4 is bringing HTTP.sys to MCP
- Windows Service Deployment Guide for the HTTP.SYS-Ready MCP Server Built with TMS AI Studio
- Extending AI Image Capabilities in TMS AI Studio v1.5.0.0
- Try the Free TMS AI Studio RAG App
- Connect to llama.cpp servers from Delphi with TMS AI Studio v1.6