Connect to llama.cpp servers from Delphi with TMS AI Studio v1.6
We are pleased to announce the release of TMS AI Studio v1.6.0.0, introducing support for llama.cpp as a service in TTMSMCPCloudAI. With this update, you can seamlessly connect to a llama.cpp server and integrate it into your existing AI workflow, without changing your components or development model!
The Advantages of Running Models Locally
Local AI is no longer an all-or-nothing choice between huge models and nothing at all. Smaller, specialized models have improved significantly. For many application scenarios, such as summarization, rewriting, extraction, classification, or domain-specific assistance, a smaller model can be surprisingly capable, especially when combined with good prompt design and tool integration.
Running models locally is becoming a practical and strategic choice, because the benefits are clear:
- Cost: Cloud AI services are powerful, but usage-based pricing can be difficult to predict in real applications, especially once AI features are used frequently by end users. Running models locally can eliminate these costs entirely and makes AI usage easier to scale without surprises.
- Privacy and control: Many applications handle sensitive user data, such as documents, internal notes, customer information, or proprietary knowledge. Even with trusted cloud providers, some customers and industries require that prompts and context never leave the local machine or internal network.
- Offline availability: This matters not only for fully disconnected environments, but also for reliability. A local model keeps working even if a cloud service is down, rate-limited, or temporarily unreachable.
Ollama or llama.cpp
Ollama uses llama.cpp internally, and TTMSMCPCloudAI already supports Ollama as an AI service. So why add llama.cpp as a separate service, and which one is better?
Our goal is simple: provide flexibility. There is no universally better option. Ollama focuses on convenience and ease of setup, while llama.cpp offers deeper configurability and often better performance.
Both Ollama and llama.cpp can run models purely on the CPU, without requiring a dedicated GPU. For many smaller workloads, this is already sufficient and avoids additional hardware investment. With a consumer GPU, models can run at least 5× faster, making local hosting suitable for chat, coding assistance, and larger workloads.
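Part of why switching runtimes is painless is that both expose an OpenAI-compatible HTTP API: llama.cpp through its bundled llama-server, Ollama through its built-in compatibility layer. A client then only needs a different base URL. The Python sketch below illustrates the idea outside of Delphi; the ports are the common defaults (8080 for llama-server, 11434 for Ollama) and the model name is a placeholder, so adjust both for your setup.

```python
import json
import urllib.request

# OpenAI-compatible base URLs for the two local runtimes.
# These are the usual default ports; adjust for your configuration.
BASE_URLS = {
    "llama.cpp": "http://127.0.0.1:8080/v1",
    "ollama": "http://127.0.0.1:11434/v1",
}

def build_chat_request(prompt, model="local-model", max_tokens=128):
    """Build an OpenAI-style chat completion payload.
    The same body works against either runtime; only the URL differs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(runtime, prompt, model="local-model"):
    """Send the request to the selected local runtime.
    Requires a llama-server or Ollama instance to actually be running."""
    body = json.dumps(build_chat_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        BASE_URLS[runtime] + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Moving from one runtime to the other is then just `chat("llama.cpp", ...)` instead of `chat("ollama", ...)`, which mirrors how TTMSMCPCloudAI can treat both as interchangeable services.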
We ran some models ourselves on our Windows office machines to better understand the differences and to see how far you can get with average hardware. We used the prebuilt llama.cpp binaries and observed the following patterns:
|  | CPU (Intel i7-9700) | GPU (AMD 7900 XT) |
| --- | --- | --- |
| Speed winner | llama.cpp | No consistent winner |
| Speed difference | llama.cpp consistently 2-4% faster than Ollama | 13-30% difference either way |
| Stability | llama.cpp stable; Ollama occasional freezes | llama.cpp stable; Ollama occasional errors |
| Impact of output length | Longer outputs reduce speed in both | No impact |
The differences largely depend on the hardware, workload, and configuration. Systems running Linux with an NVIDIA GPU might deliver better results due to driver maturity and broader optimization support. However, the safest approach is to test both in your own environment and choose the one that fits your setup best!
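If you want to test both runtimes, a small probe can first tell you which ones are actually reachable on a machine, since both answer `GET /v1/models` on their OpenAI-compatible endpoint. This is a minimal sketch assuming the usual default ports (8080 for llama-server, 11434 for Ollama); the probe is injectable so the selection logic is testable without a live server.

```python
import urllib.request
import urllib.error

# Candidate runtimes with their usual default ports (assumptions;
# adjust if your servers are configured differently).
CANDIDATES = {
    "llama.cpp": "http://127.0.0.1:8080",
    "ollama": "http://127.0.0.1:11434",
}

def http_probe(base_url, timeout=2.0):
    """Return True if base_url answers GET /v1/models with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def detect_runtimes(probe=http_probe, candidates=CANDIDATES):
    """Return the names of the candidate runtimes that respond."""
    return [name for name, url in candidates.items() if probe(url)]
```

An application could use this to fall back from one local runtime to the other, or to report clearly which server the user still needs to start.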
Conclusion
By supporting multiple local runtimes, TMS AI Studio gives you greater control over deployment decisions, helping you address key requirements such as cost, data privacy, offline availability, and infrastructure independence. You can continue using the same workflow and component interface while gaining even more flexibility in how and where your models run.
As local AI continues to evolve, our goal remains the same: providing developers with practical, flexible, and production-ready tools to integrate AI into real-world applications.
Tunde Keller
Related Blog Posts
- Add AI superpower to your Delphi & C++Builder apps - Part 1: intro
- Add AI superpower to your Delphi & C++Builder apps - Part 2: function calling
- Add AI superpower to your Delphi & C++Builder apps - Part 3: multimodal LLM use
- Add AI superpower to your Delphi & C++Builder apps - Part 4: create MCP servers
- Add AI superpower to your Delphi & C++Builder apps - Part 5: create your MCP client
- Add AI superpower to your Delphi & C++Builder apps - Part 6: RAG
- Introducing TMS AI Studio: Your Complete AI Development Toolkit for Delphi
- Automatic invoice data extraction in Delphi apps via AI
- AI based scheduling in classic Delphi desktop apps
- Voice-Controlled Maps in Delphi with TMS AI Studio + OpenAI TTS/STT
- Creating an n8n Workflow to use a Logging MCP Server
- Supercharging Delphi Apps with TMS AI Studio v1.2 Toolsets: Fine-Grained AI Function Control
- AI-powered HTML Reports with Embedded Browser Visualization
- Additional audio transcribing support in TMS AI Studio v1.2.3.0 and more ...
- Introducing Attributes Support for MCP Servers in Delphi
- Using AI Services securely in TMS AI Studio
- Automate StellarDS database operations with AI via MCP
- TMS AI Studio v1.4 is bringing HTTP.sys to MCP
- Windows Service Deployment Guide for the HTTP.SYS-Ready MCP Server Built with TMS AI Studio
- Extending AI Image Capabilities in TMS AI Studio v1.5.0.0
- Try the Free TMS AI Studio RAG App
- Connect to llama.cpp servers from Delphi with TMS AI Studio v1.6