
TMS WEB Core and More with Andrew:
OpenAI's ChatGPT and Friends - Part 1 of 2


Tuesday, April 18, 2023

The biggest topic in the technology space of late, now that the drama over Elon Musk's purchase of Twitter has calmed down to some degree, is of course ChatGPT. In this pair of posts, we're going to look at how to incorporate ChatGPT into our TMS WEB Core projects - by way of TMS XData. In this first post, we'll cover what is currently the most popular (and accessible) variant - ChatGPT 3.5. In the second post, we'll look at the options for generating images, as well as how we might store and retrieve this kind of content from a database.


Motivation.

As with any new technology, there is a general FOMO (fear of missing out) - the latest trend, whatever it may be, might actually be a game-changer - and nobody wants to be left behind. We've seen this happen before, so it isn't something to take lightly. Many of us might remember just how long it took for Bill Gates to realize that "the internet" was a thing. Or how the CEOs of Microsoft (again!) and Motorola scoffed very publicly at Steve Jobs' introduction of the iPhone. There are many examples where something new was not given its due, only to have it come around and upend the status quo. Ignoring ChatGPT risks repeating exactly that mistake.

On the other hand, AI as a topic has been around for a good long time. I personally recall a period in the late 80s when I first learned of the CYC Project, which was already well on its way. Some 40 years after it was founded, it has made great progress but isn't really getting the same level of attention anymore. The thinking back then was that AI was always 20 years away from whatever point we were currently at, in no small part because we didn't really know enough about the topic to make a claim any bolder than that.

More recently, say in the past ten years, something changed. First, nearly everything with an algorithm, and now anything with a transistor in it, was tagged with an AI label and pushed as the next big thing. Most recently, folks are talking about "AGI" as the lofty next goal we're about to reach, as if the now-old-news "AI" has already been accomplished. Makes you wonder what is really going on.

ChatGPT and its friends are certainly interesting and potentially a lot of fun to play with. And maybe even useful in some instances. But this isn't the AI that we were thinking about long ago, and it certainly isn't the AGI that people are talking about today. Less generous reviewers tend to categorize current developments as "expert text predictors", which probably lands somewhere between accurate and unfair. These new systems are certainly worthy of attention. But, like any tool, whether they are of any value largely depends on how they are used.

And it is still too early to declare whether Microsoft's investment of additional billions of dollars into OpenAI (the creators of ChatGPT) will generate the outsized returns they are after, whether it is the latest play in their "embrace, extend, and extinguish" business model, or whether it is just another in a long series of missteps. No doubt, there are supporters and detractors of every scenario.

The Basics.

First, a little terminology to help get us up to speed. This is a large topic, so this is just enough lingo that we're not tripping over ourselves later.

  • GPT. Stands for generative pre-trained transformer. Catchy, no? Perhaps a fancy AI can come up with a more interesting name. In the meantime, the idea here is that a GPT of any kind is one that is capable of producing proper language output, free from grammatical and spelling errors. This is where the "expert text predictor" categorization comes from - it really is an advanced version of what is used to predict text as you write a text message or an email.
  • Models. There are several GPT-related projects that have been created. The technology within them is generally referred to as a "model" and given a name. GPT-3, GPT-3.5, GPT-4, and DALL-E (images) are examples of some of the models produced by OpenAI. Each has specific rules that apply, but they all take input of some kind and produce an output of some kind. We'll cover some of the differences between these as they come up.
  • Training. The process of feeding data into a model that it later uses to do its work is called "training" the model. This is a surprisingly computationally intensive task, often requiring a small army of GPU-laden computers to crunch through the process. In some instances, this is a one-off job, where there is a data file generated that can then be used for all subsequent requests. When new knowledge is to be added, this training step typically needs to be repeated. Not good.
  • Fine-Tuning. Some models, specifically GPT-3, have had the ability to take on new data without having to go through the whole training process. While this might seem like a good solution to the next problem - and to the problem of having to run the training process on a huge amount of hardware - this fine-tuning option hasn't been made available for the later ChatGPT variants, and they no longer learn from any of the data passing through them. Also not good.
  • Knowledge Cut-Off. Perhaps the most telling attribute that suggests we're not quite "there" in terms of AI is that, thus far, these models only know about "stuff" up until a certain point - typically in the vicinity of September 2021 in the case of ChatGPT and its closest relatives. This means that if we ask about anything that occurred after that, the model will happily report that it doesn't know about current events. Or worse, it won't report that, and will instead confidently present us with demonstrably false information. This is an enormous problem that needs to be addressed before any of this stuff can be taken too seriously.
  • Tokens. When exchanging data with a model, it may seem as if we're working entirely in the natural language world. And from the perspective of the user, we certainly are. Look at one level lower, however, and all of that text is represented as tokens. And we probably wouldn't care about that very much. However, ChatGPT services are not free. Instead, fees are levied based on the number of tokens being exchanged. This is roughly equivalent to the number of words of text, but not specifically that. OpenAI puts the estimate at around 1,000 tokens for 750 words. There may also be limits to the number of tokens that can be exchanged in a conversation - typically 2048 or 4096 tokens, depending on the model. Seems like a lot, right?
  • Contextual. This is a term that refers to how the model is able to carry on a conversation based on prior parts of a conversation. If you were to ask a question about a building, and then ask "How tall is it?", the model will look back in the conversation to figure out what "it" is and will realize the conversation was about a specific building, and then it will tell you how tall that building is. But alas, this seemingly clever bit requires that we pass the rest of the preceding conversation back to the model each time. The API, much like a traditional web server, is stateless. Suddenly, those tokens are in very limited supply. And we end up paying for them multiple times over in the same conversation. Not a fan, I must say. More on that later, though.
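To make that last point concrete, here's roughly what a contextual request body looks like, using the OpenAI chat format - the earlier exchanges are resent as "user" and "assistant" entries, and every one of them counts against the token limit.

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "user", "content": "Tell me about the Empire State Building."},
    {"role": "assistant", "content": "The Empire State Building is a 102-story skyscraper in New York City..."},
    {"role": "user", "content": "How tall is it?"}
  ]
}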

There is of course a lot more to the story, but this is all we need to get moving. And while there might be a slightly negative undertone detectable in those descriptions, once we get a little further along we'll be having so much fun we likely won't care! This is also problematic - our new shiny toy is perhaps so very shiny that we might forget that it really isn't all that intelligent after all. As a result, we now have lawsuits related to the spread of AI-generated misinformation, and even places (like Italy) where it has been outright banned.

The Setup.

Getting started with ChatGPT is not difficult conceptually. There is a REST API server provided by OpenAI where we can post requests and get responses. It works like any other REST API, using an API Key that we can get (initially for free) from the OpenAI website. But as with any such API key, we absolutely do not want to embed it in our TMS WEB Core client project or make it directly available to our web client, even if it is encoded in some way. Anyone with access to the application could very quickly extract the API Key and then be merrily on their way, running up our bill - or just using up our API token allocation for the month.

To get around this, we'll instead set up an XData server to be a proxy service - it will pass requests between our TMS WEB Core client application and the OpenAI REST API server. It can then insert the API key, securely, protected from the prying eyes of anyone using our client application. And, while we're at it, the XData application can also log the activity and potentially contribute something more along the way.
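Schematically, it looks like this - the API Key only ever travels between XData and OpenAI.

  TMS WEB Core Client  <-- no API Key -->  XData Server  <-- API Key attached -->  OpenAI REST API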

The first thing we need, then, is our own OpenAI API key. Visit https://openai.com and click on the Developers | Overview link at the top. Then either Login or Sign Up (if this is a first visit). From the Accounts menu (user icon in the top-right corner) there is a "View API Keys" menu option. This leads to a page where new API keys can be generated. As is often the case, keys are shown in full only when first generated and cannot be viewed again afterward, so make sure to copy the API key and paste it into a safe place elsewhere. Keys can also be revoked and reissued. The documentation suggests that they will revoke any keys that they find in "the wild" (this is a good thing!) but they don't elaborate on this, so be careful where these are stored.

While we're on the OpenAI site, we can also track down our Organization ID, if we've signed up as an organization. This is another key that is passed to the REST API to indicate where the costs for the requests are to be allocated if we happen to be working with several organizations. Also, the "API Reference" link at the top of the page gives plenty of examples of how to pass REST API requests, using curl as a starting point. Here, we can see the typical API Key usage via a "Bearer" token and the various other bits of data that might need to be included. Worth strolling through to see what's available. We're primarily after the "Chat" and "Images" endpoints, but there are several others available as well.
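A request of the sort shown there looks roughly like this - the endpoint and the two custom headers are the same ones we'll be setting from XData shortly (the key and organization values are placeholders, of course).

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-YourAPIKeyGoesHere" \
  -H "OpenAI-Organization: org-YourOrganizationID" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello!"}]}'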

In this documentation, you will find the same advice about the API Keys again - don't add them to anything at all in your TMS WEB Core client project. Instead, they recommend creating a back-end service to look after securely handling API Keys, which is effectively what we're doing with TMS XData. 

The TMS XData Template Demo Data Project.

On the XData side, we're going to add a new service to our TMS XData Template Demo Data project (see this blog post for more information). This isn't really much different than starting a new XData project, just that we've already got a lot of the bases covered in terms of our own authentication mechanisms and so on. If you've got your own XData project that you'd like to use instead, it will likely work just as well - most of what is of interest here is the code that goes into each service endpoint. And these can (hopefully) be easily replicated in any other XData application. Here, we're just taking a few shortcuts that we've covered extensively before to help move things along a little quicker. 

One of the ways we're doing that is by adding a bit of information into a configuration JSON file that XData has been set up to load when the application first starts. This is where we can put our API key and other information that our XData server can use to provide this service. These values could be passed directly as parameters, by using INI files, by adding keys to the Windows Registry, or via any other means that you'd like to use to pass data into an XData application. In this case, these values could even be hard-coded as they might not change often, and may very well be the same when moving between development and production environments. 

For our example, we'll assume that they might be different, and thus we'll drop them into the JSON file for XData to pick up when it first starts. We can always just use the same values in both cases. Our XData application also has a CONFIG parameter where we can specify the location of this JSON file. Here's what this might look like (the values will be different for your installation, and your API Keys, of course - these are just examples). As we get into some of the more involved aspects, this list of elements may also need to be updated.


{
  "Chat Interface": {
    "Model":"gpt-3.5-turbo",
    "Organization":"org-1234rpR1abcdVuQp1m6qPXYZ",
    "API Key":"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",
    "Chat Endpoint":"https://api.openai.com/v1/chat/completions",
    "Image Endpoint":"https://api.openai.com/v1/images/generations"
  }
}


Now, when our XData server starts, these values will be available. If we can't find them, then our service endpoints should return a message of some kind indicating as much. For our new service, let's call it ChatService. We can add it the same way we normally add new services to XData - using the File | New... | TMS XData | TMS XData Service and then entering "Chat" for the service name. We'll end up with a new ChatService.pas interface unit and a new ChatServiceImplementation.pas implementation unit in our application. In our Template project, we've been putting these into a "units" folder off the main project folder. If we run our project now, we'll be prompted to save the new units, so we'll save them there. Doesn't matter particularly where they are stored, however.

Next, we'll have to consider what service endpoints we're interested in. Naturally, we'll want them all to be authorized, as we don't want this open to the public, due to cost concerns. While working on the project on a development system, though, leaving off the [Authorize] attribute makes life a little easier for testing, perhaps using Swagger for example. The first function that comes to mind is "chat" of course - we'll need to pass in whatever the chat question is (potentially the rest of the chat as well) and return the next part of the conversation. 

There is also an option to generate an image based on text input, so we'll want to plan for that. The image service has a couple of parameters - the number of images and their size (256x256, 512x512, or 1024x1024) so we'll have to factor those in somehow. Regardless, we'll want to return whatever the chat service gives us - potentially an arbitrarily large amount of JSON. So let's do what we've been doing previously - return a TStream in these cases.
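For reference, the request body for the images endpoint is along these lines, per the OpenAI API reference - "n" is the number of images, and these are the two parameters we'll need to factor in.

{
  "prompt": "A robot writing a blog post",
  "n": 2,
  "size": "512x512"
}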

Chat Endpoint.

For our first endpoint, we'll need the text - our initial part of the conversation - which we'll pass using the parameter "conversation". We'll also need the history of the conversation, potentially, if we want our conversation to include contextual content, so we'll call that parameter "context". An extra parameter, ChatID, will be used to help us know when the chat is ongoing. We want to ultimately store the chat in a database but don't want to keep the intermediate parts of the conversation. We can use this value to overwrite the previous entry as the conversation proceeds. In the ChatService.pas interface unit, what we're after is this.

unit ChatService;

interface

uses
  System.Classes,
  XData.Security.Attributes,
  XData.Service.Common;

type
  [ServiceContract]
  IChatService = interface(IInvokable)
    ['{F95E651C-DA7A-4997-8630-314587B1857F}']

    [Authorize] [HttpGet] function Chat(Conversation:String; Context: String; ChatID: String):TStream;
  end;

implementation

initialization
  RegisterServiceType(TypeInfo(IChatService));

end.


Over to the implementation. Initially, the process is simple enough. We'll need to check that we have enough information to process the request (an API Key and so on). We'll set up a request to the OpenAI server using the credentials that we've loaded from the JSON configuration file. We'll need to add the data from the endpoint parameters. And then we'll return whatever comes back as the "result" value - JSON returned as a TStream. 

Note that in XData, each call to an endpoint generates a new thread. Here, in this thread, we're contacting OpenAI by opening a new HTTPS connection, and we'd like to wait here until it responds, if possible, having the endpoint return only when it has data, or when an error has occurred. Let's give this a try to start with.

function TChatService.Chat(Conversation, Context: String; ChatID: String): TStream;
var
  Client: TNetHTTPClient;
  Response: String;
  Request: TStringStream;
begin

  // Do we have enough information to process this request?
  if (MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONObject) = nil
  then raise EXDataHttpUnauthorized.Create('Chat Service is Unavailable.');

  // Returning JSON, so flag it as such
  TXDataOperationContext.Current.Response.Headers.SetValue('content-type', 'application/json');

  Client := TNetHTTPClient.Create(nil);
  Client.Asynchronous := False;
  Client.ConnectionTimeout := 60000; // 1 min
  Client.ResponseTimeout := 60000; // 1 min
  Client.ContentType := 'application/json';
  Client.SecureProtocols := [THTTPSecureProtocol.SSL3, THTTPSecureProtocol.TLS12];
  Client.CustomHeaders['Authorization'] := 'Bearer '+(((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONObject).GetValue('API Key') as TJSONString).Value);
  Client.CustomHeaders['OpenAI-Organization'] := (((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONObject).GetValue('Organization') as TJSONString).Value);

  Request := TStringStream.Create('{'+
    '"model":"'+(((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONObject).GetValue('Model') as TJSONString).Value)+'",'+
    '"messages":[{"role":"user","content":"'+Conversation+'"}]'+
    '}');
  MainForm.mmInfo.Lines.Add(Request.DataString);
  try
    Response := Client.Post(
      ((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONObject).GetValue('Chat Endpoint') as TJSONString).Value,
      Request
    ).ContentAsString;
  except on E: Exception do
    begin
      MainForm.mmInfo.Lines.Add('[ '+E.ClassName+' ] '+E.Message);
    end;
  end;

  Result := TStringStream.Create(Response);
  Request.Free;
  Client.Free;
end;

The trickiest part of this is likely how to compose the Post call. In this example, we've gone down the TStream route primarily because the TStrings route didn't seem to work. Regardless of your choices here, the idea is that we're just submitting our request to the OpenAI API as JSON, and getting back a block of JSON with the response.

In fact, we can already test this using Swagger. If we ask the question "Who is Anna Torv?" we'll get a response.  Note that this is a non-deterministic system, meaning that you can ask the same question over and over and get different responses each time. Hopefully just small variations, but potentially factually different responses. We're also pretty lax when it comes to checking the input value for "conversation". 

[Screenshot: Our First Chat!]

Note carefully that the curl command at the top is describing our request being sent to XData, and XData itself issues its own internal HTTP Post request to OpenAI, returning the response from that call as the response to our XData call. A proxy, in effect. Also, note in the response that we can see those tokens accumulating. At $0.002 USD per 1,000 tokens, this isn't going to accumulate very quickly for a single developer (I don't think I've even hit $1 yet), but if hundreds of people start asking numerous questions, with potentially long inputs and long outputs, things can quickly get out of hand. Hence one of the reasons for all the API Key security considerations. 
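Some back-of-the-envelope math makes the point. A contextual conversation of 20 exchanges averaging 500 tokens each resends all of the prior exchanges every time, so it consumes roughly 500 x (1 + 2 + ... + 20) = 105,000 tokens - about $0.21 at current rates. One developer poking at it won't notice. A few hundred users a day doing the same turns into real money.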

With the endpoint working, we're off and running, ready to build something a little more interesting in our TMS WEB Core application. We'll come back to the other two parameters (Context and ChatID) once we have a better way of populating them.

Chat Interface.

While Swagger is great for testing APIs, it leaves a little to be desired in the UI department. Of course, we'd never expect anyone other than a developer to use Swagger. So we'll need a Chat UI of some kind. As we're already using the TMS XData Template Demo Data project for our example, we might as well create a Chat UI using the TMS WEB Core Template Demo project. There's (at least one) post about this project, which you can find here. And, as luck would have it, the AdminLTE Bootstrap/HTML template that this project is based on already has a chat-style UI as one of its examples.

The design of this project is a little more involved than the XData project, and we'll be relying on more of its existing structure than was the case with XData. Most of the UI is structured around "Dashboards" - separate pages (forms) where we can display a bit of information about a particular application subsystem, and then provide links, buttons, or other UI elements that are specific to a particular feature.

In this case, we want the chat feature to be broadly accessible, so we'll add an icon on the top menu bar. We'd also like some ability to customize this icon, so we'll use our usual icon mechanism to select a "robot" icon. For the name, let's add "ChatBotName" to the optional configuration JSON for this project. Note that this is completely separate from the configuration JSON used for the XData project. If it isn't provided, this will default to "Willard". We covered how to use a configuration JSON file with a TMS WEB Core project in this post.

The AdminLTE template that this project is based on has an example chat interface included in its example pages.  This essentially amounts to a collection of classes and <div> elements arranged a certain way so as to have chat bubbles, avatars, names, and timestamps arranged into something that looks like a chat window. There's even a bit of styling for the text input field and button. Note that this is just a template, however - there isn't any code provided to add chat messages or clear the window or anything like that.

The next step, then, is to fix up the text input field a little bit. We'd like it to be a multi-line input field, as there may well be more text that can easily fit on one line. And, naturally, the button needs an icon. We can pair up the HTML elements in the template to components on our form by assigning id values to the HTML template elements and adding corresponding components to our form with these same id values assigned to the ElementID property of the components. This time, we're using a TWebMemo paired with a <textarea> element and a TWebButton paired with a <button> element. 

To get the memo to grow with the text input, we'll need to have it resize explicitly. But we'll also want to apply a bit of CSS to ensure that things don't get out of hand. In our WebFormCreate method, we can set some of these values like this.

  // Adjustments to the chat input memo field
  memoChat.ElementHandle.setAttribute('rows','1');
  memoChat.ElementHandle.style.setProperty('min-height','38px');
  memoChat.ElementHandle.style.setProperty('max-height','250px');
  btnChatSend.Caption := DMIcons.icon('Send')+'Send';

  // Chat Window adjustments
  asm
    ChatWindow.style.setProperty('height','auto');
    ChatWindow.style.setProperty('min-height','300px');
    ChatWindow.style.setProperty('max-height','500px');
  end;

For the memo field itself, we can string together a series of Bootstrap classes to get it just right. This is what is in the ElementClassName property.

    fs-6 flex-fill px-2 me-2 border border-1 border-secondary rounded


And then in the template HTML we can make a few more adjustments for spacing and consistency.  And this is where the extra resizing bit comes in.

                      <!-- Chat Footer-->
                      <div class="card-footer p-2">
                        <div class="input-group d-flex">
                          <textarea id="memoChat" oninput="this.style.height = ''; this.style.height = this.scrollHeight+2 +'px'"></textarea>
                          <div class="input-group-append">
                            <button id="btnChatSend"></button>
                          </div>
                        </div>
                      </div>


For the rest of the Chat Window, we can of course get rid of the sample conversation from the AdminLTE template.  We could add a TWebHTMLDiv component to give us a Delphi component to work with, but we don't really need to set anything in it, so there's not much point in doing that. Instead, we can just make a few tweaks to the HTML template directly, like setting an ID value that we can reference directly later via JavaScript.

                      <div class="card-body p-1">
                        <!-- Conversations are loaded here -->
                        <div id="ChatWindow" class="direct-chat-messages">
                        </div>
                      </div>

There are three different kinds of messages we want to appear in the chat window. First, we want to display static "system" messages, like the welcome message or the waiting indicator message. These aren't tracked or used in any way - just little UI touches to make things run a little more smoothly. For these messages, we'll use this method.

procedure TChatStatisticsSubForm.AddStaticChatResponse(ChatMessage: String);
var
  ChatBlock: String;
begin
  ChatBlock := '<div class="direct-chat-msg">'+
               '  <div class="direct-chat-infos clearfix">'+
               '    <span class="direct-chat-name float-start">'+MainForm.App_ChatBotName+'</span>'+
               '    <span class="direct-chat-timestamp float-end">'+FormatDateTime('yyyy-MMM-dd hh:nn:ss',Now)+'</span>'+
               '  </div>'+
               DMIcons.Icon('Robot_Avatar')+
               '  <div class="direct-chat-text fs-6">'+
                    ChatMessage+
               '  </div>'+
               '</div>';
  asm
    ChatWindow.innerHTML += ChatBlock;
    ChatWindow.scrollTo({left: 0, top: 1000000, behavior: "smooth"});
  end;
end;

All we're really doing is fabricating a block of HTML and appending it to the ChatWindow <div> from our template.  We're using the same name and icon as our avatar for the chatbot that we're using elsewhere in our UI. Whenever we add something to this chat window, we want to scroll the contents to the bottom. We will use this when the chat form is first created, to display an opening message of some kind. Just for fun, it randomly chooses from a set of options.

  // Initialize with a canned response of some kind.
  Openers[0] := 'Hello!  My name is '+Mainform.App_ChatBotName+'.  Please ask me anything you like.';
  Openers[1] := 'Greetings and salutations.  It is I, '+Mainform.App_ChatBotName+'.  How may I be of assistance?';
  Openers[2] :=  Mainform.App_ChatBotName+' here. How can I help you?';
  Openers[3] := 'I come in peace. Hahaha. Just kidding. What do you want to know?';
  Openers[4] := 'Bleep. Bleep. Bloop. Bleep.';
  // Bicentennial Man
  Openers[5] := 'One is glad to be of service.';
  // Aladdin
  Openers[6] := 'The ever impressive, the long contained, often imitated, but never duplicated... Genie of the lamp!';
  // Terminator 2
  Openers[7] := 'I have detailed files.';
  // The Matrix
  Openers[8] := 'I am the Architect. I created the matrix. I''ve been waiting for you. You have many questions, and although the process has altered your consciousness, you remain irrevocably human. Ergo, some of my answers you will understand, and some of them you will not. Concordantly, while your first question may be the most pertinent, you may or may not realize it is also irrelevant.';
  // Aliens
  Openers[9] := 'Well, that explains it then. The A2s always were a bit twitchy. That could never happen now with our behavioral inhibitors. It is impossible for me to harm or by omission of action, allow to be harmed, a human being.';

  AddStaticChatResponse(Openers[System.Random(10)]);

When we submit a request, it may take some time to complete. For a progress indicator, it would be nice to show a little animation. When someone at the other end of the chat is typing in iMessage on an iPhone, for example, three dots appear in a little animation at your end to let you know that they're working on a reply (maybe). There are a number of CSS-only animations that could be used for this. Here's a link to a few of them: https://codepen.io/nzbin/pen/GGrXbp. We've gone with the "dot-collision" option. This involves primarily just adding a bunch of CSS into our "template.css" file and then adding a <div> element that includes a class referring to that CSS. Like this.

    // Add a "typing" indicator
    AddStaticChatResponse('<div class="dot-collision"></div>');

Later, when the response comes back, we can remove this progress indicator from the ChatWindow <div> like this.

    // Remove the "typing" indicator
    asm ChatWindow.lastElementChild.remove(); end;

As for the CSS, it looks like this. We covered CSS animation in another blog post, which you can find here, if you're curious about the specifics and are not familiar with CSS animation already.


/* Fancy animation for a chat- waiting indicator */
/* https://codepen.io/nzbin/pen/GGrXbp */

.dot-collision {
  position: relative;
  width: 10px;
  height: 10px;
  border-radius: 5px;
  background-color: #9880ff;
  color: #9880ff;
  margin: 7px 0px 7px 20px;
}
.dot-collision::before, .dot-collision::after {
  content: "";
  display: inline-block;
  position: absolute;
  top: 0;
}
.dot-collision::before {
  left: -10px;
  width: 10px;
  height: 10px;
  border-radius: 5px;
  background-color: #9880ff;
  color: #9880ff;
  animation: dot-collision-before 2s infinite ease-in;
}
.dot-collision::after {
  left: 10px;
  width: 10px;
  height: 10px;
  border-radius: 5px;
  background-color: #9880ff;
  color: #9880ff;
  animation: dot-collision-after 2s infinite ease-in;
  animation-delay: 1s;
}

@keyframes dot-collision-before {
  0%, 50%, 75%, 100% {
    transform: translateX(0);
  }
  25% {
    transform: translateX(-15px);
  }
}
@keyframes dot-collision-after {
  0%, 50%, 75%, 100% {
    transform: translateX(0);
  }
  25% {
    transform: translateX(15px);
  }
}

Whenever we send a request, we'll need to add it to the ChatWindow <div>, as this makes up our half of the chat history. A similar approach is taken here, but we're using the name and image of the person currently logged in for the avatar. When we add the message, we also set up an "addEventListener" function so that if the message is clicked, it is copied back into the TWebMemo field. For example, if you make a typo or want to change something, this saves having to retype it. We'll also change the pointer to the "hand" pointer to make it a little more obvious that this is an available option.

procedure TChatStatisticsSubForm.AddChatMessage(ChatMessage: String);
var
  ChatBlock: String;
begin
  ChatBlock := '<div class="direct-chat-msg end">'+
               '  <div class="direct-chat-infos clearfix">'+
               '    <span class="direct-chat-name float-end">'+MainForm.User_FirstName+' '+MainForm.User_LastName+'</span>'+
               '    <span class="direct-chat-timestamp float-start">'+FormatDateTime('yyyy-MMM-dd hh:nn:ss',Now)+'</span>'+
               '  </div>'+
               '  <img class="direct-chat-img" '+Copy(MainForm.User_Photo,5,length(MainForm.User_Photo))+
               '  <div class="direct-chat-text fs-6 cursor-pointer chatmessage">'+
                    ChatMessage+
               '  </div>'+
               '</div>';
  asm
    ChatWindow.innerHTML += ChatBlock;
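    // Note: this registers another click listener each time a message is added;
    // the stopImmediatePropagation call below keeps those duplicate listeners
    // from all firing on the same click.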
    ChatWindow.addEventListener('click', (e) => {
      if (e.target.classList.contains('chatmessage')) {
        e.stopImmediatePropagation();
        memoChat.value = e.target.innerHTML;
        memoChat.style.height = '';
        memoChat.style.height = memoChat.scrollHeight+2 +'px';
      }
    });
    ChatWindow.scrollTo({left: 0, top: 1000000, behavior: "smooth"});
  end;
end;


When we get a response back, we're going to do something similar again, except this time we'll also include a value indicating how long it took to process the request. We're also doing a little bit of formatting here. If you make a request where a code block is returned (for example, "Write a JavaScript function to do X"), the response may include code surrounded with triple backticks (```). If this happens, we replace them with <p><pre> and </pre></p>. Likewise for single backticks. To help out a little more, the <pre> tag was added to our template.css, changing the color and adjusting the display property to look a little nicer. An example of that can be seen in one of the screenshots below.


procedure TChatStatisticsSubForm.AddChatResponse(ChatMessage: String; QueryTime:TDateTime);
var
  ChatBlock: String;
  FormattedMessage: String;
  PreReplace: String;
begin
  FormattedMessage := ChatMessage;

  // Replace ``` with <p><pre> and </pre></p>
  PreReplace := '<p><pre>';
  while pos('```',FormattedMessage) > 0 do
  begin
    FormattedMessage := StringReplace(FormattedMessage,'```',PreReplace,[]);
    if PreReplace = '<p><pre>'
    then PreReplace := '</pre></p>'
    else PreReplace := '<p><pre>';
  end;

  // Replace ` with <pre> and </pre>
  PreReplace := '<pre>';
  while pos('`',FormattedMessage) > 0 do
  begin
    FormattedMessage := StringReplace(FormattedMessage,'`',PreReplace,[]);
    if PreReplace = '<pre>'
    then PreReplace := '</pre>'
    else PreReplace := '<pre>';
  end;

  ChatBlock := '<div class="direct-chat-msg">'+
               '  <div class="direct-chat-infos clearfix">'+
               '    <span class="direct-chat-name float-start">'+MainForm.App_ChatBotName+'</span>'+
               '    <span class="direct-chat-timestamp float-end">'+'('+FloatToStrF(MillisecondsBetween(QueryTime,Now)/1000,ffNumber,5,2)+'s) '+FormatDateTime('yyyy-MMM-dd hh:nn:ss',Now)+'</span>'+
               '  </div>'+
                  DMIcons.Icon('Robot_Avatar')+
               '  <div class="direct-chat-text fs-6">'+
               '  </div>'+
               '</div>';
  asm
    ChatWindow.innerHTML += ChatBlock;
    var ChatMessageWords = FormattedMessage.split(' ');
    for (var i = 1; i <= ChatMessageWords.length; i++) {
      ChatWindow.lastElementChild.lastElementChild.innerHTML = ChatMessageWords.slice(0,i).join(' ');
      ChatWindow.scrollTo({left: 0, top: 1000000, behavior: "smooth"});
      await new Promise(r => setTimeout(r, Math.random() * 200));
    }
  end;
end;

For many chat user interfaces, there seems to be a bit of a common theme where the text is displayed slowly - as if the chatbot was typing out the response. This is a bit silly, as the response comes back as a complete block from the REST API call. But we don't want to be left out of the fun. In the last part of the above, the message is broken up into an array of words. Then, the words are revealed by a loop - replacing the text with an additional word added each time, along with a random very short delay. Not quite the same as a typewriter effect. More like a dictation effect perhaps.  

That about covers the UI portion for our first test. But what do we do when we click the "Send" button? All we're doing to start with is submitting the memoChat text to our XData server and waiting for a response. When it comes in, we'll post it as a new message. The work involved in calling the XData function is handled by a common "JSONRequest" function that we have in MainForm already. Here's what that looks like.

function TMainForm.JSONRequest(Endpoint: String; Params: array of JSValue): String;
var
  ClientConn: TXDataWebClient;
  Response: TXDataClientResponse;
  Blob: JSValue;
  ErrorCode: String;
  ErrorMessage: String;
  Elapsed: TDateTime;
begin
  Elapsed := Now;
  Result := '';
  LogAction('Requested: '+Endpoint, False);

  await(XDataConnect);
  if (XdataConn.Connected) then
  begin
    try
      ClientConn := TXDataWebClient.Create(nil);
      ClientConn.Connection := XDataConn;
      Response := await(ClientConn.RawInvokeAsync(Endpoint, Params));

      Blob := Response.Result;
      asm Result = await Blob.text(); end;

    except on E: Exception do
      begin
        // Get the error message we created in XData
        asm {
          var ErrorDetail = JSON.parse( await E.FErrorResult.FResponse.$o.FXhr.response.text() );
          ErrorCode = ErrorDetail.error.code;
          ErrorMessage = ErrorDetail.error.message;
        } end;

        // Log the error, but leave out the URI (because it includes the password)
        LogAction('ERROR Request Exception Received From '+Endpoint, False);
        LogAction(' --> ['+E.ClassName+']', False);
        LogAction(' --> '+Copy(E.Message,1,Pos('Uri:',E.Message)-2), False);
        LogAction(' --> '+Copy(E.Message,Pos('Status code:',E.Message),16), False);
        LogAction(' --> '+ErrorCode, False);
        LogAction(' --> '+ErrorMessage, False);

        // Will tamp these down a bit once we get a better feel for the kinds of errors
        // that come up regularly.
        Toast(DMIcons.Icon('Bug_Menu')+'Unexpected Error',
          '[ '+E.ClassName+' ] '+Endpoint+'<br />'+
          Copy(E.Message,1,Pos('Uri:',E.Message)-2)+'<br />'+
          Copy(E.Message,Pos('Status code:',E.Message),16)+'<br />'+
          ErrorCode+'<br />'+
          ErrorMessage
        ,45000);

      end;
    end;
  end;

  LogAction('Responded: '+Endpoint+' ('+IntToStr(MillisecondsBetween(Now, Elapsed))+'ms)', False);
  PreventCompilerHint(Blob);
end;

Most of that is actually related to error handling. The gist of it, though, is that we're calling an endpoint and passing it parameters. Here's what it looks like when we call it from our button click event.

procedure TChatStatisticsSubForm.btnChatSendClick(Sender: TObject);
var
  i: Integer;
  MessageString: String;
  ResponseString: String;
  ResponseJSON: TJSONObject;
  QueryTime: TDateTime;
begin

  // Get a string that has \n for line endings
  MessageString := '';
  for i := 0 to memoChat.Lines.Count -1 do
  begin
    if Trim(memoChat.lines[i]) <> ''
    then MessageString := MessageString + Trim(memoChat.lines[i])+ ' \n';
  end;
  if RightStr(MessageString,3) = ' \n'
  then MessageString := Copy(MessageString,1,length(MessageString) - 3);

  // If we've got something to send, then let's send it
  if Length(MessageString) >= 5 then
  begin

    // Add new request to chat window
    AddChatMessage(memoChat.Lines.Text);

    // Clear out text and get ready for next message entry
    memoChat.Lines.Text := '';
    memoChat.SetFocus;
    memoChat.ElementHandle.style.setProperty('height','38px');
    asm
      setTimeout(function() {
        memoChat.value = '';
        memoChat.style.height = '38px';
      },100);
    end;

    // Add a "typing" indicator
    AddStaticChatResponse('<div class="dot-collision"></div>');

    // Submit request
    QueryTime := Now;
    ResponseString := await(MainForm.JSONRequest('IChatService.Chat',[
      MessageString, // Conversation
      '', // Context
      ''  // ChatID
    ]));

    // Remove the "typing" indicator
    asm ChatWindow.lastElementChild.remove(); end;

    // Check that we've got something we can use
    if ResponseString <> '' then
    begin
      ResponseJSON := TJSONObject.ParseJSONValue(ResponseString) as TJSONObject;
      // Long way to check that we've got JSON in an expected structure
      if (ResponseJSON <> nil) and
         ((ResponseJSON.GetValue('choices')) <> nil) and
         ((ResponseJSON.GetValue('choices') as TJSONArray)[0] <> nil) and
         (((ResponseJSON.GetValue('choices') as TJSONArray)[0] as TJSONObject).GetValue('message') <> nil) and
         ((((ResponseJSON.GetValue('choices') as TJSONArray)[0] as TJSONObject).GetValue('message') as TJSONObject).GetValue('content')  <> nil) then
      begin
        // Add response to the ChatWindow
        AddChatResponse(((((ResponseJSON.GetValue('choices') as TJSONArray)[0] as TJSONObject).GetValue('message') as TJSONObject).GetValue('content') as TJSONString).value, QueryTime);
      end;
    end;

  end;
end;

Lots going on. We're taking our message, applying a bit of light formatting, sending it off, and then posting the response. All the JSON messiness is just checking that the structure exists in the format that we expect. If anything goes amiss, nothing will be posted (at this stage) but, hopefully, no errors will be generated either. Lots of things can go wrong (too many tokens, invalid requests, server down, XData down, etc.), so a little extra caution isn't a bad idea. But all of our work has paid off. Here's what we've got for an interface.

[Screenshot: Chatting About TMS WEB Core and JavaScript Coding.]

That's our first cut of this interface. Not too bad. Everything works and flows along as expected. The delay in responses is entirely dependent on the complexity of the question. Quick lookups are, well, quick. More complex questions can certainly be answered but may take as long as a minute in some of the more exotic cases. Requests can include all kinds of additional information, formatting instructions, and so on. Sometimes it might take a few tries. And the response to the same question might very well be different the next time it is posted. And, of course, accuracy is anywhere from "amazing" to "highly suspect" depending on the nature of the material. 

Here's an example where it does a pretty good job, but I would have thought Episode III would have been included as well. The point, though, isn't about the content but rather how it is formatted - it knows about lists and can display some nice formatting when it is in the mood to do so. This one took around 15 seconds to answer. Usually, the responses come much more quickly than this. Formatting lists, sorting, and so on, add significant delays to the time taken to generate a response.

[Screenshot: Formatting of HTML Lists.]

There is no end to the examples that can be used for this kind of thing. Part of the fun with this project is, naturally, trying to come up with questions that can trip up the algorithm in some way. It is surprisingly good in many respects, but the lack of current knowledge is a problem. We'll ignore that for the time being and get on to a few other tasks.

Context.

With what we've done so far, each request that is sent off just includes the question - the rest of the conversation (the context of the conversation) is not included. To address that, we'll have to keep track of what is going back and forth and include those entries in our request. This will result in a great many more tokens being used as our conversation starts to lengthen, so we'll also have to keep an eye on that so as to leave enough space for a response to come back. 

The token limit includes our input and context as well as the output. If we were to use up all but 10 of the tokens on our input and context, for example, our response would come back clipped at 10 tokens. Not really ideal. The return value includes an indication as to whether the full response was returned or whether it was abbreviated or impacted in some other way. So we'll need to give that a bit of attention as well - perhaps a project for another day.
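That indication appears in the response JSON itself. Here's a trimmed example of what comes back (the real response includes a few more fields) - a "finish_reason" of "length" instead of the usual "stop" tells us the reply was cut off, and the "usage" object is where those token counts are reported.

{
  "choices": [
    {
      "message": {"role": "assistant", "content": "The building stands at"},
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 4076,
    "completion_tokens": 20,
    "total_tokens": 4096
  }
}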

Another consideration is that there is potentially more than one model available. Let's adjust the configuration JSON in XData to accommodate this, and also include more information about the tokens for each model - the cost and the limit. We'd also very much like to include the "image" model, which has additional options relating to the size and number of images returned. And, once these models have been configured on the server, they need to somehow make an appearance in the client. To deal with that, we'll need to make an initial request, which we can also use to serve up a few statistics and other information for our page. Lots of little things, it turns out.

First, let's adjust the server's configuration JSON. Instead of having the "Chat Interface" object point at one model, it will instead point to an array of models, all with largely the same elements as before. We'll add a new "Name" element that we can then use in the client, displayed as a dropdown that the user can select from. When the XData server starts, we can also display a list of the available models in the same startup sequence that we display the rest of the XData configuration information, in Unit2.
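As a sketch, the updated configuration JSON might look something like this. The "Name" and "Default" elements are what the code below relies on; the token and size elements are examples of the kind of per-model information we'd like to have on hand, so your element names and values may well differ.

{
  "Chat Interface": [
    {
      "Name": "ChatGPT 3.5",
      "Default": true,
      "Model": "gpt-3.5-turbo",
      "Organization": "org-1234rpR1abcdVuQp1m6qPXYZ",
      "API Key": "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",
      "Endpoint": "https://api.openai.com/v1/chat/completions",
      "Token Limit": 4096,
      "Cost Per 1K Tokens": 0.002
    },
    {
      "Name": "Image 256",
      "Organization": "org-1234rpR1abcdVuQp1m6qPXYZ",
      "API Key": "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",
      "Endpoint": "https://api.openai.com/v1/images/generations",
      "Size": "256x256"
    }
  ]
}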

  // Are chat services available?
  if (AppConfiguration.GetValue('Chat Interface') as TJSONArray) = nil
  then mmInfo.Lines.Add('...Chat: UNAVAILABLE')
  else
  begin
    mmInfo.Lines.Add('...Chat:');
    i := 0;
    while i < (AppConfiguration.GetValue('Chat Interface') as TJSONArray).Count do
    begin
      mmInfo.Lines.Add('        '+(((AppConfiguration.GetValue('Chat Interface') as TJSONArray).items[i] as TJSONObject).getValue('Name') as TJSONString).Value);
      i := i + 1;
    end;
  end;

We'll need another service endpoint to be able to make this information available to the client. We'll start with the list of models available but we'll add in some statistics and other bits shortly. For the list of models, we'll also want to highlight the default model. The order is set by the JSON order (it is an array, so the order is maintained), but we'll add a flag to indicate which one is the default model in case it isn't the first one.

function TChatService.GetChatInformation: TStream;
var
  i: Integer;
  Response: TJSONObject;
  Models: TJSONArray;
begin
  // Returning JSON, so flag it as such
  TXDataOperationContext.Current.Response.Headers.SetValue('content-type', 'application/json; charset=utf-8');

  // Empty to begin with
  Response := TJSONObject.Create;

  // Are chat services available?
  if (MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray) = nil
  then Response.AddPair('Chat','UNAVAILABLE')
  else
  begin
    i := 0;
    Models := TJSONArray.Create;
    while i < (MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray).Count do
    begin
      if (((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray).items[i] as TJSONObject).getValue('Default') <> nil) and
         ((((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray).items[i] as TJSONObject).getValue('Default') as TJSONBool).AsBoolean = True)
      then
      begin
        Models.Add('*** '+(((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray).items[i] as TJSONObject).getValue('Name') as TJSONString).Value);
      end
      else
      begin
        Models.Add((((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray).items[i] as TJSONObject).getValue('Name') as TJSONString).Value);
      end;
      i := i + 1;
    end;
    Response.AddPair('Models',Models);
  end;

  Result := TStringStream.Create(Response.ToString);
  Response.Free;

end;


This will generate a block of JSON that should look similar to this, depending on how many models are defined.

{
  "Models": [
    "*** ChatGPT 3.5",
    "ChatGPT 4",
    "ChatGPT 4 8K",
    "ChatGPT 4 32K",
    "Image 256",
    "Image 512",
    "Image 1024"
  ]
}

This list then needs to be retrieved and used by the client to indicate which model the user would prefer. We'll also add another parameter to the Chat endpoint to indicate this selection. More on that in a moment. To add this data to the client, we have to add a call to the endpoint whenever the chat page is loaded.  And then, assuming we get some data back, we'll need to populate a new dropdown on the chat window where the model can be selected, being careful to set the default value and remove the *** prefix that we're using to indicate this. 

We can update our HTML template to include a control for selecting the model as well as a control for selecting the number of images in case the model is an "image" model by copying an example from one of the other AdminLTE windows. As previously, the template is used to provide a collection of classes and <div> elements that give us the look that we're after. The HTML id values that are assigned here can be referenced later in our code.

                          <div class="btn-group">
                            <button id="ChatModelSelection" type="button" class="btn btn-tool dropdown-toggle border-0" data-bs-toggle="dropdown">
                            </button>
                            <div id="ChatModelMenu" class="dropdown-menu dropdown-menu-end" role="menu">
                            </div>
                          </div>
                          <div class="btn-group">
                            <button id="ChatCountSelection" type="button" disabled class="btn btn-tool dropdown-toggle border-0" data-bs-toggle="dropdown">
                              Images: 1
                            </button>
                            <div id="ChatCountMenu" class="dropdown-menu dropdown-menu-end" role="menu">
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">1</div>
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">2</div>
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">3</div>
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">4</div>
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">5</div>
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">6</div>
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">7</div>
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">8</div>
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">9</div>
                              <div class="dropdown-item cursor-pointer ChatCountMenuItem">10</div>
                            </div>
                          </div>


For the code, we're just adding menu elements if there are models provided by XData. The addEventListener functions update the buttons with the menu options selected. Selecting a model with "image" in the name will enable the Images dropdown, where a count between 1 and 10 can be chosen.  

  // Load up Chat Information
  ResponseString := await(MainForm.JSONRequest('IChatService.GetChatInformation',[]));
  if ResponseString <> '' then
  begin
    Here := Self;
    asm
      var data = JSON.parse(ResponseString);
      if (data.Models.length > 0) {

        Here.InitializeChat();

        // Use this to automatically add splitters
        var splittercheck = data.Models[0].replace('*** ','').substring(0,4);

        // Add all Models
        for (var i = 0; i < data.Models.length; i++) {
          if (data.Models[i].indexOf('*** ') == 0) {
            data.Models[i] = data.Models[i].replace('*** ','');
            ChatModelSelection.innerHTML = data.Models[i];
          }
          if (splittercheck !== data.Models[i].substring(0,4)) {
            splittercheck = data.Models[i].substring(0,4);
            ChatModelMenu.innerHTML += '<div class="dropdown-divider"></div>';
          }
          ChatModelMenu.innerHTML += '<div class="dropdown-item cursor-pointer ChatModelMenuItem">'+data.Models[i]+'</div>';
        }

        // If Model clicked, update button.  Also enable/disable image count
        ChatModelMenu.addEventListener('click', (e) => {
          if (e.target.classList.contains('ChatModelMenuItem')) {

            // Reset chat interface each time we change the model
            Here.InitializeChat();

            ChatModelSelection.innerHTML = e.target.innerHTML;
            if (e.target.innerHTML.indexOf('Image') == -1) {
              ChatCountSelection.setAttribute('disabled','');
            } else {
              ChatCountSelection.removeAttribute('disabled');
            }
          }
        });

        // If Image count clicked, update button
        ChatCountMenu.addEventListener('click', (e) => {
          if (e.target.classList.contains('ChatCountMenuItem')) {
            ChatCountSelection.innerHTML = "Images: "+e.target.innerHTML;
          }
        });
      }
      // No Models = No Chat
      else {
        btnChatSend.setAttribute('disabled','');
        ChatCountSelection.setAttribute('disabled','');
        ChatModelSelection.setAttribute('disabled','');
      }
    end;
  end;

Any time the model changes, we want to clear the conversation and start again. So that code has been pulled into its own function.


procedure TChatStatisticsSubForm.InitializeChat;
var
  Openers: Array[0..9] of String;
begin
  // Clear out any prior chat
  asm
    ChatWindow.replaceChildren();
  end;

  // Initialize with a canned response of some kind.
  Openers[0] := 'Hello!  My name is '+Mainform.App_ChatBotName+'.  Please ask me anything you like.';
  Openers[1] := 'Greetings and salutations.  It is I, '+Mainform.App_ChatBotName+'.  How may I be of assistance?';
  Openers[2] :=  Mainform.App_ChatBotName+' here. How can I help you?';
  Openers[3] := 'I come in peace. Hahaha. Just kidding. What do you want to know?';
  Openers[4] := 'Bleep. Bleep. Bloop. Bleep.';
  // Bicentennial Man
  Openers[5] := 'One is glad to be of service.';
  // Aladdin
  Openers[6] := 'The ever impressive, the long contained, often imitated, but never duplicated... Genie of the lamp!';
  // Terminator 2
  Openers[7] := 'I have detailed files.';
  // The Matrix
  Openers[8] := 'I am the Architect. I created the matrix. I''ve been waiting for you. You have many questions, and although the process has altered your consciousness, you remain irrevocably human. Ergo, some of my answers you will understand, and some of them you will not. Concordantly, while your first question may be the most pertinent, you may or may not realize it is also irrelevant.';
  // Aliens
  Openers[9] := 'Well, that explains it then. The A2s always were a bit twitchy. That could never happen now with our behavioral inhibitors. It is impossible for me to harm or by omission of action, allow to be harmed, a human being.';

  AddStaticChatResponse(Openers[System.Random(10)]);
end;


Here's what the UI looks like with the new choices added.

[Screenshot: Chat Options.]


Back over to XData, we'll need to update our Chat endpoint with a "model" parameter and a "choices" parameter.  To make testing a bit easier, we can also define default values in Swagger. Here's our new interface declaration, complete with comments.

    /// Chat
    ///  <summary>
    ///    Submit a conversation to OpenAI, along with context, get a response.
    ///  </summary>
    ///  <remarks>
    ///    There are limits to the overall conversation, typically 4096 tokens, which includes the current conversation, the context, and the response.
    ///  </remarks>
    ///  <param name="Model">
    ///    The name of the model to use for this chat.  The model name should be selected from the list provided by GetChatInformation, which is the user-viewable name, not the internal chat model name.
    ///  </param>
    ///  <param name="Conversation">
    ///   The next part of the conversation.
    ///  </param>
    ///  <param name="Context">
    ///    The prior portions of the conversation that we would like to include, giving the model the opportunity to use them when coming up with a response. This is what gives our chat the illusion of knowing about prior responses. It doesn't, really, so we have to supply them each time.
    ///  </param>
    ///  <param name="Choices">
    ///    An integer that indicates how many choices to return.  For normal chat conversations, this is typically going to be "1".  For images, this might be any value from 1 to 10.
    ///  </param>
    ///  <param name="ChatID">
    ///    A unique identifier for the chat.  This is used to store the chat, and subsequent updates to the chat, in a server database, and helps ensure that we just end up with a single copy of the conversation when we're done.
    ///  </param>
    [Authorize] [HttpGet] function Chat([XDefault('ChatGPT 3.5')] Model: String; Conversation: String; [XDefault('None')] Context: String; [XDefault(1)] Choices: Integer; [XDefault('Swagger Testing')] ChatID: String):TStream;

And this is what it looks like in Swagger.

Chat via Swagger.


In ChatServiceImplementation, we now have to find the model in the configuration JSON before we can do much of anything. If we don't find the model, then this is an error condition and we raise an exception.

function TChatService.Chat(Model: String; Conversation: String; Context: String; Choices: Integer; ChatID: String):TStream;
var
  Client: TNetHTTPClient;
  Response: String;
  Request: TStringStream;
  ModelJSON: TJSONObject;
  ResultJSON: TJSONObject;
  TrimMessage: String;
  i: integer;
begin

  // Do we have enough information to process this request?

  // Is the chat service available?
  if (MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray) = nil
  then raise EXDataHttpUnauthorized.Create('Chat Service is Unavailable.');

  // Find the matching model (ModelJSON must be initialized explicitly - locals are not)
  ModelJSON := nil;
  i := 0;
  while i < (MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray).Count do
  begin
    if ((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray).Items[i] as TJSONObject).GetValue('Name') <> nil then
    begin
      if Trim(Uppercase((((MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray).Items[i] as TJSONObject).GetValue('Name') as TJSONString).Value)) = Trim(Uppercase(Model)) then
      begin
        ModelJSON := (MainForm.AppConfiguration.GetValue('Chat Interface') as TJSONArray).Items[i] as TJSONObject;
      end;
    end;
    i := i + 1;
  end;

  if ModelJSON = nil
  then raise EXDataHttpUnauthorized.Create('Chat Service Error: Model Not Found [ '+Model+' ]');

When it comes to the actual context of the conversation, we'll need to include the prior exchanges as part of the "messages" being sent with the request. Previously, we just passed the current part of the conversation. For the rest of the conversation, we'll need entries that indicate the role of "user" or "assistant", as well as what was sent or received. There is also a "system" role where we, as developers, can include additional information. More on that a little later. 
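For example, the "messages" array for an ongoing conversation might look something like this (content abbreviated, and the wording here is purely illustrative):

  "messages":[
    {"role":"system","content":"You are a helpful assistant."},
    {"role":"user","content":"What is TMS WEB Core?"},
    {"role":"assistant","content":"TMS WEB Core is a Delphi framework for building web applications."},
    {"role":"user","content":"Does it work with XData?"}
  ]

Everything except the final "user" entry is context that we're supplying from earlier exchanges.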

The idea is to pass in enough information to help carry on the conversation. However, the entire "message" being sent back is restricted to the number of tokens allowed for that particular chat endpoint. If the conversation data plus the context data is anywhere close to the token limit, we'll either get an abbreviated response or an error. So we'll need to check for this. We could do this on the client, but as there may potentially be multiple clients, it is instead handled in XData. We'd also like to have a complete conversation to store in the database. 

What we're doing here is deleting history items from the beginning of the context data until what remains fits within 75% of the available tokens. And we're doing a bit of hand-waving, assuming that a token is, on average, about 4 characters (reference). We'd also like to know when this is happening, so a string is passed back. More on that in just a moment.

function TChatService.TrimConversation(var Conversation, Context: String; Limit: Integer):String;
var
  LimitChars: Integer;
  Started: Integer;
  Clipped: Integer;
  ContextJSON: TJSONArray;
begin
  LimitChars := Limit * 3;  // 75% of the token limit, at roughly 4 characters per token
  // Our conversation is so long that there is no room for context.
  if Length(Conversation) > LimitChars then
  begin
    Conversation := Copy(Conversation, 1, LimitChars);
    Context := '';
    Result := 'Conversation Clipped';
  end
  else if (Length(Conversation) + Length(Context)) > LimitChars then
  begin
    ContextJSON := nil;
    try
      ContextJSON := TJSONObject.ParseJSONValue('['+Context+']') as TJSONArray;
    except
      on E: Exception do
      begin
        ContextJSON := nil;
        Context := '';
      end;
    end;
    Started := 0;
    Clipped := 0;
    if (ContextJSON <> nil) and (ContextJSON.Count > 0) then
    begin
      Started := ContextJSON.Count;
      // Remove entries from the beginning of the array until we fit within the limit
      while (ContextJSON.Count > 0) and ((Length(Conversation) + Length(Context)) > LimitChars) do
      begin
        ContextJSON.Remove(0).Free;  // Remove detaches the value, so free it to avoid a leak
        Clipped := Clipped + 1;
        // Drop the surrounding [ ] so Context remains a comma-delimited list of objects
        Context := Copy(ContextJSON.ToString, 2, Length(ContextJSON.ToString) - 2);
      end;
    end;
    Result := 'Context Clipped '+IntToStr(Clipped)+' of '+IntToStr(Started);
    ContextJSON.Free;
  end
  else
  begin
    Result := 'Normal';
  end;
end;

Now, when we are preparing to submit a request, we'll include the context, if available, as extra elements in the "messages" array. If no context is available (or it has been removed due to length), then we just pass the conversation. It takes a bit of playing around to generate the JSON as a string with the appropriate quotations and so on, but as all the components have already been processed, this works pretty well.


  // Prepare Conversation - ensure it is passed along as a quoted JSON string
  // (embedded quotes are assumed to have already been escaped by the client)
  if Copy(Trim(Conversation),1,1) <> '"'
  then Conversation := '"'+Trim(Conversation)+'"';

  // Prepare Context
  TrimMessage := TrimConversation(Conversation, Context, ((ModelJSON.GetValue('Token Limit') as TJSONNumber).AsInt));

  // Without Context (initial conversation, or excessive context)
  if ((Context = '') or (Context = 'None')) then
  begin
    Request := TStringStream.Create('{'+
      '"model":"'+((ModelJSON.GetValue('Model') as TJSONString).Value)+'",'+
      '"messages":['+'{"role":"user","content":'+Conversation+'}]'+
      '}');
  end

  // With Context
  else
  begin
    Request := TStringStream.Create('{'+
      '"model":"'+((ModelJSON.GetValue('Model') as TJSONString).Value)+'",'+
      '"messages":['+Context+',{"role":"user","content":'+Conversation+'}]'+
      '}');
  end;
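For completeness, the actual call to OpenAI might look something like the following sketch. The endpoint and headers follow the OpenAI chat completions API, but the variable holding the API key is just an assumption here - use whatever mechanism your XData server has for storing secrets.

  // A minimal sketch of the call itself. App_OpenAIAPIKey is a hypothetical
  // variable - substitute however your server stores the OpenAI API key.
  // TNetHeader comes from System.Net.URLClient.
  Client := TNetHTTPClient.Create(nil);
  try
    Response := Client.Post('https://api.openai.com/v1/chat/completions', Request, nil,
      [TNetHeader.Create('Authorization', 'Bearer '+MainForm.App_OpenAIAPIKey),
       TNetHeader.Create('Content-Type', 'application/json')]).ContentAsString;
  finally
    Client.Free;
    Request.Free;
  end;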

When we get the response back, we then tack on this trimming information to the data being sent back to the client.


  // Prepare Response
  ResultJSON := TJSONObject.ParseJSONValue(Response) as TJSONObject;

  // Add details about whether anything was altered
  ResultJSON.AddPair('Trim Message',TrimMessage);
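For illustration, the augmented JSON heading back to the client might then look something like this (standard OpenAI chat completion fields, abbreviated, with made-up values):

  {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "model": "gpt-3.5-turbo",
    "choices": [
      {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 512, "completion_tokens": 128, "total_tokens": 640},
    "Trim Message": "Context Clipped 2 of 9"
  }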

Back in the client, we can pick up this Trim Message along with the rest of the response and add this extra bit of information to the display to let us know whether any trimming has occurred. Trimming will naturally happen as the conversation goes on and, so long as the conversation doesn't reach too far back, the context does what it is intended to do - letting us carry on without having to repeat everything all of the time.
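Here's a minimal sketch of what that might look like on the client, assuming the raw JSON from the Chat endpoint has already arrived in a ResponseString variable, and reusing the AddStaticChatResponse helper from earlier:

  var
    ResponseJSON: TJSONObject;
    TrimMessage: String;
  begin
    // Parse the XData response (WEBLib.JSON mirrors the System.JSON classes)
    ResponseJSON := TJSONObject.ParseJSONValue(ResponseString) as TJSONObject;
    try
      if ResponseJSON.GetValue('Trim Message') <> nil then
      begin
        TrimMessage := (ResponseJSON.GetValue('Trim Message') as TJSONString).Value;
        // Only surface it when something was actually clipped
        if TrimMessage <> 'Normal' then
          AddStaticChatResponse('[ '+TrimMessage+' ]');
      end;
    finally
      ResponseJSON.Free;
    end;
  end;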

Conversation with Context.

So while this is a bit of a behind-the-scenes trick, it helps facilitate conversations that are considerably more natural. The problem that arises, though, is that chat models only accept a certain number of tokens, so at some point the prior parts of the conversation are forgotten. 

Newer models, like ChatGPT 4, have options for processing a larger number of tokens (8k and 32k). Note that there can be instances where even just the response can exceed 4k tokens, so this is a welcome development. But one can imagine that the cost to run these larger models is significantly higher as well, which is likely the main factor contributing to the roughly 10x token cost of ChatGPT 4 versus earlier versions. To help a little, ChatGPT 4 also currently has different prices for the context (cheaper) vs. the completion (response) tokens. One can imagine many benefits that come from pre-loading developer-generated context, to help ensure the chat doesn't stray too far from reality. A rough sketch of the idea is shown below.
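As a teaser, pre-loading could be as simple as a "system" message prepended to the stored context before the request goes out. The SystemContext variable and the message wording here are entirely hypothetical:

  // Hypothetical developer-supplied grounding, injected as a "system" message
  SystemContext := '{"role":"system","content":"You are a support assistant '+
    'for our products. If you are not sure of an answer, say so."}';
  Request := TStringStream.Create('{'+
    '"model":"'+((ModelJSON.GetValue('Model') as TJSONString).Value)+'",'+
    '"messages":['+SystemContext+','+Context+',{"role":"user","content":'+Conversation+'}]'+
    '}');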

Next Time: Images.

The code for this project has been integrated into the latest versions of the TMS WEB Core Template Demo and the TMS XData Template Demo Data projects on GitHub. So please check there for the latest updates and any code changes that will very likely have been made since this post was first created. 

Next time, we'll have a look at incorporating images into our chat system, as well as how to deal with database logging and displaying prior chat content. As always, if you have any feedback, please post any comments or questions below.


Follow Andrew on 𝕏 at @WebCoreAndMore or join our Web Core and More Community.



Andrew Simard




