Friday, September 30, 2016

Earlier this week we released a major update of the TMS FMX Cloud Pack. This new version adds many new components offering seamless access to all kinds of interesting cloud services. Among the newly covered services, two from Microsoft stand out and open up new ways to enrich our Delphi applications with cool features. In this blog post, I want to present the Microsoft Computer Vision and Microsoft Bing Speech services. Our new components TTMSFMXCloudMSComputerVision and TTMSFMXCloudMSBingSpeech offer instant, dead-easy access to these services. Powered by these components, the idea came up to create a small iPhone app that lets visually impaired people take a picture of their surroundings or of a document, have the Microsoft services analyze the picture, and have Microsoft Bing Speech read the result aloud.
So, roll up your sleeves: in 15 minutes you can assemble this cool iPhone app with Delphi 10.1 Berlin and the TMS FMX Cloud Pack!
To get started, code is added to allow taking pictures with the iPhone camera. This snippet comes straight from the Delphi documentation; from a button's OnClick event, the camera is started:
```pascal
if TPlatformServices.Current.SupportsPlatformService(IFMXCameraService, Service) then
begin
  Params.Editable := True;
  // Specifies whether to save the picture to the device Photo Library
  Params.NeedSaveToAlbum := False;
  Params.RequiredResolution := TSize.Create(640, 640);
  Params.OnDidFinishTaking := DoDidFinish;
  Service.TakePhoto(Button1, Params);
end;
```
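For context, the snippet above assumes a `Service` and a `Params` variable declared in the button's OnClick handler. This is a sketch based on the standard FireMonkey media-library types; these declarations are not shown in the original post:

```pascal
uses
  System.Types, FMX.Platform, FMX.MediaLibrary;

procedure TForm1.Button1Click(Sender: TObject);
var
  Service: IFMXCameraService;  // platform camera service interface
  Params: TParamsPhotoQuery;   // record holding the photo-taking options
begin
  if TPlatformServices.Current.SupportsPlatformService(IFMXCameraService, Service) then
  begin
    Params.Editable := True;
    Params.NeedSaveToAlbum := False;
    Params.RequiredResolution := TSize.Create(640, 640);
    Params.OnDidFinishTaking := DoDidFinish;
    Service.TakePhoto(Button1, Params);
  end;
end;
```

On platforms without a camera service, `SupportsPlatformService` simply returns False and nothing happens, which is why the call is guarded this way.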
Both cloud services require an application key, which is assigned in the form's OnShow event handler:

```pascal
procedure TForm1.FormShow(Sender: TObject);
begin
  TMSFMXCloudMSBingSpeech1.App.Key := MSBingSpeechAppKey;
  TMSFMXCloudMSComputerVision1.App.Key := MSComputerVisionAppKey;
end;
```
When the picture is taken, the DoDidFinish event handler is invoked. There, a TTask starts the analysis with the call TMSFMXCloudMSComputerVision1.ProcessFile(s, cv). The TTask avoids locking the UI during the analysis: after all, the image must be submitted to Microsoft, processed, and the result returned and parsed, which can take a second or two. Depending on the analysis type, the result is captured as text in a memo control. After this, we connect to the Bing speech service:
```pascal
procedure TForm1.DoDidFinish(Image: TBitmap);
var
  aTask: ITask;
  s: string;
  cv: TMSComputerVisionType;
begin
  CaptureImage.Bitmap.Assign(Image);
  // take a local copy of the file for processing
  s := TPath.GetDocumentsPath + PathDelim + 'photo.jpg';
  Image.SaveToFile(s);
  // read the selected analysis type on the main thread before starting the task
  if btnAn0.IsChecked then
    cv := ctAnalysis;
  if btnAn1.IsChecked then
    cv := ctOCR;
  // asynchronously start the image analysis
  aTask := TTask.Create(
    procedure
    var
      i: integer;
    begin
      if TMSFMXCloudMSComputerVision1.ProcessFile(s, cv) then
      begin
        Description := '';
        if cv = ctAnalysis then
        begin
          // concatenate the image descriptions returned by the Microsoft Computer Vision API
          for i := 0 to TMSFMXCloudMSComputerVision1.Analysis.Descriptions.Count - 1 do
            Description := Description +
              TMSFMXCloudMSComputerVision1.Analysis.Descriptions[i] + #13#10;
        end
        else
          Description := TMSFMXCloudMSComputerVision1.OCR.Text.Text;
        // update the UI from the main UI thread
        TThread.Queue(TThread.CurrentThread,
          procedure
          begin
            if Assigned(AnalysisResult) then
              AnalysisResult.Lines.Text := Description;
          end);
        TMSFMXCloudMSBingSpeech1.Connect;
      end
      else
      begin
        // update the UI from the main UI thread
        TThread.Queue(TThread.CurrentThread,
          procedure
          begin
            AnalysisResult.Lines.Add('Sorry, could not process image.');
          end);
      end;
    end);
  aTask.Start;
end;
```
Once connected, the component's OnConnected event handler sends the recognized text to the Bing speech service for synthesis and plays back the resulting audio:

```pascal
procedure TForm1.TMSFMXCloudMSBingSpeech1Connected(Sender: TObject);
var
  st: TMemoryStream;
  s: string;
begin
  st := TMemoryStream.Create;
  s := AnalysisResult.Lines.Text;
  try
    // synthesize the text to an audio stream and play it
    TMSFMXCloudMSBingSpeech1.Synthesize(s, st);
    TMSFMXCloudMSBingSpeech1.PlaySound(st);
  finally
    st.Free;
  end;
end;
```
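As a small defensive refinement (a sketch of my own, not from the original post, using the same component and method names as above), one might skip synthesis entirely when the analysis produced no text, so the app stays silent instead of synthesizing an empty string:

```pascal
procedure TForm1.TMSFMXCloudMSBingSpeech1Connected(Sender: TObject);
var
  st: TMemoryStream;
  s: string;
begin
  s := Trim(AnalysisResult.Lines.Text);
  if s = '' then
    Exit; // nothing to read out loud
  st := TMemoryStream.Create;
  try
    TMSFMXCloudMSBingSpeech1.Synthesize(s, st);
    TMSFMXCloudMSBingSpeech1.PlaySound(st);
  finally
    st.Free;
  end;
end;
```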
Now, let's try out the app in the real world. Here are a few examples we tested.
Using the app on the road to read road signs and capture car license plates
Trying to figure out what we see in a showroom
First, the app correctly analyzed that this is a bottle of wine in the cellar, and it is then pretty good at reading the wine bottle label.
You can download the full source code of the app here and have fun discovering these new capabilities.