Issues with spell checking components

We found the following issues with version 1.0 of the TAdvSpellCheck and TAdvSpellCheckCorrectDialog components.

Though we were able to fix issues 2 through 7 ourselves, we would highly appreciate your help with issue 1:



1. Words with the German SZ character "?" are not validated and the SZ character is not correctly encoded in the suggestions list.

2. The affix file is not loaded if the current working directory of the executable is different from the executable path (see TSpellCheckLanguagePack.SetAffixFilename).

3. The TAdvSpellCheck component sets its internal change state to true when the index of a word in the ignore word list is retrieved, although this lookup doesn't change anything (see TCustomAdvSpellCheck.IndexInIgnoreList).

4. In TAdvSpellCheckCorrectDialog.HandleAdd a call to DoAddWord is missing.

5. If the Add button in the correct panel or dialog is pressed, the word currently being spell checked should be pre-filled as the default in the Add word dialog.

6. The TAdvSpellCheckCorrectPanel.DoAdd method is missing a DoNextError call at the end.

7. In a correct dialog, DoNextError is actually called twice on ignoring a word or ignoring all, so the next misspelled word is skipped.
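
For issue 2, one way to make the affix file lookup independent of the current working directory is to resolve relative file names against the folder of the executable. The following is only a minimal sketch, not the TMS implementation: ResolveAgainstExeDir is a hypothetical helper name, and how it would be called from SetAffixFilename is an assumption.

```pascal
program ResolveAffixPathDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils;

// Hypothetical helper (not part of the TMS sources): returns AFileName
// unchanged if it is already rooted, otherwise interprets it relative to
// the folder of the running executable instead of the working directory.
function ResolveAgainstExeDir(const AFileName: string): string;
begin
  if TPath.IsPathRooted(AFileName) then
    Result := AFileName
  else
    Result := TPath.Combine(ExtractFilePath(ParamStr(0)), AFileName);
end;

begin
  // The printed path does not depend on the current working directory.
  Writeln(ResolveAgainstExeDir('German.aff'));
end.
```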



We'll investigate these items.

It is unclear though what is meant by:


the German SZ character "?" 

What exact character is being referred to here?

The German SZ (s-zet or sharp s) character "ß" has ISO-8859-1 code 223 and HTML entity &szlig;. In my forum message it has been replaced by a question mark.

Thanks. We'll investigate this.

Ok. I found 2 bugs regarding the 'ß' issue:



1. function TTMSSpellinfo.Encode shouldn't replace alternate formats using the uppercase or lowercase function. It should just do

Result := ReplaceStr(Result, a.FirstString, a.SecondString);


since

a. section

altstringtype "dictionary"


in German.aff already defines separate rules for uppercase and lowercase umlaut characters.



b. Decode(Encode('ß')) would then return the expected result 'ß', not 'SS' (Decode doesn't apply uppercase or lowercase to the alternate formats).



2. Some of the entries in section

altstringtype "dictionary"


in German.aff

are not correctly parsed by TMSSpellInfos.TTMSAlternates.Setvalue. While ä, ö and ü are parsed correctly, Ä, Ö, Ü and ß are not.

I fixed that by letting TMSSpellCheckUtil.Unescape return an AnsiString(CP_UTF8).
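
To illustrate point 1, here is a small sketch of a case-sensitive Encode/Decode round trip. It is not the TMS code: the TAlternate record and the three-entry table are assumptions made for the example, with the dictionary forms a", A" and sS taken from the German.aff entries discussed in this thread.

```pascal
program AlternateRoundTripDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.StrUtils;

type
  // Hypothetical stand-in for one alternate-format rule.
  TAlternate = record
    FirstString: string;   // character as it appears in normal text
    SecondString: string;  // its dictionary form
  end;

const
  // Illustrative subset; characters are written as code points so the
  // encoding of this source file does not matter.
  Alternates: array[0..2] of TAlternate = (
    (FirstString: #$00E4; SecondString: 'a"'),   // lowercase a-umlaut
    (FirstString: #$00C4; SecondString: 'A"'),   // uppercase A-umlaut
    (FirstString: #$00DF; SecondString: 'sS'));  // sharp s

function Encode(const S: string): string;
var
  A: TAlternate;
begin
  Result := S;
  for A in Alternates do
    // Plain case-sensitive replacement, no UpperCase/LowerCase involved.
    Result := ReplaceStr(Result, A.FirstString, A.SecondString);
end;

function Decode(const S: string): string;
var
  A: TAlternate;
begin
  Result := S;
  for A in Alternates do
    Result := ReplaceStr(Result, A.SecondString, A.FirstString);
end;

var
  W: string;
begin
  W := 'Stra'#$00DF'e';
  Writeln(Encode(W) = 'StrasSe');   // expected to print TRUE
  Writeln(Decode(Encode(W)) = W);   // expected to print TRUE
end.
```

Because the table already carries separate uppercase and lowercase entries, the round trip works without any case conversion inside Encode.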

1)

I've seen and fixed the issue with the uppercase/lowercase. Next update will have this fix.

2)
I could not see an issue with the German.aff we include with the spell checker and the unescape function. What Delphi version do you use and do you use the German.aff file that we include in the distribution?

We use Delphi 10 Seattle and the German.aff file that comes with the TMS SpellCheck distribution.

I can't see an issue here.
Can you make the statement 

"some of the entries are not correctly parsed"
more specific? Which entries do you see a problem with, and how exactly do you see them not being parsed correctly?

Set a breakpoint in

TMSSpellInfos.TTMSAlternates.Setvalue

line

FValue := Value;

with condition

Pos('\303',value)>0

.

On the first break, add FirstString to your watch list. The values I reference in the following text can be found in German.aff, lines 294-300 and 306-312.

When the debugger stops with Value '\303\244 a"', FirstString is correctly set to 'ä'.

When the debugger stops with Value '\303\204 A"', at least my debugger shows two question marks '??' for FirstString, the first one rendered as a white question mark on a black diamond.

For value '\303\266 o"', FirstString is correctly set to 'ö'.

For value '\303\226 O"' FirstString is incorrectly set to the two question marks again.

For value '\303\274 u"', FirstString is correctly set to 'ü'.

For value '\303\234 U"' FirstString is incorrectly set to the two question marks again.

For value '\303\237 sS', FirstString is incorrectly set to the two question marks again.
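
For reference, the octal escapes above are the UTF-8 byte sequences of the characters, so decoding them as UTF-8 bytes gives the expected result for all seven entries. The sketch below shows that idea; UnescapeOctalUtf8 is a hypothetical helper, not TMSSpellCheckUtil.Unescape, and it assumes 1-based string indexing (desktop compilers) and ASCII-only text outside the escapes. It may be worth noting that the second bytes of the failing entries (octal 204, 226, 234, 237) all fall in the range $80..$9F, while those of the working entries (244, 266, 274) do not; whether that is the actual cause is an assumption.

```pascal
program UnescapeOctalDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: turns '\ooo' octal escapes into raw bytes and then
// decodes the whole byte sequence as UTF-8.
function UnescapeOctalUtf8(const S: string): string;
var
  Bytes: TBytes;
  I, N: Integer;
begin
  SetLength(Bytes, Length(S));
  N := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if (S[I] = '\') and (I + 3 <= Length(S)) and
       CharInSet(S[I + 1], ['0'..'3']) and
       CharInSet(S[I + 2], ['0'..'7']) and
       CharInSet(S[I + 3], ['0'..'7']) then
    begin
      // Three octal digits form one byte value (0..255).
      Bytes[N] := Byte((Ord(S[I + 1]) - Ord('0')) * 64 +
                       (Ord(S[I + 2]) - Ord('0')) * 8 +
                       (Ord(S[I + 3]) - Ord('0')));
      Inc(I, 4);
    end
    else
    begin
      // Assumption: everything outside an escape is plain ASCII.
      Bytes[N] := Byte(Ord(S[I]));
      Inc(I);
    end;
    Inc(N);
  end;
  SetLength(Bytes, N);
  Result := TEncoding.UTF8.GetString(Bytes);
end;

var
  S: string;
begin
  S := UnescapeOctalUtf8('\303\204');
  Writeln(Length(S), ' ', Ord(S[1]));  // expected: 1 196 (uppercase A-umlaut)
  S := UnescapeOctalUtf8('\303\237');
  Writeln(Length(S), ' ', Ord(S[1]));  // expected: 1 223 (sharp s)
end.
```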

Thanks for these extra details. This helped us to analyze this issue and apply a fix. The next update will have this fixed.

Here are some other issues:



8. A {$DESCRIPTION 'TMS SpellCheck'} is missing in the .dpk packages.

9. The .dfm file of a form with a spell checking component is regularly marked as changed by the IDE, since the Guid in the spell checking component's language items is updated every time the first recompile is done in the IDE. I think the Guid should stay fixed once it has been set.

10. It would be nice to turn off the progress dialog that is shown when the spell check database is updated. Or, if there would be a progress event, one could assign an event handler and use another mechanism like a task bar balloon hint to show progress information.

Thanks for your continued feedback. We'll investigate & improve these items.

Some valid German words like "eigenen", "weinten", "ernteten", "ersehnten", "gelegenen" or "verbliebenen" are flagged as invalid by the TMS Spell Checker, though they could be derived from words in the German.lat file and rules in the German.aff file.



We found the following issues:

10. The affix file German.aff has flag rules with the > operator spanning two lines, e.g. flag *O, lines 964/965 or 966/967. These lines are not correctly parsed by TMSSpellInfos.TTMSSpellinfo.LoadFromStream, which expects a flag rule with the > operator to span only one line, so the regular expression/condition of the > operator ends up empty. We added code that prepends the previous line if the > operator is the first character in the line currently processed.

11. In TMSSpellInfos.TTMSFlagData.CompareSuffix and TMSSpellInfos.TTMSFlagData.RemoveAffix the choices without leading '-' should be processed/replaced before the choices with leading '-', since this would be the inverse of the order in which rules in the affix file are applied to words in the .lat file. Therefore, we changed the for..to loops into for..downto loops for the time being.

12. TMSSpellInfos.TTMSFlagData.CompareAffix: Here we removed the two CompareSuffix and Exit calls, since the condition (regular expression) should always be checked to avoid invalid words like "ersehneen" or "erseheen" being suggested. Nevertheless, we are not sure whether this fix is correct in general.

13. TMSSpellInfos.TTMSFlagData.Suggestions: We replaced the call "RemoveAffix(Word)" by "Word" here, since RemoveAffix might have already been done by the caller. This fix has become necessary for us to correctly validate German words like "eigenen" which ends with two suffixes "EN". Removing the suffix "EN" twice would result in the word "eig" from which the word "eigenen" could never be derived using the affix rules. Nevertheless, we are not sure whether this fix is correct in general, since the caller might also not have called RemoveAffix.
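
The workaround described in item 10 (joining a line that starts with the > operator to the line before it) could look roughly like the sketch below. It is not the code we added to LoadFromStream; JoinContinuationLines is a hypothetical helper, and the sample lines, including the flag name *X, are made up for illustration.

```pascal
program JoinContinuationDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: merges every line whose first non-blank character
// is '>' into the line before it.
procedure JoinContinuationLines(Lines: TStrings);
var
  I: Integer;
begin
  // Walk backwards so that deleting a line does not shift the indices
  // of lines that still have to be visited.
  for I := Lines.Count - 1 downto 1 do
    if Copy(TrimLeft(Lines[I]), 1, 1) = '>' then
    begin
      Lines[I - 1] := TrimRight(Lines[I - 1]) + ' ' + TrimLeft(Lines[I]);
      Lines.Delete(I);
    end;
end;

var
  L: TStringList;
begin
  L := TStringList.Create;
  try
    L.Add('flag *X:');        // made-up flag header
    L.Add('    [^ELRsS]');    // condition on its own line
    L.Add('    > E');         // continuation starting with '>'
    JoinContinuationLines(L);
    Writeln(L.Count);         // expected: 2
    Writeln(L[1]);            // expected: '    [^ELRsS] > E'
  finally
    L.Free;
  end;
end.
```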



No issues, just code cleanup/optimization:

14. TMSSpellInfos.TTMSFlagData.Suggestions: we removed the superfluous CompareAffix calls, since CompareAffix has already been done by the callers.

15. TMSSpellInfos.TTMSFlagData.Suggestions: there is no need to add an empty value if flag type is prefix.



Furthermore:

16. The German.lat file could be further enhanced by adding the word

geweint/A

and replacing

aufgeregt/AC

by

aufgeregt/ACU

to successfully validate the German word "unaufgeregt".

Yet another issue:



17. Affix rules like the following one for flag *A in German.aff

[^ELRsS]          >     E

are not applied if the word ends with "s", like the word "dies/A" in German.lat. The reason is that the

sS

in the rule's left-hand side

[^ELRsS]

is actually handled as "s" and "S" and not as "sS". For now, we fixed TMSSpellInfos.TTMSFlagData.CompareAffix so that

sS

is replaced by the special character #2 in both the word to be matched and the left-hand side of the rule.
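
The idea behind this workaround can be sketched as follows. This is not the changed CompareAffix code; LastCharOutsideClass is a hypothetical helper, and it assumes that a condition like [^ELRsS] means "the last character of the word is none of the listed ones", compared case-insensitively.

```pascal
program SharpSClassDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.StrUtils;

const
  SharpSMarker = #2;  // placeholder standing for the two-character form 'sS'

// Hypothetical helper: True if the last character of AWord is NOT one of
// the characters listed in AClassChars, treating 'sS' as a single unit.
function LastCharOutsideClass(const AWord, AClassChars: string): Boolean;
var
  W, C: string;
begin
  // Replace 'sS' first, then upper-case, so a plain 's' is never
  // mistaken for half of the sharp-s sequence.
  W := UpperCase(ReplaceStr(AWord, 'sS', SharpSMarker));
  C := UpperCase(ReplaceStr(AClassChars, 'sS', SharpSMarker));
  Result := (W <> '') and (Pos(W[Length(W)], C) = 0);
end;

begin
  Writeln(LastCharOutsideClass('dies', 'ELRsS'));   // expected: TRUE
  Writeln(LastCharOutsideClass('Spiel', 'ELRsS'));  // expected: FALSE
  Writeln(LastCharOutsideClass('FusS', 'ELRsS'));   // expected: FALSE
end.
```

With the placeholder in place, "dies" is no longer excluded by the rule, while a word ending in the encoded sharp s still is.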

Thanks for the detailed reports. The responsible developer will go through this and we'll look to address these.