Grammalecte  Check-in [f1c63ee223]

Overview
Comment:[graphspell][py] count words occurrence: count elided words
SHA3-256: f1c63ee223c5255895c238d1a91d5510756d11b7201728e0ff8c1e69fb8d4be3
User & Date: olr on 2020-03-02 12:59:38
Context
2020-03-02
15:15 [lo][fr] Word counter: double count for compound words that are interrogative verb forms check-in: f02cd26db1 user: olr tags: lo, trunk
12:59 [graphspell][py] count words occurrence: count elided words check-in: f1c63ee223 user: olr tags: graphspell, trunk
12:07 [fr] adjustments check-in: e72a9109a9 user: olr tags: fr, trunk
Changes
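
The diff in this check-in replaces an exact comparison of the token type with a prefix test, so that every token whose type begins with "WORD" is counted. The sketch below illustrates the effect on a hand-written token list. It is not the Grammalecte code: the function `count_words`, its parameters, and the type name "WORD_ELIDED" are assumptions made for illustration; only the keys `sType` and `sValue` and the two conditions come from the diff.

```python
# Illustrative sketch only. count_words and the "WORD_ELIDED" type name are
# assumptions; the two conditions mirror the old and new lines of the diff.

def count_words(tokens, prefix_match=True, counts=None):
    """Count token values whose type is a word type.

    prefix_match=False mimics the old test (sType == "WORD").
    prefix_match=True mimics the new test (sType.startswith("WORD")).
    counts can be passed in to accumulate over several token lists.
    """
    # A fresh dict per call unless the caller supplies one (a choice of this
    # sketch, not taken from the source).
    counts = {} if counts is None else counts
    for tok in tokens:
        if prefix_match:
            is_word = tok["sType"].startswith("WORD")
        else:
            is_word = tok["sType"] == "WORD"
        if is_word:
            counts[tok["sValue"]] = counts.get(tok["sValue"], 0) + 1
    return counts


# Hand-written tokens standing in for tokenizer output.
tokens = [
    {"sType": "WORD_ELIDED", "sValue": "l'"},
    {"sType": "WORD", "sValue": "arbre"},
    {"sType": "PUNC", "sValue": ","},
    {"sType": "WORD_ELIDED", "sValue": "l'"},
    {"sType": "WORD", "sValue": "eau"},
]

exact = count_words(tokens, prefix_match=False)   # elided words are skipped
prefix = count_words(tokens, prefix_match=True)   # elided words are counted
```

With the exact test only "arbre" and "eau" are counted; with the prefix test the elided "l'" is counted as well, which matches the intent stated in the check-in comment.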

Modified graphspell/spellchecker.py from [9c6b027a4b] to [114a0237c3].

     def countWordsOccurrences (self, sText, bByLemma=False, bOnlyUnknownWords=False, dWord={}):
         """count word occurrences.
            <dWord> can be used to cumulate count from several texts."""
         if not self.oTokenizer:
             self._loadTokenizer()
         for dToken in self.oTokenizer.genTokens(sText):
-            if dToken['sType'] == "WORD":
+            if dToken['sType'].startswith("WORD"):
                 if bOnlyUnknownWords:
                     if not self.isValidToken(dToken['sValue']):
                         dWord[dToken['sValue']] = dWord.get(dToken['sValue'], 0) + 1
                 else:
                     if not bByLemma:
                         dWord[dToken['sValue']] = dWord.get(dToken['sValue'], 0) + 1
                     else: