Grammalecte  Check-in [0283fcb23c]

Overview
Comment:syntax documentation update
SHA3-256: 0283fcb23c3b3f9de5f7bcb36ec3b5acb1c0b66dbb89c6d66c05bff874c02ed2
User & Date: olr on 2017-06-04 13:10:59
Context
2017-06-05
11:13
[fr] new rule: et/est confusion check-in: 4f461da67e user: olr tags: fr, trunk
2017-06-04
13:10
syntax documentation update check-in: 0283fcb23c user: olr tags: doc, trunk
10:13
[core] getReadableError changed check-in: 8fefa29d19 user: olr tags: core, trunk
Changes

Modified doc/syntax.txt from [ee9590ca86] to [2900658ffe].

     1      1   
     2      2   WRITING RULES FOR GRAMMALECTE
     3      3   
            4  +Note: This documentation is obsolete right now.
     4      5   
     5         -= Principles =
            6  +# Principles #
     6      7   
     7         -Grammalecte is a multi-passes grammar checker engine. On the first pass, the
     8         -engine checks the text paragraph by paragraph. On the next passes, the engine
            8  +Grammalecte is a two-pass grammar checker engine. On the first pass, the
            9  +engine checks the text paragraph by paragraph. On the second pass, the engine
     9     10   checks the text sentence by sentence.
    10     11   
    11         -The command to add a new pass is:
           12  +The command to switch to the second pass is:
    12     13   [++]
    13     14   
    14         -You shoudn’t need more than two passes, but you can create as many passes as
    15         -you wish.
    16         -
    17     15   In each pass, you can write as many rules as you need.
    18     16   
    19     17   A rule is defined by:
    20         -- a regex pattern trigger
    21         -- a list of actions (can’t be empty)
    22         -- [optional] flags “LCR” for the regex word boundaries and case sensitiveness
    23         -- [optional] user option name for activating/disactivating the rule
           18  +* [optional] flags “LCR” for the regex word boundaries and case sensitiveness
           19  +* a regex pattern trigger
           20  +* a list of actions (can’t be empty)
           21  +* [optional] user option name for activating/deactivating the rule
           22  +* [optional] rule name
    24     23   
    25     24   There is no limit to the number of actions and the type of actions a rule can
    26     25   launch. Each action has its own condition to be triggered.
    27     26   
    28     27   There are three kinds of actions:
    29         -- Error warning, with a message and optionaly suggestions and optionally an URL
           28  +- Error warning, with a message and, optionally, suggestions and a URL
    30     29   - Text transformation, modifying internally the checked text
    31     30   - Disambiguation action, setting tags on a position
    32     31   
    33     32   
    34     33   The rules file for your language must be named “rules.grx”.
    35         -The options file must be named “option.txt”.
    36     34   The settings file must be named “config.ini”.
    37     35   
    38     36   All these files are plain UTF-8 text files.
    39     37   UTF-8 is mandatory.
    40     38   
    41     39   
    42     40   
    43         -= Rule syntax =
           41  +# Rule syntax #
    44     42   
    45         -__LCR__  pattern
           43  +__LCR/option(rulename)__  pattern
    46     44       <<- condition ->> error_suggestions  # message_error|http://awebsite.net...
    47     45       <<- condition ~>> text_rewriting
    48     46       <<- condition =>> commands_for_disambigation
    49     47       ...
    50     48   
    51     49   Patterns are written with the Python syntax for regular expressions:
    52     50   http://docs.python.org/library/re.html
................................................................................
    55     53   written.
    56     54   
    57     55   Conditions are optional, i.e.:
    58     56       <<- ~>> replacement
    59     57   
    60     58   
    61     59   LCR flags mean:
    62         -- Left boundary for the regex
    63         -- Case sensitiveness
    64         -- Right boundary for the regex
           60  +* L: Left boundary for the regex
           61  +* C: Case sensitiveness
           62  +* R: Right boundary for the regex
    65     63   
    66         -Left boundary:  [  word boundary  or  <  no word boundary
    67         -right boundary:  ]  word boundary  or  >  no word boundary
    68         -Case sensitiveness:
    69         -    i: case insensitive
    70         -    s: case sensitive
    71         -    u: uppercase allowed for lowercased characters
    72         -        i.e.:  "Word"  becomes  "W[oO][rR][dD]"
           64  +Left boundary (L):
           65  +    `[`   word boundary
           66  +    `<`   no word boundary
           67  +
           68  +right boundary (R):
           69  +    `]`   word boundary
           70  +    `>`   no word boundary
           71  +
           72  +Case sensitiveness (C):
           73  +    `i`     case insensitive
           74  +    `s`     case sensitive
           75  +    `u`     uppercase allowed for lowercased characters
           76  +            i.e.:  "Word"  becomes  "W[oO][rR][dD]"
    73     77   
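As an illustration of the `u` flag described above, the following Python sketch expands a literal word so that each lowercase letter also accepts its uppercase form. The function name `expand_u_flag` is hypothetical and this is not Grammalecte's compiler code; it assumes the input is plain literal text, not a pattern containing regex escapes such as `\w`.

```python
import re

def expand_u_flag(literal: str) -> str:
    """Return a regex where each lowercase letter also matches its uppercase form."""
    parts = []
    for ch in literal:
        up = ch.upper()
        if ch.islower() and len(up) == 1:
            # Lowercase letter: accept both cases, e.g. "o" -> "[oO]".
            parts.append(f"[{ch}{up}]")
        else:
            # Uppercase letters, digits, punctuation: keep as a literal.
            parts.append(re.escape(ch))
    return "".join(parts)

print(expand_u_flag("Word"))  # W[oO][rR][dD]
```

With this expansion, "Word" and "WORD" would both match, while "word" would not, because the initial uppercase letter is kept as is.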
    74     78   Examples:
    75         -__[i]__  pattern
    76         -__<s]__  pattern
    77         -__[u>__  pattern
    78         -__<s>__  pattern
    79         -...
           79  +    __[i]__  pattern
           80  +    __<s]__  pattern
           81  +    __[u>__  pattern
           82  +    __<s>__  pattern
           83  +    ...
           84  +
    80     85   
    81     86   A rule can be activated or deactivated by a user option: place the option
    82     87   name just after the LCR flags, e.g.:
    83         -__[i]/useroption1__  pattern
    84         -__[u]/useroption2__  pattern
    85         -__[s>/useroption1__  pattern
    86         -__<u>/useroption3__  pattern
    87         -__<i>/useroption3__  pattern
    88         -...
           88  +    __[i]/option1__  pattern
           89  +    __[u]/option2__  pattern
           90  +    __[s>/option1__  pattern
           91  +    __<u>/option3__  pattern
           92  +    __<i>/option3__  pattern
           93  +    ...
           94  +
           95  +Rules can be named:
           96  +    __[i]/option1(name1)__  pattern
           97  +    __[u]/option2(name2)__  pattern
           98  +    __[s>/option1(name3)__  pattern
           99  +    __<u>/option3(name4)__  pattern
          100  +    __<i>/option3(name5)__  pattern
          101  +    ...
          102  +
          103  +Each rule name must be unique.
          104  +
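To make the header layout `__LCR/option(rulename)__` concrete, here is a minimal Python sketch that splits a rule's first line into its parts and checks that rule names are unique. The regular expression and the names `HEADER`, `parse_header` and `check_unique_names` are assumptions made for illustration; the real Grammalecte compiler may parse headers differently.

```python
import re

# Hypothetical header grammar: __LCR/option(rulename)__ then the rest of the rule.
HEADER = re.compile(
    r"^__(?P<left>[\[<])(?P<case>[isu])(?P<right>[\]>])"
    r"(?:/(?P<option>\w+))?"      # optional user option
    r"(?:\((?P<name>\w+)\))?"     # optional rule name
    r"__\s+(?P<rest>.*)$"
)

def parse_header(line: str) -> dict:
    """Split a rule's first line into flags, option, name and remaining text."""
    stripped = line.strip()
    m = HEADER.match(stripped)
    if m is None:
        # No header: fall back to the documented default flags __[i]__.
        return {"left": "[", "case": "i", "right": "]",
                "option": None, "name": None, "rest": stripped}
    return m.groupdict()

def check_unique_names(headers: list) -> None:
    """Raise ValueError if two parsed rules share the same name."""
    seen = set()
    for header in headers:
        name = header["name"]
        if name is None:
            continue
        if name in seen:
            raise ValueError(f"duplicate rule name: {name}")
        seen.add(name)

rule = parse_header("__[s>/option1(name3)__  pattern")
print(rule["left"], rule["case"], rule["right"], rule["option"], rule["name"])
```

A line without a header, such as `foo <<- ->> bar # Use bar instead of foo.`, falls back to the default `__[i]__` flags in this sketch.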
    89    105   
    90    106   The LCR flags are also optional. If you don’t set these flags, the default LCR
    91    107   flags will be:
    92         -__[i]__
          108  +    __[i]__
    93    109   
    94    110   Example. Report “foo” in the text and suggest "bar":
    95    111   
    96         -foo <<- ->> bar # Use bar instead of foo.
          112  +    foo <<- ->> bar # Use bar instead of foo.
    97    113   
    98    114   Example. Recognize a missing hyphen, suggest it, and internally rewrite the text
    99    115   with the hyphen:
   100    116   
   101         -__[s]__ foo bar
   102         -    <<- ->> foo-bar # Missing hyphen.
   103         -    <<- ~>> foo-bar
          117  +    __[s]__ foo bar
          118  +        <<- ->> foo-bar # Missing hyphen.
          119  +        <<- ~>> foo-bar
   104    120   
   105    121   
   106    122   == Single-line or multi-line rules ==
   107    123   
   108    124   Rules can be broken across multiple lines by indenting the continuation lines
   109    125   with tabs or spaces. You should use 4 spaces.
   110    126   
   111    127   Examples:
   112    128   
   113         -__<s>__ pattern <<- condition
   114         -    ->> replacement
   115         -    # message
   116         -    <<- condition ->> suggestion # message
   117         -    <<- condition
   118         -    ~>> text_rewriting
   119         -    <<- =>> disambiguation
          129  +    __<s>__ pattern
          130  +        <<- condition ->> replacement
          131  +        # message
          132  +        <<- condition ->> suggestion # message
          133  +        <<- condition
          134  +        ~>> text_rewriting
          135  +        <<- =>> disambiguation
   120    136   
   121         -__<s>__ pattern <<- condition ->> replacement # message
          137  +    __<s>__ pattern <<- condition ->> replacement # message
   122    138   
   123    139   
   124         -== Comments ==
          140  +## Comments ##
   125    141   
   126    142   Lines beginning with # are comments.
   127    143   
   128         -Example. No action done.
   129    144   
   130         -# pattern <<- ->> foo bar # message
   131         -
   132         -
   133         -== End of file ==
          145  +## End of file ##
   134    146   
   135    147   With the command:
   136    148   
   137         -#END
          149  +`#END`
   138    150   
   139         -the compiler won’t go further. Whatever is written after will be considered
   140         -as comments.
          151  +at the beginning of a line, the compiler won’t go further.
          152  +Whatever is written after will be considered as comments.
   141    153   
   142    154   
   143         -== Whitespaces at the border of patterns or suggestions ==
          155  +## Whitespaces at the border of patterns or suggestions ##
   144    156   
   145    157   Example. Recognize two or more spaces and suggest a single space:
   146    158   
   147         -__<s>__  "  +" <<- ->> " " # Extra space(s).
          159  +    __<s>__  "  +" <<- ->> " " # Extra space(s).
   148    160   
   149    161   ASCII " characters protect spaces in the pattern and in the replacement text.
   150    162   
   151    163   
   152         -== Pattern groups and back references ==
          164  +## Pattern groups and back references ##
   153    165   
   154    166   It is often useful to retrieve parts of the matched pattern. Use parentheses
   155    167   in the pattern to capture groups, then refer to them with back references.
   156    168   
   157    169   Example. Suggest a word with correct quotation marks:
   158    170   
   159    171   \"(\w+)\" <<- ->> “\1” # Correct quotation marks.
................................................................................
   163    175   __<i]__ \b([?!.])([A-Z]+) <<- ->> \1 \2 # Missing space?
   164    176   
   165    177   Example. Back reference in messages.
   166    178   
   167    179   (fooo) bar <<- ->> foo bar # “\1” should be:
   168    180   
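Since patterns use Python's `re` syntax, the behaviour of back references in suggestions and messages can be approximated with the standard `Match.expand` method. This is only a sketch under that assumption: `expand_rule` is a hypothetical helper, and Grammalecte's own substitution code may differ.

```python
import re

def expand_rule(pattern: str, suggestion: str, message: str, text: str):
    """Return (suggestion, message) with back references filled in, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # Match.expand replaces \1, \2, ... with the text of the captured groups.
    return m.expand(suggestion), m.expand(message)

# Back reference in the suggestion.
print(expand_rule(r'"(\w+)"', r"“\1”", "Correct quotation marks.",
                  'He said "hello" twice.'))

# Back reference in the message.
print(expand_rule(r"(fooo) bar", "foo bar", r"“\1” should be:", "a fooo bar"))
```

The first call yields the suggestion `“hello”`; the second keeps the suggestion `foo bar` and produces the message `“fooo” should be:`.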
   169    181   
   170         -== Name definitions ==
          182  +## Name definitions ##
   171    183   
   172    184   Grammalecte supports name definitions to simplify the description of the
   173    185   complex rules.
   174    186   
   175    187   Example.
   176    188   
   177    189   DEF: name pattern
   178    190   
   179    191   Usage in the rules:
   180    192   
   181    193   ({name}) (\w+) ->> "\1-\2" # Missing hyphen?
   182    194   
   183    195   
   184         -== Multiple suggestions ==
          196  +## Multiple suggestions ##
   185    197   
   186    198   Use | in the replacement text to add multiple suggestions:
   187    199   
   188    200   Example. Foo, FOO, Bar and BAR suggestions for the input word "foo".
   189    201   
   190    202   foo <<- ->> Foo|FOO|Bar|BAR # Did you mean:
   191    203   
   192    204   
   193         -== No suggestion ==
          205  +## No suggestion ##
   194    206   
   195    207   You can display a message without making suggestions. For this purpose,
   196    208   use a single _ character in the suggestion field.
   197    209   
   198    210   Example. No suggestion.
   199    211   
   200    212   foobar <<- ->> _ # Message
   201    213   
   202    214   
   203         -== Positioning ==
          215  +## Positioning ##
   204    216   
   205    217   Positioning is valid only for error creation and text rewriting.
   206    218   
   207    219   By default, the full pattern will be underlined with blue. You can shorten the
   208    220   underlined text area by specifying a back reference group of the pattern.
   209    221   Instead of writing ->>, write -n>>  n being the number of a back reference
   210    222   group. Actually,  ->>  is similar to  -0>>
   211    223   
   212    224   Example.
   213    225   
   214    226   (ying) and yang <<- -1>> yin # Did you mean:
   215    227   __[s]__ (Mr.) [A-Z]\w+ <<- ~1>> Mr
   216    228   
   217         -=== Comparison ===
          229  +### Comparison ###
   218    230   
   219    231   Rule A:
   220    232   ying and yang       <<- ->>     yin and yang        # Did you mean:
   221    233   
   222    234   Rule B:
   223    235   (ying) and yang     <<- -1>>    yin                 # Did you mean:
   224    236   
................................................................................
   227    239       ^^^^^^^^^^^^^
   228    240   
   229    241   With the rule B, only the first group is underlined:
   230    242       ying and yang
   231    243       ^^^^
   232    244   
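The difference between `->>` (group 0) and `-1>>` (group 1) can be reproduced with the span of a Python match group. The helper `underline` below is hypothetical and only mimics the display shown in the comparison; it is not how Grammalecte reports error positions.

```python
import re

def underline(pattern: str, text: str, group: int = 0) -> str:
    """Return the text followed by a line of ^ marks under the chosen group."""
    m = re.search(pattern, text)
    if m is None:
        return text
    start, end = m.span(group)
    return text + "\n" + " " * start + "^" * (end - start)

# Rule A style: the whole match (group 0) is underlined.
print(underline(r"(ying) and yang", "ying and yang", 0))

# Rule B style: only the first group is underlined.
print(underline(r"(ying) and yang", "ying and yang", 1))
```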
   233    245   
   234         -== Longer explanations with URLs ==
          246  +## Longer explanations with URLs ##
   235    247   
   236    248   Warning messages can contain an optional URL for longer explanations, separated from the message by "|":
   237    249   
   238    250   (your|her|our|their)['’]s
   239    251       <<- ->> \1s
   240    252       # Possessive pronoun:|http://en.wikipedia.org/wiki/Possessive_pronoun
   241    253   
   242    254   
   243    255   
   244         -= Text rewriting =
          256  +# Text rewriting #
   245    257   
   246    258   Example. Replacing a string by another
   247    259   
   248    260   Mr. [A-Z]\w+ <<- ~>> Mister
   249    261   
   250    262   WARNING: The replacing text must be shorter than the replaced text or have the
   251    263   same length. Breaking this rule will misplace following error reports. You
................................................................................
   268    280   
   269    281   You can also call Python expressions.
   270    282   
   271    283   __[s]__ Mr. ([a-z]\w+) <<- ~1>> =\1.upper()
   272    284   
   273    285   
   274    286   
   275         -= Disambiguation =
          287  +# Disambiguation #
   276    288   
   277    289   When Grammalecte analyses a word with morph or morphex, before requesting the
   278    290   POS tags from the dictionary, it checks if there is a stored marker for the
   279    291   position where the word is. If there is a marker, Grammalecte uses the stored
   280    292   data and does not query the dictionary.
   281    293   
   282    294   The disambiguation commands store POS tags at the position of a word.
................................................................................
   305    317   define(\1, "po:nom is:plur|po:adj is:sing|po:adv")
   306    318   
   307    319   This will store a list of tags at the position of the first group:
   308    320   ["po:nom is:plur", "po:adj is:sing", "po:adv"]
   309    321   
   310    322   
   311    323   
   312         -= Conditions =
          324  +# Conditions #
   313    325   
   314    326   Conditions are Python expressions. They must return a value, which will be
   315    327   evaluated as a boolean. You can use the usual Python syntax and libraries.
   316    328   
   317    329   You can call pattern subgroups via \0, \1, \2…
   318    330   
   319    331   Example:
................................................................................
   324    336   You can also apply functions to subgroups like:
   325    337       \1.startswith("a")
   326    338       \3.islower()
   327    339       re.match("pattern", \2)
   328    340   
   329    341   
   330    342   
   331         -== Standard functions ==
          343  +## Standard functions ##
   332    344   
   333    345   word(n)
   334    346       catches the nth word after the pattern (separated only by white spaces).
   335    347       returns None if no word is caught
   336    348   
   337    349   word(-n)
   338    350       catches the nth word before the pattern (separated only by white spaces).
................................................................................
   359    371   
   360    372   option(option_name)
   361    373       returns True if option_name is activated else False
   362    374   
   363    375   Note: the analysis is done on the preprocessed text.
   364    376   
   365    377   
   366         -== Default variables ==
          378  +## Default variables ##
   367    379   
   368    380   sCountry
   369    381   
   370    382   It contains the current country locale of the checked paragraph.
   371    383   
   372    384   colour <<- sCountry == "US" ->> color # Use American English spelling.
   373    385   
   374    386   
   375    387   
   376         -= Expressions in the suggestions =
          388  +# Expressions in the suggestions #
   377    389   
   378    390   Suggestions (and warning messages) starting with an equals sign are Python string expressions
   379    391   extended with possible back references and named definitions:
   380    392   
   381    393   Example:
   382    394   
   383    395   foo\w+ ->> = '"' + \0.upper() + '"' # With uppercase letters and quotation marks
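
One plausible way to evaluate such an expression, shown here purely as an assumption about the mechanism, is to rewrite each back reference `\n` into a call on the match object and then evaluate the result as Python. The helper `evaluate_suggestion` is hypothetical, it ignores named definitions, and `eval` should only ever be applied to trusted rule files.

```python
import re

def evaluate_suggestion(suggestion: str, m) -> str:
    """Return the final suggestion text for a regex match object m."""
    if not suggestion.startswith("="):
        # Plain suggestion: only substitute back references.
        return m.expand(suggestion)
    # Expression: turn \0, \1, ... into m.group(0), m.group(1), ...
    expr = re.sub(r"\\(\d+)", r"m.group(\1)", suggestion[1:].strip())
    # Assumption: rule files are trusted, so eval is acceptable in this sketch.
    return str(eval(expr, {"m": m}))

match = re.search(r"foo\w+", "some foobar here")
print(evaluate_suggestion(r"""= '"' + \0.upper() + '"'""", match))  # "FOOBAR"
```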