54 changes: 42 additions & 12 deletions README.md
@@ -11,36 +11,66 @@ Here is how it is typically run:

python comment_spell_check.py --exclude Ancillary $SIMPLEITK_SOURCE_DIR/Code

This command will recursively find all the '.h' files in a directory,
This command will recursively find all the \'.h\' files in a directory,
extract the C/C++ comments from the code, and run a spell checker on them.
The **'--exclude'** flag tells the script to ignore any file that has
'Ancillary' in its full path name. This flag will accept any
The **\'\-\-exclude\'** flag tells the script to ignore any file that has
\'Ancillary\' in its full path name. This flag will accept any
regular expression.
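
As a rough illustration of how a regular-expression exclude filter of this kind can behave, here is a minimal sketch. The helper name `is_excluded` and the sample paths are hypothetical, and the use of `re.search` (match anywhere in the full path) is an assumption; the script's own matching logic may differ.

```python
import re


def is_excluded(path, exclude_patterns):
    """Return True if any pattern matches somewhere in the full path.

    Hypothetical helper: assumes search semantics (match anywhere),
    which may not be exactly what the script does.
    """
    return any(re.search(pattern, path) for pattern in exclude_patterns)


# Made-up example paths, for illustration only.
paths = [
    "Code/Common/include/sitkImage.h",
    "Code/Ancillary/include/sitkHelper.h",
]

# Keep only the files whose path does not match an exclude pattern.
kept = [p for p in paths if not is_excluded(p, ["Ancillary"])]
print(kept)
```

Because the patterns are regular expressions, something like `r"_test\.h$"` would exclude only headers whose names end in `_test.h`.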

In addition to pyenchant's English dictionary, we use the words in
In addition to pyenchant\'s English dictionary, we use the words in
**additional_dictionary.txt**. These words are proper names and
technical terms harvested by hand from the SimpleITK and ITK code bases.

If a word is not found in the dictionaries, we try two additional checks.

1. If the word starts with some known prefix, the prefix is removed
...and the remaining word is checked against the dictionary. The prefixes
...used by default are **'sitk'**, **'itk'**, and **'vtk'**. Additional
...prefixes can be specified with the **'--prefix'** command line argument.
and the remaining word is checked against the dictionary. The prefixes
used by default are **\'sitk\'**, **\'itk\'**, and **\'vtk\'**. Additional
prefixes can be specified with the **\'\-\-prefix\'** command line argument.

2. We attempt to split the word by capitalization and check each
...sub-word against the dictionary. This method is an attempt to detect
...camel-case words such as 'GetArrayFromImage', which would get split into
...'Get', 'Array', 'From', and 'Image'. Camel-case words are very commonly
...used for code elements.
sub\-word against the dictionary. This method is an attempt to detect
camel-case words such as \'GetArrayFromImage\', which would get split into
\'Get\', \'Array\', \'From\', and \'Image\'. Camel-case words are very commonly
used for code elements.
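
The two fallback checks above can be sketched roughly as follows. This is not the script's actual code: the function names, the case-sensitive prefix match, the splitting regex, and the tiny stand-in dictionary are all assumptions made for illustration.

```python
import re

# Default prefixes named in the README.
DEFAULT_PREFIXES = ("sitk", "itk", "vtk")


def strip_prefix(word, prefixes=DEFAULT_PREFIXES):
    """Return the word without a known prefix, or None if no prefix matches."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) > len(prefix):
            return word[len(prefix):]
    return None


def split_camel_case(word):
    """Split on capitalization, e.g. 'GetArrayFromImage' into four sub-words."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+", word)


def passes_fallback_checks(word, dictionary, prefixes=DEFAULT_PREFIXES):
    """Apply the prefix check first, then the camel-case check."""
    remainder = strip_prefix(word, prefixes)
    if remainder is not None and remainder.lower() in dictionary:
        return True
    parts = split_camel_case(word)
    # Require more than one part so a plain unknown word is still reported.
    return len(parts) > 1 and all(p.lower() in dictionary for p in parts)


# Stand-in dictionary (hypothetical); the real tool uses a full word list.
dictionary = {"get", "array", "from", "image"}
print(split_camel_case("GetArrayFromImage"))
print(passes_fallback_checks("sitkImage", dictionary))
print(passes_fallback_checks("GetArrayFromImge", dictionary))
```

In this sketch a camel-case word passes only if every sub-word is known, so a single misspelled piece such as 'Imge' still causes the whole word to be flagged.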

The script can also process other file types. With the **'--suffix'**
The script can also process other file types. With the **\'\-\-suffix\'**
option, the following file types are available: Python (.py), C/C++
(.c/.cxx), CSharp (.cs), Text (.txt), reStructuredText (.rst), Markdown (.md),
Ruby (.ruby), R (.R), and Java (.java). Note that reStructuredText files are
treated as standard text. Consequently, all markup keywords that are not
actual words will need to be added to the additional/exception dictionary.

## Disabling Spell Checking

Spell checking can be disabled for sections of code by using special
comments. The following comments will disable spell checking until
the corresponding end comment is found.
```
// spell-check-disable

// This comment will not be spell checked.

// spell-check-enable
```

Note that for C-style, multi-line comments, the disable and enable
comments must be in separate comments. If the disable command
is found in a multi-line comment, spell checking will be
disabled for the entire multi-line comment.

```
/*
spell-check-disable
spell-check-enable
This comment will NOT be spell checked
*/
/* spell-check-enable */
/* This comment WILL be spell checked */
```
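
The behavior described above follows from checking each comment as a whole: the disable marker is tested first, so an enable marker inside the same comment is never reached. Here is a minimal, self-contained sketch of that toggle, working on plain strings rather than the script's comment objects. The function name `comments_to_check` is hypothetical.

```python
def comments_to_check(comments):
    """Return the comments that would still be passed to the spell checker."""
    disabled = False
    kept = []
    for text in comments:
        lowered = text.lower()
        # The disable marker wins: the whole comment is skipped, even if it
        # also contains the enable marker.
        if "spell-check-disable" in lowered:
            disabled = True
            continue
        if "spell-check-enable" in lowered:
            disabled = False
        if not disabled:
            kept.append(text)
    return kept


comments = [
    "/*\nspell-check-disable\nspell-check-enable\n"
    "This comment will NOT be spell checked\n*/",
    "/* spell-check-enable */",
    "/* This comment WILL be spell checked */",
]
for text in comments_to_check(comments):
    print(text)
```

In this sketch the enable comment itself is handed on to the checker, which is harmless because its words are ordinary English.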


## Dictionary notes

We use [PySpellChecker](https://github.com/barrust/pyspellchecker) as the
14 changes: 14 additions & 0 deletions comment_spell_check/comment_spell_check.py
@@ -268,7 +268,21 @@ def spell_check_file(
    bad_words = []
    line_count = 0

    disable_spell_check = False

    for c in clist:
        if "spell-check-disable" in c.text().lower():
            disable_spell_check = True
            logger.info(" Spell checking disabled")
            continue

        if "spell-check-enable" in c.text().lower():
            disable_spell_check = False
            logger.info(" Spell checking enabled")

        if disable_spell_check:
            continue

        mistakes = spell_check_comment(spell_checker, c, prefixes=prefixes)
        if len(mistakes) > 0:
            logger.info("\nLine number %s", c.line_number())
4 changes: 4 additions & 0 deletions tests/example.h
@@ -25,6 +25,10 @@
// With node id's.
// With the itemIndex'th where itemIndex is a variable name.

// spell-check-disable
// Some comment with a misspelled word: definately
// spell-check-enable

#include <stdio.h>

int test_int;
2 changes: 1 addition & 1 deletion tests/test_comment_spell_check.py
@@ -109,7 +109,7 @@ def test_url(self):
"""URL test"""
url = (
"https://raw.githubusercontent.com/SimpleITK/SimpleITK/"
"refs/heads/master/.github/workflows/additional_dictionary.txt"
"refs/heads/main/.github/workflows/additional_dictionary.txt"
)
runresult = subprocess.run(
[