English | 简体中文 | 繁体中文 | 日本語 | 한국어 | Русский | Tiếng Việt
This is a Python tool for detecting invalid characters in code files. It recognizes English, Chinese, Japanese, Korean, Russian and Vietnamese characters, and helps you find special or invisible characters in your code that may cause problems.
- Supports files in multiple programming languages (default extensions: .py, .java, .c, .cpp, .js, .html, .css, .txt)
- Supports detecting characters in multiple languages:
  - English letters and numbers
  - Chinese (CJK Unified Ideographs)
  - Japanese (Hiragana and Katakana)
  - Korean (Hangul)
  - Russian (Cyrillic alphabet)
  - Vietnamese (Latin alphabet)
- Recursively checks the entire project directory
- Reports the exact line and column number of each invalid character
- Displays the Unicode code point of each invalid character
- Command line parameter support
- Detailed error output
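
The features above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function names `iter_source_files`, `find_invalid` and `format_hit` are hypothetical, and the rule for what counts as "allowed" is passed in as a parameter instead of being taken from the tool.

```python
import os
from typing import Callable, Iterator, Sequence, Tuple

# Default extensions as listed in this README.
DEFAULT_EXTENSIONS = (".py", ".java", ".c", ".cpp", ".js", ".html", ".css", ".txt")


def iter_source_files(root: str, extensions: Sequence[str] = DEFAULT_EXTENSIONS) -> Iterator[str]:
    """Walk `root` recursively and yield paths whose name ends with a wanted extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(tuple(extensions)):
                yield os.path.join(dirpath, name)


def find_invalid(text: str, is_allowed: Callable[[str], bool]) -> Iterator[Tuple[int, int, str]]:
    """Yield (line, column, character) for every character rejected by `is_allowed`.

    Lines and columns are 1-based. The text is split on "\n" only, so unusual
    separators such as U+2028 stay inside a line and can still be reported.
    """
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if not is_allowed(ch):
                yield line_no, col_no, ch


def format_hit(path: str, line_no: int, col_no: int, ch: str) -> str:
    """Format one finding with its position and Unicode code point."""
    return f"{path}:{line_no}:{col_no}: invalid character U+{ord(ch):04X}"
```

For example, `list(find_invalid("ab\nc\u200bd", str.isascii))` reports the zero-width space on line 2, column 2, and `format_hit` renders it as `U+200B`.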
- Basic usage:
python invalid_char_checker.py /path/to/your/project
- Specify the file type:
python invalid_char_checker.py /path/to/your/project -e .py, .java, .c, .cpp, .js, .html, .css, .txt
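
As a rough illustration of how such a command line could be parsed, here is a sketch using `argparse`. Only the positional path and the `-e` flag come from the usage above; the `extensions` destination name, the `build_parser` and `normalize_extensions` helpers, and the comma handling are assumptions. In a typical shell the example `-e .py, .java` arrives as separate tokens with the commas still attached, so this sketch strips them.

```python
import argparse
from typing import List, Sequence

# Default extensions as listed in this README.
DEFAULT_EXTENSIONS = [".py", ".java", ".c", ".cpp", ".js", ".html", ".css", ".txt"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one positional path and an optional -e extension list."""
    parser = argparse.ArgumentParser(description="Detect invalid characters in code files.")
    parser.add_argument("path", help="project directory to check")
    parser.add_argument(
        "-e",
        dest="extensions",  # hypothetical attribute name
        nargs="+",
        default=DEFAULT_EXTENSIONS,
        help="file extensions to check",
    )
    return parser


def normalize_extensions(raw: Sequence[str]) -> List[str]:
    """Accept space- or comma-separated extensions and ensure a leading dot."""
    cleaned = []
    for token in raw:
        for part in token.split(","):
            part = part.strip()
            if part:
                cleaned.append(part if part.startswith(".") else "." + part)
    return cleaned
```

With this sketch, `-e .py, .java` and `-e .py .java` both normalize to `[".py", ".java"]`.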
- ASCII characters:
  - English letters (a-z, A-Z)
  - Numbers (0-9)
  - Common punctuation and operators
  - Whitespace characters (spaces, tabs, line breaks, etc.)
- Unicode characters:
  - Chinese characters (CJK Unified Ideographs)
  - Japanese hiragana and katakana
  - Korean characters
  - Russian characters (Cyrillic alphabet)
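
One way to express the list of supported characters is a table of Unicode code point ranges. The sketch below is an assumption about how such a check could look, not the tool's real table: the name `is_allowed` is hypothetical, and the ranges are the basic Unicode blocks for each script, so extension blocks (for example CJK Extension A or Hangul Jamo) are deliberately left out.

```python
# Basic Unicode blocks for the scripts listed above (assumed, not taken from the tool).
ALLOWED_RANGES = [
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0xAC00, 0xD7A3),  # Hangul Syllables
    (0x0400, 0x04FF),  # Cyrillic
]

# Whitespace control characters treated as valid in this sketch.
ASCII_WHITESPACE = "\t\n\r\f\v"


def is_allowed(ch: str) -> bool:
    """Return True if `ch` is printable ASCII, ASCII whitespace, or in an allowed range."""
    cp = ord(ch)
    if cp < 0x80:
        # 0x20-0x7E covers space, letters, digits, punctuation and operators.
        return 0x20 <= cp <= 0x7E or ch in ASCII_WHITESPACE
    return any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES)
```

Under these assumptions an invisible character such as the zero-width space (U+200B) or a NUL byte is rejected, while `a`, `中`, `あ`, `한` and `я` are accepted.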
- Files must be encoded in UTF-8
- If an encoding error is encountered, the program will display an appropriate error message
- It is recommended that you try the tool on a small directory before running it on a large project.
- If the specified directory does not exist, the program will display an error message and exit.
- If a file is not UTF-8 encoded, the program will display an encoding error message.
- Other errors that occur while processing a file are caught and reported in detail.
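
The error handling described above could be sketched as follows. The helper names `require_directory` and `read_utf8` and the message wording are hypothetical; the tool's real messages may differ.

```python
import sys
from pathlib import Path
from typing import Optional, Tuple


def require_directory(root: str) -> Path:
    """Return `root` as a Path, or print an error to stderr and exit if it is not a directory."""
    path = Path(root)
    if not path.is_dir():
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"error: directory does not exist: {root}")
    return path


def read_utf8(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Read a file as UTF-8 and return (text, None), or (None, error message) on failure."""
    try:
        return Path(path).read_text(encoding="utf-8"), None
    except UnicodeDecodeError as exc:
        # Raised when the bytes are not valid UTF-8.
        return None, f"{path}: not valid UTF-8 ({exc.reason} at byte {exc.start})"
    except OSError as exc:
        # Covers missing files, permission problems and similar I/O errors.
        return None, f"{path}: {exc}"
```

Returning the error as a value lets a caller report the problem for one file and continue with the rest of the project.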
MIT License