Correct large Word documents in colour with Claude Sonnet, at a level that surpasses Microsoft Word. Includes chunking and prompt evaluations.

michellepace/word-document-corrector-claude

Word Document Corrector using Claude

Corrects large Word documents using Claude 3.5 Sonnet at a level that surpasses Microsoft Word. Upload any .docx file and see corrections in colour.

  • Deep language correction beyond Microsoft Word capabilities.
  • Corrects grammar, spelling, and inappropriate word choice in colour.
  • Preserves writing style and semantic meaning.
  • Comprehensive testing suite to validate correction integrity (see below).
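
To illustrate what "corrections in colour" can look like, here is a minimal sketch of a word-level diff rendered as HTML. It is not taken from the notebook: the function name `highlight_corrections` and the red/green styling are assumptions, and only the standard library modules `difflib` and `html` are used.

```python
import difflib
import html


def highlight_corrections(original: str, corrected: str) -> str:
    """Return HTML marking removed words in red and added words in green.

    Hypothetical helper for illustration; the notebook may render
    corrections differently.
    """
    old_words, new_words = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words are emitted as plain (escaped) text.
            parts.append(html.escape(" ".join(old_words[i1:i2])))
        if tag in ("delete", "replace"):
            removed = html.escape(" ".join(old_words[i1:i2]))
            parts.append(f'<del style="color:red">{removed}</del>')
        if tag in ("insert", "replace"):
            added = html.escape(" ".join(new_words[j1:j2]))
            parts.append(f'<ins style="color:green">{added}</ins>')
    return " ".join(parts)


if __name__ == "__main__":
    print(highlight_corrections("Preserves writting style", "Preserves writing style"))
```

Diffing on whole words rather than characters keeps the highlighted output readable when a single misspelled word is replaced.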

Notebook Usage

  1. Click Open In Colab
  2. Then follow the steps in the Your Settings section

Examples of Corrections

Eg.1 — Draft write-up for this notebook

Eg.2 — Draft article "Pushing Aside the Bench for the Mark"

Notebook Implementation by Section

See solution-diagram:

  1. Setup: Installs and imports necessary libraries
  2. Pre-processing: Extracts text from Word, converts to markdown, splits into chunks
  3. Processing: Sends chunks to Claude for correction
  4. Post-Processing: Reassembles chunks, creates HTML output
  5. Testing: Evaluates output, tests prompt, and tests code
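
The chunking and reassembly steps above can be sketched roughly as follows. This is an assumption-laden illustration rather than the notebook's actual code: the names `split_into_chunks` and `reassemble`, the paragraph-based splitting, and the `max_chars` budget are all hypothetical.

```python
def split_into_chunks(markdown_text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars becomes its own oversized
    chunk, so no paragraph is ever split mid-sentence.
    """
    paragraphs = [p for p in markdown_text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def reassemble(chunks: list[str]) -> str:
    """Join corrected chunks back into one document."""
    return "\n\n".join(chunks)


if __name__ == "__main__":
    text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for chunk in split_into_chunks(text, max_chars=40):
        print(repr(chunk))
```

Splitting on paragraph boundaries means each chunk sent for correction is a complete unit of text, which makes the original-versus-corrected chunk pairs easier to compare during testing.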

Test Suite

Testing is arranged into three major areas:

  1. Test Processed Doc: End-to-end testing, verifying code and prompt.
  2. Prompt testing (evaluations): Ensures the prompt performs as instructed (uses generated test data).
  3. Test My Code: Traditional functional testing for core code components.

For Test Processed Doc:

  • The prompt instructs the model to "make corrections but retain original meaning".
  • Therefore, for each chunk pair (original vs corrected), testing covers:
    • Content Preservation: Document structure
    • Content Preservation: Simple Word Count (±5% tolerance)
    • Content Preservation: Semantic Meaning (scores > 70%)
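
As a rough sketch of the first two checks, the functions below compare one chunk pair for word count and heading structure. The function names, and the choice of markdown heading levels as the measure of "document structure", are assumptions made for illustration. The semantic-meaning check is omitted because it depends on a scoring method not described here.

```python
import re

# Matches markdown headings such as "# Title" or "### Subsection".
HEADING_PATTERN = re.compile(r"^(#{1,6})\s", re.MULTILINE)


def word_count_within_tolerance(
    original: str, corrected: str, tolerance: float = 0.05
) -> bool:
    """Check that the corrected word count is within +/- tolerance of the original."""
    original_count = len(original.split())
    corrected_count = len(corrected.split())
    if original_count == 0:
        return corrected_count == 0
    return abs(corrected_count - original_count) / original_count <= tolerance


def same_heading_structure(original: str, corrected: str) -> bool:
    """Check that both chunks contain the same sequence of heading levels."""
    original_levels = [len(h) for h in HEADING_PATTERN.findall(original)]
    corrected_levels = [len(h) for h in HEADING_PATTERN.findall(corrected)]
    return original_levels == corrected_levels


if __name__ == "__main__":
    before = "# Intro\n\nThis are a short example sentence."
    after = "# Intro\n\nThis is a short example sentence."
    print(word_count_within_tolerance(before, after))
    print(same_heading_structure(before, after))
```

A failed check on any chunk pair suggests the correction changed more than grammar, spelling, or word choice, and that the chunk is worth reviewing by hand.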
