Document Intelligence Pipeline | Bikash Sapkota

Context: OCR and operational automation

Company: Smart Data Solutions

Problem: Scanned claims required structured extraction and classification before they could move efficiently through operational workflows.

Solution: Combined OCR extraction, text classification, rule-based extraction, NER, and interface improvements to support faster manual review.
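
One way the rule-based extraction step might look, sketched in Python under stated assumptions. Each hypothetical field pairs a regex with a validator, so an OCR misread such as an impossible date comes back as `None` and can be left for manual review. The field names, patterns, and MM/DD/YYYY date format are illustrative only; the actual rules are not described here.

```python
import re
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Callable, Dict, Optional, Tuple


def parse_date(raw: str) -> Optional[date]:
    """Reject dates the OCR text could not have meant (e.g. month 13)."""
    try:
        return datetime.strptime(raw, "%m/%d/%Y").date()  # assumed MM/DD/YYYY format
    except ValueError:
        return None


def parse_amount(raw: str) -> Optional[Decimal]:
    """Normalise a currency string such as '1,250.00' to a Decimal."""
    try:
        return Decimal(raw.replace(",", ""))
    except InvalidOperation:
        return None


# Hypothetical fields and patterns: field -> (regex capturing the raw value, validator).
RULES: Dict[str, Tuple[re.Pattern, Callable[[str], object]]] = {
    "claim_number": (re.compile(r"claim\s*(?:no\.?|number|#)\s*:?\s*([A-Z0-9\-]+)", re.I), str.upper),
    "date_of_service": (re.compile(r"date\s+of\s+service\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.I), parse_date),
    "total_charge": (re.compile(r"total\s+charges?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.I), parse_amount),
}


def extract_fields(ocr_text: str) -> Dict[str, object]:
    """Apply every rule; None marks a missing or invalid field for manual review."""
    fields: Dict[str, object] = {}
    for name, (pattern, validate) in RULES.items():
        match = pattern.search(ocr_text)
        fields[name] = validate(match.group(1)) if match else None
    return fields
```

Validating after matching keeps the regexes permissive while still catching characters the OCR engine got wrong.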

Impact: Improved the path from scanned documents to structured operational data. Reduced friction for manual keying workflows. Connected ML extraction with practical back-office usability.

Architecture: Scanned claims -> OCR -> Classification -> Entity extraction -> Review interface
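
A minimal Python sketch of the stage chain above, assuming each stage takes and returns a shared document record. Every stage body is a hypothetical placeholder: the OCR stage copies text instead of calling Tesseract or FineReader, the classifier is a keyword rule rather than a trained model, and the record schema, labels, and patterns are invented for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ClaimDocument:
    """State carried through the pipeline for one scanned claim (hypothetical schema)."""
    doc_id: str
    scan: str  # stand-in for the scanned image; plain text here so the sketch runs
    text: str = ""
    doc_type: Optional[str] = None
    entities: Dict[str, Optional[str]] = field(default_factory=dict)
    needs_review: bool = False


Stage = Callable[[ClaimDocument], ClaimDocument]


def ocr_stage(doc: ClaimDocument) -> ClaimDocument:
    # Placeholder: a real stage would run an OCR engine such as Tesseract or FineReader.
    doc.text = doc.scan
    return doc


def classify_stage(doc: ClaimDocument) -> ClaimDocument:
    # Placeholder keyword rule standing in for a trained text classifier.
    doc.doc_type = "claim_form" if "claim" in doc.text.lower() else "other"
    return doc


def extract_stage(doc: ClaimDocument) -> ClaimDocument:
    # Hypothetical rule-based patterns; an NER model could fill further fields here.
    patterns = {
        "claim_number": r"claim\s*(?:no\.?|number)\s*:?\s*([A-Z0-9\-]+)",
        "date_of_service": r"date\s+of\s+service\s*:?\s*(\d{2}/\d{2}/\d{4})",
    }
    for name, pattern in patterns.items():
        match = re.search(pattern, doc.text, re.IGNORECASE)
        doc.entities[name] = match.group(1) if match else None
    return doc


def review_stage(doc: ClaimDocument) -> ClaimDocument:
    # Flag for the review interface when the type is unexpected or any field is missing.
    doc.needs_review = doc.doc_type != "claim_form" or any(
        value is None for value in doc.entities.values()
    )
    return doc


# Stage order mirrors: OCR -> Classification -> Entity extraction -> Review interface
PIPELINE: List[Stage] = [ocr_stage, classify_stage, extract_stage, review_stage]


def run_pipeline(doc: ClaimDocument, stages: List[Stage]) -> ClaimDocument:
    for stage in stages:
        doc = stage(doc)
    return doc
```

Keeping stages as plain functions over one record makes it possible to swap a placeholder for a real OCR engine or model without touching the rest of the chain.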

Stack: OCR (Tesseract, FineReader), WEKA, Random Forest, NER
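
The stack lists WEKA and Random Forest for classification. The sketch below is not WEKA's implementation; it is a standard-library Python illustration of the same technique (bootstrap-sampled decision trees, a random feature subset at each split, majority vote) over binary bag-of-words features. The feature scheme, document labels, and parameter values are assumptions chosen for illustration.

```python
import math
import random
import re
from collections import Counter
from typing import List, Sequence, Union

Tree = Union[str, tuple]  # leaf label, or (feature_index, left_subtree, right_subtree)


def featurize(text: str, vocab: Sequence[str]) -> List[int]:
    """Assumed feature scheme (hypothetical): 1 if the vocabulary word occurs in the OCR text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [1 if word in tokens else 0 for word in vocab]


def gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: List[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_sub, rng) -> Tree:
    # Stop at a pure node or at the depth limit.
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    # Random-forest step: only a random subset of features is considered per split.
    for f in rng.sample(range(len(X[0])), n_sub):
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority(y)
    _, f, left, right = best
    return (
        f,
        build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, n_sub, rng),
        build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, n_sub, rng),
    )


def predict_tree(node: Tree, row: List[int]) -> str:
    while isinstance(node, tuple):
        f, left, right = node
        node = right if row[f] == 1 else left
    return node


class RandomForest:
    def __init__(self, n_trees: int = 25, max_depth: int = 4, seed: int = 0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees: List[Tree] = []

    def fit(self, X: List[List[int]], y: List[str]) -> "RandomForest":
        rng = random.Random(self.seed)
        n_sub = max(1, int(math.sqrt(len(X[0]))))  # sqrt(#features), a common default
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample (with replacement)
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_sub, rng))
        return self

    def predict(self, row: List[int]) -> str:
        votes = Counter(predict_tree(tree, row) for tree in self.trees)
        return votes.most_common(1)[0][0]
```

Binary features keep every split a simple present/absent test; WEKA's RandomForest differs in defaults and details, so treat this only as an illustration of the technique.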