News Nug

I trained an NER model on 33,000 Indian Supreme Court judgments (1950–2024): CASE_CITATION hits 97.76% F1, +17 points over the only prior baseline [P]

r/MachineLearning · 2d ago · 7 · new model · open source · fine tuning · ner

en_legal_ner_ind_trf v0.1 is a new open-source NER model for extracting entities from Indian legal documents, fine-tuned from InLegalBERT. It reaches 78.67% F1 across 13 entity types, and its strongest category is case citations at 97.76% F1. The model is meant to fill the gap left by the unmaintained OpenNyAI model, particularly on OCR-degraded pre-1990 constitutional texts. Its training labels come from a silver-annotation pipeline that combines regex rules, metadata projection, transformer NER, and gazetteer lookups, and the model is trained with Focal Loss to counter label imbalance.
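The post does not share its training code or its Focal Loss settings. As a rough illustration of why Focal Loss helps with label imbalance in token classification (most tokens in a judgment are outside any entity), here is a minimal pure-Python sketch of the standard multi-class focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The `gamma=2.0` default, the optional per-label `alpha` weights, and the BIO label names in the demo are assumptions for illustration, not values taken from the model.

```python
import math
from typing import Optional, Sequence


def log_softmax(logits: Sequence[float]) -> list:
    """Numerically stable log-softmax over one token's label logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def focal_loss(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,  # assumed default; the post does not state gamma
    alpha: Optional[Sequence[float]] = None,  # optional per-label weights (assumption)
) -> float:
    """Mean multi-class focal loss over a sequence of tokens.

    p_t is the predicted probability of the gold label. The (1 - p_t) ** gamma
    factor shrinks the loss on tokens the model already gets right, so rare
    entity labels dominate the gradient. gamma=0 with alpha=None reduces to
    ordinary cross-entropy.
    """
    if not targets or len(logits) != len(targets):
        raise ValueError("need a non-empty sequence with one target per token")
    total = 0.0
    for token_logits, y in zip(logits, targets):
        log_p_t = log_softmax(token_logits)[y]
        p_t = math.exp(log_p_t)
        weight = 1.0 if alpha is None else alpha[y]
        total += -weight * (1.0 - p_t) ** gamma * log_p_t
    return total / len(targets)  # simple mean over tokens


if __name__ == "__main__":
    # Hypothetical BIO label set, for illustration only.
    labels = ["O", "B-CASE_CITATION", "I-CASE_CITATION"]
    logits = [
        [4.0, 0.0, 0.0],  # easy "O" token, confidently correct
        [0.5, 0.2, 0.1],  # hard token whose gold label is B-CASE_CITATION
    ]
    targets = [0, 1]
    print("cross-entropy:", focal_loss(logits, targets, gamma=0.0))
    print("focal (gamma=2):", focal_loss(logits, targets, gamma=2.0))
```

Setting `gamma=0` recovers plain cross-entropy, which makes it easy to compare the two: with `gamma=2` the confidently classified `O` token contributes almost nothing, while the misclassified citation token keeps most of its loss.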