r/MachineLearning
·
2d ago
·
7
·
new model
open source
fine tuning
ner
New open-source NER model (en_legal_ner_ind_trf v0.1) fine-tuned on InLegalBERT for Indian legal document extraction, achieving 78.67% F1 across 13 entity types with exceptional performance on case citations (97.76% F1). Addresses the gap left by unmaintained OpenNyAI model, particularly handling pre-1990 OCR-degraded constitutional texts using a silver-annotation pipeline combining regex, metadata projection, transformer NER, and gazetteer approaches trained with Focal Loss for label imbalance.