Featured Project

IndoDoc Vision: Automated Kartu Keluarga Extraction with State-of-the-Art A

YOLO, VLM, FastAPI, OCR

About this Project

IndoDoc Vision is a pipeline designed to extract structured JSON data from Indonesian Family Card (Kartu Keluarga) documents. By integrating YOLOv8, U-Net, and Gemini VLM, the system achieves >95% field-level accuracy.
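The page does not say how field-level accuracy is measured. A minimal sketch of one common definition, assuming it means the share of ground-truth fields whose extracted value matches after light normalization, could look like this. The function name, the field keys, and the dummy values are all hypothetical.

```python
from typing import Mapping


def field_level_accuracy(
    predicted: Mapping[str, str], expected: Mapping[str, str]
) -> float:
    """Return the fraction of expected fields whose predicted value matches.

    Values are compared after collapsing whitespace and ignoring case.
    A field that is missing from `predicted` counts as wrong.
    """
    if not expected:
        raise ValueError("expected must contain at least one field")

    def normalize(value: str) -> str:
        return " ".join(value.split()).casefold()

    correct = sum(
        1
        for name, truth in expected.items()
        if name in predicted and normalize(predicted[name]) == normalize(truth)
    )
    return correct / len(expected)


# Dummy ground truth and prediction for one document (not real data).
expected = {
    "nik": "0000000000000000",
    "name": "CONTOH NAMA",
    "address": "JL. CONTOH NO. 1",
}
predicted = {
    "nik": "0000000000000000",
    "name": "Contoh  Nama",  # differs only in case and spacing: counts as correct
    "address": "JL. CONTOH NO. 7",  # wrong house number: counts as an error
}
score = field_level_accuracy(predicted, expected)
```

Here two of the three fields match, so `score` is 2/3. A stricter variant could drop the normalization step and require exact string equality.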

Details

Features

The project uses a three-stage pipeline (detection, enhancement, extraction) backed by production tooling:

• Intelligent Detection: Uses YOLOv8 to localize 22 specific field classes (e.g., NIK, Name, Address), reaching mAP@0.5:0.95 = 0.886.

• Advanced Enhancement: Employs a U-Net model to "clean" document crops through denoising, binarization, and line removal, making the text significantly more legible for the extraction phase.

• VLM Extraction: Leverages Google Gemini 1.5 to perform intelligent field association and OCR, converting visual data into a structured JSON format.

• Production Capabilities: The system is Docker-ready, exposes Prometheus metrics for observability, and never logs PII (Personally Identifiable Information).
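The data flow through the three stages can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the stages are passed in as plain callables, images are modeled as nested lists so the sketch needs no libraries, and every name (`Detection`, `run_full_pipeline`, the `fake_*` stand-ins) is hypothetical. In the real system the callables would wrap YOLOv8, the U-Net model, and the Gemini API, and the VLM may well see more context than a single crop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# An image is modeled as a list of pixel rows (grayscale values 0-255).
Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive


@dataclass(frozen=True)
class Detection:
    field_class: str  # e.g. "nik", "name", "address"
    box: Box


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of `image`."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def run_full_pipeline(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    enhance: Callable[[Image], Image],
    extract: Callable[[Image, str], str],
) -> Dict[str, str]:
    """Detect fields, clean each crop, then read it into a JSON-like dict."""
    result: Dict[str, str] = {}
    for det in detect(image):
        cleaned = enhance(crop(image, det.box))
        result[det.field_class] = extract(cleaned, det.field_class)
    return result


# Stand-in stages so the sketch runs end to end.
def fake_detect(image: Image) -> List[Detection]:
    return [Detection("nik", (0, 0, 2, 1)), Detection("name", (1, 1, 3, 2))]


def fake_enhance(region: Image) -> Image:
    # Simple thresholding as a placeholder for the U-Net cleanup.
    return [[255 if px >= 128 else 0 for px in row] for row in region]


def fake_extract(region: Image, field_class: str) -> str:
    # Placeholder for the VLM call: report the crop size instead of text.
    return f"<{field_class}: {len(region[0])}x{len(region)} crop>"


page = [[10, 200, 30], [40, 50, 220]]
fields = run_full_pipeline(page, fake_detect, fake_enhance, fake_extract)
```

Passing the stages in as callables keeps the orchestration testable without loading any model, and makes it straightforward to skip a stage when a lighter mode is wanted.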

Challenges

The primary challenge was balancing latency and accuracy. The "VLM Only" mode is faster (~1.0s), but it hallucinates and its accuracy drops to 85-92% on poor-quality documents. The "Full Pipeline" solves this at the cost of higher GPU memory usage and the added complexity of managing three separate model dependencies.

Key Learnings

• The "Sweet Spot": In testing, the YOLO + VLM mode proved to be the best balance for most production cases. It provides accurate row association and layout understanding while running ~400ms faster than the Full Pipeline.

• Critical Preprocessing: U-Net enhancement is essential for legacy or low-quality documents, where table borders and image noise would otherwise cause OCR errors.

• Efficiency Tools: Adopting the uv package manager was a significant improvement, making package installation 10-100x faster than with traditional tools such as pip.
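The three operating modes described above (VLM Only, YOLO + VLM, Full Pipeline) and the guidance on when to use each can be summarized in a small selector. This is a hypothetical policy written only to illustrate the trade-off. The names `Mode`, `STAGES`, and `choose_mode`, and the two boolean inputs, are assumptions and do not come from the project.

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    VLM_ONLY = "vlm_only"
    YOLO_VLM = "yolo_vlm"
    FULL = "full"


# Stages each mode runs, in order, following the description in the text.
STAGES: Dict[Mode, List[str]] = {
    Mode.VLM_ONLY: ["vlm"],
    Mode.YOLO_VLM: ["yolo", "vlm"],
    Mode.FULL: ["yolo", "unet", "vlm"],
}


def choose_mode(low_quality: bool, latency_critical: bool = False) -> Mode:
    """Pick a mode from two coarse signals (an illustrative policy only).

    Low-quality scans always get the full pipeline, because U-Net cleanup is
    what protects against errors from table borders and noise. Otherwise the
    default is YOLO + VLM, with VLM Only reserved for latency-critical calls.
    """
    if low_quality:
        return Mode.FULL
    if latency_critical:
        return Mode.VLM_ONLY
    return Mode.YOLO_VLM


default_mode = choose_mode(low_quality=False)
default_stages = STAGES[default_mode]
```

How `low_quality` would be determined (for example, from a blur or contrast estimate) is left open here, since the page does not describe it.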

Analogy: Think of this system as a high-speed automated sorting facility: YOLO is the scanner that identifies where the packages are; U-Net is the cleaning station that removes mud and dust from the labels; and Gemini VLM is the intelligent clerk who reads the labels and records them perfectly into the digital database.

Technologies Used

YOLO
VLM
FastAPI
OCR

© 2026 Eggi Satria. All rights reserved.