Skip to content

Release Apache 2.0 Python 3.11+

template-engine

Audit-grade document normalization engine. Regex-first, LLM-as-judge, zero LibreOffice. Built for regulated environments where document content cannot leak.

Quickstart GitHub

Why this exists

Three problems off-the-shelf solutions don't solve together:

  • Cost — paying the LLM per-doc when 95% of fields are mechanically extractable.
  • Compliance — regulators want auditability + a guarantee that LGPD/HIPAA data never reached an external API.
  • Verification — "did the candidate doc match the standard?" — text alone is not enough; structure, layout, required formats matter too.

How it solves each

  • Hybrid mapper — regex tier resolves what it can; only missing fields go to the LLM in a single batched call. Documents the regex resolves cost zero LLM tokens.
  • local_only=True raises before any remote call. PII masking + append-only audit log + deterministic regex path replayable bit-for-bit.
  • check_conformity — multi-dimensional verdict across text + structural + visual + design + technical. Each dimension scored independently. A single critical (invalid CPF, orphan placeholder, lost field) invalidates the doc regardless of the average.

Pipeline

extract → schema_inference → pattern_inference → hybrid_mapper → render → semantic_diff
                                                                       ConformityReport

Regex-first

pattern_inference learns regexes from 3 gold docs + field examples. 10 predefined value shapes plus optional grex-learned. Documents the regex resolves cost zero LLM tokens.

LLM as judge, not author

semantic_diff and the text / design conformity dimensions ask the LLM "did anything go missing?" and "does this match the standard?". The LLM does not write content; it audits.

Local-only mode

local_only=True on normalize_batch and check_conformity raises if any LLM provider is supplied. Hard guarantee for LGPD/HIPAA-grade deployments.

Multi-provider with fallback

6 providers — Gemini, OpenAI, Anthropic, Groq, Ollama, OpenRouter. LLMRouter chains them with automatic fallback on rate-limit / timeout.

Stateless

Path / bytes in, paths / bytes / dataclasses out. No web framework, ORM, or app layer. Plug into any caller.

Audit trail

engine.security.AuditLog writes append-only JSON Lines. Records sha256 hashes — never raw content.

Bundled formats

5 ready-to-use formats: ABNT NBR 6022 / 14724 / 6023, NR-12 (laudo), contrato simples. load_format(name) ships schemas + golds + tuned conformity weights.

Cost by tier (Gemini Flash)

Path LLM calls $/doc
Regex resolves everything 0 $0.0000
Some fields fall back to LLM 1 ~$0.0006
With semantic_diff enabled 2 ~$0.0012
With check_conformity(text + design) 4 ~$0.0024

Use cases

  • Industrial: standardize 400 maintenance reports onto a corporate template.
  • Legal: contract clause normalization with audit trail.
  • Government / regulated: forms processing with local_only=True and PII masking.
  • Migration: bulk move legacy documents to a new corporate standard.
  • QA: verify a third party delivered docs that match your spec (check_conformity).

Quick install

pip install "template-engine-ia[gemini]"

Continue with Quickstart →

License

Apache 2.0 · Copyright 2026 luizhcrs