0012: Reverse-Edge Functions — Closing the XML ↔ JSON Schema ↔ Pydantic Loop
Status: Accepted
Date: 2026-06-11
Authors: Ben Lin
Context
weirding models three representations of a structured-output shape:
- A — XML schema document: the authoring format (plain-attribute annotation convention per ADR-0001, or XSD per ADR-0006).
- B — JSON Schema IR: the public dict returned by compile(), the canonical intermediate representation (ADR-0002). The native compiler emits a fully inlined IR: nested objects become inline {"type":"object","properties":{...}}, never $ref/$defs. References appear only when an IR originates outside weirding.
- C — Pydantic v2 model class: built from the IR by from_schema() (ADR-0004).
Today the schema-level pipeline is strictly one-directional: A → B → C.
| Edge | Function | Present? |
|---|---|---|
| A → B | compile(xml) | yes |
| B → C | from_schema(ir) | yes |
| A → C | define_model(xml) | yes (composition) |
| B → B′ | to_json_schema(ir) | yes (forward IR→provider transform per ADR-0010, not a reverse edge) |
| C → B | — | no |
| B → A | — | no |
| C → A | — | no (only prompt.to_template, which is a lossy prompt artifact, not a faithful schema) |
The project README and GitHub positioning describe "3-way XML ↔ JSON Schema ↔ Pydantic v2
fungibility." Taken literally — free interconversion among all three — this is not
currently true: nothing flows back out of the funnel as a schema. The three missing or
lossy edges are C → B, B → A, and C → A.
This is a decision because the reverse edges are non-obvious in two ways:
- C → B could be reimplemented or delegated. Pydantic already exposes model.model_json_schema(), producing draft 2020-12 JSON Schema. The open question is whether to reimplement IR extraction from model_fields, or to normalize Pydantic's native output. The native output omits weirding's x-weirding-item-tag extension key for hand-written models and may emit prefixItems (banned by ADR-0004 / MEMORY rule 11) for tuple fields.
- B → A cannot be a total function. The annotation convention (ADR-0001) has no syntax for non-null unions (anyOf/oneOf/allOf beyond the nullable pattern) and no reference mechanism, so cyclic/self-referential IR cannot be serialized into a finite XML tree. A reverse serializer must define, and fail loudly on, the inexpressible subset rather than emit silently-wrong XML.
Alternatives considered:
- Do nothing; correct the marketing instead. Reword the README to claim only a forward pipeline plus a two-way instance round-trip (parse/to_xml). Rejected: the reverse edges are genuinely useful (import a legacy Pydantic model into the XML-authoring workflow; regenerate canonical XML from a hand-edited IR; produce a diffable XML schema from a model for review) and the loop is nearly closed already by inlined-IR symmetry.
- Reimplement C → B from model_fields directly. Rejected: duplicates the type→schema logic Pydantic already owns and would drift from it across Pydantic releases. Normalizing model_json_schema() output is strictly less code and tracks upstream.
- Make B → A total by inventing union/reference XML syntax. Rejected: extends the ADR-0001 authoring vocabulary for an edge case, breaking the "idiomatic, LLM-promptable XML" property that motivated the convention. Failing loudly on the inexpressible subset preserves the convention.
- Route C → A through a new dedicated function. Rejected as redundant: it is exactly dump_xml(to_schema(model)). A third primitive adds surface area for zero new behavior.
- Naming decompile / to_xml_schema for B → A. Considered. decompile pairs neatly with compile but reads as "reverse-engineering." to_xml_schema is explicit but visually collides with the existing to_xml. Chose dump_xml (see Decision) to follow the Pydantic/json.dump "serialize-out" convention.
Decision
We will add two pure, reverse-edge functions to the public API, and document C → A as
their composition rather than a third function.
to_schema(model: type[BaseModel]) -> JsonSchemaIR (edge C → B), inverse of
from_schema. It normalizes model.model_json_schema() into IR rather than
reimplementing extraction:
- Array properties lacking x-weirding-item-tag receive a synthesized tag via the same singularization fallback used by _serializers._item_tag_for_field (tags→tag, else item). This shared heuristic is factored into one helper imported by both call sites so they cannot drift.
- prefixItems (tuple fields) raises SchemaError naming the field; it is unrepresentable in the IR (ADR-0004, MEMORY rule 11).
- $defs/$ref are left intact; resolving them is dump_xml's concern, not this one.
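The three normalization rules above can be sketched over plain dicts, without importing Pydantic or weirding. This is a minimal illustration, not the shipped implementation: normalize_sketch, item_tag_for and the local SchemaError class are hypothetical names, the "strip one trailing s" rule is an assumed reading of the tags→tag fallback, and the walk covers only inline properties and array items.

```python
import copy


class SchemaError(ValueError):
    """Local stand-in for weirding's SchemaError (assumed to be an exception)."""


def item_tag_for(field_name: str) -> str:
    # Assumed reading of the fallback: "tags" -> "tag", otherwise "item".
    if len(field_name) > 1 and field_name.endswith("s"):
        return field_name[:-1]
    return "item"


def _walk(node: dict, path: str) -> None:
    for name, prop in node.get("properties", {}).items():
        here = f"{path}.{name}"
        if "prefixItems" in prop:
            # Tuple fields cannot be represented in the IR, so name the field.
            raise SchemaError(f"{here}: prefixItems (tuple field) is not representable")
        if prop.get("type") == "array":
            # Keep an existing tag; synthesize one only when it is missing.
            prop.setdefault("x-weirding-item-tag", item_tag_for(name))
            items = prop.get("items")
            if isinstance(items, dict):
                _walk(items, here + "[]")
        elif prop.get("type") == "object":
            _walk(prop, here)
        # A {"$ref": ...} property falls through untouched: no resolution here.


def normalize_sketch(schema: dict) -> dict:
    """Return a normalized deep copy; the caller's dict is never mutated."""
    out = copy.deepcopy(schema)
    _walk(out, "$")
    return out
```

Because the function works on a deep copy, the input schema can be compared against the result afterwards to confirm that nothing was mutated in place.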
dump_xml(ir: JsonSchemaIR) -> str (edge B → A), inverse of compile. It emits a
canonical ADR-0001 annotation XML document — the structural inverse of
_schema._element_to_schema. It inlines any $ref/$defs first (reusing the resolution
logic from _export), then maps each IR construct to its attribute form: minLength/
minItems→min, maxLength/maxItems→max, enum→pipe-joined enum, the
anyOf:[T,null] pattern→nullable="true", a property absent from an object's required
list→required="false", and array items + x-weirding-item-tag→a single child template
element. It raises SchemaError, naming the construct, on:
- a non-null anyOf/oneOf/allOf (no union syntax in the convention),
- a cyclic or unresolvable $ref (no finite XML serialization).
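The mapping and failure modes described above can be illustrated with a small standard-library sketch. Several details are assumptions made only for this illustration, since this section does not spell out the ADR-0001 convention: that each property becomes an element named after it, that the type is carried in a type attribute, and that the root tag is supplied by the caller. dump_xml_sketch and the local SchemaError are hypothetical names, and the $ref inlining step is skipped (an unresolved $ref simply raises).

```python
import xml.etree.ElementTree as ET


class SchemaError(ValueError):
    """Local stand-in for weirding's SchemaError."""


def _split_nullable(schema: dict, path: str):
    """Return (inner_schema, is_nullable); reject every other union."""
    for key in ("oneOf", "allOf"):
        if key in schema:
            raise SchemaError(f"{path}: {key} has no annotation syntax")
    if "anyOf" in schema:
        branches = schema["anyOf"]
        inner = [b for b in branches if b != {"type": "null"}]
        if len(branches) == 2 and len(inner) == 1:
            return inner[0], True
        raise SchemaError(f"{path}: non-null anyOf has no annotation syntax")
    return schema, False


def _emit(parent, tag, schema, required, path):
    schema, nullable = _split_nullable(schema, path)
    if "$ref" in schema:
        raise SchemaError(f"{path}: unresolved $ref (inlining is out of scope here)")
    el = ET.Element(tag) if parent is None else ET.SubElement(parent, tag)
    kind = schema.get("type")
    if kind not in (None, "object"):
        el.set("type", str(kind))  # assumed attribute name
    low = schema.get("minLength", schema.get("minItems"))
    high = schema.get("maxLength", schema.get("maxItems"))
    if low is not None:
        el.set("min", str(low))
    if high is not None:
        el.set("max", str(high))
    if "enum" in schema:
        el.set("enum", "|".join(str(v) for v in schema["enum"]))
    if nullable:
        el.set("nullable", "true")
    if not required:
        el.set("required", "false")
    if kind == "object":
        needed = set(schema.get("required", []))
        for name, prop in schema.get("properties", {}).items():
            _emit(el, name, prop, name in needed, f"{path}.{name}")
    elif kind == "array":
        # Array items become a single child template element.
        item_tag = schema.get("x-weirding-item-tag", "item")
        _emit(el, item_tag, schema.get("items", {}), True, f"{path}[]")
    return el


def dump_xml_sketch(ir: dict, root_tag: str = "root") -> str:
    return ET.tostring(_emit(None, root_tag, ir, True, "$"), encoding="unicode")
```

Raising at the point where the construct is found lets the error message carry the path of the offending property, which is one way to satisfy the "naming the construct" requirement.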
The name is dump_xml, not to_xml_schema or decompile. It deliberately sits beside the
existing to_xml(instance): dump_xml serializes a schema IR to an XML schema
document; to_xml serializes a model instance to XML data. Their docstrings must
state this distinction explicitly as the primary disambiguator.
C → A is dump_xml(to_schema(model)) — documented as a one-liner, not given its own
function.
Both functions are pure: they deep-copy their input, never mutate it, and perform no I/O,
logging, or network access, matching to_json_schema (ADR-0010) and the project logging
policy.
This decision is additive and semver-minor. It does not alter compile, from_schema,
the IR format (ADR-0002 stability contract is untouched), or any existing behavior.
Consequences
Positive
- All six edges of the A ↔ B ↔ C triangle exist; the "3-way fungibility" claim becomes literally true (with two documented limits) rather than aspirational.
- New workflows: import a hand-written or legacy Pydantic model into the XML-authoring flow (to_schema); regenerate canonical, diffable XML from an edited IR or a model (dump_xml); review a model as XML in a PR.
- to_schema tracks Pydantic's own type→schema logic across releases instead of duplicating it.
- The round-trip invariants compile(dump_xml(ir)) == ir (acyclic, union-free IR) and to_schema(from_schema(ir)) ≈ ir become testable property-based guarantees, hardening the IR contract in both directions.
Negative
- dump_xml is a partial function. Cyclic/self-referential IR and non-null unions raise rather than serialize. Callers must handle SchemaError, and the limitation is a permanent property of the ADR-0001 convention, not a backlog item.
- to_schema inherits whatever quirks model_json_schema() emits (e.g. Pydantic-added title keys). The IR tolerates these, but to_schema(from_schema(ir)) is equal only modulo such additive noise; exact dict equality is not guaranteed.
- Two more names (dump_xml, to_schema) on the public surface raise the chance of confusion with the adjacent to_xml and to_json_schema. Mitigated by docstrings but not eliminated.
- More surface area is more API to keep stable under semver going forward.
Neutral
- Implementation lives in new modules (_decompile.py, _introspect.py) so the Tier-2 protected _schema.py and _models.py are not touched. The only protected-file edit is additive re-exports in the Tier-1 __init__.py, which requires explicit approval.
- The singularization fallback is factored out of _serializers.py into a shared helper; behavior is unchanged, but the helper's location moves.
- Future changes to either function's failure-mode set (e.g. teaching dump_xml a new expressible construct) are behavioral and should be recorded as a follow-up ADR or noted against this one.
Amendment (2026-06-11, implementation)
Recorded during implementation of this ADR. These are refinements discovered while
building to_schema and dump_xml; they do not change the decision, only sharpen its
edges.
- to_schema restores required: []. Pydantic's model_json_schema() omits the required key entirely on an object that has no required fields, whereas canonical compile() IR always carries a required list (possibly empty) on every object node. to_schema therefore restores required: [] on any object node where Pydantic dropped it, keeping to_schema(from_schema(ir)) structurally symmetric with the canonical IR.
- Round-trip equivalence is defined modulo $ref-inlining (not title noise alone). from_schema builds every nested object into a real nested Pydantic BaseModel, and Pydantic v2 model_json_schema() unconditionally hoists nested models into top-level $defs + $ref. So to_schema(from_schema(ir)) for any IR with a nested object is $ref-bearing, while canonical compile() IR is fully inlined. The equivalence to_schema(from_schema(ir)) ≈ ir is therefore asserted only after inlining $defs/$ref on the to_schema side and stripping additive title/default keys. to_schema itself does NOT inline; that remains dump_xml's job (the Decision above). Only the test comparison inlines. The fully-canonical round trip compile(dump_xml(ir)) == ir is, by contrast, asserted as exact dict equality.
- dump_xml accepts both nullable shapes. The nullable pattern is recognized in both anyOf:[T, {type:null}] (what _schema._wrap_nullable emits) and type:[T, "null"] (what to_json_schema(strict=True) and some foreign IR emit); both map to nullable="true" plus the inner type's attributes.
- Out-of-vocabulary keyword handling. format and additionalProperties are dropped (no annotation-convention equivalent; lossy but non-fatal, mirroring the strict-export drops of ADR-0010). const is rejected with SchemaError, because dropping it would silently lose a value constraint rather than a presentational hint.
- Neutral $ref resolution core. The local-$ref resolution logic was factored out of _export into _refs.py with caller-agnostic error wording, so dump_xml failures do not emit _export's strict-mode-specific phrasing. _export wraps the neutral core to keep its own message text. This guarantees forward (_export) and reverse (_decompile) inlining cannot drift.
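The test-side comparison described in the amendment (inline $defs/$ref, then strip additive title/default keys) can be sketched as two small pure functions. This is an illustrative sketch, not the contents of _refs.py: inline_refs, strip_noise and the local SchemaError are hypothetical names, only local "#/$defs/Name" references are handled, and sibling keys next to a $ref are discarded for simplicity.

```python
class SchemaError(ValueError):
    """Local stand-in for weirding's SchemaError."""


def inline_refs(schema: dict) -> dict:
    """Return a new dict with every local '#/$defs/Name' reference inlined."""
    defs = schema.get("$defs", {})
    prefix = "#/$defs/"

    def resolve(node, active):
        if isinstance(node, list):
            return [resolve(n, active) for n in node]
        if not isinstance(node, dict):
            return node
        if "$ref" in node:
            ref = node["$ref"]
            name = ref[len(prefix):] if ref.startswith(prefix) else None
            if name is None or name not in defs:
                raise SchemaError(f"unresolvable $ref: {ref}")
            if name in active:
                # Re-entering a definition that is still being expanded.
                raise SchemaError(f"cyclic $ref: {ref}")
            return resolve(defs[name], active | {name})
        return {k: resolve(v, active) for k, v in node.items() if k != "$defs"}

    return resolve(schema, frozenset())


def strip_noise(node):
    """Drop additive 'title'/'default' keywords, keeping properties so named."""
    if isinstance(node, list):
        return [strip_noise(n) for n in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "properties":
            # Keys here are property names, not keywords: never filter them.
            out[key] = {name: strip_noise(sub) for name, sub in value.items()}
        elif key not in ("title", "default"):
            out[key] = strip_noise(value)
    return out
```

A comparison would then take the form strip_noise(inline_refs(candidate)) == canonical, where candidate stands for the $ref-bearing output and canonical for the fully inlined IR. Tracking the set of definitions currently being expanded, rather than every definition seen so far, lets the same definition be reused in two sibling properties without being misreported as a cycle.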