Technical Specification

betto_schema

1 Purpose and scope

betto_schema is a pure-Dart library that provides JSON Schema validation primitives aligned with the JSON Schema Validation 2020-12 specification.

The library exposes three layers:

Layer 1 must never import Layer 2 or Layer 3. Layer 3 only calls Layer 2.

2 A Vocabulary for Structural Validation

This section documents the correct behaviour of each supported JSON Schema keyword, including edge cases and deviations from naïve implementations. Only keywords with non-trivial or surprising implementation details are documented here; keywords that follow the specification without deviation are omitted.

2.1 Validation Keywords for Any Instance Type

2.1.1 type

The type keyword value MUST be either a string or an array of strings.

String form — the value must match the named type exactly.

Array form — the value is valid if its type matches any entry in the list (logical OR). For example, {"type": ["string", "null"]} accepts both strings and null.

Supported type names: string, number, integer, boolean, array, object, null. Unknown type names are silently accepted in Layer 2 (the rule tree) and silently rejected in Layer 1 (the programmatic validators).
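The two forms can be sketched as follows. This is a minimal illustration, not the library's code: matchesTypeName and matchesType are hypothetical helper names, unknown type names are rejected (the Layer 1 behaviour), and integer follows the spec wording of a number without a fractional part.

```dart
/// Hypothetical helper: does [value] have the JSON type called [name]?
bool matchesTypeName(Object? value, String name) {
  switch (name) {
    case 'null':
      return value == null;
    case 'boolean':
      return value is bool;
    case 'string':
      return value is String;
    case 'number':
      return value is num;
    case 'integer':
      // A number with no fractional part, so 1.0 counts as an integer.
      return value is num &&
          value.isFinite &&
          value == value.truncateToDouble();
    case 'array':
      return value is List;
    case 'object':
      return value is Map;
    default:
      return false; // Unknown names are rejected here (Layer 1 behaviour).
  }
}

/// [typeValue] is the keyword value: a string or a list of strings.
bool matchesType(Object? value, Object? typeValue) {
  if (typeValue is String) return matchesTypeName(value, typeValue);
  if (typeValue is List) {
    // Array form: logical OR over the listed names.
    return typeValue.any((t) => t is String && matchesTypeName(value, t));
  }
  return false;
}
```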

2.1.1.1 integer sub-type

The spec defines an integer as “a numeric instance whose value is without a fractional part”. This means:

2.1.2 const

The const keyword constrains the instance to be equal to the declared constant value. The value may be any JSON type, including null.

The comparison uses structural (deep) equality. Nested List and Map values are compared element-by-element rather than by object identity. Primitive values (string, number, boolean, null) are handled correctly by deep equality as well.

const validates the value of the instance, not its presence. Use required to enforce that a key exists; use const to enforce what its value must be. A schema {"const": null} accepts the value null and rejects any non-null value.

2.2 Validation Keywords for Numeric Instances

2.2.1 multipleOf

The multipleOf keyword validates that a numeric instance is an exact multiple of the declared divisor. Non-numeric instances are silently skipped (no violation).

The keyword value must be a number strictly greater than zero. A divisor of zero is guarded as a schema error: the rule reports a violation for every numeric instance rather than throwing a Dart exception.

2.2.1.1 Floating-point safety

The naive check instance % divisor == 0 is numerically unsafe for decimal divisors because of IEEE-754 rounding. For example, 0.3 % 0.1 evaluates to 0.09999999999999998, not 0, because neither operand is exactly representable in binary. The implementation uses a quotient-based check instead: the magnitude of (instance / divisor) - round(instance / divisor) is tested against an epsilon of 1e-10. This correctly identifies 0.3 as a multiple of 0.1.
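A minimal sketch of the quotient-based check, assuming a hypothetical helper named isMultipleOf. Treating a negative divisor like zero is an assumption of this sketch; the text above only specifies the zero case.

```dart
/// Hypothetical helper sketching the quotient-based check.
/// Returns false (a violation) for a non-positive divisor instead of throwing.
bool isMultipleOf(num instance, num divisor, {double epsilon = 1e-10}) {
  if (divisor <= 0) return false; // schema error reported as a violation
  final quotient = instance / divisor; // num / num is a double in Dart
  if (!quotient.isFinite) return false;
  // Distance from the quotient to its nearest integer.
  return (quotient - quotient.roundToDouble()).abs() < epsilon;
}
```

With this helper, isMultipleOf(0.3, 0.1) holds even though 0.3 % 0.1 is not zero.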

2.3 Validation Keywords for Strings

2.3.1 pattern

Regular expressions in the pattern keyword are not implicitly anchored. A pattern need only match somewhere within the string — it does not need to match the entire string. This is identical to the behaviour of RegExp.hasMatch() in Dart.

For example, {"pattern": "foo"} accepts "foobar", "barfoo", and "foo".

Callers who need a full-string match must anchor their pattern explicitly using ^ and $ (e.g. {"pattern": "^foo$"}).

An empty pattern ("") always matches every string (correct per spec: an empty regex has at least one match at position 0 in any string).
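The unanchored behaviour can be reproduced directly with RegExp.hasMatch. The helper name matchesPattern is hypothetical and used only for illustration.

```dart
/// Hypothetical helper: unanchored match, as with the pattern keyword.
bool matchesPattern(String instance, String pattern) =>
    RegExp(pattern).hasMatch(instance);
```

Here matchesPattern('barfoo', 'foo') is true, while matchesPattern('barfoo', '^foo$') is false because the anchors force a full-string match.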

2.4 Validation Keywords for Arrays

2.4.1 uniqueItems

The uniqueItems keyword validates that all elements in an array are pairwise distinct. It activates only when the keyword value is exactly true; a value of false (or absence) means no uniqueness constraint is applied. Non-array instances are silently skipped.

Uniqueness uses structural (deep) equality — the same DeepCollectionEquality used for const. A naïve toSet() check fails to detect duplicate nested objects or arrays because Dart compares Map and List by object identity in the default equality. The implementation uses an O(n²) pairwise comparison, which is simple and correct for the expected sizes of JSON Schema instances.

Empty and single-element arrays are vacuously unique.
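The difference between identity and structural equality can be sketched as below. deepEquals is a hand-written stand-in for DeepCollectionEquality so that the example has no package dependency, and hasUniqueItems is a hypothetical helper name.

```dart
/// Structural equality for decoded JSON values (stand-in for
/// DeepCollectionEquality; not the library's implementation).
bool deepEquals(Object? a, Object? b) {
  if (a is List && b is List) {
    if (a.length != b.length) return false;
    for (var i = 0; i < a.length; i++) {
      if (!deepEquals(a[i], b[i])) return false;
    }
    return true;
  }
  if (a is Map && b is Map) {
    if (a.length != b.length) return false;
    for (final key in a.keys) {
      if (!b.containsKey(key) || !deepEquals(a[key], b[key])) return false;
    }
    return true;
  }
  return a == b; // primitives and null
}

/// O(n²) pairwise check: true when no two elements are deeply equal.
bool hasUniqueItems(List<Object?> items) {
  for (var i = 0; i < items.length; i++) {
    for (var j = i + 1; j < items.length; j++) {
      if (deepEquals(items[i], items[j])) return false;
    }
  }
  return true;
}
```

For a list such as [{'a': 1}, {'a': 1}], toSet() keeps both maps because they are distinct objects, whereas hasUniqueItems reports the duplicate.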

2.4.2 contains, minContains, maxContains

The contains keyword validates that at least one element of an array satisfies a sub-schema. minContains and maxContains refine how many elements must match. The three keywords are tightly coupled and implemented together as a single ContainsRule.

Semantics:

Non-array instances are silently skipped (no violation), consistent with every other array-keyword rule.

Empty sub-schema (contains: {}) matches every element, so minContains/maxContains effectively become array-count constraints (e.g. {"contains": {}, "minContains": 3} requires at least three elements).
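A minimal sketch of the combined check, assuming the 2020-12 defaults (minContains is 1 when omitted, maxContains is unbounded when omitted). satisfiesContains is a hypothetical function, and the matches callback stands in for validating one element against the contains sub-schema.

```dart
/// Hypothetical sketch of a combined contains check.
bool satisfiesContains(
  List<Object?> items,
  bool Function(Object? item) matches, {
  int minContains = 1, // spec default when minContains is omitted
  int? maxContains, // null means no upper bound
}) {
  final count = items.where(matches).length;
  if (count < minContains) return false;
  if (maxContains != null && count > maxContains) return false;
  return true;
}
```

Passing (_) => true as the callback models the empty sub-schema, so the bounds act on the array length.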

2.4.3 prefixItems and boolean items

In JSON Schema 2020-12, prefixItems is an array of schemas applied positionally: element i is validated against prefixItems[i]. The existing items keyword applies only to elements beyond the prefix (indices ≥ prefixItems.length) when prefixItems is present in the same schema. When prefixItems is absent, items applies uniformly to all elements (consistent with earlier behaviour).

prefixItems array-length behaviour:

Boolean items (2020-12):

JSON Schema Core 2020-12 §10.3.1.2 specifies that the value of items MUST be a valid JSON Schema, and false is a valid boolean schema meaning “always invalid”.

Violation paths for positional elements use bracket notation, e.g. [0], [1].
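The positional behaviour can be sketched as below. This is an illustration under assumptions: failingPaths and the Check typedef are hypothetical, each Check stands in for a sub-schema, and (_) => false models the boolean schema items: false.

```dart
typedef Check = bool Function(Object? value);

/// Hypothetical sketch: returns bracket-notation paths of failing elements.
/// [items] is null when the items keyword is absent.
List<String> failingPaths(
  List<Object?> instance, {
  List<Check> prefixItems = const [],
  Check? items,
}) {
  final paths = <String>[];
  for (var i = 0; i < instance.length; i++) {
    // Elements inside the prefix use their positional schema;
    // elements beyond it fall through to items (if present).
    final check = i < prefixItems.length ? prefixItems[i] : items;
    if (check != null && !check(instance[i])) paths.add('[$i]');
  }
  return paths;
}
```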

2.5 Validation Keywords for Objects

2.5.1 minProperties

The minProperties keyword validates that an object (map) has at least the specified number of properties. Non-object instances are silently skipped. The bound is inclusive: {"minProperties": 2} accepts objects with exactly two properties. An empty object satisfies {"minProperties": 0}.

2.5.2 maxProperties

The maxProperties keyword validates that an object (map) has at most the specified number of properties. Non-object instances are silently skipped. The bound is inclusive: {"maxProperties": 3} accepts objects with one, two, or three properties.

When both minProperties and maxProperties are declared, a single ObjectSizeRule enforces both bounds and collects all violations in one pass.
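A single-pass size check along these lines might look as follows. The function name and the violation messages are hypothetical; only the inclusive bounds come from the text above.

```dart
/// Hypothetical sketch of a combined size check; messages are illustrative.
List<String> objectSizeViolations(
  Map<String, Object?> instance, {
  int? minProperties,
  int? maxProperties,
}) {
  final n = instance.length;
  return [
    if (minProperties != null && n < minProperties)
      'expected at least $minProperties properties, found $n',
    if (maxProperties != null && n > maxProperties)
      'expected at most $maxProperties properties, found $n',
  ];
}
```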

2.5.3 dependentRequired

The dependentRequired keyword declares conditional property dependencies. The keyword value is an object where each key is a trigger property name that maps to an array of dependent property names.

For each trigger key that is present in the instance, all listed dependent property names must also be present. If the trigger key is absent, no validation is performed for that entry (the dependency is not activated).

One SchemaViolation is emitted per missing dependent property. The violation path follows the same format as required: the path is the dot-notation path to the object, followed by the missing property name (e.g. "payment.billingAddress"). The violation message is "required field is missing", consistent with RequiredRule.

An empty dependent list ("trigger": []) always passes when the trigger is present.
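The activation rule can be sketched as follows. missingDependents is a hypothetical helper that returns only the violation paths; the real rule emits SchemaViolation objects whose shape is not shown here.

```dart
/// Hypothetical sketch: returns the paths of missing dependent properties.
/// [objectPath] is the dot-notation path to the object ('' for the root).
List<String> missingDependents(
  Map<String, Object?> instance,
  Map<String, List<String>> dependentRequired, {
  String objectPath = '',
}) {
  final missing = <String>[];
  dependentRequired.forEach((trigger, dependents) {
    if (!instance.containsKey(trigger)) return; // dependency not activated
    for (final name in dependents) {
      if (!instance.containsKey(name)) {
        missing.add(objectPath.isEmpty ? name : '$objectPath.$name');
      }
    }
  });
  return missing;
}
```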

2.5.4 patternProperties

The patternProperties keyword value is a map of ECMA-262 regex strings to sub-schemas. For each property in the instance, every pattern that matches the property name (unanchored, using RegExp.hasMatch) causes the associated sub-schema to be applied to the property value. A property may be matched by zero, one, or more patterns; every matching sub-schema is applied and all violations are collected.

Key points:

Interaction with additionalProperties: a property is considered “pattern-evaluated” if at least one pattern in patternProperties matches its name. The additionalProperties rule skips pattern-evaluated properties just as it skips properties declared in properties.

2.5.5 additionalProperties

In addition to additionalProperties: false (already supported), the parser handles a schema-valued additionalProperties. Properties not covered by properties or patternProperties must validate against this sub-schema.

Evaluated-property tracking:

The set of “evaluated” keys comes from two sources:

  1. Keys explicitly declared under properties (known at parse time).
  2. Keys matched at runtime by any regex in patternProperties (handled by PatternPropertiesRule).

AdditionalPropertiesSchemaRule receives both the static declared-key set and the compiled pattern list. At validation time it skips any key that is either in the declared set or matched by a pattern, then applies the sub-schema to the remaining keys.
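The key-selection step can be sketched as below. additionalKeys is a hypothetical function, not the rule's actual interface; it only shows which keys would be handed to the additionalProperties sub-schema.

```dart
/// Hypothetical sketch: which keys count as "additional"?
/// [declared] holds the keys under properties; [patterns] holds the
/// compiled patternProperties regexes.
Iterable<String> additionalKeys(
  Map<String, Object?> instance,
  Set<String> declared,
  List<RegExp> patterns,
) =>
    instance.keys.where(
        (k) => !declared.contains(k) && !patterns.any((p) => p.hasMatch(k)));
```

With empty declared and patterns arguments, every key of the instance is returned, which matches the case where neither properties nor patternProperties is declared.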

Guard removal: the previous parsedProperties != null guard that prevented additionalProperties from activating when properties was absent has been removed. additionalProperties (both false and schema form) now activates regardless of whether properties is present. When neither properties nor patternProperties is declared, every key is “additional”.

additionalProperties: false with patternProperties: the parser uses AdditionalPropertiesSchemaRule with an AlwaysInvalidRule payload, so pattern-matched keys are excluded from the “additional” set before the remaining keys are rejected.

3 Vocabularies for Semantic Content With format

3.1 Foreword

The format keyword assigns a semantic meaning to string instances. betto_schema treats format as an assertion: a string that does not satisfy the named format produces a SchemaViolation. Unrecognised format names are silently ignored (no violation).

3.2 Defined Formats

The following formats align with JSON Schema Validation 2020-12 §7.3. Only formats with non-trivial validation logic or notable implementation decisions are described in detail; the remaining formats are listed with a brief summary.

3.2.1 Dates, Times, and Duration

3.2.2 Email Addresses

3.2.3 Hostnames

3.2.3.1 hostname

The hostname format validator accepts DNS hostnames per RFC 1123 §2.1.

Rules:

Underscores, spaces, @, and other non-alphanumeric/hyphen characters in labels are rejected.

3.2.3.2 idn-hostname

The idn-hostname format validator accepts internationalized hostnames per RFC 5890. This is a best-effort check, not full IDNA 2008 / Punycode conformance.

Full IDNA 2008 conformance requires Punycode encoding and Unicode normalization (NFKC) that are not available in a pure-Dart context without external dependencies. This may be upgraded in v1 if a suitable pure-Dart IDNA library becomes available.

What is validated:

3.2.4 IP Addresses

3.2.4.1 ipv4

The ipv4 format validator accepts dotted-quad IPv4 address notation per RFC 2673 §3.2. Each of the four decimal octets must be in the range 0–255.

Leading zeros are rejected. 01.0.0.0 is invalid because a leading zero is ambiguous (octal vs. decimal interpretation). The validator uses a per-octet regex alternation (25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d) that enforces both the range constraint and the no-leading-zero constraint in a single pass.

Addresses with fewer than four octets, more than four octets, or non-numeric content are rejected.
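A minimal sketch built from the per-octet alternation quoted above. The names octet, ipv4Pattern, and isIpv4 are hypothetical; the library's own entry point may differ.

```dart
// Per-octet alternation: range 0-255 with no leading zeros.
const octet = r'(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)';

// Exactly four octets separated by dots, anchored at both ends.
final ipv4Pattern = RegExp('^$octet(\\.$octet){3}\$');

/// Hypothetical helper name.
bool isIpv4(String value) => ipv4Pattern.hasMatch(value);
```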

3.2.4.2 ipv6

The ipv6 format validator accepts IPv6 address strings per RFC 4291 §2.2. It is implemented without dart:io (InternetAddress) so that the validator works in browser/web environments where dart:io is unavailable.

Supported forms:

Hex groups are case-insensitive. Strings containing more than one ::, more than eight groups in the full form, or hex groups with more than four digits are rejected.

Zone IDs (%eth0 suffixes) are not part of the RFC 4291 text representation grammar and are rejected by this validator.

3.2.5 Resource Identifiers

3.2.5.1 uri

The uri format validator uses Dart’s Uri.tryParse(). This is intentionally lenient: it accepts any string that Dart can parse as a URI reference, including relative references and bare words, as well as all registered URI schemes.

In particular, a valid URN (e.g. urn:isbn:0451450523) is a valid URI: urn is a registered URI scheme (RFC 8141), and URNs follow the generic URI syntax of RFC 3986. The uri validator therefore accepts both http URLs and valid URNs.

Callers who need to enforce absolute URIs should check Uri.isAbsolute or apply additional constraints beyond this format validator.
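The contrast between the lenient check and a stricter caller-side check can be sketched as follows. Both helper names are hypothetical; only Uri.tryParse and Uri.isAbsolute are real Dart APIs.

```dart
/// Hypothetical helper mirroring the lenient format check.
bool isLenientUri(String value) => Uri.tryParse(value) != null;

/// Hypothetical stricter check a caller might layer on top.
bool isAbsoluteUri(String value) => Uri.tryParse(value)?.isAbsolute ?? false;
```

Note that Uri.isAbsolute is false for a URI that carries a fragment, so https://example.com#top would fail the stricter check.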

3.2.5.2 urn

The urn format validator uses Urn.tryParse() (the Urn class from lib/src/formats/urn.dart), which strictly validates URN syntax (urn:<nid>:<nss>). Plain http/https URLs and other non-URN strings are rejected.

3.2.5.3 uri-reference

The uri-reference format validator accepts URI references per RFC 3986 §4.1. A URI reference is either an absolute URI (e.g. https://example.com) or a relative reference (e.g. /path/to, ../foo, #section, "").

The validator uses a structural approach:

  1. Uri.tryParse must succeed (handles structural parsing).
  2. The string must not contain characters that are illegal in both absolute URIs and relative references: unescaped spaces, ASCII control characters (0x00–0x1F, 0x7F), or literal angle brackets (<, >).

The empty string is a valid uri-reference (it refers to the current document). Percent-encoded spaces (e.g. %20) are valid; literal spaces are not.
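The two steps above can be sketched as follows, assuming hypothetical names illegalChars and isUriReference. The character class covers the literal space, the ASCII control characters, DEL, and the angle brackets.

```dart
// 0x00-0x1F are controls, 0x20 is the literal space, 0x7F is DEL.
final illegalChars = RegExp(r'[\x00-\x20\x7F<>]');

/// Hypothetical helper combining the character check with Uri.tryParse.
bool isUriReference(String value) =>
    !illegalChars.hasMatch(value) && Uri.tryParse(value) != null;
```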

3.2.6 JSON Pointers

3.2.6.1 json-pointer

The json-pointer format validator accepts JSON Pointer strings per RFC 6901.

A JSON Pointer is either:

Within a reference token, the tilde character ~ must only appear as the two-character escape sequences ~0 (representing ~) or ~1 (representing /). A bare ~ or an escape sequence other than ~0/~1 (e.g. ~2) is invalid.

Strings that do not begin with / (and are not the empty string) are invalid.
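These rules fit in a single regular expression. The sketch below is one possible encoding, with hypothetical names jsonPointerPattern and isJsonPointer.

```dart
// Zero or more "/token" segments; inside a token, "~" must be "~0" or "~1".
final jsonPointerPattern = RegExp(r'^(/([^~/]|~[01])*)*$');

/// Hypothetical helper name.
bool isJsonPointer(String value) => jsonPointerPattern.hasMatch(value);
```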

3.2.6.2 relative-json-pointer

The relative-json-pointer format validator accepts Relative JSON Pointer strings per the IETF draft (bhutton/relative-json-pointer).

A Relative JSON Pointer begins with a non-negative integer prefix (the number of steps to walk up the document tree) followed by either:

Leading zeros in the integer prefix are rejected unless the prefix is exactly "0". So 01 and 00 are invalid, but 0 is valid.

Examples: 0, 1, 0#, 1#, 0/foo, 2/a/b, 10/foo.
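A sketch of the grammar as a regular expression, under the assumption (based on the examples above) that the integer prefix is followed by either a single # or a possibly empty JSON Pointer. The names are hypothetical.

```dart
// Prefix: "0" or a non-zero-led integer. Suffix: "#" or a JSON Pointer.
final relativeJsonPointerPattern =
    RegExp(r'^(0|[1-9][0-9]*)(#|(/([^~/]|~[01])*)*)$');

/// Hypothetical helper name.
bool isRelativeJsonPointer(String value) =>
    relativeJsonPointerPattern.hasMatch(value);
```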

3.2.7 regex

The regex format validator accepts any string that compiles as a valid Dart RegExp. The check is performed by attempting RegExp(value) and catching FormatException.

Note: JSON Schema 2020-12 specifies ECMA-262 regular expression syntax. Dart’s RegExp uses a compatible but not identical dialect; minor differences in behaviour may exist for edge-case patterns.
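The compile-and-catch check described above can be sketched as follows; isValidRegex is a hypothetical helper name.

```dart
/// Hypothetical helper: true when [value] compiles as a Dart RegExp.
bool isValidRegex(String value) {
  try {
    RegExp(value); // throws FormatException on a malformed pattern
    return true;
  } on FormatException {
    return false;
  }
}
```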

3.3 Extension Formats

The following format strings are not defined by the JSON Schema specification. They are project-specific extensions provided by betto_schema for use in Bettongia collection schemas. They are recognised by StringFormatValidator and the Layer 2 rule tree in the same way as the standard formats.

3.3.1 hex-string

Accepts a string composed entirely of hexadecimal digits (0–9, a–f, A–F). An optional 0x prefix is permitted and stripped before validation. Hex digits are case-insensitive: DEADBEEF and deadbeef are both valid.

An empty string (after stripping the optional 0x prefix) is invalid.
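A minimal sketch of this check, assuming a hypothetical helper isHexString. Only the lowercase 0x prefix is stripped here, since that is the only prefix form named above.

```dart
/// Hypothetical helper: optional "0x" prefix, then one or more hex digits.
bool isHexString(String value) {
  final body = value.startsWith('0x') ? value.substring(2) : value;
  return RegExp(r'^[0-9a-fA-F]+$').hasMatch(body);
}
```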

3.3.2 digit-string

Accepts a string composed entirely of decimal digits (0–9). No leading-zero restriction is applied: "007" is valid. The string must be non-empty.

This is distinct from the integer type, which validates a numeric JSON value. digit-string validates a string whose characters are all decimal digits — useful for numeric identifiers such as barcodes, phone numbers, and EAN codes where the value is always stored as a string.

3.3.3 roman-numeral

Accepts a string composed entirely of recognised Roman numeral characters (I, V, X, L, C, D, M, case-insensitive). The string must be non-empty.

This is a structural check only: it verifies that every character is a valid Roman numeral symbol. Canonical subtractive ordering (IV, IX, etc.) is not enforced — additive repetition such as IIII is accepted as 4. The apostrophus and vinculum extended notations and fractional values are not supported.
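The structural check amounts to a character-class match. The sketch below uses hypothetical names and is not the library's code.

```dart
// One or more Roman numeral symbols, in any order and any case.
final romanChars = RegExp(r'^[IVXLCDM]+$', caseSensitive: false);

/// Hypothetical helper name.
bool isRomanNumeral(String value) => romanChars.hasMatch(value);
```

Because ordering is not checked, a non-canonical string such as VX passes as well.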

3.3.4 isbn-13

Accepts a 13-digit International Standard Book Number (ISBN-13) per the ISBN Users’ Manual. The input may contain hyphens or spaces as separators (they are stripped before validation), but the total number of extracted digits must be exactly 13.

Validation rules:

Inputs longer than 22 characters (before digit extraction) are rejected to guard against pathologically long strings.
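A sketch of the separator stripping, digit count, and length guard, combined with the standard ISBN-13 check-digit test (weights alternating 1 and 3, sum divisible by 10). The check-digit step is an assumption about the rule list, and isIsbn13 is a hypothetical helper name; the library may apply further rules not shown here.

```dart
/// Hypothetical sketch of an ISBN-13 check.
bool isIsbn13(String value) {
  if (value.length > 22) return false; // length guard described above
  final digits = value.replaceAll(RegExp(r'[- ]'), '');
  if (!RegExp(r'^[0-9]{13}$').hasMatch(digits)) return false;
  var sum = 0;
  for (var i = 0; i < 13; i++) {
    final d = digits.codeUnitAt(i) - 0x30; // '0' is 0x30
    sum += i.isEven ? d : d * 3; // weights alternate 1, 3, 1, 3, ...
  }
  return sum % 10 == 0;
}
```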

3.3.5 doi

Accepts a Digital Object Identifier (DOI) string per the DOI Handbook. A DOI has the form:

<prefix>/<suffix>

where the prefix has the form <directory-indicator>.<registrant-code>. The directory indicator and each registrant-code segment must be all-digit strings, with segments separated by a period (.). The suffix is any non-empty string, and the / separator must be present.

DOI names are case-insensitive per DOI Handbook §2.4. The string form is normalised to uppercase in the DOI class, but isValid accepts any case.
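One way to read the structure above as a regular expression is sketched below. The names doiPattern and isDoi are hypothetical, and the pattern reflects this sketch's interpretation (at least two all-digit prefix segments), not the DOI class itself.

```dart
// Prefix: all-digit segments joined by "." (at least two segments).
// Suffix: any non-empty string after the first "/".
final doiPattern = RegExp(r'^[0-9]+(\.[0-9]+)+/.+$');

/// Hypothetical helper name.
bool isDoi(String value) => doiPattern.hasMatch(value);
```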

3.3.6 lang

Accepts a language tag per RFC 5646. Three tag forms are recognised:

Language tags are case-insensitive per RFC 5646. The lang format validator accepts any casing; the LanguageTag class normalises output to the recommended conventions (lowercase language, uppercase region, Title Case script).

Extended language subtags (e.g. zh-cmn-Hans-CN) are supported per RFC 5646 §2.2.2.