Structural diff is easy. You compare two JSON objects, see that one has a field the other doesn’t, and call it a day. But structural diff doesn’t catch semantic drift.
What if a field changes from a string to a number? What if the range of valid values shifts? What if a nullable field starts returning null 10% of the time instead of 1%?
These are semantic changes. They break downstream consumers silently. And they’re what c28n is designed to catch.
The Approach
We don’t just compare schemas. We fingerprint every payload that flows through the system. For each field, we track:
- Type (string, number, boolean, null)
- Nullability frequency
- Value distribution (for strings: length histogram; for numbers: min/max/percentiles)
- Patterns (for strings: regex matches for UUIDs, emails, ISO dates)
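To make the list above concrete, here is a minimal sketch of per-field fingerprinting. It is not c28n’s actual code: the `FieldFingerprint` class, the `fingerprint_payloads` helper, and the three regexes are hypothetical stand-ins, and the sketch only looks at top-level fields.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Illustrative patterns only; the exact patterns c28n uses are not specified.
PATTERNS = {
    "uuid": re.compile(
        r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
    ),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "iso_date": re.compile(
        r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$"
    ),
}


@dataclass
class FieldFingerprint:
    """Running statistics for one field (hypothetical structure)."""
    count: int = 0
    null_count: int = 0
    types: Counter = field(default_factory=Counter)
    string_lengths: Counter = field(default_factory=Counter)  # length histogram
    numbers: list = field(default_factory=list)  # raw values for min/max/percentiles
    patterns: Counter = field(default_factory=Counter)

    def observe(self, value):
        self.count += 1
        if value is None:
            self.types["null"] += 1
            self.null_count += 1
        elif isinstance(value, bool):  # bool must be checked before int
            self.types["boolean"] += 1
        elif isinstance(value, (int, float)):
            self.types["number"] += 1
            self.numbers.append(value)
        elif isinstance(value, str):
            self.types["string"] += 1
            self.string_lengths[len(value)] += 1
            for name, rx in PATTERNS.items():
                if rx.match(value):
                    self.patterns[name] += 1
        else:
            self.types["other"] += 1  # nested objects/arrays are not expanded here

    @property
    def null_rate(self):
        return self.null_count / self.count if self.count else 0.0


def fingerprint_payloads(payloads):
    """Build one FieldFingerprint per top-level field across a batch of payloads."""
    prints = {}
    for payload in payloads:
        for name, value in payload.items():
            prints.setdefault(name, FieldFingerprint()).observe(value)
    return prints
```

Calling `fingerprint_payloads` on an hour’s worth of decoded JSON payloads gives one fingerprint per field, which is the unit the rest of the pipeline compares.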
Every hour, we aggregate these fingerprints and compare the result against the previous hour’s aggregate. If any field’s fingerprint diverges beyond a threshold, we flag it as drift.
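One way to turn "diverges beyond a threshold" into a number is a distance between two hours’ distributions. The sketch below uses total variation distance over type counts. The metric, the `drifted_fields` helper, and the 0.1 threshold are all assumptions made for illustration, not a description of what c28n computes.

```python
from collections import Counter


def total_variation(prev: Counter, curr: Counter) -> float:
    """Total variation distance between two count distributions, in [0, 1]."""
    prev_total = sum(prev.values())
    curr_total = sum(curr.values())
    if prev_total == 0 or curr_total == 0:
        # A field seen in only one of the two hours counts as maximal drift.
        return 0.0 if prev_total == curr_total else 1.0
    keys = set(prev) | set(curr)
    return 0.5 * sum(
        abs(prev[k] / prev_total - curr[k] / curr_total) for k in keys
    )


def drifted_fields(prev_hour, curr_hour, threshold=0.1):
    """Return {field: distance} for fields whose type mix moved past threshold.

    prev_hour and curr_hour map field names to Counters of observed types.
    The default threshold is an arbitrary illustrative value.
    """
    flagged = {}
    for name in set(prev_hour) | set(curr_hour):
        distance = total_variation(
            prev_hour.get(name, Counter()), curr_hour.get(name, Counter())
        )
        if distance > threshold:
            flagged[name] = distance
    return flagged
```

The same distance works for any of the per-field histograms (string lengths, pattern matches), so a single comparison routine can cover several of the tracked properties.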
The Details
The tricky part is setting thresholds. Flag too aggressively, and you drown in false positives. Too conservatively, and you miss real drift.
We use a sliding window. For each field, we keep the last 24 hours of fingerprints as a baseline. If the current hour’s fingerprint falls outside the 95th percentile of that baseline, we flag it.
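Here is a rough sketch of the sliding-window check, assuming each hour’s fingerprint has already been reduced to a single divergence score per field. The `SlidingWindowDetector` class, the nearest-rank percentile, and the `min_history` warm-up are assumptions for the example; only the 24-hour window and the 95th percentile come from the description above.

```python
import math
from collections import deque


class SlidingWindowDetector:
    """Flags a score that exceeds a percentile of recent history (hypothetical)."""

    def __init__(self, window_hours=24, percentile=95, min_history=6):
        self.history = deque(maxlen=window_hours)  # oldest hour drops off automatically
        self.percentile = percentile
        self.min_history = min_history  # assumed warm-up before flagging anything

    def _cutoff(self):
        # Nearest-rank percentile over the scores currently in the window.
        ordered = sorted(self.history)
        rank = math.ceil(self.percentile * len(ordered) / 100)
        return ordered[max(rank, 1) - 1]

    def check(self, score):
        """Return True if score is above the historical cutoff, then record it."""
        flagged = (
            len(self.history) >= self.min_history and score > self._cutoff()
        )
        self.history.append(score)
        return flagged
```

One detector per field is enough. Note that this version records flagged scores in the window too, so a sustained shift becomes the new baseline after a day; whether that is desirable depends on how alerts are handled.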
For nullability changes specifically (such as a required field starting to return null), we use a stricter, fixed threshold: if a field’s nullability frequency changes by more than 5% in a single hour, that’s drift.
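The nullability rule is simple enough to sketch directly. This example reads the 5% threshold as an absolute change of 5 percentage points between consecutive hours, which is an assumption. The function names are hypothetical.

```python
def null_rate(null_count, total):
    """Fraction of observations that were null; 0.0 when nothing was observed."""
    return null_count / total if total else 0.0


def nullability_drift(prev_rate, curr_rate, max_change=0.05):
    """True if the null rate moved by more than max_change (absolute) in one hour.

    max_change=0.05 assumes "5%" means 5 percentage points, not a relative change.
    """
    return abs(curr_rate - prev_rate) > max_change
```

Under that reading, a jump from a 0.1% null rate to 8% is a change of 7.9 points and gets flagged, while a move from 1% to 3% does not.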
The Payoff
This approach has caught drift that structural comparison would miss. Last month, a partner API started returning null for a “required” field 8% of the time, up from 0.1%. c28n caught it in the first hour and alerted us before it cascaded downstream.
Semantic drift is subtle. But it’s the kind of subtle that causes incidents. We’re working on making c28n’s drift detection smarter. Stay tuned for updates.