HN Reader

Just say no to broken JSON

7 months ago by ingve

> {"key": "value\nda"}

> My convention is that \n is the one-byte ASCII control character linefeed. This JSON is not valid.

How is this not valid?

7 months ago by chrisjj

In addition to the prior comments re: how the example JSON parse failures are actually correct...

    Let me consider a broken JSON document: `{"key": "value\nda"}`

So using xxd on that input we get

    00000000: 7b22 6b65 7922 3a20 2276 616c 7565 5c6e  {"key": "value\n
    00000010: 6461 227d                                da"}

or as a contiguous hex string 7b226b6579223a202276616c75655c6e6461227d which is definitely valid JSON.

> My convention is that \n is the one-byte ASCII control character linefeed, unless otherwise stated.

That would be the following bytes

    00000000: 7b22 6b65 7922 3a20 2276 616c 7565 0a64  {"key": "value.d
    00000010: 6122 7d                                  a"}

which is for sure invalid JSON (a raw control character inside a string), but is also for sure a different document from the original.
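To check the two dumps above rather than take my word for it, here is a minimal sketch that decodes each byte sequence from hex and asks Go's standard `encoding/json` whether it is valid. The helper name `validHexJSON` is just something I made up for illustration.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// validHexJSON (hypothetical helper) decodes a hex string into bytes and
// reports whether those bytes form a syntactically valid JSON document.
func validHexJSON(h string) bool {
	b, err := hex.DecodeString(h)
	if err != nil {
		return false
	}
	return json.Valid(b)
}

func main() {
	// Bytes from the first dump: backslash (5c) followed by 'n' (6e).
	escaped := "7b226b6579223a202276616c75655c6e6461227d"
	// Bytes from the second dump: a raw linefeed (0a) inside the string.
	rawLF := "7b226b6579223a202276616c75650a6461227d"

	fmt.Println(validHexJSON(escaped)) // true
	fmt.Println(validHexJSON(rawLF))   // false
}
```

Same text on the page, two different byte sequences, and only one of them is JSON.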

Whether the two characters `\n` (5c 6e) get transformed into the single character LF (0a) depends on whatever primitive/type is used to represent the original string literal, and on how that literal gets interpreted by the language in which it is defined.

For example, in Go that same `{"key": "value\nda"}` would have its \n transformed to LF if it's wrapped in double quotes (an interpreted string literal, with the interior double quotes appropriately escaped), which would result in a sequence of bytes that is not valid JSON. But the \n would be left alone if it's wrapped in backticks (a raw string literal), which would result in a sequence of bytes that is totally valid JSON.
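A small sketch of that Go behaviour, assuming nothing beyond the standard library. The names `interpreted`, `raw`, and `isValidJSON` are mine, purely for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// interpreted is a double-quoted (interpreted) string literal: the compiler
// turns \n into one LF byte (0x0a) and \" into a plain double quote.
const interpreted = "{\"key\": \"value\nda\"}"

// raw is a backtick (raw) string literal: the backslash and the 'n' stay
// as the two bytes 0x5c 0x6e.
const raw = `{"key": "value\nda"}`

// isValidJSON (hypothetical helper) wraps json.Valid for string input.
func isValidJSON(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	fmt.Println(len(interpreted), isValidJSON(interpreted)) // 19 false
	fmt.Println(len(raw), isValidJSON(raw))                 // 20 true
}
```

The one-byte difference in length is the whole story: the interpreted literal holds a single LF where the raw literal holds a backslash and an `n`.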

> To avoid technical debt, systems should simply reject invalid JSON.

It seems like every system mentioned in the post... does simply reject invalid JSON. In fact I'm not aware of anything that would accept that invalid JSON... ?

> The problem is not with Python, JavaScript and friends. The problem is with people generating JSON documents where the strings are not escaped. I am telling them to stop doing it.

By this, does the author mean "generating JSON documents where the strings are" invalid JSON, i.e. contain a raw 0x0a byte inside a string? Presumably not, as nobody would get very far producing invalid JSON. So it's probably not this.

Or do they mean where strings are valid JSON that encode \n as 0x5c 0x6e? But this is fine, right? So probably not this either?

Or does the author mean that they expect every \n to be transformed to \\n automatically and universally, i.e. 0x5c 0x5c 0x6e? But that wouldn't make any sense!
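For what it's worth, here is a rough sketch of what a standard encoder does with a string that really contains a linefeed, using Go's `encoding/json`. The helper `encodeKey` is hypothetical, and I'm only claiming this for Go's encoder, not for whatever generators the author has in mind.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeKey (hypothetical helper) marshals {"key": value} with the
// standard library encoder and returns the resulting JSON text.
func encodeKey(value string) string {
	b, err := json.Marshal(map[string]string{"key": value})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// The Go string below holds a real LF byte (0x0a) between "value" and "da".
	out := encodeKey("value\nda")

	// The encoder writes the LF as the two bytes 0x5c 0x6e, so the output
	// is valid JSON with no extra escaping needed.
	fmt.Println(out)                     // {"key":"value\nda"}
	fmt.Println(json.Valid([]byte(out))) // true
}
```

So the first two readings already describe what an ordinary encoder produces, and the third would double-escape a document that was fine to begin with.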

Very confusing post.

7 months ago by kiitos