mirror of
https://github.com/nostr-protocol/nips.git
synced 2024-12-23 00:45:53 -05:00
additional notes about escaping to ensure correct event IDs
parent 2c7e2af15f
commit e34653ad04
38
01.md
|
@ -42,6 +42,8 @@ To obtain the `event.id`, we `sha256` the serialized event. The serialization is
|
||||||
]
|
]
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### String Escapes
|
||||||
|
|
||||||
To prevent implementation differences from creating a different event ID for the same event, the following rules MUST be followed while serializing:
- UTF-8 should be used for encoding.
- Whitespace, line breaks or other unnecessary formatting should not be included in the output JSON.
|
@ -54,6 +56,42 @@ To prevent implementation differences from creating a different event ID for the
|
||||||
- A backspace, (`0x08`), use `\b`
- A form feed, (`0x0C`), use `\f`
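
The escaping rules in this list can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `escape_string` is hypothetical, and the table entries other than `\b` and `\f` are assumed from the part of the list that this hunk does not show. Control characters outside the table are left untouched here, which a real serializer would need to handle.

```python
import json

# C-style escapes. Only \b and \f are visible in this hunk; the other
# entries are assumptions about the lines of the list that are not shown.
ESCAPES = {
    "\n": "\\n",
    '"': '\\"',
    "\\": "\\\\",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}


def escape_string(value: str) -> str:
    """Return value as a quoted JSON string using the C-style escapes."""
    return '"' + "".join(ESCAPES.get(ch, ch) for ch in value) + '"'


def canonical_bytes(value: str) -> bytes:
    """Encode the escaped string as UTF-8, the form that gets hashed."""
    return escape_string(value).encode("utf-8")


sample = 'tab\there "q" back\\slash\nnew\r\b\f'
# Python's own encoder picks the same short escapes when ensure_ascii is off.
print(escape_string(sample) == json.dumps(sample, ensure_ascii=False))
```

Passing `ensure_ascii=False` matters in this comparison: without it, Python's `json.dumps` rewrites non-ASCII characters as `\uXXXX` sequences instead of emitting them as UTF-8.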
|
|
||||||
|
In addition, implementations should retain all other escape sequences
without modification. Nothing in an event marks which escaping scheme was
used, so normalizing to a single scheme would change the event ID. Besides
the single-letter C-style escapes above, there are three other forms of
escaping:
|
||||||
|
|
||||||
|
- `\uXX` - 8 bit hex
- `\uXXXX` - 16 bit hex
- `\XXX` - 3 digit octal (up to 9 bits)
|
||||||
|
|
||||||
|
Implementations *could* record the escape form in their internal data
structures, but the primary requirement is that the submitted event string
MUST be encoded identically after it is marshalled back to JSON, so it is
simpler to leave these sequences alone.
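
To see why leaving the sequences alone is simpler, the sketch below decodes a string and marshals it back with Python's standard `json` module. The sample string is made up for illustration. The decoder accepts `\u000a`, but the encoder writes the same character as `\n`, so the bytes that would be hashed no longer match.

```python
import hashlib
import json

# A submitted string that spells a line feed as \u000a (hypothetical sample).
raw = '"line\\u000abreak"'

# Decode, then marshal back to JSON: the encoder chooses \n instead.
reencoded = json.dumps(json.loads(raw), ensure_ascii=False)

raw_hash = hashlib.sha256(raw.encode("utf-8")).hexdigest()
reencoded_hash = hashlib.sha256(reencoded.encode("utf-8")).hexdigest()

print(raw)
print(reencoded)
print(raw_hash == reencoded_hash)
```

Both strings decode to the same text, yet their hashes differ, which is the mismatch the paragraph above warns about.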
|
||||||
|
|
||||||
|
HTML entities can also appear, but they need no special handling because
they are not based on the reverse solidus (`\`). Longer `\u` codes are
possible according to UTF-8 rules, but few implementations use them. A
parser that passes the `\u` prefix through without modification will accept
2, 4, 6 or 8 hex digits, or even incorrect values, as long as they do not
include a reverse solidus. A parser can thus make a special case for `\u`
and `\[0-9]` and cover all cases.
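
One way to picture the special case for `\u` and `\[0-9]` is a scanner that finds escape sequences in the raw string body and copies them through untouched. This is only a sketch under assumptions: the pattern, the 8-digit cap on `\u` and the name `find_escapes` are illustrative choices, not part of the specification.

```python
import re

# Order matters: try \u first, then backslash-plus-digits, then any
# single-character escape such as \n or \\.
ESCAPE_TOKEN = re.compile(
    r"\\u[0-9A-Fa-f]{0,8}"  # \u followed by up to 8 hex digits
    r"|\\[0-9]{1,3}"        # backslash followed by 1 to 3 digits
    r"|\\.",                # any other backslash escape
    re.DOTALL,
)


def find_escapes(raw: str) -> list[str]:
    """Return every escape sequence in raw, exactly as written."""
    return ESCAPE_TOKEN.findall(raw)


# Raw (undecoded) body of a JSON string, hypothetical sample input.
body = r"a\nb\u00e9 \101d\\u"
print(find_escapes(body))
```

Because `\u` has no fixed length under this scheme, the scanner is greedy: a hex letter that directly follows a `\u` code is read as part of the code, so the sample puts a space after `\u00e9`.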
|
||||||
|
|
||||||
|
Because no sentinel signifies which scheme should be used, and to conserve
space on the most frequently escaped characters, `\n`, `\t` and `\\`, this
specification uses the C-style escapes. Any escapes in the three other
forms listed above should not be modified, so that the canonical form of
the event, which determines the event ID hash, is consistent across
implementations.
|
||||||
|
|
||||||
|
As a rule, data intended to represent binary should be encoded in either
hexadecimal or standard Base64. Wherever possible, as with the `e` and `p`
tags, specifications that place binary data in tag fields should use a
format that lets implementations store it as raw bytes at runtime, which
conserves memory and speeds up matching at very low processing cost.
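
As a rough illustration of the storage point, the sketch below converts a hex tag value to raw bytes and back. The function name `tag_value_to_bytes` and the sample value are hypothetical, and the lowercase-only check is an assumption made here so that the original string can be reproduced exactly when the event is serialized again.

```python
def tag_value_to_bytes(hex_value: str) -> bytes:
    """Decode a hex tag value, rejecting forms that would not round-trip."""
    raw = bytes.fromhex(hex_value)
    # bytes.fromhex tolerates uppercase and spaces; re-encoding those would
    # produce a different string, and therefore a different event ID.
    if raw.hex() != hex_value:
        raise ValueError("hex value does not round-trip exactly")
    return raw


# Hypothetical 64-character hex value, like those carried in e and p tags.
value = "ab" * 32
stored = tag_value_to_bytes(value)

print(len(value), len(stored))  # 64 characters as text, 32 bytes in memory
print(stored.hex() == value)
```

Comparing two 32-byte values is a single fixed-width comparison, which is where the matching benefit comes from.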
|
||||||
|
|
||||||
### Tags

Each tag is an array of one or more strings, with some conventions around them. Take a look at the example below:
|
|