mirror of
https://github.com/nostr-protocol/nips.git
synced 2024-12-23 00:45:53 -05:00
additional notes about escaping to ensure correct event IDs
parent 2c7e2af15f
commit e34653ad04
38
01.md
@@ -42,6 +42,8 @@ To obtain the `event.id`, we `sha256` the serialized event. The serialization is
]
```

### String Escapes

To prevent implementation differences from creating a different event ID for the same event, the following rules MUST be followed while serializing:
- UTF-8 should be used for encoding.
- Whitespace, line breaks or other unnecessary formatting should not be included in the output JSON.

@@ -54,6 +56,42 @@ To prevent implementation differences from creating a different event ID for the
- A backspace, (`0x08`), use `\b`
- A form feed, (`0x0C`), use `\f`

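As an illustration of the rules above, the following Python sketch serializes a value to compact UTF-8 JSON and hashes it. It assumes Python's standard `json` module, whose `separators` and `ensure_ascii` options suppress optional whitespace and keep non-ASCII characters verbatim. The helper names `serialize_compact` and `sha256_hex` are hypothetical, and the input is a made-up value rather than a full event.

```python
import hashlib
import json


def serialize_compact(value) -> bytes:
    """Serialize to JSON with no optional whitespace, encoded as UTF-8."""
    text = json.dumps(value, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# Illustrative input only: a backspace, a form feed and a non-ASCII character.
sample = [0, "a\bb\fc", "é"]
serialized = serialize_compact(sample)
print(serialized)
print(sha256_hex(serialized))
```

Python's encoder writes the backspace and form feed as `\b` and `\f`, but it emits control characters that have no short escape as `\u00XX`, so its behavior for characters outside the list should be checked against the full specification before relying on it.
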
In addition, implementations should retain all other escape sequences
without modification. Normalizing them to one scheme would change event
IDs, and there is no normative marker to specify which scheme is in use.
There are three forms of escaping other than the single-letter C-style
escapes above:

- `\uXX` - 8 bit hex
- `\uXXXX` - 16 bit hex
- `\XXX` - 3 digit octal

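To see why normalization is risky, this Python sketch hashes two JSON string literals that decode to the same line break but are spelled with different escapes. The literals are made up for illustration, and hashing the bare literal stands in for hashing a full serialized event.

```python
import hashlib
import json

# Two spellings of the same one-character string (a line break).
short_form = r'"\n"'      # single-letter C-style escape
long_form = r'"\u000a"'   # 16 bit hex escape

# Both decode to the identical value...
assert json.loads(short_form) == json.loads(long_form) == "\n"

# ...but the serialized bytes differ, so the digests differ too.
short_digest = hashlib.sha256(short_form.encode("utf-8")).hexdigest()
long_digest = hashlib.sha256(long_form.encode("utf-8")).hexdigest()
print(short_digest != long_digest)
```

The short form is also four bytes shorter, which is the space saving that motivates preferring the C-style escapes.
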
Implementations *could* record the escape form as part of their internal
data structure, but the primary directive is that the encoding of the
submitted event string MUST be the same after marshalling it back to JSON,
so it is simpler to leave the escapes alone.

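One way to act on this, sketched in Python under the assumption that the standard `json` module is the marshaller, is to check whether a submitted string survives a decode and re-encode, and to keep the original text when it does not. `survives_round_trip` is a hypothetical helper, not part of any Nostr library.

```python
import json


def survives_round_trip(raw: str) -> bool:
    """True if decoding and compactly re-encoding reproduces `raw` exactly."""
    re_encoded = json.dumps(
        json.loads(raw), separators=(",", ":"), ensure_ascii=False
    )
    return re_encoded == raw


# The C-style escape survives; the hex escape is rewritten by the encoder.
print(survives_round_trip(r'{"content":"a\nb"}'))
print(survives_round_trip(r'{"content":"a\u000ab"}'))
```

When the check fails, storing the submitted string alongside the parsed form is one way to return it unchanged later.
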
Strings can also contain HTML entities, but these need no special handling
because they are not based on the reverse solidus (`\`). Longer `\u` codes
are possible according to UTF-8 rules, but few implementations use them, and
a parser that passes the `\u` prefix through without modification will accept
2, 4, 6 or 8 hex digits, or even incorrect values that don't include a
reverse solidus. A parser can thus make a special case for `\u` and `\[0-9]`
and cover all cases.

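The special case described above could look roughly like the following Python scanner, which walks raw JSON text and collects the `\u` and `\[0-9]` escapes that should be passed through untouched. It is a sketch only: the function name, the greedy limit of 8 hex digits and the limit of 3 digits for the numeric form are assumptions made for illustration.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
DEC_DIGITS = set("0123456789")


def find_passthrough_escapes(raw: str) -> list:
    """Return the `\\u...` and `\\<digits>` escapes found in raw JSON text."""
    found = []
    i, n = 0, len(raw)
    while i < n:
        if raw[i] != "\\":
            i += 1
            continue
        nxt = raw[i + 1] if i + 1 < n else ""
        if nxt == "u":
            # Take up to 8 hex digits after the prefix (greedy, an assumption).
            j = i + 2
            while j < n and j - (i + 2) < 8 and raw[j] in HEX_DIGITS:
                j += 1
        elif nxt in DEC_DIGITS:
            # Take up to 3 digits for the numeric form (an assumption).
            j = i + 1
            while j < n and j - (i + 1) < 3 and raw[j] in DEC_DIGITS:
                j += 1
        else:
            # Any other escape, such as a line break or an escaped
            # backslash, is skipped as a two-character pair.
            i += 2
            continue
        found.append(raw[i:j])
        i = j
    return found


print(find_passthrough_escapes(r'x\u000a-\101\\u0041\n'))
```

Skipping other escapes as pairs keeps an escaped backslash followed by a literal `u` from being mistaken for a `\u` escape.
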
Because there are no sentinels to signify which scheme should be used, and
to conserve space on the most frequently occurring escapes, `\n`, `\t` and
`\\`, this specification uses the C-style escapes. Existing escapes,
whichever form they take, should not be modified, so that the canonical form
of the event that determines the event ID hash is consistent across
implementations.

As a rule, data that is intended to represent binary should be encoded in
either hexadecimal or standard Base64. Wherever possible, as with the `e`
and `p` tags, specifications that put binary data in tag fields should fix
its format so that implementations can easily store it as binary at runtime,
which conserves memory and improves matching performance at a very low
processing cost.

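A minimal Python sketch of this idea follows, assuming the tag value is lowercase hexadecimal. The helper names and the sample value are hypothetical. The value is packed only when re-encoding reproduces the original text exactly, so the stored form can never alter the serialized event.

```python
from typing import Optional


def pack_hex_field(value: str) -> Optional[bytes]:
    """Return the binary form of `value`, or None if it must stay a string."""
    try:
        packed = bytes.fromhex(value)
    except ValueError:
        return None
    # Only pack when re-encoding reproduces the original text exactly.
    return packed if packed.hex() == value else None


def unpack_hex_field(packed: bytes) -> str:
    """Return the lowercase hex text for a packed field."""
    return packed.hex()


# Made-up 64-character value standing in for an `e` or `p` tag field.
tag_value = "ab" * 32
packed = pack_hex_field(tag_value)
print(len(tag_value), len(packed))
```

The packed form takes half as many bytes as the hex text. Values with uppercase digits or embedded whitespace return `None` and stay as strings.
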
### Tags

Each tag is an array of one or more strings, with some conventions around them. Take a look at the example below:
