NIP-77 ====== Negentropy Syncing ------------------ `draft` `optional` This document describes a protocol extension for syncing events. It works for both client-relay and relay-relay scenarios. If both sides of the sync have events in common, then this protocol will use less bandwidth than transferring the full set of events (or even just their IDs). It is a Nostr-friendly wrapper around the [Negentropy](https://github.com/hoytech/negentropy) protocol, which uses a technique called [Range-Based Set Reconciliation](https://logperiodic.com/rbsr.html). Since Negentropy is a binary protocol, this wrapper hex-encodes its messages. The specification for Negentropy Protocol V1 is attached as an appendix to this NIP below. ## High-Level Protocol Description We're going to call the two sides engaged in the sync the client and the relay (even though the initiator could be another relay instead of a client). * (1) Client (initiator) chooses a filter, and retrieves the set of events that it has locally that match this filter (or uses a cache), and constructs an initial message. * (2) Client sends a `NEG-OPEN` message to the relay, which includes the filter and the initial message. * (3) Relay selects the set of events that it has locally that match the filter (or uses a cache). * (4) Relay constructs a response and returns it to the client in a `NEG-MSG` message. * (5) Client parses the message to learn about IDs it has (and relay needs) and IDs it needs (and relay has). * If client wishes to continue, then it constructs a new message and sends it to the relay in a `NEG-MSG` message. Goto step 4. * If client wishes to stop, then it sends a `NEG-CLOSE` message or disconnects the websocket. The above protocol only results in the client learning about IDs it has/needs, and does not actually transfer events. Given these IDs, the client can upload events it has with `EVENT`, and/or download events it needs with `REQ`. This can be performed over the same websocket connection in parallel with subsequent `NEG-MSG` messages. If a client is only interested in determining the number of unique events (ie, reaction counts), it may choose to not download/upload at all. ## Nostr Messages ### Initial message (client to relay): ```jsonc [ "NEG-OPEN", , , ] ``` * The subscription ID is used by each side to identify which query a message refers to. It only needs to be long enough to distinguish it from any other concurrent subscriptions on this websocket connection (an integer that increments once per `NEG-OPEN` is fine). Subscription IDs are in a separate namespace from `REQ` subscription IDs. If a `NEG-OPEN` is issued for a currently open subscription ID, the existing subscription is first closed. * The filter is as described in [NIP-01](01.md). * `initialMessage` is the initial Negentropy binary message, hex-encoded. See appendix. ### Error message (relay to client): If a request cannot be serviced by the relay, an error is returned to the client: ```jsonc [ "NEG-ERR", , ] ``` Current reason codes are: * `RESULTS_TOO_BIG` * Relays can optionally reject queries that would require them to process too many records, or records that are too old * The maximum number of records that can be processed can optionally be returned as the 4th element in the response * `CLOSED` * Because the `NEG-OPEN` queries may be stateful, relays may choose to time-out inactive queries to recover memory resources After a `NEG-ERR` is issued, the subscription is considered to be closed. ### Subsequent messages (bidirectional): Relay and client alternate sending each other `NEG-MSG`s: ```jsonc [ "NEG-MSG", , ] ``` * `message` is a Negentropy binary message, hex-encoded. Both message directions use the same format. See appendix. ### Close message (client to relay): When finished, the client should tell the relay it can release its resources with a `NEG-CLOSE`: ```jsonc [ "NEG-CLOSE", ] ``` ## Appendix: Negentropy Protocol V1 ### Preparation There are two protocol participants: Client and server. The client creates an initial message and transmits it to the server, which replies with its own message in response. The client continues querying the server until it is satisifed, and then terminates the protocol. Messages in either direction have the same format. Each participant has a collection of records. A records consists of a 64-bit numeric timestamp and a 256-bit ID. Each participant starts by sorting their items according to timestamp, ascending. If two timestamps are equal then items are sorted lexically by ID, ascending by first differing byte. Items may not use the max uint64 value (`2**64 - 1`) as a timestamp since this is reserved as a special "infinity" value. The goal of the protocol is for the client to learn the set of IDs that it has and the server does not, and the set of items that the server has and it does not. ### `Varint` Varints (variable-sized unsigned integers) are represented as base-128 digits, most significant digit first, with as few digits as possible. Bit eight (the high bit) is set on each byte except the last. Varint := * ### `Id` IDs are represented as byte-strings of length `32`: Id := Byte{32} ### `Message` A reconciliation message is a protocol version byte followed by an ordered list of ranges: Message := * The current protocol version is 1, represented by the byte `0x61`. Protocol version 2 will be `0x62`, and so forth. If a server receives a message with a protocol version that it cannot handle, it should reply with a single byte containing the highest protocol version it supports, allowing the client to downgrade and retry its message. Each Range corresponds to a contiguous section of the timestamp/ID space. The first Range starts at timestamp 0 and an ID of 0 bytes. Ranges are always adjacent (no gaps). If the last Range doesn't end at the special infinity value, an implicit `Skip` to infinity Range is appended. This means that the list of Ranges always covers the full timestamp/ID space. ### `Range` A Range consists of an upper bound, a mode, and a payload: Range := The contents of the payload is determined by mode: * If `mode = 0`, then payload is `Skip`, meaning the sender does not wish to process this Range further. This payload is empty: Skip := * If `mode = 1`, then payload is a `Fingerprint`, which is a [digest](#fingerprint-algorithm) of all the IDs the sender has within the Range: Fingerprint := Byte{16} * If `mode = 2`, the payload is `IdList`, a variable-length list of all IDs the sender has within the Range: IdList := * ### `Bound` Each Range is specified by an *inclusive* lower bound and an *exclusive* upper bound. As defined above, each Range only includes an upper bound: the lower bound of a Range is the upper bound of the previous Range, or 0 timestamp/0 ID for the first Range. A Bound consists of an encoded timestamp and a variable-length disambiguating prefix of an ID (in case multiple items have the same timestamp): Bound := * * The timestamp is encoded specially. The infinity timestamp is encoded as `0`. All other values are encoded as `1 + offset`, where offset is the difference between this timestamp and the previously encoded timestamp. The initial offset starts at `0` and resets at the beginning of each message. Offsets are always non-negative since the upper bound's timestamp is greater than or equal to the lower bound's timestamp, ranges in a message are always encoded in ascending order, and ranges never overlap. * The size of `idPrefix` is encoded in `length`, and can be between `0` and `32` bytes, inclusive. This allows implementations to use the shortest possible prefix to separate the first record of this Range from the last record of the previous Range. If these records' timestamps differ, then the length should be 0, otherwise it should be the byte-length of their common ID-prefix plus 1. If the `idPrefix` length is less than `32` then the omitted trailing bytes are implicitly 0 bytes. ### Fingerprint Algorithm The fingerprint of a Range is computed with the following algorithm: * Compute the addition mod 2256 of the element IDs (interpreted as 32-byte little-endian unsigned integers) * Concatenate with the number of elements in the Range, encoded as a [Varint](#varint) * Hash with SHA-256 * Take the first 16 bytes