This repository was archived by the owner on Jun 3, 2022. It is now read-only.

IPC RPC should have a transport layer with payload length #14

Open
axic opened this issue Feb 25, 2017 · 14 comments

Comments

@axic
Member

axic commented Feb 25, 2017

JSON-RPC doesn't specify a transport layer; it relies on the underlying transport protocol (such as HTTP) to delimit messages. This, however, is missing in certain use cases, such as Unix socket based IPC.

There is no consistent way defined between Ethereum clients to deal with this:

  • geth uses a streaming JSON decoder which can detect the end of the input
  • soltest (solidity) looks for the first new line (\n), which is obviously dangerous
  • cpp-ethereum waits for a big enough input

There are some proposals to define a transport layer:

  • open a new connection every single time (not really suitable for our use)
  • use an encapsulation format (netstrings has been proposed)
  • rely on the JSON decoder

Since there is no widely adopted standard, I propose to have a single length field preceding the JSON request and the response.

For the encoding I propose to use either:

  • number as a string terminated with a newline (this is valid JSON)
  • a single member array with the length as the only member ([1234] - this is valid JSON too and is easy to search for the closing bracket ])
  • RLP or CBOR encoded number

Choosing a JSON-compatible encoding would mean that clients could retain backwards compatibility by reading another JSON message if the first message is not JSON-RPC (e.g. it is a single number in an array, as above).
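A minimal sketch of the single-member-array option (the names `frame`/`unframe` and the use of Python are illustrative, not part of the proposal):

```python
import json

def frame(message: dict) -> bytes:
    # Prefix the payload with its byte length as a single-member JSON
    # array, e.g. b'[46]{...}'. The prefix is itself valid JSON, so a
    # legacy streaming decoder would still parse it as a JSON value.
    body = json.dumps(message).encode("utf-8")
    return json.dumps([len(body)]).encode("ascii") + body

def unframe(buf: bytes) -> dict:
    # Scan for the closing ']' of the prefix, then slice out the body.
    end = buf.index(b"]") + 1
    (length,) = json.loads(buf[:end])
    return json.loads(buf[end:end + length])
```

A receiver that doesn't understand the prefix can still run a streaming JSON parser, see a single-number array, and fall through to reading the next JSON value.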

@cdetrio
Member

cdetrio commented Feb 25, 2017

Also being discussed at https://github.com/ethcore/parity/issues/4647

@karalabe
Member

You can't seriously ignore all of the discussion on that thread and restate that proposal, ignoring all the counterarguments and solutions provided there.

@axic
Member Author

axic commented Feb 26, 2017

You can't seriously ignore all of the discussion on that thread

Unless I wasn't aware of that thread. I came across this issue roughly three weeks ago when changing Solidity's soltest and implementing IPC support in testrpc.

@karalabe
Member

Ah, I see axic wasn't aware of that thread. Please read it. There shouldn't need to be an extra transport protocol if you have a decent JSON parser.

Also, length prefixing requires you to serialize the message in memory first and send it afterwards, which uses more memory and doesn't allow streaming JSON generation and transmission.

@axic
Member Author

axic commented Feb 26, 2017

I will go through it (albeit not tonight).

@MicahZoltu

If it is decided to move forward with this, my vote would be for a length prefix (2^53 or smaller) + sentinel.

The length prefix allows clients (on both sides of the connection) to read just the length prefix and then wait until they have enough bytes before attempting any decoding. This simplifies the job of the parser significantly, as it can read a well defined number of bytes (however long the length prefix is) and then wait for its buffer to fill before continuing on with parsing. This is particularly meaningful if anything beyond ASCII-7 is encoded, which JSON-RPC allows for, as dealing with UTF-8 surrogates in an unknown-length stream is a bit of a pain.

The reason I think a sentinel is valuable is that it allows for recovery of the stream if one of the participants puts invalid JSON onto the wire. With WebSockets a single malformed JSON payload doesn't blow up the entire connection like it will with a Concatenated JSON parser. A simple sentinel would be a null byte, as that isn't allowed inside a JSON string. The problem, of course, is that this makes the wire protocol no longer entirely valid Concatenated JSON, and staying valid Concatenated JSON may be desirable for backwards compatibility. I suspect that no matter what enveloping/prefixing is used, clients will need to make changes, so my vote would be to go all the way and have a null byte sentinel between messages (between each message, before the length). The sentinel could either be a prefix to each message or a terminator for each message; I don't believe it matters (though which it is should be clearly specified).

As for the encoding of the length, I personally prefer network byte order (big endian) long (8 bytes) as that should be big enough for very far into the future and since these payloads are JSON, size on the wire is clearly not a concern. While JSON encoding the length prefix could work, it doesn't alleviate much of the developer pain around building a proper parser as the developer will still need to build a JSON parser that can parse an undefined length stream. I suppose the length prefix could be padded JSON to provide a fixed length? For example:

18446744073709551616{ "jsonrpc": "2.0", "id": "1", "method": "..." }
48                  { "jsonrpc": "2.0", "id": "1", "method": "abc" }

This would allow Concatenated JSON parsers to see and ignore the length and simple parsers to read exactly 20 bytes off the wire, parse them as a number, then read that many bytes off the wire to get the message. Netstrings would work, it just means the developer will need to stream-process bytes until it reaches the : delimiter. Not quite as easy/friendly as well defined string length prefixes.
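The padded fixed-width prefix described above could be sketched like this (a hypothetical illustration; the 20-character width follows from the largest 64-bit length, `18446744073709551616`, being 20 digits):

```python
import io
import json

PREFIX_WIDTH = 20  # the widest 64-bit length, 18446744073709551616, is 20 digits

def frame(message: dict) -> bytes:
    body = json.dumps(message).encode("utf-8")
    # Space-pad the decimal length to a fixed width; a Concatenated JSON
    # parser reads this as a JSON number followed by whitespace.
    return str(len(body)).ljust(PREFIX_WIDTH).encode("ascii") + body

def read_message(stream) -> dict:
    # A simple parser reads exactly PREFIX_WIDTH bytes, parses them as a
    # number, then reads exactly that many bytes for the payload.
    length = int(stream.read(PREFIX_WIDTH))
    return json.loads(stream.read(length))
```

For example, `read_message(io.BytesIO(frame(msg)))` round-trips a message without the reader ever having to scan for a delimiter.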

There may be value in limiting to 53-bit length since that is the biggest integer that JavaScript can hold without a bunch of trouble.

All of the above being said, I am a big fan of standardizations and if https://www.simple-is-better.org/json-rpc/transport_sockets.html is picking up any real steam I would get on board with following it. I am a bit disappointed that it isn't self-healing (no sentinel character) though.

A final option would be to go with a sentinel only (null byte). This would allow stream processing on both sender and receiver, yet also allow the receiver to take a much simpler approach of waiting for a null byte and then decoding the entire string from UTF-8 into a language-native string and running the whole thing through a parser in one shot. It does mean that the client will need to iterate over the stream twice in this case (once to wait for the null byte, then again when it parses), but one could engineer around this if they really needed the minor performance benefits that a single-pass parser would provide.
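The sentinel-only variant might look like the following sketch (the chunk size and generator shape are arbitrary choices, not part of the proposal):

```python
import io
import json

SENTINEL = b"\x00"  # a raw null byte can never occur inside valid JSON text

def frame(message: dict) -> bytes:
    return json.dumps(message).encode("utf-8") + SENTINEL

def read_messages(stream):
    # Buffer bytes until a sentinel appears, then decode and parse the
    # whole message in one shot (the "two passes" mentioned above).
    buf = bytearray()
    while chunk := stream.read(4096):
        buf.extend(chunk)
        while (i := buf.find(SENTINEL)) != -1:
            yield json.loads(bytes(buf[:i]).decode("utf-8"))
            del buf[:i + 1]
```

If a malformed payload appears, the reader can skip to the next sentinel and resume, which is the self-healing property argued for above.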

@karalabe
Member

karalabe commented Feb 26, 2017 via email

@karalabe
Member

karalabe commented Feb 26, 2017 via email

@karalabe
Member

karalabe commented Feb 26, 2017 via email

@fjl

fjl commented Feb 26, 2017

My suggestion: we can guarantee that there is a newline after each JSON object on the wire. This makes parsing easy: just read lines into a buffer until the parser says that the buffer is valid JSON.

This would be slow for big values spread over many lines (it's quadratic in the number of lines) but we never generate multi-line values so it's not an issue in practice. It's also backwards-compatible.
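A sketch of this read-lines-until-valid loop (function names are illustrative; the EOF handling is an added assumption):

```python
import io
import json

def write_message(stream, message: dict) -> None:
    # json.dumps with default settings emits a single line (embedded
    # newlines inside strings are escaped), so "\n" is a safe terminator.
    stream.write(json.dumps(message).encode("utf-8") + b"\n")

def read_message(stream) -> dict:
    # Read lines into a buffer until the buffer parses as valid JSON.
    # Re-parsing on every line is the quadratic worst case noted above,
    # but single-line writers make the first attempt succeed.
    buf = b""
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("connection closed mid-message")
        buf += line
        try:
            return json.loads(buf)
        except json.JSONDecodeError:
            continue
```

Because writers emit one object per line, the happy path parses on the first iteration; the loop only matters for a multi-line value from a legacy peer.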

@fjl

fjl commented Feb 26, 2017

As a matter of fact, go-ethereum already does newline-separation (just checked with netcat).

@bas-vk
Member

bas-vk commented Feb 27, 2017

I'm against extending the protocol with mandatory newlines between JSON messages. The reason is that to properly implement the protocol you would need to check for these newlines. In practice that means in most cases you cannot use standard streaming encoders/decoders, since the point of using them is that you don't need these kinds of separators. They will simply ignore the newlines and accept an endless stream of JSON messages without them.

As an example, for geth it means that we need to implement a custom json decoder that ensures that requests are newline separated. Otherwise geth accepts invalid requests.

@fjl

fjl commented Feb 27, 2017

Not sure I agree. My suggestion was that newlines should be mandatory when writing JSON objects to the wire. There is no need to check that newlines are present when using a streaming decoder.

@axic
Member Author

axic commented Jan 6, 2018

I think the tradeoff suggested by @fjl is the best, though it should be extended with a strict rule that unescaped newlines cannot be present in the JSON. This effectively means that pretty-printed JSON must not be transmitted over RPC.

Are these two acceptable rules for geth, @karalabe @bas-vk ?

It seems parity is doing that already according to openethereum/parity-ethereum#4647 (comment).
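To illustrate the no-unescaped-newlines rule: a compact serializer already satisfies it, because newlines inside strings are escaped to the two characters `\` and `n`, while pretty-printing introduces raw newlines and would violate it (the payload below is a made-up example):

```python
import json

payload = {"jsonrpc": "2.0", "id": 1, "result": "line one\nline two"}

compact = json.dumps(payload)           # single-line wire form
pretty = json.dumps(payload, indent=2)  # pretty-printed: not wire-safe

# The compact form contains no raw newline characters, even though the
# result string itself holds one; the pretty form does.
assert "\n" not in compact
assert "\n" in pretty
```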
