IPC RPC should have a transport layer with payload length #14
Also being discussed at https://github.com/ethcore/parity/issues/4647 |
You can't seriously ignore all of the discussion on that thread and restate that proposal ignoring all the counterarguments and solutions provided there. |
Unless I wasn't aware of that thread. I came across this issue roughly three weeks ago when changing Solidity's |
Ah, I see axic wasn't aware of that thread. Please read it. There shouldn't need to be an extra transport protocol if you have a decent JSON parser. Also, length prefixing requires you to serialize in memory first and send afterwards, which requires more memory and doesn't allow streaming JSON generation and transmission. |
I will go through it (albeit not tonight). |
If it is decided to move forward with this, my vote would be for a length prefix (2^53 or smaller) + sentinel.
The length prefix allows for clients (on both sides of the connection) to read just the length prefix and then wait until they have enough bytes before attempting any decoding. This simplifies the job of the parser significantly, as it can just read a well-defined number of bytes (however long the length is) and then wait for its buffer to fill before continuing on with parsing. This is particularly meaningful if anything beyond 7-bit ASCII is encoded, which JSON-RPC allows for, as dealing with UTF-8 surrogates in an unknown-length stream is a bit of a pain.
The reason I think a sentinel is valuable is that it allows for recovery of the stream if one of the participants puts invalid JSON onto the wire. With WebSockets, a single malformed JSON payload doesn't blow up the entire connection like it will with a Concatenated JSON parser. A simple sentinel would be a null byte, as that isn't allowed inside a JSON string. The problem, of course, is that this will make the wire protocol not entirely made up of valid Concatenated JSON, and that may be desirable for backwards compatibility. I suspect that no matter what enveloping/prefixing is used, clients will need to make changes, so my vote would be to go all the way and have a null byte sentinel between messages (between each message, before the length). The sentinel could either be a prefix to each message or a terminator for each message; I don't believe it matters (though it should be clearly specified which it is).
As for the encoding of the length, I personally prefer a network byte order (big endian) long (8 bytes), as that should be big enough for very far into the future, and since these payloads are JSON, size on the wire is clearly not a concern.
While JSON encoding the length prefix could work, it doesn't alleviate much of the developer pain around building a proper parser, as the developer will still need to build a JSON parser that can parse an undefined-length stream. I suppose the length prefix could be padded JSON to provide a fixed length? For example:
18446744073709551616{ "jsonrpc": "2.0", "id": "1", "method": "..." }
48 { "jsonrpc": "2.0", "id": "1", "method": "abc" }
This would allow Concatenated JSON parsers to see and ignore the length, and simple parsers to read exactly 20 bytes off the wire, parse them as a number, then read that many bytes off the wire to get the message. Netstrings would work, it just means the developer will need to stream-process bytes until it reaches the : delimiter. Not quite as easy/friendly as well-defined string length prefixes.
There may be value in limiting to a 53-bit length, since that is the biggest integer that JavaScript can hold without a bunch of trouble.
All of the above being said, I am a big fan of standardization, and if https://www.simple-is-better.org/json-rpc/transport_sockets.html is picking up any real steam I would get on board with following it. I am a bit disappointed that it isn't self-healing (no sentinel character), though.
A final option would be to go with a sentinel only (null byte). This would allow stream processing on both sender and receiver, yet also allow the receiver to take a much simpler approach of waiting for a null byte and then decoding the entire string from UTF-8 into a language-native string and running the whole thing through a parser in one shot. It does mean that the client will need to iterate over the stream twice in this case (once to wait for the null byte, then again when it parses), but one could engineer around this if they really needed the minor performance benefits that a single-pass parser would provide. |
Whether you post this in three new threads won't make it decided. We've
explained again and again that it's not needed. Stop ignoring those threads.
|
Just for the record, I am not willing to accept a made-up custom protocol.
A JSON stream is a simple thing. If you want an alternative, suggest a
standard and state its advantages. Dreaming up a weird custom hybrid
binary/text protocol will just make the Ethereum APIs even messier than
they currently are.
|
E.g. WebSockets splits up a message into multiple frames. This works around
three of the issues your idea introduces: the doubled memory requirement on
the serialization side; the weird 53-bit length constraint (why should
Ethereum cater its protocol to JavaScript in the first place?); and the
need for magical extra characters.
Also, you state all over the place that your suggestion protects against
data corruption that is impossible to happen, as I have explained many times
already. Stop ignoring that and trying to force a solution to a non-existent
problem.
|
My suggestion: we can then guarantee that there is a newline after each JSON object on the wire. This makes parsing easy: just read lines into a buffer until the parser says that the buffer is valid JSON. This would be slow for big values spread over many lines (it's quadratic in the number of lines), but we never generate multi-line values, so it's not an issue in practice. It's also backwards-compatible. |
As a matter of fact, go-ethereum already does newline-separation (just checked with netcat). |
I'm against extending the protocol with mandatory newlines between JSON messages. The reason is that if you want to properly implement the protocol, you need to check for these newlines. In practice it means that in most cases you cannot use standard streaming encoders/decoders, since the point of using them is that you don't need this kind of separator. They will simply ignore the newlines and accept an endless stream of JSON messages without them. As an example, for geth it means that we would need to implement a custom JSON decoder that ensures requests are newline-separated. Otherwise geth accepts invalid requests. |
Not sure I agree. My suggestion was that newlines should be mandatory when writing JSON objects to the wire. There is no need to check that newlines are present when using a streaming decoder. |
The tradeoff suggested by @fjl is, I think, the best, although it should be extended with the strict rule that unescaped newlines cannot be present in the JSON. This basically ensures that pretty-printed JSON shouldn't be transmitted over RPC. Are these two acceptable rules for geth, @karalabe @bas-vk? It seems Parity is doing that already, according to openethereum/parity-ethereum#4647 (comment). |
JSON-RPC doesn't specify a transport layer. It relies on the transport protocol (HTTP) to do that. This however is missing in certain use cases, such as Unix socket based IPC.
There is no consistent way defined between Ethereum clients to deal with this:
- geth uses a streaming JSON decoder which can detect the end of the input
- soltest (Solidity) looks for the first newline (\n), which is obviously dangerous
- cpp-ethereum waits for a big enough input
There are some proposals to define a transport layer:
Since there is no widely adopted standard, I propose to have a single length field preceding the JSON request and the response.
For the encoding I propose to use either:
- [1234] - this is valid JSON too and it is easy to search for the closing bracket (])
Choosing a JSON-compatible encoding would mean that clients could support backwards compatibility by reading another JSON message if the first message is not JSON-RPC (e.g. it is a single number in an array, as above).