Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(json): don't emit defaut fields #65

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

librolibro
Copy link
Contributor

@librolibro librolibro commented Aug 6, 2024

Closes #64

For now there are failing tests, proper implementation is in progress)

P.S. If I looked carefully into generated structs - all fields (expect ArrayLists which should be empty ones after MessageMixins.init) have default values (even proto2 defaults are supported), so it shouldn't be a hard task

Since protobuf specs tells that omitting fields without presence should
be a default behavior we don't need *fillDefaultStructValues* call.
All we need is to call MessageMixins.init to properly initialize all
generated struct fields (including ArrayList ones)
@librolibro
Copy link
Contributor Author

librolibro commented Aug 7, 2024

  • We should not emit empty repeated fields
  • We should not emit empty string
  • We should not emit empty byte arrays (bytes type)
  • Check if negative zero is ALWAYS emitting (see here)
  • .OneOf's with default values is ALWAYS emitting
// Wrong
{
  "oneof_field": {}
}

// Even more wrong
{}

// Right
{
  "oneof_field": {
    "active_oneof_field": 0
  }
}

Looks like everything is working as expected but I'd like to add a bit more tests to be sure

P.S. I completely forgot about proto2 defaults... Need to reimplement need_to_emit_numeric function

@librolibro
Copy link
Contributor Author

librolibro commented Aug 8, 2024

I read the proto2 specification several times and I still don't understand that's default field values for:

  • Optional field with default value: what's the difference between this one and optional field without default value (in case of deserialization)? If I understand correctly, any value != null should be emitted both in protobuf bytes/JSON string serialization (e.g. zeros, empty strings/bytes etc.). I understand that default value will be the initial value of field after pb_init() - is there any meaning for it other than that? When value is absent in serialized bytes/JSON string - does it mean than value==null and that field was not set? Or it should it be equal to default value - in this case, wth is the purpose of this if I can't signal that value is absent because when it's not on the wire it will not be equal to null but to default?
  • Required field with default value: there is different logic I suppose, e.g. if numeric field with default=5 exists than if will not be serialized if value=5 (but 0 should be serialized), and when deserializing if value is absent it will be equal to 5 (because pb_init set it and there were no changes)

Am I understanding that right?

@Arwalk
Copy link
Owner

Arwalk commented Aug 8, 2024

I'll look into details when I can, but don't bother too much with proto2 options. This library mostly focuses on proto3.

Thanks again for everything.

@librolibro
Copy link
Contributor Author

librolibro commented Aug 11, 2024

I wrote this simple Python code to check how they're handling these optional proto2 fields (took tests/protos_for_test/jspb.proto definition):

import sys

from google.protobuf.json_format import MessageToJson
from google.protobuf.json_format import Parse as JsonToMessage
from google.protobuf.message import Message

from jspb_pb2 import DefaultValues

if sys.stdout.isatty():
    ansi_magenta = "\x1b[1;35m"
    ansi_reset = "\x1b[0m"
else:
    ansi_magenta = ansi_reset = ""


def print_header(s: str) -> None:
    print("-" * len(s))
    print(ansi_magenta, s, ansi_reset, sep="")


def print_msg(msg: Message, indent: int = 0) -> None:
    msg.DESCRIPTOR.fields
    for field_descriptor in msg.DESCRIPTOR.fields:
        field_name = field_descriptor.name
        print(field_name, ":", sep="", end=" ")
        if isinstance(field_name, Message):
            print_msg(field_name, indent=indent + 2)
        else:
            print(getattr(msg, field_name))
    print()


def print_msg_jsons(msg: Message) -> None:
    for arg in (True, False):
        print("always_print_fields_with_no_presence = ", arg, ":", sep="")
        print(MessageToJson(msg, always_print_fields_with_no_presence=arg, indent=2))
    print()


def print_all(msg: Message) -> None:
    print_msg(msg)
    print_msg_jsons(msg)
    pb_msg = msg.SerializeToString()
    print("Pb serialized message length:", len(pb_msg), end="")
    if pb_msg:
        print(", message itself:", *map("{0:02X}".format, pb_msg))
    else:
        print()


def main_encode() -> None:
    msg = DefaultValues()
    print_header("No arguments were passed to the constructor")
    print_all(msg)

    print_header("Setting *int_field* = default (11)")
    msg.int_field = 11
    print_all(msg)

    print_header("Setting *int_field* = 10")
    msg.int_field = 10
    print_all(msg)


def main_decode() -> None:
    json_string1 = "{}"
    msg = JsonToMessage(json_string1, DefaultValues())
    print_header(f"Parse from {json_string1}")
    print_all(msg)

    json_string2 = '{"intField": 11}'
    msg = JsonToMessage(json_string2, DefaultValues())
    print_header(f"Parse from {json_string2}")
    print_all(msg)


    json_string3 = '{"intField": 10}'
    msg = JsonToMessage(json_string3, DefaultValues())
    print_header(f"Parse from {json_string3}")
    print_all(msg)


if __name__ == "__main__":
    main_encode()
    main_decode()

There is the output:

-------------------------------------------
No arguments were passed to the constructor
string_field: default<>'"abc
bool_field: True
int_field: 11
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{}
always_print_fields_with_no_presence = False:
{}

Pb serialized message length: 0
----------------------------------
Setting *int_field* = default (11)
string_field: default<>'"abc
bool_field: True
int_field: 11
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{
  "intField": "11"
}
always_print_fields_with_no_presence = False:
{
  "intField": "11"
}

Pb serialized message length: 2, message itself: 18 0B
------------------------
Setting *int_field* = 10
string_field: default<>'"abc
bool_field: True
int_field: 10
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{
  "intField": "10"
}
always_print_fields_with_no_presence = False:
{
  "intField": "10"
}

Pb serialized message length: 2, message itself: 18 0A
-------------
Parse from {}
string_field: default<>'"abc
bool_field: True
int_field: 11
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{}
always_print_fields_with_no_presence = False:
{}

Pb serialized message length: 0
---------------------------
Parse from {"intField": 11}
string_field: default<>'"abc
bool_field: True
int_field: 11
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{
  "intField": "11"
}
always_print_fields_with_no_presence = False:
{
  "intField": "11"
}

Pb serialized message length: 2, message itself: 18 0B
---------------------------
Parse from {"intField": 10}
string_field: default<>'"abc
bool_field: True
int_field: 10
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{
  "intField": "10"
}
always_print_fields_with_no_presence = False:
{
  "intField": "10"
}

Pb serialized message length: 2, message itself: 18 0A

It confuses me even more, I just don't understand the logic behind it: if the optional field is not explicitly set in constructor then it's equal to default value but it will not be serialized, but if I explicitly set this field to the default value it will be serialized? Why?

Also, before I wrote this simple test I expected proto2 optional fields to have a type "Something | None" (if you're not familiar with Python, it's kind of a .Union but just for typing) but there was just "Something". Maybe we also don't need to make these fields as .Optional's since even builtin protoc parsers like Python's doesn't make it that way? Another benefit could be easier field access without extra .? thing.

While you're saying that this project is mainly aiming at proto3 support, proto2 support shouldn't be that hard (in case of JSON (de)serialization at least) but these misunderstanding are irritating and prevent this from happening.

@Arwalk
Copy link
Owner

Arwalk commented Aug 11, 2024

Hey there, i understand your frustration. And i think i found the source of it!

It's actually not your fault that you can't find a way to make this work with proto2. Even google doesn't!

image

The language guide for proto2 and proto3 have respectively both a section for json options, here for proto 2 and there for proto 3. And as you can see in the screenshot, google itself is explicitely not compliant for proto 2!

So don't hurt yourself too much, do the same thing as python's implementation. This is the official implementation after all, if anybody complains, we'll point them to this.

Remember that this library is at the start a personal project of mine that happened to become the first viable (to my knowledge) implementation of protobuf for zig.

As a contributor i expect you to mostly do things for fun, or personal interest in making it work for your use cases.

Take care friend, don't hesitake to take a break.

I haven't looked into all the details of what you did yet, i'll check it out if I find some time for it.

@Arwalk
Copy link
Owner

Arwalk commented Sep 15, 2024

hey there @librolibro , where are you on this? Do you have some plans with it?

I have no desire to force you to finish it if you do not want to. If you want to leave it as is, for any reason whatsoever, just tell me.

Thank you for your work already.

@librolibro
Copy link
Contributor Author

Yeah, many things happened and I almost forgot about this project :) I'll take a look on this week, need to remember what did I do and why did I stuck.

I also remember that I wanted to implement emittion options (camel or snake case, emit default value or not etc.) - I still remember the idea I had for that so I'll also take a look on this.

I just switched my focus to more interesting and much less useful project - and I almost forgot why did I want JSON serialization in the first place, thanks for reminding me :)

@Arwalk
Copy link
Owner

Arwalk commented Sep 16, 2024

It's OK. Take care.

@librolibro
Copy link
Contributor Author

librolibro commented Sep 17, 2024

I'd like to ask you (something I mentioned in previous messages) before I'll read some code: do we really need .Union's in generated code? If optional/required fields are only about presence when emitting (to make pb messages or JSON strings more compact) then there's no need in this. I could skip something important but for now I think they're useless :) For the end-users it's also better to not have these optionals because you'll not need to type extra .? over and over again.

Back again to Python's implementation which I'm familiar with - I used protoc to generate Python code from all proto files I found in this project (with optionals and without, both proto2 and proto3) and these were no optionals in generated code (I didn't find Optional[Something] or Something | None in generated *.pyi files) - yes, constructors may accept None for most fields but after initialization these values will have the target type (and default values).

P.S. For people already using this project it will be a breaking change for sure.

P.P.S. Yeah, looks like I was wrong ... For proto2-required fields if it's null then field wasn't initialized and should fail on serializing. And for proto3-optional fields if it's null then field wasn't initialized but shouldn't fail on serializing (default will be used). By this logic proto3-optional may be null and it will be serialized by uisng default value, but in code you probably want to get default value when accessing this uninitialized field and not useless null. Looking at the C++ generated code they're not accessing fields directly but using methods like set_name and get_name - and if I understand correctly get_name will return the default value if the field wasn't initialized.

@Arwalk
Copy link
Owner

Arwalk commented Sep 18, 2024

I'd like to ask you (something I mentioned in previous messages) before I'll read some code: do we really need .Union's in generated code?

Unions here are not used for optionals but for oneOf fields.

If optional/required fields are only about presence when emitting (to make pb messages or JSON strings more compact) then there's no need in this. I could skip something important but for now I think they're useless :) For the end-users it's also better to not have these optionals because you'll not need to type extra .? over and over again.

The subject is a bit more complicated, and the concept of "optional value" in zig do not corresponds to the concept of "optional field" in proto3.

Here is the excerpt from the spec for proto 3:

image

What can be understood here is that:

  • default field with no labels are closer to the concept of "optional value" in zig: when parsing, either this field is present, or it is not.
  • fields with optional are either set to a specific value, or to their default value.

In other programming languages, this would be the difference between a nullable parameter, and a parameter with a default value. Example in python:

def message(field_with_no_label : Optional[Int], field_with_optional_label = 1 : Int):
    pass

The truth is that we miss the "you can check to see if the value was explicitly set" in our implementation so far, but this will have to be through an optional at some point, so i think we'll have to keep them.

P.P.S. blahblah

This repository's aim is to support proto3, do not bother with proto2. I'd invite you to not even look at the spec to avoid confusing yourself.

Thanks a lot, at no point I meant to be harsh in my comments here but it's sometimes hard to express things in a friendly manner through text only. I'm thankful for your interest and work in many ways.

for numerics and strings/bytes if it's not null
@librolibro
Copy link
Contributor Author

librolibro commented Sep 19, 2024

For now I made that if struct field has the default value (no matter if it's proto2 or proto3 default created by code generation) than I will compare with it - if default is not present (or null) then I respect proto3 specs. If field's value is null then it will not be serialized by any means (that mean than value wasn't initialized and shouldn't be in JSON).

Earlier I rarely used protobuf, mostly Python's implementation, and even after working at this project I think I'm still missing some key points - so sorry for bothering you about that technical details (it's definitely less confusing for me now :)).

After all proto2 is somehow supported but without any guarantees.

Unions here are not used for optionals but for oneOf fields.

I meant .Optional's here. Sometimes both the language barrier and my inattention to some details prevent me from properly checking whether I wrote what I wanted to say or not :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JSON: don't emit fields with default values (at least by default)
2 participants