Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement DNS over QUIC #18

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

roopeshsn
Copy link

@roopeshsn roopeshsn commented Dec 26, 2024

Implemented DNS over QUIC.

Part of rbmk-project/issues#3

@roopeshsn
Copy link
Author

Hi @bassosimone! I made changes as per RFC 9250. Could you test once?

One more change is pending. The edns-tcp-keepalive option (section 5.5.2). This option should not be set. If we receive this option as an argument in NewQuery function we need to handle it.

@bassosimone
Copy link
Member

bassosimone commented Dec 27, 2024

Nice, it seems the code is now working as intended with dns0.eu! I would like to suggest we collect some more DoQ servers and check whether the code is also working as intended with them (the intent being to have more data points).

@bassosimone
Copy link
Member

More specifically, I think there are two criteria we should meet for merging an MVP of DNS over QUIC:

  • the code works with 3 public DoQ servers
  • the code feels okay with respect to the RFC

I will allocate some time to study the RFC and review the code over the weekend!

Thank you for working on DoQ, @roopeshsn! 🙏 ✨

@roopeshsn
Copy link
Author

I tried to resolve the following domain,

var (
	serverAddr = flag.String("server", "dns0.eu:853", "DNS server address")
	domain     = flag.String("domain", "www.roopeshsn.com", "Domain to query")
	qtype      = flag.String("type", "A", "Query type (A, AAAA, CNAME, etc.)")
	protocol   = flag.String("protocol", "doq", "DNS protocol (udp, tcp, dot, doh)")
)

I got the following error,

roopesh@Roopeshs-MacBook-Pro dnscore % go run internal/cmd/transport/main.go
;; Query:
;; opcode: QUERY, status: NOERROR, id: 19161
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;www.roopeshsn.com.     IN       A

{"time":"2024-12-27T15:36:21.801539+05:30","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABA3d3dwlyb29wZXNoc24DY29tAAABAAEAACkQAAAAAAAAAA==","serverAddr":"dns0.eu:853","serverProtocol":"doq","t":"2024-12-27T15:36:21.801471+05:30","protocol":""}

;; Response:
;; opcode: QUERY, status: NXDOMAIN, id: 0
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232

;; QUESTION SECTION:
;www.roopeshsn.com.     IN       A

;; AUTHORITY SECTION:
roopeshsn.com.  1800    IN      SOA     johnathan.ns.cloudflare.com. dns.cloudflare.com. 2358959324 10000 2400 604800 1800


panic: Try0: no such host

goroutine 1 [running]:
github.com/rbmk-project/common/runtimex.PanicOnError(...)
        /Users/roopesh/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:21
github.com/rbmk-project/common/runtimex.Try0(...)
        /Users/roopesh/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:35
main.main()
        /Users/roopesh/Desktop/projects/dnscore/internal/cmd/transport/main.go:74 +0x57c
exit status 2

I'll look into this.

@roopeshsn
Copy link
Author

More specifically, I think there are two criteria we should meet for merging an MVP of DNS over QUIC:

  • the code works with 3 public DoQ servers
  • the code feels okay with respect to the RFC

I will allocate some time to study the RFC and review the code over the weekend!

Thank you for working on DoQ, @roopeshsn! 🙏 ✨

Thanks to you!

@roopeshsn
Copy link
Author

I tried to resolve the following domain,

var (
	serverAddr = flag.String("server", "dns0.eu:853", "DNS server address")
	domain     = flag.String("domain", "www.roopeshsn.com", "Domain to query")
	qtype      = flag.String("type", "A", "Query type (A, AAAA, CNAME, etc.)")
	protocol   = flag.String("protocol", "doq", "DNS protocol (udp, tcp, dot, doh)")
)

I got the following error,

roopesh@Roopeshs-MacBook-Pro dnscore % go run internal/cmd/transport/main.go
;; Query:
;; opcode: QUERY, status: NOERROR, id: 19161
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;www.roopeshsn.com.     IN       A

{"time":"2024-12-27T15:36:21.801539+05:30","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABA3d3dwlyb29wZXNoc24DY29tAAABAAEAACkQAAAAAAAAAA==","serverAddr":"dns0.eu:853","serverProtocol":"doq","t":"2024-12-27T15:36:21.801471+05:30","protocol":""}

;; Response:
;; opcode: QUERY, status: NXDOMAIN, id: 0
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232

;; QUESTION SECTION:
;www.roopeshsn.com.     IN       A

;; AUTHORITY SECTION:
roopeshsn.com.  1800    IN      SOA     johnathan.ns.cloudflare.com. dns.cloudflare.com. 2358959324 10000 2400 604800 1800


panic: Try0: no such host

goroutine 1 [running]:
github.com/rbmk-project/common/runtimex.PanicOnError(...)
        /Users/roopesh/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:21
github.com/rbmk-project/common/runtimex.Try0(...)
        /Users/roopesh/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:35
main.main()
        /Users/roopesh/Desktop/projects/dnscore/internal/cmd/transport/main.go:74 +0x57c
exit status 2

I'll look into this.

From response.go file, I came to know that if a domain is not valid, then we'll get an error with a suffix "no such host". This is expected, correct? But in this case, roopeshsn.com is valid. So dns0.eu is not able to resolve this domain correct?

@bassosimone
Copy link
Member

bassosimone commented Dec 27, 2024

From response.go file, I came to know that if a domain is not valid, then we'll get an error with a suffix "no such host". This is expected, correct?

Let me try and run the same command you were running with other protocols.

So, I tried using DNS-over-TLS first:

% ./transport -protocol dot -domain www.roopeshsn.com
;; Query:
;; opcode: QUERY, status: NOERROR, id: 48820
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags: do; udp: 4096
; PADDING: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

;; QUESTION SECTION:
;www.roopeshsn.com.	IN	 A

panic: Try1: EOF

goroutine 1 [running]:
github.com/rbmk-project/common/runtimex.PanicOnError(...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:21
github.com/rbmk-project/common/runtimex.Try1[...](...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:40
main.main()
	/Users/simone/src/github.com/rbmk-project/dnscore/internal/cmd/transport/main.go:67 +0x664

It's curious I get an EOF error here. I wonder whether this is the expected result—I'll check more in details over the weekend. (Side note: panicking is not good in general for a command line tool, but this tool us just meant for quick and dirty testing, so panicking is fine in this case.)

Then I tried using DNS-over-UDP:

% ./transport -protocol udp -domain www.roopeshsn.com
;; Query:
;; opcode: QUERY, status: NOERROR, id: 24191
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232

;; QUESTION SECTION:
;www.roopeshsn.com.	IN	 A

{"time":"2024-12-27T13:32:16.363934+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"Xn8BAAABAAAAAAABA3d3dwlyb29wZXNoc24DY29tAAABAAEAACkE0AAAAAAAAA==","serverAddr":"8.8.8.8:53","serverProtocol":"udp","t":"2024-12-27T13:32:16.363844+01:00","protocol":"udp"}
{"time":"2024-12-27T13:32:16.431423+01:00","level":"INFO","msg":"dnsResponse","localAddr":"192.168.1.113:64193","dnsRawQuery":"Xn8BAAABAAAAAAABA3d3dwlyb29wZXNoc24DY29tAAABAAEAACkE0AAAAAAAAA==","dnsRawResponse":"Xn+BgwABAAAAAQABA3d3dwlyb29wZXNoc24DY29tAAABAAHAEAAGAAEAAAcIADQJam9obmF0aGFuAm5zCmNsb3VkZmxhcmXAGgNkbnPAPIya3NwAACcQAAAJYAAJOoAAAAcIAAApAgAAAAAAAAA=","remoteAddr":"8.8.8.8:53","serverAddr":"8.8.8.8:53","serverProtocol":"udp","t0":"2024-12-27T13:32:16.363844+01:00","t":"2024-12-27T13:32:16.4314+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NXDOMAIN, id: 24191
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 512

;; QUESTION SECTION:
;www.roopeshsn.com.	IN	 A

;; AUTHORITY SECTION:
roopeshsn.com.	1800	IN	SOA	johnathan.ns.cloudflare.com. dns.cloudflare.com. 2358959324 10000 2400 604800 1800


panic: Try0: no such host

goroutine 1 [running]:
github.com/rbmk-project/common/runtimex.PanicOnError(...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:21
github.com/rbmk-project/common/runtimex.Try0(...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:35
main.main()
	/Users/simone/src/github.com/rbmk-project/dnscore/internal/cmd/transport/main.go:74 +0x57c

This result seems to suggest www.roopeshsn.com does not exist. Such a result is further confirmed by:

% dig www.roopeshsn.com

; <<>> DiG 9.10.6 <<>> www.roopeshsn.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 13121
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.roopeshsn.com.		IN	A

;; AUTHORITY SECTION:
roopeshsn.com.		1800	IN	SOA	johnathan.ns.cloudflare.com. dns.cloudflare.com. 2358959324 10000 2400 604800 1800

;; Query time: 88 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Fri Dec 27 13:35:47 CET 2024
;; MSG SIZE  rcvd: 110

(See how the answer is NXDOMAIN, which is == no such host.)

But in this case, roopeshsn.com is valid. So dns0.eu is not able to resolve this domain correct?

So, roopeshsn.com seems to be a valid domain:

% ./transport -protocol udp -domain roopeshsn.com
;; Query:
;; opcode: QUERY, status: NOERROR, id: 33460
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

{"time":"2024-12-27T13:37:07.718682+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"grQBAAABAAAAAAABCXJvb3Blc2hzbgNjb20AAAEAAQAAKQTQAAAAAAAA","serverAddr":"8.8.8.8:53","serverProtocol":"udp","t":"2024-12-27T13:37:07.718669+01:00","protocol":"udp"}
{"time":"2024-12-27T13:37:07.775325+01:00","level":"INFO","msg":"dnsResponse","localAddr":"192.168.1.113:51059","dnsRawQuery":"grQBAAABAAAAAAABCXJvb3Blc2hzbgNjb20AAAEAAQAAKQTQAAAAAAAA","dnsRawResponse":"grSBgAABAAIAAAABCXJvb3Blc2hzbgNjb20AAAEAAcAMAAEAAQAAAB0ABGgVYAfADAABAAEAAAAdAASsQ5YdAAApAgAAAAAAAAA=","remoteAddr":"8.8.8.8:53","serverAddr":"8.8.8.8:53","serverProtocol":"udp","t0":"2024-12-27T13:37:07.718669+01:00","t":"2024-12-27T13:37:07.77529+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 33460
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 512

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

;; ANSWER SECTION:
roopeshsn.com.	29	IN	A	104.21.96.7
roopeshsn.com.	29	IN	A	172.67.150.29

However, www.ropeshsn.com does not seem to exist. Is this your personal domain? Is it possible that you have just added the www. subdomain and it has not yet propagated through the DNS system? From my vantage point, only the base domain ropeshsn.com seem to exist. 🤔

Anyway, back onto the DoQ topic, I think the DoQ code is working as intended, since it shows NXDOMAIN where other protocols are showing NXDOMAIN and (through other tests such as the following) returns consistent answers:

% ./transport -protocol doq -domain roopeshsn.com -server dns0.eu:853
;; Query:
;; opcode: QUERY, status: NOERROR, id: 31070
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

{"time":"2024-12-27T13:39:57.268949+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABCXJvb3Blc2hzbgNjb20AAAEAAQAAKRAAAAAAAAAA","serverAddr":"dns0.eu:853","serverProtocol":"doq","t":"2024-12-27T13:39:57.268892+01:00","protocol":""}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 0
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

;; ANSWER SECTION:
roopeshsn.com.	300	IN	A	172.67.150.29
roopeshsn.com.	300	IN	A	104.21.96.7

However, I am still a bit confused by the following:

% ./transport -protocol doq -domain roopeshsn.com -server 1.1.1.1:853
;; Query:
;; opcode: QUERY, status: NOERROR, id: 25618
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

{"time":"2024-12-27T13:40:42.660301+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABCXJvb3Blc2hzbgNjb20AAAEAAQAAKRAAAAAAAAAA","serverAddr":"1.1.1.1:853","serverProtocol":"doq","t":"2024-12-27T13:40:42.660283+01:00","protocol":""}
panic: Try1: timeout: no recent network activity

goroutine 1 [running]:
github.com/rbmk-project/common/runtimex.PanicOnError(...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:21
github.com/rbmk-project/common/runtimex.Try1[...](...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:40
main.main()
	/Users/simone/src/github.com/rbmk-project/dnscore/internal/cmd/transport/main.go:67 +0x664

I would have expected 1.1.1.1 to implement DoQ, but maybe it does not. I wonder which other public DNS servers implement DoQ. I suppose there must be an online list, and I guess trying with a few of them is necessary to ensure the client code you wrote is working as intended.

Copy link
Member

@bassosimone bassosimone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for preparing this initial pull request! I have finally managed to grok the related RFC, which means I am able to provide suggestions regarding improving this work. Please, take a look and let me know what you think! 🙏 🚀

Comment on lines +27 to +35
udpAddr, err := net.ResolveUDPAddr("udp", addr.Address)
if err != nil {
return
}

udpConn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4zero, Port: 0})
if err != nil {
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible here to use net.ListenConfig.ListenPacket instead? My thinking is that, like we do for TCP, I would like (not in this PR) to abstract away the establishment of a connection, which is useful for (a) testing and (b) measurements (by wrapping the returned conn). In light of this, it seems to me net.ListenConfig.ListenPacket is a better API to aim for, given that it combines DNS lookup and connection creation in a single call.

// 4.2.1. DNS Message IDs
// When sending queries over a QUIC connection, the DNS Message ID MUST
// be set to 0.
query.Id = 0
Copy link
Member

@bassosimone bassosimone Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should extend this comment to mention that we're modifying the original query, which means that, after additional refactoring, this code could cause a data race if it is changed so that we rewrite the query.Id in a background goroutine. I see this as documenting and making it explicit what could go wrong with refactoring.

Likewise, I would like a corresponding comment to be present in the *Transport.Query method. The comment should read something like "This function MAY modify the provided query in place. This happens, e.g., with DoH and DoQ, where we set the query.Id to zero. However, we guarantee that this modification will always happen in the same goroutine that called this function, thus avoiding data races."

(The actual comment does not need to be exactly like this, but I wrote this ~quick explainer to clarify what I mean and why I think it's important to document expectations with respect to mutability.)

if err != nil {
return
}
stream.Write(rawQuery)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we are doing here is not fully compliant with the RFC and, in particular, with this requirement laid out in Sect. 4.2.:

In order for multiple responses to be parsed, a 2-octet length field is used in exactly the same way as the 2-octet length field defined for DNS over TCP [RFC1035]. The practical result of this is that the content of each QUIC stream is exactly the same as the content of a TCP connection that would manage exactly one query.

All DNS messages (queries and responses) sent over DoQ connections MUST be encoded as a 2-octet length field followed by the message content as specified in [RFC1035].

We already have code to implement this kind of exchange with the server, in dotcp.go and we're using it to implement DNS over TCP and DNS over TLS. In fact, DNS over TLS is just a tiny function that setups the TLS connection and then invokes the *Transport.queryStream function.

In light of this, I was wondering whether it could be sensible to attempt to use the same code for DNS over QUIC as well. I think doing this would reduce the amount of duplicate code, but there will be some required changes (I am mostly guessing here, since I did not try this out systematically, just skimmed through the code):

  1. We may want to change the queryStream function signature to take a specific interface (maybe named dnsStream?) rather than a net.Conn, and this interface should only specify the functions we need inside queryStream (and using a different interface would also communicate that queryStream accepts data types beyond the classical net.Conn and indeed could also accept a QUIC stream).

  2. The queryStream function should have a way to only close the stream/conn when we're doing DoQ, which means DoQ requires us to complicate the original algorithm by a single if, which seems ~fine.

  3. It seems to me quic.Stream does not have the LocalAddr and RemoteAddr method, so we should probably play composition by creating a new ad-hoc type that uses the quic.Conn or the quic.Stream to implement the required methods of dnsStream (or however we want to call this interface) to ensure the combination of quic.Stream and quic.Conn meets the requirements of the queryStream method.

@roopeshsn
Copy link
Author

Hi @bassosimone! Thanks for taking the time to review my PR! I'll go through your comments and make the changes this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants