Analyse .rodata/go:embed content #44

Open
gudvinr opened this issue May 15, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@gudvinr

gudvinr commented May 15, 2024

go:embed stores its data in the .rodata section of the binary.

I am not sure whether it's possible to extract all of the content of .rodata, but it would be useful to at least have some idea about the embedded content.

For example, lingua-go embeds a tremendous amount of data, so .rodata takes up ~100MB of the file.

Information on the exact data structure is rather sparse, but the problem is somewhat simpler because we know what does the embedding (the embed package: https://pkg.go.dev/embed).
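
To make the data structure a little more concrete, here is a minimal sketch of how the embed package appears to represent embedded files internally. The `embedFile` and `embedFS` types below are hand-written mirrors of the package's unexported types as I read them in the standard library source; the layout is an assumption and may differ between Go versions. `payloadSize` is a hypothetical helper, not part of any library.

```go
package main

import "fmt"

// embedFile mirrors my reading of the unexported `file` struct in the
// standard embed package. Assumption: this layout is an implementation
// detail and may change between Go versions.
type embedFile struct {
	name string   // slash-separated path; directory entries end in "/"
	data string   // file contents (empty for directories)
	hash [16]byte // truncated content hash
}

// embedFS mirrors embed.FS, which holds a pointer to a list of files.
type embedFS struct {
	files *[]embedFile
}

// payloadSize is a hypothetical helper that sums the bytes of file
// contents referenced by one embedded file system.
func payloadSize(fs embedFS) int {
	if fs.files == nil {
		return 0
	}
	total := 0
	for _, f := range *fs.files {
		total += len(f.data)
	}
	return total
}

func main() {
	files := []embedFile{
		{name: "assets/"},
		{name: "assets/hello.txt", data: "hello, embed"},
	}
	fmt.Println(payloadSize(embedFS{files: &files}))
}
```

If this reading is right, each embedded file is ultimately a name string and a data string whose backing bytes live in read-only data, which is why a tool would have to find and follow these records to attribute .rodata bytes to individual files.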

See also:

@Zxilly Zxilly added the enhancement New feature or request label May 15, 2024
@Zxilly
Owner

Zxilly commented May 15, 2024

This is certainly possible, and in fact the existing methods based on decompilation already recognise some of this content.
However, when processing the decompilation results, I discarded anything that could not be recognised as a string, because false positives can be very disruptive.
There is an additional difficulty: gsa currently supports three binary formats (PE, Mach-O and ELF), and writing a parser for each format, or even for each Go version, might be too much work.
I would expect a DWARF-based parser to handle this; after all, at runtime the embedded content is just a sequence of bytes. Go uses DWARF on all platforms, including PE, so the workload is relatively acceptable.
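
As a rough illustration of the three-format problem, the sketch below tells PE, Mach-O and ELF apart by their leading magic bytes, which is the kind of check that would come before dispatching to `debug/pe`, `debug/macho` or `debug/elf`. This is not gsa's implementation; `detectFormat` is a hypothetical helper written only for this example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// detectFormat is a hypothetical helper that guesses the container
// format of a binary from its first four bytes.
func detectFormat(header []byte) string {
	if len(header) < 4 {
		return "unknown"
	}
	if bytes.HasPrefix(header, []byte("\x7fELF")) {
		return "elf"
	}
	if bytes.HasPrefix(header, []byte("MZ")) {
		return "pe"
	}
	// Mach-O magic numbers, read as big-endian from the raw bytes:
	// 32/64-bit in both byte orders, plus the universal (fat) header.
	// Note that 0xcafebabe is also used by Java class files.
	switch binary.BigEndian.Uint32(header) {
	case 0xfeedface, 0xfeedfacf, 0xcefaedfe, 0xcffaedfe, 0xcafebabe:
		return "macho"
	}
	return "unknown"
}

func main() {
	samples := [][]byte{
		[]byte("\x7fELF\x02\x01\x01"),
		[]byte("MZ\x90\x00"),
		{0xcf, 0xfa, 0xed, 0xfe},
		[]byte("#!/bin/sh"),
	}
	for _, s := range samples {
		fmt.Println(detectFormat(s))
	}
}
```

Detection is the easy part; the cost discussed above comes from locating and decoding the embedded data inside each format once it is opened.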

@Zxilly
Owner

Zxilly commented Jun 9, 2024

DWARF doesn't contain information about embeds, so we may still need some reverse engineering work.
It may therefore be a long wait before this feature is implemented.

@Zxilly
Owner

Zxilly commented Jun 15, 2024

Please try v1.3.0. It has initial support for parsing embeds.
You must compile with debug symbols to enable this feature.
image

@Zxilly
Owner

Zxilly commented Jun 15, 2024

Keeping this issue open for now, as the new implementation is based on reverse engineering and some assumptions, and it's not certain that the code will handle all real-world situations correctly. Feedback is welcome so that it can be fixed further.

@gudvinr
Author

gudvinr commented Jun 17, 2024

I see that for my test binary the unknown .rodata size has dropped from ~9MB to ~3MB. Good work, thanks.


2 participants