gosym can provide the wrong address for funcs in cgo go binary #76

Closed
Zxilly opened this issue Jan 23, 2024 · 14 comments · Fixed by #90

Comments

@Zxilly
Contributor

Zxilly commented Jan 23, 2024

As described in golang/go#65232, gosym.NewLineTable should be called with the value of the runtime.text symbol rather than the start of the .text segment, which is what gore uses right now.

I have no idea how to implement a search for runtime.text in a stripped binary, but we can at least show a warning for now, such as "the function offsets can be incorrect in a cgo binary".


@TcM1911
Member

TcM1911 commented Jan 29, 2024

I think I implemented a fix for this in redress. We should be able to just take the start-of-text address from the moduledata structure.

@Zxilly
Contributor Author

Zxilly commented Jan 29, 2024

Maybe the same logic can also be applied to pclntab

@TcM1911
Member

TcM1911 commented Jan 29, 2024

I think I have an idea for how we can solve this. First we need to find the moduledata structure. This we can do without knowing which compiler version was used to compile the binary. While we need to know the compiler version to parse the structure correctly, I think we can check offsets for expected data on the fly as long as we do it in the right order.

So we read the offset where TEXT and ETEXT should be and check whether the values fall within the .text section of the file. We can also "verify" that we read the right address by checking, for example, the start and end of data as well. If the address is not in the expected range, we try the offset used by other compiler versions. Once we find the right offset, we compare the value to the start address of the text section found in the header. If it's different, we use the value from the moduledata as the image base.

Do you have a test binary that can be used for testing the implementation?

@Zxilly
Contributor Author

Zxilly commented Jan 29, 2024

My test data is at https://github.com/Zxilly/go-testdata. But I think the offset may not be a fixed value. As a Go team member said:

For a pure Go binary, we put it at the start of the text section and text segment. But for a cgo binary, the C linker doesn't necessarily to do so. It can put some C functions in the text section before the runtime.text symbol.

The offset can change a lot when developers statically link a C library.

@TcM1911
Member

TcM1911 commented Jan 29, 2024

@Zxilly
Contributor Author

Zxilly commented Jan 30, 2024

I read some other reverse-engineering implementations[1][2]. Maybe we can search for moduledata based on a magic number, then validate the addresses one by one?

Footnotes

  1. https://github.com/0xjiayu/go_parser/blob/master/moduledata.py

  2. https://github.com/pnfsoftware/jeb-golang-analyzer/blob/master/Commons.py

@TcM1911
Member

TcM1911 commented Jan 30, 2024

I would first find the pclntab based on its magic bytes. Get the virtual address of that offset. Search for the offset in the binary to find the moduledata since it is the first field in the structure.

This is pretty much how these two structures are found in PE files already.

@Zxilly
Contributor Author

Zxilly commented Jan 30, 2024

Validating moduledata is easy if we find it first. We can use the text and etext addresses and the pclntab data for validation. For text and etext, we can check them against the .text section, and for pclntab, we can verify the magic number at the start.

However, if we find pclntab first, we have to build gosym.Table twice because we don't have the accurate pc base for it during the first search. These are my suggestions.
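The pclntab check mentioned above can be sketched as follows. The magic values are the ones defined in the standard library's debug/gosym package; the function name `validPclntabHeader` is hypothetical, and the sketch assumes a little-endian binary.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pclntab magic values as defined in debug/gosym.
var pclntabMagics = []uint32{
	0xfffffff1, // introduced in Go 1.20
	0xfffffff0, // introduced in Go 1.18
	0xfffffffa, // introduced in Go 1.16
	0xfffffffb, // introduced in Go 1.2
}

// validPclntabHeader reports whether b starts with a plausible pclntab
// header: a known magic, two zero bytes, a pc quantum of 1, 2 or 4, and a
// pointer size of 4 or 8.
func validPclntabHeader(b []byte) bool {
	if len(b) < 16 {
		return false
	}
	magic := binary.LittleEndian.Uint32(b)
	known := false
	for _, m := range pclntabMagics {
		if m == magic {
			known = true
			break
		}
	}
	if !known || b[4] != 0 || b[5] != 0 {
		return false
	}
	quantum, ptrSize := b[6], b[7]
	return (quantum == 1 || quantum == 2 || quantum == 4) && (ptrSize == 4 || ptrSize == 8)
}

func main() {
	hdr := []byte{0xf1, 0xff, 0xff, 0xff, 0, 0, 1, 8, 0, 0, 0, 0, 0, 0, 0, 0}
	fmt.Println("header looks valid:", validPclntabHeader(hdr))
}
```

A moduledata candidate would pass this check when the bytes at the address stored in its first field form such a header.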

@Zxilly
Contributor Author

Zxilly commented Jan 30, 2024

We can create a pair of magic numbers for moduledata and pclntab. They should match each other within the same Go version.

@Zxilly
Contributor Author

Zxilly commented Jan 31, 2024

I will rewrite the associated PR using an implementation based on the moduledata search. Please let me know if you have any ideas.

@TcM1911
Member

TcM1911 commented Jan 31, 2024

Validating moduledata is easy if we find it first. We can use the text and etext addresses and the pclntab data for validation. For text and etext, we can check them against the .text section, and for pclntab, we can verify the magic number at the start.

However, if we find pclntab first, we have to build gosym.Table twice because we don't have the accurate pc base for it during the first search. These are my suggestions.

We don't need to parse the pclntab twice, we just need the address of the structure so it can be used to find the moduledata structure. After we have the moduledata, we can parse the pclntab with the correct text start.

@Zxilly
Contributor Author

Zxilly commented Jan 31, 2024

Can you explain more about "Search for the offset in the binary to find the moduledata since it is the first field in the structure"?

I found moduledata defined in the Go source like this:

type moduledata struct {
	sys.NotInHeap // Only in static data

	pcHeader     *pcHeader
	funcnametab  []byte
	cutab        []uint32
	filetab      []byte
	pctab        []byte
	pclntable    []byte
	ftab         []functab
	findfunctab  uintptr
	minpc, maxpc uintptr

	text, etext           uintptr
	noptrdata, enoptrdata uintptr
	data, edata           uintptr
	bss, ebss             uintptr
	noptrbss, enoptrbss   uintptr
	covctrs, ecovctrs     uintptr
	end, gcdata, gcbss    uintptr
	types, etypes         uintptr
	rodata                uintptr
	gofunc                uintptr // go.func.*

	textsectmap []textsect
	typelinks   []int32 // offsets from types
	itablinks   []*itab

	ptab []ptabEntry

	pluginpath string
	pkghashes  []modulehash

	// This slice records the initializing tasks that need to be
	// done to start up the program. It is built by the linker.
	inittasks []*initTask

	modulename   string
	modulehashes []modulehash

	hasmain uint8 // 1 if module contains the main function, 0 otherwise

	gcdatamask, gcbssmask bitvector

	typemap map[typeOff]*_type // offset to *_rtype in previous module

	bad bool // module failed to load and should be ignored

	next *moduledata
}

Does only this part persist in the binary?

	text, etext           uintptr
	noptrdata, enoptrdata uintptr
	data, edata           uintptr
	bss, ebss             uintptr
	noptrbss, enoptrbss   uintptr
	covctrs, ecovctrs     uintptr
	end, gcdata, gcbss    uintptr
	types, etypes         uintptr
	rodata                uintptr
	gofunc                uintptr // go.func.*

@TcM1911
Member

TcM1911 commented Jan 31, 2024

The whole structure is present in the binary. Check the different structures in the moduledata.go file. The size is different depending on 32 vs 64 bit and Go version. The logic used to find it is:

func findModuledata(f fileHandler) ([]byte, error) {
	_, secData, err := f.getSectionData(f.moduledataSection())
	if err != nil {
		return nil, err
	}
	// Get the virtual address of the PCLNTAB.
	tabAddr, _, err := f.getPCLNTABData()
	if err != nil {
		return nil, err
	}

	// Search for moduledata using the address to the PCLNTAB. The match will be hit on `*pcHeader` which is the first field.
	buf := new(bytes.Buffer)
	err = binary.Write(buf, binary.LittleEndian, &tabAddr)
	if err != nil {
		return nil, err
	}
	off := bytes.Index(secData, buf.Bytes()[:intSize32])
	if off == -1 {
		return nil, errors.New("could not find moduledata")
	}
	// TODO: Verify that hit is correct.

	return secData[off : off+0x300], nil
}

For PE files gore searches the .rdata and .text sections for the PCLNTAB using this logic:

func searchSectionForTab(secData []byte) ([]byte, error) {
	// First check for the current magic used. If this fails, it could be
	// an older version. So check for the old header.
MAGIC_LOOP:
	for _, magic := range [][]byte{pclntab120magic, pclntab118magic, pclntab116magic, pclntab12magic} {
		off := bytes.LastIndex(secData, magic)
		if off == -1 {
			continue // Try other magic.
		}
		for off != -1 {
			if off != 0 {
				buf := secData[off:]
				if len(buf) < 16 || buf[4] != 0 || buf[5] != 0 ||
					(buf[6] != 1 && buf[6] != 2 && buf[6] != 4) || // pc quantum
					(buf[7] != 4 && buf[7] != 8) { // pointer size
					// Header doesn't match.
					if off-1 <= 0 {
						continue MAGIC_LOOP
					}
					off = bytes.LastIndex(secData[:off-1], magic)
					continue
				}
				// Header match
				return secData[off:], nil
			}
			break
		}
	}
	return nil, ErrNoPCLNTab
}

The address of the PCLNTAB is calculated using: addr := sec.VirtualAddress + uint32(len(secData)-len(tab)). Where tab[0] is the starting byte of the PCLNTAB. The secData is a []byte of the section. tab is a sub-slice of secData starting at the PCLNTAB plus the tail of the section. len(secData)-len(tab) calculates the PCLNTAB offset in the section. The calculated virtual address is the pcHeader field in the moduledata structure.

I may get a chance to code this up this weekend. Otherwise, if you want to take a stab at it, please go ahead.

Essentially, I would add this logic to ELF and MachO files as a fallback if the named sections don't exist.

@Zxilly
Contributor Author

Zxilly commented Feb 5, 2024

I can port this logic to ELF and Mach-O, but before we can do that we need to merge #77.
