llext: Full ARM ELF relocation support #70452
Conversation
Force-pushed from 6a07793 to d57622d
Some minor nits and one big caveat for which I'm hesitant to approve this PR, even though it LGTM from the technical side: I honestly do not know what other rewrites will be needed to avoid future licensing issues.
Force-pushed from 0a104ba to 97446f7
Don't want to cause trouble here with the licensing issue. I tried to rewrite it the Zephyr way as @bjarki-trackunit asked, but it might not be sufficient. Can we ask for a second opinion here? Maybe @nashif?
Force-pushed from 97446f7 to a57b0a9
I think we should rewrite all the relocations in Zephyr style, using bitfields and uints instead of manually doing bit shifting and handling endianness. I have rewritten the relocations properly for the ARM_THM_CALL reloc like this.
It is way easier to read, way safer, and not copying Linux, compared to the rather messy implementation in Linux of the same relocation :)
I've already tried bitfields in a structure in a past project and I've learnt that they shall not be used to map hardware. There is no guarantee that the bits you define will actually be set where you think; it is compiler/implementation dependent. I remember reading it in the GCC documentation but cannot get my hands on it... Here I found something close enough on the web:
Bit shifting/masking may seem messy but it is the only way to go.
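For illustration only, here is a minimal sketch (not code from this PR; the field name and positions are hypothetical) of the explicit mask/shift style being argued for, whose bit positions do not depend on the compiler's implementation-defined bitfield layout:

#include <stdint.h>

#define IMM4_MASK  0x000FU   /* hypothetical 4-bit field ... */
#define IMM4_SHIFT 12        /* ... located at bits 15:12    */

static inline uint16_t imm4_get(uint16_t insn)
{
	return (insn >> IMM4_SHIFT) & IMM4_MASK;
}

static inline uint16_t imm4_set(uint16_t insn, uint16_t val)
{
	insn &= (uint16_t)~(IMM4_MASK << IMM4_SHIFT);          /* clear the field */
	insn |= (uint16_t)((val & IMM4_MASK) << IMM4_SHIFT);   /* insert value    */
	return insn;
}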
I still like that we're adding the relocations that have been missing, particularly the PC-relative jumps. -mlong-calls was never intended to be the end game; the end game is to have normally constructed, statically linked ELFs load. That means a jump table (I think ARM calls them veneers) will almost certainly be needed. This feels like it gets us one step closer, which is great.
Overall I think I'd like to see more tests exercising and showing these relocations at work. I'm happy to see the movw/movt setup; let's get more of those please. I'd also like to see the code better match the docs in how the operation looks. E.g. for R_ARM_ABS32 the operation is defined in the docs as (S + A) | T.
Is there a reason we cannot have effectively that exact operation in C code here, e.g. *op = (sym_addr + addend) | thumb_target;
The closer the code looks to the docs, the easier it is to see that we're doing the right things, imho.
I feel like we're going to need to really take our time and comb over this, given this is the second round where directly copied comments/code were noted.
#define PREL31_LOWER_BOUNDARY ((int32_t)-0x40000000)
#define THM_JUMP_UPPER_BOUNDARY ((int32_t)0xff000000)
#define THM_JUMP_LOWER_BOUNDARY ((int32_t)0x01000000)
#define MASK_V4BX_RM_COND 0xf000000f
Does Zephyr support armv4 at all? What's the use case for the v4bx relocation?
Well, from what I know, newer ARM architectures are based on the previous ones. So, if Zephyr supports v8, I guess it means that it supports v4, 5, 6 and 7 as well.
arch/arm/core/elf.c
Outdated
case R_ARM_ABS32:
	/* Add the addend stored at opaddr to opval */
	opval += *((uint32_t *)opaddr);
	/* fall-through */
Likewise, a stack of cases is clear enough to designate a common branch; no need to comment that it's a fall-through.
/* fall-through */ is for a newer GCC feature: you'll get a warning/error if your 'case' does not have a 'break'. It is meant to force developers to make their code explicit.
That's insane, this is used all over
case X:
case Y:
case Z:
useful_work();
break;
Should absolutely not be warned on?
lol, totally agree. I had the same reaction, but welcome to GCC 7 (it checks for the fallthrough attribute or comments).
So either you add a new warning suppression depending on the GCC version, or you adapt your code now.
Is that true only if the case has a body but no break?
The docs suggest that to be the case
https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wimplicit-fallthrough_003d
Well, I didn't find it in the doc but it's true that examples always have bodies... need to test
I agree, please do remove the fall through comments where there is no body.
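To make the distinction concrete, here is a small self-contained example (hypothetical, not code from this PR) of how GCC 7+'s -Wimplicit-fallthrough behaves:

#include <stdint.h>

void handle(int type, uint32_t *val)
{
	switch (type) {
	case 1:                     /* stacked labels with no statements */
	case 2:                     /* in between: no warning is emitted */
		*val += 4;
		break;
	case 3:
		*val |= 1;          /* a body that then falls into the next */
		/* fall-through */  /* label warns unless marked like this  */
	case 4:
		*val += 4;
		break;
	default:
		break;
	}
}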
	*(uint32_t *)loc |= OPCODE2ARMMEM(MASK_V4BX_NOT_RM_COND);
	break;

case R_ARM_PREL31: |
When is this relocation needed/used? Do we have a test covering it?
In the 5th commit of PR #70171, there are tests.
I'll add more code to trigger most of the relocations and create a smaller PR when/if the previous ones are merged (otherwise they cannot work).
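For context, R_ARM_PREL31 stores a signed 31-bit place-relative offset and is typically emitted for .ARM.exidx unwind-table entries. A rough sketch of applying it (not the PR's implementation; apply_prel31 is a hypothetical helper, and it assumes the PREL31_LOWER_BOUNDARY macro quoted above and a sym_addr that already carries the Thumb bit) could look like:

#include <errno.h>
#include <stdint.h>

static int apply_prel31(uintptr_t loc, uintptr_t sym_addr)
{
	uint32_t insn = *(uint32_t *)loc;
	/* sign-extend the 31-bit addend stored in bits 30..0 */
	int32_t addend = ((int32_t)(insn << 1)) >> 1;
	int64_t offset = (int64_t)sym_addr + addend - (int64_t)loc;

	/* the result must fit in 31 signed bits */
	if (offset < PREL31_LOWER_BOUNDARY || offset > 0x3fffffff) {
		return -ENOEXEC;
	}

	/* keep bit 31 of the original word, replace bits 30..0 */
	*(uint32_t *)loc = (insn & 0x80000000UL) | ((uint32_t)offset & 0x7fffffff);
	return 0;
}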
	opval += *((uint32_t *)opaddr);
	/* fall-through */
case R_ARM_TARGET1:
	*(uint32_t *)loc += sym_base_addr;
Do we need to account for STT_FUNC and Thumb here? The ARM docs suggest this should be (S + A) | T.
It might also be clearer, at least when comparing with the docs, to assign an addend variable here.
e.g. something like...
uint32_t t = (is_func(sym) && is_thumb(loc)) ? 1 : 0;
uint32_t addend = *(uint32_t *)loc;
*(uint32_t *)loc = (sym_addr + addend) | t;
This much more closely matches the ARM docs, which I personally think would be helpful when checking "is this relocation doing what the docs say?"
Update: I get this is what was here before, but since this is rewriting much of it, maybe this is the time to make it look like the docs.
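For illustration, a sketch of what such a doc-style handler could look like (not the PR's code; is_func/is_thumb above were shorthand, so the checks are spelled out here using the standard ELF_ST_TYPE/STT_FUNC definitions, and sym, sym_addr and loc are assumed to be in scope):

/* "(S + A) | T" written out: T is 1 when the target is a Thumb function */
uint32_t t = (ELF_ST_TYPE(sym.st_info) == STT_FUNC) ? (sym_addr & 1) : 0;
uint32_t s = sym_addr & ~1UL;      /* S: symbol address without the Thumb bit */
uint32_t a = *(uint32_t *)loc;     /* A: addend stored at the place           */

*(uint32_t *)loc = (s + a) | t;    /* (S + A) | T                             */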
Here is an answer from my technical expert (what he means is that this is already done, but in several stages):
As in '(S + A) | T', the '| T' does mean that we must set bit 0 if we are in Thumb and the symbol is a function.
This is done by LLEXT while we are computing 'link_addr' in 'llext_link', which becomes 'sym_addr' in 'arch_elf_relocate()':
else if (ELF_ST_TYPE(sym.st_info) == STT_SECTION ||
ELF_ST_TYPE(sym.st_info) == STT_FUNC ||
ELF_ST_TYPE(sym.st_info) == STT_OBJECT) {
/* Link address is relative to the start of the section */
link_addr = (uintptr_t)ext->mem[ldr->sect_map[sym.st_shndx]]
+ sym.st_value;
The internal extension symbols are handled by the '+ sym.st_value' part.
The external ones:
else if (sym.st_shndx == SHN_UNDEF) {
/* If symbol is undefined, then we need to look it up */
link_addr = (uintptr_t)llext_find_sym(NULL, name);
'llext_find_sym()' returns the correct address because it has been registered via the EXPORT_SYMBOL macro, which actually handles bit 0 for Thumb.
Here is an example with the movwmovt.llext test:
For the FUNC-type 'test_func' symbol, 'sym.st_value' equals 1, but 0 for the others. Same for 'test_entry', where 'sym.st_value' is odd.
Force-pushed from a57b0a9 to ed7ffe0
@teburd this might not be the right place to ask such a question, but is there any way to ask the CI/CD (twister) to reset its workspace?
I think the server auto-merges with the target branch. A rebase and a manual fix of this merge "non-conflict" should solve this 🙂
Force-pushed from ed7ffe0 to fc85b55
You're right about 'offset'. I moved it locally from the global switch() into each handler function.
Well, I'd prefer to keep it as uintptr_t since, as you said, it is a global function. It could be used on another architecture where pointers are 64 bits long (a cast will be needed).
Sorry for the delay and thanks for your dedication. Looks good, I've added a few comments below!
arch/arm/core/elf.c
Outdated
case R_ARM_ABS32:
	/* Add the addend stored at opaddr to opval */
	opval += *((uint32_t *)opaddr);
	/* fall-through */
I agree, please do remove the fall through comments where there is no body.
((offset & (MASK_THM_MOV_IMM4<<SHIFT_THM_MOV_IMM4)) >> SHIFT_THM_MOV_IMM4) |
((offset & (MASK_THM_MOV_I<<SHIFT_THM_MOV_I)) >> SHIFT_THM_MOV_I));
... here, with absolute shifts, this could be rewritten as something like
(FIELD_PREP(MASK_THM_MOV_IMM4, offset >> SHIFT_THM_MOV_IMM4) |
(FIELD_PREP(MASK_THM_MOV_I, offset >> SHIFT_THM_MOV_I));
(please double check, I haven't tested this code)
I've tried to address all your requests but I don't get this one. I checked FIELD_PREP()/FIELD_GET() and I got scared :) They contain divisions/multiplications and, in my case, they would be used with a variable, so... I don't know how the compiler could optimize it...
@pillo79 Do we really need to use these macros here?!
Sorry for the delay, missed this reply last week! Not a deal breaker, it was always in the same spirit of using "native" Zephyr tools. These macros use a maths trick (multiplication/division by a power of 2 - LSB_GET returns the lowest bit in the bitmask) to actually shift numbers to/from offset 0; the compiler turns this into a constant shift operation, so it has no runtime cost.
The macros also automatically chop bits outside the mask you give them, so you can avoid some typing.
The current code uses the following numbers:
#define MASK_THM_MOV_I BIT(10)
#define MASK_THM_MOV_IMM4 GENMASK(3, 0)
#define SHIFT_THM_MOV_I 1
#define SHIFT_THM_MOV_IMM4 12
...
((offset & (MASK_THM_MOV_IMM4<<SHIFT_THM_MOV_IMM4)) >> SHIFT_THM_MOV_IMM4) |
((offset & (MASK_THM_MOV_I<<SHIFT_THM_MOV_I)) >> SHIFT_THM_MOV_I)
I would rewrite it as:
#define MASK_THM_MOV_I BIT(10)
#define MASK_THM_MOV_IMM4 GENMASK(3, 0)
#define SHIFT_THM_MOV_I 11 // <--
#define SHIFT_THM_MOV_IMM4 12
...
FIELD_PREP(MASK_THM_MOV_IMM4, offset >> SHIFT_THM_MOV_IMM4) |
FIELD_PREP(MASK_THM_MOV_I, offset >> SHIFT_THM_MOV_I)
Note how the shift for MOV_I is 11, which I find easier to check and maintain than "the position of the mask, plus or minus 1 depending on the direction" (since some have inverted shifts!).
(the one for MOV_IMM4 gets lucky as the mask is zero-based, so the shift does not change 🙂)
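As a quick self-check of that equivalence (a sketch, assuming Zephyr's BIT/GENMASK/FIELD_PREP helpers from <zephyr/sys/util.h>; the function name is made up for illustration):

#include <stdint.h>
#include <zephyr/sys/util.h>

#define MASK_THM_MOV_I     BIT(10)
#define MASK_THM_MOV_IMM4  GENMASK(3, 0)
#define SHIFT_THM_MOV_I    11   /* absolute position of the i bit in offset     */
#define SHIFT_THM_MOV_IMM4 12   /* absolute position of the imm4 bits in offset */

static uint32_t encode_upper(uint32_t offset)
{
	/* FIELD_PREP(mask, val) shifts val into the mask's position and drops
	 * bits outside the mask, so this places offset bits 15:12 into
	 * instruction bits 3:0 and offset bit 11 into instruction bit 10 -
	 * the same result as the explicit mask/shift expression above.
	 */
	return FIELD_PREP(MASK_THM_MOV_IMM4, offset >> SHIFT_THM_MOV_IMM4) |
	       FIELD_PREP(MASK_THM_MOV_I, offset >> SHIFT_THM_MOV_I);
}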
offset = ((upper & MASK_THM_MOV_IMM4) << SHIFT_THM_MOV_IMM4) |
	((upper & MASK_THM_MOV_I) << SHIFT_THM_MOV_I) |
	((lower & MASK_THM_MOV_IMM3) >> SHIFT_THM_MOV_IMM3) | (lower & MASK_THM_MOV_IMM8);
IMHO the way shifts are written in this file really makes them rather arcane - you have to read both the MASK_ and the SHIFT_ symbols and do some mental gymnastics to understand where each term ends up in the final integer.
I wish bit shifts like this were written in Zephyr style, for example like this:
*offset = (FIELD_GET(MASK_THM_MOV_IMM4, upper) << SHIFT_THM_MOV_IMM4) |
	(FIELD_GET(MASK_THM_MOV_I, upper) << SHIFT_THM_MOV_I) |
	(FIELD_GET(MASK_THM_MOV_IMM3, lower) << SHIFT_THM_MOV_IMM3) |
	(FIELD_GET(MASK_THM_MOV_IMM8, lower) << SHIFT_THM_MOV_IMM8);
where now the shifts are absolute and not related to the masks. The advantage is even clearer in the reverse operation below...
Force-pushed from 7a8da39 to af37474
Force-pushed from f36e8fd to 4eb4f35
Looks good to me, thank you for all the hard work
Thank you!
Just wait for one of our release engineers to merge it since it now has all the approvals :) It should happen virtually anytime now!
There's a merge conflict, so it needs to be rebased first.
I used the graphical tool to fix the conflict, hope that is the way to go...
Adds support for all relocation types produced by GCC on the ARM platform using partial linking (-r flag) or shared linking (-fpic and -shared flags). Signed-off-by: Cedric Lescop <cedric.lescop@se.com>
The arch ARM relocate test covers: R_ARM_ABS32 (all tests), decode_thm_jumps (relative jmp extension), decode_thm_movs (movwmovt extension). Signed-off-by: Cedric Lescop <cedric.lescop@se.com>
Force-pushed from 8771564 to da2d3f0
Thanks for the updates! The FIELD_ macros are just a preference, but it's probably either now or never 🤭, so I had to ask!
I tend to agree with @pillo79 on the readability of using the FIELD_PREP() macro. Happy to re-approve if changed to use FIELD_PREP(), and I would prefer this be done. Having said that, this is a lot of great work, and changing to FIELD_PREP may be done immediately as a follow-up or in this PR.
Roger that. Just let me do it in another PR since I still have "a lot" to propose from PR #70171; I'll add it to the list. I need a "small" victory here 😇
No worries, I get it
Hi @selescop! To celebrate this milestone and showcase your contribution, we'd love to award you the Zephyr Technical Contributor badge. If you're interested, please claim your badge by filling out this form: Claim Your Zephyr Badge. Thank you for your valuable input, and we look forward to seeing more of your contributions in the future! 🪁
Adds support for all relocation types produced by GCC on the ARM platform using partial linking (-r flag) or shared linking (-fpic and -shared flags).
Subset of the LLEXT full ARM relocation support (adding executables) PR #70171.