
Problem when learning limitations of "ldr"

Status
Closed to new replies.

jiangzhengwenjz

Mythical user
JPAN said in his guide:
You may have noticed something odd, but I’ll point it out anyway. Notice that I load the labels always separated by two, or multiples of two. That is because the Load Label operation can only fetch the addresses that are four bytes away from them, or multiples of four. That is because ldr is a 32-bit word-fetching operation, and when added to the PC offset, the offset is shifted by two before adding to the address. That was made to make people’s lives easier, allowing you to fetch words that are not 127 but 508 bytes away.
After that, he explains at length how to fix misaligned code, for example by adding "add r0, r0, #0x0" or some other instruction that does nothing useful. Since I couldn't understand the passage quoted above, I couldn't follow the rest either.

Could anyone help me? I'm a newbie to assembly. :)
Simple examples would also be appreciated.

PS: I may not reply immediately because of the time zone difference, sorry.

cosarara97

Stop changing your nick
Honorary member
JPAN's tutorial said:
You may have noticed something odd, but I’ll point it out anyway. Notice that I load the labels always separated by two, or multiples of two. That is because the Load Label operation can only fetch the addresses that are four bytes away from them, or multiples of four.
Say, this is our dummy ROM:
00000000 | 01 30 17 41 07 34 04 78
00000008 | 23 07 13 08 47 20 97 84
00000010 | 23 08 97 50 97 83 50 91
00000018 | 70 93 74 09 72 04 52 37
00000020 | 40 73 01 47 07 40 27 40
00000028 | 27 40 27 40 92 74 02 39
Consider the byte at address 0x5 (0x34), the byte at address 0xA (0x13), and the byte at address 0x14 (0x97).
The byte at 0x5 is not aligned in any way, since 0x5 is a multiple of neither 4 nor 2.
The byte at 0xA is halfword-aligned, meaning it is on a multiple of 2.
The byte at 0x14 is word-aligned, meaning it is on a multiple of 4.
You cannot give the ldr instruction an address that isn't a multiple of 4 (word-aligned), so you want all your labels to be word-aligned.
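If it helps, the same classification fits in a few lines of Python. `alignment_of` is just a made-up helper name; it only encodes the rule above (a multiple of 4 is word-aligned, a multiple of 2 is halfword-aligned) and applies it to the three addresses from the dummy ROM.

```python
def alignment_of(address: int) -> str:
    """Classify an address as word-aligned, halfword-aligned or not aligned."""
    if address % 4 == 0:
        return "word"      # multiple of 4: fine for ldr/str
    if address % 2 == 0:
        return "halfword"  # multiple of 2: fine for ldrh/strh
    return "none"          # odd address: only byte loads/stores (ldrb/strb)


# The three bytes discussed above, from the dummy ROM.
for addr in (0x5, 0xA, 0x14):
    print(f"0x{addr:X}: {alignment_of(addr)}")
```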
JPAN's tutorial said:
That is because ldr is a 32-bit word-fetching operation, and when added to the PC offset, the offset is shifted by two before adding to the address. That was made to make people’s lives easier, allowing you to fetch words that are not 127 but 508 bytes away.
JPAN's tutorial said:
.align 2
.thumb
Var_adder: push {r0-r2, lr}
Ldr r0, first_var_addr @this will load the address of that label
Ldrh r0, [r0, #0x0] @and this will load its content.
Ldr r1, second_var_addr
Ldrh r1, [r1, #0x0] @both variable numbers are loaded
Mov r2, #0x0 @clears r2 completely (two shifts alone would keep its low bits)
Add r2, r2, #0x8 @places 0x8 in r2
Lsl r2, r2, #0xc @and this turns 0x8 into 0x8000
Sub r1, r1, r2 @leaves only the last value in
Sub r0, r0, r2 @each variable.
Lsl r1, r1, #0x2 @after multiplied by 4, It becomes the
@offset of that address
Ldr r2, var_8000_addr
Lsl r0, r0, #0x2 @Same here.
Add r0, r2, r0 @by adding the variable address to the offset
Add r1, r2, r1 @you get the correct variable address
Ldrh r2, [r0, #0x0] @now we load the content of the variables we
Ldrh r1, [r1, #0x0] @want to add
Add r2, r1, r2 @and add them
Strh r2, [r0, #0x0] @the result, that is in r2, is placed in the first @variable, the one in 0x8013
Pop {r0-r2, pc} @and we’re done

first_var_addr: .word 0x020370dc
second_var_addr:.word 0x020370de
var_8000_addr:.word 0x020370b8
Now, see that ".align 2" at the start?
Well, I just tested it: if you write another ".align 2" before the variables at the end, the assembler pads things correctly for you.
.align 2
.thumb
Var_adder: push {r0-r2, lr}
Ldr r0, first_var_addr @this will load the address of that label
Ldrh r0, [r0, #0x0] @and this will load its content.
Ldr r1, second_var_addr
Ldrh r1, [r1, #0x0] @both variable numbers are loaded
Mov r2, #0x0 @clears r2 completely (two shifts alone would keep its low bits)
Add r2, r2, #0x8 @places 0x8 in r2
Lsl r2, r2, #0xc @and this turns 0x8 into 0x8000
Sub r1, r1, r2 @leaves only the last value in
Sub r0, r0, r2 @each variable.
Lsl r1, r1, #0x2 @after multiplied by 4, It becomes the
@offset of that address
Ldr r2, var_8000_addr
Lsl r0, r0, #0x2 @Same here.
Add r0, r2, r0 @by adding the variable address to the offset
Add r1, r2, r1 @you get the correct variable address
Ldrh r2, [r0, #0x0] @now we load the content of the variables we
Ldrh r1, [r1, #0x0] @want to add
Add r2, r1, r2 @and add them
Strh r2, [r0, #0x0] @the result, that is in r2, is placed in the first @variable, the one in 0x8013
Pop {r0-r2, pc} @and we’re done

.align 2
first_var_addr: .word 0x020370dc
second_var_addr:.word 0x020370de
var_8000_addr:.word 0x020370b8
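Whether manual padding is needed depends only on where the code ends. Here is a rough Python sketch of that arithmetic, assuming the routine starts word-aligned and every instruction is 2 bytes (a `bl` is 4 bytes, so count it as two). `align_up` and `padding_needed` are hypothetical helpers that mimic what `.align n` does to an offset (round it up to a multiple of 2^n); they are not the assembler itself.

```python
def align_up(offset: int, n: int) -> int:
    """Round offset up to the next multiple of 2**n, which is what `.align n` pads to."""
    boundary = 1 << n
    return (offset + boundary - 1) & ~(boundary - 1)


def padding_needed(instruction_count: int) -> int:
    """Bytes of padding a `.align 2` would add after `instruction_count`
    2-byte Thumb instructions, assuming the routine starts word-aligned."""
    end = instruction_count * 2
    return align_up(end, 2) - end


print(hex(align_up(0x26, 2)))  # 0x28: .align 2 rounds up to a multiple of 4
print(hex(align_up(0x26, 4)))  # 0x30: .align 4 would round up to a multiple of 16
print(padding_needed(20))      # 0: an even number of instructions, like the 20 above
print(padding_needed(5))       # 2: an odd number needs one 2-byte filler
```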
Now, JPAN seems to believe the assembler does not pad for you. In that case, since each Thumb instruction is 2 bytes long, if your routine has an odd number of instructions you'd have to add ".hword 0000" or "add r0, r0, #0" at the end, so that the labels after it are word-aligned.
".align 2" aligns things to the word size (4 bytes). Why not ".align 4", you may ask? Because the number you give the directive is the exponent n in 2^n. So if you used ".align 4", you'd be aligning to 2⁴ = 16 bytes instead of 4.

jiangzhengwenjz

Mythical user
Re: Reply: Problem when learning limitations of "ldr"

Thanks, I think I've understood most of what you said, but I still have some problems. (First of all, I should mention that I use Hackmew's thumb assembler, in case that causes any confusion.)

JPAN said ldr label can only fetch addresses that are 4 bytes away from it, or a multiple of 4, while you say that the address where the label is compiled should be a multiple of 4. I think these two are not the same thing.

I also have a question about ".align 2". As you say, 2 means 2^2, so why do we write .align 2 on the first line instead of .align 1, given that each Thumb instruction is 2 bytes long? Also, how does this work in Hackmew's assembler?
Then, about this part:
when added to the PC offset, the offset is shifted by two before adding to the address. That was made to make people’s lives easier, allowing you to fetch words that are not 127 but 508 bytes away.
I still can't understand that. :s

Sorry for so many questions. ;)

cosarara97

Stop changing your nick
Honorary member
Reply: Re: Reply: Problem when learning limitations of "ldr"

Thanks, I think I've understood most of what you said, but I still have some problems. (First of all, I should mention that I use Hackmew's thumb assembler, in case that causes any confusion.)

JPAN said ldr label can only fetch addresses that are 4 bytes away from it, or a multiple of 4, while you say that the address where the label is compiled should be a multiple of 4. I think these two are not the same thing.
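Actually, you are right that they are stated differently. With a ldr from a register, the address in the register has to be word-aligned. With a ldr from a label, the rule is stated as a distance from PC instead. I will explain how it works at the end; it turns out that, in practice, the label still has to be word-aligned.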
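About the register case: on the GBA's ARM7TDMI, a misaligned ldr doesn't raise an error. It reads the word at the address rounded down to a multiple of 4 and rotates it right by 8 bits per byte of misalignment, so you get a scrambled value instead of the bytes you wanted. Here is a minimal Python model of that behaviour, using the first row of the dummy ROM; `ldr_from_register` and `ror32` are illustrative names, not part of any tool.

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def ldr_from_register(memory: bytes, address: int) -> int:
    """Model an ARM7TDMI word load, including what happens on a misaligned address."""
    aligned = address & ~3                                   # bits 0-1 are ignored for the fetch
    word = int.from_bytes(memory[aligned:aligned + 4], "little")
    return ror32(word, 8 * (address & 3))                    # then rotated by the misalignment


rom = bytes([0x01, 0x30, 0x17, 0x41, 0x07, 0x34, 0x04, 0x78])  # first row of the dummy ROM
print(hex(ldr_from_register(rom, 0x0)))  # aligned: the word at 0x0
print(hex(ldr_from_register(rom, 0x1)))  # misaligned: same word, rotated by 8 bits
```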

I also have a question about ".align 2". As you say, 2 means 2^2, so why do we write .align 2 on the first line instead of .align 1, given that each Thumb instruction is 2 bytes long?
I think this is partly convention, and partly because some branch instructions that switch to ARM state, like blx, can only jump to word-aligned addresses. (The GBA's ARM7TDMI doesn't have blx, but a bx into ARM code has the same requirement.)
Also, how does this work in Hackmew's assembler?
There is no such thing as "Hackmew's assembler". What you call that is just an old version of GAS, the GNU Assembler, built to target the GBA's processor. It also comes bundled with devkitARM and mid2agb, among others, and it is the assembler we all assume everyone is using. Could your copy be so old that it forces you to write those nops manually? I doubt it, but I encourage you to try it and see for yourself.
What Hackmew DID do was package it with a .bat file (thumb.bat), which makes it easier for Windows users to build their routines into a binary.
when added to the PC offset, the offset is shifted by two before adding to the address. That was made to make people’s lives easier, allowing you to fetch words that are not 127 but 508 bytes away.
The ARM manual says the offset is multiplied by 4. That's the same thing as shifting it left by two bits, and maybe a bit easier to understand. So, let's look at an example.
This is our routine (doesn't do much, I know):
Code:
.align 2
.thumb
main:
push {lr}
ldr r1, wordy
mov r0, #3
str r0, [r1]
pop {pc}

.align 2
wordy:
.word 0xDEADBEEF
Here we have it assembled and disassembled:
Code:
00000000 <main>:
   0:	b500      	push	{lr}
   2:	4902      	ldr	r1, [pc, #8]	; (c <wordy>)
   4:	2003      	movs	r0, #3
   6:	6008      	str	r0, [r1, #0]
   8:	bd00      	pop	{pc}
   a:	46c0      	nop			; (mov r8, r8)

0000000c <wordy>:
   c:	deadbeef 	.word	0xdeadbeef
So, we have our GBA CPU running in Thumb mode, and it reaches that "ldr r1, wordy". As you can see in the disassembly, the CPU doesn't know about our labels; instead, the assembler wrote "load the word at (PC + 8)".
This is encoded as 0x4902. The 0x49 part means "ldr r1, [PC, (something)]": its top five bits (01001) select this kind of ldr, and its low three bits (001) select r1. The 0x02 part means that something is 2*4, so #8.
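To check that split of 0x4902 by hand, here is a minimal Python sketch. It assumes the Thumb encoding of this load as given in the ARM7TDMI documentation (five bits 01001, three bits of register number, then an 8-bit offset counted in words); `decode_ldr_pc` is a hypothetical helper name.

```python
def decode_ldr_pc(halfword: int) -> tuple:
    """Split a Thumb 'ldr Rd, [pc, #imm]' encoding (01001 ddd iiiiiiii) into (Rd, byte offset)."""
    if halfword >> 11 != 0b01001:
        raise ValueError(f"0x{halfword:04X} is not a PC-relative ldr")
    rd = (halfword >> 8) & 0b111  # bits 8-10: destination register
    imm8 = halfword & 0xFF        # bits 0-7: offset in words
    return rd, imm8 * 4           # the CPU multiplies the offset by 4 (shift left by 2)


print(decode_ldr_pc(0x4902))  # register 1, offset 8
```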
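Now, if wordy is at 0xC and the ldr is at 0x2, why is the offset 8 and not 0xA? (2 + 8 = 0xA, not 0xC.) Because the offset is not added to the address of the ldr itself. In Thumb mode, PC reads as the address of the current instruction plus 4 (here 0x6), and for this load bit 1 of that value is forced to 0, so it is rounded down to a multiple of 4 (0x4). Then 0x4 + 8 = 0xC. So the manual's "4 bytes ahead" is right; it only looks like 2 bytes ahead here because of that rounding.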

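Anyway, your "ldr rN, label" will assemble into something of this form:
ldr rN, [PC, #M*4]
where M is an unsigned 8-bit number (0 to 255) and PC is the word-aligned value described above. So the label always has to be a multiple of 4 bytes past that PC, which means the label itself has to be word-aligned.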
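Going the other way, working out which M the assembler has to encode, can be sketched like this. It follows the ARM7TDMI manual's rule for this instruction (PC reads as the instruction's address + 4, with bit 1 cleared) and the fact that M is an unsigned 8-bit field, so a label can be at most 255 * 4 = 1020 bytes past that base (not 508, as the quoted guide says). `ldr_pc_offset` is a hypothetical helper, not part of any assembler.

```python
def ldr_pc_offset(ldr_address: int, label_address: int) -> int:
    """Return the M field for 'ldr rN, label' placed at ldr_address.

    The base is the Thumb PC (instruction address + 4) with bit 1 cleared,
    so it is always a multiple of 4."""
    base = (ldr_address + 4) & ~3
    distance = label_address - base
    if distance % 4 != 0:
        raise ValueError("label is not word-aligned")
    if not 0 <= distance <= 255 * 4:
        raise ValueError("label is out of range (must be 0-1020 bytes past the base)")
    return distance // 4


print(ldr_pc_offset(0x2, 0xC))  # 2, matching the 0x4902 encoding
print(ldr_pc_offset(0x4, 0xC))  # 1: another ldr, same label, different M
```

Note that the distance from the ldr itself is 0xA in the first case and 0x8 in the second; what never changes is that the label has to sit on a multiple of 4.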

Also, see the nop instruction in my example? My assembler inserted it automatically, because of the ".align 2" before wordy. Go try it with yours.
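Putting the pieces together, this Python sketch rebuilds the bytes of the example routine, padding with the same nop (0x46C0, mov r8, r8) when the code does not end on a multiple of 4. It is only a model of what ".align 2" does here: the instruction encodings are copied from the disassembly above, not produced by an assembler.

```python
import struct

# Halfword encodings copied from the disassembly above.
instructions = [0xB500, 0x4902, 0x2003, 0x6008, 0xBD00]
NOP = 0x46C0  # mov r8, r8: the filler GAS used in the disassembly

code = b"".join(struct.pack("<H", ins) for ins in instructions)  # little-endian halfwords
if len(code) % 4:                # the .align 2 before wordy: pad to a multiple of 4
    code += struct.pack("<H", NOP)
wordy_offset = len(code)
image = code + struct.pack("<I", 0xDEADBEEF)  # the .word itself

print(hex(wordy_offset))
print(image.hex(" "))
```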

jiangzhengwenjz

Mythical user
Re: Reply: Re: Reply: Problem when learning limitations of "ldr"

OK, this time I understand everything you said!
Thank you very much for your help. ;)

Here is a summary, in case someone else wants the answer to my question:
When using "ldr":
1. (load from register) The address in the register must end in 0, 4, 8 or C (in hex), i.e. be word-aligned.
2. (load from label) The real form is "ldr rA, [PC, #B*4]", so there is obviously a restriction on where the label ends up: it must be word-aligned. But in practice it doesn't matter much, because the assembler itself adds a nop to make it work, as long as you put ".align 2" before the label.