Unnecessary Load Store Operations

While writing the selection pattern for the function below, I noticed that there are many load and store operations in the generated assembly. I didn’t pass an optimization flag, but I also didn’t write any loads or stores when calculating the result. I wonder why these load and store operations are there, especially the lhu. As far as I can see, it ignores my right shift instruction.

unsigned int pkg_fun(unsigned int rs1, unsigned int rs2){
    return (rs2<<16) | (rs1>>16);
}

	.loc	0 8 13 prologue_end             # pkg_fun.c:8:13
	lw	a0, -16(s0)
	.loc	0 8 16 is_stmt 0                # pkg_fun.c:8:16
	slli	a0, a0, 16
	.loc	0 8 25                          # pkg_fun.c:8:25
	lhu	a1, -10(s0)
	.loc	0 8 22                          # pkg_fun.c:8:22
	or	a0, a0, a1
	.loc	0 8 5                           # pkg_fun.c:8:5
	lw	ra, 12(sp)                      # 4-byte Folded Reload
	lw	s0, 8(sp)                       # 4-byte Folded Reload
	addi	sp, sp, 16

I only put the important part of the assembly code.

Unoptimised code generation is deliberately stupid: that makes it much simpler to reason about, as well as fast to generate. Keeping everything on the stack, other than the data immediately being worked with, is one such simplification; it avoids more complex register allocation and tracking.
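
As a rough illustration of that stack-first behaviour, here is a minimal C sketch that models what the unoptimised code does. It is not compiler output: the name pkg_fun_o0_model, the volatile locals standing in for stack slots, and the -16(s0) offset for rs2 are assumptions made for illustration, and the check assumes a 32-bit unsigned int as on riscv32.

```c
#include <assert.h>

/* Hypothetical model of the unoptimised pkg_fun: volatile locals stand in
   for the stack slots, so every use of an argument goes through memory. */
unsigned int pkg_fun_o0_model(unsigned int rs1, unsigned int rs2) {
    volatile unsigned int slot_rs1 = rs1;  /* spill, like sw a0, -12(s0) */
    volatile unsigned int slot_rs2 = rs2;  /* spill; slot assumed to be -16(s0) */
    unsigned int hi = slot_rs2 << 16;      /* reload (lw), then slli */
    unsigned int lo = slot_rs1 >> 16;      /* reload, then shift (or a folded lhu) */
    return hi | lo;
}

/* Sanity check; assumes unsigned int is 32 bits wide. */
void check_pkg_fun_o0_model(void) {
    assert(pkg_fun_o0_model(0x12345678u, 0x9abcdef0u) == 0xdef01234u);
}
```

With optimisation enabled, one would expect the arguments to stay in registers and the spills and reloads to disappear.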

It is not true that the listing shows all of the important part. You left out the store that the lhu loads from, namely sw a0, -12(s0) according to godbolt.org.

The right shift is not ignored. Because the code was already storing the value to the stack and loading it back, the compiler was able to fold the right shift into the load. Note how it stored 4 bytes to -12(s0) but loaded 2 from -10(s0), which is 2 bytes above the start address of the store; that is, it loads just the high 2 bytes (on a little-endian target like riscv32).
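
The equivalence can be checked with a small C sketch. The function names are made up for illustration, and the byte array only models a little-endian 4-byte stack slot; it is not how the compiler implements the fold.

```c
#include <assert.h>
#include <stdint.h>

/* What the source expresses: shift the full 32-bit value right by 16. */
uint32_t high_half_by_shift(uint32_t x) {
    return x >> 16;
}

/* What the store plus folded load does, modelled by hand. */
uint32_t high_half_by_load(uint32_t x) {
    uint8_t slot[4];                 /* models the 4-byte slot at -12(s0) */
    /* sw a0, -12(s0): little-endian store, lowest byte at lowest address */
    slot[0] = (uint8_t)(x);
    slot[1] = (uint8_t)(x >> 8);
    slot[2] = (uint8_t)(x >> 16);
    slot[3] = (uint8_t)(x >> 24);
    /* lhu a1, -10(s0): read 2 bytes starting at slot + 2, zero-extended */
    return (uint32_t)slot[2] | ((uint32_t)slot[3] << 8);
}

/* Sanity check that both forms produce the same value. */
void check_fold(void) {
    assert(high_half_by_load(0x12345678u) == 0x1234u);
    assert(high_half_by_load(0xdeadbeefu) == high_half_by_shift(0xdeadbeefu));
}
```

On a big-endian target the same 2 bytes would sit at the start of the slot, so the folded load would use offset -12 rather than -10.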

Thank you so much.

I meant that it ignores the shift and converts it to the lhu, but I understand now from the first paragraph of your answer.

Yes, my mistake. I only focused on the part that was confusing for me.