Optimize away memory allocations?

I’m writing a compiler and I was wondering whether an allocation like the one below can be optimized away. The optimizer can easily replace buf[0] = getValue() + '0'; with buf[0] = '1';. The malloc is only 16 bytes, so there’s no problem putting it on the stack. The whole thing can become short v='1'; puts((char*)&v);.

The compiler doesn’t depend on any runtime libraries (no libc, we wrote all the assembly ourselves). The language hides memory allocations, so it doesn’t matter whether the memory is on the stack or the heap. Is there a way to hint to clang and its optimizer that it is allowed to remove the malloc and free?

The reason the compiler doesn’t do this itself is the more complicated case of concatenating two strings that turn out to be literals after a few optimizations have been applied. If the LLVM optimizer can do it, I’d rather use that than write my own, which may interfere with good code generation.

#include <stdio.h>
#include <stdlib.h>
int getValue() { return 1; }
int main(int argc, char *argv[]) {
	char*buf=malloc(16);
	buf[0] = getValue() + '0';
	buf[1] = 0;
	puts(buf);
	free(buf);
}
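
For illustration only, here is a hand-written sketch of the result I’m hoping the optimizer reaches: the same logic with the 16-byte buffer on the stack instead of the heap. The helper names fill_digit and print_digit are hypothetical and not part of my compiler.

```c
#include <stdio.h>

int getValue(void) { return 1; }

/* Hypothetical helper: writes the digit and the terminator into
   caller-provided storage instead of a malloc'd buffer. */
void fill_digit(char *out) {
    out[0] = (char)(getValue() + '0');
    out[1] = 0;
}

/* Hypothetical stand-in for main(): no malloc, no free. */
int print_digit(void) {
    char buf[16]; /* stack storage in place of malloc(16) */
    fill_digit(buf);
    return puts(buf);
}
```

Calling print_digit() from main should behave like the original program, just without touching the heap.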

If you turn on the Attributor (for example with clang -O3 -mllvm -attributor-enable=all), it will perform heap-2-stack for you.

Thanks, that worked. Now I’ll need to figure out how to reproduce it in my code. I suspect the problem is that the function doing the malloc isn’t being inlined. Another potential problem is that my malloc has a different signature (libmalloc(int size, int alignment)). I should be able to figure this out.

You need to annotate your allocation and freeing functions with appropriate attributes. It should look roughly like this:

declare noalias ptr @my.malloc(i64 %size, i64 allocalign %align) allocsize(0) allockind("alloc,aligned,uninitialized") "alloc-family"="my.malloc"
declare void @my.free(ptr allocptr %ptr) allockind("free") "alloc-family"="my.malloc"

See LangRef for an explanation of these attributes.
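
If the allocator is declared in C or C++, part of this can be expressed at source level with GCC/Clang function attributes. The sketch below is an assumption-laden illustration, not a replacement for the IR declarations above: my_malloc and my_free are hypothetical names, the body uses posix_memalign only to make the example runnable, and I’m not aware of a source-level spelling for allockind or "alloc-family", so those would still have to be set in the IR.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator pair. The attributes say: the result does not
   alias other pointers, argument 1 is the size, argument 2 the alignment. */
__attribute__((malloc, alloc_size(1), alloc_align(2)))
void *my_malloc(size_t size, size_t align);
void my_free(void *ptr);

void *my_malloc(size_t size, size_t align) {
    void *p = NULL;
    /* posix_memalign requires a power-of-two alignment that is a
       multiple of sizeof(void *). */
    if (align < sizeof(void *))
        align = sizeof(void *);
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
}

void my_free(void *ptr) { free(ptr); }
```

Compiling this with -S -emit-llvm shows which IR attributes your clang version actually attaches to the declaration.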

Thanks, I’ll look into that next. I just sat down and confirmed that the function calling malloc isn’t being inlined. Is there an easy way to find the reason? It’s probably obvious. I vaguely remember gcc having something like that buried in one of its JSON outputs. My calling code is an .ll file and the malloc is in a .cpp file. I’m not worried, I’ll probably edit this comment in an hour or two when I figure this all out.

Chances are that remarks will tell you why a function isn’t inlined.

  • It’s possible, though, that in some places the inliner bails out (i.e., does not inline) without emitting a remark.

For llvm opt (taking LLVM IR as input), use opt -pass-remarks-missed=inline (Remarks — LLVM 18.0.0git documentation)

For clang (taking c++ as input, or LLVM IR with -x ir), use -Rpass-missed=inline (Clang Compiler User’s Manual — Clang 18.0.0git documentation)
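
A small file for trying the flag might look like the sketch below. The noinline attribute is my own assumption to force a missed-inlining case, the function names are hypothetical, and the exact remark wording depends on the LLVM version.

```c
/* Possible invocation, using the flag mentioned above:
   clang -O2 -Rpass-missed=inline -c remark_demo.c */

/* noinline forces the inliner to skip this function, which should
   show up as a missed-inlining remark at each call site. */
__attribute__((noinline)) int add_one(int x) { return x + 1; }

int caller(int x) { return add_one(x) + add_one(x + 1); }
```

Removing the attribute and switching to -Rpass=inline should report the successful inlining instead.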

Actually, on Compiler Explorer the inline remark works for getValue.

On Compiler Explorer, the missed remarks don’t tell you anything. The Stack Overflow question “Can a C/C++ compiler inline builtin functions like malloc()?” has explanations.

There are no remarks as malloc & free are declarations and emitting remarks in such cases would be pretty verbose.

I’m surprised this is what stopped the optimization. The function gets inlined but the malloc does not (Compiler Explorer).

I’m not sure how or where to report this, but it looks like I should report it here: Missed optimization inlining malloc · Issue #58071 · llvm/llvm-project · GitHub

EDIT: Apparently the question is why the heap-2-stack transformation did not happen. The answer is that the Attributor module pass runs before the inliner, and the “mymalloc” function is not recognized as a “malloc-like” function. See Optimize away memory allocations? - #4 by nikic

If you run the Attributor cgscc pass instead, it works again, but you really want to annotate your “mymalloc” as well.

I couldn’t comment on GitHub. I confirmed that the solution works for the simple example on Compiler Explorer but not for my larger one. In the morning I’ll see if I can provide another reproducer. I’ll also try compiling it down to .ll and giving it the attributes nikic suggested.

I don’t think this will affect my goal, but I get the error hello.ll:18:19: error: unterminated attribute group whenever I try to set an allockind. Here’s what I did.

I first compiled this C file into an .ll file:

void* test() { return 0; }
int main() {
	test();
}

Then I edited the attributes. I copy-pasted attribute group #0, added allockind, and tried a few variations. I also tried putting it on the line where I declare the function test. I couldn’t figure out how to avoid the error.

; ModuleID = 'hello.c'
source_filename = "hello.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: noinline nounwind optnone sspstrong uwtable
define dso_local i8* @test() #1 {
  ret i8* null
}

; Function Attrs: noinline nounwind optnone sspstrong uwtable
define dso_local i32 @main() #0 {
  %1 = call i8* @test()
  ret i32 0
}

attributes #0 = { noinline nounwind optnone sspstrong uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { allockind("alloc") noinline nounwind optnone sspstrong uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x8>


!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = !{i32 7, !"frame-pointer", i32 2}
!5 = !{!"clang version 14.0.6"}

I’m not sure if allocy-family was an autocorrect, a typo that happened twice, or a test to see whether I’m one to read documentation. I didn’t get any errors when I used alloc-family, but that was yesterday and I may be remembering wrong.

Back to my problem (which I can solve in my own compiler). Below is a C file that shows the bad case; commenting out one line gives the good case. To test, I used clang -O3 -mllvm -attributor-enable=all -Rpass-missed=inline test.c && gdb -batch -ex "disassemble/rs main" ./a.out | grep call

The problem now is that I pass the malloc pointer and length into a function which puts them into a struct. For whatever reason this stops the compiler from optimizing out the malloc (the assembly shows the function is inlined, and -Rpass says it was inlined at a cost of -50). If you comment out the function call you get the good case. You can uncomment the rest of the function calls to see that calling those functions doesn’t break it.

I can simply inline that function call in my compiler, so I don’t think this will be a problem. I might not bother for another week or two since there are other things I want to do.

#include <unistd.h>
#include <stdio.h>
#include <stdint.h> /* int64_t, used in main */
#include <stdlib.h>
#include <string.h>

typedef unsigned long long u64;
typedef signed long long s64;

struct Slice { char*ptr; s64 size; };

typedef struct Slice Slice;

void create_string(const char*p, long long len, Slice*out, int flags)
{
	out->ptr = (char*)p;
	out->size = ~(s64)(((len << 1)|(flags&1)));
}
void destroy_string(Slice*s)
{
	free(s->ptr);
}
void string_to_slice(char*p, Slice*out)
{
	Slice*in = (Slice*)p;
	out->ptr = in->ptr;
	out->size = (~in->size) >> 1;
}

void print(Slice*slice, int printNewline) {
	write(1, slice->ptr, slice->size);
	if (printNewline) {
		write(1, "\n", 1); ///TODO: To not do this
	}
}


int main(int argc, char *argv[])
{
	int64_t a = 5, count=17, i=0;
	count += 1;
	char*buf=malloc(16);
	char*sz=buf;
	
	memcpy(sz, "a >15 byte string", 17);
	i = 17;
	sz[i]='5';
	i += 1;

	Slice szSlice, u8slice;
	// Comment next line for good case
	create_string(sz, i, &szSlice, 1); if (0)
	{
		u8slice.ptr = (char*)sz;
		u8slice.size = ~((i << 1)|(1&1));
	}

	//string_to_slice((char*)&szSlice, &u8slice); if (0)
	{
		char*p = (char*)&szSlice;
		if (p[15] >= 0) {
			u8slice.ptr = p;
			u8slice.size = p[14] ? 15 : p[15];
		} else {
			Slice*in = (Slice*)p;
			u8slice.ptr = in->ptr;
			u8slice.size = (~in->size) >> 1;
		}
	}
	//print(&u8slice, 1); if (0)
	{
		write(1, u8slice.ptr, u8slice.size);
		if (1) {
			write(1, "\n", 1);
		}
	}
	//destroy_string(&szSlice); if (0)
	{
		free(buf);
	}
	return 0;
}
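
To make the size field easier to follow, here is the packing used by create_string and string_to_slice pulled out into standalone helpers. This is only a sketch: the helper names encode_size, decode_len and decode_flag are hypothetical and do not exist in my runtime.

```c
#include <stdint.h>

/* Pack a non-negative length and a 1-bit flag, then complement the
   result. The stored value is therefore always negative. */
int64_t encode_size(int64_t len, int flags) {
    return ~((len << 1) | (flags & 1));
}

/* Undo the complement and drop the flag bit to recover the length. */
int64_t decode_len(int64_t size) { return (~size) >> 1; }

/* Undo the complement and keep only the flag bit. */
int decode_flag(int64_t size) { return (int)((~size) & 1); }
```

For the values in main (length 18, flag 1) the stored size is ~37, that is -38, and decoding gives back 18 and 1.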

Which LLVM version are you using? Most of the allocator support attributes are new in LLVM 15.

It was a typo, fixed :)

Edit: Actually, your IR sample does have an unterminated attribute group. One line ends with "target-cpu"="x8>, something got corrupted there?

Strange, must have been a copy/paste error. It’s not like that in my source file.

Which LLVM version are you using?

That was it. My distro is on clang 14.0.6. I bookmarked LLVM Language Reference Manual — LLVM 14.0.0 documentation for next time. It didn’t occur to me that there could be memory attributes that weren’t there from the beginning.

Assuming nothing comes up after inlining create_string in my compiler, my problem is solved.

I’ll patch the Attributor to handle this case. Given the position of the free call it should work, and soon it will.
