graphviz problem

When running clang with the following command:

clang -checker-simple -analyzer-store-region -analyzer-viz-egraph-graphviz example.c

on the following code:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

void write_file(int append) {
  char *tempname = 0;
  if (append == 1)
    tempname = "a";

  if (append == 1)
    open(tempname, O_RDONLY);
}

I got the following error:

ANALYZE: int.c write_file
Writing ‘/tmp/llvm_n297pA/’… done.
Running ‘dot’ program… Error: Invalid 2-byte UTF8 found in input. Perhaps “-Gcharset=latin1” is needed?
Error viewing graph: ’

It seems that “a” is too short for ‘dot’. When the string is “aa”, “aap” is printed in the generated PS graph. When the string is “aaa” or longer, ‘dot’ works fine.

I fixed it.

Here it works just fine. But my system's default charset is latin0 (similar to latin1).

Did you use the fixed version? My observation is that sometimes strlen(StringLiteral::getStrData()) is different from StringLiteral::getByteLength().

From Expr.h:

/// StringLiteral - This represents a string literal expression, e.g. "foo"
/// or L"bar" (wide strings). The actual string is returned by getStrData()
/// is NOT null-terminated, and the length of the string is determined by
/// calling getByteLength(). The C type for a string is always a
/// ConstantArrayType.

This means that using strlen(StringLiteral::getStrData()) is not safe; it will just march off the end of the literal until it hits the first null character that lies on the heap.
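To make the point concrete, here is a minimal C sketch of the safe pattern: treat the data as a (pointer, length) pair and never call strlen() on it. The helper name copy_bytes_as_cstring is hypothetical and not part of clang; it only illustrates copying exactly the number of bytes that a length accessor such as getByteLength() would report.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a clang API): copy `len` bytes from `data`,
 * which is NOT assumed to be null-terminated, into `dst` and append a
 * terminator. Copies at most dst_size - 1 bytes and returns the number
 * of bytes copied. */
size_t copy_bytes_as_cstring(char *dst, size_t dst_size,
                             const char *data, size_t len) {
    if (dst_size == 0)
        return 0;
    size_t n = len < dst_size - 1 ? len : dst_size - 1;
    memcpy(dst, data, n);   /* uses the explicit length, never strlen(data) */
    dst[n] = '\0';
    return n;
}
```

With a one-byte buffer holding 'a' followed by unrelated bytes, passing a length of 1 yields exactly "a", whereas strlen() on the same buffer would keep reading until it happened to find a zero byte.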

Thanks, Ted. I should read that comment.

It’s a subtle detail. Maybe we should wrap the return value of getStrData() with a variant so people won’t repeatedly make the same mistake (this comment is very easy to miss).

Incidentally, it’s this way because a string literal can contain a null character anywhere in the literal.


const char *s = "hello\0world";
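As a small illustration in plain C (independent of clang's own classes), the two lengths disagree for such a literal. The names literal, visible_length, and stored_length below are made up for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* An array, not a pointer, so sizeof sees the whole literal. */
static const char literal[] = "hello\0world";

/* Length as strlen() sees it: stops at the embedded null. */
size_t visible_length(void) {
    return strlen(literal);
}

/* Number of bytes actually stored, excluding the implicit terminator. */
size_t stored_length(void) {
    return sizeof(literal) - 1;
}
```

Here visible_length() returns 5 while stored_length() returns 11, which is why a separate byte length has to be carried alongside the data pointer.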

Uhm, I dunno what you mean, but I tested it with vanilla dot (2.20.3) and with vanilla clang trunk.