llvm.org robots.txt prevents crawling by Google Code Search?

One of the tools I use most frequently when coding is Google Code Search. Unfortunately, llvm.org's robots.txt appears to block all crawlers from indexing the llvm.org svn archive. This means that when you search for an LLVM-related symbol in code search, you get one of the many (possibly out-of-date) mirrors, rather than the up-to-date llvm.org version. This is sad.

For more info, see the Code Search FAQ entry (item 9):

http://www.google.com/intl/en/help/faq_codesearch.html#regexp
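
For reference, a blanket rule like this in robots.txt would explain the behavior (my guess; I haven't checked the actual file, and the /svn/ path is assumed):

  # Hypothetical: deny every crawler access to the svn archive
  User-agent: *
  Disallow: /svn/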

> indexing the llvm.org svn archive. This means that when you search for an
> LLVM-related symbol in code search, you get one of the many (possibly
> out-of-date) mirrors, rather than the up-to-date llvm.org version. This is
> sad.

This is intentional. The workload of the server was pretty huge w/o this.

Anton Korobeynikov wrote:

> > indexing the llvm.org svn archive. This means that when you search for an
> > LLVM-related symbol in code search, you get one of the many (possibly
> > out-of-date) mirrors, rather than the up-to-date llvm.org version. This is
> > sad.
>
> This is intentional. The workload of the server was pretty huge w/o this.

That was the old server though, wasn't it? Would we actually have any problems if we re-enabled crawling?

Nick

> > indexing the llvm.org svn archive. This means that when you search for an
> > LLVM-related symbol in code search, you get one of the many (possibly
> > out-of-date) mirrors, rather than the up-to-date llvm.org version. This is
> > sad.
>
> This is intentional. The workload of the server was pretty huge w/o this.

Could we at least add a rule allowing the Code Search crawler, rather than opening it up to all crawlers? The user-agent string is SVN/1.5.4/GoogleCodeSearch.

So what I am proposing is replacing the contents of the robots.txt with the following:
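
Something along these lines (the crawler token is my guess from the user-agent string above, so double-check it against the FAQ before deploying):

  # Let the Code Search crawler in (token assumed from the
  # SVN/1.5.4/GoogleCodeSearch user-agent string)
  User-agent: GoogleCodeSearch
  Disallow:

  # Keep everyone else out of the svn archive (path assumed)
  User-agent: *
  Disallow: /svn/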