Is it possible some reviews aren’t included in the static archive on http://108.170.204.19/?
For example, I get 404 when trying to access http://108.170.204.19/D99750.
Cheers,
Florian
cc @MaskRay
We also can’t access the post-commit discussions on Phabricator anymore (the rL and rG pages): these now redirect to the GitHub commit, so we’ve lost all the replies.
I haven’t had time to crawl the /rGxxx pages before I went on vacation. (It took ~7 days even with 6 processes, and the CPU usage of Phabricator+MySQL was very high.) For the /rLxxx pages, I think the Phabricator instance redirected them to rG or GitHub long ago, so those discussions have been lost for a long time.
We have now lost access to the Phabricator instance, so I cannot crawl the /rGxxx pages… I have asked whether the instance can be brought up temporarily, even if just for a local network, so that I can improve the archive, e.g. shrink some large pages by not using ?large=true and archive the missing differential pages.
Note, certain differentials and user profiles can be found on archive.org, e.g. https://web.archive.org/web/20231118021608/https://reviews.llvm.org/D70401 https://web.archive.org/web/20231004073648/https://reviews.llvm.org/p/MaskRay/
Yeah, so far I have been relying on archive.org to find all the patches that were pending review, but it is not reliable for every patch.
Specifically, for https://reviews.llvm.org/D99750, I’d ask @ABataev1 to add the patch on GitHub for review.
I have collected differentials that are not “Closed” at
https://gist.githubusercontent.com/MaskRay/798de69eb9e7ec7c3e98507265dc5514/raw/
(The majority of differentials are “Closed” and not interesting.)
Newer libc++ differentials have the subscriber “libcxx-commits”; hopefully this list can help:
% grep -m 3 'Needs Review.*libcxx-commits' differentials-not-closed.tsv
https://reviews.llvm.org/D52995 Needs Review [CMake] Fix the -nodefaultlibs check. cdavis5x libcxx-commits
https://reviews.llvm.org/D53127 Needs Review Fix missing defines in libc++abi when compiled against libgcc's arm unwind (follow-up of D42242) stefson libcxx-commits
https://reviews.llvm.org/D53796 Needs Review [libcxx] Use AS_NEEDED command for linker script inputs phosek libcxx-commits
For differentials that failed to be crawled, I can try re-crawl them if I can access the Phabricator instance. For now, archive.org is the best bet (but it doesn’t index all differentials…)
It looks like https://reviews.llvm.org/D149573 was missed from the previous crawl. This is an important review that has been going on for quite some time. It doesn’t appear in the previously linked “not closed” gist either. Its single capture on archive.org is outdated as well.
Was any communication sent at all about Phabricator transitioning to a static archive site, or about email support being discontinued? I’m very disappointed in how this was handled.
Even if new patches could not be posted, I think there was still a lot of useful information within Phabricator, and some of the other functionality, like being able to search for patches, was still useful. I agree with the last sentence.
Among D1~D159553, 1669 differentials were missing from my first round of crawling. These differentials might have been deleted by the author, might have had a permission error (the author did not make them publicly readable), or the crawler might have hit an error (a failed emulated button click).
The number of truly missing differentials is about 759. Among them, 184 differentials have a state other than “Closed”.
It seems that the Phabricator machine is up.
By changing my /etc/hosts, I managed to crawl missing pages (such as https://reviews.llvm.org/D99750 https://reviews.llvm.org/D149573) for the read-only archive (current reviews.llvm.org). https://gist.githubusercontent.com/MaskRay/798de69eb9e7ec7c3e98507265dc5514/raw/ has been updated.
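For reference, the /etc/hosts override is a one-line mapping that makes the hostname resolve to the Phabricator machine instead of the static archive; the IP below is a placeholder, since the machine’s real address is not public:

```shell
# Append an override so reviews.llvm.org resolves directly to the
# Phabricator machine (203.0.113.10 is a placeholder address):
echo '203.0.113.10  reviews.llvm.org' | sudo tee -a /etc/hosts
```

Removing the line afterwards restores normal DNS resolution to the archive.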
Some differentials, such as https://reviews.llvm.org/D144999 ([Clang][MC][MachO] Only emits compact-unwind format for "canonical" personality symbols. For the rest, use DWARFs.), show a modal dialog like “Failed to load file data for changeset ("PHID-FILE-6vzxx7brvn3xup3gjn7g")”. I think we need some JavaScript hack to remove it…
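Instead of a JavaScript hack in the live browser, one low-risk option would be to strip the dialog from the saved HTML afterwards. Below is a minimal sketch that assumes the dialog uses Javelin’s usual `jx-client-dialog` container and `jx-mask` overlay class names — verify the actual class names against the archived markup before relying on it:

```python
from html.parser import HTMLParser

class DialogStripper(HTMLParser):
    """Re-emit HTML, dropping any subtree whose class matches DROP_CLASSES."""
    # Assumed class names for Phabricator/Javelin modal dialogs.
    DROP_CLASSES = {"jx-client-dialog", "jx-mask"}
    # Void elements have no closing tag, so they must not affect skip depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "param", "source", "track", "wbr"}

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip = 0  # nesting depth inside a dropped subtree

    def handle_starttag(self, tag, attrs):
        if self.skip:
            if tag not in self.VOID:
                self.skip += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if classes & self.DROP_CLASSES:
            self.skip = 1  # start dropping this subtree
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip:
            self.skip -= 1  # reaches 0 at the dropped subtree's own close tag
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip:
            self.out.append(f"<!--{data}-->")

def strip_dialog(html: str) -> str:
    p = DialogStripper()
    p.feed(html)
    p.close()
    return "".join(p.out)
```

Running this over each archived page would remove both the dialog box and the overlay mask while leaving the rest of the markup byte-for-byte intact.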
Crawler change (https://github.com/MaskRay/llvm-phabricator-archive):
@@ -77,10 +78,14 @@ def load_diff(browser, diff, url, diff_version_id):
for older_link in older_links:
print(f"{print_prefix}: found older comments to load, clicking", flush=True)
- older_link.click()
- criteria = (
- By.XPATH,
- '//div[contains(text(), " created this revision.") and contains(@class, "phui-timeline-title")]',
- )
- WebDriverWait(browser, 10).until(EC.presence_of_element_located(criteria))
+ try:
+ older_link.click()
+ criteria = (
+ By.XPATH,
+ '//div[contains(text(), " created this revision.") and contains(@class, "phui-timeline-title")]',
+ )
+ WebDriverWait(browser, 10).until(EC.presence_of_element_located(criteria))
+ except Exception:
+ # Work around NoSuchElementError@chrome
+ pass
show_more_buttons = browser.find_elements(
@@ -88,12 +93,25 @@ def load_diff(browser, diff, url, diff_version_id):
'//a[@data-sigil="show-more" and contains(text(), "Show File Contents")]',
)
- for button in show_more_buttons:
- print(f"{print_prefix}: found folded file to load, clicking", flush=True)
- button.click()
+
+ for button in show_more_buttons[0:20]:
+ try:
+ m = re.match(r'([\d,]+) lines', button.find_element(By.XPATH,'..').text)
+ if not m: continue
+ lines = int(m.group(1).replace(',', ''))
+ if lines > 10000:
+ print(f"{print_prefix}: found folded file to load, skip due to too long: {lines}", flush=True)
+ continue
+ print(f"{print_prefix}: found folded file to load, clicking", flush=True)
+ button.click()
+ except Exception as e:
+ pass
# No need to wait here since we're waiting for all file loaders at the
# end of the function
print(f"{print_prefix}: waiting until all loaders have disappeared", flush=True)
- WebDriverWait(browser, 60).until(AllLoaderHaveDisappeared())
+ try:
+ WebDriverWait(browser, 30).until(AllLoaderHaveDisappeared())
+ except Exception:
+ pass
This is a relatively minor issue, but I thought I’d point it out in case it might be a low-effort fix: it looks like image attachments are not preserved in the archive.
For example, this review comment contains two inline images, and it’s harder to understand the comment without seeing the images.