The Hacker Who Archived Parler Explains How She Did It (and What Comes Next)

Authored by vice.com and submitted by CodeDinosaur

Those plans hit a snag when Amazon Web Services, Google, and Apple deplatformed Parler, effectively erasing it from the internet, at least temporarily. Parler was an organizing and rallying point for the far right, including many of the Capitol Hill insurrectionists, and so its erasure from the internet threatened to destroy months of posts that could be used to better understand the attack on the Capitol.

With Twitter's permanent ban of Donald Trump and tens of thousands of QAnon-linked accounts following the Capitol takeover on Wednesday, many of Trump’s followers planned to make a new home on Parler, the “free speech” alternative that has become known for hosting far-right content.

“I hope that it can be used to hold people accountable and to prevent more death,” she said. “I think people should be allowed to have their own opinion as long as they can act civilized, on Wednesday we saw what can happen if they don’t.”

Nevertheless, with the FBI, state and local law enforcement, and open-source investigators looking for media from Wednesday's attack, the archive could be highly useful to a whole host of people.

“Everything we grabbed was publicly available on the web, we just made a permanent public snapshot of it,” donk_enby told me.

When news of donk_enby's archival efforts broke, several viral tweets, Reddit posts, and Facebook posts claimed that she had captured private information, scans of drivers licenses and IDs, and other highly sensitive information. She said those posts are “not at all” accurate.

But the quick thinking of a self-described hacker by the name of donk_enby and a host of amateur data hoarders preserved more than 56.7 terabytes of data from Parler that donk_enby and open source investigators believe could be useful in piecing together what happened last Wednesday and in the weeks and months leading up to it. donk_enby was able to scrape, capture, and archive nearly the entire content of the website after it became clear that hundreds of Trump supporters had uploaded potentially incriminating photos and videos of themselves to the platform, many filming from inside the Capitol itself.

The task of downloading that data, what she called the “big pull,” was a race against the clock: Amazon was set to revoke Parler’s hosting services within hours, and over 50 terabytes of data had to be pulled from the site in order to be effectively archived. After donk_enby tweeted about the content she was scraping from Parler, the Archive Team, a volunteer collection of hackers and data researchers who have saved a host of other dying sites, took notice and joined in her effort. “The Archive Team deserves a lot of credit for orchestrating the big pull,” donk_enby told me, saying that the group paid the steep server costs and constructed a tool that allowed anonymous Twitter users to volunteer their own bandwidth to help speed the transfer, which at one point peaked at 50 GB per second. The extra speed proved critical: the group effort managed to capture 96% of Parler’s content by midnight.

When word of donk_enby’s project broke online, competing theories circulated about what information had actually been pulled. What donk_enby actually did was an old-school scrape of already publicly available information. Using a jailbroken iPad and Ghidra, a piece of reverse-engineering software designed and publicly released by the National Security Agency, donk_enby managed to exploit weaknesses in the website’s design to pull the URLs of every single public post on Parler in sequential order, from the very first to the very last, allowing her to then capture and archive the contents.
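To illustrate why sequentially ordered posts are easy to enumerate, here is a minimal Python sketch. It is not donk_enby's or Archive Team's actual tooling: the URL template, the function names, and the idea of splitting the ID range among volunteers are assumptions made purely for illustration, and the code only builds strings without contacting any server.

```python
from typing import Iterator, List, Tuple

# Hypothetical URL template. The article does not give Parler's real endpoint.
BASE_URL = "https://example.invalid/post/{post_id}"


def post_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield one URL per post ID, in ascending order (inclusive range)."""
    for post_id in range(first_id, last_id + 1):
        yield BASE_URL.format(post_id=post_id)


def split_ranges(first_id: int, last_id: int, workers: int) -> List[Tuple[int, int]]:
    """Split an inclusive ID range into contiguous chunks, one per worker.

    This mirrors, very loosely, the idea of volunteers each taking a
    share of the work; the real coordination tool is not described
    in the article.
    """
    total = last_id - first_id + 1
    size, extra = divmod(total, workers)
    ranges = []
    start = first_id
    for i in range(workers):
        # The first `extra` workers take one additional ID each.
        length = size + (1 if i < extra else 0)
        if length == 0:
            continue
        ranges.append((start, start + length - 1))
        start += length
    return ranges


if __name__ == "__main__":
    for url in post_urls(1, 3):
        print(url)
    print(split_ranges(1, 10, 3))
```

The point of the sketch is that when post identifiers follow a predictable sequence, anyone who knows the first and last ID can list every public post without needing a search index or an account.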

When rumors of Parler’s imminent deletion began to circulate, donk_enby, who has been researching Parler for months, understood that a litany of important information about America’s most prominent far-right extremist groups was at risk of being permanently hidden from the public eye. In a monumental effort, donk_enby and a few other fellow hackers and researchers managed to capture and archive nearly every post, photo and video on Parler before it was shut down.

On Saturday, Amazon Web Services announced that it would no longer host Parler, cutting the company off from one of the largest web hosts in the world. The move was set to be effective Sunday at midnight. The clock was ticking.

The data is currently being processed and should be available to browse in a couple of days, according to donk_enby. Early archives of it are already cropping up as torrent files and are being shared on IRC channels and different git sites. One of the hosters posted this message on their website: "the files were shared from this site, and made into a torrent file so the distribution is mostly out of my hands now," they said. "the data has also been shared with researches and archival organizations." Metadata archives have been uploaded and new scripts have been written to help parse and plot the data.

donk_enby had originally intended to grab data only from the day of the Capitol takeover, but found that the poor construction and security of Parler allowed her to capture, essentially, the entire website. That ended up being 56.7 terabytes of data, which included every public post on Parler, 412 million files in all, including 150 million photos and more than 1 million videos. Each of these had embedded metadata like date, time and GPS coordinates. Unlike most social media sites, Parler does not strip metadata from media its users upload, which, crucially, could be useful for law enforcement and open source investigators.
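For readers wondering what embedded GPS metadata looks like in practice: the EXIF standard stores latitude and longitude as degrees, minutes, and seconds, plus a reference letter (N/S or E/W). The Python sketch below shows how such values convert to the signed decimal coordinates used on maps. The function names and the sample values are hypothetical and chosen only for illustration; they are not taken from the archive, and reading the tags out of a real file would require an image library that is not shown here.

```python
from typing import Tuple

DMS = Tuple[float, float, float]  # (degrees, minutes, seconds)


def dms_to_decimal(dms: DMS, ref: str) -> float:
    """Convert an EXIF-style degrees/minutes/seconds triple to decimal degrees.

    `ref` is the hemisphere letter: "N" or "E" gives a positive value,
    "S" or "W" gives a negative value.
    """
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    if ref.upper() in ("S", "W"):
        value = -value
    return value


def to_decimal_pair(lat: DMS, lat_ref: str, lon: DMS, lon_ref: str) -> Tuple[float, float]:
    """Return (latitude, longitude) in signed decimal degrees."""
    return dms_to_decimal(lat, lat_ref), dms_to_decimal(lon, lon_ref)


if __name__ == "__main__":
    # Made-up sample values, not real coordinates from any upload.
    print(to_decimal_pair((10, 30, 0), "N", (20, 15, 36), "W"))
```

A site that strips metadata on upload removes these fields before the file is ever served, which is why their presence in Parler's media stood out.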

In December, donk_enby published details about Parler's iOS app on her GitHub, which Archive Team used to help them scrape the site. At the time, she posted on her GitHub that the API could be used "to solve fun mysteries such as:

Users of Parler have responded with threats.

“All the hate and threats I’m getting make it all the more satisfying. I don’t know the full extent of what’s in there but people are afraid,” she said.

A screenshot she posted from a group named the North Central Florida Patriots called out her Twitter handle and named her “the rat running the operation”:

“Bad news. Left extremists have captured and archived over 70TB of data from parler severs. This includes posts, personal information, locations, videos, images etc. The intent is a mass dox and a list to hold patriots “accountable”. It is too late to scrub your data, and its already archived. There is nothing you can do to prevent whats already happened. All you can do is prepare for the fallout.”

Parler has since registered its domain with Epik, a service that hosts other platforms used by far-right groups, like Gab and 8chan, and is now suing Amazon.

It’s worth noting that the FBI could have gotten the server information on their own, but what this kind of public dump does is empower other hackers, researchers, activists, and antifascist members of the public to identify suspects on their own and make their names and faces public. It also preserves posts organizing the insurrection and other violent threats, rhetoric, and planning done by the far-right groups involved in the takeover, an important piece of information when trying to account for the Capitol Police’s and federal government’s inexplicable lack of preparation for the January 6th violence. donk_enby told me that the data is “already being processed to extract metadata, pull still frames, and maybe run some computer vision analysis.” Social media has proved a powerful force in identifying people at riots and protests: as I write this, the FBI is posting screengrabs on Twitter asking for help identifying suspects photographed inside the Capitol.

While donk_enby’s information will surely prove valuable to antifascist groups and others who have a vested interest in naming and shaming right-wing extremists, the level playing field of the internet makes it just as likely to aid the state in seeking prosecution. While donk_enby didn't archive this for the explicit purpose of helping law enforcement (she considers herself an anarcho-socialist and said the data would have utility for those leading crowd-sourced identification efforts), she acknowledges they may find it useful. “Once people start sifting through our archive, it should point them to where they can find the actual legally admissible evidence,” she said.

x_Sh1MMy_x on January 13th, 2021 at 02:13 UTC »

"Using a jailbroken iPad and Ghidra, a piece of reverse-engineering software designed and publicly released by the National Security Agency, donk_enby managed to exploit weaknesses in the website’s design to pull the URL’s of every single public post on Parler in sequential order, from the very first to the very last, allowing her to then capture and archive the contents." -If anyone was wondering how it was done  ..

Edit:Thanks for my first award kind person of reddit and the upvotes

unpopulrOpini0n on January 12th, 2021 at 23:04 UTC »

"Each of these had embedded metadata like date, time and GPS coordinates—unlike most social media sites, Parler does not strip metadata from media its users upload, which, crucially, could be useful for law enforcement and open source investigators. "

Bruh GPS, did they not have a single real coder on staff? I thought anyone even mildly versed in tech would know about metadata in pictures?

Edit: do yourself a favor, google Monero.

rawling on January 12th, 2021 at 19:53 UTC »

When news of donk_enby's archival efforts broke, several viral tweets, Reddit posts, and Facebook posts claimed that she had captured private information, scans of drivers licenses and IDs, and other highly sensitive information. She said those posts are “not at all” accurate.

I've spent the past 48 hours telling people this; glad to have it spelled out.