While the Republican National Committee was allegedly busy evading Russian hacking attempts during the campaign, one of the contractors tasked with running its big-data operation apparently stored some of its most sensitive files on an unsecured Amazon server that could be accessed by anyone who stumbled across the URL. As the Hill reports, Deep Root Analytics left exposed a database containing 24 terabytes of data, including information about 198 million potential voters, or virtually the entire eligible population. The data included sensitive, but publicly available, information like voters’ addresses and phone numbers.
But more interesting than any personal information involved in the leak was the insight into Deep Root’s “big data” modeling tactics. The data included probabilities for individual voters’ positions on dozens of political issues, as well as estimates of how they voted in past elections. It also provided some insight into how big-data firms collaborate, as identifiers in the data suggest some of it was provided to Deep Root by TargetPoint and the Data Trust, two other big-data firms used by the Republicans.
Here’s the Hill:
“For example, a 50-gigabyte file of “Post Elect 2016” information, last updated in mid-January, contained modeled data about a voter’s likely positions on 46 different issues ranging from “how likely it is the individual voted for Obama in 2012, whether they agree with the Trump foreign policy of ‘America First’ and how likely they are to be concerned with auto manufacturing as an issue, among others.”
That file appears in a folder titled “targetpoint,” an apparent reference to another firm contracted by the RNC to crunch data. UpGuard speculates that the folder may imply that the firm TargetPoint compiled and shared the data with Deep Root. Another folder appears to reference Data Trust, another contracted firm.”
One UpGuard analyst who spoke with the Hill said he looked himself up in the data and that the estimated preferences were right on the money.
Gizmodo went into more detail about the exact nature of some of Deep Root’s models, including one that was used to predict voters’ opinions about the oil and gas industry in the US.
“One exposed folder is labeled ‘Exxon-Mobile’ [sic] and contains spreadsheets apparently used to predict which voters support the oil and gas industry. Divided by state, the files include the voters’ names and addresses, along with a unique RNC identification number assigned to every US citizen registered to vote. Each row indicates where voters likely fall on issues of interest to ExxonMobil, the country’s biggest natural gas producer.”
“The data evaluates, for example, whether or not a specific voter believes drilling for fossil fuels is vital to US security. It also predicts if the voter thinks the US should be moving away from fossil-fuel use. The ExxonMobil ‘national score’ document alone contains data on 182,746,897 Americans spread across 19 fields.”
Some of the data were apparently scraped from various subreddits, including, bizarrely, the banned subreddit “r/fatpeoplehate.” As Gizmodo noted, this is likely an attempt by Republicans to match social-media profiles with individual voters, a technique that Obama’s 2012 reelection campaign pioneered.
“Some of the data included in Deep Root’s dataset veers into downright bizarre territory. A folder titled simply ‘reddit’ houses 170 GBs of data apparently scraped from several subreddits, including the controversial r/fatpeoplehate that was home to a community of people who posted pictures of people and mocked them for their weight before it was banned from Reddit’s platform in 2015. Other subreddits that appear to have been scraped by Deep Root or a partner organization focused on more benign topics, like mountain biking and the Spanish language.”
“The Reddit data could’ve been used as training data for an artificial intelligence algorithm focused on natural language processing, or it might have been harvested as part of an effort to match up Reddit users with their voter registration records. During the 2012 election cycle, Barack Obama’s campaign data team relied on information gleaned from Facebook profiles and matched profiles to voter records.”
Reddit was an important locus of activity for Trump supporters during the campaign: it was the medium where they would create memes and share news stories that countered the mainstream media narrative. The subreddit currently boasts more than 440,000 members.
Deep Root was paid handsomely for its efforts. The Hill, citing FEC data, said the firm collected $983,000 from the RNC between January 2015 and November 2016, according to Ad Age.
Though Deep Root has characterized the leak as an oversight, it still amounts to the largest leak of voter information ever.
“It dwarfs the second-largest exposure of voter information — 93.4 million records of Mexican citizens — by more than 100 million voters and tops the largest data breach of voter information — 55 million records of Philippine voters — by more than 140 million.”
For its part, Deep Root said it was merely using the data to help target TV ad-buys. “Deep Root Analytics builds voter models to help enhance advertiser understanding of TV viewership. The data accessed was not built for or used by any specific client. It is our proprietary analysis to help inform local television ad buying.”