Inside Google’s Secret War Against Ad Fraud
In a conference room nine floors above London’s St. Giles High Street, a Russian engineer named Sasha booted up a computer and began giving me instructions.
“First step, let’s go to some website,” he commanded. “AdAge.com, how about that one?” As the page loaded on my browser, a stream of code ran down the screen in a separate window to the left. After a few seconds, Sasha explained what was happening. “I’m afraid that when you’re dealing with our team, you shouldn’t just go to a website when we tell you,” he said. The computer, brand new, was already infected. “You are participating in a botnet.”
Actually, the connection I was operating on was hacked, not AdAge.com; visiting any site on the internet would have infected the computer. But Sasha appeared to be enjoying my discomfort, and his work was just beginning.
Sasha is a member of Google’s secretive antifraud team. The unit, numbering more than 100, is locked in a war against an unknown number of cybercriminals who are actively siphoning billions of dollars out of the digital advertising industry, primarily via the creation of robotic traffic that appears human. Mysterious to many even within Google, the group has never spoken to an outsider about the way it hunts botnets, let alone allowed someone into its offices to observe the process. But that silence ended the moment Sasha opened his computer.
For players on the web both big and small, digital ad fraud is a significant and growing problem. The flow of advertising dollars to digital media from TV and print, accompanied by digital’s movement toward automation, has turned the space into fertile ground for some of the internet’s worst actors. According to a study by the fraud-fighting firm White Ops and the Association of National Advertisers, $6.3 billion will be lost to ad fraud in 2015. And Google, the biggest advertising technology company on the planet, stands to lose the most because of the enormous amount of transactions running through its ad servers, automated-buying platform and ad exchange every day. If advertisers believed the company’s operation were fraud-filled, they could take their money elsewhere and the business would falter.
The best available reckoning of Google’s ad-tech dominance comes from a data pull by Ghostery, an ad-technology company that monitors ad tags on the web. In the month of September 2013, the most recent estimate available from the company, Ghostery found Google technology served 316 billion ad impressions. The next-largest company, OpenX, came in at 84.4 billion. That heft means Google is exposed when it comes to fraud, but it also puts the company in position to lead the fight against the problem. Until now, Google has been content to work against fraud from behind the scenes, but it’s hard to lead while keeping quiet, which is part of the reason it is speaking to Ad Age.
“Sharing our point of view and our stance on it, our level of investment, is something that we think will help the rest of the industry along,” said Neal Mohan, Google VP-video and display ad products.
Google’s decision led to my trip across the Atlantic this spring to embed with Sasha and his colleagues as they opened the door to one of the most important and best-protected secret units of the web. Though almost every word spoken was on the record, Sasha and a number of his fellow Google employees asked to be referred to by their first names, saying they were concerned for their safety. “Because it is part of organized crime, I’m guessing it would not be a friendly environment for the people that speak out against it,” said one team member.
As Sasha worked across two monitors, sunlight flooded into the office through large windows opening up views of South London. Six antifraud team members sat scattered about in the room, which was reachable only by walking through a hulking door with a circular vaultlike handle.
Sasha, speaking with a thick accent and a tone bordering on amused, began digging through AdAge.com’s site code (again, manipulated on that specific computer by a hacked connection) until he found a few lines called an “exploit”—essentially a key that hackers use to unlock computers. When an exploit opens the door to a computer, malware operators can install programs to gain full control. To an ad fraudster, this control is gold. It allows him to use the computer to browse the web in hidden windows, doing so without its owner’s knowledge.
That ad fraud is carried out through personal computers is one of its most striking characteristics. The hacked personal machines, called drones, combine to form botnets, or droves of computers browsing the internet in a coordinated dance meant to grab as many advertiser dollars as possible. Taking over personal machines helps botnet operators avoid detection. It diversifies their IP addresses and geographic locations, masking the loads of traffic they send across the internet.
Exploits can make their way onto computers via a number of paths, including through Wi-Fi networks, ads containing the code (malvertising), hijacked home routers, spam emails and hacked websites. (The Google team sneaked an exploit into the copy of Ad Age’s site on the computer I was sitting at through a corrupted connection.) When you run into one of these scenarios, the exploit can unlock your machine without any sign something bad is occurring. “Normal users won’t really see that something is going on,” Sasha explained. You don’t even have to click to get infected.
Though Sasha claimed the computer I was using was infected, there was no way to tell until he opened a program called WinLister, which provided more detail on the machine’s hidden windows. Once there, he found a set of Internet Explorer windows, all maximum size, all hidden and all labeled “message.” When Sasha unhid the windows, they appeared on the screen with cursor trackers showing mouse movements bumping about the page. When Sasha took his hand off the mouse, the cursors kept moving and clicking.
Surprise revelations like this one sparked bursts of laughter from the team, whose members don’t tend to explain what they do to stupefied outsiders every day.
For fraudsters, making money off infected machines is a simple process. There are two basic ways to do it: You sell bot traffic to publishers (through a chain of middlemen) who figure they can make more revenue from ads than the cost of the traffic; or you set up your own website, send the traffic there and sell your own ads.
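The first route can be made concrete with a back-of-the-envelope sketch. The numbers below are entirely hypothetical, chosen only to show why the arithmetic can work out for a publisher buying junk traffic; real prices vary widely.

```python
# Hypothetical economics of buying bot traffic to monetize with ads.
traffic_cost_per_1000_visits = 1.00  # assumed price for 1,000 junk visits
ads_per_page = 5                     # assumed number of ad slots per page
cpm = 0.50                           # assumed revenue per 1,000 impressions

# 1,000 visits x 5 ads = 5,000 impressions = 5 x CPM in revenue.
revenue_per_1000_visits = ads_per_page * cpm
margin = revenue_per_1000_visits - traffic_cost_per_1000_visits
print(margin)  # 1.5 -- a profit on every 1,000 fake visits
```

As long as that margin stays positive, the publisher has an incentive not to look too closely at where the traffic comes from.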
The formula for making money from botnets might be straightforward, but detecting them is anything but. It’s one thing to know what a botnet looks like, but another to discern whether each ad served is being shown to a human or something else entirely.
While Sasha moved through the bot scripts, Douglas de Jager, the man in charge of Google’s botnet-hunting operation, sat in the back of the room and watched intently. Mr. de Jager is a confident, straight-talking South African who sold his fraud-fighting company, Spider.io, to Google for an undisclosed sum early last year. Though his team members were the ones set up on the monitors, there was no question he is the one calling the shots.
Mr. de Jager discovered the evils of the internet early on. “We were one of the bad people,” he joked. Not quite, but he could have been. His first company, BytePlay, scraped content for brokers who felt middlemen were trapping their information in smaller portals to make a buck. BytePlay’s scrapers resembled humans, and the team quickly became aware of their potential to be used for evil. After selling BytePlay, Mr. de Jager decided to found Spider.io to fight those dark uses. “I wanted to try to prevent anyone from ever using technologies like the technologies I had built previously to do evil things,” he said.
Spider.io was all of seven people when it was acquired by Google, and the deal provided access to Google’s computing power, speeding up its process dramatically. “Where typically it would have taken us a day to produce a report on a particular slice of traffic, today it’s done in real time,” Mr. de Jager explained. It also brought a new element to the team’s work: restrictions. Spider must steer clear of Google’s sales team to avoid conflicts of interest. (The sales team, as you might imagine, doesn’t stand to gain immediately when inventory is removed from Google’s systems. The more ads it sells, the more money it takes in.)
Spider appears to be integrating well. The rapport between original team members and their new peers was apparent as the crew gathered at the Craft Beer Co., a homey London pub with a wooden bar and dozens of draught handles—the kind of place that’s easy to enter but difficult to leave. After a few hours of informal conversation and more laughs, groups broke off and headed to dinner. On his way out, a senior Google employee instrumental in the Spider acquisition made a point to stop and mention to me how happy he was with the move.
Since Mr. de Jager moved away from the “other side,” the bad guys have grown far more sophisticated. Malware, he said, was once primarily used for banking fraud, but two-factor authentication (for example, when a bank asks you for a code from your cellphone before you can sign in on a new computer, or asks whether you really meant to send money to Uruguay) severely reduced its profitability. Then, the hackers moved to credit-card fraud, but the security on that front is now so good that you can buy thousands of active credit-card records for a few dollars, because they’re essentially worthless. Next up was Bitcoin mining, where hacked machines were used to unearth the cryptocurrency. But that too became less profitable, leaving ad fraud as the most lucrative endeavor a cybercriminal can undertake today. “We’re at a point now where malware is being used principally for ad fraud,” Mr. de Jager said. Scary words for an advertising industry only starting to grasp the problem.
Looking at a malware binary for the first time is an unsettling experience. The encrypted program looks like a collection of the most indecipherable gibberish a computer could possibly spit out. A new member of the team, Sebastian, took a seat next to me in front of the monitors and pulled up one such binary, attempting to explain how these confusing columns—one line read, “15 68 C8 58 00 10 57 8B”—hold the DNA of the botnets themselves.
The binary is the engine of a botnet, instructing infected computers how to browse the web. It tells them which sites to visit, how long to stay, what to do while there, and more. Google’s antifraud team gets these chunks of raw code from a handful of sources, including VirusTotal, a malware-scanning company it acquired in 2012. It must then reverse-engineer the code to learn the characteristics of a particular botnet.
Decoding the binaries is a crucial step in the process, allowing the team to fingerprint the botnets. “Once we understand how it works, it informs what it is we look for in order to recognize that this visitor to a website must have this particular malware on their machine because of X, Y, Z,” said Vegard Johnsen, a product manager on the team.
The specific botnet binary on the screen in front of us contains 150 “actions,” each a specific instruction meant to mimic a human web visitor. The program, for example, instructs the possessed computer to create a hidden window, use Internet Explorer, set the window at full screen, disable sound, target the traffic to users whose browsing matches keywords such as “Liberty Insurance,” move the mouse randomly and click 20% of the time. This 150-action program is relatively simple; some botnets contain more than 2,000.
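The decoded action list can be pictured as a simple program of instructions. The sketch below is a hypothetical reconstruction of the behaviors described, not the binary's actual format; it also simulates the 20% click instruction to show how a fraudster dials in a believable click-through rate.

```python
import random

# Illustrative reconstruction of a botnet "action list": each entry is
# one instruction the binary issues to an infected machine. Names and
# structure here are assumptions for illustration only.
ACTIONS = [
    {"op": "create_window", "hidden": True, "fullscreen": True},
    {"op": "set_browser", "name": "Internet Explorer"},
    {"op": "disable_sound"},
    {"op": "target_keywords", "terms": ["Liberty Insurance"]},
    {"op": "move_mouse", "mode": "random"},
    {"op": "click", "probability": 0.20},  # click 20% of the time
]

def simulate_click_decisions(n_impressions, probability, rng):
    """Count how often the probabilistic 'click' action fires."""
    return sum(rng.random() < probability for _ in range(n_impressions))

rng = random.Random(42)
clicks = simulate_click_decisions(100_000, 0.20, rng)
print(clicks / 100_000)  # close to 0.20, the programmed click rate
```

The point of the probability is camouflage: a bot that clicked every ad would stand out instantly, while one clicking a plausible fraction of the time blends into human-looking averages.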
The binary is so detailed, you really get a sense of the people behind the code as you go through it. “You know someone sat there and chose these things and wrote this code,” Mr. Johnsen said. “We do wonder, what’s the equivalent team to us sitting in a real dark dungeon making a lot of money.”
An even fuller picture of the fraudsters emerges when looking at the message boards they operate on. The Google team monitors these forums, watching as the bad actors buy and sell infected computers and traffic directed to exploits. During my visit, the team showed me one middleman’s post that even included a “Fraudsters—don’t bother” warning. The middleman, of course, was referring to those who would defraud him, not to ad fraudsters, the desired audience.
This black market operates with its own system of checks and balances. There is a reputation point system and an escrow where money can be placed while the goods are delivered. “There is an element of at least recognizing that there’s a lot of effort that goes into this fraud,” Mr. Johnsen said.
The fraudsters are not bulletproof, though. Unlike the bots they spawn, they make mistakes like the rest of us humans. And those mistakes, sometimes seemingly tiny and insignificant, are what allow Google to definitively identify and neutralize their handiwork.
Long discussions about ad fraud necessitate lots of coffee, which is where Google’s famous microkitchens came in handy. After each session on the screens, some of which lasted nearly two hours, the team flocked to the coffee machines, using the opportunity to caffeinate and forget, for a moment, about the pixels and numbers. The refill was especially needed before a discussion about the complex giveaways that the team uses to ID bot traffic.
When Google’s fraud fighters finish reverse-engineering a botnet’s code, they are left with a detailed blueprint of the botnet’s behavior. Thanks to Google’s massive size, the blueprint can then be overlaid on top of Google’s wealth of impression data to find chunks of traffic that match up.
As part of this process, the Google team looks to match the traffic to both the botnet’s characteristics and what it calls “signals.” Characteristics are straightforward. They are any type of traffic behavior that occurs in nature, such as click-through rate, conversion rate, browser used and even where on the page the clicks occur. One botnet the group showed me, called z00clicker, instructed its drones to pick two random points on the page and move along the line, clicking as soon as its path crossed something clickable. The botnet, then, left a distinct pattern of clicks on ad creative—a signature, if you will. A map of clicks on ads shown to z00clicker traffic shows incredible click density on the edges, and little action in the center.
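That edge-heavy signature falls out of the described behavior almost by necessity, and a toy simulation makes it visible. The sketch below assumes a 1,000-by-1,000-pixel page with a single clickable ad in the center; it is an illustration of the pattern, not z00clicker's actual code.

```python
import random

# Toy reconstruction of the described z00clicker pattern: pick two
# random points on the page, walk the line between them, and "click"
# at the first point that falls inside a clickable region.
PAGE = 1000
AD = (400, 400, 600, 600)  # hypothetical ad creative: x0, y0, x1, y1

def inside(x, y, rect):
    x0, y0, x1, y1 = rect
    return x0 <= x <= x1 and y0 <= y <= y1

def first_click(rng):
    """Return the first point along a random segment that hits the ad."""
    ax, ay = rng.uniform(0, PAGE), rng.uniform(0, PAGE)
    bx, by = rng.uniform(0, PAGE), rng.uniform(0, PAGE)
    for i in range(1001):  # walk the segment in 1,000 small steps
        t = i / 1000
        x, y = ax + t * (bx - ax), ay + t * (by - ay)
        if inside(x, y, AD):
            return x, y
    return None  # this segment never crossed the ad

def near_edge(x, y, rect, eps=5):
    x0, y0, x1, y1 = rect
    return min(x - x0, x1 - x, y - y0, y1 - y) <= eps

rng = random.Random(0)
clicks = [c for c in (first_click(rng) for _ in range(5000)) if c]
edge_share = sum(near_edge(x, y, AD) for x, y in clicks) / len(clicks)
print(round(edge_share, 2))  # the vast majority land on the edges
```

Because each walk stops at the first pixel inside the creative, clicks pile up along its border, which is exactly the kind of density map no population of real humans would ever produce.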
Characteristics are helpful, but when Google labels traffic as nonhuman, it takes dramatic action: it refuses to pay the publisher that served the ad, and it does not charge the advertiser a cent. A call that consequential requires something more conclusive. Which is why signals are critical.
A signal is a type of behavior that does not exist under normal circumstances but is inadvertently created by the fraudster when he programs the bot. “Our job is to try to find the little signals that unfortunately [for him] some particular creator of these [botnet] payloads has leaked,” Mr. de Jager said. “It’s a way we can then identify the traffic that comes from the infected machines for this particular payload.”
The Google team was particularly cagey about handing over signals, as many of them are still live and can tip off the fraudsters if published. “We’re skirting pretty close to the edge on quite a few things,” Mr. de Jager had said at the outset of my visit. And this was the edge.
The team did provide a few examples, though, of signals it believes are unique to ZeroAccess, a botnet that Microsoft helped kill in 2013, but has since resurrected itself. Here’s one: In nature, resetting a browser’s cookies yields a “0” in its cookie field. But, for some reason, ZeroAccess inserts a space character there. The botnet resets the cookies on the browser before each browsing session, so it shows that space fairly regularly. This signal is enough to identify the traffic as ZeroAccess-generated, but often Google requires a number of signals showing up at the same time to definitively say the traffic is of a certain botnet and dispose of it.
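A check for that one signal could be sketched as below. The log format here is hypothetical, and as noted, the real pipeline would typically require several signals to line up before condemning traffic.

```python
# Minimal sketch of the ZeroAccess cookie signal described above: a
# freshly reset cookie field should read "0", but this botnet's payload
# writes a space character instead. Record layout is an assumption.
def zeroaccess_cookie_signal(cookie_field):
    """Return True if the field shows the anomalous space, not '0'."""
    return cookie_field == " "

requests = [
    {"ip": "203.0.113.5", "cookie": "0"},     # normally reset browser
    {"ip": "198.51.100.7", "cookie": " "},    # matches the signal
    {"ip": "192.0.2.9", "cookie": "id=abc"},  # ordinary session cookie
]
flagged = [r["ip"] for r in requests if zeroaccess_cookie_signal(r["cookie"])]
print(flagged)  # ['198.51.100.7']
```

A single stray space would be meaningless in one request; it becomes damning when it recurs at the start of session after session from the same population of machines.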
Any force of good that fights the powers of evil needs its defining piece of hardware. For Batman, it’s the Batmobile. For Frodo, the Ring. The Jedi, of course, have their lightsabers.
For the Google team, it’s Powerdrill.
Powerdrill is a freak computing system. It’s capable of processing a half trillion cells of data in less than five seconds (translation: It’s damn fast). And it can spit that data out as charts and other graphical representations that make it possible to spot the irregularities of nonhuman traffic.
Introducing a session on the tool, Mr. de Jager simply labeled it “Here be dragons.” Another member of the team, Phil, slid up to the monitors and opened a Powerdrill screen showing a monster piece of traffic originating almost entirely from four IP addresses and one web server. The traffic, clearly all the work of some central entity, generated 100 million ad clicks on a single Google network over the course of just 10 days. “This is real traffic,” Phil explained. “This is using data from three days ago.”
This swath of traffic was so massive, it likely disrupted the results of countless ad campaigns over the 10 days measured, and it’s still running today. “This has potential to be artificially inflating the clickthrough rate of advertising campaigns quite significantly across the board,” Phil noted.
Perplexingly, this traffic is not even part of a botnet. “This company”—which Phil declined to name—”is actually an ad verification service.” What the company does, he said, is go across the web, sampling as many ads as possible and clicking through to record which landing page each ad leads to. Though the verification service could identify itself in its browser as “nonhuman,” it chooses not to, and is therefore passing itself off as legitimate human traffic to plenty of ad-tech companies that have not yet identified it.
Sharing this type of information with other companies could go a long way toward combating the industry’s fraud problem—and Google now appears ready to do it. Mr. de Jager said his team is about to start publishing detailed information on bad traffic for the first time, providing disclosures on everything from the type of traffic created by the ad verification company to details on certain botnets it detects.
Mr. de Jager hopes Google’s disclosures will inspire other companies to release their own findings, combining to help the industry turn ad fraud into an uneconomic proposition for fraudsters. “Our job is to increase the cost for them and reduce the payout to the point where they think, ‘Maybe ad fraud is not where we should be making our money,'” he said.
Whether or not the Google team (or the ad industry itself for that matter) is making significant progress toward that goal is, admittedly, hard to tell. I did see the fight in action during my visit, and what I witnessed was the best-articulated plan to eliminate the problem out of the dozens I’ve listened to. But the war against ad fraud is so frustratingly opaque that if I reported back that the good guys are winning, or even on their way, it would be disingenuous.
If that day of victory comes, though, Mr. de Jager has a plan. He joked he might take a vacation and “go to the same beach that maybe some of these cybercriminals are hanging out on,” he said. “Share a drink? Who knows?”