Introduction to the Core Services of the Google Privacy Sandbox

In the previous post we ended with a high-level diagram of the revised Chrome browser as adapted for the Google Privacy Sandbox (Figure 1). In this article, I will introduce in more detail the core products and services that make up the browser side of the Privacy Sandbox. In the subsequent article, we will go through each of the elements in Figure 1 and explain how they support and tie into the products and services that need to be delivered (Figure 2). After that, I will delve deeply into each of the services and how they work, referring only to those API calls that are most critical to understanding. Lastly, I will tie it all together by tracing the current flow of a transaction through these browser elements. That is how the Privacy Sandbox works today, because the server-side elements are still a long way from being implemented. We will therefore cover those server-side elements and their impact on the overall architecture in later articles.

Figure 1 - The Browser with Updates for Google Privacy Sandbox

The Core Sandbox Technical APIs/Product Elements

Now it may seem like I am violating my promise to not drill into APIs, but in order to understand the Privacy Sandbox you first need to understand the core product elements, and these products are packaged as APIs with completely separate functions. So as I describe them, you can just mentally “remove” the term API and you will be able to see them as product names.

There are three core browser-centric products in the overall Google Privacy Sandbox Suite, with many supporting elements (also defined by APIs).

Topics
Protected Audience
Attribution Reporting

There are also two core server-side products that make up the complete suite which we will cover later:

Key Management Server (there are at least two in order to provide Multi-party Computation)
k-anonymity server

I do not think of the balance of the technologies, such as Fenced Frames or DNS over HTTPS, as “products” per se, because they are technologies designed to support the core products, not products in and of themselves. Many are evolutions of browser standards that already exist, or they are additions to the browser, such as secure Shared Storage, that will be available to more than just the Privacy Sandbox.

Topics API

The Topics API is intended to deliver targeting for contextual audiences without cookies as part of the Privacy Sandbox. Contextual audiences are relatively easy to create. You index all the pages on various websites and categorize them by some kind of audience taxonomy. Then you capture in the browser what pages a particular browser visits and algorithmically determine the most likely “fit” for the browser into an interest-based audience or audiences.
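
To make the steps above concrete, here is a minimal sketch in Python of the "categorize, then pick the best fit" idea. It is not how Chrome implements the Topics API. The tiny taxonomy, the hostnames, and the function name top_topics are all hypothetical, and simple visit frequency stands in for whatever algorithm a real browser would use.

```python
from collections import Counter

# Hypothetical mini-taxonomy: hostname -> topic. Real taxonomies are far larger.
SITE_TOPICS = {
    "marathon-news.example": "Sports/Running",
    "trail-shoes.example": "Sports/Running",
    "budget-flights.example": "Travel",
    "weeknight-recipes.example": "Food & Drink/Cooking",
}


def top_topics(visited_hosts, n=2):
    """Return the n most frequently seen topics for a list of visited hostnames.

    Hosts that are not in the taxonomy are ignored.
    """
    counts = Counter(
        SITE_TOPICS[host] for host in visited_hosts if host in SITE_TOPICS
    )
    return [topic for topic, _count in counts.most_common(n)]


# Browsing history kept locally in the (simulated) browser.
history = [
    "marathon-news.example",
    "trail-shoes.example",
    "weeknight-recipes.example",
    "marathon-news.example",
    "not-in-taxonomy.example",
]
print(top_topics(history))  # ['Sports/Running', 'Food & Drink/Cooking']
```

The point of the sketch is that the raw history never leaves the function. Only the short list of topic labels is exposed.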

For example, the IAB has a ~1,500 element audience taxonomy that could easily be used for contextual targeting. At this point, Google is using a 471-element taxonomy as part of the Topics API. If you were to ask me why Google is not using the IAB taxonomy, which would make it easier to provide consistent contextual targeting across Google, publisher sites, and other third-party adTech platforms, the answer lies in the need to maintain k-anonymity in order to comply with privacy requirements. In general, an audience must have at least 50-100 members in order for it to be considered sufficiently anonymous for purposes of targeting. If the taxonomy is too fine-grained, it becomes difficult to create audiences large enough to meet the anonymity requirement.
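
A little arithmetic shows why granularity matters. The sketch below is illustrative only: the population of 30,000 browsers is a made-up number, the even spread across categories is an idealization (real audiences are heavily skewed), and the threshold of 50 is simply the lower end of the range quoted above.

```python
K_THRESHOLD = 50  # lower end of the 50-100 range quoted in the text


def average_segment_size(population, taxonomy_size):
    """Average members per category if the population spread evenly (an idealization)."""
    return population / taxonomy_size


def usable_segments(segment_sizes, k=K_THRESHOLD):
    """Keep only the segments that have at least k members."""
    return {name: size for name, size in segment_sizes.items() if size >= k}


population = 30_000  # hypothetical number of browsers seen by one platform

for taxonomy_size in (471, 1500):
    avg = average_segment_size(population, taxonomy_size)
    status = "meets" if avg >= K_THRESHOLD else "falls below"
    print(f"{taxonomy_size} categories: about {avg:.1f} browsers each, {status} k={K_THRESHOLD}")

# Segments that are too small are simply not available for targeting.
print(usable_segments({"Sports/Running": 120, "Sports/Curling": 12}))
```

With these assumed numbers the coarser taxonomy averages roughly 64 browsers per category, while the finer one averages 20, so most of its categories could not be targeted at all.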

The Topics API evolved out of what I consider the first true “product” that came out of the process that has led to today’s Google Privacy Sandbox: Federated Learning of Cohorts, or FLoC. Federated learning is a data science approach that allows PII (or any other data) to stay where it resides (in this case, in the browser). Each device trains the algorithm against its local data and, when needed, sends only the resulting updates to the algorithm’s weights to a central server in anonymous fashion. The server combines these updates, and the revised weights are then sent back to each device, where the algorithm is run against the local data.
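
For readers who have not seen federated learning before, here is a toy sketch of one common variant, federated averaging. It is a generic illustration, not anything taken from FLoC. The one-parameter model (y = w * x), the learning rate, and the three simulated devices are all assumptions chosen to keep the example small.

```python
def local_update(w, data, lr=0.05, steps=20):
    """Run a few gradient-descent steps for the model y = w * x on one device's data.

    The raw (x, y) pairs never leave this function; only the new weight does.
    """
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, devices):
    """Each device trains locally; the server sees and averages only the weights."""
    local_weights = [local_update(global_w, data) for data in devices]
    return sum(local_weights) / len(local_weights)


# Three simulated devices, each holding private samples of y = 3 * x.
devices = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]

w = 0.0  # initial global weight held by the server
for _ in range(10):
    w = federated_round(w, devices)

print(round(w, 3))  # approaches the true slope of 3
```

The server learns a good value for w even though it never receives a single data point from any device.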

Google came up with an approach that used federated learning to create contextual audiences. A cohort was a short name shared by a large number (thousands) of people, derived by the browser from its user’s browsing history. The browser would update the cohort over time as its user viewed pages on the web. In FLoC, the browser used local algorithms to develop a cohort based on the sites that an individual visited. These algorithms might be based on any number of distinguishable features, such as the URLs of the visited sites, the content of those pages, or other factors. The central idea was that these input features to the algorithm, including the web history, were kept only in the browser and were not uploaded elsewhere; the browser only exposed the generated cohort.
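
One way to picture how a browser could turn a history into a short cohort name without uploading anything is a locality-sensitive hash. The sketch below uses SimHash, the technique publicly described for Chrome's FLoC trial, but it is only an illustration under my own assumptions: the 8-bit cohort width, the use of SHA-256, and the function names are mine and are not Chrome's implementation.

```python
import hashlib


def simhash(domains, bits=8):
    """Compute a bits-wide SimHash over a set of visited domains.

    Each domain votes +1 or -1 on every bit position; the sign of the total
    decides the bit. Similar histories tend to share most bits.
    """
    totals = [0] * bits
    for domain in set(domains):
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            totals[i] += 1 if (value >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two cohort IDs differ."""
    return bin(a ^ b).count("1")


history_a = ["marathon-news.example", "trail-shoes.example", "weather.example"]
history_b = ["marathon-news.example", "trail-shoes.example", "recipes.example"]

cohort_a = simhash(history_a)
cohort_b = simhash(history_b)
print(cohort_a, cohort_b, hamming(cohort_a, cohort_b))
```

Only the small integer would ever be exposed to a site. The list of domains stays in the browser, which is the property the paragraph above describes.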

The FLoC API was developed in 2019-2020 and tested in 2021. Testing ended in July 2021 for the reasons below, and the lessons learned were incorporated into the current Topics API:

FLoC ended up not using federated learning. Google and others found that on-device computation was faster and less resource intensive. So by definition the whole approach (and naming, obviously) had to change.
FLoC did not provide enough protection against cross-site identifying information. Because of this, device fingerprinting was still possible. Two academics from MIT found that more than 95 percent of user devices could be uniquely identified after only four weeks.
The adTech industry wanted more transparency and control over how the contextual categories were created. In FLoC, contextual audiences were created automatically by the algorithm, not from a fixed taxonomy. The result was also unpredictable, which meant cohorts could be created around sensitive topics and the adTech providers would not be able to prevent advertisers’ ads from showing in contexts unsuitable for specific brands.

We will drill into more detail on all of these issues when we talk about contextual audience creation under the Privacy Sandbox.

Protected Audience API

The Protected Audience API is the core product you read about in articles about the ongoing testing and evolution of the Privacy Sandbox. It started life as something called TurtleDove. As a side note: to this day I don’t know why bird names were chosen, even though I still have emails in my folders from Michael Kleber (of Google, one of the core technical leaders of the Privacy Sandbox initiative) about setting up the repository. A series of other bird-named API proposals followed (PIGIN, DoveKey, TERN, SPARROW, PARRROT, SPURFOWL, SWAN), but ultimately TurtleDove and the best suggestions from these other proposals were merged into FLEDGE, which stands for First Locally-Executed Decision over Groups Experiment. FLEDGE was then renamed the Protected Audience API (abbreviated as PAAPI) in April 2023, once the technology looked reasonably viable and a more “product-oriented” name was needed.

The goal of the Protected Audience API was to allow advertisers to target audiences in the browser based on behavior the advertiser had seen (for example, purchases made on their website) without being able to combine that with other information about the person, in particular who they are or what pages they visit across the web. The Protected Audience API calls these audiences interest groups, but I find that quite confusing, because I tend to think of interest groups as being associated with contextual targeting (i.e., people who read certain pages have an interest in that topic). Even the Topics API documentation shows this same issue with the naming of audience concepts:

“Interest-based advertising (IBA) is a form of personalized advertising in which an ad is selected for the user based on interests derived from the sites that they’ve visited in the past. This is different from contextual advertising, which is based solely on the interests derived from the current site being viewed (and advertised on).”

The term interests, as in interest groups, is used for audience concepts in both the Protected Audience and Topics APIs. Yet these are very different types of audiences and are stored in different browser storage locations (once again, read “files on the hard drive”).

So moving forward, we will use the term contextual audiences to refer to audiences in the Topics API, and interest-based audiences or interest groups to refer to audiences in the Protected Audience API.

The Protected Audience API Is Where Auctions and Bidding Are Handled

One interesting aspect of the Google Privacy Sandbox is that there are no separate products called the Auctions API or Bidding API. There is documentation for auction and bidding services in the main GitHub Privacy Sandbox repository, but these are services called under the Protected Audience API. This is why all the effort right now is on testing PAAPI, because it is where bid requests and bid responses for both contextual audiences and interest-based audiences occur. PAAPI also specifies where and how the ad for the winning bid is delivered to the browser, which is where the concept of Fenced Frames is defined. So while the Protected Audience API defines how interest-based audiences are created, stored, and used, it is also the core product of the three because it encompasses all the other services needed to bid for and deliver ads.
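
To give a feel for what "bidding and auction under one API" means, here is a toy simulation in Python of an on-device auction. It is a sketch under assumptions, not the real API: in the browser the buyer's bidding logic and the seller's scoring logic are JavaScript functions (generateBid() and scoreAd() in the PAAPI explainer), whereas the class names, fields, URLs, and scoring rule below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InterestGroup:
    owner: str                              # the buyer, e.g. a DSP acting for an advertiser
    name: str                               # audience name chosen by the buyer
    ad_url: str                             # creative to render if this group wins
    generate_bid: Callable[[dict], float]   # stand-in for the buyer's bidding logic


@dataclass
class AuctionResult:
    winner: InterestGroup
    bid: float
    score: float


def run_ad_auction(groups, score_ad, auction_signals) -> Optional[AuctionResult]:
    """Ask every stored interest group for a bid, let the seller score it, keep the best."""
    best = None
    for group in groups:
        bid = group.generate_bid(auction_signals)
        if bid <= 0:
            continue  # the buyer declined to bid
        score = score_ad(group.ad_url, bid)
        if score > 0 and (best is None or score > best.score):
            best = AuctionResult(group, bid, score)
    return best


def seller_score_ad(ad_url, bid):
    # Hypothetical seller rule: reject a blocked domain, otherwise rank by bid.
    if "blocked-brand.example" in ad_url:
        return 0.0
    return bid


groups = [
    InterestGroup("dsp-a.example", "running-shoes",
                  "https://ads.dsp-a.example/shoes", lambda s: 2.0),
    InterestGroup("dsp-b.example", "city-breaks",
                  "https://blocked-brand.example/ad", lambda s: 3.5),
    InterestGroup("dsp-c.example", "cookware",
                  "https://ads.dsp-c.example/pans",
                  lambda s: 1.25 if s["page_topic"] == "cooking" else 0.0),
]

result = run_ad_auction(groups, seller_score_ad, {"page_topic": "running"})
print(result.winner.name, result.bid)  # running-shoes 2.0
```

Notice that the highest bid (3.5) does not win, because the seller's scoring step filters it out. Bidding, scoring, and picking the ad to render all happen inside the one auction call, which is why there is no separate "Auctions API".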

The Protected Audience API Also Covers Auction Results Reporting

Reporting on auctions and conversions is a significantly complicated topic in the Privacy Sandbox and has not been fully fleshed out yet. Reporting on conversions, attributing them to specific ads, and the rules by which fractional attribution is done are all handled by the Attribution Reporting API. But reporting on auctions (what the auction structure was, what the winning bid was and its features, and what happened to losing bids) is covered by PAAPI.

There are two kinds of reports:

Event-level reports associated with a particular auction, bid and ad delivery to a specific browser
Aggregatable reports that provide a mechanism for rich metadata to be reported in aggregate, to better support use-cases such as campaign-level performance reporting, segmentation based on contextual or interest-based audiences, as well as reports combining with second- or third-party data to do reporting on demographic, psychographic, or other segmentation schemes.

Today, reporting is in its infancy. For the first origin trial (FOT #1), reporting functions in the Protected Audience API can send event-level reports directly to participating adTechs’ servers. There is a longer-term plan for doing both event-level and aggregate-level reporting in a way that prevents an adTech from learning which interest groups a particular browser belongs to. The basis for this long-term approach is currently outlined in a draft proposal called the Private Aggregation API. This API covers numerous potential use cases beyond programmatic bidding. As a result, there is also an extension of that API specifically for the Protected Audience API, which is described in the PAAPI repository.

Reporting is complicated even further because the Privacy Sandbox is built around something called fenced frames, which will be discussed in detail in the next article. Fenced frames are a privacy-preserving version of an iframe. The problem is that the reporting functions in PAAPI, named reportResult() for publishers and reportWin() for advertisers respectively, can see results for contextual ad requests under the Topics API, but cannot “see” the results of interest-based ad events that occur in the fenced frame because of its privacy protections. Therefore there has to be a way to get the information about impressions, interactions, and clicks for interest-based ads out of the fenced frame for reporting purposes. This is handled by the Fenced Frames Ads Reporting API endpoints that are part of the PAAPI specification.

Attribution Reporting API

The Attribution Reporting API provides measurement services for both publishers and advertisers within the Google Privacy Sandbox. As described directly in the API documentation, the Attribution Reporting API makes it possible to measure when an ad click or view leads to a conversion on an advertiser site, such as a sale or a sign-up. The API enables two types of attribution reports:

Event-level reports associate a particular event on the ad side (a click, view or touch) with coarse conversion data. To preserve user privacy, conversion-side data is coarse, and reports are noised and are not sent immediately. The number of conversions is also limited.
Aggregatable reports provide a mechanism for rich metadata to be reported in aggregate, to better support use-cases such as campaign-level performance reporting or conversion values.

The API allows advertisers and ad tech providers to measure conversions from:

Ad clicks and views.
Ads in a third-party iframe, such as ads on a publisher site that uses a third-party adTech provider.
Ads in a first-party context, such as ads on a social network or a search engine results page, or a publisher serving their own ads.

Each browser captures the activity and sends encrypted reports to an adTech server. The adTech’s own servers, whether belonging to the publisher or the advertiser (or their proxies, like an SSP or DSP), cannot see the individual events. Instead, an aggregation service running in a Trusted Execution Environment decrypts the reports and then aggregates the individual browser actions into aggregate, privacy-preserving reports. These are the only reports that the advertiser and publisher can see from this API.
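
The aggregation step can be pictured with a short simulation. This is a sketch under stated assumptions, not the real aggregation service: encryption and decryption are left out entirely, the campaign keys and values are invented, and Laplace noise with an arbitrary scale is used here as one common way of making summary totals privacy-preserving.

```python
import random
from collections import defaultdict


def aggregate(reports, noise_scale, rng):
    """Sum report values per key, then add Laplace noise to each total.

    Only the noisy totals are returned, so no individual report is ever exposed.
    """
    totals = defaultdict(float)
    for key, value in reports:
        totals[key] += value

    noisy = {}
    for key, total in totals.items():
        # The difference of two exponential draws is a Laplace(0, noise_scale) sample.
        noise = rng.expovariate(1 / noise_scale) - rng.expovariate(1 / noise_scale)
        noisy[key] = total + noise
    return noisy


# Simulated (already decrypted) reports from many browsers: (campaign key, purchase value).
reports = [
    ("campaign-1", 50), ("campaign-1", 100),
    ("campaign-2", 20), ("campaign-2", 30), ("campaign-2", 25),
]

summary = aggregate(reports, noise_scale=5.0, rng=random.Random(42))
for key, value in sorted(summary.items()):
    print(key, round(value, 1))
```

The true totals here are 150 and 75. The advertiser sees values close to those, but cannot work backwards to any single browser's contribution.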

One key difference between the Attribution Reporting API and the standard reporting in the Protected Audience API is that in the Attribution Reporting API there is what I will call a two-sided event. The first event is the ad being shown and some activity around that. The second is a purchase or some other conversion event on the advertiser’s site. The ad is considered the “attribution source” or “reporting origin” and has a unique source_id, while the conversion action is considered the “destination”. The two events are tied together by a unique destination ID that is registered to the attribution source at the time it is created.

There are two other important aspects of the Attribution Reporting API that distinguish it from auction-based reporting. First, ads can be given priorities. These priorities represent how much weight they will be given in a fractional attribution system. Second, there is an attribution window, which is the amount of time after the ad is displayed during which a conversion will be counted against that impression. The default is 30 days, but the advertiser can set it to anywhere between 1 and 30 days. As of now, 30 days is the maximum conversion window allowed. My guess is this will be extended at some point, since automobile advertisers tend to use longer attribution windows.
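
Here is a small sketch of how priorities and attribution windows could interact when a conversion arrives. It rests on a simplifying assumption of mine: rather than splitting credit fractionally, the highest-priority eligible ad takes all the credit, with ties going to the most recent ad. The class, field, and function names are hypothetical and are not the API's actual registration fields.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in a day


@dataclass
class Source:
    source_id: str
    destination: str       # site where the conversion is expected
    registered_at: int     # time the ad was shown, in seconds
    priority: int = 0
    window_days: int = 30  # attribution window, default 30 days

    def __post_init__(self):
        # Keep the window inside the 1 to 30 day range described in the text.
        self.window_days = max(1, min(30, self.window_days))


def pick_source(sources, destination, conversion_time):
    """Return the source that gets credit for a conversion, or None.

    Assumed rule: highest priority wins, ties go to the most recent ad.
    """
    eligible = [
        s for s in sources
        if s.destination == destination
        and 0 <= conversion_time - s.registered_at <= s.window_days * DAY
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda s: (s.priority, s.registered_at))


sources = [
    Source("display-ad", "shop.example", registered_at=0, priority=0, window_days=30),
    Source("video-ad", "shop.example", registered_at=0, priority=10, window_days=7),
]

print(pick_source(sources, "shop.example", 5 * DAY).source_id)   # video-ad
print(pick_source(sources, "shop.example", 10 * DAY).source_id)  # display-ad
print(pick_source(sources, "shop.example", 40 * DAY))            # None
```

On day 5 the higher-priority video ad takes the credit. By day 10 its 7-day window has closed, so the display ad is credited instead, and after 30 days nothing is attributed at all.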

A Services View of the Google Privacy Sandbox

Figure 1 showed the physical elements in the browser that support the Google Privacy Sandbox. We can take a different view when thinking about the three core products, which are really nothing more than services delivered through APIs. This view is helpful because it shows all the other services and APIs on which the three core products depend, many of which have their own W3C standards, W3C working groups, and GitHub repositories. It is displayed in Figure 2.

Figure 2 - A Services View of the Google Privacy Sandbox

To reiterate, I am not trying to show the entire services architecture of Chrome or any other browser. I am only trying to represent enough of the features and services to explain how the Privacy Sandbox works.

That’s all for today. In the next article, I will go back to the core browser elements and tie them to the products/services that have been the focus of this article.