Topics API

Unofficial Proposal Draft

This version:
https://github.com/patcg-individual-drafts/topics
Issue Tracking:
GitHub
Editors:
(Google)
(Google)
Participate:
GitHub patcg-individual-drafts/topics (new issue, open issues)

Abstract

This specification describes a method that could enable ad-targeting based on a person’s general browsing interests without exposing their exact browsing history.

Status of this document

This document is an individual draft proposal. It has not been adopted by the Private Advertising Technology Community Group, but it may be discussed in that CG’s meetings. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

On today’s web, people’s interests are typically inferred based on observing what sites or pages they visit. This relies on tracking techniques such as third-party cookies, or less-transparent mechanisms like device fingerprinting. It would be better for privacy if interest-based advertising could be accomplished without needing to collect a particular individual’s browsing history.

This specification provides an API to enable ad-targeting based on a person’s general browsing interests, without exposing their exact browsing history.

Creating an ad based on browsing interests, using the document.browsingTopics() JavaScript API:

(Inside an https://ads.example iframe)

// document.browsingTopics() returns an array of BrowsingTopic objects.
const topics = await document.browsingTopics();

// Get data for an ad creative.
const response = await fetch('https://ads.example/get-creative', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(topics)
});

// Get the JSON from the response.
const creative = await response.json();

// Display the ad.

Creating an ad based on browsing interests, using the `Sec-Browsing-Topics` HTTP request header sent by this invocation of fetch():

(Inside the top level context)

// A 'Sec-Browsing-Topics: [topics header value]' header will be sent in
// the HTTP request.
const response = await fetch('https://ads.example/get-creative', {browsingTopics: true});
const adCreative = await response.json();
// Display the ad.

2. Terminology and types

A taxonomy comprises a list of advertising topic ids as integers. A taxonomy is identified by a taxonomy version string. A topic id is no smaller than 1.

The taxonomy must be in a tree hierarchy, where an ancestor topic id always represents something more general than its descendant topic ids. The browser should implement a get descendant topics algorithm, which takes a topic id, and returns its descendants' topic ids as a list.

The model version is a string that identifies the model used to classify a string into topic ids. The meaning may vary across browser vendors. The classification result topic ids should be relevant to the input string’s underlying content.

The configuration version identifies the algorithm (other than the model part) used to calculate the topic. It should take the form of "<browser vendor identifier>.<an integer version>". The meaning may vary across browser vendors.

Given configuration version configurationVersion, taxonomy version taxonomyVersion, and model version modelVersion, the version is the result of concatenating « configurationVersion, taxonomyVersion, modelVersion » using ":".

The maximum version string length is the greatest length of a version that a user agent could generate in a given software release. For example, in Chrome’s experimentation phase, 13 was used for the maximum version string length to account for a version like chrome.1:1:11.
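The following non-normative sketch illustrates how a version could be assembled from its three components. The helper name `makeVersion` is hypothetical and is not defined by this specification.

```javascript
// Hypothetical helper: joins the configuration, taxonomy, and model
// versions with ":" in the order given in the definition of "version".
function makeVersion(configurationVersion, taxonomyVersion, modelVersion) {
  return [configurationVersion, taxonomyVersion, modelVersion].join(':');
}

const version = makeVersion('chrome.1', '1', '11');
console.log(version);        // chrome.1:1:11
console.log(version.length); // 13
```

With these inputs the result has 13 characters, which matches the maximum version string length quoted for Chrome’s experimentation phase.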

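A user topics state is a struct with the following fields and default values:
  epochs: a list of epochs, initially empty.
  hmac key: a 128-bit number.

An epoch is a struct with the following fields:
  taxonomy: a list of topic ids.
  taxonomy version: a taxonomy version.
  model version: a model version.
  config version: a configuration version.
  top 5 topics with caller domains: a list of topics with caller domains.
  time: a duration from the Unix epoch.

A topic with caller domains is a struct with the following fields:
  topic id: a topic id, initially 0.
  caller domains: a set of domains, initially empty.

A topics history entry is a struct with the following fields and default values:
  document id: a document id.
  topics calculation input data: a string.
  time: a duration from the Unix epoch.
  topics caller domains: a list of domains, initially empty.

A topics caller context is a struct with the following fields:
  caller domain: a domain.
  top level context domain: a domain.
  timestamp: a duration from the Unix epoch.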

All domains used in this API will be the result of obtaining the registrable domain from some host.

3. User agent associated state

Each user agent has an associated user topics state, with epochs initially empty and hmac key initially a randomly generated 128-bit number.

Each user agent has an associated topics history storage to store the information about the visited pages that are needed for topics calculation. It is a list of topics history entries, initially empty.

Each user agent has an associated taxonomy (identified by a taxonomy version) and an associated model (identified by a model version).

The taxonomy and model may be shipped to the browser asynchronously with respect to the browser release, and may be unavailable at a given point. They must be updated atomically with respect to algorithms that access them (e.g. the calculate user topics algorithm).

Note: In Chrome versions M121 and later, the taxonomy used is taxonomy_v2.md. The expectation is that it will change over time.

Each user agent has an associated topics algorithm configuration (identified by a configuration version). The initial value and meaning are browser defined.

Note: The configuration version allows the browser vendor to provide algorithms different from the ones specified in this specification. For example, for some of the algorithms in this specification, it may be possible to use a different constant value, while the system overall still has utility and meets the privacy goals.

When the configuration version is updated, the browser must properly migrate or delete data in user topics state and topics history storage so that the state and the configuration are consistent.

3.1. Expiring stored data

User agents must automatically delete stored data 28 days after its creation.

4. BrowsingTopic dictionary

The BrowsingTopic dictionary is used to contain the IDL correspondences of topic id, version, configuration version, taxonomy version, and model version.

dictionary BrowsingTopic {
  [EnforceRange] unsigned long long topic;
  DOMString version;
  DOMString configVersion;
  DOMString modelVersion;
  DOMString taxonomyVersion;
};

An example BrowsingTopic object from Chrome: { configVersion: "chrome.1", modelVersion: "1", taxonomyVersion: "1", topic: 43, version: "chrome.1:1:1" }.

A BrowsingTopic dictionary a is code unit less than a BrowsingTopic dictionary b if the following steps return true:
  1. If a["version"] is code unit less than b["version"], then return true.

  2. If b["version"] is code unit less than a["version"], then return false.

  3. If a["topic"] < b["topic"], then return true.

  4. Return false.
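
The following non-normative sketch shows one way such an ordering could be used to sort and de-duplicate BrowsingTopic dictionaries. It assumes that the intended order is by version first and by topic id second. The names `compareBrowsingTopics` and `sortAndDedupe` are hypothetical. JavaScript's `<` operator on strings compares by code unit, which is why it is used here.

```javascript
// Returns a negative number, zero, or a positive number, ordering by
// version (code unit order) and then by topic id.
function compareBrowsingTopics(a, b) {
  if (a.version < b.version) return -1;
  if (a.version > b.version) return 1;
  return a.topic - b.topic;
}

// Sorts a copy of the input and drops entries that compare equal to the
// entry before them (neither is less than the other).
function sortAndDedupe(topics) {
  const sorted = [...topics].sort(compareBrowsingTopics);
  return sorted.filter(
    (t, i) => i === 0 || compareBrowsingTopics(sorted[i - 1], t) !== 0);
}

const example = sortAndDedupe([
  { topic: 43, version: 'chrome.1:1:2' },
  { topic: 7, version: 'chrome.1:1:2' },
  { topic: 43, version: 'chrome.1:1:2' },
  { topic: 100, version: 'chrome.1:1:1' },
]);
console.log(example.map((t) => t.topic));
```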

5. document ID

Each Document has a document id, which is an implementation-defined unique identifier shared with no other Document objects within or across browser sessions for a user agent.

6. Determine topics calculation input data

Given a Document, the browser must have a way to determine the topics calculation input data. The topics calculation input data is a string that encodes the attributes to be used for topics classification, as determined by the browser vendor. By default, the attributes should be scoped to the document’s URL and metadata.

Note: unless specifically allowed, data beyond the document shouldn’t be included, such as data from localStorage or cookies.

Note: In Chrome’s experimentation phase, the host of a Document's URL is used as the topics calculation input data, and the model is trained with human curated hostnames and topics.

7. Collect page topics calculation input data

To collect page topics calculation input data, given a Document document:
  1. If document’s node navigable is a prerendering navigable, then append the following steps to document’s post-prerendering activation steps list and return. Else, run the following steps in parallel:

    1. Let documentId be document’s document id.

    2. If user agent’s topics history storage contains a topics history entry whose document id is documentId, return.

    3. Let topicsHistoryEntry be a topics history entry.

    4. Set topicsHistoryEntry’s document id to documentId.

    5. Set topicsHistoryEntry’s topics calculation input data to the topics calculation input data for document.

    6. Let unsafeMoment be the wall clock's unsafe current time.

    7. Let moment be the result of running coarsen time algorithm given unsafeMoment and wall clock as input.

    8. Let fromUnixEpochTime be the duration from the Unix epoch to moment.

    9. Set topicsHistoryEntry’s time to fromUnixEpochTime.

    10. Append topicsHistoryEntry to user agent’s topics history storage.

8. Collect topics caller domain

To collect topics caller domain, given a Document document and a domain callerDomain:
  1. Run the following steps in parallel:

    1. Let documentId be document’s document id.

    2. If user agent’s topics history storage does not contain a topics history entry whose document id is documentId, return.

    3. Let topicsHistoryEntry be the topics history entry in user agent’s topics history storage whose document id is documentId.

    4. Append callerDomain to topicsHistoryEntry’s topics caller domains.

9. Derive top 5 topics

Given a list of topics history entries historyEntriesForUserTopics, the browser should provide an algorithm to derive top 5 topics that are believed to be valuable for the Topics callers. The algorithm should return a list of 5 topic ids.

In Chrome versions M122 and later, topics are scored for ranking first by a binary priority level (see topics-utility-buckets-v1.md), and then by the frequency of page loads with that topic.

Given a list of topics history entries historyEntriesForUserTopics:

  1. Let topicsCount be an empty map.

  2. For each topics history entry historyEntry in historyEntriesForUserTopics:

    1. Let topicIds be the result of classifying historyEntry’s topics calculation input data.

    2. For each topicId in topicIds:

      1. If topicsCount[topicId] does not exist:

        1. Initialize topicsCount[topicId] to 0.

      2. Increment topicsCount[topicId] by 1.

  3. Let prioritizedTopicsCount be the result of sorting topicsCount in ascending order, using compare topics based on priority and count as the less than algorithm.

  4. Let top5Topics be the first 5 keys of prioritizedTopicsCount, or all of its keys if it has fewer than 5.

  5. If top5Topics has fewer than 5 entries:

    1. Pad top5Topics with random topic ids from user agent’s taxonomy, until top5Topics has 5 entries.

  6. Return top5Topics.

To compare topics based on priority and count, given (topic1, count1) and (topic2, count2), perform the following steps. They return a boolean.

  1. Assert: count1 > 0.

  2. Assert: count2 > 0.

  3. Let highUtilityTopics be « 57, 86, 126, 149, 172, 180, 196, 207, 239, 254, 263, 272, 289, 299, 332 ».

  4. If highUtilityTopics contains topic1 and highUtilityTopics does not contain topic2, then return true.

  5. If highUtilityTopics does not contain topic1 and highUtilityTopics contains topic2, then return false.

  6. Return count1 > count2.
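
The following non-normative sketch illustrates the ranking described above. The `classify` callback stands in for the browser's model and the `taxonomy` array for the list of topic ids; both are hypothetical parameters, as are the function names and the `inputData` property. Random padding is injectable so that the example is reproducible.

```javascript
const HIGH_UTILITY_TOPICS = new Set(
  [57, 86, 126, 149, 172, 180, 196, 207, 239, 254, 263, 272, 289, 299, 332]);

// Returns true if (topic1, count1) should rank before (topic2, count2).
function ranksBefore([topic1, count1], [topic2, count2]) {
  const high1 = HIGH_UTILITY_TOPICS.has(topic1);
  const high2 = HIGH_UTILITY_TOPICS.has(topic2);
  if (high1 !== high2) return high1; // high utility topics come first
  return count1 > count2;            // then more frequent topics
}

function deriveTop5Topics(historyEntries, classify, taxonomy, random = Math.random) {
  // Count how many history entries were classified into each topic id.
  const topicsCount = new Map();
  for (const entry of historyEntries) {
    for (const topicId of classify(entry.inputData)) {
      topicsCount.set(topicId, (topicsCount.get(topicId) ?? 0) + 1);
    }
  }
  const sorted = [...topicsCount.entries()].sort((x, y) =>
    ranksBefore(x, y) ? -1 : ranksBefore(y, x) ? 1 : 0);
  const top5 = sorted.slice(0, 5).map(([topicId]) => topicId);
  // Pad with random topic ids from the taxonomy until there are 5 entries.
  while (top5.length < 5) {
    top5.push(taxonomy[Math.floor(random() * taxonomy.length)]);
  }
  return top5;
}
```

Because the padding draws independently from the taxonomy, this sketch may return the same padded topic id more than once.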

10. Periodically calculate user topics

At the start of a browser session, run the schedule user topics calculation algorithm.

This roughly schedules topic calculation every 7 days, unless the browser is inactive at the scheduled time(s), in which case a topic calculation will occur as soon as the browser restarts.

To schedule user topics calculation, perform the following steps:
  1. Let unsafeMoment be the wall clock's unsafe current time.

  2. Let moment be the result of running coarsen time algorithm given unsafeMoment and wall clock as input.

  3. Let fromUnixEpochTime be the duration from the Unix epoch to moment.

  4. Let presumedNextCalculationDelay be a duration of 0.

  5. If user agent’s user topics state's epochs is not empty:

    1. Let numEpochs be user agent’s user topics state's epochs's size.

    2. Let lastTopicsCalculationTime be user agent’s user topics state's epochs[numEpochs − 1]'s time.

    3. Set presumedNextCalculationDelay to lastTopicsCalculationTime + (a duration of 7 days) − fromUnixEpochTime.

    4. If presumedNextCalculationDelay < (a duration of 0), then set presumedNextCalculationDelay to (a duration of 0).

    5. Else if presumedNextCalculationDelay ≥ (a duration of 14 days), then set presumedNextCalculationDelay to (a duration of 0).

      Note: This could happen if the machine time has gone backward since the last topics calculation. Recalculate immediately to align with the expected schedule rather than potentially stop calculating for a very long time.

  6. Schedule the calculate user topics algorithm to run at Unix epoch + fromUnixEpochTime + presumedNextCalculationDelay.
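
The delay computation in the steps above can be sketched as follows. This is a non-normative illustration; the function name and the use of milliseconds since the Unix epoch are assumptions made for the example.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// nowMs: the coarsened current time, in milliseconds since the Unix epoch.
// lastCalculationMs: the time of the most recent epoch, or null if the
// user topics state has no epochs yet.
function presumedNextCalculationDelay(nowMs, lastCalculationMs) {
  if (lastCalculationMs === null) return 0;
  const delay = lastCalculationMs + 7 * DAY_MS - nowMs;
  if (delay < 0) return 0;             // the calculation is overdue
  if (delay >= 14 * DAY_MS) return 0;  // the clock appears to have gone backward
  return delay;
}

console.log(presumedNextCalculationDelay(10 * DAY_MS, 8 * DAY_MS) / DAY_MS); // 5
```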

To calculate user topics, perform the following steps:
  1. Let unsafeMoment be the wall clock's unsafe current time.

  2. Let moment be the result of running coarsen time algorithm given unsafeMoment and wall clock as input.

  3. Let fromUnixEpochTime be the duration from the Unix epoch to moment.

  4. If either user agent’s model or taxonomy isn’t available:

    1. Let epoch be an epoch struct with default initial field values.

    2. Set epoch’s time to fromUnixEpochTime.

    3. Append epoch to user agent’s user topics state's epochs.

    4. If user agent’s user topics state's epochs has more than 4 entries, remove the oldest epoch (i.e. the epoch with index 0).

    5. Schedule this calculate user topics algorithm to run at Unix epoch + fromUnixEpochTime + (a duration of 7 days).

    6. Return.

  5. Let historyEntriesForUserTopics be an empty list.

  6. Let topicsCallers be an empty map.

  7. Let userTopicsDataStartTime be fromUnixEpochTime − (a duration of 7 days).

  8. Let topicsCallerDataStartTime be fromUnixEpochTime − (a duration of 21 days).

  9. For each topics history entry topicsHistoryEntry in user agent’s topics history storage:

    1. Let visitTime be topicsHistoryEntry’s time.

    2. If visitTime is before topicsCallerDataStartTime, then continue.

    3. Let topicIds be the result of classifying topicsHistoryEntry’s topics calculation input data.

    4. If visitTime is greater than userTopicsDataStartTime:

      1. Append topicsHistoryEntry to historyEntriesForUserTopics.

    5. For each topicId in topicIds:

      1. If topicsCallers[topicId] does not exist:

        1. Initialize topicsCallers[topicId] to be an empty list.

      2. For each callerDomain in topicsHistoryEntry’s topics caller domains:

        1. Append callerDomain to topicsCallers[topicId].

  10. Let top5Topics be the result of running derive top 5 topics algorithm, given historyEntriesForUserTopics.

  11. Let top5TopicsWithCallerDomains be an empty list.

  12. For each topTopicId in top5Topics:

    1. Let topicWithCallerDomains be a topic with caller domains struct with topic id initially 0 and caller domains initially empty.

    2. If topTopicId is allowed by user preference setting:

      1. Set topicWithCallerDomains’s topic id to topTopicId.

      2. Let topicWithDescendantIds be the result of running get descendant topics given topTopicId.

      3. Add topTopicId to topicWithDescendantIds.

      4. For each topicId in topicWithDescendantIds:

        1. If topicId is allowed by user preference setting:

          1. Insert all elements in topicsCallers[topicId] to topicWithCallerDomains’s caller domains.

    3. Append topicWithCallerDomains to top5TopicsWithCallerDomains.

  13. Let epoch be an epoch struct with default initial field values.

  14. Set epoch’s taxonomy to user agent’s taxonomy.

  15. Set epoch’s taxonomy version to user agent’s taxonomy version.

  16. Set epoch’s model version to user agent’s model version.

  17. Set epoch’s config version to user agent’s configuration version.

  18. Set epoch’s top 5 topics with caller domains to top5TopicsWithCallerDomains.

  19. Set epoch’s time to fromUnixEpochTime.

  20. Append epoch to user agent’s user topics state's epochs.

  21. If user agent’s user topics state's epochs has more than 4 entries, remove the oldest epoch.

  22. Schedule this calculate user topics algorithm to run at Unix epoch + fromUnixEpochTime + (a duration of 7 days).
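
The following non-normative sketch illustrates two parts of the algorithm above: splitting the history into the 7 day window used for deriving topics and the 21 day window used for caller domains, and attaching caller domains (including those observed on descendant topics) to each top topic. The function names, the `classify`, `getDescendants`, and `isAllowed` callbacks, and the property names are hypothetical. Caller domains are kept in a `Set` here, which is an assumption of the example.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

function collectCallers(history, nowMs, classify) {
  const historyEntriesForUserTopics = [];
  const topicsCallers = new Map(); // topic id -> Set of caller domains
  for (const entry of history) {
    if (entry.time < nowMs - 21 * DAY_MS) continue; // too old for caller data
    if (entry.time > nowMs - 7 * DAY_MS) historyEntriesForUserTopics.push(entry);
    for (const topicId of classify(entry.inputData)) {
      if (!topicsCallers.has(topicId)) topicsCallers.set(topicId, new Set());
      for (const domain of entry.callerDomains) topicsCallers.get(topicId).add(domain);
    }
  }
  return { historyEntriesForUserTopics, topicsCallers };
}

function attachCallerDomains(top5Topics, topicsCallers, getDescendants, isAllowed) {
  return top5Topics.map((topTopicId) => {
    // A topic id of 0 marks a topic that the user preference setting disallows.
    const result = { topicId: 0, callerDomains: new Set() };
    if (!isAllowed(topTopicId)) return result;
    result.topicId = topTopicId;
    for (const id of [topTopicId, ...getDescendants(topTopicId)]) {
      if (!isAllowed(id)) continue;
      for (const domain of topicsCallers.get(id) ?? []) result.callerDomains.add(domain);
    }
    return result;
  });
}
```

A caller that observed only a descendant topic is therefore still treated as having observed the more general ancestor topic.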

11. Epochs for caller

To calculate the epochs for caller, given a topics caller context callerContext, perform the following steps. They return a list of epochs.
  1. Let epochs be user agent’s user topics state's epochs.

  2. If epochs is empty, then return an empty list.

  3. Let numEpochs be epochs’s size.

  4. Let lastEpochTime be epochs[numEpochs − 1]'s time.

  5. Let epochSwitchTimeDecisionMessageArray be the concatenation of "epoch-switch-time-decision|", lastEpochTime, and callerContext’s top level context domain.

  6. Let epochSwitchTimeDecisionHmacOutput be the output of the HMAC algorithm, given input parameters: whichSha=SHA256, key=user agent’s user topics state's hmac key, and message_array=epochSwitchTimeDecisionMessageArray.

  7. Let epochSwitchTimeDecisionHash be 64-bit truncation of epochSwitchTimeDecisionHmacOutput.

  8. Let epochSwitchTimeDelayIntroduction be a duration of (epochSwitchTimeDecisionHash % 172800) seconds (i.e. 172800 is 2 days in seconds).

  9. Let epochPhaseOutTimeDecisionMessageArray be the concatenation of "epoch-phase-out-time-decision|", lastEpochTime, and callerContext’s top level context domain.

  10. Let epochPhaseOutTimeDecisionHmacOutput be the output of the HMAC algorithm, given input parameters: whichSha=SHA256, key=user agent’s user topics state's hmac key, and message_array=epochPhaseOutTimeDecisionMessageArray.

  11. Let epochPhaseOutTimeDecisionHash be 64-bit truncation of epochPhaseOutTimeDecisionHmacOutput.

  12. Let epochPhaseOutTimeOffset be a duration of (epochPhaseOutTimeDecisionHash % 172800) seconds (i.e. 172800 is 2 days in seconds).

  13. Let timestamp be callerContext’s timestamp.

  14. Let result be an empty list.

  15. Let startEpochIndex be -1.

  16. Let endEpochIndex be -1.

  17. If timestamp ≤ lastEpochTime + epochSwitchTimeDelayIntroduction:

    1. Set startEpochIndex to max(numEpochs − 4, 0).

    2. Set endEpochIndex to numEpochs − 2.

  18. Else:

    1. Set startEpochIndex to max(numEpochs − 3, 0).

    2. Set endEpochIndex to numEpochs − 1.

  19. If endEpochIndex ≥ 0:

    1. Let i be startEpochIndex.

    2. Let epochRetentionDuration be a duration of 28 days.

    3. While i ≤ endEpochIndex:

      1. If epochs[i]'s time ≥ timestamp − epochRetentionDuration + epochPhaseOutTimeOffset, then append epochs[i] to result.

      2. Set i to i + 1.

  20. Return result.

This roughly returns 3 recently calculated epochs, either counting back from the last epoch, or from the second to the last epoch, and excludes any epochs that are too old. The new epoch is introduced after a fixed duration (between 0 and 2 days) has elapsed since the epoch’s calculation time. Each epoch expires after a longer fixed interval (between 26 and 28 days). Both durations are specific to each user, site, and epoch. This mechanism makes it harder to correlate the same user across sites via the time that topics are changed, or via the time interval between two changes. The HMAC helps to compute the per-user per-site per-epoch delay on the fly, without needing to store extra data for each site or epoch.
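
The following non-normative sketch illustrates the epoch selection using Node.js's built-in `crypto` module. Several details are assumptions of the example and are not fixed by the text above: the message is the UTF-8 encoding of the concatenated parts with the time written as decimal seconds, the 64-bit truncation takes the first 8 bytes of the HMAC output as a big-endian integer, and both index comparisons are inclusive. The function and property names are hypothetical.

```javascript
const { createHmac } = require('node:crypto');

const DAY_S = 86400;

// Assumed message encoding and truncation; see the lead-in.
function hmac64(key, prefix, timeS, domain) {
  const digest = createHmac('sha256', key)
    .update(`${prefix}${timeS}${domain}`)
    .digest();
  return digest.readBigUInt64BE(0);
}

function epochsForCaller(epochs, key, topLevelDomain, timestampS) {
  if (epochs.length === 0) return [];
  const n = epochs.length;
  const lastEpochTime = epochs[n - 1].time;
  // Per user, site, and epoch delays in the range [0, 2 days).
  const switchDelay = Number(
    hmac64(key, 'epoch-switch-time-decision|', lastEpochTime, topLevelDomain) % 172800n);
  const phaseOutOffset = Number(
    hmac64(key, 'epoch-phase-out-time-decision|', lastEpochTime, topLevelDomain) % 172800n);

  // Until the switch time passes, the newest epoch is not yet exposed.
  const beforeSwitch = timestampS <= lastEpochTime + switchDelay;
  const start = Math.max(beforeSwitch ? n - 4 : n - 3, 0);
  const end = beforeSwitch ? n - 2 : n - 1;

  const result = [];
  for (let i = start; i <= end; i++) {
    // Keep only epochs that have not yet been phased out.
    if (epochs[i].time >= timestampS - 28 * DAY_S + phaseOutOffset) {
      result.push(epochs[i]);
    }
  }
  return result;
}
```

Because the delays are derived from the HMAC key, the same user agent sees a stable switch time on a given site, while different sites see different switch times.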

12. Get the number of distinct versions in epochs

To get the number of distinct versions in epochs, given a topics caller context callerContext, perform the following steps. They return an integer.
  1. Let epochs be the result of running the calculate the epochs for caller algorithm given callerContext as input.

  2. Let distinctVersions be an empty set.

  3. For each epoch in epochs:

    1. If epoch’s taxonomy version is empty (implying that the topics calculation for that epoch didn’t occur), then continue.

    2. Insert tuple (epoch’s taxonomy version, epoch’s model version) to distinctVersions.

  4. Return distinctVersions’s size.

13. Topics for caller

To calculate the topics for caller, given a topics caller context callerContext, perform the following steps. They return a list of BrowsingTopics.
  1. Let epochs be the result of running the calculate the epochs for caller algorithm given callerContext as input.

  2. Let result be an empty list.

  3. For each epoch in epochs:

    1. If epoch’s top 5 topics with caller domains is empty (implying the topics calculation failed for that epoch), then continue.

    2. Let topic be null.

    3. Let topTopicIndexDecisionMessageArray be the concatenation of "top-topic-index-decision|", epoch’s time, and callerContext’s top level context domain.

    4. Let topTopicIndexDecisionHmacOutput be the output of the HMAC algorithm, given input parameters: whichSha=SHA256, key=user agent’s user topics state's hmac key, and message_array=topTopicIndexDecisionMessageArray.

    5. Let topTopicIndexDecisionHash be 64-bit truncation of topTopicIndexDecisionHmacOutput.

    6. Let topTopicIndex be topTopicIndexDecisionHash % 5.

    7. Let topTopicWithCallerDomains be epoch’s top 5 topics with caller domains[topTopicIndex].

    8. If topTopicWithCallerDomains’s caller domains contains callerContext’s caller domain:

      1. Set topic to an empty BrowsingTopic dictionary.

      2. Set topic["topic"] to topTopicWithCallerDomains’s topic id.

    9. If topic is null, or if topic["topic"] is 0 (i.e. the candidate topic was cleared), then continue.

    10. Let randomOrTopTopicDecisionMessageArray be the concatenation of "random-or-top-topic-decision|", epoch’s time, and callerContext’s top level context domain.

    11. Let randomOrTopTopicDecisionHmacOutput be the output of the HMAC algorithm, given input parameters: whichSha=SHA256, key=user agent’s user topics state's hmac key, and message_array=randomOrTopTopicDecisionMessageArray.

    12. Let randomOrTopTopicDecisionHash be 64-bit truncation of randomOrTopTopicDecisionHmacOutput.

    13. If randomOrTopTopicDecisionHash % 100 < 5:

      1. Let randomTopicIndexDecisionMessageArray be the concatenation of "random-topic-index-decision|", epoch’s time, and callerContext’s top level context domain.

      2. Let randomTopicIndexDecisionHmacOutput be the output of the HMAC algorithm, given input parameters: whichSha=SHA256, key=user agent’s user topics state's hmac key, and message_array=randomTopicIndexDecisionMessageArray.

      3. Let randomTopicIndexDecisionHash be 64-bit truncation of randomTopicIndexDecisionHmacOutput.

      4. Let randomTopicIndex be randomTopicIndexDecisionHash % epoch’s taxonomy's size.

      5. Set topic["topic"] to epoch’s taxonomy[randomTopicIndex].

    14. Set topic["configVersion"] to epoch’s config version.

    15. Set topic["modelVersion"] to epoch’s model version.

    16. Set topic["taxonomyVersion"] to epoch’s taxonomy version.

    17. Let version be the version determined from topic["configVersion"], topic["taxonomyVersion"], and topic["modelVersion"].

    18. Set topic["version"] to version.

    19. Append topic to result.

  4. Sort entries in result in ascending order, with a being less than b if a is code unit less than b.

  5. Remove duplicate entries in result. Two BrowsingTopic dictionaries a and b are considered equal if a is not code unit less than b and b is not code unit less than a.

  6. Return result.

This roughly selects one random topic from each of the previous epochs (to limit cross-site reidentification capabilities), and only returns those that were observed by the caller (so that this provides roughly only a subset of the capabilities of third-party cookies). For each epoch, there is a 5% chance to return a random topic from the full taxonomy, rather than returning the real top topic, so as to provide some amount of plausible deniability. This random topic will only be returned if the caller would have received the real top topic (i.e. observed by the caller). This makes it non-trivial to detect which topics are the random topics (see GitHub issue). All of the randomness involved in this process is sticky to the user agent, epoch, and site. The HMAC helps to compute the random sticky values on the fly, without needing to store extra data for each epoch and site.
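
The following non-normative sketch illustrates the per-epoch selection using Node.js's built-in `crypto` module. As assumptions of the example, the HMAC message is the UTF-8 encoding of the concatenated parts with the time written as a decimal number, and the 64-bit truncation takes the first 8 bytes of the output as a big-endian integer. The function and property names are hypothetical.

```javascript
const { createHmac } = require('node:crypto');

// Assumed message encoding and truncation; see the lead-in.
function hmac64(key, prefix, time, domain) {
  const digest = createHmac('sha256', key)
    .update(`${prefix}${time}${domain}`)
    .digest();
  return digest.readBigUInt64BE(0);
}

// Returns a BrowsingTopic-like object for one epoch, or null if the caller
// may not receive a topic for that epoch.
function topicForEpoch(epoch, key, callerDomain, topLevelDomain) {
  if (epoch.top5.length === 0) return null; // calculation failed for this epoch
  const index = Number(
    hmac64(key, 'top-topic-index-decision|', epoch.time, topLevelDomain) % 5n);
  const candidate = epoch.top5[index];
  if (!candidate.callerDomains.has(callerDomain)) return null; // not observed
  if (candidate.topicId === 0) return null;                    // topic cleared

  let topicId = candidate.topicId;
  // With 5% probability, replace the real topic with a random one.
  if (hmac64(key, 'random-or-top-topic-decision|', epoch.time, topLevelDomain) % 100n < 5n) {
    const randomIndex = hmac64(key, 'random-topic-index-decision|', epoch.time, topLevelDomain)
      % BigInt(epoch.taxonomy.length);
    topicId = epoch.taxonomy[Number(randomIndex)];
  }
  return {
    topic: topicId,
    configVersion: epoch.configVersion,
    modelVersion: epoch.modelVersion,
    taxonomyVersion: epoch.taxonomyVersion,
    version: [epoch.configVersion, epoch.taxonomyVersion, epoch.modelVersion].join(':'),
  };
}
```

Calling `topicForEpoch` twice with the same key, epoch, and top level domain yields the same result, which is the stickiness property described above.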

14. The JavaScript API

The Topics API lives under the Document interface, and is only available if the document is in secure context.

dictionary BrowsingTopicsOptions {
  boolean skipObservation = false;
};

partial interface Document {
  [SecureContext] Promise<sequence<BrowsingTopic>> browsingTopics(optional BrowsingTopicsOptions options = {});
};

The browsingTopics(options) method steps are:
  1. Let document be this.

  2. Let topLevelDocument be document’s node navigable's top-level traversable's active document.

  3. Let promise be a new promise.

  4. Let topicsCallerContext be a topics caller context.

  5. Set topicsCallerContext’s caller domain to document’s origin's host's registrable domain.

  6. Set topicsCallerContext’s top level context domain to topLevelDocument’s origin's host's registrable domain.

  7. Let unsafeMoment be the wall clock's unsafe current time.

  8. Let moment be the result of running coarsen time algorithm given unsafeMoment and wall clock as input.

  9. Let fromUnixEpochTime be the duration from the Unix epoch to moment.

  10. Set topicsCallerContext’s timestamp to fromUnixEpochTime.

  11. If any of the following is true:

    then:

    1. Queue a global task on the browsing topics task source given document’s relevant global object to reject promise with a "NotAllowedError" DOMException.

    2. Abort these steps.

  12. Run the following steps in parallel:

    1. Let topics be an empty list.

    2. If the user preference setting and other user agent-defined mechanisms like enrollment allow access to topics from topLevelDocument given document’s origin:

      1. Set topics to the result of running the calculate the topics for caller algorithm, with topicsCallerContext as input.

      2. If options["skipObservation"] is false:

        1. Run the collect page topics calculation input data algorithm with topLevelDocument as input.

        2. Run the collect topics caller domain algorithm with topLevelDocument and topicsCallerContext’s caller domain as input.

    3. Queue a global task on the browsing topics task source given document’s relevant global object to perform the following steps:

      1. Resolve promise with topics.

  13. Return promise.
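
The following non-normative sketch shows how a caller might read topics defensively, treating a missing API or a "NotAllowedError" rejection as "no topics". The helper name `readTopics` and its `observe` option are hypothetical. The document is passed in as a parameter so that the sketch can be exercised outside a browser.

```javascript
// doc: a Document, or a Document-like object for testing.
// observe: when false, skipObservation is set so that the call does not
// record the page or the caller for future topics calculations.
async function readTopics(doc, { observe = true } = {}) {
  if (typeof doc.browsingTopics !== 'function') return []; // API unavailable
  try {
    return await doc.browsingTopics({ skipObservation: !observe });
  } catch (err) {
    if (err.name === 'NotAllowedError') return []; // access was not allowed
    throw err;
  }
}

// In a page, for example: const topics = await readTopics(document);
```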

15. fetch() and iframe integration

Topics can be sent in the HTTP header for fetch() requests and for iframe navigation requests. The response header for a topics related request can specify whether the caller should be recorded.

15.1. send browsing topics header boolean associated with Request

A request has an associated send browsing topics header boolean. Unless stated otherwise it is false.

TODO: make the modification directly to the fetch spec.

15.2. browsingtopics content attribute for HTMLIframeElement

The iframe element contains a browsingtopics content attribute. The IDL attribute browsingTopics reflects the browsingtopics content attribute.

partial interface HTMLIFrameElement {
  [CEReactions] attribute boolean browsingTopics;
};

TODO: make the modification directly to the html spec.

15.3. browsingTopics attribute in RequestInit

The RequestInit dictionary contains a browsingTopics attribute:

partial dictionary RequestInit {
  boolean browsingTopics;
};

TODO: make the modification directly to the fetch spec.

15.4. Modification to request constructor steps

The following step will be added to the new Request(input, init) constructor steps, before step "Set this’s request to request":
  1. If init["browsingTopics"] exists, then set request’s send browsing topics header boolean to it.

TODO: make the modification directly to the fetch spec.

15.5. Modification to "create navigation params by fetching" steps

The following step will be added to the create navigation params by fetching steps, after step "Let request be a new request, with ...":
  1. If navigable’s container is an iframe element, and if it has a browsingtopics content attribute, then set request’s send browsing topics header boolean to true.

TODO: make the modification directly to the html spec.

15.6. The `Sec-Browsing-Topics` HTTP request header

This specification defines a `Sec-Browsing-Topics` HTTP request header. It is used to send the topics.

15.7. Modification to HTTP-network-or-cache fetch algorithm

The following step will be added to the HTTP-network-or-cache fetch algorithm, before step "Modify httpRequest’s header list per HTTP. ...":
  1. Append or modify a request `Sec-Browsing-Topics` header for httpRequest.

TODO: make the modification directly to the fetch spec.

15.8. Append or modify a request Sec-Browsing-Topics header

To append or modify a request `Sec-Browsing-Topics` header, given a request request, run these steps:
  1. If request’s send browsing topics header boolean is not true, then return.

  2. Delete `Sec-Browsing-Topics` from request’s header list.

    The topics a request is allowed to see can change within its redirect chain. For example, different caller domains may receive different topics, as the callers can only get the topics about the sites they were on. The timestamp can also affect the candidate epochs where the topics are derived from, thus resulting in different topics across redirects.

  3. Let initiatorWindow be request’s window.

  4. Let requestOrigin be request’s URL's origin.

  5. If requestOrigin is not a potentially trustworthy origin, then return.

  6. If initiatorWindow is not an environment settings object, then return.

  7. If initiatorWindow is not a secure context, then return.

  8. For each feature f in « "browsing-topics", "interest-cohort" »:

    1. Run the Should request be allowed to use feature? algorithm with feature set to f and request set to request. If the algorithm returns false, then return.

    Note: the above algorithm should include the pending update, i.e. the request should be considered to contain the equivalent opt-in flags for both the "browsing-topics" and the "interest-cohort" features.

  9. Let topLevelDocument be initiatorWindow’s global object's navigable's top-level traversable's active document.

  10. Let topicsCallerContext be a topics caller context with default initial field values.

  11. Set topicsCallerContext’s caller domain to requestOrigin’s host's registrable domain.

  12. Set topicsCallerContext’s top level context domain to topLevelDocument’s origin's host's registrable domain.

  13. Let unsafeMoment be the wall clock's unsafe current time.

  14. Let moment be the result of running coarsen time algorithm given unsafeMoment and wall clock as input.

  15. Let fromUnixEpochTime be the duration from the Unix epoch to moment.

  16. Set topicsCallerContext’s timestamp to fromUnixEpochTime.

  17. Let topics be an empty list.

  18. Let numVersionsInEpochs be 0.

  19. If the user preference setting and other user agent-defined mechanisms like enrollment allow access to topics from topLevelDocument given requestOrigin:

    1. Set topics to the result of running the calculate the topics for caller algorithm, with topicsCallerContext as input.

    2. Set numVersionsInEpochs to the result of running the get the number of distinct versions in epochs algorithm, with topicsCallerContext as input.

  20. Let versionsToTopics be an ordered map.

  21. For each topic of topics:

    1. Let version be topic["version"].

    2. Let topicInteger be topic["topic"].

    3. If versionsToTopics[version] does not exist, then set it to an empty list.

    4. Append topicInteger to versionsToTopics[version].

  22. Let topicsStructuredFieldsList be an empty Structured Fields List.

  23. For each version → topicIntegers of versionsToTopics:

    1. Let innerList be an empty Structured Fields Inner List.

    2. Append all items from topicIntegers to innerList.

    3. Let topicParameters be an empty Structured Fields Parameters.

    4. Set topicParameters["v"] to a Structured Fields Token with value version.

    5. Associate topicParameters with innerList.

    6. Append innerList to topicsStructuredFieldsList.

  24. If numVersionsInEpochs is 0, then set numVersionsInEpochs to 1.

  25. Let maxNumberOfEpochs be 3 (i.e. topics are selected from the last 3 epochs).

  26. Let topicMaxLength be the number of base-10 digits in the maximum topic id (e.g. for Chrome’s current taxonomy, topicMaxLength is 3, as a topic id has at most 3 digits).

  27. Let versionMaxLength be the current maximum version string length.

  28. Let listItemsSeparatorLength be 2 (i.e. structured fields use two characters (", ") to separate list items).

  29. Let perVersionedTopicsInnerListOverhead be 5 (i.e. for "();v=").

  30. Let maxPaddingLength be maxNumberOfEpochs * topicMaxLength + maxNumberOfEpochs - numVersionsInEpochs + numVersionsInEpochs * perVersionedTopicsInnerListOverhead + numVersionsInEpochs * versionMaxLength + (numVersionsInEpochs - 1) * listItemsSeparatorLength.

  31. Let paddingLength be maxPaddingLength.

  32. If topicsStructuredFieldsList is not empty:

    1. Let serializedTopicsList be the result of executing the serializing structured fields algorithm on topicsStructuredFieldsList.

    2. Decrement paddingLength by serializedTopicsList’s length.

  33. Else:

    1. Increment paddingLength by listItemsSeparatorLength (i.e. to account for the separator characters that would be added when topics are not empty).

  34. If paddingLength < 0, then set paddingLength to 0.

Note: the padding should generally be ≥ 0. It may be negative in certain circumstances: when historically stored topic versions are greater (and use more digits) than the current maximum version string length; or when there is a race between getting topics and getting the number of distinct topic versions. Clamp to 0 to prevent breakage in these rare circumstances.

  35. Let paddedToken be "P".

  36. Append paddingLength "0" characters to the end of paddedToken.

  37. Let paddedEntryParameters be an empty Structured Fields Parameters.

  38. Set paddedEntryParameters["p"] to a Structured Fields Token with value paddedToken.

  39. Let emptyInnerList be an empty Structured Fields Inner List.

  40. Associate paddedEntryParameters with emptyInnerList.

  41. Append emptyInnerList to topicsStructuredFieldsList.

  42. Set a structured field value given (`Sec-Browsing-Topics`, topicsStructuredFieldsList) in request’s header list.

This algorithm transforms the topics list into structured fields format and pads it so that the total length is consistent across different topics callers.

No returned topics, and the underlying epochs have the same version:

();p=P0000000000000000000000000000000

One returned topic, and the underlying epochs have the same version:

(1);v=chrome.1:1:2, ();p=P00000000000

Two returned topics, and the underlying epochs have the same version:

(1 2);v=chrome.1:1:2, ();p=P000000000

Two returned topics, and underlying epochs have two different versions:

(1);v=chrome.1:1:2, (1);v=chrome.1:1:4, ();p=P0000000000

Three returned topics, and underlying epochs have three different versions:

(100);v=chrome.1:1:20, (200);v=chrome.1:1:40, (300);v=chrome.1:1:60, ();p=P
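
The header construction steps and examples above can be illustrated with the following non-normative sketch. It is not how any user agent implements the algorithm. The function name buildTopicsHeaderValue and all constants are hypothetical. The value 13 for versionMaxLength is an assumption inferred from the length of "chrome.1:1:20" in the examples, and the serialization is written by hand for this narrow case instead of using a general Structured Fields serializer.

```javascript
// Hypothetical constants; the values mirror the Chrome-flavoured examples above.
const MAX_NUMBER_OF_EPOCHS = 3;
const TOPIC_MAX_LENGTH = 3; // digits in the largest topic id
const VERSION_MAX_LENGTH = 13; // assumption: length of e.g. "chrome.1:1:20"
const LIST_ITEMS_SEPARATOR_LENGTH = 2; // ", "
const PER_VERSIONED_TOPICS_INNER_LIST_OVERHEAD = 5; // "();v="

// topics: array of { topic: number, version: string }.
// numVersionsInEpochs: supplied by the caller here; the specification
// obtains it from a separate algorithm.
function buildTopicsHeaderValue(topics, numVersionsInEpochs) {
  // Group topic integers by version, preserving insertion order.
  const versionsToTopics = new Map();
  for (const { topic, version } of topics) {
    if (!versionsToTopics.has(version)) versionsToTopics.set(version, []);
    versionsToTopics.get(version).push(topic);
  }

  // One inner list per version, e.g. "(1 2);v=chrome.1:1:2".
  const members = [];
  for (const [version, topicIntegers] of versionsToTopics) {
    members.push(`(${topicIntegers.join(' ')});v=${version}`);
  }
  const serializedTopicsList = members.join(', ');

  // A count of 0 is treated as 1.
  const n = numVersionsInEpochs === 0 ? 1 : numVersionsInEpochs;

  const maxPaddingLength =
    MAX_NUMBER_OF_EPOCHS * TOPIC_MAX_LENGTH +
    MAX_NUMBER_OF_EPOCHS - n +
    n * PER_VERSIONED_TOPICS_INNER_LIST_OVERHEAD +
    n * VERSION_MAX_LENGTH +
    (n - 1) * LIST_ITEMS_SEPARATOR_LENGTH;

  let paddingLength = maxPaddingLength;
  if (members.length > 0) {
    paddingLength -= serializedTopicsList.length;
  } else {
    // Account for the separator that a non-empty list would have added.
    paddingLength += LIST_ITEMS_SEPARATOR_LENGTH;
  }
  paddingLength = Math.max(0, paddingLength); // clamp, as in the note above

  // The padded entry is an empty inner list with a "p" parameter.
  members.push('();p=P' + '0'.repeat(paddingLength));
  return members.join(', ');
}
```

Under these assumptions, every call with numVersionsInEpochs equal to 1 yields a value of 37 characters, whether it carries zero, one, or two topics.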

Why padding is added: servers typically impose a GET request size limit (e.g. 8KB) and return an error when the limit is exceeded. An attacker can rely on this to learn the number of topics for a different domain, and/or a small amount of information about the topics themselves (e.g. whether the topic ids are < 10, < 100, etc.).

The returned length still depends on the number of distinct versions, so it could leak the epochs in which the user had disabled topics or didn’t use the browser, if those epochs coincided with a version change. But this leak is minor. The most common cases (i.e. returning same version topics, or no topics) will have the same length.
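
To make the dependence on the number of distinct versions concrete, the following non-normative sketch computes the total header value length from the padding formula alone. The function name totalHeaderValueLength is hypothetical, the defaults (3 epochs, 3-digit topic ids, 13-character version strings) are assumptions taken from the examples above, and the result only holds when the padding length is not clamped to 0.

```javascript
// Total length = maxPaddingLength + ", " (2) + "();p=P" (6).
// With topics: serialized + 2 + 6 + (maxPaddingLength - serialized).
// Without topics: 6 + (maxPaddingLength + 2).
function totalHeaderValueLength(
  numVersionsInEpochs,
  versionMaxLength = 13, // assumption: e.g. "chrome.1:1:20"
  topicMaxLength = 3,
  maxNumberOfEpochs = 3
) {
  const n = numVersionsInEpochs === 0 ? 1 : numVersionsInEpochs;
  const maxPaddingLength =
    maxNumberOfEpochs * topicMaxLength +
    maxNumberOfEpochs - n +
    n * 5 + // perVersionedTopicsInnerListOverhead
    n * versionMaxLength +
    (n - 1) * 2; // listItemsSeparatorLength
  return maxPaddingLength + 2 + 6;
}

for (const n of [1, 2, 3]) {
  console.log(`${n} distinct version(s): ${totalHeaderValueLength(n)} characters`);
}
```

Under these assumptions, the length takes one of only three values, one per possible number of distinct versions, and does not depend on which topics are returned.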

15.9. The `Observe-Browsing-Topics` HTTP response header

The `Observe-Browsing-Topics` HTTP response header can be used to record a caller’s topics observation.

To handle topics response, given a response response and a request request:
  1. If request’s header list does not contain `Sec-Browsing-Topics` (implying the request’s current URL is not eligible for topics), then return.

  2. Let topLevelDocument be request’s window's global object's navigable's top-level traversable's active document.

  3. Let callerOrigin be request’s current URL's origin.

  4. If the user preference setting or other user agent-defined mechanisms like enrollment disallow access to topics from topLevelDocument given callerOrigin, then return.

  5. Let callerDomain be callerOrigin’s host's registrable domain.

  6. Let list be response’s header list.

  7. Let observe be the result of running get a structured field value algorithm given `Observe-Browsing-Topics`, "item", and list as input.

  8. If observe is true:

    1. Run the collect page topics calculation input data algorithm with topLevelDocument as input.

    2. Run the collect topics caller domain algorithm with topLevelDocument and callerDomain as input.
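
The control flow of the steps above can be illustrated with the following non-normative sketch. The function handleTopicsResponse, its parameter object, and the recordObservation callback are hypothetical stand-ins for user agent internals. Header lists are modelled as Map objects with lowercase names. As a simplifying assumption, only the exact Structured Fields Boolean `?1` is treated as true; a real implementation would use a full Structured Fields parser.

```javascript
// Returns true if an observation was recorded, false otherwise.
function handleTopicsResponse(
  { requestHeaders, responseHeaders, accessAllowed, callerDomain },
  recordObservation
) {
  // The request was not eligible for topics.
  if (!requestHeaders.has('sec-browsing-topics')) return false;

  // User preference or enrollment disallows access (hypothetical flag).
  if (!accessAllowed) return false;

  // Simplified Structured Fields parsing: accept only the Boolean "?1".
  const raw = responseHeaders.get('observe-browsing-topics');
  if (raw === undefined || raw.trim() !== '?1') return false;

  // Stands in for collecting the page's topics calculation input data
  // and the topics caller domain.
  recordObservation(callerDomain);
  return true;
}

// Usage with hypothetical values.
const observed = [];
handleTopicsResponse(
  {
    requestHeaders: new Map([['sec-browsing-topics', '();p=P']]),
    responseHeaders: new Map([['observe-browsing-topics', '?1']]),
    accessAllowed: true,
    callerDomain: 'ads.example',
  },
  (domain) => observed.push(domain)
);
```

A server that wants the visit recorded therefore responds with `Observe-Browsing-Topics: ?1`; any other value, or a missing header, records nothing in this sketch.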

15.10. Modification to HTTP fetch steps

The following step will be added to the HTTP fetch steps, before checking the redirect status (i.e. "If actualResponse’s status is a redirect status, ..."):
  1. Handle topics response, given response actualResponse and request request as input.

TODO: make the modification directly to the fetch spec.

16. Permissions policy integration

This specification defines a policy-controlled feature identified by the string "browsing-topics". Its default allowlist is *.

For backward compatibility, this specification also defines a policy-controlled feature identified by the string "interest-cohort". Its default allowlist is *.
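
As a non-normative illustration, a site can opt out by sending a `Permissions-Policy` response header with an empty allowlist for the feature. The helper name topicsOptOutHeaders and its parameter are hypothetical; the sketch only builds the header value and does not send it.

```javascript
// Builds response headers that disable the Topics API for the document
// and its subframes by giving the feature an empty allowlist.
function topicsOptOutHeaders({ includeLegacyFeature = false } = {}) {
  const directives = ['browsing-topics=()'];
  if (includeLegacyFeature) {
    // The backward-compatible feature name defined above.
    directives.push('interest-cohort=()');
  }
  return { 'Permissions-Policy': directives.join(', ') };
}

console.log(topicsOptOutHeaders());
console.log(topicsOptOutHeaders({ includeLegacyFeature: true }));
```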

17. Privacy considerations

The Topics API attempts to provide just enough relevant interest information for advertisers to personalize their ads for the user while maintaining user privacy. Privacy safeguards include: usage in secure contexts only, limitation of topics to a human-curated taxonomy, different topics given to different sites in the same epoch to prevent cross-site reidentification, noised topics, a limited number of topics provided per epoch, user opt-outs, site opt-outs, and a suggestion that user agents provide UX to give users choice in which topics are returned.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[HR-TIME-3]
Yoav Weiss. High Resolution Time. URL: https://w3c.github.io/hr-time/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[PERMISSIONS-POLICY-1]
Ian Clelland. Permissions Policy. URL: https://w3c.github.io/webappsec-permissions-policy/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[SECURE-CONTEXTS]
Mike West. Secure Contexts. URL: https://w3c.github.io/webappsec-secure-contexts/
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

IDL Index

dictionary BrowsingTopic {
  [EnforceRange] unsigned long long topic;
  DOMString version;
  DOMString configVersion;
  DOMString modelVersion;
  DOMString taxonomyVersion;
};

dictionary BrowsingTopicsOptions {
  boolean skipObservation = false;
};

partial interface Document {
  [SecureContext] Promise<sequence<BrowsingTopic>> browsingTopics(optional BrowsingTopicsOptions options = {});
};

partial interface HTMLIFrameElement {
  [CEReactions] attribute boolean browsingTopics;
};

partial dictionary RequestInit {
  boolean browsingTopics;
};