Privacy/Reviews/KPI Backend

From MozillaWiki

Revision as of 22:20, 30 April 2012

Document Overview

Feature/Product: KPI Backend
Projected Feature Freeze Date: End of Q2
Product Champions: Austin King
Privacy Champions: TBD
Security Contact: TBD
Document State: [ON TRACK] ready for review?


Timeline:

Architectural Overview: TBD
Recommendation Meeting: TBD
Wrap-up Meeting: (if necessary)

Architecture

In this section, the product's architecture is described. Any individual components or actors are identified, their "knowledge" or what data they store is identified, and data flow between components and external entities is described.

The main objective of this feature/product is to allow the BrowserID product team to assess how well changes to the service are meeting key performance indicators (KPIs). UX will design a feature change, engineering will build it, and a KPI Dashboard will tell us how successful the change is with real users.

The KPI Backend must be built before the KPI Dashboard, which will be built next quarter and will have its own privacy review. The KPI Backend stores the raw data described below.

Design Documents: Link to any design or architectural documents here.

Components

Describe any major components in the system and how they interact. Also include any third-party APIs (those Mozilla does not control) and what type of data is sent or received via those APIs.

Client Component

The client portion of the KPI Dashboard feature is the HTML/JavaScript that runs in a user's browser when they sign in to a website using Persona Sign-In on a browser without native support. The dialog that is displayed records interactions and timing information, building a JSON data structure as the user interacts with it. This JSON data structure is then sent to the Persona Sign-In servers at the end of the interaction.

Type of data stored:

During the user's interaction with the dialog, we capture the following information:

  • timestamp: the time that the interaction started
  • event_stream: notable events that occurred during the user's interaction, including both events initiated by the user (mouse clicks) and events originating from running JavaScript code (keypair generation). Each event is uniquely named and includes a time offset for when it occurred, measured from when the dialog was displayed
  • email_type: If the interaction results in the user selecting an email address to sign in with, we include the type of email used: "primary" is an email address from a domain that has BrowserID support, "secondary" is an email address from a domain that does not directly support BrowserID
  • number_emails: When an interaction proceeds to the point where the user authenticates to the Persona ID service, we include the number of emails that this user has verified with BrowserID
  • new_account: If a new BrowserID account is created during an interaction, this property is true (as opposed to an interaction that represents a sign-in using an existing account)
  • language: the language that was displayed to this user during the interaction
  • sites_logged_in: If the user is authenticated to the Persona servers at any point during the interaction, we include the number of distinct sites that the user has recently logged in to using BrowserID
  • screen_size: The screen dimensions of the device used by the user, determined programmatically with JavaScript
  • sample_rate: The rate at which the server samples client data for KPI messages; 0.1 would be a 10% sample rate. We plan to ship with 100% of traffic sampled, so 1.0
  • user_agent: A generalized version of the user agent, with coarse-grained details for operating system, browser, and browser version. It does not contain the original user agent string.
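
As a rough illustration of how the event_stream offsets and the top-level fields fit together, the following Python sketch records named events relative to the start of the interaction and assembles the final blob. It treats the start of the interaction and the display of the dialog as the same instant. The class name InteractionRecorder, its methods, and the injectable millisecond clock are hypothetical; the real dialog is JavaScript running in the browser, and this is not its code.

```python
import time
from typing import Callable, List, Tuple


def _now_ms() -> int:
    # Wall-clock time in whole milliseconds since the Unix epoch.
    return time.time_ns() // 1_000_000


class InteractionRecorder:
    """Hypothetical recorder for one dialog interaction (not the real dialog code)."""

    def __init__(self, sample_rate: float = 1.0,
                 clock: Callable[[], int] = _now_ms) -> None:
        self._clock = clock
        # "timestamp": when the interaction started, in ms since the epoch.
        # Assumption: this is also the moment the dialog was displayed.
        self._start_ms = clock()
        self.sample_rate = sample_rate
        self.events: List[Tuple[str, int]] = []

    def record(self, name: str) -> None:
        # Each event keeps its name and its offset (ms) from dialog display.
        self.events.append((name, self._clock() - self._start_ms))

    def to_record(self, **fields) -> dict:
        # Assemble the JSON-ready blob; optional fields such as email_type
        # are passed in only when the interaction reached that step.
        record = {
            "timestamp": self._start_ms,
            "event_stream": [[name, offset] for name, offset in self.events],
            "sample_rate": self.sample_rate,
        }
        record.update(fields)
        return record


if __name__ == "__main__":
    import json

    rec = InteractionRecorder()
    rec.record("picker")
    rec.record("complete")
    print(json.dumps(rec.to_record(email_type="secondary", new_account=False)))
```

Injecting the clock keeps the offset arithmetic testable without real timing.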

Example data:

{
    "timestamp": 1333046104322,
    "event_stream": [
         [ "picker", 732 ],
         [ "picker::change", 1700 ],
         [ "picker::signin", 2300 ],
         [ "assertion_generation", 2500 ],
         [ "certified", 3300 ],
         [ "assertion_generated", 4500 ],
         [ "complete": 4777 ]
    ],
    "email_type": "secondary",
    "number_emails": 3,
    "new_account": false,
    "language": "en_US",
    "number_sites_logged_in": 1,
    "screen_size": { "width": 640, "height": 480 },
    "sample_rate": 1.0
}
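
A mechanical check of this record shape might look like the sketch below, which parses a blob and returns a list of problems rather than raising. It assumes that only timestamp, event_stream and sample_rate are always present (the other fields appear only when the interaction reaches the relevant step), that event offsets never decrease, and it uses the key number_sites_logged_in as spelled in the example data. The function name and these rules are illustrative, not part of the specification.

```python
import json
from typing import Any, List

# Top-level field types, following the field list and example data.
FIELD_TYPES = {
    "timestamp": int,
    "event_stream": list,
    "email_type": str,
    "number_emails": int,
    "new_account": bool,
    "language": str,
    "number_sites_logged_in": int,
    "screen_size": dict,
    "sample_rate": (int, float),
}
# Assumption: only these fields are present on every interaction.
REQUIRED = ("timestamp", "event_stream", "sample_rate")


def _matches(value: Any, expected) -> bool:
    # JSON true/false become Python bool, which is a subclass of int;
    # don't let them pass where a number is expected.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def validate_interaction(raw: str) -> List[str]:
    """Return a list of problems with one KPI blob (empty if it looks fine)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = []
    for key, expected in FIELD_TYPES.items():
        if key in data and not _matches(data[key], expected):
            problems.append(f"{key}: unexpected type {type(data[key]).__name__}")
    for key in REQUIRED:
        if key not in data:
            problems.append(f"missing required field: {key}")

    stream = data.get("event_stream")
    if isinstance(stream, list):
        last = 0
        for i, entry in enumerate(stream):
            # Each event must be a [name, offset_ms] pair.
            if not (isinstance(entry, list) and len(entry) == 2
                    and isinstance(entry[0], str) and _matches(entry[1], int)):
                problems.append(f"event_stream[{i}] must be [name, offset_ms]")
                continue
            # Assumption: offsets are recorded in order and never decrease.
            if entry[1] < last:
                problems.append(f"event_stream[{i}] offset goes backwards")
            last = entry[1]

    rate = data.get("sample_rate")
    if _matches(rate, (int, float)) and not 0 < rate <= 1:
        problems.append("sample_rate must be in (0, 1]")
    return problems
```

An event written as "name": offset inside an array is not valid JSON, so such a blob is rejected at the parse step before any field checks run.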

Server Component

Persona ID is currently deployed in two data centers with six "webheads": front-line web servers that receive requests from client devices. For this feature, each webhead will expose a new API that accepts JSON data and forwards it to data storage servers. Data will be retained indefinitely, or purged based on resource usage. Historical data will be valuable for guiding the team's design decisions.
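
The forwarding step can be sketched as a small relay that tries the storage servers in round-robin order and fails over to the next one on an error. The Forwarder class, the injected send callable, the server URLs in the usage, and the failover policy are all assumptions for illustration; the text does not say how a webhead picks a storage server.

```python
import itertools
from typing import Callable, Sequence

# Hypothetical transport: send(server_url, body_bytes) -> HTTP status code.
Transport = Callable[[str, bytes], int]


class Forwarder:
    """Sketch of a webhead relaying KPI blobs to a few storage servers."""

    def __init__(self, servers: Sequence[str], send: Transport) -> None:
        if not servers:
            raise ValueError("need at least one storage server")
        self._servers = list(servers)
        # Rotating start index spreads load across the storage servers.
        self._next = itertools.cycle(range(len(self._servers)))
        self._send = send

    def forward(self, body: bytes) -> bool:
        """Try each server once, starting at the next in rotation."""
        start = next(self._next)
        n = len(self._servers)
        for step in range(n):
            url = self._servers[(start + step) % n]
            try:
                if self._send(url, body) == 200:
                    return True
            except OSError:
                # A network error counts as a failed attempt; try the next one.
                pass
        return False
```

Returning False instead of raising lets the webhead answer the client with an error status, which in turn lets the client keep the blob for a later retry.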

We reserve the right to sample data, but will start with 100% intake.
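
Sampling at a given sample_rate amounts to a per-interaction coin flip, and recording the rate in each blob lets an analyst scale sampled counts back up. The sketch below shows both under that assumption; whether the decision is made in the dialog or on the webhead is not specified, and the function names are hypothetical.

```python
import random
from typing import Optional


def should_report(sample_rate: float, rng: Optional[random.Random] = None) -> bool:
    """Per-interaction coin flip: report with probability sample_rate."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    source = rng if rng is not None else random
    # random() returns a float in [0.0, 1.0), so 1.0 always reports
    # and 0.0 never does.
    return source.random() < sample_rate


def estimate_total(kept_count: int, sample_rate: float) -> float:
    """Scale a count of sampled blobs back up to an estimate for all traffic."""
    return kept_count / sample_rate
```

For example, 1,000 blobs received at a sample_rate of 0.1 suggest roughly 10,000 interactions; at the planned rate of 1.0 the estimate equals the count.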

The client accessible API is: /wsapi/interaction_data

The API requires an HTTP POST with a CSRF token. The content type of the POST is 'application/json', and the body is the JSON document described above. The server returns a 200 on successful storage, and a non-200 status otherwise. In the event of failure, the client may store the blob in localStorage and retry transmission at a later point.
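
A client for /wsapi/interaction_data, including the localStorage fallback, could be sketched as follows. Python's urllib stands in for the browser's XMLHttpRequest, and a plain dict of strings stands in for localStorage. The X-CSRF-Token header name, the storage key, and the RetryQueue class are hypothetical: the text only says that a CSRF token is required and that failed blobs may be retried later. Flushing stops at the first failure so queued blobs keep their original order.

```python
import json
import urllib.request
from typing import Callable, Dict, List

ENDPOINT = "/wsapi/interaction_data"


def build_request(origin: str, blob: dict, csrf_token: str) -> urllib.request.Request:
    # Hypothetical: the token goes in an "X-CSRF-Token" header so the body
    # stays exactly the KPI document; the real placement is not specified.
    return urllib.request.Request(
        origin + ENDPOINT,
        data=json.dumps(blob).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-CSRF-Token": csrf_token},
        method="POST",
    )


class RetryQueue:
    """Keeps unsent blobs in a localStorage-like string store and retries them."""

    KEY = "kpi_pending"  # hypothetical storage key

    def __init__(self, storage: Dict[str, str], send: Callable[[dict], int]) -> None:
        self._storage = storage  # stands in for window.localStorage
        self._send = send        # posts one blob, returns the HTTP status code

    def _pending(self) -> List[dict]:
        return json.loads(self._storage.get(self.KEY, "[]"))

    def submit(self, blob: dict) -> None:
        # Queue the new blob behind any earlier failures, then try to send all.
        pending = self._pending()
        pending.append(blob)
        self._storage[self.KEY] = json.dumps(pending)
        self.flush()

    def flush(self) -> int:
        """Send queued blobs in order; return how many were accepted."""
        pending = self._pending()
        sent = 0
        for blob in pending:
            try:
                ok = self._send(blob) == 200  # 200 means the blob was stored
            except OSError:
                ok = False  # network error: treat like a failed POST
            if not ok:
                break  # keep this blob and everything after it for later
            sent += 1
        self._storage[self.KEY] = json.dumps(pending[sent:])
        return sent
```

In a real client, send could wrap urllib.request.urlopen(build_request(...)); urlopen raises urllib.error.HTTPError, an OSError subclass, for error statuses, which flush treats as a failure.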

Data Storage Component

Persona Sign-In webheads serve as simple forwarders for this feature. They may do some input validation and then POST the data to a small number of servers that store it. These servers expose a similar API for receipt of the data. Additionally, these servers have APIs that allow access to the data by date range, supporting streaming or pagination as desired.
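
Date-range access with pagination could work along the lines of this in-memory sketch, which keeps records sorted by timestamp and hands back a cursor for the next page. The InteractionStore and Page names, the half-open [start_ms, end_ms) range, and the use of the last returned (timestamp, sequence) key as the cursor are assumptions. Unlike a numeric offset, a keyed cursor does not repeat or skip already-stored records when new ones arrive between page requests.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Sort key: (timestamp, insertion sequence) breaks ties between equal timestamps.
Key = Tuple[int, int]


@dataclass
class Page:
    records: List[dict]
    next_cursor: Optional[Key]  # pass back as `after`; None when the range is done


class InteractionStore:
    """In-memory sketch of a storage server's date-range query."""

    def __init__(self) -> None:
        self._keys: List[Key] = []      # sorted
        self._records: List[dict] = []  # parallel to _keys
        self._seq = 0

    def store(self, record: dict) -> None:
        key = (record["timestamp"], self._seq)
        self._seq += 1
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._records.insert(i, record)

    def query(self, start_ms: int, end_ms: int, page_size: int = 100,
              after: Optional[Key] = None) -> Page:
        """Return records with start_ms <= timestamp < end_ms, oldest first."""
        # Sequence numbers start at 0, so (ts, -1) sorts before every key at ts.
        lo = bisect.bisect_left(self._keys, (start_ms, -1))
        if after is not None:
            lo = max(lo, bisect.bisect_right(self._keys, after))
        hi = bisect.bisect_left(self._keys, (end_ms, -1))
        stop = min(lo + page_size, hi)
        if stop <= lo:
            return Page([], None)
        next_cursor = self._keys[stop - 1] if stop < hi else None
        return Page(self._records[lo:stop], next_cursor)
```

A streaming variant would simply keep calling query with the returned cursor until next_cursor is None.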

Access to this data may be highly restricted initially, with a goal of opening up access as much as is feasible to allow for transparency, community involvement, and a high level of decoupling between the systems that store and the systems that analyze the data to answer meaningful questions about project health and usability.

User Data Risk Minimization

In this section, the privacy champion will identify areas of user data risk and recommendations for minimizing the risk.

Alignment with Privacy Operating Principles

In this section, the privacy champion will identify how the feature lines up with Mozilla's privacy operating principles.

See Also: Privacy/Roadmap_2011#Operating_Principles:

Follow-up Tasks and tracking

What Who Bug Details