Releases/Firefox 5/Risk mitigation strategies
Latest revision as of 17:57, 20 June 2011
Why?
We need to see where we stand on the various risk factors for Firefox 5, and how to mitigate that risk if we aren't comfortable with its level.
This page and its planning do not mean that we NEED to do any of these things, or that Firefox 5 isn't ready to release. It is merely prudent to discuss where we stand, what is in our control, and how to mitigate risks before mitigation is needed.
Current risk profile
Add-ons
Mobile
- AMO has no compatibility bumping for mobile
- About 50% of mobile add-ons are compatible
  - Lower than we would like to see
- We should manually review the recommended add-ons and bump them
- Not very many binary add-ons
- We don't think we should hold the release if the percentage doesn't increase
Desktop
- 78% compatible with Firefox 5 for the add-ons on AMO
- Large portion of the remaining percentage is the .NET Framework Assistant
- Talked with the developer at Microsoft, who said he would update his add-on. We don't have a timeframe for the update, though
- Risk: LOW for AMO add-ons, HIGH for non-AMO add-ons
- Most have updated, and the ones that haven't are waiting for the release
- https://addons.mozilla.org/en-US/firefox/compatibility
- 78% of add-ons compatible
- .NET Framework Assistant: ETA?
- AVG, Symantec, McAfee, Kaspersky should be ready
- Haven't heard back from Google about the toolbar, which is not currently compatible
- Extension compatibility spreadsheet: https://spreadsheets.google.com/spreadsheet/ccc?key=0Ap1xJcDYQ9zjdFN6NFpWNjcyQVhhUEQ5cGQxbU1ZQ0E&hl=en_US&authkey=CJa3g_wL
- From the add-on side we should do a very gradual rollout so add-on authors have time to update before the bulk of our users are affected by incompatibilities
Stability
Mobile
- Mobile crash data is close to, or the same as, 4.0.1
- We see higher crash rates in beta; as the ADUs grow, the crash rate goes down a bit
- The number of users on beta is small (but the best we've ever had)
- Not watching any particular bug to see if it flares up after the release
- Looks good for release
- https://crash-stats.mozilla.com/daily?p=Fennec&v[]=
                          5.0                                 4.0.1
Date        crashes   ADUs  throt  crash/100   crashes     ADUs  throt  crash/100
2011-06-16       49  7,114   100%      0.69%       809  165,235   100%      0.49%
2011-06-15       40  6,583   100%      0.61%       771  165,907   100%      0.46%
2011-06-14       43  6,069   100%      0.71%       785  163,882   100%      0.48%
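The crash/100 column above is simply crashes divided by ADUs, scaled to 100 users. A minimal sketch that reproduces the figures from the table; the function name `crashes_per_100_adu` and the `MOBILE_DATA` layout are illustrative assumptions, not part of any Mozilla tooling:

```python
def crashes_per_100_adu(crashes: int, adus: int) -> float:
    """Return the number of crashes per 100 active daily users (ADUs)."""
    if adus <= 0:
        raise ValueError("adus must be positive")
    return 100.0 * crashes / adus


# (crashes, ADUs) pairs copied from the table above: 5.0 first, then 4.0.1.
MOBILE_DATA = {
    "2011-06-16": ((49, 7114), (809, 165235)),
    "2011-06-15": ((40, 6583), (771, 165907)),
    "2011-06-14": ((43, 6069), (785, 163882)),
}

for day, (v5, v401) in MOBILE_DATA.items():
    rate_5 = crashes_per_100_adu(*v5)
    rate_401 = crashes_per_100_adu(*v401)
    print(f"{day}  5.0: {rate_5:.2f}  4.0.1: {rate_401:.2f}")
```

With only about 7,000 ADUs on 5.0, a handful of crashes moves the rate noticeably, which is consistent with the note that the rate drops a bit as ADUs grow.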
Desktop
- 1.7 million users on 5.0 overall.
- Crash rate fairly low at 1.36 crashes per 100 ADU.
- Distribution of users scattered across all betas: http://test.kairo.at/socorro/2011-06-16.buildcrashes.html
- Risks
- No good data right now on any one beta for 1 million+ users.
- b7: 89K users, 6.799 crashes per 100 ADU.
- b6: 295K users, 1.436 crashes per 100 ADU.
- The last several betas have never increased much beyond 250K users.
- We know from 4.0 experience that the crash landscape changes above 1 million, 2 million, 5 million. We had over 2 million beta users for pre 4.0 builds.
- Not enough data to really understand whether there is a top crasher.
- 2 Flash releases in the last week and a half.
- From the stability side we need automatic updates to reach 6-10 million users before we can be confident releasing to the rest. That calls for a release method that gets us that many ADUs, pauses while we interpret the data, then opens the update to everyone.
Security
- Details of bug 659349 were released prematurely
  - Filed May 24th. Got a fix into Firefox 5. They talked about the details 5 days early
  - Not the most significant bug we are fixing in this release
  - Screen-scraping bug that will affect 40-50% of our users who have machines that can run WebGL
  - People expect us to talk about it during the release
- If we are doing a slow rollout we might have to delay the security advisories
- Very uncomfortable with a slow rollout because people will go look
- Other than bug 659349, we're in good shape for a Tuesday release
  - Most were found internally
  - The external bugs are both sg:moderate
- At this point it's really too late to do a 4.0.2, unless we think Firefox 5's uptake is going to be terrible
- If we are going to roll out slowly in the future, we need to start discussing a possible 5.0.1
- From the security side we want to release as quickly as possible
Web compatibility
- WebGL now disables cross-domain textures
- setTimeout clamping in background tabs has potential negative consequences, though none that we know about
- A throttled rollout may help us find issues before the whole audience is exposed to them
- From the web compatibility side a gradual rollout will tell us whether these web compatibility issues affect our users before the entire userbase is exposed
Dials we can adjust
Advertised vs unadvertised update
- We could offer an advertised (major) update rather than an unadvertised (minor) one
Pros
- Gives users more notice / lets them opt-in
- Ability to speak directly to users via the billboard
- Users may be more tolerant of add-on incompatibility due to better mental preparation
Cons
- Slows uptake
- If the user chooses "Never", we don't have a point release a month later to reprompt them
- More users exposed for longer if we announce security vulnerability details
- Requires webpage creation, copy creation, and localization--none of which has been done
- Requires RelEng to touch the updates manually; some small QA impact, TBD
Manual-only update
- We could offer only a manual download from Mozilla.com. Users would get the in-product update only if they manually check for updates
Pros
- Minimizes risk to userbase while still being technically released
- Gives users more notice / lets them opt-in (either from mozilla.com or checking for updates manually)
- Users may be more tolerant as they explicitly looked for and installed the release
- Press around release may prompt add-on makers to update their add-ons
Cons
- Slows uptake considerably
- Do we disclose security vulnerability details?
- Some may not view it as a release if it is only available when manual action is taken
Throttled automatic update offers
- Release as normal but have some percentage of update pings return no update available
Pros
- Lowers risk across the entire userbase
- Gives add-on developers additional time to increase compatibility
Cons
- Gives some users more risk, others less
- May be harder to see crash spikes as the user ramp is gradual
- May be harder to get initial feedback as the volume could be too low to determine if something is a major issue
- More users exposed for longer if we announce security vulnerability details
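The throttling option above, where some percentage of update pings return no update available, can be sketched as a per-ping random decision. This is a minimal illustration only, not the actual update server logic: the names `offer_update` and `simulate_pings` are hypothetical, and the sketch assumes the throttle value is the percentage of pings that ARE offered the update.

```python
import random


def offer_update(offer_percent: float, rng: random.Random) -> bool:
    """Decide whether a single update ping is offered the new release.

    offer_percent is assumed to be the share of pings (0-100) that receive
    the update; the remainder are told no update is available.
    """
    if not 0 <= offer_percent <= 100:
        raise ValueError("offer_percent must be between 0 and 100")
    return rng.random() * 100 < offer_percent


def simulate_pings(pings: int, offer_percent: float, seed: int = 0) -> int:
    """Count how many of `pings` update checks would be offered the update."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return sum(offer_update(offer_percent, rng) for _ in range(pings))


if __name__ == "__main__":
    offered = simulate_pings(100_000, 33)
    print(f"{offered} of 100000 pings offered the update")
```

Because each ping is decided independently, a client that is refused once may be offered the update on a later ping, so the share of updated users keeps rising over time even at a fixed percentage. That matches the con listed above: the user ramp is gradual.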
Outcome
- clooney/mfinkle will take point on getting all featured mobile add-ons compatible or removing them from the featured list
- Mobile doesn't need to throttle
- No one wanted to do a prompted update for desktop
- lmesa liked Manual-only the best for desktop
  - We decided it didn't get us where we needed to be testing-wise
  - Not the best from a security standpoint
  - Discounted
- Debated two approaches: offer updates to 100% and cut them off once we reach a large enough audience, or throttle at some percentage and later increase to 100%
- Decided to throttle automatic updates to 25-33% for a maximum of 51 hours (48 + 3 hours to get us to a regular PDT time)
  - Asked for 72 hours; the security team was more comfortable with 48 hours
  - Staying throttled (or turning off updates entirely) after 51 hours needs clear justification and signoff from the security team
- clegnitto and joduinn decided on 33% (based on some WAG numbers) as they would rather overshoot than undershoot
- clegnitto will work with metrics to get hourly ADU reports
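The arithmetic behind the decision can be sketched as follows: when the 51-hour throttle window ends, and roughly how many update pings are needed at a given percentage to reach the 6-10 million users that the stability side asked for. The function names and the start time are illustrative assumptions (the notes do not state a release time), and the estimate treats every offered ping as one updated user, which overstates real uptake.

```python
import math
from datetime import datetime, timedelta

# 48 hours agreed with the security team, plus 3 hours to land on a
# regular PDT working time, as recorded in the notes above.
THROTTLE_HOURS = 48 + 3


def throttle_deadline(start: datetime) -> datetime:
    """Return the latest time the throttle may stay on without new signoff."""
    return start + timedelta(hours=THROTTLE_HOURS)


def pings_needed(target_users: int, offer_percent: float) -> int:
    """Estimate the update pings required to offer the update to target_users.

    Simplifying assumption: each offered ping becomes one updated user.
    """
    if not 0 < offer_percent <= 100:
        raise ValueError("offer_percent must be in (0, 100]")
    return math.ceil(target_users * 100 / offer_percent)


if __name__ == "__main__":
    # Hypothetical start time, chosen only to show the calculation.
    start = datetime(2011, 6, 21, 7, 0)
    print("Throttle must end by:", throttle_deadline(start))
    for percent in (25, 33):
        low = pings_needed(6_000_000, percent)
        high = pings_needed(10_000_000, percent)
        print(f"{percent}%: {low:,} to {high:,} pings")
```

A higher percentage reaches the target audience with fewer pings, which fits the stated preference to overshoot at 33% rather than undershoot.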