Platform/GFX/TriageSchedule

Overview

This is a pilot project that started in 2015. The goal is to refine the process enough to understand how much time it requires and what latency we can achieve, and to collect enough data for a retrospective on the best ways to measure progress, any possible changes, and next steps.

Process

This is a rotating duty. Each person is in charge of one week's worth of new, unassigned bugs, starting Saturday and ending Friday. You then have one extra week to act on them, so that a bug is at most 14 days old by the time somebody looks at it.
At this pace, everybody will get this duty once per quarter. The schedule is in the shared calendar (see the Schedule section below), and it should be self-managing: if you want to trade your week with somebody else, you should be able to just move the item around.
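The Saturday-to-Friday window and the extra week described above can be sketched in a few lines of Python. This is only an illustration; the function name `triage_window` and its return shape are hypothetical and not part of any existing tooling.

```python
from datetime import date, timedelta


def triage_window(day: date) -> tuple[date, date, date]:
    """Return (start, end, act_by) for the triage week containing `day`.

    start  -- the Saturday on or before `day`
    end    -- the Friday that closes that week
    act_by -- one week after `end`, the end of the extra week to act
    """
    # date.weekday() numbers Monday as 0, so Saturday is 5.
    days_since_saturday = (day.weekday() - 5) % 7
    start = day - timedelta(days=days_since_saturday)
    end = start + timedelta(days=6)
    act_by = end + timedelta(days=7)
    return start, end, act_by


start, end, act_by = triage_window(date(2015, 1, 7))
print(start, end, act_by)
```

For a date in the first 2015 slot (Jan 3 - Jan 9), the act-by date falls 13 days after the start of the window, which is consistent with the 14-day limit above.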
The goal is to make sure we don’t miss something important, either completely or until “late”, and also to notice any trends in crashes or intermittent failures, or in any particular areas of the code. The idea is to categorize the bugs as they come in so that we know which ones need immediate attention and which ones can wait a bit, and so that we can ask for missing information, CC the right people, etc.
We will cover these components: Canvas: 2D, Canvas: WebGL, GFX: Color Management, Graphics, Graphics: Layers, Graphics: Text, Image Blocking, ImageLib, Panning and Zooming.
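As a rough sketch, a search for the week's new, unassigned bugs in these components could be built as a Bugzilla REST query. The component names come from the list above; the `Core` product, the `nobody@mozilla.org` assignee, the chosen `include_fields`, and the helper name `untriaged_query_url` are assumptions made for illustration, so check them against the actual Bugzilla setup before relying on them.

```python
from urllib.parse import urlencode

# Components covered by this triage, as listed above.
COMPONENTS = [
    "Canvas: 2D",
    "Canvas: WebGL",
    "GFX: Color Management",
    "Graphics",
    "Graphics: Layers",
    "Graphics: Text",
    "Image Blocking",
    "ImageLib",
    "Panning and Zooming",
]


def untriaged_query_url(since: str) -> str:
    """Build a search URL for unassigned bugs created on or after `since`.

    `since` is an ISO date such as "2015-01-03". The query only sets a
    lower bound; filtering to the end of the week is left to the caller.
    """
    params = [
        ("product", "Core"),                     # assumption
        ("assigned_to", "nobody@mozilla.org"),   # assumption
        ("creation_time", since),
        ("include_fields", "id,summary,component,creation_time"),
    ]
    # Repeating the parameter asks for bugs in any of the components.
    params += [("component", name) for name in COMPONENTS]
    return "https://bugzilla.mozilla.org/rest/bug?" + urlencode(params)


print(untriaged_query_url("2015-01-03"))
```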
Some guidelines:
- A good guideline is ~15 minutes per bug, which is probably about an hour and a half a day for the two weeks, but let's see what we really need as we get going.
- This isn’t about finding a cause, and it isn’t about the full prioritization.
- This is about noticing things sooner.
- This is about asking the bug author for info that may be missing or would help with the triage.
- This is about asking for a regression range, or even getting one if you can reproduce the problem and you have time.
- This is about CC-ing the people on the team (or elsewhere) you’re guessing could shed more light on the issue.
- This is about doing an occasional needinfo, which should be reserved for what you deem high priority.
- Some types of bugs would be handled outside of this triage process; for example, intermittent test errors can get a gfx-noted with a quick check if it's something obvious, not really spending time to resolve the issue if it takes a larger effort.
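The time guideline above implies a rough capacity, which can be worked out as follows. The 15 minutes per bug and 90 minutes per day come from the guideline; the figure of 10 working days for the two weeks is an assumption, and the function name is hypothetical.

```python
MINUTES_PER_BUG = 15   # from the guideline above
MINUTES_PER_DAY = 90   # "about an hour and a half a day"
WORKING_DAYS = 10      # assumption: two five-day work weeks


def bugs_within_budget(
    minutes_per_bug: int = MINUTES_PER_BUG,
    minutes_per_day: int = MINUTES_PER_DAY,
    working_days: int = WORKING_DAYS,
) -> int:
    """Number of bugs that fit in the time budget, whole bugs per day."""
    bugs_per_day = minutes_per_day // minutes_per_bug
    return bugs_per_day * working_days


print(bugs_within_budget())  # 6 bugs a day over 10 days -> 60
```

If a week brings in noticeably more bugs than this, either the per-bug time or the daily load has to change, which is the kind of data the pilot is meant to collect.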

Keywords

Add the relevant keywords:
- "crash" if it's a crash;
- "hang" if it's a hang;
- "perf" if it's a performance related issue;
- "feature" if it's new code, doing something that wasn't done before; note that a "feature" can block a "crash", we want a wide definition;
- "regression" - not quite sure about this, we may want to save it for really bad and immediate regressions only?

Clean up the bug:
- set the correct platform if it's obvious and we're reasonably certain (e.g., DirectX issue is going to be Windows);
- if we know how to reproduce it, set the "Has STR" field; if there is a regression range, set that as well.

Set the priority field (P1-P5 under importance) at the time that you make it gfx-noted.
- If you are not sure, set it to P3.
- If you are going to fix it in this release, set it to P1. Note - this is not the same as thinking it should be fixed in this release. It's a scheduling note, not a priority setting.
- If you are going to fix it in the next release, set it to P2. Are you sure though? Do you know you will have time?
- If you don't think we will ever have the time to spend on this, can ship for years without fixing it, and will take a patch if a contributor produces it, set it to P5.
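The priority rules above amount to a small lookup, sketched here in Python. The enum and function names are hypothetical; P4 is left out because the rules above do not say when to use it.

```python
from enum import Enum


class FixPlan(Enum):
    """When, if ever, the triager expects the bug to be worked on."""
    THIS_RELEASE = "this release"
    NEXT_RELEASE = "next release"
    UNSURE = "unsure"
    PROBABLY_NEVER = "probably never"  # would still take a contributor patch


# Mapping taken directly from the rules above.
PRIORITY_FOR_PLAN = {
    FixPlan.THIS_RELEASE: "P1",
    FixPlan.NEXT_RELEASE: "P2",
    FixPlan.UNSURE: "P3",
    FixPlan.PROBABLY_NEVER: "P5",
}


def priority_for(plan: FixPlan) -> str:
    """Return the priority to set when marking a bug gfx-noted."""
    return PRIORITY_FOR_PLAN[plan]


print(priority_for(FixPlan.UNSURE))  # P3
```

Note that the input is a scheduling plan, not a judgment of importance, which matches the note under P1.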

Consider the Severity value (blocker, critical, major, etc. under Importance)
- If it's already set to anything higher than normal, please CC Milan.
- If you think it should be set to higher than normal, please do so, then CC Milan.

Schedule

The schedule is tracked in a shared calendar, ID mozilla.com_6059q0oha1t7ueamb52cs7vegk@group.calendar.google.com. If the calendar and the table below differ, the shared calendar wins.

There is also a dashboard tracking how we're doing.

2017 Q1
Dec 31 - Jan 6: Vincent
Jan 7 - Jan 13: Mason
Jan 14 - Jan 20: Morris
Jan 21 - Jan 27: Lee
Jan 28 - Feb 3: George
Feb 4 - Feb 10: Andrew
Feb 11 - Feb 17: Dzmitry
Feb 18 - Feb 24: Ryan

2017 Q2, 2017 Q3, 2017 Q4: no entries

2016 Q1
Jan 2 - Jan 8: Sotaro
Jan 9 - Jan 15: David
Jan 16 - Jan 22: Timothy
Jan 23 - Jan 29: Nicolas
Jan 30 - Feb 5: Jeff Muizelaar
Feb 6 - Feb 12: Mason
Feb 13 - Feb 19: Benoit
Feb 20 - Feb 26: Lee
Feb 27 - Mar 4: Milan
Mar 5 - Mar 11: Edwin
Mar 12 - Mar 18: Jamie
Mar 19 - Mar 25: Bas
Mar 26 - Apr 1: Jeff Gilbert

2016 Q2
Apr 2 - Apr 8: Sotaro
Apr 9 - Apr 15: David
Apr 16 - Apr 22: Timothy
Apr 23 - Apr 29: Nicolas
Apr 30 - May 6: Jeff Muizelaar
May 7 - May 13: Mason
May 14 - May 20: Edwin
May 21 - May 27: Lee
May 28 - June 3: Benoit
June 4 - June 10: Jamie
June 11 - June 17: Milan
June 18 - June 24: Sotaro
June 25 - July 1: Jeff Gilbert

2016 Q3
July 2 - July 8: Bas
July 9 - July 15: David
July 16 - July 22: Peter
July 23 - July 29: Timothy
July 30 - Aug 5: Jerry
Aug 6 - Aug 12: Nicolas
Aug 13 - Aug 19: Ethan
Aug 20 - Aug 26: Jeff Muizelaar
Aug 27 - Sep 2: Vincent
Sep 3 - Sep 9: Mason
Sep 10 - Sep 16: Morris
Sep 17 - Sep 23: Lee
Sep 24 - Sep 30: George

2016 Q4
Oct 1 - Oct 7: Jamie
Oct 8 - Oct 14: Edwin
Oct 15 - Oct 21: Sotaro
Oct 22 - Oct 28: Jeff Gilbert
Oct 29 - Nov 4: Bas
Nov 5 - Nov 11: David
Nov 12 - Nov 18: Peter
Nov 19 - Nov 25: Timothy
Nov 26 - Dec 2: Jerry
Dec 3 - Dec 9: Milan
Dec 10 - Dec 16: Nicolas
Dec 17 - Dec 23: Ethan
Dec 24 - Dec 30: Jeff Muizelaar

2015 Q1
Jan 3 - Jan 9: Milan
Jan 10 - Jan 16: Kats
Jan 17 - Jan 23: Bas
Jan 24 - Jan 30: David
Jan 31 - Feb 6: Benoit
Feb 7 - Feb 13: Dan
Feb 14 - Feb 20: Sotaro
Feb 21 - Feb 27: Jeff Gilbert
Feb 28 - Mar 6: Timothy
Mar 7 - Mar 13: Nicolas
Mar 14 - Mar 20: Botond
Mar 21 - Mar 27: Jeff Muizelaar
Mar 28 - Apr 3: Mason

2015 Q2
Apr 4 - Apr 10: Milan
Apr 11 - Apr 17: Dan
Apr 18 - Apr 24: Bas
Apr 25 - May 1: David
May 2 - May 8: Sotaro
May 9 - May 15: Jeff Gilbert
May 16 - May 22: Timothy
May 23 - May 29: Nicolas
May 30 - June 5: Jeff Muizelaar
June 6 - June 12: Mason
June 13 - June 19: Benoit
June 20 - June 26: Milan
June 27 - July 3: Dan

2015 Q3
July 4 - July 10: Bas
July 11 - July 17: David
July 18 - July 24: Sotaro
July 25 - July 31: Jeff Gilbert
Aug 1 - Aug 7: Timothy
Aug 8 - Aug 14: Nicolas
Aug 15 - Aug 21: Jeff Muizelaar
Aug 22 - Aug 28: Mason
Aug 29 - Sep 4: Benoit
Sep 5 - Sep 11: Lee
Sep 12 - Sep 18: Milan
Sep 19 - Sep 25: Bas
Sep 26 - Oct 2: Dan

2015 Q4
Oct 3 - Oct 9: David
Oct 10 - Oct 16: Sotaro
Oct 17 - Oct 23: Jeff Gilbert
Oct 24 - Oct 30: Timothy
Oct 31 - Nov 6: Nicolas
Nov 7 - Nov 13: Jeff Muizelaar
Nov 14 - Nov 20: Mason
Nov 21 - Nov 27: Benoit
Nov 28 - Dec 4: Lee
Dec 5 - Dec 11: Milan
Dec 12 - Dec 18: Jamie
Dec 19 - Dec 25: Bas
Dec 26 - Jan 1: Jeff Gilbert

Future considerations

This is something the JS team did at one point; when we decide on the next steps, we will want to take it into account:

JS team tried shared-triage-responsibility a few years ago. It didn't last very long,
but it was not scheduled or enforced. Eventually managers/project managers/tech leads
took over for the sub-components they were responsible for.

Before JS did coordinated triage, Dave Mandelin measured that there were about 11 new bugs
per day, half of which were internally generated by the team and didn't need triage
(developers triage their own bugs). So that was about 5/6 bugs a day across the component.
Of those, the most serious ones (~2 a week, I think?) were already getting fixed within
a release cycle. Based on the distribution we ended up with three priority tags:

   p1 = must do
   p2 = want to do <- general bucket
   p3 = may do <- usually idea/investigation/research bugs

And two follow-up tags:
   investigate = someone needs to spend a few minutes investigating
   nonactionable = nothing to do

Thoughts and comments about the first round

(Milan) It is worth revisiting the query for the bugs you've triaged a few days, or a week, after you've reduced the number to zero. Sometimes new ones show up because of a component change, a bug getting reopened, or some such.
(Kats) The current method gives people exposure to other parts of the code, but without sufficient context to properly triage bugs (no history of what landed recently, or whether other similar bugs were reported in the past week). I would still prefer a component-watching approach.
(Kats) Intermittents are more challenging to deal with: if one is low-volume initially and later increases in volume, who is responsible for it?
