Breakpad/Status Meetings/2017-03-22: Difference between revisions

From MozillaWiki


== Operations Updates ==
* did not ship this morning
* just verified stage, going to release *right now*
* Antenna load test continues
** changing configuration so we can aggregate logs and investigate boxes afterwards
** necessary to verify that logs going into grephost are actually the logs from the box
* manually updated the admin node, hopefully for the last time
** running the latest crontabber but still using a bash lock


== Project Updates ==


=== Splitting out collector (Antenna) ===
* (willkg) Worked out the cause of the startup problems (S3 connections failing, startup errors not being sent to Sentry, gunicorn starting too many processes): Antenna has no configuration at startup. Tracked in [https://bugzilla.mozilla.org/show_bug.cgi?id=1342619 bug 1342619].
* (willkg) Worked on determining data loss possibilities. Pushed log parser to [https://github.com/willkg/antenna-log-parser GitHub]. Work is tracked in [https://bugzilla.mozilla.org/show_bug.cgi?id=1348881 bug 1348881].
* (willkg) Prepped Pigeon for monitoring. Work tracked in [https://bugzilla.mozilla.org/show_bug.cgi?id=1344037 bug 1344037].
* (miles, mbrandt) Load tests to figure out good autoscaling parameters. Work tracked in [https://bugzilla.mozilla.org/show_bug.cgi?id=1342621 bug 1342621].
Once we establish the autoscaling configuration and determine what we can about data loss, we can move on to the next phase, where we connect Antenna -stage to Socorro -stage, replacing the Socorro collectors. Maybe by the end of the week?
=== Deprecation rampage ===
* Correlations are done and gone
* Next step: middleware
** need to move a few remaining services to the webapp
=== Processor rewrite ===
* some new rules ported over
* https://github.com/tobgu/pyrsistent
** likely going to pull this in
** PMap at first, PRecord later
** can use it to enforce schema throughout the transformation pipeline
* last call for comments: https://bugzilla.mozilla.org/show_bug.cgi?id=1346883
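
A minimal sketch of the "PMap at first, PRecord later" idea noted above, assuming the <code>pyrsistent</code> package is installed. The field names, the <code>product_rule</code> function, and the <code>ProcessedCrash</code> class are hypothetical illustrations, not the processor's actual rules or schema.

```python
from pyrsistent import PRecord, field, pmap

# Step 1 (PMap): an immutable mapping. set() returns a new map and
# leaves the original untouched, so a rule cannot mutate its input.
raw_crash = pmap({"uuid": "abc123", "ProductName": "Firefox"})


def product_rule(crash):
    # Hypothetical rule: copy ProductName into a normalized key.
    return crash.set("product", crash["ProductName"])


processed = product_rule(raw_crash)


# Step 2 (PRecord): same immutability, plus a declared schema.
class ProcessedCrash(PRecord):
    # Hypothetical schema for illustration only.
    uuid = field(type=str, mandatory=True)
    product = field(type=str)


record = ProcessedCrash(uuid=processed["uuid"], product=processed["product"])


def try_set(rec, key, value):
    # Return the updated record, or the exception the schema check raised.
    try:
        return rec.set(key, value)
    except (TypeError, AttributeError) as exc:
        return exc


wrong_type = try_set(record, "product", 42)    # value is not a str
unknown_key = try_set(record, "bogus", "x")    # field is not declared
```

With a PRecord, a rule that writes the wrong type or an undeclared key fails at the point of the write rather than somewhere later in the pipeline, which is one way the schema could be enforced at every step.
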
=== Upgrading Elasticsearch ===
* (Adrian) working on the ES 5.1 mapping
** currently testing the mapping with real data


== Other Business ==
* (Adrian) interest in a signature generation service is coming back
** context is crash pings (Telemetry); ccorcoran is looking into having a service that generates a signature like ours from a stack trace
** could make the processor rewrite slightly easier
** prototype is here: https://github.com/adngdb/crash-signature-service/
* (peterbe) New validate_and_test.py vastly improved
** will help prevent bad edits to crash_report.json
** still not perfect, and not enforced or automated
** filed https://bugzilla.mozilla.org/show_bug.cgi?id=1349633
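
To make the signature generation item above concrete, here is a toy sketch of building a signature from a list of stack frames. It is a simplified illustration under stated assumptions, not the Socorro implementation or the prototype's API: the frame lists, the function name, and the <code>max_frames</code> parameter are all hypothetical.

```python
# Hypothetical frame lists for illustration; real lists would be far longer.
IRRELEVANT_FRAMES = {"raise", "abort"}   # skipped entirely
PREFIX_FRAMES = {"malloc", "free"}       # kept, but the next frame is added too


def generate_signature(frames, max_frames=3):
    """Join the first meaningful frames of a stack into one string."""
    parts = []
    for frame in frames:
        if frame in IRRELEVANT_FRAMES:
            continue
        parts.append(frame)
        # Stop at the first frame that is not a prefix frame,
        # or once the signature has reached max_frames entries.
        if frame not in PREFIX_FRAMES or len(parts) >= max_frames:
            break
    return " | ".join(parts)


stack = ["raise", "abort", "malloc", "js::Foo", "main"]
signature = generate_signature(stack)
```

A service along these lines would take a stack trace as input and return only the signature string, so callers such as crash ping consumers would not need the rest of the processor.
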


== Travel, etc ==
* lonnen out Monday and Friday of next week, intermittently available in between
* miles in the Bay Area next week


== Links ==

Latest revision as of 18:31, 22 March 2017


Meeting Info

Breakpad status meetings occur on Wed at 10:00am Pacific Time.

Conference numbers:

   Vidyo: Stability 
   650-903-0800 x92 conf 98200#
   800-707-2533 (pin 369) conf 98200# 

IRC backchannel: #breakpad
Mountain View: Dancing Baby (3rd floor)

Operations Updates

  • did not ship this morning
  • just verified stage, going to release *right now*
  • Antenna load test continues
    • changing configuration so we can aggregate logs and investigate boxes afterwards
    • necessary to verify that logs going into grephost are actually the logs from the box
  • manually updated the admin node, hopefully for the last time
    • running the latest crontabber but still using a bash lock


Project Updates

Deployment Triage

PR Triage


Major Projects

Splitting out collector (Antenna)

  • (willkg) Worked out the cause of the startup problems (S3 connections failing, startup errors not being sent to Sentry, gunicorn starting too many processes): Antenna has no configuration at startup. Tracked in bug 1342619.
  • (willkg) Worked on determining data loss possibilities. Pushed log parser to GitHub. Work is tracked in bug 1348881.
  • (willkg) Prepped Pigeon for monitoring. Work tracked in bug 1344037.
  • (miles, mbrandt) Load tests to figure out good autoscaling parameters. Work tracked in bug 1342621.

Once we establish the autoscaling configuration and determine what we can about data loss, we can move on to the next phase, where we connect Antenna -stage to Socorro -stage, replacing the Socorro collectors. Maybe by the end of the week?

Deprecation rampage

  • Correlations are done and gone
  • Next step: middleware
    • need to move a few remaining services to the webapp

Processor rewrite

  • some new rules ported over
  • https://github.com/tobgu/pyrsistent
    • likely going to pull this in
    • PMap at first, PRecord later
    • can use it to enforce schema throughout the transformation pipeline
  • last call for comments: https://bugzilla.mozilla.org/show_bug.cgi?id=1346883

Upgrading Elasticsearch

  • (Adrian) working on the ES 5.1 mapping
    • currently testing the mapping with real data

Other Business

  • (Adrian) interest in a signature generation service is coming back
    • context is crash pings (Telemetry); ccorcoran is looking into having a service that generates a signature like ours from a stack trace
    • could make the processor rewrite slightly easier
    • prototype is here: https://github.com/adngdb/crash-signature-service/
  • (peterbe) New validate_and_test.py vastly improved
    • will help prevent bad edits to crash_report.json
    • still not perfect, and not enforced or automated
    • filed https://bugzilla.mozilla.org/show_bug.cgi?id=1349633

Travel, etc

  • lonnen out Monday and Friday of next week, intermittently available in between
  • miles in the Bay Area next week

Links