Auto-tools/Projects/Pulse: Difference between revisions

From MozillaWiki
(Remove outdated sections, add some updates)
(Added link to Go (golang) pulse client library and made headings bigger)
Revision as of 17:40, 25 February 2015

Introducing Pulse

https://pulse.mozilla.org/

Mozilla currently has a ton of different systems that are inter-connected via polling, screen scraping, email, and other brittle methods. To make their lives easier community members often build tools on top of this house of cards, adding yet another level of scraping and polling. Many systems don't even export important data for others to scrape and use, preventing better tools from being written.

The goal of Pulse is to eliminate polling and add visibility into all aspects of Mozilla and its systems. This allows more robust, dynamic, and informative tools.

We have a discussion forum available via the standard trio of USENET newsgroup, mailing list, and Google Group.

File bugs under Webtools :: Pulse.

Also see the Pulse Inspector web app, which displays Pulse messages in real time, and the (manually updated) list of Pulse exchanges.

System Description

Pulse isn't any one thing. At its heart, it is a RabbitMQ system with a particular configuration and a set of conventions for using it along with a management tool, PulseGuardian, to make Pulse as automated and self-serve as possible. Pulse follows the pub-sub pattern, in which publishers send messages to topic exchanges, and consumers create queues bound to these exchanges in order to subscribe to the publishers' messages.

Specification

Pulse is a managed AMQP 0-9-1 service with RabbitMQ extensions for publishing messages from Mozilla infrastructure. The aim is to provide hooks that subscribers can use to integrate and extend Mozilla infrastructure.

Authentication

Pulse credentials are managed and issued by PulseGuardian, available at https://pulse.mozilla.org. This service SHALL issue an accessToken for any clientId that is registered with an authorized email address. The accessToken is strictly secret and MUST NOT be shared publicly. The clientId is not secret. When establishing an AMQP connection, the clientId and accessToken MUST be used as username and password, respectively.
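As a rough illustration of the last rule, the sketch below builds an AMQP connection URL with the clientId as username and the accessToken as password. The host name, port 5671 (the conventional AMQP-over-TLS port), and the default vhost are assumptions made for the example, not values taken from this specification, and the function name is hypothetical.

```python
from urllib.parse import quote


def pulse_amqp_url(client_id, access_token,
                   host="pulse.mozilla.org", port=5671, vhost="/"):
    """Build an AMQP URL: clientId is the username, accessToken the password.

    host, port and vhost are assumed defaults for illustration only.
    """
    # Percent-encode every component so characters such as '/' or '@'
    # in a token cannot break the URL structure.
    user = quote(client_id, safe="")
    password = quote(access_token, safe="")
    return f"amqps://{user}:{password}@{host}:{port}/{quote(vhost, safe='')}"
```

Because the accessToken is secret, a URL built this way should be treated as a secret too and kept out of logs and source control.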

Authorized Users

Pulse is intended to be open to all Mozillians who want to extend or integrate with Mozilla infrastructure. To guard against abuse, PulseGuardian users MUST authenticate via Persona. PulseGuardian SHOULD verify that users have a vouched Mozillians profile.

Publishers

Publishers MUST name exchanges in the form exchange/<clientId>/<name>; attempts to name an exchange otherwise SHALL result in an authorization error. Exchanges MUST be topic exchanges and they MUST be declared durable.
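A minimal sketch of the naming and declaration rules above. The helper names are hypothetical, and the keys of the returned dictionary only mirror the options most AMQP clients expose when declaring an exchange; they are not tied to any particular library. The restriction that a clientId contains no slash is an assumption of the sketch.

```python
def exchange_name(client_id, name):
    """Return the exchange name in the form exchange/<clientId>/<name>."""
    # Assumption: a clientId never contains '/', so the prefix is unambiguous.
    if not client_id or "/" in client_id:
        raise ValueError("clientId must be non-empty and must not contain '/'")
    if not name:
        raise ValueError("name must be non-empty")
    return f"exchange/{client_id}/{name}"


def exchange_declaration(client_id, name):
    """Describe a conforming exchange: a durable topic exchange."""
    return {
        "exchange": exchange_name(client_id, name),
        "exchange_type": "topic",  # exchanges MUST be topic exchanges
        "durable": True,           # exchanges MUST be declared durable
    }
```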

Messages MUST contain a UTF-8-encoded JSON payload, and their Content-Type MUST be application/json. Messages SHOULD NOT be larger than 8 kB; deviations may be feasible for low-traffic exchanges. Messages MUST NOT contain secret or sensitive information; all exchanges and messages SHALL be considered public.
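The payload rules can be sketched as follows. The function name and the returned tuple are hypothetical. The size limit uses 8000 bytes, the stricter reading of "8 kB", which is an assumption of the sketch. The delivery mode value 2 is the AMQP marker for a persistent message.

```python
import json

# Assumption: "8 kB" is read as 8000 bytes, the stricter interpretation.
RECOMMENDED_MAX_BYTES = 8000


def encode_message(payload):
    """Serialize a payload as UTF-8 JSON and report whether it fits the size guideline.

    Returns (body, properties, within_limit).
    """
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    properties = {
        "content_type": "application/json",  # required Content-Type
        "delivery_mode": 2,                   # AMQP: 2 marks the message persistent
    }
    return body, properties, len(body) <= RECOMMENDED_MAX_BYTES
```

The size check is returned rather than enforced because the limit is a SHOULD, and low-traffic exchanges may exceed it.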

A message SHOULD carry a routing key, in which fields have a fixed index from the left. Additionally, a message MAY be CC'ed to multiple routing keys, using the RabbitMQ Sender-selected Distribution extension.
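Fixed field positions matter because subscribers bind to topic exchanges with patterns in which `*` stands for exactly one dot-separated word and `#` for zero or more words. The sketch below re-implements that matching rule in plain Python purely for illustration; in practice the broker does the matching. The example routing key layout (project, platform, job type) is hypothetical.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP topic binding pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            # Pattern exhausted: match only if the key is exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words.
            return any(match(i + 1, j2) for j2 in range(j, len(k) + 1))
        if j < len(k) and (p[i] == "*" or p[i] == k[j]):
            # '*' or a literal word consumes exactly one word.
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


# Hypothetical layout: <project>.<platform>.<job type>
key = ".".join(["mozilla-central", "linux64", "build"])
print(topic_matches("*.linux64.#", key))
```

With the platform always in the second position, a subscriber interested in one platform can bind with `*.linux64.#` without knowing the other fields.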

Messages SHOULD be durable and SHOULD be published over RabbitMQ confirm-publish channels. Otherwise, the documentation MUST clearly reflect that messages from the given exchange do not exhibit deliver-at-least-once semantics.

Subscribers

Subscribers MUST name queues in the form queue/<clientId>/<name>; attempts to name a queue otherwise SHALL result in an authorization error. Queues MAY consume from any exchange prefixed exchange/; attempts to consume from any other exchange SHALL result in an authentication error.

Subscribers MAY limit the size of their queues using the RabbitMQ Queue Length Limit extension. Subscribers MUST NOT let their queues grow unbounded; if a queue is left unattended, Pulse SHALL notify its owner by email. Additionally, Pulse MAY delete a queue which exceeds defined limits. Subscribers SHOULD specify a prefetch limit using the RabbitMQ Consumer Prefetch limit extension.

Subscribers SHOULD use either durable queues or auto-delete queues. Implementors are recommended to aim for deliver-at-least-once semantics.
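The subscriber rules can be collected into one settings dictionary, sketched below. `x-max-length` is the argument RabbitMQ uses for the Queue Length Limit extension, and `prefetch_count` is the value passed to the AMQP basic.qos call. The function name, the dictionary layout, and the default numbers are hypothetical and would need tuning for a real consumer.

```python
def queue_settings(client_id, name, max_length=10_000,
                   prefetch_count=20, durable=True):
    """Describe a queue that follows the subscriber rules.

    The defaults for max_length and prefetch_count are illustrative only.
    """
    if durable:
        # Durable queue: survives broker restarts and disconnects.
        flags = {"durable": True, "auto_delete": False}
    else:
        # Otherwise use an auto-delete queue, removed when the consumer goes away.
        flags = {"durable": False, "auto_delete": True}
    return {
        "queue": f"queue/{client_id}/{name}",      # required naming form
        **flags,
        "arguments": {"x-max-length": max_length},  # bounds the queue length
        "prefetch_count": prefetch_count,           # unacknowledged messages in flight
    }
```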

Appendix A: Everything in Bullet Points

This is a summary of the above.

Pulse:

Publishers:

Exchanges:

  • MUST be named exchange/<clientId>/<name>
  • MUST be topic exchanges
  • MUST be durable

Messages:

  • MUST be UTF-8-encoded JSON
  • MUST carry application/json as Content-Type
  • SHOULD be durable
  • SHOULD be less than 8 KiB (for good performance)
  • MAY be CC'ed to multiple routing keys
  • MUST NOT contain private or sensitive information
  • SHOULD have a routing key where fields have a fixed index from the left

Subscribers:

Queues:

  • MUST be named queue/<clientId>/<name>
  • MAY have a limited length
  • MUST NOT grow unbounded

Let's Use It

There are currently two Pulse clients available. Please note that you can also connect to Pulse from other languages, provided you have an AMQP 0-9-1 library that will let you interact with AMQP exchanges. See https://github.com/rabbitmq/rabbitmq-tutorials#languages for examples.

  • Python Pulse client library: the mozillapulse Python package provides classes for existing publishers, consumers, and messages so you can quickly build Pulse applications.
  • Go (golang) Pulse client library: http://petemoore.github.io/pulse-go/

Contributing

To set up a local system for development, see the HACKING.md file included in the mozillapulse source.

Here is the list of open, unassigned mentored Pulse bugs to see how you can contribute!

Full Query
ID | Summary | Priority | Status
1071947 | Support for notifying mailing lists | P5 | NEW
1079523 | [PulseGuardian] List exchanges with ability to delete | P5 | NEW
1084706 | API for listing queues by user (useful for bulk deletion after tests) | P5 | NEW
1215520 | [PulseGuardian] Handle auth failures gracefully | P5 | NEW
1298929 | Disaster Recovery plan | -- | NEW
1346304 | [PulseGuardian] Randomly generate passwords rather than prompting for them | P5 | NEW
1347088 | [PulseGuardian] "Queue is overgrowing" email needs adjustment for unbounded queues | -- | NEW
1347093 | [PulseGuardian] Add UI for allowing admins to mark queues as unbounded | -- | NEW
1434385 | [PulseGuardian] "My RabbitMQ Accounts" shows unowned accounts as directly belonging to admins | -- | NEW
1509429 | [PulseGuardian] JS errorMessage() function doesn't exist | -- | NEW
1536698 | implement additional alerts for pulse.m.o to check for a large volume of unacked alarms | -- | NEW
1609989 | pulseguardian cannot delete exclusive queues, doesn't log about it | -- | NEW
1663374 | Please disable mtrinkala's Pulse Guardian account | -- | NEW
1875132 | queues with high amounts of unconsumed messages can take down pulse | -- | NEW
1875328 | upgrade to latest rabbitmq version | -- | NEW
1903235 | Investigate isolation of taskcluster exchanges/queues from the rest by vhost | -- | NEW

16 Total; 16 Open (100%); 0 Resolved (0%); 0 Verified (0%);


Once you have your feet wet and are ready to take on a more involved project, here is a list of all current Pulse bugs:

Full Query
ID | Summary | Priority | Status
1071947 | Support for notifying mailing lists | P5 | NEW
1079523 | [PulseGuardian] List exchanges with ability to delete | P5 | NEW
1084706 | API for listing queues by user (useful for bulk deletion after tests) | P5 | NEW
1215520 | [PulseGuardian] Handle auth failures gracefully | P5 | NEW
1298929 | Disaster Recovery plan | -- | NEW
1346304 | [PulseGuardian] Randomly generate passwords rather than prompting for them | P5 | NEW
1347088 | [PulseGuardian] "Queue is overgrowing" email needs adjustment for unbounded queues | -- | NEW
1347093 | [PulseGuardian] Add UI for allowing admins to mark queues as unbounded | -- | NEW
1434385 | [PulseGuardian] "My RabbitMQ Accounts" shows unowned accounts as directly belonging to admins | -- | NEW
1509429 | [PulseGuardian] JS errorMessage() function doesn't exist | -- | NEW
1536698 | implement additional alerts for pulse.m.o to check for a large volume of unacked alarms | -- | NEW
1609989 | pulseguardian cannot delete exclusive queues, doesn't log about it | -- | NEW
1663374 | Please disable mtrinkala's Pulse Guardian account | -- | NEW
1875132 | queues with high amounts of unconsumed messages can take down pulse | -- | NEW
1875328 | upgrade to latest rabbitmq version | -- | NEW
1903235 | Investigate isolation of taskcluster exchanges/queues from the rest by vhost | -- | NEW

16 Total; 16 Open (100%); 0 Resolved (0%); 0 Verified (0%);


For mentored bugs, we use the User Story to provide a link back to this page, as well as any extra information for contributors, such as required knowledge/learnings. The basic text for mentored bugs should be "This is a mentored Pulse bug. For general information on Pulse, see https://wiki.mozilla.org/Auto-tools/Projects/Pulse, which includes a section on Contributing." An example of extra text is "This bug also requires you to have a working mail server."

Consuming Buildbot messages

There are two ways to consume messages published by Buildbot. The most direct way, which requires the most knowledge about Buildbot, is using the BuildConsumer in mozillapulse. This consumer has access to all the native Buildbot messages, and therefore offers the most flexibility.

The disadvantage of using the BuildConsumer is that you need to spend time understanding what messages Buildbot publishes to pulse, and how these can vary, and associate particular messages with what you're trying to accomplish. The format of Buildbot messages is undocumented, and can change without warning, which makes services based on the BuildConsumer potentially fragile.

To address some of these disadvantages, a translator (the pulsetranslator) runs against the BuildConsumer and re-publishes a subset of Buildbot messages to a NormalizedBuild exchange. These messages are available through the NormalizedBuildConsumer in mozillapulse. Their content is simplified and normalized, making them easier to consume without a thorough understanding of how Buildbot publishes messages to Pulse. The re-published messages also protect consumers against some changes to the Pulse stream, although significant enough changes will likely break the pulsetranslator as well as direct users of BuildConsumer.

Another advantage of the NormalizedBuildConsumer is that it only receives messages for a given build or test job after the logs for that job are available. Using the BuildConsumer directly can result in receiving messages for a build before the build artifacts are available, which can cause problems in consumers that don't explicitly guard against this.

Generally speaking, consumers that wish to be notified when specific build or test jobs are completed should use the NormalizedBuildConsumer; consumers that need direct access to the Buildbot pulse stream or are looking for non-specific jobs (such as all jobs belonging to a particular commit) should probably use the BuildConsumer.

Road Map

See the prioritized bug list for all open issues and feature requests.

Security Model

This is summarized in the formal Pulse specification above. What follows is the rationale and some technical implementation notes.

In order to have a reliable, well-behaved system, the following assertions will need to be true.

  • All users, publishers and consumers alike, must have their own accounts (no guest/public users).
  • Only publishers should be able to declare exchanges.
  • Only the publisher user account associated with a particular vhost should be allowed to publish messages to exchanges in the vhost. In other words, exactly one user account should be allowed to publish messages within a given vhost.
  • Only the user that created a particular queue should be allowed to consume from it.

Since exchange and queue permissions go together, we'll need exchange and queue naming conventions mixed with restrictive permissions. Each user will be restricted to a particular exchange and queue naming prefix. Many users will be either consumers or publishers, but for simplicity, each user can do both. Users will have full permissions on "^exchange/<username>/.*$" and "^queue/<username>/.*$". They will also have read permissions to exchange/*. This will prevent users both from writing to other users' exchanges and from consuming from other users' queues. For convenience, if a consumer creates a nondurable queue, mozillapulse can assign a random suffix to the user's standard queue name prefix, i.e. queue/<username>/<random string>, since the user wouldn't be able to create or access a completely random server-assigned name.
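The permission scheme above can be sketched with the three regular expressions RabbitMQ attaches to a user (configure, write, read). The exact patterns PulseGuardian sets are not given here, so the ones below are an assumption derived from the description, and both function names are hypothetical.

```python
import re


def pulse_permissions(username):
    """Build configure/write/read patterns for one user (assumed form)."""
    user = re.escape(username)
    # Full permissions only on the user's own exchanges and queues.
    own = f"^(queue/{user}/.*|exchange/{user}/.*)$"
    # Read access additionally covers every exchange, so any user can bind to it.
    read = f"^(queue/{user}/.*|exchange/.*)$"
    return {"configure": own, "write": own, "read": read}


def allowed(permissions, operation, resource):
    """Check a resource name against the pattern for the given operation."""
    return re.match(permissions[operation], resource) is not None
```

Under these patterns a user can read from another user's exchange but cannot write to it, and cannot read from another user's queue.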

Note that this doesn't prevent a consumer from creating an exchange named as a queue, since the permission model doesn't distinguish between queues and exchanges, and consumers need the ability to create queues. This is not particularly problematic, since no one would have permission to use that exchange.

With this security model, we technically don't really need vhosts, since the names of the queues and exchanges the users can use are so specific. There may still be a benefit in allowing apps to use the same queue name for different exchanges, though, which would be possible if each exchange had its own vhost. The downside is that you cannot specify "all vhosts" when setting a user's permissions, so they would either have to list all vhosts they want to use when creating the user in PulseGuardian, and be able to update that list later, or PulseGuardian or some other app would have to automatically add new permissions to all users when a vhost is created.

Admin Procedures

  • PulseGuardian should delete queues that grow too long. If you need to delete a queue manually, use the Management UI. If possible, ping the queue owner before deleting it.
  • The pulsetranslator service, which normalizes Buildbot messages, currently runs on pulsetranslator.ateam.phx1.mozilla.com and may need to be reset from time to time.
  • The logparser service, used by Orange Factor, runs on orangefactor1.dmz.phx1.mozilla.com.

More reading

  • Slides from a presentation on Pulse.
  • Update on Pulse from 2015/02/16.

LegNeato also wrote several blog posts on Pulse as he was building it. They contain some more background if you're really interested. They are linked below, in chronological order.