Platform/JSDebugv2: Difference between revisions

From MozillaWiki
 
(55 intermediate revisions by 3 users not shown)
Line 1: Line 1:
= <tt>js::dbg2</tt>: JavaScript Debugging Interface, v2 =
<font size="+2">This is a DRAFT.</font>


We'd like to improve the Mozilla platform's debugging facilities, for a number of reasons:
Comments are welcome on <b>dev-tech-js-engine(at)lists.mozilla.org</b>; or, you can send them directly to me at <b>jimb(at)mozilla.com</b>.


* Beyond debuggers, we want to encourage the creation of other sorts of monitoring and manipulation tools for web code; "watching programs run" is a broad charge. Think [http://en.wikipedia.org/wiki/DTrace DTrace] and [http://sourceware.org/systemtap SystemTap].
= Goals =
* Our JavaScript implementation is changing rapidly; we need to do better than falling back to the bytecode interpreter whenever the debugger is enabled.
* We need to be able to monitor and debug worker threads.
* We need to be able to monitor and debug JavaScript running on embedded devices.
* Now that SpiderMonkey is C++, an interface designed in that language can be more expressive and less error-prone than a C interface.


= Long-term goals =
<ul>


* The interface must operate at the source language level, and not expose details of the implementation technique: it should behave the same way regardless of whether the debuggee is being executed by a bytecode interpreter (SpiderMonkey classic), a just-in-time compiler (TraceMonkey), or a whole-method JIT (JägerMonkey). If the implementation compiles to native code, the debugging interface should be independent of the underlying processor architecture. The interface should be sufficiently high-level to allow debugging of (say) JITted code without requiring the implementation to pretend that it is still a bytecode interpreter.
<li>SpiderMonkey's JavaScript debugging API must support close
collaboration with sibling APIs for debugging other web technologies: not
only DOM structure, CSS rules, and networking requests, but also upcoming
tools like worker threads and local storage. These technologies were
designed to interact with each other, and useful debugging tools must
illuminate those interactions.


* The interface must support cross-thread debugging: if the client uses the interfaces provided for this purpose, it should be able to debug JavaScript code running in another thread.
<li>The debugging API must support the creation of robust debugging tools.
Mozilla's current debugging tools are plagued with problems stemming from
the debugger having unintended effects on the debuggee: because both run in
the same process, they share an event loop, chrome, and (to some extent)
JavaScript objects. Our design should strengthen the isolation between the
two, making debugging more reliable.


* The interface must be cross-runtime: it should allow full inspection of JavaScript values, including objects, without creating direct inter-runtime object references or otherwise violating the rules for working with multiple runtimes.
<li>The debugging API must be able to debug web worker threads. Web workers
allow computational tasks to run concurrently with ordinary content
JavaScript. A page can have many worker threads, and workers can spawn
subworker threads. The debugging API should allow debuggers to enumerate
worker threads and monitor their execution and interactions, just as it
does for content JavaScript.


* The interface must be network-transparent: using the appropriate interfaces, a client should be able to inspect the state of a JavaScript program running on another machine.
<li>The debugging API must support remote debugging. Mobile devices often
have restricted user interfaces; it should be possible for the debugger's
user interface to run on a workstation or laptop, while inspecting a
debuggee on the mobile device.


Since the interface is both network-transparent and independent of the implementation's machine architecture, it can be used for debugging JavaScript running on (say) mobile devices, assuming an appropriate connection can be set up.
<li>The debugging API should be prepared to support separate content
processes, if Mozilla implements them.


= Debugging Models =
<li>The debugging API must support our evolving JavaScript implementation.
With its bytecode interpreter, the TraceMonkey tracing just-in-time
compiler, and now JägerMonkey, the method-at-a-time compiler, SpiderMonkey
has three distinct ways of executing JavaScript code. We should be able to
debug programs that have been compiled to machine code, and not force
SpiderMonkey to revert to the slowest implementation technique.
 
</ul>
 
= Design Summary =
 
<i>(Even though it hasn't been implemented yet, this description uses the
present tense, for clarity and ease of transition to summary
documentation.)</i>
 
Debugger user interfaces communicate with the application being debugged via a [[Remote Debugging Protocol|remote debugging protocol]]. The protocol is JSON-based, with clients and servers typically implemented in JavaScript. Each packet from the client is directed at a specific <i>actor</i> on the server, representing a thread, breakpoint, JavaScript object, or the like; each packet from the server comes from a specific actor.
 
Every server provides a root actor that can provide global information about the application ("I am a web browser"), and enumerate the potential debuggees present in the application&mdash;tabs, worker threads, chrome, and so on&mdash;each of which is represented by its own actor.
 
Actors representing individual JavaScript threads use the jsd2IDebuggerService Web IDL interface to inspect and manipulate the debuggee they represent. jsd2IDebuggerService is an alternative to the existing jsdIDebuggerService, implemented in terms of the js::dbg2 C++ interfaces.
 
The server interacts with debuggees running in other threads simply by passing entire JSON packets between the client and actor code running on those threads. Thus, all inter-thread communication is handled via the protocol, permitting thread actors and the interfaces they use to be single-threaded and simplifying their implementation. Communication with subprocesses can be handled the same way.
 
The jsd2IDebuggerService Web IDL interface presents js::dbg2's facilities to JavaScript.
 
The js::dbg2 interface provides functions to:
 
* select code of interest to the developer (everything in a tab; a selected frame within a tab; chrome; and so on),
 
* establish breakpoints, watchpoints, and other sorts of monitoring, and be notified when events of interest occur,
 
* inspect and manipulate stack frames, scope chains, objects, and other such members of the JavaScript menagerie.
 
[[File:Architecture-new.png]]
 
=== Debugging Protocol ===
 
[[Remote Debugging Protocol|Remote debugging]], in which the debugger's user interface can run in a separate process from the debuggee and communicates with the debuggee over a stream connection, addresses many of our goals at once:
<ul>
<li><b>A debugger running in a separate process from the debuggee is easier to make robust.</b> The debugger's user interface and the debuggee need not share an event loop or a chrome DOM tree.
<li><b>Remote debugging eases mobile development.</b> The debugger could run on a desktop computer, and operate on a debuggee on a mobile device.
<li><b>The remote protocol can handle almost all inter-thread communication.</b> Each actor runs on the same thread as the debuggee it represents, so actor/debuggee interactions are intra-thread, and need not worry about synchronization or shared state. Actors and the application's main server interact only by exchanging protocol packets. The debugger
user interface simply needs to be able to talk to more than one agent at a time.<p>(Note that some operations are inherently cross-thread: enumerating
currently running threads; thread creation notifications; the initial
attachment of the debugger to a thread. But once a thread has been attached
to, all subsequent communication can be via the remote protocol.)</p>
</ul>
 
=== The js::dbg2 Interfaces and jsd2IDebuggerService ===
 
The [[js::dbg2|js::dbg2 interfaces]], wrapped for JavaScript as the jsd2IDebuggerService, allow the debugger to select the code to debug, set breakpoints and watchpoints and otherwise express interest in debuggee
behaviors, and inspect the debuggee's state.
 
The js::dbg2 interfaces operate at a higher level than jsd. Whereas jsd works in terms of the original SpiderMonkey bytecode interpreter&mdash;JSScript objects, bytecode offsets, JSStackFrame objects, and so on&mdash;the js::dbg2 interfaces operate at the JavaScript source code and value level, and avoid referring to
specifics of the implementation. This makes it easier to support debugging
of TraceMonkey- and JägerMonkey-compiled code: such code need not present
its state in terms of an older intermediate representation that it doesn't
use.
 
Like jsd, js::dbg2 provides <i>grip</i> objects that refer to values in the
debuggee. The debugger can inspect the object's properties, their attributes, and so on via the grip without accidentally invoking getters or setters, making it easier to write secure and robust debuggers.
 
Also like jsd, js::dbg2 provides grip objects referring to JavaScript stack frames. However, there is no necessary correspondence between js::dbg2 stack frame grips and SpiderMonkey's internal JSStackFrame objects. SpiderMonkey's JITs are free to report the current function activations to js::dbg2 in whatever way is most convenient to them; they are not required to synthesize JSStackFrame objects, which must satisfy complex internal constraints.
 
= Tasks and Estimates =
Note that all estimates include time to write unit tests
providing full code and branch coverage for new code.
<dl>
<dt>JS_CopyScript JSAPI function <i>(8 days)</i>
<dd>Implement, document, and test a function that makes a fresh, deep copy
of a JSScript object, suitable for execution in a thread or global object
different than the original JSScript.
<p>For various reasons, SpiderMonkey is moving towards restricting each
JSScript to be used with a single global object (the next task; see details
there). Before we can impose this requirement, we must make it possible for
embedders to comply with it by providing a function which copies a JSScript
object.</p>
<dt>Associate JSScripts with specific global objects <i>(5 days)</i>
<dd>Add a 'global' field to JSScript, and change JS_ExecuteScript to clone
JSScript objects if necessary to match the global object passed.
<p>This is needed to allow us to enumerate all the scripts in use by a
particular global object, along with several other current SpiderMonkey
goals; see {{bug|563375#c4}}. We can accomplish this by having
JS_ExecuteScript use copies of JSScripts owned by globals other than the
one passed to it.</p>
<dt>Change <span>JSRuntime::scriptFilenameTable</span> to use
<span>js::HashMap</span> <i>(3 days)</i>
<dd>Since subsequent tasks will involve changing the data structures used
to store script source URLs, we should grant ourselves the benefits of
strict typing provided by the new js::HashMap template.
<dt>Create name-to-script mapping <i>(8 days)</i>
<dd>Adapt the existing hash table of script names to also function as a
map from script names to scripts. This entails adding links to
JSScript objects, arranging for entries in scriptFilenameTable to
head chains of scripts, and having garbage collection properly remove
scripts from their names' lists.
<dt>Script URL enumeration <i>(5 days)</i>
<dd>Define a function to enumerate the URLs of all scripts associated
with a given global object.
<p>Debugger user interfaces need to be able to present the user with
a list of the scripts in use by a particular page or origin, so that
the user can browse their source code, set breakpoints, and so on.
These lists should include only those scripts in use by the page or
origin being debugged.</p>
<dt>Draft C++ <span>js::dbg2</span> breakpoint API <i>(3 days)</i>
<dd>Write a C++ API declaring:
<ul>
<li>A class representing a position at which a breakpoint can be set,
expressed in terms of textual positions (URL, line, and column) or in terms
of function names (a global object, a series of containing function names,
and a final function name), or in terms of specific function objects.
<p>The API should permit the "grammar" of breakpoint locations to be
extended in the future (to describe, say, function-valued properties in
object literals).</p>
<p>These should be designed such that, in normal, efficient use, no explicit
storage management (new/delete) is required.</p>
<p>URLs in breakpoint locations should be represented as entries in the
runtime's scriptFilenameTable. This means that, given a breakpoint
location, we have immediate access to the list of JSScripts derived from
the source code to which the location refers.</p>
<p>If possible, the URL/line/column variant of this type should be suitable
for use by the js::dbg2 stack frame type to represent source positions; we
should not need two distinct types that represent locations in source code.</p>
<li>A class representing a breakpoint, js::dbg2::Breakpoint, which can be
inserted in or removed from a debugging sphere. This API will not be
concerned with breakpoint conditions, ignore counts, and such; those
behaviors must be implemented by the client of the js::dbg2 interface.
<li>A stub js::dbg2::Sphere class, sufficient for bootstrapping,
constructed from a given global object.
<li>Debugging sphere member functions for enumerating the currently
inserted breakpoints.
</ul>
<dt>Implement Breakpoint Location Classes <i>(5 days)</i>
<dd>Implement the classes described above describing breakpoint locations.
There may be some tricky work here, as we want to have entries in the
scriptFilenameTable that are live because they are referred to by
breakpoint location objects, not scripts, and have entries cleaned up as
appropriate.
<dt>Implement <span>js::dbg2::Breakpoint</span> <i>(15 days)</i>
<dd>Implement the js::dbg2::Breakpoint class, including insertion and
removal. This entails:
<ul>
<li>turning the various sorts of breakpoint locations into (JSScript, offset) pairs
<li>searching JSScript lists to insert and remove traps
<li>managing multiple breakpoints set at the same bytecode
<li>inserting traps for existing breakpoints into newly loaded code (pending breakpoints)
<li>coping with scripts being garbage collected
<li>interlocking with JägerMonkey to ensure that breakpoints are never set
in functions that have JM frames on the stack
</ul>
<dt>Use function start positions when re-setting breakpoints <i>(8 days)</i>
<dd>When re-loading a previously loaded script, we should use our knowledge
of function boundaries to improve our accuracy as we re-set breakpoints in
the new script. If all changes to a script lie outside a given function's
definition, then treating the breakpoint as if it were set relative to the
function's start, rather than at an absolute line and column, will allow us
to find a better location for it in the new script.
<dt>Expand source notes to carry column information <i>(8 days)</i>
<dd>Extend the source notes attached to JSScripts to carry both line and
column information. This allows debugging of poorly-formatted code such as
that produced by script compressors or obfuscators. The bytecode compiler
already tracks column numbers; they're simply not recorded in the source
notes.
<p>Note that this need not imply any increase in the size of notes for
normally formatted source code: the granularity of the features
distinguished by the source annotations (that is, statements) need not
change. Only if there were multiple statements or functions on the same
line would column numbers be needed to distinguish them.</p>
</dl>
 
= Links =
 
* http://src.chromium.org/viewvc/chrome/trunk/src/views/events/
* http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/automation/?pathrev=80000
* http://code.google.com/p/selenium/wiki/JsonWireProtocol
* https://wiki.mozilla.org/Remote_Debugging_Protocol
* https://wiki.mozilla.org/User:Automatedtester/FennecDriver
* http://code.google.com/p/selenium/wiki/AutomationAtoms
* https://bugzilla.mozilla.org/show_bug.cgi?id=670674
* https://wiki.mozilla.org/Auto-tools/Projects/Marionette

Latest revision as of 23:39, 5 October 2011

This is a DRAFT.

Comments are welcome on dev-tech-js-engine(at)lists.mozilla.org; or, you can send them directly to me at jimb(at)mozilla.com.

Goals

  • SpiderMonkey's JavaScript debugging API must support close collaboration with sibling APIs for debugging other web technologies: not only DOM structure, CSS rules, and networking requests, but also upcoming tools like worker threads and local storage. These technologies were designed to interact with each other, and useful debugging tools must illuminate those interactions.
  • The debugging API must support the creation of robust debugging tools. Mozilla's current debugging tools are plagued with problems stemming from the debugger having unintended effects on the debuggee: because both run in the same process, they share an event loop, chrome, and (to some extent) JavaScript objects. Our design should strengthen the isolation between the two, making debugging more reliable.
  • The debugging API must be able to debug web worker threads. Web workers allow computational tasks to run concurrently with ordinary content JavaScript. A page can have many worker threads, and workers can spawn subworker threads. The debugging API should allow debuggers to enumerate worker threads and monitor their execution and interactions, just as it does for content JavaScript.
  • The debugging API must support remote debugging. Mobile devices often have restricted user interfaces; it should be possible for the debugger's user interface to run on a workstation or laptop, while inspecting a debuggee on the mobile device.
  • The debugging API should be prepared to support separate content processes, if Mozilla implements them.
  • The debugging API must support our evolving JavaScript implementation. With its bytecode interpreter, the TraceMonkey tracing just-in-time compiler, and now JägerMonkey, the method-at-a-time compiler, SpiderMonkey has three distinct ways of executing JavaScript code. We should be able to debug programs that have been compiled to machine code, and not force SpiderMonkey to revert to the slowest implementation technique.

Design Summary

(Even though it hasn't been implemented yet, this description uses the present tense, for clarity and ease of transition to summary documentation.)

Debugger user interfaces communicate with the application being debugged via a remote debugging protocol. The protocol is JSON-based, with clients and servers typically implemented in JavaScript. Each packet from the client is directed at a specific actor on the server, representing a thread, breakpoint, JavaScript object, or the like; each packet from the server comes from a specific actor.

Every server provides a root actor that can provide global information about the application ("I am a web browser"), and enumerate the potential debuggees present in the application—tabs, worker threads, chrome, and so on—each of which is represented by its own actor.

Actors representing individual JavaScript threads use the jsd2IDebuggerService Web IDL interface to inspect and manipulate the debuggee they represent. jsd2IDebuggerService is an alternative to the existing jsdIDebuggerService, implemented in terms of the js::dbg2 C++ interfaces.

The server interacts with debuggees running in other threads simply by passing entire JSON packets between the client and actor code running on those threads. Thus, all inter-thread communication is handled via the protocol, permitting thread actors and the interfaces they use to be single-threaded and simplifying their implementation. Communication with subprocesses can be handled the same way.
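
The actor addressing described above can be illustrated with a small C++ sketch. Everything here is an assumption made for illustration: the `Packet` fields (`to`, `from`, `type`), the `ActorRegistry` class, the `"root"` actor name, and the `noSuchActor` reply are hypothetical, and a plain struct stands in for the JSON packets that the real protocol exchanges.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed JSON packet.
struct Packet {
    std::string to;    // actor the client is addressing (client -> server)
    std::string from;  // actor that produced the packet (server -> client)
    std::string type;  // kind of request or reply
};

// Hypothetical registry mapping actor names to the code that handles
// packets addressed to them.
class ActorRegistry {
  public:
    using Handler = std::function<Packet(const Packet&)>;

    void add(const std::string& name, Handler handler) {
        assert(!name.empty());  // every actor needs a name to be addressed
        actors_[name] = std::move(handler);
    }

    // Deliver a client packet to the actor it names. Every reply is
    // stamped with the name of the actor it comes from.
    Packet dispatch(const Packet& request) const {
        auto it = actors_.find(request.to);
        if (it == actors_.end())
            return Packet{"", request.to, "noSuchActor"};
        Packet reply = it->second(request);
        reply.from = request.to;
        return reply;
    }

  private:
    std::map<std::string, Handler> actors_;
};

// Build a registry holding only a root actor, which answers any request
// with a reply whose type is the request type plus "Reply".
ActorRegistry makeExampleRegistry() {
    ActorRegistry registry;
    registry.add("root", [](const Packet& request) {
        return Packet{"", "", request.type + "Reply"};
    });
    return registry;
}
```

Because a handler only ever sees whole packets, the same dispatch step could in principle sit behind a queue feeding another thread or a pipe to a subprocess without the handler changing.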

The jsd2IDebuggerService Web IDL interface presents js::dbg2's facilities to JavaScript.

The js::dbg2 interface provides functions to:

  • select code of interest to the developer (everything in a tab; a selected frame within a tab; chrome; and so on),
  • establish breakpoints, watchpoints, and other sorts of monitoring, and be notified when events of interest occur,
  • inspect and manipulate stack frames, scope chains, objects, and other such members of the JavaScript menagerie.

Architecture-new.png

Debugging Protocol

Remote debugging, in which the debugger's user interface can run in a separate process from the debuggee and communicates with the debuggee over a stream connection, addresses many of our goals at once:

  • A debugger running in a separate process from the debuggee is easier to make robust. The debugger's user interface and the debuggee need not share an event loop or a chrome DOM tree.
  • Remote debugging eases mobile development. The debugger could run on a desktop computer, and operate on a debuggee on a mobile device.
  • The remote protocol can handle almost all inter-thread communication. Each actor runs on the same thread as the debuggee it represents, so actor/debuggee interactions are intra-thread, and need not worry about synchronization or shared state. Actors and the application's main server interact only by exchanging protocol packets. The debugger user interface simply needs to be able to talk to more than one agent at a time.

    (Note that some operations are inherently cross-thread: enumerating currently running threads; thread creation notifications; the initial attachment of the debugger to a thread. But once a thread has been attached to, all subsequent communication can be via the remote protocol.)

The js::dbg2 Interfaces and jsd2IDebuggerService

The js::dbg2 interfaces, wrapped for JavaScript as the jsd2IDebuggerService, allow the debugger to select the code to debug, set breakpoints and watchpoints and otherwise express interest in debuggee behaviors, and inspect the debuggee's state.

The js::dbg2 interfaces operate at a higher level than jsd. Whereas jsd works in terms of the original SpiderMonkey bytecode interpreter—JSScript objects, bytecode offsets, JSStackFrame objects, and so on—the js::dbg2 interfaces operate at the JavaScript source code and value level, and avoid referring to specifics of the implementation. This makes it easier to support debugging of TraceMonkey- and JägerMonkey-compiled code: such code need not present its state in terms of an older intermediate representation that it doesn't use.

Like jsd, js::dbg2 provides grip objects that refer to values in the debuggee. The debugger can inspect the object's properties, their attributes, and so on via the grip without accidentally invoking getters or setters, making it easier to write secure and robust debuggers.
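
As a rough illustration of why grips make inspection safe, the following C++ sketch describes a property without ever running debuggee code. The types `Property`, `DebuggeeObject`, `PropertyDescription`, and `Grip` are hypothetical models invented for this example; they are not the js::dbg2 types, which this draft does not yet define.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical model of a debuggee property: either a stored value or
// an accessor backed by debuggee code.
struct Property {
    bool isAccessor = false;
    std::string value;                    // used for data properties
    std::function<std::string()> getter;  // debuggee code, used for accessors
};

struct DebuggeeObject {
    std::map<std::string, Property> properties;
};

// What the grip reports back to the debugger.
struct PropertyDescription {
    bool found = false;
    bool isAccessor = false;
    std::string value;  // left empty for accessors: the getter is never run
};

// Hypothetical grip: a handle through which the debugger inspects an
// object without triggering any of its getters.
class Grip {
  public:
    explicit Grip(const DebuggeeObject* referent) : referent_(referent) {
        assert(referent_ != nullptr);  // a grip always refers to something
    }

    PropertyDescription describe(const std::string& name) const {
        PropertyDescription result;
        auto it = referent_->properties.find(name);
        if (it == referent_->properties.end())
            return result;
        result.found = true;
        result.isAccessor = it->second.isAccessor;
        if (!result.isAccessor)
            result.value = it->second.value;  // plain data: safe to copy
        return result;
    }

  private:
    const DebuggeeObject* referent_;
};
```

A debugger that wants an accessor's value would have to ask for the getter to be called explicitly, so side effects in the debuggee happen only on request.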

Also like jsd, js::dbg2 provides grip objects referring to JavaScript stack frames. However, there is no necessary correspondence between js::dbg2 stack frame grips and SpiderMonkey's internal JSStackFrame objects. SpiderMonkey's JITs are free to report the current function activations to js::dbg2 in whatever way is most convenient to them; they are not required to synthesize JSStackFrame objects, which must satisfy complex internal constraints.

Tasks and Estimates

Note that all estimates include time to write unit tests providing full code and branch coverage for new code.

JS_CopyScript JSAPI function (8 days)
Implement, document, and test a function that makes a fresh, deep copy of a JSScript object, suitable for execution in a thread or global object different than the original JSScript.

For various reasons, SpiderMonkey is moving towards restricting each JSScript to be used with a single global object (the next task; see details there). Before we can impose this requirement, we must make it possible for embedders to comply with it by providing a function which copies a JSScript object.

Associate JSScripts with specific global objects (5 days)
Add a 'global' field to JSScript, and change JS_ExecuteScript to clone JSScript objects if necessary to match the global object passed.

This is needed to allow us to enumerate all the scripts in use by a particular global object, along with several other current SpiderMonkey goals; see bug 563375#c4. We can accomplish this by having JS_ExecuteScript use copies of JSScripts owned by globals other than the one passed to it.

Change JSRuntime::scriptFilenameTable to use js::HashMap (3 days)
Since subsequent tasks will involve changing the data structures used to store script source URLs, we should grant ourselves the benefits of strict typing provided by the new js::HashMap template.
Create name-to-script mapping (8 days)
Adapt the existing hash table of script names to also function as a map from script names to scripts. This entails adding links to JSScript objects, arranging for entries in scriptFilenameTable to head chains of scripts, and having garbage collection properly remove scripts from their names' lists.
Script URL enumeration (5 days)
Define a function to enumerate the URLs of all scripts associated with a given global object.

Debugger user interfaces need to be able to present the user with a list of the scripts in use by a particular page or origin, so that the user can browse their source code, set breakpoints, and so on. These lists should include only those scripts in use by the page or origin being debugged.
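
The name-to-script mapping and the per-global URL enumeration could look roughly like the C++ sketch below. It is only a model under stated assumptions: `Global`, `Script`, and `ScriptTable` are hypothetical stand-ins for the global object, JSScript, and scriptFilenameTable, and standard containers replace js::HashMap. Unlike the design discussed in the tasks, this sketch drops a URL's entry as soon as its last script is removed.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for a global object and a JSScript.
struct Global {
    std::string name;
};

struct Script {
    std::string url;
    const Global* global;  // the single global this script belongs to
};

// Hypothetical table in which each URL heads a chain of scripts.
class ScriptTable {
  public:
    void add(const Script* script) {
        assert(script != nullptr && script->global != nullptr);
        byUrl_[script->url].push_back(script);
    }

    // Unlink a script, as garbage collection would when discarding it.
    void remove(const Script* script) {
        auto it = byUrl_.find(script->url);
        if (it == byUrl_.end())
            return;
        auto& chain = it->second;
        chain.erase(std::remove(chain.begin(), chain.end(), script), chain.end());
        if (chain.empty())
            byUrl_.erase(it);
    }

    // All scripts compiled from the given URL, in any global.
    std::vector<const Script*> scriptsFor(const std::string& url) const {
        auto it = byUrl_.find(url);
        return it == byUrl_.end() ? std::vector<const Script*>() : it->second;
    }

    // URLs of the scripts associated with one particular global.
    std::set<std::string> urlsFor(const Global& global) const {
        std::set<std::string> urls;
        for (const auto& entry : byUrl_)
            for (const Script* script : entry.second)
                if (script->global == &global)
                    urls.insert(entry.first);
        return urls;
    }

  private:
    std::unordered_map<std::string, std::vector<const Script*>> byUrl_;
};
```

Filtering on the script's global is what keeps the list limited to the page being debugged, which is why the earlier task of associating each JSScript with one global comes first.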

Draft C++ js::dbg2 breakpoint API (3 days)
Write a C++ API declaring:
  • A class representing a position at which a breakpoint can be set, expressed in terms of textual positions (URL, line, and column) or in terms of function names (a global object, a series of containing function names, and a final function name), or in terms of specific function objects.

    The API should permit the "grammar" of breakpoint locations to be extended in the future (to describe, say, function-valued properties in object literals).

    These should be designed such that, in normal, efficient use, no explicit storage management (new/delete) is required.

    URLs in breakpoint locations should be represented as entries in the runtime's scriptFilenameTable. This means that, given a breakpoint location, we have immediate access to the list of JSScripts derived from the source code to which the location refers.

    If possible, the URL/line/column variant of this type should be suitable for use by the js::dbg2 stack frame type to represent source positions; we should not need two distinct types that represent locations in source code.

  • A class representing a breakpoint, js::dbg2::Breakpoint, which can be inserted in or removed from a debugging sphere. This API will not be concerned with breakpoint conditions, ignore counts, and such; those behaviors must be implemented by the client of the js::dbg2 interface.
  • A stub js::dbg2::Sphere class, sufficient for bootstrapping, constructed from a given global object.
  • Debugging sphere member functions for enumerating the currently inserted breakpoints.
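
One possible shape for these declarations is sketched below in C++. This is not the drafted API: the class and member names (`Location`, `atPosition`, `atFunction`, `Breakpoint`, `Sphere`, `Global`) are assumptions chosen for illustration, URLs are plain strings rather than scriptFilenameTable entries, and the function-object variant of a location is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a global object.
struct Global {
    std::string name;
};

// A breakpoint location as a value type, so clients never need new/delete.
// Adding a new enumerator and factory would extend the location "grammar".
class Location {
  public:
    enum class Kind { SourcePosition, FunctionName };

    static Location atPosition(std::string url, unsigned line, unsigned column) {
        Location loc(Kind::SourcePosition);
        loc.url_ = std::move(url);
        loc.line_ = line;
        loc.column_ = column;
        return loc;
    }

    // Containing function names, outermost first, ending with the target.
    static Location atFunction(std::vector<std::string> path) {
        assert(!path.empty());
        Location loc(Kind::FunctionName);
        loc.path_ = std::move(path);
        return loc;
    }

    Kind kind() const { return kind_; }
    const std::string& url() const { return url_; }
    unsigned line() const { return line_; }
    unsigned column() const { return column_; }
    const std::vector<std::string>& path() const { return path_; }

  private:
    explicit Location(Kind kind) : kind_(kind) {}
    Kind kind_;
    std::string url_;
    unsigned line_ = 0;
    unsigned column_ = 0;
    std::vector<std::string> path_;
};

// No conditions or ignore counts: those are left to the client.
class Breakpoint {
  public:
    explicit Breakpoint(Location location) : location_(std::move(location)) {}
    const Location& location() const { return location_; }

  private:
    Location location_;
};

// Stub sphere built from a global; it only tracks inserted breakpoints.
class Sphere {
  public:
    explicit Sphere(const Global& global) : global_(&global) {}
    const Global& global() const { return *global_; }

    void insert(Breakpoint& bp) {
        if (std::find(inserted_.begin(), inserted_.end(), &bp) == inserted_.end())
            inserted_.push_back(&bp);
    }
    void remove(Breakpoint& bp) {
        inserted_.erase(std::remove(inserted_.begin(), inserted_.end(), &bp),
                        inserted_.end());
    }
    // Enumerate the currently inserted breakpoints.
    const std::vector<Breakpoint*>& breakpoints() const { return inserted_; }

  private:
    const Global* global_;
    std::vector<Breakpoint*> inserted_;
};
```

Here the sphere borrows breakpoints rather than owning them, so a breakpoint has to outlive its insertion; a real design would need to decide that ownership question explicitly.
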
Implement Breakpoint Location Classes (5 days)
Implement the classes described above describing breakpoint locations. There may be some tricky work here, as we want to have entries in the scriptFilenameTable that are live because they are referred to by breakpoint location objects, not scripts, and have entries cleaned up as appropriate.
Implement js::dbg2::Breakpoint (15 days)
Implement the js::dbg2::Breakpoint class, including insertion and removal. This entails:
  • turning the various sorts of breakpoint locations into (JSScript, offset) pairs
  • searching JSScript lists to insert and remove traps
  • managing multiple breakpoints set at the same bytecode
  • inserting traps for existing breakpoints into newly loaded code (pending breakpoints)
  • coping with scripts being garbage collected
  • interlocking with JägerMonkey to ensure that breakpoints are never set in functions that have JM frames on the stack
Use function start positions when re-setting breakpoints (8 days)
When re-loading a previously loaded script, we should use our knowledge of function boundaries to improve our accuracy as we re-set breakpoints in the new script. If all changes to a script lie outside a given function's definition, then treating the breakpoint as if it were set relative to the function's start, rather than at an absolute line and column, will allow us to find a better location for it in the new script.
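
The arithmetic behind function-relative re-setting is simple enough to sketch in C++. The names `FunctionStarts`, `RelativeBreakpoint`, `makeRelative`, and `resolve` are hypothetical, lines are assumed to be 1-based, and columns are ignored. The result is only meaningful under the condition stated above, namely that the function's own body is unchanged.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record of where each function's definition starts:
// function name -> 1-based line number.
using FunctionStarts = std::map<std::string, unsigned>;

// A breakpoint remembered relative to its enclosing function.
struct RelativeBreakpoint {
    std::string function;  // name of the enclosing function
    unsigned lineOffset;   // lines below the function's first line
};

// Convert an absolute line in the old script into a function-relative one.
// Throws std::out_of_range if the function is unknown in the old script.
RelativeBreakpoint makeRelative(const FunctionStarts& oldStarts,
                                const std::string& function,
                                unsigned absoluteLine) {
    unsigned start = oldStarts.at(function);
    assert(absoluteLine >= start);  // the breakpoint lies inside the function
    return RelativeBreakpoint{function, absoluteLine - start};
}

// Find the line for the breakpoint in the reloaded script, or 0 if the
// function no longer exists there.
unsigned resolve(const FunctionStarts& newStarts, const RelativeBreakpoint& bp) {
    auto it = newStarts.find(bp.function);
    if (it == newStarts.end())
        return 0;
    return it->second + bp.lineOffset;
}
```

For example, a breakpoint on line 23 of a function starting at line 20 is stored as offset 3; if a reload moves the function's start to line 25, the breakpoint is re-set on line 28 rather than on the stale line 23.
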
Expand source notes to carry column information (8 days)
Extend the source notes attached to JSScripts to carry both line and column information. This allows debugging of poorly-formatted code such as that produced by script compressors or obfuscators. The bytecode compiler already tracks column numbers; they're simply not recorded in the source notes.

Note that this need not imply any increase in the size of notes for normally formatted source code: the granularity of the features distinguished by the source annotations (that is, statements) need not change. Only if there were multiple statements or functions on the same line would column numbers be needed to distinguish them.

Links