Platform/JSDebugv2: Difference between revisions

 
(39 intermediate revisions by 3 users not shown)
Line 3: Line 3:
Comments are welcome on <b>dev-tech-js-engine(at)lists.mozilla.org</b>; or, you can send them directly to me at <b>jimb(at)mozilla.com</b>.
Comments are welcome on <b>dev-tech-js-engine(at)lists.mozilla.org</b>; or, you can send them directly to me at <b>jimb(at)mozilla.com</b>.


= <tt>js::dbg2</tt>: JavaScript Debugging Interface, v2 =
= Goals =


We'd like to improve the Mozilla platform's debugging facilities, for a number of reasons:
<ul>


* Beyond debuggers, we want to encourage the creation of other sorts of monitoring and manipulation tools for web code; "watching programs run" is a broad charge. Think [http://en.wikipedia.org/wiki/DTrace DTrace] and [http://sourceware.org/systemtap SystemTap].
<li>SpiderMonkey's JavaScript debugging API must support close
* Our JavaScript implementation is changing rapidly; we need to do better than falling back to the bytecode interpreter whenever the debugger is enabled.
collaboration with sibling APIs for debugging other web technologies: not
* We need to be able to monitor and debug worker threads.
only DOM structure, CSS rules, and networking requests, but also upcoming
* We need to be able to monitor and debug JavaScript running on embedded devices.
tools like worker threads and local storage. These technologies were
* Now that SpiderMonkey is C++, an interface designed in that language can be more expressive and less error-prone than a C interface.
designed to interact with each other, and useful debugging tools must
illuminate those interactions.


= General goals =
<li>The debugging API must support the creation of robust debugging tools.
Mozilla's current debugging tools are plagued with problems stemming from
the debugger having unintended effects on the debuggee: because both run in
the same process, they share an event loop, chrome, and (to some extent)
JavaScript objects. Our design should strengthen the isolation between the
two, making debugging more reliable.


* The interface must operate at the source language level, and not expose details of the implementation technique: it should behave the same way regardless of whether the debuggee is being executed by a bytecode interpreter (SpiderMonkey classic), a just-in-time compiler (TraceMonkey), or a whole-method JIT (Jägermonkey). If the implementation compiles to native code, the debugging interface should be independent of the underlying processor architecture. The interface should be sufficiently high-level to allow debugging of (say) JITted code without requiring the implementation to pretend that it is still a bytecode interpreter.
<li>The debugging API must be able to debug web worker threads. Web workers
allow computational tasks to run concurrently with ordinary content
JavaScript. A page can have many worker threads, and workers can spawn
subworker threads. The debugging API should allow debuggers to enumerate
worker threads and monitor their execution and interactions, just as it
does for content JavaScript.


* The interface must support cross-thread debugging: if the client uses the interfaces provided for this purpose, it should be able to debug JavaScript code running in another thread.
<li>The debugging API must support remote debugging. Mobile devices often
have restricted user interfaces; it should be possible for the debugger's
user interface to run on a workstation or laptop, while inspecting a
debuggee on the mobile device.


* The interface must be cross-runtime: it should allow full inspection of JavaScript values, including objects, without creating direct inter-runtime object references or otherwise violating the rules for working with multiple runtimes.
<li>The debugging API should be prepared to support separate content
processes, if Mozilla implements them.


* The interface must be network-transparent: using the appropriate interfaces, a client should be able to inspect the state of a JavaScript program running on another machine.
<li>The debugging API must support our evolving JavaScript implementation.
With its bytecode interpreter, the TraceMonkey tracing just-in-time
compiler, and now Jägermonkey, the method-at-a-time compiler, SpiderMonkey
has three distinct ways of executing JavaScript code. We should be able to
debug programs that have been compiled to machine code, and not force
SpiderMonkey to revert to the slowest implementation technique.


Since the interface is both network-transparent and independent of the implementation's machine architecture, it can be used for debugging JavaScript running on (say) mobile devices, assuming an appropriate connection can be set up.
</ul>


= Event Handlers and Spheres =
= Design Summary =


A JavaScript debugger connects to a debuggee by expressing to js::dbg2 its
<i>(Even though it hasn't been implemented yet, this description uses the
interest in <em>events</em> occurring in particular <em>spheres</em>.
present tense, for clarity and ease of transition to summary
Events are things like breakpoint or watchpoint hits, completions of
documentation.)</i>
single-step operations, exceptions being thrown, or 'eval' being called.
Spheres are things like particular global objects, origins (in the HTML5
sense), XUL chrome, worker threads, or other things that identify
subdivisions of the system that one might want to select to debug.


(In jsd, the <tt>jsdIFilter</tt> interface attempts to help the debugger
Debugger user interfaces communicate with the application being debugged via a [[Remote Debugging Protocol|remote debugging protocol]]. The protocol is JSON-based, with clients and servers typically implemented in JavaScript. Each packet from the client is directed at a specific <i>actor</i> on the server, representing a thread, breakpoint, JavaScript object, or the like; each packet from the server comes from a specific actor.
distinguish the events it cares about from those it doesn't, but it bases
its decisions on script URL patterns; the debugger user wants to debug a
particular web site, which could use code from any number of sources.
js::dbg2's filtering by origin (web site) and global (web page) provide a
better basis for implementing the behavior the debugger's users actually
want.)


It may be helpful to provide events reporting the creation and destruction
Every server provides a root actor that can provide global information about the application ("I am a web browser"), and enumerate the potential debuggees present in the application&mdash;tabs, worker threads, chrome, and so on&mdash;each of which is represented by its own actor.
of spheres (creating new tabs; visiting new web sites); this is something I
don't understand well yet.


= Frames and Scopes =
Actors representing individual JavaScript threads use the jsd2IDebuggerService Web IDL interface to inspect and manipulate the debuggee they represent. jsd2IDebuggerService is an alternative to the existing jsdIDebuggerService, implemented in terms of the js::dbg2 C++ interfaces.


Like jsd, js::dbg2 represents the control stack as a list of frames.
The server interacts with debuggees running in other threads simply by passing entire JSON packets between the client and actor code running on those threads. Thus, all inter-thread communication is handled via the protocol, permitting thread actors and the interfaces they use to be single-threaded and simplifying their implementation. Communication with subprocesses can be handled the same way.


* A frame representing a call to a JavaScript function has a source location (a URL, line pair), and a scope (a set of identifier bindings). Given a scope, one can look up an identifier's binding, enumerate the bindings present, find the enclosing scope, evaluate JavaScript expressions in that scope, and so on.
The jsd2IDebuggerService Web IDL interface presents js::dbg2's facilities to JavaScript. jsd2IDebuggerService is an alternative to the existing jsdIDebuggerService.


* A frame representing a call to a host function (implemented in C++, say) will have some appropriate identification.
The js::dbg2 interface provides functions to:


In this area, js::dbg2's behavior will not differ much from jsd's, except
* select code of interest to the developer (everything in a tab; a selected frame within a tab; chrome; and so on),
that it will identify the current point of execution using script URLs and
line numbers, not script proxy objects and bytecode offsets; see
"High-level Source Positions", below.


= Value Proxies =
* establish breakpoints, watchpoints, and other sorts of monitoring, and be notified when events of interest occur,


Like jsd, js::dbg2 does not permit the debugger to refer to values in the
* inspect and manipulate stack frames, scope chains, objects, and other such members of the JavaScript menagerie.
debuggee directly. Instead, it provides proxy objects (analogous to jsd's
<tt>jsdIValue</tt>) which facilitate inspection, but protect the debugger
from inadvertently invoking getters, setters, and the like. js::dbg2 will
follow jsd's design here, except that the facilities for examining object
properties will more closely resemble ES5's inspection facilities
(Object.getOwnPropertyDescriptor, etc.)


js::dbg2 aims to support debugging interfaces that correlate values in JS programs with DOM trees, CSS rules, and content rendered on the screen. Thus, js::dbg2 proxy objects representing DOM nodes and other interesting host objects should provide extended interfaces to support these sorts of XUL-specific exploration.
[[File:Architecture-new.png]]


= High-level Source Positions =
=== Debugging Protocol ===


The js::dbg2 interface will allow the debugger to specify breakpoint
[[Remote Debugging Protocol|Remote debugging]], in which the debugger's user interface can run in a separate process from the debuggee and communicates with the debuggee over a stream connection, addresses many of our goals at once:
positions in terms of script URLs and line numbers, or function names
<ul>
qualified by enclosing scopes, not JSScript objects and bytecode offsets.
<li><b>A debugger running in a separate process from the debuggee is easier to make robust.</b> The debugger's user interface and the debuggee need not share an event loop or a chrome DOM tree.
It will be the responsibility of js::dbg2 to manage the mapping between
<li><b>Remote debugging eases mobile development.</b> The debugger could run on a desktop computer, and operate on a debuggee on a mobile device.
source locations and trapped bytecodes, and insert and remove trap
<li><b>The remote protocol can handle almost all inter-thread communication.</b> Each actor runs on the same thread as the debuggee it represents, so actor/debuggee interactions are intra-thread, and need not worry about synchronization or shared state. Actors and the application's main server interact only by exchanging protocol packets. The debugger
bytecodes as JSScript objects are created and destroyed.
user interface simply needs to be able to talk to more than one agent at a time.<p>(Note that some operations are inherently cross-thread: enumerating
 
currently running threads; thread creation notifications; the initial
Code passed to 'eval' or the 'Function' constructors, or established via
attachment of the debugger to a thread. But once a thread has been attached
DOM manipulation, will be assigned synthetic names; see also "Script
to, all subsequent communication can be via the remote protocol.)</p>
Labeling", below.
</ul>
 
The jsd interface for setting breakpoints requires the debugger to identify
the <tt>jsdIScript</tt> (a wrapper for JSAPI <tt>JSScript</tt> objects) and
bytecode offset within that script at which the breakpoint should be
inserted. The debugger is responsible for tracking the creation and
destruction of scripts, mapping source locations to (script, bytecode
offset) pairs, and inserting and removing trap points. There are a number
of problems with this approach:


* The script+offset interface is oriented towards one particular implementation of JavaScript, out of the three we now have. The new interface for breakpoint specification is implementation-neutral, as it expresses locations strictly in terms of JavaScript source code.
=== The js::dbg2 Interfaces and jsd2IDebuggerService ===


* Tracking the creation and destruction of scripts is a source of considerable complexity in the debugger; being able to take advantage of SpiderMonkey's own data structures for managing JSScripts, which may need some revisions, should be a net simplification.
The [[js::dbg2|js::dbg2 interfaces]], wrapped for JavaScript as the jsd2IDebuggerService, allow the debugger to select the code to debug, set breakpoints and watchpoints and otherwise express interest in debuggee
behaviors, and inspect the debuggee's state.


* If a bug in the debugger causes it to supply an incorrect location for a breakpoint trap bytecode, the debugger can cause the interpreter to crash. (At the moment, the system does not even check that the trap locations provided by the debugger are valid offsets into the script's bytecode, but that could easily be fixed.)
The js::dbg2 interfaces operate at a higher level than jsd. Whereas jsd works in terms of the original SpiderMonkey bytecode interpreter&mdash;JSScript objects, bytecode offsets, JSStackFrame objects, and so on&mdash;the js::dbg2 interfaces operate at the JavaScript source code and value level, and avoid referring to
specifics of the implementation. This makes it easier to support debugging
of TraceMonkey- and JägerMonkey-compiled code: such code need not present
its state in terms of an older intermediate representation that it doesn't
use.


* For remote debugging, it would be very inefficient to report the creation and destruction of all JSScripts across the communication channel to the debugger.
Like jsd, js::dbg2 provides <i>grip</i> objects that refer to values in the
debuggee. The debugger can inspect the object's properties, their attributes, and so on via the grip without accidentally invoking getters or setters, making it easier to write secure and robust debuggers.


=== Estimates ===
Also like jsd, js::dbg2 provides grip objects referring to JavaScript stack frames. However, there is no necessary correspondence between js::dbg2 stack frame grips and SpiderMonkey's internal JSStackFrame objects. SpiderMonkey's JITs are free to report the current function activations to js::dbg2 in whatever way is most convenient to them; they are not required to synthesize JSStackFrame objects, which must satisfy complex internal constraints.


= Tasks and Estimates =
Note that all estimates include time to write unit tests
Note that all estimates include time to write unit tests
providing full code and branch coverage for new code.
providing full code and branch coverage for new code.
<dl>
<dl>
<dt>JS_CopyScript JSAPI function <i>(8 days)</i>
<dt>JS_CopyScript JSAPI function <i>(8 days)</i>
Line 111: Line 108:
of a JSScript object, suitable for execution in a thread or global object
of a JSScript object, suitable for execution in a thread or global object
different than the original JSScript.
different than the original JSScript.
 
<p>For various reasons, SpiderMonkey is moving towards restricting each
For various reasons, SpiderMonkey is moving towards restricting each
JSScript to be used with a single global object (the next task; see details
JSScript to be used with a single global object (the next task; see details
there). Before we can impose this requirement, we must make it possible for
there). Before we can impose this requirement, we must make it possible for
embedders to comply with it by providing a function which copies a JSScript
embedders to comply with it by providing a function which copies a JSScript
object.
object.</p>
<dt>Associate JSScripts with specific global objects <i>(5 days)</i>
<dt>Associate JSScripts with specific global objects <i>(5 days)</i>
<dd>Add a 'global' field to JSScript, and change JS_ExecuteScript to clone
<dd>Add a 'global' field to JSScript, and change JS_ExecuteScript to clone
JSScript objects if necessary to match the global object passed.
JSScript objects if necessary to match the global object passed.
 
<p>This is needed to allow us to enumerate all the scripts in use by a
This is needed to allow us to enumerate all the scripts in use by a
particular global object, along with several other current SpiderMonkey
particular global object, along with several other current SpiderMonkey
goals; see {{bug|563375#c4}}. We can accomplish this by having
goals; see {{bug|563375#c4}}. We can accomplish this by having
JS_ExecuteScript use copies of JSScripts owned by globals other than the
JS_ExecuteScript use copies of JSScripts owned by globals other than the
one passed to it.
one passed to it.</p>
<dt>Change <span>JSRuntime::scriptFilenameTable</span> to use <span>js::HashMap</span> <i>(3 days)</i>
<dt>Change <span>JSRuntime::scriptFilenameTable</span> to use
<span>js::HashMap</span> <i>(3 days)</i>
<dd>Since subsequent tasks will involve changing the data structures used
<dd>Since subsequent tasks will involve changing the data structures used
to store script source URLs, we should grant ourselves the benefits of
to store script source URLs, we should grant ourselves the benefits of
Line 139: Line 135:
<dd>Define a function to enumerate the URLs of all scripts associated
<dd>Define a function to enumerate the URLs of all scripts associated
with a given global object.
with a given global object.
 
<p>Debugger user interfaces need to be able to present the user with
Debugger user interfaces need to be able to present the user with
a list of the scripts in use by a particular page or origin, so that
a list of the scripts in use by a particular page or origin, so that
the user can browse their source code, set breakpoints, and so on.
the user can browse their source code, set breakpoints, and so on.
These lists should include only those scripts in use by the page or
These lists should include only those scripts in use by the page or
origin being debugged.
origin being debugged.</p>
<dt>Draft C++ <span>js::dbg2</span> breakpoint API <i>(3 days)</i>
<dt>Draft C++ <span>js::dbg2</span> breakpoint API <i>(3 days)</i>
<dd>Write a C++ API declaring:
<dd>Write a C++ API declaring:
Line 152: Line 147:
of function names (a global object, a series of containing function names,
of function names (a global object, a series of containing function names,
and a final function name), or in terms of specific function objects.
and a final function name), or in terms of specific function objects.
 
<p>The API should permit the "grammar" of breakpoint locations to be
The API should permit the "grammar" of breakpoint locations to be
extended in the future (to describe, say, function-valued properties in
extended in the future (to describe, say, function-valued properties in
object literals).
object literals).</p>
 
<p>These should be designed such that, in normal, efficient use, no explicit
These should be designed such that, in normal, efficient use, no explicit
storage management (new/delete) is required.</p>
storage management (new/delete) is required.
<p>URLs in breakpoint locations should be represented as entries in the
 
URLs in breakpoint locations should be represented as entries in the
runtime's scriptFilenameTable. This means that, given a breakpoint
runtime's scriptFilenameTable. This means that, given a breakpoint
location, we have immediate access to the list of JSScripts derived from
location, we have immediate access to the list of JSScripts derived from
the source code to which the location refers.
the source code to which the location refers.</p>
 
<p>If possible, the URL/line/column variant of this type should be suitable
If possible, the URL/line/column variant of this type should be suitable
for use by the js::dbg2 stack frame type to represent source positions; we
for use by the js::dbg2 stack frame type to represent source positions; we
should not need two distinct types that represent locations in source code.
should not need two distinct types that represent locations in source code.</p>
<li>A class representing a breakpoint, js::dbg2::Breakpoint, which can be
<li>A class representing a breakpoint, js::dbg2::Breakpoint, which can be
inserted in or removed from a debugging sphere. This API will not be
inserted in or removed from a debugging sphere. This API will not be
Line 195: Line 186:
in functions that have JM frames on the stack
in functions that have JM frames on the stack
</ul>
</ul>
<dt>Use function start positions when re-setting breakpoints <i>(8 days)</i>
<dd>When re-loading a previously loaded script, we should use our knowledge
of function boundaries to improve our accuracy as we re-set breakpoints in
the new script. If all changes to a script lie outside a given function's
definition, then treating the breakpoint as if it were set relative to the
function's start, rather than at an absolute line and column, will allow us
to find a better location for it in the new script.
<dt>Expand source notes to carry column information <i>(8 days)</i>
<dd>Extend the source notes attached to JSScripts to carry both line and
column information. This allows debugging of poorly-formatted code such as
that produced by script compressors or obfuscators. The bytecode compiler
already tracks column numbers; they're simply not recorded in the source
notes.
<p>Note that this need not imply any increase in the size of notes for
normally formatted source code: the granularity of the features
distinguished by the source annotations (that is, statements) need not
change. Only if there were multiple statements or functions on the same
line would column numbers be needed to distinguish them.</p>
</dl>
</dl>
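The task of re-setting breakpoints relative to function start positions can be illustrated with a small sketch. Everything here is hypothetical: <tt>FunctionExtent</tt>, <tt>enclosingFunction</tt>, and <tt>resetBreakpointLine</tt> are names invented for illustration and are not part of js::dbg2. For brevity the sketch tracks lines only, not columns, and matches functions by name.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one function's extent within a script.
struct FunctionExtent {
    std::string name;  // function name (assumed unique within the script)
    int startLine;     // first line of the definition
    int endLine;       // last line of the definition, inclusive
};

// Return the innermost function whose extent contains `line`, or nullptr.
const FunctionExtent* enclosingFunction(const std::vector<FunctionExtent>& fns,
                                        int line) {
    const FunctionExtent* best = nullptr;
    for (const auto& f : fns) {
        if (line >= f.startLine && line <= f.endLine) {
            // A nested function starts later than the function containing it.
            if (!best || f.startLine >= best->startLine) best = &f;
        }
    }
    return best;
}

// Re-set a breakpoint after a reload. If the breakpoint was inside a function
// that still exists in the new script, keep its offset from that function's
// start; otherwise fall back to the original absolute line.
int resetBreakpointLine(const std::vector<FunctionExtent>& oldFns,
                        const std::vector<FunctionExtent>& newFns,
                        int oldLine) {
    const FunctionExtent* oldFn = enclosingFunction(oldFns, oldLine);
    if (!oldFn) return oldLine;
    for (const auto& f : newFns) {
        if (f.name == oldFn->name) {
            int candidate = f.startLine + (oldLine - oldFn->startLine);
            if (candidate <= f.endLine) return candidate;
        }
    }
    return oldLine;
}
```

If a function named <tt>init</tt> moves from lines 10-20 to lines 15-25, a breakpoint originally on line 12 is re-set on line 17, while a breakpoint outside any function keeps its absolute line.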


= Remote Debugging =
= Links =  
 
js::dbg2 will provide facilities for connecting to a remote XUL process,
either on the same machine or via a network or hardware connection, and
enumerating the spheres present in that process, providing human-readable
descriptions. If the js::dbg2 client expresses an interest in events
occurring in such spheres, a remote debugging session is established.
 
This communication will be implemented using something resembling V8's [http://code.google.com/p/v8/wiki/DebuggerProtocol Debugger Protocol] and Chrome's [http://code.google.com/p/chromedevtools/wiki/ChromeDevToolsProtocol ChromeDevTools Protocol].
 
Remote debugging support will make a number of things possible:
 
* The debugger UI can move into its own process (say, as a XULrunner application), providing better debugger/debuggee segregation.
 
* A debugger running in a separate process will be able to provide better chrome debugging, as the debugger won't be trying to operate on its own chrome.
 
* We can use it to debug worker threads, simply by using an intra-process communications channel (and perhaps using the fact that we share the debuggee's architecture and ABI to use a simpler protocol).
 
= Compilation Hooks And Script Interrogation =
 
Instead of jsd's <tt>onScriptCreated</tt> and <tt>onScriptDestroyed</tt>
hooks, js::dbg2 will provide events for the start and end of each
compilation, not individual scripts created by those compilations.
 
The 'compilation start' event will make available the full text to be
compiled (if available; compilation can consume tokens from a &lt;stdio.h&gt;
FILE, although I don't think the browser uses this).
 
The 'compilation end' event will make available a list of the names of
functions declared in the compiled script.
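As a rough sketch of what per-compilation events could look like to a client, consider the following. The type and function names (<tt>CompilationStart</tt>, <tt>CompilationEnd</tt>, <tt>CompilationHooks</tt>, <tt>reportCompilation</tt>) are assumptions made for illustration; the actual js::dbg2 declarations have not been designed yet.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload for the 'compilation start' event.
struct CompilationStart {
    std::string url;
    bool hasSourceText;      // false when the full text is not available
    std::string sourceText;  // meaningful only if hasSourceText is true
};

// Hypothetical payload for the 'compilation end' event.
struct CompilationEnd {
    std::string url;
    std::vector<std::string> functionNames;  // functions declared in the script
};

// Callbacks a debugger would register; either may be left empty.
struct CompilationHooks {
    std::function<void(const CompilationStart&)> onStart;
    std::function<void(const CompilationEnd&)> onEnd;
};

// Stand-in for the engine: fires exactly one start and one end event per
// compilation, however many scripts that compilation creates internally.
void reportCompilation(const CompilationHooks& hooks, const std::string& url,
                       bool hasSourceText, std::string sourceText,
                       std::vector<std::string> functionNames) {
    if (hooks.onStart)
        hooks.onStart(CompilationStart{url, hasSourceText, std::move(sourceText)});
    if (hooks.onEnd)
        hooks.onEnd(CompilationEnd{url, std::move(functionNames)});
}
```

The point of the design is that the client sees two events per compilation rather than one event per script created or destroyed.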
 
= Script Labeling =
 
We should provide variants of 'eval' and the 'Function' constructor that
allow their callers to provide a URL and line number for the code being
evaluated, just as the JSAPI <tt>JS_EvaluateScript</tt> function does. This
is a trivial change that, with cooperation from loaders and debuggers, will
improve the debugging experience and allow debuggers to be more robust.
 
Real-life web code often uses 'loaders': JavaScript programs that retrieve
code using an XMLHttpRequest and pass it to 'eval'. Firebug (and other
JavaScript debuggers, apparently) go to great lengths to find such scripts
and assign them meaningful names; for example, Firebug searches the
script's source code for specially formatted comments at the bottom that
supply the script's URL, or generates identifiers based on content hashes.
However, a cooperative loader could simply supply an appropriate name or
URL for the script that the debugger could display to its users.
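
To make the comment-scanning workaround concrete, here is a minimal sketch of a function that looks for a naming comment in a script's text. The <tt>//@ sourceURL=</tt> marker is assumed to be the convention Firebug used; treat the exact spelling as an assumption. <tt>extractSourceUrl</tt> is a hypothetical helper, not an existing API.

```cpp
#include <cstddef>
#include <string>

// Return the name supplied by the last "//@ sourceURL=" comment in `source`,
// or an empty string if the script carries no such comment.
std::string extractSourceUrl(const std::string& source) {
    static const std::string marker = "//@ sourceURL=";  // assumed convention
    std::size_t pos = source.rfind(marker);
    if (pos == std::string::npos) return "";

    // The name runs from the end of the marker to the end of that line.
    std::size_t begin = pos + marker.size();
    std::size_t end = source.find_first_of("\r\n", begin);
    std::string url = source.substr(
        begin, end == std::string::npos ? std::string::npos : end - begin);

    // Trim trailing spaces and tabs.
    std::size_t last = url.find_last_not_of(" \t");
    return last == std::string::npos ? "" : url.substr(0, last + 1);
}
```

A debugger has to run a scan like this over every evaluated string; a labeled variant of 'eval' would let the loader pass the name directly instead.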
 
Template engines and other code generators are also popular, producing
JavaScript code on the fly and passing it to eval or the Function
constructor. In these cases, there may be no underlying URL, but it would
still be valuable to the user if the debugger could identify the parameters
used to produce the code.
 
= Debugging of JITted code =
 
Although debugging may disable just-in-time compilation for the time being,
in the long term we would like to support debugging of code that has been
compiled by Jägermonkey, and perhaps to some degree by TraceMonkey. Some
operations would be restricted, but allowing code to run at full speed
under the debugger seems like a valuable feature.
 
In the case of Jägermonkey, the compiler would need to maintain a mapping
from generated machine code instructions to source locations, scope
extents, and stack information. JavaScript-level breakpoints could be
implemented by placing machine-level breakpoints in the compiled code, and
then using a signal handler that uses the instruction address to probe this
map, find variables' homes, and walk the stack.
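
One way to picture the mapping from machine code addresses to source locations is an ordered map keyed by the start address of each generated code range. This is only a sketch under assumptions: <tt>AddressMap</tt> and <tt>SourceLocation</tt> are invented names, and a real compiler would also need to record scope extents and stack layout, which are omitted here.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical source position; columns are omitted for brevity.
struct SourceLocation {
    std::string url;
    int line;
};

class AddressMap {
public:
    // Record that the machine code starting at `address` was generated for `loc`.
    void add(std::uintptr_t address, SourceLocation loc) {
        entries_[address] = std::move(loc);
    }

    // Return the entry with the greatest start address <= pc, or nullptr if
    // pc lies before every recorded range.
    const SourceLocation* lookup(std::uintptr_t pc) const {
        auto it = entries_.upper_bound(pc);  // first entry starting after pc
        if (it == entries_.begin()) return nullptr;
        return &std::prev(it)->second;
    }

private:
    std::map<std::uintptr_t, SourceLocation> entries_;
};
```

A handler for a machine-level breakpoint would call <tt>lookup</tt> with the faulting instruction address to recover the JavaScript-level position.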
 
Much of the challenge here will be in handling variable references:
 
* Unused variables may not be represented in the machine code at all.
 
* Null closures may not provide enough information to find variables in enclosing scopes.
 
* The compiler may have made assumptions that restrict what sorts of values can be assigned to a variable, or make it impossible to assign to the variable at all.
 
* Allowing the user to add and delete variables by passing 'var' and 'delete' forms to the debugger's 'evaluate-in-frame' command may not be practical.
 
However, for most users, simply being able to produce a stack trace
and show the values of the variables will be sufficient.
 
Debugging code compiled by TraceMonkey may be more difficult to support, as
that compiler seems to generate machine code that is further from the
original source, but it's still worth looking into. Again, getting this
mostly right will be perfectly fine for many users.
 
= No Cross-Runtime Debugging =
 
The jsd interface only supports debugging programs running in a single
Runtime at once. There has been some discussion about whether js::dbg2
should support inter-runtime debugging, but this has been set aside:
 
* Inter-runtime debugging isn't required for any of our current plans. Worker threads, chrome and content all share a single runtime, and there are no plans to change this.
 
* Experienced SpiderMonkey developers did not feel that segregating debugger and debuggee in separate runtimes offered much benefit in practice.
 
= Internal Debugging Models =
 
The <tt>js::dbg2</tt> debugging interface operates at the JavaScript level,
not at the C++ or machine level. It assumes that the JavaScript
implementation itself is healthy and responsive: the JavaScript program
being executed may have gone wrong, but the JavaScript implementation's
internal state must not be corrupt. Bugs in the implementation may cause
the debugger to fail; bugs in the interpreted program must not.
 
Whenever a program's execution is paused, the C++ call stack looks like
this (younger frames appear above older frames):
 
{| border="1"
| debugger machinery frames
|-
| interpreter/JITted frames for debuggee
|-
| top-level event loop
|}
 
In this case, the "debugger machinery" is responsible for reporting the
state of the JavaScript debuggee and interacting with the debugger's user
interface until the program is continued. When control continues, the
"debugger machinery" frame simply returns, and the "interpreter/JITted frames
for debuggee" resume execution. If the user decides to stop executing the
debuggee, the "debugger machinery" frame throws an appropriate, uncatchable
exception, allowing the interpreter to clean up its state in an orderly
way.
 
== Evaluating User Expressions ==
 
If the user asks the debugger to evaluate an expression that requires
evaluating JavaScript code (like <tt>e.x()</tt>), then the C++ stack looks
like this:
 
{| border="1"
| interpreter/JITted frames for expression given to debugger
|-
| debugger machinery frames
|-
| interpreter/JITted frames for debuggee
|-
| top-level event loop
|}
 
If evaluation of the expression throws an exception or hits a breakpoint,
then the result is a matter of user interface. Either we abandon evaluation
of the expression, and C++ control returns to the original machinery frames:
 
{| border="1"
| debugger machinery frames
|-
| interpreter/JITted frames for debuggee
|-
| top-level event loop
|}
 
Or we treat the event as something to be investigated, just as if it had
occurred in the debuggee's normal course of execution:
 
{| border="1"
| nested debugger machinery frames
|-
| interpreter/JITted frames for expression given to debugger
|-
| debugger machinery frames
|-
| interpreter/JITted frames for debuggee
|-
| top-level event loop
|}
 
Again, the debugger machinery is <em>not</em> written to tolerate corrupt
interpreter data structures or incomplete execution states; it relies on
the interpreter's debugging API working correctly.
 
== Same-Stack Debugging ==
 
In the current model for debugging Firefox, the debugger runs in the same
process as the debuggee. Since the XUL user interface only allows one
thread to interact with it, the debugger's user interface must share a
thread, and thus a stack, with the debuggee. Thus, when the debuggee is
paused and the user is interacting with the debugger's user interface, the
C++ stack looks like this:
 
{| border="1"
| debugger UI frames<br>(that is, more interpreted/JITted JS frames)
|-
| nested event loop invocation
|-
| debugger machinery frames
|-
| interpreter/JITted frames for debuggee
|-
| top-level event loop
|}
 
There are a number of complications that arise from this model:
 
* The debugger's UI and the debuggee share a DOM, and may interact with each other in unexpected ways through that DOM.
 
* The debugger should never refer to the debuggee's objects directly --- it is too easy to introduce bugs and security holes by doing so. However, avoiding this is similar to the problem of ensuring that references between Firefox chrome and content go through the proper wrapper objects. This seems to be challenging in practice.
 
== Remote Debugging ==
 
One way to avoid the issues mentioned above is to move the debugger UI into
its own process, and have it communicate with the debuggee using a wire
protocol.  (See Remote Debugging, above.)
 
This ability is also helpful when the debuggee is running on a device with
a limited user interface (say, a mobile phone or tablet computer): it can
be valuable to have the debugger's user interface running on a workstation
or laptop. In this case, the C++ call stack looks like this:
 
{| border="1"
| debug protocol server
|-
| nested event loop invocation
|-
| debugger machinery frames
|-
| interpreter/JITted frames for debuggee
|-
| top-level event loop
|}
 
The stack of the debugger's user interface can be whatever is convenient,
as long as it communicates appropriately with the debug server. But one
possible arrangement would be to treat the protocol as simply another back
end for the js::dbg2 interface; the debugger UI would behave identically
regardless of whether the debuggee was local or remote. Thus, the C++ stack
in the process running the debugger UI would look like this:
 
{| border="1"
| debugger UI frames
|-
| nested event loop invocation
|-
| debugger machinery frames
|-
| debugger back end: debug protocol client
|-
| top-level event loop
|}
 
Remote debugging also enables debugging worker threads: if the worker's
top-level event loop responds to messages registering the debugger's
interest in the sphere
 
Remote debugging also prepares us to support debugging content in an
architecture which places content JavaScript in separate processes from
chrome JavaScript.
 
== Separate Windows Cannot Be Debugged Independently ==
 
One interesting consequence of the fact that Firefox uses a single thread
for all chrome and content JavaScript is that independent windows (in the
sense of an HTML5 "Window" object; tabs are windows) cannot be debugged
independently. Suppose we hit a breakpoint in one window:
 
{| border="1"
| debugger UI frames
|-
| nested event loop invocation
|-
| debugger machinery frames
|-
| interpreter/JITted frames for first window
|-
| top-level event loop
|}
 
Then we switch to a different window and hit a breakpoint there, as well:
 
{| border="1"
| debugger UI frames
|-
| nested event loop invocation
|-
| debugger machinery frames
|-
| interpreter/JITted frames for second window
|-
| nested event loop invocation
|-
| debugger machinery frames
|-
| interpreter/JITted frames for first window
|-
| top-level event loop
|}
 
(I believe Firebug currently forbids this situation from arising, either by
refusing to allow debugging to occur in the second window, or by throwing
away the first window's JavaScript stack. But the goal here is to point out
intrinsic limitations in Firefox's execution model, regardless of how
Firebug behaves.)
 
In this case, we cannot simply switch back to the first window and resume
execution there: we must first finish (or abandon) execution in the second
window, because its stack frames are on top of the ones we wish to resume.
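
The constraint can be illustrated with a small model. This is only a sketch: the class and method names are hypothetical, and each paused window is reduced to a single entry standing for its debuggee frames, debugger machinery frames, and nested event loop.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the single C++ stack shared by all windows.
class SharedStack {
public:
    // Hitting a breakpoint in `window` leaves its frames, plus a nested
    // event loop, on top of whatever was already paused.
    void pause(const std::string& window) {
        assert(!window.empty());
        paused_.push_back(window);
    }

    // Only the innermost paused window can continue running.
    bool canResume(const std::string& window) const {
        return !paused_.empty() && paused_.back() == window;
    }

    // Resuming unwinds the innermost nested event loop. Returns false,
    // leaving the stack untouched, if `window` is buried under another.
    bool resume(const std::string& window) {
        if (!canResume(window)) return false;
        paused_.pop_back();
        return true;
    }

private:
    std::vector<std::string> paused_;  // outermost first
};
```

In this model, pausing the first window and then the second makes resuming the first fail until the second has been resumed or abandoned.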
 
There are two general solutions. The first would be to change SpiderMonkey
to represent the JavaScript stack entirely in the heap, such that no C++
frames accumulate in the above scenario, and then use a separate JavaScript
stack for each window. However, aside from the engineering work needed,
accommodating native frames mixed with JavaScript frames in this arrangement
would be a challenge.
 
The second is to change Firefox to use a separate C++ stack for each
window, by creating a separate thread for each window. These threads would
not run concurrently (if properly designed, the functions for passing
control from one stack to another can guarantee this), avoiding the sorts
of unreproducible behavior that make most multi-threaded, shared memory
programming so difficult.


If Firefox evolves towards a process-per-window model, then it will have a
separate stack per window, and the debugging restrictions described above
can be lifted. However, if the user creates a large number of windows,
Firefox may need to have windows share processes; in this case, the
multiple, non-mutually-preemptive thread model described above could
provide consistency between the process-per-window and
several-windows-per-process arrangements.

* http://src.chromium.org/viewvc/chrome/trunk/src/views/events/
* http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/automation/?pathrev=80000
* http://code.google.com/p/selenium/wiki/JsonWireProtocol
* https://wiki.mozilla.org/Remote_Debugging_Protocol
* https://wiki.mozilla.org/User:Automatedtester/FennecDriver
* http://code.google.com/p/selenium/wiki/AutomationAtoms
* https://bugzilla.mozilla.org/show_bug.cgi?id=670674
* https://wiki.mozilla.org/Auto-tools/Projects/Marionette

Latest revision as of 23:39, 5 October 2011

This is a DRAFT.

Comments are welcome on dev-tech-js-engine(at)lists.mozilla.org; or, you can send them directly to me at jimb(at)mozilla.com.

Goals

  • SpiderMonkey's JavaScript debugging API must support close collaboration with sibling APIs for debugging other web technologies: not only DOM structure, CSS rules, and networking requests, but also upcoming tools like worker threads and local storage. These technologies were designed to interact with each other, and useful debugging tools must illuminate those interactions.
  • The debugging API must support the creation of robust debugging tools. Mozilla's current debugging tools are plagued with problems stemming from the debugger having unintended effects on the debuggee: because both run in the same process, they share an event loop, chrome, and (to some extent) JavaScript objects. Our design should strengthen the isolation between the two, making debugging more reliable.
  • The debugging API must be able to debug web worker threads. Web workers allow computational tasks to run concurrently with ordinary content JavaScript. A page can have many worker threads, and workers can spawn subworker threads. The debugging API should allow debuggers to enumerate worker threads and monitor their execution and interactions, just as it does for content JavaScript.
  • The debugging API must support remote debugging. Mobile devices often have restricted user interfaces; it should be possible for the debugger's user interface to run on a workstation or laptop, while inspecting a debuggee on the mobile device.
  • The debugging API should be prepared to support separate content processes, if Mozilla implements them.
  • The debugging API must support our evolving JavaScript implementation. With its bytecode interpreter, the TraceMonkey tracing just-in-time compiler, and now JägerMonkey, the method-at-a-time compiler, SpiderMonkey has three distinct ways of executing JavaScript code. We should be able to debug programs that have been compiled to machine code, and not force SpiderMonkey to revert to the slowest implementation technique.

Design Summary

(Even though it hasn't been implemented yet, this description uses the present tense, for clarity and ease of transition to summary documentation.)

Debugger user interfaces communicate with the application being debugged via a remote debugging protocol. The protocol is JSON-based, with clients and servers typically implemented in JavaScript. Each packet from the client is directed at a specific actor on the server, representing a thread, breakpoint, JavaScript object, or the like; each packet from the server comes from a specific actor.

Every server provides a root actor that can provide global information about the application ("I am a web browser"), and enumerate the potential debuggees present in the application—tabs, worker threads, chrome, and so on—each of which is represented by its own actor.
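
The actor-directed routing described above can be sketched as follows. This is a minimal illustration, not the protocol itself: real packets are JSON, and the Packet fields, the Server class, and the "noSuchActor" reply are all assumptions made for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for a JSON packet (field names are assumptions).
struct Packet {
    std::string actor;  // addressee of a request, or sender of a reply
    std::string type;   // what is being asked, or what happened
};

// Hypothetical server: a table of named actors, each handling the
// packets addressed to it.
class Server {
public:
    typedef std::function<std::string(const Packet&)> Handler;

    void addActor(const std::string& name, Handler handler) {
        assert(!name.empty());
        actors_[name] = std::move(handler);
    }

    // Deliver a request to the actor it names. The reply always carries
    // the name of the actor it comes from.
    Packet dispatch(const Packet& request) const {
        std::map<std::string, Handler>::const_iterator it =
            actors_.find(request.actor);
        if (it == actors_.end())
            return Packet{request.actor, "noSuchActor"};
        return Packet{request.actor, it->second(request)};
    }

private:
    std::map<std::string, Handler> actors_;
};
```

A root actor would be registered like any other and would answer requests to enumerate tabs, worker threads, and chrome by naming further actors.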

Actors representing individual JavaScript threads use the jsd2IDebuggerService Web IDL interface to inspect and manipulate the debuggee they represent. jsd2IDebuggerService is an alternative to the existing jsdIDebuggerService, implemented in terms of the js::dbg2 C++ interfaces.

The server interacts with debuggees running in other threads simply by passing entire JSON packets between the client and actor code running on those threads. Thus, all inter-thread communication is handled via the protocol, permitting thread actors and the interfaces they use to be single-threaded and simplifying their implementation. Communication with subprocesses can be handled the same way.

The jsd2IDebuggerService Web IDL interface presents js::dbg2's facilities to JavaScript.

The js::dbg2 interface provides functions to:

  • select code of interest to the developer (everything in a tab; a selected frame within a tab; chrome; and so on),
  • establish breakpoints, watchpoints, and other sorts of monitoring, and be notified when events of interest occur,
  • inspect and manipulate stack frames, scope chains, objects, and other such members of the JavaScript menagerie.

[Figure: Architecture-new.png]

Debugging Protocol

Remote debugging, in which the debugger's user interface can run in a separate process from the debuggee and communicates with the debuggee over a stream connection, addresses many of our goals at once:

  • A debugger running in a separate process from the debuggee is easier to make robust. The debugger's user interface and the debuggee need not share an event loop or a chrome DOM tree.
  • Remote debugging eases mobile development. The debugger could run on a desktop computer, and operate on a debuggee on a mobile device.
  • The remote protocol can handle almost all inter-thread communication. Each actor runs on the same thread as the debuggee it represents, so actor/debuggee interactions are intra-thread, and need not worry about synchronization or shared state. Actors and the application's main server interact only by exchanging protocol packets. The debugger user interface simply needs to be able to talk to more than one actor at a time.

    (Note that some operations are inherently cross-thread: enumerating currently running threads; thread creation notifications; the initial attachment of the debugger to a thread. But once a thread has been attached to, all subsequent communication can be via the remote protocol.)

The js::dbg2 Interfaces and jsd2IDebuggerService

The js::dbg2 interfaces, wrapped for JavaScript as the jsd2IDebuggerService, allow the debugger to select the code to debug, set breakpoints and watchpoints and otherwise express interest in debuggee behaviors, and inspect the debuggee's state.

The js::dbg2 interfaces operate at a higher level than jsd. Whereas jsd works in terms of the original SpiderMonkey bytecode interpreter—JSScript objects, bytecode offsets, JSStackFrame objects, and so on—the js::dbg2 interfaces operate at the JavaScript source code and value level, and avoid referring to specifics of the implementation. This makes it easier to support debugging of TraceMonkey- and JägerMonkey-compiled code: such code need not present its state in terms of an older intermediate representation that it doesn't use.

Like jsd, js::dbg2 provides grip objects that refer to values in the debuggee. The debugger can inspect the object's properties, their attributes, and so on via the grip without accidentally invoking getters or setters, making it easier to write secure and robust debuggers.
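
To make the "no accidental getters" property concrete, here is a small sketch. Everything in it is hypothetical (Property, Descriptor, Grip, and the use of int for values); it shows only that a grip can report what a property is without running the debuggee code behind it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical debuggee property: a stored value, or an accessor whose
// getter is debuggee code.
struct Property {
    int value;                    // used when `getter` is empty
    std::function<int()> getter;  // non-empty for accessor properties
};

typedef std::map<std::string, Property> DebuggeeObject;

// What the grip reports about one property.
struct Descriptor {
    bool found;
    bool isAccessor;
    int value;  // meaningful only when found && !isAccessor
};

// Hypothetical grip: inspects the referent but never calls into it.
class Grip {
public:
    explicit Grip(const DebuggeeObject& referent) : referent_(referent) {}

    Descriptor describe(const std::string& name) const {
        DebuggeeObject::const_iterator it = referent_.find(name);
        if (it == referent_.end()) {
            Descriptor missing = {false, false, 0};
            return missing;
        }
        if (it->second.getter) {
            // Report that an accessor exists; do not invoke it.
            Descriptor accessor = {true, true, 0};
            return accessor;
        }
        Descriptor data = {true, false, it->second.value};
        return data;
    }

private:
    const DebuggeeObject& referent_;
};
```

A debugger that wants an accessor's value would have to ask for the getter to be run explicitly, so any side effects on the debuggee are deliberate.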

Also like jsd, js::dbg2 provides grip objects referring to JavaScript stack frames. However, there is no necessary correspondence between js::dbg2 stack frame grips and SpiderMonkey's internal JSStackFrame objects. SpiderMonkey's JITs are free to report the current function activations to js::dbg2 in whatever way is most convenient to them; they are not required to synthesize JSStackFrame objects, which must satisfy complex internal constraints.

Tasks and Estimates

Note that all estimates include time to write unit tests providing full code and branch coverage for new code.

JS_CopyScript JSAPI function (8 days)
Implement, document, and test a function that makes a fresh, deep copy of a JSScript object, suitable for execution in a thread or global object different than the original JSScript.

For various reasons, SpiderMonkey is moving towards restricting each JSScript to be used with a single global object (the next task; see details there). Before we can impose this requirement, we must make it possible for embedders to comply with it by providing a function which copies a JSScript object.

Associate JSScripts with specific global objects (5 days)
Add a 'global' field to JSScript, and change JS_ExecuteScript to clone JSScript objects if necessary to match the global object passed.

This is needed to allow us to enumerate all the scripts in use by a particular global object, along with several other current SpiderMonkey goals; see bug 563375#c4. We can accomplish this by having JS_ExecuteScript use copies of JSScripts owned by globals other than the one passed to it.

Change JSRuntime::scriptFilenameTable to use js::HashMap (3 days)
Since subsequent tasks will involve changing the data structures used to store script source URLs, we should grant ourselves the benefits of strict typing provided by the new js::HashMap template.

Create name-to-script mapping (8 days)
Adapt the existing hash table of script names to also function as a map from script names to scripts. This entails adding links to JSScript objects, arranging for entries in scriptFilenameTable to head chains of scripts, and having garbage collection properly remove scripts from their names' lists.

Script URL enumeration (5 days)
Define a function to enumerate the URLs of all scripts associated with a given global object.

Debugger user interfaces need to be able to present the user with a list of the scripts in use by a particular page or origin, so that the user can browse their source code, set breakpoints, and so on. These lists should include only those scripts in use by the page or origin being debugged.
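
The name-to-script mapping and the per-global URL enumeration can be sketched together. This is an illustration under assumptions: Global, Script, and ScriptTable are stand-ins, std::unordered_map takes the place of js::HashMap, and garbage collection is reduced to an explicit remove call.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

struct Global { std::string name; };

// Stand-in for JSScript: each script belongs to exactly one global.
struct Script { const Global* global; };

// Hypothetical table mapping each source URL to the scripts compiled from it.
class ScriptTable {
public:
    void add(const std::string& url, const Script* script) {
        assert(script && script->global);
        byUrl_[url].push_back(script);
    }

    // What garbage collection would do when a script dies: unlink it from
    // its URL's chain. This sketch also drops empty chains.
    void remove(const std::string& url, const Script* script) {
        Map::iterator it = byUrl_.find(url);
        if (it == byUrl_.end()) return;
        std::vector<const Script*>& chain = it->second;
        chain.erase(std::remove(chain.begin(), chain.end(), script), chain.end());
        if (chain.empty()) byUrl_.erase(it);
    }

    // URLs of all scripts in use by `global`, and only those.
    std::set<std::string> urlsFor(const Global& global) const {
        std::set<std::string> urls;
        for (Map::const_iterator it = byUrl_.begin(); it != byUrl_.end(); ++it)
            for (const Script* s : it->second)
                if (s->global == &global) urls.insert(it->first);
        return urls;
    }

private:
    typedef std::unordered_map<std::string, std::vector<const Script*> > Map;
    Map byUrl_;
};
```

Filtering by global at enumeration time keeps the table itself per-runtime, matching the single scriptFilenameTable described above.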

Draft C++ js::dbg2 breakpoint API (3 days)
Write a C++ API declaring:
  • A class representing a position at which a breakpoint can be set, expressed in terms of textual positions (URL, line, and column) or in terms of function names (a global object, a series of containing function names, and a final function name), or in terms of specific function objects.

    The API should permit the "grammar" of breakpoint locations to be extended in the future (to describe, say, function-valued properties in object literals).

    These should be designed such that, in normal, efficient use, no explicit storage management (new/delete) is required.

    URLs in breakpoint locations should be represented as entries in the runtime's scriptFilenameTable. This means that, given a breakpoint location, we have immediate access to the list of JSScripts derived from the source code to which the location refers.

    If possible, the URL/line/column variant of this type should be suitable for use by the js::dbg2 stack frame type to represent source positions; we should not need two distinct types that represent locations in source code.

  • A class representing a breakpoint, js::dbg2::Breakpoint, which can be inserted in or removed from a debugging sphere. This API will not be concerned with breakpoint conditions, ignore counts, and such; those behaviors must be implemented by the client of the js::dbg2 interface.
  • A stub js::dbg2::Sphere class, sufficient for bootstrapping, constructed from a given global object.
  • Debugging sphere member functions for enumerating the currently inserted breakpoints.
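
One possible shape for the location class is sketched below. It is not the drafted API: the Location name, its factory functions, and describe() are hypothetical, the global object and function-object variants are omitted, and URLs are plain strings rather than scriptFilenameTable entries. The point is that a value type with named constructors needs no new/delete in ordinary use, and that the Kind enumeration leaves room to extend the grammar later.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical breakpoint location, usable as a plain value.
class Location {
public:
    enum Kind { SourcePosition, FunctionName };

    // A textual position: URL, line, and column.
    static Location atPosition(std::string url, unsigned line, unsigned column) {
        assert(line >= 1);
        Location loc(SourcePosition);
        loc.url_ = std::move(url);
        loc.line_ = line;
        loc.column_ = column;
        return loc;
    }

    // A function named by its containing functions, outermost first,
    // ending with the function itself.
    static Location atFunction(std::vector<std::string> path) {
        assert(!path.empty());
        Location loc(FunctionName);
        loc.path_ = std::move(path);
        return loc;
    }

    Kind kind() const { return kind_; }

    // Human-readable form, for display in a debugger UI.
    std::string describe() const {
        if (kind_ == SourcePosition)
            return url_ + ":" + std::to_string(line_) + ":" + std::to_string(column_);
        std::string joined;
        for (std::size_t i = 0; i < path_.size(); ++i) {
            if (i) joined += ".";
            joined += path_[i];
        }
        return joined;
    }

private:
    explicit Location(Kind kind) : kind_(kind), line_(0), column_(0) {}

    Kind kind_;
    std::string url_;
    unsigned line_, column_;
    std::vector<std::string> path_;
};
```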

Implement Breakpoint Location Classes (5 days)
Implement the classes described above describing breakpoint locations. There may be some tricky work here, as we want to have entries in the scriptFilenameTable that are live because they are referred to by breakpoint location objects, not scripts, and have entries cleaned up as appropriate.

Implement js::dbg2::Breakpoint (15 days)
Implement the js::dbg2::Breakpoint class, including insertion and removal. This entails:
  • turning the various sorts of breakpoint locations into (JSScript, offset) pairs
  • searching JSScript lists to insert and remove traps
  • managing multiple breakpoints set at the same bytecode
  • inserting traps for existing breakpoints into newly loaded code (pending breakpoints)
  • coping with scripts being garbage collected
  • interlocking with JägerMonkey to ensure that breakpoints are never set in functions that have JM frames on the stack
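
Of the items above, managing several breakpoints at one bytecode is easy to sketch in isolation. The following is one possible approach, not a description of the planned implementation: TrapSite and TrapTable are hypothetical names, and a script is identified by a plain integer.

```cpp
#include <cassert>
#include <map>
#include <utility>

// A (script id, bytecode offset) pair; both are stand-ins for real types.
typedef std::pair<int, unsigned> TrapSite;

// Hypothetical table that shares one trap among all breakpoints at a site.
class TrapTable {
public:
    // Record a breakpoint at `site`. Returns true if a trap must actually
    // be installed, that is, if this is the first breakpoint there.
    bool insert(const TrapSite& site) { return ++counts_[site] == 1; }

    // Drop one breakpoint at `site`. Returns true if the trap itself must
    // be removed because no breakpoints remain there.
    bool remove(const TrapSite& site) {
        std::map<TrapSite, unsigned>::iterator it = counts_.find(site);
        assert(it != counts_.end());
        if (--it->second > 0) return false;
        counts_.erase(it);
        return true;
    }

    bool trapped(const TrapSite& site) const { return counts_.count(site) != 0; }

private:
    std::map<TrapSite, unsigned> counts_;  // breakpoints per site
};
```

Counting per site means the bytecode is patched only on the first insertion and restored only on the last removal.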

Use function start positions when re-setting breakpoints (8 days)
When re-loading a previously loaded script, we should use our knowledge of function boundaries to improve our accuracy as we re-set breakpoints in the new script. If all changes to a script lie outside a given function's definition, then treating the breakpoint as if it were set relative to the function's start, rather than at an absolute line and column, will allow us to find a better location for it in the new script.
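
The arithmetic involved is simple; a sketch, with a hypothetical function name and lines only (columns would be handled the same way on the function's first line):

```cpp
#include <cassert>

// Re-anchor a breakpoint in a reloaded script by keeping its distance from
// the first line of the enclosing function. Valid only when the function's
// own text is unchanged between the two versions of the script.
unsigned rebaseLine(unsigned oldLine,
                    unsigned oldFunctionStart,
                    unsigned newFunctionStart) {
    assert(oldLine >= oldFunctionStart);
    return newFunctionStart + (oldLine - oldFunctionStart);
}
```

For example, a breakpoint on line 42 of a function that used to start on line 40 moves to line 49 if seven lines were added above the function, so that it now starts on line 47.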

Expand source notes to carry column information (8 days)
Extend the source notes attached to JSScripts to carry both line and column information. This allows debugging of poorly-formatted code such as that produced by script compressors or obfuscators. The bytecode compiler already tracks column numbers; they're simply not recorded in the source notes.

Note that this need not imply any increase in the size of notes for normally formatted source code: the granularity of the features distinguished by the source annotations (that is, statements) need not change. Only if there were multiple statements or functions on the same line would column numbers be needed to distinguish them.

Links