aboutsummaryrefslogtreecommitdiff
path: root/doc/impl.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/impl.md')
-rw-r--r--doc/impl.md242
1 files changed, 242 insertions, 0 deletions
diff --git a/doc/impl.md b/doc/impl.md
new file mode 100644
index 0000000..380e2d6
--- /dev/null
+++ b/doc/impl.md
@@ -0,0 +1,242 @@
+
+# Starlark in Go: Implementation
+
+This document (a work in progress) describes some of the design
+choices of the Go implementation of Starlark.
+
+ * [Scanner](#scanner)
+ * [Parser](#parser)
+ * [Resolver](#resolver)
+ * [Evaluator](#evaluator)
+ * [Data types](#data-types)
+ * [Freezing](#freezing)
+ * [Testing](#testing)
+
+
+## Scanner
+
+The scanner is derived from Russ Cox's
+[buildifier](https://github.com/bazelbuild/buildtools/tree/master/buildifier)
+tool, which pretty-prints Bazel BUILD files.
+
+Most of the work happens in `(*scanner).nextToken`.
+
+## Parser
+
+The parser is hand-written recursive-descent parser. It uses the
+technique of [precedence
+climbing](http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing)
+to reduce the number of productions.
+
+In some places the parser accepts a larger set of programs than are
+strictly valid, leaving the task of rejecting them to the subsequent
+resolver pass. For example, in the function call `f(a, b=c)` the
+parser accepts any expression for `a` and `b`, even though `b` may
+legally be only an identifier. For the parser to distinguish these
+cases would require additional lookahead.
+
+## Resolver
+
+The resolver reports structural errors in the program, such as the use
+of `break` and `continue` outside of a loop.
+
+Starlark has stricter syntactic limitations than Python. For example,
+it does not permit `for` loops or `if` statements at top level, nor
+does it permit global variables to be bound more than once.
+These limitations come from the Bazel project's desire to make it easy
+to identify the sole statement that defines each global, permitting
+accurate cross-reference documentation.
+
+In addition, the resolver validates all variable names, classifying
+them as references to universal, global, local, or free variables.
+Local and free variables are mapped to a small integer, allowing the
+evaluator to use an efficient (flat) representation for the
+environment.
+
+Not all features of the Go implementation are "standard" (that is,
+supported by Bazel's Java implementation), at least for now, so
+non-standard features such as `lambda`, `float`, and `set`
+are flag-controlled. The resolver reports
+any uses of dialect features that have not been enabled.
+
+
+## Evaluator
+
+### Data types
+
+<b>Integers:</b> Integers are representing using `big.Int`, an
+arbitrary precision integer. This representation was chosen because,
+for many applications, Starlark must be able to handle without loss
+protocol buffer values containing signed and unsigned 64-bit integers,
+which requires 65 bits of precision.
+
+Small integers (<256) are preallocated, but all other values require
+memory allocation. Integer performance is relatively poor, but it
+matters little for Bazel-like workloads which depend much
+more on lists of strings than on integers. (Recall that a typical loop
+over a list in Starlark does not materialize the loop index as an `int`.)
+
+An optimization worth trying would be to represent integers using
+either an `int32` or `big.Int`, with the `big.Int` used only when
+`int32` does not suffice. Using `int32`, not `int64`, for "small"
+numbers would make it easier to detect overflow from operations like
+`int32 * int32`, which would trigger the use of `big.Int`.
+
+<b>Floating point</b>:
+Floating point numbers are represented using Go's `float64`.
+Again, `float` support is required to support protocol buffers. The
+existence of floating-point NaN and its infamous comparison behavior
+(`NaN != NaN`) had many ramifications for the API, since we cannot
+assume the result of an ordered comparison is either less than,
+greater than, or equal: it may also fail.
+
+<b>Strings</b>:
+
+TODO: discuss UTF-8 and string.bytes method.
+
+<b>Dictionaries and sets</b>:
+Starlark dictionaries have predictable iteration order.
+Furthermore, many Starlark values are hashable in Starlark even though
+the Go values that represent them are not hashable in Go: big
+integers, for example.
+Consequently, we cannot use Go maps to implement Starlark's dictionary.
+
+We use a simple hash table whose buckets are linked lists, each
+element of which holds up to 8 key/value pairs. In a well-distributed
+table the list should rarely exceed length 1. In addition, each
+key/value item is part of doubly-linked list that maintains the
+insertion order of the elements for iteration.
+
+<b>Struct:</b>
+The `starlarkstruct` Go package provides a non-standard Starlark
+extension data type, `struct`, that maps field identifiers to
+arbitrary values. Fields are accessed using dot notation: `y = s.f`.
+This data type is extensively used in Bazel, but its specification is
+currently evolving.
+
+Starlark has no `class` mechanism, nor equivalent of Python's
+`namedtuple`, though it is likely that future versions will support
+some way to define a record data type of several fields, with a
+representation more efficient than a hash table.
+
+
+### Freezing
+
+All mutable values created during module initialization are _frozen_
+upon its completion. It is this property that permits a Starlark module
+to be referenced by two Starlark threads running concurrently (such as
+the initialization threads of two other modules) without the
+possibility of a data race.
+
+The Go implementation supports freezing by storing an additional
+"frozen" Boolean variable in each mutable object. Once this flag is set,
+all subsequent attempts at mutation fail. Every value defines a
+Freeze method that sets its own frozen flag if not already set, and
+calls Freeze for each value that it contains.
+For example, when a list is frozen, it freezes each of its elements;
+when a dictionary is frozen, it freezes each of its keys and values;
+and when a function value is frozen, it freezes each of the free
+variables and parameter default values implicitly referenced by its closure.
+Application-defined types must also follow this discipline.
+
+The freeze mechanism in the Go implementation is finer grained than in
+the Java implementation: in effect, the latter has one "frozen" flag
+per module, and every value holds a reference to the frozen flag of
+its module. This makes setting the frozen flag more efficient---a
+simple bit flip, no need to traverse the object graph---but coarser
+grained. Also, it complicates the API slightly because to construct a
+list, say, requires a reference to the frozen flag it should use.
+
+The Go implementation would also permit the freeze operation to be
+exposed to the program, for example as a built-in function.
+This has proven valuable in writing tests of the freeze mechanism
+itself, but is otherwise mostly a curiosity.
+
+
+### Fail-fast iterators
+
+In some languages (such as Go), a program may mutate a data structure
+while iterating over it; for example, a range loop over a map may
+delete map elements. In other languages (such as Java), iterators do
+extra bookkeeping so that modification of the underlying collection
+invalidates the iterator, and the next attempt to use it fails.
+This often helps to detect subtle mistakes.
+
+Starlark takes this a step further. Instead of mutation of the
+collection invalidating the iterator, the act of iterating makes the
+collection temporarily immutable, so that an attempt to, say, delete a
+dict element while looping over the dict, will fail. The error is
+reported against the delete operation, not the iteration.
+
+This is implemented by having each mutable iterable value record a
+counter of active iterators. Starting a loop increments this counter,
+and completing a loop decrements it. A collection with a nonzero
+counter behaves as if frozen. If the collection is actually frozen,
+the counter bookkeeping is unnecessary. (Consequently, iterator
+bookkeeping is needed only while objects are still mutable, before
+they can have been published to another thread, and thus no
+synchronization is necessary.)
+
+A consequence of this design is that in the Go API, it is imperative
+to call `Done` on each iterator once it is no longer needed.
+
+```
+TODO
+starlark.Value interface and subinterfaces
+argument passing to builtins: UnpackArgs, UnpackPositionalArgs.
+```
+
+<b>Evaluation strategy:</b>
+The evaluator uses a simple recursive tree walk, returning a value or
+an error for each expression. We have experimented with just-in-time
+compilation of syntax trees to bytecode, but two limitations in the
+current Go compiler prevent this strategy from outperforming the
+tree-walking evaluator.
+
+First, the Go compiler does not generate a "computed goto" for a
+switch statement ([Go issue
+5496](https://github.com/golang/go/issues/5496)). A bytecode
+interpreter's main loop is a for-loop around a switch statement with
+dozens or hundreds of cases, and the speed with which each case can be
+dispatched strongly affects overall performance.
+Currently, a switch statement generates a binary tree of ordered
+comparisons, requiring several branches instead of one.
+
+Second, the Go compiler's escape analysis assumes that the underlying
+array from a `make([]Value, n)` allocation always escapes
+([Go issue 20533](https://github.com/golang/go/issues/20533)).
+Because the bytecode interpreter's operand stack has a non-constant
+length, it must be allocated with `make`. The resulting allocation
+adds to the cost of each Starlark function call; this can be tolerated
+by amortizing one very large stack allocation across many calls.
+More problematic appears to be the cost of the additional GC write
+barriers incurred by every VM operation: every intermediate result is
+saved to the VM's operand stack, which is on the heap.
+By contrast, intermediate results in the tree-walking evaluator are
+never stored to the heap.
+
+```
+TODO
+frames, backtraces, errors.
+threads
+Print
+Load
+```
+
+## Testing
+
+```
+TODO
+starlarktest package
+`assert` module
+starlarkstruct
+integration with Go testing.T
+```
+
+
+## TODO
+
+
+```
+Discuss practical separation of code and data.
+```