1 files changed, 594 insertions, 0 deletions
diff --git a/docs/writing_extensions.txt b/docs/writing_extensions.txt
new file mode 100644
index 0000000..3aad74a
--- /dev/null
+++ b/docs/writing_extensions.txt
@@ -0,0 +1,594 @@
+Writing Extensions for Python-Markdown
+======================================
+
+Overview
+--------
+
+Python-Markdown includes an API for extension writers to plug their own 
+custom functionality and/or syntax into the parser. There are preprocessors
+which allow you to alter the source before it is passed to the parser, 
+inline patterns which allow you to add, remove or override the syntax of
+any inline elements, and postprocessors which allow munging of the
+output of the parser before it is returned. If you really want to dive in, 
+there are also blockprocessors which are part of the core BlockParser.
+
+As the parser builds an [ElementTree][] object which is later rendered 
+as Unicode text, there are also some helpers provided to ease manipulation of 
+the tree. Each part of the API is discussed in its respective section below. 
+Additionaly, reading the source of some [[Available Extensions]] may be helpful.
+For example, the [[Footnotes]] extension uses most of the features documented 
+here.
+
+* [Preprocessors][]
+* [InlinePatterns][]
+* [Treeprocessors][] 
+* [Postprocessors][]
+* [BlockParser][]
+* [Working with the ElementTree][]
+* [Integrating your code into Markdown][]
+    * [extendMarkdown][]
+    * [OrderedDict][]
+    * [registerExtension][]
+    * [Config Settings][]
+    * [makeExtension][]
+
+<h3 id="preprocessors">Preprocessors</h3>
+
+Preprocessors munge the source text before it is passed into the Markdown 
+core. This is an excellent place to clean up bad syntax, extract things the 
+parser may otherwise choke on and perhaps even store it for later retrieval.
+
+Preprocessors should inherit from ``markdown.preprocessors.Preprocessor`` and 
+implement a ``run`` method with one argument ``lines``. The ``run`` method of 
+each Preprocessor will be passed the entire source text as a list of Unicode 
+strings. Each string will contain one line of text. The ``run`` method should 
+return either that list, or an altered list of Unicode strings.
+
+A pseudo example:
+
+    class MyPreprocessor(markdown.preprocessors.Preprocessor):
+        def run(self, lines):
+            new_lines = []
+            for line in lines:
+                m = MYREGEX.match(line)
+                if m:
+                    # do stuff
+                else:
+                    new_lines.append(line)
+            return new_lines
+
+<h3 id="inlinepatterns">Inline Patterns</h3>
+
+Inline Patterns implement the inline HTML element syntax for Markdown such as
+``*emphasis*`` or ``[links](http://example.com)``. Pattern objects should be 
+instances of classes that inherit from ``markdown.inlinepatterns.Pattern`` or 
+one of its children. Each pattern object uses a single regular expression and 
+must have the following methods:
+
+* **``getCompiledRegExp()``**: 
+
+    Returns a compiled regular expression.
+
+* **``handleMatch(m)``**: 
+
+    Accepts a match object and returns an ElementTree element of a plain 
+    Unicode string.
+
+Note that any regular expression returned by ``getCompiledRegExp`` must capture
+the whole block. Therefore, they should all start with ``r'^(.*?)'`` and end
+with ``r'(.*?)!'``. When using the default ``getCompiledRegExp()`` method 
+provided in the ``Pattern`` you can pass in a regular expression without that 
+and ``getCompiledRegExp`` will wrap your expression for you. This means that 
+the first group of your match will be ``m.group(2)`` as ``m.group(1)`` will 
+match everything before the pattern.
+
+For an example, consider this simplified emphasis pattern:
+
+    class EmphasisPattern(markdown.inlinepatterns.Pattern):
+        def handleMatch(self, m):
+            el = markdown.etree.Element('em')
+            el.text = m.group(3)
+            return el
+
+As discussed in [Integrating Your Code Into Markdown][], an instance of this
+class will need to be provided to Markdown. That instance would be created
+like so:
+
+    # an oversimplified regex
+    MYPATTERN = r'\*([^*]+)\*'
+    # pass in pattern and create instance
+    emphasis = EmphasisPattern(MYPATTERN)
+
+Actually it would not be necessary to create that pattern (and not just because
+a more sophisticated emphasis pattern already exists in Markdown). The fact is,
+that example pattern is not very DRY. A pattern for `**strong**` text would
+be almost identical, with the exception that it would create a 'strong' element.
+Therefore, Markdown provides a number of generic pattern classes that can 
+provide some common functionality. For example, both emphasis and strong are
+implemented with separate instances of the ``SimpleTagPettern`` listed below. 
+Feel free to use or extend any of these Pattern classes.
+
+**Generic Pattern Classes**
+
+* **``SimpleTextPattern(pattern)``**:
+
+    Returns simple text of ``group(2)`` of a ``pattern``.
+
+* **``SimpleTagPattern(pattern, tag)``**:
+
+    Returns an element of type "`tag`" with a text attribute of ``group(3)``
+    of a ``pattern``. ``tag`` should be a string of a HTML element (i.e.: 'em').
+
+* **``SubstituteTagPattern(pattern, tag)``**:
+
+    Returns an element of type "`tag`" with no children or text (i.e.: 'br').
+
+There may be other Pattern classes in the Markdown source that you could extend
+or use as well. Read through the source and see if there is anything you can 
+use. You might even get a few ideas for different approaches to your specific
+situation.
+
+<h3 id="treeprocessors">Treeprocessors</h3>
+
+Treeprocessors manipulate an ElemenTree object after it has passed through the
+core BlockParser. This is where additional manipulation of the tree takes
+place. Additionally, the InlineProcessor is a Treeprocessor which steps through
+the tree and runs the InlinePatterns on the text of each Element in the tree.
+
+A Treeprocessor should inherit from ``markdown.treeprocessors.Treeprocessor``,
+over-ride the ``run`` method which takes one argument ``root`` (an Elementree 
+object) and returns either that root element or a modified root element.
+
+A pseudo example:
+
+    class MyTreeprocessor(markdown.treeprocessors.Treeprocessor):
+        def run(self, root):
+            #do stuff
+            return my_modified_root
+
+For specifics on manipulating the ElementTree, see 
+[Working with the ElementTree][] below.
+
+<h3 id="postprocessors">Postprocessors</h3>
+
+Postprocessors manipulate the document after the ElementTree has been 
+serialized into a string. Postprocessors should be used to work with the
+text just before output.
+
+A Postprocessor should inherit from ``markdown.postprocessors.Postprocessor`` 
+and over-ride the ``run`` method which takes one argument ``text`` and returns 
+a Unicode string.
+
+Postprocessors are run after the ElementTree has been serialized back into 
+Unicode text.  For example, this may be an appropriate place to add a table of 
+contents to a document:
+
+    class TocPostprocessor(markdown.postprocessors.Postprocessor):
+        def run(self, text):
+            return MYMARKERRE.sub(MyToc, text)
+
+<h3 id="blockparser">BlockParser</h3>
+
+Sometimes, pre/tree/postprocessors and Inline Patterns aren't going to do what 
+you need. Perhaps you want a new type of block type that needs to be integrated 
+into the core parsing. In such a situation, you can add/change/remove 
+functionality of the core ``BlockParser``. The BlockParser is composed of a
+number of Blockproccessors. The BlockParser steps through each block of text
+(split by blank lines) and passes each block to the appropriate Blockprocessor.
+That Blockprocessor parses the block and adds it to the ElementTree. The
+[[Definition Lists]] extension would be a good example of an extension that
+adds/modifies Blockprocessors.
+
+A Blockprocessor should inherit from ``markdown.blockprocessors.BlockProcessor``
+and implement both the ``test`` and ``run`` methods.
+
+The ``test`` method is used by BlockParser to identify the type of block.
+Therefore the ``test`` method must return a boolean value. If the test returns
+``True``, then the BlockParser will call that Blockprocessor's ``run`` method.
+If it returns ``False``, the BlockParser will move on to the next 
+BlockProcessor.
+
+The **``test``** method takes two arguments:
+
+* **``parent``**: The parent etree Element of the block. This can be useful as
+  the block may need to be treated differently if it is inside a list, for
+  example.
+
+* **``block``**: A string of the current block of text. The test may be a 
+  simple string method (such as ``block.startswith(some_text)``) or a complex 
+  regular expression.
+
+The **``run``** method takes two arguments:
+
+* **``parent``**: A pointer to the parent etree Element of the block. The run 
+  method will most likely attach additional nodes to this parent. Note that
+  nothing is returned by the method. The Elementree object is altered in place.
+
+* **``blocks``**: A list of all remaining blocks of the document. Your run 
+  method must remove (pop) the first block from the list (which it altered in
+  place - not returned) and parse that block. You may find that a block of text
+  legitimately contains multiple block types. Therefore, after processing the 
+  first type, your processor can insert the remaining text into the beginning
+  of the ``blocks`` list for future parsing.
+
+Please be aware that a single block can span multiple text blocks. For example,
+The official Markdown syntax rules state that a blank line does not end a
+Code Block. If the next block of text is also indented, then it is part of
+the previous block. Therefore, the BlockParser was specifically designed to 
+address these types of situations. If you notice the ``CodeBlockProcessor``,
+in the core, you will note that it checks the last child of the ``parent``.
+If the last child is a code block (``<pre><code>...</code></pre>``), then it
+appends that block to the previous code block rather than creating a new 
+code block.
+
+Each BlockProcessor has the following utility methods available:
+
+* **``lastChild(parent)``**: 
+
+    Returns the last child of the given etree Element or ``None`` if it had no 
+    children.
+
+* **``detab(text)``**: 
+
+    Removes one level of indent (four spaces by default) from the front of each
+    line of the given text string.
+
+* **``looseDetab(text, level)``**: 
+
+    Removes "level" levels of indent (defaults to 1) from the front of each line 
+    of the given text string. However, this methods allows secondary lines to 
+    not be indented as does some parts of the Markdown syntax.
+
+Each BlockProcessor also has a pointer to the containing BlockParser instance at
+``self.parser``, which can be used to check or alter the state of the parser.
+The BlockParser tracks it's state in a stack at ``parser.state``. The state
+stack is an instance of the ``State`` class.
+
+**``State``** is a subclass of ``list`` and has the additional methods:
+
+* **``set(state)``**: 
+
+    Set a new state to string ``state``. The new state is appended to the end 
+    of the stack.
+
+* **``reset()``**: 
+
+    Step back one step in the stack. The last state at the end is removed from 
+    the stack.
+
+* **``isstate(state)``**: 
+
+    Test that the top (current) level of the stack is of the given string 
+    ``state``.
+
+Note that to ensure that the state stack doesn't become corrupted, each time a
+state is set for a block, that state *must* be reset when the parser finishes
+parsing that block.
+
+An instance of the **``BlockParser``** is found at ``Markdown.parser``.
+``BlockParser`` has the following methods:
+
+* **``parseDocument(lines)``**: 
+
+    Given a list of lines, an ElementTree object is returned. This should be 
+    passed an entire document and is the only method the ``Markdown`` class 
+    calls directly.
+
+* **``parseChunk(parent, text)``**: 
+
+    Parses a chunk of markdown text composed of multiple blocks and attaches 
+    those blocks to the ``parent`` Element. The ``parent`` is altered in place 
+    and nothing is returned. Extensions would most likely use this method for 
+    block parsing.
+
+* **``parseBlocks(parent, blocks)``**: 
+
+    Parses a list of blocks of text and attaches those blocks to the ``parent``
+    Element. The ``parent`` is altered in place and nothing is returned. This 
+    method will generally only be used internally to recursively parse nested 
+    blocks of text.
+
+While is is not recommended, an extension could subclass or completely replace
+the ``BlockParser``. The new class would have to provide the same public API.
+However, be aware that other extensions may expect the core parser provided
+and will not work with such a drastically different parser.
+
+<h3 id="working_with_et">Working with the ElementTree</h3>
+
+As mentioned, the Markdown parser converts a source document to an 
+[ElementTree][] object before serializing that back to Unicode text. 
+Markdown has provided some helpers to ease that manipulation within the context 
+of the Markdown module.
+
+First, to get access to the ElementTree module import ElementTree from 
+``markdown`` rather than importing it directly. This will ensure you are using 
+the same version of ElementTree as markdown. The module is named ``etree`` 
+within Markdown.
+
+    from markdown import etree
+    
+``markdown.etree`` tries to import ElementTree from any known location, first 
+as a standard library module (from ``xml.etree`` in Python 2.5), then as a third
+party package (``Elementree``). In each instance, ``cElementTree`` is tried 
+first, then ``ElementTree`` if the faster C implementation is not available on 
+your system.
+
+Sometimes you may want text inserted into an element to be parsed by 
+[InlinePatterns][]. In such a situation, simply insert the text as you normally
+would and the text will be automatically run through the InlinePatterns. 
+However, if you do *not* want some text to be parsed by InlinePatterns,
+then insert the text as an ``AtomicString``.
+
+    some_element.text = markdown.AtomicString(some_text)
+
+Here's a basic example which creates an HTML table (note that the contents of 
+the second cell (``td2``) will be run through InlinePatterns latter):
+
+    table = etree.Element("table") 
+    table.set("cellpadding", "2")                      # Set cellpadding to 2
+    tr = etree.SubElement(table, "tr")                 # Add child tr to table
+    td1 = etree.SubElement(tr, "td")                   # Add child td1 to tr
+    td1.text = markdown.AtomicString("Cell content")   # Add plain text content
+    td2 = etree.SubElement(tr, "td")                   # Add second td to tr
+    td2.text = "*text* with **inline** formatting."    # Add markup text
+    table.tail = "Text after table"                    # Add text after table
+
+You can also manipulate an existing tree. Consider the following example which 
+adds a ``class`` attribute to ``<a>`` elements:
+
+	def set_link_class(self, element):
+		for child in element: 
+		    if child.tag == "a":
+                child.set("class", "myclass") #set the class attribute
+            set_link_class(child) # run recursively on children
+
+For more information about working with ElementTree see the ElementTree
+[Documentation](http://effbot.org/zone/element-index.htm) 
+([Python Docs](http://docs.python.org/lib/module-xml.etree.ElementTree.html)).
+
+<h3 id="integrating_into_markdown">Integrating Your Code Into Markdown</h3>
+
+Once you have the various pieces of your extension built, you need to tell 
+Markdown about them and ensure that they are run in the proper sequence. 
+Markdown accepts a ``Extension`` instance for each extension. Therefore, you
+will need to define a class that extends ``markdown.Extension`` and over-rides
+the ``extendMarkdown`` method. Within this class you will manage configuration 
+options for your extension and attach the various processors and patterns to 
+the Markdown instance. 
+
+It is important to note that the order of the various processors and patterns 
+matters. For example, if we replace ``http://...`` links with <a> elements, and 
+*then* try to deal with  inline html, we will end up with a mess. Therefore, 
+the various types of processors and patterns are stored within an instance of 
+the Markdown class in [OrderedDict][]s. Your ``Extension`` class will need to 
+manipulate those OrderedDicts appropriately. You may insert instances of your 
+processors and patterns into the appropriate location in an OrderedDict, remove
+a built-in instance, or replace a built-in instance with your own.
+
+<h4 id="extendmarkdown">extendMarkdown</h4>
+
+The ``extendMarkdown`` method of a ``markdown.Extension`` class accepts two 
+arguments:
+
+* **``md``**:
+
+    A pointer to the instance of the Markdown class. You should use this to 
+    access the [OrderedDict][]s of processors and patterns. They are found 
+    under the following attributes:
+
+    * ``md.preprocessors``
+    * ``md.inlinePatterns``
+    * ``md.parser.blockprocessors``
+    * ``md.treepreprocessors``
+    * ``md.postprocessors``
+
+    Some other things you may want to access in the markdown instance are:
+
+    * ``md.htmlStash``
+    * ``md.output_formats``
+    * ``md.set_output_format()``
+    * ``md.registerExtension()``
+
+* **``md_globals``**:
+
+    Contains all the various global variables within the markdown module.
+
+Of course, with access to those items, theoretically you have the option to 
+changing anything through various [monkey_patching][] techniques. However, you 
+should be aware that the various undocumented or private parts of markdown 
+may change without notice and your monkey_patches may break with a new release.
+Therefore, what you really should be doing is inserting processors and patterns
+into the markdown pipeline. Consider yourself warned.
+
+[monkey_patching]: http://en.wikipedia.org/wiki/Monkey_patch
+
+A simple example:
+
+    class MyExtension(markdown.Extension):
+        def extendMarkdown(self, md, md_globals):
+            # Insert instance of 'mypattern' before 'references' pattern
+            md.inlinePatterns.add('mypattern', MyPattern(md), '<references')
+
+<h4 id="ordereddict">OrderedDict</h4>
+
+An OrderedDict is a dictionary like object that retains the order of it's
+items. The items are ordered in the order in which they were appended to
+the OrderedDict. However, an item can also be inserted into the OrderedDict
+in a specific location in relation to the existing items.
+
+Think of OrderedDict as a combination of a list and a dictionary as it has 
+methods common to both. For example, you can get and set items using the 
+``od[key] = value`` syntax and the methods ``keys()``, ``values()``, and 
+``items()`` work as expected with the keys, values and items returned in the 
+proper order. At the same time, you can use ``insert()``, ``append()``, and 
+``index()`` as you would with a list.
+
+Generally speaking, within Markdown extensions you will be using the special 
+helper method ``add()`` to add additional items to an existing OrderedDict. 
+
+The ``add()`` method accepts three arguments:
+
+* **``key``**: A string. The key is used for later reference to the item.
+
+* **``value``**: The object instance stored in this item.
+
+* **``location``**: Optional. The items location in relation to other items. 
+
+    Note that the location can consist of a few different values:
+
+    * The special strings ``"_begin"`` and ``"_end"`` insert that item at the 
+      beginning or end of the OrderedDict respectively. 
+    
+    * A less-than sign (``<``) followed by an existing key (i.e.: 
+      ``"<somekey"``) inserts that item before the existing key.
+    
+    * A greater-than sign (``>``) followed by an existing key (i.e.: 
+      ``">somekey"``) inserts that item after the existing key. 
+
+Consider the following example:
+
+    >>> import markdown
+    >>> od = markdown.OrderedDict()
+    >>> od['one'] =  1           # The same as: od.add('one', 1, '_begin')
+    >>> od['three'] = 3          # The same as: od.add('three', 3, '>one')
+    >>> od['four'] = 4           # The same as: od.add('four', 4, '_end')
+    >>> od.items()
+    [("one", 1), ("three", 3), ("four", 4)]
+
+Note that when building an OrderedDict in order, the extra features of the
+``add`` method offer no real value and are not necessary. However, when 
+manipulating an existing OrderedDict, ``add`` can be very helpful. So let's 
+insert another item into the OrderedDict.
+
+    >>> od.add('two', 2, '>one')         # Insert after 'one'
+    >>> od.values()
+    [1, 2, 3, 4]
+
+Now let's insert another item.
+
+    >>> od.add('twohalf', 2.5, '<three') # Insert before 'three'
+    >>> od.keys()
+    ["one", "two", "twohalf", "three", "four"]
+
+Note that we also could have set the location of "twohalf" to be 'after two'
+(i.e.: ``'>two'``). However, it's unlikely that you will have control over the 
+order in which extensions will be loaded, and this could affect the final 
+sorted order of an OrderedDict. For example, suppose an extension adding 
+'twohalf' in the above examples was loaded before a separate  extension which 
+adds 'two'. You may need to take this into consideration when adding your 
+extension components to the various markdown OrderedDicts.
+
+Once an OrderedDict is created, the items are available via key:
+
+    MyNode = od['somekey']
+
+Therefore, to delete an existing item:
+
+    del od['somekey']
+
+To change the value of an existing item (leaving location unchanged):
+
+    od['somekey'] = MyNewObject()
+
+To change the location of an existing item:
+
+    t.link('somekey', '<otherkey')
+
+<h4 id="registerextension">registerExtension</h4>
+
+Some extensions may need to have their state reset between multiple runs of the
+Markdown class. For example, consider the following use of the [[Footnotes]] 
+extension:
+
+    md = markdown.Markdown(extensions=['footnotes'])
+    html1 = md.convert(text_with_footnote)
+    md.reset()
+    html2 = md.convert(text_without_footnote)
+
+Without calling ``reset``, the footnote definitions from the first document will
+be inserted into the second document as they are still stored within the class
+instance. Therefore the ``Extension`` class needs to define a ``reset`` method
+that will reset the state of the extension (i.e.: ``self.footnotes = {}``).
+However, as many extensions do not have a need for ``reset``, ``reset`` is only
+called on extensions that are registered.
+
+To register an extension, call ``md.registerExtension`` from within your 
+``extendMarkdown`` method:
+
+
+    def extendMarkdown(self, md, md_globals):
+        md.registerExtension(self)
+        # insert processors and patterns here
+
+Then, each time ``reset`` is called on the Markdown instance, the ``reset`` 
+method of each registered extension will be called as well. You should also
+note that ``reset`` will be called on each registered extension after it is
+initialized the first time. Keep that in mind when over-riding the extension's
+``reset`` method.
+
+<h4 id="configsettings">Config Settings</h4>
+
+If an extension uses any parameters that the user may want to change,
+those parameters should be stored in ``self.config`` of your 
+``markdown.Extension`` class in the following format:
+
+    self.config = {parameter_1_name : [value1, description1],
+                   parameter_2_name : [value2, description2] }
+
+When stored this way the config parameters can be over-ridden from the
+command line or at the time Markdown is initiated:
+
+    markdown.py -x myextension(SOME_PARAM=2) inputfile.txt > output.txt
+
+Note that parameters should always be assumed to be set to string
+values, and should be converted at run time. For example:
+
+    i = int(self.getConfig("SOME_PARAM"))
+
+<h4 id="makeextension">makeExtension</h4>
+
+Each extension should ideally be placed in its own module starting
+with the  ``mdx_`` prefix (e.g. ``mdx_footnotes.py``).  The module must
+provide a module-level function called ``makeExtension`` that takes
+an optional parameter consisting of a dictionary of configuration over-rides 
+and returns an instance of the extension.  An example from the footnote 
+extension:
+
+    def makeExtension(configs=None) :
+        return FootnoteExtension(configs=configs)
+
+By following the above example, when Markdown is passed the name of your 
+extension as a string (i.e.: ``'footnotes'``), it will automatically import
+the module and call the ``makeExtension`` function initiating your extension.
+
+You may have noted that the extensions packaged with Python-Markdown do not
+use the ``mdx_`` prefix in their module names. This is because they are all
+part of the ``markdown.extensions`` package. Markdown will first try to import
+from ``markdown.extensions.extname`` and upon failure, ``mdx_extname``. If both
+fail, Markdown will continue without the extension.
+
+However, Markdown will also accept an already existing instance of an extension.
+For example:
+
+    import markdown
+    import myextension
+    configs = {...}
+    myext = myextension.MyExtension(configs=configs)
+    md = markdown.Markdown(extensions=[myext])
+
+This is useful if you need to implement a large number of extensions with more
+than one residing in a module.
+
+[Preprocessors]: #preprocessors
+[InlinePatterns]: #inlinepatterns
+[Treeprocessors]: #treeprocessors
+[Postprocessors]: #postprocessors
+[BlockParser]: #blockparser
+[Working with the ElementTree]: #working_with_et
+[Integrating your code into Markdown]: #integrating_into_markdown
+[extendMarkdown]: #extendmarkdown
+[OrderedDict]: #ordereddict
+[registerExtension]: #registerextension
+[Config Settings]: #configsettings
+[makeExtension]: #makeextension
+[ElementTree]: http://effbot.org/zone/element-index.htm