aboutsummaryrefslogtreecommitdiff
path: root/docs/Notes_on_WST_StructuredDocument.txt
diff options
context:
space:
mode:
Diffstat (limited to 'docs/Notes_on_WST_StructuredDocument.txt')
-rwxr-xr-xdocs/Notes_on_WST_StructuredDocument.txt183
1 files changed, 0 insertions, 183 deletions
diff --git a/docs/Notes_on_WST_StructuredDocument.txt b/docs/Notes_on_WST_StructuredDocument.txt
deleted file mode 100755
index dcf124d56..000000000
--- a/docs/Notes_on_WST_StructuredDocument.txt
+++ /dev/null
@@ -1,183 +0,0 @@
-Notes on WST StructuredDocument
--------------------------------
-
-Created: 2010/11/26
-References: WST 3.1.x, Eclipse 3.5 Galileo
-
-To manipulate XML documents in refactorings, we sometimes use the WST/SEE
-"StructuredDocument" API. There isn't exactly a lot of documentation on
-this out there, so this is a short explanation of how it works, totally
-based on _empirical_ evidence. As such, it must be taken with a grain of salt.
-
-Examples of usage can be found in
- sdk/eclipse/plugins/com.android.ide.eclipse.adt/src/com/android/ide/eclipse/adt/internal/refactorings/
-
-
-1- Get a document instance
---------------------------
-
-To get a document from an existing IFile resource:
-
- IModelManager modelMan = StructuredModelManager.getModelManager();
- IStructuredDocument sdoc = modelMan.createStructuredDocumentFor(file);
-
-Note that the IStructuredDocument and all the associated interfaces we'll use
-below are all located in org.eclipse.wst.sse.core.internal.provisional,
-meaning they _might_ change later.
-
-Also note that this parses the content of the file on disk, not of a buffer
-with pending unsaved modifications opened in an editor.
-
-There is a counterpart for non-existent resources:
-
- IModelManager.createNewStructuredDocumentFor(IFile)
-
-However our goal so far has been to _parse_ existing documents, find
-the place that we wanted to modify and then generate a TextFileChange
-for a refactoring operation. Consequently this document doesn't say
-anything about using this model to modify content directly.
-
-
-2- Structured Document overview
--------------------------------
-
-The IStructuredDocument is organized in "regions", which are little pieces
-of text.
-
-The document contains a list of region collections, each one being
-a list of regions. Each region has a type, as well as text.
-
-Since we use this to parse XML, let's look at this XML example:
-
-<?xml version="1.0" encoding="utf-8"?> \n
-<resource> \n
- <color/>
- <string name="my_string">Some Value</string> <!-- comment -->\n
-</resource>
-
-
-This will result in the following regions and sub-regions:
-(all the constants below are located in DOMRegionContext)
-
-XML_PI_OPEN
- XML_PI_OPEN:<?
- XML_TAG_NAME:xml
- XML_TAG_ATTRIBUTE_NAME:version
- XML_TAG_ATTRIBUTE_EQUALS:=
- XML_TAG_ATTRIBUTE_VALUE:"1.0"
- XML_TAG_ATTRIBUTE_NAME:encoding
- XML_TAG_ATTRIBUTE_EQUALS:=
- XML_TAG_ATTRIBUTE_VALUE:"utf-8"
- XML_PI_CLOSE:?>
-
-XML_CONTENT
- XML_CONTENT:\n
-
-XML_TAG_NAME
- XML_TAG_OPEN:<
- XML_TAG_NAME:resources
- XML_TAG_CLOSE:>
-
-XML_CONTENT
- XML_CONTENT:\n + whitespace before color
-
-XML_TAG_NAME
- XML_TAG_OPEN:<
- XML_TAG_NAME:color
- XML_EMPTY_TAG_CLOSE:/>
-
-XML_CONTENT
- XML_CONTENT:\n + whitespace before string
-
-XML_TAG_NAME
- XML_TAG_OPEN:<
- XML_TAG_NAME:string
- XML_TAG_ATTRIBUTE_NAME:name
- XML_TAG_ATTRIBUTE_EQUALS:=
- XML_TAG_ATTRIBUTE_VALUE:"my_string"
- XML_TAG_CLOSE:>
-
-XML_CONTENT
- XML_CONTENT:Some Value
-
-XML_TAG_NAME
- XML_END_TAG_OPEN:</
- XML_TAG_NAME:string
- XML_TAG_CLOSE:>
-
-XML_CONTENT
- XML_CONTENT: (2 spaces before the comment)
-
-XML_COMMENT_TEXT
- XML_COMMENT_OPEN:<!--
- XML_COMMENT_TEXT: comment
- XML_COMMENT_CLOSE:--
-
-XML_CONTENT
- XML_CONTENT: \n after comment
-
-XML_TAG_NAME
- XML_END_TAG_OPEN:</
- XML_TAG_NAME:resources
- XML_TAG_CLOSE:>
-
-XML_CONTENT
- XML_CONTENT:
-
-
-3- Iterating through regions
-----------------------------
-
-To iterate through all regions, we need to process the list of top-level regions and then
-iterate over inner regions:
-
- for (IStructuredDocumentRegion regions : sdoc.getStructuredDocumentRegions()) {
- // process inner regions
- for (int i = 0; i < regions.getNumberOfRegions(); i++) {
- ITextRegion region = regions.getRegions().get(i);
- String type = region.getType();
- String text = regions.getText(region);
- }
- }
-
-Each "region collection" basically matches one XML tag, with sub-regions for all the tokens
-inside a tag.
-
-Note that an XML_CONTENT region is actually the whitespace, was is known as a TEXT in the w3c DOM.
-
-Also note that each outer region has a type, but the inner regions also reuse a similar type.
-So for example an outer XML_TAG_NAME region collection is a proper XML tag, and it will contain
-an opening tag, a closing tag but also an XML_TAG_NAME that is the tag name itself.
-
-Surprisingly, the inner regions do not have many access methods we can use on them, except their
-type and start/length/end. There are two length and end methods:
-- getLength() and getEnd() take any whitespace into account.
-- getTextLength() and getTextEnd() exclude some typical trailing whitespace.
-
-Note that regarding the trailing whitespace, empirical evidence shows that in the XML case
-here, the only case where it matters is in a tag such as <string name="my_string">: for the
-XML_TAG_NAME region, getLength is 7 (string + space) and getTextLength is 6 (string, no space).
-Spacing between XML element is its own collapsed region.
-
-If you want the text of the inner region, you actually need to query it from the outer region.
-The outer IStructuredDocumentRegion (the region collection) contains lots more useful access
-methods, some of which return details on the inner regions:
-- getText : without the whitespace.
-- getFullText : with the whitespace.
-- getStart / getLength / getEnd : type-dependent offset, including whitespace.
-- getStart / getTextLength / getTextEnd : type-dependent offset, excluding "irrelevant" whitespace.
-- getStartOffset / getEndOffset / getTextEndOffset : relative to document.
-
-Empirical evidence shows that there is no discernible difference between the getStart/getEnd
-values and those returned by getStartOffset/getEndOffset. Please abide by the javadoc.
-
-All offsets start at zero.
-
-Given a region collection, you can also browse regions either using a getRegions() list, or
-using getFirst/getLastRegion, or using getRegionAtCharacterOffset(). Iterating the region
-list seems the most useful scenario. There's no actual iterator provided for inner regions.
-
-There are a few other methods available in the regions classes. This was not an exhaustive list.
-
-
-----