aboutsummaryrefslogtreecommitdiff
path: root/Readme.md
blob: 5ab88f86d3be4e74e557b0aadb2f6acc55ff21c5 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
xml-rs, an XML library for Rust
===============================

[![Build Status][build-status-img]](https://github.com/netvl/xml-rs/actions?query=workflow%3ACI)
[![crates.io][crates-io-img]](https://crates.io/crates/xml-rs)
[![docs][docs-img]](https://docs.rs/xml-rs/)

[Documentation](https://docs.rs/xml-rs/)

  [build-status-img]: https://img.shields.io/github/workflow/status/netvl/xml-rs/CI/master?style=flat-square
  [crates-io-img]: https://img.shields.io/crates/v/xml-rs.svg?style=flat-square
  [docs-img]: https://img.shields.io/badge/docs-latest%20release-6495ed.svg?style=flat-square

xml-rs is an XML library for [Rust](http://www.rust-lang.org/) programming language.
It is heavily inspired by Java [Streaming API for XML (StAX)][stax].

  [stax]: https://en.wikipedia.org/wiki/StAX

This library currently contains pull parser much like [StAX event reader][stax-reader].
It provides iterator API, so you can leverage Rust's existing iterators library features.

  [stax-reader]: http://docs.oracle.com/javase/8/docs/api/javax/xml/stream/XMLEventReader.html

It also provides a streaming document writer much like [StAX event writer][stax-writer].
This writer consumes its own set of events, but reader events can be converted to
writer events easily, and so it is possible to write XML transformation chains in a pretty
clean manner.

  [stax-writer]: http://docs.oracle.com/javase/8/docs/api/javax/xml/stream/XMLEventWriter.html

This parser is mostly full-featured, however, there are limitations:
* no other encodings but UTF-8 are supported yet, because no stream-based encoding library
  is available now; when (or if) one will be available, I'll try to make use of it;
* DTD validation is not supported, `<!DOCTYPE>` declarations are completely ignored; thus no
  support for custom entities too; internal DTD declarations are likely to cause parsing errors;
* attribute value normalization is not performed, and end-of-line characters are not normalized too.

Other than that the parser tries to be mostly XML-1.0-compliant.

Writer is also mostly full-featured with the following limitations:
* no support for encodings other than UTF-8, for the same reason as above;
* no support for emitting `<!DOCTYPE>` declarations;
* more validations of input are needed, for example, checking that namespace prefixes are bounded
  or comments are well-formed.

What is planned (highest priority first, approximately):

0. missing features required by XML standard (e.g. aforementioned normalization and
   proper DTD parsing);
1. miscellaneous features of the writer;
2. parsing into a DOM tree and its serialization back to XML text;
3. SAX-like callback-based parser (fairly easy to implement over pull parser);
4. DTD validation;
5. (let's dream a bit) XML Schema validation.

Building and using
------------------

xml-rs uses [Cargo](http://crates.io), so just add a dependency section in your project's manifest:

```toml
[dependencies]
xml-rs = "0.8"
```

The package exposes a single crate called `xml`:

```rust
extern crate xml;
```

Reading XML documents
---------------------

`xml::reader::EventReader` requires a `Read` instance to read from. When a proper stream-based encoding
library is available, it is likely that xml-rs will be switched to use whatever character stream structure
this library would provide, but currently it is a `Read`.

Using `EventReader` is very straightforward. Just provide a `Read` instance to obtain an iterator
over events:

```rust,no_run
extern crate xml;

use std::fs::File;
use std::io::BufReader;

use xml::reader::{EventReader, XmlEvent};

fn indent(size: usize) -> String {
    const INDENT: &'static str = "    ";
    (0..size).map(|_| INDENT)
             .fold(String::with_capacity(size*INDENT.len()), |r, s| r + s)
}

fn main() {
    let file = File::open("file.xml").unwrap();
    let file = BufReader::new(file);

    let parser = EventReader::new(file);
    let mut depth = 0;
    for e in parser {
        match e {
            Ok(XmlEvent::StartElement { name, .. }) => {
                println!("{}+{}", indent(depth), name);
                depth += 1;
            }
            Ok(XmlEvent::EndElement { name }) => {
                depth -= 1;
                println!("{}-{}", indent(depth), name);
            }
            Err(e) => {
                println!("Error: {}", e);
                break;
            }
            _ => {}
        }
    }
}
```

`EventReader` implements `IntoIterator` trait, so you can just use it in a `for` loop directly.
Document parsing can end normally or with an error. Regardless of exact cause, the parsing
process will be stopped, and iterator will terminate normally.

You can also have finer control over when to pull the next event from the parser using its own
`next()` method:

```rust,ignore
match parser.next() {
    ...
}
```

Upon the end of the document or an error the parser will remember that last event and will always
return it in the result of `next()` call afterwards. If iterator is used, then it will yield
error or end-of-document event once and will produce `None` afterwards.

It is also possible to tweak parsing process a little using `xml::reader::ParserConfig` structure.
See its documentation for more information and examples.

You can find a more extensive example of using `EventReader` in `src/analyze.rs`, which is a
small program (BTW, it is built with `cargo build` and can be run after that) which shows various
statistics about specified XML document. It can also be used to check for well-formedness of
XML documents - if a document is not well-formed, this program will exit with an error.

Writing XML documents
---------------------

xml-rs also provides a streaming writer much like StAX event writer. With it you can write an
XML document to any `Write` implementor.

```rust,no_run
extern crate xml;

use std::fs::File;
use std::io::{self, Write};

use xml::writer::{EventWriter, EmitterConfig, XmlEvent, Result};

fn handle_event<W: Write>(w: &mut EventWriter<W>, line: String) -> Result<()> {
    let line = line.trim();
    let event: XmlEvent = if line.starts_with("+") && line.len() > 1 {
        XmlEvent::start_element(&line[1..]).into()
    } else if line.starts_with("-") {
        XmlEvent::end_element().into()
    } else {
        XmlEvent::characters(&line).into()
    };
    w.write(event)
}

fn main() {
    let mut file = File::create("output.xml").unwrap();

    let mut input = io::stdin();
    let mut output = io::stdout();
    let mut writer = EmitterConfig::new().perform_indent(true).create_writer(&mut file);
    loop {
        print!("> "); output.flush().unwrap();
        let mut line = String::new();
        match input.read_line(&mut line) {
            Ok(0) => break,
            Ok(_) => match handle_event(&mut writer, line) {
                Ok(_) => {}
                Err(e) => panic!("Write error: {}", e)
            },
            Err(e) => panic!("Input error: {}", e)
        }
    }
}
```

The code example above also demonstrates how to create a writer out of its configuration.
Similar thing also works with `EventReader`.

The library provides an XML event building DSL which helps to construct complex events,
e.g. ones having namespace definitions. Some examples:

```rust,ignore
// <a:hello a:param="value" xmlns:a="urn:some:document">
XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document")

// <hello b:config="name" xmlns="urn:default:uri">
XmlEvent::start_element("hello").attr("b:config", "value").default_ns("urn:defaul:uri")

// <![CDATA[some unescaped text]]>
XmlEvent::cdata("some unescaped text")
```

Of course, one can create `XmlEvent` enum variants directly instead of using the builder DSL.
There are more examples in `xml::writer::XmlEvent` documentation.

The writer has multiple configuration options; see `EmitterConfig` documentation for more
information.

Other things
------------

No performance tests or measurements are done. The implementation is rather naive, and no specific
optimizations are made. Hopefully the library is sufficiently fast to process documents of common size.
I intend to add benchmarks in future, but not until more important features are added.

Known issues
------------

All known issues are present on GitHub issue tracker: <http://github.com/netvl/xml-rs/issues>.
Feel free to post any found problems there.

License
-------

This library is licensed under MIT license.

---
Copyright (C) Vladimir Matveev, 2014-2020