Parsing support for the network document meta-format
The meta-format used by Tor network documents evolved over time from a legacy line-oriented format. It's described more fully in Tor's dir-spec.txt.
In brief, a network document is a sequence of tokenize::Items. Each Item starts with a keyword::Keyword, takes a number of arguments on the same line, and is optionally followed by a PEM-like base64-encoded object.
Individual document types define further restrictions on the Items. They may require Items with a particular keyword to have a certain number of arguments, to have (or not have) a particular kind of object, to appear a certain number of times, and so on.
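To make the item structure concrete, here is a minimal, self-contained sketch of tokenizing that line-oriented shape. The `Item` struct and `tokenize` function are simplified stand-ins for illustration, not the crate's actual `tokenize::Item` or `NetDocReader` API:

```rust
// Hypothetical simplified item: a keyword, its same-line arguments,
// and an optional object taken from a BEGIN/END block.
#[derive(Debug, PartialEq)]
struct Item<'a> {
    keyword: &'a str,
    args: Vec<&'a str>,
    object: Option<String>, // raw lines between the BEGIN and END markers
}

// Split a document into items: each non-blank line starts an item with
// its keyword and arguments; a following "-----BEGIN ...-----" block
// attaches an object to that item.
fn tokenize(doc: &str) -> Vec<Item<'_>> {
    let mut items = Vec::new();
    let mut lines = doc.lines().peekable();
    while let Some(line) = lines.next() {
        let mut words = line.split_whitespace();
        let keyword = match words.next() {
            Some(k) => k,
            None => continue, // skip blank lines
        };
        let args: Vec<&str> = words.collect();
        let mut object = None;
        if let Some(next) = lines.peek() {
            if next.starts_with("-----BEGIN") {
                lines.next(); // consume the BEGIN line
                let mut body = String::new();
                for l in lines.by_ref() {
                    if l.starts_with("-----END") {
                        break;
                    }
                    body.push_str(l);
                }
                object = Some(body);
            }
        }
        items.push(Item { keyword, args, object });
    }
    items
}
```

A real parser also has to handle malformed BEGIN/END blocks and escaping; this sketch only shows the keyword/arguments/object shape described above.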
More complex documents can be divided into parser::Sections. A Section might correspond to the header or footer of a longer document, or to a single stanza in a longer document.
To parse a document into a Section, the programmer defines a type of keyword that the document will use, using the decl_keyword! macro. The programmer then defines a parser::SectionRules object, containing a rules::TokenFmt describing the rules for each allowed keyword in the section. Finally, the programmer uses a tokenize::NetDocReader to tokenize the document, passing the stream of tokens to the SectionRules object to validate and parse it into a Section.
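The kind of per-keyword validation that SectionRules performs can be sketched in plain Rust. The `Rule` struct and `validate` function below are hypothetical stand-ins for the crate's TokenFmt/SectionRules machinery, showing only the concept of per-keyword constraints:

```rust
use std::collections::HashMap;

// Hypothetical per-keyword rule: how often the keyword may appear,
// how many arguments it needs, and whether it must carry an object.
#[derive(Clone, Copy)]
struct Rule {
    min_count: usize,
    max_count: usize,
    min_args: usize,
    object_required: bool,
}

// Validate a stream of (keyword, n_args, has_object) triples against
// the rules, rejecting unknown keywords and constraint violations.
fn validate(
    rules: &HashMap<&str, Rule>,
    items: &[(&str, usize, bool)],
) -> Result<(), String> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &(kw, n_args, has_obj) in items {
        let rule = rules
            .get(kw)
            .ok_or_else(|| format!("unrecognized keyword {kw}"))?;
        if n_args < rule.min_args {
            return Err(format!("{kw}: too few arguments"));
        }
        if rule.object_required && !has_obj {
            return Err(format!("{kw}: missing object"));
        }
        *counts.entry(kw).or_insert(0) += 1;
    }
    for (kw, rule) in rules {
        let c = counts.get(kw).copied().unwrap_or(0);
        if c < rule.min_count || c > rule.max_count {
            return Err(format!("{kw}: appeared {c} times"));
        }
    }
    Ok(())
}
```

The real crate additionally collects the validated tokens by type into a Section for typed access, rather than just returning Ok/Err.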
For multiple-section documents, this crate uses Itertools::peeking_take_while (via the NetDocReader::pause_at convenience method) and a batching_split_before module, which can split a document item iterator into sections.
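The "split before a delimiter keyword" idea behind that section splitting can be illustrated without the crate's types. The `split_before` function below is a hypothetical sketch, not the batching_split_before API: it groups a flat list of keywords into sections, starting a new section whenever a section-opening keyword appears:

```rust
// Illustrative only: group a flat item stream into sections, opening a
// new section each time `starts_section` matches the current keyword.
fn split_before<'a>(
    items: &[&'a str],
    starts_section: impl Fn(&str) -> bool,
) -> Vec<Vec<&'a str>> {
    let mut sections: Vec<Vec<&'a str>> = Vec::new();
    for &kw in items {
        if starts_section(kw) || sections.is_empty() {
            sections.push(Vec::new());
        }
        sections.last_mut().unwrap().push(kw);
    }
    sections
}
```

This mirrors how a consensus-style document might be cut into a header, one stanza per "r" line, and a footer, with each resulting batch then validated by its own set of section rules.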
Modules
- keyword: Declaration for the Keyword trait.
- macros: Declares macros to help implementing parsers.
- parser: Based on a set of rules, validate a token stream and collect the tokens by type.
- rules: Keywords for interpreting items and rules for validating them.
- tokenize: Break a string into a set of directory-object Items.