Selector expressions
====================

Selector expressions are a concise way of selecting some set of nodes in an RDF
graph, given a particular starting node. For example, given the graph:

.. include:: example-graph.rst.inc

evaluating the selector expression ``foaf:knows/foaf:name`` with a starting node of
``<bob>`` would yield two literal nodes: ``"Alice"`` and ``"Carol"``.

The syntax for selector expressions was inspired by `XPath`_ syntax, so if you
are familiar with XPath you will notice many similarities.

.. todo:: make graphviz output prettier

.. _XPath: http://www.w3.org/TR/xpath/

Syntax
------

Traversing
~~~~~~~~~~

At the heart of selector expressions is the notion of traversing a path through
the RDF graph along properties, in the same way XPath can traverse a document
tree. Selector expressions are always evaluated with respect to a context node
in the graph, which is the starting point for traversals.

The simplest traversal consists of a single RDF property, such as
``foaf:knows``. This expression selects all nodes which are the object of
a ``foaf:knows`` statement whose subject is the starting node. In other words,
it can be considered equivalent to the SPARQL query:

.. code-block:: none

    SELECT ?o
    WHERE { ?start foaf:knows ?o . }

(Strictly speaking, the *simplest* traversal is the empty string, which always
evaluates to the starting node. This is really only meaningful when used with
other syntax elements described below.)

If we used ``<bob>`` in the example graph above as our starting node, the
expression ``foaf:knows`` would evaluate to two resource nodes: ``<alice>`` and
``<carol>``. In general a selector expression may yield zero or more results.
For example, if we used ``<alice>`` as a starting node, the result would be
empty.

Multiple traversals may be chained together using ``/`` as a separator, as in
``foaf:knows/foaf:name``.

Note that property URIs are always given in their prefixed form. In order to
keep the syntax simple, there is no way to specify a complete URI reference in
a selector expression.
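
To make the traversal semantics concrete, here is a minimal, self-contained
sketch in plain Java. It is not how the library is implemented: the ``Triple``
record, the ``step`` and ``select`` helpers and the hard-coded graph are all
assumptions made for illustration, and nodes are written as plain strings
rather than Jena objects.

.. code-block:: java

    import java.util.ArrayList;
    import java.util.List;

    /** Toy model of property traversal; not the library's implementation. */
    final class TraversalSketch {
        /** One RDF statement, with nodes and properties as plain strings. */
        record Triple(String subject, String property, String object) {}

        // Assumed data, loosely mirroring the example graph described above.
        static final List<Triple> GRAPH = List.of(
            new Triple("<bob>", "foaf:knows", "<alice>"),
            new Triple("<bob>", "foaf:knows", "<carol>"),
            new Triple("<alice>", "foaf:name", "\"Alice\""),
            new Triple("<carol>", "foaf:name", "\"Carol\""));

        /** Follows one property from every node in starts. */
        static List<String> step(List<String> starts, String property) {
            List<String> out = new ArrayList<>();
            for (String start : starts)
                for (Triple t : GRAPH)
                    if (t.subject().equals(start) && t.property().equals(property))
                        out.add(t.object());
            return out;
        }

        /** Evaluates a chain such as "foaf:knows/foaf:name"; "" yields the start node. */
        static List<String> select(String start, String expression) {
            List<String> nodes = List.of(start);
            if (expression.isEmpty()) return nodes;
            for (String property : expression.split("/"))
                nodes = step(nodes, property);
            return nodes;
        }

        public static void main(String[] args) {
            System.out.println(select("<bob>", "foaf:knows"));
            System.out.println(select("<bob>", "foaf:knows/foaf:name"));
            System.out.println(select("<alice>", "foaf:knows"));
        }
    }

Running ``main`` prints ``[<alice>, <carol>]``, ``["Alice", "Carol"]`` and
``[]``. The order of each list simply follows the order of the hard-coded
triples; a real RDF graph makes no such promise.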

Inverse traversal
~~~~~~~~~~~~~~~~~

The direction of a property traversal can be inverted by prepending ``!`` to
the property name. For example, given some article as a starting node, the
expression ``dc:creator/!dc:creator/dc:title`` might be used to select the
titles of all articles written by the authors of the starting article.
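
The following sketch models inverse traversal over a tiny in-memory graph. The
article and author nodes, the ``Triple`` record and the helper methods are
hypothetical, and the code is an illustration of the idea rather than the
library's own evaluation logic.

.. code-block:: java

    import java.util.ArrayList;
    import java.util.List;

    /** Toy model of forward and inverse traversal steps. */
    final class InverseTraversalSketch {
        record Triple(String subject, String property, String object) {}

        // Hypothetical data: two articles that share one author.
        static final List<Triple> GRAPH = List.of(
            new Triple("<article1>", "dc:creator", "<jane>"),
            new Triple("<article2>", "dc:creator", "<jane>"),
            new Triple("<article1>", "dc:title", "\"First\""),
            new Triple("<article2>", "dc:title", "\"Second\""));

        /** Applies one step; a leading "!" walks from object back to subject. */
        static List<String> step(List<String> starts, String step) {
            boolean inverse = step.startsWith("!");
            String property = inverse ? step.substring(1) : step;
            List<String> out = new ArrayList<>();
            for (String start : starts)
                for (Triple t : GRAPH) {
                    if (!t.property().equals(property)) continue;
                    if (!inverse && t.subject().equals(start)) out.add(t.object());
                    if (inverse && t.object().equals(start)) out.add(t.subject());
                }
            return out;
        }

        /** Evaluates a non-empty chain of steps separated by "/". */
        static List<String> select(String start, String expression) {
            List<String> nodes = List.of(start);
            for (String s : expression.split("/")) nodes = step(nodes, s);
            return nodes;
        }
    }

In this sketch, ``select("<article1>", "dc:creator/!dc:creator/dc:title")``
yields both titles, because walking back along ``dc:creator`` from the author
reaches the starting article as well as its sibling.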

.. _predicates:

Predicates
~~~~~~~~~~

The set of nodes resulting from a traversal can be filtered with a predicate.
The predicate is given in square brackets (``[]``) following the property name.
Predicates may appear at any point in the chain of traversals.

The following predicates are supported:

``type``
    Includes only nodes of the given type. Use it like this:
    ``!dc:creator[type=bibo:Article]``.

``uri-prefix``
    Includes only resource nodes whose URI begins with the given string. Use it
    like this: ``dc:identifier[uri-prefix='urn:issn:']``.

Multiple predicates may be applied by joining them together with the ``and``
keyword, as in ``!dc:creator[type=bibo:Article and uri-prefix='http://example.com/']``.

Custom predicates may be defined at runtime by supplying a custom
:java:class:`PredicateResolver` implementation.
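
As a rough illustration of how predicates narrow a result set, the sketch
below models ``type`` and ``uri-prefix`` with ``java.util.function.Predicate``
and combines them the way ``and`` does. The node URIs, the ``TYPES`` table and
the method names are hypothetical; the library's real
:java:class:`PredicateResolver` interface is not used here.

.. code-block:: java

    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;

    /** Toy model of predicate filtering over nodes identified by URI. */
    final class PredicateSketch {
        // Hypothetical nodes: URI mapped to its set of types.
        static final Map<String, Set<String>> TYPES = Map.of(
            "http://example.com/a1", Set.of("bibo:Article"),
            "http://example.com/b1", Set.of("bibo:Book"),
            "http://example.org/a2", Set.of("bibo:Article"));

        /** Models [type=...]: keeps nodes that have the given type. */
        static Predicate<String> type(String wanted) {
            return uri -> TYPES.getOrDefault(uri, Set.of()).contains(wanted);
        }

        /** Models [uri-prefix='...']: keeps nodes whose URI starts with the prefix. */
        static Predicate<String> uriPrefix(String prefix) {
            return uri -> uri.startsWith(prefix);
        }

        /** Applies a predicate to the nodes produced by a traversal. */
        static List<String> filter(List<String> nodes, Predicate<String> predicate) {
            return nodes.stream().filter(predicate).collect(Collectors.toList());
        }
    }

For example, filtering all three nodes with
``type("bibo:Article").and(uriPrefix("http://example.com/"))`` keeps only
``http://example.com/a1``.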

.. _adaptations:

Adapting the result
~~~~~~~~~~~~~~~~~~~

The result of evaluating a traversal is zero or more RDF nodes (in Java,
implementations of Jena’s :java:class:`RDFNode
<com.hp.hpl.jena.rdf.model.RDFNode>` interface). However, it is often necessary
to convert these RDF nodes into a more useful data type, or to perform some
post-processing on them.

An adaptation is a function which takes an RDF node and “adapts” it in some
way. An adaptation can be specified at the end of a selector expression,
preceded by ``#`` and optionally followed by an argument list. For example, the
expression ``foaf:knows#uri`` would evaluate to the URIs of the people known to
the starting node. The distinction here is important: whereas ``foaf:knows``
evaluates to zero or more :java:class:`RDFNodes
<com.hp.hpl.jena.rdf.model.RDFNode>`, ``foaf:knows#uri`` evaluates to zero or
more :java:class:`Strings <java.lang.String>` giving the URI of each node.

The following adaptations are supported:

``uri``
    Returns the URI of the RDF node as a :java:class:`String
    <java.lang.String>`. Throws an exception if applied to a node which is not
    a resource.

``uri-slice``
    Returns a substring of the URI. This adaptation takes a single integer
    argument specifying the number of characters to be removed from the start
    of the URI. Use it like this:
    ``dc:identifier[uri-prefix='urn:issn:']#uri-slice(9)``.

``uri-anchor``
    Returns the anchor part of the URI, excluding the ``#`` character. Returns
    an empty string if there is no anchor part.

``lv``
    Short for “literal value”. Returns the value of the literal RDF node,
    converted to a Java object using Jena’s type conversion facilities (see
    :java:method:`Literal#getValue()
    <com.hp.hpl.jena.rdf.model.Literal#getValue()>`). Throws an exception if
    applied to a node which is not a literal.

``comparable-lv``
    Essentially the same as ``lv``, but with a runtime check to ensure the
    literal value implements :java:class:`Comparable <java.lang.Comparable>`.
    It exists only for type-safety reasons.

``string-lv``
    Like ``lv``, but additionally calls ``toString()`` on the resulting object
    to ensure it is always a String. This adaptation also strips all tags from
    XML literals.

``formatted-dt``
    Short for “formatted date-time”. This adaptation can only be applied to
    literal nodes whose values are represented as Joda datetime types. It takes
    a single string argument, specifying the date-time format to apply. Use it
    like this: ``dc:created#formatted-dt('d MMMM yyyy')``.

.. todo:: hacks for Joda are not in stock Jena

Custom adaptations may be defined at runtime by supplying a custom
:java:class:`AdaptationFactory` implementation.
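
To give a feel for what some of the built-in adaptations compute, the sketch
below mimics ``uri-slice``, ``uri-anchor`` and ``formatted-dt`` on plain
values. It is only an approximation under stated assumptions: the method names
are hypothetical, URIs are plain strings, and ``java.time`` stands in for Joda
(the letters in ``'d MMMM yyyy'`` have the same meaning in both).

.. code-block:: java

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;
    import java.util.Locale;

    /** Plain-Java approximations of three built-in adaptations. */
    final class AdaptationSketch {
        /** Models #uri-slice(n): drops the first n characters of the URI. */
        static String uriSlice(String uri, int n) {
            return uri.substring(n);
        }

        /** Models #uri-anchor: the text after '#', or "" if there is none. */
        static String uriAnchor(String uri) {
            int hash = uri.indexOf('#');
            return hash < 0 ? "" : uri.substring(hash + 1);
        }

        /** Models #formatted-dt('...'), using java.time instead of Joda. */
        static String formattedDate(LocalDate date, String pattern) {
            return date.format(DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH));
        }

        public static void main(String[] args) {
            System.out.println(uriSlice("urn:issn:1234-5678", 9));
            System.out.println(uriAnchor("http://example.com/doc#intro"));
            System.out.println(formattedDate(LocalDate.of(2010, 3, 5), "d MMMM yyyy"));
        }
    }

Running ``main`` prints ``1234-5678``, ``intro`` and ``5 March 2010``. The
argument ``9`` in the ``uri-slice`` example is simply the length of the
``urn:issn:`` prefix.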

Sorting the result
~~~~~~~~~~~~~~~~~~

RDF graphs by their nature do not define any ordering, so a selector expression
like ``foaf:knows`` will return its results in arbitrary order. When we expect
the result to contain more than one node, it is often useful to ensure
a predictable (repeatable) ordering of the resulting nodes.

Sorting can be applied at any point in the chain of traversals, by giving
a sort expression enclosed in parentheses (``()``). The sort expression can be
a complete selector expression (including multiple traversals, nested sorts,
and any other selector features). The set of nodes in the traversal is then
sorted by evaluating the sort expression for each node and using the resulting
values as sort keys. The sort expression may optionally be prepended with ``~``
to indicate a reverse sort.

For example, given an author as a starting node,
``!dc:creator(dc:title#comparable-lv)`` would evaluate to the works created by
that author, ordered by the title of each work.

Note that the sort expression must always evaluate to a Java object which
implements :java:class:`Comparable <java.lang.Comparable>`, so it is typically
necessary to apply the ``comparable-lv`` adaptation in the sort expression.

If one expression is not enough to uniquely sort each item in the result,
multiple sort expressions can be specified using ``,`` to separate them.
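
Sorting by a key expression behaves much like sorting with a
``java.util.Comparator`` built from a key extractor. The sketch below is an
analogy only: the ``Work`` record and its data are hypothetical, and the
comments map each comparator to the selector syntax it is meant to resemble.

.. code-block:: java

    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    /** Toy model of sort expressions as comparators over sort keys. */
    final class SortSketch {
        record Work(String node, String title, int year) {}

        // Hypothetical works by one author; two of them share a title.
        static final List<Work> WORKS = List.of(
            new Work("<w1>", "Zebra", 2001),
            new Work("<w2>", "Apple", 2005),
            new Work("<w3>", "Apple", 1999));

        // Resembles (dc:title#comparable-lv): the title is the sort key.
        static final Comparator<Work> BY_TITLE = Comparator.comparing(Work::title);

        /** Sorts the works and returns their node names. */
        static List<String> sortedNodes(Comparator<Work> order) {
            return WORKS.stream().sorted(order).map(Work::node).collect(Collectors.toList());
        }

        public static void main(String[] args) {
            // One key.
            System.out.println(sortedNodes(BY_TITLE));
            // Resembles a "~" prefix: the same key, reversed.
            System.out.println(sortedNodes(BY_TITLE.reversed()));
            // Resembles two keys separated by ",": title first, then year.
            System.out.println(sortedNodes(BY_TITLE.thenComparingInt(Work::year)));
        }
    }

Running ``main`` prints ``[<w2>, <w3>, <w1>]``, ``[<w1>, <w2>, <w3>]`` and
``[<w3>, <w2>, <w1>]``. With a single key the two works titled "Apple" keep
their input order, which in a real graph is arbitrary; adding the second key
is what makes the ordering fully repeatable.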

Selecting from many results
~~~~~~~~~~~~~~~~~~~~~~~~~~~

A sort expression may optionally be followed by a subscript ``[n]``, indicating
that only the node at index *n* in the sorted result should be selected.
Indices start at zero, so ``!dc:creator(~dc:date)[0]/dc:title`` might be used
to select the title of an author’s most recent work.
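
The combination of a reverse sort and a subscript can be pictured as "sort
descending, then take one element". The sketch below models
``!dc:creator(~dc:date)[n]/dc:title`` over hypothetical data. Returning an
empty ``Optional`` for an out-of-range index is an assumption of this sketch;
this document does not say how the library treats that case.

.. code-block:: java

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    /** Toy model of a reverse sort followed by a subscript. */
    final class SubscriptSketch {
        record Work(String title, String date) {}

        // Hypothetical works by one author; ISO dates sort correctly as strings.
        static final List<Work> WORKS = List.of(
            new Work("Early Notes", "1998-04-01"),
            new Work("Late Thoughts", "2009-11-30"),
            new Work("Middle Years", "2003-06-15"));

        /** Returns the title at zero-based index n after sorting newest first. */
        static Optional<String> titleAt(int n) {
            Comparator<Work> byDate = Comparator.comparing(Work::date);
            return WORKS.stream()
                .sorted(byDate.reversed())  // "~" reverses the sort
                .skip(n)                    // zero-based subscript
                .findFirst()
                .map(Work::title);
        }
    }

Here ``titleAt(0)`` yields ``Late Thoughts``, the most recent work, and
``titleAt(2)`` yields ``Early Notes``.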

Combining multiple expressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Selector expressions can be combined using ``|``. The result of the combined
expression is the result of each sub-expression, concatenated in sequence. For
example: ``!dc:creator | !bibo:translator``.
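
In list terms, ``|`` amounts to concatenation. The helper below is a
hypothetical illustration of that reading. It keeps duplicates when a node
appears in more than one sub-result, which is an assumption; this document does
not say whether the library removes them.

.. code-block:: java

    import java.util.ArrayList;
    import java.util.List;

    /** Toy model of "|" as left-to-right concatenation of results. */
    final class UnionSketch {
        /** Concatenates the results of the sub-expressions in the order given. */
        @SafeVarargs
        static <T> List<T> union(List<T>... results) {
            List<T> out = new ArrayList<>();
            for (List<T> result : results) out.addAll(result);
            return out;
        }
    }

For instance, if ``!dc:creator`` yielded ``[<w1>]`` and ``!bibo:translator``
yielded ``[<w2>, <w1>]``, this sketch would produce ``[<w1>, <w2>, <w1>]``.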

Evaluating expressions
----------------------

The following classes in the ``au.id.djc.rdftemplate.selector`` package are
relevant for compiling and evaluating selector expressions:

.. java:class:: au.id.djc.rdftemplate.selector.Selector<T>

    This interface represents the compiled version of a selector expression. It
    is parametrised on the result type of the expression.

    .. java:method:: java.lang.Class<T> getResultType()

        Returns the result type of this selector expression. (This is the runtime
        class of the type parameter T.) For a simple traversal this will be
        :java:class:`RDFNode <com.hp.hpl.jena.rdf.model.RDFNode>`, or if an
        adaptation is applied to the selector expression it will be the result
        type of the adaptation (such as :java:class:`String <java.lang.String>`
        or :java:class:`Object <java.lang.Object>`).

    .. java:method:: Selector<Other> withResultType(java.lang.Class<Other> otherType)

        A convenience method to cast the type parameter of this Selector. Always
        returns this instance. It exists only to keep Java’s static type
        checking happy.

    .. java:method:: java.util.List<T> result(com.hp.hpl.jena.rdf.model.RDFNode node)

        Evaluates this selector expression with respect to the given starting
        node, and returns the result.

    .. java:method:: T singleResult(com.hp.hpl.jena.rdf.model.RDFNode node)

        Evaluates this selector expression with respect to the given starting
        node, and returns the result. If the selector does not evaluate to
        exactly one node, an exception is thrown.
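
The difference between ``result`` and ``singleResult`` can be sketched with
a small stand-in interface. ``MiniSelector`` is hypothetical: it takes a plain
string instead of an ``RDFNode`` so that the example runs without Jena, and
the exception type it throws is an assumption, since this document does not
name the one the library uses.

.. code-block:: java

    import java.util.List;

    /** Toy model of the result / singleResult contract. */
    final class SingleResultSketch {
        /** Hypothetical stand-in for Selector<T>, with String start nodes. */
        interface MiniSelector<T> {
            List<T> result(String node);

            /** Returns the only result, or fails unless there is exactly one. */
            default T singleResult(String node) {
                List<T> all = result(node);
                if (all.size() != 1)
                    throw new IllegalStateException(
                        "Expected exactly one result but got " + all.size());
                return all.get(0);
            }
        }

        // Hypothetical selector: <bob> knows two people, everyone else knows one.
        static final MiniSelector<String> NAMES =
            node -> node.equals("<bob>") ? List.of("Alice", "Carol") : List.of("Dave");
    }

With this sketch, ``NAMES.singleResult("<eve>")`` returns ``Dave``, while
``NAMES.singleResult("<bob>")`` throws because two results are available. Use
``result`` whenever zero or several results are legitimate.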

.. java:class:: au.id.djc.rdftemplate.selector.AntlrSelectorFactory

    Use this class to compile selector expressions into :java:class:`Selector`
    instances. Instances of this class can safely be shared across threads (for
    example, as singleton beans in Spring).

    .. java:method:: au.id.djc.rdftemplate.selector.Selector<?> get(java.lang.String expression)

        Compiles the given selector expression into a :java:class:`Selector`
        instance.

        .. code-block:: java

            // "factory" is an AntlrSelectorFactory instance
            Selector<RDFNode> s1 = factory.get("foaf:knows").withResultType(RDFNode.class);
            Selector<String> s2 = factory.get("foaf:knows/foaf:name#string-lv").withResultType(String.class);

    .. java:method:: void setAdaptationFactory(au.id.djc.rdftemplate.selector.AdaptationFactory adaptationFactory)

        Configures a custom :java:class:`AdaptationFactory` implementation for
        selectors created by this factory. If this setter is not called, an
        instance of :java:class:`DefaultAdaptationFactory
        <au.id.djc.rdftemplate.selector.DefaultAdaptationFactory>` will be used.

    .. java:method:: void setPredicateResolver(au.id.djc.rdftemplate.selector.PredicateResolver predicateResolver)

        Configures a custom :java:class:`PredicateResolver` implementation for
        selectors created by this factory. If this setter is not called, an
        instance of :java:class:`DefaultPredicateResolver
        <au.id.djc.rdftemplate.selector.DefaultPredicateResolver>` will be used.

    .. java:method:: void setNamespacePrefixMap(java.util.Map<String, String> namespacePrefixMap)

        Configures namespace prefix mappings for selectors created by this
        factory. If this setter is not called, no namespace prefixes will be
        defined.
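
Because selector expressions only accept prefixed property names, every prefix
an expression uses has to appear in the namespace prefix map. The sketch below
shows the kind of expansion such a map makes possible. The ``expand`` helper
and its error handling are hypothetical and do not reflect the library's
internals; the FOAF namespace URI is the real one.

.. code-block:: java

    import java.util.Map;

    /** Toy model of expanding a prefixed name with a namespace prefix map. */
    final class PrefixSketch {
        /** Expands a name such as "foaf:knows" into a full URI. */
        static String expand(Map<String, String> prefixes, String prefixedName) {
            int colon = prefixedName.indexOf(':');
            if (colon < 0)
                throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
            String namespace = prefixes.get(prefixedName.substring(0, colon));
            if (namespace == null)
                throw new IllegalArgumentException("Undefined prefix in: " + prefixedName);
            return namespace + prefixedName.substring(colon + 1);
        }

        public static void main(String[] args) {
            Map<String, String> prefixes = Map.of("foaf", "http://xmlns.com/foaf/0.1/");
            System.out.println(expand(prefixes, "foaf:knows"));
        }
    }

Running ``main`` prints ``http://xmlns.com/foaf/0.1/knows``. A map of the same
shape (prefix to namespace URI) is what ``setNamespacePrefixMap`` expects.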

.. java:class:: au.id.djc.rdftemplate.selector.AdaptationFactory

    Implement this interface if you would like to use custom adaptations in your
    selector expressions.

    Your implementation should fall back to
    a :java:class:`DefaultAdaptationFactory
    <au.id.djc.rdftemplate.selector.DefaultAdaptationFactory>` instance, so that
    selector expressions have access to the built-in adaptations in addition to
    your custom ones.

.. java:class:: au.id.djc.rdftemplate.selector.PredicateResolver

    Implement this interface if you would like to use custom predicates in your
    selector expressions.

    Your implementation should fall back to
    a :java:class:`DefaultPredicateResolver
    <au.id.djc.rdftemplate.selector.DefaultPredicateResolver>` instance, so that
    selector expressions have access to the built-in predicates in addition to
    your custom ones.
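
The fall-back advice for both interfaces is ordinary delegation: answer the
names you know, and hand everything else to the default implementation. Since
the methods of :java:class:`AdaptationFactory` and
:java:class:`PredicateResolver` are not listed in this document, the sketch
below uses a hypothetical ``Adaptations`` interface with a single lookup
method, and a stand-in ``BUILT_IN`` instance in place of the real default.

.. code-block:: java

    import java.util.Map;
    import java.util.function.UnaryOperator;

    /** Toy model of a custom factory that delegates unknown names. */
    final class FallbackSketch {
        /** Hypothetical shape: looks up an adaptation by name. */
        interface Adaptations {
            UnaryOperator<String> byName(String name);
        }

        /** Stand-in for the default factory; knows one identity adaptation. */
        static final Adaptations BUILT_IN = name -> {
            if (name.equals("uri")) return UnaryOperator.identity();
            throw new IllegalArgumentException("Unknown adaptation: " + name);
        };

        /** Serves custom adaptations first and delegates every other name. */
        static Adaptations withFallback(Map<String, UnaryOperator<String>> custom,
                                        Adaptations fallback) {
            return name -> custom.containsKey(name) ? custom.get(name) : fallback.byName(name);
        }

        /** Builds a factory with one hypothetical custom adaptation, "upper". */
        static Adaptations demo() {
            Map<String, UnaryOperator<String>> custom = Map.of("upper", s -> s.toUpperCase());
            return withFallback(custom, BUILT_IN);
        }
    }

With this arrangement ``demo().byName("upper")`` resolves to the custom
adaptation, ``demo().byName("uri")`` is answered by the fall-back, and an
unknown name still fails in one place.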

.. java:class:: au.id.djc.rdftemplate.selector.EternallyCachingSelectorFactory

    Wrap an :java:class:`AntlrSelectorFactory` with this class if you want to
    avoid compiling selectors anew every time. Do not use this class if the
    number of different selector expressions is unbounded, as the cache never
    evicts entries and will eventually exhaust the heap.
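
The trade-off described above is that of any unbounded memoising cache. The
sketch below shows the general technique with a ``ConcurrentHashMap``. It is
not the source of ``EternallyCachingSelectorFactory``: the ``CachingSketch``
class is hypothetical, and a plain ``Function`` stands in for the wrapped
factory.

.. code-block:: java

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    /** Toy model of a factory that caches every compiled expression forever. */
    final class CachingSketch<V> {
        private final Map<String, V> cache = new ConcurrentHashMap<>();
        private final Function<String, V> compiler;

        /** Wraps the given compile function (a stand-in for the real factory). */
        CachingSketch(Function<String, V> compiler) {
            this.compiler = compiler;
        }

        /** Compiles the expression on first use and reuses the result afterwards. */
        V get(String expression) {
            return cache.computeIfAbsent(expression, compiler);
        }

        /** Number of cached entries; it only ever grows. */
        int size() {
            return cache.size();
        }
    }

Each distinct expression string adds one entry that is never removed, which is
harmless for a fixed set of expressions in templates but not for expressions
assembled from user input.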