repoze.urispace – Hierarchical URI-based metadata

Author: Tres Seaver
Version: 0.1

Overview

repoze.urispace implements the URISpace [1] 1.0 specification, as proposed to the W3C by Akamai. It offers that language as a vehicle for asserting declarative metadata about a resource, based on pattern matching against the resource's URI.

Once asserted, such metadata can be used to guide the application in serving the resource, with possible applications including:

  • Setting cache control headers.
  • Selecting externally applied themes, e.g. in Deliverance.
  • Restricting access, e.g. to emulate Zope’s “placeful security.”

URISpace Specification

The URISpace [1] specification provides for matching on the following portions of a URI:

  • scheme

  • authority (see URIRFC [2])

    o host, including wildcarding (leading only) and port

    o user (if specified in the URI)

  • path elements, including nesting and wildcarding, as well as parameters, where used.

  • query elements, including tests for presence or for a specific value

  • fragments (likely irrelevant for server-side applications)

Note

repoze.urispace does not yet provide support for fragment matching.
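
As a rough illustration of the portions listed above, the Python standard library can split a URI into much the same pieces. This is only a sketch using Python 3's urllib.parse: the function name uri_portions and the sample URI are made up for the example, and this is not necessarily how repoze.urispace itself decomposes a URI.

```python
from urllib.parse import urlsplit, parse_qs

def uri_portions(uri):
    """Split a URI into the portions that URISpace selectors can match.

    Hypothetical helper for illustration; not part of repoze.urispace.
    """
    parts = urlsplit(uri)
    # Path elements: drop the empty string produced by the leading slash.
    elements = parts.path.split('/')
    if len(elements) > 1 and elements[0] == '':
        elements = elements[1:]
    return {
        'scheme': parts.scheme,
        'user': parts.username,    # None when the URI names no user
        'host': parts.hostname,
        'port': parts.port,        # None when no port is given
        'path': elements,
        # keep_blank_values lets a "presence only" query element survive.
        'query': parse_qs(parts.query, keep_blank_values=True),
        'fragment': parts.fragment,
    }

portions = uri_portions(
    'http://alice@www.example.com:8080/news/world/index.html?style=dark&debug#top')
```

Here the query element "debug" has no value, which is the case a "test for presence" selector would care about, while "style" carries a specific value.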

Match statements against these URI portions are called selectors. An individual selector may be scalar, may contain multiple values separated by whitespace, or may use RDF Bags and Alternates to indicate groups of possible matches.

Note

repoze.urispace does not yet support parsing multi-option selectors, whether they are expressed using RDF or separated by whitespace.

When multiple matches occur within a single selector or within sibling selectors, the most specific match takes precedence. In cases where there are multiple matches of equal specificity, the first such match takes precedence.

Note

repoze.urispace does not yet observe these rules. Currently the final match among siblings will get precedence, regardless of specificity.
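
The gap between the specified precedence and the current behaviour can be sketched as follows. Several things here are assumptions made only for the illustration: shell-style wildcards via fnmatch stand in for URISpace matching, "specificity" is approximated by counting non-wildcard characters, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

def specificity(pattern):
    """Assumed measure: the number of literal (non-wildcard) characters."""
    return sum(1 for ch in pattern if ch not in '*?')

def pick_by_spec(patterns, value):
    """Specified rule: most specific match wins; ties go to the first match."""
    matches = [p for p in patterns if fnmatchcase(value, p)]
    if not matches:
        return None
    # max() returns the first maximal item, so earlier siblings win ties.
    return max(matches, key=specificity)

def pick_current(patterns, value):
    """Current behaviour per the note above: the final matching sibling wins."""
    matches = [p for p in patterns if fnmatchcase(value, p)]
    return matches[-1] if matches else None

siblings = ['index.html', '*.html']
by_spec = pick_by_spec(siblings, 'index.html')   # the literal pattern
current = pick_current(siblings, 'index.html')   # the later wildcard pattern
```

One practical consequence: until the specified rules are observed, sibling selectors are best ordered from most general to most specific.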

The asserted metadata can be scalar, or can use RDF Bags and Sequences to indicate sets or ordered collections.

Note

repoze.urispace does not yet provide support for parsing multi-valued assertions using RDF.

Operators are provided to allow for incrementally updating or clearing the value for a given metadata element. Specified operators include:

replace
    Completely replace any previously asserted value with a new one. This is the default operator.
clear
    Remove any previously asserted value.
union
    Perform a set union: old | new
intersection
    Perform a set intersection: old & new
rev-intersection
    Perform a set exclusion (symmetric difference): old ^ new
difference
    Perform set subtraction: old - new
rev-difference
    Perform set subtraction: new - old
prepend
    Insert new values at the head of old values
append
    Insert new values at the tail of old values
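
The semantics of the operators listed above can be sketched with plain Python lists and sets. This is an illustration only, not the repoze.urispace implementation: the function apply_operator, its signature, and the choice to return set results as sorted lists are all assumptions made for the example.

```python
def apply_operator(assertions, key, operator, new=()):
    """Update assertions[key] in place according to the named operator.

    Hypothetical helper; values are modelled as lists of strings.
    """
    old = list(assertions.get(key, []))
    new = list(new)
    if operator == 'clear':
        assertions.pop(key, None)
        return
    if operator == 'replace':
        result = new
    elif operator == 'prepend':
        result = new + old
    elif operator == 'append':
        result = old + new
    else:
        old_set, new_set = set(old), set(new)
        set_results = {
            'union': old_set | new_set,
            'intersection': old_set & new_set,
            'rev-intersection': old_set ^ new_set,
            'difference': old_set - new_set,
            'rev-difference': new_set - old_set,
        }
        # Sorted only so that the sketch gives repeatable results.
        result = sorted(set_results[operator])
    assertions[key] = result

assertions = {'roles': ['editor', 'reader']}
apply_operator(assertions, 'roles', 'union', ['admin', 'reader'])
# assertions is now {'roles': ['admin', 'editor', 'reader']}
```

Note that the set operators discard ordering and duplicates, whereas prepend and append preserve both.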

Example

Suppose we want to select different Deliverance themes and/or rulesets based on the URI of the resource being themed. In particular:

  • The news, lifestyle, and sports sections of the site each get custom themes, with the homepage and any other sections sharing the default theme.
  • Within the news section, the world, national, and local sections all use a different theme URL (one with the desired color scheme name encoded as a query string).
  • Within any section, the index.html page should use a different ruleset than the one used for stories in that section (whose final path element will be <slug>.html), because the index page’s HTML is structured very differently from that used for stories.

A URISpace file specifying these policies would look like:

<?xml version="1.0" ?>
<themeselect
   xmlns:uri='http://www.w3.org/2000/urispace'
   xmlns:urix='http://repoze.org/repoze.urispace/extensions'
   xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
   >

 <!-- default theme and rules -->
 <theme>http://themes.example.com/default.html</theme>
 <rules>http://static.example.com/rules/default.xml</rules>

 <uri:path uri:match="news">
  <theme>http://themes.example.com/news.html</theme>
  <uri:path uri:match="world">
   <theme>http://themes.example.com/news.html?style=world</theme>
  </uri:path>
  <uri:path uri:match="national">
   <theme>http://themes.example.com/news.html?style=national</theme>
  </uri:path>
  <uri:path uri:match="local">
   <theme>http://themes.example.com/news.html?style=local</theme>
  </uri:path>
 </uri:path>

 <uri:path uri:match="lifestyle">
  <theme>http://themes.example.com/lifestyle.html</theme>
 </uri:path>

 <uri:path uri:match="sports">
  <theme>http://themes.example.com/sports.html</theme>
 </uri:path>

 <!-- Note that the following rules match "across" sections -->
 <urix:pathlast uri:match="*.xhtml">
  <rules>http://static.example.com/rules/story.xml</rules>
 </urix:pathlast>

 <urix:pathlast uri:match="index.xhtml">
  <rules>http://static.example.com/rules/index.xml</rules>
 </urix:pathlast>

 <!-- Note that the following rules fail to match "across" sections -->
 <uri:path uri:match="*.html">
  <rules>http://static.example.com/rules/story.xml</rules>
 </uri:path>

 <uri:path uri:match="index.html">
  <rules>http://static.example.com/rules/index.xml</rules>
 </uri:path>

</themeselect>

Given that URISpace file, one can test how given URIs match using the uri_test script:

$ /path/to/bin/uri_test examples/dv_news.xml \
  http://example.com/ \
  http://example.com/foo \
  http://example.com/news/ \
  http://example.com/news/index.html \
  http://example.com/news/index.xhtml \
  http://example.com/news/world/index.html \
  http://example.com/news/world/index.xhtml \
  http://example.com/sports/ \
  http://example.com/sports/world_series_2008.html \
  http://example.com/sports/world_series_2008.xhtml
------------------------------------------------------------------------------
URI: http://example.com/
------------------------------------------------------------------------------
rules = http://static.example.com/rules/default.xml
theme = http://themes.example.com/default.html

------------------------------------------------------------------------------
URI: http://example.com/foo
------------------------------------------------------------------------------
rules = http://static.example.com/rules/default.xml
theme = http://themes.example.com/default.html

------------------------------------------------------------------------------
URI: http://example.com/news/
------------------------------------------------------------------------------
rules = http://static.example.com/rules/default.xml
theme = http://themes.example.com/news.html

------------------------------------------------------------------------------
URI: http://example.com/news/index.html
------------------------------------------------------------------------------
rules = http://static.example.com/rules/default.xml
theme = http://themes.example.com/news.html

------------------------------------------------------------------------------
URI: http://example.com/news/index.xhtml
------------------------------------------------------------------------------
rules = http://static.example.com/rules/index.xml
theme = http://themes.example.com/news.html

------------------------------------------------------------------------------
URI: http://example.com/news/world/index.html
------------------------------------------------------------------------------
rules = http://static.example.com/rules/default.xml
theme = http://themes.example.com/news.html?style=world

------------------------------------------------------------------------------
URI: http://example.com/news/world/index.xhtml
------------------------------------------------------------------------------
rules = http://static.example.com/rules/index.xml
theme = http://themes.example.com/news.html?style=world

------------------------------------------------------------------------------
URI: http://example.com/sports/
------------------------------------------------------------------------------
rules = http://static.example.com/rules/default.xml
theme = http://themes.example.com/sports.html

------------------------------------------------------------------------------
URI: http://example.com/sports/world_series_2008.html
------------------------------------------------------------------------------
rules = http://static.example.com/rules/default.xml
theme = http://themes.example.com/sports.html

------------------------------------------------------------------------------
URI: http://example.com/sports/world_series_2008.xhtml
------------------------------------------------------------------------------
rules = http://static.example.com/rules/story.xml
theme = http://themes.example.com/sports.html

Using a URISpace parser in Python Code

Once parsing is complete, the URISpace is available as a tree-like object. The canonical steps to extract metadata for a given URI are:

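try:
    from urllib.parse import urlsplit, parse_qs  # Python 3
except ImportError:
    from urlparse import urlsplit, parse_qs      # Python 2.6+

# 'uri' is the URI string to look up; 'urispace' is the parsed URISpace tree.
scheme, nethost, path, query, fragment = urlsplit(uri)

# Split the path into elements, dropping the empty leading element.
path = path.split('/')
if len(path) > 1 and path[0] == '':
    path = path[1:]

info = {'scheme': scheme,
        'nethost': nethost,
        'path': path,
        'query': parse_qs(query, keep_blank_values=1),
        'fragment': fragment,
        }

# Collect the operators whose selectors match, then apply them in order.
operators = urispace.collect(info)
assertions = {}
for operator in operators:
    operator.apply(assertions)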

At this point, assertions will contain keys and values for all operators found while matching against the URI.

Using URISpace as WSGI Middleware

One application of a URISpace might be to make assertions about the URI of a WSGI request, in order to allow other parts of the application to use those assertions. repoze.urispace provides a component which can be used as middleware for this purpose.

To configure the middleware in a PasteDeploy config file:

[filter:urispace]
use = egg:repoze.urispace#urispace
file = %(here)s/urispace.xml

You should then be able to add the middleware to your pipeline:

[pipeline:main]
pipeline =
  urispace
  your_app

In your application, you can get to the assertions made by the middleware using the repoze.urispace.middleware.getAssertions() API, e.g.:

from repoze.urispace.middleware import getAssertions

def your_app(environ, start_response):
    assertions = getAssertions(environ)
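
What the application then does with the assertions is up to it. As one possibility, the sketch below sets a Cache-Control header from an assertion. To keep it runnable without repoze.urispace installed, it uses a stand-in function in place of getAssertions; the environ key 'example.assertions', the assertion name 'cache-control', and the assumption that the assertions behave like a dictionary are all hypothetical.

```python
def get_assertions_stub(environ):
    # Stand-in for repoze.urispace.middleware.getAssertions so that the
    # sketch runs on its own; the environ key used here is hypothetical.
    return environ.get('example.assertions', {})

def make_app(get_assertions=get_assertions_stub):
    """Build a WSGI app that honours a 'cache-control' assertion if present."""
    def app(environ, start_response):
        assertions = get_assertions(environ)
        headers = [('Content-Type', 'text/plain')]
        cache = assertions.get('cache-control')  # hypothetical assertion name
        if cache:
            headers.append(('Cache-Control', cache))
        start_response('200 OK', headers)
        return [b'hello']
    return app

# Exercise the app directly with a fake start_response callable.
captured = {}

def fake_start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers

app = make_app()
body = app({'example.assertions': {'cache-control': 'max-age=300'}},
           fake_start_response)
```

In a real deployment, passing the actual getAssertions function to make_app (or calling it directly) would replace the stub.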