Traversal is a context finding mechanism. It is the act of finding a context and a view name by walking over an object graph, starting from a root object, using a request object as a source of path information.
In this chapter, we’ll provide a high-level overview of traversal, we’ll explain the concept of an object graph, and we’ll show how traversal might be used within an application.
We use an analogy to provide an introduction to traversal. Imagine an inexperienced UNIX computer user, wishing only to use the command line to find a file and to invoke the cat command against that file. Because he is inexperienced, the only commands he knows how to use are cd, which changes the current directory, and cat, which prints the contents of a file. And because he is inexperienced, he doesn’t understand that cat can take an absolute path specification as an argument, so he doesn’t know that you can issue a single command, cat /an/absolute/path, to get the desired result. Instead, this user believes he must issue the cd command, starting from the root, for each intermediate path segment, even the path segment that represents the file itself. Once he gets an error (because you cannot successfully cd into a file), he knows he has reached the file he wants, and he will be able to execute cat against the resulting path segment.
This inexperienced user’s attempt to execute cat against the file named /fiz/buz/myfile might be to issue the following set of UNIX commands:
cd /
cd fiz
cd buz
cd myfile
The user now knows he has found a file, because the cd command issued an error when he executed cd myfile. Now he knows that he can run the cat command:
cat myfile
The contents of myfile are now printed on the user’s behalf.
repoze.bfg is very much like this inexperienced UNIX user as it uses traversal against an object graph. In this analogy, we can map the cat program to the repoze.bfg concept of a view callable: it is a program that can be run against some context as the result of view lookup. The file being operated on in this analogy is the context object; the context is the “last node found” in a traversal. The directory structure is the object graph being traversed. The act of progressively changing directories to find the file as well as the handling of a cd error as a stop condition is analogous to traversal.
The analogy we’ve used is not exactly correct, because, while the naive user already knows which command he wants to invoke before he starts “traversing” (cat), repoze.bfg needs to obtain that information from the path being traversed itself. In traversal, the “command” meant to be invoked is a view callable. A view callable is derived via view lookup from the combination of the view name and the context. Traversal is the act of obtaining these two items.
Traversal is dependent on information in a request object. Every request object contains URL path information in the PATH_INFO portion of the WSGI environment. PATH_INFO is the portion of a request’s URL following the hostname and port number, but before any query string elements or fragment element. For example, the PATH_INFO portion of the URL http://example.com:8080/a/b/c?foo=1 is /a/b/c.
Traversal treats the PATH_INFO segment of a URL as a sequence of path segments. For example, the PATH_INFO string /a/b/c is converted to the sequence ['a', 'b', 'c'].
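As a rough illustration of the two paragraphs above, the following sketch uses Python’s standard urllib.parse module (Python 3 is assumed here) to pull the path out of the example URL and split it into segments. The helper name path_segments is hypothetical, and a real WSGI server computes PATH_INFO itself; this only mimics the result.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Return the path portion of ``url`` as a list of path segments.

    Hypothetical helper for illustration; not part of repoze.bfg.
    """
    # urlsplit() separates the path from the host, port, query string
    # and fragment; for the example URL the path is "/a/b/c".
    path_info = urlsplit(url).path
    # Drop the empty strings produced by leading or trailing slashes.
    return [segment for segment in path_info.split("/") if segment]


print(path_segments("http://example.com:8080/a/b/c?foo=1"))  # ['a', 'b', 'c']
```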
After the path info is converted, a lookup is performed against the object graph for each path segment. Each lookup uses the __getitem__ method of an object in the graph.
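For example, if the path info sequence is ['a', 'b', 'c'], traversal first calls the root object’s __getitem__ with a. If that lookup returns an object, that object’s __getitem__ is called with b, and the object returned for b is then asked for c.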
This process continues until the path segment sequence is exhausted or a lookup for a path element fails. In either case, a context is found.
Traversal “stops” when it either reaches a leaf level model instance in your object graph or when the path segments implied by the URL “run out”. The object that traversal “stops on” becomes the context. If at any point during traversal any node in the graph doesn’t have a __getitem__ method, or if the __getitem__ method of a node raises a KeyError, traversal ends immediately, and that node becomes the context.
The results of a traversal also include a view name. The view name is the first URL path segment in the set of PATH_INFO segments “left over” in the path segment list popped by the traversal process after traversal finds a context object.
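To make the sequence of lookups visible, here is a small sketch that records every __getitem__ call made while walking the path ['a', 'b', 'c']. The TracingFolder class and the hand-written loop are hypothetical stand-ins for illustration, not repoze.bfg code; the graph deliberately has no c node, so the last lookup fails.

```python
class TracingFolder(dict):
    """A container node that records every name it is asked for."""

    calls = []  # log shared by all instances, in call order

    def __init__(self, label, **children):
        super().__init__(**children)
        self.label = label

    def __getitem__(self, name):
        TracingFolder.calls.append((self.label, name))
        return super().__getitem__(name)  # raises KeyError if missing


# A graph with root -> a -> b, and nothing named "c" under b.
b = TracingFolder("b")
a = TracingFolder("a", b=b)
root = TracingFolder("root", a=a)

node = root
leftover = ["a", "b", "c"]
while leftover:
    try:
        node = node[leftover[0]]
    except KeyError:
        break  # lookup failed: ``node`` is the context
    leftover.pop(0)

print(TracingFolder.calls)  # [('root', 'a'), ('a', 'b'), ('b', 'c')]
print(node.label, leftover)  # b ['c']
```

In this run the context is the b node, and the first leftover segment, c, would be used as the view name.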
The combination of the context object and the view name found via traversal is used later in the same request by a separate repoze.bfg subsystem, the view lookup subsystem, to find a view callable. How repoze.bfg performs view lookup is explained within the Views chapter.
When your application uses traversal to resolve URLs to code, your application must supply an object graph to repoze.bfg. This graph is represented by a root object.
In order to supply a root object for an application, at system startup time, the repoze.bfg Router is configured with a callback known as a root factory. The root factory is supplied by the application developer as the root_factory argument to the application’s Configurator.
Here’s an example of a simple root factory:
class Root(dict):
    def __init__(self, request):
        pass
Here’s an example of using this root factory within startup configuration, by passing it to an instance of a Configurator named config:
from repoze.bfg.configuration import Configurator
config = Configurator(root_factory=Root)
Using the root_factory argument to a repoze.bfg.configuration.Configurator constructor tells your repoze.bfg application to call this root factory to generate a root object whenever a request enters the application. This root factory is also known as the global root factory.
A root factory is passed a request object and it is expected to return an object which represents the root of the object graph. All traversal will begin at this root object. Usually a root factory for a traversal-based application will be more complicated than the above Root object; in particular it may be associated with a database connection or another persistence mechanism. A root object is often an instance of a class which has a __getitem__ method.
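A slightly more involved root factory might look like the sketch below. It assumes a module-level dictionary, FAKE_DATABASE, as a stand-in for a real persistence mechanism; that name, the Folder class and the root_factory function are all hypothetical, and the request argument is accepted but not used.

```python
class Folder(dict):
    """A container node; dict already provides __getitem__."""


# Hypothetical stand-in for a real database or other persistent store.
FAKE_DATABASE = {"app_root": Folder(docs=Folder(), images=Folder())}


def root_factory(request):
    """Return the root of the object graph for this request."""
    # A real factory might open a database connection here, possibly
    # using information from ``request``; this sketch ignores it.
    return FAKE_DATABASE["app_root"]


# The router would pass a real request object; None is enough here.
root = root_factory(request=None)
print(sorted(root))  # ['docs', 'images']
```

A factory written this way would be handed to the Configurator in the same way as the Root class, as the root_factory argument.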
Warning
In repoze.bfg 1.0 and prior versions, the root factory was passed a WSGI environment object (a dictionary) while in repoze.bfg 1.1+ it is passed a request object. For backwards compatibility purposes, the request object passed to the root factory has a dictionary-like interface that emulates the WSGI environment, so code expecting the argument to be a dictionary will continue to work.
If no root factory is passed to the repoze.bfg Configurator constructor, or the root_factory is specified as the value None, a default root factory is used. The default root factory always returns an object that has no child nodes.
Items contained within the object graph are sometimes analogous to the concept of model objects used by many other frameworks (and repoze.bfg APIs often refer to them as “models”, as well). They are typically instances of Python classes.
The object graph consists of container nodes and leaf nodes. There is only one difference between a container node and a leaf node: container nodes possess a __getitem__ method while leaf nodes do not. The __getitem__ method was chosen as the signifying difference between the two types of nodes because the presence of this method is how Python itself typically determines whether an object is “containerish” or not.
Each container node is presumed to be willing to return a child node or raise a KeyError based on a name passed to its __getitem__.
Leaf-level instances must not have a __getitem__. If instances that you’d like to be leaves already happen to have a __getitem__ through some historical inequity, you should subclass these node types and cause their __getitem__ methods to simply raise a KeyError. Or just stop using them and think up another strategy.
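The distinction between container and leaf nodes, and the subclassing workaround just described, can be sketched as follows. All class names here are hypothetical; LegacyRecord stands for some existing dict-based class that you would like to treat as a leaf.

```python
class Folder(dict):
    """Container node: inherits __getitem__ from dict."""


class Document(object):
    """Leaf node: defines no __getitem__ at all."""

    def __init__(self, text):
        self.text = text


class LegacyRecord(dict):
    """Hypothetical dict-based class that we want to treat as a leaf."""


class LeafRecord(LegacyRecord):
    """Leaf variant of LegacyRecord: every child lookup fails."""

    def __getitem__(self, name):
        raise KeyError(name)


def has_getitem(node):
    """Return True if ``node`` exposes a __getitem__ method."""
    return hasattr(node, "__getitem__")


print(has_getitem(Folder()))        # True
print(has_getitem(Document("hi")))  # False

record = LeafRecord(title="x")
try:
    record["title"]
except KeyError:
    print("LeafRecord refuses every lookup")
```

Note that LeafRecord still has a __getitem__ method, so it is only a leaf in the sense that no lookup against it ever succeeds.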
Usually, the traversal root is a container node, and as such it contains other nodes. However, it doesn’t need to be a container. Your object graph can be as shallow or as deep as you require.
In general, the object graph is traversed beginning at its root object using a sequence of path elements described by the PATH_INFO of the current request; if there are path segments, the root object’s __getitem__ is called with the next path segment, and it is expected to return another graph object. The resulting object’s __getitem__ is called with the very next path segment, and it is expected to return another graph object. This continues until all path segments are exhausted.
This section will attempt to explain the repoze.bfg traversal algorithm. We’ll provide a description of the algorithm, a diagram of how the algorithm works, and some example traversal scenarios that might help you understand how the algorithm operates against a specific object graph.
We’ll also talk a bit about view lookup. The Views chapter discusses view lookup in detail, and it is the canonical source for information about views. Technically, view lookup is a repoze.bfg subsystem that is separated from traversal entirely. However, we’ll describe the fundamental behavior of view lookup in the examples in the next few sections to give you an idea of how traversal and view lookup cooperate, because they are almost always used together.
When a user requests a page from your traversal-powered application, the system uses this algorithm to find a context and a view name.
The request for the page is presented to the repoze.bfg router in terms of a standard WSGI request, which is represented by a WSGI environment and a WSGI start_response callable.
The router creates a request object based on the WSGI environment.
The root factory is called with the request. It returns a root object.
The router uses the WSGI environment’s PATH_INFO information to determine the path segments to traverse. The leading slash is stripped off PATH_INFO, and the remaining path segments are split on the slash character to form a traversal sequence.
The traversal algorithm by default attempts to first URL-unquote and then Unicode-decode each path segment derived from PATH_INFO from its natural byte string (str type) representation. URL unquoting is performed using the Python standard library urllib.unquote function. Conversion from a URL-decoded string into Unicode is attempted using the UTF-8 encoding. If any URL-unquoted path segment in PATH_INFO cannot be decoded as UTF-8, a TypeError is raised. A segment will be fully URL-unquoted and UTF-8-decoded before it is passed to the __getitem__ of any model object during traversal.
Thus, a request with a PATH_INFO variable of /a/b/c maps to the traversal sequence [u'a', u'b', u'c'].
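The unquoting and decoding step can be approximated as shown below. This is a sketch only: it assumes Python 3, where urllib.parse.unquote_to_bytes plays the role that urllib.unquote plays in the description above, and the function name decode_segments is hypothetical. It converts a decoding failure into a TypeError to mirror the behaviour described.

```python
from urllib.parse import unquote_to_bytes


def decode_segments(path_info):
    """Split ``path_info`` and return unquoted, UTF-8-decoded segments.

    Hypothetical helper for illustration; not the repoze.bfg code.
    """
    segments = []
    for raw in path_info.split("/"):
        if not raw:
            continue  # skip the empty string before the leading slash
        unquoted = unquote_to_bytes(raw)  # "%C3%A9" -> b"\xc3\xa9"
        try:
            segments.append(unquoted.decode("utf-8"))
        except UnicodeDecodeError:
            # Report undecodable segments as a TypeError, as the text
            # above says traversal does.
            raise TypeError("%r is not decodable as UTF-8" % (raw,))
    return segments


print(decode_segments("/a/b/c"))            # ['a', 'b', 'c']
print(decode_segments("/caf%C3%A9/x%20y"))  # ['café', 'x y']
```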
Traversal begins at the root object returned by the root factory. For the traversal sequence [u'a', u'b', u'c'], the root object’s __getitem__ is called with the name a. Traversal continues through the sequence. In our example, if the root object’s __getitem__ called with the name a returns an object (aka “object a”), that object’s __getitem__ is called with the name b. If object a returns an object when asked for b, “object b”’s __getitem__ is then asked for the name c, and may return “object c”.
Traversal ends when a) the entire path is exhausted, or b) any graph element raises a KeyError from its __getitem__, or c) the object found for any non-final path element does not have a __getitem__ method (resulting in an AttributeError), or d) any path element is prefixed with the set of characters @@ (indicating that the characters following the @@ token should be treated as a view name).
When traversal ends for any of the reasons in the previous step, the last object found during traversal is deemed to be the context. If the path has been exhausted when traversal ends, the view name is deemed to be the empty string (''). However, if the path was not exhausted before traversal terminated, the first remaining path segment is treated as the view name.
Any path elements that follow the view name are deemed the subpath. The subpath is always a sequence of path segments from PATH_INFO that are “left over” after traversal has completed.
Once context and view name and associated attributes such as the subpath are located, the job of traversal is finished. It passes back the information it obtained to its caller, the repoze.bfg Router, which subsequently invokes view lookup with the context and view name information.
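The steps above can be condensed into a short function. What follows is a minimal sketch under stated assumptions, not the repoze.bfg implementation: the name traverse, its return shape (a three-tuple) and the Folder and Leaf classes are hypothetical, and the segments are assumed to be already unquoted and decoded.

```python
def traverse(root, segments):
    """Return (context, view_name, subpath) for decoded ``segments``."""
    context = root
    remaining = list(segments)
    while remaining:
        segment = remaining[0]
        if segment.startswith("@@"):
            # Explicit view name: the text after "@@" names the view.
            return context, segment[2:], remaining[1:]
        getitem = getattr(context, "__getitem__", None)
        if getitem is None:
            break  # leaf node: it cannot be traversed any further
        try:
            child = getitem(segment)
        except KeyError:
            break  # no such child: the current node is the context
        context = child
        remaining.pop(0)
    if not remaining:
        return context, "", []  # path exhausted: empty view name
    return context, remaining[0], remaining[1:]


class Folder(dict):
    """Container node."""


class Leaf(object):
    """Leaf node without __getitem__."""


c = Leaf()
root = Folder(a=Folder(b=Folder(c=c)))

# Path exhausted: the context is c and the view name is empty.
assert traverse(root, ["a", "b", "c"]) == (c, "", [])
# KeyError at "x": the context is a, the view name is x, the rest is subpath.
assert traverse(root, ["a", "x", "y", "z"]) == (root["a"], "x", ["y", "z"])
# "@@" prefix: traversal stops at once and "edit" is the view name.
assert traverse(root, ["a", "@@edit", "extra"]) == (root["a"], "edit", ["extra"])
# c has no __getitem__, so "view" becomes the view name.
assert traverse(root, ["a", "b", "c", "view"]) == (c, "view", [])
```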
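The traversal algorithm exposes two special cases: a view name that is the empty string, which results when the path is exhausted, and a path segment prefixed with @@, whose remaining characters are treated as the view name and which stops traversal immediately.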
Finally, traversal is responsible for locating a virtual root. A virtual root is used during “virtual hosting”; see the Virtual Hosting chapter for information. We won’t speak more about it in this chapter.
No one can be expected to understand the traversal algorithm by analogy and description alone, so let’s examine some traversal scenarios that use concrete URLs and object graph compositions.
Let’s pretend the user asks for http://example.com/foo/bar/baz/biz/buz.txt. The request’s PATH_INFO in that case is /foo/bar/baz/biz/buz.txt. Let’s further pretend that when this request comes in that we’re traversing the following object graph:
/--
   |
   |-- foo
        |
        ----bar
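Here’s what happens: traversal starts at the root and attempts to find “foo”, which it finds. It then traverses “foo” and attempts to find “bar”, which it finds. Finally, it traverses “bar” and attempts to find “baz”, which it does not find.
The fact that it does not find “baz” at this point does not signify an error condition. It signifies that the context is “bar” (the last object found during traversal), the view name is baz, and the subpath is ['biz', 'buz.txt'].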
At this point, traversal has ended, and view lookup begins.
Because it’s the “context”, the view lookup machinery examines “bar” to find out what “type” it is. Let’s say it finds that the context is a Bar type (because “bar” happens to be an instance of the class Bar). Using the view name (baz) and the type, view lookup asks the application registry this question: is there a view callable registered under the name baz that can be used with a context of type Bar?
Let’s say that view lookup finds no matching view. In this circumstance, the repoze.bfg router returns the result of the not found view and the request ends.
However, for this graph:
/--
   |
   |-- foo
        |
        ----bar
             |
             ----baz
                    |
                    biz
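The user asks for http://example.com/foo/bar/baz/biz/buz.txt. Traversal finds “foo”, “bar”, “baz” and “biz” in turn, and then does not find “buz.txt” when it asks “biz” for it.
The fact that it does not find “buz.txt” at this point does not signify an error condition. It signifies that the context is “biz” (the last object found during traversal), the view name is buz.txt, and the subpath is empty.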
At this point, traversal has ended, and view lookup begins.
Because it’s the “context”, the view lookup machinery examines “biz” to find out what “type” it is. Let’s say it finds that the context is a Biz type (because “biz” is an instance of the Python class Biz). Using the view name (buz.txt) and the type, view lookup asks the application registry this question: is there a view callable registered under the name buz.txt that can be used with a context of type Biz?
Let’s say that question is answered by the application registry; in such a situation, the application registry returns a view callable. The view callable is then called with the current WebOb request as the sole argument: request; it is expected to return a response.
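The outcome of both scenarios can be mimicked with a toy lookup table. This is only a sketch: the plain dictionary VIEW_REGISTRY is a hypothetical stand-in for the application registry, matching on the exact class of the context is a simplifying assumption, and none of the names below are repoze.bfg APIs. The Views chapter remains the authoritative description of view lookup.

```python
class Bar(object):
    """Type of the context found in the first scenario."""


class Biz(object):
    """Type of the context found in the second scenario."""


def buz_txt_view(request):
    """A view callable: takes the request, returns a response."""
    return "response for %s" % request


def not_found_view(request):
    """Used when no view matches the context type and view name."""
    return "404 Not Found"


# Hypothetical registry: (context class, view name) -> view callable.
VIEW_REGISTRY = {(Biz, "buz.txt"): buz_txt_view}


def lookup_view(context, view_name):
    """Return the registered view, or the not found view."""
    return VIEW_REGISTRY.get((type(context), view_name), not_found_view)


# First scenario: context "bar", view name "baz" -> nothing registered.
print(lookup_view(Bar(), "baz")("request"))      # 404 Not Found
# Second scenario: context "biz", view name "buz.txt" -> a view is found.
print(lookup_view(Biz(), "buz.txt")("request"))  # response for request
```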
A tutorial showing how traversal can be used within a repoze.bfg application exists in ZODB + Traversal Wiki Tutorial.
See the Views chapter for detailed information about view lookup.
The repoze.bfg.traversal module contains API functions that deal with traversal, such as traversal invocation from within application code.
The repoze.bfg.url.model_url() function generates a URL when given an object retrieved from an object graph.