When traversal is used within a repoze.bfg application, the repoze.bfg Router parses the URL associated with the request. It splits the URL into individual path segments. Based on these path segments, repoze.bfg traverses a model graph in order to find a context. It then attempts to find a view based on the type of the context (specified by its Python class type or any interface attached to it). If repoze.bfg finds a view for the context, it calls it and returns a response to the user.
When your application uses traversal to resolve URLs to code, your application must supply a model graph to repoze.bfg.
Users interact with your repoze.bfg-based application via a router, which is just a fancy WSGI application. At system startup time, the router is configured with a callback known as a root factory, supplied by the application developer. The root factory is passed the WSGI “environment” (a dictionary) and is expected to return an object which represents the root of the model graph. All traversal will begin at this root object. The root object is usually a mapping object (such as a Python dictionary).
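As a minimal sketch of the idea, a root factory can be as small as a function that ignores the environment and returns a dictionary. The names get_root and root are hypothetical, chosen only for this illustration, and the example is written for Python 3; how the factory is registered with the router is not shown here.

```python
# Hypothetical model graph: a plain dictionary serves as the root object.
root = {"a": {"b": {}}}


def get_root(environ):
    """Root factory sketch: receive the WSGI environment, return the graph root."""
    # A real application might open a database connection here instead of
    # returning a module-level dictionary.
    return root
```

Every request would receive the same root object in this sketch, which is enough for a graph that lives in memory.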
Note
If None is passed as the root factory to the repoze.bfg “make_app” function, a default root factory is used. This is most useful when you’re using URL dispatch and you don’t care very much about traversing any particular graph to resolve URLs to code. It is also possible to use traversal and URL dispatch together. When both a root factory (and therefore traversal) and “routes” declarations (and therefore URL dispatch) are used, the URL dispatch routes are checked first, and if none match, repoze.bfg will fall back to using traversal to attempt to map the request to a view. If a route’s path pattern contains the name *traverse, traversal can also be performed after that route has matched. See Combining Traversal and URL Dispatch for more information.
Items contained within the object graph are analogous to the concept of model objects used by many other frameworks (and repoze.bfg refers to them as models, as well). They are typically instances of Python classes.
The model graph consists of container nodes and leaf nodes. There is only one difference between a container node and a leaf node: container nodes possess a __getitem__ method while leaf nodes do not. The __getitem__ method was chosen as the signifying difference between the two types of nodes because the presence of this method is how Python itself typically determines whether an object is “containerish” or not.
A container node is presumed to be willing to return a child node or raise a KeyError based on a name passed to its __getitem__.
No leaf-level instance is required to have a __getitem__. If leaf-level instances happen to have a __getitem__ (through some historical inequity), you should subclass these node types and cause their __getitem__ methods to simply raise a KeyError. Or simply stop using them and think up another strategy.
Usually, the traversal root is a container node, and as such it contains other nodes. However, it doesn’t need to be a container. Your model graph can be as shallow or as deep as you require.
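The distinction between container and leaf nodes can be illustrated with two small classes. This is only a sketch: Folder and Document are hypothetical names, not part of repoze.bfg, and the code is written for Python 3.

```python
class Folder(object):
    """Container node: defines __getitem__ and raises KeyError for unknown names."""

    def __init__(self, **children):
        self._children = dict(children)

    def __getitem__(self, name):
        # The underlying dict lookup raises KeyError when the child is missing.
        return self._children[name]


class Document(object):
    """Leaf node: deliberately defines no __getitem__."""

    def __init__(self, body):
        self.body = body


# A shallow graph: a container root holding a single leaf.
root = Folder(readme=Document("hello"))
```

Asking root for "readme" returns the leaf, while asking it for any other name raises a KeyError, which is the behavior traversal expects from a container.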
Traversal “stops” when repoze.bfg either reaches a leaf level model instance in your object graph or when the path segments implied by the URL “run out”. The object that traversal “stops on” becomes the context.
When a user requests a page from your repoze.bfg-powered application, the system uses this algorithm to determine which Python code to execute:
The request for the page is presented to the repoze.bfg router in terms of a standard WSGI request, which is represented by a WSGI environment and a start_response callable.
The router creates a WebOb request object based on the WSGI environment.
The root factory is called with the WSGI environment. It returns a root object.
The router uses the WSGI environment’s PATH_INFO variable to determine the path segments to traverse. The leading slash is stripped off PATH_INFO, and the remaining path segments are split on the slash character to form a traversal sequence, so a request with a PATH_INFO variable of /a/b/c maps to the traversal sequence [u'a', u'b', u'c']. Note that each of the path segments in the sequence is converted to Unicode using the UTF-8 decoding (if the decoding fails, a TypeError is raised).
Traversal begins at the root object returned by the root factory. For the traversal sequence [u'a', u'b', u'c'], the root object’s __getitem__ is called with the name a. Traversal continues through the sequence. In our example, if the root object’s __getitem__ called with the name a returns an object (aka “object a”), that object’s __getitem__ is called with the name b. If “object a” returns an object when asked for b, the __getitem__ of “object b” is then called with the name c, and may return “object c”.
Traversal ends when a) the entire path is exhausted, or b) any graph element raises a KeyError from its __getitem__, or c) any non-final object found during traversal does not have a __getitem__ method (resulting in an AttributeError), or d) any path element is prefixed with the set of characters @@ (indicating that the characters following the @@ token should be treated as a “view name”).
When traversal ends for any of the reasons in the previous step, the last object found during traversal is deemed to be the context. If the path has been exhausted when traversal ends, the “view name” is deemed to be the empty string (''). However, if the path was not exhausted before traversal terminated, the first remaining path element is treated as the view name.
Any subsequent path elements after the view name are deemed the subpath. The subpath is always a sequence of path segments that come from PATH_INFO that are “left over” after traversal has completed. For instance, if PATH_INFO was /a/b and the root returned an “object a”, and “object a” subsequently returned an “object b”, the router deems that the context is “object b”, the view name is the empty string, and the subpath is the empty sequence. On the other hand, if PATH_INFO was /a/b/c and “object a” was found but raised a KeyError for the name b, the router deems that the context is “object a”, the view name is b and the subpath is ('c',).
If an authentication policy is configured, the router performs a permission lookup. If a permission declaration is found for the view name and context implied by the current request, an authorization policy is consulted to see if the “current user” (as determined by the authentication policy) can perform the action. If the user can, processing continues. If not, the forbidden view is called (see Changing the Forbidden View).
Armed with the context, the view name, and the subpath, the router performs a view lookup. It attempts to look up a view from the repoze.bfg application registry using the view name and the context. If a view function is found, it is called with the context and the request. It returns a response, which is fed back upstream. If a view is not found, the notfound view is called (see Changing the Not Found View).
In either case, the result is returned upstream via the WSGI protocol.
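The traversal portion of the steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the router's actual implementation: split_path and traverse are hypothetical names, the code targets Python 3, URL unquoting and Unicode decoding are omitted, and empty path segments are simply dropped.

```python
def split_path(path_info):
    """Turn a PATH_INFO such as '/a/b/c' into ['a', 'b', 'c']."""
    # Assumption of this sketch: empty segments (from '//' or a trailing
    # slash) are discarded.
    return [segment for segment in path_info.split("/") if segment]


def traverse(root, path_info):
    """Return (context, view_name, subpath) for the given PATH_INFO."""
    segments = split_path(path_info)
    context = root
    for index, name in enumerate(segments):
        remaining = tuple(segments[index + 1:])
        if name.startswith("@@"):
            # Explicit view name: stop and strip the @@ marker.
            return context, name[2:], remaining
        try:
            getitem = context.__getitem__
        except AttributeError:
            # Leaf node: Python raises AttributeError for the missing method.
            return context, name, remaining
        try:
            context = getitem(name)
        except KeyError:
            # The container has no child by this name.
            return context, name, remaining
    # Path exhausted: the view name is the empty string.
    return context, "", ()


# Hypothetical graph in which "object a" has no child named b.
object_a = {}
graph = {"a": object_a}
context, view_name, subpath = traverse(graph, "/a/b/c")
```

With this graph, a PATH_INFO of /a/b/c yields "object a" as the context, b as the view name, and ('c',) as the subpath, matching the second example in the steps above.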
It’s useful to be able to debug NotFound errors when they occur unexpectedly due to an application registry misconfiguration. To debug these errors, use the BFG_DEBUG_NOTFOUND environment variable or the debug_notfound configuration file setting. Details of why a view was not found will be printed to stderr, and the browser representation of the error will include the same information. See Environment and Configuration for more information about how and where to set these values.
Let’s pretend the user asks for http://example.com/foo/bar/baz/biz/buz.txt. Let’s pretend that the request’s PATH_INFO in that case is /foo/bar/baz/biz/buz.txt. Let’s further pretend that when this request comes in that we’re traversing the following graph:
/--
   |
   |-- foo
        |
        ----bar
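Here’s what happens: repoze.bfg traverses the root and looks for “foo”, which it finds. It traverses “foo” and looks for “bar”, which it finds. It traverses “bar” and looks for “baz”, which it does not find (“bar” raises a KeyError when asked for “baz”).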
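The fact that it does not find “baz” at this point does not signify an error condition. It signifies that the “context” is “bar” (the last object found during traversal), the “view name” is “baz”, and the “subpath” is ('biz', 'buz.txt').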
Because it’s the “context”, repoze.bfg examines “bar” to find out what “type” it is. Let’s say it finds that the context is an IBar type (because “bar” happens to have an attribute attached to it that indicates it’s an IBar).
Using the “view name” (“baz”) and the type, it asks the application registry (configured separately, via configure.zcml) this question: is there a view named “baz” that can be used for a context of type IBar?
Let’s say it finds no matching view type. It then returns the result of the notfound view. The request ends. Everyone is sad.
But! For this graph:
/--
   |
   |-- foo
        |
        ----bar
             |
             ----baz
                    |
                    biz
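The user asks for http://example.com/foo/bar/baz/biz/buz.txt. repoze.bfg traverses the root and finds “foo”, traverses “foo” and finds “bar”, traverses “bar” and finds “baz”, and traverses “baz” and finds “biz”. It then traverses “biz” and looks for “buz.txt”, which it does not find.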
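The fact that it does not find “buz.txt” at this point does not signify an error condition. It signifies that the “context” is “biz” (the last object found during traversal), the “view name” is “buz.txt”, and the “subpath” is the empty sequence.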
Because it’s the “context”, repoze.bfg examines “biz” to find out what “type” it is. Let’s say it finds that the context is an IBiz type (because “biz” happens to have an attribute attached to it that indicates it’s an IBiz).
Using the “view name” (“buz.txt”) and the type, it asks the application registry this question: is there a view named “buz.txt” that can be used for a context of type IBiz?
Let’s say that question is answered “here you go, here’s a bit of code that is willing to deal with that case”, and returns a view. It is passed the “biz” object as the “context” and the current WebOb request as the “request”. It returns a response.
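The lookup in this example can be approximated with a plain dictionary keyed by view name and context type. This is a deliberately simplified sketch: the real application registry is configured separately and is not a dictionary, and Biz, buz_view, view_registry, and find_view are hypothetical names used only here (Python 3).

```python
class Biz(object):
    """Hypothetical model class standing in for the IBiz type."""


def buz_view(context, request):
    # A view is called with the context and the request and returns a response.
    return "response for %s" % type(context).__name__


# Stand-in for the application registry: (view name, context type) -> view.
view_registry = {("buz.txt", Biz): buz_view}


def find_view(registry, view_name, context):
    """Return the view registered for this name and context type, or None."""
    # Walk the class hierarchy so views registered for a base class also match.
    for cls in type(context).__mro__:
        view = registry.get((view_name, cls))
        if view is not None:
            return view
    return None


biz = Biz()
view = find_view(view_registry, "buz.txt", biz)
response = view(biz, None)  # None stands in for the WebOb request
```

A lookup that returns None corresponds to the first example, where no view matches and the notfound view is used instead.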
There are two special cases:
The traversal machinery by default attempts to first URL-unquote and then Unicode-decode each path element in PATH_INFO from its natural byte string (str type) representation. URL unquoting is performed using the Python standard library urllib.unquote function. Conversion from a URL-unquoted string into Unicode is attempted using the UTF-8 encoding. If any URL-unquoted path segment in PATH_INFO is not decodable as UTF-8, a TypeError is raised. A segment will be fully URL-unquoted and UTF-8-decoded before it is passed to the __getitem__ of any model object during traversal.
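The unquote-then-decode order described above can be sketched as follows. This is an approximation written for Python 3, where the unquoting functions live in urllib.parse rather than urllib; decode_segment is a hypothetical helper name, not a repoze.bfg function.

```python
from urllib.parse import unquote_to_bytes


def decode_segment(segment):
    """URL-unquote a path segment, then decode the resulting bytes as UTF-8."""
    raw = unquote_to_bytes(segment)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Mirror the documented behavior: an undecodable segment is a TypeError.
        raise TypeError(
            "Could not decode path segment %r as UTF-8" % (segment,)
        ) from exc
```

A segment such as caf%C3%A9 decodes cleanly, while a segment such as %FF unquotes to a byte that is not valid UTF-8 and so raises a TypeError.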