A Very Simple Parser
The first step in analyzing a program is parsing the textual source code into an in-memory tree that can be walked. The ast module spares you the hard work of implementing a lexer and a Python parser: the function ast.parse() does that for you. The module also takes care of walking the tree. All you have to do is implement analysis code for the types of statements you want to analyze. To do this, you write a class that derives from ast.NodeVisitor and override only the member functions that are pertinent to your analysis.
NodeVisitor exposes callback member functions that the tree walker (the "node visitor") calls when it encounters particular Python constructs, one for each node type in the language's grammar (statements, expressions and so on). For instance, NodeVisitor.visit_Import is called each time the walker encounters an import statement.
Enough with the theory. As code is often the best documentation, let's look at a parser that does one thing: print a message for each import statement.
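A minimal sketch of such a parser follows. The class name ImportPrinter, the messages list and the message text are my own choices for illustration, not necessarily what the original example used; only ast.NodeVisitor, visit_Import, node.names and generic_visit come from the ast module itself.

```python
import ast

class ImportPrinter(ast.NodeVisitor):
    """Prints (and records) a message for every plain `import ...` statement."""

    def __init__(self):
        self.messages = []  # hypothetical: kept so the result can be inspected

    def visit_Import(self, node):
        # node.names is a list of ast.alias objects, one per imported module
        for alias in node.names:
            message = f"found import of {alias.name}"
            self.messages.append(message)
            print(message)
        # continue the walk into this node's children
        self.generic_visit(node)

printer = ImportPrinter()
printer.visit(ast.parse("import foo"))  # prints: found import of foo
```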
For the code snippet "import foo", this produces
Implementing visit_.. Callbacks
You can define callbacks of the form visit_<type> for each of the left-hand symbols and right-hand constructors defined in the abstract grammar for Python. Thus, visit_stmt and visit_Import are both valid callbacks. Left-hand symbols, however, are abstract classes whose callbacks are never called directly; only the callbacks for their concrete constructors, listed on the right-hand side, are. For example, visit_stmt is never called, but visit_Import, which implements a concrete type of statement, is called for every import statement.
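The following sketch makes the difference visible. It relies on the fact that NodeVisitor.visit dispatches on the concrete class name of each node ("visit_" + the class name), so no node ever triggers visit_stmt. The class CallbackTracer is a hypothetical name used only for this demonstration.

```python
import ast

class CallbackTracer(ast.NodeVisitor):
    def __init__(self):
        self.called = []

    def visit_stmt(self, node):
        # Never runs: no node's class is literally named "stmt".
        self.called.append("visit_stmt")
        self.generic_visit(node)

    def visit_Import(self, node):
        # Runs for every concrete Import node.
        self.called.append("visit_Import")
        self.generic_visit(node)

tracer = CallbackTracer()
tracer.visit(ast.parse("import os\nimport sys\nx = 1"))
print(tracer.called)  # ['visit_Import', 'visit_Import']
```

Note that the assignment x = 1 is also a statement, yet it triggers neither callback.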
In the common case, when a node has no associated visit_<type> member, the visitor calls the member generic_visit, which ensures that the walk recurses into the children of that node, for which you may have a member defined even if you did not define one for the node itself. When you override a member, that function is called instead, and generic_visit is no longer called automatically. You are then responsible for ensuring that the children are visited, by calling self.generic_visit(node) explicitly in your member (unless you expressly intend to stop the recursion there).
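Here is a small sketch of what forgetting generic_visit does. Both hypothetical visitors record the names of function definitions, but only the first continues into the function body, so only it finds the nested function.

```python
import ast

SOURCE = """
def outer():
    def inner():
        pass
"""

class RecursingCollector(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_FunctionDef(self, node):
        self.names.append(node.name)
        self.generic_visit(node)  # keep walking into the function body

class StoppingCollector(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_FunctionDef(self, node):
        self.names.append(node.name)
        # no generic_visit here: the body, and thus inner(), is never walked

tree = ast.parse(SOURCE)
r, s = RecursingCollector(), StoppingCollector()
r.visit(tree)
s.visit(tree)
print(r.names)  # ['outer', 'inner']
print(s.names)  # ['outer']
```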
Using the node objects
Each callback function visit_<type>(self, node) is called with a node object of a class particular to the given type. All these classes derive from the abstract class ast.AST. As a result, each has a member _fields, along with members specific to the class. The names member used in the first example, for instance, is specific to Import nodes. Note that it corresponds to the argument of the Import constructor in the abstract grammar, Import(alias* names). In general, these constructor arguments are exactly the class-specific members: the ast module reference states that each concrete class lists the names of its child nodes in _fields and has one attribute per child node.
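To see these members without writing a visitor, you can parse a snippet and inspect the first statement directly. This sketch only uses standard ast attributes; the input string is an arbitrary example.

```python
import ast

tree = ast.parse("import os.path as osp, sys")
node = tree.body[0]              # the ast.Import node for the statement

print(type(node).__name__)       # Import
print(isinstance(node, ast.stmt), isinstance(node, ast.AST))  # True True
print(node._fields)              # ('names',)

# each element of node.names is an ast.alias with its own fields
for alias in node.names:
    print(alias.name, alias.asname)
# os.path osp
# sys None
```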
The _fields Member
The following snippet gives an example of how iterating over _fields returns all children of a node. Given the input "a = b + 1", the member function
generates the output
For each child, the generator yields a tuple consisting of the field name and the child object. A quick look at the abstract syntax grammar shows that all child classes again correspond to symbols in the grammar: Name, Add and Num. (In Python 3.8 and later, the literal 1 is represented by Constant instead of Num.)
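A sketch of a member function along these lines is shown below, assuming the original iterated with ast.iter_fields, which yields (name, value) pairs for every name in node._fields. The class name FieldPrinter and its children list are hypothetical. On a current Python the right operand shows up as Constant rather than Num.

```python
import ast

class FieldPrinter(ast.NodeVisitor):
    def __init__(self):
        self.children = []  # hypothetical: records (field name, child class)

    def visit_BinOp(self, node):
        # ast.iter_fields yields (name, value) for each name in node._fields
        for name, child in ast.iter_fields(node):
            self.children.append((name, type(child).__name__))
            print(name, type(child).__name__)
        self.generic_visit(node)

printer = FieldPrinter()
printer.visit(ast.parse("a = b + 1"))
# left Name
# op Add
# right Constant
```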
Calling the Parser
This brings us to the last step: how to actually pass input to the parser and generate output. Assuming you have a string containing Python code, the string is parsed into an in-memory tree, and the tree is walked with your callbacks, using:
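A self-contained sketch of the complete round trip: ast.parse turns the string into a tree, and calling visit on a NodeVisitor instance walks it. The helper analyze and the class ImportCollector are hypothetical names; the filename argument to ast.parse is optional and only affects error messages.

```python
import ast

class ImportCollector(ast.NodeVisitor):
    def __init__(self):
        self.modules = []

    def visit_Import(self, node):
        self.modules.extend(alias.name for alias in node.names)
        self.generic_visit(node)

def analyze(source, filename="<string>"):
    tree = ast.parse(source, filename=filename)  # text -> ast.Module tree
    collector = ImportCollector()
    collector.visit(tree)                       # walk the tree with the callbacks
    return collector.modules

print(analyze("import foo\nimport bar as baz\n"))  # ['foo', 'bar']
```

To analyze a file instead, read its contents into a string first and pass the path as filename.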
A Warning on Modifying Code
Tree walking is not just useful for inspecting code; you can also use it to modify the parse tree. The reference documentation (see below) is very clear that you should not use NodeVisitor for this purpose. Instead, derive from the NodeTransformer class, whose members are expected to return a replacement object for each object with which they are called (return the original node to keep it, or None to remove it).
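A minimal NodeTransformer sketch, assuming the goal is to rename one variable. The class RenameName is hypothetical; ast.copy_location, ast.fix_missing_locations and compile are standard, while ast.unparse requires Python 3.9 or later.

```python
import ast

class RenameName(ast.NodeTransformer):
    """Replaces every Name node called `old` with one called `new`."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            # return a replacement node; copy_location keeps line/column info
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node  # returning the node unchanged keeps it in place

tree = ast.parse("a = b + 1")
tree = ast.fix_missing_locations(RenameName("b", "c").visit(tree))
print(ast.unparse(tree))  # a = c + 1

# the modified tree can be compiled and run like ordinary source
namespace = {"c": 41}
exec(compile(tree, "<ast>", "exec"), namespace)
print(namespace["a"])  # 42
```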
- The authoritative source of information is the Python ast module reference
- StackOverflow has a discussion with informative examples