Recreating Awk-like behavior in Python

In Fedora Devshell, one feature I would like to support is turning a revision control branch of Fedora-specific changes into a set of Patch lines, and converting a set of Patch lines back into a revision control branch. I figured there were two ways of doing this: the first is to parse the entire spec file and recreate it later; the second is to use stream editing, in the style of well-known Unix tools such as ed, sed, and awk.

The problem I had is that while full parsing is useful in certain conditions, you risk the parser's output not being exactly the same as its input. For example, if a developer put one Patch: line after the BuildRequires while the rest were before it, a poorly written parser might reorder the file unnecessarily. This is the kind of headache that would make writing a full spec file parser a several-day project, prone to error and annoying to test. I decided to go with the latter method, stream editing.

Awk is a really cool program if you ever learn to use it effectively. The problem, though, is that scripting Awk from Python would get unwieldy. It would require creating one more layer over a DSL (domain-specific language), which is just as troublesome as writing a full parser. Still, I wanted a set of semantics for making all kinds of changes to a spec file from the bottom up. So instead, I've recreated some of the core features of Awk in Python, as a sort of complex parser.

This Pythonic Awk parser lets you create all kinds of patterns and composites of patterns. It also lets you create handlers, or composites of handlers, for those patterns. Finally, it lets you compose a series of patterns and their handlers into a single Awk "program", represented by an instance of the Awk object. Since the process method accepts an iterable and returns a generator, multiple Awk programs can be chained together. This only took a few hours to do.
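
To give a feel for the idea, here is a minimal sketch. It is not the actual Devshell code: only the Awk object and its process method come from the description above, while regex, any_of, chain, comment_out, and drop are names made up for illustration. It also simplifies Awk's semantics, assuming that the first matching rule wins and that lines matching nothing pass through unchanged.

```python
import re
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical types and helpers, invented for this sketch.
Pattern = Callable[[str], bool]
Handler = Callable[[str], Iterable[str]]


def regex(expr: str) -> Pattern:
    """Pattern that matches when the regular expression is found in the line."""
    compiled = re.compile(expr)
    return lambda line: compiled.search(line) is not None


def any_of(*patterns: Pattern) -> Pattern:
    """Composite pattern: matches when at least one sub-pattern matches."""
    return lambda line: any(p(line) for p in patterns)


def chain(*handlers: Handler) -> Handler:
    """Composite handler: feed each handler's output lines into the next one."""
    def run(line: str) -> List[str]:
        lines = [line]
        for handler in handlers:
            lines = [out for item in lines for out in handler(item)]
        return lines
    return run


class Awk:
    """A 'program': an ordered list of (pattern, handler) rules."""

    def __init__(self, rules: List[Tuple[Pattern, Handler]]):
        self.rules = rules

    def process(self, lines: Iterable[str]) -> Iterator[str]:
        """Lazily yield the rewritten lines, preserving input order."""
        for line in lines:
            for pattern, handler in self.rules:
                if pattern(line):
                    # Assumption: first matching rule wins.
                    yield from handler(line)
                    break
            else:
                # Assumption: unmatched lines pass through untouched.
                yield line


def comment_out(line: str) -> List[str]:
    return ["# " + line]


def drop(line: str) -> List[str]:
    return []


spec = [
    "Name: example",
    "Patch0: fix-build.patch",
    "BuildRequires: gcc",
    "Patch1: add-docs.patch",
]

disable_patches = Awk([(regex(r"^Patch\d*:"), comment_out)])
strip_build_requires = Awk([(regex(r"^BuildRequires:"), drop)])

# Because process() takes an iterable and returns a generator,
# one program's output can be fed straight into the next.
result = list(strip_build_requires.process(disable_patches.process(spec)))
# result == ["Name: example", "# Patch0: fix-build.patch", "# Patch1: add-docs.patch"]
```

Note that the stray Patch1 line stays where the developer put it relative to the other lines; nothing is reordered, which is the whole point of editing the stream instead of parsing and regenerating the file.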

The code is in the fedora-devshell.git repository.