I want a TAP parser in Python, so I tried yeanpypa:
from yeanpypa import *
non_zero_number = AnyOf('123456789') + ZeroOrMore(digit)
rest_of_line = OneOrMore(NoneOf('\n'))
plan = Literal('1..') + non_zero_number
todo_directive = Optional(' ') + rest_of_line
skip_reason = (ZeroOrMore(NoneOf(' ')) + Literal(' ')
               + rest_of_line)
skip_directive = (Optional(' ') + Literal('# ')
                  + Literal('skip') + Optional(skip_reason))
directive = (Optional(' ') + Literal('# ')
             + (skip_directive | todo_directive))
description = (Optional(' ') + Optional('- ')
               + ZeroOrMore(NoneOf('#\n')))
ok_not_ok = Literal('ok') | Literal('not ok')
test_num = Optional(' ') + non_zero_number
test = (ok_not_ok + Optional(test_num)
        + Optional(description) + Optional(directive))
plan_skipped = Literal('1..0') + skip_directive
plan_first_tap = plan + ZeroOrMore(Literal('\n') + test)
plan_last_tap = (test + ZeroOrMore(Literal('\n') + test)
                 + Optional(Literal('\n') + plan))
tap = plan_skipped | plan_first_tap | plan_last_tap
I guess it works, but it feels like writing a regular expression longhand. I would probably also make it more lax if I wrote it as an re, as it would be easier to write \s*-\s* in the description rule, say.
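For comparison, here is a minimal sketch of what the laxer regular-expression version of the test-line rule might look like, using only the standard re module. The pattern and the group names (status, num, desc, directive) are assumptions made up for illustration; they are not part of yeanpypa or of any TAP library, and the sketch covers only test lines, not the plan.

```python
import re

# Hypothetical lax pattern for a single TAP test line.
# Group names are illustrative only.
TEST_LINE = re.compile(r"""
    ^(?P<status>not\ ok|ok)\b        # test result
    (?:\s+(?P<num>[1-9][0-9]*))?     # optional test number
    (?:\s*-\s*|\s+)?                 # optional dash separator, any spacing
    (?P<desc>[^#\n]*?)               # description, stops before a hash
    \s*
    (?:\#\s*(?P<directive>.*))?      # optional directive text
    $
""", re.VERBOSE)


def parse_test_line(line):
    """Return a dict of the named groups, or None if this is not a test line."""
    m = TEST_LINE.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_test_line("not ok 4 - Summarized correctly # TODO Not written yet"))
```

The whole rule fits in one pattern, but all you get back is a flat set of groups for one line, which is the trade-off discussed in the comments.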
Yeah, I used YAPPS back in the day at school, which has a run-through-this-tool step to compile a separate grammar spec into Python code.
Defined in Python like this, it is just a longhand RE. It’ll be helpful if (like I say below) I can use yeanpypa’s CallbackParser to get a SAX-like view of the document, since I can’t do that with the re module (though I guess I could kind of build that out of re.finditer).
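The re.finditer idea could be sketched roughly like this: scan the document with one multi-line pattern and call a handler method for each line that matches, so the parts are collected as the scan goes. The Collector class, its method names (on_plan, on_test), and the pattern are all hypothetical, invented for this sketch; yeanpypa’s CallbackParser may work quite differently.

```python
import re

# One alternative per kind of line. Pattern and group names are
# assumptions made for this sketch, not an existing API.
LINE = re.compile(r"""
    ^(?:
        (?P<plan>1\.\.(?P<ubound>[0-9]+))
      | (?P<test>(?P<status>not\ ok|ok)\b(?P<rest>[^\n]*))
    )$
""", re.VERBOSE | re.MULTILINE)


class Collector:
    """Hypothetical SAX-style handler that just records the events it sees."""

    def __init__(self):
        self.events = []

    def on_plan(self, ubound):
        self.events.append(("plan", ubound))

    def on_test(self, passed, rest):
        self.events.append(("test", passed, rest))


def emit_events(text, handler):
    """Call handler methods in document order; unrecognized lines are skipped."""
    for m in LINE.finditer(text):
        if m.group("plan") is not None:
            handler.on_plan(int(m.group("ubound")))
        else:
            handler.on_test(m.group("status") == "ok", m.group("rest").strip())


if __name__ == "__main__":
    collector = Collector()
    emit_events("1..2\nok 1 - first\nnot ok 2 - second # TODO later\n", collector)
    print(collector.events)
```

Lines that match neither alternative are silently ignored here, which a real parser would probably want to report instead.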
Yeah, exactly. One or more of these parsing kits actually compiles your grammar into a real RE. So far it’s really not more powerful at all; all I get out of the posted code is the equivalent of a Python MatchGroup anyhow.
I haven’t looked into yeanpypa’s CallbackParser; hopefully yeanpypa’s CallbackParser is more like SAX parsing, and will let me collect the parts as it goes. That would be worthwhile.
Pyparsing rendition
from pyparsing import *

# newlines are significant whitespace in this parser, so set
# default skippable whitespace to just spaces and tabs
ParserElement.setDefaultWhitespaceChars(" \t")
NL = LineEnd().suppress()

integer = Word(nums)
plan = '1..' + integer("ubound")

OK,NOT_OK = map(Literal,['ok','not ok'])
testStatus = (OK | NOT_OK)

description = Regex("[^#\n]+")
description.setParseAction(lambda t:t[0].lstrip('- '))

TODO,SKIP = map(CaselessLiteral,'TODO SKIP'.split())
directive = Group(Suppress('#') + (TODO + restOfLine |
    FollowedBy(SKIP) +
        restOfLine.copy().setParseAction(lambda t:['SKIP',t[0]]) ))

testLine = Group(testStatus("passed") +
    Optional(integer)("testNumber") +
    Optional(description)("description") +
    Optional(directive)("directive")
    )
bailLine = Group(Literal("Bail out!")("BAIL") +
    empty + Optional(restOfLine)("reason"))

tapOutput = Optional(Group(plan)("plan") + NL) & \
    Group(OneOrMore((testLine|bailLine) + NL))("tests")

The quoted strings embedded within the grammar indicate field names that can be used to directly access strings within the parsed results. Here is the code that tests and processes the TAP output:
def tallyResults(results):
    failedTests = []
    skippedTests = []
    todoTests = []
    bonusTests = []
    if results.plan:
        expected = range(1, int(results.plan.ubound)+1)
    else:
        expected = range(1,len(results.tests)+1)

    for i,res in enumerate(results.tests):
        # test for bail out
        if res.BAIL:
            print("Test suite aborted: " + res.reason)
            failedTests += expected[i:]
            break

        #~ print(res.dump())
        testnum = i+1
        if res.testNumber != "":
            if testnum != int(res.testNumber):
                print("ERROR! test %(testNumber)s out of sequence" % res)
            testnum = int(res.testNumber)

        passed = (res.passed=="ok")
        skipped = todo = False
        if res.directive:
            skipped = (res.directive[0][0]=='SKIP')
            todo = (res.directive[0][0]=='TODO')
        if not passed:
            failedTests.append(testnum)
        if skipped:
            skippedTests.append(testnum)
        if todo:
            todoTests.append(testnum)
        if todo and passed:
            bonusTests.append(testnum)

    if failedTests:
        print("Failed tests:", failedTests)
    if skippedTests:
        print("SKIPPED:", skippedTests)
    if todoTests:
        print("TODO:", todoTests)
    if bonusTests:
        print("BONUS:", bonusTests)
    # failures in TODO tests do not count against the suite
    if (set(failedTests)-set(todoTests) == set()):
        print("PASSED")
    else:
        print("FAILED")

if __name__ == "__main__":
    test1 = """\
1..4
ok 1 - Input file opened
not ok 2 - First line of the input valid
ok 3 - Read the rest of the file
not ok 4 - Summarized correctly # TODO Not written yet
"""
    test2 = """\
ok 1
not ok 2 some description # TODO with a directive
ok 3 a description only, no directive
ok 4 # TODO directive only
ok a description only, no directive
ok # SKIP only a directive, no description
ok
"""
    for test in (test1,test2):
        print(test)
        testResults = tapOutput.parseString(test)
        tallyResults(testResults)
        print()

These tests print out:
You are absolutely right, pyparsing is mostly an uber-verbose regex. But that verbosity really helps when the time comes to go back and make changes or enhancements to a parser that you wrote months ago.