markpasc (markpasc) wrote,

TAP in pyparsing

Paul McGuire, author of pyparsing, was nice enough to write up a TAP parser using it for comparison to my yeanpypa attempt. A few days earlier I had hacked up my own pyparsing version too, as pyparsing has a few features that seemed like they would make for a more forgiving parser. Here’s the pyparsing version I wrote:

import re

from pyparsing import *

non_zero_digits = '123456789'
digits = '0' + non_zero_digits

non_zero_number = Combine(Word(non_zero_digits, digits))

plan = Combine(Suppress(Literal('1..')) + non_zero_number)
test_num    = non_zero_number('num')
description = Combine(NotAny(digits) + OneOrMore(CharsNotIn('#\n') | Literal('\\#')))('description')
directive   = Suppress(Literal('#')) + Optional(CaselessLiteral('skip')('skip') | CaselessLiteral('todo')('todo')) + Combine(Regex(r'(?s).'))('directive')
result = Literal('not ok')('notok') | Literal('ok')('ok')
preamble = ZeroOrMore(~result + SkipTo(OneOrMore(LineEnd()), include=True))
test_line = result + Optional(test_num + Optional(description) + Optional(directive))
test = Group(Combine(preamble)('preamble') + test_line)

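def stripLeadingDash(tokens):
    # Drop the optional leading "- " from the description, in both the
    # positional token and the named result.
    tokens[0] = re.sub(r'^-\s*', '', tokens[0])
    tokens['description'] = re.sub(r'^-\s*', '', tokens['description'])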

description.addParseAction(stripLeadingDash)

tap = (plan('plan') & Group(OneOrMore(test))('tests')) | ('1..0' + directive)

Some interesting features:

  • I used the pydoc and the examples instead of the actual manual, so I didn’t see setDefaultWhitespaceChars that Paul used. I had to use CharsNotIn('#\n') in description and (?s) in the directive regexp so the dot wouldn’t match the \n.
  • I used a parse action to strip the optional leading dash from description, rather than try to match it. Looking back now I don’t see why I couldn’t include the Optional('-') on the front, but it seemed to make sense when I wrote it.
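One caveat on the regexp remark in the bullets above: in Python’s re module, the dot skips newlines by default, and (?s) is the flag that makes it match them. Here is a minimal stdlib-only sketch of that, plus what the ^-\s* substitution from stripLeadingDash does to a description. The sample strings and the strip_leading_dash helper are made up for illustration and are not part of the parser.

```python
import re

# Made-up sample: a directive followed by the start of another line.
sample = "TODO Not written yet\nok 5"

# Without flags, "." stops at the newline.
plain = re.match(r'.*', sample).group(0)

# With (?s), i.e. re.DOTALL, "." also matches the newline.
dotall = re.match(r'(?s).*', sample).group(0)


def strip_leading_dash(text):
    """Drop an optional leading dash and any whitespace after it."""
    return re.sub(r'^-\s*', '', text)


print(repr(plain))
print(repr(dotall))
print(repr(strip_leading_dash('- Input file opened')))
print(repr(strip_leading_dash('Input file opened')))
```

The substitution is a no-op when there is no leading dash, which is why the parse action can run on every description.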

Using Combine and the naming feature means I get immediately useful results with this parser, unlike the yeanpypa parser, which ends up being more of a tokenizer.

>>> from tap_parser import tap
>>> test1 = """\
... 1..4
... ok 1 - Input file opened
... not ok 2 - First line of the input valid
... ok 3 - Read the rest of the file
... not ok 4 - Summarized correctly # TODO Not written yet
... """
>>> res = tap.parseString(test1)
>>> res.plan
'4'
>>> res.tests[0].ok
'ok'
>>> res.tests[1].ok
''
>>> res.tests[1].notok
'not ok'
>>> res.tests[1].description
'First line of the input valid'
>>> [res.tests[3][f] for f in ('num', 'todo', 'description')]
['4', 'todo', 'Summarized correctly ']
>>>

Paul still uses a function in his example to massage the results into output, but my results map almost exactly into the Django model objects I planned to make.

On the other hand, mine fails Paul’s test3 example, which is certainly proper TAP. It seems not to be handling a plan that comes after the results, due to my attempted use of Each (the & operator). Hmm.

Update: Oh, no, it was the missing test numbers that made it barf. Somehow I’d read into the TAP spec that a test number was required if there was a description… or at least accidentally written that into the parser. Changing test_line’s definition thusly made the tests work (though I’m only testing one stream for correctness so far):

test_line = (result + Optional(test_num) +
             Optional(description) + Optional(directive))
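
To show what making the test number independently optional buys, here is a rough stdlib-only sketch. It uses a regular expression rather than pyparsing, so it only approximates the revised test_line and is not the grammar itself. TEST_LINE and parse_test_line are names invented for this example.

```python
import re

# Hypothetical approximation of the revised test_line: every part after
# the result is optional, and none of them depends on another.
TEST_LINE = re.compile(
    r'^(?P<result>not ok|ok)'                  # "not ok" is tried first
    r'(?:\s+(?P<num>[1-9][0-9]*))?'            # optional test number
    r'(?:\s+-?\s*(?P<description>[^#\n]*?))?'  # optional description
    r'\s*(?:#\s*(?P<directive>.*))?$'          # optional directive
)


def parse_test_line(line):
    """Return the named parts of one TAP test line, or None if it is not one."""
    match = TEST_LINE.match(line)
    return match.groupdict() if match else None


for line in ('ok 1 - Input file opened',
             'ok - no number here',
             'not ok 4 - Summarized correctly # TODO Not written yet',
             'ok'):
    print(parse_test_line(line))
```

A line with a description but no number still parses, which is the case the original definition rejected.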
Tags: code, pyparsing, python, tap