Problems with CIF BNF

Joe Krahn krahn at
Mon Mar 12 17:44:42 GMT 2007

Some parts of CIF are vague. I hoped that the BNF syntax would be a
precise syntax specification, but it has problems. It is central to
properly defining the CIF format, and should therefore be very accurate.

First, there are some plain syntax errors, like unbalanced braces in the
production of <Float>, and an empty token in the TokenizedComments
production.

There are also a few hacks, such as <noteol>, and there are no rules for
the content of quoted strings. I think it is also a hack for a single
production to be defined on two elements, like "<eol><UnquotedString>".

Does EOF count as whitespace? Normally, a text file ends with an <eol>
on the last line, so this is not a problem. With Fortran, however, you
may not be able to distinguish EOF from <eol>, so it seems that EOF
probably should count as a whitespace token.
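
To illustrate what treating EOF as whitespace would mean in practice,
here is a minimal Python sketch. It is not taken from the CIF
specification: the function name split_tokens and the whitespace set
are my own assumptions, and it deliberately ignores quoted strings,
comments and text fields. The only point is that end of input ends a
token exactly as a space or <eol> would.

```python
# Hypothetical sketch: end of input is handled like a whitespace character.
# Quoted strings, comments and text fields are NOT handled here.
WHITESPACE = " \t\r\n"  # assumed whitespace set for this illustration


def split_tokens(text):
    """Split text into whitespace-separated tokens, treating EOF as whitespace."""
    tokens = []
    current = []
    # The trailing None is a sentinel standing in for EOF.
    for ch in list(text) + [None]:
        if ch is None or ch in WHITESPACE:
            # Whitespace or EOF: close the token in progress, if any.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    return tokens
```

With this rule, a file whose last line lacks a terminating <eol> yields
the same tokens as one that has it.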

There are also places where the grammar could be simplified. For
example:

  { {'e' | 'E' } | {'e' | 'E' } { '+' | '- ' } } <UnsignedInteger>

could be written as:

  {'e' | 'E' } { '+' | '-' }?  <UnsignedInteger>

Also note the error in the first form copied from the web page: the
minus sign has a space included.
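
As a rough check of the two exponent forms, here is a Python sketch
that translates them into regular expressions. The translation is my
own, and it assumes that <UnsignedInteger> means one or more decimal
digits. ORIGINAL is the first form with the stray space removed,
LITERAL is the first form exactly as copied (with '- '), and SIMPLIFIED
is the second form. The helper same_language is a hypothetical name; it
compares ORIGINAL and SIMPLIFIED by brute force over short strings.

```python
import re
from itertools import product

# First form with the stray space after '-' removed.
ORIGINAL = re.compile(r"(?:[eE]|[eE][+-])[0-9]+")
# First form taken literally: the minus alternative is '-' plus a space.
LITERAL = re.compile(r"(?:[eE]|[eE](?:\+|- ))[0-9]+")
# Simplified form with an optional sign.
SIMPLIFIED = re.compile(r"[eE][+-]?[0-9]+")


def same_language(max_len=4, alphabet="eE+-0 "):
    """Return True if ORIGINAL and SIMPLIFIED agree on every string
    over `alphabet` of length up to `max_len`."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(ORIGINAL.fullmatch(s)) != bool(SIMPLIFIED.fullmatch(s)):
                return False
    return True
```

Under these assumptions same_language() returns True, so the two forms
accept the same short strings, while LITERAL rejects "e-5" and accepts
"e- 5", which is presumably not what was intended.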

Should the logical-OR symbol always be contained within braces? This
appears to be inconsistent, but maybe the rule is to require braces when
the members include a quoted character element.

I will try to edit my own version of the BNF to produce what I think it
is supposed to mean. Answers to some of the above questions will be
helpful in getting it right.

Joe Krahn
