Problems with CIF BNF

Tue Mar 13 02:12:08 GMT 2007

You may find the following links helpful:

    http://arcib.dowling.edu/cifiucr/

especially

    http://www.bernstein-plus-sons.com/software/ciftest/

and

    http://arcib.dowling.edu/vcif/

At 4:30 PM -0400 3/12/07, Joe Krahn wrote:
>I realize that there are a few hacks in the BNF to deal with
>context-dependence, like productions defined as multiple symbols, which
>make it impossible to use as a working BNF. But, there are other
>problems with the grammar. With the end-of-line example, the lexer can do
>something 'sensible', but it is still important to have a specific
>definition of whether missing a terminal <eol> makes the CIF invalid.
>
>I can look at CBFlib to see an interpretation of the CIF grammar, but
>someone else's parser may have a different interpretation. In fact, it
>would be good to have a collection of unusual CIF files for parser
>testing, with a consensus as to which ones are valid and which are invalid.
>
>Joe
>
>Herbert J. Bernstein wrote:
>>  Without a defined lexer, you cannot do CIF as a BNF; it is context
>>  sensitive in its use of whitespace.  The question you are raising
>>  about EOF should be handled by the lexer, which should deal sensibly
>>  with the usual Unix problem of disambiguating the case of a final
>>  line that ends with eof rather than eol-eof.  There is a rather
>>  complete bison grammar in CBFlib working on the level of tokens
>>  after lexing the input.  -- HJB
>>
>>
>>  At 1:44 PM -0400 3/12/07, Joe Krahn wrote:
>>>  Some parts of CIF are vague. I hoped that the BNF syntax would be a
>>>  precise syntax specification, but it has problems. It is central to
>>>  properly defining the CIF format, and should therefore be very accurate.
>>>
>>>  First, there are some plain syntax errors, like unbalanced braces in the
>>>  production of <Float>, and an empty token in the TokenizedComments
>>>  production.
>>>
>>>  There are also a few hacks like <noteol>, and the lack of rules for the
>>>  content of quoted strings. I think it is also a hack for a production
>>>  unit to be defined for two elements, like "<eol><UnquotedString>".
>>>
>>>  Does EOF count as whitespace? Normally, a text file ends with an <eol>
>>>  on the last line, so it is not a problem. With Fortran, you may not be
>>>  able to distinguish between them, so it seems that EOF probably should
>>>  count as a whitespace token.
>>>
>>>  There are also places where the grammar could be simplified, such as:
>>>
>>>    { {'e' | 'E' } | {'e' | 'E' } { '+' | '- ' } } <UnsignedInteger>
>>>
>>>  written as:
>>>    {'e' | 'E' } { '+' | '-' }?  <UnsignedInteger>
>>>
>>>  Also note the error in the first form copied from the web page: the
>>>  minus sign has a space included.
>>>
>>>  Should the logical-OR symbol always be contained within braces? This
>>>  appears to be inconsistent, but maybe the rule is to require braces when
>>>  the members include a quoted character element.
>>>
>>>  I will try to edit my own version of the BNF to produce what I think it
>>>  is supposed to mean. Answers to some of the above questions will be
>>>  helpful in getting it right.
>>>
>>>  Thanks,
>>>  Joe Krahn
>>>  _______________________________________________
>>>  comcifs mailing list
>>>  comcifs at iucr.org
>>>  http://scripts.iucr.org/mailman/listinfo/comcifs
>>
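
The exponent simplification proposed in the quoted message can be checked
mechanically. The following is only a minimal sketch, assuming Python regular
expressions as a stand-in for the BNF notation and assuming that
<UnsignedInteger> is one or more decimal digits (that production is not quoted
in the thread). The names ORIGINAL, SIMPLIFIED, AS_PUBLISHED, accepts,
candidates and disagreements are hypothetical and exist only for this
illustration. It enumerates short strings and reports any on which the two
forms of the exponent production disagree.

```python
import re
from itertools import product

# Assumption: <UnsignedInteger> is one or more decimal digits.
UNSIGNED_INTEGER = r"[0-9]+"

# { {'e' | 'E' } | {'e' | 'E' } { '+' | '-' } } <UnsignedInteger>
ORIGINAL = re.compile(rf"(?:[eE]|[eE][+-]){UNSIGNED_INTEGER}")

# {'e' | 'E' } { '+' | '-' }?  <UnsignedInteger>
SIMPLIFIED = re.compile(rf"[eE][+-]?{UNSIGNED_INTEGER}")

# The first form read literally, with the stray space after the minus sign.
AS_PUBLISHED = re.compile(rf"(?:[eE]|[eE](?:\+|- )){UNSIGNED_INTEGER}")


def accepts(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


def candidates(alphabet="eE+- 0", max_len=4):
    """Yield every string over alphabet up to max_len characters long."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def disagreements(first, second):
    """List the candidate strings that only one of the two patterns accepts."""
    return [s for s in candidates()
            if accepts(first, s) != accepts(second, s)]


if __name__ == "__main__":
    print("original vs simplified:", disagreements(ORIGINAL, SIMPLIFIED))
    print("'e- 0' as published:", accepts(AS_PUBLISHED, "e- 0"))
    print("'e-0'  as published:", accepts(AS_PUBLISHED, "e-0"))
```

Under these assumptions the two intended forms accept the same strings, while
the literal reading with the embedded space accepts "e- 0" and rejects "e-0".
A bounded enumeration like this is a sanity check, not a proof of equivalence.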

Crystallography Online: the website of the International Union of Crystallography
