CIF Infoset

Dr P. Murray-Rust pm286 at cam.ac.uk
Wed Aug 18 09:42:15 BST 2004


On Aug 17 2004, Herbert J. Bernstein wrote:

> Peter asks some interesting questions.  I do not propose to answer
> them in detail here.  However, I should point out that interpretation
> of a given CIF may require 4 sets of documents:
> 
>   1.  The CIF itself.
>   2.  The dictionary or dictionaries defining the tags
> used in the CIF
>   3.  The relevant DDLs
>   4.  The CIF specification:
>        http://www.iucr.org/iucr-top/cif/spec/version1.1/index.html

I agree with this. A little while ago I was invited to work with Syd and 
Nick and spent 2 pleasant weeks looking at whether this could be managed in 
a self-consistent system. In theory, yes. In practice it was questionable 
whether it was worthwhile and would be used.

It is almost isomorphic with the XML schema hierarchy:

DDL -validates-> DDL -validates-> dictionary -validates-> CIF

i.e. the DDL is self-validating. The problem was that *any* change to the 
DDL has repercussions which multiply down the line. In XMLSchema we have

SchemaSchema -validates-> XSDSchema -validates-> instance

The construction of self-consistent schemas in XML has been anything but 
trivial and has caused much argument. It is unlikely that CIF will benefit 
from a rerun.
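
The chain above can be sketched as a toy model. This is only an 
illustration, not how any real CIF or DDL validator works: the Level 
class, the functions and the reduction of "validation" to a subset test 
on tag names are all assumptions made for the example, and the tag sets 
are tiny placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One layer of the hierarchy (hypothetical model, not a real API)."""
    name: str
    defines: frozenset  # tags this level permits in the level it validates
    uses: frozenset     # tags this level's own text is written with


def validates(upper, lower):
    # Toy rule: the lower level may only use tags the upper level defines.
    return lower.uses <= upper.defines


def build_chain(ddl, dictionary, cif):
    # The DDL appears twice because it validates itself.
    return [ddl, ddl, dictionary, cif]


def check_chain(chain):
    # One result per "-validates->" arrow in the chain.
    return [validates(u, l) for u, l in zip(chain, chain[1:])]


ddl = Level("DDL", frozenset({"_name", "_type"}),
            frozenset({"_name", "_type"}))
dictionary = Level("dictionary", frozenset({"_cell_length_a"}),
                   frozenset({"_name", "_type"}))
cif = Level("CIF", frozenset(), frozenset({"_cell_length_a"}))

print(check_chain(build_chain(ddl, dictionary, cif)))

# Rename one DDL attribute. The DDL is still self-consistent, but the
# dictionary written against the old DDL no longer validates.
revised_ddl = Level("DDL", frozenset({"_name", "_category"}),
                    frozenset({"_name", "_category"}))
print(check_chain(build_chain(revised_ddl, dictionary, cif)))
```

In this toy a single rename at the top leaves the DDL valid against 
itself yet invalidates every dictionary written against the old version, 
which is the kind of repercussion I mean.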

So I have taken the pragmatic view that we have DDL2 and DDL1 as currently 
accepted and used. As my own interests are currently in DDL1 I have 
restricted my questions and concerns to CIF (i.e. not STAR) and built 
software for this. My architecture should be sufficiently modular 
that if/when CIF extends to fuller STAR it can be enhanced.

> 
> Many of Peter's questions are answered in the specification.

The lexical questions are. I have used the syntax and semantics documents 
as reference. I have assumed these are formal abstractions of the original 
published article(s). If they are not, then it would be useful to abstract 
additional rules - I think that implementers need to know exactly what 
documents apply and what the rules are.

> 
> The infoset concept is useful, but be warned that the appropriate
> handling of information depends on the context within which you are
> working, regardless of whether you are using CIF or using XML or
> the PDB format.  For an application intended to just get at the data,
> comments may be discarded, while for an application intended to reformat
> the presentation of the data, comments are highly significant
> information.  Similarly, the particular form of quoting, the
> distinction between "." and "?", etc. may or may not be
> significant.  If the application in question is, say, a
> refinement program that just needs to read CIFs to extract
> expected crystallographic data, then construction of the "infoset"
> from a CIF is particularly simple.  More demanding applications,
> e.g. in CIF validation and publication suites, may need to deal
> with more subtle data and metadata questions.
> 
I am afraid I disagree! If the interpretation of a CIF depends on what 
program is to be used to process it then it is (IMO) not an abstract 
archive and transfer format.
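
To make the point concrete, here is a minimal sketch of an 
application-independent reading. It rests on assumptions chosen purely 
for illustration: that comments and quote style are not part of the 
infoset, that unquoted "." and "?" are kept as two distinct special 
values, and that only single-line "_tag value" items are handled (no 
loops, data blocks or text fields). The infoset function and the Special 
enum are hypothetical names, not part of any existing CIF library.

```python
import re
from enum import Enum


class Special(Enum):
    """Unquoted special values, kept distinct from each other."""
    INAPPLICABLE = "."
    UNKNOWN = "?"


# First value token on a line: 'single quoted', "double quoted", or bare.
TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|(\S+)""")


def infoset(text):
    """Map each data name to its value, ignoring comments and quote style."""
    items = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        # Skip blank lines, comment lines, and anything that is not an item.
        if len(parts) < 2 or not parts[0].startswith("_"):
            continue
        tag, rest = parts
        match = TOKEN.match(rest)
        if match is None:
            continue
        single, double, bare = match.groups()
        if bare is not None:
            # Only an unquoted "." or "?" carries the special meaning.
            value = Special(bare) if bare in (".", "?") else bare
        else:
            value = single if single is not None else double
        items[tag.lower()] = value  # CIF data names are case-insensitive
    return items


a = "_cell_length_a 10.5 # measured\n_symmetry_space_group_name_H-M 'P 21'\n"
b = "# header\n_cell_length_a   10.5\n_symmetry_space_group_name_H-M \"P 21\"\n"
print(infoset(a) == infoset(b))
print(infoset("_x ?") == infoset("_x ."))
```

Under these assumptions two files that differ only in comments, 
whitespace and quote style give the same infoset, whichever program 
reads them, while "?" and "." remain distinguishable. Which of these 
choices is right is precisely what a formal infoset would have to state.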

Peter M-R



