Global section in CIF headers

James Hester jamesrhester at gmail.com
Wed Sep 9 03:49:18 BST 2009


Let's explore these ideas. First I think it is worth clarifying the
syntax situation.

Currently, there are three CIF syntaxes: the old 1.0 syntax, where
brackets were allowed to begin non-delimited character strings; the
current 1.1 syntax, where this behaviour was disallowed and maximum
line lengths were increased; and the upcoming 1.2 syntax, which has
bracketed lists and is still being finalised.  The next layer of
semantics is provided by the dictionaries to which a given file
conforms. DDL1, DDL2 and DDLm are relevant only to the dictionaries,
not directly to the data files.  Note that the "dot" notation
introduced in DDL2/mmCIF datanames has no computer-readable meaning,
but is purely a convenience for the human reader.

While I wholeheartedly agree with the sentiment of removing all data
from comments, in the one particular case of distinguishing between
different syntaxes, it is convenient to have a syntax indicator in the
first few characters of a file.  A simple examination of the first
line of the file is sufficient to decide which parser to execute,
following which no syntax issues remain.  However, if the syntax is to
be specified in a global_ block, some sort of CIF parser needs to be
run in order to even find the global block and discover the precise
syntax, following which, presumably, the parser has to reconfigure
itself and continue on.  This seems a needlessly complex procedure
compared with examining the first few characters.
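
To make the first-line check concrete, here is a minimal Python sketch.
It assumes the comment-embedded identifier has the form #\#CIF_<version>,
as in the 1.1 magic code.  The names sniff_cif_version and choose_parser,
and the fallback to "1.0" when no identifier is present, are illustrative
assumptions rather than anything laid down in a specification.

```python
import re

# Matches a leading identifier such as "#\#CIF_1.1" and captures "1.1".
_MAGIC = re.compile(r"^#\\#CIF_(\d+\.\d+)")


def sniff_cif_version(first_line):
    """Return the version named in the first line, or None if absent."""
    match = _MAGIC.match(first_line)
    return match.group(1) if match else None


def choose_parser(first_line, parsers, default="1.0"):
    """Pick a parser from a {version: parser} mapping.

    Assumption: a file without an identifier is treated as `default`.
    Raises KeyError if no parser is registered for the version found.
    """
    version = sniff_cif_version(first_line) or default
    return parsers[version]
```

No CIF parsing is needed before the decision is made; only the first
line is read, and the chosen parser then handles the whole file.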

Note also that the global_ block under this proposal will be forced to
occur at the beginning of the file, although the original
specification indicated that it could appear anywhere in the file.

Moving on to Brian's suggestion, I think we are overdue for finding a
way to describe links between data blocks.  However, using a global_
block to do this restricts the description to a single data file.  As
an alternative proposal, how about defining a CIF semantic dictionary,
with the following two datanames in it?

_semantic.block_id
_semantic.block_relationship

These datanames could be looped inside any datablock in order to convey
a series of relationships to other datablocks, including those not in
the same file, e.g.

loop_
 _semantic.block_id
 _semantic.block_relationship

|Sydney|090909|JRH         'wavelength determination'
|Sydney|080808|VKP        'identical batch of sample'
|Sydney|070707|XYZ         'raw powder data'
|Sydney|060606|ABC        'Le Bail refined structure'


In the example I have used a pd_block.id type construction to uniquely
identify a datablock.  Also, the nature of the relationship could be
formalised into an enumerated list to help machine-readability.
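
As a rough illustration of how software might use such a loop, the
Python sketch below splits each row into a block id and a relationship
and checks the relationship against an enumerated list.  The contents of
KNOWN_RELATIONSHIPS and both function names are hypothetical; a real
list would be defined in the dictionary, and a real CIF parser would
apply the full quoting rules rather than this simplified split.

```python
# Hypothetical enumeration; a real one would come from the dictionary.
KNOWN_RELATIONSHIPS = {
    "wavelength determination",
    "identical batch of sample",
    "raw powder data",
}


def parse_relationship_row(line):
    """Split one loop row into (block_id, relationship).

    Simplification: the block id is taken to contain no whitespace and
    the relationship is a single-quoted string filling the rest of the row.
    """
    block_id, relationship = line.split(None, 1)
    return block_id, relationship.strip().strip("'")


def unknown_relationships(rows):
    """Return relationships from parsed rows that are not enumerated."""
    return [rel for _, rel in rows if rel not in KNOWN_RELATIONSHIPS]
```

With an enumerated list, a reader of the file can flag unrecognised
relationships instead of having to interpret free text.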

James.

On Wed, Sep 9, 2009 at 2:44 AM, Brian H. Toby<Brian.Toby at anl.gov> wrote:
> A global section could also be used to describe the relationships between
> data blocks in a single CIF. To date, (outside of pdCIF) the CIF model has
> assumed that each block is fully defined internally and thus can be used
> independently. This defeats the point of having multi-block file structures.
> Brian
> On Sep 8, 2009, at 10:28 AM, Joe Krahn wrote:
>
> I have been thinking that it makes sense to allow a global_ block as
> part of a CIF file. Globals have been excluded because they don't fit
> very well into the data model, but it might be useful to allow them to
> provide general format hints to the parser.
> My idea is that a common low-level parser could be used for mmCIF, CIF,
> and possibly other STAR variants. The global_ header would define
> parsing rules for the file, including possible future revisions of the
> same format, but not be considered part of the actual mmCIF data. For
> example:
> global_
> _format         mmCIF
> _version        1.2
> In a way, this just replaces the initial comment-embedded CIF
> identifier, but I have always disliked the idea of a comment containing
> data. This approach could be more detailed, depending on how much the
> CIF/mmCIF format changes over time. Will it ever include STAR 2.0
> bracketed lists? Will they ever directly include Unicode text?
> Joe Krahn
> _______________________________________________
> comcifs mailing list
> comcifs at iucr.org
> http://scripts.iucr.org/mailman/listinfo/comcifs
>
> ********************************************************************
> Brian H. Toby, Ph.D.                            office: 630-252-5488
> Senior Physicist/Materials Characterization Group Leader
> Advanced Photon Source
> 9700 S. Cass Ave, Bldg. 433/D003             work cell: 630-327-8426
> Argonne National Laboratory         secretary (Marija): 630-252-5453
> Argonne, IL 60439-4856         e-mail: brian dot toby at anl dot gov
> ********************************************************************
> "We will restore science to its rightful place, and wield technology's
> wonders... We will harness the sun and the winds and the soil to fuel our
> cars and run our factories...  All this we can do. All this we will do."
>
>
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

