Important CIF items for discussion
krahn at niehs.nih.gov
Thu Jul 17 18:12:05 BST 2008
I'm not a database developer. My interest is more towards making CIF
more useful in local software before it gets deposited to a standard
James Hester wrote:
> My thoughts on the three issues raised by David.
> 1. Virtual dictionaries
> I favour option D (the dictionary is contained within the datablock as
> the text value of a dataitem). This ensures that the dictionary
> remains as close as possible to the dataitems it is concerned with.
> Note that creating a CIF with such a built-in definition in no way
> forces the IUCr to condone the content of that definition: if someone
> were to define betas in their CIF and then submit a CIF with betas
> rather than UIJs, the IUCr remains free to act as it has in the past
> (especially if the beta definition boils down to a description along
> the lines of "beta i j as conventionally defined").
I suggested something like this recently on this list, but got no
interest. The argument against it was that non-standard data just
complicates database management. I favor it because not all data
is database oriented, and because this would allow easy addition of
non-standard data, which is often needed when experimenting with new
The contained virtual dictionary can specify the parent dictionary. The
same syntax could be used to just specify the associated dictionary that
the data block conforms to. The parent dictionary referenced could also
be a partial dictionary which references another higher-level dictionary
to establish a dictionary hierarchy.
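To make the hierarchy idea concrete, here is a minimal Python sketch of how a reader might resolve an item name through a chain of parent dictionaries. Everything in it is an assumption for illustration: the in-memory layout, the dictionary names ("core", "local_partial", "embedded"), the item definitions, and the resolve_item helper are hypothetical and do not correspond to any existing CIF software.

```python
# Hypothetical in-memory model: each dictionary holds its own item
# definitions and optionally names a parent dictionary.
DICTIONARIES = {
    "core": {
        "parent": None,
        "items": {"_cell_length_a": "Unit-cell length a"},
    },
    "local_partial": {
        "parent": "core",
        "items": {"_local_flag": "Site-specific flag"},
    },
    "embedded": {
        "parent": "local_partial",
        "items": {"_atom_site_beta_11": "beta 1 1 as conventionally defined"},
    },
}


def resolve_item(item, dict_name, dictionaries=DICTIONARIES):
    """Return (defining_dictionary, definition) for item.

    The search starts in dict_name and walks up the parent links,
    so the most local definition wins.
    """
    seen = set()
    while dict_name is not None:
        if dict_name in seen:
            raise ValueError("cycle in dictionary hierarchy at " + dict_name)
        seen.add(dict_name)
        entry = dictionaries[dict_name]
        if item in entry["items"]:
            return dict_name, entry["items"][item]
        dict_name = entry["parent"]
    raise KeyError(item)
```

With this layout, a data block that names "embedded" as its dictionary still finds standard items in "core", while the locally defined items are found first.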
> The technical issues boil down to being able to escape the
> "<EOL><semicolon>" digraphs in the embedded dictionary text, which
> would otherwise prematurely end the datavalue. Some suggestions:
This is definitely a problem. A single text datablock should be able to
hold an entire CIF dictionary without having to indent the text.
> (1) Define "<backslash><semicolon>" as being an escaped semicolon (and
> this would require defining "<backslash><backslash>" as an escaped
> backslash in order to cope with those situations in which you actually
> want the text that comes out to be "<backslash><semicolon>").
> Obviously the escaping character doesn't have to be backslash.
> (1a) Define "<EOL><backslash><semicolon>" to be an escaped "<EOL><semicolon>".
I made a similar suggestion over a year ago. Nobody wants to define
special escape sequences, due to interference with the set codes for
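For reference, suggestion (1a) can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not an agreed CIF convention: the function names are hypothetical, and the rule for lines that already begin with backslashes followed by a semicolon (add one more backslash) is an extra assumption made here so that the transformation stays reversible.

```python
import re

# A line that starts with zero or more backslashes followed by ';'.
_NEEDS_ESCAPE = re.compile(r"^(\\*;)", re.MULTILINE)
# A line that starts with at least one backslash followed by ';'.
_IS_ESCAPED = re.compile(r"^\\(\\*;)", re.MULTILINE)


def escape_text(text):
    """Prefix one backslash to every line that would otherwise start
    with ';' (or with backslashes followed by ';')."""
    return _NEEDS_ESCAPE.sub(r"\\\1", text)


def unescape_text(text):
    """Remove the single backslash added by escape_text."""
    return _IS_ESCAPED.sub(r"\1", text)
```

After escaping, no line of the embedded text begins with a semicolon, so the text field cannot be terminated early, and unescape_text restores the original exactly.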
> (2) Substitute <EOL><hash> for <EOL> in the entire text field. This
> immediately signals to the human reader that the entire text field is
> a single block, rather than bits of a CIF file (same as quoting in
> emails), and is easily reversible. And nestable, but let's not go
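Suggestion (2) is simple enough to sketch directly. The Python below is a minimal illustration, assuming "\n" as the only line terminator; the function names are hypothetical.

```python
import re


def quote_text(text):
    """Substitute <EOL><hash> for every <EOL>."""
    return text.replace("\n", "\n#")


def unquote_text(quoted):
    """Reverse quote_text, rejecting input that was not quoted."""
    if re.search(r"\n(?!#)", quoted):
        raise ValueError("found a line that does not start with '#'")
    return quoted.replace("\n#", "\n")
```

Because every line after the first then starts with a hash, the quoted text can never contain "<EOL><semicolon>", and applying quote_text twice and unquote_text twice returns the original, which is the nesting property mentioned above.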
Another alternative is to just use MIME encapsulation, already defined
for Binary CIF. As it is implemented there, the data block is all
base-64 encoded, which contains no semicolons. But, MIME can also
encapsulate unencoded text, by defining the end-of-data marker. This
requires the low-level parser to understand the MIME format.
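A rough Python sketch of the unencoded variant follows. It is an illustration only: the boundary stem "--CIF-TEXT-SECTION", the single Content-Type header, and the function names are assumptions made here, and the layout is not the one used by Binary CIF. The key step is choosing an end-of-data marker that does not occur as a line of the payload.

```python
import itertools


def make_boundary(payload, stem="--CIF-TEXT-SECTION"):
    """Pick a marker line that is not already a line of the payload."""
    lines = set(payload.split("\n"))
    for n in itertools.count():
        candidate = f"{stem}-{n}--"
        if candidate not in lines:
            return candidate


def encapsulate(payload):
    """Wrap unencoded text between a start and an end marker line."""
    boundary = make_boundary(payload)
    return f"{boundary}\nContent-Type: text/plain\n\n{payload}\n{boundary}\n"


def extract(block):
    """Recover the payload: read the marker from the first line, skip
    the headers, then read up to the next line equal to the marker."""
    boundary, rest = block.split("\n", 1)
    _headers, body = rest.split("\n\n", 1)
    end = body.index("\n" + boundary + "\n")
    return body[:end]
```

A parser following this scheme reads to the end marker instead of stopping at the first "<EOL><semicolon>", so the payload can contain complete text fields unchanged.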
I should also mention that the newer STAR format has a bracketed list
format that includes backslash escape sequences to allow contained items
to have a bracket. If this is adopted by CIF, that may be enough of a
precedent to support backslash-escapes for beginning-of-line semicolons
Here is a snippet of my idea for storing dictionary data within a
datablock, using a SAVE frame, so that it is part of the actual CIF
syntax rather than using a text block, which means the data has to be
By using save frames, it is easy to avoid conflicts, because CIF
currently restricts their use to dictionaries. I would also support
nested SAVE frames so that the whole dictionary syntax can be fit inside
of one parent SAVE frame, rather than having to split it.
Written by O version 9.0.3
Thu Feb 22 16:54:17 2007
DATE:23-Feb-07 10:46:23 created by user: krahn