Important CIF items for discussion
Herbert J. Bernstein
yaya at bernstein-plus-sons.com
Fri Jul 11 22:13:17 BST 2008
David has raised very important issues, but they are based on two
CIF principles that have never really held, and under DDLm need
very serious modification:
The first principle put forward by David is that "CIF (normally)
defines only one data item for a given piece of information and the
format of that item is the one closest to the natural representation
of the quantity (rather than a format in common use or one designed
for ease of computation). For example, the CIF specifies which units
that are to be used and alternative units are not permitted."
The second principle is that "A given piece of information may
appear only once in a given datablock."
Certainly it is desirable to avoid easily misunderstood choices
of representation and it is desirable to avoid pointless repetition
of the same information, but we already treat these more as sensible
guidance than as rigid rules.
Even the generally well-accepted "principle" of not allowing alternate
units is "violated" in the PDB Exchange dictionary, and will be
in the next imgCIF dictionary.
One could also question how "natural" the choices of units are.
Consider, for example, cell parameters. The "natural" units for
angles are radians, but out of respect for common practice, we use degrees.
The second principle is also violated, for example by allowing
cell volumes as well as cell edge lengths and angles. The volume
is derived and duplicative, and it certainly is possible to
generate a CIF in which the cell volume is inconsistent with
the cell edge lengths and angles, and, unlike U's vs. B's, it
is highly likely that both will have been given, along with,
in the case of mmCIF, several transformation matrices that also
need to be cross-checked.
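
To illustrate the kind of cross-check meant here, the following is a minimal Python sketch, not taken from any dictionary. It recomputes the cell volume from the edge lengths and angles with the standard triclinic formula and compares it with a reported value. The function names and the relative tolerance of 1e-3 are assumptions chosen for the example.

```python
import math


def cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Cell volume from edge lengths and angles given in degrees."""
    ca, cb, cg = (math.cos(math.radians(x))
                  for x in (alpha_deg, beta_deg, gamma_deg))
    # Standard formula for a general (triclinic) cell.
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )


def volume_is_consistent(reported, a, b, c, alpha_deg, beta_deg, gamma_deg,
                         rel_tol=1e-3):
    """True if the reported volume agrees with the recomputed one.

    rel_tol is an assumed tolerance, not a value taken from any standard.
    """
    derived = cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg)
    return math.isclose(reported, derived, rel_tol=rel_tol)
```

A reader of a CIF could call volume_is_consistent with the reported volume and the six cell parameters, and flag the file when the result is False.
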
What is nice about DDLm is that it now allows us to put the necessary
cross-check information into the dictionaries, and that would seem
to me the best way to address David's third issue of dealing with
intermediate values used in a computation -- allow them in, but
only when the methods needed to validate them are also given. We only
get ourselves in trouble if we use the same data item name for
two different or computationally inconsistent meanings, not if
we have unique names for related items.
The layering of dictionaries and hierarchy of methods can be
viewed as bugs or as features. A CIF being used for a journal
submission or an archive submission has to be a complete fully
documented package. In that case, it would not be appropriate
for the CIF to depend on a local dictionary not submitted with
the data CIF, e.g. using David's option D of
_audit_conform_included_dictionary, but clearly it would be
helpful to the community to have the IUCr start an archive of
"local" dictionaries, preferably with namespace controls, e.g.
using the current system of prefixes.
For work within a lab, however, we cannot and should not act as
the "CIF police". If someone has a scientific use for odd
layering and real-time assembly of virtual dictionaries, why
should they not do it? When the CIF is ready to be moved
elsewhere it needs to be cleaned up and documented, e.g. by
creating the expanded dictionary from the local pieces.
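
The clean-up step might look like the following Python sketch. It is only an illustration under simplifying assumptions: each dictionary layer is modelled as a plain mapping from tag to definition text, later layers override earlier ones, and any tag redefined with different text is reported. The function name and the data model are hypothetical, not part of DDLm.

```python
def expand_dictionaries(layers):
    """Merge layered dictionaries (earliest first) into one expanded dictionary.

    Each layer is a dict mapping a tag to its definition text (an assumed,
    simplified model). Returns (expanded, conflicts), where conflicts lists
    tags that a later layer redefined with different text.
    """
    expanded = {}
    conflicts = []
    for layer in layers:
        for tag, definition in layer.items():
            if tag in expanded and expanded[tag] != definition:
                conflicts.append(tag)
            # Later layers win, mirroring the layering order.
            expanded[tag] = definition
    return expanded, conflicts
```

For example, merging a core layer with a local layer that adds a prefixed tag such as the made-up _mylab_scan_note, and that also redefines _cell_volume, would return _cell_volume in the conflict list, which is the point at which the local usage needs to be documented or renamed.
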
I would suggest replacing the two principles with the following:
1. When defining new data items, dictionary developers are
advised to avoid unnecessarily duplicative definitions, e.g.
two definitions of the same item that differ only in the units
used. Exceptions should be justified and fully documented.
2. When relationships exist among multiple definitions, those
relationships should be stated clearly, and, if at all possible,
algorithmically, preferably using DDLm, to allow for automatic
validation, and, when necessary, generation of values for
missing data items.
3. An existing tag should never be used in a way that is
inconsistent with definitions used by journals and archives.
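
Suggestion 2 above can be sketched in Python as follows. This is a hedged illustration, not DDLm or dREL: the relationship table, the function name and the tolerance are assumptions. For simplicity the two atom-site tags, which are looped per atom in a real CIF, are treated as single numbers. The relation used is the standard one, B = 8 pi^2 U.

```python
import math

EIGHT_PI_SQ = 8.0 * math.pi ** 2

# Hypothetical relationship table: (derived tag, source tags, method).
RELATIONS = [
    ("_atom_site_B_iso_or_equiv", ("_atom_site_U_iso_or_equiv",),
     lambda u: EIGHT_PI_SQ * u),
    ("_atom_site_U_iso_or_equiv", ("_atom_site_B_iso_or_equiv",),
     lambda b: b / EIGHT_PI_SQ),
]


def apply_relations(block, rel_tol=1e-3):
    """Fill in missing derivable items and report inconsistent ones.

    block maps tags to numeric values. Returns (filled, problems), where
    problems lists tags whose given value disagrees with the derived value.
    rel_tol is an assumed tolerance.
    """
    filled = dict(block)
    problems = []
    for tag, sources, method in RELATIONS:
        if not all(s in filled for s in sources):
            continue  # cannot evaluate this relationship
        derived = method(*(filled[s] for s in sources))
        if tag not in filled:
            filled[tag] = derived  # generate the missing value
        elif not math.isclose(filled[tag], derived, rel_tol=rel_tol):
            problems.append(tag)  # given value fails validation
    return filled, problems
```

Given only a U value, the sketch generates the matching B. Given both, it validates one against the other and reports a mismatch.
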
At 11:33 AM -0400 7/11/08, David Brown wrote:
>I have attached to this email a discussion paper concerning three
>issues that have arisen during our evaluation of the new DDLm.
>These are important issues for the future of CIF, and before I ask
>the voting members of COMCIFS to make a decision, I would like to
>see the issues fully discussed and, if possible, a consensus
>reached. The attached document is six pages long, but I hope you
>will take the time to read it and comment on the issues raised.
>I apologize if you have received this message more than once by
>being on more than one discussion list. Please ignore the second
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
yaya at dowling.edu