core_cif review note: _diffrn.crystal_id removed

James Hester jamesrhester at gmail.com
Mon Sep 12 04:41:58 BST 2016


Dear David and Core DMG,

I naturally agree that we must maintain definitions in order for our
archived data to be interpretable, and also because those tags with
definitions are now enshrined in software that will continue to produce
those tags into the future. The question is simply whether or not some or
all of these tags should be placed in a separate dictionary and associated
with a different _audit.schema setting.

To assess the practical significance of adjusting these tags, I have
grepped through 380,000 or so Crystallography Open Database CIF files, and
found _exptl_crystal_id used 584 times, but looped in only 3 cases, all
from a single series of high-pressure experiments (COD 7104407, 7104408 and
7104409; see http://www.crystallography.net/7104407.cif). In each of these
three files, the _exptl_crystal category is split between a looped and an
unlooped section, which is formally incorrect (see Vol G p 235:
_exptl_crystal_F_000 must appear in the same list as _exptl_crystal_id). I
conclude that the entire
_exptl_crystal category can be defined as a Set category with no impact on
currently-written software (which clearly is outputting unlooped
exptl_crystal information).

_diffrn_refln_crystal_id appears once (COD 4023466) in a file in which
_exptl_crystal_id is not looped, so _diffrn_refln_crystal_id provides no new
information. Likewise, I am advised by the IUCr office that the
_exptl_crystal_id tag has appeared a total of 46 times over the last 14
years in the IUCr archives (around 1% of files), and none of the other
_crystal_id tags are present at all.

Given the above, I therefore propose that we retain only _exptl_crystal.id
in the exptl_crystal category, which becomes a Set category (i.e. one value
per dataname) in keeping with current practice as established above. In
addition, we define a 'multi-crystal' schema in a separate dictionary in
which exptl_crystal is looped, and _diffrn_refln.crystal_id and
_refln.crystal_id are defined (and perhaps others). The only datafiles in
the COD corpus whose interpretations are affected are the above 3 files,
which are already formally incorrect (but of course do contain perfectly
useful information for the human reader and in other categories).

I will edit the current core CIF dictionary draft accordingly, and expand
the 'Legacy' section of the looping proposal (
https://github.com/COMCIFS/comcifs.github.io/blob/master/looping_proposal.md)
to include exptl_crystal. This editing is purely to expedite the process
(rather than waiting for comments that may never come), and of course
further discussion is welcome.

James.

FYI, the commands I used to obtain the above numbers are (in the COD
download top directory):
(1) all occurrences of _exptl_crystal_id etc.:
pcregrep -r '_exptl_crystal_id' 1/* 2/* 3/* 4/* 5/* 6/* 7/* 8/* 9/*
(2) all looped occurrences of _exptl_crystal_id:
pcregrep -rM '(_[[:graph:]]+|loop_)[^[:graph:]]+_exptl_crystal_id' 1/* 2/* 3/* 4/* 5/* 6/* 7/* 8/* 9/*

and

pcregrep -rM '_exptl_crystal_id[^[:graph:]]+_[[:graph:]]+' 1/* 2/* 3/* 4/* 5/* 6/* 7/* 8/* 9/*
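
For anyone who wants to repeat the looped/unlooped check without pcregrep, the same distinction can be sketched in Python. This is only an illustrative line-based scan, not a CIF parser: the function name classify_tag and the sample CIF text are made up for this example, and it assumes one data name per line in loop headers and no nested save frames.

```python
def classify_tag(cif_text, tag):
    """Return (looped, unlooped) occurrence counts of `tag` in CIF text.

    Simplified scan: only the first token on each line is inspected.
    """
    looped = unlooped = 0
    in_loop_header = False  # True while reading the data names after loop_
    in_text_field = False   # True inside a semicolon-delimited text field
    wanted = tag.lower()    # CIF data names are case-insensitive

    for line in cif_text.splitlines():
        if line.startswith(";"):
            # A semicolon in column 1 opens or closes a multi-line value.
            in_text_field = not in_text_field
            in_loop_header = False
            continue
        if in_text_field:
            continue
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # blank line or comment
        first = tokens[0].lower()
        if first == "loop_":
            in_loop_header = True
        elif first.startswith("_"):
            if first == wanted:
                if in_loop_header:
                    looped += 1
                else:
                    unlooped += 1
        else:
            # A data value (or data_ block header) ends any loop header.
            in_loop_header = False
    return looped, unlooped


if __name__ == "__main__":
    # Hypothetical sample resembling the "split" layout described above.
    sample = (
        "data_example\n"
        "loop_\n"
        "_exptl_crystal_id\n"
        "_exptl_crystal_colour\n"
        "1 red\n"
        "2 blue\n"
        "_exptl_crystal_F_000 100\n"
    )
    print(classify_tag(sample, "_exptl_crystal_id"))
    print(classify_tag(sample, "_exptl_crystal_F_000"))
```

Run over each file in the COD tree, a file that reports a looped _exptl_crystal_id together with unlooped values for other _exptl_crystal_ data names would be a candidate for the split-category layout discussed above.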


On 9 September 2016 at 00:18, Brown, David <idbrown at mcmaster.ca> wrote:

> I should point out that the multiple crystal feature was added because of
> requests from users. It may not be needed for routine work, but it is
> needed for archival purposes and in some cases for submission of papers to
> journals. Not every program needs to be able to handle multiple crystal
> files, but the feature needs to be present for those cases where the
> structure determination is dependent on the use of different crystals.
>
> David
>
> I. David Brown
> Professor Emeritus
> Department of Physics and Astronomy
> McMaster University
> Hamilton, Ontario, Canada
> ------------------------------
> *From:* coreDMG [coredmg-bounces at iucr.org] on behalf of James Hester [
> jamesrhester at gmail.com]
> *Sent:* September 7, 2016 21:28
> *To:* Distribution list of the IUCr COMCIFS Core Dictionary Maintenance
> Group
> *Subject:* core_cif review note: _diffrn.crystal_id removed
>
> Dear Core DMG,
>
> In the course of removing unneeded keys (as per a previous email), I noted
> that the draft core dictionary is inconsistent as to whether or not
> multiple crystals are supported. Referring to Vol G, the original DDL1
> core allowed multiple crystals to be listed in exptl_crystal, and these
> crystal ids could be included in the diffrn_refln and refln lists.  The new
> draft core adds these crystal ids to the exptl_crystal_face category (not
> too problematic) and to the diffrn category. The latter is nominally a set
> category (one value per dataname in DDL1) and so couldn't refer to more
> than a single crystal id.  I have therefore removed crystal_id from this
> category in the updated draft, which now accurately reflects the state of
> the DDL1 dictionary.
>
> Looking to the future, we do now have an elegant solution for handling
> multiple crystals, by making use of the new _audit.schema arrangement. In
> an ideal world, the core dictionary would assume only one crystal, and a
> small expansion dictionary associated with a non-default value of
> _audit.schema would define _exptl_crystal.id
> and the keys listed in the previous paragraph.  In this ideal world,
> software that did not want to deal with multiple crystals could happily
> stick to the default schema.
>
> It's not clear to me how much the multiple crystal definitions in the DDL1
> core are actually used. It would be great to have some comments, especially
> from software authors, as to whether or not they input/output
> _exptl_crystal_id as defined in the current DDL1 core dictionary.  For
> example, would your software input and process CIFs correctly if the
> reflection list contained multiple instances of the same h,k,l, each from
> different crystals? Do you actually output _refln_crystal_id in the
> reflection list?
>
> I am currently preparing a core CIF draft containing the various small
> revisions described in this and previous emails and should have it
> available for you by the end of the week.
>
> all the best,
> James.
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148


-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148