Adding datanames covering database information

James Hester jamesrhester at gmail.com
Thu Apr 12 06:59:49 BST 2018


Dear Core CIF users and experts,

The current core CIF provides the DATABASE and DATABASE_CODE categories for
identifying a database entry corresponding to the structure contained in
the data block, for a variety of pre-determined databases.  These are both
Set categories, that is, their datanames can only take a single value in a
single data block.  This restriction is reasonable if the database content
for that entry is seen as coincident with the data block contents, as has
been the case for structural databases.

However, several entries from a single database may be relevant to the
contents of one data block without coinciding with it. For example,
multiple structures may correspond to a single topology.  So I would like
you to consider the creation of a (looped) DATABASE_RELATED category that
would simply list entry codes for databases in the same way as CITATION
simply lists literature references.  Other categories in other dictionaries
may then reference these entries for their own uses.  This is not intended
to replace the current DATABASE categories, which would still be preferred
for use by structural databases upon deposition and delivery of CIF files.
The new category would instead align with the mmCIF DATABASE_2 category.

The proposed data names are as follows, with short summaries of their
meanings:

_database_related.id
    'An arbitrary identifier for this entry'
_database_related.database_id
    'An identifier for the database, from an enumerated list
    (e.g. CCDC, PDB, ICSD, COD ...)'
_database_related.reference
    'A code used by the database given in _database_related.database_id'
_database_related.relation
    'The way in which the database entry is related to the contents of the
    data block, from an enumerated list. Initial suggestions include
    "identical", "component", "derived", "common source"'
_database_related.special_details
    'Optional free-form description of the relationship between this entry
    and the data block contents'
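
To make the intended shape of one entry concrete, here is a minimal Python
sketch of a row of the proposed category. It is only an illustration, not
part of the proposal: the class name, the problems() helper and the two
constant sets are hypothetical, the sets contain only the values quoted
above (the final enumerated lists are still to be defined), and treating
relation as optional is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values quoted from the proposal; the final enumerated lists are not yet
# defined, so these sets are placeholders (assumption).
SUGGESTED_RELATIONS = {"identical", "component", "derived", "common source"}
EXAMPLE_DATABASES = {"CCDC", "PDB", "ICSD", "COD"}


@dataclass(frozen=True)
class DatabaseRelated:
    """One row of the proposed (looped) DATABASE_RELATED category."""
    id: str                                 # arbitrary identifier for this entry
    database_id: str                        # which database the code belongs to
    reference: str                          # the code used by that database
    relation: Optional[str] = None          # assumed optional; None = not given
    special_details: Optional[str] = None   # optional free-form description

    def problems(self) -> List[str]:
        """Return human-readable complaints about enumerated values."""
        found = []
        if self.database_id not in EXAMPLE_DATABASES:
            found.append(f"unknown database_id: {self.database_id!r}")
        if self.relation is not None and self.relation not in SUGGESTED_RELATIONS:
            found.append(f"unknown relation: {self.relation!r}")
        return found


row = DatabaseRelated("3", "CCDC", "qrst-12", "common source",
                      "Curated version of this structure")
print(row.problems())  # prints []
```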

An example of use in a data file would then be:

loop_
_database_related.id
_database_related.database_id
_database_related.reference
_database_related.relation
_database_related.special_details
1    COD     1234       identical          'As deposited structure'
2    COD     6789       'common source'    'Curated version of this structure'
3    CCDC    qrst-12    'common source'    'Curated version of this structure'
4    ICSD    lll-ppp    .                  'An earlier version of the structure with missing H atoms'
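
As a rough illustration of how software might consume such a loop, the
following Python sketch reads the example above into one dictionary per
row. It is a deliberately simplified reader, not a CIF parser: it assumes
values are either bare words or single-quoted strings on one line, and it
borrows the standard-library shlex tokeniser, whose quoting rules only
approximate those of CIF. The function name parse_loop is hypothetical.

```python
import shlex

CIF_TEXT = """\
loop_
_database_related.id
_database_related.database_id
_database_related.reference
_database_related.relation
_database_related.special_details
1    COD     1234       identical          'As deposited structure'
2    COD     6789       'common source'    'Curated version of this structure'
3    CCDC    qrst-12    'common source'    'Curated version of this structure'
4    ICSD    lll-ppp    .                  'An earlier version of the structure with missing H atoms'
"""


def parse_loop(text):
    """Split a single simple CIF loop into a list of {dataname: value} dicts."""
    tokens = shlex.split(text)  # whitespace-separated, honours '...' quoting
    if not tokens or tokens[0] != "loop_":
        raise ValueError("expected text to start with loop_")
    # Datanames are the leading tokens that begin with an underscore.
    names = []
    pos = 1
    while pos < len(tokens) and tokens[pos].startswith("_"):
        names.append(tokens[pos])
        pos += 1
    values = tokens[pos:]
    if not names or len(values) % len(names) != 0:
        raise ValueError("number of values is not a multiple of the datanames")
    # Values are listed row by row, one value per dataname.
    width = len(names)
    return [dict(zip(names, values[i:i + width]))
            for i in range(0, len(values), width)]


rows = parse_loop(CIF_TEXT)
# Collect the entry codes listed for each database.
by_database = {}
for entry in rows:
    by_database.setdefault(entry["_database_related.database_id"], []).append(
        entry["_database_related.reference"])
print(by_database)
```

Because the category is looped, the same database (COD here) can appear in
more than one row, which the existing Set categories do not allow.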

Please provide your thoughts on this general scheme, and any further data
names that you think might be useful in this context.  If there are no
objections, I will prepare formal definitions and advise this group when
they are ready for inclusion.

best wishes,
James Hester.
-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148