Adding datanames covering database information

James Hester jamesrhester at gmail.com
Tue Jan 8 05:37:25 GMT 2019


These changes have now been included in the latest version of the
dictionary on GitHub:
https://github.com/COMCIFS/cif_core/blob/cif2-conversion/cif_core.dic
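
As a rough illustration only (not taken from the dictionary itself), the
example loop from the April 2018 proposal quoted below might look like this
with the draft datanames. The entry codes are the same placeholders used in
that proposal. Writing CSD where the proposal had CCDC, and mapping
"identical" and "common source" to the Identical and Common enumeration
states, are assumptions based on the draft enumeration lists rather than
anything confirmed in this thread.

```cif
# Sketch only: placeholder entry codes, assumed mapping of relation values
loop_
_database_related.id
_database_related.database_id
_database_related.database_code
_database_related.relation
_database_related.special_details
1    COD     1234       Identical    'As deposited structure'
2    COD     6789       Common       'Curated version of this structure'
3    CSD     qrst-12    Common       'Curated version of this structure'
4    ICSD    lll-ppp    .            'An earlier version of the structure with missing H atoms'
```

Each row is keyed by _database_related.id, so several entries from the same
database (rows 1 and 2 here) can appear in a single data block.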

On Thu, 28 Jun 2018 at 17:24, James Hester <jamesrhester at gmail.com> wrote:

> Please see below some draft definitions for a new database_related
> category, as foreshadowed in my email of April 12th.  Feel free to comment.
> If any databases have been left off the initial list below, feel free to
> suggest additions.
>
> Note that I have chosen not to make these datanames aliases of the
> DATABASE_2 datanames in mmCIF, as the new category has a different key.
>
> James.
> =============================================================
> #
> #  Draft definitions for a new DATABASE_RELATED category
> #
>
> save_DATABASE_RELATED
> _definition.id          DATABASE_RELATED
> _definition.class       Loop
> _definition.scope       Category
> _definition.update      2018-06-29
> _description.text
> ;
>
>     A category of items recording entries in databases that describe
>     the same or related data. Databases wishing to insert their own
>     canonical codes when archiving and delivering data blocks should
>     use items from the DATABASE category.
>
> ;
> _name.category_id       PUBLICATION
> _name.object_id         DATABASE_RELATED
> _category_key.name      '_database_related.id'
> save_
>
> save_database_related.id
> _definition.id          '_database_related.id'
> _definition.update      2018-06-29
> _description.text
> ;
>        An identifier for this database reference.
> ;
> _name.category_id       database_related
> _name.object_id         id
> _type.purpose           Key
> _type.source            Recorded
> _type.container         Single
> _type.contents          Text
> save_
>
> save_database_related.database_id
> _definition.id          '_database_related.database_id'
> _definition.update      2018-06-29
> _description.text
> ;
>        An identifier for the database that contains the
>        related dataset.
> ;
> _name.category_id       database_related
> _name.object_id         database_id
> _type.purpose           State
> _type.source            Recorded
> _type.container         Single
> _type.contents          Text
> _import.get [{'save':database_list 'file':templ_enum.cif}]
> save_
>
> save_database_related.database_code
> _definition.id          '_database_related.database_code'
> _definition.update      2018-06-29
> _description.text
> ;
>        The code used by the database referred to in
>        _database_related.database_id to identify the
>        related dataset.
> ;
> _name.category_id       database_related
> _name.object_id         database_code
> _type.purpose           Encode
> _type.source            Recorded
> _type.container         Single
> _type.contents          Text
>
> save_
>
> save_database_related.relation
> _definition.id          '_database_related.relation'
> _definition.update      2018-06-29
> _description.text
> ;
>        The general relationship of the data in the data block
>        to the dataset referred to in the database.
> ;
> _name.category_id       database_related
> _name.object_id         relation
> _type.purpose           State
> _type.source            Recorded
> _type.container         Single
> _type.contents          Text
> loop_
>    _enumeration_set.state
>    _enumeration_set.detail
>    Identical           'The dataset contents are identical'
>    Subset              'The dataset contents are a proper subset of the contents of the data block'
>    Superset            'The dataset contents include the contents of the data block'
>    Derived             'The dataset contents are derivable from the contents of the data block'
>    Common              'The dataset contents share a common source'
> save_
>
> save_database_related.special_details
> _definition.id          '_database_related.special_details'
> _definition.update      2018-06-29
> _description.text
> ;
>     Information about the external dataset and relationship not encoded
>     elsewhere.
> ;
> _name.category_id                       database_related
> _name.object_id                         special_details
> _type.purpose                           Describe
> _type.source                            Recorded
> _type.container                         Single
> _type.contents                          Text
>
> save_
>
>
> #
> # Contents to be added to templ_enum.cif listing database codes
> #
>
>
> save_database_list
> loop_
>     _enumeration_set.state
>     _enumeration_set.detail
>     CAS          'Chemical Abstracts'
>     COD          'Crystallographic Open Database'
>     CSD          'Cambridge Structural Database'
>     ICSD         'Inorganic Crystal Structure Database'
>     MDF          'Metals Data File'
>     NDB          'Nucleic Acid Database'
>     PDB          'Protein Data Bank'
>     PDF          'Powder Diffraction File (JCPDS/ICDD)'
>     RCSB         'Research Collaboratory for Structural Bioinformatics'
>     EBI          'European Bioinformatics Institute'
> save_
>
>
> On 12 April 2018 at 15:59, James Hester <jamesrhester at gmail.com> wrote:
>
>> Dear Core CIF users and experts,
>>
>> The current core CIF provides the DATABASE and DATABASE_CODE categories
>> for identifying a database entry corresponding to the structure contained
>> in the data block, for a variety of pre-determined databases.  These are
>> both Set categories, that is, their datanames can only take a single value
>> in a single data block.  This restriction is reasonable if the database
>> content for that entry is seen as coincident with the data block contents,
>> as has been the case for structural databases.
>>
>> However, it is possible for multiple entries from a single database to be
>> more broadly relevant to the contents of a data block. For example,
>> multiple structures may correspond to a single topology.  So I would like
>> you to consider the creation of a (looped) DATABASE_RELATED category that
>> would simply list entry codes for databases in the same way as CITATION
>> simply lists literature references.  Other categories in other dictionaries
>> may then reference these entries for their own uses.  This is not intended
>> to replace the current DATABASE categories, which would still be preferred
>> for use by structural databases upon deposition and delivery of CIF files.
>> The new category would instead align with the mmCIF DATABASE_2 category.
>>
>> The proposed data names are as follows, with short summaries of their
>> meanings:
>>
>> _database_related.id
>>     'An arbitrary identifier for this entry'
>> _database_related.database_id
>>     'An identifier for the database from an enumerated list
>>     (e.g. CCDC, PDB, ICSD, COD ...)'
>> _database_related.reference
>>     'A code used by the database given in _database_related.database_id'
>> _database_related.relation
>>     'The way in which the database entry is related to the contents of
>>     the data block, from an enumerated list. Initial suggestions include
>>     "identical", "component", "derived", "common source"'
>> _database_related.special_details
>>     'Optional free-form description of the relationship between this
>>     entry and the data block contents'
>>
>> An example of use in a data file would then be:
>>
>> loop_
>> _database_related.id
>> _database_related.database_id
>> _database_related.reference
>> _database_related.relation
>> _database_related.special_details
>> 1    COD     1234       identical        'As deposited structure'
>> 2    COD     6789       'common source'  'Curated version of this structure'
>> 3    CCDC    qrst-12    'common source'  'Curated version of this structure'
>> 4    ICSD    lll-ppp    .                'An earlier version of the structure with missing H atoms'
>>
>> Please provide your thoughts on this general scheme, and any further data
>> names that you think might be useful in this context.  If there are no
>> objections, I will prepare formal definitions and advise this group when
>> they are ready for inclusion.
>>
>> best wishes,
>> James Hester.
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>>


More information about the coreDMG mailing list