Adding datanames covering database information
James Hester
jamesrhester at gmail.com
Tue Jan 8 05:37:25 GMT 2019
These changes have now been included in the latest version of the
dictionary on GitHub:
https://github.com/COMCIFS/cif_core/blob/cif2-conversion/cif_core.dic
On Thu, 28 Jun 2018 at 17:24, James Hester <jamesrhester at gmail.com> wrote:
> Please see below some draft definitions for a new database_related
> category, as foreshadowed in my email of April 12th. Feel free to comment.
> If any databases have been left off the initial list below, feel free to
> suggest additions.
>
> Note that I have chosen not to make these datanames aliases of the
> DATABASE_2 datanames in mmCIF, as the new category has a different key.
>
> James.
> =============================================================
> #
> # Draft definitions for a new DATABASE_RELATED category
> #
>
> save_DATABASE_RELATED
> _definition.id DATABASE_RELATED
> _definition.class Loop
> _definition.scope Category
> _definition.update 2018-06-29
> _description.text
> ;
>
> A category of items recording entries in databases that describe
> the same or related data. Databases wishing to insert their own
> canonical codes when archiving and delivering data blocks should
> use items from the DATABASE category.
>
> ;
> _name.category_id PUBLICATION
> _name.object_id DATABASE_RELATED
> _category_key.name '_database_related.id'
> save_
>
> save_database_related.id
> _definition.id '_database_related.id'
> _definition.update 2018-06-29
> _description.text
> ;
> An identifier for this database reference.
> ;
> _name.category_id database_related
> _name.object_id id
> _type.purpose Key
> _type.source Recorded
> _type.container Single
> _type.contents Text
> save_
>
> save_database_related.database_id
> _definition.id '_database_related.database_id'
> _definition.update 2018-06-29
> _description.text
> ;
> An identifier for the database that contains the
> related dataset.
> ;
> _name.category_id database_related
> _name.object_id database_id
> _type.purpose State
> _type.source Recorded
> _type.container Single
> _type.contents Text
> _import.get [{'save':database_list 'file':templ_enum.cif}]
> save_
>
> save_database_related.database_code
> _definition.id '_database_related.database_code'
> _definition.update 2018-06-29
> _description.text
> ;
> The code used by the database referred to in
> _database_related.database_id to identify the
> related dataset.
> ;
> _name.category_id database_related
> _name.object_id database_code
> _type.purpose Encode
> _type.source Recorded
> _type.container Single
> _type.contents Text
>
> save_
>
> save_database_related.relation
> _definition.id '_database_related.relation'
> _definition.update 2018-06-29
> _description.text
> ;
> The general relationship of the data in the data block
> to the dataset referred to in the database.
> ;
> _name.category_id database_related
> _name.object_id relation
> _type.purpose State
> _type.source Recorded
> _type.container Single
> _type.contents Text
> loop_
> _enumeration_set.state
> _enumeration_set.detail
> Identical 'The dataset contents are identical'
> Subset
> 'The dataset contents are a proper subset of the contents of the data block'
> Superset
> 'The dataset contents include the contents of the data block'
> Derived
> 'The dataset contents are derivable from the contents of the data block'
> Common 'The dataset contents share a common source'
> save_
>
> save_database_related.special_details
> _definition.id '_database_related.special_details'
> _definition.update 2018-06-29
> _description.text
> ;
> Information about the external dataset and relationship not encoded
> elsewhere.
> ;
> _name.category_id database_related
> _name.object_id special_details
> _type.purpose Describe
> _type.source Recorded
> _type.container Single
> _type.contents Text
>
> save_
>
>
> #
> # Contents to be added to templ_enum.cif listing database codes
> #
>
>
> save_database_list
> loop_
> _enumeration_set.state
> _enumeration_set.detail
> CAS 'Chemical Abstracts'
> COD 'Crystallographic Open Database'
> CSD 'Cambridge Structural Database'
> ICSD 'Inorganic Crystal Structure Database'
> MDF 'Metals Data File'
> NDB 'Nucleic Acid Database'
> PDB 'Protein Data Bank'
> PDF 'Powder Diffraction File (JCPDS/ICDD)'
> RCSB 'Research Collaboratory for Structural Bioinformatics'
> EBI 'European Bioinformatics Institute'
> save_
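
As a rough illustration of how software might apply the two enumerated lists in the draft above, here is a minimal Python sketch. The names KNOWN_DATABASES, RELATION_STATES and check_record, and the representation of one loop row as a plain dictionary, are assumptions made for this example and do not come from any CIF library. Case-sensitive matching of the states is also an assumption.

```python
# Database identifiers copied from the draft database_list enumeration.
KNOWN_DATABASES = {
    "CAS", "COD", "CSD", "ICSD", "MDF", "NDB", "PDB", "PDF", "RCSB", "EBI",
}

# Relation states copied from the draft _database_related.relation definition.
RELATION_STATES = {"Identical", "Subset", "Superset", "Derived", "Common"}


def check_record(record):
    """Return a list of problems found in one DATABASE_RELATED row.

    `record` is a hypothetical dict keyed by object id, for example
    {"id": "1", "database_id": "COD", "database_code": "1234"}.
    Matching is case-sensitive, which is an assumption of this sketch.
    """
    problems = []

    database_id = record.get("database_id")
    if database_id not in KNOWN_DATABASES:
        problems.append(f"unknown database_id: {database_id!r}")

    # A missing relation is tolerated here; only a supplied value is checked.
    relation = record.get("relation")
    if relation is not None and relation not in RELATION_STATES:
        problems.append(f"unknown relation: {relation!r}")

    return problems


if __name__ == "__main__":
    row = {"id": "1", "database_id": "COD",
           "database_code": "1234", "relation": "Identical"}
    print(check_record(row))
```

Under these assumptions, a row that gives CCDC as the database and a lowercase identical as the relation would be reported twice, since the draft list contains CSD rather than CCDC and capitalises the relation states.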
>
>
> On 12 April 2018 at 15:59, James Hester <jamesrhester at gmail.com> wrote:
>
>> Dear Core CIF users and experts,
>>
>> The current core CIF provides the DATABASE and DATABASE_CODE categories
>> for identifying a database entry corresponding to the structure contained
>> in the data block, for a variety of pre-determined databases. These are
>> both Set categories, that is, their datanames can only take a single value
>> in a single data block. This restriction is reasonable if the database
>> content for that entry is seen as coincident with the data block contents,
>> as has been the case for structural databases.
>>
>> However, it is possible for multiple entries from a single database to be
>> more broadly relevant to the contents of a data block. For example,
>> multiple structures may correspond to a single topology. So I would like
>> you to consider the creation of a (looped) DATABASE_RELATED category that
>> would simply list entry codes for databases in the same way as CITATION
>> simply lists literature references. Other categories in other dictionaries
>> may then reference these entries for their own uses. This is not intended
>> to replace the current DATABASE categories, which would still be preferred
>> for use by structural databases upon deposition and delivery of CIF files.
>> The new category would instead align with the mmCIF DATABASE_2 category.
>>
>> The proposed data names are as follows, with short summaries of their
>> meanings:
>>
>> _database_related.id 'An arbitrary identifier for this entry'
>> _database_related.database_id 'An identifier for the database
>> from an enumerated list (e.g. CCDC, PDB, ICSD, COD ...)'
>> _database_related.reference 'A code used by the database given in
>> _database_related.database_id'
>> _database_related.relation 'The way in which the database entry is
>> related to the contents of the data block, from an enumerated list. Initial
>> suggestions include "identical","component","derived","common source" '
>> _database_related.special_details 'Optional free-form description of
>> the relationship between this entry and the data block contents'
>>
>> An example of use in a data file would then be:
>>
>> loop_
>> _database_related.id
>> _database_related.database_id
>> _database_related.reference
>> _database_related.relation
>> _database_related.special_details
>> 1 COD 1234 identical 'As deposited structure'
>> 2 COD 6789 'common source' 'Curated version of this structure'
>> 3 CCDC qrst-12 'common source' 'Curated version of this structure'
>> 4 ICSD lll-ppp . 'An earlier version of the structure with missing H atoms'
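
The example above can be read into per-row records with a few lines of Python. This is only a sketch: it relies on shlex from the standard library to honour the single-quoted values, which is enough for this example but is not a CIF parser (it would mishandle semicolon-delimited text fields and apostrophes inside values). The names COLUMNS, LOOP_VALUES and read_rows are hypothetical.

```python
import shlex

# Object ids in the order the data names appear in the example loop.
COLUMNS = ["id", "database_id", "reference", "relation", "special_details"]

# Values copied from the example; line breaks between values are not significant.
LOOP_VALUES = """
1 COD 1234 identical 'As deposited structure'
2 COD 6789 'common source' 'Curated version of this structure'
3 CCDC qrst-12 'common source' 'Curated version of this structure'
4 ICSD lll-ppp . 'An earlier version of the structure with missing H atoms'
"""


def read_rows(text, columns=COLUMNS):
    """Split whitespace-separated loop values into one dict per row."""
    tokens = shlex.split(text)  # keeps each quoted string as a single token
    if len(tokens) % len(columns) != 0:
        raise ValueError("number of values is not a multiple of the column count")

    rows = []
    for start in range(0, len(tokens), len(columns)):
        values = tokens[start:start + len(columns)]
        # A bare '.' marks an inapplicable value in CIF; map it to None here.
        rows.append({name: (None if value == "." else value)
                     for name, value in zip(columns, values)})
    return rows


rows = read_rows(LOOP_VALUES)

if __name__ == "__main__":
    for row in rows:
        print(row["id"], row["database_id"], row["reference"], row["relation"])
```

Grouping the flat list of values by the number of data names mirrors how a loop is interpreted: values are assigned to the looped names in order, regardless of where the line breaks fall.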
>>
>> Please provide your thoughts on this general scheme, and any further data
>> names that you think might be useful in this context. If there are no
>> objections, I will prepare formal definitions and advise this group when
>> they are ready for inclusion.
>>
>> best wishes,
>> James Hester.
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>>