Updating list of _audit.schema

James Hester jamesrhester at gmail.com
Thu Jan 7 04:39:41 GMT 2021


OK Herbert, I can only go on what is in the dictionaries. Please explain
how mmCIF can "loop freely" the categories containing an _entry.id child
key data name within a single data block given the definition. I will leave
John W to further comment on how _entry.id is supposed to be used if he
wishes. Meanwhile, in order to make progress I suggest simply
(i) removing the "Macromolecular" option, noting the previous "Experiments"
option covers multi-wavelength, multi-crystal setups.
(ii) removing the "imgCIF" option
(iii) returning in the future to add _audit.schema options corresponding to
mmCIF and imgCIF if necessary
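
For illustration only, the check that _audit.schema is meant to enable
(software reads the data name and exits on an incompatible value) might look
like the Python sketch below. It assumes a data block is available as a plain
dict of data names to values, that the "imgCIF" option is the state named
"Raw" in the proposed list, and that values are compared exactly. The function
names are hypothetical and not part of any CIF library.

```python
import sys

# Enumeration states from the proposed list, with "Macromolecular" and
# "Raw" (assumed to be the imgCIF option) left out as suggested above.
KNOWN_SCHEMAS = {
    "Base", "Space group tables", "Entry", "Powder", "Modulated",
    "Experiments", "Laue", "Custom", "Local",
}
DEFAULT_SCHEMA = "Base"  # _enumeration.default in the proposal


def schema_of(block):
    """Return the _audit.schema value of a data block (a plain dict here)."""
    return block.get("_audit.schema", DEFAULT_SCHEMA)


def is_compatible(block, supported):
    """True if the block declares a schema this software claims to handle."""
    return schema_of(block) in supported


def require_compatible(block, supported):
    """Exit with a message when the declared schema cannot be handled."""
    schema = schema_of(block)
    if schema not in KNOWN_SCHEMAS:
        sys.exit(f"unrecognised _audit.schema value: {schema!r}")
    if schema not in supported:
        sys.exit(f"unsupported _audit.schema value: {schema!r}")
```

A program that only understands the original core layout would call
require_compatible(block, {"Base"}) before reading anything else.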

By the way, I think I share your enthusiasm for managing information using
the relational model, but data containers (data blocks, files, directories,
etc.) are unavoidable. I think your characterisation of them as projections
over particular values of one or more key data names is precisely the right
way to define the relationship between a data block and a relational
schema, and it is vital for a proper understanding of how to build datasets
from constituent pieces.
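
A toy Python sketch of that relationship follows: a data block is built by
keeping, from every category, only the rows that carry one particular value of
the entry key. The category names, column names and values are invented for
illustration and are not taken from any dictionary. The sketch also shows the
consequence discussed earlier in the thread: a category whose only key is the
child of the entry identifier ends up with a single row in the block, while a
category with an additional key of its own can still have several.

```python
# Toy relational store: category name -> list of rows (dicts).
# All names and values are illustrative assumptions.
TABLES = {
    "entry": [{"id": "E1"}, {"id": "E2"}],
    "cell": [
        {"entry_id": "E1", "length_a": 10.0},
        {"entry_id": "E2", "length_a": 12.5},
    ],
    "diffrn": [
        {"id": "D1", "entry_id": "E1"},
        {"id": "D2", "entry_id": "E1"},
        {"id": "D3", "entry_id": "E2"},
    ],
}

# Which column in each category refers to the entry identifier.
ENTRY_KEY = {"entry": "id", "cell": "entry_id", "diffrn": "entry_id"}


def data_block(tables, entry_id):
    """Keep the rows of every category that belong to one entry."""
    return {
        name: [row for row in rows if row[ENTRY_KEY[name]] == entry_id]
        for name, rows in tables.items()
    }


def single_row_categories(block):
    """Names of categories holding at most one row within this block."""
    return sorted(name for name, rows in block.items() if len(rows) <= 1)


block = data_block(TABLES, "E1")
print(single_row_categories(block))
```

Here "cell" is keyed only by the entry identifier, so it behaves like a Set
category inside the block, whereas "diffrn" keeps both of its rows for E1.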


On Thu, 7 Jan 2021 at 13:11, Herbert J. Bernstein <yayahjb at gmail.com> wrote:

> Dear James,
>   John Westbrook will have to speak to the question of why his dictionary
> says that, but the reality is that he also runs a database that
> in fact supports a lot more than one entry id, and it is certainly the
> case that imgCIF data can have a very complex and tangled relationship
> with mmCIF entry ids.  Further it is not unusual to present one entry as
> multiple datablocks pulled together by a common entry id when
> they are not in the same data file, e.g. for structure factors and coordinates.
>   All of which is beside the point.  Both imgCIF and mmCIF are
> database schemas and stick keys on everything and loop them quite freely.
> Sure you can always pick any key and extract only the tuples that contain
> that key at a single value and such other tuples from related
> child categories as make some kind of sense and call it a datablock, but
> that is a very narrow and minimally useful view of how to manage
> information.
>   Regards,
>     Herbert
>
>
> On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester at gmail.com>
> wrote:
>
>> Herbert - are you arguing that imgCIF and mmCIF should not be assigned
>> different schema names? If your comments are not about that, feel free to
>> ignore the following.
>>
>> If you scrutinize the definition in mmCIF of _entry.id (
>> https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html),
>> you will see that it "identifies the data block" and is therefore restricted
>> to a single value in a single data block. It follows that all the child
>> data items of _entry.id are restricted to single values, so where these
>> child items are the sole keys of their categories those categories become
>> single-row categories. Such categories are entirely functionally equivalent
>> to DDLm Set categories and so it would be possible to list which Set
>> categories in core CIF are multi-row in mmCIF, satisfying the criteria for
>> a schema. Frankly I was a bit too lazy to write the code to determine this
>> but from memory it is only diffrn and exptl_crystal. If there are
>> objections to the label "macromolecular" we can change it to "multi-crystal
>> multi-wavelength" to avoid any implications or restrictions on mmCIF.
>>
>> Although imgCIF does not have any categories that have child data names
>> of _entry.id (so every imgCIF category can have multiple rows), it does
>> add new key data names to a few mmCIF categories, thereby creating a
>> distinct "_audit.schema". I don't think that is a controversial statement.
>> For example, the Diffrn_Detector category in imgCIF has key data names
>> "_diffrn_detector.diffrn_id" as well as "_diffrn_detector.id", whereas
>> mmCIF has only the former (as per the text at bottom of p203 of Vol G).
>>
>> all the best,
>> James.
>>
>>
>> On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb at gmail.com>
>> wrote:
>>
>>> I believe both imgCIF and mmCIF only use loop categories and any set
>>> categories picked up for inclusion with their datasets will need to have
>>> keys added and be mapped into loop categories.  That is certainly the
>>> case for imgCIF -- Herbert
>>>
>>> On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester at gmail.com>
>>> wrote:
>>>
>>>> Apologies for the lax terminology. By "looped" I mean "able to have
>>>> more than one row in a loop". Perhaps the explanations should be rewritten
>>>> to use 'Loop category' and 'Set category' rigorously?
>>>>
>>>> On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb at gmail.com>
>>>> wrote:
>>>>
>>>>>  In imgCIF (as with mmCIF) any and all categories may be looped -- it's
>>>>> how you put information into database tables.  - Herbert
>>>>>
>>>>> On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <
>>>>> comcifs at iucr.org> wrote:
>>>>>
>>>>>> Dear COMCIFS,
>>>>>>
>>>>>> First of all, Happy New Year to you all, I hope you've all been
>>>>>> keeping well.
>>>>>>
>>>>>> I am writing to propose updating the list of _audit.schema in the
>>>>>> core dictionary. Normally this would be core DMG business, but as it
>>>>>> concerns most dictionaries covered by COMCIFS I believe this is the more
>>>>>> appropriate forum. This has been prompted by reviewing the DDLm dictionary
>>>>>> chapters for the next edition of Volume G. Please examine the list below
>>>>>> and discuss any changes you would like to see.  The formal changes to the
>>>>>> dictionary can be viewed as a diff at this link:
>>>>>> https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e
>>>>>>
>>>>>> As a reminder, the _audit.schema dataname indicates that one or more
>>>>>> categories have become looped relative to the core CIF dictionary. For
>>>>>> example, where multiple crystals are used in a measurement, the
>>>>>> exptl_crystal category becomes looped. Ideally software will check this
>>>>>> dataname and exit if the dataname has an incompatible value.
>>>>>>
>>>>>> best wishes,
>>>>>> James.
>>>>>>
>>>>>> =====================================================
>>>>>> loop_
>>>>>> _enumeration_set.state
>>>>>> _enumeration_set.detail
>>>>>>     Base                'Original Core CIF schema'
>>>>>>    'Space group tables' 'space_group category is looped'
>>>>>>     Entry
>>>>>> ;
>>>>>>     entry category is defined and looped: multiple experiments
>>>>>>     with results may be present
>>>>>> ;
>>>>>>     Powder              'Multiple compounds (phases) may be present'
>>>>>>     Modulated           'Multiple subsystems may be present'
>>>>>>     Experiments
>>>>>> ;
>>>>>>     diffrn and exptl_crystal categories are looped: multiple
>>>>>>     diffraction measurements on multiple samples may be present
>>>>>> ;
>>>>>>     Macromolecular
>>>>>> ;
>>>>>>     mmCIF equivalent. Only single-key mmCIF categories containing
>>>>>>     children of _entry.id are Set categories
>>>>>> ;
>>>>>>     Raw
>>>>>> ;
>>>>>>     imgCIF equivalent. As for Macromolecular, with the addition of
>>>>>>     multiple detectors.
>>>>>> ;
>>>>>>     Laue
>>>>>> ;
>>>>>>     diffrn_radiation is looped: Multiple wavelengths are used.
>>>>>> ;
>>>>>>     Custom              'Examine dictionaries provided in _audit_conform'
>>>>>>     Local               'Locally modified dictionaries. Datafile not for distribution'
>>>>>> _enumeration.default    Base
>>>>>> =======================
>>>>>> --
>>>>>> T +61 (02) 9717 9907
>>>>>> F +61 (02) 9717 3145
>>>>>> M +61 (04) 0249 4148
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>

-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148