Updating list of _audit.schema

Herbert J. Bernstein yayahjb at gmail.com
Thu Jan 7 02:10:30 GMT 2021


Dear James,
  John Westbrook will have to speak to the question of why his dictionary
says that, but the reality is that he also runs a database that
in fact supports far more than one entry id, and imgCIF data can certainly
have a very complex and tangled relationship
with mmCIF entry ids.  Further, it is not unusual to present one entry as
multiple datablocks tied together by a common entry id when
they are not in the same data file, e.g. for structure factors and coordinates.
  All of which is beside the point.  Both imgCIF and mmCIF are
database schemas: they put keys on everything and loop categories quite freely.
Sure, you can always pick any key, extract only the tuples that hold
that key at a single value, add such other tuples from related
child categories as make some kind of sense, and call the result a datablock, but
that is a very narrow and minimally useful view of how to manage
information.
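
To make the "pick a key and call it a datablock" idea concrete, here is a minimal sketch. It assumes toy tables held as lists of dicts; the row values and the helper names slice_by_key and datablock_view are hypothetical, not taken from any dictionary or library.

```python
# Toy tables: each category is a list of rows (dicts).
# All ids below are illustrative values, not real data.
diffrn = [
    {"id": "d1", "crystal_id": "x1"},
    {"id": "d2", "crystal_id": "x2"},
]
diffrn_detector = [
    {"diffrn_id": "d1", "id": "det_a"},
    {"diffrn_id": "d1", "id": "det_b"},
    {"diffrn_id": "d2", "id": "det_a"},
]


def slice_by_key(rows, key, value):
    """Keep only the rows whose `key` column holds `value`."""
    return [row for row in rows if row[key] == value]


def datablock_view(diffrn_id):
    """Fix one diffrn id and pull the matching rows from a child category."""
    return {
        "diffrn": slice_by_key(diffrn, "id", diffrn_id),
        "diffrn_detector": slice_by_key(diffrn_detector, "diffrn_id", diffrn_id),
    }


view = datablock_view("d1")
```

The view keeps one diffrn row and its two detector rows; everything tied to "d2" drops out, which is the narrowing described above.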
  Regards,
    Herbert


On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester at gmail.com> wrote:

> Herbert - are you arguing that imgCIF and mmCIF should not be assigned
> different schema names? If your comments are not about that, feel free to
> ignore the following.
>
> If you scrutinize the definition in mmCIF of _entry.id (
> https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html),
> you will see that it "identifies the data block" and is therefore restricted
> to a single value in a single data block. It follows that all the child
> data items of _entry.id are restricted to single values, so where these
> child items are the sole keys of their categories those categories become
> single-row categories. Such categories are entirely functionally equivalent
> to DDLm Set categories and so it would be possible to list which Set
> categories in core CIF are multi-row in mmCIF, satisfying the criteria for
> a schema. Frankly I was a bit too lazy to write the code to determine this
> but from memory it is only diffrn and exptl_crystal. If there are
> objections to the label "macromolecular" we can change it to "multi-crystal
> multi-wavelength" to avoid any implications or restrictions on mmCIF.
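
The check described in the paragraph above could be sketched roughly as follows. This is only an illustration under stated assumptions: the three categories, their keys and the parent link are a small hand-written extract rather than something read from the mmCIF dictionary, and the names descends_from and is_single_row are hypothetical. Because the input is hand-picked, the output does not confirm which categories the real dictionaries would yield.

```python
# Hand-written, illustrative extract (assumption, not parsed from a dictionary).
core_set_categories = ["cell", "diffrn", "exptl_crystal"]

# category -> key data names in the looped (mmCIF-style) dictionary
mmcif_keys = {
    "cell": ["_cell.entry_id"],
    "diffrn": ["_diffrn.id"],
    "exptl_crystal": ["_exptl_crystal.id"],
}

# child data name -> parent data name
parent_of = {"_cell.entry_id": "_entry.id"}


def descends_from(name, ancestor, parents):
    """Follow parent links upward from `name`; True if `ancestor` is reached."""
    seen = set()
    while name in parents and name not in seen:
        seen.add(name)  # guard against accidental cycles in the link table
        name = parents[name]
        if name == ancestor:
            return True
    return False


def is_single_row(category):
    """A category is single-row if its sole key is a child of _entry.id."""
    keys = mmcif_keys[category]
    return len(keys) == 1 and descends_from(keys[0], "_entry.id", parent_of)


# Core Set categories that can hold several rows under the looped dictionary.
multi_row = [c for c in core_set_categories if not is_single_row(c)]
```

With real dictionaries the two lookup tables would be filled from the category key and linked-item definitions instead of being typed in.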
>
> Although imgCIF does not have any categories that have child data names of
> _entry.id (so every imgCIF category can have multiple rows), it does add
> new key data names to a few mmCIF categories, thereby creating a distinct
> "_audit.schema". I don't think that is a controversial statement. For
> example, the Diffrn_Detector category in imgCIF has key data names
> "_diffrn_detector.diffrn_id" as well as "_diffrn_detector.id", whereas
> mmCIF has only the former (as per the text at bottom of p203 of Vol G).
>
> all the best,
> James.
>
>
> On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb at gmail.com>
> wrote:
>
>> I believe both imgCIF and mmCIF only use loop categories and any set
>> categories picked up for inclusion with their datasets will need to have
>> keys added and be mapped into loop categories.  That is certainly the
>> case for imgCIF -- Herbert
>>
>> On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester at gmail.com>
>> wrote:
>>
>>> Apologies for the lax terminology. By "looped" I mean "able to have more
>>> than one row in a loop". Perhaps the explanations should be rewritten to
>>> use 'Loop category' and 'Set category' rigorously?
>>>
>>> On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb at gmail.com>
>>> wrote:
>>>
>>>> In imgCIF (as with mmCIF) any and all categories may be looped -- it's
>>>> how you put information into database tables.  - Herbert
>>>>
>>>> On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <
>>>> comcifs at iucr.org> wrote:
>>>>
>>>>> Dear COMCIFS,
>>>>>
>>>>> First of all, Happy New Year to you all; I hope you've all been
>>>>> keeping well.
>>>>>
>>>>> I am writing to propose updating the list of _audit.schema in the core
>>>>> dictionary. Normally this would be core DMG business, but as it concerns
>>>>> most dictionaries covered by COMCIFS I believe this is the more appropriate
>>>>> forum. This has been prompted by reviewing the DDLm dictionary chapters for
>>>>> the next edition of Volume G. Please examine the list below and discuss any
>>>>> changes you would like to see.  The formal changes to the dictionary can be
>>>>> viewed as a diff at this link:
>>>>> https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e
>>>>>
>>>>> As a reminder, the _audit.schema dataname indicates that one or more
>>>>> categories have become looped relative to the core CIF dictionary. For
>>>>> example, where multiple crystals are used in a measurement, the
>>>>> exptl_crystal category becomes looped. Ideally software will check this
>>>>> dataname and exit if the dataname has an incompatible value.
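
The "check this dataname and exit" behaviour could look something like the sketch below. It assumes the data block has already been parsed into a plain dict of data names to values; the function names and the choice of supported schemas are hypothetical. The only detail taken from the proposal is that an absent _audit.schema falls back to the default, Base.

```python
import sys

# Schemas this (hypothetical) program knows how to interpret.
SUPPORTED_SCHEMAS = {"Base"}
# Default taken from _enumeration.default in the proposed list.
DEFAULT_SCHEMA = "Base"


def effective_schema(block):
    """Return the block's _audit.schema, or the default when it is absent."""
    return block.get("_audit.schema", DEFAULT_SCHEMA)


def check_or_exit(block, supported=SUPPORTED_SCHEMAS):
    """Exit with a message if the block uses a schema we cannot handle."""
    schema = effective_schema(block)
    if schema not in supported:
        sys.exit(f"Unsupported _audit.schema value: {schema!r}")
    return schema
```

A program that assumes single-row categories would call check_or_exit(block) before reading anything else, so that a Powder or Experiments block is rejected rather than misread.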
>>>>>
>>>>> best wishes,
>>>>> James.
>>>>>
>>>>> =====================================================
>>>>> loop_
>>>>> _enumeration_set.state
>>>>> _enumeration_set.detail
>>>>>     Base                'Original Core CIF schema'
>>>>>    'Space group tables' 'space_group category is looped'
>>>>>     Entry
>>>>> ;
>>>>>     entry category is defined and looped: multiple experiments
>>>>>     with results may be present
>>>>> ;
>>>>>     Powder              'Multiple compounds (phases) may be present'
>>>>>     Modulated           'Multiple subsystems may be present'
>>>>>     Experiments
>>>>> ;
>>>>>     diffrn and exptl_crystal categories are looped: multiple
>>>>>     diffraction measurements on multiple samples may be present
>>>>> ;
>>>>>     Macromolecular
>>>>> ;
>>>>>     mmCIF equivalent. Only single-key mmCIF categories containing
>>>>>     children of _entry.id are Set categories
>>>>> ;
>>>>>     Raw
>>>>> ;
>>>>>     imgCIF equivalent. As for Macromolecular, with the addition of
>>>>>     multiple detectors.
>>>>> ;
>>>>>     Laue
>>>>> ;
>>>>>     diffrn_radiation is looped: Multiple wavelengths are used.
>>>>> ;
>>>>>     Custom              'Examine dictionaries provided in _audit_conform'
>>>>>     Local               'Locally modified dictionaries. Datafile not for distribution'
>>>>> _enumeration.default    Base
>>>>> =======================
>>>>> --
>>>>> T +61 (02) 9717 9907
>>>>> F +61 (02) 9717 3145
>>>>> M +61 (04) 0249 4148