Updating list of _audit.schema
john.westbrook at rcsb.org
Thu Jan 7 13:32:39 GMT 2021
Hi James,
Regarding the loop_ presentation issue: nothing prohibits creating a loop_ with a single row, even if
the category logically has unit cardinality.
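As a rough illustration of that point, the Python sketch below writes the same one-row category either as plain item/value pairs or as a one-row loop_. It is not taken from any CIF library: the helper name format_category and the example values are hypothetical, and values are assumed to need no quoting.

```python
def format_category(rows, as_loop):
    """Serialise a category (a list of dicts sharing the same data names) as CIF text.

    Hypothetical helper for illustration only: no quoting of values is attempted.
    """
    names = list(rows[0].keys())
    if not as_loop:
        if len(rows) != 1:
            raise ValueError("item/value form can only hold a single row")
        return "\n".join(f"{name} {rows[0][name]}" for name in names)
    lines = ["loop_"] + names
    lines += [" ".join(str(row[name]) for name in names) for row in rows]
    return "\n".join(lines)


# A category that logically has unit cardinality (made-up values).
exptl_crystal = [{"_exptl_crystal.id": "1", "_exptl_crystal.colour": "red"}]

print(format_category(exptl_crystal, as_loop=False))  # item/value presentation
print(format_category(exptl_crystal, as_loop=True))   # single-row loop_ presentation
```

Both presentations carry the same single row; only the second form can be extended with further rows.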
Regards,
John
On 1/7/21 7:00 AM, Herbert J. Bernstein via comcifs wrote:
> Dear James,
> This reminds me very much of the bitter fight in the early 1970s between the proponents of hierarchical databases
> and relational databases. At the time I was on the wrong side and thought that there was something very neat and
> organized about always forcing your information into a tree that allowed the use of highly efficient pointers. Just putting
> information into tables of unsorted tuples and eschewing pointers seemed horribly inefficient. Those of us who
> liked hierarchies and pointers were wrong. Codd was right. My enthusiasm is the enthusiasm of a convert. Relations
> rule!!!
> Regards,
> Herbert
>
> On Wed, Jan 6, 2021 at 11:39 PM James Hester <jamesrhester at gmail.com> wrote:
>
> OK Herbert, I can only go on what is in the dictionaries. Please explain how mmCIF can "loop freely" the categories containing
> an _entry.id child key data name within a single data block given the definition. I will leave John W to
> further comment on how _entry.id is supposed to be used if he wishes. Meanwhile, in order to make progress I
> suggest simply
> (i) removing the "Macromolecular" option, noting the previous "Experiments" option covers multi-wavelength, multi-crystal setups.
> (ii) removing the "imgCIF" option
> (iii) returning in the future to add _audit.schema options corresponding to mmCIF and imgCIF if necessary
>
> By the way, I think I share your enthusiasm for managing information using the relational model, but data containers (data
> blocks/files/directories etc.) are unavoidable, and your characterisation of them as projections over particular values of one
> or more key data names I think is the precise way of defining the relationship between a data block and a relational schema and
> is vital for proper understanding of how to build datasets from constituent pieces.
>
>
> On Thu, 7 Jan 2021 at 13:11, Herbert J. Bernstein <yayahjb at gmail.com> wrote:
>
> Dear James,
> John Westbrook will have to speak to the question of why his dictionary says that, but the reality is that he also runs a
> database that in fact supports a lot more than one entry id, and it is certainly the case that imgCIF data can have a very
> complex and tangled relationship with mmCIF entry ids. Further, it is not unusual to present one entry as multiple datablocks
> pulled together by a common entry id when they are not in the same data file, e.g. for structure factors and coordinates.
> All of which is beside the point. Both imgCIF and mmCIF are database schemas and stick keys on everything and loop them
> quite freely.
> Sure, you can always pick any key and extract only the tuples that contain that key at a single value and such other tuples
> from related child categories as make some kind of sense and call it a datablock, but that is a very narrow and minimally
> useful view of how to manage information.
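The extraction described above (fix one key at a single value, then keep the matching tuples from related child categories) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name extract_data_block, the in-memory table layout, and all values are hypothetical, and the links mapping simply records which column of each category points back at the chosen key.

```python
def extract_data_block(tables, links, key_value):
    """Project a set of relational tables onto a single value of one key.

    tables: {category name: list of row dicts}
    links:  {category name: column in that category that refers to the chosen key}
    Categories without an entry in `links` are treated as unrelated and skipped.
    All names here are illustrative assumptions.
    """
    block = {}
    for category, rows in tables.items():
        link = links.get(category)
        if link is None:
            continue
        selected = [row for row in rows if row[link] == key_value]
        if selected:
            block[category] = selected
    return block


# Two entries held together in one set of tables (made-up values).
tables = {
    "entry": [{"id": "A"}, {"id": "B"}],
    "cell": [{"entry_id": "A", "length_a": "10.1"}, {"entry_id": "B", "length_a": "12.3"}],
    "exptl": [{"entry_id": "A", "method": "X-RAY DIFFRACTION"}],
}
links = {"entry": "id", "cell": "entry_id", "exptl": "entry_id"}

block_a = extract_data_block(tables, links, "A")
print(block_a)
```

In this sketch each per-entry category ends up with a single row inside the extracted block, while the full tables remain free to hold many entries.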
> Regards,
> Herbert
>
> On Wed, Jan 6, 2021 at 8:21 PM James Hester <jamesrhester at gmail.com> wrote:
>
> Herbert - are you arguing that imgCIF and mmCIF should not be assigned different schema names? If your comments are not
> about that, feel free to ignore the following.
>
> If you scrutinize the definition in mmCIF of _entry.id
> (https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entry.id.html), you will see that it "identifies the
> data block" so is therefore restricted to a single value in a single data block. It follows that all the child data
> items of _entry.id are restricted to single values, so where these child items are the sole keys of
> their categories those categories become single-row categories. Such categories are entirely functionally equivalent to
> DDLm Set categories and so it would be possible to list which Set categories in core CIF are multi-row in mmCIF,
> satisfying the criteria for a schema. Frankly I was a bit too lazy to write the code to determine this but from memory
> it is only diffrn and exptl_crystal. If there are objections to the label "macromolecular" we can change it to
> "multi-crystal multi-wavelength" to avoid any implications or restrictions on mmCIF.
>
> Although imgCIF does not have any categories that have child data names of _entry.id (so every imgCIF
> category can have multiple rows), it does add new key data names to a few mmCIF categories, thereby creating a distinct
> "_audit.schema". I don't think that is a controversial statement. For example, the Diffrn_Detector category in imgCIF
> has key data names "_diffrn_detector.diffrn_id" as well as "_diffrn_detector.id", whereas
> mmCIF has only the former (as per the text at bottom of p203 of Vol G).
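The effect of the extra key data name can be shown with a small Python sketch. It assumes rows are held as dicts with made-up values, and the helper duplicate_keys is hypothetical: two detectors belonging to the same diffrn_id are distinct rows under the two-key scheme, but collide if diffrn_id is the only key.

```python
def duplicate_keys(rows, key_names):
    """Return the set of key tuples that occur more than once among the rows."""
    seen, duplicates = set(), set()
    for row in rows:
        key = tuple(row[name] for name in key_names)
        if key in seen:
            duplicates.add(key)
        else:
            seen.add(key)
    return duplicates


# Two detectors used in one diffraction experiment (made-up values).
detectors = [
    {"diffrn_id": "D1", "id": "det_a"},
    {"diffrn_id": "D1", "id": "det_b"},
]

# Single key (diffrn_id only): the two rows share a key value.
print(duplicate_keys(detectors, ["diffrn_id"]))
# Composite key (diffrn_id plus id): every row is uniquely identified.
print(duplicate_keys(detectors, ["diffrn_id", "id"]))
```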
>
> all the best,
> James.
>
>
> On Thu, 7 Jan 2021 at 10:01, Herbert J. Bernstein <yayahjb at gmail.com> wrote:
>
> I believe both imgCIF and mmCIF only use loop categories and any set categories picked up for inclusion with their
> datasets will need to have keys added and be mapped into loop categories. That is certainly the case for imgCIF -- Herbert
>
> On Wed, Jan 6, 2021 at 5:06 PM James Hester <jamesrhester at gmail.com> wrote:
>
> Apologies for the lax terminology. By "looped" I mean "able to have more than one row in a loop". Perhaps the
> explanations should be rewritten to use 'Loop category' and 'Set category' rigorously?
>
> On Thu, 7 Jan 2021 at 03:07, Herbert J. Bernstein <yayahjb at gmail.com> wrote:
>
> In imgCIF (as with mmCIF) any and all categories may be looped -- it's how you put information into
> database tables. - Herbert
>
> On Wed, Jan 6, 2021 at 1:35 AM James Hester via comcifs <comcifs at iucr.org> wrote:
>
> Dear COMCIFS,
>
> First of all, Happy New Year to you all; I hope you've all been keeping well.
>
> I am writing to propose updating the list of _audit.schema in the core dictionary. Normally this would
> be core DMG business, but as it concerns most dictionaries covered by COMCIFS I believe this is the more
> appropriate forum. This has been prompted by reviewing the DDLm dictionary chapters for the next edition
> of Volume G. Please examine the list below and discuss any changes you would like to see. The formal
> changes to the dictionary can be viewed as a diff at this link:
> https://github.com/COMCIFS/cif_core/pull/190/commits/5e3b84e6f84997f9822f704a9f380ff500e0410e
>
> As a reminder, the _audit.schema dataname indicates that one or more categories have become looped
> relative to the core CIF dictionary. For example, where multiple crystals are used in a measurement, the
> exptl_crystal category becomes looped. Ideally software will check this dataname and exit if the
> dataname has an incompatible value.
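A minimal Python sketch of such a check is given below. It assumes the data block has already been read into a dict keyed by data name; the function name check_audit_schema and the particular set of supported schemas are hypothetical, and the fallback to Base follows the _enumeration.default in the proposed list.

```python
# Schema names come from the enumeration proposed in this message; which of
# them a given program supports is an assumption made for illustration.
SUPPORTED_SCHEMAS = {"Base", "Space group tables"}
DEFAULT_SCHEMA = "Base"  # the proposed _enumeration.default


def check_audit_schema(block, supported=SUPPORTED_SCHEMAS):
    """Return the schema named in a data block, or raise if it is unsupported.

    `block` is assumed to be a dict mapping data names to values; a missing
    _audit.schema is treated as the default schema.
    """
    schema = block.get("_audit.schema", DEFAULT_SCHEMA)
    if schema not in supported:
        raise ValueError(f"unsupported _audit.schema: {schema!r}")
    return schema


print(check_audit_schema({}))  # dataname absent, so the default applies
try:
    check_audit_schema({"_audit.schema": "Powder"})
except ValueError as err:
    print("refusing to continue:", err)
```

A command-line program would typically turn the raised error into a non-zero exit status instead of continuing with categories it may misread.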
>
> best wishes,
> James.
>
> =====================================================
> loop_
> _enumeration_set.state
> _enumeration_set.detail
> Base 'Original Core CIF schema'
> 'Space group tables' 'space_group category is looped'
> Entry
> ;
> entry category is defined and looped: multiple experiments
> with results may be present
> ;
> Powder 'Multiple compounds (phases) may be present'
> Modulated 'Multiple subsystems may be present'
> Experiments
> ;
> diffrn and exptl_crystal categories are looped: multiple
> diffraction measurements on multiple samples may be present
> ;
> Macromolecular
> ;
> mmCIF equivalent. Only single-key mmCIF categories containing children
> of _entry.id are Set categories
> ;
> Raw
> ;
> imgCIF equivalent. As for Macromolecular, with the addition of
> multiple detectors.
> ;
> Laue
> ;
> diffrn_radiation is looped: Multiple wavelengths are used.
> ;
> Custom 'Examine dictionaries provided in _audit_conform'
> Local 'Locally modified dictionaries. Datafile not for distribution'
> _enumeration.default Base
> =======================
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> comcifs mailing list
> comcifs at iucr.org
> http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs
--
John Westbrook
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Institute for Quantitative Biomedicine at Rutgers
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook at rcsb.org
Ph: (848) 445-4290 Fax: (732) 445-4320