Reactivating mmCIF DMG (was discussion of dictionary update procedure)

Nick Spadaccini nick at csse.uwa.edu.au
Tue Aug 25 07:36:41 BST 2009


This was the last message of this thread and since there has been no further
correspondence over the last week it is timely to pitch in.

Herb makes some very relevant and sound points. If there are data items in
the pdbx_ namespace that have wider reference to the community then they
should appear in the relevant dictionaries, or even spawn their own
dictionaries. With the new DDLm support for importing, we can draw in
exactly which dictionaries, or parts thereof, we need.

I can see the need for a TRUE pdbx_ dictionary to exist, in the sense that
it is truly an exchange dictionary and very specific to the PDB. This would
probably not need to be public (though it could be) and could remain in house.

Anything that was, say, NMR would likely be best housed in an NMR dictionary
with the provisos (a) COMCIFS (I believe) does not purvey that community's
ontology, and (b) there already is an NMRIF, if I am not mistaken, though it
was in Star (not CIF) and I am not sure of its current status.

One could build an NMR dictionary and then accept or deprecate data items if
the NMR community adopt the CIF model and begin fully defining their
discipline - the beauty of the "import" mechanism.

Finally anything that is relevant to the wider community should (I would
argue MUST) have the pdbx_ tag dropped. Why? The "PDBness" of the data is in
itself data, not part of the ontology.

So, presenting secondary structure is part of the ontology, the fact that
one uses the PDB model is data, i.e.


_..._struct.secondary_whatever_model "PDB"

loop_
_..._struct.secondary_whatever

Rather than

_pdbx_..._struct.secondary_whatever

Otherwise we get a crazy proliferation of personalised, preferred tags. For
instance, an insistence from certain players that there should be

_shelx_....
_crystals_....

Syd and the original developers fought hard to avoid such an event, and it
would be remiss of us not to maintain that approach.
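
To make the point concrete, here is a minimal Python sketch of the idea, not
a real tool. It moves a leading local prefix out of the tag name and into a
companion "_model" data item. The prefix list, the model names and the
example tag "_pdbx_example_struct.secondary_type" are all hypothetical,
chosen only for illustration; real dictionaries may place the prefix
elsewhere in the tag.

```python
# Hypothetical mapping from a local tag prefix to the model name it implies.
MODEL_NAMES = {"pdbx": "PDB", "shelx": "SHELX", "crystals": "CRYSTALS"}


def split_local_prefix(tag):
    """Return (prefix, generic_tag) if tag starts with a known local prefix,
    otherwise (None, tag)."""
    name = tag.lstrip("_")
    for prefix in MODEL_NAMES:
        if name.lower().startswith(prefix + "_"):
            # Drop the prefix and its trailing underscore from the name.
            return prefix, "_" + name[len(prefix) + 1:]
    return None, tag


def normalise(items):
    """Rewrite a {tag: value} mapping so that local prefixes become data.

    Each prefixed tag is replaced by its prefix-free form, and a companion
    '<tag>_model' item records which model the value follows.
    """
    out = {}
    for tag, value in items.items():
        prefix, generic = split_local_prefix(tag)
        out[generic] = value
        if prefix is not None:
            out[generic + "_model"] = MODEL_NAMES[prefix]
    return out


if __name__ == "__main__":
    example = {
        "_pdbx_example_struct.secondary_type": "helix",  # hypothetical tag
        "_cell.length_a": "10.0",
    }
    for tag, value in normalise(example).items():
        print(tag, value)
```

The tag without a local prefix passes through untouched, while the prefixed
one yields a generic tag plus one extra item naming the model.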



On 20/08/09 7:35 PM, "Herbert J. Bernstein" <yaya at bernstein-plus-sons.com>
wrote:

> Dear John,
> 
>    The wwPDB web site lists PDB file, mmCIF File, and PDBML/XML File as the
> three formats it offers.  It refers to the dictionary it uses as the
> "mmcif_pdbx.dic" dictionary, not as the "PDB Exchange Dictionary".  The
> files themselves are not simply mmCIF files with some
> additional information from the local PDB pdbx dictionary, but differ
> significantly from what would be required to present the structures as
> mmCIF files (e.g. in presenting secondary structure).  It is a disservice
> to the community to have this pointless confusion.
> 
>    It is many years past time to return to the approach to the handling of
> the mmCIF dictionary that the PDB originally endorsed, but did not
> implement, and to create a new community-based mmCIF DMG and to clearly
> define the crystallographic macromolecular CIF dictionary.
> 
>    When the "local" tags needed to represent a crystallographic structure
> become as numerous as they now have become, something is very, very wrong.
> 
>    Our science has changed.  Any valid mmCIF dictionary must, of necessity,
> be coordinated with terms from NMR and microscopy, and with terms used in
> experimental data collection, just as it must be coordinated with terms
> from the small molecule community and must use symmetry-related terms.
> DDLm offers us the opportunity to bring the necessary multiple
> dictionaries together into a common framework, but that effort will be
> impaired if we do not clean-up, modularize and coordinate the management
> of the terms used in each domain, and ensure that those speaking for each
> domain have gone to the effort to interact with their communities and
> ensure appropriate discussion and support for what is being recommended.
> 
>    What I am urging is precisely what ANSI and ISO require of any
> standardization effort.  It is also what we all agreed to when the mmCIF
> effort started.
> 
>    Any valid mmCIF DMG needs representation not just from wwPDB, but from
> the dictionaries with which mmCIF interacts -- the core, symmetry, imgCIF,
> NMR, microscopy, etc., and from the communities with which it interacts
> and it needs to carefully consider each of the tags that the PDB and
> others have proposed as being needed to represent a crystallographic
> macromolecular structure.
> 
>    Regards,
>      Herbert
> 
> =====================================================
>   Herbert J. Bernstein, Professor of Computer Science
>     Dowling College, Kramer Science Center, KSC 121
>          Idle Hour Blvd, Oakdale, NY, 11769
> 
>                   +1-631-244-3035
>                   yaya at dowling.edu
> =====================================================
> 
> On Thu, 20 Aug 2009, John Westbrook wrote:
> 
>> James and Herbert,
>> 
>> There is an unfortunate confusion in that mmCIF names both a dictionary
>> and a syntax.  And it is unavoidable that people may identify  the
>> wwPDB PDBx files as mmCIF files.  However, the wwPDB has long referred
>> to its data files as compliant with the PDB Exchange Dictionary (PDBx).
>> The PDB Exchange Dictionary described in the International Tables Volume
>> G is a recognized extension dictionary which is a superset of mmCIF content
>> and uses mmCIF syntax.  This dictionary has been developed in conjunction
>> with the mmCIF dictionary, respecting the mmCIF dictionary organization,
>> and following the COMCIFs practice of using a local prefix like pdbx_
>> to identify extension definitions.   The content of the PDB exchange
>> dictionary necessarily describes experimental techniques other than
>> crystallography, such as NMR, EM (3dem_), ..., and hybrids.  This
>> dictionary must also provide mappings to the legacy of nomenclature
>> and conventions in published structures.  Describing this broader
>> content in a manner that maintains the organization and terminology
>> in the mmCIF dictionary presents a range of challenges; however,
>> extensions have been introduced in a manner which as much as possible
>> preserves the mmCIF schema.
>> 
>> It is important to emphasize that the majority of extension content
>> in the PDB Exchange dictionary has been added through grass roots
>> community efforts like COMCIFs and through coordination with other
>> resources like BMRB and EMDB.  PDB has hosted and resourced the
>> mmCIF DMG for many years.  There has admittedly been little
>> input to this group for a number of years; however, the same can
>> be said for the other dmg's.   We would certainly welcome and
>> support new content contributed to the mmCIF dictionary through
>> this mechanism.
>> 
>> The wwPDB continues to work to provide an archive of data files
>> with a unified representation of macromolecular structure and
>> experiment. We would certainly appreciate COMCIFs support in this
>> effort.
>> 
>> Regards,
>> 
>> John
>> 
>> 
>> 
>> On 8/19/09 11:26 PM, James Hester wrote:
>>> Hi Herbert: I do not dispute that there are also many pdbx tags,
>>> additional to those for internal use, that have significance for the
>>> macromolecular community, thus my agreement that we should reactivate
>>> the mmCIF DMG and look at how to include these.
>>> 
>>> James.
>>> 
>>> On Thu, Aug 20, 2009 at 12:30 PM, Herbert J.
>>> Bernstein<yaya at bernstein-plus-sons.com>  wrote:
>>>> Dear James,
>>>> 
>>>>   The statement that "the PDB additionally include many 'pdbx' items which
>>>> have no meaning outside the PDB, but which enable them to freely convert
>>>> between the database and a CIF file" is incorrect. There certainly are
>>>> tags in pdbx which are only needed for internal purposes of the PDB, but
>>>> the issue at hand is not the set of tags for internal use by the pdb, but
>>>> the very large number of tags from the pdbx dictionary that are essential
>>>> to the crystallographic description of the molecule.
>>>> 
>>>>   Consider for example, the secondary structure tags.  The pdbx secondary
>>>> structure tags are not just some augmentation for database management,
>>>> they are a major recasting of the approach to secondary structure from
>>>> the rather elegant approach adopted as part of mmCIF to a compromise
>>>> between the old PDB secondary structure description and the new mmCIF
>>>> description.
>>>> 
>>>>   Consider also, _atom_site.pdbx_PDB_model_num, which is essential to the
>>>> understanding of multiple model entries, especially because the PDB
>>>> repeats atom serial numbers between models.  This is not a database
>>>> management tag.
>>>> 
>>>>   There are many more such tags.  If they were purely for internal
>>>> database management, they would not have to be part of every so-called
>>>> mmCIF entry released by the PDB.  These are, quite literally, de facto
>>>> standards for crystallographic macromolecular data, and should be
>>>> carefully considered by the crystallographic community in that context.
>>>> 
>>>>   I urge everyone to read
>>>> http://mmcif.pdb.org/dictionaries/ascii/mmcif_pdbx.dic
>>>> 
>>>>   Regards,
>>>>     Herbert
>>>> =====================================================
>>>>   Herbert J. Bernstein, Professor of Computer Science
>>>>    Dowling College, Kramer Science Center, KSC 121
>>>>         Idle Hour Blvd, Oakdale, NY, 11769
>>>> 
>>>>                  +1-631-244-3035
>>>>                  yaya at dowling.edu
>>>> =====================================================
>>>> 
>>>> On Thu, 20 Aug 2009, James Hester wrote:
>>>> 
>>>>> To first address Herbert's comments below:  A PDB mmCIF file does
>>>>> contain a lot of mmCIF data items, so I would argue that calling it an
>>>>> 'mmCIF' file is reasonable. The PDB additionally include many 'pdbx'
>>>>> items which have no meaning outside the PDB, but which enable them to
>>>>> freely convert between the database and a CIF file.  I might add in
>>>>> passing that this relational database<->  CIF interconvertibility is a
>>>>> rather remarkable attribute of the CIF standard, and we should
>>>>> recognise the PDB for the work that they have put into realising this.
>>>>> 
>>>>> It is not correct to state that the 'pdbx' tags are being proposed as
>>>>> de-facto standards.  If I include 'anbf' tags in powder diffraction
>>>>> CIF file and call the result a 'pdCIF' file, am I proposing these as
>>>>> defacto pdCIF standards?  I think not.  I am simply stating that
>>>>> software that works with pdCIF (eg CMPR) will be able to process this
>>>>> file.
>>>>> 
>>>>> On the other hand, it is clear that there are plenty of pdbx dataitems
>>>>> of relevance to the macromolecular community, and work on bringing
>>>>> these into the mmCIF dictionary would be welcome.  I would certainly
>>>>> support reactivation of the mmCIF DMG and tasking it with updating
>>>>> mmCIF.  What do other members think?
>>>>> 
>>>>> Best wishes,
>>>>> James.
>>>>> 
>>>>> On Thu, Aug 20, 2009 at 10:53 AM, Herbert J.
>>>>> Bernstein<yaya at bernstein-plus-sons.com>  wrote:
>>>>>> 
>>>>>> Dear Colleagues,
>>>>>> 
>>>>>> James has written:
>>>>>> 
>>>>>>> My understanding is that imgCIF and mmCIF are within the purview of
>>>>>>> COMCIFS, but we have no responsibility for pdbx and so this procedure
>>>>>>> would not apply to it.
>>>>>> 
>>>>>>   I am unable to see any justification for exclusion of pdbx, when that,
>>>>>> rather than mmCIF, is what the PDB uses for its crystallographic
>>>>>> macromolecular file releases, and even calls those pdbx files mmCIF
>>>>>> files.
>>>>>> 
>>>>>>   For example, when I display the "mmCIF" file for 4ins, I get a file
>>>>>> that contains the following pdbx items:
>>>>>> 
>>>>>> _audit_conform.dict_name       mmcif_pdbx.dic
>>>>>> _audit_conform.dict_version    1.0670
>>>>>> _audit_conform.dict_location
>>>>>> http://mmcif.pdb.org/dictionaries/ascii/mmcif_pdbx.dic
>>>>>> 
>>>>>> #
>>>>>> _pdbx_database_PDB_obs_spr.id               SPRSDE
>>>>>> _pdbx_database_PDB_obs_spr.date             1990-04-15
>>>>>> _pdbx_database_PDB_obs_spr.pdb_id           4INS
>>>>>> _pdbx_database_PDB_obs_spr.replace_pdb_id   1INS
>>>>>> #
>>>>>> _pdbx_database_status.status_code    REL
>>>>>> _pdbx_database_status.entry_id       4INS
>>>>>> _pdbx_database_status.deposit_site   ?
>>>>>> _pdbx_database_status.process_site   ?
>>>>>> _pdbx_database_status.SG_entry       .
>>>>>> #
>>>>>> 
>>>>>> loop_
>>>>>> _audit_author.name
>>>>>> _audit_author.pdbx_ordinal
>>>>>> 'Dodson, G.G.'  1
>>>>>> 'Dodson, E.J.'  2
>>>>>> 'Hodgkin, D.C.' 3
>>>>>> 'Isaacs, N.W.'  4
>>>>>> 'Vijayan, M.'   5
>>>>>> 
>>>>>> 
>>>>>> ....
>>>>>> 
>>>>>> and many, many more
>>>>>> 
>>>>>> It is a serious abdication of COMCIFS responsibility to the
>>>>>> crystallographic community for COMCIFS to fail to consider each of the
>>>>>> pdbx tags that are implicitly being proposed as de facto revisions to
>>>>>> the crystallographic mmCIF dictionary.
>>>>>> 
>>>>>> I propose that a DMG be reactivated for mmCIF and that it be asked by
>>>>>> COMCIFS to make a proposal to COMCIFs on updating the mmCIF dictionary
>>>>>> so that it can actually be used for crystallographic macromolecular
>>>>>> structures.
>>>>>> 
>>>>>> Regards,
>>>>>>   Herbert
>>>>>> 
>>>>>> P.S.  An alternative would simply be to discard the mmCIF dictionary,
>>>>>> inasmuch as it is not being used.
>>>>>> 
>>>>>> =====================================================
>>>>>>   Herbert J. Bernstein, Professor of Computer Science
>>>>>>    Dowling College, Kramer Science Center, KSC 121
>>>>>>         Idle Hour Blvd, Oakdale, NY, 11769
>>>>>> 
>>>>>>                  +1-631-244-3035
>>>>>>                  yaya at dowling.edu
>>>>>> =====================================================
>>>>> 
>>>>> 
>>>>> --
>>>>> T +61 (02) 9717 9907
>>>>> F +61 (02) 9717 3145
>>>>> M +61 (04) 0249 4148
>>>> 
>>> 
>>> 
>>> 
>> 
>> -- 
>> ******************************************************************
>>  John Westbrook, Ph.D.
>>  Rutgers, The State University of New Jersey
>>  Department of Chemistry and Chemical Biology
>>  610 Taylor Road
>>  Piscataway, NJ 08854-8087
>>  e-mail: jwest at rcsb.rutgers.edu
>>  Ph:  (732) 445-4290  Fax: (732) 445-4320
>> ******************************************************************
>> 
> _______________________________________________
> comcifs mailing list
> comcifs at iucr.org
> http://scripts.iucr.org/mailman/listinfo/comcifs

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini at uwa.edu.au






