Items for the Agenda of the COMCIFS closed meeting

David Brown idbrown at mcmaster.ca
Mon Mar 21 15:17:15 GMT 2005


To members of COMCIFS

ITEMS FOR DISCUSSION AT THE CLOSED COMCIFS MEETING IN FLORENCE

I would like to place the following two topics on the agenda for the 
closed meetings in Florence.  I welcome suggestions for other agenda items.

1. What is the role of CIF in the current rapidly changing world of 
information technology?

2. How can we make transparent the boundary between CIFs written with 
DDL1 dictionaries and those written with DDL2?

David Brown

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

COMMENTS ON THESE ITEMS

1. FUTURE DIRECTION OF CIF
It should be no surprise that an information technology language adopted 
in 1990 needs to be reviewed after fifteen years of operation.   The 
rapid advances in the field and the introduction of XML make such a 
review more than timely.  A further urgency is added by the need to 
ensure that incremental changes that we make in the dictionaries and 
other documents are compatible with future directions of 
crystallographic information technology.  Two current problems 
illustrate how this affects dictionary structures.

1. Is it better to have a semantically meaningless item as the 
_list_reference (DDL1) or _category_key (DDL2) to label each line in a 
loop, or should we use semantically meaningful items (such as 
_atom_site_label) that are already present?  The former solution allows 
more straightforward programming and avoids possible conflicts between 
the information technology and crystallographic uses of the item, but the 
latter leaves the CIF less cluttered and easier for humans to read 
because the links are more readily followed by eye.  The current 
revision of the core dictionary needs an answer to this question, 
because the answer will affect future CIF data structures.

2. Should there be rules defining the relationships that are allowed to 
be expressed by parent-child links?  These links have been developed in 
an ad hoc way, but as we move towards more advanced data structures, we 
may find that we have developed links that are impossible to 
manipulate.  One way of exploring the logic of the linked structures is 
to use the Resource Description Framework (RDF), which is being developed 
as part of the Semantic Web (see http://www.w3.org/RDF/ and 
http://www.w3.org/RDF/FAQ ).  This scheme expresses the parent-child 
links as a graph, making it easier to trace the logic.  Another 
possibility is to use the Unified Modeling Language ( www.uml.org ).
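
As a rough illustration of what treating the parent-child links as a 
graph could look like, here is a minimal Python sketch.  It is not RDF 
or UML, only a plain directed graph with a check for circular links, 
which is one kind of structure that would be impossible to manipulate.  
The link table is hand-written for the example and is not read from any 
dictionary, and the function names build_graph and find_cycle are 
invented here.

```python
# Illustrative (child, parent) pairs, hand-written for this sketch.
# They are assumptions for the example, not extracted from a dictionary.
LINKS = [
    ("_geom_bond_atom_site_label_1", "_atom_site_label"),
    ("_geom_bond_atom_site_label_2", "_atom_site_label"),
    ("_atom_site_aniso_label", "_atom_site_label"),
]


def build_graph(links):
    """Map every data name to the set of its parent data names."""
    graph = {}
    for child, parent in links:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set())  # parents with no parent of their own
    return graph


def find_cycle(graph):
    """Return a list of names forming a circular link, or None if there is none."""
    unvisited, in_progress, done = 0, 1, 2
    state = {name: unvisited for name in graph}
    path = []

    def visit(name):
        state[name] = in_progress
        path.append(name)
        for parent in sorted(graph[name]):
            if state[parent] == in_progress:
                # We have come back to a name already on the current path.
                return path[path.index(parent):] + [parent]
            if state[parent] == unvisited:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[name] = done
        return None

    for name in sorted(graph):
        if state[name] == unvisited:
            found = visit(name)
            if found:
                return found
    return None


if __name__ == "__main__":
    print(find_cycle(build_graph(LINKS)))
```

A rule such as "parent-child links must never form a cycle" could then 
be tested mechanically against a whole dictionary before new links are 
approved.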

2. THE CONVERGENCE OF DDL1 AND DDL2
As interest focuses on software that explores the interactions of small 
and large molecules, the incompatibility between the Dictionary 
Definition Language 1 (DDL1) and DDL2 is becoming a hindrance.

CoreCIF is designed for use with small molecules and is written in DDL1, 
whereas mmCIF, designed for reporting macromolecules, is written in DDL2.  
While most of the features of the two standards are similar, there are 
two significant differences.  First, DDL2 has a tighter structure 
designed to make automatic computer manipulation of the information 
easier.  Second, the names given to the data items have a different 
structure.  As the similarities between the two languages are far 
greater than their differences, it should be possible to achieve some 
convergence;  already the core dictionary is evolving towards the DDL2 
standard, but a complete convergence would require major reworking of 
some dictionaries.

Convergence can be achieved in different ways.  One way is to ensure 
that software is able to validate CIFs against both DDL1 and DDL2 
dictionaries.  The dictionaries contain synonyms of the data names 
(alternative data names for items with essentially the same definition, 
listed under _related_item (DDL1) and _item_aliases.alias_name (DDL2)).  
Software that takes note of these aliases should therefore recognize any 
character string used to represent a particular data name, regardless of 
the dictionary or version being used.  Since all the items in the 
coreCIF dictionary appear (transformed to DDL2) in the mmCIF dictionary 
with their original DDL1 data names given as aliases, mmCIF software 
should be able to read coreCIFs without difficulty.  mmCIF aliases are 
currently not present in the coreCIF dictionary but could easily be 
added.  Alternatively, a DDL2 version of the coreCIF dictionary could be 
separated out and used as an alternative to the DDL1 core dictionary.
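
To make the alias approach concrete, the following Python sketch shows 
how software might map whichever data name it meets onto a single 
canonical name.  It assumes a small hand-written alias table in place of 
one read from a real dictionary, and the function names build_lookup and 
normalise are invented for the example.  The choice of the DDL2 name as 
the canonical one is arbitrary.

```python
# Hypothetical alias table: canonical name -> list of alias names.
# In real software this would be read from the dictionary's alias entries;
# here it is written by hand for illustration.
ALIASES = {
    "_cell.length_a": ["_cell_length_a"],
    "_cell.length_b": ["_cell_length_b"],
}


def build_lookup(aliases):
    """Map every known spelling (lower-cased) to its canonical data name."""
    lookup = {}
    for canonical, alternatives in aliases.items():
        for name in [canonical, *alternatives]:
            # CIF data names are case-insensitive, so compare in lower case.
            lookup[name.lower()] = canonical
    return lookup


def normalise(items, lookup):
    """Rewrite the data names of a name -> value mapping to canonical form.

    Names that are not in the lookup are passed through unchanged.
    """
    return {lookup.get(name.lower(), name): value for name, value in items.items()}


if __name__ == "__main__":
    lookup = build_lookup(ALIASES)
    # Made-up values mixing the two naming styles.
    mixed = {"_cell_length_a": "10.5", "_cell.length_b": "11.0"}
    print(normalise(mixed, lookup))
```

Software built this way does not need to know in advance which 
dictionary a CIF was written against, provided the aliases are present 
in at least one of the dictionaries it loads.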



