Annual Report for 2005

David Brown idbrown at mcmaster.ca
Thu Apr 20 12:26:41 BST 2006


COMCIFS Annual Report for 2005 to the IUCr Executive Committee

This year marks fifteen years since the Union adopted CIF
(Crystallographic Information Framework, formerly Crystallographic
Information File) as a standard for submission of crystal structure
reports to the Union journals.  Much has happened in that time, and the
IUCr Congress in Florence provided an opportunity for COMCIFS to take
stock of the project and plan its future directions.

The most notable achievement of the past fifteen years has been the 
preparation of an impressive array of CIF dictionaries that provide 
data-names and definitions for the two thousand or so crystallographic 
terms that can appear in CIFs.  No other discipline has a comparable set 
of dictionaries with such a wide community acceptance.  These 
dictionaries are used in conjunction with the STAR file syntax as the 
format for the considerable archive of CIF-based structure reports.  In 
the field of small-cell crystallography CIF is now widely accepted as 
the standard for the submission of structure reports to many scientific 
journals, and for their archiving and downloading.  In the
macromolecular field CIF is used to archive the Protein Data Bank, but
it does not yet have as wide a community acceptance: most protein
structure laboratories prefer to stay with the familiar, if
inadequate, PDB format, and the macromolecular data centres favour
the use of XML.

XML is a markup language with many functional similarities to the STAR 
file structure used by CIF.  Although a recent arrival, its development 
by the information technology community has earned it widespread 
acceptance in many scientific communities.  It is more flexible than 
CIF, though this is not necessarily an advantage in an established field 
like crystallography.  It allows users to develop their own semantics 
and define concepts in ways that may not be compatible with those 
defined by other users.  Although XML users have access to an extensive 
suite of programs to manipulate their files, unless they agree on the 
semantics, i.e., the definitions and organization of the concepts of 
their discipline, they are unable to communicate with each other.  CIF's
suite of dictionaries provides widely accepted semantics for
crystallography which can be translated into an XML format for the
benefit of XML users, though the reverse process is only possible if the 
XML file is written in a form designed to be compatible with CIF.  
COMCIFS is working to ensure that the information contained in CIFs and 
CIF dictionaries is available in XML format.  Some conversion programs 
are already available and more work is planned.
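
As an illustration only (it is not taken from any COMCIFS conversion
program), the translation of simple CIF items into XML might be sketched
in Python as follows.  The element names "cif", "datablock" and "item"
are assumptions made for this sketch and do not correspond to any agreed
schema; loops and multi-line text fields are deliberately ignored, and
the sample values are illustrative.

```python
import shlex
import xml.etree.ElementTree as ET


def cif_items_to_xml(cif_text):
    """Translate single-line CIF name-value pairs into an XML string.

    Hypothetical element names: <cif>, <datablock>, <item>.
    Limitations of this sketch: loop_ constructs, multi-line text
    fields and values containing unbalanced quotes are not handled.
    """
    root = ET.Element("cif")
    block = None
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            # Start of a new data block; the text after "data_" names it.
            block = ET.SubElement(root, "datablock", name=line[5:])
        elif line.startswith("_") and block is not None:
            # shlex honours the quotes around values such as 'F d -3 m'.
            parts = shlex.split(line)
            if len(parts) == 2:
                item = ET.SubElement(block, "item", name=parts[0])
                item.text = parts[1]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "data_silicon\n"
        "_cell_length_a 5.431\n"
        "_symmetry_space_group_name_H-M 'F d -3 m'\n"
    )
    print(cif_items_to_xml(sample))
```

Because each data name is carried across unchanged as an attribute, the
definitions in the CIF dictionaries still apply to the XML form, which
is the property that makes the reverse translation possible.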

Our goal is to enable CIFs to be read by generic programs that obtain 
all their crystallographic knowledge directly from the CIF 
dictionaries.  This requires that all CIFs rigorously conform to the 
standard.  In the early days this standard was not strictly enforced so 
as to avoid discouraging those who found CIF strange and unfamiliar, but 
over the years the degree of conformity has been steadily increased and 
the CIF standard itself has evolved in subtle ways as we became more 
aware of the possibilities inherent in the STAR syntax.  Thus after 
preparing the coreCIF dictionary as a STAR file using the Dictionary 
Definition Language 1 (DDL1) it was decided that the macromolecular CIF 
dictionary should use advanced features that were only available in 
DDL2.  The result was two incompatible CIF dialects, CIF1 and CIF2, 
using dictionaries based on DDL1 and DDL2 respectively.  This required 
different programs for each dialect, or a duplication of effort to 
ensure that a single program could read both.  While this decision made 
sense at the time, it has returned to haunt us as we strive to ensure 
that we retain compatibility between the CIF1 and CIF2 definitions even 
as the dictionaries evolve independently.

The problem of CIF dialects was discussed in Florence at the closed
COMCIFS meeting.  Here we developed a consensus that we should move 
towards a new dictionary language, DDL3, with corresponding CIF3 
dictionaries.  Programs designed to work with CIF3 dictionaries would be 
fully back-compatible and able to read any file written in either CIF1 
or CIF2.  A prototype has already been tested and an early approval of 
DDL3 will allow the conversion of the existing CIF1 and CIF2 
dictionaries to CIF3.  The opportunity is being taken to incorporate 
advanced features that were unimagined fifteen years ago.  One of these 
is the development of a hierarchy of crystallographic concepts that
would add flexibility and allow the dictionaries to evolve in parallel.  
Another innovation is the introduction of algorithms that instruct a 
program how the value of an item can be calculated on the fly from other 
items present in a CIF.  These algorithms are computer-readable
definitions that will enhance the ability of CIF dictionaries to serve 
as machine-readable repositories of crystallographic knowledge.
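
The idea can be sketched in Python, under stated assumptions.  The
table DERIVATIONS and the function get_item are hypothetical names
invented for this illustration; they stand in for the algorithms a
dictionary would carry and are not the DDL3 syntax itself.  The example
derives _cell_volume from the cell lengths and angles, using the
standard formula for the volume of a triclinic cell, whenever the value
is absent from the file.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a unit cell from lengths and angles (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )


# Hypothetical stand-in for dictionary-supplied algorithms: each derivable
# data name maps to the items it depends on and the method that uses them.
DERIVATIONS = {
    "_cell_volume": (
        (
            "_cell_length_a", "_cell_length_b", "_cell_length_c",
            "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
        ),
        cell_volume,
    ),
}


def get_item(data, name):
    """Return a stored value, or derive it on the fly from other items."""
    if name in data:
        return data[name]
    if name not in DERIVATIONS:
        raise KeyError(name)
    inputs, method = DERIVATIONS[name]
    # Dependencies are resolved recursively, so they may be derived too.
    return method(*(get_item(data, dep) for dep in inputs))


if __name__ == "__main__":
    cubic = {
        "_cell_length_a": 5.431, "_cell_length_b": 5.431,
        "_cell_length_c": 5.431, "_cell_angle_alpha": 90.0,
        "_cell_angle_beta": 90.0, "_cell_angle_gamma": 90.0,
    }
    print(get_item(cubic, "_cell_volume"))
```

The calling program needs no crystallographic knowledge of its own: it
asks for an item by name and the table determines whether the value is
read or calculated.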

While these activities help to keep CIF at the forefront of information
technology, COMCIFS is also concerned not to abandon those who find
themselves still challenged by the demands of checkCIF.  From the
beginning we knew that we would need a suite of tools to assist in 
preparing CIFs.  The last couple of years have seen the appearance of a
number of such programs, e.g., enCIFer, publCIF and CIFedit, that use 
the appropriate CIF dictionaries to assist users in writing fully 
conformant CIFs.  PublCIF has been developed by the IUCr editorial 
office and is well-tuned to the publication requirements for small-cell 
structures. It will continue to be developed to handle macromolecular 
structure reports that are accompanied by structural data in mmCIF 
format, as the editorial production processes develop to handle such 
articles efficiently. Other tools are under development in an 
IUCr-sponsored project to upgrade some older CIF software to strict 
compliance with the latest CIF specifications. This project includes 
updates to vcif, a simple syntax checker, and to CIFtbx, a Fortran 
library; and the provision of a utility to manage the relaxation of the 
line and data name length restrictions in CIF version 1.1. As the 
existing dictionaries are converted to DDL3 we will encourage the 
preparation of CIF3-level programs that will be able to read any CIF 
whether written as CIF1, CIF2 or CIF3.  We expect, however, that the 
existing dictionaries will continue in use until the advantages of CIF3 
become sufficiently apparent that users voluntarily convert.

Among the routine business transacted during the course of the year was
the preparation of new terms of reference expanding the mandate of 
COMCIFS to ensure that crystallographic information in digital form is 
compatible with standards in neighbouring fields.  These terms were 
subsequently approved by the Executive Committee.  COMCIFS also formally 
adopted responsibility for the maintenance of the DDL1 dictionary which 
had no organization designated to authorize and approve necessary 
changes.  Finally, full documentation of CIF concepts and
associated data dictionaries has been completed as Volume G of the IUCr
International Tables series.

I. David Brown
Chair
