Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains.

Brian McMahon bm at iucr.org
Sat Mar 5 12:53:00 GMT 2011


I should like to make a number of comments to explain my perspective
on the choices we face in continuing to develop CIF in the direction
of greater functionality. (Think of me in this context as an Editor of
International Tables G, with a specific desire to document a coherent
and consistent standard sponsored by the IUCr.)

(1) "Stakeholder buy-in" is not something we'll be able to guarantee,
whatever approach we take. Herbert makes the well-established point
that demonstrably "better" solutions are not always those that
flourish. Peter's experiences with CML are also illuminating. On the
face of it, one would expect CML to attract a huge number of
stakeholders based on the potential returns from an information
exchange standard in chemistry, and the adoption of a standard data
format (XML) that is common currency. And yet, the uptake of CML still
seems to me modest - I would go so far as to say disappointing.

(2) There is a real danger that a poor choice for a novel syntactic
feature could lead to some confusion. But (a) there are relatively few
available constructs with the necessary aesthetic appeal (if we want
to keep files reasonably human readable), and any one choice may
conflict in some way with some other existing convention; and (b)
people can learn to cope with potential ambiguities when they have
to. Who are such "people" and when will they "have to"? That will
depend in part on how the results are used in the real world. If all
CIF input/output is through computerised pipelines, the onus is on the
developers writing the standard parsers. If it's envisaged that
general cut and paste will frequently be involved, then the incidence
of mistakes is likely to be higher. Assessing the real-world impact of
a syntactic change will therefore call for considerable judgement.

(3) That's related to Peter's notion of toolchains, which is well
made. The CIF data format is idiosyncratic (a turn-off for many), but
well specified and relatively lightweight. On the other hand, it is
supported by a rather small set of tools. Building new bespoke tools
will be burdensome, because it must be undertaken by a small
community. In my view, extending that toolchain within a small group,
if that is what needs to be done, is helped by applying Occam's razor
("entities should not be unnecessarily multiplied") to new syntactic
elements - they should contain the smallest amount of complexity needed
to achieve the purpose for which they have been introduced.

(4) Then the question arises of whether - and what - we gain by
introducing syntactic features, not strictly necessary but familiar to
a larger constituency, in the hope of encouraging greater direct
involvement in extending the toolchain. Frankly, I am sceptical that
by itself this would attract many new active programmers, but I'd be
interested in hearing counter-views, especially from this wider circle
of COMCIFS advisers with all your experience of your respective
communities.

I think it *might* be more appealing to groups that decline to engage
directly with CIF as it now stands if there were widespread traffic of
CIFs expressed in a less idiosyncratic format (e.g. XML). But only
"might" - there's nothing actually preventing people from developing
XML files based on CIF tokens and dictionary attributes within the
"Crystallographic Information Framework". Indeed, there are already
instances of this: PDBML, the crystal structure content within CML
files, the symmetry database of the Bilbao group. But none have really
taken off, and I'm not aware of any community demand for delivery of
CIF data in XML format, certainly among the people who interact with
the IUCr journals. My conclusion is that it is not primarily the
format that is inhibiting greater stakeholder buy-in, certainly to CIF
as it now exists.

There might be greater interest from people external to this group if
and when we have working evaluation of dictionary-based methods
through dREL, but on past experience I wouldn't automatically expect
that to happen. However, we do have within this CIF developer
community significant experience in implementing dictionary-based
methods. Doing this properly over the complex objects described by
related dictionary categories is not at all an easy task. This is why
we have needed to develop dREL, not as yet another general-purpose
language, but as one specifically tuned to the data structures
expressible in CIF. If the consequence of this is the need to continue
extending our own toolchain, that argues against putting additional
obstacles in the path of doing so.


Regards
Brian
_________________________________________________________________________
Brian McMahon                                       tel: +44 1244 342878
Research and Development Officer                    fax: +44 1244 314888
International Union of Crystallography            e-mail:  bm at iucr.org
5 Abbey Square, Chester CH1 2HU, England
