Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains

Peter Murray-Rust pm286 at
Tue Mar 1 09:02:35 GMT 2011

On Tue, Mar 1, 2011 at 4:12 AM, James Hester <jamesrhester at> wrote:

> Dear COMCIFS members:
> The DDLm group is currently engaging in developing an elide mechanism
> for the CIF2 standard.  Our deliberations have reached something of an
> impasse due to disagreement around the use of triple quotes as a
> string delimiter.  Python is a popular programming language that also
> uses triple quotes to delimit strings. One side of the discussion
> considers that use of triple quotes as a string delimiter means that
> all escape sequences recognised by Python should also be recognised by
> CIF, in order to avoid confusion and improve consistency with
> mainstream (i.e. Python) practice.  The other side of the discussion
> sees little benefit to CIF from including the additional ten or so
> escape sequences and advocates leaving them out of the CIF2 standard,
> instead adopting the minimal number of escape sequences to allow
> eliding.

I have been through a lot of similar stuff in creating CML and want to
emphasize that this is a difficult problem and not one that can be tackled
in small responses to problems as they arise. I see the following aspects:
* quoting/escaping. All languages suffer from this and there is no escape
from infinite regress. (I have Java code where I had to escape twice and
ended up with four concatenated backslashes. XML uses a special construct
(CDATA) to escape XML within XML.) The triple quotes have the same potential
for recursion.
* The inclusion of Python starts to turn CIF from a data markup language to
a declarative/functional language (the ultimate example is LISP and modern
derivatives). By including executable Python you commit either to a wide
range of constructs or you have to draw an arbitrary borderline where the
power of the language is reduced.
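
To make the escaping regress in the first point concrete, here is a minimal
Python sketch. It is only an illustration, not part of any CIF2 proposal: the
helper name escape_layer and the backslash-before-quote convention are
assumptions chosen for the example.

```python
def escape_layer(text: str) -> str:
    """Apply one layer of quoting: double each backslash, then escape quotes."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


# A single literal backslash, passed through two layers of quoting.
raw = "\\"
once = escape_layer(raw)    # two backslashes
twice = escape_layer(once)  # four backslashes

# Triple quotes have the same problem: the delimiter itself cannot
# appear unescaped inside the string it delimits.
payload = 'text with """ inside'
wrapped = '"""' + payload.replace('"""', '\\"\\"\\"') + '"""'

print(len(raw), len(once), len(twice))
print(wrapped)
```

Each further layer of embedding doubles the backslashes again, which is why
a minimal escape set still needs a precise rule for escaping the escape
character itself.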

In creating a language there is usually a spectrum between:
* easy to implement - often verbose, but precise. Authors usually hate it
* easy to write - minimal markup but difficult to process. This relies on
having a large and flexible toolchain. It is possible to write specs that
are unimplementable. XML had this problem and reduced the power of the
language to solve the problem.

By creating its own language CIF has implicitly had to create a considerable
toolchain of parsing, validation and semantics. If the community can support
the increased demands for tools, fine. Otherwise I suspect we should work
with pragmatic approaches, some of which are theoretically broken and some
of which may require specific knowledge for authors.


Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge