Proposal to regulate markup in CIF files

James Hester jamesrhester at gmail.com
Wed Sep 13 05:17:09 BST 2017


Dear COMCIFS

Please see below a draft proposal for dealing with markup in CIF files. Let
me know if you agree with the general approach, or suggest better
alternatives.  Once our general direction is agreed, our COMCIFS
subcommittees will go into a huddle and sort out the definitions.

James.

===============


Proposal for regulating markup of CIF text items
================================================

Summary
=======

Data names whose values can be marked up are given a new type.  In
consultation with heavy users of markup, a limited set of markup
systems (ideally just one) is documented in the CIF core dictionary.

Introduction
============

From the very first publication describing CIF, markup conventions
have been provided in order to extend the range of characters and font
effects representable in ASCII.  Which data values these conventions
might apply to, and whether or not this is more properly a CIF syntax
or dictionary (semantic) issue, has been left implicit.

Marked-up text according to the ad-hoc definitions described in Vol G
appears both in CIF data files and in dictionary definitions. While
COMCIFS has control over the conventions applying within dictionaries,
it has far less control over data values in data files, which are
produced both by dedicated software, such as publCIF, and by hand-editing
or local ad-hoc solutions.  Marked-up text in data files plays an
important role in the publication workflow.

Vol G (First Edition) notes in section 2.2.5.3: "It is hoped that in
future different types of such markup may be permitted so long as the
data values affected can be tagged with an indication of their content
type that allows the appropriate content handlers to be invoked". This
document attempts to clean up this outstanding piece of CIF
housekeeping, in part by rejecting the premise that multiple markup
approaches are desirable.

Syntax or semantics
===================

The markup conventions have traditionally been presented together with
the CIF syntax standard, as a "common semantic feature". However, upon
consideration of CIF-JSON and CIF embedded in HDF5, it is clear that
the syntax itself does not determine how a particular text string
should be marked up: a text string transferred to either of those
formats remains identical, with identical meaning as defined in the
appropriate dictionary; there is no reason that markup should be any
different.

Therefore, markup conventions should be described either in the
relevant DDLm dictionary or in the DDLm attribute definitions. Use of
the DDLm attribute dictionary to define markup conventions is too
rigid, as DDLm could, in principle, be used in a context completely
divorced from crystallography that may well have its own preferred
markup.  This leaves the dictionary as the appropriate place to define
the markup convention.

I propose we proceed as follows:

(1) We add a new enumerated value to `_type.contents`
    e.g. 'Marked-up'. Data names with this type may be marked up.
(2) We add a new data name to the core dictionary
    e.g. `_publ.markup_convention`. The markup convention indicated by
    this data name applies to all data values of type 'Marked-up'.
(3) We provide the `_publ.markup_convention` data name with one or
    more enumerated values each corresponding to a particular markup
    convention that may be in operation in a data block.
(4) We aim to restrict markup options to a single (default) value
    corresponding to the current markup system. Any expansion or
    alteration should be carried out in consultation with heavy users
    of the markup.
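
As a rough sketch of how software might act on steps (1) to (3), the
Python below looks up whether a data name is of the marked-up type and,
if so, which convention applies in the data block. The names
'Marked-up' and `_publ.markup_convention` are the examples suggested
above; the default value 'CIF', the data name `_publ_section.abstract`
and the plain dicts standing in for a dictionary and a data block are
assumptions made purely for illustration.

```python
# Hypothetical names: 'Marked-up' and _publ.markup_convention are only the
# examples floated in this proposal; 'CIF' as the default value is assumed.
MARKED_UP = "Marked-up"
CONVENTION_TAG = "_publ.markup_convention"
DEFAULT_CONVENTION = "CIF"


def markup_convention_for(data_name, block, type_contents):
    """Return the convention governing data_name, or None for plain text."""
    if type_contents.get(data_name) != MARKED_UP:
        return None  # not a marked-up type: treat the value literally
    # One block-level data name covers every 'Marked-up' value in the block.
    return block.get(CONVENTION_TAG, DEFAULT_CONVENTION)


# Toy stand-ins for the _type.contents entries of a DDLm dictionary
# and for the contents of a data block.
type_contents = {
    "_publ_section.abstract": MARKED_UP,  # illustrative data name
    "_cell.length_a": "Real",
}
block = {
    "_publ_section.abstract": "The \\a phase ...",
    "_cell.length_a": "5.43",
}

print(markup_convention_for("_publ_section.abstract", block, type_contents))
print(markup_convention_for("_cell.length_a", block, type_contents))
```

Because the convention is set once per data block rather than per data
value, a reader only ever needs a single content handler for a block.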

Comments
========

1. It is generally undesirable to allow multiple markup
   systems. Publication workflows are generally complex, and having to
   support multiple source text forms would place an undue and
   unnecessary burden on heavy users of CIF text markup like the IUCr
   journals.  The presence of a dedicated data name does, however,
   allow proponents of alternative markup conventions to make their
   case.

2. Markup is purely for human consumption. In particular, enumerated
   values must be excluded, and no machine-actionable content should
   be encoded in marked-up text.

3. With the agreement of heavy consumers of CIF marked-up text, we may
   explore expanding the scope of the current convention to include
   elements such as footnotes.

4. Since 1993, when the lightweight CIF markup was proposed, a number
   of lightweight ASCII text markup systems have been developed. These
   include reStructuredText, Markdown and AsciiDoc. While it might be
   worthwhile transitioning to one of these in order to use the more
   comprehensive formatting that would be available, the backslash
   notation currently used is likely to conflict with the use of the
   backslash as an escape character in those systems.
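
To make the backslash point in comment 4 concrete, here is a minimal
Python sketch that reads the same string two ways. `read_as_cif`
handles only three of the CIF accent codes (acute, grave, umlaut), and
`read_as_commonmark_escapes` models only the CommonMark rule that a
backslash before ASCII punctuation yields that punctuation character
literally. Both function names are invented for this illustration and
neither is a full implementation of its convention.

```python
import string
import unicodedata

# Small subset of the CIF accent codes: backslash, accent mark, then letter.
# Values are the corresponding Unicode combining characters.
CIF_ACCENTS = {"'": "\u0301", "`": "\u0300", '"': "\u0308"}


def read_as_cif(text):
    """Interpret \\'e style accent codes following the CIF convention."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 2 < len(text) and text[i + 1] in CIF_ACCENTS:
            # Letter followed by combining accent; composed by NFC below.
            out.append(text[i + 2] + CIF_ACCENTS[text[i + 1]])
            i += 3
        else:
            out.append(text[i])
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


def read_as_commonmark_escapes(text):
    """Apply only the CommonMark backslash-escape rule for ASCII punctuation."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in string.punctuation:
            out.append(text[i + 1])  # the escaped character becomes literal
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


value = "Andr\\'e"  # the seven characters  Andr\'e  as stored in a CIF
print(read_as_cif(value))                 # accent applied to the 'e'
print(read_as_commonmark_escapes(value))  # apostrophe kept, accent lost
```

Under the first reading the value is a name with an accented final
letter; under the second it contains a literal apostrophe, so existing
CIF text could not be passed unchanged to such a system.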


-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148