Accent escape sequences

Fri Mar 2 10:11:47 GMT 2007

Dear Joe

We have recently exchanged a few messages off-list, and it is
clear that you have an interest in, and perhaps some time for,
working on CIF-based applications. It would be great if you would
introduce yourself to the list with a brief indication of your
current interests.

Regarding the untidy typographic markup conventions in CIF text
fields, what we currently have arises from the pragmatic
requirements of our early 1991 (prehistoric!) CIF-handling
procedures in Acta Cryst. We used TeX as a formatter, so
the markup (initially) was somewhat TeX-like; but there was
pressure on us not to rely on TeX, especially as many of our
authors would have no experience of it. Thus a minimal set
of markup was devised, requiring very little learning from
authors, that covered most markup that in practice we came
across in Acta C papers (which have rather little
mathematical content). Very few additional codes were
introduced; and, for example, the relatively recent <i> and
<b> markup for italic and bold was chosen because
non-specialist authors were beginning to become familiar
with such codes in HTML markup.

The current arrangement is, in my opinion, very inelegant,
but it is supported by publCIF, the IUCr's own CIF editor,
and is workable within that tool's reasonably user-friendly
interface.

To provide better formatting abilities, I think it would be
preferable to allow text fields to contain markup in various
different standard formats, suitably identified, and to
pass the fields to appropriate handlers. The simplest way to
do so would be to have a 'magic number' introducing each text
field. There's an undocumented example of this inasmuch as
ciftex, the old cif->TeX translator, passes through unchanged
any text field beginning
;%T   (i.e. it treats it as containing pure TeX markup).
The 'magic number' might be a simple character sequence
(%T for TeX, %L for LaTeX, %H for HTML, %R for RTF, %U for Unicode, ...)
or could be a more general, but more verbose, signature
involving MIME headers:
;
Content-Type: application/tex
(this mimics the approach for embedding binary data in imgCIF files).
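
As a rough illustration of the 'magic number' idea, here is a minimal
Python sketch. It assumes the two-character codes suggested above, and
that the code sits at the very start of the text field with the body
following on the next line. The names MAGIC_CODES, classify_field and
render are hypothetical; they are not taken from ciftex or publCIF.

```python
# Hypothetical mapping from the two-character codes suggested above
# to format names; none of this is part of any CIF specification.
MAGIC_CODES = {
    "%T": "tex",
    "%L": "latex",
    "%H": "html",
    "%R": "rtf",
    "%U": "unicode",
}


def classify_field(text):
    """Return (format_name, body) for the contents of a CIF text field.

    `text` is assumed to be the field contents with the leading
    semicolon delimiter already removed. Untagged fields are labelled
    "cif-markup" and returned unchanged.
    """
    code = text[:2]
    if code in MAGIC_CODES:
        # Drop the code, trailing blanks on that line, and one newline.
        body = text[2:].lstrip(" \t")
        if body.startswith("\n"):
            body = body[1:]
        return MAGIC_CODES[code], body
    return "cif-markup", text


def render(text, handlers):
    """Pass the field body to the handler registered for its format.

    Fields with no registered handler are passed through unchanged.
    """
    fmt, body = classify_field(text)
    handler = handlers.get(fmt, lambda s: s)
    return handler(body)
```

A caller would register one handler per format, for example
render(field, {"tex": tex_to_html}), where tex_to_html stands for
whatever converter is available. A MIME-style header could be handled
by the same dispatch, with a header parser in place of the
two-character lookup.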

There's nothing fundamentally wrong with extending the existing
special character sequences, and I'm happy to consider a
specific proposal in terms of whether we could easily provide
publCIF support for it. The problem is that the more one offers
to the author, the more the author will want to do, and the more
unwieldy an ad-hoc markup will become. (And recall that even
TeX, which is unparalleled for mathematics, does not offer as
primitives anywhere near all the symbols that our authors do
use.)

I should be interested in hearing other COMCIFS members' thoughts
on this.

Brian

On Thu, Mar 01, 2007 at 02:28:41PM -0500, Joe Krahn wrote:
> It seems that there is no way to escape a single quote followed by a
> space. I was looking at the accent escape sequences and realized that it
> would be useful if these trigraphs were allowed to use a space as the
> 'letter' being modified. For example:
> 
>   "\' " becomes "'"
>   "\~ " becomes "~"
>   "\^ " becomes "^"
>   "\% " becomes the degree symbol
> 
> Currently, there is no caret escape to avoid superscripts, and the
> current tilde escape is only listed as "accepted by convention".
> 
> If you generalize the sequence <backslash><non-alphabetic><character> to
> function like an old double-strike sequence, you can get other useful
> combinations as well, for example "\/=" becomes not-equals.
> 
> I suspect that these trigraphs have not become better defined because
> most people would rather just switch to some other modern encoding. But,
> as an archival format, we are somewhat stuck with the current scheme,
> and it probably makes sense to keep things in plain ASCII, and
> human-readable. Also, I found another set of similar trigraph
> definitions that are much more extensive at the bottom of the following
> page:
> 
> http://abc.sourceforge.net/standard/abc2-draft.html
> 
> It is probably good to define a complete list of allowed trigraphs and
> other codes, and do away with "accepted by convention" as a separate
> list. I also think that it is worth extending the trigraphs to a more
> complete set.
> 
> I am willing to try to make such a list if it is deemed useful, but
> there are some things I already don't understand from the current set:
> 
> What is the purpose of \\langle and \\rangle; are these different from
> "<" and ">"?
> 
> Why not use a more symbolic form for some items, like "\<-" instead of
> "\\leftarrow"?
> 
> Why do double and triple bond codes have names, and single bond is just
> "---"?
> 
> Joe Krahn
> _______________________________________________
> comcifs mailing list
> comcifs at iucr.org
> http://scripts.iucr.org/mailman/listinfo/comcifs
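
For anyone wishing to experiment with the space-modified trigraphs
proposed in the quoted message, here is a minimal Python sketch. It
covers only the five sequences given there as examples; the table is a
reading of that proposal, not part of any current CIF convention, and
the names PROPOSED_TRIGRAPHS and expand_trigraphs are hypothetical.

```python
import re

# Only the sequences listed in the quoted proposal; these are NOT
# part of the existing CIF special-character codes.
PROPOSED_TRIGRAPHS = {
    "\\' ": "'",       # backslash, quote, space  -> literal quote
    "\\~ ": "~",       # backslash, tilde, space  -> literal tilde
    "\\^ ": "^",       # backslash, caret, space  -> literal caret
    "\\% ": "\u00b0",  # backslash, percent, space -> degree symbol
    "\\/=": "\u2260",  # backslash, slash, equals -> not-equals
}

# One alternation over all the sequences, each escaped literally.
_TRIGRAPH_RE = re.compile(
    "|".join(re.escape(seq) for seq in PROPOSED_TRIGRAPHS)
)


def expand_trigraphs(text):
    """Replace each proposed three-character sequence by its symbol."""
    return _TRIGRAPH_RE.sub(
        lambda match: PROPOSED_TRIGRAPHS[match.group(0)], text
    )
```

Note that the space is consumed as the 'letter' being modified, so
expand_trigraphs("x\\^ 2") yields "x^2" with no space left behind.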
