[Cif2-encoding] A new(?) compromise position

Brian McMahon bm at iucr.org
Wed Sep 29 16:56:03 BST 2010


On Thu, Sep 30, 2010 at 12:24:45AM +1000, James Hester wrote:
> Here is a newish compromise:
> 
> Encoding: The encoding of CIF2 text streams containing only code points in
> the ASCII range is not specified. CIF2 text streams containing any code
> points outside the ASCII range must be encoded such that the encoding can be
> reliably identified from the file contents.  At present only UTF-8 and UTF-16
> are considered to satisfy this constraint.

I concur with the principle of this statement (Herbert and John have
already demonstrated that some additional drafting effort may be needed
to eliminate or reduce ambiguity). It leaves the door open for
additional encodings that are deemed to satisfy the constraint of
providing self-identification, perhaps through key signatures or
hashcodes; but there is then an onus on a community wishing to adopt
such an encoding to develop and publish the criteria that make it
self-identifying.
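
To illustrate what "reliably identified from the file contents" might
look like in practice, here is a minimal Python sketch. It is not part
of any CIF2 draft: the function name identify_cif2_encoding is
hypothetical, and the criteria it applies are assumptions, namely that
UTF-16 is recognised by its byte-order mark and UTF-8 by strict decoding
of the whole stream.

```python
import codecs


def identify_cif2_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'utf-16'; raise ValueError otherwise.

    Hypothetical helper: the identification criteria are assumptions,
    not wording taken from the proposed specification.
    """
    # Assumption: UTF-16 is only accepted when a byte-order mark is present.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        data.decode("utf-16")  # raises UnicodeDecodeError if malformed
        return "utf-16"

    # Streams containing only ASCII code points: encoding is not specified.
    if all(byte < 0x80 for byte in data):
        return "ascii"

    # Assumption: anything else must decode strictly as UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("encoding cannot be identified from contents") from None
    return "utf-8"
```

Under these assumptions a Latin-1 stream such as b"caf\xe9" is rejected,
because its bytes neither carry a byte-order mark nor form valid UTF-8.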

(I'm not *encouraging* this to happen; the irony of our lengthy debate
is that we all seem to be trying to achieve the promulgation of a
canonical encoding.)

Best wishes
Brian
_________________________________________________________________________
Brian McMahon                                       tel: +44 1244 342878
Research and Development Officer                    fax: +44 1244 314888
International Union of Crystallography            e-mail:  bm at iucr.org
5 Abbey Square, Chester CH1 2HU, England

