[Cif2-encoding] How we wrap this up

Bollinger, John C John.Bollinger at STJUDE.ORG
Wed Sep 29 15:26:10 BST 2010


Simon,

On Wednesday, September 29, 2010 2:17 AM, SIMON WESTRIP wrote:

>John, I do not think a specification that suggests that a CIF can be invalidated simply by being moved to
>another environment is helpful to anyone.

In that case, you must be operating under a different definition of "text" than Herb provided yesterday:

On Tuesday, September 28, 2010 2:41 PM, Herbert J. Bernstein wrote:
>However, the real answer (not a joke) is that a text encoding is whatever
>the formatted I/O system in a fortran compiler on the system under
>discussion reads and writes or the format of a COBOL EBCDIC-sequential
>file or a COBOL ASCII line-sequential file, or what a text editor on the
>system handles.  That is the point -- text is something very, very system
>and language dependent. [...]

The potential for confusion over the meaning of "text" was by far my greatest cause for concern about the "As for CIF1..." alternatives, so I am very grateful to Herb for providing a definition.  I am furthermore very pleased that his definition matches so well the one that I have advanced under the label "local", which I think is also the best interpretation of the requirements of CIF1.  Even disregarding the definition of "text", however, CIF1 clearly holds that a CIF can indeed be invalidated simply by being moved to another environment.  In particular, CIF1 expressly specifies that CIF processors are not required to understand non-native line termination sequences.  I have used CIF1 processors on several platforms that do not do so.  As has been observed several times, CIF1 has nevertheless served well for years.  We would not be having this discussion now if CIF1 had not been helpful to many people.
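
As an illustrative sketch only (not taken from the CIF1 specification; the function name and the sample data are hypothetical), the following Python shows how a reader that recognizes only its native line terminator sees a file written with a different one:

```python
def split_lines_native(data: bytes, native_eol: bytes) -> list:
    """Split on a single terminator only, as a strict reader might (an assumption)."""
    lines = data.split(native_eol)
    # Drop the empty piece left after a trailing terminator.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines

# Hypothetical two-line file written with LF terminators.
lf_file = b"data_example\n_cell_length_a 5.43\n"
# The same content written with CR-only terminators.
cr_file = lf_file.replace(b"\n", b"\r")

lf_lines = split_lines_native(lf_file, b"\n")  # two separate lines
cr_lines = split_lines_native(cr_file, b"\n")  # one run-together line
print(len(lf_lines), len(cr_lines))
```

Under these assumptions the CR-terminated copy arrives as a single line, so a line-oriented parser would no longer see the same file.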

I submit that among the options on the table, only (3) and (4) do not leave CIF2 CIFs susceptible to invalidation upon being moved to a different environment.  These are not my overall preference, but I favor them over "text"-only because they permit use of UTF-8.  Under the above definition of "text" and the "As for CIF1..." proposals, any recommendation that the spec might make to use UTF-8 and/or UTF-16 would be futile.  Depending on the environment, either UTF-8 (or UTF-16) would be required for conformance with the local definition of "text", or it would be forbidden as non-conforming (I disregard the case of ASCII-only CIFs, for which the encoding could be construed as any ASCII-compatible encoding, including UTF-8).  In most current environments, UTF-8 would be forbidden.
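
A minimal Python sketch of that point, assuming a handful of stand-in "local" encodings (the helper name and the sample strings are hypothetical, and the encodings are chosen only for illustration): the same UTF-8 bytes are correct "text" in one environment and unreadable or silently wrong in others, while ASCII-only content survives any ASCII-compatible convention.

```python
def reads_back(data: bytes, local_encoding: str, expected: str) -> bool:
    """Return True if the bytes decode to the expected text under a local encoding."""
    try:
        return data.decode(local_encoding) == expected
    except UnicodeDecodeError:
        return False

# Hypothetical content containing one non-ASCII character.
non_ascii = "_units Å"
utf8_bytes = non_ascii.encode("utf-8")

# Stand-ins for different environments' notions of "text".
results = {
    enc: reads_back(utf8_bytes, enc, non_ascii)
    for enc in ("utf-8", "ascii", "latin-1", "cp037")  # cp037 is an EBCDIC code page
}
print(results)

# ASCII-only content is the case set aside above.
ascii_only = "data_example"
ascii_ok = all(
    reads_back(ascii_only.encode("utf-8"), enc, ascii_only)
    for enc in ("utf-8", "ascii", "latin-1")
)
print(ascii_ok)
```

Only the UTF-8 environment recovers the non-ASCII text here; the others either reject the bytes or decode them to different characters.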

As much as I join Herb in favoring support for "text" CIFs as he defines them, I remain convinced that UTF-8 must be a conformant option for CIF2 to move ahead.  I think UTF-8 would be sufficient to cover most (but not all) of the cases for which "text" ensures support, thus my preference for options (3) and (4) over options (1) and (2).  This is, again, the genesis of option (5), which I think now could be relabeled "text + UTF-8 (+- UTF-16)".


Regards,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital
