[Cif2-encoding] Splitting of imgCIF and other sub-topics. .. .. .

Bollinger, John C John.Bollinger at STJUDE.ORG
Tue Sep 14 17:10:09 BST 2010


On Tuesday, September 14, 2010 9:47 AM, Herbert J. Bernstein wrote:
>   To avoid any misunderstandings, rather than worrying about how we got to where we are, let us each just state a clear position.

Very well:

I favor restricting the scope of the CIF2 specification to the file format, excluding any explicit requirements of programs, users, or other entities.  *Non-normative* commentary on the meaning, impact, and use of the normative format definition is welcome, however.  In that light,

I favor CIF2 defining binary "CIFs" formed by encoding the underlying Unicode text according to local text conventions, as in CIF1, and those formed by encoding the underlying Unicode text according to UTF-8.  CIFs of the former type are "text files" in their context; those of the latter type might also be text files under some circumstances.  If the Unicode text consists exclusively of ASCII characters then these two options are indistinguishable in many contexts.
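
As a minimal illustration of the ASCII point, the following Python sketch compares the bytes produced by UTF-8 with those produced by a few encodings standing in for "local text conventions".  The sample strings and the choice of local encodings are assumptions made for illustration only; they do not come from the CIF2 draft.

```python
# Hypothetical sample texts; not taken from any real CIF.
ascii_text = "data_example\n_cell.length_a 5.43\n"
non_ascii_text = "data_example\n_chemical.name '\u03b2-alanine'\n"  # contains Greek beta

# Stand-ins for "local text conventions" (an assumption for illustration).
local_encodings = ["latin_1", "cp1252", "iso8859_7"]


def same_as_utf8(text, encoding):
    """Return True if `text` encodes to the same bytes as it does under UTF-8."""
    try:
        return text.encode(encoding) == text.encode("utf-8")
    except UnicodeEncodeError:
        # The local encoding cannot represent the text at all.
        return False


# Compare each local encoding against UTF-8 for both sample texts.
ascii_results = {enc: same_as_utf8(ascii_text, enc) for enc in local_encodings}
non_ascii_results = {enc: same_as_utf8(non_ascii_text, enc) for enc in local_encodings}

print(ascii_results)
print(non_ascii_results)
```

For these ASCII-compatible encodings the ASCII-only text yields identical bytes, while the text containing a non-ASCII character does not.  A local convention that is not ASCII-compatible would differ even for ASCII-only text, which is why the statement above is limited to "many contexts".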

I am open to CIF2 additionally defining binary CIFs formed by encoding the underlying Unicode text according to specific alternative schemes.  In particular, I would agree to UTF-16.  My support for other specific alternatives would be granted or withheld on a case-by-case basis.
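
To make the UTF-16 alternative concrete, here is a small Python sketch showing that the same Unicode text serializes to different bytes under UTF-8 and UTF-16, even when it is pure ASCII.  The sample text and the helper name `starts_with_utf16_bom` are hypothetical, and checking for a byte-order mark is only one possible way to recognize UTF-16; nothing here is a rule from the CIF2 draft.

```python
import codecs

# Hypothetical sample text; not taken from any real CIF.
text = "data_example\n"

utf8_bytes = text.encode("utf-8")
# Python's "utf-16" codec prepends a byte-order mark in native byte order.
utf16_bytes = text.encode("utf-16")


def starts_with_utf16_bom(data):
    """Return True if `data` begins with a UTF-16 byte-order mark (either order)."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


print(utf8_bytes != utf16_bytes)           # the two serializations differ
print(starts_with_utf16_bom(utf16_bytes))  # BOM is present in the UTF-16 form
print(utf16_bytes.decode("utf-16") == text)  # same underlying Unicode text
```

Both byte sequences decode to the same Unicode text, which is the sense in which each would be a serialization of one underlying CIF.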

I disfavor CIF2 defining binary CIFs formed in other ways, or leaving the definition of a "CIF" open-ended, but I favor express recognition of the possibility of alternative serializations of CIF-conformant Unicode text.  In that vein, I favor creating a supplementary specification for CIF storage and exchange that addresses the multitude of possible encodings that CIF2 support for local defaults would permit in various environments.

My use of the term "Unicode text" is meant to emphasize that the vast majority of the CIF2 spec is independent of any encoding.  I think the latest (May) draft of the spec for the most part uses similar terminology, and I favor that form of description over one based on UTF-8 or some other specific encoding as a placeholder or reference.

It is my expectation that a result of the above provisions would be establishment of UTF-8 as a de facto default encoding for CIF2 CIFs.


Best,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital





