[Cif2-encoding] How we wrap this up

SIMON WESTRIP simonwestrip at btinternet.com
Thu Sep 23 11:45:37 BST 2010


OK, final comments before this is wrapped up (hopefully):

1. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII' recently
   posted here and to COMCIFS.
2. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII', together
   with Brian's *recommendations*
3. UTF8-only as in the original draft
4. UTF8 + UTF16
5. UTF8, UTF16 + "local"

These can be broken down to:

'any encoding' (1, 2, and 5)

'specified encoding' (3 and 4)

Note that I put 5 in the 'any encoding' category, as I think 'local' could
be interpreted as any encoding.

The 'any encoding' approach is, to me, unsatisfactory given that CIF is
a data-exchange format and should be specified in terms that allow the
consumer to know exactly what to expect (i.e. no uncertainty in encoding).

'Specified encodings' can be seen as restrictive, especially if there is
only one. A list of specified encodings could be seen as inflexible and
perhaps arbitrary (e.g. why isn't UTF32 on the list...). If encoding is to
be specified, it could be in terms of UTF8 + any Unicode encoding that is
inherently identifiable (which in reality boils down to the UTF family).
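
To illustrate what 'inherently identifiable' might mean in practice, here is a
minimal Python sketch. It is not part of any of the proposals, and the function
name identify_utf is hypothetical. It assumes that a UTF16 or UTF32 file
announces itself with a byte-order mark, and that anything else is accepted
only if it decodes cleanly as UTF8.

```python
import codecs
from typing import Optional

# Byte-order marks for the UTF family. UTF-32 is checked before UTF-16
# because the UTF-32-LE mark begins with the same bytes as the UTF-16-LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def identify_utf(data: bytes) -> Optional[str]:
    """Return a Python codec name for the data, or None if not identifiable.

    Hypothetical helper for illustration only; it is an assumption of this
    sketch, not something defined by CIF1 or any CIF2 draft.
    """
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    # No byte-order mark: accept the data only if it is valid UTF-8
    # (pure ASCII is a subset of UTF-8, so legacy ASCII files pass).
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"
```

A file in an arbitrary 'local' 8-bit encoding usually fails the UTF8 check as
soon as it contains a non-ASCII character, which is the sense in which such
encodings are not inherently identifiable. The check is not watertight: some
non-UTF8 byte sequences happen to be valid UTF8.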

In either case, a degree of work will be required to accommodate user
practice and the legacy of CIF1. If the 'any encoding' approach is taken,
I believe there should be a wealth of supporting material for both users
and developers to encourage the use of a default encoding (i.e. UTF8).
Hence my recent support for something along the lines of (2) above. This
approach avoids mandating some of the less-satisfactory schemes we have
been discussing (e.g. declaration of encoding), but at least makes them
available to conscientious developers.

Equally, if CIF2 adopts 'specified encodings', there should be a wealth of
supporting material for both users and developers to enable transcoding.

The pedant in me would like to see 'specified encodings' (preferably UTF8
default + any inherently identifiable Unicode encoding), but if the 'any
encoding' approach is to be taken, I think it has to be described as
Herbert proposes, with any schemes for identifying the encoding left out
of the 'specification' (let the specification reflect the uncertainty that
is the encoding of a CIF :-)

Cheers

Simon






________________________________
From: James Hester <jamesrhester at gmail.com>
To: Group for discussing encoding and content validation schemes for CIF2 
<cif2-encoding at iucr.org>
Sent: Thursday, 23 September, 2010 1:37:48
Subject: [Cif2-encoding] How we wrap this up

Dear CIF2 encoding participants,

As Herbert has indicated, we are starting to run out of time for
resolution of the encoding issue.  I believe that we have now explored
the various proposals sufficiently to all have a good understanding of
the consequences and advantages of each approach.  So, after a round
of final comments, I propose that we vote on the general scheme that
we recommend.  We can then flesh out the details of the particular
scheme that we have settled on, and take this completed proposal to
the DDLm group for their approval, following which we will present the
entire CIF2 syntax document to COMCIFS for a formal vote.

The proposals that I believe are still on the table are:

1. Herbert's 'as for CIF1 proposal' recently posted here and to COMCIFS.
2. Herbert's 'as for CIF1 proposal', together with Brian's proposal
(if you agree that they are compatible)
3. UTF8-only as in the original draft
4. UTF8 + UTF16
5. UTF8, UTF16 + "local"

I have not included the hashcode proposal as I believe it no longer
has any supporters.

We would need to conduct a preferential vote.  I stress that this is
purely to determine the recommendation of this working group, and is
not in any way binding on COMCIFS.

James.
-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
cif2-encoding mailing list
cif2-encoding at iucr.org
http://scripts.iucr.org/mailman/listinfo/cif2-encoding