[Cif2-encoding] How we wrap this up

SIMON WESTRIP simonwestrip at btinternet.com
Sat Sep 25 17:09:12 BST 2010


Dear all

In the event that CIF2 adopts the 'any encoding' approach, would there be any
objections to explicitly defining a default encoding in the specification, to
be assumed when there are no indications to the contrary? At worst this would
give CIF2 service providers an excuse to interpret CIFs as e.g. UTF8 if they
couldn't determine the encoding by other means - but such intolerant service
providers would soon find that their service is not successful - while at best
this might raise awareness of the issues regarding encoding once non-ASCII is
used in a CIF. Essentially, it does not require users to change their working
practices, which is one of the main arguments for 'any encoding'.

So, CIF2 would remain 'any encoding', and specifications in terms of e.g. 
"Herbert's as for CIF1..."
might only require a single sentence to define the default after stating what 
the 'preferred' encoding was;
the proposal might be phrased as "Herbert's as for CIF1..." + "explicit default 
encoding"?
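
As a rough illustration of what "defaulting" could mean for a reader of CIFs,
here is a minimal Python sketch. The function name decode_cif, the order of
the checks and the choice of UTF-8 as the default are assumptions made for
this example only; none of them is taken from a CIF2 draft.

```python
import codecs
from typing import Optional, Tuple

# Assumed default for this sketch, following the "e.g. UTF8" wording above.
DEFAULT_ENCODING = "utf-8"

# Byte-order marks treated as an "indication to the contrary".
# UTF-32 is deliberately not considered in this sketch.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),   # codec strips the BOM
    (codecs.BOM_UTF16_LE, "utf-16"),  # codec reads the BOM to pick endianness
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def decode_cif(data: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (text, encoding_used) for raw CIF bytes."""
    # 1. An explicit indication (from the user, the transport, etc.) wins.
    if declared is not None:
        return data.decode(declared), declared
    # 2. A byte-order mark is the next indication.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data.decode(name), name
    # 3. No indication at all: fall back to the specified default.
    #    A UnicodeDecodeError here is the "intolerant" outcome; a more
    #    forgiving reader could catch it and ask the user instead.
    return data.decode(DEFAULT_ENCODING), DEFAULT_ENCODING
```

A pure-ASCII CIF decodes identically under the default, so existing files and
working practices are unaffected; the default only matters once non-ASCII
characters appear.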

I do not wish to prolong this debate - if there are objections I will not launch 
into an endless round of exchanges
that cover the same ground that has led us this far.

Cheers

Simon









________________________________
From: SIMON WESTRIP <simonwestrip at btinternet.com>
To: Group for discussing encoding and content validation schemes for CIF2 
<cif2-encoding at iucr.org>
Sent: Friday, 24 September, 2010 20:10:13
Subject: Re: [Cif2-encoding] How we wrap this up


Dear James

As you may have gathered I have been reconsidering my position on this issue.
Please forgive me, but I would like to change my vote if that is OK, in favour 
of the 'any encoding' camp.
This apparent U-turn is not a response to recent contributions; rather it is the 
outcome of a meeting I had this morning
where I demonstrated some new software to the Managing Editor of IUCr journals.

By way of explanation:

I have been developing a new docx template which the IUCr editorial office is 
shortly to release for use by
authors. The template will be packaged with some tools to extract data from CIFs
and tabulate them in the Word document, e.g. open an mmCIF, click a button, and 
standard
tables populated with data from the CIF will be included in the document, acting 
as
table templates for the author to edit as appropriate for their manuscript.

Inclusion of the mmCIF tools is part of an unofficial policy to 'coax'
biologists to start using/accepting mmCIF as a useful medium, rather than as a
product of their deposition to the PDB, and to encourage them to become
comfortable with passing mmCIFs between applications, and even to edit the
things (in the same way as the core-CIF community treats CIFs). For example,
our perception is that there is no reason why an author should not feel free
to take an mmCIF that has been created by e.g. pdb_extract and populate it
using third-party software before uploading to the PDB for deposition.

This cause would not be furthered by effectively invalidating an mmCIF if it 
were not to be encoded in one of
the specified encodings.

So although I am uneasy about a specification that propagates uncertainty, I'm 
also uneasy about alienating users,
especially when we are struggling to change their mindset as in the case of the 
biological community
(my perception of the biological community's attitude to mmCIF is based on 
feedback from authors/coeditors to
IUCr journals).

Granted this may not be the most compelling argument in favour of 'any
encoding', but recognizing the hurdles that may have to be overcome once we
move beyond ASCII whatever the CIF2 specification, I support 'any encoding'
as 'a means to an end'.

I will not provide my preferences in terms of the numbered options until you say 
so; after all, I have already voted and
all this has to be signed off by COMCIFS in any case.

Cheers

Simon







________________________________
From: "Bollinger, John C" <John.Bollinger at STJUDE.ORG>
To: Group for discussing encoding and content validation schemes for CIF2 
<cif2-encoding at iucr.org>
Sent: Friday, 24 September, 2010 14:50:57
Subject: Re: [Cif2-encoding] How we wrap this up

Dear Simon,

It is exactly this sort of issue that drove me to support more permissive 
encoding rules and ultimately to devise the UTF-8 + UTF-16 + local proposal.

Do please think about the considerations Herb raised.  As you reconsider your 
votes, I urge you also to ask yourself what, *precisely*, a "text file" is, and 
to consider whether your answer is functionally different from my "local".  If 
you decide not, then please consider what that answer implies about CIF2 support 
of UTF-8 and UTF-16  (which evidently you favor) under each option on the table, 
especially for CIFs containing non-ASCII characters.  Whatever you decide about 
the meaning of "text file", please consider whether reasonable people might 
reach a different conclusion, as I assert they might do, and to what extent the 
standard needs to address that.


Regards,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital


>From: cif2-encoding-bounces at iucr.org [mailto:cif2-encoding-bounces at iucr.org] On 
>Behalf Of SIMON WESTRIP
>Sent: Friday, September 24, 2010 7:53 AM
>To: Group for discussing encoding and content validation schemes for CIF2
>Subject: Re: [Cif2-encoding] How we  wrap this up. .
>
>Dear Herbert
>
>Not for the first time, I find your argument persuasive. Brian's vote and 
>explanation have also raised some
>questions that I would like to look into.
>
>I will confirm or otherwise my vote as soon as possible, assuming that is OK 
>with James and assuming that
>this round of votes might wrap this up.
>
>Cheers
>
>Simon
>
>________________________________________
>From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
>To: Group for discussing encoding and content validation schemes for CIF2 
><cif2-encoding at iucr.org>
>Sent: Friday, 24 September, 2010 13:17:14
>Subject: Re: [Cif2-encoding] How we wrap this  up
>
>If he ignores the standard, in most cases all he has to do to comply with CIF2 
>is to run whatever applications he currently runs to produce CIF1 and, perhaps, 
>in some cases, run a minor edit pass at the end, to convert for the minor 
>syntactic differences and/or changed tags required to comply with CIF2 and the 
>new dictionaries, but he is unlikely to have to do anything to deal with the 
>messy business of whether his encoding is really a proper UTF8 encoding or not.

>The punishment if he tries to comply, is that he has to totally uproot and 
>reconfigure the environment in which he produces CIFs from whatever he is 
>currently doing to create an environment in which he can reliably create and, 
>more importantly, transmit compliant UTF8 files.  This can be very tricky if he 
>does only a partial job, say fudging in one special application (yet to be 
>written), because if he stays with his old system, all kinds of tools will keep 
>trying to transcode whatever he has produced back to whatever his system 
>considers a standard. Those of us who have files, applications and tools that 
>have lived through several generations of macs are living proof of the problem. 
>Macs now have excellent UTF8/16 unicode support, but every once in a while in 
>working with a unicode file I find it has been strangely and unexpectedly 
>converted to something else, and it can be really tricky to spot when the 
>unaccented roman text part has been left untouched but just a few accented 
>letters have gotten different accents.
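
The failure described above is easy to reproduce. The following Python sketch
is only an illustration: the sample string and the particular pair of legacy
encodings (Latin-1 written, Mac Roman read) are assumptions chosen for the
example, not a description of any specific tool. The bytes are stored under
one single-byte encoding and read back under another.

```python
# Sample text with three accented letters (illustrative only).
text = "Génération café"

stored = text.encode("latin-1")       # bytes written under one legacy encoding
misread = stored.decode("mac_roman")  # the same bytes read under another

# Compare the original and the misread text character by character.
unchanged = [a for a, b in zip(text, misread) if a == b]
changed = [(a, b) for a, b in zip(text, misread) if a != b]

print(misread)
print("characters that silently changed:", changed)
```

Both encodings agree on the ASCII range, so every unaccented letter survives
and only the accented ones come back as different characters, which is why
the damage is so easy to overlook.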

>Mandating UTF8 is simply trying to shift a serious software problem from the 
>central handlers of CIF (IUCr, PDB, etc.) to the external users. Most users will 
>probably have the good sense to simply ignore the demand and leave the burden 
>just where it is now.  A few sophisticated users will probably adapt with no 
>trouble, but if we mandate UTF8 before we have a complete multiplatform 
>supporting infrastructure in place, the punishment for those users who blindly 
>follow orders is severe, expensive and undeserved.  Until and unless we have 
>developed solid support, we will just be alienating people from CIF.  I will 
>continue to oppose such a move.

[...]

