[Cif2-encoding] How we wrap this up

SIMON WESTRIP simonwestrip at btinternet.com
Sat Sep 25 20:34:14 BST 2010


OK - as promised, I won't pursue the matter :-)





________________________________
From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
To: Group for discussing encoding and content validation schemes for CIF2 
<cif2-encoding at iucr.org>
Sent: Saturday, 25 September, 2010 19:18:54
Subject: Re: [Cif2-encoding] How we wrap this up

Dear Simon,

  Unfortunately, that is likely to take us back into our infinite loop or into a 
diverging spiral.  Right now, we would have UTF8 as no more or less a default 
for CIF2 than ASCII is for CIF1 -- i.e. a not too bad first guess as the likely 
default encoding for any given CIF, but not a formal constraint.  I would 
suggest we leave the wording in that imprecise state, get CIF2 out and accepted 
and then work further on the encoding issue.

  Regards,
    Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                yaya at dowling.edu
=====================================================

On Sat, 25 Sep 2010, SIMON WESTRIP wrote:

> Dear all
> 
> In the event that CIF2 adopts the 'any encoding' approach, would there be
> any objections to
> explicitly defining a default encoding in the specification, to be defaulted
> to when there were no indications
> to the contrary? At worst this would give CIF2 service providers an excuse
> to interpret CIFs as e.g. UTF8 if they couldn't
> determine the encoding by other means - but such intolerant service
> providers would soon find that their service is
> not successful - while at best this might raise awareness of the issues
> regarding encoding once non-ASCII is used in
> a CIF. Essentially, it does not require users to change their working
> practices, which is one of the main arguments for
> 'any encoding'.
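
As a rough illustration of the fallback described above, here is a minimal Python sketch of a reader that uses an explicit indication of the encoding when one is available and otherwise falls back to a default. The function name `decode_cif_bytes`, its parameters, and the choice of a byte-order mark as the only "other means" are assumptions made for this example; none of them comes from the CIF2 proposals under discussion.

```python
import codecs
from typing import Optional, Tuple

# Hypothetical helper for illustration only; not part of any CIF specification.
# Byte-order marks checked here cover UTF-8 and UTF-16 only (an assumption).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_cif_bytes(
    data: bytes, declared: Optional[str] = None, default: str = "utf-8"
) -> Tuple[str, str]:
    """Return (text, encoding_used) for the raw bytes of a file."""
    # 1. An explicit indication (e.g. supplied by the user or transport) wins.
    if declared:
        return data.decode(declared), declared
    # 2. Otherwise look for a byte-order mark at the start of the data.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data.decode(name), name
    # 3. No indication to the contrary: fall back to the default encoding.
    #    This raises UnicodeDecodeError if the bytes are not valid in it.
    return data.decode(default), default
```

A pure-ASCII file decodes identically under the default, so existing working practices are unaffected; only files containing non-ASCII bytes in some other encoding would be rejected at step 3, which is the "intolerant service provider" case.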
> 
> So, CIF2 would remain 'any encoding', and specifications in terms of e.g.
> "Herbert's as for CIF1..."
> might only require a single sentence to define the default after stating
> what the 'preferred' encoding was;
> the proposal might be phrased as "Herbert's as for CIF1..." + "explicit
> default encoding"?
> 
> I do not wish to prolong this debate - if there are objections I will not
> launch into an endless round of exchanges
> that cover the same ground that has led us this far.
> 
> Cheers
> 
> Simon
> 
> 
> 
> 
> 
> 
> ____________________________________________________________________________
> From: SIMON WESTRIP <simonwestrip at btinternet.com>
> To: Group for discussing encoding and content validation schemes for CIF2
> <cif2-encoding at iucr.org>
> Sent: Friday, 24 September, 2010 20:10:13
> Subject: Re: [Cif2-encoding] How we wrap this up
> 
> Dear James
> 
> As you may have gathered I have been reconsidering my position on this
> issue.
> Please forgive me, but I would like to change my vote if that is OK, in
> favour of the 'any encoding' camp.
> This apparent U-turn is not a response to recent contributions; rather it is
> the outcome of a meeting I had this morning
> where I demonstrated some new software to the Managing Editor of IUCr
> journals.
> 
> By way of explanation:
> 
> I have been developing a new docx template which the IUCr editorial office
> is shortly to release for use by
> authors. The template will be packaged with some tools to extract data from
> CIFs
> and tabulate them in the Word document, e.g. open an mmCIF, click a button,
> and standard
> tables populated with data from the CIF will be included in the document,
> acting as
> table templates for the author to edit as appropriate for their manuscript.
> 
> Inclusion of the mmCIF tools is part of an unofficial policy to 'coax'
> biologists to start using/accepting mmCIF
> as a useful medium, rather than as a product of their deposition to the PDB,
> and to encourage them to become comfortable
> with passing mmCIFs between applications, and even to edit the things (in
> the same way as the core-CIF community
> treats CIFs). For example, our perception is that there is no reason why an
> author should not feel free to take an mmCIF
> that has been created by e.g. pdb_extract and populate it using third-party
> software before uploading to the PDB for
> deposition.
> 
> This cause would not be furthered by effectively invalidating an mmCIF if it
> were not to be encoded in one of
> the specified encodings.
> 
> So although I am uneasy about a specification that propagates uncertainty,
> I'm also uneasy about alienating users,
> especially when we are struggling to change their mindset as in the case of
> the biological community
> (my perception of the biological community's attitude to mmCIF is based on
> feedback from authors/coeditors to
> IUCr journals).
> 
> Granted this may not be the most compelling argument in favour of 'any
> encoding', but recognizing the hurdles that
> may have to be overcome once we move beyond ASCII whatever the CIF2
> specification, I support 'any encoding'
> as 'a means to an end'.
> 
> I will not provide my preferences in terms of the numbered options until you
> say so; after all, I have already voted and
> all this has to be signed off by COMCIFs in any case.
> 
> Cheers
> 
> Simon
> 
> 
> 
> 
> ____________________________________________________________________________
> From: "Bollinger, John C" <John.Bollinger at STJUDE.ORG>
> To: Group for discussing encoding and content validation schemes for CIF2
> <cif2-encoding at iucr.org>
> Sent: Friday, 24 September, 2010 14:50:57
> Subject: Re: [Cif2-encoding] How we wrap this up
> 
> Dear Simon,
> 
> It is exactly this sort of issue that drove me to support more permissive
> encoding rules and ultimately to devise the UTF-8 + UTF-16 + local proposal.
> 
> Do please think about the considerations Herb raised.  As you reconsider
> your votes, I urge you also to ask yourself what, *precisely*, a "text file"
> is, and to consider whether your answer is functionally different from my
> "local".  If you decide not, then please consider what that answer implies
> about CIF2 support of UTF-8 and UTF-16 (which evidently you favor) under
> each option on the table, especially for CIFs containing non-ASCII
> characters.  Whatever you decide about the meaning of "text file", please
> consider whether reasonable people might reach a different conclusion, as I
> assert they might do, and to what extent the standard needs to address that.
> 
> 
> Regards,
> 
> John
> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
> 
> 
> >From: cif2-encoding-bounces at iucr.org
> [mailto:cif2-encoding-bounces at iucr.org] On Behalf Of SIMON WESTRIP
> >Sent: Friday, September 24, 2010 7:53 AM
> >To: Group for discussing encoding and content validation schemes for CIF2
> >Subject: Re: [Cif2-encoding] How we wrap this up. .
> >
> >Dear Herbert
> >
> >Not for the first time, I find your argument persuasive. Brian's vote and
> explanation have also raised some
> >questions that I would like to look into.
> >
> >I will confirm or otherwise my vote as soon as possible, assuming that is
> OK with James and assuming that
> >this round of votes might wrap this up.
> >
> >Cheers
> >
> >Simon
> >
> >________________________________________
> >From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
> >To: Group for discussing encoding and content validation schemes for CIF2
> <cif2-encoding at iucr.org>
> >Sent: Friday, 24 September, 2010 13:17:14
> >Subject: Re: [Cif2-encoding] How we wrap this up
> >
> >If he ignores the standard, in most cases all he has to do to comply with
> CIF2 is to run whatever applications he currently runs to produce CIF1 and,
> perhaps, in some cases, run a minor edit pass at the end, to convert for the
> minor syntactic differences and/or changed tags required to comply with
> CIF2 and the new dictionaries, but he is unlikely to have to do anything to
> deal with the messy business of whether his encoding is really a proper UTF8
> encoding or not.
> 
> >The punishment if he tries to comply, is that he has to totally uproot and
> reconfigure the environment in which he produces CIFs from whatever he is
> currently doing to create an environment in which he can reliably create and,
> more importantly, transmit compliant UTF8 files.  This can be very tricky if
> he does only a partial job, say fudging in one special application (yet to
> be written), because if he stays with his old system, all kinds of tools
> will keep trying to transcode whatever he has produced back to whatever his
> system considers a standard. Those of us who have files, applications and
> tools that have lived through several generations of macs are living proof
> of the problem. Macs now have excellent UTF8/16 unicode support, but every
> once in a while in working with a unicode file I find it has been strangely
> and unexpectedly converted to something else, and it can be really tricky to
> spot when the unaccented roman text part has been left untouched but just a
> few accented letters have gotten different accents.
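
The kind of silent corruption described above can be reproduced with a short Python sketch. The two encodings used here (ISO-8859-1 for writing, Mac Roman for reading) and the sample string are assumptions chosen for illustration; the message does not say which encodings were actually involved. The helper `misread` is likewise hypothetical.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one encoding, then decode the bytes as another."""
    return text.encode(written_as).decode(read_as)


# Sample text with two accented letters (made up for this example).
original = "Jos\u00e9 M\u00fcller"

# Bytes written as ISO-8859-1 but interpreted as Mac Roman.
garbled = misread(original, "latin-1", "mac_roman")

# Both are single-byte encodings that agree on ASCII, so the length and
# every unaccented letter survive; only the non-ASCII positions change.
changed = [i for i, (a, b) in enumerate(zip(original, garbled)) if a != b]

print(original)
print(garbled)
print("changed positions:", changed)
```

Because the ASCII part is untouched, the damaged text still looks plausible at a glance, which is why the substitution is hard to spot.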
> 
> >Mandating UTF8 is simply trying to shift a serious software problem from
> the central handlers of CIF (IUCr, PDB, etc.) to the external users. Most
> users will probably have the good sense to simply ignore the demand and
> leave the burden just where it is now.  A few sophisticated users will
> probably adapt with no trouble, but the punishment for those users who
> blindly follow orders before we have a complete multiplatform supporting
> infrastructure in place by mandating UTF8 is severe, expensive and
> undeserved.  Until and unless we have developed solid support, we will just
> be alienating people from CIF.  I will continue to oppose such a move.
> 
> [...]
> 
> 
> 
> 