[Cif2-encoding] How we wrap this up

Herbert J. Bernstein yaya at bernstein-plus-sons.com
Sat Sep 25 20:37:46 BST 2010


Thank you for your cooperation. -- Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya at dowling.edu
=====================================================

On Sat, 25 Sep 2010, SIMON WESTRIP wrote:

> OK - as promised, I won't pursue the matter :-)
> 
> 
> ____________________________________________________________________________
> From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
> To: Group for discussing encoding and content validation schemes for CIF2
> <cif2-encoding at iucr.org>
> Sent: Saturday, 25 September, 2010 19:18:54
> Subject: Re: [Cif2-encoding] How we wrap this up
> 
> Dear Simon,
> 
>   Unfortunately, that is likely to take us back into our infinite loop or
> into a diverging spiral.  Right now, we would have UTF8 as no more or less a
> default for CIF2 than ASCII is for CIF1 -- i.e. a not too bad first guess as
> the likely default encoding for any given CIF, but not a formal constraint. 
> I would suggest we leave the wording in that imprecise state, get CIF2 out
> and accepted and then work further on the encoding issue.
> 
>   Regards,
>     Herbert
> 
> 
> On Sat, 25 Sep 2010, SIMON WESTRIP wrote:
> 
> > Dear all
> >
> > In the event that CIF2 adopts the 'any encoding' approach, would there be
> > any objections to explicitly defining a default encoding in the
> > specification, to be defaulted to when there are no indications to the
> > contrary? At worst this would give CIF2 service providers an excuse to
> > interpret CIFs as e.g. UTF8 if they couldn't determine the encoding by
> > other means - but such intolerant service providers would soon find that
> > their service is not successful - while at best this might raise
> > awareness of the issues regarding encoding once non-ASCII is used in a
> > CIF. Essentially, it does not require users to change their working
> > practices, which is one of the main arguments for 'any encoding'.
> >
> > So, CIF2 would remain 'any encoding', and specifications in terms of e.g.
> > "Herbert's as for CIF1..."
> > might only require a single sentence to define the default after stating
> > what the 'preferred' encoding was;
> > the proposal might be phrased as "Herbert's as for CIF1..." + "explicit
> > default encoding"?
> >
> > I do not wish to prolong this debate - if there are objections I will not
> > launch into an endless round of exchanges
> > that cover the same ground that has led us this far.
> >
> > Cheers
> >
> > Simon
> >
> >
> >
> >
> >
> >
> > ____________________________________________________________________________
> > From: SIMON WESTRIP <simonwestrip at btinternet.com>
> > To: Group for discussing encoding and content validation schemes for CIF2
> > <cif2-encoding at iucr.org>
> > Sent: Friday, 24 September, 2010 20:10:13
> > Subject: Re: [Cif2-encoding] How we wrap this up
> >
> > Dear James
> >
> > As you may have gathered I have been reconsidering my position on this
> > issue.
> > Please forgive me, but I would like to change my vote if that is OK, in
> > favour of the 'any encoding' camp.
> > This apparent U-turn is not a response to recent contributions; rather it
> > is the outcome of a meeting I had this morning where I demonstrated some
> > new software to the Managing Editor of IUCr journals.
> >
> > By way of explanation:
> >
> > I have been developing a new docx template which the IUCr editorial office
> > is shortly to release for use by authors. The template will be packaged
> > with some tools to extract data from CIFs and tabulate them in the Word
> > document, e.g. open an mmCIF, click a button, and standard tables
> > populated with data from the CIF will be included in the document, acting
> > as table templates for the author to edit as appropriate for their
> > manuscript.
> >
> > Inclusion of the mmCIF tools is part of an unofficial policy to 'coax'
> > biologists to start using/accepting mmCIF as a useful medium, rather than
> > as a product of their deposition to the PDB, and to encourage them to
> > become comfortable with passing mmCIFs between applications, and even to
> > edit the things (in the same way as the core-CIF community treats CIFs).
> > For example, our perception is that there is no reason why an author
> > should not feel free to take an mmCIF that has been created by e.g.
> > pdb_extract and populate it using third-party software before uploading
> > to the PDB for deposition.
> >
> > This cause would not be furthered by effectively invalidating an mmCIF if
> > it were not encoded in one of the specified encodings.
> >
> > So although I am uneasy about a specification that propagates uncertainty,
> > I'm also uneasy about alienating users, especially when we are struggling
> > to change their mindset as in the case of the biological community (my
> > perception of the biological community's attitude to mmCIF is based on
> > feedback from authors/coeditors to IUCr journals).
> >
> > Granted this may not be the most compelling argument in favour of 'any
> > encoding', but recognizing the hurdles that
> > may have to be overcome once we move beyond ASCII whatever the CIF2
> > specification, I support 'any encoding'
> > as 'a means to an end'.
> >
> > I will not provide my preferences in terms of the numbered options until
> > you say so; after all, I have already voted and all this has to be signed
> > off by COMCIFS in any case.
> >
> > Cheers
> >
> > Simon
> >
> >
> >
> >
> > ____________________________________________________________________________
> > From: "Bollinger, John C" <John.Bollinger at STJUDE.ORG>
> > To: Group for discussing encoding and content validation schemes for CIF2
> > <cif2-encoding at iucr.org>
> > Sent: Friday, 24 September, 2010 14:50:57
> > Subject: Re: [Cif2-encoding] How we wrap this up
> >
> > Dear Simon,
> >
> > It is exactly this sort of issue that drove me to support more permissive
> > encoding rules and ultimately to devise the UTF-8 + UTF-16 + local
> > proposal.
> >
> > Do please think about the considerations Herb raised.  As you reconsider
> > your votes, I urge you also to ask yourself what, *precisely*, a "text
> > file" is, and to consider whether your answer is functionally different
> > from my "local".  If you decide not, then please consider what that answer
> > implies about CIF2 support of UTF-8 and UTF-16 (which evidently you favor)
> > under each option on the table, especially for CIFs containing non-ASCII
> > characters.  Whatever you decide about the meaning of "text file", please
> > consider whether reasonable people might reach a different conclusion, as
> > I assert they might do, and to what extent the standard needs to address
> > that.
> >
> >
> > Regards,
> >
> > John
> > --
> > John C. Bollinger, Ph.D.
> > Department of Structural Biology
> > St. Jude Children's Research Hospital
> >
> >
> > >From: cif2-encoding-bounces at iucr.org
> > [mailto:cif2-encoding-bounces at iucr.org] On Behalf Of SIMON WESTRIP
> > >Sent: Friday, September 24, 2010 7:53 AM
> > >To: Group for discussing encoding and content validation schemes for CIF2
> > >Subject: Re: [Cif2-encoding] How we wrap this up. .
> > >
> > >Dear Herbert
> > >
> > >Not for the first time, I find your argument persuasive. Brian's vote and
> > >explanation have also raised some questions that I would like to look
> > >into.
> > >
> > >I will confirm or otherwise my vote as soon as possible, assuming that is
> > >OK with James and assuming that this round of votes might wrap this up.
> > >
> > >Cheers
> > >
> > >Simon
> > >
> > >________________________________________
> > >From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
> > >To: Group for discussing encoding and content validation schemes for CIF2
> > <cif2-encoding at iucr.org>
> > >Sent: Friday, 24 September, 2010 13:17:14
> > >Subject: Re: [Cif2-encoding] How we wrap this up
> > >
> > >If he ignores the standard, in most cases all he has to do to comply with
> > >CIF2 is to run whatever applications he currently runs to produce CIF1
> > >and, perhaps, in some cases, run a minor edit pass at the end, to convert
> > >for the minor syntactic differences and/or changed tags required to
> > >comply with CIF2 and the new dictionaries, but he is unlikely to have to
> > >do anything to deal with the messy business of whether his encoding is
> > >really a proper UTF8 encoding or not.
> >
> > >The punishment if he tries to comply is that he has to totally uproot and
> > >reconfigure the environment in which he produces CIFs from whatever he is
> > >currently doing to create an environment in which he can reliably create
> > >and, more importantly, transmit compliant UTF8 files.  This can be very
> > >tricky if he does only a partial job, say fudging in one special
> > >application (yet to be written), because if he stays with his old system,
> > >all kinds of tools will keep trying to transcode whatever he has produced
> > >back to whatever his system considers a standard. Those of us who have
> > >files, applications and tools that have lived through several generations
> > >of Macs are living proof of the problem. Macs now have excellent UTF8/16
> > >Unicode support, but every once in a while in working with a Unicode file
> > >I find it has been strangely and unexpectedly converted to something
> > >else, and it can be really tricky to spot when the unaccented roman text
> > >part has been left untouched but just a few accented letters have gotten
> > >different accents.
> >
> > >Mandating UTF8 is simply trying to shift a serious software problem from
> > >the central handlers of CIF (IUCr, PDB, etc.) to the external users. Most
> > >users will probably have the good sense to simply ignore the demand and
> > >leave the burden just where it is now.  A few sophisticated users will
> > >probably adapt with no trouble, but the punishment for those users who
> > >blindly follow orders before we have a complete multiplatform supporting
> > >infrastructure in place by mandating UTF8 is severe, expensive and
> > >undeserved.  Until and unless we have developed solid support, we will
> > >just be alienating people from CIF.  I will continue to oppose such a
> > >move.
> >
> > [...]
> >
> >
> > Email Disclaimer:  www.stjude.org/emaildisclaimer
> > _______________________________________________
> > cif2-encoding mailing list
> > cif2-encoding at iucr.org
> > http://scripts.iucr.org/mailman/listinfo/cif2-encoding
> >
> >
> 
>
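
Two points raised in the thread above can be sketched in a few lines of
Python: a default encoding that is used only when nothing else is known, and
the way accented letters silently change when 8-bit text is read under the
wrong encoding. This is only an illustration under stated assumptions, not
anything proposed for the specification. The names DEFAULT_ENCODING,
decode_cif and misread are hypothetical, and UTF-8 as the default is an
assumption taken from Simon's "e.g. UTF8".

```python
from typing import Optional

# Assumed default for illustration; the thread does not settle on one.
DEFAULT_ENCODING = "utf-8"


def decode_cif(data: bytes, declared: Optional[str] = None) -> str:
    """Decode CIF bytes with the declared encoding, else the default.

    Decoding is strict, so a wrong guess raises UnicodeDecodeError
    instead of quietly producing damaged text.
    """
    encoding = declared or DEFAULT_ENCODING
    return data.decode(encoding, errors="strict")


def misread(text: str, written_as: str, read_as: str) -> str:
    """Return what a reader sees when it assumes the wrong 8-bit encoding."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    word = "caf\u00e9"  # "cafe" with an acute accent on the final letter

    # UTF-8 bytes decode cleanly under the assumed default.
    print(decode_cif(word.encode("utf-8")))

    # Latin-1 bytes are not valid UTF-8, so the default fails loudly...
    try:
        decode_cif(word.encode("latin-1"))
    except UnicodeDecodeError as exc:
        print("default encoding rejected the file:", exc.reason)

    # ...but decode correctly once the encoding is declared.
    print(decode_cif(word.encode("latin-1"), declared="latin-1"))

    # Reading Latin-1 bytes as Mac Roman leaves the ASCII part intact
    # and changes only the accented letter.
    print(misread(word, "latin-1", "mac_roman"))
```

The strict decode is a deliberate choice in this sketch: a UTF-8 default can
at least detect most files written in a legacy 8-bit encoding, whereas
confusing one 8-bit encoding with another, as in the last call, raises no
error at all.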

