[Cif2-encoding] How we wrap this up

Herbert J. Bernstein yaya at bernstein-plus-sons.com
Mon Sep 27 17:23:32 BST 2010


Dear Simon,

   We do not seem to be communicating effectively.  Do you have a
Skype account?  We really need a meeting.

   Regards,
     Herbert


At 3:27 PM +0000 9/27/10, SIMON WESTRIP wrote:
>I see nothing wrong with a strategy to introduce CIF2 if necessary.
>My initial thoughts are that the current 'as for CIF1...' description
>is not best suited as base specification on which to build full
>unicode support, should such a strategy be pursued.
>
>However, I will reflect on this along with recent contributions from
>James and John...
>
>Cheers
>
>Simon
>
>
>
>From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
>To: Group for discussing encoding and content validation schemes for 
>CIF2 <cif2-encoding at iucr.org>
>Sent: Monday, 27 September, 2010 14:45:16
>Subject: Re: [Cif2-encoding] How we wrap this up
>
>The problem is that options 3, 4 and 5 specifically prescribe the
>use of Unicode characters (that is the entire point of those
>options -- and that is the point in dispute -- whether we should
>be prescribing UTF8 or using it as we now use ASCII, as a way to
>be clear what we are talking about as in CIF1) and we simply are not
>ready to deal with such a requirement yet.
>
>I take the blame for starting this discussion many years ago when
>I simply asked for just what my motion says, that we start using
>UTF8 in the same way we had been using ASCII.  Unfortunately
>this discussion has turned into a strong push to focus CIF on
>that particular encoding, stop using Brian's elides, etc.  With
>the current weak state of software support for CIF and the large
>investment at the IUCr and at the PDB in current workflows, I
>think it would be a very disruptive and expensive change to make
>right now.  God and the Devil are in the details.
>
>Note that I am _not_ basing this argument on imgCIF.  At this point
>it appears, unfortunately, that CIF2 and imgCIF will have to diverge.
>If we have enough face-to-face discussions, perhaps we can bring
>them together again, as we did in 1998, but that is an even more
>difficult discussion than the one we need to have on encodings.
>What I think we will do is to go at this in incremental stages:
>
>1.  Make the transition from CIF1 to CIF2 using new dictionaries
>but allowing most data files to remain unchanged, and providing
>simple algorithmic transformations for the rest, but keeping
>most of the current semantic extensions that we have in CIF1,
>focusing our energy on getting the new dictionaries used and
>making use of dREL;
>
>2.  Work on a CIF2.1 that, by creative and well-supported use
>of Unicode, allows for a well organized transition from Brian's
>elides to use of Unicode characters;
>
>3.  Then working in that context, whatever it turns out to be,
>work on having imgCIF make the transition to CIF2 in some
>reasonably compatible way.
>
>I see how to do item 1 for next summer.  I don't see how to do 2 and
>3 in that time frame, though I am sure we could make a dent in
>them if we could meet face to face.  email tends to stiffen too
>many positions.
>
>Regards,
>   Herbert
>
>=====================================================
>Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya at dowling.edu
>=====================================================
>
>On Mon, 27 Sep 2010, SIMON WESTRIP wrote:
>
>>  Dear Herbert
>>
>>  I do not understand why it is *only* options 3, 4 or 5 that allow users to
>>  start using
>>  unicode characters?
>>
>>  More generally, are you suggesting that the use of anything but ASCII in a
>>  data value is only allowed if
>>  e.g. the dictionary definition of the data item permits, or even only if the
>>  IUCr says that's OK?
>>
>>  Fundamentally, I'm starting to infer that the purpose of the 'as for
>>  CIF1...' approach to encoding is
>>  to open the door to full unicode support, but not actually let anyone cross
>>  the threshold?
>>
>>
>>  Cheers
>>
>>  Simon
>>
>>  ____________________________________________________________________________
>>  From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
>>  To: Group for discussing encoding and content validation schemes for CIF2
>>  <cif2-encoding at iucr.org>
>>  Sent: Monday, 27 September, 2010 11:48:49
>>  Subject: Re: [Cif2-encoding] How we wrap this up
>>
>>  Dear Simon,
>>
>>    Under the CIF2 specification with UTF8 in place of ASCII there is
>>  _no_ change in the use of elided ASCII sequences to represent non-ASCII
>>  characters until and unless the IUCr publications office decides that,
>>  for that particular application, they are ready to accept something
>>  new.
>>
>>    It is _only_ if you go forward with options 3, 4 or 5 that you
>>  are giving the green light to users to do precisely what you are
>>  concerned about -- using the unicode characters instead,
>>  in possibly strange admixtures that nobody is ready to process.
>>
>>    Remember, under the CIF2 specification as now written, it is
>>  _not_ part of the CIF2 specification to determine the handling
>>  of the characters in quoted strings other than to ensure that
>>  those strings do not contain illegal characters from the point
>>  of view of CIF2.  Dealing with the validity of particular character
>>  sequences in strings users provide is, just as in CIF1, the
>>  responsibility of the application (i.e. the IUCr journal flows
>>  or the PDB archiving flows).
>>
>>    My apologies to James, who I know is trying to do what he believes
>>  to be right, but I believe James has things backwards -- the "deep
>>  breath" is provided by my proposal -- taking the time to properly engineer
>>  the use of the extra characters UTF8 allows us to discuss clearly,
>>  while James' push for an immediate prescriptive use of UTF8 with
>>  prescriptions that differ drastically from what has been adopted
>>  by all other frameworks (HTML, XML, python, etc.) in ways that
>>  are untested and unsupported by most existing software is
>>  the untimely rush to judgement.
>>
>>    I beg you to support options 1 and/or 2 to allow CIF2 to go forward
>>  in all other respects while we all take a deep breath and deal
>>  with the tricky issue you raised slowly and carefully without the
>>  pressure of trying to have CIF2 itself ready for next summer.
>>
>>    Regards,
>>      Herbert
>>
>>  At 9:34 AM +0000 9/27/10, SIMON WESTRIP wrote:
>>  >I was not so concerned about invalidating existing CIFs, or even the
>>  >likelihood
>>  >that users will continue to write e.g. 'f\'oo' - this is a syntax
>>  >error in CIF2 that is readily recoverable.
>>  >
>>  >Rather there is a large group of CIF1 users that are in the habit of
>>  >using elided ASCII sequences to
>>  >represent non-ASCII characters. With CIF2 these users will be able
>>  >to use the unicode character itself.
>>  >So we might end up with a mixture of escaped sequences and unicode
>>  >characters (e.g. a user may have a keyboard shortcut
>>  >for an accented character that forms part of their name, but might
>>  >still resort to \a for alpha, under the assumption that \a is still
>>  >valid because CIF2 is basically the same as CIF1, and, rightly or
>>  >wrongly, they perceive the eliding mechanism as part of
>>  >CIF syntax.
>>  >
>>  >I think this is an issue where we can't afford to take an 'as for
>>  >CIF1...' approach, especially as the CIF1 specification
>>  >isn't entirely satisfactory (e.g. there's an example in the
>>  >line-folding protocol that uses elides in a file path to make a
>>  >point,
>>  >but actually these elides may easily be interpreted as escape
>>  >sequences), and as the encoding issue is very much concerned with
>>  >user practice, the large group of users that currently use elided
>>  >character codes need to be aware what the situation is in
>>  >CIF2?
>>  >
>>  >I'm not convinced this issue should be left for discussion later;
>>  >it is relevant when considering how the move beyond ASCII is specified.
>>  >
>>  >Cheers
>>  >
>>  >Simon
>>  >
>>  >
>>  >
>>  >
>>  >From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
>>  >To: Group for discussing encoding and content validation schemes for
>>  >CIF2 <cif2-encoding at iucr.org>
>>  >Sent: Sunday, 26 September, 2010 23:14:55
>>  >Subject: Re: [Cif2-encoding] How we wrap this up
>>  >
>>  >Dear Simon,
>>  >
>>  >  The current CIF2 spec, with or without the changes I have suggested
>>  >to temporarily resolve the encoding issue is at best vague and
>>  >confusing on the elide character issue.  The interacting issue on
>>  >which the CIF2 spec is clear is that we are changing the handling of
>>  >quoted strings so that they end on the first occurrence of the quoting
>>  >character, leaving the handling of elides to the calling application.
>>  >
>>  >  This will be a problem -- the change from CIF1 in the termination of
>>  >quoted strings along with the absence of a way of eliding the quotes
>>  >will invalidate a significant number of existing CIFs without any simple
>>  >mechanism to recover.  Rather than reopen another endless discussion,
>>  >I would suggest we simply add the python string concatenation character
>>  >"+" to ensure we can map all current CIF1 files and use Brian's common
>>  >semantic features for the moment.  We can then deal with the full elides
>>  >discussion at a future date.
>>  >
>>  >  Regards,
>>  >    Herbert
>>  >
>>  >
>>  >
>>  >
>>  >
>>  >At 1:40 PM -0700 9/26/10, SIMON WESTRIP wrote:
>>  >>Dear all
>>  >>
>>  >>While reviewing my hypothetical 'to do' list for implementing CIF2
>>  >>in current software, I realized that
>>  >>the issue of current support for elided character codes hasn't really
>>  >>been addressed in the context of CIF2.
>>  >>My 'to do' list contains notes that software could treat them as
>>  >>keyboard shortcuts, and their use could be
>>  >>defined in the dictionary. However, that was based on a distinct
>>  >>difference between CIF1 and CIF2,
>>  >>while the current arguments for 'as for CIF1...' suggest that the
>>  >>distinction between CIF1 and CIF2
>>  >>should almost be imperceptible.
>>  >>
>>  >>How is this issue to be addressed in the specification?
>>  >>
>>  >>Cheers
>>  >>
>>  >>Simon
>>  >>
>>  >>
>>  >>
>>  >>From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
>>  >>To: Group for discussing encoding and content validation schemes for
>>  >>CIF2 <cif2-encoding at iucr.org>
>>  >>Sent: Saturday, 25 September, 2010 20:37:46
>>  >>Subject: Re: [Cif2-encoding] How we wrap this up
>>  >>
>>  >>Thank you for your cooperation. -- Herbert
>>  >>
>>  >>=====================================================
>>  >>Herbert J. Bernstein, Professor of Computer Science
>>  >>  Dowling College, Kramer Science Center, KSC 121
>>  >>        Idle Hour Blvd, Oakdale, NY, 11769
>>  >>
>>  >>                +1-631-244-3035
>>  >>
>>  >>                yaya at dowling.edu
>>  >>=====================================================
>>  >>
>>  >>On Sat, 25 Sep 2010, SIMON WESTRIP wrote:
>>  >>
>>  >>>  OK - as promised, I won't pursue the matter :-)
>>  >>>
>>  >>>
>>  >>>
>>  >>>  ____________________________________________________________________________
>>  >>>  From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
>>  >>>  To: Group for discussing encoding and content validation schemes for CIF2
>>  >>>  <cif2-encoding at iucr.org>
>>  >>>  Sent: Saturday, 25 September, 2010 19:18:54
>>  >>>  Subject: Re: [Cif2-encoding] How we wrap this up
>>  >>>
>>  >>>  Dear Simon,
>>  >>>
>>  >>>    Unfortunately, that is likely to take us back into our infinite loop
>>  >>>  or into a diverging spiral.  Right now, we would have UTF8 as no more
>>  >>>  or less a default for CIF2 than ASCII is for CIF1 -- i.e. a not too bad
>>  >>>  first guess as the likely default encoding for any given CIF, but not a
>>  >>>  formal constraint.  I would suggest we leave the wording in that
>>  >>>  imprecise state, get CIF2 out and accepted and then work further on the
>>  >>>  encoding issue.
>>  >>>
>>  >>>    Regards,
>>  >>>      Herbert
>>  >>>
>>  >>>  =====================================================
>>  >>>  Herbert J. Bernstein, Professor of Computer Science
>>  >>>    Dowling College, Kramer Science Center, KSC 121
>>  >>>          Idle Hour Blvd, Oakdale, NY, 11769
>>  >>>
>>  >>>                  +1-631-244-3035
>>  >>>
>>  >>>                  yaya at dowling.edu
>>  >>>  =====================================================
>>  >>>
>>  >>>  On Sat, 25 Sep 2010, SIMON WESTRIP wrote:
>>  >>>
>>  >>>  > Dear all
>>  >>>  >
>>  >>>  > In the event that CIF2 adopts the 'any encoding' approach, would there
>>  >>>  > be any objections to explicitly defining a default encoding in the
>>  >>>  > specification, to be defaulted to when there were no indications to the
>>  >>>  > contrary?  At worst this would give CIF2 service providers an excuse to
>>  >>>  > interpret CIFs as e.g. UTF8 if they couldn't determine the encoding by
>>  >>>  > other means - but such intolerant service providers would soon find
>>  >>>  > that their service is not successful - while at best this might raise
>>  >>>  > awareness of the issues regarding encoding once non-ASCII is used in a
>>  >>>  > CIF.  Essentially, it does not require users to change their working
>>  >>>  > practices, which is one of the main arguments for 'any encoding'.
>>  >>>  >
>>  >>>  > So, CIF2 would remain 'any encoding', and specifications in terms of
>>  >>>  > e.g. "Herbert's as for CIF1..." might only require a single sentence to
>>  >>>  > define the default after stating what the 'preferred' encoding was; the
>>  >>>  > proposal might be phrased as "Herbert's as for CIF1..." + "explicit
>>  >>>  > default encoding"?
>>  >>>  >
>>  >>>  > I do not wish to prolong this debate - if there are objections I will
>>  >>>  > not launch into an endless round of exchanges that cover the same
>>  >>>  > ground that has led us this far.
>>  >>>  >
>>  >>>  > Cheers
>>  >>>  >
>>  >>>  > Simon
>>  >>>  >
>>  >>>  >
>>  >>>  >
>>  >>>  >
>>  >>>  >
>>  >>>  >
>>  >>>
>>  >>>  > ____________________________________________________________________________
>>  >>>  > From: SIMON WESTRIP <simonwestrip at btinternet.com>
>>  >>>  > To: Group for discussing encoding and content validation schemes for CIF2
>>  >>>  > <cif2-encoding at iucr.org>
>>  >>>  > Sent: Friday, 24 September, 2010 20:10:13
>>  >>>  > Subject: Re: [Cif2-encoding] How we wrap this up
>>  >>>  >
>>  >>>  > Dear James
>>  >>>  >
>>  >>>  > As you may have gathered I have been reconsidering my position on this
>>  >>>  > issue.  Please forgive me, but I would like to change my vote if that
>>  >>>  > is OK, in favour of the 'any encoding' camp.  This apparent U-turn is
>>  >>>  > not a response to recent contributions; rather it is the outcome of a
>>  >>>  > meeting I had this morning where I demonstrated some new software to
>>  >>>  > the Managing Editor of IUCr journals.
>>  >>>  >
>>  >>>  > By way of explanation:
>>  >>>  >
>>  >>>  > I have been developing a new docx template which the IUCr editorial
>>  >>>  > office is shortly to release for use by authors. The template will be
>>  >>>  > packaged with some tools to extract data from CIFs and tabulate them in
>>  >>>  > the Word document, e.g. open an mmCIF, click a button, and standard
>>  >>>  > tables populated with data from the CIF will be included in the
>>  >>>  > document, acting as table templates for the author to edit as
>>  >>>  > appropriate for their manuscript.
>>  >>>  >
>>  >>>  > Inclusion of the mmCIF tools is part of an unofficial policy to 'coax'
>>  >>>  > biologists to start using/accepting mmCIF as a useful medium, rather
>>  >>>  > than as a product of their deposition to the PDB, and to encourage them
>>  >>>  > to become comfortable with passing mmCIFs between applications, and
>>  >>>  > even to edit the things (in the same way as the core-CIF community
>>  >>>  > treats CIFs). For example, our perception is that there is no reason
>>  >>>  > why an author should not feel free to take an mmCIF that has been
>>  >>>  > created by e.g. pdb_extract and populate it using third-party software
>>  >>>  > before uploading to the PDB for deposition.
>>  >>>  >
>>  >>>  > This cause would not be furthered by effectively invalidating an mmCIF
>>  >>>  > if it were not to be encoded in one of the specified encodings.
>>  >>>  >
>>  >>>  > So although I am uneasy about a specification that propagates
>>  >>>  > uncertainty, I'm also uneasy about alienating users, especially when we
>>  >>>  > are struggling to change their mindset as in the case of the biological
>>  >>>  > community (my perception of the biological community's attitude to
>>  >>>  > mmCIF is based on feedback from authors/coeditors to IUCr journals).
>>  >>>  >
>>  >>>  > Granted this may not be the most compelling argument in favour of 'any
>>  >>>  > encoding', but recognizing the hurdles that may have to be overcome
>>  >>>  > once we move beyond ASCII whatever the CIF2 specification, I support
>>  >>>  > 'any encoding' as 'a means to an end'.
>>  >>>  >
>>  >>>  > I will not provide my preferences in terms of the numbered options
>>  >>>  > until you say so; after all, I have already voted and all this has to
>>  >>>  > be signed off by COMCIFs in any case.
>>  >>>  >
>>  >>>  > Cheers
>>  >>>  >
>>  >>>  > Simon
>>  >>>  >
>>  >>>  >
>>  >>>  >
>>  >>>  >
>>  >>>
>>  >>>  > ____________________________________________________________________________
>>  >>>  > From: "Bollinger, John C" <John.Bollinger at STJUDE.ORG>
>>  >>>  > To: Group for discussing encoding and content validation schemes for CIF2
>>  >>>  > <cif2-encoding at iucr.org>
>>  >>>  > Sent: Friday, 24 September, 2010 14:50:57
>>  >>>  > Subject: Re: [Cif2-encoding] How we wrap this up
>>  >>>  >
>>  >>>  > Dear Simon,
>>  >>>  >
>>  >>>  > It is exactly this sort of issue that drove me to support more
>>  >>>  > permissive encoding rules and ultimately to devise the UTF-8 + UTF-16 +
>>  >>>  > local proposal.
>>  >>>  >
>>  >>>  > Do please think about the considerations Herb raised.  As you
>>  >>>  > reconsider your votes, I urge you also to ask yourself what,
>>  >>>  > *precisely*, a "text file" is, and to consider whether your answer is
>>  >>>  > functionally different from my "local".  If you decide not, then please
>>  >>>  > consider what that answer implies about CIF2 support of UTF-8 and
>>  >>>  > UTF-16 (which evidently you favor) under each option on the table,
>>  >>>  > especially for CIFs containing non-ASCII characters.  Whatever you
>>  >>>  > decide about the meaning of "text file", please consider whether
>>  >>>  > reasonable people might reach a different conclusion, as I assert they
>>  >>>  > might do, and to what extent the standard needs to address that.
>>  >>>  >
>>  >>>  >
>>  >>>  > Regards,
>>  >>>  >
>>  >>>  > John
>>  >>>  > --
>>  >>>  > John C. Bollinger, Ph.D.
>>  >>>  > Department of Structural Biology
>>  >>>  > St. Jude Children's Research Hospital
>>  >>>  >
>>  >>>  >
>>  >>>  > >From: cif2-encoding-bounces at iucr.org
>>  >>>  > >[mailto:cif2-encoding-bounces at iucr.org] On Behalf Of SIMON WESTRIP
>>  >>>  > >Sent: Friday, September 24, 2010 7:53 AM
>>  >>>  > >To: Group for discussing encoding and content validation schemes for CIF2
>>  >>>  > >Subject: Re: [Cif2-encoding] How we wrap this up. .
>>  >>>  > >
>>  >>>  > >Dear Herbert
>>  >>>  > >
>>  >>>  > >Not for the first time, I find your argument persuasive. Brian's vote
>>  >>>  > >and explanation have also raised some questions that I would like to
>>  >>>  > >look into.
>>  >>>  > >
>>  >>>  > >I will confirm or otherwise my vote as soon as possible, assuming that
>>  >>>  > >is OK with James and assuming that this round of votes might wrap this
>>  >>>  > >up.
>>  >>>  > >
>>  >>>  > >Cheers
>>  >>>  > >
>>  >>>  > >Simon
>>  >>>  > >
>>  >>>  > >________________________________________
>>  >>>  > >From: Herbert J. Bernstein <yaya at bernstein-plus-sons.com>
>>  >>>  > >To: Group for discussing encoding and content validation schemes for CIF2
>>  >>>  > ><cif2-encoding at iucr.org>
>>  >>>  > >Sent: Friday, 24 September, 2010 13:17:14
>>  >>>  > >Subject: Re: [Cif2-encoding] How we wrap this up
>>  >>>  > >
>>  >>>  > >If he ignores the standard, in most cases all he has to do to comply
>>  >>>  > >with CIF2 is to run whatever applications he currently runs to produce
>>  >>>  > >CIF1 and, perhaps, in some cases, run a minor edit pass at the end, to
>>  >>>  > >convert for the minor syntactic differences and/or changed tags
>>  >>>  > >required to comply with CIF2 and the new dictionaries, but he is
>>  >>>  > >unlikely to have to do anything to deal with the messy business of
>>  >>>  > >whether his encoding is really a proper UTF8 encoding or not.
>>  >>>  > >
>>  >>>  > >The punishment if he tries to comply, is that he has to totally uproot
>>  >>>  > >and reconfigure the environment in which he produces CIFs from whatever
>>  >>>  > >he is currently doing to create an environment in which he can reliably
>>  >>>  > >create and, more importantly, transmit compliant UTF8 files.  This can
>>  >>>  > >be very tricky if he does only a partial job, say fudging in one
>>  >>>  > >special application (yet to be written), because if he stays with his
>>  >>>  > >old system, all kinds of tools will keep trying to transcode whatever
>>  >>>  > >he has produced back to whatever his system considers a standard. Those
>>  >>>  > >of us who have files, applications and tools that have lived through
>>  >>>  > >several generations of macs are living proof of the problem. Macs now
>>  >>>  > >have excellent UTF8/16 unicode support, but every once in a while in
>>  >>>  > >working with a unicode file I find it has been strangely and
>>  >>>  > >unexpectedly converted to something else, and it can be really tricky
>>  >>>  > >to spot when the unaccented roman text part has been left untouched but
>>  >>>  > >just a few accented letters have gotten different accents.
>>  >>>  > >
>>  >>>  > >Mandating UTF8 is simply trying to shift a serious software problem
>>  >>>  > >from the central handlers of CIF (IUCr, PDB, etc.) to the external
>>  >>>  > >users. Most users will probably have the good sense to simply ignore
>>  >>>  > >the demand and leave the burden just where it is now.  A few
>>  >>>  > >sophisticated users will probably adapt with no trouble, but the
>>  >>>  > >punishment for those users who blindly follow orders before we have a
>>  >>>  > >complete multiplatform supporting infrastructure in place by mandating
>>  >>>  > >UTF8 is severe, expensive and undeserved.  Until and unless we have
>>  >>>  > >developed solid support, we will just be alienating people from CIF.  I
>>  >>>  > >will continue to oppose such a move.
>>  >>>  >
>>  >>>  > [...]
>>  >>>  >
>>  >>>  >
>>  >>>  > Email Disclaimer: http://www.stjude.org/emaildisclaimer
>>  >>>  > _______________________________________________
>>  >>>  > cif2-encoding mailing list
>>  >>>  > cif2-encoding at iucr.org
>>  >>>  > http://scripts.iucr.org/mailman/listinfo/cif2-encoding
>>  >>>  >
>>  >>>  >
>>  >>>
>>  >>>
>>  >>
>>  >
>>  >
>>  >--
>>  >=====================================================
>>  >  Herbert J. Bernstein, Professor of Computer Science
>>  >    Dowling College, Kramer Science Center, KSC 121
>>  >        Idle Hour Blvd, Oakdale, NY, 11769
>>  >
>>  >                  +1-631-244-3035
>>  > 
>>  >                  yaya at dowling.edu
>>  >=====================================================
>>  >
>>  >
>>
>>
>>  --
>>  =====================================================
>>    Herbert J. Bernstein, Professor of Computer Science
>>      Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                    +1-631-244-3035
>>                    yaya at dowling.edu
>>  =====================================================
>>
>>
>


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya at dowling.edu
=====================================================
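
The quoted-string change discussed in the thread (a CIF2 quoted string ends
at the first occurrence of the quoting character, whereas CIF1 allows a quote
character inside the string when it is not followed by whitespace) can be
illustrated with a minimal Python sketch. The function names are hypothetical
and are not taken from any CIF library; the CIF1 rule is stated here as an
assumption, and the code only compares the two termination rules on one line.

```python
def cif1_closing_quote(line: str, start: int) -> int:
    """Index of the closing quote under the CIF1 rule, or -1 if none.

    CIF1 rule as assumed here: the string ends at a quote character that is
    followed by whitespace or by the end of the line.
    """
    quote = line[start]
    for i in range(start + 1, len(line)):
        at_end = i + 1 == len(line)
        if line[i] == quote and (at_end or line[i + 1] in " \t"):
            return i
    return -1


def cif2_closing_quote(line: str, start: int) -> int:
    """Index of the closing quote under the rule described in the thread:
    the string ends at the first occurrence of the quoting character."""
    return line.find(line[start], start + 1)


value = "'don't'"
print(cif1_closing_quote(value, 0))  # 6: the whole of don't is the value
print(cif2_closing_quote(value, 0))  # 4: the string stops after don
```

A value such as 'don't' is therefore read in full under the CIF1 rule but is
cut short under the first-occurrence rule, which is the class of existing
files the thread expects to need conversion.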

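Simon's concern about data values that mix elided ASCII sequences with real
Unicode characters can be made concrete with a small Python sketch. The
pattern below is an illustrative assumption: only \a for alpha is mentioned
in the thread, and the other backslash codes it accepts are a guess at the
general shape of CIF1 elides, not the CIF1 table. The function name is
hypothetical.

```python
import re

# Assumed shape of an elide: a backslash followed by a letter or by one of a
# few accent markers. This is an illustrative subset, not the CIF1 table.
ELIDE_PATTERN = re.compile(r"\\[A-Za-z'^~,]")


def classify_value(text: str) -> str:
    """Report whether a data value uses elides, non-ASCII characters, or both."""
    has_elide = bool(ELIDE_PATTERN.search(text))
    has_non_ascii = any(ord(ch) > 127 for ch in text)
    if has_elide and has_non_ascii:
        return "mixed"
    if has_elide:
        return "elides-only"
    if has_non_ascii:
        return "unicode-only"
    return "ascii"


print(classify_value("Jos\u00e9 measured \\a-quartz"))  # mixed
print(classify_value("C:\\data\\a.cif"))                # elides-only
```

The second call shows the ambiguity raised in the thread: a file path with
backslashes is indistinguishable from a run of elides at this level, so any
real check would need context that this sketch does not have.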

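Simon's proposal of an explicit default encoding, used only when there are no
indications to the contrary, might look like the following Python sketch. The
function name, the "declared" parameter and the choice of a byte-order mark
as the only automatic indication are all assumptions made for illustration;
the thread does not define how an encoding would be indicated.

```python
import codecs
from typing import Optional, Tuple


def decode_cif(raw: bytes, declared: Optional[str] = None,
               default: str = "utf-8") -> Tuple[str, str]:
    """Decode CIF bytes, returning the text and the encoding that was used.

    Order of precedence (an assumption): an explicitly declared encoding,
    then a byte-order mark, then the default.
    """
    if declared is not None:
        return raw.decode(declared), declared
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the byte-order mark and strips it.
        return raw.decode("utf-16"), "utf-16"
    # No indication to the contrary: fall back to the default encoding.
    return raw.decode(default), default


text, used = decode_cif("data_\u03b1".encode("utf-8"))
print(used)  # utf-8
```

Bytes that are not valid in the default encoding raise UnicodeDecodeError
here rather than being silently reinterpreted, which matches the "intolerant
service provider" case in Simon's message; a more permissive service would
catch the error and try other encodings.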