Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains

Herbert J. Bernstein yaya at bernstein-plus-sons.com
Tue Mar 15 11:39:14 GMT 2011


Dear Colleagues,

   My apologies.  This may take a while.  To avoid critical points getting
lost, I'd like to focus on one sub-issue at a time, starting with the
divide between syntax and semantics.  If we isolate all syntax development
from semantics and relegate all semantics to the dictionaries, CIF becomes
something very different from what it has been in the past, through
CIF1.1.  CIF would be confined purely to considerations of which strings
of characters are valid.  The dictionaries would deal with such issues
as whether the numeric strings 13.45 and 1.345E1 are equivalent.  All
the "common semantic features" of CIF 1.1 would have to be replicated
dictionary by dictionary and no longer would have to be common.  The
relationships between CIFs and their dictionaries now specified by
the DDLs as part of CIF would have to be moved down purely to dictionary
development, and instead of just having DDL1, DDL2 and DDLm, we could have
one or more flavors of DDL for each subdomain using CIF, or even one per
data file.
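
   To make the numeric example concrete, here is a minimal sketch in
Python, offered purely as an illustration.  The function names
same_string and same_number are hypothetical and are not taken from any
CIF specification or library.  It shows how the two strings differ under
a purely syntactic comparison and agree once a numeric interpretation is
applied, which is the kind of rule that would have to live somewhere.

```python
from decimal import Decimal, InvalidOperation


def same_string(a: str, b: str) -> bool:
    """Purely syntactic view: values match only if their characters match."""
    return a == b


def same_number(a: str, b: str) -> bool:
    """Semantic view (an assumed rule): parse both strings as decimal
    numbers and compare the resulting values."""
    try:
        return Decimal(a) == Decimal(b)
    except InvalidOperation:
        # At least one string is not a plain number, so fall back to
        # comparing the raw character strings.
        return a == b


print(same_string("13.45", "1.345E1"))  # False
print(same_number("13.45", "1.345E1"))  # True
```

If each dictionary had to supply its own version of same_number, two
dictionaries could legitimately disagree on whether these values match.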

   I, for one, think that the divide used in the past, in which as much
as possible of the common semantics was treated along with the raw
syntax, was a very useful approach and helped to reduce the drift
of CIF into multiple dialects, and that we should consider all proposed
features in terms of their total impact on the use of CIF, not just
in terms of the validity or invalidity of particular strings.

   Regards,
     Herbert



At 2:57 PM +1100 3/15/11, James Hester wrote:
>Herbert has proposed a revised version of the principles for syntax
>development I put forward.  Starting from the lower-numbered ones:
>
>(Herbert suggests)
>>  ======================================================================
>>
>>
>>  1.(i): If it is feasible to implement the desired behavior by
>>  specification of changes to dictionaries rather than to CIF syntax,
>>  that alternative should be seriously considered and balanced against
>>  the human-readability of the resulting CIFs without reference to
>>  dictionaries;
>>
>>  ======================================================================
>
>Original text:
>
>>  (i) implementation or use of equivalent behaviour at dictionary level
>>  is either significantly more cumbersome or not possible;
>
>Note that the intent of my original formulation was to move semantic
>complexity to the dictionary level where this is practical, and
>Herbert's formulation has rephrased this in terms of human
>readability.   I have fiddled with Herbert's text below in order to
>indicate that dictionary-based change is not always going to impact
>human readability negatively, and to remove 'without reference to the
>dictionary' as I don't think that is relevant.  I can expand on that
>last sentence if it is controversial.
>
>===================================================
>1.(i): If it is feasible to implement the desired behavior by
>specification of changes to dictionaries rather than to CIF syntax,
>that alternative is to be preferred, and balanced against a syntax-based
>change in those cases where a dictionary-based change would
>significantly reduce human readability.
>=====================================================
>
>Next is principle 1(ii)
>
>Original text:
>>   the feature provides
>>  significant new functionality that is widely applicable to most
>>  scientific domains
>>
>>  This principle would prevent CIF from having features which
>>  support any one scientific domain.  Under this principle, we never
>>  would have had DDL2 and mmCIF, nor imgCIF.  I would suggest changing
>>  this principle to
>>  ======================================================================
>>
>>  1.(ii).  the feature provides significant new functionality for
>>  some scientific application domain, and does not interfere with
>>  the use of CIF in other scientific application domains.
>>
>>  ======================================================================
>
>As John B has pointed out, 1(ii) applies to syntax only, so those
>counterexamples are invalid.  If anything, they demonstrate the power
>of the CIF1 framework and justify the approach of focusing development
>work at the dictionary level.  Furthermore, there is no such thing as
>a new syntactical feature that does not impact other application
>domains - a new syntactical feature requires that everybody implements
>it, which is therefore an unavoidable cost.  Pending some valid
>counterexamples, I do not accept Herbert's proposed alteration to
>1(ii).
>
>Finally and most importantly, Herbert suggests a completely new
>Principle 3.  The original Principle 3 was intended to resolve
>conflicts between Principles 1 and 2 in clear favour of Principle 1.
>This original Principle 3 would ensure that the standard remained
>consistent with the clear vision of the Preamble, while requiring any
>issues not relating to those in Principle 1 to be addressed in
>appropriate arenas.
>
>The crux of Herbert's argument around the original Principle 3 is that
>(i) Principles 1 and 2 use some words that are not entirely technical
>(so belong in the realm of Principle 3), and (ii) basing design on
>purely technical grounds is not a guarantee of success.  To address
>these perceived flaws, Herbert proposes instead working for 'buy-in'
>among stakeholders before making any changes.  In response to point
>(i): consider that Principles 1 and 2 describe a framework within
>which syntax changes can be made.  Whether this is called "technical"
>or "objective" or something else is not the key point; the key point
>is that a syntax change must satisfy those principles, whatever their
>character.  I have endeavoured to use wording that would allow
>reasonable people to agree. So Herbert's objection (i) does not
>ultimately matter. In response to (ii) I would argue that basing
>design on stakeholder buy-in is also not a guarantee of success, so it
>is not on the face of it a superior approach to the original Principle
>3 approach. Brian makes the point that we cannot guarantee stakeholder
>buy-in.  I would then take the next logical step of saying that making
>changes to the syntax contingent on stakeholder buy-in would be
>tantamount to freezing the standard.  John B raises all sorts of
>questions about just how the level of buy-in might be assessed.
>
>I would ask further: in order to achieve 'buy-in', are we talking only
>about weighting Principle 2 (conformance to other standards) higher
>than Principle 1 (remaining true to the vision of CIF syntax in the
>Preamble) in order to achieve buy-in?  Or are we talking about making
>arbitrary syntactical changes in return for 'buy-in'?  And what do we
>get in return for this willingness on our part to compromise our
>consistency and coherency?  A promise to think about using CIF?  An
>undertaking to write software for CIF?  A statement from that
>stakeholder that CIF is a fine standard?  And how on earth can anybody
>in science speak on behalf of everybody in their field in anything but
>a speculative manner anyway?  Let us remember that it is one thing to
>achieve 'buy-in' from a company that is paying for a software
>engineering project (where cash on the table is a tangible result of
>buy-in) and another to get a whole field of science to move in the
>same direction (particularly if most of them don't care one way or the
>other about CIF syntax).
>
>So I conclude that it would be terribly counterproductive to include
>'buy-in' in these principles.  Instead I propose a 4th principle
>(which is simply a restatement of accepted COMCIFS procedure) to
>address any concerns that serious objections from CIF stakeholders
>will be ignored:
>
>===============
>4.  Draft syntax changes will be made available on the IUCr website
>for public comment for a period of at least 6 weeks, following which
>COMCIFS voting members, after consideration of any objections raised,
>can vote to accept the change. A syntax change will be accepted if 3/4
>of COMCIFS voting members approve it.
>===============
>
>Insofar as the voting members represent, and can seek advice from, a
>broad cross-section of CIF stakeholders, their vote should be a
>sufficient reflection of the desirability of any given change.
>
>Considerable time has gone by without any further responses on this
>topic, so we may safely assume that those who are interested in this
>issue have commented.  It remains to converge on an agreed position
>and then to tidy up the wording, following which we will apply these
>principles to what I hope is the last syntax change for a while, the
>'elide question'.
>
>James.
>
>On Sat, Mar 5, 2011 at 1:20 AM, Herbert J. Bernstein
><yaya at bernstein-plus-sons.com> wrote:
>>  Dear Colleagues,
>>
>>     I very much agree with the concept of clearly established principles to
>>  help guide our discussions.  To summarize James's list as amended by Peter,
>>  what he has proposed as guiding principles are:
>>
>>  ====================================================================
>>
>>  Principles guiding development of CIF syntax
>>  -----------------------------------------------------------------
>>
>>  Preamble: The CIF syntax describes a human-readable, syntactic
>>  container for scientific data.  CIF syntax aims to be as simple as
>>  possible.  The domain dictionaries are the primary location of
>>  semantic information in the Crystallographic Information Framework.
>>
>>  1. A feature should only be added to CIF syntax if all of the
>>  following are satisfied:
>>
>>  (i) implementation or use of equivalent behaviour at dictionary level
>>  is either significantly more cumbersome or not possible;
>>  (ii) the feature provides significant new functionality that is widely
>>  applicable to most scientific domains
>>  (iii) reliable transfer and archiving of data is not compromised
>>  (iv) there is no simpler way of achieving the desired behaviour
>>  (v) a feature should only be added if it has been shown possible to
>>  implement it with "reasonable ease," i.e. provides a benefit
>>  worth the cost, and there is a "rough consensus and running code"
>>
>>  2. As long as the requirements in (1) are satisfied, the CIF framework should:
>>    (i) behave in a way that is consistent with common usage
>>    (ii) align with pre-existing standards where those standards provide
>>  the required behaviour. CIF1 can be considered a pre-existing standard
>>  for CIF2 in this context.
>>
>>  3. Non-technical issues should be dealt with in non-technical arenas.
>>
>>  (End)
>>
>>  ====================================================================
>>
>>  Before turning to the lower numbered principles, I would suggest we
>>  discuss item 3, because I believe it conflicts with item 1, which
>>  discusses features in terms of being "cumbersome," "significant,"
>>  being "simpler," "desired behavior," "reasonable ease," and "rough
>>  consensus," all of which have strong psychological and other
>>  human factors components that may be difficult to quantify
>>  and make into "technical issues."  In software engineering, my
>>  own field, there is a long history of failure in trying to make
>>  the systems design process into something purely technical.  On
>>  purely technical grounds, we would never have accepted C or Python
>>  as suitable programming languages for serious work.  Everything
>>  important from operating systems to dREL would be based on Pascal.
>>  That was the technical consensus of the Computer Science world of
>>  a few decades ago.  Technically, Pascal is a much better language
>>  than C or Python.  It just happens that real people produce better,
>>  more reliable code with C and with Python than with Pascal, because,
>>  for reasons that are still not clearly understood, they get less
>>  confused and make fewer mistakes when working with those languages.
>>  Precisely because we don't fully understand the non-technical issues
>>  in the design of information systems (and, I suspect, in almost all
>>  systems), one of the accepted principles of software engineering
>>  is to clearly identify all stakeholders, bring them into the discussion
>>  and work to achieve their "buy-in".  Therefore I propose that we
>>  replace principle 3 with
>>
>>  ======================================================================
>>
>>  3.  The stakeholders impacted by any change should be clearly
>>  identified and the proposed changes should be fully and openly
>>  discussed with them in an effort to achieve their buy-in to the change,
>>  and the change should not go forward in absence of such buy-in
>>  absent pressing technical reasons for making the change over
>>  such objections.
>>
>>  ======================================================================
>>
>>  In principle 2, we reference CIF1.  I believe that should be CIF1.1.
>>
>>  Now I would like to turn to principle 1.(ii):  the feature provides
>>  significant new functionality that is widely applicable to most
>>  scientific domains
>>
>>  This principle would prevent CIF from having features which
>>  support any one scientific domain.  Under this principle, we never
>>  would have had DDL2 and mmCIF, nor imgCIF.  I would suggest changing
>>  this principle to
>>
>>  ======================================================================
>>
>>  1.(ii).  the feature provides significant new functionality for
>>  some scientific application domain, and does not interfere with
>>  the use of CIF in other scientific application domains.
>>
>>  ======================================================================
>>
>>  Finally, let us consider principle 1.(i): implementation or use of
>>  equivalent behaviour at dictionary level is either significantly more
>>  cumbersome or not possible;
>>
>>  Depending on how we interpret the non-technical word "cumbersome", this
>>  may create the impression that we will require all uses of CIF
>>  to require use of dictionaries.  I would suggest instead:
>>
>>  ======================================================================
>>
>>
>>  1.(i): If it is feasible to implement the desired behavior by
>>  specification of changes to dictionaries rather than to CIF syntax,
>>  that alternative should be seriously considered and balanced against
>>  the human-readability of the resulting CIFs without reference to
>>  dictionaries;
>>
>>  ======================================================================
>>
>>  Thus the revised principles I would suggest would be:
>>
>>  ====================================================================
>>
>>  Principles guiding development of CIF syntax
>>  -----------------------------------------------------------------
>>
>>  Preamble: The CIF syntax describes a human-readable, syntactic
>>  container for scientific data.  CIF syntax aims to be as simple as
>>  possible.  The domain dictionaries are the primary location of
>>  semantic information in the Crystallographic Information Framework.
>>
>>  1. A feature should only be added to CIF syntax if all of the
>>  following are satisfied:
>>
>>  (i): If it is feasible to implement the desired behavior by
>>  specification of changes to dictionaries rather than to CIF syntax,
>>  that alternative should be seriously considered and balanced against
>>  the human-readability of the resulting CIFs without reference to
>>  dictionaries;
>>  (ii).  the feature provides significant new functionality for
>>  some scientific application domain, and does not interfere with
>>  the use of CIF in other scientific application domains.
>>  (iii) reliable transfer and archiving of data is not compromised
>>  (iv) there is no simpler way of achieving the desired behaviour
>>  (v) a feature should only be added if it has been shown possible to
>>  implement it with "reasonable ease," i.e. provides a benefit
>>  worth the cost, and there is a "rough consensus and running code"
>>
>>  2. As long as the requirements in (1) are satisfied, the CIF framework should:
>>    (i) behave in a way that is consistent with common usage
>>    (ii) align with pre-existing standards where those standards provide
>>  the required behaviour. CIF1.1 can be considered a pre-existing standard
>>  for CIF2 in this context.
>>
>>  3.  The stakeholders impacted by any change should be clearly
>>  identified and the proposed changes should be fully and openly
>>  discussed with them in an effort to achieve their buy-in to the change,
>>  and the change should not go forward in absence of such buy-in
>>  absent pressing technical reasons for making the change over
>>  such objections.
>>
>>  (End)
>>
>>  ====================================================================
>>
>>
>>
>>
>>
>>  Regards,
>>    Herbert
>>
>>
>>
>>  At 10:47 PM +1100 3/4/11, James Hester wrote:
>>>Thanks Peter for your comments.  While you may not be a voting member
>>>of COMCIFS, you and other COMCIFS members fulfill an important
>>>advisory role and I would encourage everybody to take the opportunity
>>>to provide their perspectives.
>>>
>>>I assume you have no particular disagreement with the principles that
>>>you haven't commented on explicitly?
>>>
>>>I've added some comments in response to your comments, inserted below:
>>>
>>>On Fri, Mar 4, 2011 at 7:25 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>>>   I add some comments arising out of my own experience with XML/CML which may
>>>>   be useful. I don't think I am a full member of COMCIFS so feel free to
>>>>   ignore all or any. I comment after significant paragraphs.
>>>>
>>>>   On Fri, Mar 4, 2011 at 6:03 AM, James Hester <jamesrhester at gmail.com> wrote:
>>>>>
>>>>>   1. A feature should only be added to CIF syntax if all of the
>>>>>   following are satisfied:
>>>>>
>>>>>   (i) implementation or use of equivalent behaviour at dictionary level
>>>>>   is either significantly more cumbersome or not possible;
>>>>>   (ii) the feature provides significant new functionality that is widely
>>>>>   applicable to most scientific domains
>>>>>   (iii) reliable transfer and archiving of data is not compromised
>>>>>   (iv) there is no simpler way of achieving the desired behaviour
>>>>>
>>>>   I would add:
>>>>   * a feature should only be added if it has been shown possible to implement
>>>>   it with "reasonable ease". "Rough consensus and running code"
>>>
>>>I agree that this is a reasonable requirement.  I would express it in
>>>terms of cost/benefit, so something with a significant benefit would
>>>justify extra effort.
>>>
>>>>>
>>>>>   Example 2: Unicode support in CIF2.  This is broadly useful, given the
>>>>>   international nature of science and range of symbols used in
>>>>>   scientific papers.  It could have been implemented in dictionaries
>>>>>   using ASCII escapes, but this would have been cumbersome to use, so it
>>>>>   satisfies Principle 1.  We have adopted Unicode (rather than created
>>>>>   our own international character set) and copied the XML character
>>>>>   ranges (Principle 2)
>>>>
>>>>   I found the original ASCII escapes difficult/tedious for some code points
>>>>   and would urge full Unicode support (with numeric values).
>>>
>>>I perhaps wasn't clear that we have already taken this step.  The
>>>current CIF2 draft envisions full Unicode support using UTF-8
>>>encoding.  Some provision has been made for allowing other encodings
>>>in the future.  The point of the example was to show how this decision
>>>to adopt Unicode was justifiable in terms of these principles.
>>>
>>>[rest edited out]
>>>--
>>>T +61 (02) 9717 9907
>>>F +61 (02) 9717 3145
>>>M +61 (04) 0249 4148
>>>_______________________________________________
>>>comcifs mailing list
>>>comcifs at iucr.org
>>>http://scripts.iucr.org/mailman/listinfo/comcifs
>>
>>
>>  --
>>  =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   yaya at dowling.edu
>>  =====================================================
>>
>
>
>
>--
>T +61 (02) 9717 9907
>F +61 (02) 9717 3145
>M +61 (04) 0249 4148


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya at dowling.edu
=====================================================

