Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains

James Hester jamesrhester at gmail.com
Tue Mar 15 03:57:34 GMT 2011


Herbert has proposed a revised version of the principles for syntax
development I put forward.  Starting from the lower-numbered ones:

(Herbert suggests)
> ======================================================================
>
>
> 1.(i): If it is feasible to implement the desired behavior by
> specification of changes to dictionaries rather than to CIF syntax,
> that alternative should be seriously considered and balanced against
> the human-readability of the resulting CIFs without reference to
> dictionaries;
>
> ======================================================================

Original text:

> (i) implementation or use of equivalent behaviour at dictionary level
> is either significantly more cumbersome or not possible;

Note that the intent of my original formulation was to move semantic
complexity to the dictionary level where this is practical, whereas
Herbert's formulation rephrases this in terms of human readability.
I have adjusted Herbert's text below to indicate that a
dictionary-based change will not always reduce human readability,
and to remove 'without reference to dictionaries', as I don't think
that qualification is relevant.  I can expand on that last point if it
is controversial.

===================================================
1.(i): If it is feasible to implement the desired behavior by
specification of changes to dictionaries rather than to CIF syntax,
that alternative is to be preferred; it should be balanced against a
syntax-based change only in those cases where a dictionary-based
change would significantly reduce human readability.
=====================================================

Next is principle 1(ii)

Original text:
>  the feature provides
> significant new functionality that is widely applicable to most
> scientific domains
>
> This principle would prevent CIF from having features which
> support any one scientific domain.  Under this principle, we never
> would have had DDL2 and mmCIF, nor imgCIF.  I would suggest changing
> this principle to
> ======================================================================
>
> 1.(ii).  the feature provides significant new functionality for
> some scientific application domain, and does not interfere with
> the use of CIF in other scientific application domains.
>
> ======================================================================

As John B has pointed out, 1(ii) applies to syntax only, so those
counterexamples are invalid.  If anything, they demonstrate the power
of the CIF1 framework and justify the approach of focusing development
work at the dictionary level.  Furthermore, there is no such thing as
a new syntactical feature that does not impact other application
domains: every new syntactical feature must be implemented by
everybody, whatever their domain, and that is an unavoidable cost.
Pending some valid counterexamples, I do not accept Herbert's proposed
alteration to 1(ii).

Finally and most importantly, Herbert suggests a completely new
Principle 3.  The original Principle 3 was intended to resolve
conflicts between Principles 1 and 2 in clear favour of Principle 1.
This original Principle 3 would ensure that the standard remained
consistent with the clear vision of the Preamble, while requiring any
issues not relating to those in Principle 1 to be addressed in
appropriate arenas.

The crux of Herbert's argument around the original Principle 3 is that
(i) Principles 1 and 2 use some words that are not entirely technical
(so belong in the realm of Principle 3), and (ii) basing design on
purely technical grounds is not a guarantee of success.  To address
these perceived flaws, Herbert proposes instead working for 'buy-in'
among stakeholders before making any changes.

In response to point (i): consider that Principles 1 and 2 describe a
framework within which syntax changes can be made.  Whether this
framework is called "technical" or "objective" or something else is
not the key point; the key point is that a syntax change must satisfy
those principles, whatever their character.  I have endeavoured to use
wording that would allow reasonable people to agree.  So Herbert's
objection (i) does not ultimately matter.

In response to (ii): I would argue that basing design on stakeholder
buy-in is also not a guarantee of success, so it is not on the face of
it superior to the approach of the original Principle 3.  Brian makes
the point that we cannot guarantee stakeholder buy-in.  I would then
take the next logical step of saying that making changes to the syntax
contingent on stakeholder buy-in would be tantamount to freezing the
standard.  John B raises all sorts of questions about just how the
level of buy-in might be assessed.

I would ask further: in order to achieve 'buy-in', are we talking only
about weighting Principle 2 (conformance to other standards) higher
than Principle 1 (remaining true to the vision of CIF syntax in the
Preamble)?  Or are we talking about making arbitrary syntactical
changes in return for 'buy-in'?  And what do we get in return for this
willingness on our part to compromise our consistency and coherence?
A promise to think about using CIF?  An undertaking to write software
for CIF?  A statement from that stakeholder that CIF is a fine
standard?  And how on earth can anybody in science speak on behalf of
everybody in their field in anything but a speculative manner anyway?
Let us remember that it is one thing to achieve 'buy-in' from a
company that is paying for a software engineering project (where cash
on the table is a tangible result of buy-in) and another to get a
whole field of science to move in the same direction (particularly if
most of them don't care one way or the other about CIF syntax).

So I conclude that it would be terribly counterproductive to include
'buy-in' in these principles.  Instead I propose a 4th principle
(which is simply a restatement of accepted COMCIFS procedure) to
address any concerns that serious objections from CIF stakeholders
will be ignored:

===============
4.  Draft syntax changes will be made available on the IUCr website
for public comment for a period of at least 6 weeks, following which
COMCIFS voting members, after consideration of any objections raised,
can vote to accept the change. A syntax change will be accepted if 3/4
of COMCIFS voting members approve it.
===============

Insofar as the voting members represent, and can seek advice from, a
broad cross-section of CIF stakeholders, their vote should be a
sufficient reflection of the desirability of any given change.

Considerable time has gone by without any further responses on this
topic, so we may safely assume that those who are interested in this
issue have commented.  It remains to converge on an agreed position
and then to tidy up the wording, following which we will apply these
principles to what I hope is the last syntax change for a while, the
'elide question'.

James.

On Sat, Mar 5, 2011 at 1:20 AM, Herbert J. Bernstein
<yaya at bernstein-plus-sons.com> wrote:
> Dear Colleagues,
>
>    I very much agree with the concept of clearly established principles to
> help guide our discussions.  To summarize James's list as amended by Peter,
> what he has proposed as guiding principles are:
>
> ====================================================================
>
> Principles guiding development of CIF syntax
> -----------------------------------------------------------------
>
> Preamble: The CIF syntax describes a human-readable, syntactic
> container for scientific data.  CIF syntax aims to be as simple as
> possible.  The domain dictionaries are the primary location of
> semantic information in the Crystallographic Information Framework.
>
> 1. A feature should only be added to CIF syntax if all of the
> following are satisfied:
>
> (i) implementation or use of equivalent behaviour at dictionary level
> is either significantly more cumbersome or not possible;
> (ii) the feature provides significant new functionality that is widely
> applicable to most scientific domains
> (iii) reliable transfer and archiving of data is not compromised
> (iv) there is no simpler way of achieving the desired behaviour
> (v) a feature should only be added if it has been shown possible to
> implement it with "reasonable ease," i.e. provides a benefit
> worth the cost, and there is a "rough consensus and running code"
>
> 2. As long as the requirements in (1) are satisfied, the CIF framework should:
>   (i) behave in a way that is consistent with common usage
>   (ii) align with pre-existing standards where those standards provide
> the required behaviour. CIF1 can be considered a pre-existing standard
> for CIF2 in this context.
>
> 3. Non-technical issues should be dealt with in non-technical arenas.
>
> (End)
>
> ====================================================================
>
> Before turning to the lower numbered principles, I would suggest we
> discuss item 3, because I believe it conflicts with item 1, which
> discusses features in terms of being "cumbersome," "significant,"
> being "simpler," "desired behavior," "reasonable ease," and "rough
> consensus," all of which have strong psychological and other
> human factors components that may be difficult to quantify
> and make into "technical issues."  In software engineering, my
> own field, there is a long history of failure in trying to make
> the systems design process into something purely technical.  On
> purely technical grounds, we would never have accepted C or Python
> as suitable programming languages for serious work.  Everything
> important from operating systems to dREL would be based on Pascal.
> That was the technical consensus of the Computer Science world of
> a few decades ago.  Technically, Pascal is a much better language
> than C or Python.  It just happens that real people produce better,
> more reliable code with C and with Python than with Pascal, because,
> for reasons that are still not clearly understood, they get less
> confused and make fewer mistakes when working with those languages.
> Precisely because we don't fully understand the non-technical issues
> in the design of information systems (and, I suspect, in almost all
> systems), one of the accepted principles of software engineering
> is to clearly identify all stakeholders, bring them into the discussion
> and work to achieve their "buy-in".  Therefore I propose that we
> replace principle 3 with
>
> ======================================================================
>
> 3.  The stakeholders impacted by any change should be clearly
> identified and the proposed changes should be fully and openly
> discussed with them in an effort to achieve their buy-in to the change,
> and the change should not go forward in absence of such buy-in
> absent pressing technical reasons for making the change over
> such objections.
>
> ======================================================================
>
> In principle 2, we reference CIF1.  I believe that should be CIF1.1.
>
> Now I would like to turn to principle 1.(ii):  the feature provides
> significant new functionality that is widely applicable to most
> scientific domains
>
> This principle would prevent CIF from having features which
> support any one scientific domain.  Under this principle, we never
> would have had DDL2 and mmCIF, nor imgCIF.  I would suggest changing
> this principle to
>
> ======================================================================
>
> 1.(ii).  the feature provides significant new functionality for
> some scientific application domain, and does not interfere with
> the use of CIF in other scientific application domains.
>
> ======================================================================
>
> Finally, let us consider principle 1.(i): implementation or use of
> equivalent behaviour at dictionary level is either significantly more
> cumbersome or not possible;
>
> Depending on how we interpret the non-technical word "cumbersome", this
> may create the impression that we will require all uses of CIF
> to require use of dictionaries.  I would suggest instead:
>
> ======================================================================
>
>
> 1.(i): If it is feasible to implement the desired behavior by
> specification of changes to dictionaries rather than to CIF syntax,
> that alternative should be seriously considered and balanced against
> the human-readability of the resulting CIFs without reference to
> dictionaries;
>
> ======================================================================
>
> Thus the revised principles I would suggest would be:
>
> ====================================================================
>
> Principles guiding development of CIF syntax
> -----------------------------------------------------------------
>
> Preamble: The CIF syntax describes a human-readable, syntactic
> container for scientific data.  CIF syntax aims to be as simple as
> possible.  The domain dictionaries are the primary location of
> semantic information in the Crystallographic Information Framework.
>
> 1. A feature should only be added to CIF syntax if all of the
> following are satisfied:
>
> (i): If it is feasible to implement the desired behavior by
> specification of changes to dictionaries rather than to CIF syntax,
> that alternative should be seriously considered and balanced against
> the human-readability of the resulting CIFs without reference to
> dictionaries;
> (ii).  the feature provides significant new functionality for
> some scientific application domain, and does not interfere with
> the use of CIF in other scientific application domains.
> (iii) reliable transfer and archiving of data is not compromised
> (iv) there is no simpler way of achieving the desired behaviour
> (v) a feature should only be added if it has been shown possible to
> implement it with "reasonable ease," i.e. provides a benefit
> worth the cost, and there is a "rough consensus and running code"
>
> 2. As long as the requirements in (1) are satisfied, the CIF framework should:
>   (i) behave in a way that is consistent with common usage
>   (ii) align with pre-existing standards where those standards provide
> the required behaviour. CIF1.1 can be considered a pre-existing standard
> for CIF2 in this context.
>
> 3.  The stakeholders impacted by any change should be clearly
> identified and the proposed changes should be fully and openly
> discussed with them in an effort to achieve their buy-in to the change,
> and the change should not go forward in absence of such buy-in
> absent pressing technical reasons for making the change over
> such objections.
>
> (End)
>
> ====================================================================
>
>
>
>
>
> Regards,
>   Herbert
>
>
>
> At 10:47 PM +1100 3/4/11, James Hester wrote:
>>Thanks Peter for your comments.  While you may not be a voting member
>>of COMCIFS, you and other COMCIFS members fulfill an important
>>advisory role and I would encourage everybody to take the opportunity
>>to provide their perspectives.
>>
>>I assume you have no particular disagreement with the principles that
>>you haven't commented on explicitly?
>>
>>I've added some comments in response to your comments, inserted below:
>>
>>On Fri, Mar 4, 2011 at 7:25 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>>  I add some comments arising out of my own experience with XML/CML which may
>>>  be useful. I don't think I am a full member of COMCIFS so feel free to
>>>  ignore all or any. I comment after significant paragraphs.
>>>
>>>  On Fri, Mar 4, 2011 at 6:03 AM, James Hester <jamesrhester at gmail.com> wrote:
>>>>
>>>>  1. A feature should only be added to CIF syntax if all of the
>>>>  following are satisfied:
>>>>
>>>>  (i) implementation or use of equivalent behaviour at dictionary level
>>>>  is either significantly more cumbersome or not possible;
>>>>  (ii) the feature provides significant new functionality that is widely
>>>>  applicable to most scientific domains
>>>>  (iii) reliable transfer and archiving of data is not compromised
>>>>  (iv) there is no simpler way of achieving the desired behaviour
>>>>
>>>  I would add:
>>>  * a feature should only be added if it has been shown possible to implement
>>>  it with "reasonable ease". "Rough consensus and running code"
>>
>>I agree that this is a reasonable requirement.  I would express it in
>>terms of cost/benefit, so something with a significant benefit would
>>justify extra effort.
>>
>>>>
>>>>  Example 2: Unicode support in CIF2.  This is broadly useful, given the
>>>>  international nature of science and range of symbols used in
>>>>  scientific papers.  It could have been implemented in dictionaries
>>>>  using ASCII escapes, but this would have been cumbersome to use, so it
>>>>  satisfies Principle 1.  We have adopted Unicode (rather than created
>>>>  our own international character set) and copied the XML character
>>>>  ranges (Principle 2)
>>>
>>>  I found the original ASCII escapes difficult/tedious for some code points
>>>  and would urge full Unicode support (with numeric values).
>>
>>I perhaps wasn't clear that we have already taken this step.  The
>>current CIF2 draft envisions full Unicode support using UTF-8
>>encoding.  Some provision has been made for allowing other encodings
>>in the future.  The point of the example was to show how this decision
>>to adopt Unicode was justifiable in terms of these principles.
>>
>>[rest edited out]
>>--
>>T +61 (02) 9717 9907
>>F +61 (02) 9717 3145
>>M +61 (04) 0249 4148
>>_______________________________________________
>>comcifs mailing list
>>comcifs at iucr.org
>>http://scripts.iucr.org/mailman/listinfo/comcifs
>
>
> --
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                  +1-631-244-3035
>                  yaya at dowling.edu
> =====================================================
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

