Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains

Herbert J. Bernstein yaya at bernstein-plus-sons.com
Fri Mar 4 14:20:43 GMT 2011


Dear Colleagues,

    I very much agree with the concept of clearly established principles to
help guide our discussions.  To summarize James's list as amended by Peter,
the guiding principles he has proposed are:

====================================================================

Principles guiding development of CIF syntax
-----------------------------------------------------------------

Preamble: The CIF syntax describes a human-readable, syntactic
container for scientific data.  CIF syntax aims to be as simple as
possible.  The domain dictionaries are the primary location of
semantic information in the Crystallographic Information Framework.

1. A feature should only be added to CIF syntax if all of the
following are satisfied:

(i) implementation or use of equivalent behaviour at dictionary level
is either significantly more cumbersome or not possible;
(ii) the feature provides significant new functionality that is widely
applicable to most scientific domains
(iii) reliable transfer and archiving of data is not compromised
(iv) there is no simpler way of achieving the desired behaviour
(v) a feature should only be added if it has been shown possible to
implement it with "reasonable ease," i.e. provides a benefit
worth the cost, and there is a "rough consensus and running code"

2. As long as the requirements in (1) are satisfied, the CIF framework should:
   (i) behave in a way that is consistent with common usage
   (ii) align with pre-existing standards where those standards provide
the required behaviour. CIF1 can be considered a pre-existing standard
for CIF2 in this context.

3. Non-technical issues should be dealt with in non-technical arenas.

(End)

====================================================================

Before turning to the lower numbered principles, I would suggest we
discuss item 3, because I believe it conflicts with item 1, which
describes features in terms such as "cumbersome," "significant,"
"simpler," "desired behavior," "reasonable ease," and "rough
consensus," all of which have strong psychological and other
human factors components that may be difficult to quantify
and make into "technical issues."  In software engineering, my
own field, there is a long history of failure in trying to make
the systems design process into something purely technical.  On
purely technical grounds, we would never have accepted C or Python
as suitable programming languages for serious work.  Everything
important from operating systems to dREL would be based on Pascal.
That was the technical consensus of the Computer Science world of
a few decades ago.  Technically, Pascal is a much better language
than C or Python.  It just happens that real people produce better,
more reliable code with C and with Python than with Pascal, because,
for reasons that are still not clearly understood, they get less
confused and make fewer mistakes when working with those languages.
Precisely because we don't fully understand the non-technical issues
in the design of information systems (and, I suspect, in almost all
systems), one of the accepted principles of software engineering
is to clearly identify all stakeholders, bring them into the discussion
and work to achieve their "buy-in".  Therefore I propose that we
replace principle 3 with

======================================================================

3.  The stakeholders impacted by any change should be clearly
identified and the proposed changes should be fully and openly
discussed with them in an effort to achieve their buy-in to the
change, and the change should not go forward without such buy-in
unless there are pressing technical reasons for making the change
over such objections.

======================================================================

In principle 2, we reference CIF1.  I believe that should be CIF1.1.

Now I would like to turn to principle 1.(ii):  the feature provides
significant new functionality that is widely applicable to most
scientific domains

This principle would prevent CIF from having features which
support any one scientific domain.  Under this principle, we never
would have had DDL2 and mmCIF, nor imgCIF.  I would suggest changing
this principle to:

======================================================================

1.(ii).  the feature provides significant new functionality for
some scientific application domain, and does not interfere with
the use of CIF in other scientific application domains.

======================================================================

Finally, let us consider principle 1.(i): implementation or use of
equivalent behaviour at dictionary level is either significantly more
cumbersome or not possible;

Depending on how we interpret the non-technical word "cumbersome", this
may create the impression that all uses of CIF will require the use
of dictionaries.  I would suggest instead:

======================================================================


1.(i): If it is feasible to implement the desired behavior by
specification of changes to dictionaries rather than to CIF syntax,
that alternative should be seriously considered and balanced against
the human-readability of the resulting CIFs without reference to
dictionaries;

======================================================================

Thus the revised principles I would suggest would be:

====================================================================

Principles guiding development of CIF syntax
-----------------------------------------------------------------

Preamble: The CIF syntax describes a human-readable, syntactic
container for scientific data.  CIF syntax aims to be as simple as
possible.  The domain dictionaries are the primary location of
semantic information in the Crystallographic Information Framework.

1. A feature should only be added to CIF syntax if all of the
following are satisfied:

(i) if it is feasible to implement the desired behavior by
specification of changes to dictionaries rather than to CIF syntax,
that alternative should be seriously considered and balanced against
the human-readability of the resulting CIFs without reference to
dictionaries;
(ii) the feature provides significant new functionality for
some scientific application domain, and does not interfere with
the use of CIF in other scientific application domains
(iii) reliable transfer and archiving of data is not compromised
(iv) there is no simpler way of achieving the desired behaviour
(v) a feature should only be added if it has been shown possible to
implement it with "reasonable ease," i.e. provides a benefit
worth the cost, and there is a "rough consensus and running code"

2. As long as the requirements in (1) are satisfied, the CIF framework should:
   (i) behave in a way that is consistent with common usage
   (ii) align with pre-existing standards where those standards provide
the required behaviour. CIF1.1 can be considered a pre-existing standard
for CIF2 in this context.

3.  The stakeholders impacted by any change should be clearly
identified and the proposed changes should be fully and openly
discussed with them in an effort to achieve their buy-in to the
change, and the change should not go forward without such buy-in
unless there are pressing technical reasons for making the change
over such objections.

(End)

====================================================================





Regards,
   Herbert



At 10:47 PM +1100 3/4/11, James Hester wrote:
>Thanks Peter for your comments.  While you may not be a voting member
>of COMCIFS, you and other COMCIFS members fulfill an important
>advisory role and I would encourage everybody to take the opportunity
>to provide their perspectives.
>
>I assume you have no particular disagreement with the principles that
>you haven't commented on explicitly?
>
>I've added some comments in response to your comments, inserted below:
>
>On Fri, Mar 4, 2011 at 7:25 PM, Peter Murray-Rust <pm286 at cam.ac.uk> wrote:
>>  I add some comments arising out of my own experience with XML/CML which may
>>  be useful. I don't think I am a full member of COMCIFS so feel free to
>>  ignore all or any. I comment after significant paragraphs.
>>
>>  On Fri, Mar 4, 2011 at 6:03 AM, James Hester <jamesrhester at gmail.com> wrote:
>>>
>>>  1. A feature should only be added to CIF syntax if all of the
>>>  following are satisfied:
>>>
>>>  (i) implementation or use of equivalent behaviour at dictionary level
>>>  is either significantly more cumbersome or not possible;
>>>  (ii) the feature provides significant new functionality that is widely
>>>  applicable to most scientific domains
>>>  (iii) reliable transfer and archiving of data is not compromised
>>>  (iv) there is no simpler way of achieving the desired behaviour
>>>
>>  I would add:
>>  * a feature should only be added if it has been shown possible to implement
>>  it with "reasonable ease". "Rough consensus and running code"
>
>I agree that this is a reasonable requirement.  I would express it in
>terms of cost/benefit, so something with a significant benefit would
>justify extra effort.
>
>>>
>>>  Example 2: Unicode support in CIF2.  This is broadly useful, given the
>>>  international nature of science and range of symbols used in
>>>  scientific papers.  It could have been implemented in dictionaries
>>>  using ASCII escapes, but this would have been cumbersome to use, so it
>>>  satisfies Principle 1.  We have adopted Unicode (rather than created
>>>  our own international character set) and copied the XML character
>>>  ranges (Principle 2)
>>
>>  I found the original ASCII escapes difficult/tedious for some code points
>>  and would urge full Unicode support (with numeric values).
>
>I perhaps wasn't clear that we have already taken this step.  The
>current CIF2 draft envisions full Unicode support using UTF-8
>encoding.  Some provision has been made for allowing other encodings
>in the future.  The point of the example was to show how this decision
>to adopt Unicode was justifiable in terms of these principles.
>
>[rest edited out]
>--
>T +61 (02) 9717 9907
>F +61 (02) 9717 3145
>M +61 (04) 0249 4148


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya at dowling.edu
=====================================================

