Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains.

James Hester jamesrhester at gmail.com
Fri Mar 18 04:13:51 GMT 2011


I will be somewhat terse in my reply due to time constraints.  I will
not address the many ideas raised in Herbert's revised preamble unless
we can get to some agreement on the principles.

What I have agreed with is that there is an in-principle need for a
"common semantics" document that is logically separate from the syntax
document.  A key issue I have with Herbert's completely revised
formulation below is that it seeks to bundle "common semantics" into
the syntax document.  I have deliberately separated out semantics from
syntax for clarity of specification and discussion.  Putting them back
together in the guidelines simply complicates our task for no reason
that I can discern.

Note that having separate specifications for syntax and common
semantics does not mean that programmers are required to separate
syntax and semantics when writing applications, or that educational
materials are required to do this.

A further issue with Herbert's new formulation is that the role of the
DDL is either ignored or conflated with "common semantics".  While
some DDL changes may require changes to core syntax, there is a vast
amount of semantics that can be defined within the confines of a DDL,
with no need to appeal to "common semantics" or changes in syntax.
Therefore the new principle 1(b) below needs, at the very least, to be
redrafted to include the possibility of using DDL-based mechanisms to
implement the required semantics - most obviously, by adding a new
type in com_val.dic in the case of DDLm.

I will close by asking Herbert: Do you see the approach of having
separate syntax and common semantics documents as being undesirable?
If so, why?  If you assert that syntax and semantics are inseparable,
can you give an example from CIF1.1 to make your point?

On Wed, Mar 16, 2011 at 11:35 PM, Herbert J. Bernstein
<yaya at bernstein-plus-sons.com> wrote:
> Dear James,
>
>  I am glad we are getting closer, but now please consider what
> you have written, and what it really means in practical terms:
>
>> Preamble: The CIF syntax describes a human-readable, syntactic
>> container for scientific data.
>
> The word syntactic is misplaced here, and the "human-readable"
> constraint was lost years ago with the creation of mmCIF.  As
> we have just agreed, the semantics is an important part of the language,

Human-readability has in no way been lost.  In the context of
syntax, what I mean by human-readability refers to:
(i) use of space-separated tokens (like words in sentences);
(ii) access to file contents using generic text-editing tools;
(iii) ability for a human reader to immediately understand their
location within a CIF datastructure (i.e. in a loop, reading a value,
reading a dataname etc.)

In the context of syntax, human-readability does *not* refer to the
contents of datanames or datavalues.  That would be a semantic
concern, and as Herbert points out, some datavalues are not
intelligible to the human reader.
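
To illustrate points (i) and (iii), here is a minimal Python sketch. It
assumes a deliberately simplified fragment: no quoted strings, no
semicolon-delimited text fields and no comments, so it is not a
conforming CIF parser. The function name classify and the sample values
are hypothetical, chosen only for illustration. It shows that splitting
on whitespace and looking at each token's leading characters is enough
to tell where one is in the structure.

```python
# Illustrative only: a deliberately simplified reader for a tiny CIF fragment.
# It ignores quoted strings, semicolon-delimited text fields and comments,
# so it is NOT a conforming CIF parser.

SAMPLE = """\
data_example
_cell_length_a 5.431
loop_
_atom_site_label
_atom_site_fract_x
Si1 0.0
Si2 0.25
"""


def classify(text):
    """Return (token, role) pairs for whitespace-separated tokens."""
    roles = []
    in_loop_header = False  # True while reading datanames right after loop_
    in_loop_body = False    # True while reading values belonging to a loop
    for token in text.split():
        lower = token.lower()
        if lower.startswith("data_"):
            role = "block header"
            in_loop_header = in_loop_body = False
        elif lower == "loop_":
            role = "loop start"
            in_loop_header, in_loop_body = True, False
        elif token.startswith("_"):
            # A dataname seen after loop values ends that loop.
            in_loop_body = False
            role = "loop dataname" if in_loop_header else "dataname"
        else:
            # The first value after the loop datanames starts the loop body.
            if in_loop_header:
                in_loop_header, in_loop_body = False, True
            role = "loop value" if in_loop_body else "value"
        roles.append((token, role))
    return roles


if __name__ == "__main__":
    for token, role in classify(SAMPLE):
        print(f"{token:20s} {role}")
```

Nothing in this sketch looks at what the datanames or datavalues mean;
it only uses token boundaries and leading characters, which is the
sense of human-readability intended at the syntax level.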

We have never disagreed that semantics is important to CIF.  For
anyone to think otherwise would be bizarre.  What we recently agreed
was that there was an in-principle need for a 'common semantic
features' document.

> Also, in practice, one of the most important contributions
> of CIF to our science has been the controlled vocabulary it has
> provided, independent of the form of expression:  tag-value, XML,
> HDF5, etc.  In addition, for the PDB, the important issue is _not_
> the human readability, but the preservation of all the essential
> information of a scientific experiment, and, if you glance through
> some Acta C entries, you will see that even for small molecules,
> the days of human readable CIFs are far behind us.   When we make a change,
> we need to bear all of that in mind.

Yes, controlled vocabulary is an important contribution.  Another
important contribution is the entire ontological framework. I do not
see that the guidelines as I've presented them impact negatively on
those areas. Furthermore, I maintain that those human-readable aspects
that I have listed above are worth preserving.  Note that I do not
even wish to exclude the possibility of an alternative representation
of CIF datastructures in binary form in future developments, but that
would necessarily be a different syntax to which the present guiding
principles are not intended to apply.

> I would recommend starting with a clearer expression of what CIF is:
>
> ===================
> CIF is a language for the management of scientific data.  It combines
> a controlled vocabulary with a simple, human-readable form of expression
> (the CIF syntax) backed by rules clarifying the meaning of the language
> (the CIF semantics).  The overarching goal of CIF is to ensure that
> the data of the relevant domains can be generated, transformed, transmitted
> and archived in ways that facilitate doing the science
> involved in ways that both serve the individual scientific domains and
> ensure that different domains can share information reliably.
> =================

This is not a clearer expression of what CIF is.  CIF is not a language
in the accepted sense of the word.  It fulfills some functions
necessary for the management of scientific data, but it does not manage
the data.  "Facilitation of science" is far too vague a term to be
useful, as different scientists will legitimately claim different
features "facilitate" their science.

(Original preamble)
>> CIF syntax aims to be as simple as
>> possible.  The domain dictionaries are the primary location of
>> semantic information in the Crystallographic Information Framework.
>> In the following, the phrase 'dictionary level' refers either to the
>> domain
>> dictionaries, the DDL language in which the domain dictionaries are
>> written, or the CIF2 common semantic features specification which
>> imposes minimum requirements on the semantics specified by dictionaries
>> and DDLs.
>
> Given those much modified goals, this next paragraph becomes an
> inappropriate strait jacket, misallocating responsibilities.  I
> would suggest we return to what the real practice has been:
>
> ============
> The CIF language tries for an appropriate balance between simplicity
> and sufficient expressive ability to meet the needs of the scientific
> domains involved, and changes to the existing syntax and common semantics
> should only be made for good reason.  If it is possible to make a
> needed change by simply defining a new term in the controlled vocabulary,
> in one of the domain dictionaries, then that option should be considered
> first, especially because the controlled vocabulary is used in
> other forms of expression, such as XML and HDF5.  This is what we
> will call a change "at the dictionary level".  However, there are
> times, e.g. with the introduction of a new dictionary definition
> language, when changes are needed in the common syntax and semantics
> that apply to all domains.
> ================

>>
>> 1. A feature should only be added to CIF syntax if all of the
>> following are satisfied:
>>
>> (i) Implementation of the desired behavior by
>> changes at the dictionary level rather than to CIF syntax
>> is not feasible, or else such changes, while feasible, would
>> significantly reduce human readability;
>
> Then I would suggest the following version of this guideline, recognizing
> the current division of labor.
>
> ==========
>
> 1. a. A feature should only be added to or changed in the common syntax
> and semantics of the CIF language if implementation of the desired behavior
> by changes in the controlled vocabulary at the dictionary level is not
> feasible, or such changes, while feasible, would make it significantly more
> difficult for either people or software systems to work with the data
> effectively than when done by a change in the vocabulary; and
>
> 1. b. A feature should only be added by changes in the common syntax
> of the CIF language if implementation of the desired behavior by
> changes in the common semantics is not feasible, or such changes, while
> feasible, would make it significantly more difficult for either people or
> software systems to work with the data effectively than when done
> by a change in the syntax.
>
> ==============
>
> -- Herbert
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya at dowling.edu
> =====================================================
>
> On Wed, 16 Mar 2011, James Hester wrote:
>
>> Hi Herbert,
>>
>> I agree that there is an in-principle need for a common semantic
>> features document, and I thank you for directing our attention to this
>> issue.  John B has suggested a 'base semantics' document to accompany
>> the 'base syntax' document.  This seems like a workable approach to
>> me, and we would call the 'base semantics' document 'common semantic
>> features' in keeping with CIF1.  I would further suggest we hold off
>> on developing the 'common semantic features' document until we have
>> finished the syntax.
>>
>> Below find a redrafted version of the Preamble and point 1(i) to make
>> the existence of the common semantic features document clear.
>>
>> ===========================================================
>> Principles guiding development of CIF syntax
>> -----------------------------------------------------------------
>>
>> Preamble: The CIF syntax describes a human-readable, syntactic
>> container for scientific data.  CIF syntax aims to be as simple as
>> possible.  The domain dictionaries are the primary location of
>> semantic information in the Crystallographic Information Framework.
>> In the following, the phrase 'dictionary level' refers either to the
>> domain
>> dictionaries, the DDL language in which the domain dictionaries are
>> written, or the CIF2 common semantic features specification which
>> imposes minimum requirements on the semantics specified by dictionaries
>> and DDLs.
>>
>> 1. A feature should only be added to CIF syntax if all of the
>> following are satisfied:
>>
>> (i) Implementation of the desired behavior by
>> changes at the dictionary level rather than to CIF syntax
>> is not feasible, or else such changes, while feasible, would
>> significantly reduce human readability;
>>
>> (end of changes)
>>
>> On Wed, Mar 16, 2011 at 11:44 AM, Herbert J. Bernstein
>> <yaya at bernstein-plus-sons.com> wrote:
>>>
>>> Dear James,
>>>
>>>  I am not objecting to Brian's document.  I think we should keep
>>> as much of it as possible for CIF2.  The only problem is that it
>>> is a "semantic" document and your policy according to you and
>>> John B. seems to want to relegate all semantic issues to the
>>> dictionaries.  It is that relegation to which I am objecting.
>>> Most features consist of both syntactic and semantic components,
>>> and I find it much less confusing to deal with a feature in
>>> its entirety than to deal with just the syntax.
>>>
>>>  Until this discussion, I had thought the intent of the dictionaries
>>> was to deal with the tag definitions particular to certain domains
>>> and that both the syntax and semantics of CIF were a global concern.
>>> I find the relegation of the semantics of CIF2 to the dictionaries
>>> surprising and recommend against it.  I want to keep Brian's
>>> document a global document.
>>>
>>>  Regards,
>>>    Herbert
>>>
>>> =====================================================
>>>  Herbert J. Bernstein, Professor of Computer Science
>>>   Dowling College, Kramer Science Center, KSC 121
>>>        Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>>                 +1-631-244-3035
>>>                 yaya at dowling.edu
>>> =====================================================
>>>
>>> On Wed, 16 Mar 2011, James Hester wrote:
>>>
>>>> Dear Herbert,
>>>>
>>>> Please explain why you think that the latest version of the guiding
>>>> principles is at variance with the 'Common Semantic Features' document
>>>> and approach.  For example, what would prevent us from adopting a
>>>> similar CSF document for CIF2?  It would help if you quoted particular
>>>> points from the guidelines in your reply.
>>>>
>>>> James.
>>>>
>>>> On Wed, Mar 16, 2011 at 3:59 AM, Herbert J. Bernstein
>>>> <yaya at bernstein-plus-sons.com> wrote:
>>>>>
>>>>> Dear Colleagues,
>>>>>
>>>>>   I would suggest that people review Brian's excellent common
>>>>> semantic features document for CIF 1.1.  I think keeping those
>>>>> sort of semantic decisions coupled to the syntax decisions for
>>>>> CIF has worked well, and I do not think the sharp departure
>>>>> now proposed for handling CIF2 will work as well for the
>>>>> reasons I stated previously.  It ain't broke.  Why are
>>>>> we fixing it?  New features involve a mix of syntax and
>>>>> semantics depending on the feature.  I believe we should
>>>>> be focusing on features rather than the bin within which
>>>>> they fit for presentation purposes.
>>>>>
>>>>>   Regards,
>>>>>      Herbert
>>>>> =====================================================
>>>>>  Herbert J. Bernstein, Professor of Computer Science
>>>>>    Dowling College, Kramer Science Center, KSC 121
>>>>>         Idle Hour Blvd, Oakdale, NY, 11769
>>>>>
>>>>>                  +1-631-244-3035
>>>>>                  yaya at dowling.edu
>>>>> =====================================================
>>>>>
>>>>
>>>>
>>>> --
>>>> T +61 (02) 9717 9907
>>>> F +61 (02) 9717 3145
>>>> M +61 (04) 0249 4148
>>>> _______________________________________________
>>>> comcifs mailing list
>>>> comcifs at iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/comcifs
>>>
>>>
>>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>
>
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

