Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains
Herbert J. Bernstein
yaya at bernstein-plus-sons.com
Tue Mar 15 16:59:12 GMT 2011
I would suggest that people review Brian's excellent common
semantic features document for CIF 1.1. I think keeping that
sort of semantic decision coupled to the syntax decisions for
CIF has worked well, and I do not think the sharp departure
now proposed for handling CIF2 will work as well, for the
reasons I stated previously. It ain't broke. Why are
we fixing it? New features involve a mix of syntax and
semantics, depending on the feature. I believe we should
be focusing on features rather than the bin within which
they fit for presentation purposes.
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
yaya at dowling.edu
On Tue, 15 Mar 2011, Bollinger, John C wrote:
> On Tuesday, March 15, 2011 6:39 AM, Herbert J. Bernstein wrote:
>> My apologies. This may take a while. To avoid critical points getting
>> lost, I'd like to focus on one sub-issue at a time, starting with the
>> divide between syntax and semantics. If we isolate all syntax development
>> from semantics and relegate all semantics to the dictionaries, CIF becomes
>> something very different from what it has been in the past, through
>> CIF1.1. CIF would be confined purely to considerations of which strings
>> of characters are valid.
> I agree that syntax cannot be wholly divorced from semantics, but I don't think anyone suggested such a split. James's revised principles merely express a bias towards standardizing "behavior" via dictionaries vs. via the base syntax. This is consistent with usage of CIF 1.1. In any case, it is the *syntax* specification that is currently the focus of attention, and it is feasible for that document to be limited to exactly the scope Herbert describes, provided that it is accompanied by a companion document specifying the needed base semantics.
>> The dictionaries would deal with such issues
>> as whether the numeric strings 13.45 and 1.345E1 are equivalent. All
>> the "common semantic features" of CIF 1.1 would have to be replicated
>> dictionary by dictionary and no longer would have to be common. The
>> relationships between CIFs and their dictionaries now specified by
>> the DDLs as part of CIF would have to be moved down purely to dictionary
>> development, and instead of just having DDL1, DDL2 and DDLm, we could have
>> one or more flavors of DDL for each subdomain using CIF, or even one per
>> data file.
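To make the numeric example in the paragraph above concrete, here is a minimal Python sketch of the difference between comparing the two strings as characters and comparing them as numbers. The function name `same_number` is hypothetical, and using Python's `decimal` module is an assumption made for illustration; nothing here is taken from a CIF specification or dictionary.

```python
from decimal import Decimal, InvalidOperation


def same_number(a: str, b: str) -> bool:
    """Compare two strings as numbers (hypothetical helper).

    Falls back to plain string comparison when either string
    cannot be read as a decimal number.
    """
    try:
        return Decimal(a) == Decimal(b)
    except InvalidOperation:
        return a == b


# As raw character strings the two values differ ...
print("13.45" == "1.345E1")
# ... but under a common numeric interpretation they are equal.
print(same_number("13.45", "1.345E1"))
```

Whether a rule of this kind lives in the base specification or in each dictionary is exactly the policy question under discussion; the sketch only shows what such a rule has to decide.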
> I confess I don't follow the latter part of those comments, but I am confident at least that DDLs will not proliferate. DDLm may well not be the last DDL, but a new DDL requires too much investment and infrastructure to be created casually. Moreover, I don't see why new DDLs would be needed to support the kinds of semantics that might be considered for inclusion in the base CIF specifications.
>> I, for one, think that the divide used in the past, in which as much
>> as possible of the common semantics was treated along with the raw
>> syntax, was a very useful approach and helped to reduce the drift
>> of CIF into multiple dialects, and that we should consider all proposed
>> features in terms of their total impact on the use of CIF, not just
>> in terms of the validity or invalidity of particular strings.
> The CIF syntax specification must document how to express logical, base CIF structure and content in concrete electronic form. By "logical, base CIF structure and content" I mean:
> 1) The logical structure of CIF, consisting of data blocks, save frames, loops, data names, and data values
> 2) Logical data block names, save frame names, and data names, consisting of limited-length sequences of abstract characters (in the Unicode sense of the term), excluding certain characters.
> 3) A small set of base data types that values may have. At minimum, these would be just character and null, but I think it useful to include a base numeric type as well, as Herbert's comments suggest he also does.
> 4) The properties of values of each base type. For example, at the logical level, values of character type consist of a sequence of any number of arbitrary abstract characters (Unicode sense). (Or are some characters excluded at this level?) Following CIF 1.x, values of numeric type might be arbitrary-precision, arbitrary-scale floating point numbers with an optional associated standard uncertainty.
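As an illustration of item 4, the following Python sketch models a base numeric value with an optional standard uncertainty. It assumes the familiar CIF 1.x convention of writing the uncertainty in parentheses, applied to the last quoted digits (for example `1.345(2)`), and for brevity it deliberately omits exponent forms. The names `Numeric` and `parse_numeric` are hypothetical and not part of any CIF document.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional


class Numeric(NamedTuple):
    """Hypothetical model of a base numeric value."""
    value: Decimal
    su: Optional[Decimal]  # standard uncertainty, if one was given


# Simplified pattern: optional sign, plain decimal digits, optional "(digits)".
# Exponent forms are intentionally not handled in this sketch.
_PATTERN = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+))(?:\((\d+)\))?$")


def parse_numeric(text: str) -> Numeric:
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a numeric value in this sketch: {text!r}")
    value = Decimal(match.group(1))
    su = None
    if match.group(2) is not None:
        # Scale the uncertainty digits to the last decimal place of the value.
        last_place = value.as_tuple().exponent
        su = Decimal(match.group(2)).scaleb(last_place)
    return Numeric(value, su)


print(parse_numeric("1.345(2)"))
print(parse_numeric("13.45"))
```

Using `Decimal` rather than a binary float keeps the precision and scale that were actually written, which is one way to approximate the "arbitrary-precision, arbitrary-scale" property described above.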
> Supporting that model covers most of the ground that the CIF 1.1 "Common Semantic Features" document does, thus cleaving closely to "the divide used in the past". For example, it follows from a mandate to support that model that the base CIF 2.0 specifications should provide, among other things,
> a) either unlimited line length or some means of line-folding for data values
> b) a means to express all characters allowed in logical data values, data block codes, save frame codes, and data names
> c) a means to express logical data values that contain all data value delimiters employed by the syntax
> It does not follow, however, that every or even several reasonable means of addressing each of those needs should be included in the base specifications. It also does not follow that alternative means of addressing them cannot or should not be defined in dictionaries.
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
> Email Disclaimer: www.stjude.org/emaildisclaimer