Opinions on comments as part of the content

peter murray-rust pm286 at cam.ac.uk
Thu Mar 8 08:28:26 GMT 2007


At 01:31 08/03/2007, Herbert J. Bernstein wrote:

Thanks - this is very useful.

>In practice, CIF has one "type" of undefined value -- it is usually called
>null.  Just as the number 1 has many representations (e.g. 1.0, 1, .1e1, etc.),
>null has two representations: . and ?  From the point of view of a
>program processing submitted data they have the same meaning -- the
>user has not provided a value.  If you have a specified default, use
>it.

This is the key question. I have assumed that '.' is a mandatory 
request to retrieve the dictionary entry, extract the default value 
and substitute it. If it is not mandatory then different software 
systems will give different results from the same CIF.
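
To make the two readings concrete, here is a minimal Python sketch. The
DEFAULTS table, its single entry and both function names are hypothetical,
not taken from any real CIF dictionary or library, and the raw argument is
assumed to be the unquoted token as it appears in the file. The only point
is that a reader which substitutes defaults and one which does not will
report different values for the same CIF.

```python
from typing import Optional

# Hypothetical dictionary defaults: data name -> default value.
# Illustrative only, not copied from a real CIF dictionary.
DEFAULTS = {"_foo": "foo_default"}


def resolve_strict(name: str, raw: str) -> Optional[str]:
    """Treat '.' as a mandatory request for the dictionary default."""
    if raw == "?":
        return None                  # unknown: nothing to substitute
    if raw == ".":
        return DEFAULTS.get(name)    # None if the dictionary has no default
    return raw


def resolve_lenient(name: str, raw: str) -> Optional[str]:
    """Treat '.' and '?' alike as 'no value given'; never consult the dictionary."""
    if raw in (".", "?"):
        return None
    return raw
```

With these definitions resolve_strict("_foo", ".") gives the default while
resolve_lenient("_foo", ".") gives nothing at all, which is exactly the
divergence I am worried about.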

>    If you don't have a specified default, you have an unspecified
>value.  From the point of view of the user, they have two very different
>meanings.  The "?" is an invitation to provide a value to be used
>in place of the default, if any.  The "." is not such an invitation.

This is true if the user is an author, but not true if the user is a 
machine or a reader.

>The question of whether
>
> >>  > loop_
> >>  > _foo _bar
> >>  > a .
> >>  > b .
> >>  >
> >>  > should be equivalent to
> >>  > loop_
> >>  > _foo
> >>  > a
> >>  > b
> >>  >
>
>depends on the dictionary.  If _bar has been declared mandatory
>and has a default value, then the first construction is valid,
>while the second is not.  If _bar has been declared implicit
>then the two constructs are equivalent and, from the point
>of view of an application, equivalent to
>
> >>  > loop_
> >>  > _foo _bar
> >>  > a ?
> >>  > b ?

Thanks - I had missed this point. Again it emphasizes that a program 
cannot take a consistent view unless it reads and applies the 
dictionary. We now do this, but I am not sure how common this is in 
CIF software in general - it's not trivial.
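
As a rough sketch of what "reads and applies the dictionary" means for the
loop example above: the ItemDef class and column_may_be_dropped function
below are hypothetical and much simpler than a real dictionary entry. The
mandatory and implicit branches follow Herbert's description; returning
False when an item is neither mandatory nor implicit is my own conservative
assumption, since that case was not discussed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ItemDef:
    """Hypothetical, simplified dictionary entry for one data name."""
    mandatory: bool = False
    implicit: bool = False
    default: Optional[str] = None


def column_may_be_dropped(values: List[str], item: ItemDef) -> bool:
    """True if a loop column holding only '.' can be omitted without changing meaning."""
    if not all(v == "." for v in values):
        return False      # the column carries at least one real value
    if item.mandatory:
        return False      # a mandatory item must be present in the file
    # Assumption: only an item declared implicit is safe to drop.
    return item.implicit
```

So the "_foo _bar / a . / b ." loop collapses to a plain _foo loop only when
the dictionary says _bar is implicit, and software that never looks at the
dictionary cannot know which case it is in.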

>While it may not be easy to deal with "missing" values in
>a column of numbers, it is important in many cases to
>be able to do so.

Fully agreed.
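
One way to handle it, as a minimal sketch: map both null tokens to None and
let downstream code skip them. The function names are hypothetical, and I
assume any standard uncertainty is given in the usual parenthesised form,
e.g. 1.234(3), and can simply be discarded here.

```python
from typing import List, Optional


def parse_numeric_column(tokens: List[str]) -> List[Optional[float]]:
    """Convert raw CIF tokens to floats, mapping '.' and '?' to None."""
    out: List[Optional[float]] = []
    for tok in tokens:
        if tok in (".", "?"):
            out.append(None)
        else:
            # Drop a trailing standard uncertainty such as "(3)" in "1.234(3)".
            out.append(float(tok.split("(", 1)[0]))
    return out


def mean_ignoring_missing(values: List[Optional[float]]) -> Optional[float]:
    """Average the values that are present; None if every cell is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None
```

Note that this deliberately loses the distinction between '.' and '?'; code
that needs to preserve it would have to carry a separate flag per cell.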

>I for one, not only think it practical to have two ways to
>represent an unspecified value, I think it to be essential.
>The distinction between "?" and "." has worked rather well
>as a way to be able to guide users in filling in appropriate
>"blanks" without tempting them to override defaults that
>should be left alone.

Again no problem where user=author.

P.


>    -- Herbert
>
> >
> >>...
> >>  > My own heuristics are:
> >>  > _foo '?'
> >>  > carries no useful information other than the author hasn't bothered
> >>  > to remove it from the file
> >>  > _foo '.'
> >>  > is highly dangerous as the dictionary can contain default values
> >>  > which most users have no idea of. Thus the default extinction
> >>  > correction is (or certainly was)  'Zachariasen' and algorithmically
> >>  > linking '.' to this value is certain to give misleading info.
> >>  >
> >>  > loop_
> >>  > _foo _bar
> >>  > a .
> >>  > b c
> >>  >
> >>  > has a null value for one cell - this is required to make a rectangular table
> >>  >
> >>  > loop_
> >>  > _foo _bar
> >>  > a .
> >>  > b .
> >>  >
> >>  > should be equivalent to
> >>  > loop_
> >>  > _foo
> >>  > a
> >>  > b
> >>  >
> >>  > and this construct should be avoided
> >>  >
> >>  > loop_
> >>  > _foo _bar
> >>  > a ?
> >>  > b ?
> >>  >
> >>  > is almost certainly an unedited template and should be replaced by:
> >>  >
> >>  > loop_
> >>  > _foo
> >>  > a
> >>  > b
> >>  >
> >>  > and finally
> >>  > loop_
> >>  > _foo _bar
> >>  > a ?
> >>  > b c
> >>  >
> >>  > is indistinguishable from
> >>  >
> >>  > loop_
> >>  > _foo _bar
> >>  > a .
> >>  > b c
> >>  >
> >>  > All these issues come into very sharp focus when processing CIFs - it
> >>  > is not trivial to manage '.' in a column of otherwise real numbers.
> >>  >
> >>  > P.
> >>I take a similar approach. They both represent missing values, but
> >>missing for different reasons. If one really wants a default value in
> >>the dictionary, it should be "if not otherwise specified" and not "if
> >>the value is '.'". In that case, both still mean missing, just different
> >>reasons.
> >>
> >>Does ANYBODY really think it is practical to have two types of undefined
> >>values?
> >>
> >>Of course, CIF is just a text archive. There is nothing preventing the
> >>use of a string in the middle of an array of real numbers.
> >
> >If the CIF name occurs in a loop_ and is defined in a dictionary as a
> >NUMB then all values must be valid real numbers. If defined as CHAR
> >it can be any sequence of legal characters (there may be length restrictions).
> >
> >>Some rules
> >>about numeric arrays would be helpful for practical use of CIF.
> >
> >P.
> >
> >
> >Peter Murray-Rust
> >Unilever Centre for Molecular Sciences Informatics
> >University of Cambridge,
> >Lensfield Road,  Cambridge CB2 1EW, UK
> >+44-1223-763069
> >
> >_______________________________________________
> >comcifs mailing list
> >comcifs at iucr.org
> >http://scripts.iucr.org/mailman/listinfo/comcifs
>



