CIF Infoset

Tue Sep 7 15:03:55 BST 2004

Here are a few IDB comments on the comments of DDB.

>>The core dictionary defines three items which can be looped:
>>    _audit_conform_dict_name
>>    _audit_conform_dict_version
>>    _audit_conform_dict_location        # Contains the URL where the
>>                                        # dictionary can be found
>>As far as I know these have not been widely used - Acta Cryst. should
>>start insisting that these be included in submitted papers.  There is no
>>need to give the dictionary version in anything as ephemeral as a comment.
>
>That sounds like a positive step, but would that go in every data_block or 
>is it a global_ thing?
>
Since each datablock is independent, each would have its own 
_audit_conform items at least until such time as we develop a better 
linkage between datablocks.
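
As a rough illustration of "each datablock has its own _audit_conform
items", here is a minimal Python sketch that collects the looped
_audit_conform_dict_ items separately for every data block.  It is not
a real CIF parser: it assumes whitespace-separated, unquoted values with
no comments or semicolon text fields, and it ignores non-looped
_audit_conform items.  The function name, the dictionary name
cif_local.dic and the example.org URLs are made up for the example.

```python
PREFIX = "_audit_conform_dict_"


def _starts_new_construct(token):
    """True if the token is a data name, a loop_ keyword or a data_ header."""
    low = token.lower()
    return token.startswith("_") or low == "loop_" or low.startswith("data_")


def audit_conform_by_block(cif_text):
    """Return {block name: [ {name, version, location}, ... ]} per data block.

    Assumption: simple whitespace tokenising is good enough for the input.
    """
    tokens = cif_text.split()
    result = {}
    block = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        low = tok.lower()
        if low.startswith("data_"):
            # Every data block starts with an empty list of its own.
            block = tok[5:]
            result[block] = []
            i += 1
        elif low == "loop_":
            i += 1
            names = []
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i].lower())
                i += 1
            values = []
            while i < len(tokens) and not _starts_new_construct(tokens[i]):
                values.append(tokens[i])
                i += 1
            is_conform_loop = names and all(n.startswith(PREFIX) for n in names)
            if block is not None and is_conform_loop:
                width = len(names)
                for start in range(0, len(values) - width + 1, width):
                    row = values[start:start + width]
                    result[block].append(
                        {n[len(PREFIX):]: v for n, v in zip(names, row)}
                    )
        else:
            i += 1
    return result


EXAMPLE_CIF = """
data_one
loop_
_audit_conform_dict_name
_audit_conform_dict_version
_audit_conform_dict_location
cif_core.dic 2.3 http://example.org/cif_core.dic
cif_local.dic 1.0 http://example.org/cif_local.dic
_cell_length_a 5.0
data_two
_cell_length_a 6.0
"""

if __name__ == "__main__":
    for name, dictionaries in audit_conform_by_block(EXAMPLE_CIF).items():
        print(name, dictionaries)
```

In the example, data_two declares nothing and therefore gets an empty
list; nothing is inherited from data_one.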

>The problem I see is that the effort invested in implementing it for all 
>newly created and submitted CIFs is wasted because it is an 
>incomplete solution and no current software uses it or needs it.
>
There are already editor/browsers that read in the dictionaries and use 
them to validate a CIF.  They do not yet check the _audit_conform items
so the dictionaries have to be identified to the program by the user (or
the program loads all the dictionaries it can find, willy-nilly).
However, we are looking to the future, not just trying to keep up with 
the past.

>So, to try and resolve the namespace of each name, you would need to
>(1) check the _audit_conform list of dictionaries in reverse order
>(2) check against the list of registered prefixes for accidental matches
>(3) check all versions of all publicly accessible dictionaries
>(4) then give up.
>
If an _audit_conform loop is present, it should list all the 
dictionaries that were used in writing the CIF together with their
URLs, so an application should be able to download all the dictionaries 
it needs.  If there are data names appearing in the CIF that do not 
appear in these dictionaries, then the items are undefined and the user 
can do what seems most appropriate.  In an editor written by some of my 
students, items not located in the dictionary are loaded into a category 
called 'miscellaneous' where the user can view them and decide whether
they are legitimate or the result of a syntactic error.
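
The sorting step described above can be sketched in a few lines of
Python.  This is only an assumed illustration, not the code of the
editor mentioned: the function name and the sample names are
hypothetical, and the dictionaries are represented simply as lists of
the data names they define.  Data names are compared without regard to
case.

```python
def partition_names(cif_names, dictionary_names):
    """Split data names into those defined in some dictionary and the rest.

    cif_names        -- data names found in the CIF, in file order
    dictionary_names -- {dictionary name: iterable of names it defines}
    Returns (defined, miscellaneous), both preserving file order.
    """
    defined_set = {
        name.lower()
        for names in dictionary_names.values()
        for name in names
    }
    defined, miscellaneous = [], []
    for name in cif_names:
        if name.lower() in defined_set:
            defined.append(name)
        else:
            # Undefined: left for the user to inspect (ad hoc item or typo?).
            miscellaneous.append(name)
    return defined, miscellaneous


if __name__ == "__main__":
    dictionaries = {"cif_core.dic": ["_cell_length_a", "_cell_length_b"]}
    names = ["_cell_length_a", "_Cell_Length_B", "_my_adhoc_item", "_cell_lenght_c"]
    defined, miscellaneous = partition_names(names, dictionaries)
    print("defined:", defined)
    print("miscellaneous:", miscellaneous)
```

Here both the deliberately ad hoc name and the misspelt one end up in
'miscellaneous'; the program does not try to guess which is which.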

>If it's important enough to create a name for it then isn't it important
>enough to define its purpose somewhere? Ad hoc data names seem to provide
>nothing useful besides a legitimate excuse for laziness in the
>specification. There's no incentive to organize things tidily.
>Maybe they were important originally when COMCIFS were exploring 
>the field, before dictionaries were introduced, but is it still important 
>to be able to make up arbitrary stuff and stick it in a CIF without 
>definition?
>Who is doing this and how are they using it?
>Do they really intend to save it for posterity?
>
New concepts are continually being developed in crystallography and it 
is impractical to assign them names until it is clear that the concept 
has some permanence; otherwise the dictionaries quickly become filled
with a legacy of discarded ideas.  Thus people are encouraged to develop 
software that involves ad hoc names that may later be adopted by CIF or 
discarded.  Yes, this does lead to potential problems in the archive, 
though such items can be defined in a local dictionary which is listed 
in the _audit_conform loop.  In practice this is not likely to be a 
problem because such items are not usually used in archived CIFs.  We 
wish to retain the flexibility of CIF to develop with the field and not 
make people think they have to get the permission of the Academy 
(COMCIFS) before they try out a new idea.

>>>I had a hazy recollection that "this is a string" and this_is_a_string
>>>were equally valid CIF constructs containing identical information
>>>content, used for example in space group names. Would they be formally
>>>identical in an infoset? Does the white space in all strings have to be
>>>normalised (is that the right word?)?
>>>
>>We had a discussion of this point while preparing the symmetry_CIF
>>dictionary and came to the decision that these two strings were not
>>equivalent, i.e., underscore is not white space.
>
>Bummer. I know one program that needs changes made :-(
>
Because there is a legacy of underscore space group names (etc.) it is 
wise to be able to read them, but they should not be written.
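
A minimal Python sketch of this "read them but do not write them"
policy follows.  It assumes the functions are applied only to
Hermann-Mauguin space group names, where an underscore has no other
meaning, and that the name contains no quote characters; both function
names are hypothetical.

```python
def read_space_group_name(raw):
    """Accept a legacy underscore-separated name and return the spaced form.

    Runs of underscores and white space collapse to single spaces.
    """
    return " ".join(raw.replace("_", " ").split())


def write_space_group_name(name):
    """Return the name as a single-quoted CIF value; refuse underscores."""
    if "_" in name:
        raise ValueError("underscore-separated names should not be written")
    return "'" + " ".join(name.split()) + "'"


if __name__ == "__main__":
    name = read_space_group_name("P_21_21_21")
    print(name)
    print(write_space_group_name(name))
```

The conversion is confined to the reader, so the general rule stands:
elsewhere an underscore is not white space.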

>But perhaps I could also draw your attention to this:
>      http://journals.iucr.org/services/cif/stdcodes.html#Appdx4.3
>as evidence that underscores do seem to be an
>officially sanctioned form of white space in uchar data types.
>
The instructions at this URL refer to an item in version 2.2 of the
dictionary that has now been replaced in 2.3 by three separate items
that are fully enumerated.  Thus this problem is resolved in the latest
dictionary version.  Tightening up the dictionaries is an ongoing process.


David

-- 
Dr. I.D.Brown, Professor Emeritus,
Department of Physics and Astronomy
McMaster University, Hamilton
Ontario, Canada

