Please advise regarding a design of CIF dictionaries for material properties

Nick Spadaccini nick at csse.uwa.edu.au
Mon Oct 3 03:07:15 BST 2011


On 2/10/11 9:19 PM, "Saulius Grazulis" <grazulis at ibt.lt> wrote:

> Dear Nick,
> 
> many thanks for your detailed answer, and for your comprehensive example
> of DDLm dictionary!

I forgot to mention that the comprehensive DDLm example was done by Syd, in
the space of about 30 mins.

> This is exactly the question which bothers me: is it a must that the
> data_.. block prefix in DDL1 dictionary matches the declared data name,
> is it a formal recommendation, or is it just common practice and
> tradition? In other words, is it a MUST match, SHOULD match or MAY match
> according to the RFC 2119 (http://www.ietf.org/rfc/rfc2119.txt)?

Certainly it is not a requirement in STAR, and to my recollection nor was it
in CIF.

The datablock is a syntactic container in STAR/CIF, and the syntax rules
require it to have a name. That makes it addressable, and its contents can be
addressed indirectly. In the DDL languages the data block (DDL1) and the
saveframe (DDL2 and DDLm) are containers that hold data item definitions.
The contents of each container completely define the data item, and hence
the saveframe or datablock name is redundant.

The convention that the saveframe and datablock name exactly matches the
data item name is there for readability and because the most widely used
tool back in the early 1990s would have been "grep".

I will stick my neck out and say there is NO FORMAL REQUIREMENT that the
datablock name has to match the _name of the data item being defined in
DDL1. I know that to be true for the saveframe name in DDLm. I can find
nothing in the literature that suggests it is a requirement in DDL2. Hence
the datablock/saveframe name can be anything you want so long as it is
unique within the file.
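
To illustrate what that means for software, here is a minimal Python sketch
that looks definitions up by their declared _name and uses the block name
only for the uniqueness check. It assumes the dictionary has already been
parsed into (block_name, declared_names) pairs; that structure, the function
name index_definitions and the case-insensitive comparison are assumptions
made for the example, not part of any existing CIF library.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-parsed form: (block_name, [values of _name in that block]).
Block = Tuple[str, List[str]]


def index_definitions(blocks: List[Block]) -> Dict[str, str]:
    """Map each declared data name to the name of the block that defines it."""
    seen_blocks = set()
    index: Dict[str, str] = {}
    for block_name, declared in blocks:
        key = block_name.lower()  # assumption: block names compare case-insensitively
        if key in seen_blocks:
            # Uniqueness within the file is the only constraint on the block name.
            raise ValueError(f"duplicate data block name: {block_name}")
        seen_blocks.add(key)
        for name in declared:
            # The lookup key is the declared _name, never the block name.
            index[name.lower()] = block_name
    return index
```

With an index like this, a block called data_anything_at_all that declares
_cell_volume is found just as easily as one called data_cell_volume.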
 
> May I explain why I insist on that precise wording. When we write a CIF
> processing program, we want it to be correct, in a sense that it MUST
> process every correct CIF and produce defined results, and it MUST
> report an error for every incorrect CIF (provided the sets of correct
> and incorrect CIFs are computable, which I guess they are according to
> the current definitions).
> 
> Now, if the data block<->declare name correspondence is a MUST, then I
> infer that:
> 
> a) correct software MAY use data block names to search for name
> declarations (do we need this?);
> b) correct software MUST report an error when data block name is not a
> prefix of a declared data name;
> c) if a dictionary where b) is the case is ever encountered, then the
> dictionary is incorrect and it is the responsibility of the dictionary
> maintainer to fix the error.
> d) if a validator program validates a dictionary against DDL and does
> not report an error when the non-conforming dictionary is processed,
> then the validator is buggy and needs a fix.

I reiterate that the correspondence is NOT a must. BUT if it were, then (as
a programmer) I would separate the strict syntax rules from the pragmatics.
In (b) I would report an error (syntax) but not terminate (pragmatics).

The other thing worth mentioning is that dictionaries are built over time by
many contributors and, once made public, are correct. They may have regular
additions, all of which will be authenticated and ratified. So the
likelihood that your software will detect errors in dictionaries is very
low, unless the operating model at your site is that you expect submitters
to have their own specific dictionaries which are
extensions/additions/variations to the accepted dictionary. In that case
your validator would have to be written to expose a multitude of possible
user errors.
 
> If, however, the block<->declare name correspondence MAY or SHOULD be, then:
> 
> a) correct software MUST NOT use data block names to search for name
> declarations (programmers, beware!);
> b) correct software MAY/SHOULD report a (suppressable, non-fatal)
> warning  when data block name is not a prefix of a declared data name;
> c) if dictionaries where b) is the case are encountered, and a program
> does not accept them, then a program is buggy and it is the
> responsibility of the program maintainer to fix it.
> 
> As you see, a course of supposed events when a program accepts or
> rejects a dictionary differs radically depending on whether the
> data<->name correspondence is a MUST, SHOULD or MAY item.

As I said I would not reject the dictionary even if the correspondence was a
requirement. Certainly report the problem, but it would not be a fatal error
in any parser I built.
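
A minimal Python sketch of that "report but do not terminate" behaviour,
again assuming a hypothetical pre-parsed list of (block_name, declared_names)
pairs. The function name check_block_names and the message wording are
invented for the example. The prefix test follows the convention described in
the quoted message (block name is a prefix of the declared name, ignoring the
leading underscore).

```python
from typing import List, Tuple

# Hypothetical pre-parsed form: (block_name, [values of _name in that block]).
Block = Tuple[str, List[str]]


def check_block_names(blocks: List[Block]) -> List[str]:
    """Collect warnings for declared names that do not start with the block name."""
    warnings: List[str] = []
    for block_name, declared in blocks:
        prefix = block_name.lower()
        for name in declared:
            # Data names carry one leading underscore that the block name lacks.
            bare = name[1:] if name.startswith("_") else name
            if not bare.lower().startswith(prefix):
                warnings.append(
                    f"data_{block_name}: declared name {name} "
                    "does not start with the block name"
                )
    # Nothing is raised: the caller reports the warnings and keeps processing.
    return warnings
```

The caller decides what to do with the returned list, for example printing
each entry and carrying on, or counting them in a validation summary.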
 
> From what you say in the quote above ("It is a convenience to have the
> data block name match the _name of the item"), the correspondence MAY be
> present (and it MAY be not). According to what David Brown wrote (Wed,
> 28 Sep 2011 12:07:30 -0400, "It is not a problem in DDLm, I am not sure
> about DDL1, but it could be confusing.  Best avoided."), I get the
> impression that it SHOULD. But according to what John Bollinger wrote (Wed,
> 28 Sep 2011 11:05:54 -0500, "It also specifies (ITG 2.5.5) that item
> names be used as definition datablock names."), it sounds more like a MUST.

I will leave it to COMCIFS to formally state the position on this question,
but from my recollection it is a SHOULD and not a MUST.

> So, for me to know how to write a correct CIF validator and a correct
> CIF dictionary, I need to know how to interpret the definition of the
> correct DDL1 dictionary -- whether:
> 
> a) "item names MUST be used as definition datablock names"
> b) "item names SHOULD be used as definition datablock names"
> c) "item names MAY be used as definition datablock names"
> 
> which of the a)-c) situations is the actual case?

Strictly speaking b) and c) are the same in a semantic sense. Again my
stance is NOT a).

> BTW, I have scanned the existing (ftp://ftp.iucr.org/cifdics/) IUCr
> dictionaries for the correspondence. In the mmCIF dictionary, all save
> block names are prefixes of the corresponding declared tag names (data
> not shown ;); however there are 4 dictionaries that have several cases
> of data block names differing slightly (I attach a file with the
> non-matching tag list; the first line is a Perl command that produced
> it; warning -- long lines!). Thus, picking a "MUST" clause (the case
> "a)" above) would probably be too restrictive and invalidate too many
> existing dictionaries...

You have answered your own question: the MUST clause is too restrictive if
you enforce it as a "fatal error".

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD.
Adjunct Research Fellow
The University of Western Australia
35 Stirling Highway
CRAWLEY, Perth,  WA  6009 AUSTRALIA
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini at uwa.edu.au





