Please advise regarding a design of CIF dictionaries for material properties

Saulius Grazulis grazulis at ibt.lt
Sun Oct 2 14:19:24 BST 2011


Dear Nick,

many thanks for your detailed answer, and for your comprehensive example
of a DDLm dictionary!

Here, for brevity, I will only highlight the key questions, one question
at a time, that might require a very definite and formal answer.

On 10/02/2011 01:18 PM, Nick Spadaccini wrote:

>> data_... block name in the dictionary no longer matches tag 
>> name. I guess this should not be a problem... Is it?
> 
> It is a convenience to have the data block name match the _name of 
> the item, it is NOT a requirement of the DDLs (well certainly was not
> at its inception, but I am not sure if interpretations have since
> changed).

This is exactly the question which bothers me: is it a must that the
data_... block name in a DDL1 dictionary matches the declared data name,
is it a formal recommendation, or is it just common practice and
tradition? In other words, is it a MUST match, SHOULD match or MAY match
according to the RFC 2119 (http://www.ietf.org/rfc/rfc2119.txt)?

Let me explain why I insist on such precise wording. When we write a CIF
processing program, we want it to be correct, in the sense that it MUST
process every correct CIF and produce defined results, and it MUST
report an error for every incorrect CIF (provided the sets of correct
and incorrect CIFs are computable, which I guess they are according to
the current definitions).

Now, if the data block<->declared name correspondence is a MUST, then I
infer that:

a) correct software MAY use data block names to search for name
declarations (do we need this?);
b) correct software MUST report an error when a data block name is not a
prefix of a declared data name;
c) if a dictionary where b) is the case is ever encountered, then the
dictionary is incorrect and it is the responsibility of the dictionary
maintainer to fix the error;
d) if a validator program validates a dictionary against DDL and does
not report an error when the non-conforming dictionary is processed,
then the validator is buggy and needs a fix.

If, however, the block<->declared name correspondence is only a MAY or a
SHOULD, then:

a) correct software MUST NOT use data block names to search for name
declarations (programmers, beware!);
b) correct software MAY/SHOULD report a (suppressible, non-fatal)
warning when a data block name is not a prefix of a declared data name;
c) if dictionaries where b) is the case are encountered, and a program
does not accept them, then the program is buggy and it is the
responsibility of the program maintainer to fix it.

As you see, the expected course of events when a program accepts or
rejects a dictionary differs radically depending on whether the
block<->declared name correspondence is a MUST, SHOULD or MAY item.
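
To make the difference concrete, here is a minimal Python sketch of how a
validator might treat the three readings. It is only an illustration under
stated assumptions: the names Level, DictionaryError, is_prefix and
check_block are hypothetical, and the comparison is assumed to be
case-insensitive with the leading underscore of the declared name ignored
(so block "cell_length_" matches "_cell_length_a").

```python
from enum import Enum


class Level(Enum):
    """RFC 2119 requirement level assumed for the block<->name rule."""
    MUST = "MUST"
    SHOULD = "SHOULD"
    MAY = "MAY"


class DictionaryError(Exception):
    """Raised when a dictionary violates a MUST-level rule (hypothetical)."""


def is_prefix(block_name, declared_name):
    # Assumption: the block name is compared against the declared name
    # with its leading underscore, ignoring case.
    return declared_name.lower().startswith("_" + block_name.lower())


def check_block(block_name, declared_names, level):
    """Return a list of warnings; raise DictionaryError under MUST."""
    mismatches = [n for n in declared_names if not is_prefix(block_name, n)]
    if not mismatches or level is Level.MAY:
        return []  # nothing to report
    messages = [f"data_{block_name}: not a prefix of {n}" for n in mismatches]
    if level is Level.MUST:
        raise DictionaryError("; ".join(messages))  # the dictionary is wrong
    return messages  # SHOULD: non-fatal warnings the caller may suppress
```

Under MUST the dictionary is rejected; under SHOULD the caller receives
warnings and continues; under MAY the mismatch is not reported at all. In
none of the three cases does the sketch use the block name to look up a
declaration.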

From what you say in the quote above ("It is a convenience to have the
data block name match the _name of the item"), the correspondence MAY be
present (and it MAY be absent). According to what David Brown wrote (Wed,
28 Sep 2011 12:07:30 -0400, "It is not a problem in DDLm, I am not sure
about DDL1, but it could be confusing.  Best avoided."), I get the
impression that it SHOULD. But according to what John Bollinger wrote (Wed,
28 Sep 2011 11:05:54 -0500, "It also specifies (ITG 2.5.5) that item
names be used as definition datablock names."), it sounds more like a MUST.

So, for me to know how to write a correct CIF validator and a correct
CIF dictionary, I need to know how to interpret the definition of the
correct DDL1 dictionary -- whether:

a) "item names MUST be used as definition datablock names"
b) "item names SHOULD be used as definition datablock names"
c) "item names MAY be used as definition datablock names"

Which of the situations a)-c) is the actual case?

Any choice among a)-c) is actually possible; I am sure that every
developer has made this choice silently and maybe even implicitly, but
it would probably be beneficial for CIF users to make the choice
explicit, especially given the variety of possible interpretations.

BTW, I have scanned the existing IUCr dictionaries
(ftp://ftp.iucr.org/cifdics/) for the correspondence. In the mmCIF
dictionary, all save
block names are prefixes of the corresponding declared tag names (data
not shown ;); however there are 4 dictionaries that have several cases
of data block names differing slightly (I attach a file with the
non-matching tag list; the first line is a Perl command that produced
it; warning -- long lines!). Thus, picking a "MUST" clause (the case
"a)" above) would probably be too restrictive and invalidate too many
existing dictionaries...
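
For readers who want to repeat such a scan, a rough Python sketch follows.
It is not the Perl command from the attachment, and it is not a real CIF
parser: the function name find_mismatches is hypothetical, and the code
assumes that declared names are quoted, that "loop_ _name" loops have a
single column, and that semicolon-delimited text fields can simply be
skipped.

```python
import re

# Semicolon-delimited text fields: from a ';' in column 1 to the next one.
TEXT_FIELD = re.compile(r"^;.*?^;", re.MULTILINE | re.DOTALL)
# Simplified tokens: quoted strings, comments, or runs of non-whitespace.
TOKEN = re.compile(r"""'[^'\n]*'|"[^"\n]*"|#[^\n]*|\S+""")


def find_mismatches(dictionary_text):
    """Return (block_name, declared_name) pairs where the block name
    is not a prefix of the declared name (case-insensitive)."""
    text = TEXT_FIELD.sub(" ", dictionary_text)
    tokens = [t for t in TOKEN.findall(text) if not t.startswith("#")]
    mismatches = []
    block = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower().startswith("data_"):
            block = tok[len("data_"):]  # start of a new definition block
        elif tok.lower() == "_name" and block is not None:
            i += 1
            # Assumption: every quoted value that follows is a declared name.
            while i < len(tokens) and tokens[i][0] in "'\"":
                name = tokens[i][1:-1]
                if not name.lower().startswith("_" + block.lower()):
                    mismatches.append((block, name))
                i += 1
            continue
        i += 1
    return mismatches
```

Calling find_mismatches on the text of a DDL1 dictionary yields the list of
offending (block, name) pairs; an empty list means every block name is a
prefix of the names it declares, within the limits of the assumptions
above.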

Regards,
Saulius

-- 
Dr. Saulius Gražulis
Institute of Biotechnology, Graiciuno 8
LT-02241 Vilnius, Lietuva (Lithuania)
fax: (+370-5)-2602116 / phone (office): (+370-5)-2602556
mobile: (+370-684)-49802, (+370-614)-36366
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dictionary-data-block-name-NOT-a-prefix-of-the-declared-tag.txt
Url: http://scripts.iucr.org/pipermail/comcifs/attachments/20111002/045c060b/attachment.txt 

