DDLm implementation discussion
idbrown at mcmaster.ca
Tue Mar 10 19:45:39 GMT 2009
My discussion paper on the use of intermediate computation items in DDLm
has produced many interesting suggestions but not a lot of consensus. I
have included an edited version of the discussion bringing related
comments together under three headings. The first deals with the
desirability or otherwise of defining beta in the dictionary, the second
addresses the implications of methods serving as definitions, and the
third the problem of hiding or removing intermediate items in order not
to clutter or otherwise compromise the CIF. The solution I am proposing
as a result of the discussion is that all the derived items (i.e. those
with methods) should be defined in a 'derived-item' dictionary, while
the experimental and assigned items will appear in an archive
dictionary. Anyone looking for derived items will obtain them if they
read an archive CIF under the control of the combined archive +
derived-items dictionary. This is described more fully at the end of
this document.
AMBIGUITY IN THE USE OF THE BETA FORM OF THE ADP
Carol Brock (Senior editor of Acta Cryst. C and a user of CIF) in a
private email provides a strong argument for why betas should be invisible.
What confusion ADPs cause. I came into crystallography at about the
time that calls were being made for a switch from betas (used in ORFLS
and its variants) to Us. While reading your email a picture of the page
in the ORTEP manual where all the ADP types are listed flashed through my
mind. You are so right about betas being nearly useless because of the
confusion about the factor of two. I remember drawing ellipsoids with
and without the factor of two for structures in the literature when
attempting (usually in vain) to figure out which form had been used.
Please be sure no calculated beta is ever available where somebody might
use it in the archival literature.
I would be very happy to help argue with the IT people. Nobody should
ever see a beta again. A complication is that it is only the fossils
who remember how much confusion betas caused.
Doug Duboulay (DD) on the other hand provides equally persuasive
arguments why beta should be accessible.
If I understand correctly, beta is the *only*
true tensor form of the ADPs. If you want to convert between
different unit cells, transforming the Uijs is only possible by
converting to the tensor form before the symmetry transform and back
to Uijs afterwards.
To obfuscate its role in a definitive treatise seems lacking.
Of course I may be wrong :))
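Doug's point can be illustrated with a small numerical sketch (my own illustration in Python, not dictionary code): writing a re-indexing of the Miller indices as h' = hP, the exponent h.beta.h^T in the anisotropic temperature factor is invariant only if beta transforms as a second-rank tensor, beta' = P^-1 beta (P^-1)^T.

```python
# Sketch (my own illustration, not dictionary code): the exponent h.beta.h^T
# of the anisotropic temperature factor must be invariant under a
# re-indexing h' = h P, which forces beta to transform as a contravariant
# second-rank tensor: beta' = P^-1 . beta . (P^-1)^T.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def quad_form(h, m):
    # h . m . h^T for a row vector h and a 3x3 matrix m
    return sum(h[i] * m[i][j] * h[j] for i in range(3) for j in range(3))

def transform_beta(beta, p_inv):
    return matmul(matmul(p_inv, beta), transpose(p_inv))

# Example re-indexing: swap the a and b axes and invert c; this P is its
# own inverse, which keeps the sketch free of a matrix-inversion routine.
P = [[0, 1, 0], [1, 0, 0], [0, 0, -1]]
P_INV = P

beta = [[0.010, 0.002, 0.001],
        [0.002, 0.020, 0.003],
        [0.001, 0.003, 0.030]]

h = [1, 2, 3]
h_new = [sum(h[i] * P[i][j] for i in range(3)) for j in range(3)]
beta_new = transform_beta(beta, P_INV)

# The temperature-factor exponent is unchanged by the transformation.
assert abs(quad_form(h, beta) - quad_form(h_new, beta_new)) < 1e-12
```

The same invariance argument does not hold for the Uij form directly, which is Doug's reason for keeping beta accessible.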
Nick Spadaccini (NS)
The _matrix_beta item is an item that does warrant definition and as
Doug states is a true tensorial form of ADPs. David alludes to another
problem that does not exist - there can be NO confusion in the definition of
the individual off-diagonal beta terms (historically there was a factor-of-2
confusion). Why? Because the individual off-diagonal terms can never be
accessed, as there is no definition for them. There is never an error in
building the _matrix_beta because it is constructed directly from the U or B
matrices (where there is no confusion).
Nick is referring here to the way matrices are constructed from the
individual matrix elements stored in the CIF. The assumption is that
experimental information will be given as matrix elements and the
matrices themselves would only be created under dictionary control. So
Nick is right as long as no-one enters the beta matrix as a matrix
because individual matrix elements do not appear in the dictionary.
However, there is nothing I am aware of in DDLm that prevents the matrix
from being generated by external software or a word processor, and the
convenience of doing so may encourage users to take this shortcut. The
question is how to prevent this.
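For concreteness, here is a hedged sketch of the kind of whole-matrix construction Nick describes, using the standard relation beta_ij = 2 pi^2 a*_i a*_j U_ij (this mirrors, but is not, the actual dREL method):

```python
import math

# Hedged sketch (this mirrors, but is not, the actual dREL method): the
# dimensionless beta matrix is built in one step from the U matrix via the
# reciprocal cell lengths, beta_ij = 2 * pi**2 * astar_i * astar_j * U_ij.
# Because the whole 3x3 matrix is constructed at once, no factor-of-2
# ambiguity over individual off-diagonal betas can arise.

def beta_from_u(u, astar):
    two_pi2 = 2.0 * math.pi ** 2
    return [[two_pi2 * astar[i] * astar[j] * u[i][j] for j in range(3)]
            for i in range(3)]

# Orthorhombic example cell (a = 10, b = 12, c = 15 Angstrom), so
# astar_i = 1 / a_i.
astar = [1 / 10.0, 1 / 12.0, 1 / 15.0]
u = [[0.020, 0.001, 0.000],
     [0.001, 0.025, 0.002],
     [0.000, 0.002, 0.030]]

beta = beta_from_u(u, astar)
assert abs(beta[0][0] - 2 * math.pi ** 2 * 0.020 / 100) < 1e-15
assert abs(beta[0][1] - beta[1][0]) < 1e-15  # symmetry is preserved
```

The shortcut worried about above corresponds to a user bypassing `beta_from_u` entirely and typing a beta matrix straight into the file.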
METHODS AS DEFINITIONS
>> METHODS ARE THE NEW DEFINITIONS
>> At the meeting of COMCIFS in Osaka it was decided that when a method is
>> present in the dictionary it takes precedence over text in defining the
>> item. One immediate corollary to this is that only one method is
>> allowed for each data item.
DD: I am not sure why that is a corollary.
Why is it not possible to have fallback methods when a particular
evaluation strategy fails?
My understanding of the derivation pathway was more like:
beta -> Uaniso -> Baniso
i.e. there are discrete component forms as well as matrix forms
as well as tensor forms.
Also, for calculating H-atom ADPs, isn't there an algorithm based on
the Uiso of their coordinating C, N, O atoms?
How are you going to get Uiso from Uaniso?
Uiso is not the same as Uequiv. If Uequiv is required we should define
Uequiv -> Uaniso -> Uiso. Presumably Uequiv will be the same as Uiso if
no value of Uaniso is provided, provided we supply the correct methods.
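To make the Uiso/Uequiv distinction concrete, a minimal sketch (my own illustration, not a dictionary method): for an orthogonal cell, Uequiv reduces to one third of the trace of Uaniso, whereas a refined Uiso is a separately fitted quantity that will in general differ.

```python
# Illustration (my own sketch, not a dictionary method): for an orthogonal
# cell, Uequiv reduces to one third of the trace of Uaniso.  A refined Uiso
# is a separate, directly fitted quantity and will in general differ.

def u_equiv_orthogonal(u_aniso):
    return (u_aniso[0][0] + u_aniso[1][1] + u_aniso[2][2]) / 3.0

u_aniso = [[0.015, 0.001, 0.000],
           [0.001, 0.030, 0.002],
           [0.000, 0.002, 0.045]]

u_eq = u_equiv_orthogonal(u_aniso)
assert abs(u_eq - 0.030) < 1e-12
```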
NS: David suggests there is a problem with the fact that
different evaluation pathways exist to obtaining a value for an item,
i.e., that multiple paths is problematic, whereas it is the expressed
design in dREL.
The beta -> Uani -> Bani -> Uiso -> Biso pathway David describes is not
correct, and Doug makes an attempt to clarify (and gets closer). The actual
calculation [...] seems perfectly logical to me. dREL is a Turing-complete
language, and while there is only one evaluation method in an item
definition, you can create a multitude of paths to the answer - and that is
a GOOD THING.
In the proof-of-principle dictionary this branching depends on the value
of .adp_type, which may not be present (it has no method in the
proof-of-principle dictionary, and its enumeration default of Uiso
could be incorrect). It would be better if branching were based on an
ordered test for the existence of an item, thus if Uaniso is not
present, look for Baniso. However, as Doug points out, a search for the
existence of Uaniso would have to allow Uaniso the opportunity to
generate itself from its individual elements. I am not sure how this is
done in practice but that is a problem for the dREL implementation.
There seems to be a consensus here that branching is built into dREL and
is desirable in a definition. It is not clear if this has to be
achieved using if-then constructs or an ordered loop defining the
different branch methods. If one goes for the if-then construction, how
does one ensure the tested item is present, or can one make provision
for a default procedure if it is not? If the definitions form a tree
(rather than a network with closed loops) it should not be possible to
get stuck in a loop as mentioned below by DD.
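The ordered existence test suggested above might look like the following sketch (hypothetical item names, and Python as a stand-in for what the dREL engine would do; the conversion B_ij = 8 pi^2 U_ij is the standard one):

```python
# Sketch of an ordered existence test (hypothetical item names; a Python
# stand-in for what the dREL engine would do).  The standard conversion
# B_ij = 8 * pi**2 * U_ij is used for the fallback branch.
import math

B_FACTOR = 8.0 * math.pi ** 2

def resolve_u_aniso(data):
    """Return Uaniso, deriving it from Baniso only when Uaniso is absent."""
    if 'u_aniso' in data:            # first choice: the item is stored
        return data['u_aniso']
    if 'b_aniso' in data:            # fallback: convert from Baniso
        return [[b / B_FACTOR for b in row] for row in data['b_aniso']]
    raise KeyError('no anisotropic displacement item available')

# A data block carrying only Baniso still yields a Uaniso value.
cif = {'b_aniso': [[B_FACTOR * 0.02, 0.0, 0.0],
                   [0.0, B_FACTOR * 0.02, 0.0],
                   [0.0, 0.0, B_FACTOR * 0.02]]}
u = resolve_u_aniso(cif)
assert abs(u[0][0] - 0.02) < 1e-12
```

Because each branch only reaches downward to items lower in the hierarchy, a tree of such resolvers cannot loop.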
HIDING THE INTERMEDIATE ITEMS
>POSSIBLE SOLUTIONS from IDB's original discussion paper
>> Tree 3: beta -> Uaniso -> Baniso -> Uiso -> Biso
>> can be made to work if the beta form can be made
>> invisible. It cannot be completely invisible as it must appear in the
>> appropriate CIF dictionary, and its text description will be displayed
>> by any CIF editor such as publCIF or enCIFer.
>> One possible solution is to include a flag in the dictionary definition
>> to indicate that the item should be hidden from the user or deleted
>> after the calculation is complete.
>> A second possibility is to give the item a dataname that disguises its
>> identity, e.g., a name such as _atom_site_aniso.intermediate1. The
>> dictionary would contain the .description 'This item is an intermediate
>> in an ADP calculation and is not to be used for archival or retrieval.'
>> A third solution would be to rearrange the method for calculating the
>> structure factors so that it works directly with Uaniso and does not
>> generate beta as an intermediate. In this case there is no need to
>> define beta in the dictionary.
DD: I would tend towards the first solution, but with multiple evaluation
strategies (i.e. loop_ed), combined with dREL software that checks
for multiple iterations around an evaluation path and which falls back
to an alternative if it exists, as well as the flag to say don't print
this in a result CIF (and possibly hide this in an editor!).
James Hester (JH)
I agree with IDB's diagnosis of the problem, and, rather than clutter
the dictionary with unnecessary baggage as solutions 1 and 2 do, I
would suggest a variant of your 3rd solution:
Solution 4: As beta is used primarily for calculational convenience, a
dREL function 'beta()' is defined which calculates the beta value.
The structure factor calculation is rewritten to call this function.
A given dREL implementation can choose to cache values returned by
this function to improve efficiency.
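James's Solution 4 can be sketched as follows (illustrative Python, not dREL; the memoisation via `lru_cache` is one possible caching choice an implementation might make):

```python
from functools import lru_cache
import math

# Illustrative sketch of Solution 4 (Python, not dREL): beta is never a data
# item, just a function that the structure-factor method calls; the
# implementation is free to cache results for efficiency, here via lru_cache.

@lru_cache(maxsize=None)
def beta(u_aniso, astar):
    """Compute beta on demand from Uaniso; results are memoised."""
    two_pi2 = 2.0 * math.pi ** 2
    return tuple(tuple(two_pi2 * astar[i] * astar[j] * u_aniso[i][j]
                       for j in range(3)) for i in range(3))

# Arguments are tuples so that lru_cache can hash them.
u = ((0.02, 0.0, 0.0), (0.0, 0.02, 0.0), (0.0, 0.0, 0.02))
astar = (0.1, 0.1, 0.1)

first = beta(u, astar)
second = beta(u, astar)   # second call is served from the cache
assert first is second
assert beta.cache_info().hits == 1 and beta.cache_info().misses == 1
```

Since `beta` is only a function, no `_matrix_beta` value ever exists to leak into an output CIF.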
In [David's] first solution invisibility can only be maintained if
everyone respects the new DDLm attribute that will be created to flag
it. The value [of the item] will leak out.
The second solution is better than the first, but I believe it is an
unnecessary cluttering of the dictionary.
Herbert Bernstein (HB)
I like the idea of defining useful functions in the dictionary that
will not in and of themselves generate tags, but we need to
provide some control over scope and namespaces to make it easy to
combine useful functions from multiple dictionaries -- perhaps
adopting python's module-based dotted notation to resolve name conflicts.
JH: Yes. Currently all dREL functions (and one hopes builtin functions in
the final standard) belong to DDLm category 'Function'. We could
usefully sketch out a hierarchy of subcategories here when putting
together the final DDLm standard, or alternatively/additionally the
standard DDLm importation mechanism when importing other dictionaries
could resolve name conflicts.
NS: Correct, any function definition can be handled by the "functions" category.
A quick response to David's original post is "there is no identifiable
problem" in what David has written.
The interim data items, many of which are actual legitimate
crystallographic objects (like the cell vectors rather than their scalar
dimensions) and hence, I personally believe, should be part of the
dictionary, don't have to be exported. They are exported in our prototype
parser only because I am too lazy to clean up the output; I simply dump
the in-memory representation.
This aspect of what David sees as a problem can be made to go away by using
DDLm's import facility. That is, the parser reads in the core dictionary
(with only the data items David/the community would like to see in a submission
file) and imports a "fuller" dictionary to handle everything else. On output
the parser can be restricted to exporting only the data items in the core
dictionary. Problem solved. The user would never see or know about the extra
items.
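This export restriction could be sketched as a simple filter (the data names below are hypothetical stand-ins for real dictionary items):

```python
# Sketch of the import-based restriction (hypothetical data names): the
# parser may hold every derived item in memory, but the export step keeps
# only items defined in the core (archival) dictionary.

core_dictionary = {'_cell.length_a', '_cell.length_b', '_cell.length_c'}

in_memory = {
    '_cell.length_a': 10.0,
    '_cell.length_b': 12.0,
    '_cell.length_c': 15.0,
    '_cell.vector_a': [10.0, 0.0, 0.0],  # derived item: stays internal
    '_matrix_beta': [[0.01, 0.0, 0.0],
                     [0.0, 0.01, 0.0],
                     [0.0, 0.0, 0.01]],  # derived item: stays internal
}

def export(items, dictionary):
    """Keep only the items the output dictionary defines."""
    return {name: value for name, value in items.items() if name in dictionary}

exported = export(in_memory, core_dictionary)
assert set(exported) == core_dictionary
assert '_matrix_beta' not in exported
```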
James' idea of creating functions would work also, but there are two quite
different classes of items here. The first are truly library/utility
functions, like the one that strips the ortep-like object 2_567 into a
symmetry pointer 2 and a cell displacement vector [0, 1, 2]. That is a function.
The other type of items are legitimate crystallographic items that merit
definition and should not be obfuscated in code. For instance the cell
displacement vector is a legitimate item and merits definition.
Crystallographically [0, 1, 2] is meaningful whereas _567 is actually
syntactic rubbish - albeit popular syntactic rubbish.
You may insist on only seeing _567 but to deny the ability to define a truly
crystallographic object like [0, 1, 2] is not sensible. Especially when it
can be hidden using an importing functionality.
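Nick's 2_567 example can be made concrete with a short sketch (assuming the usual convention that each translation digit encodes translation + 5):

```python
# Sketch (assuming the usual convention that each translation digit encodes
# translation + 5): split a symmetry code such as "2_567" into its symmetry
# operator pointer and cell displacement vector.

def split_symop_code(code):
    pointer, _, translations = code.partition('_')
    displacement = [int(d) - 5 for d in translations]
    return int(pointer), displacement

sym, disp = split_symop_code('2_567')
assert sym == 2 and disp == [0, 1, 2]
```

The function itself is utility code; the `[0, 1, 2]` it returns is the crystallographically meaningful object that merits a definition.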
The solution I describe using importation will solve any perceived
problem, and is the very purpose for which DDLm's importing functionality
was designed.
IDB's PROPOSED SOLUTION
Branching will be allowed in definitions, though a tree structure would
be necessary, with items arranged in a hierarchy, to ensure that loops
are not created and that all paths end in experimental or assigned items
such as cell dimensions or symmetry operations. The application of this
will require care to respect the crystallographic integrity of the
definitions, e.g., if U is calculated from B there should be no route by
which B can be calculated from U. The question remains how this is
implemented: by if-then or loop construction. I welcome advice on the
best way to include multiple methods without having to test a
potentially missing item.
Various solutions for the treatment of intermediate items have their
supporters, including flagging them, eliminating them in favour of
functions, or segregating them into a 'derived-items' dictionary. It is
clear that most intermediates are good crystallographic items that would
not be out of place in the CIF output, but leaving aside the special
problem of beta, there would be a danger of cluttering up the CIF with
large numbers of derived items, many of them duplicating in a different
format information already present. A further danger is one that
Herbert pointed out in Osaka. He suggested that DDLm CIF dictionaries
would make CIF more dynamic compared with the static character of CIF1
and CIF2, and if some experimental or assigned value (e.g., the cell
constants) were changed, there would be no way we could ensure that all
the derived items would be automatically updated.
The relevant items in DDLm dictionaries can be classified as either
derived or experimental (or assigned) (ignoring those used for data
management and description which do not concern us). The derived items
will all have methods; in principle, the experimental values will not.
Following Nick's suggestions, we could arrange that the
derived items are placed in a 'derived-items' dictionary, while those
giving the experimental and assigned values, such as cell constants and
space group symmetry, would appear in an archive dictionary.
This represents an important change in the way we handle and think about
CIF: the basic experimental information would be found in an archival
CIF, but reading this CIF under DDLm dictionary control would allow any
desired derived items to be retrieved as if they had been originally in
the CIF. These requested items could be passed to a user program or
exported as a CIF, though the default exported CIF would contain only
items from the archival CIF. Provision would be needed to allow derived
items already in the archival CIF to be optionally retained rather than
recalculated. Derived items currently present in CIF such as cell
volume and calculated density would appear in the derived-items
dictionary. The derived-item dictionary would contain a rich supply of
derived items, e.g., a person requiring a set of bond vectors could
retrieve these by supplying the labels and symops of the terminal atoms,
which in turn might be generated by an external program so as to include
all the bond distances of interest to the user. The derived-items
dictionary could include a definition of the beta matrix since this
might be required by a user program designed to transform cell settings,
but being in the derived-items dictionary there would be little
temptation to use it archivally. Editors such as publCIF, which are
designed to help in producing archival CIFs, would not need to import the
derived-items dictionary.
Of course we could also make more use of functions as suggested by James
and Herbert if there are no issues with importing functions, but only
when the intermediate item was not a potentially useful derived item.
As is well known, the devil is in the details, and in adapting the CIF
dictionaries I will have to make many decisions in matters of detail.
But if there is agreement on splitting the dictionaries into archive and
derived-item dictionaries as described above, I will have a guideline to
work with. I will undoubtedly come back with other problems in the
future but this split seems to be in the spirit of DDLm and appears to
solve a number of important problems.
Is this plan acceptable to everyone? If so, I will start to apply it
and move this discussion on to the next problem.