DDLm, dREL, images and NeXus

Fri Dec 12 04:58:31 GMT 2008

Let me flesh out a proposal in some detail, so that holes can be picked in
it.

First, an overall view.

At the moment, a DDLm/dREL engine is initialised with a set of DDLm
dictionaries.  When passed a CIF instance, it will return values of any
datanames that are contained in the CIF instance or that it can calculate
from datanames already in the instance.  What I envision as a 'translating'
DDLm engine is initialised as before with the standard DDLm dictionaries,
but also with two further dictionaries: a 'NeXus dictionary' and a
'translation dictionary' (contents of these explained below).  Finally, it
requires a 'NeXus plugin'.  When passed a CIF instance, the translating
engine works as before.  When passed a NeXus instance, it returns values of
those CIF datanames that it can calculate.

Now for an explanation of these various extra bits.

1.  The 'NeXus dictionary' is just another DDLm dictionary.  It contains
definitions for datanames in a CIF namespace of its own, e.g.
_nexus.slit_height.  The linkage to a NeXus file is accomplished using a set
of new DDLm attributes, which work like the current 'xref' attributes: in
the header section of the NeXus dictionary file, each version of the NeXus
standard is assigned a short code in a loop.  Each of the definitions in
the body of the dictionary then contains two new DDLm attributes:
_alien.code (referencing the version of the standard in the header) and
_alien.location (where to find the dataname).  The syntax of the value of
_alien.location might be borrowed from, for example, XPath in the case of
NeXus.

The data definitions containing _alien.location attributes could be
considered 'raw' NeXus data, which may not map easily onto CIF datanames.
Therefore, this dictionary could contain further DDLm definitions of
dataitems (still in the CIF 'nexus' namespace) that contain dREL methods
for manipulating the raw datanames into something that maps more directly
onto CIF.  This is where one might foresee adding a few more builtin
functions to dREL to ease e.g. image processing.

Note also that one might, instead of using an '_alien.location' DDLm
attribute, define a new dREL builtin (like 'nexus_locate(string)'), and
provide a simple dREL expression calling this builtin.

2. The 'NeXus plugin' is what enables the DDLm engine to understand the
'_alien.location' attribute and use it to return a dREL-compatible value.
It would also supply extra builtin functions if necessary.

3. The 'translation dictionary' contains alternative definitions of items in
the standard CIF dictionaries, where the dREL methods used to derive those
items in the standard dictionaries are replaced by dREL manipulations of
dataitems defined in the NeXus dictionary.  When it is read into the
DDLm/dREL engine, these new dREL methods could either replace the old ones,
or be added as alternative methods of type 'translation'.  Keeping both
allows a lot of flexibility: a derived value stored in a NeXus file can be
checked against the value that the standard CIF dREL methods derive from
primitive values in that same NeXus file.
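
To make the division of labour between the three pieces concrete, here is a
minimal Python sketch.  It is only an illustration under stated assumptions:
the class names, the location strings, the '_demo' datanames, the slit
width/area items and the 'evaluation' label for a standard method are all
invented here, and the 'NeXus file' is a plain dict rather than anything
read through a real NeXus library.

```python
# Hypothetical sketch; none of these names belong to DDLm, dREL or NeXus.

class DictPlugin:
    """Stand-in for the 'NeXus plugin': resolves an _alien.location value."""

    def __init__(self, source):
        self.source = source          # pretend NeXus file: location -> value

    def locate(self, location):
        return self.source[location]


class TranslatingEngine:
    def __init__(self, plugin):
        self.plugin = plugin
        self.locations = {}           # dataname -> _alien.location string
        self.methods = {}             # dataname -> {purpose: function}

    def add_alien_item(self, dataname, location):
        """Register a 'raw' item from the NeXus dictionary."""
        self.locations[dataname] = location

    def add_method(self, dataname, purpose, func):
        """Register a method; several purposes may coexist for one item."""
        self.methods.setdefault(dataname, {})[purpose] = func

    def get(self, dataname, purpose="translate"):
        if dataname in self.locations:
            return self.plugin.locate(self.locations[dataname])
        by_purpose = self.methods[dataname]
        func = by_purpose.get(purpose)
        if func is None:
            # No method of the requested purpose: use whichever one exists.
            func = next(iter(by_purpose.values()))
        return func(self)

    def cross_check(self, dataname):
        """Compare the translated value with the standard derivation."""
        return self.get(dataname, "translate") == self.get(dataname, "evaluation")


plugin = DictPlugin({"slit/height": 2.0, "slit/width": 0.5, "slit/area": 1.0})
engine = TranslatingEngine(plugin)

# 'NeXus dictionary': raw items located in the file.
engine.add_alien_item("_nexus.slit_height", "slit/height")
engine.add_alien_item("_nexus.slit_width", "slit/width")
engine.add_alien_item("_nexus.slit_area", "slit/area")

# 'Translation dictionary': CIF-side items obtained from the raw items.
engine.add_method("_demo.slit_height", "translate",
                  lambda e: e.get("_nexus.slit_height"))
engine.add_method("_demo.slit_width", "translate",
                  lambda e: e.get("_nexus.slit_width"))
engine.add_method("_demo.slit_area", "translate",
                  lambda e: e.get("_nexus.slit_area"))

# Standard-dictionary method, kept alongside the translation method.
engine.add_method("_demo.slit_area", "evaluation",
                  lambda e: e.get("_demo.slit_height") * e.get("_demo.slit_width"))

area = engine.get("_demo.slit_area")
consistent = engine.cross_check("_demo.slit_area")
```

Keeping the 'translation' method next to the standard one, rather than
replacing it, is what makes the cross-check possible in this sketch.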

So, in summary, in order to implement this scheme, one needs about 6 new
DDLm attributes and a few more builtin dREL functions.  The crucial work is
in designing the behaviour of the '_alien.location' attribute to properly
capture the information.

One thing to think about is the introduction of an opaque data type into
dREL, so that e.g. image data returned from a NeXus file need not be
specified as an array of numbers, but simply as an object, which can only be
passed to a builtin function and not otherwise manipulated.
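
As a rough illustration of what 'opaque' could mean in practice, the Python
sketch below wraps a payload in a handle that refuses indexing, iteration
and arithmetic, and can only be consumed by a builtin.  The class 'Opaque'
and the builtin 'image_sum' are hypothetical names chosen for this sketch;
dREL defines neither.

```python
# Hypothetical sketch of an opaque value and one builtin that accepts it.

class Opaque:
    """Handle whose payload cannot be inspected from dREL-level code."""

    __slots__ = ("_payload", "_kind")

    def __init__(self, payload, kind):
        self._payload = payload
        self._kind = kind

    def __repr__(self):
        return f"<Opaque {self._kind}>"

    def _refuse(self, *args, **kwargs):
        raise TypeError("opaque values may only be passed to builtin functions")

    # Block the usual ways of looking inside or combining the value.
    __getitem__ = __iter__ = __len__ = __add__ = __mul__ = _refuse


def image_sum(handle):
    """Hypothetical builtin: total counts over all pixels of an image."""
    if not isinstance(handle, Opaque) or handle._kind != "image":
        raise TypeError("image_sum expects an opaque image")
    return sum(sum(row) for row in handle._payload)


image = Opaque([[1, 2], [3, 4]], "image")
total = image_sum(image)

try:
    image[0]                  # direct manipulation is rejected
    blocked = False
except TypeError:
    blocked = True
```

Only the builtin touches the payload, so the internal representation of an
image could change without affecting any dREL method that passes it around.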

Advantages of this scheme:
(1) dREL simplicity is preserved (I think this is important);
(2) it is implemented as a modular add-on to a normal DDLm engine;
(3) it is easily extensible to other data formats.

Here follows a simple example.  Note that a NeXus file may contain multiple
'NXuser' groups, each of which may contain one or more names but only a
single affiliation.  By specifying that the corresponding CIF category is a
'List' category, the NeXus plugin can deduce that each name in the NeXus
file is a new entry in the category loop.  I have accomplished
renormalisation by walking the tree and returning the items in the order
encountered.  The category key is used to determine when to duplicate an
item: in this case, every time a value of the key (a name) is encountered,
the associated value of the target item (the affiliation) is returned
again.  Otherwise, there would be fewer affiliations than names whenever
more than one name shares a single affiliation.

There is no additional processing of the raw nexus datanames in this
example.
==============================
The 'NeXus' dictionary file:
==============================
data_nexus
    _dictionary.namespace       nexus
    loop_
      _dictionary_alien.code
      _dictionary_alien.type
      _dictionary_alien.version
      _dictionary_alien.uri
      nexus4        nexus        4.0     www.nexusformat.org
...
save_RAW_AUTHOR
    _definition.class        List     #items go into a loop
    _category_key.generic   '_raw_author.user_name'  #one value per user_name
    ...
save_

save_raw_author.user_name
    _category.id       raw_author
    _type.container    Single
    _type.contents     Text
    _alien.code        nexus4
    _alien.location    "NXuser:name"  #renormalise by returning all names in
                                      #order encountered when walking the tree
save_

save_raw_author.affiliation
    _category.id       raw_author
    _type.container    Single
    _type.contents     Text
    _alien.code        nexus4
    _alien.location    "NXuser:affiliation"  #renormalise by returning
                                             #affiliation each time name
                                             #encountered in tree
save_
...

==============================
The translation file:
==============================
data_nexus_translate
    ...
save_audit_author.name
    loop_
      _method.purpose
      _method.expression
      'translate'
;
with aa as audit_author
with rn as nexus:raw_author
aa.name = rn.user_name
;
save_
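
The renormalisation behaviour assumed in the example above can be sketched
in Python as follows.  This is an illustration only: the nested-dict model
of a NeXus tree, the helper names, and the sample names and affiliations
are all made up, and the "Class:field" reading of the location string is
just the simplest interpretation of the values used in the dictionary file.

```python
# Hypothetical sketch of key-driven renormalisation while walking a tree.

def iter_groups(node, class_name):
    """Yield groups of the given class in depth-first document order."""
    if node.get("class") == class_name:
        yield node
    for child in node.get("children", []):
        yield from iter_groups(child, class_name)


def as_list(value):
    """Treat a single value as a one-element list."""
    return value if isinstance(value, list) else [value]


def locate(tree, location, key_location):
    """Return values for 'location', one per value of the category key."""
    group_class, field = location.split(":")
    key_class, key_field = key_location.split(":")
    if key_class != group_class:
        raise ValueError("this sketch assumes key and item share a group")
    values = []
    for group in iter_groups(tree, group_class):
        fields = group.get("fields", {})
        keys = as_list(fields.get(key_field, []))
        if field == key_field:
            values.extend(keys)                  # the key itself: all values
        else:
            # Duplicate the single value once for each key value seen.
            values.extend([fields.get(field)] * len(keys))
    return values


# Made-up tree: two NXuser groups, the first holding two names.
tree = {
    "class": "NXentry",
    "children": [
        {"class": "NXuser",
         "fields": {"name": ["A. Author", "B. Author"],
                    "affiliation": "Lab One"}},
        {"class": "NXuser",
         "fields": {"name": "C. Author", "affiliation": "Lab Two"}},
    ],
}

names = locate(tree, "NXuser:name", "NXuser:name")
affiliations = locate(tree, "NXuser:affiliation", "NXuser:name")
```

With this tree the two result lists have the same length, so they can be
placed side by side as columns of the raw_author loop.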

On Thu, Dec 11, 2008 at 11:32 AM, Herbert J. Bernstein <
yaya at bernstein-plus-sons.com> wrote:

> Dear Colleagues,
>
>   This is a distillation of an email conversation James Hester and I
> have been having since the Osaka meeting.  We both feel that it
> would be helpful if others were to join in and express their views.
> We have been discussing the interaction among DDLm, dREL, images and
> NeXus.  We agree on most points, and disagree on a few, and hope,
> by opening up the discussion, to arrive at a consensus.
>
>   What is driving this discussion is a need to understand how best to
> manage image data in the context of both imgCIF and NeXus, and to do
> so in a way that is consistent with the recent adoption of DDLm as
> the target framework for new work on CIF dictionaries.
>
>   It must be clearly understood that it is highly unlikely that a
> single standard will ever be adopted for crystallographic diffraction
> images, much less for the broader context of pixel-based data in
> structural biology.  The best we can hope for right now is to have
> some number of clearly defined image data frameworks, and agreed
> algorithms for conversion among them.  There are many frameworks
> to consider, but two that are very close to achieving the goal of becoming
> inter-operable in the immediate future are imgCIF and NeXus.  What is
> missing is a formal language within which to specify how to move between
> them.
>
>   We could, of course, just come up with a verbal description of how
> to move between imgCIF and NeXus and a couple of example conversion
> programs written ad hoc in whatever language might come to mind.  However,
> the effort being expended on dREL, the supporting language for DDLm,
> suggests the possibility of building on dREL as a base to do this job
> by extending dREL to have the capability of working with NeXus (dREL
> is already capable of dealing with CIF).  James has made the counter
> proposal of leaving dREL as just a CIF-specific language and keeping
> the CIF-NeXus conversion algorithm specification as a matter for a
> different language and/or API.  James has also suggested the further
> step of stripping out the built-in functions from dREL and dealing with
> just a very stable dREL language specification in one instance and a
> perhaps evolving API (list of builtin functions available in dREL) on the
> other:
>
> "My comment at this stage would be that defining a coupling mechanism
> between CIF and a given language is not a large task, due to the
> simplicity of the CIF syntax, whereas adding lots of stuff to dREL
> would be a serious task and has some important downsides (loss of
> simplicity being an important one). Apropos the
> simplicity of the coupling mechanism, I (predictably) quite like my
> Python model of a CIF file as a hash table of CIF data block objects
> indexed by datablock name, and the datablock objects are themselves
> hash tables of strings/lists of strings indexed by dataname.  This
> model would appear to translate pretty easily into most other
> languages.  What then remains is some syntactic sugar (the use of
> square brackets to do key-based lookup is nice in dREL) which can be
> replaced in another language by a few standard methods."
>
> There was a lot more to the discussion, but let us try to settle a
> direction:
>
> Should we be trying to extend dREL to support more than just CIF,
> specifically NeXus, making something we might call dREL++, or should the
> language for this broader task be something distinct from dREL with a
> distinct name?  In practice, in either case, I suspect all of this will be
> built on a Python base, or something similar, as James suggests, but
> names do matter.
>
> Comments please.
>
> Regards,
>   Herbert
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                  +1-631-244-3035
>                  yaya at dowling.edu
> =====================================================

-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148