DDLm, dREL, images and NeXus

James Hester jamesrhester at gmail.com
Tue Dec 16 02:37:47 GMT 2008


Before responding to Doug, I would comment that, although we are thinking
about NeXus in particular here, we should make sure that whatever scheme we
come up with is generic enough to allow translations to be implemented from
(and to) other data description schemes (e.g. data repositories).

On Mon, Dec 15, 2008 at 3:22 PM, Doug <doug.duboulay at gmail.com> wrote:

>
> On Fri, 12 Dec 2008, James Hester wrote:
> > Let me flesh out a proposal in some detail, so that holes can be picked
> > in it.
>
> The fact that the NeXus data model effectively supports infinite recursion
> on some elements, and also that NXdata can hold many things that will not
> have CIF equivalents, both suggest that NeXus -> CIF conversion could be
> lossy.
>
> > First, an overall view.
> >
> > At the moment, a DDLm/dREL engine is initialised with a set of DDLm
> > dictionaries.
>
> To elaborate a little bit, a dictionary is compiled to Jython/Java
> bytecode as a set of classes, one for each category, each containing
> get/set and evaluate methods. Although DDLm goes to some effort to
> express a hierarchy of categories, at least in the dREL prototype engine
> those categories were flattened to the two-level CIF model at the
> implementation level.


As an aside, there are currently two alternative implementations of dREL
and DDLm. One, produced by Doug, Nick, Syd and Ian, I would characterise as
'static': a DDLm dictionary is converted to executable code at compile
time, allowing an executable dictionary to be distributed to end users. The
alternative approach, which I have been pursuing, is to load the DDLm
dictionary into memory at runtime and execute the dREL code as needed. In
either case I think my abstract description above gives the essential gist
of what happens.

The nice part about the CIF + DDL way of working is that no particular
implementation is mandated, but the correct behaviour is specified. I think
this is why Herbert would prefer to see as much of the NeXus-to-CIF
conversion logic as possible expressed in DDLm/dREL form.

>
> > When passed a CIF instance, it will return values of any
> > datanames that are contained in the CIF instance or that it is capable of
> > calculating from those datanames that are already in the instance.
>
> An instance of the 2-level dictionary object is created and then populated
> with the raw CIF data. Any items for which a "?" was recorded against them
> are subsequently evaluated where possible.
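To make the "fill in the question marks" behaviour concrete, here is a minimal Python sketch. It is not the prototype engine's API: a CIF block is assumed to be a plain dict of dataname to value, "?" marks an unknown value, and Python functions stand in for dREL methods. Only the datanames already present in the block are returned, so intermediate values used along the way do not leak into the output.

```python
# Hypothetical sketch: a CIF block as a dict, "?" marking unknown values, and
# Python functions standing in for dREL methods (not the prototype's API).
from typing import Callable, Dict

Getter = Callable[[str], object]
Methods = Dict[str, Callable[[Getter], object]]


def evaluate_block(block: Dict[str, object], methods: Methods) -> Dict[str, object]:
    """Replace each '?' in block by a calculated value where possible."""
    known = {name: value for name, value in block.items() if value != "?"}
    in_progress = set()  # names currently being evaluated (cycle guard)

    def get(name: str) -> object:
        if name in known:
            return known[name]
        if name not in methods or name in in_progress:
            raise KeyError(name)
        in_progress.add(name)
        try:
            value = methods[name](get)  # a method may request other datanames
        finally:
            in_progress.discard(name)
        known[name] = value  # cached for reuse, but not added to the output
        return value

    output = {}
    for name, value in block.items():
        if value == "?":
            try:
                value = get(name)
            except KeyError:
                pass  # not derivable from what is present: leave as "?"
        output[name] = value
    return output


# Hypothetical method: cell volume for orthogonal axes only (a simplification).
methods: Methods = {
    "_cell.volume": lambda get: get("_cell.length_a") * get("_cell.length_b") * get("_cell.length_c"),
}

block = {
    "_cell.length_a": 2.0,
    "_cell.length_b": 3.0,
    "_cell.length_c": 4.0,
    "_cell.volume": "?",
    "_cell.angle_alpha": "?",  # no method available, so it stays "?"
}
print(evaluate_block(block, methods))
```

Printing the CIF afterwards is then just a walk over the returned dict, as Doug describes next.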


> Thereafter, to print the CIF, the 2-level dictionary object is
> walked/visited and CIF tag/values are written to some output device.
> To generate hierarchical NeXus from CIF, the dREL engine would have to be
> reworked, if it hasn't been already.


To be honest, I was tackling only the 'from NeXus to CIF' issues at this
stage, as they are the most difficult.

>
> > Now, what I envision as a 'translating' DDLm engine is initialised as
> > before with the standard DDLm dictionaries, but also with two further
> > dictionaries: a 'NeXus dictionary' and a 'translation dictionary'
> > (contents of these explained later).
>
> Those two dictionaries would currently be precompiled and created as above.
> I suspect the current dREL interpreter cannot understand more than one
> dictionary simultaneously. Concatenation of dictionaries at the compilation
> stage might be possible, but probably isn't what you want, because that
> would likely embed NeXus names and values in the resulting CIF.


My understanding is that the implementation of which you speak only fills in
the question marks in the supplied CIF file, so presumably any
NeXus-specific names would not be output.

> > Finally, it requires a 'NeXus plugin'. Now, when passed a CIF instance
> > the DDLm engine works as before. When passed a NeXus instance, it returns
> > values of CIF datanames that it can calculate.
> >
> > Now for an explanation of these various extra bits.
> >
> > 1.  The 'NeXus' dictionary is just another DDLm dictionary. It contains
> > definitions for datanames using a CIF namespace: e.g. _nexus.slit_height.
> > The linkage to a NeXus file is accomplished using a set of new DDLm
> > attributes, which work like the current 'xref' attributes: in the header
> > section of this 'NeXus' dictionary file the various versions of the NeXus
> > standard are assigned a short code in a loop. Each of the definitions in
> > the body of the dictionary then contains two new DDLm attributes:
> > _alien.code (referencing the version of the standard in the header) and
> > _alien.location (where to find the dataname). The syntax of the value of
> > _alien.location might be borrowed from, for example, XPath in the case of
> > NeXus.
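A minimal sketch of how the proposed _alien.code and _alien.location attributes might be used to pull a raw value out of a NeXus file that has already been rendered as XML. The short code 'nxA', the element layout and the sample value are all hypothetical, and Python's xml.etree.ElementTree stands in for a full XPath engine (it supports only a limited subset of XPath path syntax).

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical header loop of the 'NeXus' dictionary: short code -> version.
ALIEN_STANDARDS = {"nxA": "an XML rendering of some NeXus version (placeholder)"}

# Hypothetical definitions carrying the proposed _alien.code / _alien.location.
# ElementTree supports only a subset of XPath, so locations here are simple
# element paths relative to the document root.
DEFINITIONS = {
    "_nexus.slit_height": {
        "alien_code": "nxA",
        "alien_location": "NXentry/NXinstrument/NXslit/y_gap",
    },
}


def resolve(dataname: str, nexus_xml: str, file_code: str) -> List[str]:
    """Return the raw text values found at the dataname's _alien.location."""
    definition = DEFINITIONS[dataname]
    code = definition["alien_code"]
    if code not in ALIEN_STANDARDS:
        raise ValueError(f"unknown _alien.code {code!r}")
    if code != file_code:
        raise ValueError(f"{dataname} has no location for standard {file_code!r}")
    root = ET.fromstring(nexus_xml)
    nodes = root.findall(definition["alien_location"])
    return [(node.text or "").strip() for node in nodes]


SAMPLE = """<NXroot>
  <NXentry><NXinstrument><NXslit><y_gap>0.5</y_gap></NXslit></NXinstrument></NXentry>
</NXroot>"""
print(resolve("_nexus.slit_height", SAMPLE, "nxA"))
```

Values come back as raw strings; turning them into properly typed CIF values is exactly the job of the further 'nexus' definitions described below.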

> XPath can provide a mechanism to locate items in an XML document tree,
> but it doesn't provide a mechanism to specify/generate the structure of
> that tree. e.g. //NXdata/@name might get a nodeset corresponding to a list
> of name attribute nodes for potential use as CIF tags, but says nothing
> about the location of the NXdata elements.
> i.e. this is helpful for NeXus -> CIF, but not for CIF -> NeXus


Yes, I was only aiming to solve the NeXus -> CIF problem.
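Doug's point about XPath can be illustrated with Python's standard xml.etree.ElementTree (a stand-in: it supports only a subset of XPath and cannot select attribute nodes directly). The two-entry document is made up. Both NXdata elements carry the same name, so the list of names is identical whichever group each came from; writing them back into a tree needs the path information computed separately by the hypothetical helper element_paths.

```python
import xml.etree.ElementTree as ET
from typing import List

DOC = ET.fromstring(
    "<NXroot>"
    "<NXentry><NXdata name='counts'/></NXentry>"
    "<NXentry><NXinstrument><NXdata name='counts'/></NXinstrument></NXentry>"
    "</NXroot>"
)

# Rough equivalent of //NXdata/@name: ElementTree cannot select attribute
# nodes, so the attribute is read from each matching element instead.
names = [element.get("name") for element in DOC.iter("NXdata")]


def element_paths(root: ET.Element, tag: str) -> List[str]:
    """List the absolute element path of every element with the given tag."""
    found: List[str] = []

    def walk(node: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{node.tag}"
        if node.tag == tag:
            found.append(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return found


print(names)                         # the names alone say nothing about location
print(element_paths(DOC, "NXdata"))  # the extra information CIF -> NeXus would need
```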

>
> > The data definitions containing _alien.location attributes could be
> > considered 'raw' NeXus data, which may not map easily onto CIF datanames.
> > Therefore, this dictionary could contain further DDLm definitions of
> > data items (still in the CIF 'nexus' namespace) containing dREL methods
> > for manipulating the raw datanames into something that maps more
> > directly onto CIF. This is where one might foresee adding a few more
> > built-in functions to dREL to ease e.g. image processing.
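A sketch of the 'raw' versus derived split, with a hypothetical image-processing builtin. The datanames, the roi_sum function and the 3 x 3 detector image are made up for illustration; in the proposal the derived definitions would be dREL methods in the 'nexus' namespace, not Python functions.

```python
from typing import Callable, Dict, List

Image = List[List[int]]


def roi_sum(image: Image, row0: int, row1: int, col0: int, col1: int) -> int:
    """Hypothetical builtin: sum of pixels in rows [row0, row1), cols [col0, col1)."""
    return sum(sum(row[col0:col1]) for row in image[row0:row1])


# 'Raw' item, as if read via _alien.location (hypothetical name and data).
raw: Dict[str, Image] = {"_nexus.detector_data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}


def total_counts(items: Dict[str, Image]) -> int:
    image = items["_nexus.detector_data"]
    return roi_sum(image, 0, len(image), 0, len(image[0]))


def corner_counts(items: Dict[str, Image]) -> int:
    # hypothetical 2 x 2 region of interest starting at the first pixel
    return roi_sum(items["_nexus.detector_data"], 0, 2, 0, 2)


# Derived items, still in the 'nexus' namespace (hypothetical names).
derived: Dict[str, Callable[[Dict[str, Image]], int]] = {
    "_nexus.total_counts": total_counts,
    "_nexus.corner_counts": corner_counts,
}

for name, method in derived.items():
    print(name, method(raw))
```

Keeping the heavy lifting in a small number of builtins like roi_sum would let the dREL methods themselves stay short.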
>
> I get the feeling that somewhere there will need to be a list that says
> something like:
> nexus4:some_cat.some_item1   ?
> nexus4:some_cat.some_item2   ?
> ...
> - in order to trigger the evaluations. Though maybe they would be deduced
> by a CIF full of "?" on the request side.


The idea would be to trigger all the evaluations as usual. Because you
have loaded the 'translate' DDLm dictionary over the top of the normal
dictionary, at some stage the evaluation chain will access NeXus-derived
values instead of primitive values.
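The overlay can be sketched with Python's collections.ChainMap, where the first mapping that defines a name wins. All datanames here are illustrative, and the "methods" are Python stand-ins for dREL. Without the translation dictionary the wavelength is a primitive item with no method, so evaluation fails; with the translation dictionary loaded on top, the same request reaches the NeXus-derived value.

```python
from collections import ChainMap
from typing import Callable, Dict, Mapping, Optional

Getter = Callable[[str], object]
Method = Optional[Callable[[Getter], object]]

# Hypothetical core dictionary. None marks a primitive item with no method,
# whose value must be supplied by the data file.
core: Dict[str, Method] = {
    "_diffrn_radiation_wavelength.value": None,  # wavelength in angstroms
    "_example.wavelength_nm": lambda get: get("_diffrn_radiation_wavelength.value") / 10.0,
}

# Hypothetical 'translate' dictionary loaded over the top: it gives the
# primitive core item a method that reads a NeXus-derived value instead.
translate: Dict[str, Method] = {
    "_diffrn_radiation_wavelength.value": lambda get: get("_nexus.monochromator_wavelength"),
}


def make_getter(methods: Mapping[str, Method], supplied: Mapping[str, object]) -> Getter:
    """Use a supplied value if present, otherwise run the method (no cycle guard)."""
    def get(name: str) -> object:
        if name in supplied:
            return supplied[name]
        method = methods.get(name)
        if method is None:
            raise KeyError(f"{name} is primitive here and was not supplied")
        return method(get)
    return get


nexus_values = {"_nexus.monochromator_wavelength": 1.5406}  # made-up input
overlay = ChainMap(translate, core)  # earlier maps win: translate overrides core
print(make_getter(overlay, nexus_values)("_example.wavelength_nm"))
```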

[example from previous email deleted]

>
>
> Just as an alternative:
>
> <xsl:stylesheet version="1.0"
>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:output method="text"/>
>
> <!-- apply templates to NXuser elements only: position() then counts
>      NXuser nodes, and no other text is copied by the built-in rules -->
> <xsl:template match="/">
>   <xsl:apply-templates select="//NXuser"/>
> </xsl:template>
>
> <xsl:template match="NXuser">
>   <xsl:if test="position() = 1">  <!-- if multiple NXuser elements -->
>     <xsl:text>loop_&#xA;</xsl:text>   <!--  append newline char in hex -->
>     <xsl:text>          _audit_author.name&#xA;</xsl:text>
>     <xsl:text>          _audit_author.affiliation&#xA;</xsl:text>
>     <xsl:text>          _audit_author.phone&#xA;</xsl:text>
>     <xsl:text>          _audit_author.fax&#xA;</xsl:text>
>     <xsl:text>          _audit_author.email&#xA;</xsl:text>
>   </xsl:if>
>
>   <xsl:for-each select="./name">
>      <xsl:variable name="audit_author" select="parent::node()"/>
>        <xsl:call-template name="dumpItem">
>           <xsl:with-param name="item" select="."/><!--i.e. name -->
>        </xsl:call-template>
>        <xsl:call-template name="dumpItem">
>           <xsl:with-param name="item" select="$audit_author/affiliation"/>
>        </xsl:call-template>
>        <xsl:call-template name="dumpItem">
>           <xsl:with-param name="item"
>                           select="$audit_author/telephone_number"/>
>        </xsl:call-template>
>        <xsl:call-template name="dumpItem">
>           <xsl:with-param name="item" select="$audit_author/fax_number"/>
>        </xsl:call-template>
>        <xsl:call-template name="dumpItem">
>           <xsl:with-param name="item" select="$audit_author/email"/>
>        </xsl:call-template>
>      <xsl:text>&#xA;</xsl:text>
>   </xsl:for-each>
> </xsl:template>
>
> <xsl:template name="dumpItem">
>  <xsl:param name="item"/>
>  <xsl:text> </xsl:text>
>  <xsl:choose>
>    <xsl:when test="$item !=''">
>        <!-- add space and parentheses checks here -->
>       <xsl:value-of select="$item"/>
>    </xsl:when>
>    <xsl:otherwise>
>       <xsl:text>.</xsl:text>
>    </xsl:otherwise>
>  </xsl:choose>
> </xsl:template>
> </xsl:stylesheet>
>
>
> - a simple (untested) XSLT stylesheet, usable by a significant number of
> current XSLT processing engines, that could transform NeXus/NXuser data in
> XML format directly into CIF. Some XSLT engines provide extension options
> for doing more complicated transformations when needed. NeXus HDF would
> need transformation to XML first. A separate stylesheet would need to be
> defined to do the reverse transformation, assuming that the CIF was first
> converted to some XML format.
>
> (Not that XSLT is really what I would be looking for in a "mapping" file,
> but it's good to be aware of other possibilities. Maybe there is already
> a CIF->CML->NeXus converter and vice versa?)


This is an intriguing example, and I think that if the values themselves
don't need manipulation it would do a good job. Perhaps the initial
transformation to what I previously called a 'raw NeXus' CIF could best be
done by XSLT, using XSLT's own conventions to do the renormalisation.
Manipulations of data values could then be done by dREL routines in a
'translate' dictionary. There is, however, an important practical
limitation of this scheme: processing XML files that contain images is
ridiculously slow even with current desktop processing power (that is our
experience at the Bragg, anyway).
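As a cross-check on the XSLT route, here is the same NXuser to _audit_author mapping written with Python's standard xml.etree.ElementTree. It is a sketch only: the CIF quoting is simplified (missing values become '.', values containing whitespace are quoted, other CIF quoting rules are ignored), the sample document is made up, and like the stylesheet it assumes the NeXus data are already in XML form.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

TAGS = [
    "_audit_author.name",
    "_audit_author.affiliation",
    "_audit_author.phone",
    "_audit_author.fax",
    "_audit_author.email",
]
# NXuser child elements supplying the columns after the name (as in the XSLT).
CHILDREN = ["affiliation", "telephone_number", "fax_number", "email"]


def cif_value(text: Optional[str]) -> str:
    """Simplified CIF quoting: '.' if missing, quoted if it contains whitespace."""
    if text is None or not text.strip():
        return "."
    text = text.strip()
    if any(ch.isspace() for ch in text):
        return f'"{text}"' if "'" in text else f"'{text}'"
    return text


def nxuser_to_cif(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    rows: List[str] = []
    for user in root.iter("NXuser"):
        for name in user.findall("name"):  # one row per <name>, like the XSLT
            values = [name.text] + [user.findtext(child) for child in CHILDREN]
            rows.append(" ".join(cif_value(v) for v in values))
    if not rows:
        return ""
    header = ["loop_"] + ["  " + tag for tag in TAGS]
    return "\n".join(header + rows) + "\n"


SAMPLE = (
    "<NXroot><NXentry><NXuser><name>Jane Doe</name>"
    "<affiliation>ExampleLab</affiliation><email>jd@example.org</email>"
    "</NXuser></NXentry></NXroot>"
)
print(nxuser_to_cif(SAMPLE), end="")
```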

Also, Nick S. tells me that back in the late 90s he produced an XSLT-based
transformation from CIF and DDL to XML, and was able to use standard XML
tools to validate the CIF-derived XML file against the DDL-derived XML
schema.  Maybe the time has come for this tool to be dusted off.

James.

-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148


More information about the comcifs mailing list