DDLm, dREL, images and NeXus

Doug doug.duboulay at gmail.com
Mon Dec 15 04:22:20 GMT 2008


On Fri, 12 Dec 2008, James Hester wrote:
> Let me flesh out a proposal in some detail, so that holes can be picked in
> it.

The fact that the NeXus data model effectively supports infinite recursion on 
some elements and also that NXdata can hold many things that will not 
have CIF equivalents both suggest that NeXus -> CIF conversion could be lossy.

> First, an overall view.
>
> At the moment, a DDLm/dREL engine is initialised with a set of DDLm
> dictionaries.  

To elaborate a little bit, a dictionary is compiled to jython/java byte
code as a set of classes, one for each category and containing 
methods for get/set and evaluate.  Although DDLm goes to some 
effort to express a hierarchy of categories, at least in the dREL prototype
engine, at the implementation level, those categories were flattened to the 
two-level CIF model.
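
As a rough illustration of what one such compiled category class might look like, here is a Python sketch. It is not the prototype's generated code: the class name Category, the item names and the volume method (which assumes an orthogonal cell) are all invented for the example.

```python
class Category:
    """Hypothetical stand-in for one compiled DDLm category."""

    def __init__(self, name, methods=None):
        self.name = name
        self.values = {}              # item name -> value
        self.methods = methods or {}  # item name -> callable taking the category

    def set(self, item, value):
        self.values[item] = value

    def get(self, item):
        # Missing or unknown ("?") items are evaluated on demand.
        if self.values.get(item, "?") == "?":
            self.values[item] = self.evaluate(item)
        return self.values[item]

    def evaluate(self, item):
        method = self.methods.get(item)
        if method is None:
            return "?"                # nothing available to derive the value
        return method(self)


# Illustrative method only: volume of a cell with all angles at 90 degrees.
cell = Category("cell", methods={
    "volume": lambda c: c.get("length_a") * c.get("length_b") * c.get("length_c"),
})
cell.set("length_a", 2.0)
cell.set("length_b", 3.0)
cell.set("length_c", 4.0)

# Flattened two-level model: category name -> category object -> item values.
dictionary = {"cell": cell}
volume = dictionary["cell"].get("volume")
```

The point of the sketch is only that there is no nesting of categories inside categories at this level, whatever hierarchy the DDLm dictionary itself expresses.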


> When passed a CIF instance, it will return values of any 
> datanames that are contained in the CIF instance or that it is capable of
> calculating from those datanames that are already in the instance. 

An instance of the 2-level dictionary object is created and then populated
with the raw CIF data. Any items whose value was recorded as "?" are 
subsequently evaluated where possible.

Thereafter, to print the CIF, the 2-level dictionary object is walked/visited
and CIF tag/values are written to some output device.
To generate hierarchical NeXus from CIF, the dREL engine would have to be 
reworked, if it hasn't been already.
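
The populate / evaluate / print cycle described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the engine's code: the function names, the (category, item) keyed method table and the orthogonal-cell volume formula are all made up for illustration.

```python
def populate(raw_items):
    """Split '_category.item' tags into a two-level dict."""
    data = {}
    for tag, value in raw_items:
        category, item = tag.lstrip("_").split(".", 1)
        data.setdefault(category, {})[item] = value
    return data


def evaluate_unknowns(data, methods):
    """Replace "?" values where a (hypothetical) method is available."""
    for category, items in data.items():
        for item, value in items.items():
            if value == "?" and (category, item) in methods:
                items[item] = methods[(category, item)](data)


def write_cif(data, block_name):
    """Walk the two-level structure and emit tag/value lines."""
    lines = ["data_" + block_name]
    for category, items in data.items():
        for item, value in items.items():
            lines.append("_%s.%s  %s" % (category, item, value))
    return "\n".join(lines)


raw = [
    ("_cell.length_a", "5.0"),
    ("_cell.length_b", "6.0"),
    ("_cell.length_c", "7.0"),
    ("_cell.volume", "?"),
]
# Illustrative method only: assumes an orthogonal cell.
methods = {
    ("cell", "volume"): lambda d: float(d["cell"]["length_a"])
    * float(d["cell"]["length_b"])
    * float(d["cell"]["length_c"]),
}

data = populate(raw)
evaluate_unknowns(data, methods)
text = write_cif(data, "example")
```

Because write_cif only ever walks two levels, it has no way of emitting a nested NeXus group structure, which is the reason the engine would need reworking for CIF -> NeXus output.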


> Now, 
> what I envision as a 'translating' DDLm engine is initialised as before
> with the standard DDLm dictionaries, but also with two further
> dictionaries: a 'NeXuS dictionary' and a 'translation dictionary' (contents
> of these explained later).  

Those two dictionaries would currently be precompiled and created as above.
I suspect the current dREL interpreter cannot understand more than one
dictionary simultaneously. Concatenation of dictionaries at the compilation 
stage might be possible, but probably isn't what you want, because that
would likely embed NeXus names and values in the resulting CIF.


> Finally, it requires a 'NeXuS plugin'. Now, 
> when passed a CIF instance the DDLm engine works as before.  When passed a
> NeXuS instance, it returns values of CIF datanames that it can calculate.
>
> Now for an explanation of these various extra bits.
>
> 1.  The 'NeXuS' dictionary is just another DDLm dictionary.  It contains
> definitions for datanames using a CIF namespace: e.g. _nexus.slit_height.
> The linkage to a NeXuS file is accomplished using a set of new DDLm
> attributes, which work like the current 'xref' attributes: in the header
> section of this 'NeXuS' dictionary file the various versions of the NeXuS
> standard are assigned a short code in a loop.  Each of the definitions in
> the body of the dictionary then contains two new DDLm attributes:
> _alien.code (referencing the version of the standard in the header) and
> _alien.location (where to find the dataname).  The syntax of the value of
> _alien.location might be borrowed from, for example, XPath in the case of
> NeXuS.

XPath can provide a mechanism to locate items in an XML document tree,
but it doesn't provide a mechanism to specify/generate the structure of that 
tree.  e.g. //NXdata/@name  might get a nodeset corresponding to a list of 
name attribute nodes for potential use as CIF tags, but says nothing about 
the location of the NXdata elements. 
i.e. this is helpful for NeXus -> CIF, but not for CIF -> NeXus.

XPath does provide some quite versatile conditional select expressions though,
so potentially you could select only NXdata nodes whose name attribute had a 
leading underscore, for instance: //NXdata[starts-with(@name,'_')]
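
For anyone wanting to try the selection without an XSLT engine, here is a small Python sketch using the standard library's xml.etree.ElementTree. The XML fragment is invented for the example and is not claimed to follow the real NeXus XML layout. ElementTree only implements a subset of XPath and has no starts-with(), so the underscore test is done in Python instead.

```python
import xml.etree.ElementTree as ET

# Invented sample document; the element layout is an assumption.
NEXUS_XML = """
<NXroot>
  <NXentry name="entry1">
    <NXdata name="_diffrn.ambient_temperature">293</NXdata>
    <NXdata name="counts">1 2 3</NXdata>
  </NXentry>
</NXroot>
"""

root = ET.fromstring(NEXUS_XML)

# Roughly //NXdata[@name]: every NXdata element carrying a name attribute.
named = root.findall(".//NXdata[@name]")

# Roughly //NXdata[starts-with(@name,'_')], with the predicate applied by hand.
cif_like = [node for node in named if node.get("name").startswith("_")]

tags = [node.get("name") for node in cif_like]
values = [node.text for node in cif_like]
```

Note that the result is just a flat list of matching nodes. Nothing in it records where in the tree each NXdata sat, which is the NeXus -> CIF versus CIF -> NeXus asymmetry again.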


> The data definitions containing _alien.location attributes could be
> considered 'raw' NeXuS data, which may not map easily onto CIF datanames.
> Therefore, this dictionary could contain further DDLm definitions of
> dataitems (still in the CIF 'nexus' namespace) which contained dREL methods
> for manipulating the raw datanames into something that mapped more directly
> into CIF.  This is where one might foresee adding a few more builtin
> functions to dREL to ease e.g. image processing.

I get the feeling that somewhere there will need to be a list that says
something like:
nexus4:some_cat.some_item1   ?
nexus4:some_cat.some_item2   ?
...
- in order to trigger the evaluations. Though maybe they would be deduced
by a CIF full of "?" on the request side. 
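
A minimal Python sketch of how such a request list might be read, assuming the "code:category.item  value" layout shown above. The function name and the third, already-answered line are invented for illustration.

```python
# Request text in the layout suggested above; the last line is an invented
# example of an item that already has a value and so needs no evaluation.
REQUEST = """\
nexus4:some_cat.some_item1   ?
nexus4:some_cat.some_item2   ?
nexus4:some_cat.some_item3   42
"""


def pending_items(text):
    """Return (alien code, dataname) pairs whose value is still "?"."""
    pending = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, value = line.split(None, 1)
        if value.strip() == "?":
            code, _, dataname = name.partition(":")
            pending.append((code, dataname))
    return pending


to_evaluate = pending_items(REQUEST)
```

Each pair in to_evaluate would then be handed to whatever does the evaluation.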

> Note also that one might, instead of using an '_alien.location' DDLm
> attribute, define a new dREL builtin (like 'nexus_locate(string)'), and
> provide a simple dREL expression calling this builtin.
>
> 2. The meaning of the 'NeXuS' plugin to the DDLm engine is that it enables
> the DDLm engine to understand the '_alien.location' attribute and use it to
> return a dREL-compatible value.  It would also supply extra builtin
> functions if necessary.
>
> 3. The 'translation dictionary' contains alternative definitions of items
> in the standard CIF dictionaries, where the dREL methods used to derive
> those items in the standard dictionaries are replaced by dREL manipulations
> of dataitems defined in the 'NeXuS' dictionary.  When it is read into the
> DDLm/dREL engine, these new dREL methods could either replace the old ones,
> or be added as alternative methods of type 'translation'.  This then allows
> lots of flexibility in terms of checking derived values in a NeXuS file
> against those values derived via standard CIF dREL methods from primitive
> values from that same NeXuS file.
>
> So, in summary, in order to implement this scheme, one needs about 6 new
> DDLm attributes and a few more builtin dREL functions.  The crucial work is
> in designing the behaviour of the '_alien.location' attribute to properly
> capture the information.
>
> One thing to think about is the introduction of an opaque data type into
> dREL, so that e.g. image data returned from a NeXuS file need not be
> specified as an array of numbers, but simply as an object, which can only
> be passed to a builtin function and not otherwise manipulated.
>
> Advantages of this scheme:
> (1) dREL simplicity is preserved (I think this is important);
> (2) implemented as a modular addon to a normal DDLm engine;
> (3) easily extensible to other data formats;
>
> Here follows a simple example.  Note that a NeXuS file may contain multiple
> 'NXuser' groups, each of which may contain one or more names but only a
> single affiliation.  By specifying that the corresponding CIF category is a
> 'List' category, it can be deduced by the NeXuS plugin that each name in
> the NeXuS file is a new entry in the category loop.  I have accomplished
> renormalisation by walking the tree and returning the items in the order
> encountered.  The category key is used to determine when to duplicate an
> item; so in this case, every time the key is encountered, the associated
> value of the target item is returned.  Otherwise, there would be fewer
> affiliations than names in cases where more than one name belongs to a
> single affiliation.
>
> There is no additional processing of the raw nexus datanames in this
> example.
> ==============================
> The 'NeXuS' dictionary file:
> ==============================
> data_nexus
>  _dictionary.namespace       nexus
>   loop_
>    _dictionary_alien.code
>    _dictionary_alien.type
>     _dictionary_alien.version
>    _dictionary_alien.uri
>    nexus4        nexus        4.0     www.nexusformat.org
> ...
> save_RAW_AUTHOR
>     _definition.class        List     #items go into a loop
>     _category_key.generic   '_raw_author.user_name'  #one value per user_name
>     ...
> save_
>
> save_raw_author.user_name
>     _category.id       raw_author
>      _type.container    Single
>     _type.contents     Text
>     _alien.code        nexus4
>     _alien.location   "NXuser:name" #renormalise by returning all names in order
>                                     #encountered when walking the tree
> save_
>
> save_raw_author.affiliation
>     _category.id       raw_author
>     _type.container    Single
>     _type.contents     Text
>     _alien.code        nexus4
>     _alien.location    "NXuser:affiliation" #renormalise by returning affiliation
>                                             #each time name encountered in tree
> save_
> ...
>
> ================
> The translation file
> ================
> data_nexus_translate
>     ...
> save_audit_author.name
> loop_
>     _method.purpose
>     _method.expression
>     'translate'
> ;
> with aa as audit_author
> with rn as nexus:raw_author
> aa.name = rn.user_name
> ;
> save_
>

Just as an alternative:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="NXuser">
   <!-- loop header before the first NXuser only; position() would also
        count sibling text nodes under the built-in template rules -->
   <xsl:if test="not(preceding-sibling::NXuser)">
     <xsl:text>loop_&#xA;</xsl:text>   <!--  append newline char in hex -->
     <xsl:text>          _audit_author.name&#xA;</xsl:text>
     <xsl:text>          _audit_author.affiliation&#xA;</xsl:text>
     <xsl:text>          _audit_author.phone&#xA;</xsl:text>
     <xsl:text>          _audit_author.fax&#xA;</xsl:text>
     <xsl:text>          _audit_author.email&#xA;</xsl:text>
   </xsl:if>
   
   <xsl:for-each select="./name">
      <xsl:variable name="audit_author" select="parent::node()"/>
        <xsl:call-template name="dumpItem">
           <xsl:with-param name="item" select="."/><!--i.e. name -->
        </xsl:call-template>
        <xsl:call-template name="dumpItem">
           <xsl:with-param name="item" select="$audit_author/affiliation"/>
        </xsl:call-template>
        <xsl:call-template name="dumpItem">
         <xsl:with-param name="item" select="$audit_author/telephone_number"/>
        </xsl:call-template>
        <xsl:call-template name="dumpItem">
           <xsl:with-param name="item" select="$audit_author/fax_number"/>
        </xsl:call-template>
        <xsl:call-template name="dumpItem">
           <xsl:with-param name="item" select="$audit_author/email"/>
        </xsl:call-template>
      <xsl:text>&#xA;</xsl:text>
   </xsl:for-each>
</xsl:template>

<xsl:template name="dumpItem">
  <xsl:param name="item"/>
  <xsl:text> </xsl:text>
  <xsl:choose>
    <xsl:when test="$item !=''">
        <!-- add space and parentheses checks here -->
       <xsl:value-of select="$item"/>
    </xsl:when>
    <xsl:otherwise>
       <xsl:text>.</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
</xsl:stylesheet>


- a simple (untested) XSLT stylesheet, usable by a significant number of
current XSLT processing engines, that could transform NeXus/NXuser data in 
XML format directly into CIF. Some XSLT engines provide extension options
for doing more complicated transformations when needed. NeXus HDF would need 
transformation to XML first. A separate stylesheet would need to be defined 
to do the reverse transformation, assuming that the CIF was first converted to
some XML format. 
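
For comparison, roughly the same NXuser -> CIF loop transformation as a Python sketch, using only the standard library. It is untested against real NeXus XML: the sample document is invented, the child element names are the ones the stylesheet above assumes, and the quoting rule is deliberately simplified (it ignores values that themselves contain quote characters).

```python
import xml.etree.ElementTree as ET

# Child elements copied after each name, in the same order as the stylesheet.
FIELDS = ["affiliation", "telephone_number", "fax_number", "email"]
HEADER = [
    "loop_",
    "_audit_author.name",
    "_audit_author.affiliation",
    "_audit_author.phone",
    "_audit_author.fax",
    "_audit_author.email",
]


def cif_value(text):
    """Return '.' for missing values; quote values containing whitespace."""
    if text is None or not text.strip():
        return "."
    text = text.strip()
    if any(ch.isspace() for ch in text):
        return "'" + text + "'"   # simplified quoting, see lead-in
    return text


def nxuser_to_cif(root):
    rows = []
    for user in root.iter("NXuser"):
        # One row per name; the group's other fields are repeated on each
        # row, which is the renormalisation step.
        for name in user.findall("name"):
            row = [cif_value(name.text)]
            row += [cif_value(user.findtext(field)) for field in FIELDS]
            rows.append(" ".join(row))
    if not rows:
        return ""
    return "\n".join(HEADER + rows) + "\n"


# Invented sample document.
SAMPLE = """
<NXroot><NXentry>
  <NXuser>
    <name>A. Author</name>
    <name>B. Writer</name>
    <affiliation>Some University</affiliation>
    <email>a@example.org</email>
  </NXuser>
  <NXuser><name>C. Third</name></NXuser>
</NXentry></NXroot>
"""

cif_text = nxuser_to_cif(ET.fromstring(SAMPLE))
```

The loop over user.findall("name") is where the affiliation gets duplicated for each name in a group, which is the behaviour the category key is meant to trigger in James's scheme.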

(not that XSLT is really what I would be looking for in a "mapping" file,
but it's good to be aware of other possibilities - but maybe there is already
a CIF->CML->NeXus converter and vice versa?)

Doug



