Re: DDLm, dREL, images and NeXus

Nick Spadaccini nick at csse.uwa.edu.au
Tue Dec 16 08:04:57 GMT 2008


Standard (COMCIFS)" <comcifs at iucr.org>
Conversation: DDLm, dREL, images and NeXus
Subject: Re: DDLm, dREL, images and NeXus

 I am tracking this discussion but don't have time at the moment for a long
and considered response.  I am slowly getting something together, though.  I
can see a way of doing much of what Herb suggests without making too great a
change to the current form of DDLm/dREL, and certainly avoiding the need to
extend DDLm to deal with various alien attributes. It has to do with making
the use of methods in a dictionary context sensitive (really just exploiting
the import mechanism).

What I am thinking is that, for instance, _cell_volume has an evaluation
method which will generate its value from _cell_vector_a etc. What is
important is that all the definition information associated with _cell_volume
has to be consistent. But I can import all of this into another dictionary,
where I have an overwrite of the method. In this dictionary a request for
_cell_volume executes its method, which pokes into the DOM representation of
an imported NeXus file and extracts its value, if it is there. It is OK if it
isn't there, because I can take what I find back to the original dictionary
and the method there will calculate the _cell_volume for me. I can have a
method that takes a CIF formalised data item and injects it into a DOM
representation ready for export OUT to NeXus. The essence is that I use
imports to bring in the method I want, "fit for purpose". The neatness of this
approach is that most of the dictionary is constant, consistent and correct;
ONLY the methods change as needed.
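
To make that concrete, here is a minimal sketch of the fallback behaviour in
Python rather than dREL (the element path, file layout and helper names are
all invented here for illustration; they are not part of DDLm or NeXus):

    import xml.etree.ElementTree as ET

    # Hypothetical sketch only: the element path and helper names are assumptions.
    def cell_volume(nexus_xml_path, base_dictionary_method):
        """Return _cell_volume from an imported NeXus file if it is there,
        otherwise fall back to the base dictionary's evaluation method."""
        tree = ET.parse(nexus_xml_path)          # NeXus file already serialised as XML
        node = tree.find(".//unit_cell_volume")  # assumed location of the value
        if node is not None and node.text:
            return float(node.text)
        # not in the NeXus file, so defer to the original dREL evaluation method
        return base_dictionary_method()

    # usage: volume = cell_volume("scan.nxs.xml", evaluate_volume_from_cell_vectors)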

The problem now is the API. The guts of the dREL parser will do most of what
you want. We will need to develop an extension that takes a NeXus file and
reads it into its DOM formalism. This can be generalized, and much of the Java
(and Python) library already exists. But what about the complications of
extending the API? Well, in the newest form of DDLm we created a new category
called _function, where all the "functions" to be used in dREL are defined.
Since we can access all of Python in our current implementation, we should be
able to build functions that connect dREL to a DOM trawler relatively easily
(says the man who hasn't had time to look at dREL in the last 6 months).
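
As a sketch of the kind of glue function I mean (again Python, with names
invented here purely for illustration rather than any actual dREL or DDLm
interface), a generic "DOM trawler" that dREL methods could call might be no
more than:

    import xml.etree.ElementTree as ET

    # Hypothetical helper: return the text of the first element matching a path
    # in a NeXus file that has already been serialised as XML.
    def nexus_get(xml_path, element_path, default=None):
        root = ET.parse(xml_path).getroot()
        node = root.find(element_path)   # ElementTree's limited XPath-like syntax
        return node.text if node is not None else default

    # e.g. nexus_get("scan.nxs.xml", ".//NXinstrument/NXdetector/distance")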

These are my initial thoughts; I will go and mull them over to see if I am
making sense.


On 16/12/08 11:37 AM, "James Hester" <jamesrhester at gmail.com> wrote:

>Before responding to Doug, I might comment that, although we are thinking
>about NeXuS in particular here, we should make sure that whatever scheme we
>come up with is generic enough to allow translations to be implemented from
>(and to) other data description schemes (e.g. data repositories).
>
>On Mon, Dec 15, 2008 at 3:22 PM, Doug <doug.duboulay at gmail.com> wrote:
>>
>>On Fri, 12 Dec 2008, James Hester wrote:
>>>> Let me flesh out a proposal in some detail, so that holes can be picked in it.
>>
>>The fact that the NeXus data model effectively supports infinite recursion on
>>some elements and also that NXdata can hold many things that will not
>>have CIF equivalents both suggest that NeXus -> CIF conversion could be lossy.
>>
>>>> First, an overall view.
>>>>
>>>> At the moment, a DDLm/dREL engine is initialised with a set of DDLm
>>>> dictionaries.
>>
>>To elaborate a little bit, a dictionary is compiled to jython/java byte
>>code as a set of classes, one for each category and containing
>>methods for get/set and evaluate.  Although DDLm goes to some
>>effort to express a hierarchy of categories, at least in the dREL prototype
>>engine, at the implementation level, those categories were flattened to the
>>two-level CIF model.
>
>As an aside, there are currently two alternative implementations for dealing
>with dREL and DDLm.  One has been produced by Doug, Nick, Syd and Ian, which
>I would characterise as 'static': a DDLm dictionary is actually converted to
>executable code at compile time, allowing distribution to end users of an
>executable dictionary.   One alternative approach which I have been pursuing
>is to load the DDLm dictionary into memory at runtime and execute the dREL
>code as needed.  In either case I think my abstract description above gives
>the essential gist of what happens.
>
>The nice part about the CIF + DDL way of working is that no particular
>implementation is mandated, but the correct behaviour is specified.  I think
>this is why Herbert would prefer to see as much as possible of the NeXuS to
>CIF conversion logic in a DDLm/dREL form.
>>
>>>> When passed a CIF instance, it will return values of any
>>>> datanames that are contained in the CIF instance or that it is capable of
>>>> calculating from those datanames that are already in the instance.
>>
>>An instance of the 2-level dictionary object is created and then populated
>>with the raw CIF data. Any items for which a "?" was recorded against them
>>are subsequently evaluated where possible.
>>
>>Thereafter, to print the CIF, the 2-level dictionary object is walked/visited
>>and CIF tag/values are written to some output device.
>>To generate hierarchical NeXus from CIF, the dREL engine would have to be
>>reworked, if it hasn't been already.
>
>To be honest, I was tackling only the 'from NeXuS to CIF' issues at this
>stage, as they are the most difficult.
>>
>>>> Now,
>>>> what I envision as a 'translating' DDLm engine is initialised as before
>>>> with the standard DDLm dictionaries, but also with two further
>>>> dictionaries: a 'NeXuS dictionary' and a 'translation dictionary' (contents
>>>> of these explained later).
>>
>>Those two dictionaries would currently be precompiled and created as above.
>>I suspect the current dREL interpreter cannot understand more than one
>>dictionary simultaneously. Concatenation of dictionaries at the compilation
>>stage might be possible, but probably isn't what you want, because that
>>would likely embed NeXus names and values in the resulting CIF.
>
>My understanding is that the implementation of which you speak only fills in
>the question marks in the supplied CIF file: so presumably any NeXuS-specific
>names would not be output.
>
>>> Finally, it requires a 'NeXuS plugin'. Now,
>>> when passed a CIF instance the DDLm engine works as before.  When passed a
>>> NeXuS instance, it returns values of CIF datanames that it can calculate.
>>>
>>> Now for an explanation of these various extra bits.
>>>
>>> 1.  The 'NeXuS' dictionary is just another DDLm dictionary.  It contains
>>> definitions for datanames using a CIF namespace: e.g. _nexus.slit_height.
>>> The linkage to a NeXuS file is accomplished using a set of new DDLm
>>> attributes, which work like the current 'xref' attributes: in the header
>>> section of this 'NeXuS' dictionary file the various versions of the NeXuS
>>> standard are assigned a short code in a loop.  Each of the definitions in
>>> the body of the dictionary then contains two new DDLm attributes:
>>> _alien.code (referencing the version of the standard in the header) and
>>> _alien.location (where to find the dataname).  The syntax of the value of
>>> _alien.location might be borrowed from, for example, XPath in the case of
>>> NeXuS.
>
>>XPath can provide a mechanism to locate items in an XML document tree,
>>but it doesn't provide a mechanism to specify/generate the structure of that
>>tree.  e.g. //NXdata/@name  might get a nodeset corresponding to a list of
>>name attribute nodes for potential use as CIF tags, but says nothing about
>>the location of the NXdata elements.
>>i.e. this is helpful for NeXus -> CIF, but not for CIF -> NeXus
>
>Yes, I was only aiming to solve the NeXuS -> CIF problem.
>>
>>>> The data definitions containing _alien.location attributes could be
>>>> considered 'raw' NeXuS data, which may not map easily onto CIF datanames.
>>>> Therefore, this dictionary could contain further DDLm definitions of
>>>> dataitems (still in the CIF 'nexus' namespace) which contained dREL methods
>>>> for manipulating the raw datanames into something that mapped more directly
>>>> into CIF.  This is where one might foresee adding a few more builtin
>>>> functions to dREL to ease e.g. image processing.
>>
>>I get the feeling that somewhere there will need to be a list that says
>>something like:
>>nexus4:some_cat.some_item1   ?
>>nexus4:some_cat.some_item2   ?
>>...
>>- in order to trigger the evaluations. Though maybe they would be deduced
>>by a CIF full of "?" on the request side.
>
>The idea would be to trigger all the evaluations as usual, and because you
>have loaded in the 'translate' DDLm dictionary over the top of the normal
>dictionary, at some stage the evaluation chain will access NeXuS-derived
>values instead of primitive values.
>
>[example from previous email deleted]
>>
>>
>>Just as an alternative:
>>
>><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>><xsl:output method="text"/>
>><xsl:template match="NXuser">
>>   <xsl:if test="position()= 1">  <!-- if multiple NXuser elements -->
>>     <xsl:text>loop_&#xA;</xsl:text>   <!--  append newline char in hex -->
>>     <xsl:text>          _audit_author.name&#xA;</xsl:text>
>>     <xsl:text>          _audit_author.affiliation&#xA;</xsl:text>
>>     <xsl:text>          _audit_author.phone&#xA;</xsl:text>
>>     <xsl:text>          _audit_author.fax&#xA;</xsl:text>
>>     <xsl:text>          _audit_author.email&#xA;</xsl:text>
>>   </xsl:if>
>>
>>   <xsl:for-each select="./name">
>>      <xsl:variable name="audit_author" select="parent::node()"/>
>>        <xsl:call-template name="dumpItem">
>>           <xsl:with-param name="item" select="."/><!--i.e. name -->
>>        </xsl:call-template>
>>        <xsl:call-template name="dumpItem">
>>           <xsl:with-param name="item" select="$audit_author/affiliation"/>
>>        </xsl:call-template>
>>        <xsl:call-template name="dumpItem">
>>           <xsl:with-param name="item" select="$audit_author/telephone_number"/>
>>        </xsl:call-template>
>>        <xsl:call-template name="dumpItem">
>>           <xsl:with-param name="item" select="$audit_author/fax_number"/>
>>        </xsl:call-template>
>>        <xsl:call-template name="dumpItem">
>>           <xsl:with-param name="item" select="$audit_author/email"/>
>>        </xsl:call-template>
>>      <xsl:text>&#xA;</xsl:text>
>>   </xsl:for-each>
>></xsl:template>
>>
>><xsl:template name="dumpItem">
>>  <xsl:param name="item"/>
>>  <xsl:text> </xsl:text>
>>  <xsl:choose>
>>    <xsl:when test="$item !=''">
>>        <!-- add space and parentheses checks here -->
>>       <xsl:value-of select="$item"/>
>>    </xsl:when>
>>    <xsl:otherwise>
>>       <xsl:text>.</xsl:text>
>>    </xsl:otherwise>
>>  </xsl:choose>
>></xsl:template>
>></xsl:stylesheet>
>>
>>
>>- a simple (untested) XSLT stylesheet, usable by a significant number of
>>current XSLT processing engines, that could transform NeXus/NXuser data in
>>XML format directly into CIF. Some XSLT engines provide extension options
>>for doing more complicated transformations when needed. NeXus HDF would need
>>transformation to XML first. A separate stylesheet would need to be defined
>>to do the reverse transformation, assuming that the CIF was first converted
>>to some XML format.
>>
>>(not that XSLT is really what I would be looking for in a "mapping" file,
>>but it's good to be aware of other possibilities - but maybe there is already
>>a CIF->CML->NeXus converter and vice versa?)
>
>This is an intriguing example, and I think if the actual values themselves
>don't need manipulation it would do a good job.  Perhaps the initial
>transformation to what I previously called a 'raw NeXuS' CIF could best be
>done by XSLT, using the conventions of that language to do the
>renormalisation.  Manipulations of data values could then be done by dREL
>routines in a 'translate' dictionary.  There is, however, an important
>practical limitation of this scheme, which is that trying to deal with XML
>files that have images in them is ridiculously slow even with current desktop
>processing power (that is our experience at the Bragg, anyway).
>
>Also, Nick S. tells me that back in the late 90s he produced an XSLT-based
>transformation from CIF and DDL to XML, and was able to use standard XML
>tools to validate the CIF-derived XML file against the DDL-derived XML
>schema.  Maybe the time has come for this tool to be dusted off.
>
>James.

cheers

Nick

--------------------------------
Dr N. Spadaccini
School of Computer Science & Software Engineering

The University of Western Australia    t: +(61 8) 6488 3452
35 Stirling Highway                    f: +(61 8) 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini at uwa.edu.au

