[Imgcif-l] Pilatus 2M putative full CBF implementation
Herbert J. Bernstein
yaya at bernstein-plus-sons.com
Mon Jul 4 18:23:28 BST 2011
Dear Michael,
There are many cases in which CIF uses a value to link multiple tables.
array_data.binary_id is one of them. As a link its value is arbitrary,
provided it is used consistently, and because of that, when combining
CIFs it may well have to be changed.
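
A minimal sketch of that kind of renumbering, assuming each CIF has already been read into plain Python dicts. The dict layout and the helper name `combine_with_remap` are hypothetical illustrations, not part of CBFlib or pycbf:

```python
from itertools import count


def combine_with_remap(files):
    """Merge per-file tables, giving every binary section a fresh binary_id.

    `files` is a list of dicts with the (hypothetical) keys 'array_data'
    and 'diffrn_data_frame'; each value is a list of row dicts that carry
    a 'binary_id' link value.
    """
    next_id = count(1)
    merged = {"array_data": [], "diffrn_data_frame": []}
    for f in files:
        # Old id -> new id for this file only; the old values are arbitrary
        # and may collide with ids used in the other files.
        mapping = {}
        for row in f["array_data"]:
            if row["binary_id"] not in mapping:
                mapping[row["binary_id"]] = next(next_id)
        # Rewrite the link consistently in every table that uses it.
        for table in merged:
            for row in f[table]:
                new_row = dict(row)
                new_row["binary_id"] = mapping[row["binary_id"]]
                merged[table].append(new_row)
    return merged
```

The point of the sketch is only that the link stays consistent within each source file while its numeric value changes in the combined set.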
In this case we have a large group of CBFs that are likely to be
combined into one larger structure in a data management system,
perhaps in HDF5, perhaps in XML, perhaps in CIF, and then spewed
out later for processing. I may be wrong, but as a firm believer
in Murphy's law, I suspect there will be cases in which what is
spewed out will not be in the same order as what went in, perhaps
with shifts, perhaps with inversions. Therefore, I tend to favor
redundancy to have a chance of catching, and perhaps fixing, such
errors.
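
As a rough sketch of such a cross-check, assuming the convention that the frame number, _array_data.binary_id and the MIME X-Binary-ID are all set to the frame's position counted from 1. The dict keys and the function name `cross_check` are hypothetical:

```python
def cross_check(frames):
    """Return (position, frame) pairs whose redundant ids disagree.

    `frames` is the sequence as it came back out of the data management
    system; each frame is a dict with the hypothetical keys
    'frame_number', 'array_binary_id' and 'mime_binary_id'.
    """
    problems = []
    for position, frame in enumerate(frames, start=1):
        ids = {
            frame["frame_number"],
            frame["array_binary_id"],
            frame["mime_binary_id"],
        }
        # All three values must agree with each other and with the
        # position at which the frame was delivered.
        if ids != {position}:
            problems.append((position, frame))
    return problems
```

An inversion of two frames shows up as two flagged positions, and a shift shows up as every frame after the shift being flagged, which is the kind of error a file name alone would not reveal.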
Each to his own.
Regards,
Herbert
At 10:40 AM -0500 7/4/11, Michael Blum wrote:
>Dear Herb,
>
>This use of binary_id is not consistent with my understanding or
>use of it, so I'd like to put in my two cents here.
>
>Because, in this discussion, we are usually talking about
>crystallographic data frames, it is tempting to think of the binary
>data as THE binary data (the diffraction image). However, a CIF/CBF
>file might contain multiple binary data, and I think it is easy to
>imagine that a CBF file might contain, besides a single wavelength
>diffraction image, possibly one or more microscopic images of the
>crystal, or a second wavelength diffraction image, or some other data
>(e.g., spectroscopic) that is best stored in binary form. In our case,
>we store the original MarCCD format binary header as binary data, in a
>second MIME-encoded section.
>In our files, the diffraction image is binary_id 1 and the header is
>binary_id 2. These are associated with the appropriate metadata
>through the binary_id (e.g., the use of _array_data.binary_id for the
>diffraction image), but the numbers themselves are completely
>arbitrary - they could as well be 17 and 123.
>
>Also, since CIF is really a database format, it should be possible
>to combine files (e.g., a diffraction data set) into a single file.
>
>For this reason, I think it would be wrong to have the binary_id
>mean anything more than the id that associates the binary data with
>the appropriate other fields in the CBF file.
>
>As a general observation, I think (and others may disagree) that
>the idea of duplicating an id or association as a "cross-check"
>causes many more headaches than it may solve. It is difficult enough,
>as a developer (or worse, a user), to identify the correct fields and
>their proper usage when there is a single field used for a particular
>purpose. If I must then also discover ALL the other possible places
>where the same information must be recorded consistently, it is a
>recipe for creating files on which downstream programs will choke -
>for no good reason. If programs are to choke, let them choke on bad
>data, not overly complex, cross-linked headers.
>
>I would like to strongly advocate that all data be recorded uniquely
>- NOT multiply!
>
>regards,
>
>Michael
>
>On Jul 4, 2011, at 6:03 AM, Herbert J. Bernstein wrote:
>
>> rstrt is used for the similar tags for rotation and translation,
>> so doing so for time made a set of it. None of them are used much,
>> but they are there if needed.
>>
>> On _array_data.binary_id, if you are trying for a complete header,
>> you should make the value of _array_data.binary_id consistent with
>> the MIME header binary ID, but I had the thought that making both
>> of them equal to the frame number (starting from 1) would be a good
>> cross-check when working with lots of images.
>>
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> yaya at dowling.edu
>> =====================================================
>>
>> On Mon, 4 Jul 2011, Graeme.Winter at Diamond.ac.uk wrote:
>>
>>> Dear Herbert,
>>>
>>> Many thanks for looking at these in such detail. On (1) I would say that
>>> rstrt is not an obvious name to use for the interval, however this will
>>> also not be used very much so perhaps that is not so vital.
>>>
>>> On (2) the dectris software writes the CBF MIME header
>>>
>>> --CIF-BINARY-FORMAT-SECTION--
>>> Content-Type: application/octet-stream;
>>> conversions="x-CBF_BYTE_OFFSET"
>>> Content-Transfer-Encoding: BINARY
>>> X-Binary-Size: 2478811
>>> X-Binary-ID: 1
>>> X-Binary-Element-Type: "signed 32-bit integer"
>>> X-Binary-Element-Byte-Order: LITTLE_ENDIAN
>>> Content-MD5: tidsaCIz+gfJVEe5u+RaKw==
>>> X-Binary-Number-of-Elements: 2476525
>>> X-Binary-Size-Fastest-Dimension: 1475
>>> X-Binary-Size-Second-Dimension: 1679
>>> X-Binary-Size-Padding: 4095
>>>
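
One way to sanity-check a header like the one quoted above is to compare the declared element count with the product of the two dimensions (here 1475 x 1679 = 2476525). The following is only a sketch: the parser is a hand-rolled illustration, not CBFlib's own reader, the function names are hypothetical, and the indentation of the `conversions=` continuation line is an assumption about the original file.

```python
HEADER = """\
--CIF-BINARY-FORMAT-SECTION--
Content-Type: application/octet-stream;
     conversions="x-CBF_BYTE_OFFSET"
Content-Transfer-Encoding: BINARY
X-Binary-Size: 2478811
X-Binary-ID: 1
X-Binary-Element-Type: "signed 32-bit integer"
X-Binary-Element-Byte-Order: LITTLE_ENDIAN
Content-MD5: tidsaCIz+gfJVEe5u+RaKw==
X-Binary-Number-of-Elements: 2476525
X-Binary-Size-Fastest-Dimension: 1475
X-Binary-Size-Second-Dimension: 1679
X-Binary-Size-Padding: 4095
"""


def parse_cbf_mime_header(text):
    """Collect 'Name: value' fields; other lines continue the previous field."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("--"):
            continue  # skip blank lines and the section boundary marker
        key, sep, value = line.partition(":")
        if sep and not raw[:1].isspace():
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            # Indented or colon-free line: continuation of the last field.
            fields[last_key] += " " + line
    return fields


def dimensions_consistent(fields):
    """True if the element count equals fastest x second dimension."""
    n = int(fields["X-Binary-Number-of-Elements"])
    fast = int(fields["X-Binary-Size-Fastest-Dimension"])
    second = int(fields["X-Binary-Size-Second-Dimension"])
    return n == fast * second


fields = parse_cbf_mime_header(HEADER)
print(fields["X-Binary-ID"], dimensions_consistent(fields))
```

The same `fields["X-Binary-ID"]` value is what _array_data.binary_id would need to match for the header to be complete.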
>>>
>>> Thanks again & best wishes,
>>>
>>> Graeme
>>>
>>> -----Original Message-----
>>> From: imgcif-l-bounces at iucr.org
>>> [mailto:imgcif-l-bounces at iucr.org] On Behalf Of Herbert J.
>>> Bernstein
>>> Sent: 02 July 2011 19:21
>>> To: The Crystallographic Binary File and its imgCIF application
>>> to image data
>>> Subject: Re: [Imgcif-l] Pilatus 2M putative full CBF implementation
>>>
>>> Dear Graeme,
>>>
>>> Nice job.
>>>
>>> The CBFs look pretty good in terms of following the "rules". I only
>>> see two small problems. Here's the list for G1F_3_0017.cbf:
>>>
>>> 1. CBFlib: warning input line 133 (19) -- item name
>>> _diffrn_scan_frame.exposure_time not found in the dictionary
>>>
>>> 2. CBFlib: warning -- required parent tag _array_data.binary_id for
>>> _diffrn_data_frame.binary_id in G1F_3_0017 not given
>>> CBFlib: warning -- required parent tag _array_data.binary_id for
>>> _array_intensities.binary_id in G1F_3_0017 not given
>>> Time to read input_cif: 0.414s
>>>
>>>
>>> Let's take them one at a time:
>>>
>>> 1. _diffrn_scan_frame.exposure_time really is not in the dictionary,
>>> and the DECTRIS tag you associate it with
>>>
>>> Exposure_period _expp_
>>>
>>> really isn't so much an exposure time as an interval between exposures.
>>> I propose we add the following tags to the dictionary:
>>> _diffrn_scan.time_increment
>>> _diffrn_scan_frame.time_increment
>>>
>>> _diffrn_scan.time_rstrt_incr
>>> _diffrn_scan_frame.time_rstrt_incr
>>>
>>> The first two would be used for the Dectris "Exposure_period".
>>> The second two would be optional tags for the case where what
>>> we have is the time from end of one integration to the start
>>> of the next integration, rather than the time from start to start.
>>>
>>> 2. _array_data.binary_id
>>>
>>> This seems to be missing. The value you give should agree with the
>>> value you use in _diffrn_data_frame.binary_id and
>>> _array_intensities.binary_id. In the past, we have always used 1,
>>> and that is the default, but I would suggest using the frame
>>> number (counting from 1) instead. If you do that, you should
>>> put the same value into _diffrn_scan_frame.frame_number.
>>> Ideally, this should start with a frame number in the DECTRIS
>>> header (that would be a new field, I believe) and would provide
>>> a cross-check on frame numbers, instead of just relying on the
>>> file name.
>>>
>>> Regards,
>>> Herbert
>>>
>>>
>>>
>>> At 3:44 PM +0000 6/30/11, <Graeme.Winter at Diamond.ac.uk> wrote:
>>>> Dear people interested in imgCIF,
>>>>
>>>> As you will no doubt know, we have been battling for a while to get
>>>> to generating full cbf images from our detectors. We have now
>>>> reached a milestone - full cbf images from Pilatus instruments which
>>>> cbflib recognises and can read, and that can be processed with our
>>>> automated software. However, we felt that it would be important to
>>>> make these data available for review among CIF experts to ensure
>>>> that there are no real boo boo's in there.
>>>>
>>>> So, a full Pilatus2M image data set can be found from:
>>>>
>>>> ftp://ftpanon.diamond.ac.uk/GraemeWinter/CBF/Pilatus2M/C.tar.bz2
>>>>
>>>> which appears to process well using xia2 / XDS, which relies on
>>>> pycbf now included in cctbx to read the image headers and is
>>>> currently an unreleased version. It is unreleased for a good reason:
>>>> the assumptions which define the CIF in the images here are the same
>>>> set of assumptions used in understanding them, though the headers
>>>> are based on those from an ADSC Q315 so should be good. If you are
>>>> keen though you can get the bleeding edge code from sourceforge.
>>>>
>>>> The axes described are the "canonical" rather than true ones i.e.
>>>> they are where we would ideally like everything to be rather than
>>>> where everything is actually measured to be - the latter will be a
>>>> refinement at some point in the future.
>>>>
>>>> What would I like? People to download this, unpack it and critique
>>>> the headers which are contained therein.
>>>>
>>>> For people who are interested in the how, you will also find a .cif
>>>> file in the tarball - this was generated by GDA from a metatemplate
>>>> using some Python code and is used by the "camserver" program to
>>>> compose the full cbf image. Any errors in the CIF are therefore my
>>>> fault in composing this template and need fixing!
>>>>
>>>> Thanks in advance and best wishes,
>>>>
>>>> Graeme
>>>>
>>>> Dr. Graeme Winter
>>>> Senior Software Scientist
>>>> Diamond Light Source
>>>>
>>>> +44 1235 778091 (work)
>>>> +44 7786 662784 (work mobile)
>>>>
>>>>
>>>> --
>>>>
>>>> This e-mail and any attachments may contain confidential, copyright
>>>> and or privileged material, and are for the use of the intended
>>>> addressee only. If you are not the intended addressee or an
>>>> authorised recipient of the addressee please notify us of receipt by
>>>> returning the e-mail and do not use, copy, retain, distribute or
>>>> disclose the information in or attached to the e-mail.
>>>>
>>>> Any opinions expressed within this e-mail are those of the
>>>> individual and not necessarily of Diamond Light Source Ltd.
>>>>
>>>> Diamond Light Source Ltd. cannot guarantee that this e-mail or any
>>>> attachments are free from viruses and we cannot accept liability for
>>>> any damage which you may sustain as a result of software viruses
>>>> which may be transmitted in or with the message.
>>>>
>>>> Diamond Light Source Limited (company no. 4375679). Registered in
>>>> England and Wales with its registered office at Diamond House,
>>>> Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11
>>>> 0DE, United Kingdom
>>>>
>>>> _______________________________________________
>>>> imgcif-l mailing list
>>>> imgcif-l at iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/imgcif-l
>>>
>>>
>
>_______________
>Michael L. Blum Toll Free: 877-627-XRAY (627-9729)
>Rayonix, LLC Tel: 847-869-1548
>1880 Oak Avenue Fax: 847-869-1587
>Evanston, IL 60201 Email: blum at rayonix.com
>USA WWW: www.rayonix.com