[Imgcif-l] Pilatus 2M putative full CBF implementation

Herbert J. Bernstein yaya at bernstein-plus-sons.com
Mon Jul 4 18:23:28 BST 2011


Dear Michael,

   There are many cases in which CIF uses a value to link multiple tables.
_array_data.binary_id is one of them.  As a link, its value is arbitrary,
provided it is used consistently, and because of that, when combining
CIFs it may well have to be changed.
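
   To make that concrete, here is a minimal sketch in Python of
renumbering link values when files are combined.  The dictionary
layout and the name combine_with_fresh_ids are made up for this
illustration; this is not CBFlib's API, only the idea of rewriting a
link value consistently wherever it is used.

```python
from itertools import count


def combine_with_fresh_ids(files):
    """Merge per-file tables, giving every binary section a new unique id.

    `files` is a list of dicts, each holding two hypothetical tables:
      "array_data":        rows like {"binary_id": ..., "payload": ...}
      "diffrn_data_frame": rows like {"id": ..., "binary_id": ...}
    """
    fresh = count(1)
    merged = {"array_data": [], "diffrn_data_frame": []}
    for f in files:
        # Build the old-id -> new-id map once per file, so that every row
        # that used an old value is rewritten to the same new value.
        remap = {}
        for row in f["array_data"]:
            if row["binary_id"] not in remap:
                remap[row["binary_id"]] = next(fresh)
        for table in ("array_data", "diffrn_data_frame"):
            for row in f[table]:
                # A KeyError here means a dangling link in the input file.
                merged[table].append({**row, "binary_id": remap[row["binary_id"]]})
    return merged


# Two files that both use binary_id 1 for their diffraction image.
files = [
    {"array_data": [{"binary_id": 1, "payload": "image A"}],
     "diffrn_data_frame": [{"id": "frame1", "binary_id": 1}]},
    {"array_data": [{"binary_id": 1, "payload": "image B"},
                    {"binary_id": 2, "payload": "header B"}],
     "diffrn_data_frame": [{"id": "frame2", "binary_id": 1}]},
]
merged = combine_with_fresh_ids(files)
```

After the merge the three binary sections carry distinct ids, and each
frame row still points at its own image.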

   In this case we have a large group of CBFs that are likely to be
combined into one larger structure in a data management system,
perhaps in HDF5, perhaps in XML, perhaps in CIF, and then spewed
out later for processing.  I may be wrong, but as a firm believer
in Murphy's law, I suspect there will be cases in which what is
spewed out will not be in the same order as what went in, perhaps
with shifts, perhaps with inversions.  Therefore, I tend to favor
redundancy to have a chance of catching, and perhaps fixing, such
errors.
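
   As a rough illustration of the kind of check I have in mind, here
is a Python sketch.  It assumes the convention suggested earlier in
this thread (frame number, _array_data.binary_id and the MIME
X-Binary-ID all equal, counting from 1).  The function name
check_frame_links and the dictionary keys are hypothetical.

```python
def check_frame_links(frames):
    """Report frames whose redundant identifiers disagree or are out of place.

    `frames` is a list of dicts in the order they came back from storage.
    Each dict has these hypothetical keys:
      "frame_number":    value of _diffrn_scan_frame.frame_number
      "array_binary_id": value of _array_data.binary_id
      "mime_binary_id":  X-Binary-ID from the MIME header
    Returns a list of (position, problem) tuples; an empty list means
    everything is consistent and in order.
    """
    problems = []
    for position, f in enumerate(frames, start=1):
        ids = {f["frame_number"], f["array_binary_id"], f["mime_binary_id"]}
        if len(ids) > 1:
            # The copies of the identifier do not agree with each other.
            problems.append((position, "identifiers disagree"))
        elif f["frame_number"] != position:
            # The copies agree, but the frame is not where it should be.
            problems.append((position, f"holds frame {f['frame_number']}"))
    return problems


in_order = [{"frame_number": i, "array_binary_id": i, "mime_binary_id": i}
            for i in (1, 2, 3)]
swapped = [in_order[1], in_order[0], in_order[2]]
report = check_frame_links(swapped)
```

When the identifiers agree with each other but not with the position,
sorting on the frame number is enough to restore the original order.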

   Each to his own.

   Regards,
     Herbert




At 10:40 AM -0500 7/4/11, Michael Blum wrote:
>Dear Herb,
>
>This use of binary_id is not consistent with my understanding or use
>of it, so I'd like to put in my two cents here.
>
>Because, in this discussion, we are usually talking about
>crystallographic data frames, it is tempting to think of the binary
>data as THE binary data (the diffraction image).  However, a CIF/CBF
>file might contain multiple binary sections, and I think it is easy to
>imagine that a CBF file might contain, besides a single-wavelength
>diffraction image, possibly one or more microscopic images of the
>crystal, or a second-wavelength diffraction image, or some other data
>(e.g., spectroscopic) that is best stored in binary form.  In our
>case, we store the original MarCCD format binary header as binary
>data, in a second MIME-encoded section.  In our files, the diffraction
>image is binary_id 1 and the header is binary_id 2.  These are
>associated with the appropriate metadata through the binary_id (e.g.,
>the use of _array_data.binary_id for the diffraction image), but the
>numbers themselves are completely arbitrary - they could as well be
>17 and 123.
>
>Also, since CIF is really a database format, it should be possible to
>combine files (e.g., a diffraction data set) into a single file.
>
>For this reason, I think it would be wrong to have the binary_id mean
>anything more than the id that associates the binary data with the
>appropriate other fields in the CBF file.
>
>As a general observation, I think (and others may disagree) that the
>idea of duplicating an id or association as a "cross-check" causes
>many more headaches than it may solve.  It is difficult enough, as a
>developer (or worse, a user), to identify the correct fields and their
>proper usage when there is a single field used for a particular
>purpose.  If I must then also discover ALL the other possible places
>where the same information must be recorded consistently, it is a
>recipe for creating files on which downstream programs will choke -
>for no good reason.  If programs are to choke, let them choke on bad
>data, not on overly complex, cross-linked headers.
>
>I would like to strongly advocate that all data be recorded uniquely 
>- NOT multiply!
>
>regards,
>
>Michael
>
>
>
>
>
>On Jul 4, 2011, at 6:03 AM, Herbert J. Bernstein wrote:
>
>>  rstrt is used for the similar tags for rotation and translation,
>>  so doing so for time made a set of it.  None of them are used much,
>>  but they are there if needed.
>>
>>  On _array_data.binary_id, if you are trying for a complete header,
>>  you should make the value of _array_data.binary_id consistent with
>>  the MIME header binary ID, but I had the thought that making both
>>  of them equal to the frame number (starting from 1) would be a good
>>  cross-check when working with lots of images.
>>
>>  =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   yaya at dowling.edu
>>  =====================================================
>>
>>  On Mon, 4 Jul 2011, Graeme.Winter at Diamond.ac.uk wrote:
>>
>>>  Dear Herbert,
>>>
>>>  Many thanks for looking at these in such detail. On (1) I would say that
>>>  rstrt is not an obvious name to use for the interval; however, this will
>>>  also not be used very much, so perhaps that is not so vital.
>>>
>>>  On (2) the dectris software writes the CBF MIME header
>>>
>>>  --CIF-BINARY-FORMAT-SECTION--
>>>  Content-Type: application/octet-stream;
>>>     conversions="x-CBF_BYTE_OFFSET"
>>>  Content-Transfer-Encoding: BINARY
>>>  X-Binary-Size: 2478811
>>>  X-Binary-ID: 1
>>>  X-Binary-Element-Type: "signed 32-bit integer"
>>>  X-Binary-Element-Byte-Order: LITTLE_ENDIAN
>>>  Content-MD5: tidsaCIz+gfJVEe5u+RaKw==
>>>  X-Binary-Number-of-Elements: 2476525
>>>  X-Binary-Size-Fastest-Dimension: 1475
>>>  X-Binary-Size-Second-Dimension: 1679
>>>  X-Binary-Size-Padding: 4095
>>>
>>>
>>>  Thanks again & best wishes,
>>>
>>>  Graeme
>>>
>>>  -----Original Message-----
>>>  From: imgcif-l-bounces at iucr.org 
>>>[mailto:imgcif-l-bounces at iucr.org] On Behalf Of Herbert J. 
>>>Bernstein
>>>  Sent: 02 July 2011 19:21
>>>  To: The Crystallographic Binary File and its imgCIF application 
>>>to image data
>>>  Subject: Re: [Imgcif-l] Pilatus 2M putative full CBF implementation
>>>
>>>  Dear Graeme,
>>>
>>>    Nice job.
>>>
>>>    The CBFs look pretty good in terms of following the "rules".  I only
>>>  see two small problems.  Here's the list for G1F_3_0017.cbf:
>>>
>>>  1.  CBFlib: warning input line 133 (19) -- item name
>>>  _diffrn_scan_frame.exposure_time not found in the dictionary
>>>
>>>  2.  CBFlib: warning -- required parent tag _array_data.binary_id for
>>>  _diffrn_data_frame.binary_id in G1F_3_0017 not given
>>>  CBFlib: warning -- required parent tag _array_data.binary_id for
>>>  _array_intensities.binary_id in G1F_3_0017 not given
>>>   Time to read input_cif: 0.414s
>>>
>>>
>>>  Let's take them one at a time:
>>>
>>>  1.  _diffrn_scan_frame.exposure_time really is not in the dictionary, and the
>>>  DECTRIS tag you associate it with
>>>
>>>    Exposure_period   _expp_
>>>
>>>  really isn't so much an exposure time as an interval between exposures.
>>>  I propose we add the following tags to the dictionary:
>>>      _diffrn_scan.time_increment
>>>      _diffrn_scan_frame.time_increment
>>>
>>>      _diffrn_scan.time_rstrt_incr
>>>      _diffrn_scan_frame.time_rstrt_incr
>>>
>>>  The first two would be used for the Dectris "Exposure_period".
>>>  The second two would be optional tags for the case where what
>>>  we have is the time from end of one integration to the start
>>>  of the next integration, rather than the time from start to start.
>>>
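
The relation between the two proposed kinds of tag can be sketched in
Python as follows.  This assumes that each frame has a known
integration time and that all times are in seconds; the function names
are hypothetical, and the tags themselves are only a proposal at this
point in the thread.

```python
def time_increment(integration_time, rstrt_incr):
    """Start-to-start interval: integration time plus the end-to-start gap."""
    if integration_time < 0 or rstrt_incr < 0:
        raise ValueError("times must be non-negative")
    return integration_time + rstrt_incr


def time_rstrt_incr(increment, integration_time):
    """End-to-start gap recovered from the start-to-start interval."""
    gap = increment - integration_time
    if gap < 0:
        raise ValueError("integration time exceeds the start-to-start interval")
    return gap


# A frame integrated for 0.25 s with a 0.125 s gap before the next frame.
period = time_increment(0.25, 0.125)
gap = time_rstrt_incr(period, 0.25)
```

Either value can be derived from the other once the integration time
is known, which is why the second pair of tags is optional.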
>>>  2.  _array_data.binary_id
>>>
>>>  This seems to be missing.  The value you give should agree with the
>>>  value you use in _diffrn_data_frame.binary_id and
>>>  _array_intensities.binary_id.  In the past, we have always used 1,
>>>  and that is the default, but I would suggest using the frame
>>>  number (counting from 1) instead.  If you do that, you should
>>>  put the same value into _diffrn_scan_frame.frame_number.
>>>  Ideally, this should start with a frame number in the DECTRIS
>>>  header (that would be a new field, I believe) and would provide
>>>  a cross-check on frame numbers, instead of just relying on the
>>>  file name.
>>>
>>>  Regards,
>>>   Herbert
>>>
>>>
>>>
>>>  At 3:44 PM +0000 6/30/11, <Graeme.Winter at Diamond.ac.uk> wrote:
>>>>  Dear people interested in imgCIF,
>>>>
>>>>  As you will no doubt know, we have been battling for a while to get
>>>>  to generating full cbf images from our detectors. We have now
>>>>  reached a milestone - full cbf images from Pilatus instruments which
>>>>  cbflib recognises and can read, and that can be processed with our
>>>>  automated software. However, we felt that it would be important to
>>>>  make these data available for review among CIF experts to ensure
>>>>  that there are no real boo boo's in there.
>>>>
>>>>  So, a full Pilatus2M image data set can be found from:
>>>>
>>>>  ftp://ftpanon.diamond.ac.uk/GraemeWinter/CBF/Pilatus2M/C.tar.bz2
>>>>
>>>>  which appears to process well using xia2 / XDS, which relies on
>>>>  pycbf now included in cctbx to read the image headers and is
>>>>  currently an unreleased version. It is unreleased for a good reason:
>>>>  the assumptions which define the CIF in the images here are the same
>>>>  set of assumptions used in understanding them, though the headers
>>>>  are based on those from an ADSC Q315 so should be good. If you are
>>>>  keen though you can get the bleeding edge code from sourceforge.
>>>>
>>>>  The axes described are the "canonical" rather than true ones i.e.
>>>>  they are where we would ideally like everything to be rather than
>>>>  where everything is actually measured to be - the latter will be a
>>>>  refinement at some point in the future.
>>>>
>>>>  What would I like? People to download this, unpack it and critique
>>>>  the headers which are contained therein.
>>>>
>>>>  For people who are interested in the how, you will also find a .cif
>>>>  file in the tarball - this was generated by GDA from a metatemplate
>>>>  using some Python code and is used by the "camserver" program to
>>>>  compose the full cbf image. Any errors in the CIF are therefore my
>>>>  fault in composing this template and need fixing!
>>>>
>>>>  Thanks in advance and best wishes,
>>>>
>>>>  Graeme
>>>>
>>>>  Dr. Graeme Winter
>>>>  Senior Software Scientist
>>>>  Diamond Light Source
>>>>
>>>>  +44 1235 778091 (work)
>>>>  +44 7786 662784 (work mobile)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>>
>>>>  This e-mail and any attachments may contain confidential, copyright
>>>>  and or privileged material, and are for the use of the intended
>>>>  addressee only. If you are not the intended addressee or an
>>>>  authorised recipient of the addressee please notify us of receipt by
>>>>  returning the e-mail and do not use, copy, retain, distribute or
>>>>  disclose the information in or attached to the e-mail.
>>>>
>>>>  Any opinions expressed within this e-mail are those of the
>>>>  individual and not necessarily of Diamond Light Source Ltd.
>>>>
>>>>  Diamond Light Source Ltd. cannot guarantee that this e-mail or any
>>>>  attachments are free from viruses and we cannot accept liability for
>>>>  any damage which you may sustain as a result of software viruses
>>>>  which may be transmitted in or with the message.
>>>>
>>>>  Diamond Light Source Limited (company no. 4375679). Registered in
>>>>  England and Wales with its registered office at Diamond House,
>>>>  Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11
>>>>  0DE, United Kingdom
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  _______________________________________________
>>>>  imgcif-l mailing list
>>>>  imgcif-l at iucr.org
>>>>  http://scripts.iucr.org/mailman/listinfo/imgcif-l
>>>
>>>
>>>
>>>
>>>
>
>_______________
>Michael L. Blum                    Toll Free: 877-627-XRAY (627-9729)
>Rayonix, LLC                        Tel: 847-869-1548
>1880 Oak Avenue                 Fax: 847-869-1587
>Evanston, IL  60201              Email: blum at rayonix.com
>USA                                       WWW: www.rayonix.com
>
>
>
>


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya at dowling.edu
=====================================================

