[medsbio-l] Fwd: RE: HDF image format

Herbert J. Bernstein yaya at bernstein-plus-sons.com
Tue Sep 11 01:05:22 BST 2007


Matt Dougherty has asked that the following emails be posted to the
medsbio list. -- HJB



>Subject: RE: HDF image format
>Date: Mon, 10 Sep 2007 18:17:09 -0500
>From: "Dougherty, Matthew T." <matthewd at bcm.tmc.edu>
>To: "Thomas Goddard" <goddard at cgl.ucsf.edu>,
>         "Ludtke, Steven J." <sludtke at bcm.tmc.edu>,
>         "Pawel Penczek" <Pawel.A.Penczek at uth.tmc.edu>
>Cc: <yaya at bernstein-plus-sons.com>, <mfolk at hdfgroup.org>, <ram at cs.umb.edu>,
>         "Chiu, Wah" <wah at bcm.tmc.edu>
>
>Hi Tom,
>
>To answer your primary question, yes it is very important to have a 
>unified image data format.
>
>The EMAN HDF format is based on a straw-man prototype I drafted a 
>few years ago.
>This prototype has revealed some performance/design problems in HDF.
>Also, as I thought more deeply about this layout, I realized that my
>format design had deficiencies that require a new prototype.
>
>The real opportunity here is that HDF has not been adopted in the 
>biological community; all of the prototypes have been created by our 
>labs, so there is not an installed base of HDF EM formats resisting 
>change.
>
>As the EM images get larger, existing EM formats will fail and the 
>need for a capable format design to replace them is critical; a 
>unified approach is best.
>
>
>Recently I spoke at the third imgCIF workshop of the Consortium for
>Management of Experimental Data in Structural Biology (MEDSBIO), held
>at the 9th International Conference on Biology and Synchrotron
>Radiation and led by Herbert Bernstein.  My talk on the "Status of
>Data Formats in Cryo EM" is located at
>http://medsbio.org/meetings/BSR_2007_imgCIF_Workshop/
>
>Mike Folk, director of HDF, and I are recent members of the MEDSBIO
>core committee.  Mike and I had time to discuss at length the issues
>raised in the December 7, 2006 teleconference on developing an
>EM image format.
>
>Another opportunity is that MEDSBIO is the center point for the 
>integration of the NEXUS & imgCIF formats within the beamline 
>community; devising an EM/HDF format that is interoperable with 
>these efforts would be strategic.
>
>Getting the experimental data, viz, data storage, repository,
>standards and archival communities working in concert is most
>desirable in terms of legacy, but will require a concerted effort from
>all parties.  A well-designed EM format could be adopted by other
>biological imaging communities.
>
>
>
>
>Regarding the four capabilities noted in your email, all of them are 
>definitely needed, but I differ on the approach. 
>1) subsampled density maps can be generated in several ways (e.g.
>skipping pixels, applying median filters, generating datasets directly
>from inverse space, or using JPEG2000 Part 10).  A mechanism to
>manage and track which method was used is needed.
>2) users prefer a variety of coordinate transformations (Euler angle
>variations, quaternions, direction cosine matrices).  How best to
>manage this?
>3) alternate disk layouts can be handled transparently by a common EM
>format API, or by visualization and other software manipulating the
>HDF API directly.  Which method provides the best long-term
>performance and simplicity?
>4) one comprehensive strategy to deal with transformations, 
>symmetry, and cell angles would be preferred.  This also has 
>implications regarding item #3.
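Items 2 and 4 both ask how to manage several rotation conventions alongside symmetry. One possible approach, sketched here under stated assumptions, is to convert every representation (Chimera's axis plus angle in degrees, a quaternion, a ZYZ Euler triple) into a 3x3 rotation matrix at read time, so that symmetry operators become just more matrices. The function names are hypothetical, the ZYZ Euler convention is an assumption (EM packages differ), and none of this is part of the EMAN or Chimera formats.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def axis_angle_to_matrix(axis, angle_deg):
    """Rotation by angle_deg degrees about axis (Rodrigues' formula)."""
    norm = math.sqrt(sum(c * c for c in axis))
    x, y, z = (c / norm for c in axis)
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def quaternion_to_matrix(w, x, y, z):
    """Rotation matrix of the quaternion w + xi + yj + zk (normalized first)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def euler_zyz_to_matrix(alpha, beta, gamma):
    """R = Rz(alpha) Ry(beta) Rz(gamma), angles in degrees.

    ASSUMPTION: one common ZYZ convention; other packages use others."""
    rz_alpha = axis_angle_to_matrix((0, 0, 1), alpha)
    ry_beta = axis_angle_to_matrix((0, 1, 0), beta)
    rz_gamma = axis_angle_to_matrix((0, 0, 1), gamma)
    return matmul(matmul(rz_alpha, ry_beta), rz_gamma)


def cyclic_symmetry(n, axis=(0.0, 0.0, 1.0)):
    """The n rotations of Cn symmetry about axis, starting with the identity."""
    return [axis_angle_to_matrix(axis, 360.0 * k / n) for k in range(n)]


# Usage: the same 90-degree turn about z, expressed three ways.
half = math.sqrt(0.5)
from_axis = axis_angle_to_matrix((0, 0, 1), 90.0)
from_quat = quaternion_to_matrix(half, 0.0, 0.0, half)
from_euler = euler_zyz_to_matrix(90.0, 0.0, 0.0)
```

A design like this stores one canonical form and treats the others as input conventions; which form the file itself should store is exactly the open question in item 2.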
>
>
>
>Instead of one long email, I will be sending you eight more emails 
>over the next few days that address issues I have identified in the 
>implementation of HDF for biological imaging applications:
>
>1) definition of an image core
>2) metadata, HDF attributes & PyTables
>3) archiving, provenance, and the role of METS & OAIS
>4) management of transformations and symmetry
>5) integration of LSID
>6) needed enhancements to HDF
>7) design goals for digital masters and derivatives
>8) proposed collaboration roadmap
>
>
>If there are no objections, I would like to ask Herbert Bernstein to 
>post these emails on the MEDSBIO website.
>
>
>
>Matthew
>
>
>
>
>-----Original Message-----
>From: Thomas Goddard [mailto:goddard at cgl.ucsf.edu]
>Sent: Thu 9/6/2007 1:59 PM
>To: Ludtke, Steven J.; Pawel Penczek; Dougherty, Matthew T.
>Subject: HDF map format
>
>Hi Steve, Pawel, Matthew,
>
>    Do you think it is important that Chimera and EMAN use the same HDF
>representation for density maps?
>
>    I have been experimenting with the HDF5 file format for EM density
>maps in Chimera for some months.  I've been trying 4 new capabilities,
>mostly with EM tomography in mind:
>
>1) Put subsampled maps in file for fast loading and display of large
>data sets (> 100 Mbytes).
>
>2) Put coordinate rotation in file for bricks of data extracted from
>another map that are not aligned with the original map's axes.  Commonly
>needed in tomography.
>
>3) Allow alternate disk layouts, possibly with more than one chunk shape,
>for fast disk reads of xy, xz and yz planes and of sub-regions.
>
>4) Include symmetry matrices for single-particle reconstructions in the
>HDF map header.  These are used when fitting monomers into the map.
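The payoff of item 3 can be estimated with simple arithmetic: a chunked HDF5 reader fetches every chunk that a requested plane intersects. The sketch below counts those chunks for the (123, 542, 82) uint8 array in the Chimera example later in this message. The chunk shapes and the dataset-name-to-chunk mapping are hypothetical, and the model deliberately ignores compression and HDF5's chunk cache.

```python
import math


def chunks_touched(shape, chunks, axis):
    """Chunks intersected by one full plane perpendicular to `axis`.

    Along `axis` the plane hits a single layer of chunks; along each
    other axis it spans ceil(n / c) chunks."""
    count = 1
    for a, (n, c) in enumerate(zip(shape, chunks)):
        if a != axis:
            count *= math.ceil(n / c)
    return count


def bytes_read(shape, chunks, axis, itemsize=1):
    """Bytes fetched for that plane if whole chunks are read.

    Simplification: no compression and no chunk cache."""
    return chunks_touched(shape, chunks, axis) * math.prod(chunks) * itemsize


def best_layout(shape, layouts, axis, itemsize=1):
    """Name of the copy whose chunk shape makes this plane cheapest to read.

    `layouts` maps (hypothetical) dataset names to chunk shapes."""
    return min(layouts,
               key=lambda name: bytes_read(shape, layouts[name], axis, itemsize))


shape = (123, 542, 82)   # uint8 array from the Chimera example
slab = (1, 542, 82)      # hypothetical: one plane per chunk
cube = (32, 32, 32)      # hypothetical: cubic chunks
layouts = {"data": slab, "data_acs": (123, 542, 1)}  # hypothetical pair
```

With the slab layout, a plane perpendicular to axis 0 costs a single chunk, while a plane perpendicular to axis 2 pulls in all 123 chunks, i.e. the whole array. Keeping a second copy with a different chunk shape trades disk space for fast reads in every direction.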
>
>    None of these added capabilities are part of the EMAN HDF format as
>far as I know, so my HDF maps have a different format from EMAN HDF maps.
>Chimera reads both, but EMAN won't read the HDF maps written by
>Chimera.  That is not ideal.  Below is an example of the current Chimera
>HDF format.
>
>         Tom
>
>
>
># Example HDF5 format written by Chimera.
>#
>#  /image
>#    chimera_version "1.2422"
>#    step (1.2, 1.2, 1.2)
>#    origin (-123.4, -522, 34.5)
>#    cell_angles (90.0, 90.0, 90.0)
>#    rotation_axis (0.0, 0.0, 1.0)
>#    rotation_angle 45.0
>#    data (3d array of uint8 (123,542,82))
>#    data_acs (3d array of uint8 (123,542,82), alternate chunk shape)
>#    data_2 (3d array of uint8 (61,271,41))
>#        subsample_spacing (2, 2, 2)
>#    (more subsampled or alternate chunkshape versions of same data)
>#
># Names "chimera_version", "step", "origin", "cell_angles",
># "rotation_axis", "rotation_angle", "subsample_spacing" are fixed
># while "image", "data", "data_acs" and "data_2" can be any name.
>#
># In the example "image" is an HDF group; "chimera_version", "step",
># "origin", "cell_angles", "rotation_axis" and "rotation_angle"
># are group attributes; "data", "data_acs" and "data_2" are
># HDF datasets (arrays); and "subsample_spacing" is a dataset attribute.
>#
># All data sets within the group represent the same data, such as optional
># subsampled arrays or alternate chunk shape for efficient disk access.
>#
># Cell angles need not be included if they are 90,90,90.  They are
># included for handling crystallographic density maps.  An identity
># rotation need not be included.  The rotation angle is in degrees.
>#
># The file is saved with the Python PyTables module, which adds the
># attributes "VERSION", "CLASS", "TITLE" and "PYTABLES_FORMAT_VERSION".
>#
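Because every dataset in the group holds the same map, a reader can choose the copy that fits its memory budget. The sketch below does this with a plain dict standing in for the HDF5 group (no PyTables or h5py calls), using the shapes from the example above. The function name, and the rule that a dataset without subsample_spacing is full resolution, are assumptions.

```python
import math


def pick_copy(group, max_voxels):
    """Finest copy whose voxel count fits max_voxels; else the coarsest copy.

    `group` is a plain dict standing in for the HDF5 group:
    dataset name -> {"shape": ..., optionally "subsample_spacing": ...}.
    ASSUMPTION: a dataset without subsample_spacing is full resolution."""
    def spacing(name):
        return group[name].get("subsample_spacing", (1, 1, 1))

    # Finest first; ties (e.g. "data" vs "data_acs") are broken by name.
    ordered = sorted(group, key=lambda name: (math.prod(spacing(name)), name))
    for name in ordered:
        if math.prod(group[name]["shape"]) <= max_voxels:
            return name
    return ordered[-1]


image = {
    "data": {"shape": (123, 542, 82)},
    "data_acs": {"shape": (123, 542, 82)},
    "data_2": {"shape": (61, 271, 41), "subsample_spacing": (2, 2, 2)},
}
```

A real reader would also weigh which copy's chunk shape suits the access pattern; this sketch looks only at size.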
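The geometry attributes in the example combine into a mapping from grid index to xyz coordinates. The sketch below assumes xyz = origin + R(step * ijk), where R is the rotation given by rotation_axis and rotation_angle (degrees) and ijk is ordered to match step and origin. That convention, the function names, and the omission of non-90-degree cell_angles are assumptions, not a statement of how Chimera computes it.

```python
import math


def axis_angle_to_matrix(axis, angle_deg):
    """Rotation by angle_deg degrees about axis (Rodrigues' formula)."""
    norm = math.sqrt(sum(c * c for c in axis))
    x, y, z = (c / norm for c in axis)
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def ijk_to_xyz(ijk, origin, step, rotation_axis=(0.0, 0.0, 1.0),
               rotation_angle=0.0):
    """Grid index -> xyz, ASSUMING xyz = origin + R (step * ijk).

    The defaults reflect the example's note that an identity rotation
    may be omitted; cell angles are taken as 90, 90, 90 (no skew)."""
    r = axis_angle_to_matrix(rotation_axis, rotation_angle)
    scaled = [s * n for s, n in zip(step, ijk)]
    return tuple(o + sum(r[a][b] * scaled[b] for b in range(3))
                 for a, o in enumerate(origin))


# Header values from the example layout.
origin = (-123.4, -522.0, 34.5)
step = (1.2, 1.2, 1.2)
```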


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya at dowling.edu
=====================================================

