[medsbio-l] Fwd: RE: HDF image format
Herbert J. Bernstein yaya at bernstein-plus-sons.com
Tue Sep 11 01:05:22 BST 2007
- Previous message: [medsbio-l] Draft Report on Third imgCIF workshop
Matt Dougherty has asked that the following emails be posted to the
medsbio list. -- HJB

>Subject: RE: HDF image format
>Date: Mon, 10 Sep 2007 18:17:09 -0500
>From: "Dougherty, Matthew T." <matthewd at bcm.tmc.edu>
>To: "Thomas Goddard" <goddard at cgl.ucsf.edu>,
>        "Ludtke, Steven J." <sludtke at bcm.tmc.edu>,
>        "Pawel Penczek" <Pawel.A.Penczek at uth.tmc.edu>
>Cc: <yaya at bernstein-plus-sons.com>, <mfolk at hdfgroup.org>, <ram at cs.umb.edu>,
>        "Chiu, Wah" <wah at bcm.tmc.edu>
>
>Hi Tom,
>
>To answer your primary question, yes it is very important to have a
>unified image data format.
>
>The EMAN HDF format is based on a straw-man prototype I drafted a
>few years ago.
>This prototype has revealed some performance/design problems in HDF.
>Also, as I thought more deeply about this layout, I realized
>deficiencies in my format design requiring a new prototype.
>
>The real opportunity here is that HDF has not been adopted in the
>biological community; all of the prototypes have been created by our
>labs, so there is not an installed base of HDF EM formats resisting
>change.
>
>As the EM images get larger, existing EM formats will fail and the
>need for a capable format design to replace them is critical; a
>unified approach is best.
>
>
>Recently I spoke at the Consortium for Management of Experimental
>Data in Structural Biology Third imgCIF workshop at 9th
>International Conference on Biology and Synchrotron Radiation, led
>by Herbert Bernstein. My talk on the "Status of Data Formats in
>Cryo EM" is located at
>http://medsbio.org/meetings/BSR_2007_imgCIF_Workshop/
>
>Mike Folk, director of HDF, and I are recent members of MEDSBIO core
>committee.
>Mike and I had time to discuss at length the issues
>brought up in the December 7, 2006 teleconference on developing an
>EM image format.
>
>Another opportunity is that MEDSBIO is the center point for the
>integration of the NEXUS & imgCIF formats within the beamline
>community; devising an EM/HDF format that is interoperable with
>these efforts would be strategic.
>
>Getting the experimental data, viz, data storage, repository,
>standards and archival communities working in concert is most
>desirable in terms of legacy, but will require a concerted effort of
>all parties. A well designed EM format could be appropriated by
>other biological imaging communities.
>
>
>Regarding the four capabilities noted in your email, all of them are
>definitely needed, but I differ on the approach.
>1) the generation of subsampled density maps has different
>possibilities (i.e. skipping pixels, use of median filters,
>generation of datasets directly from inverse space, use of JPEG2000
>part 10). A mechanism to manage/track this is needed.
>2) users prefer a variety of coordinate transformations (Euler angle
>variations, quaternions, cosine matrices). How best to manage this?
>3) alternate disk layouts can be transparently accomplished by a
>common EM format API, or by the viz and other software directly
>manipulating the HDF API. What method provides the best long-term
>performance and simplicity?
>4) one comprehensive strategy to deal with transformations,
>symmetry, and cell angles would be preferred. This also has
>implications regarding item #3.
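
Item 2 above names several rotation conventions (Euler angles, quaternions, cosine matrices). As a rough illustration of why a shared format would need to record which convention it stores, the sketch below converts a single axis-angle rotation into a unit quaternion and then into a 3x3 direction-cosine matrix using plain Python. The function names are hypothetical and are not part of EMAN, Chimera, or the HDF API; the matrix assumes an active rotation applied to column vectors, which is itself a convention the format would have to state.

```python
import math

def axis_angle_to_quaternion(axis, angle_deg):
    """Return a unit quaternion (w, x, y, z) for a rotation about `axis`.

    Hypothetical helper for illustration only.
    """
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half) / norm  # also normalizes the axis
    return (math.cos(half), ax * s, ay * s, az * s)

def quaternion_to_matrix(q):
    """Return the 3x3 rotation (direction-cosine) matrix for unit quaternion q.

    Assumes an active rotation acting on column vectors.
    """
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

# Illustrative input: a 45 degree rotation about the z axis.
q = axis_angle_to_quaternion((0.0, 0.0, 1.0), 45.0)
m = quaternion_to_matrix(q)
```

The same rotation has three different stored forms here (axis plus angle, four quaternion components, nine matrix entries), so a reader of the file needs metadata saying which one it is looking at.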
>
>
>Instead of one long email, I will be sending you eight more emails
>over the next few days that address issues I have identified in the
>implementation of HDF for biological imaging applications:
>
>1) definition of an image core
>2) metadata, hdf attributes & pytables
>3) archiving, provenance, and the role of METS & OAIS
>4) management of transformations and symmetry
>5) integration of LSID
>6) needed enhancements to HDF
>7) design goals for digital masters and derivatives
>8) proposed collaboration roadmap
>
>
>If there are no objections, I would like to ask Herbert Bernstein to
>post these emails on the MEDSBIO website.
>
>
>Matthew
>
>
>-----Original Message-----
>From: Thomas Goddard [mailto:goddard at cgl.ucsf.edu]
>Sent: Thu 9/6/2007 1:59 PM
>To: Ludtke, Steven J.; Pawel Penczek; Dougherty, Matthew T.
>Subject: HDF map format
>
>Hi Steve, Pawel, Matthew,
>
> Do you think it is important that Chimera and EMAN use the same HDF
>representation for density maps?
>
> I have been experimenting with HDF5 file format for EM density maps
>in Chimera for some months. I've been trying 4 new capabilities, mostly
>with EM tomography in mind:
>
>1) Put subsampled maps in file for fast loading and display of large
>data sets (> 100 Mbytes).
>
>2) Put coordinate rotation in file for bricks of data extracted from
>another map that are not aligned with the original map's axes. Commonly
>needed in tomography.
>
>3) Allow disk layout with alternate, possibly more than one chunk shape,
>for fast disk reads of xy planes, xz planes, and yz planes, and sub-regions.
>
>4) Include symmetry matrices for single-particle reconstructions in hdf
>map header. Used in fitting of monomers into map.
>
> None of these added capabilities are part of the EMAN HDF format as
>far as I know, so my HDF maps have different format than EMAN HDF maps.
> Chimera reads both, but EMAN won't read the HDF maps written by
>Chimera. That is not ideal.
>Below is an example of the current Chimera
>HDF.
>
> Tom
>
>
># Example HDF5 format written by Chimera.
>#
># /image
>#   chimera_version "1.2422"
>#   step (1.2, 1.2, 1.2)
>#   origin (-123.4, -522, 34.5)
>#   cell_angles (90.0, 90.0, 90.0)
>#   rotation_axis (0.0, 0.0, 1.0)
>#   rotation_angle 45.0
>#   data (3d array of uint8 (123,542,82))
>#   data_acs (3d array of uint8 (123,542,82), alternate chunk shape)
>#   data_2 (3d array of uint8 (61,271,41))
>#     subsample_spacing (2, 2, 2)
>#   (more subsampled or alternate chunkshape versions of same data)
>#
># Names "chimera_version", "step", "origin", "cell_angles",
># "rotation_axis", "rotation_angle", "subsample_spacing" are fixed
># while "image", "data", "data_acs" and "data_2" can be any name.
>#
># In the example "image" is an HDF group, "chimera_version", "step",
># "origin", "cell_angles", "rotation_axis", "rotation_angle",
># are group attributes, "data", "data_acs" and "data_2" are
># hdf datasets (arrays), and "subsample_spacing" is a dataset attribute.
>#
># All data sets within the group represent the same data, such as optional
># subsampled arrays or alternate chunk shape for efficient disk access.
>#
># Cell angles need not be included if they are 90,90,90. They are
># included for handling crystallographic density maps. An identity
># rotation need not be included. The rotation angle is in degrees.
>#
># The file is saved with the Python PyTables module, which includes
># additional attributes "VERSION", "CLASS", "TITLE",
># "PYTABLES_FORMAT_VERSION".
>#

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya at dowling.edu
=====================================================
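
The array sizes in the Chimera example are consistent with dividing each axis of the full-resolution array by the subsample spacing and rounding down: (123, 542, 82) with spacing (2, 2, 2) gives (61, 271, 41). The message does not say how Chimera computes the subsampled values, so the plain-Python sketch below is only an assumption: it uses the simplest of the options mentioned in the thread (skipping voxels) and hypothetical function names, with the volume held as nested lists rather than an HDF dataset.

```python
def subsampled_shape(shape, spacing):
    """Shape of a subsampled array, using floor division on each axis."""
    return tuple(n // s for n, s in zip(shape, spacing))

def subsample_by_skipping(volume, spacing):
    """Keep every spacing-th voxel along each axis of a nested-list volume.

    Assumed strategy for illustration; trailing voxels that do not fill a
    complete step are dropped so the result matches subsampled_shape().
    """
    si, sj, sk = spacing
    full = (len(volume), len(volume[0]), len(volume[0][0]))
    ni, nj, nk = subsampled_shape(full, spacing)
    return [
        [[volume[i * si][j * sj][k * sk] for k in range(nk)] for j in range(nj)]
        for i in range(ni)
    ]

# Shape check against the sizes quoted in the example listing.
example_shape = subsampled_shape((123, 542, 82), (2, 2, 2))

# Tiny 3 x 4 x 2 volume whose values encode their own indices.
small = [[[100 * i + 10 * j + k for k in range(2)] for j in range(4)]
         for i in range(3)]
small_sub = subsample_by_skipping(small, (2, 2, 2))
```

Whichever strategy is used, storing it alongside "subsample_spacing" would let a reader tell a voxel-skipped map from a filtered or averaged one.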