From yaya at bernstein-plus-sons.com Sun May 14 23:56:13 2006
From: yaya at bernstein-plus-sons.com (Herbert J. Bernstein)
Date: Mon May 15 03:56:30 2006
Subject: [medsbio-l] MEDSBIO goals
Message-ID:

Welcome to the discussion list for the Consortium on Management of Experimental Data in Structural Biology (MEDSBIO).

Our first discussion topic will be to make any necessary refinements and adjustments to our goals. Our starting point is given below. Please submit your comments and suggestions to medsbio-l@iucr.org. We should have a medsbio.org web-site by the middle of next week.

1. Collaboratively resolve the interface issues among multiple structural biology data management protocols, including imgCIF, NeXuS, vendor data formats, instrument control and signalling protocols, local and remote experiment control protocols, etc., with the objective of making the collection, transfer and archiving of data for experiments in structural biology as efficient as practicable.

2. Maintain an archive of documentation on standards and proposals for ontologies, software, hardware specifications, web templates and other documentation related to such protocols.

3. Maintain an archive of open source software related to such protocols.

4. Maintain an archive of samples and test cases related to such protocols.

5. Run annual workshops on issues relating to such protocols.

6. Contribute open source software to fill gaps in the infrastructure related to such protocols.

7. Gather and where necessary create curricular material to assist in training experimenters in issues related to such protocols.

-- H. J. Bernstein

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
Office: +1-631-244-3035
Lab (KSC 020): +1-631-244-3451
yaya@dowling.edu
=====================================================

From yaya at bernstein-plus-sons.com Wed May 17 10:48:28 2006
From: yaya at bernstein-plus-sons.com (Herbert J. Bernstein)
Date: Wed May 17 14:48:50 2006
Subject: [medsbio-l] Partial Support Applications for imgCIF Workshop in Hawaii
Message-ID:

Partial Support Applications for imgCIF Workshop in Hawaii
22 July 2006
in conjunction with the
Annual Meeting of the American Crystallographic Association, 22-27 July 2006
Honolulu, Hawaii

Thanks to funding from the U.S. Department of Energy under grant ER64212-1027708-0011962 and (subject to pending final action on funding) from the National Science Foundation, a limited number of Partial Support Awards are available for participants in the imgCIF workshop on 22 July 2006 in association with the Summer 2006 meeting of the American Crystallographic Association in Hawaii (see the WK.02 workshop description below). These awards will be in the form of reimbursements after attendance at the workshop for actual expenses and after the submission of receipts and hotel bills.

Applications from students, young researchers, and those with an interest in the development of crystallographic software or beam line infrastructure are particularly encouraged. You don't have to be an expert, but it would not hurt if you were interested in the possibility of becoming an expert some day and of contributing to the management and use of synchrotron image data.

Applicants will be accepted on the basis of their interest in the subject matter of the workshop as explained in their application letter or email. Please send email or a letter to:

  Prof. Herbert J. Bernstein
  re: imgCIF Workshop Support Application
  Dowling College
  150 Idle Hour Blvd, KSC 121
  Oakdale, NY 11769-1999, USA
  yaya@dowling.edu

Applications received before 22 May 2006 from people who have pre-registered both for the ACA meeting and for the workshop will be given priority, but, subject to funds remaining available, applications will be considered through 1 July 2006.

Please include the following information:

  Name, address, email address

  Position:
    Undergraduate student (School, major, expected date of graduation)
    Graduate student (School, field of study, area of study, degree)
    Postdoc (Institution, field of research, supervisor)
    Research Position (Institution, field of research)
    Teaching Position (Institution, department, primary subjects taught)

  Registration Status:
    Registered for the meeting?
    Registered for WK.02?

  Level of support requested:
    One night of student housing, student workshop registration
      Dollar amount of request
    One night of postdoc housing, workshop registration
      Dollar amount of request
    One night of other housing, workshop registration
      Dollar amount of request
    More extensive support
      Details and dollar amount of request

  300 to 600 words explaining your interest in the subject matter of this workshop

We expect to provide a limited number of awards in the range of $200 to $500, and, with sufficient justification and subject to funding availability, some larger awards.

If you apply for a supplement and it is awarded to you, we will send you the necessary forms to use to submit verified copies of the receipts for the WK.02 workshop registration and one night of housing or other expenses relevant to attendance at the workshop to justify reimbursement after the meeting ends.
Naturally, to get the supplement, you have to participate in the workshop and submit an Advance Registration form (available on the meeting web site: www.hwi.buffalo.edu/aca/). The supplement cannot be used for the main meeting registration and it would not be appropriate to use the supplement for expenses being reimbursed from other sources.

These awards will be made on an equal opportunity basis.

WK.02 The Management of Synchrotron Image Data: The imgCIF File System and Beyond

Herbert J. Bernstein, yaya@dowling.edu
Robert M. Sweet, sweet@bnl.gov

The pace of data collection and the volume of data collected at synchrotron beam lines are increasing. The ACA Data, Standards, and Computing Committee is spearheading an effort to improve the efficiency of the handling and storage of these data by encouraging the adoption of common data formats and standard software interfaces. The first goal is to have the data be self-defining, and therefore equally accessible to all data-reduction and -visualization codes. The second goal, for the purposes of secure archiving, is to provide a robust internal documentation of the source of the data.

The first imgCIF/CBF workshop took place at the Brookhaven National Laboratory in 1997 and proposed a format combining support for an efficient binary representation of images with a fully CIF-compliant ASCII equivalent. An imgCIF/CBF dictionary, and software to support the format, were created, are available on the web, and are described in Volume G of the IUCr International Tables for Crystallography. Now the community should adopt a consensus standard for management of data at synchrotron beam lines, to make it easier for users to process data taken from various beam lines. Also, as our science evolves, new concepts will be considered; possibilities include NeXuS and XML.

The primary activity at the first workshop will be to adopt a new standard for synchrotron image data for immediate use.
As a practical matter, that standard should be based on the existing imgCIF IUCr standard. At the same time, the attendees, including standards developers, crystallographic software developers, and interested users, will be invited to seek a consensus on the next steps to extend imgCIF/CBF.

A follow-up workshop will take place during the 2006-2007 academic year to resolve any issues that arise in the adoption of the adjusted imgCIF standard, to discuss the extensions, and to enhance the standard as appropriate. There will be a wrap-up workshop in summer 2007.

The workshop registration fee is $80 for students, $100 for all others, and does not include breakfast or lunch.

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
Office: +1-631-244-3035
Lab (KSC 020): +1-631-244-3451
yaya@dowling.edu
=====================================================

From yaya at bernstein-plus-sons.com Wed May 17 20:56:26 2006
From: yaya at bernstein-plus-sons.com (Herbert J. Bernstein)
Date: Thu May 18 00:56:37 2006
Subject: [medsbio-l] Relationship to BioSync
Message-ID:

The PDB just announced that they have revitalized and updated the BioSync web site. This makes an excellent complement to the MEDSBIO effort, and saves us from having to update that very valuable information.

BioSync has a focus on the information that synchrotron users need. MEDSBIO has a focus more on the very detailed technical information that beamline developers and maintainers need, as well as the information that software designers and hardware designers need to work with a variety of data management protocols.

Many people need both kinds of information. Hopefully a fruitful interaction will develop between these two perspectives on experimental data in structural biology -- the perspective of the user and the perspective of the developer -- to the benefit of both.
--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
Office: +1-631-244-3035
Lab (KSC 020): +1-631-244-3451
yaya@dowling.edu
=====================================================

From yaya at bernstein-plus-sons.com Sat May 20 16:38:36 2006
From: yaya at bernstein-plus-sons.com (Herbert J. Bernstein)
Date: Sat May 20 20:38:49 2006
Subject: [medsbio-l] draft expanded goals for MEDSBIO
Message-ID:

The following is a draft of the goals for MEDSBIO, expanded to help clarify the technical focus of the group. Specific citations to other groups (PDB, BioSync, NOBUGS, etc.) will be added in the next draft, but first it is important to clarify the focus of this group. Comments, criticism and suggestions appreciated.

-- H. J. Bernstein

Consortium on Management of Experimental Data in Structural Biology (MEDSBIO)

There is a complex relationship among raw experimental data, derived data and experimental models used in structural biology. There are strong collaborative efforts that help to achieve coherence and consistency in the nomenclature and representation of derived data and of experimental models in structural biology. There are many existing efforts in the management of raw data, wherein lies a problem. Each vendor of data collection equipment defines their own data acquisition protocols and data formats. Each synchrotron beamline development group layers their own data acquisition protocols and formats on top of, and sometimes in place of, a variety of vendor formats.

Multiple collaborations have developed to reduce the complexity of raw data management protocols in structural biology.
For image data in synchrotron-based protein crystallography we have both imgCIF/CBF and NeXuS from collaborations as well as multiple vendor image formats, with not only different formats for different detectors, but even different formats for the same type of detector. If we do not bring the imgCIF and NeXuS collaborations together with some significant number of vendors to establish clean, well-documented relationships among the formats, then, instead of standards resulting in coherence, they may add to the chaos as poorly documented variants of "standards" emerge. We are not certain that this risk can be, or even that it should be, avoided completely. Perfect standardization could suppress creativity and scientific development.

We are creating a new consortium on the Management of Experimental Data in Structural Biology (MEDSBIO) not to enforce standardization on a single data management protocol, but to document clearly the interfaces among protocols, so that individual experimental efforts working in the intersection of multiple protocols can function as efficiently as possible and so that the competition among standards can be resolved as an open competition of ideas to the betterment of the science involved, rather than as a political exercise.

The goals of the MEDSBIO consortium are to

  collaboratively resolve the interface issues among multiple structural biology data management protocols, including imgCIF, NeXuS, vendor data formats, instrument control and signaling protocols, local and remote experiment control protocols, etc., with the objective of making the collection, transfer and archiving of data for experiments in structural biology as efficient as practicable;

  maintain an archive of documentation on standards and proposals for ontologies, software, hardware specifications, web templates and other documentation related to such protocols;

  maintain an archive of open source software and links to closed source software related to such protocols;

  maintain an archive of samples and test cases related to such protocols;

  run annual workshops on issues relating to such protocols;

  contribute open source software to fill gaps in the infrastructure related to such protocols;

  gather and where necessary create curricular material to assist in training experimenters in issues related to such protocols.

These efforts are primarily focused on the fine details of data acquisition, of managing raw data in hardware and software in ways that conserve resources. These are issues that users of this data often gloss over or do not consider at all. For the users, data derived from the raw data - e.g. structure factors derived from pixel-by-pixel photon counts - are the primary data, to be provided by "black-box" systems. MEDSBIO is concerned with issues in the innards of those black boxes.

There is a strong relationship between these internal issues and the issues that users must confront. They are connected by the data and the representations of the derived data required by the users. Thus if a particular user community were to standardize on, say, imgCIF for their "raw" data in a synchrotron environment using, say, NeXuS for its overall data management, working with detectors using an idiosyncratic detector element coordinate system, the users might well wish to be isolated from NeXuS and the oddities of the detector coordinate system, but the beam line designers need to have a detailed, well-documented understanding of how to interface among all the messy innards that the users never wish to deal with.
If this is not done well and done in a consistent manner at multiple beam lines, then, instead of imgCIF providing a standard, it will exist in multiple difficult-to-translate dialects.

Because end users and developers have a lot in common and are tied together by the data itself as it is transformed from raw images, photon counts, axis settings, etc., it is important that there be collegial collaboration between people working on problems on both ends of the data stream, but it is equally important to allow the technical issues on the raw data side to be fully discussed and explored without being swamped by the equally demanding discussions needed on the derived data side. Therefore it is important to have a collaborative consortium in the developer community that is neither focused on a single data management protocol, nor dominated by discussions of derived data user-level issues.

The MEDSBIO consortium formalizes several existing collaborations and introduces a new level of coordination and cooperation in working with raw experimental data of importance in structural biology, complementing well-established efforts in working with the data derived from this raw data, hopefully producing a better understanding of the data upon which much of the experimental work in structural biology is based and an understanding of the issues which affect the quality and reliability of that data. By clarifying and codifying the parameters of the information streams that interact to produce the raw data, we hope to bring a new level of consistency and coherence to the presentation of scientific results of the experiments that depend upon this data, thereby facilitating reliable intercomparisons among experiments and facilitating analysis based upon the results of multiple experiments.

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
Office: +1-631-244-3035
Lab (KSC 020): +1-631-244-3451
yaya@dowling.edu
=====================================================

From yaya at bernstein-plus-sons.com Thu May 25 17:18:08 2006
From: yaya at bernstein-plus-sons.com (Herbert J. Bernstein)
Date: Thu May 25 21:18:21 2006
Subject: [medsbio-l] WK.02 agenda
Message-ID:

25 May 2006

Draft Agenda of the imgCIF Workshop (WK.02)
22 July 2006

8:30          Definition of the problem
              What are the objectives of this workshop?
                R. Sweet

8:50          Brief refresher on the structure and flexibility of imgCIF and available supporting software, and the accepted mechanisms for making changes to both
                H. Bernstein

9:10 - 9:30   Go round the room for people to introduce themselves, giving a few sentences on who they are and their involvement in the subject, and to make adjustments in who will present and the order of the presentations after the break.

9:30 - 10:00  Break for coffee and to allow for last minute reproduction of late/revised handouts.

10:00 - 12:20 Short (10-15 minutes) prepared statements (based primarily on handouts, rather than visual aids) to help frame the discussion, for example:

                What problem should we be solving?              G. Bricogne
                Issues for equipment vendors.                   C. Nielsen
                Issues for data-reduction software developers.  W. Minor, J. Pflugrath
                Issues for beam-line software developers.       TBD
                Issues for other software developers.           A. Ashton
                Issues for data archivists.                     J. Westbrook
                Issues for CIF wonks.                           B. McMahon
                Issues relative to HDF and XML.                 TBD

              Presenters should leave some time for questions, so a strict 12 minute limit will apply to the formal part of any one presentation.

12:20 - 12:30 Organize the afternoon
                R. Sweet
              Working groups (Note: one of the working groups will be for novices and concentrate on familiarization with imgCIF)

12:30         Lunch

1:30          Working groups meet

2:30          Working groups merge into working group of the whole to decide whether conclusions can be drawn.

3:00 - 3:30   Break

3:30 - 4:30   Further discussion

4:30 - 5:00   Planning for next steps

5:00          Adjourn for the day

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
Office: +1-631-244-3035
Lab (KSC 020): +1-631-244-3451
yaya@dowling.edu
=====================================================

From yaya at bernstein-plus-sons.com Fri Jun 23 10:34:41 2006
From: yaya at bernstein-plus-sons.com (Herbert J. Bernstein)
Date: Fri Jun 23 14:34:59 2006
Subject: [medsbio-l] next steps
Message-ID:

Dear Colleagues,

The proposal for support of MEDSBIO has been submitted to the US National Science Foundation. Most of the information from that proposal will be posted to the MEDSBIO web site over the next week.

It is time to look at our next steps. Suggestions would be appreciated.

One very useful item would be to start collecting sample data for people to use in validating interface software. If you have a data file that you would be willing to share with the community, please email a URL of the file to me:

  yaya@bernstein-plus-sons.com

Please do not email the file itself as an attachment. My spam filter will probably kill it. In your email, please include a "release" in the following form:

"I certify that I am the holder of the copyright on this file.

"I give permission to any and all persons to this file, to create files, documents and other information derived from this file, to mirror this file and to incorporate this file into software packages, including into open source software packages under the GPL subject to the following conditions:

"1. Credit for this file be given to: ___________________

"2. No warranty express or implied is given with respect to this file. In particular, there is not now and is never to be any warranty for merchantability or fitness for use.

"3. If the file or a file derived therefrom is distributed under a license other than the GPL or LGPL, then no patent, license or other mechanism shall be used to prevent others from distributing this file under the GPL or LGPL."

If you do not have access to a web server able to serve the file, contact me and we will make arrangements to pick it up by other means.

Regards,
Herbert

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
Office: +1-631-244-3035
Lab (KSC 020): +1-631-244-3451
yaya@dowling.edu
=====================================================