[Cif2-encoding] Splitting of imgCIF and other sub-topics
Herbert J. Bernstein
yaya at bernstein-plus-sons.com
Tue Aug 24 14:31:51 BST 2010
Dear James,
I have not been at all reticent -- imgCIF will be very poorly supported
by CIF2 as currently proposed. Of necessity, imgCIF changes encodings
internally -- that is why it uses MIME -- the same problem as email with
images, and the same solution.
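The email analogy can be shown with Python's standard `email` package, which builds one multipart MIME document in which each part declares its own Content-Transfer-Encoding. This is a minimal sketch of generic MIME only. It does not reproduce the actual imgCIF/CBF binary-section headers or boundary conventions, and the text and byte payloads are placeholders made up for illustration.

```python
from email import encoders
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mixed_document() -> MIMEMultipart:
    """Build one MIME document whose parts use different transfer encodings.

    Generic MIME illustration only; the payloads are hypothetical placeholders.
    """
    doc = MIMEMultipart("mixed")
    # Plain ASCII text part: the email package labels this 7bit.
    doc.attach(MIMEText("data_example\n# placeholder CIF text\n", "plain", "us-ascii"))
    # Placeholder binary "image" bytes: MIMEApplication defaults to base64.
    doc.attach(MIMEApplication(bytes(range(256)), "octet-stream"))
    # Placeholder mostly-text binary: encoded as quoted-printable instead.
    doc.attach(MIMEApplication(b"mostly text\x00\x01", "octet-stream",
                               _encoder=encoders.encode_quopri))
    return doc


doc = build_mixed_document()
for part in doc.get_payload():
    print(part.get_content_type(), part["Content-Transfer-Encoding"])
```

Each part can be decoded independently with `part.get_payload(decode=True)`, which is the property that lets a single file mix encodings.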
Any purely text version has at least a 7% overhead compared to
pure binary. Restricting to UTF-8 increases the overhead to at least 50%.
We may get away with the 7% (UTF-16). The 50% version (UTF-8) will be
ignored by the community as unworkable. The version most likely to be
used will be the current DDL2-based version with embedded
compressed binaries, which I am augmenting with DDLm-like features
and merging with HDF5.
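One way to see where figures like these can arise (an assumption, not necessarily how the numbers above were derived) is to pack raw bytes 15 bits at a time into BMP code points that avoid the surrogate range. Each such character costs 2 bytes in UTF-16 (16/15, about 7% overhead) but 3 bytes in UTF-8 (24/15, 60%, above the 50% floor). The `pack_15bit` scheme and its 0x0800 base offset are hypothetical and chosen only for illustration; whether every such code point would be acceptable in a CIF2 file is not addressed here.

```python
BASE = 0x0800  # hypothetical offset: 0x0800..0x87FF avoids surrogates (0xD800..0xDFFF)


def pack_15bit(data: bytes) -> str:
    """Pack bytes into characters carrying 15 payload bits each (illustrative only)."""
    out = []
    acc = 0   # bit accumulator
    nacc = 0  # number of not-yet-emitted bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nacc += 8
        while nacc >= 15:
            nacc -= 15
            out.append(chr(BASE + ((acc >> nacc) & 0x7FFF)))
        acc &= (1 << nacc) - 1  # drop bits already emitted
    if nacc:  # left-align any leftover bits, zero-padded, in one final character
        out.append(chr(BASE + (acc << (15 - nacc))))
    return "".join(out)


def overhead(payload: bytes, codec: str) -> float:
    """Fractional size increase of the packed text over the raw payload."""
    return len(pack_15bit(payload).encode(codec)) / len(payload) - 1


payload = bytes(range(256)) * 60  # 15360 bytes = exactly 8192 characters of 15 bits
print(f"UTF-16: {overhead(payload, 'utf-16-le'):.1%}")  # UTF-16: 6.7%
print(f"UTF-8:  {overhead(payload, 'utf-8'):.1%}")      # UTF-8:  60.0%
```

The same characters are used in both cases; only the encoding form differs, which is why the overhead gap depends entirely on the UTF-8 versus UTF-16 choice.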
As I noted many months ago, the unfortunate reality is that the
current CIF2 effort will not merge well with imgCIF. If avoiding
a split is important, we need a meeting. I would suggest
involving Bob Sweet and holding it at BNL in conjunction with
something relevant to NSLS-II.
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
yaya at dowling.edu
=====================================================
On Tue, 24 Aug 2010, James Hester wrote:
> Hi Herbert: regarding imgCIF, I agree that splitting it off is not a
> desirable outcome. I would like to get an idea of how well imgCIF can
> be accommodated under the various encoding proposals currently
> floating around, as you have been rather reticent to bring it up. My
> naive take is that a UTF8-only encoding scheme for CIF2 would not
> pose significant issues for imgCIF, and that a decorated UTF16
> encoding in the style of Scheme B would be even better and quite
> adequate. If so, imgCIF does not actually present any problems and
> was a red herring.
>
> I'm not sure that face-to-face or Skype discussions are necessarily
> going to be more productive. Writing things down, while slower,
> allows me at least to collect my thoughts and those of other
> participants, and hopefully make a reasoned contribution (my apologies
> if I am too long-winded) and as an added bonus those thoughts are
> recorded for later reference. For example, where would I now find the
> background on why a container format for imgCIF is such a bad idea?
> Presumably that was all thrashed out in face to face discussions, and
> no record now remains.
>
> On Tue, Aug 24, 2010 at 8:56 PM, Herbert J. Bernstein
> <yaya at bernstein-plus-sons.com> wrote:
>> Dear Colleagues,
>>
>> James' and John's last interchange is so voluminous, I doubt any of
>> us has been able to fully appreciate the rich complexity of ideas
>> contained therein. For example, one of the suggestions far down in
>> the text is:
>>
>> (James now) Indeed. My intent with this specification was to ensure
>> that third parties would be able to recover the encoding. If imgCIF is
>> going to cause us to make such an open-ended specification, it is
>> probably a sign that imgCIF needs to be addressed separately. For
>> example, should we think about redefining it as a container format,
>> with a CIF header and UTF16 body (but still part of the
>> "Crystallographic Information Framework")?
>>
>> The idea of an imgCIF "header" in CIF format and an image in another is an
>> old, well-established, thoroughly discussed, and mistaken idea, rejected
>> in 1998. The handling of multiple images in a single file (e.g.
>> a JPEG thumbnail and crystal image and a full-size diffraction image)
>> requires the ability to switch among encodings within the file.
>> This is handled by the current DDL2 and MIME-based imgCIF format, but
>> would be a serious problem in CIF2 as currently proposed,
>> increasing the chances that we will have to move imgCIF into
>> HDF5 and abandon the CIF representation entirely, sharing only
>> the dictionary and not the framework.
>>
>> If you look carefully, you will see a similar trend with mmCIF, in which
>> an XML representation sharing the dictionary plays a much more
>> important role than the CIF format.
>>
>> Is it really desirable to make the new CIF format so rigid and
>> unadaptable that major portions of macromolecular crystallography
>> end up migrating to very different formats, as they already are
>> doing? Yes, there is great value in having a common dictionary,
>> but would there not be additional value in having a sufficiently
>> flexible common format to allow for more software sharing than
>> we now have? Is it really desirable for us to continue in the
>> direction of a single macromolecular experiment having to
>> deal with HDF5 and CIF/DDL2/MIME representations of the image data
>> during collection, CCP4-style CIF representations during processing
>> and deposition, and legacy PDB and PDBML representations in subsequent
>> community use? If we could be a little more flexible, we might be
>> able to reduce the data interchange software burden a little.
>> Right now, this discussion seems headed toward simply
>> adding yet another data representation (DDLm/CIF2) to the mix,
>> increasing the chances of mistranslation and confusion rather
>> than reducing them.
>>
>> Please, step back a bit from the detailed discussion of UTF8 and
>> look at the work-flow of doing and publishing crystallographic
>> experiments and let us try to make a contribution that simplifies
>> it, not one that makes it more complex than it needs to be.
>>
>> I suggest we need to meet and talk, either face-to-face or by Skype.
>>
>> Regards,
>> Herbert
>>
>> _______________________________________________
>> cif2-encoding mailing list
>> cif2-encoding at iucr.org
>> http://scripts.iucr.org/mailman/listinfo/cif2-encoding
>>
>
>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>