Accent escape sequences
Joe Krahn krahn at niehs.nih.gov
Tue Mar 6 15:41:22 GMT 2007
James Hester wrote:
> Thinking about the mechanics of implementing these suggestions, it would
> make sense to define different types of text field using the
> _item_type_list.code DDL2 attribute. Currently mmCIF appears to have
> only 'text' for multiline data, and imgCIF has 'binary' in addition to
> this. A new type (e.g. 'mime') could specify a regex that matches a
> mime header, something like what is done for the imgCIF 'binary' type.

Should these follow standard MIME types for better standardization, or maybe just have an optional subtype code specific to CIF to allow for better specialization? I would go with the latter.

>
> A variation on this would be to define a larger number of
> _item_type_list.codes corresponding to the various text formats of
> interest, for example 'ascii_markup','tex','html','mathml'. This would
> mean that the format of a given data item would be determined at
> dictionary writing time if a single type code is given in the
> dictionary. While this might work and be quite useful when writing
> dictionaries, it is probably too onerous when producing data files. So
> the data dictionary would specify a list of possible text type codes,
> and a magic number or mime header would be useful in the data item text
> field in order to disambiguate.

Why is that too onerous for text fields? CIF already has the problem that everything is plain text in the absence of a dictionary, yet there is no numeric flag. CIF would be much more self-defined if that were the case, but the current design is to base everything on a dictionary.

In general, a generic CIF parser should be able to handle all of the text fields un-processed, and leave it to the reader to make sense of it. That is why the multipart/alternative is good; even the raw form is readable. The only caveat added by MIME is that regions between the multi-part boundaries may contain the "<eof>;" end mark.
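To make the multipart/alternative idea concrete, here is a rough Python sketch of how a reader might treat such a text field. It is only an illustration under stated assumptions: the `MIME_HEADER` pattern, the function names, the `cif-alt` boundary and the sample field are all made up for this example and are not taken from any CIF dictionary or from imgCIF. It relies only on Python's standard `email` package. Fields without a MIME header pass through untouched, which is the "generic parser leaves it to the reader" behaviour described above.

```python
import re
from email import message_from_string

# Hypothetical test for "this text field starts with a MIME header".
MIME_HEADER = re.compile(r"\s*Content-Type:\s*[\w.+-]+/[\w.+-]+", re.IGNORECASE)


def readable_text(field):
    """Return the text/plain alternative of a field, or the field unchanged."""
    if not MIME_HEADER.match(field):
        return field  # ordinary text field: hand it back unprocessed
    msg = message_from_string(field.lstrip())
    for part in msg.walk():
        # walk() yields the container first, then each alternative part
        if not part.is_multipart() and part.get_content_type() == "text/plain":
            return part.get_payload().strip()
    return field  # no plain-text alternative: fall back to the raw form


def breaks_cif_text_field(field):
    """True if some line starts with ';', which would end a CIF text field early."""
    return any(line.startswith(";") for line in field.splitlines())


# Made-up example field with a plain-text and a TeX form of the same text.
sample = """Content-Type: multipart/alternative; boundary="cif-alt"

--cif-alt
Content-Type: text/plain

E = m c^2
--cif-alt
Content-Type: application/x-tex

$E = mc^2$
--cif-alt--
"""

print(readable_text(sample))
print(readable_text("Just a plain description."))
print(breaks_cif_text_field(sample))
```

The second helper is there because of the caveat about the end mark: a writer would want to check each part for a line beginning with a semicolon before embedding it in a semicolon-delimited field.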
>
> Regarding the suggestion that there be several representations of the
> same text using a mime multipart approach, I think caution is warranted
> insofar as this might relate to dictionary data items (as opposed to
> data file data items), in that all of the parts should be kept
> synchronised, entailing more work, and work which involves specialised
> knowledge.

Perhaps alternative parts need to include a flag as to which form is the authoritative representation. Then, if it is changed and the other form(s) are not, the alternatives must be deleted or marked "invalid" until they can be remade properly. However, it is certainly worth avoiding excessive use. In general, something like equations should only be edited by the author(s), whereas most other users will handle it in "read only" form.

This could be used for something like internationalization of CIF dictionaries as well. In that case, I assume that English would be the primary reference, and other contributors can add translations. If the English part changes, the translations could be flagged as out-of-date until a language contributor can update the translation.

Of course, my native language is American English, so it is not a big deal for me. What do non-English crystallographers think? I have wondered about the possibility of having US and UK alternatives for words like metre. Or should we just declare US spelling as wrong?

Joe
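One possible way to implement the "authoritative form plus out-of-date flag" idea, sketched in Python. Everything here is an assumption for illustration: the `Alternative` class, the `derived_from` field and the use of a SHA-256 digest as the staleness check are not part of any CIF or MIME proposal. The idea is simply that each derived form records a digest of the authoritative text it was made from, so a changed original can be detected without comparing the texts by hand.

```python
import hashlib
from dataclasses import dataclass


def digest(text):
    """Fingerprint of a text, used to detect edits to the authoritative form."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Alternative:
    label: str              # e.g. a language code or a format name (hypothetical)
    text: str
    derived_from: str = ""  # digest of the authoritative text this was made from


def stale_alternatives(authoritative, others):
    """Labels of alternatives that no longer match the authoritative text."""
    current = digest(authoritative.text)
    return [alt.label for alt in others if alt.derived_from != current]


english = Alternative("en", "Unit-cell length a, in angstroms.")
french = Alternative("fr", "Longueur a de la maille, en angstroms.",
                     derived_from=digest(english.text))

print(stale_alternatives(english, [french]))  # nothing flagged yet

english.text = "Unit-cell length a, in angstroms, at the measurement temperature."
print(stale_alternatives(english, [french]))  # the translation is now flagged
```

A flagged alternative could then be dropped or marked "invalid" until someone with the right knowledge remakes it.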
More information about the comcifs mailing list