Dictionary release policy

James H jamesrhester at gmail.com
Fri Jan 6 04:06:55 GMT 2023


Dear COMCIFS,

I've initiated our very first release process from Github over at
https://github.com/COMCIFS/cif_core/issues/317. Hopefully this first
attempt will iron out some kinks and address some of Brian's comments in
the process. Feel free to follow along on Github. Those of you on the
cif_core or ccp4 mailing lists will be advised that a release is pending at
some point.

all the best,
James.

On Thu, 27 Oct 2022 at 23:15, Brian McMahon via comcifs <comcifs at iucr.org>
wrote:

> James
>
> This seems like a reasonable proposal.
>
> Peter MR responded with a comment on versioning, and there is some
> discussion on the github site, which looks fine to me:
>
> https://github.com/COMCIFS/comcifs.github.io/blob/master/accepted/dictionary_development_practices.md
>
> I see some further general discussion on the GitHub page that James
> references, and make the following remarks in response to that.
>
> Antanas asked 'One thing that should be further specified is which one
> of the multiple dictionary locations (GitHub, IUCr website, Zenodo,
> etc.) should be regarded as the original one and reflected in the
> dictionary URL', to which James replied 'My idea on URLs is that we
> should mint a DOI for the dictionary and use the dx.doi URL for the
> URL in the dictionary'.
>
> I think that's probably the right long-term decision, but the idea of
> dictionary DOI registration has been hovering around for a while, and
> is not necessarily an easy solution to implement, because of the
> requirements to characterise the registration with appropriate
> metadata. What's 'appropriate' is well defined by CrossRef for
> journal articles, but less clear-cut for other digital object types.
> Note that, strictly, DOI is an identifier only; the dx.doi.org
> resolver is a particular mechanism implemented by CrossRef, and
> it's making the resolvers work that requires well-defined metadata
> characterising the registered digital object. As a purist I'm not
> happy that CrossRef insists that DOIs should be cited with the
> resolver URL.
>
> In answer to Antanas' question about the preferred dictionary
> location, I would certainly prefer that canonical approved dictionaries
> are attributed to the IUCr. But currently the dictionary register
> contains URLs that use the ftp scheme, e.g.
>
>    data_validation_dictionaries
>      loop_
>        _cifdic_dictionary.name
>        _cifdic_dictionary.version
>        _cifdic_dictionary.DDL_compliance
>        _cifdic_dictionary.reserved_prefix
>        _cifdic_dictionary.date
>        _cifdic_dictionary.URL
>        _cifdic_dictionary.description
>
>    ######################################################################
>    # COMCIFS approved dictionaries                                      #
>    ######################################################################
>      cif_core.dic    .    1.4.1   .    .
>        ftp://ftp.iucr.org/pub/cifdics/cif_core.dic
>        'Core CIF Dictionary'
>
>
> The register itself is advertised in International Tables G 1st edition
> at the location ftp://ftp.iucr.org/pub/cifdics/cifdic.register, but
> the IUCr CIF dictionaries page links to it at the URL
> https://www.iucr.org/__data/iucr/cif/dictionaries/cifdic.register
>
> Now, we have advertised the ftp: address since the early 1990s on the
> assumption that ftp was a simple and durable protocol that would
> reliably be supported indefinitely. And the register and its
> referenced dictionaries are indeed all still available using those
> URLs. That is, if you use a software client that interprets those
> URLs as requests over the FTP protocol to the cited locations, you
> will fetch the desired files.
>
> However, until recently such a "software client" would have included all
> common browsers, so you could simply click on a link in Chrome
> or Firefox to download the file. Browsers now no longer support
> the ftp protocol, so this makes the retrieval procedure less convenient
> for most users.
>
> What are this group's thoughts on how best to approach this? Now that
> I've retired from the IUCr, we'll need to interact with the Chester
> office to see what they can implement, but I'd appreciate a feel for
> a preferred solution before I talk to them. A few considerations:
>
> [1] I value long-term stability, so would like to see the resources
> still available over ftp, at least as one option, though I guess the
> thing to do there would be to look at the ftp logs in Chester to see
> if any dictionaries are in fact being downloaded from the ftp server.
>
> [2] I note that in a similar situation the PDB now advertises resources
> as https://ftp.wwpdb.org/pub/pdb/data/structures/... (i.e. their
> server ftp.wwpdb.org handles both ftp and http/https schemes). For IUCr
> to do so would require running a web server on the ftp server, or
> configuring the nginx proxy that sits in front of IUCr web services to
> translate the request to a different location on our existing website
> - something we will have to discuss with the Chester office.
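As an illustration of option [2], here is a minimal sketch of the URL
mapping a client could apply if the same host and path were served over
https as well as ftp. That is an assumption: nothing in this thread says
that https://ftp.iucr.org/ currently serves these files, and the function
name ftp_to_https is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def ftp_to_https(url: str) -> str:
    """Return the https form of an ftp URL, keeping host and path unchanged.

    Assumes (hypothetically) that the server answers on both schemes,
    as ftp.wwpdb.org is described as doing.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        return url  # leave non-ftp URLs untouched
    return urlunsplit(
        ("https", parts.netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(ftp_to_https("ftp://ftp.iucr.org/pub/cifdics/cif_core.dic"))
```

With such a mapping the register entries themselves would not need to
change; only the scheme used by the client would.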
>
> [3] Currently all the dictionaries are available also from the main
> IUCr website, but with those slightly ugly /__data/ components in the
> URLs. These are an artefact of the content management system currently
> in use, but I understand IUCr will move away from that system in the
> future, so some effort would need to go into an http/https naming
> scheme that would be robust across different webserver platforms.
>
> [4] Should _cifdic_dictionary.URL be replaced by scheme-specific
> data items (_cifdic_dictionary.URL_ftp, _cifdic_dictionary.URL_https
> etc.)? Or should one allow for a new category CIFDIC_DICTIONARY_LOCATION
> so that an arbitrary number of locations can be specified for each
> dictionary? This is the most general solution and could of course
> include DOI and other persistent URL formulations. One could then easily
> list multiple locations at IUCr, GitHub, Zenodo etc., though it would
> probably then also be worth thinking about providing MD5 hashes to
> confirm that all such mirror copies were identical.
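To make the MD5 idea in [4] concrete, here is a rough sketch of how
mirror copies could be compared against a digest recorded in the
register. The mirror names and file contents are placeholders held in
memory; a real check would first fetch each copy, and the helper names
md5_hex and mismatched_mirrors are invented for this example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a dictionary file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def mismatched_mirrors(copies: dict[str, bytes], expected_md5: str) -> list[str]:
    """Names of mirrors whose copy does not match the registered digest."""
    return sorted(
        name for name, data in copies.items() if md5_hex(data) != expected_md5
    )


if __name__ == "__main__":
    # Placeholder contents; the Zenodo copy differs only in its line ending.
    copies = {
        "IUCr": b"data_CORE_DIC\n",
        "GitHub": b"data_CORE_DIC\n",
        "Zenodo": b"data_CORE_DIC\r\n",
    }
    registered = md5_hex(b"data_CORE_DIC\n")
    print(mismatched_mirrors(copies, registered))
```

Note that a byte-level hash treats a change of line endings as a
difference, so mirrors would have to store exactly the same bytes.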
>
> [5] Or should we just focus on a DOI-based solution? However,
> I note that the PDB DOIs now resolve to a "landing page" (e.g.
> http://dx.doi.org/10.2210/pdb5cro/pdb) whereas in earlier days
> such a DOI would immediately download the required PDB file.
> I do see some benefit of providing an address that is known to
> serve directly the resource you want (i.e. the dictionary file),
> so that for instance a validation program can automatically load
> the current version. So having citable DOIs to reference a
> dictionary doesn't entirely remove the need for something
> like the existing register.
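For the use case in [5], a validation program would look up a
dictionary's location in the register before fetching it. The following
is only a sketch: it handles the single loop quoted earlier in this
message, uses shell-style tokenising in place of a real CIF parser, and
the function names are invented for the example.

```python
import shlex

# The register excerpt quoted earlier in this message.
REGISTER_SAMPLE = """\
data_validation_dictionaries
  loop_
    _cifdic_dictionary.name
    _cifdic_dictionary.version
    _cifdic_dictionary.DDL_compliance
    _cifdic_dictionary.reserved_prefix
    _cifdic_dictionary.date
    _cifdic_dictionary.URL
    _cifdic_dictionary.description

  cif_core.dic    .    1.4.1   .    .
    ftp://ftp.iucr.org/pub/cifdics/cif_core.dic
    'Core CIF Dictionary'
"""


def parse_register(text: str) -> list[dict[str, str]]:
    """Parse one loop_ into a list of {data name: value} rows."""
    names: list[str] = []
    values: list[str] = []
    in_loop = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "data_")):
            continue  # skip blank lines, comments and the block header
        if stripped == "loop_":
            in_loop = True
        elif in_loop and stripped.startswith("_") and not values:
            names.append(stripped)  # still reading the loop header
        elif in_loop:
            # shlex is an approximation of CIF quoting, adequate here
            values.extend(shlex.split(stripped))
    width = len(names)
    return [dict(zip(names, values[i:i + width]))
            for i in range(0, len(values), width)]


def dictionary_url(rows: list[dict[str, str]], name: str) -> str:
    """Return the registered URL for the named dictionary."""
    for row in rows:
        if row["_cifdic_dictionary.name"] == name:
            return row["_cifdic_dictionary.URL"]
    raise KeyError(name)


if __name__ == "__main__":
    rows = parse_register(REGISTER_SAMPLE)
    print(dictionary_url(rows, "cif_core.dic"))
```

A landing-page DOI would not fit into this flow without a further step
to find the file itself, which is the point being made in [5].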
>
> Brian
>
>
>
>
>
> On 10/10/2022 06:18, James H via comcifs wrote:
> > Dear COMCIFS,
> >
> > Activity is picking up on the dictionary development front, particularly
> > regarding powder and core. This has led me to wonder about developing
> > some sort of process for releasing dictionary updates. See the below
> > message that I've raised as an issue on Github (see
> > https://github.com/COMCIFS/cif_core/issues/307). Please feel free to
> > respond there or here. Message follows:
> >
> > We should develop some sort of dictionary release policy. At the moment
> > we commit updates to the master branch of the dictionary on Github, and
> > no further release activity happens. The status of the release is
> > unclear: is it official once the commit is made? It certainly has an
> > internal version number. I suggest we develop a process. Here is a start:
> >
> >  1. A dictionary becomes official once it has been tagged on Github as a
> >     Github release
> >  2. A Github release should be simultaneously reflected on the main IUCr
> >     website as the latest version of the dictionary
> >  3. The machine-readable IUCr dictionary catalogue should be updated at
> >     the same time as (2)
> >  4. There should be one release at least every 3 months unless a
> >     dictionary has not changed in that time.
> >  5. A dictionary may be released sooner than every 3 months if there is
> >     an urgent need
> >  6. Approximately one week before the official release date relevant
> >     IUCr mailing lists should be advised of the forthcoming release
> >     together with a summary of changes
> >  7. A "release manager" is nominated for each dictionary and is
> >     responsible for managing the release process.
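Points 4 and 5 of the list above amount to a simple date rule, sketched
here under stated assumptions: "3 months" is read as three calendar
months since the previous release, and the flags "changed" and "urgent"
are hypothetical inputs supplied by the release manager. None of these
details are fixed by the proposal.

```python
from datetime import date


def whole_months_between(earlier: date, later: date) -> int:
    """Number of complete calendar months from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the current month is not yet complete
    return months


def release_due(last_release: date, today: date,
                changed: bool, urgent: bool = False) -> bool:
    """Apply points 4 and 5: release if changed and 3 months old, or urgent."""
    if not changed:
        return False  # point 4: an unchanged dictionary needs no release
    return urgent or whole_months_between(last_release, today) >= 3


if __name__ == "__main__":
    print(release_due(date(2022, 10, 1), date(2023, 1, 6), changed=True))
```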
> >
> > Thoughts? Experiences?
> >
> > thanks,
> > James.
> >
> > --
> > T +61 (02) 9717 9907
> > F +61 (02) 9717 3145
> > M +61 (04) 0249 4148
> _______________________________________________
> comcifs mailing list
> comcifs at iucr.org
> http://mailman.iucr.org/cgi-bin/mailman/listinfo/comcifs
>


-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148