Talk:Research papers 1.1

Links to more than one version

San 16:38, 25 July 2008 (BST) Our application guide doesn't give much guidance on how to handle a link to a PDF of a paper if you don't want that to be what the feed link is. For example, we do *not* want the feed link to go directly to the PDF; rather, we want folks to go to the abstract page from where there is a link to different versions of the paper (PDF, Screen reader, etc.)

But, this doesn't play nicely with our stated purpose of helping the aggregators: Timo still has to mine the abstract page to pull out the PDF info.

So we should address this. I suspect that this is where we'd use a <cb:resource> element but does anyone want to help me put together suggestions on how these should be structured?

Dan Chall-FRBNY 17:07, 25 July 2008 (BST) I do. You nailed the problem. I agree, for us, we do not want the feed (or search engines, or any general navigation) to send the user to the PDF. For our site, the only way to the PDF should be from the abstract page. So the first question might be: should the RSS-CB standard support the desire of the "anonymous CB aggregator" (Timo didn't want to be named, as I recall) to send users directly to the PDF? One might suggest that your site and mine might prefer that Timo not link to PDFs, to keep visitors within our own navigation structure when they get here. Then again, I suppose it could always be voluntary, and some central banks might want to make it possible for users or aggregators to find either the abstract page or the PDF. In that case, I think you're right that cb:resource is the answer. See http://www.cbwiki.net/wiki/index.php/User_guide_1.1#.3Ccb:resource.3E

On our site, we have some publications with two primary representations in addition to the abstract page: the PDF version of the Current Issues article, and the interactive HTML version. For instance, see: http://www.newyorkfed.org/research/current_issues/ci14-4.html This would require two different cb:resource tags. Each one would have a title, a link, and a description. Title might be the title of the article in each case (repeated from the feed entry's title), and Description might say "PDF version" or something like that.

San 18:47, 25 July 2008 (BST) Yeah Dan! We're on the same page again!

So it's the "something like that" that I'm interested in putting together guidance for. As a starting point, I've been toying with something as banal as the following:

<cb:resource>
  <cb:title>PDF version of "Monetary Policy in a Forward-Looking Input-Output Economy"</cb:title>
  <cb:link>http://www.federalreserve.gov/pubs/feds/2008/200833/200833pap.pdf</cb:link>
  <cb:description>Link to PDF file of FEDS 2008-33</cb:description>
</cb:resource> 

I'm sure we could also do it for the screen reader version if we wanted to really have equal time but I don't know if other unnamed aggregators would be interested in the info.

Now we need to nail down the details - this is the fun part! Suggestions?

Dan Chall-FRBNY 19:07, 25 July 2008 (BST) Like Hannah Arendt on the banality of XML? Banal is good here, I think. I'm not sure whether we need "PDF" identified in the title element, since that's conveyed in the description and "PDF" doesn't appear on the title page. And I'm not sure we need "link to" in the description element, as the user interface should provide that information. I also think you might not want FEDS 2008-33 in the description, because you may be overloading it. That information should be stored elsewhere in the feed, and the key piece of info for this resource is "PDF," as I see it.

How about:

<cb:resource>
  <cb:title>Monetary Policy in a Forward-Looking Input-Output Economy</cb:title>
  <cb:link>http://www.federalreserve.gov/pubs/feds/2008/200833/200833pap.pdf</cb:link>
  <cb:description>PDF version</cb:description>
</cb:resource> 

Additionally, in this context I think there might be interest in some new elements. On our site we always want to tell the filesize and the number of pages whenever there's a link to a PDF file. I don't see those two concepts identified as elements anywhere in the spec, at least in the context of cb:resource.

Christine Sommo-FRBNY 19:49, 25 July 2008 (BST) Yes. <cb:resource> is what I think you should use. We use it often in our news feed for linking to PDFs (and other stuff as well). I'd be happy to work on this with you.

Christine Sommo-FRBNY 19:56, 25 July 2008 (BST) Here's a bit more detail on how we use this in our News feed when we're dealing with research papers. We use <cb:resourceTitle> to refer to the series title, <cb:resourceLink> to refer to the URL (duh.) and <cb:resourceDescription> to refer to the file format (HTML, PDF, etc.) Dan's code above is of course more appropriate to version 1.1 than what I just described. Maybe we haven't updated our feeds in a while??

San 20:17, 25 July 2008 (BST) Maybe Dan has done the work for us! I like the pithiness of his example. Any objections? Do we need to have a vote on this or can we just add this to the application guide and check this box?


Dan Chall-FRBNY 20:34, 25 July 2008 (BST) "'It's a wiki!'" What we came up with fills a vacancy; we're not doing something for which anybody has expressed a contrary opinion. Just do it! You or I? Check which box?

San 21:12, 25 July 2008 (BST) Done! Application guide updated. User guide updated. Decision made. Box checked. Now I just need to change our feeds......

I *love* wikis!

Dan Chall-FRBNY 21:15, 25 July 2008 (BST) Great! Which box checked?

San 21:21, 25 July 2008 (BST) The one on my "To Do" list!

Dan Chall-FRBNY 21:41, 25 July 2008 (BST) Check!

Christine Sommo-FRBNY 22:25, 25 July 2008 (BST) Double check! Nice work.

San 22:36, 25 July 2008 (BST) Thanks! It's been nice collaborating with you!

Paul Asman-FRBNY 13:40, 28 July 2008 (BST) Just for clarification - I just noticed this now - why are these cb terms being used (resource, link, title, and description) when Dublin Core terms would seem to do the work (hasVersion, title, and description)?

Dan Chall-FRBNY 20:50, 30 July 2008 (BST) I was following the user guide entry for cb:resource, which explicitly refers to PDF (http://www.cbwiki.net/wiki/index.php/User_guide_1.1#.3Ccb:resource.3E). I think "dc:hasVersion" might work, but version of what? We want the primary links for the feed to go to the abstract pages, but I don't think we are asserting that the PDF is another version of the abstract. I suppose dc:title could be used too. Would you make that and dc:description child elements of dc:hasVersion? And how would you handle the link to the PDF version?

Paul Asman-FRBNY 13:52, 4 August 2008 (BST) The description of hasVersion is, "A related resource that is a version, edition, or adaptation of the described resource." The link element is a link for the resource. I don't see a conflict. The 'hasVersion' points to another version of the resource that has the link element pointing to its abstract. The title and description elements are part of the description of the resource, so they would remain child elements of that. The link to the PDF version would be the text child of the hasVersion element.
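
As a rough illustration of the structure Paul describes, a feed item might look like the sketch below. It assumes an RSS 1.0 style item, the dcterms prefix bound to http://purl.org/dc/terms/ (the thread writes it as dc:hasVersion), and a placeholder abstract URL on example.org; only the title and the PDF URL come from the earlier cb:resource examples. It is not taken from the RSS-CB specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <!-- The abstract URL is a hypothetical placeholder, not a real address. -->
  <item rdf:about="http://www.example.org/feds/2008-33/abstract.html">
    <!-- title, link and description stay children of the item itself -->
    <title>Monetary Policy in a Forward-Looking Input-Output Economy</title>
    <link>http://www.example.org/feds/2008-33/abstract.html</link>
    <description>Abstract text would go here.</description>
    <!-- The PDF link is the text child of hasVersion, as proposed -->
    <dcterms:hasVersion>http://www.federalreserve.gov/pubs/feds/2008/200833/200833pap.pdf</dcterms:hasVersion>
  </item>
</rdf:RDF>
```

Note that in this form nothing inside hasVersion says that the text is a URL or that the target is a PDF; a consumer has to know that by convention.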

Dan Chall-FRBNY 14:49, 4 August 2008 (BST) The proposed usage is also consistent with the specification for the resource element. There is a judgment to make whether the abstract is a version of the paper. You know, there may be a fundamental issue here: In NY and at the Board there's a clear interest in having the feed direct the user to the abstract page. But are we describing the abstract with the RSS? I don't think so. I don't think we're describing the PDF document either, since we are directing the user to the abstract page.

So I think we are describing the whole gestalt: the paper, the online version, the abstract. As such we are not referring to one object from another object as if they were peers. The PDF file is not a "version" of the entity we are describing. It is part of that entity. "resource" seems to indicate something that you can access that's associated with the paper.

San was promising to post her views on this.

Paul Asman-FRBNY 16:11, 4 August 2008 (BST) First, some specific responses: 1. I think that it is legitimate to say that a resource has a version that is an abstract. I don't know that it's required or even desirable to say that, but I think it's correct. 2. I think that we can have the link point to whatever we deem appropriate, be that the abstract or some other version of the paper. 3. I agree that we are not describing the abstract, but the paper. Which leads to ...

The more general response: I view the resource as an abstraction from all the versions it might have. Sort of a form / matter thing. There is the paper itself, and then there are ways in which that paper presents itself. These ways may be as Word documents, PDFs, abstracts, and more. So I do see the PDF file as a version. If the word choice were up to me, I would have said representation, but the DCMI preempted me.

On the DCMI FAQ, the question is posed, "What is a resource?" The answer is: "In Web terminology, a resource is 'anything addressable via a URL.' However, Dublin Core implementations are not necessarily Web-based. Dublin Core metadata can be used to describe any kind of resource - including various collections of documents and non-electronic forms of media such as a museum or library archive." The first part would, I think, exclude both Dan's and my interpretations, and make a resource a specific web page. But the rest of the answer seems to allow anything, and specifically mentions collections of documents. So I think that we're okay with an abstracted notion of resource.

Dan Chall-FRBNY 17:27, 7 August 2008 (BST) I would agree with "representation" but I think "version" may have for some a specific connotation of referring to revisions to the content of a paper, as opposed to different representations. That was San's point, as I understand it, but we're still waiting for her to post her views here. I like the idea of using the explicit "link" element that is in our spec, because it makes it explicit what the text is (as opposed to requiring a little more knowledge that a text child of a hasVersion element is a URL of the file that is the version). San?

San 18:54, 7 August 2008 (BST) Nothing like putting words in my mouth to get me to respond Dan! Luckily they were the right words.

Dan is correct here: version has a connotation with respect to research papers. The fact that there is a working paper *version* that appears before the published *version* is the prime example. They can be fairly substantially different representations of the same research regardless of the format in which the information is presented: the working paper is likely to have a PDF format, an accessible HTML format (if it's one of ours) and an abstract (which is a condensed format if you want to think of it that way), which are separate entities from the various formats available for the published version.

By using cb:resource the way we have, I don't think we've violated the spirit of the "Web terminology" meaning; we've just redirected things to match our usage. I believe we made similar decisions about the location tags.

That said, I see that there is a dc term that might make sense: hasFormat which is described as: "A related resource that is substantially the same as the pre-existing described resource, but in another format."

I think this is more appropriate than hasVersion in this instance but I'm not sure how we would implement it or whether it matters that "As of December 2007, the DCMI Usage Board is seeking a way to express this intention with a formal range declaration."
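
One possible way hasFormat could be implemented, offered only as a sketch: the same item shape as the hasVersion proposal, with the PDF URL as the text child of the element. The dcterms prefix, the RSS 1.0 item, and the example.org abstract URL are assumptions for illustration; the title and PDF URL are taken from the cb:resource examples earlier on this page.

```xml
<!-- Hypothetical fragment; namespaces are declared inline so it stands alone. -->
<item xmlns="http://purl.org/rss/1.0/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dcterms="http://purl.org/dc/terms/"
      rdf:about="http://www.example.org/feds/2008-33/abstract.html">
  <title>Monetary Policy in a Forward-Looking Input-Output Economy</title>
  <!-- Placeholder abstract URL: the feed link still goes to the abstract page -->
  <link>http://www.example.org/feds/2008-33/abstract.html</link>
  <!-- Same content in another format, rather than a revised version -->
  <dcterms:hasFormat>http://www.federalreserve.gov/pubs/feds/2008/200833/200833pap.pdf</dcterms:hasFormat>
</item>
```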

Paul Asman-FRBNY 19:59, 7 August 2008 (BST) If hasFormat is the right dc term, great; that just means that I read too hastily. That said, I think that it's important to use dc terms wherever we can, even if they're not perfect fits. That is, I think it more important that we cast our information in ways that others expect and understand rather than in ways that are truer to us but not generally understood. Or, in one more formulation, if it's close enough, use it.

I think that it's fine that DCMI is looking to specify the range of formats. I doubt any of us use anything outside that range, and if we do, we can always extend the DCMI range with our own additions.
