Talk:User guide 1.1

From cbwiki.net

cb:resource

Paul Asman-FRBNY 14:55, 18 November 2008 (UTC) At the technical session on Tuesday before the CBOC conference in Mexico City in October, 2008, I presented some slides (available through the presentation) questioning whether we should replace cb:resource with elements from the DCMI element set. In the discussion, it was agreed that the cb:resource element was a bit of a mess. Since others are more expert in this subject matter than I, I invite them to use this space as a means of straightening it out.

dcterms elements

Paul Asman-FRBNY 14:42, 18 November 2008 (UTC) Following on the conclusions of the technical session on Tuesday before the CBOC conference in Mexico City in October, 2008, I added two dcterms elements, issued and publisher, into the user guide. These are optional, and their presence does not affect the specification, as all dcterms elements are optional. Their inclusion in the user guide calls attention to the possibility of their use. I would have also included dcterms:subject, but the reaction to it was more mixed. I'd like to see it argued for, though.

cb:coverage

Paul Asman-FRBNY 14:00, 18 November 2008 (UTC) The discussion for this element in the user guide is currently this: "If there are further refinements to the topic, they should be expressed here. For example, "manufacturing" indicates industry coverage or "Alsace" for regional coverage."

Manufacturing and Alsace are significantly different categories, and we might not want to lump them together. My guess is that we include both under coverage only as an accident of English usage.

If we do make such a separation, we might consider putting locations into the dcterms element coverage, which is meant for spatial or temporal coverage, and retain the cb element coverage for area of activity. It might be odd having two elements called coverage that cover such different concepts, though. (And it's also odd that dcterms uses one element for two such distinct domains.)

support for the semantic web

Paul Asman-FRBNY 13:28, 18 November 2008 (UTC) At the technical session on Tuesday before the CBOC conference in Mexico City in October, 2008, I raised the question of whether dc:creator should be replaced by dcterms:creator. The difference is in the range of permitted values. dc:creator can take a literal value, while dcterms:creator must be web-addressable. Participants were content to leave the element where it is, as dc, without requiring it to be web-addressable. Fostering the semantic web did not seem to be a priority. Should someone wish to create more semantically rich content, the dcterms element could be included as well, since all dcterms elements are optional for rss-cb.

dc:creator

Paul Asman-FRBNY 14:23, 14 October 2008 (BST) This was listed (as of today) as required in the User Guide, but not listed at all in the Specification (except as part of a code example). Given that the New York Fed doesn't use it in the one feed I checked, and given that I don't want to make us out of compliance with the RSS-CB spec, I changed the status in the User Guide to recommended. Some may wish to change the spec instead.

location of georss:point

Steven Bagshaw-BIS 16:14, 20 February 2008 (UTC) I'm going to move this element to outside of the cb application. At the moment, only cb namespace elements appear within the cb application element, which I think is cleaner.

We already have the dc and dcterms elements outside, so it should go there I think. I'm open to disagreement of course.

This would require changes to the schemas I guess. Mike (and others), what do you think?

cb:observationPeriod

Paul Asman-FRBNY 15:46, 25 January 2008 (UTC) In the specification, this is a child element of cb:otherStatistic. In the user guide, it is said to be required for all statistical applications. I'm not competent to resolve this, so I leave it to others. There's a discussion about this field below, in the context of dc:date, but I'm not interested here in the semantics, but the syntax. I've also separated it out as it needs quick attention - as things stand, something is wrong in either the spec or the user guide.

Steven Bagshaw-BIS 16:00, 25 January 2008 (UTC) I'm not able to resolve it either, but isn't this what Dan was trying to sort out here on the CBWiki forum? No one answered him.

Paul Asman-FRBNY 18:41, 25 January 2008 (UTC) I don't think it's the same issue - I think that what Dan was trying to sort out was the content of the text child of cb:observationPeriod. I'm asking what parent elements it has. In the specification, it is a child element only of cb:otherStatistic. In the user guide, it is a required child element of cb:exchangeRate, cb:interestRate, and cb:transaction as well.

Dan 19:10, 25 January 2008 (UTC) It seems clear that any statistic with a time-series dimension should have an observationPeriod. There are a few other things to clean up, I think. If "observationPeriod" identifies the time index of a value, then we should use that for exchange rates instead of "date". (I think this came up in the other discussion a while back.) The spec for exchange rate refers to five required elements, but seems to show only four. For interest rates, the spec says three and lists three, but doesn't have a time dimension. Transaction claims three required fields but lists four, none of which are the (should-be-required) observationPeriod. Do I read correctly that otherStatistic claims three required fields and lists four (value, observationPeriod, topic, dataType, coverage)? Another question is whether frequency should be an attribute of value or observationPeriod. I think the latter.

dc:date

Noe Palmerin-Banco de Mexico 20:30, 3 September 2007 (BST)

Quick question:

I need to publish statistical data (inflation). These are monthly data published on August ninth at 09:00, and the data correspond to July. The RSS-CB item should contain <dc:date>2007-08-09T09:00:00-05:00</dc:date>, besides the title. Where do I need to put the date of the data (July in this example)?

Paul Asman-FRBNY 17:59, 4 September 2007 (BST) I don't think that we specified something for this. As you note, the dc:date field refers to the date of the feed, rather than what underlies the feed. For some of the other feed types, we have a field that does not quite meet your needs, occurrenceDate, which works well for speeches, publications, and so on. There's also a publicationDate, which has some features in common with what you want, but that's for non-specific dates of a publication, e.g. 'Spring, 2007'.

I think that you have three choices. 1. You can put the date into the title, such as "Inflation statistics for July 2007." 2. You could put in your own extension. 3. You could propose a modification to the specification to handle this case. The last option would be appropriate, I think, if you find the case common or general (and I think there is a good argument for this).

Noe Palmerin-Banco de Mexico 19:47, 4 September 2007 (BST) Actually I was thinking of option 3 when I asked the question. We allow other kinds of data (<cb:otherStatistics>) but there is no way to put the date of the data. "occurrenceDate" is for publications, as you say, and "publicationDate" is ambiguous if we want good machine-to-machine communication, I think.


San 21:40, 6 September 2007 (BST) I confess that I haven't looked at things much since we moved to 1.1 but we used to have a field specifically for this. We'll be publishing monthly IP data and will have the same issue. In my test files from several months ago, I use <cb:publishdate>2006-11</cb:publishdate> but don't remember how it was in the spec (or if I even remembered to put it there!) I'm not wed to this term but we definitely have to have a way to specify non-daily data. I think that "observationDate" is unambiguous and can be applied to all data types - even the daily stuff if people want to allow for the publication date and the observation date to differ.

Noe Palmerin-Banco de Mexico 14:58, 10 September 2007 (BST) I like the "observationDate" idea. I vote for it.

Steven Bagshaw-BIS 15:29, 10 September 2007 (BST): In other places where we have used "date" in the element name, it requires a W3C-format date, with time etc. e.g. cb:occurrenceDate.

To me, just reading "observationDate" implies one of these dates. And that it refers to a specific moment in time at which an observation was made, rather than a period like "July". So, it doesn't seem unambiguous to me, although I'm no statistician.

Anyway, looking at the stats application guide (version 1.0), there is cb:publicationDate. (I think this is the cb:publishDate San refers to). The description of it seems to match what is required here. However, its usage here is perhaps somewhat different from that in the "papers" feeds.

So, my suggestion...

1) Delete cb:publicationDate from the statistical data application guide, as if we can't use it in this instance, we shouldn't use it at all. Deleting it removes the slight differences in possible usage of the field between papers and stats.

2) Create a cb:observationPeriod element for statistical data. It takes a string. Example values "July 2007", "Week 13, 2007", "3-9 June 2007", "Q3 2006" etc. Using "period" instead of "date" to me makes it clearer.

OR

Just use cb:publicationDate as the spec seems to say you should at the moment. (NB: cb:occurrenceDate is a moment-in-time type date and couldn't be used here).

San 21:34, 11 September 2007 (BST) I don't like "publicationDate" for this because it implies the date that the data are published which may not be the same as the date to which they refer. Although it does allow for the quarterly and annual date issues, it doesn't really capture the "spirit" of the thing - especially if we want to enforce that elements with the word "date" require a W3C date format.

Therefore, I vote for the "observationPeriod" element option for statistical data. Noe may want to include an example of how they specify data that are released three times per year (if I remember correctly from when we visited) since that is likely to be an edge case.

Noe Palmerin-Banco de Mexico 16:58, 13 September 2007 (BST) Well, for internal representation we use a combination of date and granularity. For example for monthly information we represent the July data as: DATE=2007-07-01 GRANULARITY=MONTHLY.

For quarterly information, the April-June data: DATE=2007-04-01 GRANULARITY=QUARTERLY.

Personally I don't believe this is a good solution.

A possible solution could be a small modification of "observationPeriod" proposed by Steven:

  <cb:observationPeriod>
     <cb:initialDate>2007-04-01</cb:initialDate>
     <cb:finalDate>2007-06-01</cb:finalDate>
  </cb:observationPeriod>

What do you think? Too elaborate?

PS: initialDate and finalDate must be in W3C format.
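
As a minimal sketch of how a consumer might read the nested structure proposed above, the following Python uses only the standard library. The namespace URI is a placeholder (the snippet in this thread does not declare one), and the function name read_observation_period is hypothetical; the date-only YYYY-MM-DD form is assumed.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Placeholder namespace URI, assumed only so that the cb: prefix resolves.
CB_NS = "http://example.org/rss-cb"

# The element proposed in the thread, with a namespace declaration added.
SAMPLE = f"""
<cb:observationPeriod xmlns:cb="{CB_NS}">
   <cb:initialDate>2007-04-01</cb:initialDate>
   <cb:finalDate>2007-06-01</cb:finalDate>
</cb:observationPeriod>
"""


def read_observation_period(xml_text):
    """Return (initial, final) dates from a cb:observationPeriod element.

    Assumes both children are present and use the date-only YYYY-MM-DD form.
    """
    root = ET.fromstring(xml_text.strip())
    ns = {"cb": CB_NS}
    initial = date.fromisoformat(root.findtext("cb:initialDate", namespaces=ns).strip())
    final = date.fromisoformat(root.findtext("cb:finalDate", namespaces=ns).strip())
    if final < initial:
        raise ValueError("finalDate precedes initialDate")
    return initial, final


initial, final = read_observation_period(SAMPLE)
print(initial, final)
```

A one-day observation would simply carry the same date in both children.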

Steven Bagshaw-BIS 08:34, 14 September 2007 (BST): I think that would work and machines can understand it too. Do we make it mandatory? I imagine it could apply to all stats feeds?

Are there any possible observation period type values that wouldn't fit into this structure?

We'd need some usage guidelines on the times... for example, if an observation is for one day, do we put 2007-03-04T00:00+01:00 for initialDate and 2007-03-04T23:59+01:00 for end date? Or should we always use midnight?

Noe Palmerin-Banco de Mexico 16:15, 14 September 2007 (BST) Well, W3C has many formats and one of them is YYYY-MM-DD. We could use it: initial date-> 2007-03-04 and end date -> 2007-03-04.

I'm not sure if we need to make it mandatory. I would like to hear other ideas about it.

San 18:59, 14 September 2007 (BST) I think this is way too much! It seems to me that we're using a sledgehammer to kill a fly! First of all, what do people do with all that information when they get the data? When my staff processes quarterly data, for example, they look for some indicator for which quarter the observation is relevant so they can put it into that spot in the database. Having a start date and end date for coverage will make this processing more cumbersome not to mention confusing. The staff here feels very strongly about not attributing 3 part dates (dd/mm/ccyy) to anything but daily data; this was one of our big issues with SDMX - the spec requires all dates to be W3C so we had a real problem with quarterly data because there isn't a single day or month to which we wanted to attribute an observation.

Maybe this is irrelevant - or I'll be outvoted like I was with the hierarchy stuff - but this just seems to make things more complicated for users, both data producers and consumers. To have to write out the same information twice if it is a daily observation seems absurd! Will we also then check to make sure that the time span between the start and end dates matches the information in the required "frequency" attribute for the value? What do people do with the information if they don't match? I know that to accommodate this, we'll have to do a fair amount of coding to calculate the first and last day of each quarterly and monthly period for every observation to be able to write out the W3C date, since that information isn't stored with the observation in the database. Then I guess we just force people to write code to translate the period Jan 1 to March 31 of any year to correspond to the first quarter and the period February 1 to February 28 to correspond to the second month, except when the second month is defined as February 1 to February 29.

I'd certainly not like to see it be made mandatory. That said, I suppose that as long as the text in the title can still present the observationDate as "2007Q1" so we can stay within the 40 character recommendation and humans can actually read the thing, people can always parse the title to get the representation they need. Of course, this defeats the purpose....

<rant off>

/san/
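
For scale, the first-day/last-day calculation mentioned above can be sketched in a few lines of Python. This is an illustration only, assuming calendar quarters and the Gregorian calendar; the function names are hypothetical and nothing here comes from the spec.

```python
import calendar
from datetime import date


def month_bounds(year, month):
    """First and last calendar day of a month (monthrange handles leap years)."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def quarter_bounds(year, quarter):
    """First and last calendar day of a calendar quarter, numbered 1 to 4."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be between 1 and 4")
    first_month = 3 * (quarter - 1) + 1
    start, _ = month_bounds(year, first_month)
    _, end = month_bounds(year, first_month + 2)
    return start, end


print(quarter_bounds(2007, 1))
print(month_bounds(2008, 2))
```

The reverse mapping (from a start/end pair back to "first quarter") is the part a consumer would still have to write.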

Steven Bagshaw-BIS 08:18, 17 September 2007 (BST): I think if I were programming a parser for this, 2007Q1 (or any other string) would not be of any use to me. I don't think what people would do with the feed is relevant in this case, rather the machines.

A programmer would need the frequency (from a code list/lookup list) and the start date at least. So, then cb:publicationDate (W3C for simplicity's sake) could be used - we put a usage note in the spec saying it is the start of the observation period and hey presto! We're done...

Or we rename it to cb:observationStartDate. Or use cb:occurrenceDate, which is used in a somewhat similar way in other feeds.

The understanding is that this value is to be used in combination with the frequency.

San 13:32, 17 September 2007 (BST) This part I agree with. When I mentioned people, I meant the people programming the machines and ultimately using the output from the machines; hopefully, machines themselves are not the ultimate consumer of this information. That's why I'm really concerned about tailoring things strictly for a parser with no meaning behind the representation. How would the folks in MED think about an observation with a frequency of "Quarterly" and a W3C date that specifies a day/month/year? Maybe I'm off base but folks here think that such a thing is just plain wrong - the observation represents the entire period and should not have a particular day assigned to it.

Therefore, I would have to vote either for using clearly specified, W3C-formatted start and end dates (which I think is complete overkill but at least is correct) or using the string representation for observationPeriod.

Am I the only one with strong feelings on this or are others unaware that this discussion is taking place?

/san/

Paul Asman-FRBNY 13:43, 17 September 2007 (BST) "Am I the only one with strong feelings on this or are others unaware that this discussion is taking place?" Since you ask explicitly: I have no strong feelings about this. I have a general preference for accuracy over elegance in XML meant for machines, but not enough of a preference to get into the middle of this. I like the start and end dates, but if people wanted a field that held quarters (something like observationQuarter) that wouldn't exercise me either.

Steven Bagshaw-BIS 15:14, 17 September 2007 (BST): "How would the folks in MED think about an observation with a frequency of "Quarterly" and a W3C date that specifies a day/month/year?"

It seems to me, looking at some samples, that the reason SDMX-ML files can have "2007-10" is because they have an extra attribute of TIME_FORMAT, in addition to the frequency. This specifies how the observation period value is formatted. It's a bit more complex than just jamming in the first date of the period, which can be read generically for every value then.

So, avoiding using this date idea, we would also have to have rules in RSS-CB about how to specify dates for each type of period, so that the programs could be written. e.g. "2007-01-01", "2007-01", "2007Q1", "2007", and semi-annual and weekly? RSS feeds can be jumbled up any old way, so each item would have to have this info. Maybe we end up jamming it all into cb:value?

<cb:value frequency="quarterly" period="2007Q4" periodFormat="CCYYQ" decimals="4">1.1240</cb:value>

Or we just put in the spec that quarterly values are formatted like this, weekly like that etc and the consumer has to read the spec.

<cb:value frequency="quarterly" period="2007Q4" decimals="4">1.1240</cb:value>

You (reader) have to know how period is formatted for frequency="quarterly".

I think this last one is pretty simple. The coder would have to do some work for each frequency, but they probably would have needed to regardless of the various solutions proposed so far.

(Side note: we don't yet have a lookup list of valid frequency values in the application guide. We should do that.)
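
A rough sketch of the "consumer has to read the spec" option: the reader keeps one pattern per frequency and checks the period attribute against it. The frequency names other than "quarterly", the pattern table, and the namespace URI are all assumptions for illustration, since no lookup list of frequencies exists yet.

```python
import re
import xml.etree.ElementTree as ET
from decimal import Decimal

# Placeholder namespace URI, assumed only so that the cb: prefix resolves.
CB_NS = "http://example.org/rss-cb"

# Hypothetical per-frequency period formats, based on the examples in this thread.
PERIOD_PATTERNS = {
    "daily": re.compile(r"\d{4}-\d{2}-\d{2}"),   # e.g. 2007-01-01
    "monthly": re.compile(r"\d{4}-\d{2}"),       # e.g. 2007-01
    "quarterly": re.compile(r"\d{4}Q[1-4]"),     # e.g. 2007Q1
    "annual": re.compile(r"\d{4}"),              # e.g. 2007
}

SAMPLE = (
    f'<cb:value xmlns:cb="{CB_NS}" frequency="quarterly" '
    'period="2007Q4" decimals="4">1.1240</cb:value>'
)


def read_value(xml_text):
    """Read a cb:value element and check its period against its frequency."""
    elem = ET.fromstring(xml_text)
    frequency = elem.get("frequency")
    period = elem.get("period") or ""
    pattern = PERIOD_PATTERNS.get(frequency)
    if pattern is None:
        raise ValueError(f"unknown frequency: {frequency!r}")
    if not pattern.fullmatch(period):
        raise ValueError(f"period {period!r} does not fit frequency {frequency!r}")
    # Decimal keeps the trailing zero that a float would drop.
    return {"frequency": frequency, "period": period, "value": Decimal(elem.text.strip())}


print(read_value(SAMPLE))
```

Each item carries its own frequency and period, so the check still works when a feed mixes frequencies.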

Timo Laurmaa-BIS 18:19, 17 September 2007 (BST) Wow, this discussion gives me the opportunity to open my FAME Users Guide from 1990 for the first time in this millennium :-) I vote for easy acceptance (combined with ease of reading) by people who are used to dealing with (non-daily) statistical data. I am with San in rejecting three-part YMD dates for anything but daily data. FAME understands 2007 (frequency="annual"), 2007S1 (semiannual), 2007Q2 (quarterly), 2007M8 (monthly) plus others, even something like this for 3-periods-per-year:

<cb:value frequency="ppy(3)" period="2007P2" decimals="4">1.1240</cb:value>
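
To illustrate how little code the compact labels need, here is a hedged Python sketch that splits the labels quoted above (2007, 2007S1, 2007Q2, 2007M8, 2007P2) into year, period code and index. It is not FAME's own parser; the code "A" for a bare year and the name parse_period_label are assumptions made here.

```python
import re

# Four-digit year, optionally followed by a period code and an index.
LABEL = re.compile(r"(\d{4})(?:([SQMP])(\d{1,2}))?")

# Highest valid index per code; "P" depends on the periods-per-year setting,
# which the label alone does not carry, so it is left unchecked here.
MAX_INDEX = {"S": 2, "Q": 4, "M": 12}


def parse_period_label(label):
    """Split a label such as 2007Q2 into (year, code, index)."""
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised period label: {label!r}")
    year = int(match.group(1))
    code = match.group(2)
    if code is None:
        return year, "A", 1  # bare year: treated as annual (assumed code)
    index = int(match.group(3))
    limit = MAX_INDEX.get(code)
    if index < 1 or (limit is not None and index > limit):
        raise ValueError(f"index out of range in {label!r}")
    return year, code, index


for label in ("2007", "2007S1", "2007Q2", "2007M8", "2007P2"):
    print(label, parse_period_label(label))
```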

Noe Palmerin-Banco de Mexico 19:57, 17 September 2007 (BST) I agree that yyyy-mm-dd is too much for non-daily data, but I certainly don't like the idea of parsing the title. Timo's solution works for me. Maybe we could use Dublin Core for Frequency[1].


San 21:32, 17 September 2007 (BST) I also like Timo's idea - it is similar to what I'm used to but I confess to trying to avoid making things too FAME-specific since its usage among our audience is waning (one of the things I picked up on my sabbatical!) The issue with FAME frequencies is one that Paul and I have been "discussing" for a while: they consider weekly observations dated on a Monday to be a different frequency than weekly observations dated on a Wednesday. Technically, the frequency on both is weekly since 7 days pass between observations but try getting Excel (or any statistical package) to correctly handle a data import with weekly data where the observations fall on different dates.

So, to avoid that issue entirely, I'm all for using the Dublin Core list for Frequency with one caveat: can we extend it to represent data at a higher frequency than daily? These financial types are very big on hourly, minutely, and other ridiculous notions - FAME's next version will handle observations dated on the millisecond! - so we need to make sure we don't preclude the ability to use the spec to accommodate the very high frequency data if someone decides they really want to do it.

As for the actual specification, if we came up with a (required? suggested?) representation for each frequency on the Dublin Core list, then we can use the simpler representation listed by Steve and Timo above.

Dan 17:27, 20 September 2007 (BST) (Rewritten immediately after posting) There are two issues: labels for humans, and date representations for computers. So before we talk about the attributes, do we have a proper system for making date representations readable to people? I still wonder whether we are trying to do too much for computer readers, using RSS as the foundation. How does SDMX deal with this issue? Might it not be a better method for communicating data to computers on a production basis?

I have no problem with the DC frequencies, provided (agreeing with San, I think), that there is a method of adding additional information to distinguish weekly-thursday from weekly-tuesday (with an additional attribute, say). I don't think it would make sense to define July in terms of the start-minute and end-minute. I suppose any method that we agree on that is comprehensive enough to cover all the different kinds of data observations we're likely to encounter (nanosecondly interest rates, and a few other things that might not be covered by DC) will be one that developers will be able to work with.

I wonder if we'll need to have some series-level (feed-level?) attributes to help communicate this information.

I think we also need to talk about the set of period names: Fame gives various options ranging from 2007M9 to September. It won't be friendly to humans, and we don't know how many people will be building systems to get the data.

Noe Palmerin-Banco de Mexico 21:15, 20 September 2007 (BST) I believe the friendly date must be in the title, just because that is what the humans read. The idea around all these metadata (and the standard itself) is to provide more information (as the value tag) so readers (machines) could take advantage of all this (meta)information.

Steven Bagshaw-BIS 08:17, 21 September 2007 (BST): I agree with Noe. Stuff for humans should be in the title and the description. The description itself leaves lots of room for human-oriented metadata. e.g. "These are the data for blah, based on weekly-Thursday observations blah blah blah".


San 20:57, 21 September 2007 (BST) Wait - I think that Dan meant that the human readable stuff should be in the title but that we don't provide any guidelines for how to make it clearest. At least that would be my concern. Steven and Noe are right: the rss-cb tags don't need to (necessarily) be completely user-friendly as long as they are correct and appropriate; my objection to a three-part date for quarterly data is that it is neither.

If we want to make suggestions on clear ways to specify dates in the title (2007M9 vs. Sep2007, for example), I think that would be great; we certainly don't hesitate to make suggestions on title details in other places. As for the observationPeriod tag (still my preferred moniker), I think a useful example to keep in mind when talking about the "machine" information is not to think of how a parser might take that observation information and stuff it into a database. While that is possible and indeed likely, I don't think we should cater to or encourage that behavior. As Dan mentions: that's what SDMX exchanges are for. Rather I like to think about the "repurposing" aspect: if JP Morgan wants to publish our current CP rates on an internal (or external for that matter) website, then I want to make sure they have the correctly tagged information to be able to incorporate our information correctly into a page with their look and feel. Using the title doesn't accomplish that but building the page using the rss-cb components will.

In essence, I think we are basically in agreement but I think that Dan's concern about overloading the RSS and using it as a replacement for fully specified data delivery mechanisms is a valid one.

Noe Palmerin-Banco de Mexico 23:15, 21 September 2007 (BST) I concur that Dan's concern is a valid one. I just want to add that we are far away from SDMX since our intention is to publish just the last available data and the essential metadata. Date of data is an essential one, I believe.

I also liked San's example about the possible uses of our feeds (the JP Morgan example).

Steven Bagshaw-BIS 10:00, 24 September 2007 (BST): The stats application guide was still at version 1.0, which would have made it a problem to start adding these changes.

So, now there is a Statistical_data_1.1 page where someone can put in the observationPeriod stuff. It's getting a little hard to follow, so perhaps we should start modifying the wiki spec itself?

San 14:17, 24 September 2007 (BST) I'll try to start working on it this afternoon. I think the first step is to add observationPeriod to the user guide as an application specific tag - if that's the moniker we've decided on.

Dan 20:07, 1 October 2007 (BST) Thank you for clarifying my concern, San. I do think we need to suggest a standard for presenting "observation period" to humans as well as to computers, and we need to be careful about presenting them to computers.

Noe said "I just want to add that we are far away from SDMX since our intention is to publish just the last available data and the essential metadata. Date of data is an essential one, I believe." My point, or at least my question, was not about making use of the content of SDMX, but about considering the SDMX syntax for representing a data item and the period to which it applies. I agree with San that we shouldn't try to turn an RSS-CB feed into a universal database feeder. That means we need something straightforward and simple to identify observation periods. In a sense, if some coder at JP Morgan will need to use the RSS-CB tags to populate fields in their own web content, we do need to make it "user friendly" in some way: friendly to a coder who will need to convert our fields into a format they'd want to use. In that regard, it's possible that there is a "standard" we can adopt, such as the observation period label used by the original provider, or something unambiguous such as the three-letter month abbreviation (if monthly). I'd be leery of trying to invent something that would work with generic database loaders.