Statistical data can be published as an RDF Data Cube. General documentation of the RDF Data Cube vocabulary is available at http://www.w3.org/TR/vocab-data-cube/. This section describes the design of the RDF Data Cube export in Odalic and the issues encountered along the way.

Input file structure

First, we had to decide which input file structure Odalic would support for processing statistical data. The documentation gives an example input table at http://www.w3.org/TR/vocab-data-cube/#example:

 

                2004-2006        2005-2007        2006-2008
                Male    Female   Male    Female   Male    Female
Newport         76.7    80.7     77.1    80.9     77.0    81.5
Cardiff         78.7    83.3     78.6    83.7     78.7    83.4
Monmouthshire   76.6    81.3     76.5    81.5     76.6    81.7
Merthyr Tydfil  75.5    79.1     75.5    79.4     74.9    79.6

There are three dimensions: time period (rolling averages over three-year time spans), region and sex. Each observation represents the life expectancy for that population (the measure), and an attribute is needed to define the unit (years) of the measured values. The table has a multi-row header and a header column, and every data cell represents one observation. Odalic does not support this structure: it accepts only tables with exactly one header row and no header columns. The data above can therefore be transformed into the following table structure (slightly extended with a Country column):

Country  Region          Time period  Sex     Life expectancy
Count1   Newport         2004-2006    Male    76.7
Count1   Newport         2004-2006    Female  80.7
Count1   Newport         2005-2007    Male    77.1
Count1   Newport         2005-2007    Female  80.9
Count1   Newport         2006-2008    Male    77.0
Count1   Newport         2006-2008    Female  81.5
Count1   Cardiff         2004-2006    Male    78.7
Count1   Cardiff         2004-2006    Female  83.3
Count1   Cardiff         2005-2007    Male    78.6
Count1   Cardiff         2005-2007    Female  83.7
Count1   Cardiff         2006-2008    Male    78.7
Count1   Cardiff         2006-2008    Female  83.4
Count2   Monmouthshire   2004-2006    Male    76.6
Count2   Monmouthshire   2004-2006    Female  81.3
Count2   Monmouthshire   2005-2007    Male    76.5
Count2   Monmouthshire   2005-2007    Female  81.5
Count2   Monmouthshire   2006-2008    Male    76.6
Count2   Monmouthshire   2006-2008    Female  81.7
Count2   Merthyr Tydfil  2004-2006    Male    75.5
Count2   Merthyr Tydfil  2004-2006    Female  79.1
Count2   Merthyr Tydfil  2005-2007    Male    75.5
Count2   Merthyr Tydfil  2005-2007    Female  79.4
Count2   Merthyr Tydfil  2006-2008    Male    74.9
Count2   Merthyr Tydfil  2006-2008    Female  79.6

In this layout every row represents one observation. One column represents the measure (Life expectancy) and three columns represent dimensions (Region, Time period, Sex). The first column (Country) is neither a measure nor a dimension, because there is a relation between Country and Region; no such relations exist among the other columns. In principle, there could be more than one column representing a measure.
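The reshaping from the cross-tabulated layout to one observation per row can be sketched as follows. This is a minimal illustration, not part of Odalic: the names WIDE, COUNTRY and wide_to_long are hypothetical, and the sketch assumes the data cells are ordered by period first and by sex second, as in the example table.

```python
# Cross-tabulated input: one row per region, one cell per (period, sex) pair.
PERIODS = ["2004-2006", "2005-2007", "2006-2008"]
SEXES = ["Male", "Female"]
WIDE = {
    "Newport": [76.7, 80.7, 77.1, 80.9, 77.0, 81.5],
    "Cardiff": [78.7, 83.3, 78.6, 83.7, 78.7, 83.4],
    "Monmouthshire": [76.6, 81.3, 76.5, 81.5, 76.6, 81.7],
    "Merthyr Tydfil": [75.5, 79.1, 75.5, 79.4, 74.9, 79.6],
}
# The Country column is the extension added to the original data.
COUNTRY = {
    "Newport": "Count1",
    "Cardiff": "Count1",
    "Monmouthshire": "Count2",
    "Merthyr Tydfil": "Count2",
}


def wide_to_long(wide, periods, sexes, country):
    """Return a table with one header row and one observation per data row."""
    rows = [["Country", "Region", "Time period", "Sex", "Life expectancy"]]
    # Column order of the wide table: period-major, sex-minor.
    keys = [(period, sex) for period in periods for sex in sexes]
    for region, values in wide.items():
        for (period, sex), value in zip(keys, values):
            rows.append([country[region], region, period, sex, value])
    return rows


LONG = wide_to_long(WIDE, PERIODS, SEXES, COUNTRY)
```

Each of the 24 data cells of the original table becomes one row, preceded by a single header row, which is the only shape Odalic accepts.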

Resulting RDF Data cube and the generated patterns

The complete RDF Data Cube resulting from the example above is given in the documentation at http://www.w3.org/TR/vocab-data-cube/#full-example. Following that example, the RDF Data Cube consists of these parts:

  • Data Set
  • Data structure definition
  • Dimensions and measures
  • Observations

For each part there is a "pattern" showing how Odalic produces the RDF. Producing the RDF Data Cube requires the Result provided by the Odalic core algorithm as well as the Data Cube definition ("CubeDef") provided by the user. Each pattern lists the information needed from the Result and from the CubeDef to produce the RDF.

Data Set pattern

Input from Odalic Result:

  • (none)

Input from user's CubeDef:

Parameter     Value
Title         Life expectancy title
Label         Life expectancy desc
Comment       Life expectancy within Welsh Unitary authorities comment
Description   Life expectancy within Welsh Unitary authorities - extracted from Stats Wales
Subject       http://purl.org/linked-data/sdmx/2009/subject/3.2
Organization  Example org
  • Note: The date for "issued" can be computed by the program when the RDF is produced.

RDF output pattern:

# -- Data Set --------------------------------------------

eg:dataset a qb:DataSet;
    dct:title       "Life expectancy title";
    rdfs:label      "Life expectancy desc";
    rdfs:comment    "Life expectancy within Welsh Unitary authorities comment";
    dct:description "Life expectancy within Welsh Unitary authorities - extracted from Stats Wales";
    dct:publisher   eg:organization;
    dct:issued      "2016-09-22";
    dct:subject     <http://purl.org/linked-data/sdmx/2009/subject/3.2>;
    qb:structure    eg:dsd;
    .

eg:organization a org:Organization, foaf:Agent;
    rdfs:label "Example org";
    .
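Filling the Data Set pattern from the CubeDef parameters might look like the sketch below. The dictionary keys, the helper turtle_literal and the function dataset_turtle are assumptions made for illustration and do not reflect Odalic's actual classes; the "issued" date defaults to the current day, in line with the note above.

```python
from datetime import date

# Hypothetical CubeDef content, taken from the parameter table above.
CUBE_DEF = {
    "title": "Life expectancy title",
    "label": "Life expectancy desc",
    "comment": "Life expectancy within Welsh Unitary authorities comment",
    "description": "Life expectancy within Welsh Unitary authorities"
                   " - extracted from Stats Wales",
    "subject": "http://purl.org/linked-data/sdmx/2009/subject/3.2",
    "organization": "Example org",
}


def turtle_literal(text):
    """Quote text as a Turtle string literal, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def dataset_turtle(cube_def, issued=None):
    """Render the Data Set pattern; the key names in cube_def are assumptions."""
    issued = issued or date.today()  # computed by the program, not by the user
    return "\n".join([
        "eg:dataset a qb:DataSet;",
        f"    dct:title       {turtle_literal(cube_def['title'])};",
        f"    rdfs:label      {turtle_literal(cube_def['label'])};",
        f"    rdfs:comment    {turtle_literal(cube_def['comment'])};",
        f"    dct:description {turtle_literal(cube_def['description'])};",
        "    dct:publisher   eg:organization;",
        f"    dct:issued      {turtle_literal(issued.isoformat())};",
        f"    dct:subject     <{cube_def['subject']}>;",
        "    qb:structure    eg:dsd;",
        "    .",
        "",
        "eg:organization a org:Organization, foaf:Agent;",
        f"    rdfs:label {turtle_literal(cube_def['organization'])};",
        "    .",
    ])
```

Calling dataset_turtle(CUBE_DEF, issued=date(2016, 9, 22)) reproduces the statements of the pattern; prefix declarations are left out, as in the pattern itself.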

Data structure definition pattern

Input from Odalic Result:

  • (none)

Input from user's CubeDef:

  • Which columns are dimensions and which are measures; the column numbers (their order in the Input of the task) are sufficient.

RDF output pattern:

# -- Data structure definition ----------------------------

eg:dsd a qb:DataStructureDefinition;
    # The dimensions
    qb:component [ qb:dimension eg:refArea;         qb:order 1 ];
    qb:component [ qb:dimension eg:refPeriod;       qb:order 2 ];

    # The measure(s)
    qb:component [ qb:measure eg:lifeExpectancy ];

    # The attributes
    qb:component [ qb:attribute sdmx-attribute:unitMeasure ];
    .
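The data structure definition can be derived from the column roles alone. The following sketch assumes a hypothetical mapping from 1-based column numbers to a role and an already chosen property name; COLUMN_ROLES and dsd_turtle are illustrative names, and only the two dimensions listed in the pattern above are included.

```python
# Hypothetical CubeDef fragment: 1-based column number -> (role, property name).
# Columns refer to the single-header table: 2 = Region, 3 = Time period,
# 5 = Life expectancy.
COLUMN_ROLES = {
    2: ("dimension", "eg:refArea"),
    3: ("dimension", "eg:refPeriod"),
    5: ("measure", "eg:lifeExpectancy"),
}


def dsd_turtle(column_roles, attributes=("sdmx-attribute:unitMeasure",)):
    """Render the data structure definition pattern from column roles."""
    # Process columns in input order so that qb:order follows the table.
    ordered = [column_roles[number] for number in sorted(column_roles)]
    dimensions = [name for role, name in ordered if role == "dimension"]
    measures = [name for role, name in ordered if role == "measure"]

    lines = ["eg:dsd a qb:DataStructureDefinition;"]
    for order, name in enumerate(dimensions, start=1):
        lines.append(f"    qb:component [ qb:dimension {name}; qb:order {order} ];")
    for name in measures:
        lines.append(f"    qb:component [ qb:measure {name} ];")
    for name in attributes:
        lines.append(f"    qb:component [ qb:attribute {name} ];")
    lines.append("    .")
    return "\n".join(lines)
```

The qb:order values follow the order of the dimension columns in the input, which is why the column numbers are enough for this pattern.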

Dimensions and measures pattern

Input from Odalic Result:

  • Classification of the columns marked by the user as dimensions and measures (Label and Resource)

Input from user's CubeDef:

  • Which columns are dimensions and which are measures (for example, the column number in the input is sufficient).

RDF output pattern:

# -- Dimensions and measures  ----------------------------

eg:refPeriod a rdf:Property, qb:DimensionProperty;
    rdfs:label "reference period";
    qb:concept <http://dbpedia.org/resource/Reference_period>;
    .

eg:refArea a rdf:Property, qb:DimensionProperty;
    rdfs:label "reference area";
    qb:concept <http://dbpedia.org/resource/Region>;
    .

eg:lifeExpectancy a rdf:Property, qb:MeasureProperty;
    rdfs:label "life expectancy";
    rdfs:subPropertyOf sdmx-measure:obsValue;
    qb:concept <http://dbpedia.org/resource/Life_expectancy>;
    .
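A property definition combines the role from the CubeDef with the Label and Resource of the column classification from the Result. The sketch below shows one way to render it; component_turtle is a hypothetical helper, and it assumes that labels contain no characters that need escaping in a Turtle literal.

```python
def component_turtle(name, role, label, concept):
    """Render one dimension or measure property definition.

    name    -- prefixed property name, e.g. "eg:refArea" (assumed given)
    role    -- "dimension" or "measure", taken from the CubeDef
    label   -- Label of the column classification from the Result
    concept -- Resource of the column classification from the Result
    """
    if role == "dimension":
        types = "rdf:Property, qb:DimensionProperty"
    elif role == "measure":
        types = "rdf:Property, qb:MeasureProperty"
    else:
        raise ValueError(f"unsupported role: {role!r}")

    lines = [f"{name} a {types};", f'    rdfs:label "{label}";']
    if role == "measure":
        # Measures are declared as sub-properties of the SDMX observed value.
        lines.append("    rdfs:subPropertyOf sdmx-measure:obsValue;")
    lines.append(f"    qb:concept <{concept}>;")
    lines.append("    .")
    return "\n".join(lines)


MEASURE = component_turtle(
    "eg:lifeExpectancy",
    "measure",
    "life expectancy",
    "http://dbpedia.org/resource/Life_expectancy",
)
```

The only difference between the two roles is the property class and the extra rdfs:subPropertyOf statement emitted for measures.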

Observations pattern

Input from Odalic Result:

  • Disambiguation of the cells in the columns marked by the user as dimensions (Resource)

Input from user's CubeDef:

  • Unit of measure (Resource)

Note: The values of the cells in the column marked by the user as the measure are taken from the Input of the task.

RDF output pattern:

# -- Observations -----------------------------------------

 
eg:o1 a qb:Observation;
    qb:dataSet  eg:dataset ;
    eg:refArea                 <http://dbpedia.org/resource/Newport,_New_South_Wales> ;
    eg:refPeriod               <http://reference.data.gov.uk/id/gregorian-interval/2004-01-01T00:00:00/P3Y> ;
    sdmx-attribute:unitMeasure <http://dbpedia.org/resource/Year> ;
    eg:lifeExpectancy          76.7 ;
    .

eg:o2 a qb:Observation;
    qb:dataSet  eg:dataset ;
    eg:refArea                 <http://dbpedia.org/resource/Cardiff> ;
    eg:refPeriod               <http://reference.data.gov.uk/id/gregorian-interval/2004-01-01T00:00:00/P3Y> ;
    sdmx-attribute:unitMeasure <http://dbpedia.org/resource/Year> ;
    eg:lifeExpectancy          78.7 ;
    .
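Producing the observations amounts to one loop over the data rows. The sketch below assumes each row has already been reduced to the disambiguated Resources of its dimension cells plus the raw measure text from the Input; observations_turtle and ROWS are hypothetical names, and the second period IRI is formed by analogy with the first one.

```python
import re

# Plain decimal numbers only; other numeric formats are rejected in this sketch.
DECIMAL = re.compile(r"[+-]?\d+(\.\d+)?")

PERIOD_BASE = "http://reference.data.gov.uk/id/gregorian-interval/"
ROWS = [
    # (area Resource, period Resource, measure cell text from the Input)
    ("http://dbpedia.org/resource/Cardiff", PERIOD_BASE + "2004-01-01T00:00:00/P3Y", "78.7"),
    ("http://dbpedia.org/resource/Cardiff", PERIOD_BASE + "2005-01-01T00:00:00/P3Y", "78.6"),
]
UNIT = "http://dbpedia.org/resource/Year"  # unit of measure from the CubeDef


def observations_turtle(rows, unit):
    """Render one qb:Observation per row, numbered eg:o1, eg:o2, ..."""
    blocks = []
    for number, (area, period, value) in enumerate(rows, start=1):
        if not DECIMAL.fullmatch(value):
            raise ValueError(f"not a decimal number: {value!r}")
        blocks.append("\n".join([
            f"eg:o{number} a qb:Observation;",
            "    qb:dataSet  eg:dataset ;",
            f"    eg:refArea                 <{area}> ;",
            f"    eg:refPeriod               <{period}> ;",
            f"    sdmx-attribute:unitMeasure <{unit}> ;",
            f"    eg:lifeExpectancy          {value} ;",
            "    .",
        ]))
    return "\n\n".join(blocks)


OBSERVATIONS = observations_turtle(ROWS, UNIT)
```

The measure value is written as an unquoted Turtle number, so the sketch checks that the cell text is a plain decimal before emitting it.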