The `cansim` package provides R bindings to Statistics Canada’s
main socioeconomic time series database, previously known as (and
frequently referred to in this package, and elsewhere, as) CANSIM. Data
can be accessed by table number, by vector, or by a combination of table number and
coordinate. The package accepts both old and new (NDM) CANSIM table
numbers.
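The `cansim` package is available on CRAN and can be
installed directly using the default package installation process,
`install.packages("cansim")`. Alternatively, the package can be installed from GitHub using `remotes`:

    # install.packages("remotes")
    remotes::install_github("mountainmath/cansim")
    library(cansim)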
If you know the catalogue number of the data table you are interested in,
use `get_cansim` to download the entire table.
    data <- get_cansim("14-10-0293")
    #> Accessing CANSIM NDM product 14-10-0293 from Statistics Canada
    #> Parsing data
    head(data)
    #> # A tibble: 6 × 24
    #>   REF_D…¹ GEO   DGUID UOM   UOM_ID SCALA…² SCALA…³ VECTOR COORD…⁴   VALUE STATUS
    #>   <chr>   <chr> <chr> <chr> <chr>  <chr>   <chr>   <chr>  <chr>     <dbl> <chr>
    #> 1 2001-03 Cana… 2016… Pers… 249    thousa… 3       v9141… 1.1.1   24281.  <NA>
    #> 2 2001-03 Cana… 2016… Pers… 249    thousa… 3       v9141… 1.2.1   15758.  <NA>
    #> 3 2001-03 Cana… 2016… Pers… 249    thousa… 3       v1018… 1.2.2      33   <NA>
    #> 4 2001-03 Cana… 2016… Pers… 249    thousa… 3       v1018… 1.2.3      NA   ..
    #> 5 2001-03 Cana… 2016… Pers… 249    thousa… 3       v9141… 1.3.1   14572.  <NA>
    #> 6 2001-03 Cana… 2016… Pers… 249    thousa… 3       v1018… 1.3.2      36.3 <NA>
    #> # … with 13 more variables: SYMBOL <chr>, TERMINATED <chr>, DECIMALS <chr>,
    #> #   GeoUID <chr>, `Hierarchy for GEO` <chr>,
    #> #   `Classification Code for Labour force characteristics` <chr>,
    #> #   `Hierarchy for Labour force characteristics` <chr>,
    #> #   `Classification Code for Statistics` <chr>,
    #> #   `Hierarchy for Statistics` <chr>, val_norm <dbl>, Date <date>,
    #> #   `Labour force characteristics` <fct>, Statistics <fct>, and abbreviated …
By default, the data tables retrieved by the package come in the original format provided by Statistics Canada and are enriched by several added columns and transformations:

- A `Date` column is added that tries to intelligently infer a Date object from the `REF_DATE` column.
- A `val_norm` column is added that applies the appropriate scaling factor to the `VALUE` column. So if data is coded as “thousands of dollars”, a value in the `VALUE` column is converted to the corresponding dollar amount in the `val_norm` column. Similarly, a percentage in the `VALUE` column is converted to a proportion in the `val_norm` column.
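The effect of these added columns can be illustrated on a toy data frame. This is only a sketch in base R, not the package's own code: the rows, the `scale_lookup` table, and the rule that percentages become proportions are assumptions made for illustration.

```r
# Toy rows mimicking a few columns of a downloaded table (values are made up).
toy <- data.frame(
  REF_DATE = c("2001-03", "2001-04"),
  VALUE = c(10, 5),
  SCALAR_FACTOR = c("thousands", "units"),
  UOM = c("Dollars", "Percent"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from scaling label to multiplier.
scale_lookup <- c(units = 1, thousands = 1e3, millions = 1e6)

# Apply the scaling factor to VALUE.
toy$val_norm <- toy$VALUE * unname(scale_lookup[toy$SCALAR_FACTOR])

# Assumption: percentages are rescaled to proportions.
is_percent <- toy$UOM == "Percent"
toy$val_norm[is_percent] <- toy$val_norm[is_percent] / 100

# Infer a Date from a year-month reference period by using the first of the month.
toy$Date <- as.Date(paste0(toy$REF_DATE, "-01"))

print(toy)
```

In this sketch, 10 thousand dollars becomes 10000 and 5 percent becomes 0.05. Other reference period formats (annual or daily, for example) would need their own parsing rule.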
Taking a look at an overview of the data within a table is a common
first step. This is implemented in the package with the `get_cansim_table_overview` function.
    get_cansim_table_overview("14-10-0293")
    #> Labour force characteristics by economic region, three-month moving average, unadjusted for seasonality, last 5 months, inactive
    #> CANSIM Table 14-10-0293
    #> Start Reference Period: 2001-03-01, End Reference Period: 2020-12-01, Frequency: Monthly
    #>
    #> Column Geography (76)
    #> Newfoundland and Labrador, Prince Edward Island, Nova Scotia, New Brunswick, Quebec, Ontario, Manitoba, Saskatchewan, Alberta, British Columbia, ...
    #>
    #> Column Labour force characteristics (10)
    #> Labour force, Not in labour force, Employment, Unemployment, Full-time employment, Part-time employment, Population, Unemployment rate, Participation rate, Employment rate
    #>
    #> Column Statistics (3)
    #> Estimate, Standard error of estimate, Standard error of year-over-year change
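Once the dimensions of a table are known, a typical next step is to narrow the data down to one combination of categories. The following is a minimal sketch in base R on a made-up data frame; the column names follow the table output shown earlier, while the rows and values are hypothetical.

```r
# Toy rows using column names from the downloaded table; values are made up.
toy <- data.frame(
  GEO = c("Ontario", "Ontario", "Quebec", "Quebec"),
  `Labour force characteristics` = c("Employment", "Unemployment rate",
                                     "Employment", "Employment"),
  Statistics = c("Estimate", "Estimate", "Estimate",
                 "Standard error of estimate"),
  val_norm = c(7000000, 0.06, 4000000, 15000),
  check.names = FALSE,        # keep the column name with spaces intact
  stringsAsFactors = FALSE
)

# Keep only employment estimates, dropping standard errors and other series.
keep <- toy[["Labour force characteristics"]] == "Employment" &
  toy$Statistics == "Estimate"
employment <- toy[keep, c("GEO", "val_norm")]

print(employment)
```

The same selection applied to a real table would use the category labels listed in the overview for that table.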
When a table number is unknown, you can browse the available tables or search by survey name, keyword or title.
    search_cansim_cubes("housing price indexes")
    #> Retrieving cube information from StatCan servers...
    #> # A tibble: 2 × 19
    #>   cansim_table_number cubeTitleEn  cubeT…¹ produ…² cansi…³ cubeStar…⁴ cubeEndD…⁵
    #>   <chr>               <chr>        <chr>   <chr>   <chr>   <date>     <date>
    #> 1 18-10-0073          New housing… Indice… 181000… 327-00… 1981-01-01 2010-11-01
    #> 2 18-10-0095          New housing… Indice… 181000… 327-00… 1981-01-01 1997-12-01
    #> # … with 12 more variables: releaseTime <dttm>, archived <lgl>,
    #> #   subjectCode <chr>, surveyCode <chr>, frequencyCode <chr>,
    #> #   corrections <chr>, dimensionNameEn <chr>, dimensionNameFr <chr>,
    #> #   surveyEn <chr>, surveyFr <chr>, subjectEn <chr>, subjectFr <chr>, and
    #> #   abbreviated variable names ¹cubeTitleFr, ²productId, ³cansimId,
    #> #   ⁴cubeStartDate, ⁵cubeEndDate
Individual series in Statistics Canada data tables can also be
accessed by using individual numbered vectors. This is especially useful
when building reports using specific indicators. For convenience, the
`cansim` package allows users to specify named vectors, in which case a
`label` field will be added to the returned data frame
containing the specified name for each vector.
    get_cansim_vector(c("Metro Van Apartment Construction Price Index"="v44176267",
                        "Metro Van CPI"="v41692930"),
                      start_time = "2015-05-01",
                      end_time="2015-08-01")
    #> Accessing CANSIM NDM vectors from Statistics Canada
    #> # A tibble: 5 × 12
    #>   DECIMALS VALUE REF_DATE   releas…¹ SYMBOL frequ…² SCALA…³ COORD…⁴ VECTOR label
    #>      <int> <dbl> <chr>      <chr>     <int>   <int>   <int> <chr>   <chr>  <chr>
    #> 1        1  122. 2015-05-01 2021-07…      0       6       0 27.2.0… v4169… Metr…
    #> 2        1  122. 2015-06-01 2021-07…      0       6       0 27.2.0… v4169… Metr…
    #> 3        1  122. 2015-07-01 2021-07…      0       6       0 27.2.0… v4169… Metr…
    #> 4        1  123. 2015-08-01 2021-07…      0       6       0 27.2.0… v4169… Metr…
    #> 5        1  153  2015-07-01 2015-11…      0       9       0 8.7.1.… v4417… Metr…
    #> # … with 2 more variables: val_norm <dbl>, Date <date>, and abbreviated
    #> #   variable names ¹releaseTime, ²frequencyCode, ³SCALAR_ID, ⁴COORDINATE
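How a named vector turns into a `label` column can be sketched in base R. This is an illustration of the idea only, not the package's internals: the `result` data frame is a hand-built stand-in for the returned data, with values rounded from the output above.

```r
# Named character vector: names are the labels, values are the vector IDs.
vectors <- c("Metro Van Apartment Construction Price Index" = "v44176267",
             "Metro Van CPI" = "v41692930")

# Hand-built stand-in for the returned data (values rounded for illustration).
result <- data.frame(
  VECTOR = c("v41692930", "v41692930", "v44176267"),
  VALUE = c(122, 123, 153),
  stringsAsFactors = FALSE
)

# Invert the named vector so that vector IDs index their labels.
label_lookup <- setNames(names(vectors), vectors)

# Attach the matching label to each row.
result$label <- unname(label_lookup[result$VECTOR])

print(result)
```

Keeping the label next to each series saves a separate join when several indicators are combined in one report.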
The code in this package is licensed under the MIT license. The bundled table metadata in Sysdata.R, as well as all Statistics Canada data retrieved using this package, is made available under the Statistics Canada Open Licence Agreement, a copy of which is included in the R folder. The Statistics Canada Open Licence Agreement requires that:
Subject to this agreement, Statistics Canada grants you a worldwide, royalty-free, non-exclusive licence to:

- use, reproduce, publish, freely distribute, or sell the Information;
- use, reproduce, publish, freely distribute, or sell Value-added Products; and,
- sublicence any or all such rights, under terms consistent with this agreement.

In doing any of the above, you shall:

- reproduce the Information accurately;
- not use the Information in a way that suggests that Statistics Canada endorses you or your use of the Information;
- not misrepresent the Information or its source;
- use the Information in a manner that does not breach or infringe any applicable laws;
- not merge or link the Information with any other databases for the purpose of attempting to identify an individual person, business or organization; and
- not present the Information in such a manner that gives the appearance that you may have received, or had access to, information held by Statistics Canada about any identifiable individual person, business or organization.
Subject to the Statistics Canada Open Licence Agreement, licensed products using Statistics Canada data should employ the following acknowledgement of source:
Acknowledgment of Source

(a) You shall include and maintain the following notice on all licensed rights of the Information:

- Source: Statistics Canada, name of product, reference date. Reproduced and distributed on an "as is" basis with the permission of Statistics Canada.

(b) Where any Information is contained within a Value-added Product, you shall include on such Value-added Product the following notice:

- Adapted from Statistics Canada, name of product, reference date. This does not constitute an endorsement by Statistics Canada of this product.