--- title: "Australian Weather Station Rpostgis Database On-Boarding Document" author: "" date: "Last update: `r format(Sys.time(), '%d %B, %Y')`" output: html_document: fig_caption: yes code_folding: show number_sections: false fontsize: 14pt --- ## Introduction to Accessing the Australian Weather Station Database This document provides a brief introduction to accessing a [PostgreSQL](https://www.postgresql.org/) [PostGIS](https//www.postgis.net/) database and executing GIS queries against it from within an R environment using the [RpostGIS](https://cran.r-project.org/web/packages/rpostgis/index.html) package. As a first step, we will install the needed packages. ### Installing the Required Packages ```{r} r = getOption("repos") r["CRAN"] = "http://cran.us.r-project.org" options(repos = r) install.packages('RPostgreSQL') # Required for accessing a PostgreSQL database install.packages('rpostgis') # required for making use of the PostGIS extension - rasters and vectors install.packages('sp') # for plotting results install.packages('maptools') # also for plotting results ``` Make the packages available: ```{r} library(RPostgreSQL) library(rpostgis) library(sp) library(maptools) ``` With these packages installed and available, we should be able to connect to the database. ```{r} pg = dbDriver("PostgreSQL") dns = "ec2-18-205-105-168.compute-1.amazonaws.com" con = dbConnect(pg, user="postgres", password="hendra", host=dns, port=5432, dbname="ausws") ``` The DNS string in the snippet above should mostly remain stable, but may change if system reboots are required. If you are unable to connect using the DNS string above, please contact me (Brian Lambert, Penn State) and I will furnish the new address. ## Database Table Information Now that we have established a connection, a few words about the three tables in the database you are likely to interact with. These tables are named *Places*, *Stations*, and *Weather*. The *Places* table is provided as convienence to enable accessing lat/lon coordinates for Australian cities, a truncated list of important table fields follows: Column Type ------------ ------------------------ gid integer name character varying(100) geom geometry(Point,4326) Table: Places Table The *Stations* table contains information on the 9,175 weather stations that have contribution weather information. The most important fields in this table are as follows: Column Type ----------------- ----------------------- station_number integer district_code integer name character varying(40) opened date closed date lat real lon real method character varying(20) state character varying(3) station_height real barometer_height real geom geometry(Point,4326) Table: Stations Table The *Weather* table contains more than 150 million weather observations, and the most important members of this table are as follows: Column Type Description ----------------- ---------------------- ---------------------- station integer [Station ID](http://www.bom.gov.au/climate/data/stations/) day date reporting day precip real accumulation of rainfall (mm) precip_qual character(1) rain_days integer number of days that rainfall accumulated measured_days integer max_temp real daily maximum temp against previous day (9am-9am) max_temp_quality character varying(1) max_days integer min_temp real daily minimum temp against day of observation (9am-9am) min_temp_quality character varying(1) min_temp_accum integer Table: Weather Table ## Example of a PostGIS Queries As a simple example, we will attempt to build up a query that will answer the following question: What are the maximum daily temperatures within a specific radius of Brisbane for the past 20 yrs? First we examine stations surrounding Brisbane, after having looked up the gid for Brisbane location in the *places* table: ```{r} querygid = "select gid from places where 'Brisbane' ~ name;" brisbaneGID <- dbGetQuery(con,querygid) #conncection and character string with SQL print(brisbaneGID) ``` Now we will fetch the geometry associated with Brisbane: ```{r} querybrisloc = "select gid, name, geom from places where 'Brisbane' ~ name;" brisbaneGeom <- pgGetGeom(con,query=querybrisloc) #returns spatial points dataframe with lat and long of Brisbane ``` Now we will query for the stations near Brisbane... The coordinate reference system (CRS) is WGS84 (latitute/longitude). Check out this brief explanation of [CRS](https://geocompr.robinlovelace.net/reproj-geo-data.html). The query is asking for stations that are within a buffer surrounding Brisbane that is a ceratin distance (0.25) based on the CRS (again, here the measurement in lat/long). ```{r} queryBrisbaneStations = "SELECT b.geom, b.name from places r, stations b where r.gid = 7085 and ST_within(b.geom,ST_Buffer(r.geom,0.25));" stationsBrisbane <- pgGetGeom(con,query=queryBrisbaneStations) ``` We can now check to see if the locations of the stations returned make at least visual sense, after setting our figure size: ```{r} knitr::opts_chunk$set(fig.width=16, fig.height=16) plot(brisbaneGeom) text(brisbaneGeom, brisbaneGeom$name) points(stationsBrisbane,pch=".") ``` Because the CRS is measured in lat/long, you'll see that the 0.25 will *not* give you an even distance N-S and E-W. To get an equal distance around Brisbane, you should first reproject to create a buffer (Australian Albers reprojection is recommended). You can also label the stations: ```{r} plot(brisbaneGeom) text(brisbaneGeom, brisbaneGeom$name) points(stationsBrisbane,pch=20, col = 'darkgoldenrod') text(stationsBrisbane,stationsBrisbane$name, cex = 0.75) ``` To acquire the maximum temperatures for the last twenty years, we would execute the following query. It takes more than ten minutes to execute, so it is currently commented out. Returns a spatial points dataframe for 23 stations (smaller buffer than above to reduce run time) ```{r} # queryBrisbane = "SELECT b.geom, b.station_number, y.station, y.day, y.max_temp from places r, stations b, weather y where r.gid = 7085 and ST_within(b.geom,ST_Buffer(r.geom,0.1)) and y.station = b.station_number and y.day > '2000-01-01';" #maxTempsBrisbane <- pgGetGeom(con,query=queryBrisbane) ``` ### Addtional resources You may find the following resources useful for using this database and interpreting the data within: - [Bureau of Meterology (BOM)](http://www.bom.gov.au/climate/cdo/about/sites.shtml) - source of raw data - [This](https://pdfs.semanticscholar.org/e407/6004f330b2f2130ca86402a35c2c732b803d.pdf) paper provides a useful overview of BOM data collection over the past century - [Cheatsheet](https://gist.github.com/Kartones/dd3ff5ec5ea238d4c546) for POSTGRES - [Article](https://journal.r-project.org/archive/2018/RJ-2018-025/RJ-2018-025.pdf) and [github explainer](https://gist.github.com/clhenrick/ebc8dc779fb6f5ee6a88) on PostGIS