Plotting the Average Driving Time to the Nearest Hospital in Ghana

Introduction

This post in an extension of my post on calculating the average walking time from any given location in Accra, Ghana to the nearest bar. Instead of walking times to bars this post uses the same methodology to calculate the driving time to the nearest hospital from any given point in Ghana. This involves getting data from OSM, planning walking routes and times in OSRM, plotting in cartography and ggplot2, using the sp, sf, raster, jstat, osmdata, and osrm packages.

This code took heavy inspiration and some direct code from these sources, all credit to these authors. Please have a look at what they did and some of the great tutorials they wrote:

Packages

These are the packages used for this project:

library(tidyverse)
library(osmdata)
library(sf)
library(maptools)
library(raster)
library(RColorBrewer)
library(sp)
library(gstat)
library(osrm)
library(ggmap)
library(ggtext)
library(cartography)
library(showtext)
library(readxl)
library(fuzzyjoin)
library(chron)
library(kableExtra)

#raster overrides the dplyr select function, so assign I assigned it back
select <- dplyr::select

Data

First I needed to the find the locations of all hospitals in Ghana, a non-trivial task. The two main sources of this information are:

  1. A list published on https://data.gov.gh/dataset/health-facilities-ghana. This list form 2012 contains the names and locations of 3,756 different health facilities in Ghana, including health directorate offices, senior homes, CHPS clinics, and Maternity Homes. The list is almost complete, but misses some location data (longitude and latitude).
  2. A 2019 public database by Joseph Maina, Paul O. Ouma, Peter M. Macharia, Victor A. Alegana, Benard Mitto, Ibrahima Socé Fall, Abdisalan M. Noor, Robert W. Snow and Emelda A. Okiro, which is an Africa-wide database of Government/public hospitals. This database does not have any private hospitals (at least not in Ghana), but it has some GPS coordinates for hospitals that are missing in the first source.

Using the download.file() command I can download these files straight into my working directory

# Download first dataset
download.file("https://data.gov.gh/sites/default/files/Health%20Facility%20in%20Ghana%202012_0.csv", "dataset_GOV.csv")
# Download second dataset
download.file("https://springernature.figshare.com/ndownloader/files/14379593", "dataset_Maina.xlsx")

This is how the first dataset looks like (I am using kable tables to make the HTML tables a bit more pleasant to look at):

# load data
dat_1 <- read_csv("dataset_GOV.csv")

head(dat_1)  %>%
  kable() %>%
  kable_styling(c("striped", "hover", "condensed"))
RegionDistrictFacility NameTypeTownOwnershipLatitudeLongitude
AshantiOffinso NorthA.M.E Zion ClinicClinicAfranchoCHAG7.40801-1.96317
AshantiBekwai MunicipalAbenkyiman ClinicClinicAnwiankwantaPrivate6.46312-1.58592
AshantiAdansi NorthAboabo Health CentreHealth CentreAboabo No 2Government6.22393-1.34982
AshantiAfigya-KwabreAboabogya Health CentreHealth CentreAboabogyaGovernment6.84177-1.61098
AshantiKwabreAboaso Health CentreHealth CentreAboasoGovernment6.84177-1.61098
AshantiOffinso NorthAboffour Health CentreHealth CentreAboffourGovernment7.12986-1.73294
# table hospital types
dat_1 %>%
  group_by(Type) %>%
  count() %>%
  kable() %>%
  kable_styling(c("striped", "hover", "condensed"))
Typen
Centre2
CHPS652
clinic2
Clinic1171
CPHS1
DHD1
District Health Directorate99
District Hospital82
Health Centre786
Hospital277
Maternity Home369
Metropolitan Health Directorate2
Metropolitan Hospital1
Municipal Health Directorate1
Municipal Health Directorate7
Municipal Hospital4
Others31
Polyclinic16
Psychiatric Hospital3
RCH152
Regional Health Directorate9
Regional Hospital9
Research Institution2
Teaching Hospital3
Training Institution74

For this specific post I am not interested in any health facility that is not classified as a hospital, so using some regex, I can filter only for types that include the string “Hospital”, which includes the Regional Hospitals, District Hospitals.

# filter only for Hospital
dat_1 <- dat_1 %>%
  filter(grepl("Hospital", Type))

dat_1%>%
  group_by(District,  Town, `Facility Name`) %>%
  mutate(n = n()) %>%
  filter(n != 1) %>%
  arrange(`Facility Name`) %>%
  kable() %>%
  kable_styling(c("striped", "hover", "condensed"))
RegionDistrictFacility NameTypeTownOwnershipLatitudeLongituden
Brong AhafoSeneKwame Danso HospitalHospitalKwame DansoGovernmentNANA2
Brong AhafoSeneKwame Danso HospitalHospitalKwame DansoGovernmentNANA2
AshantiMampong MunicipalMampong Quality HospitalHospitalMampongPrivate7.04621-1.401952
AshantiMampong MunicipalMampong Quality HospitalHospitalMampongPrivateNANA2
WesternAhanta WestNana Hima-Dekyi HospitalDistrict HospitalNAGovernmentNANA2
WesternAhanta WestNana Hima-Dekyi HospitalDistrict HospitalNAGovernmentNANA2
NorthernSavelugu-NantonSavelugu HospitalHospitalSaveluguGovernment9.61862-0.829292
NorthernSavelugu-NantonSavelugu HospitalDistrict HospitalSaveluguGovernment9.61862-0.829292
Brong AhafoSunyani MunicipalSDA HospitalHospitalSunyaniPrivateNANA2
Brong AhafoSunyani MunicipalSDA HospitalHospitalSunyaniCHAGNANA2

There are some duplicates in this data I need to deal with.

# using distinct to omit duplicates
dat_1 <- dat_1 %>%
  arrange(`Facility Name`) %>%
  distinct(District, Town, `Facility Name`, .keep_all = TRUE)

nrow(dat_1)
## [1] 374

Thera are 374 hospitals left. Now I can load in the second dataset and filter for only hospitals in Ghana.

dat_2 <- read_excel("dataset_Maina.xlsx", sheet = 1) %>%
  # filter for Ghana
  filter(Country == "Ghana") %>%
  # filter for hospitals
  filter(grepl("Hospital", `Facility type`))

There is one duplicate in this dataset:

dat_2 %>%
  group_by(Admin1, `Facility name`) %>%
  mutate(n = n()) %>%
  filter(n != 1) %>%
  kable() %>%
  kable_styling(c("striped", "hover", "condensed"))
CountryAdmin1Facility nameFacility typeOwnershipLatLongLL sourcen
GhanaGreater AccraAchimota HospitalHospitalMoH5.609531-0.228106Other2
GhanaGreater AccraAchimota HospitalHospitalMoH5.629220-0.214760GPS2

After some Googling, it is clear it should be the second Achimota Hospital.There are now 177 Hospitals in this dataset.

dat_2 <- dat_2 %>%
  filter(!(`Facility name` == "Achimota Hospital" & Lat == 5.609531))

glimpse(dat_2)
## Rows: 177
## Columns: 8
## $ Country         <chr> "Ghana", "Ghana", "Ghana", "Ghana", "Ghana", "Ghana...
## $ Admin1          <chr> "Ashanti", "Ashanti", "Ashanti", "Ashanti", "Ashant...
## $ `Facility name` <chr> "Agogo Presby Hospital", "Akomaa Memorial Hospital"...
## $ `Facility type` <chr> "Hospital", "Hospital", "Hospital", "Hospital", "Di...
## $ Ownership       <chr> "FBO", "FBO", "FBO", "FBO", "MoH", "MoH", "FBO", "M...
## $ Lat             <dbl> 6.796180, 6.439660, 7.053220, 6.843610, 6.801620, 6...
## $ Long            <dbl> -1.085290, -1.538370, -1.509640, -1.381201, -1.5137...
## $ `LL source`     <chr> "GPS", "GPS", "GPS", "Other", "GPS", "GPS", "GPS", ...
sum(is.na(dat_1$Latitude))
## [1] 42

A total of 42 locations are missing from the first dataset. I can join the second dataset to it to link some location data from it that is available in the second dataset, but not the first. I need to use a fuzzy-join, using the Levenstein-distance from the fuzzyjoin package, to allow for some misspellings (for example, I do want “Nkwanta Dsitrict Hospital” to match to “Nkwanta District Hospital” (typo in district in first dataset).

# match datasets
dat_complete <- dat_1 %>%
  stringdist_left_join(select(dat_2, Admin1, `Facility name`, Lat, Long),
            by = c("Facility Name" = "Facility name",
                   "Region" = "Admin1"),
            max_dist = 2,
            distance_col = "col",
            method = "lv") %>%
  mutate(Latitude  = ifelse(is.na(Latitude), Lat, Latitude),
         Longitude = ifelse(is.na(Longitude), Long, Longitude)) %>%
  select(- Lat, -Long, -`Facility Name.col`, -Region.col, -col)

# show entries that are still incomplete
dat_complete %>%
  filter(is.na(Latitude)) %>%
  kable() %>%
  kable_styling(c("striped", "hover", "condensed")) %>%
  scroll_box(width = "100%", height = "200px")
RegionDistrictFacility NameTypeTownOwnershipLatitudeLongitudeAdmin1Facility name
AshantiKumasi MetropolitanAbrepo County HospitalHospitalAbrepoPrivateNANANANA
AshantiKumasi MetropolitanAdiebeba HospitalHospitalAdiebebaPrivateNANANANA
AshantiKumasi MetropolitanAgyei Eli HospitalHospitalSokobanPrivateNANANANA
AshantiKumasi MetropolitanAhodwo Washie HospitalHospitalAhodwoPrivateNANANANA
AshantiKumasi MetropolitanAllen HospitalHospitalDichemsoPrivateNANANANA
AshantiKumasi MetropolitanAsafo Agyei HospitalHospitalDaabanPrivateNANANANA
AshantiKumasi MetropolitanAsawasi Boakye Dunkwa HospitalHospitalAsawasiPrivateNANANANA
AshantiKumasi MetropolitanBoampong HospitalHospitalSantasiPrivateNANANANA
Greater AccraAccra MetropolitanC & J MedicareHospitalTuduPrivateNANANANA
WesternBibiani-Anhwiaso-BekwaiCentral African Gold HospitalHospitalNAPrivateNANANANA
AshantiKumasi MetropolitanEffah HospitalHospitalBuokrom EstatePrivateNANANANA
Greater AccraAccra MetropolitanFamily Health Ltd.HospitalTeshiePrivateNANANANA
AshantiKumasi MetropolitanHebrona HospitalHospitalSokobanPrivateNANANANA
AshantiAfigya-KwabreHilltop HospitalHospitalBuohoPrivateNANANANA
Greater AccraGa WestHoly Dove HospitalHospitalNew AchimotaPrivateNANANANA
Greater AccraGa EastJilac Specialist HospitalHospitalDomePrivateNANANANA
AshantiAsante-Akim SouthJuaso SABS HospitalHospitalJuasoPrivateNANANANA
AshantiAsante-Akim North MunicipalKonongo Adom HospitalHospitalKonongoPrivateNANANANA
CentralMfantsemanMercy Maternity & Fistula Women CentreHospitalNAPrivateNANANANA
WesternAhanta WestNana Hima-Dekyi HospitalDistrict HospitalNAGovernmentNANANANA
AshantiKumasi MetropolitanNeuro-Psych HospitalHospitalPankronoPrivateNANANANA
Greater AccraAccra MetropolitanObenfo HospitalHospitalMatahekoPrivateNANANANA
AshantiAfigya-KwabrePACKS HospitalHospitalAfranchoPrivateNANANANA
AshantiKumasi MetropolitanPatasi Suame Nkwanta HospitalHospitalPatasiPrivateNANANANA
AshantiKumasi MetropolitanSDA HospitalHospitalKwadasoPrivateNANANANA
Brong AhafoSunyani MunicipalSDA HospitalHospitalSunyaniPrivateNANANANA
WesternShamaSt.Benedit HospitalHospitalNAPrivateNANANANA
NorthernTamale MetropolitanTania Specialist HospitalHospitalKanvilliPrivateNANANANA
Greater AccraAccra MetropolitanTwumasi-Waah Memorial HospitalHospitalEast Legon (before America house road)PrivateNANANANA
Greater AccraGa EastVan Medical CentreHospitalAshongmanPrivateNANANANA

There are still 30 missing locations, 29 from Private Hospitals, and 1 government hospitals. For these I will use the mutate_geocode() function to query Google Maps to find me latitude and longitude for these hospitals. This is not a free service, but Google gives 200 USD to use, which is a lot (every query is only a few cents or less).

# you can get your key here: https://console.cloud.google.com/google/maps-apis/
# register google key
register_google("MY_GOOGLE_API_KEY")

# get locations from Google
Locations_from_google <- dat_complete %>%
  filter(is.na(Latitude)) %>%
  mutate(address_string = paste0(`Facility Name`, " ", Region, " ", Town, " Ghana")) %>%
  mutate_geocode(address_string)

# complete dataset
dat_complete <- bind_rows(filter(dat_complete, !is.na(Latitude)),
          Locations_from_google)%>%
  mutate(Latitude  = ifelse(is.na(Latitude), lat, Latitude),
         Longitude = ifelse(is.na(Longitude), lon, Longitude)) %>%
  select(-Admin1 ,- `Facility name`, - address_string,-   lon ,-lat)

There are only 3 hospitals left for which I cannot find a geo-location (even if I google myself). For the hospital with at least the name of the town, I will simply take the central location of the town. The other 2 hospitals will be dropped.

town <- dat_complete %>%
  filter(is.na(Latitude))  %>%
  filter(!is.na(Town))%>%
  mutate(address_string = paste0(Region, " ", Town, " Ghana")) %>%
  mutate_geocode(address_string)

dat_complete <- bind_rows(filter(dat_complete, !is.na(Latitude)),
          town) %>%
  mutate(Latitude  = ifelse(is.na(Latitude), lat, Latitude),
         Longitude = ifelse(is.na(Longitude), lon, Longitude)) %>%
  select( -address_string, -lon , -lat)

Upon further inspection, there are 3 points added by Google that are clearly outside of Ghana (Max latitude of Ghana is 11.17).

dat_complete %>%
  arrange(desc(Latitude)) %>%
  head()  %>%
  kable() %>%
  kable_styling(c("striped", "hover", "condensed"))
RegionDistrictFacility NameTypeTownOwnershipLatitudeLongitude
AshantiKumasi MetropolitanNeuro-Psych HospitalHospitalPankronoPrivate41.67122-86.15888
CentralMfantsemanMercy Maternity & Fistula Women CentreHospitalNAPrivate38.64539-90.44668
Greater AccraAccra MetropolitanC & J MedicareHospitalTuduPrivate37.02745-95.61599
Upper EastBawku MunicipalBawku HospitalMunicipal HospitalBawkuCHAG11.06124-0.23287
Upper EastBawku WestZebilla HospitalDistrict HospitalZebillaGovernment10.93621-0.47351
Upper EastBongoBongo HospitalDistrict HospitalBongo/WegurugoGovernment10.91450-0.80727
I will also drop these hospitals.
dat_complete <- dat_complete %>%
  arrange(desc(Latitude)) %>%
  filter(Latitude <  11.18)

Now I have 369 hospitals with name and location. As a first step I can plot these on an overlay of the map of Ghana. In 2019, Ghana went from 10 to 16 regions, so most data and shapefiles that are available online still use the 10 region. I found a shape file of the 16 regions here.

# load the shape file both as sf and  SpatialPolygonsDataFrame
Ghana_16_region <- st_read("GHANA_16_REGIONS-shp/GHANA_16_REGIONS.shp",   quiet = TRUE)
Ghana_16_region_df <- rgdal::readOGR("GHANA_16_REGIONS-shp/GHANA_16_REGIONS.shp", layer= "GHANA_16_REGIONS", verbose=FALSE)
Ghana_16_region <- select(Ghana_16_region, REGION, Shape__Are )

# set the hospital as SpatialPointsDataFrame
coordinates(dat_complete) <- ~ Longitude + Latitude
proj4string(dat_complete) <- proj4string(Ghana_16_region_df)

# using the over() function  to check in which of the 16 regions a hospital is located
dat_complete_df <-  bind_cols(as_tibble(dat_complete),
                                sp::over(dat_complete, Ghana_16_region_df, fun = "sum"))

# summarize the number of hospitals per region
Ghana_16_region <- Ghana_16_region %>%
  left_join(
  dat_complete_df %>%
  group_by(REGION ) %>%
  summarise(N_hospitals = n()),
  by = "REGION")

# create colour breaks   
bks <- getBreaks(v = Ghana_16_region$N_hospitals, method = "jenks", nclass = 10)
cols <- carto.pal(pal1 = "sand.pal", n1 = 11)

# plot the background
plot(Ghana_16_region$geometry, border = NA, col = NA, bg = "#A6CAE0")

# plot a choropleth with number of hospitals per region
choroLayer(
  add = TRUE,
  x = Ghana_16_region,
  var = "N_hospitals",
  breaks = bks,
  border = "burlywood3",
  col = cols,
  legend.pos = "topleft",
  legend.values.rnd = 1,
  legend.title.txt = "number of hospitals\nper region"
)
# plot the hospital points
plot(dat_complete, add = T)

# add labels
layoutLayer(title = "Locations of Hospitals Ghana",
            author = "Data by GHS and Maina et al. • Vizualization by Laurent Smeets",
            frame = TRUE,
            scale = "auto",
            north = TRUE,
            theme = "sand.pal")

Driving Times

To calculate this driving times, I need to query some type of routing service to calculate the driving distance (or driving actually time) from a given place to the nearest hospital Google Maps would be one option to use, but that is not a free API to use, so the Open Source Routing Machine (OSRM) is a more suitable open-source alternative. There is an online demo version available, but that one will not be sufficient for the number of driving distances I want to calculate.

OSRM

To run OSRM on a windows machine, you would ideally have Docker Desktop installed (https://www.docker.com/products/docker-desktop). To do so you will need a newer version of Windows 10 (version 2004 or higher, released in May 2020). Install instructions can be found here.

After installing Docker desktop, you should also make sure you have the wget command installed for windows (if you are using a Windows operating system), more information on that here. Now just open the Docker Desktop programme and your command line interface (type “CMD” into you Windows search bar to open the command line interface).

docker pull osrm/osrm-backend

Second navigate to the folder from where you want to run this project, by using the cd command.

cd NAME_OF_FOLDER\NAME_OF_SUBFOLDER\ETC

Then download the map of the country you want to use in this case Ghana.

wget http://download.geofabrik.de/africa/ghana-latest.osm.pbf

Now extract the data from this file (These commands are Windows specific for, Linux specific commands see the OSRM tutorial page). As, I am interested in driving times I use opt/car.lua instead of opt/foot.lua.

#docker run -t -v %cd%:/data osrm/osrm-backend osrm-extract -p /opt/foot.lua /data/ghana-latest.osm.pbf
docker run -t -v %cd%:/data osrm/osrm-backend osrm-extract -p /opt/car.lua /data/ghana-latest.osm.pbf

It will end with something looking like this:

[info] ok, after 15.887s
[info] Processed 2214859 edges
[info] Expansion: 103744 nodes/sec and 31209 edges/sec
[info] To prepare the data for routing, run: ./osrm-contract "/data/ghana-latest.osrm"
[info] RAM: peak bytes used: 1406402560

Now you have to partition the data using this command:

docker run -t -v %cd%:/data osrm/osrm-backend osrm-partition /data/ghana-latest.osrm

Which will give you an output similar to this:

[info] Level 1 #cells 4479 #boundary nodes 102100, sources: avg. 15, destinations: avg. 22, entries: 1923670 (15389360 bytes)
[info] Level 2 #cells 370 #boundary nodes 14768, sources: avg. 26, destinations: avg. 39, entries: 492861 (3942888 bytes)
[info] Level 3 #cells 23 #boundary nodes 1300, sources: avg. 37, destinations: avg. 54, entries: 60780 (486240 bytes)
[info] Level 4 #cells 1 #boundary nodes 0, sources: avg. 0, destinations: avg. 0, entries: 0 (0 bytes)
[info] Bisection took 13.7743 seconds.
[info] RAM: peak bytes used: 487546880

Now you have to customize the data using this command:

docker run -t -v %cd%:/data osrm/osrm-backend osrm-customize /data/ghana-latest.osrm

Which will give you an output similar to this:

[info] MLD customization writing took 0.421035 seconds
[info] Graph writing took 0.503969 seconds
[info] RAM: peak bytes used: 400687104

Now you are ready to use OSRM, to start the programme simply run:

docker run -t -i -p 5000:5000 -v %cd%:/data osrm/osrm-backend osrm-routed --algorithm mld --max-table-size 10000  /data/ghana-latest.osrm

Which will give you an output similar to this:

[info] starting up engines, v5.22.0
[info] Threads: 8
[info] IP address: 0.0.0.0
[info] IP port: 5000
[info] http 1.1 compression handled by zlib version 1.2.8
[info] Listening on: 0.0.0.0:5000
[info] running and waiting for requests
options(osrm.server = "http://localhost:5000/", osrm.profile = "drving")

Building a Grid

I have the locations of the hospitals in Ghana, the next step is getting a grid of coordinates from where to calculate distances from. I could just draw a random sample of coordinates that lie within Ghana and calculate the driving time (in minutes) from these points to the nearest hospital, but it is more efficient to create a grid of coordinates. This means that every 0.025 degree in longitude and 0.025 degree in latitude within the bounding box of Ghana a coordinate is created. (I know that one degree difference in longitude is not the same a one degree difference in latitude in km. However, Ghana is close enough to zero degrees latitude and longitude (0°N 0°E), that this difference does really not matter that much. Also I will interpolate the data later so any tiny gaps will be smoothed over.)

See this page on how driving speeds are calculated in OSRM. In summary, base driving speed is set at the maximum speed of a road penalty multipliers are added for different types of road surfaces and traffic situations. For example, corners have a multiplier of .75, meaning that driving speed of a place with a 50km/h speed limits in a corner is not 50km/h but, but $50\cdot0.75 = 37.5$ km/h.

# extract the bouding box
bounding_box <- Ghana_16_region_df@bbox

# create the grid, using expand.grid()
grid <- expand.grid(seq(bounding_box[1,1], bounding_box[1,2], by = 0.025),
                    seq(bounding_box[2,1], bounding_box[2,2], by = 0.025)) %>%
  as_tibble() %>%
  mutate(id = 1:n()) %>%
  relocate(id) %>%
  setNames(c("id", "lon","lat"))


# plot the grid, with as background the shapefile of Ghana
ggplot() +
  geom_point(data = grid,
             aes(x = lon, y = lat),
             col = "red", size = .01) +
    geom_polygon(data = fortify(Ghana_16_region_df),
               aes(x=long, y=lat, group = group),
               fill = NA, col = "black") +
  theme_minimal()+
  coord_map()

nrow(grid)
## [1] 46182

There are 46182 different points in this bounding box, but not all fall within the borders of Ghana To subset only those points that fall within Ghana I use the following code, using the over() command from the sp package.

coordinates(grid) <- ~ lon + lat
proj4string(grid) <- proj4string(Ghana_16_region_df)

# select only those points that lie within the country boundaries
points_in_polygon <- bind_cols(over(grid, Ghana_16_region_df) %>% select(REGION),
                               as_tibble(grid) %>% select(id, lon, lat )) %>%
  filter(!is.na(REGION )) %>%
  as_tibble() %>%
  select(-REGION ) %>%
  mutate(id = 1:n())

# plot it again
ggplot() +
  geom_point(data = points_in_polygon,
             aes(x = lon, y = lat),
             col = "red", size = .01) +
  geom_polygon(data = fortify(Ghana_16_region_df),
               aes(x=long, y=lat, group = group),
               fill = NA, col = "black") +
  theme_minimal()+
  coord_map()

nrow(points_in_polygon)
## [1] 30343

Now there are only 30343 point left. The next part of code will take some time to run as it needs to calculate the traveling distance/time from 46182 locations to 369 hospitals (11,196,567 routes in total), and pick the shortest travel time. There are definitely ways to optimize this code (for example, I could decide to only calculate the route to the 10 Great-circle distance nearest hospitals and calculate travel times form them). But I only intend to run this code once, and computers are patient workers. I do split the points in some different sections, to limit the complexity on the OSRM server a bit. All in all, this takes like 5 minutes on my machine.

# Split data in to chunks of 500
points_in_polygon$section <- rep(1:ceiling(nrow(points_in_polygon)/500), each = 500)[1:nrow(points_in_polygon)]

# this input to the osrmTable is only allowed to have 3 columns.
hospital_locations_lat_long <- dat_complete_df %>%
  mutate(id = 1:n()) %>%
  select(id, Longitude, Latitude) %>%
  rename(lat = Latitude,
         lon = Longitude)

# create an empty list
list_points <- list()

# iterate of the different 500 point chunks
for(i in unique(points_in_polygon$section)){

  # select current chunk
  data_points <- points_in_polygon %>%
  filter(section == i) %>%
  select(-section)

  # this is the OSRM function that calculates the travel time
  travel_time_story <- osrmTable(src = data_points,
                                 dst = hospital_locations_lat_long,
                                 measure = "duration")

  # for every point the function calculates the travel time to every hospital
  # I only need to save the driving duration to the nearest hospital
  list_points[[i]] <- bind_cols(travel_time_story$sources,
                                 mintime = apply(travel_time_story$durations, 1, min)) %>%
    as_tibble()

}

# I  bind the travel duration back on to the original grid
points_in_polygon_with_travel_times <- data.table::rbindlist(list_points) %>%
  bind_cols(points_in_polygon %>%
              select(lon, lat) %>%
              rename(lon.org = lon,
                     lat.org = lat))
median(points_in_polygon_with_travel_times$mintime)
## [1] 39.9
mean(points_in_polygon_with_travel_times$mintime)
## [1] 48.09315

The median travel time from any given point in Ghana is around 40 minutes and the mean travel time 48 minutes (that does of course not mean that the average Ghanaian lives 40 minutes from a hospital, as this does not take population density into consideration).

We can also calculate the longest possible travel time to the nearest hospital.

longest_travel_time <- points_in_polygon_with_travel_times[which.max(points_in_polygon_with_travel_times$mintime), ]

longest_travel <- osrmTable(src = data.frame(id = 1 ,
                          lon = longest_travel_time[1,4],
                          lat = longest_travel_time[1,5]),
          dst = hospital_locations_lat_long,
          measure = "duration")


longest_travel_df <- tibble(id = 1,
                            longest_travel$sources[1],
       longest_travel$sources[2],
       mintime = apply(longest_travel$durations, 1, min),
       lon_end = longest_travel$destinations[which.min(longest_travel$durations), 1],
       lat_end = longest_travel$destinations[which.min(longest_travel$durations), 2])

route <- osrmRoute(src = longest_travel_df[,c(1,2,3)],
          dst = longest_travel_df[,c(1,5,6)],
          overview = "full",
          returnclass = "sf")

This route is around 3 hours and 20 minutes (according to OSRM algorithm over unpaved road) and the distance 94 km over

sub(":\\d{2}", "", times((route$duration%/%60 +  route$duration%%60/3600)/24))
## [1] "03:18"
route$distance
## [1] 93.8869

If I quickly plan this road we see that it route is in Brong-Ahafo Region in the centre of Ghana.

# Display the path
plot(Ghana_16_region$geometry, border = "black", col = "white", bg = "grey")
plot(st_geometry(route), lwd = 5, add = TRUE, col = "red")

A bit more zoomed in the plot looks like this (using OpenStreetMap as a background).

osm <- getTiles(x = route, crop = FALSE, type = "osm", zoom = 13)
tilesLayer(osm)
plot(st_geometry(route), lwd = 4, add = TRUE)
plot(st_geometry(route), lwd = 1, col = "white", add = TRUE)

If I plot this, this is how it looks, It starts resembling what I am looking for.

# for points with more than 2 hour driving distance, I cap them at 2 hours (for plotting purposes)

points_in_polygon_with_travel_times <- points_in_polygon_with_travel_times %>%
    mutate(mintime_capped = ifelse(mintime > 120, 120, mintime))


ggplot() +
  geom_polygon(data = fortify(Ghana_16_region_df),
               aes(x=long, y=lat, group = group),
               fill = NA, col = "black") +
  geom_point(data = points_in_polygon_with_travel_times,
             aes(x = lon.org, y = lat.org, col = mintime_capped), size = .9, alpha = .9) +
               viridis::scale_color_viridis(option = "magma", direction = -1) +
  theme_minimal()+
  coord_map()

Beautify the Plot

Now we just need to make a raster file out of these points, interpolate between points, project it over a real map of Ghana, and make the plot a bit more beautiful.

##correct spatial reference for Ghana here: https://spatialreference.org/ref/
CRS_string <- "+proj=aea +lat_1=20 +lat_2=-23 +lat_0=0 +lon_0=25 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"


# we set a resolution
gridRes  <- 100

# transform data to SpatialPointsDataFrame  with longitude/latitude projection
data_sp <- points_in_polygon_with_travel_times
coordinates(data_sp) <- ~lon.org+lat.org
proj4string(data_sp) <- crs("+init=epsg:4326")
coords <- c("lon", "lat")

data_sf <- st_as_sf(data_sp, coords = coords, crs = proj4string(data_sp))

# Reset it to the projection of Ghana
data_sp_mp <- spTransform(data_sp, CRSobj = CRS_string)


# Here we set the bounding box I want to use (plus a bit of a margin)
# We also create a new grid that is used for the interpolation

boxx = data_sp_mp@bbox
deltaX = as.integer((boxx[1,2] - boxx[1,1]) + 1.5)
deltaY = as.integer((boxx[2,2] - boxx[2,1]) + 1.5)
gridSizeX = deltaX / gridRes
gridSizeY = deltaY / gridRes
grd = GridTopology(boxx[,1], c(gridRes,gridRes), c(gridSizeX,gridSizeY))
pts = SpatialPoints(coordinates(grd))


# Reset it to the projection of Ghana
proj4string(pts) <- crs(CRS_string)

# create an empty raster
r.raster <- raster::raster()
# set the extent of the raster to the data points we gave
extent(r.raster) <- extent(pts)
# set the resolution (cell size)
res(r.raster) <- gridRes
# set the correct crs
crs(r.raster) <- crs(CRS_string) #

# Interpolate the grid cells Nearest neighbour function (k = 30) using the gstat package
gs <- gstat(formula = mintime_capped~1,
            locations = data_sp_mp,
            nmax = 30,
            set = list(idp = 0))

# This might take a while to run
raster_object <- interpolate(r.raster, gs)

# ad both object to a list
minraster <- list(data_sf, raster_object)
names(minraster) <- c("data", "raster_object")

# Instead of the 16 different regions, I can create a single shapefile of Ghana,
# combining all the regions into one shape file
Ghana_country <- as(st_union(Ghana_16_region), "Spatial")


Ghana_country <- spTransform(Ghana_country, CRSobj = CRS_string)
minraster_shaped <- mask(minraster$raster_object, Ghana_country)
#saveRDS(minraster_shaped , "minraster_shaped_120.RDS")
#saveRDS(CRS_string , "CRS_string.RDS")

CRS_string <- readRDS("CRS_string.RDS")
minraster_shaped <- readRDS("minraster_shaped_120.RDS")

To plot this is ggplot2 I need to make a polygon object out of the raster object.

# you could use this code to speed up the plotting.

 minraster_shaped_aggr <- aggregate(minraster_shaped, fact = 5)
# again, this function might take a while to run
# if it takes to long, set a higher aggregation factor
# or a higher Gridres
# or less points


minraster_shaped_rtp <- rasterToPolygons(minraster_shaped_aggr)

# we go back to standard long/lat projection for plotting purposes
minraster_shaped_rtp <- spTransform(minraster_shaped_rtp,
                                    CRS("+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))

# Now I fortify this data to plot it
minraster_shaped_rtp@data$id <- 1:nrow(minraster_shaped_rtp@data)
minraster_shaped_fort <- fortify(minraster_shaped_rtp,
                                 data = minraster_shaped_rtp@data)
minraster_shaped_fort <- merge(minraster_shaped_fort,
                               minraster_shaped_rtp@data,
                               by.x = 'id',
                               by.y = 'id')

I would like to have some real map as a background, so I download it from Stamenmap and use a function I found here to crop out a polygon and plot the result.

# download background map tiles
# for this level of zoom, you might have to run this command a few times

Ghana_map_bg <- get_stamenmap(bbox = bounding_box, zoom = 9, maptype="terrain-background", color = "bw", crop = F)

# load the function to make a raster file out of the tiles
ggmap_rast <- function(map){
  map_bbox <- attr(map, 'bb')
  .extent <- extent(as.numeric(map_bbox[c(2,4,1,3)]))
  my_map <- raster(.extent, nrow= nrow(map), ncol = ncol(map))
  rgb_cols <- setNames(as.data.frame(t(col2rgb(map))), c('red','green','blue'))
  red <- my_map
  values(red) <- rgb_cols[['red']]
  green <- my_map
  values(green) <- rgb_cols[['green']]
  blue <- my_map
  values(blue) <- rgb_cols[['blue']]
  stack(red,green,blue)
}


# make a raster file out of the maps
Ghana_map_bg_rast <- ggmap_rast(Ghana_map_bg)

# crop the raster file
Ghana_map_bg_rast_masked <- mask(Ghana_map_bg_rast, Ghana_16_region_df, inverse=F)

# create a geospatial dataframe out of the raster file
Ghana_map_crop_masked_df <- data.frame(rasterToPoints(Ghana_map_bg_rast_masked))

sysfonts::font_add_google("Cinzel", "Cinzel") # this is the font I will be using

#These are needed to render the fonts
showtext_auto()
x11()
# create the map in ggplot
static_map  <- ggplot() +
   geom_point(data = Ghana_map_crop_masked_df,
              aes(x = x,
                  y = y,
                  col = rgb(layer.1/255,
                            layer.2/255,
                            layer.3/255))) +
  geom_polygon(data = fortify(Ghana_16_region_df),
               aes(x=long, y=lat, group = group),
               fill = "grey", col = "black") +
  scale_color_identity() +
  geom_polygon(data = minraster_shaped_fort,
               aes(x = long,
                   y = lat,
                   group = group,
                   fill = var1.pred,
                   alpha = var1.pred),
               size = 0) +
  scale_fill_gradient(high = "darkred",
                       low = "white",
                       name = "Driving Time",
                       limits = c(0, 120),
                       breaks = c(0, 60, 120),
                       labels = c("0","1h", "+2h"),
                       guide = guide_colorbar(
                         direction = "horizontal",
                         barheight = unit(10, units = "mm"),
                         barwidth = unit(5, units = "inch"),
                         draw.ulim = T,
                         title.position = 'top',
                         title.hjust = 0.5,
                         label.hjust = 0.5,
                         legend.location = "bottom"))+
  scale_alpha_continuous(guide = "none",
                         range = c(0.5, .9))+
  geom_point(data = dat_complete_df, aes(x=Longitude   , y=Latitude ),
             col = "white",
             size = 2,
             shape = 16,
             alpha= .8)  +
  labs(title  = "Driving Time to Nearest Hospital",
       caption = "Data by OpenStreetMap • Map by Stamenmap • Driving times by OSRM • Visualization by Laurent Smeets") +
  coord_map()+
  theme_minimal()+
  theme(
    # Change plot and panel background
    legend.position = "bottom",
    plot.background=element_rect(fill = "black"),
    panel.background = element_rect(fill = 'black'),
    plot.title = element_markdown(family = "Cinzel",
                                  size = 70,
                                  face = "bold",
                                  hjust = 0.5,
                                  color = "white",
                                  margin = margin(t = 20, b = 2)),
    legend.title = element_markdown(family = "Cinzel",
                                    color = "white",
                                    size = 50
                                    ),
    legend.text = element_text(size = 50,
                               color = "white"),
    plot.caption = element_markdown(size = 25,
                                color = "white",
                                family = "Cinzel",
                                hjust = 0.5))


#Do not save this is a .PDF, the raster file as a polygon, is HUGE in a vector file format
ggsave("static_map_high_res.png", static_map, height = 12, width = 8, dpi = 300)

This is how the plot looks:

Notes on Map

  • Please do not use this data in any form as medical (travel) advice. I have not verified the operation status of any of these hospital, nor am I familiar with the quality of medical service at any of these hospitals.
  • The big red cloud of far travel times in the North East of the country correspond with the location of Mole National Park.
  • In general, you can, as expected, see the furthest travel times outside of big cities and between major highways.
  • This information does not take traffic into consideration. Anyone who has ever been in Accra can tell you that even within Accra you can travel 2 hours to the nearest hospital on a Friday afternoon.
Ghana Data Stuff
Ghana Data Stuff

This website is my hobby project to showcase some of the project I am working one, when I am not working on the official statistics at the Ghana Statistical Service.

Related