Plotting Street Name Suffixes in R

Introduction

In this document I will show how I created this nice map depicting the different suffixes of all the streets in Accra, Ghana. This requires loading data from OpenStreetMap, cleaning these data, and some advanced plotting in ggplot2. I will also make use of the new ggtext package to create some beautiful and colourful plot titles.

This code took heavy inspiration, and some direct code, from these sources; all credit goes to the original authors. Please have a look at what they did and at some of the great tutorials they wrote:

Packages

These are the packages used for this project:

library(sf)
library(foreign)
library(tidyverse)
library(osmdata)
library(ggtext)
library(showtext)
library(stringr)
# font_add_google("Oswald", "Oswald") # this is the font I will be using

Download raw data

I am using the osmdata package to download the data directly from OSM into R, but for this project you could also download the data to your machine first and then pull it into R. You can find the relevant file (.shp.zip) for the whole of Ghana here.

loc <- "Accra Ghana" # This is the city I want to create a map for

bounding_box <- getbb(loc) # this function looks for the square bounding box for this city

# I like to add a bit of margins to all sides of the bounding box
bounding_box_wide <- bounding_box
bounding_box_wide["x", "min"] <- bounding_box["x", "min"] - 0.1
bounding_box_wide["y", "min"] <- bounding_box["y", "min"] - 0.1
bounding_box_wide["x", "max"] <- bounding_box["x", "max"] + 0.1
bounding_box_wide["y", "max"] <- bounding_box["y", "max"] + 0.1

This is what the bounding box now looks like:

bounding_box_wide
##          min       max
## x -0.4657437 0.0542563
## y  5.3000141 5.8200141
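
The four assignments above can also be folded into a small helper. This is only a sketch: `pad_bbox()` is a hypothetical name, not an osmdata function, and the matrix is hard-coded (the printed values above with the 0.1 margin taken off again) so that the example runs without querying OSM.

```r
# Hypothetical helper: widen a 2x2 bounding-box matrix (rows "x"/"y",
# columns "min"/"max", the layout shown above) by a margin in degrees.
pad_bbox <- function(bb, margin = 0.1) {
  bb[, "min"] <- bb[, "min"] - margin
  bb[, "max"] <- bb[, "max"] + margin
  bb
}

# Stand-in for the result of getbb("Accra Ghana")
bb <- matrix(c(-0.3657437, 5.4000141, -0.0457437, 5.7200141),
             nrow = 2,
             dimnames = list(c("x", "y"), c("min", "max")))

bb_wide <- pad_bbox(bb, margin = 0.1)
bb_wide
```

A single `margin` argument keeps the padding symmetric; if you want a different margin per side, the four explicit assignments are the simpler route.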

With this bounding box we can now download the data from OSM using the query syntax of the osmdata package:

roads <- bounding_box_wide %>%
  opq()%>%
  add_osm_feature(key = "highway") %>%
  osmdata_sf()

# Since we are plotting roads, we are only interested in the OSM lines.
# We can omit things like OSM points and polygons.
roads <- roads$osm_lines

# remove unnamed footpaths
roads <- roads[!(roads$highway  == "footway" & is.na(roads$name)),]

We can simply plot these roads like this:

ggplot() +
  geom_sf(data = roads,
          inherit.aes = FALSE,
          color = "black",
          size = .05,
          alpha = .5)

Subset Data

I personally think it is a nice visual effect to plot a circular city with a well-defined central point. Of course, no city’s boundaries are actually round, and neither are those of Accra; this is purely an aesthetic choice. I centred the map on Independence Square and the Black Star Gate: I looked up the coordinates of the square, drew a circle with a 12 km radius around it, and kept only the roads (or road segments) that fall within that circle. To draw this circle I had to look up a suitable spatial reference system here: https://spatialreference.org/ref/. If you are using this code for a city that is not in West Africa, you probably need to use a different reference system.

# subset the roads into a circle.
mid_point <- data.frame(lat = 5.548790, long = -0.192898) %>%  # this is the location of Black Star Gate
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>% # declare the coordinates as WGS84 longitude/latitude
  # project to an Albers equal-area conic projection so that distances are in metres
  st_transform("+proj=aea +lat_1=20 +lat_2=-23 +lat_0=0 +lon_0=25 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs ")

circle <- st_buffer(mid_point, dist = 12000) # this dist is distance in meters.
circle <- circle %>% st_transform(st_crs(roads))
roads <- st_intersection(circle, roads)
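
If you want to sanity-check the buffering step without downloading anything, the sketch below repeats it on the midpoint alone. It is not the code used for the map: as an assumption it projects to UTM zone 30N (EPSG:32630), one metre-based CRS that covers Accra, instead of the projection string above, and the two test points are made up.

```r
library(sf)

# Black Star Gate, declared as WGS84 longitude/latitude
centre_ll  <- st_sfc(st_point(c(-0.192898, 5.548790)), crs = 4326)

# Assumption: UTM zone 30N as the metre-based CRS for the buffer
centre_utm <- st_transform(centre_ll, 32630)
circle_utm <- st_buffer(centre_utm, dist = 12000)  # 12 km radius

# Area of the buffer in square km; should be close to pi * 12^2
area_km2 <- as.numeric(st_area(circle_utm)) / 1e6

# Two made-up points, 5 km and 20 km east of the centre
xy  <- unname(st_coordinates(centre_utm)[1, ])
pts <- st_sfc(st_point(c(xy[1] + 5000,  xy[2])),
              st_point(c(xy[1] + 20000, xy[2])),
              crs = 32630)
inside <- lengths(st_intersects(pts, circle_utm)) > 0

area_km2
inside
```

The buffer is a polygon approximating a circle, so its area comes out slightly below the exact value. Only the nearer of the two points should be flagged as inside.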

Now the map looks like this (with a red dot for the Black Star Gate):

ggplot() +
  geom_sf(data = roads,
          inherit.aes = FALSE,
          color = "black",
          size = .1,
          alpha = .5) +
  geom_sf(data = mid_point,
          inherit.aes = FALSE,
          color = "red",
          size = 5,
          alpha = .7)

Get Street Names

With some simple string manipulations, using the stringr package, we can see what the common street suffixes in Accra are (only within this 24 km diameter). First, let's see what the data look like by showing the first 20 road names.

roads$name[1:20]
##  [1] "Nortei Ababio Road"    NA                      NA                     
##  [4] "Volta Road"            "South Liberation Link" NA                     
##  [7] NA                      NA                      "4th Circular Road"    
## [10] "John Churcher Loop"    NA                      NA                     
## [13] "Patrice Lumumba Road"  NA                      NA                     
## [16] "Jasmine Road"          "Prestige Link"         "2nd Circular Road"    
## [19] "Kakramadu Road"        "Violet Road"

And this is how we can extract the suffixes of these roads:

word(roads$name,-1)[1:20]
##  [1] "Road" NA     NA     "Road" "Link" NA     NA     NA     "Road" "Loop"
## [11] NA     NA     "Road" NA     NA     "Road" "Link" "Road" "Road" "Road"

Some content knowledge is needed to write code like this. For example, this code assumes that the street suffix comes after the main name of the street. So in France, where road names might start with Rue instead of ending with it, this would not work. Similarly, in places like the Netherlands, where the suffix is attached to the name (e.g. Dorpstraat), this would not work.
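
To make this concrete, here is a small sketch of how the extraction would have to change in those cases. The street names and the list of Dutch endings are illustrative only; none of this is used for the Accra map.

```r
library(stringr)

# Suffix after the name (the pattern used for Accra): take the last word
suffix_last <- word(c("Volta Road", "Prestige Link"), -1)

# Generic term before the name, as in many French street names: take the first word
prefix_first <- word(c("Rue de Rivoli", "Avenue Montaigne"), 1)

# Suffix fused to the name, as in Dutch: match a known ending with a regex
# (the three endings here are an illustrative, incomplete list)
suffix_fused <- str_extract(str_to_lower(c("Dorpstraat", "Stationsweg", "Kerklaan")),
                            "(straat|weg|laan)$")

suffix_last
prefix_first
suffix_fused
```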

If we tabulate the suffixes, these are the results we get:

table(stringr::word(roads$name,-1), exclude=NULL)
##
##                        10           18  Addo)street Agbogbloshie        Alley
##           27            5            1            1           10            4
##       Avenue      Avernue       Bonnie    Boulevard       Bridge     Broadway
##          555           12            1            5            1            2
##     Bulevard         Bush      Central        Circ.       Circle        close
##            2            1           78            1           51            1
##        Close        CLose        Court     crescent     Crescent      Cresent
##          301            1            1            5          117            1
##        Drive         East          Esi    Extension         Gate        Guava
##           62           15            2            9            3            1
##      Hermosa      Highway  Hypermarket  Interchange     Junction         Kubi
##            1          149            4           23            1            1
##       Lagoon         lane         Lane         LAne         link         Link
##            3            3          187            1            1          219
##        Linlk         Linx         Loop       Mawena       Mawuli       Mirada
##            1            1           45            2            1            1
##       Mosque     Motorway        Ofram      Ollennu         Oyeo         path
##            1           82            1            2           11            1
##         Path     proposed         Raod        Ridge         road         Road
##            2            1            9            3           20         1172
##          Rod   Roundabout     Shortcut      Sithole          Sol        South
##            1           60            1            4            1            1
##        Spiga           St       Streer       street       Street        Stret
##            1            1            1            7         1396            1
##       Tunnel        Vista        Volta         Walk          Way         West
##            1            4            2            2           13           14
##         <NA>
##        10905

Data Cleaning

We see that there are quite a few missing values (meaning that the street is not named) and quite a few misspellings. Surely “Avernue” should just be “Avenue”. I need to clean these data before I can plot them. These are the steps I took:

  1. Fix certain common misspellings
  2. For roads that have a cardinal direction as suffix (Ring Road East, Ring Road West, etc.), do not look at the last word of the street name but at the second-to-last word
  3. If the street name is missing, simply call it “Unnamed”
  4. At this stage I am going to combine “Highways” and “Motorways” into one category instead of two categories

# Lookup table: raw suffix as found in OSM (org) and the cleaned suffix (target).
# Entries must match the raw suffix exactly, so no stray spaces inside the strings.
misspellings <- tribble(
  ~org,          ~target,
  "10",          "Unnamed",
  "1",           "Unnamed",
  "18",          "Unnamed",
  "avenue",      "Avenue",
  "Avernue",     "Avenue",
  "Bulevard",    "Boulevard",
  "Circ.",       "Circle",
  "close",       "Close",
  "CLose",       "Close",
  "Cresent",     "Crescent",
  "crescent",    "Crescent",
  "lane",        "Lane",
  "LAne",        "Lane",
  "Linlk",       "Link",
  "Lk",          "Link",
  "LK",          "Link",
  "Linx",        "Link",
  "Raod",        "Road",
  "road",        "Road",
  "Rod",         "Road",
  "Rd",          "Road",
  "Rd.",         "Road",
  "Roaf",        "Road",
  "LP",          "Loop",
  "Addo)street", "Street",
  "St",          "Street",
  "Streer",      "Street",
  "street",      "Street",
  "Stret",       "Street")


roads_cleaned <- roads %>%
  mutate(Suffix = ifelse(is.na(name), "Unnamed",
                         ifelse(tolower(stringr::word(name,-1)) %in% c("north", "west", "east", "south", "central"),
                                stringr::word(name,-2),
                                stringr::word(name,-1)))) %>%
  left_join(misspellings, by = c("Suffix" = "org")) %>%
  mutate(Suffix = ifelse(is.na(target), Suffix, target),
         Suffix = ifelse(Suffix %in% c("Motorway", "Highway"), "Motorway/Highway", Suffix)) %>%
  select(osm_id, highway, name,  Suffix, geometry)
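
To check the logic of these steps, it helps to run them on a handful of names where the expected answer is obvious. The sketch below does that on a made-up tibble with a one-row lookup table. It uses `case_when()` and `coalesce()` instead of nested `ifelse()` calls, which is a stylistic alternative and not the code used for the map.

```r
library(dplyr)
library(stringr)
library(tibble)

# Made-up street names covering each cleaning step
toy <- tibble(name = c("Ring Road East", "Volta Raod", NA,
                       "Tema Motorway", "Jasmine Road"))

# Minimal lookup table for this example only
fixes <- tribble(
  ~org,   ~target,
  "Raod", "Road"
)

toy_cleaned <- toy %>%
  mutate(last_word = word(name, -1),
         Suffix = case_when(
           is.na(name) ~ "Unnamed",                       # step 3
           str_to_lower(last_word) %in%
             c("north", "east", "south", "west", "central") ~ word(name, -2),  # step 2
           TRUE ~ last_word)) %>%
  left_join(fixes, by = c("Suffix" = "org")) %>%          # step 1
  mutate(Suffix = coalesce(target, Suffix),
         Suffix = if_else(Suffix %in% c("Motorway", "Highway"),
                          "Motorway/Highway", Suffix)) %>% # step 4
  select(name, Suffix)

toy_cleaned
```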

Common suffixes

Let's see what the most common road suffixes are in Accra.

top_road_names <- roads_cleaned %>%
  select(Suffix) %>%
  as_tibble() %>%
  count(Suffix) %>%
  arrange(desc(n)) %>%
  top_n(11, n)

top_road_names
## # A tibble: 11 x 2
##    Suffix               n
##    <chr>            <int>
##  1 Unnamed          10911
##  2 Street            1407
##  3 Road              1309
##  4 Avenue             567
##  5 Close              303
##  6 Motorway/Highway   231
##  7 Link               222
##  8 Lane               191
##  9 Crescent           122
## 10 Drive               62
## 11 Roundabout          60

Percentage of Roads with Suffix

There are two ways to calculate the prevalence of road suffixes: by the number of roads with a given suffix, or by the kilometres of road with a given suffix. The first option is not ideal, because OSM splits a road into many segments and there may be some bias in how this happens. Maybe straight highways are split into fewer segments than unnamed, curvy backroads, or vice versa. So it is better to base the calculation on the total length in km of the roads with a certain suffix. To calculate the lengths of the roads we can use the st_length() function. I also did some further data cleaning and renamed all suffixes that are not among the 11 most common ones to “Other”.

roads_cleaned <- roads_cleaned %>%
  mutate(length_road = st_length(geometry)) %>%
  mutate(Suffix = ifelse(Suffix %in% c(top_road_names$Suffix), Suffix, "Other"))

Now I can summarise these results.

summary_stats <- roads_cleaned %>%
  select(-geometry) %>%
  as_tibble() %>%
  group_by(Suffix) %>%
  summarise(km = as.numeric(sum(length_road)/1000),
            n = n(),
            mean_km = km/n,
            .groups = "drop") %>%
  mutate(percent_road_network = km/sum(km),
         percent_roads = n/sum(n))

This is how this looks in a bar chart:

summary_stats %>%
  ggplot() +
  geom_bar(aes(x = reorder(Suffix,percent_road_network) ,
               y = percent_road_network), stat = "identity") +
  scale_y_continuous(labels = scales::percent_format()) +
  coord_flip()

Around 63.3% of the road network (measured by length) is still unnamed in Accra, Ghana (well, at least on OSM).
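
To see why counting segments and summing lengths can give different answers, here is a toy example with invented geometry: one “Street” split into three 100 m segments and one “Road” stored as a single 700 m segment. The coordinates and the use of EPSG:32630 (UTM zone 30N, so lengths are in metres) are assumptions made for the illustration.

```r
library(sf)
library(dplyr)

# Helper for this example only: a straight east-west segment from x0 to x1 (metres)
seg <- function(x0, x1) st_linestring(rbind(c(x0, 0), c(x1, 0)))

toy <- st_sf(
  Suffix   = c("Street", "Street", "Street", "Road"),
  geometry = st_sfc(seg(0, 100), seg(100, 200), seg(200, 300), seg(300, 1000),
                    crs = 32630)
)

toy_stats <- toy %>%
  mutate(length_m = as.numeric(st_length(geometry))) %>%
  st_drop_geometry() %>%
  group_by(Suffix) %>%
  summarise(n = n(), m = sum(length_m), .groups = "drop") %>%
  mutate(share_by_count  = n / sum(n),
         share_by_length = m / sum(m))

toy_stats
```

By count, “Street” looks three times as common as “Road”; by length, “Road” makes up most of this tiny network.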

Extra Data Manipulations

Some final data manipulations are in order to make this map look like a proper map. For example, I downloaded the railways and bodies of water (rivers, streams and canals) from OSM to make the map extra ‘map-like’.

# Download rivers
rivers <- st_bbox(roads_cleaned) %>%
  opq()%>%
  add_osm_feature(key = "waterway", value = c("river", "canal", "stream")) %>%
  osmdata_sf()

rivers <- rivers$osm_lines
rivers <- st_transform(rivers,st_crs(roads_cleaned))
rivers <- st_intersection(circle, rivers)

# Download railways
railways <- st_bbox(roads_cleaned)%>%
  opq()%>%
  add_osm_feature(key = "railway", value="rail") %>%
  osmdata_sf()

railways <- railways$osm_lines
railways <- st_transform(railways,st_crs(roads_cleaned))
railways <- st_intersection(circle, railways)

I also want to specify the thickness of different roads, so that, for example, motorways are plotted thicker than normal residential roads:

roads_cleaned <- roads_cleaned %>%
  mutate(Suffix = ifelse(Suffix %in% c(top_road_names$Suffix), Suffix, "Other")) %>%
  mutate(road_type = ifelse(highway %in% c("motorway",
                                          "primary",
                                          "motorway_link",
                                          "primary_link"),
                            "fat",
                            ifelse(highway %in% c("secondary",
                                                 "tertiary",
                                                 "secondary_link",
                                                 "tertiary_link"),
                                   "medium",
                                   "thin")))

Now I add the percentage of road suffixes to the legend.

labels_colors  <- summary_stats %>%
  mutate(percent_road_network  = as.numeric(percent_road_network) *100) %>%
  mutate(label = paste0(Suffix, " (", round(percent_road_network, 1), "%)")) %>%
  select(label) %>%
  unlist(use.names = FALSE)

names(labels_colors) <- summary_stats$Suffix
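
As a quick illustration of what this named vector looks like, the sketch below builds the same kind of labels from a made-up three-row table (the shares are invented, not the Accra results) using `setNames()`.

```r
# Made-up shares, for illustration only
toy_stats <- data.frame(Suffix = c("Street", "Road", "Unnamed"),
                        percent_road_network = c(0.125, 0.25, 0.625))

# Names are the raw suffixes (matched against the data),
# values are the labels shown in the legend
toy_labels <- setNames(
  paste0(toy_stats$Suffix, " (",
         round(toy_stats$percent_road_network * 100, 1), "%)"),
  toy_stats$Suffix
)

toy_labels
```

Because the vector is named, a manual ggplot2 scale can match each label to its category by name rather than by position.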

I also assign different colours to the different types of streets and I manually set the order of the legend (mostly to guarantee that ‘Other’ and ‘Unnamed’ are the two last categories in the legend).

street_colours <- c("Unnamed" = "#c6c6c6",
                    "Street" = "#193264FF",
                    "Close" = "#B8A463",
                    "Avenue" = "#434C5E",
                    "Crescent" = "#F46D43",
                    "Drive"= "#651eac",
                    "Motorway/Highway" = "#E6F598",
                    "Lane" = "#CC6666",
                    "Link" = "#FDDBC7",
                    "Other" = "#B48EAD",
                    "Road" = "#9E0142",
                    "Roundabout" = "#66C2A5")

order <- c("Street",
           "Road",
           "Avenue",
           "Motorway/Highway",
           "Close",
           "Link",
           "Crescent",
           "Lane",
           "Drive",
           "Roundabout",
           "Other",
           "Unnamed")

# Some re-ordering is needed to make sure that the unnamed roads are
# plotted below the other layers

plot <- roads_cleaned %>%
  mutate(Suffix = factor(Suffix, levels = c("Unnamed",
                                            "Link",
                                            "Drive",
                                            "Other",
                                            "Avenue",
                                            "Close",
                                            "Street",
                                            "Lane",
                                            "Crescent",
                                            "Road",
                                            "Motorway/Highway",
                                            "Roundabout"))) %>%
  ggplot() +
  geom_sf(data = rivers,
          color = "steelblue",
          size = .9,
          alpha = .6) +
  geom_sf(data = railways,
          inherit.aes = FALSE,
          color = "black",
          size = 1,
          linetype = "dotdash",
          alpha = 1)+
  geom_sf(aes(colour = Suffix,
              size = road_type),
          alpha = 1,
          show.legend = "point") +
  scale_color_manual(values = street_colours,
                    labels = labels_colors,
                    breaks = order)+
  scale_size_manual(values = c(2, 1, .3)) +
  theme_void() +
  theme(  panel.background = element_rect(fill = "white",  #"grey92" would also look good
                                          color = "white"),
          plot.background = element_rect(fill = "white",
                                         color = "grey60",
                                         size = 7 ),
          plot.margin =  margin(20, 30, 20, 30),
          plot.title = element_markdown(family = "Oswald",
                                        size = 54,
                                        face = "bold",
                                        hjust = 0.5,
                                        margin = margin(t = 20, b = 2)),
          legend.position = "top",
          legend.box.spacing = unit(0.2, "cm"),
          legend.key = element_rect(fill = NA, color = NA),
          legend.key.size = unit(1, "lines"),
          legend.text = element_text(family = "Oswald",
                                     color = "grey60",
                                     size = 12,
                                     face = "bold"))  +
  guides(color = guide_legend(title.position = "top",
                              title.hjust = 0.5,
                              nrow = 2,
                              label.position = "right",
                              override.aes = list(size = 7.5))) +
  guides(size = "none") +
  # The colours of the Titles are based on the colour codes of the Ghanaian flag,
  # which can be found here: https://www.schemecolor.com/ghana-flag-colors.php
  labs(title = "<b style='color:#CE1126'>STREET </b><b style='color:#FCD116'>NAMES</b><b style='color:#000000'> OF</b><b style='color:#006B3F'> ACCRA</b>",
       fill  = NULL,
       col  = NULL) +
  annotate("text",
           x = mean(c(st_bbox(roads_cleaned$geometry)["xmin"],
                      st_bbox(roads_cleaned$geometry)["xmax"])), # to centre it
           y = 5.515,
           hjust = 0.5,
           vjust = 1,
           label = "    Visualization by Laurent Smeets  •   Data from OpenStreetMap", # some extra spaces to align the • in the centre
           family = "Oswald",                                                          # One day I will find a better solution
           size = 4.5,
           color = "grey60")
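
The coloured title string can also be assembled programmatically, which makes it harder to leave a tag unclosed. The sketch below is one way to do that in base R. It wraps each word in a `<span>` with an inline colour, which ggtext's `element_markdown()` is documented to render; the variable names are hypothetical and this is not how the title above was written.

```r
# One colour per title word (the Ghanaian flag colours used above)
title_words <- c("STREET" = "#CE1126",
                 "NAMES"  = "#FCD116",
                 "OF"     = "#000000",
                 "ACCRA"  = "#006B3F")

# Wrap each word in its own span and join the pieces with spaces
title_md <- paste0("<span style='color:", title_words, "'>",
                   names(title_words), "</span>",
                   collapse = " ")

title_md
```

The result could be passed to `labs(title = title_md)`, with the bold face still coming from `face = "bold"` in `element_markdown()`.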

Save the plot in high resolution

ggsave("plot_accra_streets2.png", plot, height = 11, width = 15, dpi = 1000)

This is how the final plot looks.

Ghana Data Stuff

This website is my hobby project to showcase some of the projects I am working on when I am not working on official statistics at the Ghana Statistical Service.