While the data is incomplete, it is clear that tornadoes have an interesting history in Canada. As I came across what were obviously some of the most devastating tornadoes, like the big one in Edmonton, it was interesting to research old news articles about the events. The data itself is analytical, but there are many human stories behind it. One can only try to put oneself in the shoes of the people who faced these events at the time.
I never gave tornadoes much thought because they are such rare events, but they are always a threat and can change lives forever in an instant.
The data provided by the OpenData portal is seriously lacking. It appears a team of researchers has taken on the task of better understanding tornado activity in Canada.
The following link from the Weather Network provides a brief synopsis of the work happening.
%matplotlib inline
import pandas as pd
import folium
import matplotlib.style as style
style.use('seaborn-poster') #sets the size of the charts
style.use('ggplot')
from IPython.display import HTML
css = open('../css/style-table.css').read() + open('../css/style-notebook.css').read()
HTML('<style>{}</style>'.format(css))
df = pd.read_csv('../data/tornadoes.csv', encoding = "ISO-8859-1")
df.head()
df.info()
columns_of_interest = ['YYYY_LOCAL', 'MM_LOCAL', 'DD_LOCAL', 'HHMM_LOCAL',
'NEAR_CMMTY','PROVINCE', 'FUJITA', 'START_LAT_', 'START_LON_', 'END_LAT_N',
'END_LON_W', 'LENGTH_M', 'MOTION_DEG', 'WIDTH_MAX_', 'HUMAN_FATA',
'HUMAN_INJ', 'ANIMAL_FAT', 'ANIMAL_INJ', 'DMG_THOUS']
df = df[columns_of_interest].copy()
df.head(1)
df.columns=['year', 'month', 'day', 'hhmm', 'community',
'province', 'ef','start_lat_n', 'start_lon_w', 'end_lat_n',
'end_lon_w', 'length', 'motion_deg', 'width_max',
'human_deaths', 'human_injuries', 'animal_deaths', 'animal_injuries',
'house_damage']
df.head(2)
# Over 80% of readings don't provide end lon/lat
df.end_lon_w.value_counts().head()
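The share of missing end coordinates can also be computed directly rather than read off `value_counts()`. The following is a minimal sketch on made-up rows; it assumes that `-999` is the "not recorded" sentinel, and the helper name `missing_share` is hypothetical.

```python
import pandas as pd

# Made-up sample rows for illustration only (not taken from the dataset).
sample = pd.DataFrame({
    "end_lat_n": [-999.0, 43.1, -999.0, -999.0, 51.2],
    "end_lon_w": [-999.0, -80.3, -999.0, -999.0, -114.0],
})

MISSING = -999  # assumption: sentinel used for "not recorded"


def missing_share(frame, column, sentinel=MISSING):
    """Return the fraction of rows where `column` is the sentinel or NaN."""
    values = frame[column]
    return float(((values == sentinel) | values.isna()).mean())


share = missing_share(sample, "end_lon_w")
print(f"{share:.0%} of the sample rows have no end longitude")
```

Calling `missing_share(df, 'end_lon_w')` on the real frame would give the figure behind the "over 80%" comment, provided the sentinel assumption holds.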
# Number of tornadoes recorded in Canada from 1980 to 2009
len(df)
df.dtypes
df['year'] = df.year.astype(int)
df['month'] = df.month.astype(int)
df['day'] = df.day.astype(int)
df['hhmm'] = df.hhmm.astype(int)
df.head(2)
df['date'] = pd.to_datetime(df[['year', 'month', 'day']], errors='coerce')
df.head()
df.dtypes
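Because `errors='coerce'` silently turns invalid dates into `NaT`, it can be worth counting how many rows were affected. This is a small sketch on made-up values, not the notebook's data; the invalid day is deliberate.

```python
import pandas as pd

# Made-up rows; the last one has a deliberately invalid day.
sample = pd.DataFrame({
    "year": [2000, 2001, 2002],
    "month": [1, 6, 8],
    "day": [15, 30, 32],
})

# Same call pattern as the notebook: assemble a date from year/month/day.
sample["date"] = pd.to_datetime(sample[["year", "month", "day"]], errors="coerce")

# Rows whose date could not be built end up as NaT.
n_bad = int(sample["date"].isna().sum())
print(f"{n_bad} row(s) could not be converted to a date")
```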
df['hour'] = df['hhmm'].apply(lambda x: str(x)[:-2])
df.head(2)
df.dtypes
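Slicing the string form of `hhmm` returns an empty string for any time before 01:00 (for example `str(45)[:-2]` is `''`), and a negative sentinel would produce a meaningless hour. As an alternative, here is a sketch that uses integer division instead. The helper name `hhmm_to_hour` is hypothetical, and treating values outside 0 to 2359 as missing is an assumption about the data.

```python
import pandas as pd


def hhmm_to_hour(hhmm):
    """Convert HHMM integers such as 1530 to their hour (15).

    Values outside 0..2359 (for example a -999 sentinel) become <NA>.
    """
    hhmm = pd.to_numeric(hhmm, errors="coerce")
    valid = hhmm.between(0, 2359)
    # Integer division drops the minutes; invalid entries are masked out.
    return (hhmm // 100).where(valid).astype("Int64")


# Made-up sample values for illustration.
sample_times = pd.Series([1530, 45, 0, -999, 2359])
hours = hhmm_to_hour(sample_times)
print(hours)
```

Keeping the hour as a nullable integer also means a bar chart of `value_counts().sort_index()` would be ordered by time of day rather than by frequency.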
# Most active years for tornadoes
df.year.value_counts().head(10)
# Graph the most active years
df.year.value_counts().head(10).plot(kind='bar')
# Average number of tornadoes per year
df.year.value_counts().mean()
# The sample size is small, but a roughly normal distribution appears to be taking shape. The following
# graph shows the distribution of tornadoes per year.
df.year.value_counts().plot(kind='hist', bins=5)
# Most active hours of the day
df.hour.value_counts().plot(kind='bar')
The data looks off. It is unlikely that midnight sees the highest frequency. More likely, records with no hour were still assigned hour 0, judging by the counts later in the night and in the early morning.
Late afternoon to early evening (between 3pm and 6pm) appears to be the most likely time for a tornado.
# Top 10 communities with the highest tornado activity
df.community.value_counts().head(10).plot(kind='barh')
Cities in Western Canada see the highest tornado activity (the top 4).
# Graph activity frequency by Province
df.province.value_counts().head(10).plot(kind='bar')
# Looking at Canada's largest province by population
df_on = df[df.province == 'ON']
df_on.head()
# Top 10 communities in Ontario for tornado activity
df_on.community.value_counts().head(10).plot(kind='barh')
# Looking at Cochrane specifically as it was mentioned in the national top 10 earlier.
df[df.community == 'Cochrane']
The top 10 cities included Cochrane. I assumed Cochrane, Ontario had 5 tornadoes, but it did not show up in the Ontario top 10. It turns out there is also a Cochrane in Alberta.
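One way to avoid this kind of mix-up is to count on the community and province together, so that same-named places stay separate. A minimal sketch on made-up rows (the counts are not from the dataset):

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "community": ["Cochrane", "Cochrane", "Cochrane", "Barrie"],
    "province": ["AB", "AB", "ON", "ON"],
})

# Grouping on the (community, province) pair keeps same-named places apart.
counts = (
    sample.groupby(["community", "province"])
    .size()
    .sort_values(ascending=False)
)
print(counts.head(10))
```

On the real frame, `df.groupby(['community', 'province']).size()` would follow the same pattern.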
len(df.date.unique())
# Most active days for tornadoes
df.date.value_counts().head(10).plot(kind='barh')
# Which communities had tornadoes on the most active day
df20090820 = df[df.date == '2009-08-20']
# Ontario communities that got hit on 2009-08-20
df20090820
# Ontario communities that got hit on 2006-08-02
df20060802 = df[df.date == '2006-08-02']
df20060802
# Get a list of tuples for geo coordinates to be used in plotting a tornado map
def locations(lat, lon, community, province, ef, year, deaths):
    startlat = [la for la in lat]
    startlon = [lo for lo in lon]
    community_ = [c for c in community]
    province_ = [p for p in province]
    cat = [e for e in ef]
    year_ = [y for y in year]
    death_ = [d for d in deaths]
    return list(zip(startlat, startlon, community_, province_, cat, year_, death_))
coordinates = locations(df.start_lat_n, df.start_lon_w, df.community, df.province, df.ef, df.year, df.human_deaths)
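def tornado_map(coordinates, mapname):
    #canada_map = folium.Map(location=[49.641438,-97.389353], zoom_start=4)
    # Centre the map on Canada; the local name avoids shadowing the function itself
    canada_map = folium.Map(location=[56.0, -96.0], zoom_start=4)
    # Each tuple comes from locations(): lat, lon, community, province, ef, year, deaths
    for lat, lon, community, province, ef, year, deaths in coordinates:
        # An f-string also copes with non-string values such as a missing community
        popup = f'{community}, {province}, {year}\n{deaths} Deaths\nF{ef}'
        folium.CircleMarker(location=[lat, lon], popup=popup, radius=3).add_to(canada_map)
    canada_map.save(f'{mapname}.html')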
tornado_map(coordinates, 'map')
We see that the bulk of recorded tornado activity is centred in Alberta, Saskatchewan, Manitoba, Southern Ontario and Southern Quebec.
Populated coastal areas see less tornado activity. Tornadoes probably also hit remote areas, but those that miss population centres may go unrecorded.
from IPython.display import IFrame
IFrame('map.html', width=960, height=530)
df_1991 = df[df['year'] == 1991]
df_1991.head()
# The most active year was 1991 and recorded 94 tornadoes
len(df_1991)
# Tornadoes by Province
df_1991['province'].value_counts().plot(kind='bar')
coordinates91 = locations(df_1991.start_lat_n, df_1991.start_lon_w, df_1991.community, df_1991.province, df_1991.ef, df_1991.year, df_1991.human_deaths)
tornado_map(coordinates91, 'map1991')
IFrame('map1991.html', width=960, height=530)
# All tornadoes recorded on August 20, 2009
df20090820
# We can see that it was very windy in Southern Ontario right up to as far north as North Bay
d = df20090820
coords_on_09 = locations(d.start_lat_n, d.start_lon_w, d.community, d.province, d.ef, d.year, d.human_deaths)
tornado_map(coords_on_09, 'on2009')
IFrame('on2009.html', width=960, height=530)
The data provides information on human and animal deaths and injuries as well as damage to housing.
# Assuming -999 means that no information was recorded or that there were no deaths
df['human_deaths'].unique()
df.dtypes
# Replace the -999 value with 0, i.e. assume 0 deaths.
# Note: this replaces -999 in every column, not only the casualty counts.
df.replace(-999, 0, inplace=True)
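Since `df.replace(-999, 0, inplace=True)` touches every column, a missing end coordinate would also become 0. A more targeted variant is sketched below on made-up rows. It assumes that `-999` means "not recorded", and the choice of which columns count as casualty columns is hypothetical.

```python
import pandas as pd

# Made-up sample rows for illustration only.
sample = pd.DataFrame({
    "human_deaths": [-999, 0, 2],
    "human_injuries": [-999, 3, 5],
    "end_lat_n": [-999.0, 43.1, -999.0],
})

count_columns = ["human_deaths", "human_injuries"]  # hypothetical selection

cleaned = sample.copy()
# Casualty counts: treat "not recorded" as zero, as the notebook does.
cleaned[count_columns] = cleaned[count_columns].replace(-999, 0)
# Coordinates: keep missing values missing instead of turning them into 0.
cleaned["end_lat_n"] = cleaned["end_lat_n"].replace(-999, float("nan"))
print(cleaned)
```

Working on a copy leaves the original frame untouched, which makes it easier to compare totals before and after the cleanup.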
df['human_injuries'].unique()
df.head()
len(df.date.unique())
df['human_deaths'].unique()
df.head()
print('Human Deaths = {}'.format(df['human_deaths'].sum()))
print('Human Injuries = {}'.format(df['human_injuries'].sum()))
print('Animal Deaths = {}'.format(df['animal_deaths'].sum()))
print('Animal Injuries = {}'.format(df['animal_injuries'].sum()))
# All Tornadoes with deaths recorded between 1980 and 2009
c = df
c = c[c['human_deaths'] > 0].sort_values('human_deaths', ascending=False)
c
c[c['province'] == 'AB'].sort_values('human_deaths', ascending=False)
# The worst tornadoes for death, injuries
c.loc[c['human_deaths'] > 0, ['date', 'community', 'province', 'human_deaths', 'ef']]
# graph it
c.loc[c['human_deaths'] > 0, ['community','human_deaths']].set_index('community').plot(kind='barh')
I found the following YouTube video, a documentary-style account of this historic event.
IFrame('https://www.youtube.com/embed/L-98xXnyWbQ', width=960, height=530)
coords_death = locations(c.start_lat_n, c.start_lon_w, c.community, c.province, c.ef, c.year, c.human_deaths)
tornado_map(coords_death, 'deaths')
IFrame('deaths.html', width=960, height=530)
# Total of all deaths from 1980 to 2009
c['human_deaths'].sum()
# Total number of injuries suffered from 1980 to 2009
i = df
i['human_injuries'].sum()
# Average number of people injured per year in Canada from 1980 to 2009
round(i['human_injuries'].sum() / len(i.year.unique()), 1)
# The most injuries recorded for a single tornado
i['human_injuries'].max()
# Some tornadoes had no injuries recorded
i['human_injuries'].min()
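The `max()` and `min()` calls above operate on individual tornado records. For genuinely per-year figures, the injuries can first be summed by year. A minimal sketch on made-up numbers:

```python
import pandas as pd

# Made-up records for illustration only.
sample = pd.DataFrame({
    "year": [2001, 2001, 2002, 2003],
    "human_injuries": [4, 1, 0, 7],
})

# Total injuries per calendar year.
per_year = sample.groupby("year")["human_injuries"].sum()

worst_year = per_year.idxmax()                         # year with the most injuries
quiet_years = per_year[per_year == 0].index.tolist()   # years with no injuries
average = round(per_year.mean(), 1)                    # mean injuries per year
print(per_year)
```

Note that years with no tornado records at all would be absent from `per_year`, so the mean is taken over recorded years only.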
# The tornadoes with the most injuries in the last 30 years
# All tornadoes with injuries recorded between 1980 and 2009
ij = i[i['human_injuries'] > 0].sort_values('human_injuries', ascending=False)
ij
# Number of tornadoes that recorded human injuries
len(ij)
# The worst tornadoes for injuries
ij.loc[ij['human_injuries'] > 0, ['date', 'community', 'province', 'human_injuries']]
# Cities that have suffered the most injuries due to tornadoes
ij.loc[ij['human_injuries'] > 0, ['community',
'human_injuries', 'human_deaths']].set_index('community').head(10).plot(kind='barh')
An F1 tornado ripped through Paris, Ontario on May 20, 1996 and caused significant animal deaths. Google searches point to the tornado ripping through barns in the area, but further searches came up empty. Apparently many residents took video of the tornado as it passed through, but I could not find any posted.
The record does not show any human deaths or injuries for that particular tornado, but the financial fallout for farmers can be extensive.
"This tornado was caught on video as it tracked for 1.5km on the north side of Paris at about 7:00pm. Several barns were torn apart, trees were snapped and uprooted and houses damaged. On one property, a number of antique cars were battered by debris and destroyed, and damage there was estimated at $500,000."
# All tornadoes with animal deaths recorded between 1980 and 2009
a = df
a = a[a['animal_deaths'] > 0].sort_values('animal_deaths', ascending=False)
a.head(1)
a.loc[a['animal_deaths'] > 0, ['community', 'province', 'date', 'hour',
'animal_deaths', 'house_damage', 'ef' ]].set_index('community').head(10)#.plot(kind='barh')
a.loc[a['animal_injuries'] > 0, ['community', 'province', 'date',
'animal_injuries' ]].set_index('community')