Covid in Ontario, Canada¶

The following analysis uses data published by the Government of Ontario.

https://data.ontario.ca/

At some point during the pandemic, I started feeling like news outlets were not reporting on the things I cared about. I care about numbers and actual data, not some news outlet's interpretation. Even worse is editorialized content that always puts a spin on the data to push an agenda. I don't care about any of that; I just want to know what is going on.

The best way to do this is to download the data yourself and analyse it. Even if you don't know programming, you could easily import this data into Excel and do something similar.

Since I am a Python hobbyist, this felt like a great use case for pandas, with Matplotlib and Seaborn for visualizations.

I did my best to interpret the data in an unbiased way. However, it's easy to make mistakes, so if you see something that doesn't make sense or that you don't agree with, please drop me an email. I would like to hear from you.

You can reach out to me at [email protected]

Feedback is always welcome.

Load the libraries¶

import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style("whitegrid")

Import the data¶

# Dataset #1 - Covid Cases in Ontario
df = pd.read_csv('../data/conposcovidloc.csv', index_col="Row_ID")
# The conposcovidloc.csv file is over 100 MB.
# If you prefer to download it directly from the source, use this instead:
# df = pd.read_csv('https://data.ontario.ca/dataset/f4112442-bdc8-45d2-be3c-12efae72fb27/resource/455fd63b-603d-4608-8216-7d8647f43350/download/conposcovidloc.csv', index_col="Row_ID")

# schema_df source: https://data.ontario.ca/dataset/f4112442-bdc8-45d2-be3c-12efae72fb27/resource/a2ea0536-1eae-4a17-aa04-e5a1ab89ca9a/download/conposcovidloc_data_dictionary.xlsx
# converted from xlsx to csv and available on linuxnorth.org
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
schema_df = pd.read_csv('https://www.linuxnorth.org/pandas/data/conposcovidloc_data_dictionary.csv', index_col="Variable Name", encoding="ISO-8859-1", on_bad_lines='skip')


# Dataset #2 - Covid Retransmission Rate in Ontario
dfre = pd.read_csv('https://data.ontario.ca/dataset/8da73272-8078-4cbd-ae35-1b5c60c57796/resource/1ffdf824-2712-4f64-b7fc-f8b2509f9204/download/re_estimates_on.csv')

# Dataset #3 - Vaccine data for Ontario
dfvaccine = pd.read_csv('https://data.ontario.ca/dataset/752ce2b7-c15a-4965-a3dc-397bf405e7cc/resource/8a89caa9-511c-4568-af89-7f2174b4378c/download/vaccine_doses.csv')

# Dataset #4 - Vaccine Status
dfvacstatus = pd.read_csv('https://data.ontario.ca/dataset/752ce2b7-c15a-4965-a3dc-397bf405e7cc/resource/eed63cf2-83dd-4598-b337-b288c0a89a16/download/vac_status.csv.csv')

Dataset 1: Analysing Covid in Ontario¶

# taking a peek
df.head(10)

# Dataframe size (rows, columns)
df.shape

(578048, 17)

# Looking at the schema provided
schema_df = schema_df[['Definition', 'Additional Notes']].sort_index()
schema_df

# How many missing values in each column
df.isna().sum()

Accurate_Episode_Date             0
Case_Reported_Date                0
Test_Reported_Date            12665
Specimen_Date                  2382
Age_Group                         0
Client_Gender                     0
Case_AcquisitionInfo              0
Outcome1                          0
Outbreak_Related             480431
Reporting_PHU_ID                  0
Reporting_PHU                     0
Reporting_PHU_Address             0
Reporting_PHU_City                0
Reporting_PHU_Postal_Code         0
Reporting_PHU_Website             0
Reporting_PHU_Latitude            0
Reporting_PHU_Longitude           0
dtype: int64
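Raw counts are easier to judge when shown as a share of all rows. Below is a minimal sketch of a helper that reports both. The name `missing_report` is hypothetical and not part of pandas, and the toy frame is illustrative rather than Ontario data.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    missing = frame.isna()
    report = pd.DataFrame({
        "missing": missing.sum(),
        "percent": (missing.mean() * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)


# Illustrative toy data reusing three of the dataset's column names
toy = pd.DataFrame({
    "Outbreak_Related": ["Yes", np.nan, np.nan, np.nan],
    "Specimen_Date": ["2021-01-03", np.nan, "2021-01-05", "2021-01-06"],
    "Age_Group": ["20s", "30s", "40s", "50s"],
})
print(missing_report(toy))
```

On the real data, `missing_report(df)` would list Outbreak_Related first: 480,431 of 578,048 rows (about 83%) are blank.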

# Looking only at columns of interest
columns_of_interest = ['Accurate_Episode_Date', 'Case_Reported_Date', 'Age_Group', 'Client_Gender', 'Case_AcquisitionInfo', 
                       'Outcome1', 'Outbreak_Related', 'Reporting_PHU_ID', 'Reporting_PHU']
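# .copy() makes an independent frame, so later column assignments don't raise SettingWithCopyWarning
df = df[columns_of_interest].copy()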
df.columns = ['adate','rdate', 'age', 'gender', 'source', 'outcome', 'outbreak', 'phuid', 'phu']

df.dtypes

adate       object
rdate       object
age         object
gender      object
source      object
outcome     object
outbreak    object
phuid        int64
phu         object
dtype: object

# Dates are stored as strings. Change them to pandas datetime
df['rdate']= pd.to_datetime(df['rdate'])
df['adate']= pd.to_datetime(df['adate'])

df.dtypes

adate       datetime64[ns]
rdate       datetime64[ns]
age                 object
gender              object
source              object
outcome             object
outbreak            object
phuid                int64
phu                 object
dtype: object

# Take another peek....that's better
df.tail()

# Total number of covid cases reported in Ontario all time.
len(df)

578048

# Change '<20' to '0-19'.  This will make age distribution charts easier to read later.
df['age'] = df['age'].replace(['<20'],'0-19')
df.head(2)
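Sorting age labels as plain strings only works by luck, and a hard-coded `order=` list silently drops any label it misses. One option is an ordered categorical dtype. Seaborn's countplot follows a categorical column's category order by default. This sketch assumes the labels are exactly those seen in this notebook: '0-19' after the rename above, plus '90+' and 'UNKNOWN' from the death counts further down. `order_ages` is a hypothetical helper name.

```python
import pandas as pd

# Assumed label set: '<20' already renamed to '0-19'; '90+' and 'UNKNOWN' as in the death counts
AGE_ORDER = ["0-19", "20s", "30s", "40s", "50s", "60s", "70s", "80s", "90+", "UNKNOWN"]
AGE_DTYPE = pd.CategoricalDtype(categories=AGE_ORDER, ordered=True)


def order_ages(ages: pd.Series) -> pd.Series:
    """Convert an age-group column to an ordered categorical.

    Labels not in AGE_ORDER become NaN, which makes unexpected values easy to spot.
    """
    return ages.astype(AGE_DTYPE)


sample = pd.Series(["90+", "0-19", "40s", "UNKNOWN", "20s"])
print(order_ages(sample).sort_values().tolist())
```

In the notebook this would be `df['age'] = order_ages(df['age'])`. After that, `sns.countplot(data=df, x='age')` lists the groups from youngest to oldest, with UNKNOWN last, without needing an `order=` argument.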

Case distribution by date since the beginning of the pandemic¶

We can see three distinct waves of Covid spread in Ontario. A smaller initial wave in March/April 2020 devastated the elderly. It was followed by two larger waves in January and April/May 2021, which were driven mostly by infections in younger people.

plt.figure(figsize=(14,6))
plt.title('Ontario Covid Waves - Daily Cases', fontsize=20)
sns.lineplot(data=df['rdate'].value_counts())
plt.ylabel('Cases', fontsize=15)
plt.xlabel('Date', fontsize=15)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
plt.show()
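Daily report counts are noisy, so a trailing 7-day average can make the waves easier to read. This is a minimal sketch, assuming a column of report dates like `rdate`; `daily_counts` is a hypothetical helper name. It also fills days with no reports with 0, which `value_counts()` on its own would simply leave out.

```python
import pandas as pd


def daily_counts(dates: pd.Series, window: int = 7) -> pd.DataFrame:
    """Count rows per calendar day and add a trailing rolling mean."""
    daily = (
        pd.to_datetime(dates)
        .dt.normalize()             # drop any time-of-day component
        .value_counts()
        .sort_index()               # value_counts() orders by frequency, not date
        .asfreq("D", fill_value=0)  # days with no reports become 0 instead of missing
    )
    return pd.DataFrame({
        "cases": daily,
        "rolling_mean": daily.rolling(window=window).mean(),
    })


toy_dates = pd.Series(["2021-01-01", "2021-01-01", "2021-01-03", "2021-01-04"])
print(daily_counts(toy_dates, window=2))
```

On the case data, `daily_counts(df['rdate'])['rolling_mean'].plot()` would draw the smoothed curve. The first `window - 1` days have no average and show as gaps.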

Gender breakdown of Covid Cases in Ontario¶

Cases are split almost evenly between males and females.

print(df['gender'].value_counts())
# Keep only the four recorded gender values (currently this covers every row)
gender_values = ['MALE', 'FEMALE', 'UNSPECIFIED', 'GENDER DIVERSE']
gdf = df[df['gender'].isin(gender_values)]
plt.figure(figsize=(10,6))
plt.title("Ontario - Covid Infections by Gender", fontsize=20)
sns.countplot(data=gdf, x='gender', order=gender_values)
plt.xlabel('Gender', fontsize=13)
plt.ylabel('Count', fontsize=13)
plt.show()

MALE              288058
FEMALE            285966
UNSPECIFIED         3990
GENDER DIVERSE        34
Name: gender, dtype: int64
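To put numbers on the split, `value_counts(normalize=True)` returns proportions instead of counts. So that this runs without the 100 MB file, the sketch rebuilds the counts printed above by hand. On the notebook's data, `df['gender'].value_counts(normalize=True) * 100` gives the same percentages.

```python
import pandas as pd

# Counts copied from the value_counts() output above
gender_counts = pd.Series(
    {"MALE": 288058, "FEMALE": 285966, "UNSPECIFIED": 3990, "GENDER DIVERSE": 34}
)
shares = (gender_counts / gender_counts.sum() * 100).round(1)
print(shares)
```

Males make up about 49.8% of cases and females about 49.5%. The remaining categories together are under 1%.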

Region specific covid cases¶

My hometown is Timmins, and I am originally from Sudbury. Let's compare Covid cases in the two communities. Timmins is covered by the Porcupine Health Unit.

The Porcupine Health Unit area had an explosion of cases in May, especially in the James Bay area.

You can easily compare other areas the same way.

df_tim = df[df.phu == "Porcupine Health Unit"]
df_sud = df[df.phu == "Sudbury & District Health Unit"]
df_wat = df[df.phu == "Region of Waterloo, Public Health"]

plt.figure(figsize=(14,6))
plt.title('Cases in Porcupine and Sudbury Health Unit Areas', fontsize=20)
plt.xlabel("")
plt.ylabel("Daily Cases", fontsize=15)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
sns.lineplot(data=df_tim['rdate'].value_counts(), label="Porcupine Health Unit")
sns.lineplot(data=df_sud['rdate'].value_counts(), label="Sudbury & District Health Unit")
#sns.lineplot(data=df_wat['rdate'].value_counts(), label="Region of Waterloo, Public Health")
plt.show()
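To compare more than two areas, a single date-by-PHU table may be tidier than one filtered DataFrame per region. This sketch uses `pd.crosstab` and assumes the renamed `rdate` and `phu` columns; `daily_cases_by_phu` is a hypothetical helper. Dates on which none of the selected PHUs reported a case will not appear as rows.

```python
import pandas as pd


def daily_cases_by_phu(frame: pd.DataFrame, phus: list[str]) -> pd.DataFrame:
    """Date x PHU table of daily case counts for the selected PHUs."""
    subset = frame[frame["phu"].isin(phus)]
    table = pd.crosstab(subset["rdate"], subset["phu"])
    # Keep the requested column order; a PHU with no cases gets a column of zeros
    return table.reindex(columns=phus, fill_value=0)


# Illustrative toy data with made-up area names
toy = pd.DataFrame({
    "rdate": ["2021-05-01", "2021-05-01", "2021-05-02", "2021-05-02"],
    "phu": ["Area A", "Area B", "Area A", "Area C"],
})
print(daily_cases_by_phu(toy, ["Area A", "Area B"]))
```

With the real data, `daily_cases_by_phu(df, ["Porcupine Health Unit", "Sudbury & District Health Unit", "Region of Waterloo, Public Health"]).plot(figsize=(14, 6))` would draw one line per area.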

Distribution of cases by age group¶

We can see that younger Ontarians account for a large share of infections.

plt.figure(figsize=(10,6))
plt.title("Ontario - Infections by Age Category", fontsize=18)
# The oldest group is labelled '90+' in the data (see the death counts further down)
sns.countplot(data=df, x='age', order=['0-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90+'])
plt.xlabel('Age Group', fontsize=15)
plt.ylabel('Infections', fontsize=15)
plt.show()

Tracking the age distribution of infections across the Ontario Covid waves¶

Note: These dates are approximate, estimated by eye from the Ontario daily cases graph above.

Wave 1 - March to May 2020
Wave 2 - October 2020 to February 2021
Wave 3 - April 2021 to May 2021
Wave 4 - July 26, 2021 onward

wave1 = (df['rdate'] > '2020-03-01') & (df['rdate'] < '2020-05-30')
wave2 = (df['rdate'] > '2020-10-01') & (df['rdate'] < '2021-02-28')
wave3 = (df['rdate'] > '2021-04-01') & (df['rdate'] < '2021-05-21')
wave4 = (df['rdate'] > '2021-07-26')

dfwave1 = df[wave1].sort_values(by='age')
dfwave2 = df[wave2].sort_values(by='age')
dfwave3 = df[wave3].sort_values(by='age')
dfwave4 = df[wave4].sort_values(by='age')

Age trends¶

We can see a clear trend of infections shifting toward younger age groups with each wave. There is a lot of speculation, and people are quick to criticize younger Canadians for not following Covid guidelines such as social distancing and avoiding group gatherings. I don't think that is entirely fair, as Ontario has been prioritizing older Ontarians during the vaccine rollout.

More transmissible variants have also taken hold. Many young Canadians work in the service sector, so they may not have the luxury of working from home. They have no choice but to get out there.

Also, as we see in the latest wave, children under 12 (part of the 0-19 group) do not yet have access to vaccination. People under 30 now account for almost three quarters of new cases.
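Raw counts differ a lot between waves. Comparing each wave's age mix as percentages makes the shift toward younger groups easier to check, including the under-30 share. This sketch assumes frames with an `age` column like `dfwave1` to `dfwave4`. Both helper names are hypothetical, and the toy waves are not real data.

```python
import pandas as pd


def age_distribution_by_wave(waves: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Rows are waves, columns are age groups, values are % of that wave's cases."""
    shares = {
        name: frame["age"].value_counts(normalize=True) * 100
        for name, frame in waves.items()
    }
    return pd.DataFrame(shares).T.fillna(0).round(1)


def share_of_cases(frame: pd.DataFrame, age_groups: list[str]) -> float:
    """Percentage of cases whose age group is in age_groups."""
    return round(frame["age"].isin(age_groups).mean() * 100, 1)


# Illustrative toy waves, not real data
wave_a = pd.DataFrame({"age": ["80s", "80s", "20s", "0-19"]})
wave_b = pd.DataFrame({"age": ["20s", "20s", "0-19", "50s"]})
print(age_distribution_by_wave({"Wave A": wave_a, "Wave B": wave_b}))
print(share_of_cases(wave_b, ["0-19", "20s"]))
```

On the notebook's data, `share_of_cases(dfwave4, ['0-19', '20s'])` would give the under-30 percentage for wave 4.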

# One countplot per wave; each wave frame is already sorted by age group
wave_frames = [
    (dfwave1, "Wave 1 - Ontario - Infections by Age Category", "Wave 1 Infections"),
    (dfwave2, "Wave 2 - Ontario - Infections by Age Category", "Wave 2 Infections"),
    (dfwave3, "Wave 3 - Ontario - Infections by Age Category", "Wave 3 Infections"),
    (dfwave4, "Wave 4 - Ontario - Infections by Age Category since July 26", "Wave 4 Infections"),
]

for wave_df, title, ylabel in wave_frames:
    plt.figure(figsize=(10,6))
    plt.title(title, fontsize=18)
    sns.countplot(data=wave_df, x='age')
    plt.xlabel('Age Group', fontsize=15)
    plt.ylabel(ylabel, fontsize=15)
    plt.xticks(fontsize=12)
    plt.yticks(fontsize=12)

plt.show()

#df.age.value_counts().sort_index()

How different age groups are getting infected with Covid-19¶

print('Missing Information and Unspecified EPI Link have been omitted')
# Give the source codes readable names before plotting. Any other code maps to NaN
# and is left out, so the legend labels always match the bars.
source_labels = {'CC': 'Contact of a Case', 'NO KNOWN EPI LINK': 'No Known Link',
                 'OB': 'Outbreak', 'TRAVEL': 'Travel'}
plot_df = df.assign(source=df['source'].map(source_labels))
plt.figure(figsize=(14,6))
plt.title("Ontario - Source of Infection by Age Category", fontsize=18)
ax = sns.countplot(data=plot_df, x='age', hue='source', hue_order=list(source_labels.values()),
                   order=['0-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90+'])
sns.move_legend(ax, 'center right', title='Source of Infection')  # seaborn >= 0.11.2
plt.xlabel('Age Group', fontsize=15)
plt.ylabel('Infections', fontsize=15)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

Missing Information and Unspecified EPI Link have been omitted

Tracking Deaths¶

The risk of death from Covid rises exponentially as we age. Despite most infections occurring in younger Ontarians, the elderly have suffered the most deaths.

dfdeath = df[df.outcome == 'Fatal'].age.value_counts().sort_index()
print(dfdeath)
# Draw on an explicit axis, then set labels so pandas doesn't overwrite them
fig, ax = plt.subplots(figsize=(10,6))
dfdeath.plot(kind='bar', ax=ax)
ax.set_title('Deaths by Age Group', fontsize=20)
ax.set_ylabel('Number of Deaths', fontsize=15)
ax.set_xlabel('')
ax.tick_params(labelsize=12)
plt.show()

0-19          5
20s          28
30s          66
40s         157
50s         489
60s        1129
70s        1998
80s        3260
90+        2504
UNKNOWN       1
Name: age, dtype: int64
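The bar chart counts deaths, not risk. Dividing fatal cases by all reported cases in each age group gives a case fatality rate, which is closer to what "risk of death rises with age" means. This sketch assumes the renamed `age` and `outcome` columns; `fatality_rate_by_age` is a hypothetical helper. It only covers reported cases. 'Not Resolved' cases sit in the denominator even though some of them may still end in death.

```python
import pandas as pd


def fatality_rate_by_age(frame: pd.DataFrame) -> pd.Series:
    """Percentage of reported cases in each age group with outcome 'Fatal'."""
    is_fatal = frame["outcome"].eq("Fatal")
    return (is_fatal.groupby(frame["age"]).mean() * 100).round(2)


# Illustrative toy data, not real Ontario figures
toy = pd.DataFrame({
    "age": ["20s", "20s", "20s", "20s", "80s", "80s", "80s", "80s"],
    "outcome": ["Resolved"] * 4 + ["Fatal", "Resolved", "Resolved", "Not Resolved"],
})
print(fatality_rate_by_age(toy))
```

On the notebook's data this would be `fatality_rate_by_age(df)`.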

Deaths by time period¶

With more than 9,600 deaths in Ontario since the beginning of the Covid pandemic, the vast majority have been among people over 70. Despite the increasing number of cases throughout the second and third waves, deaths dropped dramatically as infections moved to younger people, who are much less likely to die from an infection.

Vaccination is also contributing to decreased rates of death.

df_fatal = df[df.outcome == 'Fatal'].sort_values(by='rdate')
print('There have been',len(df_fatal), 'Deaths Total.')

There have been 9637 Deaths Total.

plt.figure(figsize=(14,6))
# value_counts() orders by frequency; sort by date so the line is drawn chronologically
df_fatal['rdate'].value_counts().sort_index().plot()
plt.title('Deaths Since Beginning of Covid Pandemic', fontsize=20)
plt.ylabel('Deaths', fontsize=15)
plt.xlabel('Date', fontsize=15)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
plt.show()

df['outcome'].unique()

array(['Resolved', 'Fatal', 'Not Resolved'], dtype=object)

df['outcome'].value_counts()

Resolved        562172
Fatal             9637
Not Resolved      6239
Name: outcome, dtype: int64

The hardest hit regions in Ontario¶

Unsurprisingly, the large urban centres recorded the most cases (total counts, not adjusted for population).

plt.figure(figsize=(10,6))
plt.title("Infections by Top 10 PHU Area", fontsize=20)
sns.countplot(data=df, y='phu', order=df.phu.value_counts().iloc[:10].index)
plt.ylabel('Area', fontsize=15)
plt.xlabel('Count', fontsize=15)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

df.phu.value_counts().iloc[:10].index

Index(['Toronto Public Health', 'Peel Public Health',
       'York Region Public Health Services', 'Ottawa Public Health',
       'Durham Region Health Department', 'Hamilton Public Health Services',
       'Region of Waterloo, Public Health', 'Windsor-Essex County Health Unit',
       'Halton Region Health Department',
       'Niagara Region Public Health Department'],
      dtype='object')

Ontario hotspots (name and phuid) where the Delta variant has taken hold¶

Toronto 3895, Peel 2253, York 2270, Durham 2230, Hamilton 2237, Waterloo 2265, Halton 2236, Porcupine 2256, Wellington-Dufferin-Guelph 2266, Simcoe-Muskoka 2260 and Grey Bruce 2233

# Hotspot PHU IDs from the list above. Note: Grey Bruce (2233) is listed but not included here.
hotspot_ids = [3895, 2253, 2270, 2230, 2237, 2265, 2236, 2256, 2266, 2260]
hotspots = df['phuid'].isin(hotspot_ids)

dfhot = df.loc[hotspots]
dfhot.tail()

since_july = dfhot['rdate'] > "2021-07-01"
# value_counts() orders by frequency; sort by date so the line is drawn chronologically
dfhot.loc[since_july, 'rdate'].value_counts().sort_index().plot()

4th Wave: Timmins and Sudbury¶

df4 = dfwave4  # already filtered to wave 4 (rdate after 2021-07-26) above

df_tim4 = df4[df4.phu == "Porcupine Health Unit"]
df_sud4 = df4[df4.phu == "Sudbury & District Health Unit"]

plt.figure(figsize=(14,6))
plt.title('4th wave Cases in Porcupine and Sudbury Health Unit Areas', fontsize=20)
plt.xlabel("")
plt.ylabel("Daily Cases", fontsize=15)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
sns.lineplot(data=df_tim4['rdate'].value_counts(), label="Porcupine Health Unit")
sns.lineplot(data=df_sud4['rdate'].value_counts(), label="Sudbury & District Health Unit")
plt.show()

2. Effective reproduction number (Re) for COVID-19 in Ontario¶

An estimate of the average number of people one person will infect when they have COVID-19.

Source: https://data.ontario.ca/dataset/effective-reproduction-number-re-for-covid-19-in-ontario

Note: An Re above 1 means Covid case numbers are rising; an Re below 1 means they are shrinking.
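A simple way to see why 1 is the threshold is to treat Re as the multiplier from one "generation" of infections to the next. The sketch below is a deliberately simplified model, not how Ontario estimates Re. It assumes a constant Re, no change in immunity or behaviour, and a hypothetical starting count of 1,000. `project_cases` is a hypothetical helper name.

```python
def project_cases(initial_cases: float, rate: float, generations: int) -> list[float]:
    """Cases per generation if every case infects `rate` others on average."""
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * rate)
    return cases


for rate in (0.9, 1.0, 1.1):
    print(rate, [round(c) for c in project_cases(1000, rate, 5)])
```

After five generations, an Re of 0.9 shrinks 1,000 cases to about 590, while 1.1 grows them to about 1,611. Small differences around 1 compound quickly.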

# Make date_start and date_end Pandas datetime objects instead of strings.
dfre['date_start'] = pd.to_datetime(dfre['date_start'])
dfre['date_end'] = pd.to_datetime(dfre['date_end'])

dfre.dtypes

region                object
date_start    datetime64[ns]
date_end      datetime64[ns]
Re                   float64
lower_CI             float64
upper_CI             float64
dtype: object

Create a Baseline Re rate of 1¶

dfre['Re_baseline'] = 1

Set date_end as the index of the dataframe.¶

Each Re value in Ontario's data is estimated over a rolling 7-day window (date_start to date_end), so date_end marks the last day of each window.

dfre.set_index('date_end', inplace=True)

dfre.tail()

Re rate observations¶

The Re rate can be a powerful predictor of whether the number of cases is heading up or down. Vaccination of Ontarians started in December 2020 and really picked up steam in April, May and June 2021. The Re rate seems to reflect this and has been on a continuous decline since April. However, it may still be too early to tell for sure with the Delta variant taking hold.

We see a similar decline from January to the end of February, before the third wave hit, when vaccination was not yet a significant factor.

It will be interesting to follow the Re rate over the next few months, given high vaccination rates but also the increased spread of the Delta variant (and future unknown variants). If vaccination manages to keep Re down, we can get ahead of Covid and return to a more normal way of life. The wildcard in all of this will be variants. While vaccination appears to be working against current strains, new variants could take hold and push Re back up, resulting in more waves.
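The dataset also publishes a confidence interval (`lower_CI`, `upper_CI`). One way to avoid over-reading small moves around 1 is to call Re "growing" only when the whole interval is above 1, and "shrinking" only when it is entirely below 1. The rule and the `classify_re` name are illustrative assumptions. The sketch applies them to the last five rows of the dfre.tail() output in this notebook.

```python
import pandas as pd


def classify_re(frame: pd.DataFrame) -> pd.Series:
    """Label each Re estimate as growing, shrinking or uncertain from its CI."""
    labels = pd.Series("uncertain", index=frame.index)
    labels.loc[frame["lower_CI"] > 1] = "growing"
    labels.loc[frame["upper_CI"] < 1] = "shrinking"
    return labels


# The five rows shown by dfre.tail() in this notebook (date_end 2021-09-10 to 2021-09-14)
recent = pd.DataFrame(
    {
        "Re": [0.98, 0.97, 0.97, 0.98, 0.99],
        "lower_CI": [0.96, 0.94, 0.95, 0.95, 0.97],
        "upper_CI": [1.01, 1.00, 1.00, 1.01, 1.02],
    },
    index=pd.to_datetime(["2021-09-10", "2021-09-11", "2021-09-12", "2021-09-13", "2021-09-14"]),
)
print(classify_re(recent))
```

All five rows come out as "uncertain". The point estimates are just under 1, but every interval still reaches 1.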

Prediction¶

Looking at the graph below and its upward trend, I predict that cases will stop decreasing in August and numbers will climb by September (assuming no other changes).

Wildcards: the Delta variant, success in getting first doses into arms, and opening immunization to children under 12. All of these can affect Re.

# Re Graph
plt.figure(figsize=(14, 6))
plt.title("Ontario Covid Reproduction Rate (Re) vs Cases", fontsize=20)
plt.xticks(fontsize=13)
plt.yticks(fontsize=13)
sns.lineplot(data=dfre[['Re', 'Re_baseline']])
plt.xlabel("")
plt.ylabel("Re Number", fontsize=15)

# Ontario Covid Case graph for comparison.  

# Let's line up the dates with the Re dataset first.
# Use a new name so the full case data in df stays intact for later cells.
df_recent = df[df['rdate'] > '2020-03-19']

plt.figure(figsize=(14,6))
#plt.title('Ontario Covid Waves - Daily Cases', fontsize=20)
sns.lineplot(data=df_recent['rdate'].value_counts())
plt.ylabel('Cases', fontsize=15)
plt.xlabel('')
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
plt.show()

Vaccination Analysis¶

The following looks at vaccination rates in Ontario. We can see that Ontarians overall are being vaccinated in large numbers. As of June 27, 2021, we have not yet seen a plateau, although rates are expected to slow down.

#dfvaccine.tail()

# Create a 7 day rolling average column of daily doses administered.
dfvaccine['7day'] = dfvaccine['previous_day_total_doses_administered'].rolling(window=7).mean()

#plt.figure(figsize=(14,6))
dfvaccine[['report_date','previous_day_at_least_one', 'previous_day_fully_vaccinated',
           'previous_day_total_doses_administered', '7day']].set_index('report_date').tail(10)#.plot(kind='bar')

# Make report_date a pandas datetime object instead of a string.
dfvaccine['report_date'] = pd.to_datetime(dfvaccine['report_date'])
#dfvaccine.dtypes

Doses drop noticeably on Sundays: Monday reports, which cover the previous day, consistently show lower numbers¶

plt.figure(figsize=(14,6))
plt.title('Daily Vaccine Doses - Ontario', fontsize=20)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
sns.lineplot(data=dfvaccine, x='report_date', y='7day', label='7 Day Rolling Average')
sns.lineplot(data=dfvaccine, x='report_date', y='previous_day_total_doses_administered', label='Daily Dose Count')
plt.xlabel('Date',fontsize=15)
plt.ylabel('Number Vaccinated',fontsize=15)
plt.show()
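To check the weekday pattern, we can average the daily dose counts by day of the week. This sketch assumes the `report_date` and `previous_day_total_doses_administered` columns; `mean_by_weekday` is a hypothetical helper. Because each report covers the previous day, Monday's value reflects Sunday's doses. The sample uses the ten rows printed earlier (report dates 2021-09-08 to 2021-09-17), so each weekday has only one or two values. The full series would give a steadier picture.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def mean_by_weekday(frame: pd.DataFrame, date_col: str, value_col: str) -> pd.Series:
    """Average of value_col for each day of the week (of date_col), Monday first."""
    days = pd.to_datetime(frame[date_col]).dt.day_name()
    return frame[value_col].groupby(days).mean().reindex(WEEKDAYS)


# Ten rows from the dfvaccine table shown in this notebook
sample = pd.DataFrame({
    "report_date": pd.date_range("2021-09-08", "2021-09-17", freq="D"),
    "previous_day_total_doses_administered": [
        38174, 38391, 35844, 40220, 29182, 15842, 28657, 35691, 35463, 35285,
    ],
})
print(mean_by_weekday(sample, "report_date", "previous_day_total_doses_administered"))
```

In this sample, the Monday report (Sunday's doses) shows 15,842 doses, well below every other day. On the full data this would be `mean_by_weekday(dfvaccine, 'report_date', 'previous_day_total_doses_administered')`.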

Show the trend of first and second doses¶

plt.figure(figsize=(14,6))
plt.title('First and Second Dose Daily Counts - Ontario', fontsize=20)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12)
sns.lineplot(data=dfvaccine, x='report_date', y='previous_day_at_least_one', label='First Dose')
sns.lineplot(data=dfvaccine, x='report_date', y='previous_day_fully_vaccinated', label='Second Dose')
plt.xlabel('')
plt.ylabel('Number Vaccinated',fontsize=15)
plt.show()

total_doses = dfvaccine['previous_day_total_doses_administered'].sum()
total_fully_vaccinated = dfvaccine['total_individuals_fully_vaccinated'].max()
total_first_doses = total_doses - total_fully_vaccinated
population = 14734014 # See sources (2)
eligible_pop = population - 1961438 # See sources (3)
vaccine_rate = (total_first_doses / eligible_pop) * 100
vaccine_rate_tot = (total_first_doses /population) * 100
full_vaccine_rate = (total_fully_vaccinated / eligible_pop) * 100
full_vaccine_rate_tot = (total_fully_vaccinated / population) * 100
total_unvaccinated = int(eligible_pop - dfvaccine['total_individuals_at_least_one'].max())
unvaccinated_percentage = round((total_unvaccinated / eligible_pop) * 100,1)

print('Fact Sheet')
print("----------")
print("Data Published:", str(dfvaccine['report_date'].iloc[-1])[0:10])
print()
print('Eligible Population - 12 and over')
print('---------------------------------')
# total doses minus second doses counts everyone with at least one dose
print("At Least One Dose:", round((vaccine_rate),1),"%")
print("Fully Vaccinated: ", round((full_vaccine_rate),1),"%")
print()

print('Total Population')
print('----------------')
print("At Least One Dose:", round((vaccine_rate_tot),1),"%")
print("Fully Vaccinated: ", round((full_vaccine_rate_tot),1),"%")
print()

print("Maximum doses administered in one day:", int(dfvaccine['previous_day_total_doses_administered'].max()))
print("Doses administered yesterday:", int(dfvaccine['previous_day_total_doses_administered'].iloc[-1]))
print()
print("Total individuals with at least one dose:", int(dfvaccine['total_individuals_at_least_one'].max()))
print("Total individuals fully vaccinated:", int(dfvaccine['total_individuals_fully_vaccinated'].max()))
print()
print("Percentage of eligible population unvaccinated:", unvaccinated_percentage,"%")
print("Estimated eligible individuals with no dose:", total_unvaccinated)

Fact Sheet
----------
Data Published: 2021-09-17

Eligible Population - 12 and over
---------------------------------
At Least One Dose: 86.5 %
Fully Vaccinated:  80.3 %

Total Population
----------------
At Least One Dose: 75.0 %
Fully Vaccinated:  69.6 %

Maximum doses administered in one day: 268884
Doses administered yesterday: 35285

Total individuals with at least one dose: 11061902
Total individuals fully vaccinated: 10256563

Percentage of eligible population unvaccinated: 13.4 %
Estimated eligible individuals with no dose: 1710674
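The one-dose percentage in the fact sheet is derived from the sum of daily doses. As a cross-check, the coverage can be computed straight from the cumulative counts printed above. The sketch uses those numbers and the notebook's population figures; `coverage` is a hypothetical helper.

```python
def coverage(count: int, population: int) -> float:
    """Percentage of population covered, rounded to one decimal place."""
    return round(count / population * 100, 1)


population = 14_734_014          # total population used above
under_12 = 1_961_438             # estimated population under 12, see source (3)
eligible = population - under_12

at_least_one = 11_061_902        # total individuals with at least one dose
fully = 10_256_563               # total individuals fully vaccinated

print("Eligible population:", eligible)
print("At least one dose:", coverage(at_least_one, eligible), "%")
print("Fully vaccinated:", coverage(fully, eligible), "%")
print("No dose yet:", coverage(eligible - at_least_one, eligible), "%")
```

This gives 86.6% with at least one dose, a hair above the 86.5% in the fact sheet because the inputs differ (cumulative individuals versus summed daily doses). The fully vaccinated (80.3%) and unvaccinated (13.4%) figures match.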

Sources¶

(1) Vaccine Data from Ontario Open Data Portal

(2) Statistics Canada. Table 17-10-0005-01 Population estimates on July 1st, by age and sex

(3) 1,950,000 is an estimate of the population under 12 based on source (2) above. Statistics Canada lists population only for the 10-14 age band; 1,961,438 represents 60% of that age group. An even distribution of ages was assumed.

Cases by Vaccine Status¶

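# Dates are stored as strings; convert them before using them as the index
dfvacstatus['Date'] = pd.to_datetime(dfvacstatus['Date'])
dfvacstatus.set_index('Date', inplace=True)

# DataFrame.plot() opens its own figure, so pass figsize here
# (calling plt.figure() first only left an empty figure behind)
dfvacstatus[['covid19_cases_unvac', 'covid19_cases_partial_vac', 'covid19_cases_full_vac']].describe().plot(kind='bar', figsize=(14,6))
plt.show()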

plt.figure(figsize=(16,6))
plt.title('Cases by Vaccine Status - Ontario', fontsize=20)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12, rotation=30)
plt.xlabel('')
plt.ylabel('Cases',fontsize=15,)
sns.lineplot(data=dfvacstatus[['covid19_cases_unvac', 'covid19_cases_partial_vac', 'covid19_cases_full_vac']])
plt.show()

plt.figure(figsize=(16,6))
plt.title('Cases per 100k - Ontario', fontsize=20)
plt.yticks(fontsize=12)
plt.xticks(fontsize=12, rotation=30)
plt.xlabel('')
plt.ylabel('Cases per 100,000', fontsize=15)
sns.lineplot(data=dfvacstatus[['cases_unvac_rate_per100K', 'cases_partial_vac_rate_per100K',
       'cases_full_vac_rate_per100K']])
plt.show()
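The per-100k chart invites a direct comparison. One way to summarise it is a rate ratio: how many times higher the unvaccinated case rate is than the fully vaccinated rate on each date. This sketch uses the column names from the plot above; `rate_ratio` is a hypothetical helper, and the values are made up for illustration. Days with a zero fully vaccinated rate are left as NaN rather than dividing by zero.

```python
import pandas as pd


def rate_ratio(frame: pd.DataFrame) -> pd.Series:
    """Unvaccinated case rate divided by fully vaccinated case rate, per row."""
    full = frame["cases_full_vac_rate_per100K"]
    # where() turns zero denominators into NaN so no infinite ratios appear
    return frame["cases_unvac_rate_per100K"] / full.where(full > 0)


# Hypothetical rates per 100,000, not real Ontario values
toy = pd.DataFrame({
    "cases_unvac_rate_per100K": [10.0, 8.0, 6.0],
    "cases_full_vac_rate_per100K": [2.0, 0.0, 1.5],
})
print(rate_ratio(toy))
```

On the notebook's data, `rate_ratio(dfvacstatus).plot()` would chart the ratio over time.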

	Accurate_Episode_Date	Case_Reported_Date	Test_Reported_Date	Specimen_Date	Age_Group	Client_Gender	Case_AcquisitionInfo	Outcome1	Outbreak_Related	Reporting_PHU_ID	Reporting_PHU	Reporting_PHU_Address	Reporting_PHU_City	Reporting_PHU_Postal_Code	Reporting_PHU_Website	Reporting_PHU_Latitude	Reporting_PHU_Longitude
Row_ID
1	2019-05-30	2020-05-05	2020-05-05	2020-05-03	50s	FEMALE	CC	Resolved	NaN	2260	Simcoe Muskoka District Health Unit	15 Sperling Drive	Barrie	L4M 6K9	www.simcoemuskokahealth.org	44.410713	-79.686306
2	2019-11-20	2020-10-21	2020-11-21	2019-11-20	20s	FEMALE	NO KNOWN EPI LINK	Resolved	NaN	4913	Southwestern Public Health	1230 Talbot Street	St. Thomas	N5P 1G9	www.swpublichealth.ca	42.777804	-81.151156
3	2020-01-01	2020-04-24	2020-04-24	2020-04-23	80s	MALE	NO KNOWN EPI LINK	Resolved	NaN	2234	Haldimand-Norfolk Health Unit	12 Gilbertson Drive	Simcoe	N3Y 4N5	www.hnhu.org	42.847825	-80.303815
4	2020-01-01	2020-05-17	2020-05-17	2020-05-15	50s	MALE	CC	Resolved	NaN	2265	Region of Waterloo, Public Health	99 Regina Street South	Waterloo	N2J 4V3	www.regionofwaterloo.ca	43.462876	-80.520913
5	2020-01-01	2021-05-26	2021-03-31	2021-03-28	UNKNOWN	MALE	TRAVEL	Resolved	NaN	2263	Timiskaming Health Unit	247 Whitewood Avenue, Unit 43	New Liskeard	P0J 1P0	www.timiskaminghu.com	47.509284	-79.681632
6	2020-01-10	2020-06-10	2020-06-10	2020-06-09	50s	MALE	CC	Resolved	NaN	2234	Haldimand-Norfolk Health Unit	12 Gilbertson Drive	Simcoe	N3Y 4N5	www.hnhu.org	42.847825	-80.303815
7	2020-01-13	2021-01-23	2021-01-23	2021-01-22	30s	MALE	NO KNOWN EPI LINK	Resolved	NaN	2260	Simcoe Muskoka District Health Unit	15 Sperling Drive	Barrie	L4M 6K9	www.simcoemuskokahealth.org	44.410713	-79.686306
8	2020-01-16	2020-10-08	2020-10-08	2020-10-06	50s	FEMALE	NO KNOWN EPI LINK	Resolved	NaN	2258	Eastern Ontario Health Unit	1000 Pitt Street	Cornwall	K6J 5T1	www.eohu.ca	45.029152	-74.736298
9	2020-01-21	2020-01-23	2020-01-26	2020-01-23	50s	MALE	TRAVEL	Resolved	NaN	3895	Toronto Public Health	277 Victoria Street, 5th Floor	Toronto	M5B 1W2	www.toronto.ca/community-people/health-wellnes...	43.656591	-79.379358
10	2020-01-22	2020-01-23	2020-01-27	2020-01-25	50s	FEMALE	TRAVEL	Resolved	NaN	3895	Toronto Public Health	277 Victoria Street, 5th Floor	Toronto	M5B 1W2	www.toronto.ca/community-people/health-wellnes...	43.656591	-79.379358

	Definition	Additional Notes
Variable Name
Accurate_Episode_Date	The field uses a number of dates entered in th...	Blank records may exist where a Public Health ...
Age_Group	Age group of the patient.	Patient ages are clustered in 10-year interval...
Case_AcquisitionInfo	Suspected method of exposure to COVID-19, if k...	As of June 17, 2020, values include: CC (clo...
Case_Reported_Date	The date that the case was reported to the loc...	NaN
Client_Gender	Gender information of the patient.	Values Include: 'FEMALE', 'MALE', 'GENDER DIV...
Outbreak_Related	Describes whether a confirmed positive case is...	A confirmed positive case that is associated w...
Outcome1	Patient outcome.	Values include: Resolved, Not Resolved, Fatal.
Reporting_PHU	Public Health Unit (PHU) where confirmed posit...	For a list of Ontario's Public Health Units, p...
Reporting_PHU_Address	Official physical street address of Public Hea...	This variable does not indicate the specfic ph...
Reporting_PHU_City	Official city of Public Health Unit (PHU).	This variable does not indicate the specfic ci...
Reporting_PHU_ID	Public Health Unit (PHU) ID where confirmed po...	NaN
Reporting_PHU_Latitude	Latitude of Public Health Unit (PHU) physical ...	This variable does not indicate the specfic co...
Reporting_PHU_Longitude	Longitude of Public Health Unit (PHU) physical...	This variable does not indicate the specfic co...
Reporting_PHU_Postal_Code	Official postal code of Public Health Unit (PHU).	This variable does not indicate the specfic po...
Reporting_PHU_Website	Official website of Public Health Unit (PHU).	NaN
Row_ID	Identifier for each individual row/record with...	The values under this variable are not continu...
Specimen_Date	Set to the earliest specimen date on record fo...	NaN
Test_Reported_Date	The test reported date as indicated on the lab...	NaN

	region	date_start	Re	lower_CI	upper_CI	Re_baseline
date_end
2021-09-10	Ontario	2021-09-04	0.98	0.96	1.01	1
2021-09-11	Ontario	2021-09-05	0.97	0.94	1.00	1
2021-09-12	Ontario	2021-09-06	0.97	0.95	1.00	1
2021-09-13	Ontario	2021-09-07	0.98	0.95	1.01	1
2021-09-14	Ontario	2021-09-08	0.99	0.97	1.02	1

	previous_day_at_least_one	previous_day_fully_vaccinated	previous_day_total_doses_administered	7day
report_date
2021-09-08	17447.0	20727.0	38174.0	33033.285714
2021-09-09	18043.0	20348.0	38391.0	33496.000000
2021-09-10	16477.0	19367.0	35844.0	32351.571429
2021-09-11	16532.0	23688.0	40220.0	31542.142857
2021-09-12	11733.0	17449.0	29182.0	31075.285714
2021-09-13	6616.0	9226.0	15842.0	30292.000000
2021-09-14	12538.0	16119.0	28657.0	32330.000000
2021-09-15	15171.0	20520.0	35691.0	31975.285714
2021-09-16	15271.0	20192.0	35463.0	31557.000000
2021-09-17	14865.0	20420.0	35285.0	31477.142857

	adate	rdate	age	gender	source	outcome	outbreak	phuid	phu
Row_ID
578044	2021-09-16	2021-09-16	60s	MALE	MISSING INFORMATION	Not Resolved	NaN	2253	Peel Public Health
578045	2021-09-16	2021-09-16	<20	FEMALE	MISSING INFORMATION	Not Resolved	NaN	2253	Peel Public Health
578046	2021-09-16	2021-09-16	<20	FEMALE	MISSING INFORMATION	Not Resolved	NaN	2227	Brant County Health Unit
578047	2021-09-16	2021-09-16	<20	FEMALE	MISSING INFORMATION	Not Resolved	NaN	2227	Brant County Health Unit
578048	2021-09-16	2021-09-16	<20	FEMALE	MISSING INFORMATION	Not Resolved	NaN	3895	Toronto Public Health

	adate	rdate	age	gender	source	outcome	outbreak	phuid	phu
Row_ID
578038	2021-09-16	2021-09-16	50s	FEMALE	MISSING INFORMATION	Not Resolved	NaN	2270	York Region Public Health Services
578041	2021-09-16	2021-09-16	30s	MALE	MISSING INFORMATION	Not Resolved	NaN	2253	Peel Public Health
578044	2021-09-16	2021-09-16	60s	MALE	MISSING INFORMATION	Not Resolved	NaN	2253	Peel Public Health
578045	2021-09-16	2021-09-16	0-19	FEMALE	MISSING INFORMATION	Not Resolved	NaN	2253	Peel Public Health
578048	2021-09-16	2021-09-16	0-19	FEMALE	MISSING INFORMATION	Not Resolved	NaN	3895	Toronto Public Health