To use markdown first we install the package: install.packages("rmarkdown")
As a result of climate change, there has been an increase in flooding across the U.S., particularly along the coasts and in low-lying areas. Warmer temperatures increase evaporation and rain availability. Warmer oceans can also cause storms along the coast in intensity, in turn causing increased flooding. Therefore, we suggest establishing a flood modeling and planning center.
A flood inundation modeling and planning center could lead to a comprehensive flood mitigation program to make urban construction more ecological and sustainable and provide timely flood warnings for post-disaster relief and reconstruction.
The center’s establishment will enable simulation of flood inundation, provide more accurate boundary information and flood prevention recommendations to planning authorities, and provide timely flood warnings. The advice and alerts will reduce economic losses and fatalities caused by floods.
This analysis could help planners understand why a local area needs a flood forecast, and how a data center could help manage flood risk. With more detailed data, the data center could provide more accurate analysis.
boundary
flood map: https://data.calgary.ca/Environment/1-5-Flood-Map-20-chance-of-occurring-in-any-year-/iyqi-dvvj
https://data.calgary.ca/Base-Maps/Digital-Elevation-Model-DEM-ASCII-2M/eink-tu9p
http://www.cec.org/north-american-environmental-atlas/land-cover-30m-2015-landsat-and-rapideye/
CRS(coordinate reference system): 26912
Coordinate System in ArcGIS Pro: NAD 1983 CSRS UTM Zone 12N
spatialreference.org for projections
To get started, first install libraries.
library(caret)
library(pscl)
library(plotROC)
library(pROC)
library(sf)
library(tidyverse)
library(knitr)
library(kableExtra)
library(tigris)
library(viridis)
library(maptools)
library(raster)
library(rasterVis)
library(dplyr)
Set ggplot styles called mapTheme and plotTheme, also load our data
We are going to include two datasets in this step:cal_cityboundary, which is the boundary shapefile of Calgary; and flood, which is the original flood dataset.
mapTheme <- function(base_size = 14) {
theme(
text = element_text( color = "black"),
plot.title = element_text(size = 12,colour = "black"),
plot.subtitle=element_text(face="italic"),
plot.caption=element_text(hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),axis.title = element_blank(),
axis.text = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=1)
)
}
plotTheme <- theme(
plot.title =element_text(size=12),
plot.subtitle = element_text(size=8),
plot.caption = element_text(size = 6),
axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
axis.text.y = element_text(size = 10),
axis.title.y = element_text(size = 10),
# Set the entire chart region to blank
panel.background=element_blank(),
plot.background=element_blank(),
#panel.border=element_rect(colour="#F0F0F0"),
# Format the grid
panel.grid.major=element_line(colour="#D0D0D0",size=.75),
axis.ticks=element_blank())
cal_b <-
st_read("C:/Users/jrach/Desktop/CPLN675Midterm/Resource/Cal_cityboundline/Cal_cityboundline.shp") %>%
st_transform(crs=26912)
fishnet <-
st_make_grid(cal_b, cellsize = 500) %>%
st_sf()
fishnet <-
fishnet[cal_b,] %>%
mutate(uniqueID = rownames(.)) %>%
dplyr::select(uniqueID)
flood <- st_read("C:/Users/jrach/Desktop/CPLN675Midterm/Resource/flood/cal_flood.shp") %>%
st_transform(crs=26912)
ggplot() +
geom_sf(data=flood,color="orange", fill="orange") +
geom_sf(data=cal_b) +
labs(title="Regulatory Flood Map (Bylaw Flood Hazard) in Calgary") +
mapTheme()
This is a flood map shown above.
Now, let’s load cal_result, which is a fishnet dataset representing four variables:slope, distance from water, drainage, and infiltration probability, and each variable is shown below:
cal_result <-
st_read("C:/Users/jrach/Desktop/CPLN675Midterm/result/cal_result.shp")%>%
st_transform(crs=26912)
ggplot() +
geom_sf(data=cal_result,aes(fill=diwater),size=0.1) +
scale_fill_viridis() +
#geom_sf(data=flood,color="orange", fill="orange", alpha=0.2) +
labs(title="distance from water body in Calgary") +
mapTheme()
ggplot() +
geom_sf(data=cal_result,aes(fill=slope),size=0.05) +
#geom_sf(data=flood,color="orange", fill="orange", alpha=0.2) +
scale_fill_viridis() +
labs(title="slope in Calgary") +
mapTheme()
ggplot() +
geom_sf(data=cal_result,aes(fill=drain),size=0.05) +
scale_fill_viridis() +
#geom_sf(data=flood,color="orange", fill="orange", alpha=0.2) +
labs(title="drainage in Calgary") +
mapTheme()
ggplot() +
geom_sf(data=cal_result,aes(fill=infilp), size=0.05) +
scale_fill_viridis() +
#geom_sf(data=flood,color="orange", fill="orange", alpha=0.2) +
labs(title="infiltration probability in Calgary") +
mapTheme()
set.seed generates a random number createDataPartition randomly separates our data into two sets. We set p to p - a 70% training set and 30% test set.
overl <-
st_intersects(cal_result$geometry,flood$geometry,sparse = FALSE) %>%
apply(1,any)
cal_temp <-
cal_result %>%
mutate(is_flood=ifelse(overl==FALSE,0,1))
cal_m <-
cal_temp %>%
dplyr::select(diwater,slope,is_flood,drain,infilp) %>%
mutate(infilp = as.factor(infilp))
set.seed(42)
trainIndex <- createDataPartition(cal_m$diwater, p = .70,
list = FALSE,
times = 1)
floodTrain <- cal_m[ trainIndex,]
The binomial logit model runs in the glm function (generalized linear models). We specify the dependent variable as is_flood and run the model on our training set floodTrain.
floodModel <- glm( is_flood ~ .,
family="binomial"(link="logit"), data = floodTrain %>%
as.data.frame() %>%
select(-geometry))
summary(floodModel)
##
## Call:
## glm(formula = is_flood ~ ., family = binomial(link = "logit"),
## data = floodTrain %>% as.data.frame() %>% select(-geometry))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.1357 -0.6044 -0.4056 -0.1546 2.7406
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.451e+01 3.394e+02 -0.043 0.966
## diwater -2.323e+02 2.253e+01 -10.311 < 2e-16 ***
## slope 9.710e-02 2.494e-02 3.893 9.89e-05 ***
## drain 1.591e-05 1.923e-06 8.277 < 2e-16 ***
## infilp1 1.367e+01 3.394e+02 0.040 0.968
## infilp2 1.304e+01 3.394e+02 0.038 0.969
## infilp3 1.440e+01 3.394e+02 0.042 0.966
## infilp4 1.335e+01 3.394e+02 0.039 0.969
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2290.6 on 2512 degrees of freedom
## Residual deviance: 1771.9 on 2505 degrees of freedom
## AIC: 1787.9
##
## Number of Fisher Scoring iterations: 12
We usepredict function to create a vector of classification probabilities called classProbs and set the parameter type="reponse" which will returns probabilities that range from 0 to 1.
floodTest <- cal_m[-trainIndex,]
classProbs <- predict(floodModel, floodTest, type="response")
hist(classProbs)
We build testProbsPlot. The vertical line represents a 0.5 probability of inundation.
testProbs <- data.frame(obs = as.numeric(floodTest$is_flood),
pred = classProbs)
ggplot(testProbs, aes(x = pred, fill=as.factor(obs))) +
geom_density() +
facet_grid(obs ~ .) +
xlab("Probability") +
geom_vline(xintercept = .5) +
scale_fill_manual(values = c("dark blue", "dark green"),
labels = c("not flood","flood"),
name = "")+
plotTheme
We build Confusion metrics
Predicted = 0, Observed = 0 —> True Negative
Predicted = 1, Observed = 1 —> True Positive
Predicted = 1, Observed = 0 —> False Positive
Predicted = 0, Observed = 1 —> False Negative
Sensitivity - the proportion of actual positives (1’s) that were predicted to be positive. Also known as “true positive rate”.
Specificity - The proportion of actual negatives (0’s) that were predicted to be negatives. Also known as “true negative rate”.
testProbs$predClass = ifelse(testProbs$pred > .5 ,1,0)
caret::confusionMatrix(reference = as.factor(testProbs$obs),
data = as.factor(testProbs$predClass),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 855 152
## 1 14 54
##
## Accuracy : 0.8456
## 95% CI : (0.8226, 0.8667)
## No Information Rate : 0.8084
## P-Value [Acc > NIR] : 0.0008601
##
## Kappa : 0.3305
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.26214
## Specificity : 0.98389
## Pos Pred Value : 0.79412
## Neg Pred Value : 0.84906
## Prevalence : 0.19163
## Detection Rate : 0.05023
## Detection Prevalence : 0.06326
## Balanced Accuracy : 0.62301
##
## 'Positive' Class : 1
##
Then we create an ROC (receiver operating characteristic) curve. It’s better when the black line is more curved to the y=1.00
ggplot(testProbs, aes(d = obs, m = pred)) +
geom_roc(n.cuts = 50, labels = FALSE) +
style_roc(theme = theme_grey) +
geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey')
auc(testProbs$obs, testProbs$pred)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
## Area under the curve: 0.8124
Cross-validation iteratively creates many randomly generated test sets or ‘folds’, testing the power of the model on each.
ctrl <- trainControl(method = "cv",
number = 100,
savePredictions = TRUE)
cvFit <- train(as.factor(is_flood) ~ ., data = cal_m %>%
as.data.frame() %>%
select(-geometry),
method="glm", family="binomial",
trControl = ctrl)
cvFit
ggplot(as.data.frame(cvFit$resample), aes(Accuracy)) +
geom_histogram() +
scale_x_continuous(limits = c(0, 1)) +
labs(x="Accuracy",
y="Count")+
plotTheme
## Generalized Linear Model
##
## 3588 samples
## 4 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (100 fold)
## Summary of sample sizes: 3553, 3552, 3552, 3552, 3552, 3551, ...
## Resampling results:
##
## Accuracy Kappa
## 0.8526246 0.2710506
In our opinion, we have a generalizable model.
Then, we predict for the entire dataset and assess our predictions.
allPredictions <-
predict(cvFit, cal_m, type="prob")[,2]
cal_pred <-
cbind(cal_m,allPredictions) %>%
mutate(allPredictions = round(allPredictions * 100))
ggplot() +
geom_sf(data=cal_pred, aes(fill=factor(ntile(allPredictions,5))),
colour=NA) +
scale_fill_manual(values = c("#edf8fb","#b3cde3","#8c96c6","#8856a7","#810f7c"),
labels=as.character(quantile(cal_pred$allPredictions,
c(0.1,.2,.4,.6,.8),
na.rm=T)),
name="Predicted\nProbabilities(%)\n(Quintile\nBreaks)") +
mapTheme()
Observed and Predicted Flood Areas Calgary; flood area in red
ggplot() +
geom_sf(data=cal_pred, aes(fill=factor(ntile(allPredictions,5))), colour=NA) +
scale_fill_manual(values = c("#edf8fb","#b3cde3","#8c96c6","#8856a7","#810f7c"),
labels=as.character(quantile(cal_pred$allPredictions,
c(0.1,.2,.4,.6,.8),
na.rm=T)),
name="Predicted\nProbabilities(%)\n(Quintile\nBreaks)") +
geom_sf(data=cal_pred %>%
filter(is_flood == 1),
fill="red",colour=NA, alpha=0.5) +
mapTheme ()
cal_pred %>%
mutate(confResult=case_when(allPredictions < 50 & is_flood==0 ~ "True_Negative",
allPredictions >= 50 & is_flood==1 ~ "True_Positive",
allPredictions < 50 & is_flood==1 ~ "False_Negative",
allPredictions >= 50 & is_flood==0 ~ "False_Positive")) %>%
ggplot()+
geom_sf(aes(fill = confResult), color = "transparent")+
scale_fill_manual(values = c("Red","Orange","Light Blue","Light Green"),
name="Outcomes")+
labs(title="Confusion Metrics") +
mapTheme()
den_result <-
st_read("C:/Users/jrach/Desktop/CPLN675Midterm/result/den_result.shp") %>%
st_transform(crs=6427)
den_temp<-
den_result %>%
mutate(slope = slope /11)%>%
mutate(diwater = diwater/6246)
den_m <-
den_temp %>%
select(diwater,slope,drain,infilp)
classProbs_den <- predict(floodModel, den_m, type="response")
finalpred <-
cbind(den_m,classProbs_den) %>%
mutate(allPredictions_den = round(classProbs_den * 100))
ggplot() +
geom_sf(data=finalpred, aes(fill=factor(ntile(allPredictions_den,5))), colour=NA) +
scale_fill_manual(values = c("#edf8fb","#b3cde3","#8c96c6","#8856a7","#810f7c"),
labels=as.character(quantile(finalpred$allPredictions_den,
c(0.1,.2,.4,.6,.8),na.rm=T)),
name="Predicted\nProbabilities(%)\n(Quintile\nBreaks)") +
mapTheme()
Finally, we use our model on the dataset we collected for Denver. Here is the prediction map.
finalprediction <-
finalpred %>%
mutate(pre = ifelse(allPredictions_den<24,1,0))
ggplot() +
geom_sf(data=finalprediction, aes(fill=factor(pre)), colour=NA) +
scale_fill_manual(values = c("blue","grey"),
labels=c("flood","non-flood"),
name="Final Prediction\n(comparable city)") +
mapTheme()