Create polygon files of school districts in Harris county and intersect with zipcodes.
Author
Alan Jackson
Published
September 11, 2020
Prepare data for analyzing zipcode-based data by school district
During COVID, at one point, Harris county began to supply infection rates daily by zip code. At about the same time, different school districts in the area developed wildly differing policies on returning to in-person instruction, masking, etc. In order to study those differences I needed to be able to take the COVID data by zip code and interpolate it into the various districts. That is the purpose of this exercise.
First let’s get the school districts read in and cleaned up
Data was downloaded in 2019 from the Texas Education Agency website at https://schoolsdata2-tea-texas.opendata.arcgis.com/datasets/36b0225013a04341bb49b98155186863/explore as a shapefile.
The proj file indicates that the data is in EPSG 4326 - same as Google Maps, so Yay!
# Read in the shapefilesSchoolPolys <-read_sf(paste0(path,"SchoolDistricts/Current_Districts.shp"))summary(SchoolPolys)
FID SDLEA10 NAME NAME2
Min. : 1 Length:1021 Length:1021 Length:1021
1st Qu.: 256 Class :character Class :character Class :character
Median : 511 Mode :character Mode :character Mode :character
Mean : 511
3rd Qu.: 766
Max. :1021
DISTRICT_N DISTRICT DISTRICT_C NCES_DISTR
Min. : 1902 Length:1021 Length:1021 Length:1021
1st Qu.: 64903 Class :character Class :character Class :character
Median :117904 Mode :character Mode :character Mode :character
Mean :125557
3rd Qu.:184902
Max. :254902
COLOR SHAPE_Leng SHAPE_Area geometry
Min. :1.000 Min. :0.0456 Min. :1.272e+06 MULTIPOLYGON :1021
1st Qu.:2.000 1st Qu.:0.8799 1st Qu.:2.100e+08 epsg:4326 : 0
Median :4.000 Median :1.2803 Median :3.936e+08 +proj=long...: 0
Mean :3.984 Mean :1.4230 Mean :6.795e+08
3rd Qu.:6.000 3rd Qu.:1.8629 3rd Qu.:7.842e+08
Max. :7.000 Max. :5.1094 Max. :1.254e+10
# CRS is good, we have polys, I think I'm happy. Let's plot themplot(SchoolPolys[,"NAME"])
# So this is all districts in Texas. How do I carve out Harris county?# I guess by intersecting with Harris countyHarris <-readRDS(paste0(path,"HarrisCounty_16.rds"))sf::st_crs(Harris) <- googlecrs
old-style crs object detected; please recreate object with a recent sf::st_crs()
Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
that
summary(Harris)
County Tract BlkGrp MedAge
Length:2539 Length:2539 Length:2539 Min. :12.20
Class :character Class :character Class :character 1st Qu.:30.70
Mode :character Mode :character Mode :character Median :36.05
Mean :37.22
3rd Qu.:42.70
Max. :83.00
NA's :3
MedAgeSigma MedAgeMale MedAgeFemale Pop
Min. : 0.200 Min. : 7.70 Min. :12.80 Min. : 0.0
1st Qu.: 4.100 1st Qu.:29.50 1st Qu.:30.90 1st Qu.: 993.5
Median : 6.800 Median :34.90 Median :37.20 Median : 1407.0
Mean : 7.924 Mean :36.06 Mean :38.23 Mean : 1683.0
3rd Qu.:10.400 3rd Qu.:41.92 3rd Qu.:44.60 3rd Qu.: 2049.0
Max. :48.200 Max. :85.60 Max. :82.30 Max. :17391.0
NA's :3 NA's :3 NA's :5
PopSigma White Black Amerind
Min. : 13.0 Min. : 0.0 Min. : 0.0 Min. : 0.000
1st Qu.: 261.5 1st Qu.: 730.5 1st Qu.: 5.5 1st Qu.: 0.000
Median : 357.0 Median : 1069.0 Median : 66.0 Median : 0.000
Mean : 390.9 Mean : 1259.5 Mean : 206.1 Mean : 8.259
3rd Qu.: 479.0 3rd Qu.: 1535.5 3rd Qu.: 255.5 3rd Qu.: 4.000
Max. :3022.0 Max. :12241.0 Max. :3025.0 Max. :514.000
Asian Hispanic MedIncome MedIncomeSigma
Min. : 0.00 Min. : 0.0 Min. : 5491 Min. : 1246
1st Qu.: 0.00 1st Qu.: 173.0 1st Qu.: 36698 1st Qu.: 9523
Median : 7.00 Median : 410.0 Median : 51704 Median : 14869
Mean : 70.69 Mean : 613.0 Mean : 60642 Mean : 18716
3rd Qu.: 66.00 3rd Qu.: 839.5 3rd Qu.: 72434 3rd Qu.: 23232
Max. :2249.00 Max. :8892.0 Max. :250001 Max. :156982
NA's :71 NA's :81
Shape
MULTIPOLYGON :2539
epsg:4326 : 0
+proj=long...: 0
# convert to proper EPSGHarris <-st_transform(Harris, googlecrs)foo <- sf::st_make_valid(SchoolPolys)[Harris,]plot(foo)
Warning: plotting the first 10 out of 11 attributes; use max.plot = 11 to plot
all