This web application was built during a three-week project at Insight Data Science. I started the project because I was interested in developing a data-driven approach to reducing crime in the NYC area. While working on it, I quickly noticed that different neighborhoods are affected by different types of crime, and that these crimes peak at different times of the year (you can see this blog post to read more). I thought that a web application that forecasts monthly crime rates at a local level might help police redistribute their resources more effectively and thus reduce crime in NYC. The application could also be of interest to individuals or businesses who are concerned about crime rates in their neighborhood.
The application prompts users to enter an address on the input page seen below:
They then get back a report on the historical trends of crimes in their neighborhood:
Users can select specific crimes in their neighborhood and see the historical trend, the seasonality, and the days and times at which most of these crimes happen. The results for assault are shown below:
Users can also choose to forecast specific crime rates into the future.
How it works
This code was written using Python and Flask and deployed to Amazon Web Services. Users are prompted to enter an address, and I use the geopy library to get the latitude and longitude of that address. Once the latitude and longitude are known, I use the shapely library to find out which police precinct the address is in and obtain the data for that precinct.
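The address-to-precinct lookup described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the two rectangular precinct boundaries, the names `PRECINCTS`, `geocode_address` and `find_precinct`, the choice of the Nominatim geocoder, and the `user_agent` string are all assumptions made for the example. It is written for a current Python 3 and recent library releases rather than the pinned versions.

```python
from shapely.geometry import Point, Polygon

# Hypothetical precinct boundaries as (longitude, latitude) rings. The real
# boundaries would come from the precinct data the application loads.
PRECINCTS = {
    "Precinct A": Polygon([(-74.02, 40.70), (-73.98, 40.70),
                           (-73.98, 40.74), (-74.02, 40.74)]),
    "Precinct B": Polygon([(-73.98, 40.70), (-73.94, 40.70),
                           (-73.94, 40.74), (-73.98, 40.74)]),
}


def geocode_address(address):
    """Return (latitude, longitude) for an address, or None if not found.

    Assumes the Nominatim geocoder; needs geopy and network access, so the
    import is kept local to this function.
    """
    from geopy.geocoders import Nominatim

    location = Nominatim(user_agent="crime-forecast-example").geocode(address)
    if location is None:
        return None
    return location.latitude, location.longitude


def find_precinct(latitude, longitude, precincts=PRECINCTS):
    """Return the name of the precinct containing the point, or None."""
    point = Point(longitude, latitude)  # shapely expects (x, y) = (lon, lat)
    for name, boundary in precincts.items():
        if boundary.contains(point):
            return name
    return None
```

Note that shapely points take longitude first, so the coordinates returned by the geocoder have to be swapped before the containment test.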
The information on police precincts was obtained by scraping the NYPD’s website using the BeautifulSoup library, and also from this specific database. The historical crime data was obtained from the NYC Open Data website, and cleaning was done using Pandas and GeoPandas. The data was then stored in a SQLite database. Crime rates were forecast using a seasonal ARIMA model through the Python library StatsModels. I used a grid search to choose the model parameters, selecting the combination that minimizes the validation error.
Dependencies
- Python 2.7
- StatsModels (0.8.0rc1)
- Pandas (0.19.1)
- GeoPandas (0.2.1)
- Geopy (1.11.0)
- Shapely (1.5.17)
- Flask (0.11.1)
- Basemap (1.0.7)
- Matplotlib (1.5.3)
- Numpy (1.11.2)
- Beautifulsoup4 (4.5.3)
- Sphinx (only to build documentation)
- pytest (only for testing)
Running it on your own computer
To run this web application on your computer, make sure you have obtained or built the SQLite database and have all the dependencies installed. You can install all the dependencies using pip (except for Python, Sphinx, Basemap and StatsModels) by typing the following command from the
pip install -r requirements.txt
To install Basemap (1.0.7) and StatsModels (0.8.0rc1) use the Anaconda distribution.
Then run the command in the
You should see something like:
Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
Enter the address http://0.0.0.0:5000/ into your web browser to use the web application.
Building the database
To build the database on your local machine, first download the file “NYPD_7_Major_Felony_Incident_Map.csv” from the NYC Open Data website and place it in the CrimeTime/data/ directory. Then type the following command into your terminal from the
NOTE: If NYC Open Data no longer has the file on their website, please email me and I will provide you with the database.
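For readers curious what a build step of this kind involves, here is a rough sketch of loading incident records into SQLite with Pandas and querying monthly counts. It is not the project's build script: the column names (`Offense`, `Precinct`, `Occurrence Date`), the table name `incidents`, and the helper names are hypothetical, and a small in-memory sample stands in for the downloaded CSV.

```python
import io
import sqlite3

import pandas as pd

# Tiny stand-in for the downloaded CSV; the column names are hypothetical.
SAMPLE_CSV = io.StringIO(
    "Offense,Precinct,Occurrence Date\n"
    "ASSAULT,1,01/15/2015\n"
    "ASSAULT,1,01/20/2015\n"
    "BURGLARY,1,02/03/2015\n"
    "ASSAULT,5,02/10/2015\n"
)


def build_database(csv_source, connection):
    """Clean the raw incidents and write them to an `incidents` table."""
    df = pd.read_csv(csv_source)
    # Normalize headers to snake_case so they are easy to use in SQL.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    dates = pd.to_datetime(df["occurrence_date"], format="%m/%d/%Y")
    df["month"] = dates.dt.strftime("%Y-%m")  # e.g. "2015-01"
    df = df.drop(columns="occurrence_date")
    df.to_sql("incidents", connection, if_exists="replace", index=False)


def monthly_counts(connection, precinct, offense):
    """Return a DataFrame of incident counts per month for one precinct."""
    query = """
        SELECT month, COUNT(*) AS n
        FROM incidents
        WHERE precinct = ? AND offense = ?
        GROUP BY month
        ORDER BY month
    """
    return pd.read_sql_query(query, connection, params=(precinct, offense))


connection = sqlite3.connect(":memory:")  # use a file path for a real database
build_database(SAMPLE_CSV, connection)
counts = monthly_counts(connection, precinct=1, offense="ASSAULT")
```

A monthly series of this shape is the kind of input a seasonal forecasting model would then be fit on.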
To test the code and make sure it works, run the following command in your terminal from the
You will then see a report on the testing results.
To build the documentation for this code, type the following command in your terminal from
sphinx-apidoc -F -o doc/ backend/
Then cd into the doc/ directory and type:
make html
The HTML documentation will be in the directory _build/html/. Open the file index.html in that directory.