Data Science Insight 2

In the Linear Digressions podcast episode “Criminology and Data Science,” Zach Drake advocates for a “crime and place” approach to collecting data on crime locations, arguing that it is more effective than the traditional, person-centered method. Instead of seeking patterns and shared characteristics across a variety of crime locations, as current practice does, Drake suggests focusing on the many spatial features of each particular location. The approach also employs smaller units of analysis, such as street segments rather than neighborhoods, to better capture how crime varies from place to place. The person-centered view collapses the data, often returning very weak correlations, and leaves much of the variability unexplained; models built this way typically reach an R² of only .2 or .3. One possible explanation is that the underlying process is so complex that analysts cannot include every relevant variable, making it extremely difficult to build a reliable predictive model.
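
The figure quoted above is the coefficient of determination (R²), the share of variance in the outcome that a regression explains. As a rough, hedged illustration of what an R² around .2 looks like, the sketch below fits a simple linear regression on synthetic data in which a single, hypothetical person-level predictor accounts for only a small part of a crime outcome. Nothing here comes from the episode’s actual data; the predictor name and the numbers are invented for illustration.

```python
# Illustrative sketch only: synthetic data, not the podcast's actual analysis.
# Shows what an R^2 near 0.2 looks like when a single person-level predictor
# explains only a small share of the variance in the outcome.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical person-centered predictor (e.g., an aggregate "risk score").
risk_score = rng.normal(size=n)

# Outcome driven mostly by factors the model never sees, so this one
# predictor can only explain a fraction of the variance.
crime = 0.5 * risk_score + rng.normal(scale=1.0, size=n)

X = risk_score.reshape(-1, 1)
model = LinearRegression().fit(X, crime)
print(f"R^2 = {model.score(X, crime):.2f}")  # lands near 0.2, echoing the figure above
```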

When using the “crime and place” method, analysts can precisely map where crimes took place and the spatial features nearby. The uniquely spatial variables, such as traffic patterns, variability in public transportation, or street lighting, are then used to build models that explain crime rates. This method returns a higher coefficient of determination and reveals correlations between spatial features and crime that the person-centered method might overlook simply because the analyst did not expect a given feature to affect crime rates. While neither process can establish what causes crime, the “crime and place” method surfaces the correlates and better explains the variation in the data. It feeds intricate and abundant inputs into linear regressions that yield a stronger fit. This area of criminology is improved by allowing the covariates, the spatial features, to correlate with crime rates on their own, rather than requiring each one to be identified, deemed relevant by the analyst, and then shown to be statistically significant. The perspective prioritizes the interdependence of the covariates and the results they yield rather than neglecting one for the other.
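
As a hedged sketch of the kind of place-based regression described above, the example below uses synthetic street-segment data with a few hypothetical spatial covariates (traffic volume, nearby transit stops, street-lighting coverage). The feature names and coefficients are assumptions made for illustration, not variables from the episode; the point is only that richer, place-level inputs can yield a noticeably higher R² than the single-predictor sketch earlier.

```python
# Illustrative sketch, not the analysis from the episode: synthetic
# street-segment data with several hypothetical spatial covariates.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_segments = 2_000

# Hypothetical spatial features measured per street segment.
traffic = rng.normal(size=n_segments)        # traffic volume
transit_stops = rng.poisson(2, n_segments)   # nearby public-transit stops
lighting = rng.uniform(0, 1, n_segments)     # street-lighting coverage (0 to 1)

X = np.column_stack([traffic, transit_stops, lighting])

# Synthetic crime counts driven largely by those spatial features.
crime = (1.0 * traffic + 0.8 * transit_stops - 1.5 * lighting
         + rng.normal(scale=1.0, size=n_segments))

model = LinearRegression().fit(X, crime)
print(f"R^2 = {model.score(X, crime):.2f}")  # noticeably higher than the person-level sketch
```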

Drake’s assertion could also be applied elsewhere in criminology to reduce racial inequality in the judicial system, particularly in recidivism risk assessments. Currently, COMPAS, a popular recidivism algorithm that influences jail time, relies exclusively on the personal experiences and characteristics of defendants to predict their likelihood of reoffending. The algorithm uses 137 inputs, including employment, parental imprisonment, and location of residence. It consistently scores black defendants as about twice as likely to reoffend as white defendants, yet race is not one of its inputs. The inputs themselves, however, can be heavily shaped by race through residual racism embedded in job opportunities, the criminal justice system, and society in general. This produces bias in the predictive technology even though the input data is clean and seemingly unprejudiced. Recidivism algorithms could be improved by focusing on the outcomes of defendants with similar inputs, rather than concentrating only on the inputs themselves (which are overwhelmingly laced with racism and stigma). That would remove the human bias carried by certain inputs, just as the “crime and place” method removes human judgments about which spatial features matter to crime locations. An improved algorithm would consider how the covariates interact with the actual outcomes of previous offenders.
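
To make that proxy mechanism concrete, the following is a purely synthetic, hedged sketch, not COMPAS’s actual model or data: race is never given to the model, but one input (a hypothetical “prior arrests” count) is generated with a group-dependent skew standing in for unequal treatment, and the resulting risk scores differ by group anyway even though the underlying reoffense rates do not.

```python
# Purely synthetic illustration (not COMPAS's model or data): a feature shaped
# by unequal treatment carries bias into predictions even when the protected
# attribute is excluded from the inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5_000

group = rng.integers(0, 2, n)          # protected attribute, never given to the model
underlying_risk = rng.normal(size=n)   # same distribution for both groups

# Biased measurement: one group accumulates extra recorded arrests
# for the same underlying behavior.
prior_arrests = np.clip(underlying_risk + 0.8 * group
                        + rng.normal(scale=0.5, size=n), 0, None)
reoffend = (underlying_risk + rng.normal(scale=1.0, size=n) > 0).astype(int)

X = prior_arrests.reshape(-1, 1)       # race is not an input
model = LogisticRegression().fit(X, reoffend)
risk = model.predict_proba(X)[:, 1]

print(f"mean predicted risk, group 0: {risk[group == 0].mean():.2f}")
print(f"mean predicted risk, group 1: {risk[group == 1].mean():.2f}")
# Group 1 scores higher on average despite identical underlying risk.
```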