It is difficult to draw conclusions from these data due to abundant sampling error.
The sampling methods were not consistent throughout the data collection process. Some students used a random number generator to choose which profiles to sample, whereas others chose the profiles that were the most legible. Because the selection criteria differed, the aggregate sample cannot be considered truly random.
Much of the error also lies within the source material itself. The profiles were not uniformly formatted throughout the time period in question. The information provided was not consistent in terms of vocabulary used or categories provided. For this reason, we standardized the qualitative data to fit within a few specific categories.
The changes that were made to the data are as follows:
All eye colors were fixed to be one color (some were given as two different colors or a mixture of two), and were fixed to be Blue, Brown, Grey, Black, or Hazel. Any variation of these (e.g. Light Blue or Dark Blue) was fixed to be the base color (e.g. Blue).
All hair colors were fixed to be one color (some were given as two different colors or a mixture of two), and were fixed to be Black, Dark Brown, Brown, Light Brown, Red, Grey, and Light. Any other colors that were listed (e.g. Chestnut, Blond, Auburn, etc.) were fixed to fall within these categories (e.g. Brown, Light, Red, etc.). If a hair color fell within two categories, the color that was listed first was chosen to be the color.
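As an illustration only, the hair color rule described above could be sketched in Python as follows. The names `HAIR_CATEGORIES`, `HAIR_SYNONYMS`, and `standardize_hair` are hypothetical, the synonym table contains only the three examples mentioned above (paired in the order they are listed, which is an assumption), and the separators used to detect two listed colors are a guess at how the register entries were written. This is not the procedure we actually followed, which was done by hand.

```python
import re

# The seven target categories named in the text.
HAIR_CATEGORIES = ["Black", "Dark Brown", "Brown", "Light Brown", "Red", "Grey", "Light"]

# Hypothetical synonym table: only the examples given in the text,
# paired in the order they are listed there (an assumption).
HAIR_SYNONYMS = {"chestnut": "Brown", "blond": "Light", "auburn": "Red"}

# Lookup from lower-cased category name to its canonical spelling.
_CANONICAL = {name.lower(): name for name in HAIR_CATEGORIES}

def standardize_hair(raw):
    """Return the category for the first color listed, or None if unrecognized."""
    # Assumed separators between two listed colors: "/", ",", "&", "and", "or".
    parts = re.split(r"\s*(?:/|,|&|\band\b|\bor\b)\s*", raw.strip(), flags=re.IGNORECASE)
    first = parts[0].strip().lower()
    if first in HAIR_SYNONYMS:
        return HAIR_SYNONYMS[first]
    # None flags an entry that still needs a manual decision.
    return _CANONICAL.get(first)
```

For example, `standardize_hair("Dark Brown or Black")` keeps the color listed first and returns `"Dark Brown"`, while an entry outside the table returns `None` so that it can be reviewed by hand.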
All skin tones were fixed to be Black, Dark, Florid, Fair, and Medium. Any other tones that were listed (e.g. Sallow, Ruddy, Mulatto, etc.) were fixed to fall within one of the given categories. Any extra features (e.g. Freckles) were removed and are not accounted for in the final data set. This section required a great amount of fixing, as there was a broad range of complexions in our sample. Also, when analyzing the data in relation to time, Medium skin tones did not appear until 1879. Florid skin tones appeared more frequently earlier in the data but stopped appearing after 1894. When two skin tones were listed (e.g. Dark or Florid), the tone that was listed first was chosen.
In terms of the crimes committed, crimes were fixed to be uniform. This meant reducing every type of Larceny, Burglary, Robbery, Assault, Forgery, Manslaughter, and Murder to a single category for each crime. This removed distinctions between Grand Larceny and Petty Larceny, as well as any distinction between different degrees of crime (e.g. 3rd degree Burglary and 1st degree Burglary). This was done in order to compile all of the data. Some profiles listed degrees while others did not, so the degrees were removed entirely. In addition, all crimes listed as “Assault to” (e.g. Assault to Commit Rape, Assault to Commit Robbery) were considered to be Assault.
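A minimal sketch of how these crime rules might be expressed in Python is given below. The names `BASE_CRIMES` and `standardize_crime` are hypothetical, and the sketch assumes that each register entry names a single crime; entries naming more than one crime are not handled specially here.

```python
import re

# Base categories that the text says were collapsed to one label each.
BASE_CRIMES = ["Larceny", "Burglary", "Robbery", "Assault", "Forgery", "Manslaughter", "Murder"]

def standardize_crime(raw):
    """Collapse degrees and subtypes of a crime to its base category."""
    text = raw.strip()
    # "Assault to Commit Robbery" counts as Assault, so this check must run
    # before the search for other crime names.
    if re.match(r"assault\s+to\b", text, flags=re.IGNORECASE):
        return "Assault"
    for crime in BASE_CRIMES:
        # Whole-word match, so "Grand Larceny" and "3rd degree Burglary"
        # reduce to "Larceny" and "Burglary".
        if re.search(rf"\b{crime}\b", text, flags=re.IGNORECASE):
            return crime
    # Anything else (e.g. Counterfeiting) is returned unchanged.
    return text
```

With this sketch, `standardize_crime("3rd degree Burglary")` gives `"Burglary"` and `standardize_crime("Assault to Commit Rape")` gives `"Assault"`.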
Only Larceny, Burglary, Robbery, Assault, Forgery, Counterfeiting, Rape, Murder and Manslaughter were included in the final analysis, as these were the most common crimes committed (those exceeding 20 instances) or those that were determined to be otherwise significant (e.g. Rape and Murder).
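The inclusion rule above can be illustrated with a short Python sketch. The names `MIN_COUNT`, `ALWAYS_KEEP`, and `crimes_to_analyze` are hypothetical, and `ALWAYS_KEEP` lists only the two crimes given above as examples of "otherwise significant", so it should be read as an assumption rather than the full list.

```python
from collections import Counter

MIN_COUNT = 20                   # keep crimes with more than 20 instances
ALWAYS_KEEP = {"Rape", "Murder"}  # assumed: only the examples named in the text

def crimes_to_analyze(crimes):
    """Return the set of standardized crime labels kept for the final analysis."""
    counts = Counter(crimes)
    return {
        crime
        for crime, n in counts.items()
        if n > MIN_COUNT or crime in ALWAYS_KEEP
    }
```

Given a made-up list with 21 Larceny entries, 20 entries of some other crime, and 2 Murder entries, the function keeps Larceny (more than 20 instances) and Murder (always kept) and drops the crime with exactly 20.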
In terms of crimes committed versus complexion, any attempted crime charges were included within their greater categories. Counterfeiting charges were not included in this section because all of the Counterfeiting charges fell between 1896 and 1907. I did not believe that this was representative of the data as a whole, and chose to leave them out. An analysis of the birth places of the inmates sampled was also not included in order to maintain the scale of the project.
Some data were left unaccounted for (e.g. Occupation, Religion, Literacy). This was due to difficulty in standardizing the data (as in the case of Occupation), or because it was deemed less important for the project as a whole (as in the case of Religion and Literacy). There is also some missing data, such as a large gap in the reporting of crimes and sentences between June 1880 and May 1884. There are other, smaller gaps in the data for eye color, hair color, and complexion, most notably between July 1865 and February 1868. These gaps exist because the data are either missing from the original register or unreadable on it.
In making these changes to the data, the accuracy of the sample is diminished, especially when accounting for human error. It is possible that mistakes were made when handling the data, which can lead to further inaccuracies.
This can be considered to be a preliminary analysis of the data. It provides a simple overview of the sample but is by no means exhaustive; it cannot accurately describe the population. There are many sources of error and, as such, no true statistical analysis or extrapolation of any kind should be drawn from these data or the charts provided. These data should be used only as a way to illustrate the concepts discussed throughout this site.
If you would like to view the source material, you may do so here [https://docs.google.com/spreadsheets/d/1rHRQhDtqeZfXuY03lDcGL9YR7e6RTB0s7sYkWBrFsqQ/edit?usp=sharing].