# Projects and Jupyter Notebooks

This Jupyter Book deployment houses some of the smaller projects that I work on in my spare time. These notebooks are mostly about practicing the skills that I learn in class on data sets that I find on my own, not about drawing conclusions from the data. I am not concerned with the validity of the data in these notebooks, only with practicing the techniques being taught. The full-scale project portfolio is available online at portfolio.cpyles.com.

| Notebook | Summary |
|----------|---------|
| insurance.ipynb | This notebook focuses on an insurance data set that has information about premiums, BMI, sex, and smoker vs. non-smoker status, among other things. I looked for a linear relationship between BMI and insurance premiums using the correlation coefficient, and I ran an A/B test to determine whether smokers are charged more in insurance premiums than non-smokers. |
| avocado.ipynb | This notebook, written in R and Python, has a $k$-nearest neighbors classifier that classifies avocados as organic or conventional using their average price, volume sold, and number of bags used. I then used the test set to determine the optimum value of $k$, which was 35 (with an accuracy of 94.4%). It also contains a principal component analysis of the data set. |
| graduate-admissions.ipynb | This notebook focuses on graduate program admissions data. I started with a principal component analysis of the data, and then used sklearn to create a linear model to predict the chance of admission. I ended by bootstrapping the data set to determine the steady-state accuracy of the model. |
| movies.ipynb | [IN PROGRESS] In this notebook, I am building a $k$-NN classifier that determines a movie’s genre by analyzing the frequency of words in its synopsis. |
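
The two techniques named for insurance.ipynb, the correlation coefficient and an A/B test, can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the data are synthetic, the names `bmi`, `smoker`, and `charges` are hypothetical stand-ins for the real columns, and the A/B test is written here as a one-sided permutation test, which is only one common way to run such a comparison.

```python
import numpy as np


def correlation(x, y):
    """Pearson correlation: the mean of the product of x and y in standard units."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_su = (x - x.mean()) / x.std()
    y_su = (y - y.mean()) / y.std()
    return float(np.mean(x_su * y_su))


def ab_test(group_a, group_b, n_permutations=5000, seed=0):
    """One-sided permutation test of whether mean(group_a) exceeds mean(group_b).

    Returns the observed difference in means and the empirical p-value.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values and re-split them into two groups
        # of the original sizes, simulating "labels do not matter".
        shuffled = rng.permutation(pooled)
        diff = shuffled[: len(a)].mean() - shuffled[len(a):].mean()
        if diff >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


# Synthetic stand-in data (an assumption, not the real insurance data set).
rng = np.random.default_rng(42)
bmi = rng.normal(30, 6, size=200)
smoker = rng.random(200) < 0.25
charges = 3000 + 150 * bmi + 20000 * smoker + rng.normal(0, 4000, size=200)

r = correlation(bmi, charges)
observed_diff, p_value = ab_test(charges[smoker], charges[~smoker])
print(f"correlation between BMI and charges: {r:.3f}")
print(f"smoker minus non-smoker mean charge: {observed_diff:.0f} (p = {p_value:.4f})")
```

A small p-value here means that a difference as large as the observed one rarely arises when the smoker labels are shuffled at random, which is the evidence the A/B test looks for.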