Projects and Jupyter Notebooks
This Jupyter Book deployment houses some of the smaller projects that I work on in my spare time. These notebooks are for practicing the skills that I learn in class on data sets that I find on my own, not for drawing conclusions from data. I am not concerned with the validity of the data in these notebooks, only with practicing the techniques being taught. You can see my full-scale project portfolio online at portfolio.cpyles.com.
Notebook | Summary |
---|---|
insurance.ipynb | This notebook focuses on an insurance data set with information about premiums, BMI, sex, and smoking status, among other things. I looked for a linear relationship between BMI and insurance premiums using the correlation coefficient, and I ran an A/B test to determine whether smokers are charged higher premiums than non-smokers. |
avocado.ipynb | This notebook, written in R and Python, contains a $k$-nearest neighbors classifier that labels avocados as organic or conventional using their average price, volume sold, and number of bags used. I then used the test set to determine the optimum value of $k$, which was 35 (with an accuracy of 94.4%). The notebook also contains a principal component analysis of the data set. |
graduate-admissions.ipynb | This notebook focuses on graduate program admissions data. I started with a principal component analysis of the data, and then used sklearn to create a linear model that predicts the chance of admission. I ended by bootstrapping the data set to determine the steady-state accuracy of the model. |
movies.ipynb | [IN PROGRESS] In this notebook, I am building a $k$-NN classifier that determines a movie’s genre by analyzing the frequency of words in its synopsis. |
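
As a rough illustration of the two techniques named in the insurance.ipynb summary (a correlation coefficient and an A/B test), here is a minimal sketch. It is not the code from that notebook: the data below is synthetic and generated on the spot, the permutation-based test is an assumed choice of method, and the names `correlation`, `permutation_test`, `bmi`, `smoker`, and `charges` are hypothetical.

```python
import numpy as np


def correlation(x, y):
    """Pearson correlation: the mean of the product of x and y in standard units."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_su = (x - x.mean()) / x.std()
    y_su = (y - y.mean()) / y.std()
    return float(np.mean(x_su * y_su))


def permutation_test(values, in_group_b, n_permutations=2000, seed=0):
    """One-sided A/B test: is the mean of group B larger than the mean of group A?

    Shuffles the group labels and counts how often the shuffled difference
    in means is at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    in_group_b = np.asarray(in_group_b, dtype=bool)
    observed = values[in_group_b].mean() - values[~in_group_b].mean()
    at_least_as_large = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(in_group_b)
        diff = values[shuffled].mean() - values[~shuffled].mean()
        if diff >= observed:
            at_least_as_large += 1
    return float(observed), at_least_as_large / n_permutations


# Synthetic stand-in data (NOT the notebook's data set): premiums rise
# with BMI and are higher for smokers, plus random noise.
rng = np.random.default_rng(42)
n = 200
bmi = rng.normal(30, 6, n)
smoker = rng.random(n) < 0.25
charges = 3000 + 200 * bmi + 15000 * smoker + rng.normal(0, 4000, n)

r = correlation(bmi, charges)
observed_diff, p_value = permutation_test(charges, smoker)
print(f"r = {r:.3f}, difference in means = {observed_diff:.0f}, p-value = {p_value:.4f}")
```

A small p-value here only reflects how the synthetic data was generated; it says nothing about real insurance premiums.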