= What is this? =

This is a small repository of suggestions and guidelines intended to help you get started experimenting with simple ML models.

These are the docs:

* link:docs/GETTING_UP.adoc[Getting Up]: how to figure out what the hell this is all about
* link:docs/GETTING_RUNNING.adoc[Getting Running]: how to set up your Conda environments so you can start playing
* link:docs/JUPYTERLAB.adoc[JupyterLab]: modify your base Conda env to run JupyterLab and easily execute notebooks in other envs

== Magic Time ==

The https://www.kaggle.com/datasets/yasserh/wine-quality-dataset[Wine Quality Dataset] is a versatile dataset that can be used for either classification or regression.
Download it from the link above and place it in the same directory as the other files, naming it `WineQT.csv`.
Each sample is an 11-dimensional feature vector describing a wine's chemical composition, plus one integer label between 0 and 10 expressing the rating the wine received.
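
As a rough illustration of that layout, here is a minimal sketch that separates the features from the label using pandas. The column names `quality` (the rating) and `Id` (a row identifier) are assumptions about the CSV header, and `load_wine` and `split_features_and_label` are hypothetical helper names, not taken from the scripts in this repository.

```python
import pandas as pd

LABEL_COLUMN = "quality"  # assumption: name of the rating column in WineQT.csv
ID_COLUMN = "Id"          # assumption: row-identifier column, dropped if present


def split_features_and_label(df):
    """Return (features, labels) from a wine-quality data frame."""
    # Everything except the label and the optional id column is a feature.
    features = df.drop(columns=[LABEL_COLUMN, ID_COLUMN], errors="ignore")
    labels = df[LABEL_COLUMN].astype(int)
    return features, labels


def load_wine(path="WineQT.csv"):
    """Read the CSV from disk and split it into features and labels."""
    return split_features_and_label(pd.read_csv(path))
```

With the file in place, `X, y = load_wine()` should give a frame `X` with one column per chemical measurement and a series `y` of integer ratings.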

=== What the Files Do ===

The following files are available in this project:

`wine-sklearn.py`::
A scikit-learn script that loads the data, splits it into training and testing subsets, normalizes the features, and trains a _C-Support Vector Classification_ model (called `SVC` in scikit-learn). It then visualises the model's performance using a _confusion matrix_ and a heatmap. The commented-out part, which trains a modified SVC called `NuSVC` and has an issue, is meant to demonstrate how awkward it is to test and fix a script by constantly re-running it.

`wine-sklearn.ipynb`::
The same as the script above, but as a JupyterLab notebook. Because you can choose which cells to run, nothing is commented out. You are free to re-run sections of the notebook as often as you want, provided that the cells they depend on have already been run.
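
The steps the script goes through (split, normalize, train an `SVC`, build a confusion matrix) can be sketched roughly as follows. This is not the code from `wine-sklearn.py`: the function name `train_and_evaluate`, the use of `StandardScaler` for normalization, the 80/20 split, and the default `SVC` parameters are all assumptions made for illustration. Plotting is left out to keep the sketch short.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_and_evaluate(X, y, test_size=0.2, seed=0):
    """Split, scale, fit an SVC, and return (model, confusion matrix)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Fit the scaler on the training subset only, then apply it to both
    # subsets, so no statistics from the test data reach the model.
    scaler = StandardScaler().fit(X_train)
    model = SVC()  # assumption: default hyperparameters
    model.fit(scaler.transform(X_train), y_train)

    y_pred = model.predict(scaler.transform(X_test))

    # Fix the label order so the matrix has one row and one column per
    # rating seen in the data, even if the test subset lacks some of them.
    labels = np.unique(y)
    return model, confusion_matrix(y_test, y_pred, labels=labels)
```

The returned matrix has true ratings along the rows and predicted ratings along the columns, which is the shape a heatmap plot expects. In a notebook, the split, the scaling, and the training would typically live in separate cells, so only the training cell needs re-running when the model changes.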