= What is this? = This is a small repository of suggestions and guidelines intended to help you getting to experiment with simple ML models. These are the docs: * link:docs/GETTING_UP.adoc[Getting Up]: how to figure out what the hell this is all about * link:docs/GETTING_RUNNING.adoc[Getting Running]: how to set up your Conda environments so you can start playing * link:docs/JUPYTERLAB.adoc[JupyterLab]: modify your base Conda env to run JupyterLab and easily execute notebooks in other envs For fun: * link:docs/INSTRUCTLAB.adoc[InstructLab]: how to play with HuggingFace models from InstructLab == Magic Time == https://www.kaggle.com/datasets/yasserh/wine-quality-dataset[Wine Quality Dataset] is a versatile dataset that can be used both as a classification or a regression data set. Download it from the above link and place it in the same directory as the other files, calling it `WineQT.csv`. It has features using 11-dimension tensors describing a wine's chemical composition, with one integer label between 0 and 10 to express the rating the wine got. === What the Files Do === The following files are available in this project: `wine-sklearn.py`:: A SciKit-Learn script that loads data, splits it into training and testing subsets, normalizes the features and trains a _C-Support Vector Classification_ model called `SVC` in SKLearn. It then proceeds to visualise the efficiency of the model using a _confusion matrix_ and a heatmap. The idea is that the commented part, training of a modified SVC called NuSVC, which has an issue, would demonstrate how awkward it is to test and fix the script by constantly re-running it. `wine-sklearn.ipynb`:: The same as the above script, only using a JupyterLab notebook. Because you can be selective about which cells to run, nothing is commented out. You are free to re-run sections of the notebook as often as you want, but of course - provisions have to be made for prerequisites.