= What is this? =

This is a small repository of suggestions and guidelines intended to help you get started experimenting with simple ML models.

These are the docs:

* link:docs/GETTING_UP.adoc[Getting Up]: how to figure out what the hell this is all about
* link:docs/GETTING_RUNNING.adoc[Getting Running]: how to set up your Conda environments so you can start playing
* link:docs/JUPYTERLAB.adoc[JupyterLab]: modify your base Conda env to run JupyterLab and easily execute notebooks in other envs

== Magic Time ==

https://www.kaggle.com/datasets/yasserh/wine-quality-dataset[Wine Quality Dataset] is a versatile dataset that can be used either as a classification or as a regression dataset.

Download it from the above link and place it in the same directory as the other files, calling it `WineQT.csv`.

Each sample has an 11-dimensional feature vector describing a wine's chemical composition, plus one integer label between 0 and 10 expressing the rating the wine got.
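
As a rough sketch of what that layout means in practice, the snippet below separates the feature columns from the label using pandas. The helper name `split_features_label` is hypothetical, and the column names `quality` and `Id` are assumptions about the Kaggle CSV rather than something this repository guarantees, so adjust them if your copy differs.

```python
import pandas as pd


def split_features_label(df, label_col="quality", drop_cols=("Id",)):
    """Return (X, y): the feature columns and the integer label column.

    `label_col` and `drop_cols` are assumed column names, not confirmed
    by this repository.
    """
    # Drop the label plus any bookkeeping columns that are actually present.
    to_drop = [label_col] + [c for c in drop_cols if c in df.columns]
    X = df.drop(columns=to_drop)
    y = df[label_col].astype(int)
    return X, y
```

With the file in place, `X, y = split_features_label(pd.read_csv("WineQT.csv"))` should leave the feature columns in `X` and the ratings in `y`, provided the column-name assumptions hold.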

=== What the Files Do ===

The following files are available in this project:

`wine-sklearn.py`::
    A scikit-learn script that loads the data, splits it into training and testing subsets, normalizes the features, and trains a _C-Support Vector Classification_ model (called `SVC` in scikit-learn). It then visualises the model's performance as a _confusion matrix_ rendered as a heatmap. The commented-out part trains a modified SVC called `NuSVC` and has an issue; it is there to demonstrate how awkward it is to test and fix a script when every change means re-running the whole thing.

`wine-sklearn.ipynb`::
    The same as the script above, but as a JupyterLab notebook. Because you can choose which cells to run, nothing is commented out. You are free to re-run sections of the notebook as often as you want, as long as the cells they depend on have already been run.
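
The steps both files walk through can be sketched roughly as follows. This is not the code from `wine-sklearn.py`: the function name `train_and_evaluate`, the 80/20 split, the fixed seed and the choice of `StandardScaler` for normalization are all assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def train_and_evaluate(X, y, test_size=0.2, seed=0):
    """Split, normalize, fit an SVC and return (model, confusion matrix).

    The defaults (test_size, seed) are illustrative assumptions.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    # Fit the scaler on the training subset only, then reuse it on the test subset.
    scaler = StandardScaler().fit(X_train)
    model = SVC().fit(scaler.transform(X_train), y_train)
    y_pred = model.predict(scaler.transform(X_test))
    # Rows are true labels, columns are predicted labels.
    labels = np.unique(y)
    return model, confusion_matrix(y_test, y_pred, labels=labels)
```

The returned matrix can then be passed to a plotting function such as seaborn's `heatmap`. In a notebook, each of these steps would typically live in its own cell, which is why a cell that trains the model needs the splitting and scaling cells to have run first.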