ml/demo

暫無描述

Grega Bremec 372b47bbb4 update with results from jupyter notebook run		3 月之前
code	372b47bbb4 update with results from jupyter notebook run	3 月之前
docs	b3357c5cdb add data sources, fix download instructions in readme, fix tensorflow env temporarily	3 月之前
envs	88325ed940 finally complete the tflow 2.19 env	3 月之前
.gitignore	b3357c5cdb add data sources, fix download instructions in readme, fix tensorflow env temporarily	3 月之前
README.adoc	b3357c5cdb add data sources, fix download instructions in readme, fix tensorflow env temporarily	3 月之前

		
				README.adoc
			
				= What is this? =

This is a small repository of suggestions and guidelines intended to help you getting to experiment with simple ML models.

These are the docs:

* link:docs/GETTING_UP.adoc[Getting Up]: how to figure out what the hell this is all about
* link:docs/GETTING_RUNNING.adoc[Getting Running]: how to set up your Conda environments so you can start playing
* link:docs/JUPYTERLAB.adoc[JupyterLab]: modify your base Conda env to run JupyterLab and easily execute notebooks in other envs

For fun:

* link:docs/INSTRUCTLAB.adoc[InstructLab]: how to play with HuggingFace models from InstructLab

== Magic Time ==

https://www.kaggle.com/datasets/yasserh/wine-quality-dataset[Wine Quality Dataset] is a versatile dataset that can be used both as a classification or a regression data set.

Download it from the above link and place it at the top of this git repository, in the same directory as this file, calling it `WineQT.csv`.

It has features using 11-dimension tensors describing a wine's chemical composition, with one integer label between 0 and 10 to express the rating the wine got.

=== What the Files Do ===

The following files are available in this project:

`code/wine-sklearn.py`::
    A SciKit-Learn script that loads data, splits it into training and testing subsets, normalizes the features and trains a _C-Support Vector Classification_ model called `SVC` in SKLearn. It then proceeds to visualise the efficiency of the model using a _confusion matrix_ and a heatmap. The idea is that the commented part, training of a modified SVC called NuSVC, which has an issue, would demonstrate how awkward it is to test and fix the script by constantly re-running it. Run this in `sklearn-16` environment, by executing `python3 ./code/wine-sklearn.py` from the top level directory.

`code/wine-sklearn.ipynb`::
    Starts the same as the above script, only using a JupyterLab notebook. Because you can be selective about which cells to run, nothing is commented out. You are free to re-run sections of the notebook as often as you want, but of course - provisions have to be made for prerequisites, blocks of code that either declare some variables or process their data in some way.
+
In addition to fitting a `SVC` and a `NuSVC` classification models, it also shows how the Wine Quality Dataset can be used with regression by fitting a `SVR` model.
+
Run the examples in this notebook after you enabled JupyterLab and added the kernels to the base environment. If you named your kernels differently, ensure you chose the correct one in the top-right corner after opening the notebook.