Load the Wine Quality (Combined) dataset.
import pandas as pd
data = pd.read_csv("../WineQuality.csv")
Preview the data.
data
  | Unnamed: 0 | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2732 | 7.4 | 0.170 | 0.29 | 1.4 | 0.047 | 23.0 | 107.0 | 0.99390 | 3.52 | 0.65 | 10.4 | 6 | White Wine |
1 | 2607 | 5.3 | 0.310 | 0.38 | 10.5 | 0.031 | 53.0 | 140.0 | 0.99321 | 3.34 | 0.46 | 11.7 | 6 | White Wine |
2 | 1653 | 4.7 | 0.145 | 0.29 | 1.0 | 0.042 | 35.0 | 90.0 | 0.99080 | 3.76 | 0.49 | 11.3 | 6 | White Wine |
3 | 3264 | 6.9 | 0.260 | 0.29 | 4.2 | 0.043 | 33.0 | 114.0 | 0.99020 | 3.16 | 0.31 | 12.5 | 6 | White Wine |
4 | 4931 | 6.4 | 0.450 | 0.07 | 1.1 | 0.030 | 10.0 | 131.0 | 0.99050 | 2.97 | 0.28 | 10.8 | 5 | White Wine |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
32480 | 2838 | 5.0 | 0.255 | 0.22 | 2.7 | 0.043 | 46.0 | 153.0 | 0.99238 | 3.75 | 0.76 | 11.3 | 6 | White Wine |
32481 | 6414 | 6.6 | 0.360 | 0.52 | 11.3 | 0.046 | 8.0 | 110.0 | 0.99660 | 3.07 | 0.46 | 9.4 | 5 | White Wine |
32482 | 1126 | 6.3 | 0.200 | 0.24 | 1.7 | 0.052 | 36.0 | 135.0 | 0.99374 | 3.80 | 0.66 | 10.8 | 6 | White Wine |
32483 | 2924 | 6.2 | 0.200 | 0.33 | 5.4 | 0.028 | 21.0 | 75.0 | 0.99012 | 3.36 | 0.41 | 13.5 | 7 | White Wine |
32484 | 5462 | 8.1 | 0.280 | 0.46 | 15.4 | 0.059 | 32.0 | 177.0 | 1.00040 | 3.27 | 0.58 | 9.0 | 4 | White Wine |
32485 rows × 14 columns
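Before building a model, it is worth checking how the quality labels are distributed, since imbalanced scores affect what a given accuracy actually means. A minimal check on the quality column:
print(data["quality"].value_counts().sort_index())  # samples per quality score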
Retain the 11 useful features and isolate quality as the label, dropping the leftover Unnamed: 0 index column and the Type column.
x_data = data.drop(data.columns[0], axis=1).drop(["quality", "Type"], axis=1)  # 11 physicochemical features
y_data = data.quality  # integer quality scores
Split the samples into 80/20 train/test subsets.
train_size = int(len(x_data) * 0.8)
x_train = x_data[:train_size]
y_train = y_data[:train_size]
x_test = x_data[train_size:]
y_test = y_data[train_size:]
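This slice keeps the rows in file order, so it only gives a fair split if the file is already shuffled (the scattered Unnamed: 0 values suggest it is). A sketch of a drop-in alternative using scikit-learn, which shuffles explicitly:
from sklearn.model_selection import train_test_split
# Shuffle the rows before splitting so the split does not depend on file order.
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data,
                                                    test_size=0.2, random_state=42)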
print("Using {} samples for training and {} samples for training.\n".format(len(x_train), len(x_test)) +
"Total of {} records, dataset size is {} rows.\n".format(len(x_train) + len(x_test), len(x_data)) +
"Training set has a shape of {}, labels have a shape of {}".format(x_data.shape, y_data.shape))
Using 25988 samples for training and 6497 samples for training. Total of 32485 records, dataset size is 32485 rows. Training set has a shape of (32485, 11), labels have a shape of (32485,)
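Note the features are used unscaled, even though they span very different ranges (density near 1, total sulfur dioxide in the hundreds); that mismatch likely contributes to the very large initial loss in the first training run below. A minimal sketch of standardising with scikit-learn, fitting statistics on the training split only (not applied in the runs below):
from sklearn.preprocessing import StandardScaler
# Fit per-feature mean/std on the training split only, to avoid test leakage.
scaler = StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)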
Define a sequential model: two ReLU hidden layers feeding a softmax output over the 10 possible quality scores (0 to 9).
import keras
from keras import layers
classifier_init = keras.Sequential([
    layers.Dense(11, activation="relu"),    # hidden layer sized to the 11 input features
    layers.Dense(44, activation="relu"),    # wider second hidden layer
    layers.Dense(10, activation="softmax")  # one probability per quality score, 0-9
])
Compile the model with the Adam optimiser and the sparse categorical cross-entropy loss function, tracking accuracy.
classifier_init.compile(optimizer="adam",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"])
Fit the model for 20 epochs in batches of 1,000 samples, carving a further 80/20 training/validation split out of the training data.
epochs_init = 20
history_init = classifier_init.fit(x_train, y_train,
epochs=epochs_init, batch_size=1000,
validation_split=0.2)
Epoch 1/20 - accuracy: 0.0138 - loss: 32.4910 - val_accuracy: 0.3705 - val_loss: 8.4827
Epoch 2/20 - accuracy: 0.3566 - loss: 7.4516 - val_accuracy: 0.3228 - val_loss: 4.4466
Epoch 3/20 - accuracy: 0.2947 - loss: 4.3572 - val_accuracy: 0.3032 - val_loss: 3.1995
Epoch 4/20 - accuracy: 0.3092 - loss: 2.8685 - val_accuracy: 0.3322 - val_loss: 1.9692
Epoch 5/20 - accuracy: 0.3381 - loss: 1.8321 - val_accuracy: 0.3405 - val_loss: 1.6127
Epoch 6/20 - accuracy: 0.3474 - loss: 1.5980 - val_accuracy: 0.3621 - val_loss: 1.5295
Epoch 7/20 - accuracy: 0.3604 - loss: 1.5212 - val_accuracy: 0.3588 - val_loss: 1.4604
Epoch 8/20 - accuracy: 0.3757 - loss: 1.4557 - val_accuracy: 0.3809 - val_loss: 1.4095
Epoch 9/20 - accuracy: 0.3969 - loss: 1.3931 - val_accuracy: 0.3948 - val_loss: 1.3708
Epoch 10/20 - accuracy: 0.4164 - loss: 1.3594 - val_accuracy: 0.4071 - val_loss: 1.3422
Epoch 11/20 - accuracy: 0.4286 - loss: 1.3367 - val_accuracy: 0.4204 - val_loss: 1.3164
Epoch 12/20 - accuracy: 0.4420 - loss: 1.2971 - val_accuracy: 0.4240 - val_loss: 1.2971
Epoch 13/20 - accuracy: 0.4422 - loss: 1.2808 - val_accuracy: 0.4323 - val_loss: 1.2831
Epoch 14/20 - accuracy: 0.4440 - loss: 1.2797 - val_accuracy: 0.4375 - val_loss: 1.2734
Epoch 15/20 - accuracy: 0.4593 - loss: 1.2620 - val_accuracy: 0.4396 - val_loss: 1.2652
Epoch 16/20 - accuracy: 0.4515 - loss: 1.2635 - val_accuracy: 0.4357 - val_loss: 1.2605
Epoch 17/20 - accuracy: 0.4544 - loss: 1.2541 - val_accuracy: 0.4356 - val_loss: 1.2582
Epoch 18/20 - accuracy: 0.4617 - loss: 1.2464 - val_accuracy: 0.4365 - val_loss: 1.2535
Epoch 19/20 - accuracy: 0.4540 - loss: 1.2518 - val_accuracy: 0.4454 - val_loss: 1.2491
Epoch 20/20 - accuracy: 0.4577 - loss: 1.2375 - val_accuracy: 0.4361 - val_loss: 1.2477
The above plateaus around 44% validation accuracy, which isn't particularly good. Let's tweak the model a bit and retrain: fix the random seed, add an explicit input layer, widen the hidden layers to 128 units, lower the learning rate, and train for 30 epochs with a larger validation split.
import tensorflow as tf
# set_random_seed seeds Python, NumPy and TensorFlow in one call, making the
# run reproducible; a separate tf.random.set_seed would be redundant.
tf.keras.utils.set_random_seed(42)
classifier_new = keras.Sequential([
    layers.Input((11,)),                    # explicit input shape: the 11 features
    layers.Dense(128, activation="relu"),
    layers.Dense(128, activation="relu"),   # two wider hidden layers than before
    layers.Dense(10, activation="softmax")
])
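# With the explicit Input layer the model is built immediately, so
# classifier_new.summary() could be called here to inspect parameter counts.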
classifier_new.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001),  # lower than Adam's default 0.001
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])
epochs_new = 30
history_new = classifier_new.fit(x_train, y_train,
epochs=epochs_new, batch_size=500,
validation_split=0.3)
Epoch 1/30 - accuracy: 0.3593 - loss: 17.3410 - val_accuracy: 0.3596 - val_loss: 3.3322
Epoch 2/30 - accuracy: 0.3873 - loss: 2.9919 - val_accuracy: 0.3977 - val_loss: 1.9189
Epoch 3/30 - accuracy: 0.4113 - loss: 1.7894 - val_accuracy: 0.4211 - val_loss: 1.3407
Epoch 4/30 - accuracy: 0.4389 - loss: 1.3377 - val_accuracy: 0.4422 - val_loss: 1.2794
Epoch 5/30 - accuracy: 0.4576 - loss: 1.2830 - val_accuracy: 0.4526 - val_loss: 1.2479
Epoch 6/30 - accuracy: 0.4629 - loss: 1.2533 - val_accuracy: 0.4617 - val_loss: 1.2289
Epoch 7/30 - accuracy: 0.4738 - loss: 1.2359 - val_accuracy: 0.4615 - val_loss: 1.2208
Epoch 8/30 - accuracy: 0.4770 - loss: 1.2270 - val_accuracy: 0.4645 - val_loss: 1.2155
Epoch 9/30 - accuracy: 0.4809 - loss: 1.2223 - val_accuracy: 0.4657 - val_loss: 1.2157
Epoch 10/30 - accuracy: 0.4810 - loss: 1.2186 - val_accuracy: 0.4701 - val_loss: 1.2144
Epoch 11/30 - accuracy: 0.4813 - loss: 1.2167 - val_accuracy: 0.4733 - val_loss: 1.2142
Epoch 12/30 - accuracy: 0.4810 - loss: 1.2151 - val_accuracy: 0.4742 - val_loss: 1.2139
Epoch 13/30 - accuracy: 0.4813 - loss: 1.2138 - val_accuracy: 0.4724 - val_loss: 1.2130
Epoch 14/30 - accuracy: 0.4817 - loss: 1.2128 - val_accuracy: 0.4743 - val_loss: 1.2123
Epoch 15/30 - accuracy: 0.4822 - loss: 1.2120 - val_accuracy: 0.4716 - val_loss: 1.2119
Epoch 16/30 - accuracy: 0.4822 - loss: 1.2115 - val_accuracy: 0.4724 - val_loss: 1.2111
Epoch 17/30 - accuracy: 0.4823 - loss: 1.2104 - val_accuracy: 0.4724 - val_loss: 1.2103
Epoch 18/30 - accuracy: 0.4825 - loss: 1.2101 - val_accuracy: 0.4735 - val_loss: 1.2098
Epoch 19/30 - accuracy: 0.4835 - loss: 1.2096 - val_accuracy: 0.4747 - val_loss: 1.2101
Epoch 20/30 - accuracy: 0.4828 - loss: 1.2089 - val_accuracy: 0.4769 - val_loss: 1.2081
Epoch 21/30 - accuracy: 0.4845 - loss: 1.2080 - val_accuracy: 0.4780 - val_loss: 1.2078
Epoch 22/30 - accuracy: 0.4839 - loss: 1.2070 - val_accuracy: 0.4781 - val_loss: 1.2064
Epoch 23/30 - accuracy: 0.4846 - loss: 1.2058 - val_accuracy: 0.4789 - val_loss: 1.2049
Epoch 24/30 - accuracy: 0.4834 - loss: 1.2048 - val_accuracy: 0.4802 - val_loss: 1.2044
Epoch 25/30 - accuracy: 0.4856 - loss: 1.2040 - val_accuracy: 0.4825 - val_loss: 1.2030
Epoch 26/30 - accuracy: 0.4854 - loss: 1.2031 - val_accuracy: 0.4825 - val_loss: 1.2027
Epoch 27/30 - accuracy: 0.4862 - loss: 1.2027 - val_accuracy: 0.4822 - val_loss: 1.2021
Epoch 28/30 - accuracy: 0.4865 - loss: 1.2016 - val_accuracy: 0.4849 - val_loss: 1.2008
Epoch 29/30 - accuracy: 0.4871 - loss: 1.2008 - val_accuracy: 0.4865 - val_loss: 1.1995
Epoch 30/30 - accuracy: 0.4867 - loss: 1.1996 - val_accuracy: 0.4848 - val_loss: 1.1989
This looks better, but how can you be sure? Visualise it!
import matplotlib.pyplot as plt
val_loss = history_new.history["val_loss"]
trn_loss = history_new.history["loss"]
val_accuracy = history_new.history["val_accuracy"]
trn_accuracy = history_new.history["accuracy"]
# Loss on the left y-axis, accuracy on a twin right y-axis.
fig, loss = plt.subplots()
loss.plot(range(epochs_new), val_loss, "b-", label="Validation Loss")
loss.plot(range(epochs_new), trn_loss, "r-", label="Training Loss")
loss.set_ylabel("Loss")
loss.set_xlabel("Epochs")
loss.set_xticks(range(0, epochs_new, 5))
h1, l1 = loss.get_legend_handles_labels()
accr = loss.twinx()  # second y-axis sharing the same x-axis
accr.plot(range(epochs_new), val_accuracy, "g-", label="Validation Accuracy")
accr.plot(range(epochs_new), trn_accuracy, "y-", label="Training Accuracy")
accr.set_ylabel("Accuracy")
h2, l2 = accr.get_legend_handles_labels()
# Merge the handles from both axes into one combined legend.
fig.legend(h1 + h2, l1 + l2, loc=(0.5, 0.5))
plt.show()
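The same plotting code is repeated below for the first run; it could instead be folded into a small helper, sketched here as a hypothetical plot_history function:
def plot_history(history, epochs, legend_loc):
    # Loss on the left axis, accuracy on a twin right axis.
    fig, ax_loss = plt.subplots()
    ax_loss.plot(range(epochs), history.history["val_loss"], "b-", label="Validation Loss")
    ax_loss.plot(range(epochs), history.history["loss"], "r-", label="Training Loss")
    ax_loss.set_ylabel("Loss")
    ax_loss.set_xlabel("Epochs")
    ax_loss.set_xticks(range(0, epochs, 5))
    ax_acc = ax_loss.twinx()
    ax_acc.plot(range(epochs), history.history["val_accuracy"], "g-", label="Validation Accuracy")
    ax_acc.plot(range(epochs), history.history["accuracy"], "y-", label="Training Accuracy")
    ax_acc.set_ylabel("Accuracy")
    h1, l1 = ax_loss.get_legend_handles_labels()
    h2, l2 = ax_acc.get_legend_handles_labels()
    fig.legend(h1 + h2, l1 + l2, loc=legend_loc)
    plt.show()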
Let's compare that with the previous run.
val_loss = history_init.history["val_loss"]
trn_loss = history_init.history["loss"]
val_accuracy = history_init.history["val_accuracy"]
trn_accuracy = history_init.history["accuracy"]
# Same layout as before: loss on the left axis, accuracy on a twin right axis.
fig, loss = plt.subplots()
loss.plot(range(epochs_init), val_loss, "b-", label="Validation Loss")
loss.plot(range(epochs_init), trn_loss, "r-", label="Training Loss")
loss.set_ylabel("Loss")
loss.set_xlabel("Epochs")
loss.set_xticks(range(0, epochs_init, 5))
h1, l1 = loss.get_legend_handles_labels()
accr = loss.twinx()
accr.plot(range(epochs_init), val_accuracy, "g-", label="Validation Accuracy")
accr.plot(range(epochs_init), trn_accuracy, "y-", label="Training Accuracy")
accr.set_ylabel("Accuracy")
h2, l2 = accr.get_legend_handles_labels()
fig.legend(h1 + h2, l1 + l2, loc=(0.1, 0.7))
plt.show()
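Neither run above ever touches the held-out test split. As a final sanity check, the tweaked model can be scored on it; a minimal sketch using Keras' evaluate:
# Score the tweaked model on the 20% held-out test split.
test_loss, test_accuracy = classifier_new.evaluate(x_test, y_test)
print("Test accuracy: {:.4f}".format(test_accuracy))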