Download: source or SVMdark.exe - last updated: 9 July 2005

Download example files: train.txt validation.txt test.txt

The software can perform both classification and regression; the following example uses regression. For classification, replace the outputs with "1" or "-1".

Any field left blank in the interface will be treated as a zero.

This is an exercise in supervised learning: we are given the following input/output data and, given future inputs, we wish to predict their (unknown) outputs.

input 1   input 2   output
7         0         7
9         1         10
4         6         10
3         8         11
9         4         13
3         8         11
5         4         9
6         0         6
3         8         11
4         5         9
0         1         1
3         8         11
9         0         9
3         0         3
2         9         11
1         4         5
7         9         16
0         9         9
1         8         9
2         3         5


Split the data into three sets: training, validation and test (in the ratio 50%, 25%, 25%).

In each set, make the first column the output column and label the others 1:, 2:, etc.

Save as three separate text files, as follows.

Training set (train.txt)
7 1:7 2:0
10 1:9 2:1
10 1:4 2:6
11 1:3 2:8
13 1:9 2:4
11 1:3 2:8
9 1:5 2:4
6 1:6 2:0
11 1:3 2:8
9 1:4 2:5

Validation set (validation.txt)
1 1:0 2:1
11 1:3 2:8
9 1:9 2:0
3 1:3 2:0
11 1:2 2:9

Test set (test.txt)
5 1:1 2:4
16 1:7 2:9
9 1:0 2:9
9 1:1 2:8
5 1:2 2:3
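
The split and formatting steps above can also be scripted. The following is a minimal Python sketch, not part of SVMdark; the function names (`to_sparse_line`, `split_rows`, `write_sets`) are made up for illustration, and the split simply takes the rows in their original order, as the listings above do.

```python
import os

# Rows of (input 1, input 2, output), in the order given in the table above.
ROWS = [
    (7, 0, 7), (9, 1, 10), (4, 6, 10), (3, 8, 11), (9, 4, 13),
    (3, 8, 11), (5, 4, 9), (6, 0, 6), (3, 8, 11), (4, 5, 9),
    (0, 1, 1), (3, 8, 11), (9, 0, 9), (3, 0, 3), (2, 9, 11),
    (1, 4, 5), (7, 9, 16), (0, 9, 9), (1, 8, 9), (2, 3, 5),
]


def to_sparse_line(row):
    """Format one row as 'output 1:x1 2:x2 ...' (output column first)."""
    *inputs, output = row
    features = " ".join(f"{i}:{x}" for i, x in enumerate(inputs, start=1))
    return f"{output} {features}"


def split_rows(rows, train_fraction=0.5, validation_fraction=0.25):
    """Split rows in order into training, validation and test sets."""
    n_train = int(len(rows) * train_fraction)
    n_validation = int(len(rows) * validation_fraction)
    train = rows[:n_train]
    validation = rows[n_train:n_train + n_validation]
    test = rows[n_train + n_validation:]
    return train, validation, test


def write_sets(rows, directory="."):
    """Write train.txt, validation.txt and test.txt into `directory`."""
    names = ("train.txt", "validation.txt", "test.txt")
    for name, subset in zip(names, split_rows(rows)):
        with open(os.path.join(directory, name), "w") as f:
            for row in subset:
                f.write(to_sparse_line(row) + "\n")


if __name__ == "__main__":
    write_sets(ROWS)
```

For real data you would normally shuffle the rows before splitting; the order is kept here only so that the output matches the three files listed above.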


Select "Regression".
Click the "Test file..." button and select validation.txt.
Fill in the other fields as shown below.

screenshot1

Click on the large "Optimize" button.
When the hourglass cursor has gone, open the tab-delimited file "optimize.csv". Note which parameters produce a low MSE (mean squared error).
Reduce the number of candidate models by "zooming in" on those that perform well on the validation set; repeat if necessary.
In this example, the linear kernel with a small value for epsilon performed best, so we narrow our search accordingly:

screenshot2

Again, inspect "optimize.csv" and identify the smallest MSE.
We select the parameters that performed optimally on the validation set.
Now, we select the test set - click the "Test file..." button again and select test.txt.

screenshot3
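
Finding the smallest MSE by eye is fine for a small search; for larger ones a short script may help. The sketch below is only an illustration: it assumes that "optimize.csv" starts with a header row and that the error column is named "MSE". Neither is documented on this page, so check the file and adjust `mse_column` to match. The function name `best_rows` is hypothetical.

```python
import csv


def best_rows(lines, mse_column="MSE", top=5):
    """Return the `top` rows with the lowest MSE from tab-delimited lines.

    `lines` is any iterable of text lines, e.g. an open file.
    ASSUMPTION: the first line is a header and contains `mse_column`.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    rows = list(reader)
    rows.sort(key=lambda row: float(row[mse_column]))
    return rows[:top]


if __name__ == "__main__":
    with open("optimize.csv", newline="") as f:
        for row in best_rows(f):
            print(row)
```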

Click the "Learn" button.
Click the "Predict" button.

Inspect the file "predictions.txt".

Predictions:
5.0006471
15.999537
9.0003896
9.0003437
5.0006011

Remember the actual outputs:
5
16
9
9
5
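
To put a number on how close the predictions are, the MSE on the test set can be computed by hand. A minimal sketch, using the usual definition of mean squared error (the average of the squared differences); whether SVMdark reports exactly this quantity is an assumption.

```python
# Values copied from predictions.txt and from the test set outputs above.
PREDICTIONS = [5.0006471, 15.999537, 9.0003896, 9.0003437, 5.0006011]
ACTUAL = [5, 16, 9, 9, 5]


def mean_squared_error(predicted, actual):
    """Average of the squared differences between predictions and targets."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    print(mean_squared_error(PREDICTIONS, ACTUAL))
```

Every prediction is within 0.001 of its target, so the resulting MSE is below one millionth.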

Not bad, huh? That is because the above example was both linear and noise-free. However, it's with noisy nonlinear real-world data that the support vector machine shows its real strengths. Enjoy!
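
The "linear and noise-free" remark can be checked against the example data: in every row, the output is exactly input 1 plus input 2. A short sketch of that check (the function name `is_sum_of_inputs` is made up for illustration):

```python
# Rows of (input 1, input 2, output) from the example table.
ROWS = [
    (7, 0, 7), (9, 1, 10), (4, 6, 10), (3, 8, 11), (9, 4, 13),
    (3, 8, 11), (5, 4, 9), (6, 0, 6), (3, 8, 11), (4, 5, 9),
    (0, 1, 1), (3, 8, 11), (9, 0, 9), (3, 0, 3), (2, 9, 11),
    (1, 4, 5), (7, 9, 16), (0, 9, 9), (1, 8, 9), (2, 3, 5),
]


def is_sum_of_inputs(rows):
    """True if output == input 1 + input 2 holds for every row."""
    return all(x1 + x2 == y for x1, x2, y in rows)


if __name__ == "__main__":
    print(is_sum_of_inputs(ROWS))  # prints True
```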

Disclaimer

This software is free only for non-commercial use.


LightData Agent, written by Ophir Gottlieb, takes comma-, space- or tab-delimited data and converts it into the required data format with colons, etc.

formatlibsvm: my own code, which formats data for LIBSVM, SVMlight, TinySVM and SVMdark.
