Matlab tutorial - part 1

Although Rapidminer is a great tool for selecting and applying your chosen data mining technique, it falls a bit short with respect to the visualization of the data and/or the results. As an alternative, we are going to look at Matlab, which is installed in Caslab as well.
Matlab is a commercial package for numerical computation and visualization, with a large number of add-on toolboxes for specialized tasks. As a numerical tool, it does not handle categorical values directly, which is something to keep in mind for the target classes. There is a way to add text labels to graphs, which will be explained below.

Matlab has a command line interface, where you can switch from one directory to another. If you'd like to go up one level in the directory structure, type cd .. To go to a subdirectory of the current directory, type cd directoryname.

  • Matlab likes things to be in certain places. The easiest way to make things work is to have everything in the same directory. This includes, for example, your data file and any scripts you write.
  • The command whos shows you which variables are currently in use, together with their size.
  • The Matlab command save saves the variables in your current workspace, so you do not have to load the same values from your data files again. To save the workspace, type save myMatlabspace.mat. To load it again, type load myMatlabspace.mat.
  • The command help help gets you started with Matlab's help feature.
  • Typing a name on the left side of an equal sign creates a variable with that name and assigns whatever is on the right of the equal sign to it. For example, the command myVariable = 4 creates a variable called myVariable and assigns the value 4 to it. In Matlab, all variables are considered matrices.
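The commands from the list above can be tried out in one short session. The following is only a minimal sketch: the variable names are examples, and the .mat file is written to the temporary directory (via tempdir) rather than the current directory, which is an assumption made to keep the example tidy.

```matlab
% Create two variables; the semicolon suppresses the printout.
myVariable = 4;
myVector = [1 2 3];

% List the variables currently in the workspace together with their sizes.
whos

% Save the two variables to a .mat file. Leaving out the variable names
% would save every variable in the workspace.
matFile = fullfile(tempdir, 'myMatlabspace.mat');
save(matFile, 'myVariable', 'myVector');

% Remove the variables from memory, then restore them from the file.
clear myVariable myVector
load(matFile);

disp(myVariable)   % displays 4
```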

    How to get the data into Matlab

    Some of the data formats supported by Matlab are delimited text, tab separated text and comma separated numbers. Matlab can also import Excel worksheets, but it might give you trouble if the values aren't numerical. To get a list of valid file formats in Matlab, type help fileformats from the Matlab command line. Since we have the data in a comma separated file, we'll be using csvread.
    Assuming that we're in the same directory as the data file, the command myMatlabData = csvread('myData.csv'); will read the data contained in the file called myData.csv into a Matlab variable called myMatlabData. The semicolon after the command suppresses the output, so the values contained in the file are not printed to the screen as the file is read. To see the contents of a variable, type the variable name without a semicolon, followed by return.
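To try the import without a real data file, the sketch below first writes a tiny numeric CSV file and then reads it back with csvread. The file location (the temporary directory) and the three rows of values are made up for illustration; they are not the course data.

```matlab
% Write a tiny numeric CSV file so the example is self-contained.
% File location and values are placeholders, not the course data.
csvFile = fullfile(tempdir, 'myData.csv');
fid = fopen(csvFile, 'w');
fprintf(fid, '5.1,3.5,1.4,0.2\n');
fprintf(fid, '4.9,3.0,1.4,0.2\n');
fprintf(fid, '4.7,3.2,1.3,0.2\n');
fclose(fid);

% Read the file into a matrix; the semicolon suppresses the printout.
myMatlabData = csvread(csvFile);

% Without a semicolon the result is printed: 3 rows and 4 columns.
size(myMatlabData)
```

Note that csvread only handles numeric values. Newer Matlab releases recommend readmatrix instead, but csvread matches the command used in this tutorial.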
     

    Working with matrices

    Matlab allows you to access parts of the matrices. For example, a colon means everything. Assuming we have a matrix called myOldMatrix, then typing myNewMatrix  =  myOldMatrix would be equivalent to typing myNewMatrix = myOldMatrix(:,:) . myOldMatrix(:,:) refers to elements in all rows and all columns of myOldMatrix.

    For accessing parts of the matrix, you can specify row and column numbers. Some examples are:
    myOldMatrix(1,1)   refers to the element in the first row and first column of myOldMatrix
    myOldMatrix(1,:)  refers to the elements in the first row and all columns of myOldMatrix
    myOldMatrix(:,1)   refers to the elements in all rows and the first column of myOldMatrix
    myOldMatrix(:,1:4)   refers to the elements in all rows and the first 4 columns of myOldMatrix
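These indexing forms can be checked on a small matrix. The sketch below uses an arbitrary 3-by-5 matrix whose entries encode their own row and column, so the result of each expression is easy to verify. The variable names a to e are only for illustration.

```matlab
% A small example matrix with 3 rows and 5 columns (values are arbitrary).
myOldMatrix = [11 12 13 14 15;
               21 22 23 24 25;
               31 32 33 34 35];

a = myOldMatrix(1,1);       % single element: 11
b = myOldMatrix(1,:);       % first row: [11 12 13 14 15]
c = myOldMatrix(:,1);       % first column: [11; 21; 31]
d = myOldMatrix(:,1:4);     % all rows, columns 1 to 4 (a 3-by-4 matrix)
e = myOldMatrix(2:3,end);   % rows 2 and 3 of the last column: [25; 35]
```

The range of columns is written with a colon (1:4). A minus sign would be evaluated as a subtraction and give an invalid index.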
     

    Plotting the data in two dimensions

    Choose which two of your columns you would like to visualize. To create a plot of the first two columns of  the values of the matrix called myMatlabData, which contains the Iris data, you can type plot(myMatlabData(:,1),myMatlabData(:,2)); which produces the figure below.
     
    MatlabVisualization1
     
    By default, Matlab connects the data values with blue lines. This can be changed by adding additional parameters to the plot command. By the way, the Matlab command help plot gives you a lot of information about plotting in two dimensions. If you want to change the color in which the data values are plotted from the default blue, you can use the following letters to specify other colors:
     

    g green
    r red
    k black
    c cyan
    y yellow
    m magenta
    b blue

    For example, the command plot(myMatlabData(:,1),myMatlabData(:,2),'r'); produces the same plot with red lines. Note that every time you execute another plot command, the previous plot is cleared. You can change this by typing hold on, which adds subsequent plots to the existing figure. This can be turned off again with hold off.
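As a small illustration of hold on and hold off, the sketch below draws two lines in different colors in the same figure. The data is a made-up stand-in (not the Iris data), chosen only so that the example runs on its own.

```matlab
% Stand-in data: 10 rows, 2 columns (made up for this sketch).
myMatlabData = [(1:10)', (1:10)'.^2];

figure;
plot(myMatlabData(:,1), myMatlabData(:,2), 'r');     % red line
hold on;                                             % keep the red line
plot(myMatlabData(:,1), 2*myMatlabData(:,2), 'k');   % add a black line
hold off;                                            % the next plot clears the figure again
```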

    You can also change the markers for the values.  The markers are specified in the following way (this is a partial list, for more information, use help plot).
     

    .  point
    o circle
    x x-mark
    + plus sign

    As an example, if you want to use green x-marks to plot the values of all rows for the last two columns of the matrix, you would use
    plot(myMatlabData(:,3),myMatlabData(:,4),'gx'); which produces the following:
    MatlabTutorial2
    You can also change the marker size, add legends, titles and labels to your plots. Please refer to help plot or to the Mathworks site (link is at bottom of page) for complete information on how to do that. There are a number of buttons on the top of the figure, which let you zoom in or out, or even rotate the picture, which is especially helpful in 3 dimensions.
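As a starting point, the sketch below sets the marker size and adds a title, axis labels and a legend. The random matrix stands in for the real data, and the label texts are placeholders. help plot and help legend describe the full set of options.

```matlab
% Random placeholder with 20 rows and 4 columns instead of the real data.
myMatlabData = rand(20, 4);

figure;
plot(myMatlabData(:,3), myMatlabData(:,4), 'gx', 'MarkerSize', 10);
ax = gca;                     % handle to the axes that were just drawn
title('Columns 3 and 4');     % placeholder title
xlabel('Column 3');           % placeholder axis labels
ylabel('Column 4');
legend('all rows');           % one legend entry per plotted series
```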

    Plotting the data in three dimensions

    Plotting the data in three dimensions is simply an extension of plotting it in two dimensions. Assuming we want to plot the values contained in the first three columns for all rows, we can use the following command. Note though that now we have to use the plot3 command instead of the plot command.

    plot3(myMatlabData(:,1),myMatlabData(:,2),myMatlabData(:,3),'rx'); This produces the figure:
     

    plotting in 3d

    How to add text labels 

    Note that this is shown below for two dimensions, but it extends to 3D in a straightforward manner.
    Now that the data is plotted, we need to find a way to identify the points in the plots. There are a number of ways to do this. Perhaps the easiest way is to plot points in different colors according to the different classes they belong to. Of course, the data needs to be sorted to save you a lot of typing. For the Iris data, if we assume the data is sorted according to the target classes, then you can plot the first 50 values in one color, the next 50 values in a second color and the last 50 in a third color. 
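Coloring by class can be sketched as follows, assuming a 150-by-4 matrix that is sorted by target class with 50 rows per class. The random matrix is only a placeholder for the Iris data, and the legend names are generic.

```matlab
% Placeholder for the 150-by-4 Iris matrix, assumed to be sorted by class.
myMatlabData = rand(150, 4);

figure;
plot(myMatlabData(1:50,1),    myMatlabData(1:50,2),    'rx');   % class 1
hold on;
plot(myMatlabData(51:100,1),  myMatlabData(51:100,2),  'go');   % class 2
plot(myMatlabData(101:150,1), myMatlabData(101:150,2), 'b+');   % class 3
hold off;
ax = gca;                                   % handle to the axes
legend('class 1', 'class 2', 'class 3');    % generic class names
```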
    There is also a way to add the values in the matrix itself to the plot. This can get really messy for even a small number of data points, but it gives you some information about columns (remember, they are attribute values) that aren't plotted. To do so for the first 20 rows of your matrix, type the following commands. The order of the commands is important, and you need to press enter after each command.

    plot the values of the first 20 rows in the first two columns with values marked by a red x
    plot(myMatlabData(1:20,1),myMatlabData(1:20,2),'rx');

    convert the numerical values in the matrix to strings
    n=num2str(myMatlabData,'%5.3f/');

    keep the first 20 rows and remove the trailing slash, which is the last character in each row of the character array (the Matlab keyword end refers to the last index, here the last column).
    n=n(1:20,1:end-1); 

    add the text to the plot (the text command can also be used for pointing out interesting parts of the plot, try out help text from the Matlab command line for further information).
    text(myMatlabData(1:20,1),myMatlabData(1:20,2),n);
     

    This produces the following picture for the Iris data, where next to each of the 20 data points marked in red and plotted according to the values in the first two columns, the complete attribute values are added as labels. 


    adding text labels
    Of course, the labels don't have to be the values themselves; they could also be the target class names. To create a variable holding the target classes you need to use textread, since the names of the classes in general will not be numeric like in the dwarf dataset. For the Iris dataset, this works as follows. Let's assume we have the original target classes in a text file called IrisClasses.txt (one target class per row for each of the 150 samples). To read in the class values, we use textread. The second (and last) parameter inside the parentheses, enclosed in quotes, is the format string: %s reads the first string on each line, and %*[^\n] ignores everything that follows up to the newline character.
    [IrisClasses]=textread('IrisClasses.txt','%s%*[^\n]')

    Add a blank character to the front of the strings contained in IrisClasses for easier readability.
    [IrisClasses]=strcat({' '},IrisClasses);

    Now plot the Iris data, this time we plot all the values for the first two columns
    plot(myMatlabData(:,1),myMatlabData(:,2),'rx'); 

    This time, the values we want as labels are already strings, so we can skip the num2str conversion we performed before and add the labels to the plot directly, using the strings in IrisClasses rather than n.
    text(myMatlabData(:,1),myMatlabData(:,2),IrisClasses);

    This produces the figure below (for readability, I changed the labels in the original text file).

    Text labels
    Of course, you can then try to plot more than one class value at a time, for example the predicted vs. the target class from the original data file. To achieve this, you could use a combination of text labels, colors and marker shapes to plot the values.
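One possible way to combine the three is sketched below: the color encodes the target class, the marker shape encodes the predicted class, and a text label repeats both. All data here is hypothetical (six made-up points with numeric class codes 1 to 3), and the color/marker mapping is just one choice among many.

```matlab
% Made-up example: 6 points with numeric class codes 1..3.
x = [1 2 3 4 5 6];
y = [2 1 4 3 6 5];
target    = [1 1 2 2 3 3];   % hypothetical true classes
predicted = [1 2 2 2 3 1];   % hypothetical predicted classes

colors  = 'rgb';   % color encodes the target class
markers = 'xo+';   % marker shape encodes the predicted class

figure;
hold on;
for i = 1:numel(x)
    % Build a style string such as 'rx' or 'go' for this point.
    style = [colors(target(i)) markers(predicted(i))];
    plot(x(i), y(i), style);
    % Label the point with both class codes.
    text(x(i), y(i), sprintf(' t=%d p=%d', target(i), predicted(i)));
end
hold off;
```

Points where color and marker do not correspond (for example a red circle) are the misclassified ones under this mapping.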
     

    How to save your figures

    If you look at the top of the figure you produced in Matlab, you will see a FILE button. Clicking on that will bring up a menu, from which you can choose EXPORT to save the figure in various file formats. Some of the file formats you can choose are jpeg and bitmap as well as eps files. You can also print your figures from the FILE menu. 
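Figures can also be saved from the command line instead of the menu. The sketch below uses the print command to write the current figure as a PNG and as a color EPS file. The file names and the use of the temporary directory are assumptions made for the example.

```matlab
% Draw a simple figure to export (placeholder data).
figure;
plot(1:10, (1:10).^2, 'rx');

% Example file names in the temporary directory.
pngFile = fullfile(tempdir, 'myFigure.png');
epsFile = fullfile(tempdir, 'myFigure.eps');

print('-dpng',  pngFile);   % save the current figure as a PNG bitmap
print('-depsc', epsFile);   % save the current figure as a color EPS file
```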
     

    Where to get more help

    One site to get help from is  Matlab's home.  For example, you can search for keywords from the support page.
    There are also a number of FAQs you can try, some of these are
    comp.soft-sys.matlab
    University of Texas
     