This file contains instructions for using NxNet 2.0.
NxNet is at heart a simple back-propagation neural net (BPN). This means that it is an application that implements a simple neural network using an algorithmic analogy loosely based on the fundamental structure of groups of brain cells. NxNet is able to 'learn' the relationships between variables and can then be used to predict answers – possibly for cases that it has not been trained on. Neural nets are useful in situations where there are known variables that are thought to be in some way responsible for, or to contribute to, an outcome. During the process of 'training', the neural net will 'extract' any structure it can discover that relates the input variables to the output variables.

For example, the example data files that come with NxNet 2.0 contain fictitious data for a scenario in which data has been obtained about previous employees in a firm. The data includes their number of years working in the field before coming to the firm, the number of school years they completed, their level of competitiveness, their level of cooperativeness, and lastly whether they were successful as a manager or as a worker. This data is used to train a neural net to predict whether future employees with a certain number of years' experience, years at school, competitiveness and cooperativeness will be best employed as a manager, as a worker, or neither. Neural nets are also used in situations such as predicting insurance fraud, pattern recognition, footy tabs, horse racing and the stock market. The success of the neural net in each case depends strongly on the quality of the training data and the sophistication of the neural net – NxNet at this stage is quite primitive, but should serve as a reasonable introduction.
The typical process for using NxNet is to:
1. Configure (or load) a neural net
2. Train the neural net against a training data set
3. Query the trained neural net
This process is reflected in the tabs on the main window of NxNet, namely the “Neural Net Configuration”, “Training Control” and “Query Control” tabs. Each of these tabs and the controls it contains is described below.
This tab contains the Neural Net Structure, the Construction Control and the Learning Parameters sections.
This section contains three grid controls that list the Input Nodes, Hidden Layers, and the Output Nodes. The simplest neural net to construct is one that has a layer of input nodes – one for each input variable – and a layer of output nodes – one for each output variable. Typically, however, BPN networks also have one or more so-called hidden layers of nodes. These hidden layers sit between the input and output layers. Each node in each layer is connected to every node in the layers above and below it.
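As an illustration of the structure described above (not NxNet's actual implementation), a fully connected feed-forward pass can be sketched in Python. The sigmoid activation and the omission of bias terms are assumptions made for brevity:

```python
import math
import random

def make_net(layer_sizes, seed=42):
    """Create weight matrices for a fully connected feed-forward net.
    layer_sizes e.g. [4, 3, 2]: 4 input nodes, one hidden layer of 3
    nodes, and 2 output nodes."""
    rng = random.Random(seed)
    # weights[k][j][i] connects node i of layer k to node j of layer k+1,
    # so every node is connected to every node in the next layer
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(net, inputs):
    """Propagate the input values through each layer in turn
    (bias terms omitted for brevity)."""
    activations = inputs
    for layer in net:
        activations = [sigmoid(sum(w * a for w, a in zip(node_weights, activations)))
                       for node_weights in layer]
    return activations

net = make_net([4, 3, 2])              # 4 inputs, 3 hidden, 2 outputs
print(forward(net, [0.5, 0.1, 0.9, 0.3]))
```

Each output lands between 0.0 and 1.0 because of the sigmoid, which is why training data is usually scaled into that range (see the Training Control tab).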
This grid control lists the Input Nodes by name (ie. label). When constructing a neural net, input nodes can be added or removed by simply editing the contents of the grid control.
This grid control lists each hidden layer and the number of nodes contained in the layer. When constructing a neural net layers and nodes can be added or removed by editing the contents of the grid control.
This grid control lists the output nodes by name (ie. label). When constructing a neural net, output nodes can be added or removed by simply editing the contents of the grid control.
This section contains the controls which allow the user to construct a new neural net, or to save or load a neural net file. A neural net file is usually saved once the net has been successfully trained. The state of the neural net is saved along with its structure and learning parameters. The typical process for creating a neural net from scratch is to:
1. Clear any previous neural net (this makes the structure grid controls editable)
2. Edit the grid controls to define the input nodes, hidden layers and output nodes
3. Click the Construct Neural Net button to create the net in memory
Each of these buttons and their function is described in more detail below.
Press this button to clear any previous neural net from memory and to reset the contents of the grid controls in the Neural Net Structure section. Note the contents of the grid controls become editable until the Construct Neural Net button is clicked.
Clicking this button will create a new neural net in memory according to the nodes and layers listed in the grid controls in the Neural Net Structure section. After clicking this button the contents of the grid controls are no longer editable.
Clicking this button will prompt the user for a file name and location in which to save the neural net file. A neural net can be saved at any point in its construction or training.
Clicking this button will prompt the user for the location and filename of a valid neural net file. The neural net will be constructed in memory from the file. The neural net state and learning parameters will also be restored. Any existing neural net will be cleared from memory.
This section contains several editable fields for variables that affect the way in which the neural net will learn. Usually the default settings are sufficient for most situations.
<further detail to be added here>
These settings control the sensitivity of the neural net learning process.
<further detail to be added here>
Setting all heuristic settings to zero means the neural net will behave as a vanilla BPN. The settings relate to the heuristics suggested by Robert A. Jacobs.
<further detail to be added here>
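For background while the detail above is pending: Jacobs' delta-bar-delta heuristics give each weight its own learning rate, growing it additively while the gradient keeps a consistent sign and shrinking it multiplicatively when the sign flips. A generic sketch follows – this is not NxNet's code, and the kappa, phi and theta parameter names are assumptions:

```python
def delta_bar_delta_step(weight, rate, delta_bar, gradient,
                         kappa=0.01, phi=0.5, theta=0.7):
    """One per-weight update following Jacobs' delta-bar-delta heuristics.
    delta_bar is an exponential average of past gradients; if the current
    gradient agrees with it in sign, grow this weight's learning rate
    additively, otherwise shrink it multiplicatively."""
    if gradient * delta_bar > 0:
        rate += kappa                    # consistent direction: speed up
    elif gradient * delta_bar < 0:
        rate *= phi                      # direction flipped: slow down
    weight -= rate * gradient            # ordinary gradient-descent step
    delta_bar = (1 - theta) * gradient + theta * delta_bar
    return weight, rate, delta_bar
```

In this sketch, setting kappa to 0 and phi to 1 leaves every learning rate fixed, so the update reduces to plain gradient descent – the 'vanilla BPN' behaviour mentioned above (NxNet's exact parameter mapping may differ).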
Clicking this button will reset the learning parameters to their default values.
This tab contains controls that determine the training data source, how the data within the training data source is used to train the neural net, the training parameters, the actual control of training, and the evaluation of the training.

Each training cycle involves the neural net reading the training data and calculating the error between its predicted output and the actual correct output value recorded in the training data. It does this as many times as set in the Training Iterations control. It then validates its trained state against a second set of data not included in its training data. If the validation indicates that the neural net is more accurate on this second set of data than when last validated, it saves its state. Finally, at the end of the training session (ie. when either the Max Training Cycles or the Min Training Error is reached), the neural net is tested (ie. evaluated) against a third set of data that has not been included in the training set. The results of the evaluation are printed in the Training Evaluation table at the bottom of this tab. Each of these sections is described in more detail below.
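The train / validate / keep-best cycle described above can be sketched as follows. This is illustrative pseudocode-style Python, not NxNet's source; the function names (train_one_pass, validation_error, snapshot) are placeholders:

```python
def training_session(net, train_rows, validation_rows,
                     training_iterations, max_cycles, min_error,
                     train_one_pass, validation_error, snapshot):
    """Sketch of the cycle described above: train for a number of
    iterations, validate against unseen data, and keep the state that
    performed best on the validation set."""
    best_error = float("inf")
    best_state = snapshot(net)
    for cycle in range(max_cycles):              # Max Training Cycles
        for _ in range(training_iterations):     # Training Iterations (>= 1)
            error = train_one_pass(net, train_rows)   # forward + back-propagate
        v_error = validation_error(net, validation_rows)
        if v_error < best_error:                 # better on unseen data
            best_error = v_error
            best_state = snapshot(net)           # save this state
        if error <= min_error:                   # Min Training Error reached
            break
    return best_state, best_error
```

The final evaluation against the third (evaluation) data set would then be run against the returned best state.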
This section contains controls that determine the data source, the data set within the data source and the proportion of data within the data set assigned for training, validating and evaluating the neural net.
Enter the file path into this field, or click the Browse button, to locate the file that contains the data set to be used for training and evaluation.
Click on this button to locate the file that contains the data set to be used for training and evaluation.
Once the Data Source has been loaded – select the Data Set (ie. data table) within the Data Source from this dropdown control.
This slider controls the percentage of the data set rows to be included in the training set. Data rows are randomly allocated to the training set.
This slider controls the percentage of the data set rows to be included in the validation set. Data rows are randomly allocated to the validation set.
This slider controls the percentage of the data set rows to be included in the evaluation set. Data rows are randomly allocated to the evaluation set.
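The three sliders above allocate rows at random according to their percentages. A minimal sketch of such a split (the function name, percentages and seed are illustrative only):

```python
import random

def split_rows(rows, train_pct, validate_pct, evaluate_pct, seed=1):
    """Randomly allocate data rows to the training, validation and
    evaluation sets according to the slider percentages."""
    assert train_pct + validate_pct + evaluate_pct == 100
    shuffled = rows[:]                      # leave the original data intact
    random.Random(seed).shuffle(shuffled)   # random allocation
    n_train = len(rows) * train_pct // 100
    n_validate = len(rows) * validate_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validate],
            shuffled[n_train + n_validate:])

train, validate, evaluate = split_rows(list(range(100)), 60, 20, 20)
```

Keeping the three sets disjoint, as here, is what makes the validation and evaluation steps meaningful.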
This set of controls determines global settings for the training session.
This setting controls how many times the neural net will read the training data set and propagate the errors. Generally, the more iterations the faster the neural net will learn. However, there is the possibility that the neural net will 'over-learn' and start to adapt to idiosyncrasies in the training data set (this is why the validation step is important).
This setting determines the maximum number of training cycles (ie. train, validate) that the neural net will execute. If the Min Training Error is not reached, training will cease once the Max Training Cycles is reached.
This setting determines the minimum error required of the neural net during training. Once the overall error level reaches this value then the training session will cease. The smaller this value the more accurate the neural net will be (assuming the error level reaches this minimum).
Selecting this option ensures that all data is scaled to fall within the range 0.0 to 1.0 – this is the usual range of data to be used with the default neural net Learning Parameters.
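Min-max scaling is the usual way to bring data into the 0.0 to 1.0 range; assuming that is what this option does, the per-column transformation looks like:

```python
def scale_column(values):
    """Scale a column of raw values into the 0.0-1.0 range
    (min-max scaling): the smallest value maps to 0.0, the largest
    to 1.0, and everything else falls proportionally in between."""
    lo, hi = min(values), max(values)
    if hi == lo:                        # constant column: map to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(scale_column([2, 4, 6, 10]))      # → [0.0, 0.25, 0.5, 1.0]
```

Note that queries against a trained net should be scaled with the same minimum and maximum used during training, or the predictions will be skewed.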
These buttons control the training session.
Clicking this button will cause the application to attempt to read in the dataset from the data source (ie. Microsoft Excel workbook, Microsoft Access database, or comma-delimited text file) and then run the training session using the training parameters. Training will continue until the Min Training Error or Max Training Cycles is reached, or until the user clicks on the Stop Training button. During training the iteration count and the error level are displayed at the bottom of the NxNet window.
Clicking this button during a training session will cause the training to stop and the evaluation statistics to be printed in the Training Evaluation table.
This button can only be selected once training has stopped. Clicking the button causes the neural net state to be randomised and all learning lost. This may be required if learning is particularly slow.
At the end of each training session a set of evaluation statistics is calculated for the output node values. This table prints the training run number, the output node label, the correlation coefficient (r) for the correlation between the original values and the neural net's predicted values, the t statistic to indicate the likelihood of the correlation value occurring by chance, and the p statistic to indicate the significance of the t statistic. Higher t values (and correspondingly smaller p values) mean a more significant result.
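The r and t statistics printed here follow the standard formulas; NxNet's exact computation is not shown in this manual, so treat the following as an illustrative sketch:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between the original values (xs) and the
    neural net's predicted values (ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t value testing whether a correlation of r over n pairs could
    have occurred by chance (degrees of freedom = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

The p value is then looked up from a t-distribution with n - 2 degrees of freedom (eg. via scipy.stats.t.sf); the smaller the p value, the less likely the correlation arose by chance.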
This tab contains controls that allow for one-off queries and batch queries.
To execute a single one-off query simply edit the values in the Input Nodes grid control (the Max and Min values should not need to be changed) then press the Run Single Case Query button to display the result in the Output Node grid control.
This is probably the more useful query option. This section contains controls that identify the batch data source containing the batch data set (ie. a table or Excel sheet), and that control execution of the query. Results are saved back to the data set. Examine the example data sets that come with NxNet to understand the layout that NxNet expects.
Click this button to open a file dialogue to identify the file data source (either a Microsoft Excel workbook or a Microsoft Access database).
Select the dataset (ie. table) from the data source using this drop down.
Clicking this button will cause the application to read in the data set and query the neural net with each row in the dataset. The predicted output values will then be saved back to the source data set. To view the results, wait until the batch run is complete and then open the data source in Excel or Access.
NxNet is distributed with several example data files to provide some guidance regarding the formatting of training data.
The following files are required for this example.
Example_SimpleXOR_MSAccessDB.mdb
Example_SimpleXOR_MSExcelWorkBook.xls
Example_SimpleXOR Net.nxn
An XOR (exclusive-or) relationship is one where there are two inputs, both of which can be true or false – when the values of the inputs are different (ie. one input is true and the other is false) the output should be 'true'; when both inputs are the same value the output should be 'false'. In the training data, 1 signifies 'true' and 0 signifies 'false'.
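In truth-table form the XOR training rows look like this (the column layout here is illustrative; the exact layout expected by NxNet is shown in the example files):

```python
# XOR truth table: output is 1 ('true') only when the inputs differ.
xor_rows = [
    # input1, input2, output
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

for a, b, out in xor_rows:
    assert out == (a ^ b)   # matches Python's exclusive-or operator
```

XOR is the classic BPN demonstration because the relationship cannot be learned without at least one hidden layer.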
Step 1. Load NxNet
Step 2. Load the neural net file Example_SimpleXOR Net.nxn by going to the menu item: File -> Open Neural Net or clicking on the Load Neural Net button on the Neural Net Configuration tab.
Step 3. Load the training data source Example_SimpleXOR_MSExcelWorkBook.xls or Example_SimpleXOR_MSAccessDB.mdb (see Training Data above) and select the training data table.
Step 4. Train the neural net by clicking the Start Training button.
Step 5. Once training has completed, test the trained neural net by conducting a Single Case Query: enter a ‘1’ for Input Node 1 and a ‘0’ for Input Node 2 in the Single Case Query grid control, then click the Run Single Case Query button. The result in the Output Node grid control is typically something like 0.99987… (ie. something close to 1.0 – ie. ‘true’). Change the second input node value to ‘1’ so that both inputs match, and click the Run Single Case Query button again. This time the result should be something like 0.000323… ie. close to 0.0 – ie. ‘false’.
Step 6. Save the neural net for future reference – should you ever need an XOR tester :-) – by going to the menu item File->Save Neural Net or clicking on the Save Neural Net button on the Neural Net Configuration tab.
The next example is slightly more interesting. The scenario is that a business has collated data on past employees, recording their years in the industry, years of schooling, how cooperative and how competitive they were, and finally whether they were successful in the role of worker or manager. The neural net is used to learn the relationship between these variables from the data, and then to predict the likelihood of future potential recruits being successful as either a worker or a manager.
The following files are required for this example:
Example_HR Recruitment Net - Untrained.nxn
Example_HR Recruitment_MSExcelWorkBook.xls
Step 1. Load NxNet
Step 2. Load the neural net file Example_HR Recruitment Net - Untrained.nxn
Step 3. Load the data file Example_HR Recruitment_MSExcelWorkBook.xls and select the training data table from within this data source.
Step 4. Train the neural net – if for some reason you have problems there is a pre-trained neural net file called Example_HR Recruitment Net - Trained.nxn you can use.
Step 5. Test the trained neural net using a Single Case Query
Step 6. Start a Batch Query by selecting Example_HR Recruitment_MSExcelWorkBook.xls as the batch query data source, select the batch table from within the data source and click the Run Batch Query button.
Step 7. Locate the file Example_HR Recruitment_MSExcelWorkBook.xls on disk and note that the output values have been written into the batch data table.
Other potential projects include any scenario where you have access to suitable training data (preferably lots of it) for a set of variables that are thought to be related to some outcome or outcomes that you’re interested in. Some suggestions (some more serious than others!) are listed below – please note that no guarantee is either given or implied nor is any liability accepted should you decide to undertake any of these suggestions.
- Footy tab predictor – you would need team statistics and their performance over the season or several seasons. A suggestion would be to record: the number of regular players excluded due to injury, average player salary, coach’s salary, whether they won their last game, whether they were playing on their home ground, whether they won or not… plus any other data you think might be relevant. Then construct a neural net with one input node per variable, and one output node that predicts the likelihood of a win or loss. You would back whichever team had the highest likelihood of winning.
- Horse race predictor – To make this work you would need to collect horse statistics for several seasons. Again you would then construct a neural net with as many input nodes as statistics collected and with a single output node predicting the likelihood of a win.
- Stock market share predictor – this would involve constructing a neural net that would predict when to sell or buy a given stock – you would need to collect performance information about investment stock and each firm and any information that you thought was related to the longer term performance of shares. You would need to collect this data for several hundred stock market entries. There are many informative sites on the Internet covering how to use a neural net to predict stock market changes – search for ‘neural net stock market’ on Google.
- Weather predictor – same principle as previous suggestions.
- Handwriting recognition – develop an application which has a 100 x 100 square of ‘sensor cells’ (eg. pixels) on which the user writes a letter (eg. using the mouse). Each cell is an input node for the neural net, and the neural net has 26 output nodes – one for each letter of the alphabet. You would then need to create a large amount of training data by asking a very patient person to write letters with the mouse on the 100 x 100 square and recording which letter each was. The more variations recorded for each letter the better.