
exit("********ERROR:select_random_MO:: reached [] without selecting a mutation operator.").

%select_random_MO/1, using the analogy of a roulette wheel, first calculates the entire area of the wheel by summing together all the slice sizes. The function then randomly chooses a spot on the wheel, and through select_random_MO/3 determines which mutation operator that spot falls on. Since some slices are larger than others, they have proportionally larger probabilities of being selected.
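To make the mechanism concrete, the following is a minimal, self-contained sketch of this roulette wheel selection over {MutationOperator,SliceSize} tuples. The function names mirror the description above, but the tuple format and exact bodies are illustrative, and may differ from the actual module:

	%A sketch of roulette-wheel selection over {Operator,SliceSize} tuples.
	select_random_MO(MutationOperators)->
		%The entire area of the wheel is the sum of all the slice sizes.
		TotalArea = lists:sum([SliceSize || {_MO,SliceSize} <- MutationOperators]),
		Choice = random:uniform()*TotalArea, %A random spot on the wheel.
		select_random_MO(MutationOperators,Choice,0).

	select_random_MO([{MO,SliceSize}|MOs],Choice,Range_From)->
		Range_To = Range_From+SliceSize,
		case (Choice >= Range_From) and (Choice =< Range_To) of
			true -> MO; %The chosen spot falls on this operator's slice.
			false -> select_random_MO(MOs,Choice,Range_To)
		end;
	select_random_MO([],_Choice,_Range_From)->
		exit("********ERROR:select_random_MO:: reached [] without selecting a mutation operator.").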

With this modification, the genome_mutator module can now function with our updated system architecture. The way we currently have the mutation operators set up and specified in the constraint record in records.hrl, each one's "slice size" is 1, thus they are all still equally likely to be selected. But this new approach gives us the ability to test mutation operator lists where each operator has a different chance of being selected. In this way we can rapidly convert our memetic algorithm based neuroevolutionary system into a genetic algorithm based one, by for example setting the max_attempts parameter to 1, and drastically increasing the probability of selecting the mutate_weights mutation operator.
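For example, a GA-leaning configuration might weight the slices as follows (a hypothetical sketch, assuming the constraint field is named mutation_operators; the operator names come from our system, but the weights are purely illustrative):

	#constraint{
		...
		mutation_operators = [
			{mutate_weights,100}, %Heavily favored, producing GA-like behavior.
			{add_neuron,1},
			{add_inlink,1},
			{add_outlink,1},
			{mutate_af,1}
		]
		...
	}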

Having now updated the essential parts of our mutation algorithm, we need to add the new mutation operators, ones that mutate plasticity functions and other parameters, similar to the manner in which mutate_af (mutate activation function) works. To be able to evolve the new decoupled features of our system, we should add the following new mutation operators:

mutate_pf : Mutates the plasticity function. Checks the currently used plasticity function, and if there are other plasticity functions available in the constraint of the agent, then the current function is swapped for a random new one. If there are no new plasticity functions available, the operator exits with an error, thus not wasting a mutation on an unavailable mutation operator.
mutate_aggrf : Mutates the neural aggregation function. As with plasticity and activation functions, it checks if there are other aggregation functions available. If there are, then the currently used function is mutated into another one. If there aren't, then the mutation operator exits with an error.


We add these mutate_pf and mutate_aggrf mutation operators to the genome_mutator module, as shown in Listing-11.3. Similar to the mutate_af operator, the mutate_pf mutation operator chooses a random neuron in the NN, and then changes its currently used plasticity function to another one available in the plasticity_fs list in the agent's constraint record. In the same way, the mutate_aggrf operator chooses a random neuron, and then mutates its aggregation function (dot_product, diff...). In both cases, if the only available such function is the one already being used by the neuron, then it is left in place, and our neuroevolutionary system tries to use another mutation operator on the agent.

Listing-11.3 The implementation of the new mutate_pf (mutate plasticity function) and mutate_aggrf (mutate aggregation function) mutation operators.

mutate_pf(Agent_Id)->
	A = genotype:read({agent,Agent_Id}),
	Cx_Id = A#agent.cx_id,
	Cx = genotype:read({cortex,Cx_Id}),
	N_Ids = Cx#cortex.neuron_ids,
	N_Id = lists:nth(random:uniform(length(N_Ids)),N_Ids),
	Generation = A#agent.generation,
	N = genotype:read({neuron,N_Id}),
	PF = N#neuron.pf,
	case (A#agent.constraint)#constraint.neural_pfs -- [PF] of
		[] ->
			exit("********ERROR:mutate_pf:: There are no other plasticity functions to use.");
		Plasticity_Functions ->
			NewPF = lists:nth(random:uniform(length(Plasticity_Functions)),Plasticity_Functions),
			U_N = N#neuron{pf=NewPF,generation=Generation},
			EvoHist = A#agent.evo_hist,
			U_EvoHist = [{mutate_pf,N_Id}|EvoHist],
			U_A = A#agent{evo_hist=U_EvoHist},
			genotype:write(U_N),
			genotype:write(U_A)
	end.
%The mutate_pf/1 function chooses a random neuron, and then changes its currently used plasticity function into another one available from the neural_pfs list of the agent's constraint record.

mutate_aggrf(Agent_Id)->
	A = genotype:read({agent,Agent_Id}),
	Cx_Id = A#agent.cx_id,
	Cx = genotype:read({cortex,Cx_Id}),
	N_Ids = Cx#cortex.neuron_ids,
	N_Id = lists:nth(random:uniform(length(N_Ids)),N_Ids),
	Generation = A#agent.generation,
	N = genotype:read({neuron,N_Id}),
	AggrF = N#neuron.aggr_f,
	case (A#agent.constraint)#constraint.neural_aggr_fs -- [AggrF] of
		[] ->
			exit("********ERROR:mutate_aggrf:: There are no other aggregation functions to use.");
		Aggregation_Functions ->
			NewAggrF = lists:nth(random:uniform(length(Aggregation_Functions)),Aggregation_Functions),
			U_N = N#neuron{aggr_f=NewAggrF,generation=Generation},
			EvoHist = A#agent.evo_hist,
			U_EvoHist = [{mutate_aggrf,N_Id}|EvoHist],
			U_A = A#agent{evo_hist=U_EvoHist},
			genotype:write(U_N),
			genotype:write(U_A)
	end.
%The mutate_aggrf/1 function chooses a random neuron, and then changes its currently used aggregation function into another one available from the neural_aggr_fs list of the agent's constraint record.

It is also worth adding the following mutation operators, which have nothing to do with topological mutation, but instead mutate the evolutionary strategy, the evolutionary search algorithm itself:

mutate_tuning_selection : Mutates the tuning selection function used by the agent to tune the NN during training.
mutate_tuning_duration : Mutates the tuning duration function used by the agent to tune the NN during training.
mutate_tuning_annealing : Mutates the tuning annealing parameter used by the agent to tune the NN during training.
mutate_perturbation_range : Mutates the perturbation range used by the exoself when tuning the synaptic weights of the NN.
mutate_tot_topological_mutations : Mutates the function responsible for calculating the total number of topological mutations to be applied to the NN.

The evolutionary strategy mutation operators and their parameters are unrelated to the actual topological mutations, so we should apply them separately from the topological mutation operators. Not only should they be applied separately, but the number of these evolutionary strategy mutation operators, and the probability of applying them, should also be independent of the topological mutation operators. For this reason we add and define the new macro (this one is going to be a descriptive one): ?SEARCH_PARAMETERS_MUTATION_PROBABILITY, in the genome_mutator module. We also further augment the mutate/1 function, adding the mutate_SearchParameters/1 function to it. The mutate_SearchParameters/1 function is executed every time the agent undergoes a topological mutation phase. The new mutation probability value defines the chance that the mutate_SearchParameters/1 function performs any type of evolutionary strategy mutation.
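For example, the macro might be defined as follows (the 10% value here is illustrative, not prescriptive):

	-define(SEARCH_PARAMETERS_MUTATION_PROBABILITY,0.1).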

In the case that the evolutionary strategy (ES) is mutated, the number of evolutionary strategy mutation operators applied to the agent is uniformly and randomly chosen to be between 1 and the total number of ES mutation operators available. In a similar way to how we used to define the standard topological mutation operators at the top of the genome_mutator module, we now define the ES mutation operators, having moved the topological mutation operators to the constraint record. The new ES mutation operators are defined as follows:

-define(ES_MUTATORS,[
	mutate_tuning_selection,
	mutate_tuning_duration,
	mutate_tuning_annealing,
	mutate_tot_topological_mutations
]).

Though a case could be made that we should define these ES mutation operators in the same way we are now defining the topological mutation operators, there is at this point no need for it, since the addition of ES mutation is done primarily to give our neuroevolutionary system a greater level of flexibility, so that it can be tweaked more easily in the future with different types of search algorithms and parameters.

The updated version of the mutate/1 function, and the mutate_SearchParameters/1 function that it executes, is shown in Listing-11.4. As you will notice, the mutate_SearchParameters/1 function operates very similarly to the original function that applied the topological mutation operators.

Listing-11.4 The new version of the mutate/1 function, with the added mutate_SearchParameters/1 function that applies, with a probability of ?SEARCH_PARAMETERS_MUTATION_PROBABILITY, a random number of ES mutation operators to the agent.

mutate(Agent_Id)->
	random:seed(now()),
	F = fun()->
		mutate_SearchParameters(Agent_Id),
		A = genotype:read({agent,Agent_Id}),
		{TTM_Name,Parameter} = A#agent.tot_topological_mutations_f,
		TotMutations = tot_topological_mutations:TTM_Name(Parameter,Agent_Id),
		OldGeneration = A#agent.generation,
		NewGeneration = OldGeneration+1,
		genotype:write(A#agent{generation = NewGeneration}),
		apply_Mutators(Agent_Id,TotMutations),
		genotype:update_fingerprint(Agent_Id)
	end,
	mnesia:transaction(F).
%The function mutate/1 first applies mutate_SearchParameters/1 to the agent, then updates the generation of the agent to be mutated, then calculates the number of mutation operators to be applied to it by executing the tot_topological_mutations:TTM_Name/2 function, and finally runs the apply_Mutators/2 function, which mutates the agent. Once the agent is mutated, the function updates its fingerprint by executing genotype:update_fingerprint/1.

mutate_SearchParameters(Agent_Id)->
	case random:uniform() < ?SEARCH_PARAMETERS_MUTATION_PROBABILITY of
		true ->
			TotMutations = random:uniform(length(?ES_MUTATORS)),
			apply_ESMutators(Agent_Id,TotMutations);
		false ->
			ok
	end.
%The mutate_SearchParameters/1 function mutates the search parameters of the evolutionary strategy with a probability of ?SEARCH_PARAMETERS_MUTATION_PROBABILITY. When it does mutate the evolutionary strategy, it chooses a random number between 1 and length(?ES_MUTATORS) of evolutionary strategy mutation operators from the ?ES_MUTATORS list, and then executes them in series.

apply_ESMutators(_Agent_Id,0)->
	done;
apply_ESMutators(Agent_Id,MutationIndex)->
	ES_Mutators = ?ES_MUTATORS,
	ES_Mutator = lists:nth(random:uniform(length(ES_Mutators)),ES_Mutators),
	io:format("Evolutionary Strategy Mutation Operator:~p~n",[ES_Mutator]),
	F = fun()->
		genome_mutator:ES_Mutator(Agent_Id)
	end,
	Result = mnesia:transaction(F),
	case Result of
		{atomic,_} ->
			apply_ESMutators(Agent_Id,MutationIndex-1);
		Error ->
			io:format("******** Error:~p~nRetrying with new Mutation...~n",[Error]),
			apply_ESMutators(Agent_Id,MutationIndex-1)
	end.
%The apply_ESMutators/2 function chooses an evolutionary strategy mutation operator, with uniform distribution, from the ?ES_MUTATORS list of such functions. It then applies it to the agent. Whether the mutation is successful or not, the function counts down the total number of mutation operators left to apply. This ensures that if the researcher has set each such evolutionary strategy parameter to be static, with only one available value for every agent, the system will still try to mutate the strategy TotMutations number of times, and then return to the caller whether it was successful or not.

Unlike the case with the application of topological mutation operators, if the application of the ES mutation operator is not successful, we still decrement the MutationIndex value. This ensures that whether or not our system has multiple annealing, selection, duration, and tot_topological_mutation parameters and functions, the MutationIndex will still reach 0. Thus, even if every ES mutation operator fails, the apply_ESMutators/2 function will be able to finish and return to the caller.

As with the topological mutation operators, the ES operators also need to be implemented. For the time being we will implement these functions in the genome_mutator module, rather than in their own module. These simple ES mutation operator functions are shown in Listing-11.5.

Listing-11.5 The implementation of the three new evolutionary strategy mutation operators: mutate_tuning_selection/1, mutate_tuning_annealing/1, and mutate_tot_topological_mutations/1.

mutate_tuning_selection(Agent_Id)->
	A = genotype:read({agent,Agent_Id}),
	case (A#agent.constraint)#constraint.tuning_selection_fs -- [A#agent.tuning_selection_f] of
		[] ->
			exit("********ERROR:mutate_tuning_selection/1:: Nothing to mutate, only a single function available.");
		Tuning_Selection_Functions ->
			New_TSF = lists:nth(random:uniform(length(Tuning_Selection_Functions)),Tuning_Selection_Functions),
			U_A = A#agent{tuning_selection_f = New_TSF},
			genotype:write(U_A)
	end.
%The mutate_tuning_selection/1 function checks whether any tuning selection functions other than the one currently used are available in the agent's constraint record. If there are, then it chooses a random one from this list, and sets the agent's tuning_selection_f to it. If there are no other tuning selection functions, then it exits with an error.

mutate_tuning_annealing(Agent_Id)->
	A = genotype:read({agent,Agent_Id}),
	case (A#agent.constraint)#constraint.annealing_parameters -- [A#agent.annealing_parameter] of
		[] ->
			exit("********ERROR:mutate_tuning_annealing/1:: Nothing to mutate, only a single parameter available.");
		Tuning_Annealing_Parameters ->
			New_TAP = lists:nth(random:uniform(length(Tuning_Annealing_Parameters)),Tuning_Annealing_Parameters),
			U_A = A#agent{annealing_parameter = New_TAP},
			genotype:write(U_A)
	end.
%The mutate_tuning_annealing/1 function checks whether any tuning annealing parameters other than the one currently used are available in the agent's constraint record. If there are, then it chooses a random one from the list, and sets the agent's annealing_parameter to it. If there are no other tuning annealing parameters, then it exits with an error.

mutate_tot_topological_mutations(Agent_Id)->
	A = genotype:read({agent,Agent_Id}),
	case (A#agent.constraint)#constraint.tot_topological_mutations_fs -- [A#agent.tot_topological_mutations_f] of
		[] ->
			exit("********ERROR:mutate_tot_topological_mutations/1:: Nothing to mutate, only a single function available.");
		Tot_Topological_Mutations_Functions ->
			New_TTF = lists:nth(random:uniform(length(Tot_Topological_Mutations_Functions)),Tot_Topological_Mutations_Functions),
			U_A = A#agent{tot_topological_mutations_f = New_TTF},
			genotype:write(U_A)
	end.
%The mutate_tot_topological_mutations/1 function checks whether any tot topological mutation functions other than the one currently used are available in the agent's constraint record. If there are, then it chooses a random one from this list, and sets the agent's tot_topological_mutations_f to it. If there are no other functions that can calculate tot topological mutations, then it exits with an error.

These new additions do of course make our source code slightly more complex, but as you've noticed, it is still very simple, and the added flexibility will pay off when we decide that we wish to test out different evolutionary strategies with different parameters. In the following sections we develop the code needed to convert this new genotype to its phenotype.


11.4.3 Updating the population_monitor Module

The population_monitor module is the one responsible for mapping the genotypes to their phenotypes. We have modified the genotype in a number of ways, and thus we must now modify the population monitor process such that it can convert the agent's elements into their corresponding process based representations. We must also change the way population_monitor calculates the agent's true fitness, the way it uses the selection function, and the way it implements the evolutionary loop, so that it is based on whether the loop is steady-state or generational. Currently, the various functions and information flow in the population_monitor have the form shown in Fig. 11.3.

Fig. 11.3 The information flow, and function execution order, in the population_monitor process.

What we need to do is change this architecture such that the population_monitor behaves differently based on whether it is using steady-state evolution or generational evolution. Finally, we also need to modify the init_population function such that all these parameters are specified through the constraint record, and so that the evolutionary loop function, fitness function, and selection function are saved to the population's record, and read from it during the population_monitor's operation. The new population monitor should operate as follows:

1. Specify the constraints, set ?INIT_CONSTRAINTS to them, and then call init_population/1 with them.

2. init_population({Population_Id,Constraints,OpMode})

This is the function with which we create a new population, and start the population_monitor process. We specify the id of the new population, the constraints which it should use to create species and agents within the species, and the operational mode in which it should work. We have not yet used the OpMode parameter for anything specific; currently it is set by a constant macro to gt, which is just a placeholder. We will finally start using this parameter when we get to Chapter-19, and hit the dilemma of having to perform generalization tests.

3. If a population with Population_Id already exists, then:
4. delete_population: Delete the existing population with that id.
5. create_population: Create a new population with the specified id.
6. Else:
7. create_population(Population_Id,Constraints)
This function creates a new population, with the agents being created based on the constraints specified.
8. Start the population_monitor process.

At this point the population monitor needs to start waiting for the termination signals that are sent out by the agents when they terminate/die. The population_monitor can then act on those termination signals based on the evolutionary loop that it is using. For example, if it is using steady_state, then after receiving the termination signal, it should immediately generate a new agent using the selection function it's utilizing. If it is using a generational evolutionary loop, then it should simply count off the terminated agent, and wait for the remainder of the population to terminate. Once all the agents have terminated, the population_monitor should then use the selection algorithm to compose the next generation of agents.

9. Create the dead_pool list (empty when first created), which is to contain the genotypes of the best performing terminated agents.

10. Wait for termination signals.

11. If evo_alg_f == generational:

12. Wait for all agents to terminate.

13. Apply fitness_f to the agents' fitness scores, to produce a list of agents with their true fitness values.

14. Apply selection_f to choose the fit agents from which to compose the offspring agents.

15. Produce the offspring agents by cloning the fit organisms, and sending them through the topological mutation phase.

16. GOTO: Step-10

If evo_alg_f == steady_state:

17. After receiving the termination signal from an agent, enter it into the dead_pool, with its fitness updated through the application of the fitness_f function.

18. Using the selection function, choose an agent from the dead_pool to either create an offspring, or return/apply it to the simulation/scape.

19. Ensure that the dead_pool is of the specified size, if the dead_pool is overflowing with agents, then remove the worst performing agents in the dead_pool, until it reaches the specified size. This ensures that the dead_pool list contains the best performing agent genotypes.

20. GOTO: Step-10


We thus start by updating the init_population function. It originally accepted 4 parameters: Population_Id, Specie_Constraints, OpMode, and Selection_Function. The Selection_Function is now specified in the constraint record, so we can remove it from the parameter list. The init_population function checks if there already exists a population with the Population_Id that it was executed with. If that is the case, the function first deletes the already existing population, and then creates a new one. If there is no such population already in existence, then it creates a new population by executing the create_population function. The modified init_population function is shown in Listing-11.6.

Listing-11.6 The modified init_population function, where the selection function is specified through the constraint tuple.

init_population({Population_Id,Specie_Constraints,OpMode})->
	random:seed(now()),
	F = fun()->
		case genotype:read({population,Population_Id}) of
			undefined ->
				create_Population(Population_Id,Specie_Constraints);
			_ ->
				delete_population(Population_Id),
				create_Population(Population_Id,Specie_Constraints)
		end
	end,
	Result = mnesia:transaction(F),
	case Result of
		{atomic,_} ->
			population_monitor:start({OpMode,Population_Id});
		Error ->
			io:format("******** ERROR in PopulationMonitor:~p~n",[Error])
	end.

Though ?INIT_CONSTRAINTS contains a list of constraint records, one for each species the researcher wants the population to possess, it is nevertheless expected that population_evo_alg_f, population_fitness_f, and population_selection_f are the same for all these constraints. These constraint parameters are expected by the system to be global, belonging to the population to which the species belong. Thus, all the constraint tuples in the ?INIT_CONSTRAINTS list will have identical values for these parameters. We modify the create_population function to accept 2 parameters, and use the parameters in the constraint record to set the population's evolutionary loop, fitness, and selection functions, as shown in Listing-11.7.
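One way ?INIT_CONSTRAINTS might then look is sketched below. The morphology and connection_architecture values are placeholders, while the three population-level parameters use function names introduced in this chapter:

	-define(INIT_CONSTRAINTS,[
		#constraint{
			morphology = Morphology,
			connection_architecture = CA,
			population_evo_alg_f = generational,
			population_fitness_f = size_proportional,
			population_selection_f = competition
		} || Morphology <- [xor_mimic], CA <- [feedforward]
	]).

Because all the constraints come from the same list comprehension, the three population-level parameters are guaranteed to be identical across them.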

Listing-11.7 The updated create_Population/2 function.

create_Population(Population_Id,Specie_Constraints)->
	SpecieSize = ?INIT_SPECIE_SIZE,
	Specie_Ids = [create_specie(Population_Id,SpecCon,origin,SpecieSize) || SpecCon <- Specie_Constraints],
	[C|_] = Specie_Constraints,
	Population = #population{
		id = Population_Id,
		specie_ids = Specie_Ids,
		evo_alg_f = C#constraint.population_evo_alg_f,
		fitness_f = C#constraint.population_fitness_f,
		selection_f = C#constraint.population_selection_f
	},
	genotype:write(Population).

As you've noticed, due to the selection algorithm now being stored in the population record, the init_population function calls start/1 with just the OpMode and Population_Id parameters. Thus, we must also modify the continue/2 and continue/3 functions from:

continue(OpMode,Selection_Algorithm)->
	Population_Id = test,
	population_monitor:start({OpMode,Population_Id,Selection_Algorithm}).
continue(OpMode,Selection_Algorithm,Population_Id)->
	population_monitor:start({OpMode,Population_Id,Selection_Algorithm}).

To versions expecting the population record to carry all the information needed to start the population_monitor process:

continue(OpMode)->
	Population_Id = test,
	population_monitor:start({OpMode,Population_Id}).
continue(OpMode,Population_Id)->
	population_monitor:start({OpMode,Population_Id}).

With this done, we now modify the init/1 function, and then the actual process's functionality, by updating the call and cast handling functions of this module. The init/1 function requires that the population_monitor state record also keeps track of the evolutionary_algorithm (generational or steady_state), fitness_postprocessor, and selection_algorithm functions. The updated version of the state record and the init/1 function are shown in Listing-11.8.

Listing-11.8 The updated state record, and the init/1 function.


-record(state,{
	op_mode, population_id, activeAgent_IdPs=[], agent_ids=[], tot_agents,
	agents_left, op_tag, agent_summaries=[], pop_gen=0, eval_acc=0, cycle_acc=0,
	time_acc=0, step_size, next_step, goal_status, evolutionary_algorithm,
	fitness_postprocessor, selection_algorithm}).
...
init(Parameters) ->
	process_flag(trap_exit,true),
	register(monitor,self()),
	io:format("******** Population monitor started with parameters:~p~n",[Parameters]),
	State = case Parameters of
		{OpMode,Population_Id}->
			Agent_Ids = extract_AgentIds(Population_Id,all),
			ActiveAgent_IdPs = summon_agents(OpMode,Agent_Ids),
			P = genotype:dirty_read({population,Population_Id}),
			#state{op_mode = OpMode,
				population_id = Population_Id,
				activeAgent_IdPs = ActiveAgent_IdPs,
				tot_agents = length(Agent_Ids),
				agents_left = length(Agent_Ids),
				op_tag = continue,
				evolutionary_algorithm = P#population.evo_alg_f,
				fitness_postprocessor = P#population.fitness_f,
				selection_algorithm = P#population.selection_f}
	end,
	{ok, State}.

The first cast, the one which handles the terminating agents, is easily modified by updating the guard of the cast from one using a selection_function parameter to one using the evolutionary_algorithm, and by modifying the mutate_population function to allow it to be called with an extra parameter, the fitness_postprocessor function name. These modifications are shown in the following snippet of source code, showing just the two modified lines:

handle_cast({Agent_Id,terminated,Fitness,AgentEvalAcc,AgentCycleAcc,AgentTimeAcc},S)
	when S#state.evolutionary_algorithm == generational ->
	...
	mutate_population(Population_Id,?SPECIE_SIZE_LIMIT,S#state.fitness_postprocessor,S#state.selection_algorithm),

We will create the cast handling clause which implements the steady_state evolutionary loop in a later section. For now, we update the mutate_population/4 function. The updated mutate_population function, which calls the mutate_specie function for every species in the population, is shortened dramatically, because it now offloads both the fitness postprocessing and the actual selection of fit agents to the specialized modules that we will create later. The updated mutate_population and mutate_specie functions are shown in Listing-11.9.

Listing-11.9 The updated mutate_population and mutate_specie functions, which utilize the specialized fitness_postprocessor and selection_algorithm modules.

mutate_population(Population_Id,KeepTot,Fitness_Postprocessor,Selection_Algorithm)->
	NeuralEnergyCost = calculate_EnergyCost(Population_Id),
	F = fun()->
		P = genotype:read({population,Population_Id}),
		Specie_Ids = P#population.specie_ids,
		[mutate_Specie(Specie_Id,KeepTot,NeuralEnergyCost,Fitness_Postprocessor,Selection_Algorithm) || Specie_Id <- Specie_Ids]
	end,
	{atomic,_} = mnesia:transaction(F).
%The function mutate_population/4 mutates the agents within every specie in its specie_ids list, maintaining each specie within the size of KeepTot. The function first calculates the average cost of each neuron, and then mutates each specie separately using the particular Fitness_Postprocessor and Selection_Algorithm parameters for that specie.

mutate_Specie(Specie_Id,PopulationLimit,NeuralEnergyCost,Fitness_Postprocessor_Name,Selection_Algorithm_Name)->
	S = genotype:dirty_read({specie,Specie_Id}),
	{AvgFitness,Std,MaxFitness,MinFitness} = calculate_SpecieFitness({specie,S}),
	Agent_Ids = S#specie.agent_ids,
	Sorted_AgentSummaries = lists:reverse(lists:sort(construct_AgentSummaries(Agent_Ids,[]))),
	io:format("Using: Fitness Postprocessor:~p Selection Algorithm:~p~n",[Fitness_Postprocessor_Name,Selection_Algorithm_Name]),
	ProperlySorted_AgentSummaries = fitness_postprocessor:Fitness_Postprocessor_Name(Sorted_AgentSummaries),
	{NewGenAgent_Ids,TopAgent_Ids} = selection_algorithm:Selection_Algorithm_Name(ProperlySorted_AgentSummaries,NeuralEnergyCost,PopulationLimit),
	{FList,_TNList,_AgentIds} = lists:unzip3(Sorted_AgentSummaries),
	[TopFitness|_] = FList,
	{Factor,Fitness} = S#specie.innovation_factor,
	U_InnovationFactor = case TopFitness > Fitness of
		true ->
			{0,TopFitness};
		false ->
			{Factor-1,Fitness}
	end,
	genotype:write(S#specie{
		agent_ids = NewGenAgent_Ids,
		champion_ids = TopAgent_Ids,
		fitness = {AvgFitness,Std,MaxFitness,MinFitness},
		innovation_factor = U_InnovationFactor}).
%The function mutate_Specie/5 calls the selection algorithm function to separate the fit from the unfit organisms in the specie, and then mutates the fit organisms to produce offspring, maintaining the total specie size within PopulationLimit. The function first calls the fitness_postprocessor function, which re-sorts the agent summaries. The re-sorted summaries are then split into a valid (fit) and invalid (unfit) list of agents by the selection algorithm. The invalid agents are deleted, and the valid agents are used to create offspring using the particular Selection_Algorithm_Name function. The agent ids belonging to the next generation (the valid agents and their offspring) are then produced by the selection function. Then, the innovation factor (the last time the specie's top fitness improved) is updated. And finally, the ids of the top 3 agents within the specie are noted, and the updated specie record is written to database.

construct_AgentSummaries([Agent_Id|Agent_Ids],Acc)->
	A = genotype:dirty_read({agent,Agent_Id}),
	construct_AgentSummaries(Agent_Ids,[{A#agent.fitness,length((genotype:dirty_read({cortex,A#agent.cx_id}))#cortex.neuron_ids),Agent_Id}|Acc]);
construct_AgentSummaries([],Acc)->
	Acc.
%The construct_AgentSummaries/2 function reads the agents in the Agent_Ids list, and composes a list of tuples of the following format: [{AgentFitness,AgentTotNeurons,Agent_Id}...]. This list of tuples is referred to as AgentSummaries. Once the AgentSummaries list is composed, it is returned to the caller.
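For example, three agents with fitness scores of 290.5, 233.1, and 198.0, and NN sizes of 12, 7, and 5 neurons respectively (illustrative values, with ids abbreviated), would produce the following AgentSummaries list:

	[{290.5,12,Agent_Id1},{233.1,7,Agent_Id2},{198.0,5,Agent_Id3}]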

The population_monitor module is simplified by offloading the selection and fitness postprocessing functions to their own respective modules. After this modification, the population_monitor primarily holds population and specie operator functions. In the following sections we build the selection and fitness postprocessing modules.

11.4.4 Creating the selection_algorithm Module

The selection_algorithm module is a container for the selection_algorithm functions. In the original population_monitor module we had two such functions for the generational evolutionary algorithm loop: the competition selection function, and the top3 selection function. We modified the population_monitor system in the previous section by moving the fitness postprocessing code and the selection code to their own respective modules. The following listing shows the selection_algorithm module, after the competition and top3 functions were modified to be self-contained within the module.


Listing-11.10 The implementation of the selection_algorithm module.

-module(selection_algorithm).
-compile(export_all).
-include("records.hrl").
-define(SURVIVAL_PERCENTAGE,0.5).

competition(PropSorted_ASummaries,NeuralEnergyCost,PopulationLimit)->
	TotSurvivors = round(length(PropSorted_ASummaries)*?SURVIVAL_PERCENTAGE),
	Valid_AgentSummaries = lists:sublist(PropSorted_ASummaries,TotSurvivors),
	Invalid_AgentSummaries = PropSorted_ASummaries -- Valid_AgentSummaries,
	{_,_,Invalid_AgentIds} = lists:unzip3(Invalid_AgentSummaries),
	[genotype:delete_Agent(Agent_Id) || Agent_Id <- Invalid_AgentIds],
	io:format("Valid_AgentSummaries:~p~n",[Valid_AgentSummaries]),
	io:format("Invalid_AgentSummaries:~p~n",[Invalid_AgentSummaries]),
	TopAgentSummaries = lists:sublist(Valid_AgentSummaries,3),
	{_TopFitnessList,_TopTotNs,TopAgent_Ids} = lists:unzip3(TopAgentSummaries),
	io:format("NeuralEnergyCost:~p~n",[NeuralEnergyCost]),
	{AlotmentsP,NextGenSize_Estimate} = calculate_allotments(Valid_AgentSummaries,NeuralEnergyCost,[],0),
	Normalizer = NextGenSize_Estimate/PopulationLimit,
	io:format("Population size normalizer:~p~n",[Normalizer]),
	NewGenAgent_Ids = gather_survivors(AlotmentsP,Normalizer,[]),
	{NewGenAgent_Ids,TopAgent_Ids}.
%The competition/3 function implements the "competition" selection algorithm. The function receives the already sorted agent summaries, and executes calculate_allotments/4 to calculate the number of offspring allotted for each agent in the Valid_AgentSummaries list. The function then calculates the Normalizer value, which is used to normalize the allotted number of offspring for each agent, to ensure that the final specie size is within the PopulationLimit. The function then drops into the gather_survivors/3 function which, using the normalized offspring allotment values, creates the actual mutant offspring. Finally, the function returns to the caller a tuple composed of the new generation's agent ids, and the top 3 agent ids of the current generation.

calculate_allotments([{Fitness,TotNeurons,Agent_Id}|Sorted_AgentSummaries],NeuralEnergyCost,Acc,NewPopAcc)->
	NeuralAlotment = Fitness/NeuralEnergyCost,
	MutantAlotment = NeuralAlotment/TotNeurons,
	U_NewPopAcc = NewPopAcc+MutantAlotment,
	calculate_allotments(Sorted_AgentSummaries,NeuralEnergyCost,[{MutantAlotment,Fitness,TotNeurons,Agent_Id}|Acc],U_NewPopAcc);
calculate_allotments([],_NeuralEnergyCost,Acc,NewPopAcc)->
	io:format("NewPopAcc:~p~n",[NewPopAcc]),
	{Acc,NewPopAcc}.
%The calculate_allotments/4 function accepts the AgentSummaries list as a parameter, and for each agent, using the NeuralEnergyCost, calculates how many offspring that agent can produce by using the agent's Fitness, TotNeurons, and NeuralEnergyCost values. The function first calculates how many neurons the agent is allotted, based on the agent's fitness and the cost of each neuron (which itself was calculated based on the average performance of the population). From the number of neurons allotted to the agent, the function then calculates how many offspring the agent should be allotted, by dividing the number of neurons it is allotted by the agent's NN size. The function also keeps track of how many offspring will be created from all these agents in general, by adding up all the offspring allotments. The calculate_allotments/4 function does this for each tuple in the AgentSummaries, and then returns the calculated allotment list and NewPopAcc to the caller.
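To make the arithmetic concrete (with illustrative numbers): if an agent has a Fitness of 100, the NeuralEnergyCost is 10, and the agent's NN is composed of 5 neurons, then NeuralAlotment = 100/10 = 10 neurons, and MutantAlotment = 10/5 = 2 offspring. An equally fit agent with a 10 neuron NN would only be allotted 1 offspring, which is how the algorithm applies pressure towards smaller networks.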

gather_survivors([{MutantAlotment,Fitness,TotNeurons,Agent_Id}|AlotmentsP],Normalizer,Acc)->
	Normalized_MutantAlotment = round(MutantAlotment/Normalizer),
	io:format("Agent_Id:~p Normalized_MutantAlotment:~p~n",[Agent_Id,Normalized_MutantAlotment]),
	SurvivingAgent_Ids = case Normalized_MutantAlotment >= 1 of
		true ->
			MutantAgent_Ids = case Normalized_MutantAlotment >= 2 of
				true ->
					[population_monitor:create_MutantAgentCopy(Agent_Id) || _ <- lists:seq(1,Normalized_MutantAlotment-1)];
				false ->
					[]
			end,
			[Agent_Id|MutantAgent_Ids];
		false ->
			io:format("Deleting agent:~p~n",[Agent_Id]),
			genotype:delete_Agent(Agent_Id),
			[]
	end,
	gather_survivors(AlotmentsP,Normalizer,lists:append(SurvivingAgent_Ids,Acc));
gather_survivors([],_Normalizer,Acc)->
	io:format("New Population:~p PopSize:~p~n",[Acc,length(Acc)]),
	Acc.
%The gather_survivors/3 function accepts the list composed of the allotment tuples and a population normalizer value calculated by the competition/3 function, and from those values calculates the actual number of offspring that each agent should produce, creating those mutant offspring and accumulating the new generation agent ids. For each Agent_Id the function first calculates the normalized offspring allotment value, to ensure that the final number of agents in the specie is within the population limit of that specie. If the normalized offspring allotment is less than 1, the agent is deleted. If the offspring allotment is 1, the parent agent is allowed to survive to the next generation, but is not allowed to create any new offspring. If the offspring allotment is greater than one, then the agent is allowed to create Normalized_MutantAlotment-1 offspring, by calling upon the create_MutantAgentCopy/1 function, which creates an offspring and returns its id. Once all the offspring have been created, the function returns to the caller a list of ids composed of the surviving parent agent ids and their offspring: the next generation.

top3(ProperlySorted_AgentSummaries,NeuralEnergyCost,PopulationLimit)->
	TotSurvivors = 3,
	Valid_AgentSummaries = lists:sublist(ProperlySorted_AgentSummaries,TotSurvivors),
	Invalid_AgentSummaries = ProperlySorted_AgentSummaries -- Valid_AgentSummaries,
	{_,_,Invalid_AgentIds} = lists:unzip3(Invalid_AgentSummaries),
	{_,_,Valid_AgentIds} = lists:unzip3(Valid_AgentSummaries),
	[genotype:delete_Agent(Agent_Id) || Agent_Id <- Invalid_AgentIds],
	io:format("Valid_AgentSummaries:~p~n",[Valid_AgentSummaries]),
	io:format("Invalid_AgentSummaries:~p~n",[Invalid_AgentSummaries]),
	TopAgentSummaries = lists:sublist(Valid_AgentSummaries,3),
	{_TopFitnessList,_TopTotNs,TopAgent_Ids} = lists:unzip3(TopAgentSummaries),
	io:format("NeuralEnergyCost:~p~n",[NeuralEnergyCost]),
	NewGenAgent_Ids = random_offspring(Valid_AgentIds,PopulationLimit-TotSurvivors,[]),
	{NewGenAgent_Ids,TopAgent_Ids}.
%The top3/3 function is a simple selection algorithm. This function extracts the top 3 agents from the ProperlySorted_AgentSummaries list, subtracts 3 from the PopulationLimit, and then uses the function random_offspring/3 to create offspring based on these top 3 agents. Once the offspring have been created, the function returns a list of the offspring ids, and the top agent ids, back to the caller.

random_offspring(_Valid_AgentIds,0,Acc)->
	Acc;
random_offspring(Valid_AgentIds,OffspringIndex,Acc)->
	Parent_AgentId = lists:nth(random:uniform(length(Valid_AgentIds)),Valid_AgentIds),
	MutantAgent_Id = population_monitor:create_MutantAgentCopy(Parent_AgentId),
	random_offspring(Valid_AgentIds,OffspringIndex-1,[MutantAgent_Id|Acc]).
%The random_offspring/3 function is part of a very simple selection algorithm, which just selects the top 3 most fit agents, and then uses the create_MutantAgentCopy/1 function to create their offspring. Each offspring is created from a randomly selected top agent.

competition(ProperlySorted_AgentSummaries)->
	TotEnergy = lists:sum([Fitness || {Fitness,_TotN,_Agent_Id} <- ProperlySorted_AgentSummaries]),
	TotNeurons = lists:sum([TotN || {_Fitness,TotN,_Agent_Id} <- ProperlySorted_AgentSummaries]),
	NeuralEnergyCost = TotEnergy/TotNeurons,
	{AlotmentsP,Normalizer} = calculate_allotments(ProperlySorted_AgentSummaries,NeuralEnergyCost,[],0),
	Choice = random:uniform(),
	{WinnerFitness,WinnerTotN,WinnerAgent_Id} = choose_CompetitionWinner(AlotmentsP,Normalizer,Choice,0),
	{WinnerFitness,WinnerTotN,WinnerAgent_Id}.
%competition/1 is the competition selection algorithm for the steady_state evolutionary loop implementation. It functions similarly to the competition/3 selection algorithm, but it converts the allotments into the probability of the agent being chosen as the winner of the selection algorithm. The population monitor decides what to do with the winner: either to create an offspring from it, re-enter it into a simulated environment, or re-apply it to some problem again.

choose_CompetitionWinner([{MutantAllotment,Fitness,TotN,Agent_Id}|AllotmentsP],Normalizer,Choice,Range_From)->
	Range_To = Range_From+MutantAllotment/Normalizer,
	case (Choice >= Range_From) and (Choice =< Range_To) of
		true ->
			{Fitness,TotN,Agent_Id};
		false ->
			choose_CompetitionWinner(AllotmentsP,Normalizer,Choice,Range_To)
	end;
choose_CompetitionWinner([],_Normalizer,_Choice,_Range_From)->
	exit("********ERROR:choose_CompetitionWinner:: reached [] without selecting a winner.").
%The choose_CompetitionWinner/4 function uses the Choice value to randomly choose an agent from the AllotmentsP list, with the probability of choosing the agent being proportional to the agent's MutantAllotment value.

Keeping all the selection functions in this module makes it easier for us to later add new ones, and to then simply reference them by name.
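This works because Erlang allows the function name in a call to be a variable. A minimal sketch of the dispatch (Selection_Algorithm_Name is an atom such as competition or top3 read from the population record; the surrounding variables are the ones used in mutate_Specie/5):

	Selection_Algorithm_Name = P#population.selection_f,
	{NewGenAgent_Ids,TopAgent_Ids} = selection_algorithm:Selection_Algorithm_Name(
		ProperlySorted_AgentSummaries,NeuralEnergyCost,PopulationLimit),

Adding a new selection algorithm then only requires adding a new function to this module, and referencing its name in the constraint.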

11.4.5 Creating the fitness_postprocessor Module

The fitness_postprocessor gives us an added level of flexibility when sorting and computing the fitness of the agents belonging to some species. In this manner, we can allow the scapes and various problems to concentrate on providing fitness scores to the agents based simply on their performance, rather than on other properties of those agents, like size and complexity for example. The fitness_postprocessor functions modify the fitness scores of the agents, such that the updated fitness score reflects some particular property that the researcher finds important, but which is general and separate from the particular simulation or problem that the neuroevolutionary system is applied to. Listing-11.11 presents the fitness_postprocessor module, which contains two simple fitness postprocessors: the none and the size_proportional functions.

Listing-11.11 The implementation of the fitness_postprocessor module.


-module(fitness_postprocessor).
-compile(export_all).
-include("records.hrl").
-define(EFF,0.1). %Efficiency.

none(Sorted_AgentSummaries)->
	Sorted_AgentSummaries.
%The none/1 fitness postprocessor function does nothing to the agent summaries, returning the original fitness scores to the caller.

size_proportional(Sorted_AgentSummaries)->
	SDX = lists:reverse(lists:sort([{Fitness/math:pow(TotN,?EFF),{Fitness,TotN,Agent_Id}} || {Fitness,TotN,Agent_Id} <- Sorted_AgentSummaries])),
	ProperlySorted_AgentSummaries = [Val || {_,Val} <- SDX],
	ProperlySorted_AgentSummaries.
%The size_proportional/1 fitness postprocessor function modifies the fitness scores belonging to the agent summaries such that they are decreased in proportion to the NN size and the ?EFF parameter. Every fitness score is changed to: TrueFitness = Fitness/math:pow(TotN,?EFF). Based on these true fitness scores, the agent summaries are re-sorted, and then returned to the caller.
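As a quick worked example with ?EFF = 0.1 (the value defined above): an agent with Fitness = 100 and TotN = 10 gets a true fitness of 100/math:pow(10,0.1), roughly 79.4, while an agent with the same fitness but only 2 neurons gets 100/math:pow(2,0.1), roughly 93.3. The smaller NN is thus ranked higher when raw fitness is equal.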

With this module completed, we now return to the population_monitor module, to add the cast clause which allows our neuroevolutionary system to employ the steady-state evolutionary loop.

11.4.6 Creating the steady_state Evolutionary Loop

We need to update the population_monitor process such that when the population is set to use a steady_state evolutionary loop with its complementary selection algorithm, the population_monitor process is able to maintain a proper population size, creating a new agent for every one that terminated. The creation of new agents must be done in such a way that the average fitness goes up, that there is evolution. In Chapter-10 we discussed the manner in which the DXNN platform solves this problem, and the implementation of the dead_pool list, which allows the neuroevolutionary system to track content drift, and gives the selection algorithm a list of ids to choose from when selecting a parent agent. We will take a similar approach with our system.

To update our population_monitor process such that it can deal with a steady_state evolutionary loop, we need to construct a cast clause that allows the population_monitor to perform the following set of steps:


1. The init_population function creates a new population of agents, with the seed population being of size X.
2. The population monitor spawns the seed population, and enters its main functional loop.
3. The population_monitor waits for termination signals from agents.
4. When the population_monitor receives a termination signal of the form: {Agent_Id, terminated, Fitness, AgentEvalAcc, AgentCycleAcc, AgentTimeAcc}, with the cast guard: S#state.evolutionary_algorithm == steady_state, it should function as follows:
5. Update the eval_acc, cycle_acc, and time_acc parameters of the state record.
6. If the termination condition is reached (based on eval_acc, or achieved fitness), go to Step-14. Else continue to Step-7.
7. Compose and add the terminated agent's summary tuple to its species' dead_pool list.
8. Apply the fitness_postprocessor function of the population to the dead_pool summary list.
9. Using the newly sorted dead_pool summary list, use the selection function to choose an agent from the dead_pool, an agent that will either be used as a parent for the creation of an offspring, or be the agent that will be released back into the environment (if the neuroevolutionary system is applied to ALife), or reapplied to the problem. This is unlike the selection algorithm function used in the generational evolutionary loop, which returned a list of agent ids belonging to the new generation, and a list of top/champion agent ids.
10. Randomly choose (90/10) to either use the agent to create an offspring (in which case the agent remains in the dead pool), or apply it to the problem again (in which case the agent is extracted from the dead pool).
11. If creating an offspring, then clone the selected agent, send the clone through the topological mutation phase, and then spawn the offspring and apply it to the problem. If the agent is selected to be re-applied to the problem, then extract it from the dead_pool, spawn it, and apply it to the problem.
12. Check if the size of the dead_pool is greater than X (population size). If the dead_pool size is greater than X, then keep the top X agents, and delete the remainder. This ensures that the dead_pool list contains the best performing agent genotypes. The active population is of size X, and we also make the dead_pool of size X, thus the total number of agents stored in the population is X*2.
13. GOTO: Step-3
14. Termination Condition Reached: Terminate all currently running phenotypes. This means that the population monitor must keep track of not just the inactive agents (the ones in the dead_pool), but also of the active ones.

The implemented cast clause based on this algorithm is shown in Listing-11.12.


Listing-11.12 The implementation of the steady_state handle_cast clause of the population monitor.

handle_cast({Agent_Id,terminated,Fitness,AgentEvalAcc,AgentCycleAcc,AgentTimeAcc},S)
	when S#state.evolutionary_algorithm == steady_state ->
	Population_Id = S#state.population_id,
	Specie_Ids = (genotype:dirty_read({population,Population_Id}))#population.specie_ids,
	SpecFitList = [(genotype:dirty_read({specie,Specie_Id}))#specie.fitness || Specie_Id <- Specie_Ids],
	BestFitness = lists:nth(1,lists:reverse(lists:sort([MaxFitness || {_,_,MaxFitness,_} <- SpecFitList]))),
	U_EvalAcc = S#state.eval_acc+AgentEvalAcc,
	U_CycleAcc = S#state.cycle_acc+AgentCycleAcc,
	U_TimeAcc = S#state.time_acc+AgentTimeAcc,
	case (S#state.eval_acc >= ?EVALUATIONS_LIMIT) or (BestFitness > ?FITNESS_GOAL) of
		true ->
			case lists:keydelete(Agent_Id,1,S#state.activeAgent_IdPs) of
				[] ->
					U_S = S#state{activeAgent_IdPs=[], eval_acc=U_EvalAcc, cycle_acc=U_CycleAcc, time_acc=U_TimeAcc},
					{stop,normal,U_S};
				U_ActiveAgent_IdPs ->
					U_S = S#state{activeAgent_IdPs=U_ActiveAgent_IdPs, eval_acc=U_EvalAcc, cycle_acc=U_CycleAcc, time_acc=U_TimeAcc},
					{noreply,U_S}
			end;
		false ->
			io:format("Tot Evaluations:~p~n",[S#state.eval_acc+AgentEvalAcc]),
			FitnessPostprocessorName = S#state.fitness_postprocessor,
			SelectionAlgorithmName = S#state.selection_algorithm,
			A = genotype:dirty_read({agent,Agent_Id}),
			Morphology = (A#agent.constraint)#constraint.morphology,
			io:format("Agent_Id:~p of morphology:~p with fitness:~p terminated.~n",[Agent_Id,Morphology,Fitness]),
			Specie_Id = A#agent.specie_id,
			Specie = genotype:dirty_read({specie,Specie_Id}),
			Old_DeadPool_AgentSummaries = Specie#specie.dead_pool,
			Old_Agent_Ids = Specie#specie.agent_ids,
			io:format("Old_DeadPool:~p~n Old_Agent_Ids:~p~n",[Old_DeadPool_AgentSummaries,Old_Agent_Ids]),
			[AgentSummary] = construct_AgentSummaries([Agent_Id],[]),
			DeadPool_AgentSummaries = [AgentSummary|Old_DeadPool_AgentSummaries],
			ProperlySorted_AgentSummaries = fitness_postprocessor:FitnessPostprocessorName(DeadPool_AgentSummaries),
			Top_AgentSummaries = lists:sublist(ProperlySorted_AgentSummaries,round(?SPECIE_SIZE_LIMIT*?SURVIVAL_PERCENTAGE)),
			{WinnerFitness,WinnerProfile,WinnerAgent_Id} = selection_algorithm:SelectionAlgorithmName(ProperlySorted_AgentSummaries),
			Valid_AgentSummaries = case length(ProperlySorted_AgentSummaries) >= ?SPECIE_SIZE_LIMIT of
				true ->
					[{InvalidFitness,InvalidTotN,InvalidAgent_Id}|Remaining_AgentSummaries] = lists:reverse(ProperlySorted_AgentSummaries),
					io:format("Information theoretic Death:~p::~p~n",[InvalidAgent_Id,{InvalidFitness,InvalidTotN,InvalidAgent_Id}]),
					genotype:delete_Agent(InvalidAgent_Id,safe),
					Remaining_AgentSummaries;
				false ->
					ProperlySorted_AgentSummaries
			end,
			ActiveAgent_IdP = case random:uniform() < 0.1 of
				true ->
					U_DeadPool_AgentSummaries = lists:delete({WinnerFitness,WinnerProfile,WinnerAgent_Id},Valid_AgentSummaries),
					{ok,WinnerAgent_PId} = exoself:start_link({S#state.op_mode,WinnerAgent_Id,void_MaxTrials}),
					{WinnerAgent_Id,WinnerAgent_PId};
				false ->
					U_DeadPool_AgentSummaries = Valid_AgentSummaries,
					AgentClone_Id = create_MutantAgentCopy(WinnerAgent_Id,safe),
					{ok,AgentClone_PId} = exoself:start_link({S#state.op_mode,AgentClone_Id,void_MaxTrials}),
					{AgentClone_Id,AgentClone_PId}
			end,
			{_,_,TopAgent_Ids} = lists:unzip3(lists:sublist(Top_AgentSummaries,3)),
			io:format("TopAgent_Ids:~p~n",[TopAgent_Ids]),
			USpecie = genotype:dirty_read({specie,Specie_Id}),
			genotype:dirty_write(USpecie#specie{dead_pool=U_DeadPool_AgentSummaries, champion_ids=TopAgent_Ids}),
			ActiveAgent_IdPs = S#state.activeAgent_IdPs,
			U_ActiveAgent_IdPs = [ActiveAgent_IdP|lists:keydelete(Agent_Id,1,ActiveAgent_IdPs)],
			U_S = S#state{activeAgent_IdPs=U_ActiveAgent_IdPs, eval_acc=U_EvalAcc, cycle_acc=U_CycleAcc, time_acc=U_TimeAcc},
			{noreply,U_S}
	end;
%This handle_cast clause accepts a termination message from an agent. The message contains the id of the agent, and the fitness it reached. The clause finds the species to which the agent belongs, processes the agent's fitness to produce its true fitness, and adds the agent to the dead_pool. From the dead_pool the function then chooses an agent, with the probability of choosing an agent being proportional to its offspring allocation number, so there is a greater chance of choosing the more fit agents. After an agent has been chosen from the dead_pool, there is a 90% chance that the agent will be used as a base to produce an offspring, and a 10% chance that the agent itself will be reapplied to the problem for reevaluation, to ensure that it still belongs in the dead_pool. This makes sure that if the environment has changed, if the world has advanced further than this ancient agent can cope with, or if it lived in an easy world compared to the current one where fitness points are not so easily achieved, the agent is reevaluated and its competitiveness is re-established. This allows the dead_pool to track the drift of the world, the environment; the dead_pool is used for content drift tracking. If an offspring is produced, it is then released into the environment or applied to the problem again. After this, because the dead_pool must stay within a certain size X, the top X agents of the dead_pool are kept, and the rest are deleted. If the agent was itself chosen to be applied to the problem again, then its summary is extracted from the dead_pool, and the agent is applied to the problem again.

The population_monitor using the steady_state evolutionary loop terminates when a fitness goal has been reached, or after reaching a particular number of evaluations. When one of these termination conditions is reached, the population_monitor process stops generating new agents, and waits for all the remaining agents to terminate. Afterwards, the population_monitor itself terminates normally.

The population_monitor process utilizes the same fitness postprocessor functions in its steady_state approach as it does in its generational one. The selection algorithm, though, is different. In the steady_state evolutionary loop, the selection algorithm, competition/1, is executed with the DeadPool_AgentSummaries parameter, and it must return some fit agent. This is unlike the generational version of the selection algorithms, which accept a list of summaries, a neural energy cost, and a population size limit, and return a tuple composed of the next generation's population and a list of champion agents.

We create the necessary new selection algorithm for the steady_state evolutionary loop by modifying the original competition algorithm. We will add the postfix _stst (steady-state) to the name of any algorithm used with the steady_state evolutionary loop. The new selection algorithm, competition_stst/1, accepts the agent summaries list as a parameter, and performs computations similar to the original competition selection algorithm. But this selection algorithm uses the offspring allotments as probabilities: the higher the allotment, the higher the chance that the agent is chosen. Based on these allotments, the algorithm then chooses one of the agents, and returns its summary back to the caller. The new competition_stst/1 selection algorithm is shown in Listing-11.13.


Listing-11.13 A new selection algorithm for the steady_state evolutionary loop.

competition_stst(ProperlySorted_AgentSummaries)->
	TotEnergy = lists:sum([Fitness || {Fitness,_TotN,_Agent_Id} <- ProperlySorted_AgentSummaries]),
	TotNeurons = lists:sum([TotN || {_Fitness,TotN,_Agent_Id} <- ProperlySorted_AgentSummaries]),
	NeuralEnergyCost = TotEnergy/TotNeurons,
	{AlotmentsP,NextGenSize_Estimate} = calculate_allotments(ProperlySorted_AgentSummaries,NeuralEnergyCost,[],0),
	{WinnerFitness,WinnerTotN,WinnerAgent_Id} = choose_CompetitionWinner(AlotmentsP,random:uniform(round(100*NextGenSize_Estimate))/100,0),
	{WinnerFitness,WinnerTotN,WinnerAgent_Id}.

choose_CompetitionWinner([{MutantAlotment,Fitness,TotN,Agent_Id}|AlotmentsP],Choice,Range_From)->
	Range_To = Range_From+MutantAlotment,
	case (Choice >= Range_From) and (Choice =< Range_To) of
		true ->
			{Fitness,TotN,Agent_Id};
		false ->
			choose_CompetitionWinner(AlotmentsP,Choice,Range_To)
	end;
choose_CompetitionWinner([],_Choice,_Range_From)->
	exit("********ERROR:choose_CompetitionWinner:: reached [] without selecting a winner.").

With the modification of the population_monitor now complete, and with the addition of the new selection algorithm, we can now move forward and update the exoself module.

11.4.7 Updating the exoself Module

We now update the exoself module, primarily the prep and the loop functions of the exoself. We first add the state record:

-record(state,{
	agent_id,
	generation,
	pm_pid,
	idsNpids,
	cx_pid,
	spids=[],
	npids=[],
	nids=[],
	apids=[],
	scape_pids=[],
	highest_fitness=0,
	eval_acc=0,
	cycle_acc=0,
	time_acc=0,
	max_attempts=10,
	attempt=1,
	tuning_duration_f,
	tuning_selection_f,
	annealing_parameter,
	perturbation_range
}).

This record will be used to keep track of all the parameters used by the exoself process. The new prep/2 function will use this state record to store all the useful information, including the perturbation range value, the tuning selection function name, and the annealing parameter. Once the prep/2 function completes setting up the NN system, the exoself drops into its main loop. We update the loop to take advantage of the now decoupled tuning selection algorithms, and the annealing and perturbation parameters.

The updated version of the exoself's functionality is as follows:

1. The updated exoself process' main loop awaits the evaluation_completed message from its cortex process.

2. Once the message is received, based on the fitness achieved, the exoself decides whether to continue tuning the weights or terminate the system.

3. The exoself tries to improve the fitness by perturbing/tuning the weights of its neurons. After each tuning session, the Neural Network based system performs another evaluation by interacting with the scape until completion (the NN solves a problem, or dies within the scape, or...).

The order of events that the exoself performs is important: when the evaluation_completed message is received, the function first checks whether the newly achieved fitness is higher than the highest fitness achieved thus far, which is set to 0 during the prep phase when the exoself first comes online. If the new fitness is not higher than the currently recorded highest fitness the agent achieved, then the exoself sends its neurons a message to restore their weights to their previous values, the values which produced the highest fitness, instead of their current values which yielded the current lower fitness score. If on the other hand the new fitness is higher than the previously highest achieved fitness, then the function tells the neurons to back up their current synaptic weights, as these weights represent the NN's best, most fit form yet.


The exoself process then tells all the neurons to prepare for a "memory reset" by sending each neuron the {self(), reset_prep} message. Since the NN can have recursive connections, it is important for each neuron to flush its buffer/inbox and be reset into its initial fresh state. Thus each neuron goes into standby mode when it receives the reset_prep signal, and begins to wait for the reset signal from the exoself. This ensures that none of the neurons are functioning or processing data when they are reset, and that all of them are synchronized. Once all the neurons have gone into this standby mode, each having replied to the exoself that it received the reset_prep message, the exoself sends them the actual reset message, which makes them flush their buffers and return to their main loop.
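On the exoself's side, collecting these acknowledgements amounts to a simple blocking receive. The gather_acks/1 function used in Listing-11.14 below is not itself shown in this chapter's listings; a minimal sketch of it, assuming each neuron replies with a {From,ready} tuple and assuming a generous timeout, might look as follows:

gather_acks(0)->
	done;
gather_acks(PId_Index)->
	receive
		{_From,ready}-> %A neuron has flushed its inbox and entered standby mode.
			gather_acks(PId_Index-1)
	after 100000 ->
		io:format("******** Not all acks received, ~p acks missing~n",[PId_Index])
	end.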

Finally, the exoself checks whether it has already tried to improve the NN's fitness a maximum of S#state.max_attempts number of times. If that is the case, the exoself process backs up the updated NN (with the updated, tuned weights) to the database using the backup_genotype/2 function, prints to screen that it is terminating, and sends the population_monitor its accumulated statistics (highest fitness, evaluation count, cycle count...). On the other hand, if the exoself process is not yet done tuning the neural weights, that is, it has not yet reached its termination condition, it forms a list of neuron ids and their perturbation Spread values, and asks them to perturb their synaptic weights.

This is the new feature we have added: unlike before, the exoself uses the tuning selection function, the perturbation value, and the annealing value, to compose a list of tuples: [{NId,Spread}...], which dictates which neuron ids should be perturbed, and also the perturbation intensity range. The Spread value is the actual range of possible perturbation values. The spread is calculated from the perturbation range value and the annealing parameter. The NIds are chosen using the exoself's tuning selection algorithm function.

The tuning_selection_f is used to compose a list of tuples: [{NId,Spread}...], where each tuple is composed of a neuron id and the perturbation spread value. The actual tuning selection function accepts the NIds (not NPIds as in the original code), the generation value of the agent (its age), the perturbation range value, and the annealing parameter. The selection function then composes the list, and returns that list of tuples to the exoself. Once this list of tuples is composed, the exoself sends each of the selected neurons a message to perturb its synaptic weights using the Spread value. The message format is changed from {self(), weight_perturb} to {self(), weight_perturb, Spread}. Unlike before, where we dealt directly with NPIds, since we simply chose the NPIds randomly from all the NPIds composing the NN, we now use NIds, because during the selection function, the way we compute the Spread value a neuron should use is by analyzing that neuron's age and its generation. Thus, once the list of selected neurons is composed, we use the IdsNPIds ets table, which maps ids to pids and back, to convert the NIds to NPIds, and send each of the selected NPIds the noted message. Finally, the exoself then reactivates the cortex, and drops back into its main loop. The updated source


of the prep/2 function, and the new main loop/1 function of the exoself, are shown in Listing-11.14.

Listing-11.14 The updated prep and main loop functions of the exoself module.

prep(Agent_Id,PM_PId)->

	random:seed(now()),
	IdsNPIds = ets:new(idsNpids,[set,private]),
	A = genotype:dirty_read({agent,Agent_Id}),
	Cx = genotype:dirty_read({cortex,A#agent.cx_id}),
	SIds = Cx#cortex.sensor_ids,
	AIds = Cx#cortex.actuator_ids,
	NIds = Cx#cortex.neuron_ids,
	ScapePIds = spawn_Scapes(IdsNPIds,SIds,AIds),
	spawn_CerebralUnits(IdsNPIds,cortex,[Cx#cortex.id]),
	spawn_CerebralUnits(IdsNPIds,sensor,SIds),
	spawn_CerebralUnits(IdsNPIds,actuator,AIds),
	spawn_CerebralUnits(IdsNPIds,neuron,NIds),
	link_Sensors(SIds,IdsNPIds),
	link_Actuators(AIds,IdsNPIds),
	link_Neurons(NIds,IdsNPIds),
	{SPIds,NPIds,APIds}=link_Cortex(Cx,IdsNPIds),
	Cx_PId = ets:lookup_element(IdsNPIds,Cx#cortex.id,2),
	{TuningDurationFunction,Parameter} = A#agent.tuning_duration_f,
	S = #state{
		agent_id=Agent_Id,
		generation=A#agent.generation,
		pm_pid=PM_PId,
		idsNpids=IdsNPIds,
		cx_pid=Cx_PId,
		spids=SPIds,
		npids=NPIds,
		nids=NIds,
		apids=APIds,
		scape_pids=ScapePIds,
		max_attempts=tuning_duration:TuningDurationFunction(Parameter,NIds,A#agent.generation),
		tuning_selection_f=A#agent.tuning_selection_f,
		annealing_parameter=A#agent.annealing_parameter,
		tuning_duration_f=A#agent.tuning_duration_f,
		perturbation_range=A#agent.perturbation_range
	},
	loop(S).

%The prep/2 function prepares and sets up the exoself's state before dropping into the main loop. The function first reads the agent and cortex records belonging to the Agent_Id of the NN based system. The function then reads the sensor, actuator, and neuron ids, spawns the private scapes using the spawn_Scapes/3 function, spawns the cortex, sensor, actuator, and neuron processes, and then finally links up all these processes together using the link_.../2 functions. Once the phenotype has been generated from the genotype, the exoself drops into its main loop.

loop(S)->
	receive
		{Cx_PId,evaluation_completed,Fitness,Cycles,Time}->
			IdsNPIds = S#state.idsNpids,
			{U_HighestFitness,U_Attempt} = case Fitness > S#state.highest_fitness of
				true ->
					[NPId ! {self(),weight_backup} || NPId <- S#state.npids],
					{Fitness,0};
				false ->
					Perturbed_NIdPs = get(perturbed),
					[ets:lookup_element(IdsNPIds,NId,2) ! {self(),weight_restore} || {NId,_Spread} <- Perturbed_NIdPs],
					{S#state.highest_fitness,S#state.attempt+1}
			end,
			[PId ! {self(),reset_prep} || PId <- S#state.npids],
			gather_acks(length(S#state.npids)),
			[PId ! {self(),reset} || PId <- S#state.npids],
			U_CycleAcc = S#state.cycle_acc+Cycles,
			U_TimeAcc = S#state.time_acc+Time,
			U_EvalAcc = S#state.eval_acc+1,
			case U_Attempt >= S#state.max_attempts of
				true -> %End training
					A = genotype:dirty_read({agent,S#state.agent_id}),
					genotype:write(A#agent{fitness=U_HighestFitness}),
					backup_genotype(S#state.idsNpids,S#state.npids),
					terminate_phenotype(S#state.cx_pid,S#state.spids,S#state.npids,S#state.apids,S#state.scape_pids),
					io:format("Agent:~p terminating. Genotype has been backed up.~n Fitness:~p~n TotEvaluations:~p~n TotCycles:~p~n TimeAcc:~p~n",[self(),U_HighestFitness,U_EvalAcc,U_CycleAcc,U_TimeAcc]),
					gen_server:cast(S#state.pm_pid,{S#state.agent_id,terminated,U_HighestFitness,U_EvalAcc,U_CycleAcc,U_TimeAcc});
				false -> %Continue training
					TuningSelectionFunction = S#state.tuning_selection_f,
					PerturbationRange = S#state.perturbation_range,
					AnnealingParameter = S#state.annealing_parameter,
					ChosenNIdPs = tuning_selection:TuningSelectionFunction(S#state.nids,S#state.generation,PerturbationRange,AnnealingParameter),
					[ets:lookup_element(IdsNPIds,NId,2) ! {self(),weight_perturb,Spread} || {NId,Spread} <- ChosenNIdPs],
					put(perturbed,ChosenNIdPs),
					Cx_PId ! {self(),reactivate},
					U_S = S#state{
						cycle_acc=U_CycleAcc,
						time_acc=U_TimeAcc,
						eval_acc=U_EvalAcc,
						attempt=U_Attempt,
						highest_fitness=U_HighestFitness
					},
					loop(U_S)
			end
	end.

%When the exoself receives the evaluation_completed message from the cortex, it first checks whether the fitness newly achieved by its NN is greater than the highest fitness achieved thus far. Before the NN based system terminates, it must fail to increase in fitness Max_Attempts number of times. Thus, if the new fitness is not higher than highest_fitness, and the agent has failed to increase in fitness more than Max_Attempts number of times, the exoself terminates its NN's phenotype, and forwards the highest achieved fitness and other statistics of its performance to the population_monitor process. If the fitness is not higher than the highest_fitness on record, but the exoself has not yet attempted to increase the NN's fitness more than Max_Attempts number of times, it requests that the neurons restore their previous set of synaptic weights (representing the best combination of weights achieved thus far), then requests that they perturb their synaptic weights and flush their inboxes to get back into their initial pristine form. Finally, the exoself sends the cortex a message to reactivate, triggering it to action and its synchronization duties. The way the exoself chooses the neurons to perturb is by using the tuning_selection function. If on the other hand the newly achieved fitness is higher than the previously achieved highest_fitness, then the exoself requests that the neurons backup their current synaptic weights. The exoself then resets its attempt counter back to 0, so as to give the new synaptic weight combination another Max_Attempts number of perturbations and attempts at improvement, and again requests that the neurons perturb their synaptic weights. Finally, the exoself drops back into its main receive loop.

The tuning_duration module contains all the tuning duration functions, the functions which calculate how long the tuning phase must run. The tuning duration function sets the max_attempts value, with the function format being as follows:

Input : Neuron_Ids, AgentGeneration

Output : Max_Attempts

The tuning duration function can output a constant, which is what we have used thus far. It can output a value that is proportional to the number of neurons composing the NN, or it can produce a value based on the number of all neurons in the population. Listing-11.15 shows the implementation of the tuning_duration module.

Listing-11.15 The tuning_duration module which stores the various tuning duration functions.


const(Parameter,_N_Ids,_Generation)->
	ConstMaxAttempts = Parameter,
	ConstMaxAttempts.
%const/3 returns the preset constant max_attempts value.

wsize_proportional(Parameter,N_Ids,Generation)->
	Power = Parameter,
	Active_NIds = extract_RecGenNIds(N_Ids,Generation,3,[]),
	Tot_ActiveNeuron_Weights = extract_NWeightCount(Active_NIds,0),
	20 + functions:sat(round(math:pow(Tot_ActiveNeuron_Weights,Power)),100,0).
%wsize_proportional/3 calculates the max_attempts value based on the agent's features. In this case the max_attempts is proportional to the number of weights belonging to the neurons which were added or mutated within the last 3 generations.

extract_RecGenNIds([N_Id|N_Ids],Generation,AgeLimit,Acc)->
	N = genotype:dirty_read({neuron,N_Id}),
	NeuronGen = N#neuron.generation,
	case NeuronGen >= (Generation-AgeLimit) of
		true ->
			extract_RecGenNIds(N_Ids,Generation,AgeLimit,[N_Id|Acc]);
		false ->
			extract_RecGenNIds(N_Ids,Generation,AgeLimit,Acc)
	end;
extract_RecGenNIds([],_Generation,_AgeLimit,Acc)->
	Acc.
%extract_RecGenNIds/4 extracts the NIds of all neurons whose age is lower than or equal to the specified AgeLimit.

extract_NWeightCount([N_Id|RecGenN_Ids],Acc)->
	N = genotype:dirty_read({neuron,N_Id}),
	Input_IdPs = N#neuron.input_idps,
	TotWeights = lists:sum([length(Weights) || {_IId,Weights} <- Input_IdPs]),
	extract_NWeightCount(RecGenN_Ids,TotWeights+Acc);
extract_NWeightCount([],Acc)->
	Acc.
%extract_NWeightCount/2 counts the total number of weights belonging to the list of neuron ids with which the function was called.

nsize_proportional(Parameter,N_Ids,Generation)->
	Power = Parameter,
	Tot_Neurons = length(extract_RecGenNIds(N_Ids,Generation,3,[])),
	20 + functions:sat(round(math:pow(Tot_Neurons,Power)),100,0).
%nsize_proportional/3 calculates a max_attempts value that is proportional to the number of neurons which were mutated or added to the NN within the last 3 generations.


There are many different ways in which you can calculate the max_attempts value for the tuning phase. The main thing to keep in mind is that the same tuning duration function must be used by all competing agents in the population, thus ensuring that the tuning process is fair, and that no agent gets an advantage simply because it uses some particular tuning duration function not used by the others. If that happens, then the evolution and fitness of the competing agents becomes dependent not on the true fitness of the agents, but on the tuning duration functions they use. The agents competing with each other must use the same tuning duration function.

Also, when creating tuning duration functions that take into account the NN's size, we must ensure that this factor skews the fitness towards producing smaller NN systems, not larger ones. We do not want to reward neural bloating. For example, if we create a tuning duration function which uses the following equation: MaxAttempts = 100*TotNeurons, we will be giving the NNs an incentive to bloat. Just by adding one extra neuron, a NN gets 100 extra tries to improve its fitness, and chances are that it will end up a bit more fit than its better counterparts which did not get as many attempts. To avoid this, we must analyze the tuning duration functions, ensuring that they promote more concise NN systems, or at least do not provide incentives which overwrite the actual fitness function. An alternative is to use a constant MaxAttempts value. With a constant MaxAttempts value, the larger NNs will have to evolve structures that leave them general and competitive with the smaller NNs, since the larger NN systems will have the same MaxAttempts to optimize, and their architecture and topology will have to be such that they can still compete and optimize with the few synaptic weight permutation attempts that they are given.

The nsize_proportional and wsize_proportional functions have their exponential power parameters set to 0.5, and thus take the square root of the number of neurons and weights respectively. Thus, the NN systems which have a larger number of weights or neurons to optimize will get a larger number of chances, but only barely. Hopefully this approach will not overwrite and undermine the fitness function, will still push towards more concise topologies, and will at the same time provide a few more optimization attempts to the larger NN based agents, which need them due to having that many more synaptic weight permutations to explore. There are many different ways to create tuning duration functions; having decoupled them from the system will help us experiment with them, and perhaps find one that has the best of all worlds.
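To make the growth rate concrete, consider wsize_proportional with Power = 0.5, given that the functions:sat(Val,100,0) call caps the size dependent bonus at 100: an agent whose recently active neurons hold 25 synaptic weights gets 20 + round(math:sqrt(25)) = 25 attempts, one with 400 weights gets 20 + 20 = 40 attempts, and one with 40000 weights gets 20 + 100 = 120 attempts. The allotted tuning time thus grows sublinearly with NN size, and is bounded.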

Having completed the tuning_duration module, we move to the tuning_selection module. The tuning_selection module contains all the tuning selection functions, which accept as input four parameters:

1. All NIds belonging to the NN.

2. The agent's generation, which is the number of topological mutation phases that it has undergone.

3. The perturbation range, the multiplier of math:pi(), which when used produces the spread value.


4. The annealing parameter, which is used to indicate how the perturbation range decays with the age of the neuron to which the synaptic weight perturbation is applied. It makes less sense to perturb the stable, long established elements of the NN system than those elements which have just recently been added to it, and which still need to be tuned and modified to work well with the already existing larger system. The concept is that of simulated annealing [3].

We gather all these selection functions in their own module because there are many ways to select the neurons which should be perturbed in the local search during the tuning phase. This makes it easier for us to add new selection functions later on, and to see whether a new function can improve the performance. Of course it would be even better to decouple the system to such an extent that the local and global searches are completely swappable, letting us apply anything to the NN system during the global and local search phases. Being able to use particle swarm optimization, ant colony optimization... and swap between all these approaches during the parameter and topology optimization would be something interesting to explore. Eventually, that feature too shall be added. But for now, the tuning_selection module will primarily concentrate on holding the different types of local search, neuron selection algorithms.

Also, because we now wish to take advantage of the perturbation range value and the annealing parameter, the tuning selection function must not only select the neuron ids for synaptic perturbation, but also compute the perturbation intensity, the available range of the perturbation intensity, from which the neuron will then randomly generate a weight perturbation value. Thus, the selection function creates a list of tuples rather than simply a list of neuron ids. The selection function outputs a list of the following form: [{NId,Spread}...], where NId is the neuron id, and Spread is the spread above and below 0, the value within which the neuron generates the actual perturbation. The Spread equals the perturbation_range value if there is no annealing; if annealing is present (annealing_parameter < 1), then the Spread is further modified. The annealing factor must scale the Spread proportionally to the age of the neuron whose synaptic weights are to be perturbed. In our tuning selection algorithms, the spread value is calculated as follows:

Spread=PerturbationRange*math:pi()*math:pow(AnnealingParameter,NeuronAge).

When AnnealingParameter = 1, there is no annealing. But when the AnnealingParameter is set to a number lower than 1, then the annealing is exponentially proportional to the neuron's age.
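For example, assuming PerturbationRange = 1 and AnnealingParameter = 0.9: a neuron added during the current generation (Age = 0) gets Spread = 1*math:pi()*math:pow(0.9,0), which is roughly 3.14; a neuron of age 3 gets roughly 3.14*0.729 = 2.29; and a neuron of age 10 gets roughly 3.14*0.349 = 1.10. The older and more stable the neuron, the gentler the perturbations applied to it.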

We will create 8 such selection functions; their names and features are as follows:

1. dynamic: This function randomly generates a neuron age limit using math:sqrt(1/random:uniform()). The distribution of neuron age limits is thus skewed towards lower values (see the note after this list). Once the neuron age limit is generated, all neurons in the NN of that age and lower are chosen for synaptic weight perturbation. If the annealing parameter is less than 1, then the Spread is calculated for every chosen neuron.

2. dynamic_random: This function does the same as the dynamic selection function, but after that pool of NIds is created, it composes a new pool of tuples by going through the list of tuples and selecting each with a probability of 1/math:sqrt(length(TupleList)). In this manner, during every tuning evaluation a random set of NIds is chosen for perturbation, at times a large number, and sometimes just a few. This further randomizes the intensity of tuning.

3. active: The active selection function chooses all neurons which were affected by mutation or created within the last 3 generations.

4. active_random: Performs the same function as active, but then creates a sublist by randomly choosing tuples from the original active list, each tuple being chosen with a probability of 1/math:sqrt(length(TupleList)).

5. current: The current selection function chooses all neurons affected during the last generation, those that were just added or affected by topological mutation.

6. current_random: Again uses the tuple list created in the current function, but then generates a sublist, with each tuple having a chance of being chosen with a probability of 1/math:sqrt(length(TupleList)).

7. all: The tuple list is composed of all the neurons in the NN. This becomes ineffective once the NN grows in size, since it would be very difficult to find the right neuron to perturb if the size of the NN is 1000000 and during the last topological mutation phase only a single neuron was added, for example. This tuning selection algorithm is something to compare the other tuning selection functions against. This function can, however, be made effective with a proper annealing parameter. With an annealing parameter, the most recently added neurons in the NN would use high perturbation Spreads, while those which have stabilized would have almost nonexistent Spread values. When set up in this manner, this function becomes the true annealing based tuning selection function.

8. all_random: The same as the all function, but uses the initial list to generate a new sublist by randomly choosing tuples from the all list, each tuple with a probability of 1/math:sqrt(length(TupleList)).
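A quick note on the age limit distribution used by dynamic and dynamic_random: since random:uniform() returns a value U uniform on (0,1], the probability that math:sqrt(1/U) >= A is P(U =< 1/(A*A)) = 1/(A*A). The generated age limit is therefore >= 2 only 25% of the time, >= 3 only about 11% of the time, and so on, which is what skews the selection towards the younger neurons.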

The implementation of these tuning selection functions is shown in Listing-11.16.

Listing-11.16 The implementation of the tuning_selection module.

-module(tuning_selection).
-compile(export_all).
-include("records.hrl").

dynamic(N_Ids,AgentGeneration,PerturbationRange,AnnealingParameter)->
	AgeLimit = math:sqrt(1/random:uniform()),
	ChosenN_IdPs = case extract_CurGenNIdPs(N_Ids,AgentGeneration,AgeLimit,PerturbationRange,AnnealingParameter,[]) of
		[] ->
			[N_Id|_] = N_Ids,
			[{N_Id,PerturbationRange*math:pi()}];
		ExtractedN_IdPs ->
			ExtractedN_IdPs
	end,
	ChosenN_IdPs.

%The dynamic/4 selection function randomly selects an age limit for its neuron id pool. The age limit is chosen by executing math:sqrt(1/random:uniform()), which creates a value between 1 and infinity. With this function there is a 75% chance that the value will be =< 2, a 25% chance that it will be >= 2, an 11% chance that it will be >= 3... Every time this selection function is executed the AgeLimit is generated anew, thus different executions will produce different neuron id pools for tuning.

extract_CurGenNIdPs([N_Id|N_Ids],Generation,AgeLimit,PR,AP,Acc)->
	N = genotype:dirty_read({neuron,N_Id}),
	NeuronGen = N#neuron.generation,
	case NeuronGen >= (Generation-AgeLimit) of
		true ->
			Age = Generation-NeuronGen,
			Spread = PR*math:pi()*math:pow(AP,Age),
			extract_CurGenNIdPs(N_Ids,Generation,AgeLimit,PR,AP,[{N_Id,Spread}|Acc]);
		false ->
			extract_CurGenNIdPs(N_Ids,Generation,AgeLimit,PR,AP,Acc)
	end;
extract_CurGenNIdPs([],_Generation,_AgeLimit,_PR,_AP,Acc)->
	Acc.

%The extract_CurGenNIdPs/6 function composes a neuron id pool from the neurons which are younger than the AgeLimit parameter. This is calculated by comparing the neuron's generation, which notes when it was created or last affected by mutation, to the agent's generation, which increments with every topological mutation phase. The id pool accumulates not just the neuron ids but also the spread which will be used for the synaptic weight perturbation. The spread is calculated by multiplying the perturbation_range variable by math:pi(), and then by the annealing factor, which is: math:pow(AnnealingParameter,Age). If the AnnealingParameter is less than 1, then the greater the age of the neuron, the lower the Spread will be. If the AnnealingParameter is set to 1, then no annealing occurs.

dynamic_random(N_Ids,AgentGeneration,PerturbationRange,AnnealingParameter)->
	ChosenN_IdPs = case extract_CurGenNIdPs(N_Ids,AgentGeneration,math:sqrt(1/random:uniform()),PerturbationRange,AnnealingParameter,[]) of
		[] ->
			[N_Id|_] = N_Ids,
			[{N_Id,PerturbationRange*math:pi()}];
		ExtractedN_IdPs ->
			ExtractedN_IdPs
	end,
	Tot_Neurons = length(ChosenN_IdPs),
	MutationP = 1/math:sqrt(Tot_Neurons),
	choose_randomNIdPs(MutationP,ChosenN_IdPs).

%The dynamic_random/4 selection function composes the neuron id pool the same way the dynamic/4 selection function does, but after the id pool is generated, this selection function extracts ids from it randomly, each with a probability of 1/math:sqrt(Tot_Neurons). Thus the probability of a neuron being selected from this pool is inversely proportional to the square root of the number of ids in the pool. If through chance no ids are selected, then a random element of the id pool is chosen and returned.

choose_randomNIdPs(MutationP,N_IdPs)->
	case choose_randomNIdPs(N_IdPs,MutationP,[]) of
		[] ->
			{NId,Spread} = lists:nth(random:uniform(length(N_IdPs)),N_IdPs),
			[{NId,Spread}];
		Acc ->
			Acc
	end.

choose_randomNIdPs([{NId,Spread}|N_IdPs],MutationP,Acc)->
	U_Acc = case random:uniform() < MutationP of
		true ->
			[{NId,Spread}|Acc];
		false ->
			Acc
	end,
	choose_randomNIdPs(N_IdPs,MutationP,U_Acc);
choose_randomNIdPs([],_MutationP,Acc)->
	Acc.

%choose_randomNIdPs/2 calls choose_randomNIdPs/3, which accepts a mutation probability parameter and a list of tuples composed of neuron ids and their spreads. The function then selects from this list randomly, each element with a probability of MutationP, composes a new sublist, and returns it to the caller (choose_randomNIdPs/2). If by chance the sublist ends up being empty, the choose_randomNIdPs/2 function chooses a random tuple from the original list, and returns it to the caller. Otherwise the composed sublist is returned to the caller as is.

active(N_Ids,AgentGeneration,PerturbationRange,AnnealingParameter)->
	extract_CurGenNIdPs(N_Ids,AgentGeneration,3,PerturbationRange,AnnealingParameter,[]).
%The active/4 selection algorithm composes a neuron id pool from all neurons which are younger than 3 generations. I refer to the neurons as active if they have been affected or created within the last 3 generations, because they are still being integrated and tuned to work with the rest of the NN based system.


active_random(N_Ids,AgentGeneration,PerturbationRange,AnnealingParameter)->
	ChosenN_IdPs = case extract_CurGenNIdPs(N_Ids,AgentGeneration,3,PerturbationRange,AnnealingParameter,[]) of
		[] ->
			[N_Id|_] = N_Ids,
			[{N_Id,PerturbationRange*math:pi()}];
		ExtractedN_IdPs ->
			ExtractedN_IdPs
	end,
	Tot_Neurons = length(ChosenN_IdPs),
	MutationP = 1/math:sqrt(Tot_Neurons),
	choose_randomNIdPs(MutationP,ChosenN_IdPs).
%active_random/4 is a selection algorithm that composes an id pool by first creating a list of all neurons which are younger than 3 generations, and then composing a sublist from it by randomly choosing elements from this list, each with a probability of 1/math:sqrt(Tot_Neurons).

current(N_Ids,AgentGeneration,PerturbationRange,AnnealingParameter)->
	case extract_CurGenNIdPs(N_Ids,AgentGeneration,0,PerturbationRange,AnnealingParameter,[]) of
		[] ->
			[N_Id|_] = N_Ids,
			[{N_Id,PerturbationRange*math:pi()}];
		IdPs ->
			IdPs
	end.
%current/4 is a tuning selection algorithm that returns a list of all neurons which were added to the NN, or affected by mutation, during the last generation.

current_random(N_Ids,AgentGeneration,PerturbationRange,AnnealingParameter)->
	ChosenN_IdPs = case extract_CurGenNIdPs(N_Ids,AgentGeneration,0,PerturbationRange,AnnealingParameter,[]) of
		[] ->
			[N_Id|_] = N_Ids,
			[{N_Id,PerturbationRange*math:pi()}];
		IdPs ->
			IdPs
	end,
	Tot_Neurons = length(ChosenN_IdPs),
	MutationP = 1/math:sqrt(Tot_Neurons),
	choose_randomNIdPs(MutationP,ChosenN_IdPs).
%current_random/4 composes the list of tuples in the same way current/4 does, but then composes a sublist by randomly selecting elements from that list, each with a probability of 1/math:sqrt(Tot_Neurons), returning the resulting sublist to the caller.



all(N_Ids,AgentGeneration,PerturbationRange,AnnealingParameter)->
	extract_CurGenNIdPs(N_Ids,AgentGeneration,AgentGeneration,PerturbationRange,AnnealingParameter,[]).
%all/4 returns to the caller a list of tuples composed of all the neuron ids (and their spread values) belonging to the NN.

all_random(N_Ids,AgentGeneration,PerturbationRange,AnnealingParameter)->
	ChosenN_IdPs = extract_CurGenNIdPs(N_Ids,AgentGeneration,AgentGeneration,PerturbationRange,AnnealingParameter,[]),
	Tot_Neurons = length(ChosenN_IdPs),
	MutationP = 1/math:sqrt(Tot_Neurons),
	choose_randomNIdPs(MutationP,ChosenN_IdPs).
%all_random/4 first composes a list of tuples from the NIds and their spreads, and then creates a sublist by choosing each element with a probability of 1/math:sqrt(Tot_Neurons), returning the result to the caller.

With the updated exoself module, and the newly created tuning_duration and tuning_selection modules, the exoself decoupling is complete. We now need only update the neuron module and create the necessary plasticity and signal_aggregator modules, and we are ready to test our new, future ready, and agile, topology and weight evolving artificial neural network platform.

11.4.8 Updating the neuron Module

The neuron is the basic processing element, the basic processing node in the neural network system. The neurons in the system we've created are more general than those used by others. We created them to easily use various activation functions, and to accept and output vectors. Because we can use anything for the activation function, including logical operators, the neurons are really just processing nodes. In some sense, we have developed a system that is not a Topology and Weight Evolving Artificial Neural Network, but a Topology and Parameter Evolving Universal Learning Network (TPEULN). Nevertheless, we will continue referring to these processing elements as neurons.

At the moment, a neuron can accept vector signals from other elements. Since the inputs and outputs are standardized in their format, lists of float() values, the neuron does not need to know whether the input is from a neuron, a sensor, or from some other module capable of producing a vector signal. The problem though is that to use the aggregator functions to their full potential, to allow them full control when it comes to figuring out what to do with input signals, it would be useful to let them see the whole input vector. At this time, a neuron accepts a vector input from an element with an Id for which it has readied the appropriate synaptic weights, and then it computes the dot product. This means our neurons look at each input signal one at a time, and then move on to the next. They do not see the entire input, which means they cannot look at all the input signals together and then decide how to aggregate them, what to do with them, how to process them... One of the many side effects of this is that a neuron cannot normalize the list of input vectors together, as one.

Another way a neuron can process the input vectors is as follows: Instead of computing a dot product of every input vector with its synaptic weight list as it arrives, we could have the neuron first aggregate the input signals, in the same order as its synaptic weights, and then perform the information processing step. This is of course a bit less efficient, since it means each neuron will have to store the entire input, which might be a list of thousands of numbers. This is the way in which the DXNN neurons gather and process signals.

It is difficult to say which approach is more effective, the one we've built thus far, or the one used by DXNN, where the neuron first accumulates all the input signals from all the elements it is connected from, and then decides what to do with them. Benchmarks show the one we have implemented here to be slightly faster, and of course each neuron takes up less memory, since it does not have to store all the input signals first. At the same time, the DXNN neurons make certain things much simpler: having the entire input at your disposal when performing computations, such as when deciding how plasticity affects the weights, is convenient. Thus, in this second approach, what we lose in efficiency, we gain in extendibility and future readiness of the system.

To make the decision on which approach we should use, consider the implementation of the diff_aggregator: The now numerously discussed diff_aggregator function, instead of calculating the dot product of the input vector and the synaptic weights directly, calculates the dot product of the synaptic weights and the difference between the current input vector and the input vector the same element sent last time: the difference vector. To implement it, we need to first store this previously received vector in memory. We could of course store each received input vector in the process dictionary separately, but we could also aggregate the input vectors, and then store the result as an ordered list of input vectors, which can then immediately be dotted with the synaptic weights list... and also be stored and then recovered if we use the diff aggregator. Also, if we wish to normalize the input vectors, though this is again possible with both neuron implementations, if we use the implementation where we store all input signals first, vector normalization becomes trivial.

Because the second implementation, the one used by DXNN, makes a number of these things simpler, we will use it instead. If at a later time we need to change things back, it will be easy to accomplish, and independent of the rest of the system. Thus, the new neuron implementation should use the following algorithm:

1. The neuron is spawned, and awaits in a prep state, waiting for its initialization parameters from the exoself.


2. The neuron receives its Input_PIdPs, AF, PF, AggrF, and other parameters from the exoself, where Input_PIdPs is a list of the form: [{Input_PId,Weights}...], in which Input_PId is that of the element that sends it a signal, and Weights are the synaptic weights associated with the vector input signal from that presynaptic element. After receiving the initialization parameters, the neuron sends out a default output signal to all the elements to which it is connected recurrently, and then drops into its main receive loop.

3. The neuron awaits signals in its main loop. It can accept signals of the following format:

{Input_PId,forward,Input}: The signal from other elements which send vector signals to the neuron.

{ExoSelf_PId,weight_backup}: The signal from the exoself which tells the neuron that the NN system performs best when this particular neuron is using its current synaptic weight combination, and that it should thus save this synaptic weight list as MInput_PIdPs, the best weight combination achieved thus far. This message is sent if, after the weight perturbation, the NN's evaluation achieves a higher fitness than when the neurons of this NN used their previous synaptic weights.

{ExoSelf_PId,weight_restore}: This message is sent from the exoself, and it tells the neuron that it should restore its synaptic weight list to the one previously used, saved as MInput_PIdPs. This message is usually sent if, after the weight perturbation, the NN based agent's evaluation performs worse than it did with its previous synaptic weight combination.

{ExoSelf_PId,weight_perturb,Spread}: This is a new message type. In our original version the neuron received the {ExoSelf_PId,weight_perturb} message, and used the ?DELTA_MULTIPLIER macro to generate the perturbation intensities. With the new message, it will use the Spread value for the purpose of generating the synaptic weight perturbations.

{ExoSelf,reset_prep}: This message is sent after a single evaluation is completed, when the exoself wishes to reset all the neurons to their original states, with empty inboxes. Once a neuron receives this message, it goes into a reset_prep state, flushes its buffer/inbox, and then awaits the {ExoSelf,reset} signal. When the neuron receives the {ExoSelf,reset} message, it again sends out the default output messages to all its recurrent connections (ids stored in its ro_ids list), and then finally drops back into its main receive loop.

{ExoSelf_PId,get_backup}: When receiving this message, the neuron sends back to the exoself its last best synaptic weight combination, stored as the MInput_PIdPs list.

{ExoSelf_PId,terminate}: The neuron terminates after it receives this message.

Except for the way the neuron processes the {Input_PId,forward,Input} and {ExoSelf_PId,weight_perturb,Spread} messages, the rest function in the same way they did in our original implementation.


4. The neuron accepts the {Input_PId,forward,Input} message only when the Input_PId in the message matches an Input_PId in its Input_PIdPs list. When the neuron receives the {Input_PId,forward,Input} message, unlike in the original implementation, our new neuron simply accumulates the {Input_PId,Input} tuple in its IAcc list. Once the neuron has received the Input signals from all the Input_PIds in its Input_PIdPs list, it runs the aggregation function, the synaptic plasticity function, and the activation function, to produce its final output signal. The accumulated {Input_PId,Input} tuples are in the same order as the {Input_PId,Weights} tuples in the Input_PIdPs list, since the neuron does a selective receive, forming the IAcc in the same order as the Input_PIds appear in its Input_PIdPs. Because of this, once the IAcc list has been formed, taking the dot product or some other function of the two lists is easy. To take the dot product, we simply dot the IAcc and the Input_PIdPs, since each is a vector composed of tuples which contain vectors.

5. When the neuron receives the {ExoSelf_PId,weight_perturb,Spread} message, it executes the same functions as in the original implementation, only in this implementation the perturb_IPIdPs function is executed with the Spread parameter instead of the ?DELTA_MULTIPLIER macro.

That is essentially it. As you can see, the functionality is retained; we simply stopped computing the dot product immediately after every input message is received. Instead, we now first accumulate all the input vectors in the same order as the neuron's Input_IdPs, and then dot everything all at once. If we're using another aggregation function, we can instead send the accumulated input vectors and the Input_IdPs through that function. The implementation of this new neuron version is shown in Listing-11.17.

Listing-11.17 The updated neuron implementation. Only the updated parts are shown, highlighted in boldface.

gen(ExoSelf_PId,Node)->
	spawn(Node,?MODULE,prep,[ExoSelf_PId]).

prep(ExoSelf_PId) ->
	random:seed(now()),
	receive
		{ExoSelf_PId,{Id,Cx_PId,AF,PF,AggrF,Input_PIdPs,Output_PIds,RO_PIds}} ->
			fanout(RO_PIds,{self(),forward,[?RO_SIGNAL]}),
			IPIds = [IPId || {IPId,_W} <- Input_PIdPs],
			loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{IPIds,IPIds},[],{Input_PIdPs,Input_PIdPs},Output_PIds,RO_PIds)
	end.

%When gen/2 is executed, it spawns the neuron element, which immediately begins to wait for its initial state message from the exoself. Once the state message arrives, the neuron sends out the default forward signals to any elements in its ro_ids list, if any. Afterwards, prep/1 drops into the neuron's main receive loop.

loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{[Input_PId|IPIds],MIPIds},IAcc,{Input_PIdPs,MInput_PIdPs},Output_PIds,RO_PIds)->
	receive
		{Input_PId,forward,Input}->
			loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{IPIds,MIPIds},[{Input_PId,Input}|IAcc],{Input_PIdPs,MInput_PIdPs},Output_PIds,RO_PIds);
		{ExoSelf_PId,weight_backup}->
			loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{[Input_PId|IPIds],MIPIds},IAcc,{Input_PIdPs,Input_PIdPs},Output_PIds,RO_PIds);
		{ExoSelf_PId,weight_restore}->
			loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{[Input_PId|IPIds],MIPIds},IAcc,{MInput_PIdPs,MInput_PIdPs},Output_PIds,RO_PIds);
		{ExoSelf_PId,weight_perturb,Spread}->
			Perturbed_IPIdPs = perturb_IPIdPs(Spread,MInput_PIdPs),
			loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{[Input_PId|IPIds],MIPIds},IAcc,{Perturbed_IPIdPs,MInput_PIdPs},Output_PIds,RO_PIds);
		{ExoSelf,reset_prep}->
			neuron:flush_buffer(),
			ExoSelf ! {self(),ready},
			receive
				{ExoSelf,reset}->
					fanout(RO_PIds,{self(),forward,[?RO_SIGNAL]})
			end,
			loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{MIPIds,MIPIds},[],{Input_PIdPs,MInput_PIdPs},Output_PIds,RO_PIds);
		{ExoSelf_PId,get_backup}->
			ExoSelf_PId ! {self(),Id,MInput_PIdPs},
			loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{[Input_PId|IPIds],MIPIds},IAcc,{Input_PIdPs,MInput_PIdPs},Output_PIds,RO_PIds);
		{ExoSelf_PId,terminate}->
			io:format("Neuron:~p has terminated.~n",[self()]),
			ok
	end;
loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{[],MIPIds},IAcc,{Input_PIdPs,MInput_PIdPs},Output_PIds,RO_PIds)->
	Aggregation_Product = signal_aggregator:AggrF(IAcc,Input_PIdPs),
	Output = functions:AF(Aggregation_Product),
	U_IPIdPs = plasticity:PF(IAcc,Input_PIdPs,Output),
	[Output_PId ! {self(),forward,[Output]} || Output_PId <- Output_PIds],
	loop(Id,ExoSelf_PId,Cx_PId,AF,PF,AggrF,{MIPIds,MIPIds},[],{U_IPIdPs,MInput_PIdPs},Output_PIds,RO_PIds).


perturb_IPIdPs(Spread,Input_PIdPs)->
	Tot_Weights = lists:sum([length(Weights) || {_Input_PId,Weights} <- Input_PIdPs]),
	MP = 1/math:sqrt(Tot_Weights),
	perturb_IPIdPs(Spread,MP,Input_PIdPs,[]).

perturb_IPIdPs(Spread,MP,[{Input_PId,Weights}|Input_PIdPs],Acc)->
	U_Weights = perturb_weights(Spread,MP,Weights,[]),
	perturb_IPIdPs(Spread,MP,Input_PIdPs,[{Input_PId,U_Weights}|Acc]);
perturb_IPIdPs(_Spread,_MP,[],Acc)->
	lists:reverse(Acc).
%The perturb_IPIdPs/2 function perturbs each synaptic weight in the Input_PIdPs list with a probability of 1/math:sqrt(Tot_Weights). The probability is based on the total number of weights in the Input_PIdPs list, with the actual mutation probability equating to the inverse of the square root of the total number of synaptic weights belonging to the neuron. The perturb_IPIdPs/4 function goes through each weights block and calls perturb_weights/4 to perturb the weights.

perturb_weights(Spread,MP,[W|Weights],Acc)->
	U_W = case random:uniform() < MP of
		true ->
			sat((random:uniform()-0.5)*2*Spread+W,-?SAT_LIMIT,?SAT_LIMIT);
		false ->
			W
	end,
	perturb_weights(Spread,MP,Weights,[U_W|Acc]);
perturb_weights(_Spread,_MP,[],Acc)->
	lists:reverse(Acc).
%perturb_weights/4 is the function that actually goes through each weight block (a weight block is a synaptic weight list associated with a particular input vector sent to the neuron by another element), and perturbs each weight with a probability of MP. If a weight is chosen to be perturbed, the perturbation intensity is chosen uniformly between -Spread and Spread.

I have highlighted the parts of the implementation that have been changed, added, or whose function is important to the new implementation. Once the IAcc is formed, everything hinges on the execution of:

Aggregation_Product = signal_aggregator:AggrF(IAcc,Input_PIdPs),

Output = functions:AF(Aggregation_Product),

U_IPIdPs = plasticity:PF(IAcc,Input_PIdPs,Output),

These three functions respectively compose the Aggregation_Product (which might simply be the dot product of the input vectors and the associated synaptic weights), apply the activation function to the Aggregation_Product value to produce the final output, and finally update the synaptic weights (Input_PIdPs) using the plasticity function (if any). The aggregation, activation, and plasticity functions are stored in their own respective modules. The activation functions are all stored in the functions module, while the aggregation and plasticity functions are stored in the signal_aggregator and plasticity modules respectively. In the following sections we will build these two modules.

11.4.9 Creating the signal_aggregator Module

The signal_aggregator module contains the various aggregation functions. An aggregation function is a function that in some manner gathers the input signal vectors, does something with them and the synaptic weights, and then produces a scalar value. For example, consider the dot product. The dot_product aggregation function composes the scalar value by aggregating the input vectors, and then calculating the dot product of the input vectors and the synaptic weights. Another way to calculate a scalar value from the input and weight vectors is by multiplying the corresponding input signals by their weights, but then, instead of adding the resulting multiplied values, multiplying them together. For example, if the input vector is [I1,I2,I3] and the corresponding weight vector is [W1,W2,W3], this "mult_product" is: I1*W1 * I2*W2 * I3*W3, compared to the dot product, which is: I1*W1 + I2*W2 + I3*W3. Yet another one is the diff_product, which we can calculate if we let I1(k-1) be an input element one time step ago, and I1(k) be the input element in the current time step. Thus, whereas the previous time step's input vector is [I1(k-1),I2(k-1),I3(k-1)] and the current input vector is [I1(k),I2(k),I3(k)], with the synaptic weight vector being [W1,W2,W3], the diff_product is: (I1(k)-I1(k-1))*W1 + (I2(k)-I2(k-1))*W2 + (I3(k)-I3(k-1))*W3. A neuron using this signal aggregation function would only see the differences in the signals, rather than the signals themselves. Thus if there is a rapid change in the signal, the neuron would see it, but if the signal were to stay the same for a long period of time, the neuron's input would be a vector of the form: [0,...].
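As a quick worked example of these three aggregators, consider a single input vector [1,2] with the synaptic weight vector [0.5,-1]: the dot product is 1*0.5 + 2*(-1) = -1.5, while the mult_product is (1*0.5)*(2*(-1)) = -1.0. If the previous input vector was [1,1], then the diff_product dots the difference vector [0,1] with the weights, producing 0*0.5 + 1*(-1) = -1.0.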

There are certainly many other types of aggregation functions that could be created, and it is for this reason that we have decoupled this functionality. Listing-11.18 shows the implementation of the signal_aggregator module, with the source code for the dot_product, mult_product, and diff_product aggregation functions.

Listing-11.18 The signal_aggregator module containing the dot_product, mult_product, and diff_product aggregation functions.

-module(signal_aggregator).
-compile(export_all).
-include("records.hrl").


dot_product(IAcc,IPIdPs)->
	dot_product(IAcc,IPIdPs,0).

dot_product([{IPId,Input}|IAcc],[{IPId,Weights}|IPIdPs],Acc)->
	Dot = dot(Input,Weights,0),
	dot_product(IAcc,IPIdPs,Dot+Acc);
dot_product([],[{bias,[Bias]}],Acc)->
	Acc + Bias;
dot_product([],[],Acc)->
	Acc.

dot([I|Input],[W|Weights],Acc) ->
	dot(Input,Weights,I*W+Acc);
dot([],[],Acc)->
	Acc.
%The dot/3 function accepts an input vector and a weight list, and computes the dot product of the two vectors.

diff_product(IAcc,IPIdPs)->
	case get(diff_product) of
		undefined ->
			put(diff_product,IAcc),
			dot_product(IAcc,IPIdPs,0);
		Prev_IAcc ->
			put(diff_product,IAcc),
			Diff_IAcc = input_diff(IAcc,Prev_IAcc,[]),
			dot_product(Diff_IAcc,IPIdPs,0)
	end.

input_diff([{IPId,Input}|IAcc],[{IPId,Prev_Input}|Prev_IAcc],Acc)->
	Vector_Diff = diff(Input,Prev_Input,[]),
	input_diff(IAcc,Prev_IAcc,[{IPId,Vector_Diff}|Acc]);
input_diff([],[],Acc)->
	lists:reverse(Acc).

diff([A|Input],[B|Prev_Input],Acc)->
	diff(Input,Prev_Input,[A-B|Acc]);
diff([],[],Acc)->
	lists:reverse(Acc).
%The diff_product/2 function accepts the IAcc and IPIdPs tuple lists as input, and checks whether it has the previous IAcc stored in memory. If it does not, the function calculates the dot product of the IAcc and IPIdPs, and returns the result to the caller. If it does, then it subtracts (value by value) the previous IAcc from the current IAcc, then calculates the dot product of the resulting difference vector and the IPIdPs, and returns the result to the caller.


mult_product(IAcc,IPIdPs)->
	mult_product(IAcc,IPIdPs,1).

mult_product([{IPId,Input}|IAcc],[{IPId,Weights}|IPIdPs],Acc)->
	Mult = mult(Input,Weights,1),
	mult_product(IAcc,IPIdPs,Mult*Acc);
mult_product([],[{bias,[Bias]}],Acc)->
	Acc * Bias;
mult_product([],[],Acc)->
	Acc.

mult([I|Input],[W|Weights],Acc) ->
	mult(Input,Weights,I*W*Acc);
mult([],[],Acc)->
	Acc.
%The mult_product/2 function first multiplies the elements of the IAcc vector by their corresponding weight values in the IPIdPs vector. It then multiplies the resulting values together (whereas the dot product adds them), and finally returns the result to the caller.

The dot product aggregation function has already proven itself; it is used in almost all artificial neural network implementations. The diff_product can be thought of as a neuron that looks not at the actual signal amplitudes, but at the temporal differences in signal amplitudes. If the input signals have stabilized, then the neuron's input is calculated as a 0; if there is a sudden change in the signal, the neuron will see it. The worth of the mult_product aggregation function is certainly questionable, and should be further studied through benchmarking and testing. If there is any worth to this type of signal aggregator, evolution will find it. We could also add normalizer functions, which would normalize the input signals. The normalizers could be implemented as part of the aggregator functions, although it could be argued that even the normalizing functions deserve their own module. Later on, we could make them a part of the aggregator functions, perhaps creating two versions of each aggregation function, one which normalizes, and one which does not.
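As a sketch of what such a normalizer might look like (the function below is not part of the module shown above; the name dot_product_normalized and its placement are hypothetical), one could scale each accumulated input vector to unit length before handing it to the existing dot_product/3 function:

dot_product_normalized(IAcc,IPIdPs)->
	Normalized_IAcc = [{IPId,normalize(Input)} || {IPId,Input} <- IAcc],
	dot_product(Normalized_IAcc,IPIdPs,0).

normalize(Input)-> %Scales the vector to unit length, leaving zero vectors unchanged.
	Magnitude = math:sqrt(lists:sum([Val*Val || Val <- Input])),
	case Magnitude == 0.0 of
		true -> Input;
		false -> [Val/Magnitude || Val <- Input]
	end.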

11.4.10 Creating the plasticity Module

The only remaining module left to implement is the plasticity module. True learning is not achieved when a static NN is trained on some data set through destruction and recreation by the exoself based on its performance. Instead, it is the self-organization of the NN, the self-adaptation and changing of the NN based on the information it is processing. The learning rule, the way in which the neurons adapt independently, the way in which their synaptic weights change based on the neuron's experience, that is true learning, and that is neuroplasticity.


There are different algorithms which try to emulate biological neuroplasticity. One such simple plasticity algorithm is the Hebbian Rule, which states that "neurons that fire together, wire together". The rule states that if a neuron A receives a positive input signal from another neuron B, and in response, after using its aggregation and activation functions, neuron A produces a positive output, then A's synaptic weight for B's connection increases in magnitude. If that synaptic weight was positive, it becomes more positive; if it was negative, it becomes more negative.

There are numerous plasticity rules, some more faithful to their biological counterparts than others, and some more efficient than their biological counterparts. We will discuss plasticity in a chapter dedicated to it. At this time, we will simply create the module, and a standardized plasticity function format: a function which accepts as input the accumulated input vector IAcc, the Input_PIdPs, and the Output, where IAcc is the input vector, Input_PIdPs is the associated list of synaptic weights, and the Output value is the neuron's calculated output. In response, the plasticity function produces an updated set of synaptic weights, the updated Input_PIdPs list. This simulates the adaptation and morphing of synaptic weights due to the neuron's interaction with the world, the neuron's processing of input signals. Our neuroevolutionary system will be able to generate NN based agents with and without plasticity. At this time, the plasticity module will only contain one type of plasticity function, the none/3 function. The plasticity function none/3 does not change the neuron's synaptic weights, and thus represents the static neuron. This initial plasticity module is shown in Listing-11.19.

Listing-11.19 The plasticity module containing the none plasticity function.

-module(plasticity).
-compile(export_all).
-include("records.hrl").

none(_IAcc,Input_PIdPs,_Output)->

Input_PIdPs.

The none/3 plasticity function accepts the 3 parameters, and returns to the caller the same Input_PIdPs it was called with. This module completes the modifications to our neuroevolutionary system. Most of our system's important functionalities are now decoupled. These various functions are now exchangeable, evolvable, and modifiable, making our neuroevolutionary system more dynamic, generalizable, and flexible. In the following section we will test our updated system to ensure that it works, and that all its new parts are functional.
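Though plasticity proper is the subject of a later chapter, the standardized format already accommodates real learning rules. The following is a minimal sketch of how a simple Hebbian rule could be written in this format; it is not part of the module above, and the 0.1 learning rate is an arbitrary illustrative choice:

hebbian(IAcc,Input_PIdPs,Output)->
	hebbian(IAcc,Input_PIdPs,Output,[]).

hebbian([{IPId,Input}|IAcc],[{IPId,Weights}|Input_PIdPs],Output,Acc)->
	%Each weight grows in proportion to the product of its input signal and the neuron's output.
	U_Weights = [W + 0.1*I*Output || {I,W} <- lists:zip(Input,Weights)],
	hebbian(IAcc,Input_PIdPs,Output,[{IPId,U_Weights}|Acc]);
hebbian([],[{bias,[Bias]}],_Output,Acc)->
	lists:reverse([{bias,[Bias]}|Acc]); %The bias weight is left unchanged.
hebbian([],[],_Output,Acc)->
	lists:reverse(Acc).

Because the IAcc list is accumulated in the same order as the Input_PIdPs list, the two can be walked in lockstep, just as in the signal_aggregator functions.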


11.5 Compiling & Testing the New System

We have made numerous modifications to the source code. Though our system is now more flexible, and we can modify the way it functions, changing its activation functions, plasticity, selection, genetic vs. memetic evolution, and generational vs. steady_state evolutionary loop, we have at the same time made it more complex. Our updated system has the following relations amongst its modules, processes, and functions (with regards to which are contained within which, and which are used by which):

• polis.erl
	o scape.erl
• genotype.erl
• genome_mutator.erl
• population_monitor.erl
	o evo_alg_f
	o fitness_postprocessor_f
		- fitness_postprocessor.erl
	o selection_f
		- selection_algorithm.erl
• Agent:
	o exoself.erl
		- morphology
			- morphology.erl
		- tuning_duration_f
			- tuning_duration.erl
		- tuning_selection_f
			- tuning_selection.erl
		- annealing_parameter
		- perturbation_range
		- encoding_type
		- tot_topological_mutations_f
			- tot_topological_mutations.erl
		- mutation_operators
	o cortex.erl
	o neuron.erl
		- af
			- functions.erl
		- aggr_f
			- signal_aggregator.erl
		- pf
			- plasticity.erl
	o sensor.erl
	o actuator.erl
The leaves of this bulleted list represent the various elements which are changeable, evolvable, and have been decoupled. For example, the population_monitor, and thus the particular population in question, behaves differently depending on the evo_alg_f (evolutionary algorithm function), the fitness_postprocessor_f (fitness postprocessor function), and the selection_f (the selection algorithm function). These elements can all be set up differently for different populations, or even possibly changed/mutated during evolution. Also, because of the way we decoupled the various elements and parameters of the evolving NN based agents, each agent can have a different set of tuning_selection, tuning_duration, and tot_topological_mutations_f functions. Each agent can also evolve and use different annealing and perturbation range values. Furthermore, agents belonging to the same population can now be encoded differently, some neural, while others could employ the substrate encoding. Finally, the evolving topology can incorporate different types of neurons, each of which can have different plasticity functions, activation functions, and aggregation functions.

Because all of these now decoupled features can be identified by their tags/names, and activated by those tags due to their belonging to their own modules, we can evolve them. Each evolving agent can now also use the mutation operators which change, modify, and mutate these tags and values. This adds further flexibility to our neuroevolutionary system, and lets us modify its functionality by simply setting the INIT_CONSTRAINTS macro to the set of parameters we wish to use for any particular experiment or problem.

Though our system is now more flexible, and we can modify the way it functions, we have also made it more complex, with more movable parts, and thus more elements that can break and hide bugs. We must now compile our new system and test its functionality. The full source code of our system thus far can be found at [4].

We have created the polis:sync() function to compile everything in our project folder. Thus to first test for compilation errors, all we have to do is execute this function:

1> polis:sync().

Recompile: signal_aggregator

Recompile: tot_topological_mutations

Recompile: tuning_duration

Recompile: tuning_selection

up_to_date


It works! But before we can test our new system on the XOR benchmark, because our records have changed, we must first delete the existing mnesia database and then create a new one. If there is no mnesia database (if you are using the provided Ch_11 work folder from [4]), then we simply create a new mnesia database by running polis:create(), and then start the polis process by running polis:start():

2> polis:create().

{atomic,ok}

3> polis:start().

Parameters:{[],[]}

Polis: ##MATHEMA## is now online.

{ok,<0.272.0>}

With the polis started, we will first create a test agent and test it on the XOR problem independently, with a few varying parameters. Once that test is complete, we will test the population_monitor by applying it to the XOR problem using the generational and steady_state evolutionary loops.

To test the exoself we must first create a test agent by executing the genotype:create_test() function. The create_test/0 function creates the agent using the default #constraint{} record we defined in the records.hrl file. The default constraint record uses the following parameters:

-record(constraint,{

morphology=xor_mimic, %xor_mimic

connection_architecture = recurrent, %recurrent|feedforward

neural_afs=[tanh,cos,gaussian,absolute], %[tanh,cos,gaussian,absolute,sin,sqrt,sigmoid],

neural_pfs=[none], %[none,hebbian,neuro_modulated]

neural_aggr_fs=[dot_product], %[dot_product, mult_product, diff]

tuning_selection_fs=[all], %[all,all_random, recent,recent_random, lastgen,lastgen_random]

tuning_duration_f={const,20}, %[{const,20},{nsize_proportional,0.5},

{nweight_proportional, 0.5}...]

annealing_parameters=[1], %[1,0.9]

perturbation_ranges=[1], %[0.5,1,2,3...]

agent_encoding_types= [neural], %[neural,substrate]

mutation_operators= [{mutate_weights,1}, {add_bias,1}, {mutate_af,1}, {add_outlink,1}, {add_inlink,1}, {add_neuron,1}, {outsplice,1}, {add_sensor,1}, {add_actuator,1}], %[ {mutate_weights,1}, {add_bias,1}, {remove_bias,1}, {mutate_af,1}, {add_outlink,1}, {remove_outLink,1}, {add_inlink,1}, {remove_inlink,1}, {add_sensorlink,1}, {add_actuatorlink,1}, {add_neuron,1}, {remove_neuron,1}, {outsplice,1}, {insplice,1}, {add_sensor,1}, {remove_sensor,1}, {add_actuator,1}, {remove_actuator,1}]

tot_topological_mutations_fs = [{ncount_exponential,0.5}], %[{ncount_exponential,0.5},{ncount_linear,1}]

population_evo_alg_f=generational, %[generational, steady_state]

population_fitness_postprocessor_f=size_proportional, %[none,size_proportional]

population_selection_f=competition %[competition,top3]

}).

With the new mnesia database created, and the polis process started, we now test the creation of a new agent using our modified genotype module, and the updated constraint record:

4> genotype:create_test().

{agent,test,neural,0,undefined,test,

{{origin,7.551215163115267e-10},cortex},

{[{0,1}],

[],

[{sensor,undefined,xor_GetInput,undefined,

{private,xor_sim},

2,[],0,undefined,undefined,undefined,undefined,undefined,

undefined}],

[{actuator,undefined,xor_SendOutput,undefined,

{private,xor_sim},

1,[],0,undefined,undefined,undefined,undefined,undefined,

undefined}]},

{constraint,xor_mimic,recurrent,

[tanh,cos,gaussian,absolute],

[none],

[dot_product],

[all],

{const,20},

[1],

[1],

[neural],

[{mutate_weights,1},

{add_bias,1},

{mutate_af,1},

{add_outlink,1},

{add_inlink,1},

{add_neuron,1},

{outsplice,1},

{add_sensor,1},

{add_actuator,1}],

[{ncount_exponential,0.5}],

generational,size_proportional,competition},

[],undefined,0,

[{0,[{{0,7.551215163115199e-10},neuron}]}],

all,1,

{const,20},

1,

[{mutate_weights,1},

{add_bias,1},

{mutate_af,1},

{add_outlink,1},

{add_inlink,1},

{add_neuron,1},

{outsplice,1},

{add_sensor,1},

{add_actuator,1}],

{ncount_exponential,0.5}}

{cortex,{{origin,7.551215163115267e-10},cortex},

test,

[{{0,7.551215163115199e-10},neuron}],

[{{-1,7.551215163115238e-10},sensor}],

[{{1,7.551215163115216e-10},actuator}]}

{sensor,{{-1,7.551215163115238e-10},sensor},

xor_GetInput,

{{origin,7.551215163115267e-10},cortex},

{private,xor_sim},

2,

[{{0,7.551215163115199e-10},neuron}],

0,undefined,undefined,undefined,undefined,undefined,undefined}

{neuron,{{0,7.551215163115199e-10},neuron},

0,

{{origin,7.551215163115267e-10},cortex},

tanh,none,dot_product,

[{{{-1,7.551215163115238e-10},sensor},

[0.15548205860608455,0.17397940203921358]}],

[{{1,7.551215163115216e-10},actuator}],

[]}

{actuator,{{1,7.551215163115216e-10},actuator},

xor_SendOutput,

{{origin,7.551215163115267e-10},cortex},

{private,xor_sim},

1,

[{{0,7.551215163115199e-10},neuron}],

0,undefined,undefined,undefined,undefined,undefined,undefined}

{atomic,{atomic,[ok]}}

It works! The genotype is printed to console, and it includes all the new features and parameters we've added. The genotype was created without any errors, and thus we can now test the agent by converting the genotype to phenotype, and applying it to the XOR mimicking problem. We do this by executing the exoself:start(Agent_Id,void) function, where void is just an atom in the place where we'd usually use the PId of the population_monitor:


6> exoself:start(test,void).

<0.363.0>

IPIdPs:[{<0.366.0>,[0.15548205860608455,0.17397940203921358]}]

Finished updating genotype

Terminating the phenotype:

Cx_PId:<0.365.0>

SPIds:[<0.366.0>]

NPIds:[<0.368.0>]

APIds:[<0.367.0>]

ScapePids:[<0.364.0>]

Agent:<0.363.0> terminating. Genotype has been backed up. Fitness:0.23792642646665235

TotEvaluations:21

TotCycles:84

TimeAcc:3270

Sensor:{{-1,7.551215163115238e-10},sensor} is terminating.

Actuator:<0.367.0> is terminating.

Neuron:<0.368.0> is terminating.

Cortex:{{origin,7.551215163115267e-10},cortex} is terminating.

It works! The exoself converted the test genotype to its phenotype, tried to tune the NN for 21 evaluations (we know this due to the TotEvaluations:21 printout), terminated all the elements of the NN, and then itself terminated. With this test completing successfully, we can now test the whole neuroevolutionary system by creating a population of agents, applying them to the problem, evolving the population using the population_monitor, and then terminating the system once a termination condition has been reached. To do all of this, we simply set the INIT_CONSTRAINTS macro to the list of constraint tuples with the parameters we wish to use, and then execute the population_monitor:test() function.

We first test our neuroevolutionary system with the generational evolutionary algorithm. We do this by setting up the INIT_CONSTRAINTS macro as follows:

-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology, connection_architecture=CA, population_evo_alg_f=generational} || Morphology<-[xor_mimic], CA<-[feedforward]]).

With the constraint record set, we compile the population_monitor module, and then run it. For the sake of brevity, only a partial printout to console is shown:

13> c(population_monitor).

...

{ok,population_monitor}

14> population_monitor:test().

Specie_Id:7.551210273616779e-10 Morphology:xor_mimic

******** Population monitor started with parameters:{gt,test}


...

Using: Fitness Postprocessor:size_proportional Selection Algorithm:competition Valid_AgentSummaries:[{999529.2413070924,2,{7.551210272768589e-10,agent}},

{1512.9118761841332,1,{7.551210272806172e-10,agent}},

{1352.7815191404268,2,{7.551210272916421e-10,agent}},

{302.13492581117015,1,{7.551210273600174e-10,agent}},

{24.488124260342552,1,{7.551210273564292e-10,agent}}]

Invalid_AgentSummaries:[{10.146259718093239,2,{7.551210273018837e-10,agent}},

{0.49999586555860426,1,{7.551210273039854e-10,agent}},

{0.4999165758596308,1,{7.551210272834928e-10,agent}},

{0.49062112602642133,2,{7.551210273073314e-10,agent}},

{0.4193814651130151,1,{7.551210272863694e-10,agent}}]

...

******** Population_Monitor:test shut down with Reason:normal OpTag:continue, while in OpMode:gt
******** Tot Agents:10 Population Generation:2 Eval_Acc:878 Cycle_Acc:3512 Time_Acc:476237

It works! We can execute population_monitor:test() a few more times to ensure that this was not a fluke and that it does indeed work. We next test the system using the new steady_state evolutionary loop. To accomplish this, all we need to do is modify the INIT_CONSTRAINTS, changing the population_evo_alg_f parameter from generational to steady_state:

-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology, connection_architecture=CA, population_evo_alg_f=steady_state} || Morphology<-[xor_mimic], CA<-[feedforward]]).

With this modification, we again compile the population_monitor module, and execute population_monitor:test():

1> c(population_monitor).

...

{ok,population_monitor}

2> population_monitor:test().

Specie_Id:7.55120754326942e-10 Morphology:xor_mimic

******** Population monitor started with parameters:{gt,test}

...

Agent:<0.4146.0> terminating. Genotype has been backed up.

Fitness:959252.9093456662

TotEvaluations:45

TotCycles:180

TimeAcc:2621

Neuron:<0.4151.0> is terminating.

Sensor:{{-1,7.551207540548967e-10},sensor} is terminating.

Actuator:<0.4150.0> is terminating.


Neuron:<0.4152.0> is terminating.

Cortex:{{origin,7.551207540549013e-10},cortex} is terminating.

******** Population_Monitor:test shut down with Reason:normal OpTag:continue, while in OpMode:gt
******** Tot Agents:10 Population Generation:0 Eval_Acc:2338 Cycle_Acc:9352 Time_Acc:630207

It works! Note that the Population Generation is 0 in this case, because it never gets incremented, while the evaluation accumulator (Eval_Acc) is 2338. The fitness achieved is high: Fitness:959252. Thus our system is functioning well, and can easily and efficiently solve the simple XOR problem. We should try the system with a few other parameters to ensure that it works. But with these tests done, we are now in possession of an advanced, modularly designed, decoupled, scalable, and agile neuroevolutionary platform.


11.6 Summary & Discussion

In this chapter we have modified the neuroevolutionary system we finished building by Chapter-9. We have decoupled numerous features of our system, including plasticity, signal aggregation, annealing parameters, various tuning parameters, and even the evolutionary algorithm loop type. All these decoupled elements of our neuroevolutionary system were given their own modules, and all these features can be accessed through their own function names. Because these various elements are now specified by their names and parameters, rather than built and embedded into the functionality of the system, we can evolve them, mutate them, and change them during the evolutionary process. Thus, our system can now mutate and evolve new plasticity rules, tuning parameters, signal aggregation functions... in the same way that our original system allowed the neurons to change and swap their activation functions.

This decoupling and modularization of the neuroevolutionary system design also makes it that much easier to add new elements, and to test out new features in the future. The new architecture of our system allows us to build new functions and new functionalities without affecting the already tested and working elements of our TWEANN platform.

After making these modifications, we have tested the creation of a new agent using its newly modified genotypic and phenotypic elements. We have tested the creation of the genotype, the conversion of the genotype to phenotype, the processing ability of the exoself, and the functionality of the population_monitor with generational and steady_state evolutionary loops, and found all of them in perfect working condition. With this complete, we can now begin to add the various advanced features to our neuroevolutionary platform.


As we have found in this chapter, after adding new features, whether they simply optimize or augment the architecture, or whether they improve the evolutionary properties of the system, the resulting platform must be tested, and it must be benchmarked. For this reason, in the following chapter we will modify the population_monitor to keep track of the various evolutionary performance statistics and properties that change over time: statistics like the average neural network size of the population; the maximum, minimum, average, and standard deviation of the population's fitness; and the numerous other evolutionary parameters that might further shed light on our system's performance.

11.7 References

[1] Paulsen O, Sejnowski TJ (2000) Natural Patterns of Activity and Long-Term Synaptic Plasticity. Current Opinion in Neurobiology 10, 172-179.

[2] Oja E (1982) A Simplified Neuron Model as a Principal Component Analyzer. Journal of Mathematical Biology 15, 267-273.

[3] Bertsimas D, Tsitsiklis J (1993) Simulated Annealing. Statistical Science 8, 10-15.

[4] Source code for each chapter can be found here: https://github.com/CorticalComputer/Book_NeuroevolutionThroughErlang

Chapter 12 Keeping Track of Important Population and Evolutionary Stats

Abstract To be able to keep track of the performance of a neuroevolutionary system, it is essential for that system to be able to accumulate various statistics regarding its fitness, population dynamics, and other changing features throughout the evolutionary run. In this chapter we add to the population_monitor of our TWEANN system the ability to compose a trace, which is a list of tuples, where each tuple is calculated every 500 (by default) evaluations and contains the various statistics about the population achieved during those evaluations, tracing the population's path through its evolutionary history.

As we discussed in Section-8.1, and as was presented in Fig-8.1, the architecture we're going after needs to contain, besides the polis, population_monitor, database, and agent processes, also the stat_accumulator and the error_logger concurrent processes. In some sense, we do not really need to create the error_logger, because we can use the already robust implementation offered by OTP. The stat_accumulator, on the other hand, is something that needs to be built, and that is the program we will develop in this chapter.

Because the goal is to continue improving and generalizing our neuroevolutionary platform, the way to test whether our neuroevolutionary system works or not is by benchmarking it. But to properly analyze the results of the benchmarks we need to gather the resulting statistics produced by the population_monitor and other processes, so that these statistics can then be graphed and perused for signs of improvements based on the new additions to the platform. In this chapter we further modify the population_monitor process, adding to it the ability to keep track of the various important population and evolutionary accumulated statistics.

The updated population_monitor should be able to compose useful population information, build lists of this information, and later on be able to write it to file, producing data in a form that can be graphed. The population parameters that we would like to keep track of are as follows:

1. How the average NN complexity/size is changing in the population: the average NN size, the maximum NN size, the minimum NN size, and the standard deviation of the NN sizes.

2. How the fitness is changing over time with regards to evaluations. Again, we want to keep track of the average population fitness over time, maximum fitness over time, minimum fitness over time, and standard deviation of the same.

3. Population diversity is another element that is useful to keep track of, since we want to know whether our system produces diverse populations or not. It is essential that the TWEANN system is able to maintain a high diversity on its own; only then does it have a chance to truly be general and innovative, and to evolve solutions and NN based systems capable of solving complex and varying problems.

Without a doubt there are other interesting features that should be kept track of, and thus we should develop and implement this stat_accumulator program with an eye to the future, allowing for easy expansion of its capabilities when the time comes. In the following sections we will first discuss the new architecture and how it will work with the system we've developed so far. Then we will develop a format for how this information should be stored and accumulated. Afterwards, we will implement the actual system, making it an extension and part of the population_monitor itself, rather than an independent process. Finally, we will test our updated population_monitor on the XOR problem, demonstrating its new ability to gather data about the evolutionary path the population is taking, and the various performance statistics of the population and its species.

12.1 The Necessary Additions to the System

Whereas before each agent sent to the population monitor the total number of evaluations it had completed, the number of cycles, and the amount of time taken for tuning, we need our new system to be more precise. Since an agent can take anywhere from Max_Attempts number of evaluations per tuning attempt to hundreds or even thousands of evaluations before reporting that number to the population_monitor, we need to update the way each agent informs the population monitor when it has completed an evaluation, so that the population monitor can keep track of every evaluation. This will allow the population_monitor not only to stop or terminate an agent at the exact number of evaluations that is set as a limit, but also to build statistics about its population of agents, for example every 500 evaluations. If each agent contacted the population monitor only at its termination, then the population_monitor would not be able to calculate the various features of the population at a specific evaluation index, and instead would be at the mercy of whenever each agent finishes all its evaluations.

To accomplish this, we simply let each agent send a signal to the population_monitor whenever it has finished an evaluation. And for its part, the population_monitor will have a new cast clause, specifically dedicated to receiving evaluation_completed messages from the agents.

Because we are finally at the point where we can also track the population diversity, we now have a chance to use the fingerprint tuple that each agent constructs. The diversity of the population loosely defines the total number of different species, or the total number of significantly different agents. This "significant" difference can be reflected in the agents having different topologies, using a different number of neurons, using a different set of activation functions, having taken a different evolutionary path (reflected by the different evo_hist lists of the compared agents), and finally by having a different set of sensors and actuators. At this point we define the specie by the particular morphology that the specie supports. Thus at this time, each specie can have many topologically different agents, all using the same constraint and morphology, interfacing with the same type of scapes, having access to the same set of sensors and actuators, and competing with each other for fitness and offspring allotments using the same evolutionary algorithm loops. Thus for now, we will calculate the diversity not based on the number of species, but on the number of significantly differing agents within the entire population.

To calculate the population diversity, we must first decide on the defining characteristics of an agent, the granularity of diversity. Though the fingerprint of the agent is a good start, let us expand it to include not only the evolutionary path of the agent and the sensors and actuators used, but also a few defining topological features of a NN based agent. At this time, the agent's fingerprint is defined by the tuple: {GeneralizedPattern, GeneralizedEvoHist, GeneralizedSensors, GeneralizedActuators}. We change this definition so that the fingerprint tuple also includes a topological summary, which is itself defined by the tuple: {type, tot_neurons, tot_n_ils, tot_n_ols, tot_n_ros, af_distribution}, where:

type: Is the NN encoding type: neural or substrate, for example.

tot_neurons: Is the total number of neurons composing the NN based agent.

tot_n_ils: Is the total number of neuronal input links, calculated by adding together the number of input links of every neuron in the NN based agent.

tot_n_ols: Is the total number of neuronal output links, counted in the same way as input links. Though somewhat redundant, the tot_n_ils and tot_n_ols will differ from each other based on the number of sensors and actuators used by the agent.

tot_n_ros: Is the total number of recurrent connections within the NN system.

af_distribution: Is the count of every type of activation function used by the NN based agent. This has the format of: {TotTanh, TotSin, TotCos, TotGaussian, TotAbsolute, TotSgn, TotLog, TotSqrt, TotLin}, and thus agents which have the same topology, but whose neurons use different sets of activation functions, will be functionally very different; this activation function frequency distribution tuple will to some degree summarize those differences. There could of course be numerous agents with the same topology and the same activation function frequency distribution, but which differ in which neurons use those activation functions, and in the locations of those neurons within the NN topology. Nevertheless, this gives us a short and easy way to calculate a summary which can be used, in addition to other fingerprint elements, to distinguish between two functionally different agents. These summaries can further be considered as representing how the different agents are exploring different areas of the topological landscape.
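For instance, the single-neuron test agent created in the previous chapter (one tanh neuron, one input link from its sensor, one output link to its actuator, and no recurrent connections) would have the topological summary: {neural,1,1,1,0,{1,0,0,0,0,0,0,0,0}}.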


Thus, based on these defining characteristics, two agents with different fingerprints (which will include their topological summaries) warrant being considered significantly different. The diversity is then the total number of different fingerprints within the population.

To allow our system to keep track of progress and the changing properties and statistics of the population, we first need to decide on the step_size during which these statistics will be calculated. The step size is defined as X number of evaluations, such that every X number of evaluations we measure the various evolutionary statistics and properties of the population at hand. For example, if X is defined as 500 evaluations, then every 500 evaluations we would compute the average fitness of the population, the max, min, and std of the fitness within the population, the diversity of the population, the average size of the NN systems in the population... We would then add the tuple with this information to a list that traces out the evolutionary path of the population. Even better, we could calculate these values for every specie composing the population, and then compose the trace from the list of tuples where each tuple was created for a particular specie (defined by a different morphology) of the population. The trace would then be a list of lists, where the inside list would be the list of specie stats: [SpeciesStats1, SpeciesStats2... SpeciesStatsN], with the outer list being the actual trace, which traces out the various general properties of the evolving population by calculating the various decided-on properties of the species composing this population. The SpeciesStats is a list of tuples, where each tuple contains the general statistics and various properties (average fitness, average NN sizes...) of a particular specie belonging to the population. Thus in the next section we will create the format and manner in which we will store and gather this information.

12.2 The Trace Format

We want the population monitor to, every X number of evaluations, calculate the general properties of each specie in the population that it is evolving. We will call the record that stores these various statistical properties of the specie: stat. The specie stat will have the following format:

-record(stat,{morphology,specie_id,avg_neurons,std_neurons,avg_fitness,std_fitness, max_fitness,min_fitness,avg_diversity,evaluations,time_stamp}).

Where the definition of each of the elements within this record is as follows:

morphology: This is the specie's morphology.

specie_id: Is the id of the specie for which the stat is calculated.

avg_neurons: The size of the average NN based agent of the specie, calculated by summing up the neuron counts of all the agents belonging to the specie, and dividing by the number of agents at the time of calculation.

std_neurons: The standard deviation of the avg_neurons value.

avg_fitness: Is the average fitness of the agents within the specie.

std_fitness: Is the standard deviation of the avg_fitness value.

max_fitness: The maximum fitness achieved by the agents within the specie at the time of calculation.

min_fitness: The minimum fitness achieved by the agents within the specie at the time of calculation.

avg_diversity: The average diversity of the specie.

evaluations: The number of evaluations that the agents of this particular specie used during the given X number of evaluations. For example, if the population calculates the specie stats every 500 evaluations, and there are 2 species in the population, then one specie might have taken 300 of the evaluations if its organisms kept dying rapidly, and the other specie would then have taken the other 200 during that 500 evaluation slot. This value represents the turnover of the specie, and depends on numerous factors, among which are of course the number of species in the population (if there is only one, it will take all 500 evaluations), the general fitness of the agents (how often they die or get re-evaluated if applied to ALife, for example), and the specie size.

time_stamp: Finally, we also time stamp the stat record at the moment of calculation.

Thus, every X (500 by default) evaluations, for every specie we calculate all the properties of the stat, and thus form a list of stat elements: SpeciesStats = [Specie_Stat1, Specie_Stat2,...Specie_StatN], where N is the number of species in the population at the time of calculating the SpeciesStats.

We then enter this list of specie stats into a list we will call the population stats. The stats list will belong to the population's element by the name trace. We call it a trace because when we dump the population's trace to console, we can see the general progress of the population, the number of species, and the properties of those species, as outlined by their stat tuples. It is a trace of the population's evolutionary history; it is the evolutionary path that the population has taken.

The trace element will be represented by the record: -record(trace,{stats=[], tot_evaluations=0, step_size=500}), where stats is the list which will contain the lists of SpeciesStats. The element tot_evaluations will keep track of the total number of evaluations, so that we can at any time pause the population_monitor, and later on continue counting the evaluations when we resume, starting from the previous stop. Finally, we specify through the step_size element the value of X, the number of evaluations between successive calculations of the population's various general properties.

Thus the population's stats list, being a list of lists, will have the following format: [[SpecieStat1...SpecieStatN],...[SpecieStat1...SpecieStatN]], allowing us to easily extract any one particular list of specie stats, a list which belongs to some particular 500 evaluation slot window, as sketched below.
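For instance (a sketch, assuming a bound Trace variable holding a #trace{} record; since, as the implementation later in this chapter shows, each new SpeciesStats list is prepended to the stats list, the list is ordered newest-first):

ChronologicalStats = lists:reverse(Trace#trace.stats), %Oldest slot first.
ThirdSlot_SpeciesStats = lists:nth(3,ChronologicalStats). %Specie stats of the 3rd 500 evaluation slot.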


12.3 Implementation

Having decided on the format through which we will keep track of the specie and population statistics, and the modifications that need to be added to the exoself and population_monitor, let's implement these new features. We will first modify the records.hrl file, adding the three new records: trace, stat, and topology_summary, needed for the new extended fingerprint tuple. Then we will implement the function that calculates the topological summary of an agent. We will then implement the function that constructs the specie stat tuples, and updates the population's trace with the accumulated specie stat list. We will then modify the population_monitor module, implementing the evaluation counting cast clause, which takes it upon itself to build and add to the population's trace every X number of evaluations, where X is specified by the step_size parameter in the trace record. Finally, we will make a small modification to the exoself module, making the agent's exoself notify the population_monitor each time it completes an evaluation, rather than doing so only at the very end when the agent is ready to terminate.

12.3.1 Updating records.hrl

This chapter's new features require us to create the new topology_summary, stat, and trace records. We also have to update the population record so that it has a trace parameter, and we need to update the specie record so that it can keep a list of its stat tuples. Though the population's trace parameter will keep a list of lists of the specie stat tuples (a list of SpeciesStats, where SpeciesStats itself is a list), each specie will also keep a list of its own stat tuples.

The following are the three new records: topology_summary, stat, and trace:

-record(topology_summary,{type,tot_neurons,tot_n_ils,tot_n_ols,tot_n_ros,af_distribution}).

-record(stat,{morphology,specie_id,avg_neurons,std_neurons,avg_fitness,std_fitness, max_fitness,min_fitness,avg_diversity,evaluations,time_stamp}).

-record(trace,{stats=[],tot_evaluations=0,step_size=500}).

After adding these three records to the records.hrl, we now update the population and specie records. The elements in boldface are the ones newly added:

-record(population,{id, polis_id, specie_ids=[], morphologies=[], innovation_factor, evo_alg_f, fitness_postprocessor_f, selection_f, trace=#trace{}}).

-record(specie,{id, population_id, fingerprint, constraint, agent_ids=[], dead_pool=[], champion_ids=[], fitness, innovation_factor={0,0}, stats=[]}).


With this done, we move on to building the function that constructs the topological summary.

12.3.2 Building the Topological Summary of a Neural Network

The topological summary will become part of the agent's fingerprint, cataloging the number of neurons, synaptic connections, and types of activation functions used by it. We put this new function inside the genotype module, since it will be called by the update_fingerprint/1 function. Listing-12.1 shows the modified update_fingerprint/1 function, and the new get_NodeSummary/1 function which calculates the activation function frequency distribution tuple (the number and types of activation functions used by the NN system), and counts the total number of links used by the NN system. The modified and added parts of the update_fingerprint/1 function are highlighted with boldface.

Listing-12.1 The modified update_fingerprint/1 function.

update_fingerprint(Agent_Id)->

A = read({agent,Agent_Id}),

Cx = read({cortex,A#agent.cx_id}),

GeneralizedSensors = [(read({sensor,S_Id}))#sensor{id=undefined,cx_id=undefined,

fanout_ids=[]} || S_Id<-Cx#cortex.sensor_ids],

GeneralizedActuators = [(read({actuator,A_Id}))#actuator{id=undefined, cx_id=undefined,

fanin_ids=[]} || A_Id<-Cx#cortex.actuator_ids],

GeneralizedPattern = [{LayerIndex,length(LNIds)}||{LayerIndex,LNIds}<-A#agent.pattern],

GeneralizedEvoHist = generalize_EvoHist(A#agent.evo_hist,[]),

N_Ids = Cx#cortex.neuron_ids,

{Tot_Neuron_ILs,Tot_Neuron_OLs,Tot_Neuron_ROs,AF_Distribution} =

get_NodeSummary(N_Ids),

Type = A#agent.encoding_type,

TopologySummary = #topology_summary{

type = Type,

tot_neurons = length(N_Ids),

tot_n_ils = Tot_Neuron_ILs,

tot_n_ols = Tot_Neuron_OLs,

tot_n_ros = Tot_Neuron_ROs,

af_distribution = AF_Distribution},

Fingerprint = {GeneralizedPattern,GeneralizedEvoHist,GeneralizedSensors,

GeneralizedActuators,TopologySummary},

write(A#agent{fingerprint=Fingerprint}).

%update_fingerprint/1 calculates the fingerprint of the agent, where the fingerprint is just a tuple of the various general features of the NN based system. The genotype's fingerprint is a list of features that play some role in distinguishing its genotype's general properties from those of other NN systems. The fingerprint is composed of the generalized pattern (pattern minus the unique ids), generalized evolutionary history (evolutionary history minus the unique ids of the elements), a generalized sensor set, a generalized actuator set, and now also the topology summary.

update_NNTopologySummary(Agent_Id)->

A = read({agent,Agent_Id}),

Cx_Id = A#agent.cx_id,

Cx = read({cortex,Cx_Id}),

N_Ids = Cx#cortex.neuron_ids,

{Tot_Neuron_ILs,Tot_Neuron_OLs,Tot_Neuron_ROs,AF_Distribution} =

get_NodeSummary(N_Ids),

Type = A#agent.encoding_type,

Topology_Summary = #topology_summary{

type = Type,

tot_neurons = length(N_Ids),

tot_n_ils = Tot_Neuron_ILs,

tot_n_ols = Tot_Neuron_OLs,

tot_n_ros = Tot_Neuron_ROs,

af_distribution = AF_Distribution},

Topology_Summary.

%The update_NNTopologySummary/1 function calculates the total number of input links, output links, recurrent links, and neurons, and uses the get_NodeSummary/1 function to compose the activation function frequency distribution tuple. It then enters all the calculated values into the topology_summary record, and returns it to the caller.

get_NodeSummary(N_Ids)->

get_NodeSummary(N_Ids,0,0,0,{0,0,0,0,0,0,0,0,0}).

get_NodeSummary([N_Id|N_Ids],ILAcc,OLAcc,ROAcc,FunctionDistribution)->

N = genotype:read({neuron,N_Id}),

IL_Count = length(N#neuron.input_idps),

OL_Count = length(N#neuron.output_ids),

RO_Count = length(N#neuron.ro_ids),

AF = N#neuron.af,

{TotTanh,TotSin,TotCos,TotGaussian,TotAbsolute,TotSgn,TotLog,TotSqrt,TotLin} =

FunctionDistribution,

U_FunctionDistribution= case AF of

tanh ->{TotTanh+1,TotSin,TotCos,TotGaussian,TotAbsolute,TotSgn,

TotLog,TotSqrt,TotLin};

sin ->{TotTanh,TotSin+1,TotCos,TotGaussian,TotAbsolute,TotSgn,

TotLog,TotSqrt,TotLin};

cos ->{TotTanh,TotSin,TotCos+1,TotGaussian,TotAbsolute,TotSgn,

TotLog,TotSqrt,TotLin};

gaussian->{TotTanh,TotSin,TotCos,TotGaussian+1,TotAbsolute,TotSgn,

TotLog,TotSqrt,TotLin};


absolute->{TotTanh,TotSin,TotCos,TotGaussian,TotAbsolute+1,TotSgn,

TotLog,TotSqrt,TotLin};

sgn ->{TotTanh,TotSin,TotCos,TotGaussian,TotAbsolute,TotSgn+1,

TotLog,TotSqrt,TotLin};

log ->{TotTanh,TotSin,TotCos,TotGaussian,TotAbsolute,TotSgn, TotLog+1,

TotSqrt,TotLin};

sqrt ->{TotTanh,TotSin,TotCos,TotGaussian,TotAbsolute,TotSgn, TotLog,

TotSqrt+1,TotLin};

linear ->{TotTanh,TotSin,TotCos,TotGaussian,TotAbsolute,TotSgn, TotLog,

TotSqrt,TotLin+1};

Other ->
	io:format("Unknown AF, please update the AF_Distribution tuple with:~p~n",[Other]),
	FunctionDistribution %Leave the distribution unchanged when the AF is unknown.

end,

U_ILAcc = IL_Count+ILAcc,

U_OLAcc = OL_Count+OLAcc,

U_ROAcc = RO_Count+ROAcc,

get_NodeSummary(N_Ids,U_ILAcc,U_OLAcc,U_ROAcc,U_FunctionDistribution);

get_NodeSummary([],ILAcc,OLAcc,ROAcc,FunctionDistribution)->

{ILAcc,OLAcc,ROAcc,FunctionDistribution}.

As shown in the get_NodeSummary function, we create the activation function frequency distribution tuple by simply counting the different activation functions, and forming a long tuple where every activation function (thus far used by our system) has a position. Though simple, this is not the best implementation, because every time a new activation function is added, we will need to update the AF distribution tuple so that it takes the new activation function into consideration. Having now updated the update_fingerprint/1 function, we can implement in the next section the population diversity calculating function needed by the stat composing program.
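Should this limitation ever become a burden, the counting could instead be done with a key/value structure, so that new activation functions are tallied automatically. The following is a minimal sketch of such an alternative (not part of the book's implementation; get_AFDistribution/1 is a hypothetical name), which accumulates a per-activation-function counter in an orddict:

get_AFDistribution(N_Ids)->
	%Folds over the neuron ids, incrementing a counter keyed by each neuron's
	%activation function atom. Unknown activation functions need no special casing.
	lists:foldl(
		fun(N_Id,Acc)->
			N = genotype:read({neuron,N_Id}),
			orddict:update_counter(N#neuron.af,1,Acc)
		end, orddict:new(), N_Ids).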

12.3.3 Implementing the Trace Updating Cast Clause

Our new population_monitor no longer accepts the AgentEvalAcc, AgentCycleAcc, and AgentTimeAcc containing message from the agent when it terminates. The exoself no longer sends to the population_monitor the message {Agent_Id, terminated, Fitness, AgentEvalAcc, AgentCycleAcc, AgentTimeAcc}. This message was accepted by the cast clauses of the generational and steady_state evolutionary loops. We modify both of these cast clauses to only accept the message: {Agent_Id,terminated,Fitness}, because we now offload the evaluation, cycle, and time accumulation to its own dedicated cast clause. This new cast clause will accept the message of the form: {From,evaluations,Specie_Id,AgentEvalAcc,AgentCycleAcc,AgentTimeAcc}, sent to it by the agent after it completes every single evaluation.


Because we want the stat tuple to be constructed for each specie in the population, we need to keep a running evaluation counter for each specie, so that when an agent sends the {From,evaluations...} message to the population_monitor, this evaluation can be added to the evaluation counter belonging to the proper specie. We modify the init/1 function with the parts shown in boldface in the following listing.

Listing-12.2. The updated init/1 function.

init(Parameters) ->

process_flag(trap_exit,true),

register(monitor,self()),

io:format("******** Population monitor started with parameters:~p~n",[Parameters]),

State = case Parameters of

{OpMode,Population_Id}->

Agent_Ids = extract_AgentIds(Population_Id,all),

ActiveAgent_IdPs = summon_agents(OpMode,Agent_Ids),

P = genotype:dirty_read({population,Population_Id}),

[put({evaluations,Specie_Id},0) || Specie_Id<-P#population.specie_ids],

T = P#population.trace,

TotEvaluations=T#trace.tot_evaluations,

io:format("Initial Tot Evaluations:~p~n",[TotEvaluations]),
#state{op_mode=OpMode,

population_id = Population_Id,

activeAgent_IdPs = ActiveAgent_IdPs,

tot_agents = length(Agent_Ids),

agents_left = length(Agent_Ids),

op_tag = continue,

evolutionary_algorithm = P#population.evo_alg_f,

fitness_postprocessor = P#population.fitness_postprocessor_f,

selection_algorithm = P#population.selection_f,

best_fitness = 0,

step_size = T#trace.step_size,

tot_evaluations = TotEvaluations

}

end,

{ok, State}.

It is the following line in the above code that initializes the evaluation counters for each specie in the population:

[put({evaluations,Specie_Id},0) || Specie_Id <-P#population.specie_ids]

This way, when an agent sends its evaluation message, we can, using the Specie_Id within the agent's message, execute the command: get({evaluations,Specie_Id}), and retrieve the proper specie's evaluations accumulator from the process registry. Also, because the population's evaluation accumulator will be held by #trace.tot_evaluations, we initialize the population_monitor state's initial tot_evaluations value from this trace parameter. We also add to the population_monitor's state record the step_size parameter, and set it to the step_size specified by the population's trace record. The rest of the init/1 function remains the same.

The new cast clause that updates the population's trace accepts messages from the agents and keeps count of the total number of evaluations. Once the number of evaluations accumulated by the population_monitor exceeds the value specified by the step_size parameter, the population_monitor updates the trace by composing the specie stat tuples and entering them into the trace's stats list. The implementation of this new cast clause is shown in Listing-12.3.

Listing-12.3. The population_monitor's new evaluation accumulating and trace updating cast clause.

handle_cast({From,evaluations,Specie_Id,AgentEvalAcc,AgentCycleAcc,AgentTimeAcc},S)->

Eval_Acc = S#state.eval_acc,

U_EvalAcc = S#state.eval_acc+AgentEvalAcc,

U_CycleAcc = S#state.cycle_acc+AgentCycleAcc,

U_TimeAcc = S#state.time_acc+AgentTimeAcc,

U_TotEvaluations = S#state.tot_evaluations + AgentEvalAcc,
SEval_Acc = get({evaluations,Specie_Id}),

put({evaluations,Specie_Id},SEval_Acc+AgentEvalAcc),

case Eval_Acc rem 50 of

0 ->

io:format("Evaluations/Step:~p~n",[Eval_Acc]);

_ ->

done

end,

U_S=case U_EvalAcc >= S#state.step_size of

true ->

gather_STATS(S#state.population_id,U_EvalAcc),

Population_Id = S#state.population_id,

P = genotype:dirty_read({population,Population_Id}),

T = P#population.trace,

TotEvaluations=T#trace.tot_evaluations,

io:format("Tot Evaluations:~p~n",[TotEvaluations]),

S#state{eval_acc=0, cycle_acc=0, time_acc=0,

tot_evaluations=U_TotEvaluations};

false ->

S#state{eval_acc=U_EvalAcc, cycle_acc=U_CycleAcc, time_acc=U_TimeAcc, tot_evaluations=U_TotEvaluations}
end,
{noreply,U_S};

handle_cast({_From,print_TRACE},S)->

Population_Id = S#state.population_id,

P = genotype:dirty_read({population,Population_Id}),

io:format("******** TRACE ********:~n~p~n",[P#population.trace]),

{noreply,S};

There is a second cast clause in the above listing: handle_cast({_From,print_TRACE},S), which, when receiving a print_TRACE request, prints to console the thus-far-composed trace. And of course the trace is also automatically printed to console every 500 evaluations by the population_monitor itself, to keep the researcher informed of the general evolutionary progress of the population at hand.
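Because the init/1 function shown earlier registers the population_monitor process under the name monitor, such a printout can also be requested manually from the console at any time, for example:

gen_server:cast(monitor,{self(),print_TRACE}).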

By default, every 500 evaluations the population_monitor executes the gather_STATS/2 function, which is the actual function that updates all the specie stat lists and the population's trace. This function is shown in Listing-12.4. When executed, the gather_STATS/2 function executes the update_SpecieSTAT/2 function for every specie belonging to the population. The update_SpecieSTAT/2 function retrieves the evaluation accumulator from the process registry, so that the evaluation accumulator value can be stored in the specie's stat tuple. The function then goes through every agent and adds up the number of neurons in each, and then divides that number by the total number of agents, and thus computes the average NN size and its standard deviation. In the same manner, the function then calculates the average fitness, fitness standard deviation, and the max and min fitness values reached by the agents belonging to the specie at the time of calculation. Finally, the diversity of the specie is calculated by executing the calculate_SpecieDiversity/1 function. This function, using the agent fingerprints, calculates how many distinct fingerprints are present within the specie, which is the diversity number of that specie. With all these values computed, the function then enters this data into the stat record, and adds this new stat to the specie's stats list.

Listing-12.4 The implementation of the gather_STATS/2 function, which updates the population's trace.

gather_STATS(Population_Id,EvaluationsAcc)->

io:format("Gathering Species STATS in progress~n"),

TimeStamp = now(),

F = fun() ->

P = genotype:read({population,Population_Id}),

T = P#population.trace,

SpecieSTATS = [update_SpecieSTAT(Specie_Id,TimeStamp) || Specie_Id<-

P#population.specie_ids],


PopulationSTATS = T#trace.stats,

U_PopulationSTATS = [SpecieSTATS|PopulationSTATS],

U_TotEvaluations = T#trace.tot_evaluations+EvaluationsAcc,

U_Trace = T#trace{

stats = U_PopulationSTATS,

tot_evaluations=U_TotEvaluations

},

io:format("Population Trace:~p~n",[U_Trace]),

mnesia:write(P#population{trace=U_Trace})

end,

Result=mnesia:transaction(F),

io:format("Result:~p~n",[Result]).

update_SpecieSTAT(Specie_Id,TimeStamp)->

Specie_Evaluations = get({evaluations,Specie_Id}),

put({evaluations,Specie_Id},0),

S = genotype:read({specie,Specie_Id}),

{Avg_Neurons,Neurons_Std} = calculate_SpecieAvgNodes({specie,S}),

{AvgFitness,Fitness_Std,MaxFitness,MinFitness} = calculate_SpecieFitness({specie,S}),

SpecieDiversity = calculate_SpecieDiversity({specie,S}),

STAT = #stat{

morphology = (S#specie.constraint)#constraint.morphology,

specie_id = Specie_Id,

avg_neurons=Avg_Neurons,

std_neurons=Neurons_Std,

avg_fitness=AvgFitness,

std_fitness=Fitness_Std,

max_fitness=MaxFitness,

min_fitness=MinFitness,

avg_diversity=SpecieDiversity,

evaluations = Specie_Evaluations,

time_stamp=TimeStamp

},

STATS = S#specie.stats,

U_STATS = [STAT|STATS],

mnesia:dirty_write(S#specie{stats=U_STATS}),

STAT.

calculate_SpecieAvgNodes({specie,S})->

Agent_Ids = S#specie.agent_ids,

calculate_AvgNodes(Agent_Ids,[]);

calculate_SpecieAvgNodes(Specie_Id)->

io:format("calculate_SpecieAvgNodes(Specie_Id):~p~n",[Specie_Id]),

S = genotype:read({specie,Specie_Id}),


calculate_SpecieAvgNodes({specie,S}).

calculate_AvgNodes([Agent_Id|Agent_Ids],NAcc)->

io:format("calculate_AvgNodes/2 Agent_Id:~p~n",[Agent_Id]),

A = genotype:read({agent,Agent_Id}),

Cx = genotype:read({cortex,A#agent.cx_id}),

Tot_Neurons = length(Cx#cortex.neuron_ids),

calculate_AvgNodes(Agent_Ids,[Tot_Neurons|NAcc]);

calculate_AvgNodes([],NAcc)->

{functions:avg(NAcc),functions:std(NAcc)}.

calculate_SpecieDiversity({specie,S})->

Agent_Ids = S#specie.agent_ids,

calculate_diversity(Agent_Ids);

calculate_SpecieDiversity(Specie_Id)->

S = genotype:dirty_read({specie,Specie_Id}),

calculate_SpecieDiversity({specie,S}).

calculate_diversity(Agent_Ids)->

calculate_diversity(Agent_Ids,[]).

calculate_diversity([Agent_Id|Agent_Ids],DiversityAcc)->

A = genotype:read({agent,Agent_Id}),

Fingerprint = A#agent.fingerprint,

U_DiversityAcc = (DiversityAcc -- [Fingerprint]) ++ [Fingerprint],
calculate_diversity(Agent_Ids,U_DiversityAcc);

calculate_diversity([],DiversityAcc)->

length(DiversityAcc).
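The functions:avg/1 and functions:std/1 helpers called in calculate_AvgNodes/2 belong to the book's functions module. For reference, the following is a minimal sketch of what such list-statistics helpers might look like (computing the arithmetic mean and the population standard deviation):

avg(List)->
	lists:sum(List)/length(List).

std(List)->
	Avg = avg(List),
	Variance = lists:sum([math:pow(Val-Avg,2) || Val<-List])/length(List),
	math:sqrt(Variance).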

Finally, because we have changed how the eval_acc parameter in the state record is used, and because we wish for the population_monitor to dump the trace tuple to console when it has terminated, we must also update the terminate/2 function. The updated terminate(Reason,S) function is shown in the following listing.

Listing-12.5 The updated terminate/2 function.

terminate(Reason, S) ->

case S of

[] ->

io:format("******** Population_Monitor shut down with Reason:~p, with State: []~n",[Reason]);

_ ->

OpMode = S#state.op_mode,

Population_Id = S#state.population_id,

P = genotype:dirty_read({population,Population_Id}),

T = P#population.trace,


TotEvaluations=T#trace.tot_evaluations,

OpTag = S#state.op_tag,

io:format("******** TRACE START ********~n"),
io:format("~p~n",[T]),
io:format("******** ^^^^ TRACE END ^^^^ ********~n"),
io:format("******** Population_Monitor:~p shut down with Reason:~p OpTag:~p, while in OpMode:~p~n",[Population_Id,Reason,OpTag,OpMode]),
io:format("******** Tot Agents:~p Population Generation:~p Tot_Evals:~p~n",[S#state.tot_agents,S#state.pop_gen,S#state.tot_evaluations])

end.

With this function complete, the only remaining modification we need to make is to the exoself module. We have modified the types of messages the population_monitor process can accept, and thus we have to update the exoself process so that it can properly send such messages to the updated population_monitor.

12.3.4 Updating the exoself Module

In our current exoself implementation, when the agent exceeds the max_attempts number of improvement attempts, it sends to the population_monitor its thus-far-achieved fitness and the total number of evaluations, cycles, and time taken to achieve it, by sending to the population monitor the message: {S#state.agent_id,terminated, U_HighestFitness, U_EvalAcc, U_CycleAcc, U_TimeAcc}. The population_monitor now accepts the evaluation, cycle, and time accumulator values separately from the agent's termination signal sent to it when the agent terminates. We modify the termination message to use the format: {Agent_Id,terminated,U_HighestFitness}, and make the agent execute: gen_server:cast(S#state.pm_pid,{self(),evaluations,S#state.specie_id,1,Cycles,Time}) after every completed evaluation. With this small modification, the exoself can now send the properly formatted messages to the population_monitor. The partial source code of the exoself's main loop/1 function, with the modified parts highlighted in bold, is as follows:

loop(S)->

receive

{Cx_PId,evaluation_completed,Fitness,Cycles,Time}->

...

true -> %End training

A=genotype:dirty_read({agent,S#state.agent_id}),

genotype:write(A#agent{fitness=U_HighestFitness}),

backup_genotype(S#state.idsNpids,S#state.npids),


terminate_phenotype(S#state.cx_pid, S#state.spids, S#state.npids,

S#state.apids, S#state.scape_pids),

gen_server:cast(S#state.pm_pid,{S#state.agent_id,terminated,

U_HighestFitness});

false -> %Continue training

...

loop(U_S)

end

after 10000 ->

io:format("exoself:~p stuck.~n",[S#state.agent_id])

end.

12.4 Compiling & Testing

With the updates to the source code complete, we now test our neuroevolutionary system using both the generational and steady_state evolutionary loops, by applying it to the XOR problem. To do this, we must first recreate the mnesia database with the updated population record, then compile the source, then set the population_monitor parameters appropriately, and finally run the test.

We first execute polis:sync(), polis:reset(), and then polis:start() to recompile and load all the modules, create the mnesia database, and then start the polis process:

1> polis:sync().

...

up_to_date

2> polis:reset().

{atomic,ok}

3> polis:start().

Parameters:{[],[]}

******** Polis: ##MATHEMA## is now online.

{ok,<0.181.0>}

With this done, we go into the population_monitor module and set the INIT_CONSTRAINTS to:

-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology,

connection_architecture =CA, population_evo_alg_f= generational } || Morphology<- [xor_mimic],CA<-[feedforward]]).


And the terminating conditions to:

-define(GENERATION_LIMIT,100).

-define(EVALUATIONS_LIMIT,100000).

-define(FITNESS_GOAL,inf).

This will allow us to first test the new features with the population_monitor running the generational evolutionary loop, and the fitness goal set to inf, thus letting the neuroevolutionary system run for up to 100000 evaluations or 100 generations, giving itself plenty of time to compose a long trace. With these parameters set, we compile the population_monitor module, and execute population_monitor:test(). Because the population_monitor automatically prints the trace every 500 evaluations, if you run the test to completion, and then scroll upwards in the console, you will see the trace printout. In the following listing, I run the population_monitor:test() program, and for the sake of brevity only print out the first and last stat tuples in the trace's stats list:

Listing-12.6 Testing the trace construction using the generational evolutionary loop.

******** TRACE START ********

{trace,[[{stat,xor_mimic,7.545734705407886e-10,2.0,0.0,899806.2187523855,

299935.2472592368,999803.9690288296,0.4871219879281525,7,500,

{1325,252006,807398}}],

...

[{stat,xor_mimic,7.545734705407886e-10,1.2,0.4,165.48172889712018,

42.53153201193604,187.42531939466735,41.051077769669675,7,500,

{1325,251998,861970}}]],

28000,500}

******** ^^^^ TRACE END ^^^^ ********
******** Population_Monitor:test shut down with Reason:normal OpTag:continue, while in OpMode:gt
******** Tot Agents:10 Population Generation:100 Tot_Evals:28332

It works! The test ran to completion, and the trace was composed and printed to console. The trace produced by your run will of course differ, but the common feature will be that the trace represents the gradual progress from unfit agents to more fit ones. In the above printout I've highlighted the max fitness reached during the first 500 evaluations, and during the last 500 evaluations, after 28332 evaluations in total. We can now use this trace to create a graph of fitness vs. evaluations, or NN size vs. evaluations... Also, we could run the test multiple times, gathering the traces, and then averaging them. Doing so would allow us to better understand the average and general performance of our system, how rapidly the fitness improves, and other temporally progressing features of the evolutionary runs on the particular problem we applied the system to. A sketch of how a trace could be converted into plottable data is shown below.
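For example, the following sketch (not part of the system; trace_to_datafile/2 is a hypothetical helper which assumes a single-specie population and the default step_size of 500) converts a trace into a two-column file of evaluations vs. average fitness:

trace_to_datafile(Trace,FileName)->
	{ok,File} = file:open(FileName,[write]),
	%The stats list is newest-first, so it is reversed into chronological order.
	write_samples(lists:reverse(Trace#trace.stats),File,500),
	file:close(File).

write_samples([SpeciesStats|T],File,EvalIndex)->
	[Stat] = SpeciesStats, %Assumes a single specie per 500 evaluation slot.
	io:format(File,"~p ~p~n",[EvalIndex,Stat#stat.avg_fitness]),
	write_samples(T,File,EvalIndex+500);
write_samples([],_File,_EvalIndex)->
	ok.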


In the same manner we can again modify the INIT_CONSTRAINTS, changing the population_evo_alg_f from generational to steady_state. We then recompile the population_monitor module, and execute the population_monitor:test() function again, the results of which are shown in Listing-12.7.

Listing-12.7 Testing the trace construction using the steady_state evolutionary loop.

******** TRACE START ********

{trace,[[{stat,xor_mimic,7.545736660182336e-10,2.210526315789474,

0.4076824574955175,684245.1500749566,464778.56244874984,

999999.995222936,0.39673797258986543,9,500,

{1325,251701,23562}}],

...

[{stat,xor_mimic,7.545736660182336e-10,1.4210526315789473,

0.4937279747182558,60098.723519683044,222254.0763078003,

998548.771899295,0.20552720368183247,12,500,

{1325,251655,536129}}]],

100000,500}

******** ^^^^ TRACE END ^^^^ ********
******** Population_Monitor:test shut down with Reason:normal OpTag:continue, while in OpMode:gt
******** Tot Agents:10 Population Generation:0 Tot_Evals:100321

Again only the first and last stat lists are shown, where each stat list has only a single stat tuple, since there is only one specie within the population. Notice that unlike last time, where the neuroevolutionary system stopped after 28332 evaluations, here our system continued evolving agents for 100000 evaluations, after which it stopped creating new offspring, and then waited for the remaining agents to terminate (hence the reason Tot_Evals:100321 is slightly larger than 100000). This is because in the steady_state evolutionary loop there are no generations, hence the generation count stays at 0. And because we set the fitness goal to inf, the only remaining termination condition that could be satisfied was the evaluation limit, set to 100000.

We have now run the test using both the generational and steady_state evolutionary loops, and it worked because we had modified the cast clauses for both of them. We can now be assured that the evaluation counting and other statistic counting features of the population_monitor are independent of which evolutionary loop we choose to use.

12.5 Summary & Discussion

In this chapter we have extended our neuroevolutionary system, giving the population_monitor the ability to keep track of the population's various statistics, and the ability to generally monitor how the agents evolve and change over time. Some of those statistics are with regard to the average fitness of the species, other statistics deal with the size of the NN systems, and still others deal with the population's diversity. All of these are important to keep track of when one attempts to determine whether the system is functioning properly and is able to improve and evolve the population in the right direction.

Every time a neuroevolutionary system is applied to a problem, or used in a simulation, we need to be able to see how the population is evolving. The temporal factors, the diversity, and everything else about the population, need to be somehow gauged. In this chapter we created a simple extension to the population_monitor that allows it to calculate the various statistics every X number of evaluations. The population acquired a new parameter, the trace tuple. The trace not only counts the number of evaluations performed by the population as a whole, but also keeps a list called stats, which is a list of lists, where each list is composed of specie stat tuples. The stat tuple holds the statistical features of the particular specie for which it was calculated. In this manner we can keep track of specie turnover values, average neural network sizes, fitness, diversity, efficiency...

The trace constructing program and evaluation counting implementation we created in this chapter is decoupled enough from our general evolutionary system that we can extend it in the future without also having to modify the rest of our TWEANN platform. Though at this time the stat tuple keeps track of simply the size, fitness, and diversity of a specie, the record can be easily modified to keep track of other statistics, such as connectedness, level of recurrence, efficiency with regard to the use of neurons, evolvability... Using the stats list, we can graph this data easily, and thus determine how our system behaves, where it lacks, and what should be improved... But this only allows us to compose a trace of a single population, of a single evolutionary run. When benchmarking, an experiment is usually composed of multiple evolutionary runs and applications to a particular problem, and the resulting graphs and statistics are the averages of said evolutionary runs. In the next chapter we create another program that will assist in performing just that task.

Chapter 13 The Benchmarker

Abstract In this chapter we add the benchmarker process which can sequentially spawn population_monitors and apply them to some specified problem/simulation. We also extend the database to include the experiment record, which the benchmarker uses to deposit the traces of the population's evolutionary statistics, and to recover from crashes to continue with the specified experiment. The benchmarker can compose experiments by performing multiple evolutionary runs, and then produce statistical data and GNUplot ready files of the various evolution- ary dynamics and averages calculated within the experiment.

Though in the previous chapter we completed the development of the most important part of keeping track of the population's statistics and progress, we can still go a step further and add one more program, the benchmarker. When running a simulation or experiment, the progress of the population, the trace, represents a single evolutionary path of the population. When analyzing the functionality of our system, when we want to benchmark a newly added element, we might wish to run the simulation multiple times; we might want to create multiple traces for the same problem, and then average them, before starting to analyze the functionality of our TWEANN, or the results of applying it to some simulation or problem.

The benchmarker process we want to create here is in some sense similar to the one we implemented in Section-7.7. This program will offer us a concise and robust way in which to apply the population_monitor to some problem multiple times, and thus build a dataset by averaging the performance of our neuroevolutionary system over multiple applications to the problem, over multiple evolutionary runs. The benchmarker will be called with the following parameters:

1. The INIT_CONSTRAINTS parameter, which will specify the type of problem the benchmarker will create the populations for.

2. The parameter N, which will specify the number of times the benchmarker should apply the neuroevolutionary system to the problem.

3. The termination condition parameters (evaluations limit, generation limit, and fitness goal).

The benchmarker's operational scenario would be as follows: The benchmarker process first spawns the population_monitor, then waits for the population_monitor to reach its termination condition, send the benchmarker the accumulated trace record, and terminate. Afterwards, the benchmarker stores the trace in its trace accumulator, and spawns a new population_monitor which tries to solve the problem again. This continues for N number of times, at which point the benchmarker will have accumulated N traces. It can then average the trace results and form a single trace average (the various averages between all the traces composing the experiment). This trace average can then be written to file in a format which can be graphed and visualized, by perhaps a program like gnuplot [1].

In the following sections we will implement this benchmarker process. The ability to determine and graph the performance statistics of a neuroevolutionary system allows one to advance it, to see where it might have flaws, what new features should be added, and the effect of those new features on its performance. The benchmarker program also assists in conducting research, for the results and applications of the neuroevolutionary system must be presented at one point or another, and thus a benchmark of the neuroevolutionary system's general and average performance on some task must be composed. The experiment must be run multiple times, such that the accuracy and the standard deviation of the results can be calculated. And that is exactly what the benchmarker program will assist in doing.

13.1 The benchmarker Architecture

The purpose of the benchmarker process is simple: to spawn a population_monitor, wait for it to finish solving the problem or reach a termination condition and send its composed trace to the benchmarker process (if the benchmarker was the one that spawned the population_monitor), and then respawn another population_monitor, repeating the procedure N times. Once the benchmarker has done this N number of times, and has thus accumulated N traces, it is to analyze the traces, build the averages of those traces, write this data to a file, and optionally print it to console.

Because gnuplot is so prevalent in plotting data in the scientific community, we want the benchmarker to write the resulting benchmark data to file in a format that can be directly used by gnuplot. Some of the information that can be plotted is: Fitness vs. Evaluations, NN Size vs. Evaluations, and Specie Diversity vs. Evaluations.

Furthermore, assume that we are running our benchmark on a single machine. We planned on applying our neuroevolutionary system to some problem 100 times, each for 100000 evaluations. And on the 90th evolutionary run there is a power outage, and we lose all 90 evolutionary run traces when we only had 10 more to go before completing the full experiment composed of 100 evolutionary runs. To prevent such situations, we must of course save the trace results which belong to the same experiment after every evolutionary run. Thus, if there is a power outage, or we wish to stop the experiment at some point, we need to ensure that whichever evolutionary runs have already been completed will have their traces backed up, giving us a chance to continue with the experiment when we are ready again.

13.2 Adding New Records

To add such functionality, we will create a new mnesia table called experiment, which will allow for every experiment to have its own id or name, and a trace_acc list where it will accumulate the traces which belong to that particular experiment. It will be the benchmarker process that backs up the traces to their appropriate experiment entry, after every completed simulation or problem run.
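As a concrete illustration, the table creation itself is a one-line mnesia call. The following is only a sketch of what such a table declaration might look like (the actual polis module may use different table options; record_info/2 requires the experiment record to be in scope at compile time):

mnesia:create_table(experiment,[
	{disc_copies,[node()]},	%Keep the table on disk and in RAM, so traces survive a crash.
	{type,set},	%One entry per unique experiment id.
	{attributes,record_info(fields,experiment)}]).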

To accomplish all of this, the benchmarker process needs to be able to do the following tasks:

1. Know how many evolutionary runs to perform for the experiment.

2. Know the name of the experiment, so that it can store the traces to their appropriate locations in the mnesia table.

3. Be able to specify the initial state parameters with which to start the population_monitor process, and restart it after a crash.

This means that other than adding the experiment record to the records.hrl file and creating a mnesia table of the same name, we must also modify how the population_monitor is started. Currently, it uses the macros defined within the module. These macros define how large the initial population size should be, the termination conditions... This makes it difficult to start the population_monitor from another module, and to control the population_monitor's parameters from that module. Thus we will need to expand its state record to include the previously macro defined parameters, and add a new function with which to start the population_monitor, a function which can be executed with a list of parameters that are then entered into the state tuple with which the population_monitor is started.

In the following sections we create the new records and add the new table to the mnesia database. We then make a small modification to the population_monitor module, move the previously macro defined parameters into the state record, and add a new function with which the population_monitor can be started and have its state record initialized. Finally, we create the actual benchmarker module.


We need to modify the population_monitor's state record, and then add two new records to the records.hrl file. The population_monitor's new state record will include all the elements that were previously defined through the macros of that module. With regards to the two new records to be added to the records.hrl, one of them will be the new mnesia table, experiment, and the other record, pmp (population monitor parameters), will be used specifically by the benchmarker to call and start the population_monitor process with a certain set of parameters, thus setting the population_monitor's initial state tuple to the proper values.


The population_monitor originally specified its state and other parameters for its operation using the macros and records at the top of the module, as shown in Listing-13.1.

Listing-13.1 The macros and records originally used by the population_monitor process.

-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology, connection_architecture=CA, population_evo_alg_f=steady_state} || Morphology <- [xor_mimic], CA <- [feedforward]]).
-define(SURVIVAL_PERCENTAGE,0.5).
-define(SPECIE_SIZE_LIMIT,10).
-define(INIT_SPECIE_SIZE,10).
-define(INIT_POPULATION_ID,test).
-define(OP_MODE,gt).
-define(INIT_POLIS,mathema).
-define(GENERATION_LIMIT,100).
-define(EVALUATIONS_LIMIT,100000).
-define(GEN_UID,genotype:generate_UniqueId()).
-define(FITNESS_GOAL,1000).
-record(state,{op_mode, population_id, activeAgent_IdPs=[], agent_ids=[], tot_agents, agents_left, op_tag, agent_summaries=[], pop_gen=0, eval_acc=0, cycle_acc=0, time_acc=0, step_size, next_step, goal_status, evolutionary_algorithm, fitness_postprocessor, selection_algorithm, best_fitness}).

Because the population_monitor's macros are module specific, and we would like to be able to specify in which manner to start the population_monitor, what its fitness goal should be, its evaluation and generation limits, and what polis it should use... we need to move all the macro defined elements into the population_monitor's state record. This way the benchmarker process can call the population_monitor and specify all these previously macro defined parameters. We also add one extra parameter to the state record, the benchmarker_pid element, which can be set to the PId of the benchmarker process, and then used by the population_monitor to send its trace to the benchmarker process that spawned it. The population_monitor's new state record is shown in Listing-13.2, where the newly added elements are shown in boldface.

Listing-13.2 The updated state record of the population_monitor module.

-record(state,{
	op_mode = gt,
	population_id = test,
	activeAgent_IdPs = [],
	agent_ids = [],
	tot_agents,
	agents_left,
	op_tag,
	agent_summaries = [],
	pop_gen = 0,
	eval_acc = 0,
	cycle_acc = 0,
	time_acc = 0,
	tot_evaluations = 0,
	step_size,
	goal_status,
	evolutionary_algorithm,
	fitness_postprocessor,
	selection_algorithm,
	best_fitness,
	survival_percentage = 0.5,
	specie_size_limit = 10,
	init_specie_size = 10,
	polis_id = mathema,
	generation_limit = 100,
	evaluations_limit = 100000,
	fitness_goal = inf,
	benchmarker_pid
}).

When we start the population_monitor, we want to be able to define these elements. Their default values are shown in the state record, but every time we run an experiment, we want to be able to set these parameters to whatever we want. Thus, we add the pmp (population monitor parameters) record to the records.hrl, so that it can be set by the benchmarker, and read by the population_monitor. This new record is shown in Listing-13.3, and its elements are defined as follows:

1. op_mode: Allows the benchmarker to define the mode in which the population_monitor operates. Thus far we have only used gt, which we have not yet used to specify any particular mode of operation, but we will in a much later chapter. In the future we can define new modes, for example a throughput mode during which the agents are not tuned or evaluated, but simply tested for whether they are functional, whether they can gather signals through their sensors and output actions through their actuators. The throughput op_mode could also then be used to benchmark the speed of the cycle of the NN based agent, and thus used to test which topologies can process signals faster, and which designs, architectures, and implementations of neurons, sensors, actuators, and cortexes are more efficient. Or we could specify the op_mode as standard, which would make the population_monitor function in some standard default manner. With regards to gt, it stands for genetic tuning, but due to our not yet having specified other operational modes, or taken advantage of this parameter, it is effectively the standard mode of operation until we add a new one in Chapter-19.


2. population_id: Allows the benchmarker to set the population's id.

3. survival_percentage: Allows the benchmarker to set which percentage of the population survives during the selection phase.

4. specie_size_limit: Allows the benchmarker to set the size limit of every specie within the population. This is an important parameter to define when starting an experiment.

5. init_specie_size: Allows the benchmarker to define the initial size of the specie. For example, an experiment can be started where the initial specie size is set to 1000, but the specie size limit is set to 100. In this way, there would be a great amount of initial diversity (given the constraint is defined in such a manner that NN based agents have access to a variety of plasticity functions, activation functions...), but after a while only 100 agents are allowed to exist at any one time. Or things could be done the opposite way: the initial specie size can be small, and the specie size limit large, allowing the specie to rapidly expand in numbers and diversity from some small initial bottleneck in the population.

6. polis_id: Allows the benchmarker to define in which polis the population_monitor will create the new agent population.

7. generation_limit: Every experiment needs a termination condition, and the benchmarker specifies the generation limit based termination condition for the population_monitor using this parameter.

8. evaluations_limit: Lets the benchmarker specify the evaluations limit based termination condition.

9. fitness_goal: Lets the benchmarker specify the fitness based termination condition.

10. benchmarker_pid: This parameter is set to undefined by default. If the population_monitor has been spawned for a particular experiment by the benchmarker, then the benchmarker sets this parameter to its own PId. Using this PId, the population_monitor can, when the neuroevolutionary run has reached its termination condition, send its trace to the benchmarker process.

Listing-13.3 The new pmp (population monitor parameters) record added to the records.hrl.

-record(pmp,{
	op_mode = gt,
	population_id = test,
	survival_percentage = 0.5,
	specie_size_limit = 10,
	init_specie_size = 10,
	polis_id = mathema,
	generation_limit = 100,
	evaluations_limit = 100000,
	fitness_goal = inf,
	benchmarker_pid
}).


The pmp record does not necessarily need to be used only by the benchmarker. The researcher can of course, rather than specifying these parameters in the population_monitor module and then recompiling it, simply start the population_monitor using the pmp record and the new prep_PopState/2 function we will build in the next subsection, and in this way define all the necessary experiment parameters, as shown in the shell sketch below.
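For example, a researcher might start a one-off run directly from the Erlang shell along these lines. This is a hypothetical session: record syntax in the shell requires first loading the record definitions with rr/1, and the parameter values shown are illustrative, not prescribed defaults:

1> polis:start().
2> rr("records.hrl").
3> population_monitor:prep_PopState(#pmp{population_id=my_run, evaluations_limit=5000},
	[#constraint{morphology=xor_mimic, connection_architecture=feedforward}]).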

The new experiment table we will add to the mnesia database will hold all the general, experiment specific data, particularly the traces. This is the record that the benchmarker populates as it runs the problem or experiment multiple times to generate multiple traces. The experiment record is shown in Listing-13.4, and its elements are defined as follows:

1. id: Is the unique id or name of the experiment being conducted. Because we wish for this new mnesia table to hold numerous experiments, we need to be able to give each experiment its own particular id or name.

2. backup_flag: This element is present for use by the benchmarker. When we start the benchmarker program with an experiment tuple whose backup_flag is set to false, it does not back that particular experiment up to mnesia. This might be useful when we wish to quickly run an experiment without writing the results to the database.

3. pm_parameters: This element will store the pmp record with which the population_monitor was started for this particular experiment. This will allow us to later on know what the experiment was for, and how the population_monitor was started (all the initial parameters) to produce the results and traces in the experiment entry. This way the experiment can be replicated later on.

4. init_constraints: Similarly to the pm_parameters, which define how the population_monitor runs, we also need to remember the parameters of the population itself, and the experiment to which the traces belong. This information is uniquely identified by the init_constraints list with which the population is created. Having the init_constraints will allow us to later on replicate the experiment if needed.

5. progress_flag: This element can be set to two values: in_progress and completed. The experiment is in progress until it has been run tot_runs number of times, and has thus accumulated tot_runs number of traces in its trace_acc list. If for example there is a power outage during the experiment run, when we later go through all the experiments in the experiment table, we will be able to determine which of the experiments were interrupted, based on their progress_flag. Any experiment whose progress_flag is set to in_progress, but which is not currently running, must have been interrupted, and still needs to be completed. Once it is completed, the progress_flag is set to: completed. (A sketch of how such a scan might be performed follows this list.)

6. trace_acc: This is a list where we store the trace tuples. If we apply our TWEANN to some particular problem 10 times, and thus perform 10 evolutionary runs, we keep pushing new trace tuples into this list until it contains all 10 traces, which we can later use at our leisure to build graphs and/or deduce performance statistics.

7. run_index: We plan on running the experiment some tot_runs number of times. The run_index keeps track of what the current run index is. If the experiment is interrupted, using this and other parameters we can restart and continue with the experiment where we left off.

8. tot_runs: This element defines the total number of times that we wish to perform the evolutionary run, the total number of traces from which to build this particular experiment.

9. notes: This can contain data of any form; string, lists, tuple... This element simply adds an extra open slot where some other data can be noted, data which does not belong to any other element in this record.

10. started: This element is the tuple: {date(),time()}, which specifies when the experiment was started.

11. completed: Complementary to the started element, this one stores the date() and time() of when the experiment was finally completed.

12. interruptions: This element is a list of tuples of the form: {date(),time()}. These tuples are generated every time the experiment is restarted after an interruption. For example, assume we are running an experiment, and on the 4th run, at which point the trace_acc already contains 3 trace tuples, the experiment is interrupted. Later on, when we wish to continue with the experiment, we look through the mnesia database, in the experiment table, for an experiment whose progress_flag is set to in_progress. When we find this experiment, we know it has been interrupted; we take its pm_parameters and init_constraints and continue with the experiment, but we also push to the interruptions list the tuple {date(),time()}, which notes that there was an interruption to the experiment, that it was not a single continuous run, and that though we do not know when the interruption occurred, we did continue with the experiment on the date: date(), and time: time().
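As promised in the progress_flag description, here is a sketch of how one might scan the experiment table for interrupted experiments. The find_interrupted/0 helper is hypothetical, not part of the book's modules:

find_interrupted()->
	F = fun()->
		%Fold over every entry in the experiment table, collecting the ids
		%of experiments that were interrupted mid-run.
		mnesia:foldl(fun(E,Acc)->
			case E#experiment.progress_flag of
				in_progress -> [E#experiment.id|Acc];
				_ -> Acc
			end
		end,[],experiment)
	end,
	{atomic,Ids} = mnesia:transaction(F),
	Ids.
%Each returned id could then be passed to benchmarker:continue/1.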

Listing-13.4 The experiment record.

-record(experiment,{
	id,
	backup_flag = true,
	pm_parameters,
	init_constraints,
	progress_flag = in_progress,
	trace_acc = [],
	run_index = 1,
	tot_runs = 10,
	notes,
	started = {date(),time()},
	completed,
	interruptions = []
}).

13.3 Updating the population_monitor Module

With all the new records defined, we can now move forward and make the small modification to the population_monitor module, creating its new prep_PopState/2 function, which will allow the benchmarker, and the researcher, to start the population_monitor process with its state parameters defined by the pmp record with which prep_PopState/2 is executed.


Instead of using the macros, we now store all the parameters in the population_monitor's state record. To start the population_monitor with a particular set of parameters, we now need to create a new function in which we define and set the state to the particular parameters we want the population_monitor to operate under. To set everything up for a population_monitor, we only need the parameters defined in the pmp record and the constraint record. Thus we create the prep_PopState/2 function, which is executed with the pmp record and a list of constraint records as its parameters. The new prep_PopState/2 function is shown in Listing-13.5.

Listing-13.5 The prep_PopState/2 function used to initialize the state parameters of the population_monitor.

prep_PopState(PMP,Specie_Constraints)->
	S = #state{
		op_mode = PMP#pmp.op_mode,
		population_id = PMP#pmp.population_id,
		survival_percentage = PMP#pmp.survival_percentage,
		specie_size_limit = PMP#pmp.specie_size_limit,
		init_specie_size = PMP#pmp.init_specie_size,
		polis_id = PMP#pmp.polis_id,
		generation_limit = PMP#pmp.generation_limit,
		evaluations_limit = PMP#pmp.evaluations_limit,
		fitness_goal = PMP#pmp.fitness_goal,
		benchmarker_pid = PMP#pmp.benchmarker_pid
	},
	init_population(S,Specie_Constraints).

As can be seen, we now execute the init_population/2 function with the state tuple, rather than with the original population_id and op_mode parameters. This means that all the other functions which originally used the macros of this module need to be slightly modified to simply use the parameters now specified within the population_monitor's state record. The modifications are very small and few in number, and are thus not shown. The updated population_monitor module can be found in the 13th chapter of the available supplementary material [2].

Finally, we modify the termination clause of the population_monitor, since now, at the moment of termination, the population_monitor needs to check whether it was a benchmarker that spawned it. The population_monitor accomplishes this by checking the benchmarker_pid parameter. If this parameter is set to undefined, then the population_monitor does not need to send its trace anywhere. If the benchmarker_pid is defined, then the process forwards its trace to the specified PId. The updated terminate/2 callback is shown in Listing-13.6.

Listing-13.6 The updated terminate/2 function, capable of sending the benchmarker the population_monitor's trace record, if the benchmarker was the one which spawned it.

terminate(Reason,S) ->
	case S of
		[] ->
			io:format("******** Population_Monitor shut down with Reason:~p, with State: []~n",[Reason]);
		_ ->
			OpMode = S#state.op_mode,
			OpTag = S#state.op_tag,
			TotEvaluations = S#state.tot_evaluations,
			Population_Id = S#state.population_id,
			case TotEvaluations < 500 of
				true -> %So that there is at least one stat in the stats list.
					gather_STATS(Population_Id,0);
				false ->
					ok
			end,
			P = genotype:dirty_read({population,Population_Id}),
			T = P#population.trace,
			U_T = T#trace{tot_evaluations=TotEvaluations},
			U_P = P#population{trace=U_T},
			genotype:write(U_P),
			io:format("******** TRACE START ********~n"),
			io:format("~p~n",[U_T]),
			io:format("******** ^^^^ TRACE END ^^^^ ********~n"),
			io:format("******** Population_Monitor:~p shut down with Reason:~p OpTag:~p, while in OpMode:~p~n",[Population_Id,Reason,OpTag,OpMode]),
			io:format("******** Tot Agents:~p Population Generation:~p Tot_Evals:~p~n",[S#state.tot_agents,S#state.pop_gen,S#state.tot_evaluations]),
			case S#state.benchmarker_pid of
				undefined ->
					ok;
				PId ->
					PId ! {S#state.population_id,completed,U_T}
			end
	end.

With this done, and everything set up for the benchmarker to be able to spawn the population_monitor and store the experiment data if it wishes to do so, we now move forward to the next subsection and create this new benchmarker module.

13.4 Implementing the benchmarker


The benchmarker process will have three main functionalities:

1. To run the population_monitor N number of times, waiting for the population_monitor's trace after every run.

2. Create the experiment entry in the mnesia database, and keep updating its trace_acc as it accumulates the traces from the spawned population_monitors. The benchmarker should only do this if the backup_flag is set to true in the experiment record with which the benchmarker was started.

3. When the benchmarker has finished performing N number of evolutionary runs, and has accumulated N number of traces, it must print all the traces to console, calculate the averages of the parameters between all the traces, and then finally write that data to file in a format which can be immediately graphed by gnuplot.

In addition, because the benchmarker might be interrupted as it accumulates the traces, we want to build a function which, when executed, can continue with the experiment. Because each experiment will have its own unique id, and because each experiment is stored to mnesia, this continue function should be executed with the experiment id parameter. When executed, it should read from the mnesia database all the needed information about the experiment, and then run the population_monitor the remaining number of times to complete the whole experiment.

In Listing-13.7 we implement the new benchmarker module. The comments after each function describe its functionality and purpose.

Listing-13.7 The implementation of the benchmarker module.

-module(benchmarker).
-compile(export_all).
-include("records.hrl").
%%% Benchmark Options %%%
-define(DIR,"benchmarks/").
-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology, connection_architecture=CA, population_evo_alg_f=generational} || Morphology <- [xor_mimic], CA <- [feedforward]]).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

start(Id)->
	PMP = #pmp{
		op_mode=gt,
		population_id=Id,
		survival_percentage=0.5,
		specie_size_limit=10,
		init_specie_size=10,
		polis_id = mathema,
		generation_limit = 100,
		evaluations_limit = 10000,
		fitness_goal = inf
	},
	E = #experiment{
		id = Id,
		backup_flag = true,
		pm_parameters = PMP,
		init_constraints = ?INIT_CONSTRAINTS,
		progress_flag = in_progress,
		run_index = 1,
		tot_runs = 10,
		started = {date(),time()},
		interruptions = []
	},
	genotype:write(E),
	register(benchmarker,spawn(benchmarker,prep,[E])).

%start/1 is called with the experiment id or name. It first assigns all the parameters to the pmp and experiment records, then writes the experiment record to the database (overwriting an existing one of the same name, if present), and finally spawns and registers the actual benchmarker process.

continue(Id)->
	case genotype:dirty_read({experiment,Id}) of
		undefined ->
			io:format("Can't continue experiment:~p, not present in the database.~n",[Id]);
		E ->
			case E#experiment.progress_flag of
				completed ->
					Trace_Acc = E#experiment.trace_acc,
					io:format("Experiment:~p already completed:~p~n",[Id,Trace_Acc]);
				in_progress ->
					Interruptions = E#experiment.interruptions,
					U_Interruptions = [now()|Interruptions],
					U_E = E#experiment{
						interruptions = U_Interruptions
					},
					genotype:write(U_E),
					register(benchmarker,spawn(benchmarker,prep,[U_E]))
			end
	end.

%The continue/1 function spawns a benchmarker to continue a previously stopped experiment. If the experiment with the name/id of the Id parameter is present in the database, and its progress_flag is set to in_progress (which means that the experiment has not yet completed, and should continue running and accumulating new traces into its trace_acc list), then this function updates the experiment's interruptions list, and then spawns the benchmarker process using the experiment tuple as its parameter. The experiment record holds all the information needed to start the population_monitor; it contains a copy of the population monitor parameters, and the initial constraints used.

prep(E)->
	PMP = E#experiment.pm_parameters,
	U_PMP = PMP#pmp{benchmarker_pid=self()},
	Constraints = E#experiment.init_constraints,
	Population_Id = PMP#pmp.population_id,
	population_monitor:prep_PopState(U_PMP,Constraints),
	loop(E#experiment{pm_parameters=U_PMP},Population_Id).

%prep/1 is run before the benchmarker process enters its main loop. This function extracts from the experiment all the information needed to run the population_monitor:prep_PopState/2 function, and to start the population_monitor process with the right set of population monitor parameters and specie constraints.

loop(E,P_Id)->
	receive
		{P_Id,completed,Trace}->
			U_TraceAcc = [Trace|E#experiment.trace_acc],
			U_RunIndex = E#experiment.run_index+1,
			case U_RunIndex >= E#experiment.tot_runs of
				true ->
					U_E = E#experiment{
						trace_acc = U_TraceAcc,
						run_index = U_RunIndex,
						completed = {date(),time()},
						progress_flag = completed
					},
					genotype:write(U_E),
					report(U_E#experiment.id,"report");
				false ->
					U_E = E#experiment{
						trace_acc = U_TraceAcc,
						run_index = U_RunIndex
					},
					genotype:write(U_E),
					PMP = E#experiment.pm_parameters,
					Constraints = E#experiment.init_constraints,
					population_monitor:prep_PopState(PMP,Constraints),
					loop(U_E,P_Id)
			end;
		terminate ->
			ok
	end.

%loop/2 is the main benchmarker loop, which can receive only two types of messages: a trace from the population_monitor process, and a terminate signal. The benchmarker is set to run the experiment, and thus spawn the population_monitor process, tot_runs number of times. After receiving the trace tuple from the population_monitor, it checks whether this was the last run or not. If it is not the last run, the benchmarker updates the experiment tuple, writes it to the database, and then spawns a new population_monitor by executing the population_monitor:prep_PopState/2 function. If it is the last run, then the function updates the experiment tuple, sets the progress_flag to completed, writes the updated experiment tuple to database, and runs the report function, which calculates the averages and other statistical data, and produces the data for graphing: a file which can be used by the gnuplot program.

report(Experiment_Id,FileName)->
	E = genotype:dirty_read({experiment,Experiment_Id}),
	Traces = E#experiment.trace_acc,
	{ok,File} = file:open(?DIR++FileName++"_Trace_Acc",write),
	lists:foreach(fun(X) -> io:format(File,"~p.~n",[X]) end, Traces),
	file:close(File),
	io:format("******** Traces_Acc written to file:~p~n",[?DIR++FileName++"_Trace_Acc"]),
	Graphs = prepare_Graphs(Traces),
	write_Graphs(Graphs,FileName++"_Graphs"),
	Eval_List = [T#trace.tot_evaluations || T <- Traces],
	io:format("Avg Evaluations:~p Std:~p~n",[functions:avg(Eval_List),functions:std(Eval_List)]).
%report/2 is called with the id of the experiment to report upon, and the FileName to which to write the gnuplot formatted graphable data calculated from the given experiment. The function first extracts the experiment record from the database, then opens a file in the ?DIR directory to deposit the traces there, then calls the prepare_Graphs/1 function with the trace list from the experiment, and finally, with the data having now been prepared by the prepare_Graphs/1 function, executes write_Graphs/2 to write the produced graphable data to the file FileName. (Note that the original format string here had a single ~p for two arguments; it has been corrected to print both the average and the standard deviation.)
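The report above leans on functions:avg/1 and functions:std/1 from the book's functions module. Assuming they compute the arithmetic mean and population standard deviation, a minimal sketch of such helpers would be:

avg(List)->
	lists:sum(List)/length(List).

std(List)->
	Avg = avg(List),
	%Population standard deviation: sqrt of the mean squared deviation.
	math:sqrt(lists:sum([(X-Avg)*(X-Avg) || X <- List])/length(List)).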


-record(graph,{morphology, avg_neurons=[], neurons_std=[], avg_fitness=[], fitness_std=[], max_fitness=[], min_fitness=[], avg_diversity=[], diversity_std=[], evaluations=[], evaluation_Index=[]}).
-record(avg,{avg_neurons=[], neurons_std=[], avg_fitness=[], fitness_std=[], max_fitness=[], min_fitness=[], avg_diversity=[], diversity_std=[], evaluations=[]}).
%These two records are used specifically by the prepare_Graphs function, to accumulate the data needed to calculate averages and other statistical data from the traces.

prepare_Graphs(Traces)->
	[T|_] = Traces,
	[Stats_List|_] = T#trace.stats,
	Morphologies = [S#stat.morphology || S <- Stats_List],
	Morphology_Graphs = [prep_Traces(Traces,Morphology,[]) || Morphology <- Morphologies],
	[io:format("Graph:~p~n",[Graph]) || Graph <- Morphology_Graphs],
	Morphology_Graphs.

%prepare_Graphs/1 first checks a single trace in the Traces list to build a list of the morphologies present in the population (the number and types of which stay stable in our current implementation throughout the evolutionary run), since the statistical data is built for each morphology as its own specie. The function then prepares the graphable lists of data for each of the morphologies in the trace. Finally, the function prints to screen the lists of values built from averaging the traces. The data within the lists, like in the traces, is temporally sorted, composed every 500 evaluations by default.

prep_Traces([T|Traces],Morphology,Acc)->
	Morphology_Trace = lists:flatten([[S || S <- Stats, S#stat.morphology == Morphology] || Stats <- T#trace.stats]),
	prep_Traces(Traces,Morphology,[Morphology_Trace|Acc]);
prep_Traces([],Morphology,Acc)->
	Graph = avg_MorphologicalTraces(lists:reverse(Acc),[],[],[]),
	Graph#graph{morphology=Morphology}.

%prep_Traces/3 goes through every trace, and extracts from the stats list of those traces only the stats associated with the morphology with which the function was called. Once the function has gone through every trace in the Traces list, and the morphologically specific trace data has been extracted, the function calls avg_MorphologicalTraces/4 to construct a tuple similar to the trace, but whose lists are composed of the average based values of all the morphology specific traces: the average, std, max, min... of all the evolutionary runs in the experiment.

avg_MorphologicalTraces([S_List|S_Lists],Acc1,Acc2,Acc3)->
	case S_List of
		[S|STail] ->
			avg_MorphologicalTraces(S_Lists,[STail|Acc1],[S|Acc2],Acc3);
		[] ->
			Graph = avg_statslists(Acc3,#graph{}),
			Graph
	end;
avg_MorphologicalTraces([],Acc1,Acc2,Acc3)->
	avg_MorphologicalTraces(lists:reverse(Acc1),[],[],[lists:reverse(Acc2)|Acc3]).

%avg_MorphologicalTraces/4 transposes the dropped-in S_lists from [Specie1_stats::[stat500, stat1000,...statN], Specie2_stats::[stat500,stat1000,...statN]...] to [[Spec1_Stat500, Spec2_Stat500,... SpecN_Stat500], [Spec1_Stat1000, Spec2_Stat1000,... SpecN_Stat1000]...]. The trace accumulator contains a list of traces. A trace has a stats list, which is a list of lists of stat tuples. The stats list is a temporal list, since each stat list is taken every 500 evaluations, so the stats list traces out the evolution of the population. Averages and other calculations need to be made for all experiments at the same temporal point, for example computing the average fitness between all experiments at the end of the first 500 evaluations, or at the end of the first 20000 evaluations... To do this, the function rebuilds the list from a list of separate temporal traces into a list of lists, where every sublist contains the state of the specie (the stat) at that particular evaluation slot (at the end of 500, or 1000,...). Once this new list is built, the function calls avg_statslists/2, which calculates the various statistics of the list of lists. A standalone sketch of this transposition follows.
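The reshaping described in the comment above is, in essence, a list transposition. The following standalone transpose/1 function (hypothetical, not part of the benchmarker module) performs the same reshaping on toy data:

transpose([[]|_])->
	[];
transpose(Lists)->
	%Take the head of every sublist to form the first row, then recurse on the tails.
	[[H || [H|_T] <- Lists] | transpose([T || [_H|T] <- Lists])].

%1> transpose([[a500,a1000],[b500,b1000]]).
%[[a500,b500],[a1000,b1000]]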

avg_statslists([S_List|S_Lists],Graph)->
	Avg = avg_stats(S_List,#avg{}),
	U_Graph = Graph#graph{
		avg_neurons = [Avg#avg.avg_neurons|Graph#graph.avg_neurons],
		neurons_std = [Avg#avg.neurons_std|Graph#graph.neurons_std],
		avg_fitness = [Avg#avg.avg_fitness|Graph#graph.avg_fitness],
		fitness_std = [Avg#avg.fitness_std|Graph#graph.fitness_std],
		max_fitness = [Avg#avg.max_fitness|Graph#graph.max_fitness],
		min_fitness = [Avg#avg.min_fitness|Graph#graph.min_fitness],
		evaluations = [Avg#avg.evaluations|Graph#graph.evaluations],
		avg_diversity = [Avg#avg.avg_diversity|Graph#graph.avg_diversity],
		diversity_std = [Avg#avg.diversity_std|Graph#graph.diversity_std]
	},
	avg_statslists(S_Lists,U_Graph);
avg_statslists([],Graph)->
	Graph#graph{
		avg_neurons = lists:reverse(Graph#graph.avg_neurons),
		neurons_std = lists:reverse(Graph#graph.neurons_std),
		avg_fitness = lists:reverse(Graph#graph.avg_fitness),
		fitness_std = lists:reverse(Graph#graph.fitness_std),
		max_fitness = lists:reverse(Graph#graph.max_fitness),
		min_fitness = lists:reverse(Graph#graph.min_fitness),
		evaluations = lists:reverse(Graph#graph.evaluations),
		avg_diversity = lists:reverse(Graph#graph.avg_diversity),
		diversity_std = lists:reverse(Graph#graph.diversity_std)
	}.

%avg_statslists/2 calculates the averages and other statistics for every list in the S_lists, where each sublist is a list of stat tuples, on which it executes the avg_stats/2 function, which returns back a tuple with all the various parameters calculated from the list of stat tuples of that particular evaluations time slot.

avg_stats([S|STail],Avg)->
	U_Avg = Avg#avg{
		avg_neurons = [S#stat.avg_neurons|Avg#avg.avg_neurons],
		avg_fitness = [S#stat.avg_fitness|Avg#avg.avg_fitness],
		max_fitness = [S#stat.max_fitness|Avg#avg.max_fitness],
		min_fitness = [S#stat.min_fitness|Avg#avg.min_fitness],
		evaluations = [S#stat.evaluations|Avg#avg.evaluations],
		avg_diversity = [S#stat.avg_diversity|Avg#avg.avg_diversity]
	},
	avg_stats(STail,U_Avg);
avg_stats([],Avg)->
	Avg#avg{
		avg_neurons = functions:avg(Avg#avg.avg_neurons),
		neurons_std = functions:std(Avg#avg.avg_neurons),
		avg_fitness = functions:avg(Avg#avg.avg_fitness),
		fitness_std = functions:std(Avg#avg.avg_fitness),
		max_fitness = lists:max(Avg#avg.max_fitness),
		min_fitness = lists:min(Avg#avg.min_fitness),
		evaluations = functions:avg(Avg#avg.evaluations),
		avg_diversity = functions:avg(Avg#avg.avg_diversity),
		diversity_std = functions:std(Avg#avg.avg_diversity)
	}.

%avg_stats/2 accepts a list of stat tuples as a parameter. For every tuple in the list (each of which belongs to a different evolutionary run) it puts the particular values of that tuple into their own lists. Once all the values have been put into their own lists, the function uses functions:avg/1 and functions:std/1 to calculate the averages and standard deviations as needed, to finally build the actual single tuple of said values (avg_neurons, neurons_std...). The case is slightly different for the max and min fitness values amongst all evolutionary runs, for which the function extracts the max amongst the maxes and the min amongst the mins, calculating the highest max and the lowest min achieved amongst all evolutionary runs. This can be further augmented to instead calculate the avg of the max and min lists, by changing the lists:max/1 and lists:min/1 calls to functions:avg/1, as sketched below.
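For instance, to report the average of the per-run maxima and minima rather than the single global extremes, only the two corresponding fields in the second avg_stats/2 clause would change (a hypothetical variant; everything else stays as above):

		max_fitness = functions:avg(Avg#avg.max_fitness),
		min_fitness = functions:avg(Avg#avg.min_fitness),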

write_Graphs([G|Graphs],Graph_Postfix)->
	Morphology = G#graph.morphology,
	U_G = G#graph{evaluation_Index=[500*Index || Index <- lists:seq(1,length(G#graph.avg_fitness))]},
	{ok,File} = file:open(?DIR++"graph_"++atom_to_list(Morphology)++"_"++Graph_Postfix,write),
	io:format(File,"#Avg Fitness Vs Evaluations, Morphology:~p~n",[Morphology]),
	lists:foreach(fun({X,Y,Std}) -> io:format(File,"~p ~p ~p~n",[X,Y,Std]) end,
		lists:zip3(U_G#graph.evaluation_Index,U_G#graph.avg_fitness,U_G#graph.fitness_std)),
	io:format(File,"~n~n#Avg Neurons Vs Evaluations, Morphology:~p~n",[Morphology]),
	lists:foreach(fun({X,Y,Std}) -> io:format(File,"~p ~p ~p~n",[X,Y,Std]) end,
		lists:zip3(U_G#graph.evaluation_Index,U_G#graph.avg_neurons,U_G#graph.neurons_std)),
	io:format(File,"~n~n#Avg Diversity Vs Evaluations, Morphology:~p~n",[Morphology]),
	lists:foreach(fun({X,Y,Std}) -> io:format(File,"~p ~p ~p~n",[X,Y,Std]) end,
		lists:zip3(U_G#graph.evaluation_Index,U_G#graph.avg_diversity,U_G#graph.diversity_std)),
	io:format(File,"~n~n#Avg. Max Fitness Vs Evaluations, Morphology:~p~n",[Morphology]),
	lists:foreach(fun({X,Y}) -> io:format(File,"~p ~p~n",[X,Y]) end,
		lists:zip(U_G#graph.evaluation_Index,U_G#graph.max_fitness)),
	io:format(File,"~n~n#Avg. Min Fitness Vs Evaluations, Morphology:~p~n",[Morphology]),
	lists:foreach(fun({X,Y}) -> io:format(File,"~p ~p~n",[X,Y]) end,
		lists:zip(U_G#graph.evaluation_Index,U_G#graph.min_fitness)),
	io:format(File,"~n~n#Specie-Population Turnover Vs Evaluations, Morphology:~p~n",[Morphology]),
	lists:foreach(fun({X,Y}) -> io:format(File,"~p ~p~n",[X,Y]) end,
		lists:zip(U_G#graph.evaluation_Index,U_G#graph.evaluations)),
	file:close(File),
	write_Graphs(Graphs,Graph_Postfix);
write_Graphs([],_Graph_Postfix)->
	ok.

%write_Graphs/2 accepts a list of graph tuples, each of which was created for a particular specie/morphology within the experiment. For every graph, the function writes to file the various statistical results in a form readable by the gnuplot software, with the final result being a file which can be immediately used by gnuplot to produce graphs of the various properties of the experiment.

With the benchmarker now implemented, we test it in the next subsection to ensure that all of its features are functional.

13.5 Compiling and Testing

Because we have created a new record, we now need to either add its table to the mnesia database independently, or simply reset the whole database by executing the polis:reset() function. We now also need to test our new benchmarker system, and see whether it functions properly: whether it does indeed save the data to the database, is able to continue the experiment after an interruption, and is able to produce a file which can be used by gnuplot. Also, due to the following line in the benchmarker module: -define(DIR,"benchmarks/"), our benchmarker expects this folder to exist. Thus this folder must first be created before we perform the following tests, as shown below.
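The directory can be created from the OS shell with mkdir benchmarks, or directly from the Erlang shell (assuming the shell's working directory is the one from which the system is run):

1> file:make_dir("benchmarks").
ok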


To test all these new features we will first recompile the code, and then reset the database. Afterwards, we will test our system in the following manner and order:

1. Set the benchmarker's pmp record to its current default, running the XOR mimicking experiment 10 times, to completion, using the generational evolutionary loop.

2. Examine the resulting console printout, to ensure basic structural validity, and that no crashes occurred.

3. Examine the two resulting files, the file that should have a list of traces, and the file which has data formatted in a gnuplot graphable format.

4. Plot the data in the graph based file, performing a basic sanity check on the resulting graph.

5. Again run the benchmarker, only this time, in the middle of the experiment execute Ctrl-C to stop the interpreter midway, and then execute a to abort. This simulates the crashing of the machine in the middle of the experiment. We then re-enter the interpreter, and start up the polis to check whether the half finished experiment is present in the database. Once its presence is confirmed, we test benchmarker:continue/1 by executing: benchmarker:continue(test).

6. Finally, we examine the resulting console printout and the final experiment entry in the database, to ensure that the progress_flag is now set to: completed.

Because our implemented evolutionary loops (steady_state and generational) are independent of the evaluations accumulation, and thus of the termination and the triggering of the benchmarker, we can simply perform these tests with the generational evolutionary loop, and need not redo them with the steady_state evolutionary loop.

The default pmp and experiment records, and the ?INIT_CONSTRAINTS macro, are all set as follows:

-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology, connection_architecture=CA, population_evo_alg_f=generational} || Morphology <- [xor_mimic], CA <- [feedforward]]).

#pmp{op_mode=gt, population_id=test, survival_percentage=0.5, specie_size_limit=10, init_specie_size=10, polis_id=mathema, generation_limit=100, evaluations_limit=10000, fitness_goal=inf}

#experiment{id=Id, backup_flag=true, pm_parameters=PMP, init_constraints=?INIT_CONSTRAINTS, progress_flag=in_progress, run_index=1, tot_runs=10, started={date(),time()}, interruptions=[]}

Having set everything to the intended values, we now (assuming that the new source has been compiled, and the new mnesia database has been created with all the appropriate tables by executing polis:reset()) run the benchmarker:start(test) function, as shown in Listing-13.8.


Listing-13.8 Running the benchmarker:start(test) function to test the benchmarker functionality.

2> benchmarker:start(test).
...
[{stat,xor_mimic,7.544823116774118e-10,1.0,0.0,278.5367828868784,
235.4058314015377,979.1905253086005,112.76113310465351,4,500,
{1325,412119,825873}}]],
10000,500}
******** ^^^^ TRACE END ^^^^ ********
******** Population_Monitor:test shut down with Reason:normal OpTag:continue, while in OpMode:gt
******** Tot Agents:10 Population Generation:36 Tot_Evals:10076
******** Traces_Acc written to file:"benchmarks/report_Trace_Acc"
Graph:{graph,xor_mimic,
[1.1345679012345677,1.3708193041526373,1.4792929292929293,...1.9777777777777774],
[0.11516606301253175,0.3429379936199053,0.472713243338398,...0.28588178511708023],
[6376.044863498539,171964.06677104777,405553.7010466698,...948483.9530134387],
[13996.969949682387,305943.44537378295,421839.1376054512,...46957.98926294873],
[7595.914268861698,242099.32776384687,566599.7452288255,...999402.6491394333],
[1736.703111779903,1157.4193567602842,227914.43647811364,...497519.90979294974],
[5.111111111111111,6.444444444444445,...7.0],
[0.7370277311900889,1.257078722109418,...2.1081851067789197],
[500.0,500.0,500.0,500.0,500.0,500.0,444.44444444444446,...500.0],
[]}

It works! The console printout looks proper: a graph record, where each list is the average between all the experiments, with the averages calculated within the same evaluation frames. When we look into the benchmarks folder, we see the presence of two files: the graph_xor_mimic_report_Graphs file, and the report_Trace_Acc file. The report_Trace_Acc file contains a list of traces, as expected, and as shown in Listing-13.9.

Listing-13.9 The shortened contents of the report_Trace_Acc file.

{trace,[[{stat,xor_mimic,7.544235757558436e-10,2.0,0.0,999793.8069900939,
20.85034690621442,999805.1547609345,999739.967822178,9,500,
{1325,515312,752712}}],...
10000,500}.
{trace,[[{stat,xor_mimic,7.544235772700672e-10,2.0,0.0,999796.4301657086,
3.35162014431123,999799.6097959183,999792.3483220651,8,500,
{1325,515310,43590}}],...
10000,500}.

So far so good; the report_Trace_Acc file contains all 10 traces. Another file, with the name graph_xor_mimic_report_Graphs, is also present in the benchmarks folder. This file contains rows of values in the format we specifically created so that we can then use gnuplot to plot the resulting data. A sample of the formatted data within the file is shown in Listing-13.10.

Listing-13.10 The format of the graph_xor_mimic_report_Graphs file.

#Avg Fitness Vs Evaluations, Morphology:xor_mimic
500 6376.044863498539 13996.969949682387
1000 171964.06677104777 305943.44537378295

#Avg Neurons Vs Evaluations, Morphology:xor_mimic
500 1.1345679012345677 0.11516606301253175
1000 1.3708193041526373 0.3429379936199053

Again, after analyzing the graph, all the data seems to be in proper order. If we wish, we can use this file to create a plot using the gnuplot program. An example of such a plot is shown in Fig-13.1. Fig-13.1a and Fig-13.1b show the plots of Fitness (Avg, Max, and Min) vs. Evaluations, and Population Diversity vs. Evaluations, respectively. In Fig-13.1a we see that the average and max fitness quickly increase, and within the first 1000 evaluations they have already reached a very good score. The min fitness within the graph is shown to always go up and down, as is expected, since every offspring might have a mutation which makes it ineffective. But even in that plot, we see that the minimum fitness also reaches high values, primarily because the mutations that break the system in some way are mitigated by the tuning of the synaptic weights. In Fig-13.1b we see the diversity plotted against evaluations, with vertical error bars.


Fig. 13.1 The graphs produced with the data created by the benchmarker process, and plotted by the gnuplot program. Graph 'a' shows Fitness (Avg, Max, and Min) vs. Evaluations, and graph 'b' shows Diversity vs. Evaluations.

In the above figure we see that diversity never goes below 5 in a population of 10. A diversity of 4 is only present during the seed population, primarily because there are only so many ways to create the minimalistic 1 neuron NN topology for this problem (through the use of different activation functions). The diversity is in fact increasing over time, not decreasing. The diversity reaches a stable value of 6-7, which means that 60%-70% of the population is different from one another, while the other 3-4 agents have topologies similar to those belonging to the 6-7 diverse topologies.

High population diversity is one of the important features of a memetic algorithm based TWEANN. In the system we designed, it is simply not possible for diversity to shrink, because no matter which NN systems are fit or unfit, their offspring will have to be topologically different from them, since they pass through a topological mutation phase when created. As the size of the NN increases, so does the possible number of mutation operators applied to the clone during offspring creation, and thus the number of possible topological permutations, further increasing the number of mutants in the population, which results in an even higher diversity. As we increase the population size, the result is again greater diversity, because now more agents can create offspring, and every one of those agents will produce a topological mutant, which will have a chance to be different from every other agent in the population and not just its parent.

Thus, a memetic algorithm based topology and weight evolving artificial neural network has a naturally emerging high diversity within its population, unlike the standard TWEANNs, which usually converge very rapidly and thus have a lower chance of solving the more complex problems. At the same time, the memetic TWEANN is also able to very rapidly solve the problems it is applied to, and in my experience almost always faster than the standard TWEANN, no matter the problem or simulation it is being used for. We will have a chance to test this bold claim when we benchmark our system against other TWEANNs in the following chapters.


With this done, we can now test the benchmarker's ability to continue a crashed or stopped experiment. You will most likely get a different result when testing on your machine, depending on when you stop the interpreter. On my machine, after having started the benchmarker, then almost immediately stopping it by executing Ctrl-C followed by a, and then re-entering the interpreter, my results were as follows when performing steps 5 and 6:

Listing-13.11 Crashing the benchmarker, and then attempting to continue by executing the benchmarker:continue(Id) function.

2> polis:start().
Parameters:{[],[]}
******** Polis: ##MATHEMA## is now online.
{ok,<0.35.0>}
2> benchmarker:start(test).
Ctrl-C
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
a

******** Polis: ##MATHEMA## is now online.
{ok,<0.34.0>}
2> mnesia:dirty_read({experiment,test}).
[{experiment,test,true,
{pmp,gt,test,0.5,10,10,mathema,100,10000,inf,<0.143.0>},
[{constraint,xor_mimic,feedforward,[tanh,cos,gaussian,absolute],[none],
[dot_product],[all],...],
in_progress,
[{trace,[[{stat,xor_mimic,7.544226409998199e-10,
2.0833333333333335,0.2763853991962833,833118.8760231837,
372581.937787711,999804.3485638215,0.34056136711788676,8,
{1325,516955,41224}}],
...
10000,500}],
2,10,undefined,
{{2012,1,2},{7,9,12}},
undefined,[]}]
3> benchmarker:continue(test).
Graph:{graph,xor_mimic,...}
4> mnesia:dirty_read({experiment,test}).
...
completed,
...(TRACES)
10,10,undefined,
start: {{2012,1,2},{7,9,12}},
end: {{2012,1,2},{7,14,51}},
[{1325,517268,871875}]}]

It works! The benchmarker was first run and then abruptly stopped. After restarting the polis and checking the mnesia database, the experiment with the id test was present. Printing it to console showed that it contained the pmp record (the part starting with: "{pmp"), the constraints (starting with: "[{constraint"), and a list of traces (starting with: "[{trace"), 2 of which were present, out of the 10 the full experiment must contain. Finally, we also see the in_progress tag, which confirms that this experiment was stopped abruptly and is not yet finished. The function benchmarker:continue(test) was then executed, and the benchmarker ran to completion, printing the Graph tuple to console at the end. Finally, when rechecking the experiment entry in the database by executing mnesia:dirty_read({experiment,test}), we see that it contains 10 out of 10 evolutionary runs (traces), that the parameter completed is present, and that the start: {{2012,1,2},{7,9,12}} and end: {{2012,1,2},{7,14,51}} times (which I marked with the "start:" and "end:" tags) are also present. The benchmarker works as expected, and we have completed testing it.

13.6 Summary

Every time an addition or extension is made to the neuroevolutionary system, it is important to see how it affects the system as a whole. Is the neuroevolutionary system able to more effectively evolve agents? Is there high or low diversity? Does the neuroevolutionary approach taken converge too quickly, and is it thus unable to inject enough diversity to overcome fitness walls present on the fitness landscape? Using a benchmarker helps us answer these questions.

We also created a new module called benchmarker, and a new table called experiment, within the database. The experiment table holds multiple complete experiment entries, each of which is composed of multiple traces, which are evolutionary runs applied to some problem. This allows for the experiment entry to be used to calculate the average performance of multiple runs of the same simulation/problem, thus giving us a general idea of how the system performs. We have created the benchmarker in such a way that it can run an experiment and save the traces to database after every successful run, such that in the case of a crash it can recover and continue with the experiment.

We are almost at the point where we can start adding new, much more advanced features. Features like plasticity, indirect encoding, crystallization... And though we can now perform benchmarks after adding such advanced features, we do not at this point have problems and simulations complex enough to test the new features on. Thus we first need to create this new set of more complex benchmarks and problems.

We need to create two types of new benchmarks. One is a standard neurocontroller benchmark, for which recurrent and non-recurrent solutions need to be evolved. This standard benchmark is called the pole balancing problem [3,4]. The other standard benchmark requires the NN based agent to learn as it interacts with the environment. We need such a benchmark to be able to tell whether the addition of neural plasticity to our evolved NN based systems improves them, and whether the added plasticity features work at all. The standard benchmark in this particular area is called the T-Maze navigation problem [5,6]. In the next chapter we will create both of these new problems, representing them as private scapes with which the evolving NN based agents can interact.

13.7 References


[1] gnuplot: http://www.gnuplot.info/

[2] https://github.com/CorticalComputer/NeuroevolutionThroughErlang

[3] Gomez F, Miikkulainen R (1998) 2-D Pole Balancing with Recurrent Evolutionary Networks. In: Proceedings of the International Conference on Artificial Neural Networks (Elsevier), pp. 2-7.

[4] Durr P, Mattiussi C, Floreano D (2006) Neuroevolution With Analog Genetic Encoding. Parallel Problem Solving from Nature, PPSN IX, 671-680.

[5] Soltoggio A, Bullinaria JA, Mattiussi C, Durr P, Floreano D (2008) Evolutionary Advantages of Neuromodulated Plasticity in Dynamic, Reward-based Scenarios. Artificial Life 2, 569-576.

[6] Blynel J, Floreano D (2003) Exploring the T-Maze: Evolving Learning-Like Robot Behaviors Using CTRNNs. Applications of Evolutionary Computing 2611, 173-176.

Chapter 14 Creating the Two Slightly More Complex Benchmarks

Abstract To test the performance of a neuroevolutionary system after adding a new feature, or in general when trying to assess its abilities, it is important to have some standardized benchmarking problems. In this chapter we create two such benchmarking problems: the pole balancing benchmarks (single and double pole, with and without damping), and the T-Maze navigation benchmark, which is one of the problems used to assess the performance of recurrent and plasticity enabled neural network based systems.

Though we have created an extendible and already rather advanced TWEANN platform, how can we prove it to be so when we only have the basic XOR benchmark to test it on? As we continue to improve and advance our system, we will need to test it on more advanced benchmarks. In this chapter we develop and add two such benchmarking problems: the pole balancing benchmark, and the T-Maze navigation benchmark. Both of these benchmarks are standard within the computational intelligence field, and our neuroevolutionary system's ability to solve them is the minimum requirement for it to be considered functional.

To allow our TWEANN to use these benchmarks, we need to create a simulation/scape of the said problems, and create the agent morphology that contains the sensors/actuators that the NN based agents can use to interface with these new scapes. In the following sections we will first build the pole balancing simulation. Afterwards, we will develop the T-Maze simulation, a problem which can be much better solved by a NN system which can learn and adapt as it interacts with the environment, by a NN which has plasticity (a feature we will add to our neuroevolutionary system in Chapter-15).

Once these two new types of simulations are created, we will briefly test them, and then move on to the next chapter, where we will begin advancing and expanding our neuroevolutionary system.

14.1 Pole Balancing Simulation

The pole balancing benchmark consists of the NN based agent having to push a cart on a track, such that the pole standing on the cart is balanced and does not tip over and fall. Defined more specifically, the pole balancing problem is posed as follows: Given a two dimensional simulation of a cart on a 4.8 meter track, with a pole of length L on top of the cart, attached to the cart by a hinge and thus free to swing, the NN based controller must apply a force to the cart, pushing it back and forth on the track, such that the pole stays balanced on the cart and within 36 degrees of the cart's vertical. For sensory inputs, the NN based agent is provided with the cart's position and velocity, and the pole's angular position (from the vertical) and angular velocity. The output of the NN based agent is the force value F in newtons (N), saturated at 10N of magnitude. Positive F pushes the cart to the left, and negative pushes it to the right. Given these conditions, the problem is to balance the pole on the cart for 30 simulated minutes, or as long as possible, where the fitness is the amount of time the NN can keep the pole balanced by pushing the cart back and forth.

The temporal granularity of the simulation is 0.01 seconds, which means that every 0.01 seconds we perform all the physics based calculations to determine the position of the cart and the pole. The agent requests sensory signals and acts every 0.02 seconds. The simulation termination conditions are as follows: the cart must stay on the 4.8 meter track or the simulation ends, and the simulation also ends if the pole falls outside the 36 degrees of the vertical.

There are multiple versions of this problem, each one differing in its difficulty:

1. The simple single pole balancing problem, as shown in Fig-14.1a. In this simulation the NN based agent pushes the cart to balance the single 1 meter pole on it. This problem is further broken down into two different versions:

The NN receives as a sensory signal the cart's position on the track (CPos), the cart's velocity (CVel), the pole's angular position (PAngle), and the pole's angular velocity (PVel). Sensory_Signal = [CPos, CVel, PAngle, PVel].

The NN receives as a sensory signal only the CPos and PAngle values. To figure out how to solve the problem, how to push the cart and in which direction, the NN will need to figure out how to calculate the CVel and PVel values on its own, which requires recurrent connections. Sensory_Signal = [CPos, PAngle].

It is possible to very rapidly move the cart back and forth, which keeps the pole balanced. To prevent this type of solution, the problem is sometimes further modified so that the fitness of the NN based agent depends not only on the amount of time it has balanced the pole, but also on how smoothly it has pushed the cart. One type of fitness function simply rewards the NN based on the length of time it has balanced the pole, while the other rewards the NN based on the length of time it has balanced the pole, and penalizes it for very high velocities and rapid velocity changes. The first is the standard fitness function, while the other is called the damping fitness function.
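To make the distinction concrete, the following is a small preview sketch of the two fitness functions in the form they will take in Listing-14.1 further below. The function name step_fitness/6 is ours, for illustration only; the parameters are the scape's state values at the moment the fitness of a single push is computed:

step_fitness(without_damping,_TimeStep,_CPos,_CVel,_PAngle1,_PVel1)->
   1; %The standard fitness function: one point per push during which the pole stays balanced.
step_fitness(with_damping,TimeStep,CPos,CVel,PAngle1,PVel1)->
   Fitness1 = TimeStep/1000, %A small reward for the time the pole has been balanced so far.
   Fitness2 = case TimeStep < 100 of
      true -> 0;
      false -> 0.75/(abs(CPos)+abs(CVel)+abs(PAngle1)+abs(PVel1)) %Penalizes high velocities.
   end,
   Fitness1*0.1 + Fitness2*0.9. %The damping fitness is dominated by the smoothness term.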

2. A more difficult version of the pole balancing problem is the double pole balancing version, as shown in Fig-14.1b. In this problem we try to balance two poles of differing lengths at the same time. The closer the lengths of the two poles are, the more difficult the problem becomes. Usually, the length of one pole is set to 0.1 meters, and the length of the second is set to 1 meter. As with the single pole balancing problem, there are two versions of this, and again for each version we can use either of the two types of fitness functions:


The sensory signal gathered by the NN is composed of the cart's position and velocity (CPos, CVel), the first pole's angle and velocity (P1_Angle, P1_Vel), and the second pole's angle and velocity (P2_Angle, P2_Vel). Sensory_Signal = [CPos, CVel, P1_Angle, P1_Vel, P2_Angle, P2_Vel].

The second, more complex version of the problem, just as with the single pole balancing problem, only provides the NN with partial state information: the cart's position, and the first and second pole's angular positions. Sensory_Signal = [CPos, P1_Angle, P2_Angle]. This requires the NN based agent to derive the velocities on its own, which can be done by evolving a recurrent NN topology.

As with the single pole balancing problem, the fitness can be based on simply the amount of time the poles have been balanced, or also on the manner in which the agent pushes the cart, using the damping fitness function.

Fig. 14.1 The architecture of single (A.) and double (B.) pole balancing simulations, represented as private scapes with which the agents can interface, to push the cart and balance the pole/s.

As with the XOR simulator, we will set the pole balancing simulation to be self contained in a private scape process, which will accept sense and push messages from the agent to whom it belongs. Since the simulation of the track/cart/pole system is independent of the types of sense signals the agent wishes to use, we will only need to implement a single version of such a private scape. We will implement the system using a realistic physical model of the system, as is specified and done in [1].
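For reference, the equations of motion that our simulation will implement (transcribed from the sm_DoublePole/3 function of Listing-14.1 further below, where $l_i$ is the half-length of pole $i$, $\mu_c$ and $\mu_p$ are the cart-track and pole-hinge friction coefficients, and the code uses $g = -9.81$) are:

$$\ddot{x} = \frac{F - \mu_c\,\mathrm{sgn}(\dot{x}) + \sum_{i=1}^{2}\tilde{F}_i}{M + \sum_{i=1}^{2}\tilde{m}_i}, \qquad \ddot{\theta}_i = -\frac{3}{4l_i}\left(\ddot{x}\cos\theta_i + g\sin\theta_i + \frac{\mu_p\dot{\theta}_i}{m_i l_i}\right)$$

$$\tilde{m}_i = m_i\left(1 - \tfrac{3}{4}\cos^2\theta_i\right), \qquad \tilde{F}_i = m_i l_i\dot{\theta}_i^2\sin\theta_i + \tfrac{3}{4}m_i\cos\theta_i\left(\frac{\mu_p\dot{\theta}_i}{m_i l_i} + g\sin\theta_i\right)$$

Note that the listing itself advances this system with simple 0.01 second Euler steps, two steps per push message.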

Because the two-pole balancing problem is simply an extension of the single pole balancing problem, and because the two poles are independent of each other, we can create a single double pole balancing simulator, which can then be used for either benchmark. It is the sense and force messages that determine what information is sent to the sensors of the NN based agent. Furthermore, depending on the parameters sent by the actuator of the agent, the scape will calculate the fitness and decide whether to use both poles or only a single pole with regards to the termination conditions.

Thus, the scape will always be simulating two poles. But if the agent is being applied to the single pole balancing problem (a fact specified by the actuator and sensor pair used by the agent), the scape which receives the messages from the sensor and actuator of that agent will simply not take into account the second pole. In this manner, if the second pole falls, or deviates more than 36 degrees from the vertical, it will not trigger the termination condition or affect the fitness in any way. The parameter sent by the actuator will notify the scape that the agent is only concerned with the single pole being balanced.

We will set up the functionality of each such pole balancing simulation, contained and wrapped in a private scape and represented as a single process, to use the following steps:

1. PB (pole balancing) private scape is spawned.

2. The PB scape initializes the physical simulation, with the first pole's initial angle from the vertical randomly selected to be between -3.6 and 3.6 degrees, and the second pole's angle set to 0 degrees. Furthermore, the first pole's length will be set to 1 meter, and 0.1 meters for the second one.

3. The PB process drops into its main loop, and awaits the sense and push messages.

4. DO:

5. If a {From_PId, sense, Parameters} message is received: The Parameters value specifies what type of sensory information should be returned to the caller. If Parameters is set to 2, then the scape will return the cart position and the pole_1 angular position. If the Parameters value is set to 3, then the scape will return the cart, pole_1, and pole_2 positions. If 4, then the cart position and velocity, plus the pole_1 angular position and velocity, will be returned. Finally, if Parameters is set to 6, then the scape will return the cart position and velocity, and the pole_1 and pole_2 angular positions and velocities.

6. If a {From_PId, push, Force, Parameters} message is received: The PB scape applies the force specified in the message to the cart, and calculates the results of the physical simulation. The response to the push is calculated for two 0.01s time steps, taking the simulation 0.02 seconds forward, and then returning the scape back to waiting for the sense/push messages again. Furthermore, the Parameters value will have the form: {Damping_Flag, PB_Type}, where the Damping_Flag parameter specifies whether the fitness function will be calculated with damping features to prevent the rapid shaking of the cart, and where the PB_Type parameter specifies whether the private scape should be used as a single pole or double pole balancing simulator. If it is used as a single pole balancing simulator, then the condition of the second pole will not affect the fitness value, and its reaching the termination condition (falling beyond 36 degrees from the vertical) will not end the simulation.

UNTIL: Termination condition is reached (goal number of time steps, or one of the boundary conditions is breached).

The termination condition is considered to be any one of the following:

The simulation has run for 30 simulated minutes, which is composed of 90000 time steps of 0.02 seconds each.

The pole has deviated 36 or more degrees from the cart's vertical.

The cart has left the track. The track itself is 4.8 meters long, and the cart starts at the center, thus 2.4 meters away from either side. If it goes beyond the -2.4 or 2.4 point on the axis of the track, the termination condition is reached.

Based on this architecture, we will in the following subsection create the private scape process, and its main loop which, after receiving the push message, calls the function performing the physical simulation of the track/cart/pole system. Afterwards, we will create the sensors/actuators and the new morphology specification entry in the morphology module. These will be the sensors and actuators used by the agents to interface with this type of private scape. Finally, we will compile and run a quick test of this new problem, to see how well our system performs.

14.1.1 Implementing the Pole Balancing Scape

For the pole balancing simulation, the process will need to keep track of the position of the cart on the track, its velocity, the angular position and velocity of both poles, the time step the simulation is currently in, the goal time steps, and finally the fitness accumulated by the interfacing agent. To keep track of all these values, we will use a state record. Listing-14.1 shows the implementation of pb_sim/2, the pole balancing simulation scape. We will add the source code of this listing to the scape module. The comments after every function in Listing-14.1 elaborate on the details of its implementation.

Listing-14.1 The complete implementation of the pole balancing simulation scape.

-record(pb_state,{cpos=0, cvel=0, p1_angle=3.6*(2*math:pi()/360), p1_vel=0, p2_angle=0, p2_vel=0, time_step=0, goal_steps=90000, fitness_acc=0}).

pb_sim(ExoSelf_PId)->
   random:seed(now()),
   pb_sim(ExoSelf_PId,#pb_state{}).
%pb_sim/1 is executed to initialize and start up the pole balancing simulation scape. Once executed, it creates the initial #pb_state{}, and drops into the main simulation loop.

pb_sim(ExoSelf_PId,S)->
   receive
      {From_PId,sense,[Parameter]}->
         SenseSignal=case Parameter of
            cpos -> [S#pb_state.cpos];
            cvel -> [S#pb_state.cvel];
            p1_angle -> [S#pb_state.p1_angle];
            p1_vel -> [S#pb_state.p1_vel];
            p2_angle -> [S#pb_state.p2_angle];
            p2_vel -> [S#pb_state.p2_vel];
            2 -> [S#pb_state.cpos,S#pb_state.p1_angle];
            3 -> [S#pb_state.cpos,S#pb_state.p1_angle,S#pb_state.p2_angle];
            4 -> [S#pb_state.cpos,S#pb_state.cvel,S#pb_state.p1_angle,S#pb_state.p1_vel];
            6 -> [S#pb_state.cpos,S#pb_state.cvel,S#pb_state.p1_angle,S#pb_state.p1_vel,S#pb_state.p2_angle,S#pb_state.p2_vel]
         end,
         From_PId ! {self(),percept,SenseSignal},
         pb_sim(ExoSelf_PId,S);
      {From_PId,push,[Damping_Flag,DPB_Flag],[F]}->
         AL = 2*math:pi()*(36/360),
         U_S=sm_DoublePole(F,S,2),
         TimeStep=U_S#pb_state.time_step,
         CPos=U_S#pb_state.cpos,
         CVel=U_S#pb_state.cvel,
         PAngle1=U_S#pb_state.p1_angle,
         PVel1=U_S#pb_state.p1_vel,
         case (abs(PAngle1) > AL) or (abs(U_S#pb_state.p2_angle)*DPB_Flag > AL) or (abs(CPos) > 2.4) or (TimeStep >= U_S#pb_state.goal_steps) of
            true ->
               From_PId ! {self(),0,1},
               pb_sim(ExoSelf_PId,#pb_state{});
            false ->
               Fitness = case Damping_Flag of
                  without_damping ->
                     1;
                  with_damping ->
                     Fitness1 = TimeStep/1000,
                     Fitness2 = case TimeStep < 100 of
                        true ->
                           0;
                        false ->
                           0.75/(abs(CPos)+abs(CVel)+abs(PAngle1)+abs(PVel1))
                     end,
                     Fitness1*0.1 + Fitness2*0.9
               end,
               From_PId ! {self(),Fitness,0},
               pb_sim(ExoSelf_PId,U_S#pb_state{fitness_acc=U_S#pb_state.fitness_acc+Fitness})
         end;
      {ExoSelf_PId,terminate} ->
         ok
   end.

%The pole balancing simulation scape can accept 3 types of messages: push, sense, and terminate. When a sense message is received, the scape checks the Parameter value, and based on whether it is 2, 3, 4, or 6, it returns a sensory list with the appropriate number of elements. 2 and 4 specify that the NN based agent wants a sensory signal associated with the single pole balancing problem, with partial or full system information, respectively. 3 and 6 imply that the NN wants the scape to send it sensory information associated with double pole balancing, with partial or full system information, respectively. When the scape receives the push message, based on the message it decides on what fitness function is used (with or without damping), the actual force to be applied to the cart, and whether the termination condition should be based on the single pole balancing problem (DPB_Flag=0) or the double pole balancing problem (DPB_Flag=1). When the angle of the second pole is multiplied by DPB_Flag set to 0, the value will always be 0, and thus it cannot trigger the termination condition of being over 36 degrees from the vertical. When it is multiplied by DPB_Flag=1, its actual angle is used in the calculation of whether the termination condition is triggered or not. Once the message is received, the scape calculates the new position of the poles and the cart after force F is applied. The state of the poles/cart/track system is updated by executing the sm_DoublePole/3 function, which performs the physical simulation calculations.

sm_DoublePole(_F,S,0)->
   S#pb_state{time_step=S#pb_state.time_step+1};
sm_DoublePole(F,S,SimStepIndex)->
   CPos=S#pb_state.cpos,
   CVel=S#pb_state.cvel,
   PAngle1=S#pb_state.p1_angle,
   PAngle2=S#pb_state.p2_angle,
   PVel1=S#pb_state.p1_vel,
   PVel2=S#pb_state.p2_vel,
   X = CPos, %EdgePositions = [-2.4,2.4],
   PHalfLength1 = 0.5, %Half-length of pole 1
   PHalfLength2 = 0.05, %Half-length of pole 2
   M = 1, %CartMass
   PMass1 = 0.1, %Pole1 mass
   PMass2 = 0.01, %Pole2 mass
   MUc = 0.0005, %Cart-Track Friction Coefficient
   MUp = 0.000002, %Pole-Hinge Friction Coefficient
   G = -9.81, %Gravity
   Delta = 0.01, %Timestep
   EM1 = PMass1*(1-(3/4)*math:pow(math:cos(PAngle1),2)),
   EM2 = PMass2*(1-(3/4)*math:pow(math:cos(PAngle2),2)),
   EF1 = PMass1*PHalfLength1*math:pow(PVel1,2)*math:sin(PAngle1) + (3/4)*PMass1*math:cos(PAngle1)*(((MUp*PVel1)/(PMass1*PHalfLength1))+G*math:sin(PAngle1)),
   EF2 = PMass2*PHalfLength2*math:pow(PVel2,2)*math:sin(PAngle2) + (3/4)*PMass2*math:cos(PAngle2)*(((MUp*PVel2)/(PMass2*PHalfLength2))+G*math:sin(PAngle2)),
   NextCAccel = (F - MUc*functions:sgn(CVel)+EF1+EF2)/(M+EM1+EM2),
   NextPAccel1 = -(3/(4*PHalfLength1))*((NextCAccel*math:cos(PAngle1)) +(G*math:sin(PAngle1))+((MUp*PVel1)/(PMass1*PHalfLength1))),
   NextPAccel2 = -(3/(4*PHalfLength2))*((NextCAccel*math:cos(PAngle2)) +(G*math:sin(PAngle2))+((MUp*PVel2)/(PMass2*PHalfLength2))),
   NextCVel = CVel+(Delta*NextCAccel),
   NextCPos = CPos+(Delta*CVel),
   NextPVel1 = PVel1+(Delta*NextPAccel1),
   NextPAngle1 = PAngle1+(Delta*NextPVel1),
   NextPVel2 = PVel2+(Delta*NextPAccel2),
   NextPAngle2 = PAngle2+(Delta*NextPVel2),
   U_S=S#pb_state{
      cpos=NextCPos,
      cvel=NextCVel,
      p1_angle=NextPAngle1,
      p1_vel=NextPVel1,
      p2_angle=NextPAngle2,
      p2_vel=NextPVel2
   },
   sm_DoublePole(0,U_S,SimStepIndex-1).

%sm_DoublePole/3 performs the calculations needed to keep track of the two poles and the cart; it simulates the physical properties of the track/cart/pole system. The granularity of the physical simulation is 0.01s, and so to get the state at the end of 0.02s, the calculation is performed twice at the 0.01s granularity. During the first execution of the physical simulation we have the force set to the appropriate force sent by the neurocontroller, but during the second, F=0. Thus the agent actually only applies the force F for 0.01 seconds. This can be changed to have the agent apply the force F for the entire 0.02 seconds.
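To see the scape's message protocol end to end, it can be exercised directly from the shell. The following is a minimal sketch, assuming pb_sim/1 is exported from the scape module and everything is compiled and loaded; the function name test_pb_scape/0 is ours, for illustration only:

test_pb_scape()->
   Scape = spawn(scape,pb_sim,[self()]), %Spawn the scape, registering ourselves as its exoself.
   Scape ! {self(),sense,[2]}, %Request the [CPos,PAngle] sensory signal.
   receive {Scape,percept,SenseSignal} ->
      io:format("Sense:~p~n",[SenseSignal])
   end,
   Scape ! {self(),push,[without_damping,0],[2.5]}, %Push with 2.5 newtons, single pole mode.
   receive {Scape,Fitness,HaltFlag} ->
      io:format("Fitness:~p HaltFlag:~p~n",[Fitness,HaltFlag])
   end,
   Scape ! {self(),terminate}.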

With the simulation completed, we now need a way for our agents to spawn and interface with it. This will be done through the agent's morphology, its sensors and actuators, which we will create next.


14.1.2 Implementing the Pole Balancing Morphology

As with the xor_mimic morphology function, which when called returns the available sensors or actuators for that particular morphology, we will in this subsection develop the pole_balancing/1 morphology function which does the same. Unlike the xor_mimic though, here we will also populate the parameters element of the sensor and actuator records.

For both sensors and actuators we will again specify the scape element to be of type private: scape = {private, pb_sim}. For the sensor, we will set the parameters to: [2]. This parameter can then be modified to 3, 4, or 6, depending on what test we wish to apply the population of agents to. After every such parameters value change, the morphology module has to be recompiled before use. We could simply create multiple morphologies, for example: pole_balancing2, pole_balancing3, pole_balancing4, and pole_balancing6, but that would not add an advantage over changing the parameters and recompiling, since applying our neuroevolutionary system to different problems would still require us to change the constraints in either the population_monitor or benchmarker modules, and then recompile them anyway.

Similarly, the actuator record's parameters element is set to: [without_damping,0]. The without_damping tag specifies that the fitness function used should be the simple one that does not take damping into account. The 0 element of the list specifies, based on our implementation of the pb_sim, that the second pole should not be taken into account when calculating the fitness and whether the termination condition is reached. This is achieved in: (abs(U_S#pb_state.p2_angle)*DPB_Flag > AL), where DPB_Flag is either 0 or 1. When set to 1, the second pole's condition/angle is taken into account, and when 0, it is not. This is so because 0 = 0*P2_Angle, and 0 is never greater than AL, which is set to 36 degrees. Listing-14.2 shows the implementation of this new addition to the morphology module.

Listing-14.2 The pole_balancing morphology; adding the new pb_GetInput sensor and pb_SendOutput actuator to the morphology module.

pole_balancing(sensors)->
   [
      #sensor{name=pb_GetInput,scape={private,pb_sim},vl=2,parameters=[2]}
   ];
pole_balancing(actuators)->
   [
      #actuator{name=pb_SendOutput,scape={private,pb_sim},vl=1,parameters=[without_damping,0]}
   ].

%Both the pole balancing sensor and the actuator interface with the pole balancing simulation. The type of benchmark the pole balancing simulation is used as (whether as a double pole or a single pole balancing benchmark) depends on the sensor and actuator parameters. The sensor's vl and parameters specify that the sensor will request from the private scape the cart's position and the pole's angular position. The actuator's parameters specify that the scape should use the without_damping type of fitness, and that since only a single pole is being used, the termination condition associated with the second pole is zeroed out, by being multiplied by 0. When instead of 0 we use 1, the private scape will use the angular position of the second pole as an element in calculating whether the termination condition has been reached or not.

Having specified the sensor and the actuator used by the pole_balancing morphology, we now need to implement them both. The pb_GetInput sensor will be similar to xor_GetInput, only it will use its Parameters value in its message to the private scape it is associated with, as shown in Listing-14.3. We add this new sensor function to the sensor module, placing it after the xor_GetInput/3 function.

Listing-14.3 The implementation of the pb_GetInput sensor.

pb_GetInput(VL,Parameters,Scape)->
   Scape ! {self(),sense,Parameters},
   receive
      {Scape,percept,SensoryVector}->
         case length(SensoryVector)==VL of
            true ->
               SensoryVector;
            false ->
               io:format("Error in sensor:pb_GetInput/3, VL:~p SensoryVector:~p~n", [VL,SensoryVector]),
               lists:duplicate(VL,0)
         end
   end.

Similarly, Listing-14.4 shows the implementation of the actuator pb_SendOutput/3 function, added to the actuator module. It too is similar to the xor_SendOutput/3 function, but unlike its neighbor, it sends its Parameters value as an element of the message that it forwards to the scape. Because we usually implement the morphologies and the scapes together, we can set up any type of interfacing, and thus be able to implement complex scapes and messaging schemes with ease.

Listing-14.4 The implementation of the pb_SendOutput actuator.

pb_SendOutput([Output],Parameters,Scape)->
   Scape ! {self(),push,Parameters,[10*functions:sat(Output,1,-1)]},
   receive
      {Scape,Fitness,HaltFlag}->
         {Fitness,HaltFlag}
   end.

Though simple to implement, this new problem allows us to test the ability of our neuroevolutionary system to evolve neurocontrollers on problems which require a greater level of complexity than the simple XOR mimicry problem. Benchmarking our system on this problem also allows us to compare its results to those of other neuroevolutionary systems. Having implemented this new simulation, we now move forward and run a quick test of it in the next subsection.

14.1.3 Benchmark Results

In the previous chapter we developed the benchmarking and reporting tools specifically to improve our ability to test new additions to the system. Thus all we must do now is decide which variation of the pole balancing test to apply our system to, and then execute the benchmarker:start/1 function with the appropriate constraint, pmp, and experiment parameters.

Our benchmarker, on top of generating graphable data, also calculates the simple average number of evaluations from all the evolutionary runs within the experiment, which is exactly the number we seek, because the benchmark here measures how quickly a solution can be evolved on average using our system. We will next run three experiments, each with its own morphological setup, which only entails executing the benchmarker:start/1 function three times, each time with a different sensor and actuator specification:

1. The single pole, partial information, standard fitness function (without damping) benchmark:

pole_balancing(sensors)->
   [#sensor{name=pb_GetInput,scape={private,pb_sim},vl=2,parameters=[2]}];
pole_balancing(actuators)->
   [#actuator{name=pb_SendOutput,scape={private,pb_sim},vl=1,parameters=[without_damping,0]}].

2. The double pole, partial information, standard fitness function (without damping) benchmark:

pole_balancing(sensors)->
   [#sensor{name=pb_GetInput,scape={private,pb_sim},vl=3,parameters=[3]}];
pole_balancing(actuators)->
   [#actuator{name=pb_SendOutput,scape={private,pb_sim},vl=1,parameters=[without_damping,1]}].

3. The double pole, partial information, with damping fitness function benchmark:


pole_balancing(sensors)->
   [#sensor{name=pb_GetInput,scape={private,pb_sim},vl=3,parameters=[3]}];
pole_balancing(actuators)->
   [#actuator{name=pb_SendOutput,scape={private,pb_sim},vl=1,parameters=[with_damping,1]}].

We must also set the pmp's fitness goal to 90000, since with the standard, without_damping fitness function, a 90000 fitness score represents the NN based agent's ability to balance a pole for 30 minutes. But what about the with_damping simulation? In that case a neurocontroller will have different fitness scores for the same number of time steps during which it has balanced the pole/s, since the fitness will also be based on how effectively it balanced the poles. In the same manner, different numbers of time steps of balancing the pole/s might map to the same fitness score... This situation arises because the more complicated problems will not have a one-to-one mapping between fitness scores reached and progress towards solving a given problem or achieving some goal. Different simulations and problems will have different types of fitness scores, and using a termination condition based on a fitness goal value set in the population_monitor will not work. On the other hand, each simulation/problem itself will have all the necessary information about the agent's performance to decide whether a goal has been reached or not.

Furthermore, sometimes we wish to see just how quickly on average the neuroevolutionary system can generate a result for a problem; at those times we only care about the minimum number of evaluations needed to reach the solution. In our system, no matter when the termination condition is reached, the evolutionary run is not complete until all the agents of the current generation, or all the currently active agents, have terminated. This means that the total number of evaluations keeps incrementing even after the goal has already been reached, simply because the currently-still-running agents are still being tuned.

To solve both problems, we can allow each scape to inform the agent that it has reached the particular goal of the problem/scape when it has done so. At this point the agent would forward that message to the population_monitor, which could then stop counting the evaluations by freezing the tot_evaluations value. In this one move we allow each scape to use the extra feature of a goal_reached notification to be able to, on its own terms, use any fitness function, and at the same time be able to stop and notify the agent that it has reached the particular fitness goal, or solved the problem, and thus stop the evaluations accumulator from incrementing. This will allow us to no longer need to calculate fitness goals for every problem by pre-calculating various values (fitness goals) and setting them in the population_monitor. This method will also allow us to deal with problems where the fitness score is not directly related to the completion of the problem or to the reaching of the goal, and thus cannot be used as the termination condition in the first place. Thus, before we run the benchmarks, let's make this small program modification.


Currently, when the agent has triggered the scape's stopping condition, the scape sends back to the agent the message: {Scape_PId,0,1}, where 0 means that it has received 0 fitness points for this last event, and 1 means that this particular scape has reached its termination condition. The actuator does nothing with this value but pass it to the cortex, thus if we retain the same message structure, we can piggyback new functionality on it. We will allow each scape to also have, on top of the standard termination conditions, the ability to check for its own goal reaching condition. When that goal condition is reached, instead of sending the actuator the original message, the scape will send it: {Scape_PId,goal_reached,1}. The actuator does not have to be changed; its job is simply to forward this message to the cortex.

In the cortex we modify its receive clause to check whether the Fitness score sent to it is actually the atom goal_reached. The new receive clause is implemented as follows:

{APId,sync,Fitness,EndFlag} ->
   case Fitness == goal_reached of
      true ->
         put(goal_reached,true),
         loop(Id,ExoSelf_PId,SPIds,{APIds,MAPIds},NPIds,CycleAcc,FitnessAcc, EFAcc+EndFlag,active);
      false ->
         loop(Id,ExoSelf_PId,SPIds,{APIds,MAPIds},NPIds,CycleAcc,FitnessAcc+Fitness, EFAcc+EndFlag,active)
   end;

We also modify the cortex's message to the exoself, sent when its evaluation termination condition has been triggered by the EndFlag which the actuator sends it in the message of the form: {APId, sync, Fitness, EndFlag}. The new message the cortex sends to the exoself is extended to include a note on whether goal_reached is set to true or not. The new message format will be: {self(), evaluation_completed, FitnessAcc, CycleAcc, TimeDif, get(goal_reached)}.

Correspondingly, the exoself's receive pattern is extended to receive the GoalReachedFlag, and to then forward it to the population_monitor, as shown by the boldfaced source code in the following listing:

Listing-14.5 The updated exoself's receive pattern.

loop(S)->
   receive
      {Cx_PId,evaluation_completed,Fitness,Cycles,Time,GoalReachedFlag}->
         case (U_Attempt >= S#state.max_attempts) or (GoalReachedFlag==true) of
            true -> %End training
               A=genotype:dirty_read({agent,S#state.agent_id}),
               genotype:write(A#agent{fitness=U_HighestFitness}),
               backup_genotype(S#state.idsNpids,S#state.npids),
               terminate_phenotype(S#state.cx_pid,S#state.spids,S#state.npids, S#state.apids,S#state.scape_pids),
               io:format("Agent:~p terminating. Genotype has been backed up.~n Fitness:~p~n TotEvaluations:~p~n TotCycles:~p~n TimeAcc:~p~n", [self(),U_HighestFitness,U_EvalAcc,U_CycleAcc,U_TimeAcc]),
               case GoalReachedFlag of
                  true ->
                     gen_server:cast(S#state.pm_pid,{S#state.agent_id,goal_reached});
                  _ ->
                     ok
               end,
               gen_server:cast(S#state.pm_pid,{S#state.agent_id,terminated,U_HighestFitness});
Next, we update the population_monitor by first adding to its state record the goal_reached element, which is set to false by default, and then adding to it a new handle_cast clause:

handle_cast({_From,goal_reached},S)->
   U_S=S#state{goal_reached=true},
   {noreply,U_S};

This cast clause sets the goal_reached parameter to true when triggered. Finally, we add to all of the population_monitor's termination condition recognition cases the additional operator: "or S#state.goal_reached", and modify the evaluations message receiving handle_cast clause to:

handle_cast({From,evaluations,Specie_Id,AEA,AgentCycleAcc,AgentTimeAcc},S)->
   AgentEvalAcc=case S#state.goal_reached of
      true ->
         0;
      _ ->
         AEA
   end,

This ensures that the population_monitor stops counting evaluations when the goal_reached flag is set to true. These changes effectively modify our system, giving it the ability to use the goal_reached parameter. This entire modification is succinctly shown in Fig-14.2.


Fig. 14.2 The updated goal_reached message processing capable scape, and the goal_reached signal's travel path: scape to actuator to cortex to exoself to population_monitor.
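As an illustration of the termination condition modification mentioned above, one such modified recognition case might look as follows. This is only a sketch: the surrounding clause, the evaluations_limit field, and the terminate_population/1 helper are assumptions made for illustration, not the module's literal code:

case (S#state.tot_evaluations >= S#state.evaluations_limit) or S#state.goal_reached of
   true -> %The population has either exhausted its evaluations, or solved the problem.
      terminate_population(S);
   false ->
      {noreply,S}
end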

This small change allows us to continue with our pole_balancing benchmarking test. We finally set the experiment's tot_runs parameter to 50, which makes the benchmarker perform 50 evolutionary runs in total, so that the calculated average is based on 50 runs, a standard for this type of problem.

To run the first benchmark, we simply use the morphology setup listed earlier, set the fitness_goal parameter of the pmp record to 90000, the tot_runs to 50, and leave everything else as default. We then compile and reload everything by running polis:sync(), and execute the benchmarker:start(spb_without_damping) function, where spb_without_damping is the Id we give to this experiment, which stands for Single Pole Balancing Without Damping.

With this setup, the benchmarker will spawn the population_monitor process, wait for the evolutionary run to complete, add the resulting trace to the experiment's stats list, and then perform another evolutionary run. In total, 50 evolutionary runs will comprise the benchmark. The result we are after is not the graphable data, but the report's average evaluations value (the average number of evaluations taken to reach the goal), and its standard deviation. The results of the first benchmark are shown in the following listing.


Listing-14.6 The results of the single pole balancing, partial information, without_damping benchmark.

3> benchmarker:start(spb_without_damping).
...
Traces_Acc written to file: "benchmarks/report_Trace_Acc"
Graph:{graph,pole_balancing,
   [1.1782424242424248],
   [0.16452932686308724],
   [60910.254989899],
   [24190.827695700948],
   [75696.42],
   [32275.24],
   [6.04],
   [1.232233744059949],
   [457.72],
   []}
Tot Evaluations Avg: 646.78 Std: 325.8772339394086

It works! The results are also rather excellent, on average taking only about 647 evaluations (though as can be seen from the standard deviation, there were times when it was much faster). We achieved this high performance (as compared to the results of other neuroevolutionary systems) without even having taken the time to optimize or tune our neuroevolutionary system yet. If we compare the resulting evaluations average that we received from our benchmark (your results might differ slightly) to those achieved by others, for example the list put together in paper [1], we see that our system is the most efficient of the topology and weight evolving artificial neural network systems on this benchmark. The two faster neuroevolutionary systems, ESP [2] and CoSyNE [3], do not evolve topology. The ESP and CoSyNE systems solved the problem in 589 and 127 evaluations respectively, while CNE [4], SANE [5], and NEAT [6] solved it in 724, 1212, and 1523 evaluations on average, respectively.

When using the non topology and weight evolving neuroevolutionary systems (ESP, CMA-ES, and CoSyNE), the researcher must first create a topology he knows works (or have the neuroevolutionary system generate random topologies, rather than evolving one from another), and then the neuroevolutionary system simply optimizes the synaptic weights to a working combination of values. But such systems cannot be applied to previously unknown problems, or problems for which we do not know the topology, nor its complexity and size, beforehand. For complex problems, topology cannot be predicted; in fact this is why we use a topology and weight evolving artificial neural network system, because we cannot predict and create the topology for non-toy problems on our own, we require the help of evolution.


Next we benchmark our system on the second problem, the more complex double pole balancing problem which uses the standard fitness function without damping. Listing-14.7 shows the results of the experiment.

Listing-14.7 The double pole balancing benchmark, using the without_damping fitness function.

3> benchmarker:start(spb_without_damping).

...

Graph:{graph,pole_balancing,
   [2.4315606060606063],
   [0.8808311444164436],
   [22194.480560606058],
   [15614.417335306674],
   [34476.74],
   [6285.78],
   [7.34],
   [1.4779715829473847],
   [500.0],
   []}
Tot Evaluations Avg: 5184.0 Std: 3595.622677645695

Our system was able to solve the given problem in 5184 evaluations on average, whereas again based on the table provided in [1], the next closest system in that table is ESP [2], which solved it in 7374 evaluations on average. The DXNN system we discussed earlier, however, was able to solve the same problem in 2359 evaluations on average. As we continue advancing and improving the system we're developing together, it too will improve to such numbers.

Finally, we run the third benchmark, the double pole balancing with partial state information and with damping. Because we have added the goal_reached messaging by the scapes, we can deal with the non one-to-one mapping between the number of time steps the agent can balance the poles, and the fitness calculated for this balancing act. Thus, we modify the pmp's fitness_goal back to inf, letting the scape determine when the goal has been reached, and thus when the evaluation run should stop (we could have done the same thing during the previous experiment, rather than using the fitness goal of 90000, which was possible due to the goal and fitness having a one-to-one mapping). The results of this experiment are shown in Listing-14.8.

Listing-14.8 The results of running the double pole balancing with damping benchmark.

Graph:{graph,pole_balancing,
   [3.056909090909092],
   [1.3611906067001034],
   [67318.29389102172],
   [84335.29879824212],
   [102347.17542007213],
   [11861.325171196118],
   [7.32],
   [1.5157836257197137],
   [500.0],
   []}
Tot Evaluations Avg: 4792.38 Std: 3834.866761127432

It works! The goal_reached feature has worked, and the average number of evaluations our neuroevolutionary system needed to produce a result is highly competitive with other state of the art systems, as shown in Table-14.1, which quotes the benchmark results from [1]. The DXNN system's benchmark results are also added to the table for comparison, with the results of our system added at the bottom. Note that neither CMA-ES nor CoSyNE evolves neural topologies; these two systems only optimize the synaptic weights of an already provided NN.

Table 14.1 Benchmark results for the pole balancing problem.

Method      Single-Pole/             Double-Pole/Partial       Double-Pole/Partial
            Incomplete State Info.   Information W/O Damping   Information W/ Damping
RWG         8557                     415209                    1232296
SANE        1212                     262700                    451612
CNE*        724                      76906*                    87623*
ESP         589                      7374                      26342
NEAT        1523                     -                         6929
CMA-ES*     -                        3521*                     6061*
CoSyNE*     127*                     1249*                     3416*
DXNN        Not Performed            2359                      2313
OurSystem   647                      5184                      4792

* These do not evolve topologies, but only optimize the synaptic weights.

Having completed developing these two benchmarks, and having finished testing our TWEANN system on the single and double pole balancing benchmarks, we move forward and begin developing the more complex T-Maze problem.

14.2 T-Maze Simulation

The T-Maze problem is another standard problem, used to test the ability of a NN based system to learn and change its strategy while existing in, and interacting with, a maze environment. In this problem an agent navigates a T shaped maze as shown in Fig-14.3. At one horizontal end of the maze is a low reward, and at the other a high reward. The agent is a simulated robot which navigates the maze. Every time the robot crashes into a wall or reaches one of the maze's ends, its position is reset to the start of the maze. The whole simulation run (the agent is allowed to navigate the maze until it either finds the reward and its position resets to base, or crashes into a wall and its position is reset to base) lasts X number of maze runs, usually set to 100. At some random time during those 100 maze runs, the high and low reward positions are swapped. The goal is for the agent to gather as many reward points as possible. Thus, if the agent has been reaching the high reward end of the maze and suddenly there is a switch, the best strategy for the agent, once it reaches the location where previously there was a high reward, is to realize that it now needs to change its strategy and always go to the other side of the maze for the remainder of the simulation. To do this, the agent must remember what reward it has picked up and on what side, and change its traveling path after noticing that the rewards have been switched, which is most easily done when some of the agent's neurons are plastic.

Fig. 14.3 The T-Maze setup.

We will create a simplified version of the T-Maze problem. It is widely used [6,7], and it does not require us to develop an entire 2d environment and robot simulation (which we will do in Chapter-18, when we create an Artificial Life simulation). Our T-Maze will have all the important features of the problem, but will not require true navigation in 2d space. We will create a discrete version of the T-Maze, as shown in Fig-14.4.


Fig. 14.4 A discrete version of the T-Maze simulation.

The agents traveling through the maze will be able to move forward, and turn left or right, but there will be no width to the corridors. The corridors will have a certain discrete length, and the agent will see forward in the sense that its range sensor will measure the distance to the wall ahead, and its side sensors will measure the distance to the sides of the "corridor" it is in, which when traveling down a single dimensional corridor will be 0, yet when reaching the T intersection, will show that it can turn left or right. The turns themselves will be discrete 90 degree turns, thus allowing the agent to turn left or right, and continue forward to gather the reward at the end of the corridor. This version of the T-Maze, though simple, still requires the agent to solve the same problem as the non discrete maze. In the discrete version, the agent must still remember where the reward is, evolve an ability to move down the corridors and turn and move in the turned direction where there is space to move forward, and finally, remember on which side of the maze it last found the highest reward.

The T-Maze will be contained in a private scape, and the movement and senses will, as in the previous simulation, be done through the sending and receiving of messages. Because we will create a discrete version of the maze, we can simulate the whole maze by simply deciding on the discrete length of each section of the corridor, and what the agent will receive as its sensory signals when in a particular section of the maze. The agent will use a combination of the following two sensors:

1. distance_sensor: A laser distance sensor pointing forward, to the left side, and to the right side, with respect to the simulated robot's direction. Since the maze is self contained and closed, the sensors will always return a distance. When traveling down the single dimensional corridor, the forward sensor will return the distance to the wall ahead, and the side distance sensors will return 0, since there is no place to move sideways. When the agent reaches an intersection, the side range sensors will return the distances to the walls on the sides, thus the agent can decide which way to turn. If the agent has reached a dead end, then both the forward facing and the side facing range sensors will return 0, which will require the agent to turn, at which point it can start traveling in the other direction.

2. reward_consumed: The agent needs to know not only where the reward is, but how large it is, since the agent must explore the two rewards, and then for the remainder of the evaluation go towards the larger reward. To do this, the agent must have a sensory signal which tells it how large the reward it just consumed is. This sensor forwards to the NN a vector of length one: [RewardMagnitude], where RewardMagnitude is the magnitude of the actual reward.

The agent must also be able to move around this simplified, discrete labyrinth. There are different ways that we could allow the NN based agent to control the simulated robot within the maze. We could create an actuator that uses a vector of length one, where this single value is then used to decide whether the agent is to turn left (if the value is < -0.33), turn right (if the value is > 0.33), or continue moving forward (if the value is between -0.33 and 0.33).

Another type of actuator could be based on the differential drive, similar to the one used by the Khepera [5] robot (a small puck shaped robot). The differential_drive actuator would have as input a vector of length 2: [Val1,Val2], where Val1 would control the rotation speed of the left wheel, and Val2 would control the rotation speed of the right wheel. In this manner, if both wheels are spinning backwards (Val1 < 0, and Val2 < 0), the simulated robot moves backwards; if both spin forward with the same speed, then the robot moves forward. If they spin at different speeds, the robot either turns left or right depending on the angular velocities of the two wheels.

Finally, we could create an actuator that accepts an input vector of length 2: [Val1,Val2], where Val1 maps directly to the simulated robot's velocity on the Y axis, and Val2 maps to the robot's velocity on the X axis. This would be a simple translation_drive actuator, and the simulated robot in this scenario would not be able to rotate. The inability to rotate could be alleviated if we add a third element to the vector, which we could then map to the angular velocity value, which would dictate the robot's rotation clockwise or counterclockwise, dependent on that value's sign. Or Val1 could dictate the robot's movement forward/backward, and Val2 could dictate whether the robot should turn left, right, or not at all.

There are many ways in which we could let the NN control the movement of the simulated robot. For our discrete version of the T-Maze problem, we will use the same movement control method that was used in paper [7], which tested another NN system on the discrete T-Maze problem. This actuator accepts an input from a single neuron, and uses this accumulated vector: [Val], to then calculate whether to move forward, turn counterclockwise and move forward in that direction, or turn clockwise and then move forward in that direction. If Val is between -0.33 and 0.33, the agent moves one step forward; if it is less than -0.33, the agent turns counterclockwise and then moves one step forward; and if Val is greater than 0.33, the agent turns clockwise and moves one step forward in the new direction.
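The decoding rule just described is simple enough to express directly. The following is a small sketch; the function name decode_move/1 and the returned atoms are ours, for illustration only:

decode_move(Val) when Val < -0.33 -> turn_ccw_and_step; %Turn 90 degrees counterclockwise, then step.
decode_move(Val) when Val > 0.33 -> turn_cw_and_step;   %Turn 90 degrees clockwise, then step.
decode_move(_Val) -> step_forward.                      %Otherwise, take a single step forward.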


Due to this being a discrete version of the maze, it can easily be represented as a state machine, or simply as a list of discrete sections. Looking back at Fig-14.4, we can use a list to keep track of all the sensor responses for every position and orientation within the maze. In the standard discrete T-Maze implementation used in [7], there are in total 4 sectors. The agent starts at the bottom of the T-Maze, located at {X=0,Y=0}; it can then move up to {0,1}, which is an intersection. At this point the agent can turn left and move a step forward to {-1,1}, or turn right and move a step forward to {1,1}.

If we are to draw the maze on a Cartesian plane, the agent can be turned to face towards the positive X axis at 0 degrees, the positive Y axis at 90 degrees, the negative X axis at 180 degrees, and finally the negative Y axis at 270 degrees. And if the maze is drawn on the Cartesian plane, then each sector's Id can be its coordinate on that plane. With the simulated robot in this maze being in one of the sectors (on one of the coordinates {0,0}, {0,1}, {1,1}, or {-1,1}), and looking in one particular direction (at 0, 90, 180, or 270 degrees), we can then perfectly define what the sensory signals returned to the simulated robot should be. But before we can do that, we need a format for how to store the simulated robot's location and viewing direction, and how it should perceive whether it is looking at a wall, or at a reward located at one of the maze's ends. The superposition of the T-Maze on a Cartesian plane, with a few examples of the agent's position/orientation and the sensory signals it receives there, is shown in Fig-14.5.

Fig. 14.5 Discrete T-Maze, and the sensory signals the simulated robot receives at various locations and orientations. The agent is shown as a gray circle, with the arrow pointing in the direction the simulated robot is looking, its orientation.

We will let each discrete sector keep track of the following:

id: Its own id, which is its Cartesian coordinate.


r: The reward the agent gets for being in that sector. There will be only two sectors that give a reward, the two horizontal endings of the "T". This reward will be sensed by the reward_consumed sensor.

description: This will be the list that contains all the sensory information available when the agent is in that particular sector. In this simulation it will contain the range sensory signals. This means that each sector will contain 4 sets of range sensory signals, one each for when the simulated robot is turned and looking at 0, 90, 180, and 270 degrees in that sector. Each of the range signals appropriate for the agent's particular orientation can then be extracted through a key, where the key is the agent's orientation in degrees (one of the four: 0, 90, 180, or 270). The complete form of the description list is as follows: [{0, NextSector, RangeSense}, {90, NextSector, RangeSense}, {180, NextSector, RangeSense}, {270, NextSector, RangeSense}]. The NextSector parameter specifies the coordinate of the next sector that is reachable from the current sector, given that the agent moves forward while in the current orientation. Thus, if for example the agent's forward is at 90 degrees, looking toward the positive Y axis on the Cartesian plane, and its actuator specifies that it should move forward, then we look at the 90 degree based tuple, and move the agent to the NextSector of that tuple.

We will call the record containing all the sector information of a single sector: dtm_sector, which stands for Discrete T-Maze sector. An example of the sector located at coordinate [0,0], part of the maze shown in the above figure, is as follows:

#dtm_sector{id=[0,0], description=[{0,[],[1,0,0]}, {90,[0,1],[0,1,0]}, {180,[],[0,0,1]}, {270,[],[0,0,0]}], r=0}

Let's take a closer look at this sector, located at [0,0], in which the agent is, for example, turned at 90 degrees, and thus looking towards the positive Y axis. For this particular orientation, when the agent requests sensory signals, they will come from the following tuple: {90,[0,1],[0,1,0]}, also highlighted in the above record. The first value, 90, is the orientation for which the follow-up sensory information is listed. The [0,1] is the coordinate of the sector to which the agent will move if it decides to move forward at this orientation. The vector [0,1,0] is the range sensory signal, and is fed to the agent's range sensor when requested. It states that on both sides, the agent's left and right, there are walls right next to it, the distance to them being 0, and that straight ahead the wall does not come up for 1 sector. The value r=0 states that the current sector has no reward, and this is the value fed to the agent's reward sensor.

This allows the agent to move around the discrete maze, traveling from one sector to another, where each sector has all the information needed when the agent's sensors send a request for percepts. These sectors will all be contained in a single record's list, used by the private scape which represents the entire maze.
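A hedged sketch of how such a sector list could be constructed for the four-sector maze of Fig-14.5 is shown below. Only the [0,0] sector is given verbatim in the text; the other three sectors, and the initial placement of the two rewards, are derived here from the figure, and may differ in detail from the set_tmaze_sectors/0 function used by Listing-14.9 further below:

set_tmaze_sectors()->
   [
      #dtm_sector{id=[0,0],r=0,description=[ %The base of the maze.
         {0,[],[1,0,0]},{90,[0,1],[0,1,0]},{180,[],[0,0,1]},{270,[],[0,0,0]}]},
      #dtm_sector{id=[0,1],r=0,description=[ %The T intersection.
         {0,[1,1],[0,1,1]},{90,[],[1,0,1]},{180,[-1,1],[1,1,0]},{270,[0,0],[1,1,1]}]},
      #dtm_sector{id=[1,1],r=0.2,description=[ %Maze end; initial reward placement assumed.
         {0,[],[0,0,0]},{90,[],[1,0,0]},{180,[0,1],[0,1,0]},{270,[],[0,0,1]}]},
      #dtm_sector{id=[-1,1],r=1,description=[ %Maze end; initial reward placement assumed.
         {0,[0,1],[0,1,0]},{90,[],[0,0,1]},{180,[],[0,0,0]},{270,[],[1,0,0]}]}
   ].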


We will call the record for this private scape: dtm_state, and it will have the following default format:

-record(dtm_state,{agent_position=[0,0], agent_direction=90, sectors=[], tot_runs=100, run_index=0, switch_event, switched=false, step_index=0, fitness_acc=50}).

Let's go through each of this record's elements and discuss its meaning:

agent_position: Keeps track of the agent's current position; the default is [0,0], the agent's starting position in the maze.

agent_direction: Keeps track of the agent's current orientation; the default is 90 degrees, where the agent is looking down the maze, towards the positive Y axis.

sectors: This is a list of all the sectors: [SectorRecord1...SectorRecordN], each of which is represented by the dtm_sector record; the list as a whole represents the entire T-Maze.

tot_runs: Sets the total number of maze runs (trials) the agent performs per evaluation.

run_index: This parameter keeps track of the current maze run index.

switch_event: The run index during which the large and small reward locations are switched. This will require the agent, if it wants to continue collecting the larger reward, to first go to the large reward's original position, at which it will now find the smaller reward, figure out that the location of the large reward has changed, and during the following maze run go to the other side of the maze to collect the larger reward.

switched: Since the switch of the reward locations needs to take place only once during the entire tot_runs of maze runs, we will set this parameter to false by default, and then to true once the switch is made, so that this parameter can then be used as a flag to ensure that no other switch is performed for the remainder of the maze runs.

step_index: If we let the agents travel through the maze for as long as they want, there might be certain phenotypes that simply spin around in one place, although that is not possible with our current type of actuator, which requires the agent to take a step every time, either forward, to the right, or to the left. To prevent such infinite spins when we decide to use another type of actuator, we will give each agent only a limited number of steps. It takes a minimum of 2 steps to get from the base of the maze to one of the rewards: 1 step up the main vertical hall, and 1 turn/move step to the left or right. With an eye to the future, we will give the agents a maximum of 50 steps, after which the maze run ends as if the agent crashed into a wall. Though not useful in this implementation, it might become useful when you extend this maze and start exploring other actuators, sensors...

As with pole balancing, this private scape will allow the agent to send it messages requesting sensory signals, either all signals (range sense, and the just acquired reward size sense) merged into a single vector, or one sensory signal vector at a time. And it will allow the agent to send it signals from its actuators, dictating whether it should move or rotate/move the simulated robot.

Thus, putting all of this together: The scape will keep track of the agent's position and orientation, and will act on the messages sent from its sensor and actuator, based on them controlling the agent's avatar. The T-Maze will start with the large and small rewards at the two opposite sides of the T-Maze, and then at some random maze run to which the switch_event is set (different for each evaluation), the large and small reward locations will flip, requiring the agent to figure this out and go to the new location if it wants to continue collecting the larger of the two rewards. As per the standard T-Maze implementation, the large reward is worth 1 point, and the small reward is worth 0.2 points. If at any time the agent hits a wall, by for example turn/moving when located at the base of the maze, the maze run ends and the agent is penalized with -0.4 fitness points, is then re-spawned at the base of the maze, and the run_index is incremented. If the agent collects the reward, the maze run ends and the agent is re-spawned at the base of the maze, with the run_index incremented. Finally, once the agent has finished tot_runs number of maze runs, the evaluation of the agent's fitness ends, at which point the exoself might perturb the NN's synaptic weights, or end the tuning run... To ensure that the agents do not end up with negative fitness scores when setting the tot_runs to 100, we will start the agents off with 50 fitness points. Thus an agent that always crashes will have a minimum fitness score of 50 - 100*0.4 = 10.

Finally, though we will implement the T-Maze scenario where the agent gets to the reward at one of the maze's ends, and is then teleported back to the base of the maze for another maze run, there are other possible implementations and scenarios. For example, as is demonstrated in Fig-14.6, we could also extend the maze to have teleportation portals located at {-2,1} and {2,1}, through which the agent has to go after gathering the food, so that it is teleported back to the base to reset the rewards. Or we could require it to travel all the way back to the base manually, though we would need to change the simple actuator so that it can rotate in place without crashing into walls. Finally, we could also create the T-Maze which allows for both options, teleportation and manual travel. All 3 extended T-Mazes, and the 1 default T-Maze which we will implement, are shown in the following figure.


Fig. 14.6 The various possible scenarios for the T-Maze after the agent has acquired the reward.

Having decided on the architecture, and having created Fig-14.5 and Fig-14.6d to guide us in designing and setting up the T-Maze system and each of its sectors, we can now move forward to the next subsection and implement this private T-Maze scape, and the sensors and actuators needed to interface with it.

14.2.1 T-Maze Implementation

Through Fig-14.5 we can immediately map the maze's architecture to its implementation shown in Listing-14.9. For the implementation we first define the two new records needed by this new scape: the dtm_sector and dtm_state records. The function dtm_sim/1 prepares and starts up the maze, dropping into the process's main loop. In this main loop the scape process can accept requests for sensory signals, and accept signals from the actuators and return to them a message containing the fitness points acquired. The sensors we will use will poll the private scape for an extended range sensor, which is a vector of length 4, and contains the signals from the agent's range sensor, appended with the reward value in the current maze sector: [L,F,R,Reward], where L is the range to the left wall, F is the range to the wall in front, R is the range to the wall on the right, and Reward is the value of the actual reward.


Listing-14.9 The implementation of the Discrete T-Maze scape.

-record(dtm_sector,{
    id,
    description=[],
    r
}).

-record(dtm_state,{
    agent_position=[0,0],
    agent_direction=90,
    sectors=set_tmaze_sectors(),
    tot_runs=100,
    run_index=0,
    switch_event=35+random:uniform(30),
    switched=false,
    step_index=0,
    fitness_acc=50
}).

dtm_sim(ExoSelf_PId)->
    io:format("Starting dtm_sim~n"),
    random:seed(now()),
    dtm_sim(ExoSelf_PId,#dtm_state{}).

dtm_sim(ExoSelf_PId,S) when (S#dtm_state.run_index == S#dtm_state.switch_event) and (S#dtm_state.switched==false)->
    Sectors = S#dtm_state.sectors,
    SectorA = lists:keyfind([1,1],2,Sectors),
    SectorB = lists:keyfind([-1,1],2,Sectors),
    U_SectorA = SectorA#dtm_sector{r=SectorB#dtm_sector.r},
    U_SectorB = SectorB#dtm_sector{r=SectorA#dtm_sector.r},
    U_Sectors = lists:keyreplace([-1,1],2,lists:keyreplace([1,1],2,Sectors,U_SectorA),U_SectorB),
    scape:dtm_sim(ExoSelf_PId,S#dtm_state{sectors=U_Sectors,switched=true});

dtm_sim(ExoSelf_PId,S)->
    receive
        {From_PId,sense,Parameters}->
            APos = S#dtm_state.agent_position,
            ADir = S#dtm_state.agent_direction,
            Sector = lists:keyfind(APos,2,S#dtm_state.sectors),
            {ADir,_NextSec,RangeSense} = lists:keyfind(ADir,1,Sector#dtm_sector.description),
            SenseSignal = case Parameters of
                [all] ->
                    RangeSense++[Sector#dtm_sector.r];
                [range_sense]->
                    RangeSense;
                [reward] ->
                    [Sector#dtm_sector.r]
            end,
            From_PId ! {self(),percept,SenseSignal},
            scape:dtm_sim(ExoSelf_PId,S);
        {From_PId,move,_Parameters,[Move]}->
            APos = S#dtm_state.agent_position,
            ADir = S#dtm_state.agent_direction,
            Sector = lists:keyfind(APos,2,S#dtm_state.sectors),
            U_StepIndex = S#dtm_state.step_index+1,
            {ADir,NextSec,_RangeSense} = lists:keyfind(ADir,1,Sector#dtm_sector.description),
            if
                (APos == [1,1]) or (APos == [-1,1]) -> %agent is standing on a reward sector
                    Updated_RunIndex = S#dtm_state.run_index+1,
                    case Updated_RunIndex >= S#dtm_state.tot_runs of
                        true ->
                            From_PId ! {self(),S#dtm_state.fitness_acc+Sector#dtm_sector.r,1},
                            dtm_sim(ExoSelf_PId,#dtm_state{});
                        false ->
                            From_PId ! {self(),0,0},
                            U_S = S#dtm_state{
                                agent_position=[0,0],
                                agent_direction=90,
                                run_index=Updated_RunIndex,
                                step_index=0,
                                fitness_acc=S#dtm_state.fitness_acc+Sector#dtm_sector.r
                            },
                            dtm_sim(ExoSelf_PId,U_S)
                    end;
                Move > 0.33 -> %clockwise turn, then a step forward
                    NewDir = (S#dtm_state.agent_direction + 270) rem 360,
                    {NewDir,NewNextSec,_NewRangeSense} = lists:keyfind(NewDir,1,Sector#dtm_sector.description),
                    %move/5 recurses back into dtm_sim/2 itself, so we call it directly
                    move(ExoSelf_PId,From_PId,S#dtm_state{agent_direction=NewDir},NewNextSec,U_StepIndex);
                Move < -0.33 -> %counterclockwise turn, then a step forward
                    NewDir = (S#dtm_state.agent_direction + 90) rem 360,
                    {NewDir,NewNextSec,_NewRangeSense} = lists:keyfind(NewDir,1,Sector#dtm_sector.description),
                    move(ExoSelf_PId,From_PId,S#dtm_state{agent_direction=NewDir},NewNextSec,U_StepIndex);
                true -> %forward
                    move(ExoSelf_PId,From_PId,S,NextSec,U_StepIndex)
            end;
        {ExoSelf_PId,terminate} ->
            ok
    end.

% The dtm_sim/2 function generates a simulated discrete T-Maze scape, with all the sensory information and the maze architecture specified through a list of sector records. The scape can receive signals from the agent's sensor, to which it then replies with the sensory information, and it can receive the messages from the agent's actuator, which it uses to move the agent's avatar around the maze.

move(ExoSelf_PId,From_PId,S,NextSec,U_StepIndex)->
    case NextSec of
        [] -> %wall crash/restart_state
            Updated_RunIndex = S#dtm_state.run_index+1,
            case Updated_RunIndex >= S#dtm_state.tot_runs of
                true ->
                    From_PId ! {self(),S#dtm_state.fitness_acc-0.4,1},
                    dtm_sim(ExoSelf_PId,#dtm_state{});
                false ->
                    From_PId ! {self(),0,0},
                    U_S = S#dtm_state{
                        agent_position=[0,0],
                        agent_direction=90,
                        run_index=Updated_RunIndex,
                        step_index=0,
                        fitness_acc=S#dtm_state.fitness_acc-0.4
                    },
                    dtm_sim(ExoSelf_PId,U_S)
            end;
        _ -> %move
            From_PId ! {self(),0,0},
            U_S = S#dtm_state{
                agent_position=NextSec,
                step_index=U_StepIndex
            },
            dtm_sim(ExoSelf_PId,U_S)
    end.

%The move/5 function accepts as input the state S of the scape, and the specification of where the agent wants to move its avatar next, NextSec. The function then determines whether that next sector exists, or whether the agent will hit a wall if it moves in its currently chosen direction.

set_tmaze_sectors()->
    Sectors = [
        #dtm_sector{id=[0,0],description=[{0,[],[1,0,0]},{90,[0,1],[0,1,0]},{180,[],[0,0,1]},{270,[],[0,0,0]}],r=0},
        #dtm_sector{id=[0,1],description=[{0,[1,1],[0,1,1]},{90,[],[1,0,1]},{180,[-1,1],[1,1,0]},{270,[0,0],[1,1,1]}],r=0},
        #dtm_sector{id=[1,1],description=[{0,[],[0,0,0]},{90,[],[2,0,0]},{180,[0,1],[0,2,0]},{270,[],[0,0,2]}],r=0.2},
        #dtm_sector{id=[-1,1],description=[{0,[0,1],[0,2,0]},{90,[],[0,0,2]},{180,[],[0,0,0]},{270,[],[2,0,0]}],r=1}
    ].

% The set_tmaze_sectors/0 function returns to the caller a list of sectors representing the T-Maze. In this case, there are 4 such sectors: the vertical sector, the two horizontal sectors, and the cross section sector.
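To make the sector encoding concrete, the following fragment sketches how the scape reads one of these records (the identifiers mirror those in Listing-14.9; the fragment is illustrative only). The agent starts at the maze base [0,0] facing 90 degrees, and lists:keyfind/3 is used twice: once to locate the sector by its id, and once to locate the direction tuple within the sector's description:

Sectors = set_tmaze_sectors(),
Sector = lists:keyfind([0,0],2,Sectors),
{90,NextSec,RangeSense} = lists:keyfind(90,1,Sector#dtm_sector.description),
%NextSec = [0,1]: moving forward leads one step up the vertical corridor.
%RangeSense = [0,1,0]: walls adjacent on the left and right, clearance ahead.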

With the T-Maze implemented, we now need to develop the complementary sensor and the actuator. For the sensor, since the agent needs all the information appended, the sensory vectors from the range_sensor and the reward sensor combined into a single vector, we will create a single sensor which will contain the information from both of these sensors. What sensory signal the scape sends back to the agent's sensor will be defined by the sensor's parameter message. The actuator will simply forward the NN based agent's output to the discrete T-Maze process, which will then interpret the signal as turning left and moving forward 1 step, turning right and moving forward 1 step, or just moving forward 1 step. We first create the morphology, which follows the same format as the one we created for the pole_balancing morphology. This morphology we will call discrete_tmaze, with its implementation shown in Listing-14.10, and which we add to the morphology module.

Listing-14.10 The discrete_tmaze morphology specification.

discrete_tmaze(sensors)->
    [
        #sensor{name=dtm_GetInput,scape={private,dtm_sim},vl=4,parameters=[all]}
    ];
discrete_tmaze(actuators)->
    [
        #actuator{name=dtm_SendOutput,scape={private,dtm_sim},vl=1,parameters=[]}
    ].


Similarly, the sensor's implementation is shown in Listing-14.11, which we add to the sensor module.

Listing-14.11 The dtm_GetInput sensor implementation.

dtm_GetInput(VL,Parameters,Scape)->
    Scape ! {self(),sense,Parameters},
    receive
        {Scape,percept,SensoryVector}->
            case length(SensoryVector)==VL of
                true ->
                    SensoryVector;
                false ->
                    io:format("Error in sensor:dtm_GetInput/3, VL:~p SensoryVector:~p~n",[VL,SensoryVector]),
                    lists:duplicate(VL,0)
            end
    end.

Finally, the actuator implementation is shown in Listing-14.12, which we add to the actuator module.

Listing-14.12 The dtm_SendOutput actuator implementation.

dtm_SendOutput(Output,Parameters,Scape)->
    Scape ! {self(),move,Parameters,Output},
    receive
        {Scape,Fitness,HaltFlag}->
            {Fitness,HaltFlag}
    end.

And with that we've completely developed all the parts of the discrete T-Maze benchmark. We've created the actual private scape that represents the maze and in which an agent can travel. And we created the complementary morphology, with its own sensor and actuator set, used to interface with the T-Maze scape. With this particular problem/benchmark, we will now be able to test whether our topology and weight evolving artificial neural network system is able to evolve NN based agents which can perform complex navigational tasks, evolve agents which have memory and can make choices based on it, and even learn when the neurons within the tested NN have plasticity.


14.2.2 Benchmark Results

Let's run a quick test of our system by applying it to our newly developed problem. Though I do not expect our neuroevolutionary system to evolve an agent capable of effectively solving the problem at this stage, we still need to test whether the new scape, morphology, sensor, and actuator are functional. Before we run the benchmark, let us figure out what fitness score represents a solution to the problem.

An evaluation is composed of 100 total maze runs, and sometime around the midpoint, between run 36 and 65, the high and low rewards are flipped. In this implementation, we set the switch_event to occur on run number 35+random:uniform(30). It will take at least one wrong trip to the reward to figure out that its position has been changed. Also, we should expect that eventually, evolution will create NNs that always first go to the maze corner located at [1,1], which holds the high reward before it is flipped.

So then, the maximum possible score achievable in this problem, a score representing that the problem has been solved, is: 99*1 + 1*0.2 + 50 = 149.2, which represents an agent that always first goes to the right corner, at some point goes there and notices that the reward is now small (0.2 instead of 1), and thus starts going to the [-1,1] corner. This allows the agent to achieve 99 high rewards, and 1 low reward. A score which represents that the agent evolved to always go to [1,1] is at most: 65*1 + 35*0.2 + 50 = 122, which is achieved in the best case scenario, when the reward is flipped on the 65th run, thus allowing the agent to gather the high reward for 65 maze runs, and the low reward for the remaining 35 maze runs. The agent will perform multiple evaluations; during some evaluations the reward switch event will occur early, and every once in a while it will occur on the 65th maze run, which is the latest time possible. During that lucky evaluation, the agent can reach 122 fitness points by simply not crashing and always going to the [1,1] side. The agent can accomplish this by first having -0.33 < Output < 0.33, which will make the avatar move forward, and during the second step having Output > 0.33, which will make the avatar turn right and move forward to get the reward. Finally, the smallest possible fitness is achieved when the agent always crashes into the wall: 50 - 100*0.4 = 10.
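These bounds are simple arithmetic over the scape's constants (base fitness 50, large reward 1, small reward 0.2, crash penalty 0.4), and can be checked directly in the shell:

MaxFitness   = 50 + 99*1 + 1*0.2.  %149.2: one wrong trip after the switch
BestStatic   = 50 + 65*1 + 35*0.2. %122.0: always going to [1,1], switch on run 65
WorstFitness = 50 - 100*0.4.       %10.0: crashing on every one of the 100 runs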

With this out of the way, we now set the Morphology element in the benchmarker module within the ?INIT_CONSTRAINTS macro, to discrete_tmaze. We then set the generation limit to inf, and the evaluations_limit to 5000, in the pmp record. Finally, we run polis:sync() to recompile and load everything, then start the polis, and then finally execute benchmarker:start(dtm_test), as shown in Listing-14.13.

Listing-14.13 The results of running the T-Maze benchmark.

Graph:{graph,discrete_tmaze,

 [1.1300000000000001,1.12,1.195,1.1816666666666666,
  1.1633333333333333,1.156111111111111,1.2322222222222223,
  1.1400000000000001,1.1766666666666665,1.1800000000000002],
 [0.10535653752852737,0.11661903789690603,0.10234744745229357,
  0.10026354161796684,0.10214368964029706,0.08123088569087163,
  0.13765675688483067,0.11575836902790224,0.1238726945070803,
  0.092736184954957],
 [111.38000000000011,115.31900000000012,112.4590000000001,
  114.4511111111112,112.8790000000001,112.6335555555556,
  112.13066666666677,111.12500000000009,110.68722222222232,
  114.57700000000014],
 [9.305813236896594,6.245812917467183,6.864250796700242,
  8.069048898318606,8.136815662374111,9.383282426018074,
  7.888934134455533,9.98991266228088,9.41834002503416,
  8.867148978110151],
 [122.0000000000001,122.0000000000001,122.0000000000001,
  122.0000000000001,122.0000000000001,122.0000000000001,
  122.0000000000001,122.0000000000001,122.0000000000001,
  122.0000000000001],
 [10.000000000000115,10.000000000000115,10.000000000000115,
  10.000000000000115,10.000000000000115,10.000000000000115,
  10.000000000000115,10.000000000000115,10.000000000000115,
  10.000000000000115],
 [8.1,8.8,9.1,8.9,8.0,7.75,8.1,7.65,7.9,7.8],
 [0.8888194417315588,1.2884098726725124,0.8306623862918073,
  0.7681145747868607,0.8366600265340756,0.8874119674649424,
  0.9433981132056604,0.7262919523166975,1.57797338380595,
  1.3638181696985856],
 [500.0,500.0,500.0,500.0,500.0,500.0,500.0,475.0,500.0,500.0],
 []}

Tot Evaluations Avg:5083.75 Std:53.78835840588556

We are not interested in the "Tot Evaluations Avg" value, since the benchmark was not set up to use the goal_reached feature. But from the graph printout we do see the score 122.0, in the fifth list, which shows the highest fitness scores achieved amongst all the evolutionary runs. Though as we guessed, the system did not produce a solution (which requires plasticity, as we will see in the next chapter), it rapidly (within the first 500 evaluations) produced the score of 122, which means that the agents learned to always navigate to the right corner.

It is always a good idea to at least once double check and print out all the information produced within the scape, following it in the console and manually analyzing it to check for bugs. We will do that just this once, following a single extracted agent, and the signals its sensors acquire and its actuators produce. First, we run the function population_monitor:test() with the same parameters with which we started the benchmarker, until a fit agent is evolved. We then add the line:


io:format("Position:~p SenseSignal:~p ",[APos,SenseSignal]),

And the lines:

timer:sleep(1000),
io:format("Move:~p StepIndex:~p RunIndex:~p~n",[Move,U_StepIndex,S#dtm_state.run_index]),

to the receive clauses matching the sense and move messages, respectively. We then extract the evolved fit agent, and execute the function exoself:start(AgentId,void) to observe the path the agent takes. A short console printout I saw when performing these steps is shown in Listing-14.14. The console printout shows the agent's starting moves, up to the point when the position of the rewards was switched, and a few steps afterwards.

Listing-14.14 Console printout of a champion agent's maze navigation.

exoself:start({7.513656492058022e-10,agent},void).

Starting dtm_sim
Position:[0,0] SenseSignal:[0,1,0,0] <0.5846.1>
Move:4.18876787545547e-15 StepIndex:1 RunIndex:0
Position:[0,1] SenseSignal:[1,0,1,0] Move:0.7692260090076106 StepIndex:2 RunIndex:0
Position:[1,1] SenseSignal:[0,0,0,1] Move:0.011886120521166272 StepIndex:3 RunIndex:0
Position:[0,0] SenseSignal:[0,1,0,0] Move:4.18876787545547e-15 StepIndex:1 RunIndex:1
Position:[0,1] SenseSignal:[1,0,1,0] Move:0.7692260090076106 StepIndex:2 RunIndex:1
Position:[1,1] SenseSignal:[0,0,0,1] Move:0.011886120521166272 StepIndex:3 RunIndex:1
Position:[0,0] SenseSignal:[0,1,0,0] Move:4.18876787545547e-15 StepIndex:1 RunIndex:2
Position:[0,1] SenseSignal:[1,0,1,0] Move:0.7692260090076106 StepIndex:2 RunIndex:2
Position:[1,1] SenseSignal:[0,0,0,1] Move:0.011886120521166272 StepIndex:3 RunIndex:2
Position:[0,0] SenseSignal:[0,1,0,0] Move:4.18876787545547e-15 StepIndex:1 RunIndex:3
...
Position:[0,0] SenseSignal:[0,1,0,0] Move:4.18876787545547e-15 StepIndex:1 RunIndex:38
Position:[0,1] SenseSignal:[1,0,1,0] Move:0.7692260090076106 StepIndex:2 RunIndex:38
Position:[1,1] SenseSignal:[0,0,0,1] Move:0.011886120521166272 StepIndex:3 RunIndex:38
Position:[0,0] SenseSignal:[0,1,0,0] Move:4.1887678754555e-15 StepIndex:1 RunIndex:39
Position:[0,1] SenseSignal:[1,0,1,0] Move:0.7692260090076106 StepIndex:2 RunIndex:39
Position:[1,1] SenseSignal:[0,0,0,0.2] Move:0.837532377697202 StepIndex:3 RunIndex:39
Position:[0,0] SenseSignal:[0,1,0,0] Move:4.1887678754555e-15 StepIndex:1 RunIndex:40
Position:[0,1] SenseSignal:[1,0,1,0] Move:0.7692260090076106 StepIndex:2 RunIndex:40
Position:[1,1] SenseSignal:[0,0,0,0.2] Move:0.837532377697202 StepIndex:3 RunIndex:40
Position:[0,0] SenseSignal:[0,1,0,0] Move:4.18876787545547e-15 StepIndex:1 RunIndex:41
Position:[0,1] SenseSignal:[1,0,1,0] Move:0.7692260090076106 StepIndex:2 RunIndex:41
Position:[1,1] SenseSignal:[0,0,0,0.2] Move:0.837532377697202 StepIndex:3 RunIndex:41
...

Consider the very first maze run in the listing, where we see the agent taking the steps from [0,0] to [0,1] to [1,1], and receiving the reward of 1. Then we fast-forward and see that during RunIndex:39 the reward has been switched. We know this because when the agent gets to [1,1] on that run, the reward is a mere 0.2. On RunIndex:40 the agent still goes to this same location, indicating that it has not learned, and has not evolved the ability to change its strategy.

14.3 Summary & Discussion


In this chapter we built two new problems to benchmark and test our neuroevolutionary system on. We built the Double Pole Balancing (DPB) simulation, and the Discrete T-Maze (DTM) simulation. We created different versions of the pole balancing problem: the single pole balancing with and without damping, and with and without full system state information, and the double pole balancing with and without damping, and with and without full system state information. The complexity of solving the pole balancing problem grows when we increase the number of poles to balance simultaneously, when we remove the velocity information and thus require the NN based agent to derive it on its own, and when we use the damping based fitness function instead of the standard one. We also created a discrete version of the T-Maze navigation problem, where an agent must navigate a T shaped maze to collect a reward located at one of the horizontal maze ends. In this maze there are two rewards, located at the opposite ends of the maze, one large and one small, and their locations are switched at a random point during the 100 total maze runs. This requires the agent to remember where the large reward was last time, explore that position, find that the reward is now small, and during the remaining maze runs navigate to the other side of the maze to continue collecting the large reward. This problem can be further expanded by changing the fitness function used, and by requiring the agent to collect the reward and then return to the base of the maze, rather than being automatically teleported back as is the case with our current implementation. Furthermore, we could expand the T-Maze into a Double T-Maze, with 4 corners where the reward can be collected, thus requiring the agent to remember more navigational patterns and reward locations.

Based on our benchmark, the system we've built thus far has performed very well on the DPB problem, with its results being higher than those of the other Topology and Weight Evolving Artificial Neural Networks (TWEANNs) whose results, referenced from paper [1], we compared against. Yet the performance was still not higher than that of DXNN, because we have yet to tune our system. When we applied our TWEANN to the T-Maze Navigation problem, it evolved NNs that were not yet able to change their strategy based on their experience. Adding plasticity in the next chapter will further expand the capabilities of the evolved NNs, giving us a chance to again apply our system to this problem, and see that the performance improves, and allows the agents to achieve perfect scores.

Having a good set of problems in our benchmark suite will allow us to add and create features whose improvements to the system's generalization abilities and general performance we can demonstrate. The two new problems we added in this chapter will allow us to better test our system, and the performance of the new features we add to it in the future. Finally, the T-Maze problem will allow us to test the important feature that we will add in the next chapter: neural plasticity.

14.4 References

[1] Gomez F, Schmidhuber J, Miikkulainen R (2008) Accelerated Neural Evolution through Cooperatively Coevolved Synapses. Journal of Machine Learning Research 9, 937-965.

[2] Sher GI (2010) DXNN Platform: The Shedding of Biological Inefficiencies. Neuron, 1-36. Available at: http://arxiv.org/abs/1011.6022.

[3] Durr P, Mattiussi C, Soltoggio A, Floreano D (2008) Evolvability of Neuromodulated Learning for Robots. 2008 ECSIS Symposium on Learning and Adaptive Behaviors for Robotic Systems LABRS, 41-46.

[4] Blynel J, Floreano D (2003) Exploring the T-maze: Evolving Learning-Like Robot Behaviors using CTRNNs. Applications of Evolutionary Computing 2611, 173-176.

[5] Khepera robots: www.k-team.com

[6] Risi S, Stanley KO (2010) Indirectly Encoding Neural Plasticity as a Pattern of Local Rules. Neural Plasticity 6226, 1-11.

[7] Soltoggio A, Bullinaria JA, Mattiussi C, Durr P, Floreano D (2008) Evolutionary Advantages of Neuromodulated Plasticity in Dynamic, Reward-based Scenarios. Artificial Life 2, 569-576.

Chapter 15 Neural Plasticity

Abstract In this chapter we add plasticity to our direct encoded NN system. We implement numerous plasticity encoding approaches, and develop numerous plasticity learning rules, amongst which are variations of the Hebbian Learning Rule, Oja's Rule, and Neural Modulation. Once plasticity has been added, we again test our TWEANN system on the T-Maze navigation benchmark.

We have now built a truly advanced topology and weight evolving artificial neural network (TWEANN) platform. Our system allows for its various features to evolve; the NNs can evolve not only the topology and synaptic weights, but also evolutionary strategies, local and global search parameters, and the very way in which the neurons/processing-elements interact with input signals. We have implemented our system in such a way that it can easily be further expanded and extended with new activation functions (such as logical operators, or activation functions which simulate a transistor, for example), mutation operators, mutation strategies, and almost every other feature of our TWEANN. We have also created two benchmarks, the double pole balancing benchmark and the T-Maze navigation benchmark, which allow us to test our system's performance.

There is something lacking at this point though: our evolved agents are but static systems. Our NN based agents do not learn during their lifetimes; they are trained by the exoself, which applies the NN based system to the problem time after time, with different parameters, until one of the parameter/synaptic-weight combinations produces a more fit agent. This is not learning. Learning is the process during which the NN changes due to its experience, due to its interaction with the environment. In biological organisms, evolution produces the combination of neural topology, plasticity parameters, and the starting synaptic weight values, which allows the NN, based on this plasticity and initial NN topology and setup, to learn how to interact with the environment, to learn and change and adapt during its lifetime. The plasticity parameters allow the NN to change as it interacts with the environment, while the initial synaptic weight values send this newborn agent in the right direction, in the hope that plasticity will change the topology and synaptic weights in the direction that will drive the agent, the organism, further in its exploration, learning, adaptation, and thus towards a higher fitness.

Of course with plasticity comes a new set of questions: What new mutation operators need to be added? How do we make the mutation operators specific to the particular set of parameters used by a given plasticity learning rule? What about the tuning phase when it comes to neurons with plasticity: what is the difference between plasticity enabled NNs which are evolved through genetic algorithm approaches, and those evolved through memetic algorithm approaches? During the tuning phase, what do we perturb, the synaptic weights or the plasticity parameters?...

Plasticity is the feature which allows the neuron and its parameters to change due to its interaction with input signals. In this book's neural network foundations chapters we discussed this in detail. In this chapter we will implement the various learning rules that add neural plasticity to our system. We will create 3 types of plasticity functions: the standard Hebbian plasticity, the more advanced Oja's rule, and finally the most dynamic and flexible approach, neural plasticity through neuromodulation. We will first discuss and implement these learning rules, and then add the perturbation and mutation operators necessary to take advantage of the newly added learning mechanism.

15.1 Hebbian Rule

We discussed the Hebbian learning rule in Section-2.6.1. The principle behind the Hebbian learning rule is summarized by the quote "Neurons that fire together, wire together." If a presynaptic neuron A which is connected to a neuron B sends it an excitatory (SignalVal > 0) signal, and in return B produces an excitatory output, then the synaptic weight between the two neurons increases in magnitude. If on the other hand neuron A sends an excitatory signal to B, and B's resulting output signal is inhibitory (SignalVal < 0), then B's synaptic weight for A's connection decreases. In a symmetric fashion, an inhibitory signal from A that results in an inhibitory signal from B increases the synaptic weight strength between the two, but an inhibitory signal from A resulting in an excitatory signal from B decreases the strength of the connection.

The simplest Hebbian rule used to modify the synaptic weight after the neuron has processed some signal at time t is:

Delta_Weight = h * I_Val * Output,

Thus:

W(t+1) = W(t) + Delta_Weight.

Where Delta_Weight is the change in the synaptic weight, and where the specified synaptic weight belongs to B, associated with the incoming input signal I_Val coming from neuron A. The value h is the learning parameter, set by the researcher. The algorithm and architecture of a neuron using a simple Hebbian learning rule, repeated from Section-2.6.1 for clarity, is shown in Fig-15.1.


Fig. 15.1 An architecture of a neuron using the Hebbian learning rule based plasticity.

This is the simplest Hebbian rule, but though computationally light, it is also unstable. Because the synaptic weight does not decay, if left unchecked the Hebbian rule will keep increasing the magnitude of the synaptic weight indefinitely, and thus eventually drown out all other synaptic weights belonging to the neuron. For example, if a neuron has 5 synaptic weights, 4 of which are between -Pi and Pi, and the fifth weight has climbed to 1000, this neuron is effectively useless with regards to processing, since the signal weighted by 1000 will most likely overpower other signals. No matter what the other 4 synaptic weights are, no matter what pattern they have evolved to pick up, the fifth weight with magnitude 1000 will drown out everything, saturating the output. We will implement it for the sake of completeness, and also because it is so easy to implement. To deal with unchecked synaptic weight magnitudes, we will use our previously created saturation functions to ensure that the synaptic weights do not increase in magnitude unchecked, that they do not increase to infinity, and instead get saturated at the level specified by the ?SAT_LIMIT parameter defined in the neuron module.
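As a minimal sketch (assuming the functions:saturation/2 helper used in the listings later in this chapter, and the ?SAT_LIMIT macro from the neuron module), the complete update for a single weight is just:

-define(SAT_LIMIT,math:pi()*2). %assumed value, as defined in the neuron module
hebbian_update(H,I,W,Output)->
    functions:saturation(W + H*I*Output,?SAT_LIMIT).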

There is though a problem with the current architecture of our neuron, which prevents it from having plasticity. That problem is that the neuron's input_idps list specifies only the Input_Id of the node that sends it an input signal, and the accompanying synaptic weight list Weights: [{Input_Id,Weights}...]. With the addition of plasticity, we must have the ability to also specify the various new parameters (like the learning parameter, for example) of the learning rule. There are multiple ways in which we can solve this dilemma, the following being four of them:

1. Extend the input_idps from: [{Input_Id,Weights}...] to: [{Input_Id,Weights,LearningParameters}...]

2. Extend the neuron record to also include input_lpps, a list with the format: [{Input_Id,LPs}...], where input_lpps stands for input learning parameters plus, and the LPs list in the tuple stands for Learning Parameters, mirroring the input_idps list's format.

3. Extend the Weights list in the input_idps tuple list from: [W1,W2,W3...] to: [{W1,P1},{W2,P2},{W3,P3}...]

4. Extend the pf (Plasticity Function) specification from: atom()::FunctionName to: {atom()::FunctionName, ParameterList}

All of these solutions would require us to modify the genotype, genome_mutator, exoself, neuron, signal_aggregator, and plasticity modules, so that these modules can properly create, mutate, map genotype to phenotype, and in general function properly when the NN system is active. DXNN uses the 3rd solution, but only because at one point I also allowed the evolved NN systems to use a modified back propagation learning algorithm, and Pi contained the learning parameter. There were also Di and Mi parameters, making the input_idps list of the neurons evolved by the DXNN platform have the following format: [{W1,P1,D1,M1},{W2,P2,D2,M2}...], where the value D contained the previous time step's change in synaptic weight, and M contained the momentum parameter used by the backprop algorithm.

Options 1-3 are appropriate for when there is a separate plasticity function, a separate synaptic weight modification and learning rule, for every synaptic weight. But in a lot of cases, the neuron has a single learning rule which is applied to all synaptic weights equally. This is the case with the Hebbian Learning Rule, where the neuron needs only a single learning parameter specifying the rate of change of the synaptic weights. For the learning rules that use a single parameter or a list of global learning parameters, rather than a separate list of learning parameters for every synaptic weight, option 4 is the most appropriate, in which we extend the plasticity function name with a parameter list used by that plasticity function.

But what if at some point in the future we decide that every weight should be accompanied not by one extra parameter, but by 2, or 3, or 4... To solve this, we could use solution-3, but have each Pi be a list. If there is only one parameter, then it is a list of length 1: [A1]; if two parameters are needed by some specific learning rule, then each P is a list of length 2: [A1,A2], and so on. If there is no plasticity, the list is empty.

Are there such learning rules that require so many parameters? Yes. For example, some versions of neuromodulation can be set up such that a single neuron simulates having 5 other modulating neurons within, each of which analyzes the input vectors to the neuron in question, and each of which outputs a value specifying a particular parameter in the generalized Hebbian learning rule. This type of plasticity function could use anywhere from 2 to 5 parameters (in the version we will implement) for each synaptic weight (those 2-5 parameters are themselves synaptic weights of the embedded modulating neurons), and we will discuss that particular approach and neuromodulation in general in Section 15.3. Whatever rule we choose, there is a price. Luckily though, due to the way we've constructed our system, it is easy to fix and modify it, no matter which of the listed approaches we decide to go with.

Let us choose the 3rd option, where each Pi is a list of parameters for each weight Wi, and where that list's length is dependent on the plasticity function the neuron uses. In addition, we will also implement the 4th option, which requires us to modify the pf parameter format. The pf parameter for every neuron will be specified as a tuple, composed of the plasticity function name and a global learning parameter list. This will, though making the implementation a bit more difficult, allow for a much greater level of flexibility in the types of plasticity rules we can implement. Using both methods, we will have access to plasticity functions which need to specify a parameter for every synaptic weight, and those which only need to specify a single or a few global parameters of the learning rule for the entire neuron.
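To make the chosen representation concrete, the following fragment illustrates what a neuron's plasticity-related genotype elements might look like under options 3 and 4 combined (the values and the From_NId identifier are made up for the example; hebbian_w is one of the plasticity functions implemented below):

PF = {hebbian_w,[]},                      %option 4: function name plus neural level parameter list
WeightsP = [{0.24,[0.1]},{-0.37,[-0.2]}], %option 3: each weight Wi paired with its parameter list [Hi]
Input_IdPs = [{From_NId,WeightsP}],       %From_NId: id of the presynaptic element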

15.1.1 Implementing the New input_idps & pf Formats

We first update the specification format for the neuron's pf parameter. This requires only a slight modification in the neuron module, changing the line:

U_IPIdPs = plasticity:PF(Ordered_IAcc,Input_PIdPs,Output)

To:

{PFName,PFParameters} = PF,

U_IPIdPs = plasticity:PFName(PFParameters,Ordered_IAcc,Input_PIdPs,Output),

And a change in the genotype module, to allow us to use the plasticity function name to generate the PF tuple. The way we do this is by creating a special function in the plasticity module with arity 1 and of the form: plasticity:PFName(neural_parameters), which returns the necessary plasticity function specifying tuple: {PFName,PL}, where PL is the Parameter List. In this manner, when we develop the plasticity functions, we can at the same time create the function of arity 1 which returns the appropriate tuple defining the actual plasticity function name and its parameters. The change in the genotype module is done to the generate_NeuronPF/1 function, changing it from:

generate_NeuronPF(Plasticity_Functions)->
    case Plasticity_Functions of
        [] ->
            none;
        Other ->
            lists:nth(random:uniform(length(Other)),Other)
    end.

To:

generate_NeuronPF(Plasticity_Functions)->
    case Plasticity_Functions of
        [] ->
            {none,[]};
        Other ->
            PFName = lists:nth(random:uniform(length(Other)),Other),
            plasticity:PFName(neural_parameters)
    end.

With this modification completed, we can specify the global, neural level learning parameters. But to be able to specify synaptic weight level parameters, we have to augment the neuron's input_idps list specification format. Because our new format for input_idps stays very similar to the original, we need only convert the original list's form from: [{Input_Id,Weights}...] to: [{Input_Id,WeightsP}...]. Any function that does not directly operate on Weights does not get affected by us changing Weights: [W1,W2...] to WeightsP: [{W1,PL1},{W2,PL2}...], where PL is the plasticity function's Parameter List. The only function that does get affected by this change is the one in the genotype module which creates the input_idps list, create_NeuralWeights/2. In the genome_mutator module, again the only affected function is the mutate_weights function, which uses the perturb_weights function and thus needs to choose the weights rather than the learning parameters to mutate. Finally, the neuron process also perturbs its synaptic weights, and so we will need to use a modified version of the perturb_weights function.

The most interesting modification occurs in the create_NeuralWeights function. We modify it from:

create_NeuralWeights(0,Acc) ->
    Acc;
create_NeuralWeights(Index,Acc) ->
    W = random:uniform()-0.5,
    create_NeuralWeights(Index-1,[W|Acc]).

To:

create_NeuralWeightsP(_PFName,0,Acc) ->
    Acc;
create_NeuralWeightsP(PFName,Index,Acc) ->
    W = random:uniform()-0.5,
    create_NeuralWeightsP(PFName,Index-1,[{W,plasticity:PFName(weight_parameters)}|Acc]).


The second version creates a list of tuples rather than a simple list of synaptic weights. Since each learning rule, each plasticity function, will have its own set of parameters, we defer the creation of a parameter list to its own plasticity function. To have the plasticity function create an initial synaptic level parameter list, we will call it with the atom parameter: weight_parameters. Thus for every plasticity function, we will create a secondary clause, which takes as input a single parameter, and through the use of this parameter specifies whether the plasticity function will return neural level learning rule parameters, or synaptic weight level learning rule parameters. The weight_parameters specification will make the plasticity function return a randomized list of parameters required by that learning rule at the synaptic weight level.

We also add to the plasticity module a secondary none function: none/1. This none/1 function can be executed with the neural_parameters or the weight_parameters atom, and in both cases it returns an empty list, since a neuron which does not have plasticity, and thus uses the none plasticity function, does not need learning parameters of any type. Thus, our plasticity module now holds two functions by the name none: one with arity 4, and one with arity 1:

none(neural_parameters)->
    [];
none(weight_parameters)->
    [].
%none/1 returns the set of learning parameters needed by the none plasticity function. Since this function specifies that the neuron has no plasticity, the parameter lists are empty.

none(_NeuralParameters,_IAcc,Input_PIdPs,_Output)->
    Input_PIdPs.
%none/4 returns the original Input_PIdPs to the caller.

The modification to the perturb_weights function (present in the neuron module, and present in the genome_mutator module in a slightly modified form) is much simpler. The updated function has the following form, where the changes are in how the weight tuples are destructured:

perturb_weightsP(Spread,MP,[{W,LPs}|WeightsP],Acc)->
    U_W = case random:uniform() < MP of
        true->
            sat((random:uniform()-0.5)*2*Spread+W,-?SAT_LIMIT,?SAT_LIMIT);
        false ->
            W
    end,
    perturb_weightsP(Spread,MP,WeightsP,[{U_W,LPs}|Acc]);
perturb_weightsP(_Spread,_MP,[],Acc)->
    lists:reverse(Acc).


All that has changed is the function name, and that instead of using [W|Weights], we now use [{W,LPs}|WeightsP], where the list LPs stands for Learning Parameters.

Finally, we must also update the synaptic weight and plasticity function specific mutation operators. These functions are located in the genome_mutator module. These are the add_bias/1, mutate_pf/1, and link_ToNeuron/4 functions. The add_bias/1 and link_ToNeuron/4 functions add new synaptic weights, and thus must utilize the new plasticity:PFName(weight_parameters) function, based on the particular plasticity function used by the neuron. The mutate_pf/1 is a mutation operator function. Due to the extra parameter added to the input_idps list, when we mutate the plasticity function, we must also update the synaptic weight parameters so that they are appropriate for the format of the new learning rule. Only the mutate_pf/1 function requires a more involved modification to the source code, with the other two only needing the plasticity function name to be extracted and used to generate the weight parameters from the plasticity module. The updated mutate_pf/1 function is shown in Listing-15.1.

Listing-15.1 The updated implementation of the mutate_pf/1 function.

mutate_pf(Agent_Id)->
    A = genotype:read({agent,Agent_Id}),
    Cx_Id = A#agent.cx_id,
    Cx = genotype:read({cortex,Cx_Id}),
    N_Ids = Cx#cortex.neuron_ids,
    N_Id = lists:nth(random:uniform(length(N_Ids)),N_Ids),
    Generation = A#agent.generation,
    N = genotype:read({neuron,N_Id}),
    {PFName,_NLParameters} = N#neuron.pf,
    case (A#agent.constraint)#constraint.neural_pfns -- [PFName] of
        [] ->
            exit("********ERROR:mutate_pf:: There are no other plasticity functions to use.");
        Other_PFNames ->
            New_PFName = lists:nth(random:uniform(length(Other_PFNames)),Other_PFNames),
            New_NLParameters = plasticity:New_PFName(neural_parameters),
            NewPF = {New_PFName,New_NLParameters},
            InputIdPs = N#neuron.input_idps,
            %Each weight's old parameter list is replaced with one appropriate
            %for the new learning rule, while the weights themselves are kept.
            U_InputIdPs = [{Input_Id,[{W,plasticity:New_PFName(weight_parameters)} || {W,_OldPL} <- WeightsP]} || {Input_Id,WeightsP} <- InputIdPs],
            U_N = N#neuron{pf=NewPF,input_idps=U_InputIdPs,generation=Generation},
            EvoHist = A#agent.evo_hist,
            U_EvoHist = [{mutate_pf,N_Id}|EvoHist],
            U_A = A#agent{evo_hist=U_EvoHist},
            genotype:write(U_N),
            genotype:write(U_A)
    end.

After making these modifications, we ensure that everything is functioning as it should, by executing:

polis:sync().
polis:start().
population_monitor:test().

This compiles the updated modules, ensuring that there are no errors, then starts the polis process, and finally runs a quick neuroevolutionary test. The function population_monitor:test/0 can be executed a few times (each execution done after the previous one runs to completion), to ensure that everything still works. Because neuroevolutionary systems function stochastically, the genotypes and topologies evolved during one evolutionary run will be different from another, and so it is always a good idea to run it a few times, to test out the various combinations and permutations of the evolving agents.

With this update completed, we can now create plasticity functions. Using our plasticity module implementation, we allow the plasticity functions to completely isolate and decouple their functionality and setup from the rest of the system, which will allow others to add and test new plasticity functions as they please, without disturbing or having to dig through the rest of the code, following the skeleton sketched below.
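A hypothetical skeleton for such a new plasticity function, following the two-clause convention established above (the name my_rule is made up), consists of an arity-1 function returning the neural and weight level parameter lists, and an arity-4 function applying the learning rule itself:

my_rule(neural_parameters)->
    []; %no neuron level learning parameters
my_rule(weight_parameters)->
    [random:uniform()-0.5]. %one randomized parameter per synaptic weight

my_rule(_NeuralParameters,_IAcc,Input_PIdPs,_Output)->
    Input_PIdPs. %placeholder: a real rule would return the updated weights here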

15.1.2 Implementing the Simple Hebbian Learning Rule

We need to implement a rule where every synaptic weight Wi is updated every time the neuron processes an input vector and produces an output vector. The weight Wi must be updated using the rule: Updated_Wi = Wi + h*Ii*Output, where Ii is the float() input value associated with the synaptic weight Wi. The Updated_Wi must be, in the same way as done during weight perturbation, saturated at the value ?SAT_LIMIT, so that its magnitude does not increase indefinitely.

From the above equation, it can be seen from the common h for all Ii and Wi, that the standard Hebbian learning rule is one where the neuron has a single, global, neural level learning parameter h, which is used to update all the synaptic weights belonging to that neuron. Because our neuron also has the ability to have a learning parameter per weight, we can also create a Hebbian learning rule where every synaptic weight uses its very own h. Though note that this approach will double the number of mutatable parameters for the neuron: a list of synaptic weights, and a list of the same size of Hebbian learning parameters. For the sake of completeness, we will implement both versions. We will call the standard Hebbian learning function, which uses a single learning parameter h for all synaptic weights, hebbian/4, and the one which uses a separate learning parameter hi for every synaptic weight, hebbian_w/4 (where _w stands for weights). Let us first implement the hebbian_w function, which uses the following weight update rule: Updated_Wi = Wi + hi*Ii*Output, where Wi is the synaptic weight, hi is the learning parameter for weight Wi, and Ii is the input signal associated with synaptic weight Wi.

In the previous section we updated our neuron to apply a learning rule to its weights through: U_IPIdPs = plasticity:PFName(Neural_Parameters,Ordered_IAcc,Input_PIdPs,Output), which gives the plasticity function access to the neural parameters list, the output signal, the synaptic weights and their associated learning parameters, and the accumulated input vector. To set up the plasticity function by the name hebbian_w, we first implement the function hebbian_w/1, which returns a weight parameters list composed of a single element [H] when executed with the weight_parameters atom, and an empty list when executed with the neural_parameters atom. We then create the function hebbian_w/4 which implements the actual learning rule. The implementation of these two hebbian_w functions is shown in Listing-15.2.

Listing-15.2 The implementation of the hebbian_w/1 and hebbian_w/4 functions.

hebbian_w(neural_parameters)->
    [];
hebbian_w(weight_parameters)->
    [(random:uniform()-0.5)].
%hebbian_w/1 produces the necessary parameter list for the hebbian_w learning rule to operate. The weight parameter list generated by the hebbian_w learning rule is a list composed of a single parameter H: [H], for every synaptic weight of the neuron. When hebbian_w/1 is called with the parameter neural_parameters, it returns [].

hebbian_w(_NeuralParameters,IAcc,Input_PIdPs,Output)->
    hebbian_w1(IAcc,Input_PIdPs,Output,[]).

hebbian_w1([{IPId,Is}|IAcc],[{IPId,WPs}|Input_PIdPs],Output,Acc)->
    Updated_WPs = hebbrule_w(Is,WPs,Output,[]),
    hebbian_w1(IAcc,Input_PIdPs,Output,[{IPId,Updated_WPs}|Acc]);
hebbian_w1([],[],_Output,Acc)->
    lists:reverse(Acc);
hebbian_w1([],[{bias,WPs}],_Output,Acc)->
    lists:reverse([{bias,WPs}|Acc]).
%hebbian_w/4 operates on each Input_PIdP, calling the hebbian_w1/4 function which processes each of the complementary Is and WPs lists, producing the Updated_WPs lists in return, with the now updated/adapted weights, based on the hebbian_w learning rule.

hebbrule_w([I|Is],[{W,[H]}|WPs],Output,Acc)->
    Updated_W = functions:saturation(W + H*I*Output,?SAT_LIMIT),
    hebbrule_w(Is,WPs,Output,[{Updated_W,[H]}|Acc]);
hebbrule_w([],[],_Output,Acc)->
    lists:reverse(Acc).
%hebbrule_w/4 applies the Hebbian learning rule to each synaptic weight by using the input value I, the neuron's calculated Output, and each W's own distinct learning parameter H.

The function hebbian_w/4 calls hebbian_w1/4 with a list accumulator, which separately operates on the input vectors from each Input_PId by calling the hebbrule_w/4 function. It is the hebbrule_w/4 function that actually executes the modified Hebbian learning rule: Updated_W = functions:saturation(W+H*I*Output,?SAT_LIMIT), and updates the WeightsP list.
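As a sanity check, a hypothetical call with a single presynaptic element and a single weight (From_PId standing in for a bound process id) behaves as expected:

%W=0.5 with its own learning parameter H=0.1, input I=1.0, neuron Output=1.0:
Updated = plasticity:hebbian_w([],[{From_PId,[1.0]}],[{From_PId,[{0.5,[0.1]}]}],1.0),
%Updated =:= [{From_PId,[{0.6,[0.1]}]}], since 0.5 + 0.1*1.0*1.0 = 0.6.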

Note that hebbian_w/1 generates a parameter list composed of a single value with a range between -0.5 and 0.5 (this range was chosen to ensure that from the very start the learning parameter will not be too large). The Hebbian rule which uses a negative learning parameter embodies Anti-Hebbian learning. The Anti-Hebbian learning rule decreases the postsynaptic weight between neurons outputting signals of the same sign, and increases the magnitude of the postsynaptic weight between those neurons that are connected and output signals of differing signs. Thus, if a neuron A sends a positive signal to neuron B, the postsynaptic neuron B's output signal is negative, and B has H < 0 and is thus using the Anti-Hebbian learning rule, then B's synaptic weight for the link from neuron A will increase in magnitude. This means that in the hebbian_w/4 learning rule implementation, some of the synaptic weights will be using Hebbian learning, and some Anti-Hebbian. This will add some extra agility to our system that might prove useful, and allow the system to evolve more general learning networks.

With the modified Hebbian rule now implemented, let us implement the standard one. In the standard Hebbian rule, the hebbian/1 function generates an empty list when called with weight_parameters, and the list [H] when called with neural_parameters. Also, the hebbian/4 function that implements the actual learning rule will use a single common H learning parameter to update all the synaptic weights in the input_idps. Listing-15.3 shows the implementation of the standard Hebbian learning rule.

Listing-15.3 The implementation of the standard Hebbian learning rule.

hebbian(neural_parameters)->
    [(random:uniform()-0.5)];
hebbian(weight_parameters)->
    [].
%The hebbian/1 function produces the necessary parameter list for the Hebbian learning rule to operate. The parameter list for the standard Hebbian learning rule is a list composed of a single parameter H: [H], used by the neuron for all its synaptic weights. When hebbian/1 is called with the parameter weight_parameters, it returns [].

hebbian([H],IAcc,Input_PIdPs,Output)->
    hebbian(H,IAcc,Input_PIdPs,Output,[]).

hebbian(H,[{IPId,Is}|IAcc],[{IPId,WPs}|Input_PIdPs],Output,Acc)->
    Updated_WPs = hebbrule(H,Is,WPs,Output,[]),
    hebbian(H,IAcc,Input_PIdPs,Output,[{IPId,Updated_WPs}|Acc]);
hebbian(_H,[],[],_Output,Acc)->
    lists:reverse(Acc);
hebbian(_H,[],[{bias,WPs}],_Output,Acc)->
    lists:reverse([{bias,WPs}|Acc]).
%hebbian/4 operates on each Input_PIdP, calling the hebbian/5 function which processes each of the complementary Is and WPs lists, producing the Updated_WPs list in return, with the updated/adapted weights based on the standard Hebbian learning rule, using the neuron's single learning parameter H.

hebbrule(H,[I|Is],[{W,[]}|WPs],Output,Acc)->
    Updated_W = functions:saturation(W + H*I*Output,?SAT_LIMIT),
    hebbrule(H,Is,WPs,Output,[{Updated_W,[]}|Acc]);
hebbrule(_H,[],[],_Output,Acc)->
    lists:reverse(Acc).
%hebbrule/5 applies the Hebbian learning rule to each weight, using the input value I, the neuron's calculated output Output, and the neuron's single learning parameter H.

The standard Hebbian learning rule has a number of flaws. One of these flaws is that without the saturation/2 function that we're using, the synaptic weight would grow in magnitude to infinity. A more biologically faithful implementation of this auto-associative learning is Oja's learning rule, which we discuss and implement next.

15.2 Oja's Rule

Oja's learning rule is a modification of the standard Hebbian learning rule that solves its stability problems through the use of multiplicative normalization, derived in [1]. This learning rule is also closer to what occurs in biological neurons. The synaptic weight update algorithm embodied by Oja's learning rule is as follows: Updated_Wi = Wi + h*O*(Ii - O*Wi), where h is the learning parameter, O is the output of the neuron based on its processing of the input vectors using its synaptic weights, Ii is the i-th input signal, and Wi is the i-th synaptic weight associated with the Ii input signal.

We can compare the instability of the Hebbian rule to the stability of Oja's rule by running this learning rule through a few iterations with a positive input signal I. Assuming our neuron only has a single synaptic weight for an input vector of length one, we test the stability of the synaptic weight updated through Oja's rule as follows.

Initial setup: W = 0.5, h = 0.2, the activation function is tanh, using a constant input I = 1:

1. O = math:tanh(W*I) = math:tanh(0.5*1) = 0.46
   Updated_W = W + h*O*(I - O*W) = 0.5 + 0.2*0.46*(1 - 0.46*0.5) = 0.57
2. O = math:tanh(W*I) = math:tanh(0.57*1) = 0.52
   Updated_W = W + h*O*(I - O*W) = 0.57 + 0.2*0.52*(1 - 0.52*0.57) = 0.64
3. O = math:tanh(W*I) = math:tanh(0.64*1) = 0.56
   Updated_W = W + h*O*(I - O*W) = 0.64 + 0.2*0.56*(1 - 0.56*0.64) = 0.71
4. ...

This continues to increase, but once the synaptic weight achieves a value higher than the input, for example when W = 1.5, the learning rule takes the weight update in the other direction:

5. O = math:tanh(W*I) = math:tanh(1.5*1) = 0.90
   Updated_W = W + h*O*(I - O*W) = 1.5 + 0.2*0.90*(1 - 0.90*1.5) = 1.43

Thus this learning rule is indeed self stabilizing; the synaptic weights will not continue to increase in magnitude towards infinity, as was the case with the Hebbian learning rule. Let us now implement the two functions, one which returns the needed learning parameters for this learning rule, and the other implementing the actual Oja's synaptic weight update rule.
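Before doing so, the self-stabilizing behavior can be verified with a small sketch that simply iterates the update (the function name is made up for this example):

oja_iterate(W,_H,_I,0)->
    W;
oja_iterate(W,H,I,N)->
    O = math:tanh(W*I),
    oja_iterate(W + H*O*(I - O*W),H,I,N-1).
%oja_iterate(0.5,0.2,1,100) converges to roughly 1.2 (the W for which
%W*tanh(W) = 1), rather than growing without bound the way the
%unsaturated Hebbian update would.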

15.2.1 Implementing the Oja's Learning Rule

Like the Hebbian learning rule, the standard Oja's rule too uses only a single parameter h to pace the learning rate of the synaptic weights. We implement ojas_w/1 in the same fashion we did hebbian_w/1; it will be a variation of Oja's learning rule that uses a single learning parameter per synaptic weight, rather than a single learning parameter for the entire neuron. This synaptic weight update rule is as follows:

Updated_Wi = Wi + hi*O*(Ii - O*Wi)

We set the initial learning parameter to be randomly chosen between -0.5 and 0.5. The implementation of ojas_w/1 and ojas_w/4 is shown in Listing-15.4.



Listing-15.4 The implementation of a modified Oja's learning rule, and its initial learning parameter generating function.

ojas_w(neural_parameters)->
    [];
ojas_w(weight_parameters)->
    [(random:uniform()-0.5)].
%ojas_w/1 produces the necessary parameter list for the Oja's learning rule to operate. The parameter list for Oja's learning rule is a list composed of a single parameter H: [H] per synaptic weight. If the learning parameter is positive, then the postsynaptic neuron's synaptic weight increases if the two connected neurons produce output signals of the same sign. If the learning parameter is negative, and the two connected neurons produce output signals of the same sign, then the synaptic weight of the postsynaptic neuron decreases in magnitude.

ojas_w(_Neural_Parameters,IAcc,Input_PIdPs,Output)->
    ojas_w1(IAcc,Input_PIdPs,Output,[]).

ojas_w1([{IPId,Is}|IAcc],[{IPId,WPs}|Input_PIdPs],Output,Acc)->
    Updated_WPs = ojas_rule_w(Is,WPs,Output,[]),
    ojas_w1(IAcc,Input_PIdPs,Output,[{IPId,Updated_WPs}|Acc]);
ojas_w1([],[],_Output,Acc)->
    lists:reverse(Acc);
ojas_w1([],[{bias,WPs}],_Output,Acc)->
    lists:reverse([{bias,WPs}|Acc]).
%ojas_w/4 operates on each Input_PIdP, calling the ojas_rule_w/4 function which processes each of the complementary Is and WPs lists, producing the Updated_WPs list in return. In the returned Updated_WPs, the updated/adapted weights are based on the Oja's learning rule, using each synaptic weight's distinct learning parameter.

ojas_rule_w([I|Is],[{W,[H]}|WPs],Output,Acc)->
    Updated_W = functions:saturation(W + H*Output*(I - Output*W),?SAT_LIMIT),
    ojas_rule_w(Is,WPs,Output,[{Updated_W,[H]}|Acc]);
ojas_rule_w([],[],_Output,Acc)->
    lists:reverse(Acc).
%ojas_rule_w/4 applies the Oja's learning rule to each weight, using the input value I, the neuron's calculated output Output, and each weight's distinct learning parameter H.

The standard implementation of Oja's learning rule, which uses a single learning parameter H for all synaptic weights, is shown in Listing-15.5. The standard Oja's rule uses the following weight update algorithm: Updated_Wi = Wi + h*O*(Ii - O*Wi).

Listing-15.5 The implementation of the standard Oja's learning rule.

ojas(neural_parameters)->
    [(random:uniform()-0.5)];
ojas(weight_parameters)->
    [].
%ojas/1 produces the necessary parameter list for the Oja's learning rule to operate. The parameter list for the Oja's learning rule is a list composed of a single parameter H: [H], used by the neuron for all its synaptic weights. If the learning parameter is positive, and the two connected neurons produce output signals of the same sign, then the postsynaptic neuron's synaptic weight increases. Otherwise it decreases.

ojas([H],IAcc,Input_PIdPs,Output)->
    ojas(H,IAcc,Input_PIdPs,Output,[]).

ojas(H,[{IPId,Is}|IAcc],[{IPId,WPs}|Input_PIdPs],Output,Acc)->
    Updated_WPs = ojas_rule(H,Is,WPs,Output,[]),
    ojas(H,IAcc,Input_PIdPs,Output,[{IPId,Updated_WPs}|Acc]);
ojas(_H,[],[],_Output,Acc)->
    lists:reverse(Acc);
ojas(_H,[],[{bias,WPs}],_Output,Acc)->
    lists:reverse([{bias,WPs}|Acc]).
%ojas/5 operates on each Input_PIdP, calling the ojas_rule/5 function which processes each of the complementary Is and WPs lists, producing the Updated_WPs list in return, with the updated/adapted weights.

ojas_rule(H,[I|Is],[{W,[]}|WPs],Output,Acc)->
    Updated_W = functions:saturation(W + H*Output*(I - Output*W),?SAT_LIMIT),
    ojas_rule(H,Is,WPs,Output,[{Updated_W,[]}|Acc]);
ojas_rule(_H,[],[],_Output,Acc)->
    lists:reverse(Acc).
%ojas_rule/5 updates every synaptic weight using the Oja's learning rule.

With the implementation of this learning rule complete, we now move forward and discuss neural plasticity through neuromodulation.


15.3 Neuromodulation

Thus far we have discussed and implemented Hebbian learning, which is a homosynaptic plasticity (also known as homotropic modulation) method, where the synaptic strength changes based on its history of activation. It is a synaptic weight update rule which is a function of the neuron's post- and pre-synaptic activity, as shown in Fig-15.2. But research shows that there is another approach to synaptic plasticity which nature has discovered, a highly dynamic and effective one: plasticity through neuromodulation.


Fig. 15.2 Homosynaptic mechanism for Neuron A's synaptic weight updating, based on the pre- and post- synaptic activity of neuron A.

Neuromodulation is a form of heterosynaptic plasticity. In heterosynaptic plasticity the synaptic weights are changed due to the synaptic activity of other neurons, due to the modulating signals other neurons can produce to affect the given neuron's synaptic weights. For example, assume we have a neural circuit composed of two neurons, a presynaptic neuron N1, and a postsynaptic neuron N2. There can be other neurons N3, N4... which also connect to N2, but their neurotransmitters affect N2's plasticity, rather than acting as signals on which N2's output signal is based. The accumulated signals, neurotransmitters, from N3, N4..., could then dictate how rapidly and in what manner N2's connection strengths change. This type of architecture is shown in Fig-15.3.

Fig. 15.3 Heterosynaptic mechanism for plasticity, where the Hebbian plasticity is modulated by a modulatory signal from neurons N3 and N4.


If we assume the use of the Generalized Hebbian learning rule for the synaptic weight update rule: Updated_Wi = Wi + h*(A*Ii*Output + B*Ii + C*Output + D), then the accumulated neuromodulatory signals from the other neurons could be used to calculate the learning parameter h, with the parameters A, B, C, and D evolved and specified within the postsynaptic neuron N2. In addition, the neuromodulatory signals from neurons N3, N4... could also be used to modulate and specify the parameters A, B, C, and D, as well.
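A minimal sketch of this computation, assuming the accumulated modulatory signals are simply summed and squashed by tanh to produce h (all names here are illustrative, not part of the actual modules):

modulated_hebbian(W,I,Output,ModSignals,[A,B,C,D])->
	H = math:tanh(lists:sum(ModSignals)), %h derived from the N3, N4... signals
	W + H*(A*I*Output + B*I + C*Output + D).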

The modulating neurons could be standard neurons, and whether their output signals are used as modulatory signals, or standard input signals, could be determined fully by the postsynaptic neuron to which they connect, as shown in Fig-15.4.

Fig. 15.4 Input signals used as standard signals, and as modulatory signals, dependent on how the postsynaptic neuron decides to treat the presynaptic signals.

Another possible approach is to set up secondary neurons to the postsynaptic neuron N2 which we want modulated, where the secondary neurons receive exactly the same input signals as the postsynaptic neuron N2, but the output signals of these secondary neurons are used as modulatory signals of N2. This type of topological and architectural setup is shown in Fig-15.5.


Fig. 15.5 Secondary neurons, created and used specifically for neuromodulation.

Through the use of dedicated modulatory neurons, it is possible to evolve whole modulatory networks: complex systems whose main role is to modulate another neural network's plasticity and learning, its long-term potentiation, its ability to form memory. In this method, the generated learning parameter is signal specific, and itself changes; the learning ability and form evolves with everything else. Unlike the simple Hebbian or Oja's learning rule, these plasticity systems would depend on the actual input signals, on the sensory signals, and on the other regulatory and processing parts of the neural network system, which is a much more biologically faithful neural network architecture, and would allow our system to evolve even more complex behaviors.

Nature uses a combination of the architectures shown in figures 15.1 through 15.5. We have already discussed the Hebbian learning rule, and implemented the architecture of Fig-15.2. We now add the functionality to give our neuroevolutionary system the ability to evolve NN systems with the architectures shown in Fig-15.4 and Fig-15.5. This will give our systems the ability to evolve self-adaptation and learning.

15.3.1 The Neuromodulatory Architecture

The architecture in Fig-15.5 could be easily developed using our already existing architecture, and it would even increase the ratio of neural computations performed by the neuron to the number of signals sent to the neuron. This is important because Erlang becomes more effective with big computations and small messages. The way we can represent this architecture is through the weight_parameters based approach. The weight_parameters could be thought of as synaptic weights themselves, but for the secondary neurons. These secondary neurons share the process of the neuron they are to modulate, and because the secondary neurons need to process the same input vectors as the neuron they are modulating, this design is highly efficient. This architectural implementation is shown in Fig-15.6.

Fig. 15.6 The architectural implementation of neuromodulation through dedicated/embedded modulating neurons.

In the above figure we see three neurons: N1, N2, and N3, connected to another neuron, which is expanded in the figure and whose architecture is shown. This neuron has a standard activation function, and a learning rule, but its input_idps list is extended. What we called parameters in the other learning rules, are here used as synaptic weights belonging to this neuron's embedded/dedicated modulating neurons: D1, D2, and D3. Furthermore, each dedicated/embedded modulating neuron (D1, D2, D3) can have its own activation function, but usually just uses the tanh function.

If each weight parameter list is of length 1, then there is only a single dedicated modulating neuron, and the dedicated neuron's output can be designated as the learning parameter h. The learning parameters A, B, C, and D can be specified by the neural_parameters list. Or we can have the weight parameters list be of size 2, and thus specify 2 dedicated modulating neurons, whose outputs would dictate the learning parameters h and A, with the other parameters specified in the neural_parameters list. Finally, we can have the weight parameters list be of length 5, thus representing the synaptic weights of 5 dedicated modulating neurons, whose outputs specify all the parameters (h, A, B, C, D) of the General Hebbian learning rule.
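For concreteness, the three configurations would give a neuron's input_idps entries the following shapes (ids and values are made up; Input_PId stands for a presynaptic element's id):

%1 embedded modulatory neuron: each weight carries [h_weight]
[{Input_PId,[{0.3,[0.12]},{-0.7,[-0.05]}]}]
%2 embedded modulatory neurons: each weight carries [h_weight,a_weight]
[{Input_PId,[{0.3,[0.12,0.4]},{-0.7,[-0.05,0.2]}]}]
%5 embedded modulatory neurons: [h_weight,a_weight,b_weight,c_weight,d_weight]
[{Input_PId,[{0.3,[0.12,0.4,-0.3,0.1,0.0]}]}]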


Having 5 separate dedicated modulating neurons does have its problems though, because it magnifies the number of synaptic weights/parameters our neuroevolutionary system has to tune, mutate, and set up. If our original neuron, without plasticity, had a synaptic weight list of size 10, this new modulated neuron would have 60 synaptic weight parameters for the same 10 inputs. All of these parameters would somehow have to be specified, tuned, and made to work perfectly with each other, and this would all be for only a single neuron. Nevertheless, it is an efficient implementation of the idea, and would be easy to add due to the way our neuroevolutionary system's architecture is set up.

Allowing for general neuromodulation (Fig-15.3), so that the postsynaptic neuron can designate some of the presynaptic signals as holding standard information, and others as holding modulatory information, can be done in a number of ways. Let us consider a few such approaches next:

1. This approach would require us adding a new element to the neuron record, akin to input_idps. We could add a secondary such element and designate it input_idps_modulation. It too would be represented as a list of tuples: [{Input_Id,Weight}...], but the resulting computed dot product, sent through its own activation function, would be used as a learning parameter. But which of the learning parameters? H, A, B, C, or D? The standard approach is to use the following equation: Updated_W = M_Output*H*(A*I*Output + B*I + C*Output + D), where M_Output is the output signal produced by processing the input signals using the synaptic weights specified in the input_idps_modulation list, and where the parameters H, A, B, C, and D are simply neural_parameters, which like the other parameters can be perturbed and evolved during the tuning phase and/or during the topological mutation phase.

How would the postsynaptic neuron decide whether the new connection (added during the topological mutation phase) should be used as a standard signal, and thus be added to the input_idps list, or as a modulatory input signal, and thus added to the input_idps_modulation list? We could set up a rule so that if the neuron is designated to have general modulation based plasticity, the very first connection to the neuron is designated as standard input, and then any new connections are randomly sorted into either the input_idps or input_idps_modulation lists. To add this approach would only require adding a new list, and we would already have all the necessary functions to mutate its parameters, to clone it during the neuronal cloning process, and to process input signals, because this new list would be exactly like the input_idps list. The overhead of simply adding this extra parameter, input_idps_modulation, to the neuron record, would be minuscule, and this architecture is what was represented in Fig-15.4.

2. Another way a neuron could decide on whether the presynaptic signal sent to it is standard or modulatory, is by us having neuronal types, where some neurons are type: standard, and others are type: modulatory. The signals sent by modulatory neurons are always used by all postsynaptic neurons for modulating the generalized Hebbian plasticity rule. The architecture of this type of system is shown in Fig-15.7. In this figure I show a NN topology composed of standard neurons (std), and modulatory neurons (mod). They are all interconnected, each can receive signals from any other. The difference in how those signals are processed is dependent on the presynaptic neuron's type. If it is of type mod, then the signal is used as modulatory; if it is of type std, then it is used as a standard input signal. Modulatory neurons can even modulate other modulatory neurons, while the outputs of the standard neurons can be used by both standard and modulatory neurons.

Fig. 15.7 A topology of a heterosynaptic, general, neural network system with neurons of type standard (std) and modulatory (mod).

3. But the first and second implementations do not solve the problem that the Hebbian learning rule uses multiple parameters, and we want to have the flexibility to specify 1 or more of them, based on the incoming modulatory signals. Another solution that does solve this is by tagging input signals with tags i, h, a, b, c, d, where i tags the standard inputs, and h, a, b, c, and d tag the modulatory input signals associated with the so named modulating learning parameter. Though this may at first glance seem like a more complex solution, we actually already have solved it, and it would require us only changing a few functions.

We are already generating weight based parameters. Thus far they have been lists, but they can also be atomic tags as follows: [{Input_PId, [{Weight1,Tag1}, {Weight2,Tag2}...]}...]. This is a clean solution that would allow us to designate different incoming signals to be used for different things. Mutation operators would not need to be modified significantly either; we would simply add a clause stating that if the neuron uses the general_modulation plasticity function, then the Tag is generated randomly from the following list: [i, h, a, b, c, d]. The most significant modification would have to be done to the signal_aggregation function, since we would need to sort the incoming signals based on their tags, and then calculate the different output signals based on their tags, with the i output signal being the standard one produced by the postsynaptic neuron, and the h, a, b, c, and d output signals being used as modulatory learning parameters. But even that could be isolated to just the plasticity function, which has access to the IAcc, Input_PIdPs, and everything else necessary to compute output signals. The architecture of a neuron using this approach to general neuromodulation is shown in Fig-15.8.

Fig. 15.8 Tag based architecture of a general neuromodulation capable neural network.
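Though we will not implement this tag based approach here, the sorting step it would require can be sketched in a few lines (a hypothetical helper, assuming each synaptic weight is stored as a {Weight,Tag} tuple complementary to its input value):

sort_by_tag([I|Is],[{W,i}|WTs],IAcc,MAcc)->
	sort_by_tag(Is,WTs,[{I,W}|IAcc],MAcc);
sort_by_tag([I|Is],[{W,Tag}|WTs],IAcc,MAcc)->
	sort_by_tag(Is,WTs,IAcc,[{Tag,I,W}|MAcc]);
sort_by_tag([],[],IAcc,MAcc)->
	{lists:reverse(IAcc),lists:reverse(MAcc)}.

The first element of the returned tuple would feed the standard dot product and activation function, while the tagged weighted inputs in the second element would be accumulated per tag to produce the h, a, b, c, and d modulatory learning parameters.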

What is the computational difference between all of these neuromodulation approaches? How would the neural networks act differently when evolved with one approach rather than another? Would it even be possible to see the difference? Should we implement them all, provide all of these options to the neuroevolutionary system in hopes that it can sort things out on its own, and use the best one (throwing everything at the wall, and seeing what sticks)? How do we test which of these plasticity type architectures is better? How do we define "better"? Do we define it as the NN evolving faster (the neuroevolutionary system taking a smaller number of evaluations to evolve a solution for some given problem)? Or do we define better as the evolved NNs being more dynamic, more adaptive, more general, but evolved slower due to the many different parameters the evolutionary process has to deal with? These are all open research questions.

We cannot test the effectiveness of plasticity enabled neural network systems on the standard double pole balancing, xor, or clustering type of benchmarks and tests. To test how well a plasticity enabled NN system functions, we need to apply our neuroevolutionary system to a problem where the environment changes, where adaptation and learning over time gives an advantage. We could test plasticity by using it in the ALife simulation, T-Maze and double T-Maze navigation [2,3], or by applying it to some other robotics & complex navigation project. The small differences between these various modulatory approaches might require a lot of work to see, since evolution will tend to go around any small problems posed by any one implementation or architecture over another. Nevertheless, the fact that it is so easy for us to implement, test, and research these advanced learning rules and plasticity approaches, means that we can find out, we can determine what works better, and what approach will yield a more general, more intelligent, neural network based agent. Had our system not been written in Erlang, adding neuroplasticity would have posed a much greater problem.

We will implement the dedicated neuromodulators (where the weight parameters represent the synaptic weights of embedded secondary neurons, whose output dictates the parameters of the general Hebbian learning rule), and the general neuromodulation plasticity through the use of the input_idps_modulation element. Our plasticity function using the first of these two approaches will be called: self_modulation, and the second: general_modulation. In the next section we will further define and implement these neuromodulatory based learning rules.

15.3.2 Implementing the self_modulation Learning Rules

We will first implement the self_modulation plasticity function. Given the general Hebbian learning rule for synaptic weight updating: Updated_Wi = Wi + H*(A*Ii*Output + B*Ii + C*Output + D), we can have multiple versions of this function. Version-1: where the secondary embedded neuron only outputs the H learning parameter, with the parameter A set to some predetermined constant value within the neural_parameters list, and B=C=D=0. Version-2: where A is generated randomly when generating the neural_parameters list, and B=C=D=0. Version-3: where B, C, and D are also generated randomly in the neural_parameters list. Version-4: where the weight_parameters generates a list of length 2, thus allowing the neuron to have 2 embedded modulatory neurons, one outputting a parameter we use for H, and another outputting the value we can use as A, with B=C=D=0. Version-5: where B, C, and D are generated randomly by the PlasticityFunctionName(neural_parameters) function. And finally Version-6: where the weight_parameters produces a list of length 5, allowing the neuron to have 5 embedded modulatory neurons, whose outputs are used for H, A, B, C, and D. All of these variations will have most of their functionality shared, and thus will be quick and easy to implement.

The self_modulationV1, self_modulationV2, and self_modulationV3 are all very similar, mainly differing in the parameter lists returned by the PlasticityFunctionName(neural_parameters) function, as shown in Listing 15.6. All three of these plasticity functions use the neuromodulation/5 function which accepts the H, A, B, C, and D learning parameters, and updates the synaptic weights of the neuron using the general Hebbian rule: Updated_Wi = Wi + H*(A*Ii*Output + B*Ii + C*Output + D).


Listing-15.6 The self_modulationV1-3 functions of arity 1, generating the neural and weight parameters.

self_modulationV1(neural_parameters)->

A=0.1,

B=0,

C=0,

D=0,

[A,B,C,D];

self_modulationV1(weight_parameters)->

[(random:uniform()-0.5)].

self_modulationV1([A,B,C,D],IAcc,Input_PIdPs,Output)->

H = math:tanh(dot_productV1(IAcc,Input_PIdPs)),

neuromodulation([H,A,B,C,D],IAcc,Input_PIdPs,Output,[]).

dot_productV1(IAcc,IPIdPs)->

dot_productV1(IAcc,IPIdPs,0).

dot_productV1([{IPId,Input}|IAcc],[{IPId,WeightsP}|IPIdPs],Acc)->

Dot = dotV1(Input,WeightsP,0),

dot_productV1(IAcc,IPIdPs,Dot+Acc);

dot_productV1([],[{bias,[{_Bias,[H_Bias]}]}],Acc)->

Acc + H_Bias;

dot_productV1([],[],Acc)->

Acc.

dotV1([I|Input],[{_W,[H_W]}|Weights],Acc) ->

dotV1(Input,Weights,I*H_W+Acc);

dotV1([],[],Acc)->

Acc.

neuromodulation([H,A,B,C,D],[{IPId,Is}|IAcc],[{IPId,WPs}|Input_PIdPs],Output,Acc)->

Updated_WPs = genheb_rule([H,A,B,C,D],Is,WPs,Output,[]),

neuromodulation([H,A,B,C,D],IAcc,Input_PIdPs,Output,[{IPId,Updated_WPs}|Acc]);

neuromodulation(_NeuralParameters,[],[],_Output,Acc)->

lists:reverse(Acc);

neuromodulation([H,A,B,C,D],[],[{bias,WPs}],Output,Acc)->

Updated_WPs = genheb_rule([H,A,B,C,D],[1],WPs,Output,[]),

lists:reverse([{bias,Updated_WPs}|Acc]).

genheb_rule([H,A,B,C,D],[I|Is],[{W,Ps}|WPs],Output,Acc)->

Updated_W = functions:saturation(W + H*(A*I*Output + B*I + C*Output + D), ?SAT_LIMIT),

genheb_rule([H,A,B,C,D],Is,WPs,Output,[{Updated_W,Ps}|Acc]);

genheb_rule(_H,[],[],_Output,Acc)->


lists:reverse(Acc).

self_modulationV2(neural_parameters)->

A=(random:uniform()-0.5),

B=0,

C=0,

D=0,

[A,B,C,D];

self_modulationV2(weight_parameters)->

[(random:uniform()-0.5)].

self_modulationV2([A,B,C,D],IAcc,Input_PIdPs,Output)->

H = math:tanh(dot_productV1(IAcc,Input_PIdPs)),

neuromodulation([H,A,B,C,D],IAcc,Input_PIdPs,Output,[]).

self_modulationV3(neural_parameters)->

A=(random:uniform()-0.5),

B=(random:uniform()-0.5),

C=(random:uniform()-0.5),

D=(random:uniform()-0.5),

[A,B,C,D];

self_modulationV3(weight_parameters)->

[(random:uniform()-0.5)].

self_modulationV3([A,B,C,D],IAcc,Input_PIdPs,Output)->

H = math:tanh(dot_productV1(IAcc,Input_PIdPs)),

neuromodulation([H,A,B,C,D],IAcc,Input_PIdPs,Output,[]).

The self_modulationV4 and self_modulationV5 functions differ only in that the weight_parameters list is of length 2, and the A parameter is no longer specified in the neural_parameters list, being instead calculated by the second dedicated modulatory neuron. The self_modulationV6 function on the other hand specifies neural_parameters as an empty list, with the weight_parameters list of length 5, a single weight for every embedded modulatory neuron. The implementation of self_modulationV6 is shown in Listing-15.7.
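For reference, a sketch of what self_modulationV4 might look like, mirroring the V6 pattern of Listing-15.7 below (the dot_productV4 and dotV4 helpers are assumed analogues of their V6 counterparts, accumulating two sums instead of five; this is a sketch, not one of the book's listings):

self_modulationV4(neural_parameters)->
	B=0,
	C=0,
	D=0,
	[B,C,D];
self_modulationV4(weight_parameters)->
	[(random:uniform()-0.5),(random:uniform()-0.5)].

self_modulationV4([B,C,D],IAcc,Input_PIdPs,Output)->
	{AccH,AccA} = dot_productV4(IAcc,Input_PIdPs),
	H = math:tanh(AccH),
	A = math:tanh(AccA),
	neuromodulation([H,A,B,C,D],IAcc,Input_PIdPs,Output,[]).

dot_productV4(IAcc,IPIdPs)->
	dot_productV4(IAcc,IPIdPs,0,0).
dot_productV4([{IPId,Input}|IAcc],[{IPId,WeightsP}|IPIdPs],AccH,AccA)->
	{DotH,DotA} = dotV4(Input,WeightsP,0,0),
	dot_productV4(IAcc,IPIdPs,DotH+AccH,DotA+AccA);
dot_productV4([],[{bias,[{_Bias,[H_Bias,A_Bias]}]}],AccH,AccA)->
	{AccH+H_Bias,AccA+A_Bias};
dot_productV4([],[],AccH,AccA)->
	{AccH,AccA}.

dotV4([I|Input],[{_W,[H_W,A_W]}|Weights],AccH,AccA)->
	dotV4(Input,Weights,I*H_W+AccH,I*A_W+AccA);
dotV4([],[],AccH,AccA)->
	{AccH,AccA}.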

Listing-15.7 The implementation of the self_modulationV6 plasticity function, composed of 5 embedded modulatory neurons.

self_modulationV6(neural_parameters)->

[];

self_modulationV6(weight_parameters)->

[(random:uniform()-0.5),(random:uniform()-0.5),(random:uniform()-0.5),

(random:uniform()-0.5),(random:uniform()-0.5)].


self_modulationV6(_Neural_Parameters,IAcc,Input_PIdPs,Output)->

{AccH,AccA,AccB,AccC,AccD} = dot_productV6(IAcc,Input_PIdPs),

H = math:tanh(AccH),

A = math:tanh(AccA),

B = math:tanh(AccB),

C = math:tanh(AccC),

D = math:tanh(AccD),

neuromodulation([H,A,B,C,D],IAcc,Input_PIdPs,Output,[]).

dot_productV6(IAcc,IPIdPs)->

dot_productV6(IAcc,IPIdPs,0,0,0,0,0).

dot_productV6([{IPId,Input}|IAcc],[{IPId,WeightsP}|IPIdPs],AccH,AccA,AccB,AccC,AccD)->

{DotH,DotA,DotB,DotC,DotD} = dotV6(Input,WeightsP,0,0,0,0,0),

dot_productV6(IAcc,IPIdPs,DotH+AccH,DotA+AccA,DotB+AccB,DotC+AccC,DotD+AccD);

dot_productV6([],[{bias,[{_Bias,[H_Bias,A_Bias,B_Bias,C_Bias,D_Bias]}]}],AccH,AccA,AccB,AccC,AccD)->

{AccH+H_Bias,AccA+A_Bias,AccB+B_Bias,AccC+C_Bias,AccD+D_Bias};

dot_productV6([],[],AccH,AccA,AccB,AccC,AccD)->

{AccH,AccA,AccB,AccC,AccD}.

dotV6([I|Input],[{_W,[H_W,A_W,B_W,C_W,D_W]}|Weights],AccH,AccA,AccB,AccC,AccD)->

dotV6(Input,Weights,I*H_W+AccH,I*A_W+AccA,I*B_W+AccB,I*C_W+AccC,I*D_W+AccD);

dotV6([],[],AccH,AccA,AccB,AccC,AccD)->

{AccH,AccA,AccB,AccC,AccD}.

The architecture of the neuron using this particular plasticity function is shown in Fig-15.9. Since every synaptic weight of this neuron has a complementary parameter list of length 5, with an extra synaptic weight for every secondary, embedded modulatory neuron that analyzes the same signals as the actual neuron, but whose output signals modulate the plasticity of the neuron, each neuron has 5 times the number of parameters (synaptic weights) that need to be tuned. This might be too high a price to pay, as it amplifies the curse of dimensionality: the more parameters that need to be tuned and set up concurrently, the more difficult it is to find a good combination of such parameters. Nevertheless, the generality it provides, and the ability to use a single process to represent multiple embedded modulatory neurons, has its benefits in computational efficiency. Plus, our system does after all try to alleviate the curse of dimensionality through Targeted Tuning, by concentrating on the newly added and affected neurons of the NN system. And thus we might just manage to stay ahead of this problem.


Fig. 15.9 The architecture of the neuron using the self_modulationV6 plasticity function.

We noted earlier that there is another approach to neuromodulation, one that is more biologically faithful, in which a postsynaptic neuron uses some of the signals coming from the presynaptic neurons as modulatory signals, and others as standard signals. In the next section we will see what needs to be done to implement such a learning rule.

15.3.3 Implementing the input_idps_modulation Based Neuromodulated Plasticity

To implement neuromodulation using this method, we first modify the neuron's record by adding the input_idps_modulation element to it. The input_idps_modulation element will have the same purpose and formatting as the input_idps element: to hold a list of tuples of the form: {Input_PId,WeightP}. The Input_PIds will be associated with the elements that send the postsynaptic neuron its modulatory signals, with the WeightP being of the same format as in the input_idps list.
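Assuming the neuron record used in the earlier chapters, the modification amounts to a single new field; the surrounding field names below follow the conventions used so far and are shown only for orientation:

-record(neuron,{id, generation, cx_id, af, pf, aggr_f, input_idps=[], input_idps_modulation=[], output_ids=[], ro_ids=[]}).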

This particular implementation of neuromodulation will not require a lot of work, due to the input_idps_modulation list having a format which we already can process with the developed functions. The neuron cloning function in the genotype can be used to clone this list, and the Id to PId conversion performed by the exoself to compose the Input_PIdPs list is also viable here. Even the synaptic weight perturbation can be applied to this list, due to it having such a similar format. The main changes we have to perform are to the neuron's main loop.

We must convert the neuron's main loop such that it can support 2 Input_PId lists, the SI_PIds (standard input PId list), and the MI_PIds (modulatory input PId list), in the same way that the original neuron implementation supported the single Input_PIds list created from the Input_PIdPs. With these two lists we can then aggregate the input signals, and sort them either into the standard input signal accumulator, or the modulatory signal accumulator, dependent on whether the incoming signal came from an element with an SI_PId or an MI_PId.

To make the implementation and the source code cleaner, we will create a state record for the neuron, which will contain all the necessary elements it requires for operation:

-record(state,{

id,

cx_pid,

af,

pf,

aggrf,

si_pids=[],

si_pidps_current=[],

si_pidps_backup=[],

mi_pids=[],

mi_pidps_current=[],

mi_pidps_backup=[],

output_pids=[],

ro_pids=[]

}).

With this state record, we update the prep/1 function to use it, and clean the original loop function to hide all the non-immediately used lists and data in the state record. As in the original neuron process implementation, we have to create the Input_PId list so that the incoming signals can be sorted in the same order that the Input_PIdPs are sorted. This time though, we have two such lists, designated as the SI_PIdPs (the standard one), and the MI_PIdPs (the modulatory one). Thus we create two PId lists for the loop.

The main problem here is that as the neuron accumulates its input signals, one of these PId lists will empty out first, which would require a new clause to deal with it, since our main loop uses: [SI_PId|SI_PIds],[MI_PId|MI_PIds]. We did not have such a problem when we only used a single list, because when that list emptied out, the signal accumulation was finished. To avoid having to create a new clause, we add the atom ok to the end of both PId lists, and put the clause: loop(S,ExoSelf_PId,[ok],[ok],SIAcc,MIAcc) above the main loop. Because of the ok atom at the end of both lists, neither goes empty, letting us keep a single clause with the final state for both lists being [ok], which is achieved after the neuron has accumulated all the incoming standard and modulatory signals. The only problem with this setup is that the first clause is always pattern matched before the main loop, making the neuron process slower and less efficient. There are other ways to implement this, and we could even set up two different main process loops, one for when the neuron uses neuromodulation, and one for when it does not (and thus needs only a single PId list). But this implementation is the most concise and cleanest. The neuron process can always be optimized later on. The modified prep/1 function, and the neuron's new main loop, are shown in Listing-15.8.

Listing-15.8 The updated implementation of the neuron process.

prep(ExoSelf_PId) ->

random:seed(now()),

receive

{ExoSelf_PId,{Id,Cx_PId,AF,PF,AggrF,SI_PIdPs,MI_PIdPs,Output_PIds,

RO_PIds}} ->

fanout(RO_PIds,{self(),forward,[?RO_SIGNAL]}),

SI_PIds = lists:append([IPId || {IPId,_W} <- SI_PIdPs, IPId =/= bias],[ok]),

MI_PIds = lists:append([IPId || {IPId,_W} <- MI_PIdPs, IPId =/= bias],[ok]),

io:format("SI_PIdPs:~p~nMI_PIdPs:~p~n",[SI_PIdPs,MI_PIdPs]),

S=#state{

id=Id,

cx_pid=Cx_PId,

af=AF,

pf=PF,

aggrf=AggrF,

si_pids=SI_PIds,

si_pidps_current=SI_PIdPs,

si_pidps_backup=SI_PIdPs,

mi_pids=MI_PIds,

mi_pidps_current=MI_PIdPs,

mi_pidps_backup=MI_PIdPs,

output_pids=Output_PIds,

ro_pids=RO_PIds

},

loop(S,ExoSelf_PId,SI_PIds,MI_PIds,[],[])

end.

%When gen/1 is executed, it spawns the neuron element and immediately begins to wait for its initial state message from the exoself. Once the state message arrives, the neuron sends out the default forward signals to any elements in its ro_ids list, if any. Afterwards, the prep function drops into the neuron's main loop.

loop(S,ExoSelf_PId,[ok],[ok],SIAcc,MIAcc)->

PF = S#state.pf,

AF = S#state.af,

AggrF = S#state.aggrf,

{PFName,PFParameters} = PF,

Ordered_SIAcc = lists:reverse(SIAcc),


SI_PIdPs = S#state.si_pidps_current,

SAggregation_Product = signal_aggregator:AggrF(Ordered_SIAcc,SI_PIdPs),

SOutput = functions:AF(SAggregation_Product),

Output_PIds = S#state.output_pids,

[Output_PId ! {self(),forward,[SOutput]} || Output_PId <- Output_PIds],

Ordered_MIAcc = lists:reverse(MIAcc),

MI_PIdPs = S#state.mi_pidps_current,

MAggregation_Product = signal_aggregator:dot_product(Ordered_MIAcc,MI_PIdPs),

MOutput = functions:tanh(MAggregation_Product),

U_SI_PIdPs = plasticity:PFName([MOutput|PFParameters],Ordered_SIAcc,SI_PIdPs,SOutput),

U_S=S#state{

si_pidps_current = U_SI_PIdPs

},

SI_PIds = S#state.si_pids,

MI_PIds = S#state.mi_pids,

loop(U_S,ExoSelf_PId,SI_PIds,MI_PIds,[],[]);

loop(S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc)->

receive

{SI_PId,forward,Input}->

loop(S,ExoSelf_PId,SI_PIds,[MI_PId|MI_PIds],[{SI_PId,Input}|SIAcc],MIAcc);

{MI_PId,forward,Input}->

loop(S,ExoSelf_PId,[SI_PId|SI_PIds],MI_PIds,SIAcc,[{MI_PId,Input}|MIAcc]);

{ExoSelf_PId,weight_backup}->

U_S = S#state{

si_pidps_backup=S#state.si_pidps_current,

mi_pidps_backup=S#state.mi_pidps_current

},

loop(U_S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc);

{ExoSelf_PId,weight_restore}->

U_S = S#state{

si_pidps_current=S#state.si_pidps_backup,

mi_pidps_current=S#state.mi_pidps_backup

},

loop(U_S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc);

{ExoSelf_PId,weight_perturb,Spread}->

Perturbed_SIPIdPs=perturb_IPIdPs(Spread,S#state.si_pidps_backup),

Perturbed_MIPIdPs=perturb_IPIdPs(Spread,S#state.mi_pidps_backup),

U_S = S#state{

si_pidps_current=Perturbed_SIPIdPs,

mi_pidps_current=Perturbed_MIPIdPs

},


loop(U_S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc);

{ExoSelf_PId,reset_prep}->

neuron:flush_buffer(),

ExoSelf_PId ! {self(),ready},

RO_PIds = S#state.ro_pids,

receive

{ExoSelf_PId, reset}->

fanout(RO_PIds,{self(),forward,[?RO_SIGNAL]})

end,

loop(S,ExoSelf_PId,S#state.si_pids,S#state.mi_pids,[],[]);

{ExoSelf_PId,get_backup}->

NId = S#state.id,

ExoSelf_PId ! {self(),NId,S#state.si_pidps_backup,S#state.mi_pidps_backup},

loop(S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc);

{ExoSelf_PId,terminate}->

io:format("Neuron:~p is terminating.~n",[self()])

end.

With the implementation of the updated neuron now complete, we need to create the neuromodulation function in the plasticity module. Since the modulatory signals will be used to compute a nonlinear value used to modulate the standard general Hebbian rule, we will not need any weight_parameters, and so our plasticity function will produce an empty weight_parameters list. But we will need the general neural_parameters for the Hebbian function, thus the neuromodulation/1 function executed with the neural_parameters atom will return a list with 5 randomly generated (and later tuned and evolved) parameters: [H,A,B,C,D]. The neuromodulation/4 function is very simple, since it is executed with a list of all the necessary parameters to call the neuromodulation/5 function that applies the general Hebbian rule to all the synaptic weights. These two added functions are shown in Listing-15.9.

Listing-15.9 The implementation of the neuromodulation/1 and neuromodulation/4 functions.

neuromodulation(neural_parameters)->

H = (random:uniform()-0.5),

A = (random:uniform()-0.5),

B = (random:uniform()-0.5),

C = (random:uniform()-0.5),

D = (random:uniform()-0.5),

[H,A,B,C,D];

neuromodulation(weight_parameters)->

[].

neuromodulation([M,H,A,B,C,D],IAcc,Input_PIdPs,Output)->

Modulator = scale_dzone(M,0.33,?SAT_LIMIT),

neuromodulation([Modulator*H,A,B,C,D],IAcc,Input_PIdPs,Output,[]).


The value M is the one computed by using the synaptic weights of the input_idps_modulation list, the dot_product signal aggregator, and the hyperbolic tangent (tanh) activation function. Since H scales the plasticity in general, multiplying the Modulator value by H allows for the modulation signal to truly modulate synaptic plasticity based on the parameters evolved by the neuron.

The Modulator value is computed by executing the scale_dzone/3 function, which performs 2 tasks:

1. Zero out M if it is between -0.33 and 0.33.

2. If M is greater than 0.33 or less than -0.33, normalize and scale it to be between 0 and ?SAT_LIMIT, or 0 and -?SAT_LIMIT, respectively.

This means that M has to reach a particular magnitude for the Hebbian rule to be executed, since when the Modulator value is 0 and is multiplied by H, the weights are not updated. The scale_dzone/3 function, and its supporting function, are shown in Listing-15.10.

Listing-15.10 The implementation of the scale_dzone and scale functions.

scale_dzone(Val,Threshold,MaxMagnitude)->

if

Val > Threshold ->

(functions:scale(Val,MaxMagnitude,Threshold)+1)*MaxMagnitude/2;

Val < -Threshold ->

(functions:scale(Val,-Threshold,-MaxMagnitude)-1)*MaxMagnitude/2;

true ->

0

end.

scale(Val,Max,Min)->

case Max == Min of

true ->

0;

false ->

(Val*2 - (Max+Min))/(Max-Min)

end.

%The scale/3 function scales Val to be between -1 and 1, with the scaling dependent on the Max and Min value, using the equation: Scaled_Val = (Val*2 - (Max + Min))/(Max-Min). The function scale_dzone/3 zeroes the Val parameter if it is below the threshold, and scales it to be between Threshold and MaxMagnitude if it is above the threshold.
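As a quick sanity check of these two functions, consider the following hypothetical demo (assuming ?SAT_LIMIT is defined as math:pi()):

scale_dzone_demo()->
	Pi = math:pi(),
	[scale_dzone(Val,0.33,Pi) || Val <- [0.2,0.5,-0.5,Pi,-Pi]].
%Returns approximately [0,0.19,-0.19,3.14,-3.14]. Values inside the dead zone
%are zeroed, while values beyond it are scaled toward +/-Pi.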

Though we have now successfully implemented the autoassociative learning rules, and neuromodulation, we cannot use those features until we create the necessary tuning and mutation operators, such that our neuroevolutionary system can actually tune the various learning parameters, and add the synaptic weights needed by the neuromodulation functionality. We discuss and implement these necessary features in the next section.

15.4 Plasticity Parameter Mutation Operators


For the plasticity based learning rules to be useful, our neuroevolutionary system must be able to optimize them. For this we need to create new mutation operators. Though we could add the new mutation operators to the genome_mutator module, we will do something different instead. Since each plasticity function has its own restrictions (which learning parameters can/should be modified, and which can/should not be), and because there are so many different variants, with many more to be added as time goes on, it would not be effective to create these mutation operators inside the genome_mutator module. The genome_mutator should concentrate on the standard topology oriented mutation operators.

To more effectively handle this, we can offload these specialized mutation operators in the same way we offloaded the generation of the initial plasticity parameters, to the plasticity module itself. We can add a single mutation operator, mutate_plasticity_parameters, which when executed, executes the plasticity:PFName({N_Id,mutate}) function. Then the researcher who created the various plasticity function variants and types can also create the mutation operator functions for them, whether they simply perturb neural level learning parameters, synaptic weight level parameters, or perform a more complex mutation. And of course if the plasticity function is set to none, we will have the function plasticity:none({N_Id,mutate}) execute: exit("Neuron does not support plasticity."), which will allow our neuroevolutionary system to attempt another mutation operator, without wasting the topological mutation try.
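For completeness, the none/1 clauses implied by the above could look as follows (a minimal sketch, consistent with how the other plasticity functions are structured):

none({_N_Id,mutate})->
	exit("Neuron does not support plasticity.");
none(neural_parameters)->
	[];
none(weight_parameters)->
	[].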

The plasticity specializing mutation operators should perform the following general operations:

If the neuron uses neural_parameters, randomly choose between 1 and math:sqrt(TotParameters) number of parameters, and perturb them with a value selected randomly between -Pi and Pi.

If the neuron uses weight_parameters, randomly choose between 1 and math:sqrt(TotWeightParameters) number of parameters, and perturb them with a value selected randomly between -Pi and Pi.

If the neuron uses both neural_parameters and weight_parameters, randomly choose one or the other, and perturb that parameter list using the corresponding approach above.

Neuromodulation is a special case, since it not only has the global neural level parameters which can be mutated/perturbed using the standard method listed above, but also allows for the establishment of new modulatory connections. Because the input_idps_modulation list has the same format as the standard input_idps list, we can use the already existing synaptic connection establishing mutation operators and functions. The only modification we need to make, so that some of the connections are standard and others are modulatory, is to add a case such that if the neuron to which the connection is being established has neuromodulation enabled, then the choice of whether the new connection will be standard or modulatory is 50/50, and if neuromodulation is not enabled, then only the standard connection is allowed.

15.4.1 Implementing the Weight Parameter Mutation Operator

We first create the mutation operators which are applied to the weight_parameters. This mutation operator, executed when the plasticity function is run with the parameter: {N_Id,mutate}, performs similarly to the standard perturb_IPIdPs/2 function, but instead of mutating the synaptic weights, it operates on and mutates the parameter values. The probability for any weight parameter to be perturbed is 1/math:sqrt(TotParameters). The plasticity functions that only use weight_parameters are the hebbian_w and ojas_w. Because both of these plasticity functions use the same implementation for the mutator, only the hebbian_w/1 version is shown (the difference for the ojas_w version is that instead of hebbian_w({N_Id,mutate}), we have ojas_w({N_Id,mutate})). This implementation is shown in Listing-15.11.

Listing-15.11 Implementation of the plasticity function based weight_parameter mutation operators.

hebbian_w({N_Id,mutate})->

random:seed(now()),

N = genotype:read({neuron,N_Id}),

InputIdPs = N#neuron.input_idps,

U_InputIdPs=perturb_parameters(InputIdPs,?SAT_LIMIT),

N#neuron{input_idps = U_InputIdPs};

hebbian_w(neural_parameters)->

[];

hebbian_w(weight_parameters)->

[(random:uniform()-0.5)].

%hebbian_w/1 function produces the necessary parameter list for the hebbian_w learning rule to operate. The parameter list for the simple hebbian_w learning rule is a parameter list composed of a single parameter H: [H], for every synaptic weight of the neuron. When hebbian_w/1 is called with the parameter neural_parameters, it returns []. When hebbian_w/1 is executed with the {N_Id,mutate} tuple, the function goes through every parameter in the neuron's input_idps, and perturbs the parameter value using the specified spread (?SAT_LIMIT).

perturb_parameters(InputIdPs,Spread)->

TotParameters = lists:sum([lists:sum([length(Ps) || {_W,Ps} <- WPs]) || {_Input_Id,WPs} <- InputIdPs]),

MutationProb = 1/math:sqrt(TotParameters),

[{Input_Id,[{W,perturb(Ps,MutationProb,Spread,[])} || {W,Ps} <- WPs]} || {Input_Id,WPs} <- InputIdPs].

%The perturb_parameters/2 function goes through every tuple in the InputIdPs list, extracts the WeightPlus blocks for each input connection, calculates the total number of weight parameters the neuron has, and from it the probability with which those parameters will be perturbed. The function then executes perturb/4 to perturb the said parameters.

perturb([Val|Vals],MutationProb,Spread,Acc)->

case random:uniform() < MutationProb of

true ->

U_Val = sat((random:uniform()-0.5)*2*Spread+Val,Spread,-Spread),

perturb(Vals,MutationProb,Spread,[U_Val|Acc]);

false ->

perturb(Vals,MutationProb,Spread,[Val|Acc])

end;

perturb([],_MutationProb,_Spread,Acc)->

lists:reverse(Acc).

%The perturb/4 function is executed with a list of values and a probability with which each value has a chance of being perturbed. The function then goes through every value and perturbs it with the given probability.
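A hypothetical usage sketch (the id and values are made up): for a neuron with three weight parameters in total, each parameter is perturbed with probability 1/math:sqrt(3):

perturb_parameters_demo()->
	InputIdPs = [{id1,[{0.3,[0.10]},{-0.2,[0.05]}]},
		{bias,[{0.5,[-0.10]}]}],
	perturb_parameters(InputIdPs,math:pi()). %TotParameters = 3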

15.4.2 Implementing the Neural Parameter Mutation Operator

We next create the mutation operators which are applied to the neural_parameters, which are lists of values. To accomplish this, we just pass that list through a function which, with probability 1/math:sqrt(ListLength), perturbs the values within it. We add such mutation operators to the plasticity functions which only use the neural_parameters. The following plasticity functions only use the neural_parameters: hebbian, ojas, and neuromodulation. Since all 3 would use exactly the same implementation, only the neuromodulation/1 implementation is shown in Listing-15.12.

Listing-15.12 Implementation of the neural_parameters mutation operator.

neuromodulation({N_Id,mutate})->

random:seed(now()),

N = genotype:read({neuron,N_Id}),

{PFName,ParameterList} = N#neuron.pf,

MSpread = ?SAT_LIMIT*10,


MutationProb = 1/math:sqrt(length(ParameterList)),

U_ParameterList = perturb(ParameterList,MutationProb,MSpread,[]),

U_PF = {PFName,U_ParameterList},

N#neuron{pf=U_PF};

neuromodulation(neural_parameters)->

H = (random:uniform()-0.5),

A = (random:uniform()-0.5),

B = (random:uniform()-0.5),

C = (random:uniform()-0.5),

D = (random:uniform()-0.5),

[H,A,B,C,D];

neuromodulation(weight_parameters)->

[].

%neuromodulation/1 function produces the necessary parameter list for the neuromodulation learning rule to operate. The parameter list for this learning rule is a list composed of the parameters H,A,B,C,D: [H,A,B,C,D]. When the function is executed with the {N_Id,mutate} parameter, it calculates the perturbation probability of every parameter through the equation: 1/math:sqrt(length(ParameterList)), and then executes the perturb/4 function to perturb the actual parameters.

The above shown mutation operator, called by executing neuromodulation/1 with the parameter {N_Id,mutate}, uses the perturb/4 function from the weight_parameters based mutation operator which was shown in the previous listing, Listing-15.11.

15.4.3 Implementing the Hybrid, Weight & Neural Parameters Mutation Operator

Finally, we also have plasticity functions which have both neural_parameters and weight_parameters. This is the case, for example, for the self_modulationV5, V3, and V2 learning rules. For these types of plasticity functions, we create a combination of the neural_parameters and weight_parameters mutation operators, as shown in Listing-15.13.

Listing-15.13 A hybrid of the neural_parameters and weight_parameters mutation operator, implemented here for the self_modulationV5 plasticity function.

self_modulationV5({N_Id,mutate})->

random:seed(now()),

N = genotype:read({neuron,N_Id}),

{PFName,ParameterList} = N#neuron.pf,

MSpread = ?SAT_LIMIT*10,


MutationProb = 1/math:sqrt(length(ParameterList)),

U_ParameterList = perturb(ParameterList,MutationProb,MSpread,[]),

U_PF = {PFName,U_ParameterList},

InputIdPs = N#neuron.input_idps,

U_InputIdPs=perturb_parameters(InputIdPs,?SAT_LIMIT),

N#neuron{pf=U_PF,input_idps=U_InputIdPs};

self_modulationV5(neural_parameters)->

B=(random:uniform()-0.5),

C=(random:uniform()-0.5),

D=(random:uniform()-0.5),

[B,C,D];

self_modulationV5(weight_parameters)->

[(random:uniform()-0.5),(random:uniform()-0.5)].

For this plasticity module, this is all that is needed; there are only these three variants of parameter mutation operators. We now modify the genome_mutator module to include the mutate_plasticity_parameters mutation operator, and modify the functions which deal with linking neurons together, so that we can add the modulatory connection establishment functionality.

15.4.4 Updating the genome_mutator Module

Since our neuroevolutionary system can only apply to a population the mutation operators available in its constraint record, we first add the {mutate_plasticity_parameters,1} tag to the constraint's mutation_operators list. This means that the mutate_plasticity_parameters mutation operator has the same chance of being executed as any other mutation operator within the mutation_operators list. After having modified the constraint record, we add the mutate_plasticity_parameters/1 function to the genome_mutator module. It is a simple mutation operator that chooses a random neuron from the NN, and through the execution of the plasticity:PFName({N_Id,mutate}) function, mutates the plasticity parameters of that neuron, if that neuron has plasticity. If the neuron does not have plasticity enabled, then the plasticity:none/1 function is executed, which exits the mutation operator, letting our neuroevolutionary system try another mutation. The implemented mutate_plasticity_parameters/1 function is shown in Listing-15.14.

Listing-15.14 The implementation of the mutate_plasticity_parameters mutation operator.

mutate_plasticity_parameters(Agent_Id)->

A = genotype:read({agent,Agent_Id}),

Cx_Id = A#agent.cx_id,

Cx = genotype:read({cortex,Cx_Id}),

N_Ids = Cx#cortex.neuron_ids,


N_Id = lists:nth(random:uniform(length(N_Ids)),N_Ids),

N = genotype:read({neuron,N_Id}),

{PFName,_Parameters} = N#neuron.pf,

U_N = plasticity:PFName({N_Id,mutate}),

EvoHist = A#agent.evo_hist,

U_EvoHist = [{mutate_plasticity_parameters,N_Id}|EvoHist],

U_A = A#agent{evo_hist=U_EvoHist},

genotype:write(U_N),

genotype:write(U_A).

%The mutate_plasticity_parameters/1 chooses a random neuron from the NN, and mutates the parameters of its plasticity function, if present.

Having implemented the mutation operator, we now look at the connection/synaptic-link establishing functions. We need to modify these functions because we want to ensure that if the neuron uses the neuromodulation plasticity function, then some of the new connections that are added to it through evolution are randomly chosen to be modulatory connections rather than standard ones.

The functions that need to be updated are the following four:

add_bias/1: Because the input_idps_modulation can also use a bias weight.

remove_bias/1: Because the input_idps_modulation should also be able to rid itself of its bias.

link_ToNeuron/4: Which is the function that actually establishes new links, and adds the necessary tuples to the input_idps list. We should be able to randomly choose whether to add the new tuple to the standard input_idps list, or the modulatory input_idps_modulation list.

cutlink_ToNeuron/3: Which is the function which cuts the links to the neuron, and removes the synaptic weight containing tuple from the input_idps list. We should be able to randomly choose whether to remove such a tuple from the input_idps or input_idps_modulation list.

Again, because of the way we developed and modularized the code in the genome_mutator module, almost everything with regards to linking is contained in link_ToNeuron and cutlink_ToNeuron, so by just modifying those, and the add_bias/remove_bias functions, we will be done with the update.

Originally the add_bias/1 function checks whether the input_idps list already has a bias, and then adds a bias if it does not, and exits if it does. We now have to check whether the input_idps and input_idps_modulation lists already have biases. To do this, we randomly generate a value by executing random:uniform(2), which generates either 1 or 2. If the value 2 is generated, and the input_idps_modulation does not have a bias, we add one to it. Otherwise, if the input_idps list does not have a bias, we add one to it; thus in the absence of neuromodulation based plasticity, the probability of adding the bias to input_idps does not change. The modified add_bias mutation operator is shown in Listing-15.15.


Listing-15.15 The updated add_bias mutation operator.

add_bias(Agent_Id)->

A = genotype:read({agent,Agent_Id}),

Cx_Id = A#agent.cx_id,

Cx = genotype:read({cortex,Cx_Id}),

N_Ids = Cx#cortex.neuron_ids,

N_Id = lists:nth(random:uniform(length(N_Ids)),N_Ids),

Generation = A#agent.generation,

N = genotype:read({neuron,N_Id}),

SI_IdPs = N#neuron.input_idps,

MI_IdPs = N#neuron.input_idps_modulation,

{PFName,_NLParameters} = N#neuron.pf,

case {lists:keymember(bias,1,SI_IdPs), lists:keymember(bias,1,MI_IdPs), PFName == neuromodulation, random:uniform(2)} of

{_,true,true,2} ->

exit("********ERROR:add_bias:: This Neuron already has a modulatory bias part.");

{_,false,true,2} ->

U_MI_IdPs = lists:append(MI_IdPs,[{bias,[{random:uniform()-0.5,plasticity:PFName(weight_parameters)}]}]),

U_N = N#neuron{

input_idps_modulation = U_MI_IdPs,

generation = Generation},

EvoHist = A#agent.evo_hist,

U_EvoHist = [{{add_bias,m},N_Id}|EvoHist],

U_A = A#agent{evo_hist=U_EvoHist},

genotype:write(U_N),

genotype:write(U_A);

{true,_,_,1} ->

exit("********ERROR:add_bias:: This Neuron already has a bias in input_idps.");

{false,_,_,_} ->

U_SI_IdPs = lists:append(SI_IdPs,[{bias,[{random:uniform()-0.5,plasticity:PFName(weight_parameters)}]}]),

U_N = N#neuron{

input_idps = U_SI_IdPs,

generation = Generation},

EvoHist = A#agent.evo_hist,

U_EvoHist = [{{add_bias,s},N_Id}|EvoHist],

U_A = A#agent{evo_hist=U_EvoHist},

genotype:write(U_N),

genotype:write(U_A)

end.


The remove_bias is modified in the same manner, and only a few elements of the source code are changed. Like the add_bias, we update the link_ToNeuron/4 function to randomly choose whether to make the new link modulatory or standard, and only if the chosen list (either input_idps or input_idps_modulation) does not already have a link from the specified presynaptic element. The updated function is shown in Listing-15.16.

Listing-15.16 The updated link_ToNeuron/4 function.

link_ToNeuron(FromId,FromOVL,ToN,Generation)->

ToSI_IdPs = ToN#neuron.input_idps,

ToMI_IdPs = ToN#neuron.input_idps_modulation,

{PFName,_NLParameters}=ToN#neuron.pf,

case {lists:keymember(FromId,1,ToSI_IdPs),lists:keymember(FromId,1,ToMI_IdPs)} of

{false,false} ->

case {PFName == neuromodulation, random:uniform(2)} of

{true,2} ->

U_ToMI_IdPs = [{FromId,genotype:create_NeuralWeightsP(PFName,FromOVL,[])}|ToMI_IdPs],

ToN#neuron{

input_idps_modulation = U_ToMI_IdPs,

generation = Generation

};

_ ->

U_ToSI_IdPs = [{FromId,genotype:create_NeuralWeightsP(PFName,FromOVL,[])}|ToSI_IdPs],

ToN#neuron{

input_idps = U_ToSI_IdPs,

generation = Generation

}

end;

_ ->

exit(io_lib:format("ERROR:add_NeuronI::[cannot add I_Id]: ~p already connected to ~p~n",[FromId,ToN#neuron.id]))

end.

%link_ToNeuron/4 updates the record of ToN, so that it is updated to receive a connection from the element FromId. The link emanates from the element with the id FromId, whose output vector length is FromOVL, and the connection is made to the neuron ToN. In this function, either the ToN's input_idps_modulation or input_idps list is updated with the tuple {FromId, [{W_1,WPs} ...{W_FromOVL,WPs}]}. Whether input_idps or input_idps_modulation is updated, is chosen randomly. Then the neuron's generation is updated to Generation (the current, most recent generation). After this, the updated ToN record is returned to the caller. On the other hand, if the FromId is already part of the ToN's input_idps or input_idps_modulation list (dependent on which was randomly chosen), which means that the standard or modulatory link already exists between the neuron ToN and element FromId, this function exits with an error.


Finally, we update the cutlink_ToNeuron/3 function. In this case, since there can only be one link between two elements, we first check whether the specified link is present in the input_idps list, and cut it if it is. If it is not, we check the input_idps_modulation list next, and cut the link if it is modulatory. If such a link does not exist in either of the two lists, we exit the mutation operator with an error, printing to console that the specified link does not exist, neither in the synaptic weights list, nor in the synaptic parameters list. The implementation of the cutlink_ToNeuron/3, is shown in Listing-15.17.

Listing-15.17 The cutlink_ToNeuron/3 implementation.

cutlink_ToNeuron(FromId,ToN,Generation)->

ToSI_IdPs = ToN#neuron.input_idps,

ToMI_IdPs = ToN#neuron.input_idps_modulation,

Guard1 = lists:keymember(FromId, 1, ToSI_IdPs),

Guard2 = lists:keymember(FromId, 1, ToMI_IdPs),

if

Guard1->

U_ToSI_IdPs = lists:keydelete(FromId,1,ToSI_IdPs),

ToN#neuron{

input_idps = U_ToSI_IdPs,

generation = Generation};

Guard2 ->

U_ToMI_IdPs = lists:keydelete(FromId,1,ToMI_IdPs),

ToN#neuron{

input_idps = U_ToMI_IdPs,

generation = Generation};

true ->

exit(io_lib:format("ERROR[cannot remove I_Id]: ~p not a member of ~p~n",[FromId,ToN#neuron.id]))

end.

%cutlink_ToNeuron/3 cuts the connection on the ToNeuron (ToN) side. The function first checks if the FromId is a member of the ToN's input_idps list; if it is not, then the function checks if it is a member of the input_idps_modulation list. If it is not a member of either, the function exits with an error. If FromId is a member of one of these lists, then that tuple is removed from that list, and the updated ToN record is returned to the caller.

With these updates completed, the genome_mutator module is up to date. In the case that plasticity is enabled in any neuron, the topological mutation phase will be able to mutate the plasticity function learning parameters, and add modulatory connections when the plasticity function is neuromodulation. The only remaining update we have to make is to the tuning phase related functions.


15.5 Tuning of a NN which has Plastic Neurons

It can be argued whether, during the tuning phase, both standard synaptic weights and modulatory synaptic weights should be perturbed at the same time when the neuron has plasticity enabled, or just one or the other separately. For example, should we allow the neural_parameters to be perturbed during the tuning phase, rather than only during the topological mutation phase? What percentage of tuning should be dedicated to learning parameters and what percentage to synaptic weights? This can of course be tested, benchmarked, and in general deduced through experimentation. But even after it has been decided what and when to tune with regards to learning rules, there is still a problem with regards to the parameter and synaptic weight backup during the tuning phase. The main subject of this section is this dilemma, the dilemma of the backup process of the tuned weights.

Consider a neuron that has plasticity enabled, no matter what plasticity function it's using. The following scenario occurs when the neuron is perturbed:

1. The neuron receives a perturbation request.

2. Neuron selects random synaptic weights, weight_parameters, or even neural_parameters (though we do not allow for neural_parameters perturbation during the tuning phase, yet).

3. Then the agent gets re-evaluated, and IF:

4. Perturbed agent has a higher fitness: the neuron backs up its current weights/parameters.

5. Perturbed agent has a lower fitness: the neuron restores its previous backed up weights/parameters.

There is a problem with step 4, because by the time the synaptic weights are to be backed up, they have already changed from what they originally started with during the evaluation, since they have adapted and learned due to their plasticity function. So we would not be backing up the synaptic weights of the agent that achieved the higher fitness score; instead we would be backing up the learned and adapted agent with its adapted synaptic weights.

The fact that the perturbed agent, or topologically mutated agent, is not simply a perturbed genotype on which its parent is based, but instead is based on the genotype which has resulted from its parent's experience (due to the parent having changed based on its learning rule, before its genotype was backed up), means that the process is now based on Lamarckian evolution, rather than the biologically correct Darwinian. The definition of Lamarckian evolution is based on the idea that an organism can pass on to its offspring the characteristics that it has acquired and learned during its lifetime (evaluation), all its knowledge and learned skills. Since plasticity affects the agent's neural patterns, synaptic weights... all of which are defined and written back to the agent's genotype, and the offspring is a mutated version of that genotype, the offspring thus in effect will to some extent inherit the agent's adapted genotype, and not the original genotype with which the parent started when it was being evaluated.

When the agent backs up its synaptic weights after it has been evaluated for fitness, the agent uses Lamarckian evolution, because its experience, what it has learned during its evaluation (and what it has learned is reflected in how the synaptic weights changed due to the used plasticity learning rule), is written to its genome, and it is this learned agent that gets perturbed. The cleanest way to solve this problem, and have control over whether we use Lamarckian or the biologically correct Darwinian evolution, is to add a new parameter to the agent: the darwinian/lamarckian flag.

Darwinian vs. Lamarckian evolution, particularly in ALife simulations, could lead to interesting possibilities. When using Lamarckian evolution, and for example applying our neuroevolutionary system to an ALife problem, the agent's experience gained from interacting with the simulated environment would be passed on to its offspring, and perturbed during the tuning phase. The perturbed organism (during the tuning phase, belonging to the same evaluation) would re-experience the interaction with the environment, and if it was even more successful, it would be backed up with its new experience (which means that the organism has now experienced and learned in the environment twice, since through plasticity the environment has affected its synaptic weights twice). If the perturbed agent is less fit, then the previous agent, with its memories and synaptic weight combination, is reverted to and re-perturbed. If we set the max_attempts counter to 1, the system becomes a genetic rather than a memetic neuroevolutionary system. But again, when Lamarckian evolution is allowed, the memories of the parent are passed on to its offspring... A number of papers have researched the usefulness and efficiency of Darwinian vs. Lamarckian evolution [4,5,6,7]. The results vary, and so adding a heredity flag to the agent will allow us to experiment with, and use, both. We could then switch between the two heredity approaches (Darwinian or Lamarckian) easily, or perhaps even allow the heredity flag to flip between the two during the topological mutation phase through some new topological mutation operator, letting the evolutionary process decide what suits the assigned problem best.

To implement the synaptic weight updating method that reflects the decided-on heredity approach during the tuning phase, we will need to make minor updates to the records.hrl file, and to the exoself, neuron, and genotype modules. In records.hrl, we update the agent record by adding the heredity_type flag to it, and modify the constraint record by adding the heredity_types element to it. The agent's heredity_type element will simply store a tag, an atom which can be either: darwinian or lamarckian. The constraint's heredity_types element will be a list of heredity_type tags. This list can contain just a single tag, 'darwinian' or 'lamarckian' for example, or it can contain both. If both atoms are present in the heredity_types list, then during the creation of the seed population some agents will use the darwinian method of passing on their hereditary information, and others will use the lamarckian approach. It would be interesting to see which of the two has an advantage, or is able to evolve faster, during which stages of evolution, and in which problems...
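A minimal sketch of the two updated record definitions is shown below; all the other fields of the agent and constraint records are elided here for brevity, and the default values chosen are illustrative assumptions:

-record(agent,{
    id,
    heredity_type = darwinian %new flag, set to: darwinian | lamarckian
    %...other agent fields elided...
}).
-record(constraint,{
    heredity_types = [darwinian] %list of available heredity type tags, e.g. [darwinian,lamarckian]
    %...other constraint fields elided...
}).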

After updating the 2 records in records.hrl, we make a small update to the genotype module. In the genotype module we update the construct_Agent/3 function, and set the agent's heredity_type to one of the available heredity types in the constraint's heredity_types list. We do this by adding the following line when setting the agent's record: heredity_type = random_element(SpecCon#constraint.heredity_types). We then update the exoself module by modifying the link_Neurons/2 function into link_Neurons/3, and pass to it the agent's heredity_type parameter, which is then forwarded to each spawned neuron.
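Assuming the genotype module's random_element/1 helper has the standard uniform-sampling form, it would look something like the following sketch:

%Sketch of the assumed random_element/1 helper, which returns a random
%element of the given list with uniform probability.
random_element(List)->
    lists:nth(random:uniform(length(List)),List).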

With this done, we make the final and main source modification, which is contained entirely within the neuron module. To allow for Darwinian heredity in the presence of learning, plastic neurons, we need to keep track of two states of the input_pidps:

1. The input_pidps list that is currently effective and represents the neuron's processing dynamics: the input_pidps_current.

2. A second input_pidps list, which represents the state of input_pidps right after perturbation, before the synaptic weights are affected by the neuron's plasticity function.

We can call this new list the input_pidps_bl, where bl stands for Before Learning.

When a neuron is requested to perturb its synaptic weights, then right after the weights are perturbed we want to save this new input_pidps list, before plasticity gets a chance to modify the synaptic weights. Thus, whereas before we stored the Perturbed_PIdPs only in input_pidps_current, we now also save it to input_pidps_bl. Afterwards, the neuron can process the input signals using its input_pidps_current, and its learning rule can affect the input_pidps_current list, but input_pidps_bl will remain unchanged.

When a neuron is sent the weight_backup message, it is here that heredity_type plays its role. When it is darwinian, the neuron saves the input_pidps_bl to input_pidps_backup, instead of the input_pidps_current which could have been modified by some learning rule by this point. On the other hand, when the heredity_type is lamarckian, the neuron saves the input_pidps_current to input_pidps_backup. The input_pidps_current represents the synaptic weights that could have been updated if the neuron allows for plasticity, and thus input_pidps_backup will then contain not the initial state of the synaptic weight list with which the neuron started, but the state of the synaptic weights after the neuron has experienced, processed, and had its synaptic weights modified by its learning rule. Using this logic, we add the element input_pidps_bl to the neuron's state, and update the loop/6 function, as shown in Listing-15.18.


Listing-15.18 The neuron's loop/6 function, which can use both Darwinian and Lamarckian inheritance.

loop(S,ExoSelf_PId,[ok],[ok],SIAcc,MIAcc)->
    PF = S#state.pf,
    AF = S#state.af,
    AggrF = S#state.aggrf,
    {PFName,PFParameters} = PF,
    Ordered_SIAcc = lists:reverse(SIAcc),
    SI_PIdPs = S#state.si_pidps_current,
    SAggregation_Product = signal_aggregator:AggrF(Ordered_SIAcc,SI_PIdPs),
    SOutput = functions:AF(SAggregation_Product),
    Output_PIds = S#state.output_pids,
    [Output_PId ! {self(),forward,[SOutput]} || Output_PId <- Output_PIds],
    Ordered_MIAcc = lists:reverse(MIAcc),
    MI_PIdPs = S#state.mi_pidps_current,
    MAggregation_Product = signal_aggregator:dot_product(Ordered_MIAcc,MI_PIdPs),
    MOutput = functions:tanh(MAggregation_Product),
    U_SI_PIdPs = plasticity:PFName([MOutput|PFParameters],Ordered_SIAcc,SI_PIdPs,SOutput),
    U_S = S#state{
        si_pidps_current = U_SI_PIdPs
    },
    SI_PIds = S#state.si_pids,
    MI_PIds = S#state.mi_pids,
    loop(U_S,ExoSelf_PId,SI_PIds,MI_PIds,[],[]);
loop(S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc)->
    receive
        {SI_PId,forward,Input}->
            loop(S,ExoSelf_PId,SI_PIds,[MI_PId|MI_PIds],[{SI_PId,Input}|SIAcc],MIAcc);
        {MI_PId,forward,Input}->
            loop(S,ExoSelf_PId,[SI_PId|SI_PIds],MI_PIds,SIAcc,[{MI_PId,Input}|MIAcc]);
        {ExoSelf_PId,weight_backup}->
            U_S = case S#state.heredity_type of
                darwinian ->
                    S#state{
                        si_pidps_backup = S#state.si_pidps_bl,
                        mi_pidps_backup = S#state.mi_pidps_current
                    };
                lamarckian ->
                    S#state{
                        si_pidps_backup = S#state.si_pidps_current,
                        mi_pidps_backup = S#state.mi_pidps_current
                    }
            end,
            loop(U_S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc);
        {ExoSelf_PId,weight_restore}->
            U_S = S#state{
                si_pidps_bl = S#state.si_pidps_backup,
                si_pidps_current = S#state.si_pidps_backup,
                mi_pidps_current = S#state.mi_pidps_backup
            },
            loop(U_S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc);
        {ExoSelf_PId,weight_perturb,Spread}->
            Perturbed_SIPIdPs = perturb_IPIdPs(Spread,S#state.si_pidps_backup),
            Perturbed_MIPIdPs = perturb_IPIdPs(Spread,S#state.mi_pidps_backup),
            U_S = S#state{
                si_pidps_bl = Perturbed_SIPIdPs,
                si_pidps_current = Perturbed_SIPIdPs,
                mi_pidps_current = Perturbed_MIPIdPs
            },
            loop(U_S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc);
        {ExoSelf_PId,reset_prep}->
            neuron:flush_buffer(),
            ExoSelf_PId ! {self(),ready},
            RO_PIds = S#state.ro_pids,
            receive
                {ExoSelf_PId,reset}->
                    fanout(RO_PIds,{self(),forward,[?RO_SIGNAL]})
            end,
            loop(S,ExoSelf_PId,S#state.si_pids,S#state.mi_pids,[],[]);
        {ExoSelf_PId,get_backup}->
            NId = S#state.id,
            ExoSelf_PId ! {self(),NId,S#state.si_pidps_backup,S#state.mi_pidps_backup},
            loop(S,ExoSelf_PId,[SI_PId|SI_PIds],[MI_PId|MI_PIds],SIAcc,MIAcc);
        {ExoSelf_PId,terminate}->
            io:format("Neuron:~p is terminating.~n",[self()])
    after 10000 ->
        io:format("neuron:~p stuck.~n",[S#state.id])
    end.

With this modification, our neuroevolutionary system can be used with both Darwinian and Lamarckian based heredity. If we start the population_monitor process with a constraint in which the agents are allowed to have neurons with plasticity, and set the heredity_types to either [lamarckian] or [darwinian,lamarckian], then some of the agents will have plasticity and be able to use Lamarckian inheritance.
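For example, a constraint along the following lines would produce a seed population whose agents have access to plastic neurons and to both heredity types; the particular list of plasticity function names shown here is just an illustrative subset of the rules we have implemented:

#constraint{
    morphology = discrete_tmaze,
    neural_pfns = [hebbian_w,hebbian,neuromodulation], %plastic neurons allowed
    heredity_types = [darwinian,lamarckian] %both heredity approaches available to evolution
}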


We can next add a simple mutation operator, which works similarly to the mutation operators of the other evolutionary strategy parameters. We simply check whether there are any other heredity types in the constraint's heredity_types list. If there are, we change the currently used one to a new one, randomly chosen from the list. If there are no others, then the mutation operator exits with an error, without wasting the topological mutation attempt. This simple mutate_heredity_type mutation operator implementation is shown in Listing-15.19.

Listing-15.19 The implementation of the genome_mutator:mutate_heredity_type/1 mutation operator.

mutate_heredity_type(Agent_Id)->
    A = genotype:read({agent,Agent_Id}),
    case (A#agent.constraint)#constraint.heredity_types -- [A#agent.heredity_type] of
        [] ->
            exit("********ERROR:mutate_heredity_type/1:: Nothing to mutate, only a single function available.");
        Heredity_Type_Pool->
            New_HT = lists:nth(random:uniform(length(Heredity_Type_Pool)),Heredity_Type_Pool),
            U_A = A#agent{heredity_type = New_HT},
            genotype:write(U_A)
    end.

%mutate_heredity_type/1 checks if there are any heredity types in the agent's constraint record other than the one currently used by the agent. If there are, the agent exchanges the heredity type it currently uses for a random one from the remaining list. If no other heredity types are available, the mutation operator exits with an error, and the neuroevolutionary system tries another mutation operator.

Since this particular neuroevolutionary feature is part of the evolutionary strategies, we add it to the evolutionary strategy mutator list, which we created earlier:

-define(ES_MUTATORS,[
    mutate_tuning_selection,
    mutate_tuning_duration,
    mutate_tuning_annealing,
    mutate_tot_topological_mutations,
    mutate_heredity_type
]).

With this final modification, our neuroevolutionary system can now fully employ plasticity, and the two types of heredity methods. We now finally compile, and test our updated system on the T-Maze Navigation problem we developed in the previous chapter.


15.6 Compiling & Testing

Our TWEANN system can now evolve NNs with plasticity, which means the evolved agents do not simply have an evolved response/reflex to sensory signals, but can also change, adapt, learn, and modify their strategies as they interact with the ever changing and dynamic world. Having added this feature, and having created the T-Maze Navigation problem which requires the NN to change its strategy as it interacts with the environment, we can now test the various plasticity rules to see whether the agents will be able to achieve a fitness of 149.2. This fitness score is achieved when the agent gathers the highest reward located in the right corner, and then, when sensing that the reward in the right corner is no longer 1 but 0.2, starts moving to the left corner to continue gathering the highest reward.

Having so significantly modified the records and the various modules, we reset the mnesia database after recompiling the modules. To do this, we first execute polis:sync(), then polis:reset(), and finally polis:start() to start up the polis process. We have created numerous plasticity learning rules: [hebbian_w, hebbian, ojas_w, ojas, self_modulationV1, self_modulationV2, self_modulationV3, self_modulationV4, self_modulationV5, self_modulationV6, neuromodulation], too many to show the console printouts for all of them. Here I will show the results I achieved while benchmarking the hebbian_w and hebbian learning rules, and I highly recommend testing the other learning rules by using the provided source code in the supplementary material.
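The reset sequence in the Erlang shell is simply the following three calls (return values are omitted here, since they depend on the state of the database):

1> polis:sync(). %recompile the modules
2> polis:reset(). %reset the mnesia database
3> polis:start(). %start up the polis process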

To run the benchmarks, we first modify the ?INIT_CONSTRAINTS macro in the benchmarker module, setting the constraint's neural_pfns parameter to one of these plasticity rules for every benchmark. We can leave the evaluations_limit in the pmp record at 5000, but in the experiments I've performed, I set the population limit to 20 rather than 10, to allow for greater diversity. The following are the results I achieved when running the experiments for the hebbian_w and hebbian plasticity based benchmarks:
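For the hebbian_w benchmark, the macro would be set along the following lines (a sketch; any constraint fields not shown retain their defaults):

-define(INIT_CONSTRAINTS,[
    #constraint{
        morphology = discrete_tmaze,
        neural_pfns = [hebbian_w] %swap in [hebbian], [ojas_w]... for the other benchmarks
    }
]).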

T-Maze Navigation with neural_pfns=[hebbian_w]:

Graph:{graph,discrete_tmaze,
    [1.1185328852434115,1.1619749686158354,1.1524569668377718,
     1.125571504518873,1.1289114832535887,1.1493175172780439,
     1.136998936735779,1.151456292245766,1.1340011357153639,
     1.1299993522129745],
    [0.0726690757747553,0.08603433346506212,0.07855604082593783,
     0.10142838037124464,0.07396159578145513,0.10671412852082847,
     0.07508707481514428,0.09451139923220694,0.10140517337683815,
     0.07774940615923569],
    [91.76556804891021,101.28562704890575,111.38602998360439,
     110.65857974481669,110.16398032961199,111.09056977671462,
     110.92899944938112,110.89051253132838,115.36595268212,
     111.07567142455073],
    [14.533256849468248,13.058657299854085,10.728855341054617,
     10.993110357580642,10.14374645989871,8.753610288273324,
     8.392536182954592,7.795296190771122,5.718415463002469,
     8.367092075873826],
    [122.0000000000001,122.0000000000001,148.4,149.2,149.2,149.2,
     149.2,149.2,149.2,149.2],
    [10.000000000000115,10.000000000000115,10.000000000000115,
     10.000000000000115,10.000000000000115,10.000000000000115,
     10.000000000000115,10.000000000000115,10.000000000000115,
     10.000000000000115],
    [11.45,14.3,15.3,15.8,15.3,16.15,16.15,15.55,15.95,15.7],
    [1.5321553446044565,2.451530134426253,2.1702534414210706,
     2.541653005427767,2.2825424421026654,2.7253440149823285,
     2.127792283095321,2.0118399538730714,2.246664193866097,
     2.0273134932713295],
    [500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0],
    []}

Tot Evaluations Avg:5172.95 Std:103.65301491032471

The fifth list in the graph tuple, holding the maximum fitness scores achieved across all the evolutionary runs, shows that this time, through plasticity, the score of 149.2 was achieved, implying our TWEANN's ability to solve the T-Maze navigation problem in under 2000 evaluations (by the 4th of the ten 500-evaluation data points).

T-Maze Navigation with neural_pfns=[hebbian]:

Graph:{graph,discrete_tmaze,
    [1.1349113313586998,1.1720830155097892,1.1280659983291563,
     1.1155462519936203,1.1394258373205741,1.1293439592742998,
     1.1421323920317727,1.1734812130593864,1.1750255550524766,
     1.2243932469319467],
    [0.07930932911768754,0.07243567080038446,0.0632406890972406,
     0.05913247338612391,0.07903341129827642,0.07030745338352402,
     0.09215871275247499,0.09666623776054033,0.1597898002580627,
     0.2447504142533042],
    [90.66616594516601,97.25899378881999,104.36751796157071,
     105.0985582137162,106.70360792131855,108.09892415530814,
     108.23839098414494,109.28814527629243,108.0643063975331,
     111.0103593241125],
    [15.044059269853784,13.919179099169385,10.613477213673535,
     13.557400867791436,13.380234103652047,12.413686820724935,
     11.936102929326337,11.580780191261242,12.636714964991167,
     12.816711475442705],
    [122.0000000000001,147.8,145.60000000000002,149.2,149.2,149.2,
     149.2,149.2,149.2,149.2],
    [10.000000000000115,10.000000000000115,10.000000000000115,
     10.000000000000115,10.000000000000115,10.000000000000115,
     10.000000000000115,10.000000000000115,10.000000000000115,
     10.000000000000115],
    [11.05,12.2,12.3,12.85,13.35,14.25,14.35,15.3,15.4,14.9],
    [1.6271140095272978,2.6381811916545836,2.215851980616034,
     1.7399712641305316,1.7399712641305318,2.2332711434127295,
     1.9817921182606415,2.0760539492026697,1.9078784028338913,
     2.046948949045872],
    [500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0],
    []}

Tot Evaluations Avg:5145.65 Std:91.87234349900953

In this case, our TWEANN was again able to solve the T-Maze problem. Plastic NN based agents do indeed have the ability to solve the T-Maze problem, which requires the agents to change their strategy as they interact with a maze that changes midway. Our TWEANN is now able to evolve such plastic NN based agents; it can now evolve agents that learn new things as they interact with the environment, and that change their behavioral strategies based on their experience within that environment.

15.7 Summary & Discussion

Though we have tested only two of the numerous plasticity learning rules we've implemented, both produced successful results. In both cases our TWEANN platform was able to evolve NN based agents capable of solving the T-Maze problem, which was not solvable by our TWEANN in the previous chapter without plasticity. Thus we have successfully tested our plasticity rule implementations, and the new performance capabilities of our TWEANN. Outside this text I have tested the learning rules which were not tested above, and they too are capable of solving this problem, with varying performance levels. All of this without us having even optimized our algorithms yet.

With this benchmark complete, we have now finished developing numerous plasticity learning rules, implementing the said algorithms, and benchmarking their performance. Our TWEANN system has finally been able to solve the T-Maze problem, which requires the agents to change their strategy. Our TWEANN platform can now evolve not only complex topologies, but NN systems which can learn and adapt. Our system can now evolve thinking neural network based agents. There is nothing stopping us from producing more complex and more biologically faithful plasticity based learning rules, which would further improve the capabilities and potential of the types of neural networks our TWEANN system can evolve.

With plasticity now added, our next step is to add a completely different NN encoding, and thus further advance our TWEANN system. In the next chapter we will allow our TWEANN platform to evolve not only the standard, direct encoded NN based agents we've been using up to this point, but also a new indirect encoded type of NN system: the substrate encoded NN based system.


15.8 References

[1] Oja E (1982) A Simplified Neuron Model as a Principal Component Analyzer. Journal of Mathematical Biology 15, 267-273.

[2] Soltoggio A, Bullinaria JA, Mattiussi C, Durr P, Floreano D (2008) Evolutionary Advantages of Neuromodulated Plasticity in Dynamic, Reward-based Scenarios. Artificial Life 2, 569-576.

[3] Blynel J, Floreano D (2003) Exploring the T-maze: Evolving Learning-Like Robot Behaviors Using CTRNNs. Applications of Evolutionary Computing 2611, 173-176.

[4] Whitley LD, Gordon VS, Mathias KE (1994) Lamarckian Evolution, The Baldwin Effect and Function Optimization. In Parallel Problem Solving From Nature - PPSN III, Y. Davidor and H. P. Schwefel, eds. (Springer), pp. 6-15.

[5] Julstrom BA (1999) Comparing Darwinian, Baldwinian, and Lamarckian Search in a Genetic Algorithm for the 4-Cycle Problem. In Late Breaking Papers at the 1999 Genetic and Evolutionary Computation Conference, S. Brave and A. S. Wu, eds., pp. 134-138.

[6] Castillo PA, Arenas MG, Castellano JG, Merelo JJ, Prieto A, Rivas V, Romero G (2006) Lamarckian Evolution and the Baldwin Effect in Evolutionary Neural Networks. CoRR abs/cs/060, 5.

[7] Esparcia-Alcazar A, Sharman K (1999) Phenotype Plasticity in Genetic Programming: A Comparison of Darwinian and Lamarckian Inheritance Schemes. In Genetic Programming: Proceedings of EuroGP'99, R. Poli, P. Nordin, W. B. Langdon, and T. C. Fogarty, eds. (Springer-Verlag), pp. 49-64.

Chapter 16 Substrate Encoding

Abstract In this chapter we augment our TWEANN to also evolve indirect encoded NN based systems. We discuss, architect, and implement substrate encoding. Substrate encoding allows the evolved NN based systems to become sensitive to the geometrical regularities within sensory signals. We extend our existing genotype encoding method and give it the ability to encode both neural and substrate based NNs. We then extend the exoself to map the extended genotype to the extended phenotype capable of supporting substrate encoded NN systems. Finally, we modify the genome mutator module to support new, substrate NN specific mutation operators, and then test the system on our previously developed benchmarking problems.

With all the main features of a neuroevolutionary system complete, and with our TWEANN system now able to evolve learning networks, we can now move forward and add some of the more elaborate features to our platform. In this chapter we will modify our TWEANN platform to evolve substrate encoded NN systems, which we briefly discussed in Section-4.1.3, and again in Chapter-10.

In indirect encoded NN systems, the genotype and phenotype do not have a 1-to-1 mapping. Substrate encoding is one such indirect encoding method. As we discussed in Chapter-10, and as was shown in a number of relatively recently published papers [2,3,4], it has numerous advantages, particularly when it comes to generalization, image analysis based problems, and problems with geometrical regularities. Substrate encoding allows us to build substrate modules whose embedded, interconnected neurodes, and their very connections and synaptic weights, are defined by a directly encoded NN system. Thus by evolving a NN, we are evolving a system which accepts as input the coordinates of the substrate embedded neurodes, and outputs/paints the topology, synaptic weights, connection expression patterns, and other connectivity parameters on the multidimensional substrate. In this chapter we implement the extension to our system so that it can evolve such substrate encoded NN based agents.

Though we have already discussed the manner in which substrate encoding works in Chapter-10, in the next section we will cover it in greater detail. Afterwards, we will implement this indirect encoding system, and then test it on the T-Maze problem.



16.1 A Brief Overview of Substrate Encoding

Substrate encoding was popularized by the HyperNEAT [2] neuroevolutionary system. Though simple, it is a new approach in the effort to reduce the curse of dimensionality problem (the growth of the number of variables to evolve and deal with as the size of the NN system increases). In this encoding method, it is not the directly evolved neural network that processes the incoming signals from the sensors and outputs signals to the actuators to interact with the world; instead, it is the neurode impregnated multidimensional substrate that interacts with the environment. A substrate is simply a hypercube with every axis ranging from -1 to 1. Within the hypercube are neurodes, as shown in Fig-16.1. The neurodes are embedded in this multidimensional space, and thus each has its own coordinate. Furthermore, the neurodes are connected to each other in some particular pattern, and thus the substrate represents a neural network. The trick here is that we do not have to evolve the neurode connectivity patterns or the synaptic weights between them; instead, it is the evolved NN that sets up the synaptic weights and connectivity expression for the neurodes within the substrate. Thus, even if we have a 3d substrate containing millions of neurodes, heavily interconnected with each other, we might still only have a few dozen neurons in the NN that sets up the synaptic weights between the neurodes within the substrate. This is accomplished by feeding the NN the coordinates of the connected neurodes within the substrate, and using the NN's resulting output to set the synaptic weight between those connected neurodes. Thus, no matter how many millions of interconnected neurodes are within the substrate, we are still only evolving a few interconnected neurons within the direct encoded NN. Of course, the greater the number of neurons within the NN, the more complex the connection and synaptic weight pattern it can paint on this multidimensional substrate.

Fig. 16.1 A multidimensional substrate with embedded neurodes, connected in a feedforward, hyperlayer-to-hyperlayer fashion.


Note: In HyperNEAT the NN is referred to as a CPPN (Compositional Pattern Producing Network), due to the fact that the NEAT [5] neuroevolutionary system only evolves NNs which use the tanh activation function, while the CPPN evolved in HyperNEAT uses other activation functions as well. This terminology need not apply to other neuroevolutionary systems whose neurons use various types of activation functions as a standard, including when used with substrate encoding. Since our evolved NNs use different types of activation functions in general, we need not distinguish between the NNs evolved for direct encoding, and the NNs evolved during indirect encoding for the purpose of painting the connectivity patterns on the substrate. This terminology would be particularly difficult to use if we were to evolve modular neural network systems, with interconnected substrate encoded modules, direct encoded modules, and other programs. Thus, because there is really no difference between the NNs used on their own and those used with substrates, we will simply refer to them as NNs. The NNs are simply being used for different purposes. In the direct encoded approach, the NN is applied to the problem directly. In the substrate encoded approach, the NN is applied to setting up the synaptic weights of the neurodes embedded in the substrate, and it is the substrate that is applied to the problem. I will refer to the neurons embedded in the substrate as neurodes.

As was shown in the above figure, we can thus feed the NN an appended list of the coordinates of any two connected neurodes, and use the NN's output as the synaptic weight between those neurodes. In this manner we can calculate the synaptic weights between all connected neurodes in the substrate, whether there are ten or ten million neurodes, as long as we have the coordinates of those neurodes. Because the NN now deals with coordinates, the length of the input vector to the NN is at most 2*SubstrateDimensionality, which alleviates the problems associated with extremely large NN input vectors. It is the substrate that processes the input vectors and produces the output. Another important feature is that, due to the NN dealing with coordinates, the system becomes sensitive to the geometrical regularities within the input, and allows us to set up the substrate's geometry so that it further emulates the geometrical features of the problem we wish to solve. If the input to the substrate is such that there is geometrical data in it (for example an image from a camera), and it can be exploited, this type of encoding is much better suited to the data's analysis than a directly encoded NN.
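A minimal sketch of this weight-painting step is shown below; the nn_output/1 function is a hypothetical stand-in for feeding an input vector through the evolved NN and reading back its output:

%Sketch: set the synaptic weight for every connected neurode pair.
%Each coordinate is a list such as [X,Y,Z]; nn_output/1 is a hypothetical
%stand-in for querying the evolved NN.
set_weights(ConnectedPairs)->
    [{From,To,nn_output(From++To)} || {From,To} <- ConnectedPairs].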

The next question is of course: in what manner are the neurodes within the substrate connected; what is the substrate's topology? And how do we forward data from the sensors to the substrate, and use the output produced by the substrate to control the actuators? These two questions are related, because both the input to the substrate and the output from the substrate are parts of the substrate's topology, as will be explained next.


The most common substrates are standard hypercubes where each hyperplane is connected to the one ahead of it. Consider a 3d substrate, a cube, as was shown in Fig-16.1. A hyperplane to hyperplane feedforward topology, where the hyperplane's dimension is one less than the dimensionality of the entire substrate, is one where all the neurodes in a hyperplane are connected in a feedforward fashion to all the neurodes in the next hyperplane, with the synaptic weights between the connected neurodes decided by the NN. Another substrate topology is one where all the neurodes in the substrate are interconnected, and again the NN decides the synaptic weights between the neurodes. But we can also allow the NN to output not just the synaptic weight for the coordinates used as its input, but also whether or not the neurodes are connected at all. In this manner the topology is no longer set ahead of time; instead, it is the NN that decides which neurodes are connected, and with what synaptic weights (rather than just deciding on the synaptic weights between the feedforward connected neurodes). Fig-16.2a shows a hyperplane-to-hyperplane Jordan recurrent (the last plane outputs signals back to the first plane) substrate, where the NN uses the coordinates of any two neurodes to output the synaptic weight between them. On the other hand, Fig-16.2b shows a substrate full of neurodes, where the NN decides both the connectivity expression between all the neurodes, and the synaptic weights between those that are indeed connected.

Fig. 16.2 A hyperlayer-to-hyperlayer fully connected substrate topology, and a "freeform" substrate topology. In A, the NN uses as input the coordinates of every two connected neurodes of different planes, while in B the NN outputs a vector of length 2, a synaptic weight and whether the synaptic connection is expressed, for every two-neurode combination in the whole substrate.

Though the neurodes are usually spaced equidistantly within the substrate, in the case of a freeform topology this need not be so. For example, we can just as easily randomly pepper the hypercube with neurodes, and then use the NN to decide whether and which neurodes are connected, and what the synaptic weights between the connected neurodes are. Indeed, at times it might even be better to use a mutation operator which randomly adds neurodes to, and subtracts neurodes from, the substrate, thus over time evolving a substrate with different neurode density zones, and thus different signal sensitivity and specialization zones.

Note: A hyperlayer is a group of neurodes all belonging to the same, most external coordinate, and thus forming a structure which is one dimension lower than the full substrate itself. If the substrate is 5d (x,y,z,a,b), then each hyperlayer is 4d, with each separate hyperlayer on its own b coordinate. If the substrate is 3d (x,y,z), then each hyperlayer is 2d, with each hyperlayer on its own z coordinate. Furthermore, I will use the term hyperplane to designate the planes composing a hyperlayer, forming structures one dimension lower than the hyperlayers. This will make more sense as new substrate topologies are shown. For example, multiple hyperplanes, where each designates a separate sensor, can together form a single input hyperlayer of a substrate. Similarly, multiple hyperplanes of the output hyperlayer would each be associated with a particular actuator. This terminology will allow us to discuss the substrates, evolving substrates, and the connectivity between the substrate and the sensors and actuators, much more easily.

Given all these substrate topologies, how do we present the sensory data to them? Since for the neurodes to have a synaptic weight for an incoming signal, the origin of that signal must also have a coordinate, the sensory signal output zone must somehow be located within the substrate. For example, assume that the input is an image, and the substrate is 3d, with a standard feedforward topology, with 3 hyperlayers in total located at Z = -1, Z = 0, and Z = 1, and with the hyperlayer-to-hyperlayer feedforward connections going from Z = -1 towards Z = 1. One way we can allow the sensory image to also have coordinates is by positioning it as the input hyperlayer at Z = -1. The pixels then have coordinates, and each pixel's color can be mapped to a floating point value. In this manner the image can be used as an input to the substrate encoded NN. The NN can then produce the synaptic weights for the neurodes in the hyperlayer located at Z = 0, because the NN now has the coordinates of both the neurodes in the hyperlayer at Z = 0, and the sensory signal producers (the image pixels) of the hyperlayer at Z = -1, which is connected to the neurodes at Z = 0. This type of setup is shown in Fig-16.3.


Fig. 16.3 The image being used as the input hyperlayer of the substrate.
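A sketch of how such pixel coordinates might be generated for a WxH image positioned at Z = -1 follows; spreading the pixels equidistantly across the [-1,1] range of each axis is an assumption made here for illustration:

%Sketch: assign every pixel of a WxH image a [X,Y,Z] coordinate at Z = -1.
pixel_coordinates(W,H)->
    [[scale(X,W),scale(Y,H),-1] || Y <- lists:seq(1,H), X <- lists:seq(1,W)].

%Map index 1..Max to the [-1,1] range.
scale(_Index,1)-> 0.0;
scale(Index,Max)-> 2*(Index-1)/(Max-1) - 1.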

We can also designate the neurodes in the last hyperlayer, the neurodes located at Z = 1, as the output neurodes. We would then use the output of the neurodes in the output hyperlayer as the output signals to control the actuators. It is completely up to the researcher which combination of output neurodes is designated to control which actuators. For example, in the above figure, the input hyperlayer is the image, which in this case is a financial instrument price chart, and the output hyperlayer is composed of a single neurode. This neurode's output is then used to control some actuator. In comparison, in Fig-16.1 the output hyperlayer had 9 neurodes. It is up to us, when creating the substrate and tagging the output hyperlayer, whether all 9 neurodes are used as signals for a single actuator, or whether there are 3 actuators and each 3-neurode hyperplane within the output hyperlayer is associated with its own actuator. Fig-16.4 shows two 3d substrates, both using the same topology, but one has designated all the neurodes in the output hyperlayer to be used for a single actuator whose input vector length is 9, while in the second substrate the output hyperlayer is broken up into 3 hyperplanes, and each 3-neurode hyperplane is designated for a different actuator.


Fig. 16.4 Two 3d substrates of the same topology, using a different number of actuators through some output-neurode/actuator designation method. Substrate A has designated all the neurodes in its output hyperlayer to be associated with a single actuator, while substrate B has separated the 9 neurodes into 3 groups, each group associated with its own actuator.

In the same fashion, a substrate can have multiple sensors associated with its input hyperlayer. For example, we could use a four dimensional substrate, where the input hyperlayer is 3 dimensional, and is composed of 2 or more 2d hyperplanes, where each hyperplane is an image fed from a camera. Fig-16.5 shows just such an arrangement, where the 3d input hyperlayer uses sensors which produce chart images of possibly differing resolutions, using different technical indicators.

Fig. 16.5 Four dimensional substrate with multiple sensors and actuators.


Finally, the dimensionality and the topology of the substrate itself can vary. It depends on the problem and on the goal of the researcher whether to use a 1, 5, or 20 dimensional substrate hypercube, and whether that substrate is cuboid or spherical. If, for example, the data being analyzed has spherical geometrical regularities, it might be best to paint the input hyperlayer on a spherically shaped substrate, and use spherical coordinates rather than Cartesian ones. Fig-16.6a shows a 2d substrate of polar topology using polar coordinates, where the sensory signals are mapped to the circumference of the circle, and the output is produced by the inner circle of the substrate. For comparison, Fig-16.6b shows a standard 2d substrate, with a layer to layer feedforward topology.

Fig. 16.6 A two dimensional circular substrate using polar coordinates (A), and a two dimensional standard feedforward substrate using Cartesian coordinates (B).

With the sensory signals mapped to one of the surfaces of the substrate, or to internal structures or geometries of the substrate, and with the output signals produced by any of the neurodes within the substrate tagged for such a job, there is an enormous amount of flexibility in such an encoded system. Because NNs are universal function approximators, it is theoretically feasible to evolve any type of synaptic weight and connectivity pattern within the substrate. Indeed, consider a 3d PET (Positron Emission Tomography) scan of the brain, which outlines the metabolic activity of the same. Would it be possible for a complex enough NN to paint such an activity pattern within a 3d substrate? The sensory signals for such a substrate could be mapped to a few internal structures within it (perhaps 3d structures similar to optical nerves), and its output extracted from the neurodes along a 3d structure which outputs signals to be forwarded down a simulated spinal cord... But why be limited to simulating the activity patterns at the resolution of a PET scan; how about the activity patterns at an even greater level of detail? Indeed, with a substrate dense enough and a NN complex enough, we should eventually be able to produce activity patterns of the same granularity as those of the biological brain.

Furthermore, why be limited to 3d topologies; why not 4d, or 10d? There is certainly an enormous number of things left to explore in direct and indirect encoded NN based systems, an enormous amount of exploration and experimentation still beyond the horizon, which will yield new and certainly incredible results. We start off in this direction in the next section, in which we discuss how to represent such a substrate in our TWEANN system.


16.2 The Updated Architecture of Our NN Based Systems

To implement substrate encoding, we need to figure out a way to represent it within the genotype in such a way that we can then map it to the phenotype in a manner that melds well with our TWEANN system. We would also prefer that the genotype have at least the following features:

 The representation must allow us to specify any number of sensors and actuators, allow us to use mutation operators to add new sensors and actuators through evolution, and allow the substrate to have any number of dimensions.

 The phenotype should be representable as a single process. Thus this entire substrate, no matter how many dimensions it has and how many neurodes belong to it, should have a genotype that can easily be mapped to this single process based phenotype.

 The genotype should allow us to easily represent at least the standard set of substrate topologies: hyperlayer-to-hyperlayer (HtH) feedforward, fully connected, and Jordan recurrent (a HtH topology where the output hyperlayer of the substrate is used as part of its input hyperlayer).

 Finally, the encoding must be simple enough that we can easily work with it, and extend it in the future.


These requirements were not chosen arbitrarily; they represent what is necessary for our substrate encoded system to provide all the features of the bleeding edge, and beyond, of today's substrate encoded systems. What other systems do not provide, and ours will, is the ability of the substrate to integrate new sensors and actuators into itself through evolution. The architecture of a substrate encoded agent is diagrammed in Fig-16.7.

Fig. 16.7 The architecture of a substrate encoded NN based agent.

What is important to notice in this architecture is that now the sensors and actuators are used by the substrate, whereas the NN is used for generating and setting up the substrate's synaptic weights and connectivity pattern. In the standard, neural encoding, the NN is the one that polls the environment using its sensors, and then acts upon the environment using its actuators. But now that the NN is being used by the substrate, and the substrate only needs to use the NN to set or update its synaptic weights, the NN is no longer the driver behind the use of sensors and actuators. Instead, the NN is now simply being used by the substrate, being fed neurode coordinates, and producing synaptic weights and other parameters that the substrate uses.

In Chapter-10 we discussed how DXNN uses various coordinate preprocessors before the coordinates are fed to the NN, which allows the NN to use not just the Cartesian coordinates, but also to evolve other preprocessors, such as: polar coordinates, spherical coordinates, neurode distance to substrate center, synaptic connection length... These coordinate preprocessors process the standard Cartesian coordinates before feeding the resulting vector to the NN. Thus the NN can use combinations of these, and therefore has the ability to extract and be aware of more geometrical regularities, and to produce more complex synaptic weight and connectivity patterns. But should we represent these preprocessors, and the postprocessors through which the NN will feed the resulting synaptic weights and connectivity expression to the substrate, using some new set of elements like the sensors and actuators? Or should the substrate keep track of which neurons in the NN get which signals, and which neurons produce which outputs?

For example, we could allow the substrate to simply keep track of which neurons should be fed which types of signals, through the use of which types of coordinate preprocessors, and also from which neurons it, the substrate, should await the synaptic weight and connectivity expression signals. This is the way DXNN is implemented, where the substrate deals directly with the NN. Because it is the substrate that is in the driving seat, using and polling the NN, rather than the other way around, it is an effective implementation. But the system we are developing here is different enough that it would be easier for us to create a new set of elements, similar to sensors and actuators but dedicated to substrate based coordinate preprocessing and connectivity expression setting. These elements will be similar enough to sensors and actuators that we will be able to reuse their mutation operators for adding and integrating new such elements into a substrate encoded NN. Thus our design will follow the architecture shown in the above figure, where the substrate will poll the substrate_cpp/s (Substrate Coordinate PreProcessor), and then wait for the signals from the substrate_cep/s (Substrate Connectivity Expression Producer) process, which will tell it what the synaptic weight is between the two neurodes with whose coordinates the substrate_cpps were called, and whether the connection between these neurodes is expressed or not. This approach should give our system an excellent amount of flexibility and scalability in the future.

So then, we will create two new process types: substrate_cpp and substrate_cep. These will be analogous to the sensors and actuators respectively, but driven and polled by the substrate when it wishes to calculate the synaptic weights and connectivity expression between its various neurodes. The substrate will forward to its one or more substrate_cpps the coordinates of the two connected neurodes in question; the called substrate_cpp will process those coordinates based on its type (if of the cartesian type, for example, then simply leaving them as Cartesian coordinates and fanning the vector out to the neurons in the NN; if of another type, then converting the coordinates to polar, spherical, or some other vector form first), and forward the processed vector to the NN. The substrate will then wait for the signals from its one or more substrate_ceps, which will provide it with the various signals which the substrate will then use to set its synaptic weights, connectivity expressions, or even plasticity based synaptic weight updates (as will be discussed in the next chapter). The substrate will use its substrate_cpps and substrate_ceps for every synaptic weight/expression it wishes to set or update. Unlike the sensors and actuators, the substrate_cpps and substrate_ceps will not need to sync up with the cortex, because the substrate_cpps are triggered by the substrate, the signals from the substrate_ceps are awaited by the substrate, and the substrate itself only processes signals once it has received all the sensory signals from the sensors, which themselves are triggered by the cortex; thus the whole system will be synchronized.
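A minimal sketch of this polling cycle, from the substrate's point of view, might look as follows; the message formats and the set_weight/3 helper are illustrative assumptions, not the final implementation:

%Sketch: the substrate requests the synaptic weight for one neurode pair
%by polling its substrate_cpps, then awaits the results from its
%substrate_ceps. set_weight/3 is an assumed helper that stores the weight.
request_weight(CPP_PIds,CEP_PIds,Coord1,Coord2,Substrate)->
    [CPP_PId ! {self(),Coord1,Coord2} || CPP_PId <- CPP_PIds],
    await_ceps(CEP_PIds,Coord1,Coord2,Substrate).

await_ceps([CEP_PId|CEP_PIds],Coord1,Coord2,Substrate)->
    receive
        {CEP_PId,set_weight,Weight}->
            U_Substrate = set_weight(Substrate,{Coord1,Coord2},Weight),
            await_ceps(CEP_PIds,Coord1,Coord2,U_Substrate)
    end;
await_ceps([],_Coord1,_Coord2,Substrate)->
    Substrate.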

Having now decided on the architecture, and on representing the substrate and its substrate_cpps and substrate_ceps as independent processes, it is time to come up with the way we will represent all of this within the genotype. We now need to create a genotype encoding.

16.3 The Genotype of the Substrate Encoded NN

We know how we want our substrate encoded NN system to function, but how do we represent the genotype of a NN which is substrate encoded? If we were to give the substrate_cpp and substrate_cep their own records within the records.hrl file, they would look something like this:

-record(substrate_cpp,{id,name,substrate_id,vl,fanout_ids=[],generation,parameters,
    pre_f,post_f}).

-record(substrate_cep,{id,name,substrate_id,vl,fanin_ids=[],generation,parameters,
    pre_f,post_f}).

Let's go through each of the elements from the above records:

id : This is the id of the substrate_cpp or substrate_cep.

name : Similar to sensors and actuators, there will be different kinds of substrate_cpps and substrate_ceps, each with its own name.

substrate_id : This element will hold the substrate's id, so that the substrate_cpp/substrate_cep will know which process to wait for a signal from, or send its signal to, respectively. This is a bit analogous to the scape id used by the sensors and actuators.

vl : The vector length of the signal the substrate_cpp/substrate_cep process is certified to deal with.

fanout_ids : The same as for the sensors: the list of element ids to which this substrate_cpp will forward its vector signals.

fanin_ids : The same as for the actuators: the list of element ids from which the substrate_cep expects to receive signals.

generation : This element keeps track of the generation during which the element was added to the NN, or was last affected by a mutation.

parameters : At some point it might be useful to also specify parameters which further modify how the particular substrate_cpp/substrate_cep pre- or post-processes the signal vectors, respectively.

pre_f : The signal preprocessing function name, if any, used by the substrate_cpp/substrate_cep process.

post_f : The signal postprocessing function name, if any, used by the substrate_cpp/substrate_cep process.

Thus these two elements are basically shorter versions of the sensor and actuator elements, specializing in substrate signal pre- and post-processing. There is a large number of similarities between the sensors/actuators and the cpps/ceps, respectively... And so it might be worth further consideration whether we should give substrate_cpps and substrate_ceps their own records, or whether perhaps we should somehow modify the sensor and actuator records to allow them to be dual purpose... We will get back to this issue in just a short while; but now we finally get to the main question of how we will represent the substrate topology.

We do not need to make our substrates hyperspheres, toroids, or give them any other type of exotic topology, because we can use substrate_cpps and substrate_ceps which are able to convert the standard coordinates to spherical, polar, toroidal... This means we can use a standard Cartesian hypercube topology, and if the researcher or the problem requires that the input hyperplanes and output hyperplanes have a spherical, toroidal, or some other topology, we can position the hyperplanes within the hypercube at the preferred coordinates, and use the appropriate substrate_cpps to emulate the chosen exotic topology. Since our substrates will be hypercubes, and we will want to support at least the hyperlayer-to-hyperlayer feedforward topologies and fully connected topologies, we can implement the topology specification using a layer density list: Densities = [H...,Y,X], where the first element in the list specifies the depth of the hidden (non input or output) processing hyperlayers, as shown in Fig-16.8. The rest of the elements within the Densities list specify the densities of the hidden processing hyperlayers. We can think of H as specifying the number of hyperlayers between the Input_Hyperlayer and the Output_Hyperlayer. The following figure shows 2d and 3d examples, with multiple densities. What should further be noted from the substrate diagrams is that the signals coming from the sensors are not sent to the input hyperlayers, they are the input hyperlayers. In substrate encoded systems, the sensory signals acquire geometrical properties, positions, coordinates... so that the processing hyperlayers can have synaptic weights calculated for the signals coming from those sensory coordinates. It is because we give the sensors their coordinates, which can reflect real world coordinates (for example, the actual positions of the cameras on a cuboid robot, in which case the coordinates of the sensory signals would reflect the actual coordinates of where the signals are coming from on the robot's body), that this approach offers the substrate encoded NN based agent geometrical sensitivity, and allows the system to take advantage of geometrical regularities in the sensory signals. Similar is the case for the substrate's output hyperlayers. The neurodes of the output hyperlayers can have any coordinates, and their coordinates may be geometrically significant to the way the signals are processed. For example, the output hyperlayer neurodes might mimic the coordinates of the legs or wheels on some robot.

Fig. 16.8 Two examples of the substrate specification and architecture.

Now notice that this specification is Input_Hyperlayer and Output_Hyperlayer blind. If for example the substrate is 3d: [Z,Y,X], then the input hyperlayer will always be located at Z = -1, and the output hyperlayer will always be located at Z = 1. The input hyperlayer represents the coordinate based plane where the sensory signals are presented to the substrate (since for neurodes to have synaptic weights, the substrate must give the sensors some kind of geometrical representation with coordinates). The output hyperlayer contains the neurodes whose output signals are gathered and used to control the actuators. The Densities list specifies those in-between processing hyperlayers, and says nothing about the input or output hyperlayers.

So then, the Densities list represents the bulk of the substrate: the hidden processing hyperlayers (though note that the output hyperlayer also processes signals; only the input hyperlayer does not process, it only produces signals). Since different sets of sensors and actuators will require different input and output hyperlayer structures, we calculate those geometrical structures and coordinates from the sensors and actuators live. We calculate how to compose the input and output hyperlayers based on the sensors/actuators, their types, their specified geometrical properties, if any, and their signal vector lengths.
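As a concrete illustration (the particular values are only an example), a Densities list for a 3d substrate with two hidden processing hyperlayers, each a 3 by 3 grid of neurodes, would be specified as:

Densities = [2,3,3]. %H = 2 hidden hyperlayers, each of density Y = 3 by X = 3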

As you remember, both the sensors and actuators have the format element within their records; we have finally reached the point where we can use it. It is this format element which will hold the geometrical information of the sensor or actuator. If for example the sensory signal is coming from a camera with a resolution of 500x500, we would specify this fact in the format element. If the NN is of type neural, then this has no meaning; it does not matter, and the vector fed to the NN, which in this case will be of size 500*500 = 250000, is fed directly as a single list of values. But if the NN is substrate encoded, then the input signal from this sensor is a 2d hyperplane with dimensions 500x500. In this manner the evolved NNs can take advantage of any geometrical information, when specified.

There is a similar case when it comes to actuators. If a NN is of type neural, then the neurons simply forward their signals to the actuator, and that is the end of it. But if the NN is of type substrate, and for example the output itself is an image with a resolution of 10x10, then we would like the output hyperplane destined for this actuator to be a 2d plane with dimensions 10x10, so that we can retain the geometrical properties of the actuator signal. This output signal of the output hyperplane would then be fed to the actuator. Using the format parameter means that the substrate will be able to take advantage of the geometrical properties associated with processing and generating images, and other geometry sensitive information, or anything that can be better handled when geometrical properties are taken into account.

So then, through the Densities list we can specify the general substrate properties, and the substrate specification is then completed based on the sensors and actuators used. If the sensors and actuators change, or more are added, or some are removed, it will in no way affect our system, since it will be able to generate the new substrate by reading the Densities and analyzing the new Sensor and Actuator lists during the genotype to phenotype mapping. Two examples of a Densities specified substrate, integrated with sensors and actuators which have their hyperlayers specified by their format and vl parameters, are shown in Fig-16.9.

In the below figure, the format element is specified through the tuple {IHDTag, IHD}, where IHDTag stands for Input Hyperlayer Densities Tag, and IHD for Input Hyperlayer Densities. In the below figure there are two formats: the no_geo format (equivalent to an undefined format, which is simply a single dimensional vector), and the {symmetric,IHD} format. The symmetric tag specifies that the IHD uses a standard Densities format, and that the neurodes should use coordinates equidistant from each other in their respective dimensions.


Fig. 16.9 Substrates created based on the Densities list and the sensors and actuators used by the substrate encoded NN system.

Having discussed all of this, consider now what happens when a researcher decides to use a morphology by the hypothetical name forex_trader, which is a morphology of agents that trade currencies on the forex market. Some of the sensors in this morphology have a format which specifies that the sensor outputs 2d images. All actuators, on the other hand, just accept simple unformatted signals from the output hyperlayer. There are multiple sensors in this morphology, each for its own technical indicator (a chart of closing prices, a chart of currency pair volume...). An agent which is allowed to evolve and connect to new sensors will eventually have an input from at least two such 2d input planes (two charts). Combined, this input will be 3d, since each chart is 2d, and if we now give each one of them its own 3rd coordinate, the input hyperlayer is 3d. But if the researcher decided to make his entire substrate 3d, which only allows 2d input hyperplanes, how can the substrate encoded NN acquire access to new sensors?


For this reason, though we will allow the densities to be set by the researcher, the actual dimensionality of the substrate will be computed by the genotype constructor function, based on the morphology and the formatting rules of its sensors and actuators. It will calculate the dimensionality of the substrate as follows:

1. Find the largest dimensionality specified in the format parameters of the sensors and actuators, and designate it as MaxDimIOH, which stands for Max Dimensionality of the Input/Output Hyperplanes. If a sensor or actuator does not specify geometrical properties in its format variable, then we assume that it is single dimensional, and that the signal has a list form.

2. Because we want to allow the substrate encoded NN based system to evolve connections to multiple new sensors and actuators, it will be possible in the future for the substrate to be connected to multiple such MaxDimIOH hyperplanes. The way to put multiple such hyperplanes together is to equidistantly space them out along another dimension. Thus we designate the IOHDim as MaxDimIOH+1, where IOHDim stands for Input/Output Hyperlayer Dimension.

3. Finally, because the substrate must have depth as well, and the IOHDim hyperlayers should be able to connect to and be connected from the hidden processing hyperlayers (which themselves will be of at most dimensionality IOHDim), the final dimensionality of the substrate is: SubstrateDim = IOHDim+1. We create one more dimension and position the input hyperlayer at the -1, and the output hyperlayer at the 1, of this new dimension, with the hidden processing hyperlayers, if any, spaced out equidistantly between -1 and 1. A sketch of this calculation follows this list.
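The following minimal sketch computes SubstrateDim from the sensor and actuator records; it assumes that a format is either undefined/no_geo (single dimensional) or a tuple such as {symmetric,IHD}, whose IHD list length gives the hyperplane's dimensionality:

%Sketch: computing the substrate dimensionality from the sensor and
%actuator formats, per the 3 steps above.
calculate_substrate_dimension(Sensors,Actuators)->
    Formats = [S#sensor.format || S <- Sensors] ++ [A#actuator.format || A <- Actuators],
    MaxDimIOH = lists:max([format_dimension(F) || F <- Formats]),
    IOHDim = MaxDimIOH + 1, %dimension along which multiple I/O hyperplanes are stacked
    IOHDim + 1. %final depth dimension: input hyperlayer at -1, output hyperlayer at 1

format_dimension(undefined)-> 1;
format_dimension(no_geo)-> 1;
format_dimension({symmetric,IHD})-> length(IHD).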

The application of these 3 steps is shown in the creation of the substrate in Fig-16.10. In the below figure our forex_trader morphology based substrate encoded NN has a single actuator which uses only a single dimension, since the evolved agent need only be able to produce a vector of length 1, which specifies if the agent wishes to go long, short, or hold its currently traded currency pair. The available sensors for this morphology have at most 2 dimensions. These 2d based sensors feed the substrate the image charts of various technical indicators. Thus, to allow the substrate to use multiple sensors and actuators, we need the input and output hyperlayers to be of at least 3 dimensions. But because the substrate must have at least 2 such hyperlayers, one input and one output hyperlayer, we thus need to position these hyperlayers on another dimension. Thus the minimum dimensionality of the entire substrate is 4. We can make the substrate have more than 4 dimensions if we believe the added dimensionality will increase the flexibility of the substrate, but we cannot use a substrate of a dimension less than 4 for this morphology, unless we restrict the substrate encoded NN to use only a single sensor and a single actuator, at which point the minimum dimension would be 3, or unless we use a non hyperlayer-to-hyperlayer connection topology.


Fig. 16.10 A four dimensional substrate composed by analyzing the morphology based sensor and actuator dimensionalities.

Certainly, we could encode everything on a two dimensional substrate if we wanted to. We could always try to simply aggregate all the signals together into a single vector, which would then represent a single dimensional input or output hyperplane. But then we would lose the geometrical information within those sensors and actuators. Thus the above approach is the most effective way to automatically set up hypercube substrates which will allow the substrate encoded NNs to evolve, and have the ability to incorporate multiple sensors and actuators if needed.

So then, during the genotype creation we need only specify the Densities list for the hidden hyperlayer. The dimension of the substrate will depend on the analysis of the sensors and actuators of the NN system's morphology. We do not need to specify the weights or anything else within the genotype of the substrate encoded NN (SENN). During the mapping from genotype to phenotype, the connected sensors and actuators used by the NN system will be analyzed, and the hyperlayers based on their properties will be composed, sandwiching any other hidden processing hyperlayers of the substrate, and thus producing the final substrate. Thus the substrate can be represented simply through the use of the Densities parameter and the Sensors and Actuators lists.

But just having a way to represent the substrate is not enough. There is something besides the location of the neurodes within the substrate that we must specify: how the neurodes are interconnected. For example, we should be able to specify with a tag whether the substrate uses a hyperlayer-to-hyperlayer feedforward topology, or whether it uses a feedforward topology but where every neurode is also self recurrent, or perhaps where every hyperlayer is self recurrent... Since all of this will only matter in the phenotype, which we will discuss in the next section, in the genotype we can specify the substrate connectivity architecture using a single extra element: linkform. The linkform element should be able to take the values l2l_feedforward, fully_connected, jordan_recurrent, freeform... or any other type of architecture we decide on implementing.

To sum it all up, the new substrate encoded NN based genotype will need to keep track of the densities parameter, the list of substrate_cpps and substrate_ceps that the substrate has to communicate with, the plasticity of the substrate, and the topology of the substrate (layer-to-layer, fully connected, jordan_recurrent, freeform...), with the sensors and actuators still being tracked by the cortex element. That is a lot of new parameters and lists to keep track of, which will be best done if we give the substrate its own element. Thus we add a new substrate record to the records.hrl. The new substrate element will have the following form:

-record(substrate, {
    id,
    agent_id,
    densities,
    linkform,
    plasticity=none,
    cpp_ids=[],
    cep_ids=[]
}).

Finally, because the substrate element will have its own id, we will need to add the element substrate_id to the agent record, so that it can keep track of this new element/process as well.
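For illustration, a populated substrate record for the 4d substrate constructed later in this chapter might hold the following values (the id terms are hypothetical, shown only to make the fields concrete):

#substrate{
    id = {{void,5.014562},substrate}, %hypothetical unique id
    agent_id = {2.078143,agent}, %hypothetical agent id
    densities = [1,3,2,2],
    linkform = l2l_feedforward,
    plasticity = none,
    cpp_ids = [], %populated during genotype construction
    cep_ids = []
}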

Having now discussed all the needed features to implement a substrate encoded NN based system: the new architecture, the representation of the substrate in the genotype, and the new coordinate processors and connectivity expression producers... The only thing left to decide on, before we can move forward and begin the implementation of the said system, is what the substrate will look like in its phenotypic form, how it will process the signals from the input hyperlayer, and how it will generate signals through its output hyperlayer. Thus the topic of our next section is the SENN's phenotypic representation.

16.4 The SENN Phenotype


We have the architecture, and we know how to specify the substrate topology, and even the type of substrate linkforms (l2l_feedforward, fully_connected, jordan_recurrent...), but how exactly do we represent it in its phenotypic form? How do we represent it as a single process so that it can actually process the sensory signals and produce outputs to control its actuators? How do we use the sensors, actuators, substrate_cpps, and substrate_ceps? In this section we will discuss the phenotype of our Substrate Encoded Neural Network (SENN) based system.

In the previous section we let the format element of the sensors and actuators have the following style: format={Tag,HpD}, where Tag is an atom which can further specify the formatting, and where HpD stands for Hyperplane Densities, which has a list form similar to the Densities, with the length of this list being the dimensionality of the signal.

Now let's assume that our genome constructor is creating a new SENN genotype, and is building a substrate that will from the start use 2 sensors and 2 actuators. What they do is not important, and so they are simply named sensor1, sensor2, actuator1, and actuator2. These two sensors and two actuators are as follows:

Sensors = [
    #sensor{name=sensor1,format=no_geo,tot_vl=3},
    #sensor{name=sensor2,format={symmetric,lists:reverse([2,3])},tot_vl=6}
]

Actuators = [
    #actuator{name=actuator1,format=no_geo,tot_vl=2},
    #actuator{name=actuator2,format={symmetric,lists:reverse([3,2])},tot_vl=6}
]

We see that the maximum dimensionality in the list of sensors and actuators is 2, and thus the substrate that will be created will be 4 dimensional. Let us also further suppose that we have specified the following substrate Densities: [1,3,2,2]. The dimensionality of the substrate is 4, as it should be. Depth is 1, so there will be one hidden processing hyperlayer, and it will be composed of three 2x2 layers. The input hyperlayer will be composed of 2 planes, the first 1x3, associated with sensor1, and the second 2x3, associated with sensor2. The 3d output hyperlayer will also be composed of 2 planes, the 1x2 plane of actuator1, and the 3x2 plane of actuator2. This is shown in the following figure.


Fig. 16.11 Creating the phenotype based on sensors and actuators.

Note that no matter the densities, the neurodes should automatically be positioned equidistantly from each other, preferably as far from each other as possible. We do this because the further the neurodes are from each other, the more different their coordinates are from each other, and thus the NN which calculates the synaptic weights for each neurode pair will see a greater difference between every coordinate pair fed to it, allowing for a higher level of discernibility between the various coordinate pairs.
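As an illustration of such equidistant placement, a hypothetical helper which spreads Density coordinates across [-1,1] (centering a single-element dimension at 0.0) could be written as follows:

build_coords(1)->
    [0.0];
build_coords(Density)->
    Step = 2/(Density-1),
    [-1 + Step*I || I <- lists:seq(0,Density-1)].

With this spacing, build_coords(3) yields [-1.0,0.0,1.0] and build_coords(2) yields [-1.0,1.0], which matches the coordinates appearing in the phenotype listing below.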

Had our Densities specification had 0 for the depth, then there would only be the input and output hyperlayers present. So we now know what the phenotypes look like, for this, or for any other specified genotype. But how do we represent all these neurodes in phenotype form? And as noted, how do we effectively get them to process signals coming from the input hyperlayers?

Since all neurodes use the tanh activation function, we do not need to specify the AF for every neurode. Because in the standard l2l_feedforward substrate every neurode processes the signals coming from all the neurodes in the previous hyperlayer, we can represent the phenotype as a list of lists, using the following format: [Hyperlayer1,Hyperlayer2...HyperlayerN], where N is the total number of hyperlayers, which is Depth+2 (the total number of hidden processing hyperlayers, plus the input and output hyperlayers). Each Hyperlayer in this list is a list of tuples, where each tuple represents a neurode: [{Coordinate, OutputSignal, SynapticWeights}...].

So then, the phenotypic representation of a substrate using:

Densities=[1,3,2,2]

Sensors = [
    #sensor{format=no_geo,tot_vl=3},
    #sensor{format={symmetric,lists:reverse([2,3])},tot_vl=6}
]

Actuators = [
    #actuator{format=no_geo,tot_vl=2},
    #actuator{format={symmetric,lists:reverse([3,2])},tot_vl=6}
]

will have the architecture shown in the above figure. Its phenotypic list based representation, before the synaptic weights are set to some values (by default they are set to 0 before the substrate uses the NN to set them to their appropriate values), is as follows:

[
 [{[-1,-1,0,-1],0,void},
  {[-1,-1,0,0.0],0,void},
  {[-1,-1,0,1],0,void},
  {[-1,1,-1,-1],0,void},
  {[-1,1,-1,0.0],0,void},
  {[-1,1,-1,1],0,void},
  {[-1,1,1,-1],0,void},
  {[-1,1,1,0.0],0,void},
  {[-1,1,1,1],0,void}],

 [{[0,-1,-1,-1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,-1,-1,1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,-1,1,-1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,-1,1,1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,0.0,-1,-1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,0.0,-1,1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,0.0,1,-1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,0.0,1,1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,1,-1,-1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,1,-1,1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,1,1,-1],0,[0,0,0,0,0,0,0,0,0]},
  {[0,1,1,1],0,[0,0,0,0,0,0,0,0,0]}],

 [{[1,-1,0,-1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
  {[1,-1,0,1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
  {[1,1,-1,-1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
  {[1,1,-1,1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
  {[1,1,0.0,-1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
  {[1,1,0.0,1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
  {[1,1,1,-1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
  {[1,1,1,1],0,[0,0,0,0,0,0,0,0,0,0,0,0]}]
]

I highlighted each hyperlayer with a different color. The hyperlayer at K = -1, the input hyperlayer, is highlighted green (if you're reading the black & white printed version, it's the first block). The hyperlayer at K = 0, the hidden processing hyperlayer, is highlighted blue (the second block). And the output hyperlayer at K = 1 is highlighted red (the third block). Note that the input hyperlayer has the atom void for its synaptic weights list; this is because there are no synaptic weights: since this hyperlayer represents the sensors, it only produces signals. The coordinates are inverted; rather than having the format [X,Y,Z,K], they have the form [K,Z,Y,X], which makes it easier to see the separate hyperlayers. This list of lists is composed of tuples, and the tuples have the following format: {NeurodeCoordinate, OutputSignal, SynapticWeights}. The NeurodeCoordinate is the actual coordinate of the neurode that this tuple represents. Every tuple represents a neurode, and the OutputSignal is that neurode's output signal. Thus, if you are looking at a l2l_feedforward substrate at any given time, the neurode's OutputSignal is actually the neurode's previous OutputSignal, because this value is calculated for it on the fly and is then immediately used as an input signal by the neurodes in the next hyperlayer. The calculation of each neurode's output is performed by our algorithm, which processes the input signals (the OutputSignals of the neurodes in the previous hyperlayer) for this neurode, and calculates this neurode's output signal by applying the activation function tanh to the accumulated signal sum, without bias. The element SynapticWeights is a list of synaptic weights, set by querying the NN with the coordinates of this neurode and the coordinates of all its presynaptic neurodes. Since the synaptic weights have an order in the list, and the neurode representing tuples also have a static order in the substrate, there is an implicit correlation between the synaptic weights within the SynapticWeights list and the neurodes in the previous hyperlayer; as long as one does not change the order of either list, they will match.

Before we discuss how layer-to-layer feedforward processing can be efficiently implemented using this encoding once the synaptic weights are set, let's first take a closer look at each list-represented hyperlayer. Starting with the input hyperlayer, which was created by analyzing the sensors:

Sensors = [
    #sensor{format=no_geo,tot_vl=3},
    #sensor{format={symmetric,lists:reverse([2,3])},tot_vl=6}
]:

[{[-1,-1,0,-1],0,void},
 {[-1,-1,0,0.0],0,void},
 {[-1,-1,0,1],0,void},
 {[-1,1,-1,-1],0,void},
 {[-1,1,-1,0.0],0,void},
 {[-1,1,-1,1],0,void},
 {[-1,1,1,-1],0,void},
 {[-1,1,1,0.0],0,void},
 {[-1,1,1,1],0,void}]


The first coordinate, K, is -1, as it should be, since the input hyperlayer is always positioned at the most negative end of the substrate. There are two sensors with a max dimensionality of 2, thus there should be 2 planes on the Z axis. The next coordinate in the list is Z, and there are 3 tuples which have Z = -1, while the other 6 tuples have Z = 1, as expected. The first 3 tuples represent the values originating from the first sensor, and the next 6 tuples represent the values originating from the second sensor. Also, because the first sensor has no geometrical information, it is a single one dimensional vector, and so for the first 3 tuples Y = 0, with only the X taking the coordinates of -1, 0, and 1. The second sensor is 2 dimensional, and so Y takes the coordinates of -1 and 1, and each Y coordinate further comes with 3 X coordinates, which take the values -1, 0, and 1. Thus this tuple list does indeed represent the input hyperlayer.

The next hyperlayer was specified using the Densities list [1,3,2,2]:

[{[0,-1,-1,-1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,-1,-1,1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,-1,1,-1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,-1,1,1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,0.0,-1,-1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,0.0,-1,1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,0.0,1,-1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,0.0,1,1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,1,-1,-1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,1,-1,1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,1,1,-1],0,[0,0,0,0,0,0,0,0,0]},
 {[0,1,1,1],0,[0,0,0,0,0,0,0,0,0]}]

There is only one hidden processing hyperlayer, so it is positioned equidistantly between the input and the output hyperlayers, at K = 0. There are 3 planes in this hyperlayer, each with a 2x2 topology. Thus the next coordinate takes 3 values: Z = -1, 0, and 1. For every Z coordinate there are 4 tuples, representing the 4 neurodes, and we see that for every Z coordinate, Y takes on the values of -1 and 1, and for every Y coordinate, X takes a value of -1 and 1. Furthermore, notice that each tuple has the default Output = 0, and the synaptic weight list [0,0,0,0,0,0,0,0,0]. The weight list is of length 9, which is the number of neurodes in the input hyperlayer. Thus every neurode in the hidden hyperlayer is ready to process the signals coming from every neurode of the previous hyperlayer, the input hyperlayer in this case.

Next comes the output hyperlayer, whose phenotypic representation was composed through the analysis of the actuator list that the SENN in question uses:

Actuators = [
    #actuator{format=no_geo,tot_vl=2},
    #actuator{format={symmetric,lists:reverse([3,2])},tot_vl=6}
]:

[{[1,-1,0,-1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
 {[1,-1,0,1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
 {[1,1,-1,-1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
 {[1,1,-1,1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
 {[1,1,0.0,-1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
 {[1,1,0.0,1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
 {[1,1,1,-1],0,[0,0,0,0,0,0,0,0,0,0,0,0]},
 {[1,1,1,1],0,[0,0,0,0,0,0,0,0,0,0,0,0]}]

The output hyperlayer is always located at the most positive end of the substrate. In this 4d substrate this means that it is located at K = 1. The output hyperlayer is 3d, and there are 2 planes composing it, thus Z takes on two values, -1 and 1. The signals destined for actuator1 are positioned at Z = -1. The signals destined for actuator2 are positioned at Z = 1. The output hyperlayer is a signal processing hyperlayer, and so each tuple comes with a synaptic weight list (which in this default form has not yet been set to its appropriate values through the use of the synaptic weight producing NN).

We wondered before how we would decide which neurodes should receive signals from which sensors, and which neurodes should forward their signals to which actuators. Using this representation, this question can now be answered. The neurodes come in a particular pattern within their hyperlayer, in the same order as they are listed in the sensor and actuator lists. There are vl neurodes associated with each sensor and actuator. Thus we can simply take the vl number of neurodes from the corresponding input or output hyperlayer, and map them, in the same order, to their appropriate sensors and actuators respectively. A sketch of this mapping follows.
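A hypothetical sketch of this mapping on the actuator side: splitting the flat output list of the output hyperlayer among the actuators, in order, by their vl values (partition_outputs/2 is an illustrative helper, not a function of the actual system; it assumes records.hrl is included):

partition_outputs(Outputs,[A|Actuators])->
    %Take this actuator's vl outputs off the front of the flat output list.
    {AOutputs,Rest} = lists:split(A#actuator.vl,Outputs),
    [{A#actuator.name,AOutputs}|partition_outputs(Rest,Actuators)];
partition_outputs([],[])->
    [].

For the output hyperlayer above, the first 2 values would go to actuator1, and the remaining 6 to actuator2.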

What is excellent about this substrate representation is the ease with which we can use it for signal processing. In fact, it only takes a few lines of Erlang. Using the above substrate representation, the l2l_feedforward processing of signals from the input hyperlayer to the output hyperlayer can be done using the source code shown in Listing-16.1.

Listing-16.1 Processing signals from the input hyperlayer to the output hyperlayer, in a hyperlayer-to-hyperlayer feedforward substrate.

calculate_output_std(Prev_Hyperlayer,[Cur_Hyperlayer|Substrate],Plasticity,Acc)->
    %Each neurode's dot product accumulator starts at 0.
    Updated_CurHyperlayer = [{Coord,calculate_neurode_output_noplast(Prev_Hyperlayer,{Coord,Prev_O,Weights},0),Weights} || {Coord,Prev_O,Weights} <- Cur_Hyperlayer],
    calculate_output_std(Updated_CurHyperlayer,Substrate,Plasticity,[Updated_CurHyperlayer|Acc]);
calculate_output_std(Output_Hyperlayer,[],_Plasticity,Acc)->
    {[Output || {_Coord,Output,_Weights} <- Output_Hyperlayer],lists:reverse(Acc)}.

calculate_neurode_output_noplast([{_I_Coord,O,_I_Weights}|I_Neurodes],{Coord,Prev_O,[Weight|Weights]},Acc)->
    calculate_neurode_output_noplast(I_Neurodes,{Coord,Prev_O,Weights},O*Weight+Acc);
calculate_neurode_output_noplast([],{_Coord,_Prev_O,[]},Acc)->
    functions:tanh(Acc).

And that is effectively it: 2 functions, with 9 lines of code in total (if one does not count the lines introduced through the line breaks due to the page width of this book). Since every hyperlayer is contained in its own list, and since every neurode in a hyperlayer must process the signals from all the neurodes in the previous hyperlayer, we simply execute the calculate_output_std/4 function with the populated input hyperlayer as the first parameter, and the remaining substrate as the second parameter. Then each neurode in the current hyperlayer is fed the signals from the neurodes in the previous hyperlayer. We calculate the output of the neurodes in the current hyperlayer, updating it in the process with the said neurodal outputs. The next iteration of the calculate_output_std recurrent function is then executed with the updated current hyperlayer as the first parameter (now it is the Prev_Hyperlayer), and the remainder of the substrate as the second parameter. Thus the current hyperlayer during the next iteration becomes the previous hyperlayer. In this manner all the hyperlayers are processed, with the end result being the updated last hyperlayer, the output hyperlayer. At this point the output hyperlayer is the only remaining updated hyperlayer, so we simply extract its Output values using a list comprehension. We can then use the list of actuators and their vl values to extract the output lists from the output value list of the output hyperlayer.
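For example, assuming the variable Substrate is bound to the list-of-lists representation shown earlier, with the input hyperlayer's OutputSignals already populated from the sensors, the invocation might look as follows (a sketch; the initial accumulator choice here is an assumption, and the exact call used by the substrate module is developed in the implementation section):

[Input_Hyperlayer|Remaining_Substrate] = Substrate,
{OutputVector,U_Hyperlayers} = calculate_output_std(Input_Hyperlayer,Remaining_Substrate,none,[Input_Hyperlayer]),
%OutputVector is the flat list of outputs of the output hyperlayer, and U_Hyperlayers is the list of updated hyperlayers.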

It would even be easy to add some level of recurrence to the substrate using this encoding, which we will do within the implementation section. In this manner the substrate can contain millions of neurons, and they would be processed rather efficiently by a single process. Optimizations could be made to separate the substrate into multiple parallel hypercubes, or to feed this vector based representation to a GPU, which could then process it in parallel if implemented accordingly.

The more difficult part is implementing the functions to compose these substrates from their Densities list, and their sensor and actuator lists. We will discuss just that in the substrate implementation section, but first let us finally implement the substrate_cpps and substrate_ceps.

16.5 Implementing the substrate_cpps & substrate_ceps

We have discussed how the substrate_cpps and substrate_ceps have a very similar functionality to the sensors and actuators respectively. In fact, so much so that to recreate these two functions would require copy-pasting most of the mutation operators, the sensor and actuator modules, the genotype construction functions which create the seed NNs, and the sensor/actuator cloning and deleting functions. It is true that the substrate_cpps and substrate_ceps are not sensors and actuators, respectively. They are part of the NN system itself, part of the Think element in the Sense-Think-Act loop. But it also would not be effective for us to have to re-implement the same functionality we've already developed. There is an alternative though.

Since we now know what the substrate_cpps and substrate_ceps are, and that they are not sensors and actuators, we can be comfortable enough to reuse and piggyback on the sensor and actuator records by extending them to include the type element, which will specify whether the sensor/actuator is of the standard type, or whether it is of type substrate_cpp/substrate_cep. It is a better approach at this time, because either way we will have to modify the genome_mutator and other modules to discern between sensors/actuators and substrate_cpps/substrate_ceps. For example, if the agent is of type substrate rather than neural, the mutation operators add_sensor, add_actuator, add_sensorlink, add_actuatorlink, and the functions which perform sensor to neuron, and neuron to actuator linking, all must be modified to accommodate the fact that the addition of new sensors and actuators, and linking to them, is done very differently in a substrate encoded system. Thus, since we already have to modify these functions, we might as well use the existing sensor and actuator based functionality.

Before we continue and begin modifying the sensor and actuator records and modules, here are the similarities between the sensors & actuators, and their substrate based counterparts:

sensors: The NN evolves and is able to integrate and connect to new sensors during the evolutionary process. Furthermore, the sensors are used to interface with the environment through message passing, potentially processing the data before forwarding the sensory signals to the NN.

substrate_cpps: The NN evolves and is able to integrate and connect to new coordinate processors during the evolutionary process. Starting off with a single standard coordinate type of substrate_cpp, which simply passes the coordinates of the 2 connected neurodes to the NN, and over time integrating new substrate_cpps which convert Cartesian coordinates to polar or spherical coordinates, calculate distances between the connected neurodes, distances between the neurode and the center of the substrate... Furthermore, the substrate_cpps interact with the substrate through message passing, potentially processing the data before forwarding it to the NN.

actuators: The NN evolves and is able to integrate and connect to new actuators during the evolutionary process. Furthermore, the actuator interfaces with the environment through message passing, and direct execution of its functions.

substrate_ceps: The NN evolves and is able to integrate and connect to new substrate_ceps, which it controls and sends signals to. There can be different types of substrate_ceps. The most basic type is one which has a vl=1, and based on the signals from the NN, it sets the synaptic weights between the two neurodes for which the substrate_cpps have acquired coordinate based data. Other types which can be integrated over time are those which deal with connectivity expression (whether there even should be a synaptic weight between the noted neurodes), synaptic weight updates (a substrate_cep which is used to update synaptic weights, when one implements synaptic plasticity for example), and others which might reveal themselves and become useful in the future. Furthermore, the substrate_cep interfaces with the substrate through message passing, and potentially by executing functions directly.

Not only is their functionality similar, where the only difference is that the sensors and actuators interface with the scape, whereas the substrate_cpps and substrate_ceps interface with the substrate, but the mutation operators are also exactly the same between the two. The NN neither knows, nor cares, whether it is getting connected to the sensors/actuators or the substrate_cpps/substrate_ceps. Of course, if we allow the sensors and actuators to come in two types, standard and substrate, the substrate encoded NN will need to use both types. The standard sensors/actuators will be used by the substrate itself to interface with the world/environment/scape, while the substrate sensors/actuators will be used by the substrate to drive the NN, sending it coordinate based signals and acquiring from it synaptic weight and other parameter settings. Through the use of the type element in the sensors and actuators, we can also ensure that the cortex does not sync them as it does the standard sensors and actuators.
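For example, wherever the cortex-facing code gathers the elements to sync, a filter of the following form (a sketch, not code from the actual modules) would exclude the substrate type elements:

Standard_Sensors = [S || S <- Sensors, S#sensor.type =/= substrate],
Standard_Actuators = [A || A <- Actuators, A#actuator.type =/= substrate],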

Having now decided that the sensors and actuators can be modified to be efficiently used in our substrate encoding implementation, let us first modify their records. The updated sensor and actuator records will have the following form:

-record(sensor,{id, name, type, cx_id, scape, vl, fanout_ids=[], generation, format, parameters, phys_rep, vis_rep, pre_f, post_f}).

-record(actuator,{id, name, type, cx_id, scape, vl, fanin_ids=[], generation, format, parameters, phys_rep, vis_rep, pre_f, post_f}).

Next we have to modify the morphology module, adding to it a new set of substrate type sensors and actuators, used specifically by the substrate encoded NN. We also have to modify the original sensors and actuators specified in the morphology module, setting their type to standard. We mirror the functions get_InitSensors/get_InitActuators and get_Sensors/get_Actuators, to create get_InitSubstrateCPPs/get_InitSubstrateCEPs and get_SubstrateCPPs/get_SubstrateCEPs,

as shown in Listing-16.2.

Listing-16.2 The implementation of the new get_InitSubstrateCPPs/get_InitSubstrateCEPs and get_SubstrateCPPs/get_SubstrateCEPs functions.

get_InitSubstrateCPPs(Dimensions,Plasticity)->
    Substrate_CPPs = get_SubstrateCPPs(Dimensions,Plasticity),
    [lists:nth(1,Substrate_CPPs)].

get_InitSubstrateCEPs(Dimensions,Plasticity)->
    Substrate_CEPs = get_SubstrateCEPs(Dimensions,Plasticity),
    [lists:nth(1,Substrate_CEPs)].

get_SubstrateCPPs(Dimensions,Plasticity)->
    io:format("Dimensions:~p, Plasticity:~p~n",[Dimensions,Plasticity]),
    if
        (Plasticity == none) ->
            Std=[
                #sensor{name=cartesian,type=substrate,vl=Dimensions*2},
                #sensor{name=centripetal_distances,type=substrate,vl=2},
                #sensor{name=cartesian_distance,type=substrate,vl=1},
                #sensor{name=cartesian_CoordDiffs,type=substrate,vl=Dimensions},
                #sensor{name=cartesian_GaussedCoordDiffs,type=substrate,vl=Dimensions}
            ],
            Adt=case Dimensions of
                2 ->
                    [#sensor{name=polar,type=substrate,vl=Dimensions*2}];
                3 ->
                    [#sensor{name=spherical,type=substrate,vl=Dimensions*2}];
                _ ->
                    []
            end,
            lists:append(Std,Adt)
    end.

get_SubstrateCEPs(Dimensions,Plasticity)->
    case Plasticity of
        none ->
            [#actuator{name=set_weight,type=substrate,vl=1}]
    end.

In the function get_SubstrateCPPs, there are two lists of cpp functions, the Std (standard) and the Adt (additional). (From this point on we will refer to the substrate cpp type based sensors simply as cpp sensors or cpps, and similarly our references to substrate ceps will use the word: ceps.) The standard list contains the various cpp functions which are substrate dimension independent, while the cpps in the Adt list are dimension specific. For example, the conversion of Cartesian to polar coordinates requires the substrate to be 2d, while the cpp that feeds the NN the spherical coordinates can operate only on a 3d substrate. Let us discuss each of the listed cpps next, which we will implement afterwards:


cartesian: The cartesian cpp simply forwards to the NN the appended coordinates of the two connected neurodes. Because each neurode has a coordinate specified by a list of length Dimensions, the vector specifying the two appended coordinates will have vl = Dimensions*2. For example: [X1,Y1,Z1,X2,Y2,Z2] will have a vector length of vl = 3*2 = 6.

centripetal_distances: This cpp uses the Cartesian coordinates of the two neurodes to calculate the Cartesian distance of neurode_1 to the center of the substrate located at the origin, and the Cartesian distance of neurode_2 to the center of the substrate. It then fans out to the NN the vector of length 2, composed of the two distances.

cartesian_distance: This cpp calculates the Cartesian distance between the two neurodes, forwarding the resulting vector of length 1 to the NN.

cartesian_CoordDiffs: This cpp calculates the difference between each coordinate element of the two neurodes, and thus for this cpp, vl = Dimensions.

cartesian_GaussedCoordDiffs: Exactly the same as the above cpp, but each of the values is first sent through the Gaussian function before it is entered into the vector.

polar: This cpp converts the Cartesian coordinates to polar coordinates. This can only be done if the substrate is 2d.

spherical: This cpp converts the Cartesian coordinates to spherical coordinates. This can only be done if the substrate is 3d.

In the case of the available CEPs, there is only one, named set_weight. This CEP is connected from a single neuron in the NN, and it sets the synaptic weight between two neurodes in the substrate. Furthermore, we will allow it to compute the synaptic weight based on the following function:

Threshold = 0.33,
Processed_Weight = if
    Weight > Threshold ->
        (functions:scale(Weight,1,Threshold)+1)/2;
    Weight < -Threshold ->
        (functions:scale(Weight,-Threshold,-1)-1)/2;
    true ->
        0
end

This function allows for the synaptic weight to either be expressed or zeroed, depending on whether the NN's output magnitude is greater than 0.33 or not. When the synaptic weight is expressed, it is then scaled to be between -1 and 1.
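The exact numbers depend on functions:scale/3, which is not shown here. Assuming it linearly maps Val from the range [Min,Max] to [-1,1], a plausible definition would be:

scale(Val,Max,Min)->
    %Linearly map Val from [Min,Max] to [-1,1] (an assumption about the functions module).
    (Val*2 - (Max+Min))/(Max-Min).

Under that assumption, a neural output right at the threshold, Weight = 0.33, maps to (scale(0.33,1,0.33)+1)/2 = 0, while Weight = 1 maps to (scale(1,1,0.33)+1)/2 = 1, so the expressed synaptic weights span (0,1] on the positive side and, symmetrically, [-1,0) on the negative side.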

We could certainly add any number of other cpps and ceps, limited only by our imagination. And we could add new ones at any time, and then use our benchmarker to see if it improves the agility and performance of our TWEANN system on some difficult problem. But at this point, the listed cpps and ceps will be enough.

16.5.1 Implementing the substrate_cpp Module

We've established the record format for the substrate_cpp type sensors, and decided which cpp functions we will implement. Also, in the previous section we have created the substrate architecture, in which the substrate triggers the cpp sensors to fan out the coordinate based information to all the neurons the cpps are connected to. To realize this architecture, we need to set up the cpps, and create their processes with the ability to receive messages from the substrate's PId. We will have the exoself create the substrate before it links the cpps and ceps, and so by the time the exoself initializes the cpps and ceps, it will already know the substrate's PId. We can then have the exoself tell the cpps and ceps the substrate's PId when it initializes them. Based on this information, we can now create the cpps in their own substrate_cpp module. In Listing-16.3, we implement the cpp functionality in its own substrate_cpp module.

Listing-16.3 Implementation of the substrate_cpp module.

-module(substrate_cpp).
-compile(export_all).
-include("records.hrl").

gen(ExoSelf_PId,Node)->
    spawn(Node,?MODULE,prep,[ExoSelf_PId]).

prep(ExoSelf_PId) ->
    receive
        {ExoSelf_PId,{Id,Cx_PId,Substrate_PId,CPPName,VL,Parameters,Fanout_PIds}} ->
            loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CPPName,VL,Parameters,Fanout_PIds)
    end.
%When gen/2 is executed, it spawns the substrate_cpp process, which immediately begins to wait for its initial state message.

loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CPPName,VL,Parameters,Fanout_PIds)->
    receive
        {Substrate_PId,Presynaptic_Coords,Postsynaptic_Coords}->
            SensoryVector = functions:CPPName(Presynaptic_Coords,Postsynaptic_Coords),
            [Pid ! {self(),forward,SensoryVector} || Pid <- Fanout_PIds],
            loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CPPName,VL,Parameters,Fanout_PIds);
        {ExoSelf_PId,terminate} ->
            %io:format("substrate_cpp:~p is terminating.~n",[Id]),
            ok
    end.

Since the conversion of Cartesian coordinates to other types belongs in the mathematical function module, we implement all the coordinate operators in the functions module, with the substrate_cpps calling them when needed, depending on their name. The coordinate operator functions added to the functions module are shown in Listing-16.4.

Listing-16.4 The implemented coordinate operators added to the functions module.

cartesian(I_Coord,Coord)->
    lists:append(I_Coord,Coord).

polar(I_Coord,Coord)->
    lists:append(cart2pol(I_Coord),cart2pol(Coord)).

spherical(I_Coord,Coord)->
    lists:append(cart2spher(I_Coord),cart2spher(Coord)).

centripetal_distances(I_Coord,Coord)->
    [centripetal_distance(I_Coord,0),centripetal_distance(Coord,0)].

cartesian_distance(I_Coord,Coord)->
    [calculate_distance(I_Coord,Coord,0)].

cartesian_CoordDiffs(I_Coord,Coord)-> %I:[X1,Y1,Z1] [X2,Y2,Z2] O:[X2-X1,Y2-Y1,Z2-Z1]
    cartesian_CoordDiffs1(I_Coord,Coord,[]).

cartesian_CoordDiffs1([FromCoord|FromCoords],[ToCoord|ToCoords],Acc)->
    cartesian_CoordDiffs1(FromCoords,ToCoords,[ToCoord-FromCoord|Acc]);
cartesian_CoordDiffs1([],[],Acc)->
    lists:reverse(Acc).

cartesian_GaussedCoordDiffs(FromCoords,ToCoords)->
    cartesian_GaussedCoordDiffs1(FromCoords,ToCoords,[]).

cartesian_GaussedCoordDiffs1([FromCoord|FromCoords],[ToCoord|ToCoords],Acc)->
    cartesian_GaussedCoordDiffs1(FromCoords,ToCoords,[functions:gaussian(ToCoord-FromCoord)|Acc]);
cartesian_GaussedCoordDiffs1([],[],Acc)->
    lists:reverse(Acc).

cart2pol([Y,X])->
    R = math:sqrt(X*X + Y*Y),
    Theta = case R == 0 of
        true ->
            0;
        false ->
            if
                (X>0) and (Y>=0) -> math:atan(Y/X);
                (X>0) and (Y<0) -> math:atan(Y/X) + 2*math:pi();
                (X<0) -> math:atan(Y/X) + math:pi();
                (X==0) and (Y>0) -> math:pi()/2;
                (X==0) and (Y<0) -> 3*math:pi()/2
            end
    end,
    [R,Theta].

cart2spher([Z,Y,X])->
    PreR = X*X + Y*Y,
    R = math:sqrt(PreR),
    P = math:sqrt(PreR + Z*Z),
    Theta = case R == 0 of
        true ->
            0;
        false ->
            if
                (X>0) and (Y>=0) -> math:atan(Y/X);
                (X>0) and (Y<0) -> math:atan(Y/X) + 2*math:pi();
                (X<0) -> math:atan(Y/X) + math:pi();
                (X==0) and (Y>0) -> math:pi()/2;
                (X==0) and (Y<0) -> 3*math:pi()/2
            end
    end,
    Phi = case P == 0 of
        false ->
            math:acos(Z/P);
        true ->
            0
    end,
    [P,Theta,Phi].

centripetal_distance([Val|Coord],Acc)->
    centripetal_distance(Coord,Val*Val+Acc);
centripetal_distance([],Acc)->
    math:sqrt(Acc).

calculate_distance([Val1|Coord1],[Val2|Coord2],Acc)->
    Distance = Val2 - Val1,
    calculate_distance(Coord1,Coord2,Distance*Distance+Acc);
calculate_distance([],[],Acc)->
    math:sqrt(Acc).

The implementation of the new functions in the functions module is straightforward. These algorithms are standard implementations of coordinate operators.
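As a quick sanity check, here is how two of these operators behave on a hypothetical pair of 4d coordinates (shell session shown for illustration):

1> functions:cartesian([-1,-1,0,-1],[0,-1,-1,-1]).
[-1,-1,0,-1,0,-1,-1,-1]
2> functions:cartesian_distance([-1,-1,0,-1],[0,-1,-1,-1]).
[1.4142135623730951]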

With this new substrate_cpp module, and the updated functions module, the cpps can now be called by the substrate process. When called, the cpps calculate the "sensory" vectors for the NN, and then fan out those composed vectors to the neurons that they are connected to. In the next subsection, we implement the substrate_ceps.

16.5.2 Implementing the substrate_cep Module

Similarly to the implementation of the substrate_cpp module, we now implement the substrate_cep module based on the way the actuator module is set up. Similarly to the way the actuator works, the substrate_cep receives the signals from the neurons in the NN that are connected to it. The substrate_cep then processes the signals, and sends the processed message to the substrate. We only have one substrate_cep at this time, the set_weight substrate_cep, which has a single presynaptic connection to a neuron, whose signal it then processes, scales, and sends to the substrate as the synaptic weight associated with the coordinate information forwarded to the NN by its substrate_cpps. Listing-16.5 shows the implemented substrate_cep module.

Listing-16.5 The implementation of the substrate_cep module.

-module(substrate_cep).
-compile(export_all).
-include("records.hrl").

gen(ExoSelf_PId,Node)->
    spawn(Node,?MODULE,prep,[ExoSelf_PId]).

prep(ExoSelf_PId) ->
    receive
        {ExoSelf_PId,{Id,Cx_PId,Substrate_PId,CEPName,Parameters,Fanin_PIds}} ->
            loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CEPName,Parameters,{Fanin_PIds,Fanin_PIds},[])
    end.
%When gen/2 is executed, it spawns the substrate_cep process, which immediately begins to wait for its initial state message.

loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CEPName,Parameters,{[From_PId|Fanin_PIds],MFanin_PIds},Acc) ->
    receive
        {From_PId,forward,Input} ->
            loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CEPName,Parameters,{Fanin_PIds,MFanin_PIds},lists:append(Input,Acc));
        {ExoSelf_PId,terminate} ->
            ok
    end;
loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CEPName,Parameters,{[],MFanin_PIds},Acc)->
    ProperlyOrdered_Input=lists:reverse(Acc),
    substrate_cep:CEPName(ProperlyOrdered_Input,Parameters,Substrate_PId),
    loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CEPName,Parameters,{MFanin_PIds,MFanin_PIds},[]).
%The substrate_cep process gathers the control signals from the neurons, appending them to the accumulator. The order in which the signals are accumulated into a vector is the same order in which the neuron ids are stored within NIds. Once all the signals have been gathered, the substrate_cep executes its function, forwards the processed signal to the substrate, and then again begins to wait for the neural signals from the output layer, by resetting the Fanin_PIds from the second copy of the list stored in MFanin_PIds.

%%%%%%%% Substrate_CEPs %%%%%%%%
set_weight(Output,_Parameters,Substrate_PId)->
    [Val] = Output,
    Threshold = 0.33,
    Weight = if
        Val > Threshold ->
            (functions:scale(Val,1,Threshold)+1)/2;
        Val < -Threshold ->
            (functions:scale(Val,-Threshold,-1)-1)/2;
        true ->
            0
    end,
    Substrate_PId ! {self(),set_weight,[Weight]}.
%The set_weight/3 function first checks whether the neural output signal has a greater magnitude than the Threshold value, which is set to 0.33 in this implementation. If it does not, then the synaptic weight is zeroed out, and sent to the substrate. If the magnitude of the output is higher, then the value is scaled between -1 and 1, and the resulting synaptic weight value is sent to the substrate.


Mirroring the actuator, the substrate_cep gathers the signals, processes them, then sends the substrate process a message, and returns to its main process loop. Unlike the actuator, it does not need to sync up with the cortex, or receive any information from the substrate after sending it the action message.

16.6 Updating the genotype Module

Having now developed the actual substrate_cpp and substrate_cep modules, we can return to the genotype module and update it so that it is capable of creating seed SENNs. This is a very simple module update, because cpps and ceps both behave just like the sensors and actuators of the standard NN system do. In the case of substrate encoding, we simply generate both the sensor/actuator lists and the substrate_cpp/substrate_cep lists. Afterwards, we create the seed NN topology by forwarding to the construction function the cpps and ceps, and because there is no difference between the structure of the sensors/actuators and the cpps/ceps, the seed NN topology is created just as before.

When the encoding type is set to substrate, we create not just the substrate_cpps and substrate_ceps, but also the substrate record. It is the substrate record in which we store the cpp_ids, cep_ids, the densities, the linkform, and the plasticity of the substrate (which at this time is set to none). The updated version of the construct_Cortex/6 function within the genotype module is shown in Listing-16.6. Note the significant reuse of the existing code when Encoding_Type = substrate.

Listing-16.6 The updated construct_Cortex/6 function, now capable of creating seed SENN genotypes.

construct_Cortex(Agent_Id,Generation,SpecCon,Encoding_Type,SPlasticity,SLinkform)->
    Cx_Id = {{origin,generate_UniqueId()},cortex},
    Morphology = SpecCon#constraint.morphology,
    case Encoding_Type of
        neural ->
            Sensors = [S#sensor{id={{-1,generate_UniqueId()},sensor},cx_id=Cx_Id,generation=Generation} || S <- morphology:get_InitSensors(Morphology)],
            Actuators = [A#actuator{id={{1,generate_UniqueId()},actuator},cx_id=Cx_Id,generation=Generation} || A <- morphology:get_InitActuators(Morphology)],
            N_Ids = construct_InitialNeuroLayer(Cx_Id,Generation,SpecCon,Sensors,Actuators,[],[]),
            S_Ids = [S#sensor.id || S <- Sensors],
            A_Ids = [A#actuator.id || A <- Actuators],
            Cortex = #cortex{
                id = Cx_Id,
                agent_id = Agent_Id,
                neuron_ids = N_Ids,
                sensor_ids = S_Ids,
                actuator_ids = A_Ids
            },
            Substrate_Id = undefined;
        substrate ->
            Substrate_Id = {{void,generate_UniqueId()},substrate},
            Sensors = [S#sensor{id={{-1,generate_UniqueId()},sensor},cx_id=Cx_Id,generation=Generation,fanout_ids=[Substrate_Id]} || S <- morphology:get_InitSensors(Morphology)],
            Actuators = [A#actuator{id={{1,generate_UniqueId()},actuator},cx_id=Cx_Id,generation=Generation,fanin_ids=[Substrate_Id]} || A <- morphology:get_InitActuators(Morphology)],
            [write(S) || S <- Sensors],
            [write(A) || A <- Actuators],
            Dimensions = calculate_OptimalSubstrateDimension(Sensors,Actuators),
            Density = 5,
            Depth = 1,
            Densities = [Depth,1|lists:duplicate(Dimensions-2,Density)], %[X,Y,Z,T...]
            Substrate_CPPs = [CPP#sensor{id={{-1,generate_UniqueId()},sensor},cx_id=Cx_Id,generation=Generation} || CPP <- morphology:get_InitSubstrateCPPs(Dimensions,SPlasticity)],
            Substrate_CEPs = [CEP#actuator{id={{1,generate_UniqueId()},actuator},cx_id=Cx_Id,generation=Generation} || CEP <- morphology:get_InitSubstrateCEPs(Dimensions,SPlasticity)],
            N_Ids = construct_InitialNeuroLayer(Cx_Id,Generation,SpecCon,Substrate_CPPs,Substrate_CEPs,[],[]),
            S_Ids = [S#sensor.id || S <- Sensors],
            A_Ids = [A#actuator.id || A <- Actuators],
            CPP_Ids = [CPP#sensor.id || CPP <- Substrate_CPPs],
            CEP_Ids = [CEP#actuator.id || CEP <- Substrate_CEPs],
            Substrate = #substrate{
                id = Substrate_Id,
                agent_id = Agent_Id,
                cpp_ids = CPP_Ids,
                cep_ids = CEP_Ids,
                densities = Densities,
                plasticity = SPlasticity,
                linkform = SLinkform
            },
            write(Substrate),
            Cortex = #cortex{
                id = Cx_Id,
                agent_id = Agent_Id,
                neuron_ids = N_Ids,
                sensor_ids = S_Ids,
                actuator_ids = A_Ids
            }
    end,
    write(Cortex),
    {Cx_Id,[{0,N_Ids}],Substrate_Id}.

calculate_OptimalSubstrateDimension(Sensors,Actuators)->
    S_Formats = [S#sensor.format || S <- Sensors],
    A_Formats = [A#actuator.format || A <- Actuators],
    extract_maxdim(S_Formats++A_Formats,[]) + 2.
%The calculate_OptimalSubstrateDimension/2 function calculates the largest dimension between the sensors and actuators, and then returns that value + 2.

extract_maxdim([F|Formats],Acc)->
    DS=case F of
        {symmetric,Dims}->
            length(Dims);
        no_geo ->
            1;
        undefined ->
            1
    end,
    extract_maxdim(Formats,[DS|Acc]);
extract_maxdim([],Acc)->
    lists:max(Acc).
%The extract_maxdim/2 function goes through a list of formats, and returns to the caller the largest value found, counting the no_geo and undefined atoms as representing 1.
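Using the sensor and actuator formats from the earlier 4d example, the computation proceeds as expected (a hypothetical shell session):

1> genotype:extract_maxdim([no_geo,{symmetric,[2,3]},no_geo,{symmetric,[3,2]}],[]).
2

Thus calculate_OptimalSubstrateDimension/2 returns 2 + 2 = 4, matching the dimensionality derived for Fig-16.11.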

But this only takes care of creating the genotype of the substrate encoded NN; we also need to update the agent cloning, agent deleting, and agent genotype printing functions. Listing-16.7 shows the implementation of the noted functions.

Listing-16.7 The implementation of the updated genotype:print/1, genotype:delete_Agent/1, and genotype:clone_Agent/1 functions.

print(Agent_Id)->
    F = fun()->
        A = read({agent,Agent_Id}),
        Cx = read({cortex,A#agent.cx_id}),
        io:format("~p~n",[A]),
        io:format("~p~n",[Cx]),
        [io:format("~p~n",[read({sensor,Id})]) || Id <- Cx#cortex.sensor_ids],
        [io:format("~p~n",[read({neuron,Id})]) || Id <- Cx#cortex.neuron_ids],
        [io:format("~p~n",[read({actuator,Id})]) || Id <- Cx#cortex.actuator_ids],
        case A#agent.substrate_id of
            undefined ->
                ok;
            Substrate_Id->
                Substrate = read({substrate,Substrate_Id}),
                io:format("~p~n",[Substrate]),
                [io:format("~p~n",[read({sensor,Id})]) || Id <- Substrate#substrate.cpp_ids],
                [io:format("~p~n",[read({actuator,Id})]) || Id <- Substrate#substrate.cep_ids]
        end
    end,
    mnesia:transaction(F).
%print/1 accepts an agent's id, and prints out the complete genotype of that agent.

delete_Agent(Agent_Id)->
    A = read({agent,Agent_Id}),
    Cx = read({cortex,A#agent.cx_id}),
    [delete({neuron,Id}) || Id <- Cx#cortex.neuron_ids],
    [delete({sensor,Id}) || Id <- Cx#cortex.sensor_ids],
    [delete({actuator,Id}) || Id <- Cx#cortex.actuator_ids],
    delete({cortex,A#agent.cx_id}),
    delete({agent,Agent_Id}),
    case A#agent.substrate_id of
        undefined ->
            ok;
        Substrate_Id ->
            Substrate = read({substrate,Substrate_Id}),
            [delete({sensor,Id}) || Id <- Substrate#substrate.cpp_ids],
            [delete({actuator,Id}) || Id <- Substrate#substrate.cep_ids],
            delete({substrate,Substrate_Id})
    end.
%delete_Agent/1 accepts the id of an agent, and then deletes that agent's genotype. This function assumes that the id of the agent will be removed from the specie's agent_ids list, and that any other cleanup procedures will be done by the calling function.

clone_Agent(Agent_Id)->
    CloneAgent_Id = {generate_UniqueId(),agent},
    clone_Agent(Agent_Id,CloneAgent_Id).

clone_Agent(Agent_Id,CloneAgent_Id)->
    F = fun()->
        A = read({agent,Agent_Id}),
        Cx = read({cortex,A#agent.cx_id}),
        IdsNCloneIds = ets:new(idsNcloneids,[set,private]),
        ets:insert(IdsNCloneIds,{bias,bias}),
        ets:insert(IdsNCloneIds,{Agent_Id,CloneAgent_Id}),
        [CloneCx_Id] = map_ids(IdsNCloneIds,[A#agent.cx_id],[]),
        CloneN_Ids = map_ids(IdsNCloneIds,Cx#cortex.neuron_ids,[]),
        CloneS_Ids = map_ids(IdsNCloneIds,Cx#cortex.sensor_ids,[]),
        CloneA_Ids = map_ids(IdsNCloneIds,Cx#cortex.actuator_ids,[]),
        case A#agent.substrate_id of
            undefined ->
                clone_neurons(IdsNCloneIds,Cx#cortex.neuron_ids),
                clone_sensors(IdsNCloneIds,Cx#cortex.sensor_ids),
                clone_actuators(IdsNCloneIds,Cx#cortex.actuator_ids),
                U_EvoHist=map_EvoHist(IdsNCloneIds,A#agent.evo_hist),
                write(Cx#cortex{
                    id = CloneCx_Id,
                    agent_id = CloneAgent_Id,
                    sensor_ids = CloneS_Ids,
                    actuator_ids = CloneA_Ids,
                    neuron_ids = CloneN_Ids
                }),
                write(A#agent{
                    id = CloneAgent_Id,
                    cx_id = CloneCx_Id,
                    evo_hist = U_EvoHist
                });
            Substrate_Id ->
                Substrate = read({substrate,Substrate_Id}),
                [CloneSubstrate_Id] = map_ids(IdsNCloneIds,[Substrate_Id],[]),
                CloneCPP_Ids = map_ids(IdsNCloneIds,Substrate#substrate.cpp_ids,[]),
                CloneCEP_Ids = map_ids(IdsNCloneIds,Substrate#substrate.cep_ids,[]),
                clone_neurons(IdsNCloneIds,Cx#cortex.neuron_ids),
                clone_sensors(IdsNCloneIds,Cx#cortex.sensor_ids),
                clone_actuators(IdsNCloneIds,Cx#cortex.actuator_ids),
                clone_sensors(IdsNCloneIds,Substrate#substrate.cpp_ids),
                clone_actuators(IdsNCloneIds,Substrate#substrate.cep_ids),
                U_EvoHist=map_EvoHist(IdsNCloneIds,A#agent.evo_hist),
                write(Substrate#substrate{
                    id = CloneSubstrate_Id,
                    agent_id = CloneAgent_Id,
                    cpp_ids = CloneCPP_Ids,
                    cep_ids = CloneCEP_Ids
                }),
                write(Cx#cortex{
                    id = CloneCx_Id,
                    agent_id = CloneAgent_Id,
                    sensor_ids = CloneS_Ids,
                    actuator_ids = CloneA_Ids,
                    neuron_ids = CloneN_Ids
                }),
                write(A#agent{
                    id = CloneAgent_Id,
                    cx_id = CloneCx_Id,
                    substrate_id = CloneSubstrate_Id,
                    evo_hist = U_EvoHist
                })
        end,
        ets:delete(IdsNCloneIds)
    end,
    mnesia:transaction(F),
    CloneAgent_Id.

Now that we have the ability to create the substrate encoded NN genotypes, we need to update the exoself module, so that the exoself process can properly spawn and link all the elements together when the NN is of type: substrate.

16.7 Updating the exoself Module


Exoself reads the genotype and produces the phenotype. When a NN based agent is substrate encoded, the exoself must behave slightly differently when spawning and linking the elements of the genotype. The exoself process must now keep track of the substrate_pid, cpp_pids and cep_pids, so that it can terminate them when the evaluation is done. Thus we update its state record by appending to it the following elements: substrate_pid, cpp_pids=[], cep_pids=[].
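A sketch of the resulting state record is shown below. All fields except the last three already exist in our system; their exact set may differ slightly from the actual exoself module, so this is only illustrative:

-record(state,{
    agent_id,
    generation,
    pm_pid,
    idsNpids,
    cx_pid,
    specie_id,
    spids=[],
    npids=[],
    nids=[],
    apids=[],
    scape_pids=[],
    max_attempts,
    tuning_selection_f,
    annealing_parameter,
    tuning_duration_f,
    perturbation_range,
    substrate_pid, %new
    cpp_pids=[], %new
    cep_pids=[] %new
}).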

The spawning of the substrate encoded NN and the neural encoded NN requires us to add the new spawn and link substrate_cpps and substrate_ceps functions. We must also update the exoself:prep/2 function so that when the agent is of type substrate, it can spawn and link the substrate, cpp, and cep processes together. The updated exoself:prep/2 function is shown in Listing-16.8.

Listing-16.8 The implementation of the updated exoself:prep/2 function.

prep(Agent_Id,PM_PId)->
    random:seed(now()),
    IdsNPIds = ets:new(idsNpids,[set,private]),
    A = genotype:dirty_read({agent,Agent_Id}),
    HeredityType = A#agent.heredity_type,
    Cx = genotype:dirty_read({cortex,A#agent.cx_id}),
    SIds = Cx#cortex.sensor_ids,
    AIds = Cx#cortex.actuator_ids,
    NIds = Cx#cortex.neuron_ids,
    ScapePIds = spawn_Scapes(IdsNPIds,SIds,AIds),
    spawn_CerebralUnits(IdsNPIds,cortex,[Cx#cortex.id]),
    spawn_CerebralUnits(IdsNPIds,sensor,SIds),
    spawn_CerebralUnits(IdsNPIds,actuator,AIds),
    spawn_CerebralUnits(IdsNPIds,neuron,NIds),
    case A#agent.encoding_type of
        substrate ->
            Substrate_Id=A#agent.substrate_id,
            Substrate = genotype:dirty_read({substrate,Substrate_Id}),
            CPP_Ids = Substrate#substrate.cpp_ids,
            CEP_Ids = Substrate#substrate.cep_ids,
            spawn_CerebralUnits(IdsNPIds,substrate_cpp,CPP_Ids),
            spawn_CerebralUnits(IdsNPIds,substrate_cep,CEP_Ids),
            spawn_CerebralUnits(IdsNPIds,substrate,[Substrate_Id]),
            Substrate_PId=ets:lookup_element(IdsNPIds,Substrate_Id,2),
            link_SubstrateCPPs(CPP_Ids,IdsNPIds,Substrate_PId),
            link_SubstrateCEPs(CEP_Ids,IdsNPIds,Substrate_PId),
            SDensities = Substrate#substrate.densities,
            SPlasticity = Substrate#substrate.plasticity,
            SLinkform = Substrate#substrate.linkform,
            Sensors=[genotype:dirty_read({sensor,SId}) || SId <- SIds],
            Actuators=[genotype:dirty_read({actuator,AId}) || AId <- AIds],
            CPP_PIds=[ets:lookup_element(IdsNPIds,Id,2) || Id <- CPP_Ids],
            CEP_PIds=[ets:lookup_element(IdsNPIds,Id,2) || Id <- CEP_Ids],
            Substrate_PId ! {self(),init,{Sensors,Actuators,[ets:lookup_element(IdsNPIds,Id,2) || Id <- SIds],[ets:lookup_element(IdsNPIds,Id,2) || Id <- AIds],CPP_PIds,CEP_PIds,SDensities,SPlasticity,SLinkform}};
        _ ->
            CPP_PIds=[],
            CEP_PIds=[],
            Substrate_PId = undefined
    end,
    link_Sensors(SIds,IdsNPIds),
    link_Actuators(AIds,IdsNPIds),
    link_Neurons(NIds,IdsNPIds,HeredityType),
    {SPIds,NPIds,APIds}=link_Cortex(Cx,IdsNPIds),
    Cx_PId = ets:lookup_element(IdsNPIds,Cx#cortex.id,2),
    {TuningDurationFunction,Parameter} = A#agent.tuning_duration_f,
    S = #state{
        agent_id=Agent_Id,
        generation=A#agent.generation,
        pm_pid=PM_PId,
        idsNpids=IdsNPIds,
        cx_pid=Cx_PId,
        specie_id=A#agent.specie_id,
        spids=SPIds,
        npids=NPIds,
        nids=NIds,
        apids=APIds,
        substrate_pid=Substrate_PId,
        cpp_pids = CPP_PIds,
        cep_pids = CEP_PIds,
        scape_pids=ScapePIds,
        max_attempts= tuning_duration:TuningDurationFunction(Parameter,NIds,A#agent.generation),
        tuning_selection_f=A#agent.tuning_selection_f,
        annealing_parameter=A#agent.annealing_parameter,
        tuning_duration_f=A#agent.tuning_duration_f,
        perturbation_range=A#agent.perturbation_range
    },
    loop(S).

link_SubstrateCPPs([CPP_Id|CPP_Ids],IdsNPIds,Substrate_PId) ->
    CPP = genotype:dirty_read({sensor,CPP_Id}),
    CPP_PId = ets:lookup_element(IdsNPIds,CPP_Id,2),
    Cx_PId = ets:lookup_element(IdsNPIds,CPP#sensor.cx_id,2),
    CPPName = CPP#sensor.name,
    Fanout_Ids = CPP#sensor.fanout_ids,
    Fanout_PIds = [ets:lookup_element(IdsNPIds,Id,2) || Id <- Fanout_Ids],
    CPP_PId ! {self(),{CPP_Id,Cx_PId,Substrate_PId,CPPName,CPP#sensor.vl,CPP#sensor.parameters,Fanout_PIds}},
    link_SubstrateCPPs(CPP_Ids,IdsNPIds,Substrate_PId);
link_SubstrateCPPs([],_IdsNPIds,_Substrate_PId)->
    ok.
%The link_SubstrateCPPs/3 function sends to the already spawned and waiting substrate_cpps their states, composed of the PId lists and other information which is needed by the substrate_cpps to link up and interface with other elements in the distributed phenotype.

link_SubstrateCEPs([CEP_Id|CEP_Ids],IdsNPIds,Substrate_PId) ->
    CEP = genotype:dirty_read({actuator,CEP_Id}),
    CEP_PId = ets:lookup_element(IdsNPIds,CEP_Id,2),
    Cx_PId = ets:lookup_element(IdsNPIds,CEP#actuator.cx_id,2),
    CEPName = CEP#actuator.name,
    Fanin_Ids = CEP#actuator.fanin_ids,
    Fanin_PIds = [ets:lookup_element(IdsNPIds,Id,2) || Id <- Fanin_Ids],
    CEP_PId ! {self(),{CEP_Id,Cx_PId,Substrate_PId,CEPName,CEP#actuator.parameters,Fanin_PIds}},
    link_SubstrateCEPs(CEP_Ids,IdsNPIds,Substrate_PId);
link_SubstrateCEPs([],_IdsNPIds,_Substrate_PId)->
    ok.
%The link_SubstrateCEPs/3 function sends to the already spawned and waiting substrate_ceps their states, composed of the PId lists and other information which is needed by the substrate_ceps to link up and interface with other elements in the distributed phenotype.

The exoself's main loop must also be updated, but not as extensively. We have to be able to tell the substrate process to reset itself when the neurons have been perturbed, or reverted. Thus after every evaluation, we let the exoself send the substrate process a message to reset itself, which makes the substrate process query the perturbed or reverted neurons anew, and set the synaptic weights and connectivity expression of its neurodes accordingly. The added code is shown in the following listing:

Listing-16.9 The extra algorithm needed to tell the substrate process to reset itself.

case S#state.substrate_pid of
    undefined ->
        ok;
    Substrate_PId ->
        Substrate_PId ! {self(),reset_substrate},
        receive
            {Substrate_PId,ready}->
                ok
        end
end,

Finally, the exoself must now also be able to terminate the substrate process, and its cpps and ceps. Thus we make a slight modification to the terminate_phenotype function, as shown in Listing-16.10.

Listing-16.10 The updated exoself:terminate_phenotype function.

terminate_phenotype(Cx_PId,SPIds,NPIds,APIds,ScapePIds,CPP_PIds,CEP_PIds,Substrate_PId)->
    [PId ! {self(),terminate} || PId <- SPIds],
    [PId ! {self(),terminate} || PId <- APIds],
    [PId ! {self(),terminate} || PId <- NPIds],
    [PId ! {self(),terminate} || PId <- ScapePIds],
    case Substrate_PId == undefined of
        true ->
            ok;
        false ->
            [PId ! {self(),terminate} || PId <- CPP_PIds],
            [PId ! {self(),terminate} || PId <- CEP_PIds],
            Substrate_PId ! {self(),terminate}
    end,
    Cx_PId ! {self(),terminate}.

No other modifications are needed. With this, the exoself is ready to work with both the direct encoded NNs and the substrate encoded NNs. Next, we implement the most important thing, the actual substrate module.

16.8 Implementing the substrate Module


While developing the substrate_cpp and substrate_cep modules, we have also created the message format that will be used to exchange messages between the substrate process and the interfacing cpps and ceps. Now that we know the architecture of the substrate encoded NN, the manner in which cpps and ceps will process signals and interface with the substrate, and the way we will encode the substrate, through the use of the densities list and the dynamic input and output hyperlayer creation through the analysis of sensor and actuator lists, we are ready to create the phenotypic representation of the substrate.

We now continue where we left off in Section-16.4 with regards to the substrate representation and signal processing. This is going to be a bit more complex than the other things we've created. Knowing everything else about our system: the way the cortex synchronizes the sensors and actuators, the way the sensors acquire, process, and fan out their sensory signals to the NN, or in this case a substrate, and the way the actuators wait for the signals from every id in their fanin_ids list, in this case just the substrate, how do we make it all work?

For the substrate to function as we planned in our architectural design, it must perform the following steps:

1. The substrate process is created using the gen/2 function.

2. The process then receives from the exoself a list of sensors & actuators with their pids, a list of pids for the substrate_cpps and substrate_ceps, and finally the Densities list and the linkform. From this, it must create a proper substrate using the information from the Sensors, Actuators, and Densities lists, and the Linkform parameter. The substrate process must do so while keeping in mind that every time the exoself perturbs or reverts the neurons, the substrate's synaptic connection weights and expressions in general must also be updated.

3. The process must use some kind of flag, a substrate_state_flag, which will keep track of whether the substrate must be updated (as in the case when the NN has been perturbed) or can be kept for use again. The substrate process then drops into its main loop.

4. Just like other processes that accumulate incoming signals, the substrate process waits and accumulates (in the same order as the sensors are within the sensors list) the sensory signals. Once all the sensory signals have been accumulated, the substrate process drops into its processing clause.

5. The substrate process checks what the substrate_state_flag is set to. If the flag is set to reset, then the substrate should be recreated or reset (due to a perturbation of the NN), before being able to process the sensory signals and produce the output signals destined for the actuators. If the substrate_state_flag is set to hold the current substrate, then it does not need to be recreated/updated.

6. This step is executed if the substrate_state_flag was set to reset. The substrate process analyzes the sensors, densities, and actuators, and based on the format within the sensors and actuators, and the dimensionality and form of the Densities list, it creates a substrate in the form we have discussed in Section-16.4. At this point all the synaptic weights within this feedforward substrate (for now, we will only use the hyperlayer-to-hyperlayer feedforward substrate linkform) are set to 0. Thus they must now be set to their proper synaptic weight values.

7. For every connected set of neurodes, the substrate process forwards the coordinates of the two neurodes to the PIds of the substrate_cpps in its CPP_PIds list. And for every sent out tuple of coordinates, it waits for the signals from the PIds of the substrate_ceps in its CEP_PIds list. The signals will have some value, and the name of the function which will dictate what function to execute on the currently used synaptic weight. The function will either simply set the synaptic weight to this new value forwarded to the substrate process by the substrate_cep, or perhaps modify the existing synaptic weight in the case plasticity is implemented within the substrate... Once this is done for every connection between the neurodes, given the hyperlayer-to-hyperlayer feedforward architecture, the substrate is now considered functional and ready for use, and the substrate_state_flag is set to the value hold.

8. Because the input hyperlayer was created specifically based on the list of sensors that the SENN uses, it will have the architecture needed for the accumulated list of sensory signals to be mapped to this input hyperlayer perfectly. The accumulated list of sensory signals is a list of vectors. Each vector has the same length as every multidimensional hyperplane within the input hyperlayer, thus we can now replace the Output part within the tuples [{NeurodeCoordinate,Output,void}...] of the input hyperplanes by the values of the sensory signals, as shown in Fig-16.12 (a small mapping sketch follows the figure below), making these tuples into: [{NeurodeCoordinate,RealOutput,void}...], where RealOutput is the value taken from the accumulated sensory signals list, associated with that particular neurode coordinate.

9. Now that the Input Hyperlayer has its output values set to real output values from the sensors, the substrate can be used to process the signals. We now use the algorithm discussed in Section-16.4 to perform processing in the hyperlayer-to-hyperlayer feedforward fashion. Since the neurodes in the processing hyperlayers now have their synaptic weights, they can process all the input signals from the presynaptic neurodes of the presynaptic hyperlayer. In this manner, the substrate processes the sensory signals until the output hyperlayer's Output values are calculated.

10. The Output values in the tuples representing the neurodes within the output hyperlayer now contain the actual output signals of the substrate. The layers within the output hyperlayer are in the same order as the actuators within the Actuators list for which they are destined. Thus the substrate can now extract from the layers associated with each actuator the corresponding output vectors, and forward them to the PIds of their respective actuators.

11. At this point the actuators are interacting with the environment, and the substrate drops back into its main loop, awaiting again the signals from its sensors or the exoself.

Fig. 16.12 The initial input hyperlayer with default output values, and the mapping between the sensory signals produced by the sensors and the input hyperlayer.
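A minimal sketch of the step-8 mapping, using hypothetical values (the actual function performing it, populate_InputHyperlayer/3, is shown later in this section):

%Input hyperplane neurodes before the mapping: [{[-1,-1],0,void},{[-1,1],0,void}]
%Flattened accumulated sensory signals: [0.5,-0.3]
%Input hyperplane neurodes after the mapping: [{[-1,-1],0.5,void},{[-1,1],-0.3,void}]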

At any point the exoself can send the substrate process a signal to reset itself. When the substrate process receives this signal, it simply sets its substrate_state_flag to reset, and thus after the next time it accumulates the sensory signals, it resets. To reset, it re-queries the NN to set the synaptic weights and connectivity between its neurodes anew, before processing the new sensory signals. The diagram of the step-by-step functionality of the substrate process is shown in Fig-16.13.


Fig. 16.13 The step-by-step functionality of the substrate process.

Let's quickly go over the steps shown in the above figure:

1. Exoself spawns neurons, sensors, actuators, substrate_cpps, substrate_ceps, substrate, and the cortex process.

2. Cortex sends the sync message to all the sensors, calling them to action.

3. Sensors poll the environment for sensory signals.

4. Sensors do postprocessing of the signals.

5. Sensors forward the processed sensory signals to the substrate.

6. The substrate process gathers all the signals from the sensors, and based on those signals, its densities, and the actuators, constructs a substrate if its substrate_state_flag is set to reset. If substrate_state_flag is set to hold, go to the next step.

7. The substrate sends the coordinates of the connected neurodes to the substrate_cpps it is connected to.

8. The cpps process the coordinates, producing a new output vector.

9. The cpps forward the processed coordinate vectors to the neurons they are connected to in the NN.

10. The NN processes the coordinate signals.

11. The neurons in the output layer of the NN produce output signals, which are then sent to the ceps they are connected to.

12. The ceps wait and gather the signals from all the neurons with whom they have presynaptic links. The ceps process the accumulated signals.

13. The ceps forward the vector signals to the substrate.

14. The substrate process calls the cpps for every connected neurode in the substrate. Once all the neurodes have their synaptic weights, the substrate maps the signals from the sensors to the input hyperlayer. It then processes the sensory signals, until at some later point the output hyperlayer contains the output signals.

15. Each hyperplane in the output hyperlayer is associated with its own actuator, to which its output vector is then forwarded.

16. Actuators gather the signals sent to them from their fanin_ids list (in this case the id of the substrate process).

17. Actuators use the signals to take action and interact with the environment they are interfacing with.

18. Actuators send the sync message back to the cortex.

19. The cortex gathers all the sync messages from all its actuators.

20. The cortex calls the sensors to action, for another Sense-Think-Act loop. Go to step 3.

Using all of this information, we can now create the substrate module. The implementation of the substrate module will follow the 11 step approach we discussed before the above 20 step sequence.

1. The process is created using the gen/2 function.

This is a simple function, similar to the one we use for every other element, as shown in the following listing.

Listing-16.11 The substrate process' state record, and the gen/2 function.

-module(substrate).
-compile(export_all).
-include("records.hrl").
-define(SAT_LIMIT,math:pi()).
-record(state,{
   type,
   plasticity=none,
   morphology,
   specie_id,
   sensors,
   actuators,
   spids=[],
   apids=[],
   cpp_pids=[],
   cep_pids=[],
   densities,
   substrate_state_flag,
   old_substrate,
   cur_substrate,
   link_form
}).

gen(ExoSelf_PId,Node)->
   spawn(Node,?MODULE,prep,[ExoSelf_PId]).

2. The process then receives from the exoself a list of sensors & actuators with their pids, a list of pids for the substrate_cpps and substrate_ceps, and finally the Densities list and the linkform. From this, it must create a proper substrate using the information from the Sensors, Actuators, and Densities lists, and the Linkform parameter. The substrate process must do so while keeping in mind that every time the exoself perturbs or reverts the neurons, the substrate's synaptic connection weights and expressions in general must also be updated.

3. The process must use some kind of flag, a substrate_state_flag, which will keep track of whether the substrate must be updated (as in the case when the NN has been perturbed) or can be kept for use again. The substrate process then drops into its main loop.

During this step the substrate process begins to wait for its state information from the exoself. Its implementation is shown in Listing-16.12. Because the substrate is created during the processing step based on its substrate_state_flag, the initial Substrate is set to the init atom. The substrate_state_flag is set to the atom reset. We use the atom reset because the substrate's synaptic expression and weight values have to be reset when the NN is perturbed, hence the atom reset is appropriate.

Listing-16.12 The implementation of the prep/1 function.

prep(ExoSelf)->
   random:seed(now()),
   receive
      {ExoSelf,init,InitState}->
         {Sensors,Actuators,SPIds,APIds,CPP_PIds,CEP_PIds,Densities,Plasticity,LinkForm} = InitState,
         S = #state{
            sensors=Sensors,
            actuators=Actuators,
            spids=SPIds,
            apids=APIds,
            cpp_pids=CPP_PIds,
            cep_pids=CEP_PIds,
            densities = Densities,
            substrate_state_flag=reset,
            old_substrate=void,
            cur_substrate=init,
            plasticity=Plasticity,
            link_form = LinkForm
         },
         substrate:loop(ExoSelf,S,SPIds,[])
   end.
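To make the handshake concrete, the following is a hedged sketch of how the exoself side might spawn the substrate and send it the init message expected by prep/1 above; the variable names are assumptions mirroring the InitState tuple:

Substrate_PId = substrate:gen(self(),node()),
Substrate_PId ! {self(),init,{Sensors,Actuators,SPIds,APIds,CPP_PIds,CEP_PIds,Densities,Plasticity,LinkForm}},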

4. Just like other processes that accumulate incoming signals, the substrate process waits and accumulates (in the same order as the sensors are within the sensors list) the sensory signals. Once all the sensory signals have been accumulated, the substrate process drops into its processing clause.

Next we implement the main process loop, during which the substrate process gathers the sensory signals from the sensors, and receives the signal from the exoself to reset the substrate when the exoself perturbs the NN. The implementation of the main loop is shown in the next listing.

Listing-16.13 The implementation of the substrate process's main loop.

loop(ExoSelf,S,[SPId|SPIds],SAcc)->
   receive
      {SPId,forward,Sensory_Signal}->
         loop(ExoSelf,S,SPIds,[Sensory_Signal|SAcc]);
      {ExoSelf,reset_substrate}->
         U_S = S#state{
            old_substrate=S#state.cur_substrate,
            substrate_state_flag=reset
         },
         ExoSelf ! {self(),ready},
         loop(ExoSelf,U_S,[SPId|SPIds],SAcc);
      {ExoSelf,terminate}->
         ok
   end;
loop(ExoSelf,S,[],SAcc)-> %All sensory signals received
   {U_Substrate,U_SMode,OAcc} = reason(SAcc,S),
   advanced_fanout(OAcc,S#state.actuators,S#state.apids),
   U_S = S#state{
      cur_substrate=U_Substrate,
      substrate_state_flag=U_SMode
   },
   loop(ExoSelf,U_S,S#state.spids,[]).

As seen from the above implementation, once the sensory signals have been accumulated from all the sensors, we drop into the clause where the substrate loop calls the necessary functions to create the substrate if needed, and to process the sensory signals. The function which does all that is called reason/2, and it is called once we drop out of the main receive loop. The function reason/2 returns the updated substrate: U_Substrate, the updated substrate_state_flag: U_SMode, and the accumulated output signal: OAcc. The function advanced_fanout/3 uses the OAcc list, breaks it up, and forwards the output signals to their respective actuators.

5. The substrate process checks what the substrate_state_flag is set to. If the flag is set to reset, then the substrate should be recreated or reset (due to a perturbation of the NN), before being able to process the sensory signals and produce the output signals destined for the actuators. If the substrate_state_flag is set to hold the current substrate, then it does not need to be recreated/updated.

This whole step is performed within the reason/2 function, as shown in Listing-16.14.

Listing-16.14 The implementation of the reason/2 function.

reason(Input,S)->
   Densities = S#state.densities,
   Substrate = S#state.cur_substrate,
   SMode = S#state.substrate_state_flag,
   case SMode of
      reset ->
         Sensors=S#state.sensors,
         Actuators=S#state.actuators,
         CPP_PIds = S#state.cpp_pids,
         CEP_PIds = S#state.cep_pids,
         Plasticity = S#state.plasticity,
         New_Substrate = create_substrate(Sensors,Densities,Actuators,S#state.link_form),
         {Output,Populated_Substrate} = calculate_ResetOutput(Densities,New_Substrate,Input,CPP_PIds,CEP_PIds,Plasticity,S#state.link_form),
         U_SMode=case Plasticity of
            none ->
               hold
         end,
         {Populated_Substrate,U_SMode,Output};
      hold ->
         {Output,U_Substrate} = calculate_HoldOutput(Densities,Substrate,Input,S#state.link_form,S#state.plasticity),
         {U_Substrate,SMode,Output}
   end.

As can be seen from the above function, we are already in some sense preparing for the case where the substrate_state_flag is updated differently, when substrate plasticity is used for example. If the substrate_state_flag stored in SMode is set to reset, the function create_substrate/4 is executed to create the new substrate, and then the function calculate_ResetOutput/7 is used to perform the substrate based processing. If SMode is set to hold, then the function processes the signals by executing the function calculate_HoldOutput/5. The difference here is that during the reset state we still need to create the substrate, whereas during the hold state we expect that the substrate has already been created, and so we need only map the input sensory signals to the input hyperlayer.
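As a hedged sketch of that future extension (the iterative atom here is an assumption, based on the substrate plasticity discussion in the next chapter), the U_SMode case might eventually take a form like:

U_SMode=case Plasticity of
   none ->
      hold;
   iterative ->
      iterative %hypothetical: re-query the NN for weight updates every cycle
end,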

6. This step is executed if the substrate_state_flag was set to reset. The substrate process analyzes the sensors, densities, and actuators, and based on the format within the sensors and actuators, and the dimensionality and form of the Densities list, it creates a substrate in the form we have discussed in Section-16.4. At this point all the synaptic weights within this feedforward substrate (for now, we will only use the hyperlayer-to-hyperlayer feedforward substrate linkform) are set to 0. Thus they must now be set to their proper synaptic weight values.

The substrate is constructed by executing the create_substrate/4 function, shown in the following listing.

Listing-16.15 The implementation of the create_substrate/4 function.

create_substrate(Sensors,Densities,Actuators,LinkForm)->
   [Depth|SubDensities] = Densities,
   Substrate_I = compose_ISubstrate(Sensors,length(Densities)),
   I_VL = length(Substrate_I),
   case LinkForm of
      l2l_feedforward ->
         Weight = 0,
         H = mult(SubDensities),
         IWeights = lists:duplicate(I_VL,Weight),
         HWeights = lists:duplicate(H,Weight);
      fully_interconnected ->
         Output_Neurodes = tot_ONeurodes(Actuators,0),
         Weight = 0,
         Tot_HiddenNeurodes = mult([Depth-1|SubDensities]),
         Tot_Weights = Tot_HiddenNeurodes + I_VL + Output_Neurodes,
         IWeights = lists:duplicate(Tot_Weights,Weight),
         HWeights = lists:duplicate(Tot_Weights,Weight);
      jordan_recurrent ->
         Output_Neurodes = tot_ONeurodes(Actuators,0),
         Weight = 0,
         H = mult(SubDensities),
         IWeights = lists:duplicate(I_VL+Output_Neurodes,Weight),
         HWeights = lists:duplicate(H,Weight)
   end,
   case Depth of
      0 ->
         Substrate_O=compose_OSubstrate(Actuators,length(Densities),IWeights),
         [Substrate_I,Substrate_O];
      1 ->
         Substrate_R = cs(SubDensities,IWeights),
         Substrate_O=compose_OSubstrate(Actuators,length(Densities),HWeights),
         [Substrate_I,extrude(0,Substrate_R),Substrate_O];
      _ ->
         Substrate_R = cs(SubDensities,IWeights),
         Substrate_H = cs(SubDensities,HWeights),
         Substrate_O=compose_OSubstrate(Actuators,length(Densities),HWeights),
         [_,RCoord|C1] = build_CoordList(Depth+1),
         [_|C2] = lists:reverse(C1),
         HCoords = lists:reverse(C2),
         ESubstrate_R = extrude(RCoord,Substrate_R),
         ESubstrates_H = [extrude(HCoord,Substrate_H) || HCoord<-HCoords],
         lists:append([[Substrate_I,ESubstrate_R],ESubstrates_H,[Substrate_O]])
   end.
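%As a concrete trace (using Densities = [3,2,3,2], the value passed by the test_cs/0 function shown later): Depth = 3 and SubDensities = [2,3,2], so the last clause above builds [Substrate_I, ESubstrate_R, one extruded hidden hyperlayer, Substrate_O], with the depth coordinates of the processing hyperlayers taken from build_CoordList(Depth+1).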

compose_ISubstrate(Sensors,SubstrateDimension)->
   compose_ISubstrate(Sensors,[],1,SubstrateDimension-2).
compose_ISubstrate([S|Sensors],Acc,Max_Dim,Required_Dim)->
   case S#sensor.format of
      undefined ->
         Dim=1,
         CoordLists = create_CoordLists([S#sensor.vl]),
         ISubstrate_Part=[{Coord,0,void}|| Coord<-CoordLists],
         {Dim,ISubstrate_Part};
      no_geo ->
         Dim=1,
         CoordLists = create_CoordLists([S#sensor.vl]),
         ISubstrate_Part=[{Coord,0,void}|| Coord<-CoordLists],
         {Dim,ISubstrate_Part};
      {symmetric,Resolutions}->
         Dim = length(Resolutions),
         Signal_Length = mult(Resolutions),
         CoordLists = create_CoordLists(Resolutions),
         ISubstrate_Part=[{Coord,0,void}|| Coord<-CoordLists],
         {Dim,ISubstrate_Part}
   end,
   U_Dim = case Max_Dim > Dim of
      true ->
         Max_Dim;
      false ->
         Dim
   end,
   compose_ISubstrate(Sensors,[ISubstrate_Part|Acc],U_Dim,Required_Dim);
compose_ISubstrate([],Acc,ISubstratePart_MaxDim,Required_Dim)->
   case Required_Dim >= ISubstratePart_MaxDim of
      true ->
         ISubstrate_Depth = length(Acc),
         ISubstrate_DepthCoords = build_CoordList(ISubstrate_Depth),
         adv_extrude(Acc,Required_Dim,lists:reverse(ISubstrate_DepthCoords),-1,[]);
      false ->
         exit("Error in adv_extrude, Required_Depth < ISubstratePart_MaxDepth~n")
   end.

adv_extrude([ISubstrate_Part|ISubstrate],Required_Dim,[IDepthCoord|ISubstrate_DepthCoords],LeadCoord,Acc)->
   Extruded_ISP = [{[LeadCoord,IDepthCoord|lists:append(lists:duplicate(Required_Dim - length(Coord),0),Coord)],O,W} || {Coord,O,W}<-ISubstrate_Part],
   extrude(ISubstrate_Part,Required_Dim,IDepthCoord,[]),
   adv_extrude(ISubstrate,Required_Dim,ISubstrate_DepthCoords,LeadCoord,lists:append(Extruded_ISP,Acc));
adv_extrude([],_Required_Dim,[],_LeadCoord,Acc)->
   Acc.

extrude([{Coord,O,W}|ISubstrate_Part],Required_Dim,DepthCoord,Acc)->
   Dim_Dif = Required_Dim - length(Coord),
   U_Coord= [1,DepthCoord|lists:append(lists:duplicate(Dim_Dif,0),Coord)],
   extrude(ISubstrate_Part,Required_Dim,DepthCoord,[{U_Coord,O,W}|Acc]);
extrude([],_Required_Dim,_DepthCoord,Acc)->
   Acc.

compose_OSubstrate(Actuators,SubstrateDimension,Weights)->
   compose_OSubstrate(Actuators,[],1,SubstrateDimension-2,Weights).
compose_OSubstrate([A|Actuators],Acc,Max_Dim,Required_Dim,Weights)->
   case A#actuator.format of
      undefined ->
         Dim=1,
         CoordLists = create_CoordLists([A#actuator.vl]),
         OSubstrate_Part=[{Coord,0,Weights}|| Coord<-CoordLists],
         {Dim,OSubstrate_Part};
      no_geo ->
         Dim=1,
         CoordLists = create_CoordLists([A#actuator.vl]),
         OSubstrate_Part=[{Coord,0,Weights}|| Coord<-CoordLists],
         {Dim,OSubstrate_Part};
      {symmetric,Resolutions}->
         Dim = length(Resolutions),
         Signal_Length = mult(Resolutions),
         CoordLists = create_CoordLists(Resolutions),
         OSubstrate_Part=[{Coord,0,Weights}|| Coord<-CoordLists],
         {Dim,OSubstrate_Part}
   end,
   U_Dim = case Max_Dim > Dim of
      true ->
         Max_Dim;
      false ->
         Dim
   end,
   compose_OSubstrate(Actuators,[OSubstrate_Part|Acc],U_Dim,Required_Dim,Weights);
compose_OSubstrate([],Acc,OSubstratePart_MaxDim,Required_Dim,_Weights)->
   case Required_Dim >= OSubstratePart_MaxDim of
      true ->
         ISubstrate_Depth = length(Acc),
         ISubstrate_DepthCoords = build_CoordList(ISubstrate_Depth),
         adv_extrude(Acc,Required_Dim,lists:reverse(ISubstrate_DepthCoords),1,[]);
      false ->
         exit("Error in adv_extrude, Required_Depth < OSubstratePart_MaxDepth~n")
   end.

find_depth(Resolutions)->find_depth(Resolutions,0).
find_depth(Resolutions,Acc)->
   case is_list(Resolutions) of
      true ->
         [_Head|Tail] = Resolutions,
         find_depth(Tail,Acc+1);
      false ->
         Acc
   end.

build_CoordList(Density)->
   case Density == 1 of
      true ->
         [0.0];
      false ->
         DensityDividers = Density - 1,
         Resolution = 2/DensityDividers,
         build_CoordList(Resolution,DensityDividers,1,[])
   end.
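%A quick sanity check in the Erlang shell (assuming the full substrate module from this chapter is compiled; cs/2 is defined further below). build_CoordList/1 spreads Density coordinates evenly across the [-1,1] range:
%1> substrate:build_CoordList(3).
%[-1,0.0,1]
%2> substrate:build_CoordList(2).
%[-1,1]
%3> substrate:cs([2],[0,0]).
%[{[-1],0,[0,0]},{[1],0,[0,0]}]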

extend(I,DI,D,Substrate)->
   void.

mult(List)->
   mult(List,1).
mult([Val|List],Acc)->
   mult(List,Val*Acc);
mult([],Acc)->
   Acc.

tot_ONeurodes([A|Actuators],Acc)->
   Tot_ANeurodes=case A#actuator.format of
      undefined ->
         A#actuator.vl;
      no_geo ->
         A#actuator.vl;
      {symmetric,Resolutions}->
         mult(Resolutions)
   end,
   tot_ONeurodes(Actuators,Tot_ANeurodes+Acc);
tot_ONeurodes([],Acc)->
   Acc.

cs(Densities,Weights)->
   RDensities = lists:reverse(Densities),
   Substrate = create_CoordLists(RDensities,[]),
   attach(Substrate,0,Weights).

create_CoordLists(Densities)->
   create_CoordLists(Densities,[]).
create_CoordLists([Density|RDensities],[])->
   CoordList = build_CoordList(Density),
   XtendedCoordList = [[Coord]||Coord <- CoordList],
   create_CoordLists(RDensities,XtendedCoordList);
create_CoordLists([Density|RDensities],Acc)->
   CoordList = build_CoordList(Density),
   XtendedCoordList = [[Coord|Sub_Coord]||Coord <- CoordList,Sub_Coord <- Acc],
   create_CoordLists(RDensities,XtendedCoordList);
create_CoordLists([],Acc)->
   Acc.

build_CoordList(Resolution,0,Coord,Acc)->
   [-1|Acc];
build_CoordList(Resolution,DensityDividers,Coord,Acc)->
   build_CoordList(Resolution,DensityDividers-1,Coord-Resolution,[Coord|Acc]).

attach(List,E1,E2)->
   attach(List,E1,E2,[]).
attach([Val|List],E1,E2,Acc)->
   attach(List,E1,E2,[{Val,E1,E2}|Acc]);
attach([],_E1,_E2,Acc)->
   lists:reverse(Acc).

extrude(NewDimension_Coord,Substrate)->
   extrude(NewDimension_Coord,Substrate,[]).
extrude(NewDimension_Coord,[{Coord,O,W}|Substrate],Acc)->
   extrude(NewDimension_Coord,Substrate,[{[NewDimension_Coord|Coord],O,W}|Acc]);
extrude(_Coord,[],Acc)->
   lists:reverse(Acc).

The implementation shown above can create not only the layer-to-layer feedforward substrate (l2l_feedforward), but also the jordan_recurrent and the fully_interconnected substrate topologies.

7. For every connected set of neurodes, the substrate process forwards the coordinates of the two neurodes to the PIds of the substrate_cpps in its CPP_PIds list. And for every sent out tuple of coordinates, it waits for the signals from the PIds of the substrate_ceps in its CEP_PIds list. The signals will have some value, and the name of the function which will dictate what function to execute on the currently used synaptic weight. The function will either simply set the synaptic weight to this new value forwarded to the substrate process by the substrate_cep, or perhaps modify the existing synaptic weight in the case plasticity is implemented within the substrate... Once this is done for every connection between the neurodes, given the hyperlayer-to-hyperlayer feedforward architecture, the substrate is now considered functional and ready for use, and the substrate_state_flag is set to the value hold.

The functions which populate all the neurodes of the processing hyperlayers with their specific synaptic weights are the populate_PHyperlayers functions. Their implementation is shown in the following listing.

Listing-16.16 The implementation of the populate_PHyperlayers functions.

populate_PHyperlayers_l2l(PrevHyperlayer,[{Coord,PrevO,PrevWeights}|CurHyperlayer],Substrate,CPP_PIds,CEP_PIds,Acc1,Acc2)->
   NewWeights = get_weights(PrevHyperlayer,Coord,CPP_PIds,CEP_PIds,[]),
   populate_PHyperlayers_l2l(PrevHyperlayer,CurHyperlayer,Substrate,CPP_PIds,CEP_PIds,[{Coord,PrevO,NewWeights}|Acc1],Acc2);
populate_PHyperlayers_l2l(_PrevHyperlayer,[],[CurHyperlayer|Substrate],CPP_PIds,CEP_PIds,Acc1,Acc2)->
   PrevHyperlayer = lists:reverse(Acc1),
   populate_PHyperlayers_l2l(PrevHyperlayer,CurHyperlayer,Substrate,CPP_PIds,CEP_PIds,[],[PrevHyperlayer|Acc2]);
populate_PHyperlayers_l2l(_PrevHyperlayer,[],[],CPP_PIds,CEP_PIds,Acc1,Acc2)->
   lists:reverse([lists:reverse(Acc1)|Acc2]).

populate_PHyperlayers_fi(FlatSubstrate,[{Coord,PrevO,_PrevWeights}|CurHyperlayer],Substrate,CPP_PIds,CEP_PIds,Acc1,Acc2)->
   NewWeights = get_weights(FlatSubstrate,Coord,CPP_PIds,CEP_PIds,[]),
   populate_PHyperlayers_fi(FlatSubstrate,CurHyperlayer,Substrate,CPP_PIds,CEP_PIds,[{Coord,PrevO,NewWeights}|Acc1],Acc2);
populate_PHyperlayers_fi(FlatSubstrate,[],[CurHyperlayer|Substrate],CPP_PIds,CEP_PIds,Acc1,Acc2)->
   populate_PHyperlayers_fi(FlatSubstrate,CurHyperlayer,Substrate,CPP_PIds,CEP_PIds,[],[lists:reverse(Acc1)|Acc2]);
populate_PHyperlayers_fi(_FlatSubstrate,[],[],CPP_PIds,CEP_PIds,Acc1,Acc2)->
   lists:reverse([lists:reverse(Acc1)|Acc2]).
get_weights([{I_Coord,I,_I_Weights}|PrevHypercube],Coord,CPP_PIds,CEP_PIds,Acc)->
   static_fanout(CPP_PIds,I_Coord,Coord),
   U_W=fanin(CEP_PIds,[]),
   get_weights(PrevHypercube,Coord,CPP_PIds,CEP_PIds,[functions:sat(U_W,3.1415,-3.1415)|Acc]);
get_weights([],_Coord,_CPP_PIds,_CEP_PIds,Acc)->
   lists:reverse(Acc).

static_fanout([CPP_PId|CPP_PIds],I_Coord,Coord)->
   CPP_PId ! {self(),I_Coord,Coord},
   static_fanout(CPP_PIds,I_Coord,Coord);
static_fanout([],_I_Coord,_Coord)->
   done.

Substrates of different linkforms must have their neurode synaptic weights populated in slightly different ways. Substrates with different linkforms will also differ slightly in the way their processing hyperlayers are populated with the appropriate synaptic weights, and in the order in which the substrate_cpps are called to action by the substrate. For this reason, for the linkforms l2l_feedforward and jordan_recurrent we use the populate_PHyperlayers_l2l function, and for the linkform fully_interconnected we use the populate_PHyperlayers_fi function.

8. Because the input hyperlayer was created specifically based on the list of sensors the SENN uses, it will have the architecture needed for the accumulated list of sensory signals to be mapped to this input hyperlayer perfectly. The accumulated list of sensory signals is a list of vectors. Each vector has the same length as every multidimensional layer within the input hyperlayer, thus we can now replace the Output part within the tuples [{NeurodeCoordinate,Output,void}...] of the input hyperlayers by the values of the sensory signals, as shown in Fig-16.12, making these tuples into: [{NeurodeCoordinate,RealOutput,void}...], where RealOutput is the value taken from the accumulated sensory signals, associated with that particular neurode coordinate.

There is an implicit 1:1 order between the signals accumulated from the sensors and the order in which the neurodes are stacked in the input hyperlayer. For this reason the mapping of the accumulated sensory signals to the neurodes in the input hyperlayer is done using a very simple function, shown in Listing-16.17.

Listing-16.17 The implementation of the populate_InputHyperlayer/3 function.

populate_InputHyperlayer([{Coord,PrevO,void}|Substrate],[I|Input],Acc)->
   populate_InputHyperlayer(Substrate,Input,[{Coord,I,void}|Acc]);
populate_InputHyperlayer([],[],Acc)->
   lists:reverse(Acc).

9. Now that the Input Hyperlayer has its output values set to real output values from the sensors, the substrate can be used to process the signals. We now use the algorithm discussed in Section-16.4 to perform processing in the hyperlayer-to-hyperlayer feedforward fashion. Since the neurodes in the processing hyperlayers now have their synaptic weights, they can process all the input signals from the presynaptic neurodes of the presynaptic hyperlayer. In this manner, the substrate processes the sensory signals until the output hyperlayer's Output values are calculated.

As noted, we can either reset the substrate and then use it to calculate output signals, or we can hold the substrate that was composed during one of the previous cycles, and simply update the input hyperlayer neurodes with the sensory signals. We can then use the resulting substrate with the updated input hyperlayer to produce output signals. I created different functions for these two scenarios. The function dealing with the first scenario is called calculate_ResetOutput/7, and the function for the second scenario is called calculate_HoldOutput/5.
Both are shown in the following listing.

Listing-16.18 The implementation of the calculate_ResetOutput/7 and calculate_HoldOutput/5 functions.

calculate_HoldOutput(Densities,Substrate,Input,LinkForm,Plasticity)->
   [IHyperlayer|Populated_PHyperlayers] = Substrate,
   Populated_IHyperlayer = populate_InputHyperlayer(IHyperlayer,lists:flatten(Input),[]),
   {Output,U_PHyperlayers}=calculate_substrate_output(Populated_IHyperlayer,Populated_PHyperlayers,LinkForm,Plasticity),
   {Output,[IHyperlayer|U_PHyperlayers]}.

calculate_ResetOutput(Densities,Substrate,Input,CPP_PIds,CEP_PIds,Plasticity,LinkForm)->
   [IHyperlayer|PHyperlayers] = Substrate,
   Populated_IHyperlayer = populate_InputHyperlayer(IHyperlayer,lists:flatten(Input),[]),
   case Plasticity of
      none ->
         Populated_PHyperlayers = populate_PHyperlayers(Substrate,CPP_PIds,CEP_PIds,LinkForm),
         {Output,U_PHyperlayers}=calculate_substrate_output(Populated_IHyperlayer,Populated_PHyperlayers,LinkForm,Plasticity),
         {Output,[IHyperlayer|U_PHyperlayers]}
   end.
...
calculate_substrate_output(ISubstrate,Substrate,LinkForm,Plasticity)->
   case LinkForm of
      l2l_feedforward ->
         calculate_output_std(ISubstrate,Substrate,Plasticity,[]);
      fully_interconnected ->
         calculate_output_fi(ISubstrate,Substrate,Plasticity,[]);
      jordan_recurrent ->
         [OSubstrate|_] = lists:reverse(Substrate),
         calculate_output_std(lists:flatten([ISubstrate|OSubstrate]),Substrate,Plasticity,[])
   end.

calculate_output_std(Prev_Hyperlayer,[Cur_Hyperlayer|Substrate],Plasticity,Acc)->
   Updated_CurHyperlayer = [{Coord,calculate_output(Prev_Hyperlayer,{Coord,Prev_O,Weights},Plasticity),Weights} || {Coord,Prev_O,Weights} <- Cur_Hyperlayer],
   calculate_output_std(Updated_CurHyperlayer,Substrate,Plasticity,[Updated_CurHyperlayer|Acc]);
calculate_output_std(Output_Hyperlayer,[],_Plasticity,Acc)->
   {[Output || {_Coord,Output,_Weights} <- Output_Hyperlayer],lists:reverse(Acc)}.

calculate_output(I_Neurodes,Neurode,Plasticity)->
   case Plasticity of
      none ->
         calculate_neurode_output_noplast(I_Neurodes,Neurode,0)
   end.

calculate_neurode_output_noplast([{_I_Coord,O,_I_Weights}|I_Neurodes],{Coord,Prev_O,[Weight|Weights]},Acc)->
   calculate_neurode_output_noplast(I_Neurodes,{Coord,Prev_O,Weights},O*Weight+Acc);
calculate_neurode_output_noplast([],{Coord,Prev_O,[]},Acc)->
   functions:tanh(Acc).

calculate_neurode_output_plast([{_I_Coord,O,_I_Weights}|I_Neurodes],{Coord,Prev_O,[{W,_LF,_Parameters}|WPs]},Acc)->
   calculate_neurode_output_plast(I_Neurodes,{Coord,Prev_O,WPs},O*W+Acc);
calculate_neurode_output_plast([],{Coord,Prev_O,[]},Acc)->
   functions:tanh(Acc).

calculate_output_fi(Input_Substrate,[Cur_Hypercube|Substrate],Plasticity,Acc)->
   Updated_CurHypercube = [{Coord,calculate_output(lists:flatten([Input_Substrate,Cur_Hypercube|Substrate]),{Coord,Prev_O,Weights},Plasticity),Weights} || {Coord,Prev_O,Weights} <- Cur_Hypercube],
   calculate_output_fi([Input_Substrate|Updated_CurHypercube],Substrate,Plasticity,Acc);
calculate_output_fi(Output_Hyperlayer,[],_Plasticity,Acc)->
   {[Output || {_Coord,Output,_Weights} <- Output_Hyperlayer],lists:reverse(Acc)}.

As can be seen from the calculate_output_std/4 and calculate_output_fi/4 functions, the substrate processing ends with the output hyperlayer having the updated Output signals in its tuple encoded neurodes. Thus, all that is left to do is extract the signals from the output hyperlayer, and send them on their way to their respective actuators.
10. The Output values in the tuples representing the neurodes within the output hyperlayer now contain the actual output signals of the substrate. The layers within the output hyperlayer are in the same order as the actuators within the Actuators list for which they are destined. Thus the substrate can now extract from the layers associated with each actuator the corresponding output vectors, and forward them to the PIds of their respective actuators.

At this point the reason/2 function has returned the updated substrate and the output vector back to the loop function, which now calls the advanced_fanout/3 function. This function uses the vl values of the actuators to extract the appropriate length vectors from the substrate's output list, and forwards those vectors to their respective actuators. The implementation of this function is shown in the following listing.

Listing-16.18 The implementation of the advanced_fanout/3 function.

advanced_fanout(OAcc,[Actuator|Actuators],[APId|APIds])->
   {Output,OAccRem}=lists:split(Actuator#actuator.vl,OAcc),
   APId ! {self(),forward,Output},
   advanced_fanout(OAccRem,Actuators,APIds);
advanced_fanout([],[],[])->
   ok.

11. At this point the actuators are interacting with the environment, and the substrate drops back into its main loop, awaiting again the signals from the sensors or the exoself.

At this point the substrate process updates its state, and drops back into its main receive loop.

This substrate module is available with all the other source code, in the supplementary materials section [1]. It is a lengthy, and at points complicated, piece of code, and it definitely warrants multiple readings and go-throughs. To play around with the create_substrate/4 function, let us also create the test_cs/0 function, which will create a test substrate and print it to console. It's a short function, but useful for debugging and testing. It is part of the substrate module, and is shown in the following listing.

Listing-16.19 The implementation of the test_cs/0 function.

test_cs()->
   Sensors = [
      #sensor{format=no_geo,vl=3},
      #sensor{format={symmetric,lists:reverse([2,3])},vl=6}
   ],
   Actuators = [
      #actuator{format=no_geo,vl=2},
      #actuator{format={symmetric,lists:reverse([3,2])},vl=6}
   ],
   create_substrate(Sensors,[3,2,3,2],Actuators,l2l_feedforward).

At this point the only thing preventing us from using the new encoding is that the genome_mutator module does not yet recognize the difference between the two encodings. We fix that in the next section.

16.9 Updating the genome_mutator Module

Before the evolutionary process can be applied to evolve the NN system used in a substrate encoded NN based agent, we need to update the genome_mutator module to be aware of the two different encodings. If it is not aware of the differences between the neural and substrate encoding, then when using mutation operators like add_sensor, add_actuator, add_sensorlink, and add_actuatorlink, the system will try to connect the NN to sensors and actuators instead of substrate_cpps and substrate_ceps, and try to connect the new sensors and actuators to the neurons of the NN, rather than connecting them to the substrate. Thus, we now update the noted functions, ensuring that they behave as expected.

We update the add_inlink/1 function so that when it is forming the inlink id pool, it appends either the sensor_ids or the cpp_ids to the neuron_ids, depending on whether the NN agent is neural or substrate encoded, respectively. The following listing shows the updated function, with the added and modified code in boldface.

Listing-16.20 The updated add_inlink/1 implementation.

add_inlink(Agent_Id)->
   A = genotype:read({agent,Agent_Id}),
   Cx_Id = A#agent.cx_id,
   Cx = genotype:read({cortex,Cx_Id}),
   N_Ids = Cx#cortex.neuron_ids,
   S_Ids = case A#agent.encoding_type of
      neural ->
         Cx#cortex.sensor_ids;
      substrate ->
         Substrate_Id=A#agent.substrate_id,
         Substrate=genotype:read({substrate,Substrate_Id}),
         Substrate#substrate.cpp_ids
   end,
   N_Id = lists:nth(random:uniform(length(N_Ids)),N_Ids),
   N = genotype:read({neuron,N_Id}),
   {I_Ids,_WeightPLists} = lists:unzip(N#neuron.input_idps),
   Inlink_NIdPool = filter_InlinkIdPool(A#agent.constraint,N_Id,N_Ids),
   case lists:append(S_Ids,Inlink_NIdPool) -- I_Ids of
      [] ->
         exit("********ERROR:add_INLink:: Neuron already connected from all ids");
      Available_Ids ->
         From_Id = lists:nth(random:uniform(length(Available_Ids)),Available_Ids),
         link_FromElementToElement(Agent_Id,From_Id,N_Id),
         EvoHist = A#agent.evo_hist,
         U_EvoHist = [{add_inlink,From_Id,N_Id}|EvoHist],
         genotype:write(A#agent{evo_hist=U_EvoHist})
   end.

A similar piece of code is added to the add_neuron/1 mutation operator:

   S_Ids = case A#agent.encoding_type of
      neural ->
         Cx#cortex.sensor_ids;
      substrate ->
         Substrate_Id=A#agent.substrate_id,
         Substrate=genotype:read({substrate,Substrate_Id}),
         Substrate#substrate.cpp_ids
   end,

This same modification is similarly added to add_actuatorlink/1 and add_sensorlink/1, and so their updated implementations are not shown. We next modify the add_sensor/1 and add_actuator/1 mutation operators. When adding new sensors and actuators, the functions need to know whether they are operating on a neural or substrate encoded NN. If it is a substrate encoded NN, then the sensors must be added with: fanout_ids=[Substrate_Id], and the actuators must be added with: fanin_ids=[Substrate_Id]. We make these modifications to the two mutation operators, with the add_sensor/1 function shown in Listing-16.21. The add_actuator/1 function mirrors it.

Listing-16.21 The updated implementation of the add_sensor/1 function.
add_sensor(Agent_Id)->
   Agent = genotype:read({agent,Agent_Id}),
   Cx_Id = Agent#agent.cx_id,
   Cx = genotype:read({cortex,Cx_Id}),
   S_Ids = Cx#cortex.sensor_ids,
   SpeCon = Agent#agent.constraint,
   Morphology = SpeCon#constraint.morphology,
   case morphology:get_Sensors(Morphology) -- [(genotype:read({sensor,S_Id}))#sensor{id=undefined,cx_id=undefined,fanout_ids=[],generation=undefined} || S_Id<-S_Ids] of
      [] ->
         exit("********ERROR:add_sensor(Agent_Id):: NN system is already using all available sensors");
      Available_Sensors ->
         NewS_Id = {{-1,genotype:generate_UniqueId()},sensor},
         NewSensor=(lists:nth(random:uniform(length(Available_Sensors)),Available_Sensors))#sensor{id=NewS_Id,cx_id=Cx_Id},
         EvoHist = Agent#agent.evo_hist,
         case Agent#agent.encoding_type of
            neural->
               genotype:write(NewSensor),
               N_Ids = Cx#cortex.neuron_ids,
               N_Id = lists:nth(random:uniform(length(N_Ids)),N_Ids),
               link_FromElementToElement(Agent_Id,NewS_Id,N_Id),
               U_EvoHist = [{add_sensor,NewS_Id,N_Id}|EvoHist];
            substrate ->
               Substrate_Id = Agent#agent.substrate_id,
               genotype:write(NewSensor#sensor{fanout_ids=[Substrate_Id]}),
               U_EvoHist = [{add_sensor,NewS_Id,Substrate_Id}|EvoHist]
         end,
         U_Cx = Cx#cortex{sensor_ids=[NewS_Id|S_Ids]},
         genotype:write(U_Cx),
         genotype:write(Agent#agent{evo_hist=U_EvoHist})
   end.

These modifications still leave us with one problem though. There are still no mutation operators that connect the NN to the new substrate_cpps and substrate_ceps. This issue is solved in the next section.

16.10 Implementing the add_cpp and add_cep Mutation Operators

The implementation of the add_cpp and add_cep mutation operators is very similar to that of the add_sensor and add_actuator operators. The largest difference here is that we first check whether the NN is neural or substrate encoded. If it is neural encoded, we exit the mutation operator and try another one. If it is substrate encoded, we mimic the add_sensor/add_actuator operators, and similarly connect a new substrate_cpp and substrate_cep to the NN. The implementation of the add_cpp mutation operator is shown in Listing-16.22. The mutation operator add_cep is almost identical to add_cpp; we simply change the references from sensor to actuator, and thus its implementation is not shown.

Listing-16.22 The implementation of the add_cpp mutation operator.

add_cpp(Agent_Id)->
   Agent = genotype:read({agent,Agent_Id}),
   case Agent#agent.encoding_type of
      neural->
         exit("********ERROR:add_cpp(Agent_Id):: NN is neural encoded, can not apply mutation operator.");
      substrate->
         Cx_Id = Agent#agent.cx_id,
         Cx = genotype:read({cortex,Cx_Id}),
         Substrate_Id = Agent#agent.substrate_id,
         Substrate=genotype:read({substrate,Substrate_Id}),
         Dimensions = length(Substrate#substrate.densities),
         Plasticity = Substrate#substrate.plasticity,
         CPP_Ids = Substrate#substrate.cpp_ids,
         case morphology:get_SubstrateCPPs(Dimensions,Plasticity) -- [(genotype:read({sensor,CPP_Id}))#sensor{id=undefined,cx_id=undefined,fanout_ids=[],generation=undefined} || CPP_Id<-CPP_Ids] of
            [] ->
               exit("********ERROR:add_cpp(Agent_Id):: NN system is already using all available substrate_cpps");
            Available_CPPs ->
               NewCPP_Id = {{-1,genotype:generate_UniqueId()},sensor},
               NewCPP=(lists:nth(random:uniform(length(Available_CPPs)),Available_CPPs))#sensor{id=NewCPP_Id,cx_id=Cx_Id},
               EvoHist = Agent#agent.evo_hist,
               genotype:write(NewCPP),
               N_Ids = Cx#cortex.neuron_ids,
               N_Id = lists:nth(random:uniform(length(N_Ids)),N_Ids),
               link_FromElementToElement(Agent_Id,NewCPP_Id,N_Id),
               U_EvoHist = [{add_cpp,NewCPP_Id,N_Id}|EvoHist],
               U_Substrate = Substrate#substrate{cpp_ids=[NewCPP_Id|CPP_Ids]},
               genotype:write(U_Substrate),
               genotype:write(Agent#agent{evo_hist=U_EvoHist})
         end
   end.
%The add_cpp/1 function first checks the encoding of the NN based agent. If the encoding is neural, it exits the function since the neural encoded NN based system does not use substrate_cpps. If the agent is substrate encoded, then the function chooses randomly a still unused and available substrate_cpp from the Available_CPPs list, and then links it to a randomly chosen neuron in the NN. The function then updates the evo_hist list, writes the updated substrate and agent to database, and returns to the caller.

The last remaining modification needed to make it all work is with regards to the constraint record. We modify the constraint record by adding to it the two new elements:

substrate_plasticities=[none],
substrate_linkforms = [l2l_feedforward], %[l2l_feedforward,jordan_recurrent,fully_connected]

And we modify the mutation_operators list by appending to it the following two tuples: [{add_cpp,1},{add_cep,1}]. With these modifications, all that's left to do is to compile the updated source code, and to recreate the mnesia database with the newly added substrate table. We test our updated system in the next section.
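Since add_cep is described but not shown, the following is only a hedged sketch of how it might look after the sensor-to-actuator substitution described above; the cep_ids field, the morphology:get_SubstrateCEPs/2 call, the id layer coordinate, and the neuron-to-cep link direction are all assumptions made for illustration:

add_cep(Agent_Id)->
   Agent = genotype:read({agent,Agent_Id}),
   case Agent#agent.encoding_type of
      neural->
         exit("********ERROR:add_cep(Agent_Id):: NN is neural encoded, can not apply mutation operator.");
      substrate->
         Cx_Id = Agent#agent.cx_id,
         Cx = genotype:read({cortex,Cx_Id}),
         Substrate_Id = Agent#agent.substrate_id,
         Substrate=genotype:read({substrate,Substrate_Id}),
         Dimensions = length(Substrate#substrate.densities),
         Plasticity = Substrate#substrate.plasticity,
         CEP_Ids = Substrate#substrate.cep_ids, %assumed field, mirroring cpp_ids
         case morphology:get_SubstrateCEPs(Dimensions,Plasticity) -- [(genotype:read({actuator,CEP_Id}))#actuator{id=undefined,cx_id=undefined,fanin_ids=[],generation=undefined} || CEP_Id<-CEP_Ids] of
            [] ->
               exit("********ERROR:add_cep(Agent_Id):: NN system is already using all available substrate_ceps");
            Available_CEPs ->
               NewCEP_Id = {{1,genotype:generate_UniqueId()},actuator}, %layer coordinate assumed
               NewCEP=(lists:nth(random:uniform(length(Available_CEPs)),Available_CEPs))#actuator{id=NewCEP_Id,cx_id=Cx_Id},
               EvoHist = Agent#agent.evo_hist,
               genotype:write(NewCEP),
               N_Ids = Cx#cortex.neuron_ids,
               N_Id = lists:nth(random:uniform(length(N_Ids)),N_Ids),
               link_FromElementToElement(Agent_Id,N_Id,NewCEP_Id), %neuron -> cep
               U_EvoHist = [{add_cep,N_Id,NewCEP_Id}|EvoHist],
               U_Substrate = Substrate#substrate{cep_ids=[NewCEP_Id|CEP_Ids]},
               genotype:write(U_Substrate),
               genotype:write(Agent#agent{evo_hist=U_EvoHist})
         end
   end.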

16.11 Testing the New Encoding Method

Our two benchmarking problems provide an excellent example of the differences with regards to the amount of geometrical regularities present in problems, and of how to map the said regularities to systems capable of taking advantage of them. For example, the double pole balancing problem (DPB) does not really have any such regularities that we can readily expose to a substrate encoded NN based system. The T-Maze problem has more potential, but requires us to manually expose such regularities through an appropriate set of new sensors and actuators, and through the mapping of the sensory signals and the produced output signals to the substrate from the sensors, and from the substrate to the actuators, respectively.

We first test the new encoding on the DPB problem. To do so, we first compile the new modules, then reset our mnesia database, and then start the polis by executing the functions: polis:sync(), polis:reset(), and polis:start(). Then we modify the INIT_CONSTRAINTS macro in the population_monitor module to:

-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology,connection_architecture=CA,population_evo_alg_f=generational,agent_encoding_types=[substrate],substrate_plasticities=[none]} || Morphology<-[pole_balancing],CA<-[feedforward]]).

And finally we recompile the population_monitor module, and execute the function benchmarker:start(dpb), as shown in the following listing:

Listing-16.23 The double pole balancing benchmark, performed with the substrate encoded NNs.


Graph:{graph,pole_balancing,
   [1.106212121212121,1.1408585858585858,1.1193686868686867,
   1.161489898989899,1.143080808080808,1.0764141414141413,
   1.1325252525252525,1.1934343434343437,1.1413383838383837,
   1.1829797979797978],
   [0.05902592904791409,0.09147103257884823,0.0803810171785662,
   0.07401185164044073,0.08683375207803117,0.08533785941757911,
   0.08215891142076008,0.24593906122148776,0.20476041049617125,
   0.2504944656040026],
   [0.0855150202020202,0.6052218588038502,1.5313114901359988,
   2.599710070705357,3.797623517536588,44.833702130336846,
   50.523672653857076,51.832271099817774,180.47316244780285,
   158.35976811529105],
   [0.010608461667327089,2.1795158240432704,6.130303687259124,
   8.258297144062041,8.028234616671885,121.66517882421797,
   111.40580585983162,72.48290852966396,424.3416641012721,
   343.9761405284347],
   [0.1431,10.105238893248718,40.68352829762942,40.68352829762942,
   77.07757425714148,887.0261903586879,887.0261903586879,
   887.0261903586879,2588.3673781672096,2588.3673781672096],
   [0.0253,0.0406,0.0253,0.0253,0.0253,0.0253,0.0253,0.0171,0.0253,
   0.0253],
   [7.45,7.55,7.65,7.4,6.65,6.05,6.3,6.9,6.15,6.3],
   [1.1608186766243898,1.116915395184434,1.5256146302392357,
   1.562049935181331,1.3518505834595773,1.5321553446044565,
   1.452583904633395,1.6703293088490065,1.3883443376914824,
   1.4177446878757827],
   [500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0],
   []}
Tot Evaluations Avg:5079.95 Std:44.72189061298728

It did not solve the problem, but that is to be expected. The most important part is that our SENN system is functional, there are no bugs, and our neuroevolutionary system did evolve more and more fit substrate encoded NN systems over time. The fact that this type of NN system does not solve the double pole balancing problem is to be expected, because the solution requires recurrent connections and topologies not yet available to our SENN. Substrate encodings of type l2l_feedforward or jordan_recurrent simply don't provide the right topologies. We would need to implement the freeform version of the substrate before it is able to solve this problem.

On the other hand, the T-Maze problem does have geometrical regularities. For example, the maze has directions: the sensory signals come from the left, straight ahead, and the right, and the movements based on those signals are similarly made to the left, straight ahead, or to the right. Thus the sensors and movements are geometrically correlated. But the T-Maze sensors and actuators we've created for our NN based agents do not encode the sensory information, and do not accept the output signals, in ways that take advantage of the geometrical regularities of this problem. Thus we build the new sensor and actuator which were shown to perform well in [2], which discussed the use of substrate encoded NN based systems in T-Maze navigation based problems.

We first create two new sensors and a new actuator. The problem with our current sensor is that it mingles the reward data with the range data. The range sensory signals do have geometrical properties: the signals coming from the left, forward, and right range sensors hold that directional geometrical information. But the reward signal has no geometrical information, and should be in its own sensor and thus on a different input layer. Thus, we need to create two new sensors: a single dimensional range sensor that just gathers the range sensory data from the private scape, and a single dimensional reward size sensor, which forwards to the substrate a vector of length 1, containing the size of the reward acquired at that particular sector. But because the signal that the private scape returns to a querying sensor is based on that sensor's parameter, we have actually already implemented these two sensors. We can allow our existing dtm_GetInput sensor to act as the two sensors we are after by simply having one dtm_GetInput use the parameter reward, and the other use range_sense, with the vl set to 1 and 3 respectively. Thus, with regards to the sensors, we need only modify the morphology module, so that two sensors are produced through two different parameters, as shown in Listing-16.24.

Listing-16.24 Modifying the morphology module to specify two sensors using the parameters: reward and range_sense, rather than: all.

discrete_tmaze(sensors)->
   [#sensor{name=dtm_GetInput,type=standard,scape={private,dtm_sim},vl=VL,parameters=[Parameter]} || {VL,Parameter} <- [{1,reward},{3,range_sense}]];
discrete_tmaze(actuators)->
   [
      %#actuator{name=dtm_SendOutput,type=standard,scape={private,dtm_sim},vl=1,parameters=[]}
      #actuator{name=dtm_SubstrateSendOutput,type=standard,scape={private,dtm_sim},vl=3,parameters=[]}
   ].
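%The comprehension above produces two sensor records: one with vl=1 and parameters=[reward], and one with vl=3 and parameters=[range_sense], together replacing the single sensor that previously used the parameter: all.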

From the above listing, you will also notice that a new actuator has already been specified, the dtm_SubstrateSendOutput actuator, with vl=3. The actuator we used in the previous two chapters was a simple program which accepted a vector of length one, and based on it decided whether to send to the private scape an action that would turn the agent's avatar to the left, to the right, or move it straight ahead. But now we have access to the geometry of the sensor, and thus we can use an actuator that takes advantage of that geometry. And so we make an actuator that is executed with an output vector of length 3: [L,F,R], calculates which of the elements within the vector has the highest magnitude, and based on that value performs the action. If L is the highest, then the agent turns left and moves one sector forward; if F is the highest, the agent moves forward one sector; and if R is the highest, the agent turns right and moves one sector forward. The implementation of the new actuator is shown in Listing-16.25.

Listing-16.25 The implementation of the substrate based actuator added to the actuator module.

dtm_SubstrateSendOutput(Output,Parameters,Scape)->
   [L,F,R] = Output,
   Action = if
      ((L > F) and (L > R)) -> [-1];
      ((R > F) and (R > L)) -> [1];
      true -> [0]
   end,
   Scape ! {self(),move,Parameters,Action},
   receive
      {Scape,Fitness,HaltFlag}->
         {Fitness,HaltFlag}
   end.
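To see the decision rule in isolation, consider the following minimal sketch; the action/1 helper below is hypothetical (it is not part of the actuator module) and simply mirrors the if expression above:

action([L,F,R])->
   if
      ((L > F) and (L > R)) -> [-1]; %turn left, move one sector forward
      ((R > F) and (R > L)) -> [1]; %turn right, move one sector forward
      true -> [0] %move straight ahead
   end.
%action([0.9,0.1,0.3]) yields [-1], action([0.1,0.2,0.8]) yields [1], and action([0.2,0.7,0.1]) yields [0]. Ties fall through to [0].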

Though certainly these new sensors and actuators can also be used by our standard neural encoded NNs, it is the substrate encoded NN based agent that can best take advantage of their properties. With these new sensors and actuators, we now perform the benchmark of our system on the T-Maze problem. We leave the pmp parameters as in the previous chapter, but we slightly change the genotype, so that the agent starts off with both of the sensors rather than just one. In this manner the agent will, from the very start, have access to the reward and range sensors, and access to the movement actuator (just as did our neural encoded agent). This is accomplished by simply using the function morphology:get_Sensors/1 instead of morphology:get_InitSensors/1 in the genotype module under the encoding case type: substrate. We do this only for this problem, and can change it back after we're done.
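A hedged sketch of the described one-line change in the genotype module (the surrounding function is not reproduced here, and the variable binding shown is an assumption; only the swapped call matters):

%Under the substrate encoding case in the genotype module, instead of:
%   Sensors = morphology:get_InitSensors(Morphology),
%we temporarily use:
%   Sensors = morphology:get_Sensors(Morphology),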

To perform the benchmark, we set the benchmarker's ?INIT_CONSTRAINTS to:

-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology,connection_architecture=CA,population_evo_alg_f=generational,agent_encoding_types=[substrate],substrate_plasticities=[none]} || Morphology<-[discrete_tmaze],CA<-[feedforward]]).


Then we execute polis:sync(), and finally run the benchmark by executing: benchmarker:start(substrate_dtm). The result of this benchmark is shown in the following listing.

Listing-16.26 The benchmark results of performing the T-Maze benchmark with a substrate encoded NN based agent.

Graph:{graph,discrete_tmaze,
   [1.0941666666666665,1.0963888888888889,1.1025,1.1183333333333334,
   1.0733333333333335,1.0944444444444446,1.0950000000000002,
   1.0899999999999999,1.0900000000000003,1.107222222222222],
   [0.08648619029134716,0.09606895350299437,0.1077903056865505,
   0.09157571245210767,0.07423685817106697,0.08624541497922236,
   0.09733961166965892,0.099498743710662,0.06999999999999999,
   0.1057177117038637],
   [113.04933333333342,107.690888888889,109.23533333333341,
   110.34133333333344,112.18933333333344,117.18266666666673,
   110.90533333333342,113.21533333333343,110.77066666666674,
   111.924888888889],
   [10.895515978807254,12.527729754034578,9.791443225819389,
   9.502769093503451,9.827619673371789,6.41172966783014,
   12.66381855699315,8.934695692138094,10.400047521258944,
   8.231449225116977],
   [122.0000000000001,122.0000000000001,122.0000000000001,
   122.0000000000001,122.0000000000001,122.0000000000001,
   122.0000000000001,122.0000000000001,122.0000000000001,
   122.0000000000001],
   [10.000000000000115,10.000000000000115,10.000000000000115,
   10.000000000000115,10.000000000000115,10.000000000000115,
   10.000000000000115,10.000000000000115,10.000000000000115,
   10.000000000000115],
   [8.4,8.3,9.15,9.25,8.85,8.9,9.0,8.65,8.9,9.0],
   [1.1135528725660042,1.004987562112089,0.7262919523166976,
   0.7664854858377946,1.3883443376914821,0.8306623862918074,
   1.140175425099138,1.3518505834595775,1.1789826122551597,
   0.9486832980505138],
   [500.0,500.0,500.0,475.0,500.0,500.0,500.0,500.0,500.0,500.0],
   []}
Tot Evaluations Avg:5091.5 Std:37.78160928282436

As expected, and similarly to the results of the T-Maze navigation of Chapter-14, our TWEANN was able to evolve agents which always go to the right corner containing the large reward before the position of the large and the small rewards are switched. And as before, we will be able to evolve agents which learn from experience after we've added plasticity in the next chapter, which will result in agents able to solve this problem and achieve the score of 149.2.

Nevertheless, the test shows that our system works, and the new actuator and sensors work. Our TWEANN system evolved a relatively competent T-Maze navigator, but the substrate still lacks plasticity, and so we cannot expect our TWEANN to evolve a perfect solution just yet. Both of these examples, double pole balancing and T-Maze navigation, demonstrate that our substrate encoded NN based systems are functional, without errors, and capable of evolving and improving, and that our TWEANN can successfully evolve such agents. We now need only to apply our substrate encoded NN based agents to a problem which offers an advantage to systems capable of extracting the problem's geometrical regularities. An example of such a problem is the analysis of financial charts. And it is this problem that we will explore in Chapter-19.

16.12 Summary and Discussion

In this chapter we have developed a substrate encoded architecture for our memetic algorithm based topology and weight evolving artificial neural network platform. We modified the implementation of our TWEANN to allow for the evolution of both neural and substrate encoded NNs. Our substrate encoded NN is able to evolve and integrate new substrate_cpps and substrate_ceps, and to also use various substrate topologies through the use of the linkform element in the substrate record. We have implemented and tested our new system. The tests demonstrate that our TWEANN system is able to evolve substrate encoded NNs effectively, but that the problems we have created for testing purposes are not well suited for this type of encoding.

This was not an easy chapter; the code was long, and it will require some time to analyze. Even more, our evolved SENN agents are not yet fully implemented; the substrate encoded NN based agents are still missing an important feature: plasticity. In the next chapter we add that feature, allowing the substrates to not only have the evolved NNs simply set the neurode synaptic weights, but letting those same NNs act as evolving learning rules, which update the synaptic weights of the substrate embedded neurodes based on their location, and their pre- and post-synaptic signals.



Chapter 17 Substrate Plasticity

Abstract In this chapter we develop a method for the substrate to possess plasticity, and thus have the synaptic weights of its neurodes change through experience. We first discuss the ABC and the Iterative substrate learning rules popularized within the HyperNEAT neuroevolutionary system. Then we implement these learning rules within our own system, through just a few minor modifications to our existing architecture.

Our system now has a highly advanced direct encoded NN implementation with plasticity. It includes various other performance improving features. Our TWEANN platform can even evolve substrate encoded NN based agents. Yet still our TWEANN does not have all the essential elements of a bleeding edge system, one thing is missing... substrate plasticity.

The implementation of substrate plasticity is not as simple as turning the plas- ticity on in the NN itself. If we allow for the NN to have plasticity, it will not translate into a substrate encoded NN with plasticity because the NN will simply be changing, but simply based on the sequence of the coordinates that it is processing from the substrate embedded neurodes, as opposed to the actual sensory signals. Also, as it changes, it will have no affect on the substrate which is the one pro- cessing the sensory signals, because the NN sets the substrate's synaptic weights only at the very start of an evaluation. Thus, what we need is a method that will allow for the synaptic weights between the neurodes to update using some kind of advanced learning rule, not necessarily Hebbian. Surprisingly, this is an easy task to accomplish.

At this time the NN produces synaptic weights between the neurodes. But what if we feed the NN not just the coordinates of the connected neurodes, but also the neurode's presynaptic signal, its postsynaptic signal, its current weight for the connection, and then designate and use the NN's output not as a new synaptic weight, but as a change in that weight? And let the NN produce these changes not only at the very start of the evaluation, but continually, every cycle, every time the substrate processes its set of sensory vectors. This approach will effectively make the entire NN into a learning rule producing system. So how do we do it?

For example, if we change the substrate_cpp from feeding the NN the vector: [X1,Y1...X2,Y2...] to: [X1,Y1...X2,Y2...PreSynaptic,PostSynaptic,CurWeight], and change the standard substrate_cep currently used from outputting: [SynapticWeight] to: [Delta_SynapticWeight], and instead of setting the substrate_state_flag to hold, we set it to iterative, letting the NN produce the Delta_SynapticWeight signals after every time the substrate processes a signal, the substrate will gain a highly dynamic learning rule. After processing the sensory signals, the substrate has all its neurode weights updated from CurWeight to CurWeight+Delta_SynapticWeight. And so the substrate is changing, learning



through sensory experience. Of course this does mean that we have to execute the NN X number of times every time the substrate processes the signals from its sensors, where X is the number of neurode connections. So if the SENN lives for 1000 cycles, it must execute and poll the NN 1000*X times.

On the other hand, if we for example use a substrate_cep which outputs: [W,A,B,C,N], similarly to the substrate_cep that outputs: [W], just once at the beginning of the evaluation, then the substrate_cep outputs not just the synaptic weights between the neurodes, but also the parameters the neurodes can use for the execution of a Hebbian learning rule. This too will effectively add plasticity to the substrate, since now it not only has the initial synaptic weights, but also the parameters needed to calculate the update of the synaptic weights of every neurode after it processes some signal. Not to mention, the NN is a highly complex, advanced, evolving system, and thus the learning rules and parameters will be coordinate and connection specific, and the learning rule itself will evolve and optimize over the generations due to the NN itself growing and evolving over time.
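To make this concrete, here is a small worked example with purely hypothetical values (the update function itself, abcn/4, is implemented later in this chapter):

% Hypothetical values, for illustration only: I is the presynaptic signal,
% O is the neurode's output, W is the current synaptic weight, and
% [A,B,C,N] are the parameters produced by the NN for this connection.
I = 0.8, O = 0.5, W = 0.2,
[A,B,C,N] = [0.1,-0.3,0.05,0.9],
Delta_W = N*(A*I*O + B*I + C*O), % = 0.9*(0.04 - 0.24 + 0.025) = -0.1575
U_W = W + Delta_W. % = 0.0425, the connection's updated synaptic weight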

The two rules above were originally introduced in the HyperNEAT system, and tested on a discrete T-Maze problem similar to the one we built in an earlier chapter. The first method is called iterative, while the second method is called the abc update rule (aka the abcn update rule). As you can see, we can use the NN to produce any kind of signal, and use that NN output signal in any way we want: as a synaptic weight, as a synaptic weight update rule, as both, as neither... In this chapter we implement the above two rules, and then test our new learning system on the discrete T-Maze problem.

17.1 The Updated Architecture

We have actually already performed the brunt of the work in the previous chapter. The way we designed our substrate encoded NN system allows us to easily use the NN on the substrate in any way we want, and it allows us to similarly re-purpose the substrate to any task. To implement the abcn learning rule, we will need to slightly change the tuple representation of the neurodes within the substrate, such that each tuple can also keep track of the learning parameters for every synaptic weight. The case will be even simpler with regards to the Iterative learning rule, in which the NN can directly output a signal which simply acts as the change in the synaptic weight. The only major update to our implementation with regards to the iterative learning rule is that the neurodes need to be updated after every time they process a signal, and that update has to be done by calling the NN for every synaptic weight.

The other significant update to our system will deal with creating new substrate_cpps within the morphology. As you recall, a general Hebbian learning rule requires the postsynaptic neurode's input from presynaptic neurode X, the postsynaptic neurode's synaptic weight for the connection with X, and the postsynaptic neurode's output. Since our NN will now act as a learning rule by producing the change in the synaptic weight (in the case of the iterative implementation), or produce a set of parameters for each neurode to utilize a form of a Hebbian learning rule (in the case of the abcn implementation), we need to feed the NN not just the coordinates of the connected neurodes, but also these three essential values: Input, Weight, Output.

Note: Although, granted, we could choose to simply feed the NN just the coordinates... but that would make for a much less effective learning rule, since the NN would have that much less information about the synaptic links.

In the next two sections we will add the necessary functions to our substrate implementation such that it can support these two types of learning rules. The best part is that these two rules encompass a whole class of related rules, and can easily be expanded after implementation. For example, after the abcn rule is implemented, we can easily change it to Oja's rule, or any other type of parameter based learning rule. On the other hand, after we implement the iterative rule, we will be able to implement any type of rule where synaptic weights or other types of parameters have to be updated after every single signal processing step (during every sense-think-act cycle).


17.2 Implementing the abcn Learning Rule

When using the abcn rule, the NN needs to be called only once per evaluation, per connection between two neurodes. The only difference between it and the standard none plasticity version we have implemented is that the NN does not only set up the synaptic weight W, but also generates the parameters A, B, C, N for each such synaptic weight. The implementation of this learning rule will primarily concentrate on the change of the tuple representing the neurode within the substrate.

17.2.1 Updating the substrate Module

While updating the substrate module, we want to disrupt as little of it as possible. The standard neurode representing tuple has the following format: {Coordinate,Output,Weights}. The Weights list is only used for its values within the function calculate_neurode_output_std/3, which itself is executed from the function calculate_output/5, which is called for every neurode to calculate that neurode's output. This function is shown in the following listing.


Listing-17.1 The calculate_output/5 function.

calculate_output(I_Neurodes,Neurode,Plasticity,CPP_PIds,CEP_PIds)->
    {Coord,_Prev_O,Weights} = Neurode,
    case Plasticity of
        none ->
            Output = calculate_neurode_output_std(I_Neurodes,Neurode,0),
            {Coord,Output,Weights}
    end.

This is the function executed when the substrate_state_flag is set to reset or hold, and it's already almost perfectly set up for use with different types of plasticities. Thus, since this is the only function that really sees the format of the Weights list, which is itself set by the set_weight/2 function executed when the substrate_cep sends the substrate the set_weight message, we need only add a new function triggered by a substrate_cep (similar to set_weight, but which sets all the other Hebbian learning rule parameters as well), and a new calculate_neurode_output_plast/3 function, which can handle the new type of tuple representing a plastic neurode.
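For comparison, the following is a minimal sketch of what the existing set_weight/2 function looks like, based on its description here (the exact body in the substrate module may differ slightly):

set_weight(Signal,_WP)->
    % The substrate_cep sends a single-element signal: the new weight.
    [U_W] = Signal,
    functions:sat(U_W,3.1415,-3.1415).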

To allow for Oja's and Hebbian types of plasticity, we need the neurode to store not just weights, but also the learning parameters for each weight. When the substrate uses a Hebbian type of plasticity, we need to allow for the neurodes to be represented as: {Coordinates,Output,WeightsP}, where WeightsP has the format: [{W, LearningRuleName, Parameters}...].
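For illustration, a hypothetical plastic neurode tuple under the abcn rule might then look as follows (all values are invented):

{[1.0,-0.5],                        % Coordinates of the neurode within the substrate
 0.73,                              % Output produced during the last cycle
 [{0.21,abcn,[0.1,-0.3,0.05,0.9]},  % one {W,LearningRuleName,Parameters}
  {-1.4,abcn,[0.2,0.15,-0.1,0.5]}]} % tuple per presynaptic connection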

As we noted before, because the structure of the list representing the neurode's synaptic weights is accessed and used when those weights are being set by the set_weight function, and when being read to calculate the output by the calculate_neurode_output_std function, we need only modify those two functions to set the weights list to the new format, and create a new calculate_neurode_output_plast function which can read that list. Finally, we also create a new function which updates the neurode after it has calculated an output, at which point the Hebbian learning rule has the Input, Output, and the synaptic Weights needed to calculate the weight changes.

Since we must send the substrate_cpps not just the coordinates of the two neurodes, but also the neurode's Input, Output, and current synaptic Weight values, we first update the populate_PHyperlayers_l2l, populate_PHyperlayers_fi, and populate_PHyperlayers_nsr functions, from which get_weights is called. The updated version simply checks the plasticity, and if it is of type abcn, it calls the get_weights function, which calls the cpps with the three new values, as shown in the following listing.

Listing-17.2 The updated implementation of the populate_PHyperlayers_l2l function, and the new get_weights function.

populate_PHyperlayers_l2l(PrevHyperlayer,[{Coord,PrevO,PrevWeights}|CurHyperlayer],Substrate,CPP_PIds,CEP_PIds,Plasticity,Acc1,Acc2)->
    NewWeights = case Plasticity of
        none ->
            get_weights(PrevHyperlayer,Coord,CPP_PIds,CEP_PIds,[]);
        _ ->
            get_weights(PrevHyperlayer,Coord,CPP_PIds,CEP_PIds,[],PrevWeights,PrevO)
    end,
    populate_PHyperlayers_l2l(PrevHyperlayer,CurHyperlayer,Substrate,CPP_PIds,CEP_PIds,Plasticity,[{Coord,PrevO,NewWeights}|Acc1],Acc2);
populate_PHyperlayers_l2l(_PrevHyperlayer,[],[CurHyperlayer|Substrate],CPP_PIds,CEP_PIds,Plasticity,Acc1,Acc2)->
    PrevHyperlayer = lists:reverse(Acc1),
    populate_PHyperlayers_l2l(PrevHyperlayer,CurHyperlayer,Substrate,CPP_PIds,CEP_PIds,Plasticity,[],[PrevHyperlayer|Acc2]);
populate_PHyperlayers_l2l(_PrvHyperlayer,[],[],_CPP_PIds,_CEP_PIds,_Plasticity,Acc1,Acc2)->
    lists:reverse([lists:reverse(Acc1)|Acc2]).

get_weights([{I_Coord,I,_I_Weights}|I_Neurodes],Coord,CPP_PIds,CEP_PIds,Acc,[W|Weights],O)->
    plasticity_fanout(CPP_PIds,I_Coord,Coord,[I,O,W]),
    U_W = fanin(CEP_PIds,W),
    get_weights(I_Neurodes,Coord,CPP_PIds,CEP_PIds,[U_W|Acc],Weights,O);
get_weights([],_Coord,_CPP_PIds,_CEP_PIds,Acc,[],_O)->
    lists:reverse(Acc).

plasticity_fanout([CPP_PId|CPP_PIds],I_Coord,Coord,IOW)->
    CPP_PId ! {self(),I_Coord,Coord,IOW},
    plasticity_fanout(CPP_PIds,I_Coord,Coord,IOW);
plasticity_fanout([],_I_Coord,_Coord,_IOW)->
    done.

The above listing only shows the populate_PHyperlayers_l2l function, with the updated source in boldface. The other two populate_PHyperlayers_ functions are similarly updated with the new case clause. When Plasticity is not set to none, the new get_weights/7 function is executed, which in turn calls the plasticity_fanout/4 function, which forwards the vector [I,O,W], along with the coordinates, to the NN's substrate_cpps. We will update the morphology and create the new substrate_cpps a bit later. When the NN has finished processing these signals, the NN's new abcn substrate_cep sends the substrate the vector: [W,A,B,C,N], and the message to execute the set_abcn/2 function, whose implementation is shown next.
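The fanin/2 function used within get_weights/7 is not shown in the above listing. Based on the message format used by the substrate_ceps, a minimal sketch of what such a function might look like (an assumption for illustration, not necessarily the exact implementation) is the following: it waits for each substrate_cep's {Command,Signal} message, and applies the named substrate function to fold the received signal over the current synaptic weight W:

fanin([CEP_PId|CEP_PIds],W)->
    receive
        {CEP_PId,Command,Signal}->
            % e.g. Command = set_abcn, applied as substrate:set_abcn(Signal,W)
            U_W = substrate:Command(Signal,W),
            fanin(CEP_PIds,U_W)
    end;
fanin([],W)->
    W.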


Listing-17.3 The implementation of the set_abcn function.

set_abcn(Signal,_WP)->
    [U_W,A,B,C,N] = Signal,
    {functions:sat(U_W,3.1415,-3.1415),abcn,[A,B,C,N]}.

When this function is called, it returns the tuple: {functions:sat(U_W,3.1415,-3.1415),abcn,[A,B,C,N]}, which now takes the place of the simple synaptic weight value. This is done for every synaptic weight in the weights list of every neurode.
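For instance, an incoming Signal of [0.5,0.1,-0.3,0.05,0.9] would produce the tuple {0.5,abcn,[0.1,-0.3,0.05,0.9]}, since 0.5 already falls within the [-3.1415,3.1415] saturation bounds.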

Once all the neurodes have been updated in this manner, the substrate can start processing the sensory signals. Everything is left the same as before; the only thing that needs to be changed is the actual function which calculates the output of the neurode by processing its input and the neurode's synaptic weights. This is done in the calculate_output function. We thus update it to check the plasticity type of the substrate, and based on it, either execute the standard calculate_neurode_output_std, or in the case of the abcn rule, the new calculate_neurode_output_plast function. Listing-17.4 shows the updated calculate_output/5 function, and the new functions which it calls.

Listing-17.4 The implementation of the updated calculate_output/5 function.

calculate_output(I_Neurodes,Neurode,Plasticity,CPP_PIds,CEP_PIds)->
    {Coord,_Prev_O,Weights} = Neurode,
    case Plasticity of
        none ->
            Output = calculate_neurode_output_std(I_Neurodes,Neurode,0),
            {Coord,Output,Weights};
        abcn ->
            Output = calculate_neurode_output_plast(I_Neurodes,Neurode,0),
            update_neurode(I_Neurodes,{Coord,Output,Weights},[])
    end.

calculate_neurode_output_plast([{_I_Coord,O,_I_Weights}|I_Neurodes],{Coord,Prev_O,[{W,_LF,_Parameters}|WPs]},Acc)->
    calculate_neurode_output_plast(I_Neurodes,{Coord,Prev_O,WPs},O*W+Acc);
calculate_neurode_output_plast([],{_Coord,_Prev_O,[]},Acc)->
    functions:tanh(Acc).

update_neurode([{_I_Coord,I_O,_I_Weights}|I_Neurodes],{Coord,O,[{W,LF,Parameters}|WPs]},Acc)->
    U_W = substrate:LF(I_O,O,W,Parameters),
    update_neurode(I_Neurodes,{Coord,O,WPs},[{U_W,LF,Parameters}|Acc]);
update_neurode([],{Coord,O,[]},Acc)->
    {Coord,O,lists:reverse(Acc)}.

abcn(Input,Output,W,[A,B,C,N])->
    Delta_Weight = N*(A*Input*Output + B*Input + C*Output),
    W+Delta_Weight.

When plasticity is set to abcn, the calculate_output function first executes the calculate_neurode_output_plast function which calculates the neurode's output, and then the function update_neurode/3, which updates the neurode's synaptic weights based on the parameters stored in the tuple representing the given neurode.

What is interesting here is that update_neurode executes the update function based on the atom within the tuple representing the neurode. So if we change the learning rule specifying atom from abcn to, for example, ojas, and implement the ojas/4 function, then we can easily, and without any further changes, start using Oja's learning rule.
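As a sketch of what that might look like, the following hypothetical ojas/4 function (not part of the book's listings) implements Oja's rule, delta_w = LearningRate*Output*(Input - Output*W), with the same signature as abcn/4, here assuming the evolved parameter list holds a single learning rate:

ojas(Input,Output,W,[LearningRate])->
    Delta_Weight = LearningRate*Output*(Input - Output*W),
    W+Delta_Weight.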

That is effectively it with regards to updating the substrate module. We next update the morphology, and add to it the new substrate cpps and ceps needed for this learning rule.

17.2.2 Updating the Morphology Module

Our substrate can now deal with the new type of substrate_cpp and substrate_cep, yet we have neither specified nor created them. We do just that in this section.

When creating substrate_cpps and substrate_ceps, we have to specify the plasticity type that the substrate will use. Thus we can modify the cpps and ceps to use different input and output vector lengths dependent on the type of plasticity. For example, since the iterative and the abcn learning rules both require the substrate_cpps to pass to the NN an extended vector with the extra three values for Input, Output, and Weight: [I,O,W], we set up the morphology in such a way that when plasticity is set to abcn or iterative, a different set of substrate_cpps is created, as shown in the following listing.

Listing-17.5 The implementation of the updated get_SubstrateCPPs/2 function.

get_SubstrateCPPs(Dimensions,Plasticity)->
    io:format("Dimensions:~p, Plasticity:~p~n",[Dimensions,Plasticity]),
    if
        (Plasticity == iterative) or (Plasticity == abcn) ->
            Std=[
                #sensor{name=cartesian,type=substrate,vl=Dimensions*2+3},
                #sensor{name=centripital_distances,type=substrate,vl=2+3},
                #sensor{name=cartesian_distance,type=substrate,vl=1+3},
                #sensor{name=cartesian_CoordDiffs,type=substrate,vl=Dimensions+3},
                #sensor{name=cartesian_GaussedCoordDiffs,type=substrate,vl=Dimensions+3},
                #sensor{name=iow,type=substrate,vl=3}
            ],
            Adt=case Dimensions of
                2 ->
                    [#sensor{name=polar,type=substrate,vl=Dimensions*2+3}];
                3 ->
                    [#sensor{name=spherical,type=substrate,vl=Dimensions*2+3}];
                _ ->
                    []
            end,
            lists:append(Std,Adt);
        (Plasticity == none) ->
            Std=[
                #sensor{name=cartesian,type=substrate,vl=Dimensions*2},
                #sensor{name=centripital_distances,type=substrate,vl=2},
                #sensor{name=cartesian_distance,type=substrate,vl=1},
                #sensor{name=cartesian_CoordDiffs,type=substrate,vl=Dimensions},
                #sensor{name=cartesian_GaussedCoordDiffs,type=substrate,vl=Dimensions}
            ],
            Adt=case Dimensions of
                2 ->
                    [#sensor{name=polar,type=substrate,vl=Dimensions*2}];
                3 ->
                    [#sensor{name=spherical,type=substrate,vl=Dimensions*2}];
                _ ->
                    []
            end,
            lists:append(Std,Adt)
    end.

In the above updated function, we simply add the extra 3 to the vl parameter, which is required to deal with the extended vector when the plasticity is set to abcn or iterative. Listing-17.6 shows the updated get_SubstrateCEPs/2 function, with the new cep that the abcn learning rule uses.
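For example, with Dimensions = 3 and plasticity set to abcn or iterative, the cartesian substrate_cpp is specified with vl = 3*2+3 = 9: six coordinate values for the two connected neurodes, plus the appended [I,O,W] triplet.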

Listing-17.6 The implementation of the updated get_SubstrateCEPs/2 function.

get_SubstrateCEPs(Dimensions,Plasticity)->
    case Plasticity of
        abcn ->
            [#actuator{name=set_abcn,type=substrate,vl=5}];
        none ->
            [#actuator{name=set_weight,type=substrate,vl=1}]
    end.

With this done, we need now only modify the substrate_cpp and substrate_cep modules.

17.2.3 Updating the substrate_cpp & substrate_cep Modules

The only remaining modification left is one done to the substrate_cpp and substrate_cep modules. For the first, we simply extend the receive loop so that it can accept a message tuple of the form: {Substrate_PId, Presynaptic_Coords, Postsynaptic_Coords, IOW}, which is an extended message that also accommodates the new vector [I,O,W] of length 3. The updated loop/8 function is shown in Listing-17.7, with the added functionality in boldface.

Listing-17.7 The implementation of the updated substrate_cpp:loop/8 function.

loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CPPName,VL,Parameters,Fanout_PIds)->
    receive
        {Substrate_PId,Presynaptic_Coords,Postsynaptic_Coords}->
            SensoryVector = functions:CPPName(Presynaptic_Coords,Postsynaptic_Coords),
            [Pid ! {self(),forward,SensoryVector} || Pid <- Fanout_PIds],
            loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CPPName,VL,Parameters,Fanout_PIds);
        {Substrate_PId,Presynaptic_Coords,Postsynaptic_Coords,IOW}->
            SensoryVector = functions:CPPName(Presynaptic_Coords,Postsynaptic_Coords,IOW),
            [Pid ! {self(),forward,SensoryVector} || Pid <- Fanout_PIds],
            loop(Id,ExoSelf_PId,Cx_PId,Substrate_PId,CPPName,VL,Parameters,Fanout_PIds);
        {ExoSelf_PId,terminate} ->
            ok
    end.

Since to produce the actual SensoryVector the cpp uses the CPPName function found in the functions module, we also add the new necessary functions to the functions module. The following listing shows the newly added functions which allow the substrate_cpp to process the extended vector, and produce the sensory vectors of appropriate lengths, which now include the IOW values.

Listing-17.8 The new sensory signal processing functions in the functions module.

cartesian(I_Coord,Coord,[I,O,W])->
    [I,O,W|lists:append(I_Coord,Coord)].

polar(I_Coord,Coord,[I,O,W])->
    [I,O,W|lists:append(cart2pol(I_Coord),cart2pol(Coord))].

spherical(I_Coord,Coord,[I,O,W])->
    [I,O,W|lists:append(cart2spher(I_Coord),cart2spher(Coord))].

centripital_distances(I_Coord,Coord,[I,O,W])->
    [I,O,W,centripital_distance(I_Coord,0),centripital_distance(Coord,0)].

cartesian_distance(I_Coord,Coord,[I,O,W])->
    [I,O,W,calculate_distance(I_Coord,Coord,0)].

cartesian_CoordDiffs(FromCoords,ToCoords,[I,O,W])->
    [I,O,W|cartesian_CoordDiffs(FromCoords,ToCoords)].

cartesian_GaussedCoordDiffs(FromCoords,ToCoords,[I,O,W])->
    [I,O,W|cartesian_GaussedCoordDiffs(FromCoords,ToCoords)].

iow(_I_Coord,_Coord,IOW)->
    IOW.

These are basically the same functions as used by the substrate plasticity of type none, except that they append the vector IOW to the resulting processed coordinates, as shown above in boldface. Finally, we now add the new set_abcn/3 function to the substrate_cep module, as shown next.
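For example, a call to the extended cartesian/3 function with hypothetical coordinate and IOW values:

functions:cartesian([0.5,-1.0],[1.0,0.0],[0.8,0.5,0.2])

would return the sensory vector: [0.8,0.5,0.2,0.5,-1.0,1.0,0.0].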

Listing-17.9 A new set_abcn/3 function added to the substrate_cep module.

set_abcn(Output,_Parameters,Substrate_PId)->
    Substrate_PId ! {self(),set_abcn,Output}.

After the standard substrate_cep receive loop gathers all the signals from the presynaptic neurons, it executes the morphologically specified substrate_cep function, which is set_abcn/3 in the case when plasticity = abcn. This function sends to the substrate the output vector: [W,A,B,C,N], and the message set_abcn, which the substrate then uses to execute the set_abcn function which we added to the substrate module earlier in this chapter.

17.2.4 Benchmarking the New Substrate Plasticity

Undramatically, these are all the modifications that were needed to allow our substrate encoded system to let the neurodes utilize the Hebbian learning rule. Now that our substrate encoded NN based system has the ability to learn, we again test it on the discrete T-Maze problem, with the results shown in Listing-17.10.

Listing-17.10 The benchmark results of the application of the substrate encoded NN based system with abcn plasticity, to the discrete T-Maze problem.

Graph:{graph,discrete_tmaze,
[5.117069757727654,5.2276148705096075,5.256698564593302,
5.323939393939395,5.367008430166325,5.383246753246754,
5.340942697653223,5.335703463203464,5.310778651173387,
5.318170426065162],
[0.09262706161715503,0.13307346652534205,0.15643200235420435,
0.19103627317236116,0.24840028484238094,0.3074955828617828,
0.22050190155526622,0.2687961935948596,0.27809920403845456,
0.2890597857666472],
[102.30049832915634,115.57537176274028,115.82996172248811,
119.80377705627716,117.495043745728,122.40998917748922,
126.54832308042839,127.85407575757583,131.0029333561176,
129.20257552973348],
[19.387895647932833,6.436782140204616,7.641824017959771,
9.509692909802402,11.439728974016472,8.54974974710698,
10.520194897286766,10.492965582165443,10.152832110453105,
10.20904378137977],
[145.20000000000002,149.2,145.60000000000002,149.2,148.6,149.2,
149.2,149.2,149.2,149.2],
[0,10.000000000000115,10.000000000000115,10.000000000000115,
10.000000000000115,10.000000000000115,10.000000000000115,
10.000000000000115,10.000000000000115,10.000000000000115],
[16.95,18.05,18.15,18.35,18.05,18.4,18.15,18.0,18.5,18.0],
[1.5960889699512364,1.8834808201837359,1.3518505834595775,
1.2757350822173077,1.8020821290940099,1.5297058540778357,
1.3883443376914821,1.224744871391589,1.466287829861518,
1.2649110640673518],
[500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0],
[]}

Tot Evaluations Avg:5172.3 Std:110.92885107130606


As in Chapter-15, once plasticity is enabled, our TWEANN is able to evolve substrate encoded NN based agents which can effectively solve the T-Maze problem, achieving full score. As we've done once before, I will select one of the champions, and print out its sensory and action based signals to demonstrate that the agent, once it comes across the small reward occurring at [1,1] after the switch, does indeed change its strategy and begin to move towards [-1,1], as shown next:

Position:[0,0] SenseSignal:[0,1,0] RewardSignal:[0] Move:0 StepIndex:1 RunIndex:0
Position:[0,1] SenseSignal:[1,0,1] RewardSignal:[0] Move:1 StepIndex:2 RunIndex:0
Position:[1,1] SenseSignal:[0,0,0] RewardSignal:[1] Move:0 StepIndex:3 RunIndex:0
Position:[0,0] SenseSignal:[0,1,0] RewardSignal:[0] Move:0 StepIndex:1 RunIndex:1
Position:[0,1] SenseSignal:[1,0,1] RewardSignal:[0] Move:1 StepIndex:2 RunIndex:1
Position:[1,1] SenseSignal:[0,0,0] RewardSignal:[1] Move:0 StepIndex:3 RunIndex:1
...
Position:[0,0] SenseSignal:[0,1,0] RewardSignal:[0] Move:0 StepIndex:1 RunIndex:55
Position:[0,1] SenseSignal:[1,0,1] RewardSignal:[0] Move:1 StepIndex:2 RunIndex:55
Position:[1,1] SenseSignal:[0,0,0] RewardSignal:[0.2] Move:0 StepIndex:3 RunIndex:55
Position:[0,0] SenseSignal:[0,1,0] RewardSignal:[0] Move:0 StepIndex:1 RunIndex:56
Position:[0,1] SenseSignal:[1,0,1] RewardSignal:[0] Move:-1 StepIndex:2 RunIndex:56
Position:[-1,1] SenseSignal:[0,0,0] RewardSignal:[1] Move:0 StepIndex:3 RunIndex:56
Position:[0,0] SenseSignal:[0,1,0] RewardSignal:[0] Move:0 StepIndex:1 RunIndex:57
Position:[0,1] SenseSignal:[1,0,1] RewardSignal:[0] Move:-1 StepIndex:2 RunIndex:57
Position:[-1,1] SenseSignal:[0,0,0] RewardSignal:[1] Move:0 StepIndex:3 RunIndex:57

In the above console printout, the switch event occurred on the 55th maze run, and on the 56th the agent began going to the [-1,1] corner to collect the switched, large reward. We next add the iterative learning rule, and see how it fares on this problem; although, based on Listing-17.10, we can readily see that the substrate encoded NN based system using the abcn learning rule can already solve the problem within the first 1000 evaluations, so there is little margin for improvement.

17.3 Implementing the iterative Learning Rule

The iterative plasticity works in a different way than the just added abcn learning rule. For the substrate to utilize the iterative plasticity function, it must poll the NN for every weight of every neurode, after every time that neurode produces an output by processing its input signals. Whereas before the substrate used the NN to set all the synaptic weights and parameters once per evaluation, it now must call the NN for every synaptic weight every sense-think-act cycle. This requires significantly more computational power. Yet at the same time it does allow for the entire NN to act as a learning rule, and because a NN is a universal function approximator, and because it produces its output based on the coordinates, the input, the output, and the current synaptic weight between the connected neurodes, it could potentially be incredibly versatile, and allow for virtually any type of learning rule to evolve.

The implementation of this learning rule also encompasses all the learning rules in which the substrate needs to update the synaptic weights every cycle, as opposed to every evaluation as is the case with the abcn rule. Thus by implementing it here and now, we open our SENN system to all future learning rules which use this type of updating approach.

Again, surprisingly, the implementation of this learning rule will require only a few minor modifications to our existing substrate module, and a small addition to the substrate_cep and morphology modules. The first change is within the reason function. Though we set up and compose the new substrate as usual when the substrate_state_flag is set to reset, after the initial substrate has been set up with its default synaptic weights equaling zero, we set the substrate_state_flag to iterative. We do not set it to hold because we will need to call the get_weights function during every cycle. Thus by setting the substrate_state_flag to iterative, and by creating a new case in the reason function, as shown in Listing-17.11, we can reuse the calculate_ResetOutput/7 function during every cycle to update the synaptic weights of the neurodes by calling the get_weights function.

Listing-17.11 The implementation of the updated reason/2 function.

reason(Input,S)->
    Densities = S#state.densities,
    Substrate = S#state.cur_substrate,
    SMode = S#state.substrate_state_flag,
    CPP_PIds = S#state.cpp_pids,
    CEP_PIds = S#state.cep_pids,
    Plasticity = S#state.plasticity,
    case SMode of
        reset ->
            Sensors = S#state.sensors,
            Actuators = S#state.actuators,
            New_Substrate = create_substrate(Sensors,Densities,Actuators,S#state.link_form),
            U_SMode = case Plasticity of
                iterative ->
                    {Output,Populated_Substrate} = calculate_ResetOutput(Densities,New_Substrate,Input,CPP_PIds,CEP_PIds,Plasticity,S#state.link_form),
                    iterative;
                _ ->
                    {Output,Populated_Substrate} = calculate_ResetOutput(Densities,New_Substrate,Input,CPP_PIds,CEP_PIds,Plasticity,S#state.link_form),
                    hold
            end,
            {Populated_Substrate,U_SMode,Output};
        iterative ->
            {Output,U_Substrate} = calculate_ResetOutput(Densities,Substrate,Input,CPP_PIds,CEP_PIds,Plasticity,S#state.link_form),
            {U_Substrate,SMode,Output};
        hold ->
            {Output,U_Substrate} = calculate_HoldOutput(Densities,Substrate,Input,S#state.link_form,Plasticity,CPP_PIds,CEP_PIds),
            {U_Substrate,SMode,Output}
    end.

Since this update, shown in boldface in the above listing, ensures that the function calculate_ResetOutput is going to be called during every cycle, we need to update that function as shown in Listing-17.12.

Listing-17.12 The implementation of the updated calculate_ResetOutput/7 function.

calculate_ResetOutput(Densities,Substrate,Input,CPP_PIds,CEP_PIds,Plasticity,LinkForm)->
    [IHyperlayer|PHyperlayers] = Substrate,
    Populated_IHyperlayer = populate_InputHyperlayer(IHyperlayer,lists:flatten(Input),[]),
    case Plasticity of
        iterative ->
            {Output,U_PHyperlayers} = calculate_substrate_output(Populated_IHyperlayer,PHyperlayers,LinkForm,Plasticity,CPP_PIds,CEP_PIds),
            {Output,[IHyperlayer|U_PHyperlayers]};
        _->
            Populated_PHyperlayers = populate_PHyperlayers(Substrate,CPP_PIds,CEP_PIds,LinkForm,Plasticity),
            {Output,U_PHyperlayers} = calculate_substrate_output(Populated_IHyperlayer,Populated_PHyperlayers,LinkForm,Plasticity,CPP_PIds,CEP_PIds),
            {Output,[IHyperlayer|U_PHyperlayers]}
    end.

As can be seen from the above listing, the difference in the functions executed when Plasticity == iterative is that we no longer need to execute populate_PHyperlayers/5; instead we slightly modify the calculate_output/5 function called deep within the calculate_substrate_output/6 function, such that it calls the get_weights function after calculating the output of every neurode, thus updating that neurode's synaptic weights. This updated function is shown in the following listing.

Listing-17.13 The implementation of the updated calculate_output/5 function.

calculate_output(I_Neurodes,Neurode,Plasticity,CPP_PIds,CEP_PIds)->
    {Coord,_Prev_O,Weights} = Neurode,
    case Plasticity of
        none ->
            Output = calculate_neurode_output_std(I_Neurodes,Neurode,0),
            {Coord,Output,Weights};
        iterative ->
            Output = calculate_neurode_output_std(I_Neurodes,Neurode,0),
            U_Weights = get_weights(I_Neurodes,Coord,CPP_PIds,CEP_PIds,[],Weights,Output),
            {Coord,Output,U_Weights};
        abcn ->
            Output = calculate_neurode_output_plast(I_Neurodes,Neurode,0),
            update_neurode(I_Neurodes,{Coord,Output,Weights},[])
    end.

Unlike the cases when Plasticity == none or abcn, when Plasticity == iterative we call get_weights after calculating the Output value for every neurode. And it is this that allows us to update every synaptic weight of every neurode during every cycle. This is effectively it. We now need only add, mirroring the set_abcn/2 function, the new set_iterative/2 function, as shown in Listing-17.14, and the substrate is then able to function without plasticity, with plasticity which is only set once per evaluation, and with plasticity that requires the polling of the NN during every cycle.

Listing-17.14 The implementation of the new set_iterative/2 function.

set_iterative(Signal,W)->
    [Delta_Weight] = Signal,
    functions:sat(W + Delta_Weight,3.1415,-3.1415).

We next modify the morphology module by updating the get_SubstrateCEPs/2 function, as shown in Listing-17.15. We update it by simply adding the new substrate_cep specification, which has an output vector length of 1: the value of the delta weight.

Listing-17.15 The updated get_SubstrateCEPs/2 function.

get_SubstrateCEPs(Dimensions,Plasticity)->
    case Plasticity of
        iterative ->
            [#actuator{name=delta_weight,type=substrate,vl=1}];
        abcn ->
            [#actuator{name=set_abcn,type=substrate,vl=5}];
        none ->
            [#actuator{name=set_weight,type=substrate,vl=1}]
    end.

And finally we implement this new substrate_cep, updating the substrate_cep module as shown in the following listing.

Listing-17.16 The new delta_weight/3 function added to the substrate_cep module.

delta_weight(Output,_Parameters,Substrate_PId)->
    [Val] = Output,
    Threshold = 0.33,
    DW = if
        Val > Threshold ->
            (functions:scale(Val,1,Threshold)+1)/2;
        Val < -Threshold ->
            (functions:scale(Val,-Threshold,-1)-1)/2;
        true ->
            0
    end,
    Substrate_PId ! {self(),set_iterative,[DW]}.

The new delta_weight/3 function is very similar to the set_weight/3 function; the only difference is in the way its output is used by the substrate. As we saw in Listing-17.14, the set_iterative/2 function updates the synaptic weight rather than overwriting it, which is the case with the original set_weight/2 function.
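Note also the role of the Threshold within delta_weight/3: assuming functions:scale(Val,Max,Min) linearly maps the range between Min and Max to [-1,1], NN outputs within the dead zone [-0.33,0.33] produce a delta weight of 0 and leave the synaptic weight untouched, while outputs in (0.33,1] and [-1,-0.33) are mapped to positive and negative weight changes respectively. This gives the evolved NN a way to express "no update" for any given connection.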

And that concludes the implementation of this learning rule. Again, completely undramatic, and accomplished very easily. In the next subsection we test this newly added plasticity learning rule on the now very familiar to us discrete T-Maze benchmark.

17.3.1 Benchmarking the New iterative Substrate Plasticity

Having now implemented the iterative plasticity learning rule, we test and see how it compares to the abcn learning rule, and to the substrate which does not possess plasticity at all. The results of running the benchmark with the iterative learning rule are shown in Listing-17.17.

Listing-17.17 The results of running the T-Maze benchmark with the iterative learning rule.

Graph:{graph,discrete_tmaze,
[1.080697304856344,1.105477329687856,1.082608984582669,
1.10173644338118,1.1018239026419805,1.1040638651965884,
1.1067042854170999,1.1083983426946815,1.0720152902017892,
1.1131932598294159],
[0.06523721492122392,0.05737539536993592,0.061080859604004445,
0.05539953173437482,0.04951815451574514,0.06096095772186648,
0.06779249262624067,0.055316036698257,0.0459640769682992,
0.0655751724549261],
[106.75590608768309,108.585491949571,114.94932017543867,
117.83561297182356,121.16001164967763,118.65738256708975,
121.01826794258375,122.92790528858384,122.86203973372172,
126.88959090909096],
[9.162803145363911,9.325149636485971,9.817899227379831,
9.10148250224565,9.783690310057073,9.788726424047805,
11.122540054241757,11.654351487379284,10.826960884668908,
10.50203798322592],
[148.4,148.4,149.2,149.2,149.2,149.2,149.2,149.2,149.2,149.2],
[11.40000000000012,10.600000000000119,10.600000000000119,
10.000000000000115,10.000000000000115,10.000000000000115,
10.600000000000119,10.600000000000119,10.000000000000115,
10.000000000000115],
[10.05,12.7,14.2,15.0,15.35,15.65,16.25,16.65,16.7,17.05],
[1.116915395184434,1.7349351572897473,1.5999999999999999,
2.32379000772445,2.1041625412500817,2.2197972880423116,
2.2107690969434146,1.7684739183827396,1.9261360284258222,
2.4181604578687494],
[500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0,500.0],
[]}

Tot Evaluations Avg:5188.85 Std:110.24122414051831

As expected, with this plasticity type our TWEANN was again able to rapidly evolve SENNs capable of solving the T-Maze problem. Indeed, some near perfect solutions were evolved within the first 500 evaluations during a number of evolutionary runs. Unfortunately though, the iterative learning rule requires us to poll the NN for every synaptic weight of every neurode, during every sense-think-act cycle, which makes it slower than other rules, but also incredibly more versatile and powerful. It is possible to accelerate and optimize this system, by for example transforming the evolved feedforward NNs into single functions, then embedding those functions within each neurode, and thus allowing each neurode to simply call on it as if it were a simple plasticity function... Nevertheless, the benchmark result in this subsection was a success, and we now have a fully functional, highly advanced TWEANN platform capable of effectively evolving neural and substrate encoded, static and plastic, neural network based agents.


17.4 Discussion

We have come a long way. At this point we've implemented a fully concurrent, cutting edge TWEANN platform. The benchmarks of both the abcn and the iterative learning rules we've developed for our substrate encoded NNs have been a success. Both versions were rapidly utilized by our TWEANN (even more rapidly than NNs with neural plasticity enabled), which successfully evolved agents capable of solving the T-Maze problem. During many of the evolutionary runs composing the experiment, the solutions were evolved within the first 1000 evaluations. Yet still the iterative rule, though much more flexible, and allowing us to evolve any learning rule due to the NNs being universal function approximators, is computationally heavy. But it is possible to significantly accelerate this encoding by for example converting the feedforward NNs into single functions, which can then be utilized independently by each neurode. A feedforward NN is after all just a function of functions, which can be represented in the form: FF = f1(f2(f3(...)...)...), with the FF function then used directly by the substrate embedded neurodes. This would effectively make the entire SENN, excluding the sensors and actuators, be represented by a single process. But implementing this computational option is outside the scope of this volume, and will be covered with other advancements in the next book volume. Having now created an advanced TWEANN system, we apply it to some real world applications in the following chapters.
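As a toy sketch of that composition idea (illustrative only, and not how our current system represents its NNs), a list of per-layer functions could be folded into a single callable function as follows:

compose(Layers)->
    fun(Input)->
        % Thread the input signal through each layer function in order
        % (Layers ordered from first hidden layer to output layer),
        % producing the composed FF = f1(f2(f3(...)...)...) behavior.
        lists:foldl(fun(Layer,Signal)-> Layer(Signal) end, Input, Layers)
    end.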

Part V

Applications

Our system is ready: it has direct and indirect encoding, plasticity of varying kinds, numerous activation functions with new ones easily added, and even the ability to evolve evolutionary strategies. I noted that we will apply our system to complex, real world problems, not just benchmarks. We will do that in the next two chapters. The following two application areas are exciting, interesting, and lucrative. We will apply our system to an Artificial Life simulation, evolving the brains of 2d simulated robots inhabiting a flatland, a 2d environment. And then we will use our system to evolve currency trading agents, agents that read the historical exchange rates of Euro vs. USD, and either buy or sell one against the other on the Forex market, to make a profit. In fact, we will not only evolve NNs which use simply the lists of historical prices as sensory signals, but also NNs which use actual candlestick plot charts of historical prices.

Chapter 18 Artificial Life

Abstract In this chapter we apply our neuroevolutionary system to an ALife simulation. We create new sensors and actuators for the NN based agents interfacing with their simulated environment avatars, discuss the construction and implementation of the 2d ALife environment called Flatland, interface our system to the said environment, and then observe the resulting evolving behaviors of the simulated organisms.

We now come full circle. We started this book with a discussion on the evolution of intelligent organisms inhabiting a simulated (or real) world. In this chapter we convert that discussion into reality. We will create a flatland, a 2d world inhabited by 2d organisms, artificial life. Our neuroevolutionary system will evolve the brains of these 2d organisms, the avatars within the simulated environment controlled by our NN based agents, and through their avatars our NN based agents will explore the barren flatland, compete with other flatlanders for food, and predate on each other, while trying to survive and consume the simulated food within the 2d world.

18.1 Simulated Environment and Artificial Organisms: Flatland

The goal of ALife [1,2,3] is to use evolution to evolve intelligent agents in an open ended simulated environment, using a fitness function that is at least to some degree similar to that of the animal world, in which the accumulation of resources and survival is correlated with achievement, and therefore with higher fitness and the creation of more offspring than those with a lower fitness.

Our simulation will be composed of a two dimensional environment, with food scattered throughout some region, where that food will be represented as green circles. The environment will be populated by simulated 2d herbivore robots, simulated as blue circles, and 2d predator type simulated robots, represented by red circles with small spears at one end.

Note that the circles will truly represent the morphologies of those simulated robots. The collision detection will be based on the circle's radius and the robot's mass. The simulated environment will look something like Fig-18.1. In it you can see the world, the food elements scattered through it (small green circles), the prey (large blue circles), and the predators (smaller red circles). In honor of the book [4] "Flatland: A Romance of Many Dimensions", we will call this 2d environment simulation: flatland, and the evolving agents: flatlanders.



Fig. 18.1 Flatland, prey, predators, and plants.

Unlike other scapes we have developed before, flatland is a public scape. The agents, when created, will not spawn this scape, but join/enter it. The public scape itself must be created before any population monitor is even spawned. As we discussed in the first few chapters, it is these types of public scapes that should be spawned by the polis process, and it is these public scapes which will allow the NN based agents to interact not just with the environment of the scape, but also with each other. The simulated robots are, in essence, avatars controlled by the NNs. We can also think of the simulated robots as the physical representations of the NN based agents, whose morphology, sensors, and actuators dictate the physical representation of their avatars, with the NN truly then being the core of those organisms, the brains within.

18.2 The Scape and the Fitness Function

Most of the concepts we have developed when creating the XOR_Mimic, Double Pole Balancing, and particularly the T-Maze private scapes, can be reused here in the creation of the Flatland scape. The flatland will have to create a simulated representation, an avatar, for the NN that enters it. When a NN is spawned, it first sends the flatland a message that it wishes to enter the scape. At this point the scape checks the NN's morphology (specified by the message the exoself sent it), and based on that morphology creates an avatar for it. The scape also registers the NN as currently possessing an avatar within the scape, such that when the NN starts sending it action messages to control the avatar, and polls it for sensory messages based on the sensors of its avatar, the scape can move its avatar and forward to it the sensory signals from its avatar, respectively. Thus, our system in this scenario should function as follows:

1. We create all the necessary databases to support the polis infrastructure by executing polis:create().

2. Then the polis is started by executing polis:start().

3. The polis spawns all public scapes at its disposal, which in this case is a list containing a single scape called flatland.

4. Once the flatland scape is created and its PId is registered with the polis, the scape creates all the basic elements within itself (all the simulated plants).

5. The researcher decides to start an ALife simulation, and specifies within the constraint to use the prey morphology, which is a flatlander species which the flatland scape recognizes as a type that can move around and eat plants when moving over them.

6. The researcher then compiles and executes population_monitor:test().

7. The population_monitor begins spawning agents.

8. When an agent is spawned, its exoself first determines what scapes the agent should either spawn (if private), or enter (if public).

9. The exoself thus forms a list of scape names. If private, they are spawned, and if public, the exoself sends the polis a message to request the PId of the public scape of that particular name.

10. The polis sends back to the exoself the PId of the particular scape, at which point the exoself sends this scape a message with its PId and morphology.

11. The scape, depending on what type of scape it is (flatland in this case), creates an avatar based on the exoself's provided morphology name, and associates the exoself's PId with that avatar.

12. The exoself spawns all the elements composing the NN based system.

13. Since sensors and actuators have the exoself's PId, when they are triggered to action they send a request for sensory signals, and action signals, to the scape. The signals contain the PId of the exoself, so that the scape can associate these messages with the right avatar.

14. When the flatland scape receives a message, it uses the PId in the message to figure out which avatar the message is associated with, and then, based on the message, replies back to the caller.


15. The flatland scape keeps track of the environment, the avatars, their interactions... performing all the physical simulation calculations, and thus having access to all the information needed to keep track of the fitness of every avatar.

16. Because flatland keeps track of the amount of energy every avatar has, and because it knows when two avatars collide or interact, it will know when an avatar dies from losing all its energy, or from being eaten. As soon as an avatar dies, the scape performs the cleanup associated with that avatar's physical representation, whether that be simply removing the avatar from the physical simulation, or leaving it there to be devoured or decay.

17. When the NN receives a message from the scape that it has just died, that is the end of that agent's evaluation, similarly to when the simulated robot crashes into a wall in the T-Maze simulation. At this point the exoself can perturb the weights, revert them to previous values, or perform other computations.

18. Afterwards, the exoself can send the scape a message requesting a new avatar... And the cycle continues.

For the simulation to be consistent, and for the world to have a constant or growing population, we will of course have to use the steady_state based evolution, rather than generational. The steady_state evolution reflects the continuous birth and death of agents within an environment, unlike generational evolution in which the entire generation must die before new agents are created and a new generation is spawned.

The plan of attack for this application would be to first develop the actual flatland scape. This would require us to first create the Cartesian world simulation, and create the functions for collision detection between two circles, between a circle and a line, and between a line and a line. We would then have to decide on whether we want to use some kind of quad-tree to optimize the support of a large number of avatars within one scape, or whether to just be content with using 20-30 avatars at any time, and allow a single process to represent the entire flatland, and thus act as a bottleneck when computing collision detections between avatars, and simulating vision and other sensors of each avatar. With regards to the scape, we would also have to decide on the schedule of when to perform collision detection, for example:

1. Do we first wait for every single agent to send an action based message to the scape, a batch form of signals from all the currently alive agents, then in one go apply the requested actions to their avatars, and then calculate the collision detections and other resulting properties of the scape after that single action step has been taken? Then, afterwards, allow the scape to again go into its receive loop, waiting for another batch of messages from all the active agents?


2. Or do we let the scape perform collision detection and other types of environment calculations (let the simulated plants grow, or spread, or create offspring, or mature and increase in the amount of energy the plant provides when eaten, or decay and turn into poison and decrease the avatar's energy reserves when eaten, or change in color over time...) after receiving every single action message from an agent?

3. Or do we run the physical simulation nonstop, updating the world's physical properties every simulated 0.01 seconds, and somehow synchronize all the agents with it...?

I believe that option-2 has an interesting advantage in that the smaller NNs will have faster "reflexes", because the smaller NNs will have lower NN topology depth, and thus go through more sense-think-act cycles in any given time than those NNs which are much deeper. For example, a 10 neuron NN of depth 1 will most likely be able to send at least 10 messages in the time frame in which a 100000 neuron NN of depth 1000 can send one... and thus its reflexes, and the number of actions performed per given time, will be higher than those of a more complex intelligence. This is somewhat similar to the real world, and the response times between various organisms, based on their brain structure and its complexity.

I have implemented Flatland in [5], and it uses option 2. Because the implementation of the 2d world simulation is outside the scope of this book, and because there can be so many different ways to implement it, or even use the scape as a simple wrapper for an already existing 2d robot simulator like Stage [6], we will concentrate on the discussion and implementation of the features needed by our TWEANN system to interact with the provided flatland simulation (rather than building it), given that we know the interface format for it.

18.2.1 Public Scape Architectures, Polis Interface, and Scape Sectors

The general architecture of a polis, and its relation to the scapes it spawns, and the sectors into which some scapes are divided, is shown in Fig-18.2. Ideally, the polis process, as we discussed, is the infrastructure; it is basically everything, all the things that are needed to run the actual TWEANN algorithm, and the software that synchronizes and manages public scapes. The public scapes are the always-present environment simulations, not necessarily 3d. The scapes are a way for us to present the simulations, environments, and problems to the agent.


Fig. 18.2 The architecture of polis, scapes, and sectors.

Each scape can be interfaced with through messages. Each scape has a particular list of messages that it can accept, and that it expects to receive and reply to; this is the interface format for that scape. When a scape simulates an environment, that environment can be further broken up into sectors, represented by processes and running in parallel. The sectors would simply represent the small sections of the physical simulation of the environment, and thus allow for a parallelization of the collision detection and sensory computations. Sectors allow for scapes to implement quad-trees [7], and thus improve their performance when interfacing with a lot of agents. If the scape uses sectors, it assigns the agent to a particular sector based on that agent's location within the scape. This is done by forwarding to the agent the PId of the sector, which the agent then uses as if it were the PId of the scape. Thus this is all seamless to the agent.

Finally, there is also a spawnable Visor process, which allows the researcher to poll the scape for visualizable data. The visor is independent and separate from the scape, and thus when developing a scape it is not necessary to consider how to visually represent the system. The visualization is an afterthought, and is only necessary for us to be able to actually tap into the scape and see what is occurring. It is the visor that deals with constructing a representation of the scape, based on the type of scape it was requested to poll for data.


18.2.2 The Flatland Interface

The way the scapes deal with the messages, whether there are physical representations of the agents interfacing with it or not, and whether it keeps track of particular agents or not, all depend on the scape and its purpose. The flatland scape keeps track of the agents; it requires that agents first register with it before they are allocated their avatars. Once an agent is registered with the flatland scape, the scape gives it an avatar, gives that avatar physical parameters, some default amount of energy, and starting coordinates within the flatland, and then allows for percept polling and action signals to be sent from the registered agent.

The flatland scape is implemented as a gen_server process. It can accept the following messages from the interfacing agents:

{enter,Morphology,Agent_Id,Sensors,Actuators,TotNeurons}: This message is used to register and enter the flatland scape. The flatland scape uses the Sensors, Actuators, and Morphology to generate an avatar for the agent. If the agent has the morphology of a predator, the avatar would be very different from one given to a prey type morphology. Finally, the reason for sending the scape the TotNeurons value is that the scape might make the energy consumption and the designated size of the avatar proportional to the size of the NN itself, and for this it needs the total number of neurons composing the agent.

{leave,Agent_Id}: The atom leave is sent by the agent that wishes to unregister, and leave the scape.

{get_all,avatars}: Each sensor needs to perform a different set of computations to produce the sensory signal. Each sensor knows which types of computations need to be performed, thus the sensor simply requests the scape to send it the avatar list, where the avatar list is composed of the avatar records, each having all the information about the avatar's location. The avatar list includes all the avatars/objects within the scape (agent controlled avatars, static objects...). Once the needed data to calculate a sensory signal is received, the sensor performs its function, generates a sensory vector, and forwards it to the neurons in the agent's NN.

{actuator,Agent_Id,Command,Output,Parameters} : The actuator sends to the scape a Command to execute with the parameter Output. The scape takes the Command and Output, and applies it to the avatar with which Agent_Id is associated.

Now that we know how to interface with the flatland scape, we can create the sensors and actuators to allow our agents to interface with it. But this will also require us to modify the exoself so that it can deal with public scapes, given that all public scapes will use the request-for-entry and request-for-leaving messages, and the messaging format specified in the above list.
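A minimal sketch of how an exoself or sensor might wrap these messages, assuming the flatland gen_server exposes them through gen_server:call/2 (the wrapper functions themselves are illustrative, not part of the provided implementation):

% Register with the scape and receive an avatar.
enter(Scape_PId,Morphology,Agent_Id,Sensors,Actuators,TotNeurons)->
    gen_server:call(Scape_PId,{enter,Morphology,Agent_Id,Sensors,Actuators,TotNeurons}).

% Poll the scape for the full avatar list, from which percepts are computed.
get_all_avatars(Scape_PId)->
    gen_server:call(Scape_PId,{get_all,avatars}).

% Unregister and leave the scape.
leave(Scape_PId,Agent_Id)->
    gen_server:call(Scape_PId,{leave,Agent_Id}).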


18.3 Flatland's Avatar Encoding

The flatland scape keeps track of every avatar's position, pointing direction, energy reserves, morphology, set of sensors and actuators... by entering all these features into a record representing the avatar when the corresponding agent registers with the scape. The avatar record has the following form:

-record(avatar,{id,sector,morphology,energy=0,health=0,food=0,age=0,kills=0,loc,direction,r,mass,objects=[],state,actuators,sensors}).

The objects list is composed of object records, where each record has the form:

-record(object,{id,sector,type,color,loc,pivot,parameters=[]}).

In which the type can be either circle or line. It is the avatar's objects list that defines what the avatar looks like, for the avatar is composed of these objects. The collision detection too is based on the collisions between the object lists of the avatars. Having all avatars be circular though, makes the collision detection calculations a lot faster and simpler.
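As an illustration, a plant avatar might be instantiated as in the following hedged sketch; the concrete values (the energy, the radius r, and the parameters list holding the circle's radius) are assumptions made for the example, not the values used by the actual flatland module.

example_plant_avatar()->
	Loc = {100,250},
	#avatar{id=plant_1, morphology=plant, energy=500, loc=Loc, direction={1,0}, r=3,
		objects=[#object{type=circle, color=green, loc=Loc, pivot=Loc, parameters=[3]}]}.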

As noted in the previous section, it is a list of these avatars that is returned to the sensor when requested, and it is the sensor which then calculates the various percepts based on this list. The scape itself can be considered to be composed of avatars, every object is an avatar. A plant is an avatar which has the morphology: plant. A wall section is an avatar of morphology wall, a rock is an avatar of morphology rock. The agent's sensor does not need to know the morphology of the avatar, nothing is needed by it except the objects list. Based on the objects: circles and lines, the sensor can calculate whether its line of sight (if a camera, or a range sensor for example) intersects those objects. Since every object has a coordinate (loc stands for location) and a color, the sensors can then extract both the color and the range to the object.
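Since the avatars are composed of circles (and lines), the core of the range sensing reduces to ray vs. object intersection tests. The following is a minimal sketch of the ray vs. circle case, assuming the ray direction is a unit vector; the function name and the miss atom are illustrative, not taken from the actual scape module.

ray_circle_intersect({OX,OY},{DX,DY},{CX,CY},R)->
	VX = CX-OX, VY = CY-OY,     %Vector from the ray origin to the circle center.
	T = VX*DX + VY*DY,          %Projection of that vector onto the ray direction.
	D2 = (VX*VX + VY*VY) - T*T, %Squared perpendicular distance from circle center to ray.
	case (T >= 0) andalso (D2 =< R*R) of
		true -> T - math:sqrt(R*R - D2); %Distance to the nearest intersection point.
		false -> miss
	end.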

For the agent to be able to control the avatar, the avatar must have sensors and actuators. More precisely, the NN based agent's morphology which defines the avatar, must come with sensors and actuators, such that the agent can gather sensory signals from the avatar, and control the avatar through its actuators. In our ALife simulation we will create 2 sensors: Range Sensor and Color Sensor, and 1 actuator: Differential Drive. The range sensor works like a standard robot range sensor in which we define the resolution and the coverage angle. The resolution defines how many range rays will be cast, and the coverage angle defines the angular spread of these rays. Thus for example if the resolution is 5 rays, and the degree coverage is 90 degrees, then the avatar will cast 5 rays, each 22.5 degrees from the other. The rays cast return the distance to anything they intersect. The color sensor works similarly, but the returned values are colors encoded in some fashion. For example black = -1, green = -0.5, blue = 0, red = 0.5, and white = 1. It is simply spectrum encoding, and can be similar to frequencies of the colors of the visible spectrum, scaled to be between -1 and 1 (though not in our implementation, where I simply chose a few colors and gave them their own numbers). Finally, the actuator the avatar will be controlled through is a simulated differential drive, where the NN outputs a vector of length 2, where the first element controls the velocity of the left wheel, and the second element of the vector controls the velocity of the right wheel. The graphical representation of the avatars, sensors, and color encoding, is shown in Fig-18.3.
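A minimal sketch of that color encoding, using the example values from the text above (the actual implementation may assign its own numbers to its own set of colors):

encode_color(black) -> -1;
encode_color(green) -> -0.5;
encode_color(blue)  -> 0;
encode_color(red)   -> 0.5;
encode_color(white) -> 1.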

Fig. 18.3 The representation of the prey avatar, the predator avatar, the plant avatar, the poison avatar, and the available sensors and actuators for the avatars, and the color sensor encoding.


18.4 Updating the Morphology, Sensor, and Actuator Modules

Because our system is completely independent of the scapes with which the evolved NN based agents interface, we need only add the new records to the records.hrl file, then add the new sensors, one new actuator, and then finally modify the exoself module. After that, we will be able to apply our TWEANN to an ALife simulation. We add the two new records specified in the previous section to the records.hrl file. Once that is accomplished, we add two new sensors: range_scanner and color_scanner. These two new sensors are shown in the following listing.

Listing-18.1 The implementation of the range_scanner (here implemented as the distance_scanner function) and the color_scanner sensors.

distance_scanner(Agent_Id,VL,[Spread,Density,RadialOffset],Scape)->
	case gen_server:call(Scape,{get_all,avatars}) of
		destroyed->
			lists:duplicate(VL,-1);
		Avatars ->
			Self = lists:keyfind(self(),2,Avatars),
			Loc = Self#avatar.loc,
			Direction = Self#avatar.direction,
			distance_scanner(silent,{1,0,0},Density,Spread,Loc,Direction,
				lists:keydelete(self(), 2, Avatars))
	end.
%The distance_scanner/4 function contacts the scape and requests a list of all the avatars within it. If for some reason the scape cannot do so, it replies with the destroyed atom, otherwise the avatar list is returned. If the reply is destroyed, the sensor simply composes a vector of the right length using the VL value, and returns the result. Otherwise, the function calls distance_scanner/7, which performs the actual calculation to compose the range sensory vector. This is primarily done through ray casting, and seeing whether any of the cast rays intersect the objects of which the avatars in the avatar list are composed.

color_scanner(Agent_Id,VL,[Spread,Density,RadialOffset],Scape)->
	case gen_server:call(Scape,{get_all,avatars}) of
		void->
			lists:duplicate(VL,-1);
		Avatars ->
			Self = lists:keyfind(self(),2,Avatars),
			Loc = Self#avatar.loc,
			Direction = Self#avatar.direction,
			color_scanner(silent,{1,0,0},Density,Spread,Loc,Direction,
				lists:keydelete(self(), 2, Avatars))
	end.
%Functions similarly to the distance_scanner/4, but returns a list of encoded colors rather than ranges. Whatever objects the cast rays intersect, their color is returned.

Because most of the functions deal with ray casting and collision detection, only the wrappers are shown, with the comments explaining their functionality. Because each agent keeps track of its position and the direction in which it is looking, we can easily calculate what the range sensor and color sensor should return. Besides the two sensors, the NN based agent also needs an actuator with which to move its avatar, the implementation of which is shown in Listing-18.2.

Listing-18.2 The implementation of the differential_drive actuator.

differential_drive(Agent_Id,Output,Parameters,Scape)->
	{Fitness,HaltFlag}=gen_server:call(Scape,{actuator,Agent_Id,differential_drive,Output,Parameters}).
%The differential_drive/4 function calls the Scape with the command: differential_drive, its output and parameters. The flatland scape will use the differential_drive command to simulate the said actuator for the avatar, executing the function with the Output and Parameters.

What can be noted from the three above functions, is that the parameters with which they are called have now been extended to include the Agent_Id, which uniquely identifies the NN based agent, and can thus be used in public scapes as the unique identifier of the avatar. This means that we also have to modify sensor:prep/1 and actuator:prep/1 to accept the Agent_Id value in the initial state sent to these processes by the exoself.
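For example, the sensor's prep function might be extended as in the following hedged sketch; the exact shape of the initialization tuple sent by the exoself is an assumption here, the essential change being the extra Agent_Id element, which is then carried in the sensor's state and passed on to its main loop:

prep(ExoSelf_PId) ->
	receive
		{ExoSelf_PId,{Id,Cx_PId,Scape,SensorName,VL,Parameters,Fanout_PIds,Agent_Id}} ->
			%Agent_Id is now part of the state, so sensor functions such as
			%distance_scanner/4 can identify this agent's avatar to public scapes.
			loop(Id,ExoSelf_PId,Cx_PId,Scape,SensorName,VL,Parameters,Fanout_PIds,Agent_Id)
	end.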

To be used by the agent, the sensors and the actuator must also be added to the morphology module. Because we want to have two different species/morphologies, one for prey and one for the predator, each of which will have a different set of privileges and avatars within the flatland, yet both use the same set of sensors and actuators, we must create two morphological specifications that are identical to each other in everything but the morphology name. The two new morphological types added to the morphology.erl are shown in Listing-18.3.

Listing-18.3 The new prey and predator morphological specifications.

predator(actuators)->
	prey(actuators);
%The predator morphology uses the same set of actuators as the prey, thus it simply calls the prey function to get the list of actuators available.
predator(sensors)->
	prey(sensors).
%The predator morphology uses the same set of sensors as the prey, thus it simply calls the prey function to get the list of sensors available.

prey(actuators)->
	Movement = [#actuator{name=differential_drive,type=standard,scape={public,flatland},
		vl=2, parameters=[2]}],
	Movement;
prey(sensors)->
	Pi = math:pi(),
	Color_Scanners = [#sensor{name=color_scanner,type=standard,scape={public,flatland},
		vl=Density, parameters=[Spread,Density,ROffset]} ||
		Spread <-[Pi/2], Density <-[5], ROffset <-[Pi*0/2]],
	Range_Scanners = [#sensor{name=range_scanner,type=standard,scape={public,flatland},
		vl=Density, parameters=[Spread,Density,ROffset]} ||
		Spread <-[Pi/2], Density <-[5], ROffset <-[Pi*0/2]],
	Color_Scanners++Range_Scanners.
%The prey morphology has access to two types of sensors at this time, the color and the range scanner. The spread, density (resolution), and radial offset parameters (the direction in which the mounted simulated scanner is pointing) can be modified. Thus instead of simply 2 sensors, the Density list can be set to: [5,10,20,50,100], and we would have 10 sensors in total, each one differing in the resolution of the sensor. The radial offsets could further be modified in a similar fashion.

Because the predator and prey morphologies use the same set of sensors and actuators, and differ only in their avatar representations and the privileges allotted to their avatars, we can let the predator morphology simply call the prey function with the sensors and actuators parameters, to retrieve the provided sensors and actuators available. We can generate multiple sensors by changing the density (resolution) list to contain more than a single value, as shown in the sketch below. In this manner we would allow evolution to decide what is the most optimal and efficient resolution for the sensors within the environment.
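A hedged variant of the prey sensor specification in which evolution can choose among multiple scanner resolutions; the particular density values are illustrative:

prey(sensors)->
	Pi = math:pi(),
	Color_Scanners = [#sensor{name=color_scanner,type=standard,scape={public,flatland},
		vl=Density, parameters=[Spread,Density,ROffset]} ||
		Spread <-[Pi/2], Density <-[5,10,20,50,100], ROffset <-[Pi*0/2]],
	Range_Scanners = [#sensor{name=range_scanner,type=standard,scape={public,flatland},
		vl=Density, parameters=[Spread,Density,ROffset]} ||
		Spread <-[Pi/2], Density <-[5,10,20,50,100], ROffset <-[Pi*0/2]],
	Color_Scanners++Range_Scanners.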

With the sensors, actuators, and the morphology modules updated, we now make a small update to the exoself module in the next section.

18.5 Updating the exoself Module

We've now constructed the necessary tools with which the agent can interface with the public scape, but we still need a way for the agent to actually register with the public scape in question. In the function spawn_Scapes/4 the exoself extracts unique scape names, and then from this list of unique scape names, the exoself extracts a list of private scapes. The private scapes are then spawned for the agent. We now also need to extract the public scapes from the unique list of scape names, and then for each such public scape contact the polis process to request its PId, and then finally register with that public scape. To accomplish this, we modify the spawn_Scapes/4 function, as shown in the following listing, with the new functionality being the call to, and the implementation of, the enter_PublicScape/4 function.

Listing-18.4 The updated spawn_Scapes/4 function.

spawn_Scapes(IdsNPIds,Sensor_Ids,Actuator_Ids,Agent_Id)->
	Sensor_Scapes = [(genotype:dirty_read({sensor,Id}))#sensor.scape || Id<-Sensor_Ids],
	Actuator_Scapes = [(genotype:dirty_read({actuator,Id}))#actuator.scape || Id<-Actuator_Ids],
	Unique_Scapes = Sensor_Scapes++(Actuator_Scapes--Sensor_Scapes),
	Private_SN_Tuples=[{scape:gen(self(),node()),ScapeName} || {private,ScapeName}<-Unique_Scapes],
	[ets:insert(IdsNPIds,{ScapeName,PId}) || {PId,ScapeName} <- Private_SN_Tuples],
	[ets:insert(IdsNPIds,{PId,ScapeName}) || {PId,ScapeName} <- Private_SN_Tuples],
	[PId ! {self(),ScapeName} || {PId,ScapeName} <- Private_SN_Tuples],
	enter_PublicScape(IdsNPIds,Sensor_Ids,Actuator_Ids,Agent_Id),
	[PId || {PId,_ScapeName} <- Private_SN_Tuples].

enter_PublicScape(IdsNPIds,Sensor_Ids,Actuator_Ids,Agent_Id)->
	A = genotype:dirty_read({agent,Agent_Id}),
	Sensors = [genotype:dirty_read({sensor,Id}) || Id<-Sensor_Ids],
	Actuators = [genotype:dirty_read({actuator,Id}) || Id<-Actuator_Ids],
	TotNeurons = length((genotype:dirty_read({cortex,A#agent.cx_id}))#cortex.neuron_ids),
	Morphology = (A#agent.constraint)#constraint.morphology,
	Sensor_Scapes = [Sensor#sensor.scape || Sensor<-Sensors],
	Actuator_Scapes = [Actuator#actuator.scape || Actuator<-Actuators],
	Unique_Scapes = Sensor_Scapes++(Actuator_Scapes--Sensor_Scapes),
	Public_SN_Tuples=[{gen_server:call(polis,{get_scape,ScapeName}),ScapeName} ||
		{public,ScapeName}<-Unique_Scapes],
	[gen_server:call(PId,{enter,Morphology,Agent_Id,Sensors,Actuators,TotNeurons}) ||
		{PId,ScapeName} <- Public_SN_Tuples].

The modification in the above listing allows the exoself to extract not only the private scapes but also the public scapes, and then enter them. Of course the agent will get booted from the public scape every time its avatar perishes, and every time the agent's evaluation ends, the exoself receives the message: {Cx_PId,evaluation_completed,Fitness,Cycles,Time,GoalReachedFlag}, after which the exoself decides whether to continue or end training. If the exoself decides to continue training, and thus perform another evaluation, we can choose to re-enter the public scape by executing the function:

enter_PublicScape(S#state.idsNpids,
	[genotype:dirty_read({sensor,Id})||Id<-S#state.spids],
	[genotype:dirty_read({actuator,Id})||Id<-S#state.apids],
	S#state.agent_id),

Though verbose, this call allows us to modify nothing else within the exoself at this time. By executing this function, the exoself again finds the PIds of the needed public scapes, and re-requests an entry. There is no need to re-join the private scapes, since those are spawned by the exoself, and the sensors and actuators always have access to them, until the agent terminates, at which point the exoself terminates all the processes, including the private scapes.

With this modification, our system is now ready to interface with the public scapes, and be applied to the ALife simulations. We could further modify the exoself module, and add to the exoself's state record the elements: sensors, actuators, morphology, and public_scapes, which would allow us to then create a specialized function for re-entering public scapes by executing a reenter_PublicScape/6 function as follows:

reenter_PublicScape(S#state.public_scapes,S#state.sensors,S#state.actuators,
	S#state.agent_id,S#state.morphology,length(S#state.nids)),

This function could then have a much more streamlined implementation, as shown next:

reenter_PublicScape([PS_PId|PS_PIds],Sensors,Actuators,Agent_Id,Morphology,TotNeurons)->
	gen_server:call(PS_PId,{enter,Morphology,Agent_Id,Sensors,Actuators,TotNeurons}),
	reenter_PublicScape(PS_PIds,Sensors,Actuators,Agent_Id,Morphology,TotNeurons);
reenter_PublicScape([],_Sensors,_Actuators,_Agent_Id,_Morphology,_TotNeurons)->
	ok.

18.6 The Simulation and Results

The updated implementation of our neuroevolutionary system, with the flatland integrated and implemented as discussed above, can be found in the supplementary material [8] for Chapter-18. Now that we are to run the ALife simulation, the briefly discussed visor module, which allows us to visualize the scape and the moving avatars within, is finally of vital use. When running the simulation without the visor, we would only see the fitness of the agents, and a few other printouts to console as the agents consumed each other (predator consuming prey), and the plants (prey consuming the plants). A sample of such a printout is shown in Listing-18.5.

Listing-18.5 A sample console printout of the Predator Vs. Prey simulation.

Tot_Evaluations:150
Avatar:<0.827.0> destroyed.
Avatar:<0.827.0> destroyed.
Avatar:<0.851.0> died at age:5082
Avatar:<0.835.0> died at age:3254
Avatar:<0.708.0> died at age:3058
Creating Flatlander
Avatar:<0.811.0> destroyed.
Avatar:<0.801.0> died at age:4511
Avatar:<0.819.0> destroyed.
Avatar:<0.859.0> died at age:3088
Avatar:<0.700.0> died at age:3819
Creating Flatlander
Avatar:<0.757.0> died at age:3000
Creating Flatlander
Avatar:<0.827.0> destroyed.
Avatar:<0.767.0> died at age:3903
Creating Flatlander
Avatar:<0.819.0> destroyed.
Avatar:<0.716.0> died at age:2779
Creating Flatlander
Avatar:<0.784.0> died at age:3079
Creating Flatlander
Avatar:<0.851.0> destroyed.
Avatar:<0.843.0> died at age:2468


But when we execute the population_monitor:start() function to create the population of prey or predators, or both, and then execute: visor:start(flatland), we will be able to observe the 2d world from above, in the same way that the sphere was able to observe the flatland from its third dimension in the book Flatland.

For the next part where we test our TWEANN system on the ALife simulation, I will assume that you have downloaded the source code for Chapter-18, with the new scape/flatland/world/visor libraries, with which you will be able to execute the same commands I will use in the following subsections to test our TWEANN's performance in this advanced application.

18.6.1 Simple Food Gathering Simulation

There are different types of ALife simulations that we could run at this time. We could run the simple Food Gathering Simulation, in which we set up the flatland scape to only spawn the plants, and then we set the population_monitor process to only spawn the prey. This would result in a scape with a renewable food source, plants, populated by prey agents learning to navigate through it and eat the plants as quickly as they can.

Assuming you have the newly added modules (flatland and world) present in the supplementary material, and you have opened the world.erl module, we first must ensure that only the plants are spawned in the flatland simulation. Thus in the init/3 function, with the World_Type == flatland, we set the case clause as follows:

init(World_Type,Physics,Metabolics)->
	XMin = -5000,
	XMax = 5000,
	YMin = -5000,
	YMax = 5000,
	WorldPivot = {(XMin+XMax)/2,(YMin+YMax)/2},
	World_Border = [{XMin,XMax},{YMin,YMax}],
	case World_Type of
		flatland ->
			Plants=[scape:create_avatar(plant,plant,gen_id(),{undefined,
				scape:return_valid(Rocks++FirePits)},respawn,Metabolics)||_<-lists:duplicate(10,1)],
			Poisons=[scape:create_avatar(poison,poison,gen_id(),{undefined,
				scape:return_valid(Rocks++FirePits)},respawn,Metabolics)||_<-lists:duplicate(10,1)],
			Plants
	end.


This ensures that the polis spawns the flatland scape with only the plants present. We next define INIT_CONSTRAINTS within the population_monitor module as follows:

-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology,
	connection_architecture=CA, population_evo_alg_f=steady_state,
	agent_encoding_types=[neural]} || Morphology<-[flatland],CA<-[recurrent]]).

We also set the flatland module to allow the agents to live for a maximum age of 20000 cycles, to retain at most 10000 energy, and for the fitness function to be: 0.001*CyclesAlive + PlantsEaten. This ensures that the fitness guides the evolution towards longevity, but also towards agents able to navigate the world as effectively as possible, and eat as many plants as they can. Furthermore, we set the plants to provide the agent with 500 energy points when eaten, and poison to subtract 2000 energy points when eaten.
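In code form, these settings amount to something like the following minimal sketch; the macro and function names are illustrative, not the ones used in the actual flatland module:

-define(MAX_AGE,20000).
-define(MAX_ENERGY,10000).
-define(PLANT_ENERGY,500).
-define(POISON_ENERGY,-2000).

fitness(CyclesAlive,PlantsEaten)->
	0.001*CyclesAlive + PlantsEaten.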

With this set, we execute polis:sync() to compile the two modified modules, and then execute the population_monitor:start() function to run the ALife simulation. Finally, to observe the simulation, we also execute visor:start(). A video of one of the evolutionary runs of this simulation is available at [9,10].
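The full console session might look as follows, assuming the polis is started first, as in the previous chapters (the exact startup call, shown here as polis:start(), is an assumption of this sketch):

1> polis:start().
2> polis:sync().
3> population_monitor:start().
4> visor:start().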

We can set the termination condition to 25000 evaluations in the population monitor, and compose a trace that we could later graph to see how the fitness, the NN sizes of the evolving agents, and their diversity, change over time. In fact, using our benchmarker module, we could even perform multiple evolutionary runs to compose a graph of Fitness Vs. Evaluations, NN Size Vs. Evaluations, and Population Diversity Vs. Evaluations. The following figures show the results of multiple benchmarks, where the populations were started with different constraints. I set up the constraints for the 4 performed benchmarks as follows:

1. A simulation where the population size was set to 10, and the seed agents were started with 2 sensors from the very start, each agent started with the range and the color sensors.

2. A simulation where the population size was set to 10, but the seed agents started with only the range sensor, and had to evolve the connection to the color sensor over time.

3. A simulation where the population size was set to 20, and the seed agents were started with both, color and range sensors.

4. A simulation where the population size was set to 20, but the seed agents started with only the range sensor.


Fig. 18.4 Fitness Vs. Evaluations.

We can see from Fig-18.4 that in all scenarios the agents increased in fitness; it did not matter whether the seed population was started with just the range sensor, or with both the range and the color sensor. Our system was successfully able to evolve connections to the color sensor for the agents which did not already have such a morphological feature. Also as expected, the two simulations where the population limit was set to 10, were able to achieve a slightly higher fitness faster. This occurred because the agents in the population of 10 had less competition for food than those in a population of 20. And of course when the agents started with both range and color sensors, they achieved higher fitness faster than those which started with just the range sensor, since those starting with just the range sensor had to take time to evolve color vision.


Fig. 18.5 NN Size Vs. Evaluations.

The above figure shows how the size of the NNs increased during the evolutionary runs. More complex NNs were evolved over time, as the NNs evolved better navigational capabilities. What is interesting to note is that the NNs evolved in the populations where the population size was set to 20, were able to achieve the same fitness but with smaller NNs. I think that this is the case because with a larger population, more exploration of the various genotypes can be performed, and thus more efficient NNs can be evolved.


Fig. 18.6 Population Diversity Vs. Evaluations.

Finally, the above Fig-18.6 presents the diversity of the population plotted against evaluations. It is remarkable just how diverse the populations were. Diversity is calculated for the active population and the agents in the dead_pool, hence the ranges being between 0-20 and 0-40, rather than 0-10 and 0-20 agents. On every occasion, nearly 75% of the population was composed of agents which were all different from one another. We can also see that the diversity never dropped, yet as we saw from the fitness plot, the fitness did rapidly increase. This is just one of the great features of using the memetic approach to neuroevolution.

18.6.2 Dangerous Food Gathering Simulation

We can also set up the flatland scape to have not only the plants, but also the poison representing avatars. The poison avatars are just like plants, but when they are consumed, they decrease the prey's energy instead of increasing it: they decrease the agent's energy by 2000 rather than increasing it by 500. When the avatar's energy reaches 0, it dies. This type of simulation, if we were to execute it with just the prey population, would result in the Dangerous Food Gathering Simulation, because the agents must now move around the 2d world, gather the plants, and avoid the poison.

To perform the Dangerous Food Gathering Simulation, we can leave the population_monitor module alone, or increase the termination condition to 50000 evaluations, as I've done during my experiments, and then modify the world.erl file, setting the flatland World_Type to use 10 Plants and 10 Poisons, all of which respawn immediately after being consumed. The source code for this setup looks as follows:

case World_Type of
	flatland ->
		Plants=[scape:create_avatar(plant,plant,gen_id(),{undefined,
			scape:return_valid(Rocks++FirePits)},respawn,Metabolics)||_<-lists:duplicate(10,1)],
		Poisons=[scape:create_avatar(poison,poison,gen_id(),{undefined,
			scape:return_valid(Rocks++FirePits)},respawn,Metabolics)||_<-lists:duplicate(10,1)],
		Plants ++ Poisons
end.

As in the Simple Food Gathering Simulation, 4 benchmarks were performed with the previously specified constraints, and the resulting plots of Fitness Vs. Evaluations, NN Size Vs. Evaluations, and Diversity Vs. Evaluations were composed. The plots of these benchmark scenarios are shown in the following figures, and the recorded videos of the simulation can be found at [11,12].

Fig. 18.7 Average Fitness Vs. Evaluations.

Similarly to the Simple Food Gathering Simulation (SFGS), the above figure shows the plots of average fitness for the 4 constraint scenarios. In this simulation color plays a much more important role, because the inability to differentiate between colors results in one not being able to tell the difference between plants and poison. On top of this, because the agents can push each other in the 2d environment, when we set the population limit to 20, there are enough of them that they will inadvertently push each other on top of poison locations. Thus we can see that the benchmark in which we set the population size to 20 and started the seed agents with just the range sensors, results in the lowest performance. Interestingly enough, the benchmark of the seed population which started with only the range sensor, but whose population size limit was set to 10, performed the best. But in general, in all benchmarks the agents learned how to navigate through the 2d world, eat plants, and avoid poison. In all scenarios, though initially the agents started off wandering aimlessly, over time the agents first learned to move towards the plants and poisons, then mostly towards the plants only, and finally, in the case of the champions of the population, to swiftly navigate through the poison ridden landscape while eating plants.

Fig. 18.8 NN Size Vs. Evaluations.

Similarly to the SFGS, the NN size grows as the NNs evolve more complex behaviors, and adapt to the environment. The environment in this simulation is more complex, and thus unlike in the previous simulation, the agents themselves are more complex, reaching the size of 19 neurons. Also as in the SFGS, benchmarks conducted when the population size limit was set to 20, produce more concise NN based agents, composed of fewer neurons.


Fig. 18.9 Population Diversity Vs. Evaluations.

The diversity plot in the Dangerous Food Gathering Simulation (DFGS) is similar to the one in the SFGS; in both cases nearly 75% of the population is composed of agents different from one another. The diversity is maintained throughout the benchmark, there is no decline, only a sharp increase in diversity at the very start, followed by a stable maintenance of the high diversity profile.

18.6.3 Predator Vs. Prey Simulation

Finally, we can again start the polis with a flatland which initially spawns only the plants within itself, but this time when starting the population monitor, we set the constraints to start with two morphologies, prey and predator. Because the prey can only consume plants for sustenance, and the predators can only consume the prey for sustenance, absorbing the energy of the prey into themselves, this results in the Predator Vs. Prey Simulation (PPS). Furthermore, we set the simulation such that the predators can push the plants around when their energy reserve is above 1000.
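The two-morphology setup can be specified through INIT_CONSTRAINTS in the same way as before; the following is a hedged sketch, assuming the prey and predator morphology names defined earlier in the morphology module:

-define(INIT_CONSTRAINTS,[#constraint{morphology=Morphology,
	connection_architecture=CA, population_evo_alg_f=steady_state,
	agent_encoding_types=[neural]} || Morphology<-[prey,predator],CA<-[recurrent]]).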

In this simulation the prey agents learn how to navigate the 2d world, and eat the plants while avoiding the predators. At the same time, the predators learn to navigate the 2d world, and hunt the prey. Thus the two species co-evolve, learn how to evade and hunt, improve their strategies over time, and learn some very clever trapping and baiting methods (in the case of the predators), as will be discussed and shown next.

Because the evolutionary paths of the two species were so dependent on which of the species learned to navigate, evade, and hunt first, the averages of the evolutionary runs were meaningless. This is due to the fact that when the prey learned to navigate through the flatland and eat the plants before the predators learned how to hunt them efficiently, the prey were able to achieve high fitness scores, while the predators did not do as well. On the other hand, during the evolutionary runs in which the predator species was able to evolve agents which could navigate and hunt before the prey evolved agents which could evade the predators, the predators achieved high fitness, and the prey did not do as well. Thus, instead of creating the averages, I chose to plot the results of a single such evolutionary run.

Because the flatland will now be populated by a population of size 10 of prey, and a population of size 10 of predators, we will be able to see in the plots the interaction and correlations between the two competing species. The next 4 plots are of Fitness Vs. Evaluations, NN Size Vs. Evaluations, Diversity Vs. Evaluations, and the agent Turnover (death rate) Vs. Evaluations. Because we calculate the species statistics every 500 evaluations, the Turnover Vs. Evaluations plot shows the death rate of the particular species in relation to the other for those 500 evaluations. Thus for example, if both species survive for an equal amount of time, both will have a turnover of 250. On the other hand, if it's an open hunting season on prey, and the predators are just running around eating the prey, the prey will have a very high turnover, while the predators will live for a much larger number of cycles, and thus have a much lower turnover. The recorded videos of the evolutionary run are shown in [13,14].


Fig. 18.10 Average Fitness Vs. Evaluations. (Vertical axis: Avg. Fitness, 0-35; horizontal axis: Evaluations, 0-90000; legend: P-10: Prey, P-10: Predator.)

The average fitness for the prey drops dramatically with predators around. In almost every simulation, the predators eventually learned to navigate effectively, and attack the prey that passed nearby. Because the prey had a slightly higher maximum speed than the predators, the predators eventually evolved to only briefly chase the prey. If the prey moved away too quickly, the predators would move back, closer towards the plants. This too was a very interesting, and highly organic adaptation. The predators would horde around the plants, because the prey at some point or another would have to go towards the plants to survive. Finally, the most interesting and complex behavior was trapping, as shown in [13]. In this evolved behavior, the agents push the plants around, and seem to be hiding behind them. Because the ray casting based sensors used by the prey only see the plants, the prey would go towards the plants, and as soon as they would consume the plant, the predators behind that plant would eat the prey, and thus consume its energy, and the energy it just gained from eating the plant. This was a very clever ambushing behavior, and one of the most complex I've seen evolved in any ALife simulation. It is difficult to imagine a more complex behavior evolving in such a simple and barren 2d environment.



Fig. 18.11 NN Size Vs. Evaluations. (Vertical axis: Avg. # of Neurons/Organism, 0-18; horizontal axis: Evaluations, 0-90000; legend: P-10: Prey, P-10: Predator.)

As seen in Fig-18.11, the predators, though possessing complex behaviors, ended up with much smaller neural networks than the prey. I suspect that this occurred because of the high turnover of the prey, and their need to deal with moving and dynamic predators. Then again, the behaviors evolved by the predators would make one think that they would require more neurons to execute such maneuvers. The evolution within this environment must have found an efficient way to encode such behavior. The prey most likely increased in their NN size because they were getting killed too quickly, and so evolution had a difficult time optimizing their topologies, due to not having enough time to work with them. Thus the selection pressure was most likely compromised, and so the prey species' NN size increased at a higher rate than is optimal.



Fig. 18.12 Population Diversity Vs. Evaluations.

Figure-18.12 shows that, as in the previous simulations, the population diversity within the Predator Vs. Prey Simulation is similarly high. Almost every agent was different from every other agent in each species.

Fig. 18.13 Population Turnover Vs. Evaluations.

As expected from a simulation where one species completely dominates another, being able to hunt and kill it while surviving on the energy gained from consuming it, the turnover of the predators is much lower than that of the prey. As seen in Fig-18.13, though both species start off equally, the predators quickly learn to hunt the prey, and thus the prey's turnover increases while the predator's decreases. After 30000 evaluations the turnover for both species reaches an equilibrium. The predators maintain a turnover of about 150 evaluations per 500 evaluations. Complementarily, the prey stay within a turnover average of 350 evaluations per every 500 evaluations.

18.7 Discussion


In all three simulations we have seen the agents evolve in complexity, behavior, and fitness. In the Simple Food Gathering Simulation (SFGS), the agents learned to navigate the 2d flatland and gather the plants. In the more complex version, the Dangerous Food Gathering Simulation (DFGS), the agents learned to navigate the flatland, eat the plants, and avoid poison. Finally, in the Predator Vs. Prey Simulation (PPS), we evolved two species, that of prey and predator. The prey evolved to navigate the 2d world, gather food, and avoid being eaten by the predators. The predators evolved to navigate the 2d world and hunt the prey. All the simulations were successful, and the videos of the simulations are available online for viewing [9-14].

The most impressive result of our ALife experiment was the evolved ambushing behavior of the predator species. This behavior, shown in [13], has a very organic nature to it. It is difficult to come up with a more clever approach to hunting prey in the barren flatland. Amongst all the other ALife simulations I have come across, the behavior evolved in Flatland by the TWEANN system we developed here seems to be one of the more complex. Thus, we have shown that our TWEANN can indeed produce advanced behavior, and evolve complex NN structures.

But even the 9 (4 variations of SFGS, 4 variations of DFGS, and 1 type of PPS) different ALife experiments performed in this chapter do not even begin to scratch the surface of this field. The flatland could be further advanced, and we could allow the plants to grow, increase in energy, decay, produce seeds... We could make the environment more dynamic and natural for the inhabiting prey and predators. We could allow the prey and predators to evolve new and other sensors and actuators, perhaps projectile weapon actuators, and new kinds of sensors, for example a simulation of a chemical sensor, a sense of smell, which could simply gauge the proximity of the prey or predator, but not the direction. Numerous other sensors and actuators can be built, including those used for communication. It would be interesting to see what other behaviors would evolve, given an environment complex enough to support them.


Even better, we could interface our TWEANN system to a 3d robot simulator like Gazebo [15], and evolve agents in a much more complex and dynamic world. The higher the granularity at which the simulated environment approximates the natural world, the more complex the organisms that we can evolve in it, and the more intelligent the agents evolved will be, due to the need to deal with the more dynamic environment, which can be modified by the agents, and require them to be modified in order to deal with the modified environment...

Though in this chapter we did not get the chance to perform the same experiments with substrate encoded NNs, on my own I did perform such evolutionary runs, and they were just as successful. The only interesting difference I noticed was that at the very beginning, the substrate encoded NN based agents tended to go in straight lines and turn at sharp angles, unlike the standard, direct encoded NNs, which from the very start wandered through flatland in a fluent manner. Truly there are an infinite number of things to try and experiments to run when it comes to Artificial Life. And based on the types of behaviors evolved in our simulations, perhaps the Artificial part can be removed; after all, intelligence is intelligence is intelligence. It does not matter whether sentience and intelligence is based on the computations performed within the chemical computer of the soft and always decaying flesh, or within the analog and digital computer of the immaculate and perfect circuits of the non biological substrate.

18.8 Summary

In this chapter we extended our TWEANN system and constructed a public scape by the name flatland. The flatland scape provides avatars and an artificial environment to the interfacing NN based agents, and allows us to evolve 2d organisms which learn to navigate and live in the simulated 2d environment. We performed 3 ALife experiments: the Simple Food Gathering Simulation, the Dangerous Food Gathering Simulation, and the Predator Vs. Prey Simulation. In all three experiments our NN based agents evolved to navigate the flatland, gather food, avoid poison, and hunt and kill each other. The PPS simulation stood out in particular. In it, the predators evolved the behavior to ambush the prey, pushing the plants in front of themselves until the prey came near to eat the plant, at which point the predator ate the prey, consuming it and the energy it had gained from eating the plant.

Yet still there is much to explore when it comes to ALife. The flatland module can be much further extended. Due to the amount of non Neuroevolutionary background that was used in the creation of the 2d simulation, we did not go over every single line of code as we did in the previous chapters while developing our bleeding edge TWEANN. But the source code for the presented system is available as supplementary material in [8], and thus everything shown and presented can be replicated by using the Chapter-18 source code. Finally, the recorded videos of the simulations can be found in [9-14].

18.9 References


[1] Bedau, M. (2003). Artificial Life: Organization, Adaptation and Complexity From The Bottom Up. Trends in Cognitive Sciences 7, 505-512.

[2] Danaher PJ, Conroy DM, McColl-Kennedy JR (2007) Artificial Life Models in Software. Andrew Adamatzky and Maciej Komosinski (Eds.). (2005, Springer-Verlag.) Hardcover, 69.95, 344 pages, 189 illustrations. Journal of Service Research 13, 43-62, ISBN 9781848822849.

[3] Adamatzky A (2009). Artificial Life Models in Hardware. Springer, 280 pages, ISBN 9781848825291.

[4] Abbott, E. A. (2008). Flatland: A Romance of Many Dimensions (Oxford University Press).

[5] Flatland 2d robot and environment simulator: www.DXNNResearch.com/NeuroevolutionThroughErlang/Flatland

[6] Stage, a 2d environment and robot simulator: http://playerstage.sourceforge.net/index.php?src=stage

[7] de Berg M, van Kreveld M, Overmars M, Schwarzkopf O (2000) Computational Geometry. Springer-Verlag. ISBN 3540656200. Chapter 14: Quadtrees: pp. 291-306.

[8] Chapter-18 Supplementary material: www.DXNNResearch.com/NeuroevolutionThroughErlang/Chapter18

[9] Simple Food Gathering Simulation 1: http://www.youtube.com/watch?v=i0nCHMd5Oc8&feature=related

[10] Simple Food Gathering Simulation 2: http://www.youtube.com/watch?v=i0nCHMd5Oc8&feature=related

[11] Dangerous Food Gathering Simulation 1: http://www.youtube.com/watch?v=mZPCXZUEog8&feature=related

[12] Dangerous Food Gathering Simulation 2: http://www.youtube.com/watch?v=yOTEMhXbow&feature=related

[13] Predator Vs. Prey Simulation 1: http://www.youtube.com/watch?v=HzsDZt8EO70&feature=related

[14] Predator Vs. Prey Simulation 2: http://www.youtube.com/watch?v=s0_ghNq1hwQ&feature=related

Chapter 19 Evolving Currency Trading Agents

Abstract The application of Neural Networks to financial analysis in general, and currency trading in particular, has been explored for a number of years. The most commonly used NN training algorithm in this application is backpropagation [2,3,4,5]. The application of TWEANN systems to the same field is only now starting to emerge, and is showing a significant amount of potential. In this chapter we create a Forex simulator, and then use our neuroevolutionary system to evolve automated currency trading agents. For this application we will utilize not only the standard sliding window approach when feeding the sensory signals to the neural encoded agents, but also the sliding chart window, where we feed the evolved substrate encoded agents the actual candle-stick price charts, and then compare the performance of the two approaches. As of this writing, the use of geometrical pattern sensitive NN based agents in the analysis of financial instrument charts has not yet been explored in any other paper, to this author's knowledge. Thus in this chapter we pioneer this approach, and explore its performance and properties.

This particular chapter is based on a paper which I have recently submitted for publication, in which I present this new approach, and compare the interesting results achieved. Thus, in this chapter we are indeed pioneering the approach of performing geometrical analysis of the actual charts of the financial instruments through substrate encoded NN based agents.

Because of this chapter's similarity to the noted paper, quite a few sections will be very similar to it, with some of the sections and paragraphs quoted from it. This will be primarily the case with the results, encoding method, and the introduction to the forex sections, whose content is quoted to a significant extent from the paper. But unlike the readers of the above mentioned paper, you have an intricate understanding of how our system works, why the results are the way they are, how exactly the simulation and interfacing are performed, and the manner in which the used agents were evolved, because you and I have built this new version of DXNN together.

Foreign exchange (also known as Forex, or FX) is a global and decentralized financial market for currency trading. It is the largest financial market, with a daily turnover of 4 trillion US dollars. The spot market, specializing in the immediate exchange of currencies, comprises almost 40% of all FX transactions, 1.5 trillion dollars daily. Because the foreign exchange market is open 24 hours a day, closing only for the weekend, and because of the enormous daily volume, there are no sudden interday price changes, and there are no lags in the market, unlike in the stock market. In this chapter we discuss and implement the first of its kind (to my knowledge) topology and weight evolving artificial neural network (TWEANN) algorithm which evolves geometry-pattern sensitive trading agents that use the actual technical indicator charts (the actual graphs) as input. Once we have discussed and developed the said application, we will compare the SENN based traders which use Price Chart Input (PCI), to the standard, direct encoded NN based trading agents which use Price List Input (PLI), in which the time series of closing prices is encoded as a list of said prices and/or other technical indicators. Our goal in this chapter is to implement and test the utility of using graphical input of the time series, the use of candle-stick style charts as direct input to the geometry sensitive NN systems evolved using our TWEANN, and then to benchmark the performance. To accomplish this, we will build the forex simulator, the new interfaces (sensors/actuators), and all the needed new features to make our TWEANN work within this field, and in general be usable in time series analysis problems not related to finance (such as earthquake data analysis, or frequency analysis...).

One of the two ways to use machine learning in the financial market is as follows: we can use the agents to predict the future price of a financial instrument, and then based on the prediction trade the said instrument ourselves, or we can have the agent trade the financial instrument autonomously, without us in the loop. We will implement and test the second approach. Neural networks have shown time and time again [7,8,9,10,11,12,13,14] that due to their highly robust nature and universal function approximation qualities, they fit well in the application to financial market analysis. In published literature though [15,17,18,19,20], the most commonly used neural network learning algorithm is backpropagation. Backpropagation, being a local optimization algorithm, can and does at times get stuck in local optima. Furthermore, it is usually necessary for the researcher to set up the NN topology beforehand, and since the knowledge of what type of NN topology works best for which dataset and market is very difficult, or even impossible, to deduce, one usually has to randomly create NN topologies and then try them out before settling down on some particular system. TWEANN systems are relatively new, and they have not yet been tested thoroughly in financial markets. But because it is exactly these types of systems that can evolve not only synaptic weights, but also the NN topologies, and thus perform a robust global search, the use of such a TWEANN system in evolving NN based traders is exactly the concern of this chapter.

19.1 Introduction to Forex

The foreign exchange market, or Forex, is a global, fully distributed, currency trading financial market. Unlike the stock market, where a single buyer or seller with enough capital can dramatically change the price of the stock, the forex market is much too vast and distributed for any currency pair to be so easily affected by a single trader. Furthermore, because currencies can be traded nonstop, 24 hours a day, 5 days a week, there are a lot fewer spaces in the data stream where news might be aggregating but no technical data is available. Because of these factors, there is a greater chance that the pricing data does indeed represent the incorporated news and fundamental factors, which might thus allow for prediction and trend finding through the use of machine learning approaches.

The question of predicting future market prices of a stock, or currency pairs as is the case here, has been a controversial one in general, and especially so when using machine learning to do so. There are two main market hypotheses which state that such predictions should be impossible. These two market hypotheses are the Efficient Market Hypothesis (EMH), and the Random Walk Theory (RWT).

The EMH states that the prices fully reflect all the available information, and that all new information is instantly absorbed into the price; thus it is impossible to make profits in the market since the prices already reflect the true price of the traded financial instrument. The RWT on the other hand states that historical data has no effect on pricing, and that the future price of a financial instrument is completely random, independent of the past, and thus it cannot be predicted from it. Yet we know that profit is made by the financial institutions and independent traders in the existing markets, and that not every individual and institution participating in the trading of a financial instrument has all the available information immediately at their disposal when making those trades. Thus it cannot be true that EMH and RWT fully apply in a non ideal system representing the real world markets. Therefore, with a smart enough system, some level of prediction above a mere coin toss is possible.

There are two general approaches to market speculation, the technical and the fundamental. Technical analysis is based on the hypothesis that all reactions of the market to all the news are contained within the price of the financial instrument. Thus past prices can be studied for trends, and used to make predictions of future prices, due to the price data containing all the needed information about the market and the news that drive it. The fundamental analysis group on the other hand concentrates on news and events. The fundamental analyst peruses the news which drives the prices; he analyzes supply & demand, and other factors, with the general goal of acting on this information before others do, and before the news is incorporated into the price. In general of course, almost every trader uses a combination of both, with the best results being achieved when both of these analysis approaches are combined. Nevertheless, in this chapter our NN systems will primarily concentrate on the raw closing price data. Though in the future, the use of neuroevolution for news mining is a definite possibility, and research in this area is already in the works.


19.2 Trading and the Objective

If you were to decide on trading currency, you'd first need to find a financial service provider. Just like with the stock market, the financial institution you'd choose would provide you with access to the financial data, where the delay between the true current trading price and the one you see would depend on the account you have. Pricier accounts with pricier brokers could provide lower lags and more accurate data. Whichever institution is chosen, and whatever account you may have decided to get with the broker, your broker would then provide you with different ways of trading currencies through them. They could simply provide you with the IP address of their servers, your password, login, and an API, the format with which to request the financial data and send the server commands to trade the currency pairs. Or they could provide you with a web based interface, where you'd be able to see your balance, the plotted charts, have the ability to close, open and hold long and short positions, and use various plotting tools and technical indicators to try and extract or recognize various geometrical and financial patterns within the data. But web based interfaces are usually a bit slower than many like, and so many traders opt for software they can install on their desktop. Among such trading interface programs is one that is offered by a lot of brokers; it is called MetaTrader4 (MT4), and more recently the MetaTrader5 (MT5). Both are very similar, they differ mostly in the script language offered. The MT5 offers a slightly more updated version of the embedded scripting language, which resembles C/C++. For our current conversation it does not matter whether the broker offers MT4 or MT5, and so for the sake of argument we simply assume it is MT5. Fig-19.1 shows an example of the MetaTrader interface.

Fig. 19.1 The MetaTrader Interface.


The broker provides you with your login and password, and the IP of their server. At this point you'd install their trading platform, and then immediately be able to see the current currency pair exchange rates for various pairs. You would also have access to historical financial data. Finally, the trading platform would also usually offer an incorporated news service from one of the news providers; it would provide plotting tools, various technical indicators, built in technical analysis graphs, and different ways to represent the currency pair plots. One of the most popular charts is the candlesticks chart, an example of which is shown in Fig-19.2. This type of plot presents the current price, the highest price achieved, and the lowest price achieved, during the chosen time intervals/steps/ticks, all in one chart. In the below figure, the candlesticks style plot uses 15 minute ticks (the change in pricing is calculated for 15 minute intervals).

Fig. 19.2 Candlesticks chart using 15min ticks.

Though at any given time you are given the exchange rate of the two currencies you wish to exchange against each other, the broker must also make some profit. To do this, unlike in the stock market where a fee is charged, in the FX market the price is given with a spread (the difference between the bidding price and the asking price) which incorporates the broker's fee. When you open a long position, buying USD against EUR, you do so at a slightly higher price than the true market exchange rate. When you go short, by for example selling USD against EUR, you do so at a slightly lower price than the true market value. So, if immediately after you make a trade you try to trade back, you will sustain at least the loss associated with the spread. Thus, the lower the spread offered by the financial service provider, the greater the profit you can reach, and the more accurate the currency exchange price you are observing.
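As a toy illustration of the spread cost (the numbers here are made up for the example): if the EUR/USD bid is 1.3500 and the ask is 1.3502, then buying and immediately selling sustains a loss of 0.0002 per unit traded:

spread_cost(Bid,Ask,Units)->
	(Ask - Bid)*Units. %E.g. spread_cost(1.3500,1.3502,10000) -> ~2.0 units of the quote currency.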

Given this data, these tools, and financial instrument charts, the goal of a forex trader is to opportunistically exchange two currencies, such that when he trades them back to close the position, he is left with a profit. The technical analyst believes there are patterns and trends in the financial market, and therefore in the charts observed, and that you can at least to some degree exploit these patterns and trends to make a profit. On the other hand, the fundamental analyst does not care about the patterns; he pays attention to the news, and tries to act on it before everyone else does, he tries to figure out and project what the news might mean with regards to the global economic situation and how it will affect the worth of currencies.

At this point it might seem as if the fundamental analyst is on the right track, and that there is a much lower level of predictive power that you can extract from prices alone. But consider the situation where a significant number of institutions and individuals exist that believe in technical analysis, and therefore a lot of them trade by the rules and techniques prescribed by technical analysis. Thus, if some particular chart pattern predicts that the price should rise, a lot of individuals and organizations will buy... and the price will rise. If another pattern predicts the price will fall, then those who believe in such patterns will sell one currency against another, and the price will fall. We have seen what such panic and group think leads to, as we sustained a number of market crashes during the last decade. Technical analysis might or might not provide a significant advantage. But the fact that so many people believe in it, and trade based on its rules, means that they will contribute to these self fulfilling prophecies. So, because they believe in these patterns, because they trade based on these patterns, there will be, at least to some degree, such patterns in the signal, and so we can exploit them if our system can see the patterns and act on them faster and better than the other traders. If only our system too could extract such geometrical regularities from the charts.

Neural Networks have seen a lot of use and success in the financial market. One of the main strengths of NN systems, which makes them so popular as market predictors, is that they are naturally non linear, and can learn non linear data correlations and mappings. Artificial neural networks are also data driven, can be trained online, are adaptive and can be easily retrained when the markets shift, and finally, they deal well with data that has some errors; neural networks are robust.

When traders look at the financial data plots, they do not usually look just at raw price lists; when a trader performs a time series analysis he instead looks at the chart patterns. This is especially the case when dealing with a trader prescribing to the technical analysis approach. The technical analyst uses the various technical indicators to look for patterns and emerging trends in these charts. There are many recurring patterns within the charts, some of which have even been given names due to their common appearance, like for example the “head and shoulders” pattern, in which the time series has 3 hills, resembling head and shoulders. Other such patterns are the “cup and handle”, the “double tops and bottoms”, the “triangles”... Each of these geometrical patterns has a meaning to a trader, and is used by the trader to make predictions about the market. Whether these patterns really do have a meaning or not, is under debate. It is possible that the fact that so many traders do use these techniques, results in a self fulfilling prophecy, where a large number of the traders act similarly when encountering similar geometrical chart patterns, thus making the patterns and their consequences a reality. This also means that if we can evolve an agent which can respond to such patterns faster than the other traders, then we will be able to exploit this type of market behavior.

The standard neural networks used for price prediction, trend prediction, or automated trading primarily use the sliding window approach, as shown in Fig-19.3, where the data is fed as a vector, a price list, to the NN. This vector, whether it holds only the historical price data or also various other technical indicators, does not show the existing geometrical chart patterns which are used by the traders. If the NN does not have direct access to the geometrical patterns used by human traders, it is at a disadvantage, because it does not have all the information on which the other traders base their decisions.

Fig. 19.3 A standard price list input based currency trading agent.
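To make the sliding window encoding concrete, the following minimal sketch (the function name is ours, not part of the system's modules) produces every length-N window over a list of closing prices; each window is one input vector of the kind fed to the NN above:

%Illustrative only: generates all length-N sliding windows over a price list.
sliding_windows(Prices, N) when length(Prices) >= N ->
    [lists:sublist(Prices, I, N) || I <- lists:seq(1, length(Prices) - N + 1)].

%sliding_windows([1.0,2.0,3.0,4.0], 2) -> [[1.0,2.0],[2.0,3.0],[3.0,4.0]]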

But how do we give the NN access to the geometrical patterns within the data, and the ability to actually use them? We cannot simply convert these charts to bitmaps and feed them to the NN, because the bitmap encoded chart would still be just a long vector, and the NN would not only have to deal with an input of high dimensionality (dependent on the resolution of the bitmap), but there would also really be no connection between this input vector and the actual geometrical properties of the chart that could be exploited.

The solution comes in the form of the substrate encoding we implemented in the earlier chapters. The substrate encoding approach has been actively used in computer vision, and as we discussed in the previous chapters, it has the natural property of taking the geometrical properties of the sensory signals into consideration, and it can, through its own geometrical topological structures, further extract and reveal the geometrical regularities within the data; and it is these geometric regularities that technical analysis tries to find and exploit.

With this type of indirect encoded neural network we can analyze the price charts directly, making use of the geometrical patterns and trends within. Because each neurode in the substrate receives a connection from every neurode or input element in the preceding hyperlayer, the chart that is fed to the substrate must first be reconstructed at a resolution that still retains the important geometrical information, yet is computationally viable as input. For example, if the sliding chart that is fed to the substrate is 1000x1000, which represents 1000 historical points (the horizontal axis), with the resolution of the price data being (MaxPlotPrice - MinPlotPrice)/1000 (the vertical axis), then each neurode in the first hidden processing hyperlayer of the substrate will have 1000000 inputs. If the substrate has three dimensions, and we set it up such that the input signals are plane encoded and located at Z = -1, with 10x10 neurodes in the hidden hyperplane located at Z = 0, and 1x1 neurodes in the third hyperplane located at Z = 1 (a very similar architecture is shown in Fig-19.4, only with a 3x3 hidden hyperplane), then each of the 100 neurodes at Z = 0 receives 1000000 inputs, so each has 1000000 synaptic weights, and for this feedforward substrate to process a single input signal would require 100*1000000 + 1*100 calculations, where the 1*100 calculations are performed by the neurode at Z = 1, which is the output neurode of the substrate. This means that there would be roughly 100000000 calculations per single input, per processing of a single frame of the price chart.
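The arithmetic above can be captured in a few lines; the following helper is purely illustrative and not part of the system's modules:

%Illustrative only: counts the weighted-sum operations needed by the
%feedforward substrate described above, where every hidden neurode connects
%to every input element, and the output neurodes connect to every hidden one.
substrate_ops({HRes, VRes}, HiddenNeurodes, OutputNeurodes) ->
    Inputs = HRes * VRes,
    HiddenNeurodes * Inputs + OutputNeurodes * HiddenNeurodes.

%substrate_ops({1000,1000}, 100, 1) -> 100000100, i.e. roughly 100 million
%operations per single frame of the price chart.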

Fig. 19.4 A hyperlayer-to-hyperlayer feedforward substrate processing a 2d chart input.


Thus it is important to determine and test what resolution provides enough geometrical detail to allow for predictions to be made, yet does not overwhelm the NN itself and the processing power available to the researcher. Once the number of historical prices (the horizontal axis on the price chart) and the resolution of the prices (the vertical axis on the price chart) are agreed upon, the chart can then be generated for the sliding window of the currency pair exchange rates, producing the sliding chart. For example, Fig-19.5A shows a single frame of the chart whose horizontal and vertical resolutions are 100 and 20 respectively, for the EUR/USD closing prices taken at 15 minute time frames (pricing intervals). This means that the chart is able to capture 100 historical prices, from N to N-99, where N is the current price, and N-99 is the price (99*15) minutes ago. Thus, if for example this chart's highest price was $3.00 and the lowest price was $2.50, and we use a vertical resolution of 20, the minimum discernible price difference (the vertical size of a pixel) is (3.00 - 2.50)/20 = $0.025. For comparison, Fig-19.5B shows a 10x10 resolution recreation of a candlestick chart.
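As a quick check of this arithmetic, a hypothetical helper (the name is ours) computing the price range covered by one vertical pixel:

%Illustrative only: the price difference covered by a single vertical pixel.
vertical_step(MaxPlotPrice, MinPlotPrice, VRes) ->
    (MaxPlotPrice - MinPlotPrice) / VRes.

%vertical_step(3.00, 2.50, 20) -> 0.025, matching the $0.025 minimum
%discernible price difference computed above.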

Fig. 19.5 A and B show 100x20 and 10x10 resolution charts respectively, using the candlestick charting style.

Similar to Fig-19.4, in Fig-19.5 the pixels of the background gray are given a value of -1, the dark gray ones a value of 0, and the black ones a value of 1. These candlestick charts, though of low resolution, can be seen to retain most of the geometrical regularities and high/low pricing info of the original higher resolution plot, with the fidelity increasing with the recreated chart's resolution. It is this type of chart that we can use as the input plane that is fed to the substrate.

To accomplish this objective we need to use our TWEANN to evolve agents which, instead of simply predicting the next tick's currency exchange rate, can directly interface with the financial service provider, gather currency exchange pricing data, and make the actual trades. To do so, we can connect our TWEANN system through its sensors and actuators to an electronic trading platform like the mentioned MT4 or MT5, and then use a demo account as the simulator as we evolve the trading agents, letting the MetaTrader itself keep track of the agent's balance and therefore its implicit fitness score. But that would take too long, and we cannot very easily use thousands of MT instances, one private instance for every agent being evaluated.

Instead, we can get historical financial data from one of the brokers, and build our own Forex simulator in Erlang, simulated as a private scape. If we take that route, we can then easily spawn such private scapes for every agent in the population, allowing each agent to interface with its own through its sensors and actuators. As long as we build the FX simulator accurately enough, such that it emulates the fees and the prices associated with one of the real broker offered services, and uses real world data, the historical data for example, we will be able to evolve currency trading agents which could, after having been evolved and tested, be applied to real markets and used to autonomously interface with the financial service providers and make autonomous trades.

In the following sections we will discuss the architecture of the Forex simulator, and the sensors and actuators used to interface with it. We will discuss how to create the sensors and actuators so that we can feed our evolved NN based agents the Price Chart Input (PCI) signals (the actual graphical plots of the financial instruments), and the Price List Input (PLI) signals (the standard sliding window price lists). We will then implement the Forex simulator and the sensors and actuators needed to interface with it, and then finally evolve the autonomous currency trading agents.

19.3 The Forex Simulator

As with the previous simulations, we need to abstract the Forex simulator into its own private scape. The private scape will simply simulate the forex market for a particular currency pair, based on real historical data that we can download from one of the existing financial service providers.

Once we download the historical data for one of the currency pairs, say EUR/USD, which is the most popularly traded currency pair, we can enter it into a list, an ets or dets table, or mnesia, which can then be used by the simulator. In our case we will use ets, although a simple list would have worked just as well, if not better and faster, in a scenario where the currency based properties only have to be fed in series to an interfacing trading agent. The list representation, however, would not be as flexible as an ets based one, and it is for that reason that we are not using it.
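Purely as an illustration (the function name and table options are our assumptions), such a table could be created as an ordered_set keyed on the id field of the #technical record defined later in this section, so that ets:first/1, ets:next/2, and ets:last/1 walk the price history in chronological order:

%A sketch only: an ordered_set table keyed on the #technical record's id
%field keeps the rows sorted chronologically.
create_fx_table(TableName) ->
    ets:new(TableName, [ordered_set, named_table, public, {keypos, #technical.id}]).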

We will then create a Forex market simulator where each interfacing NN is given a $300 starting balance. Each agent will interface with its own private scape, for which it will produce an output which will be converted by its actuator to -1 if it is less than -0.5, 0 if between -0.5 and 0.5, and 1 if greater than 0.5. When interacting with the Forex simulator, -1 means go short, 0 means close the position (or do nothing if no position is open), and 1 means go long (if a short position is currently open, then first close the position, and then go long). The Forex simulator will simulate the market using 1000 real EUR/USD currency pair closing prices, stretching from 2009-11-5-22:15 to 2009-11-20-10:15, using 15 min ticks. The simulator will use a price spread of $0.00015, which is about average for a standard account from a financial service provider like OANDA or Alpari.
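The actuator we implement later calls functions:trinary/1 to perform this conversion; a minimal version consistent with the thresholds just described might look as follows:

%Converts the actuator's raw output into a -1/0/1 trade signal, per the
%thresholds described above.
trinary(Val) when Val > 0.5 -> 1;
trinary(Val) when Val < -0.5 -> -1;
trinary(_Val) -> 0.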

Because we will want to test the generalization of our evolved agents, their ability to be applied to previously unseen financial price data and successfully use it to make trades, we take the mentioned 1000 point dataset and further split it into training and generalization subsets. We will make the training set out of the first 800 time ticks, ranging from 2009-11-5-22:15 to 2009-11-18-8:15, and the testing/generalization data we set to the immediately following 200 time steps, from 2009-11-18-8:15 to 2009-11-20-10:15. Finally, when the agent opens a position, it is always done with $100, leveraged by x50 to $5000. Thus the losses and gains are based on the $5000 opened order. The leverage of x50 is a standard one provided by most brokers; since the change in currency pair prices is very low, profit is made by trading high volumes of the currency.

The private scape will interface with the agent through the agent's sensor and actuator messages. The Forex simulating private scape will accept the following list of messages from the agent:

1. {From,sense,TableName,Feature,Parameters,Start,Finish}: A request to the Forex simulator, sent from the agent's sensor, to acquire a list or a chart of historical financial data. The TableName element specifies from which table/database to read the financial data; essentially, it specifies the currency pair. The Parameters element specifies which set of technical indicators, and of what vector length, the simulator should compose and send to this sensor. It could, for example, simply be [HRes,list_sensor], which would prompt the scape to send the sensor a PLI based signal. On the other hand, if the Parameters element were set to [HRes,VRes,graph_sensor], the scape would first compose an input plane with a horizontal and vertical resolution of HRes and VRes respectively, and then forward that signal to the sensor. Finally, the Start and Finish parameters are used by the private scape at the very start to dictate the starting and ending indexes for this particular evaluation. (Example messages of all the message types are sketched after this list.)

2. {From,sense,internals,Parameters}: Another type of sensory signal request that a sensor can poll the scape with. This one requests the agent's account internals, the information pertaining to the agent's account with the financial service provider: the current balance, whether the agent is currently holding a long, short, or no position, and the percentage change of the position the agent is holding, if any.

3. {From,trade,TableName,TradeSignal}: This is the signal sent by the agent's actuator. It specifies the TableName parameter, which is effectively the currency pair to be traded (though in this scape we will only use a single currency pair, in future implementations we could evolve multiple sensors and actuators that trade multiple currency pairs at the same time using these TableName specifications). The TradeSignal parameter is derived from the actuator's output vector of length 1, and is set to either -1, 0, or 1, which specifies whether to short, hold, or go long on the currency pair. For example, if TradeSignal is set to -1 and the agent is already shorting a position, then it maintains its short position; if the agent holds no position, it opens a short position on the currency pair; and finally, if the agent currently has a long position open, it first closes it and then opens a short position. The scape acts symmetrically if the signal is 1. If the agent sends the signal 0, then whatever position is open gets closed.
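The following are illustrative instances of the above message types; the 'EURUSD15' table name and parameter values match the sensors and actuator implemented later in this chapter, but the specific resolutions here are only examples:

%Illustrative examples only.
Scape ! {self(), sense, 'EURUSD15', close, [100, 20, graph_sensor], 1000, 200}, %PCI request
Scape ! {self(), sense, 'EURUSD15', close, [100, list_sensor], 1000, 200},      %PLI request
Scape ! {self(), sense, internals, [3]},                                        %account internals
Scape ! {self(), trade, 'EURUSD15', -1}.                                        %open/maintain a short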

The scape will, of course, have to keep track of the agent's positions, its fitness score (net worth), and other parameters: what time it is currently simulating, and what signals it should feed the agent. To allow the scape to keep track of it all, the technical information about the currency pair, the agent's state, its position, its open and closed orders..., we will need to create four new records, as shown next:

-record(state,{table_name,feature,index_start,index_end,index,price_list=[]}).
-record(account,{leverage=50,lot=10000,spread=0.000150,margin=0,balance=300,
    net_asset_value=300,realized_PL=0,unrealized_PL=0,order}).
-record(order,{pair,position,entry,current,units,change,percentage_change,profit}).
-record(technical,{
    id, %Format: {Year,Month,Day,Hour,Minute,Second,Sampling_Rate}
    open,
    high,
    low,
    close,
    volume
}).

The state record keeps track of the general state: the index_start and index_end values specified by the sensor, and the current price list, which is particular to this implementation and makes things a bit more efficient by keeping the most recently sent signal in the list, so that the next time we need only poll the database for a single value, which is appended to the list while the oldest value is dropped from it. The account record keeps track of the agent's standard account information: its leverage, the currency pair lot size it trades in, the spread, the margin if any, the agent's current balance, its net_asset_value (which is effectively the agent's fitness), and finally the realized and unrealized profit/loss. The last element in the record is the order, which is set to undefined when the agent has no order open with the simulated broker, and is set to an order record when an order is opened. The order record specifies everything about the currently opened order: the current position (long or short), the entry price, the number of units traded, the dollar change in the position, the percentage_change of the position, and the profit (or loss) in dollars of the order. Finally, the technical record is used by the scape to store the actual historical financial data in its ets table. This table can be populated by, for example, first having MetaTrader5 dump a text file with historical data to the desktop, and then reading the open, high, low, close, and volume parameters from the text file into ets.

Now that we've decided on the Forex simulator's interfacing format and the data types this new private scape will use, we can implement it.

19.4 Implementing the Forex Simulator

We know what elements need to be implemented, and from having implemented a number of private scapes and one public scape, we know the general architecture and structure that we need to construct. But this time we will slightly deviate from our standard approach: instead of implementing the forex simulator inside the scape module, we will implement it in its own module, and only use the scape module to call it. As more and more scapes are added, using a single module is simply not enough, especially when the more complex scapes require hundreds or thousands of lines of code. Thus we create the fx.erl module, in which we will implement the private scape of the forex simulator.

We first give the private scape acting as a forex simulator a name: fx_sim. With that, we will be able to specify it in the sensors and actuators we create in the next section. We then add to the scape module the function fx_sim/1, which is executed by the scape:prep/2 function when a private scape is spawned. It is the fx_sim/1 function that will be called to execute the actual private scape located in the fx module. The following shows the simple implementation of the fx_sim/1 function added to the scape module.

fx_sim(Exoself_PId)->
    fx:sim(Exoself_PId).

It is the fx module, through its sim/1 function, that contains all the functions needed to read from the text file, populate the ets table with financial information, and compose the PLI and PCI sensory vectors for the agent. The reading of the text file generated by MT5, and the writing of the said data to the ets table, is out of scope for this text. Thus we will concentrate only on the implementation of the actual forex simulator, whose architecture and operational steps are shown in Fig-19.6, and whose implementation is shown and elaborated on in Listing-19.1.
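Though out of scope, a loader sketch may still be useful for orientation. Everything in the following is an assumption on our part: the comma separated "Date,Time,Open,High,Low,Close,Volume" line format of the MT5 dump, the function names, and the fixed 15 minute sampling rate.

%Hypothetical loader for a dump with lines such as
%"2009.11.05,22:15,1.4864,1.4870,1.4860,1.4866,342".
load_mt5_dump(FileName, TableName) ->
    {ok, Bin} = file:read_file(FileName),
    Lines = string:tokens(binary_to_list(Bin), "\r\n"),
    [store_row(TableName, Line) || Line <- Lines],
    ok.

store_row(TableName, Line) ->
    [Date, Time, O, H, L, C, V] = string:tokens(Line, ","),
    [Y, Mo, D] = [list_to_integer(X) || X <- string:tokens(Date, ".")],
    [Hr, Mi] = [list_to_integer(X) || X <- string:tokens(Time, ":")],
    ets:insert(TableName, #technical{
        id = {Y, Mo, D, Hr, Mi, 0, 15},
        open = to_num(O), high = to_num(H), low = to_num(L),
        close = to_num(C), volume = to_num(V)}).

to_num(S) -> %Handles both "1.4864" and integer volumes such as "342".
    case catch list_to_float(S) of
        {'EXIT', _} -> list_to_integer(S);
        F -> F
    end.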


Fig. 19.6 The fx:sim/1 architecture diagram.

Let us go through the steps shown above before implementing the architecture:

1. The sensors of the agent poll the fx_sim for sensory data.

2. The scape checks what type of sensory signals are being asked for: account internals, PCI encoded signals, or PLI encoded signals.

3. The scape looks inside its database for the currency exchange rates, and based on the resolution/length of the historical currency exchange rate, builds a price list of that resolution.

4. The price list is returned to the calling function in the receive clause.

5. Based on the receive clause, whether it is PCI or PLI, the returned price list is encoded accordingly, either as a simple price list, or as a price chart using trinary (-1,0,1) encoding.

6. The sensory signals (PCI or PLI, and the Internals) are forwarded to the agent's sensors.

7. The agent processes the sensory signals.

8. Based on the sensory signals and the agent's reasoning about them, the agent produces an output, and with its actuator forwards it to the fx_sim to make a trade.

9. The receive clause forwards the trade order made by the agent to the order handling function of fx_sim.

10. The signal is forwarded to the account processing function.

11. The private scape accesses the agent's account.

12. fx_sim queries the database for the current currency pair exchange rate.

13. The database checks the current currency pair exchange rate.

14. The database returns the current currency pair exchange rate, but at the same time moves the index of the "current" time step to the next time step, advancing one tick forward in the simulated market.

15. The private scape executes the agent's order. But also, knowing the exchange rate of the next tick, it calculates the profit/loss/change within the agent's net worth.

16. Based on whether the simulation has ended, which occurs when the index used in the exchange rate database has reached '$end_of_table', or the agent's net worth has dipped below $100, the function returns to the calling receive clause a response message to be sent back to the actuator.

17. The private scape returns to the actuator the tuple {Fitness,HaltFlag}, where Fitness is set to 0 when HaltFlag is set to 0, and to the agent's net worth when HaltFlag is set to 1 (when a termination condition has been reached).

18. At this point the loop repeats, and we go back to step 1 if the termination condition has not been reached.

Now that we know the step by step actions and interactions with the private scape, and its architecture, we can move forward and implement it. When reading the following implementation, it is essential to go through the comments, as they discuss and explain the functionality of every function they follow. Finally, the entire implementation of the forex simulator that we create in this chapter is available in the supplementary material at [1].

Listing-19.1 The implementation of the fx:sim/1 function.

sim(ExoSelf)->
    put(prev_PC,0),
    S = #state{},
    A = #account{},
    sim(ExoSelf,S,A).

sim(ExoSelf,S,A)->
    receive
        {From,sense,TableName,Feature,Parameters,Start,Finish}->
            {Result,U_S} = case S#state.table_name of
                undefined ->
                    sense(init_state(S,TableName,Feature,Start,Finish),Parameters);
                TableName ->
                    sense(S,Parameters)
            end,
            From ! {self(),Result},
            case ?SENSE_CA_TAG of
                true ->
                    timer:sleep(10000),
                    IndexT = U_S#state.index,
                    NextIndexT = fx:next(TableName,IndexT),
                    RowT = fx:lookup(TableName,IndexT),
                    NextRowT = fx:lookup(TableName,NextIndexT),
                    QuoteT = RowT#technical.close,
                    NextQuoteT = NextRowT#technical.close;
                false ->
                    ok
            end,
            fx:sim(ExoSelf,U_S,A);
        {From,sense,internals,Parameters}->
            Result = case A#account.order of
                undefined ->
                    [0,0,0];
                O ->
                    Position = O#order.position,
                    Entry = O#order.entry,
                    Percentage_Change = O#order.percentage_change,
                    [Position,Entry,get(prev_PC)]
            end,
            From ! {self(),Result},
            fx:sim(ExoSelf,S,A);
        {From,trade,TableName,TradeSignal}->
            U_A = make_trade(S,A,TradeSignal),
            Total_Profit = A#account.balance + A#account.unrealized_PL,
            case ?ACTUATOR_CA_TAG of
                true ->
                    timer:sleep(10000),
                    IndexT = S#state.index,
                    NextIndexT = fx:next(TableName,IndexT),
                    RowT = fx:lookup(TableName,IndexT),
                    NextRowT = fx:lookup(TableName,NextIndexT),
                    QuoteT = RowT#technical.close,
                    NextQuoteT = NextRowT#technical.close;
                false ->
                    ok
            end,
            case (U_A#account.balance + U_A#account.unrealized_PL) =< 100 of
                true ->
                    Result = {1,0},
                    From ! {self(),Result},
                    io:format("Lost all money~n"),
                    put(prev_PC,0),
                    fx:sim(ExoSelf,#state{},#account{});
                false ->
                    case update_state(S) of
                        sim_over ->
                            Total_Profit = A#account.balance + A#account.unrealized_PL,
                            Result = {1,Total_Profit},
                            From ! {self(),Result},
                            put(prev_PC,0),
                            fx:sim(ExoSelf,#state{},#account{});
                        U_S ->
                            Result = {0,0},
                            From ! {self(),Result},
                            U_A2 = update_account(U_S,U_A),
                            fx:sim(ExoSelf,U_S,U_A2)
                    end
            end;
        restart ->
            fx:sim(ExoSelf,#state{},#account{});
        terminate ->
            ok
    after 10000 ->
        fx:sim(ExoSelf,S,A)
    end.

%The sim/1 function initializes the simulator, after which sim/3 becomes its main receive loop. It accepts messages from the agent which either request sensory signals (internal account data, or currency exchange rates), or request that the simulator open a position for the agent. The simulator also monitors whether the market has reached the end of the dataset, or whether the agent's net worth has dipped below 100, in which case the evaluation ends.

init_state(S,TableName,Feature,StartBL,EndBL)->
    Index_End = case EndBL of
        last ->
            ets:last(TableName);
        _ ->
            prev(TableName,ets:last(TableName),prev,EndBL)
    end,
    Index_Start = prev(TableName,ets:last(TableName),prev,StartBL),
    S#state{
        table_name = TableName,
        feature = Feature,
        index_start = Index_Start,
        index_end = Index_End,
        index = Index_Start
    }.

%The init_state/5 function generates the default state of the simulator, based on the parameters specified by the agent's messages during the initial contact.


update_state(S)->
    NextIndex = fx:next(S#state.table_name,S#state.index),
    case NextIndex == S#state.index_end of
        true ->
            sim_over;
        false ->
            S#state{index=NextIndex}
    end.

%The update_state/1 function accepts the state as its parameter and moves the simulated market one tick forward by advancing the index to the next historical price. Once the index reaches index_end, the function returns the atom sim_over.

update_account(S,A)->
    case A#account.order of
        undefined -> %Nothing to update.
            A;
        O ->
            TableName = S#state.table_name,
            Index = S#state.index,
            Row = fx:lookup(TableName,Index),
            Close = Row#technical.close,
            Balance = A#account.balance,
            Position = O#order.position,
            Entry = O#order.entry,
            Units = O#order.units,
            Change = Close - Entry,
            Percentage_Change = (Change/Entry)*100,
            Profit = Position*Change*Units,
            Unrealized_PL = Profit,
            Net_Asset_Value = Balance + Unrealized_PL,
            U_O = O#order{current=Close, change=Change,
                percentage_change=Percentage_Change, profit=Profit},
            U_A = A#account{unrealized_PL=Unrealized_PL,
                net_asset_value=Net_Asset_Value, order=U_O},
            put(prev_PC,O#order.percentage_change),
            U_A
    end.

%The update_account/2 function accepts the state and the account as parameters, and updates the account and the order based on the state's specified current temporal position within the simulated market.

determine_profit(A)->
    A#account.realized_PL + A#account.unrealized_PL.

%The function determine_profit/1 calculates the agent's realized profit by adding to the agent's current realized profit, the yet unrealized profit in the account.

make_trade(S,A,Action)->
    case A#account.order of
        undefined ->
            case Action == 0 of
                true -> %Do nothing
                    A;
                false -> %Open new position
                    open_order(S,A,Action)
            end;
        O ->
            case Action == 0 of
                true -> %Close Order
                    close_order(S,A);
                false -> %Modify Order
                    Current_Position = O#order.position,
                    case Current_Position == Action of
                        true ->
                            A;
                        false ->
                            U_A = close_order(S,A),
                            open_order(S,U_A,Action)
                    end
            end
    end.

%The make_trade/3 function opens, closes, or keeps an order open for the agent, based on the Action the agent specifies. If the agent holds a long position and Action specifies a short position, then the long position is closed and a short one is opened. The symmetric Actions are dealt with in the same manner.

open_order(S,A,Action)->
    BuyMoney = 100,
    Spread = A#account.spread,
    Leverage = A#account.leverage,
    TableName = S#state.table_name,
    Index = S#state.index,
    Row = fx:lookup(TableName,Index),
    Quote = Row#technical.close,
    Entry = Quote + Spread*Action,
    Units = round((BuyMoney*Leverage)/Entry),
    Change = Quote - Entry,
    PChange = (Change/Entry)*100,
    Profit = Action*Change*Units,
    Unrealized_PL = Profit,
    New_Order = #order{pair=TableName, position=Action, entry=Entry, current=Quote,
        units=Units, change=Change, percentage_change=PChange, profit=Profit},
    A#account{unrealized_PL=Unrealized_PL, order=New_Order}.

%The open_order/3 function opens a position using the default leverage and buy-in value ($100), making the order short or long depending on the value of the Action parameter.
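To make the arithmetic of open_order/3 concrete, consider a hypothetical long order (Action = 1) opened at a closing quote of 1.47500, a value made up purely for illustration:

%Entry  = 1.47500 + 0.00015*1     = 1.47515
%Units  = round((100*50)/1.47515) = 3389
%Change = 1.47500 - 1.47515       = -0.00015
%Profit = 1 * -0.00015 * 3389     ~ -0.51
%A freshly opened order thus starts with a small unrealized loss, equal to
%the cost of the spread.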

close_order(S,A)->
    U_Balance = A#account.balance + A#account.unrealized_PL,
    U_Realized_PL = A#account.realized_PL + A#account.unrealized_PL,
    A#account{
        balance = U_Balance,
        realized_PL = U_Realized_PL,
        unrealized_PL = 0,
        order = undefined
    }.

%The close_order/2 function closes the currently opened position, updating the agent's account in the process.

%%%% FX SENSORY SIGNAL FUNCTIONS %%%%

sense(S,Parameters)->
    case Parameters of
        [HRes,VRes,graph_sensor]->
            plane_encoded(HRes,VRes,S);
        [HRes,list_sensor]->
            list_encoded(HRes,S)
    end.

list_encoded(HRes,S)->
    Index = S#state.index,
    CurrencyPair = S#state.table_name,
    PriceListPs = S#state.price_list,
    case lists:keyfind(HRes,2,PriceListPs) of
        false ->
            Trailing_Index = prev(CurrencyPair,Index,prev,HRes-1),
            U_PList = fx_GetPriceList(CurrencyPair,Trailing_Index,HRes,[]),
            U_PriceListPs = [{U_PList,HRes}|PriceListPs];
        {PList,HRes} ->
            R = fx:lookup(CurrencyPair,Index),
            U_PList = [{R#technical.open,R#technical.close,R#technical.high,
                R#technical.low}|lists:sublist(PList,HRes-1)],
            U_PriceListPs = lists:keyreplace(HRes,2,PriceListPs,{U_PList,HRes})
    end,
    U_S = S#state{price_list=U_PriceListPs},
    {[Close||{_Open,Close,_High,_Low}<-U_PList],U_S}.

%The list_encoded/2 function returns to the caller a price list of length HRes.

plane_encoded(HRes,VRes,S)->
    Index = S#state.index,
    CurrencyPair = S#state.table_name,
    PriceListPs = S#state.price_list,
    case lists:keyfind(HRes,2,PriceListPs) of
        false ->
            Trailing_Index = prev(CurrencyPair,Index,prev,HRes-1),
            U_PList = fx_GetPriceList(CurrencyPair,Trailing_Index,HRes,[]),
            U_PriceListPs = [{U_PList,HRes}|PriceListPs];
        {PList,HRes} ->
            R = fx:lookup(CurrencyPair,Index),
            U_PList = [{R#technical.open,R#technical.close,R#technical.high,
                R#technical.low}|lists:sublist(PList,HRes-1)],
            U_PriceListPs = lists:keyreplace(HRes,2,PriceListPs,{U_PList,HRes})
    end,
    LVMax1 = lists:max([High||{_Open,_Close,High,_Low}<-U_PList]),
    LVMin1 = lists:min([Low||{_Open,_Close,_High,Low}<-U_PList]),
    LVMax = LVMax1 + abs(LVMax1-LVMin1)/20,
    LVMin = LVMin1 - abs(LVMax1-LVMin1)/20,
    VStep = (LVMax-LVMin)/VRes,
    V_StartPos = LVMin + VStep/2,
    U_S = S#state{price_list=U_PriceListPs},
    {l2fx(HRes*VRes,{U_PList,U_PList},V_StartPos,VStep,[]),U_S}.

%The plane_encoded/3 function returns to the caller a chart with a resolution of HRes x VRes.

l2fx(Index,{[{Open,Close,High,Low}|VList],MemList},VPos,VStep,Acc)->
    {BHigh,BLow} = case Open > Close of
        true ->
            {Open,Close};
        false ->
            {Close,Open}
    end,
    O = case (VPos+VStep/2 > BLow) and (VPos-VStep/2 =< BHigh) of
        true ->
            1;
        false ->
            case (VPos+VStep/2 > Low) and (VPos-VStep/2 =< High) of
                true ->
                    0;
                false ->
                    -1
            end
    end,
    l2fx(Index-1,{VList,MemList},VPos,VStep,[O|Acc]);
l2fx(0,{[],_MemList},_VPos,_VStep,Acc)->
    Acc;
l2fx(Index,{[],MemList},VPos,VStep,Acc)->
    l2fx(Index,{MemList,MemList},VPos+VStep,VStep,Acc).

%The l2fx/5 function is the one that actually composes the candlestick chart, based on the price list and the HRes and VRes values.

fx_GetPriceList(_Table,_EndKey,0,Acc)->
    Acc;
fx_GetPriceList(_Table,'$end_of_table',_Index,Acc)->
    exit("fx_GetPriceList, reached $end_of_table");
fx_GetPriceList(Table,Key,Index,Acc)->
    R = fx:lookup(Table,Key),
    fx_GetPriceList(Table,fx:next(Table,Key),Index-1,[{R#technical.open,
        R#technical.close,R#technical.high,R#technical.low}|Acc]).
%The fx_GetPriceList/4 function accesses the table Table, and returns to the caller a list of tuples composed of the open, close, high, and low exchange rate values, running from the initial Key index to the EndKey index within the table.

Having now implemented the actual simulator, we have to construct the sensors and actuators which the agent will use to interface with it. We do so in the next section.

19.5 Implementing the New Sensors and Actuators

With the private scape implemented, we now update the sensor and actuator modules to include the new sensors that poll the scape for PLI and PCI signals, and the actuator that sends the scape messages to execute trades. As with the other sensors and actuators, these are very simple, due to the fact that the scape does most of the heavy lifting. Listing-19.2 shows the implementation of the three new sensor functions added to the sensor module. Two of the sensors request financial signals from the private scape: one in the standard linear sliding window format, and the other requesting that vector in the form appropriate for a substrate encoded NN based agent, representing the sensory signal at a sensor specified resolution. The third sensor requests information with regards to the agent's account, always encoded in the linear form, as a vector of length 3.

Listing-19.2 The implementation of the fx_PCI/4, fx_PLI/4, and fx_Internals/4 sensor functions.


fx_PCI(Exoself_Id,VL,Parameters,Scape)->
    [HRes,VRes] = Parameters,
    case get(opmode) of
        standard ->
            Scape ! {self(),sense,'EURUSD15',close,[HRes,VRes,graph_sensor],1000,200};
        gentest ->
            Scape ! {self(),sense,'EURUSD15',close,[HRes,VRes,graph_sensor],200,last}
    end,
    receive
        {_From,Result}->
            Result
    end.

fx_PLI(Exoself_Id,VL,Parameters,Scape)->
    [HRes,Type] = Parameters, %Type = open|close|high|low
    case get(opmode) of
        standard ->
            Scape ! {self(),sense,'EURUSD15',close,[HRes,list_sensor],1000,200};
        gentest ->
            Scape ! {self(),sense,'EURUSD15',close,[HRes,list_sensor],200,last}
    end,
    receive
        {_From,Result}->
            normalize(Result)
    end.

normalize(Vector)->
    Normalizer = math:sqrt(lists:sum([Val*Val||Val<-Vector])),
    [Val/Normalizer || Val <- Vector].

fx_Internals(Exoself_Id,VL,Parameters,Scape)->
    Scape ! {self(),sense,internals,Parameters},
    receive
        {PId,Result}->
            Result
    end.

The only thing that is different about these implementations is their use of get(opmode), because generalization testing and training are performed on different subsets of the financial data. We will get back to that in the next section.
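Note also that the normalize/1 helper above rescales each price list to unit length, so that the agent's input reflects the relative shape of the recent price movement rather than its absolute magnitude. For example, normalize([3.0,4.0]) evaluates to [0.6,0.8], since math:sqrt(3.0*3.0 + 4.0*4.0) is 5.0.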

Similarly to the new sensors, we now implement the new actuator and add it to the actuator module. The implementation of the fx_Trade/4 function is shown in the following listing. This actuator simply contacts the scape and requests to make a trade for the specified currency pair.

Listing-19.3 The implementation of the fx_Trade actuator.

fx_Trade(ExoSelf,Output,Parameters,Scape)->
    [TradeSignal] = Output,
    Scape ! {self(),trade,'EURUSD15',functions:trinary(TradeSignal)},
    receive
        {Scape,Fitness,HaltFlag}->
            {Fitness,HaltFlag}
    end.

Finally, we add the new forex_trader morphological specification to the morphology module, as shown in the next listing.

Listing-19.4 The fx morphological specification.

forex_trader(actuators)->[
    #actuator{name=fx_Trade, type=standard, scape={private,fx_sim}, format=no_geo,
        vl=1, parameters=[]}
];
forex_trader(sensors)->
    PLI_Sensors = [#sensor{name=fx_PLI, type=standard, scape={private,fx_sim},
        format=no_geo, vl=HRes, parameters=[HRes,close]} || HRes <- [10]],
    PCI_Sensors = [#sensor{name=fx_PCI, type=standard, scape={private,fx_sim},
        format={symmetric,[HRes,VRes]}, vl=HRes*VRes, parameters=[HRes,VRes]}
        || HRes <- [50], VRes <- [20]],
    Internal_Sensors = [#sensor{name=fx_Internals, type=standard, scape={private,fx_sim},
        format=no_geo, vl=3, parameters=[3]}],
    PCI_Sensors ++ Internal_Sensors.

Within this morphological specification we can choose which of the sensors to use: the one for substrate encoded NN based agents, or the one for neural encoded agents. We can also choose to use different sensors at the same time, sensors which gather data for different currency pairs, resolutions... It might be useful in the future to allow the agent to evolve connections to, and use, different types of sensors which provide it not only with different signals and different currency pair information, but also with differently encoded signals, which might allow the agent to more effectively extract and blend the information it acquires.

Having implemented the sensors, actuators, and the new morphology, we now return to the discussion of training and generalization testing, and the changes they entail for our TWEANN.

19.6 Generalization Testing

When we evolve a neural network to solve some problem, there is always a chance that instead of learning the concept behind the given problem, the neural network will simply memorize the associations between specific sensory signals and the actions that have to be taken, especially if the sensory signals come from some static list of data. In ALife applications and simulations, the sensory signals and the environment itself are so dynamic that simple memorization is not feasible, and would lead to the death of the organism. The ability to deal with dynamic environments requires a greater amount of learning, and generalization is already part of what the environment requires from an agent if it wishes to survive. But when we deal with something like financial analysis, and the NN is trained on some particular static dataset, there is a chance that it will simply memorize how to profitably trade that memorized section of the financial dataset, and when we move it to new data in the real world, it will be unable to generalize or make profitable trades with the new input signals.

The process of simple memorization rather than learning of the underlying concept is referred to as overtraining/overfitting. To prevent overtraining, we customarily stop the evolutionary process right before the population begins to memorize rather than learn, by dividing the given dataset into two sections: Training and Generalization Testing. We first apply the population to the training dataset, evolving the solution. But after every generation, or every X number of evaluations, we take the champion of the population and apply it to the Generalization Testing dataset, which it has not seen before. If at some point the agents begin to do worse and worse on the generalization dataset while continuing to improve on the training dataset, then the population is beginning to memorize, and is becoming overtrained. At this point we would stop the training process, as the population has shown itself unable to generalize and improve on new data any further.

Our system does not yet support such a feature. In general, the DXNN system, and particularly the system we've built here, has, in the problems I've applied it to, proven itself to generalize rather well even without the use of data splitting. To actually demonstrate this, to demonstrate that our TWEANN system does not suffer significantly from overtraining and poor generalization, we will slightly modify the population monitor such that every time the stats of a specie are calculated, the population monitor also takes the champion of that specie and applies it to a generalization test. This is done through a slight modification to the population_monitor module, the exoself module, and the sensor/actuator modules.

The amount of modification we will need to make is very small. Let us first discuss what the new behavior of an agent should be when it has been spawned only for the purpose of generalization testing. When we want to simply test a champion agent on how well it generalizes, we do not want to mutate or tune it in any way. We merely want to spawn the exoself and apply it to the problem once, with its at-that-time topology, synaptic weights, and connectivity pattern. Once the agent finishes its single evaluation, that score should then be returned to the population monitor. The population monitor should perform the generalization test (if at all) when it is building a stat record. Whether the agent is applied to a new problem or a new set of sensory signals during its generalization test depends on the problem. So then, we need only modify the exoself so that it can act in its standard way, but also, when being generalization tested, work in the very simple manner of just spawning and linking the NN system, waiting for the fitness score from the cortex element, and then forwarding that to the population monitor. The population monitor needs to be modified very little as well: it should simply, if set to do so, perform a generalization test by summoning the champion agent with the generalization_test operational mode, wait for its reply, and enter that score into the modified stat record. We will need to add to the stat record the new gentest_fitness element. Finally, the sensors and actuators could be modified at their core, by changing their records to have two sets of parameters: one standard parameters element, and one gentest_parameters element, which would dictate how the sensor/actuator should behave when generalization testing is performed... But we will take a simpler approach. We will not modify anything so significant with regards to sensors and actuators; we will simply set the fx_PCI, fx_PLI, fx_Internals, and fx_Trade functions to check their operational mode, sent to them by the exoself, and based on that, request signals from the private scape differently. This way we will not have to change anything else. All that we need to modify is to allow the exoself to specify the agent's operational mode when linking the sensors and actuators. The sensors and actuators store this operational mode in their process dictionaries, and those sensors and actuators that behave differently during standard evaluation and generalization testing simply use get(opmode) to decide how to function. Thus, the sensor and actuator based modified prep/1 functions are as follows, with the small modification in boldface:

prep(ExoSelf_PId) ->
    receive
        {ExoSelf_PId,{Id,Cx_PId,Scape,SensorName,VL,Parameters,Fanout_PIds,OpMode}} ->
            put(opmode,OpMode),
            loop(Id,ExoSelf_PId,Cx_PId,Scape,SensorName,VL,Parameters,Fanout_PIds)
    end.

prep(ExoSelf_PId) ->
    receive
        {ExoSelf_PId,{Id,Cx_PId,Scape,ActuatorName,Parameters,Fanin_PIds,OpMode}} ->
            put(opmode,OpMode),
            loop(Id,ExoSelf_PId,Cx_PId,Scape,ActuatorName,Parameters,{Fanin_PIds,Fanin_PIds},[])
    end.


The first prep/1 belongs to the modified sensor module, while the second prep/1 belongs to the modified actuator module. And it is this simple modification that allows the sensor and actuator functions in Listing-19.2 to work.

Before we make the small changes to the benchmarker, population_monitor, and the exoself modules, let's go through the steps of our TWEANN when set to perform generalization testing when evolving a population of agents:

1. Because we now use the pmp record to set all the parameters of the population_monitor, we can use this record's op_mode element, and set it either to the gentest_off or the gentest_on value. When set to gentest_off, the population_monitor will function normally, without performing any kind of generalization test every X number of evaluations. But when we set op_mode to gentest_on, the population_monitor will perform generalization tests. So then, we do not need to modify the benchmarker module in any way, other than now having to also specify the op_mode parameter. Let us then set it in this example to: gentest_on.

2. The benchmarker starts the population_monitor. The population_monitor module's state record also has an op_mode element, and in the prep_PopState/2 function that the benchmarker uses to spawn the population_monitor with the parameters specified by the pmp record, the pmp record's op_mode is mapped to the population_monitor state record's op_mode. It is only when we summon agents with the summon_Agents/3 function that we might wish to specify the operational mode with which the exoself should operate. But even here, we can simply create an extra exoself:start/3 function in which the OpMode is specified. Thus, if the agent is started with the exoself:start/2 function (which is what we usually use), then it starts in the standard mode. The only change in the population_monitor occurs in the gather_STATS/2 function. We modify it into the gather_STATS/3 function, calling it with the OpMode parameter, so that when it executes update_SpecieSTAT(Specie_Id,TimeStamp,OpMode) for every specie, it does so with the OpMode value. It is this function, update_SpecieSTAT/3, that we need to update to add the new generalization testing functionality to our TWEANN.

3. We modify the stat record in the records.hrl file to also include the element gentest_fitness. The update_SpecieSTAT function, when composing the stat record, executes: gentest_fitness = run_GenTest(S,OpMode). The run_GenTest/2 function, based on the OpMode, either simply returns 0 if the OpMode is set to gentest_off, or spawns the specie's champion agent with the operational mode gentest. Thus for the population_monitor module we need only update a few functions (gather_STATS/2 and update_SpecieSTAT/3) so that they can be called with the OpMode parameter. We also need to build the new run_GenTest/2 function, which spawns the agent in the gentest mode, waits for it to return its generalization score, and then returns that score to the caller, thus setting the gentest_fitness parameter to that score.

4. The run_GenTest/2 function checks the specie's S#specie.champion_ids to get the agent id of the champion, and then spawns it in gentest mode by executing: exoself:start(ChampionAgent_Id, self(), gentest).

5. For step 4 to work, our exoself should be spawnable with the OpMode parameter, and we need to further update its state record to include the opmode element. Finally, we also need to allow the exoself to operate its loop/1 function in two ways: the standard way, and the gentest way, where it simply spawns the NN, waits for the fitness score, sends that fitness score as a message to the run_GenTest/2 function which spawned it, and then terminates the phenotype and itself. This can easily be done by modifying loop/1 into loop/2: exoself:loop(State,OpMode). In this way, we do not modify the standard code; instead we simply add a secondary loop clause which performs the required functionality and terminates. This will also require us to allow the exoself to send the sensors and actuators the OpMode when linking them, but that is a simple modification: we just extend the InitState tuple that the exoself sends to the sensors and actuators, such that the extended tuple also includes the OpMode as the last element.

Thus, to make it all work, we first modify gather_STATS/2 into gather_STATS/3, as shown in the following listing, with the new source code in boldface:

Listing-19.5 The implementation of the updated gather_STATS/3 and update_SpecieSTAT/3, and the new function run_GenTest/2. The modified and new code is shown in boldface.

gather_STATS(Population_Id,EvaluationsAcc,OpMode)->

    io:format("Gathering Species STATS in progress~n"),
    TimeStamp = now(),
    F = fun() ->
        P = genotype:read({population,Population_Id}),
        T = P#population.trace,
        SpecieSTATS = [update_SpecieSTAT(Specie_Id,TimeStamp,OpMode) || Specie_Id
            <- P#population.specie_ids],
        PopulationSTATS = T#trace.stats,
        U_PopulationSTATS = [SpecieSTATS|PopulationSTATS],
        U_TotEvaluations = T#trace.tot_evaluations + EvaluationsAcc,
        U_Trace = T#trace{
            stats = U_PopulationSTATS,
            tot_evaluations = U_TotEvaluations
        },
        io:format("Population Trace:~p~n",[U_Trace]),
        mnesia:write(P#population{trace=U_Trace})
    end,
    Result = mnesia:transaction(F),
    io:format("Result:~p~n",[Result]).

update_SpecieSTAT(Specie_Id,TimeStamp,OpMode)->
    Specie_Evaluations = get({evaluations,Specie_Id}),
    put({evaluations,Specie_Id},0),
    S = genotype:read({specie,Specie_Id}),
    {Avg_Neurons,Neurons_Std} = calculate_SpecieAvgNodes({specie,S}),
    {AvgFitness,Fitness_Std,MaxFitness,MinFitness} = calculate_SpecieFitness({specie,S}),
    SpecieDiversity = calculate_SpecieDiversity({specie,S}),
    STAT = #stat{
        morphology = (S#specie.constraint)#constraint.morphology,
        specie_id = Specie_Id,
        avg_neurons = Avg_Neurons,
        std_neurons = Neurons_Std,
        avg_fitness = AvgFitness,
        std_fitness = Fitness_Std,
        max_fitness = MaxFitness,
        min_fitness = MinFitness,
        avg_diversity = SpecieDiversity,
        evaluations = Specie_Evaluations,
        time_stamp = TimeStamp,
        gentest_fitness = run_GenTest(S,OpMode)
    },
    STATS = S#specie.stats,
    U_STATS = [STAT|STATS],
    mnesia:dirty_write(S#specie{stats=U_STATS}),
    STAT.

run_GenTest(S,gentest)->
    TopAgent_Id = case S#specie.champion_ids of
        [Id] ->
            Id;
        [Id|_] ->
            Id;
        [] ->
            void
    end,
    case TopAgent_Id of
        void ->
            0;
        _ ->
            Agent_PId = exoself:start(TopAgent_Id,self(),gentest),
            receive
                {Agent_PId,gentest_complete,Specie_Id,Fitness,Cycles,Time}->
                    genotype:print(TopAgent_Id),
                    Fitness;
                Msg ->
                    io:format("Msg:~p~n",[Msg])
            end
    end;
run_GenTest(_S,_)->
    0.

We now modify the exoself module by allowing the exoself to be started in three ways, so that it also supports being executed with the OpMode parameter, and we add the new loop/2 clause. This is shown in the following listing.

Listing-19.6 The three new exoself:start/1/2/3 functions, and the new loop/2 clause.

start(Agent_Id)->
    case whereis(monitor) of
        undefined ->
            io:format("start(Agent_Id):: 'monitor' is not registered~n");
        PId ->
            start(Agent_Id,PId,standard)
    end.

start(Agent_Id,PM_PId)->
    start(Agent_Id,PM_PId,standard).

start(Agent_Id,PM_PId,OpMode)->
    spawn(exoself,prep,[Agent_Id,PM_PId,OpMode]).

loop(S,standard)->
    receive
        ...
    end;
loop(S,gentest)->
    receive
        {Cx_PId,evaluation_completed,Fitness,Cycles,Time,GoalReachedFlag}->
            terminate_phenotype(S#state.cx_pid,S#state.spids,S#state.npids,
                S#state.apids,S#state.scape_pids,S#state.cpp_pids,
                S#state.cep_pids,S#state.substrate_pid),
            io:format("GenTest complete, agent:~p terminating. Fitness:~p~n
                TotCycles:~p~n TimeAcc:~p Goal:~p~n",
                [self(),Fitness,Cycles,Time,GoalReachedFlag]),
            S#state.pm_pid !
                {self(),gentest_complete,S#state.specie_id,Fitness,Cycles,Time}
    end.

From this we can see that when the exoself is started with either exoself:start/1 or exoself:start/2, it starts with OpMode = standard, which is the normal operational mode in which it performs tuning. But we can also start it using exoself:start/3, specifying the OpMode directly. If that OpMode is gentest, then after prepping and mapping the genotype to phenotype, the exoself drops into the loop(S,gentest) clause, waits for the fitness score, immediately forwards that fitness score to the population_monitor, and then terminates. With this, our system can now be started either in the standard mode, or in the gentest mode, in which it will perform generalization testing every X number of evaluations.

Finally, to actually build a graph of generalization test fitness scores, we also have to modify the benchmarker module, particularly the prepare_Graphs/2 function, so that it also dumps the gentest_fitness values to file. With that done, composed of a few very simple modifications not shown here, we can now compile the modified modules and move forward to perform the benchmarks.

19.7 Benchmark & Results

With our TWEANN now able to perform generalization tests when specified to do so, and to store the results in the database, we can apply our TWEANN system to the Forex simulator we've developed, and test the evolved agents' ability to generalize. Our goal now is to perform the benchmarks using Price List Input for neural encoded agents, and Price Chart Input for substrate encoded agents capable of extracting the geometrical patterns within the charts.

In the following benchmarks, a single evaluation of a NN is counted when the NN based agent has gone through all the 800 training data points, or when its balance dips below $100. The fitness of the NN is its net worth at the end of its evaluation. Each evolutionary run will last for 25000 evaluations, and each experiment is composed of 10 such evolutionary runs. In each experiment the population size was set to 10. Finally, in every experiment we will allow the NNs to use, and integrate through evolution, the following set of activation functions: [tanh, gaussian, sin, absolute, sgn, linear, log, sqrt].

In the experiments we will perform, we will set the NNs to use price sliding window vectors for the direct encoded NNs, and price charts for the substrate encoded NNs. We will also connect each agent not only to the sensors providing it with closing prices, but also to the fx_internals sensor, which produces the vector [Position, Entry, PercentageChange], where Position takes the value of either -1 (currently shorting the held order), 0 (no position), or 1 (currently going long on the held order), Entry is the price at which the position was entered (or 0 if no position is held), and PercentageChange is the percentage change in the position since entry. Finally, the substrate's own output, a vector of length 1, will be fed back to the substrate's input hyperlayer. This, due to feeding the substrate its own output, makes the substrate Jordan Recurrent, an architecture of which is shown in Fig-19.7. Because we have already implemented the jordan_recurrent topology in Chapter-16, the use of this architecture entails nothing more than allowing the seed agents to start with the two sensors, fx_Internals and fx_PCI, and using substrate_linkforms = [jordan_recurrent] in the ?INIT_CONSTRAINTS of the population to which the agent belongs.
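Assuming the constraint record fields used in earlier chapters (the text above only names morphology and substrate_linkforms explicitly; the neural_afs field and the exact macro layout are our assumptions), the constraint setup might look roughly as follows:

-define(INIT_CONSTRAINTS, [
    #constraint{
        morphology = forex_trader,
        neural_afs = [tanh, gaussian, sin, absolute, sgn, linear, log, sqrt],
        substrate_linkforms = [jordan_recurrent]
    }
]).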

Fig. 19.7 The topology of the Jordan Recurrent substrate of the PCI using agent.

We will perform 14 benchmarks/experiments in total; each experiment is composed of 10 evolutionary runs, from which the experiment's average/max/min is calculated for both the training and the generalization testing. Through the experiments we will compare the performance of the PCI based NNs and the PLI based NNs. Finally, the sliding window and chart resolutions that we will use will be comparable for both the neural and substrate encoded NN based agents. We will perform the following experiments:

5 PLI experiments:

Experiments 1-5 will be performed using the PLI using NNs. Each experiment will differ in the resolution of the sliding window input the NNs use. Each NN will start with the sliding window sensor, and the fx_internals sensor. These 5 experiments are:


1. [SlidingWindow5]
2. [SlidingWindow10]
3. [SlidingWindow20]
4. [SlidingWindow50]
5. [SlidingWindow100]

Their names are based on the resolutions used by the agents' sensors.

9 PCI experiments:

We will perform experiments 6-14 with the PCI based NNs. In these experiments each PCI based NN will use a 4 dimensional substrate. The input hyperlayer of the substrate will be composed of the fx_PCI and fx_internals sensors, and the Jordan recurrent connection. For the PCI based NNs, we will create a 4 dimensional substrate with an input hyperlayer composed of the noted 3 hyperplanes located at K = -1, all of which will be connected to the 5x5 hyperlayer positioned at K = 0, which is then further connected to the 1x1 output hyperlayer (composed of a single neurode in this case) located at K = 1, which outputs the short/hold/long signal, and which is also used for the recurrent connection back to the input hyperlayer. Each of the 9 experiments will use a sensor of a different resolution:

1. [ChartPlane5X10]
2. [ChartPlane5X20]
3. [ChartPlane10X10]
4. [ChartPlane10X20]
5. [ChartPlane20X10]
6. [ChartPlane20X20]
7. [ChartPlane50X10]
8. [ChartPlane50X20]
9. [ChartPlane100x10]

We will set the benchmarker to test the generalization abilities of the evolved NN based agents every 500 evaluations, applying the best NN in the population at that time to the 200 data point generalization test. Performing the generalization tests consistently throughout the evolution of the population will not only allow us to test the generalization ability of the best NNs in the population, but will also allow us to build a plot of the general generalization capabilities of that particular encoding and sensor type, and of the generalization abilities of our TWEANN overall. Finally, doing this will allow us to get a better idea of whether generalization drops off as the PCI and PLI NNs are trained, whether it improves, or whether it stays the same throughout the training process.

19.7.1 Running the Benchmark

Having set everything up, we execute the benchmark for every noted experimental setup, and run it to completion. To do this, we simply modify the constraints used in our benchmarker module, and then execute benchmarker:start(Experiment_Name) for each of our experimental setups. Due to the number of the experiments, and the amount of time that the PCI based experiments take, particularly the ChartPlane100x10 and ChartPlane50x20 experiments, the benchmarking process will take up to a week, even on a rather powerful quad core sandy bridge CPU. This problem can be alleviated by interfacing Erlang with a GPU and leveraging the vector multiplication performed by the substrate, but that is a story which will be covered in the next tome of this series...


A week later (which of course could be much less if using a much more powerful server, or running all the experiments in parallel on different machines), we finally have all the benchmarking results, similar to the results shown in Table-1. The table presents the training average, training best, testing worst, testing average, testing standard deviation, and testing best fitness score results of every experiment. At the very bottom of the table, I list the Buy & Hold strategy and the Maximum Possible profit results for comparison. The Buy & Hold profits are calculated by trading the currencies at the very start of the training or testing run respectively, and then trading back at the end. The best possible profit is calculated by looking ahead and trading the currencies only if the profit gained before the trend changes will be greater than the spread covering cost.

Table 1 Benchmark/Experiment Results: the training average (TrnAvg), training best (TrnBst), testing worst (TstWrst), testing average (TstAvg), testing standard deviation (TstStd), and testing best (TstBst) fitness scores of every experiment, with Buy & Hold and Max Possible reference rows. Of the numeric columns, only TstStd is reproduced here:

Price Vector Sensor Type    TstStd
[SlidWindow5]               13
[SlidWindow10]              16
[SlidWindow20]              15
[SlidWindow50]              9
[SlidWindow100]             14
[ChartPlane5X10]            32
[ChartPlane5X20]            38
[ChartPlane10X10]           32
[ChartPlane10X20]           26
[ChartPlane20X10]           29
[ChartPlane20X20]           36
[ChartPlane50X10]           23
[ChartPlane50X20]           20
[ChartPlane100x10]          37
Buy & Hold                  N/A
Max Possible                N/A
From the above results we can note that the generalization results for both the PCI based NNs and the PLI based NNs show profit. Indeed, the acquired profits seem rather significant: for example, the highest profit reached during generalization was $88 (a single occurrence, ending with a net worth of $388) out of the $128 possible, with the agent starting at $300. This shows that the agent was able to extract 68% of the available profit, a considerable amount. This is substantial, but we must keep in mind that even though the agents were used on real world data, they were still only trading in a simulated market, and we have chosen the best performers from the entire experiment... Thus it is only after we test these champions on another set of previously unseen data that it would be possible to say with some certainty that these generalization abilities carry over, and for how many time steps before the agents require re-training (in our experiment the agents are trained on 800 time steps, and tested on the immediately following 200 time steps).


Analyzing just the information provided in the above table, the first thing we notice is that the worst generalization performers among the PCI NNs are significantly worse than those of the PLI based NNs. The PCI based NNs either generalized well during an evolutionary run, or lost significantly. The PLI based NNs mostly kept close to $300 during the generalization test phase when not making a profit. Also, on average the best generalization scores of the PCI NNs are lower than those produced by the best of the PLI NNs. On the other hand, the training fitness scores are comparable for both the PCI and PLI NNs. Another observation we can make is that on average a higher price (vertical) resolution (X20 vs. X10) correlates with higher profits achieved by the PCI NNs during generalization testing. And finally, we also see that for both PLI and PCI, the generalization achieved by the 5 and 100 based price window resolutions is highest.

We have discussed time and again that substrate encoding could potentially offer a greater level of generalization to NN based agents, because the NNs operate on coordinates, never see the actual input sensory signals, and are thus unable to pick out particular patterns for memorization. The NNs paint the synaptic weights and connectivity patterns on the substrate in broad strokes. But based on Table-1, at face value, it would almost seem as if we are wrong. This changes if we perform further analysis of the results of our experiments, and plot the benchmarker produced GNUplot ready files. The plots produced are quite interesting, as shown in Fig-19.8.

Fig. 19.8 PLI & PCI based Training and Testing Fitness Vs. Evaluations.


Though somewhat difficult to see, we can make out that while the PLI NNs did achieve those generalization fitness scores, they were merely tiny and very short lived blips during the experiment, occurring a few times and then disappearing, diving back under 300. The PCI NNs, on the other hand, produced lower profits on average when generalization was tested, but they produced those profits consistently; they generalized more frequently. When a PCI NN system generalized, it maintained that generalization ability for most of the entire evolutionary run. This is easier to see if we analyze the graph of PLI Generalization Fitness Vs. Evaluations, shown in Fig-19.9, and the PCI Generalization Fitness Vs. Evaluations, shown in Fig-19.10.

Fig. 19.9 PLI based Generalization Testing Fitness Vs. Evaluations.

If we look at SlidingWindow100, produced by plotting the best generalization scores from the 10 evolutionary runs of that experiment, we see that the score of 367 was achieved briefly, between roughly evaluation 5000 and evaluation 10000. This means that there was most likely only a single agent, out of all the agents in the 10 evolutionary runs, that achieved this, and then only briefly. On the other hand, we also see that the majority of the points are at 300, which implies that most of the time the agents did not generalize. And as expected, during the very beginning, evaluations 0 to about 3000, there is a lot more profit producing activity amongst all sliding window resolutions, which is rather typical of overtrained NNs, whose generalization score decreases while their training score increases over time. The most stable generalization, and thus profitability, was shown by SlidingWindow5 and SlidingWindow100; we know this because in those experiments there were consistently more fitness scores above 300. From this we can conclude that when it comes to PLI based NNs, across all the experiments only a few agents generalize well, and do so only briefly.

Fig. 19.10 PCI based Generalization Testing Fitness Vs. Evaluations.

Let us now analyze Fig-19.10, the generalization results for just the PCI based NN systems. The story here is very different. Not only are there more consistently above-300 generalization fitness scores in this graph, but they also last throughout the entire 25000 evaluations. This means that there were more generalizing agents which stayed within the population without being replaced due to over-fitting, or simply that those few that did generalize performed better than those with poorer generalization, and thus stayed within the population for longer periods of time. This gives hope that the generalization ability of these PCI NN based systems will carry over to real world trading.


Going through the raw data, it was usually the case that for every PLI NN based experiment, only about 1-2 in 10 evolutionary runs had a few agents which generalized for a brief while to scores above 320, before being replaced by over-fitted, poorly generalizing agents. On the other hand, in the PCI NN based experiments, 3-6 out of 10 evolutionary runs had agents generalizing to scores above 320 and remaining within the population for the entire evolutionary run.

Analyzing the generalization plots further, we can also note that the low resolution substrates at times produce better results, and do so more often, than their high resolution counterparts... except for the SlidingWindow100 and ChartPlane100x10 based experiments, which performed the best. I think that the low resolution based experiments performed well because increasing the number of neurodes in one layer gives the neurodes in the postsynaptic layer so many inputs that they become saturated and unable to function effectively, although scaling and normalizing the presynaptic vectors for every neurode did not seem to improve this problem significantly, leaving this anomaly to future work. Contradicting this assumption, of course, are the exceptionally well performing SlidingWindow100 and ChartPlane100x10 based agents. So it seems that the lowest resolution and highest resolution based experiments performed the best, but the question of why is still under analysis.
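For reference, the kind of presynaptic vector normalization mentioned above can be sketched as follows. This only illustrates the idea of bounding a neurode's aggregate input magnitude as fan-in grows; it is not the substrate module's actual code.

%% Scale an input vector to unit length before aggregation, so that a
%% neurode's total input does not grow with the number of presynaptic links.
normalize(Input) ->
    Norm = math:sqrt(lists:sum([X * X || X <- Input])),
    case Norm > 0.0 of
        true  -> [X / Norm || X <- Input];
        false -> Input
    end.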

With regards to the evolved high performing PLI NNs, it was clear that they all had one feature in common: a substantial number of recurrent connections. I think that the high performing PLI NNs which used the sliding window vectors of size 5 did so well due to having a large number of recurrent connections, which made it difficult to evolve simple memorization, and thus forced generalization. But this too is just a hypothesis at the moment.

19.8 Discussion

We have seen that TWEANN systems can successfully be applied to financial analysis. We have also been able to compare a geometrical pattern sensitive, substrate encoded NN based agent which uses price chart input, vs. a standard neural encoded agent using price list input. As we hoped, the geometrically sensitive agents were able to produce profit when applied to this problem, and were the superior of the two approaches when it came to generalization. This implies that the PCI agents were able to extract the geometrical patterns within the financial data, just as we hoped.


These results also imply that geometrical pattern sensitive NNs have potential in time series analysis applications. This means that anything from earthquake data analysis to frequency and audio/voice analysis based applications could potentially be tackled by these types of systems. The generalization is particularly impressive, and the substrate encoded system's ability to use, and improve with, the resolution at which the time series is sampled implies enormous application potential.

Though our systems have produced profits when applied to Forex trading in simulations, we still need to apply the resulting evolved agents to the real market, by connecting the evolved agents to a trading platform. The application of these types of systems to voice analysis is now also an area of interest with regards to future applications.

19.9 Summary


In this chapter we applied our TWEANN system to evolve currency trading agents. We extended our TWEANN by adding generalization testing features, allowing the benchmarker to extract the champions of the population and separately apply them to the given problem, or a separate one, to see whether the evolved agents can generalize and perform effectively on a related problem not yet explored directly during evolution. We implemented the translation of the pricing data into candlestick style price charts, and evolved substrate encoded, geometrical pattern sensitive agents which could use these price chart inputs and, based on the geometrical patterns within those charts, trade currency. We also evolved the standard sliding window, price list input based neural networks. These types of agents simply read historical data, and then made currency pair trading decisions.

From the performed benchmarks and experiments we have confirmed that substrate encoding does indeed provide excellent generalization capabilities. We also confirmed that geometrical sensitivity with regards to technical analysis can give a NN the ability to trade currency. Indeed, the PCI based agents performed much better, with regards to generalization, than the PLI based agents. More importantly, through the experiments we have performed, we determined that TWEANNs can indeed evolve geometry sensitive agents that are successful in this very complex and chaotic time series analysis application, and thus there is hope for their application to other time series analysis problems.


19.10 References

[1] Chapter-19 Supplementary material: www.DXNNResearch.com/NeuroevolutionThroughErlang/Chapter19

[2] Halliday R (2004) Equity Trend Prediction With Neural Networks. Res. Lett. Inf. Math. Sci., Vol. 6, pp 15-29.

[3] Mendelsohn L (1993) Using Neural Networks For Financial Forecasting. Stocks & Commodities, Volume 11:12, October, pp 518-521.

[4] Min Qi, Peter GZ (2008) Trend Time-Series Modeling and Forecasting With Neural Networks. IEEE Transactions on Neural Networks, Vol. 19, no. 5.

[5] Lowe David (1994) Novel Exploitation of Neural Network Methods in Financial Markets. Proceedings of the 3rd IEE International Conference on Artificial Neural Networks, IEE Publications, Aston, United Kingdom.

[6] Jung H, Jia Y, et al (2010) Stock Market Trend Prediction Using ARIMA-Based Neural Networks. 2008 Proceedings of 17th International Conference on Computer Communications and Networks 4, 1-5.

[7] Versace M, Bhatt R, Hinds O, Shiffer M (2004) Predicting The Exchange Traded Fund DIA With a Combination of Genetic Algorithms and Neural Networks. Expert Systems with Applications 27, 417-425.

[8] Yao J, Poh HL (1995) Forecasting the KLSE Index Using Neural Networks. IEEE International Conference on Artificial Neural Networks.

[9] Kimoto T, Asakawa K, Yoda M, Takeoka M (1990) Stock Market Prediction System With Modular Neural Networks. International Joint Conference on Neural Networks 1, 1-6.

[10] Hutchinson JM, Lo AW, Poggio T (1994) A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks. Journal of Finance 49, 851-889.

[11] Refenes AN, Bentz Y, Bunn DW, Burgess AN, Zapranis AD (1997) Financial Time Series Modelling With Discounted Least Squares Backpropagation. Science 14, 123-138.

[12] Li Y, Ma W (2010) Applications of Artificial Neural Networks in Financial Economics: A Survey. 2010 International Symposium on Computational Intelligence and Design, 211-214.

[13] Rong L, Zhi X (2005) Prediction Stock Market With Fuzzy Neural Networks. Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21.

[14] Maridziuk J, Jaruszewicz M (2007) Neuro-Evolutionary Approach to Stock Market Prediction. 2007 International Joint Conference on Neural Networks, 2515-2520.

[15] Soni S (2005) Applications of ANNs in Stock Market Prediction: A Survey. ijcsetcom 2, 71-83.

[16] White H, Diego S (1988) Economic Prediction Using Neural Networks: The Case of IBM Daily Stock Returns. Neural Networks, 1988 IEEE International Conference on, 451-458.

[17] Dogac S (2008) Prediction of Stock Price Direction by Artificial Neural Network Approach. Master thesis, Bogazici University.

[18] Yamashita T, Hirasawa K, Hu J (2005) Application of Multi-Branch Neural Networks to Stock Market Prediction, 2544-2548.

[19] Quiyong Z, Xiaoyu Z, Fu D (2009) Prediction Model of Stock Prices Based on Correlative Analysis and Neural Networks. Second International Conference on Information and Computing Science, pp 189-192, IEEE.

[20] Risi S, Stanley KO (2010) Indirectly Encoding Neural Plasticity as a Pattern of Local Rules. Neural Plasticity 6226, 1-11.

Part VI

Promises Kept

Promises kept. We have created a decoupled, dynamic, flexible, memetic and genetic algorithm based, topology and parameter evolving universal learning network, where a node can act as any type of function. The neurons, if they can even be called that at this point, since they are no longer limited to the tanh activation function, can be anything. In some sense we can use the term node rather than neuron, and allow the nodes to be, amongst other things, NNs themselves, thus transforming our system into a modular NN. Each node could potentially represent a standard neural encoded NN, or a substrate encoded NN. The encoding of our system is flexible enough to scale further, to be further molded and expanded. Our system supports both direct encoding and indirect encoding. It supports static neural networks, and neural networks with plasticity, where the plasticity functions can be changed and modified, and new ones added without much difficulty; they are self contained and decoupled. This extends to indirect encoding, which also supports plasticity. Furthermore, we have made even the evolutionary parameters able to evolve:

-define(ES_MUTATORS,[
    mutate_tuning_selection,
    mutate_tuning_duration,
    mutate_tuning_annealing,
    mutate_tot_topological_mutations,
    mutate_heredity_type
]).

Our system supports the mutation of various evolutionary strategy parameters. We also made the probability of any one mutation operator being chosen a mutatable value, and hence the mutation operators use the format: {MutationOperatorName, RelativeProbability}. Our system supports both Darwinian and Lamarckian evolution. Our system is fully concurrent, with processes like the Cortex and Exoself ready to act as monitors and allow self-healing networks to emerge, with further monitors in the hierarchy possibly being the new specie process, the already existing population_monitor process, and finally the polis itself. And yet there is so much more that can easily be added. And as you've seen in the previous chapters, due to the way we constructed our neuroevolutionary platform, adding new features has become trivial.
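As an illustration of what the {MutationOperatorName, RelativeProbability} format makes possible, selecting an operator from such a list can be sketched as follows. This is only a sketch of the idea, not the genome_mutator implementation:

%% Pick an operator name from [{Name, RelativeProbability}] tuples, with
%% each operator's chance proportional to its relative probability.
choose_operator(MOs) ->
    Total = lists:sum([P || {_Name, P} <- MOs]),
    choose_operator(MOs, rand:uniform() * Total).

choose_operator([{Name, P} | _Rest], Point) when Point =< P ->
    Name;
choose_operator([{_Name, P} | Rest], Point) ->
    choose_operator(Rest, Point - P).

With this format, a list such as [{mutate_weights, 100}, {mutate_af, 1}] heavily biases evolution toward weight perturbation, which is exactly the kind of knob discussed above.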

If our neurons use the activation function tanh, then what we have created is an advanced Topology and Weight Evolving Artificial Neural Network. If we create the activation functions AND, NOT, and OR, then our system becomes a digital circuit evolutionary system, which we can apply to the optimization of already existing digital circuits, or to the creation of new ones. If we use any activation function, then our system acts as a Topology and Parameter Evolving Universal Learning Network (TPEULN) system. If we set some of the activation functions to act as programs, our system is a genetic programming system. If we allow the above listed evolutionary strategy mutation operators to be active, by setting ?SEARCH_PARAMTERS_MUTATION_PROBABILITY in the genome_mutator module to anything above 0, our system becomes an evolutionary strategy system. If we set the activation functions to be state machines, our system will act as an evolutionary programming system. If we use indirect encoding, the substrate encoding we developed in Chapters 16 & 17, our system uses evolutionary embryology. Why even call our "neurons" neurons? Why not just nodes, since our neurons can be anything, including NNs themselves, or substrate encoded NNs, making our system a modular universal learning network. But we do not need to force our system to use just one particular approach; we can set it in the constraints to use all available functions, all the available features, and it will evolve it all. We can increase the population size, allow our system to use everything, and it will evolve and settle on the parameters and features that give the evolving NN based agents an advantage in the environment in which they evolve. Our system truly is a Topology and Parameter Evolving Universal Learning Network. It encompasses all the modern learning systems, and yet there is still infinite room to expand, explore, and advance.
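As a hedged sketch of how one might select between these modes, assume the constraint record defined in records.hrl exposes a neural_afs field listing the activation functions available to evolution; the atoms used here for the logic functions are illustrative placeholders:

-module(constraint_examples).
-include("records.hrl"). %% assumed to define the #constraint{} record
-export([tweann_constraint/0, circuit_constraint/0]).

%% An advanced TWEANN: neurons restricted to the tanh activation function.
tweann_constraint() ->
    #constraint{neural_afs = [tanh]}.

%% A digital circuit evolutionary system: nodes act as logic gates.
circuit_constraint() ->
    #constraint{neural_afs = ['and', 'or', 'not']}.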

In the Applications part of the book, I showed how easy it was to apply our system to two completely different problems: the Artificial Life simulation (and thus robotics, and anything related), and Financial Analysis (and thus any other predictive or classification problem). Because we created our system to use easily modifiable and changeable sensor and actuator modules, it is flexible enough that applying it to any problem requires only that the problem specific sensor and actuator functions be created. The ALife experiment showed that we could just as easily use our system with something like Player/Gazebo, a sensor/actuator driver provider and 3d robot and environment simulator, respectively. We can continue and evolve not just predators and prey, but evolve robot morphologies, allow them to learn not just to hunt each other and find food in 2d space, but how to use physics to their advantage, to use their 3d bodies, to evolve new sensors and actuators, to evolve new morphologies, to evolve... We can similarly evolve NNs which control combat UAVs, in exactly the same manner, and I hope after reading this book you see that such a feat is indeed simple to accomplish, and not just words. For we have done it already, just in a 2d environment rather than 3d. The sensors and actuators would define from what systems the UAV acquires its signals: cameras, range sensors, sonars... and through the actuators we can evolve and let the NN control the various morphological systems: propellers, fins, guns... Our system can also be applied to medicine; we can let the NN learn correlations between genetics and pathology, or symptoms and diseases, evolving a diagnostician. We can apply our system to bioinformatics, using the substrate encoded NN system to explore and create new drugs... Truly, the application areas in biology are enormous [1,2,3,4,5,6,7,8,9,10,11,12,13].


I noted that there is still an infinite amount of things to explore, and more advancements to be included. There are thousands more pages to fill, and so further extensions to the system will be explored in the next volume. In the last remaining chapter I will discuss some of the more pertinent things we will explore in the next book; a glimpse of things to come. Though I have a feeling that by the time we get to the next book, you will already have created many of those improvements on your own, as well as improvements and advancements that I have not even considered. You already have the knowledge of the system and the theory to continue and explore what I have not, all on your own, pushing beyond the horizon.

[1] Moreira A (2003) Genetic Algorithms for the Imitation of Genomic Styles in Protein Backtranslation. Theoretical Computer Science 322, 17.

[2] Terfloth L, Gasteiger J (2001) Neural Networks and Genetic Algorithms in Drug Design. Drug Discovery Today 6, 102-108.

[3] Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, Meloon B, Engel S, Rosenberg A, Cohen D, et al (2005) Design of a Genome-Wide siRNA Library Using an Artificial Neural Network. Nature Biotechnology 23, 995-1001.

[4] Vladimir BB, et al (2002) Artificial Neural Networks Based Systems for Recognition of Genomic Signals and Regions: A Review. Informatica 26, 389-400.

[5] Yoshihara I, Kamimai Y, Yasunaga M (2001) Feature Extraction from Genome Sequence Using Multi-Modal Network. Genome Informatics 12, 420-422.

[6] Gutteridge A, Bartlett GJ, Thornton JM (2003) Using a Neural Network and Spatial Clustering to Predict the Location of Active Sites in Enzymes. Journal of Molecular Biology 330, 719-734.

[7] Emmanuel A, Frank AI (2011) Ensembling of EGFR Mutations-based Artificial Neural Networks for Improved Diagnosis of Non-Small Cell Lung Cancer. International Journal of Computer Applications (0975-8887), Volume 20, No. 7.

[8] Herrero J, Valencia A, Dopazo J (2001) A Hierarchical Unsupervised Growing Neural Network for Clustering Gene Expression Patterns. Bioinformatics 17, 126-136.

[9] Wang DH, Lee NK, Dillon TS (2003) Extraction and Optimization of Fuzzy Protein Sequence Classification Rules Using GRBF Neural Networks. Neural Information Processing Letters and Reviews, 1(1): 53-59.

[10] Chan CK, Hsu AL, Tang SL, Halgamuge SK (2008) Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing. Journal of Biomedicine and Biotechnology 2008, 513701.

[11] Reinhardt A, Hubbard T (1998) Using Neural Networks for Prediction of the Subcellular Location of Proteins. Nucleic Acids Research 26, 2230-2236.

[12] Oliveira M, Mendes DQ, Ferrari LI, Vasconcelos A (2004) Ribosome Binding Site Recognition Using Neural Networks. Genetics and Molecular Biology 27, 644-650.

[13] Azuaje F (2002) Discovering Genome Expression Patterns With Self-Organizing Neural Networks. In: Understanding and Using Microarray Analysis Techniques: A Practical Guide. London: Springer Verlag.

Chapter 20 Conclusion

Abstract Last words, future work, and motivation for future research within this field.

We have developed a state of the art topology and weight evolving artificial neural network system. The system is developed to be scalable and concurrent, and its architecture is highly modular and flexible, allowing for future extensions and modifications. We have tested our system on a few standard benchmarking problems: the XOR problem, the Single and Double Pole Balancing problems, and the T-Maze navigation problem. Our system performed superbly in all scenarios, without our even having tuned or optimized it yet.

There is still an enormous number of features we can add to our system, and due to the way we have developed it, and due to it being written in Erlang, it will be easy to do so. We can make our NNs modular; add Kohonen Map, Competitive NN, Hopfield Network, and other types of self organizing network based modules. We can add and test new mutation operators, for example the use of pruning, or the use of splitting, in which we would take sections of the NN, make copies of them, slightly perturb them, and reconnect the perturbed copies to the original NN. Or we could allow multiple substrates to work together, where one substrate could even modulate the other. We can create committee machines using dead_pools. We can add new forms of neuromodulation, plasticity... There are also new fitness functions we can create, for example ones that take into account the Cartesian distance of the connections between the neurodes within the substrate, which would allow us to push for closely connected neural clusters. We could also add a "crystallization" feature, where neural circuits which have been topologically stable during the NN's evolution are crystallized into a single function, a single process represented module. We could add new types of signal normalization preprocessors... Even in applications, we could spend hundreds more pages on the integration of Player/Stage/Gazebo with the TWEANN system we have created here, allowing our neuroevolutionary system to evolve the brains of simulated robots inhabiting 3d environments. We can evolve neural networks to control simulated Unmanned Combat Aerial Vehicles inside Gazebo, evolving new aerial dogfighting tactics... There is so much more to explore and create, all of it possible and simple due to the use of Erlang. These mentioned features we will develop and design in the next book, advancing this already bleeding edge TWEANN (or is it Topology and Parameter Evolving Universal Learning Network, TPEULN?) system to the next level.

You now have the information, the tools, and the experience to continue developing this system, or a completely new one, on your own. The next advancements in this field will be made by you. We've built this system together, so you are as familiar with it as I am. The source code is available on GitHub [1]; join the group and contribute to the source code. It is modular enough that you can add hundreds of features, and those that work well will be taken up by the community working on this project [2]. Or fork the project, and create and advance a parallel version. This field is open-ended; you get to decide where it goes next, how fast it gets there, and what role you will play in it all.

******** Last Note ********

I almost forgot: we never really gave a name to the system we've developed here. If you've looked over the published papers on DXNN, then you probably know that we've basically been developing the next generation of the DXNN platform. I do not wish to give the system we've developed together a new name; let it take on the name DXNN, it is a good name, and appropriate for such grandiose goals. I began developing DXNN many years ago, and in a sense have been growing it over time, retrofitting the system with new features. This also means that while developing it, I did not take the best path possible, due to not knowing what else would be added and what problems I would face in the future. This is not the case with the system you and I have developed here. The system we developed in this book was built with all the foresight of having previously developed a system of similar purpose and form. The system we have developed here is cleaner, better, more agile, more modular in its implementation, and in general more flexible than DXNN. I hope you can apply it to useful problems, and that if you use it in your research, it is of help.

I sincerely hope you enjoyed reading this book, and developing this system along with me. As I mentioned at the beginning of this book, I believe that we have stumbled upon the perfect neural network programming language, and that Erlang is it. I cannot see myself using anything else in the future, and I have experimented with dozens of different languages for this research. I also think that it adds the flexibility, and the direct mapping from programming language architecture to problem space, in such a way that we can now think clearly when developing distributed computational intelligence systems, which will allow us to create systems with features and capabilities previously not possible. Evolution will generate the complexity; we just need to give our system enough tools, flexibility, and the space in which it can carve out the evolutionary path towards new heights.

Gene I. Sher

20.1 References

[1] GitHub Account with the source code: https://github.com/CorticalComputer/DXNN2

[2] Research site: www.dxnnresearch.com

[3] Supplementary Material: www.DXNNResearch.com/NeuroevolutionThroughErlang

Abbreviations

AF – Activation Function

ALife – Artificial Life

AC – Augmented Competition Selection Algorithm

BMU – Best Matching Unit

BP – Backpropagation

CI – Computational Intelligence

CL – Competitive Learning

CO – Concurrency Oriented

UCAV – Unmanned Combat Aerial Vehicle

DFGS – Dangerous Food Gathering Simulation

DXNN – Deus Ex Neural Network

EANT – Evolutionary Acquisition of Neural Topologies

EMH – Efficient Market Hypothesis

EPNet – Evolutionary Programming Network

ES – Evolutionary Strategy

FX – Foreign Exchange

GHA – Generalized Hebbian Algorithm

GSOM – Growing Self Organizing Map

GTM – General Topographic Map

HtH – Hyperlayer-to-Hyperlayer

IHD – Input Hyperlayer Densities

IHDTag – Input Hyperlayer Densities Tag

NAO – Normalized Allotted Offspring

NEAT – Neuroevolution of Augmenting Topologies

NN – Neural Network

MO – Mutation Operator

PCI – Price Chart Input

PLI – Price List Input

PPS – Predator vs. Prey Simulation

RIM – Random Intensity Mutation

RR-SHC – Random Restart Stochastic Hill Climber

RWT – Random Walk Theory

SENN – Substrate Encoded Neural Network

SHC – Stochastic Hill Climbing

SFGS – Simple Food Gathering Simulation

SOM – Self Organizing Map

TPEULN – Topology and Parameter Evolving Universal Learning Network

TWEANN – Topology and Weight Evolving Artificial Neural Network

UAV – Unmanned Aerial Vehicle
