2. Minimal examples – by genotype

  1. BitString genotype – OneMax example

  2. Binary genotype – Function minimization example

  3. FloatingPoint genotype - alternative to Binary encoding

  4. Tree genotype – Symbolic regression example

  5. Permutation genotype – Traveling Salesperson Problem


a) Genotype BitString – OneMax example

Suppose you want to maximize the number of ones in a vector of bits – which is a simple OneMax problem. You would then use the BitString genotype, which offers no coding/decoding, just a storage of n bits (and associated genetic operators). To implement the evaluation operator, you need to inherit from the EvaluateOp class:

class OneMaxEvalOp : public EvaluateOp
{
public:
        FitnessP evaluate(IndividualP individual);
};
typedef boost::shared_ptr<OneMaxEvalOp> OneMaxEvalOpP;  // smart ptr typedef; see e.g. http://www.codeproject.com/KB/stl/boostsmartptr.aspx

and then implement the evaluate function:

// evaluate() receives a smart pointer to the individual to evaluate
FitnessP OneMaxEvalOp::evaluate(IndividualP individual)
{
        // evaluation creates a new fitness object using a smart pointer
        // in our case, we try to maximize the number of ones, so we use FitnessMax fitness (for maximization problems)
        FitnessP fitness = static_cast<FitnessP> (new FitnessMax);

        // Each individual is a vector of genotypes. We must define them in config.
        // We'll use BitString, and retrieve it as the first and only genotype
        BitStringP bitstr = boost::dynamic_pointer_cast<BitString::BitString> (individual->getGenotype(0)); // don't need zero for the first one
	// (you can also use 'ordinary' pointers:)
	//Bittring::BitString* bitstr = (BitString::BitString*) individual->getGenotype().get();
        
        // count the ones; where are they?
        // BitString genotype contains a std::vector of bool's named 'bits'
        uint ones = 0;
        for(uint i = 0; i<bitstr->bits.size(); i++){
                if(bitstr->bits[i] == true)
                        ones++ ;
        }
        fitness->setValue(ones);

        // return the smart pointer to new fitness
        return fitness;
}

Then, we reference the evaluation op. in main:

int main(int argc, char **argv)
{
        StateP state = static_cast<StateP> (new State);
        state->setEvalOp(static_cast<EvaluateOpP> (new OneMaxEvalOp));

        state->initialize(argc, argv);
        state->run();
        return 0;
}

and finally, define the BitString genotype – with its one parameter, the number of bits – in a minimal configuration file:

<ECF>
        <Genotype>
                <BitString>
                        <Entry key="size">20</Entry>
                </BitString>
        </Genotype>
</ECF>

The program needs to be run with the name of the above configuration file as its command line argument, i.e. "./program parameters.txt".

That's it. No other parameters needed (see section Using the parameters to see the default parameter settings). And with a problem this simple, this will usually work fine. See this example in examples directory.

 


b) Genotype Binary – Function minimization example

Suppose you want to minimize a multidimensional function, which is a common GA task of function minimization. Say the function looks like this:

f(x) = (x[1]-1)2 + (x[2]-2)2 + ... + (x[n]-n)2 , where n is a given parameter (function dimensionality) and x[i] the i-th component of vector x.

You would then use the Binary genotype which encodes real-valued numbers as a sequence of bits using binary encoding (the most usual in GA; the alternative representation is floating-point in the next example). Binary genotype offers these mandatory parameters:

All of these four must be defined for a valid Binary genotype. If you need different boundaries and precision for different variables, you can simply use more than one Binary genotype, as shown in section Using the parameters. To implement the evaluation operator, we will inherit from the EvaluateOp class:

class FunctionMinEvalOp : public EvaluateOp
{
public:
        FitnessP evaluate(IndividualP individual);
};
typedef boost::shared_ptr<FunctionMinEvalOp> FunctionMinEvalOp P;  // smart ptr typedef; see e.g. http://www.codeproject.com/KB/stl/boostsmartptr.aspx

and then implement the evaluate function:

FitnessP evaluate(IndividualP individual)
{
        // evaluation creates a new fitness object using a smart pointer
        // in our case, we try to minimize the function value, so we use FitnessMin fitness (for minimization problems)
        FitnessP fitness = static_cast<FitnessP> (new FitnessMin);

        // we define Binary as the only genotype (in configuration file)
        BinaryP gen = boost::dynamic_pointer_cast<Binary::Binary> (individual->getGenotype());
	// (you can also use 'ordinary' pointers:)
	//Binary::Binary* gen = (Binary::Binary*) individual->getGenotype().get();


        double realTemp, value = 0;
        // we implement the fitness function 'as is', without any translation
        // the number of variables we get from the genotype 
        for (uint i = 0; i < gen->realValue.size(); i++){
                realTemp = pow((gen->realValue[i] - (i + 1)), 2.);
                value += realTemp;
        }

        fitness->setValue(value);
        return fitness;
}

Then, we reference the evaluation operator in main:

int main(int argc, char **argv)
{
        StateP state = static_cast<StateP> (new State);
        state->setEvalOp(static_cast<EvaluateOpP> (new FunctionMinEvalOp));

        state->initialize(argc, argv);
        state->run();
        return 0;
}

and finally, define the Binary genotype and its parameters in a minimal configuration file:

<ECF>
        <Genotype>
                <Binary>
                        <Entry key="lbound">-10</Entry>
                        <Entry key="ubound">10</Entry>
                        <Entry key="precision">3</Entry>
                        <Entry key="dimension">2</Entry>
                </Binary>
        </Genotype>
</ECF>

The program can be run with the name of the above configuration file as its command line argument, i.e. "./program parameters.txt".

This function has an arbitrary number of variables. Where this is not the case, then the dimension parameter should be the same (or at least not less than) as implemented in the evaluation operator. See this example in examples directory.

 

This example can also be solved in ECF with particle swarm optimization algorithm: just supplement the minimal configuration file with the file "parameters_PSO.txt" (in this example's directory) or add the following Algorithm block in the configuration:

<Algorithm>
	<ParticleSwarmOptimization>
		<Entry key="weightType">1</Entry>
		<Entry key="weight">0.9</Entry>
		<Entry key="maxVelocity">100</Entry>
	</ParticleSwarmOptimization>
</Algorithm>

 

c) FloatingPoint genotype – alternative to Binary encoding

The previous example of function minimization/maximization can also be solved using FloatingPoint genotype, which is simply a vector of floating-point numbers (variables of type double). Depending on the problem, floating-point representation may exhibit better (or worse) convergence than binary encoding, and is much faster in almost any configuration. FloatingPoint genotype uses equivalent parameters (the same as Binary genotype, only without precision):

All of these must be defined for a valid genotype. The only difference in the previous source code is that the following line in FunctionMinEvalOp::evaluate

        BinaryP gen = boost::dynamic_pointer_cast<Binary> (individual->getGenotype());

should be replaced with

        FloatingPointP gen = boost::dynamic_pointer_cast<FloatingPoint> (individual->getGenotype());

Minimal configuration file for FlotingPoint genotype would look like this (see "parameters_flpoint.txt" in this example's directory):

<ECF>
        <Genotype>
		<FloatingPoint>
			<Entry key="lbound">-50</Entry>
			<Entry key="ubound">50</Entry>
			<Entry key="dimension">2</Entry>
		</FloatingPoint>
        </Genotype>
</ECF>

 


d) Genotype Tree – Symbolic regression example

Suppose you want to discover the analytic form of an unknown function that matches some given data – this is a typical symbolic regression problem. This problem is commonly solved with genetic programming, using functional primitives and variables in a tree-like syntactic structure. In ECF you would use the genotype Tree, which can contain predefined or user defined functions and variables.

Say we want to discover the function that will best represent the following data in range [-10, 10]:

X values

-10

-8

-6

-4

-2

0

2

4

6

8

Y values

-9.456

-8.989

-5.721

-3.243

-2.909

0.000

2.909

3.243

5.721

8.989

This data is actually generated with the function y = x + sin(x), and this is the target function that GP will try to evolve. How? Genotype Tree offers these parameters:

When applied to this problem, we may the define the following elements: the variable is only one, and we may name it anyway we like (say X). The function set should be sufficient to describe the given data – for instance, we may include functions sin, cos, +, -, * and /. The mindepth and maxdepth values may be 1 and 5, so we get the following genotype:

<ECF>
        <Genotype>
                <Tree>
                        <Entry key="maxdepth">5</Entry>
                        <Entry key="mindepth">1</Entry>
                        <Entry key="functionset">sin cos + - / *</Entry>
                        <Entry key="terminalset">X</Entry>
                </Tree>
        </Genotype>
</ECF>

The evaluation operator should measure the difference between the given data (y value) and the output of the actual evolved tree (function) for each given x value. To do that, the evaluation operator should have the pairs of x and y values predefined or calculated beforehand (no need to calculate y values each time an individual is evaluated). This can be done in the initialization phase, before the evolution starts. So the evaluation operator is defined as:

class SymbRegEvalOp : public EvaluateOp
{
public:
        FitnessP evaluate(IndividualP individual);
        bool initialize(StateP); // initialization of training data
        std::vector<double> domain;
        std::vector<double> codomain;
        uint nSamples;
};
typedef boost::shared_ptr<SymbRegEvalOp> SymbRegEvalOpP;   // smart ptr typedef; see e.g. http://www.codeproject.com/KB/stl/boostsmartptr.aspx

(check section Using the parameters to see how data points can also be loaded from the configuration file). The evaluator implementation may look like this:

// called only once, before the evolution – generates training data
bool SymbRegEvalOp::initialize(StateP state)
{
        nSamples = 10;
        double x = -10;
        for(uint i = 0; i < nSamples; i++) {
                domain.push_back(x);
                codomain.push_back(x + sin(x));
                x += 2;
        }
        return true;
}

FitnessP SymbRegEvalOp::evaluate(IndividualP individual)
{
        // we try to minimize the function value, so we use FitnessMin fitness (for minimization problems)
        FitnessP fitness = static_cast<FitnessP> (new FitnessMin);

        // get the genotype we defined in the configuration file
        TreeP tree = boost::dynamic_pointer_cast<Tree::Tree> (individual->getGenotype());
	// (you can also use 'ordinary' pointers:)
	//Tree::Tree* bitstr = (Tree::Tree*) individual->getGenotype().get();

        
        double value = 0;
        for(uint i = 0; i < nSamples; i++) {
                // for each test data instance, the x value (domain) must be set
                tree->setTerminalValue( "X", domain[i] );
                // get the y value of the current tree
                double result;
                tree->execute(&result);
                // add the difference
                value += abs(codomain[i] - result);
        }
        fitness->setValue(value);
        
        return fitness;
}

And the example may be run with the described configuration file, i.e. "./program parameters.txt". See this example in examples directory.

NOTE: For additional customization of Tree genotype (e.g. defining ephemereal random constants, adding new functions, adding custom function/terminal types etc.) see section Customizing the Tree genotype.

 


e) Genotype Permutation – Traveling Salesperson Problem

Suppose you want to solve what is commonly known as a Traveling Salesman Problem (TSP), i.e. finding the shortest route to visit all 'cities' in a set where each 'city' must be visited once. The most common representation uses the permutation solution encoding where each potential solution is a sequence of 'city' indexes. In ECF, this is what the Permutation genotype is used for.

A single most important genotype parameter is its size, usually corresponding to the number of 'cities', which we can define in a minimal configuration file (in the ECF TSP example there are 29 cities). We also define a separate problem description file which contains the distances between each two cities:

<ECF>
         <Genotype>
                 <Permutation>
                         <Entry key="size">29</Entry>
                 </Permutation>
         </Genotype>

         <Registry>
                 <Entry key="tsp.infile">../distances.txt</Entry>
         </Registry>
 </ECF>

A permutation genotype of size n will contain a permutation of indexes 0,1, ... , n - 1 which can be decoded into a corresponding sequence of 'cities'. The evaluation operator may be declared as follows, where the matrix weights will represent all the distances between any two cities:

class TSPEvalOp : public EvaluateOp 
{
private:
	int dimension;
	vector< vector<int> > weights;
public:
	void registerParameters(StateP);
	bool initialize(StateP);
	FitnessP evaluate(IndividualP individual);
};
typedef boost::shared_ptr<TSPEvalOp> TSPEvalOpP;   // smart ptr typedef; see e.g. http://www.codeproject.com/KB/stl/boostsmartptr.aspx

The evaluation operator should either define the distances between cities or load them from a separate problem description file:

void TSPEvalOp::registerParameters(StateP state)
{
	state->getRegistry()->registerEntry("tsp.infile", (voidP) (new std::string), ECF::STRING);
}

bool TSPEvalOp::initialize(StateP state)
{
	if(!state->getRegistry()->isModified("tsp.infile")) {
		ECF_LOG_ERROR(state, "Error: no input file defined for TSP! (parameter 'tsp.infile'");
		return false;
	}

	voidP sptr = state->getRegistry()->getEntry("tsp.infile"); // get parameter value
	string filePath = *((std::string*) sptr.get()); // convert from voidP to user defined type

	ifstream iFile(filePath.c_str());
	if(!iFile.is_open()) {
		ECF_LOG_ERROR(state, "Error: can't open input file " + filePath);
		return false;
	}

	// ...
	// not shown here: code that initializes weights from tsp.infile
	// ...

	return true;
}

FitnessP TSPEvalOp::evaluate(IndividualP individual)
{
	FitnessP fitness = static_cast<FitnessP> (new FitnessMin);
	PermutationP perm = boost::dynamic_pointer_cast<Permutation::Permutation> (individual->getGenotype());

	// genotype Permutation keeps a vector of indexes named 'variables'
	int fitnessV = 0;
	for(int i = 0; i < perm->size - 1; i++) {
		// the length of each route is the sum of distances (weights) between each city in a route
		fitnessV += weights[perm->variables[i]][perm->variables[i+1]];
	}
	fitnessV += weights[perm->variables[0]][perm->variables[dimension-1]];

	fitness->setValue(fitnessV);
	return fitness;
}

And the example may be run with the described configuration file, i.e. "./program parameters.txt". See this example in examples directory.