Title:

Automatically evolving rule induction algorithms with grammarbased genetic programming

In the last 30 years, research in the field of rule induction algorithms produced
a large number of algorithms. However, these algorithms are usually obtained
from the combination of a basic rule induction algorithm (typically following the
sequential covering approach) with new evaluation functions, pruning methods
and stopping criteria for refining or producing rules, generating many "new" and
more sophisticated sequential covering algorithms.
We cannot deny that these attempts to improve the basic sequential covering
~approach have succeeded. Hence, if manually changing these major components
of rule induction algorithms can result in new, significantly better ones, why not
to automate this process to make it more costeffective? This is the core idea
of this work: to automate the process of designing rule induction algorithms by
means of grammarbased genetic programming.
Grammarbased Genetic Programming (GGP) is a special type of evolutionary
algorithm used to automatically evolve computer programs. The most interesting
feature of this type of algorithm is that it incorporates a grammar into its search
mechanism, which expresses prior knowledge about the problem being solved.
Since we have a lot of previous knowledge about how humans design rule induction
algorithms, this type of algorithm is intuitively a suitable tool to automatically
evolve rule induction algorithms.
The grammar given to the proposed GGP system includes knowledge about
how humans design rule induction algorithms, and also presents some new elements
which could work in rule induction algorithms, but to the best of our
knowledge were not previously tested. The GG P system aims to evolve rule induction
algorithms under two different frameworks, as follows. In the first framework,
the GGP is used to evolve robust rule induction algorithms, i.e., algorithms
which were designed to be applied to virtually any classification data set, like a
manuallydesigned rule induction algorithm. In the second framework, the GGP
is applied to evolve rule induction algorithms tailored to a specific application
XVI
domain, i.e., rule induction algorithms tailored to a single data set. Note that
the latter framework is hardly feasible on a hard scale in the case of conventional,
manuallydesigned algorithms, since the number of classification data sets greatly
outnumbers the number of rule induction algorithms designers. However, it is
clearly feasible on a large scale when using the proposed system, which automates
the process of rule induction algorithm design and implementation.
Overall, extensive computational experiments with 20 VCI data sets and 5
bioinformatics data sets showed that effective rule induction algorithms can be
automatically generated using the GGP in both frameworks. Moreover, the automatically
evolved rule induction algorithms were shown to be competitive with
(and overall slightly better than) four wellknown manually designed rule induction
algorithms when comparing their predictive accuracies.
The proposed GGP system was also compared to a grammarbased hillclimbing
system, and experimental results showed that the GGP system is a more
effective method to evolve rule induction algorithms than the grammarbased hillclimbing
method.
At last, a multiobjective version of the GGP (based on the concept of Pareto
dominance) was also proposed, and experiments were performed to evolve robust
rule induction algorithms which generate both accurate and simple models.
The results showed that in most of the cases the GGP system can produce
rule induction algorithms which are competitive in predictive accuracy to wellknown
humandesigned rule induction algorithms, but generate simpler classification
modes  i.e., smaller rule sets, intuitively easier to be interpreted by the
user.
