Invitation to submit to the special issue on Genetic Improvement in the Automated Software Engineering Journal

NEW A GI tutorial (details) will be given at GECCO 2026.

Jump to papers from:

GI@ICSE 2025
GI@ICSE 2024
GI@ICSE 2023
GI@GECCO 2022
GI@ICSE 2021
GI@GECCO 2020
GI@ICSE 2020
GI@GECCO 2019
GI@ICSE 2019
GI@GECCO 2018
GI@ICSE 2018
GI@GECCO 2017
GI@GECCO 2016
GI@GECCO 2015

GI@ICSE 2025: 14th International Workshop

A Three-Stage Genetic Algorithm for Compiler Flag and Library Version Selection to Minimize Execution Time
by Chi Ho Chan and Spyro Nita
DOI PDF VIDEO VIDEO URL Abstract

Existing research in compiler autotuning mainly focuses on selecting optimization flags without configurable values. However, the potential of selecting optimization flags with configurable values, alongside using directory and link flags for library version selection to improve performance, remains largely unexplored. We propose a three-stage Genetic Algorithm (GA) that incrementally selects optimization flags without configurable values, then optimization flags with configurable values, and finally library versions, to minimize software execution time. We also discuss the implementation challenges of the proposed algorithm and outline potential future work.

Automated Software Performance Improvement with Magpie
by Aymeric Blot
DOI PDF SLIDES VIDEO URL Abstract

In this tutorial, I will present Magpie (https://github.com/bloa/magpie), a powerful tool for both Genetic Improvement researchers and practitioners. Magpie stands at the forefront of software evolution, providing a streamlined approach to model, evolve, and automatically improve software systems. Addressing both functional and non-functional concerns, Magpie offers a user-friendly no-code interface that seamlessly integrates various search processes, as well as enabling easy Python code injection for advanced users to further tailor and specialise the improvement process to meet their specific needs. We will provide a concise overview of Magpie internals before exploring diverse real-world scenarios.

Empirical Comparison of Runtime Improvement Approaches: Genetic Improvement, Parameter Tuning, and Their Combination
by Thanatad Songpetchmongkol, Aymeric Blot, and Justyna Petke
DOI PDF PDF URL URL Abstract

Software can be optimised in various ways, e.g., by changing the code directly, modifying compiler or software paramters. From our literature review, we found that the best search strategies in genetic improvement and algorithm configuration, that generalise to both domains, are based on local search and genetic algorithms. We thus compared the two approaches for runtime improvement of the MiniSAT solver. We also explored the two search strategies on the joint search space of parameter and source code edits. We found that genetic improvement with first improvement local search led to the best results by improving MiniSAT runtime by 18.05 percent.

Enhancing Software Runtime with Reinforcement Learning-Driven Mutation Operator Selection in Genetic Improvement
by Damien Bose, Carol Hanna, and Justyna Petke
DOI PDF VIDEO VIDEO URL URL Abstract

Genetic Improvement employs heuristic search algorithms to explore the search space of program variants by modifying code using mutation operators. This research focuses on operators that delete, insert and replace source code statements. Traditionally, in GI, an operator is chosen uniformly at random at each search iteration. Reinforcement Learning to intelligently guide the selection of these operators specifically to improve program runtime. We propose to integrate RL into the operator selection process. Four Multi-Armed bandit RL algorithms (Epsilon Greedy, UCB, Probability Matching, and Policy Gradient) were integrated within a GI framework, and their efficacy and efficiency were bench marked against the traditional GI operator selection approach. These RL-guided operator selection strategies have demonstrated empirical superiority over the traditional GI methods of randomly selecting a search operator, with UCB emerging as the top-performing RL algorithm. On average, the UCB-guided Hill Climbing search algorithm produced variants that compiled and passed all tests 44% of the time, while only 22% of the variants produced by the traditional uniform random selection strategies compiled and passed all tests.

LLM-Assisted Crossover in Genetic Improvement of Software
by Dimitrios Stamatios Bouras, Sergey Mechtaev, and Justyna Petke
DOI PDF VIDEO VIDEO URL URL Abstract

We evaluated against five traditional crossover methods across seven benchmarks, measuring performance on four key metrics: average ranking, best variant execution time, efficiency in reaching performance milestones, and viable variant count. Results show that LLM-assisted crossover achieved an average ranking of 2.27 (on a scale where 1 is best and 6 is worst), making it the top-performing method across benchmarks based on the quality of the optimal variants produced. The LLM-based approach also improved the fitness (execution time) by an average of 8.5 percent over the best variant produced by the traditional methods. In terms of efficiency, the LLM-assisted crossover required on average 25.6 percent fewer variants to reach 25 percent, 50 percent, 75 percent, and 100 percent of the final performance improvement, compared to the traditional methods. Additionally, the LLM-assisted crossover produced 4.8 percent more viable variants across scenarios, including both source code modification and parameter tuning cases. These findings suggest that LLMs can significantly enhance genetic programming by guiding the crossover process toward more effective and viable solutions, providing motivation for further research in LLM-assisted evolutionary algorithms.

Large Language Model based Code Completion is an Effective Genetic Improvement Mutation
by Jingyuan Wang, Carol Hanna, and Justyna Petke
DOI PDF VIDEO VIDEO URL URL Abstract

we introduce a novel large language model (LLM)-based masking mutation operator for Genetic Improvement (GI), which leverages code completion capabilities of large language models to replace masked code segments with contextually relevant modifications. Our approach was tested on five open-source Java projects, where we compared its effectiveness against both traditional GI mutations and an existing LLM-based replacement mutation operator using random sampling and local search algorithms. Results show that the masking mutation operator creates a search space with more compiling and test-passing patches, reducing model response time by up to 60.7% compared to the replacement mutation. Additionally, it outperforms the replacement mutation in achieving the highest runtime improvement on four out of five projects and discovers more runtime-improving patches across all projects. However, combining the masking mutation with traditional GI mutations yielded inconsistent results, suggesting further investigation is needed. This study highlights the promise of LLM-based code completion to boost the efficiency and effectiveness of GI for automated software optimisation.

Put on Your Tester Hat: Improving programs for Automated Program Generation
by Shin Hwei Tan
DOI PDF Abstract

Given either a specification written in natural language or an input program, automated program generation techniques produce a program according to the given specification or by modifying the input program. Automated program generation is a powerful technique that can be used for finding bugs in software systems that take programs as input or fixing bugs in the input programs. However, most existing techniques focus on automated program generation for automated program repair or program synthesis. This talk introduces a different perspective of automated program generation where we will discuss our latest results on automated program generation for testing static program analyzers by designing different types of program transformations. We will also explore and rethink about the automated program generation problem from the tester perspective. The new perspective could have huge potential for the design of new genetic improvement techniques to improve programs for automated program generation.

The gem5 C++ glibc Heap Fitness Landscape
by William B. Langdon and Bobby R. Bruce
DOI PDF VIDEO VIDEO URL Abstract

Using the language independent genetic improvement tool MAGPIE (Machine Automated General Performance Improvement via Evolution of software) and logarithmic sampling, we measure the parameter fitness landscape when optimising the GNU glibc heap management of a million line C++ application, gem5. The malloc_info landscape is far smoother than is commonly assumed and savings of 11 percent with no loss of runtime speed are readily obtained by both Magpie and CMA-ES.

GI@ICSE 2024: 13th International Workshop

Automated Software Performance Improvement with Magpie
by Aymeric Blot
DOI PDF SLIDES VIDEO VIDEO URL URL Abstract

we present Magpie, a powerful tool for both Genetic Improvement researchers and practitioners. Magpie stands at the forefront of software evolution, providing a streamlined approach to model, evolve, and automatically improve software systems. Addressing both functional and non-functional concerns, Magpie offers a user-friendly no-code interface that seamlessly integrates various search processes, as well as enabling easy Python code injection for advanced users to further tailor and specialise the improvement process to meet their specific needs. We will provide a concise overview of Magpie's internals before exploring diverse real-world scenarios. Aymeric Blot is a Senior Lecturer at the Universite of Rennes, France.

Deep Mutations have Little Impact
by William B. Langdon and David Clark
DOI PDF PDF SLIDES VIDEO VIDEO VIDEO Abstract

Using the language independent genetic improvement tool MAGPIE (Machine Automated General Performance Improvement via Evolution of software), we measure the impact of GI on a non-deterministic deeply nested PARSEC VIPS parallel computing multi-threaded image processing benchmark written in C. More than 53% of mutants compile and generate identical output. Overall we find approximately 10% Failed Disruption Propagation (FDP). Excluding software detected internal errors and asserts, almost all changes deeper than 30 nested functions which are Executed and Infect data or change control are not Propagated. That is, almost all deep PIE code changes have no external impact. Suggesting (where it relies on testing) automatic software engineering on deeply nested code will be hard.

Ecosystem Curation in Genetic Improvement for Emergent Software Systems
by Zsolt Nemeth, Penn Faulkner Rainford, and Barry Porter
DOI PDF SLIDES VIDEO VIDEO Abstract

Emergent software systems are composed, and continuously recomposed at runtime, from a large pool of small potential building blocks with the aim of responding to changes in the deployment environment [REX]. The approach assumes that each building block has a set of available variations in the selection pool, such that the most appropriate collection of variants can be composed according to the current conditions, with some objective function in mind. Populating such a pool of implementation variation, however, is not a trivial task, and existing work has examined the use of Genetic Improvement (GI) to drive this process [rainford:2022:GECCO]

Executing One's Way out of the Chinese Room
by Shin Yoo
DOI PDF SLIDES VIDEO Abstract

One very attractive property of Large Language Models (LLMs) is their emergent in-context learning capability, which enables us to simply describe our requirements in natural languages and get the corresponding source code generated in programming languages. While LLMs as a generative model are known to hallucinate, i.e., generate factually incorrect contents, the fact that code can be executed can be used to fight this phenomenon. We briefly look at existing techniques designed to improve the correctness of code generated by LLMs, and will try to imagine the future of Genetic Improvement that is supported, enhanced, and driven by LLMs.

Genetic Improvement for DNN Security
by Hunter Baxter, Yu Huang, and Kevin Leach
DOI PDF SLIDES VIDEO VIDEO VIDEO Abstract

Genetic improvement (GI) in Deep Neural Networks (DNNs) has traditionally enhanced neural architecture and trained DNN parameters. Recently, GI has supported large language models by optimising DNN operator scheduling on accelerator clusters. However, with the rise of adversarial AI, particularly model extraction attacks, there is an unexplored potential for GI in fortifying Machine Learning as a Service (MLaaS) models. We suggest a novel application of GI, not to improve model performance, but to diversify operator parallelism for the purpose of a moving target defence against model extraction attacks. We discuss an application of GI to create a DNN model defense strategy that uses probabilistic isolation, offering unique benefits not present in current DNN defense systems.

Grammar evolution and symbolic regression for astrometric centering of Hubble Space Telescope images
by Ricardo Sarmiento, Marina de la Cruz, Alfonso Ortega, Terrence M. Girard, Dana I. Casetti-Dinescu, Alejandro Cervantes, and Roberto Baena-Galle
DOI PDF SLIDES VIDEO VIDEO URL Abstract

Symbolic regression, in general, and genetic models, in particular, are promising approaches to mathematical modeling in astrometry where it is not always clear which is the fittest analytic expression depending on the problem under consideration. Several attempts and increasing research efforts are being made in this direction mainly from the Genetic Programming (GP) viewpoint. Our proposal is, as far as we know, the first one to apply Grammatical Evolution (GE) in this domain. GE (and further GE extensions) aim to outperform GP limitations by incorporating formal language stools to guarantee the correctness (both syntactic and semantic) of the generated expressions. The current contribution is a first proof to check the viability of GE on astrometric real datasets. Its success in finding adequate parameters for predefined families of functions in star centering (Gaussian and Moffat PSFs) with simple and naive GE experiments supports our hypothesis on taking advantage of the expressive power of GE to tackle astrometry scenarios of interest and hence greatly improve current astrometric software thanks to specific genetic approaches.

Human Guidance Approaches for the Genetic Improvement of Software
by Benjamin J. Craine, Penn Faulkner Rainford, and Barry Porter
DOI PDF SLIDES VIDEO VIDEO Abstract

Existing research on Genetic Improvement (GI) of source code to improve performance [rainford:2022:GECCO] has examined the mixed application of code synthesis and traditional GI mutation/crossover to gain higher-performing individuals that are tailored to particular deployment contexts, for examples such as hash tables or scheduling algorithms. While demonstrating successful improvements, this research presents a host of challenges [Rainford:2021:GI], from search space size to fitness landscape shape, which raise questions on whether GI alone is able to present a complete solution. In this position paper we propose to augment GI processes with Human Guidance (HG) to offer a co-pilot paradigm which may overcome these challenges.

On Reducing Network Usage with Genetic Improvement
by James Callan, William B. Langdon, and Justyna Petke
DOI PDF SLIDES URL URL Abstract

Mobile applications can be very network-intensive. Mobile phone users are often on limited data plans, while network infrastructure has limited capacity. There is little work on optimising network usage of mobile applications. The most popular approach has been prefetching and caching assets. However, past work has shown that developers can improve the network usage of Android applications by making changes to Java source code. We built upon this insight and investigated the effectiveness of automated, heuristic application of software patches, a technique known as Genetic Improvement (GI), to improve network usage. Genetic improvement has already shown effective at reducing the execution time and memory usage of Android applications. We thus adapt our existing GIdroid framework with a new mutation operator and develop a new profiler to identify network-intensive methods to target. Unfortunately, our approach is unable to find improvements. We conjecture this is due to the fact source code changes affecting network might be rare in the large patch search space. We thus advocate use of more intelligent search strategies in future work.

GI@ICSE 2023: 12th International Workshop

All about the money: Cost modeling and optimization of cloud applications
by Sebastian Baltes
DOI PDF SLIDES SLIDES VIDEO VIDEO VIDEO Abstract

Cost is an essential non-functional property of cloud applications and is often a primary reason for companies to move to the cloud. One significant advantage of cloud platforms is the possibility to scale compute, storage, and networking resources up and down based on demand. However, as an application scales, so does the cost. Cost transparency of cloud applications is a common problem, and cloud providers have responded by providing means for detecting cost anomalies. However, detecting anomalies after billing is a workaround rather than a solution addressing the core problem. After introducing central cloud computing concepts and typical pricing approaches in the cloud, this talk outlines our vision of a vendor-agnostic cost model enabling reasoning about cost-optimal infrastructure and platform configurations based on expected workloads. The overall goal is to shift cost transparency left, i.e., to the developers and platform engineers who frequently provision cloud environments using web portals or Infrastructure-as-Code (IaC) files. The talk concludes by summarising the current trend towards Infrastructure-from-Code (IfC), where programming languages and cloud infrastructure descriptions converge into one paradigm, intending to automate infrastructure provisioning as much as possible. This area has huge potential for genetic improvement to optimize the IfC code and the provisioning mechanisms while balancing nonfunctional properties such as performance and cost.

DebugNS: Novelty Search for Finding Bugs in Simulators
by David Griffin, Susan Stepney, and Ian Vidamour
DOI PDF SLIDES VIDEO VIDEO Abstract

Novelty search is used to find a range of novel behaviours in a system. Software bugs are behaviours that are a) unexpected and b) incorrect. As the intersection between ``novel'' and ``unexpected'' is non-empty, here we overview how novelty search can be employed to find bugs in simulation software. We give an example of this approach applied to the RingSim simulator.

Exploring the Use of Natural Language Processing Techniques for Enhancing Genetic Improvement
by Oliver Krauss
DOI PDF SLIDES VIDEO VIDEO VIDEO Abstract

We explore the potential of using large-scale Natural Language Processing (NLP) models, such as GPT-3, for enhancing genetic improvement in software development These models have previously been used to automatically find bugs, or improve software. We propose using these models as a novel mutator, as well as for explaining the patches generated by genetic improvement algorithms. Our initial findings indicate promising results, but further research is needed to determine the scalability and applicability of this approach across different programming languages.

Generative Art via Grammatical Evolution
by Erik M. Fredericks, Abigail C. Diller, and Jared M. Moore
DOI PDF SLIDES VIDEO VIDEO URL Abstract

Generative art produces artistic output via algorithmic design. Common examples include flow fields, particle motion, and mathematical formula visualization. Typically an art piece is generated with the artist/programmer acting as a domain expert to create the final output. A large amount of effort is often spent manipulating and/or refining parameters or algorithms and observing the resulting changes in produced images. Small changes to parameters of the various techniques can substantially alter the final product. We present GenerativeGI, a proof of concept evolutionary framework for creating generative art based on an input suite of artistic techniques and desired aesthetic preferences for outputs. GenerativeGI encodes artistic techniques in a grammar, thereby enabling multiple techniques to be combined and optimized via a many-objective evolutionary algorithm. Specific combinations of evolutionary objectives can help refine outputs reflecting the aesthetic preferences of the designer. Experimental results indicate that GenerativeGI can successfully produce more visually complex outputs than those found by random search.

Genetic Improvement of OLC and H3 with Magpie
by William B. Langdon and Bradley J. Alexander
DOI PDF PDF SLIDES VIDEO VIDEO URL Abstract

Magpie (Machine Automated General Performance Improvement via Evolution of software) has been recently developed by Aymeric Blot from PyGGI 2.0. Like PyGGI, it claims to be able to optimise computer source code written in arbitrary programming languages. So far it has been demonstrated on benchmarks written in Python and C. Recently we have used hill climbing to customise two industrial open source programs: Google's Open Location Code OLC and Uber's Hexagonal Hierarchical Spatial Index H3 [W. B. Langdon et al., ``Genetic improvement of LLVM intermediate representation'', in EuroGP 2023]. Magpie found much faster improvements (reducing instruction counts by up to 15 percent v. 2 percent) which generalise. Various glitches in Magpie are also reported.

It's all in the Semantics: When are Genetically Improved Programs Still Correct?
by Myra B. Cohen
DOI PDF VIDEO Abstract

Genetic improvement (GI) is a powerful technique to automatically optimise programs, often for nonfunctional properties. As such, we expect to retain the original program semantics, hence GI is guided by both a functional test suite and at least one other objective such as program efficiency, memory usage, energy efficiency, etc. An assumption made is that it is possible to improve a program's non-functional objective while retaining the program's correctness, however, this assumption may not hold for all types of non-functional properties. In this talk I show why GI is naturally a multi-objective optimization problem and argue that it may be necessary to relax part of the program correctness to satisfy our non-functional goals. I discuss a few recent examples where we have had to balance functional correctness and non-functional objectives and demonstrate how this may lead to programs that are of higher quality in the end. This raises an important question about when it is possible to completely satisfy multiple (potentially competing) program objectives during GI, and when it is semantically impossible. This leads to the ultimate question of what it means for a program to be correct when using GI.

Towards Objective-Tailored Genetic Improvement Through Large Language Models
by Sungmin Kang and Shin Yoo
DOI PDF PDF SLIDES VIDEO VIDEO URL Abstract

While Genetic Improvement (GI) is a useful paradigm to improve functional and nonfunctional aspects of software, existing techniques tended to use the same set of mutation operators for differing objectives, due to the difficulty of writing custom mutation operators. we suggest that Large Language Models (LLMs) can be used to generate objective-tailored mutants, expanding the possibilities of software optimisations that GI can perform. We further argue that LLMs and the GI process can benefit from the strengths of one another, and present a simple example demonstrating that LLMs can both improve the effectiveness of the GI optimization process, while also benefiting from the evaluation steps of GI. As a result, we believe that the combination of LLMs and GI has the capability to significantly aid developers in optimizing their software.

Updating Gin's profiler for current Java
by Myles Watkinson and Alexander Brownlee
DOI PDF SLIDES VIDEO VIDEO URL Abstract

Genetic improvement is a young and growing field. With much research still to be done, a number of tools to support the research community have emerged, with Gin being one such tool targeted at GI for Java. One core component of Gin is the profiler, which is used to identify hot methods in target applications: methods where the CPU spends most time and so may offer the most fertile sections of code for improvements to run time. Gin's profiler is HPROF, which was included with JDKs up to version 8. HPROF is no longer supported and so needs replaced if Gin is to support later versions of Java. Furthermore, little investigation has been made within the GI community comparing different profiling approaches. With this paper and its associated accepted pull request, we replace Gin's CPU profiler with Java Flight Recorder (JFR) to allow Gin to be applied to current Java code, allowing researchers working in GI with more recent JVMs to easily integrate profiling in their pipeline. We also contribute an experimental comparison of the HPROF and JFR profilers for the JVM.

GI@GECCO 2022: 11th International Workshop

Amaru - A Framework for combining Genetic Improvement with Pattern Mining
by Oliver Krauss
DOI PDF SLIDES VIDEO URL Abstract

We present Amaru, a framework for Genetic Improvement using Abstract Syntax Trees directly at the interpreter and compiler level. Amaru also enables the mining of frequent, discriminative patterns from Genetic Improvement populations. These patterns in turn can be used to improve the crossover and mutation operators to increase population diversity, reduce the number of individuals failing at run-time and increasing the amount of successful individuals.

Automatically Exploring Computer System Design Spaces
by Bobby R. Bruce
DOI PDF SLIDES Abstract

While much research has focused on using search to optimize software, it is possible to use these same approaches to optimise other parts of the computer system. With decreased costs in silicon customisation, and the return of centralized systems carrying out specialized tasks, the time is right to begin thinking about tools to optimise systems for the needs of specific software targets. In the approach outlined, the standard Genetic Improvement process is flipped with source code considered static and the remainder of the computer system altered to the needs of the software. The project proposed is preliminary research into incorporating grammar-based GP with an advanced computer architecture simulator to automatically design computer systems. I argue this approach has the potential to significantly improve the design of computer systems while reducing manual effort.

Dissecting Copy/Delete/Replace/Swap mutations: Insights from a GIN Case Study
by Sherlock A. Licorish and Markus Wagner
DOI PDF SLIDES VIDEO Abstract

Research studies are increasingly critical of publicly available code due to evidence of faults. This has led researchers to explore ways to improve such code, with static analysis and genetic code improvement previously singled out. Previous work has evaluated the feasibility of these techniques, using PMD and GIN for enhancing Stack Overflow code snippets. Results reported in this regard pointed to the potential of these techniques, especially in terms of GIN removal of PMD performance faults from 58 programs. We use a contextual lens to explore these mutations in this study, to evaluate the promise of these techniques. The outcomes show that while the programs were syntactically correct after GIN mutations, many of GIN mutations changed the semantics of the code, rendering its purpose questionable. However, certain code mutations tend to retain code semantics more than others. In addition, GINs mutations at times affected PMDs parsing ability, potentially increasing false negatives. Overall, while these approaches may prove useful, full utility may not be claimed at this time. For enhancing the outcomes of these approaches, we outline multiple strategies.

Evaluation of Genetic Improvement Tools for Improvement of Non-functional Properties of Software
by Shengjie Zuo, Aymeric Blot, and Justyna Petke
DOI PDF URL Abstract

Genetic improvement (GI) improves both functional properties of software, such as bug repair, and non-functional properties, such as execution time, energy consumption, or source code size. There are studies summarising and comparing GI tools for improving functional properties of software; however there is no such study for improvement of its non-functional properties using GI. Therefore, this research aims to survey and report on the existing GI tools for improvement of non-functional properties of software. We conducted a literature review of available GI tools, and ran multiple experiments on the found open-source tools to examine their usability. We applied a cross-testing strategy to check whether the available tools can work on different programs. Overall, we found 63 GI papers that use a GI tool to improve non-functional properties of software, within which 31 are accompanied with open-source code. We were able to successfully run nine GI tools, and found that ultimately only two, Gin and PyGGI, can be readily applied to new general software.

From Deep Learning to Human Judgments: Lessons for Genetic Improvement
by Westley Weimer
SLIDES VIDEO

Genetic Improvement in the Shackleton Framework for Optimizing LLVM Pass Sequences
by Shuyue Stella Li, Hannah Peeler, Andrew N. Sloss, Kenneth N. Reid, and Wolfgang Banzhaf
DOI PDF PDF SLIDES VIDEO URL URL Abstract

Genetic Improvement is a search technique that aims to improve a given acceptable solution to a problem. we present the novel use of genetic improvement to find problem-specific optimized LLVM Pass sequences. We develop a Pass-level edit representation in the linear genetic programming framework, Shackleton, to evolve the modifications to be applied to the default optimization Pass sequences. Our GI-evolved solution has a mean of 3.7percent runtime improvement compared to the default LLVM optimisation level -O3 which targets runtime. The proposed GI method provides an automatic way to find a problem-specific optimization sequence that improves upon a general solution without any expert domain knowledge. we discuss the advantages and limitations of the GI feature in the Shackleton Framework and present our results.

Genetic Improvement of Shoreline Evolution Forecasting Models
by Mahmoud Al Najar, Rafael Almar, Erwin W. J. Bergsma, Jean-Marc Delvit, and Dennis G. Wilson
DOI PDF SLIDES SLIDES VIDEO Abstract

Coastal development and climate change are changing the geography of our coasts, while more and more people are moving towards the coasts. Recent advances in artificial intelligence allow for automatic analysis of observational data. Cartesian Genetic Programming (CGP) is a form of Genetic Programming (GP) that has been successfully used in a large variety of tasks including data-driven symbolic regression. We formulate the problem of shoreline evolution forecasting as a Genetic Improvement (GI) problem using CGP to encode and improve upon ShoreFor, an equilibrium shoreline prediction model, to study the effectiveness of CGP in GI in forecasting tasks. This work presents an empirical study of the sensitivity of CGP to a number of evolutionary configurations and constraints and compares the performances of the evolved models to the base ShoreFor model.

Leveraging Fuzzy System to Reduce Uncertainty of Decision Making in Software Engineering Automation
by Yueke Zhang and Yu Huang
DOI PDF SLIDES VIDEO Abstract

Decision making in automated GI-based software engineering tasks can significantly affect the performance of the system. However, modern SE usually presents high uncertainty in such decision making process due to the existence of multiple solutions that reply on different heuristics. We propose to apply the theory of Fuzzy System to obtain a final decision with lower uncertainty and higher accuracy. We also demonstrate a motivating example and discuss the challenges and opportunities for applying fuzzy system to SE tasks.

Opportunities for Genetic Improvement of Cryptographic Code
by Chitchanok Chuengsatiansup, Markus Wagner, and Yuval Yarom
DOI PDF SLIDES VIDEO Abstract

Cryptography is one of the main tools underlying the security of our connected world. Cryptographic code must meet not only high security requirements, but also exhibit excellent non-functional properties, such as high performance and unique security requirements. As both automatic code generation and genetic improvement of such code are under explored, we motivate here what makes cryptographic code a prime target for future research.

Py2Cy: A Genetic Improvement Tool To Speed Up Python
by James Zhong, Max Hort, and Federica Sarro
DOI PDF SLIDES VIDEO URL Abstract

Due to its ease of use and wide range of custom libraries, Python has quickly gained popularity and is used by a wide range of developers all over the world. While Python allows for fast writing of source code, the resulting programs are slow to execute when compared to programs written in other programming languages like C. One of the reasons for its slow execution time is the dynamic typing of variables. Cython is an extension to Python, which can achieve execution speed-ups by compiler optimization. One possibility for improvements is the use of static typing, which can be added to Python scripts by developers. To alleviate the need for manual effort,we create Py2Cy, a Genetic Improvement tool for automatically converting Python scripts to statically typed Cython scripts. To show the feasibility of improving runtime with Py2Cy, we optimise a Python script for generating Fibonacci numbers. The results show that Py2Cy is able to speed up the execution time by up to a factor of 18.

The case for Grammatical Evolution in test generation
by Aidan Murphy, Thomas Laurent, and Anthony Ventresque
DOI PDF SLIDES VIDEO Abstract

Generating tests for software is an important, but difficult, task. Search-based test generation is promising, as it reduces the time required from human experts, but suffers from many problems and limitations. Namely, the inability to fully incorporate a testers domain knowledge into the search, its difficulty in creating very complex objects, and the problems associated with variable length tests. This paper illustrates how Grammatical Evolution could address and provide a possible solution to each of these concerns.

Towards evolution-based autonomy in large-scale systems
by Damien Anderson, Paul Harvey, Yusaku Kaneta, Petros Papadopoulos, Philip Rodgers, and Marc Roper
DOI PDF SLIDES VIDEO Abstract

To achieve truly autonomous behaviour systems need to adapt to not only previously unseen circumstances but also previously unimagined ones. A hierarchical evolutionary system is proposed which is capable of displaying the emergent behaviour necessary to achieve this goal. we report the practical challenges encountered in implementing this proposed approach in a large-scale open-source system.

GI@ICSE 2021: 10th International Workshop

(Genetically) Improving Novelty in Procedural Story Generation
by Erik Fredericks and Byron DeVries
DOI PDF VIDEO VIDEO VIDEO VIDEO URL URL Abstract

Procedural story generation (PCG) tailors a unique narrative experience for a player and can be accomplished via multiple techniques, from matching storylets to grammar-based generation. There exists a rich opportunity for evolutionary algorithms to be applied to this domain for intelligently constructing game narratives. We describe a conceptual procedure for applying genetic improvement to a grammar-driven procedural narrative within the context of a browser-based game.

A Permutation Representation of Covering Arrays
by Ryan Dougherty and Xi Jiang
DOI VIDEO VIDEO VIDEO Abstract

Testing a large-scale system requires understanding how each of the components interact with each other, which is the subject of interaction testing. Covering arrays are a well-studied object, but traditional representations of these arrays in the context of genetic algorithms has not yielded much success. We propose a new representation of covering arrays based on a permutation of the rows considered. Preliminary results for reducing the mean-time-to-failure of these arrays are given.

Applying Automated Program Repair to Dataflow Programming Languages
by Yu Huang, Hammad Ahmad, Stephanie Forrest, and Westley Weimer
DOI VIDEO VIDEO VIDEO Abstract

Dataflow programming languages are used in a variety of settings, and defects in their programs can have serious consequences. However, prior work in automated program repair (APR) emphasizes control flow over dataflow languages. We identify three impediments to the use of APR in dataflow programming, parallelism, state, and evaluation, and highlight opportunities for overcoming them.

CRNRepair: Automated Program Repair of Chemical Reaction Networks
by Ibrahim Mesecan, Michael C. Gerten, James I. Lathrop, Myra B. Cohen, and Tomas Haddad Caldas
DOI PDF VIDEO VIDEO VIDEO URL Abstract

Chemical reaction networks (CRNs) are abstractions of distributed networks that form the foundations of many natural phenomena such as biological processes. These can be encoded and/or compiled into DNA and have been shown to be Turing complete. Before CRNs are implemented in a physical environment, they are often simulated in programming environments. Like traditional programs, these CRN programs must be validated. Researchers have recently designed a software testing framework for CRNs, however, repairing CRN programs is still a manual task. While the programs are often small in size, finding and repairing the faults can be difficult without automated support. we present CRNRepair, a program repair framework for CRN programs. We built our framework on top of an existing APR framework. We use a testing infrastructure built in the Matlab SimBiology package and adapt it to use the SBML representation for its abstract syntax tree. In a case study on 19 mutant versions of 2 programs, we find plausible patches for 90 percent of one of the programs, and 50 percent of the other. We find several common types of repairs, which differ from the correct programs, but are functionally correct.

Engineering and Evolving Software
by Stephanie Forrest
DOI VIDEO URL Abstract

Provides a brief professional biography of the presenter Stephanie Forrest of Arizona State University. The complete presentation was not made available for publication as part of the conference proceedings.

Exploring the Accuracy -- Energy Trade-off in Machine Learning
by Alexander E. I. Brownlee, Jason Adair, Saemundur O. Haraldsson, and John Jabbo
DOI PDF VIDEO VIDEO VIDEO URL Abstract

Machine learning accounts for considerable global electricity demand and resulting environmental impact, as training a large deep-learning model produces 284000 kgs of the greenhouse gas carbon dioxide. In recent years, search-based approaches have begun to explore improving software to consume less energy. Machine learning is a particularly strong candidate for this because it is possible to trade off functionality (accuracy) against energy consumption, whereas with many programs functionality is simply a pass-or-fail constraint. We use a grid search to explore hyperparameter configurations for a multilayer perceptron on five classification data sets, considering trade-offs of classification accuracy against training or inference energy. On one data set, we show that 77 percent of energy consumption for inference can saved by reducing accuracy from 94.3 percent to 93.2 percent. Energy for training can also be reduced by 30-50 percent with minimal loss of accuracy. We also find that structural parameters like hidden layer size is a major driver of the energy-accuracy trade-off, but there is some evidence that non-structural hyperparameters influence the trade-off too. We also show that a search-based approach has the potential to identify these trade-offs more efficiently than the grid search.

Generating Objected-Oriented Source Code Using Genetic Programming
by Vicente Illanes and Alexandre Bergel
DOI PDF PDF VIDEO VIDEO VIDEO Abstract

Using machine learning to generate source code is an active and highly important research area. In particular,it has been shown that genetic programming (GP) efficiently contributes to software repair. However, most of the published advances on applying GP to generate source code are limited to the C programming language, a statically-typed procedural language. As a consequence, applying GP to object-oriented and dynamically-typed languages may represent a significant opportunity. explores the use of genetic programming to generate objected-oriented source code in a dynamically-typed setting. We found that GP is able to produce missing one-line statements with a precision of 51 percent. Our preliminary results contributes to the state of the art by indicating that GP maybe effectively employed to generate source code for dynamically-typed object-oriented applications.

Open Challenges in Genetic Improvement for Emergent Software Systems
by Penelope Faulkner Rainford and Barry Porter
DOI PDF VIDEO VIDEO VIDEO Abstract

Genetic improvement for emergent software systems faces unique challenges due to its deployment in highly dynamic environments. We discuss four of those challenges along with our initial plans for new research.

Optimising SQL Queries Using Genetic Improvement
by James Callan and Justyna Petke
DOI PDF VIDEO VIDEO VIDEO Abstract

Structured Query Language (SQL) queries are ubiquitous in modern software engineering. These queries can be costly when run on large databases with many entries and tables to consider. We propose using Genetic Improvement (GI) to explore patches for these queries, with the aim of optimising their execution time, whilst maintaining the functionality of the program in which they are used. Specifically, we propose three ways in which SQL JOIN statements can be mutated in order to improve performance. We also discuss the requirements of software being improved in this manner and the potential challenges of our approach.

Partial Specifications for Program Repair
by Linsey Kitt and Myra B. Cohen
DOI PDF VIDEO VIDEO VIDEO Abstract

we argue for using many partial test suites instead of one full test suite during program repair. This may provide a pool of simpler, yet correct patches, addressing both the overfitting and poor repair quality problem. To support this idea, we present some insight obtained running APR partial test suites on the well studied triangle program.

Uniform Edit Selection for Genetic Improvement: Empirical Analysis of Mutation Operator Efficacy
by Marta Smigielska, Aymeric Blot, and Justyna Petke
DOI PDF VIDEO VIDEO VIDEO URL Abstract

Genetic improvement (GI) tools find improved program versions by modifying the initial program. These can be used for the purpose of automated program repair (APR). GI uses software transformations, called mutation operators, such as deletions, insertions, and replacements of code fragments. Current edit selection strategies, however, under-explore the search spaces of insertion and replacement operators. Therefore, we implement a uniform strategy based on the relative operator search space sizes. We evaluate it on the QuixBugs repair benchmark and find that the uniform strategy has the potential for improving APR tool performance. We also analyse the efficacy of the different mutation operators with regard to the type of code fragment they are applied to. We find that, for all operators, choosing expression statements as target statements is the most successful for finding program variants with improved or preserved fitness (50.03 percent, 33.12 percent and 23.85 percent for deletions, insertions and replacements, respectively), whereas choosing declaration statements is the least effective (3.16 percent, 10.82 percent and 3.14 percent for deletions, insertions and replacements).

Using Genetic Improvement to Retarget quantum Software on Differing Hardware
by George O'Brien and John Clark
DOI VIDEO VIDEO VIDEO Abstract

Quantum computers are rapidly developing to a point where they can solve problems faster than any classical computation. As they're developed competing standards for languages, models and architectures are also being created. These standards are often bespoke and aimed at optimizing around a single algorithm or problem. This can make it very difficult to reuse these them should the original hardware become unavailable or obsolete. We demonstrate a method that can compile circuits more generally across hardware constraints with the use of a genetic improvement inspired search technique that includes a realistic model of the hardware. We show that this method is effective at selecting gates that can be more easily implemented and ran compared to generic optimization which reduces the total chance of failure. To ensure that these results are practical, empirical results are generated using different IBM hardware and a selection of real algorithms.

GI@GECCO 2020: 9th International Workshop

An Annotated Dataset of Stack Overflow Post Edits
by Sebastian Baltes and Markus Wagner
DOI PDF PDF URL Abstract

To improve software engineering, software repositories have been mined for code snippets and bug fixes. Typically, this mining takes place at the level of files or commits. To be able to dig deeper and to extract insights at a higher resolution, we hereby present an annotated dataset that contains over 7 million edits of code and text on Stack Overflow. Our preliminary study indicates that these edits might be a treasure trove for mining information about fine-grained patches, e.g., for the optimisation of non-functional properties.

Evolving sqrt into 1/x via Software Data Maintenance
by William B. Langdon and Oliver Krauss
DOI PDF SLIDES VIDEO Abstract

While most software automation research concentrates on programs code, we have started investigating if Genetic Improvement (GI) of data can assist developers by automating aspects of the maintenance of parameters embedded in source code. We extend recent GI work on optimising compile time constants to give new functionality and describe the transformation of a GNU C library square root function into the double precision reciprocal function, drcp. Multiplying by 1/x (drcp) allows division free division without requiring the hardware to support division. The evolution (6 seconds) and indeed the GI dp division (7.14 nS) are both surprisingly fast.

Genetic Improvement of Software Efficiency: The Curse of Fitness Estimation
by Mahmoud A. Bokhari, Markus Wagner, and Brad Alexander
DOI PDF Abstract

Many challenges arise in the application of Genetic Improvement (GI) of Software to improve non-functional requirements of software such as energy use and run-time. These challenges are mainly centred around the complexity of the search space and the estimation of the desired fitness function. For example, such fitness function are expensive, noisy and estimating them is not a straight-forward task. we illustrate some of the challenges incomputing such fitness functions and propose a synergy between in-vivo evaluation and machine learning approaches to overcome such issues.

Optimising the Fit of Stack Overflow Code Snippets into Existing Code
by Brittany Reid, Christoph Treude, and Markus Wagner
DOI PDF PDF URL URL Abstract

Software developers often reuse code from online sources such as Stack Overflow within their projects. However, the process of searching for code snippets and integrating them within existing source code can be tedious. In order to improve efficiency and reduce time spent on code reuse, we present an automated code reuse tool for the Eclipse IDE (Integrated Developer Environment), NLP2TestableCode. NLP2TestableCode can not only search for Java code snippets using natural language tasks, but also evaluate code snippets based on a user's existing code, modify snippets to improve fit and correct errors, before presenting the user with the best snippet, all without leaving the editor. NLP2TestableCode also includes functionality to automatically generate customisable test cases and suggest argument and return types, in order to further evaluate code snippets. In evaluation, NLP2TestableCode was capable of finding compilable code snippets for 82.9percent of tasks, and testable code snippets for 42.9percent.

Tuning Genetic Algorithm Parameters using Design of Experiments
by Mohsen Mosayebi and Manbir Sodhi
DOI PDF Abstract

Tuning evolutionary algorithms is a persistent challenge in thefield of evolutionary computing. The efficiency of an evolutionary algorithm relates to the coding of the algorithm, the design of the evolutionary operators and the parameter settings for evolution. we explore the effect of tuning the operators and parameters of a genetic algorithm for solving the Traveling Salesman Problem using Design of Experiments theory. Small scale problems are solved with specific settings of parameters including population size, crossover rate, mutation rate and the extent of elitism. Good values of the parameters suggested by the experiments are used to solve large scale problems. Computational tests show that the parameters selected by this process result in improved performance both in the quality of results obtained and the convergence rate when compared with untuned parameter settings.

GI@ICSE 2020: 8th International Workshop

Checkers: Multi-modal Darwinian API Optimisation
by Santanu Kumar Dash, Fan Wu, Michail Basios, Lingbo Li, and Leslie Kanthan
DOI PDF SLIDES VIDEO Abstract

Advent of microservices has increased the popularity of the API-first design principles. Developers have been focusing on concretising the API to a system before building the system. An API-first approach assumes that the API will be correctly used. Inevitably, most developers, even experienced ones, end-up writing sub-optimal software because of using APIs incorrectly. we discuss an automated approach for exploring API equivalence and a framework to synthesise semantically equivalent programs. Unlike existing approaches to API transplantation, we propose an amorphous or formless approach to software translation in which a single API could potentially be replaced by a synthesised sequence of APIs which ensures type progress. Our search is guided by the non-functional goals for the software, a type-theoretic notion of progress and an automatic multi-modal embedding of the API from its documentation and code analysis.

Human Factors in the Study of Automatic Software Repair: Future Directions for Research with Industry
by Emily Winter, David Bowes, Steve Counsell, Tracy Hall, Saemundur Haraldsson, Vesna Nowack, and John Woodward
DOI PDF SLIDES VIDEO Abstract

Automatic software repair represents a significant development in software engineering, promising considerable potential change to the working procedures and practices of software engineers, developers, and testers. Technical advances within this domain have been the focus of many recent publications. However, despite the simultaneous rising prominence of studies that consider the role of human factors within software engineering, there has not been an equivalent growth of studies of human factors within the domain of automatic software repair. This position paper presents the case for increased research in this area and suggests three key focuses and approaches for a future research agenda. These are: considerations that go beyond the current focus on the usability of automatic software repair tools; longitudinal studies; and the use of a wide range of appropriate social research methods, not just surveys. All three of these enable industry-based software engineers not just to provide feedback on automatic software repair tools but to participate in the shaping of these technologies, facilitating the development of tools that meet developer and industry needs, as well as allaying any concerns.

Stack-Based Genetic Improvement
by Aymeric Blot and Justyna Petke
DOI PDF SLIDES VIDEO Abstract

Genetic improvement (GI) uses automated search to find improved versions of existing software. If originally GI directly evolved populations of software, most GI work nowadays use a solution representation based on a list of mutations. This representation however has some limitations, notably in how genetic material can be re-combined. We introduce a novel stack-based representation and discuss its possible benefits

Synthetic Benchmarks for Genetic Improvement
by Aymeric Blot and Justyna Petke
DOI PDF SLIDES VIDEO Abstract

Genetic improvement (GI) uses automated search to find improved versions of existing software. If over the years the potential of many GI approaches have been demonstrated, the intrinsic cost of evaluating real-world software makes comparing these approaches in large-scale meta-analyses very expensive. We propose and describe a method to construct synthetic GI benchmarks, to circumvent this bottleneck and enable much faster quality assessment of GI approaches.

Towards Knowledge Guided Genetic Improvement
by Oliver Krauss, Hanspeter Moessenboeck, and Michael Affenzeller
DOI PDF SLIDES VIDEO Abstract

We outline a combination of Grammar Guided Genetic Programming with Tree Genetic Programming. Instead of using a grammar directly, an operator graph based on that grammar is created, that is responsible for producing abstract syntax trees. Each operator contains knowledge about the grammar symbol it represents and returns only trees valid according to user-defined restrictions such as depth, complexity and approximated run-time performance. Initial Results show that the approach reduces invalid individuals in an evolutionary run, while supporting the extension towards further non-functional features

WES: Agent-based User Interaction Simulation on Real Infrastructure
by John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Ralf Laemmel, Erik Meijer, Silvia Sapora, and Justin Spahr-Summers
DOI PDF VIDEO URL Abstract

We introduce the Web-Enabled Simulation (WES) research agenda, and describe FACEBOOK WW system. We describe the application of WW to reliability, integrity and privacy at FACEBOOK, where it is used to simulate social media interactions on an infrastructure consisting of hundreds of millions of lines of code. The WES agenda draws on research from many areas of study, including Search Based Software Engineering, Machine Learning, Programming Languages, Multi Agent Systems, Graph Theory, Game AI, and AI Assisted Game Play. We conclude with a set of open problems and research challenges to motivate wider investigation.

GI@GECCO 2019: 7th International Workshop

A Survey of Genetic Improvement Search Spaces
by Justyna Petke, Brad Alexander, Earl T. Barr, Alexander E. I. Brownlee, Markus Wagner, and David R. White
DOI PDF Abstract

Genetic Improvement (GI) uses automated search to improve existing software. Most GI work has focused on empirical studies that successfully apply GI to improve softwares running time, fix bugs, add new features, etc. There has been little research into why GI has been so successful. For example, genetic programming has been the most commonly applied search algorithm in GI. Is genetic programming the best choice for GI? Initial attempts to answer this question have explored GIs mutation search space. This paper summarises the work published on this question to date

Genetic Algorithms for Affine Transformations to Existential t-Restrictions
by Ryan E. Dougherty, Erin Lanus, Charles J. Colbourn, and Stephanie Forrest
DOI PDF VIDEO Abstract

The subject of t-restrictions has garnered considerable interest recently as it encompasses many different types of combinatorial objects, all of which have unique and important applications. One of the most popular of these is an ingredient in the generation of covering arrays, which are used for discovering faulty interactions among software components. We focus on existential t-restrictions, which have a structure that can be exploited by genetic algorithms. In particular, recent work on such restrictions considers affine transformations while maximizing the corresponding score of the formed restriction. We propose to use genetic algorithms for existential t-restrictions by providing a general framework that can be applied to all such objects.

Genetic Improvement of Data gives double precision invsqrt
by W. B. Langdon
DOI PDF SLIDES URL Abstract

CMA-ES plus manual code changes rapidly transforms 512 Newton-Raphson start points from a GNU C library table driven version of sqrt into a double precision reciprocal square root function. The GI 1/sqrt(x) is far more accurate than Quake's InvSqrt, Quare root.

On Adaptive Specialisation in Genetic Improvement
by Aymeric Blot and Justyna Petke
DOI PDF Abstract

Genetic improvement uses automated search to find improved versions of existing software. Software can either be evolved with general-purpose intentions or with a focus on a specific application (e.g., to improve its efficiency for a particular class of problems). Unfortunately, software specialisation to each problem application is generally performed independently, fragmenting and slowing down an already very time-consuming search process. We propose to incorporate specialisation as an online mechanism of the general search process, in an attempt to automatically devise application classes, by benefiting from past execution history.

The Quest for Non-Functional Property Optimisation in Heterogeneous and Fragmented Ecosystems: a Distributed Approach
by Mahmoud A. Bokhari, Markus Wagner, and Brad Alexander
DOI PDF Abstract

The optimisation of non-functional properties of software is of growing importance in all scales of modern computing (from embedded systems to data-centres). In mobile computing, smart devices have complex interactions between their hardware and software components. Small changes in the environment can greatly impact the measurements of non-functional properties of software. In-vivo optimisation of applications on a platform can be used to evolve robust new solutions. However, the portability of such solutions performance across different platforms is questionable. In this paper we discuss the issue of optimising the non-functional properties of applications running in the Android ecosystem. We also propose a distributed framework that can mitigate such issues.

Toward Human-Like Summaries Generated from Heterogeneous Software Artefacts
by Mahfouth Alghamdi, Christoph Treude, and Markus Wagner
DOI PDF PDF URL Abstract

Automatic text summarisation has drawn considerable interest in the field of software engineering. It can improve the efficiency of software developers, enhance the quality of products, and ensure timely delivery. In this paper, we present our initial work towards automatically generating human-like multi-document summaries from heterogeneous software artefacts. Our analysis of the text properties of 545 human-written summaries from 15 software engineering projects will ultimately guide heuristics searches in the automatic generation of human-like summaries.

GI@ICSE 2019: 6th International Workshop

Better Code Search and Reuse for Better Program Repair
by Qi Xin and Steven Reiss
DOI PDF SLIDES URL Abstract

A branch of automated program repair (APR) techniques look at finding and reusing existing code for bug repair. ssFix is one of such techniques that is syntactic search-based: it searches a code database for code fragments that are syntactically similar to the bug context and reuses such code fragments to produce patches. The keys to its success lie in the approaches it uses for code search and code reuse. We investigated the effectiveness of ssFix using the Defects4J bug dataset and found that its code search and code reuse approaches are not truly effective and can be significantly improved. Motivated by the investigation, we developed a new repair technique sharpFix that follows ssFix's basic idea but differs significantly in the approaches used for code search and code reuse. We compared sharpFix and ssFix on the Defects4J dataset and confirmed through experiments that (1) sharpFix's code search and code reuse approaches are better than ssFix's approaches and (2) sharpFix can do better repair. sharpFix successfully repaired a total of 36 Defects4J bugs and outperformed many existing repair techniques in repairing more bugs. We also compared sharpFix, ssFix, and four other techniques on another dataset Bugs.jar-ELIXIR. Our results show that sharpFix did better than others and repaired the largest number of bugs.

Evolutionary Computation for Improving Malware Analysis
by Kevin Leach, Ryan Dougherty, Chad Spensky, Stephanie Forrest, and Westley Weimer
DOI PDF PDF SLIDES Abstract

Research in genetic improvement (GI) conventionally focuses on the improvement of software, including the automated repair of bugs and vulnerabilities as well as the refinement of software to increase performance. Eliminating or reducing vulnerabilities using GI has improved the security of benign software, but the growing volume and complexity of malicious software necessitates better analysis techniques that may benefit from a GI-based approach. Rather than focus on the use of GI to improve individual software artefacts, we believe GI can be applied to the tools used to analyse malicious code for its behaviour. First, malware analysis is critical to understanding the damage caused by an attacker, which GI-based bug repair does not currently address. Second, modern malware samples leverage complex vectors for infection that cannot currently be addressed by GI. In this paper, we discuss an application of genetic improvement to the realm of automated malware analysis through the use of variable-strength covering arrays.

Fuzzy Edit Sequences in Genetic Improvement
by Aymeric Blot
DOI PDF PDF SLIDES Abstract

Genetic improvement uses automated search to find improved versions of existing software. Edit sequences have been proposed as a very convenient way to represent code modifications, focusing on the changes themselves rather than duplicating the entire program. However, edits are usually defined in terms of practical operations rather than in terms of semantic changes; indeed, crossover and other edit sequence mutations usually never guarantee semantic preservation. We propose several changes to usual edit sequences, specifically augmenting edits with content data and using fuzzy matching, in an attempt to improve semantic preservation.

Genetic Improvement of GPU Code
by Jhe-Yu Liou, Stephanie Forrest, and Carole-Jean Wu
DOI PDF PDF SLIDES URL URL Abstract

As the programming stack and tool support for GPU have matured, GPUs have become accessible to programmers who often lack domain-specific knowledge of the underlying architecture and fail to fully leverage the GPU's computation power. This paper presents GEVO (Gpu EVOlution), a tool for automatically tuning the performance of GPU kernels in the LLVM representation to meet desired criteria. GEVO uses population-based search to find edits to programs compiled to LLVM-IR that improve performance on desired criteria and retain required functionality. GEVO extends earlier GI work by operating directly on the LLVM-IR without custom representations or other manual interventions. We demonstrate that GEVO improves runtime on NVIDIA Tesla P100 for many programs in the Rodinia benchmark suite and a supervised machine learning code, ThunderSVM. For the Rodinia benchmark, GEVO improves GPU kernel runtime performance by an average of 13.87percent and as much as 43percent over the fully compiler-optimized baseline. If the kernel output accuracy is relaxed to tolerate 1percent error, GEVO can find kernel variants that outperform the baseline version by an average of 15.47percent. For ThunderSVM, GEVO reduces entire model training time by 50percent and 24.8percent, for MNIST handwriting recognition dataset and a9a income prediction, where the accuracy of trained model are improved by 0.17percent and 0.04percent respectively.

Industrial experience of Genetic Improvement in Facebook
by Nadia Alshahwan
DOI PDF Abstract

Facebook recently had their first experience with Genetic Improvement (GI) by developing and deploying the automated bug fixing tool SapFix. The experience was successful resulting in landed fixes but also very educational. This paper will briefly outline some of the challenges for GI that were highlighted by this experience as well as a look at future directions in the area of mobile apps.

Leveraging Program Invariants to Promote Population Diversity in Search-Based Automatic Program Repair
by Zhen Yu Ding, Yiwei Lyu, Christopher Timperley, and Claire Le Goues
DOI PDF SLIDES Abstract

Search-based automatic program repair has shown promise in reducing the cost of defects in real-world software.However, to date, such techniques have typically been most successful when constructing short or single-edit repairs. This is true even when techniques make use of heuristic search strategies, like genetic programming, that in principle support the construction of patches of arbitrary length. One key reason is that the fitness function traditionally depends entirely on test cases, which are poor at identifying partially correct solutions and lead to a fitness landscape with many plateaus. We propose a novel fitness function that optimizes for both functionality and semantic diversity, characterized using learned invariant solver intermediate behaviour. Our early results show that this new approach improves semantic diversity and fitness granularity, but does not statistically significantly improve repair performance.

Mining Patterns from Genetic Improvement Experiments
by Oliver Krauss, Hanspeter Moessenboeck, and Michael Affenzeller
DOI PDF SLIDES Abstract

When conducting genetic improvement experiments, a large amount of individuals (approx population size times generations) is created and evaluated. The corresponding experiments contain valuable data concerning the fitness of individuals for the defined criteria, such as run-time performance, memory use or robustness. This publication presents an approach to use this information in order to identify recurring context independent patterns in abstract syntax trees (ASTs). These patterns can be applied for restricting the search space (in the form of anti-patterns) or for grafting operators in the population. Future work includes an evaluation of this approach, as well as extending it with wildcards and class hierarchies for larger and more generalised patterns.

GI@GECCO 2018: 5th International Workshop

Assessing Single-Objective Performance Convergence and Time Complexity for Refactoring Detection
by David Nader-Palacio, Daniel Rodriguez-Cardenas, and Jonatan Gomez
DOI PDF Abstract

The automatic detection of refactoring recommendations has been tackled in prior optimization studies involving bad code smells, semantic coherence and importance of classes; however, such studies informally addressed formalisms to standardize and replicate refactoring models. We propose to assess the refactoring detection by means of performance convergence and time complexity. Since the reported approaches are difficult to reproduce, we employ an Artificial Refactoring Generation (ARGen) as a formal and naive computational solution for the Refactoring Detection Problem. ARGen is able to detect massive refactoring sets in feasible areas of the search space. We used a refactoring formalization to adapt search techniques (Hill Climbing, Simulated Annealing and Hybrid Adaptive Evolutionary Algorithm) that assess the performance and complexity on three open software systems. Combinatorial techniques are limited in solving the Refactoring Detection Problem due to the relevance of developers' criteria (human factor) when designing reconstructions. Without performance convergence and time complexity analysis, a software empirical analysis that uses search techniques is incomplete.

Dynamic Fitness Functions for Genetic Improvement in Compilers and Interpreters
by Oliver Krauss, Hanspeter Moessenboeck, and Michael Affenzeller
DOI PDF Abstract

When attempting to improve the non-functional requirements of software, specifically run-time performance of code, an important requirement is to preserve the correctness of the optimized code. Additionally when attempting to integrate Genetic Improvement into a compiler or interpreter, the large search spaces resulting from the amount of operators and operands a language provides needs to be dealt with. This publication explores dynamic fitness functions as a foundation for a use in Genetic Improvement to optimize programs. An approach of using a test suite to verify code correctness in the Truffle Framework and Graal Compiler is presented. Two types of fitness functions are explored, which split the test suite according to their complexity and attempt to generate correct solutions with a growing set of increasingly complex tests. One of them increases the amount of tests sequentially over several iterations. The parallel fitness function attempts to split a test suite and to re-combine the results with increasingly large suites. The results show that these functions only marginally improve the fitness landscape on their own, but show that more partially correct solutions can be found with dynamic fitness functions. In the future, our approach may be improved by implementing specific crossover and mutator operations to accompany the dynamic fitness functions.

Genetic Configuration Sampling: Learning a Sampling Strategy for Fault Detection of Configurable Systems
by Jifeng Xuan, Yongfeng Gu, Zhilei Ren, Xiangyang Jia, and Qingna Fan
DOI PDF Abstract

A highly-configurable system provides many configuration options to diversify application scenarios. The combination of these configuration options results in a large search space of configurations. This makes the detection of configuration-related faults extremely hard. Since it is infeasible to exhaust every configuration, several methods are proposed to sample a subset of all configurations to detect hidden faults. Configuration sampling can be viewed as a process of repeating a pre-defined sampling action to the whole search space, such as the one-enabled or pair-wise strategy. we propose genetic configuration sampling, a new method of learning a sampling strategy for configuration-related faults. Genetic configuration sampling encodes a sequence of sampling actions as a chromosome in the genetic algorithm. Given a set of known configuration-related faults, genetic configuration sampling evolves the sequence of sampling actions and applies the learnt sequence to new configuration data. A pilot study on three highly-configurable systems shows that genetic configuration sampling performs well among nine sampling strategies in comparison.

Novelty Search for software improvement of a SLAM system
by Victor R. Lopez-Lopez, Leonardo Trujillo, and Pierrick Legrand
DOI PDF Abstract

Genetic Improvement (GI) performs a search at the level of source code to find the best variant of a baseline system that improves non-functional properties while maintaining functionality, with noticeable results in several domains. There a many aspects of this general approach that are currently being explored. In particular, this work deals to the way in which the search is guided to efficiently explore the search space of possible software versions in which GI operates. The proposal is to integrate Novelty Search (NS) within the GISMOE GI framework to improve KinectFusion, which is a vision-based Simultaneous Localization and Mapping (SLAM) system that is used for augmented reality, autonomous vehicle navigation, and many other real-world applications. This is one of a small set of works that have successfully combined NS with a GP system, and the first time that it has been used for software improvement. To achieve this, we propose a new behaviour descriptor for SLAM algorithms, based on state-of-the-art benchmarking and present results that show that NS can produce significant improvement gains in a GI setting, when considering execution time and trajectory estimation as the main performance criteria.

Synthesizing Customized Network Protocols using Genetic Programming
by Mohammad Roohitavaf, Ling Zhu, Sandeep Kulkarni, and Subir Biswas
DOI PDF Abstract

Given the advances in areas such as home automation, agricultural networks, smart cities, designers often need to design protocols that use the features of that network while dealing with its limitations. Using standardized protocols for such networks may not be appropriate as they may not address limitations of the network such as heterogeneous nodes or limited capability of some nodes. While designing a customized protocol for that network would be desirable, it is extremely time-consuming unless we can automate the development of the required protocol. In this paper, we present NetSynth, a GP based mechanism to develop customized routing protocol for the given network. NetSynth lets us conveniently describe a network using an XML file, and it synthesizes a routing protocol that suits the input network by considering the characteristics specific to the given network. We show how NetSynth creates protocols that perform comparably to best-known protocols for the case where we want to broadcast a set of messages to all nodes in a grid. We also show how NetSynth helps us design protocols that provide a trade-off between throughput and energy.

Towards Modular Large-Scale Darwinian Software Improvement
by Michael Orlov
DOI PDF PDF Abstract

This paper proposes to explore a software engineer-assisted method for evolutionarily improving large-scale software systems. A frame-work is outlined for selecting and evolving specific components of such systems, while avoiding treating the complete software as a single independent individual in the population, thereby forgoing the high costs of that approach.

GI@ICSE 2018: 4th International Workshop

A Turing Test for Genetic Improvement
by Afsoon Afzal, Jeremy Lacomis, Claire Le Goues, and Christopher Steven Timperley
DOI PDF PDF Abstract

Genetic improvement is a research field that aims to develop searchbased techniques for improving existing code. GI has been used to automatically repair bugs, reduce energy consumption, and to improve run-time performance. In this paper, we reflect on the often-overlooked relationship between GI and developers within the context of continually evolving software systems. We introduce a distinction between transparent and opaque patches based on intended lifespan and developer interaction. Finally, we outline a Turing test for assessing the ability of a GI system to produce opaque patches that are acceptable to humans. This motivates research into the role GI systems will play in transparent development contexts.

A spoonful of DevOps helps the GI Go Down
by Benoit Baudry, Nicolas Harrand, Eric Schulte, Christopher Timperley, Shin Hwei Tan, Marija Selakovic, and Emamurho Ugherughe
DOI PDF Abstract

DevOps emphasizes a high degree of automation at all phases of the software development lifecyle. Meanwhile, Genetic Improvement (GI) focuses on the automatic improvement of software artefacts. In this paper, we discuss why we believe that DevOps offers an excellent technical context for easing the adoption of GI techniques by software developers. We also discuss A/B testing as a prominent and clear example of GI taking place in the wild today, albeit one with human-supervised fitness and mutation operators.

Comparing Line and AST Granularity Level for Program Repair using PyGGI
by Gabin An, Jinhan Kim, and Shin Yoo
DOI PDF PDF SLIDES Abstract

PyGGI is a lightweight Python framework that can be used to implement generic Genetic Improvement algorithms at the API level. The original version of PyGGI only provided lexical modifications, i.e., modifications of the source code at the physical line granularity level. This paper introduces new extensions to PyGGI that enables syntactic modifications for Python code, i.e., modifications that operates at the AST granularity level. Taking advantage of the new extensions, we also present a case study that compares the lexical and syntactic search granularity level for automated program repair, using ten seeded faults in a real world open source Python project. The results show that search landscapes at the AST granularity level are more effective (i.e. eventually more likely to produce plausible patches) due to the smaller sizes of ingredient spaces (i.e., the space from which we search for the material to build a patch), but may require longer time for search because the larger number of syntactically intact candidates leads to more fitness evaluations.

Evolutionary Fuzzing for Genetic Improvement: Toward Adaptive Software Defense
by Jason Landsborough, Stephen Harding, and Bryan Beabout
DOI PDF SLIDES Abstract

As fuzz testing strategies have become more and more sophisticated, we see a natural application of fuzz testing to Genetic Improvement techniques. In particular, the ability to generate high quality and high coverage tests with advanced fuzzers can greatly enhance the effectiveness of Genetic Improvement algorithms, especially when the algorithm is applied to bug fixing or other similar kinds of software improvement to improve qualities such as security.

Evolving Software Quality
by Claire Le Goues
PDF Abstract

Genetic Improvement describes a class of search-based techniques that automatically improve software along a a variety of quality dimensions. In this talk, I will survey the research advances that have made possible the significant recent progress in this field. I will focus on the ongoing research opportunities that lie in genetic software improvement, with an especial focus on the challenges of confidently reasoning about, measuring, and assuring the quality of automatically and constantly evolving artefacts.

Experiments in Genetic Divergence for Emergent Systems
by Christopher McGowan, Alexander Wild, and Barry Porter
DOI PDF PDF SLIDES Abstract

Emergent software systems take a step towards tackling the ever-increasing complexity of modern software, by having systems self-assemble from a library of building blocks, and then continually re-assemble themselves from alternative building blocks to learn which compositions of behaviour work best in each deployment environment. One of the key challenges in emergent systems is populating the library of building blocks, and particularly a set of alternative implementations of particular building blocks, which form the runtime search space of optimal behaviour. We present initial work in using a fusion of genetic improvement and genetic synthesis to automatically populate a divergent set of implementations of the same functionality, allowing emergent systems to explore new behavioural alternatives without human input. Our early results indicate this approach is able to successfully yield useful divergent implementations of building blocks which are more suited than any existing alternative for particular operating conditions.

Learning to Synthesize
by Yingfei Xiong, Bo Wang, Guirong Fu, and Linfei Zang
DOI PDF PDF PDF SLIDES URL Abstract

n many scenarios we need to find the most likely program under a local context, where the local context can be an incomplete program, a partial specification, natural language description, etc. We call such problem program estimation. In this paper we propose an abstract framework, learning to synthesis, or L2S in short, to address this problem. L2S combines four tools to achieve this: syntax is used to define the search space and search steps, constraints are used to prune off invalid candidates at each search step, machine-learned models are used to estimate conditional probabilities for the candidates at each search step, and search algorithms are used to find the best possible solution. The main goal of L2S is to lay out the design space to motivate the research on program estimation. We have performed a preliminary evaluation by instantiating this framework for synthesizing conditions of an automated program repair (APR) system. The training data are from the project itself and related JDK packages. Compared to ACS, a state-of-the-art condition synthesis system for program repair, our approach could deal with a larger search space such that we fixed 4 additional bugs outside the search space of ACS, and relies only on the source code of the current projects.

Neutrality and Epistasis in Program Space
by Joseph Renzullo, Westley Weimer, Melanie Moses, and Stephanie Forrest
DOI PDF SLIDES Abstract

Neutral networks in biology often contain diverse solutions with equal fitness, which can be useful when environments (requirements) change over time. we present a method for studying neutral networks in software. In these networks, we find multiple solutions to held-out test cases (latent bugs), suggesting that neutral software networks also exhibit relevant diversity. We also observe instances of positive epistasis between random mutations, i.e. interactions that collectively increase fitness. Positive epistasis is rare as a fraction of the total search space but significant as a fraction of the objective space: 9percent of the repairs we found to look (and 4.63percent across all programs analysed) were produced by positive interactions between mutations. Further, the majority (62.50percent) of unique repairs are instances of positive epistasis

Performance Localisation
by Brendan Cody-Kenny, Michael O'Neill, and Stephen Barrett
DOI PDF SLIDES Abstract

Profiling techniques highlight where performance issues manifest and provide a starting point for tracing cause back through a program. While people diagnose and understand the cause of performance to guide formulation of a performance improvement, we seek automated techniques for highlighting performance improvement opportunities to guide search algorithms. We investigate mutation-based approaches for highlighting where a performance improvement is likely to exist. For all modification locations in a program, we make all possible modifications and analyse how often modifications reduce execution count. We compare the resulting code location rankings against rankings derived using a profiler and find that mutation analysis provides the higher accuracy in highlighting performance improvement locations in a set of benchmark problems, though at a much higher execution cost. We see both approaches as complimentary and consider how they may be used to further guide Genetic Programming in finding performance improvements

GI@GECCO 2017: 3rd International Workshop

Deep Parameter Optimisation on Android Smartphones for Energy Minimisation - A Tale of Woe and a Proof-of-Concept
by Mahmoud A. Bokhari, Bobby R. Bruce, Brad Alexander, and Markus Wagner
DOI PDF PDF Abstract

With power demands of mobile devices rising, it is becoming increasingly important to make mobile software applications more energy efficient. Unfortunately, mobile platforms are diverse and very complex which makes energy behaviours difficult to model. This complexity presents challenges to the effectiveness of off-line optimisation of mobile applications. we demonstrate that it is possible to automatically optimise an application for energy on a mobile device by evaluating energy consumption in-vivo. In contrast to previous work, we use only the device's own internal meter. Our approach involves many technical challenges but represents a realistic path toward learning hardware specific energy models for program code features.

Embedding Genetic Improvement into Programming Languages
by Shin Yoo
DOI PDF Abstract

We present a vision of genetic improvement firmly embedded in, and supported by, programming languages. Genetic improvement has already been envisioned as the next compiler, which would take human written programs as input and return versions optimised with respect to various objectives. As an intermediate stage, or perhaps to complement the fully automated vision, we imagine genetic improvement processes that are hinted at and directed by humans but understood and undertaken by programming languages and their run-times, via interactions through the source code. We examine existing similar ideas and examine the benefits of embedding them within programming languages.

Evolving Software Building Blocks with FINCH
by Michael Orlov
DOI PDF PDF Abstract

This paper proposes to explore the following question: can software evolution systems like FINCH, that evolve linear representations originating from a higher-level structural language, take advantage of building blocks inherent to that original language?

Fixing Bugs in Your Sleep: How Genetic Improvement Became an Overnight Success
by Saemundur O. Haraldsson, John R. Woodward, Alexander E. I. Brownlee, and Kristin Siggeirsdottir
DOI PDF PDF URL Abstract

We present a bespoke live system in commercial use with self-improving capability. During daytime business hours it provides an overview and control for many specialists to simultaneously schedule and observe the rehabilitation process for multiple clients. However in the evening, after the last user logs out, it starts a self-analysis based on the day's recorded interactions. It generates test data from the recorded interactions for Genetic Improvement to fix any recorded bugs that have raised exceptions. The system has already been under test for over 6 months and has in that time identified, located, and fixed 22 bugs. No other bugs have been identified by other methods during that time. It demonstrates the effectiveness of simple test data generation and the ability of GI for improving live code.

From Problem Landscapes to Language Landscapes: Questions in Genetic Improvement
by Brendan Cody-Kenny, Michael Fenton, and Michael O'Neill
DOI PDF Abstract

Managing and curating software is a time consuming process particularly as programming languages, libraries, and execution environments change. To support the engineering of software, we propose applying a GP-based continuous learning system to all useful software. We take the position that search-based itemization and analysis of all commonly used software is feasible, in large part, because the requirements that people place on software can be used to bound the search space to software which is of high practical use. By repeatedly reusing the information generated during the search process we hope to attain a higher-level, but also more rigorous, understanding of our engineering material: source code.

GI in No Time
by David R. White
DOI PDF PDF URL Abstract

We describe a small, simple, and lightweight micro-framework for the Genetic Improvement of Java code. We call the framework GI in no time, or Gin. Gin is designed to be a straightforward, hackable, GI tool for Java. It currently lacks large features found in comparable program repair tools, but nonetheless it is capable of performing optimisation of a Java class via local search. We hope that providing this contribution will encourage researchers to collaborate on GI tool development, whilst lowering the barrier to entry for those interested in experimenting with GI. It is intended to serve both as a tool kit to be extended, and also an example of how GI can be implemented. We discuss some of the design principles behind Gin, and outline observations made during its development.

Gaining Insights into Traffic Data through Genetic Improvement
by Aniko Ekart, Alina Patelli, Victoria Lush, and Elisabeth Ilie-Zudor
DOI PDF PDF URL Abstract

We argue that Genetic Improvement can be successfully used for enhancing road traffic data mining. This would support the relevant decision makers with extending the existing network of devices that sense and control city traffic, with the end goal of improving vehicle flow and reducing the frequency of road accidents. Our position results from a set of preliminary observations emerging from the analysis of open access road traffic data collected in real time by the Birmingham City Council.

Genetic Improvement of Runtime in a Bioinformatics Application
by Saemundur O. Haraldsson, John R. Woodward, Alexander E. I. Brownlee, Albert V. Smith, and Vilmundur Gudnason
DOI PDF Abstract

We present a Genetic Improvement (GI) experiment on ProbAbel, a piece of bioinformatics software for Genome Wide Association (GWA) studies. The GI framework used here has previously been successfully used on Python programs and can, with minimal adaptation, be used on source code written in other languages. We achieve improvements in execution time without the loss of accuracy in output while also exploring the vast fitness landscape that the GI framework has to search. The runtime improvements achieved on smaller data set scale up for larger data sets. Our findings are that for ProbAbel, the GI's execution time landscape is noisy but flat. We also confirm that human written code is robust with respect to small edits to the source code.

Improving SSE Parallel Code with Grow and Graft Genetic Programming
by William B. Langdon and Ronny Lorenz
DOI PDF PDF SLIDES Abstract

RNAfold predicts the secondary structure of RNA molecules from their base sequence. We apply a mixture of manual and automated genetic improvements to RNAfold's C source code. GI gives a 1.6 percent improvement to parallel SSE4.1 code. The automatic programming evolutionary system has access to Intel library code and previous revisions. On 4666 curated structures from RNA_STRAND, GGGP gives a combined average speed up of 31.9 percent, with no loss of accuracy.

Learning from Super-Mutants: searching post-apocalyptic software ecosystems for novel semantics-preserving transforms
by Jason Landsborough, Stephen Harding, and Sunny Fugate
DOI PDF Abstract

In light of recent advances in genetic-algorithm-driven automated program modification, our team has been actively exploring the art, engineering, and discovery of novel semantics-preserving transforms. While modern compilers represent some of the best ideas we have for automated program modification, current approaches represent only a small subset of the types of transforms which can be achieved. In the wilderness of post-apocalyptic software ecosystems of genetically-modified and mutant programs, there exist a broad array of potentially useful software mutations, including semantics-preserving transforms that may play an important role in future software design, development, and most importantly, evolution.

Modelling Genetic Improvement Landscapes with Local Optima Networks
by Nadarajen Veerapen, Fabio Daolio, and Gabriela Ochoa
DOI PDF Abstract

Local optima networks are a compact representation of the global structure of a search space. They can be used for analysis and visualisation. This paper provides one of the first analyses of program search spaces using local optima networks. These are generated by sampling the search space by recording the progress of an Iterated Local Search algorithm. Source code mutations in comparison and Boolean operators are considered. The search spaces of two small benchmark programs, the triangle and TCAS programs, are analysed and visualised. Results show a high level of neutrality, i.e. connected test-equivalent mutants. It is also generally relatively easy to find a path from a random mutant to a mutant that passes all test case

New Operators for Non-functional Genetic Improvement
by Justyna Petke
DOI PDF Abstract

Genetic improvement uses automated search to find improved versions of existing software. Typically software is modified using either delete, copy or replace operations at the level of source code, its abstract syntax tree, binary or assembly representation. Impressive improvements have been achieved through this approach, yet research in the use of other search operators is largely unexplored. We propose several ways for devising new search operators for improvement of non-functional properties using a genetic improvement approach.

GI@GECCO 2016: 2nd International Workshop

Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction
by Zoltan A. Kocsis, John H. Drake, Douglas Carson, and Jerry Swan
DOI PDF Abstract

Apache Spark is a popular framework for large-scale data analytics. Unfortunately, Spark's performance can be difficult to optimise, since queries freely expressed in source code are not amenable to traditional optimisation techniques. This article describes Hylas, a tool for automatically optimising Spark queries embedded in source code via the application of semantics-preserving transformations. The transformation method is inspired by functional programming techniques of deforestation, which eliminate intermediate data structures from a computation. This contrasts with approaches defined entirely within structured query formats such as Spark SQL. Hylas can identify certain computationally expensive operations and ensure that performing them creates no superfluous data structures. This optimisation leads to significant improvements in execution time, with over 10,000 times improvement observed in some cases.

Benchmarking Genetically Improved BarraCUDA on Epigenetic Methylation NGS datasets and nVidia GPUs
by William B. Langdon, Albert Vilella, Brian Yee Hong Lam, Justyna Petke, and Mark Harman
DOI PDF PDF Abstract

BarraCUDA uses CUDA graphics cards to map DNA reads to the human genome. Previously its software source code was genetically improved for short paired end next generation sequences. On longer noisy epigenetics strings using nVidia Titan and twin Tesla K40 the same GI-ed code is more than 3 times faster than bwa-meth on an 8 core CPU.

Evals is not enough: why we should report wall-clock time
by John Woodward, Alexander Brownlee, and Colin Johnson
DOI PDF Abstract

Have you ever noticed that your car never achieves the fuel economy claimed by the manufacturer? Does this seem unfair? Unscientific? Would you like the same situation to occur in Genetic Improvement? Comparison will always be difficult [9], however, guidelines have been discussed [3, 5, 4]. With two GP [8] approaches, comparing the number of evaluations of the fitness function is reasonably fair. This means you are comparing the GP systems, and not how well they are implemented, how fast the language is. However, the situation with GI [6, 1] is unique. With GI we will typically compare systems which are applied to the same application written in the same language (i.e. a GI systems targeted at Java, may not even be applied to C). Thus, wall-clock time becomes more relevant. Thus, this paper asks if reporting number of evaluations is enough, or if wall-clock time is also important, particularly in the context of GI. It argues that reporting time is even more important when doing GI when compared to traditional GP.

Evolutionary optimization of compiler flag selection by learning and exploiting flags interactions
by Unai Garciarena and Roberto Santana
DOI PDF Abstract

Compiler flag selection can be an effective way to increase the quality of executable code according to different code quality criteria. Evolutionary algorithms have been successfully applied to this optimization problem. However, previous approaches have only partially addressed the question of capturing and exploiting the interactions between compilation options to improve the search. In this paper we deal with this question comparing estimation of distribution algorithms (EDAs) and a traditional genetic algorithm approach. We show that EDAs that learn bivariate interactions can improve the results of GAs for some of the programs considered. We also show that the probabilistic models generated as a result of the search for optimal flag combinations can be used to unveil the (problem-dependent) interactions between the flags, allowing the user a more informed choice of compilation options.

GP vs GI: if you can't beat them, join them
by John Woodward, Colin Johnson, and Alexander Brownlee
DOI PDF Abstract

Genetic Programming (GP) has been criticized for targeting irrelevant problems [12], and is also true of the wider machine learning community [11]. which has become detached from the source of the data it is using to drive the field forward. However, recently GI provides a fresh perspective on automated programming. In contrast to GP, GI begins with existing software, and therefore immediately has the aim of tackling real software. As evolution is the main approach to GI to manipulating programs, this connection with real software should persuade the GP community to confront the issues around what it originally set out to tackle i.e. evolving real software.

Genetic Improvement 2016 Chairs' Welcome and Organization
by Westley Weimer, Justyna Petke, and David R. White
DOI Abstract

It is our great pleasure to welcome you to the 2nd Workshop on Genetic Improvement...

Genetic Improvement for Code Obfuscation
by Justyna Petke
DOI PDF Abstract

Genetic improvement (GI) is a relatively new area of software engineering and thus the extent of its applicability is yet to be explored. Although a growing interest in GI in recent years started with the work on automatic bug fixing, the area flourished when results on optimisation of non- functional software properties, such as efficiency and energy consumption, were published. Further success of GI in trans-planting functionality from one program to another leads to a question: what other software engineering areas can benefit from the use of genetic improvement techniques? We propose to use GI for code obfuscation

Genetic Programming: From design to improved implementation
by Victor R. Lopez-Lopez, Leonardo Trujillo, Pierrick Legrand, and Gustavo Olague
DOI PDF Abstract

Genetic programming (GP) is an evolutionary-based search paradigm that is well suited to automatically solve difficult design problems. The general principles of GP have been used to evolve mathematical functions, models, image operators, programs, and even antennas and lenses. Since GP evolves the syntax and structure of a solution, the evolutionary process can be carried out in one environment and the solution can then be ported to another. However, given the nature of GP it is common that the evolved designs are unorthodox compared to traditional approaches used in the problem domain. Therefore, efficiently porting, improving or optimizing an evolved design might not be a trivial task. In this work we argue that the same GP principles used to evolve the solution can then be used to optimize a particular new implementation of the design, following the Genetic Improvement approach. In particular, this paper presents a case study where evolved image operators are ported from Matlab to OpenCV, and then the source code is optimized an improved using Genetic Improvement of Software for Multiple Objectives (GISMOE). In the example we show that functional behaviour is maintained (output image) while improving non-functional properties (computation time). Despite the fact that this first example is a simple case, it clearly illustrates the possibilities of using GP principles in two distinct stages of the software development process, from design to improved implementation.

Guiding Unconstrained Genetic Improvement
by David White
DOI PDF Abstract

This paper argues that the potential for arbitrary transformation is what differentiates GI from other program transformation work. With great expressive power comes great responsibility, and GI has had mixed success finding effective program repairs and optimisations. The search must be better guided in order to improve solution quality.

Optimising Energy Consumption Heuristically on Android Mobile Phones
by Mahmoud Bokhari and Markus Wagner
DOI PDF PDF Abstract

In this paper we outline our proposed framework for optimising energy consumption on Android mobile phones. To model the power usage, we use an event-based modelling technique. The internal battery fuel gauge chip is used to measure the amount of energy being consumed and accordingly the model is built on. We use the model to estimate components' energy usages. In addition, we propose the use of evolutionary computations to prolong the battery life. This can be achieved by using the power consumption model as a fitness function, re-configuring the smartphone's default settings and modifying existing code of applications.

Speeding up the proof strategy in formal software verification
by Markus Wagner
DOI PDF PDF Abstract

The functional correctness of safety- and security-critical software is of utmost importance. Nowadays, this can be achieved through computer assisted verification. While formal verification itself typically poses a steep learning-curve for anyone who wants to apply it, its applicability is further hindered by its (typically) low runtime performance. With the increasing popularity of algorithm parameter tuning and genetic improvement, we see a great opportunity for assisting verification engineers in their daily tasks.

GI@GECCO 2015: 1st International Workshop

Embedded Dynamic Improvement
by Nathan Burles, Jerry Swan, Edward Bowles, Alexander E. I. Brownlee, Zoltan A. Kocsis, and Nadarajen Veerapen
DOI PDF Abstract

We discuss the useful role that can be played by a subtype of improvement programming, which we term Embedded Dynamic Improvement. In this approach, developer-specified variation points define the scope of improvement. A search framework is embedded at these variation points, facilitating the creation of adaptive software that can respond online to changes in its execution environment.

Embedding Adaptivity in Software Systems using the ECSELR framework
by Kwaku Yeboah-Antwi and Benoit Baudry
DOI PDF PDF Abstract

ECSELR is an ecologically-inspired approach to software evolution that enables environmentally driven evolution at runtime in extant software systems without relying on any offline components or management. ECSELR embeds adaptation and evolution inside the target software system enabling the system to transform itself via Darwinian evolutionary mechanisms and adapt in a self contained manner. This allows the software system to benefit autonomously from the useful emergent byproducts of evolution like adaptivity and bio-diversity, avoiding the problems involved in engineering and maintaining such properties. ECSELR enables software systems to address changing environments at run time, ensuring benefits like mitigation of attacks and memory-optimisation among others while avoiding time consuming and costly maintenance and downtime. ECSELR differs from existing work in that, 1) adaptation is embedded in the target system, 2) evolution and adaptation happens online (i.e. in-situ at runtime) and 3) ECSELR is able to embed adaptation inside systems that have already been started and are in the midst of execution. We demonstrate the use of ECSELR and present results on using the ECSELR framework to slim a software system.

Energy Optimisation via Genetic Improvement A SBSE technique for a new era in Software Development
by Bobby R. Bruce
DOI PDF SLIDES Abstract

The discipline of Software Engineering has arisen during a time in which developers rarely concerned themselves with the energy efficiency of their software. Due to the growth in both mobile devices and large server clusters this period is undoubtedly coming to an end and, as such, new tools for creating energy-efficient software are required. This paper takes the position that Genetic Improvement, a Search-Based Software Engineering technique, has the potential to aid developers in refactoring their software to a more energy-efficient state; allowing focus to remain on functional requirements while leaving concerns over energy consumption to an automated process.

Evolutionary Approximation of Software for Embedded Systems: Median Function
by Vojtech Mrazek, Zdenek Vasicek, and Lukas Sekanina
DOI PDF SLIDES Abstract

This paper deals with genetic programming-based improvement of non-functional properties of programs intended for low-cost microcontrollers. As the objective is to significantly reduce power consumption and execution time, the approximate computing scenario is considered in which occasional errors in results are acceptable. The method is based on Cartesian genetic programming and evaluated in the task of approximation of 9-input and 25-input median function. Resulting approximations show a significant improvement in the execution time and power consumption with respect to the accurate median function while the observed errors are moderate.

Fitness as Task-relevant Information Accumulation
by Colin G. Johnson and John R. Woodward
DOI PDF Abstract

If you cannot measure it, you cannot improve it. Lord Kelvin Fitness in GP/GI is usually a short-sighted greedy fitness function counting the number of satisfied test cases (or some other score based on error). If GP/GI is to be extended to successfully tackle full software systems, which is the stated domain of Genetic Improvement, with loops, conditional statements and function calls, then this kind of fitness will fail to scale. One alternative approach is to measure the fitness gain in terms of the accumulated information at each executed step of the program. This paper discusses methods for measuring the way in which programs accumulate information relevant to their task as they run, by building measures of this information gain based on information theory and model complexity.

GI4GI: Improving Genetic Improvement Fitness Functions
by Mark Harman and Justyna Petke
DOI PDF SLIDES Abstract

Genetic improvement (GI) has been successfully used to optimise non-functional properties of software, such as execution time, by automatically manipulating program's source code. Measurement of non-functional properties, however, is a non-trivial task; energy consumption, for instance, is highly dependant on the hardware used. Therefore, we propose the GI4GI framework (and two illustrative applications). GI4GI first applies GI to improve the fitness function for the particular environment within which software is subsequently optimised using traditional GI.

Genetic Improvement 2015 Chairs' Welcome
by William B. Langdon, Justyna Petke, and David R. White
PDF URL

Genetic Improvement for Software Product Lines: An Overview and a Roadmap
by Roberto E. Lopez-Herrejon, Lukas Linsbauer, Wesley K. G. Assuncao, Stefan Fischer, Silvia R. Vergilio, and Alexander Egyed
DOI PDF Abstract

Software Product Lines (SPLs) are families of related software systems that provide different combinations of features. Extensive research and application attest to the significant economical and technological benefits of employing SPL practices. However, there are still several challenges that remain open. Salient among them is reverse engineering SPLs from existing variants of software systems and their subsequent evolution. In this paper, we aim at sketching connections between research on these open SPL challenges and ongoing work on Genetic Improvement. Our hope is that by drawing such connections we can spark the interest of both research communities on the exciting synergies at the intersection of these subject areas.

Genetic Improvement of Energy Usage is only as Reliable as the Measurements are Accurate
by Saemundur O. Haraldsson and John R. Woodward
DOI PDF Abstract

Energy has recently become an objective for Genetic Improvement. Measuring software energy use is complicated which might tempt us to use simpler measurements. However if we base the GI on inaccurate measurements we can not assure any improvements. This paper seeks to highlight important issues when evaluating energy use of programs.

Genetic Improvement using Higher Order Mutation
by Yue Jia, Fan Wu, Mark Harman, and Jens Krinke
DOI PDF SLIDES Abstract

This paper presents a brief outline of a higher-order mutation based framework for Genetic Improvement (GI). We argue that search-based higher-order mutation testing can be used to implement a form of genetic programming (GP) to increase the search granularity and testability of GI.

Grow and Graft a better CUDA pknotsRG for RNA pseudoknot free energy calculation
by William B. Langdon and Mark Harman
DOI PDF PDF SLIDES URL Abstract

Grow and graft genetic programming greatly improves GPGPU dynamic programming software for predicting the minimum binding energy for folding of RNA molecules. The parallel code inserted into the existing CUDA version of pknots was grown using a BNF grammar. On an nVidia Tesla K40 GPU GGGP gives a speed up of up to 10000 times.

Removing the Kitchen Sink from Software
by Jason Landsborough, Stephen Harding, and Sunny Fugate
DOI PDF SLIDES Abstract

We would all benefit if software were slimmer, thinner, and generally only did what we needed and nothing more. To this end, our research team has been exploring methods for removing unused and undesirable features from compiled programs. Our primary goal is to improve software security by removing rarely used features in order to decrease a program's attack surface. This paper describes two different approaches for thinning binary programs. The first approach removes specific program features using dynamic tracing as a guide. This approach is relatively safe, but is only capable of removing code which is reachable in a trace when an undesirable feature is enabled. The second approach uses a genetic algorithm (GA) to mutate a program until a suitable variant is found. Our GA-based approach can potentially remove any code that is not strictly required for proper execution, but may break program semantics in unpredictable ways. We show results of these approaches on a toy program and real-world software and explore some of the implications for software security.

Repairing COTS Router Firmware without Access to Source Code or Test Suites: A Case Study in Evolutionary Software Repair
by Eric Schulte, Westley Weimer, and Stephanie Forrest
DOI PDF Abstract

The speed with which newly discovered software vulnerabilities are patched is a critical factor in mitigating the harm caused by subsequent exploits. Unfortunately, software vendors are often slow or unwilling to patch vulnerabilities, especially in embedded systems which frequently have no mechanism for updating factory-installed firmware. The situation is particularly dire for commercial off the shelf (COTS) software users, who lack source code and are wholly dependent on patches released by the vendor. We propose a solution in which the vulnerabilities drive an automated evolutionary computation repair process capable of directly patching embedded systems firmware. Our approach does not require access to source code, regression tests, or any participation from the software vendor. Instead, we present an interactive evolutionary algorithm that searches for patches that resolve target vulnerabilities while relying heavily on post-evolution difference minimisation to remove most regressions. Extensions to prior work in evolutionary program repair include: repairing vulnerabilities in COTS router firmware; handling stripped MIPS executable; operating without fault localisation information; operating without a regression test suite; and incorporating user interaction into the evolutionary repair process. We demonstrate this method by repairing two well-known vulnerabilities in version 4 of NETGEAR's WNDR3700 wireless router before NETGEAR released patches publicly for the vulnerabilities. Without fault localisation we are able to find repair edits that are not located on execution traces. Without the advantage of regression tests to guide the search, we find that 80percent of repairs of the example vulnerabilities retain program functionality after minimisation. With minimal user interaction to demonstrate required functionality, 100percent of the proposed repairs were able to address the vulnerabilities while retaining required functionality.

Rethinking Genetic Improvement Programming
by David R. White and Jeremy Singer
DOI PDF SLIDES Abstract

We re-examine the central motivation behind Genetic Improvement Programming (GIP), and argue that the most important insight is the concept of applying Genetic Programming to existing software. This viewpoint allows us to make several observations about potential directions for GIP research

locoGP: Improving Performance by Genetic Programming Java Source Code
by Brendan Cody-Kenny, Edgar Galvan Lopez, and Stephen Barrett
DOI PDF SLIDES URL Abstract

We present locoGP, a Genetic Programming (GP) system written in Java for evolving Java source code. locoGP was designed to improve the performance of programs as measured in the number of operations executed. Variable test cases are used to maintain functional correctness during evolution. The operation of locoGP is demonstrated on a number of typically constructed off-the-shelf hand-written implementations of sort and prefix-code programs. locoGP was able to find improvement opportunities in all test problems.