By Benjamin Daniel Høyer
Supervisors: Ole Winther, DTU Informatics, and Thomas Agersten Poulsen, Novozymes.
Abstract:
Optimal protein expression levels are of great interest to companies in the enzyme business. Improving them is an in-house efficiency gain that takes nothing away from the customer. Predicting expression levels from a nucleotide sequence is hard, but given the cost of testing many sequences, it is well worth the effort.
This thesis documents an M.Sc. project aimed at describing the biological and machine learning background of the problem at hand.
Additionally, it aims to train a predictive model whose correlation between predicted and measured expression levels exceeds what is currently considered state of the art.
Work on this thesis has also produced a number of R functions. These are useful in themselves, but collecting them into a package will enhance portability and ease of use.