Talk

How to Build an Open-Source Machine Learning Platform in Biology?

Sunday, May 28

14:30 - 15:00
Room: Lasagna
Language: English
Audience level: Intermediate
Elevator pitch

Are you interested in building an open-source, web-based machine learning platform in biology to foster new discoveries? Or do you want to try ML algorithms on your omics data and share the results in a reproducible and standard way without writing code? You are more than welcome to join the OmicLearn talk!

Abstract

The global transition to personalized medicine has been accelerated by advances in high-throughput technologies in biology and by increasing computational power and storage capacity. The growing volume and diversity of complex biological datasets, called “-omics” data (e.g., genomics and proteomics), have revolutionized how we understand and interpret health and disease states, although they also create new challenges. Machine learning (ML) has shown promising results in revealing complex, hidden patterns in omics data and has opened new opportunities for scientific discovery. Popular packages such as scikit-learn or XGBoost enable predictive data analysis. However, researchers still need programming skills to write their own ML pipelines, and these pipelines are not always easy for non-specialists to follow, since such users may lack domain knowledge and the packages offer no graphical interface.

Furthermore, the algorithms expose several tunable parameters, which may differ from version to version, resulting in reproducibility issues. To reproduce published results, the same software environment needs to be set up and configured with matching package versions and algorithm parameters.
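
As a minimal sketch of what "matching package versions and algorithm parameters" can mean in practice, the snippet below records both next to a result so a later run can be compared against it. The helper name `snapshot_run` and the example parameter values are hypothetical and chosen only for illustration; this is not how OmicLearn itself stores its settings.

```python
import json
import platform
from importlib import metadata


def snapshot_run(packages, params):
    """Collect interpreter version, package versions, and parameters for one run.

    `packages` and `params` are supplied by the caller; nothing here is
    specific to a particular ML library.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "packages": versions,
        "params": dict(sorted(params.items())),
    }


# Hypothetical parameters, used only to show the shape of the record.
record = snapshot_run(
    ["scikit-learn", "xgboost"],
    {"n_estimators": 100, "random_state": 23},
)
print(json.dumps(record, indent=2))
```

Two runs whose records are equal were set up with the same interpreter version, the same package versions, and the same parameters. Any difference between the records is a first place to look when results do not reproduce.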

Additionally, omics sciences and ML require special domain knowledge since metrics can be deceiving and algorithms might need extra preselection or preprocessing steps.
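
One way a metric can be deceiving is class imbalance, which is common in clinical cohorts. The sketch below uses made-up labels (95 controls, 5 cases) and a trivial classifier that always predicts the majority class. The data and function names are illustrative assumptions and do not come from any real study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical cohort: 95 controls (0) and 5 cases (1).
y_true = [0] * 95 + [1] * 5
# A classifier that never predicts the minority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f} balanced_accuracy={bal:.2f}")
```

Plain accuracy looks high here although not a single case is detected, while balanced accuracy sits at chance level for two classes.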

Thus, transparent and open-source software is highly valuable for open and reproducible science. To address these issues and to give researchers access to state-of-the-art ML algorithms without requiring prior bioinformatics or programming knowledge, we introduce OmicLearn (OmicLearn.org), a ready-to-use, open-source, web-based ML platform developed specifically for omics datasets.

This talk is for every scientist and developer who is interested in biology or omics or who wants to learn how to build a machine learning platform from open-source tools.

Tags: Machine-Learning, Community, Open-Source, Science

Furkan M. Torun

I am a molecular biologist and geneticist with research experience and a programming background.

After working as a computational biologist at a rare disease research laboratory and as a data scientist at OmicEra Diagnostics, I now work as a Researcher and Data Scientist at a cancer diagnostics biotechnology company in Munich.

The ultimate goal of my work is to combine the power of computation with mysterious biological questions to reveal the unknown.

So, let’s continue 🧬 debugging DNA software!