AI Engineer | LLM Training & Optimization | High-Performance Computing
Ph.D. physicist with deep expertise in High-Performance Computing (HPC), specializing in training, fine-tuning, and optimizing large-scale AI models. Proven experience in pre-training billion-parameter LLMs on distributed systems and achieving >50% performance gains via LoRA fine-tuning.
When I'm not working, you can find me powerlifting at the gym. I believe in continuous learning and pushing the boundaries of what's possible, both mentally and physically.
- Improved domain-specific Vision-Language Model (VLM) performance by >50%, raising extraction accuracy on a proprietary benchmark from around 60% to over 90% through targeted LoRA fine-tuning and a novel synthetic data generation pipeline.
- Led pre-training of a 3B-parameter German-language LLM on a Cerebras CS-3 cluster, managing training stability to reach a final validation loss of around 2 on a 4.4T-token dataset.
- Architected and built multiple advanced RAG systems, including a GraphRAG PoC that improved retrieval precision by 25% over vanilla vector search on a complex legal document corpus by constructing a knowledge graph from semantic entity relationships.
- Developed multi-step AI agents using ReAct-style prompting to automate complex document analysis workflows, integrating custom tools for data extraction and exploration.
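For readers unfamiliar with LoRA fine-tuning, the core idea is small: freeze the pretrained weight matrix and train only a low-rank correction. The sketch below is a minimal, illustrative NumPy version (all names and shapes are assumptions, not the production setup):

```python
import numpy as np

# LoRA in miniature: instead of updating a full weight matrix W
# (d_out x d_in), train a low-rank correction B @ A with rank
# r << min(d_in, d_out). Shapes and values here are illustrative.
rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x, scale=1.0):
    """Frozen path plus scaled low-rank update."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer exactly matches the base layer.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters drop from d_out*d_in to r*(d_in + d_out).
full, lora = d_out * d_in, r * (d_in + d_out)
print(f"full: {full}, lora: {lora}")  # full: 4096, lora: 512
```

The zero-initialized up-projection is the standard trick that makes training start from the unmodified base model.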
- Designed and executed large-scale quantum field theory (QCD+QED) simulations on HPC clusters across Europe, managing distributed workloads across thousands of compute cores.
- Authored a 5,000+ line mass-reweighting module in C/MPI for the openQxD framework, directly manipulating low-level data structures to improve simulation efficiency by 15%.
- Developed a high-performance data analysis pipeline in Python (NumPy, SciPy) to process terabytes of Markov Chain Monte Carlo simulation output and identify key physical observables.
- Built an automatic differentiation engine from scratch in Python for error analysis of complex derived observables.
- Accelerated the core Dirac differential operator by porting computational kernels to NVIDIA GPUs with CUDA, achieving a 4x speedup during an NVIDIA-sponsored OpenACC/CUDA hackathon.
- Member of the Research Training Group RTG2575 - Rethinking Quantum Field Theory.
- Taught graduate courses in statistical physics, quantum mechanics, and linear algebra.
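To give a flavor of what an automatic differentiation engine involves, here is a minimal forward-mode sketch using dual numbers. It is purely illustrative, not the original error-analysis implementation: each value carries its derivative, and arithmetic propagates both via the usual calculus rules.

```python
import math

class Dual:
    """A number paired with its derivative (forward-mode AD)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# Differentiate f(x) = x * sin(x) at x = 2, seeding dx/dx = 1.
x = Dual(2.0, 1.0)
y = x * sin(x)
print(y.dot)  # equals sin(2) + 2*cos(2)
```

Extending this with more operators and propagating statistical uncertainties instead of seeds follows the same pattern.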
A full list of publications can be found on Google Scholar.
To get my email address, run this snippet in a Python interpreter:
"".join(["@", "je", "ecke", ".", "ns", "ai", "lu"][i] for i in [1, 4, 0, 6, 2, 3, 5])
Berlin, Germany