Philipp Borchert | NLP Researcher

Let's connect and collaborate! You can find me here:


💻 GitHub
🤗 HuggingFace
💼 LinkedIn
📚 Google Scholar
👋 Hi, I'm Philipp!
I'm an NLP researcher fascinated by how we can teach LLMs to reason and tackle complex, structured problems. My research spans multilingual NLP and information extraction, and currently centers on reasoning in AI for Math at Huawei in London. I completed my PhD at KU Leuven, where I focused on multilinguality and NLP for business applications.

📄 Selected Publications & Projects

Language Fusion for Parameter-Efficient Cross-lingual Transfer
Philipp Borchert, Ivan Vulić, Marie-Francine Moens, Jochen De Weerdt
ACL 2025 | Vienna, Austria 📍
This study introduces Fusion for Language Representations (FLARE), a novel method that merges source and target language representations within low-rank adapters, enhancing cross-lingual transfer performance while maintaining parameter efficiency. Keywords: cross-lingual transfer, peft
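For a concrete picture of the mechanism, here is a minimal sketch of fusing source- and target-language representations inside a low-rank adapter. It is an illustration only: the module name FusionLoRALayer, the shared down-projection, and element-wise addition as the fusion operator are my simplifications, not the exact FLARE implementation.

```python
import torch
import torch.nn as nn

class FusionLoRALayer(nn.Module):
    def __init__(self, hidden_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Linear(hidden_dim, rank, bias=False)  # shared low-rank down-projection
        self.up = nn.Linear(rank, hidden_dim, bias=False)    # low-rank up-projection
        self.scaling = alpha / rank
        nn.init.zeros_(self.up.weight)  # standard LoRA init: adapter starts as a no-op

    def forward(self, target_hidden: torch.Tensor, source_hidden: torch.Tensor) -> torch.Tensor:
        # Project both languages' hidden states into the shared low-rank space
        # and fuse them there (element-wise addition is one simple choice).
        fused = self.down(target_hidden) + self.down(source_hidden)
        # Residual connection, as in standard LoRA.
        return target_hidden + self.scaling * self.up(fused)

# Toy usage: source-language states might come from a translated copy of the input.
layer = FusionLoRALayer(hidden_dim=768)
tgt = torch.randn(2, 16, 768)  # target-language hidden states (batch, seq, dim)
src = torch.randn(2, 16, 768)  # source-language hidden states
out = layer(tgt, src)          # fused output, same shape as tgt
```

Because the fusion happens in the rank-r space with a shared down-projection, the added parameter count stays at the usual LoRA budget while the adapter sees both languages at once.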
Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages 💻 GitHub 📝 arXiv
Fabian David Schmidt, Philipp Borchert, Ivan Vulić, Goran Glavaš
EMNLP (Findings) 2024 | Miami, USA 📍
This paper introduces MT-LLM, a method that integrates machine translation (MT) encoders into LLM backbones through self-distillation. The approach unlocks natural language understanding in over 127 languages by giving those languages access to the rich knowledge of English-centric LLMs. Keywords: cross-lingual transfer, model merging, mt-llm
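As a rough sketch of the stacking idea, under assumed names: a learned projection maps MT-encoder states into the LLM's input space, and the stacked model is trained to imitate the frozen LLM reading the English source (self-distillation). The adapter design and loss below are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MTEncoderAdapter(nn.Module):
    """Projects MT-encoder hidden states into the LLM's embedding space."""
    def __init__(self, mt_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(mt_dim, llm_dim)

    def forward(self, mt_states: torch.Tensor) -> torch.Tensor:
        return self.proj(mt_states)

def self_distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    # The "student" is the stacked model (MT encoder -> adapter -> LLM) reading
    # any supported language; the "teacher" is the same frozen LLM reading the
    # English source, so the LLM distills its knowledge into the stacked variant.
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Toy shapes: an NLLB-style encoder dim of 1024 mapped into an LLM dim of 4096.
adapter = MTEncoderAdapter(mt_dim=1024, llm_dim=4096)
llm_inputs = adapter(torch.randn(2, 16, 1024))
```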
Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning 💻 GitHub 📝 arXiv
Philipp Borchert, Jochen De Weerdt, Marie-Francine Moens
NAACL 2024 | Mexico City, Mexico 📍
This paper presents MultiRep, a novel approach to improve few-shot relation classification by combining multiple sentence representations using contrastive learning. This method effectively extracts complementary, discriminative information, proving especially beneficial in low-resource scenarios. Keywords: low-resource nlp, information extraction
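A toy sketch of the general recipe follows: derive several pooled views of one encoded sentence and train with a supervised contrastive objective so that instances of the same relation cluster together. The specific pooling functions and loss formulation here are my assumptions for illustration, not the MultiRep code.

```python
import torch
import torch.nn.functional as F

def multi_representations(hidden: torch.Tensor, mask: torch.Tensor):
    # hidden: (batch, seq, dim) encoder states; mask: (batch, seq) attention mask.
    cls_rep = hidden[:, 0]                                                # [CLS] token
    mean_rep = (hidden * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True)
    max_rep = hidden.masked_fill(mask.unsqueeze(-1) == 0, -1e9).max(1).values
    return [cls_rep, mean_rep, max_rep]

def supervised_contrastive_loss(reps, labels, temperature: float = 0.07):
    # Stack all views of all sentences; pairs sharing a relation label
    # (including the other views of the same sentence) act as positives.
    z = F.normalize(torch.cat(reps, dim=0), dim=-1)
    y = labels.repeat(len(reps))
    sim = z @ z.t() / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool)
    positives = ((y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask).float()
    logits = sim.masked_fill(self_mask, -1e9)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(log_prob * positives).sum(1).div(positives.sum(1).clamp(min=1)).mean()

# Toy usage with random encoder states and two relation classes:
hidden, mask = torch.randn(4, 12, 768), torch.ones(4, 12)
loss = supervised_contrastive_loss(multi_representations(hidden, mask),
                                   labels=torch.tensor([0, 0, 1, 1]))
```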
CORE: A Few-Shot Company Relation Classification Dataset for Robust Domain Adaptation 💻 GitHub 📝 arXiv
Philipp Borchert, Jochen De Weerdt, Kristof Coussement, Arno De Caigny, Marie-Francine Moens
EMNLP 2023 | Singapore 📍
CORE is a dataset for few-shot relation classification focused on company relations, designed to challenge models with the contextual complexity of business entities. The study demonstrates that while models struggle to adapt to CORE, training on this high-quality, information-rich dataset improves out-of-domain performance. Keywords: relation extraction, domain adaptation, dataset
Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques 💻 GitHub 📝 arXiv
Manon Reusens, Philipp Borchert, Margot Mieskes, Jochen De Weerdt, Bart Baesens
EMNLP 2023 | Singapore 📍
This study investigates whether debiasing techniques can be effectively transferred across different languages within multilingual LLMs. The findings confirm that cross-lingual transfer is not only feasible but also beneficial, with the SentenceDebias technique proving most effective by reducing bias by an average of 13% across the tested languages. Keywords: fairness, llm bias, cross-lingual transfer
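As background on the transferred technique: SentenceDebias estimates a bias subspace from counterfactual sentence pairs and projects it out of every sentence representation. Here is a NumPy-only sketch of that idea, with stand-in data and function names of my choosing.

```python
import numpy as np

def bias_subspace(pair_reps_a: np.ndarray, pair_reps_b: np.ndarray, k: int = 1):
    # pair_reps_*: (n_pairs, dim) representations of counterfactual sentence
    # pairs, e.g. the same sentence with "he" vs. "she".
    diffs = pair_reps_a - pair_reps_b
    diffs -= diffs.mean(axis=0)
    # The top-k principal components of the differences span the bias subspace.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]  # (k, dim), orthonormal rows

def debias(reps: np.ndarray, subspace: np.ndarray) -> np.ndarray:
    # Remove the component of each representation lying in the bias subspace.
    return reps - (reps @ subspace.T) @ subspace

# Usage with random stand-in embeddings:
a, b = np.random.randn(100, 768), np.random.randn(100, 768)
V = bias_subspace(a, b, k=2)
clean = debias(np.random.randn(5, 768), V)
```

Cross-lingual transfer of this technique amounts to estimating the subspace from pairs in one language and applying the projection to representations of another.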
SEER: A Knapsack approach to Exemplar Selection for In-Context HybridQA 💻 GitHub 📝 arXiv
Jonathan Tonglet, Manon Reusens, Philipp Borchert, Bart Baesens
EMNLP 2023 | Singapore 📍
This paper introduces SEER, a novel method for selecting diverse and representative examples for in-context learning in complex question-answering tasks. SEER formulates exemplar selection as a knapsack problem, which allows it to optimize for desirable attributes under size constraints and outperform previous methods on the FinQA and TAT-QA benchmarks. Keywords: in-context learning, integer linear programming
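To illustrate the knapsack framing with placeholder scoring: each candidate exemplar carries a value (its desirability for the test question) and a weight (its token length), and the goal is to maximize total value under the prompt budget. SEER itself solves a richer integer linear program with diversity and attribute constraints; the dynamic-programming toy below only captures the core trade-off.

```python
def knapsack_select(values, weights, budget):
    """0/1 knapsack: return indices of exemplars maximizing total value
    while keeping total token weight within the budget."""
    dp = [(0.0, [])] * (budget + 1)  # dp[w] = (best value, chosen indices) at capacity w
    for i in range(len(values)):
        # Iterate capacities downward so each exemplar is used at most once.
        for w in range(budget, weights[i] - 1, -1):
            candidate = dp[w - weights[i]][0] + values[i]
            if candidate > dp[w][0]:
                dp[w] = (candidate, dp[w - weights[i]][1] + [i])
    return dp[budget][1]

# Usage: pick the exemplars worth the most "score" that fit in 900 tokens.
scores = [0.9, 0.75, 0.6, 0.55, 0.4]  # e.g., similarity to the test question
lengths = [400, 350, 300, 250, 200]   # token counts per exemplar
print(knapsack_select(scores, lengths, budget=900))  # -> [0, 2, 4]
```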