Résumé

Felix Hirwa Nshuti

I am Felix Hirwa Nshuti, an MS student in Electrical and Computer Engineering at Carnegie Mellon University (CMU). My work sits at the intersection of computer architecture, optimizing compilers, and theoretical machine learning. I focus on building next-generation ML infrastructure that integrates low-level systems engineering with high-level mathematical abstractions, using convex optimization to close the gap between algorithmic intent and architectural execution.

My research and development efforts center on compiler infrastructure for AI workloads, exploring how intermediate representations (IR) can effectively transform computation graphs into hardware-aware schedules. Additionally, I specialize in ML-driven signal processing, where I apply tensor-level optimizations to ensure predictable performance and scalability in real-time inference environments.

As an active open-source contributor and maintainer, I am dedicated to community collaboration and the development of verifiable, efficient AI systems. Previously, I completed my Bachelor's in Computer Science and Engineering at Pandit Deendayal Energy University (PDEU), focusing on programming languages, compilers, and machine learning.

  • Optimization & Theory: Deeply interested in the intersection of theoretical machine learning and convex optimization. Experienced in applying mathematical rigor to close the gap between algorithmic intent and architectural execution.
  • Signal Processing: Specialized in the design of efficient, scalable infrastructure for ML-driven signal processing, leveraging tensor-level optimizations to ensure predictable performance in real-time inference environments.
  • Systems for ML: Research and development in compiler infrastructure for AI workloads, focusing on how intermediate representations (IR) transform computation graphs into hardware-aware schedules.
  • Objective: To build next-generation ML systems that integrate low-level systems engineering with high-level mathematical abstractions for efficient and verifiable AI deployment.

In my free time, I play football and share memes with friends 😎

Research

Accepted Conference Talks

  • TransISA: A Static Assembly Transpiler for Automating x86-to-ARM Migration in Scientific Computing.
    Improving Scientific Software Conference, Boulder CO — 2026

Selected Projects

  • Enhancing Prophet with Question-Aware Captioning for Knowledge-Based VQA

    Knowledge-Based Visual Question Answering — Python, PyTorch, Transformers, NLP, CV, VQA — Aug 2025 – Dec 2025

    • Integrated question-aware captioning into the Prophet framework to improve the performance of knowledge-based VQA systems.
    • Developed a module that generates captions conditioned on the input question, improving the model's ability to retrieve and use external knowledge.
  • Context-Aware Demand Forecasting in Pittsburgh's Bike Share System

    Time Series Forecasting with Contextual Features — Python, Pandas, NumPy, XGBoost, CatBoost, statsmodels — Aug 2025 – Dec 2025

    • Developed a context-aware demand forecasting model incorporating temporal, weather, and event-based features to enhance prediction accuracy.
    • Applied time series analysis (statsmodels) and gradient-boosted tree models (XGBoost, CatBoost) to capture complex patterns in bike usage data.
  • TransISA — Lightweight CISC-to-RISC Transpiler

    Compiler Design and Architecture Translation — LLVM, x86, AArch64, C++, IR, CFG — Dec 2024 – May 2025

    • Designed and implemented a compiler pipeline using the LLVM C++ API to translate x86 assembly to AArch64 assembly via LLVM IR.
    • Extracted and analyzed Control Flow Graphs (CFGs) of source programs to support instruction-level translation.
  • Scaling Deep Learning Backends with sktime

    Machine Learning with Time Series — Python, PyTorch, TensorFlow, Darts, CI/CD — May 2024 – Aug 2024

    • Added new PyTorch-based classification models to sktime (GRU and GRU-FCNN classifiers).
    • Migrated the classifier models from the legacy sktime-dl package to the main sktime repository.
    • Implemented a modular interface for Darts regression models in sktime.

Some Open-Source Contributions

  • Ivy - A unified AI framework (Maintainer: 2022 – 2023)
  • pytorch-forecasting - Time series forecasting with PyTorch (Maintainer: 2024 – Present)
  • sktime - A unified framework for machine learning with time series (Maintainer: 2024 – Present)
  • pytorch-lightning - Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes (Contributor: 2024 – Present)