Ayman Mahfuz

Computer Science Student | Software Engineer & Machine Learning Researcher

RoboCup 2025 Bronze | Patent Pending | 4 Research Labs | ML @ Arm

Featured Impact

99.8th Percentile

Patent-Pending ML Validation

First-of-its-kind Bayesian Optimization framework that discovers worst-case hardware stress in <1% of search space at Arm

Python • Bayesian Optimization • Random Forests

🥉 Bronze

RoboCup 2025

Multi-Agent RL for Robot Soccer

Core AI developer for UT Austin Villa's 7v7 robot soccer team, building RL-powered agent skills that won 3rd place globally

Reinforcement Learning • C++ • GPU Optimization

150M+ Data Entries

Large-Scale Media Analysis Pipeline

Built end-to-end data/ML infrastructure processing 50M+ articles and 70M+ comments with 99%+ accuracy BERT models at UT's Center for Media Engagement

BERT • BigQuery • React/Flask

About Me

Hi, I'm Ayman. I'm a CS student at UT Austin (Hook 'em) pursuing a concentration in AI and ML. I've worked across 4 labs at UT as an undergraduate research assistant, from building RL environments for robot soccer to implementing and benchmarking ViTs and LLMs to aid medical research. I currently do ML research at Arm to optimize post-silicon validation. I love building impactful projects and solving complex problems, and I'm extremely passionate about deep learning.

Patent-pending ML validation at Arm: Industry-first Bayesian optimization that uncovers worst-case CPU & memory stress in <1% of the search space.
RoboCup 2025 bronze: Worked on multi-agent reinforcement-learning skills and environments powering UT Austin Villa's 7v7 robot soccer team, where we won bronze in a global tournament.
150M+-entry media pipeline: Built data/ML stack & fine-tuned BERT models (99%+ accuracy) for UT's Center for Media Engagement.
Pancreas MRI segmentation on H100s using computer vision and transformers: Engineered 3D medical-image pipeline, achieving +12% Dice at the Oden Institute.
Youngest speaker, UT AI Health Symposium: Presented research on multi-agent LLM clinical reasoning.

Education

The University of Texas at Austin

Bachelor of Science

Location: Austin, TX, USA

Major: Computer Science

Concentration: Artificial Intelligence and Machine Learning

Relevant Coursework:

Generative Visual Computing
Natural Language Processing
Science of High Performance Computing
Data Structures
Computer Architecture and Organization
Computer Systems and Operating Systems
Algorithms
Linear Algebra
Probability

Experience

Arm

ML Research Engineer (Part-Time) | Austin, TX • May 2025 – Present

Independently proposed and built patent-pending ML framework using Bayesian Optimization to automatically discover worst-case hardware stress tests, achieving 99.8th-percentile stress levels while exploring <1% of configuration space. Automated 10,000+ hours of validation testing for next-generation AI platforms.

Invented dual-surrogate Bayesian Optimization pipeline using Random Forests to navigate vast, non-linear search space of hardware parameters
Achieved 99.8th-percentile hardware stress by intelligently exploring <1% of configuration space, automating 10,000+ validation hours
Earned executive-level (SVP) recognition and pending patent; framework now key part of Arm's validation strategy

University of Texas – AI Lab, Texas Robotics

Research Assistant | Austin, TX • Jan 2025 – Present

Developed AI-driven agent skills that helped secure 3rd place at RoboCup 2025. Built RL-based policies for walking, dribbling, and attacking in 400K-line C++ codebase, slashing training time 70% through GPU optimization.

Designed hierarchical RL policies for all attacker behaviors, proving more robust than classical methods
Reduced RL training time 70% through aggressive GPU optimization and C++ simulator tuning
Pioneered curriculum learning strategies and novel reward shaping (Pitch Control, xG) for multi-agent RL

The Sunwater Institute

Data Engineer Intern | North Bethesda, MD • Jan 2025 – Mar 2025

Built high-performance data pipelines for Legis-1 Platform, processing millions of legislative documents. Developed LLM pipelines using RAG and embeddings to analyze 500K+ legal records for AI-driven policy research.

Optimized retrieval speed, storage efficiency, and AI-readiness for legislative database with millions of documents
Developed LLM pipelines with RAG, embeddings, and scalable processing across 500K+ legal records

University of Texas – Center for Media Engagement

Software Engineer, Research Assistant | Austin, TX • Sep 2023 – May 2025

Built 150M-entry dataset processing 50M+ articles and 70M+ comments. Fine-tuned BERT models achieving 99% accuracy for NLP tasks. Designed React/Flask/Firebase platform serving 1,000+ participants with 99.99% uptime.

Engineered data pipelines to BigQuery using APIs, sitemaps, and Pandas; built dashboards with Python and SQL
Fine-tuned BERT models for clickbait detection, story ID, entity recognition, sentiment analysis (99% accuracy)
Built React/Flask/Firebase platform with 3 interactive games, MTurk integration, 15+ metrics tracking

University of Texas – Oden Institute

ML Research Assistant | Austin, TX • Feb 2024 – Jan 2025

Built containerized 3D pancreas MRI segmentation pipeline on H100 supercomputer using CNNs and transformers. Achieved +12% Dice gain matching SOTA performance across 1000+ scans with 5-fold cross-validation.

Engineered Apptainer/SLURM pipeline on TACC H100s, achieving +12% Dice gain with hybrid architectures
Benchmarked CNNs, vision transformers, and hybrids across 1000+ scans, finding PanSegNet excels for small organs
Resolved GPU memory bottlenecks, I/O lag, and mixed precision instability for stable large-scale training

University of Texas – School of Information

Research Assistant | Austin, TX • Feb 2024 – Jan 2025

Designed multi-agent LLM research project studying diagnostic consistency in medical reasoning. Applied Cohen's Kappa, Chi-square tests, and logistic regression to assess agreement and bias. Presented at UT AI Health Conference as youngest speaker.

Project on page 124 of this report
Tested multi-agent LLM consistency with demographic/symptom variations; analyzed inter-agent communication patterns
Applied Cohen's Kappa, Chi-square, logistic regression to assess agreement, accuracy, and bias across agents
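
For readers unfamiliar with the agreement metric named above, here is a minimal sketch of Cohen's Kappa for two raters in plain Python. It is an illustration only, not the project's analysis code; the function name `cohens_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's label
    frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both raters always emit the same single label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up diagnoses from two hypothetical agents on four cases.
agent_1 = ["flu", "flu", "cold", "cold"]
agent_2 = ["flu", "cold", "cold", "cold"]
kappa = cohens_kappa(agent_1, agent_2)  # observed 0.75, chance 0.5 -> 0.5
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance.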

Lockheed Martin

Software Engineer Intern | Remote • Jun 2022 – Oct 2022

Optimized CRM workflows with JavaScript and RPA, achieving centralized device data framework. Refined Configuration Database, purging redundant records and presenting data-driven insights to executives.

Developed CRM workflows achieving centralized device data framework for enhanced enterprise efficiency
Implemented RPA for data de-duplication, streamlining processes and elevating data integrity

University of Maryland – College Park

Research Intern | Remote • Jun 2023 – May 2024

Built NLP-driven chatbot for online news engagement using deep learning. Executed text analytics with POS tagging, LIWC, and sentence embeddings. Published research at CHI 2024 conference.

Published at CHI 2024 conference
Led NLP chatbot development for news reader engagement, conducting studies on human-chatbot dynamics
Performed text analytics using POS tagging, LIWC, and clustering on sentence embeddings with Python

City of Austin

Software Engineer Intern | Austin, TX • Jun 2021 – Aug 2021

Improved post-COVID loan processing workflows for small businesses using Python scripting and data visualization to streamline operations.

AT&T

Summer Learning Academy | Austin, TX • Jun 2021 – Aug 2021

As youngest participant, gained exposure to AI, business strategies, and professional development while collaborating on tech-focused initiatives.

Projects

Deep technical work spanning AI systems, LLMs, and generative models

Live Demo • AI-First

Cursor for Product Managers

A Jira alternative with AI-first workflows

Natural language ticket management, auto-generated daily briefings, and live meeting capture that converts discussions into tickets automatically. Built an agentic system that reasons over your entire project context.

25+ Callable Tools • Hybrid RAG Layer • Live Meeting Capture

Agentic LLM system via OpenAI function calling for contextual reasoning over tickets, GitHub PRs, commits, and transcripts
Hybrid RAG combining SQL filtering with vector embeddings across all project data
Auto-briefings that synthesize overnight activity, surface blockers, and recommend prioritized actions

OpenAI Function Calling • RAG • Vector Embeddings • React • PostgreSQL

From Scratch • 27.03 PPL

Modern LLM Training Pipeline

253M parameter model outperforming GPT-2

Frontier-style LLM training pipeline from scratch with modern architectural choices and a complete alignment workflow: Pretrain → SFT → DPO → Verifier.

253M Parameters • -33% vs GPT-2 PPL • 4 Training Stages

RoPE: Rotary Position Embeddings for better length extrapolation
RMSNorm: Root Mean Square LayerNorm for faster, stable training
SwiGLU: Gated Linear Units with Swish activation (2-4% better than GELU)
Attention Sinks: Learnable tokens for stable generation beyond training context

PyTorch • RoPE • RMSNorm • SwiGLU • DPO • RLHF
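
As a rough illustration of two of the components listed above, the sketch below writes RMSNorm and a SwiGLU feed-forward block in plain Python on lists of floats. The project itself is in PyTorch; this is not its code, and the names `rms_norm`, `swiglu`, `w_gate`, `w_up`, and `w_down` are assumptions chosen for readability.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Divide a vector by its root mean square, then apply a per-feature gain.

    Unlike LayerNorm, no mean is subtracted and no bias is added.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * inv_rms for g, v in zip(gain, x)]


def silu(z):
    """Swish / SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(z) for z in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Tiny example: model width 2, hidden width 3 (weights are arbitrary).
x = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
w_gate = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
w_up = [[0.2, 0.0], [-0.3, 0.4], [0.1, 0.1]]
w_down = [[0.5, -0.2, 0.1], [0.0, 0.3, 0.2]]
y = swiglu(x, w_gate, w_up, w_down)
```

After `rms_norm` with unit gain, the vector has a root mean square of approximately 1, which is the property that keeps activations at a stable scale from layer to layer.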

From Scratch • MNIST + CIFAR

Diffusion Models: DDPM & DDIM

Denoising diffusion from first principles

Complete DDPM and DDIM implementations in PyTorch without relying on Hugging Face or diffusers. Trained on MNIST and CIFAR-10 with full analysis of speed/quality trade-offs.

2 U-Net Architectures • DDPM + DDIM Samplers • EMA Weight Averaging

MNIST U-Net: Cosine noise schedule, sinusoidal time embeddings, self-attention bottleneck
CIFAR-10 U-Net: Multi-resolution self-attention, dropout, EMA for stable sampling
DDIM Sampling: Deterministic generation with arbitrary step counts (10-1000)
Full speed/quality analysis comparing DDPM vs DDIM across step counts

PyTorch • U-Net • DDPM • DDIM • EMA • SLURM/HPC
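
To make the cosine schedule and the deterministic DDIM sampler mentioned above concrete, here is a small plain-Python sketch on lists of floats. It is not the repository's PyTorch code: the function names, the offset `s=0.008`, the evenly spaced timestep choice, and the `eps_model(x, t)` callable standing in for the U-Net are all assumptions for illustration.

```python
import math


def cosine_alpha_bar(t, total_steps, s=0.008):
    """Cumulative signal level alpha_bar(t) under a cosine noise schedule."""
    def f(u):
        return math.cos(((u / total_steps + s) / (1 + s)) * math.pi / 2) ** 2
    return f(t) / f(0)


def ddim_timesteps(total_steps, num_sample_steps):
    """Evenly spaced subsequence of timesteps, from noisiest to cleanest."""
    return [(i * total_steps) // num_sample_steps for i in range(num_sample_steps)][::-1]


def ddim_step(x_t, eps, ab_t, ab_prev):
    """One deterministic (eta = 0) DDIM update, applied element-wise."""
    x0_pred = [(x - math.sqrt(1 - ab_t) * e) / math.sqrt(ab_t) for x, e in zip(x_t, eps)]
    return [math.sqrt(ab_prev) * x0 + math.sqrt(1 - ab_prev) * e
            for x0, e in zip(x0_pred, eps)]


def ddim_sample(eps_model, x_start, total_steps=1000, num_sample_steps=50):
    """Run the sampler; eps_model(x, t) is a placeholder for the trained network."""
    ts = ddim_timesteps(total_steps, num_sample_steps)
    x = list(x_start)
    for i, t in enumerate(ts):
        ab_t = cosine_alpha_bar(t, total_steps)
        # After the last timestep the target is the clean sample (alpha_bar = 1).
        ab_prev = cosine_alpha_bar(ts[i + 1], total_steps) if i + 1 < len(ts) else 1.0
        x = ddim_step(x, eps_model(x, t), ab_t, ab_prev)
    return x
```

Because the update has no random term, the same starting noise always yields the same sample, and `num_sample_steps` can be changed at sampling time without retraining, which is what makes the speed/quality comparison across step counts possible.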

More Projects

Inkwell

YouTube for Books - dynamic book-sharing platform empowering authors

React • Django • AWS S3

InReach AI

AI networking tool with 200+ users - auto-drafts professional outreach

Flask • React • OpenAI

CodeXray

Graph-powered codebase explorer with GPT-4o for architecture visualization

React • AST • GPT-4o

LeetCode Matchmaker

Find similar LeetCode problems using ML-powered matching

Flask • Scikit-learn • PostgreSQL

EAGLE Router

Training-free multi-LLM router using Elo ratings and embeddings

Python • OpenAI • HuggingFace

Skills

Programming Languages

Python • Java • C • JavaScript • HTML/CSS • Ruby • C++ • PHP

Frontend Development

React.js • Node.js • HTML/CSS

Backend Development

Flask • Django • Node.js

Data Science & Machine Learning

Pandas • NumPy • Scikit-learn

Databases

SQL • PostgreSQL

Tools & Libraries

Git • AWS • Google Cloud Platform

Miscellaneous

ARM64 • MATLAB

Hobbies & Interests

Deep Learning

I'm passionate about learning and keeping up with the latest advancements in deep learning, from new architectures to real-world applications.

Weightlifting

I love spending time in the gym and hitting new maxes.

Soccer

I've been playing soccer since I could walk. If I'm not working, you can find me on the nearest field!

Family and Friends

I cherish spending quality time with friends and family, whether it's a casual hangout or a special gathering.

Startups

I spend a lot of time working on my startups. Outside of coding, that means asking people I'm close with for advice and feedback on my current projects.