Hieu Quoc Nguyen

ML/AI x Engineering x Business

Georgia Institute of Technology

University of Waterloo

About Me

Hi, I’m a Senior Data Scientist/ML Engineer with a firm grounding in statistics and computer science from the University of Waterloo and Georgia Institute of Technology. I have a track record of operating at the intersection of people, business, engineering, and science, collaborating with cross-functional teams to design and build high-performance ML systems.

My expertise lies in translating complex business challenges into scalable, data-driven, science-backed solutions that drive measurable business impact and operational efficiency. I specialize in developing machine learning solutions across critical domains including Trust & Safety, fraud detection, content moderation, browser security, search/ranking and financial services.

I’m passionate about building robust ML systems that serve hundreds of millions of users with high availability, leveraging advanced techniques in NLP, search/ranking, distributed systems, and applied mathematics.

Interests
  • Statistics & Applied Mathematics
  • Machine Learning & Deep Learning (NLP, Search/Ranking)
  • Distributed Systems & MLOps
Education
  • Master of Science in Computer Science, 2028

    Georgia Institute of Technology

  • Bachelor of Mathematics, 2019

    University of Waterloo

Core Technologies & Skills

Machine/Deep Learning

NLP/LLM, search/ranking, and distributed ML systems with a focus on business impact

Programming & Data

Python and SQL for scalable data processing and analysis, built on statistical and mathematical foundations

Infrastructure & Cloud

AWS, Databricks, Docker, Kubernetes, Terraform for high-performance ML system deployment

Data Engineering

Spark, Apache Airflow, dbt, Redis, MongoDB for robust data pipelines and distributed systems

MLOps & Automation

GitHub Actions, CI/CD pipelines, model versioning and monitoring for scalable ML operations

Web Frameworks

Flask, Django for building production ML services and APIs with high availability

Experiences

Senior Data Scientist
March 2025 – Present Toronto, Canada
• We help people get jobs while keeping our users and platform safe

Machine Learning Engineer/Data Scientist
March 2024 – March 2025 Toronto, Canada

• Led end-to-end MLOps strategy and cloud architecture for McAfee’s Search Acceleration initiative (a half-billion-dollar business)

• Designed and deployed high-performance ML infrastructure using Terraform, Databricks Asset Bundles (IaC), and GitHub Actions, reducing model deployment time by over 50%

• Led ML service design and deployment on AWS supporting 200M+ weekly active devices with 99.99% availability worldwide and 95th percentile response time under 100ms

Technical Co-Founder
b(x) Theory - Insolvency Analytics
August 2022 – September 2023 Toronto, Canada
• Along with the team, did whatever was necessary to bring b(x) Theory to life: an innovative fintech service that leverages high-quality data, intuitive visualization, and machine learning to transform how financial advisors and C-suite executives manage risk and identify opportunities in restructuring/special situations in Canada.

NLP Data Scientist
April 2021 – August 2022 Toronto, Canada

• Led end-to-end research, development, and deployment of novel NLP deep learning services on AWS (ECS, EC2, EMR, SageMaker)

• Achieved under 20ms inference latency per request with 99.99% service availability across global markets for domain name auto-completion

• Delivered 18% uplift in premium domain sales, 16% surge in search volumes, and 30% increase in click-through rates, resulting in $20M+ USD annual SERP revenue increase

• Enhanced performance through personalization based on language, region, and device preferences using FNet, Transformers, BERT, CNN, and bi-directional LSTM architectures

NLP Data Scientist
January 2020 – April 2021 Toronto, Canada

• Led R&D of an NLP solution for Canadian Banking Operations to detect crucial payment-related emails from commercial clients nationwide

• Transformed unsupervised text classification into a semi-supervised learning paradigm, boosting true positive rates by 19% while reducing false negatives by 35%

• Achieved projected annual savings of $1M CAD through advanced techniques including topic modeling with LDA, XGBoost and BERT embeddings

Various Internships in Data Science and Software Engineering
RBC Capital Markets, RBC Global Cybersecurity, and Marsh Canada Ltd
May 2017 – September 2019 Toronto, Canada
Gained foundational experience in data science, software engineering, and financial technology across multiple industry-leading organizations.

Research/Projects

Generative AI with Large Language Models Course Notes
Comprehensive notes on the “Generative AI with Large Language Models (LLMs)” course offered by DeepLearning.AI
AI Research Agent - R0D1
R0D1 is an autonomous research agent powered by a Flask API. The end-to-end pipeline, from Docker containerization and image push to AWS ECR through deployment to AWS ECS Fargate with a load balancer, is fully automated with GitHub Actions.
End-to-End Python ETL Pipeline
This project builds an end-to-end Python ETL (Extract, Transform, Load) pipeline on AWS. The pipeline extracts Toronto real estate property data from the Zillow Rapid API and processes it through AWS components such as EC2, S3, Lambda, Redshift, and QuickSight.
AlphaGo
Mastering the game of Go with deep neural networks and tree search
Latent Dirichlet Allocation (LDA) Algorithms
Full Derivation of Latent Dirichlet Allocation (LDA) with Variational E-M Algorithms
Graph Convolution Networks (GCN)
An Introduction to Graph Convolution Networks (GCN)