Advancing Scalable Video Understanding with Specialized Spatiotemporal Architectures

PhD Student, NTU Singapore | CIS Scholar, A*STAR I²R

I'm a Year 3 PhD student at Nanyang Technological University (School of Electrical & Electronic Engineering) and a CIS Scholar at A*STAR's Institute for Infocomm Research (I²R). My advisor is Prof. Xudong Jiang.

My research focuses on efficient video understanding using state space models (Mamba), achieving significant spatiotemporal compression while maintaining segmentation accuracy. I'm particularly interested in making large foundation models computationally tractable for real-world deployment.

Originally from Tamil Nadu, India, I graduated with First-Class Honours while working full-time as a Research Assistant, where I led R&D projects that secured >$200K in government funding and supervised 40+ students.

Video Segmentation | State Space Models (Mamba) | Token Compression | Foundation Models | Efficient Deep Learning | SLAM

News

Jan 2026
🔬 Research Trip: Preparing for a research collaboration visit to Tianjin, China.
2025
🎉 CVPR 2025 Accepted: Our paper "TV3S: Exploiting Temporal State Space Sharing for Video Semantic Segmentation" has been accepted to CVPR 2025 (Acceptance Rate: 22.1%)! My first CVPR paper as first author.
2025
💰 Grant: Awarded the Lambda Research Grant for GPU computing resources.
Sep 2024
🎓 Leadership: Promoted to Asst. President (Academic) at NTU Graduate Student Association.
Jan 2024
🚀 PhD Started: Began my PhD journey at NTU with the prestigious A*STAR Computing and Information Science (ACIS) Scholarship.

Education & Experience

Jan 2024 – Present

Doctor of Philosophy (PhD)

Nanyang Technological University, Singapore

School of Electrical and Electronic Engineering
Topic: Effective and Label-Efficient Visual Perception with Large Foundation Models
Advisor: Prof. Xudong Jiang

GPA: 5.0/5.0 | A*STAR CIS Scholar
Jan 2024 – Present

A*STAR Research Scholar (ACIS)

A*STAR - Agency for Science, Technology and Research

Institute for Infocomm Research (I²R), Fusionopolis, Singapore
Computing & Intelligence Systems Programme

  • Conducting research on deep-learning-based video understanding
  • Applying foundation models to image and video understanding
Jul 2020 – Dec 2023

Bachelor of Engineering (Honours)

Nanyang Technological University, Singapore

Electrical and Electronic Engineering
FYP: Deep Features based Real-Time SLAM

GPA: >4.9/5.0 | First-Class Honours | Dean's List ×4
Nov 2020 – Jan 2024

Research Staff (Computer Vision)

Republic Polytechnic, School of Engineering

Concurrent with Bachelor's studies (full-time work + part-time degree)

  • Led an R&D team developing a cost-effective near-field pose-estimation system, securing >$200K in government funding
  • Published multiple conference and journal papers in computer vision and AI applications
  • Supervised and mentored 40+ students, with supervised projects winning competitions
  • Implemented a real-time SLAM visualization pipeline using Python, C++, C#, and ROS
Mar 2018 – Aug 2018

Research Intern

Continental Automotive R&D, Singapore

  • Collaborated on 3+ research projects: UV bus passenger monitoring, autonomous pick & place robot, iOS indoor localization with Bluetooth Beacons, RTK GPS evaluation
  • Managed project showcases during company open house events
2016 – 2019

Diploma in Electrical & Electronic Engineering

Republic Polytechnic, Singapore

Final Year Project: Improving Recognition Performance for Low-Resolution Images Using DBPN

GPA: >3.9/4.0 | Director's Roll of Honor | 4× Module Prize | 15/19 Distinctions

Honors & Awards

2025
CVPR 2025 Paper Accepted
First-author paper at the #1 Computer Vision conference (22.1% acceptance rate)
2025
Lambda Research Grant
GPU computing resources for deep learning research
2024
A*STAR CIS Scholarship
Prestigious computing scholarship for PhD studies
2021-2024
NTU Dean's List (×4)
Top 5% of cohort for four consecutive academic years
2022
Engineering Innovation Challenge Winner
As project supervisor
2021
Lee Hsien Loong IDM Smart Nation Award
As project supervisor
2021
IEEE ICME Grand Challenge Bronze
As team leader
2019
National IES Innovation Challenge Bronze
National-level engineering competition
2019
Director's Roll of Honor
Republic Polytechnic academic excellence award

Skills

Deep Learning

PyTorch | TensorFlow | Scikit-Learn | W&B

Computer Vision

Video Segmentation | State Space Models | Foundation Models | OpenCV | SLAM

Programming

Python | C++ | C# | ROS

Tools & Systems

Linux/Ubuntu | Docker | Git | SLURM

Languages

English (Fluent) | Tamil (Fluent) | Hindi (Fluent) | Malay (Native)

Publications

🏆 Top-Tier Venues

TV3S: Exploiting Temporal State Space Sharing for Video Semantic Segmentation

Syed Ariff Syed Hesham, Yun Liu, Guolei Sun, Henghui Ding, Jing Yang, Ender Konukoglu, Xue Geng, Xudong Jiang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025 | 22.1% acceptance rate

Efficient video semantic segmentation that shares temporal state spaces in a Mamba architecture to reduce computational cost.

PIX2PT Map for Transfer-Based Few-Shot Learning

Syed Ariff Syed Hesham, Sui JinZhou*, Gabrielle Ee Song Xin, Lek Chen Ping, Lijun Jiang

IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2021

Pixel-to-prototype mapping framework for improving transfer learning in few-shot classification tasks.

📚 Journal Publications

Evaluating SAM2 for Video Semantic Segmentation

Syed Ariff Syed Hesham, Yun Liu, Guolei Sun, Jing Yang, Henghui Ding, Xue Geng, Xudong Jiang

Machine Intelligence Research, 2025

Benchmark and analysis of Segment Anything Model 2 for video-level semantic segmentation tasks.

Leveraging on Few-Shot Learning for Tire Pattern Classification in Forensics

Lijun Jiang, Syed Ariff Syed Hesham*, Keng Pang Lim, Changyun Wen

Journal of Automation and Intelligence, 2023

Few-shot learning approach for tire pattern identification in forensic investigations with limited training samples.

📝 Preprints & Under Review

STAC: Selective Spatiotemporal Aggregation and Compression for Video Reasoning Segmentation

Syed Ariff Syed Hesham, Yun Liu, Guolei Sun, Jing Yang, Henghui Ding, Xue Geng, Xudong Jiang

CVPR 2026 Conference Submission, December 2025 | Under Review

Efficient state-space compression framework that achieves 85% token reduction and a 1.8× speedup in both training and inference while maintaining competitive accuracy.

A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects

Guohuan Xie, Syed Ariff Syed Hesham*, Wenya Guo, Bing Li, Ming-Ming Cheng, Guolei Sun, Yun Liu

arXiv:2506.13552, 2025 | Under Review

Holistic review of video scene parsing, covering semantic, instance, and panoptic segmentation as well as open-vocabulary methods.

🎤 Conference Publications

Tracking and Monitoring of Underwater Object with SLAM

Lijun Jiang, Syed Ariff Syed Hesham*, Lan JunHang, Seah Kai Wen Kelvin, Yuhan Jiang, Yao Mengdi, Wei Dongliang, Bo Jiang

IEEE 19th Conference on Industrial Electronics and Applications (ICIEA), 2024

Visual SLAM system for real-time tracking and monitoring of underwater objects in challenging aquatic environments.

Adapted Lightweight MobileNet for Tire Pattern Classification

Syed Ariff Syed Hesham, Lijun Jiang, Keng Pang Lim, et al.

IEEE 18th Conference on Industrial Electronics and Applications (ICIEA), 2023

Efficient mobile-optimized deep learning model for automated tire pattern recognition and classification.

A Versatile Application for Visual SLAM with Object Detection

Lijun Jiang, Syed Ariff Syed Hesham*, Keng Pang Lim, Yusong Wang, Hongkai Lin, Yuhang Zhao

IEEE 17th Conference on Industrial Electronics and Applications (ICIEA), 2022

Integrated framework combining visual SLAM with real-time object detection for versatile robotic applications.

YOLO Based Thermal Screening Using AI for Instinctive Human Facial Detection

Lijun Jiang, Syed Ariff Syed Hesham, Keng Pang Lim, Krishnadas Manoj*, et al.

IEEE 17th Conference on Industrial Electronics and Applications (ICIEA), 2022

YOLO-based AI system for automated thermal screening and facial detection in public health monitoring.

Borescope Tracking and Visualization of Internal Aero-Structure with VSLAM

Lijun Jiang, Syed Ariff Syed Hesham*, Keng Pang Lim, Sui JinZhou, Bo Jiang

IEEE 17th Conference on Industrial Electronics and Applications (ICIEA), 2022

Visual SLAM-based borescope navigation system for 3D reconstruction and inspection of internal aircraft structures.

Improving Recognition Performance for Low-Resolution Images Using DBPN

Lijun Jiang, Keng Pang Lim, Syed Ariff Syed Hesham*

IEEE 16th Conference on Industrial Electronics and Applications (ICIEA), 2021

Deep Back-Projection Network for super-resolution enhancement of low-resolution images to improve recognition accuracy.

* denotes corresponding author or equal contribution

→ View all publications on Google Scholar

Get in Touch

Location

Singapore

Lab

A*STAR I²R, Fusionopolis

_________________________

Rapid-Rich Object Search Lab (ROSE) | NTU Singapore