Vision Transformer (ViT-B/16) Architecture Implementation
Python, PyTorch, Torchvision, Torchinfo, NumPy, Matplotlib, PIL, Kagglehub, Jupyter Notebook, Google Colab, Git, GitHub
Implemented the Vision Transformer (ViT-B/16) architecture from scratch in PyTorch, following the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." Manually built all core components, including convolutional patch embeddings, class and positional embeddings, Multi-Head Self-Attention (MSA) and MLP blocks with Layer Normalization (LN) and residual connections, as well as the final classification head.
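For reference, the components listed above map onto the four encoder equations from the paper, reproduced here as a sketch in the paper's notation. The symbols are the paper's, not names from this project's code: P is the patch size, C the number of image channels, N the number of patches, D the embedding dimension, and L the number of encoder layers (for ViT-B/16, P = 16, D = 768, L = 12).

```latex
\begin{align}
\mathbf{z}_0 &= [\mathbf{x}_\text{class};\, \mathbf{x}_p^1\mathbf{E};\, \dots;\, \mathbf{x}_p^N\mathbf{E}] + \mathbf{E}_\text{pos},
  & \mathbf{E} \in \mathbb{R}^{(P^2 \cdot C) \times D},\ \mathbf{E}_\text{pos} \in \mathbb{R}^{(N+1) \times D} \\
\mathbf{z}'_\ell &= \mathrm{MSA}(\mathrm{LN}(\mathbf{z}_{\ell-1})) + \mathbf{z}_{\ell-1}, & \ell = 1, \dots, L \\
\mathbf{z}_\ell &= \mathrm{MLP}(\mathrm{LN}(\mathbf{z}'_\ell)) + \mathbf{z}'_\ell, & \ell = 1, \dots, L \\
\mathbf{y} &= \mathrm{LN}(\mathbf{z}_L^0)
\end{align}
```

The first equation covers the patch, class, and positional embeddings. The second and third are the MSA and MLP blocks with pre-normalization and residual connections. The fourth produces the representation fed to the classification head.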
Used the equations and architectural definitions from the original paper to reason about data flow and tensor transformations throughout the model, explicitly tracking tensor shapes step-by-step from input images to output classification in order to ensure correctness and deepen understanding of the model structure.
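The shape tracking described above can be sketched with plain arithmetic, with no PyTorch required. This is a minimal illustration, not the project's code: the function name `vit_shape_trace` is hypothetical, the 224x224 RGB input is an assumption taken from the paper's default resolution, and the 5 output classes come from the weather dataset used in this project.

```python
# Hypothetical helper (not from the project): trace ViT-B/16 tensor shapes
# stage by stage using integer arithmetic only.
def vit_shape_trace(batch=1, channels=3, image_size=224,
                    patch_size=16, embed_dim=768, num_classes=5):
    # Assumption: square images whose side is a multiple of the patch size.
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    grid = image_size // patch_size   # patches per side (14 for 224 / 16)
    num_patches = grid * grid         # N = 196 for the defaults above

    return [
        ("input image",               (batch, channels, image_size, image_size)),
        # A conv with kernel = stride = patch_size yields one vector per patch.
        ("conv patch embedding",      (batch, embed_dim, grid, grid)),
        ("flatten + transpose",       (batch, num_patches, embed_dim)),
        ("prepend class token",       (batch, num_patches + 1, embed_dim)),
        ("add positional embedding",  (batch, num_patches + 1, embed_dim)),
        # Encoder blocks (MSA + MLP with residuals) preserve the shape.
        ("encoder output",            (batch, num_patches + 1, embed_dim)),
        ("class token after final LN", (batch, embed_dim)),
        ("classification head",       (batch, num_classes)),
    ]


for name, shape in vit_shape_trace():
    print(f"{name:<28} {shape}")
```

Comparing each printed tuple against the `.shape` of the corresponding tensor in a real forward pass is one way to catch a missing transpose or a forgotten class token early.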
Validated the implementation end-to-end by training the model from scratch on a 5-class weather image classification dataset sourced from Kaggle. Documented training simplifications relative to the paper and compared the custom implementation with PyTorch's built-in ViT.
Autonomous Vehicle Path Planning, Deep Learning & Ethics
International Baccalaureate Programme | IB Extended Essay | November 2023 – February 2025
Python, TensorFlow, PyTorch
Produced a 4,000-word research paper evaluating the societal, ethical, and regulatory impacts of autonomous vehicles (AVs), drawing on empirical research and academic literature. Analyzed deep learning applications in AV perception and path-planning systems, assessing both their technical capabilities and their ethical limitations.
Designed and conducted a primary survey on public perceptions of AV safety and adoption, generating quantitative insights through data analysis. Synthesized primary and secondary sources to develop evidence-based predictions on future AV regulation and adoption trends.