Technical Methodology

Detailed breakdown of the AlzDetect vision transformer model and training protocol.

Training Dataset

Our model was developed on a clinical brain MRI dataset. We used a multi-modal approach with aggressive data augmentation to improve generalization across different scanner types.

- Total Samples: 33,600
- Augment Ratio: 4:1
- Resolution: 224x224
- Classes: 4

ViT-B/32 Architecture

We leverage the Vision Transformer (ViT) architecture, specifically the Base model with a 32x32 patch size. Unlike CNNs, which process pixels locally, ViT treats image patches as visual tokens and uses self-attention to learn relationships between them globally.

Patch Embedding: Each 224x224 MRI scan is divided into 49 non-overlapping 32x32 patches (a 7x7 grid), each linearly projected into an embedding vector.
Multi-Head Self-Attention: Enables the model to attend to multiple anatomical regions simultaneously (e.g., hippocampal atrophy vs. cortical thinning).
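The patch count follows directly from the input resolution: (224 / 32)^2 = 7 x 7 = 49 patches. A minimal sketch of the patch-embedding step, assuming a single-channel scan and the ViT-Base hidden size of 768 (the projection matrix is learned in practice; zeros are used here only as a placeholder):

```python
import numpy as np

patch_size = 32
embed_dim = 768                                   # ViT-Base hidden size
scan = np.zeros((224, 224), dtype=np.float32)     # placeholder single-channel MRI slice

# Split the scan into non-overlapping 32x32 patches: a 7x7 grid -> 49 tokens
n = 224 // patch_size                             # 7 patches per side
patches = scan.reshape(n, patch_size, n, patch_size).transpose(0, 2, 1, 3)
patches = patches.reshape(n * n, patch_size * patch_size)   # (49, 1024)

# Linear projection of each flattened patch into the embedding space
W = np.zeros((patch_size * patch_size, embed_dim), dtype=np.float32)  # learned weights
tokens = patches @ W                              # (49, 768) visual tokens
print(tokens.shape)  # (49, 768)
```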

Explainability (XAI) Protocol

The AlzDetect system outputs more than a classification score. It also produces an "Attention Map": a visualization of the model's self-attention weights projected back onto the original MRI scan.

# Manual attention rollout (per layer): add the identity to account for the
# residual connection, re-normalize rows, then propagate through the layers
identity = np.eye(n_patches + 1)                      # +1 for the [CLS] token
layer_attn = (attention_weights + identity) / 2
layer_attn /= layer_attn.sum(axis=-1, keepdims=True)  # rows sum to 1 again
rollout = layer_attn @ rollout                        # matrix product, not elementwise
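Applied across every layer, the per-layer update above becomes a short loop. A minimal sketch, assuming head-averaged attention matrices, a leading [CLS] token, and the 12 layers of ViT-Base (these conventions are assumptions, not taken from the AlzDetect code):

```python
import numpy as np

def attention_rollout(attentions):
    """Roll attention out across layers.

    attentions: list of (n_tokens, n_tokens) row-stochastic matrices, one per
    layer, already averaged over heads; token 0 is the [CLS] token.
    """
    n_tokens = attentions[0].shape[0]
    rollout = np.eye(n_tokens)
    for attn in attentions:
        a = (attn + np.eye(n_tokens)) / 2          # account for residual connection
        a = a / a.sum(axis=-1, keepdims=True)      # re-normalize rows
        rollout = a @ rollout                      # propagate through this layer
    # Attention from [CLS] to the 49 image patches -> 7x7 heat map
    return rollout[0, 1:].reshape(7, 7)

# Example with random attention over 50 tokens (1 [CLS] + 49 patches)
rng = np.random.default_rng(0)
attns = []
for _ in range(12):                                # 12 layers in ViT-Base
    a = rng.random((50, 50))
    attns.append(a / a.sum(axis=-1, keepdims=True))
heatmap = attention_rollout(attns)
print(heatmap.shape)  # (7, 7)
```

The resulting 7x7 map can be upsampled back to 224x224 and overlaid on the scan to produce the attention visualization described above.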