Model Architecture Design


The core of the recognition system is a Spatial-Temporal Transformer (ST-Transformer), a dual-attention neural network that processes streaming skeletal data by modeling both the spatial relationships between body parts within each frame and their temporal dynamics across frames.

Architecture Overview

The model takes a sequence of spatial keypoints and outputs a probability distribution over the sign classes. It uses a hierarchical approach: anatomical groups are first embedded independently, then processed through alternating spatial and temporal attention mechanisms.

Diagram

```mermaid
graph TD
    Input["Input Sequence (Batch, T, F)"] --> GTE[Group Token Embedding]
    GTE --> PE[Positional Encoding]
    PE --> Drop[Dropout]
    Drop --> B1[ST-Transformer Block 1]
    B1 --> B2[ST-Transformer Block N]
    B2 --> MP[Mean Pooling]
    MP --> AP[Attention Pooling]
    AP --> FC[Classifier Head]
    FC --> Logits[Class Logits]
```

1. Group Token Embedding

The model independently projects anatomical regions (Pose, Face, Left Hand, Right Hand) into a shared latent space.

  • Separate Projections: four independent linear layers, one per anatomical group.
  • Normalization: batch normalization on the raw input features for stability.
  • Tokenization: the projected body parts are stacked as “tokens” for spatial attention.
  • Part Embeddings: a learnable embedding is added to each token to distinguish the anatomical regions.
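The embedding stage can be sketched as follows. The per-group feature sizes (99/210/63/63) and the embedding dimension of 256 are illustrative assumptions, not values taken from the actual implementation:

```python
import torch
import torch.nn as nn

class GroupTokenEmbedding(nn.Module):
    # Hypothetical feature splits: pose=99, face=210, left hand=63, right hand=63.
    def __init__(self, group_dims=(99, 210, 63, 63), d_model=256):
        super().__init__()
        self.group_dims = group_dims
        # Batch normalization on each group's raw input features for stability.
        self.norms = nn.ModuleList(nn.BatchNorm1d(d) for d in group_dims)
        # One independent linear projection per anatomical group.
        self.projs = nn.ModuleList(nn.Linear(d, d_model) for d in group_dims)
        # Learnable part embedding distinguishing the four regions.
        self.part_emb = nn.Parameter(torch.zeros(len(group_dims), d_model))

    def forward(self, x):                       # x: (B, T, sum(group_dims))
        B, T, _ = x.shape
        tokens = []
        for feats, norm, proj in zip(x.split(self.group_dims, dim=-1),
                                     self.norms, self.projs):
            f = norm(feats.reshape(B * T, -1)).reshape(B, T, -1)
            tokens.append(proj(f))
        out = torch.stack(tokens, dim=2)        # (B, T, parts=4, d_model)
        return out + self.part_emb              # broadcast part embedding
```

The output keeps the part axis separate so that later spatial attention can attend across the four tokens within each frame.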

2. Positional Encoding

Since Transformers are permutation-invariant, a sinusoidal positional encoding is added to the temporal dimension to provide the model with information about the order of frames.
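The standard sinusoidal formulation can be sketched as below; the only assumption is that the table is broadcast over the batch and part axes when added to the tokens:

```python
import math
import torch

def temporal_positional_encoding(T, d_model):
    """Standard sinusoidal encoding: one d_model-dim vector per frame index."""
    pos = torch.arange(T, dtype=torch.float32).unsqueeze(1)           # (T, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(T, d_model)
    pe[:, 0::2] = torch.sin(pos * div)   # even channels: sine
    pe[:, 1::2] = torch.cos(pos * div)   # odd channels: cosine
    return pe                            # (T, d_model)

# For tokens shaped (B, T, parts, d_model): x = x + pe[None, :, None, :]
```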

3. ST-Transformer Blocks

A stack of consecutive blocks that perform dual-stream attention:

  • Spatial Attention: Multi-head self-attention operating across the body part tokens (Pose, Face, Hands) within each time step.
  • Temporal Attention: Multi-head self-attention operating across the time steps for each body part.
  • Feed-Forward Network: a position-wise MLP with GELU activation.
  • Residual Connections & LayerNorm: Each sub-layer (Spatial, Temporal, MLP) uses residual additions and normalization.
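A single block can be sketched as follows, assuming pre-norm residual sub-layers and tokens shaped (B, T, P, d) where P is the number of body-part tokens; the hyperparameters are placeholders:

```python
import torch
import torch.nn as nn

class STBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, mlp_ratio=4, dropout=0.1):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm_s = nn.LayerNorm(d_model)
        self.norm_t = nn.LayerNorm(d_model)
        self.norm_m = nn.LayerNorm(d_model)
        # Position-wise feed-forward network with GELU activation.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, mlp_ratio * d_model),
            nn.GELU(),
            nn.Linear(mlp_ratio * d_model, d_model),
        )

    def forward(self, x):                         # x: (B, T, P, d)
        B, T, P, d = x.shape
        # Spatial attention: attend across part tokens within each time step.
        s = self.norm_s(x).reshape(B * T, P, d)
        s = self.spatial_attn(s, s, s, need_weights=False)[0]
        x = x + s.reshape(B, T, P, d)
        # Temporal attention: attend across time steps for each part.
        t = self.norm_t(x).permute(0, 2, 1, 3).reshape(B * P, T, d)
        t = self.temporal_attn(t, t, t, need_weights=False)[0]
        x = x + t.reshape(B, P, T, d).permute(0, 2, 1, 3)
        # Residual feed-forward sub-layer.
        return x + self.mlp(self.norm_m(x))
```

Folding the part axis into the batch for temporal attention (and vice versa for spatial attention) keeps each attention call a standard batched self-attention.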

4. Self-Attention Pooling

Instead of simple averaging, the model uses a trainable Attention Pooling mechanism to aggregate the temporal dimension into a single context vector.

  • Trainable Weights: Learns which frames are most informative for the sign.
  • Softmax Weighting: Produces a normalized weighted sum of the hidden states.
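A minimal sketch of this pooling, assuming the hidden states have already been reduced to shape (B, T, d) before temporal aggregation:

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        # One trainable scalar score per frame.
        self.score = nn.Linear(d_model, 1)

    def forward(self, x):                          # x: (B, T, d)
        w = torch.softmax(self.score(x), dim=1)    # (B, T, 1), sums to 1 over T
        return (w * x).sum(dim=1)                  # weighted sum -> (B, d)
```

The softmax weights let the model learn to emphasize the most informative frames rather than averaging all frames equally.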

5. Classification Head

  • Linear Layer: Maps the pooled representation to the final class logits (502 signs).
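The head itself is a single linear map; the sketch below assumes a pooled context vector of dimension 256:

```python
import torch
import torch.nn as nn

head = nn.Linear(256, 502)            # pooled dim -> 502 sign classes
context = torch.randn(8, 256)         # (batch, d_model) pooled representation
logits = head(context)                # (batch, 502) unnormalized class scores
probs = logits.softmax(dim=-1)        # probability distribution over signs
```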