mmap_dataset_preprocessing.py

File Path: src/data/mmap_dataset_preprocessing.py

Purpose: Compiles thousands of individual .npz keypoint files into a monolithic memory-mapped binary file for efficient training.

Process Overview

Scan: Iterates over all (Signer, Word) pairs.
Load: Reads every .npz file into RAM, accumulating one large in-memory list.
Concatenate: Merges the list into a single (Total_Frames, 184, 4) float32 array.
Save: Writes four files per split, each prefixed with the split name (for example, train_X.mmap):
- X.mmap: The raw binary data.
- y.npz: The label for each sample.
- X_shape.npy: Dimensions metadata.
- X_map_samples_lens.npy: Lookup table for sample lengths.
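
The concatenate and save steps above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `save_split`, the `y` key inside the .npz file, and the demo data are all assumptions. Only the file names and the (Total_Frames, 184, 4) float32 layout come from this page.

```python
import os
import tempfile
import numpy as np

def save_split(samples, labels, out_dir, split):
    """Write one split in the four-file layout (hypothetical helper)."""
    # Per-sample frame counts, needed later to find sample boundaries.
    lens = np.array([s.shape[0] for s in samples], dtype=np.int64)
    # Merge along the frame axis: (Total_Frames, 184, 4).
    X = np.concatenate(samples, axis=0).astype(np.float32)

    prefix = os.path.join(out_dir, f"{split}_")
    # np.memmap writes raw bytes with no header, so the shape
    # has to be stored in a separate file.
    mm = np.memmap(prefix + "X.mmap", dtype=np.float32, mode="w+", shape=X.shape)
    mm[:] = X
    mm.flush()
    del mm

    np.savez(prefix + "y.npz", y=np.asarray(labels))  # key "y" is an assumption
    np.save(prefix + "X_shape.npy", np.array(X.shape))
    np.save(prefix + "X_map_samples_lens.npy", lens)
    return X.shape

# Tiny synthetic demo: two samples of 3 and 5 frames.
demo_dir = tempfile.mkdtemp()
demo_samples = [
    np.zeros((3, 184, 4), dtype=np.float32),
    np.ones((5, 184, 4), dtype=np.float32),
]
demo_shape = save_split(demo_samples, [0, 1], demo_dir, "train")
```

Storing the lengths separately is what allows a reader to slice individual samples back out of the flat frame axis.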

Functions

`load_raw_kps(...)`

Traverses NPZ_KPS_DIR and aggregates the keypoint data.

Handling Missing Data: Prints an error message and continues if a file is missing.
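
A sketch of this "log and continue" behaviour is shown below. The directory layout (`<root>/<signer>/<sign>.npz`), the `keypoints` array key, and the function name `load_samples` are hypothetical, chosen only to make the example runnable. The real `load_raw_kps(...)` signature is not shown on this page.

```python
import os
import tempfile
import numpy as np

def load_samples(kps_dir, signers, signs):
    """Collect keypoint arrays, skipping files that do not exist."""
    samples, labels = [], []
    for signer in signers:
        for sign in signs:
            # Hypothetical layout: <kps_dir>/<signer>/<sign>.npz
            path = os.path.join(kps_dir, signer, f"{sign}.npz")
            try:
                with np.load(path) as npz:
                    kps = npz["keypoints"]  # hypothetical key name
            except FileNotFoundError:
                # Report the problem but keep going with the remaining pairs.
                print(f"[ERROR] missing file: {path}")
                continue
            samples.append(kps.astype(np.float32))
            labels.append(sign)
    return samples, labels

# Demo: signer "01" has a file for sign 1 but not for sign 2.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "01"))
np.savez(os.path.join(root, "01", "1.npz"),
         keypoints=np.zeros((2, 184, 4), dtype=np.float32))
samples, labels = load_samples(root, ["01"], [1, 2])
```

Because skipped files never reach the list, the labels and the sample lengths stay aligned with the data that was actually loaded.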

`mmap_process_and_save_split(...)`

Orchestrates the conversion for a specific split (train/test).

Memory Management: Uses del and gc.collect() after each split to free RAM before the next split is processed, avoiding out-of-memory (OOM) kills.
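
The per-split cleanup might look like the sketch below. The function `process_splits` and its `load_fn` / `save_fn` callbacks are hypothetical stand-ins for the real loading and saving code. Note that `del` only removes a name; the memory is released once no other reference to the array remains.

```python
import gc
import numpy as np

def process_splits(splits, load_fn, save_fn):
    """Process splits one at a time so only one split is held in RAM."""
    done = []
    for split in splits:
        samples = load_fn(split)             # large list of arrays
        X = np.concatenate(samples, axis=0)  # second large buffer
        save_fn(split, X)
        # Drop our references to both buffers, then force a collection
        # before the next split is loaded.
        del samples, X
        gc.collect()
        done.append(split)
    return done

# Demo with synthetic data; the save callback only records shapes.
saved_shapes = {}
done = process_splits(
    ["train", "test"],
    load_fn=lambda split: [np.zeros((2, 184, 4), dtype=np.float32)] * 3,
    save_fn=lambda split, X: saved_shapes.update({split: X.shape}),
)
```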

CLI Usage

python src/data/mmap_dataset_preprocessing.py \
    --splits train test \
    --signers 01 02 03 \
    --selected_signs_from 1 --selected_signs_to 502
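
For reference, flags of this shape could be declared with `argparse` roughly as follows. The flag names come from the command above. The argument types and help strings are assumptions, and the script's real parser may differ.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Compile .npz keypoint files into memory-mapped splits."
    )
    # nargs="+" accepts one or more space-separated values per flag.
    parser.add_argument("--splits", nargs="+", help="e.g. train test")
    # Kept as strings so leading zeros such as "01" are preserved.
    parser.add_argument("--signers", nargs="+", help="e.g. 01 02 03")
    parser.add_argument("--selected_signs_from", type=int)
    parser.add_argument("--selected_signs_to", type=int)
    return parser

# Parse the same arguments as the command shown above.
args = build_parser().parse_args(
    "--splits train test --signers 01 02 03 "
    "--selected_signs_from 1 --selected_signs_to 502".split()
)
```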

Output Structure

data/
└── word-level-arabic-sign-language-preprcsd-keypoints/
    ├── train_X.mmap      (Several GBs)
    ├── train_y.npz
    ├── train_X_shape.npy
    └── train_X_map_samples_lens.npy
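
To illustrate how these four files fit together, here is a hedged sketch of reading one sample back without loading the whole array. The helper names `open_split` and `get_sample` are hypothetical, and the real reader lives in mmap_dataset.py, which is not shown here. The demo first writes a tiny synthetic split so the example runs on its own.

```python
import os
import tempfile
import numpy as np

def open_split(out_dir, split):
    """Open <split>_X.mmap read-only and build per-sample frame offsets."""
    prefix = os.path.join(out_dir, f"{split}_")
    shape = tuple(int(d) for d in np.load(prefix + "X_shape.npy"))
    lens = np.load(prefix + "X_map_samples_lens.npy")
    # mode="r" maps the file without reading it into RAM.
    X = np.memmap(prefix + "X.mmap", dtype=np.float32, mode="r", shape=shape)
    # offsets[i] is the first frame of sample i; offsets[-1] is Total_Frames.
    offsets = np.concatenate(([0], np.cumsum(lens)))
    return X, offsets

def get_sample(X, offsets, i):
    """Copy the frames of sample i out of the memory map."""
    return np.array(X[offsets[i]:offsets[i + 1]])

# Demo: a synthetic split with two samples of 3 and 5 frames.
demo_dir = tempfile.mkdtemp()
data = np.arange(8 * 184 * 4, dtype=np.float32).reshape(8, 184, 4)
data.tofile(os.path.join(demo_dir, "train_X.mmap"))
np.save(os.path.join(demo_dir, "train_X_shape.npy"), np.array(data.shape))
np.save(os.path.join(demo_dir, "train_X_map_samples_lens.npy"), np.array([3, 5]))

X, offsets = open_split(demo_dir, "train")
second = get_sample(X, offsets, 1)
```

Only the frames that are actually sliced get paged in from disk, which is the reason for using a memory map when the file is several GB.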

Depends On:

constants.py - Directory paths

Used By:

Run offline, before training.
Generates the data consumed by mmap_dataset.py
