lazy_dataset.py
File Path: src/data/lazy_dataset.py
Purpose: A PyTorch Dataset implementation that loads individual .npz files on demand (Lazy Loading).
Overview
Unlike the Memory-Mapped dataset (which loads one massive monolithic file), this class keeps the data as thousands of individual small files. This is memory-efficient but incurs higher per-sample I/O overhead (many small `open()`/read calls).
Class LazyKArSLDataset
Inherits: torch.utils.data.Dataset
__init__
Parameters: split, signers, signs, transforms.
Logic:
- Initializes `TSNSampler` and `DataAugmentor`.
- Iterates over all requested `signs` and `signers`.
- Checks for existence of `.npz` files in `NPZ_KPS_DIR`.
- Builds a list of metadata tuples: `self.samples = [(signer, vid_id, label), ...]`.
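The scanning logic above can be sketched as follows. This is a torch-free illustration (the real class inherits `torch.utils.data.Dataset`); the directory layout `<root>/<signer>/<sign>_<vid>.npz`, the class name, and the helper names are assumptions, not the repository's actual API.

```python
from pathlib import Path

class LazySampleIndex:
    """Hypothetical sketch of the metadata-scanning step in __init__."""

    def __init__(self, root, signers, signs):
        self.samples = []  # [(signer, vid_id, label), ...]
        for label, sign in enumerate(signs):
            for signer in signers:
                # Keep only samples whose .npz file actually exists on disk;
                # the filename pattern "<sign>_<vid>.npz" is assumed.
                for path in sorted(Path(root, signer).glob(f"{sign}_*.npz")):
                    vid_id = path.stem.split("_", 1)[1]
                    self.samples.append((signer, vid_id, label))

    def __len__(self):
        return len(self.samples)
```

Because only metadata tuples are stored, startup memory stays small regardless of how large the keypoint files are; the cost is the directory scan itself.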
_load_file(path)
Decorator: @lru_cache(maxsize=1024)
Purpose: Caches recently accessed file contents to reduce disk I/O for repeatedly accessed samples (though with shuffled training, each sample is typically read only once per epoch, so cache hits are rare within an epoch).
__getitem__(index)
Logic:
- Retrieves metadata `(signer, vid, label)`.
- Constructs the file path.
- Loads raw keypoints via `_load_file`.
- Sampling: applies `TSNSampler` to get a fixed-length `SEQ_LEN` clip.
- Transform: applies spatial augmentation.
- Return: `(FloatTensor, LongTensor)`.
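The sampling and return steps above can be sketched as below. This uses a deterministic TSN-style variant (centre frame of each of `SEQ_LEN` equal segments) and numpy arrays in place of torch tensors; the `SEQ_LEN` value, function names, and the omission of augmentation are all assumptions for illustration:

```python
import numpy as np

SEQ_LEN = 8  # assumed fixed clip length

def tsn_sample(frames: np.ndarray, seq_len: int = SEQ_LEN) -> np.ndarray:
    """TSN-style sampling sketch: split the T input frames into seq_len
    equal segments and take the centre frame of each segment."""
    t = frames.shape[0]
    # Centre index of each segment over the interval [0, t).
    idx = (np.arange(seq_len) + 0.5) * t / seq_len
    return frames[idx.astype(int)]

def getitem_sketch(keypoints: np.ndarray, label: int):
    # The real __getitem__ would also apply spatial augmentation and wrap
    # the results as torch.FloatTensor / torch.LongTensor; numpy stands in.
    clip = tsn_sample(keypoints)
    return clip.astype(np.float32), np.int64(label)
```

Whatever the raw video length, the output clip always has `SEQ_LEN` frames, which is what lets variable-length samples be batched downstream.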
Comparison
| Feature | Lazy Dataset | MMap Dataset |
|---|---|---|
| Startup Time | Slow (File Scanning) | Fast (Offset Calc) |
| Memory Usage | Low | Low (Virtual Mem) |
| IO Pattern | Random Small Reads | Random Seek/Read |
| Flexibility | High (Add/Remove files) | Low (Rebuild MMap) |
Related Documentation
Depends On:
- data_preparation.py - `TSNSampler`, `DataAugmentor`
- constants.py - `NPZ_KPS_DIR`
Used By: