SlowFast
References
Official installation dependency list
https://github.com/facebookresearch/SlowFast/blob/main/INSTALL.md
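Windows environment installation
How did other people do it? What pitfalls did they hit? How did they solve them?
Is the environment for running SlowFast set up correctly on Windows 10, cloud platforms, Linux, and so on?
Linux environment installation
Create a brand-new Anaconda environment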
conda create -n slowfast python=3.8
Install dependencies
pytorch
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
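The conda command above pins three packages that need to match one another. A minimal sketch for checking the pins after installation is shown below. It assumes the conda packages register their metadata under the distribution names torch, torchvision and torchaudio; the helper names are hypothetical and not part of SlowFast.

```python
from importlib import metadata

# Versions pinned by the conda command above.
EXPECTED = {"torch": "1.8.0", "torchvision": "0.9.0", "torchaudio": "0.8.0"}


def _default_lookup(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected=EXPECTED, lookup=_default_lookup):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu111" when comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    print(find_mismatches() or "all pinned versions match")
```

An empty result means the three pins match; any entry shows the wanted version next to what was actually found.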
fvcore √
References
- https://blog.csdn.net/weixin_43720054/article/details/126051206#t6
- https://github.com/facebookresearch/fvcore#install
git clone https://github.com/facebookresearch/fvcore
cd fvcore
python setup.py install
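The article says that pip downloads any additional sub-dependencies that are needed. Following the official instructions, I clone the project locally and let pip resolve the package's dependencies itself, installing from the local directory first.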
git clone https://github.com/facebookresearch/fvcore
pip install -e fvcore
Using the correct clone URL:
git clone https://github.com/facebookresearch/fvcore.git
pip install -e fvcore
torchvision √
simplejson √
pip install simplejson
GCC >= 4.9 √
PyAV √
conda install av -c conda-forge
I am going through the Tsinghua mirror here, so the command is:
conda install av
That was too slow, so in the end I used pip:
pip install av
iopath √
pip install -U iopath
psutil √
pip install psutil
OpenCV √
pip install opencv-python
pip install tensorboard
pip install pytorchvideo
pip install moviepy
fairscale √
git clone https://github.com/facebookresearch/fairscale.git
pip install fairscale
cython √
According to the article, the command here is:
pip install -U cython
Detectron2
Run these statements one step at a time
git clone https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
git clone https://github.com/facebookresearch/detectron2 detectron2_repo
git clone https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
According to the article, the command here is:
pip install -U git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
Error:
ERROR: Could not find a version that satisfies the requirement matplotlib>=2.1.0 (from pycocotools) (from versions: none)
ERROR: No matching distribution found for matplotlib>=2.1.0
Tried switching to:
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
pip install -U PythonAPI
Failed: no package with that name exists. Following the official documentation, rewritten as:
pip install -e cocoapi/PythonAPI
A dependency at the required version is missing. Following the official documentation, I wrote the command myself:
pip install 'matplotlib>=2.1.0'
Turned off the proxy and used a mirror instead. Completed successfully.
Ran it again:
pip install -e cocoapi/PythonAPI
Error. Tried switching to:
pip install pycocotools
git clone https://github.com/facebookresearch/detectron2 detectron2_repo
pip install -e detectron2_repo
Completed successfully.
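With this many dependencies installed by different routes (conda, pip, editable clones), it can help to confirm in one pass that each one is visible to the interpreter. The sketch below is a hypothetical helper, not part of SlowFast. The list uses import names, which differ from the pip names in a few cases (for example cv2 for opencv-python), and it only checks that a module can be located, not that it imports without error.

```python
from importlib.util import find_spec

# Import names (not pip names) for the dependencies installed above.
REQUIRED_MODULES = [
    "torch", "torchvision", "fvcore", "simplejson", "av", "iopath", "psutil",
    "cv2", "tensorboard", "moviepy", "pytorchvideo", "fairscale", "Cython",
    "pycocotools", "detectron2",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level names that cannot be located, without importing them."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

A module that is located can still fail at import time (as the DLL error further down shows), so this is only a first filter.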
$PYTHONPATH
export PYTHONPATH=/path/to/SlowFast/slowfast:$PYTHONPATH
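Following the reference material, I wrote my own commands to set a temporary environment variable (Windows cmd):
set PYTHONPATH=slowfast
echo %PYTHONPATH%
set PYTHONPATH=D:\code\web-dev-work-place\github-desktop-store\SlowFast\slowfast
echo %PYTHONPATH%
set PYTHONPATH=%PYTHONPATH%;D:\code\web-dev-work-place\github-desktop-store\SlowFast\slowfast
echo %PYTHONPATH%
Could not verify whether the setting was correct. Moved on to the next stage.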
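One way to check the variable from inside Python, rather than with echo, is to split PYTHONPATH on the platform separator and look for the expected directory. This is a minimal sketch with hypothetical helper names; it reads the environment of the current process only, so it has to be run from the same shell session in which the variable was set.

```python
import os


def pythonpath_entries(environ=os.environ):
    """Split PYTHONPATH into its entries using the platform separator."""
    raw = environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def has_entry(target, environ=os.environ):
    """Check whether target is listed in PYTHONPATH after path normalization."""
    def norm(path):
        return os.path.normcase(os.path.normpath(path))

    return norm(target) in {norm(entry) for entry in pythonpath_entries(environ)}


if __name__ == "__main__":
    print(pythonpath_entries())
```

An empty list means the variable is unset or empty in this session, which is also what a misspelled variable name would produce.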
Build PySlowFast
Start the build
python setup.py build develop
Went smoothly. No errors.
Start running?
python tools/run_net.py --cfg configs/Kinetics/C2D_8x8_R50.yaml NUM_GPUS 1 TRAIN.BATCH_SIZE 8 SOLVER.BASE_LR 0.0125 DATA.PATH_TO_DATA_DIR path_to_your_data_folder
Error: ImportError: DLL load failed while importing _imaging: The specified module could not be found
Traceback (most recent call last):
File "tools/run_net.py", line 6, in <module>
from slowfast.utils.misc import launch_job
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\utils\misc.py", line 12, in <module>
import torchvision.io as io
File "D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\torchvision\__init__.py", line 7, in <module>
from torchvision import datasets
File "D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\torchvision\datasets\__init__.py", line 1, in <module>
from .lsun import LSUN, LSUNClass
File "D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\torchvision\datasets\lsun.py", line 2, in <module>
from PIL import Image
File "D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\PIL\Image.py", line 100, in <module>
from . import _imaging as core
ImportError: DLL load failed while importing _imaging: The specified module could not be found.
Tried upgrading Pillow:
pip install -U pillow
Error: ImportError: cannot import name 'cat_all_gather' from 'pytorchvideo.layers.distributed'
Traceback (most recent call last):
File "tools/run_net.py", line 6, in <module>
from slowfast.utils.misc import launch_job
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\utils\misc.py", line 19, in <module>
import slowfast.utils.logging as logging
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\utils\logging.py", line 15, in <module>
import slowfast.utils.distributed as du
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\utils\distributed.py", line 12, in <module>
from pytorchvideo.layers.distributed import ( # noqa
ImportError: cannot import name 'cat_all_gather' from 'pytorchvideo.layers.distributed' (D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\pytorchvideo\layers\distributed.py)
According to the issue, the fix is to install pytorchvideo from source:
git clone https://github.com/facebookresearch/pytorchvideo.git
cd pytorchvideo
pip install -e .
Following the same approach used earlier, I rewrote this as the following commands:
git clone https://github.com/facebookresearch/pytorchvideo.git
pip install -e pytorchvideo
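To confirm that the source install actually provides the name the traceback complained about, the import can be probed directly before rerunning the whole program. A minimal sketch follows; the helper name is hypothetical, and the module and attribute names are taken from the error message above.

```python
import importlib


def module_exports(module_name, attr):
    """Return True if module_name imports cleanly and defines attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    print(module_exports("pytorchvideo.layers.distributed", "cat_all_gather"))
```

If this still prints False after the editable install, the interpreter is probably picking up an older copy of pytorchvideo from site-packages.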
Error: ModuleNotFoundError: No module named 'scipy'
Traceback (most recent call last):
File "tools/run_net.py", line 6, in <module>
from slowfast.utils.misc import launch_job
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\utils\misc.py", line 21, in <module>
from slowfast.datasets.utils import pack_pathway_output
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\datasets\__init__.py", line 4, in <module>
from .ava_dataset import Ava # noqa
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\datasets\ava_dataset.py", line 10, in <module>
from . import transform as transform
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\datasets\transform.py", line 14, in <module>
from scipy.ndimage import gaussian_filter
ModuleNotFoundError: No module named 'scipy'
Based on the error, I installed the dependency myself:
pip install scipy
ModuleNotFoundError: No module named 'sklearn'
Traceback (most recent call last):
File "tools/run_net.py", line 9, in <module>
from demo_net import demo
File "D:\code\web-dev-work-place\github-desktop-store\SlowFast\tools\demo_net.py", line 10, in <module>
from slowfast.visualization.async_predictor import AsyncDemo, AsyncVis
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\visualization\async_predictor.py", line 12, in <module>
from slowfast.visualization.predictor import Predictor
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\visualization\predictor.py", line 15, in <module>
from slowfast.visualization.utils import process_cv2_inputs
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\visualization\utils.py", line 8, in <module>
from sklearn.metrics import confusion_matrix
ModuleNotFoundError: No module named 'sklearn'
The command I wrote myself:
pip install sklearn
According to an article, sklearn is short for scikit-learn, which is the name the package is published under, so the command should be:
pip install scikit-learn
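The sklearn case shows that the name in a ModuleNotFoundError is the import name, which is not always the name to give pip. The sketch below keeps a small lookup table for the mismatches relevant to these notes and falls back to the import name otherwise. The table and function are hypothetical helpers, and the table is deliberately incomplete.

```python
# Import name -> PyPI distribution name, for cases where the two differ.
PIP_NAMES = {
    "sklearn": "scikit-learn",
    "cv2": "opencv-python",
    "PIL": "pillow",
}


def pip_command_for(error):
    """Suggest a pip command for a ModuleNotFoundError."""
    # error.name may be dotted (e.g. "scipy.ndimage"); pip wants the top level.
    top_level = (error.name or "").split(".")[0]
    package = PIP_NAMES.get(top_level, top_level)
    return f"pip install {package}"


try:
    import sklearn  # noqa: F401
except ModuleNotFoundError as exc:
    print(pip_command_for(exc))
```

For names that are the same on both sides, such as scipy, the fallback simply returns the import name.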
RuntimeError: Distributed package doesn't have NCCL built in
Note that on Windows the distributed backend has to be gloo; nccl is for Linux.
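The platform rule above can be written down as a small selection function. This is a sketch under the assumption that NCCL is only usable on Linux with CUDA GPUs and that gloo is the fallback everywhere else; the function name is hypothetical and it does not touch SlowFast itself.

```python
import sys


def choose_backend(platform=sys.platform, cuda_available=True):
    """Pick a torch.distributed backend name for the given platform."""
    # NCCL is only used on Linux with CUDA GPUs; everywhere else
    # (including Windows) fall back to gloo.
    if platform.startswith("linux") and cuda_available:
        return "nccl"
    return "gloo"


if __name__ == "__main__":
    print(choose_backend())
```

If the installed SlowFast version has a DIST_BACKEND key in slowfast/config/defaults.py (an assumption to verify, not something these notes confirm), the returned name could be passed as a command-line override in the same KEY VALUE style as NUM_GPUS 1.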
try 1
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi
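I only cloned here and did not install. It turned out that git clone of pytorch pulls in too many submodules, which makes later steps likely to fail. I gave up on this route.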
try 2
import torch
torch.cuda.is_available()
torch.__version__
# '1.8.0+cu111'
torch.cuda.nccl.is_available(torch.randn(1).cuda())
# True
torch.cuda.nccl.version()
I ran the code above; the NCCL check returned False:
>>> import torch
>>> torch.__version__
'1.8.0'
>>> torch.cuda.nccl.is_available(torch.randn(1).cuda())
D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\torch\cuda\nccl.py:16: UserWarning: PyTorch is not compiled with NCCL support
warnings.warn('PyTorch is not compiled with NCCL support')
False
>>> torch.cuda.nccl.version()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\torch\cuda\nccl.py", line 36, in version
return torch._C._nccl_version()
AttributeError: module 'torch._C' has no attribute '_nccl_version'
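The reference material is incoherent and never makes clear how the problem should actually be handled. It only analyzes whether NCCL can be used, not how to work around its absence. This route is not suitable, so I gave up on it.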
try 3
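The solution given in this issue is to add an environment variable: PL_TORCH_DISTRIBUTED_BACKEND=gloo
I tried it. It did not help, just as the issue itself reports. A likely reason is that this variable is read by PyTorch Lightning, which SlowFast does not appear to use.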
try 4
Following the advice of my colleague Guo Rui, change the code. As a first step I chose to switch to gloo inside the custom config:
D:\code\web-dev-work-place\github-desktop-store\SlowFast\build\lib\slowfast\config\custom_config.py
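In practice I could not work out where the change to gloo should be made. There is too much code to go through.
Stopping this attempt
After a lot of research, I found that forcing this project to install on Windows fails far too easily. The environment is a poor fit. I am now considering following other people's write-ups: download the model weight file and see how it runs locally.
If that does not go well, fall back to these two strategies:
- On a laptop or desktop, create a local Linux virtual machine and check whether it can use the GPU. Then do the SlowFast environment setup and training under Linux.
- Set it up on a cloud server.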
Editing SLOWFAST_32x2_R101_50_50.yaml
Configuration from the tutorial:
TRAIN:
  ENABLE: False
  DATASET: ava
  BATCH_SIZE: 16
  EVAL_PERIOD: 1
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  # Path to the official weight file just downloaded
  CHECKPOINT_FILE_PATH: "D:/python/video_classify/SlowFast-main/weights/SLOWFAST_32x2_R101_50_50.pkl" # path to pretrained model
  CHECKPOINT_TYPE: pytorch
DATA:
  NUM_FRAMES: 32
  SAMPLING_RATE: 2
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  INPUT_CHANNEL_NUM: [3, 3]
DETECTION:
  ENABLE: True
  ALIGNED: False
AVA:
  BGR: False
  DETECTION_SCORE_THRESH: 0.8
  TEST_PREDICT_BOX_LISTS: ["person_box_67091280_iou90/ava_detection_val_boxes_and_labels.csv"]
SLOWFAST:
  ALPHA: 4
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 5
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 101
  TRANS_FUNC: bottleneck_transform
  STRIDE_1X1: False
  NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
  SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [2, 2]]
  SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [1, 1]]
NONLOCAL:
  LOCATION: [[[], []], [[], []], [[6, 13, 20], []], [[], []]]
  GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
  INSTANTIATION: dot_product
  POOL: [[[2, 2, 2], [2, 2, 2]], [[2, 2, 2], [2, 2, 2]], [[2, 2, 2], [2, 2, 2]], [[2, 2, 2], [2, 2, 2]]]
BN:
  USE_PRECISE_STATS: False
  NUM_BATCHES_PRECISE: 200
SOLVER:
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-7
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 80
  ARCH: slowfast
  MODEL_NAME: SlowFast
  LOSS_FUNC: bce
  DROPOUT_RATE: 0.5
  HEAD_ACT: sigmoid
TEST:
  ENABLE: False
  DATASET: ava
  BATCH_SIZE: 8
DATA_LOADER:
  NUM_WORKERS: 2
  PIN_MEMORY: True
NUM_GPUS: 1
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .
#TENSORBOARD:
#  MODEL_VIS:
#    TOPK: 2
DEMO:
  ENABLE: True
  LABEL_FILE_PATH: "./demo/AVA/ava.json" # the label file just generated
  INPUT_VIDEO: "./input/1.mp4" # input video path
  OUTPUT_FILE: "./output/1.mp4" # output video path
  DETECTRON2_CFG: "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
  DETECTRON2_WEIGHTS: detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl
Current configuration:
TRAIN:
  ENABLE: False
  DATASET: ava
  BATCH_SIZE: 16
  EVAL_PERIOD: 1
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  CHECKPOINT_FILE_PATH: ./SLOWFAST_32x2_R101_50_50.pkl # path to pretrained model
  CHECKPOINT_TYPE: pytorch
DATA:
  NUM_FRAMES: 32
  SAMPLING_RATE: 2
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  INPUT_CHANNEL_NUM: [3, 3]
DETECTION:
  ENABLE: True
  ALIGNED: False
AVA:
  BGR: False
  DETECTION_SCORE_THRESH: 0.8
  TEST_PREDICT_BOX_LISTS:
    ["person_box_67091280_iou90/ava_detection_val_boxes_and_labels.csv"]
SLOWFAST:
  ALPHA: 4
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 5
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 101
  TRANS_FUNC: bottleneck_transform
  STRIDE_1X1: False
  NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
  SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [2, 2]]
  SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [1, 1]]
NONLOCAL:
  LOCATION: [[[], []], [[], []], [[6, 13, 20], []], [[], []]]
  GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
  INSTANTIATION: dot_product
  POOL:
    [
      [[2, 2, 2], [2, 2, 2]],
      [[2, 2, 2], [2, 2, 2]],
      [[2, 2, 2], [2, 2, 2]],
      [[2, 2, 2], [2, 2, 2]],
    ]
BN:
  USE_PRECISE_STATS: False
  NUM_BATCHES_PRECISE: 200
SOLVER:
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-7
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 80
  ARCH: slowfast
  MODEL_NAME: SlowFast
  LOSS_FUNC: bce
  DROPOUT_RATE: 0.5
  HEAD_ACT: sigmoid
TEST:
  ENABLE: False
  DATASET: ava
  BATCH_SIZE: 8
DATA_LOADER:
  NUM_WORKERS: 2
  PIN_MEMORY: True
NUM_GPUS: 1
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .
TENSORBOARD:
  MODEL_VIS:
    TOPK: 2
DEMO:
  ENABLE: True
  LABEL_FILE_PATH: "./ava.json" # the label file just generated
  INPUT_VIDEO: "./input/demo.mp4" # input video path
  OUTPUT_FILE: "./output/demo.mp4" # output video path
  DETECTRON2_CFG: "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"
  DETECTRON2_WEIGHTS: detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl
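The current configuration uses relative paths, which resolve against the directory the command is launched from. A minimal pre-flight sketch is shown below: it checks that the input files exist and that the output folder is present. The helper is hypothetical and the paths are copied by hand from the configuration rather than parsed from the YAML file.

```python
from pathlib import Path


def missing_paths(input_files, output_file):
    """Return a list of problems: absent inputs or an absent output folder."""
    problems = [
        f"missing input: {path}" for path in input_files if not Path(path).is_file()
    ]
    out_dir = Path(output_file).parent
    if not out_dir.is_dir():
        problems.append(f"missing output folder: {out_dir}")
    return problems


if __name__ == "__main__":
    # Paths copied from the current configuration above.
    inputs = ["./SLOWFAST_32x2_R101_50_50.pkl", "./ava.json", "./input/demo.mp4"]
    for problem in missing_paths(inputs, "./output/demo.mp4"):
        print(problem)
```

Running it from the SlowFast repository root mirrors how tools/run_net.py is invoked in these notes.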
Run the command:
python tools/run_net.py --cfg demo/AVA/SLOWFAST_32x2_R101_50_50.yaml
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
File "tools/run_net.py", line 57, in <module>
main()
File "tools/run_net.py", line 53, in main
demo(cfg)
File "D:\code\web-dev-work-place\github-desktop-store\SlowFast\tools\demo_net.py", line 114, in demo
for task in tqdm.tqdm(run_demo(cfg, frame_provider)):
File "D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\tqdm\std.py", line 1178, in __iter__
for obj in iterable:
File "D:\code\web-dev-work-place\github-desktop-store\SlowFast\tools\demo_net.py", line 59, in run_demo
model = ActionPredictor(cfg=cfg, async_vis=async_vis)
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\visualization\predictor.py", line 132, in __init__
self.predictor = Predictor(cfg=cfg, gpu_id=gpu_id)
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\visualization\predictor.py", line 46, in __init__
cu.load_test_checkpoint(cfg, self.model)
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\utils\checkpoint.py", line 692, in load_test_checkpoint
load_checkpoint(
File "d:\code\web-dev-work-place\github-desktop-store\slowfast\slowfast\utils\checkpoint.py", line 298, in load_checkpoint
checkpoint = torch.load(f, map_location="cpu")
File "D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\torch\serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "D:\dev-evn\anaconda\envs\slowfast\lib\site-packages\torch\serialization.py", line 762, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: pickle data was truncated
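"pickle data was truncated" is raised when the pickle stream ends before the data does, which often points to an incomplete or interrupted download of the weight file. That is only a hypothesis for this case. One way to test it is to record the size and SHA-256 of the local file and compare them with a fresh download. The helper below is a hypothetical sketch and the checkpoint path is the one from the current configuration.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for a file, reading it in chunks."""
    digest = hashlib.sha256()
    size = 0
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


if __name__ == "__main__":
    checkpoint = Path("./SLOWFAST_32x2_R101_50_50.pkl")  # path from the config
    if checkpoint.is_file():
        print(file_fingerprint(checkpoint))
```

If two downloads of the same file disagree in size or hash, at least one of them is incomplete.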
Following the issue, I tried:
python tools/run_net.py --cfg demo/AVA/SLOWFAST_32x2_R101_50_50.yaml long_size=8
No effect.
This discussion advises against loading the data on Windows and recommends loading it on Linux instead.