Chunked preprocessing and analysis
Start by setting up your project using the two other guides: "Preprocessing på AI-LAB" and "Løve speciale - LabGym opsætning og brug på AI-LAB".
Preprocessing
A. Fix the preprocessing Python script
Open:
nano ~/video_processing/code/lion_thesis_training_preprocess.py
Find:
cmd = [ffmpeg_executable]
Replace with:
cmd = [ffmpeg_executable, "-y", "-nostdin"]
Find:
subprocess.run(cmd, check=True)
Replace with:
subprocess.run(cmd, check=True, stdin=subprocess.DEVNULL)
Save and close.
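These two changes are presumably needed because run_array.sh feeds file names to a `while read` loop: a child process that reads from standard input can swallow the remaining lines of the chunk file, so only the first video would be processed. The sketch below shows the pattern in isolation. The function names `build_ffmpeg_cmd` and `run_ffmpeg` and the example arguments are hypothetical; the remaining ffmpeg arguments of the real script are not shown in this guide.

```python
import subprocess


def build_ffmpeg_cmd(ffmpeg_executable, extra_args):
    """Return an ffmpeg command that never prompts and never reads stdin.

    -y        overwrite existing output files without asking
    -nostdin  disable ffmpeg's interaction on standard input
    """
    return [ffmpeg_executable, "-y", "-nostdin", *extra_args]


def run_ffmpeg(cmd):
    # stdin=subprocess.DEVNULL detaches the child from the caller's stdin,
    # so it cannot consume lines meant for a surrounding `while read` loop.
    subprocess.run(cmd, check=True, stdin=subprocess.DEVNULL)


if __name__ == "__main__":
    # Hypothetical arguments, for illustration only.
    print(build_ffmpeg_cmd("ffmpeg", ["-i", "input.avi", "output.mp4"]))
```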
B. Create the preprocessing chunk script
Create:
nano ~/video_processing/code/make_video_chunks.py
Paste:
import subprocess
from pathlib import Path

BASE = Path.home() / "video_processing"
INPUT_LIST = BASE / "file_list.txt"
OUT_DIR = BASE / "code/video_chunks"
TARGET_SECONDS = 60 * 60  # defined for reference; not used by the packing loop below
MAX_SECONDS = 90 * 60  # upper limit for the total duration of one chunk
FFPROBE = BASE / "tools/ffmpeg-7.0.2-amd64-static/ffprobe"

OUT_DIR.mkdir(parents=True, exist_ok=True)


def get_duration(video):
    # Ask ffprobe for the duration of the video in seconds.
    cmd = [
        str(FFPROBE),
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(video),
    ]
    result = subprocess.check_output(cmd, text=True).strip()
    return float(result)


# Read the video paths and look up the duration of each one.
videos = []
with INPUT_LIST.open() as f:
    for line in f:
        path = line.strip()
        if path:
            duration = get_duration(path)
            videos.append((path, duration))

# Longest videos first.
videos.sort(key=lambda x: x[1], reverse=True)

# Put each video into the shortest chunk that still has room;
# open a new chunk if it fits nowhere.
chunks = []
for video, duration in videos:
    best_chunk = None
    best_duration = None
    for chunk in chunks:
        chunk_duration = sum(d for _, d in chunk)
        if chunk_duration + duration <= MAX_SECONDS:
            if best_duration is None or chunk_duration < best_duration:
                best_chunk = chunk
                best_duration = chunk_duration
    if best_chunk is not None:
        best_chunk.append((video, duration))
    else:
        chunks.append([(video, duration)])

# Remove chunk files left over from earlier runs.
for old in OUT_DIR.glob("video_chunk_*.txt"):
    old.unlink()

# Write one text file per chunk, one video path per line.
for i, chunk in enumerate(chunks):
    chunk_file = OUT_DIR / f"video_chunk_{i:04d}.txt"
    with chunk_file.open("w") as f:
        for video, duration in chunk:
            f.write(video + "\n")
    total_min = sum(d for _, d in chunk) / 60
    print(f"{chunk_file.name}: {len(chunk)} videos, {total_min:.1f} min")
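To see how the packing in the script above behaves, here is a minimal sketch of the same greedy rule applied to made-up durations, without calling ffprobe. The function name `pack_into_chunks` and the file names are hypothetical. Note that only MAX_SECONDS limits a chunk; TARGET_SECONDS is not consulted by the script.

```python
def pack_into_chunks(videos, max_seconds):
    """Group (name, seconds) pairs into chunks of at most max_seconds.

    Same rule as make_video_chunks.py: longest videos first, each video goes
    into the currently shortest chunk that still has room, and a new chunk
    is opened when it fits nowhere (so an over-long video gets its own chunk).
    """
    chunks = []
    for name, seconds in sorted(videos, key=lambda x: x[1], reverse=True):
        candidates = [
            c for c in chunks
            if sum(d for _, d in c) + seconds <= max_seconds
        ]
        if candidates:
            shortest = min(candidates, key=lambda c: sum(d for _, d in c))
            shortest.append((name, seconds))
        else:
            chunks.append([(name, seconds)])
    return chunks


if __name__ == "__main__":
    # Made-up durations in seconds, for illustration only.
    example = [
        ("a.avi", 50 * 60),
        ("b.avi", 40 * 60),
        ("c.avi", 30 * 60),
        ("d.avi", 20 * 60),
        ("e.avi", 10 * 60),
    ]
    for i, chunk in enumerate(pack_into_chunks(example, 90 * 60)):
        minutes = sum(d for _, d in chunk) / 60
        print(i, [name for name, _ in chunk], f"{minutes:.0f} min")
```

With these example values the result is two chunks: a.avi and b.avi (90 min), then c.avi, d.avi and e.avi (60 min).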
C. Replace the preprocessing submit_array.sh
Open:
nano ~/video_processing/code/submit_array.sh
Delete everything and paste:
#!/bin/bash

PROJECT_DIR=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing
INPUT_DIR=$PROJECT_DIR/data_in
FILE_LIST=$PROJECT_DIR/file_list.txt
JOB_SCRIPT=$PROJECT_DIR/code/run_array.sh
CHUNK_DIR=$PROJECT_DIR/code/video_chunks

mkdir -p "$PROJECT_DIR/logs"
mkdir -p "$PROJECT_DIR/data_out"
mkdir -p "$CHUNK_DIR"

find "$INPUT_DIR" -maxdepth 1 -type f -name "*.avi" | sort > "$FILE_LIST"
NUM_FILES=$(wc -l < "$FILE_LIST")

if [ "$NUM_FILES" -eq 0 ]; then
    echo "No AVI files found"
    exit 1
fi

echo "Found $NUM_FILES AVI files"
echo "Creating duration-based chunks..."

singularity exec /ceph/container/python/python_3.13.sif \
    python3 "$PROJECT_DIR/code/make_video_chunks.py"

NUM_CHUNKS=$(ls "$CHUNK_DIR"/video_chunk_*.txt 2>/dev/null | wc -l)

if [ "$NUM_CHUNKS" -eq 0 ]; then
    echo "No chunks were created"
    exit 1
fi

MAX_INDEX=$((NUM_CHUNKS - 1))

echo "Created $NUM_CHUNKS chunks"
echo "Submitting array job: 0-$MAX_INDEX"

sbatch --array=0-"$MAX_INDEX"%2 "$JOB_SCRIPT"
Replace DITBRUGERNAVN with your own username.
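The last line of the submit script builds the Slurm array range from the number of chunk files. The `%2` suffix asks Slurm to run at most two array tasks at the same time. As a small sketch of how that string is put together (the function name `array_spec` is hypothetical and not part of the scripts):

```python
def array_spec(num_chunks, max_parallel=2):
    """Build the value passed to `sbatch --array=`.

    Task IDs run from 0 to num_chunks - 1, matching the zero-based chunk
    file numbering; the part after % caps how many tasks run concurrently.
    """
    if num_chunks < 1:
        raise ValueError("need at least one chunk")
    return f"0-{num_chunks - 1}%{max_parallel}"


if __name__ == "__main__":
    print(array_spec(5))
```

If the cluster allows it, raising the number after `%` lets more chunks run in parallel; lowering it to 1 processes them one at a time.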
D. Replace the preprocessing run_array.sh
Open:
nano ~/video_processing/code/run_array.sh
Delete everything and paste:
#!/bin/bash
#SBATCH --job-name=video_array
#SBATCH --output=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing/logs/pre_%A_%a.out
#SBATCH --error=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing/logs/pre_%A_%a.err
#SBATCH --time=2:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

set -euo pipefail

PROJECT_DIR=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing
PY_CONTAINER=/ceph/container/python/python_3.13.sif
SCRIPT=$PROJECT_DIR/code/lion_thesis_training_preprocess.py
OUTPUT_DIR=$PROJECT_DIR/data_out
FFMPEG_PATH=$PROJECT_DIR/tools/ffmpeg-7.0.2-amd64-static/ffmpeg
CHUNK_FILE=$(printf "%s/code/video_chunks/video_chunk_%04d.txt" "$PROJECT_DIR" "$SLURM_ARRAY_TASK_ID")

mkdir -p "$OUTPUT_DIR"
mkdir -p "$PROJECT_DIR/logs"

if [ ! -f "$CHUNK_FILE" ]; then
    echo "Chunk file not found: $CHUNK_FILE"
    exit 1
fi

echo "Array task ID: $SLURM_ARRAY_TASK_ID"
echo "Chunk file: $CHUNK_FILE"

while read -r FILE; do
    [ -z "$FILE" ] && continue

    echo "----------------------------------------"
    echo "Processing file: $FILE"
    echo "----------------------------------------"

    singularity exec "$PY_CONTAINER" python3 "$SCRIPT" \
        --input_file "$FILE" \
        --output_dir "$OUTPUT_DIR" \
        --ffmpeg_path "$FFMPEG_PATH"

    echo "Finished file: $FILE"
done < "$CHUNK_FILE"

echo "Chunk finished."
Replace DITBRUGERNAVN with your own username.
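Each array task finds its own work list by formatting its task ID with four digits, which matches the `video_chunk_{i:04d}.txt` names written by make_video_chunks.py. The sketch below reproduces that lookup in Python; the function names `chunk_file_for_task` and `read_video_list` are hypothetical helpers for illustration.

```python
import os
from pathlib import Path


def chunk_file_for_task(project_dir, task_id):
    """Path of the chunk list for one array task (same pattern as run_array.sh)."""
    return Path(project_dir) / "code" / "video_chunks" / f"video_chunk_{task_id:04d}.txt"


def read_video_list(chunk_file):
    """Return the non-empty lines of a chunk file, one video path per line."""
    with Path(chunk_file).open() as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    # Outside Slurm the variable is not set, so fall back to task 0 for illustration.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(chunk_file_for_task(Path.home() / "video_processing", task_id))
```

This can be handy for checking by hand which videos a failed task was supposed to process, using the task ID from the log file name.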
LabGym part
A. Create the chunk script for LabGym
Create:
nano ~/labgym_lion/code/make_video_chunks.py
Paste:
import subprocess
from pathlib import Path

BASE = Path.home() / "labgym_lion"
INPUT_LIST = BASE / "code/video_list.txt"
OUT_DIR = BASE / "code/video_chunks"
TARGET_SECONDS = 30 * 60  # defined for reference; not used by the packing loop below
MAX_SECONDS = 70 * 60  # upper limit for the total duration of one chunk
FFPROBE = Path.home() / "video_processing/tools/ffmpeg-7.0.2-amd64-static/ffprobe"

OUT_DIR.mkdir(parents=True, exist_ok=True)


def get_duration(video):
    # Ask ffprobe for the duration of the video in seconds.
    cmd = [
        str(FFPROBE),
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(video),
    ]
    result = subprocess.check_output(cmd, text=True).strip()
    return float(result)


# Read the video paths and look up the duration of each one.
videos = []
with INPUT_LIST.open() as f:
    for line in f:
        path = line.strip()
        if path:
            duration = get_duration(path)
            videos.append((path, duration))

# Longest videos first.
videos.sort(key=lambda x: x[1], reverse=True)

# Put each video into the shortest chunk that still has room;
# open a new chunk if it fits nowhere.
chunks = []
for video, duration in videos:
    best_chunk = None
    best_duration = None
    for chunk in chunks:
        chunk_duration = sum(d for _, d in chunk)
        if chunk_duration + duration <= MAX_SECONDS:
            if best_duration is None or chunk_duration < best_duration:
                best_chunk = chunk
                best_duration = chunk_duration
    if best_chunk is not None:
        best_chunk.append((video, duration))
    else:
        chunks.append([(video, duration)])

# Remove chunk files left over from earlier runs.
for old in OUT_DIR.glob("video_chunk_*.txt"):
    old.unlink()

# Write one text file per chunk, one video path per line.
for i, chunk in enumerate(chunks):
    chunk_file = OUT_DIR / f"video_chunk_{i:04d}.txt"
    with chunk_file.open("w") as f:
        for video, duration in chunk:
            f.write(video + "\n")
    total_min = sum(d for _, d in chunk) / 60
    print(f"{chunk_file.name}: {len(chunk)} videos, {total_min:.1f} min")
Save and close.
B. Replace the LabGym submit_array.sh
Open:
nano ~/labgym_lion/code/submit_array.sh
Delete everything and paste:
#!/bin/bash

PROJECT_DIR=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion
INPUT_DIR=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing/data_out
FILE_LIST=$PROJECT_DIR/code/video_list.txt
JOB_SCRIPT=$PROJECT_DIR/code/run_labgym_array.sh
CHUNK_DIR=$PROJECT_DIR/code/video_chunks

mkdir -p "$PROJECT_DIR/logs"
mkdir -p "$PROJECT_DIR/results"
mkdir -p "$CHUNK_DIR"

find "$INPUT_DIR" -maxdepth 1 -type f -name "*.mp4" | sort > "$FILE_LIST"
NUM_FILES=$(wc -l < "$FILE_LIST")

if [ "$NUM_FILES" -eq 0 ]; then
    echo "No MP4 files found"
    exit 1
fi

echo "Found $NUM_FILES processed videos"
echo "Creating duration-based LabGym chunks..."

singularity exec /ceph/container/python/python_3.10.sif \
    python "$PROJECT_DIR/code/make_video_chunks.py"

NUM_CHUNKS=$(ls "$CHUNK_DIR"/video_chunk_*.txt 2>/dev/null | wc -l)

if [ "$NUM_CHUNKS" -eq 0 ]; then
    echo "No chunks were created"
    exit 1
fi

MAX_INDEX=$((NUM_CHUNKS - 1))

echo "Created $NUM_CHUNKS chunks"
echo "Submitting LabGym chunk jobs..."

sbatch --array=0-"$MAX_INDEX"%2 "$JOB_SCRIPT"
Replace DITBRUGERNAVN with your own username.
Save and close.
C. Replace the LabGym run_labgym_array.sh
Open:
nano ~/labgym_lion/code/run_labgym_array.sh
Delete everything and paste:
#!/bin/bash
#SBATCH --job-name=labgym_lion
#SBATCH --output=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion/logs/labgym_%A_%a.out
#SBATCH --error=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion/logs/labgym_%A_%a.err
#SBATCH --mem=80G
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1
#SBATCH --time=6:00:00

set -euo pipefail

BASE=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion
CHUNK_FILE=$(printf "%s/code/video_chunks/video_chunk_%04d.txt" "$BASE" "$SLURM_ARRAY_TASK_ID")

if [ ! -f "$CHUNK_FILE" ]; then
    echo "Chunk file not found: $CHUNK_FILE"
    exit 1
fi

echo "Array task ID: $SLURM_ARRAY_TASK_ID"
echo "Chunk file: $CHUNK_FILE"

while read -r FILE; do
    [ -z "$FILE" ] && continue

    BASENAME=$(basename "$FILE" .mp4)
    RESULTS_DIR=/scratch/labgym_lion/results/$BASENAME

    echo "----------------------------------------"
    echo "Starting LabGym analysis"
    echo "Video: $FILE"
    echo "Results: $RESULTS_DIR"
    echo "----------------------------------------"

    singularity exec --nv \
        -B ${BASE}:/scratch/labgym_lion \
        /ceph/container/python/python_3.10.sif \
        /bin/bash -c "
            set -euo pipefail
            source /scratch/labgym_lion/venv/bin/activate
            export TMPDIR=/scratch/labgym_lion/tmp
            export TEMP=/scratch/labgym_lion/tmp
            export TMP=/scratch/labgym_lion/tmp
            mkdir -p '$RESULTS_DIR'
            python /scratch/labgym_lion/code/run_labgym_detector.py \
                --video '$FILE' \
                --detector /scratch/labgym_lion/data/models/lion_detector_collectively_v3 \
                --categorizer /scratch/labgym_lion/data/models/lion_cat_v7 \
                --results '$RESULTS_DIR' \
                --animal-number '{\"Male\": 1, \"Female\": 2}' \
                --animal-kinds Male Female \
                --batch-size 2 \
                --uncertain 20 \
                --duration 0 \
                --min-behavior-length 10 \
                --skip-annotated-video
        "

    echo "Finished LabGym analysis: $FILE"
done < "$CHUNK_FILE"

echo "Chunk finished."
Replace DITBRUGERNAVN with your own username everywhere it appears.
Save and close.
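The run script writes results to /scratch/labgym_lion/results/<video name>, which is a path inside the container. Because `-B ${BASE}:/scratch/labgym_lion` bind-mounts the project directory at that location, the same files appear on the host under the project's results folder. The following sketch only illustrates that path translation; the function name `container_to_host` is hypothetical and the example video name is made up.

```python
from pathlib import PurePosixPath


def container_to_host(container_path, host_base, mount_point="/scratch/labgym_lion"):
    """Translate a path under the bind mount into the matching host path.

    Raises ValueError if container_path is not located under mount_point.
    """
    relative = PurePosixPath(container_path).relative_to(mount_point)
    return PurePosixPath(host_base) / relative


if __name__ == "__main__":
    host_base = "/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion"
    # "video01" is a made-up video name, for illustration only.
    print(container_to_host("/scratch/labgym_lion/results/video01", host_base))
```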
Make the scripts executable
Run:
chmod +x ~/video_processing/code/run_array.sh
chmod +x ~/video_processing/code/submit_array.sh
chmod +x ~/labgym_lion/code/run_labgym_array.sh
chmod +x ~/labgym_lion/code/submit_array.sh
Video cleanup script
This script removes the videos that have already been analyzed, so you do not have to analyze everything again if something fails.
Create the script:
nano ~/video_processing/code/cleanup_analyzed_videos.sh
Paste:
#!/bin/bash

DATA_OUT=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing/data_out
RESULTS=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion/results

echo "Checking analyzed videos..."
echo "DATA_OUT: $DATA_OUT"
echo "RESULTS: $RESULTS"
echo ""

for VIDEO in "$DATA_OUT"/*.mp4; do
    [ -e "$VIDEO" ] || continue

    BASENAME=$(basename "$VIDEO" .mp4)
    RESULT_DIR="$RESULTS/$BASENAME"

    if [ -d "$RESULT_DIR" ] && [ "$(find "$RESULT_DIR" -type f | wc -l)" -ge 2 ]; then
        echo "Deleting analyzed video: $VIDEO"
        rm "$VIDEO"
    else
        echo "Keeping not-finished video: $VIDEO"
    fi
done

echo ""
echo "Cleanup finished."
Replace DITBRUGERNAVN with your own username.
Save and close.
Make it executable:
chmod +x ~/video_processing/code/cleanup_analyzed_videos.sh
Run it:
sbatch ~/video_processing/code/cleanup_analyzed_videos.sh
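Since the cleanup script deletes files, it may be worth previewing what it would remove first. The sketch below applies the same criterion (a result folder named after the video exists and contains at least two files) but only lists the matches and deletes nothing. The function name `analyzed_videos` is hypothetical, and the paths in the usage part assume the folder layout used in this guide.

```python
from pathlib import Path


def analyzed_videos(data_out, results, min_files=2):
    """Return the .mp4 files that the cleanup script would treat as analyzed."""
    done = []
    for video in sorted(Path(data_out).glob("*.mp4")):
        result_dir = Path(results) / video.stem
        if not result_dir.is_dir():
            continue
        # Count files recursively, like `find "$RESULT_DIR" -type f | wc -l`.
        n_files = sum(1 for p in result_dir.rglob("*") if p.is_file())
        if n_files >= min_files:
            done.append(video)
    return done


if __name__ == "__main__":
    # Assumed layout from this guide; adjust if your folders differ.
    data_out = Path.home() / "video_processing" / "data_out"
    results = Path.home() / "labgym_lion" / "results"
    for video in analyzed_videos(data_out, results):
        print("Would delete:", video)
```

Keep in mind that two files in a result folder is only a rough sign that the analysis finished, so a quick look at the list before running the real cleanup can prevent deleting a video whose analysis was interrupted.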