Chunked preprocessing and analysis

Start by setting up your project with the two other guides: "Preprocessing på AI-LAB" and "Løve speciale - LabGym opsætning og brug på AI-LAB".

Preprocessing

A. Edit the preprocessing Python script

Open:

nano ~/video_processing/code/lion_thesis_training_preprocess.py

Find:

cmd = [ffmpeg_executable]

Replace with:

cmd = [ffmpeg_executable, "-y", "-nostdin"]

Find:

subprocess.run(cmd, check=True)

Replace with:

subprocess.run(cmd, check=True, stdin=subprocess.DEVNULL)

Save and close.
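
Both changes keep ffmpeg away from standard input: "-y" overwrites existing output without prompting, "-nostdin" disables interactive input, and stdin=subprocess.DEVNULL hands the child an empty input stream. This matters because run_array.sh feeds the chunk file to a "while read" loop, and a child process that reads stdin could swallow the remaining file names. The sketch below illustrates the effect; it uses the Python interpreter as a stand-in for ffmpeg, which is an assumption made purely for demonstration.

```python
import subprocess
import sys

# Stand-in for ffmpeg (an assumption for illustration): a child process
# that reads everything it can from standard input and reports the length.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]

def chars_read_by_child(**run_kwargs):
    """Run the child and return how many characters it read from stdin."""
    result = subprocess.run(
        child, check=True, capture_output=True, text=True, **run_kwargs
    )
    return int(result.stdout.strip())

# With stdin=DEVNULL the child sees an empty stdin, so it cannot consume
# input that belongs to the calling shell loop.
isolated = chars_read_by_child(stdin=subprocess.DEVNULL)

# If the child is handed input, it consumes all of it.
fed = chars_read_by_child(input="video_a.avi\nvideo_b.avi\n")

print(isolated, fed)
```

The first number is what the patched script allows the child to read; the second shows how much a greedy child would take if stdin were left connected.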


B. Create the preprocessing chunk script

Create:

nano ~/video_processing/code/make_video_chunks.py

Paste:

import subprocess
from pathlib import Path

BASE = Path.home() / "video_processing"
INPUT_LIST = BASE / "file_list.txt"
OUT_DIR = BASE / "code/video_chunks"

TARGET_SECONDS = 60 * 60
MAX_SECONDS = 90 * 60

FFPROBE = BASE / "tools/ffmpeg-7.0.2-amd64-static/ffprobe"

OUT_DIR.mkdir(parents=True, exist_ok=True)

def get_duration(video):
    cmd = [
        str(FFPROBE),
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(video),
    ]
    result = subprocess.check_output(cmd, text=True).strip()
    return float(result)

videos = []

with INPUT_LIST.open() as f:
    for line in f:
        path = line.strip()
        if path:
            duration = get_duration(path)
            videos.append((path, duration))

videos.sort(key=lambda x: x[1], reverse=True)

chunks = []

for video, duration in videos:
    best_chunk = None
    best_duration = None

    for chunk in chunks:
        chunk_duration = sum(d for _, d in chunk)

        if chunk_duration + duration <= MAX_SECONDS:
            if best_duration is None or chunk_duration < best_duration:
                best_chunk = chunk
                best_duration = chunk_duration

    if best_chunk is not None:
        best_chunk.append((video, duration))
    else:
        chunks.append([(video, duration)])

for old in OUT_DIR.glob("video_chunk_*.txt"):
    old.unlink()

for i, chunk in enumerate(chunks):
    chunk_file = OUT_DIR / f"video_chunk_{i:04d}.txt"

    with chunk_file.open("w") as f:
        for video, duration in chunk:
            f.write(video + "\n")

    total_min = sum(d for _, d in chunk) / 60
    print(f"{chunk_file.name}: {len(chunk)} videos, {total_min:.1f} min")
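
The grouping logic above can be tried without ffprobe. The sketch below repeats the same rule on made-up durations: videos are sorted longest first, and each one goes into the emptiest chunk that still stays within the limit. The function name pack_chunks and the file names are hypothetical. Note that TARGET_SECONDS is defined in the script but never read; only MAX_SECONDS affects the result.

```python
def pack_chunks(videos, max_seconds):
    """Group (name, seconds) pairs with the same rule as the script above."""
    chunks = []
    # Longest videos first, so the big items are placed before the small ones.
    for name, seconds in sorted(videos, key=lambda v: v[1], reverse=True):
        # Chunks that still have room for this video.
        fitting = [
            c for c in chunks if sum(d for _, d in c) + seconds <= max_seconds
        ]
        if fitting:
            # Pick the emptiest fitting chunk to keep the chunks balanced.
            emptiest = min(fitting, key=lambda c: sum(d for _, d in c))
            emptiest.append((name, seconds))
        else:
            # Nothing fits, so this video starts a new chunk.
            chunks.append([(name, seconds)])
    return chunks

# Made-up durations in seconds (60, 50, 20, 20 and 10 minutes).
videos = [
    ("v1.avi", 60 * 60),
    ("v2.avi", 50 * 60),
    ("v3.avi", 20 * 60),
    ("v4.avi", 20 * 60),
    ("v5.avi", 10 * 60),
]

chunks = pack_chunks(videos, max_seconds=90 * 60)
totals = [sum(d for _, d in c) / 60 for c in chunks]
print(totals)  # [80.0, 80.0]
```

With these durations the five videos end up in two chunks of 80 minutes each, which is the kind of balance the array job benefits from.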

C. Replace the preprocessing submit_array.sh

Open:

nano ~/video_processing/code/submit_array.sh

Delete everything and paste:

#!/bin/bash

PROJECT_DIR=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing
INPUT_DIR=$PROJECT_DIR/data_in
FILE_LIST=$PROJECT_DIR/file_list.txt
JOB_SCRIPT=$PROJECT_DIR/code/run_array.sh
CHUNK_DIR=$PROJECT_DIR/code/video_chunks

mkdir -p "$PROJECT_DIR/logs"
mkdir -p "$PROJECT_DIR/data_out"
mkdir -p "$CHUNK_DIR"

find "$INPUT_DIR" -maxdepth 1 -type f -name "*.avi" | sort > "$FILE_LIST"

NUM_FILES=$(wc -l < "$FILE_LIST")

if [ "$NUM_FILES" -eq 0 ]; then
    echo "No AVI files found"
    exit 1
fi

echo "Found $NUM_FILES AVI files"
echo "Creating duration-based chunks..."

singularity exec /ceph/container/python/python_3.13.sif \
python3 "$PROJECT_DIR/code/make_video_chunks.py"

NUM_CHUNKS=$(ls "$CHUNK_DIR"/video_chunk_*.txt 2>/dev/null | wc -l)

if [ "$NUM_CHUNKS" -eq 0 ]; then
    echo "No chunks were created"
    exit 1
fi

MAX_INDEX=$((NUM_CHUNKS - 1))

echo "Created $NUM_CHUNKS chunks"
echo "Submitting array job: 0-$MAX_INDEX"

sbatch --array=0-"$MAX_INDEX"%2 "$JOB_SCRIPT"

Replace DITBRUGERNAVN with your username.
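
The last line of the submit script builds the array range from the number of chunk files: task IDs start at 0, and the "%2" suffix limits the job to two tasks running at the same time. A minimal sketch of that calculation follows; the helper name array_spec is hypothetical.

```python
import tempfile
from pathlib import Path

def array_spec(chunk_dir, max_parallel=2):
    """Return the value for sbatch --array, or None when there are no chunks."""
    num_chunks = len(list(Path(chunk_dir).glob("video_chunk_*.txt")))
    if num_chunks == 0:
        return None
    # Task IDs are zero-based; "%N" caps how many tasks run at the same time.
    return f"0-{num_chunks - 1}%{max_parallel}"

with tempfile.TemporaryDirectory() as tmp:
    empty_spec = array_spec(tmp)
    # Create five empty chunk files named like the ones the script writes.
    for i in range(5):
        (Path(tmp) / f"video_chunk_{i:04d}.txt").touch()
    spec = array_spec(tmp)

print(spec)  # 0-4%2
```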


D. Replace the preprocessing run_array.sh

Open:

nano ~/video_processing/code/run_array.sh

Delete everything and paste:

#!/bin/bash
#SBATCH --job-name=video_array
#SBATCH --output=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing/logs/pre_%A_%a.out
#SBATCH --error=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing/logs/pre_%A_%a.err
#SBATCH --time=2:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

set -euo pipefail

PROJECT_DIR=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing

PY_CONTAINER=/ceph/container/python/python_3.13.sif
SCRIPT=$PROJECT_DIR/code/lion_thesis_training_preprocess.py
OUTPUT_DIR=$PROJECT_DIR/data_out
FFMPEG_PATH=$PROJECT_DIR/tools/ffmpeg-7.0.2-amd64-static/ffmpeg

CHUNK_FILE=$(printf "%s/code/video_chunks/video_chunk_%04d.txt" "$PROJECT_DIR" "$SLURM_ARRAY_TASK_ID")

mkdir -p "$OUTPUT_DIR"
mkdir -p "$PROJECT_DIR/logs"

if [ ! -f "$CHUNK_FILE" ]; then
    echo "Chunk file not found: $CHUNK_FILE"
    exit 1
fi

echo "Array task ID: $SLURM_ARRAY_TASK_ID"
echo "Chunk file: $CHUNK_FILE"

while read -r FILE; do
    [ -z "$FILE" ] && continue

    echo "----------------------------------------"
    echo "Processing file: $FILE"
    echo "----------------------------------------"

    singularity exec "$PY_CONTAINER" python3 "$SCRIPT" \
      --input_file "$FILE" \
      --output_dir "$OUTPUT_DIR" \
      --ffmpeg_path "$FFMPEG_PATH"

    echo "Finished file: $FILE"

done < "$CHUNK_FILE"

echo "Chunk finished."

Replace DITBRUGERNAVN with your username.
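
Each array task finds its own work through SLURM_ARRAY_TASK_ID: the ID is zero-padded to four digits to form the chunk file name, and every non-empty line in that file is one video to process. The sketch below mirrors that lookup in Python as a rough illustration; the helper names and the example paths are hypothetical.

```python
import tempfile
from pathlib import Path

def chunk_file_for_task(chunk_dir, task_id):
    """Mirror the printf pattern video_chunk_%04d.txt from run_array.sh."""
    return Path(chunk_dir) / f"video_chunk_{int(task_id):04d}.txt"

def read_videos(chunk_file):
    """Return the non-empty lines of a chunk file, one video path per line."""
    lines = Path(chunk_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    # Slurm passes the task ID as a string in the environment.
    chunk_file = chunk_file_for_task(tmp, "3")
    chunk_file.write_text("/data_in/a.avi\n\n/data_in/b.avi\n")
    videos = read_videos(chunk_file)

print(chunk_file.name, videos)
```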


LabGym part

A. Create the chunk script for LabGym

Create:

nano ~/labgym_lion/code/make_video_chunks.py

Paste:

import subprocess
from pathlib import Path

BASE = Path.home() / "labgym_lion"
INPUT_LIST = BASE / "code/video_list.txt"
OUT_DIR = BASE / "code/video_chunks"

TARGET_SECONDS = 30 * 60
MAX_SECONDS = 70 * 60

FFPROBE = Path.home() / "video_processing/tools/ffmpeg-7.0.2-amd64-static/ffprobe"

OUT_DIR.mkdir(parents=True, exist_ok=True)

def get_duration(video):
    cmd = [
        str(FFPROBE),
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(video),
    ]
    result = subprocess.check_output(cmd, text=True).strip()
    return float(result)

videos = []

with INPUT_LIST.open() as f:
    for line in f:
        path = line.strip()
        if path:
            duration = get_duration(path)
            videos.append((path, duration))

videos.sort(key=lambda x: x[1], reverse=True)

chunks = []

for video, duration in videos:
    best_chunk = None
    best_duration = None

    for chunk in chunks:
        chunk_duration = sum(d for _, d in chunk)

        if chunk_duration + duration <= MAX_SECONDS:
            if best_duration is None or chunk_duration < best_duration:
                best_chunk = chunk
                best_duration = chunk_duration

    if best_chunk is not None:
        best_chunk.append((video, duration))
    else:
        chunks.append([(video, duration)])

for old in OUT_DIR.glob("video_chunk_*.txt"):
    old.unlink()

for i, chunk in enumerate(chunks):
    chunk_file = OUT_DIR / f"video_chunk_{i:04d}.txt"

    with chunk_file.open("w") as f:
        for video, duration in chunk:
            f.write(video + "\n")

    total_min = sum(d for _, d in chunk) / 60
    print(f"{chunk_file.name}: {len(chunk)} videos, {total_min:.1f} min")

Save and close.
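
One consequence of the grouping rule is worth checking before submitting: a single video longer than MAX_SECONDS never fits into an existing chunk, so it gets a chunk of its own that exceeds the limit. The sketch below flags such videos on made-up durations; the helper name oversized and the file names are hypothetical.

```python
MAX_SECONDS = 70 * 60  # same limit as the LabGym chunk script

def oversized(videos, max_seconds=MAX_SECONDS):
    """Return the (path, seconds) pairs that are longer than the chunk limit."""
    return [(path, seconds) for path, seconds in videos if seconds > max_seconds]

# Made-up durations: 25, 95 and exactly 70 minutes.
videos = [("a.mp4", 25 * 60), ("b.mp4", 95 * 60), ("c.mp4", 70 * 60)]

too_long = oversized(videos)
for path, seconds in too_long:
    print(f"{path}: {seconds / 60:.0f} min exceeds the {MAX_SECONDS / 60:.0f} min limit")
```

A video of exactly 70 minutes still fits, so only b.mp4 is reported. Whether such a chunk finishes within the job's time limit depends on how fast LabGym processes the video, which this check does not estimate.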


B. Replace the LabGym submit_array.sh

Open:

nano ~/labgym_lion/code/submit_array.sh

Delete everything and paste:

#!/bin/bash

PROJECT_DIR=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion
INPUT_DIR=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing/data_out
FILE_LIST=$PROJECT_DIR/code/video_list.txt
JOB_SCRIPT=$PROJECT_DIR/code/run_labgym_array.sh
CHUNK_DIR=$PROJECT_DIR/code/video_chunks

mkdir -p "$PROJECT_DIR/logs"
mkdir -p "$PROJECT_DIR/results"
mkdir -p "$CHUNK_DIR"

find "$INPUT_DIR" -maxdepth 1 -type f -name "*.mp4" | sort > "$FILE_LIST"

NUM_FILES=$(wc -l < "$FILE_LIST")

if [ "$NUM_FILES" -eq 0 ]; then
    echo "No MP4 files found"
    exit 1
fi

echo "Found $NUM_FILES processed videos"
echo "Creating duration-based LabGym chunks..."

singularity exec /ceph/container/python/python_3.10.sif \
python "$PROJECT_DIR/code/make_video_chunks.py"

NUM_CHUNKS=$(ls "$CHUNK_DIR"/video_chunk_*.txt 2>/dev/null | wc -l)

if [ "$NUM_CHUNKS" -eq 0 ]; then
    echo "No chunks were created"
    exit 1
fi

MAX_INDEX=$((NUM_CHUNKS - 1))

echo "Created $NUM_CHUNKS chunks"
echo "Submitting LabGym chunk jobs..."

sbatch --array=0-"$MAX_INDEX"%2 "$JOB_SCRIPT"

Replace DITBRUGERNAVN with your username.

Save and close.


C. Replace the LabGym run_labgym_array.sh

Open:

nano ~/labgym_lion/code/run_labgym_array.sh

Delete everything and paste:

#!/bin/bash
#SBATCH --job-name=labgym_lion
#SBATCH --output=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion/logs/labgym_%A_%a.out
#SBATCH --error=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion/logs/labgym_%A_%a.err
#SBATCH --mem=80G
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1
#SBATCH --time=6:00:00

set -euo pipefail

BASE=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion
CHUNK_FILE=$(printf "%s/code/video_chunks/video_chunk_%04d.txt" "$BASE" "$SLURM_ARRAY_TASK_ID")

if [ ! -f "$CHUNK_FILE" ]; then
    echo "Chunk file not found: $CHUNK_FILE"
    exit 1
fi

echo "Array task ID: $SLURM_ARRAY_TASK_ID"
echo "Chunk file: $CHUNK_FILE"

while read -r FILE; do
    [ -z "$FILE" ] && continue

    BASENAME=$(basename "$FILE" .mp4)
    RESULTS_DIR=/scratch/labgym_lion/results/$BASENAME

    echo "----------------------------------------"
    echo "Starting LabGym analysis"
    echo "Video: $FILE"
    echo "Results: $RESULTS_DIR"
    echo "----------------------------------------"

    singularity exec --nv \
    -B ${BASE}:/scratch/labgym_lion \
    /ceph/container/python/python_3.10.sif \
    /bin/bash -c "
    set -euo pipefail

    source /scratch/labgym_lion/venv/bin/activate

    export TMPDIR=/scratch/labgym_lion/tmp
    export TEMP=/scratch/labgym_lion/tmp
    export TMP=/scratch/labgym_lion/tmp

    mkdir -p '$RESULTS_DIR'

    python /scratch/labgym_lion/code/run_labgym_detector.py \
      --video '$FILE' \
      --detector /scratch/labgym_lion/data/models/lion_detector_collectively_v3 \
      --categorizer /scratch/labgym_lion/data/models/lion_cat_v7 \
      --results '$RESULTS_DIR' \
      --animal-number '{\"Male\": 1, \"Female\": 2}' \
      --animal-kinds Male Female \
      --batch-size 2 \
      --uncertain 20 \
      --duration 0 \
      --min-behavior-length 10 \
      --skip-annotated-video
    "

    echo "Finished LabGym analysis: $FILE"

done < "$CHUNK_FILE"

echo "Chunk finished."

Replace DITBRUGERNAVN with your username everywhere it appears.

Save and close.
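
The --animal-number argument passes through two layers of quoting: the outer shell turns each \" into a plain double quote, and the inner bash then sees a single-quoted JSON string. The sketch below reproduces the inner step with shlex to show what the Python script would receive. That run_labgym_detector.py decodes the value as JSON is an assumption, not something this guide confirms.

```python
import json
import shlex

# What the inner bash receives after the outer shell has turned \" into ".
inner_command = (
    "--animal-number '{\"Male\": 1, \"Female\": 2}' --animal-kinds Male Female"
)

# shlex.split applies POSIX shell quoting rules, like the inner bash does.
args = shlex.split(inner_command)

# Assumption: the detector script decodes this value as JSON.
animal_number = json.loads(args[1])
animal_kinds = args[3:]

print(animal_number, animal_kinds)
```

Because $FILE and $RESULTS_DIR are also placed inside single quotes in the inner command, a path that itself contains a single quote would break the quoting; plain file names avoid the problem.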


Make the scripts executable

Run:

chmod +x ~/video_processing/code/run_array.sh
chmod +x ~/video_processing/code/submit_array.sh
chmod +x ~/labgym_lion/code/run_labgym_array.sh
chmod +x ~/labgym_lion/code/submit_array.sh

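Video cleanup script

This script removes the videos that have already been analysed, so you do not have to analyse everything again if something fails. A video counts as analysed when its results folder contains at least two files.

Create the script

nano ~/video_processing/code/cleanup_analyzed_videos.sh

Paste: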

#!/bin/bash

DATA_OUT=/ceph/home/student.aau.dk/DITBRUGERNAVN/video_processing/data_out
RESULTS=/ceph/home/student.aau.dk/DITBRUGERNAVN/labgym_lion/results

echo "Checking analyzed videos..."
echo "DATA_OUT: $DATA_OUT"
echo "RESULTS:  $RESULTS"
echo ""

for VIDEO in "$DATA_OUT"/*.mp4; do
    [ -e "$VIDEO" ] || continue

    BASENAME=$(basename "$VIDEO" .mp4)
    RESULT_DIR="$RESULTS/$BASENAME"

    if [ -d "$RESULT_DIR" ] && [ "$(find "$RESULT_DIR" -type f | wc -l)" -ge 2 ]; then
        echo "Deleting analyzed video: $VIDEO"
        rm "$VIDEO"
    else
        echo "Keeping not-finished video: $VIDEO"
    fi
done

echo ""
echo "Cleanup finished."

Replace DITBRUGERNAVN with your username.

Save and close.
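
Since the cleanup script deletes files, it can be reassuring to preview its decisions first. The sketch below applies the same rule (a results folder with at least two files) but only lists the videos instead of removing them. The function name analyzed_videos and the result file names are hypothetical, and the demo runs in a temporary folder rather than on real data.

```python
import tempfile
from pathlib import Path

def analyzed_videos(data_out, results, min_files=2):
    """List videos whose result folder holds at least min_files files."""
    done = []
    for video in sorted(Path(data_out).glob("*.mp4")):
        # The result folder is named after the video without its extension.
        result_dir = Path(results) / video.stem
        if result_dir.is_dir():
            num_files = sum(1 for p in result_dir.rglob("*") if p.is_file())
            if num_files >= min_files:
                done.append(video)
    return done

with tempfile.TemporaryDirectory() as tmp:
    data_out = Path(tmp) / "data_out"
    results = Path(tmp) / "results"
    data_out.mkdir()
    for name in ("a.mp4", "b.mp4"):
        (data_out / name).touch()
    # a has two result files (analysed), b has only one (not finished).
    (results / "a").mkdir(parents=True)
    (results / "a" / "events.csv").touch()
    (results / "a" / "summary.csv").touch()
    (results / "b").mkdir()
    (results / "b" / "events.csv").touch()

    would_delete = [v.name for v in analyzed_videos(data_out, results)]
    still_present = sorted(p.name for p in data_out.glob("*.mp4"))

print(would_delete)  # ['a.mp4']
```

Nothing is removed by this preview; both videos are still in place afterwards.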

Make it executable

chmod +x ~/video_processing/code/cleanup_analyzed_videos.sh

Run it

sbatch ~/video_processing/code/cleanup_analyzed_videos.sh