
Overview

Remote training runs the CL1 neural interface on dedicated CL1 hardware while the training server runs on a separate machine with a CUDA-capable GPU. This is the production setup for training biological neurons to play DOOM.
Critical: Start both processes at around the same time, but always start the CL1 interface before the training server.

Network Setup

You need two machines on the same network:
  1. CL1 Device - Runs the neural interface (e.g., 192.168.240.84)
  2. Training Machine - Runs VizDoom and PPO training (e.g., 192.168.1.238)

Required Ports

Ensure these UDP ports are open between the machines:
  • 12345 - Stimulation commands (training → CL1)
  • 12346 - Spike data (CL1 → training)
  • 12347 - Event metadata (training → CL1)
  • 12348 - Feedback commands (training → CL1)
Test connectivity with ping before starting training. Both machines must be able to reach each other.
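Beyond ping, you can sanity-check that the four UDP ports are free and usable with a short Python loopback probe. This is a sketch, not part of the project's scripts; run it with `127.0.0.1` on each machine first, then adapt it into a sender/receiver pair across the two hosts to verify the actual network path.

```python
import socket

CL1_PORTS = [12345, 12346, 12347, 12348]  # stim, spike, event, feedback

def udp_loopback_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Bind a listener on `port`, fire a probe datagram at it, and
    confirm receipt. Returns False if the port is in use, unreachable,
    or the probe times out."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        listener.settimeout(timeout)
        try:
            listener.bind(("", port))
            sender.sendto(b"probe", (host, port))
            data, _ = listener.recvfrom(64)
            return data == b"probe"
        except OSError:  # bind failure, ICMP unreachable, or timeout
            return False

for port in CL1_PORTS:
    status = "ok" if udp_loopback_check("127.0.0.1", port) else "FAILED"
    print(f"udp/{port}: {status}")
```

A FAILED port usually means another process is already bound to it or a local firewall is dropping the datagram.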

Quick Start

1. Configure IP Addresses

Before running the scripts, verify the IP addresses.

On the CL1 device, check scripts/run_cl1.sh:
# training-host should point to your training machine
# Example: 192.168.1.238 is "geodude"
--training-host 192.168.1.238
On the training machine, check scripts/run_training_server.sh:
# cl1-host should point to your CL1 device
# Example: 192.168.240.84 is "cl1-2507-15"
--cl1-host 192.168.240.84
2. Start CL1 Interface First

On the CL1 device, run:
./scripts/run_cl1.sh
This executes:
python cl1_neural_interface.py \
    --training-host 192.168.1.238 \
    --recording-path /data/recordings/doom-neuron/ \
    --tick-frequency 10
What this does:
  • Connects to training server at 192.168.1.238
  • Saves recordings to /data/recordings/doom-neuron/
  • Runs the neural loop at 10 Hz
The tick frequency of 10 Hz is deliberately conservative to avoid overstimulating the biological neurons. Do not increase it without careful consideration.
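For reference, a fixed-rate loop of this kind can be sketched as follows. This is illustrative only (the project's actual loop lives in cl1_neural_interface.py); the key point is using an absolute deadline so the rate never drifts above 10 Hz even when individual steps finish early.

```python
import time

TICK_HZ = 10                 # matches --tick-frequency
TICK_PERIOD = 1.0 / TICK_HZ  # 100 ms per tick

def run_neural_loop(step, n_ticks):
    """Call step() at a steady TICK_HZ, sleeping off whatever time is
    left in each tick against a monotonic deadline."""
    deadline = time.monotonic()
    for _ in range(n_ticks):
        step()
        deadline += TICK_PERIOD
        remaining = deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
```

Using `time.monotonic()` rather than wall-clock time keeps the tick rate stable across NTP adjustments.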
3. Start Training Server

On the training machine (after CL1 is running), run:
./scripts/run_training_server.sh
This executes:
python training_server.py \
    --mode train \
    --device cuda \
    --cl1-host 192.168.240.84 \
    --max-episodes 300
What this does:
  • Runs in training mode with PPO reinforcement learning
  • Uses CUDA for GPU acceleration
  • Connects to CL1 hardware at 192.168.240.84
  • Trains for up to 300 episodes
4. Monitor Training

The training server outputs episode statistics to training_log.jsonl and TensorBoard logs.

View TensorBoard metrics:
tensorboard --logdir checkpoints/l5_2048_rand/logs --port 6006
Access at: http://<training-machine-ip>:6006
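If you prefer the raw numbers, training_log.jsonl can be summarized with a few lines of Python. The field name `reward` below is an assumption about the log schema; adjust it to whatever keys the training server actually emits.

```python
import json
from pathlib import Path

def summarize_log(path: str) -> list[float]:
    """Collect per-episode rewards from a JSON-lines training log and
    print a one-line summary."""
    rewards = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if "reward" in record:
            rewards.append(float(record["reward"]))
    if rewards:
        print(f"episodes: {len(rewards)}  "
              f"mean reward: {sum(rewards) / len(rewards):.2f}  "
              f"best: {max(rewards):.2f}")
    return rewards
```

Because the file is append-only JSON lines, it is safe to run this while training is still in progress.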

Manual Configuration

For custom setups, configure each component manually:

Basic Command

python cl1_neural_interface.py --training-host 192.168.1.100

Full Configuration Example

python cl1_neural_interface.py \
    --training-host 192.168.1.100 \
    --stim-port 12345 \
    --spike-port 12346 \
    --event-port 12347 \
    --feedback-port 12348 \
    --tick-frequency 10 \
    --recording-path /data/recordings

Configuration Options

| Argument | Default | Description |
| --- | --- | --- |
| --training-host | required | IP address of training system |
| --stim-port | 12345 | Port for receiving stimulation commands |
| --spike-port | 12346 | Port for sending spike data |
| --event-port | 12347 | Port for receiving event metadata |
| --feedback-port | 12348 | Port for receiving feedback commands |
| --tick-frequency | 10 | Neural loop frequency in Hz |
| --recording-path | ./recordings | Directory for saving recordings |
Use absolute paths for --recording-path on production systems to ensure recordings are saved to persistent storage.
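The table above maps directly onto a standard argparse definition. The sketch below mirrors those options for reference; the authoritative definitions live in cl1_neural_interface.py and may differ in detail.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Argparse sketch mirroring the configuration-options table."""
    p = argparse.ArgumentParser(description="CL1 neural interface (sketch)")
    p.add_argument("--training-host", required=True,
                   help="IP address of training system")
    p.add_argument("--stim-port", type=int, default=12345,
                   help="Port for receiving stimulation commands")
    p.add_argument("--spike-port", type=int, default=12346,
                   help="Port for sending spike data")
    p.add_argument("--event-port", type=int, default=12347,
                   help="Port for receiving event metadata")
    p.add_argument("--feedback-port", type=int, default=12348,
                   help="Port for receiving feedback commands")
    p.add_argument("--tick-frequency", type=int, default=10,
                   help="Neural loop frequency in Hz")
    p.add_argument("--recording-path", default="./recordings",
                   help="Directory for saving recordings")
    return p

args = build_parser().parse_args(["--training-host", "192.168.1.100"])
print(args.training_host, args.stim_port)
```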

Advanced Configurations

Custom Feedback Configuration

python training_server.py \
    --mode train \
    --device cuda \
    --cl1-host 192.168.1.50 \
    --use-episode-feedback \
    --no-episode-feedback-surprise-scaling

Custom Recording Paths

python cl1_neural_interface.py \
    --training-host 192.168.1.100 \
    --recording-path /mnt/data/doom_recordings

Watch Mode (Inference)

To run a trained policy without further training:
./scripts/run_frozen_training_server.sh
This executes:
python training_server.py \
    --mode watch \
    --device cuda \
    --cl1-host 192.168.240.84 \
    --max-episodes 3650
Watch mode uses direct hardware access. The UDP interface has not been ported to watch mode yet.

Troubleshooting

Connection Issues

Symptom: CL1 interface can’t connect to the training server
Solutions:
  • Verify IP addresses with ip addr or ifconfig
  • Check firewall rules: sudo ufw status
  • Test connectivity: ping <training-host>
  • Ensure ports 12345-12348 are open

Timing Issues

Symptom: Training server fails to connect
Solutions:
  • Ensure CL1 interface started first
  • Wait 5-10 seconds between starting CL1 and training server
  • Check that both systems are using the same tick frequency

Performance Issues

Symptom: Slow training or high latency
Solutions:
  • Ensure machines are on same local network (avoid VPN/WAN)
  • Check network latency: ping -c 100 <cl1-host>
  • Monitor GPU usage: nvidia-smi -l 1
  • Reduce --max-episodes for testing

Output Files

CL1 Device (/data/recordings/doom-neuron/):
*.cl1                     # Neural recordings with metadata
Training Machine:
checkpoints/
├── episode_*.pt          # Model checkpoints
└── l5_2048_rand/
    └── logs/             # TensorBoard logs

training_log.jsonl        # Episode statistics

Stopping Training

Press Ctrl+C on either machine to shut down both systems gracefully:
  1. Training server sends completion signal to CL1
  2. CL1 interface saves the neural recording and exits
  3. Both processes clean up their UDP sockets
  4. Final checkpoint is saved
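The shutdown sequence above can be sketched as a SIGINT handler that fires a completion datagram before releasing the socket. The b"COMPLETE" payload is a placeholder, not the project's actual wire format.

```python
import signal
import socket

def make_shutdown_handler(sock: socket.socket, peer: tuple[str, int]):
    """Build a SIGINT handler that notifies `peer` before closing `sock`."""
    def handler(signum=None, frame=None):
        sock.sendto(b"COMPLETE", peer)  # notify the other side first
        sock.close()                    # then release the UDP socket
    return handler

def install_shutdown_handler(sock: socket.socket, peer: tuple[str, int]):
    # Route Ctrl+C through the handler so cleanup runs before exit.
    signal.signal(signal.SIGINT, make_shutdown_handler(sock, peer))
```

Because UDP is connectionless, the completion datagram can be lost; a production handler would typically retry or wait for an acknowledgement before exiting.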

Next Steps