
Common Issues

No Spikes Detected

Symptoms:
  • TensorBoard shows Spikes/total_count = 0
  • Agent behavior is random/static
  • Episode rewards flat or declining
Causes:
  1. CL1 device not connected
    # Check if CL1 interface is running
    ps aux | grep cl1_neural_interface
    
  2. UDP port mismatch
    # CL1 side
    python cl1_neural_interface.py --training-host 192.168.1.100 --spike-port 12346
    
    # Training side
    python training_server.py --cl1-host 192.168.1.50 --cl1-spike-port 12346
    
  3. Firewall blocking UDP
    # Allow ports 12345-12348
    sudo ufw allow 12345:12348/udp
    
  4. Network latency/packet loss
    # Test connectivity
    ping -c 10 192.168.1.50
    
    # Monitor UDP traffic
    sudo tcpdump -i eth0 udp port 12346
    
Solutions:
From cl1_neural_interface.py:200-220:
def collect_spikes(self, tick: cl.LoopTick) -> np.ndarray:
    spike_counts = np.zeros(len(self.channel_groups), dtype=np.float32)
    for spike in tick.analysis.spikes:
        idx = self.channel_lookup.get(spike.channel)
        if idx is not None:
            spike_counts[idx] += 1
    return spike_counts
Add debug logging:
if len(tick.analysis.spikes) > 0:
    print(f"Collected {len(tick.analysis.spikes)} spikes")
From cl1_neural_interface.py:173-200:
def apply_stimulation(
    self,
    neurons: cl.Neurons,
    frequencies: np.ndarray,
    amplitudes: np.ndarray
):
    # Interrupt ongoing stimulation
    neurons.interrupt(self.config.all_channels_set)
    
    # Apply stimulation for each channel set
    for i, channel_num in enumerate(self.config.encoding_channels):
        channel_set = cl.ChannelSet(channel_num)
        amplitude_value = float(amplitudes[i])
        freq_value = int(frequencies[i])
        
        # Create stimulation design
        stim_design = cl.StimDesign(
            self.config.phase1_duration,
            -amplitude_value,  # Negative phase
            self.config.phase2_duration,
            amplitude_value    # Positive phase
        )
        
        burst_design = cl.BurstDesign(
            self.config.burst_count,
            freq_value
        )
        
        neurons.stim(channel_set, stim_design, burst_design)
Verify stimulation parameters:
  • frequencies in range [4.0, 40.0] Hz
  • amplitudes in range [1.0, 2.5] μA
  • phase1_duration and phase2_duration = 160 μs
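The ranges above can be enforced defensively before every stimulation call. A sketch (hypothetical helper, not part of cl1_neural_interface.py):

```python
import numpy as np

FREQ_RANGE = (4.0, 40.0)  # Hz, per the ranges above
AMP_RANGE = (1.0, 2.5)    # μA

def validate_stim_params(frequencies, amplitudes):
    """Clamp stimulation parameters to the safe ranges, warning on violations."""
    freqs = np.asarray(frequencies, dtype=np.float64)
    amps = np.asarray(amplitudes, dtype=np.float64)
    if (freqs < FREQ_RANGE[0]).any() or (freqs > FREQ_RANGE[1]).any():
        print("Warning: frequencies outside [4.0, 40.0] Hz; clamping")
    if (amps < AMP_RANGE[0]).any() or (amps > AMP_RANGE[1]).any():
        print("Warning: amplitudes outside [1.0, 2.5] μA; clamping")
    return np.clip(freqs, *FREQ_RANGE), np.clip(amps, *AMP_RANGE)
```

Calling this just before `apply_stimulation` guarantees the hardware never sees out-of-range values even if the encoder output drifts.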

Training Divergence

Symptoms:
  • Reward suddenly drops to zero
  • Policy outputs NaN values
  • Gradient norms explode (>100)
Causes:
  1. Learning rate too high
    config = PPOConfig(
        learning_rate=3e-4  # Try 1e-4 or 3e-5
    )
    
  2. Gradient clipping too weak
    config = PPOConfig(
        max_grad_norm=3.0  # Try 1.0 or 0.5
    )
    
  3. Unnormalized returns causing value explosion
    config = PPOConfig(
        normalize_returns=True  # From README.md:123
    )
    
Solutions:
From TensorBoard:
tensorboard --logdir checkpoints/l5_2048_rand/logs --port 6006
Watch these metrics:
  • Training/Policy_Grad_Norm (should stay < 10)
  • Training/Value_Grad_Norm (should stay < 10)
  • Training/Encoder_Grad_Norm (if encoder trainable)
If norms consistently hit max_grad_norm, gradients are being clipped. Reduce learning rate.
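Whether clipping is engaging can be read off the logged norms. A numpy sketch of both checks (hypothetical helper names; PyTorch's `clip_grad_norm_` returns the same pre-clip total norm):

```python
import numpy as np

def global_grad_norm(grads) -> float:
    """Global L2 norm over a list of gradient arrays (what clip_grad_norm_ reports)."""
    return float(np.sqrt(sum(np.sum(g.astype(np.float64) ** 2) for g in grads)))

def clip_fraction(norm_history, max_grad_norm: float = 3.0) -> float:
    """Fraction of updates where the pre-clip norm hit the clipping threshold."""
    norms = np.asarray(norm_history, dtype=np.float64)
    return float((norms >= max_grad_norm).mean())
```

If `clip_fraction` over a recent window stays near 1.0, nearly every update is being clipped: reduce the learning rate rather than raising max_grad_norm.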
Add logging to ppo_doom.py:
# After forward pass
if torch.isnan(forward_logits).any():
    print("NaN detected in forward_logits!")
    print(f"Spike features: {spike_features}")
    print(f"Decoder weights: {self.decoder.forward_head.weight}")
Common causes:
  • Division by zero in normalization
  • Exploding decoder weights (check L2 regularization)
  • Invalid spike counts (negative values)
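The first and third causes above can be guarded in one place, before features reach the policy. A sketch (hypothetical helper name, assuming mean/std normalization):

```python
import numpy as np

def sanitize_spike_features(spike_counts, eps: float = 1e-6) -> np.ndarray:
    """Guard against two NaN sources: negative counts and zero-variance normalization."""
    counts = np.asarray(spike_counts, dtype=np.float32)
    counts = np.clip(counts, 0.0, None)            # spike counts can never be negative
    std = counts.std()
    return (counts - counts.mean()) / (std + eps)  # eps avoids division by zero
```

With an all-zero tick (no spikes), the output is all zeros instead of NaN, so a dropped CL1 connection degrades gracefully rather than poisoning the policy.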

Low Reward Despite Good Behavior

Symptoms:
  • Agent visibly plays well (kills enemies, picks up items)
  • TensorBoard shows low Episode_Reward
  • Feedback stimulation seems ineffective
Causes:
  1. Reward shaping misconfigured
    # From README.md:149
    config = PPOConfig(
        simplified_reward=True  # Disables manual shaping
    )
    
  2. Feedback thresholds too strict
    config = PPOConfig(
        feedback_positive_threshold=1.0,  # Lower to 0.5
        feedback_negative_threshold=-1.0  # Raise to -0.5
    )
    
  3. Event feedback channels misconfigured. From ppo_doom.py:165-257, check event_feedback_settings:
    'enemy_kill': EventFeedbackConfig(
        channels=[35, 36, 38],
        base_frequency=20.0,
        base_amplitude=2.5,
        base_pulses=40,
        info_key='event_enemy_kill',
        td_sign='positive'
    )
    
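How the two thresholds gate feedback can be sketched as a dead zone around zero TD error (hypothetical helper; the real gating lives in ppo_doom.py):

```python
def feedback_sign(td_error: float,
                  feedback_positive_threshold: float = 1.0,
                  feedback_negative_threshold: float = -1.0):
    """Map a TD error to a feedback decision; None means no stimulation is sent."""
    if td_error >= feedback_positive_threshold:
        return "positive"
    if td_error <= feedback_negative_threshold:
        return "negative"
    return None  # dead zone between the thresholds: no feedback
```

Lowering the positive threshold to 0.5 and raising the negative one to -0.5, as suggested above, shrinks the dead zone so more events trigger stimulation.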
Solutions:
From README.md line 149:
simplified_reward=True disables manually shaped aim alignment and velocity. I did more tuning on False so it’s probably better kept this way.
For deadly corridor:
config = PPOConfig(
    simplified_reward=False,
    aim_alignment_gain=2.5,
    aim_alignment_max_distance=250.0,
    movement_velocity_reward_scale=0.01
)
Check UDP packet transmission:
# On training server
sudo tcpdump -i eth0 udp port 12348 -X
Inspect feedback_port logs for:
  • Packet send count
  • Amplitude/frequency values
  • Channel assignments
From training_server.py, feedback is sent via:
udp_protocol.send_feedback_command(
    sock=self.feedback_socket,
    addr=(self.config.cl1_host, self.config.cl1_feedback_port),
    channel_set=channel_set,
    stim_design=(phase1_dur, phase1_amp, phase2_dur, phase2_amp),
    burst_design=(burst_count, frequency)
)

Decoder Bias Dominating

Symptoms:
  • Decoder/forward_wx_bias_ratio < 1.0 (bias larger than weight*input)
  • Ablation modes (zero, random) show similar performance to real spikes
  • Agent behavior unchanged when neurons are silenced
Causes:
  1. decoder_zero_bias=False (default in some configs)
    config = PPOConfig(
        decoder_zero_bias=True  # Force bias=0
    )
    
  2. Decoder MLP learning a static policy
    config = PPOConfig(
        decoder_use_mlp=False  # Use linear readout only
    )
    
  3. Encoder not trainable
    config = PPOConfig(
        encoder_trainable=True  # Encoder must adapt to neurons
    )
    
Solutions:
See the Ablation Modes page. Quick test:
# Baseline
python training_server.py --mode train --decoder-ablation none --max-episodes 500

# Zero ablation (should fail to learn)
python training_server.py --mode train --decoder-ablation zero --max-episodes 500
Compare TensorBoard metrics. If both show similar reward curves, decoder bias is dominating.
From ppo_doom.py:718-744:
def compute_weight_bias_metrics(self, spike_features: torch.Tensor) -> Dict[str, float]:
    metrics: Dict[str, float] = {}
    eps = 1e-8  # guards against division by zero when bias is exactly 0
    for name, head in self.heads.items():
        if isinstance(head, LinearReadoutHead):
            weight = head.effective_weight()
            wx = torch.matmul(spike_features, weight.t()).abs().mean()
            bias_mean = head.bias.abs().mean()
            ratio = float((wx / (bias_mean + eps)).item())
            metrics[f'Decoder/{name}_wx_bias_ratio'] = ratio
    return metrics
In TensorBoard, check:
  • Decoder/forward_wx_bias_ratio (should be > 1.0)
  • Decoder/attack_wx_bias_ratio
  • Decoder/camera_wx_bias_ratio
If ratios < 1.0, bias is larger than weighted input.
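A small numeric example of the same ratio (numpy stand-in for the torch code in compute_weight_bias_metrics):

```python
import numpy as np

def wx_bias_ratio(weight, bias, inputs, eps: float = 1e-8) -> float:
    """Mean |Wx| over mean |bias|: > 1.0 means the spike input drives the output."""
    wx = np.abs(inputs @ weight.T).mean()
    bias_mean = np.abs(bias).mean()
    return float(wx / (bias_mean + eps))
```

For weight [[2, 0]], input [[3, 0]], and bias [1], the ratio is 6.0 (healthy); with bias [10] it drops to 0.6, meaning the head would output nearly the same logits even with silenced neurons.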

Network Connectivity Issues

Symptoms:
  • cl1_neural_interface.py reports “Connection timeout”
  • Training server logs “No spike data received”
  • Sporadic packet loss
Causes:
  1. IP address mismatch
    # Find CL1 IP
    hostname -I
    
    # Find training server IP
    ip addr show
    
  2. Port already in use
    # Check port availability
    sudo netstat -tulpn | grep 12345
    
    # Kill existing process
    sudo kill <PID>
    
  3. Network latency
    # Measure round-trip time
    ping -c 100 192.168.1.50 | tail -1
    
Solutions:
From cl1_neural_interface.py:145-171:
def setup_sockets(self):
    # Socket for receiving stimulation commands
    self.stim_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    self.stim_socket.bind(("0.0.0.0", self.stim_port))
    self.stim_socket.setblocking(False)  # Non-blocking
    
    print(f"Listening for stimulation commands on port {self.stim_port}")
    
    # Socket for sending spike data
    self.spike_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    
    print(f"Will send spike data to {self.training_host}:{self.spike_port}")
Verify output:
Listening for stimulation commands on port 12345
Will send spike data to 192.168.1.100:12346
Listening for event metadata on port 12347
Listening for feedback commands on port 12348
Test packet delivery manually with netcat:
# On training server (192.168.1.100)
echo "test" | nc -u 192.168.1.50 12345

# On CL1 device (192.168.1.50)
echo "test" | nc -u 192.168.1.100 12346
If packets don’t arrive, check:
  • Firewall rules (sudo ufw status)
  • Routing tables (ip route)
  • Network interface config (ifconfig)

TensorBoard Monitoring

Essential Metrics

tensorboard --logdir checkpoints/l5_2048_rand/logs --port 6006
From README.md lines 60-67:
# Monitor specific run
tensorboard --logdir checkpoints/l5_2048_rand/logs

# Compare multiple runs
tensorboard --logdir_spec \
    baseline:checkpoints/baseline/logs,\
    ablation:checkpoints/ablation_zero/logs

Key Plots

Episode Reward (Training/Episode_Reward)
  • Should increase over time
  • High variance early (exploration)
  • Plateaus indicate convergence or need for curriculum change
Kill Count (Training/Kill_Count)
  • Tracks combat effectiveness
  • Should correlate with reward
  • Flat = agent not engaging enemies
Survival Time (Training/Survival_Time)
  • Longer = better policy
  • Sudden drops = environment difficulty increase or policy collapse
Total Spike Count (Spikes/total_count)
  • Should be > 0 every episode
  • If zero, check CL1 connection and stimulation
Spikes per Channel Set (Spikes/encoding, Spikes/move_forward, etc.)
  • Shows which channels are active
  • Uneven distribution may indicate channel imbalance
Stimulation Parameters (Encoder/freq_mean, Encoder/amp_mean)
  • Frequencies should be in [4.0, 40.0] Hz
  • Amplitudes in [1.0, 2.5] μA
  • Stuck values = encoder not learning
Policy Gradient Norm (Training/Policy_Grad_Norm)
  • Should stay < max_grad_norm (default 3.0)
  • Consistently hitting limit = reduce learning rate
Entropy (Training/Entropy)
  • Measures policy randomness
  • High early (exploration), decreases over time
  • Too low too fast = premature convergence
KL Divergence (Training/KL_Divergence)
  • Measures policy change between updates
  • Should be small (< 0.1)
  • Large spikes = policy instability
wx/bias Ratio (Decoder/forward_wx_bias_ratio)
  • Should be > 1.0 (weights dominate bias)
  • < 1.0 = decoder bias is compensating for spikes
  • If decoder_zero_bias=True, bias metrics will be zero
Weight L2 Norm (Decoder/weight_l2)
  • Tracks decoder weight magnitude
  • Explosion (>1000) = add L2 regularization
Bias Absolute Mean (Decoder/forward_bias_abs_mean)
  • Should be near zero if decoder_zero_bias=True
  • Growing bias = decoder learning static policy
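Two of the metrics above (entropy and KL divergence) can be recomputed by hand when sanity-checking logs. A sketch for categorical action distributions (hypothetical helpers):

```python
import numpy as np

def entropy(p, eps: float = 1e-12) -> float:
    """Shannon entropy of a categorical distribution, in nats."""
    p = np.asarray(p, dtype=np.float64)
    return float(-np.sum(p * np.log(p + eps)))

def categorical_kl(p_old, p_new, eps: float = 1e-12) -> float:
    """KL(old || new) between action distributions before and after an update."""
    p_old = np.asarray(p_old, dtype=np.float64)
    p_new = np.asarray(p_new, dtype=np.float64)
    return float(np.sum(p_old * (np.log(p_old + eps) - np.log(p_new + eps))))
```

A uniform policy over N actions has entropy ln(N), the maximum; entropy collapsing well below that early in training is the premature-convergence signature described above.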

Custom Logging

Add debugging metrics to ppo_doom.py:
# In training loop
writer.add_scalar('Debug/spike_mean', spike_features.mean(), episode)
writer.add_scalar('Debug/spike_std', spike_features.std(), episode)
writer.add_scalar('Debug/reward_positive_count', (rewards > 0).sum(), episode)
writer.add_histogram('Debug/spike_distribution', spike_features, episode)

Debugging Commands

Check Process Status

# CL1 device
ps aux | grep cl1_neural_interface

# Training server
ps aux | grep training_server

# GPU usage
nvidia-smi

Monitor Resource Usage

# CPU/Memory
htop

# Disk I/O (checkpoints)
iotop

# Network traffic
iftop

Inspect Checkpoints

import torch

# Load checkpoint
checkpoint = torch.load('checkpoints/l5_2048_rand/episode_1000.pt', map_location='cpu')

# Inspect keys
print(checkpoint.keys())
# dict_keys(['episode', 'policy_state_dict', 'optimizer_state_dict', 'config'])

# Check episode number
print(f"Episode: {checkpoint['episode']}")

# Inspect policy weights
policy_state = checkpoint['policy_state_dict']
print(f"Encoder keys: {[k for k in policy_state.keys() if 'encoder' in k]}")
print(f"Decoder keys: {[k for k in policy_state.keys() if 'decoder' in k]}")

Test Neural Interface

# CL1 device - run with verbose logging
python cl1_neural_interface.py \
    --training-host 192.168.1.100 \
    --tick-frequency 10 \
    --recording-path ./test_recordings

# Should output:
# Listening for stimulation commands on port 12345
# Will send spike data to 192.168.1.100:12346
# Listening for event metadata on port 12347
# Listening for feedback commands on port 12348
# Neural loop started at 10 Hz

Getting Help

Collecting Diagnostic Info

# System info
uname -a
python --version
pip list | grep torch

# Network config
ifconfig
route -n

# Logs
tail -n 100 checkpoints/l5_2048_rand/logs/training.log

Common Error Messages

Solution: Reduce batch size or steps per update:
config = PPOConfig(
    batch_size=128,  # Down from 256
    steps_per_update=1024  # Down from 2048
)
Or switch to CPU:
python training_server.py --mode train --device cpu
Solution: CL1 device not running or wrong IP:
# Verify CL1 is running
ssh user@192.168.1.50 "ps aux | grep cl1"

# Check IP matches
ping 192.168.1.50
Solution: From ppo_doom.py:285-350, a set of forbidden channels is defined. Edit channel assignments to avoid them:
config = PPOConfig(
    encoding_channels=[8, 9, 10, 17, 18, 25, 27, 28],  # No forbidden channels
)
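A defensive check before launching a run (hypothetical helper; the actual forbidden set is defined in ppo_doom.py:285-350):

```python
def check_channels(channels, forbidden):
    """Raise if any assigned channel collides with the forbidden set."""
    bad = sorted(set(channels) & set(forbidden))
    if bad:
        raise ValueError(f"Forbidden channels in assignment: {bad}")
    return list(channels)
```

Running this against encoding_channels (and the feedback channel lists) at config-construction time fails fast instead of at the first stimulation call.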

Reporting Issues

Include:
  1. Full command used to start training/CL1
  2. TensorBoard screenshots of key metrics
  3. Last 50 lines of logs
  4. Configuration used (PPOConfig values)
  5. Network topology (IP addresses, ports)
  6. System specs (GPU, RAM, OS)