
Common Issues

No Spikes Detected

Symptoms:
  • TensorBoard shows Spikes/total_count = 0
  • Agent behavior is random/static
  • Episode rewards flat or declining
Causes:
  1. CL1 device not connected
    # Check if CL1 interface is running
    ps aux | grep cl1_neural_interface
    
  2. UDP port mismatch
    # CL1 side
    python cl1_neural_interface.py --training-host 192.168.1.100 --spike-port 12346
    
    # Training side
    python training_server.py --cl1-host 192.168.1.50 --cl1-spike-port 12346
    
  3. Firewall blocking UDP
    # Allow ports 12345-12348
    sudo ufw allow 12345:12348/udp
    
  4. Network latency/packet loss
    # Test connectivity
    ping -c 10 192.168.1.50
    
    # Monitor UDP traffic
    sudo tcpdump -i eth0 udp port 12346
    
Solutions:
From cl1_neural_interface.py:200-220:
def collect_spikes(self, tick: cl.LoopTick) -> np.ndarray:
    spike_counts = np.zeros(len(self.channel_groups), dtype=np.float32)
    for spike in tick.analysis.spikes:
        idx = self.channel_lookup.get(spike.channel)
        if idx is not None:
            spike_counts[idx] += 1
    return spike_counts
Add debug logging:
if len(tick.analysis.spikes) > 0:
    print(f"Collected {len(tick.analysis.spikes)} spikes")
From cl1_neural_interface.py:173-200:
def apply_stimulation(
    self,
    neurons: cl.Neurons,
    frequencies: np.ndarray,
    amplitudes: np.ndarray
):
    # Interrupt ongoing stimulation
    neurons.interrupt(self.config.all_channels_set)
    
    # Apply stimulation for each channel set
    for i, channel_num in enumerate(self.config.encoding_channels):
        channel_set = cl.ChannelSet(channel_num)
        amplitude_value = float(amplitudes[i])
        freq_value = int(frequencies[i])
        
        # Create stimulation design
        stim_design = cl.StimDesign(
            self.config.phase1_duration,
            -amplitude_value,  # Negative phase
            self.config.phase2_duration,
            amplitude_value    # Positive phase
        )
        
        burst_design = cl.BurstDesign(
            self.config.burst_count,
            freq_value
        )
        
        neurons.stim(channel_set, stim_design, burst_design)
Verify stimulation parameters:
  • frequencies in range [4.0, 40.0] Hz
  • amplitudes in range [1.0, 2.5] μA
  • phase1_duration and phase2_duration = 160 μs
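The ranges above can be enforced defensively before every stimulation call. A sketch (hypothetical helper, not part of cl1_neural_interface.py):

```python
import numpy as np

FREQ_RANGE = (4.0, 40.0)  # Hz, per the ranges above
AMP_RANGE = (1.0, 2.5)    # μA

def validate_stim_params(frequencies, amplitudes):
    """Clamp stimulation parameters to the safe ranges, warning on violations."""
    freqs = np.asarray(frequencies, dtype=np.float64)
    amps = np.asarray(amplitudes, dtype=np.float64)
    if (freqs < FREQ_RANGE[0]).any() or (freqs > FREQ_RANGE[1]).any():
        print("Warning: frequencies outside [4.0, 40.0] Hz; clamping")
    if (amps < AMP_RANGE[0]).any() or (amps > AMP_RANGE[1]).any():
        print("Warning: amplitudes outside [1.0, 2.5] μA; clamping")
    return np.clip(freqs, *FREQ_RANGE), np.clip(amps, *AMP_RANGE)
```

Calling this just before `apply_stimulation` guarantees the hardware never sees out-of-range values even if the encoder output drifts.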

Training Divergence

Symptoms:
  • Reward suddenly drops to zero
  • Policy outputs NaN values
  • Gradient norms explode (>100)
Causes:
  1. Learning rate too high
    config = PPOConfig(
        learning_rate=3e-4  # Try 1e-4 or 3e-5
    )
    
  2. Gradient clipping too weak
    config = PPOConfig(
        max_grad_norm=3.0  # Try 1.0 or 0.5
    )
    
  3. Unnormalized returns causing value explosion
    config = PPOConfig(
        normalize_returns=True  # From README.md:123
    )
    
Solutions:
From TensorBoard:
tensorboard --logdir checkpoints/l5_2048_rand/logs --port 6006
Watch these metrics:
  • Training/Policy_Grad_Norm (should stay < 10)
  • Training/Value_Grad_Norm (should stay < 10)
  • Training/Encoder_Grad_Norm (if encoder trainable)
If norms consistently hit max_grad_norm, gradients are being clipped. Reduce learning rate.
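Whether clipping is engaging can be read off the logged norms. A numpy sketch of both checks (hypothetical helper names; PyTorch's `clip_grad_norm_` returns the same pre-clip total norm):

```python
import numpy as np

def global_grad_norm(grads) -> float:
    """Global L2 norm over a list of gradient arrays (what clip_grad_norm_ reports)."""
    return float(np.sqrt(sum(np.sum(g.astype(np.float64) ** 2) for g in grads)))

def clip_fraction(norm_history, max_grad_norm: float = 3.0) -> float:
    """Fraction of updates where the pre-clip norm hit the clipping threshold."""
    norms = np.asarray(norm_history, dtype=np.float64)
    return float((norms >= max_grad_norm).mean())
```

If `clip_fraction` over a recent window stays near 1.0, nearly every update is being clipped: reduce the learning rate rather than raising max_grad_norm.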
Add logging to ppo_doom.py:
# After forward pass
if torch.isnan(forward_logits).any():
    print("NaN detected in forward_logits!")
    print(f"Spike features: {spike_features}")
    print(f"Decoder weights: {self.decoder.forward_head.weight}")
Common causes:
  • Division by zero in normalization
  • Exploding decoder weights (check L2 regularization)
  • Invalid spike counts (negative values)
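The first and third causes above can be guarded in one place, before features reach the policy. A sketch (hypothetical helper name, assuming mean/std normalization):

```python
import numpy as np

def sanitize_spike_features(spike_counts, eps: float = 1e-6) -> np.ndarray:
    """Guard against two NaN sources: negative counts and zero-variance normalization."""
    counts = np.asarray(spike_counts, dtype=np.float32)
    counts = np.clip(counts, 0.0, None)            # spike counts can never be negative
    std = counts.std()
    return (counts - counts.mean()) / (std + eps)  # eps avoids division by zero
```

With an all-zero tick (no spikes), the output is all zeros instead of NaN, so a dropped CL1 connection degrades gracefully rather than poisoning the policy.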

Low Reward Despite Good Behavior

Symptoms:
  • Agent visibly plays well (kills enemies, picks up items)
  • TensorBoard shows low Episode_Reward
  • Feedback stimulation seems ineffective
Causes:
  1. Reward shaping misconfigured
    # From README.md:149
    config = PPOConfig(
        simplified_reward=True  # Disables manual shaping
    )
    
  2. Feedback thresholds too strict
    config = PPOConfig(
        feedback_positive_threshold=1.0,  # Lower to 0.5
        feedback_negative_threshold=-1.0  # Raise to -0.5
    )
    
  3. Event feedback channels misconfigured. From ppo_doom.py:165-257, check event_feedback_settings:
    'enemy_kill': EventFeedbackConfig(
        channels=[35, 36, 38],
        base_frequency=20.0,
        base_amplitude=2.5,
        base_pulses=40,
        info_key='event_enemy_kill',
        td_sign='positive'
    )
    
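How the two thresholds gate feedback can be sketched as a dead zone around zero TD error (hypothetical helper; the real gating lives in ppo_doom.py):

```python
def feedback_sign(td_error: float,
                  feedback_positive_threshold: float = 1.0,
                  feedback_negative_threshold: float = -1.0):
    """Map a TD error to a feedback decision; None means no stimulation is sent."""
    if td_error >= feedback_positive_threshold:
        return "positive"
    if td_error <= feedback_negative_threshold:
        return "negative"
    return None  # dead zone between the thresholds: no feedback
```

Lowering the positive threshold to 0.5 and raising the negative one to -0.5, as suggested above, shrinks the dead zone so more events trigger stimulation.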
Solutions:
From README.md line 149:
simplified_reward=True disables manually shaped aim alignment and velocity. I did more tuning on False so it’s probably better kept this way.
For deadly corridor:
config = PPOConfig(
    simplified_reward=False,
    aim_alignment_gain=2.5,
    aim_alignment_max_distance=250.0,
    movement_velocity_reward_scale=0.01
)
Check UDP packet transmission:
# On training server
sudo tcpdump -i eth0 udp port 12348 -X
Inspect feedback_port logs for:
  • Packet send count
  • Amplitude/frequency values
  • Channel assignments
From training_server.py, feedback is sent via:
udp_protocol.send_feedback_command(
    sock=self.feedback_socket,
    addr=(self.config.cl1_host, self.config.cl1_feedback_port),
    channel_set=channel_set,
    stim_design=(phase1_dur, phase1_amp, phase2_dur, phase2_amp),
    burst_design=(burst_count, frequency)
)

Decoder Bias Dominating

Symptoms:
  • Decoder/forward_wx_bias_ratio < 1.0 (bias larger than weight*input)
  • Ablation modes (zero, random) show similar performance to real spikes
  • Agent behavior unchanged when neurons are silenced
Causes:
  1. decoder_zero_bias=False (default in some configs)
    config = PPOConfig(
        decoder_zero_bias=True  # Force bias=0
    )
    
  2. Decoder MLP learning a static policy
    config = PPOConfig(
        decoder_use_mlp=False  # Use linear readout only
    )
    
  3. Encoder not trainable
    config = PPOConfig(
        encoder_trainable=True  # Encoder must adapt to neurons
    )
    
Solutions:
See the Ablation Modes page. Quick test:
# Baseline
python training_server.py --mode train --decoder-ablation none --max-episodes 500

# Zero ablation (should fail to learn)
python training_server.py --mode train --decoder-ablation zero --max-episodes 500
Compare TensorBoard metrics. If both show similar reward curves, decoder bias is dominating.
From ppo_doom.py:718-744:
def compute_weight_bias_metrics(self, spike_features: torch.Tensor) -> Dict[str, float]:
    metrics: Dict[str, float] = {}
    eps = 1e-8  # guards against division by zero when bias is exactly 0
    for name, head in self.heads.items():
        if isinstance(head, LinearReadoutHead):
            weight = head.effective_weight()
            wx = torch.matmul(spike_features, weight.t()).abs().mean()
            bias_mean = head.bias.abs().mean()
            ratio = float((wx / (bias_mean + eps)).item())
            metrics[f'Decoder/{name}_wx_bias_ratio'] = ratio
    return metrics
In TensorBoard, check:
  • Decoder/forward_wx_bias_ratio (should be > 1.0)
  • Decoder/attack_wx_bias_ratio
  • Decoder/camera_wx_bias_ratio
If ratios < 1.0, bias is larger than weighted input.
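A small numeric example of the same ratio (numpy stand-in for the torch code in compute_weight_bias_metrics):

```python
import numpy as np

def wx_bias_ratio(weight, bias, inputs, eps: float = 1e-8) -> float:
    """Mean |Wx| over mean |bias|: > 1.0 means the spike input drives the output."""
    wx = np.abs(inputs @ weight.T).mean()
    bias_mean = np.abs(bias).mean()
    return float(wx / (bias_mean + eps))
```

For weight [[2, 0]], input [[3, 0]], and bias [1], the ratio is 6.0 (healthy); with bias [10] it drops to 0.6, meaning the head would output nearly the same logits even with silenced neurons.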

Network Connectivity Issues

Symptoms:
  • cl1_neural_interface.py reports “Connection timeout”
  • Training server logs “No spike data received”
  • Sporadic packet loss
Causes:
  1. IP address mismatch
    # Find CL1 IP
    hostname -I
    
    # Find training server IP
    ip addr show
    
  2. Port already in use
    # Check port availability
    sudo netstat -tulpn | grep 12345
    
    # Kill existing process
    sudo kill <PID>
    
  3. Network latency
    # Measure round-trip time
    ping -c 100 192.168.1.50 | tail -1
    
Solutions:
From cl1_neural_interface.py:145-171:
def setup_sockets(self):
    # Socket for receiving stimulation commands
    self.stim_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    self.stim_socket.bind(("0.0.0.0", self.stim_port))
    self.stim_socket.setblocking(False)  # Non-blocking
    
    print(f"Listening for stimulation commands on port {self.stim_port}")
    
    # Socket for sending spike data
    self.spike_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    
    print(f"Will send spike data to {self.training_host}:{self.spike_port}")
Verify output:
Listening for stimulation commands on port 12345
Will send spike data to 192.168.1.100:12346
Listening for event metadata on port 12347
Listening for feedback commands on port 12348
Test packet delivery manually with netcat:
# On training server (192.168.1.100)
echo "test" | nc -u 192.168.1.50 12345

# On CL1 device (192.168.1.50)
echo "test" | nc -u 192.168.1.100 12346
If packets don’t arrive, check:
  • Firewall rules (sudo ufw status)
  • Routing tables (ip route)
  • Network interface config (ifconfig)

TensorBoard Monitoring

Essential Metrics

tensorboard --logdir checkpoints/l5_2048_rand/logs --port 6006
From README.md lines 60-67:
# Monitor specific run
tensorboard --logdir checkpoints/l5_2048_rand/logs

# Compare multiple runs
tensorboard --logdir_spec \
    baseline:checkpoints/baseline/logs,\
    ablation:checkpoints/ablation_zero/logs

Key Plots

Episode Reward (Training/Episode_Reward)
  • Should increase over time
  • High variance early (exploration)
  • Plateaus indicate convergence or need for curriculum change
Kill Count (Training/Kill_Count)
  • Tracks combat effectiveness
  • Should correlate with reward
  • Flat = agent not engaging enemies
Survival Time (Training/Survival_Time)
  • Longer = better policy
  • Sudden drops = environment difficulty increase or policy collapse
Total Spike Count (Spikes/total_count)
  • Should be > 0 every episode
  • If zero, check CL1 connection and stimulation
Spikes per Channel Set (Spikes/encoding, Spikes/move_forward, etc.)
  • Shows which channels are active
  • Uneven distribution may indicate channel imbalance
Stimulation Parameters (Encoder/freq_mean, Encoder/amp_mean)
  • Frequencies should be in [4.0, 40.0] Hz
  • Amplitudes in [1.0, 2.5] μA
  • Stuck values = encoder not learning
Policy Gradient Norm (Training/Policy_Grad_Norm)
  • Should stay < max_grad_norm (default 3.0)
  • Consistently hitting limit = reduce learning rate
Entropy (Training/Entropy)
  • Measures policy randomness
  • High early (exploration), decreases over time
  • Too low too fast = premature convergence
KL Divergence (Training/KL_Divergence)
  • Measures policy change between updates
  • Should be small (< 0.1)
  • Large spikes = policy instability
wx/bias Ratio (Decoder/forward_wx_bias_ratio)
  • Should be > 1.0 (weights dominate bias)
  • < 1.0 = decoder bias is compensating for spikes
  • If decoder_zero_bias=True, bias metrics will be zero
Weight L2 Norm (Decoder/weight_l2)
  • Tracks decoder weight magnitude
  • Explosion (>1000) = add L2 regularization
Bias Absolute Mean (Decoder/forward_bias_abs_mean)
  • Should be near zero if decoder_zero_bias=True
  • Growing bias = decoder learning static policy
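Two of the metrics above (entropy and KL divergence) can be recomputed by hand when sanity-checking logs. A sketch for categorical action distributions (hypothetical helpers):

```python
import numpy as np

def entropy(p, eps: float = 1e-12) -> float:
    """Shannon entropy of a categorical distribution, in nats."""
    p = np.asarray(p, dtype=np.float64)
    return float(-np.sum(p * np.log(p + eps)))

def categorical_kl(p_old, p_new, eps: float = 1e-12) -> float:
    """KL(old || new) between action distributions before and after an update."""
    p_old = np.asarray(p_old, dtype=np.float64)
    p_new = np.asarray(p_new, dtype=np.float64)
    return float(np.sum(p_old * (np.log(p_old + eps) - np.log(p_new + eps))))
```

A uniform policy over N actions has entropy ln(N), the maximum; entropy collapsing well below that early in training is the premature-convergence signature described above.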

Custom Logging

Add debugging metrics to ppo_doom.py:
# In training loop
writer.add_scalar('Debug/spike_mean', spike_features.mean(), episode)
writer.add_scalar('Debug/spike_std', spike_features.std(), episode)
writer.add_scalar('Debug/reward_positive_count', (rewards > 0).sum(), episode)
writer.add_histogram('Debug/spike_distribution', spike_features, episode)

Debugging Commands

Check Process Status

# CL1 device
ps aux | grep cl1_neural_interface

# Training server
ps aux | grep training_server

# GPU usage
nvidia-smi

Monitor Resource Usage

# CPU/Memory
htop

# Disk I/O (checkpoints)
iotop

# Network traffic
iftop

Inspect Checkpoints

import torch

# Load checkpoint
checkpoint = torch.load('checkpoints/l5_2048_rand/episode_1000.pt', map_location='cpu')

# Inspect keys
print(checkpoint.keys())
# dict_keys(['episode', 'policy_state_dict', 'optimizer_state_dict', 'config'])

# Check episode number
print(f"Episode: {checkpoint['episode']}")

# Inspect policy weights
policy_state = checkpoint['policy_state_dict']
print(f"Encoder keys: {[k for k in policy_state.keys() if 'encoder' in k]}")
print(f"Decoder keys: {[k for k in policy_state.keys() if 'decoder' in k]}")

Test Neural Interface

# CL1 device - run with verbose logging
python cl1_neural_interface.py \
    --training-host 192.168.1.100 \
    --tick-frequency 10 \
    --recording-path ./test_recordings

# Should output:
# Listening for stimulation commands on port 12345
# Will send spike data to 192.168.1.100:12346
# Listening for event metadata on port 12347
# Listening for feedback commands on port 12348
# Neural loop started at 10 Hz

Getting Help

Collecting Diagnostic Info

# System info
uname -a
python --version
pip list | grep torch

# Network config
ifconfig
route -n

# Logs
tail -n 100 checkpoints/l5_2048_rand/logs/training.log

Common Error Messages

Solution: Reduce batch size or steps per update:
config = PPOConfig(
    batch_size=128,  # Down from 256
    steps_per_update=1024  # Down from 2048
)
Or switch to CPU:
python training_server.py --mode train --device cpu
Solution: CL1 device not running or wrong IP:
# Verify CL1 is running
ssh user@192.168.1.50 "ps aux | grep cl1"

# Check IP matches
ping 192.168.1.50
Solution: From ppo_doom.py:285-350, a set of forbidden channels is defined. Edit channel assignments to avoid them:
config = PPOConfig(
    encoding_channels=[8, 9, 10, 17, 18, 25, 27, 28],  # No forbidden channels
)
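A defensive check before launching a run (hypothetical helper; the actual forbidden set is defined in ppo_doom.py:285-350):

```python
def check_channels(channels, forbidden):
    """Raise if any assigned channel collides with the forbidden set."""
    bad = sorted(set(channels) & set(forbidden))
    if bad:
        raise ValueError(f"Forbidden channels in assignment: {bad}")
    return list(channels)
```

Running this against encoding_channels (and the feedback channel lists) at config-construction time fails fast instead of at the first stimulation call.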

Reporting Issues

Include:
  1. Full command used to start training/CL1
  2. TensorBoard screenshots of key metrics
  3. Last 50 lines of logs
  4. Configuration used (PPOConfig values)
  5. Network topology (IP addresses, ports)
  6. System specs (GPU, RAM, OS)