Ablation modes allow you to test whether the CL1 biological neurons are genuinely learning, or if the decoder network is doing all the work. By replacing real neural spikes with controlled alternatives, you can isolate and validate the neurons’ contribution to gameplay.
Ablations are critical for scientific validation. Without them, you cannot prove that the biological neurons (rather than the decoder/PPO policy) are responsible for learned behavior.
Use actual spike data from the CL1 hardware. This is the standard training/evaluation mode.
```python
config = PPOConfig(
    decoder_ablation_mode='none'
)
```
In this mode, spike features flow directly from collect_spikes() to the decoder:
```python
# ppo_doom.py:1066-1082
def collect_spikes(self, tick: 'cl.LoopTick') -> np.ndarray:
    """
    Collect and count spikes from CL SDK tick.

    Returns:
        spike_counts: (num_channel_sets,) array of spike counts per channel set
    """
    spike_counts = np.zeros(self.num_channel_sets, dtype=np.float32)
    for spike in tick.analysis.spikes:
        idx = self.channel_lookup.get(spike.channel)
        if idx is not None:
            spike_counts[idx] += 1
    return spike_counts
```
In zero ablation mode, all spike counts are replaced with zeros, so the decoder receives no neural signal. With zero ablation, you should observe no learning. If the agent still improves, the decoder bias or encoder is compensating, which invalidates the claim that the neurons are learning.
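Selecting zero ablation follows the same config pattern, using the `decoder_ablation_mode` field shown in the full configuration example in this section:

```python
config = PPOConfig(
    decoder_ablation_mode='zero'
)
```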
Replace spike counts with random values sampled uniformly from [0, 1]. This tests whether structured neural responses are necessary, or if any input works.
```bash
# training_server.py --decoder-ablation random
python training_server.py \
    --mode train \
    --device cuda \
    --cl1-host 192.168.1.50 \
    --decoder-ablation random
```
With random ablation, learning should be severely impaired or absent. If performance matches real spikes, the decoder is learning a static policy independent of neural input.
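A minimal sketch of how the three ablation modes could substitute spike features before decoding. The function name, signature, and structure here are illustrative, not the repo's actual implementation; only the behavior (pass-through, zeros, or uniform noise) is taken from this guide:

```python
import numpy as np

def apply_ablation(spike_counts: np.ndarray, mode: str,
                   rng: np.random.Generator) -> np.ndarray:
    """Replace real spike counts according to the ablation mode.

    'none'   -> pass real CL1 spikes through unchanged
    'zero'   -> all-zero features (no neural signal)
    'random' -> uniform noise in [0, 1), destroying any spike structure
    """
    if mode == 'zero':
        return np.zeros_like(spike_counts)
    if mode == 'random':
        return rng.uniform(0.0, 1.0,
                           size=spike_counts.shape).astype(spike_counts.dtype)
    return spike_counts  # 'none': real spikes

rng = np.random.default_rng(42)
real = np.array([3.0, 0.0, 1.0], dtype=np.float32)
print(apply_ablation(real, 'zero', rng))  # [0. 0. 0.]
print(apply_ablation(real, 'none', rng))  # [3. 0. 1.]
```

Because the substitution happens on the feature vector itself, the decoder and PPO machinery are identical across all three modes, which is what makes the comparison a clean test of the neurons' contribution.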
No, and this is precisely why the ablations exist. The footage you see in the video was taken using a zero-bias, full linear readout decoder, meaning the selected action is a linear function of the output spikes from the CL1; the CL1 is doing the learning. There is a noticeable difference between the ablations (both random and zero spikes result in zero learning) and actual CL1 spikes.

Source: README.md FAQ section
Isn't the encoder/PPO doing all the learning?
This question largely assumes that the cells are static, which is incorrect; the culture is not a memory-less "feed X in, get Y" machine. Both the policy and the cells are dynamical systems; biological neurons have internal state (membrane potential, synaptic weights, adaptation currents). The same stimulation delivered at different points in training will produce different spike patterns, because the neurons have been conditioned by prior feedback. During testing, we froze the encoder weights and still observed improvements in the reward.

Source: README.md FAQ section
When running ablations, ensure your decoder configuration isolates neural contributions:
```python
config = PPOConfig(
    # Ablation mode
    decoder_ablation_mode='zero',  # or 'random'

    # Decoder settings to prevent decoder-side learning
    decoder_zero_bias=True,    # Force bias=0 so actions depend on spikes
    decoder_use_mlp=False,     # Use linear readout only
    decoder_enforce_nonnegative=False,
    decoder_freeze_weights=False,

    # Regularization (optional)
    decoder_weight_l2_coef=0.0,
    decoder_bias_l2_coef=0.0,
)
```
**`decoder_zero_bias=True`**: Keeps the bias at zero so decoded actions depend solely on encoder output. This helped prevent decoder-side learning in testing, but may behave differently on actual hardware, since the SDK spikes were random. This should definitely be tested with ablations!

**`decoder_use_mlp=False`**: The default linear decoder keeps the hardware mapping transparent. Enable the MLP when you require richer non-linear policies (expect higher sample complexity; the decoder also tends to start becoming a policy head, though this might be due to random spike noise from the SDK).

Source: README.md lines 30-32
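To see why the zero-bias linear readout isolates the neural contribution, consider this minimal sketch (sizes and names are hypothetical, not the repo's actual decoder): with the bias fixed at zero, zero spike input yields all-zero action logits, so any preference between actions must come from the spikes themselves.

```python
import numpy as np

num_channel_sets = 8   # hypothetical spike-feature size
num_actions = 3        # hypothetical action count

rng = np.random.default_rng(0)
W = rng.normal(size=(num_actions, num_channel_sets)).astype(np.float32)

def decode(spike_counts: np.ndarray) -> np.ndarray:
    # Bias is fixed at zero (decoder_zero_bias=True), so the action
    # logits are a pure linear function of the spike features.
    return W @ spike_counts

zero_spikes = np.zeros(num_channel_sets, dtype=np.float32)
print(decode(zero_spikes))  # [0. 0. 0.]
```

Under zero ablation, every action logit is identically zero, so a zero-bias linear decoder cannot express any learned behavior without real spike input.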