Overview
The encoder network converts game observations into stimulation parameters (frequency and amplitude) for biological neurons. The decoder network reads spike features from the neurons and outputs action logits. Together they form the biological neural interface.

Encoder Configuration
Trainability
Whether the encoder weights are trainable via backpropagation. When True, the encoder learns to generate optimal stimulation parameters using Beta distributions. When False, a fixed sigmoid-based mapping is used.

Code comment: “Can try turning it False but I would say True is needed for reasonable PPO policy gradients especially if decoder_use_mlp: False”
Entropy Coefficient
Entropy penalty coefficient for the encoder's Beta distributions. A negative value acts as an entropy penalty (encouraging more deterministic stimulation); positive values would encourage exploration in stimulation space.
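To make the sign convention concrete, here is a sketch of how such a coefficient typically enters a PPO-style loss; the exact loss wiring and the coefficient value are assumptions, not the project's implementation:

```python
import torch
from torch.distributions import Beta

# Sketch: sign convention for the encoder entropy coefficient.
# In the usual PPO form, loss = policy_loss - coef * entropy, so a
# negative coef turns the entropy bonus into an entropy penalty,
# pushing the Beta stimulation distributions toward determinism.
entropy_coef = -0.01  # assumed value; negative -> penalty

dist = Beta(torch.tensor([2.0, 5.0]), torch.tensor([2.0, 3.0]))
policy_loss = torch.tensor(1.0)  # stand-in for the clipped surrogate loss
entropy = dist.entropy().mean()

loss = policy_loss - entropy_coef * entropy
```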
CNN Visual Processing
Enable CNN processing of the visual screen buffer. When enabled, a convolutional neural network processes the downsampled game screen before the encoder MLP.
Code comment: “With my testing it seems like the CNN does not overfit/learn on its own, seems useful to keep True”
Base number of CNN channels in the first convolutional layer. The CNN architecture uses progressive channel expansion:
- Layer 1: encoder_cnn_channels (default 16)
- Layer 2: encoder_cnn_channels * 2 (default 32)
- Layer 3: encoder_cnn_channels * 4 (default 64)
In training_server.py, this is increased to 64 channels (per the DOOM Initial Report) for better visual feature extraction.

Downsampling factor applied to the screen buffer before CNN processing. The original resolution is divided by this factor; for example, with 320×240 resolution and downsample=4, the CNN processes 80×60 images.
CNN Architecture Details
The encoder CNN uses a three-layer convolutional stack with progressive channel expansion, as described above.

Decoder Configuration
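The documented details (three conv layers with a C → 2C → 4C channel progression over the downsampled screen) can be sketched as follows; the kernel sizes, strides, padding, and grayscale input are assumptions, not the project's exact implementation:

```python
import torch
import torch.nn as nn

class EncoderCNN(nn.Module):
    """Sketch of a 3-layer CNN with progressive channel expansion.

    Only the C -> 2C -> 4C channel progression comes from the docs;
    kernel size 3, stride 2, and padding 1 are assumed.
    """

    def __init__(self, cnn_channels: int = 16):
        super().__init__()
        c = cnn_channels
        self.net = nn.Sequential(
            nn.Conv2d(1, c, kernel_size=3, stride=2, padding=1),          # layer 1: C channels
            nn.ReLU(),
            nn.Conv2d(c, c * 2, kernel_size=3, stride=2, padding=1),      # layer 2: 2C channels
            nn.ReLU(),
            nn.Conv2d(c * 2, c * 4, kernel_size=3, stride=2, padding=1),  # layer 3: 4C channels
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# With a 320x240 screen downsampled by 4, the CNN sees 80x60 frames.
cnn = EncoderCNN(cnn_channels=16)
features = cnn(torch.zeros(1, 1, 60, 80))  # (batch, channels, H, W)
```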
Architecture Type
Use MLP decoder instead of linear readout heads.
- False: direct linear readout from spike features (recommended)
- True: a 2-layer MLP processes spike features before the action heads
Hidden layer size when decoder_use_mlp=True.
- In ppo_doom.py: default is 32
- In training_server.py: increased to 256

Weight Constraints
Enforce non-negative weights in the decoder's linear readout heads. When True, a softplus activation is applied to the weights: weight = softplus(raw_weight). This ensures all spike contributions are positive, which can be biologically interpretable.

Freeze all decoder parameters (no gradient updates). Useful for testing whether the encoder alone can learn, or for transfer-learning scenarios.

Force decoder bias terms to zero and disable bias gradients. Setting the bias to zero ensures actions are driven entirely by spike activity, not by learned biases.
Code comment: “Prefer to be true, needs testing, bias tends to cause the decoder to generate its own predictions for movement”
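A minimal sketch of how these two constraints could be realized together; the class name and mechanism details are assumptions, while the softplus weight transform and the zeroed, gradient-frozen bias come from the descriptions above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonNegativeReadout(nn.Module):
    """Linear readout head whose effective weights are forced non-negative
    via softplus, with the bias pinned to zero and excluded from gradients."""

    def __init__(self, n_spike_features: int, n_actions: int):
        super().__init__()
        self.raw_weight = nn.Parameter(torch.randn(n_actions, n_spike_features))
        # Zero bias, frozen: actions are driven entirely by spike activity.
        self.bias = nn.Parameter(torch.zeros(n_actions), requires_grad=False)

    def forward(self, spikes: torch.Tensor) -> torch.Tensor:
        weight = F.softplus(self.raw_weight)  # softplus(raw) > 0 for any raw value
        return spikes @ weight.t() + self.bias

head = NonNegativeReadout(n_spike_features=8, n_actions=4)
logits = head(torch.rand(1, 8))
```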
L2 Regularization
L2 regularization coefficient for decoder weights. Penalizes large weights to encourage simpler linear readouts. Currently untuned (set to 0.0).

L2 regularization coefficient for decoder biases. Currently untuned (set to 0.0).
Ablation Testing
Ablation mode for testing decoder learning.
- 'none': normal operation, using real spike features
- 'zero': replace spike features with zeros
- 'random': replace spike features with random values
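The three modes above can be sketched as a small switch applied to spike features before the decoder; the function name is hypothetical:

```python
import torch

def ablate_spikes(spikes: torch.Tensor, mode: str = "none") -> torch.Tensor:
    """Replace spike features according to the ablation mode.

    'none'   -> pass real spike features through unchanged
    'zero'   -> all-zero features (decoder sees no spike signal)
    'random' -> random features (decoder sees noise instead of spikes)
    """
    if mode == "none":
        return spikes
    if mode == "zero":
        return torch.zeros_like(spikes)
    if mode == "random":
        return torch.randn_like(spikes)
    raise ValueError(f"unknown ablation mode: {mode!r}")

spikes = torch.rand(2, 8)
```

If the decoder performs equally well under 'zero' or 'random' ablation, it is learning from its own biases or noise rather than from the neurons.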
Network Architecture
Hidden Layer Size
Hidden layer size for the encoder, decoder MLP, and value network. Used across all network components for consistency.
Example Configurations
Minimal Linear Decoder
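A plausible configuration fragment for this setup; parameter names other than decoder_use_mlp are assumptions based on the descriptions above:

```python
# Hypothetical config fragment: linear readout with documented constraints.
minimal_linear_decoder = {
    "decoder_use_mlp": False,          # direct linear readout (recommended)
    "decoder_positive_weights": True,  # assumed name: softplus weight constraint
    "decoder_zero_bias": True,         # assumed name: bias pinned to zero
}
```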
CNN-Based Encoder
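A plausible fragment for the CNN encoder path; only encoder_cnn_channels is a documented name, the others are assumptions:

```python
# Hypothetical config fragment: CNN visual processing as in training_server.py.
cnn_encoder = {
    "encoder_trainable": True,     # assumed name: encoder learns via PPO
    "encoder_use_cnn": True,       # assumed name: enable CNN visual processing
    "encoder_cnn_channels": 64,    # 64 channels, as in training_server.py
    "encoder_cnn_downsample": 4,   # assumed name: 320x240 -> 80x60
}
```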
MLP Decoder (Experimental)
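A plausible fragment for the experimental MLP decoder; the hidden-size parameter name is an assumption:

```python
# Hypothetical config fragment: experimental 2-layer MLP decoder.
mlp_decoder = {
    "decoder_use_mlp": True,
    "decoder_mlp_hidden": 256,  # assumed name; 32 in ppo_doom.py, 256 in training_server.py
}
```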
Frozen Decoder Testing
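A plausible fragment for the frozen-decoder test described under Weight Constraints; both parameter names are assumptions:

```python
# Hypothetical config fragment: freeze the decoder to test encoder-only learning.
frozen_decoder = {
    "decoder_frozen": True,     # assumed name: no gradient updates to decoder
    "encoder_trainable": True,  # assumed name: can the encoder alone learn?
}
```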
Related Configuration
- PPO Hyperparameters: learning rate, gamma, GAE settings
- Feedback Tuning: stimulation feedback parameters