Welcome to DOOM Neuron

DOOM Neuron is a groundbreaking AI/ML project that trains biological neurons (CL1 hardware) to play DOOM using Proximal Policy Optimization (PPO) reinforcement learning.
This is real biological neural tissue learning to play a video game through electrical stimulation and spike-based feedback.

What Makes This Special?

Unlike traditional deep learning where silicon computes everything, DOOM Neuron:
  • Uses real biological neurons (CL1 hardware) for decision-making
  • Learns through electrical stimulation converted from game state
  • Adapts based on spike patterns from living neural tissue
  • Wraps the CL1 neurons in an encoder-decoder architecture, with the neurons as the core policy

Quick Start

Get a training session running in under 10 minutes with step-by-step setup

Installation

Install dependencies, CL SDK, VizDoom, and configure your environment

Architecture

Understand the encoder-decoder pipeline and how biological neurons learn

Configuration

Tune PPO hyperparameters, feedback settings, and scenario configs

How It Works

1. Game State to Electrical Signals

The encoder network converts DOOM observations (position, health, enemies, wall distances) into electrical stimulation parameters (frequency, amplitude, pulses).
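This mapping can be illustrated with a minimal sketch. The real encoder is a trained network; the layer here, the parameter ranges, and the function name are illustrative assumptions, not the project's actual API:

```python
import numpy as np

def encode_observation(obs, w, b, freq_range=(4.0, 40.0),
                       amp_range=(0.1, 1.0), max_pulses=10):
    """Map a DOOM observation vector to stimulation parameters.

    obs: game-state features (position, health, enemies, wall distances).
    w, b: weights/bias of a single linear layer (stand-in for the encoder net).
    Returns (frequency_hz, amplitude, n_pulses), squashed into valid ranges.
    """
    h = np.tanh(w @ obs + b)        # bounded activations in (-1, 1)
    u = (h + 1.0) / 2.0             # rescale to (0, 1)
    freq = freq_range[0] + u[0] * (freq_range[1] - freq_range[0])
    amp = amp_range[0] + u[1] * (amp_range[1] - amp_range[0])
    pulses = int(round(u[2] * max_pulses))
    return freq, amp, pulses

rng = np.random.default_rng(0)
obs = rng.normal(size=8)                          # toy 8-dim game state
w, b = rng.normal(size=(3, 8)) * 0.1, np.zeros(3)
freq, amp, pulses = encode_observation(obs, w, b)
```

Squashing through `tanh` keeps every stimulation parameter inside a hardware-safe range regardless of the raw network output.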

2. CL1 Neural Response

Biological neurons on the CL1 hardware receive the electrical stimulation and produce spike patterns. The neurons have internal state (membrane potential, synaptic weights, adaptation currents) that evolves during training.
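The CL1's biology is far richer than any point model, but a toy leaky integrate-and-fire neuron with spike-frequency adaptation illustrates how internal state makes the response history-dependent. All constants below are illustrative, not CL1 parameters:

```python
def simulate_lif(input_current, v_rest=-65.0, v_thresh=-50.0, v_reset=-70.0,
                 tau=20.0, dt=1.0, adapt_step=2.0, adapt_decay=0.95):
    """Toy leaky integrate-and-fire neuron with spike-frequency adaptation."""
    v, adapt, spikes = v_rest, 0.0, []
    for i_t in input_current:
        dv = (-(v - v_rest) + i_t - adapt) / tau
        v += dv * dt
        adapt *= adapt_decay
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset
            adapt += adapt_step   # adaptation current builds with each spike
        else:
            spikes.append(0)
    return spikes

# Identical drive at every step, yet the response depends on prior activity
# through the membrane potential and the accumulated adaptation current.
spikes = simulate_lif([30.0] * 200)
```

The same input produces different output depending on what the neuron has recently done, which is the point the architecture relies on.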

3. Spikes to Actions

The decoder network reads the spike counts from CL1 and outputs game actions (move forward/backward, strafe left/right, turn left/right, attack).
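A minimal sketch of such a readout, assuming a flat discrete action list and a plain linear map from spike counts to action logits (channel count and weights are illustrative):

```python
import numpy as np

ACTIONS = ["move_forward", "move_backward", "strafe_left", "strafe_right",
           "turn_left", "turn_right", "attack"]

def decode_spikes(spike_counts, w):
    """Linear readout: action logits are a linear function of CL1 spike counts."""
    logits = w @ spike_counts
    return ACTIONS[int(np.argmax(logits))]

rng = np.random.default_rng(1)
spike_counts = rng.poisson(lam=3.0, size=32)   # toy counts from 32 channels
w = rng.normal(size=(len(ACTIONS), 32))
action = decode_spikes(spike_counts, w)
```

With a zero-bias linear readout like this, any structure in the chosen actions has to come from structure in the spikes themselves.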

4. Reinforcement Learning

PPO optimizes the encoder and decoder using rewards (kills, armor pickups, survival). The biological neurons receive direct feedback through additional electrical pulses when positive/negative events occur.
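The optimization step uses the standard PPO clipped surrogate objective; a minimal sketch (the actual training loop, advantage estimation, and hyperparameters live in the project configs):

```python
import numpy as np

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimized)."""
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# With identical old/new policies the ratio is 1 and clipping is inactive.
logp = np.log(np.array([0.5, 0.5]))
loss = ppo_clip_loss(logp, logp, advantages=np.array([1.0, -1.0]))
```

The clip keeps each update close to the data-collecting policy, which matters here because the neurons' internal state drifts between rollouts.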

Key Features

Biological Learning

The CL1 neurons are not static. They are dynamical systems with internal state that changes based on prior stimulation and feedback. During testing, the encoder weights were frozen and reward still improved, evidence that the neurons themselves are learning.

Hybrid Action Spaces

Supports both hybrid action spaces (separate categorical distributions for movement, camera, and attack) and flat discrete action spaces for maximum flexibility.
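A hybrid action can be sketched as one sample per head from independent categorical distributions (the head sizes below are illustrative assumptions):

```python
import numpy as np

def sample_hybrid_action(movement_logits, camera_logits, attack_logits, rng):
    """Sample one sub-action per head from independent categorical distributions."""
    def categorical(logits):
        p = np.exp(logits - logits.max())   # stable softmax
        p /= p.sum()
        return int(rng.choice(len(p), p=p))
    return {
        "movement": categorical(movement_logits),  # e.g. none/forward/back/strafe-l/strafe-r
        "camera": categorical(camera_logits),      # e.g. none/turn-left/turn-right
        "attack": categorical(attack_logits),      # e.g. no-op/fire
    }

rng = np.random.default_rng(2)
a = sample_hybrid_action(np.zeros(5), np.zeros(3), np.zeros(2), rng)
```

Factoring the space this way keeps the joint action count additive across heads rather than multiplicative, which a single flat categorical cannot do.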

Multiple Scenarios

  • Progressive Deathmatch: Survival mode with no ammo reset on kills (encourages resource management)
  • Survival: Classic survival gameplay
  • Deadly Corridor 1-5: Curriculum learning from easy to benchmark difficulty
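As an illustration only (the keys, file names, and structure below are hypothetical, not the project's actual config schema), a curriculum over these scenarios might be registered as:

```python
# Hypothetical scenario registry illustrating the curriculum ordering.
SCENARIOS = {
    "progressive_deathmatch": {"wad": "deathmatch.wad", "ammo_reset_on_kill": False},
    "survival": {"wad": "survival.wad"},
    **{f"deadly_corridor_{i}": {"wad": "deadly_corridor.wad", "difficulty": i}
       for i in range(1, 6)},
}

def curriculum(prefix="deadly_corridor"):
    """Return curriculum scenarios ordered from easy to hard."""
    return sorted(k for k in SCENARIOS if k.startswith(prefix))
```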

Advanced Feedback System

Event-specific feedback with surprise scaling:
  • Positive events: enemy kills, armor pickups, approaching targets
  • Negative events: taking damage, wasting ammo, retreating from targets
  • TD error-based surprise modulation increases frequency/amplitude for unexpected events
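A minimal sketch of how a TD-error "surprise" term could scale the feedback pulse (the base values and gain are illustrative assumptions, not the project's settings):

```python
def feedback_pulse(event_reward, td_error, base_freq=10.0, base_amp=0.5,
                   surprise_gain=0.5):
    """Scale feedback stimulation by TD-error 'surprise' for a game event.

    event_reward > 0 => positive feedback pulse; < 0 => negative.
    Larger |td_error| (more unexpected outcome) => stronger pulse.
    """
    surprise = 1.0 + surprise_gain * abs(td_error)
    return {
        "polarity": "positive" if event_reward > 0 else "negative",
        "frequency_hz": base_freq * surprise,
        "amplitude": base_amp * surprise,
    }

# A kill the critic did not predict (large TD error) gets a stronger pulse.
pulse = feedback_pulse(event_reward=+1.0, td_error=2.0)
```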

FAQ

Isn't the decoder network doing the learning rather than the neurons?

No; this is precisely why there are ablations. The footage was taken using a 0-bias full linear readout decoder, meaning the selected action is a linear function of the output spikes from the CL1; the CL1 is doing the learning. There is a noticeable difference when using the ablations (both random spikes and 0 spikes result in zero learning) versus actual CL1 spikes.

Won't the same stimulation always produce the same spikes?

This assumes the cells are static, which is incorrect. Both the policy and the cells are dynamical systems. Biological neurons have internal state (membrane potential, synaptic weights, adaptation currents), so the same stimulation delivered at different points in training produces different spike patterns: the neurons have been conditioned by prior feedback. During testing, the encoder weights were frozen and improvements in reward were still observed.

How is the encoder trained when spikes are non-differentiable?

An encoder in the PPO policy dictates the stimulation pattern (frequency, amplitude, pulses, and which channels to stimulate). Because CL1 spikes are non-differentiable, the encoder is trained through PPO policy gradients using the log-likelihood trick (REINFORCE-style), i.e., by including the encoder's sampled stimulation log-probs in the PPO objective rather than backpropagating through the spikes.
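For a single Gaussian-distributed stimulation parameter, the score-function estimator behind this trick can be sketched as follows (shapes and values are illustrative):

```python
import numpy as np

def encoder_logprob_gradient(advantage, stim_sample, mean, std):
    """Score-function (REINFORCE) gradient w.r.t. a Gaussian stimulation mean.

    Because spikes are non-differentiable, the encoder's gradient comes from
    advantage * d/d_mean log N(stim_sample; mean, std), not from
    backpropagating through the CL1.
    """
    dlogp_dmean = (stim_sample - mean) / std**2
    return advantage * dlogp_dmean

g = encoder_logprob_gradient(advantage=1.5, stim_sample=np.array([12.0]),
                             mean=np.array([10.0]), std=np.array([2.0]))
```

A positive advantage pushes the stimulation mean toward the sampled value; a negative one pushes it away, with no gradient ever flowing through the tissue.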

Next Steps

Get Started Now

Jump into training with the quickstart guide

Deep Dive

Learn the technical details of the neural architecture