# NanoGrad: Automatic Differentiation Framework for CHICKEN Scheme

A lightweight, YASOS-based automatic differentiation and neural network framework for CHICKEN Scheme, featuring BLAS-accelerated operations and a clean functional API.

## Features

- **Automatic Differentiation**: Reverse-mode autodiff with topological sorting for correct gradient computation
- **BLAS Integration**: High-performance linear algebra operations using CBLAS
- **YASOS Object System**: Clean, polymorphic object-oriented abstractions
- **Mixed Precision**: Support for both 32-bit (f32) and 64-bit (f64) floating-point
- **Neural Network Layers**: Dense layers, convolutional layers, batch normalization, and sequential containers
- **Activation Functions**: ReLU, Tanh, Sigmoid, Softmax, LeakyReLU, Softplus, SiLU, GeLU
- **Optimizers**: SGD (with momentum), Adam, RMSprop
- **Loss Functions**: MSE, Cross-Entropy
- **Advanced Operations**: Convolution, RMSNorm, Layer Normalization, Batch Normalization, Global Pooling
- **Tensor Operations**: Reduction operations, slicing, reshaping with full gradient support

## Installation

```bash
# Install dependencies
chicken-install yasos blas mathh srfi-1 srfi-4 srfi-42 srfi-69

# Clone the repository
git clone https://github.com/iraikov/nanograd.git
cd nanograd
chicken-install
```

## Quick Start

### Basic Tensor Operations

```scheme
(import nanograd-autograd)

;; Create tensors with automatic differentiation
(define x (make-tensor32 (f32vector 1.0 2.0 3.0) '(3) requires-grad?: #t))
(define y (make-tensor32 (f32vector 4.0 5.0 6.0) '(3) requires-grad?: #t))

;; Element-wise operations
(define z (add x y))   ; z = x + y
(define w (mul x y))   ; w = x * y

;; Matrix operations
(define A (make-tensor32 (f32vector 1.0 2.0 3.0 4.0) '(2 2)))
(define b (make-tensor32 (f32vector 1.0 2.0) '(2)))
(define result (matmul-op A b))  ; Matrix-vector multiplication

;; Compute gradients
(backward! result)
(print-tensor (tensor-grad A))
```

### Reduction Operations

```scheme
;; Sum all elements
(define total (sum-tensor x))

;; Compute mean
(define avg (mean-tensor x))

;; Compute product
(define prod (product-tensor x))

;; Custom reduction with gradient
(define custom-result
  (reduce-tensor x max
                 compute-gradient:
                 (lambda (grad-out idx val all-values)
                   ;; Custom gradient logic
                   (if (= val (apply max all-values)) grad-out 0.0))))
```

### Tensor Slicing

```scheme
;; Extract slice along first dimension
(define batch (make-tensor32 (make-f32vector 100) '(10 10)))
(define slice (slice-tensor batch 2 5))  ; Extract elements 2-6 along first dim

;; Gradients flow back correctly
(backward! (sum-tensor slice))
(print-tensor (tensor-grad batch))  ; Only positions 2-6 have gradients
```

### Building a Neural Network

```scheme
(import nanograd-layer nanograd-optimizer)

;; Define a simple classification network
(define model
  (make-sequential
   (list
    (make-dense-layer 784 128 activation: (make-relu) name: "Hidden1")
    (make-dense-layer 128 64 activation: (make-relu) name: "Hidden2")
    (make-dense-layer 64 10 activation: (make-identity) name: "Output"))
   name: "Classifier"))

;; Create optimizer
(define optimizer (make-adam (parameters model) learning-rate: 0.001))

;; Training loop
(do ((epoch 1 (+ epoch 1)))
    ((> epoch 10))
  ;; Training mode
  (set-training-mode! model #t)
  (for-each
   (lambda (batch)
     (let* ((x (car batch))
            (target (cdr batch))
            (pred (forward model x))
            (loss (cross-entropy-loss (softmax pred) target)))
       ;; Backward pass and optimize
       (backward! loss)
       (step! optimizer)
       (zero-grad-layer! model)))
   training-data)
  ;; Evaluation mode
  (set-eval-mode! model)
  (evaluate-model model validation-data))
```
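### Mixed Precision

The tensor operations above are also available in double precision. Below is a minimal sketch, assuming the f64 path mirrors the f32 examples; only the constructor (`make-tensor64`) and the underlying vector type change:

```scheme
;; Double-precision tensors are built from f64vectors
(define p (make-tensor64 (f64vector 1.0 2.0 3.0) '(3) requires-grad?: #t))
(define q (make-tensor64 (f64vector 0.5 0.5 0.5) '(3) requires-grad?: #t))

(define r (mul p q))        ; element-wise product, stays in f64
(backward! (sum-tensor r))  ; reverse-mode gradients work the same way

(tensor-dtype r)            ; => 'f64
(print-tensor (tensor-grad p))
```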
### Convolutional Neural Network with Batch Normalization

```scheme
(define cnn
  (make-sequential
   (list
    (make-conv2d-layer 3 32 3 stride: 1 padding: 1
                       activation: (make-relu) name: "Conv1")
    (make-batch-norm-2d 32 name: "BN1")
    (make-conv2d-layer 32 64 3 stride: 1 padding: 1
                       activation: (make-relu) name: "Conv2")
    (make-batch-norm-2d 64 name: "BN2")
    ;; Global average pooling reduces spatial dimensions
    (make-dense-layer 64 128 activation: (make-relu) name: "FC1")
    (make-dense-layer 128 10 activation: (make-identity) name: "Output"))
   name: "CNN"))

;; Forward pass with global average pooling
(define (forward-with-pooling model input)
  (let* ((conv-output (forward (list-ref (get-layers model) 0) input))
         (bn-output (forward (list-ref (get-layers model) 1) conv-output))
         (pooled (global-avg-pool2d bn-output)))
    (forward (list-ref (get-layers model) 2) pooled)))
```

## Architecture

### Module Structure

- **`nanograd-autograd`**: Core automatic differentiation engine
  - Tensor abstraction with YASOS
  - Arithmetic operations (add, sub, mul, div)
  - BLAS operations (matmul, dot, scale)
  - Activation functions
  - Loss functions
  - Reduction operations (sum, mean, product, custom reductions)
  - Tensor manipulation (slice, reshape, flatten)
  - Gradient computation with cycle detection

- **`nanograd-layer`**: Neural network layer abstractions
  - Dense (fully connected) layers
  - Convolutional layers (2D)
  - Batch normalization (2D)
  - Global average pooling
  - Sequential containers
  - Activation function objects
  - Training/evaluation mode control

- **`nanograd-optimizer`**: Optimization algorithms
  - SGD with momentum and Nesterov
  - Adam with bias correction
  - RMSprop with momentum

### Design Principles

1. **Functional Programming**: Immutable tensors, pure operations where possible
2. **YASOS Objects**: Clean polymorphic dispatch for operations
3. **BLAS Efficiency**: Leverage optimized linear algebra for performance
4. **Explicit Gradient Management**: Manual control over backward passes
5. **Mixed Precision**: First-class support for both f32 and f64

## API Reference

### Tensor Operations

#### Constructors

```scheme
(make-tensor32 data shape #:key (requires-grad? #t))
(make-tensor64 data shape #:key (requires-grad? #t))
```

#### Accessors

```scheme
(tensor-data tensor)       ; Get underlying data vector
(tensor-grad tensor)       ; Get gradient vector
(tensor-shape tensor)      ; Get shape list
(tensor-dtype tensor)      ; Get dtype ('f32 or 'f64)
(tensor-requires-grad? t)  ; Check if gradients enabled
```
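**Example: Inspecting a Tensor**

A quick sketch using the accessors above to sanity-check a tensor's metadata while debugging:

```scheme
(define t (make-tensor32 (f32vector 1.0 2.0 3.0 4.0) '(2 2)))

(tensor-shape t)           ; => (2 2)
(tensor-dtype t)           ; => 'f32
(tensor-requires-grad? t)  ; => #t (the default)
(tensor-data t)            ; underlying f32vector
```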
#### Arithmetic

```scheme
(add a b)   ; Element-wise addition
(sub a b)   ; Element-wise subtraction
(mul a b)   ; Element-wise multiplication
(div a b)   ; Element-wise division
(safe-div a b #:key (epsilon 1e-8))
```

#### Linear Algebra

```scheme
(matmul-op a b)          ; Matrix multiplication
(dot-op a b)             ; Dot product
(scale-op tensor scalar) ; Scalar multiplication
```

#### Reduction Operations

```scheme
(reduce-tensor tensor reducer #:key (compute-gradient #f))
;; Generic reduction with custom gradient
;; reducer: (element accumulator) -> new-accumulator
;; compute-gradient: (grad-out index value all-values) -> grad-in

(sum-tensor tensor)      ; Sum all elements (gradient: uniform)
(mean-tensor tensor)     ; Mean of all elements
(product-tensor tensor)  ; Product of all elements (gradient: product rule)
```

**Example: Custom Maximum Reduction**

```scheme
(define (max-tensor tensor)
  (reduce-tensor tensor max
                 compute-gradient:
                 (lambda (grad-out idx val all-values)
                   ;; Gradient flows only to maximum element
                   (if (= val (apply max all-values)) grad-out 0.0))))
```

#### Tensor Manipulation

```scheme
(slice-tensor tensor start length)
;; Extract slice along first dimension
;; tensor: Input tensor with shape (n, ...)
;; start: Starting index
;; length: Number of elements to extract
;; Returns: Tensor with shape (length, ...)

(reshape tensor new-shape)  ; Reshape (must preserve total elements)
(flatten-tensor tensor)     ; Flatten to 1D
```

**Example: Batch Processing**

```scheme
;; Process mini-batches from a dataset
(define dataset (make-tensor32 (make-f32vector 1000) '(100 10)))

(do ((i 0 (+ i batch-size)))
    ((>= i 100))
  (let ((batch (slice-tensor dataset i batch-size)))
    (process-batch model batch)))
```

#### Activations

```scheme
(relu tensor)            ; ReLU activation
(tanh-op tensor)         ; Hyperbolic tangent
(sigmoid tensor)         ; Sigmoid (logistic)
(sigmoid-stable tensor)  ; Numerically stable sigmoid
(softmax tensor)         ; Softmax normalization
(log-softmax tensor)     ; Log-softmax
(silu tensor)            ; SiLU
(gelu tensor)            ; GeLU
(leaky-relu tensor #:key (alpha 0.01))
(softplus tensor #:key (beta 1.0))
```

#### Loss Functions

```scheme
(mse-loss pred target)            ; Mean squared error
(cross-entropy-loss pred target)  ; Cross-entropy loss
```

#### Gradient Operations

```scheme
(zero-grad! tensor)          ; Zero out gradients
(backward! tensor)           ; Compute gradients via backprop
(add-to-grad! tensor delta)  ; Accumulate gradients
```
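**Example: A Minimal Gradient Step**

A sketch of the typical cycle around `zero-grad!` and `backward!`, using only the operations documented above; the weight update itself would normally be done by an optimizer (see the Optimizer API below):

```scheme
(define w (make-tensor32 (f32vector 0.5 -0.3) '(2) requires-grad?: #t))
(define x (make-tensor32 (f32vector 1.0 2.0) '(2) requires-grad?: #f))
(define target (make-tensor32 (f32vector 0.0 1.0) '(2) requires-grad?: #f))

(zero-grad! w)                     ; clear any stale gradients
(let ((loss (mse-loss (mul w x) target)))
  (backward! loss)                 ; populate w's gradient
  (print-tensor (tensor-grad w)))  ; inspect dL/dw before an optimizer step
```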
### Layer API

#### Layer Construction

```scheme
(make-dense-layer input-size output-size
                  #:key (activation (make-identity)) (dtype 'f32) (name "Dense"))

(make-conv2d-layer in-channels out-channels kernel-size
                   #:key (stride 1) (padding 0)
                         (activation (make-identity)) (dtype 'f32) (name "Conv2D"))

(make-batch-norm-2d num-features
                    #:key (epsilon 1e-5) (momentum 0.1) (dtype 'f32) (name "BatchNorm2d"))

(make-sequential layers #:key (name "Sequential"))
```

#### Batch Normalization

Batch Normalization normalizes activations across the batch dimension, improving training stability and convergence:

```scheme
;; Create batch norm layer for 64 channels
(define bn (make-batch-norm-2d 64 epsilon: 1e-5 momentum: 0.1))

;; In training mode: uses batch statistics and updates running stats
(set-training-mode! bn #t)
(define normalized (forward bn input))  ; input shape: (64, H, W)

;; In eval mode: uses running statistics
(set-eval-mode! bn)
(define normalized (forward bn input))  ; Deterministic output
```

**Key Features:**

- Learnable scale (gamma) and shift (beta) parameters
- Running mean and variance for evaluation
- Training/eval mode switching
- Numerical stability with epsilon parameter

#### Global Average Pooling

```scheme
(global-avg-pool2d input)
;; Global average pooling over spatial dimensions
;; Input shape: (C, H, W)
;; Output shape: (C,)
;; Gradients distributed uniformly over spatial dimensions
```

**Example: Replace Fully Connected Layers**

```scheme
;; Traditional approach: flatten + dense
(define old-approach
  (make-sequential
   (list
    (make-conv2d-layer 64 128 3)
    ;; flatten: (128, 8, 8) -> (8192,)
    (make-dense-layer 8192 10))))

;; Modern approach: global average pooling + dense
(define new-approach
  (make-sequential
   (list
    (make-conv2d-layer 64 128 3)
    ;; global avg pool: (128, 8, 8) -> (128,)
    (make-dense-layer 128 10))))

;; Fewer parameters, better generalization!
```

#### Layer Operations

```scheme
(forward layer input)     ; Forward pass
(parameters layer)        ; Get trainable parameters
(zero-grad-layer! layer)  ; Zero all parameter gradients

;; Training/Evaluation Mode Control
(set-training-mode! layer training?)  ; Set training mode (boolean)
(set-eval-mode! layer)                ; Set evaluation mode (shorthand)
```

**Training vs Evaluation Mode:**

- **Training Mode**:
  - Batch norm uses batch statistics
  - Dropout is active (if implemented)
  - Stochastic behavior enabled
- **Evaluation Mode**:
  - Batch norm uses running statistics
  - Dropout is disabled
  - Deterministic behavior

```scheme
;; Training
(set-training-mode! model #t)
(for-each train-step training-batches)

;; Evaluation
(set-eval-mode! model)
(define accuracy (evaluate model test-data))
```

#### Activation Objects

```scheme
(make-relu)      ; ReLU activation
(make-tanh)      ; Tanh activation
(make-sigmoid)   ; Sigmoid activation
(make-silu)      ; SiLU activation
(make-gelu)      ; GeLU activation
(make-identity)  ; No activation
```

### Optimizer API

#### Optimizer Construction

```scheme
(make-sgd parameters
          #:key (learning-rate 0.01) (momentum 0.0)
                (weight-decay 0.0) (nesterov #f))

(make-adam parameters
           #:key (learning-rate 0.001) (beta1 0.9) (beta2 0.999)
                 (epsilon 1e-8) (weight-decay 0.0))

(make-rmsprop parameters
              #:key (learning-rate 0.01) (alpha 0.99) (epsilon 1e-8)
                    (weight-decay 0.0) (momentum 0.0))
```

#### Optimizer Operations

```scheme
(step! optimizer)                  ; Apply parameter updates
(get-learning-rate optimizer)      ; Get current learning rate
(set-learning-rate! optimizer lr)  ; Update learning rate
(optimizer-state optimizer)        ; Get optimizer configuration
```
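**Example: Learning Rate Scheduling**

A sketch of a simple step-decay schedule built from `get-learning-rate` and `set-learning-rate!`. Here `model`, `train-data`, and `train-epoch` are assumed to be defined as in the complete training example below:

```scheme
(define optimizer (make-adam (parameters model) learning-rate: 0.001))

(do ((epoch 1 (+ epoch 1)))
    ((> epoch 50))
  ;; Halve the learning rate every 10th epoch
  (when (zero? (modulo epoch 10))
    (set-learning-rate! optimizer (* 0.5 (get-learning-rate optimizer))))
  (train-epoch model optimizer train-data))
```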
## Examples

See the `examples/` directory for complete working examples:

- Linear regression
- Binary classification
- Multi-class classification
- Learning rate scheduling
- Batch training
- Convolutional networks with batch normalization

### Complete Training Example with Batch Norm

```scheme
(import nanograd-autograd nanograd-layer nanograd-optimizer)

;; Define ResNet-style block
(define (make-resnet-block in-channels out-channels)
  (make-sequential
   (list
    (make-conv2d-layer in-channels out-channels 3
                       padding: 1 activation: (make-identity))
    (make-batch-norm-2d out-channels)
    ;; ReLU applied separately
    (make-conv2d-layer out-channels out-channels 3
                       padding: 1 activation: (make-identity))
    (make-batch-norm-2d out-channels))
   name: "ResNetBlock"))

;; Full model
(define model
  (make-sequential
   (list
    (make-conv2d-layer 3 64 7 stride: 2 padding: 3)
    (make-batch-norm-2d 64)
    (make-resnet-block 64 64)
    (make-resnet-block 64 128)
    ;; ... more blocks ...
    )
   name: "ResNet"))

;; Helper: index of the largest element in a list
;; (assumed utility, not part of the nanograd API)
(define (argmax xs)
  (let loop ((rest (cdr xs)) (i 1) (best 0) (best-val (car xs)))
    (cond ((null? rest) best)
          ((> (car rest) best-val) (loop (cdr rest) (+ i 1) i (car rest)))
          (else (loop (cdr rest) (+ i 1) best best-val)))))

;; Training loop with proper mode switching
(define (train-epoch model optimizer train-data)
  (set-training-mode! model #t)
  (for-each
   (lambda (batch)
     (let* ((x (car batch))
            (y (cdr batch))
            (pred (forward model x))
            (loss (cross-entropy-loss pred y)))
       (backward! loss)
       (step! optimizer)
       (zero-grad-layer! model)))
   train-data))

(define (evaluate-epoch model test-data)
  (set-eval-mode! model)
  (let ((total-correct 0)
        (total-samples 0))
    (for-each
     (lambda (batch)
       (let* ((x (car batch))
              (y (cdr batch))
              (pred (forward model x))
              (predicted-class (argmax (tensor->list pred)))
              (true-class (argmax (tensor->list y))))
         (when (= predicted-class true-class)
           (set! total-correct (+ total-correct 1)))
         (set! total-samples (+ total-samples 1))))
     test-data)
    (/ total-correct total-samples)))

;; Main training loop
(define optimizer (make-adam (parameters model) learning-rate: 0.001))

(do ((epoch 1 (+ epoch 1)))
    ((> epoch 100))
  (train-epoch model optimizer train-data)
  (let ((acc (evaluate-epoch model test-data)))
    (printf "Epoch ~A: Test Accuracy = ~A%\n" epoch (* 100 acc))))
```

### Shape Manipulation

```scheme
(define x (make-tensor32 (f32vector 1.0 2.0 3.0 4.0) '(2 2)))

;; Reshape (must preserve total elements)
(define x-flat (reshape x '(4)))

;; Transpose dimensions
(define x-t (transpose-tensor x '(1 0)))
```

### Custom Weight Initialization

```scheme
;; Xavier/Glorot initialization (built-in for layers)
(define init-scale (sqrt (/ 2.0 (+ input-size output-size))))

;; He initialization for ReLU networks
(define init-scale (sqrt (/ 2.0 fan-in)))
```

## Limitations

- No GPU support (CPU-only via BLAS)
- Limited to dense and convolutional operations
- No automatic batching (must be implemented manually)
- Single-threaded execution

## Dependencies

- **yasos**: Object system
- **blas**: BLAS bindings for CHICKEN
- **mathh**: Extended math functions
- **srfi-1**: List utilities
- **srfi-4**: Homogeneous numeric vectors
- **srfi-42**: Eager comprehensions
- **srfi-69**: Hash tables

## License

LGPLv3 License - see LICENSE file for details

## Acknowledgments

This framework is inspired by:

- **PyTorch**: Dynamic computation graphs and autograd design
- [micrograd](https://github.com/karpathy/micrograd): Minimalistic autograd engine by Andrej Karpathy
- [tinygrad](https://github.com/tinygrad/tinygrad): Small neural network framework

Built with CHICKEN Scheme and powered by YASOS and BLAS.