Lessons Learned: From Ben Eater's 8 Bit CPU to Designing a Single Board Computer

This is what I’ve learned from my time attempting to build an 8 bit CPU, dabbling with FPGAs, and designing a single board AVR based computer and OS. These topics were never really covered in school. I took computer science and mathematics courses, but never breadboarded, soldered, or did any hardware/FPGA/architecture design.

Goals

In short, I wanted a project that could challenge me and bring some skills and insights into computing that were glossed over in university. I set out to build a breadboard CPU, then a single board computer, then a single board computer with an FPGA GPU. Finally I decided to spend this time doing other things.

To preface, this project was probably too large and too vague for me to take on in the first place. It fizzled out for a multitude of reasons. Mainly, I no longer had interest in spending countless hours debugging electrical circuits and designing a computer, all after a day of work on computers. I do feel, however, that I learned a great deal. The detours I took along the way helped give me a more complete understanding of computation (from the practical side).

The topics I got to dabble in were practically non-existent in my studies at university, from understanding Ohm’s Law and actually applying it, to reading articles on timing attacks on the x86 microarchitecture. Although I no longer have the ambition to pursue computer related projects after work, I still enjoy learning. This project proved to me that computer work is probably best left as work.

Original Project

The original project was to follow along with Ben Eater’s 8 Bit Breadboard CPU. I made good progress on that. Below is the 555 timer circuit and a single register.

555 Stable Timer

Later on my breadboarding got to be a little unreliable, so I began designing PCBs for a register and other circuits.

Registers & Timer

At this point the little table in my apartment had no more available space. This was the Fall and Winter of 2021. I decided I needed to come up with something new for the project. Living in a 300 ft\(^2\) apartment just did not give me liberty to take on a project of this magnitude.

I wish I had more details and things to show, but this was towards the end of 2021, it is currently 2024, and nothing too interesting or difficult came up.

Register Schematic

Register 3D

Register 3D-2

Lessons

I learned quite a bit. Mainly how electricity works (lots of shorts and burnt TTL chips, because I still write it off as magic), and what designing PCBs entails. My soldering got brushed up a good bit too. Overall I figured I was going to run out of space in my studio apartment, and there wasn’t much more to learn (although developing a microarchitecture would’ve been worthwhile in hindsight).

Redirection

At this point I wanted to continue down this path and needed a new project. I had inspiration from Matt Sarnoff’s Amethyst. This was an AVR single board computer, with the bulk of the work being the operating system. This felt more up my alley.

New Goals

I learned very quickly that finishing Ben Eater’s 8 bit CPU was not practical for me. My apartment was too small, and the project takes commitment and space. I ran into Matt Sarnoff’s Amethyst project and decided that, with what I had learned about KiCad, soldering, and electricity, I should be able to design a single board computer using a microcontroller (memory, IO, etc. handled on die).

Picking a Microcontroller

I think here I just chose the same chip as Matt Sarnoff did: the ATmega1284P. This chip comes in a 40 pin PDIP package and has everything needed to build a minimalist OS (without, of course, user space/kernel space or really any hardware capabilities that would make a separate program space possible). Instead the idea was to have everything statically compiled and linked, including the programs, have some routines for keyboard input and other abstractions, and call it a day. This means that memory management is non-existent, and that this would basically not be an operating system in any sense of the term.

The ATmega1284P had a very nice spec sheet for us.

  • 32 general purpose 8 bit registers
  • Capable of a 20 MHz clock
  • 128 KB program memory (flash)
  • 16 KB RAM
  • 10 bit ADC
    • Originally this was for VGA output, but was dropped for a one-way 8 bit bus
  • Pin interrupts & timer interrupts
    • Originally I wanted the keyboard to be on a pin interrupt, but since writing VGA signals out required rigorous timing, I went with a polling design

Original Design

Originally I wanted VGA output, keyboard input, some programs and such. The first design drove the video output through a VGA port.

VGA Schematic

VGA Board

VGA Render

However, when I was writing out examples of the OS, the VGA timing and my bad assembly skills were leaving me with less than 3% of clock cycles for other tasks. The best explanation I can give for the following code is that we are generating a timed analog signal. We scan the screen line by line, with extra run-off before and after each line to sync timing. This comes from most monitor/TV outputs having been CRTs, where the beam would scan across the screen left to right, top to bottom. The resolution, 400x300, and how long each signal needs to be on/off are calculated in the comments. After extrapolating and calculating how much CPU time there would be before this routine got called again by the timer interrupt, it was just infeasible to do anything truly meaningful.

This was originally written for an ATmega328P on a test breadboard (I was using this board to test).

#define __SFR_OFFSET  0x00
#ifndef __AVR_ATmega328P__
#define __AVR_ATmega328P__
#endif
#include <avr/io.h>
#include <avr/interrupt.h>
.global G_LINES

#define PE_LOW  0b0001
#define PE_HIGH  0b0010
#define HSYNC  0b0100
#define VSYNC  0b1000

; http://tinyvga.com/vga-timing/800x600@60Hz
; 400 x 300
; 50 x 37

; Visible Area:
	; 400 px <- 20 uS = 400 clocks @ 20 MHz
	; 8 clock cycles per 8 bits
; Front Porch <- 1 uS = 20 clocks
	; 20 px
; Sync Pulse <- 3.2 uS = 64 clocks
	; 64 px
; Back Porch <- 2.2 uS = 44 clocks
	; 44 px

; Vertical Timing (in lines)
	; 600 <- Visible
	; 1 <- Front Porch
	; 4 <- Vertical Sync
	; 23 <- Back Porch

.macro ONECHAR
; 8 Clock Cycles; Z: address of char data, X: current VRAM position
ld ZL, X+				; 2cc ZL = (X++)
lpm r21, Z			; 3cc *Z
out PORTD, r21	; 1cc Out 8 bits of prog mem line
out PORTB, r19	; 1cc /PE LOW <- Load Pixel Data to Shift
out PORTB, r18 	; 1cc /PE HIGH <- Clock pumps
.endm

.global TIMER0_COMPA_vect
TIMER0_COMPA:
TIMER0_COMPA_vect: ; 7 - 10 cycles to get here
	; Hardware clears the I flag on entry and reti sets it again,
	; so no explicit cli/sei is needed.
	push r18
	in r18, SREG ; save SREG, the arithmetic and compares below clobber it
	push r18
	push r19
	push r24
	push r25
	lds r24, G_LINES
	lds r25, G_LINES+1
	adiw r24, 0x01 ; G_LINES++
	; if G_LINES < 602 (0x025A)
	cpi r24, 0x5A ; compare low byte
	ldi r18, 0x02 ; high byte shared by 602, 605 and 628
	cpc r25, r18  ; compare high byte
	brcc G_LINES_OVER_602
	ldi r18, PE_HIGH
	ldi r19, PE_LOW
	ONECHAR
	ONECHAR
	ONECHAR
	ONECHAR
	ONECHAR
	rjmp TIMER0_EXIT

G_LINES_OVER_602:
	; If lines == 628 (0x0274)
	cpi r24, 0x74 ; compare low byte
	cpc r25, r18 ; compare high byte, r18 is still 0x02 here
	brne G_LINES_NOT_628
	ldi r24, 0 ; G_LINES = 0
	ldi r25, 0 ; G_LINES = 0
	rjmp TIMER0_EXIT

G_LINES_NOT_628:
	; If lines < 605 (0x025D)
	cpi r24, 0x5D ; compare low byte
	cpc r25, r18 ; compare high byte
	brcs G_LINES_UNDER_605 ; carry set means G_LINES < 605
	rjmp TIMER0_EXIT

G_LINES_UNDER_605:
	; ...
	rjmp TIMER0_EXIT

TIMER0_EXIT:
	sts G_LINES+1, r25
	sts G_LINES, r24
	pop r25
	pop r24
	pop r19
	pop r18
	out SREG, r18 ; restore SREG
	pop r18
	reti
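
The timing comments in the listing can be sanity checked with a bit of arithmetic. The following is a host-side sketch, not part of the original firmware: the function names are made up for illustration, and the only inputs are the clock counts from the comments above (800x600@60Hz timing with the pixel clock halved to 20 MHz).

```c
#include <assert.h>
#include <stdint.h>

/* Timing constants copied from the comments in the assembly listing. */
enum {
    H_VISIBLE = 400,     /* clocks of visible pixels per line */
    H_FRONT_PORCH = 20,
    H_SYNC = 64,
    H_BACK_PORCH = 44,
    V_VISIBLE = 600,     /* visible scanlines per frame */
    V_FRONT_PORCH = 1,
    V_SYNC = 4,
    V_BACK_PORCH = 23
};

/* Clocks in one full scanline: visible area plus horizontal blanking. */
uint32_t sketch_clocks_per_line(void) {
    return H_VISIBLE + H_FRONT_PORCH + H_SYNC + H_BACK_PORCH;
}

/* Scanlines in one full frame: visible lines plus vertical blanking. */
uint32_t sketch_lines_per_frame(void) {
    return V_VISIBLE + V_FRONT_PORCH + V_SYNC + V_BACK_PORCH;
}

/* Characters that fit on a line when each one costs clocks_per_char clocks. */
uint32_t sketch_chars_per_line(uint32_t clocks_per_char) {
    assert(clocks_per_char > 0);
    return H_VISIBLE / clocks_per_char;
}

/* Whole percent of a visible scanline that is spent pushing pixels out. */
uint32_t sketch_visible_percent_of_line(void) {
    return (100u * H_VISIBLE) / sketch_clocks_per_line();
}

/* Frames per second (rounded down) for a given CPU clock. */
uint32_t sketch_frames_per_second(uint32_t cpu_hz) {
    return cpu_hz / (sketch_clocks_per_line() * sketch_lines_per_frame());
}
```

With these numbers a line is 528 clocks and a frame is 628 lines, which matches the 602/605/628 comparisons in the handler. At 8 clocks per character, 50 characters fit on a line, and roughly three quarters of every visible line is already spent on pixel output before interrupt entry/exit and sync handling are counted.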

Finally I decided enough was enough: I should probably buy an FPGA and design an auxiliary circuit to handle communication over a bus and the VGA output. I later decided to just leave this as an open bus, and read the output with an Arduino/Raspberry Pi/etc. until I came back to designing the FPGA part.

Open Bus Design

Tiny note: Baumputer was a running joke from high school because I spent too much time on Wikipedia and knew too many pieces of trivia. One time my chemistry professor asked the class what chemical makes shaving cream foam, and I immediately knew it was sodium lauryl sulfate because I had googled this exact question at some point in my life. I decided it was a fun name.

Final Schematic, Final Board, Final Render

The Sierpinski Triangle printed on the PCB was just me learning and practicing how to print graphics.

OS Design

The idea behind much of this was to have a timer interrupt handle a few things on a regular cadence (this would be the heart of the single address space program that would act as my OS).

  1. Writing to a shared bus (this would be read as our video output at some point)
  2. Polling the keyboard: drive each of 6 output pins in turn, and read back on 8 input pins
    • A 6 x 8 matrix allows us to identify a single key on a 48 key keyboard
  3. Iterating the lights, which gives us an extra set of outputs for debugging etc.
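
The keyboard polling in step 2 can be sketched roughly as below. This is not the original driver: every name is hypothetical, the hardware access is hidden behind a function pointer so the scan can be tried on a desktop machine, and it assumes a set bit means a pressed key (a real board with pull-ups would likely read active-low instead).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_COLS 6   /* output pins, driven one at a time */
#define SKETCH_ROWS 8   /* input pins, read together as one byte */

/* Hypothetical hardware hook: drive one column, return the 8 row bits. */
typedef uint8_t (*sketch_read_rows_fn)(uint8_t column);

/* Scan the whole matrix once. Returns the first pressed key as
   column * 8 + row (0..47), or -1 when no key is down. */
int sketch_scan_matrix(sketch_read_rows_fn read_rows) {
    assert(read_rows != 0);
    for (uint8_t col = 0; col < SKETCH_COLS; col++) {
        uint8_t rows = read_rows(col);
        for (uint8_t row = 0; row < SKETCH_ROWS; row++) {
            if (rows & (uint8_t)(1u << row)) {
                return col * SKETCH_ROWS + row;
            }
        }
    }
    return -1;
}

/* Simulated key matrix standing in for the real pins. */
uint8_t sketch_sim_matrix[SKETCH_COLS];

uint8_t sketch_sim_read_rows(uint8_t column) {
    return sketch_sim_matrix[column];
}
```

On the microcontroller, the read hook would presumably write the column pattern to one port and read the rows from another. A lookup table indexed by the returned number could then map a key position to a character.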

Keyboard Polling

Overall the idea was to have keyboard input be asynchronous. The keyboard buffer is shared between the interrupt routine, which adds characters to it, and programs, which read from it. There is one blocking call that I found useful enough to add, and that’s keyboard_wait_for_line(char *buffer, uint8_t size).

keyboard_driver.h

#ifndef keyboard_driver_h
#define keyboard_driver_h

#include <avr/io.h>
#include <stdbool.h>
#include <stdint.h>

#define KEYBOARD_BUFFER_LENGTH 64
// Wrappers around the generic flag macros, applied to keyboard_status
#define GET_KFLAG(FLAG) GET_FLAG(FLAG, keyboard_status)
#define NGET_KFLAG(FLAG) NGET_FLAG(FLAG, keyboard_status)
#define SET_KFLAG(FLAG) SET_FLAG(FLAG, keyboard_status)
#define UNSET_KFLAG(FLAG) UNSET_FLAG(FLAG, keyboard_status)

// static objects in a header give every including file its own copy;
// only keyboard_driver.c is meant to touch these
static char keyboard_buffer[KEYBOARD_BUFFER_LENGTH];
// volatile because the timer interrupt updates these while programs read them
static volatile uint8_t keyboard_buffer_index;
static volatile uint8_t keyboard_status;

// Bit masks for keyboard_status; the values live in the driver source
extern const uint8_t K_FLAG_BUFFER_EMPTY;
extern const uint8_t K_FLAG_BUFFER_FULL;
extern const uint8_t K_FLAG_READ_LAST;
extern const uint8_t K_FLAG_RESERVED3;
extern const uint8_t K_FLAG_RESERVED4;
extern const uint8_t K_FLAG_RESERVED5;
extern const uint8_t K_FLAG_RESERVED6;
extern const uint8_t K_FLAG_RESERVED7;

void read_keyboard(void);
char kgetc(void);
uint8_t kputc(char c);
void keyboard_wait_for_line(char *buffer, uint8_t size);
void keyboard_setup(void);
#endif

There are some macros above. I originally wrote them to inline register operations, but this ended up being a bit unwieldy and I decided against it. Instead they became a convenient way to handle that static flag byte.

#define GET_FLAG(FLAG, REGISTER) ((REGISTER) & (FLAG))
#define NGET_FLAG(FLAG, REGISTER) (~(REGISTER) & (FLAG))
#define SET_FLAG(FLAG, REGISTER) ((REGISTER) |= (FLAG))
#define UNSET_FLAG(FLAG, REGISTER) ((REGISTER) &= ~(FLAG))
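
As a rough illustration of how flag helpers like these behave on a status byte, here is a small host-side sketch. The SKETCH_ names and the bit positions are assumptions (the real flag values are not shown in this post), and the arguments are wrapped in parentheses so that a combined mask such as A | B expands safely.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GET_FLAG(FLAG, REGISTER)   ((REGISTER) & (FLAG))
#define SKETCH_SET_FLAG(FLAG, REGISTER)   ((REGISTER) |= (FLAG))
#define SKETCH_UNSET_FLAG(FLAG, REGISTER) ((REGISTER) &= (uint8_t)~(FLAG))

/* Assumed bit assignments, one bit per flag. */
#define SKETCH_FLAG_EMPTY (1u << 0)
#define SKETCH_FLAG_FULL  (1u << 1)

/* Stand-in for the keyboard status byte. */
uint8_t sketch_status = 0;

/* Mark the buffer as full and no longer empty; returns the new status. */
uint8_t sketch_mark_full(void) {
    SKETCH_SET_FLAG(SKETCH_FLAG_FULL, sketch_status);
    SKETCH_UNSET_FLAG(SKETCH_FLAG_EMPTY, sketch_status);
    assert(SKETCH_GET_FLAG(SKETCH_FLAG_FULL, sketch_status));
    return sketch_status;
}
```

Without the parentheses, a call like GET_FLAG(A | B, reg) would expand to reg & A | B, which always evaluates as true whenever B is non-zero.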

Below is a description of keyboard_driver.c. Instead of more code, here’s what should’ve been documentation, as a table.

| Function | Description |
| --- | --- |
| read_keyboard() | The routine called by the interrupt handler. It handles scanning out, reading in, and adding the character to the buffer. |
| kgetc() | Retrieves the last character input. |
| kputc(char) | Manipulates the buffer if needed. |
| keyboard_wait_for_line(char*, uint8_t) | A blocking call. It waits for a run of characters followed by a newline to be input, copies the buffer to the pointer location, and exits when finished. |
| keyboard_setup() | Called by the original entry point to set the global registers that designate which IO ports are read/write for the keyboard driver. |
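
To make the split between the interrupt side and the program side more concrete, here is a minimal sketch of a line buffer in the spirit of the functions described above. It is not the original keyboard_driver.c: the sketch_ names, the return types, and the choice to drop anything typed after the newline are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BUFFER_LENGTH 64

static char sketch_buffer[SKETCH_BUFFER_LENGTH];
/* volatile because an interrupt would change it behind the reader's back */
static volatile uint8_t sketch_length;

/* Producer side (the interrupt): append one character.
   Returns false when the buffer is already full. */
bool sketch_kputc(char c) {
    if (sketch_length >= SKETCH_BUFFER_LENGTH) {
        return false;
    }
    sketch_buffer[sketch_length] = c;
    sketch_length++;
    return true;
}

/* Consumer side (a program): if a complete line is buffered, copy it
   without the newline into out, NUL-terminate it, reset the buffer and
   return true. Returns false while no newline has arrived yet. */
bool sketch_try_take_line(char *out, uint8_t size) {
    assert(out != 0 && size > 0);
    uint8_t n = sketch_length;
    for (uint8_t i = 0; i < n; i++) {
        if (sketch_buffer[i] == '\n') {
            /* Truncate if the caller's buffer is smaller than the line. */
            uint8_t copy = (i < size - 1) ? i : (uint8_t)(size - 1);
            for (uint8_t j = 0; j < copy; j++) {
                out[j] = sketch_buffer[j];
            }
            out[copy] = '\0';
            sketch_length = 0;  /* drops the line and anything after it */
            return true;
        }
    }
    return false;
}
```

A blocking wait then becomes a loop that calls sketch_try_take_line() until it returns true, while the timer interrupt keeps feeding sketch_kputc(). On the real chip the copy would probably also need interrupts disabled briefly, so the interrupt cannot append in the middle of it.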

Handling Timing

So the main program has a call to set up our timer interrupt. This allows us to run the keyboard reader on a regular cadence, output the bus, handle random number generation and light manipulation.

void setup(void) {
  screen_setup();
  keyboard_setup();
  master_lights_setup();
  // Configure the timer interrupt last, once the drivers are set up
  configure_interrupt();
}

If we take a look at the configure_interrupt() routine, we find:

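void configure_interrupt(void) {
  // Clock Timer0 from the system clock with a 1024 prescaler
  TCCR0B = (1 << CS02) | (1 << CS00);
  // Clear any pending overflow flag (writing a 1 clears it)
  TIFR0 = 1 << TOV0;
  // Enable the overflow interrupt: 20 MHz / 1024 / 256 = ~76 Hz
  TIMSK0 = 1 << TOIE0;
}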

This means our Timer0 interrupt will run at a 76 Hz cadence. This is that interrupt’s routine.

ISR(TIMER0_OVF_vect, ISR_BLOCK) {
  read_keyboard();
  output_video_ram();
  increment_master_lights();
}
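
The 76 Hz figure follows from the prescaler and the 8 bit width of Timer0. As a quick check, assuming the board runs at the chip’s 20 MHz maximum (the helper name is made up for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Overflow rate of an 8 bit timer in millihertz. The timer counts 256
   prescaled ticks per overflow, so rate = cpu_hz / (prescaler * 256). */
uint32_t sketch_timer0_overflow_millihz(uint32_t cpu_hz, uint32_t prescaler) {
    assert(prescaler > 0);
    /* Scale by 1000 before dividing to keep three decimal places. */
    return (uint32_t)(((uint64_t)cpu_hz * 1000u) / ((uint64_t)prescaler * 256u));
}
```

For 20 MHz and a 1024 prescaler this gives 76293 mHz, so the handler fires about every 13 ms.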

If we take a look at the interrupt table we will find the routine at address 0x0000003c:

00000000 <__vectors>:
   0:	0c 94 e9 00 	jmp	0x1d2	; 0x1d2 <__ctors_end>
   4:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
   8:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
   c:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  10:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  14:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  18:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  1c:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  20:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  24:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  28:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  2c:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  30:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  34:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  38:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  3c:	0c 94 0a 01 	jmp	0x214	; 0x214 <__vector_15>
  40:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  44:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  48:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  4c:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  50:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  54:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  58:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  5c:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  60:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  64:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  68:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  6c:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  70:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  74:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  78:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  7c:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  80:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  84:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>
  88:	0c 94 08 01 	jmp	0x210	; 0x210 <__bad_interrupt>

Note there were examples of me trying to use assembly to cut clock cycles for the VGA timing, see here.

Overall

This project was too open ended, too ambitious, and impractical for me at the time. Now there is also a bit of burnout from working with computers. I decided to take on new hobbies, travel, and reduce time spent in front of the computer. The pandemic and my skyrocketing computer time, preceded by 2018/2019 projects, work, and school, had probably given me unhealthy habits.

I recall my first true completed project in 2017, and how healthy that work was. I was in high school and had requested to work on a chess engine as a final senior project. I worked on it over the course of 3 months, alongside my 6 other classes, varsity basketball, and working over the weekends. This meant I was probably working on this project 5-10 hours a week, which at the time was practically all of my weekly computer time. My school did not offer computer science courses or AP Computer Science, so this was very much open ended. However, I did have to file project goals, expectations, and weekly check-ups, and this really helped me reach a final result.

End Notes

I have a feeling this will close out my projects for some time. I may find myself coming back, but at this point I have been working on spending less time with computers and on healthier hobbies. In between my last post and this one I have begun to travel a good bit. I have ridden 14k miles on two different motorcycles, including rides in Italy (Tuscany) and round trips to Skyline Drive (VA) and the Berkshires (MA). I have plans for a motorcycle ride to Canada, and a ride of Skyline Drive plus the Blue Ridge Parkway in the future.

Written on February 18, 2024 22:00 UTC-5