Arcs, segments and sectors in BBC BASIC for the Sega Master System

Tuesday, 12th October 2021

More musing on tape phases

I bought a few more Acorn-format BBC BASIC cassettes to test my adaptation of BBC BASIC to the Sega Master System with, and have found a few interesting oddities since my last post. In that post I made the assertion that the phase of the recorded signals on the tapes is at 180°, i.e. each wave cycle goes negative before it goes positive (whereas a 0° phase would go positive before it goes negative). This matched the documentation I'd read, the output of tools like PlayUEF and my own tests with commercially-recorded tapes. With these new tapes I've found things are not quite so straightforward, though:

  • The "Welcome" tape for the BBC Master Series is recorded at 0°. OK, maybe that's a BBC Master weirdness?
  • One copy of the "Welcome" tape for the BBC Micro is recorded at the usual 180°, hooray, we're back to normal!
  • Another copy of the "Welcome" tape for the BBC Micro is recorded at 0°. It's also a different colour, but otherwise has identical programs on it, maybe the difference comes from different duplication plants?
  • Side A of Acornsoft's "Graphs and Charts" is recorded at 180°, Side B of the same tape is recorded at 0°. Flipping the tape flips the phase, I give up!

I am able to load the programs by changing the "phase" switch on my cassette recorder to "reverse", but not all cassette recorders have such a switch. There is a jumper setting in the tape interface circuit that the cassette loader software can check to control whether it starts timing the length of wave cycles on a falling (180°, default) or rising (0°) edge and so it can compensate for the reversed phase, but I'd rather see if I can find a way to automatically detect the phase and properly recover data without needing to rewind the tape, push a switch or type in a command, then trying again.

A "1" bit is represented by two 2400Hz cycles and a "0" bit by a single 1200Hz cycle. Each byte has at least one "1" bit before it and always starts with a "0" start bit. In theory, then, a "0" bit with the correct phase should be represented by a wave cycle that's 2× the length of a preceding 2400Hz cycle, but if the phase is incorrect it'll only be 1.5× the length. At the moment the threshold to detect the difference between a 2400Hz and 1200Hz cycle is placed at 1.5×, maybe for the start bit it needs to be at 1.3× to detect the "0" bit instead, and if after that the wave is over 1.6× it's treated as a "normal" 180° tape and loaded as normal, but if it's between 1.3× and 1.6× it's treated as a "reverse" 0° tape and an extra rising edge is checked for before loading the tape with reverse phase.

I'm not sure if that'll work and my quick initial tests didn't work very reliably so I'll need to do a bit more experimentation I think. I did encounter another oddity with formats beyond the reversed phase, though, and that's with Acornsoft's "Graphs and Charts". The tokenised BASIC programs on the tape would crash the loader or generate "Bad program" errors, and when I copied the tape to my PC and looked at the files there I could see that BBC BASIC for Windows refused to open them too. The problem is that instead of the usual <CR><FF> terminator on each program, they all end with <CR><80> instead. My loader tries to convert Acorn BBC BASIC format programs to the Z80 BBC BASIC format automatically during the load, but if it misses the terminator when stepping through the program it will either see the line as being zero bytes long (due to memory being cleared to 0) and loop infinitely when attempting to jump to the next line, or read a line length from uninitialised memory that causes it to advance to a line that doesn't start with the appropriate byte and so assume the program is not in Acorn format, skipping the conversion process and leaving a "Bad" program in memory.

I made my loader more robust by appending a suitable dummy terminator to the loaded program; if it already was a valid program with a proper terminator then it makes no difference, but it otherwise prevents the convertor from dropping off the end of the program and allows me to load the programs from the "Graphs and Charts" tape.

Drawing arcs, segments and sectors in BBC BASIC

Clown missing parts     Complete clown

I hadn't implemented all of the PLOTting routines that BBC BASIC can provide for drawing graphics on the screen, and a few of the programs on the BBC Master Series "Welcome" tape take advantage of the more advanced routines to draw circular arcs, segments and sectors. Attempting to run these programs produced results like the picture of the clown above with several parts of its face missing.

I had been putting this off as I hadn't been able to think of a good way to implement drawing these shapes. After a bit more research online I came across a paper by C. Bond entitled An Incremental Method for Drawing Circular Arcs Using Properties of Oriented Lines which turned out to be ideal.

The linked paper does a very good job of describing the technique so I would recommend reading it, but crucially it provides a solution that can be implemented easily in Z80 assembly with a few integer multiplications and additions. I was able to incorporate this into the existing circle tracing and filling code without too much effort, just an extra step in the "plot pixel" or "fill horizontal span" routines to clip against the lines that bound the arc, segment or sector.

Ship missing parts     Complete ship
Acorn rendered poorly     Acorn rendered correctly
Welcome missing parts     Welcome missing slightly fewer parts

As you can no doubt see from the "Welcome" screen at the end there are still parts missing from the final image. This is because the program only draws each required letter once and then uses the block copying PLOT routines to duplicate the letter if it's needed again instead of rendering it again from scratch – hence the second "B" from "BBC" is missing, and "MASTER" becomes "MAS  R" as the program is expecting to copy the "T" and "E" from "THE" in the line above.

Welcome in its complete form

Disabling this optimisation within the BASIC program and forcing each letter to be drawn instead of copied does improve the results, though it still doesn't entirely fit on the screen due to the much lower resolution!

Tape cycle frequencies and phases, plus VDrive3 support for BBC BASIC on the Sega Master System

Wednesday, 6th October 2021

I have now moved the tape interface circuit described in the previous entry from its breadboard prototype to a neat enclosure where I am happy to report it mostly works as well as it did before.

The tape interface installed in its enclosure.

I have spotted two issues, though. The first affects my small cassette recorder but not the large one, and is related to reading files from tape via the file IO routines (such as OPENIN then BGET# to retrieve a single byte, not from LOAD). Files are stored on tape in 256 byte blocks, and when the file is opened a whole 256 byte block is read from the tape and copied to the Master System's RAM. When the BASIC program requests a byte from the tape it is read from this local copy of the block instead, and when the read pointer reaches the end of the 256 byte block the next block is fetched from the tape and the local copy updated with this new data. The motor control comes in handy here, with the file system library stopping and starting the tape playback as desired.

The problem I was having was that the first block would read fine, but when the time came to read the second block the tape started playing but the file system library never seemed to be able to detect any data – it would just work its way along to the end of the tape, never displaying any errors, but never reading any meaningful data either.

The issue ended up being some code I'd added somewhat recently to automatically calibrate the threshold between a 1200Hz ("zero" bit) and a 2400Hz ("one" bit) tone. The code reads bits by counting the length of each wave cycle and comparing this to a threshold value - if it's longer, it's the 1200Hz "zero" bit tone, if it's shorter, it's the 2400Hz "one" bit tone. Before and after each file, and between each block, is a section of 2400Hz "carrier" tone. By detecting a large number of wave cycles that were all around the same length (say, within four length units of each other) you could assume that you were receiving the 2400Hz carrier tone and use that to calculate a suitable threshold for loading in data later.

The main bug was that I was comparing the length of the wave cycle just received with the length of the previous wave cycle rather than the length of the first wave cycle. This becomes a problem when you start a tape that was paused in the middle of a carrier tone (e.g. between blocks) as the tape will take some time to come up to speed, during which time the period of each wave cycle will gradually decrease. As the reference length we're checking the current wave length against is changing over time it means we end up compensating for the tape's speed increasing and so end up accepting the slower (longer) wave cycle lengths as part of the calculation of the threshold between "zero" and "one" bit wave cycle lengths.

As the wave cycle length threshold is now longer than it should be, all incoming wave cycles get interpreted as short ones and so it looks like we're just getting a stream of "one" bits - which looks like the carrier, so the code never starts trying to decode any blocks. My larger tape recorder comes up to speed much more quickly than the small one, or at least doesn't audibly ramp up in speed, and so isn't affected by this problem.

Fortunately the fix is very easy, just check the length of wave cycles against the first wave received rather than the previous one. This way if the timings drift over time (as they would during the period that the tape is coming up to speed) then they'll eventually go out of the permitted range. Changing this fixed the issue.

You may wonder why I'm calculating the threshold at runtime instead of just using a hard-coded value if the expected frequencies are a fixed 1200Hz and 2400Hz. My main intention was to be able to handle tapes that were running at the wrong speed due to a miscalibrated cassette recorder, but this does also open up the possibility to load from tapes at higher rates. Without any further adjustment my code can already load from audio generated at 2320 baud, i.e. with the two frequencies at 2320Hz and 4640Hz, without any further adjustment. This is a slightly annoying figure as 2400 baud would be the obvious target (being double the intended rate!) but at this speed the current code fails to latch on to the signal so I slowed the "tape" speed down until I found a value that worked. It's perhaps worth mentioning that the audio in this case was coming from PlayUEF running on my phone rather than an analogue cassette tape with the BAUD (or LOW) parameter used to adjust the speed, I'm not sure that a 93% boost in loading speed would be achievable from a tape!

Little and large tape recorders.

I mentioned above that there were two issues. The slow speed-up issue affected the small cassette recorder, but the large cassette recorder had a new issue that only became apparerent after moving the circuit to its final enclosure. The Master System should be able to start and stop the cassette remotely, handled by a reed relay that is then plugged into the cassette recorder's "remote" socket. The Master System was able to start both tape recorders without any issues, but was unable to stop the large one – the "motor" status light would switch off, but the tape would continue playing. Tapping the relay sharply would be the only way to get it to switch off.

I tried plugging in a 2.5mm TS plug into the side of the large tape recorder and used the very scientific approach of shorting its connections with a screwdriver to simulate the relay switching "on" and was surprised to see quite large sparks. I measured the DC current through the remote control switch when the cassette was running and it was only around 70mA at its highest – well below the 1A rating of the relay – however I reckon there is a very large inrush current when the motor kicks in that is sufficient to weld the reed switch contacts together, causing it to get stuck.

Why didn't this affect the circuit on the breadboard? In that setup I was using a 3.5mm TS extension cable connected to a 3.5mm male-to-male adaptor cable which was then connected to a 3.5mm to a 2.5mm adaptor cable. My assumption is that all of the extra contact resistances were helping to limit the inrush current, preventing the contacts from welding together.

It's very likely not the best sort of snubber circuit for this application but for the time being I've put a 10Ω resistor in series with the relay contacts, which limits the inrush current enough for the relay not to get stuck:

Tape interface circuit for the Sega Master System with added 10Ω resistor on the relay contacts to limit inrush current.

One matter that I've also been testing is the phase of the audio signal. As far as I can tell Acorn's tape format assumes a 180° phase; that is to say that each wave cycle starts at 180° and runs to 180°+360°=540° for a complete cycle rather than starting at 0° and running to 360°. The result of this is that instead of the signal starting at an amplitude of 0, then going positive before negative (as you'd expect for a typical sine wave) the signal goes negative first:

0° and 180° phase.

I have confirmed this by looking at the output of tools like PlayUEF as well as looking at the signals coming directly off commercially pre-recorded tapes with BBC Micro software on them, after first confirming that my sound card doesn't invert the signal by briefly connecting its input to a positive DC power supply and seeing that the received signal goes positive when that happens. Here's an example of a commercially-released tape, where you can see that after the high-frequency cycles in the first longer wave (a "zero" bit, acting as the start bit, starting at around 12.7065) starts low then goes high:

Plot of the signal from a commercially-released tape showing the 180° phase.

I then generated a test tone that alternates between 0 and 1 bits, i.e. one complete 1200Hz cycle then two complete 2400Hz cycles with the appropriate 180° phase:

Plot of the test signal showing the 180° phase.

This pattern should make it easy to spot the phase of recordings. I first tried running the test tone straight back into my PC to ensure that the phase was not being reversed by the PC, and got the same signal back in compared to to what I played out, as you'd expect. I then tried recording the signal to tape with my Grundig recorder, and found something interesting when playing it back:

Plot of the test signal recovered from the Grundig tape recorder.

The phase there is very clearly reversed – the signal I played into the tape recorder goes low before it goes high, whereas the signal I've recovered goes high before it goes low. I repeated the test with the Alba tape recorder:

Plot of the test signal recovered from the Alba tape recorder.

This shows the same thing – the phase on the tape is at 0° even though the test signal was at 180°. I also have a Sony digital voice recorder (an ICD-BX140) that I've been using as well as a tape recorder, so I tried the test with that too:

Plot of the test signal recovered from the Sony digital voice recorder.

Apart from the much lower recording level this once again shows the same thing – the signal phase is inverted when recorded.

I had previously encountered an issue where I had tried to make a tape from a UEF image by playing the audio generated by PlayUEF into a tape recorder. Attempting to load the data directly from PlayUEF worked fine, but the tape recording of the same signal wouldn't load. My Grundig tape recorder has a phase reversal switch, and setting it to "reverse" would allow me to load the tape created by recording the output of PlayUEF. However, in that reverse setting, I lost the ability to load from my commercially-recorded tapes. I think the above tests indicate why this is the case – the recorders all invert the signal when creating the tape recording. I find it particularly interesting that the Sony digital voice recorder does the same thing!

I did check to see if the phase switch on the Grundig tape recorder had any influence on the recording, but it doesn't – whether it's set to "normal" or "reverse" the recording to tape still ends up inverted. The phase switch only appears to affect playback.

PlayUEF does have an option to set the output phase to 0° instead of its default 180°. By doing that and recording its output to tape I was able to load the programs directly from either of my tape recorders.

Of course, it would be nice if the loader was able to handle either phase, and in practice I've found it does sometimes handle the "incorrect" phase quite well but it does seem to be more a lot more error prone – some programs load without any problem, others report problems with some blocks and others fail to be detected at all! I think I'll see if there's a way to reliably detect the phase and switch to the appropriate handler, but for the time being I've added a crude workaround – pulling pin 1 of the controller port (the d-pad "Up" input) to ground tells the loader to assume the phase is inverted. As I hope this won't be necessary in future I've left this as a jumper in my current tape interface circuit (visible below in the top left corner) rather than add a switch!

The tape interface circuit board.

A lot of the work has been focused on the tape loader, but tapes are not the most practical storage media these days. Finding a working tape recorder is a challenge in its own right, and a lot of the old tapes are tricky to work with as they've become sticky with age. In an attempt to be a bit more modern I've started integrating support for the VDrive3 to BBC BASIC. This is an inexpensive module that lets you access files on a USB mass storage device via simple commands sent over a serial interface. I'd previously used the VMusic2 device in another project and that uses the same firmware commands so I already had a little experience with it (the VMusic is pretty much a VDrive with an MP3 decoder bolted on top). I already had some serial routines written so it was a reasonably easy job to plumb in some code to handle LOADing and SAVEing programs, and from a hardware perspective all you need is a cable to connect the module directly to the Master System's controller port – the VDrive3 uses 3.3V signalling levels but its I/O lines are 5V tolerant.

The VDrive3 alongside the tape interface and RS-232 modules.

One notable advantage of the VDrive3 over the tape interface is that it allows for proper random-access of files rather than just sequential access. I've been working on implementing all of the required BASIC statements to support this (OPENIN, OPENOUT, OPENUP, BGET#, BPUT#, PTR#, EXT#, EOF#, CLOSE# etc). For a bit of fun I wrote a BASIC program that loads an image file from disk (or tape) and displays it on screen. I needed to add user-defined character support to the new reduced-resolution MODE 2 to handle this (something that was not implemented before due to a lack of VRAM) but with that in place it's working pretty well.

Loading a picture of a duck from a USB drive.

It's not fast, taking 2m45s to load from from tape and 1m19s from a USB flash drive, but it's a fun diversion!

Refining the tape interface for BBC BASIC on the Sega Master System

Tuesday, 28th September 2021

It's been quite a while since my last post but work has continued with the version of BBC BASIC (Z80) for the Sega Master System. Most of the features I have been working have not been particularly exciting to write about on their own, but here are some of the notable changes:

  • All VDU code (for text and graphics output) has been moved to a separate ROM bank, freeing up around 16KB of extra ROM space. It makes calling the code slightly more complicated but in the previous post I mentioned that I had less than 100 bytes of free ROM space so this was definitely required.
  • Simplified the video mode driver interface. Each video mode has its own driver code which exposes a number of different functions. Previously, each function had its own vector in RAM which was fast but consumed a lot of precious work RAM. Now only a couple of functions (which need to be called frequently and as quickly as possible) are vectored, and all of the other functions are called via a single generic function with a parameter for which specific operation to carry out (e.g. "reset the graphics viewport", "change the text colour").
  • Rewrote the triangle filling routine and line drawing routine to ensure that the same pixels are covered whether you fill a triangle or trace its outline – previously there would be some stray pixels that leaked outside the perimeter of a filled triangle.
  • Changed the output resolution for most video modes to 240×192 with a logical to physical pixel scale factor of 5⅓ for a total logical resolution of 1280×1024, the same as the BBC Micro – graphics programs designed for the BBC Micro now properly fit on the screen.
  • Changed MODE 2 to use a smaller resolution of 160×128 with a pixel scale factor of 8. Unfortunately, there is not enough video RAM to have a unique pattern in VRAM for every character position on-screen when using all 16 colours (as in mode 2) at the higher 240×192 resolution. The new lower resolution avoids screen corruption that previously occurred when running out of VRAM:
Corrupt MODE 2 graphics     Fixed but letterboxed MODE 2 graphics
MODE 2 graphics at the full-screen 240×192 and reduced 160×128 resolutions
  • Added support for scrolling the text viewport down as well as up, this is used by the line editor which now allows very long lines to be edited in small viewports by scrolling and repainting as necessary.
  • Reworked the keyboard input routines for greater reliability and performance, including proper mapping of key codes in *FX 4,1 and *FX 4,2 and removal of "clever" (but flawed) interrupt-based detection of when a key is available (that had a habit of dropping keys and resulted in worse performance than just periodically polling the keyboard anyway).

The most significant changes have been all to do with loading and saving from tape, though. Since my previous post I have been doing a lot of work testing and refining both the hardware and software of the tape cassette interface. The hardware has now been tested with a couple of different tape recorders, as well as a pocket digital recorder, my mobile phone and my PC's sound card. The loader is much more robust and if a single block fails to load from the tape you can usually just rewind the tape a short way and it will try again, no need to rewind back to the start and start from scratch. I was previously counting the duration of half-wave cycles from the tape and comparing these to some fixed thresholds baked into the code to determine the difference between a "0" and a "1", I've now changed this to measure full-wave cycles (which aren't affected by any duty-cycle shifts in the recordings) and to calibrate the thresholds during the initial carrier tone, which allows the code to compensate for tape recorders that aren't quite running at the correct speed. On a more practical basis I've also added the ability to save to tape, motor control via a reed relay (so the tape will automatically stop after a file is loaded), and support for opening files on tape for reading or writing and accessing them on a byte-by-byte basis. As tapes are sequential access there are some limitations to this (for example, you can only advance the read pointer later in the file, you can't seek backwards) but it works quite well otherwise.

A selection of tapes and a tape data recorder.

Of course, to test all of this I needed a tape recorder! I purchased the above Grundig DCR 001 data cassette recorder as well as some blank cassettes and a couple of commercially-prerecorded ones to do so. The DCR 001 has a few nice features that make it great for loading programs:

  • A dedicated "data" mode that sets the output level at an appropriate fixed level – no need to guess about finding a suitable volume level.
  • An optional "monitor" in data mode that lets you listen to the data on the tape as it's loading.
  • A "phase" switch that can be used to reverse the phase output, to compensate for the case where the data has been recorded with a 180° phase shift.
  • An "Automatic Programme Detecting System" that lets you skip ahead to the next program or back to the previous one by pressing Rewind or Fast Forward when Play is pressed – this stops when it detects a silent gap between recorded sections of data on the tape.
Tape interface circuit for the Sega Master System.

Above is the cassette interface circuit I came up with. It's quite simple and has a notable lack of analogue sections (no op-amp chips here!) – it uses a NOT gate with a 10KΩ negative feedback resistor to turn the analogue signal from the tape recorder's earphone output into a clean digital signal that can be fed into the Sega Master System. I've tested this with an SN74LS04N, SN74HCU04N and SN74F04N and all worked well, however for best results I'd recommend the SN74LS04N. The filtering capacitors clean up some high-frequency pulses/glitches that may otherwise end up on the output; this is not something I found was much of a problem with a cassette recorder or the audio output from my phone or PC, but a pocket digital voice recorder I have seems to produce glitchy high-frequency pulses in its output (I'm not sure if that's an artefact of its DAC or in the MP3 compression it uses for its voice clips) and those capacitors clean up the signal enough to be able to read it reliably.

The remote motor control relay is a reed relay, and must be a low current device to avoid overloading the Master System's controller power supply and to also allow it to be driven directly by the hex inverter chip. I found the EDR201A0500 perfect for this role!

A grubby ALBA tape recorder.

Being able to save and load back programs on one cassette recorder is fair enough, but to really be sure my hardware and software are working as they should I thought I should see if I was able to save a program to tape on one tape recorder and then load the same program back on a different tape recorder. To this end I bought the above cassette recorder. It was clear from the listing that the record button and counter reset button were damaged, and the whole thing was quite grubby, but I thought it worth a punt.

Getting inside was very easy as all but one of the screws was missing, and it was clear that someone had got there before me as the damaged plastic parts (including two of the main screw posts as well as the snapped-off parts of the record button and counter reset mechanism) had been removed at some point. After giving the tape path a very thorough clean and replacing one of the drive belts that was turning into a tarry mess I was able to get it to play a tape, so I took everything out of the enclosure to give that a good scrub and prepared to fix the damaged plastic parts.

Repairing the record button.

By tracing the arm from one of the other buttons onto a piece of scrap ABS plastic I was able to make a replacement arm for the damaged record button, which I then glued on with cyanoacrylate and some two-part epoxy resin to fill in the gaps. The result is not especially pretty but it does the job and my blobby repair is not going to be visible unless you take the recorder apart.

Repairing the counter reset button.

The counter reset button was a bit more fiddly; the reset mechanism seems to rely on the "n" shaped bit of plastic shown above in the top left photo, and though it looked like something used to be attached to it that had snapped off I wasn't sure what it would have originally looked like. I cut up some more scraps of ABS sheet to make up a new counter reset button which seems to work pretty well, though!

A somewhat refurbished ALBA tape recorder.

The final addition was a couple of new knobs for the tone and volume sliders – these are cheap generic parts and they don't quite line up with the markings (the original sliders have off-centre indicators) but overall I think this turned out pretty well, and my main concern – being able to save a program to tape on one machine and then load it back from another – has been thoroughly tested now and I'm happy with the results. My next task is to transplant the circuit from its current breadboard layout to a neat project box!

Loading BASIC programs from cassette tape to the Sega Master System

Monday, 16th August 2021

After I posted about the pattern filling modes on Twitter I was alerted to the BBC Micro Bot website which hosts a gallery of programs that produce impressive graphical output from very short BASIC programs (short enough to fit in a Tweet!)

I tried a few of them out but unfortunately ran into problems with a lot of them that use various tricks to reduce the original program text length by embedding non-ASCII characters directly into the body of the program. My usual approach to prepare programs was either to copy and paste them into an emulator (my emulator strips out non-ASCII characters on pasting, as these can't be mapped to keystrokes) or to save them to a file and transfer that serially, and my own editor's tokeniser and BBC BASIC for Windows both had a habit of mangling the non-ASCII characters.

TV showing a picture of a shell generated from a BASIC program

The BBC Micro Bot website does not provide a way to download the BASIC programs directly but does provide a share link that allows you to export a disk image or play back the program from a virtual tape cassette. I don't have a a disk drive attached to my Master System, but the tape option seemed promising...

The basic cassette tape interface

My initial approach to loading programs into BASIC from external storage was to connect my Cambridge Z88 computer running PC Link 2 or EazyLink to the Master System with an RS-232 serial cable. This works well for me, but isn't very practical for others who might not already own a Z88, and I do want this to be an easy project for people to replicate!

Being able to load from tape would significantly lower the barrier to entry, if the cassette tape interface circuit can be constructed easily. You wouldn't even need an actual tape player; the drive belts have gone in mine (I tried to make a test recording on mine and it just unspooled the tape into the bowels of the machine) so for the time being I've been testing loading from PlayUEF, the web-based UEF player that's also found on the BBC Micro Bot website. This can be run from a browser and it works well on my mobile phone and makes loading programs from a web link an absolute doddle. Perfect!

The challenge, then, is how to get that audio data into the Master System in the first place and whether it's got enough grunt to be able to decode it in software. Using the same tape format as the BBC Micro seemed like it would be a good choice so I read this article on the Acorn cassette format on BeebWiki. The highest frequency audio signal is a sine wave at 2400Hz. As we'll need to handle both halves of each cycle we'll need to sample the signal at least 4800 times per second. As the Master System has a CPU clock of around 3.58MHz, that gives us 3.58×106/4800≈756 CPU cycles for each half of the cycle which should be more than sufficient.

Breadboard with the prototype tape interface on it

We still need to get the audio signal into the Master System somehow! I did a bit of digging online and couldn't find too much information about suitable electrical interfaces. My best lead was the circuit diagram for the BBC Micro's tape interface, however that requires several op-amps and I only have a single LM741 in my parts bin, which isn't going to do a very good job on a single 5V supply. In the short term I've used the circuit above which is a slightly modified version of this common circuit used to convert S/PDIF signals to TTL levels using an unbuffered hex inverter as an amplifier. It's not the most stable circuit and has a habit of oscillating when not fed an input signal but the +5V pullup on the output from the console inhibits this (putting a pull-up or pull-down on any of the inputs changes the mark:space ratio of the pulses). That a load on the output affects what's happening on the input is definitely not a good indication of a properly-functioning circuit, but it'll have to do for now.

Logic analyser trace of the audio signal converted to square pulses

I don't know how well it'll respond to a real tape cassette player, but the signal from my PC's sound card or my mobile phone produces very clean square pulses so it should be enough to start experimenting with until I can come up with more stable design.

Decoding the tape signal in software

To decode the signal we'll need to know whether the tape is outputting silence, a 1200Hz tone or a 2400Hz tone. I'm handling this with a simple routine that samples the current level of the input, then loops around waiting for the input level to change state, counting the number of times it takes to go around the loop before the transition. If the counter overflows before the state changes then it's assumed that the tape is outputting silence, but if the state changes before that we can check the value the counter reached to see if it got to a small value (a high frequency signal from the tape) or a large value (a low frequency signal from the tape).

To determine a suitable value for the thresholds I wrote a test program that simply called this routine in a loop 256 times, storing its results in memory, before dumping the results to the screen once all 256 values had been found. By playing a section of the tape in the middle of a data block (so there would be a good mixture of 1200Hz and 2400Hz tones) I got the rather pleasing result that 1200Hz tones appeared to make the counter reach $20 (32) and 1200Hz tones made the counter reach $10 (16), so I put the threshold value between them at 24 as an initial test.

Once we know the frequency of the tape signal at a particular point we can use this to determine the current bit – a "0" bit is a complete 1200Hz cycle (so we'd see two instances of the counter reaching >16) and a "1" bit is two compete 2400Hz cycles (so we'd see four instances of the counter <16). Knowing the bit on its own is not very useful, though, so I added byte decoding which waits for a start bit (0), receives and stores 8 data bits, then checks for a stop bit (1). I could then dump this data to the screen:

A data dump from the tape to the screen

At this point the data isn't completely correctly decoded, but it always produces the same results on every run-through which to my mind was a good sign. The text displayed at the bottom of the screen shows some recognisable fragments of the original BASIC program (as the program is tokenised the keywords are missing). The main problem I was having was down to synchronisation; as the article on BeebWiki states "an odd number of 2400Hz cycles can and does occur" and these occasional cycles were throwing me out of sync. The fix was to add an extra check when handling 2400Hz; even though four 2400Hz half-waves are expected, if a 1200Hz half-wave is detected instead it's assumed that we've gone out of sync and the bit should be decoded as a complete 1200Hz pulse instead. After making this change more recognisable text started coming through, so I was able to move on to decoding the data blocks.

Decoding data blocks on tape

Files on tape are broken up into multiple data blocks, each with a header describing the block (e.g. file name, block number, data length) followed by a variable amount of data (up to 256 bytes) containing the data itself. To load a file from tape each incoming block has to be handled. If the block number is 0 then that's the start of the file, so the filename is checked. If this matches the supplied filename (or the supplied filename was the empty string "") then the loader switches from "Searching" to "Loading" mode and the data we just received is stored in memory. From that point on each incoming block is appended, as long as the block name and number match what we expect (i.e. the same filename as before but the block number should be the previous block number + 1). This continues until the "end of file" block flag is set. A CRC-16 of the received data is also checked after each block to ensure that the data has been received successfully. If there are any errors (the block name/number doesn't match what was expected, or the CRC-16 check fails) then an error message is printed and you can rewind the tape a bit to try receiving the block again.

This doesn't sound too complicated, but I've not had too much luck implementing this successfully. My code is currently a bit of a mess and doesn't properly maintain the "Loading" or "Searching" states which means that if an error occurs at any point then it can get a bit confused trying to resolve it (to the point where it's easier to just abort the load and start again from the beginning). However, the tape block issues are not the only problem...

Conflicting BASIC program formats

The programs I'm trying to load are designed for the BBC Micro, where programs are stored in the Acorn or "Wilson" (after Sophie Wilson) format. However, BBC BASIC (Z80) was written by Richard Russell and therefore uses the "Russell" format. Both have a lot in common, fortunately but the way each line is stored in memory differs (this is covered in more detail in the "Program format" article on BeebWiki). Fortunately the formats are similar enough that it's possible to convert them reasonably trivially in-memory.

My initial plan was to leave the loaded program as it is, and to provide a star command (e.g. *FCONVERT, named after the FCONVERT.BBC program) to convert from the "Wilson" format to the "Russell" format. Unfortunately, after loading the file BBC BASIC (Z80) checks the program and applies its own fixes to it, notably writing the appropriate terminator on the end of the program. As far as I can tell it does this by following the program along, line by line, until it finds what it believes is the end. In the "Russell" format the first byte of the line is its length, and in the "Wilson" format it's a carriage return (value 13). What I think this means is that BBC BASIC (Z80) thinks that the second line of an Acorn program should occur 13 bytes in, and as this isn't likely to be the case it ends up writing its terminator 13 bytes in to end the first line. This prevents me from fixing loaded programs after the fact, as they've already had part of their contents overwritten.

Instead, I've added a check to the program after I've loaded it but before I've returned control to BBC BASIC. If I am able to read the program from start to end in "Wilson" format then I apply the conversion to "Russell" format automatically. In practice this means that both file formats load correctly, but I don't particularly like the way that if you LOAD a "Wilson"-format file then SAVE it back it'll be saved as a "Russell"-format file.

There are still some differences in the formats and the way they're tokenised but usually that can be fixed by loading the problematic lines into the line editor (*EDIT) and pressing Return without making any changes. This forces them to be retokenised and has fixed some of the issues I've run into.

Another issue is that "Wilson" format BASIC files support a line number 0, but "Russell" format ones do not; the file will be loaded and can be run, but if you try to LIST it then it'll stop when it reaches line 0 (which is usually the first one!) so programs appear to be empty. This is not something I think can (or indeed should) be fixed automatically but if necessary you can change the line number for the first line from 0 to 1 with ?(PAGE+1)=1 so there is at least one solution.

The video above shows a couple of programs being loaded from the BBC Micro Bot website onto the Master System. Even with the terrible audio to digital interface it gets the job done! The 7905=7905 messages at the end of each block during loading show the CRC-16 calculated for the received data followed by the expected CRC-16, as they match it indicates the data is getting through successfully.

Emulating the tape interface

As alluded to above my current loading system works well enough when everything is OK but it gets a bit confused when there's a fault. I never really planned out how to handle this and the code's a bit of a spaghetti mess in places so I drew out a flowchart of a more robust loader, which itself is also a bit of a mess but hopefully one that will make loading tapes more user-friendly and robust.

Unfortunately, doing any work on this interface has been very tedious as every time I make a change I need to try it on real hardware – I don't know of any Master System emulators that include tape support! I've been using my own Cogwheel emulator throughout which is itself not a very good emulator (there are no proper debugging tools, for example) but I can at least bolt on weird features I need like PS/2 keyboard emulation, so I thought it would be a good use of my time to add tape interface emulation. I started by writing a UEF (Unified Emulator Format) parser that could turn a UEF image file into a 4800 baud bitstream that could be fed into the Master System's controller port. For some reason my files end up being slightly different lengths to the sound files generated by PlayUEF – nothing major, but a handful of seconds over the length of a roughly 7 minute file. I tried loading the generated bitstream as a 4800Hz sample rate file into Audacity and it didn't quite line up with the audio file generated by PlayUEF either, but after poring over the specifications for the UEF format I was unable to figure out where the discrepancy lay.

Undeterred I pressed ahead and added this simple cassette player to the emulator, which happily seems to work well in spite of the reported timing differences:

Cassette recorder interface in the Cogwheel emulator

One thing that's notable there is that the program CIRCLE2 appears in the catalogue on-screen. When I first added the cassette interface emulation it was missing, but it would not be detected on real hardware either, which made me quite happy! Being able to more rapidly make and test changes to the tape loader code I was able to experiment and found that the tape loader seemed to miss the synchronisation byte ($2A) sent at the start of block headers on the CIRCLE2 program. This seemed to be due to going out of sync with the bit stream in the period between detecting the carrier tone and the first start bit, so I changed the code to demand exactly two 1200Hz half-waves as the start bit rather than using my more lax code that decodes the contents of the main bit stream that allows for extra 2400Hz pulses. This ensures that the start bit is always properly synchronised, and after making this change CIRCLE2 appeared in the catalogue as it should. I then reflashed the ROM chip on the cartridge and tried on real hardware, and that now picks up CIRCLE2. Being able to fix a problem that appears on hardware by replicating and resolving the same issue in software emulation justifies the few hours I spent adding the cassette tape emulation!

Unfortunately, I'm down to under a hundred bytes of free program space on the 32KB ROM I've been developing with, and I still have a lot of features I'd like to add (including the more resilient tape loader). I've been trying to put this moment off as long as possible as I'm going to have to use bank switching to map in additional memory banks, and that means a lot of difficult decisions about what code lives on which ROM pages to minimise the amount of switching that's required. Conventionally slot 2 is used as the slot to map different banks into but at the moment that's where I've mapped in the cartridge RAM to extend BASIC's memory. On the plus side very little of my code needs direct access to BASIC's memory (I mostly use my own data storage or the stack which lives at the top end of memory, far above slot 2) so this would seem like a good fit but I currently allocate storage space in BASIC's free memory for some operations (e.g. as temporary space when copying data around the screen during scrolling operations) so I'll need to change that code to use memory allocated elsewhere. It's not the most glamourous part of the project, but it'll need to be done if I am to add all the features I'd like.

Patterned fill modes for the Master System version of BBC BASIC

Tuesday, 10th August 2021

One of the features I was quite happy with in the TI-83 Plus version of BBC BASIC was the dithered fills used to provide some semblance of different colours. Sixteen different patterns were provided between black and white:

2009.01.21.01.gif

As well as baking in the sixteen different dithered patterns I added a command that let you use your own pattern tile in a very hacky method (you'd allocate the memory for the pattern via DIM and then pass the address of that memory to GCOLPAT – the extra "PAT" statement after GCOL is what did the trick).

What I didn't realise at the time is that the BBC Micro also had support for patterned fills via its Graphics Extension ROM and a much more natural way to interact with it – GCOL 16,0, GCOL 32,0, GCOL 48,0 and GCOL 64,0 let you select one of four pattern slots, and you can redefine the patterns with VDU 23,2,… to VDU 23,5,… followed by 8 bytes of pattern data, similar to how user-defined characters are already programmed.

Armed with the Graphics Extension ROM's user manual and sample programs I went ahead and implemented these patterned fills in my four- and sixteen-colour modes. Here's the output of the "shades" program, which runs in a four-colour mode and generates dithered fill patterns for the triangle to simulate different shades:

Screenshot showing a triangle with dithered fill patterns to simulate additional colour shades     Screenshot showing a triangle with dithered fill patterns to simulate additional colour shades

Depending on the number of colours of the screen mode you're currently in the 8×8 bitmap used as a fill pattern is interpreted differently; in the 16-colour mode it represents a 2×8 pattern that is repeated four times horizontally, in 4-colour mode it represents a 4×8 pattern that is repeated twice and in 2-colour modes it's a complete 8×8 pattern with no repetition. I like the idea of these 8×8 patterns so added a new two-colour video mode, mostly based on the existing four-colour mode but with the palette limited to two colours and the revised rules on interpreting fill patterns. This produces results like the following from the "pattern" demo program:

Screenshot of several different fill patterns in the new two-colour mode     Screenshot of several different fill patterns in the new two-colour mode

I've also implemented VDU 23,11,… (it resets the four patterns to their initial values) and VDU 23,12,… to VDU 23,15,… – these are helper commands that let you populate a fill pattern from a 2×4 list of colour values that is then repeated across the specified pattern number regardless of the current screen mode and without you needing to perform the bitwise arithmetic to pack the different colour values into memory properly. You may have also noticed that the four patterns are spaced sixteen GCOL numbers apart from each other (16, 32, 48, 64) – this is because you can use these pattern fills in conjuction with the logical plotting modes, e.g. GCOL 16+1,0 will OR the pattern over the existing screen contents.

Screenshot of text with a pattered fill applied

These pattern fills apply to all graphics operations, so if I break out of the shaded triangle program above when it's settled on a particularly garish palette and switch to outputting text as graphics with VDU 5 I get results like the above. I'm not sure how practical or useful that would be but it does at least show how flexible the graphics plotting system is.

Better plotting modes, text anywhere on the screen and double-height characters for Sega Master System BBC BASIC

Monday, 9th August 2021

I've made quite a few changes to the way colours and plotting modes are handled in the Sega Master System version of BBC BASIC; previously this was handled on a per-graphics-mode driver basis, but this resulted in a lot of duplicated code and some inconsistencies. The new code is a bit more straightforward but also adds quite a few new features, most obviously different plotting modes:

Screenshot showing three colours blended together with an OR plotting mode
Plotting with the OR mode

The above screenshot shows the result of filling three circles with the OR plotting mode. The red circle is colour 9, the blue one is colour 10 and the green one is colour 12. Where the colours overlap they are ORed together when drawing, so for example red OR blue is 9 OR 10 which is 11, the colour code for magenta, hence the colour between the two is magenta. Where all three colours mix, that's 9 OR 10 OR 12 which is 15, the colour code for white.

It's the logical colours (the palette indices) that are combined by these plotting operations, rather than the RGB values, it just happens that the stock palette lines up neatly with the RGB values!

Another addition I've made is support for VDU 5. In normal operation, text sent to the display is displayed at the text cursor position. In VDU 5 mode the text is instead drawn at the graphics cursor position, which allows text to be drawn anywhere on the screen, outside the confines of its usual grid. It also takes into account the graphics plotting modes mentioned above and so you can blend text with other graphics operations. Here's an example:

Screenshot showing text drawn in a wavy fashion on the screen

The somewhat blurry text has been drawn in two passes slightly offset from one another, once with colour 1 and again with colour 2, ORed together so that where the text overlaps you get colour 3, which is white. The program for this is as follows:

 10 REM Draw wavy text in a watery style.
 20 MODE 2
 30 REM Set up a suitably watery palette.
 40 COLOUR 0,0,0,64
 50 COLOUR 1,0,64,128
 60 COLOUR 2,0,128,255
 70 COLOUR 3,255,255,255
 80 REM Draw text at the graphics cursor position.
 90 VDU 5
100 REM How many lines of text will we draw?
110 RESTORE
120 L%=-1
130 REPEAT READ L$
140   L%=L%+1
150 UNTIL L$=""
160 REM Draw in two passes, slightly offset, in OR mode for a blurry effect
170 FOR C%=1 TO 2
180   GCOL 1,C%
190   RESTORE
200   Y%=(960+L%*50)/2
210   REPEAT READ L$
220     IF LEN(L$)>0 AND L$<>"-" PROCWAVES(L$, C%*4, Y%-C%*4)
230     Y%=Y%-50
240   UNTIL L$=""
250 NEXT C%
260 REM Restore normal text operation.
270 VDU 4
280 REM Wait for a key then exit.
290 REPEAT UNTIL INKEY(0)=TRUE
300 REPEAT UNTIL INKEY(0)<>TRUE
310 END
320 REM Draw some wavy text at X%,Y%
330 DEFPROCWAVES(TEXT$,X%,Y%)
340 LOCAL I%
350 X%=X%+(1280-(LEN(TEXT$))*35)/2
360 FOR I%=1 TO LEN(TEXT$)
370   MOVE X%,Y%+20*SIN(X%/80+Y%/160)
380   X%=X%+35
390   PRINT MID$(TEXT$,I%,1);
400 NEXT I%
410 ENDPROC
420 REM Text strings to display.
430 DATA "VDU 5 can be used to place text"
440 DATA "anywhere you like on the screen,"
450 DATA "not just aligned to the main"
460 DATA "text grid."
470 DATA "-"
480 DATA "This is fun, isn't it?"
490 DATA

When you're finished you can use VDU 4 to return to the normal text mode. In the video modes where I've had enough free VRAM to spare for it I've also added support for user-defined characters. You can send 23, then a character number, then eight bytes of pixel data for the 8×8 character bitmap to the VDU and then use that data for subsequent text operations (the characters from &80..&FF can be redefined). These user-defined characters can also be used as 8×8 sprites when combined with VDU 5.

I'm having quite a lot of fun writing my own test programs but it's also been useful to be able to run existing software as a test. I've tried to improve my sound handling and it now sounds a bit more faithful to the BBC Micro (though the Master System is a slower machine, so it can't quite keep up with more adventurous pieces of music).

Screenshot showing the BBC Disc System Welcome screen     Screenshot showing the 'Bones' demo
The BBC Disc System "Welcome" screen and the Bones demo

With the new graphics plotting modes and VDU 5 support the BBC Disc System "Welcome" program runs, for example, though it looks a bit rough around the edges due to mismatches in the video modes. Bones also runs, however the mismatch is even more severe here and half way through the skeleton turns red as it's running in the four-colour video mode and COLOUR 1 is mapped to red, not the white in the monochrome mode the demo is expecting. The misalignment comes from differences in the number of rows and columns of text that are available; the demo mixes graphics commands (which are mapped reasonably closely to the BBC Micro's screen) and text; if the demo expects the screen to be 40 columns wide and the best I can offer with this hardware is 32 columns, it won't line up! Of course, the BASIC program can be tweaked to get it working properly, but it's fun to see just how close it gets without any tweaking...

One recurring issue, however, is with programs that expect to be able to use Teletext features. The BBC Micro's default video mode used a Teletext character generator, which allows for bright colourful text and some limited graphics support in very little memory. I had previously mapped this mode to the TMS9918A text mode, which allows for 40 characters in 24 rows, just one row short of the BBC Micro's 40×25. It's a fast mode and is good for program editing due to the large number of text characters on screen at a time (all other modes are 32×24) however it is very limited - there's no support for colour, for example, other than the global foreground and background colours.

To provide some colour support I loaded the character set into memory twice, once normally and once inverted, so via the COLOUR statements it's possible to inject some variety into this mode. The BBC Micro's Teletext MODE 7 doesn't support the COLOUR statement, it relies on embedding special control codes in character positions on the screen that affects the rendering of the rest of the text in the line – for example, a command might change the colour, or cause the rest of the line to flash. As I can't possibly support such codes, I just ignore them, and the programs run fine but look a bit bland.

Screenshot showing double-height text being duplicated

The main exception to this, I've found, is the double-height text mode. You can switch this on within a line by storing the value 141 in a character cell and then off again with 140 afterwards. This will only display the top halves of the characters, so to display the bottom halves you need to output the same content on the line below. A lot of BBC Micro programs seems to take advantage of this feature, so most programs I'd tried ended up having doubled-up text like you can see in the screenshot above.

The character set I'm using is only 96 characters long (32-127) but I had avoided investigating this as I assumed I'd need to triple the number of characters - the regular-height ones, plus the top and bottom halves of the doubled characters. 96×3=288 which is more than the 256 characters we have available, so I didn't think this would fit.

However, I did think that maybe some characters shared parts - for example, the top halves of the colon and semicolon are the same, and the bottom halves of the semicolon and comma are the same, so I wrote a quick program to count how many unique patterns would be needed and found that it would only take 225 patterns in total, well within our 256 pattern budget!

I set up a new video mode based on the original TMS9918 text mode. This generates these 129 additional patterns for the double-height characters when initialised, and also checks the nametable when outputting characters to see if there's a "double height" control character in the same line to determine whether it should pick one of the stretched characters (picking a top or bottom half depending on whether there are an even or odd number of lines above it that also contain the double height control character).

Screenshot showing double-height text being stretched

This new mode does lose the inverted text, but I think the double-height text will be more useful in practice – or at least is better at making programs look less broken!

A four colour any-pixel-addressable video mode for the Master System

Wednesday, 4th August 2021

The video display processor in the Sega Master System is derived from the TMS9918A and provides a number of different screen modes. These modes are mostly character-based and generate the picture that is displayed on your TV based on two blocks of data in video RAM: the name table, which specifies which character appears in each cell on the screen and the pattern generator which contains the pixel data for each character.

Example of TMS9918A Text output
TMS9918A "Text" mode

The "Text" mode is probably the most straightforward example. Only two colours can be displayed on the screen; a foreground and a background colour, both selected from a fixed palette of 15 colours. The name table is 40 cells wide and 24 tall. Each cell, or text position, is a single byte that refers to one of 256 patterns in the pattern generator, so the complete name table is 960 bytes long. Each character is displayed as 6×8 pixels but the data in the pattern generator is eight pixels wide, and each row is stored as a single byte with a set bit selecting the foreground colour and a cleared bit selecting the background colour (the two least-significant bits are ignored). Each character pattern therefore takes up 8 bytes of VRAM, or 2,048 bytes (2KB) for a complete set of 256.

The "Graphics I" mode changes the character size to be the full 8×8 pixels at the expense of the number of characters that can be displayed per line; this is reduced from 40 to 32, so the name table is now 32×24 and 768 bytes long. The pattern generator is the same size as before and you can still only display 256 unique patterns on-screen at a time, but there is at least some provision for colour. The top five bits of the character/pattern number are used as an index into a 32-byte colour table, where each colour table entry contains two four-bit values corresponding to the foreground and background colours for the pattern. This means that patterns 0-7 share the same two colours, patterns 8-15 have the next set of two colours, 16-23 after that and so on; 256 patterns, in groups of eight with a different pair of colours assigned to each of those 32 groups.

Example of TMS9918A Graphics II output
TMS9918A "Graphics II" mode

Fortunately, the TMS9918A is a revised version of the original TMS9918 which adds the much more useful "Graphics II" mode. The name table is the same as before, 32×24 cells of one byte each, and the patterns are still 8×8 pixel bitmaps. However the pattern generator table is now three times the size, storing 768 unique 8×8 patterns in 6KB. To be able to address all 768 patterns when you can only store values from 0-255 in the name table the screen is divided into three sections; the top third references the first 2KB of the pattern generator (patterns 0-255), the middle third references the second 2KB of the pattern generator (patterns 256-511) and the bottom third references the last 2KB of the pattern generator (patterns 512-767). This allows every single character position in the nametable to reference a unique pattern, and therefore have a bitmap that covers the entire screen area.

The colour table has also been significantly improved. Instead of one pair of colours for each group of eight patterns, each pattern now gets its own table of colour pairs, one per row. This doesn't allow every pixel on the screen to have its own colour, but you can at least give each 8×1 region its own pair of colours. This does lead to attribute clash when drawing; you can see this in the screenshot of the cones above, where the colour from some lines bleeds into the colour set for nearby lines if they happen to pass through the same 8×1 pixel block.

Very few Master System games use the TMS9918A video modes (F16 Fighting Falcon seems to be about the only concrete example), instead using a new video mode added specifically for the Master System. This has a number of nice new features (such as hardware scrolling and two user-definable 16-colour palettes instead of a fixed 15-colour one), but it's the name table and pattern generator that's of main interest. Instead of having a separate bitmap for the pattern and its colour table, each pattern is now a 32 byte object where each 8 pixel row is encoded as four bytes, one byte for each bit of the four-bit colour palette index. That is, if you wanted to store the top row of a pattern where the leftmost pixel was set to palette index 1, the rightmost pixel was set to palette index 15 and everything else was left as 0 it would be stored like this:

.db %10000001
.db %00000001
.db %00000001
.db %00000001

The second row would then follow with another four bytes, all the way down to the bottom row for the complete 32 byte pattern. This scheme allows you to set each pixel in every pattern to any of the sixteen colours per palette, so if you could fill the screen with such patterns then it would be possible to set any pixel on the screen to any of sixteen colours, solving the attribute clash problem. Unfortunately, to do so would take 768 patterns (32×24), and if each pattern is 32 bytes then that would take up 24KB of video RAM. The Master System only has 16KB of video RAM, which would limit you to 512 patterns, so it's not possible to fill the screen with unique patterns. In practice you can't even have 512 patterns, as you still need to store the name table in video RAM, and as each name table entry has been inflated to two bytes (which does at least let you index pattern numbers above 255 without splitting the screen into thirds as per the Graphics II scheme) that leaves even less room for patterns.

In practice you are limited to 448 patterns, which is not enough to cover the screen. As you will generally start from a blank screen and then add to it, the name table could be filled with blanks by default and then as you draw over the screen of it new patterns could be allocated as required. This is how the graphics currently work in the 16-colour Master System video mode, and it allows for colourful drawings with no colour clash as long as you don't fill too much of the screen.

Example of 16-colour Master System output
Cautious use of the 16-colour Master System mode

That's the mode the sphere above is rendered in; you can see there's fine detail at its North pole without becoming subject to colour clash and it doesn't fill so much of the screen that we ran out of unique patterns to allocate. As it would be very bad to run out of patterns when attempting to draw text the character set (the 96 characters from 32-127) are permanently allocated as unique patterns which further reduces us to 352 tiles for graphics output.

It's possible to constrain the graphics viewport via VDU 24 to a smaller region which would ensure you never needed more than 352 tiles, as long as your new viewport was properly aligned to the 8×8 grid. For example, 22×16=352 which means that you could have a 176×128 pixel graphics window that you could draw to without risking running out of patterns.

Screenshot of colour clash in Graphics II     Screenshot of running out of patterns in Master System mode
Colour clash in Graphics II (left) versus running out of tiles in Master System Mode (right)

The two screenshots above show the trade-off between colour clash in Graphics II and corruption due to running out of patterns in the 16-colour Master System graphics mode, both caused by running this program:

 10 INPUT "Mode",M%
 20 MODE M%
 30 FOR I%=0 TO 47
 40   GCOL 0,I%
 50   X%=(I%/47)*1279
 60   Y%=(I%/47)*959
 70   MOVE X%,0
 80   DRAW 1279,Y%
 90   DRAW 1279-X%,959
100   DRAW 0,959-Y%
110   DRAW X%,0
120 NEXT I%
130 REPEAT:UNTIL INKEY(0)=-1
140 REPEAT:UNTIL INKEY(0)<>-1

Having to worry about your pattern budget isn't really in the spirit of fun and experimentation. If we're up against hardware and memory limitations I'd rather have a slightly less capable video mode that does at least generate correct output however you use it.

It is possible to completely avoid colour clash if you use the Graphics II mode and simply never use any colour commands, but only having two colours is a bit drab. With a bit of tinkering I came up with another graphics mode that is still using the Master System mode, but reduces the number of colours down to 4. Quite fittingly, a lot of BBC Micro screen modes only supported four physical colours, so I set this new mode to use the same black/red/yellow/white palette by default:

Example of 4-colour Master System output
Four colours on-screen at once, no clash and no corruption

In fact, the palette is the whole key to this solution. You may recall that the Master System mode supports two 16-colour palettes. Usually one is assigned as the "background" palette and the other as the "sprite" palette, however each entry in the name table contains a flag that lets you specify which palette the selected pattern should be shown with. In this new video mode, the top half of the screen is assigned to patterns 0-383 using the "background" palette, and the bottom half of the screen is assigned to the same set of patterns 0-383 but to use the "sprite" palette. The palette is then filled with four colours, like this:

Illustration of the reduced colour palette
Using all 32 palette entries to display four colours on screen simultaneously

The top row is the "background" palette and shows the four desired colours repeating in order four times. The bottom row is the "sprite" palette, and shows that each of the four colours is stretched to fill four consecutive palette entries. This use of repetition effectively turns two bits in each pattern's bitplanes into "don't care" values, depending on which palette is selected.

For example, assume we're in a pattern assigned to the "background" (top) palette and we want to display red. This is colour %01, but as the red entry is repeated we could use use any of colours %0001, %0101, %1001 or %1101 and they'll all come out red – we don't care about the top two bits.

Alternatively, we could be in a pattern assigned to the "sprite" (bottom) palette and want to display yellow. This is colour %10, but it fills the entire palette block from %1000 to %1011, so again we don't care what the lower two bits are, just what the top two bits are.

Using this somewhat redundant palette, we can pack two different four-colour patterns for different parts of the screen into a single sixteen-colour pattern. It's a shame that a halving in bit count ends up quartering the number of colours but I think it's an acceptable solution considering the hardware limitations.

Better drawing, editing, memory and emulation for BBC BASIC on the Sega Master System

Tuesday, 3rd August 2021

I have continued to work on the Sega Master System version of BBC BASIC, and it's feeling much more like a practical version of BASIC than something that was just holding together to run one specific program!

One of the key improvements is to standardise the handling of the different graphics modes. Previously I was using a coordinate system where (0,0) is in the top left corner and all drawing operations were carried out using physical device coordinates (so the screen was treated as being 256 pixels wide and 192 pixels tall). I have now moved the origin (0,0) into the bottom left corner with the Y axis pointing up and scale down all coordinates by 5, effectively making the screen 1280 logical units wide and 960 units tall. This isn't 100% compliant, as the BBC Micro and other versions of BASIC treat the screen as being 1024 units tall, but dividing by 5⅓ is considerably trickier and it would result in the graphics being squished further vertically, so I think using a logical resolution of 1280×960 is an acceptable compromise. I've added some VDU statements to allow you to move the graphics origin as well as define custom graphics and text viewports, so you can control where on the screen graphics and text appear and how they are clipped/scroll.

Screenshot of the Mandelbaum program output
The output of the "Mandelbaum" program following these changes

I have also changed the default palette to more closely match the one used by the BBC Micro and other versions of BBC BASIC. This isn't too difficult when using the Master System's "Mode 4" as that has a user-definable palette, but the legacy TMS9918A modes have a fixed palette so I've tried to match the default palette as sensibly as I can to the TMS9918A palette. It's possible to change the logical to physical palette mappings under Master System "Mode 4" via a VDU command which writes directly to the CRAM (you can either remap one of the 16 logical entries to one of the stock 16 colours, or supply an RGB value directy to select additional colours) which allows for neat tricks like palette cycling, but the TMS9918A modes currently only let you change the current text/drawing colour, not amend the palette as that's fixed in hardware.

I've also added filled/outlined circle and axis-aligned ellipse PLOT commands using some code written by Darren "qarnos" Cubitt which I originally used in the TI-83 Plus version of BBC BASIC. This code is very capable and fully accepted 16-bit coordinates for its inputs, however it was also originally designed to output to a 96×64 pixel screen so the final plotting was done with 8-bit coordinates ranging from -128..127. Fortunately the Master System's screen also fits in 8-bit coordinates at 256 pixels wide but that's not quite enough information as you also need to be able to tell if a particular point is off-screen (less than zero or greater than 255); simply clipping it against those boundaries will result in a vertical smear on the left or right edge of the screen when drawing outlines. Fortunately I was able to figure out how to modify his code to add some extra clipping status flags to ensure that ellipses were clipped and displayed correctly on any part of the screen.

Screenshot showing filled circles     Screenshot showing filled cones
Filled circles and filled/outlined ellipses make these drawings possible

The only graphics operation exposed by the mode-specific drivers before was a simple "set pixel" routine. This is fine for lines but quite slow for filling shapes so graphics mode drivers can now supply a "fill horizontal span" routine for faster shape-filling. If the routine is left unimplemented a stub routine is provided that fills the span using the "set pixel" routine.

I also added a rectangle-filling PLOT command, which is perhaps not the most exciting-sounding graphics operation but it is used to clear the screen so it is at least useful. More interesting is a triangle-filling routine, something I've never enjoyed writing!

Usually I get very bogged down in the idea that the pixels affected when you fill a triangle should exactly fit within the pixels that are outlined by drawing lines between each of the triangle's points, no more and no less. This can be a bit difficult when the triangle is filled by tracing its edges from top to bottom and drawing horizontal spans between them. If the edge is "steep" (its overall height is greater than or equal to its width) then this isn't too bad, as there's only one X coordinate for each Y coordinate where a pixel would have been plotted. However, when the edge is "shallow" (its overall width is greater than its width) there are going to be certain Y coordinates where the line drawn would have had multiple pixels plotted. In that case, where is the boundary of the horizontal span?

The cop-out answer I've used in the past has been to set up three buffers the total height of the screen and to "draw" the three lines first using the same line-drawing algorithm as the line PLOTting command, keeping track of the minimum and maximum X coordinate for each Y coordinate. When it's time to fill the triangle the minimum and maximum X coordinate for each edge can be determined based on the current Y coordinate and a span drawn between them for perfect triangles. On the TI-83 Plus this takes up four bytes per line (minimum and maximum 16-bit values) for a 64 pixel tall screen, with three buffers for the three lines that comes to 4×64×3=768 bytes, pretty bad. On the Sega Master System that would be 4×192×3=2304 bytes, totally unacceptable on a machine with only 8KB total work RAM!

Screenshot of a 3D sphere rendered by BBC BASIC
Each face of this 3D sphere is filled by two triangles.

I've instead simply done my best to interpolate from one X coordinate to the other when working my way down the triangle and filling scanlines, doing a bit of extra pre-incrementing and fudging of initial error values depending on whether it's the top half or bottom half of the triangle. My test was to draw the outline of a triangle in bright white and then to fill a dark blue triangle over the top, if any bright white pixels were visible around the outside this indicated a gap. I mostly got it filled, but I then tried my test program on a BBC Micro emulator and found the BBC Micro exhibited similar gaps so I don't think I'm doing too badly! The above screenshot of the 3D sphere was rendered using this triangle-filling code.

Screenshot showing buggy BASIC code with a PRONT instead of a PRINT statement

I've also been working on improving the line editor. This is called by BASIC when it's asking for a line of text input from you. Previously I'd only implemented adding characters to the end of the line and pressing backspace to remove characters from the end of the line; if you'd typed in a long piece of code and made a mistake at the start you'd need to backspace all the way to the mistake, correct it, then re-type the rest of the line. Now you can use the cursor keys (or home/end) to move around within the line, insert or overwrite new characters (toggled by pressing the insert key on the keyboard) at the cursor position or backspace/delete characters mid-line as required. It sounds like a small thing but it was quite a lot of code to get right and makes a big difference for usability!

Another feature I've added to aid modifying lines of code is the *EDIT command. This takes a line number as an argument and then brings up the program line you've requested in the line editor, ready for editing. The way this works is a bit sneaky, as it's not natively implemented by BBC BASIC! The trick that makes it work is the ability to override two routines, OSLINE (which BASIC calls when it wants to display the line editor) and OSWRCH (which BASIC calls when it wants to output a character).

When a valid *EDIT <line> command is entered, OSLINE is overridden with the first custom routine. This routine doesn't ask for a line of input, but instead copies L.<line> to the input buffer, overrides OSWRCH with a routine that captures data to RAM, overrides OSLINE with a second custom routine, then returns. BASIC therefore thinks you've typed in a LIST statement so it dutifully starts outputting the line via OSWRCH, but this has been overridden to write the characters to RAM rather than the screen. When it's done this BASIC then calls OSLINE again for the next line of input, which brings up the second custom OSLINE handler. This pulls the data from RAM previously captured by the OSWRCH handler, copies it to the OSLINE buffer, and dispays it on-screen as if you'd just typed it in. It then restores the original OSLINE and OSWRCH handlers before jumping back into the stock OSLINE routine, so you can continue editing the line that you'd requested via *EDIT.

A modified Monopoly cartridge, used to run BBC BASIC with an 8KB RAM expansion.

All of this hopefully makes entering programs via the keyboard less cumbersome. Of course, not having to type in programs in full every time you wanted to run them would be even better, and an attempt to give BASIC more memory provides another way to load programs in a somewhat roundabout manner.

The photograph above shows the modified Monopoly cartridge that I'm now using to test BBC BASIC on real hardware instead of the modified After Burner cartridge I was using before. The advantage of Monopoly is that it has an additional 8KB RAM on board, which is used to save games in progress. Sega's cartridge mapper allows for on-cartridge RAM to be mapped into the address range $8000..$BFFF, immediately below the main work RAM which is at $C000..$DFFF. If present, then, BASIC's memory range (which runs from PAGE to HIMEM) can be extended by enabling the save RAM and moving PAGE down.

$8000..$BFFF is a 16KB range, though, and I mentioned that Monopoly has an 8KB RAM. This is true, and what it means is that the 8KB cartridge RAM is accessible from $8000..$9FFF but is then repeated ("mirrored") from $A000..$BFFF. The cartridge RAM detection therefore has to check two things: firstly that there is RAM in the first place (which can be verified by modifying memory at $8000 and seeing if those values stick) and how big it is (which can be checked by writing to $8000 and seeing if that has also modified the data at $8000+<RAM size>). Once presence of any RAM has been determined during startup, RAM mirroring is checked in 1KB, 2KB, 4KB and 8KB offsets. If any RAM mirroring is detected, then it's assumed the RAM is the size that was being checked at the time, however if not it's assumed that the RAM is the full 16KB. At this point, PAGE (which is $C000 in a stock machine with no cartridge RAM) is moved backwards by the size of the detected cartridge RAM, e.g. to $A000 for an 8KB RAM and $8000 for a 16KB RAM. This results in 16KB or 24KB total available to BBC BASIC, a considerable upgrade from the plain 8KB work RAM!

An added bonus of this is that when you type in programs they grow upwards in memory from PAGE. As PAGE now starts within your cartridge memory, and that cartridge memory retains its contents courtesy of a backup battery, it means that if your entered program is smaller than the size of your cartridge memory you can restore it when you switch the console back on by typing OLD.

That's very well for life on real hardware, but I continue to do most of my development testing in an emulator. I'm having to use my own emulator as there aren't any other Master System emulators that also include a PS/2 keyboard emulator, but I did end up running into a very weird bug. Certain trigonometric functions in BBC BASIC were producing very wrong values, resulting in very odd-looking output.

Screenshot showing wonkily-rendered cubeScreenshot showing wonkily-rendered Mandelbaum setScreenshot showing wonkily-plotted sphere
These shapes are on the wonk

Even though the Z80 emulator at the heart of the program passed ZEXDOC (an instruction tester that checks documented functionality) I remembered that it had failed some aspect of ZEXALL which checks the undocumented flags too. I re-ran ZEXALL and found the problem was with my implementation of the bit instruction, so I worked on fixing that including emulation of the Z80's internal temporary memptr/WZ register to ensure that bit n,(hl) set bits 3 and 5 of the flag register appropriately. ZEXALL now passes, but unsurprisingly it didn't fix my problem (as I didn't really think that BBC BASIC would rely on undocumented functionality!)

I ended up isolating certain exact values that when plugged into SIN() or COS() would produce incorrect results. I then dug out my old CP/M emulator and tried plugging those values into the generic CP/M version of BBC BASIC (which has none of my code in it!) and that produced the same, incorrect, results confirming that the issue was definitely in my Z80 emulation and not in some flaw of the BBC BASIC port.

After tracing through the code and dumping out debug values at certain points and seeing where it differed to a known good reference emulator I found that the fault occurred in FMUL, and from there I found that at some point the carry flag (which is either rotated or added to registers with RR or ADC instructions) was being lost. RR looked fine but digging into my implementation for the 16-bit ADC I found the culprit: the code is the same for ADD and ADC, but in ADC if the carry flag is set then the second operand is incremented by one first. This produces the correct numeric answer, but if the second operand was $FFFF on entry to the function then adding a carry of 1 to it would cause it to overflow back to 0. As this happens before any of the rest of the calculations are made it means that the final value of the carry flag was calculated based on op1+0 instead of op1+$10000, hence the loss of the carry flag.

Fortunately, fixing this fixed the wonky output demonstrated above and I feel slightly more confident in my emulation! Now I can continue working on BBC BASIC, I've had an idea for a screen mode that has a reduced number of colours but will hopefully let you draw anywhere on the screen without running out of tiles causing odd corruption (as happens in the usual 16-colour mode 4) and without attribute clash (as happens in the TMS9918A Graphics II mode)...

Running BBC BASIC on the Sega Master System

Thursday, 22nd July 2021

I've recently been spending some time finding a way to run BBC BASIC on the Sega Master System, inspired by BASIC Month 6: The Mandelbaum Set on the RetroBattlestations Reddit community.

Photograph of the final setup with a Sega Master System running BBC BASIC and a Z88 computer acting as file store.

When this month's program was first announced I tried running it on my only unarguably retrobattlestation, my Cambridge Z88, but the screen's low (vertical) resolution didn't do the program much justice.

I thought this gave me two options:

  1. Control some external piece of retro tech to produce higher-resolution output (e.g. a printer or a plotter.
  2. Port a BASIC interpreter to another retro system with a higher resolution display and run the program on that.

Unfortunately, I don't own any old printers or plotters (or a Logo-like turtle!) so option 2 seemed my best option. I had some experience adapting Richard Russell's BBC BASIC (Z80) to run on the TI-83 Plus calculator so I thought I should pick the Sega Master System as that also has a Z80 CPU in it. Here are some rough specs:

  • 3.58(ish)MHz Z80 CPU.
  • 8KB work RAM.
  • TMS9918A-derived VDP for video with 16KB dedicated VRAM accessed via I/O port.
  • SN76489-derived PSG for sound.
  • Two controller ports with six input pins and two pins that could be configured as inputs or outputs.
  • Software loaded from ROM cartridge or card slot with a small BIOS ROM that detects whether a cartridge or card is inserted (and if not, runs its own built-in game).

There is some precedence to this endeavour with the computer version of Sega's SG-1000, the SC-3000, which came with a keyboard and had BASIC ROM cartridges.

An RS-232 serial adaptor and PS/2 keyboard adaptor for the Master System.

I wanted to try to keep this project as retro as possible, so no modern microcontrollers as I've been accused of cheating by using them in the past. I did have to make a couple of adaptors to allow me to plug in a keyboard and to give the Master System a serial port to load or save programs over – more about these later!

Loading BASIC onto the Master System

The first problem was getting BASIC onto the Master System at all. As my Master System doesn't have BASIC in ROM (it comes with Hang-On, which is perhaps more fun but less likely to handle the Mandelbaum set) I'd need to load the program onto a cartridge or card. Master System cartridges usually contain a mapper circuit to handle bank switching in the lower 48KB of the Z80's address space (the upper 16KB contains the 8KB work RAM, appearing twice) which is normally integrated directly into the ROM chip for the game, however there are a handful of games that have separate mapper chips and ROM chips and the ROM chips that Sega used have a pinout that is extremely close to that used by common EEPROMs (e.g. 29F010,
49F040) with only a couple of pins needing to be swapped around. After Burner is one such cartridge, so I've modified an old copy to let me plug in flash memory chips that I can program with the version of BASIC I'm working on. A switch at the top of the cartridge lets me switch back to the original pin configuration if I want to play the original copy of After Burner.

Typing commands into BASIC

With a flashable cartridge to hand I was able to assemble a version of Richard Russell's BBC BASIC (Z80) with some simple code stubs in place to direct text output to the screen. Output is useful but only half the story, we still need to be send commands to BASIC!

The Master System doesn't have its own keyboard, so I'd need to find some way to interface one to it. I have previously used PS/2 keyboards in a number of projects, as they are pretty simple to deal with. Electrically, they use two open collector I/O lines for bidirectional data transfer (one as clock, one as data), and fortunately each controller port of the Master System has two pins that can be configured as outputs which can therefore be used to interface with a keyboard (either left as inputs and pulled weakly high in their idle state, or driven low as outputs for their active state).

Showing the internal wiring of the PS/2 keyboard adaptor and controller passthrough.

It acts as a pass-through cable so you can still have a regular controller plugged in when using the keyboard.

The two pins on the Master System control port that can act as outputs are TH and TR; TH is normally used by light guns to latch the horizontal counter in the video chip so is unused with normal controllers so it's no great loss here, but TR is the right action button (marked "2") so by using this pass-through adaptor you do unfortunately lose one of the controller buttons. However, you do gain a hundred or so keyboard keys, so I don't think it's too bad a compromise...

For the software side of things I adapted my Emerson AT device library, previously written for the TI-83 Plus calculators, to the Sega Master System hardware. This library handles the low-level AT device protocol and also translates the raw keyboard scancode values to the corresponding characters.

Saving and loading programs

At this point I was able to type in BASIC programs and run them on the Master System, which was pretty neat! However, I was still working on adding new features (e.g. drawing commands for graphics) and having to type in the entire Mandelbaum program after every change was going to get pretty exhausting. I could bake it into the ROM but that seemed like cheating, so I thought I should try to find a way to load the file from an external source. A floppy disk or tape cassette would seem authentically retro but adding a floppy drive controller to the Master System would be a fairly complicated task and I don't have a suitable data cassette recorder to even attempt loading from tape so I thought some sort of file store accessible over a serial port would be a good option.

The Z88 has a serial port and can act as a remotely-controlled file store when running the PC Link software (with the protocol documented here. This seemed like a good choice, if not for the fact that the Sega Master System doesn't have a serial port of its own. To get around this, I added one, using a MAX232 chip to adapt the Master System's 5V logic levels to RS-232 compatible ones so I could plug in a null modem cable from the Z88 or my PC without accidentally frying the Master System with -12V.

Showing the insides of the RS-232 serial port adaptor.

I wrote some code that bit-bangs the serial data over the controller port lines using the timing loop code I wrote for a previous BASIC Month (Crisps Tunes) to support rates between 19200 and 300 baud. 19200 baud is somewhat unreliable but that's OK because the Z88's 19200 baud is unreliable too, so the default 9600 baud speed does a good job. RTS/CTS handshaking has to be implemented, as there is no hardware serial support on the Master System and it needs to be actively polling the port to receive any data. In doing so I noticed one awkward fact about my PC's serial port - if you de-assert RTS it will continue sending data until its buffer is empty, presumably only checking the RTS line when it's about to top up the buffer. In practice this means that even if I change RTS virtually as soon as the start bit for the first byte is received, the PC will continue to send up to 256 bytes before stopping. To get around this I added a serial receive buffer that immediately checks for the next byte even after asserting RTS, and this seems to have done the trick.

The protocol used by PC Link requires acknowledgement after every single byte so is very slow but at least it's reliable. I plumbed the PC Link code into BASIC's LOAD and SAVE which makes loading and saving programs as transparent and easy as if you had a floppy disk in the system instead!

Pressed for size

The Master System has 8KB of RAM. Of this, 16 bytes are mapped to special hardware functions and BBC BASIC reserves 768 bytes for itself, so we're already down to 7,408 bytes. I initially reserved 256 bytes for my own needs (display settings, VDU command buffer, serial port status, keyboard status etc) bringing it down to 7,152 bytes. The 16KB of display memory is not directly accessible to the CPU, so it can't be used for additional work RAM, and having to access it indirectly via I/O ports is very slow but I can't afford to mirror parts of it in RAM for speed.

Initially the Mandelbaum program ran well enough by stripping out comments, but I then added sound support (with eight 13-byte envelope definitions, and four channels with their own state, copy of their active envelope and a command queue for 32 bytes per channel, adding around another 140 bytes of memory usage) and the program stopped running with a "No room" error during execution (performing a square root operation, of all things!) so I guess the tolerances were very tight. I went and combine more lines of code into single lines and replaced two-letter variable names with single-letter ones (no, really) and it was able to run again but I don't think 8KB is a particularly comfortable amount of RAM for a BASIC computer!

Further design considerations

The version of the BASIC host interface used here is very much a work-in-progress. I would need to extend it considerably to be useful, including:

  • Fuller support of different VDU commands, e.g. redefining character shapes, changing the text and graphics viewports, better colour handling (differentiating between logical palettes and physical palettes).
  • Better support of other modes (so far only TMS9918A "Text" and "Graphics II" modes are used, there is a Master System-specific "mode 4" but that lacks graphics support and only takes advantage of hardware scrolling for extremely fast program LISTing).
  • Implementation of more graphics commands - so far only PLOT 4 (MOVE) and PLOT 5 (DRAW) are implemented. They also use a non-standard coordinate system with (0,0) in the top left of the screen and a screen resolution of 256x192, whereas for standardisation with other BBC BASIC implementations this should move (0,0) to the bottom left and use a logical resolution of 1280x1024 or similar.
  • Support of other file systems rather than rely on a Z88 running PC Link, e.g. using I²C EEPROMs (as they use two open collector pins, so could be plugged into a controller port via a passive adaptor).
  • Support of extra RAM, either integrated directly on a custom cartridge or using the battery-backed SRAM supported by some other cartridge types.
  • Native controller support via BASIC ADVAL command (at the moment you can access the controller ports directly with GET(&DC))

This is before even getting into adding anything machine-specific (e.g. to take advantage of scrolling tilemaps or hardware sprites), but getting the BASICs down (and consistent!) is a very important starting point. Consistency is useful, after all I was able to get the original program running under BBC BASIC with very minimal changes to it.

But, in the short term, I have at least succeeded in what I set out to do, which was to run the Mandelbaum program on a retro "computer" wholly unsuited to it!

The above video provides another demonstration of the setup – playing the Cold Tea music demo, albeit with a heavily stripped-down version of the visuals as the Master System doesn't have anything that can match the capabilities of the BBC Micro's teletext mode 7.

A parallel port and a demonstration of the Z80 computer

Sunday, 5th September 2010

The last piece of hardware to add to the computer was a parallel port. These have eight data lines and nine assorted control and status lines. My last two 8-bit I/O expanders provide sixteen of these seventeen lines, and the final one was provided by the DS1307 real-time clock chip which happily has a spare pin on it that can be used as an output.

Parallel port I/O expanders Parallel port connector

This parallel port can be used to print from the computer. Some software has printing capabilities built in (such as the text editor VEDIT Plus), but by pressing Ctrl+P in CP/M any text sent to the display will be simultaneously sent to the printer.

I also needed to mount the LCD inside the case. I bought a plastic strip to try to make a nice frame for it, but couldn't cut it accurately enough by hand so have had to make do with merely sticking the LCD behind a rectangular hole cut in the aluminium. It's not the neatest arrangement and doesn't protect the LCD from scratches but is better than nothing.

To demonstrate the computer's hardware and software, I recorded a video:

I'm not desperately happy with the way it came out; I really need to find a better microphone and the angle of the sun and variable weather when I made the video threw the white balance off. On the plus side, I did find out how to capture crisp black and white video with my TV capture card; I connected the composite video output from the computer to the luma pins on the S-video input on the capture card, then dropped the saturation to zero in VirtualDub. For some reason this produces great quality video, in comparison to the composite input which produces a fuzzy mess — there shouldn't really be any difference with a black and white signal (regular television sets don't have any problems).

A clock and a serial port for the Z80 computer

Tuesday, 17th August 2010

At the end of the previous entry I mentioned that I was going to start developing my own programs for the Z80 computer. The first is a graphical clock, taking advantage of my implementation of the BBC Micro's VDU commands and the ability to use those commands to draw graphics onto the screen as well as text:

Graphical analogue clock for CP/M 3

I have uploaded the code and binary to my site for anyone who is interested, though it will only work on a machine running CP/M 3 and that is equipped with a display that implements a handful of BBC Micro VDU commands.

The computer features a display for output and a keyboard for input which is sufficient if you're interacting with a human but it's often nice for computers to be able to speak to eachother, so I've added an RS-232 serial port.

RS-232 driver and port from the inside

RS-232 is a bit of an unfriendly beast. Whereas the computer's logic uses 0V to indicate a logic low (0, "false") and 5V to indicate a logic high (1, "true") RS-232 uses around +12V for a logic low and -12V for a logic high. This requires that the outgoing signals are inverted and boosted and the incoming signals are inverted and reduced to protect the inputs of the receiver circuit. Fortunately you can easily get hold of chips that perform this task for you when aided by a number of capacitors; in my case I'm using an ST232, which is shown in the bottom left of the above photo. A DE-9M connector is provided on the outside of the case, much like the one you'd find on your desktop if you were trapped in the 1990s.

One issue I have yet to solve is handshaking. The serial port sends or receives data on two wires (TXD and RXD respectively). The receiver has to handle each incoming byte from the transmitter. As the receiver may be busy performing other tasks at the time it may end up receiving data faster than it can process it and it will start losing bytes. There are a number of different ways to avoid this problem. The simplest electronically is to use XON/XOFF handshaking; in this configuration, the receiver can send the XOFF byte to the transmitter when it's busy and the transmitter will stop sending data temporarily. The receiver can then send XON back to the transmitter when it's ready to receive more data. This technique has one major drawback — it prevents you from sending binary data containing the XOFF or XON bytes.

An alternative solution is to add two wires to the serial connection — Request To Send (RTS) and Clear To Send (CTS). These can be used to signal when each device is available to accept data. This allows you to send XOFF and XON directly over the serial port (extremely useful for binary data) yet requires the addition of two more wires to the port.

Unfortunately whilst implementing both techniques is possible, CP/M only internally refers to XON/XOFF handshaking; there is no way to select RTS/CTS handshaking. I think what I will end up doing is have CP/M's XON/XOFF refer to handshaking in general and then add a hardware-specific utility that lets me choose which particular type of handshaking I wish to use. This utility could also help me select other serial port configuration settings that CP/M doesn't expose (such as parity, number of stop bits or number of data bits).

Z80 computer session in PuTTY

With the hardware installed, the AVR I/O controller updated to use it and the BIOS reprogrammed to expose it to CP/M it is possible to interact with other computers over the serial port. CP/M features five logical I/O devices: CONIN and CONOUT for general console input and output, AUXIN and AUXOUT for general "auxiliary" output and LST for printer output. The BIOS exposes two physical devices; CRT for the keyboard and video display controller and RS232 for the serial port. By using the DEVICE utility you can connect these logical and physical devices together. In the above screenshot I have connected the serial port to both CONIN and CONOUT. This allows me to connect my desktop PC to the Z80 computer using a null modem cable and use terminal emulation software (such as PuTTY) to talk to it.

Simulated BBC Micro VDU mirroring console output

The above screenshot shows VirtualDub capturing the output of the video display controller next to an instance of BBC BASIC for Windows which is running the following program:

aux%=OPENIN("COM2: baud=9600 parity=N data=8 stop=1")
REPEAT
  REPEAT:UNTIL EXT#aux%
  VDU BGET#aux%
UNTIL.

This passes any data received over the serial port to the simulated VDU in BBC BASIC for Windows. As both video devices accept the same commands the result is that both show approximately the same thing.

I have been slightly improving the video display controller as I've gone along. One feature I had to add for the clock was the ability to draw text characters at the graphics cursor position, as opposed to the fixed text grid (this is used to draw the numbers around the dial). At the same time I added the ability to redefine the appearance of characters. One obvious use of this feature is to change the font, but when combined with the ability to render text anywhere on the screen some simple sprite-based games could be written for the computer. Each letter is just a 8×8 pixel sprite, after all.

MODE 2

Another feature I added was a simple implementation of MODE 2 where characters are stretched to sixteen pixels wide. You can't get much text on the screen in this mode but it may be useful for games.

A useful Z80 computer in a project box

Saturday, 14th August 2010

Work continues on the Z80 computer. The two final modifications to the box itself are the holes for the status LEDs and the power switch.

Status LEDs Power switch

The green LED indicates power and the orange one disk activity. Unfortunately, the project box is fairly scratched on the outside (one scratch on the front is my own fault, but the sides and back were fairly scuffed and scratched when I bought it). If anyone has any tips for polishing scratches out of ABS I'd be glad to hear them; the usual household polishing abrasives (such as toothpaste) remove most of the light scuffs and result in a lovely mirror finish, but don't do anything to the deeper scratches. I'll probably invest in the finest grade wet-and-dry sandpaper I can find and have a go with that followed with a Brasso polish, and if that doesn't help (or makes it worse) just sand the whole thing down and paint it.

Pin header connector

The circuit board inside the case needs to be attached to the case-mounted components somehow. In simpler projects I've resorted to soldering these connectors directly to the board, but this can make maintenance a problem (to remove the circuit board one would have to cut and resolder the wires). For this project I've left pin header strips on the board. The external connectors have leads soldered to them terminated with pin headers cut to size using some wire cutters and a rotary tool to polish them off; these headers are pictured above.

Circuit board mounted inside the case

The main circuit board can then be easily installed or removed from the case as required. The small circuit board for the video display controller is connected to the main circuit board in the same way.

Z80 and SRAM pin numbers marked

A Z80 computer can't live up to its name without some sort of a Z80 inside it, so I thought that that was the most obvious part to add next. Computers also generally need access to memory so I decided to add the 128KB SRAM chip at the same time. The Z80 communicates with the memory over an eight-bit data bus, a sixteen-bit address bus (to indicate which address in memory it is reading from or writing to) and a number of control lines (to indicate whether the current operation is a memory read or a memory write, for example). This provides a fairly tedious amount of soldering work; each pin on the memory needs to be connected to the corresponding pin on the Z80. To aid in the construction I stuck masking tape to the bottom of the perfboard around the outline of where the two chips would go and wrote the pin numbers onto the tape, shown in the photograph above.

Z80 and SRAM address, data and control buses

I put the two chips close together so I could put all of the bus wires on the inside of the IC holders rather than going around the outside. This saves a bit of space and avoids having to route the wires around the chip holders which gets a little untidy. The above photograph shows all of the wires in place before the chip holders were soldered in. Adding those in should be a quick and easy job, at least...

SRAM IC socket soldered in the wrong way around

Well, you'd have thought so, but somehow I managed to solder in the 32-pin SRAM socket the wrong way around. Each socket has a notch to help you align the chip using its corresponding notch. As you can see in the above photo the notch points right when it should point left like all of the other sockets. It wouldn't affect the operation of the circuit (as long as the SRAM chip was inserted with the notch to the left) but it looks untidy and I may as well do the job properly.

SRAM IC socket soldered in the correct way around

On the positive side I suppose I got to practice my desoldering skills.

Z80 and AVR data bus connections

The computer design uses an AVR microcontroller to manage the I/O devices (such as the keyboard, video display controller and SD card) and to load the OS into the Z80's memory on reset. To achieve this the Z80 and the AVR need to be connected together. The above photograph shows some new wires between the AVR (bottom left) and Z80 (bottom middle) to connect the Z80's data bus to the AVR's PORTA and a number of other wires to connect the Z80's control lines to several other I/O pins on the AVR. A number of pull-up resistors have been added to control lines on the Z80 so that when nothing is driving the control bus they rise high (the de-asserted state). If left disconnected ("floating") the other components connected to the control bus may think these lines had gone low (asserted) and treat that as a read or write operation, corrupting data.

I/O expanders Soldering detail of the I/O expanders

The AVR also needs to be connected to the Z80's address bus. This would take another sixteen pins if driven directly by the AVR; sixteen pins that aren't available to me! I am therefore using two MCP23S08 eight-bit I/O expanders, pictured above, to drive the address bus from the AVR. These are controlled over the SPI bus, which only takes up three pins on the AVR (these pins are shared with other SPI peripherals, such as the SD card) plus a single chip select pin that is unique to the I/O expanders. Four pins is better than sixteen, at any rate.

All ICs to date installed Computer in its project box

I keep mentioning chips even though the sockets are quite clearly empty in the above photographs. As I was approaching a useful computer circuit at this point I plugged all of the chips into their sockets to test the connections. As there was no SD card, real-time clock or keyboard I had to modify the boot loader on the AVR quite considerably; I started with a test program that wrote random data to blocks of memory then read them back to verify that they had written correctly. Once I had verified that the AVR was able to access memory correctly I reprogrammed it to copy a small Z80 program to memory and then let the Z80 take over. This Z80 program repeatedly output the string 'Z80' to the console output port. With everything plugged in I switched on the computer and saw the screen fill with Z80Z80Z80… so I was pretty certain that I'd wired everything up correctly!

DS1307 and battery clip

At this point I could start reintroducing the various peripherals to the computer. A DS1307 is used as a real-time clock. This clock needs to keep running when the computer is switched off, so I've added a 3V battery connector to the computer to keep it ticking.

SD card slot

As the computer uses a 512MB SD card for storage, I have added a pin socket strip to the board to plug in the SD card slot I scavenged from a card reader. The card is connected to the SPI bus along with the I/O expanders used to drive the Z80 address bus. SD cards run at 3.3V rather than the 5V that nearly everything else on the board uses so I've used a series of voltage dividers to drop the voltage on each input pin from 5V to around 3V (the resistor values I have don't allow me to get to 3.3V; 3V is the closest I can manage without going over 3.3V). The video display controller board also runs on 3.3V so I do at least have a suitable voltage supply for the card!

Keyboard connector

The final part of the computer that was on the breadboard prototype but not yet in the final build was the keyboard connector. This is simply a four pin header on the board that is connected to the PS/2 port screwed to the case. However, when I tried to use the computer, the keyboard didn't appear to work. Pressing Num Lock, Caps Lock or Scroll Lock would toggle the associated LED and hitting Ctrl+Alt+Del would reboot the computer but no other key worked. This implied that the AVR was handling the keyboard correctly but the Z80 wasn't receiving any notification of key presses. A bit of digging identified the problem; I'd forgotten to connect the Z80's interrupt pin to the AVR! When a key is pressed the AVR triggers an interrupt to let the Z80 know that a key is available. By soldering a wire between the two chips it started working as intended.

Z80 computer in its enclosure

The computer is now up to the same standard as it was when assembled on the breadboard, but is much more practical to work on. I hope to add a serial and parallel port to the computer soon, and would like to mount an LCD into the lid of the project box, but for the time being I am happy that I have managed to get this far.

Z80 computer running VEDIT

One of the advantages of running CP/M on the computer rather than my own operating system is the availability of existing software. The above photograph shows the computer running VEDIT, which is an excellent visual text editor.

VEDIT for CP/M

Zork for CP/M

With the hardware in a decent configuration I can start writing my own software. I think the first CP/M program I'll write is a graphical analogue clock, as this is the sort of program that can be left running for long periods as a way to check the stability of the computer.

Mounting circuit boards and rear panel connectors

Monday, 9th August 2010

One of the fun things about working with electronics is that you can end up with a physical product at the end of your hard work. To this end I have started moving my Z80 computer from its current breadboard to a more permanent enclosure.

Project box outside Project box inside

Large project boxes can be quite expensive (around £40, it seems), but the one I picked out was a slightly more reasonable £7. It's not the prettiest enclosure I've seen but it should be large enough to house the computer and provide space on the lid for the LCD and on the rear surface for a collection of connectors (as you'd expect to find on the rear of any computer).

Perfboard shown inside the computer.

The first challenge was how I intended to mount the circuit board within the box. The perfboard I will use for the main computer circuit doesn't fit the marked mounting posts on the bottom of the project box; it's too narrow and too deep. What the photo doesn't show very well is that the perfboard is not able to lie flat in the box due to the curve at the rear of the box. To raise the board above the bottom of the box I decided to use four PCB spacers, which required two new holes to be drilled into the perfboard away from its corners.

Two new holes for PCB spacers Underside of the perfboard showing PCB spacers

I decided that the video display controller, which resides on its own board, should be mounted on the main circuit board using PCB spacers too.

Holes drilled to support the VDC VDC mounted on the main circuit board

This required four more holes to be drilled into the main circuit board. I tried to align the small video display board so that its 16-way pin socket for connection to the LCD was as close to the horizontal centre as possible.

Holes drilled to support the main circuit board Using the main circuit board to find the position of all of the screw holes

The base of the project box needed to have four holes drilled into it to support the main circuit board. Once the two nearest the front edge had been drilled, I screwed the circuit board to the back of the project box to mark the position for the other two holes to ensure that they lined up exactly with the holes drilled in the circuit board.

Both circuit boards mounted inside the box

Screws come through the bottom of the project box to hold the main circuit board in place. Some sticky foam feet are provided with the project box which will raise it off the surface it is resting on to prevent these four screws from leaving scratches! Due to the curve at the back of the box the circuit board is only a few millimetres above its surface, which is why I reversed the screws holding the video display board to leave the long threaded ends pointing upwards.

Power supply Power supply soldering detail

As working on the enclosure is a fairly noisy activity I switched my attention to the electronics for a brief spell. The first part of the circuit I assembled was the power supply; this just uses a pair of voltage regulators to provide 5V and 3.3V from an external power supply (I use a cheap wall wart affair rated at 7.5V DC).

Oscillator

I decided that the next part to tackle would be the oscillator. This uses a 20MHz crystal and a 74LS04 according to the design on z80.info to generate a 20MHz clock signal which will be further divided by two to produce a 10MHz clock signal for the Z80. I had some real problems with this design; it would run at 20MHz until I attached a load to it, at which point it would generate a fairly random-looking signal or stop oscillating entirely. I experimented with a few different capacitors and found that if I remove the 120pF capacitor and replace it with a 33pF capacitor on the other end of the crystal it works reliably. I'm not entirely sure why this is, but it's the design I've been using for a while with the computer on a breadboard so I'm happy to keep it this way for the time being.

ATmega644P

I added a D flip-flop to divide the 20MHz clock to 10MHz and then added the ATmega644P microcontroller to the board. This has a jumper next to its clock input allowing for the selection of either 20MHz or 10MHz operation; a pin header to the left of this jumper allows for it to be programmed in-circuit.

VDC reinstalled in the case

With those new parts in place I reinstated the video display board to check that everything still fit. My main concern now was how far the connectors screwed into the rear of the case would intrude and whether there'd be any problems with them getting in the way of the circuit boards.

Rear panel marked for mounting connectors

I sketched a design of how I saw the connectors would fit on the back of the case and then copied the layout to some masking tape stuck to the case. The computer naturally needs a power supply and keyboard input, and the video display board accounts for the VGA connector and an RCA connector for composite video (which I neglected to mark). I also hope to include a serial port and a parallel port in the final design (though neither are currently supported by the software) so left space for those two connectors.

Hole drilled for the keyboard connector Keyboard connector mounted in the case

The 6-way mini-DIN connector for the keyboard is the deepest one to contend with so I decided to start with it. I cut the hole in the case by drilling a small hole in the plastic which I then enlarged with a burr tool to the correct shape and size.

Keyboard connector screwed in

Fortunately it looks like there's plenty of room in the case for connectors!

Connectors for the serial port, composite video output and DC input Inside view of the case with some more connectors installed

The next few connectors confirm this. I really do not enjoy cutting the holes for D-sub connectors (such as the one for the serial port); they don't have much of a metal lip to hide a botched hole, so I have to cut very slowly and very carefully, taking a very long time to slowly enlarge each hole until the connector fits. I'm therefore not really sure why I decided to have three D-sub connectors in this computer design; maybe I'm just a glutton for punishment.

Completed rear panel Rear panel as seen from the inside of the case

Finally, the rear of the case is completed. I will leave the masking tape on there as scratch protection until I have finished the front of the case (this will be significantly simpler — just a power switch, power LED and disk activity LED). Once that is done I can resume working on the electronics!

Integrating the dsPIC33 VDC with the Z80 computer

Saturday, 31st July 2010

The ultimate goal for the video display controller module I have been working on is to drive the display in my Z80 computer project. As I have now got a pretty good set of features I thought it would be a good idea to join the two projects together.

Z80 computer with dsPIC33 VDC

The big board in the lower middle of the above photograph is the main body of the computer, including the Z80, its RAM, the ATmega644P that is used to handle I/O, an SD card for storage and a DS1307 real-time clock. The small board in the bottom left of the photo is the power supply (supplying both 5V and 3.3V) and clock generator (providing a 20MHz and 10MHz clock).

At the top of the photo is the video display controller, connected to a 320×240 graphical LCD. A pin header is used to connect this VDC board to the rest of the computer. Three pins are required for power; 0V, 3.3V (dsPIC33 and output buffer) and 5V (LCD). The VDC is connected to the computer's ATmega644P I/O controller using the two-wire I2C bus (the same bus that is used to access the DS1307 clock). Rather than run a series of graphical demos, the VDC now waits for commands to be written to the I2C slave address 0xEE which it acts on to control what is shown on the screen. I'm aiming for these commands to work in the roughly same way as they did on the BBC Micro VDU, which should make porting the enhanced TI-83+ version of BBC BASIC to this computer a bit easier. The BBC Micro's VDU could be accessed by calling OSWRCH (assuming it was being used as the current output stream), which typically has an address of &FFEE — hence my choice of 0xEE as the I2C slave address!

Detail of the LCD connected to the VDC

A handful of these VDU commands have been implemented, which is sufficient to run simple CP/M software. The generic CP/M version of BBC BASIC does not, naturally, support any hardware-specific features and as such lacks advanced text or drawing support (one can send commands directly to the output stream with the VDU statement but this isn't very user-friendly). I will need to work on this now that the hardware is coming together! The current VDC code can be downloaded here if you are interested in the changes that have been made.

The above photo shows the newly constructed VDC hardware. All of my previous projects have been assembled on stripboard; as the projects have become more complex or simply smaller I've found stripboard to be increasingly awkward to work with. ICs can only really be orientated in one direction, and to reduce the size of circuits I've had to start cutting the tracks between holes (rather than the usual method which is to drill out an entire hole). The supplier I normally acquire parts from, Bitsbox, recently added three different sizes of perfboard to their catalogue so I thought I'd give it a go. I've found it much more pleasant to work with than stripboard, though not as easy to correct if you make a mistake and need to desolder a connection. You can certainly perform some interesting space-saving tricks on the underside of the board!

The underside of the VDC showing the soldering technique

The Kynar insulation on the wire I switched to using also has the advantage of not melting when heated with a soldering iron, as I've had problems in previous projects where tightly-spaced wires will end up getting shorted together as the insulation between them melts.

I have mentioned that one pin header is used to connect the VDC to the computer. There are three others on the board; the two-pin one is for the composite video output, the six-pin one is for connection to a PICkit to reprogram the dsPIC and the four-pin one for the VGA output.

Detail of VGA output from the VDC

Now that I have moved the VDC onto a permanent circuit board I feel that I can start moving the rest of the computer in the same direction. The software is far from complete and the hardware is pretty rudimentary but it does basically work and having a more robust system to work on should make life a bit easier.

Booting CP/M 3 from an SD card

Wednesday, 23rd June 2010

Up to this point I have been running CP/M 2.2 on the Z80 computer. CP/M 3 adds a number of useful features, including the following:

  • Support for more than 64KB RAM via banked memory.
  • Standardised access to real-time clock for file date and time stamping.
  • Improved text entry on the command-line when using the memory-banked version, such as the ability to move the cursor when editing and recall the previously entered line.
  • Support for disks with physical sectors larger than the default record size of 128 bytes.

Switching to a banked memory system would require some new hardware in the form of a memory management unit so I have stuck with the simpler non-banked system for the time being. Support for physical disk sectors larger than 128 bytes is more interesting (SD cards use 512 byte "blocks") and real-time clocks are always useful so I have started working on updating to CP/M 3.

Z80 computer with new SD card slot and real-time clock
Z80 computer with new SD card slot (bottom left) and real-time clock (top right)

CP/M consists of three main pieces of software:

  • A BIOS which exposes a small number of routines to perform primitive, hardware-specific operations (e.g. output a character to the console, read a raw sector from a disk, check if a key has been pressed).
  • The BDOS which provides the main API for transient programs (e.g. read a complete line of input from the console, create a file, read a record from a file).
  • The CCP, or console command processor, which provides the main user interface for loading and running other programs or performing some basic tasks via its built-in commands. This would be analogous to COMMAND.COM on DOS.

When working with CP/M 2.2 I had source files for these three pieces of software, so I just needed to implement the 17 BIOS functions, reassemble the three files to fixed addresses in memory and load them to these fixed addresses using the AVR when booting the computer. These three files were stored in the lower 8KB of the flash memory chip and were not accessible from within CP/M itself.

CP/M 3 proved to be a bit more of a challenge, as it is loaded slightly differently. The CCP is stored as a regular file named CCP.COM on the floppy disk you're booting from, so only the BIOS and BDOS need to be loaded from their hiding place at the start of the boot disk. These two pieces of software are merged into a single file named CPM3.SYS by a CP/M utility named GENCPM. To get this utility to work I needed to provide GENCPM with a hardware-specific BIOS3.SPR file that implemented the 31 BIOS routines. Fortunately, a file named BIOSKRNL.ASM is provided that implements most of the boilerplate code involved with writing a BIOS (you still have to provide the hardware-specific routines yourself, but your task is made much easier by following the template) so I just needed to recompile that for a non-banked system and link it with my handful of hardware-specific routines.

A log of a session in CP/M 3

Ideally, CPM3.SYS would be stored on the regular file system with CCP.COM and the hidden boot loader would load CPM3.SYS for you. CP/M 3 does provide a small boot loader for this purpose (aptly named CPMLDR) which employs a cut-down BDOS and BIOS to load CPM3.SYS from the file system into memory for you. I haven't been able to get it to work, though, so I currently parse and load CPM3.SYS using some C code on the AVR. This works well enough for the time being, as can be seen in the above output generated by the computer when testing the real-time clock.

DS1307 real-time clock

The time and date is maintained by a DS1307, an inexpensive eight-pin real-time clock and calendar chip that is shown in the middle of the above photograph. It is accessed over the I2C bus using a protocol that is natively supported by the AVR hardware. It uses binary-coded decimal to represent dates and times, which corresponds nicely to the time format used by CP/M; however, CP/M represents dates as a 16-bit integer counting the number of days since the 31st December 1977. I have used the algorithms on this website to convert dates to and from this format and the individual components.

The only downside of the DS1307 is that it only stores a two-digit year number, not the four digits one would hope for. This means that the century is discarded when setting the real-time clock, allowing for you to set a date that is then retrieved differently (truncated to the range 1930..2029). I haven't thought of a suitable solution to this problem just yet. I could use the AVR to act as the real-time clock, but I would then lose the advantage of the DS1307's battery backup that kicks in when the main power supply is removed.

The state of the DS1307 is effectively random at power-up. One of the first things the computer does when booting is to read the current date and time and check that all fields are within range. If not it resets them to midnight on the 1st January 1978 and displays a message to indicate that it has done so.

SD card in slot

The SD card has been a bit of a headache to get working and though it currently only supports reading, not writing, it should hopefully be a useful addition to the computer. Rather than the previous arrangement of series rectifier diodes to drop the supply voltage and zener diodes to protect the inputs I'm using a dedicated 3.3V regulator to power the card and resistor voltage dividers to drop the 5V logic signals to around 3V (the closest I could get to 3.3V with the resistors I had to hand). I'm using the disk image from the old 512KB flash chip and treating the card as having 128 byte sectors so the arrangement is no more capable than before and in some cases quite a lot slower (reading a 128 byte record now entails reading a whole 512 byte block from the card then returning the desired 128 byte range within that block) but it seems to be as reliable as it used to be at least. SD cards append a CRC16 checksum when transferring data blocks so I can hopefully detect errors more easily and their on-board flash memory controller should perform wear-levelling, prolonging the life of the card.

To write the disk image to the card I used HxD which makes the job as easy as copy and paste. One problem I did have is that it displayed an "Access denied" error when attempting to write data, which I assume to be because something in Windows was using the card at the same time as HxD. I knocked together a short program for the AVR that wrote junk to the first block of the card, the result being that Windows no longer recognised the card's file system and HxD managed to write the data to the disk with no further problems.

An SD card reader from Poundland

Sockets for regular SD cards seem to be relatively expensive for what they are, but the above SD card reader cost a pound (what else?) from Poundland. A bit of work with a soldering iron and some desoldering tools yielded some useful components:

Parts from the disassembled SD card reader

The crystal is unmarked and I'm hardly short of LEDs but the USB A connector could be a good way to reduce the size of a project that plugs into a USB port (USB B connectors are rather bulky) and the SD card slot works brilliantly for my needs here. There are cheaper and nastier ways to add an SD card slot to your project, but something like this feels more robust and has the advantage of reporting the state of the card's write protection switch.

Keyboard input and RAM disks make CP/M more useful

Wednesday, 16th June 2010

The hardware for the computer has changed in (mostly) subtle ways since the last post, with the exception of a PS/2 socket for connection to a keyboard.

Z80 computer with PS/2 keyboard socket

PS/2 keyboards (which use the same protocol as the older AT keyboard) communicate with the host by clocking data in either direction (keyboard to host or host to keyboard) over two wires, appropriately named "clock" and "data". An AVR pin change interrupt is used to detect a change in state of the clock line and either input or output a bit on the data line depending on the current direction of data transmission. Incoming bytes generally relate to the scancode of the key that has just been pressed or released. These scancodes are looked up on a series of hard-coded tables to translate them into their corresponding ASCII characters. CP/M accesses the keyboard via two BIOS routines: CONST (2), which checks whether a character is available or not, and CONIN (3), which retrieves the character. I initially implemented these by simply reading from I/O port 2 (CONST) or port 3 (CONIN).

As keyboard input is polled, CP/M was wasting a lot of time reading from the AVR. Due to the AVR's relatively slow way to respond to I/O requests this was slowing down any program that needed to periodically call CONST (for example, BBC BASIC constantly checks for the Escape key when interpreting BASIC programs). I converted this polling system into an event driven one by connecting the AVR to the Z80's maskable interrupt pin, /INT. When a new key is received by the AVR it pulls /INT low to assert it. The Z80 responds to the interrupt request by setting an internal flag to remember that a key has been pressed and acknowledges the interrupt by outputting a value to port $38 (the Z80's maskable interrupt handler resides at a fixed address of $38 in memory, so this seemed like a sensible choice). The AVR detects this write to port $38 and returns /INT to its high state. The CONST routine can now directly return the value of this flag when polled (rather than having to request the flag from the AVR) which noticeably speeds up running programs. The flag is cleared when a key is read by calling CONIN.

I did have some difficulty getting the interrupt system to work; the Z80 has a number of different ways of responding to interrupts, two of which rely on fetching a value from the data bus by asserting /IORQ before an interrupt is serviced. IM 0 fetches an instruction from the bus and executes it, and IM 2 fetches the least significant byte of the address of the interrupt service routine to combine with the most significant byte stored in the I register. IM 1 (which is what I'm using) just jumps to the fixed address $38. However, I hadn't taken this additional data read into account and when the Z80 attempted to read from an I/O device the AVR was either putting nonsense on the bus or (deliberately) locking up with a message to indicate an unsupported operation. Fortunately you can easily tell the difference between a regular I/O request and an interrupt data request by checking the Z80's /M1 output pin, so with that addition things started working a bit more smoothly!

BBC BASIC test session with the Z80 computer

I'm still using terminal emulation software on my PC to view the output of the computer, though as I now have keyboard entry the results are a little more impressive than the few boot report lines and a prompt that were in the last entry. I still haven't worked out why my PC switches off or blue-screens when programming AVRs over the serial port, so I've soldered together a parallel port programmer for the time being.

Programming hardware

The pinout of the programmer matches that of the website where I found the SI Prog design. The ATmega644P's SPI, power and reset pins that the programmer interfaces with are all adjacent, but not in the same order as the ones in the SI Prog, hence the small board to the right of the above photo which swaps the pin order around using wires soldered to its reverse (this saves a lot of breadboard space). The board in the middle plugs directly into the parallel port programmer and is used to program the 512KB flash memory chip I'm using for storage.

I haven't got around to implementing writing to this flash memory yet, unfortunately, though I have implemented a simple way to test a writable disk drive. The RAM chip I am using is a 128KB one, as Farnell didn't sell 64KB ones. The Z80 can only address 64KB without additional memory banking hardware, so I'd simply tied A16 low and was ignoring half of the memory. I have now edited the BIOS to expose two disk drives; the default A: (512KB of flash memory) and now B:, a 64KB RAM drive. A16 is now driven by the AVR; during normal operation, it is held low (giving the Z80 access to its usual 64KB) but during disk operations it can be driven high to grant the AVR access to the previously hidden storage.

Testing the RAM disk

In the above test I use the STAT command to check free space, the PIP command to copy BBCBASIC.COM from A: (flash) to B: (RAM) then run BBC BASIC from the RAM disk, save a program then run it again by passing its filename as a command-line argument to BBC BASIC. At the end I try to copy the new program back to A:, but as there is no writing support for flash it keels over with a fairly unhelpful generic CP/M error.

Now that I've finally got something working in a vaguely usable manner, I hope I can start to research ways to make it better. Sorting out writing to flash would be a good start (I'm sorely tempted by jbb's suggestion to use an EEPROM to map logical floppy sectors to physical flash sectors) and I certainly hope to dig out my 320×240 pixel graphical LCD and driver for output instead of relying on a desktop PC. I'd also like to upgrade to CP/M 3 (I'm currently using CP/M 2.2) but when I last looked at that it seemed like a much more involved process so I decided to keep it simple. There's a fair mountain of stuff I need to take in, but I'm certainly learning a lot as I go (I only just realised tonight that CP/M was capable of graphics output, for one). I'd be a very happy chap if I could eventually run WordStar on this computer!

Combining a Z80 and an ATmega644P to boot CP/M

Monday, 14th June 2010

I've been working on a new Z80 computer over the last few days. I would say that I had been working on the existing Z80 computer were it not for the fact that this a completely new design.

The previous computer had two 32KB RAM chips to provide a total of 64KB RAM. To run a user program you need to get it into RAM somehow, so I also included a 128KB ROM chip which occupied the lower 16KB of the Z80's address space to provide the fixed operating system that could be used to load programs. By adding memory banking hardware I could select one of eight 16KB pages of ROM. The next 16KB was one of two banks of RAM from one RAM chip, and the final 32KB was mapped directly to the other RAM chip.

Previous Z80 computer memory map

This is all fairly complicated, and not very flexible. Programs written for CP/M tend to be loaded into memory starting at address $0100, which is impossible with my old design as that section of memory is taken up by ROM.

Giving another device access to the buses

The Z80 accesses memory and other hardware devices using three buses; an eight-bit data bus which shuttles bytes of data between the various chips, a sixteen-bit address bus which addresses a location in memory or a particular I/O device, and a control bus which contains numerous lines that specify the type of operation (for example, if /MREQ and /WR go low together it indicates that a byte is being written to memory, or if /IORQ and /RD go low together it indicates that a byte is being read from an I/O device).

There is also a pin named /BUSREQ that can be used to request access to these buses. The Z80 will periodically check this pin and if it is held low it will put the data, address and control buses into a high-impedance state and drive /BUSACK low to acknowledge this. This effectively removes the Z80 from the circuit, and another device can now drive the buses.

This is the feature which I have based the new design around — the current prototype is pictured above. It features a Z80 and 128KB of SRAM (only 64KB is currently addressable) on the upper board. On the lower board is an ATmega644P microcontroller, which is used to start the computer.

When the circuit is reset, the ATmega644P requests access to the buses from the Z80. When access has been granted, it proceeds to copy the CP/M BIOS from the 512KB flash memory IC to a specific location in RAM (currently $F200). It then writes the Z80 jump instruction jp $F200 to the start of memory, returns control of the buses to the Z80 and pulses its /RESET pin. The CP/M BIOS then runs directly on the Z80.

As the ATmega644P doesn't have enough pins to drive all of the buses directly, I've added sixteen GPIO pins by using two MCP23S08 8-bit I/O expander chips. These are used to drive or sample the Z80 address bus; the data and control buses are driven or sampled directly by the GPIO ports on the ATmega644P.

Using a slow to respond microcontroller for I/O

The Z80 is most useful if it can talk to the outside world somehow, which is usually achieved by reading from or writing to I/O devices. In my previous design I built these out of latches and lots of glue logic. As I've added a powerful microcontroller to the computer which features a number of useful on-board peripherals, it would seem sensible to use that instead.

One problem with this idea is that the Z80 expects to read or write to an I/O device in a mere four clock cycles. The AVR has a delay between an interrupt occurring (such as a pin state changing) and executing interrupt service routine of at least five clock cycles. Even though the AVR is running at twice the clock speed of the Z80 this still doesn't provide much time to sample the address bus and perform some useful action before returning a value to the Z80. Fortunately, the Z80 has another useful pin, /WAIT, specifically to address this concern. By pulling this pin low the Z80 can be stalled, allowing the I/O device plenty of time to respond. I have included a 7474 D-type flip-flop as an SR latch to control the /WAIT pin. When the Z80's /IORQ pin goes low the flip-flop is reset, which pulls the /WAIT pin low. When the AVR notices that the /IORQ line has gone low it samples the address bus, performs the requisite task then sets the flip-flop, which drives the /WAIT pin high again and the Z80 continues executing the program.

The 7474 is a dual D-type flip-flop, so I have used the second flip-flop to halve the AVR's 20MHz clock signal to provide the 10MHz clock for the Z80.

CP/M interacts with the host computer by calling numbered BIOS functions. I have implemented a number of these BIOS functions by outputting a value to a port number that matches the BIOS function number. For example, CONOUT is function number four and is used to send the character in register C to the console.

CONOUT:
    ld a,c
    out (4),a
    ret

The AVR detects a write to port 4 and sends the incoming byte to one of its UARTs. I have connected this UART to a simple transistor inverter (pictured in the top right of the above photograph) and plugged the output from that into one of my PC's serial ports, so by running a terminal emulator I can see the output of CP/M on the screen. I have implemented only a handful of other functions (WBOOT outputs a value to port 1 to indicate that I should load the BDOS and CCP into RAM from the flash memory and READ can be used to copy 128 byte floppy disk sectors from flash memory to Z80 RAM) so the results are not exactly impressive:

Loading BIOS...OK
Loading BDOS...OK
Loading CCP...OK

A>

As I haven't implemented console input yet there's no way to type at the prompt, but that it gets that far is encouraging.

I haven't implemented writing to the flash memory due to a mistake I made when reading its datasheet. When writing to flash memory the value you write is ANDed with the data that's already there (you can only set a 1 bit to a 0 bit, but not vice-versa) – this is referred to as programming. If you want to write a 1 bit you have to erase the memory before writing to it (this is unsurprisingly referred to as erasing). Flash memory can be split into pages (small regions, in this case 256 bytes) and sectors (large regions, in this case 64KB). You can often program any number of bytes (up to a page at a time, aligned to page boundaries) but can only erase in larger blocks — pages, sectors, or the entire memory (bulk erase). I thought that the flash memory ICs I bought supported page erasing, but they only support sector erasing. CP/M transfers data between floppy disks and RAM in 128 byte floppy disk sectors, so to write an updated sector I would need to read 64KB from the flash memory, update a 128 byte region within it, erase an entire flash sector, then program the 64KB back to it. This would be very slow and quickly wear out the flash memory, so I am looking for some replacement flash memory ICs which do support page erase.

SPI flash memory programmer

To copy the system files and a sample disk image to the flash memory I cobbled together the above parallel port programmer which is driven by an application cobbled together in C#. It's rather slow but gets the job done — unlike my AVR programmer. After finally managing to get CP/M to boot in a satisfactory manner I made a few tweaks to the AVR program and hit the "Build and Program" button in the editor. The code built, but rather than program the AVR my computer switched off. No error message, not even a blue screen, just a sudden and surprising power down. Since then I've only managed to talk to the AVR once; every other time has resulted in either a power down or blue screen. I had hoped to add some keyboard handling routines to the project to at least be able to interact with CP/M, but after fiddling around for an hour and a half without managing to get anything working again I gave up. I wish I knew why it suddenly stopped working, after hours of reliable service — maybe it's a hint that it's time to buy a proper USB debugger rather than the cheap and cheerful home-made serial port programmer I've been using!

Power supply insidesPower supply enclosure

One equally cheap but useful addition to my tools is the above 5V power supply (yes, it's just a 7805 regulator in a box). Every project I have built needs a 5V supply from somewhere, which usually comes from a 7.5V wall wart power supply unit regulated to 5V with a 7805. This takes up valuable breadboard space and the weight of the cable from the power supply tends to drag the breadboard around the smooth surface of my desk, so having a dedicated box with an on-off switch, indicator LED, reverse voltage protection and an easy way to connect to the circuit via 2mm sockets is very handy indeed.

I now need to find a way to program AVRs without my PC switching itself off before I can make any more progress on the project...

Thinking about CP/M

Wednesday, 24th February 2010

It's been some time since I worked on my Z80 computer project, but the recent electronics projects I've completed have got me thinking about it again.

I did record a video to demonstrate the basic parts of the computer and some of its flaws a few months ago, which can be seen above. However, I'm now thinking of a more radical redesign than fixing the I/O board's shortcomings.

One of the reasons for my lack of motivation is that even if I did get something working I wouldn't have much software to run on it; it would be a lot of work to write software that only ran on that one particular machine. BBC BASIC helps somewhat, but an even better solution would be to model the device on an existing machine and run its operating system on it.

Fortunately, there was a popular operating system for the 8080 (and, by extension, the Z80) – CP/M. This is a very simple operating system that inspired DOS. Crucially, it is not hardware-specific, the source code is available and there is a wide range of software available for it, including BBC BASIC.

CP/M is made up of three main components. At the highest level is the Console Command Processor, or CCP. This provides the command-line interface, a handful of built-in commands and handles loading and executing external programs. It achieves this with the aid of the Basic Disk Operating System, or BDOS, which exposes a number of useful routines for a variety of tasks, such as outputting text to the display, searching for files on the disk or reading console input.

Both of the above components are machine-independent – they simply need to be copied to the correct address in RAM when the computer starts. Relocating them to a particular address requires setting a single value in their respective source files and reassembling them, which is nice and easy. It's the third component – the Basic I/O System, or BIOS – that requires a bit more work. This is the only part that is tailored to a particular machine's hardware, and my current implementation is listed below.

CCP    = $DC00
BDOS   = $E406
BIOS   = $F200

IOBYTE = $0003
CDISK  = $0004

DMAAD  = $0008
CTRACK = $000A
CSEC   = $000C

.org BIOS

	jp BOOT    ; COLD START
WBOOTE
	jp WBOOT   ; WARM START
	jp CONST   ; CONSOLE STATUS
	jp CONIN   ; CONSOLE CHARACTER IN
	jp CONOUT  ; CONSOLE CHARACTER OUT
	jp LIST    ; LIST CHARACTER OUT
	jp PUNCH   ; PUNCH CHARACTER OUT
	jp READER  ; READER CHARACTER OUT
	jp HOME    ; MOVE HEAD TO HOME POSITION
	jp SELDSK  ; SELECT DISK
	jp SETTRK  ; SET TRACK NUMBER
	jp SETSEC  ; SET SECTOR NUMBER
	jp SETDMA  ; SET DMA ADDRESS
	jp READ    ; READ DISK
	jp WRITE   ; WRITE DISK
	jp LISTST  ; RETURN LIST STATUS
	jp SECTRAN ; SECTOR TRANSLATE

DISKPARAM
	.dw $0000  ; No sector translation.
	.dw $0000  ; Scratch
	.dw $0000  ; Scratch
	.dw $0000  ; Scratch
	.dw DIRBUF ; Address of a 128-byte scratch pad area for directory operations within BDOS. All DPHs address the same scratch pad area.
	.dw DPBLK  ; Address of a disk parameter block for this drive. Drives with identical disk characteristics address the same disk parameter block.
	.dw CHK00  ; Address of a scratch pad area used for software check for changed disks. This address is different for each DPH.
	.dw ALL00  ; Address of a scratch pad area used by the BDOS to keep disk storage allocation information. This address is different for each DPH.

DIRBUF
	.fill 128
	
DPBLK           ; DISK PARAMETER BLOCK, COMMON TO ALL DISKS
	.DW 26      ; SECTORS PER TRACK
	.DB 3       ; BLOCK SHIFT FACTOR
	.DB 7       ; BLOCK MASK
	.DB 0       ; NULL MASK
	.DW 242     ; DISK SIZE-1
	.DW 63      ; DIRECTORY MAX
	.DB 192     ; ALLOC 0
	.DB 0       ; ALLOC 1
	.DW 16      ; CHECK SIZE
	.DW 2       ; TRACK OFFSET

CHK00
	.fill 16

ALL00
	.fill 31

; =========================================================================== ;
; BOOT                                                                        ;
; =========================================================================== ;
; The BOOT entry point gets control from the cold start loader and is         ;
; responsible for basic system initialization, including sending a sign-on    ;
; message, which can be omitted in the first version.                         ;
; If the IOBYTE function is implemented, it must be set at this point.        ;
; The various system parameters that are set by the WBOOT entry point must be ;
; initialized, and control is transferred to the CCP at 3400 + b for further  ;
; processing. Note that register C must be set to zero to select drive A.     ;
; =========================================================================== ;
BOOT
	xor a
	ld (IOBYTE),a
	ld (CDISK),a
	jp GOCPM

; =========================================================================== ;
; WBOOT                                                                       ;
; =========================================================================== ;
; The WBOOT entry point gets control when a warm start occurs.                ;
; A warm start is performed whenever a user program branches to location      ;
; 0000H, or when the CPU is reset from the front panel. The CP/M system must  ;
; be loaded from the first two tracks of drive A up to, but not including,    ;
; the BIOS, or CBIOS, if the user has completed the patch. System parameters  ;
; must be initialized as follows:                                             ;
;                                                                             ;
; location 0,1,2                                                              ;
;     Set to JMP WBOOT for warm starts (000H: JMP 4A03H + b)                  ;
;                                                                             ;
; location 3                                                                  ;
;     Set initial value of IOBYTE, if implemented in the CBIOS                ;
;                                                                             ;
; location 4                                                                  ;
;     High nibble = current user number, low nibble = current drive           ;
;                                                                             ;
; location 5,6,7                                                              ;
;     Set to JMP BDOS, which is the primary entry point to CP/M for transient ;
;     programs. (0005H: JMP 3C06H + b)                                        ;
;                                                                             ;
; Refer to Section 6.9 for complete details of page zero use. Upon completion ;
; of the initialization, the WBOOT program must branch to the CCP at 3400H+b  ;
; to restart the system.                                                      ;
; Upon entry to the CCP, register C is set to thedrive;to select after system ;
; initialization. The WBOOT routine should read location 4 in memory, verify  ;
; that is a legal drive, and pass it to the CCP in register C.                ;
; =========================================================================== ;
WBOOT

GOCPM
	ld a,$C3      ; C3 IS A JMP INSTRUCTION
	ld ($0000),a  ; FOR JMP TO WBOOT
	ld hl,WBOOTE  ; WBOOT ENTRY POINT
	ld ($0001),hl ; SET ADDRESS FIELD FOR JMP AT 0
	
	ld ($0005),a  ; FOR JMP TO BDOS
	ld hl,BDOS    ; BDOS ENTRY POINT
	ld ($0006),hl ; ADDRESS FIELD OF JUMP AT 5 TO BDOS

	ld bc,$0080   ; DEFAULT DMA ADDRESS IS 80H
	call SETDMA

	ei            ; ENABLE THE INTERRUPT SYSTEM
	ld a,(CDISK)  ; GET CURRENT DISK NUMBER
	ld c,a        ; SEND TO THE CCP
	jp CCP        ; GO TO CP/M FOR FURTHER PROCESSING

; =========================================================================== ;
; CONST                                                                       ;
; =========================================================================== ;
; You should sample the status of the currently assigned console device and   ;
; return 0FFH in register A if a character is ready to read and 00H in        ;
; register A if no console characters are ready.                              ;
; =========================================================================== ;
CONST
	out (2),a \ ret

; =========================================================================== ;
; CONIN                                                                       ;
; =========================================================================== ;
; The next console character is read into register A, and the parity bit is   ;
; set, high-order bit, to zero. If no console character is ready, wait until  ;
; a character is typed before returning.                                      ;
; =========================================================================== ;
CONIN
	out (3),a \ ret

; =========================================================================== ;
; CONOUT                                                                      ;
; =========================================================================== ;
; The character is sent from register C to the console output device.         ;
; The character is in ASCII, with high-order parity bit set to zero. You      ;
; might want to include a time-out on a line-feed or carriage return, if the  ;
; console device requires some time interval at the end of the line (such as  ;
; a TI Silent 700 terminal). You can filter out control characters that cause ;
; the console device to react in a strange way (CTRL-Z causes the Lear-       ;
; Siegler terminal to clear the screen, for example).                         ;
; =========================================================================== ;
CONOUT
	out (4),a \ ret

; =========================================================================== ;
; LIST                                                                        ;
; =========================================================================== ;
; The character is sent from register C to the currently assigned listing     ;
; device. The character is in ASCII with zero parity bit.                     ;
; =========================================================================== ;
LIST
	out (5),a \ ret

; =========================================================================== ;
; PUNCH                                                                       ;
; =========================================================================== ;
; The character is sent from register C to the currently assigned punch       ;
; device. The character is in ASCII with zero parity.                         ;
; =========================================================================== ;
PUNCH
	out (6),a \ ret

; =========================================================================== ;
; READER                                                                      ;
; =========================================================================== ;
; The next character is read from the currently assigned reader device into   ;
; register A with zero parity (high-order bit must be zero); an end-of-file   ;
; condition is reported by returning an ASCII CTRL-Z(1AH).                    ;
; =========================================================================== ;
READER
	out (7),a \ ret

; =========================================================================== ;
; HOME                                                                        ;
; =========================================================================== ;
; The disk head of the currently selected disk (initially disk A) is moved to ;
; the track 00 position. If the controller allows access to the track 0 flag  ;
; from the drive, the head is stepped until the track 0 flag is detected. If  ;
; the controller does not support this feature, the HOME call is translated   ;
; into a call to SETTRK with a parameter of 0.                                ;
; =========================================================================== ;
HOME
	ld bc,0
	jp SETTRK

; =========================================================================== ;
; SELDSK                                                                      ;
; =========================================================================== ;
; The disk drive given by register C is selected for further operations,      ;
; where register C contains 0 for drive A, 1 for drive B, and so on up to 15  ;
; for drive P (the standard CP/M distribution version supports four drives).  ;
; On each disk select, SELDSK must return in HL the base address of a 16-byte ;
; area, called the Disk Parameter Header, described in Section 6.10.          ;
; For standard floppy disk drives, the contents of the header and associated  ;
; tables do not change; thus, the program segment included in the sample      ;
; CBIOS performs this operation automatically.                                ;
;                                                                             ;
; If there is an attempt to select a nonexistent drive, SELDSK returns        ;
; HL = 0000H as an error indicator. Although SELDSK must return the header    ;
; address on each call, it is advisable to postpone the physical disk select  ;
; operation until an I/O function (seek, read, or write) is actually          ;
; performed, because disk selects often occur without ultimately performing   ;
; any disk I/O, and many controllers unload the head of the current disk      ;
; before selecting the new drive. This causes an excessive amount of noise    ;
; and disk wear. The least significant bit of register E is zero if this is   ;
; the first occurrence of the drive select since the last cold or warm start. ;
; =========================================================================== ;
SELDSK
	ld hl,DISKPARAM
	ld a,c
	or a
	ret z
	ld hl,$0000 ; Only disc 0 is supported.
	ret

; =========================================================================== ;
; SETTRK                                                                      ;
; =========================================================================== ;
; Register BC contains the track number for subsequent disk accesses on the   ;
; currently selected drive. The sector number in BC is the same as the number ;
; returned from the SECTRAN entry point. You can choose to seek the selected  ;
; track at this time or delay the seek until the next read or write actually  ;
; occurs. Register BC can take on values in the range 0-76 corresponding to   ;
; valid track numbers for standard floppy disk drives and 0-65535 for         ;
; nonstandard disk subsystems.                                                ;
; =========================================================================== ;
SETTRK	
	ld (CTRACK),bc
	ret

; =========================================================================== ;
; SETSEC                                                                      ;
; =========================================================================== ;
; Register BC contains the sector number, 1 through 26, for subsequent disk   ;
; accesses on the currently selected drive. The sector number in BC is the    ;
; same as the number returned from the SECTRAN entry point. You can choose to ;
; send this information to the controller at this point or delay sector       ;
; selection until a read or write operation occurs.                           ;
; =========================================================================== ;
SETSEC
	ld (CSEC),bc
	ret

; =========================================================================== ;
; SETDMA                                                                      ;
; =========================================================================== ;
; Register BC contains the DMA (Disk Memory Access) address for subsequent    ;
; read or write operations. For example, if B = 00H and C = 80H when SETDMA   ;
; is called, all subsequent read operations read their data into 80H through  ;
; 0FFH and all subsequent write operations get their data from 80H through    ;
; 0FFH, until the next call to SETDMA occurs. The initial DMA address is      ;
; assumed to be 80H. The controller need not actually support Direct Memory   ;
; Access. If, for example, all data transfers are through I/O ports, the      ;
; CBIOS that is constructed uses the 128 byte area starting at the selected   ;
; DMA address for the memory buffer during the subsequent read or write       ;
; operations.                                                                 ;
; =========================================================================== ;
SETDMA
	ld (DMAAD),bc
	ret

; =========================================================================== ;
; READ                                                                        ;
; =========================================================================== ;
; Assuming the drive has been selected, the track has been set, and the DMA   ;
; address has been specified, the READ subroutine attempts to read one sector ;
; based upon these parameters and returns the following error codes in        ;
; register A:                                                                 ;
;                                                                             ;
;     0 - no errors occurred                                                  ;
;     1 - nonrecoverable error condition occurred                             ;
;                                                                             ;
; Currently, CP/M responds only to a zero or nonzero value as the return      ;
; code. That is, if the value in register A is 0, CP/M assumes that the disk  ;
; operation was completed properly. If an error occurs the CBIOS should       ;
; attempt at least 10 retries to see if the error is recoverable. When an     ;
; error is reported the BDOS prints the message BDOS ERR ON x: BAD SECTOR.    ;
; The operator then has the option of pressing a carriage return to ignore    ;
; the error, or CTRL-C to abort.                                              ;
; =========================================================================== ;
READ
	out (13),a \ ret

; =========================================================================== ;
; WRITE                                                                       ;
; =========================================================================== ;
; Data is written from the currently selected DMA address to the currently    ;
; selected drive, track, and sector. For floppy disks, the data should be     ;
; marked as nondeleted data to maintain compatibility with other CP/M         ;
; systems. The error codes given in the READ command are returned in register ;
; A, with error recovery attempts as described above.                         ;
; =========================================================================== ;
WRITE
	out (14),a \ ret

; =========================================================================== ;
; LISTST                                                                      ;
; =========================================================================== ;
; You return the ready status of the list device used by the DESPOOL program  ;
; to improve console response during its operation. The value 00 is returned  ;
; in A if the list device is not ready to accept a character and 0FFH if a    ;
; character can be sent to the printer. A 00 value should be returned if LIST ;
; status is not implemented.                                                  ;
; =========================================================================== ;
LISTST
	out (15),a \ ret

; =========================================================================== ;
; SECTRAN                                                                     ;
; =========================================================================== ;
; Logical-to-physical sector translation is performed to improve the overall  ;
; response of CP/M. Standard CP/M systems are shipped with a skew factor of   ;
; 6, where six physical sectors are skipped between each logical read         ;
; operation. This skew factor allows enough time between sectors for most     ;
; programs to load their buffers without missing the next sector. In          ;
; particular computer systems that use fast processors, memory, and disk      ;
; subsystems, the skew factor might be changed to improve overall response.   ;
; However, the user should maintain a single-density IBM-compatible version   ;
; of CP/M for information transfer into and out of the computer system, using ;
; a skew factor of 6.                                                         ;
;                                                                             ;
; In general, SECTRAN receives a logical sector number relative to zero in BC ;
; and a translate table address in DE. The sector number is used as an index  ;
; into the translate table, with the resulting physical sector number in HL.  ;
; For standard systems, the table and indexing code is provided in the CBIOS  ;
; and need not be changed.                                                    ;
; =========================================================================== ;
SECTRAN
	ld h,b
	ld l,c
	ret

Quite a number of the above routines simply output the value of the accumulator to a port. This is because I'm running CP/M in a Z80 emulator that I've knocked together, and am handling writes to particular ports by implementing the machine-specific operations (such as console input or output) in C#. The floppy disk file system is also emulated in C#; when the program starts, it pulls all the files from a specified directory into an in-memory disk image. Writing to any sector deletes all of the files in this directory then extracts the files from the in-memory virtual disk image back into it. This is not especially efficient, but it works rather well.

To turn this into a working bit of hardware, I intend to replace the C# part with a microcontroller to handle keyboard input, text output and interfacing to an SD card for file storage. It would also be responsible for booting the system by copying the OS to Z80 memory from the SD card. I'm not sure the best way to connect the microcontroller to the Z80, though; disk operations use DMA, which is easy enough, but for lighter tasks such as querying whether console input is available or outputting a character to the display it would be nice to be able to go via I/O ports. A couple of I/O registers may be sufficient as per the current design; a proper Z80 PIO would be even better if I can get my hands on one.

Of more concern is a suitable display; the above screenshot is from an 80-character wide display. Assuming a character was four pixels wide (which is about as narrow as they can be made whilst still being legible) imposes a minimum resolution of 320 pixels horizontally – my current LCD is only 128 pixels wide (not even half way there), and larger ones are really rather expensive!

Expression Evaluation in Z80 Assembly

Tuesday, 24th February 2009

The expression evaluators I've written in the past have been memory hungry and complex. Reading the BBC BASIC ROM user's guide introduced me to the concept of expression evaluation using top-down analysis, which only uses a small amount of constant RAM and the stack.

I took some time out over the weekend to write an expression evaluator in Z80 assembly using this technique. It can take an expression in the form of a NUL-terminated string, like this:

.db "(-8>>2)+ceil(pi())+200E-2**sqrt(abs((~(2&4)>>>(30^sin(rad(90))))-(10>?1)))",0

and produce a single answer (or an error!) in the form of a floating-point number. The source code and some notes can be downloaded here.

I initially wrote a simple evaluator using 32-bit integers. I supported the operations the 8-bit Z80 could do relatively easily (addition, subtraction, shifts and logical operations) and got as far as 32-bit multiplication before deciding to use BBC BASIC's floating-point maths package instead. The downside is that BBC BASIC has to be installed (the program searches for the application and calls its FPP routine).

I'm not sure if the technique used is obvious (I'd never thought of it) but it works well enough and the Z80 code should be easy to follow - someone may find it useful. smile.gif

Z80 computer - Lines, cubes and inverted text

Sunday, 5th October 2008

I've made a few additions to the operating system for the computer. The Console module, which handles text input and output, now supports "coloured" text - that is you can set the text foreground and background colours to either black or white. This functionality is exposed via the BBC BASIC COLOUR statement. If you pass a value between 0 and 127 this sets the foreground colour (0..63 is white, 64..127 is black) and if you pass a value between 128 and 255 this sets the background colour (128..191 is white, 192..255 is black).

2008.10.05.02.Colour.png   2008.10.05.04.TextViewport.png

The image on the right also demonstrates another addition - you can set the text viewport to occupy a partial area of the display. This is most useful when coupled with the ability to define graphics viewports, which I have yet to add.

That said, I have started writing the Graphics module. So far all it can do is draw clipped lines, and this functionality is exposed via BBC BASIC's MOVE and DRAW statements. MOVE sets the graphics cursor position - DRAW also moves the graphics cursor, but also draws a line between the new position and the previously visited one.

2008.10.05.03.Line.png

I cannot use drawing code I've written for the TI-83+ version due to differences in the LCD hardware and the way that buffers are laid out. The popular way to lay out graphics buffers on the TI-83+ is as follows:

2008.10.05.05.LCD.TI.png

Each grey block represents 8 pixels - one byte in LCD memory represents 8 pixels grouped horizontally. The leftmost bit in each 8-pixel group is the most significant bit of each byte. The data is stored in the buffer so that each row of the LCD is represented by 12 consecutive bytes. This left-to-right, top-to-bottom arrangement should seem sensible to anyone who has worked with a linear framebuffer. However, due to the way that the LCD I'm using is arranged, I'm using the following buffer layout:

2008.10.05.06.LCD.Vertical.png

The LCD hardware groups pixels vertically, but when you write a byte to it its internal address pointer moves right. Furthermore, the most significant bit of each byte written is at the bottom of each group. This may sound a little confusing, but actually works out as more efficient. Writing text is easy; I'm using a 4×8 pixel font, so all I need to do is set the LCD's internal address counter correctly then write out four bytes, one for each column of the text (other sensible font sizes for the display, such as 6×8 or 8×8 are just as easy to display).

Another example of improved efficiency is if you deal with pixel-plotting routines. Each pixel on the display can be addressed by a buffer offset and an eight-bit mask to "select" the particular pixel in an eight-pixel group. With this arrangement, moving the pixel left or right is easy; simply increment or decrement the buffer offset by one. Moving the pixel up or down is a case of rotating the mask in the desired direction. If the rotation moves the pixel mask from one 8-pixel group to another (which only happens every eight pixels) the buffer offset needs to be moved by 128 in the correct direction to shunt it up or down.

On the TI-83+, moving the pixel up or down requires moving the buffer offset up or down by 12; moving the pixel left or right is a rotation as before with a simple buffer offset increment or decrement to move between 8-pixel groups.

In Z80 assembly incrementing or decrementing a 16-bit pointer by one is a single instruction taking 6 clock cycles; moving it by a larger offset takes at least 21 clock cycles, 42 if you include backing the temporary register such an operation would take.

What may be interesting to see is how well a raycaster would work on a system that has video memory arranged into columns.

Without wishing to be typecast as that programmer who loves spinning cubes, I also wrote a cube-spinning demo to test the line drawing routines as well as some integer arithmetic routines I've added (the Z80 can't multiply or divide, so these operations need to be implemented in software).

It runs fairly smoothly (bearing in mind the 2MHz clock speed). The second half of the video has the Z80 running at 10MHz; it actually seems quite stable even though the LCD is being accessed at nearly five times its maximum speed (the system did need to be reset a few times until it worked without garbling the display).

Fixed and scaled CHIP-8/SCHIP interpreter

Wednesday, 24th September 2008

The CHIP-8/SCHIP interpreter now seems happy enough to run games, though the lack of settings to control how fast or slow they run makes things rather interesting.

2008.09.23.01.FileListing.png

First of all, I've hacked together a painfully simple read-only file system. Each file is prefixed with a 13-byte header; 8 bytes for the filename (padded with spaces), 3 bytes for the extension (padded with spaces) and two bytes for the file size. The above file listing can be generated by typing *. at the BASIC prompt.

I've written a new sprite drawing routine that scales sprites up to double size when in CHIP-8 mode; this allows CHIP-8 games to fill the entire screen. Unlike the existing sprite code, which I've retained for SCHIP games, it runs entirely from ROM; the existing sprite code has to be copied to RAM as it uses some horrible self-modifying code tricks. I should probably rewrite that bit next. smile.gif

As for the bug I mentioned in the last post, it was because of this:

; --- snip ---

; Group 9:
;   * 9XY0 - Skips the next instruction if VX doesn't equal VY.
InstructionGroup.9
	call GetRegisterX
	ld b,a
	call GetRegisterY
	cp b
	jp nz,SkipNextInstruction

; Group A:
;   * ANNN - Sets I to the address NNN.
InstructionGroup.A
	call GetLiteralNNN
	ld (DataPointer),hl
	jp ExecutedInstruction

; --- snip ---

If an instruction in the form 9XY0 is executed and VX == VY, rather than jumping to ExecutedInstruction the code runs on and executes the instruction as if it had been an ANNN as well, which ended up destroying the data pointer. Adding a jp ExecutedInstruction after the jp nz,SkipNextInstruction fixed the bug.

One other advantage of the zoomed sprites is that "half-pixel" scrolls also work correctly:

2008.09.23.05.SChip.EmuTest.png

...not that I've seen any game that uses them.

2008.09.23.02.SChip.Piper.png   2008.09.23.03.Chip8.Brix.png

2008.09.23.07.SChip.UBoat.png   2008.09.23.08.SChip.Square.png

2008.09.23.04.Chip8.Blinky.png   2008.09.23.06.SChip.Blinky.png

The last two screenshots show two versions of the game Blinky, one as a regular CHIP-8 program and the other taking advantages of the SCHIP extensions.

64KB RAM and a CHIP-8/SCHIP interpreter

Monday, 22nd September 2008

The only major hardware modification since last time is the addition of another 32KB SRAM.

This appears as two 16KB pages in the $4000..$7FFF slot. Currently only the first page is used for OS variables and scratch space, freeing up the upper 32KB entirely for BBC BASIC's use.

One other minor hardware addition is support for a dual-coloured LED on the control port. This LED will be used to signify file access - reads by a green LED and writes by a red LED. As such I haven't implemented a proper file system, but typing SAVE "FILE" or LOAD "FILE" at the prompt will transfer data between the Z80 RAM and a 24LC256 32KB EEPROM. The routines do not pay attention to any file name specified - the first two bytes on the EEPROM indicate the file size, and the rest of the EEPROM is the file. I think some sort of simplified version of FAT may work well, as the EEPROM has a natural page size of 64 bytes which could be used in place of clusters.

2008.09.15.02.Memory.Board.Underside.jpg
Adding the second 32KB SRAM required soldering wires to the underside of the stripboard, not something I'd recommend!

As I have not yet added any graphical commands to BBC BASIC, and as porting assembly programs to this hardware is going to be a bit of a pain until I decide on the way the OS is going to work, I decided to try and port Vinegar to the system. Vinegar is a CHIP-8 and SCHIP interpreter - CHIP-8 programs being simple bytecode and so relatively simple to interpret.

2008.09.22.01.Chip8.Joust.png

The code I had written was difficult to port, however, being inefficiently and messily written, so I ended up rewriting all of it apart from the sprite drawing routines. The TI-83+ LCD follows the usual trend of storing 8 horizontal pixels in each byte of video memory. The LCD I have stores 8 vertical pixels in each byte of video memory, which means that each 8×8 pixel block in memory needs to be rotated by 90° before being sent to the LCD hardware. This is understandably very slow, and not helped by the Z80 only running at 2MHz. To further complicate issues, games rely on two 60Hz timers, and I have no timing hardware. The current version of the interpreter has some bugs, but is good enough to run some SCHIP programs.

CHIP-8 programs are displayed squashed in the top-left hand corner, as they're designed to run in a 64×32 video mode unlike SCHIP's 128×64 (happily, the resolution of the LCD) - typically, the one thing I really did need to fix for the new hardware, the sprite code, is the only thing I copied over. In reality, CHIP-8 graphics would need to be scaled up to fit the screen. Working out a way of getting the system to operate at 10MHz would really be a welcome upgrade!

Times, backlights and off-page calls

Sunday, 14th September 2008

Dates, times and backlights

I'm using a DS1307 real-time clock to provide the computer with real-time date and time functions. It's a great little chip - all it needs is power, two lines for I2C communications, a 32768Hz crystal between two pins and a back-up battery to keep it ticking when main power is removed and it's happy. That accounts for seven pins; the last remaining pin can be used as a one-bit output (you can set it to a high or low state in software) or it can be configured to output a square wave at 1Hz, ~4kHz, ~8kHz or ~32kHz.

2008.09.14.01.Clock.png

BBC BASIC can access the clock via the TIME$ pseudo-variable. This string variable returns the date and time in the format Sun,14 Sep 2008.15:20:00, and you can set the clock by assigning to the variable. When setting the clock you can specify either the date, the time, or both. Parsing the string has been an interesting exercise in Z80 programming, as it's not something I've ever attempted without regular expressions before!

2008.09.14.02.SettingClock.jpg

The only hardware modification since last time is a very poorly implemented software control of the backlight. The fifth bit of the control port specifies whether the backlight is on or off, and it can be toggled with the *BACKLIGHT command. I say "poorly implemented" as the transistor driver I'm using to interface the hardware port with the backlight LEDs results in a much dimmer backlight than when I had the LEDs hooked up directly to the power supply (on the positive side, at least the 5V regulator's heatsink is cool enough to touch - the backlight draws a lot of current).

Calling off-page functions

Now that I have access to all eight 16KB "pages" that make up the 128KB OS ROM, it may help to explain how one can use all of this memory. After all, if page 1 is swapped in and you wish to call a function on page 2, a regular Z80 call isn't going to work as you need to swap page 2 before calling the function then swap page 1 back in afterwards.

The trick is to exploit the way that the Z80 handles calling subroutines. There is a 16-bit register, PC, which stores the address of the next instruction to execute. When you call a subroutine, the Z80 pushes PC onto the stack then sets PC to the address of the subroutine. When you return from a subroutine (via the ret instruction) the Z80 simply pops the value it previously pushed onto the stack and copies this back to PC. Instead of calling the target subroutine directly, you call a special handler that is available on every page. Following your call is 16-bit identifier for the off-page function you wish to call. This handler then (prematurely) pops off the return address from the stack, reads the 16-bit value that follows it (which is the indentifier of the function you wish to call), looks up the page and address of the target function, swaps in the correct page and calls it as normal. When the function returns, the handler then swaps back the calling page and jumps back to the return address.

The Z80 has a series of rst instructions that call fixed addresses within the first 256 bytes of memory. These instructions are useful as they're small (one byte vs three bytes for a regular call) and fast, so I'm using rst $28 to call the off-page call handler (for no other reason than it's the same as the handler on the TI-83+).

As an example, let's say you had this function call at address $2B00:

$2B00:    rst $28
$2B01:    .dw $30F0
$2B03:    ; We'd return here.

When the Z80 executed that rst $28 it would push $2B01 (address of the next instruction) to the stack then jump to $28. The handler at $28 would do something like this:

    pop hl    ; hl is a 16-bit register and would now contain $2B01
    ld e,(hl) ; Read "e" from address pointed to by hl, now equals $F0
    inc hl    ; hl = $2B02
    ld d,(hl) ; Read "d" from address pointed to by hl, now equals $30
    inc hl    ; hl = $2B03 ("real" return address)
    push hl   ; push hl back on the stack so when we return from here we end up in the correct place.

Now, de is $30F0 - this is the identifier of the function we're calling. In my case, the identifier points to a function table on page 0. Each entry in the table is three bytes - one byte for the page index and two bytes for the address of the function on the that page. We'd need to do something like this:

    in a,(Page)  ; Read the current page into A.
    push af      ; Push A and F to the stack for later retrieval.
    and ~7       ; Mask out the lower three bits of the address.
    out (Page),a ; Sets current ROM page to 0.
    ex de,hl     ; Exchanges de and hl, so hl now points to the function identifier.
    or (hl)      ; ORs contents of memory at (hl) (ie, page number) with a, to set the target page.
    inc hl
    ld e,(hl)    ; e = LSB of target address
    inc hl
    ld d,(hl)    ; d = MSB of target address
    ex de,hl     ; hl = target address.
    out (Page),a ; Swaps in the correct page.
At this point, the correct page is swapped in and hl points to the address of the function to call. All we need to do now is call it!
    ld de,ReturnFromHandler ; Address to return to.
    push de ; Store on stack.
    jp (hl) ; Set pc = hl.
ReturnFromHandler
    ; Swap back the original page which was pushed earlier...
    pop af
    out (Page),a
    ret ; ...and return to the calling page!

A further advantage of using rst $28 to replace call is that both are the same size, so the assembler can check if you're calling an address on the same page or a different one and insert the regular (and much faster) Z80 call in places where you don't need to swap the page.

Finally, the obligatory video, this time showing a clock that toggles the backlight once a second.

Bank-Switching Memory and I2C

Thursday, 11th September 2008

Cheers for the comments. smile.gif As EasilyConfused pointed out, I have done calculator programming in the past, which makes this much easier - learning Z80 assembly to program a calculator influenced the choice of CPU in this computer, and porting BBC BASIC to the calculator showed that with a minimal amount of code to sit between it and the hardware you'd have a decent operating system with very little work. And if a Terminator-related name is good enough for the UK military, it should be good enough for this project...

The I/O board from a few posts ago has undergone a few revisions:

2008.09.12.01.IO.Board.jpg

Both PS/2 ports are now fully wired up, though only the lower one is currently used by the OS for keyboard input. I will need to adjust the AT protocol routines (the AT protocol is used to control both keyboard and mouse) to support multiple physical ports, as it was adapted from code I wrote for the TI-83+ calculator and as such only supports one device at a time. The mouse position will be polled by calling ADVAL(axis%), which on the BBC Micro would return the joystick position (axis% specifies the type of information to retrieve from the mouse - a value of 0 returns the buttons as a bitfield, 1 returns the movement in the X axis, 2 the movement in the Y axis and 3 the amount the scrollwheel has been scrolled).

At the very bottom of the circuit board is another 8-bit latch. This is for the (currently) write-only control port. The three least significant bits specify the current ROM page (one of eight 16KB ROM pages can be swapped in for a total of 128KB) and the next bit specifies one of two 16KB RAM pages accessible in the $4000..$7FFF address range. One of the other bits will be used to switch the LCD backlight on and off in software, one more may be connected to a buzzer, and I'm sure I can find some use for the last two. As it's write-only, its current state needs to be stored in RAM so that you can change bits of it (eg when changing the ROM page you wouldn't want to change the backlight status at the same time; you'd need to retrieve the current state and mask in the bits you wish to preserve). This is obviously an ugly hack, and I'm hoping I'll be able to use some of the space to the right of the latch IC on the circuit board to add the other latch to allow the port to be read as well (an I/O port needs two latches - an output latch that takes data from the data bus and outputs it to external hardware, and an input latch that takes data from external hardware and puts it back on the data bus).

The first test of the new ROM paging hardware was to display a simple animation. Assuming 1KB on each ROM page was taken up by the animation playback program, that leaves 15KB per ROM page. A frame (128×64 pixels) is 1KB, so that's 15 frames per page, or 120 frames total. I converted a clip from Pink Floyd's Arnold Layne music video to a suitable format and wrote a playback routine that could run from RAM. When the computer booted it would copy the player to RAM and run it from there as it could then run uninterrupted when different ROM pages were swapped in to read the frame data.

An animation like this is a useful test, as if the ROM paging didn't work properly (simulated by holding the three ROM page selection lines low) the software would still run, but would just loop the first 15 frames (or play chunks of 15 frames out of sequence) instead of crashing.

Another addition to the circuit above is the cluster of discrete transistors and resistors under the lowest PS/2 port. This is the same sort of pair of open-collector I/O data lines that drive each PS/2 port, except that the two data lines are fed out of the I/O board and back to the breadboard that's currently sitting between the memory board and the I/O board to these two simple 8-pin chips:

2008.09.12.03.I2C.Chips.jpg

This is the I2C bus, a simple, low-speed, two-wire bus that will allow other components be easily connected to the computer. The I2C protocol is implemented in software. The two chips in the above image are a DS1307 real-time clock (foreground, with quartz crystal) which I hope to use for timing purposes and a 24LC256 32K×8 EEPROM which I hope to use for file storage. I would need to have some way of accessing the I2C bus externally (to plug in EEPROMs as removable storage) as well as supporting the internal devices.

I haven't yet done any work on supporting I2C devices properly, but I have added I2C bus emulation to the emulator I'm using to develop the OS. BBC BASIC will pass commands prefixed with a *STAR to the operating system, so I've added a *I2CPROBE command that will hammer through all available addresses and list any devices that acknowledge a write request.

2008.09.12.03.I2C.Probe.png
$A0 is the EEPROM and $D0 is the clock.

I think I may have dug myself into a hole for CPU timing. I mentioned that I will need to drop the CPU clock to 2MHz when accessing the LCD; unfortunately, switching between 2MHz and 10MHz doesn't seem to work very well. I can run the system relatively stably at either speed (though at 10MHz data sent to the LCD is occasionally corrupted) but if I try and switch dynamically (eg switching from the 10MHz to the 2MHz clock when /IORQ goes low to indicate an I/O request) the system locks up. My assumption is that during time it takes the logic gates that perform the 10MHz/2MHz switch to properly settle into their new state (which is in the tens of nanoseconds) the clock signal stutters a little, effectively producing a clock signal (albeit a brief one) well over 10MHz. I don't have an oscilloscope to verify this, however. sad.gif

Running BBC BASIC on a home-built computer

Sunday, 7th September 2008

2008.09.07.01.LookAroundYou.jpg
This computer needs a name - I'd welcome any suggestions!

I have built a circuit on another piece of stripboard that will handle memory, clock signal generation and the Z80 itself.

A few posts ago I was wondering about how I'd partition memory. To date I've been using a very simple circuit where the lower 32KB of addressable memory is mapped to ROM and the upper 32KB is mapped to RAM. As my ROM chip is 128KB and I have two 32KB RAM chips, this seems a bit wasteful.

The memory layout I'm now using is quite simple: the upper 32KB is still mapped to RAM. However, only the first 16KB is mapped to ROM, and the three most significant bits of the ROM chip's address lines are connected to a device on the I/O board so that one of its eight 16KB "pages" can be swapped in. The next 16KB will be mapped to RAM, and the most significant bit of the RAM chip's address is connected to the same device on the I/O board so one of its two 16KB "pages" can be swapped in.

2008.09.07.03.Memory.Map.gif

For more information, see the Wikipedia article on bank switching. There is a potential problem here; the Z80 uses particular fixed addresses for certain operations. The three most obvious ones are $0000 (jumped to on reset), $0038 (address of maskable interrupt handler) and $0066 (address of non-maskable interrupt handler). As which 16KB bank switched in at power-on is effectively random, the easy way around this problem is to ensure that the first 256 bytes or so of every ROM page has the same code assembled on it. This means that whichever page is swapped in on boot doesn't matter, as the same common boot code is available on each page.

The assembled memory board looks like this:

I have only attached one of the 32KB RAM chips. The wiring was becoming a bit of a nightmare (I think I'll need to solder to the track side of the stripboard to fit in that other RAM chip) so for the moment the system can only access the fixed 32KB RAM. I haven't yet added the device on the I/O board to handle bank switching, so for the moment the ROM is permanently configured to access the first 16KB page by pulling the its three externally controllable address lines low.

That said, this machine does genuinely run BBC BASIC (the last system only ran a mockup with a dummy header at the top of the screen). I've done quite a bit of work on the OS in the emulator and it works pretty well there, and with a minor adjustment to cram it onto a single 16KB page it works well on hardware too.

The row of chips along the bottom of the memory board are responsible for generating the clock signals that drive the computer. If this looks needlessly complex, that's because it can run at either 10MHz or 2MHz and generates the E signal for LCD access. The CPU needs to drop to 2MHz when accessing the LCD (the LCD driver can't keep up, otherwise) so I'll probably end up connecting the input for this 2MHz/10MHz switch to the LCD chip enable pins so that normally the system runs at 10MHz but drops to 2MHz when accessing the LCD. Allowing the user to drop to 2MHz to save power is an appealing idea, however...

2MHz should be enough for anyone

Wednesday, 27th August 2008

2008.08.27.01.LCD.Ribbon.jpgLCD Timing
Last time I discussed the hardware I mentioned I had LCD timing issues. I have finally resolved them, but this has been the most time consuming part of the project so far.

The first thing to sort out was the LCD's E pin. Once you have set up the LCD's input pins to a state where they're ready to read or write data, you need to drive this pin high. I had had some success by holding it high permanently and relying on the Z80 to set all the other to the right state at roughly the same moment, but this was inaccurate and resulted in occasional display glitching.

Consulting the datasheet, it appears that once the input pins are ready E needs to be held low for at least 450nS and then needs to be driven high for at least 450nS. Hmm. During an I/O request (and once the Z80 has prepared the address and data bus) there's a delay of 1 clock cycle, then /IORQ is held low for about 2.5 CPU cycles. That is the window of opportunity. I have connected a binary counter to the Z80's clock signal and take the least significant bit of the output - every clock cycle this output toggles between a low and a low, effectively halving the CPU clock rate. I then connect the counter's reset pin (which overrides the clock input and forces it to output zero) to /IORQ. The result is that when the Z80 is not accessing hardware the counter is held in its reset state, and E is held low. When the Z80 holds /IORQ low, the counter starts up and outputs a zero for one CPU cycle, outputs a one for the next CPU cycle, then outputs a zero for the next half clock cycle at which point /IORQ goes high again and it is back to zero anyway. This is exactly what's needed!

This also allows us to calculate the maximum CPU clock rate. If we are generous and allow E to be low for 500nS then high for 500nS, that gives us a CPU clock rate of 1/500nS=2MHz.

Anyhow, that's one problem resolved, but there was still one nasty bug. When reading from the LCD it would occasionally end up writing to the area that was being read or, in worse cases, the Z80 would "crash". The LCD has a R/#W pin that is held high when reading and held low when writing. I had connected it directly to the Z80's /WR pin, which is high normally and low when writing. The problem here is "normally" as the LCD was expecting to be read even when the Z80 wasn't requesting a /RD. When being read, the LCD expects to put something onto the data bus, and it appeared that it kept thinking that it needed to put something on the bus when it wasn't needed. This caused fighting with the other chips that were trying to put their own values on the data bus, hence the crashes as the Z80 received invalid data.

The answer was very easy; simply connect the Z80's /RD pin to the LCD's R/#W pin via a NOT gate. In the default state the pin is held low (LCD expects a write and leaves the data bus alone), and only goes high during a /RD. The LCD interface is now very robust.

CPU Clock
Above I mention the calculation for the maximum clock rate. Rather than use the 555 for timing, I switched to using a 10MHz crystal resonator oscillator. I'm using the serial resonant circuit from z80.info with a 74F04 hex inverter (second circuit down). Fortunately the counter chips I have are decade counters (ie, designed to count from 0 to 9) made up of a ÷2 and a ÷5 section. I can connect a 10MHz oscillator to the ÷5 section and use the output of that to drive the CPU. In the final design I'd like to add a "hardware control port" with a bit that would let the programmer choose 2MHz or 10MHz mode by setting or resetting a particular output bit (other control bits would include switching the LCD backlight on or off and a buzzer for beeping sound output).

PS/2 Ports
As a friend pointed out, the 8-bit open-collector I/O port (which will drive two PS/2 ports, the I2C bus and TI calculator link port) had a flaw - there was no resistor on the base of the output transistors. The result is that if the output latch tries to drive the transistor base high, the transistor switches on and shorts the output of the latch to ground. This was clearly a problem in the design as the LCD backlight dimmed when trying to output to these ports as they drew a excessive amount of current when effectively short-circuited. A 22K resistor between the output latch and base of the transistor fixed the problem.

2008.08.27.02.PS2.Ports.jpg

In the above photo, I've also added two 100K resistors to hold the output high when floating, but only to the foreground PS/2 port for the time being. I don't think I'll be controlling a mouse yet!

Revised OS
With a little modification of the Emerson PS/2 library I've got a basic keyboard driver up and running on the hardware. All the OS does for the moment is check for keys and display them on the screen. It's currently hard-coded to the UK layout, as I haven't yet decided how I'm going to handle storage and by keeping the layout in ROM it at least frees up a few hundred bytes of RAM that would otherwise need to be there for the scancode translation tables.

Z80 computer with a primitive I/O board

Wednesday, 20th August 2008

A computer needs some way of interacting with the outside world via input and output devices. It's about time, then, that the Z80 computer project acquires a section dedicated to I/O.

2008.08.20.04.Overview.jpg

The Z80 differentiates between memory and I/O devices, though both share the data bus and the address bus. You can control I/O devices using the in (input) and out (output) instructions. When you input or output you must specify a device address and a value or target register. For example,

    in a,($20) ; Read a value from device $20 and store it in A.
    ld a,123
    out ($40),a ; Output 123 to device $40.

When you write to a device, the following happens:

  • The address bus is set to the address of the device to output to.
  • The data bus is set to the value to be written to the device.
  • The /IORQ and /WR pins on the Z80 go low.
  • The device processes the written data.
  • The /IORQ and /WR pins on the Z80 go high.

Reading from the device is very similar:

  • The address bus is set to the address of the device to read from.
  • The /IORQ and /RD pins on the Z80 go low.
  • The device puts the value to read onto the data bus.
  • The Z80 reads the value on the data bus.
  • The /IORQ and /RD pins on the Z80 go high.

(Accessing memory is a similar procedure, except with the /IORQ pin replaced by the /MREQ pin).

Interfacing I/O devices to a Z80 CPU should be rather straightforwards, then. I am using a 74HCT138N 3-to-8 line inverting decoder to handle the address bus input and /IORQ signal. This IC has three address inputs and 8 outputs. If the address input is %000, output 0 is low and all the other pins are high; if the address input is %001 output 1 is low and all the others are high; if the input is %010 output 2 is low and all the others are high (and so on and so forth). /IORQ is connected to another input on the chip, /E1, which causes all of the pins to go high when it is high regardless of the address input.

What does this mean in practice? Well, most devices have a "chip enable" or "chip select" input pin. When this input is active the device performs its function, but when the input is not active the device is deactivated and doesn't respond to any other inputs or output anything. By connecting each output of the 3-to-8 decoder to a particular device's chip enable pin I can ensure that each device is only activated when its address is specified on the address bus and the /IORQ pin on the Z80 is low.

I have connected the Z80's A5-A7 to A0-A2 on the 3-to-8 decoder. This means that the first device has a base address of $00, the second $20, the third $40 and so on at increments of $20. This might sound a little odd, but has a reason. Some devices, such as the LCD, have sub-addresses of their own. In the case of the LCD, it has a pin that specifies whether you're dealing with an instruction (such as a command to switch the display on or off or read the LCD status) or some data (which forms part of the picture on the LCD). By attaching this pin directly to the Z80's A0 and the LCD's chip select pin to output 1 from 3-to-8 decoder you end up with an LCD instruction port at $20 and an LCD data port at $21.

An LCD is all well and good, but we'll need to take input from the user. To accomplish this, I'm going to supply two PS/2 ports and implement the AT protocol (as used by PS/2 keyboards and pointing devices) in software. Each device only requires two open-collector data lines (data and clock), so a single I/O device that provides eight I/O lines would be useful.

The design I'm going for uses two 74AC373 octal transparent latches. When the latch enable input pin is held high whatever value is on the input passes through to the corresponding output. When the latch enable pin goes low, the last value that was latched at the input is still output. These particular latches also have an output enable pin that can be used to disable the outputs and let them "float" (ie, other devices can then drive that particular connection high or low as required). In this instance, one latch has its output enable pin activated so that it always outputs the last value written to it and has its latch enable pin connected to the Z80's /WR pin. The other latch has its latch enable pin activated so that it always outputs the values at its input and has its output enable pin connected to the Z80's /RD pin.

The transistors on each output are used to provide open-collector outputs. When the base of the transistor is held low, the transistor is "switched off" and its output floats, and so can be driven by external circuitry. When the base of the transistor is held high, it switches on and effectively connects the output to ground. A pull-up resistor ensures that the pin has a high signal when not connected to anything. This arrangement is useful as each pin can be driven low by either device and so works as an input or an output (for a real-world example, an AT keyboard usually outputs a clock signal on one line to the host when sending data, but if the host pulls the clock line low it can inhibit communication and the keyboard buffers the data to send instead).

Rather than build the circuit on breadboard, I went straight to stripboard. The above photo shows an incomplete version of the output board. Only one PS/2 port is wired up at all! The pin header to the left is to connect the LCD to. The coloured wires at the extreme left connect this I/O board to the rest of the computer.

I have modified the Z80 board I was using last time to add support for RAM. The 3-to-8 decoder in the bottom right is used to partition the address space into two 32KB regions. The lower 32KB is mapped to ROM, and the upper 32KB is mapped to RAM. This wastes 75% of the ROM chip (it's a 128KB chip) but without a more complex memory management unit this will have to do for the moment. The most significant bit of the address bus, A15, is fed into the 3-to-8 decoder along with the /MREQ pin.

The test software is a Z80 program that displays an animation on the LCD using 20 frames (1KB per frame) stored in ROM.

The Z80 is still not breaking MHz speeds, but there are problems here. I have not interfaced the LCD correctly, as its timing patterns for reading and writing data are quite different to those used by the Z80. Bizarrely, holding the E pin on the LCD permanently high appears to work 99% of the time, even though the datasheet indicates that it should be used to clock data in or out. The result is glitches in the data sent to the LCD, usually on the left hand side (the left hand side of the display has a propensity to believe it's been sent the "switch off" command). I'm not sure I'll be able to remedy this situation. Judging by the datasheet it looks like the LCD does its stuff when the E pin goes from a low to high state (the Z80 does everything when /IORQ goes low), so maybe simply inverting /IORQ and pumping it into E will do the trick.

Z80 Light-flasher

Wednesday, 13th August 2008

Now armed with a flash programmer, I thought it about time to try and build a Z80-based system.

Not much to look at, and it doesn't do much either. The large IC in the bottom-left, prominently marked Z, is the Z80 itself. To its left is a 555, generating a ~220Hz clock signal (yes, Hz, not MHz or even kHz). Above the Z80 is another large chip - this is the 128KB flash ROM. The eight parallel wires between them are the address bus - only A0 to A7 are connected. This only lets the Z80 address 256 bytes, but that should be enough for testing.

To the right of the flash ROM is an octal latch. This is used to provide an 8-bit output port for the system, which is connected to the LEDs to its right. As the latch's latch enable pin is active high, unlike everything else in the system (which is active low - ie, it does something when you drive it low) I have to put a NOT gate - the final black IC to the right of the Z80 - between it and the Z80's /WR (write) pin. I do not do any address decoding or even check the /IORQ pin, so any value written to to any hardware device or memory address will end up on the LED display. Not that that really matters, as there is a conspicuous lack of RAM in the system!

The large physical size and tedium of wiring even such a primitive system as this makes me wonder whether it's worth jumping straight to stripboard for subsequent hardware revisions...

For the curious, the program running on the Z80 is as follows.

.for p = 0 to 7
.defpage p, kb(16), $0000
.loop
.emptyfill $FF

.page 0

	im 1
	di

--	ld hl,LightSequence
	ld b,LightSequenceEnd-LightSequence
-	ld a,(hl)
	out (0),a
	inc hl
	djnz -
	jr --

LightSequence
	.db %00000001
	.db %00000010
	.db %00000100
	.db %00001000
	.db %00010000
	.db %00100000
	.db %01000000
	.db %10000000
	.db %01000000
	.db %00100000
	.db %00010000
	.db %00001000
	.db %00000100
	.db %00000010
	.db %00000001
	.db %00000010
	.db %00000100
	.db %00001000
	.db %00010000
	.db %00100000
	.db %01000000
	.db %10000000
	.db %01000000
	.db %00100000
	.db %00010000
	.db %00001000
	.db %00000100
	.db %00000010
	.db %00000001
	.db %00000011
	.db %00000111
	.db %00001111
	.db %00011111
	.db %00111111
	.db %01111111
	.db %11111111
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %11111110
	.db %11111100
	.db %11111000
	.db %11110000
	.db %11100000
	.db %11000000
	.db %10000000
	.db %00000000
	.db %10000000
	.db %11000000
	.db %11100000
	.db %11110000
	.db %01111000
	.db %00111100
	.db %00011110
	.db %00001111
	.db %10000111
	.db %11000011
	.db %11100001
	.db %11110000
	.db %01111000
	.db %00111100
	.db %00011110
	.db %00001111
	.db %10000111
	.db %11000011
	.db %11100001
	.db %11110000
	.db %01111000
	.db %00111100
	.db %00011110
	.db %00001111
	.db %00000111
	.db %00000011
	.db %00000001
	.db %00000000
LightSequenceEnd

.echoln strformat("Size: {0} bytes", $)

Emulators and neatened wiring

Tuesday, 12th August 2008

I've decided to switch to a regular 10MHz Z80 rather than a Z180, given the difficulty of using an SDIP 64. I now have a DIP 40 Z80 ready for use, but as I don't have the programmer for the Flash chip (which will hold the OS) there's not much I can do with it physically. I have therefore cobbled together a basic emulator to help develop some of the software beforehand.

2008.08.12.02.Emulator.png

To cut hardware costs I'm going to try and handle input in software. One bit of hardware I'm planning on having is an eight-bit open collector I/O port. Open collector pins float high in their reset state, and any device connected to the pin can drive it low. AT devices (keyboard and mouse) use this type of electrical connection, as does the I2C bus and the TI calculator link port. I can use up the eight pins easily - two pins per AT device (keyboard and mouse) makes four, two pins for the I2C bus and two pins for a TI calculator link port.

The I2C bus I mentioned above is a simple way to enhance the computer once built. There will be one device permanently attached to the bus, a DS1307 real-time clock, which will be used to provide time-keeping functions for the OS as well as generating periodic interrupts (the chip could be configured to trigger an interrupt 100 times a second, useful for timing game logic). I could then leave empty space on the circuit board to add other I2C devices over time, or have a socket on the case that could be used to plug in additional I2C modules.

Now that I have some more tools, namely a desoldering pump, I tidied up the horrible hack job I'd done on the graphical LCD (replacing the multiple wires with a single pin header).

2008.08.12.01.Sonic.jpg

Yes, still the PICAXE here, but I'm using its 256 byte EEPROM to store a 32×64 pixel image of Sonic that is repeated four times horizontally.

I'm still not sure what I'm doing with regards to memory or storage. I'm still working on the simple assumption that ROM is 32KB ($0000..$7FFF) and RAM is 32KB ($8000..$FFFF) but this wastes a lot of memory and isn't very flexible at all. I've planned a bank-switching MMU, but as this will require at least four registers to store what appears in each of the four 16KB windows it will end up being physically very large and painful to wire.

As for storage, I have no idea. I have some 32KB I2C EEPROMs, but 32KB isn't exactly very large. Alternatively, I have an old 512MB SD card, and could try talking to it over bit-banged SPI. (SD cards use 3.3V, though, which complicates matters - not to mention that bit-banged SPI is going to be a little sluggish). I also have a USB module which can talk to USB mass storage devices over a serial connection, so maybe I should add a UART to the project. Adding a fully-blown USB module (which also plays WMA, MP3 and MIDI files) to such an otherwise low-tech computer feels like heresy, though.

Interrupts: A Fresh Start

Thursday, 21st February 2008

I gave in and rewrote all of the Z80's interrupt emulation from scratch, finding some rather horrible bugs in the existing implementation as I went.

Some of the highlights included non-maskable interrupts ignoring the state of the IFF1 flag (this flag is automatically cleared when an interrupt is serviced, and is used to prevent the interrupt handler from being called again before it has finished) and the RETN instruction not copying the state of the IFF2 flag back to IFF1. When non-maskable interrupts are serviced, the state of IFF1 is copied to IFF2 before it gets cleared, the idea being that if you use RETN interrupts are automatically re-enabled on exit of the NMI ISR. (Contrast this with maskable interrupts, where both flags are cleared, and you need an explicit EI to re-enable them).

The HALT instruction (executes NOPs until an interrupt is requested or the CPU is reset) was also completely incorrectly (and bizarrely) implemented. The rewrite just sets a Halted property, which prevents the CPU from fetching or executing any instructions. The interrupt-triggering code simply resets this property.

This has fixed numerous bugs (I'm not sure when they were introduced, as it was all working a while back). It's gone from "not working at all" to "just about working", but some games or demos that rely on precise interrupt timing don't work properly.

HicolorDemoCorruption.png DesertSpeedtrapMisaligned.png
Game Gear Hicolor Demo and Desert Speedtrap

Both problems in the above screenshots relate to line-based interrupts from the VDP (Video Display Processor). Some other games simply hang at startup. sad.gif

Necromancy

Friday, 8th February 2008

After seeing Scet and Drilian's work on their respective emulator projects I decided I needed to do something with the stagnating Cogwheel source on my hard disk drive.

The only ROM I have tested where I can't find an explanation for a bug is the Game Gear Garfield: Caught in the Act game. Like many games, when left at the title screen it'll run a demo loop of the game in action. At one point Garfield would walk to the left of the screen, jump over a totem pole, shunt it to the right and use it as a way to jump out of a pit. However, in Cogwheel he would not jump far enough to the left, and not clearing the totem pole he'd just walk back to the right and not have anything to jump out of the pit on.

I remembered a post on MaxCoderz discussing a long-standing tradition of thinking that when a JP <condition>, <address> failed the instruction took a single clock cycle. You can see this misreported here, for example. This document, on the other hand, claims it always takes 10 clock cycles - and most importantly of all, the official user manual backs this up.

Garfield.png

So, Garfield can now get out of his pit. The user interface has changed (again) - I'm now using SlimDX to dump pixels to a Panel, which seems to be the least hassle distribution-wise and doesn't throw LoaderLockExceptions.

Brass 3 and software PAL

Wednesday, 14th November 2007

My work with the VDP in the Sega Master System made me more aware of how video signals are generated, so thought it would be an interesting exercise to try and generate them in software. This also gives me a chance to test Brass 3, by actively developing experimental programs.

Hardware.Thumb.jpg

I'm using a simple 2-bit DAC based on a voltage divider, using the values listed here. This way I can generate 0V (sync), ~0.3V (black), ~0.6V (grey) and 1V (white).

My first test was to output a horizontal sync pulse followed by black, grey, then white, counting clock cycles (based on a 6MHz CPU clock). That's 6 clock cycles per µs.

2-bit-DAC-Black-Grey-White.Thumb.jpg

The fastest way to output data to hardware ports on the Z80 is the outi instruction, which loads a value from the address pointed to by hl, increments hl, decrements b and outputs the value to port c. This takes a rather whopping 16 clock cycles (directly outputting to an immediate port address takes 11 clock cycles, but the overhead comes from loading an immediate value into a which takes a further 7). The time spent creating the picture in PAL is 52µs, which is 312 clock cycles. That's 19.5 outi instructions, and by the time you've factored in the loop overhead that gives you a safe 18 pixel horizontal resolution - which is pretty terrible.

Even with this technique, in the best case scenario you output once every 16 clock cycles which gives you a maximum time resolution of 2.67µs. This is indeed a problem as vertical sync is achieved by transmitting two different types of sync pulse, made of either a 2µs sync followed by 30µs black (short) or 30µs sync followed by 2µs black (long). In my case I plumped for the easiest to time 4µs/28µs and hoped it would work.

Anyhow, I made a small three-colour image for testing: Source.gif.

Of course, as I need to output each scanline anyway I end up with a resolution of 304 lines, which gives me rather irregular pixels, so I just stretch the above image up to 20×304. Eagle-eyed readers would have noticed that the horizontal resolution is only 18 pixels, but somewhere in the development process I forgot how to count and so made the image two pixels too wide.

TV.Thumb.jpg

As you can see, it shows (the entire image is shunted to the right). TVs crop the first and last few scanlines (they aren't wasted, though, and can be used for Teletext) so that's why that's missing. smile.gif A widescreen monitor doesn't help the already heavily distorted pixels either, but it does (somewhat surprisingly) work.

With a TI-83+ SE (or higher) you have access to a much faster CPU (15MHz) and more accurate timing (crystal timers rather than an RC circuit that changes speed based on battery levels) as well as better interrupt control, so on an SE calculator you could get at least double the horizontal resolution and output correct vertical sync patterns. You also have better control over the timer interrupts, so you could probably drive hsync via a fixed interrupt, leaving you space to insert a game (the only code I had space for checks to see if the On key is held so you can quit the program - more clock cycle counting). I only have the old 6MHz calculator, though, so I'm pleased enough that it works at all, even if the results are useless!

Brass 3.0.0.0 Beta 1

Monday, 5th November 2007

I've released a beta version of the new assembler. It comes with the compiler, a GUI builder (see the above screenshot) and the help viewer; it also comes bundled with a number of plugins.

I've also knocked together a quick demo project that can be built directly from Explorer once Brass is installed.

build_explorer.png

There are a number of missing features (such as a project editor, project templates and multiple build configurations) and no doubt broken, incomplete or untested components - but at least it's out in the wild now, which gives me an incentive to fix it!

Emulating TI-OS 1.15 and a greyscale LCD

Monday, 29th October 2007

OS 1.15 appears to boot, and if I run an OS in Pindur TI, archive the files (copy them to Flash ROM) then use that ROM dump in my emulator the files are still there, where they can be copied to RAM.

Trying to re-archive them results in a fairly un-helpful message, as I haven't implemented any Flash ROM emulation (nor can I find any information on it)...

memory.gif

Applications (which are only ever stored and executed on Flash ROM) work well, though.

cellsheet.gif    mirageos.gif    graph3.gif

I've also updated the LCD emulation a little to simulate the LCD delay; greyscale programs (that flicker pixels on and off) work pretty well now.

blur.gif    grey.gif

repton.gif    ft2.gif

Brass 3 and TI-83+ Emulation

Friday, 26th October 2007

Brass 3 development continues; the latest documentation (automatically generated from plugins marked with attributes via reflection) is here. The compiler is becoming increasibly powerful - and labels can now directly store string values, resulting in things like an eval() function for powerful macros (see also clearpage for an example where a single function is passed two strings of assembly source and it uses the smallest one when compiled).

Thanks to a series of hints posted by CoBB and Jim e I rewrote my TI-83+ emulator (using the SMS emulator's Z80 library) and it now boots and runs pretty well. The Flash ROM archive isn't implemented, so I'm stuck with OS 1.12 for the moment (later versions I've dumped lock up at "Defragmenting..."). I also haven't implemented software linking, and so to transfer files I need to plug in my real calculator to the parallel port and send files manually.

ion_received.png
ion_installed.png
ion_pacman.png
pacman-99.png

Brass 3

Tuesday, 2nd October 2007

Quake isn't dead, but I've shifted my concentration to trying to get Brass 3 (the assembler project) out.

brass_err.png

Brass 2 didn't really work, but I've taken a lot of its ideas - namely the plugin system - and kept some of the simplicity from Brass 1. The result works, and is easy to extend and maintain. Last night I got it to compile all of the programs I used for testing Brass 1 against TASM successfully.

I'm taking advantage of .NET's excellent reflection capabilities; one such example is marking plugin functions with attributes for documentation purposes, meaning that all you need to get Brass documentation is to drop your plugin collection assemblies (DLLs) into the Brass directory then open the help viewer app.

help_fsize.png

The source code examples are embedded as text, but compiled by the viewer (and thus syntax-highlighted) so you can click on directives or functions and it'll jump to their definitions automatically.

Native function support and a much-improved parser means that complex control structures can be built up, like:

file = fopen("somefile.txt")

#while !feof(file)
    .db fread(file)
#loop

fclose(file)

The compiler invokes the plugins, and the plugins talk back to the compiler ("remember your current position", "OK, we need to loop, so go back to this position", "this loop fails, so switch yourself off until you hit the #loop directive again").

The compiler natively works with project files (rather than some horrible command-line syntax) which specify which plugins to load, which include directories to search and so on and so forth. There are a number of different plugin classes:

  • IAssembler - CPU-specific assembler.
  • IDirective - assembler directive.
  • IFunction - functions like abs() or fopen().
  • IOutputWriter - writes the object file to disk (eg raw, intel hex, TI-83+ .8xp).
  • IOutputModifier - modifies each output byte (eg "unsquishing" bytes to two ASCII charcters for the TI-83).
  • IStringEncoder - handles the conversion of strings to byte[] arrays (ascii, utf8, arcane mappings for strange OS).

Unlike Brass 2, though, I actually have working output from this, so hopefully it'll get released!

As a bonus, to compare outputs between this and TASM (to check it was assembling properly) I hacked together a binary diff tool from the algorithm on Wikipedia (with the recursion removed) - it's not great, but it's been useful to me. smile.gif

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace Differ {
	class Program {
		static void Main(string[] args) {

			// Prompt syntax:
			if (args.Length != 2) {
				Console.WriteLine("Usage: Differ <file1> <file2>");
				return;
			}

			// Load both files into byte arrays (sloppy, but whatever).
			byte[][] Data = new byte[2][];

			for (int i = 0; i < Data.Length; ++i) {
				try {
					byte[] Source = File.ReadAllBytes(args[i]);
					Data[i] = new byte[Source.Length + 1];
					Array.Copy(Source, 0, Data[i], 1, Source.Length);
				} catch (Exception ex) {
					Console.WriteLine("File load error: " + args[i] + " (" + ex.Message + ")");
					return;
				}
			}

			// Quick-and-dirty equality test:
			if (Data[0].Length == Data[1].Length) {
				bool IsIdentical = true;
				for (int i = 0; i < Data[0].Length; ++i) {
					if (Data[0][i] != Data[1][i]) {
						IsIdentical = false;
						break;
					}
				}
				if (IsIdentical) {
					Console.WriteLine("Files are identical.");
					return;
				}
			}

			if (Data[0].Length != Data[1].Length) {
				Console.WriteLine("Files are different sizes.");
			}

			// Analysis:
			int[,] C = new int[Data[0].Length, Data[1].Length];
			for (int i = 1; i < Data[0].Length; ++i) {
				if ((i - 1) % 1000 == 0) Console.Write("\rAnalysing: {0:P}...", (float)i / (float)Data[0].Length);
				for (int j = 1; j < Data[1].Length; ++j) {
					if (Data[0][i] == Data[1][j]) {
						C[i, j] = C[i - 1, j - 1] + 1;
					} else {
						C[i, j] = Math.Max(C[i, j - 1], C[i - 1, j]);
					}
				}
			}
			Console.WriteLine("\rResults:".PadRight(Console.BufferWidth - 1));

			List<DiffData> CollectedDiffData = new List<DiffData>(Math.Max(Data[0].Length, Data[1].Length));
			
			for (int i = Data[0].Length - 1, j = Data[1].Length - 1; ; ) {
				if (i > 0 && j > 0 && Data[0][i] == Data[1][j]) {
					CollectedDiffData.Add(new DiffData(DiffData.DiffType.NoChange, Data[0][i], i, j));
					--i; --j;
				} else {
					if (j > 0 && (i == 0 || C[i, j - 1] >= C[i - 1, j])) {
						CollectedDiffData.Add(new DiffData(DiffData.DiffType.Addition, Data[1][j], i, j));
						--j;
					} else if (i > 0 && (j == 0 || C[i, j - 1] < C[i - 1, j])) {
						CollectedDiffData.Add(new DiffData(DiffData.DiffType.Removal, Data[0][i], i, j));
						--i;
					} else {
						CollectedDiffData.Reverse();
						break; // Done!
					}
				}
			}


			DiffData.DiffType LastType = (DiffData.DiffType)(-1);

			int PrintedData = 0;
			foreach (DiffData D in CollectedDiffData) {
				if (LastType != D.Type) {
					Console.WriteLine();
					Console.Write("{0:X4}:{1:X4}", D.AddressA - 1, D.AddressB - 1);
					LastType = D.Type;
					PrintedData = 0;
				} else if (PrintedData >= 16) {
					Console.WriteLine();
					Console.Write("         ");
					PrintedData = 0;
				}
				ConsoleColor OldColour = Console.ForegroundColor;

				switch (D.Type) {
					case DiffData.DiffType.NoChange:
						Console.ForegroundColor = ConsoleColor.White;
						break;
					case DiffData.DiffType.Addition:
						Console.ForegroundColor = ConsoleColor.Green;
						break;
					case DiffData.DiffType.Removal:
						Console.ForegroundColor = ConsoleColor.Red;
						break;
				}
				Console.Write(" " + D.Data.ToString("X2"));
				++PrintedData;
				Console.ForegroundColor = OldColour;
			}
			Console.WriteLine();

		}

		private struct DiffData {

			public enum DiffType {
				NoChange,
				Addition,
				Removal,
			}

			public DiffType Type;

			public byte Data;

			public int AddressA;
			public int AddressB;

			public DiffData(DiffType type, byte data, int addressA, int addressB) {
				this.Type = type;
				this.Data = data;
				this.AddressA = addressA;
				this.AddressB = addressB;
			}

			
		}
	}
}

Removals are shown in red, additions are shown in green, data that's the same is in white.

8-bit Raycasting Quake Skies and Animated Textures

Monday, 20th August 2007

All of this Quake and XNA 3D stuff has given me a few ideas for calculator (TI-83) 3D.

One of my problems with calculator 3D apps is that I have never managed to even get a raycaster working. Raycasters aren't exactly very tricky things to write.

So, to help me, I wrote a raycaster in C#, limiting myself to the constraints of the calculator engine - 96×64 display, 256 whole angles in a full revolution, 16×16 map, that sort of thing. This was easy as I had floating-point maths to fall back on.

2007.08.19.03.jpg

With that done, I went and ripped out all of the floating-point code and replaced it with fixed-point integer arithmetic; I'm using 16-bit values, 8 bits for the whole part and 8 bits for the fractional part.

From here, I just rewrote all of my C# code in Z80 assembly, chucking in debugging code all the way through so that I could watch the state of values and compare them with the results from my C# code.

2007.08.19.01.gif

The result is rather slow, but on the plus side the code is clean and simple. smile.gif The screen is cropped for three reasons: it's faster to only render 64 columns (naturally), you get some space to put a HUD and - most importantly - it limits the FOV to 90°, as the classic fisheye distortion becomes a more obvious problem above this.

2007.08.19.02.png

I sneaked a look at the source code of Gemini, an advanced raycaster featuring textured walls, objects and doors. It is much, much faster than my engine, even though it does a lot more!

It appears that the basic raycasting algorithm is pretty much identical to the one I use, but gets away with 8-bit fixed point values. 8-bit operations can be done significantly faster than 16-bit ones on the Z80, especially multiplications and divisions (which need to be implemented in software). You can also keep track of more variables in registers, and restricting the number of memory reads and writes can shave off some precious cycles.

Some ideas that I've had for the raycaster, that I'd like to try and implement:

  • Variable height floors and ceilings. Each block in the world is given a floor and ceiling height. When the ray intersects the boundary, the camera height is subtracted from these values, they are divided by the length of the ray (for projection) and the visible section of the wall is drawn. Two counters would keep track of the upper and lower values currently drawn to to keep track of the last block's extent (for occlusion) and floor/ceiling colours could be filled between blocks.
  • No texturing: wall faces and floors/ceilings would be assigned dithered shades of grey. I think this, combined with lighting effects (flickering, shading), would look better than monochrome texture mapping - and would be faster!
  • Ray-transforming blocks. For example, you could have two 16×16 maps with a tunnel: the tunnel would contain a special block that would, when hit, tell the raycaster to start scanning through a different level. This could be used to stitch together large worlds from small maps (16×16 is a good value as it lets you reduce level pointers to 8-bit values).
  • Adjusting floors and ceilings for lifts or crushing ceilings.


As far as the Quake project, I've made a little progress. I've added skybox support for Quake 2:

2007.08.15.01.jpg

Quake 2's skyboxes are simply made up of six textures (top, bottom, front, back, left, right). Quake doesn't use a skybox. Firstly, you have two parts of the texture - one half is the sky background, and the other half is a cloud overlay (both layers scroll at different speeds). Secondly, it is warped in a rather interesting fashion - rather like a squashed sphere, reflected in the horizon:

Sky.jpg

For the moment, I'm just using the Quake 2 box plus a simple pixel shader to mix the two halves of the sky texture.

2007.08.19.01.jpg

I daresay something could be worked out to simulate the warping.

GLSky.jpg

The above is from GLQuake, which doesn't really look very convincing at all.

2007.08.19.02.gif

I've reimplemented the texture animation system in the new BSP renderer, including support for Quake 2's animation system (which is much simpler than Quake 1's - rather than have magic texture names, all textures contain the name of the next frame in their animation cycle).

TI Emulation, Functions in Brass and Gemini on the Sega Game Gear

Tuesday, 13th February 2007

This post got me wondering about a TI emulator. I'd rather finish the SMS one first, but so as to provide some pictures for this journal I wrote a T6A04 emulator (to you and me, that's the LCD display driver chip in the TI-82/83 series calculators). In all, it's less than a hundred lines of code.

The problem with TI emulation is that one needs to emulate the TIOS to be able to do anything meaningful. Alas, I had zero documentation on the memory layout of the TI calculators, and couldn't really shoe-horn the ROM dump into a 64KB RAM, so left it out entirely. That limits my options as to what I can show, but here's my Microhertz demo -

tunnel.png cylinder.png flip.png lens.png blobs.png

I've added native support for functions in Brass.

The old Brass could do some function-type things using directives; for example, compare the two source files here:

.fopen fhnd, "test.txt"          ; Opens 'test.txt' and stores a handle in fhnd
.fsize fhnd, test_size           ; Stores the size of the file in test_size

.for i, 1, test_size
    .fread fhnd, chr             ; Read a byte and store it as "chr"
    .if chr >= 'a' && chr <= 'z' ; Is it a lowercase character?
        .db chr + 'A' - 'a'
    .else
        .db chr
    .endif
.loop

.fclose fhnd                      ; Close our file handle.

I personally find that rather messy. Here's the new version, using a variety of functions from the 'File Operations' plugin I've been writing:

fhnd = fopen("test.txt", r)

#while !feof(fhnd)
    chr = freadbyte(fhnd)
    .if chr >= 'a' && data <= 'z'
        .db chr + 'A' - 'a'
    .else
        .db chr
    .endif
#loop

fclose(fhnd)

I find that a lot more readable.

An extreme example is the generation of trig tables. Brass 1 uses a series of directives to try and make this easier.

.dbsin angles_in_circle, amplitude_of_wave, start_angle, end_angle, angle_step, DC_offset

Remembering that is not exactly what I'd call easy. If you saw the line of code:

.dbsin 256, 127, 0, 63, 1, 32

...what would you think it did? You'd have to consult the manual, something I'm strongly opposed to. However, this code, which compiles under Brass 2, should be much clearer:

#for theta = 0, theta < 360, ++theta
    .db min(127, round(128 * sin(deg2rad(theta))))
#loop

By registering new plugins at runtime, you can construct an elaborate pair of directives - in this case .function and .endfunction - to allow users to declare their own.

_PutS = $450A

.function bcall(label)
    rst $28
    .dw label 
.endfunction

bcall(_PutS)

You can return values the BASIC way;

.function slow_mul(op1, op2)
    slow_mul = 0
    .rept abs(op1)
        .if sign(op1) == 1
            slow_mul += op2 
        .else
            slow_mul -= op2
        .endif
    .loop
.endfunction

.echo slow_mul(log(100, 10), slow_mul(5, 4))

I had a thought (as you do) that it would be interesting to see how well a TI game would run on the Sega Master System. After all, they share the CPU, albeit at ~3.5MHz on the SMS.

However, there are some other differences...

  • Completely different video hardware.
  • Completely different input hardware.
  • 8KB RAM rather than 32KB RAM.
  • No TIOS.

The first problem was the easiest to conquer. The SMS has a background layer, broken up into 8×8 tiles. If I wrote a 12×8 pattern of tiles onto the SMS background layer, and modified the tile data in my own implementation of _grBufCpy routine, I could simulate the TI's bitmapped LCD display (programs using direct LCD control would not be possible).

You can only dump so much data to the VRAM during the active display - it is much safer to only write to the VRAM outside of the active display period. I can give myself a lot more of this by switching off the display above and below the small 96×64 window I'll be rendering to; it's enough to perform two blocks, the left half of the display in one frame, the right in the next.

As for the input, that's not so bad. Writing my own _getK which returned TI-like codes for the 6 SMS buttons (Up, Down, Left, Right, 1 and 2) was fine, but games that used direct input were a bit stuck. I resolved this by writing an Out1 and In1 function that has to be called and simulates the TI keypad hardware, mapping Up/Down/Left/Right/2nd/Alpha to Up/Down/Left/Right/1/2.

The RAM issue can't be resolved easily. Copying some chunks of code to RAM (for self-modifying reasons) was necessary in some cases. As for the lack of the TIOS, there's no option but to write my own implementation of missing functions or dummy functions that don't do anything.

Even with the above, it's still not perfect. If I leave the object code in Gemini, the graphics are corrupted after a couple of seconds of play. I think the stack is overwriting some of the code I've copied to RAM.

No enemies make it a pretty bad 'game', but I thought it was an entertaining experiment.

Sega Tween

Thursday, 1st February 2007

No updates for a while, I'm afraid - things have been pretty hectic.

sega_tween_3d_stereo_pair.png

I packaged up and released the Sega Tween demo I'd been working on. As you can see, I added an SMS and a 3D mode - this works with the SMS 3D glasses. The extra 3D is quite cheap to calculate - shift the rotated X coordinates one way for one eye, then the other way for the other eye. After projection to the screen they need to be shifted back a little way to re-centre, but it works quite well.

sega_tween_3d_anaglyph.png

However, I had neglected the fact that the SMS1 (which has the card slot, and hence the model that supports the 3D glasses) had a bug in the VDP and as such only supports four zoomed sprites per scanline. I added this glitch to the emulator;


In other news, I've done a small amount of work on Brass. It's quite embarrassing, really, how slow the old version is. Assembling this file:

.rept 9000
	ld a,1
.unsquish
	ld a,2
.squish
	ret
.loop

...produces this in old Brass:

Brass Z80 Assembler 1.0.4.9 - Ben Ryves 2005-2006
-------------------------------------------------
Assembling...
Pass 1 complete. (2093ms).
Pass 2 complete. (22062ms).
Writing output file...
Errors: 0, Warnings: 0.
Done!

Nearly half a minute! New Brass does a much better job of syntax parsing and caching...

Brass Assembler - Copyright © Bee Development 2005-2007
-------------------------------------------------------
ZiLOG Z80 - Copyright © Bee Development 2005-2006
TI Program Files - Copyright © Bee Development 2005-2006
Core Plugins - Copyright © Bee Development 2005-2006

Parsing source...
Building...
Writing output...
Time taken: 484.38ms.
Done!

Down to just under half a second. That's almost a 50× speed increase!

VDP Interrupts

Wednesday, 20th December 2006

road_rash_2.png

The VDP can generate two different types of CPU interrupt.

The first, and easiest, is the frame interrupt, which is requested when an entire frame has been generated. This is requested, therefore, at a regular 60Hz in NTSC regions and 50Hz in PAL regions - it's a useful timer to synchronise your game to.

The second, and more complex, is the line interrupt. This interrupt is requested when a user-definable number of scanlines have been displayed. An internal counter is decremented each active line (and one more just after), and when it overflows it resets to the value held in a VDP register and requests the interrupt (so 0 would request an interrupt every line, 1 every other line and so on). For every other line outside the active display area, the counter is reset to the contents of the VDP register.

Both interrupt types can be enabled or disabled by defined bits held in the VDP registers.

(The above should be loosely correct, the below is a little more uncertain).

Once an interrupt is requested, a flag for said interrupt is set. The flag is not reset until the VDP control port is read, so you must read the VDP control port if you expect any further interrupts.

To differentiate between line and frame interrupts you can check the value read from the control port. If the most significant bit is set, a frame interrupt (at least) was requested. Reading the vertical counter port (which returns the current scanline's vertical position) will also let you know where you are.

Something is a little wonky with my vertical counting code, as all lines end up being one too large. For the moment I'm subtracting one before returning the value (and waiting one extra scanline before triggering the frame interrupt) which is a horrible solution, but for the moment it has fixed a number of games that weren't working at all before.

earthworm_1.gif earthworm_2.gif

Earthworm Jim, which relies on line interrupts to switch on zoomed sprites to dipslay the status bar at the bottom, now plays. It's missing some graphics on the title screens, though.

For some reason, rebuilding my Z80 emulator (which is in a different project) fixed some other interrupt-related glitches, so I have a sneaking suspicion most of my earlier problems are related to using an out-of-date DLL.

road_rash_1.gif road_rash_3.png road_rash_4.png

Road Rash highlighted another bug. The VDP can draw doubled sprites - that is, when a particular bit is set it will draw sprites as 8x16 pixels, stacking two consecutive sprite tiles on top of each other. Road Rash uses this mode, but also uses odd sprite indices (odd as opposed to even, not strange). The VDP will only take even indices, so a line of code to clear the least significant bit if using doubled sprites fixed that.

Still no sound, though.

New Z80 emulator

Wednesday, 6th December 2006

sonic_gg_1.png

One chap I cannot thank enough is CoBB for all his hard work in the Z80 field.

I've rewritten the Z80 emulation from scratch; this time it uses an expanded switch block (the 'manual' way) to decode instructions. Rather than write every combination of instructions out by hand, the code making up the switch blocks up is generated by another program, reading instruction information from a table copied from an Excel spreadsheet.

At the cost of a significantly larger assembly (from 40KB to 140KB) I now get a 100% speed increase (from ~50MHz to 100MHz).

I still can't pass the port of ZEXALL I'm testing with (and the same instructions too - not bad for a 100% rewrite to end up with exactly the same bugs), but after comparing some of my offending tables against CoBB's ones I've isolated some of the hiccoughs. The only instruction group test I fail is, naturally, the one that takes the longest to execute - getting a hardware, or indeed emulator, comparison takes well over an hour.

Anyhow, rewriting the Z80 emulation seems to have been the right thing to do. As you might have guessed from the picture at the top of this entry, Sonic now runs.

sonic_gg_2.png sonic_gg_3.png

My VDP (Video Display Processor) emulation is still rather rough-and-ready (I've really been concentrating on the Z80 bit) and the second column of background tiles is not updated correctly, so I apologise in advance for the distortion! It only appears in SMS mode (the display is cropped in Game Gear mode).

sonic_sms_1.png sonic_sms_2.png

The fill colour (in the left column here) is incorrect in most games too.

sonic_2_1.png sonic_2_2.png sonic_2_3.png sonic_2_4.png sonic_2_5.png

I rather preferred Sonic 2, but maybe that's because you can pick up dropped rings and I'm rather lousy at it otherwise.

wb_1.png wb_2.png wb_3.png

I accused the previous Wonder Boy III shot of not reading the start button. Somehow I also failed to notice the missing sprites (clouds and main part of the castle), which was part of the main problem. It now appears to play well.

vfa_1.png vfa_2.png vfa_3.png vfa_4.png

Support for zoomed sprites seems to be missing in some emulators (at least, the versions of Dega and Pastorama I have to hand), but I use them so implemented them to let my programs work when testing - it's nice to see a commerical game use them too!

gunstar_1.png gunstar_2.png

Gunstar Heroes runs, but flickers and jumps (not the visible sprite 'flicker' in those screenshots) - some bug in my CPU interrupt handling or VDP interrupt generation.

psycho_fox.png the_flash.png

Psycho Fox runs insanely fast - even faster than The Flash as seen to the right - making an already difficult game to control virtually impossible. I'm really not sure what could be causing that, but the enemy sprites sometimes flicker up and back down quickly, so it could be one of the remaining CPU bugs.

fantasy_zone_2.png maze_walker.png

Fantasy Zone demonstrates both the blanking colour bug (that left column should be green) and the distorted second column bug, but plays well. I tried to get a better screenshot of Maze Walker, but not handling 3D glasses makes looking at the screen a rather unpleasant experience (it flickers the left and right eye views in quick succession; the 3D glasses had two LCD shutters that opened and closed in sequence with the images on-screen).

I might add a Dega-esque red/green anaglyph filter, but I find those rather unpleasant to look at so might provide a stereo pair view.

In any case, the most important SMS game now runs -

bios_1.png bios_2.png bios_3.png

I shall refrain from using the Terry Pratchett quote (just this once).

ys_junk.png

Some games look like the above, which makes me happy - it's the ones that do nothing at all that worry me. What with the CPU bugs, dodgy emulation of the mapper, missing (important!) hardware ports and hackish VDP, it's surprising anything runs. It's getting better.

EDIT: How are there so many spelling errors? Fantasy Start as opposed to Fantasy Zone, Video Display Hardware as the expansion of the VDP acronym... I should not write these so late at night.

Compatibility increases further...

Thursday, 23rd November 2006

wonder_boy_1.png

I've added better memory emulation (that is, handling ROM paging, RAM mirroring and enabling a BIOS or not). I wouldn't dare say "more accurate", as that might indicate that something about it is partially accurate. smile.gif

I've isolated one of the biggest problems - and that's programs getting caught in a loop waiting for an interrupt that is never triggered.

The source of these interrupts is the VDP. It can generate two kinds of interrupt - on a line basis (where you can configure it to fire an interrupt every X scanlines) or on a frame basis (where it fires an interrupt at the end of the active display).

Two bits amongst the VDP's own registers control whether the interrupt fires or not. On top of that, there are two internal flags that are set when either of the interrupts fire. They are reset when the VDP's control port is read.

Charles MacDonald's VDP documentation
Bit 5 of register $01 acts like a on/off switch for the VDP's IRQ line. As long as bit 7 of the status flags [this is the frame interrupt pending flag] is set, the VDP will assert the IRQ line if bit 5 of register $01 is set, and it will de-assert the IRQ line if the same bit is cleared.

Bit 4 of register $00 acts like a on/off switch for the VDP's IRQ line. As long as the line interrupt pending flag is set, the VDP will assert the IRQ line if bit 4 of register $00 is set, and it will de-assert the IRQ line if the same bit is cleared.

The way I've chosen to emulate this is as a boolean storing the IRQ status, and to call a function, UpdateIRQ(), every time something that could potentially change the status of it happens.

private bool IRQ = false;
private void UpdateIRQ() {
    bool newIRQ = (LineInterruptEnabled && LineInterruptPending) || (FrameInterruptEnabled && FrameInterruptPending);
    if (newIRQ && !IRQ) this.CPU.Interrupt(true, 0x00);
    IRQ = newIRQ;
}

This detects a rising edge. Falling edge hasn't worked so well. Well, truth be told, neither work very well. So, hitting the I key in the emulator performs a dummy read of the control port which usually 'unblocks' the program.

I'm hoping that my problem is related to this similar one, as the fix seems pretty straightforwards.

The SEGA logo at the top of this entry is not the only evidence of working commerical software...

wonder_boy_2.png wonder_boy_3.png

Not reading the Start button is a common affliction.

faceball_2000.png

Here's the Game Gear DOOM for Scet. smile.gif

marble_madness_1.png marble_madness_2.png marble_madness_3.png marble_madness_4.png marble_madness_5.png

Marble Madness appears to be fully playable, though needs prompting from my I key every so often.

columns_1.png columns_2.png

Columns can't find the start button either, but the demo mode works.

pinball_dreams.png

ys.png desert_strike.png

Getting as far as a title screen is still an achievement in my books.

My luck can't hold out forever, but the homebrew scene is still providing plenty of screenshots...

bock_2002_1.png bock_2002_2.png bock_2002_3.png

Maxim's Bock's Birthday 2002 has been immensely useful for sorting out ROM paging issues.

bock_2003_1.png bock_2003_2.png

Bock's Birthday 2003 demonstrates the interrupt bug, but appears to run healthily otherwise.

hicolor_1.png hicolor_2.png hicolor_3.png

Chris Covell's Hicolor Demo also demonstrates this bug, needing a prod before each new image is displayed.

digger_chan_1.png digger_chan_2.png

Aypok's Digger Chan appears to play fine.

zoop_1.png zoop_2.png nibbles.png

Martin Konrad's Zoop 'em Up and GG Nibbles games run, Zoop 'em Up seems to have collision detection issues (aluop-related bug).

The bug in the Z80 core still eludes me - how aluop with an immediate value passes and aluop with a register fails is rather worrying. At least part of the flags problem has been resolved - the cannon in Bock's KunKun & KokoKun still don't fire, but at the switches now work so you can complete levels.

Sega: Enter the Pies

Monday, 20th November 2006

The Z80 core is now a bit more accurate - ZEXALL still reports a lot of glitches, and this is even a specially modified version of ZEXALL that masks out the two undocumented flags.

The VDP (Video Display Processor - the graphics hardware) has been given an overhaul and is now slightly more accurate. A lot more software runs now. I have also hacked in my PSG emulator (that's the sound chip) from my VGM player. It's not timed correctly (as nothing is!) but it's good enough for testing.

picohertz_tween.png

picohertz_wormhole.png

Picohertz, the demo I have been working on (on and off) now runs correctly. The hole in the Y in the second screenshot is caused by the 8 sprite per scanline limit. The first screenshot shows off sprite zooming (whereby each sprite is zoomed to 200% the original size). The background plasma is implemented as a palette shifting trick.

fire_track.png

fire_track_paused.png

Fire Track runs and is fully playable. The second shot shows a raster effect (changing the horizontal scroll offset on each scanline.

Seeing as I understand the instructions that my programs use (and the results of them), and have my own understanding of parts of the hardware, it's not really surprising that the programs I've written work perfectly, but ones written by others don't, as they might (and often do) rely on tricks and results that I'm not aware of, or on hardware that I haven't implemented accurately enough. At least I do not need to emulate any sort of OS to run these programs!

SMS Power! has been an amazing resource in terms of hardware documentation and homebrew ROMs. I've been using the entries to the 2006 coding competition to test the emulator.

kunkun_kokokun_title.png kunkun_kokokun_game.png

Bock's game, KunKun & KokoKun, nearly works. The cannon don't fire, which would make the game rather easy if it wasn't for the fact that the switch to open the door doesn't work either. I suspect that a CPU flag isn't being set correctly as the result to an operation somewhere.

pong.png

Haroldoop's PongMaster is especially interesting as it was not written in Z80 assembly, but C. It's also one of the silkiest-smooth pong games I've come across.

an!mal_paws_text.png an!mal_paws.png

An!mal/furrtek's Paws runs, but something means that the effect doesn't work correctly (the 'wavy' bit should only have one full wave in it, not two - it appears my implementation of something doubles the frequency of the wave). The music sounds pretty good, though.

columns.png

Sega's Columns gets the furthest of any official software - the Sega logo fades in then out.

sega_enterpies.png

I do like the idea that Sega is an ENTERPIεS. (From the Game Gear BIOS). (I believe this is a CPU bug).

frogs.png

Charles Doty's Frogs is a bit of a conundrum. The right half of the second frog is missing due to the 8 sprites per scanline limitation of the VDP. However, Meka, Emukon, Dega and now my emulator draw the rightmost frog's tongue (and amount if it showing) differently, as well as whether the frog is sitting or leaping. There's a lot of source for such a static program (it doesn't do anything in any emulator I've tried it on, nor on hardware). Dega is by far the strangest, as the tongue moves in and out rapidly. I'm really not sure what's meant to be happening here.

Here are the results of ZEXALL so far.

Z80 instruction exerciser

ld hl,(nnnn).................OK
ld sp,(nnnn).................OK
ld (nnnn),hl.................OK
ld (nnnn),sp.................OK
ld <bc,de>,(nnnn)............OK
ld <ix,iy>,(nnnn)............OK
ld <ix,iy>,nnnn..............OK
ld (<ix,iy>+1),nn............OK
ld <ixh,ixl,iyh,iyl>,nn......OK
ld a,(nnnn) / ld (nnnn),a....OK
ldd<r> (1)...................OK
ldd<r> (2)...................OK
ldi<r> (1)...................OK
ldi<r> (2)...................OK
ld a,<(bc),(de)>.............OK
ld (nnnn),<ix,iy>............OK
ld <bc,de,hl,sp>,nnnn........OK
ld <b,c,d,e,h,l,(hl),a>,nn...OK
ld (nnnn),<bc,de>............OK
ld (<bc,de>),a...............OK
ld (<ix,iy>+1),a.............OK
ld a,(<ix,iy>+1).............OK
shf/rot (<ix,iy>+1)..........OK
ld <h,l>,(<ix,iy>+1).........OK
ld (<ix,iy>+1),<h,l>.........OK
ld <b,c,d,e>,(<ix,iy>+1).....OK
ld (<ix,iy>+1),<b,c,d,e>.....OK
<inc,dec> c..................OK
<inc,dec> de.................OK
<inc,dec> hl.................OK
<inc,dec> ix.................OK
<inc,dec> iy.................OK
<inc,dec> sp.................OK
<set,res> n,(<ix,iy>+1)......OK
bit n,(<ix,iy>+1)............OK
<inc,dec> a..................OK
<inc,dec> b..................OK
<inc,dec> bc.................OK
<inc,dec> d..................OK
<inc,dec> e..................OK
<inc,dec> h..................OK
<inc,dec> l..................OK
<inc,dec> (hl)...............OK
<inc,dec> ixh................OK
<inc,dec> ixl................OK
<inc,dec> iyh................OK
<inc,dec> iyl................OK
ld <bcdehla>,<bcdehla>.......OK
cpd<r>.......................OK
cpi<r>.......................OK
<inc,dec> (<ix,iy>+1)........OK
<rlca,rrca,rla,rra>..........OK
shf/rot <b,c,d,e,h,l,(hl),a>.OK
ld <bcdexya>,<bcdexya>.......OK
<rrd,rld>....................OK
<set,res> n,<bcdehl(hl)a>....OK
neg..........................OK
add hl,<bc,de,hl,sp>.........OK
add ix,<bc,de,ix,sp>.........OK
add iy,<bc,de,iy,sp>.........OK
aluop a,nn...................   CRC:04d9a31f expected:48799360
<adc,sbc> hl,<bc,de,hl,sp>...   CRC:2eaa987f expected:f39089a0
bit n,<b,c,d,e,h,l,(hl),a>...OK
<daa,cpl,scf,ccf>............   CRC:43c2ed53 expected:9b4ba675
aluop a,(<ix,iy>+1)..........   CRC:a7921163 expected:2bc2d52d
aluop a,<ixh,ixl,iyh,iyl>....   CRC:c803aff7 expected:a4026d5a
aluop a,<b,c,d,e,h,l,(hl),a>.   CRC:60323322 expected:5ddf949b
Tests complete



The aluop (add/adc/sub/sbc/and/xor/or/cp) bug seems to be related to the parity/overflow flag (all other documented flags seem to be generating the correct CRC). daa hasn't even been written yet, so that would be the start of the problems with the daa,cpl,scf,ccf group. adc and sbc bugs are probably related to similar bugs as the aluop instructions.

The biggest risk is that my implementation is so broken it can't detect the CRCs correctly. I'd hope not.

In terms of performance; when running ZEXALL, a flags-happy program, I get about ~60MHz speed in Release mode on a 2.4GHz Pentium 4. When ZEXALL is finished, and it's just looping around on itself, I get ~115MHz.

The emulator has not been programmed in an efficient manner, rather a simple and clear manner. All memory access is done by something that implements the IMemoryDevice controller (with two methods - byte ReadByte(ushort address) and void WriteByte(ushort address, byte data)) and all hardware access is done by something that implements the IHardwareController interface (also exposing two methods - byte ReadDevice(byte port) and void WriteDevice(byte port, byte data)).

Most of the Z80's registers can be accessed via an index which makes up part of an opcode. You'd have thought that the easiest way to represent this would be, of course, an array. However, it's not so simple - one of the registers, index 6, is (HL) - which means "whatever HL is pointing to". I've therefore implemented this with two methods - byte GetRegister(int index) and void SetRegister(int index, byte value).

Life isn't even that simple, though, as by inserting a prefix in front of the opcode you can change the behaviour of the CPU - instead of using HL, it'll use either IX or IY, two other registers. In the case of (HL) it becomes even hairier - it'll not simply substitute in (IX) or (IY), it'll substitute in (IX+d), where d is a signed displacement byte that is inserted after the original opcode.

To sort this out, I have three RegisterCollections - one that controls the "normal" registers (with HL), one for IX and one for IY. After each opcode and prefix is decoded, a variable is set to make sure that the ensuing code to handle each instruction works on the correct RegisterCollection.

The whole emulator is implemented in this simplified and abstracted manner - so I'm not too upset with such lousy performance.

I'm really not sure how to implement timing in the emulator. There's the easy timing, and the not-so-easy timing.

The easy timing relates the VDP speed. On an NTSC machine that generates 262 scanlines (60Hz), on a PAL machine that generates 313 scanlines (50Hz). That's 15720 or 15650 scanlines per second respectively.

According to the official Game Gear manual, the CPU clock runs at 3.579545MHz. I don't know if this differs with the SMS, or whether it's different on NTSC or PAL devices (the Game Gear is fixed to NTSC, as it never needs to output to a TV, having an internal LCD).

I interpret this as meaning that the CPU needs to be run for 227.7 or 228.7 cycles per scanline. That way, my main loop looks a bit like this:

if (Hardware.VDP.VideoStandard == VideoStandardType.NTSC) {
    for (int i = 0; i < 262; ++i) {
        CPU.FetchExecute(228); 
        Hardware.VDP.RasteriseLine();
    }
} else {
    for (int i = 0; i < 313; ++i) {
        CPU.FetchExecute(229);
        Hardware.VDP.RasteriseLine();
    }
}

The VDP raises an event when it enters the vertical blank area, so the interface can capture this and so present an updated frame.

The timing is therefore tied to the refresh rate of the display.

super_gg.png

Here's the fictional Super Game Gear, breezing along at 51MHz. The game runs just as smoothly as it would at 3MHz, though - as the game's timing is tied to waiting for the vertical blank.

Actually, I tell a lie - as Fire Track polls the vertical counter, rather than waiting for an interrupt, it is possible for it to poll this counter so fast (at an increased clock rate) that it hasn't changed between checks. That way "simple" effects run extra fast, but the game (that has a lot of logic code) runs at the same rate.

This works. The problem is caused by sound.

With the video output, I have total control of the rasterisation. However, with sound, I have to contend with the PC's real hardware too! I'm using the most excellent FMOD Ex library, and a simple callback arrangement, whereby when it needs more data to output it requests some in a largish chunk.
If I emulate the sound hardware "normally", that is updating registers when the CPU asks them to be updated, by the time the callback is called they'll have changed a number of times and the granularity of sound updates will be abysmal.

A solution might be to have a render loop like this:

for (int i = 0; i < 313; ++i) {
    CPU.FetchExecute(229);
    Hardware.VDP.RasteriseLine();
    Hardware.PSG.RenderSomeSamples(1000);
}

However, this causes its own problems. I'd have to ensure that I was generating exactly the correct number of samples - if I generated too few I'd end up with crackles and pops in the audio as I ran out of data when the callback requested some, or I'd end up truncating data (which would also crackle) if I generated too much.

My solution thus far has been a half-way-house - I buffer all PSG register updates to a Queue, logging the data written and how many CPU cycles had been executed overall when the write was attempted. This way, when the callback is run, I can run through the queued data, using the delay between writes to ensure I get a clean output.

As before, this has a problem if the timing isn't correct - rather than generate pops or crackles, it means that the music would play at an inconsistent rate.

Of course, the "best" solution would be to use some sort of latency-free audio solution - MIDI, for example, or ASIO. If I timed it, as with everything else, to scanlines I'd end up with a 64µs granularity - which is larger than a conventional 44.1kHz sample (23µs), so PWM sound might not work very well.

chip_8.png

Incidentally, this is not the first emulator I have written - I have written the obligatory Chip-8 emulator, for TI-83 calculator and PC. Being into hardware, but not having the facilities to hand to dabble in hardware as much as I'd like to, an emulator provides a fun middle-ground between hardware and software.

wormhole.gif

Multithreading on the TI-83 Plus

Thursday, 3rd August 2006

I'm not really clued up on the way these new-fangled modern operating systems handle running multiple threads, so I apologise in advance if this isn't really proper multithreading.

Basically, I was pondering (as one does) whether it would be possible to run any number of threads on the TI-83 Plus. I daresay others have done this in the past, but not having access to the internet at the time to check I thought I'd do my own bit of experimentation.

The way I can see this working is to give each thread a small amount of time to run in (time slice). At the end of one thread running, I would save the calculator's state away, and load the state for the next thread. I would keep on swapping threads several times a second, giving each their own bit of time and preserving their state.

What I really needed to know was what needed to be preserved between executions. Really, there isn't a lot - just the registers. PC and SP clearly need to be preserved, so that when a thread is resumed it resumes running at the point it was left at. Keeping the other general-purpose registers (AF, BC, DE, HL, IX, IY) safe would be important as well.

Naturally, I'll need to also preserve the stack. Or rather, for each thread, I'll need to allocate some memory for a stack and point the new thread's SP register at the end of that new memory. To keep things simple, I'll just point my test thread's SP at $FFFF-200, as the TI is arranged to have 400 bytes of stack RAM, growing backwards from $FFFF.

To perform the swapping, I'll use a custom interrupt (run in IM 2). The timer hardware will trigger this interrupt many times a second, so it is ideal. At the end of my custom interrupt, I'll jump into the default handler provided by the TIOS so the TIOS functions that rely on it behave correctly.

The way I see it working is like this:

-> Enter interrupt handler.

Pop value off stack. This will be the program counter of the last running thread. (->A)
Save current stack pointer. (->B)

Set stack pointer to area in RAM where I can preserve registers for this thread. (<-C)

Push IY, IX, HL, DE, BC, AF to the stack.
Push old stack pointer (<-B) to stack.
Push old program counter (<-A) to stack.

Find the address of the RAM where we have stored the registers for the next thread.
Load it into the stack pointer.

Pop value off stack, store it (will be PC) (->D)
Pop value off stack, store it (will be SP) (->E)

Pop AF, BC, DE, HL, IX, IY off stack.

Save current stack pointer (end of RAM area used to store registers for new thread) as address of last thread for next interrupt. (->C).

Load value into SP (<-E)
Push new thead's PC to stack (<-D)

Jump into TIOS interrupt handler ->

Effectively, all this does is take the pointer stored when the current thread was resumed, dump all the registers into the memory it points to, hunts the next thread, reloads the registers and saves the pointer for next interrupt.

Does it work?

_getkey.gif

The above screenshot doesn't look too impressive, I'll give you that. There are only two threads running. The secondary thread is drawing all those random dots on the LCD. The other thread (which is the main program thread) just contains "bcall(_getkey)" - hence the 2nd/Alpha icons appearing,

I'll make the primary thread do something a bit more exotic - display random characters on the left side of the screen, the secondary work on the right side of the screen:

randomchars.gif

The code for this is simply:

Main
	.include "Multithread/Multithread.asm"


	; Kick things into action.
	call Multithread.Init
	
	; For the moment, the secondary thread is hard-coded, and 
	; the 'multithreading' code ONLY switches back and forth between
	; two hard-coded thread slots.
-

	halt ; Slow things down a bit here.

	ld b,8
	call ionRandom
	ld (curCol),a

	ld b,8
	call ionRandom
	ld (curRow),a

	ld b,0
	call ionRandom
	bcall(_PutMap)
	bcall(_getCSC)
	cp skClear
	jr nz,{-}
	
	call Multithread.End
	ret
	
; ---------------------------------------------------------
	
SecondaryThread
	
	di ; Don't want any other thread writing to the LCD!
			
	ld b,64
	call ionRandom
	add a,$80
	push af ; Save for later...
		out ($10),a
		
		ld b,48
		call ionRandom
		add a,48
		ld b,a
		srl a
		srl a
		srl a
		add a,$20
		out ($10),a
	
		ld a,b
		and 7
		ld c,%10000000
		jr z,{+}
		ld b,a
	-	srl c
		djnz {-}
	+
	
		in a,($11)
		call LcdBusy
		in a,($11)
		xor c
		ld c,a

		call LcdBusy

	pop af
	out ($10),a

	call LcdBusy
	ld a,c
	out ($11),a			
		
	ei
	
	jr SecondaryThread
	
LcdBusy
	push af
	inc hl
	dec hl
	pop af
	ret

As you can see, there are two discrete threads running there.

Well, that's two threads up and running. What's needed to extend this to any number of threads?

  • Thread management. Basically, the ability to add/remove threads at will, and have the handler be able to switch between as many threads as required.
  • Allocation of new stack space. New threads will need more stack space, so I shall need to add some mechanism to steadily allocate it.
  • Idle threads. One big problem is that as you add threads, each one gets progressively slower. If threads flag themselves as idle, they can be skipped so threads that do need CPU time get it. The easiest way to do this is to set a "sleep" counter which is checked when threads are switched around - if it's zero (when it runs out), the thread is given some time to run.

Off the top of my head, that's about it. There are some issues I have come across when working with this:

  • TIOS routines are not thread-safe. This speaks for itself, pretty much - if you have two threads, both of which are calling bcall(_putC) to display a character, they interfere with eachother (for example - incorrectly setting the value of curCol or curRow as one thread changes it as the other one is reading it) which causes crashes.
  • Variables can become an issue. Same reason as above - if two threads call a function which uses a set RAM location for a local variable, the variable can be changed half way through.

One possible way to get around thread-unsafety of functions is to disable interrupts before calling them and reenabling them afterwards, which prevents the thread switcher from swapping them. It's a pretty lousy workaround, though. sad.gif

Structs and sensible variable layout

Tuesday, 9th May 2006

When developing an assembly program, you need to 'declare variables' by attaching an address (in RAM) to a label.

The problem here, of course, is having to calculate all the relevant positions in RAM for each variable. For example,

ram = $C000 ; Assume RAM starts at address $C000

var1 = ram+0
var2 = ram+1
var3 = ram+3 ; var2 is 2 bytes!
; ... and so on ...

Now this is fairly painful. So, I added .varloc and .var directives to Brass:

ram = $C000 ; Assume RAM starts at address $C000

.varloc ram, 1024 ; 1024B in size

.var 1, var1
.var 2, var2
.var 1, var3
; ... and so on ...

This eases things a bit, but what if you have lots of variables and a number of different RAM areas? Now you still have to shuffle things around to fit. So, now, Brass allows you to define multiple areas of memory for variables (through multiple .varloc statements) and shuffles around all the variables to best fit in the available memory. Variables defined using .tempvar can even overwrite eachother (provided they are in different, not-nested modules) to save space.

Of course, sometimes you need variables to share consecutive areas of RAM, so I also added structure support.

; Define it

.struct Point2D
    .var db, X
    .var db, Y
.endstruct

; Use it

.var Point2D, Me

    ld a,10
    ld (Me.X),a
    
    ld a,32
    ld (Me.Y),a
    
    ld hl,10+32*256
    ld (Me),hl

; Or even:

.struct Point3D
    .var Point2D, P
    .var db,      Z
.endstruct

.var Point3D, You

    ld a,(You.P.X)

And, to comply with the picture requirement, have something ancient and completely unrelated.

hippo.gif

Getting it to work

Wednesday, 16th November 2005

Reliability

I'm sure you wouldn't want to use a keyboard that was inaccurate. Unfortunately, (as you might have seen me moaning in the past) the keyboard generates the clock signal. This basically means you need to be constantly checking the clock line to see if it goes low - that or use a hardware interrupt the jumps in and receives the scancode packet when the keyboard decides you want to stop sending something.

Well, I don't have the luxury of an extra keyboard chip or a hardware interrupt, so I needed to don my thinking cap. I had thought that one possible way around this would be to send a "disable" command to the keyboard after receiving bytes, running my handler code, then sending an "enable" command - these commands clear the keyboard's buffer, unfortunately, which is not good.

I downloaded another document on the AT protocol to see if they mentioned anything useful, and lo and behold:

The host may inhibit communication at any time by pulling the Clock line low for at least 100 microseconds.

I think I can spare 100µs - I added the line ld a,1 \ out (bport),a to the end of my buffer-filling code from earlier (it fills a scancode key buffer) - and the code doesn't drop a single scancode. Result!

Providing useful functionality

All these keyboard routines do at the moment is to display the data coming in on the screen - not exactly a great use of them. What is really needed is a simple two-way handler - it calls two different user-defined routines based on what a key is doing, whether it is being pushed down or released.

For this, I should translate the scancodes into a new format - there are less than 256 keys, so there's no reason why I can't fit every single key into a byte.

As well as user-defined keyboard events, I'll have to add my own to handle toggling the status of the keyboard - the num/caps/scroll lock as well as the shift/alt/ctrl.

It's really quite simple to do.

	; Now we need to run through all the keyboard events!

	ld ix,_buffer	; Start at the beginning of the buffer.

_handle_next_scancode:

	; Let's assume it's a normal key:

	ld hl,_scancode_lut
	ld bc,_scancode_lut_end-_scancode_lut

_key_down_handler:
	ld de,_null_handler


	ld a,(ix+0)
	or a
	jr z,_handled_all_keys

	cp at_scs_enhance	; Is it an enhanced key?
	jr nz,_not_enhanced

		inc ix	; We know it's enhanced, so move along.
		ld a,(ix+0)
		or a
		jr z,_handled_all_keys
		ld hl,_scancode_e_lut
		ld bc,_scancode_e_lut_end-_scancode_e_lut
_not_enhanced:

	; Now HL points to our LUT, BC is the correct length
	; and IX points to the next byte.

	cp at_scs_keyup ; Is it a key UP event?
	jr nz,_not_keyup

		inc ix
		ld a,(ix+0)
		or a
		jr z,_handled_all_keys
_key_up_handler:
		ld de,_null_handler

_not_keyup:

	; Move to next chunk before we do anything
	inc ix
	; At this point, A=scancode, HL->translation table, DE->handler, BC=table size, IX->next scancode.

	; We now need to run the translation.

	cpir	; Simple as that!

	jr nz,_handle_next_scancode	; Not found

	; So HL->scancode+1
	; Here is where the magic happens:

	push de
	ld de,0-_scancode_lut
	add hl,de
	ld a,l
	pop de

	; Now, A = 'real' scancode.

	; We need to 'call' DE.
	; We can spoof this easily:

	ld hl,_handle_next_scancode
	push hl	; _h_n_s is on TOP of the stack.
	push de ; our event handler is on top of the stack;
	ret	; POP back off stack and jump to it.
		; When the handler RETs it'll pop off _h_n_s and carry on scanning!


_handled_all_keys:

_null_handler:
	ret
Basically, I just run through all the received bytes. I check for $E0 (if so, switch to the alternate scancode conversion table for the extended codes) then $F0 (if so, load the alternate handler for a key up event rather than a key down event) then look up the scancode on the selected code table.

I created a couple of very basic event handlers - the key down event displays ↓ followed by the adjusted key code, the key up event displays ↑ followed by the adjusted key code.

keyhandlers.jpg

What would be ideal would be to provide internal event handlers that could be called on keyup/keydown which would then jump over to the user's custom handler. These event handlers could look for special keys and adjust the keyboard LEDs and set internal flags that could be used to detect the status of certain keys. I'd need:

  • Num Lock
  • Caps Lock
  • Scroll Lock
  • Shift
  • Ctrl
  • Alt

Unfortunately, I think that there is a problem with my byte-sending code. Setting the keyboard LEDs starts to do strange things - I now have two options;

  1. After switching status, ignore the LEDs. The status flag is set correctly internally, but you can't see it on the keyboard (which is a bit pants).
  2. Update the status flags every single time we run through the loop to check for any new bytes (which makes the keyboard lag like crazy - there's up to half a second of buffering going on!)

Mixing-and-matching the two - rewriting the branch before we bring the clock high again (for maximum speed) confuses the keyboard - the status LEDs never change and it decides to disable itself in a strop until I send $FF (the reset command) again. I think it's time to revisit the at_send_byte routine again to see what it's doing wrong!

Well, comparing it to my new notes - it's actually completely wrong at the end, when it comes to sending the parity/stop/ACK bits! A quick rewrite to how I think it should go isn't too hopeful - the keyboard LEDs flash like mad. Tweaking the timing by throwing in a few calls to _wait_bit_low and _wait_bit_high to synchronise my data to the clock stop this completely - and now the code is as it was before, at 100% accuracy - but about twice as fast.

Replacing my branch code still doesn't work all the time - sometimes the LEDs change, sometimes they do not. Not believing it would work, I threw in a check for the ACK bytes returned - if they were $FE, the 'repeat last command' byte, I'd send again.

My routines were clearly not as broken as I thought - the keyboard LEDs now change status perfectly, and as much as I hammer the Num Lock, Caps Lock and Scroll Lock keys, I cannot lock up the program or get the keyboard LEDs to display the wrong value. Not to mention that keying in other keys is back to the lightning fast response they used to be...

TI Keyboard

Tuesday, 15th November 2005

Once again, I waste my time doing something remarkably useless - it's trying to connect a keyboard to my TI graphing calculator!

in_place.jpg

Idea

Alas, this is not an original idea - I had seen some work done on this in the past, and tried fiddling with the code myself, not really understanding much of it. I decided to have a go - from scratch, and on my own.

Constructing the hardware

keyboard_before.jpg

I decided to butcher an old, suitably discoloured keyboard (in a nice way, though, so if all went wrong I could reassemble it). I don't have a soldering iron, nor any sort of decent cabling or plugs so I planned to just cut and strip the wires inside the keyboard and attach (by twisting together and large amounts of Sellotape) a power cable and 2.5mm stereo minijack plug (the TI has a 2.5mm stereo minijack socket on it as a data port).

enough_screws.jpg

I'm not sure they put enough screws into this thing... (by contrast, my main keyboard has a whopping 2 screws in it - this keyboard's designer was clearly paid by the screw!)

controller.jpg

Finally (and with a slightly sore wrist), I get to the bit I need - the keyboard controller board. The circuit board has the four leads soldered straight into it - I'd been hoping for the more typical sight of a block that the cable plugs into, but unfortunately it looks like I'll be cutting wires and hoping not to snap them off by accident. The four wires have been labelled VCDO - I'll cross my fingers and assume these stand for VCC (+5V), Clock, Data and 0V.

controller_stripped.jpg

leads_attached.jpg

Out it comes, the wires are stripped and reattached. I then tested all the leads using my multimeter and created some stunning ASCII art to remind me what went where:

HARDWARE SCHEMATIC:

PS/2 SOCKET:                               TI CONNECTOR:

  6.-++-.5      4: VCC   >-[+5V]                _
  /o || o\      5: Clock >---< 1: Red    Tip   /_\.
 |   ++   |     1: Data  >---< 2: White  Ring  }_{
4|o      o|3    3: Gnd   >-+-< 3: Copper Base  | |
  \      /               __|__                 |_|
   \o__o/                 ---  0V             /   \.
   2    1                  '                  \___/===> To TI
 ^ Note that this is the SOCKET view
and not the plug view for a PS/2 port.

taped.jpg

With that done, I taped down the connections and screwed the whole thing back together. Tapping a 9V PP3 battery to the power leads makes the keyboard boot up; doesn't look like it's broken quite yet!

Writing the code

What with the hardware now hopefully complete, it's a simple case of writing the code to support the keyboard. *cough cough*

Basically, I need to write an implementation of the AT protocol. The protocol is fairly simple, but unfortunately the keyboard generates the clock signal for us (the TI doesn't have accurate timing at all - it's an RC circuit, and so the clock rate drops as the batteries go flat). Let's hope the TI can keep up!

I guess the easiest code to write would be code that sets the output high (as it's an open collector circuit, high is the default level - either side can pull this low and hold it there) then poll the clock and wait for it to drop then go back up again. The clock line maps to bit 0 of the link port, and the data line to bit 1 of the link port. The TI's data IO port can be controlled using these two lower two bits of the hardware port equated as 'bport' (port 0) in ti83plus.inc

Here's what I tried;

init_all:
    ; Set link port high
    ld a,%00000011
    out (bport),a


wait_bit_low:
    in a,(bport)        ; Read in the status of the bport.
    and %00000001       ; Mask out bit 0 (clock)
    jr nz,wait_bit_low  ; Is it non-zero (high?) If so, loop back.

wait_bit_high:
    in a,(bport)        ; Read in the status of the bport.
    and %00000001       ; Mask out bit 0 (clock)
    jr z,wait_bit_high  ; Is it zero (loop?) If so, loop back.

    ret                 ; Break out of the program

Strange, whatever I do - init the keyboard, tap keys, nothing happens. The program goes into an endless loop. Using the 9V to manually set the line low then high again, I realise something - I keep forgetting that the port works backwards, and that writing %00000011 sets the status to %00000000 - and when a line is held low by either device, it can't be brought up again very easily. Replacing the offending line with xor a to clear it to zero worked a treat - pressing any key on the keyboard exits the program.
Each "packet" on the AT protocol (it's a bit grand to call it that) is made up of 11 bits - 1 start bit, 8 data bits and 2 parity/stop bits. A djnz loop to get the 8 bits and a handful of rotate instructions to populate the result byte gives me this:

.module AT_Protocol

at_timeout = 255

at_get_byte:
    ; Clear Link port
    xor a
    out (bport),a

    ; Get the start bit:

    call _wait_bit_low
    call _wait_bit_high


    ; Now we need to get the 8 bits for the byte

    ; Reset the output byte
    ld c,0

    ld b,8

_get_byte_loop:

    call _wait_bit_low

    ; Now we get the bit itself
    in a,(bport)
    rrca
    rrca
    rr c

    call _wait_bit_high

    djnz _get_byte_loop

    ; Get the parity/stop bits

    call _wait_bit_low
    call _wait_bit_high
    call _wait_bit_low
    call _wait_bit_high

    ; Clear flags, load code into accumulator and exit
    xor a
    ld a,c
    ret

_get_byte_fail:
    ; Set nz to indicate failure, return.
    or 1
    ret


_wait_bit_low:
    push bc
    ld b,at_timeout
_wait_bit_low_loop:
    in a,(bport)
    and 1
    jr z,_waited_bit_low
    djnz _wait_bit_low_loop
    pop bc
    pop bc
    jr _get_byte_fail
_waited_bit_low:
    pop bc
    ret

_wait_bit_high:
    push bc
    ld b,at_timeout
_wait_bit_high_loop:
    in a,(bport)
    and 1
    jr nz,_waited_bit_high
    djnz _wait_bit_high_loop
    pop bc
    pop bc
    jr _get_byte_fail
_waited_bit_high:
    pop bc
    ret

Not too tricky at all! Amazingly, this code ran first time too. (Amazingly for me, that is). The test program just reveices a byte from the keyboard and displays it on the screen.

it_lives.jpg

One minor problem is that sometimes the code received differs by a bit to what it should (you can see this by holding down a key and noting how sometimes the code is different - I've written a short program that just displays the code on-screen when it's received). Consulting my AT protocol notes, I find that "After the clock line goes low a 5-microsecond pause is used so that the data line has time to latch." Maybe my pause isn't long enough?

Sadly, that did quite the opposite - the results are even more unpredictable. I guess it would be more better to try speeding up my code, rather than slowing it down..?

After increasing the speed a little (without unrolling all the loops, that is) the routines are (by and large) slightly more accurate. Still not perfect, but they'll do for the time being. If I press a key the release it just before it repeats, the accuracy is 100% perfect - I suspect that the problem is that in the time the rest of my program has drawn the last keycode, the keyboard has pottered away and tried to output another byte and I'm jumping in half way through. I guess I'll have to write some clever buffering code to handle that!

Interrupts

I thought that an ideal way to handle the timing/speed problem was to create a piece of code that could be loaded as a sort of driver. The calculator would be set up into interrupt mode 2 and would call the driver 100-or-so times a second. The code could then try to see if a byte was coming in, and if so it would add it to a buffer. A routine isolated from the rest of the code could then read a byte from the buffer and shift all the other items down to replace it - a sort of FIFO stack.

The Z80 has three interrupt modes; 0, 1 and 2. Interrupt mode 0 is pretty useless to us; interrupt mode 1 is the normal mode of operation. In this mode, the CPU pushes the program counter to the stack then jumps to memory location $38 every time an interrupt occurs. The interrupt handler then swaps the main CPU registers away with their shadow register pairs, does something, then swaps the CPU registers back again. Finally, you pop the old program counter off the stack and jump back to where you were. You could think of it as having a second thread running, only a lot less hi-tech and more restrictive.

Interrupt mode 2 is a stranger beast. The main difference is that it doesn't just jump to $38 - it creates a 16-bit address using the register I as the most significant byte and a byte off the data bus as the least significant byte - effectively, we have a 16-bit number made up of i*256+?. The CPU then loads the value in the memory location pointed to by this 16-bit value, then calls this address.

What does this mean for us? Well, it means that rather relying on the interrupt at $38 we can load our own interrupt into memory!

We need to do three things:

  1. Create a 257-byte lookup table aligned to a 256-byte boundary for the CPU to read from after it has build up the 16-bit address.
  2. Set the I register to the most significant byte of the start address of our lookup table.
  3. Copy our interrupt handler to the location our table points to, switch the interrupt mode to two and enable interrupts.

The way I've done this is:

  • Filled memory locations $8700 to $8800 (inclusive) with the byte $86.
  • Loaded $87 into the I register.
  • Copied my interrupt handler to $8686

My interrupt handler at $8686 is a simple jp instruction to jump back to my program for the sake of practicality.

Unfortunately, this approach doesn't work (and I tried a lot of different ways to get it to!) One reason for failure is that in im 2, the main interrupt handler at $38 isn't getting executed. The TIOS relies pretty heavily on this interrupt to work; most functions cause a pretty nasty crash or do other strange things. Fine, I say to myself, and replace my reti call at the end of my interrupt handler with a jp $38 to manually call the TIOS interrupt. The behaviour gets even stranger - calling ionFastCopy (a function, non-TIOS related, to copy the display buffer to the LCD) causes strange rippling effects to appear on the LCD, followed by a full-out crash when I finally quit the program.

On top of all this, the few times I can get a display of the key buffer I can see that it's not updating very frequently... The interrupt is not checking the port frequently. All this for nothing!

As far as I can see it, the only way for an interrupt-based technique to work would be for the TI to have a hardware interrupt - using the keyboard's clock connected to the CPU, so that whenever the clock goes low the TI could spring into action and receive the byte. Seeing as the only access to the CPU I have without invalidating my warranty is via the data port, I'm a bit stuck.

Back to square one

I guess the only way is to agressively poll the port... First up, I rewrote the code so that instead of displaying a decimal version of the code, I'd display a hexadecimal version - significantly easier to read, faster to convert. I then painstakingly noted down every key's scancode from this into a useful include file.

One problem with the original TI keyboard project was that it had problems with input; it would occasionally forget about shift being pressed, or accidentally repeat keys. I think I now know why...

On an AT protocol keyboard, scancodes are sent every time a key is pressed and again when the key is released. To differentiate between the two different actions, when a key is released the scancode is preceded by the special code $F0.

I reckon that the problem was that the function to get a byte would have been called, followed almost immediately with the code to translate/display it. What I intend to do instead is to get bytes in a loop and add them to a buffer until either the routine times out (no more bytes being sent) or the buffer is full (shouldn't happen!)

keycodes.jpg

As you can see, this system works. The above codes are special ones generated by pressing some of the extended keys (the cursor keys) - they send the code $E0 followed by the key itself. Some of the codes are downright silly - PrintScreen sends the command string E0F07CE0F012 when released - Pause sends E11477E1F014F077!

Infuriatingly... this is STILL not perfect! I'm still losing some bytes. What to do - if only there was a way to control the keyboard, to stop it from scanning... wait a minute...

Controlling the Keyboard

Sending a byte is not too different from receiving a byte - you hold the clock and data lines low, then just the data low, then write out the bits as the keyboard requests them on clock. The easiest code to transmit is $FF - keyboard reset. According to my notes, the keyboard should respond with the power-on self-test byte as well as resetting. Lo and behold, the keyboard lights flash and I get $AA back - the self test has passed. I also get $E8 back, and my notes don't mention $E8 anywhere, but I'll ignore that for a minute and bask in the glory of it otherwise working perfectly.

The next code I try is $EE. This is the echo code - in theory, the keyboard should send back $EE. Sadly, it doesn't. Damn. It just resets and sends back $AA, though it sometimes sends back $B8 as well.

On close inspection, it seems pretty obvious what I've done wrong - I'm completely neglecting to send the parity and stop bits. Oops. After adding them, the keyboard responds $EE to my $EE - which is quite correct! The parity is hard coded, so I'm glad it works.

Trying another code, $F2, gets the keyboard to spit back an unfriendly $FE - which translates as "resend, you idiot". $EE is %11101110, which contains 6 set bits. $F2 is %11110010 which contains 5 set bits - the parity needs to be reversed. Hopefully calculating the parity shouldn't consume too many clock cycles - after I extract the bit to send, I need to add it to another counter. I can then use the lowest bit of this counter as my parity bit.

Thankfully, I have sufficient time to calculate the parity - the keyboard now responds FAAB83. $FA is the ACK code, AB83 are the ID bytes. Scarily, my notes say that the ID bytes are $83 then $AB - I'm going to hope that this is an error in the notes - I doubt my routines are going to be able to mix up whole bytes!

Now for the main reason you'd want to send codes to the keyboard - to flash the LEDs, of course! The code is $ED - the keyboard should respond with an ACK ($FA), to which I send the status byte (the lower 3 bits control the LEDs). The code is pretty simple (albeit without any checking on the ACK):

    ld a,$ED
    call at_send_byte   ; Command
    call at_get_byte    ; ACK
    ld a,%00000101      ; Caps Lock and Scroll Lock on
    call at_send_byte   ; Command
    call at_get_byte    ; ACK

lights.jpg

Ace. I'll add the equates for the various commands to my include file. No doubt I can get some more done with this project tomorrow...

Scripted attacks

Monday, 3rd October 2005

Attack patterns can now be simply scripted as a list of 4-byte chunks, covering which enemy to use, the delay between adding them, how many of them to add, the delay between them and a delay after adding the last one.
For example, this:
.db $04,16,8,100
...would add 8 enemies of type $04, adding one every 16 game ticks and then pausing for 100 game ticks after adding the last one then progressing to the next scripted enemy.
To support this, I've added some new per-level parameters, covering:

  • Attack type (random or 'scripted') with delay between enemies or a pointer to script to follow.
  • Speed of enemies.
  • Speed of landscape scrolling (used very little - bonus levels with no enemies/mines scroll past extra-quickly).
  • Maximum number of mines.

Using all the above, I can easily configure each level's attack patterns quite simply.
Doing this has identified a number of bugs (mostly where new enemies were being initialised without clearing out a particular byte, which means that certain sequences would start in odd places) which have now been ironed out.

I have also picked up work again on my music system for the game. I have a very bad 12-bar-blues demo running with it - I need to find a decent pitch-to-period table as the one I calculated in Excel sounds slightly wrong. There are also some minor-ish reset bugs (the first time in-game a note is played the instrument is full volume for one frame plus some minor synch issues). Looks like I'll have to write the music in Notepad, though - who'd have thought that getting low-level access to the sound card was so bloody difficult in anything other than C (and I'm damned if I'm going to have to write a GUI system for the Windows console, and Win32 is too mucky to deal with for such as simple application). This is great fun, as you can imagine - take, for example, this: (the 12-bar-blues demo for the testbed for the music system)

; Demo tune

demo_tune:

; Instrument table:

.db 2		; Number of instruments
.dw simple
.dw vib

; Sequence table:

.db 6		; Number of sequences
.dw run_c
.dw run_f
.dw run_g
.dw bass_c
.dw bass_f
.dw bass_g

; Tune!

.db %00000000, %00000000
.db %00000011, %00000001
.db %10000000, %10001000

.db %00000000, %00000000
.db %10000000, %10001000

.db %00000001, %00000000
.db %00000100, %00000001
.db %10000000, %10001000

.db %00000000, %00000000
.db %00000011, %00000001
.db %10000000, %10001000

.db %00000010, %00000000
.db %00000101, %00000001
.db %10000000, %10001000

.db %00000001, %00000000
.db %00000100, %00000001
.db %10000000, %10001000

.db %00000000, %00000000
.db %00000011, %00000001
.db %10000000, %10001000

.db %00000000, %00000000
.db %10000000, %10001000

.db %11111111

; Sequences:

run_c:
	.db 1

	.db (144>>8)+%01000000
	.db (144&%11111111)
	.db $10

	.db (112>>8)+%01000000
	.db (112&%11111111)
	.db $10

	.db (93>>8)+%01000000
	.db (93&%11111111)
	.db $10

	.db (82>>8)+%01000000
	.db (82&%11111111)
	.db $10

	.db (77>>8)+%01000000
	.db (77&%11111111)
	.db $10

	.db (82>>8)+%01000000
	.db (82&%11111111)
	.db $10

	.db (93>>8)+%01000000
	.db (93&%11111111)
	.db $10

	.db (112>>8)+%01000000
	.db (112&%11111111)
	.db $10

	.db %11000000

run_f:
	.db 1

	.db (105>>8)+%01000000
	.db (105&%11111111)
	.db $10

	.db (82>>8)+%01000000
	.db (82&%11111111)
	.db $10

	.db (68>>8)+%01000000
	.db (68&%11111111)
	.db $10

	.db (60>>8)+%01000000
	.db (60&%11111111)
	.db $10

	.db (56>>8)+%01000000
	.db (56&%11111111)
	.db $10

	.db (60>>8)+%01000000
	.db (60&%11111111)
	.db $10

	.db (68>>8)+%01000000
	.db (68&%11111111)
	.db $10

	.db (82>>8)+%01000000
	.db (82&%11111111)
	.db $10

	.db %11000000

run_g:

	.db 1

	.db (93>>8)+%01000000
	.db (93&%11111111)
	.db $10

	.db (72>>8)+%01000000
	.db (72&%11111111)
	.db $10

	.db (60>>8)+%01000000
	.db (60&%11111111)
	.db $10

	.db (53>>8)+%01000000
	.db (53&%11111111)
	.db $10

	.db (49>>8)+%01000000
	.db (49&%11111111)
	.db $10

	.db (53>>8)+%01000000
	.db (53&%11111111)
	.db $10

	.db (60>>8)+%01000000
	.db (60&%11111111)
	.db $10

	.db (72>>8)+%01000000
	.db (72&%11111111)
	.db $10

	.db %11000000


bass_c:
	.db 0
	.db (307>>8)+%01000000
	.db (307&%11111111)
	.db 8
	.db (144>>8)+%01000000
	.db (144&%11111111)
	.db 8
	.db %11111111

bass_f:
	.db 0
	.db (224>>8)+%01000000
	.db (224&%11111111)
	.db 8
	.db (105>>8)+%01000000
	.db (105&%11111111)
	.db 8
	.db %11111111

bass_g:
	.db 0
	.db (198>>8)+%01000000
	.db (198&%11111111)
	.db 8
	.db (93>>8)+%01000000
	.db (93&%11111111)
	.db 8
	.db %11111111

; Instruments:

simple:
.db 4	; Length
.db 255
.db -64

.db 8
.db 0
.db 32

.db 1
.db 255
.db 0

.db 0

vib:

.db 2
.db 128
.db 50
.db 2
.db 128
.db -50
.db 2
.db 128
.db 50
.db 1
.db 128
.db -50
.db 1
.db 128
.db 50
.db 3
.db 128
.db -50

.db 10
.db 128
.db 14
.db 0

Thankfully, the assembler can handle me sticking sums in rather than hard-coded values in places making things a lot simpler... but it's still a bit mucky. Ah well. There is no noise channel set up, either, so no krch-krch-krch style beats from the white-noise generator as such.

It works!

Thursday, 22nd September 2005

  • Fire Track II for TI-83 Plus: 31 source files.
  • Fire Track for the Sega Game Gear: 42 source files.
    Based on this information alone, you can deduce that FTGG will be vastly superior. wink.gif

    I just gave FT2 a try in VTI (I don't have any batteries in my calculator) and good grief it's slow. FTGG is much much faster... (though you should see it go when I set the enemy speed to two... I need more accurate control of speed, 1 being medium-ish easy, 2 being nightmarishly impossible!)

    This makes me happy.

    yay.gif
    Things to notice about the above screenshot:

    • Near pixel-perfect parallax star background (you can see some through the bullet holes!)
    • Not one single poorly timed VDP write.

    This has required an enormous amount of code rewriting!
    First to be fixed was the VDP timing. With a few .ifdefs, restructuring of the main loop and addition of an extra function call, all sprite data is now written to a shadow in RAM and then all sent to the VDP at once. This means that in sprite-intensive frames the framerate drops (it is unnoticable!) instead of the VDP going bonkers.

    Next up, the pixel-perfection. I do not know if you know how difficult this is to do on this hardware (or, at least to a hack coder like myself), so I'll do my best to explain.

    On a purely software-driven system, you can create a bitmask of your tiles so that when you manually blend your layers together you can cut out the stars you don't need. Or, you just draw them first then draw the tiles on top.

    On more advanced hardware you can Z-order your polygons, so that's all done for you.

    On the Game Gear, I have two layers - sprites and background. I can set individual background tiles to be on top of sprites - but this would dump a solid 8x8 square on top of everything. In other words, ALL the sprites are drawn under the background tiles marked as being on top. I had toyed with the idea of bizarre palette tricks and messing around with the tiles themselves, thinking sprites would be too slow (I asked about this on the SMS Power! boards - apparantly some games do the sprite trick, so I gained confidence in it).

    What I need to do is this:

    1. Take the (x,y) of my star and convert this into an offset into the VRAM name table.
    2. Look up the value of the tile at this value. If it is 0 (blank tile), draw the star. (Special case to skip expensive stage below)
    3. Look up the tile in a table of sprites marked as transparent. If it isn't listed, don't draw it.
    4. Take the offset into our table of transparent tiles, multiply it by 64, use that as an offset into another table. Find how far into the nearest tile as an (x,y) coordinate we are - that is, x is in the range 0-7 and y is also in the range 0-7 inside the current tile.
    5. Use this coordinate to look up inside our mask whether to draw the sprite or not.

    There was an awful lot to go wrong and it did, often. sad.gif It works pretty well now, so I'm happy with where it is. smile.gif

    The function to calculate the VRAM name table offset based on an X,Y on screen was broken. It needs to take into account the current vertical scroll, you see, to calculate the offset, so was essentially doing this to calculate which 8-pixel-tall row a pixel was in:

    row_offset = (y/8) + (scroll_y/8)

    (where y is the coordinate we pass to the function) - which leads to all sorts of rounding errors! The correct function is, of course:

    row_offset = (y+scroll_y) / 8

    Which, oddly enough, didn't seem to work at all well - at certain times, the values returned were completely off. The reason is simple enough - our scroll_y is a value between 0 and 223, and out y coordinate is anything between 16 and 160. 160+223=383 - you try storing that in an 8-bit register! So I updated the entire function to 16-bit code where needed.

    Z80 ASM gurus - my old code to divide A by 8 was the simple:

        srl a
        srl a
        srl a

    For my divide-hl-by-8 function (srl [reg] is a sort of [reg]>>=1 function) I use this:

        srl h
        rr l
        srl h
        rr l
        srl h
        rr l

    -- is this the fastest I could use? Is there a nice 16-bit shift operation available to me?

    Next I had to generate the transparency data. I edited my tile editor - solid green &00FF00 is taken to mean "transparent!" It dumps a list of any tile with transparency, then an expanded 64-byte table of the mask (1 byte per pixel - horribly big, but much faster than packing it into a 1bpp mask).

    When I say "near pixel-perfect", I have a problem - my stars are 2x2 pixels, and so if they can overlap the borders by a pixel or two from time to time. It doesn't look too bad, and in my defense the original Fire Track displayed stars over ANY black areas on the background at some points...

  • On the merits of Excel and running out of space

    Monday, 12th September 2005

    video.jpg
    Latest video. [2.29MB AVI]

    You chaps probably don't know what a major headache it is trying to get clear, non-wobbly video... rolleyes.gif

    taking_video.jpg

    I gave up in the end and went for the "propped-up against a wall with camera on stack of books" approach. The reason I cannot, therefore, appear to play the game very well is because I cannot see the screen. All I can see is a faint, inverted glow and have to guess where I am. Not fun.

    Here are the major updates since last time:

    • Removed Blackadder theme rolleyes.gif
    • Altered colour contrast on title screen, added wipe to clear it in imitation of original, restructured code for better fades, changed timing, added sound. Title screen is now complete.
    • Rewrote entire code for checking bullet collisions with buildings on the ground from scratch. It now works perfectly.
    • Levels now "start" correctly, so the ship moves in from the bottom, the start platform moves towards you very fast until they are in the correct positions, at which point gameplay commences (much smoother than just dumping you at the start position with a full level already there).
    • All levels now mapped correctly, in order, with correct palettes for backgrounds AND sprites. Levels that end a zone with a platform rather than with the face are flagged as such and the correct end is substituted in now.
    • Each level can be pointed at a 'replacements' table, which lists "replace tile X with tile Y". Tiles are swapped around properly so that levels appear more varied.
    • Explosions are now fully implemented (2 kinds, background and foreground - with very slightly different animations) and working.
    • Path-based enemy attacks are now in full working order, using paths generated with a combination of my graphical calculator and Excel.
    • A couple of other new enemy attacks have been added.
    • The game now sports a BBC Micro font which can be used for proper text display.
    • The "epilogue" to the game is fully coded (type-writer effect for final message copied from original game, including sound effects and fades).
    • Each level now begins with a summary screen (sprite limits meant that displaying all your lives at the start of the level was impossible, so is now done on an interval screen). The screen displays the zone name and a number of lives, the number of lives being surrounded by orbiting space-ships (the number of those being the number of lives left).
    • The game now fully works on hardware (should now work on old hardware - not checked, as I do not have an old BIOS-less Game Gear) and in all emulators (I was not initialising the display correctly).
    • Destroyed terrain tiles are not updated live to the name table in the fairly intesive loop - they are saved to a list of destroyed tiles, which are then read off directly and updated when needed in the intensive graphics bit.

    I've probably forgotten a lot of stuff. Bear with me.

    There have been a number of "WTF" moments reading back some of the source. Take, for example, this gem:
    wtf.gif
    Some explanation is needed here - this is from the code that handles the special case when you have shot a 2x2 square building and it lies out of bounds of the name table. The name table is my grid (32x28) of tiles indices that each point to an 8x8 tile in the VRAM. As the view point moves up this map, new tiles indices are written to it, or old ones are adjusted when something is blown up. The worst case scenario here is if the 2x2 tile lies just above the top of the name table - which is what the above code has to fudge. Look at this:

    tilemap.gif
    (I love debugging emulators!)
    This shows all of my 32x28 tilemap. That funny white rectange is my view rectangle (it's currently half-off the display, and it loops around). If you look at the top and bottom edges of the tilemap, you'll see a ring has been shot out. Naturally, the bottom will have been hit first. What happens in this case is the address of the tiles that have been erased is decremented by a whole row of map points (there are 2 bytes to a spot on the name table, so I go backwards by 64). In this case, I find that the address is above the start of the tilemap so I add on 64*28 (which is the entire size of the table) and add it back to get back into the bounds of the table itself. As you can see, it was a successful operation in this case, but only because I was writing back HL this time and NOT the accumulator! No wonder my sprites were becoming destroyed as I was writing in completely the wrong part of the VRAM...

    blown_up.gif
    This is the screenshot from the emulator of the screen display for the above tilemap (the sprite layer is not displayed in the tilemap, for obvious reasons).

    I'm still not getting very far with my quest for music. sad.gif
    The only bit of the game that has music is the title screen, and that's not so much music as a weird noise (here is an MP3 of my Game Gear's rendition, this is the original from the BBC Micro). Even though I have specified that the noise channel (the one that produces the high beeping noises) to use the highest frequency, it's still lower than the Beeb. All the other notes are correct, though.

    Latenite has got yet another update - project support. (It looks more and more like VS.NET every day!) Rather than create messy proejct files and stuff which makes distrubution tougher, I thought I'd go the easier route of just allowing people to stick directives in the file with the ;#latenite:main directive. You can now ;#latenite:name="name_of_project" and ;#latenite:project="file_name"[,"project_folder"].

    latenite_small.jpg
    Click for larger image

    Skybound Visualstyles (which I'm using to fix the weird .NET XP interface bugs) looks lovely but hugely slows down the app - only really noticeable with so many hundreds of nested controls like me and when moving large windows on top of it forcing repaints, so now the XP styling option is saved away (so you can revert to boring standard grey if you like). Latenite now saves window/splitter positions when you close - a first for an app by me! grin.gif
    In fact, Latenite is also my first app that needs a splash screen, it takes so long to get going... sad.gif It's only about 2 seconds, though, for it to track down all the tools, load all the documentation, add all the compiler scripts and present itself, so it's not too bad.

    One overlooked piece of software is Microsoft Excel. People seem to develop a hatred of it, especially in people who end up trying to use it as a database application (it appears that this is what 9 out of 10 smallish (and some not so small!) companies do).
    I have been using Excel to create those smooth curving paths for some of the enemies to follow. First, I prototype them on my graphical calculator (lots of trial and error involved!):

    calc_1.jpg
    calc_2.jpg

    ..before transferring into an Excel spreadsheet.

    excel_small.jpg
    Click for legible!

    The Excel spreadsheet is set up so that column A is the steps (usually 0 to 256) in ascending order. Column B is the X coordinate, column C is the Y coordinate.
    Columns E and F serve a special purpose - they contain the differences between each element in columns B and C (so E3=B3-B2, E4=B4-B2 and so on). Columns G and H "fix" the values - if it's negative, make it positive and set the fourth bit (just add 8). This is for the internal representation of the paths. Column I contains the data I can just copy and paste into the source files - it starts with the initial (x,y) coordinate, and then has a list of bytes. Each byte signifies a change in (x) and change in (y), with the lower four bits for (x) and the upper four bits for (y).If the nibble is moving the object left or up (rather than right or down), then the most significant bit of that nibble is set. For example:

    %0001 - move right/down 1 pixel
    %0011 - move right/down 3 pixels
    %1001 - move left/up 1 pixel
    %1100 - move left/up 4 pixels

    ...and therefore...

    %11000001 = move right one, up four.

    Poor Excel has suffered a little bit of neglect from me - I used to use it to generate my trig tables. WLA-DX, the assembler I learn to love more and more every day, supports creation of a number of different tables - including trig tables - with a single directive. Just specify .dbsin (or .dbcos) followed by the start angle, how many bytes of trig table you want, the step between each angle, the magnitude of the sine wave and the DC offset and away it goes.
    People still using TASM, STOP! Use WLA-DX. It's ace (and free, not shareware!).

    You can never have enough pictures in a journal (hey, you should see this page load on our 700 bits/sec phone line at home!)
    Here's the full list of sprites:

    sprites.gif

    The last ship being after the explosions is a mistake, but it's too much hassle to change. I'm really pleased the way they've come out - I'm no pixel artist. smile.gif

    I'll finish off with a few assorted screenies.

    misc_01.pngmisc_02.png
    misc_03.pngmisc_04.png

    ...I would have done. I should have done. However, when taking those screenshots I saw a glitch in the graphics. As an exercise for the reader, can YOU spot where the bug is?

    bug.gif
    (and it's not in one of those external functions I wrote - this is why copy-paste coding is bad, 'kay?)

    Oh, (possibly?) final last thought popping into my head: one thing I really miss from the BBC Micro era is the comfortable keyboard layout. I mean, look at this small photo of a Master 128 keyboard:
    master_128.jpg
    Click for bigger

    You might think that that blocky monstrosity would knacker your fingers, but that's not the reason. It's the fact that rather than squash one hand's fingers onto a small cursor pad (or the even sillier WASD), you balance movement between your two hands - Z and X for left and right, : and / for up and down. (On a modern UK layout, that'd be ZX and '/). The space bar is now conveniently accessible by both thumbs, and you can reach around most of the rest of the keyboard with your other fingers.
    The Game Gear is an ergonomic handheld (unlike the uncomfortable Game Boy), so I'm reckoning that I can balance the controls between hands - ← and → on the d-pad can control left and right, and the 1 and 2 buttons can control up and down. I will offer multiple layouts, though, as otherwise that might confuse some people. wink.gif

    Anyway, I hope you find this journal interesting. It seems a bit all over the place - video, pictures, rambling, odd bugs, nostalgia... The only thing that's an issue is my rapidly shrinking GDNet+ account web space!

    Subscribe to an RSS feed that only contains items with the Z80 tag.

    FirstLast RSSSearchBrowse by dateIndexTags