A parallel port and a demonstration of the Z80 computer

Sunday, 5th September 2010

The last piece of hardware to add to the computer was a parallel port. These have eight data lines and nine assorted control and status lines. My last two 8-bit I/O expanders provide sixteen of these seventeen lines, and the final one was provided by the DS1307 real-time clock chip which happily has a spare pin on it that can be used as an output.

Parallel port I/O expanders Parallel port connector

This parallel port can be used to print from the computer. Some software has printing capabilities built in (such as the text editor VEDIT Plus), but by pressing Ctrl+P in CP/M any text sent to the display will be simultaneously sent to the printer.

I also needed to mount the LCD inside the case. I bought a plastic strip to try to make a nice frame for it, but couldn't cut it accurately enough by hand so have had to make do with merely sticking the LCD behind a rectangular hole cut in the aluminium. It's not the neatest arrangement and doesn't protect the LCD from scratches but is better than nothing.

To demonstrate the computer's hardware and software, I recorded a video:

I'm not desperately happy with the way it came out; I really need to find a better microphone and the angle of the sun and variable weather when I made the video threw the white balance off. On the plus side, I did find out how to capture crisp black and white video with my TV capture card; I connected the composite video output from the computer to the luma pins on the S-video input on the capture card, then dropped the saturation to zero in VirtualDub. For some reason this produces great quality video, in comparison to the composite input which produces a fuzzy mess — there shouldn't really be any difference with a black and white signal (regular television sets don't have any problems).

A clock and a serial port for the Z80 computer

Tuesday, 17th August 2010

At the end of the previous entry I mentioned that I was going to start developing my own programs for the Z80 computer. The first is a graphical clock, taking advantage of my implementation of the BBC Micro's VDU commands and the ability to use those commands to draw graphics onto the screen as well as text:

Graphical analogue clock for CP/M 3

I have uploaded the code and binary to my site for anyone who is interested, though it will only work on a machine running CP/M 3 and that is equipped with a display that implements a handful of BBC Micro VDU commands.

The computer features a display for output and a keyboard for input which is sufficient if you're interacting with a human but it's often nice for computers to be able to speak to eachother, so I've added an RS-232 serial port.

RS-232 driver and port from the inside

RS-232 is a bit of an unfriendly beast. Whereas the computer's logic uses 0V to indicate a logic low (0, "false") and 5V to indicate a logic high (1, "true") RS-232 uses around +12V for a logic low and -12V for a logic high. This requires that the outgoing signals are inverted and boosted and the incoming signals are inverted and reduced to protect the inputs of the receiver circuit. Fortunately you can easily get hold of chips that perform this task for you when aided by a number of capacitors; in my case I'm using an ST232, which is shown in the bottom left of the above photo. A DE-9M connector is provided on the outside of the case, much like the one you'd find on your desktop if you were trapped in the 1990s.

One issue I have yet to solve is handshaking. The serial port sends or receives data on two wires (TXD and RXD respectively). The receiver has to handle each incoming byte from the transmitter. As the receiver may be busy performing other tasks at the time it may end up receiving data faster than it can process it and it will start losing bytes. There are a number of different ways to avoid this problem. The simplest electronically is to use XON/XOFF handshaking; in this configuration, the receiver can send the XOFF byte to the transmitter when it's busy and the transmitter will stop sending data temporarily. The receiver can then send XON back to the transmitter when it's ready to receive more data. This technique has one major drawback — it prevents you from sending binary data containing the XOFF or XON bytes.

An alternative solution is to add two wires to the serial connection — Request To Send (RTS) and Clear To Send (CTS). These can be used to signal when each device is available to accept data. This allows you to send XOFF and XON directly over the serial port (extremely useful for binary data) yet requires the addition of two more wires to the port.

Unfortunately whilst implementing both techniques is possible, CP/M only internally refers to XON/XOFF handshaking; there is no way to select RTS/CTS handshaking. I think what I will end up doing is have CP/M's XON/XOFF refer to handshaking in general and then add a hardware-specific utility that lets me choose which particular type of handshaking I wish to use. This utility could also help me select other serial port configuration settings that CP/M doesn't expose (such as parity, number of stop bits or number of data bits).

Z80 computer session in PuTTY

With the hardware installed, the AVR I/O controller updated to use it and the BIOS reprogrammed to expose it to CP/M it is possible to interact with other computers over the serial port. CP/M features five logical I/O devices: CONIN and CONOUT for general console input and output, AUXIN and AUXOUT for general "auxiliary" output and LST for printer output. The BIOS exposes two physical devices; CRT for the keyboard and video display controller and RS232 for the serial port. By using the DEVICE utility you can connect these logical and physical devices together. In the above screenshot I have connected the serial port to both CONIN and CONOUT. This allows me to connect my desktop PC to the Z80 computer using a null modem cable and use terminal emulation software (such as PuTTY) to talk to it.

Simulated BBC Micro VDU mirroring console output

The above screenshot shows VirtualDub capturing the output of the video display controller next to an instance of BBC BASIC for Windows which is running the following program:

aux%=OPENIN("COM2: baud=9600 parity=N data=8 stop=1")
REPEAT
  REPEAT:UNTIL EXT#aux%
  VDU BGET#aux%
UNTIL.

This passes any data received over the serial port to the simulated VDU in BBC BASIC for Windows. As both video devices accept the same commands the result is that both show approximately the same thing.

I have been slightly improving the video display controller as I've gone along. One feature I had to add for the clock was the ability to draw text characters at the graphics cursor position, as opposed to the fixed text grid (this is used to draw the numbers around the dial). At the same time I added the ability to redefine the appearance of characters. One obvious use of this feature is to change the font, but when combined with the ability to render text anywhere on the screen some simple sprite-based games could be written for the computer. Each letter is just a 8×8 pixel sprite, after all.

MODE 2

Another feature I added was a simple implementation of MODE 2 where characters are stretched to sixteen pixels wide. You can't get much text on the screen in this mode but it may be useful for games.

A useful Z80 computer in a project box

Saturday, 14th August 2010

Work continues on the Z80 computer. The two final modifications to the box itself are the holes for the status LEDs and the power switch.

Status LEDs Power switch

The green LED indicates power and the orange one disk activity. Unfortunately, the project box is fairly scratched on the outside (one scratch on the front is my own fault, but the sides and back were fairly scuffed and scratched when I bought it). If anyone has any tips for polishing scratches out of ABS I'd be glad to hear them; the usual household polishing abrasives (such as toothpaste) remove most of the light scuffs and result in a lovely mirror finish, but don't do anything to the deeper scratches. I'll probably invest in the finest grade wet-and-dry sandpaper I can find and have a go with that followed with a Brasso polish, and if that doesn't help (or makes it worse) just sand the whole thing down and paint it.

Pin header connector

The circuit board inside the case needs to be attached to the case-mounted components somehow. In simpler projects I've resorted to soldering these connectors directly to the board, but this can make maintenance a problem (to remove the circuit board one would have to cut and resolder the wires). For this project I've left pin header strips on the board. The external connectors have leads soldered to them terminated with pin headers cut to size using some wire cutters and a rotary tool to polish them off; these headers are pictured above.

Circuit board mounted inside the case

The main circuit board can then be easily installed or removed from the case as required. The small circuit board for the video display controller is connected to the main circuit board in the same way.

Z80 and SRAM pin numbers marked

A Z80 computer can't live up to its name without some sort of a Z80 inside it, so I thought that that was the most obvious part to add next. Computers also generally need access to memory so I decided to add the 128KB SRAM chip at the same time. The Z80 communicates with the memory over an eight-bit data bus, a sixteen-bit address bus (to indicate which address in memory it is reading from or writing to) and a number of control lines (to indicate whether the current operation is a memory read or a memory write, for example). This provides a fairly tedious amount of soldering work; each pin on the memory needs to be connected to the corresponding pin on the Z80. To aid in the construction I stuck masking tape to the bottom of the perfboard around the outline of where the two chips would go and wrote the pin numbers onto the tape, shown in the photograph above.

Z80 and SRAM address, data and control buses

I put the two chips close together so I could put all of the bus wires on the inside of the IC holders rather than going around the outside. This saves a bit of space and avoids having to route the wires around the chip holders which gets a little untidy. The above photograph shows all of the wires in place before the chip holders were soldered in. Adding those in should be a quick and easy job, at least...

SRAM IC socket soldered in the wrong way around

Well, you'd have thought so, but somehow I managed to solder in the 32-pin SRAM socket the wrong way around. Each socket has a notch to help you align the chip using its corresponding notch. As you can see in the above photo the notch points right when it should point left like all of the other sockets. It wouldn't affect the operation of the circuit (as long as the SRAM chip was inserted with the notch to the left) but it looks untidy and I may as well do the job properly.

SRAM IC socket soldered in the correct way around

On the positive side I suppose I got to practice my desoldering skills.

Z80 and AVR data bus connections

The computer design uses an AVR microcontroller to manage the I/O devices (such as the keyboard, video display controller and SD card) and to load the OS into the Z80's memory on reset. To achieve this the Z80 and the AVR need to be connected together. The above photograph shows some new wires between the AVR (bottom left) and Z80 (bottom middle) to connect the Z80's data bus to the AVR's PORTA and a number of other wires to connect the Z80's control lines to several other I/O pins on the AVR. A number of pull-up resistors have been added to control lines on the Z80 so that when nothing is driving the control bus they rise high (the de-asserted state). If left disconnected ("floating") the other components connected to the control bus may think these lines had gone low (asserted) and treat that as a read or write operation, corrupting data.

I/O expanders Soldering detail of the I/O expanders

The AVR also needs to be connected to the Z80's address bus. This would take another sixteen pins if driven directly by the AVR; sixteen pins that aren't available to me! I am therefore using two MCP23S08 eight-bit I/O expanders, pictured above, to drive the address bus from the AVR. These are controlled over the SPI bus, which only takes up three pins on the AVR (these pins are shared with other SPI peripherals, such as the SD card) plus a single chip select pin that is unique to the I/O expanders. Four pins is better than sixteen, at any rate.

All ICs to date installed Computer in its project box

I keep mentioning chips even though the sockets are quite clearly empty in the above photographs. As I was approaching a useful computer circuit at this point I plugged all of the chips into their sockets to test the connections. As there was no SD card, real-time clock or keyboard I had to modify the boot loader on the AVR quite considerably; I started with a test program that wrote random data to blocks of memory then read them back to verify that they had written correctly. Once I had verified that the AVR was able to access memory correctly I reprogrammed it to copy a small Z80 program to memory and then let the Z80 take over. This Z80 program repeatedly output the string 'Z80' to the console output port. With everything plugged in I switched on the computer and saw the screen fill with Z80Z80Z80… so I was pretty certain that I'd wired everything up correctly!

DS1307 and battery clip

At this point I could start reintroducing the various peripherals to the computer. A DS1307 is used as a real-time clock. This clock needs to keep running when the computer is switched off, so I've added a 3V battery connector to the computer to keep it ticking.

SD card slot

As the computer uses a 512MB SD card for storage, I have added a pin socket strip to the board to plug in the SD card slot I scavenged from a card reader. The card is connected to the SPI bus along with the I/O expanders used to drive the Z80 address bus. SD cards run at 3.3V rather than the 5V that nearly everything else on the board uses so I've used a series of voltage dividers to drop the voltage on each input pin from 5V to around 3V (the resistor values I have don't allow me to get to 3.3V; 3V is the closest I can manage without going over 3.3V). The video display controller board also runs on 3.3V so I do at least have a suitable voltage supply for the card!

Keyboard connector

The final part of the computer that was on the breadboard prototype but not yet in the final build was the keyboard connector. This is simply a four pin header on the board that is connected to the PS/2 port screwed to the case. However, when I tried to use the computer, the keyboard didn't appear to work. Pressing Num Lock, Caps Lock or Scroll Lock would toggle the associated LED and hitting Ctrl+Alt+Del would reboot the computer but no other key worked. This implied that the AVR was handling the keyboard correctly but the Z80 wasn't receiving any notification of key presses. A bit of digging identified the problem; I'd forgotten to connect the Z80's interrupt pin to the AVR! When a key is pressed the AVR triggers an interrupt to let the Z80 know that a key is available. By soldering a wire between the two chips it started working as intended.

Z80 computer in its enclosure

The computer is now up to the same standard as it was when assembled on the breadboard, but is much more practical to work on. I hope to add a serial and parallel port to the computer soon, and would like to mount an LCD into the lid of the project box, but for the time being I am happy that I have managed to get this far.

Z80 computer running VEDIT

One of the advantages of running CP/M on the computer rather than my own operating system is the availability of existing software. The above photograph shows the computer running VEDIT, which is an excellent visual text editor.

VEDIT for CP/M

Zork for CP/M

With the hardware in a decent configuration I can start writing my own software. I think the first CP/M program I'll write is a graphical analogue clock, as this is the sort of program that can be left running for long periods as a way to check the stability of the computer.

Mounting circuit boards and rear panel connectors

Monday, 9th August 2010

One of the fun things about working with electronics is that you can end up with a physical product at the end of your hard work. To this end I have started moving my Z80 computer from its current breadboard to a more permanent enclosure.

Project box outside Project box inside

Large project boxes can be quite expensive (around £40, it seems), but the one I picked out was a slightly more reasonable £7. It's not the prettiest enclosure I've seen but it should be large enough to house the computer and provide space on the lid for the LCD and on the rear surface for a collection of connectors (as you'd expect to find on the rear of any computer).

Perfboard shown inside the computer.

The first challenge was how I intended to mount the circuit board within the box. The perfboard I will use for the main computer circuit doesn't fit the marked mounting posts on the bottom of the project box; it's too narrow and too deep. What the photo doesn't show very well is that the perfboard is not able to lie flat in the box due to the curve at the rear of the box. To raise the board above the bottom of the box I decided to use four PCB spacers, which required two new holes to be drilled into the perfboard away from its corners.

Two new holes for PCB spacers Underside of the perfboard showing PCB spacers

I decided that the video display controller, which resides on its own board, should be mounted on the main circuit board using PCB spacers too.

Holes drilled to support the VDC VDC mounted on the main circuit board

This required four more holes to be drilled into the main circuit board. I tried to align the small video display board so that its 16-way pin socket for connection to the LCD was as close to the horizontal centre as possible.

Holes drilled to support the main circuit board Using the main circuit board to find the position of all of the screw holes

The base of the project box needed to have four holes drilled into it to support the main circuit board. Once the two nearest the front edge had been drilled, I screwed the circuit board to the back of the project box to mark the position for the other two holes to ensure that they lined up exactly with the holes drilled in the circuit board.

Both circuit boards mounted inside the box

Screws come through the bottom of the project box to hold the main circuit board in place. Some sticky foam feet are provided with the project box which will raise it off the surface it is resting on to prevent these four screws from leaving scratches! Due to the curve at the back of the box the circuit board is only a few millimetres above its surface, which is why I reversed the screws holding the video display board to leave the long threaded ends pointing upwards.

Power supply Power supply soldering detail

As working on the enclosure is a fairly noisy activity I switched my attention to the electronics for a brief spell. The first part of the circuit I assembled was the power supply; this just uses a pair of voltage regulators to provide 5V and 3.3V from an external power supply (I use a cheap wall wart affair rated at 7.5V DC).

Oscillator

I decided that the next part to tackle would be the oscillator. This uses a 20MHz crystal and a 74LS04 according to the design on z80.info to generate a 20MHz clock signal which will be further divided by two to produce a 10MHz clock signal for the Z80. I had some real problems with this design; it would run at 20MHz until I attached a load to it, at which point it would generate a fairly random-looking signal or stop oscillating entirely. I experimented with a few different capacitors and found that if I remove the 120pF capacitor and replace it with a 33pF capacitor on the other end of the crystal it works reliably. I'm not entirely sure why this is, but it's the design I've been using for a while with the computer on a breadboard so I'm happy to keep it this way for the time being.

ATmega644P

I added a D flip-flop to divide the 20MHz clock to 10MHz and then added the ATmega644P microcontroller to the board. This has a jumper next to its clock input allowing for the selection of either 20MHz or 10MHz operation; a pin header to the left of this jumper allows for it to be programmed in-circuit.

VDC reinstalled in the case

With those new parts in place I reinstated the video display board to check that everything still fit. My main concern now was how far the connectors screwed into the rear of the case would intrude and whether there'd be any problems with them getting in the way of the circuit boards.

Rear panel marked for mounting connectors

I sketched a design of how I saw the connectors would fit on the back of the case and then copied the layout to some masking tape stuck to the case. The computer naturally needs a power supply and keyboard input, and the video display board accounts for the VGA connector and an RCA connector for composite video (which I neglected to mark). I also hope to include a serial port and a parallel port in the final design (though neither are currently supported by the software) so left space for those two connectors.

Hole drilled for the keyboard connector Keyboard connector mounted in the case

The 6-way mini-DIN connector for the keyboard is the deepest one to contend with so I decided to start with it. I cut the hole in the case by drilling a small hole in the plastic which I then enlarged with a burr tool to the correct shape and size.

Keyboard connector screwed in

Fortunately it looks like there's plenty of room in the case for connectors!

Connectors for the serial port, composite video output and DC input Inside view of the case with some more connectors installed

The next few connectors confirm this. I really do not enjoy cutting the holes for D-sub connectors (such as the one for the serial port); they don't have much of a metal lip to hide a botched hole, so I have to cut very slowly and very carefully, taking a very long time to slowly enlarge each hole until the connector fits. I'm therefore not really sure why I decided to have three D-sub connectors in this computer design; maybe I'm just a glutton for punishment.

Completed rear panel Rear panel as seen from the inside of the case

Finally, the rear of the case is completed. I will leave the masking tape on there as scratch protection until I have finished the front of the case (this will be significantly simpler — just a power switch, power LED and disk activity LED). Once that is done I can resume working on the electronics!

Integrating the dsPIC33 VDC with the Z80 computer

Saturday, 31st July 2010

The ultimate goal for the video display controller module I have been working on is to drive the display in my Z80 computer project. As I have now got a pretty good set of features I thought it would be a good idea to join the two projects together.

Z80 computer with dsPIC33 VDC

The big board in the lower middle of the above photograph is the main body of the computer, including the Z80, its RAM, the ATmega644P that is used to handle I/O, an SD card for storage and a DS1307 real-time clock. The small board in the bottom left of the photo is the power supply (supplying both 5V and 3.3V) and clock generator (providing a 20MHz and 10MHz clock).

At the top of the photo is the video display controller, connected to a 320×240 graphical LCD. A pin header is used to connect this VDC board to the rest of the computer. Three pins are required for power; 0V, 3.3V (dsPIC33 and output buffer) and 5V (LCD). The VDC is connected to the computer's ATmega644P I/O controller using the two-wire I2C bus (the same bus that is used to access the DS1307 clock). Rather than run a series of graphical demos, the VDC now waits for commands to be written to the I2C slave address 0xEE which it acts on to control what is shown on the screen. I'm aiming for these commands to work in the roughly same way as they did on the BBC Micro VDU, which should make porting the enhanced TI-83+ version of BBC BASIC to this computer a bit easier. The BBC Micro's VDU could be accessed by calling OSWRCH (assuming it was being used as the current output stream), which typically has an address of &FFEE — hence my choice of 0xEE as the I2C slave address!

Detail of the LCD connected to the VDC

A handful of these VDU commands have been implemented, which is sufficient to run simple CP/M software. The generic CP/M version of BBC BASIC does not, naturally, support any hardware-specific features and as such lacks advanced text or drawing support (one can send commands directly to the output stream with the VDU statement but this isn't very user-friendly). I will need to work on this now that the hardware is coming together! The current VDC code can be downloaded here if you are interested in the changes that have been made.

The above photo shows the newly constructed VDC hardware. All of my previous projects have been assembled on stripboard; as the projects have become more complex or simply smaller I've found stripboard to be increasingly awkward to work with. ICs can only really be orientated in one direction, and to reduce the size of circuits I've had to start cutting the tracks between holes (rather than the usual method which is to drill out an entire hole). The supplier I normally acquire parts from, Bitsbox, recently added three different sizes of perfboard to their catalogue so I thought I'd give it a go. I've found it much more pleasant to work with than stripboard, though not as easy to correct if you make a mistake and need to desolder a connection. You can certainly perform some interesting space-saving tricks on the underside of the board!

The underside of the VDC showing the soldering technique

The Kynar insulation on the wire I switched to using also has the advantage of not melting when heated with a soldering iron, as I've had problems in previous projects where tightly-spaced wires will end up getting shorted together as the insulation between them melts.

I have mentioned that one pin header is used to connect the VDC to the computer. There are three others on the board; the two-pin one is for the composite video output, the six-pin one is for connection to a PICkit to reprogram the dsPIC and the four-pin one for the VGA output.

Detail of VGA output from the VDC

Now that I have moved the VDC onto a permanent circuit board I feel that I can start moving the rest of the computer in the same direction. The software is far from complete and the hardware is pretty rudimentary but it does basically work and having a more robust system to work on should make life a bit easier.

Booting CP/M 3 from an SD card

Wednesday, 23rd June 2010

Up to this point I have been running CP/M 2.2 on the Z80 computer. CP/M 3 adds a number of useful features, including the following:

  • Support for more than 64KB RAM via banked memory.
  • Standardised access to real-time clock for file date and time stamping.
  • Improved text entry on the command-line when using the memory-banked version, such as the ability to move the cursor when editing and recall the previously entered line.
  • Support for disks with physical sectors larger than the default record size of 128 bytes.

Switching to a banked memory system would require some new hardware in the form of a memory management unit so I have stuck with the simpler non-banked system for the time being. Support for physical disk sectors larger than 128 bytes is more interesting (SD cards use 512 byte "blocks") and real-time clocks are always useful so I have started working on updating to CP/M 3.

Z80 computer with new SD card slot and real-time clock
Z80 computer with new SD card slot (bottom left) and real-time clock (top right)

CP/M consists of three main pieces of software:

  • A BIOS which exposes a small number of routines to perform primitive, hardware-specific operations (e.g. output a character to the console, read a raw sector from a disk, check if a key has been pressed).
  • The BDOS which provides the main API for transient programs (e.g. read a complete line of input from the console, create a file, read a record from a file).
  • The CCP, or console command processor, which provides the main user interface for loading and running other programs or performing some basic tasks via its built-in commands. This would be analogous to COMMAND.COM on DOS.

When working with CP/M 2.2 I had source files for these three pieces of software, so I just needed to implement the 17 BIOS functions, reassemble the three files to fixed addresses in memory and load them to these fixed addresses using the AVR when booting the computer. These three files were stored in the lower 8KB of the flash memory chip and were not accessible from within CP/M itself.

CP/M 3 proved to be a bit more of a challenge, as it is loaded slightly differently. The CCP is stored as a regular file named CCP.COM on the floppy disk you're booting from, so only the BIOS and BDOS need to be loaded from their hiding place at the start of the boot disk. These two pieces of software are merged into a single file named CPM3.SYS by a CP/M utility named GENCPM. To get this utility to work I needed to provide GENCPM with a hardware-specific BIOS3.SPR file that implemented the 31 BIOS routines. Fortunately, a file named BIOSKRNL.ASM is provided that implements most of the boilerplate code involved with writing a BIOS (you still have to provide the hardware-specific routines yourself, but your task is made much easier by following the template) so I just needed to recompile that for a non-banked system and link it with my handful of hardware-specific routines.

A log of a session in CP/M 3

Ideally, CPM3.SYS would be stored on the regular file system with CCP.COM and the hidden boot loader would load CPM3.SYS for you. CP/M 3 does provide a small boot loader for this purpose (aptly named CPMLDR) which employs a cut-down BDOS and BIOS to load CPM3.SYS from the file system into memory for you. I haven't been able to get it to work, though, so I currently parse and load CPM3.SYS using some C code on the AVR. This works well enough for the time being, as can be seen in the above output generated by the computer when testing the real-time clock.

DS1307 real-time clock

The time and date is maintained by a DS1307, an inexpensive eight-pin real-time clock and calendar chip that is shown in the middle of the above photograph. It is accessed over the I2C bus using a protocol that is natively supported by the AVR hardware. It uses binary-coded decimal to represent dates and times, which corresponds nicely to the time format used by CP/M; however, CP/M represents dates as a 16-bit integer counting the number of days since the 31st December 1977. I have used the algorithms on this website to convert dates to and from this format and the individual components.

The only downside of the DS1307 is that it only stores a two-digit year number, not the four digits one would hope for. This means that the century is discarded when setting the real-time clock, allowing for you to set a date that is then retrieved differently (truncated to the range 1930..2029). I haven't thought of a suitable solution to this problem just yet. I could use the AVR to act as the real-time clock, but I would then lose the advantage of the DS1307's battery backup that kicks in when the main power supply is removed.

The state of the DS1307 is effectively random at power-up. One of the first things the computer does when booting is to read the current date and time and check that all fields are within range. If not it resets them to midnight on the 1st January 1978 and displays a message to indicate that it has done so.

SD card in slot

The SD card has been a bit of a headache to get working and though it currently only supports reading, not writing, it should hopefully be a useful addition to the computer. Rather than the previous arrangement of series rectifier diodes to drop the supply voltage and zener diodes to protect the inputs I'm using a dedicated 3.3V regulator to power the card and resistor voltage dividers to drop the 5V logic signals to around 3V (the closest I could get to 3.3V with the resistors I had to hand). I'm using the disk image from the old 512KB flash chip and treating the card as having 128 byte sectors so the arrangement is no more capable than before and in some cases quite a lot slower (reading a 128 byte record now entails reading a whole 512 byte block from the card then returning the desired 128 byte range within that block) but it seems to be as reliable as it used to be at least. SD cards append a CRC16 checksum when transferring data blocks so I can hopefully detect errors more easily and their on-board flash memory controller should perform wear-levelling, prolonging the life of the card.

To write the disk image to the card I used HxD which makes the job as easy as copy and paste. One problem I did have is that it displayed an "Access denied" error when attempting to write data, which I assume to be because something in Windows was using the card at the same time as HxD. I knocked together a short program for the AVR that wrote junk to the first block of the card, the result being that Windows no longer recognised the card's file system and HxD managed to write the data to the disk with no further problems.

An SD card reader from Poundland

Sockets for regular SD cards seem to be relatively expensive for what they are, but the above SD card reader cost a pound (what else?) from Poundland. A bit of work with a soldering iron and some desoldering tools yielded some useful components:

Parts from the disassembled SD card reader

The crystal is unmarked and I'm hardly short of LEDs but the USB A connector could be a good way to reduce the size of a project that plugs into a USB port (USB B connectors are rather bulky) and the SD card slot works brilliantly for my needs here. There are cheaper and nastier ways to add an SD card slot to your project, but something like this feels more robust and has the advantage of reporting the state of the card's write protection switch.

Keyboard input and RAM disks make CP/M more useful

Wednesday, 16th June 2010

The hardware for the computer has changed in (mostly) subtle ways since the last post, with the exception of a PS/2 socket for connection to a keyboard.

Z80 computer with PS/2 keyboard socket

PS/2 keyboards (which use the same protocol as the older AT keyboard) communicate with the host by clocking data in either direction (keyboard to host or host to keyboard) over two wires, appropriately named "clock" and "data". An AVR pin change interrupt is used to detect a change in state of the clock line and either input or output a bit on the data line depending on the current direction of data transmission. Incoming bytes generally relate to the scancode of the key that has just been pressed or released. These scancodes are looked up on a series of hard-coded tables to translate them into their corresponding ASCII characters. CP/M accesses the keyboard via two BIOS routines: CONST (2), which checks whether a character is available or not, and CONIN (3), which retrieves the character. I initially implemented these by simply reading from I/O port 2 (CONST) or port 3 (CONIN).

As keyboard input is polled, CP/M was wasting a lot of time reading from the AVR. Due to the AVR's relatively slow way to respond to I/O requests this was slowing down any program that needed to periodically call CONST (for example, BBC BASIC constantly checks for the Escape key when interpreting BASIC programs). I converted this polling system into an event driven one by connecting the AVR to the Z80's maskable interrupt pin, /INT. When a new key is received by the AVR it pulls /INT low to assert it. The Z80 responds to the interrupt request by setting an internal flag to remember that a key has been pressed and acknowledges the interrupt by outputting a value to port $38 (the Z80's maskable interrupt handler resides at a fixed address of $38 in memory, so this seemed like a sensible choice). The AVR detects this write to port $38 and returns /INT to its high state. The CONST routine can now directly return the value of this flag when polled (rather than having to request the flag from the AVR) which noticeably speeds up running programs. The flag is cleared when a key is read by calling CONIN.

I did have some difficulty getting the interrupt system to work; the Z80 has a number of different ways of responding to interrupts, two of which rely on fetching a value from the data bus by asserting /IORQ before an interrupt is serviced. IM 0 fetches an instruction from the bus and executes it, and IM 2 fetches the least significant byte of the address of the interrupt service routine to combine with the most significant byte stored in the I register. IM 1 (which is what I'm using) just jumps to the fixed address $38. However, I hadn't taken this additional data read into account and when the Z80 attempted to read from an I/O device the AVR was either putting nonsense on the bus or (deliberately) locking up with a message to indicate an unsupported operation. Fortunately you can easily tell the difference between a regular I/O request and an interrupt data request by checking the Z80's /M1 output pin, so with that addition things started working a bit more smoothly!

BBC BASIC test session with the Z80 computer

I'm still using terminal emulation software on my PC to view the output of the computer, though as I now have keyboard entry the results are a little more impressive than the few boot report lines and a prompt that were in the last entry. I still haven't worked out why my PC switches off or blue-screens when programming AVRs over the serial port, so I've soldered together a parallel port programmer for the time being.

Programming hardware

The pinout of the programmer matches that of the website where I found the SI Prog design. The ATmega644P's SPI, power and reset pins that the programmer interfaces with are all adjacent, but not in the same order as the ones in the SI Prog, hence the small board to the right of the above photo which swaps the pin order around using wires soldered to its reverse (this saves a lot of breadboard space). The board in the middle plugs directly into the parallel port programmer and is used to program the 512KB flash memory chip I'm using for storage.

I haven't got around to implementing writing to this flash memory yet, unfortunately, though I have implemented a simple way to test a writable disk drive. The RAM chip I am using is a 128KB one, as Farnell didn't sell 64KB ones. The Z80 can only address 64KB without additional memory banking hardware, so I'd simply tied A16 low and was ignoring half of the memory. I have now edited the BIOS to expose two disk drives; the default A: (512KB of flash memory) and now B:, a 64KB RAM drive. A16 is now driven by the AVR; during normal operation, it is held low (giving the Z80 access to its usual 64KB) but during disk operations it can be driven high to grant the AVR access to the previously hidden storage.

Testing the RAM disk

In the above test I use the STAT command to check free space, the PIP command to copy BBCBASIC.COM from A: (flash) to B: (RAM) then run BBC BASIC from the RAM disk, save a program then run it again by passing its filename as a command-line argument to BBC BASIC. At the end I try to copy the new program back to A:, but as there is no writing support for flash it keels over with a fairly unhelpful generic CP/M error.

Now that I've finally got something working in a vaguely usable manner, I hope I can start to research ways to make it better. Sorting out writing to flash would be a good start (I'm sorely tempted by jbb's suggestion to use an EEPROM to map logical floppy sectors to physical flash sectors) and I certainly hope to dig out my 320×240 pixel graphical LCD and driver for output instead of relying on a desktop PC. I'd also like to upgrade to CP/M 3 (I'm currently using CP/M 2.2) but when I last looked at that it seemed like a much more involved process so I decided to keep it simple. There's a fair mountain of stuff I need to take in, but I'm certainly learning a lot as I go (I only just realised tonight that CP/M was capable of graphics output, for one). I'd be a very happy chap if I could eventually run WordStar on this computer!

Combining a Z80 and an ATmega644P to boot CP/M

Monday, 14th June 2010

I've been working on a new Z80 computer over the last few days. I would say that I had been working on the existing Z80 computer were it not for the fact that this a completely new design.

The previous computer had two 32KB RAM chips to provide a total of 64KB RAM. To run a user program you need to get it into RAM somehow, so I also included a 128KB ROM chip which occupied the lower 16KB of the Z80's address space to provide the fixed operating system that could be used to load programs. By adding memory banking hardware I could select one of eight 16KB pages of ROM. The next 16KB was one of two banks of RAM from one RAM chip, and the final 32KB was mapped directly to the other RAM chip.

Previous Z80 computer memory map

This is all fairly complicated, and not very flexible. Programs written for CP/M tend to be loaded into memory starting at address $0100, which is impossible with my old design as that section of memory is taken up by ROM.

Giving another device access to the buses

The Z80 accesses memory and other hardware devices using three buses; an eight-bit data bus which shuttles bytes of data between the various chips, a sixteen-bit address bus which addresses a location in memory or a particular I/O device, and a control bus which contains numerous lines that specify the type of operation (for example, if /MREQ and /WR go low together it indicates that a byte is being written to memory, or if /IORQ and /RD go low together it indicates that a byte is being read from an I/O device).

There is also a pin named /BUSREQ that can be used to request access to these buses. The Z80 will periodically check this pin and if it is held low it will put the data, address and control buses into a high-impedance state and drive /BUSACK low to acknowledge this. This effectively removes the Z80 from the circuit, and another device can now drive the buses.

This is the feature which I have based the new design around — the current prototype is pictured above. It features a Z80 and 128KB of SRAM (only 64KB is currently addressable) on the upper board. On the lower board is an ATmega644P microcontroller, which is used to start the computer.

When the circuit is reset, the ATmega644P requests access to the buses from the Z80. When access has been granted, it proceeds to copy the CP/M BIOS from the 512KB flash memory IC to a specific location in RAM (currently $F200). It then writes the Z80 jump instruction jp $F200 to the start of memory, returns control of the buses to the Z80 and pulses its /RESET pin. The CP/M BIOS then runs directly on the Z80.

As the ATmega644P doesn't have enough pins to drive all of the buses directly, I've added sixteen GPIO pins by using two MCP23S08 8-bit I/O expander chips. These are used to drive or sample the Z80 address bus; the data and control buses are driven or sampled directly by the GPIO ports on the ATmega644P.

Using a slow to respond microcontroller for I/O

The Z80 is most useful if it can talk to the outside world somehow, which is usually achieved by reading from or writing to I/O devices. In my previous design I built these out of latches and lots of glue logic. As I've added a powerful microcontroller to the computer which features a number of useful on-board peripherals, it would seem sensible to use that instead.

One problem with this idea is that the Z80 expects to read or write to an I/O device in a mere four clock cycles. The AVR has a delay between an interrupt occurring (such as a pin state changing) and executing interrupt service routine of at least five clock cycles. Even though the AVR is running at twice the clock speed of the Z80 this still doesn't provide much time to sample the address bus and perform some useful action before returning a value to the Z80. Fortunately, the Z80 has another useful pin, /WAIT, specifically to address this concern. By pulling this pin low the Z80 can be stalled, allowing the I/O device plenty of time to respond. I have included a 7474 D-type flip-flop as an SR latch to control the /WAIT pin. When the Z80's /IORQ pin goes low the flip-flop is reset, which pulls the /WAIT pin low. When the AVR notices that the /IORQ line has gone low it samples the address bus, performs the requisite task then sets the flip-flop, which drives the /WAIT pin high again and the Z80 continues executing the program.

The 7474 is a dual D-type flip-flop, so I have used the second flip-flop to halve the AVR's 20MHz clock signal to provide the 10MHz clock for the Z80.

CP/M interacts with the host computer by calling numbered BIOS functions. I have implemented a number of these BIOS functions by outputting a value to a port number that matches the BIOS function number. For example, CONOUT is function number four and is used to send the character in register C to the console.

CONOUT:
    ld a,c
    out (4),a
    ret

The AVR detects a write to port 4 and sends the incoming byte to one of its UARTs. I have connected this UART to a simple transistor inverter (pictured in the top right of the above photograph) and plugged the output from that into one of my PC's serial ports, so by running a terminal emulator I can see the output of CP/M on the screen. I have implemented only a handful of other functions (WBOOT outputs a value to port 1 to indicate that I should load the BDOS and CCP into RAM from the flash memory and READ can be used to copy 128 byte floppy disk sectors from flash memory to Z80 RAM) so the results are not exactly impressive:

Loading BIOS...OK
Loading BDOS...OK
Loading CCP...OK

A>

As I haven't implemented console input yet there's no way to type at the prompt, but that it gets that far is encouraging.

I haven't implemented writing to the flash memory due to a mistake I made when reading its datasheet. When writing to flash memory the value you write is ANDed with the data that's already there (you can only set a 1 bit to a 0 bit, but not vice-versa) – this is referred to as programming. If you want to write a 1 bit you have to erase the memory before writing to it (this is unsurprisingly referred to as erasing). Flash memory can be split into pages (small regions, in this case 256 bytes) and sectors (large regions, in this case 64KB). You can often program any number of bytes (up to a page at a time, aligned to page boundaries) but can only erase in larger blocks — pages, sectors, or the entire memory (bulk erase). I thought that the flash memory ICs I bought supported page erasing, but they only support sector erasing. CP/M transfers data between floppy disks and RAM in 128 byte floppy disk sectors, so to write an updated sector I would need to read 64KB from the flash memory, update a 128 byte region within it, erase an entire flash sector, then program the 64KB back to it. This would be very slow and quickly wear out the flash memory, so I am looking for some replacement flash memory ICs which do support page erase.

SPI flash memory programmer

To copy the system files and a sample disk image to the flash memory I cobbled together the above parallel port programmer which is driven by an application cobbled together in C#. It's rather slow but gets the job done — unlike my AVR programmer. After finally managing to get CP/M to boot in a satisfactory manner I made a few tweaks to the AVR program and hit the "Build and Program" button in the editor. The code built, but rather than program the AVR my computer switched off. No error message, not even a blue screen, just a sudden and surprising power down. Since then I've only managed to talk to the AVR once; every other time has resulted in either a power down or blue screen. I had hoped to add some keyboard handling routines to the project to at least be able to interact with CP/M, but after fiddling around for an hour and a half without managing to get anything working again I gave up. I wish I knew why it suddenly stopped working, after hours of reliable service — maybe it's a hint that it's time to buy a proper USB debugger rather than the cheap and cheerful home-made serial port programmer I've been using!

Power supply insidesPower supply enclosure

One equally cheap but useful addition to my tools is the above 5V power supply (yes, it's just a 7805 regulator in a box). Every project I have built needs a 5V supply from somewhere, which usually comes from a 7.5V wall wart power supply unit regulated to 5V with a 7805. This takes up valuable breadboard space and the weight of the cable from the power supply tends to drag the breadboard around the smooth surface of my desk, so having a dedicated box with an on-off switch, indicator LED, reverse voltage protection and an easy way to connect to the circuit via 2mm sockets is very handy indeed.

I now need to find a way to program AVRs without my PC switching itself off before I can make any more progress on the project...

Thinking about CP/M

Wednesday, 24th February 2010

It's been some time since I worked on my Z80 computer project, but the recent electronics projects I've completed have got me thinking about it again.

I did record a video to demonstrate the basic parts of the computer and some of its flaws a few months ago, which can be seen above. However, I'm now thinking of a more radical redesign than fixing the I/O board's shortcomings.

One of the reasons for my lack of motivation is that even if I did get something working I wouldn't have much software to run on it; it would be a lot of work to write software that only ran on that one particular machine. BBC BASIC helps somewhat, but an even better solution would be to model the device on an existing machine and run its operating system on it.

Fortunately, there was a popular operating system for the 8080 (and, by extension, the Z80) – CP/M. This is a very simple operating system that inspired DOS. Crucially, it is not hardware-specific, the source code is available and there is a wide range of software available for it, including BBC BASIC.

CP/M is made up of three main components. At the highest level is the Console Command Processor, or CCP. This provides the command-line interface, a handful of built-in commands and handles loading and executing external programs. It achieves this with the aid of the Basic Disk Operating System, or BDOS, which exposes a number of useful routines for a variety of tasks, such as outputting text to the display, searching for files on the disk or reading console input.

Both of the above components are machine-independent – they simply need to be copied to the correct address in RAM when the computer starts. Relocating them to a particular address requires setting a single value in their respective source files and reassembling them, which is nice and easy. It's the third component – the Basic I/O System, or BIOS – that requires a bit more work. This is the only part that is tailored to a particular machine's hardware, and my current implementation is listed below.

CCP    = $DC00
BDOS   = $E406
BIOS   = $F200

IOBYTE = $0003
CDISK  = $0004

DMAAD  = $0008
CTRACK = $000A
CSEC   = $000C

.org BIOS

	jp BOOT    ; COLD START
WBOOTE
	jp WBOOT   ; WARM START
	jp CONST   ; CONSOLE STATUS
	jp CONIN   ; CONSOLE CHARACTER IN
	jp CONOUT  ; CONSOLE CHARACTER OUT
	jp LIST    ; LIST CHARACTER OUT
	jp PUNCH   ; PUNCH CHARACTER OUT
	jp READER  ; READER CHARACTER OUT
	jp HOME    ; MOVE HEAD TO HOME POSITION
	jp SELDSK  ; SELECT DISK
	jp SETTRK  ; SET TRACK NUMBER
	jp SETSEC  ; SET SECTOR NUMBER
	jp SETDMA  ; SET DMA ADDRESS
	jp READ    ; READ DISK
	jp WRITE   ; WRITE DISK
	jp LISTST  ; RETURN LIST STATUS
	jp SECTRAN ; SECTOR TRANSLATE

DISKPARAM
	.dw $0000  ; No sector translation.
	.dw $0000  ; Scratch
	.dw $0000  ; Scratch
	.dw $0000  ; Scratch
	.dw DIRBUF ; Address of a 128-byte scratch pad area for directory operations within BDOS. All DPHs address the same scratch pad area.
	.dw DPBLK  ; Address of a disk parameter block for this drive. Drives with identical disk characteristics address the same disk parameter block.
	.dw CHK00  ; Address of a scratch pad area used for software check for changed disks. This address is different for each DPH.
	.dw ALL00  ; Address of a scratch pad area used by the BDOS to keep disk storage allocation information. This address is different for each DPH.

DIRBUF
	.fill 128
	
DPBLK           ; DISK PARAMETER BLOCK, COMMON TO ALL DISKS
	.DW 26      ; SECTORS PER TRACK
	.DB 3       ; BLOCK SHIFT FACTOR
	.DB 7       ; BLOCK MASK
	.DB 0       ; NULL MASK
	.DW 242     ; DISK SIZE-1
	.DW 63      ; DIRECTORY MAX
	.DB 192     ; ALLOC 0
	.DB 0       ; ALLOC 1
	.DW 16      ; CHECK SIZE
	.DW 2       ; TRACK OFFSET

CHK00
	.fill 16

ALL00
	.fill 31

; =========================================================================== ;
; BOOT                                                                        ;
; =========================================================================== ;
; The BOOT entry point gets control from the cold start loader and is         ;
; responsible for basic system initialization, including sending a sign-on    ;
; message, which can be omitted in the first version.                         ;
; If the IOBYTE function is implemented, it must be set at this point.        ;
; The various system parameters that are set by the WBOOT entry point must be ;
; initialized, and control is transferred to the CCP at 3400 + b for further  ;
; processing. Note that register C must be set to zero to select drive A.     ;
; =========================================================================== ;
BOOT
	xor a
	ld (IOBYTE),a
	ld (CDISK),a
	jp GOCPM

; =========================================================================== ;
; WBOOT                                                                       ;
; =========================================================================== ;
; The WBOOT entry point gets control when a warm start occurs.                ;
; A warm start is performed whenever a user program branches to location      ;
; 0000H, or when the CPU is reset from the front panel. The CP/M system must  ;
; be loaded from the first two tracks of drive A up to, but not including,    ;
; the BIOS, or CBIOS, if the user has completed the patch. System parameters  ;
; must be initialized as follows:                                             ;
;                                                                             ;
; location 0,1,2                                                              ;
;     Set to JMP WBOOT for warm starts (000H: JMP 4A03H + b)                  ;
;                                                                             ;
; location 3                                                                  ;
;     Set initial value of IOBYTE, if implemented in the CBIOS                ;
;                                                                             ;
; location 4                                                                  ;
;     High nibble = current user number, low nibble = current drive           ;
;                                                                             ;
; location 5,6,7                                                              ;
;     Set to JMP BDOS, which is the primary entry point to CP/M for transient ;
;     programs. (0005H: JMP 3C06H + b)                                        ;
;                                                                             ;
; Refer to Section 6.9 for complete details of page zero use. Upon completion ;
; of the initialization, the WBOOT program must branch to the CCP at 3400H+b  ;
; to restart the system.                                                      ;
; Upon entry to the CCP, register C is set to thedrive;to select after system ;
; initialization. The WBOOT routine should read location 4 in memory, verify  ;
; that is a legal drive, and pass it to the CCP in register C.                ;
; =========================================================================== ;
WBOOT

GOCPM
	ld a,$C3      ; C3 IS A JMP INSTRUCTION
	ld ($0000),a  ; FOR JMP TO WBOOT
	ld hl,WBOOTE  ; WBOOT ENTRY POINT
	ld ($0001),hl ; SET ADDRESS FIELD FOR JMP AT 0
	
	ld ($0005),a  ; FOR JMP TO BDOS
	ld hl,BDOS    ; BDOS ENTRY POINT
	ld ($0006),hl ; ADDRESS FIELD OF JUMP AT 5 TO BDOS

	ld bc,$0080   ; DEFAULT DMA ADDRESS IS 80H
	call SETDMA

	ei            ; ENABLE THE INTERRUPT SYSTEM
	ld a,(CDISK)  ; GET CURRENT DISK NUMBER
	ld c,a        ; SEND TO THE CCP
	jp CCP        ; GO TO CP/M FOR FURTHER PROCESSING

; =========================================================================== ;
; CONST                                                                       ;
; =========================================================================== ;
; You should sample the status of the currently assigned console device and   ;
; return 0FFH in register A if a character is ready to read and 00H in        ;
; register A if no console characters are ready.                              ;
; =========================================================================== ;
CONST
	out (2),a \ ret

; =========================================================================== ;
; CONIN                                                                       ;
; =========================================================================== ;
; The next console character is read into register A, and the parity bit is   ;
; set, high-order bit, to zero. If no console character is ready, wait until  ;
; a character is typed before returning.                                      ;
; =========================================================================== ;
CONIN
	out (3),a \ ret

; =========================================================================== ;
; CONOUT                                                                      ;
; =========================================================================== ;
; The character is sent from register C to the console output device.         ;
; The character is in ASCII, with high-order parity bit set to zero. You      ;
; might want to include a time-out on a line-feed or carriage return, if the  ;
; console device requires some time interval at the end of the line (such as  ;
; a TI Silent 700 terminal). You can filter out control characters that cause ;
; the console device to react in a strange way (CTRL-Z causes the Lear-       ;
; Siegler terminal to clear the screen, for example).                         ;
; =========================================================================== ;
CONOUT
	out (4),a \ ret

; =========================================================================== ;
; LIST                                                                        ;
; =========================================================================== ;
; The character is sent from register C to the currently assigned listing     ;
; device. The character is in ASCII with zero parity bit.                     ;
; =========================================================================== ;
LIST
	out (5),a \ ret

; =========================================================================== ;
; PUNCH                                                                       ;
; =========================================================================== ;
; The character is sent from register C to the currently assigned punch       ;
; device. The character is in ASCII with zero parity.                         ;
; =========================================================================== ;
PUNCH
	out (6),a \ ret

; =========================================================================== ;
; READER                                                                      ;
; =========================================================================== ;
; The next character is read from the currently assigned reader device into   ;
; register A with zero parity (high-order bit must be zero); an end-of-file   ;
; condition is reported by returning an ASCII CTRL-Z(1AH).                    ;
; =========================================================================== ;
READER
	out (7),a \ ret

; =========================================================================== ;
; HOME                                                                        ;
; =========================================================================== ;
; The disk head of the currently selected disk (initially disk A) is moved to ;
; the track 00 position. If the controller allows access to the track 0 flag  ;
; from the drive, the head is stepped until the track 0 flag is detected. If  ;
; the controller does not support this feature, the HOME call is translated   ;
; into a call to SETTRK with a parameter of 0.                                ;
; =========================================================================== ;
HOME
	ld bc,0
	jp SETTRK

; =========================================================================== ;
; SELDSK                                                                      ;
; =========================================================================== ;
; The disk drive given by register C is selected for further operations,      ;
; where register C contains 0 for drive A, 1 for drive B, and so on up to 15  ;
; for drive P (the standard CP/M distribution version supports four drives).  ;
; On each disk select, SELDSK must return in HL the base address of a 16-byte ;
; area, called the Disk Parameter Header, described in Section 6.10.          ;
; For standard floppy disk drives, the contents of the header and associated  ;
; tables do not change; thus, the program segment included in the sample      ;
; CBIOS performs this operation automatically.                                ;
;                                                                             ;
; If there is an attempt to select a nonexistent drive, SELDSK returns        ;
; HL = 0000H as an error indicator. Although SELDSK must return the header    ;
; address on each call, it is advisable to postpone the physical disk select  ;
; operation until an I/O function (seek, read, or write) is actually          ;
; performed, because disk selects often occur without ultimately performing   ;
; any disk I/O, and many controllers unload the head of the current disk      ;
; before selecting the new drive. This causes an excessive amount of noise    ;
; and disk wear. The least significant bit of register E is zero if this is   ;
; the first occurrence of the drive select since the last cold or warm start. ;
; =========================================================================== ;
SELDSK
	ld hl,DISKPARAM
	ld a,c
	or a
	ret z
	ld hl,$0000 ; Only disc 0 is supported.
	ret

; =========================================================================== ;
; SETTRK                                                                      ;
; =========================================================================== ;
; Register BC contains the track number for subsequent disk accesses on the   ;
; currently selected drive. The sector number in BC is the same as the number ;
; returned from the SECTRAN entry point. You can choose to seek the selected  ;
; track at this time or delay the seek until the next read or write actually  ;
; occurs. Register BC can take on values in the range 0-76 corresponding to   ;
; valid track numbers for standard floppy disk drives and 0-65535 for         ;
; nonstandard disk subsystems.                                                ;
; =========================================================================== ;
SETTRK	
	ld (CTRACK),bc
	ret

; =========================================================================== ;
; SETSEC                                                                      ;
; =========================================================================== ;
; Register BC contains the sector number, 1 through 26, for subsequent disk   ;
; accesses on the currently selected drive. The sector number in BC is the    ;
; same as the number returned from the SECTRAN entry point. You can choose to ;
; send this information to the controller at this point or delay sector       ;
; selection until a read or write operation occurs.                           ;
; =========================================================================== ;
SETSEC
	ld (CSEC),bc
	ret

; =========================================================================== ;
; SETDMA                                                                      ;
; =========================================================================== ;
; Register BC contains the DMA (Disk Memory Access) address for subsequent    ;
; read or write operations. For example, if B = 00H and C = 80H when SETDMA   ;
; is called, all subsequent read operations read their data into 80H through  ;
; 0FFH and all subsequent write operations get their data from 80H through    ;
; 0FFH, until the next call to SETDMA occurs. The initial DMA address is      ;
; assumed to be 80H. The controller need not actually support Direct Memory   ;
; Access. If, for example, all data transfers are through I/O ports, the      ;
; CBIOS that is constructed uses the 128 byte area starting at the selected   ;
; DMA address for the memory buffer during the subsequent read or write       ;
; operations.                                                                 ;
; =========================================================================== ;
SETDMA
	ld (DMAAD),bc
	ret

; =========================================================================== ;
; READ                                                                        ;
; =========================================================================== ;
; Assuming the drive has been selected, the track has been set, and the DMA   ;
; address has been specified, the READ subroutine attempts to read one sector ;
; based upon these parameters and returns the following error codes in        ;
; register A:                                                                 ;
;                                                                             ;
;     0 - no errors occurred                                                  ;
;     1 - nonrecoverable error condition occurred                             ;
;                                                                             ;
; Currently, CP/M responds only to a zero or nonzero value as the return      ;
; code. That is, if the value in register A is 0, CP/M assumes that the disk  ;
; operation was completed properly. If an error occurs the CBIOS should       ;
; attempt at least 10 retries to see if the error is recoverable. When an     ;
; error is reported the BDOS prints the message BDOS ERR ON x: BAD SECTOR.    ;
; The operator then has the option of pressing a carriage return to ignore    ;
; the error, or CTRL-C to abort.                                              ;
; =========================================================================== ;
READ
	out (13),a \ ret

; =========================================================================== ;
; WRITE                                                                       ;
; =========================================================================== ;
; Data is written from the currently selected DMA address to the currently    ;
; selected drive, track, and sector. For floppy disks, the data should be     ;
; marked as nondeleted data to maintain compatibility with other CP/M         ;
; systems. The error codes given in the READ command are returned in register ;
; A, with error recovery attempts as described above.                         ;
; =========================================================================== ;
WRITE
	out (14),a \ ret

; =========================================================================== ;
; LISTST                                                                      ;
; =========================================================================== ;
; You return the ready status of the list device used by the DESPOOL program  ;
; to improve console response during its operation. The value 00 is returned  ;
; in A if the list device is not ready to accept a character and 0FFH if a    ;
; character can be sent to the printer. A 00 value should be returned if LIST ;
; status is not implemented.                                                  ;
; =========================================================================== ;
LISTST
	out (15),a \ ret

; =========================================================================== ;
; SECTRAN                                                                     ;
; =========================================================================== ;
; Logical-to-physical sector translation is performed to improve the overall  ;
; response of CP/M. Standard CP/M systems are shipped with a skew factor of   ;
; 6, where six physical sectors are skipped between each logical read         ;
; operation. This skew factor allows enough time between sectors for most     ;
; programs to load their buffers without missing the next sector. In          ;
; particular computer systems that use fast processors, memory, and disk      ;
; subsystems, the skew factor might be changed to improve overall response.   ;
; However, the user should maintain a single-density IBM-compatible version   ;
; of CP/M for information transfer into and out of the computer system, using ;
; a skew factor of 6.                                                         ;
;                                                                             ;
; In general, SECTRAN receives a logical sector number relative to zero in BC ;
; and a translate table address in DE. The sector number is used as an index  ;
; into the translate table, with the resulting physical sector number in HL.  ;
; For standard systems, the table and indexing code is provided in the CBIOS  ;
; and need not be changed.                                                    ;
; =========================================================================== ;
SECTRAN
	ld h,b
	ld l,c
	ret

Quite a number of the above routines simply output the value of the accumulator to a port. This is because I'm running CP/M in a Z80 emulator that I've knocked together, and am handling writes to particular ports by implementing the machine-specific operations (such as console input or output) in C#. The floppy disk file system is also emulated in C#; when the program starts, it pulls all the files from a specified directory into an in-memory disk image. Writing to any sector deletes all of the files in this directory then extracts the files from the in-memory virtual disk image back into it. This is not especially efficient, but it works rather well.

To turn this into a working bit of hardware, I intend to replace the C# part with a microcontroller to handle keyboard input, text output and interfacing to an SD card for file storage. It would also be responsible for booting the system by copying the OS to Z80 memory from the SD card. I'm not sure the best way to connect the microcontroller to the Z80, though; disk operations use DMA, which is easy enough, but for lighter tasks such as querying whether console input is available or outputting a character to the display it would be nice to be able to go via I/O ports. A couple of I/O registers may be sufficient as per the current design; a proper Z80 PIO would be even better if I can get my hands on one.

Of more concern is a suitable display; the above screenshot is from an 80-character wide display. Assuming a character was four pixels wide (which is about as narrow as they can be made whilst still being legible) imposes a minimum resolution of 320 pixels horizontally – my current LCD is only 128 pixels wide (not even half way there), and larger ones are really rather expensive!

Expression Evaluation in Z80 Assembly

Tuesday, 24th February 2009

The expression evaluators I've written in the past have been memory hungry and complex. Reading the BBC BASIC ROM user's guide introduced me to the concept of expression evaluation using top-down analysis, which only uses a small amount of constant RAM and the stack.

I took some time out over the weekend to write an expression evaluator in Z80 assembly using this technique. It can take an expression in the form of a NUL-terminated string, like this:

.db "(-8>>2)+ceil(pi())+200E-2**sqrt(abs((~(2&4)>>>(30^sin(rad(90))))-(10>?1)))",0

and produce a single answer (or an error!) in the form of a floating-point number. The source code and some notes can be downloaded here.

I initially wrote a simple evaluator using 32-bit integers. I supported the operations the 8-bit Z80 could do relatively easily (addition, subtraction, shifts and logical operations) and got as far as 32-bit multiplication before deciding to use BBC BASIC's floating-point maths package instead. The downside is that BBC BASIC has to be installed (the program searches for the application and calls its FPP routine).

I'm not sure if the technique used is obvious (I'd never thought of it) but it works well enough and the Z80 code should be easy to follow - someone may find it useful. smile.gif

Z80 computer - Lines, cubes and inverted text

Sunday, 5th October 2008

I've made a few additions to the operating system for the computer. The Console module, which handles text input and output, now supports "coloured" text - that is you can set the text foreground and background colours to either black or white. This functionality is exposed via the BBC BASIC COLOUR statement. If you pass a value between 0 and 127 this sets the foreground colour (0..63 is white, 64..127 is black) and if you pass a value between 128 and 255 this sets the background colour (128..191 is white, 192..255 is black).

2008.10.05.02.Colour.png   2008.10.05.04.TextViewport.png

The image on the right also demonstrates another addition - you can set the text viewport to occupy a partial area of the display. This is most useful when coupled with the ability to define graphics viewports, which I have yet to add.

That said, I have started writing the Graphics module. So far all it can do is draw clipped lines, and this functionality is exposed via BBC BASIC's MOVE and DRAW statements. MOVE sets the graphics cursor position - DRAW also moves the graphics cursor, but also draws a line between the new position and the previously visited one.

2008.10.05.03.Line.png

I cannot use drawing code I've written for the TI-83+ version due to differences in the LCD hardware and the way that buffers are laid out. The popular way to lay out graphics buffers on the TI-83+ is as follows:

2008.10.05.05.LCD.TI.png

Each grey block represents 8 pixels - one byte in LCD memory represents 8 pixels grouped horizontally. The leftmost bit in each 8-pixel group is the most significant bit of each byte. The data is stored in the buffer so that each row of the LCD is represented by 12 consecutive bytes. This left-to-right, top-to-bottom arrangement should seem sensible to anyone who has worked with a linear framebuffer. However, due to the way that the LCD I'm using is arranged, I'm using the following buffer layout:

2008.10.05.06.LCD.Vertical.png

The LCD hardware groups pixels vertically, but when you write a byte to it its internal address pointer moves right. Furthermore, the most significant bit of each byte written is at the bottom of each group. This may sound a little confusing, but actually works out as more efficient. Writing text is easy; I'm using a 4×8 pixel font, so all I need to do is set the LCD's internal address counter correctly then write out four bytes, one for each column of the text (other sensible font sizes for the display, such as 6×8 or 8×8 are just as easy to display).

Another example of improved efficiency is if you deal with pixel-plotting routines. Each pixel on the display can be addressed by a buffer offset and an eight-bit mask to "select" the particular pixel in an eight-pixel group. With this arrangement, moving the pixel left or right is easy; simply increment or decrement the buffer offset by one. Moving the pixel up or down is a case of rotating the mask in the desired direction. If the rotation moves the pixel mask from one 8-pixel group to another (which only happens every eight pixels) the buffer offset needs to be moved by 128 in the correct direction to shunt it up or down.

On the TI-83+, moving the pixel up or down requires moving the buffer offset up or down by 12; moving the pixel left or right is a rotation as before with a simple buffer offset increment or decrement to move between 8-pixel groups.

In Z80 assembly incrementing or decrementing a 16-bit pointer by one is a single instruction taking 6 clock cycles; moving it by a larger offset takes at least 21 clock cycles, 42 if you include backing the temporary register such an operation would take.

What may be interesting to see is how well a raycaster would work on a system that has video memory arranged into columns.

Without wishing to be typecast as that programmer who loves spinning cubes, I also wrote a cube-spinning demo to test the line drawing routines as well as some integer arithmetic routines I've added (the Z80 can't multiply or divide, so these operations need to be implemented in software).

It runs fairly smoothly (bearing in mind the 2MHz clock speed). The second half of the video has the Z80 running at 10MHz; it actually seems quite stable even though the LCD is being accessed at nearly five times its maximum speed (the system did need to be reset a few times until it worked without garbling the display).

Fixed and scaled CHIP-8/SCHIP interpreter

Wednesday, 24th September 2008

The CHIP-8/SCHIP interpreter now seems happy enough to run games, though the lack of settings to control how fast or slow they run makes things rather interesting.

2008.09.23.01.FileListing.png

First of all, I've hacked together a painfully simple read-only file system. Each file is prefixed with a 13-byte header; 8 bytes for the filename (padded with spaces), 3 bytes for the extension (padded with spaces) and two bytes for the file size. The above file listing can be generated by typing *. at the BASIC prompt.

I've written a new sprite drawing routine that scales sprites up to double size when in CHIP-8 mode; this allows CHIP-8 games to fill the entire screen. Unlike the existing sprite code, which I've retained for SCHIP games, it runs entirely from ROM; the existing sprite code has to be copied to RAM as it uses some horrible self-modifying code tricks. I should probably rewrite that bit next. smile.gif

As for the bug I mentioned in the last post, it was because of this:

; --- snip ---

; Group 9:
;   * 9XY0 - Skips the next instruction if VX doesn't equal VY.
InstructionGroup.9
	call GetRegisterX
	ld b,a
	call GetRegisterY
	cp b
	jp nz,SkipNextInstruction

; Group A:
;   * ANNN - Sets I to the address NNN.
InstructionGroup.A
	call GetLiteralNNN
	ld (DataPointer),hl
	jp ExecutedInstruction

; --- snip ---

If an instruction in the form 9XY0 is executed and VX == VY, rather than jumping to ExecutedInstruction the code runs on and executes the instruction as if it had been an ANNN as well, which ended up destroying the data pointer. Adding a jp ExecutedInstruction after the jp nz,SkipNextInstruction fixed the bug.

One other advantage of the zoomed sprites is that "half-pixel" scrolls also work correctly:

2008.09.23.05.SChip.EmuTest.png

...not that I've seen any game that uses them.

2008.09.23.02.SChip.Piper.png   2008.09.23.03.Chip8.Brix.png

2008.09.23.07.SChip.UBoat.png   2008.09.23.08.SChip.Square.png

2008.09.23.04.Chip8.Blinky.png   2008.09.23.06.SChip.Blinky.png

The last two screenshots show two versions of the game Blinky, one as a regular CHIP-8 program and the other taking advantages of the SCHIP extensions.

64KB RAM and a CHIP-8/SCHIP interpreter

Monday, 22nd September 2008

The only major hardware modification since last time is the addition of another 32KB SRAM.

This appears as two 16KB pages in the $4000..$7FFF slot. Currently only the first page is used for OS variables and scratch space, freeing up the upper 32KB entirely for BBC BASIC's use.

One other minor hardware addition is support for a dual-coloured LED on the control port. This LED will be used to signify file access - reads by a green LED and writes by a red LED. As such I haven't implemented a proper file system, but typing SAVE "FILE" or LOAD "FILE" at the prompt will transfer data between the Z80 RAM and a 24LC256 32KB EEPROM. The routines do not pay attention to any file name specified - the first two bytes on the EEPROM indicate the file size, and the rest of the EEPROM is the file. I think some sort of simplified version of FAT may work well, as the EEPROM has a natural page size of 64 bytes which could be used in place of clusters.

2008.09.15.02.Memory.Board.Underside.jpg
Adding the second 32KB SRAM required soldering wires to the underside of the stripboard, not something I'd recommend!

As I have not yet added any graphical commands to BBC BASIC, and as porting assembly programs to this hardware is going to be a bit of a pain until I decide on the way the OS is going to work, I decided to try and port Vinegar to the system. Vinegar is a CHIP-8 and SCHIP interpreter - CHIP-8 programs being simple bytecode and so relatively simple to interpret.

2008.09.22.01.Chip8.Joust.png

The code I had written was difficult to port, however, being inefficiently and messily written, so I ended up rewriting all of it apart from the sprite drawing routines. The TI-83+ LCD follows the usual trend of storing 8 horizontal pixels in each byte of video memory. The LCD I have stores 8 vertical pixels in each byte of video memory, which means that each 8×8 pixel block in memory needs to be rotated by 90° before being sent to the LCD hardware. This is understandably very slow, and not helped by the Z80 only running at 2MHz. To further complicate issues, games rely on two 60Hz timers, and I have no timing hardware. The current version of the interpreter has some bugs, but is good enough to run some SCHIP programs.

CHIP-8 programs are displayed squashed in the top-left hand corner, as they're designed to run in a 64×32 video mode unlike SCHIP's 128×64 (happily, the resolution of the LCD) - typically, the one thing I really did need to fix for the new hardware, the sprite code, is the only thing I copied over. In reality, CHIP-8 graphics would need to be scaled up to fit the screen. Working out a way of getting the system to operate at 10MHz would really be a welcome upgrade!

Times, backlights and off-page calls

Sunday, 14th September 2008

Dates, times and backlights

I'm using a DS1307 real-time clock to provide the computer with real-time date and time functions. It's a great little chip - all it needs is power, two lines for I2C communications, a 32768Hz crystal between two pins and a back-up battery to keep it ticking when main power is removed and it's happy. That accounts for seven pins; the last remaining pin can be used as a one-bit output (you can set it to a high or low state in software) or it can be configured to output a square wave at 1Hz, ~4kHz, ~8kHz or ~32kHz.

2008.09.14.01.Clock.png

BBC BASIC can access the clock via the TIME$ pseudo-variable. This string variable returns the date and time in the format Sun,14 Sep 2008.15:20:00, and you can set the clock by assigning to the variable. When setting the clock you can specify either the date, the time, or both. Parsing the string has been an interesting exercise in Z80 programming, as it's not something I've ever attempted without regular expressions before!

2008.09.14.02.SettingClock.jpg

The only hardware modification since last time is a very poorly implemented software control of the backlight. The fifth bit of the control port specifies whether the backlight is on or off, and it can be toggled with the *BACKLIGHT command. I say "poorly implemented" as the transistor driver I'm using to interface the hardware port with the backlight LEDs results in a much dimmer backlight than when I had the LEDs hooked up directly to the power supply (on the positive side, at least the 5V regulator's heatsink is cool enough to touch - the backlight draws a lot of current).

Calling off-page functions

Now that I have access to all eight 16KB "pages" that make up the 128KB OS ROM, it may help to explain how one can use all of this memory. After all, if page 1 is swapped in and you wish to call a function on page 2, a regular Z80 call isn't going to work as you need to swap page 2 before calling the function then swap page 1 back in afterwards.

The trick is to exploit the way that the Z80 handles calling subroutines. There is a 16-bit register, PC, which stores the address of the next instruction to execute. When you call a subroutine, the Z80 pushes PC onto the stack then sets PC to the address of the subroutine. When you return from a subroutine (via the ret instruction) the Z80 simply pops the value it previously pushed onto the stack and copies this back to PC. Instead of calling the target subroutine directly, you call a special handler that is available on every page. Following your call is 16-bit identifier for the off-page function you wish to call. This handler then (prematurely) pops off the return address from the stack, reads the 16-bit value that follows it (which is the indentifier of the function you wish to call), looks up the page and address of the target function, swaps in the correct page and calls it as normal. When the function returns, the handler then swaps back the calling page and jumps back to the return address.

The Z80 has a series of rst instructions that call fixed addresses within the first 256 bytes of memory. These instructions are useful as they're small (one byte vs three bytes for a regular call) and fast, so I'm using rst $28 to call the off-page call handler (for no other reason than it's the same as the handler on the TI-83+).

As an example, let's say you had this function call at address $2B00:

$2B00:    rst $28
$2B01:    .dw $30F0
$2B03:    ; We'd return here.

When the Z80 executed that rst $28 it would push $2B01 (address of the next instruction) to the stack then jump to $28. The handler at $28 would do something like this:

    pop hl    ; hl is a 16-bit register and would now contain $2B01
    ld e,(hl) ; Read "e" from address pointed to by hl, now equals $F0
    inc hl    ; hl = $2B02
    ld d,(hl) ; Read "d" from address pointed to by hl, now equals $30
    inc hl    ; hl = $2B03 ("real" return address)
    push hl   ; push hl back on the stack so when we return from here we end up in the correct place.

Now, de is $30F0 - this is the identifier of the function we're calling. In my case, the identifier points to a function table on page 0. Each entry in the table is three bytes - one byte for the page index and two bytes for the address of the function on the that page. We'd need to do something like this:

    in a,(Page)  ; Read the current page into A.
    push af      ; Push A and F to the stack for later retrieval.
    and ~7       ; Mask out the lower three bits of the address.
    out (Page),a ; Sets current ROM page to 0.
    ex de,hl     ; Exchanges de and hl, so hl now points to the function identifier.
    or (hl)      ; ORs contents of memory at (hl) (ie, page number) with a, to set the target page.
    inc hl
    ld e,(hl)    ; e = LSB of target address
    inc hl
    ld d,(hl)    ; d = MSB of target address
    ex de,hl     ; hl = target address.
    out (Page),a ; Swaps in the correct page.
At this point, the correct page is swapped in and hl points to the address of the function to call. All we need to do now is call it!
    ld de,ReturnFromHandler ; Address to return to.
    push de ; Store on stack.
    jp (hl) ; Set pc = hl.
ReturnFromHandler
    ; Swap back the original page which was pushed earlier...
    pop af
    out (Page),a
    ret ; ...and return to the calling page!

A further advantage of using rst $28 to replace call is that both are the same size, so the assembler can check if you're calling an address on the same page or a different one and insert the regular (and much faster) Z80 call in places where you don't need to swap the page.

Finally, the obligatory video, this time showing a clock that toggles the backlight once a second.

Bank-Switching Memory and I2C

Thursday, 11th September 2008

Cheers for the comments. smile.gif As EasilyConfused pointed out, I have done calculator programming in the past, which makes this much easier - learning Z80 assembly to program a calculator influenced the choice of CPU in this computer, and porting BBC BASIC to the calculator showed that with a minimal amount of code to sit between it and the hardware you'd have a decent operating system with very little work. And if a Terminator-related name is good enough for the UK military, it should be good enough for this project...

The I/O board from a few posts ago has undergone a few revisions:

2008.09.12.01.IO.Board.jpg

Both PS/2 ports are now fully wired up, though only the lower one is currently used by the OS for keyboard input. I will need to adjust the AT protocol routines (the AT protocol is used to control both keyboard and mouse) to support multiple physical ports, as it was adapted from code I wrote for the TI-83+ calculator and as such only supports one device at a time. The mouse position will be polled by calling ADVAL(axis%), which on the BBC Micro would return the joystick position (axis% specifies the type of information to retrieve from the mouse - a value of 0 returns the buttons as a bitfield, 1 returns the movement in the X axis, 2 the movement in the Y axis and 3 the amount the scrollwheel has been scrolled).

At the very bottom of the circuit board is another 8-bit latch. This is for the (currently) write-only control port. The three least significant bits specify the current ROM page (one of eight 16KB ROM pages can be swapped in for a total of 128KB) and the next bit specifies one of two 16KB RAM pages accessible in the $4000..$7FFF address range. One of the other bits will be used to switch the LCD backlight on and off in software, one more may be connected to a buzzer, and I'm sure I can find some use for the last two. As it's write-only, its current state needs to be stored in RAM so that you can change bits of it (eg when changing the ROM page you wouldn't want to change the backlight status at the same time; you'd need to retrieve the current state and mask in the bits you wish to preserve). This is obviously an ugly hack, and I'm hoping I'll be able to use some of the space to the right of the latch IC on the circuit board to add the other latch to allow the port to be read as well (an I/O port needs two latches - an output latch that takes data from the data bus and outputs it to external hardware, and an input latch that takes data from external hardware and puts it back on the data bus).

The first test of the new ROM paging hardware was to display a simple animation. Assuming 1KB on each ROM page was taken up by the animation playback program, that leaves 15KB per ROM page. A frame (128×64 pixels) is 1KB, so that's 15 frames per page, or 120 frames total. I converted a clip from Pink Floyd's Arnold Layne music video to a suitable format and wrote a playback routine that could run from RAM. When the computer booted it would copy the player to RAM and run it from there as it could then run uninterrupted when different ROM pages were swapped in to read the frame data.

An animation like this is a useful test, as if the ROM paging didn't work properly (simulated by holding the three ROM page selection lines low) the software would still run, but would just loop the first 15 frames (or play chunks of 15 frames out of sequence) instead of crashing.

Another addition to the circuit above is the cluster of discrete transistors and resistors under the lowest PS/2 port. This is the same sort of pair of open-collector I/O data lines that drive each PS/2 port, except that the two data lines are fed out of the I/O board and back to the breadboard that's currently sitting between the memory board and the I/O board to these two simple 8-pin chips:

2008.09.12.03.I2C.Chips.jpg

This is the I2C bus, a simple, low-speed, two-wire bus that will allow other components be easily connected to the computer. The I2C protocol is implemented in software. The two chips in the above image are a DS1307 real-time clock (foreground, with quartz crystal) which I hope to use for timing purposes and a 24LC256 32K×8 EEPROM which I hope to use for file storage. I would need to have some way of accessing the I2C bus externally (to plug in EEPROMs as removable storage) as well as supporting the internal devices.

I haven't yet done any work on supporting I2C devices properly, but I have added I2C bus emulation to the emulator I'm using to develop the OS. BBC BASIC will pass commands prefixed with a *STAR to the operating system, so I've added a *I2CPROBE command that will hammer through all available addresses and list any devices that acknowledge a write request.

2008.09.12.03.I2C.Probe.png
$A0 is the EEPROM and $D0 is the clock.

I think I may have dug myself into a hole for CPU timing. I mentioned that I will need to drop the CPU clock to 2MHz when accessing the LCD; unfortunately, switching between 2MHz and 10MHz doesn't seem to work very well. I can run the system relatively stably at either speed (though at 10MHz data sent to the LCD is occasionally corrupted) but if I try and switch dynamically (eg switching from the 10MHz to the 2MHz clock when /IORQ goes low to indicate an I/O request) the system locks up. My assumption is that during time it takes the logic gates that perform the 10MHz/2MHz switch to properly settle into their new state (which is in the tens of nanoseconds) the clock signal stutters a little, effectively producing a clock signal (albeit a brief one) well over 10MHz. I don't have an oscilloscope to verify this, however. sad.gif

Running BBC BASIC on a home-built computer

Sunday, 7th September 2008

2008.09.07.01.LookAroundYou.jpg
This computer needs a name - I'd welcome any suggestions!

I have built a circuit on another piece of stripboard that will handle memory, clock signal generation and the Z80 itself.

A few posts ago I was wondering about how I'd partition memory. To date I've been using a very simple circuit where the lower 32KB of addressable memory is mapped to ROM and the upper 32KB is mapped to RAM. As my ROM chip is 128KB and I have two 32KB RAM chips, this seems a bit wasteful.

The memory layout I'm now using is quite simple: the upper 32KB is still mapped to RAM. However, only the first 16KB is mapped to ROM, and the three most significant bits of the ROM chip's address lines are connected to a device on the I/O board so that one of its eight 16KB "pages" can be swapped in. The next 16KB will be mapped to RAM, and the most significant bit of the RAM chip's address is connected to the same device on the I/O board so one of its two 16KB "pages" can be swapped in.

2008.09.07.03.Memory.Map.gif

For more information, see the Wikipedia article on bank switching. There is a potential problem here; the Z80 uses particular fixed addresses for certain operations. The three most obvious ones are $0000 (jumped to on reset), $0038 (address of maskable interrupt handler) and $0066 (address of non-maskable interrupt handler). As which 16KB bank switched in at power-on is effectively random, the easy way around this problem is to ensure that the first 256 bytes or so of every ROM page has the same code assembled on it. This means that whichever page is swapped in on boot doesn't matter, as the same common boot code is available on each page.

The assembled memory board looks like this:

I have only attached one of the 32KB RAM chips. The wiring was becoming a bit of a nightmare (I think I'll need to solder to the track side of the stripboard to fit in that other RAM chip) so for the moment the system can only access the fixed 32KB RAM. I haven't yet added the device on the I/O board to handle bank switching, so for the moment the ROM is permanently configured to access the first 16KB page by pulling the its three externally controllable address lines low.

That said, this machine does genuinely run BBC BASIC (the last system only ran a mockup with a dummy header at the top of the screen). I've done quite a bit of work on the OS in the emulator and it works pretty well there, and with a minor adjustment to cram it onto a single 16KB page it works well on hardware too.

The row of chips along the bottom of the memory board are responsible for generating the clock signals that drive the computer. If this looks needlessly complex, that's because it can run at either 10MHz or 2MHz and generates the E signal for LCD access. The CPU needs to drop to 2MHz when accessing the LCD (the LCD driver can't keep up, otherwise) so I'll probably end up connecting the input for this 2MHz/10MHz switch to the LCD chip enable pins so that normally the system runs at 10MHz but drops to 2MHz when accessing the LCD. Allowing the user to drop to 2MHz to save power is an appealing idea, however...

2MHz should be enough for anyone

Wednesday, 27th August 2008

2008.08.27.01.LCD.Ribbon.jpgLCD Timing
Last time I discussed the hardware I mentioned I had LCD timing issues. I have finally resolved them, but this has been the most time consuming part of the project so far.

The first thing to sort out was the LCD's E pin. Once you have set up the LCD's input pins to a state where they're ready to read or write data, you need to drive this pin high. I had had some success by holding it high permanently and relying on the Z80 to set all the other to the right state at roughly the same moment, but this was inaccurate and resulted in occasional display glitching.

Consulting the datasheet, it appears that once the input pins are ready E needs to be held low for at least 450nS and then needs to be driven high for at least 450nS. Hmm. During an I/O request (and once the Z80 has prepared the address and data bus) there's a delay of 1 clock cycle, then /IORQ is held low for about 2.5 CPU cycles. That is the window of opportunity. I have connected a binary counter to the Z80's clock signal and take the least significant bit of the output - every clock cycle this output toggles between a low and a low, effectively halving the CPU clock rate. I then connect the counter's reset pin (which overrides the clock input and forces it to output zero) to /IORQ. The result is that when the Z80 is not accessing hardware the counter is held in its reset state, and E is held low. When the Z80 holds /IORQ low, the counter starts up and outputs a zero for one CPU cycle, outputs a one for the next CPU cycle, then outputs a zero for the next half clock cycle at which point /IORQ goes high again and it is back to zero anyway. This is exactly what's needed!

This also allows us to calculate the maximum CPU clock rate. If we are generous and allow E to be low for 500nS then high for 500nS, that gives us a CPU clock rate of 1/500nS=2MHz.

Anyhow, that's one problem resolved, but there was still one nasty bug. When reading from the LCD it would occasionally end up writing to the area that was being read or, in worse cases, the Z80 would "crash". The LCD has a R/#W pin that is held high when reading and held low when writing. I had connected it directly to the Z80's /WR pin, which is high normally and low when writing. The problem here is "normally" as the LCD was expecting to be read even when the Z80 wasn't requesting a /RD. When being read, the LCD expects to put something onto the data bus, and it appeared that it kept thinking that it needed to put something on the bus when it wasn't needed. This caused fighting with the other chips that were trying to put their own values on the data bus, hence the crashes as the Z80 received invalid data.

The answer was very easy; simply connect the Z80's /RD pin to the LCD's R/#W pin via a NOT gate. In the default state the pin is held low (LCD expects a write and leaves the data bus alone), and only goes high during a /RD. The LCD interface is now very robust.

CPU Clock
Above I mention the calculation for the maximum clock rate. Rather than use the 555 for timing, I switched to using a 10MHz crystal resonator oscillator. I'm using the serial resonant circuit from z80.info with a 74F04 hex inverter (second circuit down). Fortunately the counter chips I have are decade counters (ie, designed to count from 0 to 9) made up of a ÷2 and a ÷5 section. I can connect a 10MHz oscillator to the ÷5 section and use the output of that to drive the CPU. In the final design I'd like to add a "hardware control port" with a bit that would let the programmer choose 2MHz or 10MHz mode by setting or resetting a particular output bit (other control bits would include switching the LCD backlight on or off and a buzzer for beeping sound output).

PS/2 Ports
As a friend pointed out, the 8-bit open-collector I/O port (which will drive two PS/2 ports, the I2C bus and TI calculator link port) had a flaw - there was no resistor on the base of the output transistors. The result is that if the output latch tries to drive the transistor base high, the transistor switches on and shorts the output of the latch to ground. This was clearly a problem in the design as the LCD backlight dimmed when trying to output to these ports as they drew a excessive amount of current when effectively short-circuited. A 22K resistor between the output latch and base of the transistor fixed the problem.

2008.08.27.02.PS2.Ports.jpg

In the above photo, I've also added two 100K resistors to hold the output high when floating, but only to the foreground PS/2 port for the time being. I don't think I'll be controlling a mouse yet!

Revised OS
With a little modification of the Emerson PS/2 library I've got a basic keyboard driver up and running on the hardware. All the OS does for the moment is check for keys and display them on the screen. It's currently hard-coded to the UK layout, as I haven't yet decided how I'm going to handle storage and by keeping the layout in ROM it at least frees up a few hundred bytes of RAM that would otherwise need to be there for the scancode translation tables.

Z80 computer with a primitive I/O board

Wednesday, 20th August 2008

A computer needs some way of interacting with the outside world via input and output devices. It's about time, then, that the Z80 computer project acquires a section dedicated to I/O.

2008.08.20.04.Overview.jpg

The Z80 differentiates between memory and I/O devices, though both share the data bus and the address bus. You can control I/O devices using the in (input) and out (output) instructions. When you input or output you must specify a device address and a value or target register. For example,

    in a,($20) ; Read a value from device $20 and store it in A.
    ld a,123
    out ($40),a ; Output 123 to device $40.

When you write to a device, the following happens:

  • The address bus is set to the address of the device to output to.
  • The data bus is set to the value to be written to the device.
  • The /IORQ and /WR pins on the Z80 go low.
  • The device processes the written data.
  • The /IORQ and /WR pins on the Z80 go high.

Reading from the device is very similar:

  • The address bus is set to the address of the device to read from.
  • The /IORQ and /RD pins on the Z80 go low.
  • The device puts the value to read onto the data bus.
  • The Z80 reads the value on the data bus.
  • The /IORQ and /RD pins on the Z80 go high.

(Accessing memory is a similar procedure, except with the /IORQ pin replaced by the /MREQ pin).

Interfacing I/O devices to a Z80 CPU should be rather straightforwards, then. I am using a 74HCT138N 3-to-8 line inverting decoder to handle the address bus input and /IORQ signal. This IC has three address inputs and 8 outputs. If the address input is %000, output 0 is low and all the other pins are high; if the address input is %001 output 1 is low and all the others are high; if the input is %010 output 2 is low and all the others are high (and so on and so forth). /IORQ is connected to another input on the chip, /E1, which causes all of the pins to go high when it is high regardless of the address input.

What does this mean in practice? Well, most devices have a "chip enable" or "chip select" input pin. When this input is active the device performs its function, but when the input is not active the device is deactivated and doesn't respond to any other inputs or output anything. By connecting each output of the 3-to-8 decoder to a particular device's chip enable pin I can ensure that each device is only activated when its address is specified on the address bus and the /IORQ pin on the Z80 is low.

I have connected the Z80's A5-A7 to A0-A2 on the 3-to-8 decoder. This means that the first device has a base address of $00, the second $20, the third $40 and so on at increments of $20. This might sound a little odd, but has a reason. Some devices, such as the LCD, have sub-addresses of their own. In the case of the LCD, it has a pin that specifies whether you're dealing with an instruction (such as a command to switch the display on or off or read the LCD status) or some data (which forms part of the picture on the LCD). By attaching this pin directly to the Z80's A0 and the LCD's chip select pin to output 1 from 3-to-8 decoder you end up with an LCD instruction port at $20 and an LCD data port at $21.

An LCD is all well and good, but we'll need to take input from the user. To accomplish this, I'm going to supply two PS/2 ports and implement the AT protocol (as used by PS/2 keyboards and pointing devices) in software. Each device only requires two open-collector data lines (data and clock), so a single I/O device that provides eight I/O lines would be useful.

The design I'm going for uses two 74AC373 octal transparent latches. When the latch enable input pin is held high whatever value is on the input passes through to the corresponding output. When the latch enable pin goes low, the last value that was latched at the input is still output. These particular latches also have an output enable pin that can be used to disable the outputs and let them "float" (ie, other devices can then drive that particular connection high or low as required). In this instance, one latch has its output enable pin activated so that it always outputs the last value written to it and has its latch enable pin connected to the Z80's /WR pin. The other latch has its latch enable pin activated so that it always outputs the values at its input and has its output enable pin connected to the Z80's /RD pin.

The transistors on each output are used to provide open-collector outputs. When the base of the transistor is held low, the transistor is "switched off" and its output floats, and so can be driven by external circuitry. When the base of the transistor is held high, it switches on and effectively connects the output to ground. A pull-up resistor ensures that the pin has a high signal when not connected to anything. This arrangement is useful as each pin can be driven low by either device and so works as an input or an output (for a real-world example, an AT keyboard usually outputs a clock signal on one line to the host when sending data, but if the host pulls the clock line low it can inhibit communication and the keyboard buffers the data to send instead).

Rather than build the circuit on breadboard, I went straight to stripboard. The above photo shows an incomplete version of the output board. Only one PS/2 port is wired up at all! The pin header to the left is to connect the LCD to. The coloured wires at the extreme left connect this I/O board to the rest of the computer.

I have modified the Z80 board I was using last time to add support for RAM. The 3-to-8 decoder in the bottom right is used to partition the address space into two 32KB regions. The lower 32KB is mapped to ROM, and the upper 32KB is mapped to RAM. This wastes 75% of the ROM chip (it's a 128KB chip) but without a more complex memory management unit this will have to do for the moment. The most significant bit of the address bus, A15, is fed into the 3-to-8 decoder along with the /MREQ pin.

The test software is a Z80 program that displays an animation on the LCD using 20 frames (1KB per frame) stored in ROM.

The Z80 is still not breaking MHz speeds, but there are problems here. I have not interfaced the LCD correctly, as its timing patterns for reading and writing data are quite different to those used by the Z80. Bizarrely, holding the E pin on the LCD permanently high appears to work 99% of the time, even though the datasheet indicates that it should be used to clock data in or out. The result is glitches in the data sent to the LCD, usually on the left hand side (the left hand side of the display has a propensity to believe it's been sent the "switch off" command). I'm not sure I'll be able to remedy this situation. Judging by the datasheet it looks like the LCD does its stuff when the E pin goes from a low to high state (the Z80 does everything when /IORQ goes low), so maybe simply inverting /IORQ and pumping it into E will do the trick.

Z80 Light-flasher

Wednesday, 13th August 2008

Now armed with a flash programmer, I thought it about time to try and build a Z80-based system.

Not much to look at, and it doesn't do much either. The large IC in the bottom-left, prominently marked Z, is the Z80 itself. To its left is a 555, generating a ~220Hz clock signal (yes, Hz, not MHz or even kHz). Above the Z80 is another large chip - this is the 128KB flash ROM. The eight parallel wires between them are the address bus - only A0 to A7 are connected. This only lets the Z80 address 256 bytes, but that should be enough for testing.

To the right of the flash ROM is an octal latch. This is used to provide an 8-bit output port for the system, which is connected to the LEDs to its right. As the latch's latch enable pin is active high, unlike everything else in the system (which is active low - ie, it does something when you drive it low) I have to put a NOT gate - the final black IC to the right of the Z80 - between it and the Z80's /WR (write) pin. I do not do any address decoding or even check the /IORQ pin, so any value written to to any hardware device or memory address will end up on the LED display. Not that that really matters, as there is a conspicuous lack of RAM in the system!

The large physical size and tedium of wiring even such a primitive system as this makes me wonder whether it's worth jumping straight to stripboard for subsequent hardware revisions...

For the curious, the program running on the Z80 is as follows.

.for p = 0 to 7
.defpage p, kb(16), $0000
.loop
.emptyfill $FF

.page 0

	im 1
	di

--	ld hl,LightSequence
	ld b,LightSequenceEnd-LightSequence
-	ld a,(hl)
	out (0),a
	inc hl
	djnz -
	jr --

LightSequence
	.db %00000001
	.db %00000010
	.db %00000100
	.db %00001000
	.db %00010000
	.db %00100000
	.db %01000000
	.db %10000000
	.db %01000000
	.db %00100000
	.db %00010000
	.db %00001000
	.db %00000100
	.db %00000010
	.db %00000001
	.db %00000010
	.db %00000100
	.db %00001000
	.db %00010000
	.db %00100000
	.db %01000000
	.db %10000000
	.db %01000000
	.db %00100000
	.db %00010000
	.db %00001000
	.db %00000100
	.db %00000010
	.db %00000001
	.db %00000011
	.db %00000111
	.db %00001111
	.db %00011111
	.db %00111111
	.db %01111111
	.db %11111111
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %00000000
	.db %11111111
	.db %11111110
	.db %11111100
	.db %11111000
	.db %11110000
	.db %11100000
	.db %11000000
	.db %10000000
	.db %00000000
	.db %10000000
	.db %11000000
	.db %11100000
	.db %11110000
	.db %01111000
	.db %00111100
	.db %00011110
	.db %00001111
	.db %10000111
	.db %11000011
	.db %11100001
	.db %11110000
	.db %01111000
	.db %00111100
	.db %00011110
	.db %00001111
	.db %10000111
	.db %11000011
	.db %11100001
	.db %11110000
	.db %01111000
	.db %00111100
	.db %00011110
	.db %00001111
	.db %00000111
	.db %00000011
	.db %00000001
	.db %00000000
LightSequenceEnd

.echoln strformat("Size: {0} bytes", $)

Emulators and neatened wiring

Tuesday, 12th August 2008

I've decided to switch to a regular 10MHz Z80 rather than a Z180, given the difficulty of using an SDIP 64. I now have a DIP 40 Z80 ready for use, but as I don't have the programmer for the Flash chip (which will hold the OS) there's not much I can do with it physically. I have therefore cobbled together a basic emulator to help develop some of the software beforehand.

2008.08.12.02.Emulator.png

To cut hardware costs I'm going to try and handle input in software. One bit of hardware I'm planning on having is an eight-bit open collector I/O port. Open collector pins float high in their reset state, and any device connected to the pin can drive it low. AT devices (keyboard and mouse) use this type of electrical connection, as does the I2C bus and the TI calculator link port. I can use up the eight pins easily - two pins per AT device (keyboard and mouse) makes four, two pins for the I2C bus and two pins for a TI calculator link port.

The I2C bus I mentioned above is a simple way to enhance the computer once built. There will be one device permanently attached to the bus, a DS1307 real-time clock, which will be used to provide time-keeping functions for the OS as well as generating periodic interrupts (the chip could be configured to trigger an interrupt 100 times a second, useful for timing game logic). I could then leave empty space on the circuit board to add other I2C devices over time, or have a socket on the case that could be used to plug in additional I2C modules.

Now that I have some more tools, namely a desoldering pump, I tidied up the horrible hack job I'd done on the graphical LCD (replacing the multiple wires with a single pin header).

2008.08.12.01.Sonic.jpg

Yes, still the PICAXE here, but I'm using its 256 byte EEPROM to store a 32×64 pixel image of Sonic that is repeated four times horizontally.

I'm still not sure what I'm doing with regards to memory or storage. I'm still working on the simple assumption that ROM is 32KB ($0000..$7FFF) and RAM is 32KB ($8000..$FFFF) but this wastes a lot of memory and isn't very flexible at all. I've planned a bank-switching MMU, but as this will require at least four registers to store what appears in each of the four 16KB windows it will end up being physically very large and painful to wire.

As for storage, I have no idea. I have some 32KB I2C EEPROMs, but 32KB isn't exactly very large. Alternatively, I have an old 512MB SD card, and could try talking to it over bit-banged SPI. (SD cards use 3.3V, though, which complicates matters - not to mention that bit-banged SPI is going to be a little sluggish). I also have a USB module which can talk to USB mass storage devices over a serial connection, so maybe I should add a UART to the project. Adding a fully-blown USB module (which also plays WMA, MP3 and MIDI files) to such an otherwise low-tech computer feels like heresy, though.

Interrupts: A Fresh Start

Thursday, 21st February 2008

I gave in and rewrote all of the Z80's interrupt emulation from scratch, finding some rather horrible bugs in the existing implementation as I went.

Some of the highlights included non-maskable interrupts ignoring the state of the IFF1 flag (this flag is automatically cleared when an interrupt is serviced, and is used to prevent the interrupt handler from being called again before it has finished) and the RETN instruction not copying the state of the IFF2 flag back to IFF1. When non-maskable interrupts are serviced, the state of IFF1 is copied to IFF2 before it gets cleared, the idea being that if you use RETN interrupts are automatically re-enabled on exit of the NMI ISR. (Contrast this with maskable interrupts, where both flags are cleared, and you need an explicit EI to re-enable them).

The HALT instruction (executes NOPs until an interrupt is requested or the CPU is reset) was also completely incorrectly (and bizarrely) implemented. The rewrite just sets a Halted property, which prevents the CPU from fetching or executing any instructions. The interrupt-triggering code simply resets this property.

This has fixed numerous bugs (I'm not sure when they were introduced, as it was all working a while back). It's gone from "not working at all" to "just about working", but some games or demos that rely on precise interrupt timing don't work properly.

HicolorDemoCorruption.png DesertSpeedtrapMisaligned.png
Game Gear Hicolor Demo and Desert Speedtrap

Both problems in the above screenshots relate to line-based interrupts from the VDP (Video Display Processor). Some other games simply hang at startup. sad.gif

Necromancy

Friday, 8th February 2008

After seeing Scet and Drilian's work on their respective emulator projects I decided I needed to do something with the stagnating Cogwheel source on my hard disk drive.

The only ROM I have tested where I can't find an explanation for a bug is the Game Gear Garfield: Caught in the Act game. Like many games, when left at the title screen it'll run a demo loop of the game in action. At one point Garfield would walk to the left of the screen, jump over a totem pole, shunt it to the right and use it as a way to jump out of a pit. However, in Cogwheel he would not jump far enough to the left, and not clearing the totem pole he'd just walk back to the right and not have anything to jump out of the pit on.

I remembered a post on MaxCoderz discussing a long-standing tradition of thinking that when a JP <condition>, <address> failed the instruction took a single clock cycle. You can see this misreported here, for example. This document, on the other hand, claims it always takes 10 clock cycles - and most importantly of all, the official user manual backs this up.

Garfield.png

So, Garfield can now get out of his pit. The user interface has changed (again) - I'm now using SlimDX to dump pixels to a Panel, which seems to be the least hassle distribution-wise and doesn't throw LoaderLockExceptions.

Brass 3 and software PAL

Wednesday, 14th November 2007

My work with the VDP in the Sega Master System made me more aware of how video signals are generated, so thought it would be an interesting exercise to try and generate them in software. This also gives me a chance to test Brass 3, by actively developing experimental programs.

Hardware.Thumb.jpg

I'm using a simple 2-bit DAC based on a voltage divider, using the values listed here. This way I can generate 0V (sync), ~0.3V (black), ~0.6V (grey) and 1V (white).

My first test was to output a horizontal sync pulse followed by black, grey, then white, counting clock cycles (based on a 6MHz CPU clock). That's 6 clock cycles per µs.

2-bit-DAC-Black-Grey-White.Thumb.jpg

The fastest way to output data to hardware ports on the Z80 is the outi instruction, which loads a value from the address pointed to by hl, increments hl, decrements b and outputs the value to port c. This takes a rather whopping 16 clock cycles (directly outputting to an immediate port address takes 11 clock cycles, but the overhead comes from loading an immediate value into a which takes a further 7). The time spent creating the picture in PAL is 52µs, which is 312 clock cycles. That's 19.5 outi instructions, and by the time you've factored in the loop overhead that gives you a safe 18 pixel horizontal resolution - which is pretty terrible.

Even with this technique, in the best case scenario you output once every 16 clock cycles which gives you a maximum time resolution of 2.67µs. This is indeed a problem as vertical sync is achieved by transmitting two different types of sync pulse, made of either a 2µs sync followed by 30µs black (short) or 30µs sync followed by 2µs black (long). In my case I plumped for the easiest to time 4µs/28µs and hoped it would work.

Anyhow, I made a small three-colour image for testing: Source.gif.

Of course, as I need to output each scanline anyway I end up with a resolution of 304 lines, which gives me rather irregular pixels, so I just stretch the above image up to 20×304. Eagle-eyed readers would have noticed that the horizontal resolution is only 18 pixels, but somewhere in the development process I forgot how to count and so made the image two pixels too wide.

TV.Thumb.jpg

As you can see, it shows (the entire image is shunted to the right). TVs crop the first and last few scanlines (they aren't wasted, though, and can be used for Teletext) so that's why that's missing. smile.gif A widescreen monitor doesn't help the already heavily distorted pixels either, but it does (somewhat surprisingly) work.

With a TI-83+ SE (or higher) you have access to a much faster CPU (15MHz) and more accurate timing (crystal timers rather than an RC circuit that changes speed based on battery levels) as well as better interrupt control, so on an SE calculator you could get at least double the horizontal resolution and output correct vertical sync patterns. You also have better control over the timer interrupts, so you could probably drive hsync via a fixed interrupt, leaving you space to insert a game (the only code I had space for checks to see if the On key is held so you can quit the program - more clock cycle counting). I only have the old 6MHz calculator, though, so I'm pleased enough that it works at all, even if the results are useless!

Brass 3.0.0.0 Beta 1

Monday, 5th November 2007

I've released a beta version of the new assembler. It comes with the compiler, a GUI builder (see the above screenshot) and the help viewer; it also comes bundled with a number of plugins.

I've also knocked together a quick demo project that can be built directly from Explorer once Brass is installed.

build_explorer.png

There are a number of missing features (such as a project editor, project templates and multiple build configurations) and no doubt broken, incomplete or untested components - but at least it's out in the wild now, which gives me an incentive to fix it!

Emulating TI-OS 1.15 and a greyscale LCD

Monday, 29th October 2007

OS 1.15 appears to boot, and if I run an OS in Pindur TI, archive the files (copy them to Flash ROM) then use that ROM dump in my emulator the files are still there, where they can be copied to RAM.

Trying to re-archive them results in a fairly un-helpful message, as I haven't implemented any Flash ROM emulation (nor can I find any information on it)...

memory.gif

Applications (which are only ever stored and executed on Flash ROM) work well, though.

cellsheet.gif    mirageos.gif    graph3.gif

I've also updated the LCD emulation a little to simulate the LCD delay; greyscale programs (that flicker pixels on and off) work pretty well now.

blur.gif    grey.gif

repton.gif    ft2.gif

Brass 3 and TI-83+ Emulation

Friday, 26th October 2007

Brass 3 development continues; the latest documentation (automatically generated from plugins marked with attributes via reflection) is here. The compiler is becoming increasibly powerful - and labels can now directly store string values, resulting in things like an eval() function for powerful macros (see also clearpage for an example where a single function is passed two strings of assembly source and it uses the smallest one when compiled).

Thanks to a series of hints posted by CoBB and Jim e I rewrote my TI-83+ emulator (using the SMS emulator's Z80 library) and it now boots and runs pretty well. The Flash ROM archive isn't implemented, so I'm stuck with OS 1.12 for the moment (later versions I've dumped lock up at "Defragmenting..."). I also haven't implemented software linking, and so to transfer files I need to plug in my real calculator to the parallel port and send files manually.

ion_received.png
ion_installed.png
ion_pacman.png
pacman-99.png

Brass 3

Tuesday, 2nd October 2007

Quake isn't dead, but I've shifted my concentration to trying to get Brass 3 (the assembler project) out.

brass_err.png

Brass 2 didn't really work, but I've taken a lot of its ideas - namely the plugin system - and kept some of the simplicity from Brass 1. The result works, and is easy to extend and maintain. Last night I got it to compile all of the programs I used for testing Brass 1 against TASM successfully.

I'm taking advantage of .NET's excellent reflection capabilities; one such example is marking plugin functions with attributes for documentation purposes, meaning that all you need to get Brass documentation is to drop your plugin collection assemblies (DLLs) into the Brass directory then open the help viewer app.

help_fsize.png

The source code examples are embedded as text, but compiled by the viewer (and thus syntax-highlighted) so you can click on directives or functions and it'll jump to their definitions automatically.

Native function support and a much-improved parser means that complex control structures can be built up, like:

file = fopen("somefile.txt")

#while !feof(file)
    .db fread(file)
#loop

fclose(file)

The compiler invokes the plugins, and the plugins talk back to the compiler ("remember your current position", "OK, we need to loop, so go back to this position", "this loop fails, so switch yourself off until you hit the #loop directive again").

The compiler natively works with project files (rather than some horrible command-line syntax) which specify which plugins to load, which include directories to search and so on and so forth. There are a number of different plugin classes:

  • IAssembler - CPU-specific assembler.
  • IDirective - assembler directive.
  • IFunction - functions like abs() or fopen().
  • IOutputWriter - writes the object file to disk (eg raw, intel hex, TI-83+ .8xp).
  • IOutputModifier - modifies each output byte (eg "unsquishing" bytes to two ASCII charcters for the TI-83).
  • IStringEncoder - handles the conversion of strings to byte[] arrays (ascii, utf8, arcane mappings for strange OS).

Unlike Brass 2, though, I actually have working output from this, so hopefully it'll get released!

As a bonus, to compare outputs between this and TASM (to check it was assembling properly) I hacked together a binary diff tool from the algorithm on Wikipedia (with the recursion removed) - it's not great, but it's been useful to me. smile.gif

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace Differ {
	class Program {
		static void Main(string[] args) {

			// Prompt syntax:
			if (args.Length != 2) {
				Console.WriteLine("Usage: Differ <file1> <file2>");
				return;
			}

			// Load both files into byte arrays (sloppy, but whatever).
			byte[][] Data = new byte[2][];

			for (int i = 0; i < Data.Length; ++i) {
				try {
					byte[] Source = File.ReadAllBytes(args[i]);
					Data[i] = new byte[Source.Length + 1];
					Array.Copy(Source, 0, Data[i], 1, Source.Length);
				} catch (Exception ex) {
					Console.WriteLine("File load error: " + args[i] + " (" + ex.Message + ")");
					return;
				}
			}

			// Quick-and-dirty equality test:
			if (Data[0].Length == Data[1].Length) {
				bool IsIdentical = true;
				for (int i = 0; i < Data[0].Length; ++i) {
					if (Data[0][i] != Data[1][i]) {
						IsIdentical = false;
						break;
					}
				}
				if (IsIdentical) {
					Console.WriteLine("Files are identical.");
					return;
				}
			}

			if (Data[0].Length != Data[1].Length) {
				Console.WriteLine("Files are different sizes.");
			}

			// Analysis:
			int[,] C = new int[Data[0].Length, Data[1].Length];
			for (int i = 1; i < Data[0].Length; ++i) {
				if ((i - 1) % 1000 == 0) Console.Write("\rAnalysing: {0:P}...", (float)i / (float)Data[0].Length);
				for (int j = 1; j < Data[1].Length; ++j) {
					if (Data[0][i] == Data[1][j]) {
						C[i, j] = C[i - 1, j - 1] + 1;
					} else {
						C[i, j] = Math.Max(C[i, j - 1], C[i - 1, j]);
					}
				}
			}
			Console.WriteLine("\rResults:".PadRight(Console.BufferWidth - 1));

			List<DiffData> CollectedDiffData = new List<DiffData>(Math.Max(Data[0].Length, Data[1].Length));
			
			for (int i = Data[0].Length - 1, j = Data[1].Length - 1; ; ) {
				if (i > 0 && j > 0 && Data[0][i] == Data[1][j]) {
					CollectedDiffData.Add(new DiffData(DiffData.DiffType.NoChange, Data[0][i], i, j));
					--i; --j;
				} else {
					if (j > 0 && (i == 0 || C[i, j - 1] >= C[i - 1, j])) {
						CollectedDiffData.Add(new DiffData(DiffData.DiffType.Addition, Data[1][j], i, j));
						--j;
					} else if (i > 0 && (j == 0 || C[i, j - 1] < C[i - 1, j])) {
						CollectedDiffData.Add(new DiffData(DiffData.DiffType.Removal, Data[0][i], i, j));
						--i;
					} else {
						CollectedDiffData.Reverse();
						break; // Done!
					}
				}
			}


			DiffData.DiffType LastType = (DiffData.DiffType)(-1);

			int PrintedData = 0;
			foreach (DiffData D in CollectedDiffData) {
				if (LastType != D.Type) {
					Console.WriteLine();
					Console.Write("{0:X4}:{1:X4}", D.AddressA - 1, D.AddressB - 1);
					LastType = D.Type;
					PrintedData = 0;
				} else if (PrintedData >= 16) {
					Console.WriteLine();
					Console.Write("         ");
					PrintedData = 0;
				}
				ConsoleColor OldColour = Console.ForegroundColor;

				switch (D.Type) {
					case DiffData.DiffType.NoChange:
						Console.ForegroundColor = ConsoleColor.White;
						break;
					case DiffData.DiffType.Addition:
						Console.ForegroundColor = ConsoleColor.Green;
						break;
					case DiffData.DiffType.Removal:
						Console.ForegroundColor = ConsoleColor.Red;
						break;
				}
				Console.Write(" " + D.Data.ToString("X2"));
				++PrintedData;
				Console.ForegroundColor = OldColour;
			}
			Console.WriteLine();

		}

		private struct DiffData {

			public enum DiffType {
				NoChange,
				Addition,
				Removal,
			}

			public DiffType Type;

			public byte Data;

			public int AddressA;
			public int AddressB;

			public DiffData(DiffType type, byte data, int addressA, int addressB) {
				this.Type = type;
				this.Data = data;
				this.AddressA = addressA;
				this.AddressB = addressB;
			}

			
		}
	}
}

Removals are shown in red, additions are shown in green, data that's the same is in white.

8-bit Raycasting Quake Skies and Animated Textures

Monday, 20th August 2007

All of this Quake and XNA 3D stuff has given me a few ideas for calculator (TI-83) 3D.

One of my problems with calculator 3D apps is that I have never managed to even get a raycaster working. Raycasters aren't exactly very tricky things to write.

So, to help me, I wrote a raycaster in C#, limiting myself to the constraints of the calculator engine - 96×64 display, 256 whole angles in a full revolution, 16×16 map, that sort of thing. This was easy as I had floating-point maths to fall back on.

2007.08.19.03.jpg

With that done, I went and ripped out all of the floating-point code and replaced it with fixed-point integer arithmetic; I'm using 16-bit values, 8 bits for the whole part and 8 bits for the fractional part.

From here, I just rewrote all of my C# code in Z80 assembly, chucking in debugging code all the way through so that I could watch the state of values and compare them with the results from my C# code.

2007.08.19.01.gif

The result is rather slow, but on the plus side the code is clean and simple. smile.gif The screen is cropped for three reasons: it's faster to only render 64 columns (naturally), you get some space to put a HUD and - most importantly - it limits the FOV to 90°, as the classic fisheye distortion becomes a more obvious problem above this.

2007.08.19.02.png

I sneaked a look at the source code of Gemini, an advanced raycaster featuring textured walls, objects and doors. It is much, much faster than my engine, even though it does a lot more!

It appears that the basic raycasting algorithm is pretty much identical to the one I use, but gets away with 8-bit fixed point values. 8-bit operations can be done significantly faster than 16-bit ones on the Z80, especially multiplications and divisions (which need to be implemented in software). You can also keep track of more variables in registers, and restricting the number of memory reads and writes can shave off some precious cycles.

Some ideas that I've had for the raycaster, that I'd like to try and implement:

  • Variable height floors and ceilings. Each block in the world is given a floor and ceiling height. When the ray intersects the boundary, the camera height is subtracted from these values, they are divided by the length of the ray (for projection) and the visible section of the wall is drawn. Two counters would keep track of the upper and lower values currently drawn to to keep track of the last block's extent (for occlusion) and floor/ceiling colours could be filled between blocks.
  • No texturing: wall faces and floors/ceilings would be assigned dithered shades of grey. I think this, combined with lighting effects (flickering, shading), would look better than monochrome texture mapping - and would be faster!
  • Ray-transforming blocks. For example, you could have two 16×16 maps with a tunnel: the tunnel would contain a special block that would, when hit, tell the raycaster to start scanning through a different level. This could be used to stitch together large worlds from small maps (16×16 is a good value as it lets you reduce level pointers to 8-bit values).
  • Adjusting floors and ceilings for lifts or crushing ceilings.


As far as the Quake project, I've made a little progress. I've added skybox support for Quake 2:

2007.08.15.01.jpg

Quake 2's skyboxes are simply made up of six textures (top, bottom, front, back, left, right). Quake doesn't use a skybox. Firstly, you have two parts of the texture - one half is the sky background, and the other half is a cloud overlay (both layers scroll at different speeds). Secondly, it is warped in a rather interesting fashion - rather like a squashed sphere, reflected in the horizon:

Sky.jpg

For the moment, I'm just using the Quake 2 box plus a simple pixel shader to mix the two halves of the sky texture.

2007.08.19.01.jpg

I daresay something could be worked out to simulate the warping.

GLSky.jpg

The above is from GLQuake, which doesn't really look very convincing at all.

2007.08.19.02.gif

I've reimplemented the texture animation system in the new BSP renderer, including support for Quake 2's animation system (which is much simpler than Quake 1's - rather than have magic texture names, all textures contain the name of the next frame in their animation cycle).

TI Emulation, Functions in Brass and Gemini on the Sega Game Gear

Tuesday, 13th February 2007

This post got me wondering about a TI emulator. I'd rather finish the SMS one first, but so as to provide some pictures for this journal I wrote a T6A04 emulator (to you and me, that's the LCD display driver chip in the TI-82/83 series calculators). In all, it's less than a hundred lines of code.

The problem with TI emulation is that one needs to emulate the TIOS to be able to do anything meaningful. Alas, I had zero documentation on the memory layout of the TI calculators, and couldn't really shoe-horn the ROM dump into a 64KB RAM, so left it out entirely. That limits my options as to what I can show, but here's my Microhertz demo -

tunnel.png cylinder.png flip.png lens.png blobs.png

I've added native support for functions in Brass.

The old Brass could do some function-type things using directives; for example, compare the two source files here:

.fopen fhnd, "test.txt"          ; Opens 'test.txt' and stores a handle in fhnd
.fsize fhnd, test_size           ; Stores the size of the file in test_size

.for i, 1, test_size
    .fread fhnd, chr             ; Read a byte and store it as "chr"
    .if chr >= 'a' && chr <= 'z' ; Is it a lowercase character?
        .db chr + 'A' - 'a'
    .else
        .db chr
    .endif
.loop

.fclose fhnd                      ; Close our file handle.

I personally find that rather messy. Here's the new version, using a variety of functions from the 'File Operations' plugin I've been writing:

fhnd = fopen("test.txt", r)

#while !feof(fhnd)
    chr = freadbyte(fhnd)
    .if chr >= 'a' && data <= 'z'
        .db chr + 'A' - 'a'
    .else
        .db chr
    .endif
#loop

fclose(fhnd)

I find that a lot more readable.

An extreme example is the generation of trig tables. Brass 1 uses a series of directives to try and make this easier.

.dbsin angles_in_circle, amplitude_of_wave, start_angle, end_angle, angle_step, DC_offset

Remembering that is not exactly what I'd call easy. If you saw the line of code:

.dbsin 256, 127, 0, 63, 1, 32

...what would you think it did? You'd have to consult the manual, something I'm strongly opposed to. However, this code, which compiles under Brass 2, should be much clearer:

#for theta = 0, theta < 360, ++theta
    .db min(127, round(128 * sin(deg2rad(theta))))
#loop

By registering new plugins at runtime, you can construct an elaborate pair of directives - in this case .function and .endfunction - to allow users to declare their own.

_PutS = $450A

.function bcall(label)
    rst $28
    .dw label 
.endfunction

bcall(_PutS)

You can return values the BASIC way;

.function slow_mul(op1, op2)
    slow_mul = 0
    .rept abs(op1)
        .if sign(op1) == 1
            slow_mul += op2 
        .else
            slow_mul -= op2
        .endif
    .loop
.endfunction

.echo slow_mul(log(100, 10), slow_mul(5, 4))

I had a thought (as you do) that it would be interesting to see how well a TI game would run on the Sega Master System. After all, they share the CPU, albeit at ~3.5MHz on the SMS.

However, there are some other differences...

  • Completely different video hardware.
  • Completely different input hardware.
  • 8KB RAM rather than 32KB RAM.
  • No TIOS.

The first problem was the easiest to conquer. The SMS has a background layer, broken up into 8×8 tiles. If I wrote a 12×8 pattern of tiles onto the SMS background layer, and modified the tile data in my own implementation of _grBufCpy routine, I could simulate the TI's bitmapped LCD display (programs using direct LCD control would not be possible).

You can only dump so much data to the VRAM during the active display - it is much safer to only write to the VRAM outside of the active display period. I can give myself a lot more of this by switching off the display above and below the small 96×64 window I'll be rendering to; it's enough to perform two blocks, the left half of the display in one frame, the right in the next.

As for the input, that's not so bad. Writing my own _getK which returned TI-like codes for the 6 SMS buttons (Up, Down, Left, Right, 1 and 2) was fine, but games that used direct input were a bit stuck. I resolved this by writing an Out1 and In1 function that has to be called and simulates the TI keypad hardware, mapping Up/Down/Left/Right/2nd/Alpha to Up/Down/Left/Right/1/2.

The RAM issue can't be resolved easily. Copying some chunks of code to RAM (for self-modifying reasons) was necessary in some cases. As for the lack of the TIOS, there's no option but to write my own implementation of missing functions or dummy functions that don't do anything.

Even with the above, it's still not perfect. If I leave the object code in Gemini, the graphics are corrupted after a couple of seconds of play. I think the stack is overwriting some of the code I've copied to RAM.

No enemies make it a pretty bad 'game', but I thought it was an entertaining experiment.

Sega Tween

Thursday, 1st February 2007

No updates for a while, I'm afraid - things have been pretty hectic.

sega_tween_3d_stereo_pair.png

I packaged up and released the Sega Tween demo I'd been working on. As you can see, I added an SMS and a 3D mode - this works with the SMS 3D glasses. The extra 3D is quite cheap to calculate - shift the rotated X coordinates one way for one eye, then the other way for the other eye. After projection to the screen they need to be shifted back a little way to re-centre, but it works quite well.

sega_tween_3d_anaglyph.png

However, I had neglected the fact that the SMS1 (which has the card slot, and hence the model that supports the 3D glasses) had a bug in the VDP and as such only supports four zoomed sprites per scanline. I added this glitch to the emulator;


In other news, I've done a small amount of work on Brass. It's quite embarrassing, really, how slow the old version is. Assembling this file:

.rept 9000
	ld a,1
.unsquish
	ld a,2
.squish
	ret
.loop

...produces this in old Brass:

Brass Z80 Assembler 1.0.4.9 - Ben Ryves 2005-2006
-------------------------------------------------
Assembling...
Pass 1 complete. (2093ms).
Pass 2 complete. (22062ms).
Writing output file...
Errors: 0, Warnings: 0.
Done!

Nearly half a minute! New Brass does a much better job of syntax parsing and caching...

Brass Assembler - Copyright © Bee Development 2005-2007
-------------------------------------------------------
ZiLOG Z80 - Copyright © Bee Development 2005-2006
TI Program Files - Copyright © Bee Development 2005-2006
Core Plugins - Copyright © Bee Development 2005-2006

Parsing source...
Building...
Writing output...
Time taken: 484.38ms.
Done!

Down to just under half a second. That's almost a 50× speed increase!

VDP Interrupts

Wednesday, 20th December 2006

road_rash_2.png

The VDP can generate two different types of CPU interrupt.

The first, and easiest, is the frame interrupt, which is requested when an entire frame has been generated. This is requested, therefore, at a regular 60Hz in NTSC regions and 50Hz in PAL regions - it's a useful timer to synchronise your game to.

The second, and more complex, is the line interrupt. This interrupt is requested when a user-definable number of scanlines have been displayed. An internal counter is decremented each active line (and one more just after), and when it overflows it resets to the value held in a VDP register and requests the interrupt (so 0 would request an interrupt every line, 1 every other line and so on). For every other line outside the active display area, the counter is reset to the contents of the VDP register.

Both interrupt types can be enabled or disabled by defined bits held in the VDP registers.

(The above should be loosely correct, the below is a little more uncertain).

Once an interrupt is requested, a flag for said interrupt is set. The flag is not reset until the VDP control port is read, so you must read the VDP control port if you expect any further interrupts.

To differentiate between line and frame interrupts you can check the value read from the control port. If the most significant bit is set, a frame interrupt (at least) was requested. Reading the vertical counter port (which returns the current scanline's vertical position) will also let you know where you are.

Something is a little wonky with my vertical counting code, as all lines end up being one too large. For the moment I'm subtracting one before returning the value (and waiting one extra scanline before triggering the frame interrupt) which is a horrible solution, but for the moment it has fixed a number of games that weren't working at all before.

earthworm_1.gif earthworm_2.gif

Earthworm Jim, which relies on line interrupts to switch on zoomed sprites to dipslay the status bar at the bottom, now plays. It's missing some graphics on the title screens, though.

For some reason, rebuilding my Z80 emulator (which is in a different project) fixed some other interrupt-related glitches, so I have a sneaking suspicion most of my earlier problems are related to using an out-of-date DLL.

road_rash_1.gif road_rash_3.png road_rash_4.png

Road Rash highlighted another bug. The VDP can draw doubled sprites - that is, when a particular bit is set it will draw sprites as 8x16 pixels, stacking two consecutive sprite tiles on top of each other. Road Rash uses this mode, but also uses odd sprite indices (odd as opposed to even, not strange). The VDP will only take even indices, so a line of code to clear the least significant bit if using doubled sprites fixed that.

Still no sound, though.

New Z80 emulator

Wednesday, 6th December 2006

sonic_gg_1.png

One chap I cannot thank enough is CoBB for all his hard work in the Z80 field.

I've rewritten the Z80 emulation from scratch; this time it uses an expanded switch block (the 'manual' way) to decode instructions. Rather than write every combination of instructions out by hand, the code making up the switch blocks up is generated by another program, reading instruction information from a table copied from an Excel spreadsheet.

At the cost of a significantly larger assembly (from 40KB to 140KB) I now get a 100% speed increase (from ~50MHz to 100MHz).

I still can't pass the port of ZEXALL I'm testing with (and the same instructions too - not bad for a 100% rewrite to end up with exactly the same bugs), but after comparing some of my offending tables against CoBB's ones I've isolated some of the hiccoughs. The only instruction group test I fail is, naturally, the one that takes the longest to execute - getting a hardware, or indeed emulator, comparison takes well over an hour.

Anyhow, rewriting the Z80 emulation seems to have been the right thing to do. As you might have guessed from the picture at the top of this entry, Sonic now runs.

sonic_gg_2.png sonic_gg_3.png

My VDP (Video Display Processor) emulation is still rather rough-and-ready (I've really been concentrating on the Z80 bit) and the second column of background tiles is not updated correctly, so I apologise in advance for the distortion! It only appears in SMS mode (the display is cropped in Game Gear mode).

sonic_sms_1.png sonic_sms_2.png

The fill colour (in the left column here) is incorrect in most games too.

sonic_2_1.png sonic_2_2.png sonic_2_3.png sonic_2_4.png sonic_2_5.png

I rather preferred Sonic 2, but maybe that's because you can pick up dropped rings and I'm rather lousy at it otherwise.

wb_1.png wb_2.png wb_3.png

I accused the previous Wonder Boy III shot of not reading the start button. Somehow I also failed to notice the missing sprites (clouds and main part of the castle), which was part of the main problem. It now appears to play well.

vfa_1.png vfa_2.png vfa_3.png vfa_4.png

Support for zoomed sprites seems to be missing in some emulators (at least, the versions of Dega and Pastorama I have to hand), but I use them so implemented them to let my programs work when testing - it's nice to see a commerical game use them too!

gunstar_1.png gunstar_2.png

Gunstar Heroes runs, but flickers and jumps (not the visible sprite 'flicker' in those screenshots) - some bug in my CPU interrupt handling or VDP interrupt generation.

psycho_fox.png the_flash.png

Psycho Fox runs insanely fast - even faster than The Flash as seen to the right - making an already difficult game to control virtually impossible. I'm really not sure what could be causing that, but the enemy sprites sometimes flicker up and back down quickly, so it could be one of the remaining CPU bugs.

fantasy_zone_2.png maze_walker.png

Fantasy Zone demonstrates both the blanking colour bug (that left column should be green) and the distorted second column bug, but plays well. I tried to get a better screenshot of Maze Walker, but not handling 3D glasses makes looking at the screen a rather unpleasant experience (it flickers the left and right eye views in quick succession; the 3D glasses had two LCD shutters that opened and closed in sequence with the images on-screen).

I might add a Dega-esque red/green anaglyph filter, but I find those rather unpleasant to look at so might provide a stereo pair view.

In any case, the most important SMS game now runs -

bios_1.png bios_2.png bios_3.png

I shall refrain from using the Terry Pratchett quote (just this once).

ys_junk.png

Some games look like the above, which makes me happy - it's the ones that do nothing at all that worry me. What with the CPU bugs, dodgy emulation of the mapper, missing (important!) hardware ports and hackish VDP, it's surprising anything runs. It's getting better.

EDIT: How are there so many spelling errors? Fantasy Start as opposed to Fantasy Zone, Video Display Hardware as the expansion of the VDP acronym... I should not write these so late at night.

Compatibility increases further...

Thursday, 23rd November 2006

wonder_boy_1.png

I've added better memory emulation (that is, handling ROM paging, RAM mirroring and enabling a BIOS or not). I wouldn't dare say "more accurate", as that might indicate that something about it is partially accurate. smile.gif

I've isolated one of the biggest problems - and that's programs getting caught in a loop waiting for an interrupt that is never triggered.

The source of these interrupts is the VDP. It can generate two kinds of interrupt - on a line basis (where you can configure it to fire an interrupt every X scanlines) or on a frame basis (where it fires an interrupt at the end of the active display).

Two bits amongst the VDP's own registers control whether the interrupt fires or not. On top of that, there are two internal flags that are set when either of the interrupts fire. They are reset when the VDP's control port is read.

Charles MacDonald's VDP documentation
Bit 5 of register $01 acts like a on/off switch for the VDP's IRQ line. As long as bit 7 of the status flags [this is the frame interrupt pending flag] is set, the VDP will assert the IRQ line if bit 5 of register $01 is set, and it will de-assert the IRQ line if the same bit is cleared.

Bit 4 of register $00 acts like a on/off switch for the VDP's IRQ line. As long as the line interrupt pending flag is set, the VDP will assert the IRQ line if bit 4 of register $00 is set, and it will de-assert the IRQ line if the same bit is cleared.

The way I've chosen to emulate this is as a boolean storing the IRQ status, and to call a function, UpdateIRQ(), every time something that could potentially change the status of it happens.

private bool IRQ = false;
private void UpdateIRQ() {
    bool newIRQ = (LineInterruptEnabled && LineInterruptPending) || (FrameInterruptEnabled && FrameInterruptPending);
    if (newIRQ && !IRQ) this.CPU.Interrupt(true, 0x00);
    IRQ = newIRQ;
}

This detects a rising edge. Falling edge hasn't worked so well. Well, truth be told, neither work very well. So, hitting the I key in the emulator performs a dummy read of the control port which usually 'unblocks' the program.

I'm hoping that my problem is related to this similar one, as the fix seems pretty straightforwards.

The SEGA logo at the top of this entry is not the only evidence of working commerical software...

wonder_boy_2.png wonder_boy_3.png

Not reading the Start button is a common affliction.

faceball_2000.png

Here's the Game Gear DOOM for Scet. smile.gif

marble_madness_1.png marble_madness_2.png marble_madness_3.png marble_madness_4.png marble_madness_5.png

Marble Madness appears to be fully playable, though needs prompting from my I key every so often.

columns_1.png columns_2.png

Columns can't find the start button either, but the demo mode works.

pinball_dreams.png

ys.png desert_strike.png

Getting as far as a title screen is still an achievement in my books.

My luck can't hold out forever, but the homebrew scene is still providing plenty of screenshots...

bock_2002_1.png bock_2002_2.png bock_2002_3.png

Maxim's Bock's Birthday 2002 has been immensely useful for sorting out ROM paging issues.

bock_2003_1.png bock_2003_2.png

Bock's Birthday 2003 demonstrates the interrupt bug, but appears to run healthily otherwise.

hicolor_1.png hicolor_2.png hicolor_3.png

Chris Covell's Hicolor Demo also demonstrates this bug, needing a prod before each new image is displayed.

digger_chan_1.png digger_chan_2.png

Aypok's Digger Chan appears to play fine.

zoop_1.png zoop_2.png nibbles.png

Martin Konrad's Zoop 'em Up and GG Nibbles games run, Zoop 'em Up seems to have collision detection issues (aluop-related bug).

The bug in the Z80 core still eludes me - how aluop with an immediate value passes and aluop with a register fails is rather worrying. At least part of the flags problem has been resolved - the cannon in Bock's KunKun & KokoKun still don't fire, but at the switches now work so you can complete levels.

Sega: Enter the Pies

Monday, 20th November 2006

The Z80 core is now a bit more accurate - ZEXALL still reports a lot of glitches, and this is even a specially modified version of ZEXALL that masks out the two undocumented flags.

The VDP (Video Display Processor - the graphics hardware) has been given an overhaul and is now slightly more accurate. A lot more software runs now. I have also hacked in my PSG emulator (that's the sound chip) from my VGM player. It's not timed correctly (as nothing is!) but it's good enough for testing.

picohertz_tween.png

picohertz_wormhole.png

Picohertz, the demo I have been working on (on and off) now runs correctly. The hole in the Y in the second screenshot is caused by the 8 sprite per scanline limit. The first screenshot shows off sprite zooming (whereby each sprite is zoomed to 200% the original size). The background plasma is implemented as a palette shifting trick.

fire_track.png

fire_track_paused.png

Fire Track runs and is fully playable. The second shot shows a raster effect (changing the horizontal scroll offset on each scanline.

Seeing as I understand the instructions that my programs use (and the results of them), and have my own understanding of parts of the hardware, it's not really surprising that the programs I've written work perfectly, but ones written by others don't, as they might (and often do) rely on tricks and results that I'm not aware of, or on hardware that I haven't implemented accurately enough. At least I do not need to emulate any sort of OS to run these programs!

SMS Power! has been an amazing resource in terms of hardware documentation and homebrew ROMs. I've been using the entries to the 2006 coding competition to test the emulator.

kunkun_kokokun_title.png kunkun_kokokun_game.png

Bock's game, KunKun & KokoKun, nearly works. The cannon don't fire, which would make the game rather easy if it wasn't for the fact that the switch to open the door doesn't work either. I suspect that a CPU flag isn't being set correctly as the result to an operation somewhere.

pong.png

Haroldoop's PongMaster is especially interesting as it was not written in Z80 assembly, but C. It's also one of the silkiest-smooth pong games I've come across.

an!mal_paws_text.png an!mal_paws.png

An!mal/furrtek's Paws runs, but something means that the effect doesn't work correctly (the 'wavy' bit should only have one full wave in it, not two - it appears my implementation of something doubles the frequency of the wave). The music sounds pretty good, though.

columns.png

Sega's Columns gets the furthest of any official software - the Sega logo fades in then out.

sega_enterpies.png

I do like the idea that Sega is an ENTERPIεS. (From the Game Gear BIOS). (I believe this is a CPU bug).

frogs.png

Charles Doty's Frogs is a bit of a conundrum. The right half of the second frog is missing due to the 8 sprites per scanline limitation of the VDP. However, Meka, Emukon, Dega and now my emulator draw the rightmost frog's tongue (and amount if it showing) differently, as well as whether the frog is sitting or leaping. There's a lot of source for such a static program (it doesn't do anything in any emulator I've tried it on, nor on hardware). Dega is by far the strangest, as the tongue moves in and out rapidly. I'm really not sure what's meant to be happening here.

Here are the results of ZEXALL so far.

Z80 instruction exerciser

ld hl,(nnnn).................OK
ld sp,(nnnn).................OK
ld (nnnn),hl.................OK
ld (nnnn),sp.................OK
ld <bc,de>,(nnnn)............OK
ld <ix,iy>,(nnnn)............OK
ld <ix,iy>,nnnn..............OK
ld (<ix,iy>+1),nn............OK
ld <ixh,ixl,iyh,iyl>,nn......OK
ld a,(nnnn) / ld (nnnn),a....OK
ldd<r> (1)...................OK
ldd<r> (2)...................OK
ldi<r> (1)...................OK
ldi<r> (2)...................OK
ld a,<(bc),(de)>.............OK
ld (nnnn),<ix,iy>............OK
ld <bc,de,hl,sp>,nnnn........OK
ld <b,c,d,e,h,l,(hl),a>,nn...OK
ld (nnnn),<bc,de>............OK
ld (<bc,de>),a...............OK
ld (<ix,iy>+1),a.............OK
ld a,(<ix,iy>+1).............OK
shf/rot (<ix,iy>+1)..........OK
ld <h,l>,(<ix,iy>+1).........OK
ld (<ix,iy>+1),<h,l>.........OK
ld <b,c,d,e>,(<ix,iy>+1).....OK
ld (<ix,iy>+1),<b,c,d,e>.....OK
<inc,dec> c..................OK
<inc,dec> de.................OK
<inc,dec> hl.................OK
<inc,dec> ix.................OK
<inc,dec> iy.................OK
<inc,dec> sp.................OK
<set,res> n,(<ix,iy>+1)......OK
bit n,(<ix,iy>+1)............OK
<inc,dec> a..................OK
<inc,dec> b..................OK
<inc,dec> bc.................OK
<inc,dec> d..................OK
<inc,dec> e..................OK
<inc,dec> h..................OK
<inc,dec> l..................OK
<inc,dec> (hl)...............OK
<inc,dec> ixh................OK
<inc,dec> ixl................OK
<inc,dec> iyh................OK
<inc,dec> iyl................OK
ld <bcdehla>,<bcdehla>.......OK
cpd<r>.......................OK
cpi<r>.......................OK
<inc,dec> (<ix,iy>+1)........OK
<rlca,rrca,rla,rra>..........OK
shf/rot <b,c,d,e,h,l,(hl),a>.OK
ld <bcdexya>,<bcdexya>.......OK
<rrd,rld>....................OK
<set,res> n,<bcdehl(hl)a>....OK
neg..........................OK
add hl,<bc,de,hl,sp>.........OK
add ix,<bc,de,ix,sp>.........OK
add iy,<bc,de,iy,sp>.........OK
aluop a,nn...................   CRC:04d9a31f expected:48799360
<adc,sbc> hl,<bc,de,hl,sp>...   CRC:2eaa987f expected:f39089a0
bit n,<b,c,d,e,h,l,(hl),a>...OK
<daa,cpl,scf,ccf>............   CRC:43c2ed53 expected:9b4ba675
aluop a,(<ix,iy>+1)..........   CRC:a7921163 expected:2bc2d52d
aluop a,<ixh,ixl,iyh,iyl>....   CRC:c803aff7 expected:a4026d5a
aluop a,<b,c,d,e,h,l,(hl),a>.   CRC:60323322 expected:5ddf949b
Tests complete



The aluop (add/adc/sub/sbc/and/xor/or/cp) bug seems to be related to the parity/overflow flag (all other documented flags seem to be generating the correct CRC). daa hasn't even been written yet, so that would be the start of the problems with the daa,cpl,scf,ccf group. adc and sbc bugs are probably related to similar bugs as the aluop instructions.

The biggest risk is that my implementation is so broken it can't detect the CRCs correctly. I'd hope not.

In terms of performance; when running ZEXALL, a flags-happy program, I get about ~60MHz speed in Release mode on a 2.4GHz Pentium 4. When ZEXALL is finished, and it's just looping around on itself, I get ~115MHz.

The emulator has not been programmed in an efficient manner, rather a simple and clear manner. All memory access is done by something that implements the IMemoryDevice controller (with two methods - byte ReadByte(ushort address) and void WriteByte(ushort address, byte data)) and all hardware access is done by something that implements the IHardwareController interface (also exposing two methods - byte ReadDevice(byte port) and void WriteDevice(byte port, byte data)).

Most of the Z80's registers can be accessed via an index which makes up part of an opcode. You'd have thought that the easiest way to represent this would be, of course, an array. However, it's not so simple - one of the registers, index 6, is (HL) - which means "whatever HL is pointing to". I've therefore implemented this with two methods - byte GetRegister(int index) and void SetRegister(int index, byte value).

Life isn't even that simple, though, as by inserting a prefix in front of the opcode you can change the behaviour of the CPU - instead of using HL, it'll use either IX or IY, two other registers. In the case of (HL) it becomes even hairier - it'll not simply substitute in (IX) or (IY), it'll substitute in (IX+d), where d is a signed displacement byte that is inserted after the original opcode.

To sort this out, I have three RegisterCollections - one that controls the "normal" registers (with HL), one for IX and one for IY. After each opcode and prefix is decoded, a variable is set to make sure that the ensuing code to handle each instruction works on the correct RegisterCollection.

The whole emulator is implemented in this simplified and abstracted manner - so I'm not too upset with such lousy performance.

I'm really not sure how to implement timing in the emulator. There's the easy timing, and the not-so-easy timing.

The easy timing relates the VDP speed. On an NTSC machine that generates 262 scanlines (60Hz), on a PAL machine that generates 313 scanlines (50Hz). That's 15720 or 15650 scanlines per second respectively.

According to the official Game Gear manual, the CPU clock runs at 3.579545MHz. I don't know if this differs with the SMS, or whether it's different on NTSC or PAL devices (the Game Gear is fixed to NTSC, as it never needs to output to a TV, having an internal LCD).

I interpret this as meaning that the CPU needs to be run for 227.7 or 228.7 cycles per scanline. That way, my main loop looks a bit like this:

if (Hardware.VDP.VideoStandard == VideoStandardType.NTSC) {
    for (int i = 0; i < 262; ++i) {
        CPU.FetchExecute(228); 
        Hardware.VDP.RasteriseLine();
    }
} else {
    for (int i = 0; i < 313; ++i) {
        CPU.FetchExecute(229);
        Hardware.VDP.RasteriseLine();
    }
}

The VDP raises an event when it enters the vertical blank area, so the interface can capture this and so present an updated frame.

The timing is therefore tied to the refresh rate of the display.

super_gg.png

Here's the fictional Super Game Gear, breezing along at 51MHz. The game runs just as smoothly as it would at 3MHz, though - as the game's timing is tied to waiting for the vertical blank.

Actually, I tell a lie - as Fire Track polls the vertical counter, rather than waiting for an interrupt, it is possible for it to poll this counter so fast (at an increased clock rate) that it hasn't changed between checks. That way "simple" effects run extra fast, but the game (that has a lot of logic code) runs at the same rate.

This works. The problem is caused by sound.

With the video output, I have total control of the rasterisation. However, with sound, I have to contend with the PC's real hardware too! I'm using the most excellent FMOD Ex library, and a simple callback arrangement, whereby when it needs more data to output it requests some in a largish chunk.
If I emulate the sound hardware "normally", that is updating registers when the CPU asks them to be updated, by the time the callback is called they'll have changed a number of times and the granularity of sound updates will be abysmal.

A solution might be to have a render loop like this:

for (int i = 0; i < 313; ++i) {
    CPU.FetchExecute(229);
    Hardware.VDP.RasteriseLine();
    Hardware.PSG.RenderSomeSamples(1000);
}

However, this causes its own problems. I'd have to ensure that I was generating exactly the correct number of samples - if I generated too few I'd end up with crackles and pops in the audio as I ran out of data when the callback requested some, or I'd end up truncating data (which would also crackle) if I generated too much.

My solution thus far has been a half-way-house - I buffer all PSG register updates to a Queue, logging the data written and how many CPU cycles had been executed overall when the write was attempted. This way, when the callback is run, I can run through the queued data, using the delay between writes to ensure I get a clean output.

As before, this has a problem if the timing isn't correct - rather than generate pops or crackles, it means that the music would play at an inconsistent rate.

Of course, the "best" solution would be to use some sort of latency-free audio solution - MIDI, for example, or ASIO. If I timed it, as with everything else, to scanlines I'd end up with a 64µs granularity - which is larger than a conventional 44.1kHz sample (23µs), so PWM sound might not work very well.

chip_8.png

Incidentally, this is not the first emulator I have written - I have written the obligatory Chip-8 emulator, for TI-83 calculator and PC. Being into hardware, but not having the facilities to hand to dabble in hardware as much as I'd like to, an emulator provides a fun middle-ground between hardware and software.

wormhole.gif

Multithreading on the TI-83 Plus

Thursday, 3rd August 2006

I'm not really clued up on the way these new-fangled modern operating systems handle running multiple threads, so I apologise in advance if this isn't really proper multithreading.

Basically, I was pondering (as one does) whether it would be possible to run any number of threads on the TI-83 Plus. I daresay others have done this in the past, but not having access to the internet at the time to check I thought I'd do my own bit of experimentation.

The way I can see this working is to give each thread a small amount of time to run in (time slice). At the end of one thread running, I would save the calculator's state away, and load the state for the next thread. I would keep on swapping threads several times a second, giving each their own bit of time and preserving their state.

What I really needed to know was what needed to be preserved between executions. Really, there isn't a lot - just the registers. PC and SP clearly need to be preserved, so that when a thread is resumed it resumes running at the point it was left at. Keeping the other general-purpose registers (AF, BC, DE, HL, IX, IY) safe would be important as well.

Naturally, I'll need to also preserve the stack. Or rather, for each thread, I'll need to allocate some memory for a stack and point the new thread's SP register at the end of that new memory. To keep things simple, I'll just point my test thread's SP at $FFFF-200, as the TI is arranged to have 400 bytes of stack RAM, growing backwards from $FFFF.

To perform the swapping, I'll use a custom interrupt (run in IM 2). The timer hardware will trigger this interrupt many times a second, so it is ideal. At the end of my custom interrupt, I'll jump into the default handler provided by the TIOS so the TIOS functions that rely on it behave correctly.

The way I see it working is like this:

-> Enter interrupt handler.

Pop value off stack. This will be the program counter of the last running thread. (->A)
Save current stack pointer. (->B)

Set stack pointer to area in RAM where I can preserve registers for this thread. (<-C)

Push IY, IX, HL, DE, BC, AF to the stack.
Push old stack pointer (<-B) to stack.
Push old program counter (<-A) to stack.

Find the address of the RAM where we have stored the registers for the next thread.
Load it into the stack pointer.

Pop value off stack, store it (will be PC) (->D)
Pop value off stack, store it (will be SP) (->E)

Pop AF, BC, DE, HL, IX, IY off stack.

Save current stack pointer (end of RAM area used to store registers for new thread) as address of last thread for next interrupt. (->C).

Load value into SP (<-E)
Push new thead's PC to stack (<-D)

Jump into TIOS interrupt handler ->

Effectively, all this does is take the pointer stored when the current thread was resumed, dump all the registers into the memory it points to, hunts the next thread, reloads the registers and saves the pointer for next interrupt.

Does it work?

_getkey.gif

The above screenshot doesn't look too impressive, I'll give you that. There are only two threads running. The secondary thread is drawing all those random dots on the LCD. The other thread (which is the main program thread) just contains "bcall(_getkey)" - hence the 2nd/Alpha icons appearing,

I'll make the primary thread do something a bit more exotic - display random characters on the left side of the screen, the secondary work on the right side of the screen:

randomchars.gif

The code for this is simply:

Main
	.include "Multithread/Multithread.asm"


	; Kick things into action.
	call Multithread.Init
	
	; For the moment, the secondary thread is hard-coded, and 
	; the 'multithreading' code ONLY switches back and forth between
	; two hard-coded thread slots.
-

	halt ; Slow things down a bit here.

	ld b,8
	call ionRandom
	ld (curCol),a

	ld b,8
	call ionRandom
	ld (curRow),a

	ld b,0
	call ionRandom
	bcall(_PutMap)
	bcall(_getCSC)
	cp skClear
	jr nz,{-}
	
	call Multithread.End
	ret
	
; ---------------------------------------------------------
	
SecondaryThread
	
	di ; Don't want any other thread writing to the LCD!
			
	ld b,64
	call ionRandom
	add a,$80
	push af ; Save for later...
		out ($10),a
		
		ld b,48
		call ionRandom
		add a,48
		ld b,a
		srl a
		srl a
		srl a
		add a,$20
		out ($10),a
	
		ld a,b
		and 7
		ld c,%10000000
		jr z,{+}
		ld b,a
	-	srl c
		djnz {-}
	+
	
		in a,($11)
		call LcdBusy
		in a,($11)
		xor c
		ld c,a

		call LcdBusy

	pop af
	out ($10),a

	call LcdBusy
	ld a,c
	out ($11),a			
		
	ei
	
	jr SecondaryThread
	
LcdBusy
	push af
	inc hl
	dec hl
	pop af
	ret

As you can see, there are two discrete threads running there.

Well, that's two threads up and running. What's needed to extend this to any number of threads?

  • Thread management. Basically, the ability to add/remove threads at will, and have the handler be able to switch between as many threads as required.
  • Allocation of new stack space. New threads will need more stack space, so I shall need to add some mechanism to steadily allocate it.
  • Idle threads. One big problem is that as you add threads, each one gets progressively slower. If threads flag themselves as idle, they can be skipped so threads that do need CPU time get it. The easiest way to do this is to set a "sleep" counter which is checked when threads are switched around - if it's zero (when it runs out), the thread is given some time to run.

Off the top of my head, that's about it. There are some issues I have come across when working with this:

  • TIOS routines are not thread-safe. This speaks for itself, pretty much - if you have two threads, both of which are calling bcall(_putC) to display a character, they interfere with eachother (for example - incorrectly setting the value of curCol or curRow as one thread changes it as the other one is reading it) which causes crashes.
  • Variables can become an issue. Same reason as above - if two threads call a function which uses a set RAM location for a local variable, the variable can be changed half way through.

One possible way to get around thread-unsafety of functions is to disable interrupts before calling them and reenabling them afterwards, which prevents the thread switcher from swapping them. It's a pretty lousy workaround, though. sad.gif

Structs and sensible variable layout

Tuesday, 9th May 2006

When developing an assembly program, you need to 'declare variables' by attaching an address (in RAM) to a label.

The problem here, of course, is having to calculate all the relevant positions in RAM for each variable. For example,

ram = $C000 ; Assume RAM starts at address $C000

var1 = ram+0
var2 = ram+1
var3 = ram+3 ; var2 is 2 bytes!
; ... and so on ...

Now this is fairly painful. So, I added .varloc and .var directives to Brass:

ram = $C000 ; Assume RAM starts at address $C000

.varloc ram, 1024 ; 1024B in size

.var 1, var1
.var 2, var2
.var 1, var3
; ... and so on ...

This eases things a bit, but what if you have lots of variables and a number of different RAM areas? Now you still have to shuffle things around to fit. So, now, Brass allows you to define multiple areas of memory for variables (through multiple .varloc statements) and shuffles around all the variables to best fit in the available memory. Variables defined using .tempvar can even overwrite eachother (provided they are in different, not-nested modules) to save space.

Of course, sometimes you need variables to share consecutive areas of RAM, so I also added structure support.

; Define it

.struct Point2D
    .var db, X
    .var db, Y
.endstruct

; Use it

.var Point2D, Me

    ld a,10
    ld (Me.X),a
    
    ld a,32
    ld (Me.Y),a
    
    ld hl,10+32*256
    ld (Me),hl

; Or even:

.struct Point3D
    .var Point2D, P
    .var db,      Z
.endstruct

.var Point3D, You

    ld a,(You.P.X)

And, to comply with the picture requirement, have something ancient and completely unrelated.

hippo.gif

Getting it to work

Wednesday, 16th November 2005

Reliability

I'm sure you wouldn't want to use a keyboard that was inaccurate. Unfortunately, (as you might have seen me moaning in the past) the keyboard generates the clock signal. This basically means you need to be constantly checking the clock line to see if it goes low - that or use a hardware interrupt the jumps in and receives the scancode packet when the keyboard decides you want to stop sending something.

Well, I don't have the luxury of an extra keyboard chip or a hardware interrupt, so I needed to don my thinking cap. I had thought that one possible way around this would be to send a "disable" command to the keyboard after receiving bytes, running my handler code, then sending an "enable" command - these commands clear the keyboard's buffer, unfortunately, which is not good.

I downloaded another document on the AT protocol to see if they mentioned anything useful, and lo and behold:

The host may inhibit communication at any time by pulling the Clock line low for at least 100 microseconds.

I think I can spare 100µs - I added the line ld a,1 \ out (bport),a to the end of my buffer-filling code from earlier (it fills a scancode key buffer) - and the code doesn't drop a single scancode. Result!

Providing useful functionality

All these keyboard routines do at the moment is to display the data coming in on the screen - not exactly a great use of them. What is really needed is a simple two-way handler - it calls two different user-defined routines based on what a key is doing, whether it is being pushed down or released.

For this, I should translate the scancodes into a new format - there are less than 256 keys, so there's no reason why I can't fit every single key into a byte.

As well as user-defined keyboard events, I'll have to add my own to handle toggling the status of the keyboard - the num/caps/scroll lock as well as the shift/alt/ctrl.

It's really quite simple to do.

	; Now we need to run through all the keyboard events!

	ld ix,_buffer	; Start at the beginning of the buffer.

_handle_next_scancode:

	; Let's assume it's a normal key:

	ld hl,_scancode_lut
	ld bc,_scancode_lut_end-_scancode_lut

_key_down_handler:
	ld de,_null_handler


	ld a,(ix+0)
	or a
	jr z,_handled_all_keys

	cp at_scs_enhance	; Is it an enhanced key?
	jr nz,_not_enhanced

		inc ix	; We know it's enhanced, so move along.
		ld a,(ix+0)
		or a
		jr z,_handled_all_keys
		ld hl,_scancode_e_lut
		ld bc,_scancode_e_lut_end-_scancode_e_lut
_not_enhanced:

	; Now HL points to our LUT, BC is the correct length
	; and IX points to the next byte.

	cp at_scs_keyup ; Is it a key UP event?
	jr nz,_not_keyup

		inc ix
		ld a,(ix+0)
		or a
		jr z,_handled_all_keys
_key_up_handler:
		ld de,_null_handler

_not_keyup:

	; Move to next chunk before we do anything
	inc ix
	; At this point, A=scancode, HL->translation table, DE->handler, BC=table size, IX->next scancode.

	; We now need to run the translation.

	cpir	; Simple as that!

	jr nz,_handle_next_scancode	; Not found

	; So HL->scancode+1
	; Here is where the magic happens:

	push de
	ld de,0-_scancode_lut
	add hl,de
	ld a,l
	pop de

	; Now, A = 'real' scancode.

	; We need to 'call' DE.
	; We can spoof this easily:

	ld hl,_handle_next_scancode
	push hl	; _h_n_s is on TOP of the stack.
	push de ; our event handler is on top of the stack;
	ret	; POP back off stack and jump to it.
		; When the handler RETs it'll pop off _h_n_s and carry on scanning!


_handled_all_keys:

_null_handler:
	ret
Basically, I just run through all the received bytes. I check for $E0 (if so, switch to the alternate scancode conversion table for the extended codes) then $F0 (if so, load the alternate handler for a key up event rather than a key down event) then look up the scancode on the selected code table.

I created a couple of very basic event handlers - the key down event displays ↓ followed by the adjusted key code, the key up event displays ↑ followed by the adjusted key code.

keyhandlers.jpg

What would be ideal would be to provide internal event handlers that could be called on keyup/keydown which would then jump over to the user's custom handler. These event handlers could look for special keys and adjust the keyboard LEDs and set internal flags that could be used to detect the status of certain keys. I'd need:

  • Num Lock
  • Caps Lock
  • Scroll Lock
  • Shift
  • Ctrl
  • Alt

Unfortunately, I think that there is a problem with my byte-sending code. Setting the keyboard LEDs starts to do strange things - I now have two options;

  1. After switching status, ignore the LEDs. The status flag is set correctly internally, but you can't see it on the keyboard (which is a bit pants).
  2. Update the status flags every single time we run through the loop to check for any new bytes (which makes the keyboard lag like crazy - there's up to half a second of buffering going on!)

Mixing-and-matching the two - rewriting the branch before we bring the clock high again (for maximum speed) confuses the keyboard - the status LEDs never change and it decides to disable itself in a strop until I send $FF (the reset command) again. I think it's time to revisit the at_send_byte routine again to see what it's doing wrong!

Well, comparing it to my new notes - it's actually completely wrong at the end, when it comes to sending the parity/stop/ACK bits! A quick rewrite to how I think it should go isn't too hopeful - the keyboard LEDs flash like mad. Tweaking the timing by throwing in a few calls to _wait_bit_low and _wait_bit_high to synchronise my data to the clock stop this completely - and now the code is as it was before, at 100% accuracy - but about twice as fast.

Replacing my branch code still doesn't work all the time - sometimes the LEDs change, sometimes they do not. Not believing it would work, I threw in a check for the ACK bytes returned - if they were $FE, the 'repeat last command' byte, I'd send again.

My routines were clearly not as broken as I thought - the keyboard LEDs now change status perfectly, and as much as I hammer the Num Lock, Caps Lock and Scroll Lock keys, I cannot lock up the program or get the keyboard LEDs to display the wrong value. Not to mention that keying in other keys is back to the lightning fast response they used to be...

TI Keyboard

Tuesday, 15th November 2005

Once again, I waste my time doing something remarkably useless - it's trying to connect a keyboard to my TI graphing calculator!

in_place.jpg

Idea

Alas, this is not an original idea - I had seen some work done on this in the past, and tried fiddling with the code myself, not really understanding much of it. I decided to have a go - from scratch, and on my own.

Constructing the hardware

keyboard_before.jpg

I decided to butcher an old, suitably discoloured keyboard (in a nice way, though, so if all went wrong I could reassemble it). I don't have a soldering iron, nor any sort of decent cabling or plugs so I planned to just cut and strip the wires inside the keyboard and attach (by twisting together and large amounts of Sellotape) a power cable and 2.5mm stereo minijack plug (the TI has a 2.5mm stereo minijack socket on it as a data port).

enough_screws.jpg

I'm not sure they put enough screws into this thing... (by contrast, my main keyboard has a whopping 2 screws in it - this keyboard's designer was clearly paid by the screw!)

controller.jpg

Finally (and with a slightly sore wrist), I get to the bit I need - the keyboard controller board. The circuit board has the four leads soldered straight into it - I'd been hoping for the more typical sight of a block that the cable plugs into, but unfortunately it looks like I'll be cutting wires and hoping not to snap them off by accident. The four wires have been labelled VCDO - I'll cross my fingers and assume these stand for VCC (+5V), Clock, Data and 0V.

controller_stripped.jpg

leads_attached.jpg

Out it comes, the wires are stripped and reattached. I then tested all the leads using my multimeter and created some stunning ASCII art to remind me what went where:

HARDWARE SCHEMATIC:

PS/2 SOCKET:                               TI CONNECTOR:

  6.-++-.5      4: VCC   >-[+5V]                _
  /o || o\      5: Clock >---< 1: Red    Tip   /_\.
 |   ++   |     1: Data  >---< 2: White  Ring  }_{
4|o      o|3    3: Gnd   >-+-< 3: Copper Base  | |
  \      /               __|__                 |_|
   \o__o/                 ---  0V             /   \.
   2    1                  '                  \___/===> To TI
 ^ Note that this is the SOCKET view
and not the plug view for a PS/2 port.

taped.jpg

With that done, I taped down the connections and screwed the whole thing back together. Tapping a 9V PP3 battery to the power leads makes the keyboard boot up; doesn't look like it's broken quite yet!

Writing the code

What with the hardware now hopefully complete, it's a simple case of writing the code to support the keyboard. *cough cough*

Basically, I need to write an implementation of the AT protocol. The protocol is fairly simple, but unfortunately the keyboard generates the clock signal for us (the TI doesn't have accurate timing at all - it's an RC circuit, and so the clock rate drops as the batteries go flat). Let's hope the TI can keep up!

I guess the easiest code to write would be code that sets the output high (as it's an open collector circuit, high is the default level - either side can pull this low and hold it there) then poll the clock and wait for it to drop then go back up again. The clock line maps to bit 0 of the link port, and the data line to bit 1 of the link port. The TI's data IO port can be controlled using these two lower two bits of the hardware port equated as 'bport' (port 0) in ti83plus.inc

Here's what I tried;

init_all:
    ; Set link port high
    ld a,%00000011
    out (bport),a


wait_bit_low:
    in a,(bport)        ; Read in the status of the bport.
    and %00000001       ; Mask out bit 0 (clock)
    jr nz,wait_bit_low  ; Is it non-zero (high?) If so, loop back.

wait_bit_high:
    in a,(bport)        ; Read in the status of the bport.
    and %00000001       ; Mask out bit 0 (clock)
    jr z,wait_bit_high  ; Is it zero (loop?) If so, loop back.

    ret                 ; Break out of the program

Strange, whatever I do - init the keyboard, tap keys, nothing happens. The program goes into an endless loop. Using the 9V to manually set the line low then high again, I realise something - I keep forgetting that the port works backwards, and that writing %00000011 sets the status to %00000000 - and when a line is held low by either device, it can't be brought up again very easily. Replacing the offending line with xor a to clear it to zero worked a treat - pressing any key on the keyboard exits the program.
Each "packet" on the AT protocol (it's a bit grand to call it that) is made up of 11 bits - 1 start bit, 8 data bits and 2 parity/stop bits. A djnz loop to get the 8 bits and a handful of rotate instructions to populate the result byte gives me this:

.module AT_Protocol

at_timeout = 255

at_get_byte:
    ; Clear Link port
    xor a
    out (bport),a

    ; Get the start bit:

    call _wait_bit_low
    call _wait_bit_high


    ; Now we need to get the 8 bits for the byte

    ; Reset the output byte
    ld c,0

    ld b,8

_get_byte_loop:

    call _wait_bit_low

    ; Now we get the bit itself
    in a,(bport)
    rrca
    rrca
    rr c

    call _wait_bit_high

    djnz _get_byte_loop

    ; Get the parity/stop bits

    call _wait_bit_low
    call _wait_bit_high
    call _wait_bit_low
    call _wait_bit_high

    ; Clear flags, load code into accumulator and exit
    xor a
    ld a,c
    ret

_get_byte_fail:
    ; Set nz to indicate failure, return.
    or 1
    ret


_wait_bit_low:
    push bc
    ld b,at_timeout
_wait_bit_low_loop:
    in a,(bport)
    and 1
    jr z,_waited_bit_low
    djnz _wait_bit_low_loop
    pop bc
    pop bc
    jr _get_byte_fail
_waited_bit_low:
    pop bc
    ret

_wait_bit_high:
    push bc
    ld b,at_timeout
_wait_bit_high_loop:
    in a,(bport)
    and 1
    jr nz,_waited_bit_high
    djnz _wait_bit_high_loop
    pop bc
    pop bc
    jr _get_byte_fail
_waited_bit_high:
    pop bc
    ret

Not too tricky at all! Amazingly, this code ran first time too. (Amazingly for me, that is). The test program just reveices a byte from the keyboard and displays it on the screen.

it_lives.jpg

One minor problem is that sometimes the code received differs by a bit to what it should (you can see this by holding down a key and noting how sometimes the code is different - I've written a short program that just displays the code on-screen when it's received). Consulting my AT protocol notes, I find that "After the clock line goes low a 5-microsecond pause is used so that the data line has time to latch." Maybe my pause isn't long enough?

Sadly, that did quite the opposite - the results are even more unpredictable. I guess it would be more better to try speeding up my code, rather than slowing it down..?

After increasing the speed a little (without unrolling all the loops, that is) the routines are (by and large) slightly more accurate. Still not perfect, but they'll do for the time being. If I press a key the release it just before it repeats, the accuracy is 100% perfect - I suspect that the problem is that in the time the rest of my program has drawn the last keycode, the keyboard has pottered away and tried to output another byte and I'm jumping in half way through. I guess I'll have to write some clever buffering code to handle that!

Interrupts

I thought that an ideal way to handle the timing/speed problem was to create a piece of code that could be loaded as a sort of driver. The calculator would be set up into interrupt mode 2 and would call the driver 100-or-so times a second. The code could then try to see if a byte was coming in, and if so it would add it to a buffer. A routine isolated from the rest of the code could then read a byte from the buffer and shift all the other items down to replace it - a sort of FIFO stack.

The Z80 has three interrupt modes; 0, 1 and 2. Interrupt mode 0 is pretty useless to us; interrupt mode 1 is the normal mode of operation. In this mode, the CPU pushes the program counter to the stack then jumps to memory location $38 every time an interrupt occurs. The interrupt handler then swaps the main CPU registers away with their shadow register pairs, does something, then swaps the CPU registers back again. Finally, you pop the old program counter off the stack and jump back to where you were. You could think of it as having a second thread running, only a lot less hi-tech and more restrictive.

Interrupt mode 2 is a stranger beast. The main difference is that it doesn't just jump to $38 - it creates a 16-bit address using the register I as the most significant byte and a byte off the data bus as the least significant byte - effectively, we have a 16-bit number made up of i*256+?. The CPU then loads the value in the memory location pointed to by this 16-bit value, then calls this address.

What does this mean for us? Well, it means that rather relying on the interrupt at $38 we can load our own interrupt into memory!

We need to do three things:

  1. Create a 257-byte lookup table aligned to a 256-byte boundary for the CPU to read from after it has build up the 16-bit address.
  2. Set the I register to the most significant byte of the start address of our lookup table.
  3. Copy our interrupt handler to the location our table points to, switch the interrupt mode to two and enable interrupts.

The way I've done this is:

  • Filled memory locations $8700 to $8800 (inclusive) with the byte $86.
  • Loaded $87 into the I register.
  • Copied my interrupt handler to $8686

My interrupt handler at $8686 is a simple jp instruction to jump back to my program for the sake of practicality.

Unfortunately, this approach doesn't work (and I tried a lot of different ways to get it to!) One reason for failure is that in im 2, the main interrupt handler at $38 isn't getting executed. The TIOS relies pretty heavily on this interrupt to work; most functions cause a pretty nasty crash or do other strange things. Fine, I say to myself, and replace my reti call at the end of my interrupt handler with a jp $38 to manually call the TIOS interrupt. The behaviour gets even stranger - calling ionFastCopy (a function, non-TIOS related, to copy the display buffer to the LCD) causes strange rippling effects to appear on the LCD, followed by a full-out crash when I finally quit the program.

On top of all this, the few times I can get a display of the key buffer I can see that it's not updating very frequently... The interrupt is not checking the port frequently. All this for nothing!

As far as I can see it, the only way for an interrupt-based technique to work would be for the TI to have a hardware interrupt - using the keyboard's clock connected to the CPU, so that whenever the clock goes low the TI could spring into action and receive the byte. Seeing as the only access to the CPU I have without invalidating my warranty is via the data port, I'm a bit stuck.

Back to square one

I guess the only way is to agressively poll the port... First up, I rewrote the code so that instead of displaying a decimal version of the code, I'd display a hexadecimal version - significantly easier to read, faster to convert. I then painstakingly noted down every key's scancode from this into a useful include file.

One problem with the original TI keyboard project was that it had problems with input; it would occasionally forget about shift being pressed, or accidentally repeat keys. I think I now know why...

On an AT protocol keyboard, scancodes are sent every time a key is pressed and again when the key is released. To differentiate between the two different actions, when a key is released the scancode is preceded by the special code $F0.

I reckon that the problem was that the function to get a byte would have been called, followed almost immediately with the code to translate/display it. What I intend to do instead is to get bytes in a loop and add them to a buffer until either the routine times out (no more bytes being sent) or the buffer is full (shouldn't happen!)

keycodes.jpg

As you can see, this system works. The above codes are special ones generated by pressing some of the extended keys (the cursor keys) - they send the code $E0 followed by the key itself. Some of the codes are downright silly - PrintScreen sends the command string E0F07CE0F012 when released - Pause sends E11477E1F014F077!

Infuriatingly... this is STILL not perfect! I'm still losing some bytes. What to do - if only there was a way to control the keyboard, to stop it from scanning... wait a minute...

Controlling the Keyboard

Sending a byte is not too different from receiving a byte - you hold the clock and data lines low, then just the data low, then write out the bits as the keyboard requests them on clock. The easiest code to transmit is $FF - keyboard reset. According to my notes, the keyboard should respond with the power-on self-test byte as well as resetting. Lo and behold, the keyboard lights flash and I get $AA back - the self test has passed. I also get $E8 back, and my notes don't mention $E8 anywhere, but I'll ignore that for a minute and bask in the glory of it otherwise working perfectly.

The next code I try is $EE. This is the echo code - in theory, the keyboard should send back $EE. Sadly, it doesn't. Damn. It just resets and sends back $AA, though it sometimes sends back $B8 as well.

On close inspection, it seems pretty obvious what I've done wrong - I'm completely neglecting to send the parity and stop bits. Oops. After adding them, the keyboard responds $EE to my $EE - which is quite correct! The parity is hard coded, so I'm glad it works.

Trying another code, $F2, gets the keyboard to spit back an unfriendly $FE - which translates as "resend, you idiot". $EE is %11101110, which contains 6 set bits. $F2 is %11110010 which contains 5 set bits - the parity needs to be reversed. Hopefully calculating the parity shouldn't consume too many clock cycles - after I extract the bit to send, I need to add it to another counter. I can then use the lowest bit of this counter as my parity bit.

Thankfully, I have sufficient time to calculate the parity - the keyboard now responds FAAB83. $FA is the ACK code, AB83 are the ID bytes. Scarily, my notes say that the ID bytes are $83 then $AB - I'm going to hope that this is an error in the notes - I doubt my routines are going to be able to mix up whole bytes!

Now for the main reason you'd want to send codes to the keyboard - to flash the LEDs, of course! The code is $ED - the keyboard should respond with an ACK ($FA), to which I send the status byte (the lower 3 bits control the LEDs). The code is pretty simple (albeit without any checking on the ACK):

    ld a,$ED
    call at_send_byte   ; Command
    call at_get_byte    ; ACK
    ld a,%00000101      ; Caps Lock and Scroll Lock on
    call at_send_byte   ; Command
    call at_get_byte    ; ACK

lights.jpg

Ace. I'll add the equates for the various commands to my include file. No doubt I can get some more done with this project tomorrow...

Scripted attacks

Monday, 3rd October 2005

Attack patterns can now be simply scripted as a list of 4-byte chunks, covering which enemy to use, the delay between adding them, how many of them to add, the delay between them and a delay after adding the last one.
For example, this:
.db $04,16,8,100
...would add 8 enemies of type $04, adding one every 16 game ticks and then pausing for 100 game ticks after adding the last one then progressing to the next scripted enemy.
To support this, I've added some new per-level parameters, covering:

  • Attack type (random or 'scripted') with delay between enemies or a pointer to script to follow.
  • Speed of enemies.
  • Speed of landscape scrolling (used very little - bonus levels with no enemies/mines scroll past extra-quickly).
  • Maximum number of mines.

Using all the above, I can easily configure each level's attack patterns quite simply.
Doing this has identified a number of bugs (mostly where new enemies were being initialised without clearing out a particular byte, which means that certain sequences would start in odd places) which have now been ironed out.

I have also picked up work again on my music system for the game. I have a very bad 12-bar-blues demo running with it - I need to find a decent pitch-to-period table as the one I calculated in Excel sounds slightly wrong. There are also some minor-ish reset bugs (the first time in-game a note is played the instrument is full volume for one frame plus some minor synch issues). Looks like I'll have to write the music in Notepad, though - who'd have thought that getting low-level access to the sound card was so bloody difficult in anything other than C (and I'm damned if I'm going to have to write a GUI system for the Windows console, and Win32 is too mucky to deal with for such as simple application). This is great fun, as you can imagine - take, for example, this: (the 12-bar-blues demo for the testbed for the music system)

; Demo tune

demo_tune:

; Instrument table:

.db 2		; Number of instruments
.dw simple
.dw vib

; Sequence table:

.db 6		; Number of sequences
.dw run_c
.dw run_f
.dw run_g
.dw bass_c
.dw bass_f
.dw bass_g

; Tune!

.db %00000000, %00000000
.db %00000011, %00000001
.db %10000000, %10001000

.db %00000000, %00000000
.db %10000000, %10001000

.db %00000001, %00000000
.db %00000100, %00000001
.db %10000000, %10001000

.db %00000000, %00000000
.db %00000011, %00000001
.db %10000000, %10001000

.db %00000010, %00000000
.db %00000101, %00000001
.db %10000000, %10001000

.db %00000001, %00000000
.db %00000100, %00000001
.db %10000000, %10001000

.db %00000000, %00000000
.db %00000011, %00000001
.db %10000000, %10001000

.db %00000000, %00000000
.db %10000000, %10001000

.db %11111111

; Sequences:

run_c:
	.db 1

	.db (144>>8)+%01000000
	.db (144&%11111111)
	.db $10

	.db (112>>8)+%01000000
	.db (112&%11111111)
	.db $10

	.db (93>>8)+%01000000
	.db (93&%11111111)
	.db $10

	.db (82>>8)+%01000000
	.db (82&%11111111)
	.db $10

	.db (77>>8)+%01000000
	.db (77&%11111111)
	.db $10

	.db (82>>8)+%01000000
	.db (82&%11111111)
	.db $10

	.db (93>>8)+%01000000
	.db (93&%11111111)
	.db $10

	.db (112>>8)+%01000000
	.db (112&%11111111)
	.db $10

	.db %11000000

run_f:
	.db 1

	.db (105>>8)+%01000000
	.db (105&%11111111)
	.db $10

	.db (82>>8)+%01000000
	.db (82&%11111111)
	.db $10

	.db (68>>8)+%01000000
	.db (68&%11111111)
	.db $10

	.db (60>>8)+%01000000
	.db (60&%11111111)
	.db $10

	.db (56>>8)+%01000000
	.db (56&%11111111)
	.db $10

	.db (60>>8)+%01000000
	.db (60&%11111111)
	.db $10

	.db (68>>8)+%01000000
	.db (68&%11111111)
	.db $10

	.db (82>>8)+%01000000
	.db (82&%11111111)
	.db $10

	.db %11000000

run_g:

	.db 1

	.db (93>>8)+%01000000
	.db (93&%11111111)
	.db $10

	.db (72>>8)+%01000000
	.db (72&%11111111)
	.db $10

	.db (60>>8)+%01000000
	.db (60&%11111111)
	.db $10

	.db (53>>8)+%01000000
	.db (53&%11111111)
	.db $10

	.db (49>>8)+%01000000
	.db (49&%11111111)
	.db $10

	.db (53>>8)+%01000000
	.db (53&%11111111)
	.db $10

	.db (60>>8)+%01000000
	.db (60&%11111111)
	.db $10

	.db (72>>8)+%01000000
	.db (72&%11111111)
	.db $10

	.db %11000000


bass_c:
	.db 0
	.db (307>>8)+%01000000
	.db (307&%11111111)
	.db 8
	.db (144>>8)+%01000000
	.db (144&%11111111)
	.db 8
	.db %11111111

bass_f:
	.db 0
	.db (224>>8)+%01000000
	.db (224&%11111111)
	.db 8
	.db (105>>8)+%01000000
	.db (105&%11111111)
	.db 8
	.db %11111111

bass_g:
	.db 0
	.db (198>>8)+%01000000
	.db (198&%11111111)
	.db 8
	.db (93>>8)+%01000000
	.db (93&%11111111)
	.db 8
	.db %11111111

; Instruments:

simple:
.db 4	; Length
.db 255
.db -64

.db 8
.db 0
.db 32

.db 1
.db 255
.db 0

.db 0

vib:

.db 2
.db 128
.db 50
.db 2
.db 128
.db -50
.db 2
.db 128
.db 50
.db 1
.db 128
.db -50
.db 1
.db 128
.db 50
.db 3
.db 128
.db -50

.db 10
.db 128
.db 14
.db 0

Thankfully, the assembler can handle me sticking sums in rather than hard-coded values in places making things a lot simpler... but it's still a bit mucky. Ah well. There is no noise channel set up, either, so no krch-krch-krch style beats from the white-noise generator as such.

It works!

Thursday, 22nd September 2005

  • Fire Track II for TI-83 Plus: 31 source files.
  • Fire Track for the Sega Game Gear: 42 source files.
    Based on this information alone, you can deduce that FTGG will be vastly superior. wink.gif

    I just gave FT2 a try in VTI (I don't have any batteries in my calculator) and good grief it's slow. FTGG is much much faster... (though you should see it go when I set the enemy speed to two... I need more accurate control of speed, 1 being medium-ish easy, 2 being nightmarishly impossible!)

    This makes me happy.

    yay.gif
    Things to notice about the above screenshot:

    • Near pixel-perfect parallax star background (you can see some through the bullet holes!)
    • Not one single poorly timed VDP write.

    This has required an enormous amount of code rewriting!
    First to be fixed was the VDP timing. With a few .ifdefs, restructuring of the main loop and addition of an extra function call, all sprite data is now written to a shadow in RAM and then all sent to the VDP at once. This means that in sprite-intensive frames the framerate drops (it is unnoticable!) instead of the VDP going bonkers.

    Next up, the pixel-perfection. I do not know if you know how difficult this is to do on this hardware (or, at least to a hack coder like myself), so I'll do my best to explain.

    On a purely software-driven system, you can create a bitmask of your tiles so that when you manually blend your layers together you can cut out the stars you don't need. Or, you just draw them first then draw the tiles on top.

    On more advanced hardware you can Z-order your polygons, so that's all done for you.

    On the Game Gear, I have two layers - sprites and background. I can set individual background tiles to be on top of sprites - but this would dump a solid 8x8 square on top of everything. In other words, ALL the sprites are drawn under the background tiles marked as being on top. I had toyed with the idea of bizarre palette tricks and messing around with the tiles themselves, thinking sprites would be too slow (I asked about this on the SMS Power! boards - apparantly some games do the sprite trick, so I gained confidence in it).

    What I need to do is this:

    1. Take the (x,y) of my star and convert this into an offset into the VRAM name table.
    2. Look up the value of the tile at this value. If it is 0 (blank tile), draw the star. (Special case to skip expensive stage below)
    3. Look up the tile in a table of sprites marked as transparent. If it isn't listed, don't draw it.
    4. Take the offset into our table of transparent tiles, multiply it by 64, use that as an offset into another table. Find how far into the nearest tile as an (x,y) coordinate we are - that is, x is in the range 0-7 and y is also in the range 0-7 inside the current tile.
    5. Use this coordinate to look up inside our mask whether to draw the sprite or not.

    There was an awful lot to go wrong and it did, often. sad.gif It works pretty well now, so I'm happy with where it is. smile.gif

    The function to calculate the VRAM name table offset based on an X,Y on screen was broken. It needs to take into account the current vertical scroll, you see, to calculate the offset, so was essentially doing this to calculate which 8-pixel-tall row a pixel was in:

    row_offset = (y/8) + (scroll_y/8)

    (where y is the coordinate we pass to the function) - which leads to all sorts of rounding errors! The correct function is, of course:

    row_offset = (y+scroll_y) / 8

    Which, oddly enough, didn't seem to work at all well - at certain times, the values returned were completely off. The reason is simple enough - our scroll_y is a value between 0 and 223, and out y coordinate is anything between 16 and 160. 160+223=383 - you try storing that in an 8-bit register! So I updated the entire function to 16-bit code where needed.

    Z80 ASM gurus - my old code to divide A by 8 was the simple:

        srl a
        srl a
        srl a

    For my divide-hl-by-8 function (srl [reg] is a sort of [reg]>>=1 function) I use this:

        srl h
        rr l
        srl h
        rr l
        srl h
        rr l

    -- is this the fastest I could use? Is there a nice 16-bit shift operation available to me?

    Next I had to generate the transparency data. I edited my tile editor - solid green &00FF00 is taken to mean "transparent!" It dumps a list of any tile with transparency, then an expanded 64-byte table of the mask (1 byte per pixel - horribly big, but much faster than packing it into a 1bpp mask).

    When I say "near pixel-perfect", I have a problem - my stars are 2x2 pixels, and so if they can overlap the borders by a pixel or two from time to time. It doesn't look too bad, and in my defense the original Fire Track displayed stars over ANY black areas on the background at some points...

  • On the merits of Excel and running out of space

    Monday, 12th September 2005

    video.jpg
    Latest video. [2.29MB AVI]

    You chaps probably don't know what a major headache it is trying to get clear, non-wobbly video... rolleyes.gif

    taking_video.jpg

    I gave up in the end and went for the "propped-up against a wall with camera on stack of books" approach. The reason I cannot, therefore, appear to play the game very well is because I cannot see the screen. All I can see is a faint, inverted glow and have to guess where I am. Not fun.

    Here are the major updates since last time:

    • Removed Blackadder theme rolleyes.gif
    • Altered colour contrast on title screen, added wipe to clear it in imitation of original, restructured code for better fades, changed timing, added sound. Title screen is now complete.
    • Rewrote entire code for checking bullet collisions with buildings on the ground from scratch. It now works perfectly.
    • Levels now "start" correctly, so the ship moves in from the bottom, the start platform moves towards you very fast until they are in the correct positions, at which point gameplay commences (much smoother than just dumping you at the start position with a full level already there).
    • All levels now mapped correctly, in order, with correct palettes for backgrounds AND sprites. Levels that end a zone with a platform rather than with the face are flagged as such and the correct end is substituted in now.
    • Each level can be pointed at a 'replacements' table, which lists "replace tile X with tile Y". Tiles are swapped around properly so that levels appear more varied.
    • Explosions are now fully implemented (2 kinds, background and foreground - with very slightly different animations) and working.
    • Path-based enemy attacks are now in full working order, using paths generated with a combination of my graphical calculator and Excel.
    • A couple of other new enemy attacks have been added.
    • The game now sports a BBC Micro font which can be used for proper text display.
    • The "epilogue" to the game is fully coded (type-writer effect for final message copied from original game, including sound effects and fades).
    • Each level now begins with a summary screen (sprite limits meant that displaying all your lives at the start of the level was impossible, so is now done on an interval screen). The screen displays the zone name and a number of lives, the number of lives being surrounded by orbiting space-ships (the number of those being the number of lives left).
    • The game now fully works on hardware (should now work on old hardware - not checked, as I do not have an old BIOS-less Game Gear) and in all emulators (I was not initialising the display correctly).
    • Destroyed terrain tiles are not updated live to the name table in the fairly intesive loop - they are saved to a list of destroyed tiles, which are then read off directly and updated when needed in the intensive graphics bit.

    I've probably forgotten a lot of stuff. Bear with me.

    There have been a number of "WTF" moments reading back some of the source. Take, for example, this gem:
    wtf.gif
    Some explanation is needed here - this is from the code that handles the special case when you have shot a 2x2 square building and it lies out of bounds of the name table. The name table is my grid (32x28) of tiles indices that each point to an 8x8 tile in the VRAM. As the view point moves up this map, new tiles indices are written to it, or old ones are adjusted when something is blown up. The worst case scenario here is if the 2x2 tile lies just above the top of the name table - which is what the above code has to fudge. Look at this:

    tilemap.gif
    (I love debugging emulators!)
    This shows all of my 32x28 tilemap. That funny white rectange is my view rectangle (it's currently half-off the display, and it loops around). If you look at the top and bottom edges of the tilemap, you'll see a ring has been shot out. Naturally, the bottom will have been hit first. What happens in this case is the address of the tiles that have been erased is decremented by a whole row of map points (there are 2 bytes to a spot on the name table, so I go backwards by 64). In this case, I find that the address is above the start of the tilemap so I add on 64*28 (which is the entire size of the table) and add it back to get back into the bounds of the table itself. As you can see, it was a successful operation in this case, but only because I was writing back HL this time and NOT the accumulator! No wonder my sprites were becoming destroyed as I was writing in completely the wrong part of the VRAM...

    blown_up.gif
    This is the screenshot from the emulator of the screen display for the above tilemap (the sprite layer is not displayed in the tilemap, for obvious reasons).

    I'm still not getting very far with my quest for music. sad.gif
    The only bit of the game that has music is the title screen, and that's not so much music as a weird noise (here is an MP3 of my Game Gear's rendition, this is the original from the BBC Micro). Even though I have specified that the noise channel (the one that produces the high beeping noises) to use the highest frequency, it's still lower than the Beeb. All the other notes are correct, though.

    Latenite has got yet another update - project support. (It looks more and more like VS.NET every day!) Rather than create messy proejct files and stuff which makes distrubution tougher, I thought I'd go the easier route of just allowing people to stick directives in the file with the ;#latenite:main directive. You can now ;#latenite:name="name_of_project" and ;#latenite:project="file_name"[,"project_folder"].

    latenite_small.jpg
    Click for larger image

    Skybound Visualstyles (which I'm using to fix the weird .NET XP interface bugs) looks lovely but hugely slows down the app - only really noticeable with so many hundreds of nested controls like me and when moving large windows on top of it forcing repaints, so now the XP styling option is saved away (so you can revert to boring standard grey if you like). Latenite now saves window/splitter positions when you close - a first for an app by me! grin.gif
    In fact, Latenite is also my first app that needs a splash screen, it takes so long to get going... sad.gif It's only about 2 seconds, though, for it to track down all the tools, load all the documentation, add all the compiler scripts and present itself, so it's not too bad.

    One overlooked piece of software is Microsoft Excel. People seem to develop a hatred of it, especially in people who end up trying to use it as a database application (it appears that this is what 9 out of 10 smallish (and some not so small!) companies do).
    I have been using Excel to create those smooth curving paths for some of the enemies to follow. First, I prototype them on my graphical calculator (lots of trial and error involved!):

    calc_1.jpg
    calc_2.jpg

    ..before transferring into an Excel spreadsheet.

    excel_small.jpg
    Click for legible!

    The Excel spreadsheet is set up so that column A is the steps (usually 0 to 256) in ascending order. Column B is the X coordinate, column C is the Y coordinate.
    Columns E and F serve a special purpose - they contain the differences between each element in columns B and C (so E3=B3-B2, E4=B4-B2 and so on). Columns G and H "fix" the values - if it's negative, make it positive and set the fourth bit (just add 8). This is for the internal representation of the paths. Column I contains the data I can just copy and paste into the source files - it starts with the initial (x,y) coordinate, and then has a list of bytes. Each byte signifies a change in (x) and change in (y), with the lower four bits for (x) and the upper four bits for (y).If the nibble is moving the object left or up (rather than right or down), then the most significant bit of that nibble is set. For example:

    %0001 - move right/down 1 pixel
    %0011 - move right/down 3 pixels
    %1001 - move left/up 1 pixel
    %1100 - move left/up 4 pixels

    ...and therefore...

    %11000001 = move right one, up four.

    Poor Excel has suffered a little bit of neglect from me - I used to use it to generate my trig tables. WLA-DX, the assembler I learn to love more and more every day, supports creation of a number of different tables - including trig tables - with a single directive. Just specify .dbsin (or .dbcos) followed by the start angle, how many bytes of trig table you want, the step between each angle, the magnitude of the sine wave and the DC offset and away it goes.
    People still using TASM, STOP! Use WLA-DX. It's ace (and free, not shareware!).

    You can never have enough pictures in a journal (hey, you should see this page load on our 700 bits/sec phone line at home!)
    Here's the full list of sprites:

    sprites.gif

    The last ship being after the explosions is a mistake, but it's too much hassle to change. I'm really pleased the way they've come out - I'm no pixel artist. smile.gif

    I'll finish off with a few assorted screenies.

    misc_01.pngmisc_02.png
    misc_03.pngmisc_04.png

    ...I would have done. I should have done. However, when taking those screenshots I saw a glitch in the graphics. As an exercise for the reader, can YOU spot where the bug is?

    bug.gif
    (and it's not in one of those external functions I wrote - this is why copy-paste coding is bad, 'kay?)

    Oh, (possibly?) final last thought popping into my head: one thing I really miss from the BBC Micro era is the comfortable keyboard layout. I mean, look at this small photo of a Master 128 keyboard:
    master_128.jpg
    Click for bigger

    You might think that that blocky monstrosity would knacker your fingers, but that's not the reason. It's the fact that rather than squash one hand's fingers onto a small cursor pad (or the even sillier WASD), you balance movement between your two hands - Z and X for left and right, : and / for up and down. (On a modern UK layout, that'd be ZX and '/). The space bar is now conveniently accessible by both thumbs, and you can reach around most of the rest of the keyboard with your other fingers.
    The Game Gear is an ergonomic handheld (unlike the uncomfortable Game Boy), so I'm reckoning that I can balance the controls between hands - ← and → on the d-pad can control left and right, and the 1 and 2 buttons can control up and down. I will offer multiple layouts, though, as otherwise that might confuse some people. wink.gif

    Anyway, I hope you find this journal interesting. It seems a bit all over the place - video, pictures, rambling, odd bugs, nostalgia... The only thing that's an issue is my rapidly shrinking GDNet+ account web space!

    Subscribe to an RSS feed that only contains items with the Z80 tag.

    FirstLast RSSSearchBrowse by dateIndexTags