ElectrEm

ElectrEm [1] is an emulator of the Acorn Electron coded in platform neutral C++ using the SDL library. Ports exist for Windows, Linux and Mac OS X operating systems.

In strict terms, the name refers to two different emulators - ElectrEm Classic and ElectrEm Future. The former was a reasonably accurate emulator suitable for Pentium and above machines, whose development began sometime in 2000 and tailed off in early 2002. The latter is a complete rewrite of the emulator started by the same author in late 2003 with the stated goal of 100% emulation accuracy.

ElectrEm Future has not been developed at anything like the same rate as ElectrEm Classic and lacks many of the nicer features of the older emulator - notably including a useful GUI - but features cycle perfect emulation for any software it is capable of running.

Elkulator is an alternative emulator of the Acorn Electron for PC machines. It has attracted some former users of ElectrEm Classic through its availability for DOS, dropped by ElectrEm Future.

Detailed technical workings of ElectrEm Future

These notes are provided by the author of ElectrEm for two reasons:

  • the emulator is open source and therefore this knowledge is usefully kept in the public domain
  • the specifics of this particular emulator are easily read past, providing a good insight into the functioning of full hardware emulators generally

General Design

How it Isn't Done

The means of emulation I've seen time and time again in so many emulators, and even implemented myself in ElectrEm Classic, is to run the CPU for the minimum number of operations that exceeds n cycles, then give a call out to all non-CPU devices to run for n cycles. The number n is picked as a useful divisor of the rate at which various things need to function.

In ElectrEm Classic, n was a single scanline of the display. The CPU would run for at least a scanline, then the display output would get a chance to draw a scanline, the sound output would check whether a new sound buffer needed to be filled, etc. This was the only opportunity devices got to signal an interrupt - at scanline ends. This tends to be known as 'scanline level emulation'. A common result is that things like the display are drawn according to the state of memory at the end of the scanline - not really accurate but often near enough. Some emulators are even as audacious as to run 'frame level emulation' - although this has not really been justifiable since the Pentium age.

ElectrEm Future doesn't divide time like this.

How it is done

ElectrEm consists of a conceptual 'process pool', which constitutes a single Acorn Electron. Think of it as the outer case and the clock generator - it forms the visible part of the Electron emulation, and ensures everything keeps running, but in itself doesn't do anything.

To this are attached a number of 'components'. These are the various pieces of hardware that form an Electron rather than, for example, a BBC Micro. This includes the 6502 processor, the ULA (although it is actually implemented as a series of related components), add-ons such as the Plus 1 and Plus 3, and so on.

The whole emulator is event based, in that the process pool essentially follows this recipe for emulation success:

  • Ask around the components to discover when the next 'break worthy' cross-component communication will occur
  • Run all components up until the time of the first 'break worthy' communication
  • Allow communication to occur
  • Repeat ad infinitum

One such run is an emulation iteration.

Of course, this relies on 'break worthy' communication being predictable. To this end, the process pool sorts components into two categories - those that will produce predictable communications, and those that will not. An optimisation creeps in here because of the Acorn Electron design - there is exactly one unpredictable component, namely the 6502. All other components are predictable.

In this light, a 'break worthy' communication is defined to be one which will affect the 6502's actual program flow, i.e. the changing of an interrupt or reset line.

So, fleshing out the previous description a little, the process pool actually performs these steps:

  • Ask all components other than the 6502 when they will next alter one of the 6502 control lines
  • Run all components until the first alteration is to occur
  • Allow control line change to occur
  • Repeat ad infinitum

Of course, sometimes the 6502 brings a future control line change on itself, for example it enables an interrupt on the ULA which was not previously enabled. In that case the ULA may now conceivably want to affect a 6502 control line at an earlier time than it had previously thought. To handle this possibility, one further modification to the process pool behaviour is made. It now runs:

  • Ask all components other than the 6502 when they will next alter one of the 6502 control lines
  • Ask the 6502 to run to the first alteration, wait for it to finish
  • Ask the 6502 how long it actually ran for
  • Run all other components for that amount of time
  • Allow any requested control line changes to occur
  • Repeat ad infinitum

The events that may cause the non-6502 components to change their mind about affecting control lines are all related to register reads and writes by the 6502, so it is perfectly possible for a component to inform the 6502 if its earlier report becomes inaccurate. In that scenario the 6502 exits early. This gives components an opportunity to immediately change control lines if they desire, and certainly allows the process pool to determine a correct running time for the next iteration.
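The final version of the loop, including the early exit, might be sketched as follows. This is only an illustration of the idea under stated assumptions: every name in it (Component, Cpu, runIteration, PeriodicInterruptSource, StubCpu) is hypothetical and not taken from the ElectrEm source, and the two concrete classes are stand-ins that exist purely so the loop can be exercised.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// All names below are hypothetical; this is not ElectrEm's actual API.
using Cycles = std::uint32_t;

// A predictable component: it can say in advance when it will next
// change one of the 6502 control lines.
struct Component {
    virtual ~Component() = default;
    virtual Cycles cyclesUntilNextControlLineChange() const = 0;
    virtual void run(Cycles n) = 0;
    virtual void applyControlLineChanges() {}
};

// The one unpredictable component. It may return early if a register
// access invalidated a component's earlier prediction.
struct Cpu {
    virtual ~Cpu() = default;
    virtual Cycles run(Cycles budget) = 0;  // returns cycles actually run
};

// Upper bound on one iteration: 39,936 cycles (1/50th of a second).
constexpr Cycles kMaxIteration = 39936;

// Runs one emulation iteration and returns its length in cycles.
inline Cycles runIteration(Cpu& cpu, std::vector<Component*>& components) {
    Cycles budget = kMaxIteration;
    for (const Component* c : components)
        budget = std::min(budget, c->cyclesUntilNextControlLineChange());

    const Cycles ran = cpu.run(budget);  // may be less than budget
    for (Component* c : components) c->run(ran);
    for (Component* c : components) c->applyControlLineChanges();
    return ran;
}

// Stand-in component: wants to change a control line every `period` cycles.
struct PeriodicInterruptSource : Component {
    Cycles period;
    Cycles elapsed = 0;
    int raised = 0;
    explicit PeriodicInterruptSource(Cycles p) : period(p) {}
    Cycles cyclesUntilNextControlLineChange() const override { return period - elapsed; }
    void run(Cycles n) override { elapsed += n; }
    void applyControlLineChanges() override {
        if (elapsed >= period) { elapsed -= period; ++raised; }
    }
};

// Stand-in CPU: runs the whole budget unless told to exit early.
struct StubCpu : Cpu {
    Cycles earlyExitAt = 0;  // 0 means "never exit early"
    Cycles run(Cycles budget) override {
        return (earlyExitAt != 0 && earlyExitAt < budget) ? earlyExitAt : budget;
    }
};
```

With a single PeriodicInterruptSource of period 19,968 attached, one call to runIteration covers exactly 19,968 cycles. If the stub CPU is told to exit after 100 cycles, the iteration is 100 cycles long and the interrupt source reports 19,868 cycles remaining on the next pass.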

In practice, the process pool also does a tiny bit more in the realm of synchronising emulation to the correct running speed and releasing cycles to the operating system where possible.

Of course, a requirement for this manner of emulation is that all components, including the CPU, be able to pause and resume their operation at any cycle. All of the ElectrEm components achieve this.

Miscellaneous Process Pool Functions

The process pool is also responsible for the following things:

  • A common interface for file opening, irrespective of which component the media is aimed at
  • Electron machine configuration (i.e. attachment or not of various peripherals, some emulation specific options)
  • Maintaining an idea of which memory map locations are occupied by registers rather than memory locations
  • Machine state saving/loading
  • Various features useful to a debugger

Collecting all of these within the process pool performs the vitally useful function of extending a clean, single API for the Electron emulation. This, in turn, makes everything more straightforward for third parties coming to the code or grafting different front ends and user interfaces on top.

Bit Multiplexing

Bit multiplexing - also known as the technology without a catchy name - is at once the most impressive feature of ElectrEm and the most underused. The idea is that every bit of Electron memory is secretly divided into five, in a way that the emulated machine cannot detect (this is the 'multiplexing'). One of these represents the original bit as would be stored on a real Electron. The other four represent data that has subsequently been glued on top of the original.

Every time the original bit is manipulated, the four extra bits are manipulated in a 'similar' fashion. At the end of the day, this means that when the code that draws the display comes to put all the data together, it finds more than the original machine ever allowed. From this it can construct, as an example, 256 colour images instead of 4 colour images.

The reason that this technology is all but unused at present is that there are no tools for content creation. Hopefully this can be addressed in future. Through UEF, extra data can be piggybacked onto all three of the popular Electron mediums - tape, disc and ROM cartridge.

Miscellaneous Notes

Typical Electron programs run with only two potential interrupts - real time clock and end of display. This means that, most of the time, the process pool runs in iterations of 19,968 cycles - 1/100th of a second. The process pool never generates an iteration exceeding 39,936 cycles (1/50th of a second) due to the need of the video component to update the display.

The 6502

Overview

ElectrEm emulates a 6502 hybrid that never existed - its 6502 has a built in memory mapper and a dumb data collector. It also takes instructions on a variable clock rate. On the side of the true 6502 functions, it implements all 6502 opcodes, documented and otherwise, and has been certified in that respect by the 6502 Test Suite. As stated above, it is capable of running for arbitrary cycle counts and is happy to pause at any point - not just between operations.

Furthermore, every operation is implemented as a 100% accurate cycle breakdown of the original. All the spurious memory reads and writes of a real 6502 occur at exactly the right time.

The Memory Mapper

The memory mapper grafted onto the 6502 emulation is exactly as much as required for an Electron and Slogger Master RAM Board emulation.

Memory is first of all divided into eight windows. Depending on which of these the CPU is operating from, it will see an entirely different memory map. Having at least two is a prerequisite for Master RAM Board emulation, but optimisation reasons related to binary counting lead ElectrEm to eight. If the Master RAM Board is disabled, all eight windows lead to the same memory map.

Each map consists of 256 pages, which may be arbitrarily mapped to physical memory. That number is the greatest common divisor of the various paging devices that may operate on the memory map - the ROM paging between &8000 and &c000, the JIM page and the Slogger Master RAM Board if enabled.

Each page has a read and a write pointer. When emulating RAM, the read and write pointers are set to the same place. When emulating ROM, the read pointer is set to the ROM data but the write pointer is set to a separate area of memory with which the 6502 otherwise has no interaction. In this way the 6502 itself need have no idea which areas are ROM and which are not - that is up to the device in control of the map.
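The read and write pointer arrangement can be illustrated with a small sketch. It assumes 256 pages of 256 bytes covering a 64 KB address space and ignores the eight windows entirely. The names MemoryMap, Page and mapRom are hypothetical and are not ElectrEm's own.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch; names and layout are not taken from ElectrEm.
constexpr std::size_t kPageSize = 256;
constexpr std::size_t kPageCount = 256;

struct Page {
    std::uint8_t* read = nullptr;   // where reads come from
    std::uint8_t* write = nullptr;  // where writes go
};

class MemoryMap {
public:
    // Initially every page is RAM: read and write point at the same place.
    MemoryMap() : ram_(kPageSize * kPageCount, 0), sink_(kPageSize, 0) {
        for (std::size_t p = 0; p < kPageCount; ++p)
            pages_[p] = Page{&ram_[p * kPageSize], &ram_[p * kPageSize]};
    }
    // The pages hold raw pointers into ram_ and sink_, so forbid copying.
    MemoryMap(const MemoryMap&) = delete;
    MemoryMap& operator=(const MemoryMap&) = delete;

    // Map one page as ROM: reads see romData, writes land in a scratch
    // page that is never read back.
    void mapRom(std::uint8_t page, std::uint8_t* romData) {
        pages_[page] = Page{romData, sink_.data()};
    }

    // The CPU-side accessors never need to know whether a page is RAM or ROM.
    std::uint8_t read(std::uint16_t addr) const {
        return pages_[addr >> 8].read[addr & 0xFF];
    }
    void write(std::uint16_t addr, std::uint8_t value) {
        pages_[addr >> 8].write[addr & 0xFF] = value;
    }

private:
    std::vector<std::uint8_t> ram_;
    std::vector<std::uint8_t> sink_;
    std::array<Page, kPageCount> pages_{};
};
```

Writing to an address in a page mapped with mapRom leaves both the ROM image and subsequent reads unchanged, with no branch on 'is this ROM?' in the access path.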

Of course, not all addresses are RAM or ROM - some are registers for communication with other devices. To this end a list of flags for all memory locations is kept. If any of these indicates that a register lies at a particular address then the 6502 asks the process pool to read or write to that address for it. The process pool then tells the 6502 whether it needs to exit early so that the time to the next control line adjust can be reconsidered.

Cycle Timing

Also associated with every page of memory is a timing chart - which contains information about the 'halt' states the 6502 will encounter if it tries to read or write to memory on that page. In reality the 6502 doesn't have a halt line, and what really happens is that the ULA stops the CPU clock, but the difference isn't significant for an emulator.

Using this approach, combined with perfect emulation of cycle by cycle reads and writes, ElectrEm is able to reproduce the timing effects of an Electron with perfect accuracy.

The Data Collector

Another function of the 6502 is redirected data collection. Every cycle it collects the data at a particular location and stores it to a buffer. This is just a useful way of implementing correct-time video memory collection. This function cannot be implemented as only occurring when the 6502 is halted as when it is executing ROM code the machine runs on a split bus.

Miscellaneous Notes

Any cycle interruption is implemented cleanly and efficiently through threads. The actual CPU emulation is done in a separate thread to that of the process pool, which itself waits on a semaphore while the CPU thread is executing. The overhead of semaphore waits (which usually incur an overhead due to the granularity of schedulers in operating systems) is more than made up for by the large cycle numbers the CPU is able to execute for at a time by virtue of the over-arching ElectrEm design.

The ULA

Overview

The ULA is implemented as three separate components - the display, the tape, and everything else. All three cross-communicate as necessary, mostly because of the shared ULA interrupt register, but also because at this time the process pool only allows one device to claim any particular address for register reads and writes, so writes to some of the multi-purpose registers need to be shared out.

The Display

The display is a very solid piece of work. It uses the CPU data collection to collect video memory, and queues up a list of all video device state changes throughout any given frame, then at end of frame puts all the information together to produce the screen. At no time does it make any of the artificial assumptions seen on many other emulators, such as only allowing palette or mode changes at the end of a scanline.

The inner loop works like this:

  • Determine time to next queued event
  • Inspect collected video display data and produce correct display up until that event
  • Process event
  • Repeat until end of frame
  • Put frame on screen
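As a rough illustration of that loop, the sketch below reduces the video state to a single 'palette' byte and treats each collected byte as one output pixel. Both simplifications, and the names VideoEvent and renderFrame, are assumptions made for the example rather than anything from the ElectrEm source.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: a queued state change, time stamped in cycles from frame start.
struct VideoEvent {
    std::uint32_t cycle;
    std::uint8_t newPalette;
};

// collected[i] is the byte fetched by the data collector on cycle i.
// events must be sorted by cycle. Each output word combines the byte with
// the palette that was in force at the moment the byte was collected.
inline std::vector<std::uint16_t> renderFrame(
        const std::vector<std::uint8_t>& collected,
        const std::vector<VideoEvent>& events,
        std::uint8_t initialPalette) {
    std::vector<std::uint16_t> out(collected.size());
    std::uint8_t palette = initialPalette;
    std::size_t pos = 0;

    auto drawUntil = [&](std::size_t until) {
        for (; pos < until; ++pos)
            out[pos] = static_cast<std::uint16_t>((palette << 8) | collected[pos]);
    };

    for (const VideoEvent& e : events) {
        // Produce correct output up until the event, then process it.
        drawUntil(std::min<std::size_t>(e.cycle, collected.size()));
        palette = e.newPalette;
    }
    drawUntil(collected.size());  // remainder of the frame
    return out;
}
```

Because the state change is applied at its time stamp rather than at a scanline boundary, a palette write in the middle of a line affects only the pixels that follow it.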

Furthermore, the entire code adapts to producing an 8-bit, 16-bit, 24-bit or 32-bit output in either RGB or YUV colour spaces - the latter being related to the option of using a hardware overlay surface for a TV style display.

The code is also capable of using bit multiplexed data as described above to produce enhanced displays.

In optimising the code, I have assumed that palette and mode changes are 'infrequent'. To this end, a series of tables are used for mapping from collected video data to PC video output data, and changes in these are relatively expensive. A special case is implemented for a 'blank' display - when the palette is set so that no pixel data appears. This optimisation makes a lot of sense when the huge number of games that only use the lowest 156 scanlines of the display (the area between the real time clock and end of display interrupts) is allowed for. In the future a caching system should greatly reduce the cost of palette changes.

Similarly, due to the information that needs to be given to the CPU for data collection, applications that don't often change the video start address will run more quickly than those that do.

A CRC based system determines which scanlines have actually changed from one frame to the next, and only those are redrawn. Most programs change only very small sections of the screen from frame to frame due to the relatively low processing power of the Electron, and again these will run faster on the emulator than programs which change large numbers of scanlines frame on frame. Games like Syncron and Firetrack are disaster scenarios for this optimisation!
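A minimal sketch of the changed-scanline test follows. The text does not say which CRC ElectrEm uses, so the standard reflected CRC-32 (polynomial 0xEDB88320) is assumed here. The names lineCrc32 and DirtyLineTracker are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Which CRC ElectrEm
// really uses is not stated; this one is an assumption for illustration.
inline std::uint32_t lineCrc32(const std::uint8_t* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Remembers one CRC per scanline and reports whether a line needs redrawing.
class DirtyLineTracker {
public:
    explicit DirtyLineTracker(std::size_t lines)
        : crcs_(lines, 0), seen_(lines, false) {}

    // Returns true if the line differs from the previous frame, or has
    // never been seen before.
    bool update(std::size_t line, const std::uint8_t* bytes, std::size_t len) {
        const std::uint32_t crc = lineCrc32(bytes, len);
        const bool dirty = !seen_[line] || crcs_[line] != crc;
        crcs_[line] = crc;
        seen_[line] = true;
        return dirty;
    }

private:
    std::vector<std::uint32_t> crcs_;
    std::vector<bool> seen_;
};
```

A scheme like this trades a small chance of a missed update, when two different lines happen to share a CRC, for not having to keep and compare a full copy of the previous frame's source data.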

The Tape

Full support for all UEF tape chunks is provided, and Windows users can enjoy CSW support through Fraser Ross's csw.dll. An upshot of CSW support for non-Windows users is that ElectrEm knows how to compose byte level information from zero crossings - paving the way for support of 'real audio' such as WAV files, or pretty much any other tape image format. Real time input from a sound card will be relatively easy to implement if SDL ever deigns to provide it.

As you'd expect, the tape hardware runs with the same fully accurate timing of all the other components. This means that the undesirable effects of the original hardware are duplicated - trying to load data in modes 0-3 will almost certainly fail as important bytes shift into the ULA and out again while the CPU remains halted for video draw reasons.

Everything else

The 'everything else' component handles the keyboard, paging and general memory issues (including Slogger MRB paging) and sound output.

As is the way SDL operates, a separate thread fills sound buffers as they are required for sound output. This thread reads a queue of audio events, in much the same way as the video circuits utilise a queue of video events. Things like the sampled sounds in Exile function due to this arrangement's emphasis on time stamping everything and obeying the time stamps.

The only other thing of mild interest to say is that the keyboard actually operates in two modes - normal user input or programmatic input. The latter causes the ULA to ignore your real keyboard and instead report that a programmed series of keys is depressed. This is how 'autoload' functionality, whereby the emulator determines the correct load commands for an inserted piece of media and issues them, has been implemented in ElectrEm Future. Compare and contrast with ElectrEm Classic, which utilised a paged ROM that used Acorn OS functionality to add key presses to the keyboard input queue.

The Plus 3

Overview

The Plus 3 is the only add-on interface so far emulated by ElectrEm Future. The interface consists of a WD1770 drive controller chip and one or two floppy disc drives, plus an Acorn specific control register for selection of disc drive, density and side.

The Plus 3 is a component that never adjusts CPU control lines, and often waits on a meaningful event to occur in terms of data from the disc drives, so several optimisations come into play.

First of all, the state of the Plus 3 is only calculated when it actually needs to be known - i.e. whenever a register read or write occurs. This means that when the Electron is not loading or saving, having a Plus 3 attached to the emulation has a near-zero processing cost - in truth just a call/ret pair per process pool iteration.
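The 'only calculate state when it needs to be known' idea can be sketched as below. The class name LazyDiscInterface and its members are hypothetical, and the real controller simulation is replaced by a simple cycle counter.

```cpp
#include <cstdint>

using Cycles = std::uint32_t;

// Hypothetical sketch of a component that defers all work until a
// register is touched. Not ElectrEm's actual Plus 3 code.
class LazyDiscInterface {
public:
    // Called once per process pool iteration: just bank the time.
    void run(Cycles n) { pending_ += n; }

    // Any register access first brings the device up to date.
    std::uint8_t readRegister() {
        catchUp();
        return lastWrite_;
    }
    void writeRegister(std::uint8_t value) {
        catchUp();
        lastWrite_ = value;
    }

    Cycles simulatedCycles() const { return simulated_; }
    unsigned catchUps() const { return catchUps_; }

private:
    void catchUp() {
        if (pending_ == 0) return;
        simulated_ += pending_;  // stand-in for the real controller simulation
        pending_ = 0;
        ++catchUps_;
    }

    Cycles pending_ = 0;    // time banked but not yet simulated
    Cycles simulated_ = 0;  // time the device has actually been run for
    unsigned catchUps_ = 0;
    std::uint8_t lastWrite_ = 0;
};
```

While no register is accessed, each iteration costs only the addition in run(). The accumulated time is simulated in one go on the next read or write.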

As the WD1770 often ends up in internal loops, it maintains a separate thread that never overlaps in time with the calling thread, analogously to the 6502 emulation. The unfortunate difference is that due to the design decision to only update state on request, the WD1770 is often asked to update for cycle counts as low as 6. At this level, the cost of using semaphores for communication could become a serious consideration.

Due to this, the WD1770 implements the concept of 'idling'. For example, suppose it is in a 'read sector' operation (a type II command). In that case, in the inner loop, the WD1770 sits around until it detects an address mark, then inspects the address to determine if the interesting sector has arrived. To this end, the WD1770 enters an idle state, calculating how many cycles will occur until the next sector arrives (as, being an emulator, it can look ahead), and the semaphore exchange is skipped until the WD1770 is to be allowed to run for at least that many cycles. In this way, using a normal ADFS disc and the normal load/save operations, the maximum cost is 17*5 = 85 semaphore pairs per second, as there are 5 revolutions per second, and 17 events that might be interesting on any given revolution - 16 sectors plus the index hole.
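The look-ahead at the heart of idling might be sketched like this. It assumes the interesting events on a track (address marks and the index hole) are known as sorted cycle offsets from the start of a revolution. The function name and that representation are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using Cycles = std::uint32_t;

// Hypothetical helper: eventPositions holds the cycle offsets, measured
// from the index hole, of every event the controller cares about. It must
// be non-empty, sorted ascending, and every entry must be < revolution.
// Returns how many cycles pass before the next event strictly after `now`.
inline Cycles cyclesUntilNextEvent(const std::vector<Cycles>& eventPositions,
                                   Cycles revolution, Cycles now) {
    const Cycles pos = now % revolution;
    const auto next =
        std::upper_bound(eventPositions.begin(), eventPositions.end(), pos);
    if (next != eventPositions.end()) return *next - pos;
    // Nothing left on this revolution: wrap round to the first event.
    return revolution - pos + eventPositions.front();
}
```

A controller thread using something like this need only be woken once the caller has at least that many cycles to give it. Shorter update requests can be banked without any semaphore traffic.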

Users of low end machines may see slow emulator performance when performing disc accesses.

The file formats which record only sector contents (including all the normal BBC Micro related floppy formats) are clocked so that they produce a disc surface read speed of 31,200 bytes/second (double density - ADFS) or 15,600 bytes/second (single density - DFS). This, conveniently, is 64 cycles/byte for double density and 128 cycles/byte for single.
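The cycles/byte figures follow from the emulated clock rate. Taking the 39,936 cycles per 1/50th of a second quoted earlier gives 1,996,800 cycles per second, and the division comes out exact. The constant and function names below are illustrative only.

```cpp
#include <cstdint>

// 39,936 cycles per 1/50th of a second, as given in the notes above.
constexpr std::uint32_t kCyclesPerFrame = 39936;
constexpr std::uint32_t kFramesPerSecond = 50;
constexpr std::uint32_t kCyclesPerSecond = kCyclesPerFrame * kFramesPerSecond;

// CPU cycles that elapse while one byte passes under the drive head.
constexpr std::uint32_t cyclesPerByte(std::uint32_t bytesPerSecond) {
    return kCyclesPerSecond / bytesPerSecond;
}

static_assert(kCyclesPerSecond == 1996800, "emulated clock rate");
static_assert(cyclesPerByte(31200) == 64, "double density (ADFS)");
static_assert(cyclesPerByte(15600) == 128, "single density (DFS)");
```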

File formats which offer different bit rates, either explicitly or implicitly (e.g. by supplying some number other than 6,250 bytes on a track in double density mode), are reproduced at the correct rate.

01-04-2007 01:16:19
The contents of this article are licensed from Wikipedia.org under the GNU Free Documentation License.