Last information update: 19th October 1998
Last correction: 3rd August 2000
This document is, like the Acorn Machine List, now frozen and I am not going to be updating it further - beyond correcting any mistakes in the pre-existing information. It is still useful as a historical document about the older ARM chips but I no longer have the time to keep tabs on what ARM is doing and keep this document current.
As of writing there are nine commercially available ARM processor cores, with up to ten possibly being available soon. I will separate the processors into two sections: those currently available and those soon to be available.
No banked R8 and R9 in FIQ mode.
No multiply instruction.
LDR/STR instructions with register specified shift amounts. (Withdrawn from
the ARM2.)
No co-processor interface or co-processor instructions.
USR : user mode
IRQ : interrupt mode (with a private copy of R13 and R14.)
FIQ : fast interrupt mode (private copies of R8 to R14.)
SVC : supervisor mode. (private copies of R13 and R14.)
Only non USR mode code may change the processor mode. This provides hardware security, as long as the hardware and physical memory are only accessible from privileged, i.e. non USR mode, code. Due to the top six bits of the program counter being used to hold the processor status flags this chip was restricted to addressing 26 bits of memory, or a 64 Megabyte address space. In actuality there are eight bits of processor status held in the PC register. Because an ARM instruction is always four bytes long the bottom two bits of the PC were always an implied zero when the register was being used as a PC. When that register is used for other operations the bottom two bits reflect the mode the processor is operating in. (00 - USR, 01 - IRQ, 10 - FIQ & 11 - SVC)
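The arithmetic behind the 64 Megabyte limit can be checked with a short calculation. This is only a sketch in Python; the constant names are mine and are not taken from any ARM documentation.

```python
# Sketch of how the 32 bits of R15 are divided up on the 26 bit ARM chips.
# All names here are illustrative only.
REGISTER_BITS = 32
FLAG_BITS = 6        # processor status flags held in the top of R15
MODE_BITS = 2        # processor mode held in the bottom two bits

# What is left over holds the word address of the instruction.
word_address_bits = REGISTER_BITS - FLAG_BITS - MODE_BITS

# Every instruction is four bytes long, so the bottom two bits of the
# byte address are an implied zero and do not need to be stored.
byte_address_bits = word_address_bits + 2

address_space_bytes = 2 ** byte_address_bits
address_space_megabytes = address_space_bytes // (1024 * 1024)

print(word_address_bits, byte_address_bits, address_space_megabytes)
```

So the six flag bits plus the two mode bits make up the eight bits of processor status, leaving 24 bits of word address, which is 26 bits of byte address.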
A three stage instruction pipeline allows the chip to execute instructions quickly with a fairly low transistor count. One side effect of the pipeline is the ability to get a 'free' rotation/shift on every instruction, as one stage of the pipeline dealt exclusively with a barrel shift of a given register. Combined with the conditional execution of every instruction, long runs of code without branches, which stall the pipeline, could be achieved, allowing a fairly high instruction execution speed for the clock rate. (About 0.6 instructions per clock cycle on average)
The ARM2 chip was the first, not to mention only, incarnation of this cell and was clocked at 8 MHz giving an average performance of 4-4.7 MIPS.
Rumours abound of 20 MHz ARM2 chips having been produced and used.
Currently no information about this has been seen by the document
maintainer.
Finally one new instruction was added, the SWP instruction: an atomic register to memory swap command useful for multi-processor arrays.
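To show why an indivisible swap is useful, here is a rough Python sketch of a semaphore built on it. It is a toy model and not ARM code: the Memory class, the FREE and TAKEN values and the function names are all made up for illustration, and a Python lock stands in for whatever the hardware does to stop other accesses getting in between the read and the write.

```python
import threading

class Memory:
    """Toy word-addressed memory; NOT a model of any real ARM bus."""

    def __init__(self, size):
        self.words = [0] * size
        # Stands in for the hardware keeping the read and write together.
        self._bus_lock = threading.Lock()

    def swp(self, address, new_value):
        """Store new_value at address and return the old contents,
        with no other access allowed in between."""
        with self._bus_lock:
            old_value = self.words[address]
            self.words[address] = new_value
            return old_value

FREE, TAKEN = 0, 1   # hypothetical semaphore values

def try_claim(memory, semaphore_address):
    """Claim a semaphore: we own it only if it was FREE before the swap."""
    return memory.swp(semaphore_address, TAKEN) == FREE

def release(memory, semaphore_address):
    memory.swp(semaphore_address, FREE)

memory = Memory(16)
first = try_claim(memory, 0)    # semaphore was free, so this succeeds
second = try_claim(memory, 0)   # already taken, so this fails
release(memory, 0)
third = try_claim(memory, 0)    # free again after the release
print(first, second, third)
```

Because the old value comes back in the same operation that writes the new one, two processors can never both see the semaphore as free.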
Several speeds of ARM3 chips were produced. Initially 26 MHz varieties were released with the A540 machines, then 25 MHz versions were used in the A5000 and 24 MHz ones in the A4. Finally a 33MHz version was produced and used in the alpha variant of the A5000.
However these are merely the bulk produced versions. Many third parties have taken lower rated ARM3 chips and tested them at higher speeds, sorting the ARM3s into classes of chips that could work at varying speeds. Consequently there are hordes of ARM3 chips out there running at varying speeds. The highest I have heard of is about 37 MHz.
A second incarnation of the chip was as the ARM250 which was a 12MHz variant of the ARM3 cell and had the IOC1, VIDC1a and MEMC1 chips all integrated into the one chip but unlike the normal ARM3 it had no processor cache. The ARM250 delivered about 7 MIPS performance.
A 24 MHz ARM3 using a 12MHz main memory will produce an average speed of
execution of 13.26 MIPS. At 33 MHz 17.96 MIPS is delivered.
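As a quick sanity check on these figures, the MIPS ratings can be divided by the clock speed to give the average number of instructions completed per clock. The sketch below uses only the numbers quoted above; the list and function names are my own.

```python
# Clock speed (MHz) and quoted MIPS figures, as given in the text above.
quoted = [
    ("ARM250", 12, 7.0),
    ("ARM3 at 24 MHz", 24, 13.26),
    ("ARM3 at 33 MHz", 33, 17.96),
]

def mips_per_mhz(mhz, mips):
    """Average instructions completed per clock cycle."""
    return mips / mhz

for name, mhz, mips in quoted:
    print(f"{name}: {mips_per_mhz(mhz, mips):.2f} MIPS per MHz")
```

All three come out between 0.5 and 0.6 instructions per clock, so the extra performance of the faster parts is coming from the clock speed rather than from doing more work per cycle.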
User32 - 32 bit USR mode.
Supervisor32 - 32 bit SVC mode. (private SPSR register)
IRQ32 - 32 bit IRQ mode. (private SPSR register)
FIQ32 - 32 bit FIQ mode. (private SPSR register)
Abort32 - Memory fetch abort mode. (private SPSR register)
Undefined32 - Undefined instruction mode. (private SPSR register)
The SPSR register is a Saved Processor Status Register and holds a copy of the CPSR (Current Processor Status Register) when the new mode is entered. The addition of the Abort32 mode and this change, although the CPSR/SPSR is really a corollary of the change to 32bits, allows the ARM6 cell to easily handle virtual memory without the contortions you had to go through on earlier ARM chips.
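The saving and restoring of the status register can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the Processor class and its method names are invented for illustration, and real hardware also does other work on entering a mode (such as saving a return address in R14) which is left out here. The mode numbers are the standard five bit values held in the bottom of the CPSR.

```python
# 32 bit mode numbers as held in the bottom five bits of the CPSR.
MODES = {
    "User32": 0b10000,
    "FIQ32": 0b10001,
    "IRQ32": 0b10010,
    "Supervisor32": 0b10011,
    "Abort32": 0b10111,
    "Undefined32": 0b11011,
}
MODE_MASK = 0b11111
N_FLAG = 1 << 31   # the Negative flag, used below as an example flag

class Processor:
    """Toy model holding only the status registers."""

    def __init__(self):
        self.cpsr = MODES["User32"]
        # One private SPSR for each mode other than User32.
        self.spsr = {name: 0 for name in MODES if name != "User32"}

    def current_mode(self):
        number = self.cpsr & MODE_MASK
        return next(name for name, value in MODES.items() if value == number)

    def enter(self, mode):
        """Enter a mode, saving the old CPSR in that mode's SPSR."""
        self.spsr[mode] = self.cpsr
        self.cpsr = (self.cpsr & ~MODE_MASK) | MODES[mode]

    def leave(self):
        """Return by copying the current mode's SPSR back to the CPSR."""
        self.cpsr = self.spsr[self.current_mode()]

cpu = Processor()
cpu.cpsr |= N_FLAG        # user code has set a flag
cpu.enter("Abort32")      # a memory abort occurs
cpu.cpsr &= ~N_FLAG       # the abort handler disturbs the flags
cpu.leave()               # flags and mode come back exactly as they were
print(cpu.current_mode(), bool(cpu.cpsr & N_FLAG))
```

The point is that the interrupted code gets its flags and mode back untouched, whatever the handler did in between, which is what makes restarting an aborted memory access straightforward.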
Two new instructions for reading and writing the CPSR and SPSR registers were added. The program counter is now fully 32 bit with the CPSR being hardware shifted into position when the PC is read in 26 bit modes. (for backwards compatibility.) The ARM6 cell is fully binary compatible, in the 26 bit modes, with code for the earlier ARM cells. The chip is fully static: the clock can be slowed to any speed and the processor will maintain state. Finally the cell can work in either big-endian or little-endian operation and can be hardware switched between the two modes. The total transistor count of the ARM6 cell (not chip) is 36,000.
Several versions of the ARM6 cell have been produced. The ARM61 is a hardwired version of the ARM6 cell in ARM2/3 compatibility mode. This chip cannot enter the 32bit address/processor modes. The ARM600 range of chips is an ARM6 cell with an inbuilt MMU, on chip cache similar to the ARM3 chip's, an eight deep write back buffer with two independent addresses and a total transistor count of 360,000. The cache has had performance tweaks, is now controlled by the MMU and has been adjusted for 32 bit addressing. Three ARM610 chip speeds have been produced. One at 20 MHz delivering 17 MIPS, one at 30 MHz delivering 26 MIPS performance and finally one at 33MHz giving around 27-28 MIPS.
Also available are the ARM60 (an ARM6 cell as a chip, without anything
else.), ARM650 (An ARM6 with some RAM & peripheral controllers. Designed
for embedded control systems.), ARM6l (lower power ARM6 cell) and the
ARM60l (lower power version of the ARM6 cell as a chip.).
Most of what is new in the ARM7 cell is internal changes on timings for various signals. The ARM700 chip has a larger on chip cache (8kb, and radically altered for power efficiency) than the ARM600, improving cache hit rates. It also has twice the number of translation lookaside entries in the MMU and twice the number of addresses on the write buffer. (Presumably now four addresses can be written to before the buffer stalls.) At 40MHz the ARM710 delivers about 36 MIPS, or around a 40% improvement over the ARM610.
ARM7 series devices are ARM7 (chip cell core.), ARM7D (the chip core
with debugging support.), ARM7DM ( an ARM7D with an enhanced multiply.),
ARM7DMI (an ARM7DM with ICEbreaker (tm). ICEbreaker is on chip support for
In-Circuit-Emulation.), ARM70DM (ARM7DMI as a chip.), ARM700 (ARM7 + MMU +
cache + Writeback Buffer.) and the ARM7500 (ARM7 + MMU + cache + Writeback
Buffer + IOMD + VIDC20). Nearly all of these cores can be offered with the
Thumb core as well.
Fabricated on 0.5 micron process the chip is listed as delivering 80
MIPS performance with a 3.3 Volt device at 80 MHz. This is over twice the
performance of an ARM7 chip and lives up to the initial 'roadmap' promises
made about the ARM family. However its performance is eclipsed by the
StrongARM devices for raw processing power.
In terms of the instruction set there is one new instruction added, the halfword load/store for moving 16 bit data units. Complete code compatibility with earlier processors is not guaranteed because of two factors. Firstly the extended pipeline means stack calls that store the Program Counter will have a value of the PC a full sixteen bytes ahead of the currently executing instruction, rather than the more normal eight bytes. Secondly the split cache introduces problems with self modifying code, where code is first executed, then treated as data and manipulated, and an attempt is then made to execute the altered code before the old version has been flushed from the instruction cache.
Such code fragments will break. Fortunately such code tends to be fairly rare and confined to the OS (SWI handlers in particular). Produced on a 0.35 micron process the SA110 part achieves 115 MIPS at 100 MHz, 185 MIPS at 160 MHz and 230 MIPS at 200 MHz. The SA1100 part is designed for portable applications and contains an SA core, MMU, read/write buffers (probably a Level 1 cache and write buffer akin to the SA110 part), PCMCIA support, colour/greyscale LCD controller and general purpose IO controller (including two serial ports and USB support). It can be clocked at 133 or 200 MHz and consumes less than 500 mW of power.
Finally Digital have announced and demoed a 300 MHz SA part with an
'Attached Media Processor' designed to improve video playback, audio
processing and allow a software modem to be implemented. Details on this
part are sketchy and likely to remain so till the design is ready to go
into full production.
It is initially going to be offered as two parts, the ARM9TDMI (Thumb,
Debug support, 64bit Multiply and ICEBreaker In Circuit Emulation) - which
is the base core part, and the ARM940T. The ARM940T offers, above and
beyond the base core, 4kb Instruction/Data caches, a write buffer (8 words,
4 independent addresses), AMBA bus interface, external co-processor
support and a protection unit for embedded applications (requires no
address translation and allows eight protected areas of memory, each with
its own size and level of protection). Both parts are fabricated at 0.35
microns, clock at 150 MHz (producing 165 MIPS) with the ARM9TDMI consuming
225 mW and the ARM940T 675mW.
At full use it consumes about 150mW of power, however when doing an idle
loop this drops to 0.1mW. It is produced on a 0.5 micron, three-layer,
process similar to the ARM8. Currently no commercially released versions
exist, but this may change in the near future as investment is being
sought.
Initially planned versions include the ARM10TDMI core with the ARM1020T processor built around this core but adding an MMU with demand paged virtual memory support, a 32Kb harvard style level 1 cache (most likely 16Kb Instruction and 16Kb Data caches à la the StrongARM), write buffer and an enhanced AMBA bus interface. Exact power consumption figures haven't been released but I expect the ARM1020T will consume between 0.6 and 1 Watt of power at 300 MHz.
It now remains to be seen whether the ARM10 beats Intel's efforts with its StrongARM series. Details of the StrongARM 2 are due to be released soon.
The ARM Architecture is built around a programmer's model of sixteen general purpose registers and a variety of processor modes. Each processor mode offers differing levels of memory access, manipulation of the PC & mode and its own private registers.
By default the programmer 'sees' 16 User mode registers, but when in other modes various registers are swapped out with registers particular to that mode. This table summarises the various modes and registers.
USR             IRQ         FIQ         SVC
R0
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10                         R10_fiq
R11                         R11_fiq
R12                         R12_fiq
R13             R13_irq     R13_fiq     R13_svc
R14             R14_irq     R14_fiq     R14_svc
R15 (aka PC)

Where a register isn't named in the table, the USR mode register is visible.
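The rule for reading the table can be written down as a small lookup. This is just a Python sketch of the table above; the BANKED dictionary and the visible_register function are names I have made up for the purpose.

```python
# Private (banked) registers for each mode, following the table above.
BANKED = {
    "USR": set(),
    "IRQ": {13, 14},
    "FIQ": {10, 11, 12, 13, 14},
    "SVC": {13, 14},
}

def visible_register(mode, number):
    """Return the name of the physical register seen as R<number> in a mode."""
    if not 0 <= number <= 15:
        raise ValueError("register number must be 0 to 15")
    if number in BANKED[mode]:
        # The mode has its own private copy of this register.
        return f"R{number}_{mode.lower()}"
    # Otherwise the ordinary USR mode register shows through.
    return f"R{number}"

print(visible_register("FIQ", 10))   # private FIQ register
print(visible_register("IRQ", 10))   # shared with USR mode
print(visible_register("SVC", 14))   # private SVC link register
```

Note that this follows the table as given here, where R8 and R9 are not banked in FIQ mode.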
To help keep interrupt latency to a minimum FIQ (Fast Interrupt Request) mode has a reasonably large set of private registers, allowing interrupt code to execute in registers as much as possible. If there is only one FIQ claimant allowed at a time, a stricture RISC OS stipulates, a further optimisation of pre-loading these registers can be performed.
By convention, and partially enforced by the instruction set, R14 is the 'link' register - commonly holding the return address of any sub routine call. The BL (Branch and Link) instruction automatically stores the correct return address in R14. All registers are general purpose, including R15 which is the Program Counter, status flags and mode register all in one. It holds 26 bits of word aligned address, two bits of processor mode in bits 1 & 0 (00 - USR, 01 - IRQ, 10 - FIQ & 11 - SVC) and six bits of processor status (Negative, Carry, oVerflow, Zero, Interrupt Request Disable and Fast Interrupt Request Disable).
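Pulling R15 apart into its three pieces is a matter of masking. The Python sketch below does this for a made up value. The positions of the individual flags are not spelt out in the text above; the layout assumed here (N, Z, C, V, I, F running from bit 31 down to bit 26) is the usual one for the 26 bit ARMs, and the function and constant names are my own.

```python
MODE_NAMES = {0b00: "USR", 0b01: "IRQ", 0b10: "FIQ", 0b11: "SVC"}

# Assumed flag positions in R15, from bit 31 downwards.
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}

ADDRESS_MASK = 0x03FFFFFC   # bits 25 to 2: the word aligned address

def decode_r15(value):
    """Split a 26 bit style R15 into address, mode and status flags."""
    flags = {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {
        "address": value & ADDRESS_MASK,
        "mode": MODE_NAMES[value & 0b11],
        "flags": flags,
    }

# A made up example: address &8000, SVC mode, N set and IRQs disabled.
example = decode_r15(0x88008003)
print(hex(example["address"]), example["mode"], example["flags"])
```

The address mask throws away the bottom two bits, which is safe because they are always zero for a word aligned instruction address.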
Instructions include Load/Store (Register, Multiple registers, Byte), Move
(and Move NOT), Addition (Add, Add with Carry, Subtract, Subtract with Carry,
Reverse Subtract, Reverse Subtract with Carry), Comparison (Compare and
Compare Not), Boolean Logic (Test, Test Equivalence, And, Exclusive Or, Or,
Bit Clear), Program Flow (Branch, Branch with Link) and the Software
Interrupt.
Version 2 - ARM2
This architecture added a banked R8 and R9 in FIQ mode, the LDR/STR
instruction with register specified shift amounts was withdrawn and two new
'classes' of instruction were added - these being Multiply (multiply
and multiply accumulate) and co-processor control (Data operation,
co-processor data to ARM register, ARM register to co-processor, Load &
Store).
Version 2as - ARM3 & ARM250
Functionally identical to the v2 architecture, this variant added one extra
instruction, SWP, and allocated co-processor zero to be CPU identification and
cache control.
Version 3 - ARM6, ARM7 & Amulet 1
This update to the ARM architecture removed the 26bit restriction on the PC, allowing full 32bit addressing for both data and code. (Previously only data could be addressed across the full 32bit address range.) As a result the dodge of storing processor flags mixed in with the PC in register 15 was no longer possible and a new set of registers was added to hold processor state. For each processor mode the registers CPSR (Current Processor Status Register) and SPSR (Saved Processor Status Register) were added. Two new processor modes were added as well: Abort32 and Undefined32. For backwards compatibility the chip could be set to emulate the older 26bit mode of operation. A further improvement included the ability to change the byte order of the chip from little-endian to big-endian operation.
All this required the addition of new Move instructions (SPSR to register,
CPSR to register, register to SPSR, register to CPSR, immediate constant to
SPSR and immediate constant to CPSR.) to communicate with the status registers
for each processor mode.
Version 3M
This extension of the version three architecture gave extended Multiply
opcodes including unsigned long, unsigned accumulate long, signed long and
signed accumulate long multiplies.
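What these four operations compute can be shown with a little Python. This is a sketch of the arithmetic only and says nothing about timing or encoding; the function names are mine, with the usual ARM mnemonics (UMULL, UMLAL, SMULL and SMLAL) noted in the comments. Each takes two 32 bit values and produces a 64 bit result split across a high and a low register.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(value):
    """Interpret the bottom 32 bits of value as a two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def split64(value):
    """Split a 64 bit result into (high word, low word)."""
    value &= MASK64
    return (value >> 32) & MASK32, value & MASK32

def join64(hi, lo):
    return ((hi & MASK32) << 32) | (lo & MASK32)

def unsigned_long_multiply(rm, rs):                   # UMULL
    return split64((rm & MASK32) * (rs & MASK32))

def unsigned_accumulate_long(rm, rs, hi, lo):         # UMLAL
    return split64((rm & MASK32) * (rs & MASK32) + join64(hi, lo))

def signed_long_multiply(rm, rs):                     # SMULL
    return split64(to_signed32(rm) * to_signed32(rs))

def signed_accumulate_long(rm, rs, hi, lo):           # SMLAL
    return split64(to_signed32(rm) * to_signed32(rs) + join64(hi, lo))

hi, lo = unsigned_long_multiply(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(hi), hex(lo))
hi, lo = signed_long_multiply(0xFFFFFFFF, 2)   # -1 times 2
print(hex(hi), hex(lo))
```

The same bit pattern gives very different answers in the two cases, which is why separate signed and unsigned forms are needed once the result is wider than the operands.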
Version 4 - StrongARM, ARM8 & ARM9
The new instructions first introduced in the 3M
architecture now become part of the main architecture in version 4.
Additionally a Halfword (16bit) load/store instruction was added.
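As an illustration of what a halfword load/store does, here is a Python sketch working on a block of bytes. The function names and the big_endian switch are my own, and the insistence on even addresses is an assumption of this sketch rather than something stated above.

```python
def store_halfword(memory, address, value, big_endian=False):
    """Write the bottom 16 bits of value to two consecutive bytes."""
    if address % 2:
        # Assumption of this sketch: halfwords sit on even addresses.
        raise ValueError("halfword address must be a multiple of two")
    order = "big" if big_endian else "little"
    memory[address:address + 2] = (value & 0xFFFF).to_bytes(2, order)

def load_halfword(memory, address, big_endian=False):
    """Read two consecutive bytes back as an unsigned 16 bit value."""
    if address % 2:
        raise ValueError("halfword address must be a multiple of two")
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[address:address + 2], order)

memory = bytearray(8)
store_halfword(memory, 2, 0x12345678)            # only &5678 is stored
store_halfword(memory, 4, 0x5678, big_endian=True)
print(memory.hex(), hex(load_halfword(memory, 2)))
```

Without such an instruction, moving a 16 bit quantity means two byte transfers or a word load followed by shifting and masking.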
Version 5 - ARM10
Some sketchy details are starting to come out about this
architecture. But as yet the actual architecture refinements are not
available.
Vector Floating Point v1 - ARM10
Developed concurrently with ARM Architecture Version 5 this is a new
floating point system giving the ARM family considerably faster floating
point performance. As with the Version 5 architecture details are
unavailable on exactly what this architecture implements.
Finally for the latest information and details regarding the ARM family of
processors why not visit ARM Ltd's homepages
where details on current and upcoming ARM processors are kept.
Philip R. Banks