The ARM firmware and Replay FPGA framework


Here is a brief overview of what happens when the Replay board is switched on. “ARM” here always means the specific processor found on the Atmel microcontroller (SAM7).

The only configuration stored on the board is the ARM firmware in flash. Everything else (the FPGA cores) is ALWAYS loaded from SD card, except for an “emergency FPGA setup”:

The ARM starts its own bootloader at flash address 0x00100000
(this part is found on SVN in “…\sw\bootloader\replay_loader_v1.2”)

– it sets up the ARM PLL and its I/O (including serial debug)
– it checks whether the Replay menu button is pressed; this is the bootloader loop:
  – if it is pressed, it initialises the USB as an HID device and waits for communication (an LED blinks on the board)
    – after two seconds it checks the menu button again; if it is still pressed, it stays in this mode
    – if a USB connection is established, it waits for address/data packets to be flashed
  – if the button is not pressed
    – it de-initialises the USB device
    – it jumps to flash address 0x00102000 and thus starts the Replay loader

– the bootloader part forwards (or “replicates”) all ARM vectors from 0x001000xx to 0x001020xx,
so interrupts etc. can be used by the Replay loader as well
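
The address arithmetic and the button decision above can be modelled in a few lines. This is only an illustrative sketch in Python, not the firmware code: the two base addresses come from the text, while the function names and the returned labels are hypothetical.

```python
BOOTLOADER_BASE = 0x00100000  # ARM bootloader start in flash (from the text)
LOADER_BASE = 0x00102000      # Replay loader start in flash (from the text)


def forwarded_vector(addr):
    """Map a bootloader vector address 0x001000xx to 0x001020xx."""
    offset = addr - BOOTLOADER_BASE
    if not 0 <= offset <= 0xFF:
        raise ValueError("not a bootloader vector address")
    return LOADER_BASE + offset


def next_step(menu_button_pressed):
    """Reduce one pass of the bootloader loop to its decision (labels are hypothetical)."""
    if menu_button_pressed:
        # Stay in the bootloader: USB HID is up, waiting for address/data packets.
        return "usb_hid_flash_mode"
    # Otherwise USB is de-initialised and execution continues in the Replay loader.
    return "jump_to_0x%08X" % LOADER_BASE
```

For example, `forwarded_vector(0x00100018)` yields `0x00102018`, and `next_step(False)` yields the jump to the Replay loader.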

The Replay boot loader starts at 0x00102000 and is thus developed and flashed independently of the ARM bootloader
(this part is found on SVN in “…\sw\arm_sw\Replay_Boot”)

– it sets up the ARM PLL and I/O again (the settings may differ from the bootloader’s)
– if configured at boot time (not in default deliveries), it initialises the USB CDC device for debug (alternative to serial)
– it enters the main loop:
  – it initialises an SD card and shows a message if the SD card is changed
  – if no SD card is found and the FPGA is not initialised yet (this applies to the latest firmware found on SVN):
    – it configures the FPGA with a default setup (stored compressed on the ARM flash)
    – it initialises the on-board PLL with default settings (HD27 mode)
    – it de-initialises the video coder (if fitted)
    – it initialises the video buffer
    – it checks if the FPGA is sane (SPI status readout)
    – it initialises the video DAC (via FPGA)
    – it initialises the DRAM train (via FPGA)
    – it initialises the OSD and shows a message asking for an SD card to be inserted
    – it waits until an SD card is inserted; key entries are handled to detect a full board reset (F11 button)
  – if an SD card is found:
    – it loads the “replay.ini” file in the root directory of the SD card
    – it parses the first part of the ini, including on-board PLL and FPGA configuration
    – it loads the FPGA configuration (the default is “loader.bin”)
    – it parses the second part of the ini, including initialising memories on the FPGA (e.g. ROM files)
  – it handles the OSD and keypresses
  – it handles generic FDD/HDD data exchange between SD card and FPGA
  – it handles external FPGA configuration (it detects a drop of the FPGA done signal and re-initialises the SDRAM parameters)
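
The branching in the main loop can be summarised as a small decision function. This is a rough Python model of the list above, not the actual Replay_Boot source: all step names are hypothetical labels, and the behaviour when no SD card is present but the FPGA is already initialised is an assumption (the text does not describe that case).

```python
# Step labels are hypothetical; they only mirror the order given in the text.
FALLBACK_STEPS = (
    "configure_fpga_from_flash_default",
    "init_onboard_pll_hd27",
    "deinit_video_coder",
    "init_video_buffer",
    "check_fpga_status_via_spi",
    "init_video_dac",
    "init_dram",
    "show_insert_sd_message",
)

SD_STEPS = (
    "load_replay_ini",
    "parse_ini_part1_pll_and_fpga",
    "load_fpga_configuration",
    "parse_ini_part2_memories",
)


def setup_steps(sd_present, fpga_initialised):
    """Return the setup steps one pass of the main loop would run."""
    if sd_present:
        return SD_STEPS
    if not fpga_initialised:
        return FALLBACK_STEPS
    # Assumption: with no SD card and an FPGA that is already set up,
    # nothing is reconfigured and the loop keeps waiting for a card.
    return ()
```

With no card and a fresh board, `setup_steps(False, False)` starts with the compressed default FPGA setup from flash; with a card inserted, it starts with `load_replay_ini`.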

With an SD card inserted, further INI files can be loaded or core configuration can be changed with a menu setup also defined in the INI file.

Details about INI files are explained here:

Dynamic ARM code from SD card aka rApps (= Replay Applications):

Such an example is found on SVN in “…\sw\arm_sw\Replay_Apps\rAppFlashUpdater”.

This feature is required, for example, by the latest flasher, which runs directly from SD card. While the flash is being written it cannot be used for execution, so the code has to be placed in the (embedded) SRAM of the processor and executed from there. To avoid wasting ARM flash on storing this SRAM code, it makes sense to load it from SD card as well.

To keep the rApp code small enough to fit in the ARM SRAM, it does not handle the SD card and filesystem, only FPGA communication. As the ARM does not have a lot of RAM, the FPGA memory (or even DRAM) is used via this FPGA communication. The procedure to load rApps is therefore a little complicated, but it allows dynamic code to run very efficiently on the ARM and still provides plenty of memory for additional data required (and used) by the application:
From the Replay boot loader (running in flash), controlled by a special INI file:
– ARM code is copied to FPGA memory (or DRAM)
– additional code (e.g. the new flash code) is copied to FPGA memory (or DRAM)
– an additional data (RAM) block on the FPGA memory contains config information the rApp may require
(like flash checksums)
– The ARM code is then copied from FPGA back to ARM SRAM. This routine has a very small SRAM footprint, keeping
most of the SRAM for the rApp to be loaded. Then the execution continues on this loaded code in SRAM.
– The rApp initialises the SPI interface again and fetches all it needs from the FPGA memory.
– For the flasher, it can compare checksums between the FPGA memory content and its own flash content etc.
– Finally, it writes the FPGA memory content back to flash in case an update is needed.
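
The last two steps of the flasher (compare checksums, then write only if needed) can be sketched as follows. This is a Python model under stated assumptions, not the rAppFlashUpdater code: the text does not say which checksum is used, so CRC-32 via `zlib.crc32` is a stand-in, the FPGA memory is modelled as a plain dictionary, and the key `new_flash_code` is hypothetical.

```python
import zlib


def update_needed(staged_image, flash_content):
    """Compare checksums of the staged image and the current flash content.

    CRC-32 is an assumption; the real checksum algorithm is not named in the text.
    """
    return zlib.crc32(staged_image) != zlib.crc32(flash_content)


def run_flasher(fpga_memory, flash):
    """Write the staged image into `flash` (a bytearray) only if it differs.

    Returns True if flash was rewritten, False if it was already up to date.
    """
    image = fpga_memory["new_flash_code"]  # hypothetical key for the staged code
    if len(image) > len(flash):
        raise ValueError("staged image does not fit into flash")
    if not update_needed(image, bytes(flash[:len(image)])):
        return False
    flash[:len(image)] = image
    return True
```

Running the function twice on the same staged image shows the intent: the first call rewrites the flash model, the second finds matching checksums and leaves it untouched.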

The FPGA framework aka (dynamic) FPGA configuration set up by the ARM:

– The default is found on SVN in “…\hw\replay\cores\loader”
(–> but it may be of course any other project in “…\hw\replay\cores\*”)

This loader version is delivered as the default, together with the “replay.ini” in the root directory of an SD card, and can be seen as the reference setup for the Replay framework.

It contains a test bench and all configuration needed for synthesis (UCF file for ISE). You will also find simple batch files for generating the FPGA bin file and for starting a simulation. You just need a decent Web ISE development setup from Xilinx.

The main files are:
– Replay_Top.vhd:
  – maps all I/O for the Replay board
  – should not be changed, as it matches the above setup and test bench
– Core_Top.vhd:
  – top-level configuration
  – embeds your own core and needs to be adapted accordingly
– Core_Video.vhd:
  – the example “core”, which can be replaced by your own development

The example implements:
– The generic Replay video generator, handling several (configurable) video modes
– the OSD menu support
– it provides basic mouse handling
– it shows by the background image example how to use the SDRAM
– it shows a test image (a coloured cross and white frame) for checking display setups and video modes
– it provides a test sound to show how to use the audio DAC

Some more words on the framework, on-board/FPGA clocking and video generation

The framework provides a generic clocking scheme which should be used as is, since it is also properly constrained in the UCF. Setting up your own clocks is not recommended; use this clocking with proper enabling (clock gating) instead.

–> SYSCLK is derived from CLK_A on the Replay board (:4); the source is y0 from the on-board PLL
–> AUDCLK and AUXCLK are derived from CLK_B on the Replay board (:1); the source is y2 from the on-board PLL
–> VIDCLK is derived from CLK_C on the Replay board (:1); the source is y4 from the on-board PLL
You may also want to set up y1 on the on-board PLL for the video coder (Composite/SVHS out).
y3 and y5 are for expansion cards and usually not needed (yet).

You can find a simple calculation sheet from INI setups to clock values on SVN in “…\hw\replay\doc\PLL_SETUP.ods”

The y0 clock needs to be higher than 100MHz, as the derived SYSCLK must be above 25MHz to properly support all SPI speeds the ARM will use (depends on the used SD card and max. transfer speed there) for ROM uploads, OSD handling and so on. It is also used for the DDR RAM clock.
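
The divider relations and the SYSCLK constraint can be checked with a little arithmetic. The following Python sketch only encodes what is stated above (SYSCLK = y0 / 4, AUDCLK and AUXCLK = y2, VIDCLK = y4, SYSCLK above 25 MHz); the function names and the example frequencies are hypothetical and are not taken from PLL_SETUP.ods.

```python
# FPGA clock name -> (on-board PLL output, divider), as listed in the text.
CLOCK_SOURCES = {
    "SYSCLK": ("y0", 4),
    "AUDCLK": ("y2", 1),
    "AUXCLK": ("y2", 1),
    "VIDCLK": ("y4", 1),
}

SYSCLK_MIN_MHZ = 25.0  # SYSCLK must be above this, so y0 must exceed 100 MHz


def derived_clocks(pll_outputs_mhz):
    """Compute the FPGA clock frequencies (MHz) from the PLL output frequencies."""
    return {
        name: pll_outputs_mhz[source] / divider
        for name, (source, divider) in CLOCK_SOURCES.items()
    }


def sysclk_ok(y0_mhz):
    """True if the SYSCLK derived from y0 is above the 25 MHz minimum."""
    return y0_mhz / 4 > SYSCLK_MIN_MHZ


# Example values are made up for illustration only.
clocks = derived_clocks({"y0": 114.0, "y2": 49.152, "y4": 27.0})
```

With the made-up y0 of 114 MHz, SYSCLK comes out at 28.5 MHz and passes the check, while a y0 of exactly 100 MHz gives 25 MHz and fails it.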

If you want to use the Replay video generator (which you should, in order to support all DVI/HDMI modes and so on), the idea is as follows:

–  The core runs solely on SYSCLK, including its own video generation, which is usually a classic one (digital RGB out with H/V sync lines).
–  The video generator driving the on-board video DAC runs on VIDCLK (and “drives” the DVI connector).

–  The Replay video converter maps and syncs the SYSCLK-domain video signals to the Replay video generator.

–  An Android app is available to calculate PLL settings for SYSCLK and VIDCLK based on the video format of the core and the video format to be used on the DVI connector (on SVN in “…\sw\tools\ReplayPllCalc”). It runs on a real device or any Android emulator.

If SD video formats are to be used on Composite/SVHS out as well, all three on-board PLLs are used:
– one for SYSCLK (y0)
– one for the PAL/NTSC coder clock (colour burst frequency on y1)
– one for VIDCLK (y4)

In this situation, the AUXCLK/AUDCLK on y2 has to be derived from one of the existing PLL clocks and will very likely be off the optimum frequency the audio DAC expects. This should be no issue at all, though.

Example implementation

You can check out the VIC20 core as an implementation that fully uses Mike’s framework library. One version supports an HD video mode with progressive/double-scan output for DVI/HDMI; another supports an SD video mode with interlaced output, like the original core, for HDMI/Composite/SVHS. It also shows how ROM uploads etc. can be implemented.
