VLM5030 gate-level design validation and lock-step comparison

Now with the extracted gate-level design in place, how does it perform?

The initial validation consisted of two basic phases:

  1. VHDL simulation in a test bench that dumps the PCM audio to a binary file. The binary is imported with Audacity and converted to WAV format for listening.
    Goal: Prove that the design can actually generate correct samples in a flexible environment.
  2. Integration of the gate-level design in several Konami FPGA conversions like Track’n’Field, Hypersports, Yie-Ar-Kung-Fu and others
    Goal: Prove that the design interfaces correctly with the target systems and compare audio with real hardware.

Each method has its pros and cons. Simulation enables maximum controllability and observability of internal signals, but its execution speed is slow and variation of input is limited. Running the replacement in FPGA conversions adds variation in terms of system interfacing and sample/function coverage but offers almost zero control and observe features. One can just run the game and listen to the samples that are played in attract mode or in the first few levels.

To bridge both worlds, I created a test rig consisting of a small Cyclone II board that contains the gate-level replacement plus the ROM and also controls an external VLM5030 chip. It enables full control over the selection of speech samples without the need to trigger specific in-game situations.

Block diagram of the VLM5030 test rig

Operation of the chip and the replacement is observed by a logic analyzer that’s hooked to the address bus. This makes use of VLM5030’s feature to output audio as 10 bit signed integer PCM samples on the address bus. Tracing both the chip’s and the replacement’s audio stream from a common trigger enables direct comparison of their PCM output with sample rate granularity.

The results are quite impressing – the screenshot below shows a range of Salamander’s “Destroy them all!” speech sample. Topmost channel is the gate-level design, followed by the chip and a re-run of the chip as the bottom channel. They’re perfectly in sync and sample values appear to match as well!

Detail of Salamander’s “Destroy them all!” at 20 ms. From top to bottom: GL design, VLM5030, VLM5030 2nd run

However, there are locations where all three waveforms begin to diverge (around the 0.108 s mark). It’s not necessarily just a difference between the gate-level design and the chip, but also the chip produces a different waveform during its 2nd run:

Detail of Salamander’s “Destroy them all!” at 100 ms. From top to bottom: GL design, VLM5030, VLM5030 2nd run

All three converge later at the 0.270 s mark and continue in lock-step:

Detail of Salamander’s “Destroy them all!” at 255 ms. From top to bottom: GL design, VLM5030, VLM5030 2nd run

Conclusion

Initial comparison results show that the gate-level design produces identical audio waveforms for most of the frames. It differs during frames where the chip itself exhibits seemingly non-deterministic behaviour. In addition, the gate-level design also shows such behaviour and produces different waveforms during repeated runs of the same sample. Further analysis is required to understand this in more detail.

References

Gate-level replacement: salamander_gl.wav

VLM5030 chip: salamander_chip.wav

VLM5030 chip, 2nd run: salamander_chip_2ndrun.wav

Extract, reconstruct, simulate … repeat

VLM5030 die, highlighted ROMs

When I started to search for the random generator the first challenge was the decision where to begin with. Looking at the die, there are not many landmarks that provide clear guidance. The ROMs are quite prominent due to their regular structure, but they don’t have a direct relationship to the random generator. Furthermore, the purpose of two of the three ROMs was entirely unclear.

VLM5030 die, highlighted pad ring

Next on the list of obvious items are the pads of course. They neatly follow the DIP40 pin-out and are assigned to functions in no time. Should I start off from the data or address bus? That would mean to dig through data paths whereas I expected the random generator to be located in a processing or calculation block. Somewhere in the middle between K-factors and final PCM output. With potentially deep logic cones on either side, not an appealing idea. I briefly considered entering the design via the few control pins, but the perspective to untangle piles of potentially complex control logic wasn’t appealing at all.

VLM5030 OSC1 and OSC2 surrounding the oscillator circuitIn the end I decided for tracing the clock tree first. That’s a single source signal, easy to identify from the pads, and – if Sanyo designed a reasonable clock system – fans out to all function blocks. I also hoped that the logic in the clock tree is less complex and that the purpose of such logic is easy to understand.

The OSC2 pin feeds directly into the clock generation system where the 3.58 MHz clock is divided by 2 before being used anywhere else. Identifying the divider was my first lesson in tracing logic from the die shot. After much back and forth I figured that all these polygons are just standard logic gates – and that they actually make sense. I mean, it’s located in the center of the die, and a /2 divider can be expected at the clock tree’s root.

Transistor-level circuit of the OSC2 clock divider

OSC2 /2 divider

Curiosity was fueled enough to tackle further gates in the neighbourhood to the divider. This time I faced a number of flip-flops – they look different than the logic gates but once I understood the concept of 2-phase clocking and latch/feedback topology, they turned out to be positive edge triggered D-type registers arranged in a shift chain with feedback. Not too bad for starters, but isn’t that a bit too easy? Let’s build it a VHDL model and check the extracted circuit in simulation. The waveforms from simulation backed the extraction result: That shift chain generates 10+ staggered clocks.

Simulation waveforms of VLM5030's derived clocks

Core clocks of the VLM5030, clk2 is oscillator /2

Step by step I paved my way through the design (still searching for the random source):

  1. Extract the gates of a sub-circuit
  2. Reconstruct the sub-circuit as a VHDL model based on the extraction result
  3. Simulate the model to examine its behaviour and to validate the extraction
  4. Repeat with the next sub-circuit

At the point when the random source had been identified, I looked back at quite a collection of models running nicely in simulation. Should I stop here or keep on moving until I would hit a wall maybe?

TLDR; I decided to continue, repeatedly cycled through above steps and didn’t hit any walls. The VLM5030 gate-level replacement is available at FPGAArcade’s github repo.

ROM extraction

The previous post introduced a tracing procedure to extract logic gates from VLM5030’s layout information. This post shows how the procedure can be extended to extract VLM5030’s embedded ROMs.

Simple ROMs

The image below shows a portion of the sequencer ROM:

Tracing of sequencer ROM bitline 0 and equivelant circuit

Bitline 0 in the sequencer ROM

  • The blue metal bar is GND
  • The horizontal red metal bar is bitline 0 (out of 37)
  • The vertical orange polysilicon wires represent a total of 12 word lines
  • The greenish overlays on polysilicon are 5 transistors that short the bitline to GND under control of the respective word line

Applying the tracing procedure extracts the function of bitline 0: It’s a NOR with 5 inputs.

To generalize the picture, we can conclude that each bitline is a NOR with 12 inputs. Each input is either 0 or 1, depending on the presence of a word line transistor:

  • No word line transistor –> ‘0’
  • Word line transistor present –> ‘1’

The other bitlines follow the same NOR concept, just with different combinations of word lines as inputs. Not too bad, this ROM is just a stacked pile of NORs.

Translated to VHDL:

-- 12 word lines
xwl <= na0 & a0 & a1 & na1 & na2 & a2 & a3 & na3 & na4 & a4 & xa5 & xa6;
-- bitlines as NOR of word lines with transistor pattern
xromdo <= (
  00 => norf(xwl, "100110100100"),
  01 => norf(xwl, "010110010100"),
  02 => norf(xwl, "011010011000"),
  ...

Array of ROM slices

Extracting the ROM that stores the K-factors is a bit more complex since it’s partitioned into 6 independent tables. Each of which connects to a common bus to output the data.

Shown below is bitline 0 exhibiting the same composition of word line transistors as before. There are two differences this time, though: The bitline is gated by 2 enable transistors (ena2 & ena0) and there’s no pull-up transistor for termination.

Bitline 0 in KROM slice 2

Bitline 0 in KROM table 2

Remember the statement about logic gates without termination? The job’s not done until we hit a pull-up – it ain’t over ’till the fat lady sings.

Tracing back further, we end up at a fat pull-up transistor that finally terminates all the bitlines 0 from each of the tables:

KROM bit 0 as distributed complex NORThe corresponding logic function follows the hierarchical structure of the K-factor ROM:

  • 10 NOR gates [9:0], generating the 10 bit output data vector (NOR[0] in the image above)
    • Each NOR with up to 6 inputs, collecting the corresponding bitlines of the 6 tables (bitlines 0 of tables 2 and 3 in the image above)
      • AND function, enabling the bitlines of the currently active table
        • The bitlines themselves, represented by OR functions of the word lines where transistors are present

 

Fat lady by カロリーネ

Job done.

Logic gates extraction

The technology

We need to set the stage first for this post. The VLM5030 is built from depletion-load NMOS logic. This means there are basically two types of devices:

  • Enhancement-mode NMOS transistors for shorting a node to GND (or forwarding voltage between two nodes)
  • Depletion-mode NMOS transistors to act as pull-ups to VDD

In addition, there are a total of three layers to connect devices in decreasing order or conductivity:

  • Metal (red) for general signal routing and VDD & GND distribution
  • Polysilicon (orange) for general signal routing and transistor gates
  • Diffusion (transparent) for routing over small distances or of uncritical signals

Layers are interconnected with vias:

  • Metal to polysilicon
  • Metal to diffusion

A remarkable item is that the VLM5030 doesn’t connect polysilicon directly to diffusion. Consequently, whenever a signal needs to change from polysilicon to diffusion or vice versa, there’s first a via from polysilicon to metal and next to it a second via connecting metal to diffusion. Other chips like the MOS6502 solve this with buried contacts that allow direct connections between polysilicon and diffusion. So Sanyo used a less advanced technology, probably to save on cost per wafer. But it actually helps the tracing task since the layer change is pretty obvious this way.

Basic logic gates

To understand which kind of logic gate is built by a particular group of transistors, apply the following basic tracing procedure:

  1. Start from the GND metal
  2. Follow the path(s) along the enhancement-mode transistors
  3. Stop at the pull-up transistor
Basic logic gates NOT, NAND, NOR, XOR. Layout and schematic

Tracing procedure applied to basic logic gates

The pull-up transistor acts as termination, so there must always exist one in a gate; not zero, not two, exactly one. It corresponds to the gate’s dot.

This aspect is important for tracing – the logic gate is incomplete without a pull-up.