In this post we pick up where we left of last time and start looking at the design and implementation of the USB device that I have been working on. First things first though, the code for the entire project can be found in this github repository. The reader is advised to have that readily available.

The first major goal of this series will be to have the device play along during the USB enumeration process so that the host can set address and read relevant descriptors. This can be verified by making sure that the device shows up properly when issuing a lsusb on my Linux workstation.

Design of usbdev (and SoC)

Clock recovery

The signaling in USB consists of the differential pair D+ and D-. For a full speed (FS) device the bit rate is 12Mbit/s so if we were able to sample at exactly the right spot a 12Mhz clock should suffice, in reality though this is the tricky bit. To aid in clock recovery USB employs both NRZI encoding and bit-stuffing to ensure that the differential pair will contain a level transition at least every seven bit times.

With this in mind it would seem a reasonable approach to run the design at 48Mhz and oversample the differential pair with a factor of four. More precisely by obtaining four equally spaced samples for each bit time we should be able to adjust the actual sampling position (in terms of 1/4 bit times) to be as far away for any level transition as possible (i.e. in the middle of the eye diagram).

So running at a 48Mhz clock we have a 2-bit counter (reg [1:0] cntr) incrementing each cycle (except for when adjusting). When the counter equals zero we perform the real sample. For every 48Mhz cycle we also sample and shift the value into a four bit shift register (reg [3:0] past). Since we want any possible level transition to occur in the middle of this shift register we either advance or delay the counter with one increment depending on if a transition occurred early or late in the shift register.

  // A bit transition should ideally occur between past[2] and past[1]. If it
  // occurs elsewhere we are either sampling too early or too late.
  assign advance = past[3] ^ past[2];
  assign delay   = past[1] ^ past[0];

If advance is active the counter increments by two and if delay is active there is no increment (for the given 48Mhz cycle).

After seeing some transitions this should be able to adjust the sampling point to where the signal lines are stable.

This is not the only option for clock recovery though but it is the simplest one to implement. However if the bit rate was significantly higher compared to the frequency our design is clocked at (e.g. USB high-speed at 480Mbit/s) other methods would have to be used. In such cases I would suspect that sampling at the exact bit rate and then phase adjusting the clock (with help of a PLL) until the synchronization pattern is reliably detected would be the method of choice. Of course simply phase adjusting until the synchronization pattern is detected would not be enough, you probably want a FSM to find the midpoint of highest and lowest phase adjustment that makes the pattern detectable.

A variant of the previous approach would be to use adjustable delay lines on the inputs instead of phase adjusting the clock.

The HW/SW interface

To control the USB block some kind of hardware/software interface needs to be created. I have chosen the simplest possible design that came to mind.

Each endpoint has a 8 byte buffer in RAM, starting at RAM address zero comes the 16 OUT endpoints immediately followed by the 16 IN endpoints. In total 256 bytes of RAM are used for endpoint buffers.

In addition the following registers are exposed to the CPU.

Register Access Address
R_USB_ADDR R/W 0x20000000
R_USB_ENDP_OWNER R/W 0x20000004
R_USB_CTRL W 0x20000008
R_USB_IN_SIZE_0_7 R/W 0x2000000C
R_USB_IN_SIZE_8_15 R/W 0x20000010
R_USB_DATA_TOGGLE R/W 0x20000014
R_USB_OUT_SIZE_0_7 R 0x20000018
R_USB_OUT_SIZE_8_15 R 0x2000001C

I have not bothered documenting them in detail yet but essentially it is as follows.

  • R_USB_ADDR - is the 7-bit device address.
  • R_USB_ENDP_OWNER - The 16 low bits correspond to the 16 OUT endpoints and the 16 high bits correspond to the 16 IN endpoints. A set bit indicates that the corresponding endpoint buffer is handed over to the USB block. This means that the corresponding endpoint will accept one IN/OUT packet (with data) and ACK, after which the bit will be cleared and the buffer is handed over to the CPU. When the CPU owns a buffer the USB block will respond with NAK to all IN/OUT+DATA packets.
  • R_USB_CTRL - Control pull-ups for attach.
  • R_USB_IN_SIZE_0_7 - 4-bits per endpoint indicate the number of bytes in the corresponding buffer.
  • R_USB_IN_SIZE_8_15 - 4-bits per endpoint indicate the number of bytes in the corresponding buffer.
  • R_USB_DATA_TOGGLE - The 16 low bits select the data toggle (DATA0/DATA1) to be expected for the 16 OUT endpoints and the 16 high bits select the data toggle to be sent for the 16 IN endpoints.
  • R_USB_OUT_SIZE_0_7 - 4-bits per endpoint indicate the number of bytes in the corresponding buffer.
  • R_USB_OUT_SIZE_8_15 - 4-bits per endpoint indicate the number of bytes in the corresponding buffer.

Of course a primitive interface like this requires a fair amount of CPU intervention and if one wants to offload the CPU and achieve higher performance a lot of this handling could be automated by the USB block itself.

SoC

The resulting SoC consists of the USB device block, a PicoRV32 RISCV CPU, RAM and ROM. From the CPU’s side the memory map is as follows.

Base Size Memory
0x00000000 16KB ROM
0x10000000 4KB RAM
0x20000000 32B USB control & status registers

The USB device block’s endpoint buffers reside in RAM and the block has priority over the CPU when accessing. RAM is constructed of four 8-bit banks. The CPU is connected by a 32-bit bus while the USB block has a 8-bit bus. As mentioned USB has priority when accessing RAM but in practice this should result in minimal stall cycles for the CPU as its clock is significantly higher than the rate of which USB will read/write bytes.

Simulation

The simulation environment is based around Verilator and a set of C++ classes to build, manipulate and dissect USB packets.

usb-pack-gen

The USB packet generation and manipulation code is found in sim/usb-pack-gen.h and sim/usb-pack-gen.cpp. It allows both encoding and decoding of USB packets. In essence it consists of three layers

  • UsbPacket - Base class for the various USB packet types (SETUP,IN,OUT,DATA0,DATA1,ACK,NAK) that allow easy high level manipulation. UsbPacket is derived into the various packet types.
  • USbBitVector - A sequence of USB bits.
  • UsbSymbolVector - A sequence of USB symbols (J,K,SE0,SE1).

There are methods for translating in both directions performing the necessary steps such as CRC calculation, bit-stuffing and NRZI encoding.

Test-suite

The test-suite is invoked by the sim/runner.pl script that will execute all tests found in sim/tests (or the ones given as argument). Each sim/tests/test_XXX.c consists of both firmware code to be compiled for the RISCV CPU as well as usb-pack-gen code compiled into the Verilator based simulation environment.

To get a feel for what a test looks like I suggest studying sim/tests/test_003.c. It is probably a good idea to start with running the test and looking at the decoded USB traffic.

./runner.pl tests/test_003.c
sigrok-cli -i test_003.sim.sr -P usb_signalling:dp=0:dm=1,usb_packet | awk '/usb_packet-1: [^:]+$/{ print $0 }'

Should give us something like this (but without the host/device annotations I added manually).

host   : OUT ADDR 0 EP 0
host   : DATA0 [ 23 64 54 AF CA FE ]
device : ACK
host   : IN ADDR 0 EP 1
device : NAK
host   : IN ADDR 0 EP 1
device : DATA0 [ 24 65 55 B0 CB FF ]
host   : ACK
host   : IN ADDR 0 EP 1
device : NAK

So the host sends six bytes of data to endpoint zero and the device acknowledges it. The host tries to read from endpoint one but the device is busy (the dummy loop reading the address register) so responds with a not acknowledge. The host tries to read again and this time the device responds with the byte array with each element incremented by one. The host acknowledges the received data. The host tries to read yet again but now the endpoint buffer has been handed back to the CPU so the USB block responds with not acknowledge.

Now with this in mind it should be rather clear what is going on in the test case.

Test-suite artifacts

When run the test-suite outputs several useful artifacts. These are

  • test_XXX.comp.log - Any messages (errors and warnings) from the firmware compile.
  • test_XXX.sim.log - Log and debug printouts from the simulation.
  • test_XXX.sim.vcd - RTL simulation waveform in VCD format.
  • test_XXX.sim.sr - Captured USB signaling in sigrok’s format.
  • test_XXX.elf - Firmware code for the RISCV CPU.
  • test_XXX.so - Shared object with test-case code to be loaded into the simulator.

A real world example

I thought we finish this post by using some work-in-progress driver code (sw/usb-dev-driver.c) to perform the first steps of the USB enumeration process with the ULX3S connected to my Linux workstation. The logic analyzer captured the following trace (5M samples at 5Mhz but only 68KB file size with sigrok’s native format). The reader is urged to decode it by at least using

sigrok-cli -i ulx3s-usbdev-1.sr -P usb_signalling:dp=1:dm=0,usb_packet | awk '/usb_packet-1: [^:]+$/{ print $0 }'

but preferably also in PulseView to better see reset signaling etc.

In summary the following happens in the trace. The host resets the bus, the device descriptor is read, the host resets the bus, the host sets the device’s address to 23, the host reads the device descriptor once again, the host reads the configuration descriptor (first nine bytes only to figure out total size), the host reads the total size of the configuration descriptor (which includes interface and endpoint descriptors but in our case there are none so the size is still 9 bytes). Finally the host tries to set the configuration of the device but this is not yet implemented in the firmware and the device responds with NAK indefinitely.

On my Linux workstation the dmesg log contained the following lines

[ 2980.864737] usb 1-5: new low-speed USB device number 16 using xhci_hcd
[ 2981.014361] usb 1-5: config 0 has no interfaces?
[ 2981.014369] usb 1-5: New USB device found, idVendor=1234, idProduct=5678
[ 2981.014372] usb 1-5: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 2981.014647] usb 1-5: config 0 descriptor??
[ 2986.192817] usb 1-5: can't set config #0, error -110

and a lsusb -v contained

Bus 001 Device 016: ID 1234:5678 Brain Actuated Technologies
Couldn't open device, some information will be missing
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass          255 Vendor Specific Class
  bDeviceSubClass       255 Vendor Specific Subclass
  bDeviceProtocol       255 Vendor Specific Protocol
  bMaxPacketSize0         8
  idVendor           0x1234 Brain Actuated Technologies
  idProduct          0x5678
  bcdDevice            1.00
  iManufacturer           0
  iProduct                0
  iSerial                 0
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength            9
    bNumInterfaces          0
    bConfigurationValue     0
    iConfiguration          0
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              100mA

As can be seen my made up idVendor actually corresponded to a registered vendor as can be confirmed in Linux usb.ids.

Slightly strange though is that lsusb claims that the device address is 16 while the trace clearly contains a SET_ADDRESS and subsequent use of address 23.

Next steps

The logical next steps would be to continue working on the driver code until the enumeration process completes successfully supporting all required requests. Likely some RTL bugs will show up in the process but we will try to deal with them when they do.

That is it for today. As always if you have questions or feedback - leave a comment below!