Big Picture: For about a year there has been an effort to build a RISC-V vector accelerator tailored to machine learning workloads. The one innovation we are attempting that nobody else in the world does is to implement an instruction called fused dot-product.
A fused dot-product is a mechanism for introducing user-defined rounding in arithmetic: rounding is deferred until the whole dot-product has been accumulated. If you do that, what you gain is the ability to use smaller representations for the weights in the ML model. A traditional linear algebra execution engine also computes a dot-product, but it does so with a fused multiply-add. In that case every sum in the “sum of products” that makes up the dot-product is a rounding event. With a large matrix, which is typical in a machine learning environment, you have millions of rounding errors.
The reason this is a problem for machine learning is that the matrices being used tend to be multi-modal. What that really means (from a numerical point of view) is that you have multiple peaks of energy that you are trying to resolve. If you have rounding error, one of these peaks will attenuate to the point where it is no longer resolvable.
This is the problem that fused dot-product solves.
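The effect of per-step rounding can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the function names are hypothetical, and accumulating in `double` is a stand-in for a real fused dot-product, which would accumulate exactly and round once. It is not the accelerator's instruction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Conventional dot product: one fused multiply-add per element, so the
// running sum is rounded to float after every step.
float dot_fma(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc = std::fma(x[i], y[i], acc);  // rounding event on every iteration
    }
    return acc;
}

// Stand-in for a fused dot product (assumption: double is "wide enough" for
// this example). Products are accumulated in the wider type and the result
// is rounded to float only once, at the end.
float dot_deferred_rounding(const std::vector<float>& x,
                            const std::vector<float>& y) {
    assert(x.size() == y.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += static_cast<double>(x[i]) * static_cast<double>(y[i]);
    }
    return static_cast<float>(acc);  // single rounding event
}
```

With `x = {1e8f, 1.0f, -1e8f}` and `y = {1.0f, 1.0f, 1.0f}`, `dot_fma` returns 0 because the 1 is lost when it is added to 1e8 in single precision, while `dot_deferred_rounding` returns 1. The small term is the kind of "peak" that per-step rounding can erase.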
From a verification point of view, the real fun starts with the fact that we are going to bring in different arithmetic. We could bring in IEEE floating point, bfloat from Google, Microsoft fp9, or the number system called posit, which is a tapered floating point.
There is published research demonstrating that an 8-bit posit can compete with single-precision IEEE floating point in these machine learning applications if you use fused dot-product. That is the context of the project.
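To make "different arithmetic" concrete from the testbench side, here is a minimal sketch of producing the bit pattern for one of these formats. It assumes bfloat means the 16-bit bfloat16 format, which keeps the sign, the 8 exponent bits, and the top 7 fraction bits of an IEEE single. The function names are hypothetical and the rounding mode (round to nearest, ties to even) is an assumption; in the project these bit patterns would come from the golden model library rather than hand-written code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Convert an IEEE single to a bfloat16 bit pattern, rounding to nearest with
// ties to even (assumed rounding mode).
std::uint16_t float_to_bfloat16_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the bits
    if (std::isnan(f)) {
        // Keep the sign and exponent, force a quiet NaN payload.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    const std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that survives
    bits += 0x7FFFu + lsb;                        // round half to even
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widen a bfloat16 bit pattern back to an IEEE single (always exact).
float bfloat16_bits_to_float(std::uint16_t h) {
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Usage check: a value with few significant bits survives the round trip.
inline void bfloat16_self_check() {
    assert(bfloat16_bits_to_float(float_to_bfloat16_bits(1.5f)) == 1.5f);
}
```

A testbench would compare patterns like these bit for bit against what the hardware block produces, which is why the rounding mode has to be agreed on up front.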
As you can imagine, there are on the order of 7000 blocks of interest here, each with its own complexity and needs.
The project we are starting here is to build a testbench and verification strategy that lets us add the important blocks progressively over time.
We will start off with an integer ALU. There is not much complexity in that, but it gives you a vehicle to start thinking about the whole verification strategy.
One of the things we are trying to do is adhere to an open-source strategy, so that anybody can pick up this GitHub repo and be productive with it without needing industrial tools. One open-source vehicle that is industrial strength for verification is Verilator.
In a nutshell, the project is to build a Verilator verification environment, i.e. a structure in which we can set up testbenches that are executed with Verilator. What is interesting in this project is that we are going to tie the Verilator piece to a golden model arithmetic library, and that is something you can publish, as nobody else in the world has that.
It is a Verilator testbench environment that uses an online arithmetic library to generate the right bit pattern. We are not relying on random tests; we are using a golden model. As you progress from an ALU to a vector accelerator, you will have a vector lane, a vector register file, a vector load/store unit, and vector instructions.
We can layer this testbench in such a way that when a new project resource comes online and needs to implement (for example) a vector scale instruction, there is a reusable testbench structure which that person can quickly pick up, set as their basic structure, build their functionality into, and have a working testbench environment in which they can test their work.
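One way such a reusable structure could look is sketched below. Everything here is hypothetical: `UnitTestbench`, `VscaleStimulus`, and the two `vscale_*` functions are illustrative names, the vector scale is modeled on 32-bit unsigned integers, and the "DUT" is a plain C++ stub standing in for a Verilated module.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Generic layer: drives a unit and a golden model with the same stimuli and
// counts the results that differ. A new functional unit only supplies the
// two callables and a stimulus type.
template <typename Stimulus, typename Result>
struct UnitTestbench {
    std::function<Result(const Stimulus&)> drive_dut;
    std::function<Result(const Stimulus&)> golden;

    std::size_t run(const std::vector<Stimulus>& stimuli) const {
        assert(drive_dut && golden);
        std::size_t mismatches = 0;
        for (const Stimulus& s : stimuli) {
            if (!(drive_dut(s) == golden(s))) ++mismatches;
        }
        return mismatches;
    }
};

// Unit-specific layer for a hypothetical vector scale instruction.
struct VscaleStimulus {
    std::uint32_t scalar;
    std::vector<std::uint32_t> elements;
};

// Golden model: element-wise multiply, wrapping modulo 2^32.
std::vector<std::uint32_t> vscale_golden(const VscaleStimulus& s) {
    std::vector<std::uint32_t> out;
    out.reserve(s.elements.size());
    for (std::uint32_t e : s.elements) out.push_back(s.scalar * e);
    return out;
}

// Stub standing in for the hardware: shift-and-add multiplier.
std::vector<std::uint32_t> vscale_dut_stub(const VscaleStimulus& s) {
    std::vector<std::uint32_t> out;
    out.reserve(s.elements.size());
    for (std::uint32_t e : s.elements) {
        std::uint32_t acc = 0, m = s.scalar, x = e;
        while (m != 0) {
            if (m & 1u) acc += x;
            x <<= 1;
            m >>= 1;
        }
        out.push_back(acc);
    }
    return out;
}

// Returns the number of mismatches over a few directed stimuli.
std::size_t run_vscale_example() {
    UnitTestbench<VscaleStimulus, std::vector<std::uint32_t>> tb{
        vscale_dut_stub, vscale_golden};
    const std::vector<VscaleStimulus> stimuli = {
        {0u, {1u, 2u, 3u}},
        {3u, {0u, 7u, 0xFFFFFFFFu}},
        {0x80000000u, {2u, 3u}},
    };
    return tb.run(stimuli);
}
```

The design choice in this sketch is that the comparison loop lives in one place, so adding a unit means writing only its stimulus type, its driver, and its golden function.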
The first round of HDP will be a testbench driven purely by a C++ Verilog testbench that brings in the golden model. This is the first step: you take a Verilog module that is a transformation of some sort, and you tie it into a Verilator testbench that can drive that module and can also drive the golden result. So, this is Verilator-specific in the sense that the testbench is really the main function of a C++ program. When you bring in a Verilog module, Verilator transforms it into C++ code, which then gets driven by that main function.
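The shape of such a first-round testbench for the integer ALU might look like the sketch below. The ALU ports, the opcode encoding, and all names are assumptions for illustration. To keep the sketch compilable on its own, `AluDutStandIn` takes the place of the class Verilator would generate from the Verilog source; in a real flow that class comes from a generated header and is driven the same way, by setting its port members and calling `eval()`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Assumed opcode encoding for a hypothetical integer ALU.
enum AluOp : std::uint8_t { ALU_ADD = 0, ALU_SUB = 1, ALU_AND = 2, ALU_OR = 3 };

// Stand-in for the Verilator-generated model of the Verilog module.
// Ports are public members; eval() propagates inputs to outputs.
struct AluDutStandIn {
    std::uint8_t op = 0;
    std::uint32_t a = 0, b = 0, result = 0;
    void eval() {
        switch (op) {
            case ALU_ADD: result = a + b; break;
            case ALU_SUB: result = a + (~b + 1u); break;  // two's complement
            case ALU_AND: result = a & b; break;
            default:      result = a | b; break;
        }
    }
};

// Golden model. For an integer ALU plain unsigned C++ arithmetic is enough;
// for other number systems this is where the golden library would be called.
std::uint32_t golden_alu(std::uint8_t op, std::uint32_t a, std::uint32_t b) {
    assert(op <= ALU_OR);
    switch (op) {
        case ALU_ADD: return a + b;
        case ALU_SUB: return a - b;
        case ALU_AND: return a & b;
        default:      return a | b;
    }
}

// What would be the body of main() in the testbench: drive the module and
// the golden model with the same directed operands and compare bit patterns.
// Returns the number of mismatches.
int run_alu_testbench() {
    const std::uint32_t operands[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u,
                                      0xFFFFFFFFu};
    AluDutStandIn dut;
    int mismatches = 0;
    for (std::uint8_t op = ALU_ADD; op <= ALU_OR; ++op) {
        for (std::uint32_t a : operands) {
            for (std::uint32_t b : operands) {
                dut.op = op;
                dut.a = a;
                dut.b = b;
                dut.eval();
                const std::uint32_t expected = golden_alu(op, a, b);
                if (dut.result != expected) {
                    std::printf("MISMATCH op=%u a=%08x b=%08x got=%08x want=%08x\n",
                                static_cast<unsigned>(op),
                                static_cast<unsigned>(a),
                                static_cast<unsigned>(b),
                                static_cast<unsigned>(dut.result),
                                static_cast<unsigned>(expected));
                    ++mismatches;
                }
            }
        }
    }
    return mismatches;
}
```

In an actual testbench, `main()` would return a nonzero exit code when the mismatch count is nonzero, so that a Makefile target can fail on it.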
There are two problem statements here:
One is to set up the environment so that you have the right Makefile and the right structure, which you can expand to new functional units.
The second is how to link in the golden model.
Summary of project execution:
1. Articulate a productive organization in which testbenches are built.
2. Then we need a productive Makefile, so that a programmer or designer can simply say “make test” and have the test environment.
3. Then we need integration with the golden model, which brings in a new GitHub repo; that machinery needs to be figured out.