Real Time Implementation of BitNetMCU

Because the VSDSquadron Mini has limited flash space, we had to reduce the number of layers in the neural network compared to the demo implementation.


  1. Overview
  2. Components Required
  3. Project Flow
  4. Circuit Diagram
  5. BitNetMCU Implementation
    1. Training Neural Network model
    2. Exporting Model weights to C file for using with VSD
    3. Testing Output prediction of model
    4. Generating dll file for inference
  6. Uploading BitNetMCU to VSDSquadron and realtime implementation
    1. Overview
    2. ToolChain Installation
    3. Demo implementation with test images stored on board while flashing
      1. Implementation
      2. Result
    4. Real Time Implementation with USART communication with Arduino and OV7670 integration
      1. State Diagram
      2. Connection Diagram
      3. Installation
      4. Camera Capture and Testing
      5. Image Capturing, Compression and sending over UART
      6. VSDImplementation


This is a simple implementation of a low-bit quantized neural network on a RISC-V microcontroller. The project is based on a RISC-V microcontroller and the MNIST dataset.


  • Board: VSD Squadron Mini
  • Processor used: CH32V003F4U6 chip with a 32-bit RISC-V core based on the RV32EC instruction set
  • Dataset used: MNIST
  • Algorithm used: Low-bit quantized neural network

Low-bit quantized neural network

Neural network quantization is a powerful technique for deploying deep learning models on resource-constrained devices.
By reducing the memory footprint and computational requirements of models, quantization enables efficient inference and improved privacy.

Why Quantization?

Quantization is an easy way to compress a model. It can be applied to an existing model, often with only a small loss of accuracy.

Types of quantization-

  1. Weight Quantization
    The weights of the neural network are stored at lower precision, typically 8 or 16 bits.
  2. Activation Quantization
    The activations of the neural network are also represented using fewer bits, following a similar approach to weight quantization.
  3. Post Training Quantization
    In post-training quantization, the weights and activations are quantized to lower precision without the need to retrain the model.
  4. Quantization Aware Training
    The quantized model is trained to minimize the accuracy loss due to quantization. This involves adjusting the learning rate, batch size, and other hyperparameters to optimize the performance of the quantized model.
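
As a rough illustration of weight quantization applied after training, the sketch below maps a few floating-point weights to signed 8-bit integers with a single scale factor and then measures the reconstruction error. The function names and the sample weights are made up for this example; it is not the scheme BitNetMCU itself uses.

```python
def quantize_symmetric(weights, bits=8):
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.05, 0.89, -0.33]
codes, scale = quantize_symmetric(weights, bits=8)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print("largest reconstruction error:", max_error)
```

Rounding to the nearest code keeps the error of every weight within half a quantization step (scale / 2), which is why moderate bit widths usually cost little accuracy.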

MNIST dataset

The MNIST dataset is a collection of images of handwritten digits that is commonly used for training machine learning models.

Key Features-

  • The dataset consists of 60,000 training images and 10,000 testing images.
  • Each image is a 28×28 grayscale image.
  • The digits are size-normalized to fit into a 20×20 pixel bounding box and centered in the 28×28 image; anti-aliasing during normalization introduces the grayscale levels.
  • MNIST is widely used for training and testing in the field of machine learning, especially for image classification tasks.

Dataset Structure

The MNIST dataset is split into two subsets:

  • Training Set: This subset contains 60,000 images of handwritten digits used for training machine learning models.
  • Testing Set: This subset consists of 10,000 images used for testing and benchmarking the trained models.

Sample Image of the MNIST dataset

The example showcases the variety and complexity of the handwritten digits in the MNIST dataset, highlighting the importance of a diverse dataset for training robust image classification models.


Components Required for BitNetMCU

  • VSD Squadron Mini
  • USB Cable
  • Camera module (optional)

Components Required for real time BitNetMCU

  • VSD Squadron Mini
  • USB Cable
  • Camera module (Arduino OV7670 camera)
  • Jumper wires
  • Battery
  • 7 segment display
  • Breadboard
  • Push button

Flow of the project

BitNetMCU Implementation

  • Configuration
  • Model Training
  • Exporting the quantized model
  • Testing the C-model
  • Deploying the model on the VSD Squadron Mini
  • Testing demo accuracy on VSD Squadron Mini

BitNetMCU real time Implementation

  • Connecting the camera module to the VSD Squadron Mini
  • Capturing images using the camera module
  • Preprocessing the images
  • Inferring detected digits using the model on the VSD Squadron Mini
  • Displaying the output on the 7 segment display

Circuit Connection for BitNetMCU

No hardware connections are required for BitNetMCU; we only have to connect the VSD Squadron Mini to the computer using a USB cable.

Circuit Connection for BitNetMCU real time

The following connections are required for BitNetMCU real time implementation:

BitNetMCU Implementation

Step 1: Install all required libraries

pip install -r requirements.txt

Step 2: Setup Configuration

Edit the trainingparameters.yaml file and update the configuration settings as follows.

Quantization Settings

  • QuantType:4bitsym
    • Specifies the quantization method to be used. ‘4bitsym’ stands for symmetric 4-bit quantization, which reduces the precision of weights to 4 bits symmetrically around zero.
  • BPW:4
    • Stands for Bits Per Weight, indicating that each weight in the model will be represented using 4 bits.
  • NormType:RMS
    • Specifies the normalization technique. ‘RMS’ (Root Mean Square) normalization is used to standardize the range of independent variables or features of data. Other options include ‘Lin’ for linear normalization and ‘BatchNorm’ for batch normalization.
  • WScale:PerTensor
    • Defines the scale application strategy. ‘PerTensor’ means that scaling is applied to the entire tensor, whereas ‘PerOutput’ would apply scaling to each output individually.
  • quantscale:0.25
    • This parameter sets the scale of the standard deviation for each tensor relative to the maximum value, effectively controlling the spread of weight values after quantization.
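
To make the quantization settings more concrete, here is a minimal sketch of a symmetric 4-bit grid without a zero level (levels at ±0.5, ±1.5, … ±7.5 times a step size), followed by packing two 4-bit codes into each byte. The way `quantscale` sets the step size is only one possible reading of the description above (one standard deviation spans `quantscale` of full scale), and the function names and byte layout are assumptions for illustration; the actual BitNetMCU code may differ.

```python
import math
import statistics


def quantize_4bitsym(weights, quantscale=0.25, bpw=4):
    """Return (codes, step); integer code k stands for the level (k + 0.5) * step."""
    half = 2 ** (bpw - 1)                         # 8 levels on each side of zero
    std = statistics.pstdev(weights)
    # Assumed reading of quantscale: std = quantscale * full scale.
    step = std / (quantscale * half) if std > 0 else 1.0
    # floor() picks the nearest half-integer level; clamp to the 4-bit range.
    codes = [max(-half, min(half - 1, math.floor(w / step))) for w in weights]
    return codes, step


def dequantize_4bitsym(codes, step):
    """Map codes back to the symmetric levels; no level is exactly zero."""
    return [(k + 0.5) * step for k in codes]


def pack_nibbles(codes):
    """Store two 4-bit two's-complement codes per byte (low nibble first)."""
    padded = list(codes) + [0] * (len(codes) % 2)
    return bytes(((hi & 0xF) << 4) | (lo & 0xF)
                 for lo, hi in zip(padded[0::2], padded[1::2]))


def unpack_nibbles(packed, count):
    """Inverse of pack_nibbles: sign-extend each nibble back to [-8, 7]."""
    codes = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return codes[:count]


# Hypothetical weights, chosen only for illustration.
weights = [0.10, -0.20, 0.05, 0.30, -0.15, 0.00, 0.25]
codes, step = quantize_4bitsym(weights)
packed = pack_nibbles(codes)
print("codes:", codes, "packed bytes:", len(packed))
```

With 4 bits per weight, seven weights fit in four bytes, compared with 28 bytes as 32-bit floats.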

Learning Parameters

  • batch_size:128
    • Indicates the number of training examples utilized in one iteration. A batch size of 128 means that 128 samples are processed before the model’s internal parameters are updated.
  • num_epochs:60
    • Specifies the number of complete passes through the training dataset. Training will occur over 60 epochs.
  • scheduler:Cosine
    • The learning rate scheduler type. ‘Cosine’ annealing gradually reduces the learning rate following a cosine curve. Alternative schedulers include ‘StepLR’, which reduces the learning rate at regular intervals.
  • learning_rate:0.001
    • The initial learning rate for the optimizer, determining the step size at each iteration while moving toward a minimum of the loss function.
  • lr_decay:0.1
    • Factor by which the learning rate is reduced. This is used with step-based learning rate schedulers like ‘StepLR’ but is not applicable with the ‘Cosine’ scheduler.
  • step_size:10
    • Step size for learning rate decay in the ‘StepLR’ scheduler, indicating the number of epochs between each decay step.
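
The difference between the two schedulers can be sketched with their usual closed-form expressions. This assumes the training script follows the standard definitions of cosine annealing and step decay (as in common deep learning frameworks) and uses the values listed above; the function names are made up for this example.

```python
import math


def cosine_lr(epoch, base_lr=0.001, num_epochs=60, min_lr=0.0):
    """Cosine annealing: smooth decay from base_lr to min_lr over num_epochs."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / num_epochs))


def step_lr(epoch, base_lr=0.001, lr_decay=0.1, step_size=10):
    """Step decay: multiply the rate by lr_decay every step_size epochs."""
    return base_lr * lr_decay ** (epoch // step_size)


for epoch in (0, 10, 30, 59):
    print(f"epoch {epoch:2d}: cosine={cosine_lr(epoch):.6f}  step={step_lr(epoch):.6f}")
```

The cosine schedule keeps the learning rate high early on and lowers it gradually, whereas the step schedule drops it tenfold at epochs 10, 20, 30 and so on.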

Data Augmentation

  • augmentation:True
    • A Boolean flag indicating whether data augmentation is to be applied. If True, data augmentation techniques will be used to artificially expand the dataset.
  • rotation1:10
    • Specifies the degree of rotation for data augmentation. Images will be rotated up to 10 degrees in one direction.
  • rotation2:10
    • Specifies the degree of rotation in the opposite direction, allowing rotations up to 10 degrees.

Model Parameters

  • network_width1:32
    • Width of the first layer in the neural network, indicating that the first layer contains 32 units or neurons.
  • network_width2:16
    • Width of the second layer in the neural network, with 16 units or neurons.
  • network_width3:16
    • Width of the third layer in the neural network, with 16 units or neurons.


  • runtag:opt_
    • A string prefix used for naming the run or experiment. This helps in identifying and organizing different experimental runs.


This configuration sets the parameters for a machine learning experiment involving 4-bit symmetric quantization with RMS normalization and per-tensor weight scaling. The model will be trained using a batch size of 128 over 60 epochs with a cosine annealing learning rate scheduler starting at 0.001. Data augmentation includes rotations up to 10 degrees. The neural network architecture consists of three layers with 32, 16, and 16 units. The run is tagged with the prefix “opt_” to facilitate easy identification.
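
A quick back-of-the-envelope calculation shows why the layer widths matter on this board. The sketch below estimates the flash needed for the weights, using the widths listed under Model Parameters (32, 16, 16) and 4 bits per weight. It assumes a 16×16 input image, a 10-class output layer, no bias terms, and the 16 KB of flash on the CH32V003; the names are made up for this example.

```python
INPUT_PIXELS = 16 * 16          # assumed 16x16 input image
HIDDEN_WIDTHS = [32, 16, 16]    # network_width1..3 from the configuration
NUM_CLASSES = 10                # digits 0-9 (assumed output layer)
BITS_PER_WEIGHT = 4             # BPW from the configuration
FLASH_BYTES = 16 * 1024         # CH32V003 flash size


def count_weights(input_size, hidden_widths, num_classes):
    """Sum the weights of all fully connected layers (no biases assumed)."""
    sizes = [input_size, *hidden_widths, num_classes]
    return sum(a * b for a, b in zip(sizes, sizes[1:]))


def weight_bytes(num_weights, bits_per_weight):
    """Bytes needed when weights are packed tightly, rounded up."""
    return (num_weights * bits_per_weight + 7) // 8


num_weights = count_weights(INPUT_PIXELS, HIDDEN_WIDTHS, NUM_CLASSES)
size = weight_bytes(num_weights, BITS_PER_WEIGHT)
print(f"{num_weights} weights -> {size} bytes "
      f"({100 * size / FLASH_BYTES:.1f}% of flash)")
```

Under these assumptions the model has 9120 weights and needs 4560 bytes, leaving the rest of the flash for the inference code and any stored test images.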

Training Neural Network model

Train the model using the training script:

  1. Install Python (we used Python version 3.12.3 for this demo)
  2. cd into the BitNetMCU folder
  3. Create a virtual environment using the following command
python -m venv bitnetenv
  4. Activate the virtual environment
  5. Install the required dependencies in the virtual environment using the following command
pip install -r requirements.txt
  6. Start training by running the training script
  • Once training ends, the trained model will be saved in the modeldata folder

Exporting Model weights to C file for using with VSD

  • Export the model weights using the export script
  • This writes the model weights to the BitNetMCU_model.h file
  • We tried changing multiple model parameters via the trainingparameters.yaml file and saved the resulting parameters as files with the network sizes added to their names

Weight Distribution of generated weights of model

Weight Intensity of generated weights of model

Testing Output prediction of model

  • Here we provide multiple 16×16 images as input to the model and check the generated predictions
  1. Compile the BitNetMCU_MNIST_test.c file using gcc
gcc BitNetMCU_MNIST_test.c -o testoutput.o
  2. Execute the compiled file
  • You should see the labels and predictions generated for the test images

Generating dll file for inference

  1. Install Make
  2. Run the make command

Uploading BitNetMCU to VSDSquadron and realtime implementation
