MuseBox applied to Social Distancing

What is MuseBox?

MuseBox is a combined software and hardware platform, based on FPGAs, for real-time analysis of audio and video streams.

MuseBox was created for AV broadcasting applications, but it can be used in any multimedia analysis system.

Real-time computing is a computer science term for hardware and software systems subject to a "real-time constraint", for example a bound on the time from an event to the system's response. Real-time programs must guarantee a response within specified time constraints, often referred to as "deadlines".

A MuseBox Demo photo

It can work both with live streams of any size and resolution (coming from a camera or other peripheral source) and with files saved on the filesystem.

It has a scalable architecture that is easily configurable according to needs:

  • resource scalability: if a single FPGA is not enough for the workload, another FPGA can be added with little effort
  • reliability scalability: the stack is easily expandable, so the accuracy of the response can be tuned to requirements
  • cost scalability: each platform solution can be configured as a trade-off between cost, reliability of response, and energy consumption
  • maintenance scalability: in case of failure or malfunction, each piece is easily replaceable in plug-and-play mode

Xilinx MPSoC FPGAs pair programmable logic with embedded ARM processors, so they are highly customizable and easy to interface with ⇒ MuseBox currently supports the ZCU10x, Ultra96, PYNQ-Zx boards and the Zynq family

The organization in microservices makes it easy to customize the entire infrastructure with high fault tolerance: for example, the Alveo-based codec/decoder can be skipped when it is not needed, and each hardware part is configurable.
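
To make this per-stage configurability concrete, here is a minimal sketch in Python; the stage names and fields (source, decoder, inference, sink) are hypothetical placeholders, not MuseBox's actual configuration format.

```python
# Hypothetical pipeline configuration sketch: names and fields are illustrative,
# not MuseBox's actual configuration format.
PIPELINE = {
    "source": {"type": "rtsp", "uri": "rtsp://camera.local/stream"},
    "decoder": {"enabled": True, "device": "alveo"},   # can be skipped for raw sources
    "inference": {"device": "fpga", "model": "person_detector"},
    "sink": {"type": "message_queue", "topic": "detections"},
}

def build_pipeline(cfg):
    """Assemble only the stages that are enabled in the configuration."""
    stages = [cfg["source"]]
    if cfg["decoder"]["enabled"]:
        stages.append(cfg["decoder"])
    stages += [cfg["inference"], cfg["sink"]]
    return stages

print(build_pipeline(PIPELINE))
```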

Communication is based on message queues, so it is intrinsically load-balanced and can easily be interfaced with heterogeneous systems.
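
The load-balancing effect of a queue can be illustrated with a small, self-contained sketch using Python's standard library; in a real deployment the in-process queue would be replaced by a message broker, and the worker bodies by the actual analysis services.

```python
import queue
import threading

# Queue-based load balancing: any idle worker picks up the next frame,
# so work spreads across consumers without an explicit scheduler.
frames = queue.Queue()

def worker(name):
    while True:
        frame = frames.get()
        if frame is None:          # sentinel: shut down
            frames.task_done()
            break
        print(f"{name} analyzing frame {frame}")
        frames.task_done()

workers = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(2)]
for w in workers:
    w.start()

for i in range(6):
    frames.put(i)
for _ in workers:
    frames.put(None)               # one sentinel per worker
frames.join()
for w in workers:
    w.join()
```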

The machine learning part runs on the FPGA and uses frameworks certified by Xilinx, i.e. AI SDK, DNNDK and FINN, which allow existing algorithms and neural networks (e.g. from TensorFlow, Caffe, PyTorch) to be converted for execution on the FPGA.
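
A central step these toolchains perform is quantizing a floating-point network to the fixed-point arithmetic the FPGA executes. The toy sketch below illustrates only that idea with 8-bit weights; it is not the DNNDK or FINN API, and the real tools also handle calibration, graph compilation and deployment to the board.

```python
import numpy as np

# Toy illustration of the quantization step: weights trained in floating point
# (e.g. in TensorFlow, Caffe or PyTorch) are mapped to 8-bit integers before
# being compiled for the FPGA accelerator.
def quantize_int8(weights: np.ndarray):
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

weights = np.random.randn(3, 3).astype(np.float32)
q, scale = quantize_int8(weights)
print("max abs error:", np.abs(weights - q.astype(np.float32) * scale).max())
```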

A high-level orchestrator manages the flow of metadata in an organized and centralized way, including saving it to the database, avoiding congestion and race conditions.
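
One way to read this centralized saving is a single-writer pattern: only the orchestrator touches the database, while the analysis services just enqueue metadata. The sketch below shows that pattern with Python's standard library (queue + sqlite3); the schema and field names are illustrative assumptions, not MuseBox's actual data model.

```python
import json
import queue
import sqlite3

# Single-writer sketch: the orchestrator drains the metadata queue and is the
# only component writing to the database, so analysis services cannot race.
metadata_queue = queue.Queue()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE detections (frame INTEGER, label TEXT, payload TEXT)")

def orchestrate():
    while True:
        item = metadata_queue.get()
        if item is None:           # sentinel: stop draining
            break
        db.execute("INSERT INTO detections VALUES (?, ?, ?)",
                   (item["frame"], item["label"], json.dumps(item)))
        db.commit()

# Analysis services only enqueue metadata; they never write to the DB directly.
metadata_queue.put({"frame": 1, "label": "person", "bbox": [10, 20, 50, 80]})
metadata_queue.put(None)
orchestrate()
print(db.execute("SELECT COUNT(*) FROM detections").fetchone()[0], "rows saved")
```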

Demo video

https://www.youtube.com/watch?v=9boT1QkTyxU
  • the system recognizes the people within the frame
  • for each pair of people, it checks the distance between the subjects (see the sketch after this list)
    • if they are too far apart, the connection is not considered
    • if they are close enough (within 3 meters), it signals a potential problem (green connection line)
    • if they are too close (less than 1.5 meters), it signals a potential infection (red line)
  • the distance thresholds can be set at system start
  • scalability and robustness with respect to video quality and size
  • possibility of expansion with further analyses (presence of a mask, at-risk subjects through facial recognition, subject age control, etc.)
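
As a sketch of the distance check described above (assuming person positions already projected to ground-plane coordinates in meters, which the demo description does not detail), the pairwise classification with the default 3 m / 1.5 m thresholds could look like this:

```python
from itertools import combinations
import math

# Default thresholds from the demo; both are configurable at system start.
WARN_M, ALARM_M = 3.0, 1.5

def classify_pairs(people):
    """people: dict of id -> (x, y) ground-plane position in meters.
    Returns a list of (id_a, id_b, status) for pairs that need a connection line."""
    results = []
    for (a, pa), (b, pb) in combinations(people.items(), 2):
        d = math.dist(pa, pb)
        if d < ALARM_M:
            results.append((a, b, "red"))      # infection risk
        elif d < WARN_M:
            results.append((a, b, "green"))    # potential risk
        # farther pairs are ignored: no connection drawn
    return results

print(classify_pairs({"p1": (0.0, 0.0), "p2": (1.0, 0.5), "p3": (4.0, 0.0)}))
```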

Close people, potential risk (green line); too close people, infection risk (red line)

Possible scenarios

  • analysis of the work environment and safety
  • control of open spaces for monitoring people
  • patient monitoring and contagion tracking