SMPTE 2110 for Broadcast Workflows
SMPTE started drafting the ST2110 family of standards around 2015-2016, based on the work of the Video Services Forum and its TR-03 publication, and officially published the first versions of the video, audio and ancillary data essence standards at the end of 2017.
SMPTE 2110 is an RTP-based suite of standards for the transport of audiovisual content over IP. Its main characteristics are:
- Separate transport of individual essences (video, audio, ancillary data)
- Synchronization relying on PTP timing
- Uncompressed content is favored, even if compressed content is possible
Handling uncompressed video reception and transmission in software, without hardware acceleration, can be challenging because the data rate is high and the timing constraints are strict:
Regarding streaming rate, a 1080p60 HD stream requires around 2.5 Gbps of bandwidth and represents more than 200,000 packets per second. These packets must be emitted on the IP network with an accuracy of just a few nanoseconds. Now imagine saturating a 100GbE link with forty of these streams running concurrently on the network interface!
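The figures above can be checked with a few lines of arithmetic. The sketch below is only an approximation: it counts active video samples only, assumes 10-bit 4:2:2 sampling (20 bits per pixel), and uses a hypothetical RTP payload size of 1200 bytes per packet. RTP, UDP, IP and Ethernet overheads are ignored.

```python
import math

def stream_rate(width, height, fps, bits_per_pixel=20, payload_bytes=1200):
    """Return (bits per second, packets per second) for active video only.

    bits_per_pixel=20 assumes 10-bit 4:2:2 sampling; payload_bytes=1200 is
    an illustrative packet payload size, not a value taken from this paper.
    """
    bits_per_frame = width * height * bits_per_pixel
    bps = bits_per_frame * fps
    packets_per_frame = math.ceil(bits_per_frame / 8 / payload_bytes)
    return bps, packets_per_frame * fps

bps, pps = stream_rate(1920, 1080, 60)
streams_on_100gbe = int(100e9 // bps)

print(f"{bps / 1e9:.2f} Gbps per stream")      # 2.49 Gbps per stream
print(f"{pps} packets per second")             # 259200 packets per second
print(f"{streams_on_100gbe} streams on 100GbE")  # 40 streams on 100GbE
```

Under these assumptions the result is consistent with the orders of magnitude quoted above: roughly 2.5 Gbps, more than 200,000 packets per second, and forty streams on a 100GbE link.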
In these conditions, designing a software-based solution for SMPTE 2110 reception and transmission requires finding a suitable balance between CPU consumption, streaming accuracy, and packet rate (defined by the number of streams running in parallel and by their resolution, frame rate, and sampling type).
Depending on the type of targeted application, processing or device, the objective might be to minimize CPU consumption, to maximize the number of HD streams handled in parallel on a system, to maximize the resolution, or something else.
Because the IP Virtual Card is a generic software development kit intended for a wide range of applications, it provides many parameters that have an impact on CPU consumption, as well as several dimensions along which you can configure the distribution of the CPU load over the available cores.
For the benchmarking campaign whose results are described in this paper, we used different sets of parameters depending on the behavior or limit we wanted to highlight in each of the tests.
The different situations for which we present figures in this paper are:
- The maximum achieved network throughput
- The maximum achieved number of HD channels
- The highest video format
ST2110-21 models
Benchmarking ST2110 reception software consists of using a known good video source, or set of video sources, and checking at the receiver side that no error is reported and no packet is dropped. The test is conducted under incrementally increasing load, depending on the limit to measure (maximum throughput, maximum number of channels, highest video format). In parallel with verifying proper stream reception, the load of the different CPU cores is monitored during test execution.
Benchmarking ST2110 transmission software consists of using a reference analyzer to check the conformity of one or several streams emitted by the transmitter under test. The stream sanity check essentially consists of looking for missing packets, and the stream compliance check includes continuous verification of the traffic shaping against the expected CMax and Vrx values according to the ST2110-21 models, as well as PTP offset verification. As in the reception use cases, benchmark tests are conducted incrementally until the limit being measured is reached, and CPU load is monitored during test execution.
Regarding network packet pacing requirements, ST2110-21 defines shaping profiles better suited to hardware implementations (the narrow, or type N, profiles) as well as a more relaxed profile sufficient for software streaming: the wide, or type W, profile. ST2110 receivers able to receive any of these contents are called asynchronous, or type A.
Except in some specific cases where it can benefit from the hardware packet pacing present on certain NICs, the IP Virtual Card is a type W emitter and a type A receiver.
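To give an intuition of what a traffic shaping check against a CMax limit involves, here is a deliberately simplified leaky-bucket sketch. It is not the ST2110-21 network compatibility model itself: the drain interval, the limit and the arrival patterns are all hypothetical values chosen for illustration, and the exact drain timing and per-format limits must be taken from the standard.

```python
def max_bucket_level(arrival_times_ns, drain_interval_ns):
    """Return the peak fill level of a simplified leaky bucket.

    Each packet arrival adds one packet to the bucket. While the bucket is
    not empty, one packet is drained every drain_interval_ns nanoseconds.
    Times are integers in nanoseconds to avoid floating-point drift.
    """
    level = 0
    peak = 0
    next_drain = 0
    for t in sorted(arrival_times_ns):
        # Apply every drain event that happened before this arrival.
        while level > 0 and next_drain <= t:
            level -= 1
            next_drain += drain_interval_ns
        if level == 0:
            # Bucket is empty: the drain clock restarts from this arrival.
            next_drain = t + drain_interval_ns
        level += 1
        peak = max(peak, level)
    return peak

DRAIN_NS = 3600   # hypothetical drain interval
CMAX = 4          # hypothetical limit, not a value from the standard

paced = [i * 4000 for i in range(100)]   # evenly paced packets
bursty = [i * 100 for i in range(8)]     # 8 packets sent almost back to back

print(max_bucket_level(paced, DRAIN_NS) <= CMAX)   # True
print(max_bucket_level(bursty, DRAIN_NS) <= CMAX)  # False
```

The evenly paced sender never accumulates more than one packet in the bucket, whereas the burst fills it faster than it drains. This is the kind of behavior that makes tight (narrow) limits hard to meet with software-only pacing.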
SMPTE 2110 Streaming Performance
In addition to the benchmarking principles explained above, the reader must be aware that performance depends on two important factors: the software network stack technology on the one hand, and the characteristics of the system (mainly the number and type of CPUs) on the other hand.
Regarding the network stack, two technologies exist :
- Network “sockets” are how operating systems natively handle network traffic. Sockets are optimized for multitasking (lots of low-demand contexts) while broadcast streaming generally requires the opposite (a few high-demand contexts)
- A “kernel bypass” is a third-party piece of software implementing a direct communication pipe between the application software and the NIC. It allows designing high-throughput, low-consumption, low-latency solutions. DPDK is an example of kernel bypass technology, and the one used by DELTACAST in its IP Virtual Card
As mentioned, the system characteristics also play an important role in performance, the CPU being the prevalent factor.
A computer platform can host one or several processors. A processor executes instructions at a given frequency, expressed in gigahertz, and is made of several independent computing units called “cores”, or “CPU cores”.
Generally speaking, for software streaming, the number of cores directly influences the quantity of channels the system can handle in parallel, while the CPU frequency rather has an impact on the packet rate, and hence on the resolution and frame rate, of the video feeds.
You will hence benefit from a large number of CPU cores only if you are able to properly spread the packet processing load onto them. Handling a large set of streams in parallel is a good fit for that, as you can assign them to specific CPU cores based on some hashing factor.
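As an illustration of hash-based assignment, the sketch below maps each stream to a worker core from a hash of its destination address and port. This is a generic technique shown under assumptions, not the IP Virtual Card's actual mechanism: the function name, the choice of CRC32, the multicast addresses and the core IDs are all hypothetical.

```python
import zlib
from collections import Counter

def core_for_stream(dst_ip, dst_port, worker_cores):
    """Pick a worker core for a stream from a hash of its destination.

    The same (dst_ip, dst_port) pair always maps to the same core, so all
    packets of a given stream are processed on a single core.
    """
    key = f"{dst_ip}:{dst_port}".encode()
    return worker_cores[zlib.crc32(key) % len(worker_cores)]

# Hypothetical setup: 4 worker cores and 16 multicast streams.
worker_cores = [2, 3, 4, 5]
streams = [(f"239.0.0.{i}", 5004) for i in range(1, 17)]

assignment = {s: core_for_stream(s[0], s[1], worker_cores) for s in streams}
load = Counter(assignment.values())

for core in worker_cores:
    print(f"core {core}: {load[core]} stream(s)")
```

A hash does not guarantee a perfectly even spread, so with a small number of heavy streams an explicit round-robin or manual pinning may be a better choice.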
Benchmark IP Virtual Card
As introduced earlier in the document, we decided to benchmark an ST2110 software streaming solution like the IP Virtual Card along 3 dimensions:
- The maximum achieved network throughput
- The maximum achieved number of HD channels
- The highest video format
The following chapters detail each of these three axes.
Regarding the equipment used, we chose a trade-off between modernity, number of CPU cores, and processor frequency, and built a server hosting two 2nd generation Xeon Scalable processors, each one offering 24 physical cores running at 2.9 GHz.
Maximum network throughput
The network throughput is the quantity of packets emitted or received per second (pkt/sec) or, expressed otherwise, the quantity of bits emitted or received per second (bps, Mbps, Gbps).
The throughput on a network interface is the sum of the throughputs of all the ST2110 streams running in parallel on that port, and the throughput of each stream is a function of its video format and bit depth.
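The relationship between formats, bit depth and port throughput can be sketched as follows. The stream mix is hypothetical and the calculation counts active video samples only, ignoring packet header overhead, so real figures on the wire are somewhat higher.

```python
# Samples per pixel for common chroma sampling schemes.
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}

def video_bps(width, height, fps, bit_depth, sampling="4:2:2"):
    """Active-video bit rate of one stream, in bits per second."""
    bits_per_pixel = SAMPLES_PER_PIXEL[sampling] * bit_depth
    return int(width * height * fps * bits_per_pixel)

def port_throughput_bps(streams):
    """Total throughput of a port: the sum over all its streams."""
    return sum(video_bps(*s) for s in streams)

# Hypothetical mix: ten 1080p60 streams and two 2160p60 streams, 10-bit 4:2:2.
streams = [(1920, 1080, 60, 10)] * 10 + [(3840, 2160, 60, 10)] * 2

total = port_throughput_bps(streams)
link_bps = 100e9  # 100GbE link

print(f"{total / 1e9:.2f} Gbps")                  # 44.79 Gbps
print(f"{100 * total / link_bps:.1f} % of link")  # 44.8 % of link
```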
For this benchmark test, we tried saturating a 100 GbE network interface.
Testing for the maximum achievable bit rate clearly highlights the limitations of the Windows and Linux native network stacks, and the benefits of using a kernel bypass technology, as illustrated above.
Maximum amount of parallel channels
To measure a relevant metric, this benchmark test aims to run a maximum number of ST2110 video streams configured in 1080i60 10-bit 4:2:2 YCbCr.
The streams run concurrently over a 100GbE network, theoretically capable of simultaneously conveying 88 of them.
Once again, the benchmark test shows the important added value of a kernel bypass, and so do the CPU core consumptions per ST2110 stream listed in the table below:
| CPU cores per stream | OS sockets, Windows | OS sockets, Linux | DPDK kernel bypass, Windows | DPDK kernel bypass, Linux |
|---|---|---|---|---|
| RX | 2 | 0.5 | 0.3 | 0.3 |
| TX | 1 | 0.7 | 0.2 | 0.2 |
Highest video format
As the IP Virtual Card supports video formats from standard definition up to UHD 4K60, the latter is the highest tested format, although the software approach might be able to handle more.
| Highest format | OS sockets, Windows | OS sockets, Linux | DPDK kernel bypass, Windows | DPDK kernel bypass, Linux |
|---|---|---|---|---|
| RX | 1080i60 | 4K30 | 4K60 | 4K60 |
| TX | 1080p60 | 4K30 | 4K60 | 4K60 |
Conclusions
Being focused on the network transport of tightly synchronized, uncompressed video, the ST2110 suite of standards is challenging to implement as a software solution.
However, this paper and the figures it presents show that it is a workable approach when using the appropriate software building blocks and CPU usage strategy.
Our experience shows that using the operating system's native IP stack through socket programming is the easiest and quickest way of working, but it is rapidly limited in performance if you need to handle more than a few streams in parallel, or overly demanding video formats.
Then come kernel bypass technologies, and especially DPDK, which drastically reduce the CPU usage footprint while pushing the achievable streaming throughput close to network link saturation.
With these technologies, and if the infrastructure accepts type W emitters, software-based implementations can compete with dedicated hardware from a performance point of view. The hardware-independent nature of such a solution makes it a particularly interesting choice in these times of uncertainty regarding hardware supplies and price evolution. If you require a type N emitter, however, the NIC choice will be limited to a couple of specific models.
Besides that, the diversity of CPU models available on the market allows tailoring the BoM to your very specific needs, which can have a great impact on cost optimization.
Last but not least, with the constant evolution of computer technologies, new levels of performance are regularly unlocked with no impact on the software architecture.
Software-based ST2110 streaming is definitely a forward-looking choice!