Since we launched ZCast back in early 2016, the most requested enhancement we got from our users was to “improve the sound quality”.
As audio-geeks ourselves, we felt your pain every time we used ZCast. We get it: in today’s world, you expect more than phone-line quality when it comes to professional audio recording.
It took us many months of intensive research and evaluation of protocols, libraries, and technologies to find the audio configuration that was right for our users and for the future of ZCast.
Finally, earlier this year, we began our “hacking” journey to build and integrate the new audio stack into ZCast.
This new version has been in beta for the past couple of months, used by our amazing and loyal group of beta ZCasters. We are finally ready to release “ZCast 3 – HD Audio” for everyone to enjoy, starting today! 👏🏼
What’s new with ZCast 3 – HD Audio?
TL;DR (more details below)
In short, this new version switched from an audio stack that was limited to an audio bandwidth of 8kHz/mono to 48kHz/stereo sound!
If you do not know what kHz is: hertz (Hz) measures the frequency of waves, such as sound waves, and 1kHz is 1,000Hz. In digital sound systems the same unit describes the sampling rate, which is the number of times we sample your audio per second so that we can transmit it over the network and restore it on the listeners’ side. The sampling rate sets the maximum sound frequency we can capture (about half the sampling rate).
The higher this number, the better the audio quality; for reference, high-def home audio systems, such as your CD Audio players, use 44.1kHz.
To get a feel for the difference, check out this interactive demo we created by recording music through both ZCast 3.0 and ZCast 2.0 at the same time.
Once you hit play, the mixer will automatically fade between the two recordings.
But, you may take control over the knob at any time and try it for yourself.
(Music: “Please Listen Carefully” by Jahzzar; License: Creative Commons)
The audio spectrum image you see represents the audio you listen to. It shows the volume of each frequency over time.
The brighter the color, the higher the volume of that frequency. The important thing to see is that with the old version, the audio was cut off at 8kHz and any sound above that frequency didn’t get captured by the system.
ZCast 3 HD Audio introduces these core improvements:
- A brand new and modern audio streaming and mixing backend, replacing our original telephony-based conferencing mixing system with a pure VoIP (Voice over IP) stack (VoIP on Wikipedia)
- We use the OPUS audio codec: OPUS is a modern compression standard that supports up to 48kHz audio sampling (compared with the 44.1kHz sampling rate of your typical CD player, or the 8kHz of the original ZCast), and it dynamically adjusts its network usage based on the bandwidth available to you at any given moment
- Stereo streaming is now supported if the device is capable of capturing stereo audio. Note that at the time of writing this post, Chrome is the only browser on the market that supports stereo streaming.
- An unlimited number of listeners can join a live broadcast
- MP3 Live streaming – Listeners can now listen to live ZCasts directly in their social network feed on Twitter, Facebook, etc., without installing the app.
But what we are really excited about is the endless opportunities that this new audio stack opens up for the ZCast platform going forward. The list of new ideas is so long and exciting that we can’t wait to start working on them.
ZCast 3.0 HD Audio is ready for you to enjoy as of now. Start using it by either downloading ZCast 3.0 for iOS on your iOS devices, or by opening ZCast 3.0 HD Audio.
If you want some more of the geeky info, keep reading…
What’s new? [The Nerd’s edition]
If you are an Audio-Geek like we are, here is a deeper dive into ZCast 3.0.
For a start, as we mentioned before, we have switched from a telephony-based audio system to a full-audio-spectrum HD audio system based on the Kurento open source project, which in turn is based on the amazing GStreamer library.
Our previous version of ZCast used a phone-conferencing audio backend (Twilio) to connect and record all participants and listeners of a ZCast. Think of each ZCast as a huge conference call with a few people talking and many people listening in. This was easy to work with, but it limited our quality to that of, well, your typical conference call.
Telephony, since its inception, has used a very narrow audio bandwidth, somewhere between 4kHz and 8kHz. Your old analog home phones usually used 4kHz, and the more modern VoIP-based systems have doubled their bandwidth to 8kHz. Our amazing human ears, however, are capable of hearing a much wider range: starting at 50Hz and going all the way up to 15kHz (and some audio experts would argue they can hear the full spectrum of 20Hz to 20kHz, but we won’t argue with them…)
To put this in perspective, the good-old Audio CD Player, the one that revolutionized the digital music world about 30 years ago, is capable of reproducing audio in the range of 20Hz to 20kHz. It does so by using a sampling rate of 44.1kHz, which is just over twice 20kHz: a digital system can only capture frequencies up to half its sampling rate, so 44.1kHz leaves a small margin above the 20kHz top of the range.
Btw, the most modern high-end studios are capable of 48kHz, and some “audio crazy guys” might go all the way up to 96kHz, but for most people 44.1kHz or higher should be good enough!
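The relationship between a sampling rate and the highest frequency it can capture is easy to put into numbers. Here is a minimal Python sketch of that rule of thumb (the function names are ours, purely for illustration, and are not part of ZCast):

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_hz / 2.0


def min_sample_rate_hz(max_frequency_hz: float) -> float:
    """Sampling rate must exceed this value to capture max_frequency_hz."""
    return 2.0 * max_frequency_hz


if __name__ == "__main__":
    # Compare a telephony-style rate, the CD rate, and the 48kHz rate.
    for rate in (8_000, 44_100, 48_000):
        limit = nyquist_limit_hz(rate)
        print(f"{rate} Hz sampling captures audio up to {limit:.0f} Hz")
```

Plugging in 20kHz, the top of the range mentioned above, gives a minimum of 40kHz, which is why 44.1kHz and 48kHz are comfortable choices.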
But choosing how many times you sample the original audio per second (which is what Hz stands for: the number of cycles per second, named after Heinrich Rudolf Hertz) is only one part of the equation. The second part is how many unique values you use to digitize the amplitude of each sample. The way digital audio works is that we measure the amplitude of the original waveform at short, regular intervals, and we use a number to represent this amplitude. A low-end system might use 8 bits (256 different values) per sample, while high-end systems will use anywhere from 16 bits (65,536 values) to 32 bits (4,294,967,296 values) and more. The larger the sample bit size, the smoother the digital audio will be when converted back from digital data to analog sound in your speakers.
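To see what the bit size does to a single sample, here is a small Python sketch. It assumes samples are floats between -1.0 and 1.0 and uses a simple symmetric rounding scheme of our own; real audio hardware and codecs may quantize differently.

```python
def quantization_levels(bits: int) -> int:
    """Number of distinct values a sample of the given bit size can take."""
    return 2 ** bits


def quantize(sample: float, bits: int) -> float:
    """Round a sample in [-1.0, 1.0] to the nearest value that a signed
    integer of `bits` bits can represent, and return it as a float again."""
    steps = 2 ** (bits - 1) - 1  # largest positive code, e.g. 127 for 8 bits
    code = round(sample * steps)
    return code / steps


if __name__ == "__main__":
    original = 0.3
    for bits in (8, 16):
        restored = quantize(original, bits)
        error = abs(restored - original)
        print(f"{bits}-bit: {quantization_levels(bits)} levels, error {error:.7f}")
```

The rounding error shrinks as the bit size grows, which is the "smoothness" described above.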
The next important factor that determines the quality of the audio is storage size or in other words: compression of stored media. This is important because we need to transfer this data over the network, so the size of the content is critical not just from a disk-size perspective, but for network bandwidth requirements.
If we kept this data in its raw format (as a traditional audio CD does), these files would get quite big, so big in fact that many real-world network connections could not stream them without hiccups.
Also, as much as storage and networks are getting cheaper and faster, they are still not cheap enough, and network bandwidth is still limited enough that storing and streaming raw audio and/or video is not realistic at this point in time. For that reason, we need to compress our audio files.
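For a sense of scale, the size of raw (uncompressed) audio follows directly from the sampling rate, the sample bit size, and the number of channels. A quick Python sketch of that arithmetic (helper names are ours, for illustration only):

```python
def raw_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second needed for uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def megabytes_per_minute(bitrate_bps: int) -> float:
    """Convert a bitrate into megabytes (10^6 bytes) per minute of audio."""
    return bitrate_bps * 60 / 8 / 1_000_000


if __name__ == "__main__":
    formats = {
        "phone-style 8kHz / 8-bit / mono": (8_000, 8, 1),
        "CD 44.1kHz / 16-bit / stereo": (44_100, 16, 2),
        "48kHz / 16-bit / stereo": (48_000, 16, 2),
    }
    for name, (rate, bits, channels) in formats.items():
        bps = raw_bitrate_bps(rate, bits, channels)
        print(f"{name}: {bps} bit/s, {megabytes_per_minute(bps):.2f} MB per minute")
```

Raw CD-quality stereo works out to 1,411,200 bits per second, or roughly 10.6 MB for every minute of audio, and that is per listener.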
Traditionally, compression of computer data comes in two “flavors”: lossless and lossy. Lossless is like your Zip archive files. They compress the data but do not lose a single bit of information. When you uncompress a zip file, all the bits are restored exactly to their original form. The second kind is what is known as lossy compression. A lossy compression cannot restore the data exactly to its original state, but it can get close enough that a human will not notice the difference. This is typically used for visual and audio data. We have all seen photos stored as JPEG, which is a lossy format (PNG and GIF, by contrast, are lossless). For audio and video, we have commonly used formats such as MP3, AAC, Ogg Vorbis and others.
For live streaming, compression is key. If you compress too much, you will not only lose quality, but you might also require more effort on the client device to decompress the data, causing delays and hiccups in the user experience.
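The difference between the two flavors can be shown with a toy Python example. The lossless half uses zlib from the standard library (the same family of compression used in Zip files). The "lossy" half is a deliberately crude stand-in of our own that throws away the low 4 bits of every byte; it is not how MP3 or OPUS work, it only illustrates the trade-off.

```python
import zlib

# Stand-in for raw sample data: every byte value 0..255, repeated 4 times.
original = bytes(range(256)) * 4

# Lossless: compress, decompress, and get exactly the same bytes back.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
lossless_ok = restored == original

# Crude "lossy" step: keep only the top 4 bits of each byte, then compress.
lossy = bytes(b & 0xF0 for b in original)
lossy_packed = zlib.compress(lossy)

# The discarded detail cannot be recovered; measure the worst-case error.
max_error = max(abs(a - b) for a, b in zip(original, lossy))

if __name__ == "__main__":
    print("lossless round trip exact:", lossless_ok)
    print("compressed sizes (lossless, lossy):", len(packed), len(lossy_packed))
    print("largest per-byte error after lossy step:", max_error)
```

The lossy version compresses smaller because it has less detail to describe, at the cost of a small, permanent error in every sample.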
For that reason a good compression for streaming should be mathematically easy to decompress on-the-fly so that video/audio players can quickly restore the compressed data for us, humans, to enjoy, in as close to real-time as possible, with minimal latency.
So, what does all this have to do with ZCast? Well, Everything!
When we set out on the mission to choose the best compression and audio codec (encoding/decoding) format for our HD system, we needed a solution that would allow you, our user, to choose how great you want your audio to sound, limited not by our decisions but only by your selection of equipment and networks.
After analyzing and testing many libraries, formats, and compressions, we have chosen OPUS as our preferred audio codec.
OPUS checks the box on multiple requirements:
- Audio quality
- Compression level
- Supported platforms
- Robust network conditions handling
Let’s start with quality
The OPUS format supports up to 48kHz stereo audio with a compression that is barely noticeable for spoken words and music, giving results that are superior to MP3, which is the most common audio compression out there today.
Compression level
OPUS compresses the data very efficiently, reducing the size of the stream enough to allow seamless HD audio broadcasting even on poor 3G cellular networks.
Supported platforms
OPUS is supported natively on all major browsers such as Chrome, Firefox, Microsoft Edge and even Safari (starting with Safari 11, which will come out in the coming months).
OPUS doesn’t require any plugins to be installed. Your browser already has all that is needed to use it.
OPUS is also supported by all native iOS and Android implementations of the WebRTC protocol.
Robust network conditions handling
In addition, not only can OPUS compress up to 48kHz stereo audio in real time, but it can also dynamically adapt to the condition of your network at any given moment. If your cellular network is acting up, or your Wi-Fi bandwidth is down because your family is streaming three different SpongeBob episodes on three different iPads on your home network while you are broadcasting your ZCast, it will automatically increase the compression and reduce the network footprint of your content! So whether you are broadcasting from your car while driving or from your basement, once you hit a bad network situation, your broadcast will continue with the best possible quality that your current network allows!
We are excited about this release of ZCast, and we are sure that it will allow you to start creating high-quality content while interviewing your guest speakers and to control the quality of your content in ways you couldn’t before.
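To make the idea concrete, here is a hypothetical Python sketch of such an adaptation policy. It is not the actual OPUS, WebRTC, or ZCast logic: the `pick_audio_bitrate` function and its `headroom` parameter are our own illustration. The only real-world values are the bounds, since the OPUS specification covers bitrates from 6 kbit/s to 510 kbit/s.

```python
# Bitrate range covered by the OPUS codec, in bits per second.
OPUS_MIN_BPS = 6_000
OPUS_MAX_BPS = 510_000


def pick_audio_bitrate(available_bps: int, headroom: float = 0.5) -> int:
    """Hypothetical policy: spend a fraction (`headroom`) of the estimated
    available bandwidth on audio, clamped to the range OPUS supports."""
    target = int(available_bps * headroom)
    return max(OPUS_MIN_BPS, min(OPUS_MAX_BPS, target))


if __name__ == "__main__":
    # Simulated bandwidth estimates: good Wi-Fi, busy Wi-Fi, weak cellular.
    for estimate in (2_000_000, 100_000, 5_000):
        print(f"{estimate} bit/s available -> encode at {pick_audio_bitrate(estimate)} bit/s")
```

A real system would re-run a decision like this continuously as bandwidth estimates change, so the stream degrades gracefully instead of stalling.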