Thursday, March 14, 2019

Re: [Discuss-gnuradio] compressing I/Q files

Marcus, all,


Thx.

In the meantime, I did a little bit of testing.

I took a 256 MB piece of an I/Q file (a pass of NOAA-19), sampled at 240 ksps.
Gzip compressed this down to 40 MB. 7-Zip managed to get this down to 29
MB (but compressing took 10 to 20 times longer).

Now, after converting this file from float to short, you get a 128 MB file.
However, if you then compress that, the gain isn't that big anymore:
gzip 33 MB, 7zip 25 MB.


My guess is that gzip and 7zip compress by looking for repetitive
patterns, so they already remove most of the redundancy in the 32-bit
float representation. This means that converting 32-bit floats to 16-bit
shorts does not help much if you plan to compress the files afterwards
anyway.
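
For anyone who wants to repeat this kind of comparison, here is a minimal sketch using only the Python standard library. It is not the procedure used for the numbers above: the function names, the scale factor of 32768 and the synthetic noise input are assumptions for illustration, and zlib/lzma stand in for the gzip and 7-Zip command-line tools.

```python
import array
import lzma
import random
import zlib


def floats_to_shorts(float_bytes, scale=32768.0):
    """Convert interleaved float32 I/Q bytes to int16 bytes (native byte order).

    The scale factor is an assumption; use whatever your source applied.
    """
    samples = array.array("f")
    samples.frombytes(float_bytes)
    clipped = (max(-32768, min(32767, round(s * scale))) for s in samples)
    return array.array("h", clipped).tobytes()


def compressed_sizes(data):
    """Return raw size plus sizes after DEFLATE (zlib) and LZMA compression."""
    return {
        "raw": len(data),
        "deflate": len(zlib.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    # Synthetic stand-in for a recording: weak Gaussian noise on a 16-bit grid.
    rng = random.Random(0)
    shorts = [max(-32768, min(32767, round(rng.gauss(0, 200))))
              for _ in range(100_000)]
    float_bytes = array.array("f", (s / 32768.0 for s in shorts)).tobytes()
    print("float32:", compressed_sizes(float_bytes))
    print("int16:  ", compressed_sizes(floats_to_shorts(float_bytes)))
```

For a real recording, read the file contents into `float_bytes` instead of generating noise; the sizes will of course depend on the signal.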

Kristoff


On 10/03/19 18:33, Marcus Müller wrote:
> Hi Kristoff, Benny and Alban,
>
> TL;DR:
> Benny is spot on. Other than that, decimate your signal if you
> know the bandwidth is less than your sampling rate, and don't put too
> much hope on audio encoders.
>
> Long Version:
>
> Point is: the signal coming from your SDR device, whatever that might
> be, has finite resolution – typically, no more than 16 bits per
> channel. Hence, the conversion from float to short (or directly getting
> short, if your device driver allows that) is lossless. For example,
> USRPs' driver (UHD), and the GNU Radio USRP source, can be configured
> to hand out the signed complex 16 bit conversion of the data from the
> network or USB interface instead of the float32 conversion.
>
> Any other compression method can only do so much:
> Your signal recording is essentially random – meaning that all values
> should be roughly equally likely. Maybe extreme high amplitudes are a
> little rarer, since you'd typically avoid those to stay clear of
> clipping.
> That means that the average info per sample is relatively high: From
> seeing other samples, we know very little about it, so the surprise we
> get from its actual value is pretty high. Information-theoretically,
> the expected information content per sample is the entropy of a source.
> Information and entropy are both measured in bits – the completely fair
> random decision between 0 and 1 ("flipping a coin") is worth 1 bit, and
> picking one out of 2¹⁶ values perfectly randomly is worth 16 bits.
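
The entropy described above can be estimated from a recording by counting how often each sample value occurs. A minimal sketch follows; the function name is made up for illustration, and it only gives the zero-order estimate, i.e. it ignores any correlation between neighbouring samples.

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Zero-order entropy estimate in bits per symbol from observed frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(empirical_entropy_bits([0, 1]))        # fair coin: 1.0
    print(empirical_entropy_bits([0, 1, 2, 3]))  # four equally likely values: 2.0
```

Applied to the 16-bit samples of a recording, a result close to 16 means there is little left for a lossless compressor to remove.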
>
> (Lossless) compression can, in the best case, achieve a bit count per
> sample equal to the entropy of the source.
> Now, if your signal is somewhat noisy, and other than that relatively
> interesting (i.e. you're not observing a constant value), your source
> entropy often approaches the limit given by the ADC – in my tests, even
> on severely backed-off signals, standard Huffman- and Lempel-Ziv-based
> compressors (zip, gzip, 7z, zstd, bz2, xz) achieved negligible
> compression ratios on radio recordings.
>
> I've tried FLAC, too – FLAC doesn't allow setting the actual sampling
> rate as high as was truly used by typical SDR hardware (i.e. the header
> field for the sampling rate simply doesn't have enough size to allow
> for 10⁷, for example). But that's mainly a metadata problem that can be
> solved by ignoring it.
> However, FLAC's linear prediction coding relies on
> a) the signal deviating only "slightly" from a linear function over
> short time periods, and
> b) the residual left after prediction following a geometric
> distribution,
>
> and that's usually not given, because
> a) if you already know you will be in need of compression, you're
> probably not significantly oversampling your signal, but are already
> decimating it to a rate barely more than sufficient. Everything else
> would be a larger waste of space – and has no benefits for signal
> analysis later, and
> b) with the prior assumption broken, only a zero-order linear predictor
> doesn't make things worse – i.e., simply handing through the input
> samples to the residual coder. That residual coder, as said, depends on
> the distribution of amplitudes to follow a specific statistic to work
> well. Sadly, that statistic doesn't apply to I&Q signals, typically.
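
To illustrate point b), here is a small sketch of FLAC-style fixed polynomial predictors of order 0, 1 and 2 applied to white noise. The function name and the synthetic Gaussian input are assumptions for illustration; a real I/Q recording is not exactly white noise, but a critically sampled one comes close. For such input the residual of the higher-order predictors has a larger variance than the input itself, so only order 0 does no harm.

```python
import random
import statistics


def residual(samples, order):
    """Residual of a fixed polynomial predictor of order 0, 1 or 2."""
    if order == 0:
        return list(samples)
    if order == 1:
        return [samples[n] - samples[n - 1] for n in range(1, len(samples))]
    if order == 2:
        return [samples[n] - 2 * samples[n - 1] + samples[n - 2]
                for n in range(2, len(samples))]
    raise ValueError("only orders 0, 1 and 2 are implemented in this sketch")


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [round(rng.gauss(0, 1000)) for _ in range(20_000)]
    for order in (0, 1, 2):
        var = statistics.pvariance(residual(noise, order))
        print(f"order {order}: residual variance {var:.0f}")
```

On a slowly varying signal such as oversampled audio the picture is reversed: the order-1 residual of a ramp like `[0, 1, 2, 3]` is the constant `[1, 1, 1]`, which is cheap to encode.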
>
> My experience is that FLAC doesn't work well for anything that's not
> massively oversampled AM audio – which is no surprise, because that
> literally isn't very different from audio, which is what FLAC was
> designed for.
>
> However, my FLAC experiments lie years in the past – maybe the encoder
> got more versatile; Alban, do you have deviating experience?
>
> Best regards,
> Marcus
> On Sun, 2019-03-10 at 11:54 +0000, Benny Alexandar wrote:
>> Yes, converting 32-bit float to 16-bit short is an option; compressing
>> using 7zip or gzip won't give good compression.
>> From: Discuss-gnuradio
>> <discuss-gnuradio-bounces+ben.alex=outlook.com@gnu.org> on behalf of
>> Kristoff <kristoff@skypro.be>
>> Sent: Sunday, March 10, 2019 3:57 PM
>> To: discuss-gnuradio@gnu.org
>> Subject: [Discuss-gnuradio] compressing I/Q files
>>
>> Hi all,
>>
>>
>>
>> Simple and short question:
>> What is the best way to compress a raw I/Q file? A generic
>> compression-tool like gzip, zip? Or are there better and specialised
>> tools?
>>
>>
>> Is converting the data in the I/Q file from float to short an option?
>>
>>
>> Kristoff
>>
>>
>> _______________________________________________
>> Discuss-gnuradio mailing list
>> Discuss-gnuradio@gnu.org
>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
