Monday, November 2, 2020

Re: Maximum Number of Bins

On 11/02/2020 12:39 PM, Criss Swaim wrote:
> Thank you Marcus & Marcus - your insights are greatly appreciated.
>
> I am looking at the suggestions, esp. the FFT conversion, and we are
> considering upgrading, but need to see if the system will scale as is.
> BTW, I am maintaining the code and not the original developer, so I am
> not familiar with all the pieces, esp. the ones that have been working.
> I look at this as an opportunity to dig deeper into GnuRadio.
>
> 1) For clarification, we have been running the 3.7 code for 4-5 years -
> not sure when we upgraded to 3.7.9. The system runs with 2 million fft
> bins, but at 3 or 4 million, it fails. M. Leach has demonstrated that
> without our custom block, GnuRadio can process the high bin levels. I
> have run various configurations of our model (without the bin_sub_avg
> python block) but I still receive the error.
>
> 2) Both of you have mentioned we are using old message queues. Can you
> point me to some documentation that explains this? We are using the
> blocks.add_const_vff and connect functions to remove background constant
> (a numpy array) from the signal stream. What would be a better
> approach? I have not looked at this for several years, so I need to
> refresh and this would be a good time to look at alternate options. It
> is a bit of a black box for me and I would like to research alternate
> approaches as I dig into this process.
Using add_const is fine as a way to remove backgrounds.
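For anyone following along, here is a numpy sketch of what blocks.add_const_vff does numerically when used this way: it adds a fixed vector to every input vector, so passing the negated background subtracts it from each FFT frame. The data values below are made up for illustration; in the real flowgraph the background vector comes from the bin_sub_avg block.

```python
import numpy as np

# Hypothetical background estimate (in the real system this is computed
# hourly by bin_sub_avg); four bins instead of millions, for illustration.
background = np.array([0.5, 0.25, 0.75, 0.1], dtype=np.float32)

# Two successive magnitude frames from the FFT (toy values).
frames = np.array([[1.0, 1.0, 1.0, 1.0],
                   [2.0, 2.0, 2.0, 2.0]], dtype=np.float32)

# Equivalent of wiring blocks.add_const_vff((-background).tolist())
# into the stream: each frame gets the negated background added,
# i.e. the background subtracted.
cleaned = frames + (-background)
```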
>
> 3) M. Leach: you indicated that the conversions from a
> stream->string->numpy array is very inefficient. Can you point to
> another approach to convert a stream to numpy array? This is done once
> every 60 minutes, but still if it could be improved, that would help.
A GNU Radio sample stream is already numpy-compatible, so turning it into
a string first (maybe that's what is going into the message queue?) isn't
necessary.
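Even when a payload does arrive as a raw byte string (as with the old msgq API), numpy can view it in place with frombuffer, with no per-element parsing. A minimal sketch, assuming the samples are float32 as produced by a complex-to-mag block upstream; the payload here is synthetic:

```python
import numpy as np

# Stand-in for a message payload: 8 float32 samples serialized to bytes,
# the way the old msgq interface hands data to Python as a string.
payload = np.arange(8, dtype=np.float32).tobytes()

# Zero-copy view of the bytes as a numpy array -- no string parsing,
# no element-by-element conversion.
bins = np.frombuffer(payload, dtype=np.float32)
```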

>
> 4) Finally, I have also been looking for a change log for the 3.7 to 3.8
> system. Moving from 3.6 to 3.7 was a significant change and was
> wondering if 3.7 to 3.8 is the same level of effort for custom blocks.
> Also, is there a timeline for 3.9?
I have one application that straddles 3.7 and 3.9 -- there were some
gotchas, and I'm not going to recommend anyone convert to 3.9 yet. The
3.7-to-3.8 conversion should be quite a bit smoother than 3.6 to 3.7.

>
> Again, thanks for any guidance.
>
> Criss Swaim
> cswaim@tpginc.net
> cell: 505.301.5701
>
> On 10/31/2020 9:55 AM, Marcus Müller wrote:
>> Hi Craig, hi Marcus,
>>
>> Also, just because I need to point that out:
>>
>> GNU Radio 3.7 is really a legacy series of releases by now. You should
>> avoid using it for new developments - it's getting harder and harder to
>> even build it on modern versions of Linux. In fact, a lot of its
>> dependencies simply don't exist for modern systems anymore.
>> Developing for 3.7 is hence dangerous in terms of lifetime. That's among
>> the chief reasons why we released 3.8. Took us long enough!
>>
>> 3.7.9.2 is positively ancient. A 3.7.13.4 or later should be the oldest
>> version of GNU Radio you work with, even when maintaining old code.
>>
>> Other than that:
>>
>>> Oct 29 10:45:07 tf kernel: analysis_sink_1[369]: segfault at
>>> 7f9c5a7fd000 ip 00007f9dd9361d43 sp 00007f9c5a48a638 error 6 in
>>> libgnuradio-vandevender.so[7f9dd9336000+4d000]
>> This really looks like a bug in your code!
>> These happen easily with the older style msgq that you seem to be using
>> (we've basically all but removed these in current development versions
>> of GNU Radio), especially if directly interfacing with Python land,
>> which has different ideas of object lifetime than your C++ code might
>> have...
>> I think a slight reconsideration of your software architecture might
>> help here, but I've not seen your overall code.
>>
>>> With an FFT size of 2**22 bins. This took about 20 seconds for the
>>> FFTW3 "planner" to crunch on, but after that, worked
>>> just fine within the flow-graph.
>>>
>> Not quite 20s for me, but yes, single-threaded FFT performance was about
>> 14 transforms of that size per second, 2 threads allowed for ~23
>> transforms a second, 4 threads for about 28. Knowing GNU Radio, I'd
>> recommend you rather stick with a single thread per transform, because
>> other blocks also have CPU requirements (if you really want to increase
>> throughput, deinterleave vectors and have multiple single-threaded FFTs
>> run in parallel, then recombine after).
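The deinterleave-and-recombine idea can be sketched outside GNU Radio: round-robin successive FFT frames across workers, run one single-threaded transform per frame, and collect the results in order. numpy stands in for the GNU Radio fft block here, and the sizes are toy values rather than 2**22:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Eight successive FFT input frames (toy size; the real system uses
# millions of bins per frame).
rng = np.random.default_rng(0)
frames = [rng.standard_normal(1024).astype(np.complex64) for _ in range(8)]

# Distribute frames across workers, one single-threaded transform each.
# pool.map preserves input order, which plays the role of the
# re-interleave step after the parallel FFTs.
with ThreadPoolExecutor(max_workers=4) as pool:
    spectra = list(pool.map(np.fft.fft, frames))

# Sanity reference: the same transforms run back to back.
reference = [np.fft.fft(f) for f in frames]
```

Within a flowgraph the same shape is achieved with a deinterleave block feeding several fft blocks whose outputs are interleaved again, so each transform stays single-threaded.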
>>
>> Seeing that you only need 20 MS/s, and 14 transforms are 14 · 2²² =
>> 7·2²³ samples a second, which would be roughly 59 MS/s, I think you
>> are fine. If you're not, get a faster PC, honestly!
>>
>> Best regards,
>> Marcus M (the younger Marcus)
>>
>> On 29.10.20 23:53, Marcus D. Leech wrote:
>>> On 10/29/2020 06:03 PM, Criss Swaim wrote:
>>>> we are running version 3.7.9.2
>>>>
>>> I constructed a simple flow-graph in GR 3.7.13.5
>>>
>>> osmosdr_source--->stream-to-vector-->fft-->null-sink
>>>
>>> With an FFT size of 2**22 bins. This took about 20 seconds for the
>>> FFTW3 "planner" to crunch on, but after that, worked
>>> just fine within the flow-graph.
>>>
>>> You should really keep your FFT sizes to a power-of-2, particularly at
>>> this size range. That's not related to your problem
>>> directly, but power-of-2 FFTs have lower computational complexity.
>>> Among other things, the FFTW3 "planner" for
>>> non-power-of-2 FFTs at these eye-watering FFT sizes seems to take a
>>> LONG time to compute a "plan".
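Rounding a desired bin count up to the next power of two is a one-liner; e.g. the 3 million bins mentioned elsewhere in this thread would round up to 2**22. A small helper (the name next_pow2 is just for illustration):

```python
def next_pow2(n):
    """Smallest power of two >= n (for n >= 1)."""
    # (n - 1).bit_length() counts the bits needed to represent n - 1,
    # so shifting 1 by that amount yields the next power of two.
    return 1 << (n - 1).bit_length()

# 3 million bins rounds up to 2**22 = 4,194,304; a size that is already
# a power of two is returned unchanged.
print(next_pow2(3_000_000))  # -> 4194304
```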
>>>
>>> You should probably look at restructuring your code--looks like you're
>>> using message queues and marshaling your samples
>>> as *string* data through those queues. While it shouldn't necessarily
>>> segfault, it's not a terribly efficient way of doing things.
>>>
>>>
>>>> Criss Swaim
>>>> cswaim@tpginc.net
>>>> cell: 505.301.5701
>>>> On 10/29/2020 11:37 AM, Marcus D. Leech wrote:
>>>>> On 10/29/2020 01:17 PM, Criss Swaim wrote:
>>>>>> I have attached a png of the flow graph and the error msgs from the
>>>>>> system log are below. These error messages are the only messages.
>>>>>>
>>>>>>> Oct 29 10:45:26 tf abrt-hook-ccpp[378]: /var/spool/abrt is
>>>>>>> 23611049718 bytes (more than 1279MiB), deleting
>>>>>>> 'ccpp-2020-10-27-15:30:43-28474'
>>>>>>> Oct 29 10:45:07 tf abrt-hook-ccpp[378]: Process 329 (python2.7) of
>>>>>>> user 1000 killed by SIGSEGV - dumping core
>>>>>>> Oct 29 10:45:07 tf audit[370]: ANOM_ABEND auid=1000 uid=1000
>>>>>>> gid=1000 ses=8656
>>>>>>> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=370
>>>>>>> comm="copy11" exe="/usr/bin/Oct 29 10:45:07 tf audit[369]:
>>>>>>> ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=8656
>>>>>>> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=369
>>>>>>> comm="analysis_sink_1" exe="Oct 29 10:45:07 tf kernel: traps:
>>>>>>> copy11[370] general protection ip:7f9e0acfdee0 sp:7f9c5a7fb590
>>>>>>> error:0 in libpthread-2.22.so[7f9e0acf1000+18000]
>>>>>>> Oct 29 10:45:07 tf kernel: analysis_sink_1[369]: segfault at
>>>>>>> 7f9c5a7fd000 ip 00007f9dd9361d43 sp 00007f9c5a48a638 error 6 in
>>>>>>> libgnuradio-vandevender.so[7f9dd9336000+4d000]
>>>>>> Flow is USRP -> stream to vector -> fft -> complex to mag ->
>>>>>> bin_sub_avg -> analysis_sinkf
>>>>>>
>>>>>> bin_sub_avg (python) & analysis_sinkf (c/c++) are custom blocks.
>>>>>>
>>>>>> The function of bin_sub_avg, which is written in Python, is to start
>>>>>> a background task which periodically (in this case hourly) samples
>>>>>> the input signal, calculates the background noise, and subtracts it
>>>>>> from the signal that is passed to the analysis_sinkf module.
>>>>>>
>>>>>> analysis_sinkf monitors each bin, and only when specific thresholds
>>>>>> for the bin are met (i.e. duration, strength) is the signal written to
>>>>>> a signal file. Signals not passing the criteria are dropped.
>>>>>>
>>>>>> This code base has been running for over 3 years, with the original
>>>>>> system implementation about 8/9 years ago.
>>>>>>
>>>>>> I have traced the problem to the input signal into bin_sub_avg when
>>>>>> the number of fft bins is 3 million (2 million works). At 3 million
>>>>>> bins, any reference to the result of the delete_head() function in
>>>>>> the python code causes a failure. The python code just fails
>>>>>> without a traceback, then the invalid data stream is passed to the
>>>>>> analysis_sinkf module which is C/C++ and it causes the segment fault.
>>>>>>
>>>>>> Thus my suspicion is there is a limit in the fft block on the number
>>>>>> of bins it can handle and some variable is overflowing, but this is
>>>>>> a guess at this point. There may be a restriction in the
>>>>>> gr.io_signature module, but that seems unlikely.
>>>>>>
>>>>>>
>>>>> What version of Gnu Radio is this?
>>>>>
>>>>>
