Thursday, October 29, 2020

Re: Maximum Number of Bins

I have attached a PNG of the flow graph; the error messages from the system log are below. These are the only messages logged.

Oct 29 10:45:26 tf abrt-hook-ccpp[378]: /var/spool/abrt is 23611049718 bytes (more than 1279MiB), deleting 'ccpp-2020-10-27-15:30:43-28474'
Oct 29 10:45:07 tf abrt-hook-ccpp[378]: Process 329 (python2.7) of user 1000 killed by SIGSEGV - dumping core
Oct 29 10:45:07 tf audit[370]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=8656 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=370 comm="copy11" exe="/usr/bin/
Oct 29 10:45:07 tf audit[369]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=8656 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=369 comm="analysis_sink_1" exe="
Oct 29 10:45:07 tf kernel: traps: copy11[370] general protection ip:7f9e0acfdee0 sp:7f9c5a7fb590 error:0 in libpthread-2.22.so[7f9e0acf1000+18000]
Oct 29 10:45:07 tf kernel: analysis_sink_1[369]: segfault at 7f9c5a7fd000 ip 00007f9dd9361d43 sp 00007f9c5a48a638 error 6 in libgnuradio-vandevender.so[7f9dd9336000+4d000]

Flow is USRP -> stream to vector -> fft -> complex to mag -> bin_sub_avg -> analysis_sinkf

bin_sub_avg (Python) and analysis_sinkf (C/C++) are custom blocks.

The function of bin_sub_avg, which is written in Python, is to start a background task that periodically (in this case hourly) samples the input signal, calculates the background noise, and subtracts it from the signal that is passed to the analysis_sinkf module.
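
As a rough illustration of the background subtraction described above, here is a minimal NumPy sketch. It is not the actual bin_sub_avg code: the function names `estimate_background` and `subtract_background`, the use of a per-bin mean as the noise estimate, and the clipping at zero are all assumptions made for the example.

```python
import numpy as np

def estimate_background(frames):
    """Per-bin mean over a batch of sampled magnitude frames (assumed estimator)."""
    frames = np.asarray(frames, dtype=np.float32)
    return frames.mean(axis=0)

def subtract_background(frame, background):
    """Subtract the per-bin background and clip negatives to zero (an assumption)."""
    frame = np.asarray(frame, dtype=np.float32)
    return np.clip(frame - background, 0.0, None)

# Tiny example with 4 bins instead of millions.
sampled = [[1.0, 2.0, 3.0, 4.0],
           [3.0, 2.0, 1.0, 4.0]]
background = estimate_background(sampled)
cleaned = subtract_background([5.0, 2.0, 1.0, 6.0], background)
```

In the real block the sampled frames would come from the hourly background task rather than a hard-coded list.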

analysis_sinkf monitors each bin, and the signal is written out to a signal file only when specific thresholds for the bin (i.e., duration and strength) are met. Signals that do not pass the criteria are dropped.
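
The per-bin criteria can be pictured with the following Python sketch. The real analysis_sinkf block is C/C++ and its logic is not shown in this thread, so the function `update_detections`, the "consecutive frames above a strength threshold" rule, and the parameter names are hypothetical stand-ins.

```python
import numpy as np

def update_detections(frame, run_lengths, strength_threshold, min_duration):
    """Track, per bin, how many consecutive frames met the strength threshold.

    Returns the indices of bins whose run length has just reached min_duration.
    """
    above = np.asarray(frame) >= strength_threshold
    # Extend the run where the bin is above threshold, reset it elsewhere.
    run_lengths[:] = np.where(above, run_lengths + 1, 0)
    return np.flatnonzero(run_lengths == min_duration)

# Three bins, three frames; only bin 1 stays strong for all three frames.
run_lengths = np.zeros(3, dtype=np.int64)
frames = [[0.1, 0.9, 0.9],
          [0.2, 0.9, 0.1],
          [0.1, 0.9, 0.9]]
hits = [update_detections(f, run_lengths, 0.5, 3).tolist() for f in frames]
```

Bins that never reach the required duration are simply never reported, which corresponds to signals being dropped.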

This code base has been running for over 3 years; the original system implementation dates from about 8 or 9 years ago.

I have traced the problem to the input signal into bin_sub_avg when the number of FFT bins is 3 million (2 million works). At 3 million bins, any reference to the result of the delete_head() function in the Python code causes a failure. The Python code just fails without a traceback; the invalid data stream is then passed to the analysis_sinkf module, which is C/C++, and that causes the segmentation fault.

Thus my suspicion is that there is a limit in the FFT block on the number of bins it can handle and some variable is overflowing, but this is a guess at this point. There may be a restriction in the gr.signature_io module, but that seems unlikely.
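
One quick sanity check on the overflow guess is to compute the size in bytes of a single vector at each bin count. The sketch below assumes complex64 items out of the FFT and float32 items after complex to mag; the helper `vector_item_bytes` is a hypothetical name used only for this example.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max  # 2147483647

def vector_item_bytes(num_bins, dtype):
    """Size in bytes of one vector of num_bins samples of the given dtype."""
    return num_bins * np.dtype(dtype).itemsize

sizes = {}
for bins in (2000000, 3000000):
    sizes[bins] = (vector_item_bytes(bins, np.complex64),
                   vector_item_bytes(bins, np.float32))
    print(bins, sizes[bins], all(s < INT32_MAX for s in sizes[bins]))
```

Both vector sizes are far below the signed 32-bit limit, so a plain byte count for one vector would not overflow a 32-bit integer. This says nothing about other possible limits, for example in buffer allocation.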

Criss Swaim  cswaim@tpginc.net  cell: 505.301.5701
On 10/28/2020 5:34 PM, Marcus D Leech wrote:
Sharing your flow graph, the exact error messages, and more context would be good.

Presumably you're talking about FFT bins, but it's not clear.

Also, why are your samples being conveyed as strings? That's wildly inefficient.

Sent from my iPhone

On Oct 28, 2020, at 7:24 PM, Criss Swaim <cswaim@tpginc.net> wrote:

I am working on a new application of gnuradio that pushes the limits--satellite-based detection of RF from rotating magnetized-quark-nugget dark matter transiting through the magnetosphere--and need as many bins as possible to reduce the background noise per frequency channel.

I have successfully run with 2 million bins, but when I jump to 3 million bins, the application abends with a segmentation fault. I have deconstructed the following Python line in a custom Python block that is failing:

raw_samps = numpy.fromstring(self._msgq.delete_head().to_string(), numpy.float32)  

and the failure is occurring while trying to convert the result of delete_head() to a string (to_string()). Any reference to the result of the delete_head() function results in an error.

The _msgq is defined as _msgq = gr.msg_queue(MSGQ_LIMIT), where MSGQ_LIMIT = 2.

Here is the refactored code:

# refactor raw_samps line
test_str = self._msgq.delete_head()
print("bin_sub_avg::got msg ")
sys.stdout.flush()
print(test_str)
sys.stdout.flush()
test_string = test_str.to_string()
print("bin_sub_avg::converted msg to string")
sys.stdout.flush()
raw_samps = numpy.fromstring(test_string, numpy.float32)
print("bin_sub_avg::converted from string to numpy array")
sys.stdout.flush()

The output is:

bin_sub_avg::got msg
"the object for the shared pointer - test_str" (I did not save the exact message)

Then the application aborts.

Is there a limit on the number of bins gnuradio can handle?

Any thoughts on how to find the cause or limit?

--
Criss Swaim
cswaim@tpginc.net
cell: 505.301.5701
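
A side note on the conversion line quoted above: numpy.fromstring on binary data is deprecated in current NumPy releases in favor of numpy.frombuffer. The sketch below shows the equivalent call, with a plain bytes object standing in for the message payload, since the msg_queue object is not reproduced here. It also checks the sample count against the payload length, which can help spot a truncated message.

```python
import numpy as np

# Stand-in for the bytes returned by the message's to_string() call (assumption).
original = np.arange(6, dtype=np.float32)
payload = original.tobytes()

# frombuffer interprets the bytes in place, without parsing text.
raw_samps = np.frombuffer(payload, dtype=np.float32)

# One float32 sample is 4 bytes, so the count should match exactly.
expected_len = len(payload) // np.dtype(np.float32).itemsize
```

The array returned by frombuffer is read-only; call .copy() on it if the samples need to be modified.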
  
