Friday, June 5, 2026

Re: Accumulating into output_items[0] across multiple general_work()

Hi Dani,

Thanks for your response, very informative!

I guess it's probably reasonable not to base a block implementation on an assumption of "it works because I looked at the source code". And you're right: exotic buffer implementations are free to hand off output buffers however they want to (I think they would need very specific reasons to do so, but it's still not contract-breaking).

My block is actually not (just) an FFT; it's a bit more complicated. I only mentioned that for the sake of giving a simple example use case, but your insight is valuable.

Thanks again,
Wael

> On Jun 4, 2026, at 11:43 PM, Daniel Estévez <daniel@destevez.net> wrote:
>
> Hi Wael,
>
> This is a great question. My understanding is basically the same as the conclusion you've arrived at. In practice this works correctly for GR3 regular CPU buffers, because the buffer is a ring buffer and the write pointer is not advanced if you return zero as the number of items that your general_work() call has produced. So the next general_work() call sees the same data that the previous call wrote in the output buffer. If I remember correctly, this also works in the same way for GR4 regular CPU buffers.
>
> However, I think this is going into undocumented assumptions of how the buffer system should work. I don't think this is solidly written down anywhere, but for me the contract that general_work() has with the output buffer is that it should never read data from the output buffer, and it needs to write exactly as many items at the beginning of the buffer as the number of items that are being produced. Anything else might break under an exotic custom buffer implementation. For instance, I could imagine an implementation which hands off an output buffer taken from a common buffer pool, with no guarantee that the buffer is the same in consecutive calls even if no data was produced.
>
> Since your use case is accumulating many FFT frames, I would say that storing the accumulator as a member in the block class and copying the result to the output buffer whenever the accumulation has finished is quite acceptable. The output data rate is going to be quite low, so the memory copies add very little overhead.
>
> Another comment is that a very long integration can be realized by cascading multiple shorter integrations, which can be realized with the in-tree Integrate block. For instance you could integrate 1e9 FFTs by cascading three Integrate blocks, each set to an integration of 1e3. When using floating point numbers this approach is also better numerically, because otherwise you are sequentially adding numbers to an accumulator that ends up being approximately 1e9 times larger than the input numbers, which doesn't really work because the float32 machine epsilon is about 1e-7.
>
> Best,
> Dani.
>
> On 05/06/2026 03:27, Wael Farah wrote:
>> Hi all,
>>
>> I have a block whose output items are running averages over a long integration (for the sake of simplicity, say a power spectrum accumulated over millions of FFT frames, way too many to hold as one input buffer).
>>
>> The implementation that fell out naturally is: in general_work(), add the next batch of partial contributions directly into output_items[0][n_emitted], return 0 while the integration is still incomplete, and only return n_emitted > 0 when one or more integrations are done. It's pretty tempting to do, as this would avoid an extra memory allocation for an internal buffer and a memcpy back to output_items when the accumulation is ready.
>>
>> However, this relies on the assumption that returning 0 leaves the write pointer unadvanced, so the next general_work() call sees the same memory at output_items[0][n_emitted] and I can keep adding into it.
>>
>> As far as I can tell, this works on current GR (no-op when produce_each(0) <https://github.com/gnuradio/gnuradio/blob/main/gnuradio-runtime/lib/block_detail.cc#L123-L132> -> write pointer not updated), but it's more of an implementation detail than a documented part of the scheduler API.
>>
>> Two questions:
>>
>> 1) Is the "accumulate into output_items[0] across calls" pattern supported, or am I in undocumented/unidentified scheduler/buffer behavior territory?
>>
>> 2) If it's not supported, is there any reason beyond "no API guarantee"; e.g. would it break under certain custom buffers or, in the future, under GR4?
>>
>> If the answer is just "use an internal buffer," happy to refactor.
>>
>> Thanks!
>> Wael
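
For readers following the thread, here is a minimal sketch of the internal-accumulator approach Dani suggests. It is deliberately written without GNU Radio so it compiles on its own: the class name SpectrumIntegrator, the work() signature and the n_consumed out-parameter are hypothetical stand-ins for a real block's general_work(), and the double-precision accumulator and the division by the frame count (to produce an average) are assumptions of this sketch, not something stated in the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block that averages power spectra.
// It mimics the shape of general_work() without depending on GNU Radio.
class SpectrumIntegrator {
public:
    SpectrumIntegrator(std::size_t fft_size, std::size_t n_integrate)
        : d_fft_size(fft_size),
          d_n_integrate(n_integrate),
          d_acc(fft_size, 0.0),
          d_count(0)
    {
    }

    // in:  n_in frames of fft_size floats each.
    // out: room for max_out frames of fft_size floats each.
    // Returns the number of frames written at the start of `out`.
    // The output buffer is only ever written, never read back.
    std::size_t work(const float* in, std::size_t n_in,
                     float* out, std::size_t max_out,
                     std::size_t& n_consumed)
    {
        std::size_t n_produced = 0;
        n_consumed = 0;
        while (n_consumed < n_in && n_produced < max_out) {
            // Accumulate one input frame into the member buffer.
            const float* frame = in + n_consumed * d_fft_size;
            for (std::size_t k = 0; k < d_fft_size; ++k) {
                d_acc[k] += frame[k];
            }
            ++n_consumed;

            if (++d_count == d_n_integrate) {
                // Integration finished: copy the average to the output
                // and reset the accumulator for the next integration.
                float* dst = out + n_produced * d_fft_size;
                for (std::size_t k = 0; k < d_fft_size; ++k) {
                    dst[k] = static_cast<float>(d_acc[k] / d_n_integrate);
                }
                std::fill(d_acc.begin(), d_acc.end(), 0.0);
                d_count = 0;
                ++n_produced;
            }
        }
        return n_produced;
    }

private:
    std::size_t d_fft_size;
    std::size_t d_n_integrate;
    std::vector<double> d_acc; // partial sums survive across calls here
    std::size_t d_count;       // frames accumulated so far
};
```

Because the partial sums live in d_acc, a call that returns 0 leaves nothing behind in the output buffer, so the sketch does not care whether the next call receives the same output memory.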
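
Dani's numerical point about cascading can be illustrated with a small standalone experiment. This is only a sketch under stated assumptions: the function names integrate_direct and integrate_cascaded are hypothetical, the code is not the Integrate block's implementation, it sums a constant input (a worst case chosen to make the effect obvious), and it assumes plain IEEE 754 float32 arithmetic without flags such as -ffast-math.

```cpp
#include <cstddef>

// Direct integration: a single float32 accumulator over all n samples.
// Once the accumulator is large enough, adding a small value no longer
// changes it, because the sum rounds back to the same float.
float integrate_direct(float value, std::size_t n)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += value;
    }
    return acc;
}

// Cascaded integration: three stages, each summing `len` outputs of the
// previous stage, for len * len * len samples in total. Every accumulator
// only grows to about `len` times the size of the values it adds.
float integrate_cascaded(float value, std::size_t len)
{
    float stage3 = 0.0f;
    for (std::size_t i = 0; i < len; ++i) {
        float stage2 = 0.0f;
        for (std::size_t j = 0; j < len; ++j) {
            float stage1 = 0.0f;
            for (std::size_t k = 0; k < len; ++k) {
                stage1 += value;
            }
            stage2 += stage1;
        }
        stage3 += stage2;
    }
    return stage3;
}
```

With value = 1.0f, the direct sum stops growing at 2^24 = 16777216, since 16777217 is not representable in float32 and the addition rounds back down. Calling integrate_cascaded(1.0f, 400) covers 64,000,000 samples, and every partial sum along the way stays exactly representable.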
