GNU Radio, One Step at a Time: February 2016

Monday, February 29, 2016

[Discuss-gnuradio] How to change the SNR through gnuradio

Hi,

I want to change the SNR through GNU Radio. Can I achieve that just by changing the gain in GNU Radio? Or is the instrument fixed, so that the SNR can't be changed?

Thanks so much.

Best Regards,

Re: [Discuss-gnuradio] Saving Waterfall to Video File

In the past, I have saved decimated vectors of the signal to a file and used them to recreate a long, static waterfall plot for debugging satellite passes.
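A minimal sketch of the approach Dan describes, assuming the flowgraph wrote fixed-length float32 vectors (for example FFT magnitudes) back-to-back into a raw file, as a file sink with a vector item type would. The vector length `VEC_LEN` and the file name are hypothetical. Each vector becomes one row of the waterfall, which any plotting or video tool can then render.

```python
import array
import os
import tempfile

VEC_LEN = 4  # hypothetical vector (FFT) length used by the flowgraph


def load_waterfall(path, vec_len=VEC_LEN):
    """Read back-to-back float32 vectors from `path` and return them as rows."""
    samples = array.array("f")  # C float, 4 bytes on common platforms
    with open(path, "rb") as f:
        data = f.read()
    # Drop a trailing partial vector (e.g. if the flowgraph was stopped mid-write)
    usable = len(data) - len(data) % (vec_len * samples.itemsize)
    samples.frombytes(data[:usable])
    return [samples[i:i + vec_len].tolist() for i in range(0, len(samples), vec_len)]


# Usage: write three 4-bin vectors, then read them back as three waterfall rows.
rows_in = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]]
path = os.path.join(tempfile.mkdtemp(), "waterfall.f32")
with open(path, "wb") as f:
    array.array("f", [x for row in rows_in for x in row]).tofile(f)
rows = load_waterfall(path)
print(len(rows), rows[1])  # 3 [4.0, 5.0, 6.0, 7.0]
```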

On Sun, Feb 28, 2016 at 2:19 PM Stephen Berger <stephen.berger.temconsulting@gmail.com> wrote:

I would like to save a waterfall plot (spectrogram) to a video file so that I can share it and cut out portions of interest for presentation. Has anyone found a way to directly save the output to an MP4 or some other video format?

I have been using a screen recorder, but that takes a good deal of time, and the process ends up requiring two to four different programs by the time you record, edit, and have your file ready for presentation. Saving it directly would save time.

Best Regards,

Stephen Berger
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

Very Respectfully,

Dan CaJacob

[Discuss-gnuradio] Aliasing on USRP N210

I am observing aliasing on the spectrum analyzer in the following simple
transmission scenario:
sampling freq=25MHz, carrier freq=12.5MHz, and a sine wave freq=-11MHz at
baseband.
I use the following command:
> uhd_siggen -s 25e6 -f 12.5e6 --sine -x -11e6

I expect to see a delta at freq=1.5MHz, but there is another one at
freq=26.5MHz (see attached jpeg).

It occurs with either the LFTX or the BasicTX daughterboard. When I move the sine wave
from -11MHz toward zero, the aliasing gradually disappears. It appears
again when moving up toward +10MHz.

Why does it happen? Can I avoid it?

Thanks in advance.
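For reference, the arithmetic below shows where the two deltas land. It assumes the usual model in which a complex baseband tone at f_bb, sampled at f_s, also has spectral images at f_bb + k*f_s, and the transmit chain shifts everything up by the carrier f_c. With the values from the command above, the wanted tone sits at 1.5 MHz and the k = +1 image lands at 26.5 MHz, matching the extra delta. This is only bookkeeping of where an image would appear. It does not explain why the image isn't suppressed.

```python
def rf_images(f_bb, f_s, f_c, ks=(-1, 0, 1)):
    """Return {k: RF frequency} for baseband images f_bb + k*f_s shifted up by f_c."""
    return {k: f_c + f_bb + k * f_s for k in ks}


images = rf_images(f_bb=-11e6, f_s=25e6, f_c=12.5e6)
for k, f in sorted(images.items()):
    print(f"k={k:+d}: {f / 1e6:.1f} MHz")
# k=-1: -23.5 MHz
# k=+0: 1.5 MHz
# k=+1: 26.5 MHz
```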

[Discuss-gnuradio] Google Summer of Code 2016

Everyone,

I'm really happy to say that GNU Radio was accepted as a participating
organization for Google Summer of Code 2016! After not participating
last year, I'm very glad I can give this news, and it'll be the fourth
time that we participate in this program.
Note: We'll also apply for SOCIS again, but I haven't heard from them yet.

For anyone interested in participating, it's imperative that you read
all of our GSoC pages carefully, starting here:

http://gnuradio.org/redmine/projects/gnuradio/wiki/GSoC

This includes our ideas list, which should be a starting point for
anyone planning to apply.

Also, have a look at the summer of code home page, and the timeline:

https://summerofcode.withgoogle.com/how-it-works/

Applications start on March 14th, but you should definitely get in touch
with the community before that if you're interested.

*What to do if you want to participate in GSoC '16:*

If you want to participate, you should first pick a project to work on.
The ideas list (see link above) is where to go first for this, although
you can bring your own ideas to the table as well (I still recommend
looking at the ideas page to get an idea for a good scope).

Next, get in touch with us via the mailing list. We're very happy about
people wanting to participate, so please don't be intimidated by the
mailing list. Let us know what you want to do!

Sooner rather than later, you need to start working on your actual
application. You can do this before the application deadline: in the
past, students have often prepared their application on GitHub or
similar web services, and then posted a link to that page in their
actual application on the GSoC application page. This lets you iterate
on the application faster, and makes it easier for community members to
give input. It might seem scary to post something like this in the
public, but that's what you'll be doing with your work for the duration
of the summer if you get accepted.

A couple of hints:
- Really read the pages I linked to up top.
- In the past, we've always had many more applicants than slots. So
start working on your application early to give you a head start!
- Usually, many more people apply to do radio stuff than to work on
GUIs, whereas the latter is usually more useful to the bulk of our
users. Just saying :)

OK, I'm hoping to see a lot of new people this round! If you have any
questions, please ask them in this thread.

Cheers,
Martin

Re: [Discuss-gnuradio] GSoC '16

Vilvanesh,

glad to see you (and hopefully many more!) students excited about GSoC
'16. I'll have a message ready for this list very soon, hopefully it'll
answer your questions!

Cheers,
Martin

On 02/29/2016 12:03 PM, vilvaneshk . wrote:
> Hello developers!
>
> I'm Vilvanesh. I'm so glad and excited to find GNU Radio
> in the list of accepted organisations for GSoC '16
> https://summerofcode.withgoogle.com/organizations/?sp-search=gnu%20radio .
> I am really interested in contributing. I went through the wiki and
> ideas page. I found two projects to be beginner-friendly: i) GRC
> extensions: output C++ code,
> ii) offline analysis and virtualisation tools. Tim O'Shea is the mentor
> for both of them. It would be really helpful if this reached his
> ears/eyes. Can someone help with that?
>
>
> cheers,
> vilvanesh k.

[Discuss-gnuradio] GSoC '16

Hello developers!

I'm Vilvanesh. I'm so glad and excited to find GNU Radio in the list of accepted organisations for GSoC '16 https://summerofcode.withgoogle.com/organizations/?sp-search=gnu%20radio . I am really interested in contributing. I went through the wiki and ideas page. I found two projects to be beginner-friendly: i) GRC extensions: output C++ code,

ii) offline analysis and virtualisation tools. Tim O'Shea is the mentor for both of them. It would be really helpful if this reached his ears/eyes. Can someone help with that?

cheers,

vilvanesh k.

[Discuss-gnuradio] "file sink" problem when I use GRC

Hi all,

I use two USRP N210s and two PCs to transmit a file, implemented in GRC.

PC1 (transmitter):

PC2 (receiver):

On PC1 (transmitter) I have a file whose content is "hello world". I expect PC2 (receiver) to receive the file and save it to the path I defined.

But when I ran the GRC flowgraph, I saw the "hello world" output in the console, but it wasn't written to the file on PC2. The file on PC2 is still blank.

I had seen this "file transmit" example on YouTube, where the data was successfully received and saved on the receiver side. So I don't know where I went wrong.

Thanks,

Re: [Discuss-gnuradio] Is there way to specify the destination of signal ?

Dear SangHyuk Kim,

Yes, that's possible, but the way you describe this is a Layer 2 or even Layer 3 problem – the typical job of GNU Radio is to do signal processing, i.e. Layer 1. Of course, as Basti's gr-ieee802-11 demonstrates, you can write very practical blocks that do the MAC logic; but that's up to you to implement (as it is very system specific).

As an easy way to implement this, look at the gr-digital/examples/ofdm/rx_ofdm.grc example included in GNU Radio.
Look at the "Header/Payload Demux" Block: it takes in samples, and outputs one header stream, and one payload stream. Follow the header stream: It ends in the "Packet Header Parser".
The "Packet Header Parser" tells the "Header/Payload Demux" to forward the payload to the payload stream. You could just implement something that works like the "Packet Header Parser", but also checks whether the packet was for your receiver (or for someone else).

But to be honest, most communication systems wouldn't do that. The information whether a packet is for "you" belongs to the next higher layer, and hence, should be included in the payload data.

So the easiest way to do this is to add a "destination address" to your payload data, and read that from the payload data, with a very minimal Python block.
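A minimal sketch of that payload-based filtering, written as plain Python so the logic can be tested outside a flowgraph. Inside GNU Radio the same check would sit in the work function (or message handler) of a small Python block. The one-byte destination address at the start of the payload, the address values, and the function names are hypothetical choices for illustration, not part of any GNU Radio packet format.

```python
def make_packet(dest, payload):
    """Prepend a one-byte destination address (hypothetical packet format)."""
    return bytes([dest]) + payload


def accept_packet(packet, my_address):
    """Return the payload if the packet is addressed to my_address, else None."""
    if not packet or packet[0] != my_address:
        return None  # empty, or meant for someone else: discard
    return packet[1:]


ADDR_B, ADDR_C = 0x0B, 0x0C  # hypothetical addresses of receivers B and C

pkt = make_packet(ADDR_B, b"hello B")  # A sends a packet to B
print(accept_packet(pkt, ADDR_B))      # at B: b'hello B'
print(accept_packet(pkt, ADDR_C))      # at C: None
```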

Best regards,
Marcus

On 29.02.2016 13:26, SangHyuk Kim wrote:

Hi all.

I want to build a unicast communication system like Wi-Fi using SDR (in my case, a USRP N210).

However, I can't find any block for that.

Here is exactly the situation I mean:

A -------> B

C

(A; Tx packet to B, B; Rx packet from A, C; discard packet from A to B)

Is this possible in GNU Radio ?

Re: [Discuss-gnuradio] Wi-Fi Channel Monitoring

Hi,

On 28 Feb 2016, at 16:03, Stephen Berger <stephen.berger.temconsulting@gmail.com> wrote:

Let me offer a non-GNU Radio solution. This can probably be replicated in GNU Radio, but I haven't done it and so would need to look into how feasible it is to implement. You can use Wireshark and any WiFi NIC to capture packets. If you can put the NIC into promiscuous mode, you can sequentially tune it to each channel and record any packets that occur. I was dwelling on each channel for 3 seconds and scanning all the 2.4 and 5 GHz channels.

In each captured packet is the channel number being used and the RSSI of the signal at each device. For some purposes this RSSI is useful. If you can further measure the RSSI at your USRP you have 2 measures of the signal and that could potentially be very useful.

I also looked at things like the number of access points and attached devices. This is useful because you can start to compare environments based on the network configuration of the access points and the number of people using their devices in the area.

What this method does not tell you is what is out there that is not WiFi. In some recent measurements I am seeing an increasing number of Bluetooth and Bluetooth Low Energy devices. Not a surprise.

Sounds good. Since you use WiFi cards, I wanted to mention that some of them support /spectral scans/ [1], which report the power level at different frequencies.

These measurements are independent of received frames and allow you to build a very simple spectrum analyser, i.e., interferers like Bluetooth can also be detected.

Best,

Bastian

[1] http://blog.altermundi.net/article/playing-with-ath9k-spectral-scan/

[Discuss-gnuradio] Is there way to specify the destination of signal ?

Hi all.

I want to build a unicast communication system like Wi-Fi using SDR (in my case, a USRP N210).

However, I can't find any block for that.

Here is exactly the situation I mean:

A -------> B

C

(A; Tx packet to B, B; Rx packet from A, C; discard packet from A to B)

Is this possible in GNU Radio ?

Re: [Discuss-gnuradio] How to specify maximum size of input buffers on blocks

Hi Gonzalo,

On 29.02.2016 04:58, Gonzalo Arcos wrote:

I have seen that most of the buffers are almost full on average (80-89%); however, that does not tell me whether a block was blocked from pushing data into the buffer because downstream blocks did not read fast enough.

Hm, while uniform usage is usually a good sign, that much average fill isn't good. You say "most" blocks; what's downstream of these high fill buffers?

From your original mail:

"File Source is only allowed to produce so many items that the write pointer doesn't advance beyond the minimum read pointer, because in that case, it would overwrite samples that a downstream block hasn't consumed."

Is the maximum distance between the write pointer and the minimum read pointer percentage-based, or a fixed number of items? I'm asking because I tried playing around with the buffer sizes but did not notice any change in the performance of the flowgraph, either for better or worse.

I don't really understand your question. The point is that the write pointer must never pass the read pointer(s). The difference between the "most behind" read pointer and the write pointer is used to calculate noutput_items for the block's (general_)work call. In fact, your block is asked "to generate this much output, how much input would you need?" via its "forecast" method with exactly that difference. If your forecast says it needs more input than is available, the number is halved and "forecast" is called again, until forecast only demands as much input as is available. At that point, general_work is called with noutput_items set to that number.
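The halving loop described above can be modelled in a few lines. This is a simplified sketch of the idea, not the actual scheduler code (the real logic lives in GNU Radio's C++ runtime and handles more cases, such as multiple inputs and output multiples). `decimating_forecast` is a hypothetical stand-in for a decimating block that needs `decim` input items per output item.

```python
def choose_noutput_items(space_available, items_readable, forecast):
    """Halve the output request until forecast() asks for no more input than exists.

    space_available: free output slots (write pointer vs. slowest read pointer)
    items_readable:  items available on the (single) input
    forecast:        function mapping noutput_items -> ninput_items_required
    """
    n = space_available
    while n > 0 and forecast(n) > items_readable:
        n //= 2
    return n  # 0 means: not enough input yet, so general_work isn't called


decim = 4  # hypothetical decimation factor


def decimating_forecast(noutput_items):
    return noutput_items * decim


print(choose_noutput_items(8192, 10000, decimating_forecast))  # 2048
```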

You said that a buffer with size 0 will use the default GNU Radio size. I would like to ask: is this a fixed value taken from a config file?

umm... Buffer generation. Wait... Here it is: [1] and used here [2].
So, it defaults to 64KB buffers. That aligns nicely with the observation that complex number buffers are typically 8192 items long (complex = float I + float Q = 32bit + 32bit = 8B; 8192 * 8 = 64K).
If you have other restrictions (e.g. set_min_output_buffer, or an item size that isn't a factor of the page size (4KB)), you might get different sizes, though.
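The arithmetic behind those numbers, as a quick sketch: it takes the 64 KB default at face value and ignores the page-size rounding and set_min_output_buffer cases mentioned above, so real buffer sizes can differ. The item sizes are those of common GNU Radio stream types.

```python
DEFAULT_BUFFER_BYTES = 64 * 1024  # the 64 KB default quoted above


def items_per_buffer(item_size, buffer_bytes=DEFAULT_BUFFER_BYTES):
    """Number of items of item_size bytes that fit in a buffer of buffer_bytes."""
    return buffer_bytes // item_size


# gr_complex is two 32-bit floats (I and Q), i.e. 8 bytes per item
for name, size in [("complex", 8), ("float", 4), ("short", 2), ("byte", 1)]:
    print(f"{name:>7}: {items_per_buffer(size)} items")
# complex: 8192 items
#   float: 16384 items
#   short: 32768 items
#    byte: 65536 items
```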

Or does GNU Radio automatically assess the system memory and set a default buffer size appropriately? For example, if I would like to give GNU Radio 6 GB of RAM for the buffers, then upstream blocks should not block because downstream blocks are slow to process the buffer data, at least for a large number of items.

Hm, that would be a nice feature, at least for experimentation (enlarging all the buffers might have pretty ugly effects on latency, so I wouldn't recommend doing it unless you know that (buffer_size/2) / sample rate * the number of blocks in the critical path is significantly less than your tolerable latency). I don't think GNU Radio has that feature, though :) I'd say, in general, it's a good idea to manually enlarge the output buffers of the blocks upstream of CPU-intense blocks, but to keep granularity small in the rest of the flow graph.
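To put numbers on that latency warning, here is a back-of-the-envelope sketch. It assumes each buffer on the critical path sits about half full on average, so draining one buffer's backlog at the sample rate takes (buffer_items / 2) / sample_rate seconds, and N such buffers add roughly N times that. This is an estimate under that assumption, not a measurement; the sample rate, buffer size and block count are made-up example values.

```python
def buffering_latency_s(sample_rate, buffer_items, blocks_in_path):
    """Rough added latency if each buffer on the critical path is about half full."""
    return (buffer_items / 2) / sample_rate * blocks_in_path


# Hypothetical example: 1 MS/s stream, 8192-item buffers, 5 blocks in the path
lat = buffering_latency_s(1e6, 8192, 5)
print(f"{lat * 1e3:.2f} ms")  # 20.48 ms

# The same path with every buffer enlarged 16x
print(f"{buffering_latency_s(1e6, 16 * 8192, 5) * 1e3:.2f} ms")  # 327.68 ms
```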
Feel free to experiment with [1]; you could remove the assignment of the #define constant to s_fixed_buffer_size and replace it with a call to

const unsigned int s_fixed_buffer_size = prefs::singleton()->get_long("DEFAULT", "buffer_size", GR_FIXED_BUFFER_SIZE);

and add a "buffer_size = X" to your .gnuradio/config.conf under the [default] clause¹.

Best regards,
Marcus

¹ um, yeah, that should make the staticness a bit conceptually questionable, which should lead to us renaming the variable, but... hey, experimentation!

[1] https://github.com/gnuradio/gnuradio/blob/master/gnuradio-runtime/lib/flat_flowgraph.cc#L41
[2] https://github.com/gnuradio/gnuradio/blob/master/gnuradio-runtime/lib/flat_flowgraph.cc#L134

2016-02-26 14:48 GMT-03:00 Gonzalo Arcos <gonzaloarcos12@gmail.com>:
I will investigate and try this. Thank you so much Marcus!
2016-02-26 12:09 GMT-03:00 Marcus Müller <marcus.mueller@ettus.com>:
Hi Gonzalo,

However, I noticed that after the optimization, the sum of all blocks' percentages of processing time is not 100%. This is when I started considering the possibility that the GNU Radio scheduler, the GNU Radio framework, or the relationship between blocks is what is preventing me from achieving my desired rate, rather than each block's processing time.
Well, in GNU Radio, every block runs in its own thread, so you can basically get up to (number of CPU cores) * 100% as total consumption.

However, I do not have an easy way to determine the "amount of time a block has been blocked because a downstream block has not read the input buffer fast enough for the first block to be able to write to the buffer".

The performance monitor should be able to tell you how full your buffers are on average; watch out for "full" buffers: they're upstream from your bottlenecks.
Also, look for the average rate (as reported e.g. by the probe_rate), and compare that value with the (average work items/average work call time). That kind of gives you a "block duty cycle".
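One way to read that "block duty cycle" suggestion, as a sketch: a block could sustain (average items per work call) / (average time per work call) items per second if it never waited, so dividing the rate actually observed (e.g. from probe_rate) by that figure gives the fraction of time the block is busy. A value near 1 suggests the block is the bottleneck; a low value suggests it mostly waits on its neighbours. The numbers below are invented for illustration.

```python
def duty_cycle(observed_rate, avg_items_per_work, avg_work_time_s):
    """Fraction of wall-clock time a block spends inside work()."""
    capacity = avg_items_per_work / avg_work_time_s  # items/s if never idle
    return observed_rate / capacity


# Hypothetical numbers: 2 MS/s observed, 4096 items per call, 1 ms per call
print(round(duty_cycle(2e6, 4096, 1e-3), 3))  # 0.488
```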

Best regards,
Marcus
On 02/26/2016 01:51 PM, Gonzalo Arcos wrote:
Hi Marcus,

Thanks a lot for your answer! So now having confirmed this, comes the real question :)

I am trying to improve the performance of a flowgraph. My performance "metric" is the rate at which a file in my filesystem grows, since my flowgraph starts with a file source and ends with a file sink. This means that after doing some processing, the flowgraph outputs the result to a file, so if the file grows faster, then the flowgraph is executing faster and I achieve a faster transfer rate, even if I then replace the file sink with another sink (e.g. a USRP).

The transfer rate is very important to me, because I'm evaluating the performance of the flowgraph against the implementation that already exists in hardware, whose achievable transfer rate I know exactly.

So after using the performance monitor and detecting some blocks that do a lot of processing, I've optimized those blocks.

However, I noticed that after the optimization, the sum of all blocks' percentages of processing time is not 100%. This is when I started considering the possibility that the GNU Radio scheduler, the GNU Radio framework, or the relationship between blocks is what is preventing me from achieving my desired rate, rather than each block's processing time.

However, I do not have an easy way to determine the "amount of time a block has been blocked because a downstream block has not read the input buffer fast enough for the first block to be able to write to the buffer".

It is also not as simple as connecting a file sink at an intermediate point in the flowgraph, deleting the rest of the blocks, and measuring the transfer rate into this new file sink. This is because interpolation and decimation occur in different blocks of my flowgraph, which makes the file size an unreliable metric in these cases.

The only case where this is valid is at the very end of the flowgraph, when the data has been restored to its original content, and therefore any output byte from the final block corresponds to an input byte.

I hope I expressed myself correctly. Do you have any tips on how to figure out why my flowgraph is limited to a certain rate? Should the time a flowgraph takes to execute always equal the sum of the time all blocks consume?
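The file-growth metric can be scripted: sample the output file's size twice and divide by the elapsed time. A hedged sketch, not a GNU Radio tool: the `rate_change` parameter is a hypothetical knob for mapping output bytes back to input-equivalent bytes, and it is only meaningful where the interpolation/decimation ratio up to the tap point is fixed and known. At the final sink, where output and input bytes correspond, leave it at 1.0.

```python
import os
import time


def growth_rate(size0, size1, t0, t1, rate_change=1.0):
    """Input-equivalent bytes per second between two (size, time) samples.

    rate_change: output bytes per input byte at the tap point (hypothetical;
    e.g. 4.0 after a fixed 4x interpolation). Use 1.0 for the final file sink.
    """
    return (size1 - size0) / (t1 - t0) / rate_change


def measure_file_rate(path, interval_s=1.0, rate_change=1.0):
    """Sample the size of `path` twice, interval_s apart, while the flowgraph runs."""
    size0, t0 = os.path.getsize(path), time.monotonic()
    time.sleep(interval_s)
    size1, t1 = os.path.getsize(path), time.monotonic()
    return growth_rate(size0, size1, t0, t1, rate_change)


# Usage (hypothetical path of the file sink's output):
# print(measure_file_rate("/tmp/flowgraph_out.bin", interval_s=5.0) / 1e6, "MB/s")
print(growth_rate(0, 8_000_000, 10.0, 12.0))  # 4000000.0
```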
---------- Forwarded message ----------
From: Marcus Müller <marcus.mueller@ettus.com>
Date: 2016-02-26 6:52 GMT-03:00
Subject: Re: [Discuss-gnuradio] How to specify maximum size of input buffers on blocks
To: discuss-gnuradio@gnu.org
Hi Gonzalo,

these are the mails I like most :)

On 25.02.2016 01:11, Gonzalo Arcos wrote:

Suppose I have a file source connected to two blocks. One, let's call it Block A, always returns immediately after general_work is called; in other words, it never consumes an input item. The other one, say Block B, is an identity block: it consumes and outputs exactly what it receives as input.

The output of the first block goes to a null sink (since I know it will not produce any output), and the output of the second block goes to a file sink.

So:
              /->A->Null Sink
File Source -|
              \->B->File Sink
What I am experiencing when running this flowgraph is that block B works for the very first seconds (or centiseconds) and then eventually blocks. This is because A never consumes an input item, so I guess A's input buffer is full. Because of this, the file source cannot push any more items further into the flowgraph, resulting in B not having any new input items to process.

Exactly!
So the mechanism below is: the output buffer of File Source is the input buffer of A and the input buffer of B. No memory duplication here.
File Source has a buffer writer with a write pointer, and A and B have their own read pointers pointing into that buffer.
When File Source produces N items, the write pointer advances by N. Similarly, when A consumes M items, A's read pointer advances by M.
When calling (general_)work, the input_items buffer(s) is (are) really just a pointer (start_of_buffer + read pointer). Equivalently, the output_items buffer(s) is (are) really just pointing to the write pointer.

File Source is only allowed to produce so many items that the write pointer doesn't advance beyond the minimum read pointer, because in that case, it would overwrite samples that a downstream block hasn't consumed.
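The pointer mechanics above can be modelled with a toy single-writer, multi-reader buffer. This is a deliberately simplified Python sketch (plain counters, no wrap-around memory, no threads), not GNU Radio's actual buffer class; the reader names "A" and "B" mirror the flowgraph in the question. It reproduces the stall: because A never advances its read pointer, the writer runs out of space after one buffer's worth of items even though B keeps up, and the stall disappears once A consumes what it reads.

```python
class SharedBuffer:
    """One writer, several readers; the writer may not lap the slowest reader."""

    def __init__(self, capacity, readers):
        self.capacity = capacity
        self.write_ptr = 0                       # total items produced
        self.read_ptr = {r: 0 for r in readers}  # total items consumed per reader

    def space_available(self):
        # Capacity minus the distance to the *minimum* read pointer
        return self.capacity - (self.write_ptr - min(self.read_ptr.values()))

    def items_available(self, reader):
        return self.write_ptr - self.read_ptr[reader]

    def produce(self, n):
        n = min(n, self.space_available())
        self.write_ptr += n
        return n

    def consume(self, reader, n):
        self.read_ptr[reader] += min(n, self.items_available(reader))


def run(a_consumes, steps=10, capacity=8192, chunk=4096):
    buf = SharedBuffer(capacity, readers=("A", "B"))
    total = 0
    for _ in range(steps):
        total += buf.produce(chunk)                  # File Source
        buf.consume("B", buf.items_available("B"))   # B always keeps up
        if a_consumes:
            buf.consume("A", buf.items_available("A"))
    return total


print(run(a_consumes=False))  # 8192: stalls once the buffer is full
print(run(a_consumes=True))   # 40960: keeps flowing
```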

So, my question is, how big is the input buffer of A?

Depends. Typically (64bit Linux), 8192 complex items get allocated, but that really depends on various factors.

Is that customizable?

Yes! For example, as you noticed, in GRC you can open the "Advanced" tab in a block's properties; there's a "Min Output Buffer" and a "Max Output Buffer" field. There are corresponding methods to set these sizes in the gr::block class [1].

Can I increase the size so that the flowgraph won't block within the first seconds but after, say, a minute?

Yes; but that might, depending on the rate in which the file source produces samples, be *a whole lot* of memory!
I recommend not doing that, but instead:

Make A consume the items it has read (because, as far as I can tell, you don't actually need them afterwards, do you?)

Implement your "stop this flowgraph after X samples" logic by adding a "head" block, which simply copies all N input samples from its input to its output buffer and then calls consume(N). Once sum(N) == X, it says it's done, which leads to the flowgraph being shut down.
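The "head" idea, sketched in plain Python: a work function that copies at most the remaining number of items from input to output and reports completion once X items have passed. GNU Radio already ships a Head block (blocks.head) for this; the hypothetical class below only models that logic and is not GNU Radio's implementation. It returns -1 when done, mirroring the value of GNU Radio's WORK_DONE constant.

```python
WORK_DONE = -1  # mirrors the value of GNU Radio's WORK_DONE constant


class HeadModel:
    """Pass through the first `limit` items, then signal that the flowgraph is done."""

    def __init__(self, limit):
        self.limit = limit
        self.passed = 0

    def work(self, input_items, output_items):
        remaining = self.limit - self.passed
        if remaining <= 0:
            return WORK_DONE
        n = min(len(input_items), len(output_items), remaining)
        output_items[:n] = input_items[:n]  # copy input to output
        self.passed += n
        return n  # items produced (and consumed, since this is 1:1)


head = HeadModel(limit=10)
out = [0] * 4
print([head.work(list(range(4)), out) for _ in range(4)])  # [4, 4, 2, -1]
```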

Can I specify a policy for a block so that, if it reaches X samples in its input buffer, it drops some of the input?

You could just write a block that does that for you.
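Such a block could look roughly like this plain-Python model: when the backlog on its input exceeds a threshold, it discards the oldest excess items instead of processing them. The threshold, the drop-oldest policy, and the function name are all hypothetical. In a real GNU Radio block, the same decision would be made in general_work() from the number of input items offered, by consuming the dropped items without producing output for them.

```python
def drop_excess(backlog, threshold):
    """Split a backlog into (kept, dropped_count), keeping only the newest items.

    If more than `threshold` items are waiting, the oldest ones are dropped so
    that at most `threshold` remain for processing.
    """
    excess = max(0, len(backlog) - threshold)
    return backlog[excess:], excess


kept, dropped = drop_excess(list(range(10)), threshold=4)
print(kept, dropped)  # [6, 7, 8, 9] 6
```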

I've seen that each block has a minimum and a maximum output buffer setting in GRC. However, I do not see any option regarding its INPUT buffer.

That's because there's no such thing as a dedicated input buffer :) Every input buffer is in fact the output buffer of the upstream block.

Another thing I thought is that A's input buffer is the file source's output buffer, so by adjusting the output buffer of the file source, I would be adjusting A's (and B's) input buffer. However, the output buffer of the file source is set to 0, so I guess it's "infinite".

No, 0 just instructs GRC to not write a line containing a "set_min_output_buffer" call[2], so GNU Radio uses the default.

Best regards,
Marcus

[1] https://gnuradio.org/doc/doxygen/classgr_1_1block.html#a9d10e3f6747f91b215abe81b60a003d5
[2] https://github.com/gnuradio/gnuradio/blob/master/grc/python/flow_graph.tmpl#L230

Re: [Discuss-gnuradio] Using volk kernels on basic operations of gr_complex, in my own custom blocks.

It won't give you time spent, but 'perf top' is a nice tool that gives function-level performance counters for all running code. It comes with linux-tools and uses performance counters built into the kernel. There are also a couple of other perf subtools you can explore.

Regarding your full buffers, I think that's a result of GNU Radio's scheduler.

If you have a flowgraph with A->B and B takes a very long time to process all of its samples then A will always have full output buffers since it operates much faster. It's not necessarily bad or cause for concern, but performance improvements should focus on B.

-nathan

On Sun, Feb 28, 2016 at 10:48 PM, Gonzalo Arcos <gonzaloarcos12@gmail.com> wrote:

Thanks to all of you for your very informative answers.

Douglas, I feel good now, because you have perfectly described all the things I did/thought of to improve the performance :) I also agree that merging blocks should be a last resort. I have used the performance monitor and managed to improve the performance of the most expensive blocks. What I could not achieve, though, is profiling the program with a mainstream profiler like valgrind or oprofile, or some other profiler for Python. I remember that when visualizing the data, all the time was spent in the start() of the top block, and I could not get information pertaining to each block's general_work, let alone functions executed within the block. After discovering the performance monitor, I used it in conjunction with calls to clock() to determine the time spent in each function within each block, to get a rough measurement. But if it is possible to get this information automatically, I am very interested in learning how to do it. Could you help me?

There is also another interesting aspect of improving performance, which is blocks being blocked due to the output buffer being full. I've tried playing around a bit with the min and max output buffer sizes, but the performance did not seem to be affected.
After using the performance monitor to analyze the average buffer fullness (%), I see that most of them are relatively full; however, I do not know if they are full enough to make an upstream block wait to push data into the buffer.

2016-02-28 19:39 GMT-03:00 Douglas Geiger <doug.geiger@bioradiation.net>:
The phenomenon Sylvain is pointing at is basically the fact that as compilers improve, you should expect the 'optimized' proto-kernels to no longer have as dramatic an improvement compared with the generic ones. As to your question of 'is it worth it' - that comes down to a couple of things: for example - how much of an improvement do you require to be 'worth it' (i.e., how much is your time worth and/or how much of a performance improvement do you require for your application). Similarly, is it worth it to you to get cross-platform improvements (which is one of the features of VOLK)? Or, perhaps, is it worth it to you just to learn how to use VOLK?

A couple of thoughts here: in general, when I have a flowgraph that is not meeting my performance requirements, my first step is to do some coarse profiling (i.e. via gr-perf-monitorx) to determine if there is a single block that is my primary performance bottleneck. If so - that is the block I will concentrate on for optimizations (both via VOLK, and/or any algorithmic improvements - e.g. can I turn any run-time calculations into a look-up table calculated either at compile-time, or within the constructor).
If there is not a clear bottleneck, then next I look a little deeper using perf/oprofile to look at what functions my flowgraph is spending a lot of time in: can I e.g. create a faster version of some primitive calculation that all my blocks use a lot, and therefore get a speed-up across many blocks, which should translate into a faster overall application?
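As an illustration of the look-up-table suggestion above, here is a hypothetical block-like class that precomputes a sine table once in its constructor, so the per-sample work is just an index and a multiply. The class name, the table size, and the choice of a table-driven sine oscillator are all assumptions for the example. A real block would do this in C++ (and possibly with VOLK), but the structure is the same: expensive math in the constructor, cheap lookups in work().

```python
import math


class SineMixerModel:
    """Multiply input samples by a sine whose values come from a precomputed table."""

    def __init__(self, table_size=1024, step=1):
        # Done once, at construction time, instead of calling sin() per sample
        self.table = [math.sin(2 * math.pi * i / table_size) for i in range(table_size)]
        self.step = step  # table entries advanced per sample (sets the frequency)
        self.phase = 0    # current table index, kept across work() calls

    def work(self, samples):
        out = []
        n = len(self.table)
        for x in samples:
            out.append(x * self.table[self.phase])
            self.phase = (self.phase + self.step) % n
        return out


mixer = SineMixerModel(table_size=4, step=1)
print([round(v, 6) for v in mixer.work([1.0] * 4)])  # [0.0, 1.0, 0.0, -1.0]
```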

Finally, if I still need more improvements I would look at collecting many blocks together into a single, larger block. This is generally less desirable, since you now have a (more) application-specific block, and it becomes harder to re-use in later projects, but if you have performance requirements that drive you there, then it absolutely is an option. At this point you likely have multiple operations being done to your incoming samples, and it becomes easy to collect all of those into a single larger VOLK call (and from there, create a SIMD-ized proto-kernel that targets your particular platform). So, while re-usability of code drives you away from this scenario, it offers the greatest potential for performance improvements, and thus is where many applications with high performance requirements tend to gravitate. Ideally you can strike a balance between the two: i.e. have widely re-usable blocks, but with a set of operations inside them that can take advantage of e.g. SIMD-ized function calls to make them high-performance. Try to craft the block to be widely re-usable for a certain class of things (e.g. look at how the OFDM blocks are set up to be easily re-configurable for the many ways an OFDM waveform can be crafted). In the long run, having more knobs to turn to customize your existing code base to deal with whatever new scenario you are looking at in 1/2/10 years from now is always better than a brittle solution that solves today's problem, but is difficult to re-use to deal with tomorrow's.

Hope that was helpful. If you are interested in learning more about how to use VOLK - certainly have a look at libvolk.org - the documentation is (I think) fairly good at introducing the concepts and intent, as well as how the API looks/works. And certainly don't be shy about asking more questions here.

Good luck,
Doug

On Sun, Feb 28, 2016 at 1:58 AM, Sylvain Munaut <246tnt@gmail.com> wrote:
> Just wanted to ask the more experienced users if you think this idea is
> worth a shot, or the performance improvement will be marginal.

Performance improvement is vastly dependent on the operation you're doing.

You can get an idea of the improvement by comparing the volk-profile
output for the generic kernel (coded in pure C) and the sse/avx ones.

For instance, on my laptop: for some very simple ones (like float
add), the generic kernel is barely slower than the SIMD one, most likely
because it's so simple that the compiler was able to SIMD-ize it by
itself.
But for other things (like complex multiply), the SIMD version is 10x faster ...

Cheers,

Sylvain

--
Doug Geiger
doug.geiger@bioradiation.net

Sunday, February 28, 2016

Re: [Discuss-gnuradio] Using volk kernels on basic operations of gr_complex, in my own custom blocks.

On Mon, 2016-02-29 at 00:48 -0300, Gonzalo Arcos wrote:
> Thanks to all of you for your very informative answers.
>
> Douglas, I feel good now, because you have perfectly described all
> the things I did/thought of to improve the performance :) I also
> agree that merging blocks should be a last resort. I have used the
> performance monitor and managed to improve the performance of the
> most expensive blocks. What I could not achieve, though, is profiling
> the program with a mainstream profiler like valgrind or oprofile, or
> some other profiler for Python. I remember that when visualizing the
> data, all the time was spent in the start() of the top block, and I
> could not get information pertaining to each block's general_work,
> let alone functions executed within the block. After discovering the
> performance monitor, I used it in conjunction with calls to clock()
> to determine the time spent in each function within each block, to
> get a rough measurement. But if it is possible to get this
> information automatically, I am very interested in learning how to
> do it. Could you help me?
>

Once upon a time, and unfortunately not long enough ago...

A particularly ugly method is to craft support code around your block,
then call general_work() directly (i.e., exclude most of GNU Radio).
There are many pitfalls to this approach, but I was able to analyze the
performance of some blocks across several implementations using the
usual tools.
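The general shape of such a harness, as a hedged sketch: feed a work function synthetic input outside the scheduler and time it with ordinary tools. Here the "block" is a hypothetical plain-Python stand-in (a multiply by a constant complex gain), not a GNU Radio block. With a real C++ block you would instead build hand-made input/output buffers and call general_work() yourself, which is where the pitfalls mentioned above come in.

```python
import random
import time


def scale_work(input_items, output_items, gain=0.5 + 0.5j):
    """Hypothetical stand-in for a block's work(): multiply by a complex gain."""
    for i, x in enumerate(input_items):
        output_items[i] = x * gain
    return len(input_items)


def benchmark(work, n_items=8192, repeats=50, seed=0):
    """Call `work` directly on synthetic data; return (items processed, seconds/call)."""
    rng = random.Random(seed)
    inp = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_items)]
    out = [0j] * n_items
    produced = 0
    start = time.perf_counter()
    for _ in range(repeats):
        produced += work(inp, out)
    elapsed = time.perf_counter() - start
    return produced, elapsed / repeats


items, per_call = benchmark(scale_work)
print(f"{items} items, {per_call * 1e6:.1f} us per call")
```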

> There is also another interesting aspect of improving performance,
> which is blocks being blocked due to the output buffer being full.
> I've tried playing around a bit with the min and max output buffer
> sizes, but the performance did not seem to be affected.
> After using the performance monitor to analyze the average buffer
> fullness (%), I see that most of them are relatively full; however,
> I do not know if they are full enough to make an upstream block
> wait to push data into the buffer.
>
>
> 2016-02-28 19:39 GMT-03:00 Douglas Geiger <doug.geiger@bioradiation.net>:
> > The phenomenon Sylvain is pointing at is basically the fact that as
> > compilers improve, you should expect the 'optimized' proto-kernels
> > to no longer have as dramatic an improvement compared with the
> > generic ones. As to your question of 'is it worth it' - that comes
> > down to a couple of things: for example - how much of an
> > improvement do you require to be 'worth it' (i.e., how much is your
> > time worth and/or how much of a performance improvement do you
> > require for your application). Similarly, is it worth it to you to
> > get cross-platform improvements (which is one of the features of
> > VOLK)? Or, perhaps, is it worth it to you just to learn how to use
> > VOLK?
> >
> > A couple of thoughts here: in general, when I have a flowgraph that
> > is not meeting my performance requirements, my first step is to do
> > some coarse profiling (e.g. via gr-perf-monitorx) to determine if
> > there is a single block that is my primary performance bottleneck.
> > If so - that is the block I will concentrate on for optimizations
> > (both via VOLK, and/or any algorithmic improvements - e.g. can I
> > turn any run-time calculations into a look-up table calculated
> > either at compile-time, or within the constructor).
> > If there is not a clear bottleneck, then next I look a little
> > deeper using perf/oprofile to look at what functions my flowgraph
> > is spending a lot of time in: can I e.g. create a faster version of
> > some primitive calculation that all my blocks use a lot, and
> > therefore get a speed-up across many blocks which should translate
> > into a faster overall application.
> >
> > Finally, if I still need more improvements I would look at
> > collecting many blocks together into a single, larger block. This
> > is generally less desirable, since you now have a (more)
> > application-specific block, and it becomes harder to re-use in
> > later projects, but if you have performance requirements that drive
> > you there, then it absolutely is an option. At this point you
> > likely have multiple operations being done to your incoming
> > samples, and it becomes easy to collect all of those into a single
> > larger VOLK call (and from there, create a SIMD-ized proto-kernel
> > that targets your particular platform). So, while re-usability of
> > code drives you away from this scenario, it offers the greatest
> > potential for performance improvements, and thus is where many
> > applications with high performance requirements tend to gravitate
> > towards. Ideally you can strike a balance between the two: i.e.
> > have widely re-usable blocks, but with a set of operations inside
> > them that you can take advantage of e.g. SIMD-ized function calls
> > to make them high-performance. It is even better if you can craft
> > the block to be widely re-usable for a certain class of things (e.g.
> > look at how the OFDM blocks are set up to be easily re-configurable
> > for the many ways an OFDM waveform can be crafted). In the long run, having more
> > knobs to turn to customize your existing code base to deal with
> > whatever new scenario you are looking at in 1/2/10 years from now
> > is always better than a brittle solution that solves today's
> > problem, but is difficult to re-use to deal with tomorrow's.
> >
> > Hope that was helpful. If you are interested in learning more about
> > how to use VOLK - certainly have a look at libvolk.org - the
> > documentation is (I think) fairly good at introducing the concepts
> > and intent, as well as how the API looks/works. And certainly don't
> > be shy about asking more questions here.
> >
> > Good luck,
> > Doug
> >
> > On Sun, Feb 28, 2016 at 1:58 AM, Sylvain Munaut <246tnt@gmail.com> wrote:
> > > > Just wanted to ask the more experienced users if you think this idea is
> > > > worth a shot, or the performance improvement will be marginal.
> > >
> > > Performance improvement is vastly dependent on the operation
> > > you're doing.
> > >
> > > You can get an idea of the improvement by comparing the volk_profile
> > > output for the generic kernel (coded in pure C) and the SSE/AVX
> > > ones.
> > >
> > > For instance, on my laptop: for some very simple ones (like float
> > > add), the generic kernel is barely slower than SIMD, most likely
> > > because it's so simple that the compiler itself was able to
> > > SIMD-ize it.
> > > But for other things (like complex multiply), the SIMD version is
> > > 10x faster...
> > >
> > >
> > > Cheers,
> > >
> > > Sylvain
> > >
> >
> >
> > --
> > Doug Geiger
> > doug.geiger@bioradiation.net
> >

_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

Re: [Discuss-gnuradio] How to specify maximum size of input buffers on blocks

I have seen that most of the buffers are almost full on average (80-89%), however, that does not help me to know whether a block was blocked from pushing data into the buffer because downstream blocks did not read fast enough.

From your original mail:

"File Source is only allowed to produce so many items that the write pointer doesn't advance beyond the minimum read pointer, because in that case, it would overwrite samples that a downstream block hasn't consumed."

Is the maximum distance between the write pointer and the minimum read pointer percentage-based, or a fixed number of items? I ask because I tried to play around with the buffer sizes but did not notice any change in the performance of the flowgraph, either for better or worse.

You said that a buffer size of 0 will use the default GNU Radio size. Is this a fixed value taken from a config file, or does GNU Radio automatically assess the system memory and set a default buffer size appropriately? For example, if I gave GNU Radio 6 GB of RAM for the buffers, then upstream blocks should not block because downstream blocks are slow to process the buffered data, at least for a large number of items.

2016-02-26 14:48 GMT-03:00 Gonzalo Arcos <gonzaloarcos12@gmail.com>:

I will investigate and try this. Thank you so much Marcus!
2016-02-26 12:09 GMT-03:00 Marcus Müller <marcus.mueller@ettus.com>:
Hi Gonzalo,

However, I noticed that after the optimization, the sum of all blocks' percentages of processing time is not 100%. This is when I started evaluating the possibility that the GNU Radio scheduler, the framework, or the relationship between blocks is what prevents me from achieving my desired rate, rather than each block's processing time.
Well, in GNU Radio, every block runs in its own thread, so you can basically get up to (number of CPU cores) * 100% as total consumption.

However, I do not have an easy way to determine the "amount of time a block has been blocked because a downstream block has not read the input buffer fast enough for the first block to be able to write to the buffer".

The performance monitor should be able to tell you how full your buffers are on average; watch out for "full" buffers: they're upstream from your bottlenecks.
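For a simple linear chain, that rule of thumb can be turned into a tiny helper: walk the buffers from source to sink and pick the block that reads from the last buffer above some fullness threshold. This is a hypothetical sketch, not a GNU Radio tool; the fullness numbers would be read off the performance monitor by hand, the block names are made up, and the 80% threshold is an arbitrary assumption. Flowgraphs with fan-out would need the same reasoning per branch.

```python
def likely_bottleneck(block_names, buffer_fullness_pct, threshold_pct=80.0):
    """Guess the bottleneck of a linear chain from average buffer fullness.

    block_names:         blocks in order from source to sink.
    buffer_fullness_pct: average fullness of the buffer between
                         block_names[i] and block_names[i + 1].
    Full buffers sit upstream of the bottleneck, so the reader of the
    last full buffer is the prime suspect. Returns None if none is full.
    """
    if len(buffer_fullness_pct) != len(block_names) - 1:
        raise ValueError("need exactly one buffer between each pair of blocks")
    suspect = None
    for i, fullness in enumerate(buffer_fullness_pct):
        if fullness >= threshold_pct:
            suspect = block_names[i + 1]
    return suspect


chain = ["File Source", "Block 1", "Block 2", "Block 3", "File Sink"]
print(likely_bottleneck(chain, [95.0, 90.0, 10.0, 5.0]))  # Block 2
```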
Also, look at the average rate (as reported e.g. by a probe_rate block), and compare that value with (average work items / average work call time). That roughly gives you a "block duty cycle".
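As a worked example of that ratio: (average work items / average work call time) is the block's throughput while it is actually inside work(), so dividing the observed average rate by it estimates the fraction of time the block is busy. The numbers below are made up for illustration. A value near 1 suggests the block is working almost all the time (a bottleneck candidate); a value well below 1 suggests it spends most of its time waiting for input or for output space.

```python
def block_duty_cycle(avg_rate, avg_items_per_call, avg_work_time_s):
    """Estimate the fraction of wall-clock time a block spends in work().

    avg_rate:           observed items/s through the block (e.g. a rate probe)
    avg_items_per_call: average number of items handled per work() call
    avg_work_time_s:    average duration of one work() call, in seconds
    """
    if avg_items_per_call <= 0 or avg_work_time_s <= 0:
        raise ValueError("items per call and work time must be positive")
    busy_throughput = avg_items_per_call / avg_work_time_s  # items/s while working
    return avg_rate / busy_throughput


# Hypothetical numbers: 1 M items/s observed, 4000 items per call, 2 ms per call.
print(block_duty_cycle(1e6, 4000, 0.002))  # about 0.5
```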

Best regards,
Marcus
On 02/26/2016 01:51 PM, Gonzalo Arcos wrote:
Hi Marcus,

Thanks a lot for your answer! So now having confirmed this, comes the real question :)

I am trying to improve the performance of a flowgraph. My performance "metric" is the rate at which a file in my filesystem grows, since my flowgraph starts with a file source and ends with a file sink. After doing some processing, the flowgraph writes the result to a file, so if the file grows faster, the flowgraph is executing faster and I achieve a higher transfer rate, even if I later replace the file sink with another sink (e.g. a USRP).

The transfer rate is very important to me because I'm comparing the performance of the flowgraph with an existing hardware implementation whose achievable transfer rate I know exactly.

So after using the performance monitor and detecting some blocks that do a lot of processing, I've optimized those blocks.

However, I noticed that after the optimization, the sum of all blocks' percentages of processing time is not 100%. This is when I started evaluating the possibility that the GNU Radio scheduler, the framework, or the relationship between blocks is what prevents me from achieving my desired rate, rather than each block's processing time.

However, I do not have an easy way to determine the "amount of time a block has been blocked because a downstream block has not read the input buffer fast enough for the first block to be able to write to the buffer".

It is also not as simple as connecting a file sink at an intermediate point in the flowgraph, deleting the rest of the blocks, and measuring the transfer rate at this new file sink. This is because in my flowgraph, interpolation and decimation occur in different blocks, which makes the file size an unreliable metric in these cases.

The only case where this is valid is at the very end of the flowgraph, when the data has been restored to its original content, and therefore every output byte from the final block corresponds to an input byte.

I hope I expressed myself correctly. Do you have any tips on how to figure out what limits my flowgraph to a given rate? Should the time a flowgraph takes to execute always equal the sum of the times all blocks consume?
---------- Forwarded message ----------
From: Marcus Müller <marcus.mueller@ettus.com>
Date: 2016-02-26 6:52 GMT-03:00
Subject: Re: [Discuss-gnuradio] How to specify maximum size of input buffers on blocks
To: discuss-gnuradio@gnu.org
Hi Gonzalo,

these are the mails I like most :)

On 25.02.2016 01:11, Gonzalo Arcos wrote:

Suppose I have a file source connected to 2 blocks. One, let's call it Block A, always returns immediately when general_work is called; in other words, it never consumes an input item. The other one, say Block B, is the identity block: it consumes and outputs exactly what it receives as input.

The output of the first block goes to a null sink (since I know it will not produce any output), and the output of the second block goes to a file sink.

So:

              /-> A -> Null Sink
File Source -|
              \-> B -> File Sink
What I am experiencing when running this flowgraph is that block B works for the very first seconds (or centiseconds) and then eventually blocks. This is because A has never consumed an input item, so I guess A's input buffer is full. Because of this, the file source cannot push any more items into the flowgraph, resulting in B not having any new input items to process.

Exactly!
So the mechanism below is: the output buffer of File Source is the input buffer of A and the input buffer of B. No memory duplication here.
File Source has a buffer writer with a write pointer, and A and B have their own read pointers pointing into that buffer.
When File Source produces N items, the write pointer advances by N. Similarly, when A consumes M items, A's read pointer advances by M.
When calling (general_)work, the input_items buffer(s) is (are) really just a pointer (start_of_buffer + read pointer). Equivalently, the output_items buffer(s) is (are) really just pointing to the write pointer.

File Source is only allowed to produce so many items that the write pointer doesn't advance beyond the minimum read pointer, because in that case, it would overwrite samples that a downstream block hasn't consumed.
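The pointer mechanics just described can be modelled in a few lines. The sketch below is a toy Python model (the class and function names are hypothetical, not GNU Radio internals): one writer, several readers, absolute item counters as pointers, and the rule that the writer may not get more than `size` items ahead of the slowest reader. Running it with a reader that never consumes reproduces the stall from the question; letting that reader consume removes it.

```python
class ToyRingBuffer:
    """One output buffer shared by several downstream readers (toy model).

    Pointers are absolute item counts; a real circular buffer would store
    item k in slot k % size.
    """

    def __init__(self, size, nreaders):
        self.size = size
        self.write_ptr = 0
        self.read_ptrs = [0] * nreaders

    def space_available(self):
        # The writer must not overwrite items the slowest reader still needs.
        return self.size - (self.write_ptr - min(self.read_ptrs))

    def items_available(self, reader):
        return self.write_ptr - self.read_ptrs[reader]

    def produce(self, n):
        n = min(n, self.space_available())
        self.write_ptr += n
        return n

    def consume(self, reader, n):
        n = min(n, self.items_available(reader))
        self.read_ptrs[reader] += n
        return n


def simulate(size, chunk, iterations, a_consumes):
    """Source writes up to `chunk` items per step; B always consumes everything.

    Reader 0 is block A, reader 1 is block B. Returns total items produced.
    """
    buf = ToyRingBuffer(size, nreaders=2)
    total = 0
    for _ in range(iterations):
        total += buf.produce(chunk)
        if a_consumes:
            buf.consume(0, buf.items_available(0))
        buf.consume(1, buf.items_available(1))
    return total


print(simulate(8192, 4096, 10, a_consumes=False))  # 8192: stalls once the buffer is full
print(simulate(8192, 4096, 10, a_consumes=True))   # 40960: never stalls
```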

So, my question is, how big is the input buffer of A?

Depends. Typically (64-bit Linux), 8192 complex items get allocated, but that really depends on various factors.
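For scale: a gr_complex item is two 32-bit floats, i.e. 8 bytes, so 8192 of them occupy 64 KiB, and the memory needed to absorb T seconds of unconsumed backlog is simply rate * T * 8 bytes. The 10 Msps sample rate below is a hypothetical example, not a figure from this thread, and the helper names are made up.

```python
GR_COMPLEX_BYTES = 8  # two 32-bit floats (real, imaginary)


def buffer_bytes(nitems, item_size=GR_COMPLEX_BYTES):
    """Bytes occupied by a buffer of nitems items."""
    return nitems * item_size


def bytes_to_absorb(sample_rate, seconds, item_size=GR_COMPLEX_BYTES):
    """Bytes needed to hold `seconds` worth of unconsumed samples."""
    return sample_rate * seconds * item_size


def seconds_of_backlog(nitems, sample_rate):
    """How long a buffer of nitems items lasts before the writer must wait."""
    return nitems / sample_rate


print(buffer_bytes(8192))                    # 65536 bytes (64 KiB)
print(bytes_to_absorb(10_000_000, 60))       # 4800000000 bytes, about 4.8 GB
print(seconds_of_backlog(8192, 10_000_000))  # under a millisecond
```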

Is that customizable?

Yes! For example, as you noticed, in GRC, open the "Advanced" tab in a block's properties; there's a "Min Output Buffer" and a "Max Output Buffer" field. There are corresponding methods to set these sizes in the gr::block class[1].

Can I increase the size, so the flowgraph won't block after the first few seconds, but after a minute, for example?

Yes; but that might, depending on the rate at which the file source produces samples, be *a whole lot* of memory!
I recommend not doing that, but instead:

Make A consume the items it has read (because, as far as I can tell, you don't actually need them afterwards, do you?)

Implement your "end the running of this flowgraph after X samples" by adding a "head" block, which simply takes all N available input samples, copies them from its input to its output buffer, and then calls consume(N). Once sum(N) == X, it reports that it's done, which leads to the flowgraph being shut down.
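A rough Python sketch of that head-block logic follows. `ToyHead` is hypothetical and is not GNU Radio's own head block: plain lists stand in for the buffers, the `consumed` counter stands in for `consume(0, n)`, and the `WORK_DONE = -1` constant is assumed to match the value GNU Radio uses to signal that a block is finished.

```python
WORK_DONE = -1  # assumed to match GNU Radio's gr::block::WORK_DONE


class ToyHead:
    """Pass through the first `limit` items, then report that it is done."""

    def __init__(self, limit):
        self.limit = limit
        self.copied = 0
        self.consumed = 0  # stands in for calls to consume(0, n)

    def general_work(self, input_items, output_items):
        if self.copied >= self.limit:
            return WORK_DONE
        n = min(len(input_items), len(output_items), self.limit - self.copied)
        output_items[:n] = input_items[:n]  # copy input to output
        self.consumed += n                   # consume(N)
        self.copied += n                     # running sum(N)
        return n


head = ToyHead(limit=10)
out = [0] * 8
print(head.general_work(list(range(8)), out))       # 8
print(head.general_work(list(range(8, 16)), out))   # 2
print(head.general_work(list(range(16, 24)), out))  # -1 (done)
```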

Can i specify a policy for a block that if it reaches X amount of samples in its input buffer to drop some of the input?

You could just write a block that does that for you.
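A minimal sketch of such a block, assuming that dropping the *oldest* backlog is the desired policy and that the number of items visible to work() is a fair proxy for the backlog. `ToyDropper` is hypothetical; it returns how many items to consume and how many it produced, which a real general_work() would express through consume() and its return value.

```python
class ToyDropper:
    """Drop the oldest input once more than max_backlog items are waiting."""

    def __init__(self, max_backlog):
        self.max_backlog = max_backlog
        self.dropped = 0

    def general_work(self, input_items, output_items):
        excess = max(0, len(input_items) - self.max_backlog)
        keep = min(len(input_items) - excess, len(output_items))
        # Skip the `excess` oldest items and forward the newest ones.
        output_items[:keep] = input_items[excess:excess + keep]
        self.dropped += excess
        return excess + keep, keep  # (items to consume, items produced)


dropper = ToyDropper(max_backlog=4)
out = [0] * 10
print(dropper.general_work(list(range(10)), out))  # (10, 4)
print(out[:4])                                      # [6, 7, 8, 9]
```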

I've seen that each block has a minimum output buffer and a maximum output buffer setting in GRC. However, I do not see any option regarding its INPUT buffer.

Because there's no such thing as a dedicated input buffer :) because every input buffer is in fact the output buffer of the upstream block.

Another thing I thought is that A's input buffer is the file source's output buffer; therefore, by adjusting the output buffer of the file source, I am adjusting A's (and B's) input buffer. However, the output buffer of the file source is 0, so I guess it's "infinite".

No, 0 just instructs GRC to not write a line containing a "set_min_output_buffer" call[2], so GNU Radio uses the default.

Best regards,
Marcus

[1] https://gnuradio.org/doc/doxygen/classgr_1_1block.html#a9d10e3f6747f91b215abe81b60a003d5
[2] https://github.com/gnuradio/gnuradio/blob/master/grc/python/flow_graph.tmpl#L230
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

Re: [Discuss-gnuradio] Using volk kernels on basic operations of gr_complex, in my own custom blocks.

Thanks to all of you for your very informative answers.

Douglas, I feel good now because you have described perfectly all the things I did / thought of to improve the performance :). I also agree that merging blocks should be a last resort. I have used the performance monitor and managed to improve the performance of the most expensive blocks. What I could not achieve, though, is profiling the program with a mainstream profiler like valgrind or oprofile, or a Python profiler. I remember that when visualizing the data, all the time was spent in start() of the top block, and I could not get information pertaining to each block's general_work(), let alone the functions executed within the block. After discovering the performance monitor, I used it in conjunction with calls to clock() to determine the time spent in each function within each block, to get a rough measurement. But if it is possible to get this information automatically, I am very interested in learning how to do it. Could you help me?

There is also another interesting aspect of improving performance, which is blocks being blocked due to the output buffer being full. I've tried playing around a bit with the min and max output buffer sizes, but the performance did not seem to be affected.

After using the performance monitor to analyze the average buffer fullness (%), I see that most of the buffers are relatively full; however, I do not know if they are full enough to make an upstream block wait to push data into the buffer.

2016-02-28 19:39 GMT-03:00 Douglas Geiger <doug.geiger@bioradiation.net>:

The phenomenon Sylvain is pointing at is basically the fact that as compilers improve, you should expect the 'optimized' proto-kernels to no longer have as dramatic an improvement compared with the generic ones. As to your question of 'is it worth it' - that comes down to a couple of things: for example - how much of an improvement do you require to be 'worth it' (i.e., how much is your time worth and/or how much of a performance improvement do you require for your application). Similarly, is it worth it to you to get cross-platform improvements (which is one of the features of VOLK)? Or, perhaps, is it worth it to you just to learn how to use VOLK?

A couple of thoughts here: in general, when I have a flowgraph that is not meeting my performance requirements, my first step is to do some coarse profiling (e.g. via gr-perf-monitorx) to determine if there is a single block that is my primary performance bottleneck. If so - that is the block I will concentrate on for optimizations (both via VOLK, and/or any algorithmic improvements - e.g. can I turn any run-time calculations into a look-up table calculated either at compile-time, or within the constructor).
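As an illustration of moving a run-time calculation into the constructor, here is a hypothetical Python block (`ToyLutMixer`, not part of GNU Radio) that mixes its input with a complex tone. The tone values are computed once in `__init__`, and work() only does table look-ups. The trade-off, under this sketch's assumptions, is that the tone frequency is limited to `step / table_size` cycles per sample for integer `step`.

```python
import cmath
import math


class ToyLutMixer:
    """Multiply input samples by a tone read from a precomputed table."""

    def __init__(self, table_size=1024):
        self.table_size = table_size
        # Computed once, at construction time, rather than per sample in work().
        self.lut = [cmath.exp(2j * math.pi * k / table_size)
                    for k in range(table_size)]
        self.phase_index = 0

    def work(self, samples, step):
        """Mix samples with the tone; `step` is the table advance per sample."""
        out = []
        for x in samples:
            out.append(x * self.lut[self.phase_index])
            self.phase_index = (self.phase_index + step) % self.table_size
        return out


mixer = ToyLutMixer(table_size=4)
print(mixer.work([1, 1, 1, 1], step=1))  # approximately [1, 1j, -1, -1j]
```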
If there is not a clear bottleneck, then next I look a little deeper using perf/oprofile to look at what functions my flowgraph is spending a lot of time in: can I e.g. create a faster version of some primitive calculation that all my blocks use a lot, and therefore get a speed-up across many blocks which should translate into a faster overall application.

Finally, if I still need more improvements I would look at collecting many blocks together into a single, larger block. This is generally less desirable, since you now have a (more) application-specific block, and it becomes harder to re-use in later projects, but if you have performance requirements that drive you there, then it absolutely is an option. At this point you likely have multiple operations being done to your incoming samples, and it becomes easy to collect all of those into a single larger VOLK call (and from there, create a SIMD-ized proto-kernel that targets your particular platform). So, while re-usability of code drives you away from this scenario, it offers the greatest potential for performance improvements, and thus is where many applications with high performance requirements tend to gravitate towards. Ideally you can strike a balance between the two: i.e. have widely re-usable blocks, but with a set of operations inside them that you can take advantage of e.g. SIMD-ized function calls to make them high-performance. It is even better if you can craft the block to be widely re-usable for a certain class of things (e.g. look at how the OFDM blocks are set up to be easily re-configurable for the many ways an OFDM waveform can be crafted). In the long run, having more knobs to turn to customize your existing code base to deal with whatever new scenario you are looking at in 1/2/10 years from now is always better than a brittle solution that solves today's problem, but is difficult to re-use to deal with tomorrow's.
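A toy illustration of the "collect several operations into one block" idea, assuming a made-up pipeline of a gain stage followed by an offset stage (the function names are hypothetical). In a real flowgraph each stage would be its own block with its own buffer; the fused version does both operations in a single pass with no intermediate buffer, which is also the form that maps naturally onto a single, possibly SIMD-ized, kernel call.

```python
def gain_stage(samples, gain):
    # Stand-in for block 1: one pass, writes an intermediate buffer.
    return [x * gain for x in samples]


def offset_stage(samples, offset):
    # Stand-in for block 2: a second pass over that intermediate buffer.
    return [x + offset for x in samples]


def fused_gain_offset(samples, gain, offset):
    # Both operations in one pass: no intermediate buffer, no hand-off.
    return [x * gain + offset for x in samples]


data = [1 + 1j, 2 - 1j, -0.5j]
print(fused_gain_offset(data, 2.0, 0.5) == offset_stage(gain_stage(data, 2.0), 0.5))  # True
```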

Hope that was helpful. If you are interested in learning more about how to use VOLK - certainly have a look at libvolk.org - the documentation is (I think) fairly good at introducing the concepts and intent, as well as how the API looks/works. And certainly don't be shy about asking more questions here.

Good luck,
Doug

On Sun, Feb 28, 2016 at 1:58 AM, Sylvain Munaut <246tnt@gmail.com> wrote:
> Just wanted to ask the more experienced users if you think this idea is
> worth a shot, or the performance improvement will be marginal.

Performance improvement is vastly dependent on the operation you're doing.

You can get an idea of the improvement by comparing the volk_profile
output for the generic kernel (coded in pure C) and the SSE/AVX ones.

For instance, on my laptop: for some very simple ones (like float
add), the generic kernel is barely slower than SIMD, most likely because
it's so simple that the compiler itself was able to SIMD-ize it.
But for other things (like complex multiply), the SIMD version is 10x faster...
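For reference, this is the arithmetic a generic (non-SIMD) complex-multiply kernel performs for each element, written out in Python rather than C; it is a sketch of the math, not VOLK's source. Each output needs four real multiplies plus one subtraction and one addition, and a SIMD proto-kernel applies the same operations to several elements per instruction, which is plausibly where the large gap over the scalar loop comes from. In VOLK this operation is, as far as I know, `volk_32fc_x2_multiply_32fc`, and `volk_profile` times the available proto-kernels on your machine.

```python
def generic_complex_multiply(a, b):
    """Element-wise complex multiply, spelled out as real arithmetic."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    for x, y in zip(a, b):
        re = x.real * y.real - x.imag * y.imag   # ac - bd
        im = x.real * y.imag + x.imag * y.real   # ad + bc
        out.append(complex(re, im))
    return out


print(generic_complex_multiply([1 + 2j, 3 - 1j], [2 - 1j, 1j]))  # [(4+3j), (1+3j)]
```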

Cheers,

Sylvain


--
Doug Geiger
doug.geiger@bioradiation.net