Friday, July 25, 2014

Re: [Discuss-gnuradio] comments on stream tags and metadata storage

I've added five issues to cover the topics from my original email and
followups.

http://gnuradio.org/redmine/issues/698 proposes that the key of a stream
tag be a namespaced identifier to avoid conflicts between
individually-developed components.
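
As a minimal sketch of the idea (not the scheme specified in the issue), a
namespaced key could be built by joining a component namespace and a local
name. The helper namespaced_key(), the ":" separator, and the example
namespaces are all hypothetical; in GNU Radio the resulting string would
still have to be turned into a PMT symbol before use as a tag key.

```python
# Hypothetical convention: "<namespace>:<name>". The separator is an
# assumption made for this sketch, not something GNU Radio prescribes.
SEPARATOR = ":"


def namespaced_key(namespace, name):
    """Join a component namespace and a local tag name into one key."""
    if not namespace or not name:
        raise ValueError("namespace and name must both be non-empty")
    if SEPARATOR in name:
        raise ValueError("local name must not contain the separator")
    return namespace + SEPARATOR + name


# Two independently developed components both use the local name "freq".
tuner_key = namespaced_key("org.example.tuner", "freq")
demod_key = namespaced_key("com.example.demod", "freq")

# With namespaced keys the two settings no longer overwrite each other.
tags = {tuner_key: 915e6, demod_key: 12.5e3}
```

Without the namespace, both components would write to the plain key "freq"
and a downstream block could not tell the two settings apart.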

The other four are listed inline below with additional responses to Marcus:

On 07/25/2014 08:46 AM, Marcus Müller wrote:
> Hi Peter,
>
> I agree that this is a very relevant topic, and especially the
> performance of tag handling might prove to be problematic soon...
> However, it's a bit hard to start a discussion like that; a lot of
> things in GNU Radio are like they are because someone wrote them like
> that, and they proved to just work, or if they didn't, they got remodeled.
> That being said, your mail was very long, and it took me multiple
> sessions to read it. I've now decided to share my reply as partial as it is.
>
> sooo let me just whip up a few comments:
> (1) That's a documentation issue, isn't it?

I don't believe so.

http://gnuradio.org/redmine/issues/699 expands on point (1), that
add_item_tag() can only be safely called from within gr::block::work().

> Anyway, I'm not quite sure
> you're right; the insertion of tags is mutexed IIRC, and the
> get_tags_in_range() functionality, too, so once the user got his vector
> of tags, that won't change anymore. There's the possibility that he
> *misses* some of the tags for the range that get inserted after he
> called get_tags_in_range(), but that's only fair -- it's quite intuitive
> not to insert tags after you've handed off the samples to which they
> would belong to downstream blocks.

I wasn't concerned about tags across multiple calls to
get_tags_in_range(), but about tags that are added to a stream after one
call to work() and before the end of the next call to work().

> (2) that's an interesting point.
>> In the current implementation it's further necessary that tags be
>> added to an output in monotonic non-decreasing offset order.
> Uh, that's news to me, can you point me to the reason? If a block
> assumes things to be ordered, but they aren't... again, this is not
> well-documented, so you're right for raising this issue!
>
> I'm a bit worried that you always suffer at one end: If tags are always
> stored ordered, then inserting tags gains computational complexity, even
> if the getter doesn't need them sorted.
>
> In my opinion we shouldn't go for the "generate tags only in work()"
> because that would increase the complexity of the insertion (inserting
> would have to devise a check that it's being called from work, or this
> will only be a contract...) and is kind of unnecessary. A block always
> (even outside of work()) has access to nitems_written() so it's always
> able to avoid generating tags for samples that might already have been
> read downstream.

See below at point (4).

> (3) I don't see the relation to the discussion, as you said ;) but that
> sounds like a bug, so if you opened up a new thread or filed a bug at
> gnuradio.org, that would be awesome :)

http://gnuradio.org/redmine/issues/700 records point (3), that GRC
parameter callbacks can be invoked multiple times as a result of a
single user action due to the architecture of GUI and other components.

> (4) I'm fairly certain the buffers use a deque to store tags, not a map
> of any kind. So maybe I'm misunderstanding you, or you misread code
> somewhere?
> I think what you're describing might be a bug in file_meta_sink, so
> that might need some attention! see (3).

Yes, the lead-in example was specific to the behavior of
file_meta_sink, but the fundamental requirement goes across the
entire infrastructure. I'm hoping to see support for making that
requirement stick.

http://gnuradio.org/redmine/issues/701 expands on points (2) and (4),
that the infrastructure should promise to maintain all tags inserted by
blocks in their original order (for any sample offset) and should
document the situations where tags may be discarded by the infrastructure.
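
A toy model of why the container matters, using plain Python rather than
the GNU Radio tag types: the Tag tuple and the two store_* helpers are
hypothetical names for illustration only. A map keyed on (offset, key)
keeps only the last setting, while a list that is stable-sorted by offset
keeps every tag in its original insertion order.

```python
from collections import namedtuple

# Simplified stand-in for a stream tag; not the GNU Radio tag_t type.
Tag = namedtuple("Tag", ["offset", "key", "value"])

# Several settings of the same key land on the same sample offset.
inserted = [
    Tag(100, "freq", 915.0e6),
    Tag(100, "freq", 916.0e6),
    Tag(100, "gain", 10.0),
    Tag(100, "freq", 917.0e6),
]


def store_in_map(tags):
    """Map-style record: a later tag overwrites an earlier one."""
    record = {}
    for tag in tags:
        record[(tag.offset, tag.key)] = tag.value
    return record


def store_in_list(tags):
    """List-style record: sorted() is stable, so tags sharing an offset
    keep the order in which they were inserted."""
    return sorted(tags, key=lambda tag: tag.offset)


as_map = store_in_map(inserted)    # two entries survive
as_list = store_in_list(inserted)  # all four tags, original order
```

The intermediate "freq" settings are exactly the information point (4)
argues should not be lost when a stream is archived.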


> (5)
>> (5) All stream tags are placed in the extras block,
> sorry, can't follow you there. Extras block?

The extras block is described in the section "Extras Information" at:
http://gnuradio.org/doc/doxygen/page_metadata.html

http://gnuradio.org/redmine/issues/702 records points (2) and (5), that
the existing file_meta_source/sink corrupt (IMO) the metadata.
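
The ordering half of that, point (2), can be illustrated with a toy model.
This is not the file_meta_sink code; the function names and the
"written_up_to" cursor are assumptions that only mimic the described
behavior of emitting data up to each tag in arrival order.

```python
from collections import namedtuple

# Simplified stand-in for a stream tag; not the GNU Radio tag_t type.
Tag = namedtuple("Tag", ["offset", "key"])


def archive_in_arrival_order(tags):
    """Consume tags as received; a tag behind the write cursor is lost."""
    written_up_to = 0
    kept, dropped = [], []
    for tag in tags:
        if tag.offset < written_up_to:
            # Data past this offset has already been emitted.
            dropped.append(tag)
        else:
            written_up_to = tag.offset
            kept.append(tag)
    return kept, dropped


def archive_sorted(tags):
    """Same model, but stable-sort by offset before consuming."""
    return archive_in_arrival_order(sorted(tags, key=lambda t: t.offset))


# The third tag was added with an earlier offset than the second.
arrival = [Tag(10, "a"), Tag(50, "b"), Tag(30, "late"), Tag(50, "c")]

kept_unsorted, dropped_unsorted = archive_in_arrival_order(arrival)
kept_sorted, dropped_sorted = archive_sorted(arrival)
```

In this model the unsorted path drops the tag at offset 30, while sorting
first keeps all four tags.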

Thanks for your time. I look forward to seeing any feedback on the issues.

Again, I may not have made it clear that I'm not raising these simply
as complaints. I have the necessary software skills to fix them, and the
issues generally include an outline of the approach I'd propose. At
this time I won't make any promises, but I'd consider contributing the
necessary effort if I can get enough feedback/sanity-checks/interface
validation/similar assistance from you folks to be confident that it'd
be seen as a worthwhile enhancement.

Peter

> Greetings,
> Marcus
>
> On 25.07.2014 13:00, Peter A. Bigot wrote:
>> I'd hoped my comments below would start a more extensive dialog on GNU
>> Radio's metadata infrastructure. Several years' experience that I have
>> with this capability in a non-commercial C++ DSP framework suggests
>> many enhancements in flow, representation, and utilities.
>>
>> I have a slight itch to contribute to a solution, but without
>> community involvement can't hope to provide anything mergeable. Is
>> this simply not something anybody feels needs to be addressed, or did
>> I ask in the wrong forum?
>>
>> Peter
>>
>> On 07/17/2014 05:11 PM, Peter A. Bigot wrote:
>>> Some comments after playing with stream tags and metadata this
>>> afternoon.
>>>
>>> (1) Although the discussion of stream tag insertion hints that this
>>> should be done within the scheduler's call to work() it could be more
>>> clear that doing it in any other context can result in race conditions.
>>> (I did think I saw it stated more clearly somewhere, but can't find
>>> that now, so maybe this point has been addressed.)
>>>
>>> (2) In the current implementation it's further necessary that tags be
>>> added to an output in monotonic non-decreasing offset order.
>>> file_meta_sink does not sort the return value from get_tags_in_range(),
>>> and emits all data up to the timestamp of the next tag, so a subsequent
>>> tag with an earlier offset is dropped from the archive.
>>>
>>> (I note that tagged_file_sink() does sort the tags it receives in one
>>> case, but not in others.)
>>>
>>> I don't see this requirement on ordered generation documented. In some
>>> cases, it may be inconvenient to do this, e.g. when a block's analysis
>>> discovers after-the-fact that something interesting can be associated
>>> with a past sample. Similarly, a user might want a block to associate
>>> a tag with a sample that has not yet arrived, to notify a downstream block
>>> that will need to process the event.
>>>
>>> A simple solution for the infrastructure is to require that tags only be
>>> generated from within work(), with offsets corresponding to samples
>>> generated in that call to work(), and in non-decreasing offset order
>>> (though this last requirement could be handled by add_item_tag()). The
>>> developer must then handle the too-late/too-early tag associations
>>> through some other mechanism, such as carrying the effective offset as
>>> part of the tag value.
>>>
>>> (3) Qt GUI Range with widget Counter + Slider invokes callbacks twice,
>>> even if the value itself was set exactly once through the counter text
>>> entry. If the callback records the change by queuing a stream tag for
>>> addition to the output, multiple tags with the same offset/key/value
>>> will be generated.
>>>
>>> There are ugly solutions to this but it's probably sufficient to note
>>> somewhere that it can happen. It's really not specific to tags, but is
>>> clearly visible in that case.
>>>
>>> (4) The in-memory stream of tags can produce multiple settings of the
>>> same key at the same offset. However, when stored to a file only the
>>> last setting of the key is recorded.
>>>
>>> I believe this last behavior is incorrect and that it's a mistake to use
>>> a map instead of a multimap or simple list for the metadata record of
>>> stream tags associated with a sample.
>>>
>>> One argument is that it's critical that a stream archive of a processing
>>> session faithfully record the contents of the stream so that re-running
>>> the application using playback reproduces that stream and thus the
>>> original behavior (absent non-determinism due to asynchrony). This
>>> faithful reproduction is what would allow a maintainer to diagnose an
>>> operational failure caused by a block with a runtime failure when the
>>> same tag is processed twice at the same offset. This is true even if
>>> the same key is set to the same value at the same sample offset multiple
>>> times, which some might otherwise want to argue is redundant.
>>>
>>> A corollary argument is that the sample number at which an event like a
>>> tuner configuration change occurs usually can't be exactly associated
>>> with a sample; the best estimate is likely to be the index of the first
>>> sample generated by the next call to work. But depending on processing
>>> speed an application might change an attribute of a data source multiple
>>> times before work was invoked. The effect of those intermediate changes
>>> may be visible in the signal, and to lose the fact they occurred by
>>> discarding all but the last change affects both reproducibility and
>>> interpretation of the signal itself.
>>>
>>> (5) All stream tags are placed in the extras block, and when a segment
>>> is completed file_meta_sink will generate a new header. The new header
>>> contains copies of the unique tags, but updates their offsets to be the
>>> start of the new segment.
>>>
>>> This is incorrect as the original stream did not have those tags
>>> associated with those samples, so re-playing will introduce a behavioral
>>> difference. For example, a tag that is meant to be associated with the
>>> start of a packet will be duplicated at an offset that is probably not
>>> the start of a packet.
>>>
>>> Solutions include (a) leave the original offset setting for tags in the
>>> extras section when they're reproduced in a new segment, even though
>>> that offset is not present in the segment; (b) treat stream tags as
>>> ephemeral and do not persist them in the extras section when generating
>>> a new segment; (c) extend the add_item_tag API to record whether the
>>> tag is ephemeral or persistent. Offhand I can see no argument
>>> supporting persisting a tag and updating its offset, and only rare cases
>>> where it's appropriate to replicate outdated information in a new
>>> segment, so (b) seems to be the right move.
>>>
>>> All the above is based on my understanding and expectations of how
>>> stream tags are/should be used. If my understanding is mistaken,
>>> please let me know.
>>>
>>> Peter
>>>
>>>
>>> _______________________________________________
>>> Discuss-gnuradio mailing list
>>> Discuss-gnuradio@gnu.org
>>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
