Wednesday, December 22, 2021

Re: VOLK C++ core

Hi all,

thanks for your thoughts so far.

I expect we can maintain a C interface for e.g. FFI (Foreign Function
Interface?).

The VOLK library itself is written in C. However, we provide all the
necessary tools to use VOLK in C++. Unlike with other libraries, that is
not as straightforward as one might think:
1. C complex.h and C++ std::complex are incompatible. We need to convert.
2. C complex support is optional (C11 lets an implementation define
`__STDC_NO_COMPLEX__`), and MSVC doesn't support it. One might wonder:
How do we compile VOLK with MSVC then? Well, we pretend everything is
C++ and things compile. Unfortunately, that approach leads to our C vs
C++ interface mess where we rely on undefined behavior.
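
To illustrate point 1, here is a minimal sketch of one way around the
conversion, not VOLK's actual API. Both C (for `float complex`) and C++
(for `std::complex<float>`) guarantee the layout of two consecutive
floats with the real part first, so a kernel that takes interleaved
`float*` data never has to name either complex type. The names
`scale_interleaved` and `scale` are made up for this example.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical C-style kernel (not a VOLK function): works on interleaved
// (re, im) float pairs, so its signature needs neither C complex.h nor
// std::complex.
void scale_interleaved(float* iq, std::size_t num_complex, float factor)
{
    assert(iq != nullptr || num_complex == 0);
    for (std::size_t i = 0; i < 2 * num_complex; ++i) {
        iq[i] *= factor;
    }
}

// Hypothetical C++ front end. The C++ standard guarantees that an array of
// std::complex<float> may be accessed as an array of float, with real parts
// at even indices and imaginary parts at odd indices, so this cast is well
// defined.
void scale(std::vector<std::complex<float>>& samples, float factor)
{
    static_assert(sizeof(std::complex<float>) == 2 * sizeof(float),
                  "std::complex<float> must be two packed floats");
    scale_interleaved(reinterpret_cast<float*>(samples.data()),
                      samples.size(),
                      factor);
}
```

A caller would simply write `scale(samples, 0.5f)`; the cast stays hidden
in one place instead of leaking into every user of the library.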

I mentioned `std::span` because it seems like the better alternative to
`float *values, const size_t length`. We can still add a wrapper for C
that looks like the current API.

C++20 adds quite a few interesting features like `std::span` but also
concepts. Depending on our timeline, it might be too early for C++20 though.

Also, I already received feedback that suggests doing one step at a
time, i.e. finish our VOLK LGPL re-licensing business first and then
work on the next step.

I arrived at the point where I wrote this email after I tried to improve
the current VOLK API. I'd like to add something like `multiply(std::span
result, std::span in0, std::span in1)` and just unpack that to call a
`multiply(float* result, ... , unsigned length)` function. However, the
current complex.h support makes that almost impossible. We can add the
API, but we would still be stuck with this "let's compile things in C or
C++ mode depending on the compiler" thing.

Cheers
Johannes

On 22.12.21 01:21, Marcus Müller wrote:
> Hey Nick,
>
> > Yes, that's kind of my fault.
>
> "Fault" is kind of a hard word when it's your achievement that a C API
> VOLK got as far as it took us!
>
> > C++20 finally makes C++ a much less Lovecraftian nightmare
>
> We're going to have template metaprogramming! SFinaE fhtagn!
>
> Seriously, though. Operations as base classes, kernels / hardware
> specializations as subclasses, and the actual call being a virtual
> operator() call...
>
> Think about this: Instead of the dispatcher bending the addresses
> symbols from shared libraries point to (as we do now), there could be a
> dispatcher object (possibly, but not necessarily, a singleton) with
> members of the operation class type, which get assigned instances of
> the optimal (according to prior volk_profile and/or heuristics) kernel
> implementation subclass.
> I.e., instead of using our current nice trick to save the address of the
> correct &volk_sig_in_kernel_sig_out_arch_alignment in
> volk_sig_in_kernel_sig_out_alignment, we just use the standard vtable/C++
> polymorphism. Same performance at runtime - one CALL.
> Immediate benefit: all these things suddenly become self-aware.
> Let's add a self-documenting call that returns a const char* describing
> what this thing does. That makes people very happy when they wrap things
> for Python, because now the type comes with documentation in your IDE
> through little effort.
> We can tell the user that this kernel prefers but doesn't need aligned
> memory. Or, much more bug-relevant, we could communicate the acceptable
> input multiples, and stop doing the cute "for the rest, we do the
> _general approach after the main loop is through in every single
> _arch_alignment" implementation.
>
> (of course, I have far more somber dreams, don't assume the Lovecraftian
> horrors are too far from here. If each kernel implementation is a type,
> we can make these types have the capability (optional trait) to give us
> the type encapsulating what the VOLK kernel does in its "inner loop" (if
> applicable). Because C++ allows us to pass things like __m256& and
> const __m256&, we can then simply compose new inner loops. And of
> course, instead of implementing the same loop skeleton 200 times, we
> could just, for those kernels where the inner loop is "simple", have one
> templated
>
> template<typename nucleus_operation>
> class loop_kernel {
>    using op = nucleus_operation;
> public:
>    void operator()(std::span<typename op::in_type> in, ...) {
>     for(auto ptr = in.begin(); ptr < in.end(); ptr += op::simd_width)
>       op::operate(*ptr, ...);
>    }
> };
>
> or so.)
>
> Cheers!
> Marcus
>
>
> On 21.12.21 20:25, Nick Foster wrote:
>>
>> On Tue, Dec 21, 2021 at 3:29 AM Marcus Müller <mmueller@gnuradio.org>
>> wrote:
>>
>>     Hi Johannes,
>>
>>     I, for one, like it :) Especially since I honestly find void
>>     volk_32fc_x2_s32fc_multiply_conjugate_add_32fc to be a teeny tiny
>> bit clunky and would
>>     rather call a type-safe, overloaded function in a volk namespace
>> called
>>     multiply_conjugate_add.
>>
>>
>> Yes, that's kind of my fault. It was the best option we could come up
>> with to be rigorously type-specific in C, kind of a bespoke
>> implementation of name mangling. The original motivation, of course,
>> was the VOLK dispatcher. C was a hard requirement at the time, and I
>> confess I don't remember why. I think it came down from namccart's
>> original donation of vectorized code.
>>
>> I would be very happy to see VOLK move to C++ (or at least provide
>> wrappers). I strongly advocate for using C++20 -- std::span, variadic
>> arguments, lambdas etc. seem tailor-made for VOLK. Runtime dispatching
>> could be positively elegant, compared to how it must be done in C. And
>> C++20 finally makes C++ a much less Lovecraftian nightmare of a
>> language than the one I learned from Stroustrup.
>>
>> Nick
>>
>>     Re: RFC: can we have something like a wiki page (maybe on the VOLK
>> repo?) to collect
>>     these
>>     comments?
>>
>>     You mention spans, so C++-VOLK would be >= C++20?
>>
>>     Cheers,
>>     Marcus
>>
>>     On 21.12.21 10:55, Johannes Demel wrote:
>>      > Hi everyone,
>>      >
>>      > today I'd like to propose an idea for the future of VOLK.
>> Currently, VOLK is a C
>>     library
>>      > with a C++ interface and tooling that is written in C++.
>>      >
>>      > I propose to make VOLK a C++ library. Similar to e.g. UHD, we
>> can add a C interface if
>>      > the need arises.
>>      >
>>      > This email serves as a request for comments. So go ahead.
>>      >
>>      > Benefits:
>>      > - sane std::complex interface.
>>      > - same compilation mode on all platforms.
>>      > - Better dynamic kernel load management.
>>      > - Option to use std::simd in the future
>>      > - Less manual memory management (think vector, ...).
>>      >
>>      > Drawbacks:
>>      > - It is a major effort.
>>      > - VOLK won't be a C project anymore.
>>      >
>>      > Why do I propose this shift?
>>      > VOLK segfaults on PowerPC architectures. This issue requires a
>> breaking API change
>>     to be
>>      > fixable. I tried to update the API to fix this issue.
>>      > https://github.com/gnuradio/volk/pull/488
>>      > It works with GCC and Clang but fails on MSVC.
>>      > One might argue that PowerPC is an obscure architecture at this
>> point but new
>>      > architectures might cause the same issue in the future. Also,
>> VOLK tries to be
>>     portable
>>      > and that kind of issue is a serious roadblock.
>>      >
>>      > How did we get into this mess?
>>      > The current API is a workaround to make things work for a
>> specific compiler: MSVC.
>>     MSVC
>>      > does not support C `complex.h` at all. The trick to make things
>> work with MSVC is:
>>      > compile VOLK in C++ mode and pretend it is a C++ library anyways.
>>      > In turn `volk_complex.h` defines complex data types differently
>> depending if VOLK is
>>      > included in C or C++. Finally, we just hope that the target
>> platform provides the same
>>      > ABI for C complex and C++ complex. C complex and C++ complex
>> are not compatible.
>>      > However, passing pointers around is.
>>      > Thus, the proposed change does not affect Windows/MSVC users
>> because they were
>>     excluded
>>      > from our C API anyways. The bullet point: "same compilation
>> mode on all platforms"
>>      > refers to this issue.
>>      >
>>      > Proposed timeline:
>>      > Together with our re-licensing effort, we aim for a VOLK 3.0
>> release. VOLK 3.0 is a
>>     good
>>      > target for breaking API changes.
>>      >
>>      > Effects:
>>      > I'd like to make the transition to VOLK 3.0 as easy as
>> possible. Thus, I'd like to
>>     keep
>>      > an interface that hopefully doesn't require any code changes
>> for VOLK 2.x users. A
>>      > rebuild of your application should be sufficient. However,
>> we'd be able to adopt a
>>      > C++-ic API as well. e.g. use vectors, spans etc.
>>      >
>>      > The current implementation to detect and load the preferred
>> implementation at
>>     runtime is
>>      > hard to understand and easy to break. C++ should offer more
>> accessible tools to make
>>      > this part easier.
>>      >
>>      > What about all the current kernels?
>>      > We'd start with a new API and hide the old kernel code behind
>> that interface. We
>>     come up
>>      > with a new implementation structure and how to load it. Thus,
>> we can progressively
>>      > convert to "new-style" implementations.
>>      >
>>      > Another bonus: std::simd
>>      > Currently, std::simd is a proposal for C++23. Making VOLK a C++
>> lib would allow us to
>>      > eventually use std::simd in VOLK and thus make Comms DSP
>> algorithms more optimized on
>>      > more platforms.
>>      >
>>      > Cheers
>>      > Johannes
>>      >
>>
>
