Hi Marcus,
Thanks for the reality check! You make a great point about the GitHub runner essentially acting as its own queue. Treating it as a 1:1 runner-to-node-pair mapping cuts out a massive amount of unnecessary orchestration logic.
I will definitely take your advice to avoid Labgrid and keep the architecture as lean and native as possible.
Thanks again for the sanity check and the practical insights!
Best,
Joseph
On Wed, 1 Apr, 2026, 18:59 Marcus Müller, <mmueller@gnuradio.org> wrote:
Hey Joseph,
I'm relieved if you've already been discussing things with Cyrille. Don't get rabbithole'd
early on :)
I'm not sure what Labgrid brings to the table in terms of making sure that concurrent PR
jobs don't collide; the usual paradigm here, and the way that, for example, the GitHub runner
interface (but also other CI runners) handles this, is that there's a daemon that accepts a
job, and only starts the next one when that job is done. The CI runner you need to have
running to accept the job coming from the forge (GitHub) already does that!
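To make the "one daemon, one job at a time" paradigm concrete, here is a minimal sketch in Python. It is not the GitHub runner's actual implementation; `SerialRunner`, the job names, and the sleep-based jobs are all hypothetical stand-ins. The only point is that a single worker pulling from a queue cannot overlap two jobs, so no separate reservation layer is needed for that.

```python
import queue
import threading
import time


class SerialRunner:
    """Hypothetical stand-in for a CI runner: one worker, one job at a time."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.log = []  # (name, start, end, ok) in execution order
        worker = threading.Thread(target=self._loop, daemon=True)
        worker.start()

    def submit(self, name, fn):
        """Queue a job; it runs only after every earlier job has finished."""
        self._jobs.put((name, fn))

    def _loop(self):
        while True:
            name, fn = self._jobs.get()  # blocks until a job is available
            start = time.monotonic()
            try:
                fn()
                ok = True
            except Exception:
                ok = False  # a failing job must not take the worker down
            self.log.append((name, start, time.monotonic(), ok))
            self._jobs.task_done()

    def wait(self):
        """Block until every submitted job has been processed."""
        self._jobs.join()


runner = SerialRunner()
for pr in ("pr-101", "pr-102", "pr-103"):  # hypothetical concurrent PR jobs
    runner.submit(pr, lambda: time.sleep(0.05))
runner.wait()
```

Each entry in `runner.log` starts no earlier than the previous one ended, which is the property the reservation queue was meant to provide.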
> Crash-safe orphan detection
Not a bad thing to have, but since any test needs to fail after a timeout has happened,
I'd assume the timing-out and yielding the node would be included in what you run on the
nodes themselves.
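A minimal sketch of that idea in Python, assuming the test is launched as a child process; `run_test_with_timeout` and the `release_node` callback are hypothetical names, and releasing the node is reduced to a callback rather than any real CorteXlab call. The timeout counts as a failure, and the node is yielded on every path.

```python
import subprocess
import sys


def run_test_with_timeout(cmd, timeout_s, release_node):
    """Run one test command; a timeout is a failure, and the node is always released."""
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return "passed" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising
        return "timed out"
    finally:
        release_node()  # runs on pass, fail, and timeout alike


released = []
hanging_test = [sys.executable, "-c", "import time; time.sleep(30)"]
quick_test = [sys.executable, "-c", "pass"]

status_hang = run_test_with_timeout(hanging_test, 0.5, lambda: released.append("node"))
status_ok = run_test_with_timeout(quick_test, 10, lambda: released.append("node"))
```

With this shape, a hung test cannot hold the node past its timeout, which covers much of what a separate heartbeat would otherwise be for.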
Other than that, I'd feel fairly confident that in case the controller crashes, you have a
complete system failure, at which point you can just automatically stop all jobs you're
running, and start fresh.
> Hardware-agnostic YAML environments: Keeps test scripts decoupled from CorteXlab's
> specific node identifiers.
Not a bad motive! Yes, but it locks you into the specific format of a different specific
software, and it's just a few YAML files, so while I think this is a good argument, it's not
one that is super urgent!
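For a sense of scale, the decoupling itself can be as small as a lookup table. The sketch below is an assumption-laden illustration: the role names and node names are placeholders, not real CorteXlab identifiers, and a plain dict stands in for whatever file format ends up being used.

```python
# Hypothetical role-to-node table; these names are illustrative placeholders,
# not real CorteXlab node identifiers.
NODE_MAP = {
    "transmitter": "node-a",
    "receiver": "node-b",
}


def resolve(role, node_map=NODE_MAP):
    """Translate a logical role used by a test script into a concrete node name."""
    try:
        return node_map[role]
    except KeyError:
        raise KeyError(f"no node assigned to role {role!r}") from None


tx_node = resolve("transmitter")
```

Test scripts would then only ever mention roles such as `"transmitter"`, and moving to different hardware means editing the table rather than the tests.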
> I noted in the proposal that the plan is to assess this during Community Bonding
Very fair!
Best regards,
Marcus
On 2026-04-01 1:12 PM, Joseph George wrote:
> Hi Marcus,
> Thanks for taking a look and for the feedback!
> I completely agree that CorteXlab's native Minus API is fantastic for handling the core
> node management and scheduling.
> After discussing the architecture with Cyrille on the list earlier this week, I actually
> updated the final proposal to clarify this exact relationship.
>
> My main reason for exploring Labgrid was actually to fill a few specific CI orchestration
> gaps that I wasn't sure CorteXlab's Minus API handled natively for unattended runners.
> Specifically:
>
> * Blocking reservation queue: Ensures concurrent PR jobs don't collide.
> * Crash-safe orphan detection: Uses a heartbeat so a killed runner doesn't hold a node
> locked indefinitely.
> * Hardware-agnostic YAML environments: Keeps test scripts decoupled from CorteXlab's
> specific node identifiers.
>
>
> *I noted in the proposal that the plan is to assess this during Community Bonding. If
> introducing Labgrid is too heavy or the wrong fit for the GNU Radio CI ecosystem, I am
> 100% on board with dropping it and just building a lightweight custom shim around the
> Minus API to handle the queueing and heartbeats.*
> Really appreciate you taking the time to review the concept! I'd love to hear your
> thoughts on this layered approach.
> Best,
> Joseph
>
> On Wed, 1 Apr, 2026, 16:20 Marcus Müller, <mmueller@gnuradio.org> wrote:
>
> Don't think labgrid is the kind of thing that helps here, much.
>
> On 2026-03-31 2:09 PM, Philip Balister wrote:
> > On 3/30/26 4:16 AM, Cyrille Morin wrote:
> >> Hello Joseph,
> >>
> >> I read through your document.
> >> Overall, it looks good; it appears to have everything required of the proposal
> >> document.
> >>
> >> A couple of thoughts:
> >>
> >> The proposed integrated tests look good and feel like what we would like to head
> >> towards, but being integration tests, they involve a lot of moving parts, so they
> >> might require a lot of tweaking and debugging time to work reliably, which might
> >> push back the integration into the CI pipeline.
> >>
> >> I've never used Labgrid so I don't know much about what it can or cannot help
> >> with. But from your proposal, it does sound like it performs many tasks already
> >> done by the platform's systems (booking, health check, ...). You might want to
> >> detail where specifically Labgrid would offer new and required capabilities.
> >
> > Labgrid would offer a general API to the hardware so the work could extend beyond
> > CorteXlab. It is certainly worth a look to see if it is straightforward to abstract
> > the interface to the underlying hardware.
> >
> > Philip
> >
> >
> >>
> >> Best
> >>
> >> *Cyrille MORIN*
> >> /SED Engineer/
> >> /MARACAS Team/
> >>
> >> Centre Inria de Lyon
> >>
> >> Laboratoire CITI
> >> Campus La Doua - Villeurbanne
> >> 6 avenue des Arts
> >> F-69621 Villeurbanne
> >>
> >> https://team.inria.fr/maracas/
> >> On 28/03/2026 at 14:49, Joseph George wrote:
> >>>
> >>> Hi Cyrille,
> >>>
> >>> I have completed the first draft of my GSoC 2026 proposal for the "Hardware in the
> >>> Loop CI" project.
> >>>
> >>> Draft: Hardware in the Loop CI
> >>> <https://drive.google.com/file/d/1ATLOxq_bvPpG7fizTQtZK-8w_BwadVeF/view?usp=drive_link>
> >>>
> >>> A huge thank you to Larry and Philip for the insights. I have explicitly integrated
> >>> the LBNL Node Health Check paradigm to isolate hardware failures from software
> >>> regressions, and I've adopted Labgrid as the core hardware orchestration layer to
> >>> manage the CorteXlab USRPs.
> >>>
> >>> I would greatly appreciate any feedback from the community.
> >>>
> >>> Thanks for your time and guidance!
> >>>
> >>> Best, Joseph George
> >>>
> >>>
> >>> On Thu, 26 Mar 2026 at 22:23, Cyrille Morin <cyrille.morin@inria.fr> wrote:
> >>>
> >>> Hi Joseph,
> >>>
> >>> Welcome!
> >>>
> >>> Feel free to share your draft here on the mailing list, for
> >>> feedback by members of the community; that's the right place.
> >>>
> >>> I don't have a specific format for the test scenarios; choose
> >>> what you think is best/more readable/most relevant.
> >>> But do look at the GSoC Student info on the wiki if you haven't
> >>> already: <https://wiki.gnuradio.org/index.php?title=GSoCStudentInfo>
> >>>
> >>> *Cyrille MORIN*
> >>> On 26/03/2026 at 15:56, Joseph George wrote:
> >>>> Hi Cyrille,
> >>>> I'm Joseph, an ECE student and the Chair of the IEEE Signal
> >>>> Processing Society at my college. I'm putting together a GSoC
> >>>> proposal for the "Hardware in the loop CI" project and wanted to
> >>>> quickly say hello.
> >>>>
> >>>> I have a strong background in bridging DSP theory with physical
> >>>> hardware. I recently placed 7th globally in the ICASSP 2026 ALS
> >>>> challenge by building domain-driven acoustic biomarker pipelines,
> >>>> and I regularly build hardware projects (like ESP32 navigation
> >>>> systems using Kalman filtering for sensor fusion). I'd love to
> >>>> help bring GNU Radio's CI tests out of software-only simulation
> >>>> and onto the physical CorteXlab hardware.
> >>>>
> >>>> I am drafting my 12-week timeline right now. Is there a specific
> >>>> format you prefer for the test scenarios, or a good place to drop
> >>>> a link to my draft for a quick sanity check before Tuesday's
> >>>> deadline?
> >>>
> >
> >
>
>