A Reproducible Workshop Comparing Radial‑Velocity Pipelines for Student Researchers

Elena Hart
2026-05-16
20 min read

A hands-on RV workshop where students compare pipelines, quantify differences, and learn reproducible exoplanet analysis.

Radial velocity is one of the most important techniques in modern exoplanet detection, but it is also one of the easiest places for hidden assumptions to sneak into a result. If two teams analyze the same star with different data reduction choices, masking rules, wavelength solutions, or uncertainty models, they can end up with different planet masses, different detection significance, and sometimes even different conclusions about whether a planet exists at all. That is exactly why a hands-on workshop is so valuable: students learn not only how to run the pipelines, but why reproducibility and error analysis matter as much as the final fit. This guide is designed as a practical teaching resource for classrooms, research groups, and summer schools, and it connects the workflow to broader lessons in credible space science communication and the open-science habits that make results easier to trust and reuse.

Grounded in the research culture of groups like the Aarhus exoplanet team and instrument-driven programs such as Carnegie’s PFS work, the workshop below treats pipeline comparison as an experiment in itself. Students process the same stellar spectra through multiple RV pipelines, compare the outputs, and then diagnose the differences. Along the way, they build the habits that define strong astronomy training: documentation, version control, uncertainty budgeting, and a willingness to test whether a “detected planet” survives changes in assumptions. If you want a broader context for how astronomy training is evolving, see our piece on the changing landscape of undergraduate astronomy degrees, which highlights why practical, research-like exercises are increasingly important.

Why Radial-Velocity Pipelines Are Such a Powerful Teaching Tool

Radial velocity in one sentence

Radial velocity measures the tiny back-and-forth motion of a star along our line of sight, inferred from Doppler shifts in its spectral lines. A planet does not orbit alone; the star and planet both move around their common center of mass, and we detect the stellar wobble through careful spectroscopic measurement. In practice, those velocity shifts can be just a few meters per second or less, which means the method is exquisitely sensitive to calibration, tellurics, instrumental drift, stellar activity, and reduction choices. That sensitivity is what makes RV both powerful and educational: students can see how small technical decisions produce meaningful scientific differences.
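
To make those numbers concrete, here is a minimal sketch of the non-relativistic Doppler relation, v = c × Δλ/λ. The wavelengths are invented for illustration; real pipelines combine shifts measured across thousands of lines at once rather than one line at a time.

```python
# The non-relativistic Doppler relation: v = c * (delta_lambda / lambda_rest).
# Wavelengths below are invented for illustration only.
C_M_S = 299_792_458.0          # speed of light in m/s

lambda_rest = 5500.0000        # rest wavelength of one line, in Angstroms
lambda_observed = 5500.0002    # hypothetical observed wavelength

rv = C_M_S * (lambda_observed - lambda_rest) / lambda_rest
print(f"Radial velocity: {rv:.2f} m/s")  # about 10.9 m/s for this tiny shift
```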

Why different pipelines can disagree

Different pipelines may use different line masks, template construction methods, continuum normalization, cosmic-ray rejection, barycentric corrections, or order-weighting schemes. One pipeline may be optimized for a given spectrograph’s throughput and detector characteristics, while another may prioritize transparency and portability. These differences are not necessarily flaws; they are often sensible choices tailored to particular goals. The educational value comes from making those choices visible and then asking students to trace how each choice affects the output velocities, uncertainty bars, and eventual Keplerian fit.
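
As one concrete example of such a choice, the sketch below shows a simple iterative polynomial continuum normalization for a single echelle order. The degree, iteration count, and clipping level are illustrative assumptions, not any particular pipeline's defaults.

```python
import numpy as np

def normalize_order(wavelength, flux, degree=3, n_iter=3, clip=2.0):
    """Sketch of iterative polynomial continuum normalization for one order.

    A single fit is biased downward by absorption lines, so refit a few
    times, dropping points that fall well below the running continuum.
    Degree, iterations, and clip level are illustrative choices only.
    """
    mask = np.ones(flux.size, dtype=bool)
    for _ in range(n_iter):
        coeffs = np.polyfit(wavelength[mask], flux[mask], degree)
        continuum = np.polyval(coeffs, wavelength)
        resid = flux - continuum
        mask = resid > -clip * np.std(resid[mask])  # reject deep lines
    return flux / continuum
```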

What students learn beyond astronomy

This workshop is also a lesson in scientific computing. Students practice reproducibility, metadata tracking, and collaborative debugging, all of which are essential skills for future researchers and data scientists. In that sense, the workshop aligns with the same workflow mindset used in other technical fields, from debugging quantum jobs that fail because of noise to building trustworthy analysis pipelines in engineering. The key habit is simple: never trust a result you cannot reproduce, explain, and rerun from scratch.

Workshop Goals, Learning Outcomes, and Audience

Who this workshop is for

The ideal audience includes advanced undergraduates, first-year graduate students, research interns, and teachers who want to bring authentic astronomical research into the classroom. Students do not need to be pipeline experts at the beginning, but they should be comfortable with basic Python, plotting, and the idea of working with tabular data. A mixed-level group actually works well because it mirrors real research teams, where one person may be confident with code while another notices a subtle issue in the data. That social learning aspect also makes the workshop more durable and more inclusive.

Core learning outcomes

By the end of the workshop, participants should be able to explain the basic RV method, run at least two pipelines on the same dataset, compare their outputs, and quantify the differences with plots and summary statistics. They should also be able to identify common sources of disagreement, such as missing calibration frames, different outlier rejection thresholds, or underestimated uncertainties. Just as important, they should understand when a mismatch is scientifically meaningful and when it is a harmless consequence of equivalent modeling choices. That distinction is a central part of real-world space industry analysis and reporting too: not every difference is a controversy, but every difference deserves context.
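
A minimal sketch of that quantification, assuming the two pipelines' velocities and formal errors have already been matched epoch by epoch (the function and field names are hypothetical, not a standard API):

```python
import numpy as np

def compare_rv_outputs(rv_a, err_a, rv_b, err_b):
    """Summarize how two pipelines' velocities differ on shared epochs.

    Assumes NumPy arrays already matched observation-by-observation. Both
    pipelines start from the same photons, so their errors are correlated;
    treat the chi-squared value as a rough guide, not a formal test.
    """
    diff = rv_a - rv_b
    offset = np.median(diff)                  # constant zero-point shift
    scatter = np.std(diff - offset)           # epoch-to-epoch disagreement
    combined = np.sqrt(err_a**2 + err_b**2)   # naive combined formal error
    chi2_red = np.sum(((diff - offset) / combined) ** 2) / (diff.size - 1)
    return {"offset_m_s": offset, "scatter_m_s": scatter, "reduced_chi2": chi2_red}
```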

Workshop products students can take home

Students should leave with a small reproducible analysis bundle: scripts, notebooks, plots, a results table, and a short methods memo. If the workshop is run as a class assignment, that bundle can become the basis for a mini research report or poster. It is especially powerful when students are required to document the software versions and parameter files they used, because that habit makes their project usable by others later. For educators, that deliverable is more than a grade artifact; it is a teaching demonstration of open science in action.
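
One lightweight way to snapshot software versions, assuming a typical scientific Python stack; the file name and package list are placeholders to adapt:

```python
import json
import platform
import sys
from importlib import metadata

# Packages to record; adjust to whatever your workshop environment uses.
packages = ["numpy", "scipy", "astropy", "matplotlib"]

snapshot = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {},
}
for pkg in packages:
    try:
        snapshot["packages"][pkg] = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        snapshot["packages"][pkg] = "not installed"

with open("environment_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
```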

Dataset Strategy: Using the Same Stars to Stress-Test Different Pipelines

Choosing a good workshop dataset

Not every RV dataset is suitable for teaching. For a workshop, you want a dataset with enough observations to show structure in the velocities, but not so much complexity that students spend the entire session fighting file formats. Ideal examples include a known planet host, a quiet star with a clear periodic signal, or a published benchmark dataset with accessible metadata. If your goal is to compare pipelines rather than discover a new planet, it is usually better to choose a dataset where the expected answer is already known, because that lets students focus on methodological differences rather than hunting for a signal from scratch.

Open datasets and instrument context

Whenever possible, use spectra from a well-documented instrument or a published RV archive. That helps students connect what they see in the pipeline to the realities of spectrograph design, observing cadence, and calibration strategy. The research culture of teams such as the Aarhus exoplanet group and of instrument specialists like Johanna Teske at Carnegie shows why instrument context matters: the data are not abstract numbers, but measurements shaped by telescope, spectrograph, and observing strategy. If you need a broader pedagogical frame for this kind of hands-on experience, our guide on narrative transport in the classroom explains why a story-like workflow helps students retain technical concepts.

Data volume and difficulty tuning

The best workshop dataset is one students can finish in one day or one weekend. If the files are too large, the workshop becomes a computing bottleneck rather than a scientific exercise. If the signal is too obvious, students may miss the point that a clean-looking result can still hide processing sensitivity. A good rule is to choose a dataset with several dozen spectra and at least one opportunity to discuss edge cases like low signal-to-noise points, activity-induced jitter, or sparsely sampled phase coverage.

Pipeline Comparison: What to Look For and How to Structure the Exercise

What “multiple pipelines” should mean

In this workshop, “multiple pipelines” does not mean simply pressing the same buttons in slightly different software. It means comparing distinct analysis philosophies. For example, students can compare a pipeline that emphasizes automated reduction and robust defaults with a more configurable open workflow that exposes each step. The goal is to reveal how assumptions propagate from raw spectra to velocity time series and then into orbital parameters. If you want a useful analogy outside astronomy, think of it like comparing different production systems that each promise reliability but make different tradeoffs in observability and control, much like the ideas in operate versus orchestrate frameworks.

Best-practice comparison metrics

Students should compare more than just the final period or semi-amplitude. They should examine RMS scatter, internal error bars, goodness-of-fit statistics, residual structure, phase-folded curves, and any correlation with activity indicators or observing conditions. It is also worth checking whether the pipelines preserve the same observation timestamps, barycentric corrections, and reference frames. These details can feel tedious in the moment, but they are exactly where reproducibility problems are born.
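
A sketch of a few of those diagnostics; `activity_index` stands in for whatever indicator your instrument provides (S-index, H-alpha, etc.), and the statistics shown are illustrative rather than exhaustive:

```python
import numpy as np

def residual_diagnostics(residuals, activity_index):
    """A few diagnostics worth tabulating per pipeline (arrays, same epochs).

    A strong residual-activity correlation suggests the model is absorbing
    or missing stellar variability; lag-1 autocorrelation flags structure
    in time that pure white noise would lack.
    """
    rms = np.std(residuals)
    activity_corr = np.corrcoef(residuals, activity_index)[0, 1]
    lag1_autocorr = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
    return {"rms": rms, "activity_corr": activity_corr,
            "lag1_autocorr": lag1_autocorr}
```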

A practical comparison table

The table below gives a template for what students should record as they compare pipelines. You can adapt it to your specific instrument, code base, or class level.

| Comparison Item | Pipeline A | Pipeline B | What to Interpret |
| --- | --- | --- | --- |
| Velocity zero point | Instrument-relative | Template-relative | Offsets can shift absolute values without changing the planet signal |
| Typical uncertainty | Smaller formal errors | Larger conservative errors | Underestimated errors can inflate detection significance |
| Outlier handling | 3σ clipping | Robust weighting | Different choices affect scatter and possible signal loss |
| Telluric treatment | Masking only | Model-and-subtract | Residual contamination can mimic low-amplitude variability |
| Activity diagnostics | Not included | Included alongside RVs | Important for separating planet signals from stellar noise |
| Final fitted planet mass | Higher by 8% | Lower by 8% | Investigate whether the difference comes from model assumptions or errors |

Students often find that the most useful insight is not which pipeline is “right,” but which parts of the workflow are stable and which are fragile. That is a valuable lesson because science rarely gives a single correct answer independent of method. Instead, good science quantifies how sensitive the answer is to defensible choices. That sensitivity analysis is what turns a result from a number into evidence.

Reproducibility as a First-Class Research Skill

What reproducibility actually means in this workshop

Reproducibility means another person can follow your steps and obtain the same or statistically consistent result. In an RV workshop, that includes software versions, parameter files, masks, templates, calibrations, and the exact dataset used. It also means students should be able to explain any differences between reruns rather than treating them as mysterious glitches. In practical terms, reproducibility is a form of scientific humility: you acknowledge that methods matter and that your result depends on them.

How to structure a reproducible workflow

The simplest structure is: data download, environment setup, pipeline run, output comparison, and report generation. Each stage should be scripted rather than done manually wherever possible, and the scripts should be stored with the analysis. Version control systems like Git help students track changes, and a shared repository can make collaboration much easier. For teams who need to understand why reproducibility is also a business and organizational advantage, the logic resembles lessons from workflow architecture in regulated data systems: if the system cannot be inspected, it cannot be trusted.
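
A minimal driver script in that spirit, where every stage script and path is a placeholder for your own workshop materials:

```python
# Each stage is a separate, inspectable script; names here are placeholders.
import subprocess

STAGES = [
    ["python", "download_data.py", "--dest", "data/raw"],
    ["python", "run_pipeline.py", "--config", "configs/pipeline_a.yaml"],
    ["python", "run_pipeline.py", "--config", "configs/pipeline_b.yaml"],
    ["python", "compare_outputs.py", "--out", "results/comparison.csv"],
    ["python", "make_report.py", "--out", "report/summary.html"],
]

for cmd in STAGES:
    print(">>", " ".join(cmd))
    subprocess.run(cmd, check=True)  # halt immediately if any stage fails
```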

Documenting assumptions

The most common reproducibility failure in student projects is incomplete documentation of assumptions. Did you normalize each order separately or together? Did you use the same wavelength mask for all stars? Did you exclude spectra with low signal-to-noise, and if so, why? Students should learn to write these decisions down as if another group will inherit their work, because in research, another group often will. Good notes make future troubleshooting far easier than trying to reverse-engineer a notebook weeks later.
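
One low-friction habit is a structured assumptions log committed alongside the code. The keys and values below are examples, not a schema:

```python
import json
from datetime import date

# Example decisions only; record whatever your group actually chose, and why.
assumptions = {
    "recorded": str(date.today()),
    "normalization": "each echelle order normalized separately",
    "wavelength_mask": "same G2 mask applied to all stars",
    "snr_cut": {"threshold": 30, "reason": "low-SNR epochs dominated residuals"},
    "excluded_epochs": [],  # list BJDs here, each with a stated reason
}

with open("assumptions_log.json", "w") as f:
    json.dump(assumptions, f, indent=2)
```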

Error Analysis: Where the Science Really Gets Interesting

Formal uncertainties versus real-world scatter

One of the most important lessons in RV analysis is that formal fit errors are not always the whole story. A pipeline may output tiny per-point uncertainties, but the residuals can still show excess scatter from stellar activity, instrumental systematics, or underestimated noise. Students should therefore compare the reported errors with the empirical scatter of the residuals. If the residual RMS is much larger than expected, the pipeline is telling you something important about model inadequacy, not just measurement noise.
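
A quick consistency check, written as a sketch: the ratio of the empirical residual scatter to the scatter the quoted errors predict. Values well above 1 flag unmodeled noise:

```python
import numpy as np

def error_consistency(residuals, formal_errors):
    """Ratio of empirical residual scatter to the scatter the quoted
    per-point errors predict. A ratio well above 1 points to jitter or
    systematics that the pipeline's formal uncertainties do not capture.
    """
    empirical_rms = np.std(residuals)
    expected_rms = np.sqrt(np.mean(formal_errors**2))
    return empirical_rms / expected_rms
```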

Common RV error sources students should test

Students should explicitly explore at least four categories of error: photon noise, calibration error, barycentric correction error, and astrophysical jitter. It is also wise to include telluric contamination, order-to-order inconsistencies, and template mismatch, especially if the star type differs from the pipeline’s “ideal” target. A useful classroom exercise is to deliberately perturb one setting at a time and observe how the velocity curve changes. This “controlled stress test” approach feels similar to the way engineers evaluate system resilience in simulation-based de-risking workflows.
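
A sketch of that one-factor-at-a-time loop; `run_pipeline` is a toy stand-in for your real pipeline wrapper, and the setting names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def run_pipeline(clip_sigma=3.0, mask="G2", weighting="inverse_variance"):
    """Toy stand-in for a real pipeline wrapper: returns synthetic velocities
    whose scatter depends weakly on the settings, so the loop below has
    something to show. Replace with a call to your actual pipeline."""
    scale = 2.0 + 0.3 * abs(clip_sigma - 3.0)
    scale += 0.5 if weighting == "uniform" else 0.0
    return rng.normal(0.0, scale, size=40)

baseline = {"clip_sigma": 3.0, "mask": "G2", "weighting": "inverse_variance"}
perturbations = [{"clip_sigma": 4.0}, {"mask": "K5"}, {"weighting": "uniform"}]

rv_ref = run_pipeline(**baseline)
for change in perturbations:
    rv_new = run_pipeline(**{**baseline, **change})  # change one thing only
    delta_rms = np.std(rv_new - rv_ref)
    print(f"{change}: RMS change of {delta_rms:.2f} m/s vs. baseline")
```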

How to teach students to report uncertainty honestly

Students often want a single clean number, but scientific honesty requires them to report uncertainty in a way that reflects model limitations. If two pipelines disagree beyond their quoted errors, students should not simply average them unless they have a principled reason. Instead, they should explain the discrepancy, check diagnostics, and, when appropriate, inflate the uncertainty budget or cite a systematic offset. That habit is especially important in exoplanet detection, where a claimed signal can move from “promising” to “unreliable” once systematic error is accounted for.
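
One principled option is a constant jitter term added in quadrature, tuned so the residuals become statistically consistent with the quoted errors. A coarse grid search is enough for a workshop; this is a sketch, not a substitute for a full likelihood fit:

```python
import numpy as np

def inflate_with_jitter(formal_errors, residuals, max_jitter=20.0):
    """Find a constant jitter s (added in quadrature) such that the mean
    chi-squared per point is ~1 under sigma_total^2 = sigma_i^2 + s^2.
    Inputs are NumPy arrays in m/s; a coarse grid search is plenty here."""
    grid = np.linspace(0.0, max_jitter, 2001)
    chi2 = np.array([
        np.mean((residuals / np.sqrt(formal_errors**2 + s**2)) ** 2)
        for s in grid
    ])
    return grid[np.argmin(np.abs(chi2 - 1.0))]
```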

A Step-by-Step Workshop Plan You Can Run in One Day

Phase 1: Orientation and scientific context

Start with a short talk on the RV method, common sources of noise, and the scientific question you are testing. Show one published example of a planet detection and one case where stellar activity complicated the result. Students should see early that RV analysis is both elegant and messy, because that mental model helps them interpret the outputs later. If you need a quick media-aware framing for a class or public-facing version of this workshop, the ideas in using live NASA and astronaut clips can help you add visual context without oversimplifying the science.

Phase 2: Run the pipelines

Assign students to small groups and give each group the same dataset. Have one half run Pipeline A and the other half run Pipeline B, then swap results so every group sees both. Ask them to record the number of spectra ingested, rejected, corrected, and transformed into velocities. This is the point where small implementation differences become visible, and students begin to appreciate how “same data” does not always mean “same analysis.”

Phase 3: Compare and explain

Students should produce a comparison plot, a summary table, and a brief interpretation paragraph. The most useful comparisons usually include the velocity time series, residuals, phase-folded orbit, and uncertainty distributions. Encourage students to identify where the pipelines agree and where they diverge, then rank the likely causes from most to least plausible. This stage can be framed like a detective exercise, which keeps the room engaged while also building genuine analytical discipline.
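
Phase folding is the plot students most often get subtly wrong, so a small sketch helps. The data below are synthetic and purely illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

def phase_fold(time, rv, period, t0=0.0):
    """Fold a velocity time series on a trial period (same time units)."""
    phase = ((time - t0) / period) % 1.0
    order = np.argsort(phase)
    return phase[order], rv[order]

# Synthetic demonstration: a 7.3-day signal sampled at random epochs.
rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0, 200, 45))
rv = 12.0 * np.sin(2 * np.pi * t / 7.3) + rng.normal(0, 2.0, t.size)

phase, folded = phase_fold(t, rv, period=7.3)
plt.scatter(phase, folded, s=12)
plt.xlabel("Orbital phase")
plt.ylabel("RV (m/s)")
plt.title("Phase-folded velocities (synthetic)")
plt.show()
```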

Phase 4: Re-run with one changed assumption

To drive home the reproducibility lesson, change exactly one assumption: for example, alter the mask, add one more rejection criterion, or switch the weighting scheme. Students then rerun the analysis and report what changed. This makes the abstract idea of sensitivity concrete. It is often the moment when they realize that pipeline outputs are not facts handed down by software, but outcomes produced by a chain of choices.

Best Practices for Student Teams and Instructors

Use roles to reduce confusion

A simple role structure works well: one student manages data and file organization, one runs the pipeline, one checks logs, and one records results. Rotating roles halfway through the workshop helps everyone gain broader experience while reducing the chance that a single person becomes the bottleneck. This also mirrors the collaborative nature of real observing teams, where data handling, analysis, and interpretation are often split across people. For broader workflow inspiration, our guide on strong onboarding practices in hybrid teams offers a useful model for keeping distributed collaborators aligned.

Teach students to trust logs, not memory

Students should be trained to read pipeline logs carefully. Many reproducibility issues are obvious there: a missing calibration file, a version mismatch, a file path error, or an unexpected fallback setting. A good instructor habit is to pause whenever an error appears and have the group read the log line by line before touching the code. That habit pays off long after the workshop, because it helps students debug systematically rather than emotionally.

Use a shared interpretation rubric

Not every difference between pipelines deserves a new hypothesis about the star. Provide a rubric that asks: Is the difference within uncertainties? Does it persist after rerunning? Does it correlate with a known systematic? Is the effect astrophysical, instrumental, or procedural? A rubric prevents the workshop from drifting into guesswork and gives students a repeatable framework they can use later in research projects.

Common Pitfalls and How to Avoid Them

Confusing precision with accuracy

A pipeline can produce very tight error bars and still be wrong in a systematic way. This is one of the most important conceptual lessons in the workshop. Students should learn that low scatter does not automatically imply a valid planet detection if the model does not capture the noise structure. The remedy is not to chase prettier plots, but to understand the assumptions beneath them.

Assuming one pipeline is the “gold standard”

Students sometimes assume that the most automated or most widely used pipeline must be correct. In reality, pipelines are tools, and their usefulness depends on the instrument, target star, and scientific question. A pipeline that works brilliantly for one spectrograph may be mediocre for another. This is why comparison workshops are so valuable: they teach context-dependent judgment, not blind software loyalty.

Ignoring astrophysical noise

Stellar activity can create velocity signals that resemble planets, especially at low amplitudes. Spots, plages, rotation, and magnetic cycles all complicate the interpretation of RV data. Students should be encouraged to inspect activity indicators when available and to ask whether a candidate period matches stellar rotation or activity timescales. That habit is foundational to honest exoplanet detection and helps prevent overconfident claims.
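
One way to operationalize that check, assuming astropy is available: compare the top Lomb-Scargle period of the velocities with that of an activity indicator. Names like `s_index` are placeholders for your own time series:

```python
import numpy as np
from astropy.timeseries import LombScargle

def top_period(time, values, min_period=1.0, max_period=100.0):
    """Highest-power Lomb-Scargle period, in the same units as `time`."""
    frequency, power = LombScargle(time, values).autopower(
        minimum_frequency=1.0 / max_period,
        maximum_frequency=1.0 / min_period,
    )
    return 1.0 / frequency[np.argmax(power)]

# If the candidate RV period lands near the activity-indicator period or
# the stellar rotation period, treat the "planet" with suspicion:
# rv_period = top_period(t, rv)
# act_period = top_period(t, s_index)  # s_index: your activity time series
```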

Assessment, Extensions, and Classroom Variations

How to assess student learning

A good assessment asks students to explain one pipeline difference, quantify one uncertainty issue, and propose one improvement to the workflow. You can grade their plots, their method notes, and their interpretation in a short memo. The strongest responses will not just report numbers; they will show that the student understands where the numbers came from and what could make them change. That combination of explanation and critique is a hallmark of genuine research readiness.

Ways to extend the workshop

Advanced groups can add transit photometry, astrometric context, or stellar activity diagnostics. Another strong extension is to ask students to contribute a pull request to a shared code repository or to write a short reproducibility checklist for future users. If you want to connect the workshop to public events and outreach, consider pairing it with a local observing night or a planetarium program. For inspiration on making astronomy engaging beyond the classroom, see our practical guide to planning an eclipse experience, which shows how structured observing turns curiosity into memory.

How to adapt for teachers and clubs

For high school teachers or astronomy clubs, simplify the pipelines by pre-configuring environments and using smaller datasets, but keep the comparison component intact. The whole point is not to overwhelm learners with software complexity, but to help them see that scientific results emerge from decisions, not magic. If you need materials for student-facing outreach or event promotion, our guide to clear narrative sequencing offers ideas for keeping technical content understandable and compelling.

What a Strong Workshop Report Should Include

Methods

Students should describe the dataset, the two pipelines, the software versions, the parameter settings, and any manual interventions. The methods section should be detailed enough that another group could repeat the work. It is worth reminding students that “standard settings” is not a method description. Specificity is what makes the report usable and trustworthy.

Results

The results section should present the velocity curves, the residuals, the best-fit orbital parameters, and a concise comparison of uncertainties and fit quality. Students should report both agreement and disagreement, even if one pipeline clearly outperforms the other. If they can, they should include a short paragraph on the likely cause of any discrepancy. That makes the report feel like a mini-paper rather than a lab worksheet.

Discussion

The discussion should interpret the differences in scientific terms. Was one pipeline more sensitive to outliers? Did another better handle order weighting or telluric masking? Did either result suggest that the signal might be partly due to stellar activity? The best discussions connect technical choices to astrophysical consequences, which is exactly the skill students need when they move from class exercises to real research.

FAQ

What is the main educational purpose of comparing RV pipelines?

The main purpose is to teach students that exoplanet detection is not just about obtaining a velocity curve. It is about understanding how software choices, calibration, and uncertainty modeling shape the scientific result. By comparing pipelines on the same dataset, students see that reproducibility is a scientific skill, not an administrative burden.

Do students need advanced coding experience to participate?

No. They need enough technical comfort to run scripts, edit configuration files, and make basic plots, but the workshop can be designed so that the hardest setup work is preconfigured by the instructor. In many cases, the learning comes from interpretation rather than coding complexity. A mixed-experience group can work very well if roles are assigned clearly.

Why not just teach one pipeline instead of comparing several?

Teaching one pipeline is useful, but it can hide the fact that results depend on methodological choices. Comparing pipelines reveals how robust a signal really is. That makes students better scientists because they learn to ask whether a result survives reasonable alternative approaches.

What should students do if the pipelines disagree strongly?

They should not panic or average the answers blindly. Instead, they should compare logs, check preprocessing assumptions, inspect residuals, and ask which difference is most likely to explain the divergence. Strong disagreement is often the most valuable part of the lesson because it forces careful error analysis.

Can this workshop be adapted for classrooms without access to local telescope data?

Yes. Public benchmark datasets, published RV archives, and pre-packaged teaching files are enough for a strong workshop. The important thing is to preserve the comparison and reproducibility components. Students can learn nearly all of the core concepts without collecting the data themselves.

How does this workshop connect to open science?

Open science means sharing data, code, settings, and interpretation in a way that other people can inspect and reuse. This workshop is a direct model of that practice because students must document their workflow, explain assumptions, and make the analysis reproducible. The open approach is also more educational because it turns the entire pipeline into a transparent learning object.

Conclusion: The Real Lesson Is Not Just the Planet, but the Process

Radial velocity remains one of the most elegant techniques in astronomy because it turns tiny motions into evidence for worlds we cannot see directly. But the method’s real educational strength is that it makes the invisible assumptions of science visible. When students compare multiple pipelines on the same dataset, they learn how reproducibility, error analysis, and careful documentation separate a tentative signal from a persuasive result. That is a lesson that applies far beyond exoplanets and into every corner of modern data-driven science.

If you are building a course, a research boot camp, or a student workshop series, the smartest approach is to treat the pipeline itself as part of the curriculum. Students should learn not only how to run it, but how to test it, question it, and explain it. For more context on the broader research environment students are entering, our overview of undergraduate astronomy program trends shows why these skills are increasingly central. And if you want to connect this workshop to outreach, communication, or event planning, related guides like NASA clip-based content ideas and eclipse planning can help you turn technical learning into memorable experiences.

Related Topics

#exoplanets #reproducibility #workshop

Elena Hart

Senior Science Editor

Senior editor and content strategist. Writing about technology, design, and the future of digital media. Follow along for deep dives into the industry's moving parts.
