Reproduce the Butternut Study: A Step‑by‑Step Student GIS + Genomics Lab

Maya Thornton
2026-05-13

A hands-on student lab for building a butternut habitat-suitability map with open genomic data, GIS, R, and restoration ethics.

If you want a classroom project that feels like real conservation science, this is it. In this lab, students pair pattern-finding in data with modern spatial analysis to reproduce the logic behind a butternut restoration study: combine open genomic trait data, climate layers, and GIS tools to build a habitat-suitability map. The goal is not just to make a pretty map. The goal is to answer a scientific question with evidence, interpret uncertainty, and discuss the ethics of restoration decisions in a changing climate.

The butternut story is powerful because it sits at the intersection of ecology, genetics, disease, and climate. According to the Virginia Tech study summarized in the source material, endangered butternut trees and disease-resistant hybrids appear most likely to thrive in portions of the Midwest and Northeast when climate and soil conditions line up with genetic resistance. That makes it a strong case study for data-driven analysis: students can work from open layers, document their workflow, and explain why their map recommends some restoration sites and rejects others.

For educators, this lab also solves a common problem in conservation education: how to teach complicated methods without turning the lesson into a black box. By using accessible tools like ArcGIS Pro, QGIS, and R, students practice reproducible science instead of merely memorizing terms. They also get to wrestle with the real-world issues behind restoration, including species identity, genetic mixing, assisted migration, and whose values count when deciding what should be planted where. If you are building a broader unit on climate ecology, you might also pair this with a lesson on responsible coverage of environmental news so students can compare scientific reporting with scientific methods.

1. What Students Will Learn

Core scientific ideas

This lab introduces students to habitat suitability modeling, a technique that estimates where a species is likely to survive based on environmental conditions. In plain language, it asks: if we know where a species currently survives, what climate and soil patterns do those places share, and where else might those patterns occur? Students also learn the conservation relevance of genomic data, especially how inherited traits such as disease resistance can be linked to restoration planning. That makes the exercise a strong example of connecting content, data, and learner experience in one coherent assignment.

Technical skills

Students practice downloading open data, cleaning tabular records, joining attributes to spatial layers, and creating raster-based suitability models. Depending on the class level, they can do the spatial work in ArcGIS Pro or QGIS and the data wrangling and visualization in R. The lab builds literacy in coordinate systems, layer symbology, classification, and spatial overlays. It also teaches students that good analysis is as much about documentation and assumptions as it is about software buttons.

Conservation thinking

This is not just a GIS lab. It is a conservation decision-making lab. Students must ask whether a mapped “best site” is actually appropriate for planting, especially when the target species is endangered and the source material may include hybrids or disease-resistant lines. That conversation makes this lesson feel much closer to real restoration planning than a typical classroom exercise. If you want a related example of how data-driven decisions shape practical outcomes, see how structured data reduces costly errors in inventory systems.

2. Background: Why Butternut, and Why Now?

The ecological problem

Butternut (Juglans cinerea), a native North American tree in the walnut family, has been devastated by butternut canker, an invasive fungal disease that spread widely over the last century. As the source material explains, the species is now endangered and has nearly disappeared from many forests. That matters because butternut is a mast tree, producing nuts that feed wildlife such as turkeys, deer, and bears, so its decline affects both forest composition and food webs. In ecological terms, this is a classic case of a keystone-like resource being lost from the landscape.

The genomic angle

The exciting part of the study is that some butternut individuals appear naturally more resistant than others. In a conservation context, that means genetics is not just about ancestry or taxonomy; it can help identify which trees have traits worth protecting and propagating. Students can use open trait data, if available from public repositories or supplementary materials, to explore how phenotype and environment intersect. This is where the lab becomes a meaningful introduction to reproducible workflows: every step should be written down, versioned, and repeatable by another student.

Why habitat modeling matters

The Virginia Tech researchers combined climate, soil, and genetic data to identify where resistant butternut trees and hybrids are most likely to thrive. That kind of model is valuable because restoration isn’t simply about planting anywhere a species once lived. Climate conditions are shifting, disease pressure changes the equation, and some habitats may no longer support the original genetic stock. For a broader lesson on how location shapes outcomes, compare this with geographic risk analysis in other fields: the same logic applies when environmental suitability depends on spatial patterns.

3. Data Package: What Students Need

Minimum data layers

At a minimum, students need four kinds of data: occurrence points for butternut or resistant trees, climate variables, soil variables, and a boundary layer for the study region. If genomic trait data is available, it can be attached to the occurrence table as an attribute such as resistant, susceptible, or hybrid. A practical classroom setup is to provide a cleaned CSV of occurrences, plus raster climate layers and a soil raster or polygon layer. Students then focus on analysis, not on spending most of the period hunting for files.

Good open-data sources

Depending on your classroom access, useful sources may include public biodiversity records, climate normals, soil datasets, and supplementary tables from published studies. Teachers should model source evaluation: what is the provenance of the data, who collected it, and what are the limitations? This is an ideal moment to discuss why open data is powerful but not automatically complete or unbiased. If your students have worked with event planning or regional travel data before, they will recognize the importance of data quality and scale, much like in forecast-error planning.

Suggested classroom data dictionary

Create a short data dictionary before analysis begins. For example, columns might include sample_id, latitude, longitude, genotype_class, disease_resistance_score, source, and year_collected. Climate layers can be labeled by variable name, such as annual_mean_temp, precipitation_seasonality, or temperature_range. A simple table like this helps students understand that GIS is not magic; it is structured information arranged in spatial form.

| Data Component | Example Source | Why It Matters | Classroom Use |
| --- | --- | --- | --- |
| Occurrence points | Public records or study supplement | Shows where trees were found | Map sample locations |
| Genotype or resistance class | Study trait table | Links biology to conservation value | Symbolize resistant vs susceptible |
| Climate rasters | WorldClim or similar datasets | Defines environmental suitability | Build habitat model |
| Soil layer | Public soil survey data | Captures edaphic constraints | Filter unsuitable areas |
| Region boundary | State or ecoregion shapefile | Sets analysis extent | Clip layers consistently |
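The data dictionary can also double as an automated check on incoming records. The sketch below is one way to do that in plain Python (the same logic carries over to R). The column types are assumptions based on the example column names above, and the two sample rows are made up for illustration.

```python
# Hypothetical data dictionary: column name -> type the value should parse as.
DATA_DICTIONARY = {
    "sample_id": str,
    "latitude": float,
    "longitude": float,
    "genotype_class": str,
    "disease_resistance_score": float,
    "source": str,
    "year_collected": int,
}

def check_row(row):
    """Return a list of problems found in one record (an empty list means clean)."""
    problems = []
    for column, expected_type in DATA_DICTIONARY.items():
        value = row.get(column, "")
        if value is None or str(value).strip() == "":
            problems.append(f"{column}: missing")
            continue
        try:
            expected_type(value)  # can the text be read as the expected type?
        except ValueError:
            problems.append(
                f"{column}: expected {expected_type.__name__}, got {value!r}"
            )
    return problems

# Made-up records, as they would arrive from a CSV (all values are text).
good_row = {
    "sample_id": "BN001", "latitude": "41.2", "longitude": "-85.6",
    "genotype_class": "resistant", "disease_resistance_score": "0.8",
    "source": "study_supplement", "year_collected": "2019",
}
bad_row = dict(good_row, latitude="", year_collected="twenty")

print(check_row(good_row))
print(check_row(bad_row))
```

Running the check before any mapping gives students a concrete list of records to fix or exclude, and that list belongs in their notes.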

4. Software Setup: ArcGIS Pro, QGIS, and R

Choose one primary GIS platform

You can teach this lab in ArcGIS Pro or QGIS. ArcGIS Pro is especially helpful if your school already has a license, because it offers strong raster tools and a polished interface for students new to GIS. QGIS is excellent for open-access classrooms and has a large ecosystem of plugins and tutorials. Either way, the lesson works best if you commit to one platform as the main workspace and use R for repeatable preprocessing and plots.

Use R for reproducibility

R is the backbone of the reproducible-science part of the lab. Students can import the trait table, filter duplicates, summarize genotype classes, and create maps or charts that document the inputs to the GIS work. Packages like sf, terra, and ggplot2 are particularly useful (terra is the successor to the older raster package, which many tutorials still use), though you can scale the difficulty up or down depending on the class. To understand why keeping computation local and transparent matters, see why local processing beats cloud-only systems for reliability.

Suggested installation checklist

Before class, verify that each student computer can open the sample data, display the base map, load rasters, and export a final image or PDF. Have a shared folder with all datasets already organized, and give students a script template so they do not begin from a blank file. This reduces troubleshooting and keeps the lesson focused on science rather than software friction. A well-prepared lab resembles a professional workflow more than a scavenger hunt, which is why thoughtful design matters in technical instruction just as it does in scaling K-12 learning support.

5. Step-by-Step Workflow for the Lab

Step 1: Clean the trait table in R

Start by importing the occurrence and trait data into R. Check for missing coordinates, inconsistent genotype labels, and duplicate samples. Then create a simplified class variable with values such as resistant, susceptible, and hybrid so the spatial analysis remains interpretable for beginners. Students should save the cleaned output as a new CSV and note every transformation they made.
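The cleaning logic in this step can be sketched in a few lines. The version below uses Python's standard library rather than R so it runs anywhere; the records, the label mapping, and the choice to keep the first copy of a duplicate are all illustrative assumptions, not the study's actual data or rules.

```python
import csv
import io

# Made-up records standing in for the class occurrence table.
RAW_CSV = """sample_id,latitude,longitude,genotype_class
BN001,41.20,-85.60,Resistant
BN002,,-84.10,susceptible
BN001,41.20,-85.60,Resistant
BN003,43.05,-72.30, HYBRID
BN004,38.90,-86.50,res
"""

# Hypothetical mapping from messy labels to three teaching classes.
LABEL_MAP = {
    "resistant": "resistant", "res": "resistant",
    "susceptible": "susceptible", "sus": "susceptible",
    "hybrid": "hybrid",
}

def clean_records(rows):
    """Return (cleaned rows, log of every record that was dropped and why)."""
    cleaned, log, seen = [], [], set()
    for row in rows:
        sid = row["sample_id"].strip()
        if not row["latitude"].strip() or not row["longitude"].strip():
            log.append(f"{sid}: dropped, missing coordinates")
            continue
        if sid in seen:
            log.append(f"{sid}: dropped, duplicate sample_id")
            continue
        label = LABEL_MAP.get(row["genotype_class"].strip().lower())
        if label is None:
            log.append(f"{sid}: dropped, unrecognized genotype label")
            continue
        seen.add(sid)
        cleaned.append({
            "sample_id": sid,
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
            "genotype_class": label,
        })
    return cleaned, log

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned, log = clean_records(rows)
for entry in log:
    print(entry)
```

The log is the point of the exercise: each dropped record has a stated reason, so another student can repeat the cleaning and get the same table. Writing `cleaned` back out with `csv.DictWriter` produces the new CSV for the GIS step.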

Step 2: Map sample points

Open the cleaned table in your GIS software and display the points over a basemap and regional boundary. Ask students what patterns they notice before doing any modeling. Are resistant trees clustered in one region? Do hybrids appear in areas with different climate conditions? This early observation stage is vital because it connects statistical thinking with visual reasoning, a skill that also matters in audience analytics and other data-rich fields.

Step 3: Prepare climate and soil layers

Reproject all rasters to the same coordinate reference system, then clip them to the same extent and, where cell sizes differ, resample them to a common grid. Reclassify or standardize variables if necessary so they can be compared on a common scale. In a beginner version of the lab, students can use just three to five variables to avoid overwhelming complexity. In an advanced version, they can reduce multicollinearity by screening variables for low pairwise correlation in R before loading them into the GIS.
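Two of the ideas in this step, putting variables on a common scale and screening out highly correlated ones, come down to simple arithmetic. The sketch below shows both on made-up values sampled at five locations. The 0.7 correlation cutoff is a common rule of thumb used here as an assumption, not a value from the source study.

```python
import math

def rescale(values):
    """Min-max rescale a list of numbers to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen(variables, threshold=0.7):
    """Keep variables in order, skipping any that correlate too
    strongly (in absolute value) with one that was already kept."""
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up values extracted at five sample locations.
variables = {
    "annual_mean_temp": [8.0, 9.5, 11.0, 12.5, 14.0],
    "temperature_range": [30.0, 28.0, 27.0, 25.0, 22.0],
    "precipitation_seasonality": [12.0, 30.0, 18.0, 25.0, 14.0],
}

kept = screen(variables)
scaled_temp = rescale(variables["annual_mean_temp"])
print(kept)
print(scaled_temp)
```

In this toy data, temperature range tracks mean temperature almost perfectly, so it is dropped. Because the screen keeps whichever variable comes first, the order of variables is itself a modeling decision students should record.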

Step 4: Build the suitability model

There are several ways to do this. A simple classroom approach is a weighted overlay model: assign each variable a score from low suitability to high suitability, then combine the layers. A more advanced approach is to use presence data and environmental background points to build a statistical model, then convert predicted probabilities into a map. Either way, the scientific question is the same: where do climate and soil conditions align with observed resistant trees?
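The weighted overlay option can be illustrated without any GIS software. The sketch below treats each layer as a tiny 2 x 2 grid of scores from 1 (low suitability) to 5 (high), and combines them as a weighted average. The layer names echo the variables mentioned in the source study, but the scores and weights are invented for illustration.

```python
def weighted_overlay(layers, weights):
    """Combine equally sized score grids into one suitability grid.

    Each output cell is the weighted average of the input cells.
    None marks a no-data cell; any no-data input makes the output no-data.
    """
    total = sum(weights.values())
    names = list(layers)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cells = [layers[name][r][c] for name in names]
            if any(v is None for v in cells):
                row.append(None)
            else:
                score = sum(weights[n] * v for n, v in zip(names, cells))
                row.append(score / total)
        result.append(row)
    return result

# Made-up reclassified scores (1 = low suitability, 5 = high).
layers = {
    "temperature": [[1, 3], [5, 4]],
    "precipitation": [[2, 4], [5, None]],
    "soil_carbon": [[1, 5], [3, 2]],
}
# Hypothetical weights; choosing them is a judgment students must justify.
weights = {"temperature": 2, "precipitation": 2, "soil_carbon": 1}

suitability = weighted_overlay(layers, weights)
for row in suitability:
    print(row)
```

This is the same calculation a GIS raster calculator or weighted overlay tool performs cell by cell. Seeing it on four cells makes it clear that the weights, not the software, drive the result.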

Step 5: Validate the output

Students should compare predicted suitable regions against known sample locations, ideally a subset of points set aside before the model was built, and discuss whether the model seems plausible. They should also ask what the model cannot see: seed dispersal, land ownership, forest fragmentation, and changing future climate. This is where the lesson becomes genuinely scientific. A model is not a verdict; it is an argument with uncertainty.
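One simple way to put a number on "does the map agree with known trees" is the share of known locations that fall in cells the model calls suitable. The sketch below assumes a north-up grid in geographic coordinates and a suitability threshold of 0.6. The grid values, points, and threshold are all made up for illustration.

```python
def cell_index(lon, lat, x_min, y_max, cell_size):
    """Convert a coordinate to (row, col) in a north-up grid whose
    top-left corner is at (x_min, y_max)."""
    col = int((lon - x_min) // cell_size)
    row = int((y_max - lat) // cell_size)
    return row, col

def sensitivity(grid, points, x_min, y_max, cell_size, threshold):
    """Share of points inside the grid that fall in cells scored at or
    above the threshold. Returns None if no point lies inside the grid."""
    hits = 0
    checked = 0
    for lon, lat in points:
        row, col = cell_index(lon, lat, x_min, y_max, cell_size)
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            continue  # point lies outside the study region
        checked += 1
        if grid[row][col] >= threshold:
            hits += 1
    return hits / checked if checked else None

# Made-up 2 x 2 suitability grid covering lon -90..-82, lat 38..46.
grid = [
    [0.2, 0.7],
    [0.9, 0.4],
]
# Made-up known locations as (longitude, latitude).
held_out = [(-88.0, 40.0), (-84.0, 44.0), (-85.0, 39.0), (-89.0, 45.0)]

score = sensitivity(grid, held_out, x_min=-90.0, y_max=46.0,
                    cell_size=4.0, threshold=0.6)
print(score)
```

A high value alone does not show the model is good, since a map that calls everywhere suitable would score perfectly. The number is most useful as a prompt for discussion alongside the amount of area predicted suitable.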

Pro Tip: Have students save a versioned project file at each stage—raw data, cleaned data, prepared layers, final map—so they can retrace every decision and defend it during discussion.

6. Reproducibility: Turning a Lab into a Scientific Workflow

Document every assumption

Students often think reproducibility means “I can open my file later.” In science, it means another person can reproduce the result using your notes, data, and code. That requires documenting projections, variable names, classification rules, and any manual edits. Without that trail, even a visually impressive map has limited scientific value.
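A lightweight way to build that trail is a small, machine-readable record saved next to each map. The sketch below is one possible layout; the field names and example values are assumptions chosen for this lab, not a standard.

```python
import json

# Hypothetical set of fields every run record should state.
REQUIRED_FIELDS = [
    "crs", "variables", "classification_rules", "manual_edits", "data_sources",
]

# Example record with made-up values.
run_record = {
    "crs": "EPSG:4326",
    "variables": ["annual_mean_temp", "precipitation_seasonality", "soil_carbon"],
    "classification_rules": {"suitable_if_score_at_least": 0.6},
    "manual_edits": ["removed two records with missing coordinates"],
    "data_sources": ["class occurrence CSV", "climate rasters", "soil layer"],
}

def missing_fields(record):
    """List required fields that the record does not state at all.
    An empty list (for example, no manual edits) still counts as stated."""
    return [field for field in REQUIRED_FIELDS if field not in record]

# Serialize for saving alongside the map export.
record_text = json.dumps(run_record, indent=2)
print(record_text)
print(missing_fields({"crs": "EPSG:4326"}))
```

Requiring the `manual_edits` field even when it is empty is deliberate: "no manual edits" is a claim another student should be able to check.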

Use notebooks and folder structure

Encourage students to use a project folder with separate directories for raw data, processed data, scripts, maps, and exports. In R, a Quarto or R Markdown notebook is ideal because code and narrative live together. This habit makes the lesson easier to grade and easier to revisit later. It also teaches a workflow mindset similar to structured planning in crawl governance and documentation: clarity up front prevents confusion later.
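The folder layout can be created by a short script so every student starts from the same structure. The folder names below are one possible convention, not a requirement, and the example builds the project inside a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names matching the categories described above.
SUBFOLDERS = ["data_raw", "data_processed", "scripts", "maps", "exports"]

def create_project(root):
    """Create the project skeleton and return the folder names found."""
    root = Path(root)
    for name in SUBFOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():
        readme.write_text(
            "Butternut lab project\n"
            "data_raw: original downloads, never edited by hand\n"
            "data_processed: outputs of cleaning scripts\n"
        )
    return sorted(p.name for p in root.iterdir() if p.is_dir())

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    created = create_project(Path(tmp) / "butternut_lab")

print(created)
```

Keeping raw downloads in their own folder and never editing them by hand is the habit that matters most, because it means every processed file can be regenerated from a script.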

Teach version control lightly

Even if your students are not ready for full Git workflows, you can introduce the concept of versioning by naming files clearly and saving milestones. For older students, a shared repository or class drive can reinforce collaboration norms. The key is to show that scientific work is iterative, not linear. Students should expect to revise their map after seeing early results, just as professional researchers refine models after checking outputs.

7. Interpreting the Habitat-Suitability Map

Read the map as a hypothesis, not a fact

Once the map is finished, the most important step is interpretation. Where are the highest-suitability areas, and do those zones make ecological sense? Are they concentrated in places such as southern Indiana, western Kentucky, western Michigan, or New England, as reported in the source study, or does the classroom model differ because of simplified variables? Students should be encouraged to explain discrepancies rather than assume the model is wrong.

Discuss scale and resolution

A map’s resolution changes what it can detect. A coarse climate raster may miss local microclimates, while a fine-scale map may imply more precision than the data support. Teachers can ask: if a site looks suitable at the regional scale, what additional field checks would we need before planting? This kind of discussion builds the habit of moving from map to management carefully.
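A quick calculation makes the resolution question concrete. Climate rasters such as WorldClim are distributed at resolutions given in arc-seconds or arc-minutes, and the ground size of a cell depends on latitude. The sketch below uses a spherical-Earth approximation of about 111.32 km per degree, which is adequate for classroom discussion but not for surveying.

```python
import math

# Approximate length of one degree of arc on a spherical Earth, in km.
KM_PER_DEGREE = 111.32

def cell_size_km(arc_seconds, latitude_deg):
    """Approximate (width, height) in km of a raster cell at a given latitude.

    Height is roughly constant. Width shrinks with the cosine of latitude
    because meridians converge toward the poles.
    """
    degrees = arc_seconds / 3600
    height = degrees * KM_PER_DEGREE
    width = height * math.cos(math.radians(latitude_deg))
    return width, height

# A 30 arc-second cell and a 10 arc-minute cell, both at 40 degrees north.
fine_w, fine_h = cell_size_km(30, 40)
coarse_w, coarse_h = cell_size_km(600, 40)

print(f"30 arc-seconds: {fine_w:.2f} km wide, {fine_h:.2f} km tall")
print(f"10 arc-minutes: {coarse_w:.1f} km wide, {coarse_h:.1f} km tall")
```

At 40 degrees north, a 30 arc-second cell is roughly 0.71 km wide and 0.93 km tall, while a 10 arc-minute cell spans more than 14 km in each direction. A single coarse cell can therefore contain ridges, valleys, and stream corridors with very different conditions for a seedling.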

Relate suitability to restoration strategy

Restoration should not be framed as “plant the highest-suitability pixels and call it a day.” A forest manager also needs to consider genetic diversity, access, long-term maintenance, land ownership, and disease risk. Students should compare their map to the source study’s explanation that restoration should target combinations of temperature, precipitation, and soil carbon that support resistant butternuts. In other words, the map is a decision aid, not a decision maker.

8. Restoration Ethics: The Hardest and Most Important Conversation

What counts as a native restored tree?

Butternut restoration raises a classic ethics question: if a native species is nearly gone and resistant hybrids may help it persist, how much genetic mixing is acceptable? Some people prioritize preserving the original genome as much as possible, while others argue that keeping the species functionally present in forests is the more urgent goal. Students should learn that restoration is never purely technical; it is also philosophical and political. If you want a useful analogy for choice under constraints, see when to refresh versus rebuild in brand strategy, where identity and practicality must be balanced.

Who benefits, and who decides?

Restoration choices affect wildlife, landowners, nurseries, researchers, and future generations. A planting plan that works on paper may fail if it ignores community priorities or local stewardship capacity. Students can role-play as forest managers, Indigenous land stewards, nursery operators, and conservation biologists to see how different stakeholders weigh risks and benefits. This is one of the best ways to teach ethical reasoning because it turns an abstract debate into a lived decision process.

How do we talk about assisted migration?

Climate change is pushing many species outside the conditions they historically occupied. That leads to difficult questions about assisted migration, the deliberate movement of a species by people to locations expected to become suitable. Is moving butternut seedlings a necessary intervention or an overreach? There is no one correct answer, but students should leave the lab understanding that conservation in the Anthropocene often means choosing among imperfect options rather than preserving a static past. For another angle on practical decision-making under uncertainty, read how historical forecast errors improve contingency planning.

9. Differentiation: How to Adapt the Lab for Different Grade Levels

Middle school or introductory biology

For younger students, simplify the lab to a guided map-reading exercise. Provide pre-made layers, a partially completed trait table, and a very small number of climate variables. Their job is to identify patterns and write a short claim-evidence-reasoning response about which region seems most suitable and why. They can still discuss ethics in plain language, especially the tradeoff between saving a species and keeping it genetically pure.

High school and dual enrollment

At this level, students can manage the full workflow with more independence. They can clean the data in R, create the model in QGIS or ArcGIS Pro, and write a short methods section explaining their choices. Teachers can assess both the final map and the reasoning behind it. This version is ideal for a capstone unit because it blends ecology, data science, and communication.

Undergraduate or advanced learners

For more advanced classes, ask students to compare two models: one using climate only and one using climate plus soil or genotype data. They can evaluate whether adding genetic information improves predictions and discuss the risks of overfitting. If you want to broaden the conversation about how data systems scale responsibly, the article on total cost of ownership for field deployments is a useful conceptual parallel for thinking about resource tradeoffs in research infrastructure.
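One common way to compare two such models is the area under the ROC curve (AUC), computed on points that were held out of model fitting. It can be read as the probability that a randomly chosen presence site receives a higher score than a randomly chosen background site. The sketch below computes it directly from that definition. The scores are made up, and AUC is offered as one reasonable metric, not as the measure used in the source study.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence site outscores a random
    background site, counting ties as half a win."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Made-up model scores at held-out presence and background locations.
climate_only = {
    "presence": [0.6, 0.7, 0.4, 0.8],
    "background": [0.5, 0.3, 0.7, 0.2],
}
climate_plus_soil = {
    "presence": [0.7, 0.8, 0.6, 0.9],
    "background": [0.4, 0.3, 0.65, 0.2],
}

auc_a = auc(climate_only["presence"], climate_only["background"])
auc_b = auc(climate_plus_soil["presence"], climate_plus_soil["background"])
print(auc_a, auc_b)
```

An AUC of 0.5 means the model ranks sites no better than chance. If a model scores far better on its training points than on held-out points, that gap is a practical sign of overfitting.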

10. Assessment, Discussion, and Extension Ideas

What to grade

A strong assessment rubric should include scientific accuracy, data workflow quality, map design, interpretation, and ethical reasoning. Did the student use appropriate layers? Did they explain their classification scheme? Did they identify at least one limitation of the model? Did they make a thoughtful argument about restoration ethics? These criteria reward both technical skill and scientific maturity.

Extension projects

Students can extend the project by comparing butternut with another threatened tree species, testing future climate scenarios, or exploring how habitat suitability changes under different weighting schemes. They could also create a short classroom presentation or infographic summarizing their results for a nontechnical audience. For students interested in public communication, compare this with how responsible environmental storytelling turns complex research into something the public can use.

Community science connection

If local or regional restoration groups exist, invite students to think about how their map might be used outside the classroom. What extra information would practitioners need before acting on it? This ties the lesson to real conservation work and makes the map feel consequential. Students begin to see that science education is not just about “getting the right answer,” but about building tools people can use wisely.

11. Common Pitfalls and How to Avoid Them

Confusing correlation with causation

Students may assume that because resistant trees occur in a certain climate, climate alone caused resistance. Remind them that genomic traits, soil conditions, and local history can all influence where trees survive. A habitat suitability model is a pattern-finding tool, not proof of mechanism. That distinction is one of the most important lessons in any research-based class.

Overstating precision

Maps can look authoritative even when the underlying data are thin. If the sample size is small or spatially biased, the map may exaggerate confidence. Teachers should require a limitations section in every student report. It is better to say “the model suggests” than “the model proves.”
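Spatial bias in the samples can be checked before any modeling by counting how many points fall in each cell of a coarse grid. The sketch below counts points per one-degree cell and shows a simple thinning rule that keeps one point per cell. The coordinates are made up, and the one-degree cell size is an arbitrary choice for illustration.

```python
import math
from collections import Counter

def cell_key(lon, lat, cell_deg):
    """Identify the grid cell containing a point."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))

def grid_counts(points, cell_deg=1.0):
    """Count how many points fall in each grid cell."""
    return Counter(cell_key(lon, lat, cell_deg) for lon, lat in points)

def thin(points, cell_deg=1.0):
    """Keep only the first point encountered in each grid cell."""
    seen, kept = set(), []
    for lon, lat in points:
        key = cell_key(lon, lat, cell_deg)
        if key not in seen:
            seen.add(key)
            kept.append((lon, lat))
    return kept

# Made-up sample locations as (longitude, latitude).
points = [
    (-85.6, 41.2), (-85.4, 41.7), (-85.1, 41.9),
    (-72.3, 43.05), (-86.5, 38.9),
]

counts = grid_counts(points)
largest_share = max(counts.values()) / len(points)
thinned = thin(points)
print(counts)
print(largest_share)
print(thinned)
```

Here three of the five points sit in a single cell, so that one neighborhood would dominate any model fitted to the raw data. Whether to thin, and at what cell size, is a choice students should state in their limitations section.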

Ignoring ethics until the end

Do not save ethics for a five-minute wrap-up. Introduce the ethical dilemma early, so students understand that each technical choice has a conservation consequence. When students know from the start that restoration decisions involve values, they pay closer attention to what the map is really doing. This makes the lesson more honest, more memorable, and more scientifically realistic.

12. Wrap-Up: Why This Lab Matters

It turns students into investigators

This lab gives students a full research arc: question, data, workflow, map, interpretation, and ethical reflection. That is a rare and valuable experience in secondary and undergraduate education. Instead of treating GIS as a cartography exercise, students learn it as a way of thinking about evidence in space. They also gain confidence working with genomics and environmental data at the same time.

It shows how open data can serve conservation

Open data is more than a buzzword here. It allows students to engage with real scientific materials, reproduce a contemporary study’s logic, and see how public research can inform restoration decisions. This kind of access is one reason reproducible science matters so much in education. When students can inspect the data, they can understand the science.

It makes the ethics visible

Perhaps most importantly, the butternut case teaches that conservation is not only about saving a species in the abstract. It is about deciding which trees to protect, which seedlings to plant, and how to balance genetic integrity with ecological resilience. Students leave with technical skills, but they also leave with a better sense of how science serves society. That combination is exactly what a modern conservation education lesson should do.

Pro Tip: End the lab by asking students to write a one-paragraph “management memo” to a forest steward, explaining where they would plant butternut, what uncertainties remain, and what ethical tradeoffs they considered.

Quick Comparison: ArcGIS Pro, QGIS, and R in This Lab

| Tool | Best For | Strength in the Lab | Limitation |
| --- | --- | --- | --- |
| ArcGIS Pro | Polished classroom GIS workflows | Strong raster tools and intuitive interface | License availability may be limited |
| QGIS | Open-access instruction | Free, flexible, and widely supported | Some advanced tools require setup |
| R | Data cleaning and reproducibility | Transparent scripts and repeatable steps | Can feel intimidating to beginners |
| ArcGIS Pro + R | Mixed-method instruction | Combines visual mapping with scripted analysis | Requires switching between environments |
| QGIS + R | Open science classrooms | Low-cost, highly reproducible workflow | Students may need extra guidance |

FAQ

Do students need advanced programming experience to complete this lab?

No. Beginners can use pre-cleaned data and follow guided steps in ArcGIS Pro or QGIS. R is helpful for reproducibility, but it can be scaffolded with templates or partially completed scripts. The lab is designed so teachers can adjust the technical depth to fit the course level.

Where can I get open genomic trait data for the butternut study?

Start with supplementary materials from the published paper, public repository records, or author-shared datasets if available. If full genomic data is not accessible, a trait table with resistance classes still supports the core lesson. The important thing is to be transparent about data provenance and limitations.

What if my class only has QGIS and no ArcGIS Pro?

That is completely fine. QGIS can handle the spatial workflow effectively, and students can still build a strong habitat suitability map. The reproducibility lesson is actually stronger in open-source environments because the tools are accessible to anyone with the files and scripts.

How do I keep students from overinterpreting the map?

Require a limitations section, a confidence statement, and at least one alternative explanation for the result. You can also ask students to identify where ground-truth field surveys would be needed before action. This keeps the lesson grounded in scientific caution rather than visual persuasion.

Why include an ethics discussion in a GIS lab?

Because restoration is never value-neutral. Deciding where and how to plant endangered trees involves tradeoffs about genetics, ecosystems, and human responsibility. The ethics discussion helps students understand that scientific tools support decisions, but do not replace judgment.

Can this lab be adapted for other species?

Yes. The same framework works for many conservation case studies, especially those involving climate suitability, disease resistance, or hybridization. Students can compare species or regions and learn that the workflow matters more than the specific organism.


Maya Thornton

Senior Education Editor
