Scheduling Telescopes with LLMs: Lessons from Siri’s Next-Gen Architecture
A practical 2026 roadmap to use LLMs for telescope scheduling: from context plumbing to safe execution and prioritization strategies.
Hook: Why telescope schedulers need an AI rethink — and fast
Telescopes, from university domes to global robotic networks, face a familiar set of pains: overflowing target lists, weather disruptions, complex instrument constraints, and the need to deliver priority science on tight windows. Observers and operations teams waste hours translating natural-language requests into rigid observation blocks or manually reshuffling queues when a Target of Opportunity (ToO) fires. In 2026, with Rubin/LSST alerts, higher multi-messenger cadence, and more robotics in the field, these frictions are worse — and they scale badly.
Apple's decision to use Google's Gemini models as the core for its next-gen Siri shows a clear industry pattern: combine a powerful foundation model with robust context plumbing, tool APIs, and strict safety controls to make an assistant that understands users and systems. This article translates that Siri/Gemini lesson into a practical, step-by-step roadmap for using large language models (LLMs) to manage telescope schedules, prioritize targets, and interface with observers.
The big idea: LLMs as the orchestration layer, not the single oracle
Think of the LLM as an intelligent conductor that
- interprets human requests (“I need a 2-hour spectroscopy of SN2026A within 6 hours”),
- pulls live context (weather, ephemerides, instrument configs),
- applies policy and science priorities, and
- issues verified, auditable commands to the scheduler and telescope APIs.
This mirrors the Siri/Gemini pattern in three ways: contextual grounding (pulling from many data sources), tool use (calling specialized APIs instead of hallucinating solutions), and privacy/safety controls (auditable decisions and human oversight).
2026 trends that make LLM-driven scheduling both timely and necessary
- High alert volumes: Rubin/LSST and upgraded transient networks continue generating millions of candidate alerts — humans can't triage them all.
- Distributed telescopes and federated follow-up: Networks like Las Cumbres, plus more university and citizen networks, require automated coordination.
- Advanced LLM toolkits: By 2026, vendor APIs support function calling, multimodal context, and fine-grained model behavior steering — ideal for system integration.
- Edge/On-prem inference: Cost-sensitive observatories increasingly deploy local LLM inference for low-latency decisioning and data privacy.
Roadmap: From concept to production in six phases
Below is a pragmatic roadmap you can follow. Each phase includes actionable tasks and checkpoints so your team can move from prototype to safe, high-uptime automation.
Phase 1 — Define goals, KPIs, and safety policy (2–4 weeks)
- Stakeholder interviews: Collect use cases (ToO follow-up, scheduled programs, student labs, calibration-only runs).
- KPIs: Target success rate, average response time to alerts, human override frequency, percent of schedule auto-populated.
- Safety policy: Define what the LLM may propose vs. what it may execute. High-value or safety-critical commands require human confirmation.
Phase 2 — Data plumbing and context sources (2–6 weeks)
The LLM works only as well as the context you provide. Build a context layer that exposes:
- Observatory telemetry: dome status, mount health, instrument modes, last calibrations.
- Weather/seeing forecasts: local forecast, cloud sensors, all-sky cameras, turbulence profiles.
- Astronomical data: ephemerides (JPL), visibility windows, airmass and moon separation calculators.
- Queue and policies: program priorities, time allocations, embargo windows, ToO ranks.
- Alert feeds: Rubin/LSST stream, GCN for GW/GRBs, IceCube neutrino alerts.
Expose these as well-documented REST or gRPC endpoints. In 2026, vendors support vector-based retrieval stores; index static docs and recent telemetry for fast retrieval-augmented generation (RAG).
Phase 3 — Build the LLM orchestration layer (4–8 weeks)
Design the LLM to act as an orchestrator, not a low-level controller.
- Prompt and function schema: Use the model's function-calling feature to define structured outputs (e.g., observation_block JSON). Constraining the model to a schema sharply reduces hallucinated free-form output.
- RAG setup: Feed the LLM a curated context: current queue, last 24h telemetry, forecast, and policy. Keep the context window minimal but relevant.
- Tooling layer: Implement adapters to call specific services: scheduling engine, telescope control (INDI/ASCOM/custom API), alert manager.
- Explainability: Ask the model to generate a human-readable rationale for each scheduling decision.
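The function-schema idea can be sketched as a plain JSON Schema handed to the model's function-calling API. Field names below follow the observation_block sample later in this article; the surrounding API wrapper varies by vendor and is omitted:

```python
# Sketch of a tool/function schema for structured scheduling output.
# Only the schema is shown; the function-calling call itself differs
# per vendor.

OBSERVATION_BLOCK_SCHEMA = {
    "name": "propose_observation_block",
    "description": "Propose a validated observation block for the night plan.",
    "parameters": {
        "type": "object",
        "properties": {
            "target_name": {"type": "string"},
            "ra": {"type": "number", "minimum": 0, "exclusiveMaximum": 360},
            "dec": {"type": "number", "minimum": -90, "maximum": 90},
            "start_window": {"type": "string", "format": "date-time"},
            "end_window": {"type": "string", "format": "date-time"},
            "exposure_sequence": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "filter": {"type": "string"},
                        "exptime": {"type": "number", "exclusiveMinimum": 0},
                        "nexp": {"type": "integer", "minimum": 1},
                    },
                    "required": ["filter", "exptime", "nexp"],
                },
            },
            "priority_score": {"type": "number", "minimum": 0, "maximum": 1},
            "justification": {"type": "string", "maxLength": 300},
        },
        "required": ["target_name", "ra", "dec", "start_window",
                     "end_window", "exposure_sequence", "priority_score"],
    },
}
```

Because the schema declares ranges (RA in [0, 360), Dec in [-90, 90]), many malformed proposals are rejected before any scheduler logic runs.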
Phase 4 — Integrate with the scheduler and constraint solver (4–12 weeks)
There are two integration patterns:
- LLM-guided scheduler — LLM proposes prioritized observation blocks; the scheduler (ILP/heuristic engine) converts proposals into optimized nightly plans.
- LLM-controlled scheduler — LLM composes observation plans and issues them directly to the scheduler. Use sparingly and with more controls.
Which to choose? Start with LLM-guided scheduling to keep a strong engineering separation between decision-making and plan optimization.
Phase 5 — Testing, simulation and human-in-loop deployment (4–8 weeks)
- Replay historical nights: Run the system on archived data and compare outcomes to human schedules.
- Chaos testing: Inject weather failures, ToO spikes, and instrument faults to see how the LLM replans under stress.
- Gradual rollout: Start with low-risk tasks (calibrations, student programs), then add science-critical programs when confidence is high.
Phase 6 — Monitoring, feedback loop and continuous learning (ongoing)
Set up metrics and a retraining cadence:
- Track acceptance rate of LLM proposals by operators.
- Log the reasons for human overrides and use them to retrain models or update the policy layer.
- Refresh retrieval indices and model prompts to reflect new instruments or updated policies.
Concrete architecture: components and responsibilities
Implement a layered architecture similar to modern assistant stacks:
- Interface layer: Web UI, chat, or voice for observers to submit requests. Allows structured forms and natural language.
- Intent parser: LLM-powered parser that extracts observation parameters and uncertainty flags.
- Context store & RAG: Vector store for recent telemetry and static policy docs; retrieval returns concise context snippets.
- Decision engine (LLM orchestration): Produces structured observation blocks, priority scores, and rationales using function calls.
- Constraint solver / optimizer: Traditional scheduling algorithm (ILP, greedy heuristic, or RL-based) that turns blocks into an executable plan.
- Telescope control layer: API adapters (INDI/ASCOM/custom) that accept observation blocks in a verified JSON format and execute them.
- Audit and safety: Immutable logs, human approval queues, and dry-run modes with simulated execution.
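To make the decision-engine/optimizer split concrete, a toy greedy packer is enough: it takes LLM-proposed blocks (with a score and a duration) and fills the available night. This is an illustrative stand-in for the ILP or scheduling engine, not a replacement:

```python
# Toy greedy optimizer: pack the highest-scoring proposed blocks into
# the available night. Real deployments would use an ILP or dedicated
# scheduling engine; this only illustrates the separation of concerns.

def pack_night(blocks: list[dict], night_minutes: int) -> list[dict]:
    """Greedily select blocks by descending priority score."""
    plan, used = [], 0
    for block in sorted(blocks, key=lambda b: b["score"], reverse=True):
        if used + block["duration"] <= night_minutes:
            plan.append(block)
            used += block["duration"]
    return plan

proposals = [
    {"name": "SN2026A", "score": 0.87, "duration": 120},
    {"name": "CalFlat", "score": 0.20, "duration": 30},
    {"name": "GRB_followup", "score": 0.95, "duration": 240},
    {"name": "StudentLab", "score": 0.40, "duration": 180},
]
plan = pack_night(proposals, night_minutes=400)
```

Note that the LLM never touches the packing loop: it only produces the scored blocks, which keeps the optimization deterministic and auditable.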
Prioritization strategy — a replicable scoring function
Translate all competing criteria into a composite score. Use a parametric scoring function so weights are tunable and auditable.
Suggested score components (example)
- Science priority (P): program-assigned priority rank (0–1).
- Time criticality (T): proximity to deadline or ToO latency needs (0–1).
- Visibility quality (V): airmass, moon separation, seeing forecast (0–1).
- Resource cost (C): slewing time, calibration overhead, instrument switches (0–1, lower is better).
- Risk factor (R): weather probability of failure (0–1, lower is better).
Composite score S could be:
S = w1*P + w2*T + w3*V - w4*C - w5*R
Where weights w1..w5 are tuned by ops teams. The LLM's job is to compute these components reliably and explain them in natural language. The optimizer then uses S as the objective when building the night's plan.
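The composite score maps directly to a small function. The weights below are illustrative defaults, not recommendations; an ops team would tune them:

```python
# Composite priority score S = w1*P + w2*T + w3*V - w4*C - w5*R.
# Weights are illustrative placeholders; tune per observatory policy.

DEFAULT_WEIGHTS = {"P": 0.30, "T": 0.25, "V": 0.20, "C": 0.15, "R": 0.10}

def composite_score(p, t, v, c, r, weights=DEFAULT_WEIGHTS):
    """All components are expected in [0, 1]; C and R penalize."""
    for name, x in (("P", p), ("T", t), ("V", v), ("C", c), ("R", r)):
        if not 0.0 <= x <= 1.0:
            raise ValueError(f"component {name}={x} outside [0, 1]")
    return (weights["P"] * p + weights["T"] * t + weights["V"] * v
            - weights["C"] * c - weights["R"] * r)

# A high-priority, time-critical target with good visibility and low risk:
s = composite_score(p=1.0, t=0.8, v=0.9, c=0.2, r=0.1)
```

Rejecting out-of-range components at this boundary is a cheap guard against an LLM emitting a nonsense score like 7.3.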
Sample JSON schema: observation_block
{
  "target_name": "SN2026A",
  "ra": 123.456,
  "dec": -12.345,
  "start_window": "2026-02-01T02:00:00Z",
  "end_window": "2026-02-01T08:00:00Z",
  "exposure_sequence": [
    {"filter": "r", "exptime": 600, "nexp": 3},
    {"filter": "i", "exptime": 600, "nexp": 3}
  ],
  "priority_score": 0.87,
  "justification": "High time-criticality; rising target; program priority A",
  "safety_requirements": {"human_approval": false}
}
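Incoming blocks should be validated against this shape before they reach the scheduler. A full JSON Schema validator (e.g., the third-party `jsonschema` package) is the usual choice; a hand-rolled stdlib-only check, shown as a sketch below, covers the essentials:

```python
# Hand-rolled structural check for observation_block payloads, so bad
# LLM output is rejected before reaching the scheduler. A production
# system would use a full JSON Schema validator instead.

REQUIRED = {
    "target_name": str, "ra": (int, float), "dec": (int, float),
    "start_window": str, "end_window": str,
    "exposure_sequence": list, "priority_score": (int, float),
}

def validate_block(block: dict) -> list[str]:
    """Return a list of problems; an empty list means the block passed."""
    errors = [f"missing or mistyped field: {k}"
              for k, t in REQUIRED.items()
              if not isinstance(block.get(k), t)]
    if isinstance(block.get("ra"), (int, float)) and not 0 <= block["ra"] < 360:
        errors.append("ra out of range [0, 360)")
    if isinstance(block.get("dec"), (int, float)) and not -90 <= block["dec"] <= 90:
        errors.append("dec out of range [-90, 90]")
    return errors

good = {"target_name": "SN2026A", "ra": 123.456, "dec": -12.345,
        "start_window": "2026-02-01T02:00:00Z",
        "end_window": "2026-02-01T08:00:00Z",
        "exposure_sequence": [{"filter": "r", "exptime": 600, "nexp": 3}],
        "priority_score": 0.87}
```

Returning a list of problems, rather than raising on the first one, lets the orchestrator send the full error list back to the model for a single repair pass.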
Prompt engineering: practical templates and guardrails
In 2026, function-calling and schema enforcement are standard. Use concise prompts that include:
- Task instruction: what you want (parse request, produce observation_block JSON).
- Context snippets: top 3 retrieval results (forecast, queue snapshot, policy bullet).
- Output schema: JSON schema to validate outputs.
- Safety directives: do not produce commands that bypass human approval for high-priority overrides.
Example prompt (abstract):
"You are the Observatory Scheduler Assistant. Given the user request and the retrieved context items, return a validated observation_block JSON. If any uncertainty, return 'needs_clarification' with a list of questions. Limit justifications to 50 words."
Handling real-time events: ToO and high-alert rate strategies
Real-time events are where LLM orchestration shines — but they also carry the highest risk. Use these best practices:
- Fast-path detection: lightweight LLM classifier that flags high-confidence ToOs and routes them to the priority pipeline.
- Pre-approved policies: For some science programs (e.g., consortium time), allow auto-acceptance up to a predefined priority.
- Staging area: Place incoming ToO proposals in a time-limited staging queue for quick human confirmation. The LLM provides a one-line rationale and minimal execution plan.
- Backoff & retry: If weather risk > threshold, LLM suggests alternate partners (other telescopes in the network) and files follow-up alerts.
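The staging-area idea can be sketched as a small in-memory queue with an approval deadline. The class and field names here are hypothetical, and the clock is injected so the expiry behavior is testable without sleeping:

```python
import time

# Time-limited staging queue for ToO proposals: items wait for human
# confirmation and expire after ttl_s seconds. Names are illustrative;
# a real system would persist this and notify operators.

class StagingQueue:
    def __init__(self, ttl_s: float, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._items = {}  # name -> (proposal, enqueue_time)

    def stage(self, name: str, proposal: dict) -> None:
        self._items[name] = (proposal, self.clock())

    def approve(self, name: str):
        """Return the proposal if still within its window, else None."""
        entry = self._items.pop(name, None)
        if entry is None:
            return None
        proposal, t0 = entry
        if self.clock() - t0 > self.ttl_s:
            return None  # expired: a fresh proposal is required
        return proposal

fake_now = [0.0]
q = StagingQueue(ttl_s=300.0, clock=lambda: fake_now[0])
q.stage("SN2026A", {"priority_score": 0.87})
fake_now[0] = 120.0
ok = q.approve("SN2026A")            # approved within the 300 s window
q.stage("GRB_followup", {"priority_score": 0.95})
fake_now[0] = 600.0
expired = q.approve("GRB_followup")  # past the window: returns None
```

Expiring unconfirmed proposals keeps the fast path fast: a stale ToO is never executed on outdated telemetry just because a human approved it late.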
APIs and integration patterns — practical notes
- Prefer function-calling APIs: They return structured outputs that are machine-validated and reduce parsing errors.
- Budget the context: Large retrieval windows cost money and tokens. Keep context to the most relevant items and summarize older logs before passing them to the LLM.
- Use typed contracts: All telescope commands should be strongly typed and validated server-side to prevent harmful actions.
- Local fallback: For latency-sensitive systems, deploy a local inference instance to handle emergency pathing.
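The local-fallback point can be sketched as a tiny router that sends latency-critical requests to an on-prem instance and everything else to a cloud API. Backend names and request kinds here are hypothetical placeholders:

```python
# Sketch of a model router: latency-critical or safety-sensitive
# requests go to a local inference instance; everything else goes to
# the cloud API. Endpoint names are placeholders, not real services.

LOCAL_ONLY = {"too_fastpath", "safety_check"}

def route_request(kind: str, latency_budget_s: float) -> str:
    """Pick an inference backend for a scheduling request."""
    if kind in LOCAL_ONLY or latency_budget_s < 2.0:
        return "local-llm:8080"   # on-prem instance, low latency
    return "cloud-llm-api"        # vendor endpoint, higher capability

fast = route_request("too_fastpath", latency_budget_s=10.0)
slow = route_request("parse_observer_email", latency_budget_s=30.0)
```

A cheaper local model is usually enough for classification-style emergency pathing; the cloud model is reserved for full plan generation.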
Evaluation: metrics that matter
Measure both operational and science impact.
- Operational: mean time to schedule an alert, percent of auto-scheduled observations, human override rate.
- Scientific: fraction of high-priority targets observed within required window, follow-up completeness for transient classes.
- Reliability: false-positive commands, failed executions caused by bad LLM proposals.
Case study (mini): university observatory pilots an LLM scheduler
In late 2025 a 1-m university dome ran a 3-month pilot using an LLM orchestration layer. Key steps and outcomes:
- Step 1: Built REST adapters to mount telemetry and an all-sky camera feed into a vector store.
- Step 2: Trained prompt templates and a small intent classifier to parse observer emails into structured requests.
- Step 3: Rolled out LLM-guided scheduling for student lab time and non-critical science. Human ops signed off on high-priority runs.
Results after 3 months:
- Auto-scheduled observations rose to 62% of non-critical time.
- Average prep time per observer fell from 25 minutes to 7 minutes.
- One avoidable failure occurred when a mismatched instrument config was proposed — the team fixed the schema validation and eliminated recurrence.
Governance, transparency and trust
In advanced systems, trust is earned. Follow these rules:
- Immutable logs: Store LLM inputs, outputs, retrieval context, and final commands for auditing.
- Explainability as a default: Require a short human-readable rationale for every automated scheduling decision.
- Version control models & policies: Record which model and prompt set made each decision.
- Operator UI: Provide a fast override and an undo window for recently executed commands.
Common pitfalls and how to avoid them
- Hallucinated instrument states: Fix by including authoritative telemetry in context and validating outputs against live system states.
- Over-automation: Avoid granting execute rights to the LLM for safety-critical actions until fully tested.
- Opaque priority shifts: Make priority weights visible to users and log any changes made by the system.
- Cost explosion: Keep LLM calls lean; batch requests and use a lower-cost model for non-critical parsing.
Future predictions and advanced strategies for 2026+
Looking ahead, expect these developments:
- Federated LLM schedulers: Multiple observatories will share model-driven proposals to automatically coordinate follow-ups across hemispheres.
- Multimodal context: Models will ingest images (all-sky, instrument previews) to refine decisions in real time.
- Policy-as-code: Observatory policies expressed as machine-checkable modules that the LLM must satisfy before proposing actions.
- Automated science triage: LLMs rapidly classify alerts and route them to the right telescope class based on capability and current load.
Actionable checklist to get started this month
- Define one use case (e.g., ToO follow-up for bright transients) and an acceptance KPI.
- Expose 3 context endpoints (telemetry, weather, queue snapshot) as REST APIs.
- Implement a simple LLM prompt that converts a text request into an observation_block JSON and validate the output with a unit test.
- Run 100 replay tests on archived nights and measure false positives.
- Deploy to a staging UI with human approval required for execution.
Closing: Lessons from Siri/Gemini — practical takeaways
Apple's Siri move shows a key architectural lesson: powerful models must be married to strong context plumbing, tool APIs, and governance. For observatory operations, that maps cleanly to an LLM that:
- pulls authoritative context (telemetry, forecasts, ephemerides),
- uses function-calling to produce validated, structured observation blocks,
- defers final execution through a controlled scheduler or human approvals, and
- logs and explains every decision for auditability.
Follow the roadmap above: define goals, wire context, orchestrate with the LLM, optimize with a solver, and keep humans in the loop until confidence is proven. With Rubin/LSST-era alert volumes and richer multi-messenger science in 2026, intelligent orchestration is no longer optional — it’s the only scalable path to maximize science.
Call to action
Ready to pilot an LLM-driven scheduler at your observatory? Start with the 1-month checklist above and share your results with the community. If you want a templated prompt set, JSON schemas, and an implementation checklist tailored to telescopes from 0.5–4 meters, sign up for our engineering drop (or contact your operations lead). Let's build safe, auditable automation that gets more science done with less friction.