Run-bundle reference
qsospec uses the same Parquet-backed run format whether it fits a single spectrum or a large sample. Scalar science fields are stored in long form and remain provisional; model components are stored as nested records, so adding a line recipe later does not require a new Parquet column.
run_directory/
manifest.json
data/
inputs/
objects/
measurements/
warnings/
models/
failures/
derived/
qa/
.staging/
The datasets contain canonical, collision-free object shards. Finalization validates them without creating duplicate compact copies and removes empty staging state. JSON is used only for concise run-level provenance.
Single object
import qsospec
result = qsospec.fit_object_to_store(
    "spectrum.fits",
    "runs/my_object",
    redshift=1.2,
)
The main QA figure is written by default. Set write_qa=False to defer plotting, or
write_legacy_products=True to request the former loose CSV/JSON products.
Batch fitting
batch = qsospec.fit_batch(
    ["spectra-000.parquet", "spectra-001.parquet"],
    "runs/sample",
    n_workers="auto",
)
Parquet sources are scanned once with projected columns and bounded record batches. FITS inputs may be files, globs, directories, or CSV/Parquet manifest tables. FITS tasks are dynamically scheduled one file at a time; Parquet spectra use small worker microbatches.
n_workers="auto" selects at most eight spawned processes and leaves one CPU
available. Each worker limits BLAS/OpenMP to one thread. n_workers=1 selects
serial execution. A restricted platform without process semaphore support
falls back to serial execution.
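The n_workers policy described above can be sketched in a few lines. This is only an illustration of the stated rule, not qsospec's own code: the helper name resolve_worker_count is hypothetical, and the floor of one worker on a single-CPU machine is an assumption.

```python
import os


def resolve_worker_count(n_workers, cpu_count=None):
    """Hypothetical helper mirroring the documented n_workers policy."""
    if n_workers != "auto":
        # Explicit values are used as given; 1 means serial execution.
        return max(1, int(n_workers))
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Leave one CPU free, cap at eight processes, never drop below one.
    return max(1, min(8, cpu_count - 1))
```

Under these assumptions a 4-CPU laptop would get three workers and a 32-CPU node would get eight.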
For independent cluster jobs, pass the same run directory and configuration to every job and give each job its own shard_index:
qsospec.fit_batch(
    inputs,
    run_directory,
    num_shards=16,
    shard_index=job_index,
    finalize=False,
)
After every job has completed, finalize the run once:
qsospec.finalize_run(run_directory)
Partitioning is deterministic from the internal source-and-row object key. Workers write checksummed private staging shards; only the coordinator promotes validated shards.
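To illustrate why deterministic partitioning lets independent jobs agree on who fits what without talking to each other, here is a minimal sketch. The hash function, the helper name shard_for_key, and the "<source>#<row>" key layout are all assumptions made for the example; the text above only states that the assignment is derived from the source-and-row object key.

```python
import hashlib


def shard_for_key(object_key: str, num_shards: int) -> int:
    """Map an object key to a shard index in [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # A fixed digest is stable across processes and machines, unlike
    # Python's built-in hash(), which is salted per interpreter for str.
    digest = hashlib.blake2b(object_key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards


# Hypothetical object keys of the form "<source>#<row>".
keys = [f"spectra-000.parquet#{row}" for row in range(1000)]
assignment = {key: shard_for_key(key, 16) for key in keys}

# Job 3 of 16 would process only its own slice of the sample.
mine = [key for key, shard in assignment.items() if shard == 3]
```

Because every job computes the same mapping from the same inputs, each object belongs to exactly one shard and no coordination is needed until finalization.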
Resume and inspection
Reusing a run directory with the same configuration skips completed objects and retries failures by default. A changed scientific configuration is rejected; use a new run directory or run ID.
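The resume rule above can be pictured as a configuration fingerprint plus a per-object status check. The sketch below is an assumption about how such a check could work, not a description of qsospec internals: config_fingerprint, plan_resume, and the status strings are hypothetical, and the config dict is assumed to hold only the scientific settings.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash a configuration in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_resume(stored_fingerprint, config, status_by_key, retry_failures=True):
    """Return the object keys that still need fitting.

    status_by_key maps each object key to "done", "failed", or "pending".
    """
    if stored_fingerprint is not None and stored_fingerprint != config_fingerprint(config):
        raise ValueError("configuration changed; use a new run directory or run ID")
    todo = []
    for key, status in status_by_key.items():
        if status == "done":
            continue  # completed objects are skipped
        if status == "failed" and not retry_failures:
            continue  # failures are retried only when requested
        todo.append(key)
    return todo
```

Serializing with sorted keys means that reordering entries in the configuration does not change the fingerprint, while changing any value does.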
run = qsospec.open_run("runs/sample")
model = qsospec.load_model(run, "scientific-object-id")
Object IDs need not be unique. Use the internal object_key when an ID is
ambiguous.
Catalogs, derived quantities, and QA
Wide science catalogs are views over the authoritative long-form
measurements table. Inspect available quantities before defining a catalog:
measurements = run.read_table("measurements").to_pandas()
print(
    measurements[["section", "recipe_id", "quantity"]]
    .drop_duplicates()
)
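The idea of a wide catalog as a view over long-form rows can be sketched in plain Python, so that it runs without a run directory. The column names section, recipe_id, and quantity come from the text above; object_key and value as column names, the recipe IDs, and the helper widen are assumptions for illustration.

```python
from collections import defaultdict

# Toy long-form rows with made-up recipe IDs and values.
rows = [
    {"object_key": "a", "section": "line", "recipe_id": "hbeta", "quantity": "fwhm", "value": 4200.0},
    {"object_key": "a", "section": "line", "recipe_id": "mgii", "quantity": "fwhm", "value": 3900.0},
    {"object_key": "b", "section": "line", "recipe_id": "hbeta", "quantity": "fwhm", "value": 5100.0},
]


def widen(rows, wanted):
    """Pivot long-form rows into one dict per object.

    wanted maps an output column name to a (section, recipe_id, quantity) triple.
    """
    lookup = {triple: name for name, triple in wanted.items()}
    wide = defaultdict(dict)
    for row in rows:
        triple = (row["section"], row["recipe_id"], row["quantity"])
        name = lookup.get(triple)
        if name is not None:
            wide[row["object_key"]][name] = row["value"]
    # Requested columns with no measurement are filled with None.
    return {key: {name: cols.get(name) for name in wanted} for key, cols in wide.items()}


catalog = widen(rows, {
    "hbeta_fwhm": ("line", "hbeta", "fwhm"),
    "mgii_fwhm": ("line", "mgii", "fwhm"),
})
```

The long table stays authoritative: a different catalog definition is just a different wanted mapping over the same rows.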
Derived quantities are a separate calibration stage. A calculator receives an object record and all of its long-form measurements, and returns one or more records containing a quantity, value, errors, unit, and optional metadata. This permits changing cosmology, bolometric corrections, or black-hole-mass calibrations without refitting spectra.
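As a rough sketch of what such a calculator might look like, the example below applies a constant bolometric correction in log space. The call signature (object record, list of measurement rows), the record keys, and the quantity names log_L5100 and log_Lbol are assumptions based on the description above, not the confirmed qsospec interface; the correction of 10.0 is a deliberately round placeholder.

```python
import math


def make_bolometric_calculator(correction, source="log_L5100", target="log_Lbol"):
    """Build a calculator that scales a monochromatic luminosity.

    Quantity names and the record layout are assumptions for illustration.
    """
    offset = math.log10(correction)

    def calculator(obj, measurements):
        records = []
        for row in measurements:
            if row["quantity"] != source or row["value"] is None:
                continue
            records.append({
                "quantity": target,
                "value": row["value"] + offset,
                # A constant multiplicative factor leaves the log error unchanged.
                "errors": row.get("errors"),
                "unit": "log10(erg/s)",
                "metadata": {"bolometric_correction": correction},
            })
        return records

    return calculator


calc = make_bolometric_calculator(10.0)
out = calc(
    {"object_key": "a"},
    [{"quantity": "log_L5100", "value": 44.5, "errors": 0.02}],
)
```

Swapping in a different correction only requires building a new calculator and recomputing the derived stage; the fitted measurements are untouched.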
qsospec.compute_derived_quantities(run, calculators)
qsospec.render_qa(
    run,
    warning_codes=["optional_line_fit_failed"],
    sample=20,
)
Batch fitting does not create QA figures by default. render_qa(...) can
select object IDs, warning codes, failures, deterministic random samples, or a
query against the object table. Main QA figures distinguish final fitted
pixels, pPXF emission masks, and configured not-modelled windows. Schema
version 5 stores exact pPXF masks, per-complex excluded-pixel masks and
metadata, rest wavelength, and the rest-frame-normalized arrays used by the
fit. Older development schemas are rejected and their runs should be
recreated.
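A deterministic random sample, as mentioned for render_qa(...), means that the same run yields the same selection every time. One way to get that property is sketched below; the helper deterministic_sample and its fixed default seed are assumptions, and qsospec may select objects differently.

```python
import random


def deterministic_sample(object_keys, k, seed=0):
    """Pick up to k keys reproducibly, independent of input order."""
    ordered = sorted(object_keys)  # sort first so input order does not matter
    rng = random.Random(seed)      # private generator; global state untouched
    return rng.sample(ordered, min(k, len(ordered)))


keys = [f"obj{i:03d}" for i in range(100)]
first = deterministic_sample(keys, 20)
again = deterministic_sample(list(reversed(keys)), 20)
```

Here first and again contain the same 20 keys in the same order, so QA figures rendered on different days cover the same objects.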
Model rows store the corrected, rest-frame-normalized arrays actually fitted plus Galactic-extinction and frame-conversion provenance. Raw uncorrected flux arrays are not duplicated.
Notebook display
figure = model.plot_qa()
model.show_qa()
run.plot_qa("scientific-object-id")
These methods return open Matplotlib figures and do not create additional
files. model.qa_path points to the primary saved QA image when available.