> For the complete documentation index, see [llms.txt](https://docs.hydromancer.xyz/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.hydromancer.xyz/reservoir/schema-reference/orderbook.md).

# Orderbook

20-level L2 orderbook snapshots for every perpetual and spot market on Hyperliquid and all HIP-3 dexes. Each snapshot captures the top 20 bid and ask price levels at the time it was taken — price, total size at the level, and order count. Updated weekly.

**Bucket:** `s3://hydromancer-reservoir` (requester pays)

```
by_dex/{dex}/orderbook/1m/{asset_class}/date=YYYY-MM-DD/{coin}.parquet
```

One file per coin per day. `{dex}` is `hyperliquid` for native perps/spot, or the HIP-3 dex name (`xyz`, `hyna`, `felix`, …). `{asset_class}` is `perps` . Files are only created for dates with data.

Coin naming:

* **Perps (native):** `BTC`, `ETH`, `SOL`, …
* **Perps (HIP-3):** same as the coin (e.g. `NVDA`, `AAPL`, `SP500`). Routed via the `{dex}` path segment.
* **Spot (native):** `@{pair_index}` (e.g. `@107`, `@142`). The legacy `PURR/USDC` pair has its slash URL-encoded as `PURR%2FUSDC.parquet` on disk.

## Schema

Each row is one complete orderbook snapshot. Bid and ask levels are stored as list columns (best price first), with up to 20 levels per side.

| Column          | Type                                           | Description                                                                                                                     |
| --------------- | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `block_time_ms` | int64                                          | Block time in milliseconds since epoch (UTC)                                                                                    |
| `block_number`  | uint64                                         | Hyperliquid block number                                                                                                        |
| `bids`          | `list<struct<px:string, sz:string, n:uint32>>` | Bid levels, highest price first. `px` and `sz` are decimal strings for lossless precision; `n` is the order count at that level |
| `asks`          | `list<struct<px:string, sz:string, n:uint32>>` | Ask levels, lowest price first. Same field semantics as bids                                                                    |

**Why list-of-struct?** One row equals one complete snapshot, which matches how you'd think about the book. Parquet physically stores `bids.list.item.px`, `bids.list.item.sz`, and `bids.list.item.n` as separate columns under the hood — so column pruning still works. A query that only reads `bids[1].px` (the best bid over time) only loads that one physical column.

## Quick Start

### DuckDB

```sql
INSTALL httpfs;
LOAD httpfs;
SET s3_region = 'ap-northeast-1';

-- Best bid / best ask / spread / mid over the day
SELECT
    block_time_ms,
    CAST(bids[1].px AS DOUBLE) AS best_bid,
    CAST(asks[1].px AS DOUBLE) AS best_ask,
    CAST(asks[1].px AS DOUBLE) - CAST(bids[1].px AS DOUBLE) AS spread,
    (CAST(bids[1].px AS DOUBLE) + CAST(asks[1].px AS DOUBLE)) / 2 AS mid
FROM read_parquet('s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/BTC.parquet')
ORDER BY block_time_ms;

-- Depth within 10 bps of mid (both sides, USD)
WITH snap AS (
    SELECT
        block_time_ms,
        (CAST(bids[1].px AS DOUBLE) + CAST(asks[1].px AS DOUBLE)) / 2 AS mid,
        bids, asks
    FROM read_parquet('s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/BTC.parquet')
)
SELECT
    block_time_ms, mid,
    list_sum(list_transform(bids, b ->
        CASE WHEN CAST(b.px AS DOUBLE) >= mid * 0.999
             THEN CAST(b.sz AS DOUBLE) * CAST(b.px AS DOUBLE) ELSE 0 END
    )) AS bid_10bps_usd,
    list_sum(list_transform(asks, a ->
        CASE WHEN CAST(a.px AS DOUBLE) <= mid * 1.001
             THEN CAST(a.sz AS DOUBLE) * CAST(a.px AS DOUBLE) ELSE 0 END
    )) AS ask_10bps_usd
FROM snap
ORDER BY block_time_ms;

-- Multi-coin scan: average spread across the top native perps on a day
SELECT
    regexp_extract(filename, '/([^/]+)\.parquet$', 1) AS coin,
    AVG(CAST(asks[1].px AS DOUBLE) - CAST(bids[1].px AS DOUBLE)) AS avg_spread,
    AVG((CAST(bids[1].px AS DOUBLE) + CAST(asks[1].px AS DOUBLE)) / 2) AS avg_mid
FROM read_parquet(
    's3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/*.parquet',
    filename = true
)
GROUP BY coin
ORDER BY avg_mid DESC;

-- Multi-day query for a single coin
SELECT block_time_ms, bids[1].px AS best_bid, asks[1].px AS best_ask
FROM read_parquet('s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=*/BTC.parquet')
WHERE block_time_ms BETWEEN 1775347200000 AND 1775606400000
ORDER BY block_time_ms;
```

### Polars

```python
import polars as pl

# Best bid / best ask / spread
df = pl.scan_parquet(
    "s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/BTC.parquet"
).select([
    pl.col("block_time_ms"),
    pl.col("bids").list.first().struct.field("px").cast(pl.Float64).alias("best_bid"),
    pl.col("asks").list.first().struct.field("px").cast(pl.Float64).alias("best_ask"),
]).with_columns(
    spread=pl.col("best_ask") - pl.col("best_bid"),
    mid=(pl.col("best_bid") + pl.col("best_ask")) / 2,
).collect()

# Explode to long format (one row per level) if you prefer
long = pl.scan_parquet(
    "s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/BTC.parquet"
).select([
    pl.col("block_time_ms"),
    pl.col("bids"),
]).explode("bids").unnest("bids").collect()
# columns: block_time_ms, px, sz, n
```

## Notes

* **Time ordering.** Rows within each file are written in block order. You can iterate without sorting for streaming reconstructions.
* **Missing levels.** If a side of the book has fewer than 20 populated price levels, the list simply contains fewer elements — no padding with zero/null entries.
* **Dex prefixes.** The path's `{dex}` segment is the authoritative dex identifier. HIP-3 coin names do **not** include the `dex:` prefix here — e.g. a Trade\[XYZ] file for `NVDA` lives at `by_dex/xyz/orderbook/1m/perps/date=…/NVDA.parquet`.
* **Scale.** For context, a busy day for 602 markets produces \~866K rows total per day across all coins in the 1m cadence, roughly \~90 MB compressed.
