# Orderbook

20-level L2 orderbook snapshots for every perpetual and spot market on Hyperliquid and all HIP-3 dexes. Each snapshot captures the top 20 bid and ask price levels at the time it was taken — price, total size at the level, and order count. Updated weekly.

**Bucket:** `s3://hydromancer-reservoir` (requester pays)

```
by_dex/{dex}/orderbook/1m/{asset_class}/date=YYYY-MM-DD/{coin}.parquet
```

One file per coin per day. `{dex}` is `hyperliquid` for native perps/spot, or the HIP-3 dex name (`xyz`, `hyna`, `felix`, …). `{asset_class}` is `perps` . Files are only created for dates with data.

Coin naming:

* **Perps (native):** `BTC`, `ETH`, `SOL`, …
* **Perps (HIP-3):** same as the coin (e.g. `NVDA`, `AAPL`, `SP500`). Routed via the `{dex}` path segment.
* **Spot (native):** `@{pair_index}` (e.g. `@107`, `@142`). The legacy `PURR/USDC` pair has its slash URL-encoded as `PURR%2FUSDC.parquet` on disk.

## Schema

Each row is one complete orderbook snapshot. Bid and ask levels are stored as list columns (best price first), with up to 20 levels per side.

| Column          | Type                                           | Description                                                                                                                     |
| --------------- | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `block_time_ms` | int64                                          | Block time in milliseconds since epoch (UTC)                                                                                    |
| `block_number`  | uint64                                         | Hyperliquid block number                                                                                                        |
| `bids`          | `list<struct<px:string, sz:string, n:uint32>>` | Bid levels, highest price first. `px` and `sz` are decimal strings for lossless precision; `n` is the order count at that level |
| `asks`          | `list<struct<px:string, sz:string, n:uint32>>` | Ask levels, lowest price first. Same field semantics as bids                                                                    |

**Why list-of-struct?** One row equals one complete snapshot, which matches how you'd think about the book. Parquet physically stores `bids.list.item.px`, `bids.list.item.sz`, and `bids.list.item.n` as separate columns under the hood — so column pruning still works. A query that only reads `bids[1].px` (the best bid over time) only loads that one physical column.

## Quick Start

### DuckDB

```sql
INSTALL httpfs;
LOAD httpfs;
SET s3_region = 'ap-northeast-1';

-- Best bid / best ask / spread / mid over the day
SELECT
    block_time_ms,
    CAST(bids[1].px AS DOUBLE) AS best_bid,
    CAST(asks[1].px AS DOUBLE) AS best_ask,
    CAST(asks[1].px AS DOUBLE) - CAST(bids[1].px AS DOUBLE) AS spread,
    (CAST(bids[1].px AS DOUBLE) + CAST(asks[1].px AS DOUBLE)) / 2 AS mid
FROM read_parquet('s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/BTC.parquet')
ORDER BY block_time_ms;

-- Depth within 10 bps of mid (both sides, USD)
WITH snap AS (
    SELECT
        block_time_ms,
        (CAST(bids[1].px AS DOUBLE) + CAST(asks[1].px AS DOUBLE)) / 2 AS mid,
        bids, asks
    FROM read_parquet('s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/BTC.parquet')
)
SELECT
    block_time_ms, mid,
    list_sum(list_transform(bids, b ->
        CASE WHEN CAST(b.px AS DOUBLE) >= mid * 0.999
             THEN CAST(b.sz AS DOUBLE) * CAST(b.px AS DOUBLE) ELSE 0 END
    )) AS bid_10bps_usd,
    list_sum(list_transform(asks, a ->
        CASE WHEN CAST(a.px AS DOUBLE) <= mid * 1.001
             THEN CAST(a.sz AS DOUBLE) * CAST(a.px AS DOUBLE) ELSE 0 END
    )) AS ask_10bps_usd
FROM snap
ORDER BY block_time_ms;

-- Multi-coin scan: average spread across the top native perps on a day
SELECT
    regexp_extract(filename, '/([^/]+)\.parquet$', 1) AS coin,
    AVG(CAST(asks[1].px AS DOUBLE) - CAST(bids[1].px AS DOUBLE)) AS avg_spread,
    AVG((CAST(bids[1].px AS DOUBLE) + CAST(asks[1].px AS DOUBLE)) / 2) AS avg_mid
FROM read_parquet(
    's3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/*.parquet',
    filename = true
)
GROUP BY coin
ORDER BY avg_mid DESC;

-- Multi-day query for a single coin
SELECT block_time_ms, bids[1].px AS best_bid, asks[1].px AS best_ask
FROM read_parquet('s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=*/BTC.parquet')
WHERE block_time_ms BETWEEN 1775347200000 AND 1775606400000
ORDER BY block_time_ms;
```

### Polars

```python
import polars as pl

# Best bid / best ask / spread
df = pl.scan_parquet(
    "s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/BTC.parquet"
).select([
    pl.col("block_time_ms"),
    pl.col("bids").list.first().struct.field("px").cast(pl.Float64).alias("best_bid"),
    pl.col("asks").list.first().struct.field("px").cast(pl.Float64).alias("best_ask"),
]).with_columns(
    spread=pl.col("best_ask") - pl.col("best_bid"),
    mid=(pl.col("best_bid") + pl.col("best_ask")) / 2,
).collect()

# Explode to long format (one row per level) if you prefer
long = pl.scan_parquet(
    "s3://hydromancer-reservoir/by_dex/hyperliquid/orderbook/1m/perps/date=2026-04-06/BTC.parquet"
).select([
    pl.col("block_time_ms"),
    pl.col("bids"),
]).explode("bids").unnest("bids").collect()
# columns: block_time_ms, px, sz, n
```

## Notes

* **Time ordering.** Rows within each file are written in block order. You can iterate without sorting for streaming reconstructions.
* **Missing levels.** If a side of the book has fewer than 20 populated price levels, the list simply contains fewer elements — no padding with zero/null entries.
* **Dex prefixes.** The path's `{dex}` segment is the authoritative dex identifier. HIP-3 coin names do **not** include the `dex:` prefix here — e.g. a Trade\[XYZ] file for `NVDA` lives at `by_dex/xyz/orderbook/1m/perps/date=…/NVDA.parquet`.
* **Scale.** For context, a busy day for 602 markets produces \~866K rows total per day across all coins in the 1m cadence, roughly \~90 MB compressed.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.hydromancer.xyz/reservoir/schema-reference/orderbook.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
