Training Data

Export

Export historical, time-sliced market features and labels as CSV or JSON for model training.

Note

The export endpoint requires a Pro or Enterprise plan. See Plans & Pricing.

Export

GET /export

Returns labeled ML training data: computed features for every resolved market, ready for model training. Each row has 22 ML features plus an outcome label and metadata (28 columns total).

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| limit | 1-10000 | 1000 | Number of rows to return |
| min_trades | number | 0 | Minimum trade count per market |
| format | string | json | "json" or "csv" |
| category | string | | Filter by market type: crypto_short, crypto, sports, politics, weather, other |
| condition_id | string | | Get features for a specific market |
| price_min | 0-1 | | Minimum median_price_at_trade (filters out decided markets) |
| price_max | 0-1 | | Maximum median_price_at_trade (filters out decided markets) |
| sliced | boolean | false | Return time-sliced training data (features at 25/50/75% lifecycle) for live model training |

# JSON format
curl -H "x-api-key: YOUR_KEY" \
  "https://api.scanna.xyz/export?limit=100&min_trades=5"

# CSV format (directly loadable by pandas)
curl -H "x-api-key: YOUR_KEY" \
  "https://api.scanna.xyz/export?format=csv&limit=1000" > training_data.csv
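
The same requests can be made from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names (`build_export_url`, `fetch_csv_rows`, `split_features_and_labels`) are hypothetical, and two details are assumptions: that boolean parameters are sent as lowercase `true`/`false`, and that the CSV header row uses the column names listed in this page.

```python
import csv
import io
import urllib.parse
import urllib.request

BASE_URL = "https://api.scanna.xyz/export"  # endpoint from the curl examples


def build_export_url(**params):
    """Build an /export URL, dropping parameters left as None."""
    query = {}
    for key, value in params.items():
        if value is None:
            continue
        # Assumption: booleans are sent as lowercase "true"/"false".
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return BASE_URL + "?" + urllib.parse.urlencode(query)


def fetch_csv_rows(api_key, **params):
    """Download a CSV export and return it as a list of dicts, one per row."""
    url = build_export_url(format="csv", **params)
    request = urllib.request.Request(url, headers={"x-api-key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def split_features_and_labels(rows, label_column="winning_side"):
    """Separate the outcome label from the remaining columns of each row."""
    features = [{k: v for k, v in row.items() if k != label_column} for row in rows]
    # Encode the label as 1 for "side_a" and 0 for "side_b".
    labels = [1 if row[label_column] == "side_a" else 0 for row in rows]
    return features, labels
```

A call such as `fetch_csv_rows("YOUR_KEY", limit=1000, min_trades=5)` would then return rows that can be passed to `split_features_and_labels`. The remaining columns still include the metadata fields, so drop those before training. Values arrive as strings and need converting to numbers.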

Features per market

| Feature | Description |
| --- | --- |
| volume_total | Total USD volume |
| volume_zscore | Volume z-score vs all markets |
| buy_ratio | Buy volume / total volume |
| whale_trade_count | Trades ≥$1K notional |
| whale_notional_ratio | Whale volume / total volume |
| smart_money_* | Smart money participation, buy ratio, trade count (point-in-time) |
| hours_to_resolve | Duration from first trade to resolution |
| unique_traders | Distinct wallet count |
| price_volatility | Standard deviation of trade prices |
| ofi_normalized | Order flow imbalance [-1, 1] (buy pressure vs sell pressure) |
| vwap | Volume-weighted average price |
| vwap_deviation | Median price minus VWAP (late informed money signal) |
| late_money_* | Volume fraction and OFI of last 25% of trades |
| price_momentum | Last trade price minus first trade price |
| price_autocorr | Momentum continuation (+1) vs reversal (-1) |
| wallet_hhi | Wallet concentration index (0=dispersed, 1=dominated) |
| winning_side | Outcome label: "side_a" or "side_b" |
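
To make the feature descriptions concrete, the sketch below computes several of them from a time-ordered list of trades. It only illustrates the descriptions above and is not the service's actual implementation. The trade fields (`price`, `size_usd`, `side`, `wallet`) are hypothetical, and several formulas are assumptions: OFI is taken as (buy volume minus sell volume) / total volume, volatility as the population standard deviation, and `wallet_hhi` as the sum of squared per-wallet volume shares.

```python
from statistics import median, pstdev


def sketch_features(trades):
    """Compute a few per-market features from a time-ordered list of trades.

    Each trade is a dict with hypothetical keys: "price", "size_usd",
    "side" ("buy" or "sell"), and "wallet".
    """
    total = sum(t["size_usd"] for t in trades)
    buy = sum(t["size_usd"] for t in trades if t["side"] == "buy")
    prices = [t["price"] for t in trades]
    vwap = sum(t["price"] * t["size_usd"] for t in trades) / total

    # Accumulate volume per wallet for the concentration index.
    by_wallet = {}
    for t in trades:
        by_wallet[t["wallet"]] = by_wallet.get(t["wallet"], 0.0) + t["size_usd"]

    return {
        "volume_total": total,
        "buy_ratio": buy / total,
        "whale_trade_count": sum(1 for t in trades if t["size_usd"] >= 1000),
        "unique_traders": len(by_wallet),
        "price_volatility": pstdev(prices),  # population std dev (assumed)
        "ofi_normalized": (buy - (total - buy)) / total,  # assumed formula
        "vwap": vwap,
        "vwap_deviation": median(prices) - vwap,
        "price_momentum": prices[-1] - prices[0],
        "wallet_hhi": sum((v / total) ** 2 for v in by_wallet.values()),
    }
```

Under the share-based definition assumed here, `wallet_hhi` equals 1 when a single wallet accounts for all volume and approaches 0 as volume spreads evenly across many wallets, which matches the 0=dispersed, 1=dominated reading in the table.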