Complete guide to using the BoardGameGeek Data Extractor library.
pip install bgg-extractor
git clone https://github.com/hantablack9/BoardGameGeek-Data-Extractor.git
cd BoardGameGeek-Data-Extractor
pip install -e .
uv add bgg-extractor
# or from local source
uv add ../BoardGameGeek-Data-Extractor
A BGG API token is required for all operations. Provide it in one of three ways:
# Windows PowerShell
$env:BGG_API_TOKEN="your_token_here"
# Linux/Mac
export BGG_API_TOKEN="your_token_here"
Create a .env file in your project root:
BGG_API_TOKEN=your_token_here
The library will automatically load this file.
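Token resolution typically follows a simple precedence: an explicitly passed token wins, otherwise the environment (which a loaded .env file also populates) is consulted. The sketch below illustrates that pattern; `resolve_token` is a hypothetical helper, not part of bgg-extractor's public API:

```python
import os

def resolve_token(explicit=None):
    """Resolve the API token: an explicit argument takes precedence,
    then the BGG_API_TOKEN environment variable (which a loaded .env
    file also populates). Illustrative helper, not library API."""
    token = explicit or os.environ.get("BGG_API_TOKEN")
    if not token:
        raise RuntimeError("BGG_API_TOKEN is not set")
    return token
```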
from bgg_extractor import BGGClient
client = BGGClient(token="your_token_here")
Note: Get your BGG API token from your BoardGameGeek account settings.
from bgg_extractor import search, get_things, save_json
# Search for games
results = search("Catan")
print(f"Found {len(results.items)} results")
# Get detailed information
games = get_things([13, 174430], stats=True) # Catan and Gloomhaven
# Save to file
save_json(games.items, "games.json")
# Search for games
bgg-extractor search --query "Gloomhaven" --output results.json
# Get game details
bgg-extractor things --ids 174430 --stats --output gloomhaven.json
# Get user collection
bgg-extractor collection --username eekspider --stats --output collection.csv
Search for board games by name.
bgg-extractor search --query "Wingspan" --output results.json
bgg-extractor search --query "Pandemic" --type boardgame --exact
Options:
--query, -q: Search query (required)
--type, -t: Filter by type (boardgame, rpg, etc.)
--exact: Exact match only
--output, -o: Output file path
--format, -f: Output format (json or csv)

Get detailed information about specific games.
bgg-extractor things --ids 174430 13 --stats --output games.json
Options:
--ids: Game IDs (required, space-separated)
--type: Filter by type
--stats: Include statistics
--videos: Include videos
--versions: Include version information
--output, -o: Output file path
--format, -f: Output format (json or csv)

Get a user’s game collection.
bgg-extractor collection --username eekspider --stats --output collection.csv
Options:
--username, -u: BGG username (required)
--stats: Include statistics
--brief: Return abbreviated results
--subtype: Item subtype filter
--output, -o: Output file path
--format, -f: Output format (json or csv)

Get a user’s play history.
bgg-extractor plays --username eekspider --output plays.json
Options:
--username, -u: BGG username (required)
--id: Filter by specific game ID
--mindate: Minimum date (YYYY-MM-DD)
--maxdate: Maximum date (YYYY-MM-DD)
--output, -o: Output file path
--format, -f: Output format (json or csv)

Simple, blocking functions perfect for scripts and notebooks:
from bgg_extractor import (
search,
get_things,
get_collection,
get_plays,
get_user,
save_json,
save_csv
)
# All functions handle async internally
results = search("Wingspan")
games = get_things([266192], stats=True)
collection = get_collection("eekspider", stats=True)
For maximum performance in async applications:
import asyncio
from bgg_extractor import BGGClient
async def fetch_data():
    async with BGGClient(token="your_token") as client:
        # Search
        results = await client.search("Gloomhaven")
        # Get games
        games = await client.get_thing([174430], stats=True)
        # Get collection
        collection = await client.get_collection("username", stats=True)
        return games
# Run async function
games = asyncio.run(fetch_data())
games = get_things([13], stats=True)
for game in games.items:
    print(f"Name: {game.name}")
    print(f"Year: {game.yearpublished}")
    print(f"Players: {game.minplayers}-{game.maxplayers}")
    print(f"Description: {game.description[:100]}...")
The BGG API has a 20-item limit per request:
from bgg_extractor import BGGClient
import asyncio
async def fetch_many_games(game_ids):
    batch_size = 20
    all_games = []
    async with BGGClient() as client:
        for i in range(0, len(game_ids), batch_size):
            batch = game_ids[i:i+batch_size]
            print(f"Fetching batch {i//batch_size + 1}...")
            result = await client.get_thing(batch, stats=True)
            all_games.extend(result.items)
    return all_games
game_ids = list(range(1, 101))
games = asyncio.run(fetch_many_games(game_ids))
from bgg_extractor import save_json, save_csv
# Save as JSON
save_json(games.items, "games.json")
# Save as CSV
save_csv(games.items, "games.csv")
from bgg_extractor.transform import models_to_list, model_to_dict
import pandas as pd
# Convert to list of dictionaries
games_dict = models_to_list(games.items)
# Create DataFrame
df = pd.DataFrame(games_dict)
print(df.head())
# Save DataFrame
df.to_csv("games_processed.csv", index=False)
For large datasets, stream directly to disk:
import json
from bgg_extractor import BGGClient
async def stream_to_disk(game_ids, output_file):
    batch_size = 20
    async with BGGClient() as client:
        with open(output_file, 'w') as f:
            for i in range(0, len(game_ids), batch_size):
                batch = game_ids[i:i+batch_size]
                result = await client.get_thing(batch, stats=True)
                # Write each game as a JSON line
                for game in result.items:
                    f.write(json.dumps(game.model_dump()) + '\n')
# Later, read back with pandas
import pandas as pd
df = pd.read_json("games.jsonl", lines=True)
Solution: Set the BGG_API_TOKEN environment variable or create a .env file.
Solution: The BGG API limits requests to 20 items. Use batch processing (see examples above).
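The batching pattern reduces to a small helper that splits an ID list into API-sized chunks. `chunked` is a hypothetical name used for illustration, not a bgg-extractor function:

```python
def chunked(ids, size=20):
    """Yield successive batches of at most `size` IDs.
    The BGG API's per-request maximum is 20 items.
    Hypothetical helper, shown for illustration only."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]
```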
Solution: Use the async API directly with await in Jupyter notebooks instead of sync wrappers.
# Instead of:
results = search("Catan") # Won't work in Jupyter
# Use:
async with BGGClient(token=token) as client:
    results = await client.search("Catan")
Solution: The library includes automatic throttling (2-second delay between requests). If you still encounter issues, increase min_delay:
from bgg_extractor import BGGClient
async with BGGClient(min_delay=5.0) as client:  # 5-second delay
    results = await client.search("Catan")
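Conceptually, `min_delay` throttling just enforces a minimum gap between consecutive requests. The sketch below illustrates the idea; it is not bgg-extractor's internal implementation, which may differ:

```python
import time

class MinDelayThrottle:
    """Sketch of enforcing a minimum delay between requests.
    Illustrative only; the library's internal throttle may differ."""

    def __init__(self, min_delay=2.0):
        self.min_delay = min_delay
        self._last = float("-inf")  # no previous request yet

    def wait(self):
        # Sleep just long enough that consecutive calls are at least
        # `min_delay` seconds apart.
        remaining = self.min_delay - (time.monotonic() - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()
```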
The BGG API sometimes queues requests. The library automatically retries up to 12 times with exponential backoff. If needed, adjust:
async with BGGClient(max_poll_attempts=20) as client:
    results = await client.search("Catan")
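An exponential-backoff schedule grows the wait doubling per attempt, usually up to some cap. The base delay and cap below are assumptions for illustration; the library's actual values may differ (only the 12-attempt default comes from the docs above):

```python
def backoff_delays(base=2.0, attempts=12, cap=60.0):
    """Exponential backoff schedule: base * 2**n per attempt, capped.
    `base` and `cap` are assumed values, not bgg-extractor's exact ones."""
    return [min(base * (2 ** n), cap) for n in range(attempts)]
```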