
BGG Extractor User Guide

Complete guide to using the BoardGameGeek Data Extractor library.

Table of Contents

Installation
Configuration
Quick Start
CLI Usage
Python Library Usage
Working with Data
Troubleshooting
Getting Help

Installation

Via pip (when published to PyPI)

pip install bgg-extractor

From source

git clone https://github.com/hantablack9/BoardGameGeek-Data-Extractor.git
cd BoardGameGeek-Data-Extractor
pip install -e .

Via uv

uv add bgg-extractor
# or from local source
uv add ../BoardGameGeek-Data-Extractor

Configuration

BGG API Token (Required)

The BGG API token is mandatory for all operations. Set it up in one of three ways:

Option 1: Environment Variable

# Windows PowerShell
$env:BGG_API_TOKEN="your_token_here"

# Linux/Mac
export BGG_API_TOKEN="your_token_here"

Option 2: .env File

Create a .env file in your project root:

BGG_API_TOKEN=your_token_here

The library will automatically load this file.

Option 3: Pass Directly in Code

from bgg_extractor import BGGClient

client = BGGClient(token="your_token_here")

Note: Get your BGG API token from your BoardGameGeek account settings.

Quick Start

Python Library

from bgg_extractor import search, get_things, save_json

# Search for games
results = search("Catan")
print(f"Found {len(results.items)} results")

# Get detailed information
games = get_things([13, 174430], stats=True)  # Catan and Gloomhaven

# Save to file
save_json(games.items, "games.json")

Command Line

# Search for games
bgg-extractor search --query "Gloomhaven" --output results.json

# Get game details
bgg-extractor things --ids 174430 --stats --output gloomhaven.json

# Get user collection
bgg-extractor collection --username eekspider --stats --output collection.csv

CLI Usage

Available Commands

search

Search for board games by name.

bgg-extractor search --query "Wingspan" --output results.json
bgg-extractor search --query "Pandemic" --type boardgame --exact

Options:

--query: Search string (required)
--type: Filter results by item type, e.g. boardgame
--exact: Match the name exactly
--output: Path to write results to

things

Get detailed information about specific games.

bgg-extractor things --ids 174430 13 --stats --output games.json

Options:

--ids: One or more game IDs (required)
--stats: Include rating statistics
--output: Path to write results to

collection

Get a user’s game collection.

bgg-extractor collection --username eekspider --stats --output collection.csv

Options:

--username: BGG username (required)
--stats: Include rating statistics
--output: Path to write results to (.json or .csv)

plays

Get a user’s play history.

bgg-extractor plays --username eekspider --output plays.json

Options:

--username: BGG username (required)
--output: Path to write results to

Python Library Usage

Sync API

Simple, blocking functions perfect for scripts and notebooks:

from bgg_extractor import (
    search,
    get_things,
    get_collection,
    get_plays,
    get_user,
    save_json,
    save_csv
)

# All functions handle async internally
results = search("Wingspan")
games = get_things([266192], stats=True)
collection = get_collection("eekspider", stats=True)

Async API (For Advanced Use)

For maximum performance in async applications:

import asyncio
from bgg_extractor import BGGClient

async def fetch_data():
    async with BGGClient(token="your_token") as client:
        # Search
        results = await client.search("Gloomhaven")

        # Get games
        games = await client.get_thing([174430], stats=True)

        # Get collection
        collection = await client.get_collection("username", stats=True)

        return games

# Run async function
games = asyncio.run(fetch_data())

Working with Data

Accessing Game Information

games = get_things([13], stats=True)

for game in games.items:
    print(f"Name: {game.name}")
    print(f"Year: {game.yearpublished}")
    print(f"Players: {game.minplayers}-{game.maxplayers}")
    print(f"Description: {game.description[:100]}...")
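When stats=True is passed, rating statistics are attached to each item. The exact field names depend on the library's models, so the sketch below uses illustrative names (stats.average, stats.usersrated) on a stand-in object, with getattr guards so a missing field doesn't raise:

```python
from types import SimpleNamespace

# Stand-in for a returned game item; the real attribute names may differ.
game = SimpleNamespace(
    name="Catan",
    stats=SimpleNamespace(average=7.1, usersrated=120000),
)

# Guard with getattr in case stats were not requested or a field is absent.
stats = getattr(game, "stats", None)
if stats is not None:
    print(f"Average rating: {getattr(stats, 'average', 'n/a')}")
    print(f"Ratings count: {getattr(stats, 'usersrated', 'n/a')}")
```

Check your model definitions (or a sample of `game.model_dump()`) for the actual stats field names before relying on them.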

Batch Processing with Rate Limits

The BGG API has a 20-item limit per request:

from bgg_extractor import BGGClient
import asyncio

async def fetch_many_games(game_ids):
    batch_size = 20
    all_games = []

    async with BGGClient() as client:
        for i in range(0, len(game_ids), batch_size):
            batch = game_ids[i:i+batch_size]
            print(f"Fetching batch {i//batch_size + 1}...")
            result = await client.get_thing(batch, stats=True)
            all_games.extend(result.items)

    return all_games

game_ids = list(range(1, 101))
games = asyncio.run(fetch_many_games(game_ids))

Saving Data

from bgg_extractor import save_json, save_csv

# Save as JSON
save_json(games.items, "games.json")

# Save as CSV
save_csv(games.items, "games.csv")

Data Transformation

from bgg_extractor.transform import models_to_list, model_to_dict
import pandas as pd

# Convert to list of dictionaries
games_dict = models_to_list(games.items)

# Create DataFrame
df = pd.DataFrame(games_dict)
print(df.head())

# Save DataFrame
df.to_csv("games_processed.csv", index=False)

Stream to Disk (Memory Efficient)

For large datasets, stream directly to disk:

import asyncio
import json
from bgg_extractor import BGGClient

async def stream_to_disk(game_ids, output_file):
    batch_size = 20

    async with BGGClient() as client:
        with open(output_file, 'w') as f:
            for i in range(0, len(game_ids), batch_size):
                batch = game_ids[i:i+batch_size]
                result = await client.get_thing(batch, stats=True)

                # Write each game as a JSON line
                for game in result.items:
                    f.write(json.dumps(game.model_dump()) + '\n')

asyncio.run(stream_to_disk(list(range(1, 101)), "games.jsonl"))

# Later, read back with pandas
import pandas as pd
df = pd.read_json("games.jsonl", lines=True)
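Independent of the BGG client, JSON Lines is just newline-delimited JSON, so the file can also be read back with the standard library. A minimal round trip (demo.jsonl is a throwaway filename):

```python
import json

rows = [{"id": 13, "name": "Catan"}, {"id": 174430, "name": "Gloomhaven"}]

# Write one JSON object per line.
with open("demo.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Read it back line by line.
with open("demo.jsonl") as f:
    loaded = [json.loads(line) for line in f]

print(loaded[0]["name"])  # Catan
```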

Troubleshooting

Common Issues

“BGG_API_TOKEN is required”

Solution: Set the BGG_API_TOKEN environment variable or create a .env file.

“Cannot load more than 20 items”

Solution: The BGG API limits requests to 20 items. Use batch processing (see examples above).

“This event loop is already running” (Jupyter)

Solution: Use the async API directly with await in Jupyter notebooks instead of sync wrappers.

# Instead of:
results = search("Catan")  # Won't work in Jupyter

# Use:
async with BGGClient(token=token) as client:
    results = await client.search("Catan")

Rate Limiting / 429 Errors

Solution: The library includes automatic throttling (2-second delay between requests). If you still encounter issues, increase min_delay:

from bgg_extractor import BGGClient

async with BGGClient(min_delay=5.0) as client:  # 5-second delay
    results = await client.search("Catan")

202 Queued Responses

The BGG API sometimes queues requests. The library automatically retries up to 12 times with exponential backoff. If needed, adjust:

async with BGGClient(max_poll_attempts=20) as client:
    results = await client.search("Catan")

Getting Help