Introduction
The Data Mage site is an exploration of applying data science to Magic: The Gathering (MTG). MTG is a collectible card game known for its popularity, long history, and intricate gameplay, and it offers a wealth of data and countless opportunities for analysis.
Although I’m not an MTG expert, I played briefly in 1999-2000 and recently returned to the game after my son became interested. This project is a way for me to deepen my understanding of MTG and provide insights that others might find useful.
I’ll analyze three main aspects of MTG: gameplay, collecting, and economics.
Given the complexity of gameplay, my analysis will cover various perspectives: aggregate win rates, micro-level decisions like combat resolution and mulligan choices, and evaluations of components such as card mana efficiency. I’ll also explore the metagame, including deck building, play formats, and drafting strategies.
The collecting aspect of MTG is vast, with nearly 100,000 cards printed over the game’s history. I’ll examine this by looking into aspects like the probability distributions of collector booster packs.
Finally, the economics of MTG, particularly the secondary market for singles, is of great interest to me. I’ll focus on predicting card prices and analyzing price trends for new sets.
1 Data Science
MTG can be fodder for many aspects of data science. Here is a sampler of concepts I would like to explore:
- Regression: Predict a card’s expected mana cost given its attributes and keywords (a small sketch follows this list).
- LLMs: Generate numeric representations of card descriptions with encoder LLMs, and use those representations for downstream predictive tasks.
- Network Science: Build bipartite graphs of deck-card relationships and identify communities in the one-mode card projection.
- Graph Neural Networks: Predict a deck’s win rate from its card composition.
- Bayesian Inference: Compute the posterior distribution of a booster pack’s value given the pack composition and secondary market prices.
- Time-Series Analysis: Predict a card’s price on the secondary market \(d\) days after set release.
- Hidden Markov Models: Estimate the board state, for example the states from Quadrant Theory: ‘opening’, ‘parity’, ‘winning’, and ‘losing’.
- Reinforcement Learning: Maximize the outcome of the combat phase, given the board state of potential attackers and blockers.
- Optimization: Optimize a deck’s win rate, subject to constraints such as maximum market cost or number of mythic rares.
- Game Theory: For mulligans, calculate the expected value of a hand using utility theory.
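To make the first item above concrete, here is a minimal regression sketch. The feature columns (power, toughness, keyword_count) and the tiny hand-made table are hypothetical placeholders; in practice the features would be derived from MTGJSON card attributes.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical feature table; real features would come from MTGJSON card
# attributes and keyword annotations.
cards = pd.DataFrame({
    "power":         [2, 4, 1, 3, 5, 2, 6, 1],
    "toughness":     [2, 4, 3, 3, 5, 1, 6, 1],
    "keyword_count": [1, 0, 2, 1, 0, 2, 1, 0],
    "mana_value":    [2, 4, 3, 3, 5, 3, 7, 1],
})

X = cards[["power", "toughness", "keyword_count"]]
y = cards["mana_value"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit a simple linear model and report held-out error.
model = LinearRegression().fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```

A real model would of course use many more features (color, card type, keyword indicators) and thousands of cards, and likely something more flexible than a linear fit.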
2 Play Format
For gameplay, I’ll look at limited formats, such as Draft and Sealed Deck. Constructed formats, such as Standard, Modern, and Legacy, are out of scope for now, due to the complexity of the metagame and the vast number of cards available for deck construction.
Draft play also lets us study three types of player skill: picking the best cards from the draft pool given what one has already drafted, constructing a deck from the drafted cards, and playing that deck in a tournament setting.
Sealed and constructed formats are also interesting, but exclude the drafting skill. Constructed formats also require a deep understanding of the metagame, which is out of scope for now.
3 Data Sources
3.1 Card Data Sources
I’ll use the card data generously made available by the tireless folks at the open-source project MTGJSON. MTGJSON provides a comprehensive database of MTG cards, including card attributes, card text, and card prices. The data is available in JSON format, which I’ll convert to a pandas DataFrame for analysis.
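As a minimal sketch of that conversion (assuming a locally downloaded AllPrintings.json file whose top-level "data" key maps set codes to set objects, each containing a "cards" list; check the MTGJSON docs for the exact schema of the release you use):

```python
import json

import pandas as pd

# Assumes AllPrintings.json has been downloaded from MTGJSON, and that its
# top-level "data" key maps set codes to set objects, each with a "cards" list.
with open("AllPrintings.json", encoding="utf-8") as f:
    sets = json.load(f)["data"]

# Flatten every set's card list into a single DataFrame of card attributes.
all_cards = [card for set_obj in sets.values() for card in set_obj.get("cards", [])]
cards = pd.json_normalize(all_cards)

# Column names like "name", "rarity", and "manaValue" follow the MTGJSON card
# schema; adjust if the release you download differs.
print(cards[["name", "rarity", "manaValue"]].head())
```

From there, the usual pandas tooling (filtering, grouping, joining in price tables) applies.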
MTGJSON sources a lot of data from Scryfall, which has an excellent webapp for exploring MTG card data.
MTGJSON sources its booster pack composition data from the open-source code provided by taw on GitHub, which contains estimated booster pack composition probabilities. He also provides a webapp at mtg.wtf. Note that booster pack composition is proprietary information of Wizards of the Coast, so these probabilities are estimates.
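As a toy illustration of how such estimated probabilities could feed into an analysis, the sketch below computes the expected value of a pack’s rare slot. The probabilities and prices are placeholder numbers I made up, not figures from mtg.wtf or MTGJSON.

```python
# Placeholder numbers for illustration only; real estimates would come from the
# composition data above and observed secondary-market prices.
rare_slot_probabilities = {"rare": 6 / 7, "mythic": 1 / 7}
average_price_usd = {"rare": 0.75, "mythic": 3.50}

# Expected value = sum over rarities of P(rarity) * average price of that rarity.
expected_value = sum(
    p * average_price_usd[rarity] for rarity, p in rare_slot_probabilities.items()
)
print(f"Expected value of the rare slot: ${expected_value:.2f}")
```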
See the mtgjson-data-intro notebook for more details on the data sources.
3.2 Draft Play Data Sources
For draft play, I will use data from 17lands, which compiles data from its user base to provide draft pick data. The data includes the draft pick order, the cards picked, and the win rate of the deck.
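To give a flavor of what I can do with that data, here is a minimal sketch that computes a per-card win rate from a 17lands-style game-data export. The file name and column layout (a boolean won column plus one deck_<card name> count column per card) are assumptions for illustration; the actual export format should be checked against the 17lands documentation.

```python
import pandas as pd

# Hypothetical 17lands-style export: one row per game, a boolean "won" column,
# and one "deck_<card name>" column with the number of copies in the deck.
games = pd.read_csv("draft_game_data.csv")

deck_cols = [c for c in games.columns if c.startswith("deck_")]
records = []
for col in deck_cols:
    in_deck = games[games[col] > 0]  # games where the card was in the deck
    if len(in_deck) == 0:
        continue
    records.append(
        {
            "card": col.removeprefix("deck_"),
            "games": len(in_deck),
            "win_rate": in_deck["won"].mean(),
        }
    )

win_rates = pd.DataFrame(records).sort_values("win_rate", ascending=False)
print(win_rates.head(10))
```

This games-in-deck win rate is a crude metric, but it is a natural starting point before moving on to richer pick-order analyses.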
See the draft-data-intro notebook for more information on the draft data.
4 Code
Anyone interested in the code for these analyses can find it in the GitHub repository.