The Data Mage
1 Introduction
Welcome to Data Mage, a journey into the world of Magic: The Gathering (MTG) through the lens of data science. MTG is a collectible card game celebrated for its popularity, rich history, and intricate gameplay, offering a treasure trove of data ripe for analysis.
I was big into MTG in middle school, from 1998 to 2000. After a long hiatus, I caught MTG fever again with my son, and we’ve been playing avidly for the past year. I started this project to deepen my understanding of the game and to share insights that others might find valuable.
This exploration covers three main facets of MTG: gameplay, collecting, and economics.
On the gameplay side, I’ll analyze the game at several levels: aggregate win rates, micro-level decisions such as combat resolution and mulligan choices, and card-level measures such as mana efficiency. I’ll also look at the metagame, including deck building, play formats, and drafting strategies.
The collecting aspect is vast, with nearly 100,000 card printings over the game’s history. I’ll explore topics such as the probability distributions of collector booster pack contents.
Finally, the economics of MTG, particularly the secondary market for singles, is of great interest. I’ll focus on predicting card prices and analyzing price trends for new sets.
So far, I have completed some set analyses for MTG Arena drafts.
2 Data Science Applications
Magic: The Gathering offers a fascinating playground for data science exploration. Here are some concepts I’m excited to delve into:
- Exploratory Data Analysis: Understanding the distribution of card attributes, mana costs, and rarities. My analyses by draft set are a first pass at this.
- Regression Analysis: Predicting a card’s expected mana cost based on its attributes and keywords, helping to understand the balance and design of cards.
- Large Language Models (LLMs): Using encoder LLMs to generate numeric representations from card descriptions, which can be leveraged for various predictive tasks.
- Network Science: Analyzing the relationships between decks and cards using bipartite graphs, and identifying communities within the card network through one-mode projections.
- Graph Neural Networks: Predicting a deck’s win rate based on its card composition, utilizing advanced neural network architectures designed for graph data.
- Bayesian Inference: Estimating the posterior distribution of a booster pack’s value given its composition and secondary market prices, providing insights into pack value.
- Time-Series Analysis: Forecasting card prices on the secondary market days after a set’s release to understand and anticipate market trends.
- Hidden Markov Models: Estimating the game board state by identifying phases like ‘opening’, ‘parity’, ‘winning’, or ‘losing’ using concepts like Quadrant Theory.
- Reinforcement Learning: Maximizing combat outcomes by making optimal decisions during the combat phase, given the board state of attacking and defending creatures.
- Optimization: Enhancing deck win rates by optimizing card selection under constraints like maximum budget or rarity limits.
- Game Theory: Applying utility theory to calculate the expected value of hands during mulligan decisions, improving strategic choices.
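To make the network-science item above more concrete, here is a minimal sketch of a one-mode projection using only the Python standard library. The deck lists are made-up toy data and `project_cards` is a hypothetical helper name, not code from this project. Two cards are linked when they appear in the same deck, and the edge weight counts how many decks they share.

```python
from collections import Counter
from itertools import combinations

# Toy bipartite data (assumption): each deck maps to the set of cards it contains.
decks = {
    "deck_a": {"Lightning Bolt", "Shock", "Mountain"},
    "deck_b": {"Lightning Bolt", "Mountain", "Goblin Guide"},
    "deck_c": {"Counterspell", "Island"},
}


def project_cards(deck_to_cards):
    """One-mode projection onto cards: weight = number of decks two cards share."""
    weights = Counter()
    for cards in deck_to_cards.values():
        # Sort so each unordered pair is always stored under the same key.
        for pair in combinations(sorted(cards), 2):
            weights[pair] += 1
    return weights


card_graph = project_cards(decks)
for (card_a, card_b), weight in sorted(card_graph.items()):
    print(f"{card_a} -- {card_b}: {weight}")
```

Community detection would then run on this weighted card graph. Basic lands such as Mountain appear in a large share of decks, so a real analysis would probably drop or down-weight them before looking for communities.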
3 Focus on Limited Formats
In my analyses, I’ll concentrate on Limited formats such as Draft and Sealed Deck. Constructed formats like Standard, Modern, and Legacy are currently beyond the scope of this project because of their complex metagames and the vast number of cards available for deck construction.
Draft play offers a unique opportunity to study multiple facets of player skill:
- Drafting Skill: Selecting the best card from each pack given the cards you’ve already picked.
- Deck Construction Skill: Building an effective deck from your drafted cards.
- Gameplay Skill: Playing the deck effectively in a tournament setting.
While Sealed and Constructed formats are also intriguing, they exclude the drafting component. Additionally, Constructed formats require an in-depth understanding of the metagame, which I’m setting aside for now to maintain a manageable scope.
4 Data Sources
4.1 Card Data
I’m utilizing card data generously provided by the tireless team behind the open-source project MTGJSON. MTGJSON offers a comprehensive database of MTG cards, including attributes, text, and prices. The data is available in JSON format, which I’ll convert into pandas DataFrames for analysis.
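As a rough sketch of the JSON-to-DataFrame step, the snippet below parses a small hand-written payload and builds one row per card. It assumes a set-file layout with a top-level `data` object that holds a `cards` list, and it assumes the field names `name`, `manaValue`, `rarity`, and `colors`; check these against the MTGJSON documentation before relying on them. The column `color_key` is a name introduced here for illustration.

```python
import json

import pandas as pd

# Hand-written sample that imitates the assumed layout of an MTGJSON set file:
# a top-level "data" object holding a "cards" list. Not real MTGJSON output.
raw = """
{
  "data": {
    "code": "DOM",
    "cards": [
      {"name": "Llanowar Elves", "manaValue": 1, "rarity": "common", "colors": ["G"]},
      {"name": "Serra Angel", "manaValue": 5, "rarity": "uncommon", "colors": ["W"]},
      {"name": "Karn, Scion of Urza", "manaValue": 4, "rarity": "mythic", "colors": []}
    ]
  }
}
"""

payload = json.loads(raw)

# One row per card; list-valued fields such as "colors" stay as Python lists.
cards = pd.DataFrame(payload["data"]["cards"])

# Flatten the color list into a string so it can be grouped on ("" = colorless).
cards["color_key"] = cards["colors"].apply(lambda colors: "".join(colors))

print(cards[["name", "manaValue", "rarity", "color_key"]])
print("Mean mana value:", cards["manaValue"].mean())
```

For a downloaded file, `json.loads(raw)` would be replaced by `json.load` on an open file handle; the rest of the sketch stays the same as long as the layout assumption holds.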
MTGJSON sources much of its data from Scryfall, which also hosts an excellent web app for exploring MTG card information.
For booster pack composition data, MTGJSON references estimates provided by taw in his GitHub repository. Taw also offers a web app at mtg.wtf. Please note that exact booster pack compositions are proprietary to Wizards of the Coast, so these probabilities are estimates rather than official figures.
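To show how estimated slot probabilities could be combined with prices, here is a small sketch that computes the expected value of a pack. Every number in it is made up for illustration, the slot structure is a simplified assumption rather than the format used by MTGJSON or taw, and `expected_pack_value` is a hypothetical helper. It produces a single point estimate, not a posterior distribution.

```python
# All numbers below are made up for illustration (assumption).
# Each slot is (number of cards in the slot, {rarity: probability}).
pack_slots = [
    (10, {"common": 1.0}),
    (3, {"uncommon": 1.0}),
    (1, {"rare": 0.875, "mythic": 0.125}),
]

# Hypothetical average secondary-market price per rarity, in dollars.
avg_price = {"common": 0.05, "uncommon": 0.15, "rare": 1.00, "mythic": 5.00}


def expected_pack_value(slots, prices):
    """Sum, over slots, of card count times the probability-weighted price."""
    total = 0.0
    for count, rarity_probs in slots:
        if abs(sum(rarity_probs.values()) - 1.0) > 1e-9:
            raise ValueError("slot probabilities must sum to 1")
        slot_value = sum(prob * prices[rarity] for rarity, prob in rarity_probs.items())
        total += count * slot_value
    return total


pack_ev = expected_pack_value(pack_slots, avg_price)
print(f"Expected pack value: ${pack_ev:.2f}")
```

Because the probabilities are themselves estimates, any value computed this way inherits their uncertainty.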
4.2 Draft Play Data
For draft play analysis, I’ll use data from 17Lands. They compile data from their user base to provide detailed draft pick information, including pick order, cards selected, and deck win rates.
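The kind of summary this data supports can be sketched with a toy pick log. The column names `draft_id`, `pick_number`, `card`, and `deck_win_rate` are hypothetical and chosen for readability; they are not the 17Lands schema, and the rows are invented. The sketch computes, per card, how often it was picked, its average pick position, and the average win rate of the decks that picked it.

```python
import pandas as pd

# Toy pick log (assumption): one row per pick. Column names are hypothetical
# and the values are invented; this is not the 17Lands export format.
picks = pd.DataFrame(
    {
        "draft_id": ["d1", "d1", "d2", "d2", "d3", "d3"],
        "pick_number": [1, 2, 1, 2, 1, 2],
        "card": ["Serra Angel", "Shock", "Shock", "Serra Angel", "Serra Angel", "Opt"],
        "deck_win_rate": [0.60, 0.60, 0.40, 0.40, 0.70, 0.70],
    }
)

# Aggregate per card, then list the earliest-picked cards first.
summary = (
    picks.groupby("card")
    .agg(
        times_picked=("draft_id", "count"),
        avg_pick_number=("pick_number", "mean"),
        avg_deck_win_rate=("deck_win_rate", "mean"),
    )
    .sort_values("avg_pick_number")
)
print(summary)
```

A summary like this shows association only: a card that appears in winning decks may simply be picked more often by stronger drafters.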
5 Code Availability
If you’re interested in exploring the code behind these analyses, feel free to check out my GitHub repository.
6 Join the Journey
I invite you to follow along as I explore the intersections of data science and Magic: The Gathering. Whether you’re a data enthusiast, a seasoned planeswalker, or someone curious about the fusion of analytics and gaming, there’s something here for you. Stay tuned for updates, and feel free to reach out with thoughts or collaborations.