(1630, 1600, 30)
Summary
The code to wrangle the data. Deal with NAs, join data frames, etc.
1 Introduction
This data set is very large, in both memory and in size. Each row is a game, with identifiers for the draft event and the player. The columns have information on the given draft game, with the majority are for card counts per state, such as the number of a given card in the deck or the number of a given card in the opening hand.
Ideally I want to melt the data into a tall layout, with rows for each unique combination of card and game and columns for the card count states, such as deck, hand, etc. This will allow me to easily filter and group the data for analysis. Unfortunately, after attempting to wrangle in this format, I found that the data set is too large to handle in memory. I could play with making spares arrays, but I think that would be too much work for this data set.
For this notebook, I will keep the data in the wide layout and aggregate down into useful tables, such as per-game and per-player summaries.
1.1 Terms
The following terms are used in the data set: - game: a single game of MTG - match: a series of games between two players. Typically best of 1 or best of 3. - draft: an initial draft event, followed by a series of matches between a group of players. - build: a instance of a deck in a draft. Can be adjusted between games.
2 Initial Setup
Define Paths
Convert the csv file to parquet if needed.
Scan file into a lazy frame and set the schema.
Parse column names, such as the card name and the card state.
Check for extra underscores in card names.
Show first few rows of the data.
| expansion | event_type | draft_id | draft_time | game_time | build_index | match_number | game_number | rank | opp_rank | main_colors | splash_colors | on_play | num_mulligans | opp_num_mulligans | opp_colors | num_turns | won | user_n_games_bucket | user_game_win_rate_bucket |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | str | str | i64 | i64 | i64 | str | str | str | str | bool | i64 | i64 | str | i64 | bool | i64 | f64 |
Convert timestamps from strings to datetimes
3 Games Table
A table of each game, excludes card data.
Note that the ‘opp_rank’ does not have consistent type
| expansion | event_type | draft_id | draft_time | game_time | build_index | match_number | game_number | rank | opp_rank | main_colors | splash_colors | on_play | num_mulligans | opp_num_mulligans | opp_colors | num_turns | won | user_n_games_bucket | user_game_win_rate_bucket |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | datetime[μs] | datetime[μs] | i64 | i64 | i64 | str | str | str | str | bool | i64 | i64 | str | i64 | bool | i64 | f64 |
| "MKM" | "PremierDraft" | "8e1e92694dcd44e380720f2ec3d971… | 2024-02-06 17:48:51 | 2024-02-06 18:41:03 | 0 | 1 | 1 | null | "None" | "WB" | "G" | true | 0 | 0 | "WB" | 9 | true | 50 | 0.62 |
| "MKM" | "PremierDraft" | "8e1e92694dcd44e380720f2ec3d971… | 2024-02-06 17:48:51 | 2024-02-06 18:49:03 | 0 | 2 | 1 | "silver" | "None" | "WB" | "G" | true | 0 | 0 | "RG" | 9 | true | 50 | 0.62 |
4 Draft Table
A table of the aggregated draft data, excluding card data.
| draft_id | expansion | n_builds | n_matches | n_games | n_wins | win_rate | rank | opp_rank | main_colors | opp_colors | splash_colors | n_starts | n_muls | n_opp_muls | mean_turns | draft_time | first_game_time | last_game_time | n_losses |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | i64 | i64 | u32 | u32 | f64 | str | str | str | str | str | u32 | i64 | i64 | f64 | datetime[μs] | datetime[μs] | datetime[μs] | u32 |
| "dd1e9b19a8bf4b6d87d58798796063… | "MKM" | 0 | 7 | 7 | 4 | 0.571429 | "platinum" | "None" | "WU" | "BG" | null | 5 | 0 | 4 | 8.428571 | 2024-03-07 22:47:01 | 2024-03-07 23:11:35 | 2024-03-08 21:04:31 | 3 |
| "38e6baea32d944c59235e03931ff55… | "MKM" | 0 | 5 | 5 | 2 | 0.4 | "diamond" | "None" | "UG" | "WUB" | "B" | 2 | 0 | 0 | 9.6 | 2024-03-11 11:27:09 | 2024-03-11 11:49:45 | 2024-03-11 12:27:10 | 3 |
| "162e396bf1a448f7b1ef9ab97e7615… | "MKM" | 0 | 3 | 3 | 0 | 0.0 | "platinum" | "None" | "UBRG" | "B" | "W" | 1 | 0 | 0 | 5.333333 | 2024-02-06 20:36:34 | 2024-02-06 20:58:30 | 2024-02-06 21:05:37 | 3 |
| "1e15f17a131649db8fa64c2991c656… | "MKM" | 0 | 5 | 5 | 2 | 0.4 | "platinum" | "None" | "UG" | "WR" | "WB" | 1 | 0 | 1 | 9.0 | 2024-02-15 20:12:26 | 2024-02-15 20:37:50 | 2024-02-16 15:11:45 | 3 |
| "9f3f793e47264c7db21c6a43979c2c… | "MKM" | 0 | 7 | 7 | 4 | 0.571429 | "bronze" | "None" | "BR" | "WR" | null | 4 | 1 | 1 | 7.571429 | 2024-02-10 18:42:08 | 2024-02-10 19:02:46 | 2024-02-10 20:02:53 | 3 |
5 Card Data
| expansion | draft_id | match_number | game_number | build_index | opening_hand_Forest | drawn_Forest | tutored_Forest | deck_Forest | sideboard_Forest | opening_hand_Island | drawn_Island | tutored_Island | deck_Island | sideboard_Island | opening_hand_Mistveil Plains | drawn_Mistveil Plains | tutored_Mistveil Plains | deck_Mistveil Plains | sideboard_Mistveil Plains | opening_hand_Mountain | drawn_Mountain | tutored_Mountain | deck_Mountain | sideboard_Mountain | opening_hand_Plains | drawn_Plains | tutored_Plains | deck_Plains | sideboard_Plains | opening_hand_Swamp | drawn_Swamp | tutored_Swamp | deck_Swamp | sideboard_Swamp | opening_hand_A Killer Among Us | drawn_A Killer Among Us | … | deck_Vitu-Ghazi Inspector | sideboard_Vitu-Ghazi Inspector | opening_hand_Warleader's Call | drawn_Warleader's Call | tutored_Warleader's Call | deck_Warleader's Call | sideboard_Warleader's Call | opening_hand_Wispdrinker Vampire | drawn_Wispdrinker Vampire | tutored_Wispdrinker Vampire | deck_Wispdrinker Vampire | sideboard_Wispdrinker Vampire | opening_hand_Wojek Investigator | drawn_Wojek Investigator | tutored_Wojek Investigator | deck_Wojek Investigator | sideboard_Wojek Investigator | opening_hand_Worldsoul's Rage | drawn_Worldsoul's Rage | tutored_Worldsoul's Rage | deck_Worldsoul's Rage | sideboard_Worldsoul's Rage | opening_hand_Worldspine Wurm | drawn_Worldspine Wurm | tutored_Worldspine Wurm | deck_Worldspine Wurm | sideboard_Worldspine Wurm | opening_hand_Wrench | drawn_Wrench | tutored_Wrench | deck_Wrench | sideboard_Wrench | opening_hand_Yarus, Roar of the Old Gods | drawn_Yarus, Roar of the Old Gods | tutored_Yarus, Roar of the Old Gods | deck_Yarus, Roar of the Old Gods | sideboard_Yarus, Roar of the Old Gods |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | … | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 |
| "MKM" | "8e1e92694dcd44e380720f2ec3d971… | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 7 | 0 | 2 | 1 | 0 | 8 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| "MKM" | "8e1e92694dcd44e380720f2ec3d971… | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 7 | 0 | 0 | 1 | 0 | 8 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| "MKM" | "8e1e92694dcd44e380720f2ec3d971… | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 7 | 0 | 1 | 2 | 0 | 8 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| "MKM" | "8e1e92694dcd44e380720f2ec3d971… | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 7 | 0 | 0 | 2 | 0 | 8 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| "MKM" | "8e1e92694dcd44e380720f2ec3d971… | 5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 7 | 0 | 1 | 1 | 0 | 8 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 Summary Table
| expansion | first_game | last_game | max_games | win_rate | start_rate | n_games | n_mul | n_opp_mul | mean_turns | total_turns | max_turns | min_turns | n_drafts | n_matches | mean_matches | n_games_per_draft | n_cards | max_card | mean_land | max_land | min_land |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | datetime[μs] | datetime[μs] | i64 | f64 | f64 | u32 | i64 | i64 | f64 | i64 | i64 | i64 | u32 | i64 | f64 | f64 | i32 | i64 | f64 | i32 | i32 |
| "MKM" | 2024-02-06 16:24:44 | 2024-03-18 23:59:48 | 1 | 0.549798 | 0.49982 | 964377 | 114014 | 121101 | 9.278472 | 8947945 | 32 | 1 | 165251 | 965887 | 5.844969 | 5.835832 | 326 | 8 | 22.525862 | 38 | 9 |