Summary

The code to wrangle the data. Deal with NAs, join data frames, etc.

1 Introduction

This data set is very large, in both memory and in size. Each row is a game, with identifiers for the draft event and the player. The columns have information on the given draft game, with the majority are for card counts per state, such as the number of a given card in the deck or the number of a given card in the opening hand.

Ideally I want to melt the data into a tall layout, with rows for each unique combination of card and game and columns for the card count states, such as deck, hand, etc. This will allow me to easily filter and group the data for analysis. Unfortunately, after attempting to wrangle in this format, I found that the data set is too large to handle in memory. I could play with making spares arrays, but I think that would be too much work for this data set.

For this notebook, I will keep the data in the wide layout and aggregate down into useful tables, such as per-game and per-player summaries.

1.1 Terms

The following terms are used in the data set: - game: a single game of MTG - match: a series of games between two players. Typically best of 1 or best of 3. - draft: an initial draft event, followed by a series of matches between a group of players. - build: a instance of a deck in a draft. Can be adjusted between games.

2 Initial Setup

Define Paths

Convert the csv file to parquet if needed.

Scan file into a lazy frame and set the schema.

Parse column names, such as the card name and the card state.

(1630, 1600, 30)

Check for extra underscores in card names.

Show first few rows of the data.

shape: (0, 20)
expansion event_type draft_id draft_time game_time build_index match_number game_number rank opp_rank main_colors splash_colors on_play num_mulligans opp_num_mulligans opp_colors num_turns won user_n_games_bucket user_game_win_rate_bucket
str str str str str i64 i64 i64 str str str str bool i64 i64 str i64 bool i64 f64

Convert timestamps from strings to datetimes

3 Games Table

A table of each game, excludes card data.

Note that the ‘opp_rank’ does not have consistent type

shape: (2, 20)
expansion event_type draft_id draft_time game_time build_index match_number game_number rank opp_rank main_colors splash_colors on_play num_mulligans opp_num_mulligans opp_colors num_turns won user_n_games_bucket user_game_win_rate_bucket
str str str datetime[μs] datetime[μs] i64 i64 i64 str str str str bool i64 i64 str i64 bool i64 f64
"MKM" "PremierDraft" "8e1e92694dcd44e380720f2ec3d971… 2024-02-06 17:48:51 2024-02-06 18:41:03 0 1 1 null "None" "WB" "G" true 0 0 "WB" 9 true 50 0.62
"MKM" "PremierDraft" "8e1e92694dcd44e380720f2ec3d971… 2024-02-06 17:48:51 2024-02-06 18:49:03 0 2 1 "silver" "None" "WB" "G" true 0 0 "RG" 9 true 50 0.62

4 Draft Table

A table of the aggregated draft data, excluding card data.

shape: (5, 20)
draft_id expansion n_builds n_matches n_games n_wins win_rate rank opp_rank main_colors opp_colors splash_colors n_starts n_muls n_opp_muls mean_turns draft_time first_game_time last_game_time n_losses
str str i64 i64 u32 u32 f64 str str str str str u32 i64 i64 f64 datetime[μs] datetime[μs] datetime[μs] u32
"dd1e9b19a8bf4b6d87d58798796063… "MKM" 0 7 7 4 0.571429 "platinum" "None" "WU" "BG" null 5 0 4 8.428571 2024-03-07 22:47:01 2024-03-07 23:11:35 2024-03-08 21:04:31 3
"38e6baea32d944c59235e03931ff55… "MKM" 0 5 5 2 0.4 "diamond" "None" "UG" "WUB" "B" 2 0 0 9.6 2024-03-11 11:27:09 2024-03-11 11:49:45 2024-03-11 12:27:10 3
"162e396bf1a448f7b1ef9ab97e7615… "MKM" 0 3 3 0 0.0 "platinum" "None" "UBRG" "B" "W" 1 0 0 5.333333 2024-02-06 20:36:34 2024-02-06 20:58:30 2024-02-06 21:05:37 3
"1e15f17a131649db8fa64c2991c656… "MKM" 0 5 5 2 0.4 "platinum" "None" "UG" "WR" "WB" 1 0 1 9.0 2024-02-15 20:12:26 2024-02-15 20:37:50 2024-02-16 15:11:45 3
"9f3f793e47264c7db21c6a43979c2c… "MKM" 0 7 7 4 0.571429 "bronze" "None" "BR" "WR" null 4 1 1 7.571429 2024-02-10 18:42:08 2024-02-10 19:02:46 2024-02-10 20:02:53 3

5 Card Data

shape: (5, 1_635)
expansion draft_id match_number game_number build_index opening_hand_Forest drawn_Forest tutored_Forest deck_Forest sideboard_Forest opening_hand_Island drawn_Island tutored_Island deck_Island sideboard_Island opening_hand_Mistveil Plains drawn_Mistveil Plains tutored_Mistveil Plains deck_Mistveil Plains sideboard_Mistveil Plains opening_hand_Mountain drawn_Mountain tutored_Mountain deck_Mountain sideboard_Mountain opening_hand_Plains drawn_Plains tutored_Plains deck_Plains sideboard_Plains opening_hand_Swamp drawn_Swamp tutored_Swamp deck_Swamp sideboard_Swamp opening_hand_A Killer Among Us drawn_A Killer Among Us deck_Vitu-Ghazi Inspector sideboard_Vitu-Ghazi Inspector opening_hand_Warleader's Call drawn_Warleader's Call tutored_Warleader's Call deck_Warleader's Call sideboard_Warleader's Call opening_hand_Wispdrinker Vampire drawn_Wispdrinker Vampire tutored_Wispdrinker Vampire deck_Wispdrinker Vampire sideboard_Wispdrinker Vampire opening_hand_Wojek Investigator drawn_Wojek Investigator tutored_Wojek Investigator deck_Wojek Investigator sideboard_Wojek Investigator opening_hand_Worldsoul's Rage drawn_Worldsoul's Rage tutored_Worldsoul's Rage deck_Worldsoul's Rage sideboard_Worldsoul's Rage opening_hand_Worldspine Wurm drawn_Worldspine Wurm tutored_Worldspine Wurm deck_Worldspine Wurm sideboard_Worldspine Wurm opening_hand_Wrench drawn_Wrench tutored_Wrench deck_Wrench sideboard_Wrench opening_hand_Yarus, Roar of the Old Gods drawn_Yarus, Roar of the Old Gods tutored_Yarus, Roar of the Old Gods deck_Yarus, Roar of the Old Gods sideboard_Yarus, Roar of the Old Gods
str str i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64 i64
"MKM" "8e1e92694dcd44e380720f2ec3d971… 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 7 0 2 1 0 8 0 0 0 0 0 0 0 0 0 0 0 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"MKM" "8e1e92694dcd44e380720f2ec3d971… 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 7 0 0 1 0 8 0 0 0 0 0 0 0 0 0 0 0 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"MKM" "8e1e92694dcd44e380720f2ec3d971… 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 7 0 1 2 0 8 0 0 0 0 0 0 0 0 0 0 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"MKM" "8e1e92694dcd44e380720f2ec3d971… 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 7 0 0 2 0 8 0 0 0 0 0 0 0 0 0 0 0 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"MKM" "8e1e92694dcd44e380720f2ec3d971… 5 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 7 0 1 1 0 8 0 0 0 0 0 0 0 0 0 0 0 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

6 Summary Table

shape: (1, 22)
expansion first_game last_game max_games win_rate start_rate n_games n_mul n_opp_mul mean_turns total_turns max_turns min_turns n_drafts n_matches mean_matches n_games_per_draft n_cards max_card mean_land max_land min_land
str datetime[μs] datetime[μs] i64 f64 f64 u32 i64 i64 f64 i64 i64 i64 u32 i64 f64 f64 i32 i64 f64 i32 i32
"MKM" 2024-02-06 16:24:44 2024-03-18 23:59:48 1 0.549798 0.49982 964377 114014 121101 9.278472 8947945 32 1 165251 965887 5.844969 5.835832 326 8 22.525862 38 9