< December 2018 >
MonTueWedThuFriSatSun
26272829300102
03040506070809
10111213141516
17181920212223
24252627282930
31010203040506

Monday, 10 December 2018

04:00 PM

The 'knight on an infinite chessboard' puzzle: efficient simulation in R [Variance Explained] 04:00 PM, Monday, 10 December 2018 06:40 PM, Tuesday, 24 December 2019

Previously in this series:

I’ve recently been enjoying The Riddler: Fantastic Puzzles from FiveThirtyEight, a wonderful book from 538’s Oliver Roeder. Many of the probability puzzles can be productively solved through Monte Carlo simulations in R.

Here’s one that caught my attention:

Suppose that a knight makes a “random walk” on an infinite chessboard. Specifically, every turn the knight follows standard chess rules and moves to one of its eight accessible squares, each with probability 1/8.

What is the probability that after the twentieth move the knight is back on its starting square?

In this post I’ll show how I’d answer this question through simulation in R, with an eye on keeping the simulation fast and interpretable. As in many of my posts, we’ll take a “tidy approach” that focuses on the dplyr, tidyr, and ggplot2 packages.

Simulating a knight’s moves

The first question is how we can simulate a knight’s move randomly. There are eight positions that a knight can move to, following the rule of “two spaces in one direction, one in the other”.

center

It helps to break each possible move into its X and Y components:

  • X = 2, Y = 1
  • X = 2, Y = -1
  • X = 1, Y = 2
  • X = 1, Y = -2
  • X = -1, Y = 2
  • X = -1, Y = -2
  • X = -2, Y = 1
  • X = -2, Y = -1

Notice that the moves are always made up of one 2 and one 1, and that either, both, or neither can be negative.

The first step of the simulation (getting a 1 and a 2) can be done as follows for a single simulation:

# Random number between 1 and 2
move_x <- sample(2, 1)

# If x is 2, y is 1, and vice versa
move_y <- 3 - move_x

c(move_x, move_y)
## [1] 1 2

How about letting either number be negative? Well, sample(c(1, -1)) can get us a value that’s either 1 or -1, so we can multiply that by each.

move_x <- sample(2, 1) * sample(c(1, -1), 1)
move_y <- (3 - abs(move_x)) * sample(c(1, -1), 1)

c(move_x, move_y)
## [1]  2 -1

Now that we can sample a single move, we can try sampling many moves (say, 20) by adding 20, replace = TRUE to each.

move_x <- sample(2, 20, replace = TRUE) * sample(c(1, -1), 20, replace = TRUE)
move_y <- (3 - abs(move_x)) * sample(c(1, -1), 20, replace = TRUE)

move_x
##  [1]  2  1  1  1 -2  2 -2  1  2 -2 -1  1 -2 -2  2 -2  1 -2 -2 -2
move_y
##  [1]  1 -2 -2 -2  1  1  1 -2 -1 -1  2 -2 -1 -1 -1 -1 -2 -1  1 -1

Now that we’ve figured out how to simulate chess moves, we can go ahead with our simulation of positions.

Tidy simulation

The crossing() function from tidyr is wonderful for setting up Monte Carlo simulations. It creates a tbl_df with every combination of the inputs (it’s similar to the built-in expand.grid).

In this simulation we’ll want some number of trials and some number of turns. If we wanted to generate 3 trials that each included 3 chess turns, we could do:

crossing(trial = 1:3,
         turn = 1:3)
## # A tibble: 9 x 2
##   trial  turn
##   <int> <int>
## 1     1     1
## 2     1     2
## 3     1     3
## 4     2     1
## 5     2     2
## 6     2     3
## 7     3     1
## 8     3     2
## 9     3     3

In this experiment, we’ll simulate 100,000 trials (and, as the riddle specifies, 20 moves), and then bring in our vectorized approach to simulating chess moves. Notice we use n() (a special dplyr function) for the number of items.

# Use n() and replace = TRUE to vectorize the sampling above
crossing(trial = 1:100000,
         turn = 1:20) %>%
  mutate(move_x = sample(2, n(), replace = TRUE) *
           sample(c(1, -1), n(), replace = TRUE),
         move_y = (3 - abs(move_x)) *
           sample(c(1, -1), n(), replace = TRUE))
## # A tibble: 2,000,000 x 4
##    trial  turn move_x move_y
##    <int> <int>  <dbl>  <dbl>
##  1     1     1      1     -2
##  2     1     2      2     -1
##  3     1     3      1     -2
##  4     1     4     -1     -2
##  5     1     5     -2     -1
##  6     1     6     -1     -2
##  7     1     7      1     -2
##  8     1     8     -2      1
##  9     1     9      1     -2
## 10     1    10      2     -1
## # ... with 1,999,990 more rows

Now, because the knights both start at (0, 0), we can calculate the position based on the cumulative sum of moves the X direction and in the Y direction. We do this by adding a group_by() (since movement happens separately within each trial) and mutate() to the sequence.

turns <- crossing(trial = 1:100000,
                  turn = 1:20) %>%
  mutate(move_x = sample(2, n(), replace = TRUE) *
           sample(c(1, -1), n(), replace = TRUE),
         move_y = (3 - abs(move_x)) *
           sample(c(1, -1), n(), replace = TRUE)) %>%
  group_by(trial) %>%
  mutate(position_x = cumsum(move_x),
         position_y = cumsum(move_y)) %>%
  ungroup()

turns
## # A tibble: 2,000,000 x 6
##    trial  turn move_x move_y position_x position_y
##    <int> <int>  <dbl>  <dbl>      <dbl>      <dbl>
##  1     1     1      1      2          1          2
##  2     1     2      2      1          3          3
##  3     1     3      1      2          4          5
##  4     1     4     -2     -1          2          4
##  5     1     5     -1      2          1          6
##  6     1     6      2     -1          3          5
##  7     1     7      1     -2          4          3
##  8     1     8      2      1          6          4
##  9     1     9      2     -1          8          3
## 10     1    10      1     -2          9          1
## # ... with 1,999,990 more rows

Just like that, we’ve performed 100,000 trials of 20 sequential chess moves. Not bad!

Answering the questions

First, let’s answer the riddle’s question. How often is the knight back at its home space (0, 0) by the twentieth move?

turns %>%
  filter(turn == 20) %>%
  summarize(mean(position_x == 0 & position_y == 0))
## # A tibble: 1 x 1
##   `mean(position_x == 0 & position_y == 0)`
##                                       <dbl>
## 1                                   0.00627

The probability of being back on the home space is about .6%.

How does this compare to other turns?

library(ggplot2)
theme_set(theme_light())

by_turn <- turns %>%
  group_by(turn) %>%
  summarize(home = mean(position_x == 0 & position_y == 0))

by_turn %>%
  ggplot(aes(turn, home)) +
  geom_line() +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(y = "% home")

center

It looks like it’s impossible for the knight to be back home on an odd numbered turn (if you have a chessboard you can try it!). At an even numbered turn, the probability is 1/8 for the second turn (the probability the first move was immediately followed by the opposite move). It then goes down from there to the .006 we see above.

Where might the chess piece be by turn 20? A geom_tile is a nice visualization for this.

turns %>%
  filter(turn == 20) %>%
  count(position_x, position_y, sort = TRUE) %>%
  ggplot(aes(position_x, position_y, fill = n)) +
  geom_tile()

center

Finally, any time we look at change over time (especially in several dimensions like a chess board), we can use Thomas Pedersen’s gganimate package to give a more dynamic view.

library(gganimate)

turns %>%
  count(turn, position_x, position_y) %>%
  ggplot(aes(position_x, position_y, fill = n)) +
  geom_tile() +
  transition_manual(turn)

center

Besides showing how the knight “spreads” across the board, the animation communicates an insight that might not be noticeable otherwise (but which is relevant in chess): any given space on the chessboard is either an “odd # of moves” space or an “even # of moves” one. As we saw earlier, the home is an even space, so it can be reached on turn 20 but not on turns 19 or 21.

Conclusions

What have we learned about efficient simulation in R?

  • Think vectorizably: Notice that we generated all of our 2 million moves in a couple of calls to sample(). This helps keeps our simulation fast because functions like sample() have some computational overhead. If we’d simulated each individual move in a loop, this simulation would have taken a lot longer.
  • Some functions show up a lot in simulations. The crossing() and cumsum() functions are very different, but they’re both wildly useful for simulations. Make sure you learn and understand them!
  • Tidy simulation is useful for visualization This puzzle could have been solved a bit more computationally efficiently by creating matrices for move_x, move_y, position_x, and position_y, and operating on them instead. In my opinion it would have made the code a bit less readable. But more importantly, by working in a tidy format we were able to immediately pipe the results into ggplot2 and gganimate and create visualizations. These helped check our results and improve our intuition around the answer.

Until next time, happy simulating!

Feeds

FeedRSSLast fetchedNext fetched after
XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Bits of DNA XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
blogs.perl.org XML 12:00 AM, Tuesday, 18 January 2022 12:15 AM, Tuesday, 18 January 2022
Blue Collar Bioinformatics XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Boing Boing XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Epistasis Blog XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Futility Closet XML 12:00 AM, Tuesday, 18 January 2022 12:15 AM, Tuesday, 18 January 2022
gCaptain XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Hackaday XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
In between lines of code XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
InciWeb Incidents for California XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
LeafSpring XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Living in an Ivory Basement XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
LWN.net XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Mastering Emacs XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Planet Debian XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Planet Emacsen XML 12:00 AM, Tuesday, 18 January 2022 12:15 AM, Tuesday, 18 January 2022
RNA-Seq Blog XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
RStudio Blog - Latest Comments XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
RWeekly.org - Blogs to Learn R from the Community XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
The Adventure Blog XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
The Allium XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
Variance Explained XML 12:00 AM, Tuesday, 18 January 2022 12:30 AM, Tuesday, 18 January 2022
January 2022
MonTueWedThuFriSatSun
27282930310102
03040506070809
10111213141516
17181920212223
24252627282930
31010203040506
December 2021
MonTueWedThuFriSatSun
29300102030405
06070809101112
13141516171819
20212223242526
27282930310102
November 2021
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29300102030405
October 2021
MonTueWedThuFriSatSun
27282930010203
04050607080910
11121314151617
18192021222324
25262728293031
September 2021
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930010203
August 2021
MonTueWedThuFriSatSun
26272829303101
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
July 2021
MonTueWedThuFriSatSun
28293001020304
05060708091011
12131415161718
19202122232425
26272829303101
June 2021
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293001020304
May 2021
MonTueWedThuFriSatSun
26272829300102
03040506070809
10111213141516
17181920212223
24252627282930
31010203040506
April 2021
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272829300102
March 2021
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
February 2021
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
November 2020
MonTueWedThuFriSatSun
26272829303101
02030405060708
09101112131415
16171819202122
23242526272829
30010203040506
September 2020
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293001020304
July 2020
MonTueWedThuFriSatSun
29300102030405
06070809101112
13141516171819
20212223242526
27282930310102
June 2020
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29300102030405
May 2020
MonTueWedThuFriSatSun
27282930010203
04050607080910
11121314151617
18192021222324
25262728293031
April 2020
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930010203
February 2020
MonTueWedThuFriSatSun
27282930310102
03040506070809
10111213141516
17181920212223
24252627282901
January 2020
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930310102
December 2019
MonTueWedThuFriSatSun
25262728293001
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
November 2019
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728293001
October 2019
MonTueWedThuFriSatSun
30010203040506
07080910111213
14151617181920
21222324252627
28293031010203
August 2019
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272829303101
July 2019
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
June 2019
MonTueWedThuFriSatSun
27282930310102
03040506070809
10111213141516
17181920212223
24252627282930
May 2019
MonTueWedThuFriSatSun
29300102030405
06070809101112
13141516171819
20212223242526
27282930310102
April 2019
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29300102030405
March 2019
MonTueWedThuFriSatSun
25262728010203
04050607080910
11121314151617
18192021222324
25262728293031
February 2019
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728010203
January 2019
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293031010203
December 2018
MonTueWedThuFriSatSun
26272829300102
03040506070809
10111213141516
17181920212223
24252627282930
31010203040506
November 2018
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272829300102
October 2018
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
September 2018
MonTueWedThuFriSatSun
27282930310102
03040506070809
10111213141516
17181920212223
24252627282930
August 2018
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930310102
July 2018
MonTueWedThuFriSatSun
25262728293001
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
June 2018
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728293001
May 2018
MonTueWedThuFriSatSun
30010203040506
07080910111213
14151617181920
21222324252627
28293031010203
April 2018
MonTueWedThuFriSatSun
26272829303101
02030405060708
09101112131415
16171819202122
23242526272829
30010203040506
February 2018
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272801020304
January 2018
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
December 2017
MonTueWedThuFriSatSun
27282930010203
04050607080910
11121314151617
18192021222324
25262728293031
November 2017
MonTueWedThuFriSatSun
30310102030405
06070809101112
13141516171819
20212223242526
27282930010203
September 2017
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728293001
August 2017
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293031010203
March 2017
MonTueWedThuFriSatSun
27280102030405
06070809101112
13141516171819
20212223242526
27282930310102
January 2017
MonTueWedThuFriSatSun
26272829303101
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
November 2016
MonTueWedThuFriSatSun
31010203040506
07080910111213
14151617181920
21222324252627
28293001020304
October 2016
MonTueWedThuFriSatSun
26272829300102
03040506070809
10111213141516
17181920212223
24252627282930
31010203040506
September 2016
MonTueWedThuFriSatSun
29303101020304
05060708091011
12131415161718
19202122232425
26272829300102
August 2016
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
July 2016
MonTueWedThuFriSatSun
27282930010203
04050607080910
11121314151617
18192021222324
25262728293031
May 2016
MonTueWedThuFriSatSun
25262728293001
02030405060708
09101112131415
16171819202122
23242526272829
30310102030405
April 2016
MonTueWedThuFriSatSun
28293031010203
04050607080910
11121314151617
18192021222324
25262728293001
December 2014
MonTueWedThuFriSatSun
01020304050607
08091011121314
15161718192021
22232425262728
29303101020304
October 2014
MonTueWedThuFriSatSun
29300102030405
06070809101112
13141516171819
20212223242526
27282930310102