Importing 4 billion chess games with speed and scale using Elasticsearch and Universal Profiling

This is the first post in a blog series. Chess is a fascinating game: what happens on those 64 squares can be a disastrous adventure or a wonderful experience. Lichess is a platform for playing chess, and, luckily for us, it publishes all rated games as archives going back to 2013. In total, more than 4 billion games have been played. Yes, 4 billion games.

Ingesting 4 billion documents is something the Elastic Stack can handle easily. However, the custom Python application I wrote to extract those games from the Lichess archives and ship them as documents ran into severe performance problems. We will use Elastic APM and Universal Profiling to track down and solve those problems.

Let me start by saying that Elastic is an incredibly collaborative workplace. Multiple people from different teams helped me in this situation, and I am very grateful for their help.


PGN: The portable game notation

How is a chess game recorded? The standard is the so-called Portable Game Notation (PGN) format. This is the very first game from Lichess’s archive.
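PGN is plain text. Each game starts with tag pairs in square brackets, such as `[White "..."]`, followed by the movetext in standard algebraic notation and ending with the result. The sketch below shows one way to split a multi-game PGN string into one dictionary per game, which is roughly the shape you would want before shipping games as documents. It is not the author's implementation. `parse_pgn` and `SAMPLE` are hypothetical names, the two games are made up (Scholar's mate and Fool's mate) rather than taken from the Lichess archive, and the simple regex assumes each tag pair sits on its own line and does not unescape quotes inside values.

```python
import re
from typing import Iterator

# Matches a PGN tag pair on its own line, e.g. [White "alice"].
# Simplification: escaped quotes inside values are not unescaped.
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')


def parse_pgn(text: str) -> Iterator[dict]:
    """Yield one dict per game: its tag pairs plus the joined movetext."""
    tags: dict[str, str] = {}
    moves: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate the header from the movetext
        match = TAG_RE.match(line)
        if match:
            # A tag pair that follows movetext marks the start of a new game.
            if moves:
                yield {**tags, "Moves": " ".join(moves)}
                tags, moves = {}, []
            tags[match.group(1)] = match.group(2)
        else:
            moves.append(line)
    # Emit the last game in the input, if any.
    if tags or moves:
        yield {**tags, "Moves": " ".join(moves)}


# Hypothetical two-game input; player names are invented.
SAMPLE = """[Event "Hypothetical game 1"]
[White "alice"]
[Black "bob"]
[Result "1-0"]

1. e4 e5 2. Qh5 Nc6 3. Bc4 Nf6 4. Qxf7# 1-0

[Event "Hypothetical game 2"]
[White "carol"]
[Black "dave"]
[Result "0-1"]

1. f3 e5 2. g4 Qh4# 0-1
"""

if __name__ == "__main__":
    for game in parse_pgn(SAMPLE):
        print(game["White"], "vs", game["Black"], game["Result"])
```

Because `parse_pgn` is a generator, it keeps only the current game in memory. That matters when the input is a multi-gigabyte archive rather than a short string.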
