Lindy effect in surnames problem
I have a challenge for you. Assume you have a population of 100 000 people, each with a distinct surname, and the same number of males and females. They pair up at random and produce the next population. Each pair has a random number of children, from 0 to 5. Each child takes the surname of one of the parents, and the gender of each child is random. The population is then limited to 100 000. How many distinct surnames will there be after 100 000 iterations? What is the relationship between how long a surname has already survived and its expected survival time?
If you are good at mathematics, you can find the answers with complex formulas. However, I suggest simulating this situation and seeing the results.
I used Kotlin to find out the result. With every generation, every surname has a chance of being eliminated from the pool, so it seems intuitive that over time there will be only one surname left. The question is how long it is going to take. I will model a person as a class with a surname (I used Int, because why not) and a biological gender (for this exercise, I assumed Boolean). To make a population of 100_000 people, I used the List builder and made random people, each with a different surname. Then, for each iteration, I shuffle my population, limit it, and randomly assign males to females. Each pair then has a random number of children, from 0 to 5, which makes up the new population. Finally, I print the iteration number and the number of distinct surnames.
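The steps above could be sketched roughly as follows. This is a minimal reconstruction under assumptions, not necessarily the code that produced the numbers in this post: the names Person, initialPopulation, nextGeneration and simulate are made up for illustration, the starting genders simply alternate to get an even split, people left without a partner have no children, and the counts are returned as a list instead of being printed.

```kotlin
import kotlin.random.Random

// Hypothetical model: the surname is an Int id, the gender a Boolean flag.
data class Person(val surname: Int, val isMale: Boolean)

// Builds the starting population: distinct surnames, alternating genders
// (an assumption made here to get the same number of males and females).
fun initialPopulation(size: Int): List<Person> =
    List(size) { index -> Person(surname = index, isMale = index % 2 == 0) }

// Produces the next generation from the current one.
fun nextGeneration(population: List<Person>, limit: Int, random: Random): List<Person> {
    // Shuffle first, so that both the cap and the pairing are random.
    val capped = population.shuffled(random).take(limit)
    val (males, females) = capped.partition { it.isMale }
    // zip pairs males with females in shuffled order; the surplus stays unpaired.
    return males.zip(females).flatMap { (father, mother) ->
        // nextInt(0, 6) yields 0..5 children, the upper bound is exclusive.
        List(random.nextInt(0, 6)) {
            Person(
                surname = if (random.nextBoolean()) father.surname else mother.surname,
                isMale = random.nextBoolean()
            )
        }
    }
}

// Runs the simulation and returns the number of distinct surnames
// left after each iteration (index 0 = after the first iteration).
fun simulate(size: Int, iterations: Int, random: Random): List<Int> {
    var population = initialPopulation(size)
    val counts = mutableListOf<Int>()
    repeat(iterations) {
        population = nextGeneration(population, size, random)
        counts += population.mapTo(HashSet()) { it.surname }.size
    }
    return counts
}
```

Calling simulate(100_000, 100_000, Random.Default) follows the setup from the challenge, although a much smaller population is quicker to experiment with.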
In my case, it took around 6000 iterations until only two surnames were left. The fight for full domination took around 2500 more iterations. It seems nearly impossible to have more than one surname after 100 000 iterations. It also seems quite clear that the longer a surname has survived, the fewer surnames it has to compete with, and the longer its expected survival time. This perfectly demonstrates the Lindy effect. Let's see it in our simulation.
We will first look at the distribution of the number of surnames. For that, we will note how many surnames are lost in each iteration. I ran that in the REPL, to be able to operate on the resulting surnamesLost variable without recalculating it every time.
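One way to get such a list, sketched under assumptions: if surnameCounts (a hypothetical name) holds the number of distinct surnames after each iteration, starting with the count after the first one, then the losses are simply the differences between neighbouring entries. The function name is likewise made up for illustration.

```kotlin
// surnameCounts[i] = number of distinct surnames after iteration i + 1 (assumed input layout).
// The result at index k is the number of surnames eliminated k + 1 iterations
// after the first round, so the losses of the first round itself are not part of it.
fun surnamesLostPerIteration(surnameCounts: List<Int>): List<Int> =
    surnameCounts.zipWithNext { before, after -> before - after }
```

In the REPL, the result can be stored once, for example as val surnamesLost = surnamesLostPerIteration(counts), and reused in later calculations.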
Notice that the algorithm does not include the first iteration, when over half of the surnames are lost. I decided to exclude it because it is not a natural situation, and it would affect the other estimates too much.
What is the average survival time after the first round? This can be calculated as a weighted average of the number of iterations a surname survived, weighted by how many surnames were lost in each iteration. We will exclude the last surname, which will live forever.
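Written out, the weighted average looks as follows. The notation is an assumption introduced here: n_t is the number of surnames eliminated t iterations after the first round, and T is the last iteration in which any surname is eliminated.

```latex
\bar{t} = \frac{\sum_{t=1}^{T} t \, n_t}{\sum_{t=1}^{T} n_t}
```

The surname that never dies is not counted in any n_t, which is how it ends up excluded from the average.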
How does it change if we ignore the first 10 rounds?
Only 10 rounds have passed, and the expected survival time is now nearly 100 iterations. What is it after 100 iterations?
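A calculation of this kind could be sketched like this. It assumes that surnamesLost is a List<Int> in which index k holds the number of surnames eliminated k + 1 iterations after the first round, and that the remaining lifetime is counted from the point where the ignored rounds end. The function name and the alreadySurvived parameter are hypothetical.

```kotlin
// Mean number of further iterations survived by the surnames that are
// still alive after `alreadySurvived` iterations (0 = right after the first round).
// The surname that is never eliminated does not appear in surnamesLost,
// so it is excluded automatically.
fun expectedRemainingSurvival(surnamesLost: List<Int>, alreadySurvived: Int): Double {
    val remaining = surnamesLost.drop(alreadySurvived)
    val totalLost = remaining.sum()
    require(totalLost > 0) { "No surname is eliminated after iteration $alreadySurvived" }
    // Weighted sum: (iterations survived from this point) * (surnames lost then).
    val weighted = remaining.withIndex().sumOf { (k, lost) -> (k + 1).toLong() * lost }
    return weighted.toDouble() / totalLost
}
```

Comparing expectedRemainingSurvival(surnamesLost, 0), expectedRemainingSurvival(surnamesLost, 10) and expectedRemainingSurvival(surnamesLost, 100) shows how the estimate changes with the time already survived.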
As you can see, the longer a surname has survived, the longer its expected survival time. That is the Lindy effect. Let's see its distribution for 1000 iterations.
Clearly, with every iteration survived, the chance of surviving another iteration grows. By how much? We can calculate and visualize it.
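The per-iteration chance could be estimated along these lines. It is a sketch under assumptions: aliveAtStart is the number of distinct surnames before the first recorded iteration, surnamesLost holds the number of surnames eliminated in each following iteration, and both names, like the function name, are chosen here for illustration. Plotting the returned list is left out.

```kotlin
// For every recorded iteration, returns the fraction of the surnames alive
// at its start that are still alive at its end.
fun survivalChances(aliveAtStart: Int, surnamesLost: List<Int>): List<Double> {
    var alive = aliveAtStart
    return surnamesLost.map { lost ->
        require(alive > 0 && lost <= alive) { "Inconsistent counts: alive=$alive, lost=$lost" }
        val chance = (alive - lost).toDouble() / alive
        alive -= lost // the survivors enter the next iteration
        chance
    }
}
```

For example, survivalChances(16, listOf(8, 2, 0, 3)) returns [0.5, 0.75, 1.0, 0.5].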