This document is supplementary material for the study ‘Is Age Really Cruel to Experts: Compensatory Effects of Activity’. If you have any additional questions about the paper or the statistical analysis, feel free to write to Nemanja.Vaci@aau.at.

1. Intersection of the datasets

To investigate whether the negative effect of practice in the FIDE dataset is due to restriction of range in the tournament activity records, we identified the players who are registered in both datasets.

  1. We added the first and last name of every player to both datasets

  2. All special characters and name titles were replaced

  3. We used the stringdist package in R, in particular the function amatch, to find string matches between the datasets

  4. Year of birth was added to both datasets

  5. We calculated pairwise string distances between the previously matched names with the stringdist function

      The Jaro-Winkler method was used to compute the distances, which range from 0 (exact match) to 1 (completely dissimilar). This method takes into account the length of the strings, the number of matching characters between the two strings, and the number of transpositions required to make the strings identical.

  6. All players for whom the year of birth did not match between the datasets and the distance between names was greater than 0.1 were excluded

This resulted in 13487 individuals.
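The matching steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the original script: the data frames, column names, the list of titles and the cleaning rules are all hypothetical, and the final filter implements only one possible reading of the exclusion rule (keep pairs whose birth years agree and whose name distance is at most 0.1).

```r
library(stringdist)

# Toy stand-ins for the two databases; all names and years are hypothetical.
fide <- data.frame(
  name  = c("GM Schmidt, Peter", "Muller, Hans", "Weber, Karl"),
  birth = c(1985, 1970, 1990),
  stringsAsFactors = FALSE
)
german <- data.frame(
  name  = c("Mueller, Hans", "Schmidt, Peter", "Fischer, Anna"),
  birth = c(1970, 1985, 1962),
  stringsAsFactors = FALSE
)

# Step 2 (assumed rules): lower-case, strip an illustrative set of titles,
# then strip everything that is not a letter or a space.
clean_name <- function(x) {
  x <- tolower(x)
  x <- gsub("\\b(gm|im|fm|dr|prof)\\b", " ", x, perl = TRUE)
  x <- gsub("[^a-z ]", " ", x)
  trimws(gsub(" +", " ", x))
}
fide$clean   <- clean_name(fide$name)
german$clean <- clean_name(german$name)

# Step 3: for each FIDE name, the index of the closest German name,
# or NA when no candidate lies within maxDist.
idx <- amatch(fide$clean, german$clean, method = "jw", maxDist = 0.1)

# Steps 4-5: attach birth years and compute the pairwise distances
# between the matched names.
matched <- data.frame(
  fide_name    = fide$clean,
  german_name  = german$clean[idx],
  fide_birth   = fide$birth,
  german_birth = german$birth[idx],
  stringsAsFactors = FALSE
)[!is.na(idx), ]
matched$dist <- stringdist(matched$fide_name, matched$german_name, method = "jw")

# Step 6 (one reading of the rule): keep pairs with equal birth year
# and a name distance of at most 0.1.
intersection <- matched[matched$fide_birth == matched$german_birth &
                          matched$dist <= 0.1, ]
print(intersection)
```

Note that with its default `p = 0` the `"jw"` method in stringdist computes the plain Jaro distance; the Winkler prefix adjustment is switched on by setting `p` to a positive value (at most 0.25). Which setting the original analysis used is not stated here.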

The main model (see the ‘Modeling of age and activity effects’ subsection in Results) was re-fitted on the new datasets. The results show the same pattern as in the main analysis: a preserving effect of tournament activity in the German database and a declining effect in the FIDE database.
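For readers less familiar with lme4 syntax, the model formula used in this section can be written out as below. The notation is introduced here for illustration only (i indexes players, j indexes a player's observations, β denotes fixed effects, u denotes player-level random effects); it does not appear in the original analysis.

```latex
\begin{aligned}
\text{Rating}_{ij} ={}& \beta_0
  + \sum_{k=1}^{3} \beta_k\,\text{AgeC}_{ij}^{\,k}
  + \beta_4\,\text{Games}_{ij}
  + \sum_{k=1}^{3} \beta_{4+k}\,\text{AgeC}_{ij}^{\,k}\,\text{Games}_{ij} \\
 &+ u_{0i} + u_{1i}\,\text{AgeC}_{ij} + \varepsilon_{ij}, \\
(u_{0i},\, u_{1i})^{\top} \sim{}& \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
\end{aligned}
```

The eight fixed-effect rows in each summary correspond to β0 through β7, and the Random effects block reports the variances and the correlation contained in Σ together with the residual variance σ².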

FIDE database

library(lme4)  # provides lmer()
FIDEIntersection<-lmer(Rating~poly(AgeC, degree=3, raw=T)*Games+(1+AgeC|IDFIDE), data=FIDE)
print(summary(FIDEIntersection), cor=FALSE)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Rating ~ poly(AgeC, degree = 3, raw = T) * Games + (1 + AgeC |      IDFIDE)
##    Data: FIDE
## 
## REML criterion at convergence: 4202242
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -16.6570  -0.4346  -0.0128   0.4276  13.3794 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  IDFIDE   (Intercept) 89681.55 299.469       
##           AgeC           90.21   9.498  -0.62
##  Residual               442.81  21.043       
## Number of obs: 453617, groups:  IDFIDE, 12982
## 
## Fixed effects:
##                                          Estimate Std. Error t value
## (Intercept)                             1.887e+03  3.103e+00   608.1
## poly(AgeC, degree = 3, raw = T)1        2.228e+01  1.479e-01   150.7
## poly(AgeC, degree = 3, raw = T)2       -6.011e-01  3.944e-03  -152.4
## poly(AgeC, degree = 3, raw = T)3        4.243e-03  3.921e-05   108.2
## Games                                   2.599e-01  3.145e-02     8.3
## poly(AgeC, degree = 3, raw = T)1:Games  7.088e-02  3.764e-03    18.8
## poly(AgeC, degree = 3, raw = T)2:Games -3.481e-03  1.314e-04   -26.5
## poly(AgeC, degree = 3, raw = T)3:Games  3.703e-05  1.338e-06    27.7

German database

GermanIntersection<-lmer(Rating~poly(AgeC, degree=3, raw=T)*Games+(1+AgeC|IDGerman), data=German)
print(summary(GermanIntersection), cor=FALSE)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Rating ~ poly(AgeC, degree = 3, raw = T) * Games + (1 + AgeC |      IDGerman)
##    Data: German
## 
## REML criterion at convergence: 5521820
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -12.5444  -0.4498   0.0116   0.4765   8.0926 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  IDGerman (Intercept) 454528.4 674.19        
##           AgeC           729.3  27.01   -0.69
##  Residual               3971.9  63.02        
## Number of obs: 482960, groups:  IDGerman, 13488
## 
## Fixed effects:
##                                          Estimate Std. Error t value
## (Intercept)                             5.065e+02  7.310e+00    69.3
## poly(AgeC, degree = 3, raw = T)1        1.458e+02  4.546e-01   320.6
## poly(AgeC, degree = 3, raw = T)2       -4.064e+00  1.360e-02  -298.8
## poly(AgeC, degree = 3, raw = T)3        3.189e-02  1.340e-04   238.0
## Games                                   6.796e+00  1.249e-01    54.4
## poly(AgeC, degree = 3, raw = T)1:Games -6.795e-01  1.781e-02   -38.2
## poly(AgeC, degree = 3, raw = T)2:Games  2.092e-02  6.958e-04    30.1
## poly(AgeC, degree = 3, raw = T)3:Games -1.907e-04  7.587e-06   -25.1

2. Cohort effects

To investigate cohort effects in the datasets, we divided the players into three groups:

  1. Players born between 1900 and 1940

  2. Players born between 1940 and 1980

  3. Players born after 1980

Generalized additive modeling (mgcv package in R) was used to fit a non-linear regression of rating on the age of the participants. This way we obtained a curve over age for every cohort.
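The cohort split and the per-cohort smooths can be sketched as follows. The data are simulated and every name and value is hypothetical. The treatment of the boundary years 1940 and 1980 is an assumption, because the grouping above lists them in two cohorts.

```r
library(mgcv)

set.seed(1)
# Simulated stand-in for a ratings table; every value here is hypothetical.
n <- 3000
players <- data.frame(BirthYear = sample(1930:2000, n, replace = TRUE))
players$Age    <- 2010 - players$BirthYear
players$Rating <- 1500 + 20 * players$Age - 0.25 * players$Age^2 +
  rnorm(n, sd = 50)

# Three birth cohorts. Assigning the boundary years 1940 and 1980 to the
# earlier cohort is an assumption made for this sketch.
players$Cohort <- cut(players$BirthYear,
                      breaks = c(1900, 1940, 1980, Inf),
                      labels = c("1900 to 1940", "1940 to 1980", "after 1980"),
                      include.lowest = TRUE)

# One smooth of Rating over Age per cohort (k is kept small for the toy data).
fits <- lapply(split(players, players$Cohort),
               function(d) bam(Rating ~ s(Age, k = 5), data = d))

# Fitted curve for the youngest cohort over its observed age range.
young <- players[players$Cohort == "after 1980", ]
grid  <- data.frame(Age = seq(min(young$Age), max(young$Age)))
curve_young <- predict(fits[["after 1980"]], newdata = grid)
```

Fitting one model per cohort mirrors the separate `bam` calls in this section; an alternative would be a single model with a by-cohort smooth, which is not what the original analysis reports.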

Results show that the proposed effect persists in the case of the FIDE database. The main reason is that Elo ratings differ between the cohorts, because the minimum Elo rating required for inclusion in the database changed over the history of the dataset. The results for the German database, however, show no indication of cohort effects. In other words, the Elo scores from the different cohorts align closely.

FIDE database

library(mgcv)     # provides bam()
library(itsadug)  # provides plot_smooth()
bam1<-bam(Rating~s(Age), data=FIDE1)
bam2<-bam(Rating~s(Age), data=FIDE2)
bam3<-bam(Rating~s(Age), data=FIDE3)
par(mfrow=c(1,3))
plot_smooth(bam3, view='Age', ylim=c(1800,2300), main='2010 to 1980')
## Summary:
##  * Age : numeric predictor; with 30 values ranging from 10.000000 to 30.000000.
plot_smooth(bam2, view='Age', ylim=c(1800,2300), main='1980 to 1940')
## Summary:
##  * Age : numeric predictor; with 30 values ranging from 31.000000 to 55.000000.
plot_smooth(bam1, view='Age', ylim=c(1800,2300), main='1940 to 1900')
## Summary:
##  * Age : numeric predictor; with 30 values ranging from 56.000000 to 80.000000.

German database

bamG1<-bam(Rating~s(Age), data=German1)
bamG2<-bam(Rating~s(Age), data=German2)
bamG3<-bam(Rating~s(Age), data=German3)
par(mfrow=c(1,3))
plot_smooth(bamG3, view='Age', ylim=c(900,1800), main='2007 to 1980')
## Summary:
##  * Age : numeric predictor; with 30 values ranging from 10.000000 to 27.000000.
plot_smooth(bamG2, view='Age', ylim=c(900,1800), main='1980 to 1940')
## Summary:
##  * Age : numeric predictor; with 30 values ranging from 28.000000 to 55.000000.
plot_smooth(bamG1, view='Age', ylim=c(900,1800), main='1940 to 1900')
## Summary:
##  * Age : numeric predictor; with 30 values ranging from 56.000000 to 80.000000.

3. Ability factor

To investigate the effect of expertise on the shape of the age function, we divided the ability factor into four groups. In this case, we used statistically defined cut-offs based on each player’s peak rating, which yielded groups of players at four different levels of expertise.
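A grouping of this kind can be sketched as below. The document does not state which statistic defined the cut-offs, so the quartiles of the peak ratings are used purely as an assumed example. The data are simulated and the object names are hypothetical, apart from `Ability2`, which is stored as a numeric code because the model output in this section reports a single `Ability2` coefficient.

```r
set.seed(2)
# Simulated long-format ratings; IDs and values are hypothetical.
ratings <- data.frame(ID = rep(1:200, each = 5))
base_level <- rnorm(200, mean = 2000, sd = 250)
ratings$Rating <- round(base_level[ratings$ID] +
                          rnorm(nrow(ratings), sd = 40))

# Peak rating per player (one value per ID, named by ID).
peak <- tapply(ratings$Rating, ratings$ID, max)

# ASSUMPTION: quartiles of the peak ratings serve as cut-offs.
cuts    <- quantile(peak, probs = seq(0, 1, by = 0.25))
ability <- cut(peak, breaks = cuts, labels = 1:4, include.lowest = TRUE)

# Attach the group code (1 = lowest, 4 = highest) to every record of a player.
ratings$Ability2 <- as.integer(ability)[match(ratings$ID,
                                              as.integer(names(peak)))]

# Descriptive statistics per group, in the style used in this section.
aggregate(Rating ~ Ability2, data = ratings,
          FUN = function(x) c(MEAN = mean(x), SD = sd(x)))
```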

Results show that, in both datasets, the ability level influences the point at which the decline stabilizes. The higher the level of ability, the sooner the inflection point is observed. The exception is the lowest ability level, for which the function is flat.
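The Maximum and Second_Derivative columns reported in this section can be related to the cubic age polynomial as follows. This is a sketch of the standard calculus, assuming that Second_Derivative denotes the age at which the second derivative of the fitted curve is zero. The symbols b0 to b3 are introduced here: for a fixed number of games and a fixed ability level they collect the corresponding main-effect and interaction coefficients of the model, and a denotes AgeC.

```latex
\begin{aligned}
f(a)   &= b_0 + b_1 a + b_2 a^{2} + b_3 a^{3} \\
f'(a)  &= b_1 + 2 b_2 a + 3 b_3 a^{2} \\
f''(a) &= 2 b_2 + 6 b_3 a \\[4pt]
f'(a) = 0 \;&\Longrightarrow\; a_{\pm} = \frac{-b_2 \pm \sqrt{b_2^{2} - 3 b_1 b_3}}{3 b_3}
  \quad\text{(the maximum is the root with } f''(a) < 0\text{)} \\
f''(a) = 0 \;&\Longrightarrow\; a^{*} = -\frac{b_2}{3 b_3}
  \quad\text{(inflection point)}
\end{aligned}
```

Because the model is fitted on AgeC, both solutions are on the centred scale and have to be shifted back by the centring constant, which is not stated in this document, to be read as ages. A solution that falls outside the observed age range would be consistent with the out_of_range entries in the tables.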

Descriptive statistics for FIDE database

with(FIDE, aggregate(Rating~Ability2, FUN= function(Rating) c(MEAN=mean(Rating), SD=sd(Rating), RANGE=range(Rating))))
##   Ability2 Rating.MEAN  Rating.SD Rating.RANGE1 Rating.RANGE2
## 1        1  1710.19992   83.60242    1500.00000    1837.00000
## 2        2  2022.45803   91.78300    1500.00000    2175.00000
## 3        3  2238.44584   93.00313    1502.00000    2513.00000
## 4        4  2492.20624  101.42623    1645.00000    2851.00000

Descriptive statistics for German database

with(German, aggregate(Rating~Ability2, FUN= function(Rating) c(MEAN=mean(Rating), SD=sd(Rating), RANGE=range(Rating))))
##   Ability2 Rating.MEAN  Rating.SD Rating.RANGE1 Rating.RANGE2
## 1        1   754.08544   90.28766     200.00000     859.00000
## 2        2  1162.77550  235.82714     200.00000    1510.00000
## 3        3  1689.28402  229.60076     200.00000    2161.00000
## 4        4  2179.11974  193.70309     298.00000    2813.00000
FIDEAbility4<-lmer(Rating~poly(AgeC, degree=3, raw=T)*Games*Ability2+(1+AgeC|ID), data=FIDE)
GermanAbility4<-lmer(Rating~poly(AgeC, degree=3, raw=T)*Games*Ability2+(1+AgeC|ID), data=German)

FIDE database

print(summary(FIDEAbility4), cor=FALSE)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Rating ~ poly(AgeC, degree = 3, raw = T) * Games * Ability2 +      (1 + AgeC | ID)
##    Data: FIDE
## 
## REML criterion at convergence: 27456376
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -24.0034  -0.3968  -0.0119   0.3919  16.6139 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  ID       (Intercept) 69586.2  263.79        
##           AgeC          152.3   12.34   -0.72
##  Residual               496.2   22.28        
## Number of obs: 2916227, groups:  ID, 100529
## 
## Fixed effects:
##                                                   Estimate Std. Error t value
## (Intercept)                                      1.614e+03  3.588e+00   449.7
## poly(AgeC, degree = 3, raw = T)1                -1.330e+01  2.568e-01   -51.8
## poly(AgeC, degree = 3, raw = T)2                 5.145e-01  7.976e-03    64.5
## poly(AgeC, degree = 3, raw = T)3                -5.805e-03  7.638e-05   -76.0
## Games                                            8.109e-01  3.828e-02    21.2
## Ability2                                         1.548e+02  1.456e+00   106.3
## poly(AgeC, degree = 3, raw = T)1:Games          -7.230e-02  5.515e-03   -13.1
## poly(AgeC, degree = 3, raw = T)2:Games           1.878e-03  2.079e-04     9.0
## poly(AgeC, degree = 3, raw = T)3:Games          -1.896e-05  2.193e-06    -8.6
## poly(AgeC, degree = 3, raw = T)1:Ability2        1.077e+01  9.650e-02   111.7
## poly(AgeC, degree = 3, raw = T)2:Ability2       -3.323e-01  2.824e-03  -117.7
## poly(AgeC, degree = 3, raw = T)3:Ability2        2.963e-03  2.748e-05   107.8
## Games:Ability2                                  -3.553e-02  1.413e-02    -2.5
## poly(AgeC, degree = 3, raw = T)1:Games:Ability2  2.663e-02  1.968e-03    13.5
## poly(AgeC, degree = 3, raw = T)2:Games:Ability2 -1.050e-03  7.429e-05   -14.1
## poly(AgeC, degree = 3, raw = T)3:Games:Ability2  1.133e-05  7.934e-07    14.3

FIDEdes
##   Maximum Second_Derivative
## 1      35      out_of_range
## 2      34      out_of_range
## 3      33                65
## 4      36                53

German database

print(summary(GermanAbility4), cor=FALSE)

Germandes
##         Maximum Second_Derivative
## 1 flat_function     flat_function
## 2            41                54
## 3            38                54
## 4            33                50

We also excluded the players in the German dataset below 1500 points to assess whether the shape of the curves is due to range restriction. This way we obtained two groups of players (matched on Elo rating across the two datasets): 1) between 1500 and 2000 Elo points and 2) above 2000 Elo points. The results show the same effects as in the main analysis.

GermanRestricted<-lmer(Rating~poly(AgeC, degree=3, raw=T)*Games*AbilityNew+(1+AgeC|ID), data=German2)
print(summary(GermanRestricted), cor=FALSE)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Rating ~ poly(AgeC, degree = 3, raw = T) * Games * AbilityNew +      (1 + AgeC | ID)
##    Data: German2
## 
## REML criterion at convergence: 18349579
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -12.2627  -0.5202   0.0091   0.5385  12.4061 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  ID       (Intercept) 139978.7 374.14        
##           AgeC           192.2  13.86   -0.71
##  Residual               1992.7  44.64        
## Number of obs: 1685698, groups:  ID, 90154
## 
## Fixed effects:
##                                                     Estimate Std. Error t value
## (Intercept)                                        9.510e+02  6.669e+00  142.59
## poly(AgeC, degree = 3, raw = T)1                   5.245e+00  4.937e-01   10.62
## poly(AgeC, degree = 3, raw = T)2                   3.156e-01  1.500e-02   21.04
## poly(AgeC, degree = 3, raw = T)3                  -5.899e-03  1.426e-04  -41.36
## Games                                             -2.029e-02  2.054e-01   -0.10
## AbilityNew                                         3.875e+01  5.318e+00    7.29
## poly(AgeC, degree = 3, raw = T)1:Games             2.342e-01  2.543e-02    9.21
## poly(AgeC, degree = 3, raw = T)2:Games            -8.897e-03  9.002e-04   -9.88
## poly(AgeC, degree = 3, raw = T)3:Games             8.731e-05  9.275e-06    9.41
## poly(AgeC, degree = 3, raw = T)1:AbilityNew        5.050e+01  3.853e-01  131.08
## poly(AgeC, degree = 3, raw = T)2:AbilityNew       -1.677e+00  1.191e-02 -140.74
## poly(AgeC, degree = 3, raw = T)3:AbilityNew        1.503e-02  1.180e-04  127.42
## Games:AbilityNew                                   2.775e+00  1.539e-01   18.03
## poly(AgeC, degree = 3, raw = T)1:Games:AbilityNew -3.693e-01  1.956e-02  -18.88
## poly(AgeC, degree = 3, raw = T)2:Games:AbilityNew  1.175e-02  7.169e-04   16.39
## poly(AgeC, degree = 3, raw = T)3:Games:AbilityNew -1.085e-04  7.607e-06  -14.26