
Table 1 Statistics of investigated papers. “Source” denotes where the projects were collected for investigation, “# Project” indicates the number of investigated projects, and “Data analysis methods” shows which methods were used to obtain the conclusions

From: Evaluation indicators for open-source software: a review

| Study | Source | # Project | Data analysis methods |
| --- | --- | --- | --- |
| Lerner (2005) | SourceForge | 40,000 | OLS^1 |
| Colazo et al. (2009) | SourceForge | 62 | OLS and Cox regression |
| Sen et al. (2008) | SourceForge | 196 responses | Multinomial logit analysis |
| Grewal (2006) | SourceForge | 108 | Latent class cluster analysis |
| Crowston et al. (2004) | SourceForge | 122 | PSM^2 |
| Garousi (2009) | SourceForge | 8,627 | N.A. |
| Crowston et al. (2003) | Surveys via SlashDot^3 | 170 | Atlas-ti^13 |
| Sen (2006) | FreshMeat | 12,923 | FIML^4 |
| Wu et al. (2007) | SourceForge | 56 | 3SLS^5 |
| Stewart et al. (2005) | FreshMeat | 147 | MANCOVA^6 |
| Raymond (1999) | Fetchmail^12 | N.A. | N.A. |
| Fershtman et al. (2004) | SourceForge | 71 | GLS^7 |
| Subramaniam (2009) | SourceForge | 8,627 | Random-effects and linear regression |
| Midha et al. (2012) | N.A. | 283 | VIF^8 |
| Colazo (2005) | SourceForge | 62 | OLS |
| Tsay et al. (2012) | GitHub | N.A. | Separate negative & binomial regression |
| Homscheid et al. (2016) | Survey | 321 | Theory-driven approach |
| Spaeth et al. (2015) | Maemo and OpenMoko | N.A. | N.A. |
| Teigland et al. (2014) | eZ Publish | N.A. | Abductive approach |
| Guinan et al. (1998) | 15 organizations | 66 Teams | PCA^16 |
| English et al. (2007) | SourceForge | 110,933 | N.A. |
| Beecher (2008) | Debian^9 | 50 | GQM^10 Method |
| Robinson and Vlas (2015) | SourceForge | 31 | Six-Vertex measurement model |
| Comino et al. (2007) | SourceForge | 88,192 | N.A. |
| Giuri et al. (2004) | SourceForge | N.A. | Multinomial logit analysis |
| Schweik (2009) | SourceForge | 107,747 | N.A. |
| Ghapanchi (2015) | N.A. | 1,409 | PLS^11 |
| Chang (2018) | CFA and brigades’ Slack channels | 143 | Inferential statistics method |
| Ke and Zhang (2011) | SourceForge | 233 | PLS |
| Peng (2019) | GitHub | N.A. | OLS, GLM^17, BLR^18 |
| Feitelson et al. (2006) | SourceForge | 1,681 | Least-squares analysis |
| Emanuel et al. (2010) | SourceForge | 160,141 | Datamining 2-Itemset Association Rule |
| Tamura and Yamada (2007) | Fedora Core Linux | N.A. | Neural network and NHPP model |
| Norikane et al. (2018) | QT project database | N.A. | Prediction model |
| Bao et al. (2019) | GitHub | 917 | Wilcoxon rank-sum test with Bonferroni correction |
| Yang et al. (2013) | Ohloh | N.A. | Regression data analysis |
| Hanoğlu and Tarhan (2019) | GitHub | 17 | Understand 5.1 and JASP |
| Crowston and Shamshurin (2017) | ASF^14 Incubator | 74 | Violin plot |
| Joy et al. (2018) | GitHub | 130 | OLS |
| Chen et al. (2015) | N.A. | 70 | Data Analysis |
| Greene and Fischer (2016) | GitHub | 1,000 | N.A. |
| Rebouças et al. (new12-20) | GitHub | 35,360 | Fisher’s Exact Test |
| Hata et al. (020803) | GitHub | 22 | Game-theoretical models |
| Fronchetti et al. (020804) | GitHub | 450 | Random Forest and KSC clustering algorithm^19 |
  1. OLS: Ordinary least squares regression
  2. PSM: Parametric Survival Model
  3. Surveys via SlashDot: The data were collected by surveying developers via SlashDot, a popular Web-based discussion board
  4. FIML: Full Information Maximum Likelihood
  5. 3SLS: Three-Stage Least-Squares regression
  6. MANCOVA: Multivariate analysis of covariance
  7. GLS: Generalized least squares regression
  8. VIF: Variance Inflation Factors
  9. Debian: This survey was conducted among Linux kernel developers
  10. GQM: Goal, Question, Metric method
  11. PLS: Partial least squares regression
  12. Fetchmail: Full-featured IMAP and POP client
  13. Atlas-ti: A program used for qualitative research or data analysis
  14. ASF: Apache Software Foundation
  15. LCA: Latent class cluster analysis
  16. PCA: Principal component analysis
  17. GLM: Generalized linear model
  18. BLR: Bayesian linear regression
  19. KSC clustering algorithm: K-Spectral Centroid clustering algorithm