Table 1 Statistics of investigated papers. “Source” denotes where the projects were collected for investigation, “# Project” indicates the number of investigated projects, and “Data analysis methods” shows the methods used to obtain the conclusions.

From: Evaluation indicators for open-source software: a review

| Study | Source | # Project | Data analysis methods |
|---|---|---|---|
| Lerner (2005) | SourceForge | 40,000 | OLS¹ |
| Colazo et al. (2009) | SourceForge | 62 | OLS and Cox regression |
| Sen et al. (2008) | SourceForge | 196 responses | Multinomial logit analysis |
| Grewal (2006) | SourceForge | 108 | Latent class cluster analysis |
| Crowston et al. (2004) | SourceForge | 122 | PSM² |
| Garousi (2009) | SourceForge | 8,627 | N.A. |
| Crowston et al. (2003) | Surveys via Slashdot³ | 170 | Atlas-ti¹³ |
| Sen (2006) | Freshmeat | 12,923 | FIML⁴ |
| Wu et al. (2007) | SourceForge | 56 | 3SLS⁵ |
| Stewart et al. (2005) | Freshmeat | 147 | MANCOVA⁶ |
| Raymond (1999) | Fetchmail¹² | N.A. | N.A. |
| Fershtman et al. (2004) | SourceForge | 71 | GLS⁷ |
| Subramaniam (2009) | SourceForge | 8,627 | Random-effects and linear regression |
| Midha et al. (2012) | N.A. | 283 | VIF⁸ |
| Colazo (2005) | SourceForge | 62 | OLS |
| Tsay et al. (2012) | GitHub | N.A. | Separate negative binomial regressions |
| Homscheid et al. (2016) | Survey | 321 | Theory-driven approach |
| Spaeth et al. (2015) | Maemo and OpenMoko | N.A. | N.A. |
| Teigland et al. (2014) | eZ Publish | N.A. | Abductive approach |
| Guinan et al. (1998) | 15 organizations | 66 teams | PCA¹⁶ |
| English et al. (2007) | SourceForge | 110,933 | N.A. |
| Beecher (2008) | Debian⁹ | 50 | GQM¹⁰ method |
| Robinson and Vlas (2015) | SourceForge | 31 | Six-Vertex measurement model |
| Comino et al. (2007) | SourceForge | 88,192 | N.A. |
| Giuri et al. (2004) | SourceForge | N.A. | Multinomial logit analysis |
| Schweik (2009) | SourceForge | 107,747 | N.A. |
| Ghapanchi (2015) | N.A. | 1,409 | PLS¹¹ |
| Chang (2018) | CFA and brigades’ Slack channels | 143 | Inferential statistics method |
| Ke and Zhang (2011) | SourceForge | 233 | PLS |
| Peng (2019) | GitHub | N.A. | OLS, GLM¹⁷, BLR¹⁸ |
| Feitelson et al. (2006) | SourceForge | 1,681 | Least-squares analysis |
| Emanuel et al. (2010) | SourceForge | 160,141 | Data mining (2-itemset association rules) |
| Tamura and Yamada (2007) | Fedora Core Linux | N.A. | Neural network and NHPP model |
| Norikane et al. (2018) | Qt Project database | N.A. | Prediction model |
| Bao et al. (2019) | GitHub | 917 | Wilcoxon rank-sum test with Bonferroni correction |
| Yang et al. (2013) | Ohloh | N.A. | Regression data analysis |
| Hanoğlu and Tarhan (2019) | GitHub | 17 | Understand 5.1 and JASP |
| Crowston and Shamshurin (2017) | ASF¹⁴ Incubator | 74 | Violin plot |
| Joy et al. (2018) | GitHub | 130 | OLS |
| Chen et al. (2015) | N.A. | 70 | Data analysis |
| Greene and Fischer (2016) | GitHub | 1,000 | N.A. |
| Rebouças et al. (new12-20) | GitHub | 35,360 | Fisher’s exact test |
| Hata et al. (020803) | GitHub | 22 | Game-theoretical models |
| Fronchetti et al. (020804) | GitHub | 450 | Random forest and KSC clustering algorithm¹⁹ |

  1. OLS: Ordinary least squares regression
  2. PSM: Parametric survival model
  3. Surveys via Slashdot: The data were collected by surveying developers via Slashdot, a popular web-based discussion board
  4. FIML: Full information maximum likelihood
  5. 3SLS: Three-stage least-squares regression
  6. MANCOVA: Multivariate analysis of covariance
  7. GLS: Generalized least squares regression
  8. VIF: Variance inflation factors
  9. Debian: This survey was conducted among Linux kernel developers
  10. GQM: Goal, Question, Metric method
  11. PLS: Partial least squares regression
  12. Fetchmail: A full-featured IMAP and POP client
  13. Atlas-ti: A program used for qualitative research and data analysis
  14. ASF: Apache Software Foundation
  15. LCA: Latent class cluster analysis
  16. PCA: Principal component analysis
  17. GLM: Generalized linear model
  18. BLR: Bayesian linear regression
  19. KSC clustering algorithm: K-Spectral Centroid clustering algorithm