Re: [igraph] igraph R: fit_power_law


From: Tamas Nepusz
Subject: Re: [igraph] igraph R: fit_power_law
Date: Mon, 5 Aug 2019 16:32:12 +0200

Dear Sander,

  1. The igraph documentation suggests that the bfgs function is used to estimate the power-law alpha, but I think the C implementation relies on the Broyden-Fletcher-Goldfarb-Shanno optimization function of the lbfgs library instead. Is that correct?
This is the exact implementation of the BFGS optimization that we use in power law fitting:

https://github.com/ntamas/plfit/blob/master/src/lbfgs.c

As far as I know, this is a C port of the limited-memory variant of the Broyden-Fletcher-Goldfarb-Shanno method (L-BFGS), which was originally written in FORTRAN. The license notes in the source code might give you more clues.

  1. The fit_power_law function relies on the MLE function of the stats4 package. I am curious why this was deprecated, given the availability of plfit and MLE parameters. Is this simply a memory issue?
I don't know; that question belongs purely to the R interface of igraph. The C core uses the L-BFGS method and my "plfit" library:

https://github.com/ntamas/plfit

The plfit library is an efficient implementation of the method published by Clauset, Shalizi and Newman:

Clauset A, Shalizi CR and Newman MEJ: Power-law distributions in empirical data. SIAM Review 51, 661-703 (2009).
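
To give a rough idea of what the fitting step involves, here is a minimal Python sketch of the continuous case described in the paper, where the maximum-likelihood exponent has a closed form. It is not the plfit code. The function names (sample_power_law, fit_alpha, ks_statistic) are made up for illustration, xmin is assumed to be known rather than estimated, and no p-value is computed.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= xmin."""
    # Inverse-transform sampling: x = xmin * (1 - u)^(-1 / (alpha - 1))
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(data, xmin):
    """Closed-form MLE for the continuous case: alpha = 1 + n / sum(ln(x / xmin))."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_statistic(data, alpha, xmin):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after x
        d = max(d, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return d


# Synthetic data with a known exponent (assumed values, for illustration only)
rng = random.Random(42)
data = sample_power_law(20000, alpha=2.5, xmin=1.0, rng=rng)
alpha_hat = fit_alpha(data, xmin=1.0)
d_stat = ks_statistic(data, alpha_hat, xmin=1.0)
print(f"alpha_hat = {alpha_hat:.3f}, KS statistic = {d_stat:.4f}")
```

On synthetic data like this, the estimate should land close to the exponent used for sampling and the KS statistic should be small. Real data also requires choosing xmin, which this sketch leaves out.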
 
  1. How to interpret the p-value of the Kolmogorov-Smirnov test?
See the paper cited above for more details.
 
  1. The igraph help file states: "Small p-values (less than 0.05) indicate that the test rejected the hypothesis that the original data could have been drawn from the fitted power-law distribution". The C implementation of the KS test in igraph uses the Hurwitz zeta function. Shouldn't this mean that high p-values indicate a good model fit, as suggested by Clauset et al. (2009:678)?
Well, tests based on p-values are not really about whether a model is a "good fit" or a "bad fit"; a low p-value _roughly_ says that "it is very unlikely that the data could have been generated from the hypothesized distribution" (in our case, a power-law). A high p-value _roughly_ means that "the data may have come from the hypothesized distribution"; however, there could be alternative distributions that can describe the data just as well.

So, in a nutshell:

low p-value --> null hypothesis (power-law) rejected --> data is likely not a power-law
high p-value --> null hypothesis (power-law) _not_ rejected --> data could come from a power-law, or maybe from something else; we don't know, we just could not _exclude_ the power-law
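
The same decision rule can be written down as a tiny Python sketch. The function name interpret_ks_p_value and the returned wording are hypothetical; the 0.05 threshold is the one quoted from the igraph help file, and nothing here is part of igraph or plfit.

```python
def interpret_ks_p_value(p_value, significance=0.05):
    """Turn a KS-test p-value into a cautious verbal conclusion."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < significance:
        # Null hypothesis (power-law) rejected
        return "power-law rejected: data is likely not a power-law"
    # Failing to reject is not the same as confirming the power-law
    return "power-law not rejected: data could be a power-law, or something else"


print(interpret_ks_p_value(0.01))
print(interpret_ks_p_value(0.60))
```

Note that the second branch deliberately avoids saying "good fit"; it only records that the power-law could not be excluded.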

All the best,
T.
