
Re: [Swarm-Modelling] Parameter-fitting and model comparison methods for ABM?


From: chris rohe
Subject: Re: [Swarm-Modelling] Parameter-fitting and model comparison methods for ABM?
Date: Sun, 01 Jun 2003 18:33:07 +0000

Darren,
I would like to request a copy of your paper on evaluating models. I recently completed my M.A., which focused on creating predictive models for prehistoric archaeological sites in the Rocky Mountains using Boolean, weighting, and regression models. I am very interested in agent-based and neural network approaches, but completely lost in their methods, and I hope to continue learning new ideas and techniques in this area. Please let me know if I can get a copy of that paper; it would be best to send it to my work email, address@hidden. Thanks
Chris Rohe
Assistant Director of Cartography and Geospatial Technologies
Statistical Research, Inc.


From: Darren Schreiber <address@hidden>
Reply-To: address@hidden
To: address@hidden
Subject: Re: [Swarm-Modelling] Parameter-fitting and model comparison methods for ABM?
Date: Thu, 29 May 2003 04:42:15 -0700

This is a good opportunity to bounce around an idea I have been toying with for about a year.

Imagine if we were to use a notation:

y = m(α, β, x)

where m means method (akin to f meaning function), α is the algorithm used, β is the set of parameter settings, x is the inputs to the model, and y is the outputs from the model. We could then think about evaluating the model fit of ABMs in a manner similar to evaluating model fit for regression models.
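To make that concrete, here is a minimal sketch in Python of what treating a model as m(α, β, x) and scoring its fit against observations might look like; the helper names are hypothetical, not anything Swarm-specific:

    # Minimal sketch of y = m(alpha, beta, x); names are illustrative placeholders.
    def m(alpha, beta, x):
        # Run method alpha with parameter settings beta on inputs x; return outputs y.
        return alpha(x, **beta)

    def fit_score(alpha, beta, x, observed):
        # Score the model outputs against observed phenomena (simple squared error).
        y = m(alpha, beta, x)
        return sum((yi - oi) ** 2 for yi, oi in zip(y, observed))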

In a paper on model evaluation (aka validation), I argue for a four-part ontology connected to a list of 20 or so evaluation techniques. My proposed ontology is Theory--Model--Phenomena--Noumena. In this ontology, the theory is the idea in your head about how the world really works. The model is the expression of the theory in a concrete form. The phenomena are the set of observations we have. And the noumena (using a term from Kant) are things as they actually are (Truth with a capital T). My distinction between theory and model comes out of my background as an attorney: I cannot copyright my great American novel (theory) until I have written it (model).

In typical regression models, we look for the parameters β that minimize the residuals (or something like this, depending on the estimator), but we don't tinker with the functional form of the equation. In evaluating ABMs, however, we can tinker with both the algorithms and the parameters in order to adjust the fit between the theory and the phenomena. It isn't hard to imagine that a sufficiently complex problem might even warrant using genetic programming to create myriad swarm simulations in order to find the algorithm and set of parameters that best fit the problem. The very clever next step (suggested by John Miller at CMU) would be to then run another search to see how easy it is to "break" the model, in order to evaluate the robustness of the resulting algorithm/parameter set. Great models would be ones that both fit the data and are robust in fitting it. You would probably also want to partition your dataset so that you do exploratory analysis on one part and confirmatory analysis on the other.
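As a rough sketch of that search-then-break procedure (reusing the m() and fit_score() helpers above; the perturb argument, a function that jitters a parameter set, is hypothetical):

    def exploratory_search(algorithms, parameter_grid, x, observed):
        # Exploratory step: find the algorithm/parameter pair with the best fit.
        best = None
        for alpha in algorithms:
            for beta in parameter_grid:
                score = fit_score(alpha, beta, x, observed)
                if best is None or score < best[0]:
                    best = (score, alpha, beta)
        return best

    def robustness_check(alpha, beta, x_confirm, observed_confirm, perturb, trials=100):
        # Confirmatory step on held-out data: how badly does the fit degrade
        # when the chosen parameter set is jittered?
        return max(fit_score(alpha, perturb(beta), x_confirm, observed_confirm)
                   for _ in range(trials))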

Statistical theory about evaluating regression models is hitting something of a quandary now because of the computer revolution. It used to be that the 95% confidence interval made a lot of sense as a rule of thumb, because running a regression model took a big stack of punch cards and a lot of mainframe time. But some statistical packages now have the power to search the possible combinations of variables and functional forms to find the best fits with the data. And other kinds of data mining have become an important (even life-saving) art. But if I have run three hundred possible models against my data set, it would not be shocking if fifteen of them reach significance at the .05 level even if the data set is just random numbers. Some have argued that sloppy statistical practices are probably leading to spurious-result rates of 50% in some disciplines.
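That arithmetic is easy to check. A tiny simulation (numpy and scipy, nothing exotic) that regresses pure noise on pure noise three hundred times should come back with roughly fifteen "significant" results on average:

    import numpy as np
    from scipy.stats import linregress

    rng = np.random.default_rng(0)
    trials, hits = 300, 0
    for _ in range(trials):
        x = rng.normal(size=100)
        y = rng.normal(size=100)      # no relationship whatsoever
        if linregress(x, y).pvalue < 0.05:
            hits += 1
    print(hits, "of", trials, "pure-noise regressions are 'significant' at .05")
    # Expect about 5% of them, i.e. roughly 15, purely by chance.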

Imagine I mine the hell out of my data and, using eight parameters, finally get a .05 result from my data set. You kind of think my theory is kooky, but since most journals aren't publishing null findings, you feel obliged to start with my kooky theory and data mine from there until you also meet the .05 threshold. It isn't hard to see how you could build an entire line of research on nothing but totally random numbers and still meet all of the typical standards for publication.

The problem is being compounded by the escalation in types of estimators and models (OLS, GLM, probit, logit, scobit, MLE, MCMC, neural networks, ABMs). If I can't get .05 results from my data with OLS, should I just move to another form? When methodologists lose sight of the substantive problems, it is easy to just drift into more and more arcane methods.

A related problem is that if I let the number of parameters in a regression model proliferate, I have quickly defined a nearly incomprehensible high-dimensional space where robustness may be a huge unknown and theoretical understanding is almost certainly lost. The political scientist Chris Achen has thus argued for "ART: A Rule of Three": you had better have a damned good reason for having more than three variables in your regression model.

As I have seen how the problems facing statistics are starting to cause an unraveling of traditional practice and theory, it has given me increasing confidence that evaluating agent-based models does not have to be relegated to black-art status. In fact, our consciousness of the flexibility of our algorithms and parameters may even give us a conceptual advantage in getting ahead of these problems, and perhaps even in providing leadership to other types of modelers.

In my current thinking, modeling of all sorts (statistical, formal, computational, qualitative, narrative, interpretive dance, etc.) faces a common set of problems. The Matrix as a movie may get a positive evaluation because it resonates with some "real" truths (in the noumenal sense). Battlefield Earth (a horrifyingly bad film) fails because its plot, characters, technology, and acting are all completely unbelievable, or at least not worthy of belief or suspended disbelief. Models work when they address a problem. As Jane Azevedo argues in "Mapping Reality", not all models are appropriate for all problems, and models can only be evaluated in the context of the problem they are addressing. A topographic map is great for hiking but terrible for navigating the London Underground.

To get around the problems we face in evaluating agent-based models, we need to return to some first principles of epistemology and metaphysics, we need to develop best practices that fit our current problems and technology and their foreseeable extensions, and we need to develop some community consensus about those best practices. My prediction is that within a few years the problems of current statistical practice (publication based upon asterisks, thoughtless superpowered data mining, no null findings, no replication studies, no substantive interpretation of coefficients) will hit the academic headlines and cause a big rethink. Agent-based modeling is going to have to do that thinking now to move ahead. And just borrowing the rubric from standard stats isn't going to do it.

Steve, I am sending you a copy of my evaluating-models paper under separate cover. I would appreciate feedback on it, since I am going to do a major rework of it in the fall. Anyone who would like a copy, please ask. It has more coherence than this late-night (early-morning) set of ramblings.

        Darren



On Wednesday, May 28, 2003, at 06:27 AM, Steve Railsback wrote:

Hi-

I am trying to write up some recommendations for analysis of agent-based
models, and am not sure what to say about traditional methods for (1)
fitting model parameters to data and (2) comparing model versions by
their ability to fit data.

In the stochastic simulation literature (e.g., Law and Kelton 1999)
there is discussion of some traditional techniques like maximum
likelihood estimation; but I've never seen such techniques (also
including Akaike's Information Criterion and Bayesian analysis) applied
to agent-based models. Instead, the few examples of parameter-fitting
I've found use a simple filtering process: simulate a billion
alternative parameter sets and identify the ones that produce acceptable
results.
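In rough pseudocode that filtering step is just the following, where run_model and acceptable are placeholders for the ABM and the acceptance criterion rather than any particular package:

    def filter_parameters(candidate_sets, run_model, acceptable):
        # Keep only the candidate parameter sets whose simulated output is acceptable.
        return [params for params in candidate_sets if acceptable(run_model(params))]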

Lacking a background in statistics, I can't help wondering if there are
not some fundamental obstacles to using these traditional techniques for
ABMs, due to things like:

- The observed relation of a particular output to a particular input
potentially being extremely weird and 'noisy' even if the input has a
simple, strong effect on the agent behavior that produces the output.

- The large number of pathways by which a system can arrive at a
particular state?

- With AIC, the analysis depends heavily on the number of parameters in
the model, which by itself could be a very interesting problem. What
really constitutes a 'parameter'? (The formula is sketched just after
this list.)

- Does the conventional equating of degrees of freedom with # of
parameters make sense?
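For reference, the usual form of the criterion, where all of the ambiguity above sits in choosing k:

    def aic(k, max_log_likelihood):
        # Akaike's Information Criterion: AIC = 2k - 2 ln(L_hat), with k the number
        # of fitted parameters; smaller is better when comparing models on the same data.
        return 2 * k - 2 * max_log_likelihood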

Does anyone have any literature, experience, understanding??

Thanks,

Steve Railsback

Lang, Railsback & Assoc.
Arcata CA

_______________________________________________
Modelling mailing list
address@hidden
http://www.swarm.org/mailman/listinfo/modelling





