Re: [Maposmatic-dev] daily update stats
From: David Decotigny
Subject: Re: [Maposmatic-dev] daily update stats
Date: Thu, 07 Jan 2010 15:01:40 +0100
User-agent: Thunderbird 2.0.0.23 (X11/20090817)
Hello,
Jeroen van Rijn wrote:
> On Thu, Jan 7, 2010 at 09:58, David Decotigny <address@hidden> wrote:
>> All in all, that's roughly a 60% penalty on both sides.
>> We'll hit a major show-stopper when osm2pgsql takes 24h to
>> complete. Any bet on /when/ it /will/ happen? :) Any offer for
>> higher-end hosting? :)
> Hello David,
> A 60% penalty is substantial enough to be worth trying to reduce the
> impact of running these things concurrently.
> http://wiki.openstreetmap.org/wiki/Osm2pgsql#Slim_mode tells me that
> the daily update diffs mean osm2pgsql is being run in slim mode, which
> should mean that the daily diffs themselves could be split into chunks
> after downloading them, and osm2pgsql then run on the resulting
> smaller diffs. These chunks could be scheduled at times of lower load,
> with a deadline to start any remaining updates if not completed by
> then, to ensure all updates finish in time.
From what I understand, you are proposing to split the diff updates
into chunks and to schedule the renderings in between the processing of
these chunks, effectively serializing things "manually" in order to
control the impact of the renderings on the diff update.
The idea is nice; indeed, it could allow us to survive a little longer.
But I'm still afraid this solution could add significant overhead to
the diff updates (i.e. the cost of parsing the diff update, splitting
it, etc.). Furthermore, doing so would remove the benefit of having
2 CPUs available, not to mention the pain of implementing it (we would
need to synchronize Django with the diff update, with all the
fault-tolerance mess when a process crashes, etc.).
For the same strategy, another, lighter (imho) solution I was thinking
of was to keep the parallelism we have but to control it: regulate the
flow of renderings so that the penalty on the diff updates stays below
60%. That is, when the rendering queue is populated, we would not
render maps continuously while the diff update is running (which is
what we do now). Instead, we would control when renderings are allowed
(think of some "fluid" scheduling technique) while osm2pgsql runs to
completion. That way, we don't have to bother about osm2pgsql (it runs
continuously), but we do regulate the renderings so that the overhead
they incur on the diff update stays controlled and moderate.
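A minimal sketch of what such "fluid" regulation could look like, under the simplest possible policy: renderings are allowed only during a fixed fraction of each period, while osm2pgsql runs untouched. This is entirely illustrative; the duty-cycle idea, the class, and all names are made up here, not existing MapOSMatic code, and the real scheduler would presumably adapt the duty cycle to the measured penalty.

```python
# Hypothetical sketch of "fluid" rendering regulation: while the diff
# update runs, a rendering may only start during the first `duty`
# fraction of each `period`-second window, bounding its overhead.
import time

class RenderThrottle:
    """Allow renderings only during `duty` fraction of each period."""

    def __init__(self, duty=0.4, period=600.0):
        assert 0.0 <= duty <= 1.0
        self.duty = duty
        self.period = period
        self.start = time.monotonic()

    def may_render(self):
        # Position within the current period: render only in the
        # first `duty` fraction of it.
        elapsed = (time.monotonic() - self.start) % self.period
        return elapsed < self.duty * self.period

    def wait_for_slot(self):
        # Block the rendering worker until the window opens.
        while not self.may_render():
            time.sleep(1.0)
```

The rendering daemon would call `wait_for_slot()` before dequeuing each job whenever a diff update is in progress, and skip the throttle otherwise.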
But both solutions have their limits: assuming OSM keeps gaining in
popularity, at some point the diff update will require 24h to process
even when it is alone on the machine. So, at best, we will eventually
not be able to render anything, and at worst, we will not even be able
to update the DB... Of course, this will happen later with the strategy
above than if we keep the current scheme, but it will eventually
happen; these solutions just buy us a few extra weeks or months. That's
the main reason why I would recommend an "easy" technical
implementation if we decide to adopt this strategy in the meantime.
In the longer run, either we find the right way to tune the whole
system (pgsql, nice, etc.) so that we significantly reduce the cost of
running the diff updates, or we enjoy a higher-end machine, or we
optimize osm2pgsql and/or the DB indexes in PostGIS. Or all of the
above.
> While I don't have higher-end hosting to offer, I'd be more than happy
> to investigate tuning the update process on my local development
> server, and to submit patches and findings where applicable. I'll be
> installing a copy of the maposmatic codebase this weekend as it is;
> once I have it up and running, I'll start paying attention to what's
> what as far as these updates are concerned.
> That is: is contention for disk I/O slowing things down, or does
> osm2pgsql dominate the CPU? What happens when we change the priority
> of the update and/or rendering tasks, and so on. It may take me some
> time to get down and dirty with this codebase, as it's new to me, but
> I hope to be of some use to the project in due time. ;)
To answer your first question, I didn't personally investigate. But my
intuition is that it's either I/O-bound, or missing an index that would
speed things up, or inefficiently sending serially several queries that
could be grouped. Having more RAM should help anyhow (imho). The OSM
people would probably know a lot more about this, and I'd be interested
to hear from them.
As for the second point, you first have to follow the instructions in
the INSTALL file for ocitysmap. We recommend using postgres 8.3. These
instructions have been followed several times by several people running
Ubuntu jaunty, karmic, and Debian sid (both 32- and 64-bit). Then you
follow the INSTALL file in maposmatic.
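The priority experiment mentioned above could be driven like this. `nice` and `ionice` are standard Linux tools (coreutils and util-linux respectively), but the wrapper functions and default values below are hypothetical, just one way to run the renderer at lower CPU and I/O priority while osm2pgsql keeps the defaults:

```python
# Hypothetical sketch: run a command with lowered CPU priority (nice)
# and best-effort I/O priority (ionice class 2), to test whether disk
# I/O or CPU contention dominates the diff-update slowdown.
import subprocess

def niced_cmd(cmd, cpu_nice=10, io_class=2, io_level=7):
    """Wrap `cmd` in nice + ionice; class 2/level 7 is the lowest
    best-effort I/O priority."""
    return ["nice", "-n", str(cpu_nice),
            "ionice", "-c", str(io_class), "-n", str(io_level)] + cmd

def run_niced(cmd, **kw):
    # Returns the wrapped command's exit status.
    return subprocess.call(niced_cmd(cmd, **kw))

# e.g. deprioritize only the renderer while osm2pgsql runs normally:
# run_niced(["python", "render_map.py"], cpu_nice=15)
```

Comparing diff-update wall-clock time with and without such wrapping around the rendering tasks would tell us which resource the two workloads are actually fighting over.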
> The box in question is an AMD Athlon64 X2 6000 (@ stock 3 GHz) with
> 4 GB of DDR2 RAM; basically, my old workstation converted to a server.
> I take it you've already looked into the following (from the same page):
> "Optimization
> Large imports into PostGIS are very sensitive to maintenance and
> monitoring configuration: it is smart to increase the value of
> checkpoint_segments so that autovacuum tasks don't slow down imports."
> Regards,
> Jeroen.
We are very interested in any postgres/system parameter we could tune.
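For reference, the settings that wiki passage points at live in postgresql.conf. The values below are only illustrative starting points for a Postgres 8.3-era box with 4 GB of RAM, not tested recommendations:

```
# postgresql.conf -- illustrative values only, to be benchmarked
checkpoint_segments = 20          # default 3; fewer, larger checkpoints
                                  # during bulk diff imports
checkpoint_completion_target = 0.7
shared_buffers = 512MB            # raise if RAM allows
maintenance_work_mem = 256MB      # speeds up index builds and VACUUM
autovacuum = on                   # consider disabling only around imports
```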
Best regards,