# # # patch "wiki/PerformanceWork/SQLiteAnalyzeDiscussion.mdwn" # from [129b9b89a10e714df93779d8798102ed5db4bfe0] # to [c9fe44481f0e127b53483a20002b85e57dd6ddc5] # ============================================================ --- wiki/PerformanceWork/SQLiteAnalyzeDiscussion.mdwn 129b9b89a10e714df93779d8798102ed5db4bfe0 +++ wiki/PerformanceWork/SQLiteAnalyzeDiscussion.mdwn c9fe44481f0e127b53483a20002b85e57dd6ddc5 @@ -1,6 +1,6 @@ -[[!tag migration-auto]] +[[!tag migration-done]] -Ever since we imported SQLite 3.2.3 (which included a much improved query optimizer, and added the [analyze](http://sqlite.org/lang_analyze.html) command), we've needed to run an analyze on monotone repositories to get decent performance, otherwise SQLite doesn't make good use of our indices and we become extremely I/O bound. This can affect many areas of monotone--for instance, a small commit into a very large repository can take an extremely long time unless the repository has been analyzed recently, because we spend many minutes seek()ing in the repository. +Ever since we imported SQLite 3.2.3 (which included a much improved query optimizer, and added the [analyze](http://sqlite.org/lang_analyze.html) command), we've needed to run an analyze on monotone repositories to get decent performance, otherwise SQLite doesn't make good use of our indices and we become extremely I/O bound. This can affect many areas of monotone--for instance, a small commit into a very large repository can take an extremely long time unless the repository has been analyzed recently, because we spend many minutes seek()ing in the repository. ---- @@ -13,90 +13,88 @@ Question on Note on Note: Can you provid ---- Question on Note on Note: Can you provide a specific monotone command (or ideally, a specific SQL statement) that is aided by ANALYZE when run on the latest monotone database complete with user/sys/real timings? - - I think 'heads' will do it, the query to select branch certs with a particular value. - [[People/MattJohnston]] + - I think `heads` will do it, the query to select branch certs with a particular value. - [[People/MattJohnston]] ---- -I tried 'mtn heads' on a freshly pulled repo and source tree with and without ANALYZE (i.e., with and without the sqlite_stat1 table). Both database files were hot. The timings were basically the same. Using a database with ANALYZE previously run on it resulted in a timing of: real 0m3.375s. Using a database without ANALYZE previously run on it (no sqlite_stat1 table) resulted in a timing of: real 0m3.234s. ANALYZE is clearly not helpful in this case; it just served to put the database into the OS cache. Do you recall any other specific examples of monotone slowness without ANALYZE? (Please run all tests on hot databases to compare apples with apples.) +I tried `mtn heads` on a freshly pulled repo and source tree with and without ANALYZE (i.e., with and without the sqlite_stat1 table). Both database files were hot. The timings were basically the same. Using a database with ANALYZE previously run on it resulted in a timing of: `real 0m3.375s`. Using a database without ANALYZE previously run on it (no `sqlite_stat1` table) resulted in a timing of: `real 0m3.234s`. ANALYZE is clearly not helpful in this case; it just served to put the database into the OS cache. Do you recall any other specific examples of monotone slowness without ANALYZE? (Please run all tests on hot databases to compare apples with apples.) Just for kicks, I powered off my computer and restarted it to guarantee that all database files would be cold. I then retested both the ANALYZEs and non-ANALYZEd databases. The timings are as follows: mt.mtn.analyze: - /c/monotone/monotone-sources$ time ../mtn.exe --db=../mt.mtn.analyze heads + /c/monotone/monotone-sources$ time ../mtn.exe --db=../mt.mtn.analyze heads - mtn: branch 'net.venge.monotone' is currently merged: - 7160327541e5ef25b37068e8cebfcff674bcb887 address@hidden 2006-05-01T12:00:21 - - real 0m9.234s - user 0m0.015s - sys 0m0.015s + mtn: branch 'net.venge.monotone' is currently merged: + 7160327541e5ef25b37068e8cebfcff674bcb887 address@hidden 2006-05-01T12:00:21 + real 0m9.234s + user 0m0.015s + sys 0m0.015s mt.mtn.no-analyze: - /c/monotone/monotone-sources$ time ../mtn.exe --db=../mt.mtn.no-analyze heads - mtn: branch 'net.venge.monotone' is currently merged: - 7160327541e5ef25b37068e8cebfcff674bcb887 address@hidden 2006-05-01T12:00:21 + /c/monotone/monotone-sources$ time ../mtn.exe --db=../mt.mtn.no-analyze heads + mtn: branch 'net.venge.monotone' is currently merged: + 7160327541e5ef25b37068e8cebfcff674bcb887 address@hidden 2006-05-01T12:00:21 - real 0m4.672s - user 0m0.015s - sys 0m0.015s + real 0m4.672s + user 0m0.015s + sys 0m0.015s Re-run the previous commands, now that both database files are in OS cache: mt.mtn.analyze: - /c/monotone/monotone-sources$ time ../mtn.exe --db=../mt.mtn.analyze heads + /c/monotone/monotone-sources$ time ../mtn.exe --db=../mt.mtn.analyze heads - mtn: branch 'net.venge.monotone' is currently merged: - 7160327541e5ef25b37068e8cebfcff674bcb887 address@hidden 2006-05-01T12:00:21 - - real 0m3.312s - user 0m0.015s - sys 0m0.015s + mtn: branch 'net.venge.monotone' is currently merged: + 7160327541e5ef25b37068e8cebfcff674bcb887 address@hidden 2006-05-01T12:00:21 + real 0m3.312s + user 0m0.015s + sys 0m0.015s + mt.mtn.no-analyze: - /c/monotone/monotone-sources$ time ../mtn.exe --db=../mt.mtn.no-analyze heads - mtn: branch 'net.venge.monotone' is currently merged: - 7160327541e5ef25b37068e8cebfcff674bcb887 address@hidden 2006-05-01T12:00:21 + /c/monotone/monotone-sources$ time ../mtn.exe --db=../mt.mtn.no-analyze heads + mtn: branch 'net.venge.monotone' is currently merged: + 7160327541e5ef25b37068e8cebfcff674bcb887 address@hidden 2006-05-01T12:00:21 - real 0m3.235s - user 0m0.015s - sys 0m0.015s + real 0m3.235s + user 0m0.015s + sys 0m0.015s Verify whether ANALYZE had beed run on these files: mt.mtn.analyze: - /c/monotone$ ./sqlite3.exe mt.mtn.analyze "select * from sqlite_stat1" + /c/monotone$ ./sqlite3.exe mt.mtn.analyze "select * from sqlite_stat1" - db_vars|sqlite_autoindex_db_vars_1|4 2 1 - revision_certs|revision_certs__name_value|15523 2218 2 - revision_certs|revision_certs__id|15523 4 - revision_certs|sqlite_autoindex_revision_certs_2|15523 2218 2 2 1 1 - revision_certs|sqlite_autoindex_revision_certs_1|15523 1 - files|sqlite_autoindex_files_1|1433 1 - public_keys|sqlite_autoindex_public_keys_2|24 1 - public_keys|sqlite_autoindex_public_keys_1|24 1 - file_deltas|sqlite_autoindex_file_deltas_1|14315 2 1 - revisions|sqlite_autoindex_revisions_1|4122 1 - branch_epochs|sqlite_autoindex_branch_epochs_2|1 1 - branch_epochs|sqlite_autoindex_branch_epochs_1|1 1 - revision_roster|sqlite_autoindex_revision_roster_1|4122 1 - roster_deltas|sqlite_autoindex_roster_deltas_1|5008 2 1 - revision_ancestry|revision_ancestry__child|5012 2 - revision_ancestry|sqlite_autoindex_revision_ancestry_1|5012 2 1 - next_roster_node_number|sqlite_autoindex_next_roster_node_number_1|1 1 - rosters|sqlite_autoindex_rosters_1|2 1 + db_vars|sqlite_autoindex_db_vars_1|4 2 1 + revision_certs|revision_certs__name_value|15523 2218 2 + revision_certs|revision_certs__id|15523 4 + revision_certs|sqlite_autoindex_revision_certs_2|15523 2218 2 2 1 1 + revision_certs|sqlite_autoindex_revision_certs_1|15523 1 + files|sqlite_autoindex_files_1|1433 1 + public_keys|sqlite_autoindex_public_keys_2|24 1 + public_keys|sqlite_autoindex_public_keys_1|24 1 + file_deltas|sqlite_autoindex_file_deltas_1|14315 2 1 + revisions|sqlite_autoindex_revisions_1|4122 1 + branch_epochs|sqlite_autoindex_branch_epochs_2|1 1 + branch_epochs|sqlite_autoindex_branch_epochs_1|1 1 + revision_roster|sqlite_autoindex_revision_roster_1|4122 1 + roster_deltas|sqlite_autoindex_roster_deltas_1|5008 2 1 + revision_ancestry|revision_ancestry__child|5012 2 + revision_ancestry|sqlite_autoindex_revision_ancestry_1|5012 2 1 + next_roster_node_number|sqlite_autoindex_next_roster_node_number_1|1 1 + rosters|sqlite_autoindex_rosters_1|2 1 - mt.mtn.no-analyze: - /c/monotone$ ./sqlite3.exe mt.mtn.no-analyze "select * from sqlite_stat1" + /c/monotone$ ./sqlite3.exe mt.mtn.no-analyze "select * from sqlite_stat1" - SQL error: no such table: sqlite_stat1 + SQL error: no such table: sqlite_stat1 So there you have it - ANALYZE on a hot database makes no difference, and on a cold database is actually slower (strangely, in this specific case) than a database that was not ANALYZED (no sqlite_stat1 table). @@ -104,31 +102,30 @@ It looks like the behaviour has changed It looks like the behaviour has changed for the better in recent versions (0.26). I think this problem was originally observed around 0.21/0.22 (whenever analyze was first introduced). It also may be that I misremember the originally noted performance problem. I attempted to reproduce slow commits with an unanalyzed repository with a hot and cold cache, and found that there is no significant difference in performance between analyzed and unanalyzed databases for this case. -The experiment was to commit one line changes to three files in the head of net.venge.monotone. The database had been freshly dumped, loaded and analyzed prior to each test run. In both cases, the database was analyzed first, but in the unanalyzed case the contents of the sqlite_stat1 table was deleted after the analyze. +The experiment was to commit one line changes to three files in the head of net.venge.monotone. The database had been freshly dumped, loaded and analyzed prior to each test run. In both cases, the database was analyzed first, but in the unanalyzed case the contents of the sqlite_stat1 table was deleted after the analyze. -Analyzed, hot cache: 7.18 to 7.71 seconds, 95% C.I. 0.27 -Unanalyzed, hot cache: 7.34 to 7.73 seconds, 95% C.I. 0.19 + Analyzed, hot cache: 7.18 to 7.71 seconds, 95% C.I. 0.27 + Unanalyzed, hot cache: 7.34 to 7.73 seconds, 95% C.I. 0.19 -Analyzed, cold cache: 11.13 to 15.12 seconds, 95% C.I. 1.99 -Unanalyzed, cold cache: 12.74 to 14.81 seconds, 95% C.I. 1.03 + Analyzed, cold cache: 11.13 to 15.12 seconds, 95% C.I. 1.99 + Unanalyzed, cold cache: 12.74 to 14.81 seconds, 95% C.I. 1.03 ---- I'm glad you finally agree that ANALYZE is of no use whatsoever to Monotone in its present state. The funny thing is that I'm the same dude who pointed NJS towards using ANALYZE in the SQLite ticket (http://www.sqlite.org/cvstrac/tktview?tn=1414) in the first place, and asked that SQLite continue to perform manual table join ordering somehow for this exact type of case. The reason why ANALYZE helped in that specific case is due to the fact that SQLite guessed wrongly as to how to perform the ordering of the join in the query: - SELECT A.parent, A.child + SELECT A.parent, A.child FROM revision_ancestry AS A, revision_certs AS C, revision_certs AS P - WHERE (C.id IN a_view OR P.id IN a_view) AND - C.id = A.child AND P.id = A.parent AND - C.name = 'branch' AND P.name = 'branch' AND - C.value != P.value + WHERE (C.id IN a_view OR P.id IN a_view) AND + C.id = A.child AND P.id = A.parent AND + C.name = 'branch' AND P.name = 'branch' AND + C.value != P.value - However, you do not necessarily need ANALYZE to get good query results with multi-table joins. Simply replace each comma in the FROM clause with "CROSS JOIN" and fiddle with the order of the tables in the from clause until you find an optimal order for your SELECT (i.e., FROM revision_ancestry AS A CROSS JOIN revision_certs AS C CROSS JOIN revision_certs AS P). This is exactly how the logic in SQLite used to work before ANALYZE was introduced. The order of the tables in the from clause determined the order in which the tables were joined. Because monotone 0.26 apparently does not make use of multi-table joins in queries, ANALYZE is not helpful except as a database file pre-caching technique (although the monotone database pre-caching patch proposed on the mailing list is more effective/efficient). -As for the theory on why "time ../mtn.exe --db=../mt.mtn.analyze heads" was so much slower in a cold start than "time ../mtn.exe --db=../mt.mtn.no-analyze heads", I reversed the order in which I ran the two commands and I got 6s for heads on mt.mtn.no-analyze and 9s again with mt.mtn.analyze, each after a cold machine boot. The two database files are identical except for the existance of the sqlite_stat1 table in mt.mtn.analyze. +As for the theory on why `time ../mtn.exe --db=../mt.mtn.analyze heads` was so much slower in a cold start than `time ../mtn.exe --db=../mt.mtn.no-analyze heads`, I reversed the order in which I ran the two commands and I got 6s for heads on mt.mtn.no-analyze and 9s again with mt.mtn.analyze, each after a cold machine boot. The two database files are identical except for the existance of the sqlite_stat1 table in mt.mtn.analyze. And for the record, it takes virtually zero time (1/10th of a second or less) to load the mtn binary into memory from a cold OS file cache. It takes less than 2 seconds for my 5 year old pentium to pre-cache an 80 meg file from a cold cache, and just 1/10th of a second from a hot cache.