Re: [Koha-devel] DB design (MARC structure)

From:	Paul POULAIN
Subject:	Re: [Koha-devel] DB design (MARC structure)
Date:	Mon Jun 21 02:51:02 2004
User-agent:	Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.6) Gecko/20040115

Joshua Ferraro wrote:

Paul wrote,

I'm still showing some problems with the searching speed with the latest CVS ... maybe there are still some index problems. I tried an author search on "o'brian, patrick" and the result took about 10-15 seconds to return. Directly from mysql I get:

mysql> select distinct m1.bibid from biblio,biblioitems,marc_biblio,marc_word 
as m1,marc_word as m2,marc_word as m3 where 
biblio.biblionumber=marc_biblio.biblionumber and 
biblio.biblionumber=biblioitems.biblionumber and m1.bibid=marc_biblio.bibid and 
(m1.bibid=m2.bibid and m1.bibid=m3.bibid) and ((m1.word  like 'o%' and 
m1.tagsubfield in ('100a','110a', '700a', '710a'))and (m2.word like 'brian%' and 
m2.tagsubfield in('100a','110a', '700a', '710a'))and (m3.word like 'patrick%' and 
m3.tagsubfield in('100a','110a', '700a', '710a'))) order by biblio.title;

77 rows in set (5.34 sec)

here's the explain on that query:

mysql> explain select distinct m1.bibid from 
biblio,biblioitems,marc_biblio,marc_word as m1,marc_word as m2,marc_word as m3 
where biblio.biblionumber=marc_biblio.biblionumber and 
biblio.biblionumber=biblioitems.biblionumber and m1.bibid=marc_biblio.bibid and 
(m1.bibid=m2.bibid and m1.bibid=m3.bibid) and ((m1.word  like 'o%' and 
m1.tagsubfield in ('100a','110a', '700a', '710a'))and (m2.word like 'brian%' and 
m2.tagsubfield in('100a','110a', '700a', '710a'))and (m3.word like 'patrick%' and 
m3.tagsubfield in('100a','110a', '700a', '710a'))) order by biblio.title;
+-------------+--------+------------------------+-------------+---------+--------------------------+------+----------------------------------------------+
| table       | type   | possible_keys          | key         | key_len | ref                      | rows | Extra                                        |
+-------------+--------+------------------------+-------------+---------+--------------------------+------+----------------------------------------------+
| m2          | range  | bibid,word,Search_Marc | Search_Marc |     259 | NULL                     |  366 | Using where; Using temporary; Using filesort |
| m1          | ref    | bibid,word,Search_Marc | bibid       |       8 | m2.bibid                 |   56 | Using where                                  |
| marc_biblio | eq_ref | PRIMARY,biblionumber   | PRIMARY     |       8 | m1.bibid                 |    1 | Using where; Distinct                        |
| biblio      | eq_ref | PRIMARY,blbnoidx       | PRIMARY     |       4 | marc_biblio.biblionumber |    1 | Distinct                                     |
| biblioitems | ref    | bibnoidx               | bibnoidx    |       4 | biblio.biblionumber      |   12 | Using index; Distinct                        |
| m3          | ref    | bibid,word,Search_Marc | bibid       |       8 | m1.bibid                 |   20 | Using where; Distinct                        |
+-------------+--------+------------------------+-------------+---------+--------------------------+------+----------------------------------------------+
6 rows in set (0.04 sec)

mysql>

So it still looks like we're using temp and filesort--which I assume is
causing the hangup ... if that is the best we can do we may need to start
thinking about breaking up marc_word into sections (e.g., marc_word_title;
marc_word_author, etc.).  The search is really accurate but just too slow
for a production database as large as ours.  What do folks think, would that
speed things up?

I think it's too complex to code. The DB would need a big rewrite, and multi-MARC support would be a pain.

Some ideas to keep speeding things up:

* analyse the table: http://www.databasejournal.com/features/mysql/article.php/10897_1382791_3

* is your my.cnf correctly set? (i.e. with big caches: "Eliminating the filesort to speed things up is best done by calculating how big your result set of this operation can become and then increase the server variable sort_buffer_size which is the maximum size that mysqld will keep in memory before using a filesort instead. Note that if your resultset is extremely big then you might consume more memory than is advisable and then things might slow down because of swapping", hint given here: http://forums.devshed.com/showthread.php?t=149288. Setting sort_buffer_size=16M seems to make my hard disk silent ;-) )

* change the index to make it "unique". The problem is changing the table definition (with a lot of values in it, it will be quite hard). Not sure of the speed improvement.

* move temporary index to RAM (how?)

* if I add a "limit 0,200" to the query, things are faster. Maybe we could add a "0,50" or some systempref value.

* how many lines do you have in your marc_word table? Mine has 9 000 000, and answers are faster than for you (around 5 seconds), except when the result is >200 entries.

* do you have in marc_word only the tags that are interesting (= did you discard, for example, 0xx tags)? I think yes, even if, for instance, it's not a standard Koha hack ;-)

* multi-word is slower than single word (in your case o'brian is a 2-word search). (Note: I still think it's not a good idea to index 1-letter words.)

* An idea could be to split marc_word into 10 tables, one for each nXX tag (0xx, 1xx, 2xx...). I'm not sure we would get a big improvement here, because for some searches we could have to query a lot of different tables. Another problem is that some tables would be almost empty (like 0xx), and some would be huge (like 7xx probably).

* Remove ordering of the result. It will remove the "using filesort", which is bad. Which improvement does it give, Joshua? Maybe we could order the result with a Perl script AFTER building the result (I'm not sure it would be a good idea, as it's not compatible with any limit clause & supposes retrieving the whole result list).
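
For anyone who wants to try the cheaper suggestions from the list above directly in the mysql client, here is a rough sketch. It assumes the table and column names from Joshua's query and a MySQL 4.x server. The 16M value is just the one that worked on my machine, not a recommendation, and the session-level SET is only one possible way to test it without editing my.cnf.

```sql
-- Refresh the key distribution statistics the optimizer uses for marc_word.
ANALYZE TABLE marc_word;

-- Look at the current sort buffer, then raise it for this session only
-- (16777216 bytes = 16M; size it according to the RAM you can spare).
SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 16777216;

-- Count the rows in marc_word, to compare table sizes between installations.
SELECT COUNT(*) FROM marc_word;

-- The same author search as above, capped at the first 200 rows.
SELECT DISTINCT m1.bibid
FROM biblio, biblioitems, marc_biblio,
     marc_word AS m1, marc_word AS m2, marc_word AS m3
WHERE biblio.biblionumber = marc_biblio.biblionumber
  AND biblio.biblionumber = biblioitems.biblionumber
  AND m1.bibid = marc_biblio.bibid
  AND m1.bibid = m2.bibid
  AND m1.bibid = m3.bibid
  AND m1.word LIKE 'o%'
  AND m1.tagsubfield IN ('100a', '110a', '700a', '710a')
  AND m2.word LIKE 'brian%'
  AND m2.tagsubfield IN ('100a', '110a', '700a', '710a')
  AND m3.word LIKE 'patrick%'
  AND m3.tagsubfield IN ('100a', '110a', '700a', '710a')
ORDER BY biblio.title
LIMIT 0, 200;
```

Running the last statement once with and once without the ORDER BY line, each time prefixed with EXPLAIN, should show whether "Using filesort" disappears from the Extra column. The timings will of course depend on the size of your marc_word table.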

HTH

--
Paul POULAIN
Independent free software consultant
French-speaking coordinator for Koha (free ILS, http://www.koha-fr.org)
