From: Hendrik Pagenhardt
Subject: [GNUnet-developers] MYSQL optimization
Date: Mon, 15 Dec 2003 11:44:40 +0100

Hi there,

I recently converted my GNUnet database from gdbm to MySQL. Shortly
after the conversion I noticed much higher disk activity than before.
Since there was no increase in traffic, I suspected the database queries
as the culprits. My investigation showed that at least one query can be
optimized heavily: getRandomContent(). The query used there is an
optimized variant of the standard "random row" access method found in many
places, but it is still too slow for just fetching a random chunk of
content. So I came up with the following:

---------------------------

/* Return a (pseudo-)random piece of content by probing the hash index
 * at a random point instead of scanning the whole table. */
int getRandomContent(HighDBHandle handle,
                     ContentIndex * ce) {
  mysqlHandle * dbh = handle;
  MYSQL_RES * sql_res;
  MYSQL_ROW sql_row;
  char * escapedHash;
  char * hash;
  char * scratch;
  int i;
  int found;

  MUTEX_LOCK(&dbh->DATABASE_Lock_);
  hash = MALLOC(sizeof(HashCode160));
  escapedHash = MALLOC(2*sizeof(HashCode160)+1);
  scratch = MALLOC(2*sizeof(HashCode160)+512+1);

  found = NO;
  /* generate a uniformly random hash to use as the probe point */
  for (i=0;i<sizeof(HashCode160);i++)
    hash[i] = randomi(256);
  mysql_escape_string(escapedHash, hash, sizeof(HashCode160));
  /* first SELECT: a hash >= the random probe (via the hash index);
   * second SELECT: wrap-around fallback to the lowest hash overall,
   * in case the probe is larger than every stored hash */
  sprintf(scratch,
          "SELECT hash,type,priority,fileOffset,fileIndex "
          "FROM data%uof%u "
          "WHERE hash >= '%s' "
          "AND (type = %d OR type = %d) "
          "LIMIT 1 UNION "
          "SELECT hash,type,priority,fileOffset,fileIndex "
          "FROM data%uof%u "
          "WHERE hash >= '' "
          "AND (type = %d OR type = %d) "
          "LIMIT 1",
          dbh->n,
          dbh->i,
          escapedHash,
          LOOKUP_TYPE_CHK,
          LOOKUP_TYPE_CHKS,
          dbh->n,
          dbh->i,
          LOOKUP_TYPE_CHK,
          LOOKUP_TYPE_CHKS);
  mysql_query(dbh->dbf, scratch);
  if(mysql_error(dbh->dbf)[0]) {
    LOG(LOG_ERROR,
        "ERROR: %s\n",
        mysql_error(dbh->dbf));
    FREE(hash);
    FREE(scratch);
    FREE(escapedHash);
    MUTEX_UNLOCK(&dbh->DATABASE_Lock_);
    return SYSERR;
  }
  if(!(sql_res=mysql_store_result(dbh->dbf))) {
    LOG(LOG_ERROR,
        "ERROR: %s\n",
        mysql_error(dbh->dbf));
    FREE(hash);
    FREE(scratch);
    FREE(escapedHash);
    MUTEX_UNLOCK(&dbh->DATABASE_Lock_);
    return SYSERR;
  }
  if(mysql_num_rows(sql_res)>0) {
    if(!(sql_row=mysql_fetch_row(sql_res))) {
      LOG(LOG_ERROR,
          "ERROR: %s\n",
          mysql_error(dbh->dbf));
      FREE(hash);
      FREE(scratch);
      FREE(escapedHash);
      mysql_free_result(sql_res);
      MUTEX_UNLOCK(&dbh->DATABASE_Lock_);
      return SYSERR;
    }
    /* copy the selected row into the ContentIndex (network byte order) */
    memcpy(&ce->hash,
           sql_row[0],
           sizeof(HashCode160));
    ce->type = htons(atol(sql_row[1]));
    ce->importance = htonl(atol(sql_row[2]));
    ce->fileOffset = htonl(atol(sql_row[3]));
    ce->fileNameIndex = htons(atol(sql_row[4]));
    found = YES;
    mysql_free_result(sql_res);
  } else {
    /* no rows matched at all: still release the (empty) result set */
    mysql_free_result(sql_res);
  }


  FREE(hash);
  FREE(scratch);
  FREE(escapedHash);

  MUTEX_UNLOCK(&dbh->DATABASE_Lock_);
  if(found==YES) {
    return OK;
  } else {
    LOG(LOG_DEBUG,
        "DEBUG: MySQL random didn't find anything!\n");
    return SYSERR;
  }
}

-------------------------

My version relies on the index on the hash column being used to walk the
hash values in sorted order until one is found that is greater than or
equal to the randomly generated hash. If none is found (the random hash is
too high), the lowest hash is used instead (wrap-around). Of course this is
only sufficiently random if the hashes in the database are evenly
distributed over the key space (hashes following large "gaps" are
preferred), but I think it should suffice for the purpose. If there is a
conceptual error in this approach I'll gladly discuss it.
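To make the bias concrete, here is a small stand-alone simulation (not
part of the patch; the hash values in it are made up purely for
illustration): it places a few one-byte "hashes" on a 0..255 key space
and repeatedly applies the "first hash >= random probe, else wrap around
to the lowest" rule.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Toy model of the selection rule: one-byte hashes, sorted ascending.
 * The values are invented just to show the effect of uneven gaps. */
static const int hashes[] = { 10, 20, 200, 230 };
#define NUM_HASHES (sizeof(hashes) / sizeof(hashes[0]))

int main(void) {
  long picks[NUM_HASHES] = { 0 };
  long trials = 1000000;
  long t;
  unsigned int i;

  srand((unsigned int) time(NULL));
  for (t = 0; t < trials; t++) {
    int probe = rand() % 256;      /* the randomly generated "hash"       */
    unsigned int chosen = 0;       /* wrap-around default: lowest hash    */
    for (i = 0; i < NUM_HASHES; i++) {
      if (hashes[i] >= probe) {    /* first stored hash >= the probe      */
        chosen = i;
        break;
      }
    }
    picks[chosen]++;
  }
  for (i = 0; i < NUM_HASHES; i++)
    printf("hash %3d picked %5.1f%% of the time\n",
           hashes[i], 100.0 * picks[i] / trials);
  return 0;
}

With these values the entry at 200 is picked roughly 70% of the time
because of the large gap below it, while the entry at 20 is picked only
about 4% of the time; with evenly spread hashes the selection becomes
close to uniform.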

Another place in the MySQL database module worth optimizing is
"estimateAvailableBlocks()", as it is called quite often (once for every
bucket). The queries themselves are quite fast, but I think they add up.
I plan to change it to re-count only the bucket that was touched by an
update and to cache the previous results for all other buckets. This
should reduce the number of those queries by a factor of the bucket
count. When I come up with an implementation suggestion, I will post it
here.
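For what it's worth, here is a rough sketch of the caching idea. The names
(NUM_BUCKETS, query_bucket_count(), the cache arrays) are mine and do not
appear in the GNUnet sources; a real implementation would hook into the
existing update and count paths of the MySQL module instead of the stubs
used here.

#include <stdio.h>

#define NUM_BUCKETS 8

static long cached_count[NUM_BUCKETS];  /* last known block count per bucket */
static int  count_dirty[NUM_BUCKETS];   /* non-zero if cached value is stale */

/* Stand-in for the existing per-bucket COUNT(*) query against MySQL. */
static long query_bucket_count(int bucket) {
  printf("expensive COUNT(*) on bucket %d\n", bucket);
  return 42; /* dummy value */
}

/* Call this whenever an insert or delete touches 'bucket'. */
static void mark_bucket_dirty(int bucket) {
  count_dirty[bucket] = 1;
}

/* Returns the block count for 'bucket', hitting the database only if the
 * bucket changed since the last count. */
static long estimate_bucket_blocks(int bucket) {
  if (count_dirty[bucket]) {
    cached_count[bucket] = query_bucket_count(bucket);
    count_dirty[bucket] = 0;
  }
  return cached_count[bucket];
}

int main(void) {
  int i;
  for (i = 0; i < NUM_BUCKETS; i++)
    mark_bucket_dirty(i);          /* force an initial count of everything */
  for (i = 0; i < NUM_BUCKETS; i++)
    estimate_bucket_blocks(i);     /* NUM_BUCKETS queries                  */
  mark_bucket_dirty(3);            /* an update touches bucket 3 ...       */
  for (i = 0; i < NUM_BUCKETS; i++)
    estimate_bucket_blocks(i);     /* ... so only one query this time      */
  return 0;
}

The point is simply that after the initial round of counts, an update to
one bucket triggers exactly one COUNT query instead of one per bucket.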

Ciao,
     Hendrik




