
Benchmarks: Some thoughts / Alguns pensamentos

This article is written in English and Portuguese
Este artigo está escrito em Inglês e Português

English version:

Disclaimer
Although I have a perfectly clear disclaimer on the right side of the blog, I’d like to start this article by reinforcing it. Everything I write here represents my own thoughts and in no way my employer’s views or positions. It’s also probable that the ideas presented here go against some respectable people’s ideas and some established opinions. I’d like to apologize if someone expects something different. Having said this, the ideas presented here are my real beliefs, based on my work experience, on my readings, and on many discussions with some very respectable people.

What is a database benchmark?
Database benchmarks are well-defined stress tests, and when we see references to them, chances are they refer to the benchmarks published by the TPC. The TPC (Transaction Processing Performance Council) is a non-profit organization whose members are database software vendors, hardware makers, and other IT-related organizations. Its purpose is to “define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry”.
The TPC has defined several benchmark specifications, with designations like:

  • TPC-C
    OLTP benchmark
  • TPC-D
    Data warehouse benchmark (deprecated)
  • TPC-H
    Data warehouse benchmark (current)
  • TPC-DS
    Decision support benchmark (new)
  • TPC-E
    OLTP benchmark created to replace the TPC-C

The benchmark specs are very detailed and include everything that defines the tests: the database schema, the queries that are to be run, the rules that specify what can and can’t be used in terms of features, and also what must be used (referential and check constraints, for example).
The benchmark reports include the measures taken, typically expressed as a number of transactions per time unit (the transactions are well defined in the specs). They also include the total system cost, covering hardware and software acquisition and support for a well-defined period of time (3 years, for example), and a derived price/performance ratio.
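
As a rough illustration of how the headline numbers relate (using made-up figures, not taken from any published result), the price/performance metric in a TPC-C report is simply the total system cost divided by the throughput in tpmC:

    # Hypothetical figures only: a system priced at US$ 600,000
    # that sustains 300,000 tpmC (New-Order transactions per minute)
    awk 'BEGIN { cost = 600000; tpmC = 300000;
                 printf "price/performance = %.2f USD/tpmC\n", cost / tpmC }'
    # Prints: price/performance = 2.00 USD/tpmC
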
These specs have evolved over time for several reasons:

  • New hardware and software features rendered earlier specs useless (materialized views, for example, killed TPC-D)
  • New uses of software made some older specs look outdated, so new specs were created to better match the new reality

Personally, I have no reason to assume that the TPC was not created with good intentions. But for reasons that will become clear along this article, I give no credit to the benchmarks. They only measure a supplier’s will to win, and how well it can twist the benchmark rules in order to achieve something worth publishing. Please note that although I have some commercial understanding of the market, I’m basically a technician. Top management of big companies tends not to understand technical arguments, and may, because of that, give some credit to the benchmarks published on TPC.org. But if you’re talking to a technically aware person, I believe it’s easy to dismantle a benchmark in around 10 minutes (I’ll try it on the top published TPC-C result as an exercise).
Having said this, running a publishable benchmark represents an enormous technical and economic effort, and I must give credit to the companies that do it.

The TPC-E mystery…
The TPC-E benchmark specification was introduced because TPC-C had a lot of holes that made it unrealistic. You can easily find several opinions stating it’s a better benchmark than its predecessor.
But if you check the results, you’ll see that only one database vendor, Microsoft, has published results for it. The mystery is: why? Of course, MS supporters say it’s because no one can beat them. No one else believes that. Most people who dedicate some time to studying this believe it’s a leapfrog game that really depends on the investment. And if it’s still done (or was) with TPC-C, a very mature specification, it should be even easier to do with a relatively new benchmark (the tricks are easier to discover and implement while the specs are still new). I’ve found several references on the Internet discussing why no one else entered the TPC-E “game”, but none of them is conclusive (from my perspective). I can leave some references here for your own research:

You can decide for yourself. Personally I think the issue is related to what Mr. Jerry Keesee (IBM Informix database development director) explained in a public webcast (Chat with the labs) on 29 January 2009. More on that later.
But currently for OLTP, the most popular benchmark is TPC-C. TPC-E apparently is being pushed by hardware vendors (including IBM).

Benchmarks and Informix
Informix was a regular leader of benchmarks in the nineties. It used to partner with hardware vendors to achieve several top results in several categories.
The subject of benchmarks is very sensitive within the Informix community. Many people strongly believe that IBM should run official benchmarks using Informix. IBM never did so after acquiring Informix Software Inc. We may or may not agree with that position, but the reasons were clearly explained by Mr. Jerry Keesee in a very clear answer to the question “How come IBM doesn’t participate in public benchmarks of IDS? Like the TPC.org benchmarks?”. The question was asked by a well-known (and particularly critical of IBM) participant in the Informix forums at the end of a webcast on 29 January 2009. The reasons presented were:

  1. IBM has been doing TPC-C benchmarks with DB2 for a very long time. If we did one with Informix, only two things could happen, and they’re both bad for IBM:
    1. Informix gets a better number, and the competition and analysts would crush us (IBM and DB2)
    2. Informix gets a worse number, and the competition and analysts would crush us (IBM and Informix)
  2. We could consider TPC-E, but currently only one vendor (MS) has published results for this kind of benchmark. Once we published one (which would have to be better in absolute numbers or in cost, since no one publishes a benchmark that doesn’t show an improvement), we would have entered a very expensive race, because the vendor that is surpassed will probably reply, and the leapfrog game would start. Jerry prefers to invest in new features and product improvements that directly benefit the customers.

You can’t ask for something clearer than that. Meanwhile, a benchmark on MDM (Meter Data Management) was done and published, and Informix got a great result, but MDM is not a standard benchmark, so it doesn’t satisfy people who really want TPC results.

Very recently you may have noticed that the words “TPC-C” and “Informix” were floating around the social networks and some Informix-related sites. That’s because Eric Vercelletto decided to pick up the TPC-C schema and specifications and run a non-official database stress test. While some people were jumping around in happiness, others were criticizing him. My position is much more neutral, and I think most of the people talking about it never took the time to make a deep analysis of a TPC-C benchmark result. The opinions tend to be divided between something like “Yeah! Informix finally has a TPC-C benchmark. Now we can show the performance Informix can achieve” and “Oh… the result is so low. He used the free version. It’s useless”. Really, if you want to talk about it, let’s spend some time looking at some facts:

  • The benchmark run by Eric is not a true TPC-C benchmark. It’s not official, it’s not audited, and yes, the number is very low if you compare it to other official published results (a comparison which, by the way, can’t be made for legal and technical reasons)
  • Yes, Eric used the Innovator-C edition, which is a cut-down, cost-free edition. It has limits and lacks some features that could help (only 2GB of RAM, no partitioning, etc.)
  • Eric fought technical problems with the clients sending the transactions. In fact, he put the clients and the database on the same machine, something you’ll never see in a true benchmark
  • Eric used 4 hard disks. You can find a published result on the TPC.org site, with the same number of cores in the database server, that used 200 hard disks. Yes, you read correctly: two hundred hard disks (and for the top results the numbers are in the thousands)
  • The same  published be…

Informix 11.70.xC4 is available / Informix 11.70.xC4 está disponível

This article is written in English and Portuguese
Este artigo está escrito em Inglês e Português

English Version:

IBM released Informix 11.70.xC4 on October 25. The changes in this release, taken directly from the release notes, are (comments added):

  • Administration
    • Enhancements to the OpenAdmin Tool (OAT) for Informix
      OAT now allows the management of database users (for non-OS users), and OAT is now delivered and installable with the Client SDK for Windows (32-bit), Linux (32- and 64-bit) and Mac OS (64-bit)
    • Enhancements to the Informix Replication Plug-in for OAT
      The ER plugin now follows ER improvements and can handle multibyte locales.
    • Informix Health Advisor Plug-in for OAT
      A totally new plugin that can examine a great number of metrics and configuration details, warning you (by email) of anything not within the recommended settings and/or defined thresholds.
      The checks can be scheduled, and you can set up several different profiles. Each will run a specific (and configurable) set of metrics.
    • Dynamically change additional configuration parameters
      Several parameters can now be changed with onmode -wm/-wf. Some of them are really important (WSTATS, AUTO_REPREPARE, CKPTINTVL, DIRECTIVES, OPTCOMPIND, SHMADD) and can save you from planned downtime. Others are more or less irrelevant (some of them could already be changed by editing the $ONCONFIG file), but it’s important that they can also be changed through the SQL Admin API for client DBA tools (see the command sketch after this list)
    • Compare date and interval values
      API extensions to compare datetime and interval values.
    • Plan responses to high severity event alarms
      I could not understand what is new here; this could already be done by customizing the ALARMPROGRAM script
    • Data sampling for update statistics operations
      A new parameter (USTLOW_SAMPLE) defines whether you want to sample the data during the index information gathering (for indexes with more than 100,000 leaf pages). 11.70.xC3 did this by default. This can also be set at session level. Note that this can have a dramatic impact on the time it takes to regenerate your statistics. The “LOW” mode will be the slowest for large tables with indexes…
    • SQL administration API command arguments for creating sbspaces
      New options to create smart blob spaces with logging and access time recording in SQL admin API
    • Monitor client program database usage
      The client program’s full path name is now available in onstat -g ses (see the monitoring sketch after this list).
      Note that although you can use this to monitor and control access, this information is sent by the client side and can potentially be faked (not by the average user, but an attacker could do it)
    • Progress of compression operations
      Two new columns in onstat -g dsk show the approximate percentage of the tasks already completed and the estimated time to finish
  • High availability and Enterprise Replication
    • Easier setup of faster consistency checking
      When the ifx_replcheck column is used and an index is created on it, the CRCOLS are not necessary
    • Handle Connection Manager event alarms
      Scripts used for processing Connection Manager alarms now have access to two variables that identify the Connection Manager name (INFORMIXCMNAME) and the connection unit name (INFORMIXCMCONUNITNAME). This facilitates script creation
    • Easier startup of Connection Manager
      When the CMCONFIG variable is set and points to the Connection Manager configuration file, the Connection Manager can be started, stopped, and restarted without specifying the configuration file, much like ONCONFIG is used for the engine (see the startup sketch after this list)
    • Prevent failover if the primary server is active
      A new parameter called SDS_LOGCHECK specifies a number of seconds during which the SDS secondaries will monitor the logical logs for activity (which would be generated by the primary server). This implements a safety measure to prevent an SDS server from becoming a primary after a “false” failure of the primary. Note that this is usually prevented by using I/O fencing, but if that is not available, this can be another way to make sure you don’t end up with two primaries
    • Configure secure connections for replication servers
      A new parameter called S6_USE_REMOTE_SERVER_CFG define…
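
As mentioned in the dynamic configuration item above, here is a minimal command sketch of the two ways to change a now-dynamic parameter such as CKPTINTVL. The value 300 is just an illustration, and the SQL Admin API call assumes you connect as a user with privileges on the sysadmin database:

    # Change the value in memory only (lost at the next restart):
    onmode -wm CKPTINTVL=300
    # Change it in memory and write it to the $ONCONFIG file:
    onmode -wf CKPTINTVL=300

    # The same change through the SQL Admin API, usable from any SQL client:
    echo 'EXECUTE FUNCTION task("onmode", "wf", "CKPTINTVL=300");' | dbaccess sysadmin -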
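
And the monitoring sketch promised above: onstat -g ses lists the sessions, and the per-session detail includes the client program’s full path name (the session id below is obviously just an example):

    # List all sessions:
    onstat -g ses
    # Full detail for one session, including the client program path:
    onstat -g ses 1234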
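
Finally, the Connection Manager startup sketch referenced above. The configuration file path and the Connection Manager name are hypothetical, and the -k (stop) and -r (restart) flags are the oncmsm options as I recall them, so double-check them against your documentation:

    # Point CMCONFIG at the Connection Manager configuration file:
    export CMCONFIG=$INFORMIXDIR/etc/mycm.cfg

    # Start, stop and restart without repeating the file name:
    oncmsm
    oncmsm -k my_cm
    oncmsm -r my_cm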
