Running Multiple TPC-C instances in MySQL Database



 

Prepared by:     Emmanuel Arzuaga, Northeastern University Computer Architecture Research Group (NUCAR).

 

Summary

The main idea of this project is to simulate multiple workloads on the same machine.  We model a typical workload of a company that holds one license of database system software and shares that license among multiple databases.  These databases may be used for different applications.  The objective is to see how the system behaves when managing different databases at the same time.  In this document we report our first set of experiments, which uses three different databases.

 

Problem Addressed

In these experiments we want to address the lack of characterization of multiple workloads in a server system.  Most research in this area has traditionally focused on characterizing the behavior of streaming data (video/audio) or of servers dedicated entirely to a single application [1].  In real life this is not necessarily the common case.  Because database software licensing is expensive, small companies often use the same license to create all the databases they need.  Their workload therefore consists of multiple applications interacting with a single database system.  To improve performance for this type of service, we need to fully understand the characteristics of these multiple instance workloads.

 

Methods used

In this experiment there are three TPC-C-like [2] implementations.  These were implemented by a NUCAR member using C++ and the MySQL++ library [3] [4].  The machine used for this experiment is a Dell Precision 450 workstation running Fedora Core 4 and MySQL version 4.1.11.  The kernel version is 2.6.11-1.1369FC4_smp.  The system has a Pentium Xeon processor running at 3.06GHz with 1GB RAM, 1MB L2 cache and a 533MHz front side bus.  The system has 2 HDDs of 80GB each.  The HDD configuration is summarized in the following hdparm -i output:

 

 /dev/hda:
 Model=IC35L090AVV207-0, FwRev=V23OA66A, SerialNo=VNVC02G3D50ZWT
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=52
 BuffType=DualPortCache, BuffSize=1821kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156250000
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 3a:
 * signifies the current active mode


Figure 1.   System hard disk information from hdparm -i.

 

 

Experiment and Results

The experiment consisted of running three instances of a TPC-C-like workload on the same system using three different databases.  In this first experiment all three TPC-C implementations have the same configuration:

 

Table 1.  TPC-C workload description.

Warehouse Number    Number of Terminals    Transactions per Terminal
1                   20                     10 million

 

Although this experiment uses only TPC-C data, that does not need to be the case.  In the future we plan to use a mix of TPC-C and TPC-H workloads.  As Table 1 shows, in this experiment there were 20 virtual users (or terminals posting transactions) per program.

 

We are aware that a single warehouse is not a representative configuration for TPC-C testing.  Nevertheless, since these tests can take a long time to execute, we considered it important in this first set of experiments to make sure that the system would execute properly before giving it a heavier workload.

 

We collected disk I/O information for 12 hours of execution using the Linux iostat command [5].  In this report we present the number of KB read and written in a given time unit.  Time is presented in units from 1 to 100, where 100 corresponds to 12 hours, so each unit represents about 7.2 minutes (720 minutes / 100).  Figure 2 presents the number of reads and writes for both hard disks.
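The mapping from raw iostat samples to the 100 time units described above can be sketched as follows.  This is only an illustration under stated assumptions: the input is taken to be one KB value per iostat reporting interval in chronological order, and the function name bucket_totals is hypothetical, not part of the original tooling.

```python
RUN_HOURS = 12
N_UNITS = 100

# Length of one time unit in minutes: 12 h * 60 min / 100 units.
minutes_per_unit = RUN_HOURS * 60 / N_UNITS


def bucket_totals(samples, n_units=N_UNITS):
    """Sum per-interval KB values into n_units equal-width time units.

    samples: KB read (or written) per iostat interval, oldest first.
    This input layout is an assumption made for illustration.
    """
    totals = [0.0] * n_units
    n = len(samples)
    for i, kb in enumerate(samples):
        # Map sample index i (0..n-1) onto a unit index (0..n_units-1).
        unit = i * n_units // n
        totals[unit] += kb
    return totals


if __name__ == "__main__":
    # Hypothetical data: 200 samples of 1 KB each -> 2 KB per unit.
    units = bucket_totals([1.0] * 200)
    print(minutes_per_unit, units[0], len(units))
```

Summing (rather than averaging) matches the report's phrasing of "KB read and written in a given time unit"; an average rate per unit would only need a division by the number of samples in each unit.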

 

[Figure 2: panels (a), (b), (c) and (d)]
Figure 2.  Reads and Writes in KB for both hard disks.

 

We can clearly see from Figure 2 that these workloads are heavily write dominated, so most disk contention would be due to writes.  We can also see that HD2 receives the largest share of the work, about 175% more than HD1.  Note that these disks are not configured in any RAID format, so better disk utilization could easily be obtained by configuring the disks as RAID 0 (striping).
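To illustrate why striping would even out the load, the following minimal sketch maps a logical block number to a disk and an offset under RAID 0.  It is a generic textbook layout, not the layout of any particular RAID implementation; the function name raid0_locate and the stripe size of 16 blocks are assumptions chosen for the example.

```python
def raid0_locate(logical_block, n_disks=2, stripe_blocks=16):
    """Return (disk index, block offset on that disk) for a logical block.

    Consecutive stripes of stripe_blocks blocks are placed on the disks
    in round-robin order, so sequential I/O alternates between disks.
    """
    stripe = logical_block // stripe_blocks      # which stripe overall
    disk = stripe % n_disks                      # round-robin disk choice
    within = logical_block % stripe_blocks       # position inside the stripe
    offset = (stripe // n_disks) * stripe_blocks + within
    return disk, offset


if __name__ == "__main__":
    for block in (0, 15, 16, 17, 32):
        print(block, raid0_locate(block))
```

With two disks, a long sequential write is split roughly evenly between them, instead of one disk absorbing most of the traffic as observed here.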

 

For a better comparison, we ran the same experiment with only one instance of TPC-C, that is, a single instance workload.  This experiment helps us see the impact of this type of application in isolation.  Results are displayed in Figure 3.

 

[Figure 3: panels (a), (b), (c) and (d)]
Figure 3.  Reads and Writes in KB for both hard disks using single instance.

 

Figure 3 shows behavior similar to that of Figure 2.  The HD2 writes are several orders of magnitude larger than those of HD1.  The reads are also smaller than the writes.  Although this experiment may give the impression that our multiple instance data is simply a multiple of the single instance workload on the same system, we still need a better way of characterizing independent workloads running concurrently.
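One simple way to test the "multiplication of workloads" impression is to divide the multiple instance series by the single instance series, time unit by time unit, and see how close the ratios stay to the number of instances (three here).  The sketch below assumes both series are already expressed in the same time units; the function names and the sample numbers are hypothetical and are not measured data.

```python
def scaling_ratios(multi_kb, single_kb):
    """Per-time-unit ratio of multiple instance KB to single instance KB.

    Units where the single instance value is zero yield None, because
    the ratio is undefined there.
    """
    if len(multi_kb) != len(single_kb):
        raise ValueError("series must cover the same time units")
    return [m / s if s > 0 else None for m, s in zip(multi_kb, single_kb)]


def mean_ratio(ratios):
    """Average of the defined ratios, or None if there are none."""
    valid = [r for r in ratios if r is not None]
    return sum(valid) / len(valid) if valid else None


if __name__ == "__main__":
    # Made-up values for illustration only.
    ratios = scaling_ratios([30.0, 60.0, 5.0], [10.0, 20.0, 0.0])
    print(ratios, mean_ratio(ratios))
```

Ratios that stay near 3 would support the simple scaling view; ratios that drift away from it would point to contention between the instances.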

 

Closing Remarks and Future Work

In this report we have briefly shown results from the first experiments dealing with multiple workload characterization.  We plan to extend these experiments by adding TPC-H workloads to better approximate real system behavior.  We are also interested in quantifying the impact of each instance independently as a component of the overall disk utilization.  This will help us better understand the overall performance of the system for a particular set of applications to be run.

 

References

 

[1]           L. Huang, G. Peng, T. Chiueh. Multi-Dimensional Software Virtualization. ACM SIGMETRICS Performance 2004,  New York, June 2004.

[2]           Transaction Processing Performance Council. TPC-C and TPC-H implementation. [web page]    2001-2004; URL: http://www.tpc.org. [Accessed 18 Mar 2006].

[3]           TangentSoft.net, MySQL++ library. [web page]; URL: http://tangentsoft.net/mysql++/. [Accessed 15 Jan 2006].

[4]           V. Sridharan. TPC-C like C++ implementation. Northeastern University Computer Architecture Research Group, Northeastern University, Boston, August 2005.

[5]           S. Godard. Iostat man file. SYSSTAT Utilities Home Page. [web page] 2006 Feb; URL: http://perso.wanadoo.fr/sebastien_godard/. [Accessed 16 Jan 2006].