
A 10x Performance Increase for Batch INSERTs With MySQL Connector/J Is On The Way...

Connector/J has a feature where, if configured with "rewriteBatchedStatements=true", the driver can take batches of prepared statements of the form "INSERT INTO foo VALUES (...)" and rewrite them to the form "INSERT INTO foo VALUES (...), (...), (...)". This is a performance win on a few fronts: it reduces latency (remember, MySQL in general doesn't have a "batch" form of prepared statement parameter bindings, so each parameter set usually needs to be sent as a separate INSERT), and the server itself optimizes handling of "multivalue" INSERT.
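To make that concrete, here's a minimal sketch of turning the feature on and submitting a batch through the standard JDBC API. The host, schema, credentials and the "foo" table are all placeholders for this example (the table is assumed to already exist):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class RewriteBatchExample {

    public static void main(String[] args) throws Exception {
        // "rewriteBatchedStatements=true" tells the driver to collapse the
        // batch into a single multivalue INSERT when it's sent to the server
        Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost/test?user=root&rewriteBatchedStatements=true");

        PreparedStatement pstmt = c.prepareStatement(
                "INSERT INTO foo(id, name) VALUES (?, ?)"); // hypothetical table

        for (int i = 0; i < 1000; i++) {
            pstmt.setInt(1, i);
            pstmt.setString(2, "name-" + i);
            pstmt.addBatch(); // each parameter set becomes one "(...)" group
        }

        // Sent as "INSERT INTO foo VALUES (...), (...), ..." rather than
        // 1000 separate INSERT round-trips
        pstmt.executeBatch();

        pstmt.close();
        c.close();
    }
}

Nothing about the application code changes when the rewrite is enabled; the batching API is the same, and only the wire traffic differs.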

Prior to the code now sitting at the head of Connector/J 5.1, which fixes Bug#41532 and Bug#40440, "larger" batches (over 500 or so batched parameter sets) experienced extreme performance degradation due to a bunch of extra parsing the driver did while creating the underlying prepared statement to accept all of the values in the "multivalue" form, as seen in this profiling session while running a microbenchmark that creates simple, large INSERT batches:


[Profiler screenshot: time dominated by statement parsing]

We can see in the following graph that in 5.1.7 we got acceptable (though not as good as the new code!) rates of a few thousand rows per second on INSERT, up to batch sizes of 256 or so. After that, the number of rows per second asymptotically approaches zero (not good!):


[Graph: INSERT rows/second vs. batch size, Connector/J 5.1.7]

Now, with the patch that will be part of 5.1.8, we take the parse cost once and leave semi-materialized structures available that can be combined into the required "multivalue" INSERT statement form without re-parsing. If you also set "cachePrepStmts=true" as a configuration parameter, the parse cost is avoided for all prepared statements that are cached on a given connection. With this change, you can see a large difference in the profile:
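As a sketch of what combining the two options looks like (host, schema and credentials are placeholders here), both can go straight into the JDBC URL; "prepStmtCacheSize" controls how many statements are cached per connection:

import java.sql.Connection;
import java.sql.DriverManager;

public class CachedRewriteConfig {

    public static Connection open() throws Exception {
        String url = "jdbc:mysql://localhost/test"
                + "?user=root"
                + "&rewriteBatchedStatements=true" // rewrite batches to multivalue INSERT
                + "&cachePrepStmts=true"           // cache parsed statements per connection
                + "&prepStmtCacheSize=25";         // statements to cache per connection

        return DriverManager.getConnection(url);
    }
}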


[Profiler screenshot with the 5.1.8 patch applied: parsing no longer dominates]

As well as in the performance of our microbenchmark, where we see nearly a 10x performance improvement, topping out close to 40,000 rows/second:


[Graph: INSERT rows/second vs. batch size, with the 5.1.8 patch]

Notice that in both the old version of the feature and the new one, there is a "buckle" in throughput, which in the new code is due to bottlenecks in MySQL itself or the hardware it's sitting on, so it seems there is a "sweet spot" in batch size. This buckle seems to shift to the left for InnoDB compared to MyISAM (at least on my laptop), but that's under no contention. Today, the re-written batch feature limits batch size only by "max_allowed_packet", but I think we'll want to add a configuration option to limit it either by parameter set count or by packet size, since this sweet spot more than likely changes depending on storage engine, schema and hardware capabilities. Perhaps we can squeeze this in before we cut a release of 5.1.8. Until then, you can cap the batch size yourself in application code, as sketched below.
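Here's a minimal sketch of that workaround: flush the batch every N rows so no single rewritten INSERT grows past the sweet spot. The chunk size of 1024 and the "foo" table are just illustrative guesses; as noted above, the right value will vary by storage engine, schema and hardware:

import java.sql.Connection;
import java.sql.PreparedStatement;

public class ChunkedBatchInsert {

    // Arbitrary guess at the "sweet spot"; tune per engine/schema/hardware
    private static final int CHUNK_SIZE = 1024;

    public static void insertAll(Connection c, int[] ids) throws Exception {
        PreparedStatement pstmt = c.prepareStatement(
                "INSERT INTO foo(id) VALUES (?)"); // hypothetical table

        try {
            int pending = 0;

            for (int id : ids) {
                pstmt.setInt(1, id);
                pstmt.addBatch();

                // Flush each chunk so a single rewritten multivalue INSERT
                // never exceeds the chosen size (or max_allowed_packet)
                if (++pending == CHUNK_SIZE) {
                    pstmt.executeBatch();
                    pending = 0;
                }
            }

            if (pending > 0) {
                pstmt.executeBatch(); // flush the remainder
            }
        } finally {
            pstmt.close();
        }
    }
}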

If you'd like to give the new version of this feature a shot, it's available in the nightly snapshot builds of Connector/J 5.1 at http://downloads.mysql.com/snapshots.php. I'd appreciate any feedback you might have on it!

In the interests of "full disclosure", here is the microbenchmark code, which is a rework of the regression test for Bug#41532:


import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.sql.Types;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.Map.Entry;

import org.apache.commons.math.stat.descriptive.DescriptiveStatistics;

import com.mysql.jdbc.NonRegisteringDriver;

public class Foo {

    public static void main(String[] args) throws Exception {

        Connection c = new NonRegisteringDriver().connect(
                "jdbc:mysql:///test?user=root&rewriteBatchedStatements=true",
                null);

        // Batch size -> elapsed time (ms) for each repetition
        Map<Integer, List<Long>> results = new TreeMap<Integer, List<Long>>();

        for (int repeat = 0; repeat < 34; repeat++) {
            for (int numberOfRows = 1; numberOfRows < 10000; numberOfRows *= 2) {
                List<Long> forThisRowCount = results.get(Integer.valueOf(numberOfRows));

                if (forThisRowCount == null) {
                    forThisRowCount = new LinkedList<Long>();
                    results.put(Integer.valueOf(numberOfRows), forThisRowCount);
                }

                c.createStatement().execute("DROP TABLE IF EXISTS testBug41532");
                c.createStatement().execute(
                        "CREATE TABLE testBug41532(ID INTEGER, "
                                + "S1 VARCHAR(100), S2 VARCHAR(100), S3 VARCHAR(100), "
                                + "D1 DATETIME, D2 DATETIME, D3 DATETIME, "
                                + "N1 DECIMAL(28,6), N2 DECIMAL(28,6), N3 DECIMAL(28,6), "
                                + "UNIQUE KEY UNIQUE_KEY_TEST_DUPLICATE (ID)) ENGINE=MYISAM");

                PreparedStatement pstmt = c.prepareStatement(
                        "INSERT INTO testBug41532(ID, S1, S2, S3, D1, "
                                + "D2, D3, N1, N2, N3) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)");

                try {
                    c.setAutoCommit(false);
                    c.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

                    Date d1 = new Date(System.currentTimeMillis());
                    Date d2 = new Date(System.currentTimeMillis() + 1000000);
                    Date d3 = new Date(System.currentTimeMillis() + 1250000);

                    for (int i = 0; i < numberOfRows; i++) {
                        pstmt.setObject(1, Integer.valueOf(i), Types.INTEGER);
                        pstmt.setObject(2, String.valueOf(i), Types.VARCHAR);
                        pstmt.setObject(3, String.valueOf(i * 0.1), Types.VARCHAR);
                        pstmt.setObject(4, String.valueOf(i / 3), Types.VARCHAR);
                        pstmt.setObject(5, new Timestamp(d1.getTime()), Types.TIMESTAMP);
                        pstmt.setObject(6, new Timestamp(d2.getTime()), Types.TIMESTAMP);
                        pstmt.setObject(7, new Timestamp(d3.getTime()), Types.TIMESTAMP);
                        pstmt.setObject(8, new BigDecimal(i + 0.1), Types.DECIMAL);
                        pstmt.setObject(9, new BigDecimal(i * 0.1), Types.DECIMAL);
                        pstmt.setObject(10, new BigDecimal(i / 3), Types.DECIMAL);
                        pstmt.addBatch();
                    }

                    // Time only the batch execution and commit, not parameter binding
                    long startTime = System.currentTimeMillis();
                    pstmt.executeBatch();
                    c.commit();
                    long stopTime = System.currentTimeMillis();

                    long elapsedTime = stopTime - startTime;

                    forThisRowCount.add(Long.valueOf(elapsedTime));

                    System.out.println(numberOfRows + ": elapsedTime: "
                            + elapsedTime + " rows/ms: "
                            + (double) numberOfRows / (double) elapsedTime);

                    // Sanity-check that every batched row actually arrived
                    ResultSet rs = c.createStatement().executeQuery(
                            "SELECT COUNT(*) FROM testBug41532");
                    rs.next();
                    if (rs.getInt(1) != numberOfRows) {
                        System.out.println("Failed!");
                    }
                    rs.close();
                } finally {
                    pstmt.close();
                }
            }
        }

        // Summarize rows-per-millisecond statistics for each batch size
        System.out.println("size\t\t\tmin\t\t\tmax\t\t\tmean\t\t\tstddev");

        for (Entry<Integer, List<Long>> entry : results.entrySet()) {
            List<Long> forRowCount = entry.getValue();

            DescriptiveStatistics stats = new DescriptiveStatistics();
            double rowCount = entry.getKey().doubleValue();

            for (Long val : forRowCount) {
                stats.addValue(rowCount / val.doubleValue());
            }

            System.out.println(rowCount + "\t\t\t" + stats.getMin() + "\t\t\t"
                    + stats.getMax() + "\t\t\t" + stats.getMean() + "\t\t\t"
                    + stats.getStandardDeviation());
        }
    }
}
