Monday, December 14, 2009

Java and OpenJPA for MySQL Cluster

MySQL Cluster Connector for Java is a new feature in the upcoming Cluster 7.1.

It is an easy to use, high-performance Java and JPA interface for MySQL Cluster. We currently call it simply Cluster/J and Cluster/JPA. It is an Objection Relational bridge that maps Java classes to tables stored in MySQL Cluster, and implements an OpenJPA interface.

How does it work?

The new MySQL Cluster Connector for Java and JPA implementation consist of two parts: an OpenJPA plugin and the standalone Cluster/J API.

The standalone Cluster/J API allows you to write very simple Java programs for MySQL Cluster. A read by primary key is then as easy as

Fish e = session.find(Fish.class, 4711);

A query like the above is executed without using the MySQL Server. Cluster/J API is a native MySQL Cluster-specific API, which is a direct access path to Cluster which is very simple to use, and which is used as the fast track in OpenJPA.

The OpenJPA for Cluster implementation plugs into Apache OpenJPA and provides native access to Cluster. Previous JPA implementations provided Cluster support via JDBC and the MySQL Server. JDBC is also used in this implementation; however, many transactions can now be directly executed on Cluster, without the MySQL Server being involved. This allows much faster query execution and higher throughput. We thus refer to it as "the fast track". More complex transactions still will be sent through JDBC where they will be optimized by the MySQL Server's query optimizer.

A third part is an implementation detail called NDB JTie; this is needed to actually use MySQL Cluster Connector/J with Cluster. NDB JTie contains a Java-to-C/C++ mapper based on JNI. It consists of a dynamic library and the two JAR files ndbjtie and JTie as wrappers around the existing NDBAPI while adding very little overhead. You need this part but are not supposed to program against it.

The native Cluster/J implementation already allows complex queries, but it is limited to queries on single tables and does not support relations or inheritance. Those are the domain of JPA.

Where to download?

The sources can be found here:

mysql-5.1.39-ndb-7.1.0-clusterj-174-alpha.tar.gz

A Linux 64-bit tarball can be found here:

mysql-cluster-7.1.0-clusterj-174-alpha-linux-x86_64.tar.gz.

Cluster/J is part of Cluster 7.1 but should work with current GA 7.0 versions of Cluster as well. It can be used for writing OpenJPA based or Cluster/J based applications. Even simple "drop-in" migrations for OpenJPA from any other database to Cluster may work, although we have not yet extensively tested such migrations ourselves.

In the download you will find the following new files and directories that make up Cluster/J:
  1. share/mysql/java directory containing all JAR files needed
  2. the lib/mysql or lib directory with the new library libndbjtie
  3. storage/ndb/clusterj directory containing the Cluster/J sub-packages and sources
  4. storage/ndb/ndbjtie directory with the new JNI interface
  5. Cluster/J pom.xml and automake files if you like to work with maven or automake for compiling Cluster/J and JPA yourself.
  6. Cluster/J is integrated with automake into the cluster build system.
The naming of jarfiles follows the typical conventions of <package>-<version>.jar. There are also unversioned links to any jar file. In addition, many of the files exist only for testing purposes.

Prerequisites

Download and unpack mysql-cluster-connector-j-7.1.0-135.tar.gz from ftp.mysql.com/pub/mysql/download, configure and compile; JTie and Cluster/J are enabled by default whenever the configure scripts can find a Java compiler.

./configure --with-plugins=ndbcluster

You will find the two JTie jars under storage/ndb/ndbjtie/ in the sub dirs ./src and ./jtie/src.

If you want to use OpenJPA and want to compile cluster yourself, download OpenJPA from apache.org. We have tested with apache-openjpa-1.2.1. Then let configure know where to find the OpenJPA jar file and that the OpenJPA is supposed to be compiled into the MySQL Cluster Connector for Jav

--with-openjpa --with-classpath=/Users/bo/prg/apache-openjpa-1.2.1/openjpa-1.2.1.jar

Start Cluster as usual.

Where to start?

First, we will develop a simple Cluster/J standalone application which reads from and writes to MySQL Cluster, requiring no use of JDBC or OpenJPA. While the application itself doesn't need a MySQL Server, since it connects directly to the Cluster data nodes, a MySQL Server is still necessary to create a Cluster table.

Check the end if this tutorial for a link to the example files.

A simple Cluster/J program

Three components are needed to implement a Cluster/J program:
  1. An annotated entity interface
  2. A table created in MySQL Cluster
  3. A configuration pointing to a running MySQL Cluster instance
A simple interface for a new entity looks like this:

@PersistenceCapable(table="t_fish_food")
public interface Fish {

@PrimaryKey
int getId();
void setId(int id);

@Index(name="idx_unique_hash_name")
String getName();
void setName(String name);

String getWine();
void setWine(String wine);

@Column(name = "taste")
@Index(name="idx_taste")
Integer getTastiness();
void setTastiness(Integer taste);

}

The annotations need to be imported from the Cluster/J API. They can be found in the annotation package from the cluster API and are imported like this:

import com.mysql.clusterj.annotation.Column;
import com.mysql.clusterj.annotation.Index;
import com.mysql.clusterj.annotation.PersistenceCapable;
import com.mysql.clusterj.annotation.PrimaryKey;

As Cluster/J's main purpose is to write, read and modify data in a database, it is simpler to create the table that matches this interface using MySQL (more specifically the mysql client):

CREATE TABLE `t_fish_food` (
`id` int(11) NOT NULL,
`name` varchar(128) DEFAULT NULL,
`wine` varchar(55) DEFAULT NULL,
`taste` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_unique_hash_name` (`name`) USING HASH,
KEY `idx_taste` (`taste`)
) ENGINE=ndbcluster;

In Cluster/J as well as in JPA, annotations are used to describe how the interface is mapped to tables in a database. The annotation @PersistenceCapable(table="t_fish_food") is used to let Cluster/J know which database table to map the Fish to: the t_fish_food table in cluster. The annotations @PrimaryKey and @Index inform Cluster/J about indexes in the database table.

Whereas JPA requires extra annotations for every attribute, this is not necessary in Cluster/J.

Note that Cluster/J does not yet know which database the actual table is located in. This is part of writing the class which uses the Fish interface. Now that the interface and the table in cluster are complete, the actual program for writing data can be coded. Cluster/J requires the use of sessions and transactions. In order to obtain a session, a session factory needs to be created first. Properties are used to describe the database environment. The easiest way to do this is by reading a properties file:

Since we are accessing Cluster data directly (without going through a MySQL Server), the following properties are sufficient:

com.mysql.clusterj.connectstring=localhost:1186
com.mysql.clusterj.database=test

Once the properties are loaded, we can create a session factory:

sessionFactory = ClusterJHelper.getSessionFactory(props);

The session itself is created like this

session = sessionFactory.getSession();

Once these preparations are complete, the actual reading and writing of data is easy. First create a new instance of Fish, like this:

Fish fish = session.newInstance(Fish.class);
fish.setId(4711 + i);
fish.setName(fishNames[i]);
fish.setTastiness(i*12);
fish.setWine(wineNames[i]);

Next, store it in the cluster via the native interface:

Transaction tx = session.currentTransaction();
tx.begin();
session.makePersistent(fish);
tx.commit();

That's all there is to it. Compile and run the program, then check the database. The attached source example shows how to store an entire ArrayList of Fish in a single transaction.
To use the native Cluster/J API, compile your application with the clusterj-api.jar that is included. The files clusterj-core.jar and clusterj-tie.jar, as well as the the ndbjtie/jtie JAR files, are needed to run your Cluster/J application.

Example:
java -classpath /usr/local/mysql/share/mysql/java/clusterj-api.jar:...

When starting the JVM, set the java.library.path variable to point to the directory containing the Cluster client and ndbjtie libraries.

Example:
java -Djava.library.path=/usr/local/mysql/lib/mysql

To use OpenJPA you must add the same clusterj and ndbjtie JAR files as well as the MySQL Server JDBC connector (mysql-connector-j.jar) to your project. In addition, you must also update your persistence.xml file.

Alternatively, you can also use the individual clusterj packages. Using the native Cluster/J API, you can compile using only the api package; however, if you do so, you must also run your application using the -core and -tie packages as well as the ndbjtie packages.

Whether you are using either native Cluster/J or OpenJPA, the JVM needs to be made aware of the location of the dynamic JTie library. This can be done by adding the directory containing the library to the java.library.path.

If you are wondering why you must use different JAR files at compile time and run time, this is because the Cluster/J API is meant to change less often, which means that you can trust your applications written against it to continue to run without having to adapt them to each new release of it. However, the implementation layers – the Cluster/J core and the clusterj-tie Cluster plugin – can at change any time.

Cluster/J is also intended to be independent of the Cluster version, while the NDB JTie layer is version specific.

Selects and updates with Cluster/J

The most simple and at the same time most efficient way to retrieve data from Cluster via Cluster/J is the find() function, as shown here:

Fish e = session.find(Fish.class, 4711);

An update on the retrieved object is as simple as this:
e.setTastiness(e.getTastiness() * 2);
session.updatePersistent(e);

This all should then be embedded in a transaction.

WHERE clauses

With Cluster/J, it is possible to write more complex queries. Data in Cluster/J is represented as domain objects which are separate from business logic. Domain objects are mapped to database tables. As a first step, you should fetch a query builder which provides the domain object.

 QueryBuilder qb = session.getQueryBuilder();

QueryDomainType dobj = qb.createQueryDefinition(Fish.class);

Query query = session.createQuery(dobj);

List resultList = query.getResultList();
for (Fish result: resultList) {

name = result.getName();
...

The query above maps to a full table scan. In order to introduce certain desired conditions the domain object is modified. PredicantOperand objects represent the operands in the condition. A string is used as a placeholder for parameters. The name of the set or get function of the persistence-capable interface is used as the other operand.

 PredicateOperand parm;
PredicateOperand propertyPredicate;
Predicate greaterThan;

param = dobj.param("tasteparam");
propertyPredicate = dobj.get("tastiness");

greaterThan = propertyPredicate.greaterThan(paramLowerPredicate);
dobj.where(greaterThan);

This part of the code could be also written in pseudo-SQL as WHERE tastiness > ?. The actual query is then created with the domain object, and parameters set for query execution:

 Query query = session.createQuery(dobj);
query.setParameter("tasteparam", new Integer(12));

A simple JPA app

There are excellent tutorials and examples available to help you quickly ramp up on OpenJPA, but this might still be helpful to you in writing a first OpenJPA application for cluster.

OpenJPA requires a running MySQL Server and the MySQL JDBC Connector/J. OpenJPA is built on top of Cluster/J but it routes more complex queries through MySQL Server.

The approach for writing an OpenJPA app using the native Java connector to Cluster is quite similar to that for writing a Cluster/J app. Instead of instantiating a SessionFactory with the needed properties, OpenJPA uses a persistence.xml file. JPA also requires writing an entity class instead of describing the persistable entity with an interface.

Entities mapped to Cluster tables need to implement the Serializable interface. If the table name is the same as the class name, then name = "t_basic" can be omitted. Unlike Cluster/J, JPA requires you to annotate the actual member variables and not the setter and getter functions.

@Entity(name = "t_fish_food") //Name of the entity
public class Fish implements Serializable {

@Id //signifies the primary key
@Column(name = "id", nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private int id;

@Column(name = "name", length = 128)
private String name;

@Column(name = "wine", length = 55)
private String wine;

@Column(name = "taste")
private Integer tastiness;

...

The persistence.xml file needs to be in the META-INF directory, which needs to be in the Java classpath when executing the JPA program:

<persistence xmlns="http://java.sun.com/xml/ns/persistence"
xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0">

<persistence-unit name="ndbjpa" type="RESOURCE_LOCAL">
<provider>
org.apache.openjpa.persistence.PersistenceProviderImpl
</provider>
<class>Fish</class>
<properties>
<property name="openjpa.ConnectionDriverName"
value="com.mysql.jdbc.Driver">
<property name="openjpa.ConnectionURL"
value="jdbc:mysql://localhost:3306/test">
<property name="openjpa.ConnectionUserName" value="root">
<property name="openjpa.ConnectionPassword" value="">
<property name="openjpa.BrokerFactory" value="ndb">
<property name="openjpa.ndb.connectString" value="localhost:1186">
<property name="openjpa.ndb.database" value="test">
</properties>
</persistence-unit>
</persistence>

The openjpa.BrokerFactory property advises JPA to enable the native NDB fast track.

openjpa.ndb.connectString uses the usual syntax for Cluster connection strings pointing to the Cluster management server.The Cluster/J fast track does not extract the database from the JDBC driver URL. You need to configure the fast track database using openjpa.ndb.database.

One of the special characteristic of OpenJPA is the implementation of auto-increment values via a sequence table. This table can be created automatically by enabling synchronized schemas. However, we recommend creating the schema manually with the following CREATE TABLE statement:

CREATE TABLE `OPENJPA_SEQUENCE_TABLE` (
`ID` tinyint(4) NOT NULL,
`SEQUENCE_VALUE` bigint(20) DEFAULT NULL,
PRIMARY KEY (`ID`)
) ENGINE=ndbcluster;

If openjpa.jdbc.SynchronizeMappings is used, you need to be careful. In this case, the MySQL Server needs to be configured to use NDB (or NDBCLUSTER) as the default storage engine. AIn addition, you must keep in mind that indexes are not created from the Java class.

In order to query the t_fish_food table that we created in the previous section, the following steps are necessary:

 EntityManagerFactory entityManagerFactory =
Persistence.createEntityManagerFactory("ndbjpa");
EntityManager em = entityManagerFactory.createEntityManager();
EntityTransaction userTransaction = em.getTransaction();

Please note that the entity manager factor is created with the name pointing to the persistence unit defined in the XML file.

 userTransaction.begin();
Query q = em.createQuery("select x from Fish x where x.tastiness > 12");

for (Fish m : (List) q.getResultList()) {
System.out.println(m.toString());
}
userTransaction.commit();

em.close();
entityManagerFactory.close();

You should also note that the column names in the WHERE clause are the entity's class members and not the column names in the table that they are mapped to. Now you've run your first OpenJPA program on Cluster. You can find more complex examples online. The examples enclosed in the OpenJPA package currently don't work for Cluster or any other storage engine.

Examples

You can find the examples used here in the basic tutorial as part of the source package here. To compile the Cluster/J example clusterj_example:

javac -classpath clusterj-api.jar Fish.java Main.java

Start cluster and set host and port in cluster.properties pointing to the cluster management process. Then run the example with

java \
-Djava.library.path=<cluster-inst-dir>/lib \
-classpath mysql-cluster-connector-java.jar:ndbjtie.jar:jtie.jar:. Main

Libraries needed are all either found in the cluster install directory under lib if you use the pre-compiled cluster and lib/mysql if you compiled cluster yourself. Jar files are found in the share/mysql/java sub directory of your Cluster 7.1 installation.

Compiling the OpenJPA example is as simple as

javac -classpath openjpa-1.2.1.jar:geronimo-jpa_3.0_spec-1.0.jar \
Fish.java Main.java

And you typically run with the following jar files:

java -Djava.library.path=<cluster-inst-dir>/lib \
-classpath openjpa-1.2.1.jar:\
geronimo-jpa_3.0_spec-1.0.jar:\
geronimo-jta_1.1_spec-1.1.jar:\
commons-lang-2.1.jar:\
commons-collections-3.2.jar:\
serp-1.13.1.jar:\
mysql-cluster-connector-java.jar:\
mysql-connector-java-5.1.10-bin.jar:\
jtie.jar:\
ndbjtie.jar:../.:. Main

The MySQL JDBC driver can be downloaded from dev.mysql.com. The jar files are either part of Cluster 7.1 or the Apache OpenJPA distribution.

FAQ & hints

32-bit Java VM and 64-bit libaries This is an easy trap to get caught in. A 64-bit Java VM and 32-bit Cluster shared libraries will not work together (and vice versa). The error message you get when you try to run the program using a mixed 64-bit and 32-bit is quite clear:

failed loading library 'ndbjtie'; ... wrong ELF class: ELFCLASS64 (Possible cause: architecture word width mismatch)

Missing -Djava.library.path The Java VM needs the path to the cluster client libraries and the ndbjtie libraries. If that path is not provided or libaries are not found in the path provided then you get this error message:

failed loading library 'ndbjtie'; java.library.path='...' Exception in thread "main" java.lang.UnsatisfiedLinkError: ... /libndbjtie.so.0.0.0: libndbclient.so.4: cannot open shared object file: No such file or directory