by John Turner
Posted on April 13, 2015
Last week, I was speaking to an intern who worked for me while I was at Paddy Power. He was explaining with some frustration that the supervisor for his final year project had recently been reassigned. The guidance from the new supervisor was that the project needed some “wow”. I was not sure if this meant he needed to present the project while flanked by the Dallas Cowboys Cheerleaders, so I asked for some specifics. This was the rather perplexing response.
I know Ian is using JPA and I would like to give him a little more information than he received from the above suggestion. So here is my effort at explaining the JPA inheritance strategies.
The JPA inheritance strategies facilitate mapping of an inheritance hierarchy in 3 different ways. To demonstrate the advantages and disadvantages of each, I will use a class hierarchy comprising a Vehicle, Airplane, Bike and Car.
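As a rough sketch of what such a hierarchy might look like in code, the mapping below uses the standard `javax.persistence` annotations. The `id` and `colour` properties on Vehicle come from the discussion later in this post; the `wingspan`, `gears` and `doors` properties are hypothetical placeholders, and the code in the GitHub repository may well differ. It needs the JPA API on the classpath to compile.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

// Abstract root of the hierarchy. The strategy is declared once, here,
// and applies to every subclass.
@Entity
@Inheritance(strategy = InheritanceType.JOINED) // or SINGLE_TABLE, TABLE_PER_CLASS
abstract class Vehicle {
    @Id
    @GeneratedValue
    private Long id;

    private String colour;
}

@Entity
class Airplane extends Vehicle {
    private int wingspan; // hypothetical property
}

@Entity
class Bike extends Vehicle {
    private int gears; // hypothetical property
}

@Entity
class Car extends Vehicle {
    private int doors; // hypothetical property
}
```

In a real project each class would normally be public and live in its own file; they are shown together here only to keep the sketch short.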
So that you can follow or experiment with the code, I’ve made it available on GitHub. Note that there is a separate repository for each class, that the Vehicle class is abstract and that each repository has a corresponding test class. I’m using Spring Boot to automatically discover the repositories and entity mappings, as well as to provide an in-memory database for testing purposes.
I’ve enabled the Hibernate SQL logging so that we can see the table DDL as well as the structure of a findAll query executed via each of the repositories.
The JOINED inheritance strategy specifies a table for each class within the hierarchy.
Table Structure
Given the Vehicle hierarchy, there will be 4 tables created as demonstrated by the log output below:
You will also notice that in this simple example, the properties of each class correlate directly to columns on each table, with the id duplicated across all 4 tables. The id column on each of the airplane, bike and car tables acts as a foreign key to the vehicle table, while the id on the vehicle table is the primary key.
From this we can deduce all the advantages and disadvantages.
Changing Class Definitions is Easier
Because there is a direct correlation of class + properties to table + columns, there is no duplication of the vehicle definition. As a result, a single change to a class or property definition will result in a single change to a table or column definition. For example, if I add an ‘isMotorised’ property to the Vehicle class, it will result in the addition of a ‘motorised’ column on the vehicle table.
Data Integrity at the DBMS
Another side effect of the direct mapping between class and table is that I can manage data integrity at the database layer. For example, column definitions including null constraints, foreign key constraints etc. can be managed by the DBMS (without resorting to stored procedures).
Creating an Airplane, Bike or Car is (more) Expensive
Saving an Airplane requires an insert into both the vehicle and airplane tables. It also requires a sequence number to be generated for the vehicle and a foreign key constraint validation to occur on the airplane table. We can see this from the log statement below:
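Independently of the log output, the steps involved can be pictured with a toy in-memory model. This is only a sketch, not JPA or Hibernate code: the two maps stand in for the vehicle and airplane tables, the counter stands in for the database sequence, and the `wingspan` column is a hypothetical Airplane property.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory model of the JOINED layout. This is not JPA or Hibernate code.
class JoinedSaveSketch {
    // Each "table" maps a primary key to a row of column values.
    final Map<Long, Map<String, Object>> vehicle = new LinkedHashMap<>();
    final Map<Long, Map<String, Object>> airplane = new LinkedHashMap<>();

    private long sequence = 0; // stands in for the database sequence
    int insertCount = 0;       // number of inserts issued so far

    long saveAirplane(String colour, int wingspan) {
        long id = ++sequence; // 1. generate the identifier

        // 2. insert the inherited state into the parent table
        Map<String, Object> vehicleRow = new LinkedHashMap<>();
        vehicleRow.put("colour", colour);
        vehicle.put(id, vehicleRow);
        insertCount++;

        // 3. insert the subclass state, keyed by the same id
        insertAirplaneRow(id, wingspan);
        return id;
    }

    private void insertAirplaneRow(long id, int wingspan) {
        // Foreign key validation: the parent row must already exist.
        if (!vehicle.containsKey(id)) {
            throw new IllegalStateException("No vehicle row with id " + id);
        }
        Map<String, Object> airplaneRow = new LinkedHashMap<>();
        airplaneRow.put("wingspan", wingspan); // hypothetical column
        airplane.put(id, airplaneRow);
        insertCount++;
    }
}
```

Calling `saveAirplane` once leaves `insertCount` at 2, which is the extra cost this strategy pays on every save of a subclass.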
Retrieving an Airplane, Bike or Car is (more) Expensive
Retrieving an Airplane (or Airplanes) requires a join between vehicle and airplane.
Retrieving a Vehicle is (much more) Expensive
Retrieving a Vehicle (or Vehicles) requires a (left outer) join between vehicle, airplane, bike and car.
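To make the difference between the two retrievals concrete, here is a small hand-written sketch that builds the rough shape of each query as a string. It is an illustration only: the class and method names are made up, `select *` is a simplification, and the SQL a JPA provider actually generates will look different in its details.

```java
import java.util.List;

// Hand-written sketch of the SQL shapes; not the output of any JPA provider.
class JoinedQuerySketch {

    // Fetching one subclass: join its table to the parent table.
    static String subclassSelect(String parent, String subclass) {
        return "select * from " + subclass
             + " inner join " + parent
             + " on " + subclass + ".id = " + parent + ".id";
    }

    // Fetching the root type: outer join every subclass table.
    static String polymorphicSelect(String parent, List<String> subclasses) {
        StringBuilder sql = new StringBuilder("select * from " + parent);
        for (String subclass : subclasses) {
            sql.append(" left outer join ").append(subclass)
               .append(" on ").append(subclass).append(".id = ")
               .append(parent).append(".id");
        }
        return sql.toString();
    }
}
```

The polymorphic query grows by one outer join for every subclass in the hierarchy, which is why retrieving Vehicles is the most expensive operation under this strategy.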
As the name suggests, the SINGLE_TABLE inheritance strategy specifies a table for the entire class hierarchy.
Table Structure
Given the Vehicle hierarchy, there will be 1 table created as demonstrated by the log output below:
You’ll notice that the first column definition does not correspond to a property from any of the class files. This is a discriminator column that allows JPA to understand the type of entity to create when it retrieves a row from this table.
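The role of the discriminator can be sketched in plain Java. The code below is not how any JPA provider is implemented; it simply shows the idea of reading a type marker from a row and using it to decide which class to instantiate. The column name `dtype` follows the JPA default (`DTYPE`), the row is modelled as a map, and the classes are bare stand-ins for the real entities.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Bare stand-ins for the entity classes.
abstract class Vehicle { String colour; }
class Airplane extends Vehicle { }
class Bike extends Vehicle { }
class Car extends Vehicle { }

// Illustration only: picks the class to instantiate from a discriminator value.
class DiscriminatorSketch {
    private static final Map<String, Supplier<Vehicle>> TYPES = new HashMap<>();
    static {
        TYPES.put("Airplane", Airplane::new);
        TYPES.put("Bike", Bike::new);
        TYPES.put("Car", Car::new);
    }

    static Vehicle fromRow(Map<String, Object> row) {
        String dtype = (String) row.get("dtype");
        Supplier<Vehicle> factory = TYPES.get(dtype);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown discriminator: " + dtype);
        }
        Vehicle vehicle = factory.get();
        vehicle.colour = (String) row.get("colour"); // inherited column
        return vehicle;
    }
}
```

A row whose `dtype` is `Bike` comes back as a `Bike` instance, even though it was read from the same table as every Airplane and Car.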
Changing Class Definitions is Harder
When using the JOINED strategy, I only impacted the table corresponding to the class I was modifying. With the SINGLE_TABLE strategy, I am changing the table that all vehicles are stored in. This may or may not be a big deal but is worth considering, especially if you are storing lots of data.
Data Integrity at the DBMS
Column definitions including null constraints, foreign key constraints etc. can no longer be fully managed by the DBMS (without resorting to stored procedures), since a column belonging to one subclass has to accept nulls for rows of every other type. This can be a problem, especially when a database can be accessed by different applications using different access layers.
Data Storage
Because a single table stores all vehicles, there will necessarily be a lot of null values in columns that do not relate to the specific type. This may or may not be a problem depending on the size of the tables and how efficient the DBMS is at storing null values.
Creating an Airplane, Bike or Car is Cheap
Because the SINGLE_TABLE strategy uses a single table, creating an Airplane, Bike or Car is a single insert executed against a single (albeit larger) table.
Note that the discriminator value defaults to the class name. Be careful that whatever discriminator you use is treated efficiently by the DBMS (in terms of both storage and query efficiency). The discriminator column can be changed using the JPA DiscriminatorColumn annotation, and the value stored for each class can be set using the DiscriminatorValue annotation.
Retrieving a Vehicle, Airplane, Bike or Car is Cheap
Again, because of the use of a single table, retrieving a Vehicle, Airplane, Bike or Car (or many of same) is cheap. It is a single select statement, which filters using the discriminator value if a subclass is being queried. This is the cheapest strategy for executing polymorphic queries (i.e. retrieving Vehicles).
The TABLE_PER_CLASS strategy is really a table per concrete class strategy.
Table Structure
Given the Vehicle hierarchy, there will be 3 tables created as demonstrated by the log output below:
The inherited colour property is defined in the definitions of the airplane, bike and car tables. There is no vehicle table as Vehicle is an abstract class. There is also no discriminator as data from each vehicle is stored in its own table.
Changing Class Definitions is Easier
Given that each concrete class has its own corresponding table definition, it is easier to modify these. However, if I modify the Vehicle definition I have to make corresponding modifications to the airplane, bike and car tables.
Data Integrity at the DBMS
Because each concrete class is stored in its own table the DBMS can manage null constraints, foreign key constraints etc. without resorting to stored procedures.
Creating an Airplane, Bike or Car is Cheap
Because the TABLE_PER_CLASS strategy uses a single table for each concrete class, creating an Airplane, Bike or Car is a single insert executed against a single table.
There is no discriminator value in this case!
Retrieving an Airplane, Bike or Car is Cheap
Again, because of the use of a single table, retrieving an Airplane, Bike or Car (or many of same) is cheap. It is a single select statement.
Retrieving a Vehicle is Expensive
The biggest disadvantage of this strategy is that polymorphic queries are expensive, requiring a union of 3 data sets (from each of the tables).
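The shape of that polymorphic query can be sketched with a few lines of Java that stitch the per-table selects together. This is a hand-written illustration with made-up class and method names, not the SQL any provider emits: it selects only the inherited `id` and `colour` columns, and it assumes `union all` since rows in different tables are different vehicles. A real provider also has to line up the subclass-specific columns across the tables.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hand-written sketch of the query shape; not the output of any JPA provider.
class TablePerClassQuerySketch {

    // One select per concrete table, combined into a single result set.
    static String polymorphicSelect(List<String> concreteTables) {
        return concreteTables.stream()
                .map(table -> "select id, colour from " + table)
                .collect(Collectors.joining(" union all "));
    }
}
```

Every concrete class added to the hierarchy adds another branch to the union, so the cost of retrieving Vehicles grows with the number of subclasses.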
So there we have it, some of the considerations to be taken into account when selecting the JPA inheritance strategy. A lot of these will depend on the particulars of the DBMS you are using but for the most part this will provide a useful guide.