Protecting Structured Data Using JDBC and Machina Tools for Java

Machina Tools provides an easy-to-use interface for interacting with Machina. In particular, Machina Tools exposes functions to create and manage data protection keys, to tag them with attributes that can be used to inform policy, and to encrypt data using your choice of ciphers. As part of this capability, Machina allows application developers to create and manage cryptographic keys in very large volumes.

This scale gives developers the freedom to create as many keys as needed, down to a separate key for each piece of data, which in turn gives you a very fine level of resolution for policy-based data access. Machina’s policy engine combines policy rules, data markings, and real-time information about every key request to provide complete control over access to protected data.

Using Machina Tools, you can build this functionality directly into JVM-hosted applications. The SDK can be incorporated into mobile, desktop, and server application profiles, and it takes advantage of Java’s support for file access and storage and for network and database connectivity.

This post presents examples showing how to integrate the Java SDK into a vanilla Java Database Connectivity (JDBC) application. These samples describe a use case for protecting individual data values in a database table holding personnel records.

  • Code Integration Sample 1 describes a simple integration, where a database table schema may be modified to accommodate the additional metadata used by Machina to protect sensitive table data.
  • Code Integration Sample 2 describes a more complex integration, where table data may be protected even in the case of an existing database with a fixed schema.

These two code samples take advantage of Machina’s scalability by protecting each data value independently. As the code walk-throughs show, this capability makes fine-grained control of data visibility simple and intuitive. Complete code for both samples has been published to GitHub.

Java Database Connectivity (JDBC)

Java includes a set of classes that provide for connectivity to database management systems. The Java API allows applications to access data in databases through the use of the java.sql package. This package provides the APIs for accessing and processing data stored in a data source (usually a relational database). The Java Database Connectivity (JDBC) interfaces allow client applications to establish connections with a given database server, and to access, add, and modify data in a platform-independent (Linux, Windows, MacOS) and database-independent (PostgreSQL, MySQL, Oracle, SQL Server) manner.

Java paired with a SQL relational database is a popular choice for implementing data-oriented software solutions. The pairing allows for scalable solutions and an expressive business-logic vocabulary.

Code Integration Sample 1

This code sample consists of a small set of Java classes that implement manipulations of a single relational database table. The sample is written to interact with any JDBC-compliant database system. The setup scripts target the syntax of PostgreSQL, but other database systems could be targeted as long as they support equivalent functionality.

When invoked, the code is run in the context of a single unit test. This is done in order to minimize the complexity of the sample. It should be straightforward to map the concepts into an alternate execution environment.

This sample demonstrates several capabilities of the data protection engine of Machina:

  • the ability to substitute Ionic-protected versions of database record values into a JDBC record INSERT
  • the ability to read JDBC records containing Ionic-protected values from a JDBC SELECT, decrypting the content in the context of the operation
  • the ability to tag sensitive record fields with data markings, allowing fine-grained logic to govern key release decisions
  • the ability to programmatically adjust Machina server policy, causing the view of the resulting table data to be dynamically adjusted
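The first two capabilities in the list above might look roughly like the following at the JDBC level. This is a minimal sketch, not the published sample: the `ValueCipher` interface, the `Base64StandIn` class, and the `personnel` table with its column names are all hypothetical. The Base64 stand-in performs no encryption; it only marks the seam where a Machina cipher would be called. The sketch also assumes that only the first-name and last-name columns are protected.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical seam: a real integration would delegate to a Machina cipher here. */
interface ValueCipher {
    String protect(String plainText);
    String reveal(String protectedText);
}

/** Stand-in ONLY. Base64 is an encoding, not encryption. */
final class Base64StandIn implements ValueCipher {
    @Override
    public String protect(String plainText) {
        return Base64.getEncoder().encodeToString(plainText.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public String reveal(String protectedText) {
        return new String(Base64.getDecoder().decode(protectedText), StandardCharsets.UTF_8);
    }
}

/** Table and column names are assumptions made for this sketch. */
final class PersonnelDao {
    static final String INSERT_SQL =
            "INSERT INTO personnel (first_name, last_name, zip, department) VALUES (?, ?, ?, ?)";
    static final String SELECT_SQL =
            "SELECT id, first_name, last_name, zip, department FROM personnel ORDER BY id";

    private final ValueCipher cipher;

    PersonnelDao(ValueCipher cipher) {
        this.cipher = cipher;
    }

    /** Substitutes protected values for the sensitive fields before the INSERT. */
    void insert(Connection connection, String first, String last, String zip, String department)
            throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(INSERT_SQL)) {
            ps.setString(1, cipher.protect(first));
            ps.setString(2, cipher.protect(last));
            ps.setString(3, zip);
            ps.setString(4, department);
            ps.executeUpdate();
        }
    }

    /** Reads every record and reveals the protected fields while iterating the result set. */
    List<String[]> selectAll(Connection connection) throws SQLException {
        List<String[]> rows = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(SELECT_SQL);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                rows.add(new String[] {
                    String.valueOf(rs.getInt("id")),
                    cipher.reveal(rs.getString("first_name")),
                    cipher.reveal(rs.getString("last_name")),
                    rs.getString("zip"),
                    rs.getString("department")
                });
            }
        }
        return rows;
    }
}
```

Keeping the cipher behind a small interface means the JDBC code does not change when the stand-in is replaced by a real cipher, and the unprotected columns pass through untouched.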

The last point is worth discussing in more detail. The code in Waypoint 3 describes the process of manipulating the active Machina policy set to dynamically alter the application’s view of the data. Two policy snippets are added sequentially and then removed sequentially. After each policy change, the data store is queried for the data. As the scope of the data protection policy changes, the visibility of individual data values also changes.

These policy changes demonstrate how data items can be marked with multiple values, each of which can be used to adjust the visibility of the data, including personally identifiable information (PII). One data marking is applied based on the table column of the data element; a different marking is applied based on the table row. As the two DENY policies are sequentially added, the view of the data changes, and less data is visible to the client application.

All Data

ID  First         Last          Zip    Department
1   John          Williams      35762  Engineering
2   Thomas        Johnson       57198  Marketing
3   Elizabeth     Jackson       55100  HR

Policy Restrict PII Applied (deny access to data columns First and Last)

ID  First         Last          Zip    Department
1   [RESTRICTED]  [RESTRICTED]  35762  Engineering
2   [RESTRICTED]  [RESTRICTED]  57198  Marketing
3   [RESTRICTED]  [RESTRICTED]  55100  HR

Policy Restrict HR Applied (deny access to data rows where Department=HR)

ID  First         Last          Zip    Department
1   [RESTRICTED]  [RESTRICTED]  35762  Engineering
2   [RESTRICTED]  [RESTRICTED]  57198  Marketing

As the two DENY policies are sequentially removed, the view of the data is restored to its unrestricted state.

Policy Restrict PII Removed

ID  First         Last          Zip    Department
1   John          Williams      35762  Engineering
2   Thomas        Johnson       57198  Marketing

Policy Restrict HR Removed

ID  First         Last          Zip    Department
1   John          Williams      35762  Engineering
2   Thomas        Johnson       57198  Marketing
3   Elizabeth     Jackson       55100  HR

Machina allows multiple data markings to be applied to each protected data element, and multiple policies to be in effect on key release decisions. Together, these allow for arbitrarily complex data governance strategies to be employed when integrating Ionic protection into database applications.
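The sequence of views shown earlier in this section can be reproduced with a small in-memory simulation. It does not call Machina at all; it only models the observable effect of the two DENY policies. The marking strings `classification=pii` and `department=HR` are hypothetical names chosen for this sketch, and two behaviors are assumed from the tables rather than from the sample code: a denied column marking masks the cell, and a denied row marking drops the whole row.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PolicyViewSketch {
    // Personnel records from the tables in this section: id, first, last, zip, department.
    static final String[][] ROWS = {
        {"1", "John", "Williams", "35762", "Engineering"},
        {"2", "Thomas", "Johnson", "57198", "Marketing"},
        {"3", "Elizabeth", "Jackson", "55100", "HR"},
    };

    /** Renders the table as seen by a client while the given markings are denied. */
    static List<String> view(Set<String> deniedMarkings) {
        List<String> visible = new ArrayList<>();
        boolean piiDenied = deniedMarkings.contains("classification=pii");
        for (String[] row : ROWS) {
            // Row-based marking (assumed): a denied department hides the entire row.
            if (deniedMarkings.contains("department=" + row[4])) {
                continue;
            }
            // Column-based marking (assumed): denied name columns are masked.
            String first = piiDenied ? "[RESTRICTED]" : row[1];
            String last = piiDenied ? "[RESTRICTED]" : row[2];
            visible.add(String.join(" ", row[0], first, last, row[3], row[4]));
        }
        return visible;
    }

    public static void main(String[] args) {
        Set<String> denied = new LinkedHashSet<>();
        System.out.println("All data: " + view(denied));
        denied.add("classification=pii");
        System.out.println("Restrict PII applied: " + view(denied));
        denied.add("department=HR");
        System.out.println("Restrict HR applied: " + view(denied));
        denied.remove("classification=pii");
        System.out.println("Restrict PII removed: " + view(denied));
        denied.remove("department=HR");
        System.out.println("Restrict HR removed: " + view(denied));
    }
}
```

Because the two markings are evaluated independently, adding or removing one policy never requires touching the stored data; only the set of denied markings changes between queries.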

Code Integration Sample 2

This sample gives guidance on strategies to integrate Machina capabilities into existing database applications. (As with code sample 1, the PostgreSQL database system is targeted, but the JDBC logic can be used with alternate database systems.)

Table schemas can often be fixed and unchangeable due to organizational requirements. In these cases, a second table can be added to hold the Ionic-protected sensitive values. The two tables can be bound together using database foreign key constraints. A layer of business logic can be employed to present the two tables as a single logical table for consumption by database client applications. This business logic layer could consist of database stored procedures, application code, or an object relational mapping (ORM) library.
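One way to picture the two-table arrangement is sketched below, with the business logic layer written as application code. The table name `personnel_protected`, its columns, and the `reveal` function are hypothetical, and the DDL is PostgreSQL-flavored; the published sample may be organized differently. The row-mapping step is kept separate from the JDBC loop so that it can be exercised without a database.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class TwoTableSketch {
    // Companion table (hypothetical name) bound to the fixed table by a foreign key.
    static final String CREATE_COMPANION =
            "CREATE TABLE personnel_protected ("
            + "personnel_id INTEGER PRIMARY KEY REFERENCES personnel (id), "
            + "first_name TEXT, last_name TEXT)";

    // Presents the two physical tables as a single logical table.
    static final String SELECT_LOGICAL =
            "SELECT p.id, x.first_name, x.last_name, p.zip, p.department "
            + "FROM personnel p "
            + "LEFT JOIN personnel_protected x ON x.personnel_id = p.id "
            + "ORDER BY p.id";

    /** Builds one logical row; protected columns are null when no companion row exists. */
    static String[] toLogicalRow(int id, String protectedFirst, String protectedLast,
                                 String zip, String department, UnaryOperator<String> reveal) {
        return new String[] {
            String.valueOf(id),
            protectedFirst == null ? null : reveal.apply(protectedFirst),
            protectedLast == null ? null : reveal.apply(protectedLast),
            zip,
            department
        };
    }

    /** Reads the joined tables and reveals protected values with the supplied function. */
    static List<String[]> readLogicalTable(Connection connection, UnaryOperator<String> reveal)
            throws SQLException {
        List<String[]> rows = new ArrayList<>();
        try (Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery(SELECT_LOGICAL)) {
            while (rs.next()) {
                rows.add(toLogicalRow(rs.getInt(1), rs.getString(2), rs.getString(3),
                        rs.getString(4), rs.getString(5), reveal));
            }
        }
        return rows;
    }
}
```

The `LEFT JOIN` keeps records visible even when they have no companion row yet, which may matter while an existing database is being migrated incrementally.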

Use of a second table would also address the storage problem caused by Machina protection of non-textual field types (such as INTEGER or DATE). These fields lack the extra space needed to store Machina metadata associated with a protected value. This metadata includes:

  • the ID associated with the AES key used to protect the data
  • the cryptography initialization vector (IV) used to protect the data against dictionary attacks
  • in the case of the Ionic ChunkCipher implementations, additional version formatting metadata
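To make the storage overhead concrete, the sketch below packs a key ID, an IV, and the ciphertext into one text token. It is an illustration under stated assumptions, not the ChunkCipher format: the `keyId:base64(iv + ciphertext)` layout is hypothetical, AES-GCM from the standard `javax.crypto` package is used as a convenient stand-in cipher, and an in-memory map stands in for the key service that would normally release the key by ID.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class EnvelopeSketch {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    private final SecureRandom random = new SecureRandom();
    // Stand-in for the key service: key ID -> AES key, held locally for the sketch.
    private final Map<String, SecretKey> keyService = new HashMap<>();

    /** Encrypts one value under a fresh key and returns "keyId:base64(iv + ciphertext)". */
    String protect(String plainText) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            SecretKey key = generator.generateKey();
            String keyId = "key" + keyService.size();
            keyService.put(keyId, key);

            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));

            byte[] body = ByteBuffer.allocate(iv.length + cipherText.length)
                    .put(iv).put(cipherText).array();
            return keyId + ":" + Base64.getEncoder().encodeToString(body);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Looks up the key by ID, splits off the IV, and decrypts the remainder. */
    String reveal(String token) {
        int separator = token.indexOf(':');
        SecretKey key = keyService.get(token.substring(0, separator));
        if (key == null) {
            throw new IllegalArgumentException("unknown key ID");
        }
        try {
            byte[] body = Base64.getDecoder().decode(token.substring(separator + 1));
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, body, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(body, IV_BYTES, body.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Even for a short input such as a five-digit ZIP code, the resulting token is a text string several times longer than the original value, which is why it cannot be stored in an INTEGER or DATE column.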

In a Machina integration like this, attention must be given to fields that require their content to take a certain form. For example, valid social security numbers are required to be in the form 123-45-6789. Using the “two table” strategy, the entire protected value is put into the designated column of the second table. So what should go in the data column of the first table? There are several possibilities, depending on the database constraints and on the constraints of all clients of the data:

  • NULL, if a field is NULLABLE, and if all data clients can handle NULL values
  • a “default value”, if the database and data clients can accommodate data duplicates
  • a value derived from the protected value, and manipulated to conform to the data constraints of the column
  • a random value, manipulated to conform to the data constraints of the column
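The last two options can be sketched for the social security number example. The class and method names below are hypothetical, and using a SHA-256 digest of the protected value is just one assumed way to derive a placeholder; the sketch only enforces the 123-45-6789 shape and makes no attempt to avoid numbers that could coincide with a real one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Locale;

final class SsnPlaceholderSketch {
    private static final long RANGE = 1_000_000_000L; // nine decimal digits
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Formats a number in [0, 10^9) as NNN-NN-NNNN. */
    static String format(long nineDigits) {
        return String.format(Locale.ROOT, "%03d-%02d-%04d",
                nineDigits / 1_000_000, (nineDigits / 10_000) % 100, nineDigits % 10_000);
    }

    /** Placeholder derived from the protected (already encrypted) value; stable per value. */
    static String derivedFrom(String protectedValue) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(protectedValue.getBytes(StandardCharsets.UTF_8));
            long firstEightBytes = ByteBuffer.wrap(digest).getLong();
            return format(Math.floorMod(firstEightBytes, RANGE));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Placeholder with no relationship to the protected value. */
    static String random() {
        return format(RANDOM.nextInt((int) RANGE));
    }
}
```

A derived placeholder stays the same for as long as the protected value is unchanged, which can help clients that compare or index the column; a random placeholder carries no link to the protected value at all.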

Ideally, access to the data should be limited to Machina-aware data clients, each configured to access the same Machina key tenant. In any case, the use of this strategy assures that values in sensitive database table columns are only recoverable via authorized data access operations.

Analysis

Both code samples are deliberately basic so that they are easy to digest. They are intended to demonstrate simple integration strategies that would be widely applicable. Future code samples might focus more on specific aspects of Machina Tools.

Links