Ionic Open Sources New Lambda Function to Protect Sensitive Data Scanned by BigID

Ionic recently open sourced a new AWS Lambda function that protects personally identifiable information (PII) reported by a BigID scan. BigID gives organizations the ability to discover, inventory, and index personal data across different data sources. BigID’s data intelligence approach uses correlation and machine learning to help enterprises better understand whose data they have, assign residency, and map data flows.

This new Lambda function from Ionic Machina Tools takes the results of a BigID scan and applies data-protection controls based on data residency, data sensitivity, and personal data attribute findings. Thus, the Lambda function provides an example of protecting data using consistent policy enforcement across multiple deployment architectures.

LAMBDA FOR BIGID AND S3 SAMPLE CODE

The BigID S3 Lambda sample combines two components: Machina Tools for BigID Data Intelligence, which leverages BigID’s file scanning capability to selectively protect documents containing PII stored on Amazon S3, and the Machina Tools for Amazon S3 SDK client. Using the report that BigID generates when it scans files in the S3 source bucket, the sample encrypts each object containing PII and copies it to the destination bucket with the Machina Tools for Amazon S3 SDK client. The PII object name, along with any attributes from the BigID scan, is included as key attributes on the Ionic key associated with the PII. Key attributes are attribute-value pairs that are represented as metadata and stored with the key. Machina’s policy engine uses these key attributes to determine whether a given user should be allowed to view the contents of the file. Objects without PII are also copied to the destination bucket, unencrypted.

Note: This example uses a JSON persistor, which is derived from the plaintext persistor. Because it keeps its contents in plaintext, the JSON persistor should not be used in production environments.

Using Machina Tools for Amazon S3

Machina Tools utilizes Amazon S3’s Java SDK. To install, run, and test this sample, follow the instructions on GitHub. The sample uses Maven to build the package and the Maven Shade plugin to produce a single fat JAR for uploading to AWS Lambda.

Code Walkthrough

In this sample, the Machina *persistor* is stored in AWS Systems Manager Parameter Store. Parameter Store allows the *persistor* and other sensitive values to be stored securely, with access restricted by AWS Identity and Access Management (IAM). The persistor stored in Parameter Store is a plaintext persistor. A custom subclass, JsonPersistor, is used in the example to load the persistor from JSON.

This blog will use the following terms:

  • S3 source bucket – contains all the data that was scanned by BigID.
  • BigID data source – name of the BigID configuration that points to the S3 source in the BigID dashboard.
  • S3 destination bucket – contains all the data in the source bucket with encrypted PII.

The Lambda class does the following:

  1. Retrieves AWS Parameter Store parameters
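  2. Acquires the BigID self-signed certificate from a copy in AWS Parameter Store
  3. Obtains the PII scan results from the BigID data source
  4. Loads the JSON persistor
  5. Iterates over the PII objects, encrypting each one into the S3 destination bucket
  6. Copies objects without PII into the S3 destination bucket

For step 3, the Lambda first authenticates against the BigID sessions endpoint and then requests the PII records for the BigID data source:
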
JsonNode piiObjectList;

try (CloseableHttpClient client = HttpClients.createDefault();) {

    HttpPost httpPost = new HttpPost(bigidUrl + "/api/v1/sessions");

    ObjectNode node = JsonNodeFactory.instance.objectNode();
    node.put("username", bigidUser);
    node.put("password", bigidPassword);
    // Encode the body as UTF-8 so it matches the charset declared in the header below.
    StringEntity params = new StringEntity(node.toString(), "UTF-8");
    httpPost.addHeader("content-type", "application/json; charset=utf-8");
    httpPost.setEntity(params);

    String authTokenString;
    ObjectMapper mapper = new ObjectMapper();
    try (CloseableHttpResponse response = client.execute(httpPost)) {
        HttpEntity entity = response.getEntity();
        JsonNode jsonBlob = mapper.readTree(entity.getContent());
        JsonNode authToken = jsonBlob.get("auth_token");
        authTokenString = authToken.asText();
    }

    HttpGet httpGet = new HttpGet(bigidUrl + "/api/v1/piiRecords/objects?filter=" +
        URLEncoder.encode("system IN (" + srcName + ")", "UTF-8"));
    httpGet.addHeader("content-type", "application/json");
    httpGet.addHeader("Authorization", authTokenString);

    try (CloseableHttpResponse response = client.execute(httpGet)) {
        HttpEntity entity = response.getEntity();
        JsonNode jsonBlob = mapper.readTree(entity.getContent());
        piiObjectList = jsonBlob.get("objectsList");
        outStream.println(piiObjectList.size() + " targets identified.");
    }
}
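
As a small illustration of how the filter query above is assembled, the following sketch uses only the Java standard library to build the same URL and show its encoded form. The class name BigIdFilterSketch, the host bigid.example.com, and the data source name my-s3-source are placeholders chosen for this example and do not come from the sample.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class BigIdFilterSketch {

    // Builds the PII-records URL the same way the sample does: the filter
    // expression is URL-encoded and appended as the "filter" query parameter.
    static String piiRecordsUrl(String bigidUrl, String srcName) {
        String filter = "system IN (" + srcName + ")";
        try {
            return bigidUrl + "/api/v1/piiRecords/objects?filter="
                    + URLEncoder.encode(filter, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values, for illustration only.
        System.out.println(piiRecordsUrl("https://bigid.example.com", "my-s3-source"));
        // prints https://bigid.example.com/api/v1/piiRecords/objects?filter=system+IN+%28my-s3-source%29
    }
}
```

Encoding the filter keeps the spaces and parentheses in the expression from breaking the query string.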

After the persistor is loaded, the IonicS3EncryptionClient is created using the Machina Tools S3 SDK to access the destination bucket.

IonicEncryptionMaterialsProvider iemp = new IonicEncryptionMaterialsProvider();
try {
    iemp.setPersistor(jsonPersistor);
} catch (Exception e) {
    outStream.println(e.getLocalizedMessage());
}
iemp.setIonicMetadataMap(getMetadataMap());

IonicS3EncryptionClient ionicS3 = (IonicS3EncryptionClient)IonicS3EncryptionClientBuilder.standard().withEncryptionMaterials(iemp).build();

Next, get access to the S3 source bucket that contains all the data.

AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

On to step 5. With the S3 source bucket, the BigID data source, and the S3 destination bucket in hand, the BigID PII objects from the source are encrypted with Ionic keys and uploaded to the S3 destination bucket. The BigID PII object name is stored as the Machina attribute ionic-filename, and the BigID PII object’s attributes are converted to Machina attributes and stored in bigid-attributes.

piiObjectList.forEach((JsonNode node) -> {

    String[] split = node.get("fullObjectName").asText().split("/", 2);
    if (!split[0].equals(srcBucket)) {
        return;
    }
    S3Object plainObject = s3.getObject(srcBucket, split[1]);

    KeyAttributesMap kam = new KeyAttributesMap();

    kam.put("ionic-filename", Arrays.asList(node.get("objectName").asText()));

    ArrayList<String> bigidAttributes = new ArrayList<String>();
    node.get("attribute").forEach((JsonNode entry) -> {
        bigidAttributes.add(entry.asText());
    });
    kam.put("bigid-attributes", bigidAttributes);

    PutObjectRequest req = new PutObjectRequest(destBucket, plainObject.getKey(),
        plainObject.getObjectContent(), plainObject.getObjectMetadata());
    outStream.println("Encrypting " + node.get("objectName").asText());
    ionicS3.putObject(req, new CreateKeysRequest.Key("", 1, kam));
    copiedSet.add(split[1]);
});
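
The name handling in the loop above can be tried in isolation. The sketch below assumes, as the loop does, that fullObjectName has the form bucket/key, and it uses a plain Map as a stand-in for the SDK’s KeyAttributesMap. The class name ObjectNameSketch, the helper names, and the sample values (bucket, key, and the attribute names email and phone) are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ObjectNameSketch {

    // Splits "bucket/key/with/slashes" into at most two parts:
    // the bucket name and the remaining object key.
    static String[] splitBucketAndKey(String fullObjectName) {
        return fullObjectName.split("/", 2);
    }

    // Plain-Map stand-in for KeyAttributesMap: each attribute name maps
    // to a list of values, mirroring the two attributes set in the sample.
    static Map<String, List<String>> attributesFor(String objectName,
                                                   List<String> bigidAttributes) {
        Map<String, List<String>> attrs = new LinkedHashMap<>();
        attrs.put("ionic-filename", Arrays.asList(objectName));
        attrs.put("bigid-attributes", bigidAttributes);
        return attrs;
    }

    public static void main(String[] args) {
        String[] parts = splitBucketAndKey("my-source-bucket/reports/2020/customers.csv");
        System.out.println(parts[0]); // prints my-source-bucket
        System.out.println(parts[1]); // prints reports/2020/customers.csv
        System.out.println(attributesFor("customers.csv", Arrays.asList("email", "phone")));
        // prints {ionic-filename=[customers.csv], bigid-attributes=[email, phone]}
    }
}
```

The limit of 2 keeps any slashes inside the object key intact. A name with no slash at all would produce a single-element array, so a defensive version might check the array length before reading the key.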

Finally, in step 6, the objects in the S3 source bucket that contain no PII are copied unencrypted to the S3 destination bucket:

List<S3ObjectSummary> srcObjectSummaries = s3.listObjects(srcBucket).getObjectSummaries();
outStream.println("\n" + (srcObjectSummaries.size() - piiObjectList.size()) + " objects without PII to be copied unencrypted.");
srcObjectSummaries.forEach((S3ObjectSummary objectSummary) -> {
    String key = objectSummary.getKey();
    if (copiedSet.contains(key)) {
        return;
    }
    outStream.println("Copying unencrypted " + key);
    s3.copyObject(srcBucket, key, destBucket, key);
});
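
The selection logic in step 6 is a simple set difference: every source key that is not in copiedSet still needs a plain copy. The sketch below shows that logic with standard collections only. The class name RemainderCopySketch, the helper name, and the sample keys are hypothetical, and no S3 calls are made.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RemainderCopySketch {

    // Returns the keys that still need an unencrypted copy: every source
    // key that was not already written in encrypted form.
    static List<String> keysToCopyUnencrypted(List<String> sourceKeys,
                                              Set<String> copiedSet) {
        List<String> remaining = new ArrayList<>();
        for (String key : sourceKeys) {
            if (!copiedSet.contains(key)) {
                remaining.add(key);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Sample keys, for illustration only.
        List<String> sourceKeys = Arrays.asList("a.csv", "b.txt", "c.pdf");
        Set<String> copiedSet = new HashSet<>(Arrays.asList("a.csv"));
        System.out.println(keysToCopyUnencrypted(sourceKeys, copiedSet));
        // prints [b.txt, c.pdf]
    }
}
```

Tracking the encrypted keys in a set makes each membership check constant time, and it ensures that an object already written in encrypted form is not overwritten by a plaintext copy.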

To reiterate, all code is located on the Ionic Machina GitHub site. Also, check out the Machina Developers portal where you’ll find additional examples and documentation.

SUMMARY

The release of this Lambda function is a result of the recent partnership between Ionic and BigID, which delivers a consolidated approach to sensitive data protection and enforcement and provides an automated model for mitigating compliance, privacy, and security risks throughout the data lifecycle.

Additional Contributions by Andrew Patterson