We recently came across a requirement to perform encryption and decryption of Parquet files. These files were very large, and we had to perform this operation from multiple systems such as Azure Synapse pipelines, web apps/APIs, and console apps.
Encryption requirements were of two types –
- Full File Encryption/Decryption
- Column Level Encryption/Decryption
While searching the internet, we found very limited options for performing cryptographic operations on Parquet files, especially column-level encryption. Also, the old approach of traversing each row and then going to the required column proved to be very slow and resource-consuming.
So, we needed some code that reads column-level data and performs crypto operations.
After a lot of searching and experimentation, we found the NuGet package below, which was launched recently and is still in preview:
https://www.nuget.org/packages/Microsoft.Data.Encryption.Cryptography/
It provides the functionality to perform cryptographic operations on column-level data in Parquet files. But, as this package is still in preview, we ran into many bugs and supportability issues; for example, it supports only the latest .NET version (7.0).
The next challenge we faced was hosting this service. Initially, we thought of Azure Functions, but during load testing we realized it was timing out for big files.
So finally, we used an Azure Batch account, which is designed to run large-scale parallel and high-performance computing (HPC) workloads.
Below, we discuss in detail this helper, which performs full-file or column-level encryption and decryption.
1 Use Case
Parquet files might contain sensitive user information. This helper provides the functionalities mentioned below –
- Full File Encryption/Decryption: Allows users to encrypt/decrypt the full Parquet file. This option is extremely fast, as it performs the cryptographic operation on the file as a whole.
- Column Level Encryption/Decryption: Allows users to encrypt/decrypt specific Columns of the parquet file. This utility reads columns in a file and performs cryptographic operations.
It’s slightly resource intensive and may take more time than Full File Encryption/Decryption. However, this utility reads data column-wise and not row-wise, which improves performance.
2 Azure Batch Intro (Skip this part if you know about it)
We used Azure Batch to run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. Azure Batch creates and manages a pool of compute nodes (virtual machines), installs the applications you want to run, and schedules jobs to run on the nodes.
There’s no cluster or job scheduler software to install, manage, or scale. Instead, you use Batch APIs and tools, command-line scripts, or the Azure portal to configure, manage, and monitor your jobs.
Costing –
There is no additional charge for using Batch. You only pay for the underlying resources consumed, such as virtual machines, storage, and networking.
Refer to below link for more details –
2.1 Important Components of Azure Batch
2.1.1 Batch accounts
All processing and resources are associated with a Batch account. When your application makes a request against the Batch service, it authenticates the request using the Azure Batch account name, the URL of the account, and either an access key or an Azure Active Directory token.
You can run multiple Batch workloads in a single Batch account. You can also distribute your workloads among Batch accounts that are in the same subscription but located in different Azure regions.

2.1.2 Nodes –
A node is an Azure virtual machine (VM) or cloud service VM that is dedicated to processing a portion of your application’s workload. The size of a node determines the number of CPU cores, memory capacity, and local file system size that is allocated to the node.
Nodes can run any executable or script that is supported by the operating system environment of the node. Executables or scripts include *.exe, *.cmd, *.bat, and PowerShell scripts (for Windows) and binaries, shell, and Python scripts (for Linux).
All compute nodes in Batch also include:
- A standard folder structure and associated environment variables that are available for reference by tasks.
- Firewall settings that are configured to control access.
- Remote access to both Windows (Remote Desktop Protocol (RDP)) and Linux (Secure Shell (SSH)) nodes (unless you create your pool with remote access disabled).
2.1.3 Pools –
A pool is a collection of these nodes for your application to run on.
Azure Batch pools build on top of the core Azure compute platform. They provide large-scale allocation, application installation, data distribution, health monitoring, and flexible adjustment (scaling) of the number of compute nodes within a pool.
The pool can be created manually, or automatically by the Batch service when you specify the work to be done.
Node type and target – When you create a pool, you can specify which types of nodes you want and the target number for each. The two types of nodes are:
- Dedicated nodes. Dedicated compute nodes are reserved for your workloads. They are more expensive than Spot nodes, but they are guaranteed to never be preempted.
- Spot nodes. Spot nodes take advantage of surplus capacity in Azure to run your Batch workloads. Spot nodes are less expensive per hour than dedicated nodes, and enable workloads requiring significant computing power. For more information, see Use Spot VMs with Batch.
Note – Spot nodes may be preempted when Azure has insufficient surplus capacity. If a node is preempted while running tasks, the tasks are requeued and run once a compute node becomes available again. Spot nodes are a good option for workloads where the job completion time is flexible and the work is distributed across many nodes.
2.1.3.1 Node size –
When you create an Azure Batch pool, you can choose from among almost all the VM families and sizes available in Azure. Azure offers a range of VM sizes for different workloads, including specialized HPC or GPU-enabled VM sizes. Note that node sizes can only be chosen at the time a pool is created. In other words, once a pool is created, its node size cannot be changed.
For more information, see Choose a VM size for compute nodes in an Azure Batch pool.
2.1.3.2 Automatic scaling policy
For dynamic workloads, you can apply an automatic scaling policy to a pool. The Batch service periodically evaluates your formula and dynamically adjusts the number of nodes within the pool according to the current workload and resource usage of your compute scenario. This allows you to lower the overall cost of running your application by using only the resources you need and releasing those you don’t need.
You enable automatic scaling by writing an automatic scaling formula and associating that formula with a pool. The Batch service uses the formula to determine the target number of nodes in the pool for the next scaling interval (an interval that you can configure). You can specify the automatic scaling settings for a pool when you create it, or enable scaling on a pool later. You can also update the scaling settings on a scaling-enabled pool.
A scaling formula can be based on the following metrics:
- Time metrics are based on statistics collected every five minutes over the specified number of hours.
- Resource metrics are based on CPU usage, bandwidth usage, memory usage, and number of nodes.
- Task metrics are based on task state, such as Active (queued), Running, or Completed.
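As an illustration of such a formula, the minimal sketch below (a hypothetical example, not the formula we used) scales the pool on pending tasks and caps it at 10 dedicated nodes –
$samples = $PendingTasks.GetSamplePercent(5 * TimeInterval_Minute);
$tasks = $samples < 70 ? 1 : avg($PendingTasks.GetSample(5 * TimeInterval_Minute));
$TargetDedicatedNodes = max(0, min($tasks, 10));
$NodeDeallocationOption = taskcompletion;
Here $PendingTasks and $TargetDedicatedNodes are built-in Batch autoscale variables; when fewer than 70% of the samples are available, the formula falls back to a single node.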
3 Azure Functions vs Azure Batch
3.1 Azure Functions
- This serverless service is most suitable for event-driven triggers that run for a short period.
- A function can also be used to run scheduled jobs through timer triggers, when configured to run at set times.
- Azure Functions is not a recommended option for large, long-running tasks because they can cause unexpected timeout issues. However, depending on the hosting plan, they can be considered for schedule-driven triggers.
- Azure Functions limitations –
Azure subscription limits and quotas – Azure Resource Manager | Microsoft Learn
Resource | Consumption plan | Premium plan | Dedicated plan | ASE |
Default timeout duration (min) | 5 | 30 | 30 | 30 |
Max timeout duration (min) | 10 | unbounded | unbounded | unbounded |
Max outbound connections (per instance) | 600 active (1200 total) | unbounded | unbounded | unbounded |
Max request size (MB) | 100 | 100 | 100 | 100 |
ACU per instance | 100 | 210-840 | 100-840 | 210-250 |
Max memory (GB per instance) | 1.5 | 3.5-14 | 1.75-14 | 3.5-14 |
Max instance count (Windows/Linux) | 200/100 | 100/20 | varies by SKU | 100 |
Storage | 5 TB | 250 GB | 50-1000 GB | 1 TB |
3.2 Azure Batch
- Consider Azure Batch if you need to run large, parallel high-performance computing (HPC) workloads across tens, hundreds, or thousands of VMs.
- The Batch service provisions the VMs, assigns tasks to the VMs, runs the tasks, and monitors the progress.
- Batch can automatically scale out the VMs in response to the workload.
- It also provides job scheduling.
- It supports both Linux and Windows VMs.
Refer –
Background jobs guidance – Best practices for cloud applications | Microsoft Learn
Costing comparison – Premium and Dedicated plans are on the costlier side; however, in the case of Azure Batch you pay only for the underlying resources such as VMs, storage, and networking.
4 Helper Function Overview (.NET Code)
The DWCryptographer project is a console application that takes the parameters below as input and performs full-file and column-level encryption/decryption.
Parameters –
- Request Type – The user has to pass ‘E’ or ‘D’ for encryption or decryption
- BlobContainerName – Name of the blob container from/to which the file is downloaded/uploaded
- BlobName – Full path of the blob on which the operation has to be performed
- EncryptedColumns – Comma-separated column indexes that require encryption/decryption
Ex – DWCryptographer.exe E DWParquetFiles dev/input/sourcefile_PlainText.parquet 6,7
Here the 6th and 7th columns will be encrypted.
Note – If the value of EncryptedColumns is null or empty, the helper assumes full-file encryption. So, if the user wants to perform column-level encryption, they must pass EncryptedColumns as a comma-separated value (a minimal argument-parsing sketch follows).
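The actual entry point of DWCryptographer is not shown in this post; the sketch below only illustrates how these four arguments could be parsed (it is an assumption, not the real project code) –
using System;
using System.Linq;
static void Main(string[] args)
{
// args: <RequestType E|D> <BlobContainerName> <BlobName> [EncryptedColumns]
string requestType = args[0].ToUpperInvariant();   // "E" or "D"
string blobContainerName = args[1];
string blobName = args[2];
// Optional 4th argument: comma-separated column indexes, e.g. "6,7".
// When it is missing or empty, the helper falls back to full-file encryption.
int[] encryptedColumns = (args.Length > 3 && !string.IsNullOrWhiteSpace(args[3]))
? args[3].Split(',').Select(int.Parse).ToArray()
: Array.Empty<int>();
bool isColumnLevel = encryptedColumns.Length > 0;
Console.WriteLine($"Request: {requestType}, Container: {blobContainerName}, Blob: {blobName}, Column-level: {isColumnLevel}");
// ...dispatch to the full-file or column-level encrypt/decrypt routines from here
}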
4.1 Key Vault
- Secret – Used for secrets like connection strings and other important key-value pairs
- Keys – Used to generate the RSA key that is used for performing cryptographic operations (see the sketch below)
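For reference, reading these two Key Vault artifacts with the Azure SDK could look like the sketch below (the vault URI, secret name, and key name are placeholders, not our actual configuration) –
using System;
using Azure.Identity;
using Azure.Security.KeyVault.Keys;
using Azure.Security.KeyVault.Secrets;
var credential = new ManagedIdentityCredential();
var vaultUri = new Uri("https://<your-key-vault>.vault.azure.net/");   // placeholder
// Secret - e.g. the storage account connection string used by the helper
var secretClient = new SecretClient(vaultUri, credential);
KeyVaultSecret connectionStringSecret = secretClient.GetSecret("StorageConnectionString");
string saConnectionString = connectionStringSecret.Value;
// Key - the RSA key used as the key encryption key for crypto operations
var keyClient = new KeyClient(vaultUri, credential);
KeyVaultKey rsaKey = keyClient.GetKey("EncryptionKeyName");
Console.WriteLine($"Using key {rsaKey.Id}");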



5 Code Walkthrough –
5.1 Full File Crypto
The code below performs full-file encryption. The encryption logic is defined in the function getBlobEncryptionOptions.
During encryption, the blob is uploaded through a client created with those encryption options (the getBlobEncryptionOptions call in the code below).
BlobClient inputBlobClient;
try
{
//Read Data from the Blob
Console.WriteLine("Attempting to download plaintext blob");
inputBlobClient = new BlobServiceClient(saConnectionString).GetBlobContainerClient(blobContainerName).GetBlobClient(blobName);
}
catch (Exception ex)
{
Console.WriteLine("Exception occurred while creating Blob Client, verify blob inputs. Message - " + ex.Message);
throw new Exception($"Exception occurred while creating Blob Client, verify blob inputs - {saConnectionString}, {blobContainerName}, {blobName}. {ex.Message} ");
}
MemoryStream inputMStream = new MemoryStream();
inputBlobClient.DownloadTo(inputMStream);
inputMStream.Position = 0;
Console.WriteLine("PlainText blob downloaded successfully, creating Encrypted Blob with encrypted options");
// Create blob client with client-side encryption enabled
BlobClient encryptedBlob = new BlobServiceClient(saConnectionString, FullFileCryptographerHelper.getBlobEncryptionOptions()).GetBlobContainerClient(blobContainerName).GetBlobClient(blobName);
// Upload the encrypted contents to the blob.
encryptedBlob.Upload(inputMStream, true);
Console.WriteLine("Uploaded Encrypted Blob successfully");
Encryption logic is defined below –
internal static BlobClientOptions getBlobEncryptionOptions()
{
// Your key and key resolver instances, obtained through the Azure Key Vault SDK
IKeyEncryptionKey key = getKeyClient();
//IKeyEncryptionKeyResolver keyResolver = new KeyResolver(new DefaultAzureCredential());
IKeyEncryptionKeyResolver keyResolver = new KeyResolver(new ManagedIdentityCredential());
// Create the encryption options to be used for upload and download.
ClientSideEncryptionOptions encryptionOptions = new ClientSideEncryptionOptions(ClientSideEncryptionVersion.V2_0)
{
KeyEncryptionKey = key,
KeyResolver = keyResolver,
// String value that the client library will use when calling IKeyEncryptionKey.WrapKey()
KeyWrapAlgorithm = "RSA-OAEP"
};
// Set the encryption options on the client options.
return (new SpecializedBlobClientOptions() { ClientSideEncryption = encryptionOptions });
}
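The getKeyClient() helper is not listed in this post; a minimal sketch, assuming the key encryption key is the same Key Vault RSA key created earlier, could return a CryptographyClient (which implements IKeyEncryptionKey) –
using System;
using Azure.Core.Cryptography;
using Azure.Identity;
using Azure.Security.KeyVault.Keys;
using Azure.Security.KeyVault.Keys.Cryptography;
internal static IKeyEncryptionKey getKeyClient()
{
// Placeholders - in the real helper these values come from configuration/Key Vault
var vaultUri = new Uri("https://<your-key-vault>.vault.azure.net/");
const string keyName = "EncryptionKeyName";
var credential = new ManagedIdentityCredential();
KeyVaultKey key = new KeyClient(vaultUri, credential).GetKey(keyName);
// CryptographyClient implements IKeyEncryptionKey, so the blob client can use it
// to wrap/unwrap the content encryption key with the Key Vault RSA key (RSA-OAEP)
return new CryptographyClient(key.Id, credential);
}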
For decryption, the only change in logic is to pass the same encryption options while downloading the blob (again via getBlobEncryptionOptions in the code below) –
BlobClient blob;
// Create blob client with client-side encryption enabled
blob = new BlobServiceClient(saConnectionString, FullFileCryptographerHelper.getBlobEncryptionOptions()).GetBlobContainerClient(blobContainerName).GetBlobClient(blobName);
// Download and decrypt the encrypted contents from the blob.
MemoryStream outputStream = new MemoryStream();
blob.DownloadTo(outputStream);
outputStream.Position = 0;
//Upload Decrypted
BlobClient decryptedBlob = new BlobServiceClient(saConnectionString).GetBlobContainerClient(blobContainerName).GetBlobClient(blobName);
decryptedBlob.Upload(outputStream, true);
5.2 Column-level Crypto
The code below helps with column-level encryption.
The main code that defines the encryption logic is –
//Updating Writer Settings
updateWriterSettings(ref writerSettings, intArrEncryptedColumns, encryptionKey, EncryptionType.Randomized);
Here it uses an encryption key object, which is created using our Azure Key Vault key, and the encryption type is Randomized.
#region Read Input Blob
BlobClient downloadBlobClient;
downloadBlobClient = new BlobServiceClient(saConnectionString).GetBlobContainerClient(blobContainerName).GetBlobClient(blobName);
//Read the blob
MemoryStream msInputBlob = new MemoryStream();
downloadBlobClient.DownloadTo(msInputBlob);
msInputBlob.Position = 0;
// Create reader
using ParquetFileReader reader = new ParquetFileReader(msInputBlob);
// Copy source settings as target settings using Copy function
List<FileEncryptionSettings> writerSettings = reader.FileEncryptionSettings.Select(s => Copy(s)).ToList();
//Updating Writer Settings
updateWriterSettings(ref writerSettings, intArrEncryptedColumns, encryptionKey, EncryptionType.Randomized);
string tempFileName = Guid.NewGuid().ToString();
using (var encryptedMemoryStream = new FileStream(tempFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
// Create and pass the target settings to the writer
using ParquetFileWriter writer = new ParquetFileWriter(encryptedMemoryStream, writerSettings);
// Process the file - Transformation
ColumnarCryptographer cryptographer = new ColumnarCryptographer(reader, writer);
cryptographer.Transform();
Console.WriteLine("Column Encryption Transform completed successfully");
}
//Write Encrypted Data to Blob
BlobClient uploadBlobClient = new BlobServiceClient(saConnectionString).GetBlobContainerClient(blobContainerName).GetBlobClient(blobName);
uploadBlobClient.Upload(tempFileName, true);
Console.WriteLine("Uploaded the encrypted blob");
msInputBlob.Dispose();
//Delete the temp file
File.Delete(tempFileName);
Helper functions –
The function below copies all the encryption settings of the input file to the output file.
public static FileEncryptionSettings Copy(FileEncryptionSettings encryptionSettings)
{
System.Type genericType = encryptionSettings.GetType().GenericTypeArguments[0];
System.Type settingsType = typeof(FileEncryptionSettings<>).MakeGenericType(genericType);
return (FileEncryptionSettings)Activator.CreateInstance(
settingsType,
new object[] {
encryptionSettings.DataEncryptionKey,
encryptionSettings.EncryptionType,
encryptionSettings.GetSerializer () });
}
The code below returns the encryption key object using the Azure Key Vault key generated earlier.
public static ProtectedDataEncryptionKey getEncyryptionKey()
{
byte[] keyObj = CryptoGenericHelper.getKeyObject();
//Get Vault Path-Getting Key from Azure Key Vault
string keyVaultUri = CryptoGenericHelper.getValuesFromConfig("KeyVaultURI");
string keyVaultKeyName = CryptoGenericHelper.getValuesFromConfig("EncryptionKeyName");
string azureKeyVaultKeyPath = keyVaultUri + "/keys/" + keyVaultKeyName;
// New Token Credential to authenticate from Azure
//Azure.Core.TokenCredential tokenCredential = new DefaultAzureCredential();
Azure.Core.TokenCredential tokenCredential = new ManagedIdentityCredential();
// Azure Key Vault provider that allows client applications to access a key encryption key stored in Microsoft Azure Key Vault
EncryptionKeyStoreProvider azureKeyProvider = new AzureKeyVaultKeyStoreProvider(tokenCredential);
// Represents the key encryption key that encrypts and decrypts the data encryption key
KeyEncryptionKey keyEncryptionKey = new KeyEncryptionKey("KEK", azureKeyVaultKeyPath, azureKeyProvider);
// Represents the encryption key that encrypts and decrypts the data items
return new ProtectedDataEncryptionKey("DEK", keyEncryptionKey, keyObj);
}
Decryption Logic –
The main logic for decryption is below, where we use the same encryption key, but this time EncryptionType is set to Plaintext instead of the earlier Randomized –
updateWriterSettings(ref writerSettings, intArrEncryptedColumns, encryptionKey, EncryptionType.Plaintext);
public static void decryptColumnsParquetFile(string blobName, string saConnectionString, string blobContainerName, ProtectedDataEncryptionKey encryptionKey, int[] intArrEncryptedColumns)
{
BlobClient downloadBlobClient;
downloadBlobClient = new BlobServiceClient(saConnectionString).GetBlobContainerClient(blobContainerName).GetBlobClient(blobName);
//Read the blob
MemoryStream msInputBlob = new MemoryStream();
downloadBlobClient.DownloadTo(msInputBlob);
msInputBlob.Position = 0;
Console.WriteLine("Donwloaded the Encrypted Blob");
// Create reader-Using Parquet Reader to read input blob"
using ParquetFileReader reader = new ParquetFileReader(msInputBlob, GetEncryptionKeyStoreProvider());
// Copy source settings as target settings
Console.WriteLine("Copy source settings as target settings using Copy function");
List<FileEncryptionSettings> writerSettings = reader.FileEncryptionSettings.Select(s => Copy(s)).ToList();
Console.WriteLine("Updating the writer settings");
updateWriterSettings(ref writerSettings, intArrEncryptedColumns, encryptionKey, EncryptionType.Plaintext);
string tempFileName = Guid.NewGuid().ToString();
using (var decryptedMemoryStream = new FileStream(tempFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
// Create and pass the target settings to the writer
using ParquetFileWriter writer = new ParquetFileWriter(decryptedMemoryStream, writerSettings);
// Process the file - Transformation
ColumnarCryptographer cryptographer = new ColumnarCryptographer(reader, writer);
cryptographer.Transform();
Console.WriteLine("Column Encryption Transform completed successfully");
}
//Write Decrypted Data to Blob
BlobClient uploadBlobClient = new BlobServiceClient(saConnectionString).GetBlobContainerClient(blobContainerName).GetBlobClient(blobName);
uploadBlobClient.Upload(tempFileName, true);
Console.WriteLine("Uploaded the decrypted blob");
msInputBlob.Dispose();
//Delete the temp file
File.Delete(tempFileName);
}
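The GetEncryptionKeyStoreProvider() function passed to the ParquetFileReader above is not listed either; a minimal sketch, assuming it simply returns the same Azure Key Vault provider used while building the encryption key, could be –
private static EncryptionKeyStoreProvider GetEncryptionKeyStoreProvider()
{
// Same managed-identity credential and Key Vault provider as in getEncyryptionKey(),
// so the reader can unwrap the data encryption key referenced in the file metadata
Azure.Core.TokenCredential tokenCredential = new ManagedIdentityCredential();
return new AzureKeyVaultKeyStoreProvider(tokenCredential);
}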
6 Code Deployment
- Create Azure Batch Account
Refer to the article below –
https://learn.microsoft.com/en-us/azure/batch/quick-create-portal
- Create a pool of compute nodes
- Create Application –
- Go to Applications -> Add

- Provide Application Details and browse the code package zip file

- Activate the package and restart the nodes

- Create a Batch Pool

- Provide Pool Details like VM Size, Authentication etc.–

- Attach a storage account; it is mandatory for holding the packages

7 Usage in Azure Synapse
In our scenario, encryption of the file was required after data is copied from the source (on-prem system) to the destination (Azure Data Lake), and decryption is required when the data is read from the Data Lake into the SQL pool.
We created a small pipeline that can be called from other pipelines, passing parameters, whenever encryption/decryption is required.

Caller Pipeline – It calls the Crypto helper pipeline to Encrypt or Decrypt the file

This Batch account can be called from anywhere, such as a web app or console app, and crypto operations can be performed on the fly.
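As an illustration, a hedged sketch of submitting such a task from C# with the Microsoft.Azure.Batch SDK is shown below; the account URL, account name, key, and job ID are placeholders, and the real command line would point at the DWCryptographer application package deployed to the pool –
using System;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Auth;
// Placeholder credentials - use your own Batch account URL, name and key (or AAD auth)
var credentials = new BatchSharedKeyCredentials(
"https://<batch-account>.<region>.batch.azure.com",
"<batch-account-name>",
"<batch-account-key>");
using BatchClient batchClient = BatchClient.Open(credentials);
// Each crypto request becomes a task that runs the console app on a pool node.
// In practice, the exe path is resolved via the application package environment variable.
string commandLine = "cmd /c DWCryptographer.exe E DWParquetFiles dev/input/sourcefile_PlainText.parquet 6,7";
CloudTask task = new CloudTask("encrypt-" + Guid.NewGuid().ToString("N"), commandLine);
batchClient.JobOperations.AddTask("dw-crypto-job", task);   // the job must already exist on the pool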
Hope this helps.