SQL Server Database File Deduplication

The estimated reading time for this post is 2 minutes

Data Deduplication feature is added on Windows Server 2012 to reduce the storage cost by replacing duplicate data chunks from files with some reparse pointers. It reduces the impact of redundant data on storage costs. When enabled, Data Deduplication optimizes free space on a volume by examining the data on the volume by looking for duplicated portions on the volume. Duplicated portions of the volume’s dataset are stored once and are (optionally) compressed for additional savings. Data Deduplication optimizes redundancies without compromising data fidelity or integrity.

The space savings that you can gain from Data Deduplication depend on the dataset or workload on the volume. Datasets that have high duplication could see optimization rates of up to 95%, or a 20x reduction in storage utilization. To use data deduplication feature for SQL Server database files, you are required to choose either ‘Virtual Machine’ or ‘Backup’ usage type during configuration as both mentioned types are able to optimize files while they are in use by other applications. I have enabled the feature and configured the disk volume ‘E’ to optimize SQL Server database files.

Remember that SQL Server database files should not be on a volume which NTFS Data Compression is enabled as it result in data corruption.

NTFS compression is a feature of NTFS that you can optionally enable at the volume level. With NTFS compression, each file is optimized individually via compression at write-time. Unlike NTFS compression, Data Deduplication can get spacing savings across all the files on a volume. This is better than NTFS compression because files may have both internal duplication (which is addressed by NTFS compression) and have similarities with other files on the volume (which is not addressed by NTFS compression). Additionally, Data Deduplication has a post-processing model, which means that new or modified files will be written to disk in-optimized and will be optimized later by Data Deduplication.

Bare in mind that Data Deduplication feature is supported by Failover Clustering, Storage Replica, DFS  Replication, Branch Cache, Quotas and Windows Server Backup.

There are two dummy databases named ‘Deduplicate_DB’ and ‘Deduplicate_DB2’ beside ‘AdventureWorksDW’ database on volume ‘E’ as shown at below figure:-

Once I create the and moved the database files onto volume E, the I start the ‘Optimization’ job manually by running the ‘Start-DedupJob’ powershell command and waited for a while for Windows Server to optimize the database files.

The ‘Get-DedupStatus’ command shows that how much space have been saved by using deduplication feature. For my case it saved about 4.99GB storage space. The windows explorer shows the database files size remain same but the volume information on ‘This PC’ shows different as below figure:-

The Server Management console shows that 64% of database files size are optimized and I managed to save almost 5GB on volume E.

Data Deduplication feature is very useful if you are required to store database backup files on disk storage for long period of time and you would like to save lots of storage space and cut down the cost. It is recommended to run this optimization task during non-peak business hours on database files and run hourly basis for database backup storage volumes. I hope this short blog post was informative and useful for you, stay tuned for more blog posts.

Hamid J. Fard

I am SQL Server Data Platform Expert with more than 9 years’ of professional experience, I am currently Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP and CIW Database Design Specialist. I also do Pig, Hive and Sqoop development regarding Hadoop Big Data platform. After a few years of being a production database administrator I jumped into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve questions regarding database issues for SQL Server.

More Posts

Follow Me: