A Hard Drive Retrospective – SQL Server Edition

Sometimes it’s good to look back and appreciate what we have and how far we’ve come. This applies to many things, but today I’m talking about the humble hard drive.

My first hard drive, as a 7 year old learning the delights of Windows 95, was 1.5GB. I can only guess it was very slow – especially paired with a 133Mhz Pentium 1 – because I recall an inordinate amount of time spent twiddling my thumbs watching the Windows boot logo. Now, my desktop and laptop barely even leave enough time for the (far more drab Windows 10) logo to appear at all, mainly thanks to solid-state drives aka SSDs. They’re everywhere now, but it wasn’t that long ago they were a shiny new thing that had sysadmins and infrastructure engineers acknowledging the speed but wondering about reliability.

While I don’t have any dusty PATA relics available, I thought I’d take a look at a bunch of different drives, throw a couple of SQL workloads at them and have them fight it out. From oldest to newest, the contenders entering the ring are…

Western Digital Blue HDD (4TB) – a ‘spinning rust’ HDD, released 2016
Samsung 860 EVO SSD (256GB) – A SATA3 consumer SSD, released 2018
Samsung 870 EVO SSD (1TB) – A SATA3 consumer SSD, released 2021
Corsair Force MP510 (256GB) – A PCI-E 3.0 NVME SSD, released 2018

Unfortunately I can’t install SQL Server on my PS5, else I’d be able to end this retrospective with its monster PCI-E 4.0 5.5GBps SSD instead. Ah well. Trust me, it’s incredibly fast. Anyway…

Let’s look at some raw benchmarks first, using CrystalDiskMark set to the ‘Real World Performance’ setting ensuring we aren’t using unrealistic caching or access patterns;

Our MP510 drank his milk and ate his spinach

From HDD, to 256GB SATA SSD, to newer 1TB SATA SSD, to 256GB PCI-E NVME SSD we have;

  • Sustained read/writes going from 100MB/s to 500MB/s to 1.7GB/s
  • Random reads going from sub-1MB/s to 30 to 50MB/s
  • Random writes going from barely 1MB/s to 80, to 100, to 160MB/s
  • IOPS going from a few hundred to over 9000 to over 100,000
  • Latency going from milliseconds to hundreds of microseconds to sub-100μs

Big gains all round. In particular, that enormous improvement in random performance and latency is why SSDs make computers feel so snappy. The difference between the two SATA SSDs isn’t much because they’re limited by the interface and most of the gain from the 860 to 870 will be due to it being 1TB thus having more chips to stripe reads and writes across. Additionally, eagle-eyed hardware nerds may have spotted the HDD is 80% full – which significantly reduces its sustained performance. Fair point, but a) it’s still in a similar ballpark and b) I wasn’t going to delete my MP3s and rare film collection just for this test, heh.

The tests

Now for the SQL tests. These were performed on a new database on each drive with any vendor-specific caching turned off and a 4-core CPU of similar spec – first up, create a table and dump 10GB of data into it;

CREATE TABLE dbo.SpeedTest
(
	a INT IDENTITY(1,1),
	b CHAR(8000)
)

INSERT INTO dbo.SpeedTest (b)
SELECT TOP 1280000 REPLICATE('b',8000)
FROM sys.all_objects ao1
CROSS JOIN sys.all_objects ao2

First test – a table scan of the whole 10GB, and a clearing of the buffer cache beforehand just in case;

DBCC DROPCLEANBUFFERS

SET STATISTICS TIME ON

SELECT COUNT(*)
FROM dbo.SpeedTest

(As expected, a parallel scan using all 4 cores)

Results;

Who even needs query tuning?

That, folks, is progress. We’ve gone from just under 2 minutes on the HDD, to sub 20 seconds for SATA SSDs, to 3.7 seconds on the PCI-E NVME SSD. Blazin’.

How about seeks? For this I turned to Adam Machanic’s sqlquerystress tool to simulate hammering the database with multiple threads of repeated queries.

First create an index on the identity column to have something to seek;

CREATE NONCLUSTERED INDEX ncx_SpeedTest ON dbo.SpeedTest (a)

And here’s a simple parameterised query, which the tool will substitute values of column a for during successive iterations;

SELECT *
FROM dbo.SpeedTest
WHERE a = @id

Check we’re getting a seek;

Then set the tool for 10000 iterations on 4 threads, clear buffer cache, and the results look like this for the HDD;

Results for all drives;

The trend continues, albeit without the orders of magnitude gains seen with the scanning test. The NVME SSD is still the winner, but not by much vs the 1TB SATA SSD, which is to be expected since the read IOPS and latency aren’t much better. However – if it was an equally-sized 1TB NVME drive, I’d expect some more gains to be had from the increased IOPs.

Finally let’s look at the unglamourous but crucial task of backing up and restoring. For this test I performed a simple single-file backup and restore to the same drive. This isn’t shouldn’t be a realistic scenario because backups should at the very least be on a different drive if not a dedicated backup server – thus avoiding reading to and writing from the same drive – but I didn’t have an easy way to recreate this and it at least reflects a simultanous read/write workload. Additionally I didn’t see any point splitting the backup into multiple files – which can increase speed – given the r/w contention at play.

Our poor old HDD doesn’t fare well here – his little heads scrambling back and forth across the platters to try and keep up. And again big gains to SATA SSD and NVME, who’s ~250MB/s backup speed would be far higher if it could write to another NVME SSD.

The verdict

The HDD languishes with poor to relatively awful performance all over. Our SATA SSDs improve in every aspect with enormous gains in sustained performance. Then our PCI-E NVME ushers in a new era of speed, unconstrained by archaic interfaces and, as a bonus, doesn’t even need cables!

That said, from a wider perspective, it’s not all bad news for the humble HDD. They might have poor performance compared to their solid brethren, but remain king for large amounts of storage – they’re made in much bigger sizes (18TB at the moment, with more to come in future), cost less per GB at high sizes and don’t have the same wear problems as SSDs which are only guaranteed for a certain number of ‘drive writes per day’ meaning some write-intensive applications are out.

These days your SQL Servers are probably VMs hooked up to a flash-based SAN, so there’s nothing to learn here on that front. But if you’re somehow still running a desktop or laptop on HDD, I highly recommend switching to at least a SATA SSD, preferably an NVME drive if possible. I added the MP510 tested here to my laptop last year and the speed of the thing is incredible. Additionally – the bigger the better – bigger size equals more chips and channels which equals more speed.