Properly architecting and scaling storage can eliminate bottlenecks. But ultimately, the ideal storage design comes down to the performance, capacity and budget needs of each business.
This page provides storage performance recommendations for eDiscovery application servers, database servers, and related components. Optimizing storage read/write speeds and IOPS can significantly improve overall performance and user experience in high usage eDiscovery environments.
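IOPS and throughput are linked by the I/O size, so it helps to convert between them when comparing recommendations. The sketch below is a minimal illustration of that arithmetic; the function names and the sample workload figures are assumptions made for this example, not measurements from any eDiscovery product.

```python
def throughput_mib_per_s(iops: float, block_size_kib: float) -> float:
    """Throughput in MiB/s delivered by `iops` operations of `block_size_kib` each."""
    return iops * block_size_kib / 1024


def iops_for_throughput(target_mib_per_s: float, block_size_kib: float) -> float:
    """IOPS needed to sustain `target_mib_per_s` at the given I/O size."""
    return target_mib_per_s * 1024 / block_size_kib


if __name__ == "__main__":
    # Hypothetical small-block database workload: 10,000 IOPS at 8 KiB per I/O.
    print(throughput_mib_per_s(10_000, 8))   # 78.125
    # Hypothetical large-block sequential workload: 500 MiB/s at 1 MiB per I/O.
    print(iops_for_throughput(500, 1024))    # 500.0
```

A small-block database workload can need many IOPS while moving little data, whereas a large-block file workload can saturate throughput at a modest IOPS figure.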
Here are some on-premises storage recommendations for achieving high performance for eDiscovery workloads:
Use all-flash storage arrays with SSDs for optimal performance. Target max IOPS and low latency.
Leverage SAN fabrics like 32Gb Fibre Channel or 100Gb Ethernet for back-end connectivity.
Ensure redundant infrastructure - controllers, switches, HBAs, network paths.
Use storage tiering if using a mix of SSD and HDD. Tier active data to SSD.
Enable read/write caching capabilities on array controllers.
Use proper RAID configurations optimized for performance (RAID 10).
Scale up SSD drives, cache, and controllers to grow IOPS and throughput.
Use all-flash or hybrid (SSD cache + HDD) NAS systems.
Leverage high throughput network connectivity - 25GbE, 50GbE, 100GbE.
Enable SSD caching, such as an adaptive cache feature where the NAS supports one, to absorb large reads and writes.
Set up WAN-efficient replication for disaster recovery.
Use high-performance file protocols such as NFSv3 or SMB3.
Use NVMe PCIe SSDs on servers for extremely high IOPS and low latency.
Leverage NVMeoF/RDMA to extend NVMe storage across network.
Configure OS software RAID 0/1 where possible.
Use server-side read/write caching.
Leveraging high performance interconnects, protocols, controllers, SSD media, caching, and scale-out architecture can help achieve optimal throughput and IOPS on premises.
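The RAID level chosen affects how much of the raw drive IOPS reaches the application, because each front-end write costs several back-end I/Os. The following is a rough sizing sketch using the commonly cited write penalties; the function name and the drive figures are assumptions for illustration, and real arrays with controller caching will behave differently.

```python
# Commonly cited back-end I/Os per front-end write for each RAID level.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(drives: int, iops_per_drive: float,
                   read_fraction: float, level: str) -> float:
    """Estimate front-end IOPS for a RAID group given a read/write mix."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = drives * iops_per_drive
    write_fraction = 1.0 - read_fraction
    # Reads cost one back-end I/O; writes cost WRITE_PENALTY[level] of them.
    return raw_iops / (read_fraction + write_fraction * WRITE_PENALTY[level])


if __name__ == "__main__":
    # Hypothetical group: 8 SSDs at 50,000 IOPS each, 50% reads.
    for level in ("raid10", "raid5", "raid6"):
        print(level, round(effective_iops(8, 50_000, 0.5, level)))
```

Under this model, RAID 10 keeps more of the raw IOPS than RAID 5 or RAID 6 for write-heavy workloads.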
Work with your vendors to map business requirements to storage architectures. While performance is crucial, find the optimal balance for your environment and constraints.
Here are some recommendations for achieving high performance storage with VMware for eDiscovery workloads:
Use the vSphere Virtual Volumes (vVols) architecture to leverage fast storage arrays directly.
For hyperconverged storage, use VMware vSAN (Virtual SAN) with all-flash disks for maximum IOPS and low latency.
Configure VMware vSphere Distributed Resource Scheduler (DRS) to optimize workload placement.
Use VMware vSphere Flash Read Cache to cache hot reads and improve read performance, where your vSphere version still supports it (the feature was discontinued in vSphere 7.0).
Leverage array-based automated storage tiering (for example, Dell EMC FAST VP) to place hot data on the highest performance tier.
Use VMware Storage vMotion to relocate VMs to less utilized datastores to rebalance workloads.
Ensure at least Gigabit Ethernet connectivity between hosts and 10Gbps+ between vSphere hosts and SAN fabrics.
Use Paravirtual SCSI (PVSCSI) adapters for reduced CPU utilization.
Schedule I/O intensive VM operations to off-peak hours.
Monitor and tune queue depths, queue scheduling, etc.
Scale up the number of SSDs on arrays to meet performance demands.
vVols, Virtual SANs, caching, networking, and scheduling optimizations can help maximize storage performance for eDiscovery workloads on VMware. Continually monitor and tune based on requirements.
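When tuning queue depths, Little's Law gives a useful first approximation: the number of outstanding I/Os equals IOPS multiplied by latency. The sketch below applies that relationship; the function names and sample numbers are assumptions for illustration, and observed values from your own monitoring should drive any actual change.

```python
import math


def required_queue_depth(target_iops: float, latency_ms: float) -> int:
    """Outstanding I/Os needed to sustain target_iops at the given per-I/O latency."""
    return math.ceil(target_iops * latency_ms / 1000)


def max_iops(queue_depth: int, latency_ms: float) -> float:
    """Upper bound on IOPS when at most queue_depth I/Os are in flight."""
    return queue_depth * 1000 / latency_ms


if __name__ == "__main__":
    # Hypothetical target: 20,000 IOPS at 1 ms average latency.
    print(required_queue_depth(20_000, 1))   # 20
    # Hypothetical adapter queue depth of 32 at 0.5 ms latency.
    print(max_iops(32, 0.5))                 # 64000.0
```

If the queue depth available to a VM or adapter is lower than the required value, the workload cannot reach the target IOPS regardless of how fast the array is.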
Here are some recommendations for achieving high performance with Amazon EBS volumes for eDiscovery workloads in AWS:
Use Provisioned IOPS (io1) volumes for mission critical, low latency workloads that require consistent high IOPS. You can provision up to 64,000 IOPS per volume.
For sequential workloads, use Throughput Optimized HDD (st1) or General Purpose SSD (gp2) volumes. st1 can provide up to 500 MiB/s throughput per volume.
Use EBS-optimized instances for additional dedicated bandwidth between EC2 and EBS.
Use RAID 0 striping across multiple EBS volumes attached to an instance to increase total IOPS and throughput. Spread volumes across different servers to distribute load.
Use current-generation EC2 instance types (e.g., C5, M5). These have the fastest interconnect to EBS.
Place EC2 instances and EBS volumes in the same Availability Zone for lowest latency.
Use Elastic File System (EFS) for high throughput shared file storage rather than EBS.
Enable EBS encryption for data at rest security. This has minimal impact on latency.
Monitor and scale up EBS volume sizes as needed to meet performance demands.
Consider using EC2 instance storage (NVMe SSDs) for temporary storage (tempdb) with high IOPS/throughput.
Using Provisioned IOPS and optimizing with RAID, EBS-optimized instances, latest EC2 types, and low latency access within AZs can help meet demanding eDiscovery workload requirements on AWS.
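Volume size and striping both affect the IOPS an instance can reach. The sketch below estimates gp2 baseline IOPS from volume size (3 IOPS per GiB, with a floor of 100 and a cap of 16,000, as documented by AWS at the time of writing) and the aggregate of a RAID 0 set capped by an instance-level limit. The function names and the instance limit used in the example are assumptions; check current AWS documentation for the limits of your instance type.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS/GiB, minimum 100, maximum 16,000."""
    return min(16_000, max(100, 3 * size_gib))


def striped_iops(volume_count: int, iops_per_volume: int,
                 instance_iops_limit: int) -> int:
    """Aggregate IOPS of a RAID 0 set, capped by the instance's EBS IOPS limit."""
    return min(volume_count * iops_per_volume, instance_iops_limit)


if __name__ == "__main__":
    print(gp2_baseline_iops(1_000))   # 3000
    # Hypothetical instance limit of 40,000 IOPS: four 16,000 IOPS volumes
    # cannot deliver more than the instance itself allows.
    print(striped_iops(4, 16_000, 40_000))   # 40000
```

Adding volumes to a stripe beyond the instance limit increases cost without increasing performance.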
While these recommendations aim to optimize performance, clients should determine the storage configuration that offers the best balance of performance, cost, and features to suit their specific business needs and budget.
Here are some recommendations for achieving high performance storage with Azure for eDiscovery workloads:
Use Premium SSDs for VM disks when low latency and consistency are critical. These provide up to 20,000 provisioned IOPS per disk.
For throughput, use Standard SSDs. These can provide up to 750MB/s throughput per disk.
Use Azure Ultra Disks for extremely high IOPS (up to 160K IOPS) and throughput (2GB/s) per disk.
Leverage blob storage for large sequential operations. Enable soft delete and versioning for data protection.
Use premium Azure Files shares for high performance SMB 3.0 file storage, instead of directly attached disks.
Spread disks across multiple VMs and scale up to increase aggregate IOPS and throughput.
Use accelerated networking on VMs for low latency, high throughput disk access.
Place VMs and storage accounts in the same region and availability zone for fastest access.
Use host caching on disks judiciously. It can boost read performance, but is generally a poor fit for write-heavy disks such as database log disks.
Monitor and scale up disk sizes as needed to meet performance demands.
Consider using Ultra SSDs on Azure HPC VMs for temporary storage with extremely high IOPS and throughput.
Combining Premium/Ultra SSDs, Premium Files, blob storage, accelerated networking, caching, and scaling can help address rigorous eDiscovery workload needs on Azure.
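When scaling out across several disks, the disk count is set by whichever target, IOPS or throughput, is harder to meet. The following is a minimal sizing sketch; the function name and the per-disk figures in the example are placeholders rather than limits of any particular Azure disk tier, and VM-level caps still apply on top of the result.

```python
import math


def disks_needed(target_iops: float, target_mbps: float,
                 disk_iops: float, disk_mbps: float) -> int:
    """Smallest number of identical disks meeting both IOPS and throughput targets."""
    by_iops = math.ceil(target_iops / disk_iops)
    by_throughput = math.ceil(target_mbps / disk_mbps)
    return max(by_iops, by_throughput, 1)


if __name__ == "__main__":
    # Placeholder figures: disks rated at 5,000 IOPS and 200 MB/s each,
    # workload target of 18,000 IOPS and 500 MB/s.
    print(disks_needed(18_000, 500, 5_000, 200))   # 4
```

In this example the IOPS target needs four disks while the throughput target needs only three, so IOPS is the limiting dimension.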
As with AWS above, clients should weigh the performance gains against additional costs to find the ideal storage setup aligned to their budget and requirements.
Properly architecting and scaling storage can eliminate performance bottlenecks in eDiscovery environments. Match storage capabilities with workload requirements and scale components appropriately.