Optimizing IntelliCache: Addressing Network, Scalability, and Management Challenges in Virtual Disk Image Caching
Owner: scampo
Time: Thu 1:25 PM 6 Jun +0100 (Europe/Lisbon) Final
Location: AUGUSTA

IntelliCache optimises disk image management by caching virtual disk images (vDisks) on the hypervisor’s local storage. This approach aims to alleviate the challenges of centralised disk streaming while improving performance and scalability. However, similar challenges remain and require attention:

Network Challenges:
- Initial Network Load: During boot-up or when accessing new data, VMs still rely on network connectivity until the required data is cached locally. This can strain network bandwidth, particularly during peak usage periods.
- Network Dependency for Initial Data: Network issues or outages can disrupt VM availability and performance until the necessary data is cached locally.
- NFS Read-Only Image Caching: When a cache miss occurs, I/O falls back to a queue depth of one request. This can be slower than not using a cache at all, especially if the working set exceeds the cache size and the same reads could otherwise have been issued at a queue depth of 16.

Scalability Challenges:
- Increased Load with VM Growth: As the number of VMs rises, so does the demand on the hypervisor’s caching infrastructure. This can lead to performance degradation if not managed effectively.
- Optimising Cache Efficiency: There may be limits to how many VMs can efficiently use cached data from a single vDisk without performance impact.

Management Complexity:
- Configuration and Maintenance: Setting up and maintaining IntelliCache configurations may require specialised knowledge, which adds complexity to the management process.
- Performance Troubleshooting: Diagnosing and resolving performance issues in environments that use IntelliCache can be intricate, especially in large deployments with many contributing factors.

This design session aims to discuss these challenges and to explore advanced caching mechanisms that optimise caching efficiency, mitigate network dependencies, and streamline management processes, ultimately improving overall system performance and scalability.
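
The queue-depth point under NFS read-only image caching can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from IntelliCache or blktap; the function names and the latency figures (0.05 ms for a local cache hit, 1.0 ms for an NFS read) are assumptions chosen only for illustration. It further assumes that cache hits are served one at a time and that requests at a given queue depth overlap perfectly, which real storage will not match exactly.

```python
def mean_read_time_ms(hit_ratio, local_ms, nfs_ms, miss_queue_depth):
    """Average wall-clock time per read under a simplified model.

    Hits cost local_ms each. Misses go to NFS and are assumed to overlap
    perfectly up to miss_queue_depth, so each miss costs
    nfs_ms / miss_queue_depth on average.
    """
    hit_cost = hit_ratio * local_ms
    miss_cost = (1.0 - hit_ratio) * nfs_ms / miss_queue_depth
    return hit_cost + miss_cost


def break_even_hit_ratio(local_ms, nfs_ms, direct_queue_depth):
    """Hit ratio at which a cache with depth-1 misses matches direct NFS.

    Solves h * local_ms + (1 - h) * nfs_ms == nfs_ms / direct_queue_depth
    for h. A result above 1.0 means the cache cannot win in this model.
    """
    direct_ms = nfs_ms / direct_queue_depth
    return (nfs_ms - direct_ms) / (nfs_ms - local_ms)


if __name__ == "__main__":
    # Illustrative latencies only (assumptions, not measurements).
    LOCAL_MS = 0.05
    NFS_MS = 1.0

    # No cache: every read goes to NFS at queue depth 16.
    direct = mean_read_time_ms(0.0, LOCAL_MS, NFS_MS, 16)
    print(f"direct NFS, depth 16: {direct:.4f} ms per read")

    # Cache enabled: misses are serialised at queue depth 1.
    for hit_ratio in (0.5, 0.9, 0.99):
        cached = mean_read_time_ms(hit_ratio, LOCAL_MS, NFS_MS, 1)
        print(f"cache, hit ratio {hit_ratio:.2f}: {cached:.4f} ms per read")

    threshold = break_even_hit_ratio(LOCAL_MS, NFS_MS, 16)
    print(f"break-even hit ratio: {threshold:.4f}")
```

Under these assumed numbers the cache only pays off at a hit ratio of roughly 0.99, which matches the concern above: once the working set exceeds the cache size and misses become common, serialised misses can cost more than the cache saves.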

Source code: https://github.com/xapi-project/sm/blob/master/drivers/blktap2.py#L363-L364