Pool Your Resources With File Virtualization

Here's a look at the drawbacks and benefits of this compelling technology, which makes disparate storage resources look like a single system.

May 16, 2008 - Andrew Conry-Murray

Fast, cheap, and out of control: That's an accurate description for many enterprise storage environments, which are stuffed full of file servers and network-attached storage appliances. As the cost of disk comes down and the amount of data to store goes up, it's tempting to simply throw more NAS appliances into the mix to keep up with growing storage demands.

But that's a poor excuse for a strategy. For one, file servers and NAS appliances can only serve the clients and applications attached to them. This can lead to unbalanced storage utilization. Workgroup A's appliance may be near to bursting while workgroup B's sits half empty. Reconfiguring workgroup A's clients to tap into workgroup B's storage device is often more trouble than simply adding another device. For another, the sticker price on a storage appliance isn't a full measure of its overall cost. The proliferation of low-cost storage devices can increase costs elsewhere, such as management, power, and cooling.

Nonetheless, IT can't stop adding storage--in particular storage for unstructured data. Word documents, spreadsheets, PowerPoint presentations, and e-mail expand like ballplayers on steroids. File-based storage increases by 50% to 120% every year, according to IDC.

Enter file virtualization. This technology inserts itself--whether logically or physically--between clients and applications on one side and storage devices on the other. Clients and applications are configured to connect to the virtualization layer, which appears to them as one giant storage system known as the global namespace. The virtualization system then manages the actual connections to the storage devices.

Creating a global namespace makes several things possible. First, storage devices can be used by any client or application, which enables storage administrators to make better use of available capacity and to easily distribute new capacity. Second, the file virtualization layer simplifies common tasks such as migration and mirroring. Some file virtualization technologies let administrators move data even while users are making changes to the files. Third, administrators can get better control over data disposition, such as moving lower-value data to less expensive disk.

File virtualization has yet to make significant inroads into the enterprise. A rough estimate from the Taneja Group, a storage consulting firm, puts total current revenue from file virtualization products at just $100 million. But analysts expect those revenue numbers to soar. That's because file virtualization provides immediate relief from a chronic IT pain: data migration.

Enterprises migrate data--typically files such as Office documents and CAD/CAM files--for two main reasons. A NAS appliance may be reaching its capacity, both in terms of available storage and I/O. Lots of users shipping information back and forth will affect the NAS appliance's performance. Slow response times lead to slower apps and unproductive employees. In this case, a storage administrator may migrate some data to another NAS device, often one that's faster and has more storage capacity. The admin may also split the data among multiple NAS appliances.

Enterprises also migrate data to reclaim unused storage. "We found a lot of our projects had space allocated to them that wasn't being used, like 30% to 35%," says John James, officer in charge for the U.S. IT unit of SPi, a global business process outsourcing provider. That unused storage was essentially wasted money, he says.

So James copied information to tape, deleted it off the original system, and then restored it to another location. But there was always a risk that the process would be interrupted by a business opportunity that required users to get access to the data. "We'd have to stop and allow production to do their thing, and then restart again," says James.

Indeed, migrating data is like trying to move office furniture--you can only do it when your employees aren't using it. That means storage administrators have to set up a time with business units in which the data becomes unavailable to workers, such as a weekend.

MOVE THE RIGHT DATA

The next challenge becomes making sure only the right data is moved. Burzin Engineer, VP of technology for e-commerce site Shopzilla, says it would've taken from two days to a week to identify the files to be moved, depending on the size of the file structure. And just before the move itself, his team would've had to verify that the files hadn't been changed. Then, during the planned downtime, the data would be copied and moved to a new storage device. This move can take hours or days, depending on the amount of data and the network's capacity. The next step would be to reconfigure the PCs and applications so that they know the new location of the data. Enterprises with dozens of NAS devices may have to dedicate a full-time storage administrator who does nothing but reconfigure clients and applications after a migration.

"It was hardly worth doing," says Engineer. "We'd just buy more disk and forget about solving the problem."

File virtualization takes the pain out of migration in several ways. Most importantly, it lets storage administrators migrate data even if that data is currently being used by employees. This means no more weekend migration parties for the storage team, and no disruption to employees. The file virtualization technology monitors the data as it's being written and manages the client and server I/O during the data transfer.

Another benefit is more comprehensive management of storage resources. The file virtualization technology discovers the storage environment and can monitor statistics such as I/O and available disk. It also automatically allocates more disk space based on administrator-defined policies.

James appreciates this feature in EMC's Rainfinity file virtualization product, which SPi now uses. Instead of setting aside 1.5 TB for a project that might not require all that space, he can provision a third of that. "If it reaches 80% of capacity, we can allocate another 500 GB," he says. "Now we don't have those late-night calls about a project running out of space."
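The threshold policy James describes can be illustrated with a short sketch. It is not Rainfinity's actual interface, which the article does not show; the function name `provision` and its parameters are hypothetical, and only the 80% trigger and the 500 GB increment come from his quote.

```python
def provision(allocated_gb, used_gb, threshold_pct=80, increment_gb=500):
    """Return the allocation after applying a grow-on-threshold policy.

    Hypothetical helper: names and defaults are illustrative only.
    Integer arithmetic avoids floating-point surprises at the boundary.
    """
    if allocated_gb <= 0:
        raise ValueError("allocated_gb must be positive")
    # Grow the allocation once usage reaches the threshold percentage.
    if used_gb * 100 >= allocated_gb * threshold_pct:
        return allocated_gb + increment_gb
    return allocated_gb


# A project starts with 500 GB (a third of 1.5 TB) instead of the full amount.
below_threshold = provision(allocated_gb=500, used_gb=399)
at_threshold = provision(allocated_gb=500, used_gb=400)
```

Under these assumptions the project grows in 500 GB steps only when it actually needs the space, rather than holding 1.5 TB from day one.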

A file virtualization product also lets administrators move data to different storage tiers. For instance, the fastest, highest-performance (and most expensive) disk is used for production data that drives the business. Older data, such as files associated with finished projects, can be moved to second-tier storage. Users can still access files on second-tier storage because the file virtualization technology maintains the map to the data.

Shopzilla, which deployed F5's Acopia ARX file virtualization devices on the e-commerce production network, is also planning a deployment for the corporate network. "In corporate, anything that ends in .mp3, we'll pull out and put on second-tier storage," says Engineer.

Engineer also gained a surprise benefit: more leverage with his NAS vendors. Because the ARX manages the connections between clients and the storage devices, he can mix and match NAS products without interoperability concerns. "I can negotiate price with my NAS vendors. They know I have a device that can bypass them." He says that alone paid for the ARX devices.

WHAT'S IN A NAMESPACE?

A key technology of a file virtualization system is a global namespace. Essentially, the namespace is the location of a file on a storage device. When PCs and applications are connected directly to a file server or NAS, they know exactly where a file resides when they need to retrieve it. For example, PC A knows that an Excel file resides on NAS 1.

But if you install a new device, or move data from one NAS to another, you have to reconfigure the client or application so it can find the right path to the information. File virtualization creates a global namespace. In a global namespace, PC A doesn't need to know exactly where an Excel file is; all it needs to have is a path to the file virtualization system, which in turn understands the physical location of the file.
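A minimal sketch of the idea, assuming a simple in-memory lookup table: clients hold only a logical path, and the namespace translates it to a device and physical path. The class, method names, paths, and device names here are hypothetical and do not represent any vendor's implementation.

```python
class GlobalNamespace:
    """Toy global namespace: logical path -> (device, physical path)."""

    def __init__(self):
        self._locations = {}

    def publish(self, logical_path, device, physical_path):
        # Register where a file physically lives.
        self._locations[logical_path] = (device, physical_path)

    def resolve(self, logical_path):
        # Clients ask the namespace, not the NAS, where a file is.
        return self._locations[logical_path]

    def relocate(self, logical_path, new_device, new_physical_path):
        # After a migration, only the mapping changes; clients keep their path.
        if logical_path not in self._locations:
            raise KeyError(logical_path)
        self._locations[logical_path] = (new_device, new_physical_path)


ns = GlobalNamespace()
ns.publish("/finance/budget.xls", "nas1", "/vol0/finance/budget.xls")
before = ns.resolve("/finance/budget.xls")

# The file moves from NAS 1 to NAS 2; PC A's path is unchanged.
ns.relocate("/finance/budget.xls", "nas2", "/vol3/finance/budget.xls")
after = ns.resolve("/finance/budget.xls")
```

The point of the indirection is that a migration becomes a single mapping update rather than a reconfiguration of every client and application.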

Vendors have taken different approaches to the global namespace. Many rely on third-party technologies, including Microsoft's Distributed File System Namespace for Windows environments, and auto-mount for Unix/Linux environments. DFS Namespace is widely used by file virtualization vendors.

DFS Namespace is built into the Windows operating system. It lets administrators present shared folders that are located on different servers as being in a central location. When a client accesses a directory to find a file, it talks to a DFS server to find out where the file is actually located.

Many file virtualization technologies act as a management layer on top of DFS Namespace. They update the DFS Namespace to include NAS filers as well as file servers. They also provide management capabilities, such as monitoring available disk space, for heterogeneous NAS devices. The benefit of leveraging DFS Namespace is that it's already integrated into Windows environments and doesn't require additional software to be installed on Windows servers and PCs. In addition, if the file virtualization technology fails, the most recent drive mappings are still stored in Windows, which means clients can still access data.

On the downside, products that rely on Microsoft DFS tend to work best in Windows-centric organizations. For instance, Brocade's StorageX and its new File Management Engine appliance, which leverage DFS, are aimed primarily at Windows shops.

Other vendors, including F5 Acopia, create their own global namespace. They can support Microsoft DFS, but it's not a requirement. The upside of a vendor-created namespace is that it provides a uniform mechanism for virtualizing files for Windows and Unix/Linux environments. However, if the file virtualization product fails, it takes the namespace with it and may cut off access to data until it's restored.

File virtualization technologies are generally deployed in-band or out-of-band. In-band products, which include F5 Acopia's ARX, Attune Systems' Maestro File Manager, and Brocade's File Management Engine, sit directly in the data path between clients and storage devices. They are full proxies that touch every packet crossing the wire.

Out-of-band file virtualization products are deployed as appliances that connect to a switch. They observe storage traffic and manage the namespace to direct data to the appropriate storage device, but they don't intercept any packets. Out-of-band products include Brocade's StorageX and EMC's Rainfinity. EMC positions Rainfinity as a hybrid file virtualization technology. That is, it generally sits out-of-band, but it can behave as an in-band device when directed by policy.

Each architectural approach has its benefits and drawbacks. The greatest benefit of in-band file virtualization is that it lets storage administrators migrate files even if they're being used. This eliminates the need to schedule downtime in which users and applications can't have access to their data.

The in-band product essentially makes a copy of the data as it's being written by a user. Once the migration is complete, the file virtualization system updates the namespace with the new physical location of the data. By contrast, out-of-band products can't move live data.
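The sequence above can be sketched roughly as follows. This is a toy model under stated assumptions, not how any product named here works internally: storage devices are plain dictionaries, and the `InBandProxy` class and its methods are hypothetical. It shows only the general pattern of copying, mirroring writes during the move, and then switching the namespace.

```python
class InBandProxy:
    """Toy full proxy: every read and write passes through it."""

    def __init__(self, stores, namespace):
        self.stores = stores        # device name -> {path: bytes}
        self.namespace = namespace  # path -> device name currently serving it
        self.mirrors = {}           # path -> target device while migrating

    def read(self, path):
        return self.stores[self.namespace[path]][path]

    def write(self, path, data):
        # Always write to the current location...
        self.stores[self.namespace[path]][path] = data
        # ...and mirror the write to the target if a migration is under way.
        target = self.mirrors.get(path)
        if target is not None:
            self.stores[target][path] = data

    def begin_migration(self, path, target):
        # Start mirroring first, then copy the existing contents.
        self.mirrors[path] = target
        self.stores[target][path] = self.stores[self.namespace[path]][path]

    def finish_migration(self, path):
        # Point the namespace at the new location and free the old copy.
        target = self.mirrors.pop(path)
        source = self.namespace[path]
        self.namespace[path] = target
        del self.stores[source][path]


stores = {"nas1": {"/cad/part.dwg": b"v1"}, "nas2": {}}
proxy = InBandProxy(stores, {"/cad/part.dwg": "nas1"})

proxy.begin_migration("/cad/part.dwg", "nas2")
proxy.write("/cad/part.dwg", b"v2")  # a user saves during the migration
proxy.finish_migration("/cad/part.dwg")
result = proxy.read("/cad/part.dwg")
```

Because the proxy sees every write, the user's save during the move lands on both devices, which is why this approach depends on sitting in the data path.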

In-band file virtualization also provides more granular control over data. Storage administrators can define policies around file metadata such as file type (e.g., MP3, Excel), the date the file was created, or the last time the file was accessed. Out-of-band file virtualization systems can only migrate directories or folders, not individual files. For instance, a folder that has high-value documents and files related to a key business project may be stored on high-end disk. That folder may also contain data unrelated to the project, but that data must be stored on the same high-end disk as well.
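A file-level policy of the kind described here might look like the sketch below. The tier names, the `FileInfo` type, the `choose_tier` function, and the 180-day idle limit are all assumptions made for illustration; the only detail taken from the article is the idea of routing files by type (such as .mp3) or last access time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    last_accessed: datetime


def choose_tier(info, now, second_tier_extensions=(".mp3",),
                max_idle=timedelta(days=180)):
    """Pick a storage tier from file metadata (hypothetical policy)."""
    # Rule 1: low-value file types go straight to cheaper disk.
    if info.path.lower().endswith(second_tier_extensions):
        return "tier2"
    # Rule 2: files nobody has touched recently also move down a tier.
    if now - info.last_accessed > max_idle:
        return "tier2"
    return "tier1"


now = datetime(2008, 5, 16)
song = FileInfo("/home/alice/song.MP3", datetime(2008, 5, 15))
old_report = FileInfo("/projects/done/report.xls", datetime(2007, 1, 10))
live_model = FileInfo("/projects/active/model.xls", datetime(2008, 5, 14))

tiers = [choose_tier(f, now) for f in (song, old_report, live_model)]
```

Each file is evaluated on its own metadata, so an unrelated file sitting in a high-value project folder is not forced onto the same tier as its neighbors.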

Still, in-band products come with their own set of potential drawbacks. At the top of the list is the potential to introduce latency. Because the device sits in-band and must process every packet that crosses the wire, the file virtualization system can become a bottleneck. This is a crucial concern for high-volume environments that generate lots of traffic. You must carefully evaluate the processing capability of in-band devices to ensure that they can handle the pounding, particularly if the appliances are built from general, PC-based components rather than purpose-built hardware.

A second issue is device failure--a dead box will cut off access to the storage system. And if the in-band virtualization device creates its own namespace, administrators will have to rebuild that namespace when the device comes back online.

The obvious solution is to deploy clusters of in-band products. While this helps address latency and provides additional devices to take up the load if one machine fails, it also raises the price of the system. For instance, the average configuration of a high-end Acopia switch costs $200,000. You'd double that for a two-node cluster.

Because out-of-band devices don't sit in the data path, a device failure won't cut off access to data. And because out-of-band devices tend to leverage Microsoft DFS for the namespace, the DFS environment will have the most recent namespace configuration so that clients and applications can still find their data.

LEARNING CURVE

Deploying a file virtualization system is a significant undertaking, and customers should budget for a professional services engagement and a top-notch support contract in addition to the capital cost of the product.

"We had all kinds of EMC engineers in here," says SPi's James. His team was trained by EMC and participated in the implementation of the Rainfinity system. "We did some of the configurations under the careful eye of the EMC engineers," he says. SPi also added 55 TB of storage at the time of the Rainfinity implementation, which increased the complexity of the project.

The team worked carefully to ensure that data migrations didn't consume resources that would slow the production environment. That meant striking a balance among several resources, including network bandwidth and the demand on the Rainfinity boxes and NAS heads (that is, the part of the storage device that actually writes data to a disk). "The migrations run in the background and have a lower priority than production traffic," James says.

James and his team have run into snags here and there. "Decisions we made to get the system up and running might have to be changed down the road, so we've had to make configuration changes." That said, the team is familiar enough with the platform that phone calls and WebEx sessions with an EMC engineer are sufficient to get back on track.

As companies consider an approach to file virtualization, here's a word to the wise: Vendors treat the in-band and out-of-band argument like Catholics and Protestants treat religion--one is truth, and the other is heresy. Yes, there are architectural differences and pluses and minuses to each, but it's too easy for conversations about the technology to degenerate into slagging contests, which may be entertaining but do little to help enterprises choose the best fit.

Rather than pick sides, companies are best served by understanding their environments and identifying two or three major pain points to address. For instance, if an environment is fairly balanced between Windows and Unix/Linux, a vendor whose namespace doesn't heavily rely on DFS makes sense. If storage consolidation in a Windows environment is a primary driver, a souped-up, purpose-built platform may be overkill.

The truth is that file virtualization can solve serious storage headaches. Forget about in-band vs. out-of-band. Focus on your needs, prepare to invest in training and professional services, and then start making calls.