Tracking and cracking network performance problems is no easy task. More than a matter of identifying often mystifying bottlenecks, ensuring network efficiency requires an almost preternatural understanding of your organization's IT operations, as well as a thick skin for withstanding the heat when problems inevitably arise.
To keep your network humming, we've outlined 10 areas where tweaking and moderate investment can lead to significant performance gains. After all, as more and more organizations seek to conduct business at wire speed, making sure your systems blaze is essential to the competitive edge your organization needs.
Performance tip No. 1: Speed up that WAN
IT has long been caught in the web of leased lines and costly WAN charges. Linking multiple sites with T1 lines, MPLS, and even Frame Relay used to be the only way to guarantee connectivity, but the scene has changed. Rather than curse at your monthly WAN bill, it's high time to investigate your alternatives.
Cogent Communications is one of several providers boasting a significant fiber footprint around the United States. Tapping these outlets might mean a substantial increase in site-to-site bandwidth at significant cost savings -- it's all a matter of location. Even bringing a few sites into a new WAN design can save enough money to increase bandwidth to the sites that aren't accessible by the same carrier.
You may wind up running your own VPN between these sites, but if the carrier's SLA is strong enough and the network is as low-latency as it should be, this won't be an issue. Think of the benefits of 100Mbps across all your sites and a WAN bill cut in half.
Sites outside the footprint of the larger carriers, and thus destined to remain on leased-line connections for the foreseeable future, could benefit from a WAN accelerator, such as Riverbed's Steelhead appliance (see the InfoWorld Test Center's hands-on review of Riverbed Steelhead). If you can't increase bandwidth to those satellite sites, your only option is to decrease traffic on those circuits without reducing their efficacy. That's where WAN optimization tools come in.
Performance tip No. 2: Lose the leased lines
Unless you're headquartered in the Sahara, it's time to ditch leased-line Net access. Between Time Warner Business Class, Comcast Business Class, and FiOS, there's bound to be a better, cheaper way to bring high-speed Internet into your environment. A tenfold Internet bandwidth increase in place of existing T1 circuits is not out of the realm of possibility and can be achieved for a fraction of the cost without compromising reliability.
Granted, T1 and T3 leased lines provide more of a guarantee against latency, but the cost differential is extraordinary, and these networks -- especially the business-class products -- have matured substantially. It's time to tell your telco to pull its SmartJacks and bring in something better.
Slow Internet access is always a major complaint among users. Bringing them the same relative speed they get at home goes a long way toward appeasing the masses.
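To put the tenfold figure in perspective, here is a rough sketch of idealized transfer times. It assumes the nominal 1.544Mbps T1 line rate and a hypothetical 100MB file, and it ignores protocol overhead, latency, and contention, so real-world times will run somewhat longer.

```python
T1_MBPS = 1.544  # nominal T1 line rate, in megabits per second


def transfer_seconds(size_megabytes: float, link_mbps: float) -> float:
    """Idealized transfer time: ignores protocol overhead, latency, and contention."""
    return size_megabytes * 8 / link_mbps  # megabytes -> megabits, then divide by link rate


file_mb = 100  # hypothetical file size
for label, mbps in [("single T1", T1_MBPS), ("tenfold (~15Mbps)", T1_MBPS * 10)]:
    print(f"{label}: {transfer_seconds(file_mb, mbps) / 60:.1f} minutes")
```

Under these assumptions the file takes roughly 8.6 minutes over a single T1 and under a minute at ten times that rate, which is the difference users notice.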
Performance tip No. 3: Let auld acquaintance be forgot
Many businesses cling desperately to elderly application platforms, leaving IT saddled with the high-cost, resource-intensive task of shoehorning old platforms into new infrastructures. This is how you wind up with a brand-new VMware vSphere architecture running a handful of Windows NT4 boxes.
Refusing to let go of the past often results in increased costs, downtime, and fragility of core business systems. Instead of holding meeting after meeting to figure out how to get a 10-year-old accounting package transferred to a new infrastructure, launch it into orbit and migrate to something new. The upfront costs may be higher, but they will pale against the long-term costs you'll incur by not severing these ties.
This is a personnel issue as much as it is a technical one. There are always those in IT shops who see everything through the prism of their favored technology, facts be damned. It's not always easy to shepherd these folks through the dark and stormy night of new technology, but remember, hanging on to fixed-purpose IT admins can be as detrimental as hanging on to elderly technology.
Performance tip No. 4: Build a lab
There's no excuse. For the cost of a single server, you can build a monster IT test lab. A cheap, dual-CPU, 12-core AMD Istanbul-based 1U server can run several dozen virtual machines in a test scenario for about $1,500. Using VMware Server on Linux or VMware ESXi, you can avoid software licensing fees, while maintaining a perfectly valid platform for testing anything, from software upgrades to new packages, new operating systems, or even network architectures.
Combine a virtualized server lab with tools such as GNS3, and you can build and test just about any planned network or system infrastructure you want. There's no easier way to determine where resource bottlenecks reside than in a test bed, and if that test bed is as easily constructed as it is in a virtual lab, there's no reason not to find them. Moreover, with a virtual lab, you can find the sweet spot for certain servers, including how much RAM and CPU resources they'll need to function under expected (and unexpected) loads, thereby ensuring you waste fewer resources.
Performance tip No. 5: Watch everything
Network and system monitoring is the granddaddy of bottleneck diagnostics. When users complain that the network is slow, the network usually has nothing to do with it. But unless you have the facilities to show exactly where the problem resides, you're left hunting around in the dark for the solution.
Whether you prefer proprietary or open source tools, there are myriad options for monitoring everything from network latency and throughput to RAM and CPU utilization, SAN performance, and disk queue lengths -- you name it.
If it exists, it can be monitored. If it can be monitored, it can be graphed. And if it can be graphed, there's a very good chance that a simple perusal of the resulting graph can lead you in the right direction, greatly accelerating the problem-detection portion of any troubleshooting effort.
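Interface utilization is one of the first graphs worth building. As a minimal sketch, assuming you already poll the standard IF-MIB counters (ifInOctets and ifSpeed, the latter in bits per second) with whatever SNMP tool you use, the percentage to graph comes from two successive counter samples. Handling the 32-bit counter wrap is the main subtlety; the sample numbers below are hypothetical.

```python
COUNTER32_MODULUS = 2**32  # 32-bit counters such as ifInOctets wrap back to 0 here


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets counted between two samples, tolerating at most one counter wrap."""
    return (curr - prev) % modulus


def utilization_pct(prev_octets: int, curr_octets: int,
                    interval_s: float, if_speed_bps: int,
                    modulus: int = COUNTER32_MODULUS) -> float:
    """Average utilization over the polling interval, as a percentage of link speed."""
    bits = counter_delta(prev_octets, curr_octets, modulus) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# Hypothetical samples: a 100Mbps interface polled five minutes apart.
print(utilization_pct(0, 1_875_000_000, 300, 100_000_000))  # 50.0
```

On gigabit and faster links a 32-bit octet counter can wrap more than once between polls, so the 64-bit ifHCInOctets counter (with `modulus=2**64`) is the safer input there.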
And when implementing network monitoring, be sure to leave no stone unturned. Monitor the CPU utilization of your routers and switches; watch the error rates on Ethernet interfaces; have your routers and switches log to central syslog servers and implement some form of logfile analysis to alert you when there are reports of anything from IP conflicts to circuits going down. Careful, conscientious implementation and tweaking of your monitoring framework will save enormous amounts of time and energy, especially when it counts the most.
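The logfile-analysis piece can start very small. The sketch below scans syslog lines for two alert categories. The regular expressions are illustrative assumptions modeled on Cisco IOS-style messages (such as %IP-4-DUPADDR and interface "changed state to down" notices); message text varies by vendor and software version, so adapt the patterns to what your devices actually log. The sample lines are made up.

```python
import re
from collections import Counter

# Illustrative patterns only; adjust to the exact messages your gear emits.
ALERT_PATTERNS = {
    "ip_conflict": re.compile(r"DUPADDR|duplicate address", re.IGNORECASE),
    "link_down": re.compile(r"changed state to down", re.IGNORECASE),
}


def scan(lines):
    """Return (category, line) pairs for every line matching an alert pattern."""
    hits = []
    for line in lines:
        for category, pattern in ALERT_PATTERNS.items():
            if pattern.search(line):
                hits.append((category, line))
    return hits


# Hypothetical lines as they might arrive at a central syslog server.
sample = [
    "Jan 12 03:14:07 rtr1 %LINK-3-UPDOWN: Interface Serial0/0, changed state to down",
    "Jan 12 03:15:10 sw2 %IP-4-DUPADDR: Duplicate address 10.0.0.5 on Vlan10",
    "Jan 12 03:16:00 sw2 %SYS-5-CONFIG_I: Configured from console",
]
print(Counter(category for category, _ in scan(sample)))
```

In practice you would feed `scan` from a tailed log file and route hits to e-mail or your paging system; the point is that a few well-chosen patterns catch circuit outages and IP conflicts before users call.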
Performance tip No. 6: Know your apps
Infrastructure performance monitoring will only get you so far. All the computing and storage resources that you are offering up on your network are being consumed by your applications. For too many of us, those applications form something akin to a black hole -- we can easily observe their effects on our infrastructure, but it's often difficult to see inside them to know what's going on.
Many IT shops are content to let software vendors install and implement the applications on their networks; after all, that's less work for IT. But be careful -- you're on the hook when the network later slows to a crawl.
Spend time testing your apps with an eye to uncovering their soft spots. Whether it's a particularly expensive database stored procedure that gets called when users log in or a massive performance slowdown during third shift when backups kick off, you need to know ahead of time where your likely performance drains reside.
To accomplish this, insist on testing new applications in your infrastructure before they are purchased. Pay close attention to the amount of resources used as you test and project how much performance the application will require under real-life production loads. This kind of testing can uncover severe architectural flaws in the application that may make it inappropriate for your environment. Better to know that in advance than to find yourself fending off users armed with torches and pitchforks.
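One simple way to make that projection is to measure a resource (CPU, here) at a few test load levels and extrapolate. The sketch below fits a straight line by ordinary least squares to hypothetical test measurements. Linear scaling is an optimistic assumption: contention often makes applications degrade faster than linearly, so treat the result as a floor rather than a forecast.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical test results: concurrent users vs. average CPU % on the test server.
users = [25, 50, 100]
cpu_pct = [12.0, 20.0, 36.0]

slope, intercept = fit_line(users, cpu_pct)
production_users = 400  # hypothetical expected production load
projected = slope * production_users + intercept
print(f"projected CPU at {production_users} users: {projected:.0f}% of the test server")
```

With these made-up numbers the fit projects about 132 percent of the test server's CPU at 400 users, a clear sign that production will need more capacity than the test configuration provides.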
Performance tip No. 7: Terabytes and spindle counts, oh my
The past few years have seen explosive growth in disk capacity. With the advent of 2TB SATA disks, it's now possible to jam more than 10TB into a single two-rack-unit server. And that's great -- because now you need fewer disks, right? Not so fast.
It's crucial to understand that today's SATA disks share an important trait with their smaller predecessors: They're slow. While it may be possible to fit 2TB of data onto a single 7,200-rpm SATA disk, you'll still be limited to an average random transactional throughput of perhaps 80 IOPS (I/O operations per second) per disk. Unless you're storing a mostly static data bone yard, be prepared to be thoroughly unhappy with the performance you'll get out of these new drives as compared to twice the number of 1TB disks.
If your applications require a lot of randomized reads and writes -- database and email servers commonly fit this bill -- you'll need a lot of individual disks to obtain the necessary transactional performance. While huge disks are great for storing less frequently used data, your most prized data must still sit on disk arrays made up of faster and smaller disks.
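The spindle arithmetic is simple enough to sketch. This example uses the roughly 80 random IOPS per 7,200-rpm SATA disk cited above, a hypothetical 10TB capacity requirement, and a hypothetical 2,000-IOPS database workload. It ignores RAID parity and write penalties, which would push the IOPS-driven disk counts higher still.

```python
import math

SATA_7200_IOPS = 80  # approximate random IOPS per 7,200-rpm SATA disk


def disks_for_capacity(capacity_tb: float, disk_tb: float) -> int:
    """Spindles needed just to hold the data."""
    return math.ceil(capacity_tb / disk_tb)


def disks_for_iops(target_iops: float, per_disk_iops: float) -> int:
    """Spindles needed to deliver the random I/O rate, regardless of capacity."""
    return math.ceil(target_iops / per_disk_iops)


capacity_tb = 10  # hypothetical capacity requirement
for disk_tb in (2, 1):
    n = disks_for_capacity(capacity_tb, disk_tb)
    print(f"{disk_tb}TB disks: {n} spindles, ~{n * SATA_7200_IOPS} random IOPS")

print("spindles for a 2,000-IOPS database:", disks_for_iops(2000, SATA_7200_IOPS))
```

Five 2TB disks deliver roughly 400 IOPS and ten 1TB disks roughly 800, while the database workload needs 25 spindles no matter how little capacity it actually consumes.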
Performance tip No. 8: Beware the 10-pound server in the 5-pound bag
Virtualization has to be just about the coolest thing to happen to the enterprise datacenter in a long time. It offers a multitude of manageability and monitoring benefits, scales cleanly, makes disaster recovery simpler than ever before, and dramatically decreases the number of physical servers you need chewing up power and spewing out heat.
As you grow your virtualization infrastructure, it should be fairly easy to keep tabs on CPU and memory performance. Any virtualization hypervisor worth its salt will give you visibility into the headroom you have to work with. Disk performance, on the other hand, is tougher to track and more likely to get you into trouble as you push virtualization to its limits.
By way of example, let's say you have a hundred physical servers you'd like to virtualize. They're all essentially idling on three-year-old hardware, and each requires 1GHz of CPU bandwidth, 1GB of memory, and 250 IOPS of transactional disk performance.
You might imagine that an eight-socket, six-core X5650 server with 128GB of RAM would be able to run this load comfortably. After all, 48 cores at 2.66GHz add up to roughly 128GHz against a 100GHz load, and 128GB of RAM sits against 100GB of demand, so you have more than 20 percent of CPU and memory headroom, right? Sure, but bear in mind that the same hundred servers need 25,000 IOPS in aggregate, which means you're going to need the equivalent of about 140 15,000-rpm Fibre Channel or SAS disks attached to that server to provide the transactional load you'll require. It's not just about compute performance.
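The sizing check in this example can be reproduced in a few lines. The 2.66GHz per-core clock and the figure of roughly 180 random IOPS per 15,000-rpm disk are assumptions on our part; hypervisor overhead and RAID write penalties are ignored, so real requirements would be somewhat higher.

```python
import math

SERVERS = 100
CPU_GHZ_EACH, MEM_GB_EACH, IOPS_EACH = 1.0, 1.0, 250  # per-server needs from the example

HOST_CPU_GHZ = 8 * 6 * 2.66   # eight sockets x six cores x 2.66GHz (assumed clock)
HOST_MEM_GB = 128
IOPS_PER_15K_DISK = 180       # assumed rule of thumb for a 15,000-rpm FC/SAS disk

need_cpu = SERVERS * CPU_GHZ_EACH
need_mem = SERVERS * MEM_GB_EACH
need_iops = SERVERS * IOPS_EACH

cpu_headroom = 1 - need_cpu / HOST_CPU_GHZ  # fraction of host CPU left idle
mem_headroom = 1 - need_mem / HOST_MEM_GB   # fraction of host RAM left free
disks_needed = math.ceil(need_iops / IOPS_PER_15K_DISK)

print(f"CPU headroom {cpu_headroom:.0%}, memory headroom {mem_headroom:.0%}")
print(f"{need_iops} IOPS needs about {disks_needed} 15K disks")
```

Under these assumptions both CPU and memory show about 22 percent headroom, yet the storage side calls for 139 disks, which is the imbalance the example warns about.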
Performance tip No. 9: To dedupe or not to dedupe
As your data grows exponentially, it's natural to seek out tools that curb the use of expensive storage capacity. One of the best such examples is data deduplication. Whether you're deduplicating in your backup and archiving tier or directly on primary storage, you can derive massive capacity benefits by weeding out duplicate data and storing only what is unique.
Deduplication is great for the backup tier. Whether you implement it in your backup software or in an appliance such as a virtual tape library, you can potentially keep months of backups in a near-line state ready to restore at a moment's notice. That's a better deal than having to dig for tape every time you have a restore that's more than a day or two old.
Like most great ideas, however, deduplication has its drawbacks. Chief among these is that deduplication is computationally expensive. It should come as no surprise that NetApp, one of the few major SAN vendors to offer deduplication on primary storage, is also one of the few major SAN vendors to offer controller hardware performance upgrades through its Performance Acceleration Modules. Identifying and consolidating duplicated blocks on storage requires a lot of controller resources. In other words, saving capacity comes at a performance price.
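To see where that controller work goes, here is a toy fixed-block deduplicator. It is a sketch only: the 4KB block size and SHA-256 fingerprints are our assumptions, and production systems differ in block size, fingerprinting scheme, and whether they verify matches byte for byte. Every block written must be hashed and looked up in an index, which is the CPU and memory cost described above.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed blocks and keep only one copy of each distinct block."""
    store = {}   # fingerprint -> block contents (unique blocks only)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fp = hashlib.sha256(block).hexdigest()  # hashing every block costs CPU
        store.setdefault(fp, block)             # index lookup for every block
        recipe.append(fp)
    return store, recipe


def rehydrate(store, recipe) -> bytes:
    """Reassemble the original data from the unique-block store."""
    return b"".join(store[fp] for fp in recipe)


# Hypothetical data: eight identical blocks followed by two identical blocks.
data = b"A" * BLOCK_SIZE * 8 + b"B" * BLOCK_SIZE * 2
store, recipe = dedupe(data)
print(f"{len(recipe)} blocks written, {len(store)} unique blocks stored")
```

Ten logical blocks shrink to two stored blocks here, but only after ten hash computations and ten index operations; scale that to every write on a busy array and the need for extra controller horsepower becomes obvious.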
Performance tip No. 10: Accelerate your backups
Backups are almost always slower than you'd like them to be, and troubleshooting backup performance problems is often more art than science. But there is one common problem that nearly every backup administrator faces at some point or another.
If you are backing up directly to tape, it's likely you're underfeeding your tape drives. The current generation of LTO4 tape drives (soon to be supplanted by LTO5) is rated at 120MBps of native (uncompressed) write throughput, but few ever see that in real life. Mostly this is because very few backup sources can sustain read rates that match the tape drive's write performance. For example, a backup source consisting of a pair of SAS disks in a RAID1 array may be capable of raw throughput well beyond 120MBps in a lab environment, but for standard Windows-based file copies over a network, you'll rarely see rates greater than 60MBps. Because many tape drives become significantly less efficient when their buffers run empty (they must stop, reposition the tape, and start again, a behavior often called shoe-shining), this mismatch is the root cause of most backup performance problems.
In other words, the problem isn't your tape drive; it's the storage in the servers you're backing up. There may not be a great deal you can do about this without investing heavily in a large, high-performance intermediate disk-to-disk backup solution, but you have more options if you have a SAN. Depending largely on the kind of SAN you have and what backup software you run, host backups, which read directly from the SAN rather than over the network, can be a great solution to this particularly vexing problem.
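The feed-rate mismatch is easy to quantify. This sketch uses the 120MBps LTO4 figure and the 60MBps network copy rate from the text, plus a hypothetical 2,000GB backup set in decimal units. It also assumes your backup software can interleave several client streams onto one drive, which many packages support under names like multiplexing; check your own product's documentation before relying on it.

```python
import math

TAPE_NATIVE_MBPS = 120  # LTO4 native (uncompressed) write rate
STREAM_MBPS = 60        # typical single-stream network file-copy rate


def effective_rate(streams: int, stream_mbps: float = STREAM_MBPS,
                   tape_mbps: float = TAPE_NATIVE_MBPS) -> float:
    """Write rate the drive actually sees: limited by whichever side is slower."""
    return min(tape_mbps, streams * stream_mbps)


def streams_to_saturate(stream_mbps: float = STREAM_MBPS,
                        tape_mbps: float = TAPE_NATIVE_MBPS) -> int:
    """Concurrent source streams needed to keep the drive writing at its native rate."""
    return math.ceil(tape_mbps / stream_mbps)


def hours_to_write(data_gb: float, rate_mbps: float) -> float:
    """Backup window in hours (decimal GB converted to MB, seconds to hours)."""
    return data_gb * 1000 / rate_mbps / 3600


for n in (1, streams_to_saturate()):
    print(f"{n} stream(s): {hours_to_write(2000, effective_rate(n)):.1f} hours for 2,000GB")
```

In this idealized model a single 60MBps stream needs about 9.3 hours while two streams keep the drive fed and finish in about 4.6; in practice, contention on the source disks and the network will eat into that gain.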