Tuesday, October 12, 2010

Will Web-scale Servers Find a Role in the Enterprise?

PC World


An emerging class of extremely low-power servers is helping Internet companies and hosting providers to slash their energy bills, and proponents say they could have a role in the enterprise as well.

The servers, offered by established players such as Dell and SGI as well as start-ups such as SeaMicro, cut power use by reducing server components to a minimum, aggregating fans and power supplies across several servers, and employing low-power processors normally used in netbooks and other mobile devices, such as Via's Nano processor and Intel's Atom chip.

Examples include Dell's "Fortuna" server, which crams 12 mini servers based on Via Nano processors into a 2U chassis. Each is a fully functioning server with its own storage, memory, management controller and dual 1GbE cards, yet consumes less than 30 watts at full load -- far less than a typical server of the same size. Dell developed the server with Web hosting companies in mind.

More radical is SeaMicro's SM10000, introduced in June, which crams 512 single-core Intel Atom Z530 chips into a 10U system, or about a quarter of a standard server rack. SeaMicro's breakthrough is a custom ASIC that replaces most of the components on a typical server board, including the storage and networking controllers, leaving just three chips: the processor, the DRAM and the ASIC itself.

"The big processors are like taking a spaceship to the grocery store for most problems today," says Andrew Feldman, SeaMicro's CEO. "What you really need is a Prius."

SeaMicro says the boxes provide equivalent performance at a fraction of the power of traditional rackmount servers, and in far less space. A fully loaded rack holds 2,048 Atom CPUs and burns just 8 kilowatts.
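That rack-level claim follows directly from the article's own numbers: four of the 10U systems fill a standard rack, and the per-CPU share of the 8 kilowatts works out to under 4 watts.

```latex
\[
  4 \times 512 = 2{,}048 \text{ CPUs per rack}, \qquad
  \frac{8{,}000\ \text{W}}{2{,}048\ \text{CPUs}} \approx 3.9\ \text{W per CPU}
\]
```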

The low-power processors lack raw performance, but they are well suited to workloads that can be broken into many smaller, separate tasks and executed independently, says John Abbott, chief analyst at The 451 Group. "That's what the big CPUs from Intel and AMD aren't good at; they have to be fully utilized or they're not being efficient."

Web-scale companies such as Yahoo and Microsoft use the low-power servers for jobs like dishing up search results or displaying status updates. They are also popular among hosting providers who want to offer customers a dedicated server at minimal cost.

Data mining and more

While those companies are the main target for these low-power servers today, proponents say they could be used in the future for certain tasks at large corporations. The example most often cited is large-scale data mining, where the servers can be used to uncover trends among terabytes of data such as financial transactions, customer records and weblogs.

"I'm not advising anybody to migrate enterprise workloads to these new platforms. That would be a disaster," says Forrest Norrod, who run's Dell's server division. "But when you are doing new development, new services, you've got to start considering these new cloud architectures because they are going to offer the lowest marginal cost to compute."

For data mining, tools such as Apache Hadoop, open-source software inspired by Google's MapReduce, allow petabytes of data to be distributed across a cluster of commodity servers and then searched and analyzed at high speed. Aster Data and Greenplum, the latter recently acquired by EMC, also offer tools for distributed data mining.
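To make the programming model concrete, here is a minimal sketch of a Hadoop Streaming job in Python that counts page views in Web server logs -- roughly the kind of weblog analysis Olson describes below. The log format (requested page in the seventh whitespace-separated field, as in the common log format) is an assumption for illustration, and Hadoop itself handles splitting the input across nodes and sorting the mapper output; none of that machinery appears here.

```python
#!/usr/bin/env python
# mapper.py -- emits one "page<TAB>1" pair per log line.
# Assumes common-log-format lines, where the requested page is
# the seventh whitespace-separated field (an assumption for this sketch).
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) > 6:
        print("%s\t1" % fields[6])
```

```python
#!/usr/bin/env python
# reducer.py -- sums the counts for each page.
# Hadoop Streaming delivers mapper output sorted by key, so all
# lines for a given page arrive consecutively.
import sys

current_page, count = None, 0
for line in sys.stdin:
    page, n = line.rsplit("\t", 1)
    if page != current_page:
        if current_page is not None:
            print("%s\t%d" % (current_page, count))
        current_page, count = page, 0
    count += int(n)
if current_page is not None:
    print("%s\t%d" % (current_page, count))
```

The same two scripts can be tested on a single machine with `cat access.log | ./mapper.py | sort | ./reducer.py`; on a cluster, Hadoop Streaming runs many copies of them in parallel, one per input split.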

"The big Web guys use this technology to ingest their weblogs," says Mike Olsen, CEO of Cloudera, which makes a commercial supported version of Hadoop. "They want to watch their user behavior at a very fine, granular level -- what pages do they visit, how long do they spend there, where do they go next."

But some financial services companies are running Cloudera for similar tasks, Olson says. He cites one customer, a large bank that has acquired dozens of smaller banks throughout the United States. Each has customer data locked in siloed applications for tracking credit cards, debit cards and home mortgages.

"You want to be able to analyze that data to do a risk assessment, which is critical, but there's no way to easily search across all that data where it resides or to identify multiple instances of the same guy," Olsen says.

The bank copied the data into a large Hadoop cluster where it can search for patterns in a way it was unable to do previously, he says. For example, it can look for customers that defaulted on loans and analyze their transaction history during the preceding months, to uncover patterns that might help identify customers likely to default in the future.
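That windowed analysis is easy to sketch once the transactions have been pulled into simple records. Everything below -- the field names, the defaults mapping and the six-month window -- is hypothetical, not a detail from the article:

```python
from datetime import timedelta

def pre_default_history(transactions, defaults, months=6):
    """Collect each defaulted customer's transactions from the months
    preceding the default -- the window mined for warning-sign patterns.

    transactions: iterable of dicts with 'customer_id', 'date'
    (a datetime.date) and 'amount'; defaults: dict mapping a
    customer_id to the date that customer defaulted.
    """
    window = timedelta(days=30 * months)
    history = {}
    for tx in transactions:
        default_date = defaults.get(tx["customer_id"])
        if default_date and default_date - window <= tx["date"] < default_date:
            history.setdefault(tx["customer_id"], []).append(tx)
    return history
```

In practice the grouping would itself run as a MapReduce job over the Hadoop cluster; the single-machine function just shows the selection logic.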

Another potential use case is running the middleware used to authenticate and connect mobile workers to back-end ERP and CRM systems. "When everyone with a smartphone logs onto your corporate network at 8 o'clock on Monday morning, there's a server-side app that has to secure and maintain each connection," Norrod says. "That area is exploding and it's ideal for this type of system."

High-performance computing applications are also fertile ground. Scientists at the University of Kentucky are using a cluster of Dell C6100 servers, which were designed to offer high compute density, to simulate and analyze the movement of molecules. Vince Kellen, the university's CIO, says the systems could also be used for simulations in drug design or mechanical engineering, or for processing large audio and video files. "It's potentially useful wherever a large file can be broken into many pieces and analyzed," he says.
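The pattern Kellen describes -- carve a big file into pieces and analyze the pieces in parallel -- can be sketched in a few lines. This toy single-machine version assumes the per-piece analysis is just a line count; on a cluster, a scheduler would hand the chunks to different nodes rather than to local processes:

```python
import os
from multiprocessing import Pool

def analyze_chunk(args):
    """Toy per-chunk analysis: count the lines in one byte range.
    A real workload would run its simulation or signal-processing
    step here instead."""
    path, start, size = args
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(size).count(b"\n")

def analyze_file(path, chunk_size=64 * 1024 * 1024):
    """Split the file into fixed-size byte ranges and analyze them
    in parallel, one worker process per CPU core by default."""
    total = os.path.getsize(path)
    chunks = [(path, start, chunk_size)
              for start in range(0, total, chunk_size)]
    with Pool() as pool:
        return sum(pool.map(analyze_chunk, chunks))
```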

To be sure, there are obstacles to the adoption of such systems in the enterprise. Big Internet companies tend to have homogeneous workloads and can fine-tune their data centers for a particular set of applications, says Andy Bechtolsheim, a systems designer who co-founded Sun Microsystems and now leads product development at Arista Networks. "They have in many ways a simpler problem than enterprise data centers," he says.

He believes that "the jury is still out" on whether servers based on low-power Atom and ARM-based processors are really more energy-efficient. "What matters is not absolute power consumption but rather power efficiency, i.e., power used per application throughput," Bechtolsheim says.
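Bechtolsheim's metric is simple to write down, and it shows why a low absolute wattage proves nothing by itself. With purely hypothetical numbers (not from the article), a 30-watt Atom node serving 500 requests per second is less efficient than a 300-watt Xeon box serving 6,500:

```latex
\[
  \text{efficiency} = \frac{\text{throughput}}{\text{power}}, \qquad
  \frac{500\ \text{req/s}}{30\ \text{W}} \approx 16.7
  \;<\;
  \frac{6{,}500\ \text{req/s}}{300\ \text{W}} \approx 21.7
  \quad \text{req/s per watt}
\]
```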

Applications that will run across such a distributed architecture must usually be custom written, and there are technical limitations to the types of workloads that can be moved to a cluster of low-power systems.

"The things that become a challenge are the input/output resources and the cache resources, which are not something that these systems are known for," says Bill Mannel, vice president of product marketing at SGI. "So as soon as you get in a situation where you need to move a lot of data in and out, that's when you see a fall-off."

But SeaMicro's Feldman argues that pressure to reduce costs is forcing companies to explore alternatives.

"Today's databases are punishingly expensive and historically they run on expensive servers," he says. "People are doing anything they can to avoid buying more of them, and that's where software like Hadoop comes in."