December 29, 2009

Service Provider Uptime - Colocation

As described by Wikipedia, uptime is a measure of the time a computer system has been "up" and running.  While that definition describes uptime specifically for a computer, uptime is the single most important metric for colocation and managed hosting providers today.  It's the basis of the SLA they provide, and it's also the standard of performance every provider strives for.

Before we discuss how service providers define uptime, it's worth taking a moment to consider what uptime means for the services the provider is going to deliver to you.

When asked about uptime, every service provider thinks about it in a different way.  Some will say that downtime is an unexpected event that renders a particular service out of service for a prolonged period of time.  Additionally, every uptime SLA, even when it's 100% guaranteed, will include provisions for planned downtime.  So, it's wise to understand how uptime and planned outages are defined in your contract.  If the provider doesn't deliver their service using N+1 redundancy, there's a strong likelihood that you will experience downtime because of routine maintenance.  If that's the case, you're most likely dealing with a Tier 2 provider.


So, how is uptime measured for service providers?  That question is somewhat ambiguous because, in some ways, it's a philosophical argument.  However, there are general guidelines for how service providers measure themselves against their SLAs.
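
As a rough illustration, and my own sketch rather than any particular provider's formula, uptime is usually computed as the fraction of a measurement window the service was available, and whether planned maintenance counts against that window changes the number considerably:

    # Rough sketch: how a monthly uptime percentage is typically computed.
    # The numbers below are illustrative; real contracts define their own terms.

    MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month

    def uptime_percent(unplanned_min, planned_min=0, exclude_planned=True):
        """Uptime for one month.  If the SLA excludes planned maintenance,
        those minutes are removed from the measurement window entirely."""
        window = MINUTES_PER_MONTH - (planned_min if exclude_planned else 0)
        downtime = unplanned_min + (0 if exclude_planned else planned_min)
        return 100.0 * (window - downtime) / window

    # 30 minutes of unplanned downtime plus a 4-hour planned maintenance window:
    print(uptime_percent(30, planned_min=240, exclude_planned=True))   # ~99.93%
    print(uptime_percent(30, planned_min=240, exclude_planned=False))  # ~99.38%

The same outage looks very different depending on whether planned windows count against the SLA, which is exactly why the contract language matters.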



It goes without saying that a great deal of planning and investment is required for a service provider to live up to these standards.  The amount of manpower and capital needed to achieve near 100% uptime is significant.  At the end of the day, customers simply expect the provider to meet or exceed these uptime guarantees, so finding a provider with strong financials and a solid track record is paramount.  If you understand how the provider deals with downtime (planned and unplanned), you're arming yourself for a good contract negotiation.

Reliability is the name of the game: customers rely on the colocation or managed hosting provider to deliver a service that is as good as or better than what they could do on their own.  The components measured for uptime are pretty straightforward.

Colocation providers focus on several things:
  • Power:  Definitely the single most important element for a colocation provider.  Providers achieve HA (high availability) by investing in multiple power feeds from the utility, redundant ATS (automatic transfer switches), multiple generators and UPS systems.  (A rough sketch of how this redundancy affects availability follows this list.)

    In addition to these redundancies, you will also hear providers discuss A&B power feeds.  Most providers achieve true A & B power by giving you power drops from different distribution sources.  In some data centers, a Starline busway system makes it easy to provide multiple power drops from multiple sources.  But beware: if it looks like the power is coming from two separate sources, make sure you understand how it is actually sourced.  It's only true A&B power if the "B" side is fed from a different set of UPS systems and distribution.

  • Cooling:  This is the second most important element in the colocation center.  Most, if not all, Tier 3 data centers will have more than one cooling unit (AHU or CRAC) to achieve N+1 redundancy.  Be sure you also understand whether the cooling source is redundant and whether its power runs through the main UPS and generator systems.  If there aren't multiple cooling loops, there is effectively a single point of failure in the cooling system's design.

  • Network:  Any data center worth looking at is outfitted with connections from multiple internet providers; multiple carriers are one hallmark of a Tier 3 facility.  You'll find that some data centers offer more than two, and that's a good thing.  However, it's worth exploring how the routing is handled and whether they use BGP routing policies to ensure that your outbound traffic is sent across the best route, not the cheapest.  In addition, a multi-homed network and redundant distribution are a must.  Be sure to always ask for multiple network drops if you're buying your internet connectivity from the provider.
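
As a back-of-the-envelope sketch of why those redundancy investments matter (the 99% figure below is an assumed per-component availability, not a published number), the availability of independent, parallel components compounds quickly:

    # Rough sketch: availability of N independent, parallel components.
    # The 99% per-component availability is an illustrative assumption.

    def parallel_availability(single, n):
        """Probability that at least one of n independent components is up."""
        return 1.0 - (1.0 - single) ** n

    single = 0.99  # assume each power path is up 99% of the time
    print(parallel_availability(single, 1))  # 0.99   -> a single feed
    print(parallel_availability(single, 2))  # 0.9999 -> true A & B feeds

The catch, as noted above, is that the two feeds only behave independently if the "B" side really does come from separate UPS and distribution gear.
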
Overall, colocation is going to continue to play a significant role in the IT budgets of the future.  More companies are strategically shedding budget items that are not part of their core competency.  I remember discussing this with the CIO of a company here in Denver.  He put it quite frankly when he said that managing generators, UPS systems and networks had nothing to do with why the company existed.  What made them great was making their product, not managing the uptime of a generator.


(Hosting provider uptime will be covered in another article.)

How to Choose a Colocation (Colo) Provider

Evaluating a colocation provider can be a daunting task.  Not only are you planning to physically move most, if not all, of your IT infrastructure, but you also have to choose a service provider that presents the least risk and the most value for your dollar.  This is no trivial task, but here are some starting points that can help you build a sound evaluation plan.

Before getting into the guts of this discussion, it's worth covering what you should consider not doing.  There are two things you can do immediately to make this process easier and less risky:

Do not consider a 3rd party colocation provider:  That is, make sure the companies you put on your list own and operate the physical space your cabinet or cage will go into.  While resellers serve an important role in our market, they should be eliminated from consideration for legal reasons.  Should a 3rd party default on their contract with the hosting facility, the gear you've entrusted to them can become part of a legal battle.  In most, if not all, situations like this, the facility will shut down power and internet connectivity and hold your equipment as collateral until the bill is paid.  The main idea here is to keep as much control as possible over the contract with the facility.

Consider evaluating no more than 3-4 vendors:  Doing this will really cut down on the time spent building a comparison matrix.  If your company uses a committee for evaluation, having more than 3 or 4 vendors will invariably lead to analysis paralysis.  I have worked with several companies that chose to evaluate upwards of 12 different providers.  Their decisions ended up being more political and price-driven than practical and value-driven.  Additionally, the differences between providers, which we'll go into later, are becoming smaller and smaller, which makes the decision harder to make from the comparison matrix alone.

Aside from the above, the evaluation process can be made straightforward and simple.  The key is to find providers that won't waste your time because they lack the capabilities you need.  You will have to physically tour each facility, so keeping it to 3 or 4 providers will save you time, money and a lot of debate.

How to Choose a Colocation (Colo) Provider: 

1.  The levels of redundancy really do matter:  In today's colocation world, the levels of redundancy are easiest to understand using the tier designations the Uptime Institute uses to classify data center redundancy.  Why is this important?  Depending on the level of redundancy you want and are willing to pay for, the tier level of the facility you're evaluating determines the type of SLA (service level agreement) the facility can offer you.

Generally, the designation of Tier 1, 2, 3 or 4 directly correlates to the levels of redundancy for power, cooling and network, with some added components.  I won't go into the details of each tier.  What is important is to understand that the classification directly affects the SLA and capability of the data center.


Tier 1-
  • Common SLA for power, cooling and network availability = less than 99% guaranteed.
  • Single generator, power utility, UPS systems and network.
  • Usually not staffed 24x7x365.
  • Pricing will vary but generally below $30.00/sq ft for space, power and internet connectivity.
Tier 2-
  • Common SLA for power, cooling and network availability = less than 99% guaranteed.
  • May possess redundant components for generator, power utility, UPS systems and network.  Commonly referred to as N+1 redundancy.
  • May be staffed 24x7x365.
  • Pricing will vary but generally at or near $30.00/sq ft for space, power and internet connectivity.
Tier 3-  Typically, colocation is their primary business. 
  • SLAs are generally 99.999% or better.
  • Commonly staffed 24x7x365.
  • Will maintain at least N+1 redundancy, meaning they have more than one generator, UPS and utility power connection, plus multiple internet carriers.
  • Pricing will vary by state, but you can expect to pay $50.00+/sq ft for Space/Power and Access (SPA).  Higher-density configurations will generally be much higher (more on that later).
Tier 4-  Typically, these are hard-to-find data centers.  They are usually housed in hardened facilities and have system + system (often called 2N) redundancy.  For example, if a Tier 3 data center wished to be classified as Tier 4, it would need to take all of its N+1 redundant components and duplicate them.  In other words, their redundancy is redundant.

  • SLAs guarantee 100% availability.
  • They are typically bunkered and may even have Kevlar-lined walls.
  • Most will have onsite armed security personnel.
  • No single point of failure in their infrastructure.
  • Pricing will vary, but expect to pay top dollar for the highest levels of security, redundancy and availability.  $100.00+ per square foot is common.
Today, most professionally run data centers carry a majority of the Tier 3 characteristics.  Some data centers will be deficient in some of the components listed here, but you can expect a Tier 3 facility to have N+1 redundant cooling, power infrastructure and multiple network carriers.  They will also guarantee a 100% uptime SLA and staff themselves 24x7x365.
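
To put those SLA percentages in perspective, here is a quick conversion of the guarantee levels quoted above into allowed downtime per year (illustrative arithmetic only; the percentages are the rough figures used in this article, not formal Uptime Institute definitions):

    # Rough sketch: what common SLA percentages allow in downtime per year.

    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

    for label, pct in [("Tier 1/2 (<99%)", 99.0),
                       ("Tier 3 (99.999%)", 99.999),
                       ("Tier 4 (100%)", 100.0)]:
        allowed = MINUTES_PER_YEAR * (1 - pct / 100.0)
        print(f"{label}: up to {allowed:,.1f} minutes of downtime per year")

    # 99%      -> ~5,256 minutes (roughly 3.7 days)
    # 99.999%  -> ~5.3 minutes
    # 100%     -> 0 minutes (in practice, outage credits rather than zero risk)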

2.  Certification:  In the last 8 years, security has become paramount not only for our country, but also for IT.  Along with that, the digitization of medical records and the prolific use of credit cards have driven government and industry mandates requiring companies to ensure that medical records, social security numbers and credit card numbers cannot be accessed physically.  In response, most auditing firms are requiring their customers to comply with specific criteria for securing data.  The biggest of these, as it pertains to colocation facilities, is the SAS70-Type II certification.

This is an important certification for a colocation provider to possess.  Put simply, if the data center you're looking at doesn't have a SAS70-Type II certification it can share with you, take it off the list.  Essentially, the SAS70-Type II certification covers a set of self-imposed access processes and controls that are audited by a 3rd party every year.  While these rules are self-imposed, unlike PCI rules that are defined by an outside body, the certification is important for the simple fact that you will know the data center you're evaluating takes physical security seriously.

3.  Wattage per Square Foot:  Even though I've put this down as #3 on the list, the amount of wattage per square foot that a data center can support should be #1 or #2 on your list.  Here's why:

As you can attest, the capabilities of a single 1U server have grown exponentially in the last 3 years.  More performance crammed into a smaller form factor generates a lot more heat, and heat is the biggest challenge in any data center.  Since 2005, the computing industry has seen the introduction of multi-core processors and faster disk drives in smaller and smaller form factors.  While this is awesome for physical consolidation, it's created a challenge for most data center providers.  Why?  More compute means more watts consumed, which means more BTUs of heat, and heat is the data center's nemesis.

So, how much power a data center can deliver to any one cabinet has a great bearing on two things: 1) how many devices you can put in that single cabinet, and 2) how many devices your neighbor can put into theirs.

I'll talk more about densification of colocation space in another article.  What is important to consider is that most data centers have a watts-per-square-foot rating.  That number is driven by several factors (a back-of-the-envelope power and cooling sketch follows this list):

  • Raised floor height:  Size matters in this case.  Raised-floor data centers use static pressure to force air up through perforated floor tiles.  The height of the raised floor determines how much CFM (cubic feet per minute) can be pushed through, which ultimately affects how much cool air can be delivered to the data center floor.
  • Cooling tonnage:  CRAC (computer room air conditioning) units have a rated tonnage.  The data center will install a certain type and number of CRAC units, and the aggregate tonnage of those units gives the data center its cooling capacity.
  • Heat extraction units:  This is a newer feature in data center build-outs, and it's very efficient at isolating heat from the general data center.  The idea is that if the data center can extract heat faster than it has to cool it, it saves them money.  However, it's important to understand that with this approach, you may be placed near cabinets or cages that generate more heat than the rest of the data center.  Should something go wrong with the heat extraction unit, you could be in a situation where temperatures rise higher than normal.
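
As promised above, here is a back-of-the-envelope sketch of how cabinet power, heat and watts per square foot relate.  The cabinet load, footprint and facility rating below are assumptions for illustration, not any particular facility's figures:

    # Rough sketch: converting cabinet power draw into heat load and cooling need.
    # All inputs are illustrative assumptions.

    WATTS_TO_BTU_PER_HR = 3.412    # 1 watt of IT load produces ~3.412 BTU/hr of heat
    BTU_PER_HR_PER_TON = 12000     # 1 ton of cooling handles 12,000 BTU/hr

    cabinet_watts = 5000           # assumed draw for one loaded cabinet
    cabinet_footprint_sf = 25      # cabinet plus its share of aisle space
    facility_rating_w_per_sf = 150 # assumed facility watts/sq ft rating

    heat_btu_hr = cabinet_watts * WATTS_TO_BTU_PER_HR
    cooling_tons = heat_btu_hr / BTU_PER_HR_PER_TON
    density_w_per_sf = cabinet_watts / cabinet_footprint_sf

    print(f"Heat load: {heat_btu_hr:,.0f} BTU/hr (~{cooling_tons:.1f} tons of cooling)")
    print(f"Density: {density_w_per_sf:.0f} W/sq ft vs. facility rating of {facility_rating_w_per_sf}")

In this example the cabinet works out to 200 W/sq ft against an assumed 150 W/sq ft rating, so the provider would either spread the gear across more floor space or charge for supplemental cooling.
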
4.  Managed Services:  There's lots of debate out there about what a managed service is.  In the colocation or infrastructure-as-a-service business, a managed service will almost certainly be centered around some part of your infrastructure.  Colocation providers are seeking new and creative ways to provide you more services within the square footage you're consuming in their facility.  You can leverage these services, and any professionally run provider will have some of the following available:
  • Managed firewall, load balancer, IPS, IDS and switches.
  • Managed data protection and offsite replication.
  • Managed storage.
  • Managed servers.
  • Managed database and OS management.
Ask the provider how deep they can go in providing these services.  Keep in mind, most if not all of these services will have to run on the provider's own gear, so if you're a Cisco shop and the provider is a Juniper shop, you may have to compromise on equipment brand.  But in today's world, the brand of equipment is becoming less and less important as long as there is a strong SLA for performance and uptime associated with the service.

In summary, there are some key points to consider as you move down the decision path for colocation.  The process can be straightforward if you keep your vendor list slim and eliminate 3rd party providers.  Considering the tier level, the certifications and the watts per square foot will give you a strong baseline to start from.  Lastly, managed services become the cherry on top if the provider does them well and you negotiate a strong SLA.  I'll go into more detail in subsequent articles and drill deeper into some of the items above.

Choosing colocation is a strategic initiative and one that will pay dividends in the future.  As computing becomes denser and compliance requirements tighten, leveraging the capabilities of a data center will ultimately help you make better use of your staff and manage the tightening budgets of the coming years.