Thursday, December 22, 2011

CRC Errors, Code Violation Errors, Class 3 Discards & Loss of Sync - Why Storage isn't Always to Blame!!!


Storage is often automatically pinpointed as the source of all problems. From System Admins, DBAs and Network guys to Application owners, all are quick to point the finger at SAN Storage at the slightest hint of performance degradation. Not really surprising though, considering it’s the common denominator amongst all the silos. On the receiving end of this barrage of accusation is the SAN Storage team, who are then subjected to hours of troubleshooting only to prove that their Storage wasn’t responsible. On and on this circle goes, until the Storage team are faced with a problem they can’t absolve themselves of, even though they know the Storage is working completely fine. With array-based management tools still severely lacking in their ability to pinpoint and solve storage network related problems, and with server-based tools doing exactly that, i.e. looking at the server, there is little if anything available to prove that the cause of latency is a slow draining device such as a flapping HBA, a damaged cable or a failing SFP. Herein lies the biggest paradox: 99% of the time, when unidentifiable SAN performance problems do occur, they are linked to trivial issues such as a failing SFP. In a 10,000 port environment, the million dollar question is ‘where do you begin to look for such a minuscule needle in such a gargantuan haystack?’



                 To solve this dilemma it’s imperative to know what to look for and to have the right tools to find it, so that managing your SAN storage environment becomes proactive rather than a reactive fire-fighting / troubleshooting circus. So what are some of the metrics and signs to look for when the Storage array, the application team and the servers all report everything as fine, yet you still find yourself embroiled in performance problems?

                   Firstly, to understand the context of these metrics and signs and the make-up of FC transmissions, let’s use the analogy of a conversation: the Frames are the words, the Sequences the sentences, and an Exchange the conversation they are all part of. With that premise it is important to first address the most basic of physical layer problems, namely Code Violation Errors. Code Violation Errors are the consequence of bit errors caused by corruption within a sequence, i.e. any character corruption. A typical cause is a failing HBA whose optics degrade prior to complete failure. I also recently came across Code Violation Errors at one site where several SAN ports had been left enabled after their servers had been decommissioned. Some might ask what the problem is if nothing is connected to them? In fact this scenario was creating millions of Code Violation Errors, causing a CPU overhead on the SAN switch and subsequent degradation. With mission critical applications connected to the same SAN switch, performance problems became rife, and without the identification of the Code Violation Errors this could have led to weeks of troubleshooting with no success.
The build-up of Code Violation Errors becomes even more troublesome as it eventually leads to what is referred to as a Loss of Sync. A Loss of Sync is usually indicative of incompatible speeds between two points, and again this is typical of optic degradation in the SAN infrastructure. For example, if an SFP is failing, its optical signal will degrade and will no longer run at, say, the 4Gbps it’s set to. Case in point: a transmitting device such as an HBA is set at 4Gbps while the receiving end, i.e. the SFP (unbeknownst to the end user), has degraded down to 1Gbps. Severe performance problems will occur as the two points constantly struggle with their incompatible speeds. Hence it’s imperative to be alerted to any Loss of Sync, as ultimately it is also an indication of an imminent Loss of Signal, i.e. when the HBA or SFP is flapping and about to fail. That leads to the nightmare scenario of an unplanned path failure in your SAN storage environment and, worse still, a possible outage if failover cannot occur.
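
To make these physical-layer symptoms actionable, the relevant error counters need to be watched continuously rather than inspected after the fact. Below is a rough Python sketch of that idea, assuming the switch port counters (encoding / code violation errors, loss of sync, loss of signal) have already been exported to a CSV file; the filename and column names are purely illustrative. It simply flags any port whose counters grow between the first and last sample, which is usually the earliest hint of a degrading SFP, HBA or cable.

    import csv

    # Hypothetical CSV export of switch port counters, one row per port per sample:
    # port,sample,enc_errors,loss_of_sync,loss_of_signal
    COUNTERS = ("enc_errors", "loss_of_sync", "loss_of_signal")

    def load_samples(path):
        """Group counter rows by port name."""
        samples = {}
        with open(path) as f:
            for row in csv.DictReader(f):
                samples.setdefault(row["port"], []).append(row)
        return samples

    def flag_degrading_ports(samples):
        """Return (port, counter, delta) for every counter that grew over the window."""
        suspects = []
        for port, rows in samples.items():
            rows.sort(key=lambda r: int(r["sample"]))
            first, last = rows[0], rows[-1]
            for counter in COUNTERS:
                delta = int(last[counter]) - int(first[counter])
                if delta > 0:
                    suspects.append((port, counter, delta))
        return suspects

    if __name__ == "__main__":
        for port, counter, delta in flag_degrading_ports(load_samples("port_errors.csv")):
            print(f"Port {port}: {counter} increased by {delta} - check SFP, HBA and cabling")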

                   One of the biggest culprits, and a sure-fire place to look when resolving performance problems, is what are termed CRC errors. CRC errors usually indicate some kind of physical problem within the FC link and point to code violation errors that have led to corruption inside the FC data frame. Usually caused by a flapping SFP or a very old / bent / damaged cable, once a CRC error is detected by the receiver, the frame is rejected and has to be resent. As an analogy, imagine a newspaper delivery boy who, while cycling to his destination, loses some of the pages of the paper prior to delivery. Upon delivery the receiver would request that the newspaper be redelivered with the missing pages, which means the delivery boy has to cycle back, find the missing pages and bring back the newspaper as a whole. In the context of a CRC error, a frame that should typically take only a few milliseconds to deliver could take up to 60 seconds to be rejected and resent. Such response times can be catastrophic to a mission critical application and its underlying business. By gaining an insight into CRC errors and their root cause, one can immediately pinpoint which bent cable or old SFP is responsible and proactively replace it long before it starts to cause poor application response times or, even worse, a loss to your business.
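
This detect-and-resend behaviour is exactly what the frame’s CRC exists for: the sender appends a 32-bit checksum over the payload and the receiver recomputes it, so even a single flipped bit causes a mismatch and the frame is thrown away. A minimal Python sketch of the principle (using zlib’s CRC-32 purely for illustration; this is not real FC framing):

    import zlib

    def frame_with_crc(payload: bytes) -> bytes:
        # Sender side: append a CRC-32 checksum of the payload
        return payload + zlib.crc32(payload).to_bytes(4, "big")

    def frame_is_valid(frame: bytes) -> bool:
        # Receiver side: recompute the CRC and compare it with the trailing checksum
        payload, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
        return zlib.crc32(payload) == received_crc

    frame = frame_with_crc(b"WRITE block 0x1A2B: payload bytes")
    corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # a single bit flipped in transit

    print(frame_is_valid(frame))      # True  - frame accepted
    print(frame_is_valid(corrupted))  # False - frame rejected and must be resent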

                The other FC SAN gremlin is what is termed a Class 3 discard. Of the various classes of service for data transport defined by the Fibre Channel ANSI standard, the most commonly used is Class 3. Ideal for high throughput, Class 3 is essentially a connectionless datagram service based on frame switching. Class 3’s main advantage comes from not giving an acknowledgement when a frame is rejected or busied by a destination device or the Fabric. The benefits are that it significantly reduces the overhead on the transmitting device and leaves more bandwidth available for transmission. Furthermore, the lack of acknowledgements removes the potential delays between devices caused by round-trips of information transfer. As for data integrity, Class 3 leaves this to higher-level protocols such as TCP, since Fibre Channel itself does not recover corrupted or missing frames. Hence any discovery of a corrupted packet by the higher-level protocol on the receiving device initiates a retransmission of the sequence. All of this sounds great until the non-acknowledgement of dropped frames starts to bring about Class 3’s disadvantage: inevitably a Fabric will become busy with traffic and will consequently discard frames, hence the name Class 3 discards. The receiving device’s higher-level protocol then requests retransmission of the affected sequences, degrading device and fabric throughput.
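
The real cost of a discard is therefore not the single lost frame but the whole sequence the upper-layer protocol has to resend. A back-of-the-envelope Python sketch with purely illustrative numbers shows how quickly a small discard rate inflates the effective load on an already busy fabric:

    # Rough model: every discarded frame forces the upper-layer protocol to
    # retransmit the entire sequence it belonged to. Numbers are illustrative only.
    frames_per_second   = 100_000
    frames_per_sequence = 32
    discard_rate        = 0.001        # 0.1% of frames discarded by a congested fabric

    discarded     = frames_per_second * discard_rate
    retransmitted = discarded * frames_per_sequence          # whole sequences resent
    effective     = frames_per_second + retransmitted

    print(f"Discards per second: {discarded:.0f}")
    print(f"Extra frames retransmitted per second: {retransmitted:.0f}")
    print(f"Effective fabric load: {effective:.0f} frames/s "
          f"({(effective / frames_per_second - 1) * 100:.1f}% overhead)")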

              Another cause of Class 3 discards is zoning conflicts, where a frame has been transmitted but cannot reach its destination, resulting in the fabric initiating a Class 3 discard. This comes from legacy configurations or zoning mistakes, where for example a decommissioned Storage system was not unzoned from a server (or vice versa), leading to frames being continuously discarded and throughput degraded as sequences are retransmitted. The result is performance problems, potential application degradation and automatic finger-pointing at the Storage system for a problem that can’t readily be identified. By resolving the zoning conflict and spreading the SAN load across the right ports, the heavy traffic or zoning issues which cause the Class 3 discards can be quickly removed, bringing immediate performance and throughput improvements. By gaining an insight into the occurrence and number of Class 3 discards, huge performance problems can be remediated before they take hold, and here is yet another reason why the Storage shouldn’t automatically be blamed.
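
Stale zoning of this kind is straightforward to hunt down programmatically once you have two lists: the WWPNs referenced by the active zoning configuration and the WWPNs actually logged in to the fabric. A rough Python sketch, assuming both lists have been exported to plain text files (one WWPN per line; the filenames and format are invented for illustration):

    def load_wwpns(path):
        """Read one WWPN per line, ignoring blank lines and comments."""
        with open(path) as f:
            return {line.strip().lower() for line in f
                    if line.strip() and not line.startswith("#")}

    # Hypothetical exports from the fabric
    zoned     = load_wwpns("active_zone_members.txt")   # WWPNs referenced in zoning
    logged_in = load_wwpns("fabric_logins.txt")         # WWPNs currently logged in

    for wwpn in sorted(zoned - logged_in):
        print(f"{wwpn} is zoned but not logged in - possible decommissioned device")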

           These are just some of the metrics and signs to look for which can ultimately save you from weeks of troubleshooting and guessing. By acknowledging these metrics, identifying when they occur and proactively eliminating them, the SAN storage environment will quickly evolve into a healthy, proactive and optimized one. Furthermore, by eliminating each of these issues you also eliminate their consequent problems, such as application slowdown, poor response times, unplanned outages and long drawn-out troubleshooting exercises that end in finger-pointing fights. Ideally what will occur is a paradigm shift where, instead of application owners complaining to the Storage team, the Storage team proactively identifies problems before they are ever felt. Here lies the key to making the ‘always blaming the Storage’ syndrome a thing of the past.

Tuesday, December 13, 2011

Enterprise Computing: Why Thin Provisioning Is Not The Holy Grail for Utilisation

Thin Provisioning (Dynamic Provisioning, Virtual Provisioning, or whatever you prefer to call it) is being heavily touted as a method of reducing storage costs.  Whilst at the outset it seems to provide some significant storage savings, it isn’t the answer for all our storage ills.
 
What is it?
 
Thin Provisioning (TP) is a way of reducing storage allocations by virtualising the storage LUN.  Only the sectors of the LUN which have been written to are actually placed on physical disk.  This has the benefit of reducing wastage in instances where more storage is provisioned to a host than is actually needed.  Look at the following figure.  It shows five typical 10GB LUNs, allocated from an array.  In a “normal” storage configuration, those LUNs would be allocated to a host and configured with a file system.  Invariably, the file systems will never be run at 100% utilisation (just try it!) as this doesn’t work operationally and also because users typically order more storage than they actually require, for many reasons.  Typically, host volumes can be anywhere from 30-50% utilised and in an environment where the entire LUN is reserved out for the host, this results in 50-70% wastage.
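
The wastage in this “fat” model is easy to quantify. Taking the five 10GB LUNs above and assuming an average of 40% utilisation, a quick Python calculation:

    # Five traditional 10 GB LUNs, fully reserved for their hosts
    lun_count, lun_size_gb = 5, 10
    utilisation = 0.40                      # assumed average host utilisation

    allocated = lun_count * lun_size_gb     # 50 GB reserved on the array
    used      = allocated * utilisation     # 20 GB actually holding data
    wasted    = allocated - used            # 30 GB stranded

    print(f"Allocated: {allocated} GB, used: {used:.0f} GB, "
          f"wasted: {wasted:.0f} GB ({wasted / allocated:.0%} of the allocation)")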
 
Now, contrast this to a Thin Provisioned model.  Instead of dedicating the physical LUNs to a host, they now form a storage pool; only the data which has actually been written is stored onto disk.  This has two benefits; either the storage pool can be allocated smaller than the theoretical capacity of the now virtual LUNs, or more LUNs can be created from the same size storage pool.  Either way, the physical storage can be used much more efficiently and with much less waste.
There are some obvious negatives to the TP model.  It is possible to over-provision LUNs and as data is written to them, exhaust the shared storage pool.  This is Not A Good Thing and clearly requires additional management techniques to ensure this scenario doesn’t happen and sensible standards for layout and design to ensure a rogue host writing lots of data can’t impact other storage users.
 
The next problem with TP in this representation is the apparent concentration of risk and performance of many virtual LUNs to a smaller number of physical devices.  In my example, the five LUNs have been stored on only three physical LUNs.  This may represent a potential performance bottleneck and consequently vendors have catered for this in their implementations of TP.  Rather than there being large chunks of storage provided from fixed volumes, TP is implemented using smaller blocks (or chunks) which are distributed across all disks in the pool.  The third image visualises this method of allocation.
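
Conceptually, a thin LUN is just a sparse map from virtual chunks to physical chunks that only gets populated on first write, with the chunks handed out across all the disks in the pool. The toy Python sketch below illustrates that allocate-on-write idea; the chunk size, pool layout and round-robin placement are simplifications for illustration rather than any vendor’s actual implementation.

    class ThinPool:
        """Toy shared pool: free chunks are handed out round-robin across physical disks."""
        def __init__(self, disks, chunks_per_disk):
            self.free = [(d, c) for c in range(chunks_per_disk) for d in range(disks)]

        def allocate(self):
            return self.free.pop(0)           # next free (disk, chunk) pair

    class ThinLUN:
        """Physical chunks are consumed only when a virtual chunk is first written."""
        def __init__(self, pool, chunk_size):
            self.pool, self.chunk_size = pool, chunk_size
            self.map = {}                     # virtual chunk index -> (disk, physical chunk)

        def write(self, offset, length):
            first = offset // self.chunk_size
            last = (offset + length - 1) // self.chunk_size
            for chunk in range(first, last + 1):
                if chunk not in self.map:     # first touch: allocate from the shared pool
                    self.map[chunk] = self.pool.allocate()

    pool = ThinPool(disks=3, chunks_per_disk=100)
    lun = ThinLUN(pool, chunk_size=1 << 20)   # 1 MB chunks, purely illustrative
    lun.write(offset=0, length=5 << 20)       # a 5 MB write touches five chunks
    print(lun.map)                            # the five chunks are spread over disks 0, 1 and 2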
 
So each vendor’s implementation of TP uses a different block size.  HDS use 42MB on the USP, EMC use 768KB on DMX, IBM allow a variable size from 32KB to 256KB on the SVC and 3Par use blocks of just 16KB.  The reasons for this are many and varied and for legacy hardware are a reflection of the underlying hardware architecture.
Unfortunately, the file systems that are created on thin provisioned LUNs typically don’t have a matching block size structure.  Windows NTFS, for example, will default to a block (cluster) size of only 4KB for large disks unless explicitly overridden by the user.  The mismatch between the TP block size and the file system block size causes a major problem as data is created, amended and deleted over time on these systems.  To understand why, we need to examine how file systems are created on disk.
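
To put a number on that mismatch: with 42MB TP pages and 4KB NTFS clusters, a single freshly written cluster can pull an entire page onto physical disk, and each page can end up shared by thousands of unrelated clusters. A quick illustrative calculation:

    tp_page  = 42 * 1024 * 1024      # 42 MB thin provisioning page (the USP example above)
    fs_block = 4 * 1024              # 4 KB NTFS cluster

    print(tp_page // fs_block, "filesystem clusters fit in one TP page")        # 10752
    print(f"{fs_block / tp_page:.4%} of the page is occupied after one 4 KB write")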
 
The fourth graphic shows a snapshot from one of the logical drives in my desktop PC.  This volume hasn’t been defragmented for nearly 6 months and consequently many of the files are fragmented and not stored on disk in contiguous blocks.  Fragmentation is seen as a problem for physical disks as the head needs to move about frequently to retrieve fragmented files and that adds a delay to the read and write times to and from the device.  In a SAN environment, fragmentation is less of an issue as the data is typically read and written through cache, negating most of the physical issues of moving disk heads.  However fragmentation and thin provisioning don’t get along very well and here’s why.
 
The Problem of Fragmentation and TP
 
When files are first created on disk, they will occupy contiguous sections of space.  If this data resides on TP LUNs, then a new block will be assigned to a virtual TP LUN as soon as a single filesystem block is created.  For a Windows system using 4KB blocks on USP storage, this means 42MB each time.  This isn’t a problem as the file continues to be expanded; however, it is unlikely the file will end neatly on a 42MB boundary.  As more files are created and deleted, each 42MB block will become partially populated with 4KB filesystem blocks, leaving “holes” in the filesystem which represent unused storage.  Over time, a TP LUN will experience storage utilisation “creep” as new blocks are “touched” and therefore written onto physical disk.  Even if data is deleted from an entire 42MB chunk, it won’t be released by the array, as data is usually “logically deleted” by the operating system.  De-fragmenting a volume makes the utilisation creep issue worse; it writes to unused space in order to consolidate files.  Once written, these new areas of physical disk space are never reclaimed.
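
The creep effect is easy to reproduce with a toy model. The sketch below (assumed sizes, a random write/delete workload, and no claim to match any real array’s behaviour) scatters 4KB writes and deletions across a small virtual volume carved into 42MB pages: the logical data stays modest, but the number of “touched” pages, and therefore the physical consumption, only ever grows because the array never sees the deletions.

    import random

    PAGE  = 42 * 1024 * 1024                  # 42 MB TP page
    BLOCK = 4 * 1024                          # 4 KB filesystem block
    VOLUME_BLOCKS = (100 * PAGE) // BLOCK     # a 100-page (4.2 GB) virtual volume

    live_blocks   = set()                     # blocks the filesystem currently holds data in
    touched_pages = set()                     # pages the array has had to back with real disk

    random.seed(1)
    for _ in range(200_000):                  # random churn of creates and deletes
        block = random.randrange(VOLUME_BLOCKS)
        if block in live_blocks and random.random() < 0.5:
            live_blocks.discard(block)        # deletion: invisible to the array
        else:
            live_blocks.add(block)            # write: touches (allocates) a whole page
            touched_pages.add(block * BLOCK // PAGE)

    logical_mb  = len(live_blocks) * BLOCK / 2**20
    physical_mb = len(touched_pages) * PAGE / 2**20
    print(f"Logical data: {logical_mb:.0f} MB, physical consumption: {physical_mb:.0f} MB")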
 
So what’s the solution?
 
Fixing the TP Problem
  
Making TP useful requires a feature that is already available in the USP arrays as Zero Page Reclaim and 3Par arrays as Thin Built In.  When an entire “empty” TP chunk is detected, it is automatically released by the system (in HDS’s case at the touch of a button).  So, for example as fat LUNs are migrated to thin LUNs, unused space can be released.
This feature doesn’t help however with traditional file systems that don’t overwrite deleted data with binary zeros.  I’d suggest two possibilities to cure this problem:
  • Secure Defrag.  As defragmentation products re-allocate blocks, they should write binary zeros to the released space.  Although this is time consuming, it would ensure deleted space could be reclaimed by the array.
  • Freespace Consolidation. File system free space is usually tracked by maintaining a chain of freespace blocks.  Some defragmentation tools can consolidate this chain.  It would be an easy fix to simply write binary zeros over each block as it is consolidated; a rough sketch of this zero-fill idea follows below.
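
A similar effect can be approximated from the host side today by zero-filling free space (the approach taken by tools such as Microsoft’s SDelete): fill the free space with a temporary file of binary zeros, then delete it, so the previously “logically deleted” blocks become all-zero pages that a feature like Zero Page Reclaim can release. A rough, hypothetical Python sketch of the idea; run something like this with care, as it deliberately consumes almost all free space while it works:

    import os, shutil

    def zero_fill_free_space(mount_point, keep_free_bytes=1 << 30, chunk=1 << 20):
        """Write a temporary file of binary zeros over most of the free space,
        then delete it, leaving previously used blocks zeroed and reclaimable."""
        scratch = os.path.join(mount_point, "zero_fill.tmp")
        zeros = b"\x00" * chunk
        try:
            with open(scratch, "wb") as f:
                while shutil.disk_usage(mount_point).free > keep_free_bytes:
                    f.write(zeros)
                    f.flush()
                    os.fsync(f.fileno())
        finally:
            if os.path.exists(scratch):
                os.remove(scratch)

    # Example (hypothetical mount point): zero_fill_free_space("/mnt/thin_volume")
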
One alternative solution from Symantec is to use their Volume Manager software, which is now “Thin Aware”.  I’m slightly skeptical about this as a solution as it places requirements on the operating system to deploy software or patches just to make storage operate efficiently.  It takes me back to Iceberg and IXFP….
Summary
So in summary, Thin Provisioning can be a Good Thing, however over time, it will lose its shine.  We need fixes that allow deleted blocks of data to be consolidated and returned to the storage array for re-use.  Then TP will deliver on what it promises.
Footnote
Incidentally, I’m surprised HDS haven’t made more noise about Zero Page Reclaim.  It’s a TP feature that to my knowledge EMC haven’t got on DMX or V-Max.

Source: http://thestoragearchitect.com/

Thin provisioning

Thin provisioning Introduction

Thin provisioning, sometimes called "over-subscription", is an important emerging storage technology. This article defines thin provisioning, describes how it works, identifies some challenges for the technology, and suggests where it will be most useful.

If applications run out of storage space, they crash. Therefore, storage administrators commonly install more storage capacity than required. This practice provides 'headroom' for future growth and reduces the risk of applications failing because they run out of space. However, it requires the installation of more physical disk capacity than is actually used, creating waste.
Thin provisioning software allows higher storage utilization by eliminating the need to install physical disk capacity that goes unused. Figure 1 shows how storage administrators typically allocate more storage than is needed for applications -- planning ahead for growth and ensuring applications won't crash because they run out of disk space. In Figure 1, volume A holds only 100 GB of physical data but has been allocated much more than that based on growth projections (500 GB, in this example). The unused storage allocated to the volume cannot be used by other applications. In many cases the full 500 GB is never used and is essentially wasted. This is sometimes referred to as "stranded storage."
In most implementations, thin provisioning provides storage to applications from a common pool of storage on an as required basis. Thin provisioning works in combination with storage virtualization, which is essentially a prerequisite to effectively utilize the technology. With thin provisioning, a storage administrator allocates logical storage to an application as usual, but the system releases physical capacity only when it is required. When utilization of that storage approaches a predetermined threshold (e.g. 90%), the array automatically provides capacity from a virtual storage pool which expands a volume without involving the storage administrator. The volume can be over allocated as usual, so the application thinks it has plenty of storage, but now there is no stranded storage. Thin provisioning is on-demand storage that essentially eliminates allocated but unused capacity.
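
The mechanics amount to a simple control loop: compare each thin volume's consumed capacity with its current allocation, and extend it from the shared pool (or at least raise an alert) once the threshold is crossed. A minimal Python sketch of that logic, with invented volume figures purely for illustration:

    THRESHOLD    = 0.90       # act when a volume reaches 90% of its current allocation
    EXTEND_BY_GB = 50         # grow in fixed increments from the shared pool

    # Hypothetical state; a real array tracks this internally
    pool_free_gb = 400
    volumes = {"vol_a": {"allocated_gb": 500, "used_gb": 455},
               "vol_b": {"allocated_gb": 200, "used_gb": 60}}

    for name, vol in volumes.items():
        if vol["used_gb"] / vol["allocated_gb"] >= THRESHOLD:
            grow = min(EXTEND_BY_GB, pool_free_gb)
            vol["allocated_gb"] += grow
            pool_free_gb -= grow
            print(f"{name}: crossed {THRESHOLD:.0%} utilisation, extended by {grow} GB "
                  f"(pool now has {pool_free_gb} GB free)")
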
There are some challenges with thin provisioning technology, and some areas where it is not currently recommended:
  • The data that is deleted from a volume needs to be reclaimed, which adds to storage controller overhead and increases cost.
  • File systems (e.g. Microsoft NTFS) that write to previously unused blocks instead of reusing released blocks cause volumes to expand to their maximum allocated size before reusing storage. This negates the benefits of thin provisioning.
  • Applications that spread metadata across the entire volume will negate the advantages of thin provisioning.
  • Applications that expect the data to be contiguous, and/or optimize I/O performance around that assumption are not good candidates for thin provisioning.
  • If a host determines that there is sufficient available space, it may allocate it to an application, and the application may use it. This space is virtual, however, and if the array can't provision real new storage fast enough, the application will fail. High-performance controllers and good monitoring of over-provisioned storage will be required to avoid reduced availability (a rough sketch of such monitoring follows this list).
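
As mentioned in the last point above, monitoring over-provisioning largely comes down to tracking the over-subscription ratio (total logical capacity promised to hosts versus physical capacity in the pool) and how quickly the pool is actually filling. A small illustrative Python sketch with invented figures:

    # Invented figures, purely for illustration
    physical_pool_tb     = 100
    logical_allocated_tb = 260        # sum of all thin volumes promised to hosts
    physical_used_tb     = 78
    daily_growth_tb      = 0.9        # observed average growth of real data

    oversubscription = logical_allocated_tb / physical_pool_tb
    days_of_headroom = (physical_pool_tb - physical_used_tb) / daily_growth_tb

    print(f"Over-subscription ratio: {oversubscription:.1f}:1")
    print(f"Pool is {physical_used_tb / physical_pool_tb:.0%} full, "
          f"roughly {days_of_headroom:.0f} days of headroom at the current growth rate")
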
As thin provisioning technology matures, applications and file systems will be built and modified to avoid these kinds of problems. The economic justification for thin provisioning is simple: it makes storage allocation automatic, which significantly reduces the storage administrators' work, and it can reduce the amount of storage required to service applications. It also reduces the number of spinning disk drives required and therefore will result in substantial reductions in energy consumption.
Action Item: Thin provisioning can provide some major advantages in increasing overall storage utilization and should be seriously considered when virtualizing a data center. However, users should be aware of the caveats and should examine the storage requirements and management of their applications to identify any that are poor candidates for this approach.