Wednesday, January 19, 2011

EMC VPLEX architecture

Here you will find a good overview of the architecture of the EMC VPLEX product.

How to change a Solaris zone netmask?

It is not recommended to edit the /etc/zones/<zonename>.xml file directly; use zonecfg instead:
zonecfg -z email-zone
zonecfg:email-zone> remove net address=174.1.130.132
zonecfg:email-zone> add net
zonecfg:email-zone:net> set address=174.1.130.232/24
zonecfg:email-zone:net> set physical=bge0
zonecfg:email-zone:net> end
zonecfg:email-zone> commit
zonecfg:email-zone> exit

Use the address/prefix format for the new address to configure a specific subnet mask, e.g. 174.1.130.232/24.
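
A quick way to double-check the result from the global zone (a minimal sketch; the zone needs a reboot for the new address to take effect):

# zonecfg -z email-zone info net
# zoneadm -z email-zone reboot
# zlogin email-zone ifconfig -a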

Storage administration terminology

There is nothing more frustrating than trying to learn a new technology and not being able to keep up with basic instruction. This is partially representative of my own journey into storage, as historically I’ve focused on servers, virtualization, and operating systems. Storage administration is a peculiar segment of IT, and there are a number of terms and acronyms that go along with it. Here are some of the terms that I have taken in during my journey:
  1. Thin provisioning:
Disks are expensive, and many administrators make the case to provision volumes at any level as thin provisioned. A thin-provisioned volume presents its full size to the consumer, but space is only counted as used when data is actually written rather than when it is allocated.
  2. Hot spots:
A hot spot is an area of disk that receives a large amount of I/O from the storage consumers. This can be due to a number of logical unit numbers (LUNs) residing on a single disk or simply a very busy single LUN. The objective of a storage administrator is to mitigate the hot spots across the available storage.
  3. Short stroking:
This is somewhat of a trick play in storage administration, in that a drive can be made to perform better than in a typical configuration. Short stroking uses only a very small area of each hard drive in a large array, so the drive head has only a short distance to traverse. While this increases the performance of the drive, there is usually a lot of wasted space.
  4. Zero page reclaim (ZPR):
In thin-provisioning environments, this is where the storage controller harvests zeroed blocks and returns them to the master pool of available storage. Depending on the architecture and the storage products in use, this may be done exclusively by the storage processor or offered as an enhancement in newer products. One example is VMware's vSphere 4.1, which now offers support for vSphere APIs for Array Integration (VAAI) with selected storage products. This effectively lets the storage processor handle the zeroing or bulk-zero I/O operations instead of the vSphere host, as an optimization for disk use and storage network traffic.
  5. Wide striping:
This practice involves having a higher number of drives in use for a LUN to achieve greater throughput. Basically, 12 drives in an array providing a LUN will provide better aggregate throughput than two or three of those same drives.
  6. Cheap and deep:
This is typically used to describe storage that is very large in terms of GB or TB, yet is slow and inexpensive. In today’s data centers, that gravitates toward SATA drives that are 1 TB, 2 TB, or larger.
  7. Rotational storage:
This term refers to traditional hard drives with moving platters and a head that seeks across regions of the disk, also known more casually as "spinning rust". The alternative is solid state (or enterprise flash) storage, which has no moving parts.
  8. Storage tiering:
Whether automated or driven by the storage administrator, this is placing workloads on disk resources that perform to their requirements. In most environments there are two storage tiers, SAS and SATA drives, with the SAS drives being the higher-performance (and higher-priced) option.
  9. Slow-spin or no-spin:
Denotes a tier of storage that is very slow, such as 5400 or 7200 RPM drives, or even in a powered-down state. This can include a tape storage solution.
Storage administration is filled with a number of terms that denote how disk resources are managed, provisioned and consumed. Do you have some storage jargon that you use in your daily administration? Share your terms below.

Tuesday, January 4, 2011

Comparison of SVM vs. VxVM

Each characteristic below is followed first by the Solstice DiskSuite (SDS) behavior and then by the Veritas Volume Manager (VxVM) behavior.

Availability
SDS: Free with a server license, pay for workstation. Sun seems to have an on-again, off-again relationship about whether future development will continue with this product, but it is currently available for Solaris 8 and will continue to be supported in that capacity. The current word is that future development is on again; salt liberally. (Just to further confuse things, SDS is free for Solaris 8 up to 8 CPUs, even for commercial use.)
VxVM: Available from Sun or directly from Veritas (pricing may differ considerably); also excellent educational pricing. Free with a storage array or A5000 (but striping cannot be used outside the array device).
Installation
SDS: Relatively easy. You must do special things to achieve rootdisk, swap, and other disk mirroring in exactly the right order.
VxVM: Easy. Follow the on-screen prompts and let it do its reboots.
Upgrading
SDS: Easy: remove patches, remove packages, add new packages.
VxVM: Slightly more complex, but well documented. There are several ways to do it.
Replacing a failed root mirror
SDS: Very easy. Replace the disk and resynchronize.
VxVM: Very easy. Replace the disk and resynchronize.
Replacing a failed primary root disk
SDS: Relatively easy. Boot off the mirror, replace the disk, resync, boot off the primary.
VxVM: Easy to slightly complex depending on setup. Well documented; 11 steps or fewer.
Replacing a failed data disk in a redundant (mirrored or RAID-5) volume
SDS: Trivial.
VxVM: Trivial.
Extensibility / number of volumes
SDS: Traditionally relatively easy, but EXTREMELY limited by the use of the hard partition table on disk; the number of total volumes on a typical system is very limited because of this. If you have a lot of disks, you can still create a lot of metadevices. The default is 256 max, but this can be increased by setting nmd=XXXX in /kernel/drv/md.conf and then rebooting. Schemes for managing metadevice naming for large numbers of devices are available, but clunky and occasionally contrived. NOTE: SDS 4.2.1+ (available in Solaris 7) removes the reliance upon the disk VTOC for making metadevices through 'soft partitions'.
VxVM: Trivial. No limitations will be encountered by most people; the number of volumes is potentially limitless.
Moving a volume
SDS: Difficult unless special planning and precautions have been taken with laying out the proper partitions and disk labels beforehand. Somewhat hidden by the GUI.
VxVM: Trivial. On redundant volumes it can be done on the fly.
Growing a volume
SDS: The volume can be extended in two different ways: it can be concatenated with another region of space someplace else, or, if there is contiguous space following ALL of the partitions of the original volume, the stripe can be extended. Using concatenation you could grow a 4-disk stripe by 2 additional disks (e.g. a 4-disk stripe concatenated with a 2-disk stripe).
VxVM: The volume can be extended in two different ways: the columns of the stripe can be extended for RAID-0/5, simple single-disk volumes can be grown directly, and in VxVM > 3.0 a volume can be re-laid out (the number of columns in a RAID-5 stripe can be reconfigured on the fly!). Contiguous space is not required. In VxVM < 3.0, if you are increasing the size of a stripe, you must have space on as many disks as the original number of disks in the stripe: you can't 'grow' a 4-disk stripe by adding two more disks, but you could by adding four. Extremely flexible.
Shrinking a volume (only possible with a VxFS filesystem!)
SDS: Difficult. You must adjust all disk or soft partitions manually.
VxVM: Trivial. vxresize can shrink the filesystem and volume in one command.
Relayout of a volume (e.g. changing a 4-disk RAID-5 volume to a 5-disk volume)
SDS: Requires a dump/restore of the data.
VxVM: Available on the fly for VxVM > 3.0.
Logging
SDS: A metatrans device may be used, which provides a log-based addition on top of a UFS filesystem. This transaction log, if used, should be mirrored! (Loss of the log results in a filesystem that may be corrupted even beyond fsck repair.) Using a UFS+ logging filesystem instead of a trans device is a better alternative; UFS+ logging is available in Solaris 7 and above.
VxVM: VxVM has RAID-5 logs and mirror/DRL logs. Logging, if used, need not be mirrored, and the volume can continue operating if the log fails. Having one is highly recommended for crash recovery. Logs are infinitesimally small, typically one disk cylinder or so. The SDS logs are really more equivalent to a VxFS log at the filesystem level, but it is worth mentioning the additional capabilities of VxVM in this regard. UFS+ with logging can also be used on a VxVM volume. There are many kinds of purpose-specific logs for things like fast mirror resync, volume replication, database logging, etc.
Performance
Your mileage may vary. SDS seems to excel at simple RAID-0 striping, but is only marginally faster than VxVM, and VxVM seems to gain ground back when using large interleaves. For best results, benchmark YOUR data with YOUR app and pay very close attention to your data size and your stripe unit/interleave size. RAID-5 on VxVM is almost always faster by 20-30%.
Notifications (see also)
SDS: SNMP traps are used for notification; you must have something set up to receive them. Notifications are limited in scope.
VxVM: Uses email to notify you when a volume is being moved because of bad blocks, using hot relocation or sparing. The notification is very good.
Sparing
SDS: Hot spare disks may be designated for a diskset, but this must be done at the slice level.
VxVM: Hot spare disks may be designated for a diskgroup, or extra space on any disk can be used for dynamic hot relocation without the need to reserve a spare.
Terminology
SDS diskset = VxVM diskgroup; SDS metadevice = VxVM volume; SDS trans device ~ VxVM log. VxVM has subdisks, which are units of data (e.g. a column of a stripe) that have no SDS equivalent. VxVM plexes are groupings of subdisks (e.g. into a stripe) that have no real SDS equivalent. VxVM volumes are groupings of plexes (e.g. the data plex and a log plex, or two plexes for a 0+1 volume).
GUI
Most people prefer the VxVM GUI, though there are a few who prefer the (now 4-year-old) SDS GUI. SDS has been renamed SVM in Solaris 9 and the GUI is supposedly much improved. VxVM has gone through 3-4 GUI incarnations. Disclaimer: I *never* use the GUI.
Command line usage
SDS: metainit creates volumes, metareplace and metaclear replace and delete them, metadb manages the state databases, etc. (A small side-by-side sketch follows this table.)
VxVM: vxassist is used for creation of all volume types; vxsd, vxvol, and vxplex operate on the corresponding VxVM objects (see terminology above). Generally, there are many more vx-specific commands, but normal usage rarely requires 20% of these except for advanced configurations (special initializations, using alternate pathing, etc.).
Device database configuration copies
SDS: Kept in special, replicated partitions that you must set up on disk and configure via metadb. /etc/opt/SUNWmd and /etc/system contain the boot/metadb information and the description of the volumes; lose these and you have big problems. NOTE: in Solaris 9 SVM, configuration copies are now kept on the metadisks themselves with the data, like VxVM.
VxVM: Kept in the private region on each disk. Disks can move about and the machine can be reinstalled without having to worry about losing data in volumes.
Typical usage
SDS: Simple mirroring of the root disk, simple striping of disks where the situation is relatively stagnant (e.g. just a bunch of disks with RAID-0 and no immediate scaling or mobility concerns). Scales well in the size of a small number of volumes, but poorly to a large number of smaller volumes.
VxVM: Enterprise-ready. Data mobility, scalability, and configuration are all extensively addressed. Replacing a failed encapsulated rootdisk is more complicated than it needs to be; see Sun's best-practices paper for a better way. Other alternatives exist.
Best features
SDS: Simple/simplistic: root/swap mirroring and simple striping are a no-brainer, free or nearly so. Easier to fix by hand (without immediate support) when something goes fubar (VxVM is much more complex to understand under the hood).
VxVM: Extensible, good error notifications, extremely configurable, relayout on the fly with VxVM > 3.0, nice integration with VxFS, best scalability. Excellent educational pricing.
Worst features
SDS: Configuration syntax (meta*); configuration information stored on the host system (< Solaris 9). The metadb/slices scheme, a remnant from SunOS 4 days, needs to be completely redone; naming is inflexible and limited. The hard limit on metadevices has kernel-hack workarounds, but is still very limiting. Required mirroring of trans logs is inconvenient, but mitigated by using native UFS+ with logging in Solaris 7 and above. Lack of drive-level hot sparing (see Sparing) is extremely inconvenient.
VxVM: Expensive for enterprises and big servers; root mirroring and recovery from primary rootdisk failure for an encapsulated rootdisk are too complex, though well documented (should be fixed in VxVM 4.0); somewhat steep learning curve for advanced usage. Recovery from administrative SNAFUs (involving restore and single-user mode) on a mirrored rootdisk can be troublesome.
Tips
SDS: Keep backups of your configuration in case of corruption. Regular use of metastat, metastat -p, and prtvtoc can help.
VxVM: Regular use of vxprint -ht is useful for disaster recovery. Several disaster-recovery scripts are also available.
Using VxVM for data and SDS for root mirroring
Many people do this, and there are tradeoffs. On the one hand, you have added simplicity in the management of your rootdisks by not having to deal with VxVM encapsulation, which can ease recovery and upgrades. On the other hand, you now have the added complexity of having to maintain a separate rootdg volume someplace else, or use a simple slice (which, by the way, neither Sun nor Veritas will support if there are problems). You also have the added complexity of managing two completely separate storage/volume management products and their associated nuances and patches. In the end it boils down to preference; there is no right or wrong answer here, though some will say otherwise. ;) Veritas VxVM 4.0 removes the requirement for rootdg.
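
To make the command-line difference above concrete, here is a small side-by-side sketch of creating a two-way mirror with each product. The device names (c0t0d0s4, c0t1d0s4), metadevice names, diskgroup, volume name, and size are made-up examples, not taken from the comparison above:

## SDS/SVM: build two one-way submirrors and attach them to a mirror
# metainit d11 1 1 c0t0d0s4
# metainit d12 1 1 c0t1d0s4
# metainit d10 -m d11
# metattach d10 d12

## VxVM: a single vxassist call builds the equivalent mirrored volume
# vxassist -g datadg make datavol 10g layout=mirror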

Red Hat Linux Tasks

Setup telnet server

# rpm -ivh xinetd.????                  insert latest revision
# rpm -ivh telnet-server.????           insert latest revision

edit the telnet config file
# cd /etc/xinetd.d
# vi telnet                             change disable option to "no"

restart the xinetd service
# service xinetd restart
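
For reference, a typical /etc/xinetd.d/telnet entry looks roughly like the following (field values vary by release; the key change is disable = no):

service telnet
{
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.telnetd
        log_on_failure  += USERID
        disable         = no
}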
Setup Virtual IP Address

# cd /etc/sysconfig/network-scripts
# cp ifcfg-eth0 ifcfg-eth0:1
Edit the following options; the other options copied from ifcfg-eth0 should be fine as they are (a fuller example of the file appears below).
# vi ifcfg-eth0:1
 
DEVICE=eth0:1
IPADDR=192.168.0.6
Restart the network services, remember this will restart all network interfaces
# service network restart
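For reference, a fuller ifcfg-eth0:1 usually also carries a netmask and a boot flag; the NETMASK and ONBOOT values below are assumptions for a typical /24 network, so adjust them to your subnet:

DEVICE=eth0:1
IPADDR=192.168.0.6
NETMASK=255.255.255.0
ONBOOT=yes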
 
Allow root access

# cd /etc
# vi securetty
Add the following lines at the bottom of the file; this allows 5 sessions, but you can add more.
pts/1
pts/2
pts/3
pts/4
pts/5
 
Setup NTP

# cd /etc/ntp

edit the ntpservers file and add your ntpservers
clock.redhat.com
clock2.redhat.com
Start the ntp service and check ntp
# service ntpd start
# ntpq -p
     remote           refid           st t  when poll reach    delay   offset  jitter
==============================================================================
*mail01.tjgroup.  192.43.244.18        2 u    32   64    37   37.598  -465.66   2.783
 ns1.pulsation.f  194.2.0.58           3 u    23   64    37   28.774  -478.17   0.862
+enigma.wiredgoa  204.123.2.5          2 u    30   64    37  161.413  -475.88   1.307
 LOCAL(0)         LOCAL(0)            10 l    29   64    37    0.000    0.000   0.004
Enable the ntpd service at boot
# chkconfig ntpd --list
ntpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off

# chkconfig --level 2345 ntpd on
# chkconfig ntpd --list
ntpd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
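
If the clock is far off when ntpd starts, it may refuse to step the time, so a common pattern is to sync once with ntpdate (using one of the servers configured above) before starting the daemon; a minimal sketch:

# service ntpd stop
# ntpdate clock.redhat.com
# service ntpd start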

Solaris Zones Basics-2

Zone States
Configured: Configuration has been completed and storage has been committed. Additional configuration is still required.
Incomplete: The zone is in this state while it is being installed or uninstalled.
Installed: The zone has a confirmed configuration (zoneadm is used to verify the configuration) and Solaris packages have been installed; even though it has been installed, it still has no virtual platform associated with it.
Ready (active): The zone's virtual platform is established. The kernel creates the zsched process, the network interfaces are plumbed, and filesystems are mounted. The system also assigns a zone ID at this state, but no processes are associated with the zone yet.
Running (active): A zone enters this state when the first user process is created. This is the normal state for an operational zone.
Shutting down + Down (active): Normal states while a zone is being shut down.

Files and Directories
Zone config files: /etc/zones
Zone index file: /etc/zones/index

Note: used by /lib/svc/method/svc-zones to start and stop zones

Cheat sheet
Create a zone: zonecfg -z <zone>
(see Creating a zone below for more details)
Delete a zone from the global system:
## halt the zone first, then uninstall it
zoneadm -z <zone> halt
zoneadm -z <zone> uninstall

## now you can delete it
zonecfg -z <zone> delete -F

Display a zone's current configuration: zonecfg -z <zone> info
Display the zone name: zonename
Create a zone creation file: zonecfg -z <zone> export
  
Verify a zone: zoneadm -z <zone> verify
Install a zone: zoneadm -z <zone> install
Ready a zone: zoneadm -z <zone> ready
Boot a zone: zoneadm -z <zone> boot
Reboot a zone: zoneadm -z <zone> reboot
Halt a zone: zoneadm -z <zone> halt
Uninstall a zone: zoneadm -z <zone> uninstall -F
View zones: zoneadm list -cv
  
Log in to a zone: zlogin <zone>
Log in to a zone's console: zlogin -C <zone> (use ~. to exit)
Log in to a zone in safe mode (recovery): zlogin -S <zone>
  
Add/remove a package (global zone): # pkgadd -G -d . <package>
(if the -G option is missing, the package will be added to all zones)
Add/remove a package (non-global zone): # pkgadd -Z -d . <package>
(if the -Z option is missing, the package will be added to all zones)
Query packages in all non-global zones: # pkginfo -Z
Query packages in a specified zone: # pkginfo -z <zone>
  
List processes in a zone: # ps -z <zone>
List the IPCs in a zone: # ipcs -z <zone>
Process grep in a zone: # pgrep -z <zone>
List the ptree in a zone: # ptree -z <zone>
Display all filesystems: # df -Zk
Display zone process information: # prstat -Z or # prstat -z <zone>
Note:
-Z reports information about processes and zones
-z reports information about a particular zone

Solaris Zones Basics-1

There are two types of zones: global and non-global. The global zone is the server itself and is used for system-wide configuration and control; there can be only one global zone per system. A maximum of 8192 non-global zones can exist on a system, and all non-global zones are isolated from each other.
There are two types of non-global zones: sparse root zones and whole root zones.

whole root zone
Solaris packages are copied to the zone's private file system; disk space usage is much greater than with a sparse zone.
sparse zone
You determine how much of the global zone's file system is inherited by the zone; sparse zones use loopback file systems from the global zone.
Use the inherit-pkg-dir resource to specify which directories to inherit (a minimal zonecfg sketch follows below).
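
As a rough sketch (the zone name wholezone and its path are made-up examples), a whole root zone can be configured by starting from a blank template with create -b so that nothing is inherited, whereas a plain create gives the sparse defaults (/lib, /platform, /sbin, /usr) shown in the examples further down:

# zonecfg -z wholezone
zonecfg:wholezone> create -b              ## blank template: no inherit-pkg-dir entries, whole root
zonecfg:wholezone> set zonepath=/zones/wholezone
zonecfg:wholezone> commit
zonecfg:wholezone> exit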

Zone States
Configured
Configuration has been completed and storage has been committed. Additional configuration is still required.
Incomplete
Zone is in this state when it is being installed or uninstalled.
Installed
The zone has a confirmed configuration (zoneadm is used to verify the configuration) and Solaris packages have been installed; even though it has been installed, it still has no virtual platform associated with it.
Ready (active)
Zone's virtual platform is established. The kernel creates the zsched process, the network interfaces are plumbed and filesystems mounted. The system also assigns a zone ID at this state, but no processes are associated with this zone.
Running (active)
A zone enters this state when the first user process is created. This is the normal state for an operational zone.
Shutting down + Down (active)
Normal state when a zone is being shutdown.
Zone Daemons
zoneadmd
Each zone has a zoneadmd daemon associated with it, which carries out the following actions:
allocates the zone ID and starts the zsched process
sets system-wide resource controls
prepares the zone's devices, if any are specified in the zone configuration
plumbs the virtual network interface
mounts any loopback or conventional filesystems
zsched
The job of zsched is to keep track of kernel threads running within the zone.
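Each running zone should show its own zsched entry; a quick way to check from the global zone (assuming Solaris 10's ps supports the -Z zone column, as it normally does):

# ps -efZ | grep zsched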
List zone name
# zonename
List all zones
All the configured zones and their status will be listed.
# zoneadm list -cv
  ID  NAME       STATUS    PATH
   0  global     running   /
   3  testzone   running   /zones/testzone
Creating a zone
When creating a zone, the zone name must be unique, no longer than 64 characters, case-sensitive, and must begin with an alphanumeric character. It can include underscores (_), hyphens (-), and periods (.). The names global and SUNW are reserved and cannot be used.
# zonecfg -z testzone
testzone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:testzone> create
zonecfg:testzone> set zonepath=/zones/testzone
zonecfg:testzone> set autoboot=true
zonecfg:testzone> info
zonepath: /zones/testzone
autoboot: true
pool:
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
zonecfg:testzone> verify
zonecfg:testzone> commit
zonecfg:testzone> ^D
The zone has now been created in the configured state. Ignore the message at the top of the session; it just reports that no zone called testzone existed yet.
# zoneadm list -cv
  ID  NAME       STATUS       PATH
   0  global     running      /
   -  testzone   configured   /zones/testzone
/zones can be a filesystem or a directory. Although the zone has been created, it does not have any resources yet, i.e. no IP address.
Install the zone
Copy the necessary files from the global zone and populate the product database for the zone. While the zone is being installed, its state changes to incomplete.
# zoneadm -z testzone install
# zoneadm list -cv
  ID  NAME       STATUS       PATH
   0  global     running      /
   1  testzone   incomplete   /zones/testzone
Once the zone is installed, the state changes again to installed.
# zoneadm list -cv
  ID  NAME       STATUS      PATH
   0  global     running     /
   1  testzone   installed   /zones/testzone
Ready a zone
When the zone is in the ready state it is associated with a virtual platform, network interfaces are plumbed, and filesystems are mounted. There is no "ok>" prompt in a zone.
# zoneadm -z testzone ready
# zoneadm list -cv
  ID  NAME       STATUS    PATH
   0  global     running   /
   1  testzone   ready     /zones/testzone
Booting a zone
When you boot a zone its state changes to running. Booting a zone automatically readies it, so you do not need to ready the zone beforehand.
# zoneadm -z testzone boot
# zlogin -C testzone
[Connected to zone 'testzone' console]
[NOTICE: Zone booting up]
SunOS Release 5.10 Version Generic 64-bit
Copyright 1983-2005 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: ukstsg10
ukstsg10 console login:

# zoneadm list -cv
  ID  NAME       STATUS    PATH
   0  global     running   /
   4  testzone   running   /export/home/testzone
Logging in to a zone's console
You can log in to the zone's console; use '~.' to exit. All console messages are reported here as on a normal console; the only difference is that there is no "ok>" prompt.
The first time a zone is booted you have to finish off the configuration, which asks you to set the language, terminal type, etc.
# zlogin -C testzone
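The first-boot questions can also be pre-answered by placing a sysidcfg file at <zonepath>/root/etc/sysidcfg before booting the zone the first time. A rough sketch with placeholder values (the exact keyword set varies by Solaris 10 update, so treat this as an assumption to adapt):

system_locale=C
terminal=vt100
network_interface=primary {
    hostname=testzone
}
security_policy=NONE
name_service=NONE
timezone=US/Eastern
root_password=<encrypted-password-hash>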
Adding a network resource to a zone
You need to reboot the zone for the changes to take effect.
# zonecfg -z testzone
zonecfg:testzone> add net
zonecfg:testzone:net> set address=192.168.0.12
zonecfg:testzone:net> set physical=hme0
zonecfg:testzone:net> end
zonecfg:testzone> export
create -b
set zonepath=/zones/testzone
set autoboot=false
add inherit-pkg-dir
set dir=/lib
end
add inherit-pkg-dir
set dir=/platform
end
add inherit-pkg-dir
set dir=/sbin
end
add inherit-pkg-dir
set dir=/usr
end
add net
set address=147.184.30.12
set physical=hme0
end
zonecfg:testzone> exit
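To pick up the new net resource, reboot the zone and then check its interfaces from the global zone; a minimal sketch:

# zoneadm -z testzone reboot
# zlogin testzone ifconfig -a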
Mount a LOFS in a zone (ideal for cdrom)
You need to reboot the zone for the changes to take effect.
# zonecfg -z testzone
zonecfg:testzone> add fs
zonecfg:testzone:fs> set dir=/mnt
zonecfg:testzone:fs> set special=/cdrom
zonecfg:testzone:fs> set type=lofs
zonecfg:testzone:fs> add options [ro,nodevices]
zonecfg:testzone:fs> end
zonecfg:testzone> commit
zonecfg:testzone> exit
Add a disk/filesystem device to a zone
You need to reboot the zone for the changes to take effect.
# zonecfg -z testzone
zonecfg:testzone> add fs
zonecfg:testzone:fs> set dir=/data1
zonecfg:testzone:fs> set special=/dev/dsk/c1t1d0s0
zonecfg:testzone:fs> set raw=/dev/rdsk/c1t1d0s0
zonecfg:testzone:fs> set type=ufs
zonecfg:testzone:fs> add options [logging,nosuid]
zonecfg:testzone:fs> end
zonecfg:testzone> commit
zonecfg:testzone> exit
Create the vfstab file entry and mount the device, then verify:
# df -k
/data1 8705501 8657 8609789 1% /data1
# mount
/data1 on /data1 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=800088 on Mon Mar 7 15:50:53 2005
# cat /etc/mnttab
/dev/dsk/c1t1d0s0 /export/home/testzone/root/data1 ufs rw,intr,largefiles,logging,xattr,onerror=panic,dev=800088 1110211568
Mount a filesystem from the global zone
You need to log into the zone for the changes to take effect
# zonecfg -z testzone
zonecfg:testzone> add inherit-pkg-dir
zonecfg:testzone:inherit-pkg-dir> set dir=/opt/sfw
zonecfg:testzone:inherit-pkg-dir> end
zonecfg:testzone> commit
zonecfg:testzone> exit
Halting a zone
# zoneadm -z testzone halt
# zoneadm list -cv
  ID  NAME       STATUS      PATH
   0  global     running     /
   -  testzone   installed   /zones/testzone
Rebooting a zone
# zoneadm -z testzone reboot
# zoneadm list -cv
  ID  NAME       STATUS    PATH
   0  global     running   /
   1  testzone   running   /zones/testzone
Uninstalling a zone
# zoneadm -z testzone uninstall -F
Deleting a zone
# zonecfg -z testzone delete -F