Thursday, May 22, 2008

Solaris - Crontab

Features:
1. Permits scheduling of scripts (shell/Perl/Python/Ruby/PHP/etc.) and tasks on a per-user basis via individual cron tables
2. Permits recurring execution of tasks
3. Permits one-time execution of tasks via 'at'
4. Logs results (exit status, but can be full output) of executed tasks
5. Facilitates restrictions/permissions via cron.deny, cron.allow, at.deny, at.allow

Directory Layout for Cron daemon:
/var/spool/cron - this directory and its sub-directories store cron & at entries
/var/spool/cron/atjobs - houses one-off at jobs
- 787546321.a - corresponds to a user's at job

/var/spool/cron/crontabs - houses recurring jobs for users
- username - these files house the recurring tasks for each user


Cron command:
crontab - facilitates the management of cron table files
crontab -l - lists the cron table for the current user
- for root, this reads /var/spool/cron/crontabs/root

Cron file format

m(0-59) h(0-23) dom(1-31) m(1-12) dow(0-6) command
10 3 * * * /usr/sbin/logadm - 3:10AM - every day
15 3 * * 0 /usr/lib/fs/nfs/nfsfind - 3:15 - every Sunday
30 3 * * * [ -x /usr/lib/gss/gsscred_clean ] && /usr/lib/gss/gsscred_clean
1 2 * * * [ -x /usr/sbin/rtc ] && /usr/sbin/rtc -c > /dev/null 2>&1

m(0-59) h(0-23) dom(1-31) m(1-12) dow(0-6) command
Note: (date/time/command) MUST be on 1 line
m = minute(0-59)
h = hour(0-23)
dom = day of the month(1-31)
m = month(1-12)
dow = day of the week(0-6) - 0=Sunday

Note: each line contains 6 fields/columns - 5 pertain to the date & time of execution, and the 6th is the command to execute

#m h dom m dow command
10 3 * * * /usr/sbin/logadm - 3:10AM - every day
* * * * * /usr/sbin/logadm - every minute of every hour/day/month
*/5 * * * * /usr/sbin/logadm - every 5 minutes(0,5,10,15...)
1 0-4 * * * /usr/sbin/logadm - 1 minute after the hours 0-4
0 0,2,4,6,9 * * * /usr/sbin/logadm - top of the hours 0,2,4,6,9

1-9 0,2,4,6,9 * * * /usr/sbin/logadm - 1-9 minutes of hours 0,2,4,6,9

Note: Separate columns/fields using whitespace or tabs

###Create crontabs for root ###
Note: ALWAYS test commands prior to crontab/at submission

11 * * * * script.sh -va >> /reports/`date +\%F`.script.report

Note: '%' is special inside a crontab entry (it is translated to a newline) and must be escaped as \%

Note: set EDITOR variable to desired editor
export EDITOR=vim

###script.sh ###
#!/usr/bin/bash
HOME=/export/home/vishal
df -h >> $HOME/`date +%F`.script.report
#END

Note: aim to reference scripts (shell/Perl/Python/Ruby/PHP, etc.) from crontab entries instead of embedding long command strings full of special characters

Note:
A default Solaris install creates 'at.deny' & 'cron.deny' (under /etc/cron.d)
You must NOT be listed in either file to be able to submit at & cron entries

Conversely, if cron.allow and at.allow files exist, you MUST be listed in the relevant file to submit at or cron entries
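The allow/deny rules above can be sketched as a small shell helper. This is a portable illustration of the logic only, not a Solaris utility; the function name and demo user are made up.

```shell
# Portable sketch of cron's permission logic (hypothetical helper).
# Solaris keeps the control files under /etc/cron.d.
can_use_cron() {
    user=$1 allow=$2 deny=$3
    if [ -f "$allow" ]; then
        grep -qx "$user" "$allow"        # allow file exists: user must be listed
    elif [ -f "$deny" ]; then
        ! grep -qx "$user" "$deny"       # only deny file: user must NOT be listed
    else
        [ "$user" = root ]               # neither file: root only (typical default)
    fi
}

# On a live system:
# can_use_cron vishal /etc/cron.d/cron.allow /etc/cron.d/cron.deny && echo allowed
```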

NETSTAT Usage in Solaris

Lists connections for ALL protocols & address families to and from machine
Address Families (AF) include:
INET - ipv4
INET6 - ipv6
UNIX - Unix Domain Sockets(Solaris/FreeBSD/Linux/etc.)

Protocols Supported in INET/INET6 include:
TCP, IP, ICMP(PING(echo/echo-reply)), IGMP, RAWIP, UDP(DHCP,TFTP,etc.)

Lists routing table
Lists DHCP status for various interfaces
Lists net-to-media table - network to MAC(network card) table


NETSTAT USAGE:

netstat - returns sockets by protocol using /etc/services for lookup

/etc/nsswitch.conf is consulted by netstat to resolve names for IPs

netstat -a - returns ALL protocols for ALL address families (TCP/UDP/UNIX)

netstat -an - -n option disables name resolution of hosts & ports

netstat -i - returns the state of interfaces. Pay attention to the errors/collisions/queue columns when troubleshooting performance

netstat -m - returns STREAMS statistics

netstat -p - returns net-to-media info (MAC/layer-2 info.) i.e. arp

netstat -P protocol (ip|ipv6|icmp|icmpv6|tcp|udp|rawip|raw|igmp) - returns active sockets for selected protocol

netstat -r - returns routing table

netstat -D - returns DHCP configuration (lease duration/renewal/etc.)

netstat -an -f address_family

netstat -an -f inet|inet6|unix

netstat -an -f inet - returns ipv4 only information

netstat -n -f inet

netstat -anf inet -P tcp

netstat -anf inet -P udp
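When the output gets long, a small filter helps. The function below is my own sketch (not a netstat option) that tallies TCP states; on a live Solaris box you would pipe 'netstat -an -P tcp' into it, but canned sample output is used here so it stands alone.

```shell
# Tally TCP connection states from netstat output (my own awk filter,
# not part of netstat itself).
tally_states() {
    awk '$NF ~ /^(ESTABLISHED|LISTEN|TIME_WAIT|CLOSE_WAIT|IDLE)$/ { n[$NF]++ }
         END { for (s in n) print s, n[s] }'
}

# Canned sample; on a live system: netstat -an -P tcp | tally_states
tally_states <<'EOF'
192.168.1.5.22    192.168.1.9.50514  49640 0 49640 0 ESTABLISHED
*.111             *.*                    0 0 49152 0 LISTEN
*.22              *.*                    0 0 49152 0 LISTEN
EOF
```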

State Database Replicas - Introduction

Note: At least 3 replicas are required for a consistent, functional, multi-user Solaris system.

3 - yields at least 2 surviving replicas in the event of a single failure
Note: if the replicas share a slice or media and are lost, Volume Management will fail, causing loss of data
Note: place replicas on as many distinct controllers/disks as possible

Note: Max of 50 replicas per disk set

Note: Volume Management relies upon the Majority Consensus Algorithm (MCA) to determine the consistency of the volume information

3 replicas: half = 1.5, rounded down = 1, plus 1 = 2 required for a majority (MCA = half + 1)

Note: try to create an even number of replicas
4 replicas: half = 2, plus 1 = 3 required for a majority
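The MCA arithmetic above boils down to floor(half) + 1, which can be sketched as a one-liner:

```shell
# Majority Consensus: replicas required for a majority is floor(total/2) + 1.
majority() { echo $(( $1 / 2 + 1 )); }

majority 3    # -> 2 replicas required
majority 4    # -> 3 replicas required
```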

State database replicas are approximately 4MB each by default - for local storage

Rules regarding storage location of state database replicas:
1. dedicated partition/slice - c0t1d0s3
2. local partition that is to be used in a volume(RAID 0/1/5)
3. UFS logging devices
4. '/', '/usr', 'swap', and other UFS partitions CANNOT be used to store state database replicas

Solaris Volume Management - Introduction

Solaris' Volume Management permits the creation of 5 object types:
1. Volumes (RAID 0 (concatenation or stripe), RAID 1 (mirroring), RAID 5 (striping with parity))
2. Soft partitions - permits the creation of very large storage devices
3. Hot spare pools - facilitates provisioning of spare storage for use when RAID-1/5 volume has failed
i.e. MIRROR
-DISK1
-DISK2
-DISK3 - spare

4. State database replica - MUST be created prior to volumes
- Contains configuration & status of ALL managed objects (volumes/hot spare pools/Soft partitions/etc.)

5. Disk sets - used when clustering Solaris in failover mode

Note: Volume Management facilitates the creation of virtual disks
Note: Virtual disks are accessible via: /dev/md/dsk & /dev/md/rdsk
Rules regarding Volumes:
1. State database replicas are required
2. Volumes can be created using dedicated slices
3. Volumes can be created on slices with state database replicas
4. Volumes created by the volume manager CANNOT be managed using 'format'; they can, however, be managed using CLI tools (metadb, metainit) and the GUI tool (SMC)
5. You may use tools such as 'mkfs', 'newfs', 'growfs'
6. You may grow volumes using 'growfs'

Creating a Swap File/Partition in Solaris

swap -l | -s - to display swap information

mkfile size location_of_file - to create swap file
mkfile 512m /data2/swap2

swap -a /data2/swap2 - activates swap file

To remove swap file:
swap -d /data2/swap2 - removes swap space from kernel. does NOT remove file
rm -rf /data2/swap2

###Swap Partition Creation###
format - select disk - partition - select slice/modify
swap -a /dev/dsk/c0t2d0s1

Modify /etc/vfstab
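To have the swap space come back after a reboot, the /etc/vfstab entries would look something like this (a sketch for the example devices above; the seven fields are device-to-mount, device-to-fsck, mount point, FS type, fsck pass, mount-at-boot, and options):

/dev/dsk/c0t2d0s1  -  -  swap  -  no  -
/data2/swap2       -  -  swap  -  no  -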

Implementing a Temporary File System (TEMPFS) in Solaris

TEMPFS provides very fast, in-memory (RAM) storage and can boost application performance

Steps:
1. Determine available memory and the amount you can spare for TEMPFS
-prtconf
- allocate 100MB
2. Execute mount command:

mkdir /tempdata && chmod 777 /tempdata && mount -F tmpfs -o size=100m swap /tempdata

Note: TEMPFS data does NOT persist/survive across reboots
Note: TEMPFS data is lost when the following occurs:
1. TEMPFS mount point is unmounted: i.e. umount /tempdata
2. System reboot

Modify /etc/vfstab to include the TEMPFS mount point for reboots

swap - /tempdata tmpfs - yes -

How to determine file system associated with device in Solaris

1. fstyp /dev/dsk/c0t0d0s0 - returns file system type
2. grep mount point from /etc/vfstab - returns matching line
grep /var /etc/vfstab
3. cat /etc/mnttab - displays currently mounted file system

Steps to partition and create file systems on a Solaris Disk

1. unmount existing file systems
- umount /data2; umount /data3

2. confirm fdisk partitions via 'format' utility
-format - select disk - select fdisk

3. use partition - modify to create slices on desired drives
DISK1
-slice 0 - /dev/dsk/c0t1d0s0
DISK2
-slice 0 - /dev/dsk/c0t2d0s0

4. Create file systems using 'newfs /dev/rdsk/c0t1d0s0' (repeat for /dev/rdsk/c0t2d0s0)

5. Use 'fsck /dev/rdsk/c0t1d0s0' to verify the consistency of the file system

6. Mount file systems at various mount points
mount /dev/dsk/c0t1d0s0 /data2 && mount /dev/dsk/c0t2d0s0 /data3

7. create entries in Virtual File System Table (/etc/vfstab) file
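For the example devices above, the new vfstab entries might look like this (a sketch; your fsck pass numbers and options may differ):

/dev/dsk/c0t1d0s0  /dev/rdsk/c0t1d0s0  /data2  ufs  2  yes  -
/dev/dsk/c0t2d0s0  /dev/rdsk/c0t2d0s0  /data3  ufs  2  yes  -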

Wednesday, May 21, 2008

NAS and SAN - A Comparison (for newbies)



At first glance NAS and SAN might seem almost identical, and in fact many times either will work in a given situation. After all, both NAS and SAN generally use RAID connected to a network, which then are backed up onto tape. However, there are differences -- important differences -- that can seriously affect the way your data is utilized. For a quick introduction to the technology, take a look at the diagrams below.

Wires and Protocols
Most people focus on the wires, but the difference in protocols is actually the most important factor. For instance, one common argument is that SCSI is faster than Ethernet and is therefore better. Why? Mainly, people will say the TCP/IP overhead cuts the efficiency of data transfer. So Gigabit Ethernet gives you throughputs of 60-80 MB/s rather than the theoretical ~100 MB/s.

But consider this: the next version of SCSI (due date ??) will double the speed; the next version of ethernet (available in beta now) will multiply the speed by a factor of 10. Which will be faster? Even with overhead? It's something to consider.

The Wires
--NAS uses TCP/IP Networks: Ethernet, FDDI, ATM (perhaps TCP/IP over Fibre Channel someday)
--SAN uses Fibre Channel

The Protocols
--NAS uses TCP/IP and NFS/CIFS/HTTP
--SAN uses Encapsulated SCSI

More Differences

NAS - Almost any machine that can connect to the LAN (or is interconnected to the LAN through a WAN) can use NFS, CIFS or HTTP protocol to connect to a NAS and share files.
SAN - Only server class devices with SCSI Fibre Channel can connect to the SAN. The Fibre Channel of the SAN has a limit of around 10km at best

NAS - A NAS identifies data by file name and byte offsets, transfers file data or file meta-data (the file's owner, permissions, creation date, etc.), and handles security, user authentication, and file locking
SAN - A SAN addresses data by disk block number and transfers raw disk blocks.

NAS - A NAS allows greater sharing of information especially between disparate operating systems such as Unix and NT.
SAN - File Sharing is operating system dependent and does not exist in many operating systems.

NAS - File System managed by NAS head unit
SAN - File System managed by servers

NAS - Backups and mirrors (utilizing features like NetApp's Snapshots) are done on files, not blocks, for a savings in bandwidth and time. A Snapshot can be tiny compared to its source volume.
SAN - Backups and mirrors require a block by block copy, even if blocks are empty. A mirror machine must be equal to or greater in capacity compared to the source volume.

What's Next?
NAS and SAN will continue to butt heads for the next few months or years, but as time goes on, the boundaries between NAS and SAN are expected to blur, with developments like SCSI over IP and Open Storage Networking (OSN), the latter recently announced at Networld Interop. Under the OSN initiative, many vendors such as Amdahl, Network Appliance, Cisco, Foundry, Veritas, and Legato are working to combine the best of NAS and SAN into one coherent data management solution.

SAN / NAS Convergence
As Internet technologies like TCP/IP and Ethernet have proliferated worldwide, some SAN products are making the transition from Fibre Channel to the same IP-based approach NAS uses. Also, with the rapid improvements in disk storage technology, today's NAS devices now offer capacities and performance that once were only possible with SAN. These two industry factors have led to a partial convergence of NAS and SAN approaches to network storage.

Tips on Veritas Volume Manager (VxVM)

Important Notes for Installing VERITAS Volume Manager (VxVM)

* Check which VERITAS packages are currently installed:
# pkginfo | grep -i VRTS

* Make sure the boot disk has at least two free partitions with 2048 contiguous sectors (512 bytes each) available.
# prtvtoc /dev/rdsk/c0t0d0

* Make sure to save the boot disk information by using the 'prtvtoc' command.
# prtvtoc /dev/rdsk/c0t0d0 > /etc/my_boot_disk_information

* Make sure to have a backup copy of the /etc/system and /etc/vfstab files.
* Add the packages to your system.
# cd location_of_your_packages
# pkgadd -d . VRTSvxvm VRTSvmman VRTSvmdoc

* Add the license key by using vxlicinst.
# vxlicinst

* Then run the Volume Manager Installation program.
# vxinstall

* Check the .profile file to ensure the following paths:
PATH=$PATH:/usr/lib/vxvm/bin:/opt/VRTSobgui/bin:/usr/sbin:/opt/VRTSob/bin
MANPATH=$MANPATH:/opt/VRTS/man
export PATH MANPATH

The VERITAS Enterprise Administrator (VEA) provides a Java-based graphical user interface for managing Veritas Volume Manager (VxVM).

Important Notes for how to set up VEA:

* Install the VEA software.
# cd location_of_your_packages
# pkgadd -a ../scripts/VRTSobadmin -d . VRTSob VRTSobgui VRTSvmpro VRTSfspro

* Start the VEA server if it is not already running.
# vxsvc -m (check/monitor whether the VEA server is running)
# vxsvc (start the VEA server)

* Start the Volume Manager User interface.
# vea &

The most handy Volume Manager commands:

* # vxdiskadm
* # vxdctl enable (Force the VxVM configuration to rescan for the disks. See devfsadm)
* # vxassist (Assist to create a VxVM volume.)
* # vxdisk list rootdisk (Displays information about the header contents of the root disk.)
* # vxdg list rootdg (Displays information about the content of the rootdg disk group.)
* # vxprint -g rootdg -thf | more (Displays information about volumes in rootdg.)

In order to create VERITAS Volume Manager objects, you may use one of the following three methods:

(This article emphasizes the CLI method.)

* VEA
* Command Line Interface (CLI)
* vxdiskadm

Steps to create a disk group:
* # vxdg init accountingdg disk01=c1t12d0

Steps to add a disk to a disk group:

* View the status of the disk.
# vxdisk list --or-- # vxdisk -s list

* Add one uninitialized disk to the free disk pool.
# vxdisksetup -i c1t8d0

* Add the disk to a disk group called accountingdg.
# vxdg init accountingdg disk01=c1t8d0
# vxdg -g accountingdg adddisk disk02=c2t8d0

Steps to split objects between disk groups:
* # vxdg split sourcedg targetdg object ...

Steps to join disk groups:

* # vxdg join sourcedg targetdg

Steps to remove a disk from a disk group:

* Remove the 'disk01' disk from the 'accountingdg' disk group.
# vxdg -g accountingdg rmdisk disk01

Steps to remove a device from the free disk pool:

* Remove the c1t8d0 device from the free disk pool.
# vxdiskunsetup c1t8d0

Steps to manage disk group:

* To deport and import the 'accountingdg' disk group:
# vxdg deport accountingdg
# vxdg -C import accountingdg
# vxdg -h other_hostname deport accountingdg

* To destroy the 'accountingdg' disk group.
# vxdg destroy accountingdg

Steps to create a VOLUME:

* # vxassist -g accountingdg make payroll_vol 500m
* # vxassist -g accountingdg make gl_vol 1500m

Steps to mount a VOLUME:

If using ufs:

* # newfs /dev/vx/rdsk/accountingdg/payroll_vol
* # mkdir /payroll
* # mount -F ufs /dev/vx/dsk/accountingdg/payroll_vol /payroll

If using VxFS:

* # mkfs -F vxfs /dev/vx/rdsk/accountingdg/payroll_vol
* # mkdir /payroll
* # mount -F vxfs /dev/vx/dsk/accountingdg/payroll_vol /payroll

Steps to resize a VOLUME:

* # vxresize -g accountingdg payroll_vol 700m

Steps to remove a VOLUME:
* # vxedit -g accountingdg -rf rm payroll_vol

Steps to create a striped and mirrored VOLUME:
* # vxassist -g accounting make ac_vol 500m layout=stripe,mirror

Steps to create a raid5 VOLUME:
* # vxassist -g accounting make ac_vol 500m layout=raid5 ncol=5 disk01 ...

Display the VOLUME layout:
* # vxprint -rth

Add or remove a mirror to an existing VOLUME:
* # vxassist -g accountingdg mirror payroll_vol
* # vxplex -g accountingdg -o rm dis payroll_plex01

Add a dirty region log to an existing VOLUME and specify the disk to use for the drl:
* # vxassist -g accountingdg addlog payroll_vol logtype=drl disk04

Move an existing VOLUME from its disk group to another disk group:
* # vxdg move accountingdg new_accountingdg payroll_vol

To start a VOLUME:
* # vxvol -g accountingdg start payroll_vol

Steps to encapsulate and mirror the root disk:
* Use 'vxdiskadm' to place another disk in rootdg with the same size or greater.
* Set the eeprom variable to enable VxVM to create a device alias in the openboot program.

# eeprom "use-nvramrc?=true"

* Use 'vxdiskadm' to mirror the root volumes. (Option 6)
* Test that you can boot from the mirror disk.

# vxmend off rootvol-01 (disable the boot disk)
# init 6
OK> devalias (check available boot disk aliases)
OK> boot vx-disk01

Write a script that uses a "for" loop to generate some disk activity (e.g. to exercise a new volume):

# for i in 0 1 2 3 4
> do
>   mkdir -p /mydir${i}
>   cp -r /usr/sbin /mydir${i}
>   mkfile 5m /mydir${i}/testfile
>   dd if=/mydir${i}/testfile of=/myotherdir/my_output_file${i} &
> done

Tuesday, May 20, 2008

Veritas File System (VxFS) - Quick Start Command Reference

Setting Up Your File System

Make a VxFS file system - mkfs -F vxfs [generic_options] [-o vxfs_options] char_device [size]
Mount a file system - mount -F vxfs [generic_options] [-o vxfs_options] block_device mount_point
Unmount a file system - umount mount_point
Determine file system type - fstyp [-v] block_device
Report free blocks/inodes - df -F vxfs [generic_options] [-o s] mount_point
Check/repair a file system - fsck -F vxfs [generic_options] [y|Y] [n|N] character_device

Online Administration

Resize a file system - fsadm -b newsize [-r raw_device] mount_point
Dump a file system - vxdump [options] mount_point
Restore a file system - vxrestore [options] mount_point
Create a snapshot file system - mount -F vxfs -o snapof=source_block_device[,snapsize=size] destination_block_device snap_mount_point
Create a storage checkpoint - fsckptadm [-nruv] create ckpt_name mount_point
List storage checkpoints - fsckptadm [-clv] list mount_point
Remove a checkpoint - fsckptadm [-sv] remove ckpt_name mount_point
Mount a checkpoint - mount -F vxfs -o ckpt=ckpt_name pseudo_device mount_point
Unmount a checkpoint - umount mount_point
Change checkpoint attributes - fsckptadm [-sv] set [nodata|nomount|remove] ckpt_name
Upgrade the VxFS layout - vxupgrade [-n new_version] [-r raw_device] mount_point
Display layout version - vxupgrade mount_point

Defragmenting a file system

Report on directory fragmentation - fsadm -D mount_point
Report on extent fragmentation - fsadm -E [-l largesize] mount_point
Defragment directories - fsadm -d mount_point
Defragment extents - fsadm -e mount_point
Reorganize a file system to support files > 2GB - fsadm -o largefiles mount_point

Intent Logging, I/O Types, and Cache Advisories

Change default logging behavior - mount -F vxfs [generic_options] -o log|delaylog|tmplog|nodatainlog|blkclear block_device mount_point
Change how VxFS handles buffered I/O operations - mount -F vxfs [generic_options] -o mincache=closesync|direct|dsync|unbuffered|tmpcache block_device mount_point
Change how VxFS handles I/O requests for files opened with O_SYNC and O_DSYNC - mount -F vxfs [generic_options] -o convosync=closesync|direct|dsync|unbuffered|delay block_device mount_point

Quick I/O

Enable Quick I/O at mount - mount -F vxfs -o qio mount_point
Disable Quick I/O - mount -F vxfs -o noqio mount_point
Treat a file as a raw character device - filename::cdev:vxfs:
Create a Quick I/O file through a symbolic link - qiomkfile [-h header_size] [-a] [-s size] [-e|-r size] file
Get Quick I/O statistics - qiostat [-i interval] [-c count] [-l] [-r] file
Enable cached Quick I/O for all files in a file system - vxtunefs -s -o qio_cache_enable=1 mount_point
Disable cached Quick I/O for a file - qioadmin -S filename=OFF mount_point

Mirroring Disk With Solaris Disksuite (formerly Solstice)

The first step to setting up mirroring using DiskSuite is to install the DiskSuite packages and any necessary patches for systems prior to Solaris 9. SVM is part of the base system in Solaris 9. The latest recommended version of DiskSuite is 4.2 for systems running Solaris 2.6 and Solaris 7, and 4.2.1 for Solaris 8. There are currently three packages and one patch necessary to install DiskSuite 4.2. They are:

SUNWmd (Required)
SUNWmdg (Optional GUI)
SUNWmdn (Optional SNMP log daemon)
106627-19 (obtain latest revision)

The packages should be installed in the same order as listed above. Note that a reboot is necessary after the install as new drivers will be added to the Solaris kernel. For DiskSuite 4.2.1, install the following packages:

SUNWmdu (Commands)
SUNWmdr (Drivers)
SUNWmdx (64-Bit Drivers)
SUNWmdg (Optional GUI)
SUNWmdnr (Optional log daemon configs)
SUNWmdnu (Optional log daemon)

For Solaris 2.6 and 7, to make life easier, be sure to update your PATH and MANPATH variables to add DiskSuite's directories. Executables reside in /usr/opt/SUNWmd/sbin and man pages in /usr/opt/SUNWmd/man. In Solaris 8, DiskSuite files were moved to "normal" system locations (/usr/sbin) so path updates are not necessary.

The Environment
In this example we will be mirroring two disks, both on the same controller. The first disk will be the primary disk and the second will be the mirror. The disks are:

Disk 1: c0t0d0
Disk 2: c0t1d0

The partitions on the disks are presented below. There are a few items of note here. Each disk is partitioned exactly the same; this is necessary to properly implement the mirrors. Slice 2, commonly referred to as the 'backup' slice, represents the entire disk and must not be mirrored. There are situations where slice 2 is used as a normal slice; however, this author would not recommend doing so.

The three unassigned partitions on each disk are configured to each be 10MB. These 10MB slices will hold the DiskSuite State Database Replicas, or metadbs. More information on the state database replicas will be presented below. In DiskSuite 4.2 and 4.2.1, a metadb only occupies 1034 blocks (517KB) of space. In SVM, they occupy 8192 blocks (4MB). This can lead to many problems during an upgrade if the slices used for the metadb replicas are not large enough to support the new, larger databases.

Disk 1:
c0t0d0s0: /
c0t0d0s1: swap
c0t0d0s2: backup
c0t0d0s3: unassigned
c0t0d0s4: /var
c0t0d0s5: unassigned
c0t0d0s6: unassigned
c0t0d0s7: /export

Disk 2:
c0t1d0s0: /
c0t1d0s1: swap
c0t1d0s2: backup
c0t1d0s3: unassigned
c0t1d0s4: /var
c0t1d0s5: unassigned
c0t1d0s6: unassigned
c0t1d0s7: /export

The Database State Replicas

The database state replicas serve a very important function in DiskSuite. They are the repositories of information on the state and configuration of each metadevice (A logical device created through DiskSuite is known as a metadevice). Having multiple replicas is critical to the proper operation of DiskSuite.

* There must be a minimum of three replicas. DiskSuite requires at least half of the replicas to be present in order to continue to operate.
* 51% of the replicas must be present in order to reboot.
* Replicas should be spread across disks and controllers where possible.
* In a three drive configuration, at least one replica should be on each disk, thus allowing for a one disk failure.
* In a two drive configuration, such as the one we present here, there must be at least two replicas per disk. If there were only three and the disk which held two of them failed, there would not be enough information for DiskSuite to function and the system would panic.

Here we will create our state replicas using the metadb command:

# metadb -a -f /dev/dsk/c0t0d0s3
# metadb -a /dev/dsk/c0t0d0s5
# metadb -a /dev/dsk/c0t0d0s6
# metadb -a /dev/dsk/c0t1d0s3
# metadb -a /dev/dsk/c0t1d0s5
# metadb -a /dev/dsk/c0t1d0s6

The -a and -f options used together create the initial replica. The -a option attaches a new database device and automatically edits the appropriate files.
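To confirm the replicas really are spread across the disks, the metadb output can be summarized per physical disk. The function below is my own sketch, not a DiskSuite command; the sample input is canned, but on a live system you would pipe 'metadb' into it.

```shell
# Count replicas per physical disk from 'metadb' output (my own awk
# filter, not part of DiskSuite).
replicas_per_disk() {
    awk '$NF ~ /^\/dev\/dsk\// {
             disk = $NF
             sub(/s[0-9]+$/, "", disk)   # strip the slice number
             n[disk]++
         }
         END { for (d in n) print d, n[d] }'
}

# Canned sample; on a live system: metadb | replicas_per_disk
replicas_per_disk <<'EOF'
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s3
     a    p  luo        16              8192            /dev/dsk/c0t0d0s5
     a    p  luo        16              8192            /dev/dsk/c0t1d0s3
EOF
```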

Initializing Submirrors

Each mirrored meta device contains two or more submirrors. The meta device gets mounted by the operating system rather than the original physical device. Below we will walk through the steps involved in creating metadevices for our primary filesystems. Here we create the two submirrors for the / (root) filesystem, as well as a one way mirror between the meta device and its first submirror.

# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d0 -m d10

The first two commands create the two submirrors. The -f option forces the creation of the submirror even though the specified slice is a mounted filesystem. The next two arguments, 1 1, specify the number of stripes on the metadevice and the number of slices that make up the stripe. In a mirroring situation, this should always be 1 1. Finally, we specify the slice that makes up the submirror.

After mirroring the root partition, we need to run the metaroot command. This command will update the root entry in /etc/vfstab with the new metadevice as well as add the appropriate configuration information into /etc/system. Omitting this step is one of the most common mistakes made by those unfamiliar with DiskSuite. If you do not run the metaroot command before you reboot, you will not be able to boot the system!

# metaroot d0

Next, we continue to create the submirrors and initial one way mirrors for the metadevices which will replace the swap, and /var partitions.

# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d1 -m d11
# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d4 -m d14
# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d7 -m d17

Updating /etc/vfstab

The /etc/vfstab file must be updated at this point to reflect the changes made to the system. The / partition will have already been updated through the metaroot command run earlier, but the system needs to know about the new devices for swap and /var. The entries in the file will look something like the following:

/dev/md/dsk/d1 - - swap - no -
/dev/md/dsk/d4 /dev/md/rdsk/d4 /var ufs 1 yes -
/dev/md/dsk/d7 /dev/md/rdsk/d7 /export ufs 1 yes -

Notice that the device paths for the disks have changed from the normal style
/dev/dsk/c#t#d#s# and /dev/rdsk/c#t#d#s# to the new metadevice paths,
/dev/md/dsk/d# and /dev/md/rdsk/d#.

The system can now be rebooted. When it comes back up it will be running off of the new metadevices. Use the df command to verify this. In the next step we will attach the second half of the mirrors and allow the two drives to synchronize.

Attaching the Mirrors
Now we must attach the second half of the mirrors. Once the mirrors are attached, an automatic synchronization process begins to ensure that both halves of each mirror are identical. The progress of the synchronization can be monitored using the metastat command. To attach the submirrors, issue the following commands:

# metattach d0 d20
# metattach d1 d21
# metattach d4 d24
# metattach d7 d27

Final Thoughts

With an eye towards recovery in case of a future disaster it may be a good idea to find out the physical device path of the root partition on the second disk in order to create an Open Boot PROM (OBP) device alias to ease booting the system if the primary disk fails.

In order to find the physical device path, simply do the following:

# ls -l /dev/dsk/c0t1d0s0

This should return something similar to the following:

/sbus@3,0/SUNW,fas@3,8800000/sd@1,0:a

Using this information, create a device alias using an easy to remember name such as altboot. To create this alias, do the following in the Open Boot PROM:
ok nvalias altboot /sbus@3,0/SUNW,fas@3,8800000/sd@1,0:a

It is now possible to boot off of the secondary device in case of failure using boot altboot from the OBP.

Gigabit Ethernet Configuration

These days all the newer Sun Systems ship with GE (Gigabit Ethernet) Port. Let me give you a quick run down on how to go about configuring the GE Port.

First, to make sure that your Network Interface Card is actually GE Supported, run the following command:

# kstat ce | more
module: ce instance: 0
name: ce0 class: net
adv_cap_1000fdx 1
adv_cap_1000hdx 1
adv_cap_100T4 0
adv_cap_100fdx 1
adv_cap_100hdx 1
adv_cap_10fdx 1
adv_cap_10hdx 1
adv_cap_asmpause 0
adv_cap_autoneg 1
adv_cap_pause 0

You see the line adv_cap_1000fdx set to 1, which means the interface supports a GE link. For better throughput, I would suggest using a Cat-6 cable instead of a Cat-5e cable. Cat-5e is rated for a lower frequency than Cat-6, so Cat-5e can become a bottleneck if network traffic is high. Next, we go about configuring the interface. Don't worry, it's pretty simple and straightforward.

ndd is a nice little utility used to examine and set kernel parameters, namely the TCP/IP drivers. Most kernel parameters accessible through ndd can be adjusted without rebooting the system. To see which parameters are available for a particular driver, use the following ndd command:

# ndd /dev/ce \?

Here /dev/ce is the name of the driver, and the command lists the parameters for this particular driver. The backslash in front of "?" prevents the shell from interpreting the question mark as a special character. However, in most cases even omitting the backslash will give you the same result.

Some Interpretations-

# ndd -set /dev/ce instance 2
Interpretation:
Choose ce2 network interface to set parameters.

# ndd -get /dev/ce link_mode
Interpretation:
0 -- half-duplex
1 -- full-duplex

# ndd -get /dev/ce link_speed
Interpretation:
0 -- 10 Mbit
1 -- 100 Mbit
1000 -- 1 Gbit
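The two lookups above can be wrapped in a tiny helper that prints a readable link description. The helper is my own sketch, not an ndd feature:

```shell
# Translate numeric link_mode/link_speed values into readable form
# (hypothetical helper, not part of ndd).
describe_link() {
    case $1 in 0) mode=half-duplex ;; 1) mode=full-duplex ;; esac
    case $2 in 0) speed=10Mbit ;; 1) speed=100Mbit ;; 1000) speed=1Gbit ;; esac
    echo "$speed $mode"
}

# On a live system:
# describe_link "$(ndd -get /dev/ce link_mode)" "$(ndd -get /dev/ce link_speed)"
describe_link 1 1000    # -> 1Gbit full-duplex
```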

In most cases, if you enable adv_autoneg_cap on your network interface, it should detect the GE connection and bring the interface up at 1000Mbps. In some cases it might not. In that situation, I would strongly suggest forcing the GE link on the switch first. If even that doesn't work, then move on to forcing your NIC to the GE link. The steps below can be followed.

To Switch the NIC to Auto Negotiation-

ndd -set /dev/ce instance 2
ndd -set /dev/ce adv_1000fdx_cap 1
ndd -set /dev/ce adv_1000hdx_cap 0
ndd -set /dev/ce adv_100fdx_cap 1
ndd -set /dev/ce adv_100hdx_cap 0
ndd -set /dev/ce adv_10fdx_cap 0
ndd -set /dev/ce adv_10hdx_cap 0
ndd -set /dev/ce adv_autoneg_cap 1

To force your NIC to 1000fdx

ndd -set /dev/ce instance 2
ndd -set /dev/ce adv_1000fdx_cap 1
ndd -set /dev/ce adv_1000hdx_cap 0
ndd -set /dev/ce adv_100fdx_cap 0
ndd -set /dev/ce adv_100hdx_cap 0
ndd -set /dev/ce adv_10fdx_cap 0
ndd -set /dev/ce adv_10hdx_cap 0
ndd -set /dev/ce adv_autoneg_cap 0

This should do it. In case you want to make these changes permanent, I would suggest creating a file /etc/init.d/nddconfig and adding the following entries to it:

#!/bin/sh

# Persist the forced-1000fdx settings shown above
ndd -set /dev/ce instance 2
ndd -set /dev/ce adv_1000fdx_cap 1
ndd -set /dev/ce adv_1000hdx_cap 0
ndd -set /dev/ce adv_100fdx_cap 0
ndd -set /dev/ce adv_100hdx_cap 0
ndd -set /dev/ce adv_10fdx_cap 0
ndd -set /dev/ce adv_10hdx_cap 0
ndd -set /dev/ce adv_autoneg_cap 0

# ln -s /etc/init.d/nddconfig /etc/rc3.d/S31nddconfig

NOTE: The /etc/system settings are not supported for configuring ce Ethernet adapters during system startup; you may either use ndd commands in an /etc/rc?.d script or create a /platform/sun4u/kernel/drv/ce.conf file with appropriate settings.

Please feel free to post your questions in the comments section if you have any. I would be happy to answer them.

ZFS - Introduction

###Zettabyte File System (ZFS)###

Some Features:
1. Supports storage space of up to 256 quadrillion zettabytes (terabytes - petabytes - exabytes - zettabytes; a zettabyte is 1024 exabytes)
2. Supports RAID-0/1 and RAID-Z (which is essentially RAID-5 with enhancements; the best part is that you don't need 3 disks to achieve RAID-5 on ZFS, as even with 2 virtual devices ZFS provides a good amount of redundancy)
3. Supports file system snapshots (read-only copies of file systems or volumes)
4. Supports creation of volumes (which can contain disks, partitions, files)
5. Uses storage pools to manage storage - aggregates virtual devices
6. A ZFS file system attached to a storage pool can grow dynamically as storage is added; there is no need to reformat or back up your data before adding extra storage
7. File systems may span multiple physical disks without any extra software or effort
8. ZFS is transactional, so it's all or nothing: if a write operation fails for some reason, the entire transaction is rolled back
9. Pools and file systems are auto-mounted; there is no need to maintain /etc/vfstab
10. Supports file system hierarchies: /storage1/{home(50GB),var(100GB),etc.}
11. Supports reservation of storage: /storage1/{home(50GB),var}
12. Solaris 10 provides a secure web-based ZFS management tool @ https://localhost:6789/zfs
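The dynamic-growth feature above can be seen in a short sketch (the disk names c0t1d0 and c0t2d0 are hypothetical; substitute your own devices):

```shell
# Create a pool on one disk, then grow it online by adding a second disk;
# file systems in the pool see the extra space immediately, no remount needed.
zpool create storage1 c0t1d0
zfs create storage1/home
zpool add storage1 c0t2d0
zpool list storage1            # SIZE now reflects both disks
```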


###ZFS - Quick Command Reference###
zpool list - lists known ZFS pools
zpool create pool_name device_name1 device_name2 device_name3 ...
zpool create storage1 c0t1d0 - a device may be given as c0t1d0 or /dev/dsk/c0t1d0
zfs list - returns ZFS dataset info.
zfs mount - returns pools and mount points
zpool status - returns virtual devices that constitute pools
zpool destroy storage1 - Destroys pool and associated file systems


###Create file systems within storage1###
zfs create storage1/home - creates file system named 'home' in storage1

###Set quota on existing file system###
zfs set quota=10G storage1/home

###Create user-based file system beneath storage1/home###
zfs create storage1/home/vishal

zfs get -r compression storage1 - returns compression property for file systems associated with 'storage1'

###Rename ZFS File System###
zfs rename storage1/home/vishal storage1/home/vishalnew

###Extending dynamically, pool storage###
zpool add storage1 c0t2d0


###ZFS Redundancy/Replication###
1. Mirroring - RAID-1
2. RAID-5 - RAID-Z


Virtual Devices:
1. c0t1d0 - 72GB
2. c0t2d0 - 72GB

Note: Redundancy/Replication is associated directly with the pool

zpool create storagemirror1 mirror c0t1d0 c0t2d0

###ZFS Snapshots###
It’s a read only copy of Volumes and File Systems
Use no additional space, initially

List ZFS Snapshots
zfs list -t snapshot

Create a snapshot
zfs snapshot snapraidz1/home@homesnap1 (highest number is the most recent snapshot)

zfs list -t snapshot

Snapshots can be viewed within the file system in which they were created

cd /snapraidz1/home

cd .zfs (a hidden directory; it won't be visible even with ls -a)

cd snapshot
cd homesnap1

Destroy a snapshot

zfs destroy snapraidz1/home@homesnap1

Rename a snapshot

zfs rename snapraidz1/home@homesnap3 snapraidz1/home@homesnap4

Renaming a snapshot across different pools is not possible.

Roll back a ZFS snapshot

zfs rollback snapraidz1/home@homesnap4

File systems need to be remounted for the rollback to work; the solution is to use -f

zfs rollback -f snapraidz1/home@homesnap4

ZFS Clones

Features:
Writeable file systems / volumes
Clones are linked to snapshots
A clone can be stored anywhere in the ZFS hierarchy

Create a clone

zfs clone snapraidz1/home@homesnap4 snapraidz1/homeclone1 (the directory will be created)
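If a clone is meant to replace the file system it was cloned from, zfs promote can reverse the clone/origin dependency. A sketch using the names above:

```shell
# Make the clone independent of its origin snapshot's file system;
# afterwards the original file system can be renamed or destroyed if desired.
zfs promote snapraidz1/homeclone1
```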

Solaris Zones - Introduction

Solaris 10 has an excellent new feature called Zones. I had been looking forward to this feature for a long time; I guess Sun was pretty late in bringing this technology to the Solaris platform. Earlier I had worked with BSD Jails, but their implementation was a little tricky. In Solaris it is much easier, and it takes fewer steps to get a zone up and running quickly. Below are some notes I have compiled. They should give you a fair understanding of Solaris Zones.

Features:
1. It's virtualization - comparable to VMware or BSD Jails
2. As of now, they can host only instances of Solaris. Not other OSs.
3. Limit of 8192 zones per Solaris system
4. Primary zone (also called global zone) has access to ALL zones
5. Non-global zones, do NOT have access to other non-global zones
6. Default non-global zones derive packages from global zone
7. Program isolation - zone1(Apache), zone2(MySQL)
8. Provides 'z' commands to manage zones: zlogin, zonename, zoneadm, zonecfg

###Features of GLOBAL zone###
1. Solaris ALWAYS boots(cold/warm) to the global zone
2. Knows about ALL hardware devices attached to the system
3. Knows about ALL non-global zones

###Features of NON-GLOBAL zones###
1. Installed at a location on the filesystem of the GLOBAL zone 'zone root path' /export/home/zones/{zone1,zone2,zone3,...}
2. Share packages with GLOBAL zone
3. Maintain distinct hostname and configuration table files
4. Cannot communicate with other non-global zones by default. NIC must be used, which means, use standard network API(TCP)
5. GLOBAL zone admin. can delegate non-global zone administration

###Zone Configuration###
Use: zonecfg - to configure zones
Note: zonecfg can be run: interactively, non-interactively, command-file modes

Requirements for non-global zones:
1. hostname
2. zone root path. i.e. /export/home/zones/testzone1
3. IP address - bound to logical or physical interface

Zone Types:
1. Sparse Root Zones - share key files with global zone
2. Whole Root Zones - require more storage

Steps for configuring non-global zone:
1. mkdir /export/home/zones/testzone1 && chmod 700 /export/home/zones/testzone1
2. zonecfg -z testzone1
3. create
4. set zonepath=/export/home/zones/testzone1 - sets root of zone
5. add net ; set address=192.168.1.60
6. set physical=e1000g0
7. (optional) set autoboot=true - testzone1 will be started when system boots
8. (optional) add attr ; set name=comment; set type=string; set value="TestZone1"
9. verify - checks the zone configuration for errors
10. commit - commits the changes

11. Zone Installation - zoneadm -z testzone1 install - places zone, 'testzone1' into 'installed' state. NOT ready for production
12. zoneadm -z testzone1 boot - boots the zone, changing its state
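Steps 2-10 can also be driven in the command-file mode mentioned earlier. A sketch using the same values as the steps above (the file name /tmp/testzone1.cfg is hypothetical):

```shell
# Build a zonecfg command file, then feed it to zonecfg with -f
cat > /tmp/testzone1.cfg <<'EOF'
create
set zonepath=/export/home/zones/testzone1
set autoboot=true
add net
set address=192.168.1.60
set physical=e1000g0
end
verify
commit
EOF
zonecfg -z testzone1 -f /tmp/testzone1.cfg
zoneadm -z testzone1 install
zoneadm -z testzone1 boot
```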

###Zlogin - is used to login to zones###
Note: each non-global zone maintains a console. Use 'zlogin -C zonename' after installing zone to complete zone configuration

Note: Zlogin permits login to non-global zone via the following:
1. Interactive - i.e. zlogin -l username zonename
2. Non-interactive - zlogin options command
3. Console mode - zlogin -C zonename
4. Safe mode - zlogin -S

zoneadm -z testzone1 reboot - reboots the zone
zlogin testzone1 shutdown
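Between these steps, zone states can be checked with zoneadm, and single commands can be run in a zone via zlogin's non-interactive mode; a quick sketch:

```shell
zoneadm list -cv            # shows ID, NAME, STATUS (configured/installed/running), PATH
zlogin testzone1 uname -a   # non-interactive: run one command inside the zone
```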

ZFS - Training Video

ZFS is still not stable for production use (that's what I have read so far). I have played around a lot with it, and I feel it is indeed a file system for the future. I hope something similar will be available on other platforms as well. I know for sure that Mac OS will have it in its next release. This ZFS CBT should be extremely useful for newbies:

http://rapidshare.com/files/100722826/ZFS_CBT.part1.rar
http://rapidshare.com/files/100722845/ZFS_CBT.part2.rar
http://rapidshare.com/files/100722828/ZFS_CBT.part3.rar
http://rapidshare.com/files/100722803/ZFS_CBT.part4.rar
http://rapidshare.com/files/100722818/ZFS_CBT.part5.rar
http://rapidshare.com/files/100724192/ZFS_CBT.part6.rar
http://rapidshare.com/files/100732713/ZFS_CBT.part7.rar

I don't own the above links, so I am not sure when they will go dead.

After working with so many file systems over the years, I feel that there should be a universal file system for all storage devices. I am sure someone, somewhere is already working on this idea.

SAN Storage Configuration in Solaris

Storage area networks have moved from the cutting edge to almost commonplace. But, if this is your first time connecting remote storage to Solaris, there are a number of configuration tasks that will be new to you if you have only worked with directly attached SCSI arrays or Sun's legacy Fibre Channel arrays such as the A5200 series. In this article, I will outline steps to configure Solaris to attach to remote disk storage from HP, IBM, and Network Appliance SANs. The platform configurations have much in common, but there are some differences in the patch sets and software libraries required. In my environment, I have used a few different SAN host bus adapter cards from Emulex and JNI, and we installed Veritas Volume Manager on the systems after the SAN devices were attached.

A SAN is a network that provides access to remote devices through a serial SCSI protocol such as Fibre Channel or iSCSI. This differs from NAS (network attached storage), which uses SMB/CIFS and NFS protocols. A SAN is much more than just a Fibre loop to remote disk drives -- tape robots and other hosts can transfer data directly between nodes and thereby decrease the load on the production Ethernet interfaces, where most of the user application traffic is directed. Using a SAN fabric of Fibre Channel switches, dozens of servers from multiple platforms can be interconnected to share the same cabinet of disks for data and share the same tape devices for enterprise backups.

Cost justification for investing in a SAN might be found by comparing the cost of directly attached storage to that of a centralized pool of storage. In my environment, many of my legacy directly attached storage arrays were underutilized. Although most of my old disk arrays still had room for growth, this storage space could not be effectively shared among servers. Pooling all disk storage into one large SAN array allowed for less waste and more efficient storage expansion as needed. Another advantage of migrating our storage to SANs has been increased I/O speed. Many of our older servers use 80 MB/s SCSI or 100 MB/s Fibre Channel for local disk storage.

Migrating storage to the SAN using dual 2-Gb Fibre Channel connections on each server has brought new life to some of our older, slower machines. Our older disk arrays had little or no cache to speed I/O transactions unlike our SAN storage cabinets, which have several gigabytes of I/O cache. In practice, we found some of our heavily I/O-bound applications, such as database loads, to run in half their previous time after migrating to the SAN. There have been numerous other advantages to our SAN, such as new tools for monitoring space, tools for trend analysis, and a significant consolidation of rack space in our data center.

Preparing Solaris for SAN Connectivity:

Regardless of the brand of SAN equipment used, the place to start is with Solaris patches. We can't stress enough the importance of applying Sun's recommended patch clusters immediately after installing Solaris. We found that some features didn't work as expected until the recommended patch set and additional patches below were applied. We started with Sun's Recommended and Security patch cluster, which can be downloaded from the Web or via ftp from:

http://sunsolve.sun.com


For Solaris 8, we downloaded the 8_Recommended.zip file and unzipped this file on our boot disk where we had plenty of free space. We shut down to single user mode with init s before running the install_cluster script. We issued the uname -a command before and after the patches to ensure the kernel version had been upgraded. The kernel version is the number after the dash in the output, for example, 108528-20 is Solaris 8 at kernel version 20. Sun's best practice is to keep up to date with the latest kernel release by installing the full Recommended and Security patch cluster. The patch cluster includes the basic patches that are recommended for all systems, although additional patches may be required for specific applications.
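The sequence just described amounts to the following sketch (8_Recommended is assumed to be the directory the zip extracts to; run this from the console, since init s drops the system to single-user mode):

```shell
uname -a                      # note the kernel patch revision beforehand
unzip 8_Recommended.zip
init s                        # drop to single-user mode before patching
cd 8_Recommended
./install_cluster             # applies the Recommended and Security cluster
init 6                        # reboot, then compare:
uname -a                      # kernel revision after, e.g. 108528-20
```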

We then downloaded the Sun StorEdge SAN 4.2 software. This can be found at www.sun.com/storage/san, and will require a sunsolve account login. The file to download is Solaris_8_SFK_packages.tar.Z. We uncompressed and tar extracted this file and noted that it contained three small Solaris packages: SUNWsan, SUNWcfpl, and SUNWcfplx. We used the pkgadd utility to install all three of these with pkgadd -d . Solaris_8_SFK_packages.

The next step was to review individual patch versions on our box and determine whether additional patches were required. We used the showrev command to list installed patches and install higher versions as needed. For example, showrev -p | grep 109524 will search for the ssd driver patch at any version. We wanted to see version two, 109524-02, or higher. The best practice is to make a list of the versions you have and compare them to the latest available versions on sunsolve.sun.com. Below are the specific patches we applied for Fibre channel and general SAN readiness on our servers. These patches are all installed separately with the patchadd command; refer to the readme files that come with these patches for specific installation instructions. After all the patches were installed, we rebooted and checked /var/adm/messages for any errors before continuing:

109524-15 /kernel/drv/ssd patch
109657-09 isp driver patch
108974-27 dada, uata, dad, sd and scsi drivers patch
109529-06 luxadm, liba5k and libg_fc patch #1
111413-08 luxadm, liba5k and libg_fc patch #2 (requires 109529-06)
110380-04 ufssnapshots support, libadm patch
110934-13 pkgtrans, pkgadd, pkgchk
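Checking whether an installed patch meets a minimum revision is a small text-processing step over showrev -p output. A sketch, where showrev_sample is a hypothetical stand-in for real showrev -p output so the logic can be tried anywhere:

```shell
# Find the highest installed revision of base patch 109524 in showrev -p output.
# showrev_sample stands in for: showrev -p
showrev_sample="Patch: 109524-02 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu
Patch: 109524-15 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu
Patch: 110380-04 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu"
echo "$showrev_sample" | awk '/109524/ {split($2, a, "-"); if (a[2]+0 > max) max = a[2]+0}
                              END {print "109524 highest rev: " max}'
```

On a live system, replace the echo pipeline's input with showrev -p and compare the reported revision against the minimum you need (here, 02 or higher).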

Installing Host Bus Adapters

We used dual JNI (http://www.jni.com) host bus adapters to connect to our HP and IBM SANs, and we used Emulex (http://www.emulex.com) adapters to connect to our Network Appliance SAN. Thus, we can provide an example for both of these common cards. The choice of interface card will depend on vendor support and certification of card models and specific levels of firmware. Be sure to check the vendor Web sites for latest firmware releases. Power off your server and install the interface cards in appropriate slots. Our test servers were a Sun 220R system (with PCI slots) and a Sun Ultra-2 server (with SBUS slots). We installed two adapters for each of the installations below so we would have some hardware redundancy and be able to take advantage of multipathing software. Some of Sun's Enterprise servers can accept either SBUS or PCI cards by making use of Sun's I/O boards with the appropriate SBUS or PCI slots.

Network Appliance SAN Configuration

In the first installation, we connected a Sun 220R server to our Network Appliance SAN filer model F940. We installed two PCI-based Emulex LP9002L host bus adapters in our Sun server and studied /var/adm/messages to be sure we had no boot errors. The software driver for the Emulex cards was provided to us by Network Appliance on CD, although updates can be downloaded from their Web site (http://www.netapp.com). Network Appliance has repackaged the Emulex drivers to include additional tools beyond the generic drivers obtainable at http://www.emulex.com. We installed the drivers by uncompressing and tar extracting the file "ntap_solaris_fcp_1_0.tar.Z" and then running the install script in the resulting directory. The script installed the Emulex lpfc drivers and utilities, Network Appliance sanlun utility, and updated parameters in /etc/system. There was only one question to answer regarding the use of Veritas with multipathing, which we will discuss later. We rebooted after running the script.

The Network Appliance SAN filer has a worldwide node name (WWNN), which should be bound to your Sun server. Using persistent bindings between the filer (known as the target) and the Sun server's Fibre Channel cards (the initiators) means you will always get the same SCSI target IDs for your SAN disks after every reboot. Without persistent bindings, the SCSI ID could change. To set up these bindings we telneted into the SAN filer and issued the command fcp nodename. This provided us the Fibre Channel node name in two formats, with and without colons, similar to the following example:

filer> fcp nodename
Fibre Channel nodename: 50:a9:80:00:55:CC:66:CD (50a9800055CC66CD)

Once we installed the two Emulex host bus adapter cards in the Sun server, they were named by Solaris to be lpfc0 and lpfc1. We ran the newly installed lputil command (installed under the /usr/sbin/lpfc directory) to create persistent bindings. The utility is a text-based menu, so we selected the "Persistent Bindings" and "Bind Target Manually" options. We were shown a list of HBA cards in our server (lpfc0, lpfc1), entered 0 to select the first card, and selected "Node Name". We pasted in our Network Appliance Fibre Channel nodename discovered above as a block of 16 characters with no colons or dashes. We entered 1 for the target number. The target number could be set to any number from 0 to 511 and uniquely identifies the filer in the event you want to attach more than one SAN filer to the Sun server. We repeated these steps for the second HBA card lpfc1, entering the same Fibre Channel nodename and filer target 1. We performed a reconfiguration reboot at this point with reboot -- -r. We think it's a good idea to go back into the lpfc menu after the system reboots to verify that the persistent bindings were preserved and that everything looks correct.

LUN Configuration

The next step is to create LUNs on the SAN filer and make them available to the Sun server, or more specifically, make them available to the HBA cards in our Sun server. The term LUN is derived from SCSI unit number and, in this context, we are referring to SCSI disk devices, although they are virtual disks and not physical SCSI disk drives. A virtual disk is created by combining slices of physical disks into one logical volume. The physical disk structure may use mirroring or RAID5, but the resulting LUN can be treated as one unit of disk in Solaris. In our environment, the SAN provides storage for several operating systems and is not managed by the Unix administrators. How to configure and make available sets of disks or LUNs on the SAN arrays is beyond the scope of this article, but we can summarize the steps. We created an initiator group on the filer, which is a list of worldwide port names (WWPNs). Network Appliance provides a tool to make this step easier. We ran the following sanlun command on our Sun server and provided the output to our SAN systems administrator:

# /usr/sbin/sanlun fcp show adapter -c

Enter this filer command to create an initiator group for this system:

igroup create -f -t solaris "sunhost1" 10000000c9307031 10000000c930694c

The second line of output provided us with the Network Appliance igroup command to be used on the filer to create the initiator group. The name of the group is the same as the Sun host name, in this example sunhost1. The following numbers are the WWPNs of the two HBA cards in this Sun server, and the igroup command is used to add these cards into the sunhost1 igroup on the filer. We provided this information to our SAN administrator along with our request for LUNs. He created three LUNs for us and mapped them to this initiator group for our use.

On the Solaris side, we needed to configure the /kernel/drv/sd.conf file, which is used to define and configure SCSI disk storage. Each Sun server already has a generic version of this file, however, we need to expand upon it to tell Solaris to probe for new LUNs at boot time. We added the following lines to the bottom of our sd.conf file to configure three additional LUNs. Note that target 1 matches the target we mapped to our filer when configuring the HBA cards with the lputil menu:

name="sd" parent="lpfc" target=1 lun=0;
name="sd" parent="lpfc" target=1 lun=1;
name="sd" parent="lpfc" target=1 lun=2;

If you anticipate needing to add storage space to this system in the future, add plenty of extra entries to the sd.conf file. The sd.conf is only read by Solaris during a reboot, so to avoid reboots, you must define several more LUNs than you currently need by incrementing the lun number. As long as these definitions were read during the last reboot, you will be able to add LUNs on the fly with the devfsadm command.
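One way to generate a block of spare entries instead of typing them by hand (a sketch; target 1 and LUNs 0-15 are hypothetical, so adjust them to your HBA binding and expected growth, then append the output to /kernel/drv/sd.conf):

```shell
# Print sd.conf entries for target 1, LUNs 0 through 15; paste or redirect
# the output into /kernel/drv/sd.conf. It takes effect on the next reboot.
i=0
while [ $i -le 15 ]; do
    echo "name=\"sd\" parent=\"lpfc\" target=1 lun=$i;"
    i=`expr $i + 1`
done
```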

After our sd.conf was configured and the server was rebooted, we saw the new devices listed in the Solaris format utility. We noticed warnings regarding the new LUNs indicating they had corrupt labels. When we selected a disk (LUN) by number, we received the message that it was formatted but not labeled and were asked, "Label it now?" Responding with a "y", we labeled the LUN and repeated the process for all the new LUNs. Along with the format command, we found Network Applicance's sanlun utility useful for displaying the status of our LUNs. For example, "sanlun lun show filer1" lists all the LUNs on SAN filer1, showing the Solaris device filename, LUN size, and state.

We installed two HBAs in our server so we had two paths to each LUN; for example, we had one set of disks on controller 2 and another set on controller 3. Our first disk, c2t0d0, is the same physical LUN as c3t0d0 and only the first instance needs to be labeled. Veritas will take advantage of the second path to the LUN if you install Veritas Volume Manager with Dynamic Multipathing. At this point, the LUNs looked like standard Solaris disks, and we were ready to create new file systems on them or initialize them with Veritas Volume Manager.

HP SAN Configuration

Our second installation was performed connecting an HP SAN model XP1024 to the Sun 220R server. Starting from a clean installation of Solaris on the server, we applied the Sun patches as indicated above and then installed two PCI-based JNI host bus adapters. We performed a reconfiguration reboot and again examined the /var/adm/messages to ensure we had no boot errors. Even before any software drivers for the cards were installed, we could verify that Solaris was aware of the JNI hardware by issuing the command prtconf | grep JNI. The output showed one line per card and indicated that a JNI device was installed but no driver was attached. We downloaded the drivers for the JNI cards directly from their Web site (http://www.jni.com). Our cards were model FCX-6562, and the files to download for Solaris were the driver package "JNIC146x.pkg" and the EZ Fibre GUI configuration software "EZF_22j.tar". You need to be root to perform these installs, although you do not need to be in single user mode. We installed the drivers with pkgadd -d jnic146x.pkg. In the pkgadd menu, we selected "all" to install both the JNI HBA driver and the associated libraries for Solaris. At this point, the command prtconf | grep JNI showed one line for each card indicating instance 0 and instance 1, an indication the device drivers were attached.

One difference between the Emulex and JNI products is the addition of a configuration GUI provided by JNI. We configured the sd.conf with LUN definitions and relied on the EZ Fibre GUI to perform these edits. The GUI also made tuning parameter changes for the JNI drivers, which are stored in the /kernel/drv/jnic146x.conf file. To install the EZ Fibre software, we tar extracted the ezf_22j.tar file, changed into the resulting EZF_22 directory, and ran the install.sh script. This is an X Window application, and it popped up a series of windows leading us through the installation. We accepted the license agreement and the default install directory /opt/jni/ezfibre/standalone. After installation, we started the EZ Fibre GUI by changing to the ezfibre/standalone directory and executing the ezf script. The GUI provided a view of our system, listing each JNI card and providing the card status and WWPN information.

We used the EZ Fibre configuration GUI to look up the worldwide port names for each JNI HBA card and provided these WWPNs to our SAN administrator. The card parameter changes are stored in the jnic146x.conf file. The defaults may be correct for most installs, but we hard-coded the link speed to 2 GB, disabled IP protocol, and set the topology to use a fabric instead of a private Fibre loop. Changes here were made to both HBA cards separately before rebooting.

Our HP SAN administrator uses a Web-based tool from HP called the StorageWorks command view. Using this tool, he created LUNs within the HP SAN array and assigned them to our HBA WWPNs. This created a soft zone to map the LUNs to our HBAs and is analogous to using the igroup command in the Network Appliance installation above. After the LUNs were assigned to the cards in our Sun server, we could see the LUNs as available to us in the "LUN-Level Zoning" tab in the EZ Fibre GUI. Checking them off and accepting them in the GUI made the proper edits to our sd.conf file. We repeated the process of attaching LUNs to both HBA cards. We committed changes for each card separately in the GUI and then rebooted the server. Note that persistent bindings are in effect since Dynamic Binding in the GUI is disabled by default. After exiting the EZ Fibre GUI, we examined the sd.conf file and saw the LUNs added at the bottom in the following format:

name="sd" class="scsi" target=0 lun=1;
name="sd" class="scsi" target=0 lun=2;
name="sd" class="scsi" target=0 lun=3;

Although using the GUI to add LUNs to sd.conf was convenient, it did not provide a way to add extra LUN definitions for future disk expansion. Thus, we used a text editor to edit the sd.conf file and add several more lines, incrementing the LUN number. The sd.conf file is only read at boot time, and we wanted to be able to add more disk space on the fly without having to reboot in the future. We can have up to 256 LUNs per target number. If our SAN manager provides another LUN on the fly, we can run the devfsadm command to create the Solaris device for the predefined LUN without rebooting.

IBM Shark SAN Configuration

We connected an older Sun Ultra-2 server to our Shark SAN ESS 2105-F20 using a pair of SBUS-based JNI cards (model FCE1473). Again, we downloaded the card drivers from the JNI Web site, along with the JNI EZ Fibre utility, and there were no significant differences from our previous JNI installations. This Ultra-2 was installed with Solaris 9 and, unlike with Solaris 8, we could achieve connectivity to the Shark SAN without additional patches after the base Solaris install. However, the best practice is to always install the cluster of patches 9_Recommended.zip.

We downloaded the Sun StorEdge SAN 4.2 software, Solaris_9_SFK_packages.tar.Z, from http://www.sun.com/storage/san. This is essentially the same as for Solaris 8 and contains the Solaris packages SUNWsan, SUNWcfpl, and SUNWcfplx for Solaris 9. We confirmed through the /var/adm/messages file that our hardware was running cleanly before setting up LUNs with the EZ Fibre GUI. After rebooting, we saw the familiar message about corrupt labels on our LUNs and used the Solaris format utility to label each LUN with a default Solaris label. Although not required for our installation, enhanced functionality is available through IBM's Subsystem Device Drivers (SDD) for Solaris. These drivers support the multipath configuration within the Shark and allow for load balancing multiple paths.

Veritas Volume Manager and DMP

In each of our SAN installations, every LUN that was provided to our Sun server became a Solaris disk device in the familiar Sun convention of c#t#d#, where c = controller, t = target, d = disk. The LUNs as defined in sd.conf as 0, 1, 2 became Solaris disk devices c2t0d0, c2t0d1, c2t0d2. We saw them again as controller 3 (c3t0d0, c3t0d1, c3t0d2) since we have two HBA paths to the same devices. We worked with our SAN administrator to create LUNs that were roughly 7 GB in size. Although we could create LUNs many times this size, we thought it would be less wasteful to add disks in increments of 7 GB, one or two LUNs at a time, as our applications required more space.

We needed a utility that allowed us to grow and expand file systems on the fly without reboots. With this setup, we could add LUNs on the fly if they were already reserved in the sd.conf file, however, we could not resize an existing LUN. Although our SAN administrator could resize a LUN on the backend, it would require us to create a new file system on the Solaris device. We could back up, resize, and rebuild the LUN and Solaris disk device, and then restore our data. But we could not keep our file system online during that process.

Veritas Volume Manager allows us to create Unix file systems that span multiple LUNs and also provides tools to resize Unix file systems without taking them offline. If one of our mounted file systems is running out of space, we can import another LUN, initialize it as a Veritas disk, and extend the file system onto this disk. It is not necessary to unmount the file system during the process.
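The grow-on-the-fly workflow looks roughly like this (a sketch with hypothetical names: disk group datadg, disk media name datadg02, volume vol01, new LUN c2t0d1; vxresize also grows the VxFS file system mounted on the volume):

```shell
devfsadm                                   # create the Solaris device for the new LUN
vxdctl enable                              # make the new device visible to Veritas
vxdisk list                                # the new LUN should appear here
# after initializing the disk (e.g. via the vxdiskadm menu), add it to the group
vxdg -g datadg adddisk datadg02=c2t0d1
# grow the volume and its mounted file system by 7g, online
/etc/vx/bin/vxresize -g datadg vol01 +7g
```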

In our three SAN environments, we installed Veritas Volume Manager version 3.5. Veritas recommends also installing their maintenance pack update MP1 for 3.5. (At the time of writing, MP1 consisted of Sun patches 113210-02 for Solaris 8, 112392-04 for vxvm, and three for vea, 113203-02, 113595-02, and 13596-02. On our older Veritas 3.2 installations, we downloaded 113201-02 for vxvm, 111904-06 for vmsa, and 110435-08 for vxfs.) There was nothing unusual about the installation in a SAN environment compared to legacy disk arrays. Once the LUNs are visible as Solaris devices under the format utility and have been labeled, they can be brought under Veritas control. Veritas should see the disks after a reconfiguration reboot or, if you have added LUNs on the fly, you can run the Veritas vxdctl enable command to make them visible to Veritas. The vxdisk list command is convenient for checking whether all your disk devices are known to Veritas. We use the vxdiskadm menu to initialize all our LUN disks and then the Veritas vea (previously vmsa) GUI to create volumes and file systems.

As mentioned previously, we have dual HBAs in each server so there is one instance of each disk under each controller c2 and c3. Veritas is installed with Dynamic Multipathing by default and will configure itself to show only one instance of each disk as seen in the vxdisk list output. We verified that Veritas is aware of the second path to each of these disks by listing the detailed information on each disk. For example, vxdisk list c2t0d0s2 will show a page of detail about the disk, including multipath information near the bottom:

Multipathing information:
numpaths: 2
c2t0d0s2 state=enabled
c3t0d0s2 state=enabled

An enabled state means that DMP has been enabled in Veritas, but it is not an immediate indicator of the physical connection status. To certify our servers during the installation process, we watched a tail -f /var/adm/messages on our system and unplugged one Fibre HBA connection at a time to observe the failure and reconnection of each path to the LUN devices.

In our HP SAN environment, we downloaded an additional software package, the Veritas Array Support Library. Starting with Volume Manager version 3.2, Veritas introduced the Device Discovery Layer (DDL), which enhances the Veritas configuration daemon to discover multipathing attributes of disks. Support for some disk arrays is built in before adding these libraries and vxddladm listsupport will identify them. However, check the Veritas Web site at support.veritas.com to see whether a library is available for your specific array. In the case of our HP SAN, the array support library enabled Veritas to correctly identify the array and provide additional advanced functionality relative to command devices, which are reserved devices for each host server to communicate with the array. We did not install Veritas support libraries for our Shark and Network Appliance SANs; however, we issued the command vxddladm addjbod vid=NETAPP for the Network Appliance SAN to identify NETAPP to the Veritas DDL.

Summary

We have shown examples of configuring Solaris to connect to three different vendors' SANs. Although there are differences in how to configure the HBA software drivers for various types of cards, the steps to achieve connectivity are similar. Before beginning an installation, we ensure that our system has been upgraded to the latest patch sets available. Patches are released frequently, and time spent researching the latest versions on sunsolve before installing is well spent. Once the system is running with the latest patches, SAN software, and HBA drivers, we provide the WWPN numbers of our HBA cards to our SAN administrator. He creates virtual disks and puts them into a group or soft zone, which restricts their use only to our HBAs.

We then edit our Solaris sd.conf file to add these new LUNs and instruct Solaris to define them as disk devices during the next reconfiguration reboot. We configure the HBA driver software to persistently bind the SCSI targets for the new LUNs so they will be consistent across reboots. The new disk devices are labeled with the Solaris format utility and initialized with Veritas Volume Manager. We install enhanced libraries for Veritas DDL if they exist for our vendor's SAN and, finally, test the functionality of dynamic multipathing. Our first attempt to connect Solaris to a SAN was challenging because we needed to learn several new concepts, such as configuring the sd.conf file and how to bind WWPN numbers in our environment. We hope we have provided enough of an overview of these concepts here to make this process easier for others.

BLOG Maintained by - Vishal Sharma | GetQuickStart