Friday, February 23, 2007

How does NetBackup work?

I was trying to simplify my understanding of the NetBackup processes. Here's what I could manage.

On the master server

bprd must be running.

ltid must be running.

Scheduled backup operations

bprd -> bpsched -> consults bpdbm to determine:
1. which jobs need to be run.
2. the requirements of the jobs to be run.

This is automated and is governed by the global wakeup interval. If bpsched finds that there is no job to be done, it quietly exits.

bpsched (let's call it bpsched-A) now starts another bpsched (let's call it bpsched-B), which is given the onus of executing the tasks.

bpsched-B also spawns multiple bpscheds, one for each task it needs to run.

Each of these bpsched processes starts a bpcd process on the media server required by the policy. The media server is determined by the storage unit.

bpcd -> bpbrm -> bptm (on media server) -> bpcd (on client) -> bpbkar starts sending data ->
metadata to bpbrm -> bpdbm (updates the catalog)

At the same time:

bptm (on media server) -> vmd (determines the media to be used).
bptm (on media server) -> ltid (robotic communication, tape request).
bptm (on media server) -> tpreq (requests the tape mount).

A bptm child process receives the data from bpbkar on the client and writes it out to the data buffers.

Once bpbkar stops sending data to the bptm on the media server, it exits. The bpsched child then reports to its parent, bpsched-B, and bpsched-B checks whether any other jobs it scheduled are still running. If not, it signals bpsched-A.

In case there are no more jobs to be scheduled, bpsched-A exits.
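None of this flow can start if the two master-server daemons listed at the top aren't up, so that is the first thing worth checking when a schedule misfires. A minimal sketch, using plain ps/grep so it runs anywhere (NetBackup ships its own bpps for the same job):

```shell
# Sanity check before troubleshooting a scheduled backup: confirm that
# bprd and ltid are running on the master server.
for d in bprd ltid; do
    if ps -e | grep -qw "$d"; then
        echo "$d is running"
    else
        echo "$d is NOT running"
    fi
done
```

On a box without NetBackup installed, both lines will of course report NOT running.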

References : the Veritas NetBackup Troubleshooting Guide and this post from Kemal Badur.

Wednesday, February 21, 2007

Off-Host Snapshot

I was just revising my VxVM fundas, since it has been quite some time since I actually did this. I've been more involved with backup of late.

FastResync/FlashSnap is a licensed feature. Let's discuss only the third-mirror break-off snapshot, since that is the only variant I have worked on.

Step 1 : Test bed starts with creation of a volume.

vxassist -g datadg make vol01 10g

Step 2 : Enable Fast Resync

vxsnap -g datadg prepare vol01

This also adds a DCO (Data Change Object), which tracks changes and is persistent across reboots.

Step 3 : Add mirror to the volume.

vxsnap -g datadg addmir vol01

This will add a mirror plex to vol01; let's call it vol01-02.

Step 4 : Now create an instant full-sized snapshot.

vxsnap -g datadg make source=vol01/newvol=snapvol01/plex=vol01-02

Step 5 : Off host snap shot.

vxdg split datadg offdg snapvol01

Step 6 : Deport the DG

vxdg deport offdg

Step 7 : On the remote host, import the DG and mount it. Use this copy to perform backup operations, MIS, etc.

Step 8 : Deport the dg on the remote host.

vxdg deport offdg

Step 9 : Import it back on the original host.

vxdg import offdg

Step 10 : Join the DG's

vxdg join offdg datadg

Step 11 : Start the snapshot vol

vxvol -g datadg startall

Step 12 : Refresh the snapshot volume with the original volume.

vxsnap -g datadg refresh snapvol01 source=vol01
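The twelve steps above can be collected into a single script. Here is a dry-run sketch using the names from this post (datadg, vol01, snapvol01, offdg); the run wrapper only echoes each command, so nothing touches your disks until you change echo to "$@":

```shell
# Dry run of the off-host snapshot cycle.  DG/volume/plex names are the
# ones used in this post; swap in your own.
run() { echo "+ $*"; }

run vxassist -g datadg make vol01 10g         # 1. create the volume
run vxsnap -g datadg prepare vol01            # 2. add a DCO, enable FastResync
run vxsnap -g datadg addmir vol01             # 3. add the snapshot mirror (vol01-02)
run vxsnap -g datadg make source=vol01/newvol=snapvol01/plex=vol01-02   # 4
run vxdg split datadg offdg snapvol01         # 5. split the snapshot into its own DG
run vxdg deport offdg                         # 6. hand it to the off-host system
# steps 7-8: import, mount, back up and deport on the remote host
run vxdg import offdg                         # 9. back on the original host
run vxdg join offdg datadg                    # 10. merge the DGs again
run vxvol -g datadg startall                  # 11. restart the snapshot volume
run vxsnap -g datadg refresh snapvol01 source=vol01   # 12. resync from vol01
```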

Difference between UFS and VxFS :-

The following differences summarize the tests done by Veritas.

1. VxFS recovers faster from unusual events like disk corruption.
2. VxFS is 8-20% faster than UFS on database performance.
3. fsck on VxFS is up to 6 times faster than on UFS with logging.
4. VxFS performs better during multi-process operations: 300% better than UFS under concurrent loads.
5. Enhanced mount options like blkclear, closesync.
6. Administration is online: resizing, defragmentation, etc.

The most prominent difference is the extent-based allocation that is the default in VxFS, as opposed to the block-based allocation of UFS.

An extent is a cluster of contiguous blocks. Writes to extents are faster during multiple operations since more disks can respond. Extents are also useful when files are large and when the FS is subjected to sequential I/O.

Sunday, February 04, 2007

SFCFS – Storage Foundation Cluster File System

SFCFS is a product for sharing a single file system among multiple nodes. LUNs from a storage subsystem are shared among the various nodes in a cluster, and SFCFS maintains consistency on the FS shared by those nodes.

Say node-A, node-B and node-C share a file system /users; if node-A goes down, users can still access the /users filesystem on node-B and node-C in the cluster.

How does SFCFS benefit?

  1. Load balancing of applications.
  2. The shared FS can be resized dynamically.
  3. Easy cluster extension, since a new node can simply mount the configured FS.
  4. Reduced downtime for planned maintenance on systems.

In this short tutorial, I will explain the key components of the SFCFS architecture, walk through a simple example, and cover some caveats to take care of during installation.

Part 1 – SFCFS key components

SFCFS includes component products like Veritas Cluster Server (VCS), Cluster File System (CFS) and the cluster functionality of Veritas Volume Manager (CVM).

  1. VCS manages the communication between nodes and the mechanism to add and register nodes.
  2. CFS manages the coherency in metadata on the shared filesystem. The node that mounts the FS first becomes the primary node, and is responsible for updating the metadata on the shared FS.
  3. CVM makes the logical volumes accessible throughout the cluster.

Part 2 – Membership ports

Each component in SFCFS registers with a membership port. Knowing the various ports and their significance helps in troubleshooting and in ensuring appropriate membership.

port a – heartbeat membership.

port b – I/O fencing membership.

port f – Cluster file systems membership.

port u – temporary port used by CVM.

port v – CVM membership.

port q – used by CVM
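On a running node, gabconfig -a shows which of these ports have actually formed membership. A small sketch that filters such a listing down to the essentials; the sample output here is hand-written to match the port list above (generation numbers invented), not captured from a real cluster:

```shell
# Filter a gabconfig -a listing to port letters and member nodes.
# On a live node you would pipe the real command instead:
#   gabconfig -a | awk '/^Port/ { ... }'
sample='GAB Port Memberships
===============================================================
Port a gen a36e0003 membership 01
Port b gen a36e0006 membership 01
Port f gen a36e000b membership 01
Port v gen a36e0008 membership 01'

echo "$sample" | awk '/^Port/ { print "port", $2, "up on nodes:", $NF }'
```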

Part 3 – Installing SFCFS

Although the entire installation process cannot be detailed in this tutorial, a few key steps will be highlighted.

Select a node (preferably the one that will become the primary) to begin installation. Enable direct login via rsh to all other nodes from this node.

Mount the CDROM and proceed by running the installer script.

Enter the system names, separated by spaces, that are to be included in the cluster.

Select "n" for I/O fencing, since it has to be configured manually later.

At our place we had uninstalled all Veritas components prior to installing SFCFS; if you have VxVM present on the systems, you will be asked to choose a new or existing configuration.

Enter a cluster name : core_compute

Enter a cluster id : 123

Configure your heartbeat links appropriately. Heartbeat links are private channels used for communication by the VCS. Keep a list of interfaces ready for configuration as link1 and link2 for each node.

For further help, refer to the SFCFS installation and administration guides.

Part 4 – Verifying the installation

a. LLT

The following files are important

1. /etc/llthosts – This file is just like /etc/hosts, except that instead of mapping IPs to hostnames, it maps LLT node numbers (as set by set-node) to hostnames. VCS needs this file in order to start. It should look like this:

0       coredev05
1       coredev06

2. /etc/llttab - the low-latency transport configuration file

# this sets our node ID, must be unique in the cluster
set-node 0
# set the heartbeat links
link hme1 /dev/hme:1 - ether - -
# link-lowpri is for public networks
link-lowpri hme0 /dev/hme:0 - ether - -
# set the cluster number, must be unique
set-cluster 123
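One mistake that bites here is a duplicated node ID between llthosts and set-node. A tiny check, run against the example llthosts contents above (on a real node you would read /etc/llthosts instead of the variable):

```shell
# Detect duplicate LLT node IDs in an llthosts-style listing.
llthosts='0       coredev05
1       coredev06'

dups=$(echo "$llthosts" | awk '{ print $1 }' | sort | uniq -d)
if [ -z "$dups" ]; then
    echo "llthosts node IDs are unique"
else
    echo "duplicate node IDs: $dups"
fi
```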

b. GAB

GAB requires only one configuration file, /etc/gabtab.

This file seeds GAB with the expected number of nodes in the cluster:

/sbin/gabconfig -c -n3
c. Who's the master?

Type vxdctl -c mode on any node to see whether it is the master or a slave.
Part 5 - IO Fencing

IO fencing is a great and reliable way to prevent split brain during jeopardy. Imagine if all heartbeat links fail: you could end up with data corruption, since the nodes wouldn't be able to update metadata appropriately.

IO fencing needs an odd number of disks, at least 3 of them, configured as coordinator disks. The coordinator disks need to support SCSI-3 PGR reservations. Choose 20 MB LUNs as coordinator disks. You could even use smaller LUNs, but we have run into problems with small LUNs.

We nowadays use 50 MB LUNs to avoid any last-minute blues.

Use the vxfentsthdw utility to test disks for SCSI-3 PGR reservations. Use this tool carefully, since it wipes out all data on the disks.
Add all the selected disks to a disk group using VxVM commands, on any node
# vxdg init coorddg disk01=c7t11d3 disk02=c7t11d4 disk03=c7t11d5
Deport the disk group coorddg.
# vxdg deport coorddg
Re-import the disk group with the -t (temporary) option to prevent it from being auto-imported during reboot.
# vxdg -t import coorddg
Deport it again to prevent access by anybody.
# vxdg deport coorddg
Add the coordinator disk group name to the file /etc/vxfendg on each system.
# echo "coorddg" > /etc/vxfendg
On each system start IO fencing.
/etc/init.d/vxfen start
This creates the file /etc/vxfentab, listing the disks in the DG named in /etc/vxfendg. In this example, /etc/vxfentab would list the raw device paths of the three coordinator disks (c7t11d3, c7t11d4 and c7t11d5).
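The coordinator-disk setup above, collected into one dry-run sequence (disk names are the ones from this post; the run wrapper echoes each command rather than executing it, and the vxfendg file is written to /tmp here instead of /etc):

```shell
run() { echo "+ $*"; }        # swap echo for "$@" to really run them

run vxdg init coorddg disk01=c7t11d3 disk02=c7t11d4 disk03=c7t11d5
run vxdg deport coorddg
run vxdg -t import coorddg    # -t: don't auto-import at reboot
run vxdg deport coorddg
echo "coorddg" > /tmp/vxfendg.example    # really /etc/vxfendg, on each node
run /etc/init.d/vxfen start              # creates /etc/vxfentab
```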
Part 6 – Regular Operations

That was all about installing and configuring IO fencing. Now to share and mount data disk groups.

a. To add a disk group bijudg with volume vol1 to the configuration and mount it, perform the following steps.
# cfsmntadm add bijudg vol1 /testmount all=rw

This will associate the mount point /testmount with the cluster.

# cfsmount /testmount
# df -k /testmount
/dev/vx/dsk/bijudg/vol1  15482443      10 15327609     1%    /testmount
b. Resizing a FS

First verify that you are resizing the FS on the primary node for that filesystem:

# vxdctl -c mode
# fsclustadm -v showprimary /testmount

Then use the vxresize command:

# vxresize -g bijudg vol1 +5g
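Put together, the resize procedure is short. Another dry-run sketch with the example names from above (bijudg, vol1, /testmount); again the commands are only echoed:

```shell
run() { echo "+ $*"; }        # dry-run wrapper

run vxdctl -c mode                          # am I the CVM master?
run fsclustadm -v showprimary /testmount    # which node is the CFS primary?
run vxresize -g bijudg vol1 +5g             # grow vol1 and its FS by 5 GB
```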

That's all, folks!