MyTechNote

If any license is invalid or expired, get it FIXED before continuing! Every license should say "No expiration". If ANY license shows an actual expiration date, the test has failed. Permanent licenses do NOT have an expiration date. Non-essential licenses may be moved; however, a senior admin should do this.

1. Hand check SystemList & AutoStartList

On either machine:

grep SystemList /etc/VRTSvcs/conf/config/main.cf

The output should name both machines:

system1
system2

grep AutoStartList /etc/VRTSvcs/conf/config/main.cf

Again, the output should name both machines:

system1
system2

Each list should contain both machines. If not, many of the next tests will fail.
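The hand check can also be scripted. The following is a minimal Python sketch, not part of the original procedure. It assumes the usual main.cf layout in which each list is written as `Name = { ... }`; the sample text and the function names `list_members` and `missing_systems` are made up for illustration.

```python
import re

# Hypothetical main.cf excerpt; a real file contains many more attributes.
SAMPLE_MAIN_CF = """
group oragrp (
    SystemList = { system1 = 0, system2 = 1 }
    AutoStartList = { system1 }
    )
"""

def list_members(main_cf_text, attribute):
    """Return the system names found in `attribute = { ... }`, or [] if absent."""
    pattern = r"\b%s\s*=\s*\{([^}]*)\}" % re.escape(attribute)
    match = re.search(pattern, main_cf_text)
    if match is None:
        return []
    members = []
    for item in match.group(1).split(","):
        # SystemList entries look like "system1 = 0"; keep only the name.
        name = item.split("=")[0].strip()
        if name:
            members.append(name)
    return members

def missing_systems(main_cf_text, attribute, expected):
    """Return the expected systems that do not appear in the given list."""
    present = set(list_members(main_cf_text, attribute))
    return [name for name in expected if name not in present]

if __name__ == "__main__":
    for attr in ("SystemList", "AutoStartList"):
        missing = missing_systems(SAMPLE_MAIN_CF, attr, ["system1", "system2"])
        print(attr, "missing:", missing)
```

In the sample above, AutoStartList deliberately lacks system2, so the check reports it as missing.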

2. Verify Cluster is Running

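First verify that Veritas is up and running:

hastatus -summary

If the command is not found, add the VCS binary directory (normally /opt/VRTSvcs/bin) to root's PATH in /.profile, then reload the profile:

vi /.profile

. /.profile

Then run the command again:

hastatus -summary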

Here is the expected result (your SYSTEMs/GROUPs may vary). The group should be ONLINE on one system and OFFLINE on the other, i.e.:
# hastatus -summary

  -- SYSTEM STATE
  -- System               State                Frozen              

  A  e4500a               RUNNING              0                    
  A  e4500b               RUNNING              0                    

  -- GROUP STATE
  -- Group           System               Probed     AutoDisabled    State          

  B  oragrp          e4500a               Y          N               ONLINE         
  B  oragrp          e4500b               Y          N               OFFLINE
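For repeated checks, the summary can be parsed instead of read by eye. This is a rough Python sketch that assumes the column layout shown in the sample output above ("A" lines for systems, "B" lines for groups); the function names are hypothetical, and other VCS versions may print extra line types that this ignores.

```python
# Sample text copied from the expected output above.
SAMPLE_SUMMARY = """\
-- SYSTEM STATE
-- System               State                Frozen

A  e4500a               RUNNING              0
A  e4500b               RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  oragrp          e4500a               Y          N               ONLINE
B  oragrp          e4500b               Y          N               OFFLINE
"""

def parse_summary(text):
    """Return ({system: state}, {group: {system: state}}) from hastatus -summary text."""
    systems = {}
    groups = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "A":
            systems[fields[1]] = fields[2]
        elif len(fields) >= 6 and fields[0] == "B":
            groups.setdefault(fields[1], {})[fields[2]] = fields[5]
    return systems, groups

def looks_healthy(systems, groups, group):
    """True if every system is RUNNING and the group is ONLINE on exactly one."""
    all_running = bool(systems) and all(s == "RUNNING" for s in systems.values())
    online = [name for name, state in groups.get(group, {}).items() if state == "ONLINE"]
    return all_running and len(online) == 1

if __name__ == "__main__":
    systems, groups = parse_summary(SAMPLE_SUMMARY)
    print(systems, groups, looks_healthy(systems, groups, "oragrp"))
```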

If your systems do not show the above status, try these debugging steps:

If NO systems are up, run hastart on both systems and run hastatus -summary again.
If only one system is shown, start the other system with hastart. Note: the group should ALWAYS be OFFLINE on one system for the way we configure systems here. (If we ran Oracle Parallel Server, this could change, but currently we run standard Oracle server.)

If both systems are up but OFFLINE, hastart did NOT correct the problem, and the Oracle filesystems are not mounted on either system, the cluster needs to be reset. (This happens under strange network situations with GE Access.) In other words, you ran hastart and that was not enough to get the full cluster working.
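The debugging steps above amount to a small decision table. The sketch below restates them in Python for illustration only; the function name, its inputs and the wording of the returned hints are assumptions, and it does not run any cluster command itself.

```python
def next_step(expected_systems, system_states, group_states):
    """Suggest the next debugging step.

    expected_systems: names of the cluster nodes, e.g. ["system1", "system2"]
    system_states:    {system: state} as reported by hastatus -summary
    group_states:     {system: state} for the one service group being tested
    """
    not_running = [name for name in expected_systems
                   if system_states.get(name) != "RUNNING"]
    if len(not_running) == len(expected_systems):
        return "run hastart on both systems, then hastatus -summary again"
    if not_running:
        return "run hastart on: " + ", ".join(not_running)
    if "ONLINE" not in group_states.values():
        return "group is OFFLINE everywhere: if hastart did not help, reset the cluster"
    return "ok"

if __name__ == "__main__":
    print(next_step(["system1", "system2"],
                    {"system1": "RUNNING"},
                    {"system1": "OFFLINE"}))
```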

Verify that the systems have the following EXACT status (though your machine names will vary for other customers):

gedb002# hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen              

A  gedb001              RUNNING              0                    
A  gedb002              RUNNING              0                    

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  oragrp          gedb001              Y          N               OFFLINE
B  oragrp          gedb002              Y          N               OFFLINE

gedb002#  hares -display | grep  ONLINE
nic-qfe3  State           gedb001   ONLINE
nic-qfe3  State           gedb002   ONLINE

gedb002# vxdg list
NAME         STATE           ID
rootdg       enabled  957265489.1025.gedb002

gedb001# vxdg list
NAME         STATE           ID
rootdg       enabled  957266358.1025.gedb001

Recovery Commands:

hastop -all

hastart

hastatus -summary

If none of these steps resolved the situation, contact Lorraine or Luke (possibly Russ Button or Jen Redman if they made it to Veritas Cluster class) or a Veritas Consultant.

3. Verify Services Can Switch Between Systems

Once hastatus -summary works, note the GROUP name used. Usually it will be "oragrp", but the installer can use any name, so please determine its name.

First check that the group can switch back and forth. On the system where the group is online (system1), switch it to the other system (system2):

hagrp -switch groupname -to system2

Watch failover with hastatus -summary. Once it is failed over, switch it back:

hagrp -switch groupname -to system1
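Watching the switch by hand means re-running hastatus -summary until the group comes online on the target. A small polling helper can do the waiting. This is only a sketch: `read_state` is a hypothetical callable that you supply (for example one that runs hastatus -summary and returns the group's state on a system), and the timeout and interval defaults are arbitrary assumptions.

```python
import time

def wait_for_group(read_state, system, wanted="ONLINE",
                   timeout=600.0, interval=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll read_state(system) until it returns `wanted`.

    Returns True once the wanted state is seen, False if `timeout`
    seconds pass first. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if read_state(system) == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

if __name__ == "__main__":
    # Fake state source standing in for a real hastatus query.
    fake_states = iter(["OFFLINE", "OFFLINE", "ONLINE"])
    print(wait_for_group(lambda system: next(fake_states), "system2",
                         sleep=lambda seconds: None))
```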

4. Verify OTHER System Can Go Up & Down Smoothly For Maintenance

On system that is OFFLINE (should be system 2 at this point), reboot the computer.

ssh system2

/usr/sbin/shutdown -i6 -g0 -y

Make sure the system comes back up and is running after the reboot. That is, when the reboot is finished, hastatus should show the second system as RUNNING with the group OFFLINE on it.

hastatus -summary

Once this is done, switch the group to system2 and repeat the reboot for the other system:

hagrp -switch groupname -to system2

ssh system1

/usr/sbin/shutdown -i6 -g0 -y

Verify that system1 is in cluster once rebooted

hastatus -summary

5. Test Actual Failover For System 2 (and pray db is okay)

To do this, we will kill off the listener process, which should force a failover. This test SHOULD be okay for the db (that is why we choose LISTENER) but there is a very small chance things will go wrong .. hence the "pray" part :).

On system that is online (should be system2), kill off ORACLE LISTENER Process

ps -ef | grep LISTENER

Output should be like:

  root  1415   600  0 20:43:58 pts/0    0:00 grep LISTENER
  oracle   831     1  0 20:27:06 ?        0:00 /apps/oracle/product/8.1.5/bin/tnslsnr LISTENER -inherit

kill -9 process-id
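Picking the process id out of the ps listing is easy to get wrong, because the grep command itself also matches. The Python sketch below selects only lines whose command is tnslsnr. It assumes the eight-column Solaris `ps -ef` layout shown above; the function name is hypothetical. A start time printed as two words (for example a month and a day) shifts the columns, in which case this sketch finds nothing rather than returning a wrong pid.

```python
# Sample text copied from the output shown above.
SAMPLE_PS = """\
  root  1415   600  0 20:43:58 pts/0    0:00 grep LISTENER
  oracle   831     1  0 20:27:06 ?        0:00 /apps/oracle/product/8.1.5/bin/tnslsnr LISTENER -inherit
"""

def listener_pids(ps_output):
    """Return the pids of tnslsnr LISTENER processes found in ps -ef output."""
    pids = []
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        command = fields[7]
        program = command.split()[0]
        if program.endswith("tnslsnr") and "LISTENER" in command:
            pids.append(int(fields[1]))
    return pids

if __name__ == "__main__":
    print(listener_pids(SAMPLE_PS))
```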

Failover will take a few minutes

You will note that system 2 is faulted -- and system 1 is now online

You need to CLEAR the fault before trying to fail back over.

hares -display | grep FAULT

hares -clear resource-name -sys faulted-system
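If several resources are faulted, the hares -clear commands can be generated from the hares -display output. This sketch assumes the four-column layout (resource, attribute, system, value) seen in the hares -display sample earlier in this document; the resource name `oralsnr` in the sample is invented, and the code only builds the command lines without running them.

```python
# Hypothetical hares -display lines; "oralsnr" is a made-up resource name.
SAMPLE_HARES = """\
nic-qfe3  State           gedb001   ONLINE
nic-qfe3  State           gedb002   ONLINE
oralsnr   State           gedb002   FAULTED
"""

def clear_commands(hares_display_output):
    """Build one `hares -clear` argument list per faulted resource/system pair."""
    commands = []
    for line in hares_display_output.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[1] == "State" and "FAULT" in fields[3]:
            resource, system = fields[0], fields[2]
            commands.append(["hares", "-clear", resource, "-sys", system])
    return commands

if __name__ == "__main__":
    for command in clear_commands(SAMPLE_HARES):
        print(" ".join(command))
```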

6. Test Actual Failover For System 1 (and pray db is okay)

Now we do the same thing for the other system. First verify that the other system is NOT faulted:

hastatus -summary

Now kill off the listener process on this system, which should force a failover.

On the system that is online (should be system1), kill off the ORACLE LISTENER process:

ps -ef | grep LISTENER

Output should be like:

  oracle   987     1  0 20:49:19 ?        0:00 /apps/oracle/product/8.1.5/bin/tnslsnr LISTENER -inherit
  root  1330   631  0 20:58:29 pts/0    0:00 grep LISTENER

kill -9 process-id

Failover will take a few minutes

You will note that system 1 is faulted -- and system 2 is now online

You need to CLEAR the fault before trying to fail back over.

hares -display | grep FAULT

hares -clear resource-name -sys faulted-system

Run:

hastatus -summary

to make sure everything is okay.


Pages are copied from UnixTools.com

A good explanation of MULTINICA and MULTINICB in VCS

MULTINICA & MULTINICB IN VCS

( from http://unixtips.hpage.co.in/multinic_42103512.html )

The MultiNICA agent represents a set of network interfaces and provides failover capabilities between them. You can use one base IP address for all NICs, or you can specify a different IP address for use with each NIC. The MultiNICA agent configures one interface at a time. If it does not detect activity on the configured interface, it configures a new interface and migrates IP aliases to it.
Here is the example MultiNICA resource that I have configured. I am going to use one base IP for both NICs. Here linux641 is server 1 and linux642 is server 2.

These are the sample MultiNICA attributes. 192.168.0.101 is the base IP of the linux641 server, and the IP moves between eth0 and eth3 when there is a failure.
Below is the sample main.cf output for MultiNICA.

Below is the IPMultiNIC resource that is configured. It points to the MultiNICA resource (nica).

Here the IP 192.168.0.1 floats between eth0 and eth3 along with the base IP.

MULTINICB (Link-based IPMP setup with VCS)
With Solaris 10 came a nice feature – Link-based IP Multipathing (IPMP). It determines NIC availability solely on the NIC driver reporting the physical link status – UP or DOWN. Previous versions used “probe-based” IPMP, where connectivity is tested by pinging something on the network from each interface. While probe-based is actually a more thorough test (tests network layer 3 as well as 2), it is much more cumbersome to configure, and you need an extra IP address for each interface for “test” addresses. IMO Link-based IPMP is sufficient for most applications.
For some reason, configuring link-based IPMP in VCS is somewhat tricky, and the documentation doesn’t seem to help much. It seems all the default values for VCS are for probe-based IPMP only.
To achieve link-based IPMP, here’s how I’ve configured my MultiNICB resource:

These are the values you must change from the defaults:
UseMpathd: 1
Tells VCS to use mpathd for network link status.
MpathdCommand: /usr/lib/inet/in.mpathd -a
The default, /usr/sbin/in.mpathd, is just incorrect – it doesn’t live there.
ConfigCheck: 0
If you leave this at 1, it will overwrite your /etc/hostname.xxx files with probe-based IPMP configuration.
Device: (your IPMP interfaces here)
The “interface alias” for each device is not needed; leave them blank.
IgnoreLinkStatus: 0
You want VCS to NOT ignore link status, since this is how link-based IPMP works.
GroupName:
Do not use your IPMP group name here; it’s not needed. VCS is not monitoring the group, mpathd is.
Here’s how it looks in main.cf:
MultiNICB csgmultinic (
UseMpathd = 1
MpathdCommand = "/usr/lib/inet/in.mpathd -a"
ConfigCheck = 0
Device = { ce0 = "", ce4 = "" }
IgnoreLinkStatus = 0
)
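Since the link-based setup depends on a handful of non-default values, it can help to check a resource definition against them. The following Python sketch is an illustration, not a VCS tool. It reads the simple `Name = value` lines of a resource block (with plain ASCII quotes, as main.cf requires) and reports any that differ from the values listed above; the function names and the `WANTED` table are assumptions made for this example.

```python
import re

# The resource definition from above, written with plain ASCII quotes.
SAMPLE_RESOURCE = '''\
MultiNICB csgmultinic (
    UseMpathd = 1
    MpathdCommand = "/usr/lib/inet/in.mpathd -a"
    ConfigCheck = 0
    Device = { ce0 = "", ce4 = "" }
    IgnoreLinkStatus = 0
    )
'''

# Values the text above says are needed for link-based IPMP.
WANTED = {
    "UseMpathd": "1",
    "MpathdCommand": "/usr/lib/inet/in.mpathd -a",
    "ConfigCheck": "0",
    "IgnoreLinkStatus": "0",
}

def scalar_attributes(block):
    """Return {name: value} for single-valued lines; { ... } lists are skipped."""
    attrs = {}
    for line in block.splitlines():
        match = re.match(r'\s*(\w+)\s*=\s*("?)([^"{}]*)\2\s*$', line)
        if match:
            attrs[match.group(1)] = match.group(3).strip()
    return attrs

def deviations(block):
    """Return {name: (found, wanted)} for every attribute that differs."""
    attrs = scalar_attributes(block)
    return {name: (attrs.get(name), wanted)
            for name, wanted in WANTED.items()
            if attrs.get(name) != wanted}

if __name__ == "__main__":
    print(deviations(SAMPLE_RESOURCE) or "all link-based settings present")
```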

Saturday, July 6, 2013

About ZFS

Thursday, July 4, 2013

Rational ClearCase VOBs

What's up with LDoms

Part 1 - Introduction & Basic Concepts
Part 2 - Creating a first, simple guest
Part 3 - A closer look at Disk Backend Choices
Part 4 - Virtual Networking Explained
Part 5 - A few Words about Consoles
Part 6 - Sizing the IO Domain
Part 7 - Layered Virtual Networking

Wednesday, July 3, 2013

Solaris ILOM / ALOM Cheat Sheet

ILOM ALOM CMT Command Comparison

ALOM:          ILOM:
setdate        set /SP/clock datetime=value
               value format: MMDDhhmmYYYY
 
setdefaults    set /SP reset_to_defaults=all
               -> reset /SP This resets the SP
 
setkeyswitch   set /SYS keyswitch_state=value
                value= normal, diag, stby, locked
 
setsc          set target property=value
setupsc        No equivalent in ILOM
setlocator     set /SYS/LOCATE value= Fast_Blink or off
setfru -c data set /SYS customer_frudata=data 
showplatform   show /HOST
showplatform   show /SYS ( to view Serial Number )
showfru        No equivalent in ILOM
showusers -g # show /SP/sessions
showhost       show /HOST
showkeyswitch  show /SYS keyswitch_state
 
showsc param   show target property
VIEW DIAG LEVEL show /HOST/diag
 
setsc diag_trigger set /HOST/diag trigger=All-resets
                 none, normal, User-reset, Power-on-reset, 
                 Error-reset
 
showdate       show /SP/clock datetime
 
showlogs       show /SP/logs/event/list
showlogs       show /SP/faultmgmt
               set /SP/logs/event clear=true
 
showenvironment show -o table -level all /SYS
shownetwork    show /SP/network
showlocator    show /SYS/LOCATE
password       set /SP/users/ password
restartssh     set /SP/services/ssh restart_sshd_action=true
usershow       show /SP/users
useradd user   create /SP/users/
 
Create "admin" create /SP/users/admin
               set /SP/users/admin role=Administrator
               set /SP/users/admin cli_mode=alom
 
userdel user   delete /SP/users/
userdel -y     delete -script /SP/users/
userpassword   set /SP/users/ password
userperm user  set /SP/users/ role=permissions
consolehistory SEE RENE FOR MORE INFO ;)
console -f     start -force /SP/console
break -c       set /HOST send_break_action=break
break -D       set /HOST send_break_action=dumpcore
 
bootmode       set /HOST/bootmode property=value
                state=value "reset_nvram or normal"
                script="setenv auto-boot? false"
 
flashupdate -s load -source tftp://ipaddr/pathname
reset -c       reset /SYS
reset -y -c    reset -script /SYS
powercycle     stop /SYS
powercycle -y  stop -script /SYS
powercycle -f  stop -force /SYS
               start -force /SYS
poweroff       stop /SYS
poweroff -y    stop -script /SYS
poweroff -f    stop -force /SYS
poweron        start /SYS
clearfault uuid set /SYS/component clear_fault_action=true
removefru -y   set /SYS/PS0 prepare_to_remove_action=true
enablecomponent set /SYS/component component_state=enabled
disablecomp    set /SYS/component component_state=disabled
clearasrdb     No equivalent in ILOM
resetsc        reset /SP
resetsc -y     reset -script /SP
userclimode    set /SP/users/ cli_mode=default or alom
logout         exit
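The setdate row in the table above gives the ILOM clock value format as MMDDhhmmYYYY. As a small illustration, this Python sketch formats a timestamp that way and builds the matching command string; the function names are made up for the example, and nothing here talks to a service processor.

```python
from datetime import datetime

def ilom_datetime(moment):
    """Format a datetime as MMDDhhmmYYYY, the layout given for /SP/clock datetime."""
    return moment.strftime("%m%d%H%M%Y")

def set_clock_command(moment):
    """Build the ILOM command string that sets the SP clock to `moment`."""
    return "set /SP/clock datetime=" + ilom_datetime(moment)

if __name__ == "__main__":
    print(set_clock_command(datetime.now()))
```

For example, 3 July 2013 at 14:05 becomes 070314052013.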
 
DISPLAYING DIMM INFORMATION:
 
-> show /SYS/MB/CMP0/BR0/CH0/D#
 
    Targets:
        SEEPROM
        SERVICE
        PRSNT
        T_AMB
 
    Properties:
        type = DIMM
        component_state = Enabled
        fru_name = 4096MB DDR2 SDRAM FB-DIMM 333 (PC2 2600)
        fru_description = FBDIMM 4096 Mbyte
        fru_manufacturer = Samsung
        fru_version = FFFFFF
        fru_part_number = 501-7954-01 Rev 05
        fru_serial_number = 00CE01074627037EA3
        fault_state = OK
        clear_fault_action = (none)
 
Setting up Network Management Port ILOM:
 
-> set pendingipaddress=
-> set pendingipdiscovery=static
-> set pendingipnetmask=255.255.255.0
-> set pendingipgateway=
-> set commitpending=true
 
Setting up Network Management Port ALOM:
 
sc> setsc if_network true
sc> setsc if_connection "telnet or ssh"
sc> setsc netsc_dhcp false
sc> setsc netsc_ipaddr 
sc> setsc netsc_ipnetmask 
sc> setsc netsc_ipgateway 
sc> setsc netsc_commit
 
 
ALOM CMT Variable Comparison
 
ALOM:          ILOM:
diag_level      /HOST/diag level
diag_mode       /HOST/diag mode
diag_trigger    /HOST/diag trigger
diag_verbosity  /HOST/diag verbosity
if_connection   /SP/services/ssh state
if_emailalerts  /SP/clients/smtp state
if_network      /SP/network state
if_snmp         /SP/services/snmp
mgt_mailalert   /SP/alertmgmt/rules
mgt_mailhost    /SP/clients/smtp address
mgt_snmptraps   /SP/services/snmp v1|v2c|v3
mgt_traphost    /SP/alertmgmt/rules
                /SP/services/snmp port
netsc_dhcp      /SP/network pendingipdiscovery
netsc_commit    /SP/network commitpending=true
netsc_enetaddr  /SP/network macaddress
netsc_ipaddr    /SP/network pendingipaddress
netsc_ipgateway /SP/network pendingipgateway
netsc_ipnetmask /SP/network pendingipnetmask
sc_backupuserdata /SP BACKUP_USER_DATA
sc_customerinfo /SP system_identifier
sc_escapechars  /SP/console escapechars
sc_powerondelay /SP/policy HOST_POWER_ON_DELAY
 
sc_powerstatememory /SP/policy HOST_LAST_POWER_STATE
                   States= enabled or disabled
 
ser_baudrate    /SP/serial/external pendingspeed
ser_data        No equivalent in ILOM
ser_parity      /SP/serial/external pendingparity
ser_stopbits    /SP/serial/external pendingstopbits
sys_autorestart /SP autorestart
sys_autorunonerror /SP autorunonerror
sys_eventlevel  No equivalent in ILOM
sys_enetaddr    /HOST macaddress
 
Procedure to set the Serial Number after PDB replacement:
 
sc> setsc sc_servicemode true
Warning: misuse of this mode may invalidate your warranty.
sc> setcsn -c chassis_serial_number
Are you sure you want to permanently set the Chassis Serial 
Number to chassis_serial_number[y/n]? y
Chassis serial number recorded.
sc> showplatform
SUNW,Sun-Fire-T5120
Chassis Serial Number: chassis-serial-number
Domain Status
------ ------
S0 Running
sc> setsc sc_servicemode false
 
HOW TO RESET ILOM PASSWORD:
InfoDoc #: 209731
 
Power down the host system (using the front panel power button)
or, if an SP admin account exists, you can alternatively use that
account's ALOM Command Line Interface poweroff command.
 
Unplug the system's power cord(s). Remove the system's top cover.
 
On a T5120/T5220, insert a jumper (you provide the jumper) on
pins 1 & 2 of PJ6801. This jumper is located on the motherboard.
 
On a T5140/T5240, insert a jumper (you provide the jumper) on
pins 1 & 2 of J10401. This jumper is located on the SP.
 
The jumper is near the edge of the motherboard at the rear of the
system, at the center of the rear edge of the motherboard.
 
Plug in the system's power cord(s).
 
Press the front panel Power button to power on the system.
  You must power on the system to complete the reset.
 
This is because the state of the PJ6801 jumper cannot be
determined without the host CPU running.
 
The SP root password will be reset to the default  changeme.
 
Log in as root into the SP, using any available method - ssh
or a Web Browser to the SP's network management port or via
tip-hardware or a terminal server to the SP's serial management
port.
 
The password to use is changeme.
 
This login is simply to confirm that the changeme password works.
 
Power down the system using the front panel
 
Unplug the system's power cord(s)
 
Remove the PJ6801 jumper. You must remove the PJ6801 jumper
after resetting the password, or the password will be reset
every time the SP is reset (e.g. at power up).
 
Replace the system's top cover.
 
Plug in the system's power cord(s).
 
If the system administrator would like the SP's root account
password changed to something other than changeme, change it
using the usual Command Line Interface command from the SP's
root account. Whether or not the password is changed, once the
top cover has been reinstalled and the system's power cord(s)
plugged in, the host can be powered up using the front panel
power button, or via a login to the SP's root or admin account
using the appropriate start or poweron command, respectively.

Sunday, July 14, 2013

Black Sheep Networks Resources

Testing Veritas Cluster

0. Check Veritas Licenses - for FileSystem, Volume Manager AND Cluster

1. Hand check SystemList & AutoStartList

2. Verify Cluster is Running

3. Verify Services Can Switch Between Systems

4. Verify OTHER System Can Go Up & Down Smoothly For Maintenance

5. Test Actual Failover For System 2 (and pray db is okay)

6. Test Actual Failover For System 1 (and pray db is okay)
