UNIX ADMIN RESOURCES [all local copies]
Some Old stuff Copied from Black Sheep Networks
Testing Veritas Cluster
Actual commands are in black.
0. Check Veritas Licenses - for FileSystem, Volume Manager AND Cluster
vxlicense -p
If any licenses are invalid or expired -- get them FIXED before continuing! All licenses should say "No expiration". If ANY license has an actual expiration date, the test failed. Permanent licenses do NOT have an expiration date. Non-essential licenses may be moved -- however, a senior admin should do this.
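The license check above can be scripted roughly as follows. This is only a sketch: the function name check_licenses is made up for illustration, and it assumes each license reports its expiry on a line containing the word "expiration", which is a guess about the report layout rather than something stated here.

```shell
#!/bin/sh
# check_licenses: read a license report on stdin and fail if any
# expiry line does not say "No expiration".
# Assumption: every license prints a line containing "expiration";
# the real report layout may differ.
check_licenses() {
    # Keep the expiry lines, then drop the ones that say "No expiration".
    bad=$(grep -i 'expiration' | grep -iv 'no expiration' || true)
    if [ -n "$bad" ]; then
        echo "FAIL: license with an expiration date found:"
        echo "$bad"
        return 1
    fi
    echo "OK: no expiring licenses found"
}
```

Typical use would be: vxlicense -p | check_licenses. Note that a report with no expiry lines at all also passes, so still read the output by eye the first time.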
1. Hand check SystemList & AutoStartList
On either machine:
-
grep SystemList /etc/VRTSvcs/conf/config/main.cf
You should get:
SystemList = { system1, system2 }
grep AutoStartList /etc/VRTSvcs/conf/config/main.cf
You should get:
AutoStartList = { system1, system2 }
Each list should contain both machines. If not, many of the next tests will fail.
-
If your lists do NOT contain both systems, you will probably need to modify them with
commands that follow.
-
more /etc/VRTSvcs/conf/config/main.cf (See if it is reasonable. It is
likely that the systems aren't fully set up)
haconf -makerw (this lets you write the conf file)
hagrp -modify oragrp SystemList system1 0 system2 1
hagrp -modify oragrp AutoStartList system1 system2
haconf -dump -makero (this makes conf file read only again)
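The hand check in this step can be sketched as a small shell function. The name list_has_both is hypothetical, and the match is a plain substring test, so "system1" would also match "system10"; treat it as a quick sanity check only.

```shell
#!/bin/sh
# list_has_both LISTNAME SYS1 SYS2: read main.cf text on stdin and
# succeed only if the first LISTNAME line names both systems.
# Assumption: the list sits on a single line, as in the grep output above.
list_has_both() {
    line=$(grep "$1" | head -n 1)
    case "$line" in
        *"$2"*) ;;
        *) echo "$1 is missing $2"; return 1 ;;
    esac
    case "$line" in
        *"$3"*) ;;
        *) echo "$1 is missing $3"; return 1 ;;
    esac
    echo "$1 contains $2 and $3"
}
```

For example: list_has_both SystemList system1 system2 < /etc/VRTSvcs/conf/config/main.cf, and the same again with AutoStartList.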
2. Verify Cluster is Running
First verify that veritas is up & running:
-
hastatus -summary
-
If this command could NOT be found, add the following to root's
path in /.profile:
-
vi /.profile
add /opt/VRTSvcs/bin to your PATH variable
-
PATH=/usr/bin:/usr/sbin:/usr/ucb:/usr/local/bin:/opt/VRTSvcs/bin:/sbin:$PATH
export PATH
hastatus -summary
Here is the expected result (your SYSTEMs/GROUPs may vary):
One system should be OFFLINE and one system should be ONLINE ie:
# hastatus -summary
-- SYSTEM STATE
-- System               State                Frozen
A  e4500a               RUNNING              0
A  e4500b               RUNNING              0
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
B  oragrp          e4500a               Y          N               ONLINE
B  oragrp          e4500b               Y          N               OFFLINE
If your systems do not show the above status, try these debugging steps:
- If NO systems are up, run hastart on both systems and
run hastatus -summary again.
- If only one system is shown, start other system with hastart. Note: one system should ALWAYS be OFFLINE for the way we configure systems here. (If we ran
oracle parallel server, this could change -- but currently we run
standard oracle server)
- If both systems are up but are OFFLINE and hastart did NOT correct the problem and oracle filesystems are not running on either system, the cluster needs to be reset.
(This happens under strange network situations with GE Access.) [You ran hastart and that wasn't enough to get full cluster to work.]
Verify that the systems have the following EXACT status (though your machine names will vary for other customers):
gedb002# hastatus -summary
-- SYSTEM STATE
-- System               State                Frozen
A  gedb001              RUNNING              0
A  gedb002              RUNNING              0
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State
B  oragrp          gedb001              Y          N               OFFLINE
B  oragrp          gedb002              Y          N               OFFLINE
gedb002# hares -display | grep ONLINE
nic-qfe3        State        gedb001        ONLINE
nic-qfe3        State        gedb002        ONLINE
gedb002# vxdg list
NAME         STATE           ID
rootdg       enabled  957265489.1025.gedb002
gedb001# vxdg list
NAME         STATE           ID
rootdg       enabled  957266358.1025.gedb001
Recovery Commands:
-
hastop -all
on one machine hastart
wait a few minutes
on other machine hastart
hastatus -summary (make sure one is OFFLINE && one is ONLINE)
If none of these steps resolved the situation, contact Lorraine or Luke (possibly Russ Button or Jen Redman if they made it to Veritas Cluster class) or a Veritas Consultant.
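The "one ONLINE, one OFFLINE" check from this step can be automated along these lines. This is a sketch under assumptions: the function name group_state_ok is invented here, and it assumes group rows start with "B" and carry the state in the sixth column, as in the sample hastatus -summary output shown in this step.

```shell
#!/bin/sh
# group_state_ok GROUP: read "hastatus -summary" output on stdin and
# succeed only when exactly one system reports GROUP as ONLINE and
# exactly one reports it as OFFLINE.
# Assumption: group rows look like "B group system probed autodisabled state".
group_state_ok() {
    awk -v grp="$1" '
        $1 == "B" && $2 == grp && $6 == "ONLINE"  { on++ }
        $1 == "B" && $2 == grp && $6 == "OFFLINE" { off++ }
        END {
            printf "%s: %d ONLINE, %d OFFLINE\n", grp, on, off
            exit !(on == 1 && off == 1)
        }'
}
```

Run it as: hastatus -summary | group_state_ok oragrp. Any other state (for example a faulted group) is simply not counted, so the check fails and you fall back to the debugging steps above.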
3. Verify Services Can Switch Between Systems
Once hastatus -summary works, note the GROUP name used. Usually, it will be "oragrp", but the installer can use any name, so please determine its name.
First check if group can switch back and forth. On the system that is running (system1), switch veritas to other system (system2):
-
hagrp -switch groupname -to system2
[ie: hagrp -switch oragrp -to e4500b]
-
hagrp -switch groupname -to system1
4. Verify OTHER System Can Go Up & Down Smoothly For Maintenance
On system that is OFFLINE (should be system 2 at this point), reboot the computer.
-
ssh system2
/usr/sbin/shutdown -i6 -g0 -y
-
hastatus -summary
-
hagrp -switch groupname -to system2
ssh system1
/usr/sbin/shutdown -i6 -g0 -y
-
hastatus -summary
5. Test Actual Failover For System 2 (and pray db is okay)
To do this, we will kill off the listener process, which should force a failover. This test SHOULD be okay for the db (that is why we choose LISTENER) but there is a very small chance things will go wrong .. hence the "pray" part :).
On system that is online (should be system2), kill off ORACLE LISTENER Process
-
ps -ef | grep LISTENER
root 1415 600 0 20:43:58 pts/0 0:00 grep LISTENER
oracle 831 1 0 20:27:06 ? 0:00 /apps/oracle/product/8.1.5/bin/tnslsnr LISTENER -inherit
-
kill -9 process-id (the PID of the tnslsnr process, i.e. the first number on its line - in this case 831)
You will note that system 2 is faulted -- and system 1 is now online
You need to CLEAR the fault before trying to fail back over.
-
hares -display | grep FAULT
for the resource that is failed (in this case, LISTENER)
Clear the fault
hares -clear resource-name -sys faulted-system [ie: hares -clear LISTENER -sys e4500b]
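Finding the faulted resource and building the matching clear command can be sketched like this. The function name clear_commands is hypothetical, and the parsing assumes hares -display rows have the form "resource attribute system value", as in the "nic-qfe3 State gedb001 ONLINE" sample earlier in this page. It only prints the commands; nothing is cleared until you run them yourself.

```shell
#!/bin/sh
# clear_commands: read "hares -display" output on stdin and print one
# "hares -clear" command per faulted resource, without running anything.
# Assumption: rows are "resource attribute system value".
clear_commands() {
    awk '$2 == "State" && $4 ~ /FAULT/ {
        printf "hares -clear %s -sys %s\n", $1, $3
    }'
}
```

Use it as: hares -display | clear_commands, review the printed lines, then paste the ones you want.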
6. Test Actual Failover For System 1 (and pray db is okay)
Now we do the same thing for the other system. First verify that the other system is NOT faulted:
-
hastatus -summary
On system that is online (should be system1 at this point), kill off ORACLE LISTENER Process
-
ps -ef | grep LISTENER
oracle 987 1 0 20:49:19 ? 0:00 /apps/oracle/product/8.1.5/bin/tnslsnr LISTENER -inherit
root 1330 631 0 20:58:29 pts/0 0:00 grep LISTENER
-
kill -9 process-id (the PID of the tnslsnr process, i.e. the first number on its line - in this case 987)
You will note that system 1 is faulted -- and system 2 is now online
You need to CLEAR the fault before trying to fail back over.
-
hares -display | grep FAULT
for the resource that is failed (in this case, LISTENER)
Clear the fault
hares -clear resource-name -sys faulted-system [ie: hares -clear LISTENER -sys e4500a]
Run:
-
hastatus -summary
Pages are copied from UnixTools.com
A good explanation of MULTINICA and MULTINICB in VCS
Here is the example MultiNICA resource that I have configured. I am going to use a base IP for both NICs. Here Linux641 is server1 and linux642 is server2.

Below is the sample main.cf output for multinica.

Below is the IPMultiNIC resource that is configured. It points to the MultiNICA (nica) resource.

MULTINICB (Link-based IPMP setup with VCS)
With Solaris 10 came a nice feature – Link-based IP Multipathing (IPMP). It determines NIC availability solely on the NIC driver reporting the physical link status – UP or DOWN. Previous versions used “probe-based” IPMP, where connectivity is tested by pinging something on the network from each interface. While probe-based is actually a more thorough test (tests network layer 3 as well as 2), it is much more cumbersome to configure, and you need an extra IP address for each interface for “test” addresses. IMO Link-based IPMP is sufficient for most applications.
For some reason, configuring link-based IPMP in VCS is somewhat tricky, and the documentation doesn’t seem to help much. It seems all the default values for VCS are for probe-based IPMP only.
To achieve link-based IPMP, here’s how I’ve configured my MultiNICB resource:

UseMpathd: 1
Tells VCS to use mpathd for network link status
MpathdCommand: /usr/lib/inet/in.mpathd -a
The default, /usr/sbin/in.mpathd is just incorrect – it doesn’t live there.
ConfigCheck: 0
If you leave this at 1, it will overwrite your /etc/hostname.xxx files with probe-based IPMP configuration
Device: (your IPMP interfaces here)
The “interface alias” for each device is not needed, leave them blank.
IgnoreLinkStatus: 0
You want VCS to NOT ignore link status, since this is how link-based IPMP works.
GroupName:
Do not use your IPMP group name here, it’s not needed. VCS is not monitoring the group, mpathd is.
Here’s how it looks in main.cf:
MultiNICB csgmultinic (
    UseMpathd = 1
    MpathdCommand = "/usr/lib/inet/in.mpathd -a"
    ConfigCheck = 0
    Device = { ce0 = "", ce4 = "" }
    IgnoreLinkStatus = 0
    )
Saturday, July 6, 2013
Thursday, July 4, 2013
What's up with LDoms
Part 1 - Introduction & Basic Concepts
Part 2 - Creating a first, simple guest
Part 3 - A closer look at Disk Backend Choices
Part 4 - Virtual Networking Explained
Part 5 - A few Words about Consoles
Part 6 - Sizing the IO Domain
Part 7 - Layered Virtual Networking
For further reading, here are some recommendable links:
- The LDoms 2.2 Admin Guide
- The "Beginners Guide to LDoms"
- The LDoms Information Center on MOS
- LDoms on OTN
Wednesday, July 3, 2013
Solaris ILOM / ALOM Cheat Sheet
ILOM ALOM CMT Command Comparison
ALOM:               ILOM:
setdate             set /SP/clock datetime=value
                    value format: MMDDhhmmYYYY
setdefaults         set /SP reset_to_defaults=all
                    -> reset /SP   This resets the SP
setkeyswitch        set /SYS keyswitch_state=value
                    value= normal, diag, stby, locked
setsc               set target property=value
setupsc             No equivalent in ILOM
setlocator          set /SYS/LOCATE value= Fast_Blink or off
setfru -c data      set /SYS customer_frudata=data
showplatform        show /HOST
showplatform        show /SYS ( to view Serial Number )
showfru             No equivalent in ILOM
showusers -g #      show /SP/users
showhost            show /HOST
showkeyswitch       show /SYS keyswitch_state
showsc param        show target property
VIEW DIAG LEVEL     show /HOST/diag
setsc diag_level    set /HOST/diag trigger=All-resets
                    none, normal, User-reset, Power-on-reset,
                    Error-reset
showdate            show /SP/clock datetime
showlogs            show /SP/logs/event/list
showlogs            show /SP/faultmgmt
                    set /SP/logs/event clear=true
showenvironment     show -o table -level all /SYS
shownetwork         show /SP/network
showlocator         show /SYS/LOCATE
password            set /SP/users/ password
restartssh          set /SP/services/ssh restart_sshd_action=true
usershow            show /SP/users
useradd user        create /SP/users/
Create "admin"      create /SP/users/admin
                    set /SP/users/admin role=Administrator
                    set /SP/users/admin cli_mode=alom
userdel user        delete /SP/users/
userdel -y          delete -script /SP/users/
userpassword        set /SP/users/ password
userperm user       set /SP/users/ role=permissions
consolehistory      SEE RENE FOR MORE INFO ;)
console -f          start -force /SP/console
break -c            set /HOST send_break_action=break
break -D            set /HOST send_break_action=dumpcore
bootmode            set /HOST/bootmode property=value
                    state=value "reset_nvram or normal"
                    script="setenv auto-boot? false"
flashupdate -s      load -source tftp://ipaddr/pathname
reset -c            reset /SYS
reset -y -c         reset -script /SYS
powercycle          stop /SYS
powercycle -y       stop -script /SYS
powercycle -f       stop -force /SYS
                    start -force /SYS
poweroff            stop /SYS
poweroff -y         stop -script /SYS
poweroff -f         stop -force /SYS
poweron             start /SYS
clearfault uuid     set /SYS/component clear_fault_action=true
removefru -y        set /SYS/PS0 prepare_to_remove_action=true
enablecomponent     set /SYS/component component_state=enabled
disablecomp         set /SYS/component component_state=disabled
clearasrdb          No equivalent in ILOM
resetsc             reset /SP
resetsc -y          reset -script /SP
userclimode         set /SP/users/ cli_mode=default or alom
logout              exit
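The datetime value format from the first row of the table (MMDDhhmmYYYY) is easy to get wrong by hand. A minimal sketch for generating it on an admin host follows; the function name ilom_datetime is made up, and whether the SP expects local time or UTC is not stated here, so check that before pasting the result.

```shell
#!/bin/sh
# ilom_datetime: print the current time as MMDDhhmmYYYY, the value
# format listed for "set /SP/clock datetime=value".
# Assumption: the admin host clock and time zone are what you want on the SP.
ilom_datetime() {
    date '+%m%d%H%M%Y'
}
```

For example, echo "set /SP/clock datetime=$(ilom_datetime)" prints a line that can be pasted at the ILOM prompt.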
DISPLAYING DIMM INFORMATION:
-> show /SYS/MB/CMP0/BR0/CH0/D#
Targets:
SEEPROM
SERVICE
PRSNT
T_AMB
Properties:
type = DIMM
component_state = Enabled
fru_name = 4096MB DDR2 SDRAM FB-DIMM 333 (PC2 2600)
fru_description = FBDIMM 4096 Mbyte
fru_manufacturer = Samsung
fru_version = FFFFFF
fru_part_number = 501-7954-01 Rev 05
fru_serial_number = 00CE01074627037EA3
fault_state = OK
clear_fault_action = (none)
Setting up Network Management Port ILOM:
-> set pendingipaddress=
-> set pendingipdiscovery=static
-> set pendingipnetmask=255.255.255.0
-> set pendingipgateway=
-> set commitpending=true
Setting up Network Management Port ALOM:
sc> setsc if_network true
sc> setsc if_connection "telnet or ssh"
sc> setsc netsc_dhcp false
sc> setsc netsc_ipaddr
sc> setsc netsc_ipnetmask
sc> setsc netsc_ipgateway
sc> setsc netsc_commit
ALOM CMT Variable Comparison
ALOM:                 ILOM:
diag_level            /HOST/diag level
diag_mode             /HOST/diag mode
diag_trigger          /HOST/diag trigger
diag_verbosity        /HOST/diag verbosity
if_connection         /SP/services/ssh state
if_emailalerts        /SP/clients/smtp state
if_network            /SP/network state
if_snmp               /SP/services/snmp
mgt_mailalert         /SP/alertmgmt/rules
mgt_mailhost          /SP/clients/smtp address
mgt_snmptraps         /SP/services/snmp v1|v2c|v3
mgt_traphost          /SP/alertmgmt/rules
                      /SP/services/snmp port
netsc_dhcp            /SP/network pendingipdiscovery
netsc_commit          /SP/network commitpending=true
netsc_enetaddr        /SP/network macaddress
netsc_ipaddr          /SP/network pendingipaddress
netsc_ipgateway       /SP/network pendingipgateway
netsc_ipnetmask       /SP/network pendingipnetmask
sc_backupuserdata     /SP BACKUP_USER_DATA
sc_customerinfo       /SP system_identifier
sc_escapechars        /SP/console escapechars
sc_powerondelay       /SP/policy HOST_POWER_ON_DELAY
sc_powerstatememory   /SP/policy HOST_LAST_POWER_STATE
                      States= enabled or disabled
ser_baudrate          /SP/serial/external pendingspeed
ser_data              No equivalent in ILOM
ser_parity            /SP/serial/external pendingparity
ser_stopbits          /SP/serial/external pendingstopbits
sys_autorestart       /SP autorestart
sys_autorunonerror    /SP autorunonerror
sys_eventlevel        No equivalent in ILOM
sys_enetaddr          /HOST macaddress
Procedure to set the Serial Number after PDB replacement:
sc> setsc sc_servicemode true
Warning: misuse of this mode may invalidate your warranty.
sc> setcsn -c chassis_serial_number
Are you sure you want to permanently set the Chassis Serial
Number to chassis_serial_number[y/n]? y
Chassis serial number recorded.
sc> showplatform
SUNW,Sun-Fire-T5120
Chassis Serial Number: chassis-serial-number
Domain Status
------ ------
S0 Running
sc> setsc sc_servicemode false
HOW TO RESET ILOM PASSWORD:
InfoDoc #: 209731
Power down the host system (using the front panel power button)
or, if an SP admin account exists, you can alternatively use that
account's ALOM Command Line Interface poweroff command.
Unplug the system's power cord(s). Remove the system's top cover.
Insert a Jumper (you provide the jumper) on Pins 1 & 2 of
PJ6801. This Jumper is located at the T5120/T5220 Motherboard
Insert a Jumper (you provide the jumper) on Pins 1 & 2 of
J10401. This Jumper is located on the SP of the T5140/T5240
- near the edge of the Motherboard at rear of the system -
center of the rear edge of the Motherboard.
Plug in the system's power cord(s).
Press the front panel Power button to power on the system.
You must power on the system to complete the reset.
This is because the state of the PJ6801 jumper cannot be
determined without the host CPU running.
The SP root password will be reset to the default changeme.
Log in as root into the SP, using any available method - ssh
or a Web Browser to the SP's network management port or via
tip-hardware or a terminal server to the SP's serial management
port.
Password to use is changeme.
That is simply to see if the changeme password works.
Power down the system using the front panel
Unplug the system's power cord(s)
Remove the PJ6801 jumper. You must remove the PJ6801 jumper
after resetting the password, or the password will be reset
every time the SP is reset (e.g. at power up).
Replace the system's top cover.
Plug in the system's power cord(s).
If the system administrator would like the SP's root account
password changed to something other than changeme, change it
using the usual SP root account Command Line Interface command.
Whether or not the password is changed, once the top cover has
been reinstalled and the system's power cord(s) plugged in, the
Host can be powered up using the front panel power button, or
via a login to the SP's root or admin account using the
appropriate START or poweron command, respectively.


