Sunday, February 27, 2011

How to handle lost undo datafile

As DBAs we are not supposed to drop any datafile without proper analysis and a backup, even if it is an offline datafile. But being human, we make mistakes that sometimes cost us dearly. In this post I am going to share how to recover from a lost undo datafile.

There are two scenarios for a lost undo datafile: with active transactions and without active transactions. If there are active transactions, the preferred option is to restore the datafile from backup and perform media recovery. If there are no active transactions, follow these steps to recover the database.

1. Check for active transactions and pending distributed transactions:

SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */ KTUXESTA STATUS, KTUXECFL FLAGS
FROM x$ktuxe WHERE ktuxesta != 'INACTIVE';

select local_tran_id, state from dba_2pc_pending;

Both of the above queries should return "no rows selected".

2. Perform a clean shutdown of the database:

shutdown immediate;

3. Create a parameter file from the spfile:

create pfile from spfile;

4. Update the pfile with "undo_management=manual".
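
The edit itself is a one-line change; a minimal sketch of the relevant pfile entry (the parameter name is standard, everything else in the pfile stays as it was):

undo_management = MANUAL
# undo_tablespace is ignored while undo_management is MANUAL,
# so it can be left as it is for now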

5. Start the database

startup pfile=$ORACLE_HOME/dbs/init<SID>.ora

6. Drop the old undo tablespace along with its datafiles:

drop tablespace <undo_tablespace_name> including contents and datafiles;

7. Create a new undo tablespace,

CREATE UNDO TABLESPACE <tablespace_name> DATAFILE '<datafile_path>' SIZE 2M AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED;
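
For example, filled in with hypothetical names (the tablespace name and datafile path below are illustrative, not values from the note):

CREATE UNDO TABLESPACE UNDOTBS2
DATAFILE '/u01/oradata/ORCL/undotbs02.dbf' SIZE 2M
AUTOEXTEND ON NEXT 1M MAXSIZE UNLIMITED;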

8. Shutdown the database

shutdown immediate;

9. Update the pfile with "undo_management=AUTO".
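
A minimal sketch of the pfile at this point; undo_tablespace should point to the tablespace created in step 7 (UNDOTBS2 is just the hypothetical name used above):

undo_management = AUTO
undo_tablespace = UNDOTBS2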

10. Create spfile from pfile

create spfile from pfile;

11. Start the database

startup;

Reference
*********

How to Recover from a Lost Datafile with Different Scenarios [ID 198640.1]

Monday, February 14, 2011

rconfig does not recognize cluster environment

rconfig is a command-line tool to convert a single instance database to RAC; it was introduced in 10gR2. We were trying to convert a single instance 11gR1 database to a RAC database using 11gR2 grid. After installing the grid, we verified the cluster status using "crs_stat" and all the cluster services were running fine. But when we invoked rconfig from the 11gR1 Oracle Home, we got the error below:

Operation Failed
Clusterware is not configured

On further analysis of the rconfig log file in $ORACLE_HOME/cfgtoollogs/rconfig, we noticed that the 11gR1 Oracle home could not communicate with the CRS services.

...
oracle.ops.mgmt.cluster.ClusterException: PRKH-1010 : Unable to communicate with CRS services.
....
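
For context, this is roughly how rconfig is driven (a sketch; the copied file name and the values to edit are illustrative, not the exact ones we used):

$ cd $ORACLE_HOME/assistants/rconfig/sampleXMLs
$ cp ConvertToRAC.xml convert_mydb.xml    # edit source/target Oracle home, SID, node list and storage details
$ $ORACLE_HOME/bin/rconfig convert_mydb.xml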

We then came across MetaLink note 946332.1, which describes being unable to create a lower version database on a fresh 11gR2 grid.

Cause:
RAC nodes were not pinned:

$11gR2_ORACLE_HOME/grid/bin/olsnodes -t
oranode1 Unpinned
oranode2 Unpinned
oranode3 Unpinned

This is a known issue when configuring a lower version database with a fresh 11gR2 grid infrastructure. The fix is to manually pin the RAC nodes. Run the command below as root on any one node.

# crsctl pin css -n oranode1 oranode2 oranode3
CRS-4664: Node oranode1 successfully pinned.
CRS-4664: Node oranode2 successfully pinned.
CRS-4664: Node oranode3 successfully pinned.

Check the status again; the nodes should now show "Pinned":
$11gR2_ORACLE_HOME/grid/bin/olsnodes -t
oranode1 Pinned
oranode2 Pinned
oranode3 Pinned

Note: This issue also occurs when creating a database with dbca from a lower version Oracle home on 11gR2 grid.

Reference
=======
1. Unable To Create 10.1 or 10.2 or 11.1 ASM RAC Databases (ORA-29702) Using Brand New 11gR2 Grid Infrastructure Installation. [946332.1]

Saturday, February 12, 2011

SYSTEM09.dbf file corrupted in R12.1.1 fresh install on Linux x86_64

A fresh install of R12.1.1 with rapidwizard (StartCD) version 12.1.1.9 or below on Linux x86_64 contains a corrupted SYSTEM09.dbf file. Thankfully the corruption is not data corruption (the affected blocks do not belong to any extents/segments), so it will not affect your installation or database startup. Perhaps this is why Oracle did not catch the issue before releasing it to customers.

Users typically identify this corruption only when they start backing up with RMAN or validating datafiles with RMAN. RMAN fails with the error message below:

ORA-19566: exceeded limit of 0 corrupt blocks for file /DBF_LOC/system09.dbf
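
A typical RMAN command that surfaces the error looks something like this (a sketch; the datafile number 352 is taken from the dbv output further down):

RMAN> backup validate check logical datafile 352;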

On verifying the datafile with dbv, around 20,000+ corrupt blocks are reported in system09.dbf.
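
A typical dbv invocation for this check (the 8K block size is assumed, matching the usual EBS database block size):

$ dbv file=/DBF_LOC/system09.dbf blocksize=8192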

...
Page 192127 is marked corrupt
Corrupt block relative dba: 0x5802ee7f (file 352, block 192127)
Completely zero block found during dbv:

Page 192128 is marked corrupt
Corrupt block relative dba: 0x5802ee80 (file 352, block 192128)
Completely zero block found during dbv:



DBVERIFY - Verification complete

Total Pages Examined : 192128
Total Pages Processed (Data) : 54272
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 26750
Total Pages Failing (Index): 0
Total Pages Processed (Other): 2063
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 82570
Total Pages Marked Corrupt : 26473
Total Pages Influx : 0
Total Pages Encrypted : 0
Highest block SCN : 3755938521 (1388.3755938521)
$

Since the affected blocks are empty, we can resize the datafile to clear the corruption. Run this command as sysdba:

alter database datafile '/DBF_LOC/system09.dbf' resize 1000M;
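
Before resizing, it is worth double-checking that no extents sit above the new size. A hedged check (file 352 as reported by dbv; confirm the file_id against dba_data_files if in doubt):

SELECT MAX(block_id + blocks - 1) AS highest_used_block
FROM dba_extents
WHERE file_id = 352;

The resize target must stay above highest_used_block multiplied by the database block size.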


Reference
=======
1. Oracle E-Business Suite Installation and Upgrade Notes Release 12 (12.1.1) for Linux x86-64 [ID 761566.1]
2. More than 20000 Blocks Corrupted in system09.dbf After Fresh Installation of E-Business Suite R12.1.1 on Linux x86_64 Platform [ID 840963.1]

Wednesday, February 9, 2011

Device Mapper Multipathing



Multipath
========

The connection from the server through a Host Bus Adapter (HBA) to a storage controller is referred to as a path. When multiple paths exist to a storage device (LUN) on a storage subsystem, this is referred to as multipath connectivity. It is an enterprise-level storage capability. The main purposes of multipath connectivity are:

* To provide redundant access to the storage devices, i.e. to retain access to a storage device when one or more components in a path fail.
* To increase throughput by way of load balancing.


A simple example of multipath could be: 2 HBAs connected to a switch to which the storage controllers are connected. In this case the storage controller can be accessed from either of the HBAs and hence we have multipath connectivity.

In Linux, a SCSI device is configured for the LUN seen on each path, i.e. if a LUN has four paths, you will see four SCSI devices configured for the same storage LUN. Doing I/O to a LUN in such an environment is unmanageable:

* applications/administrators do not know which SCSI device to use
* there is no guarantee that all applications consistently use the same device, even on path failure
* there is no way to always use the storage device's preferred path
* there is no way to spread I/O between multiple valid paths

Device mapper Multipathing
=====================

Device mapper multipathing (DM-Multipath) allows you to configure multiple I/O paths between server nodes and storage arrays into a single device. Multipathing aggregates the I/O paths, creating a new device that consists of the aggregated paths.

Storage devices with two controllers can be configured as:

Active/Active: both controllers can process I/Os.
Active/Passive: one controller (active) processes I/Os while the other (passive) is in standby mode; I/Os sent to the passive controller fail.

Steps to configure DM-Multipath
=======================

First we need to check that all the hardware connections are in place, i.e. the connection from the host to the SAN switch through the Fibre Channel HBA, and from the SAN switch to the storage. The steps listed here are for an Active/Passive configured SAN storage.

1. Check that the device-mapper multipath rpms are installed.

# rpm -qa | grep device-mapper
device-mapper-1.02.28-2.el5
device-mapper-event-1.02.28-2.el5
device-mapper-multipath-0.4.7-23.el5
device-mapper-1.02.28-2.el5

If the above packages are not installed, install them using "rpm -ivh". All of these packages are available on the Enterprise Linux OS install CD; you can also download them from rpm.pbone.net.

2. Check the device list in /proc/partitions:

# cat /proc/partitions
major minor #blocks name

8 0 291991552 sda
8 1 200781 sda1
8 2 51199155 sda2
8 3 8193150 sda3
8 4 1 sda4
8 5 232396258 sda5
8 16 104857600 sdb
8 32 104857600 sdc

3. Get the unique SCSI id (WWID) for the available devices:

# scsi_id -g -u -s /block/sda
3600605b00293c32014c002630a328feb
# scsi_id -g -u -s /block/sdb
3600605b00293e20014bf1ca41c462c45
# scsi_id -g -u -s /block/sdc
3600605b00293e20014bf1ca41c462c45
#

Devices sdb and sdc have the same SCSI id, which means they both point to the same LUN on the SAN storage.

Note: when obtaining the unique SCSI id we have to use /block/sdb, not /dev/sdb.

4. Check the devices listed by the OS using fdisk:

# fdisk -l

Disk /dev/sda: 298.9 GB, 298999349248 bytes
255 heads, 63 sectors/track, 36351 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

/dev/sda1 * 1 25 200781 83 Linux
/dev/sda2 26 6399 51199155 83 Linux
/dev/sda3 6400 7419 8193150 82 Linux swap / Solaris
/dev/sda4 7420 36351 232396290 5 Extended
/dev/sda5 7420 36351 232396258+ 83 Linux

Disk /dev/sdb: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn’t contain a valid partition table

Ignore the primary device /dev/sda; it is the local hard disk holding the OS and other applications.

Note that device /dev/sdb is listed by fdisk but /dev/sdc is not; that's because I/O to the passive device (here /dev/sdc) fails.

5. Check whether the multipath modules are loaded by issuing the 'lsmod' command:

#lsmod | grep dm[\-_]round
dm_round_robin 36801 1
dm_multipath 52433 2 dm_round_robin
#

If they are not loaded, load them using the 'modprobe' command:

# modprobe dm-multipath
# modprobe dm-round-robin
#

6. Update /etc/multipath.conf

* Set user_friendly_names to yes, which gives the device a user-friendly name; otherwise DM names the device with its WWID by default, e.g. /dev/mapper/3600605b00293e20014bf1ca41c462c45.
* The blacklist is the list of devices for which DM will not configure multipath, for example local hard disks. To include only specific devices (here the SAN LUNs) in multipathing, blacklist all devices and add the required devices to blacklist_exceptions, as in the snippets below; a combined sketch of the whole file follows them.

blacklist {
    wwid "*"
}
blacklist_exceptions {
    wwid 3600605b00293e20014bf1ca41c462c45
}

* Include the devices in the multipaths section and provide an alias. If no alias is provided, DM uses the default user-friendly naming convention /dev/mapper/mpathn, where n is a number starting from 0. In this example we have provided the alias "oracle".

multipaths {
    multipath {
        wwid 3600605b00293e20014bf1ca41c462c45   # for sdb and sdc
        alias oracle
    }
}
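
Putting the pieces together, a minimal /etc/multipath.conf for this setup could look like the sketch below (the defaults section syntax is standard; the WWID and alias are the ones from this example):

defaults {
    user_friendly_names yes
}
blacklist {
    wwid "*"
}
blacklist_exceptions {
    wwid 3600605b00293e20014bf1ca41c462c45
}
multipaths {
    multipath {
        wwid 3600605b00293e20014bf1ca41c462c45
        alias oracle
    }
}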

7. Set chkconfig to start the multipathd daemon automatically on reboot.

# chkconfig --list multipathd
multipathd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
#

If the service is disabled as above, enable it:

# chkconfig multipathd on
# chkconfig --list multipathd
multipathd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
#

8. Start the multipath daemon service:

# service multipathd start
Starting multipathd daemon: [ OK ]
#

9. Check the multipath configuration:

# multipath -ll
sdc: checker msg is "readsector0 checker reports path is down"
oracle (3600605b00293e20014bf1ca41c462c45) dm-0 SUN,SUN_6180
[size=100G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
\_ 7:0:1:0 sdb 8:48 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 7:0:0:0 sdc 8:32 [active][ghost]
#

Note that sdb is active/ready and sdc is active/ghost; the active/ghost status indicates that sdc is the passive path.

10. Verify the DM-Multipath device in fdisk; the DM devices are listed as /dev/dm-n, where n is a number starting from 0.

# fdisk -l

Disk /dev/sda: 298.9 GB, 298999349248 bytes
255 heads, 63 sectors/track, 36351 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 25 200781 83 Linux
/dev/sda2 26 6399 51199155 83 Linux
/dev/sda3 6400 7419 8193150 82 Linux swap / Solaris
/dev/sda4 7420 36351 232396290 5 Extended
/dev/sda5 7420 36351 232396258+ 83 Linux

Disk /dev/sdb: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn’t contain a valid partition table

Disk /dev/dm-0: 107.3 GB, 107374182400 bytes
255 heads, 63 sectors/track, 13054 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/dm-0 doesn’t contain a valid partition table

11. Check the user-friendly named DM devices under /dev/mapper:

# ls -lrt /dev/mapper/*
crw-------  1 root root  10, 62 Feb  5 01:46 /dev/mapper/control
brw-rw----  1 root disk 253,  0 Feb  5 01:46 /dev/mapper/oracle

12. Use the /dev/mapper/ device for all file system activities: partitioning with fdisk, creating the file system and mounting the device. Also make sure /etc/fstab uses this user-friendly named device for auto mount.

Example:
1. To partition:
# fdisk /dev/mapper/oracle

2. To create the file system (after partitioning, the partition devices are named /dev/mapper/<alias>pn, where n is a number starting from 1):
# mkfs -t ext3 /dev/mapper/oraclep5

3. To mount the device:
# mount -t ext3 /dev/mapper/oraclep5 /data
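
An /etc/fstab entry for auto mount might look like this (a sketch; the mount point and options are assumptions, not from the original setup):

/dev/mapper/oraclep5   /data   ext3   defaults   0 0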

References
=========

1. Red Hat Enterprise Linux 6 DM Multipath - DM Multipath Configuration and Administration
2. http://sourceware.org/lvm2/wiki/MultipathUsageGuide#active_passive
3. Configuration and Use of Device Mapper Multipathing on Oracle Enterprise Linux (OEL) [ID 555603.1]

Monday, February 7, 2011

OCFS2 Fencing

I am very excited to start writing about my experience with Oracle products and solutions. In this post I am going to detail the OCFS2 fencing mechanism.

We are setting up a new data center for Oracle products, for which we have SUN X4170, X4270 and X4470 servers, SUN 6180 SAN storage, SAN switches and Fibre Channel HBAs (Host Bus Adapters). We have configured the SAN storage and allocated LUNs to the servers for Oracle EBS R12.1, OBI and Hyperion applications.

We created the required partitions and OCFS2 file systems on the new devices for our Oracle EBS R12.1 RAC database and shared application file system. OCFS2 is certified for a shared application file system; check ID 384248.1. After configuring and mounting the file systems, we started testing various scenarios to verify the stability of the setup. To our surprise we noticed that the server rebooted when we removed all the HBAs connected to it or powered off the SAN switch.

On analysis we found that this is expected behavior of OCFS2, called "fencing".

Fencing is the act of forcefully removing a node from a cluster. A node with an OCFS2 file system mounted will fence itself when it realizes that it does not have quorum in a degraded cluster. It does this so that other nodes won't be stuck trying to access its resources. In earlier versions of OCFS2, users reported that nodes hang during fencing. From OCFS2 1.2.5, Oracle no longer uses the "panic" state for fencing; instead it uses "machine restart".

Let us now see, how exactly OCFS2 forces kernel to restart on fencing.

  • After OCFS2 is configured and the O2CB cluster service is started, there is a heartbeat system file in which every node writes its presence to its own block every 2 seconds.
  • The block offset is equal to the node's global node number, that is, node 0 writes to the first block in the file, node 1 to the second block, and so on.
  • All nodes read the heartbeat system file every 2 seconds.
  • As long as the timestamp is changing, that node is considered alive.
  • A node self-fences if it fails to update its timestamp for ((O2CB_HEARTBEAT_THRESHOLD - 1) * 2) seconds.
  • The [o2hb-xx] kernel thread, after every timestamp write, sets a timer to panic the system after that duration.
  • If the next timestamp is written within that duration, as it should be, it first cancels the old timer before setting up a new one.
  • If for some reason the [o2hb-xx] kernel thread is unable to update the timestamp for O2CB_HEARTBEAT_THRESHOLD loops (default 7 in older releases, 31 in current ones), the node is deemed dead by the other nodes in the cluster and OCFS2 forces the kernel to restart.
  • Once a node is deemed dead, the surviving node that takes over recovery locks the dead node's journal and recovers it by replaying the journal.
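
If you want to see the threshold the running cluster is actually using, it is exposed through o2cb's configfs interface; the path below assumes the default cluster name "ocfs2" and may vary by OCFS2 release, so treat it as a sketch:

# cat /sys/kernel/config/cluster/ocfs2/heartbeat/dead_threshold
31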

From the above steps it is evident that the parameter O2CB_HEARTBEAT_THRESHOLD = ((timeout in secs) / 2) + 1 is very important in defining the timeline for restarting the server.
The default value is 31, which means that if a node does not update its timestamp in the heartbeat system file within 60 seconds, that node restarts. This value is quite low for a RAC environment.

Assume we have a SAN storage with two controllers configured active/passive, i.e. at any particular time only one controller path is active and the other is passive, used for failover. OCFS2 forces the kernel to restart (after the timeout) when a cable to the SAN device is cut, even if the SAN configuration was going to perform a failover.

To overcome this we have to increase the value of O2CB_HEARTBEAT_THRESHOLD. If you want to increase the timeout to 120 seconds, the value should be 61.
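
Working the formula through for the two timeouts mentioned above:

O2CB_HEARTBEAT_THRESHOLD = (timeout in secs / 2) + 1

60 sec  -> (60 / 2) + 1  = 31  (the default)
120 sec -> (120 / 2) + 1 = 61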

Steps to change O2CB_HEARTBEAT_THRESHOLD
=====================================
1. Stop the o2cb service on the server:
# service o2cb stop

2. Update the O2CB configuration with the required value for O2CB_HEARTBEAT_THRESHOLD:

# service o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting <ENTER>
without typing an answer will keep that current value. Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [y]:
Cluster stack backing O2CB [o2cb]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]:
Specify heartbeat dead threshold (>=7) [31]: 61
Specify network idle timeout in ms (>=5000) [30000]:
Specify network keepalive delay in ms (>=1000) [2000]:
Specify network reconnect delay in ms (>=2000) [2000]:
Writing O2CB configuration: OK
Loading filesystem "configfs": OK
Mounting configfs filesystem at /sys/kernel/config: OK
Loading filesystem "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting O2CB cluster ocfs2: OK
#

An alternative way to update the value is to edit /etc/sysconfig/o2cb directly and then restart the o2cb service.
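
The relevant lines in /etc/sysconfig/o2cb would look roughly like this (a sketch; the variable names follow the stock file shipped with OCFS2 1.4, and the cluster name is assumed to be the default "ocfs2"):

O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
O2CB_HEARTBEAT_THRESHOLD=61
O2CB_IDLE_TIMEOUT_MS=30000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000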

3. Check the status of O2CB
# /etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
Heartbeat dead threshold = 61
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active
#

References
========
1. OCFS2: A Cluster File System for Linux - User's Guide for Release 1.4
2. OCFS2 Kernel Panics on SAN Failover [ID 377616.1]
3. OCFS2 1.2 - FREQUENTLY ASKED QUESTIONS [ID 391771.1]
4. Heartbeat/Voting/Quorum Related Timeout Configuration for Linux, OCFS2, RAC Stack to Avoid Unnecessary Node Fencing, Panic and Reboot [ID 395878.1]

In my next post, I will explain the detailed steps for configuring multipath using the native device mapper (DM) in Linux.