Thursday, September 9, 2010

NetApp FAS3140 cluster: cf giveback fails

When a cluster giveback fails, check the following:

1. Confirm that the cluster license is enabled on both controllers, using the license command.
2. Check the cluster status with cf status.
3. If clustering is disabled, enable it with cf enable.

filer1(partner)>cf status

filer1 is up, takeover disabled because of reason (unsynchronized log)

briencfvt1 has disabled takeover by briencfvt2 (unsynchronized log)

VIA Interconnect is down (link 0 down, link 1 down)
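
The two facts that matter in this cf status output are the reason takeover is disabled and which interconnect links are down. As a minimal sketch, assuming the output has been saved as text and that the wording matches the capture above, they can be pulled out with a few lines of Python. The function name summarize_cf_status is hypothetical and not part of any NetApp tool.

```python
import re

def summarize_cf_status(output):
    """Extract the takeover-disabled reason and the down links from `cf status` text."""
    summary = {"takeover_disabled_reason": None, "links_down": []}
    # Matches e.g. "takeover disabled because of reason (unsynchronized log)"
    reason = re.search(r"takeover disabled because of reason \(([^)]+)\)", output)
    if reason:
        summary["takeover_disabled_reason"] = reason.group(1)
    # Matches each "link N down" inside the interconnect line
    summary["links_down"] = [int(n) for n in re.findall(r"link (\d+) down", output)]
    return summary

sample = """filer1 is up, takeover disabled because of reason (unsynchronized log)
briencfvt1 has disabled takeover by briencfvt2 (unsynchronized log)
VIA Interconnect is down (link 0 down, link 1 down)"""

print(summarize_cf_status(sample))
```

Other Data ONTAP releases may word these messages differently, so the patterns would need adjusting.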


Filer1>cf giveback 

takeoverState FT_NONE, takeoverString 'No takeover information'

givebackState FT_NONE, givebackString 'No giveback information'

givebackRetries 0, givebackRequested FALSE


filer1(takeover)*> cf monitor

  current time: 02Sep2010 23:32:10

  TAKEOVER 3+04:55:29, partner 'briencfvt1', cluster monitor enabled

filer1(takeover)*> cf monitor

  current time: 02Sep2010 23:32:18

  TAKEOVER 3+04:55:37, partner 'briencfvt1', cluster monitor enabled
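
The TAKEOVER counter in cf monitor appears to use a days+HH:MM:SS layout. That is an assumption based on the two samples above, which are 8 seconds apart in both the clock time and the counter. A small sketch for converting it to a duration (the helper name parse_takeover_elapsed is hypothetical):

```python
import re
from datetime import timedelta

def parse_takeover_elapsed(line):
    """Convert a line such as 'TAKEOVER 3+04:55:29, ...' into a timedelta."""
    m = re.search(r"TAKEOVER (\d+)\+(\d{2}):(\d{2}):(\d{2})", line)
    if m is None:
        raise ValueError("no TAKEOVER counter found in line")
    days, hours, minutes, seconds = (int(part) for part in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

first = parse_takeover_elapsed("TAKEOVER 3+04:55:29, partner 'briencfvt1', cluster monitor enabled")
second = parse_takeover_elapsed("TAKEOVER 3+04:55:37, partner 'briencfvt1', cluster monitor enabled")
print(second - first)
```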

briencfvt2(takeover)*> cf monitor all

cf: Current monitor status (02Sep2010 23:32:26):

partner 'briencfvt1', Interconnect not present

state TAKEOVER, time 276973571, event CHECK_FSM, elem S200_2 (32)

mirrorConsistencyRequired TRUE

takeoverByPartner 0x2009

mirrorEnabled TRUE, lowMemory FALSE, memio UNINIT, killPackets TRUE

degraded FALSE, reservePolicy ALWAYS_AFTER_TAKEOVER, resetDisks TRUE

hw_assist status:

briencfvt2 has taken over briencfvt1

timeouts:

    fast 1000, slow 2500, mailbox 10000, connect 5000

    operator 600000, firmware 15000 (recvd 15914), dumpcore 60000

    booting 300000 (recvd 0)

    transit timer enabled TRUE, transit 600000 (last 32427)

mailbox disks:

Disk 0a.17 is a local mailbox disk

Disk 0a.18 is a local mailbox disk

Disk 0a.24 is a partner mailbox disk

Disk 0a.29 is a partner mailbox disk

primary state:

version 2, senderSysid 151731691

cluster_time 1283173623, hbt 92375, node_status TAKEOVER_DISABLED

info 0x2009

flags 0x0 <>

channel CHANNEL_MAILBOX, abs_time 1283450545, sk_time 276973571

channel_status 0

channel CHANNEL_IC, abs_time 0, sk_time 0

channel_status 4

channel CHANNEL_NETWORK, abs_time 0, sk_time 0

channel_status -1

backup state:

version 2, senderSysid 151732359

cluster_time 1261476214, hbt 21851918, node_status TAKEOVER_ACTIVE

info 0x2000

flags 0x0 <>

channel CHANNEL_MAILBOX, abs_time 1283173612, sk_time 28686

channel_status 0

Channel Read Ctx:

version 2, senderSysid 151732359

cluster_time 1261476214, hbt 21746005, node_status TAKEOVER_ACTIVE

info 0x2000

flags 0x0 <>

channel CHANNEL_IC, abs_time 0, sk_time 0

channel_status 4

Channel Read Ctx:

version 2, senderSysid 0

cluster_time 0, hbt 0, node_status UNKNOWN

info 0x0 <>

flags 0x0 <>

channel CHANNEL_NETWORK, abs_time 0, sk_time 0

channel_status -1

Channel Read Ctx:

version 2, senderSysid 0

cluster_time 0, hbt 0, node_status UNKNOWN

info 0x0 <>

flags 0x0 <>

takeoverState FT_TAKEOVER_DONE_OK, takeoverString 'Takeover OK'

givebackState FT_NONE, givebackString 'No giveback information'

givebackRetries 0, givebackRequested FALSE

autoGivebackEnabled FALSE, autoGivebackWasDone FALSE, autoGivebackCifsStopping FALSE

autoGivebackLastVetoCheck 0, autoGivebackAttemptsExceeded FALSE

Maximum primary disk mailbox io times: normal = 373, transition = 0

Maximum backup disk mailbox io times: normal = 301, transition = 0

Num times logs unsynced : 0
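
In the cf monitor all output, each channel line is followed by a channel_status line. In this capture the mailbox channel reports 0, while the interconnect and network channels report 4 and -1. The sketch below only pairs each channel with its status value and does not interpret the codes, since their meaning is not stated here. It assumes the output has been saved as text; channel_statuses is a hypothetical helper name.

```python
import re

def channel_statuses(output):
    """Return (channel, status) pairs in the order they appear in `cf monitor all` text."""
    pairs = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        channel = re.match(r"channel (CHANNEL_\w+),", line)
        if channel:
            current = channel.group(1)
            continue
        status = re.match(r"channel_status (-?\d+)", line)
        if status and current is not None:
            pairs.append((current, int(status.group(1))))
            current = None
    return pairs

sample = """channel CHANNEL_MAILBOX, abs_time 1283450545, sk_time 276973571
channel_status 0
channel CHANNEL_IC, abs_time 0, sk_time 0
channel_status 4
channel CHANNEL_NETWORK, abs_time 0, sk_time 0
channel_status -1"""

for name, status in channel_statuses(sample):
    print(name, status)
```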

Filer1(takeover)*> rdfile /etc/messages

Sun Aug 29 00:00:00 IST [briencfvt2 (takeover): kern.log.rotate:notice]: System briencfvt2 (ID 0151731691) is running NetApp Release 7.3.2

Sun Aug 29 00:00:01 IST [briencfvt2 (takeover): kern.log.rotate:notice]: System briencfvt1 (ID 0151732359) is running NetApp Release 7.3.2

Sun Aug 29 00:00:13 IST [briencfvt2 (takeover): cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #0 has been down for 4474 minutes

Sun Aug 29 00:00:13 IST [briencfvt2 (takeover): cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #1 has been down for 4474 minutes

Sun Aug 29 01:00:00 IST [briencfvt2 (takeover): kern.uptime.filer:info]:   1:00am up  3 days,  3:34 0 NFS ops, 9578 CIFS ops, 0 HTTP ops, 183525 FCP ops, 0 iSCSI ops

Sun Aug 29 01:00:05 IST [briencfvt1/briencfvt2: raid.rg.scrub.resume:notice]: partner:/aggr0/plex0/rg0: resuming scrub at stripe 86097672 (79% complete)

Sun Aug 29 01:00:06 IST [briencfvt2 (takeover): raid.rg.scrub.start:notice]: local:/aggr0/plex0/rg0: starting scrub

Sun Aug 29 01:00:08 IST [briencfvt2 (takeover): cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #0 has been down for 4534 minutes

Sun Aug 29 01:00:08 IST [briencfvt2 (takeover): cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #1 has been down for 4534 minutes

Sun Aug 29 01:15:09 IST [briencfvt1/briencfvt2: asup.post.sent:notice]: Cluster Notification message posted to NetApp: Cluster Notification from briencfvt1 (WEEKLY_LOG) INFO

Sun Aug 29 01:46:24 IST [briencfvt2 (takeover): asup.post.sent:notice]: Cluster Notification message posted to NetApp: Cluster Notification from briencfvt2 (WEEKLY_LOG) INFO

Sun Aug 29 01:46:31 IST [briencfvt2 (takeover): asup.post.sent:notice]: Cluster Notification message posted to NetApp: Cluster Notification from briencfvt2 (PERFORMANCE DATA) INFO

Filer1> Fri Sep  3 00:02:15 IST [briencfvt1: cf.rv.nicReset:error]: Reset cluster interconnect(s) due to unsynchronized log
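
The cf.ic.hourlyNicDownTime entries above show how long each interconnect link has been down. As a rough sketch, assuming /etc/messages has been copied off the filer as text and that the message wording matches these lines, the longest reported downtime per link can be collected as follows. Both function names are hypothetical.

```python
import re

DOWN_RE = re.compile(
    r"cf\.ic\.hourlyNicDownTime:info\]: Interconnect adapter link #(\d+) "
    r"has been down for (\d+) minutes"
)

def interconnect_downtime(log_text):
    """Map each interconnect link number to the longest downtime (minutes) seen in the log."""
    longest = {}
    for link, minutes in DOWN_RE.findall(log_text):
        link, minutes = int(link), int(minutes)
        longest[link] = max(minutes, longest.get(link, 0))
    return longest

def as_days_hours_minutes(total_minutes):
    """Format a minute count as 'Nd Nh Nm'."""
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m"

sample = """Sun Aug 29 00:00:13 IST [briencfvt2 (takeover): cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #0 has been down for 4474 minutes
Sun Aug 29 00:00:13 IST [briencfvt2 (takeover): cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #1 has been down for 4474 minutes
Sun Aug 29 01:00:08 IST [briencfvt2 (takeover): cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #0 has been down for 4534 minutes
Sun Aug 29 01:00:08 IST [briencfvt2 (takeover): cf.ic.hourlyNicDownTime:info]: Interconnect adapter link #1 has been down for 4534 minutes"""

for link, minutes in sorted(interconnect_downtime(sample).items()):
    print(f"link {link}: down {as_days_hours_minutes(minutes)}")
```

4534 minutes works out to 3 days, 3 hours and 34 minutes, the same figure as the uptime reported in the 1:00am kern.uptime.filer entry.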






Solution:

Reboot both controllers. After the reboot, the cluster worked fine.




