
AMS-1000 data recovery failed on hard drive replacement

Question asked by robert megowan on Sep 8, 2016

Hello,

 

Sorry if this is the wrong forum; I'm new to this.

 

I have a two-controller AMS-1000 (tagma) with 24 trays (~300 disks). From what I've read, these are fairly similar to the AMS-200 and AMS-500. I've even read the manual for an AMS-2000 and found a lot of similarities. We are trying to end-of-life this array and migrate the data off, but it has to survive for a few more months while we do so. We were originally hoping to be done with it before we ran out of hot spares. However, it has become apparent that our rate of drive failure is too high for this strategy to work. So here I am, with an out-of-support array that has ~14 failed drives, trying to replace whatever I can to get it to limp along for a little while longer.

 

We purchased some drives from a third party who sells AMS replacement parts. So far the array doesn't seem to mind, and the model numbers on the drives are identical.

 

My issues/questions in order of importance are:

 

1) I replaced two 750 GB SATA drives, one at a time. Each time it started a reconstruction. On the first drive the reconstruction got to around 10% and then failed. On the second drive it ran for about 10 hours and was probably nearly done, but when I woke up it had failed too. Each time it appears to have been a DFPC error. I suspect that the system is hammering the new drive while rebuilding it and causing drive stalls. The replacement disks are the same model as the ones being replaced. I'm not sure what I can do about this, as my options within Storage Navigator Modular seem pretty limited. Logs below. I am now trying to replace a failed 500 GB drive in a different tray. Maybe that will work.

 

09/07/2016 12:39:24 C0 I152HH Data recovery failed(Unit-24,HDU-05) :MANUAL/STRC
09/07/2016 12:04:10 C1 W060AT SATA HDU alarm(Unit-24,HDU-05) :HDU   /STRC
09/07/2016 12:04:10 C1 IY06HH DFPC error detect(Unit-24,HDU-05)[HDU TIME OUT]                                                 
09/07/2016 11:14:13 C0 I150HH Data recovery started(Unit-24,HDU-05)                                                           
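
In case it helps anyone reading the logs, here is a minimal Python sketch of how the elapsed time of each recovery attempt could be worked out from these lines. It assumes the timestamps are MM/DD/YYYY and that I150HH, I151HH and I152HH mean "started", "completed" and "failed", as the message text suggests. The function and variable names are my own and are not part of any Hitachi tool.

```python
import re
from datetime import datetime

# Assumed layout: "MM/DD/YYYY HH:MM:SS C<n> <code> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (?P<ctl>C\d) (?P<code>\S+) (?P<msg>.*)$"
)
HDU_RE = re.compile(r"\(Unit-(\d+),HDU-(\d+)\)")

# Event codes as they appear in the log excerpt
STARTED, COMPLETED, FAILED = "I150HH", "I151HH", "I152HH"


def parse(line):
    """Turn one log line into a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    hdu = HDU_RE.search(m["msg"])
    return {
        "time": datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S"),
        "code": m["code"],
        "hdu": (int(hdu[1]), int(hdu[2])) if hdu else None,
    }


def recovery_durations(lines):
    """Pair each 'started' event with the next 'completed' or 'failed'
    event logged against the same (unit, hdu), and return the elapsed time."""
    events = [e for e in map(parse, lines) if e and e["hdu"]]
    events.sort(key=lambda e: e["time"])  # the log is newest-first
    open_starts, results = {}, []
    for e in events:
        if e["code"] == STARTED:
            open_starts[e["hdu"]] = e["time"]
        elif e["code"] in (COMPLETED, FAILED) and e["hdu"] in open_starts:
            outcome = "completed" if e["code"] == COMPLETED else "failed"
            results.append((e["hdu"], outcome, e["time"] - open_starts.pop(e["hdu"])))
    return results
```

Feeding it the four lines above gives a single failed attempt on Unit-24, HDU-05 that lasted 1 hour 25 minutes.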

 

2) I have another failed 500 GB disk in the same system, in a different tray. It has the red light on, which tells me it is detached (and I can see in Storage Navigator Modular that it is indeed detached). However, this one drive has both the red LED and the green LED on. The green LED is solid, not blinking. This is the only failed drive that does this; the rest of my failed drives have only the red LED on. Is this disk safe to remove/replace? All the events pertaining to that disk are below, and they look normal to me...

   

01/04/2016 20:49:25 C0 I008HH Data recovery to spare HDU(Unit-06,HDU-01)
01/04/2016 07:03:39 C0 W060AT SATA HDU alarm(Unit-06,HDU-01) :HDU   /STRC
01/04/2016 07:03:39 C0 IY06HH DFPC error detect(Unit-06,HDU-01)[HDU TIME OUT]

 

3) We also have an ENC error. I have no idea how to start troubleshooting this, as this was never a customer-replaceable part. I've checked the cabling and it is plugged in, so my only guess is that the card went bad. Before I try to find a new one, I want to make sure it is truly dead.

 

4) Can I replace multiple drives at once? If so, how many? Or should I take it slow given the age of this system?

 

Looking forward to any help you can provide.

 

Full logs from the last 2 years below:

    

09/07/2016 12:39:24 C0 I152HH Data recovery failed(Unit-24,HDU-05)                                                  :MANUAL/STRC
09/07/2016 12:04:10 C1 W060AT SATA HDU alarm(Unit-24,HDU-05)                                                        :HDU   /STRC
09/07/2016 12:04:10 C1 IY06HH DFPC error detect(Unit-24,HDU-05)[HDU TIME OUT]                                                 
09/07/2016 11:14:13 C0 I150HH Data recovery started(Unit-24,HDU-05)                                                           
08/13/2016 02:07:42 C1 W060AT SATA HDU alarm(Unit-16,HDU-09)                                                        :HDU   /STRC
08/13/2016 02:07:42 C1 IY06HH DFPC error detect(Unit-16,HDU-09)[HDU TIME OUT]                                                 
06/29/2016 23:34:45 C0 I008HH Data recovery to spare HDU(Unit-08,HDU-06)                                                      
06/29/2016 23:34:45 C0 I151HH Data recovery completed(Unit-12,HDU-14)                                                         
06/29/2016 09:44:40 C0 I150HH Data recovery started(Unit-12,HDU-14)                                                           
06/29/2016 09:10:32 C1 W060AT SATA HDU alarm(Unit-08,HDU-06)                                                        :HDU   /STRC
06/29/2016 09:10:32 C1 IY06HH DFPC error detect(Unit-08,HDU-06)[HDU TIME OUT]                                                 
06/25/2016 08:22:12 C0 I008HH Data recovery to spare HDU(Unit-13,HDU-05)                                                      
06/25/2016 08:22:12 C0 I151HH Data recovery completed(Unit-08,HDU-14)                                                         
06/24/2016 18:06:59 C0 I150HH Data recovery started(Unit-08,HDU-14)                                                           
06/24/2016 17:32:56 C1 W060AT SATA HDU alarm(Unit-13,HDU-05)                                                        :HDU   /STRC
06/24/2016 17:32:56 C1 IY00HH HDU error report(Unit-13,HDU-05)[03-1101]                                                       
06/20/2016 21:07:38 C0 I008HH Data recovery to spare HDU(Unit-22,HDU-05)                                                      
06/20/2016 21:07:38 C0 I151HH Data recovery completed(Unit-14,HDU-14)                                                         
06/20/2016 11:22:26 C0 I150HH Data recovery started(Unit-14,HDU-14)                                                           
06/20/2016 10:48:27 C1 W060AT SATA HDU alarm(Unit-22,HDU-05)                                                        :HDU   /STRC
06/20/2016 10:48:27 C1 IY06HH DFPC error detect(Unit-22,HDU-05)[HDU TIME OUT]                                                 
06/11/2016 11:35:38 C0 I008HH Data recovery to spare HDU(Unit-15,HDU-00)                                                      
06/11/2016 11:35:38 C0 I151HH Data recovery completed(Unit-23,HDU-14)                                                         
06/10/2016 12:11:55 C0 I150HH Data recovery started(Unit-23,HDU-14)                                                           
06/10/2016 12:11:52 C0 I152HH Data recovery failed(Unit-21,HDU-14)                                                  :MANUAL/STRC
06/10/2016 12:11:51 C0 W061AT SATA Spare HDU alarm(Unit-21,HDU-14)                                                  :HDU   /STRC
06/10/2016 12:11:51 C0 IY06HH DFPC error detect(Unit-21,HDU-14)[HDU TIME OUT]                                                 
06/10/2016 12:06:14 C0 I150HH Data recovery started(Unit-21,HDU-14)                                                           
06/10/2016 12:06:14 C0 I008HH Data recovery to spare HDU(Unit-13,HDU-02)                                                      
06/10/2016 12:06:14 C0 I151HH Data recovery completed(Unit-13,HDU-14)                                                         
06/10/2016 09:41:38 C0 W060AT SATA HDU alarm(Unit-15,HDU-00)                                                        :HDU   /STRC
06/10/2016 09:41:38 C0 IY00HH HDU error report(Unit-15,HDU-00)[03-1101]                                                       
06/09/2016 12:27:17 C0 I150HH Data recovery started(Unit-13,HDU-14)                                                           
06/09/2016 11:53:29 C1 W060AT SATA HDU alarm(Unit-13,HDU-02)                                                        :HDU   /STRC
06/09/2016 11:53:29 C1 IY06HH DFPC error detect(Unit-13,HDU-02)[HDU TIME OUT]                                                 
04/17/2016 17:19:25 C0 I008HH Data recovery to spare HDU(Unit-24,HDU-12)                                                      
04/17/2016 17:19:25 C0 I151HH Data recovery completed(Unit-24,HDU-14)                                                         
04/17/2016 05:34:30 C0 I150HH Data recovery started(Unit-24,HDU-14)                                                           
04/17/2016 05:01:30 C1 W060AT SATA HDU alarm(Unit-24,HDU-12)                                                        :HDU   /STRC
04/17/2016 05:01:30 C1 IY06HH DFPC error detect(Unit-24,HDU-12)[HDU TIME OUT]                                                 
03/24/2016 00:48:48 C0 I540HH ENC recovery failed(Unit-22,ENC-1)                                                    :MANUAL/STRC
03/24/2016 00:48:48 C0 I5YJ00 ENC error[SES access error](Unit-22,ENC-1)                                                      
03/24/2016 00:47:36 C0 I1H600 Automatic SENC recovery starts(Unit-22,ENC-1)                                                   
03/24/2016 00:46:34 C0 I53600 ENC error inf.[LPFO]                                                                  :ENC      
03/24/2016 00:46:33 C0 W08001 Loop alarm(Path-0,Loop-1)                                                             :MANUAL/STRC
03/24/2016 00:46:32 C0 W08011 Loop alarm(Path-1,Loop-1)                                                             :MANUAL/STRC
03/24/2016 00:46:32 C0 W0GD00 SENC alarm[RKAJAT](Unit-22,ENC-1)                                                     :ENC   /STRC
03/24/2016 00:14:02 C1 W08001 Loop alarm(Path-0,Loop-1)                                                             :MANUAL/STRC
03/24/2016 00:14:02 C1 W08011 Loop alarm(Path-1,Loop-1)                                                             :MANUAL/STRC
03/24/2016 00:14:00 C1 I6DZ00 Unit missing(Unit-22,Loop-1)                                                          :MANUAL   
03/09/2016 23:01:51 C0 I008HH Data recovery to spare HDU(Unit-24,HDU-06)                                                      
03/09/2016 23:01:51 C0 I151HH Data recovery completed(Unit-22,HDU-14)                                                         
03/09/2016 10:44:46 C0 I00BHH ENC recovered(Unit-22,ENC-0)                                                                    
03/09/2016 10:12:28 C1 I00A10 Loop recovered(Path-1,Loop-0)                                                                   
03/09/2016 10:12:28 C1 I00A00 Loop recovered(Path-0,Loop-0)                                                                   
03/09/2016 10:44:42 C0 I00A10 Loop recovered(Path-1,Loop-0)                                                                   
03/09/2016 10:44:42 C0 I00A00 Loop recovered(Path-0,Loop-0)                                                                   
03/09/2016 10:43:30 C0 I1H600 Automatic SENC recovery starts(Unit-22,ENC-0)                                                   
03/09/2016 10:10:12 C1 W08000 Loop alarm(Path-0,Loop-0)                                                             :MANUAL/STRC
03/09/2016 10:10:12 C1 W08010 Loop alarm(Path-1,Loop-0)                                                             :MANUAL/STRC
03/09/2016 10:42:26 C0 I53600 ENC error inf.[LPFO]                                                                  :ENC      
03/09/2016 10:42:26 C0 W0GD00 SENC alarm[RKAJAT](Unit-22,ENC-0)                                                     :ENC   /STRC
03/09/2016 10:42:23 C0 W08000 Loop alarm(Path-0,Loop-0)                                                             :MANUAL/STRC
03/09/2016 10:42:23 C0 W08010 Loop alarm(Path-1,Loop-0)                                                             :MANUAL/STRC
03/09/2016 10:42:02 C0 I150HH Data recovery started(Unit-22,HDU-14)                                                           
03/09/2016 10:41:59 C0 I152HH Data recovery failed(Unit-22,HDU-14)                                                  :MANUAL/STRC
03/09/2016 10:41:58 C0 W060AT SATA HDU alarm(Unit-24,HDU-06)                                                        :HDU   /STRC
03/09/2016 10:41:58 C0 IY06HH DFPC error detect(Unit-24,HDU-06)[HDU TIME OUT]                                                 
03/09/2016 10:41:27 C0 I150HH Data recovery started(Unit-22,HDU-14)                                                           
03/09/2016 10:41:27 C0 I15AHH Dynamic sparing start(Unit-24,HDU-06)[LNKTO]                                                    
03/09/2016 10:41:27 C0 I6DD00 HDU error over(Unit-24,HDU-06)[LNKTO]                                                 :HDU      
02/18/2016 07:22:34 C0 I008HH Data recovery to spare HDU(Unit-22,HDU-09)                                                      
02/18/2016 07:22:34 C0 I151HH Data recovery completed(Unit-16,HDU-14)                                                         
02/17/2016 18:53:35 C0 I150HH Data recovery started(Unit-16,HDU-14)                                                           
02/17/2016 18:21:37 C1 W060AT SATA HDU alarm(Unit-22,HDU-09)                                                        :HDU   /STRC
02/17/2016 18:21:37 C1 IY06HH DFPC error detect(Unit-22,HDU-09)[HDU TIME OUT]                                                 
01/10/2016 23:39:46 C0 I008HH Data recovery to spare HDU(Unit-05,HDU-00)                                                      
01/10/2016 23:39:46 C0 I151HH Data recovery completed(Unit-11,HDU-14)                                                         
01/10/2016 01:22:38 C0 I150HH Data recovery started(Unit-11,HDU-14)                                                           
01/10/2016 01:22:38 C0 W060AT SATA HDU alarm(Unit-05,HDU-12)                                                        :HDU   /STRC
01/10/2016 01:22:38 C0 I41FHH HDU error over(Unit-05,HDU-12)[REAOV]                                                 :HDU      
01/10/2016 01:22:38 C0 I008HH Data recovery to spare HDU(Unit-05,HDU-12)                                                      
01/10/2016 01:22:37 C0 I151HH Data recovery completed(Unit-05,HDU-14)                                                         
01/09/2016 03:19:02 C1 W060AT SATA HDU alarm(Unit-05,HDU-00)                                                        :HDU   /STRC
01/09/2016 03:19:02 C1 IY00HH HDU error report(Unit-05,HDU-00)[03-1101]                                                       
01/08/2016 18:40:11 C1 I41FHH HDU error over(Unit-05,HDU-12)[REAOV]                                                 :HDU      
01/08/2016 18:31:32 C0 I150HH Data recovery started(Unit-05,HDU-14)                                                           
01/08/2016 18:31:32 C0 I15AHH Dynamic sparing start(Unit-05,HDU-12)[REAOV]                                                    
01/08/2016 18:00:17 C1 I41FHH HDU error over(Unit-05,HDU-12)[REAOV]                                                 :HDU      
01/07/2016 04:08:19 C0 I008HH Data recovery to spare HDU(Unit-08,HDU-11)                                                      
01/07/2016 04:08:19 C0 I151HH Data recovery completed(Unit-04,HDU-14)                                                         
01/05/2016 08:15:07 C0 I150HH Data recovery started(Unit-04,HDU-14)                                                           
01/05/2016 08:15:04 C0 I152HH Data recovery failed(Unit-04,HDU-14)                                                  :MANUAL/STRC
01/05/2016 07:43:51 C1 W060AT SATA HDU alarm(Unit-08,HDU-11)                                                        :HDU   /STRC
01/05/2016 07:43:51 C1 IY06HH DFPC error detect(Unit-08,HDU-11)[HDU TIME OUT]                                                 
01/05/2016 08:14:38 C0 I150HH Data recovery started(Unit-04,HDU-14)                                                           
01/05/2016 08:14:38 C0 I15AHH Dynamic sparing start(Unit-08,HDU-11)[REAOV]                                                    
01/05/2016 07:43:26 C1 I41FHH HDU error over(Unit-08,HDU-11)[REAOV]                                                 :HDU      
01/04/2016 20:49:25 C0 I008HH Data recovery to spare HDU(Unit-06,HDU-01)                                                      
01/04/2016 20:49:25 C0 I151HH Data recovery completed(Unit-01,HDU-14)                                                         
01/04/2016 07:04:00 C0 I150HH Data recovery started(Unit-01,HDU-14)                                                           
01/04/2016 07:03:39 C0 W060AT SATA HDU alarm(Unit-06,HDU-01)                                                        :HDU   /STRC
01/04/2016 07:03:39 C0 IY06HH DFPC error detect(Unit-06,HDU-01)[HDU TIME OUT]                                                 
12/18/2015 16:41:07 C0 I008HH Data recovery to spare HDU(Unit-24,HDU-05)                                                      
12/18/2015 16:41:07 C0 I151HH Data recovery completed(Unit-15,HDU-14)                                                         
12/18/2015 04:35:36 C0 I150HH Data recovery started(Unit-15,HDU-14)                                                           
12/18/2015 04:04:38 C1 W060AT SATA HDU alarm(Unit-24,HDU-05)                                                        :HDU   /STRC
12/18/2015 04:04:38 C1 IY06HH DFPC error detect(Unit-24,HDU-05)[HDU TIME OUT]                                                 
11/03/2015 18:01:06 C0 IZYR00 Automatic ENC microprogram download completed successfully                                      
11/03/2015 18:01:06 C0 IZYS00 Automatic ENC microprogram download start                                             :MANUAL   
11/03/2015 18:01:02 C0 I00BHH ENC recovered(Unit-20,ENC-1)                                                                    
11/03/2015 17:30:48 C1 I00A31 Loop recovered(Path-3,Loop-1)                                                                   
11/03/2015 17:30:48 C1 I00A21 Loop recovered(Path-2,Loop-1)                                                                   
11/03/2015 18:00:59 C0 I00A31 Loop recovered(Path-3,Loop-1)                                                                   
11/03/2015 18:00:59 C0 I00A21 Loop recovered(Path-2,Loop-1)                                                                   
11/01/2015 13:19:19 C1 W08031 Loop alarm(Path-3,Loop-1)                                                             :MANUAL/STRC
11/01/2015 13:19:19 C1 W08021 Loop alarm(Path-2,Loop-1)                                                             :MANUAL/STRC
11/01/2015 13:49:28 C0 I61C00 SES error inf.[SES-HDU error]                                                                   
11/01/2015 13:49:28 C0 W08031 Loop alarm(Path-3,Loop-1)                                                             :MANUAL/STRC
11/01/2015 13:49:28 C0 W08021 Loop alarm(Path-2,Loop-1)                                                             :MANUAL/STRC
11/01/2015 13:49:28 C0 W0GC00 ENC alarm[RKAJ](Unit-20,ENC-1)                                                        :ENC   /STRC
10/12/2015 23:49:00 C0 I007HH HDU recovered(Unit-23,HDU-13)                                                                   
10/12/2015 23:48:59 C0 I151HH Data recovery completed(Unit-23,HDU-13)                                                         
10/12/2015 15:22:53 C0 I150HH Data recovery started(Unit-23,HDU-13)                                                           
10/03/2015 01:37:19 C0 I007HH HDU recovered(Unit-16,HDU-08)                                                                   
10/03/2015 01:37:19 C0 I151HH Data recovery completed(Unit-16,HDU-08)                                                         
10/02/2015 16:25:11 C0 I150HH Data recovery started(Unit-16,HDU-08)                                                           
09/29/2015 22:16:24 C0 I008HH Data recovery to spare HDU(Unit-23,HDU-13)                                                      
09/29/2015 22:16:24 C0 I151HH Data recovery completed(Unit-15,HDU-14)                                                         
09/29/2015 09:58:06 C0 I150HH Data recovery started(Unit-15,HDU-14)                                                           
09/29/2015 09:28:26 C1 W060AT SATA HDU alarm(Unit-23,HDU-13)                                                        :HDU   /STRC
09/29/2015 09:28:26 C1 IY06HH DFPC error detect(Unit-23,HDU-13)[HDU TIME OUT]                                                 
07/21/2015 16:16:00 C0 I007HH HDU recovered(Unit-24,HDU-05)                                                                   
07/21/2015 16:16:00 C0 I151HH Data recovery completed(Unit-24,HDU-05)                                                         
07/21/2015 04:11:37 C0 I150HH Data recovery started(Unit-24,HDU-05)                                                           
07/21/2015 04:11:37 C0 I007HH HDU recovered(Unit-23,HDU-04)                                                                   
07/21/2015 04:11:37 C0 I151HH Data recovery completed(Unit-23,HDU-04)                                                         
07/20/2015 18:28:07 C0 I150HH Data recovery started(Unit-23,HDU-04)                                                           
07/18/2015 12:30:12 C0 I008HH Data recovery to spare HDU(Unit-16,HDU-08)                                                      
07/18/2015 12:30:11 C0 I151HH Data recovery completed(Unit-22,HDU-14)                                                         
07/17/2015 16:24:36 C0 I150HH Data recovery started(Unit-22,HDU-14)                                                           
07/17/2015 15:56:10 C1 W060AT SATA HDU alarm(Unit-16,HDU-08)                                                        :HDU   /STRC
07/17/2015 15:56:10 C1 IY06HH DFPC error detect(Unit-16,HDU-08)[HDU TIME OUT]                                                 
07/14/2015 04:29:02 C0 I008HH
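
To see whether the timeouts cluster in particular trays, a rough Python sketch like the following could tally the "DFPC error detect ... [HDU TIME OUT]" events per unit. It only assumes the line layout shown in the logs above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the DFPC timeout events exactly as they are worded in the log
TIMEOUT_RE = re.compile(
    r"\bIY06HH DFPC error detect\(Unit-(\d+),HDU-(\d+)\)\[HDU TIME OUT\]"
)


def timeouts_per_tray(lines):
    """Count DFPC timeout events per unit (tray) number."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts
```

Calling `timeouts_per_tray(log_lines).most_common()` lists the trays with the most timeouts first, which might help tell a pattern of bad disks apart from a problem with one tray or loop.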
