Restarting a failed object replication

Question asked by Mike Hall on Jun 1, 2016
Latest reply on Jun 1, 2016 by Mike Hall

We have a DR setup that mirrors our main file system to an object replication target. A couple of weeks ago, the node serving the EVS that hosts this file system rebooted itself without explanation. After the file systems failed over, the object replication stopped completing. The first error reported is that the file system was unmounted during the backup:

 

2016-05-19 01:00:12.977-07:00 Snapshot index 3043689889793 has 18569417 indirection object entries

2016-05-19 01:00:12.977-07:00 Starting object replication session 786a231c-b1a4-11d1-973c-2f7c01c81f22

2016-05-19 22:22:00.393-07:00 Aborting: Source file system unmounting

 

The timestamp matches when we had to reboot the nodes because of a heap overload that stopped the main file system from being served over NFS.

 

The next time the replication was supposed to run, it gave us this error:

 

2016-05-20 01:00:30.449-07:00 Using latest common snapshot ID 46389582 (index 3040187645955) to get the old checkpoint number

2016-05-20 01:00:30.449-07:00 Found incomplete object replication of snapshot ID 46443022 on target, rolling back to snapshot ID 46389582

2016-05-20 01:01:04.260-07:00 Object replication failed: File system not mounted

 

This was probably because we hadn't yet migrated all the file systems back to their preferred EVSs. Since then, every object replication run has failed with this error:

 

Failed. No baseline snapshot could be established between the source and replication target file systems. Verify that the replication policy specifies the correct target file system

 

2016-05-23 01:00:12.205-07:00 Using replication features 0x7 = TrueClonesSupport | ObjTouchMarkAsSparseAndFree | DeclonedSnapshotFilesSupport

2016-05-23 01:00:12.221-07:00 Target has 9 persona entries

2016-05-23 01:00:12.307-07:00 Aborting: Latest object replication completed on the target has snapshot ID 46389582 which is not available on the source
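
One way to read the last log line is that the target's most recent completed replication (snapshot ID 46389582) no longer has a matching snapshot on the source, so no common baseline exists. The Python sketch below pulls that ID out of the log lines quoted above and checks it against a list of source snapshot IDs. The helper names and the source snapshot list are hypothetical and for illustration only; this is not a vendor tool, and the rule that an incremental run needs the baseline on the source is an assumption drawn from the error text.

```python
import re

# Log lines copied from the replication reports quoted in this post.
LOG = """\
2016-05-20 01:00:30.449-07:00 Using latest common snapshot ID 46389582 (index 3040187645955) to get the old checkpoint number
2016-05-20 01:00:30.449-07:00 Found incomplete object replication of snapshot ID 46443022 on target, rolling back to snapshot ID 46389582
2016-05-23 01:00:12.307-07:00 Aborting: Latest object replication completed on the target has snapshot ID 46389582 which is not available on the source
"""

# Matches the abort message and captures the snapshot ID the target expects.
MISSING_BASELINE = re.compile(
    r"completed on the target has snapshot ID (\d+) "
    r"which is not available on the source"
)


def missing_baseline_ids(log_text):
    """Return snapshot IDs the target reports as completed but the source lacks."""
    return [int(m.group(1)) for m in MISSING_BASELINE.finditer(log_text)]


def can_run_incremental(target_baseline_id, source_snapshot_ids):
    """Assumption: an incremental run needs the target's baseline on the source."""
    return target_baseline_id in set(source_snapshot_ids)


if __name__ == "__main__":
    # Hypothetical list of snapshot IDs currently present on the source.
    source_snapshots = [46500001, 46500002]
    for snap_id in missing_baseline_ids(LOG):
        ok = can_run_incremental(snap_id, source_snapshots)
        print(f"baseline {snap_id}: incremental possible = {ok}")
```

If the baseline ID is absent from the source list, as it appears to be here, the check returns False, which is consistent with the "No baseline snapshot could be established" failure.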

 

For object replication to work correctly, it needs two days' worth of snapshots so that it can essentially run an incremental backup. Unfortunately, I didn't set this up, and the sysadmin who did has since left the company. At this stage I think I need to start the whole thing again from scratch, though I suppose it may be possible to run an incremental backup from a recently created snapshot. Any ideas?
