I may not have enough info, so please forgive me; I'm part of a team and just trying to help out.
We are having issues with our shares being extremely slow.
When we fail over from one node to another, it works perfectly for about 20 minutes, then goes back to being slow.
Any help would be great.
The following is from an email chain:
I just got off the phone with XYZ and HDS services (support). XYZ will be working on the logs I sent last night to get a better understanding of what is going on from the array perspective. HDS services is in the process of looking at the logs and diagnostics I uploaded last night. Will keep you posted on any updates.
Current array stats:
CPU below 40%
Cache write pending below 10%
NAS Pool drives busy rate below 65% (These were consistently @ 90+%)
CPU below 5%
IOPS below 1000 (typically around 3500)
Ethernet throughput below 50 mb/sec
Disk latency below 50ms
FPGA load @ 100% (there are multiple FPGAs in the HNAS device and if one is @ 100%, then that is what it will report)
Man, what all these stats tell me is that the issue might be somewhere else.
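Since the array-side counters look healthy while clients see slow responses, it may help to measure latency from the client side and line it up against the array stats. Below is a minimal sketch (the share path, sample count, and interval are my assumptions, not details from this thread) that repeatedly times a small read from a file on the mounted share:

```python
import time

def probe_read_latency(path, samples=5, interval_s=1.0, chunk=4096):
    """Time reading the first `chunk` bytes of `path`, `samples` times.

    Returns a list of (unix_timestamp, latency_seconds) tuples.
    """
    results = []
    for _ in range(samples):
        ts = time.time()                     # wall-clock stamp for correlation
        start = time.perf_counter()          # monotonic clock for the latency itself
        with open(path, "rb") as f:
            f.read(chunk)
        results.append((ts, time.perf_counter() - start))
        time.sleep(interval_s)
    return results

if __name__ == "__main__":
    # Hypothetical mount point -- replace with a file on the affected share.
    share_file = "/mnt/nas_share/probe.txt"
    for ts, lat in probe_read_latency(share_file):
        print("%.0f  %.2f ms" % (ts, lat * 1000.0))
```

Running this on a client through a failover would show exactly when the ~20-minute "fast" window ends, and whether the client-observed slowdown tracks the FPGA load, the disk latency, or something on the network path instead.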
xyz and HDS have been analyzing the filer logs in detail. The performance statistics on the filer are now better than they have been in quite a while, yet the actual response to requests against the NAS became significantly worse starting yesterday, so we want to look at the other common components for user and system access. Network is next in line, so xyz is reaching out to xyz for help on that. If anyone has any other ideas, feel free to chime in. This one has us all scratching our heads as to what caused the sudden change in performance.
On December 21st the network team moved all of the voice equipment, the VPN WAN routers, the DHQ Gigamans, the VPN330 boxes, and the Riverbeds. Yesterday morning around 10am, I moved the Blue Coat reverse proxy, the Aruba wireless controller, and the Foundry that was still on the old side, though it was already cabled into the Nexus.
On Dec 21st we put the new storage shelves into the Hitachi frame. This was done with Hitachi on site to certify it, and those shelves are what we had been replicating data to in order to relieve some of the burden on the disks. That would seem to be a likely culprit, which is why we have been going through everything on the Hitachi side in great detail. That is where the confusion comes in: the drive busy rate of 65% that we are seeing now is lower than it has been in several months, and the IOPS of 1000 is significantly lower than normal. The FPGA load is what we are focusing on now since it is higher, but HDS indicates that it is not a problem. The low drive busy rate and low IOPS almost seem to indicate that a lower rate of requests is reaching the system. We have also discussed the option of backing out of that addition. I just got off the phone with xyz, and he and xyz are in the process of testing a couple of things. In addition, xyz and HDS are looking at some other items on the Hitachi side. When they are done, we would like to all get on the conference bridge and discuss where we are and what the next steps should be. 1:30 should give them enough time to finish what they are working on, so I will send an invite for then.