It appears to be possible in some situations to overload the management interfaces of a Dell EqualLogic filer. We experienced this with a client: after an internal vulnerability scan (required for PCI DSS compliance), the monitoring system reported that the array's web interface had stopped responding.
The vulnerability scan essentially attempts to determine what software is in use on each device, and looks for known issues, either misconfiguration (e.g. insecure ciphers in use for SSL) or vulnerabilities (e.g. version X of package Y is known to be insecure). In doing so it generates a fairly large number of requests, so our assumption is that this somehow overloaded the device.
Further investigation showed that the CLI was unusable too, whether over SSH or via the serial console: the login would succeed, but then no prompt would appear.
The array continued to serve iSCSI requests perfectly (and, interestingly, SNMP remained operational for monitoring). Even so, losing access to the management interfaces is a serious problem: it is impossible to perform routine tasks such as managing snapshots, and in the event of a drive failure it would be very difficult to determine what had occurred and take the necessary action.
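The symptom described above (storage traffic healthy, management dead) can be told apart from a full outage by probing both paths separately. The following is a minimal sketch of such a check, not the monitoring setup used in this incident. The address 192.0.2.10 and the function names are hypothetical, port 3260 is the standard iSCSI port, and the URL is assumed to be plain HTTP (or HTTPS with a certificate the client trusts). A full HTTP request is used for the web interface rather than a bare TCP connect, since a hung service may still accept connections.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def web_ui_responds(url, timeout=10.0):
    """Return True if the web server sends back any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status: it is still alive.
        return True
    except (OSError, http.client.HTTPException):
        # Refused, timed out, TLS failure, or a garbled/empty response.
        return False


def classify(web_ok, iscsi_ok):
    """Map the two probe results onto a coarse health state."""
    if web_ok and iscsi_ok:
        return "ok"
    if iscsi_ok:
        return "management-unresponsive"
    if web_ok:
        return "iscsi-unreachable"
    return "array-unreachable"


if __name__ == "__main__":
    ARRAY = "192.0.2.10"  # hypothetical group IP address
    state = classify(
        web_ui_responds(f"http://{ARRAY}/"),
        tcp_port_open(ARRAY, 3260),  # 3260 is the standard iSCSI port
    )
    print(state)
```

A result of "management-unresponsive" matches the situation in this post: the array is still serving data, so there is time to fix the management side without resorting to a power cycle.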
After some research, discussions with Dell technical support, and further investigation, we found a very simple solution. CLI access to the array (via SSH or serial console) did not work as either the ‘grpadmin’ user or a RADIUS account, but it was possible to log in to the array as the ‘root’ user and get a root prompt. The password for the ‘root’ user will likely have been set when the array was first commissioned; if you don’t know what it is, one suggestion would be to try your ‘grpadmin’ password.
Once there, it was simply a matter of running the following command to restart the management interfaces (note that this does not restart the filer, just the management tools, so there should be no interruption in service):
eqlinit restart-snap netmgtd
This command completed essentially immediately, and the management interfaces became operational once more, allowing us to carry out some checks to ensure everything was as expected.
We need to point out that this information should be used at your own risk. In particular, if you have a support contract then we would always advise talking to Dell technical support before carrying out any unusual action such as this. That said, we hope it might come in useful in avoiding more drastic measures such as forcing a controller failover or hard power cycling the filer, with all the associated risks of data corruption etc.