Returning to the subject of the human factor as the foundation of data center resiliency: 70% of equipment failures can be attributed to the human factor. The following reasons can lead to human error:
Lack of knowledge, training, and understanding is the biggest issue in a data center. Furthermore, people make the most errors during a crisis.
I have seen many RAID errors and crashes that could easily have been avoided if only the person had known to check the logs daily. If one drive fails, the remaining drives can normally keep the array running. However, leaving the failed drive unreplaced is dangerous, because the array may not survive a second drive failure.
In the past I hired three IT professionals to run my network. I'll never do that again; I had nothing but trouble. One time a RAID array failed, and the idiot replaced the wrong disk and rebuilt the array. That ended in a very expensive data recovery job for me. After this I decided to hire a professional company to handle my network. I am happier and haven't had any more incidents.
Lack of knowledge of documentation
Lack of knowledge of proper procedures
Low level of proficiency
Improper placement of equipment and switches
Lack of motivation
Inadvertence
Fatigue