I spent the last decade working with virtualization platforms and the certifications and accreditations that go along with them. During this time, I thought I understood what it meant to run an efficient data center. After six months of working with Red Hat CloudForms, a Cloud Management Platform (CMP), I now wonder what I was thinking. I encountered every one of the problems below, and each is preventable with the right solution. Remember, we live in the 21st century. Shouldn’t the software that we use act like it?
- We filled up a data store and all of the machines on it stopped working.
It does not matter whether it is a development environment or the mission-critical database cluster: when storage fills up, everything stops! More often than not, the cause is an excessive number of snapshots. The good news is that CloudForms can quickly be set up with a policy to recognize and prevent this. For example, we can check the storage utilization and take action if it is over 90% full or, better yet, when it is within two weeks of being full based on usage trends. That way, if manual action is required, there is enough forewarning to do so. Another good practice is to set up a policy that blocks taking more than a few snapshots. We all love to take snapshots, but there is a real cost to them, and there is no need to let them get out of hand.
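To make the "within two weeks of being full" idea concrete, here is a minimal sketch in plain Python. It is not CloudForms code: the function names, the one-sample-per-day input, and the straight-line growth model are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def days_until_full(used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until a datastore fills, given one usage sample per day.

    Fits a least-squares line through the samples and extrapolates from the
    most recent sample. Returns None when usage is flat or shrinking.
    """
    n = len(used_gb)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(used_gb) / n
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_gb))
    slope_den = sum((x - mean_x) ** 2 for x in range(n))
    slope = slope_num / slope_den  # growth in GB per day
    if slope <= 0:
        return None
    return max(0.0, (capacity_gb - used_gb[-1]) / slope)


def storage_action(used_gb: Sequence[float], capacity_gb: float,
                   pct_limit: float = 0.90, warn_days: float = 14) -> str:
    """Return 'act_now' above the utilization limit, 'warn' when the trend
    predicts a full datastore within warn_days, and 'ok' otherwise."""
    if used_gb[-1] / capacity_gb >= pct_limit:
        return "act_now"
    remaining = days_until_full(used_gb, capacity_gb)
    if remaining is not None and remaining <= warn_days:
        return "warn"
    return "ok"
```

With five daily samples growing by 10 GB per day and ending at 740 GB, an 850 GB datastore is below the 90% limit but about 11 days from full, so the trend check is what raises the warning.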
- I just got thousands of emails telling me that my host is down.
The only thing worse than no email alert is receiving thousands of them. In CloudForms it is not only easy to set up alerts, but also to define how often they should be acted upon. For example, check every hour, but only notify once per day.
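The "check every hour, notify once per day" behavior amounts to throttling notifications per alert. The sketch below shows that logic in plain Python; the `AlertThrottle` class and the alert key format are hypothetical and are not part of CloudForms.

```python
from datetime import datetime, timedelta
from typing import Dict


class AlertThrottle:
    """Suppress repeat notifications for the same alert key within a window."""

    def __init__(self, min_interval: timedelta = timedelta(days=1)) -> None:
        self.min_interval = min_interval
        self._last_sent: Dict[str, datetime] = {}

    def should_notify(self, key: str, now: datetime) -> bool:
        """Return True (and record the send time) only if the last
        notification for this key is at least min_interval old."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_sent[key] = now
        return True
```

If a host stays down for 48 hourly checks, this sends two emails instead of 48, while a different host going down still notifies immediately because it has its own key.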
- Your virtual machines (VMs) cannot be migrated because the VM tools updater CD-ROM image was not unmounted correctly.
This is a serious issue for a number of reasons. First, it breaks Disaster Recovery (DR) operations and can cause virtual machines to be out of balance. It also disables the ability to put a node into maintenance mode, potentially causing additional outages and delays. Most solutions involve writing a shell script that runs as root and periodically attempts to unmount the virtual CD-ROM drives. These scripts usually work, but they are both scary from a security standpoint and indiscriminately dangerous. Imagine physically ejecting the CD-ROM while the database administrator is in the middle of a database upgrade! With CloudForms we can set up a simple policy that unmounts drives once a day, but only after sanity checking that it is the correct CD-ROM image and that the system is in a state where it can be safely unmounted.
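The difference between a blind root script and a policy is the sanity check that runs first. A rough sketch of such a check in plain Python follows. Everything in it is an assumption for illustration: the image names, the `VmState` fields, and the definition of a "safe" state are invented here and do not come from CloudForms.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names of the tools-updater images we are willing to eject.
TOOLS_IMAGES = {"vmtools-linux.iso", "vmtools-windows.iso"}


@dataclass
class VmState:
    name: str
    cdrom_image: Optional[str]      # None when nothing is mounted
    active_console_sessions: int    # assumed signal that someone is working
    tools_install_running: bool     # assumed signal that the image is in use


def safe_to_unmount(vm: VmState) -> bool:
    """Allow an unmount only for a known tools image on an idle VM."""
    if vm.cdrom_image is None:
        return False  # nothing to do
    if vm.cdrom_image not in TOOLS_IMAGES:
        return False  # someone mounted this on purpose; leave it alone
    if vm.tools_install_running or vm.active_console_sessions > 0:
        return False  # the image may be in use right now
    return True
```

A database installer image, or a tools image on a VM with an open console session, is left untouched, which is exactly the case the indiscriminate script gets wrong.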
- I have to manually ensure that all of my systems pass an incredibly detailed and painful compliance check (STIGs, PCI, FIPS, etc.) by next week!
I have lost weeks of my life to this, and if you have not had the pleasure, count yourself lucky. When the “friendly” auditors show up with a stack of three-ring binders and a mandate to check everything, you might as well clear your calendar for the next few weeks. In addition, since these checks are usually a requirement for continued operations, expect many of these meetings to involve layers of upper management you did not know existed, and this is definitely not the best time to become acquainted. The good news is that CloudForms allows you to run automatic checks on VMs and hosts. If you are not already familiar with its OpenSCAP scanning capability, you owe yourself a look. Not only that, but if someone attempts to bring a VM online that is not compliant, CloudForms can shut it right back down. That is the type of peace of mind that allows for sleep-filled nights.
- Someone logged into a production server as root using the virtual console and broke it. Now you have to physically hunt down and interrogate all the potential culprits — as well as fix the problem.
Before you pull out your foam bat and roam the halls to apply some “sense” to the person who did this, it is good to know exactly who it was and what they did. With CloudForms you can see a timeline of each machine, who logged into what console, as well as perform a drift analysis to potentially see what changed. With this knowledge you can now not only fix the problem, but also “educate” the responsible party.
- The developers insist that all VMs must have 8 vCPUs and 64GB of RAM.
The best way to fight flagrant waste of resources is with data. CloudForms provides a “Right-Sizing” feature that watches VMs operate and determines the ideal resource allocation for each. With this information in hand, CloudForms can either automatically adjust the allocations or produce a report showing what the excess resources are costing.
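The general idea behind right-sizing can be sketched in a few lines of Python. This is not how CloudForms computes its recommendations: the 95th-percentile rule, the 25% headroom factor, and the function names are assumptions chosen only to illustrate sizing from observed usage.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_vcpus(allocated_vcpus: int, cpu_util_pct: Sequence[float],
                    headroom: float = 1.25) -> int:
    """Suggest a vCPU count from observed utilization percentages.

    Sizes for the 95th-percentile load plus headroom, never below 1 and
    never above the current allocation (utilization capped at 100% cannot
    tell us how much more a saturated VM would use).
    """
    busy_vcpus = percentile(cpu_util_pct, 95) / 100 * allocated_vcpus
    return max(1, min(allocated_vcpus, math.ceil(busy_vcpus * headroom)))
```

An 8 vCPU machine that sits at 25% utilization keeps two vCPUs busy, so this sketch suggests three; multiplying the difference by a per-vCPU cost gives the number for the report.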
- Someone keeps creating 32-bit VMs with more than 4GB of RAM!
As we know, there is no “good” way for a 32-bit VM to use that much memory, so it is essentially just waste. A simple CloudForms policy that checks for “OS Type = 32bit” and “RAM > 4GB” can produce a very interesting report. Or better yet, put a policy in place to automatically adjust the memory to 4GB and notify the system owner.
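The policy condition and its follow-up action are simple enough to sketch in plain Python. The `Vm` record and both function names are hypothetical and stand in for whatever inventory data the platform exposes.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_32BIT_RAM_MB = 4096  # the 4GB ceiling from the policy above


@dataclass
class Vm:
    name: str
    os_bits: int   # 32 or 64
    ram_mb: int
    owner: str


def oversized_32bit(vms: Iterable[Vm]) -> List[Vm]:
    """Report step: every 32-bit VM allocated more than 4GB of RAM."""
    return [vm for vm in vms if vm.os_bits == 32 and vm.ram_mb > MAX_32BIT_RAM_MB]


def enforce_ram_cap(vms: Iterable[Vm]) -> List[str]:
    """Action step: cap the allocation and return one notice per owner."""
    notices = []
    for vm in oversized_32bit(vms):
        notices.append(
            f"{vm.owner}: {vm.name} reduced from {vm.ram_mb} MB to {MAX_32BIT_RAM_MB} MB"
        )
        vm.ram_mb = MAX_32BIT_RAM_MB
    return notices
```

Running the report step alone is the non-intrusive option; the action step is the "better yet" variant that fixes the allocation and tells the owner what happened.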
- I have to buy hardware for next year, but my capacity-planning formula involves a spreadsheet and a dart board.
Long-term planning in IT is hard, especially with dynamic workloads in a multi-cloud environment. Once CloudForms is running, it automatically collects performance data and performs trend-line analysis to assist with operational management. For example, it might tell you that in 23 days you will be out of storage on your production SAN. If that does not get the system administrator’s attention, nothing will. It can also run simulations to show what your environment would look like if you added resources or workloads, so you can see your trend lines and capacity after adding, say, another 100 VMs of a particular type and size.
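A what-if question such as "can I add another 100 VMs of this size?" reduces to simple arithmetic once capacity and usage are known. The sketch below is plain Python with made-up numbers in the usage note. The `Resources` type, the function name, and the no-overcommit assumption are all illustrative and are not drawn from CloudForms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpus: int
    ram_gb: int
    disk_gb: int


def max_additional_vms(capacity: Resources, used: Resources, vm_size: Resources) -> int:
    """How many more VMs of vm_size fit, assuming no overcommit.

    The scarcest resource decides; vm_size fields must all be positive.
    """
    fits = min(
        (capacity.vcpus - used.vcpus) // vm_size.vcpus,
        (capacity.ram_gb - used.ram_gb) // vm_size.ram_gb,
        (capacity.disk_gb - used.disk_gb) // vm_size.disk_gb,
    )
    return max(0, fits)  # an already over-full environment fits zero
```

For a hypothetical cluster with 312 free vCPUs, 2096 GB of free RAM, and 40000 GB of free disk, a 2 vCPU / 8 GB / 100 GB VM type is limited by CPU, so the next purchase should be compute rather than storage.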
- For some reason two hosts were swapping VMs back and forth, and I only found out when people complained about performance.
As an administrator, there is no worse way to find out that something is wrong than being told by a user. Large-scale issues such as this can be hard to spot in the logs, because each individual entry looks like typical output. With CloudForms, a timeline overview of the entire environment highlights issues like this, and the root cause can be tracked down.
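Each migration looks normal on its own; the problem only shows up as a pattern over time. As a sketch of what spotting that pattern involves, the Python below flags any VM that moves between the same pair of hosts several times within a window. The event tuple layout, the one-hour window, and the threshold of four moves are assumptions for illustration, not CloudForms behavior.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

Event = Tuple[datetime, str, str, str]  # (timestamp, vm, source host, destination host)


def find_ping_pong(events: Iterable[Event],
                   window: timedelta = timedelta(hours=1),
                   min_moves: int = 4) -> Set[Tuple[str, Tuple[str, ...]]]:
    """Return (vm, host pair) entries with min_moves migrations inside window."""
    by_key = defaultdict(list)
    for ts, vm, src, dst in events:
        # Direction is ignored so A->B and B->A count toward the same pair.
        by_key[(vm, frozenset((src, dst)))].append(ts)

    flagged = set()
    for (vm, hosts), stamps in by_key.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            # Slide the window start forward until it spans at most `window`.
            while ts - stamps[start] > window:
                start += 1
            if end - start + 1 >= min_moves:
                flagged.add((vm, tuple(sorted(hosts))))
                break
    return flagged
```

A VM that bounces between two hosts every ten minutes is flagged after its fourth move, while a VM that migrates once during routine maintenance is not.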
- I spend most of my day pushing buttons, spinning up VMs, manually grouping them into virtual folders and tracking them with spreadsheets.
Before starting a new administrator role, it is always good to ask for the “Point of Truth” system that keeps track of what systems are running, where they are, and who is responsible for them. More often than not the answer is, “A guy, who keeps track of the list, on his laptop.” This may be how it was always done, but now with tools such as CloudForms, you can automatically tag machines based on location, project, user, or any other combination of characteristics and, as a bonus, provide usage and costing information back to the user. Gary, the guy with the laptop, could only dream of providing that much helpful information.
There is never enough time in the day, and the pace of new technologies is accelerating. The only way to keep up is to automate processes. The tools that got you where you are today are not necessarily the same ones that will get you through the next generation of technologies. It will be critical to have tools that work across multiple infrastructure components and provide the visibility and automation required. This is why you need a cloud management platform and where the real power of CloudForms comes into play.