Adding Monitors to DataDog

One of my primary complaints with something like Nagios is adding monitors, which is typically done via config files that need to be reloaded – if a file is not in the correct format, sometimes the check doesn't work and sometimes Nagios doesn't restart. Now that I have a host added to DataDog, I am going to take a look at how to create a monitor. Since the host I added is my Ansible master, I will look at adding a monitor to ensure that Ansible is running.

In the DataDog interface, navigate to Monitors >> New Monitor. There are quite a few options to select from, including specific metrics and services, as well as a check that ensures a specific host is checking into DataDog. That last one is quite likely a check you will want for machines of the "pet" variety, since DataDog will remove hosts that have not checked in within a 3 hour window.


To monitor for a specific process, select … yup … Process. Sadly, this requires config file updates on each host, no less – not something that is easy to accomplish at scale without something like Ansible to ensure all of the desired config files are deployed. Still, I am going to check it out. To add a process monitor to a host, the directions say to edit conf.d/process.yaml. The problem with that is, as you might recognize, there is generally no conf.d directory at the root of the filesystem, so the directions are a bit off (as I find most Linux directions are – sorry, but it's true). I will assume for a moment they are referring to the conf.d/process.yaml file for the DataDog agent, but again, this information isn't something that was presented during the install.

The full path you are looking for is /etc/dd-agent/conf.d – in this folder you will find an example process.yaml file (slight annoyance – other related items are in directories/sub-directories that start with "datadog" – but not this one). Here is the example file:

  init_config:
    # the check will refresh the matching pid list every X seconds
    # except if it detects a change before. You might want to set it
    # low if you want to alert on process service checks.
    # pid_cache_duration: 120

  instances:
  # The `system.processes.cpu.pct` metric sent by this check is only accurate for processes that live
  # for more than 30 seconds. Do not expect its value to be accurate for shorter-lived processes.
  #  - name: (required) STRING. It will be used to uniquely identify your metrics as they will be tagged with this name
  #    search_string: (required) LIST OF STRINGS. If one of the elements in the list matches,
  #                   return the counter of all the processes that contain the string
  #    exact_match: (optional) Boolean. Defaults to True; if you want to look for an arbitrary
  #                 string, use exact_match: False
  #    ignore_denied_access: (optional) Boolean. Defaults to True. When getting the number of file descriptors, the dd-agent user might
  #                 get a denied access error. Set this to True to not issue a warning if that happens.
  #    thresholds: (optional) Two ranges: critical and warning
  #         warning: (optional) List of two values: If the number of processes found is below the first value or
  #                  above the second one, the process check will return WARNING.
  #         critical: (optional) List of two values: If the number of processes found is below the first value or
  #                   above the second one, the process check will return CRITICAL.
  #    In the ssh example below, the process check will return OK for 3 to 5 processes, WARNING for 1, 2, 6, or 7 processes, and CRITICAL below 1 or above 7.
  #    CRITICAL is always dominant in case of overlap.
  # Examples:

    - name: ssh
      search_string: ['ssh', 'sshd']
      # tags:
      #   - env:staging
      #   - cluster:big-data
      thresholds:
        # critical if no sshd processes or more than 7 are running
        critical: [1, 7]
        # warning if 1, 2, 6, or 7 sshd processes are running
        warning: [3, 5]
        # ok if 3, 4, or 5 processes are running

    - name: postgres
      search_string: ['postgres']
      ignore_denied_access: True

    - name: nodeserver
      search_string: ['node server.js']
I made a copy of this file and removed the example checks, leaving the required init_config and instances sections with just a single check for Ansible:

  init_config:

  instances:
    - name: ansible
      search_string: ['ansible']

And restarted the agent (sudo /etc/init.d/datadog-agent restart). Once the new monitor is created, you will be able to select it, set the thresholds and conditions that fit your needs, and customize the notifications before saving. You can now navigate back to Monitors >> Manage Monitors to see what you have created, as well as the status of each monitor.
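
As mentioned above, pushing a change like this to every host is where something like Ansible earns its keep. Here is a minimal sketch of a play that deploys the process check and restarts the agent – the host group name, local file path, and file ownership are my own assumptions:

  - hosts: datadog_agents
    become: yes
    tasks:
      - name: Deploy the DataDog process check config
        copy:
          src: files/process.yaml
          dest: /etc/dd-agent/conf.d/process.yaml
          owner: dd-agent
          group: root
          mode: "0640"
        notify: restart datadog-agent
    handlers:
      - name: restart datadog-agent
        service:
          name: datadog-agent
          state: restarted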



DataDog First Impressions

After a sad first experience with another "Software as a Service" monitoring vendor – one I can't sign up for and get access to without first talking to a sales person, and one that doesn't publish its pricing – I decided to give DataDog a try. DataDog is free for up to 5 hosts, sufficient maybe for very small SMBs or for a specific, small application instance; beyond that it is $15 per host per month billed annually (or $18 per host per month billed monthly). You can sign up for DataDog free or pro on their website, but for enterprise support you'll have to talk to the dreaded sales person.

Once you sign up, you will need to select the services you use (though this doesn't seem to relate to the agents available); for example, there are options for AWS, Docker, Google Cloud Platform, and vSphere (via a Windows vCenter agent, so it appears there is no VCSA support), as well as other common applications such as Apache, Tomcat, PagerDuty, Microsoft IIS, and Microsoft SQL Server (props to the DataDog team for including Microsoft technologies – they aren't the cool hipster things to use, but they are damn stable and underrated in today's startupie/devopsie world).


As you can see above, I have selected AWS, IIS, vSphere, and SQL since these are things I have access to in my lab. Once complete, select your agent; I'll start with Ubuntu since that is what my Ansible control machine is running on. Once selected, you will get a command to run to pull down the agent.


The command/directions provided will change based on your operating system, as you might expect. You will need admin/sudo access for the agent installation, which will proceed when you enter the command provided.
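
If you would rather not paste the one-liner on each host, it can be wrapped in a quick Ansible play. This is a rough sketch – the install script URL is from memory of what the wizard produced at the time (and may have changed since), and the group and variable names are my own:

  - hosts: new_datadog_hosts
    become: yes
    vars:
      datadog_api_key: "YOUR_API_KEY"
    tasks:
      - name: Install the DataDog agent with the official install script
        shell: >
          DD_API_KEY={{ datadog_api_key }}
          bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/dd-agent/master/packaging/datadog-agent/source/install_agent.sh)"
        args:
          creates: /etc/dd-agent/datadog.conf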


The agent install will complete, and the agent will connect to DataDog.


Once you have finished selecting OS types in the DataDog wizard, click Next. You will see information collected by DataDog; right off the bat it shows that the machine I added has NTP issues – which makes sense, since I never set up NTP!


In addition to Events (which you are looking at above), you can also see

  • Dashboards, which you will need to create
  • Infrastructure lists and maps
  • A list of triggered monitors, and the ability to create new monitors
  • Integrations for the various systems I listed earlier

Time to let the agent do some work. I'll check back in on DataDog in a day or so to see how it's doing collecting information, as well as look at how easy (or not) it is to create new monitors (currently a primary complaint of mine with Nagios and Sensu).

LogicMonitor – Bad First Impressions

TL;DR – Sorry for the disappointing post; I was really hoping to finally get some hands-on time with LogicMonitor tonight. It appears, though, that they do not have a great understanding of "Software as a Service," as their signup simply generates an automated email from sales and provides no access out of the gate.

Before I get to LogicMonitor, allow me to wonder aloud for a moment why monitoring products are so horrible… I've been a long-time Nagios user, and… it is what it is. How it has not evolved beyond archaic config files is beyond me. How new tools like Sensu seem to be just as convoluted as Nagios was/is, is also beyond me. With that, I am hoping that some SaaS-based solutions are a bit more modern for users to deal with – I don't want to make a career of learning how to write checks, alerts, and other modules – it's 2015 – this should just work by now.

LogicMonitor has been on my radar for a while now, and I often run into them at VMUGs and VMworld. My one complaint with them: they don't publish their pricing, which is generally a huge red flag for me. There is also no free tier beyond the 14-day trial, and I'm not going to lay out any amount of money after 14 days – it is simply not long enough to get a good gauge of a monitoring tool's effectiveness. But, alas, this also "is what it is." You can sign up for the 14-day trial with no credit card, so that is good at least.

Once you sign up… you get an email stating someone is going to contact you… man – what a disappointment. I hope the folks at LogicMonitor are reading this; this is not how to attract customers.



Off to test DataDog it looks like.

Update: At least the automated email arrived quickly; however, it shouldn't take 30 minutes to add a "test instance".


It’s almost time for #Commitmas!

As everyone gets ready for Thanksgiving in the US, I wanted to remind you that Commitmas 2015 is right around the corner. I don't know what I would have done last year without Commitmas to pull us through. Anyway – uh, Commitmas? Huh, could it be that some of you are not acquainted with the story of Commitmas?

Matthew Brender came up with the idea for Commitmas last year to help people who were new to GitHub gain experience with the platform and learn about what it can do. Well, it is back this year, except we are taking the 12 Days of Commitmas and expanding the holiday to 30 days! During those 30 days we have presenters lined up for each day of the work week (Mon-Fri) to present on various topics.

It will start with an introduction to GitHub by Matthew Brender, and the first few episodes will help everyone new to GitHub get acquainted with the terms and with working with the tools on their local machine. As the month goes on, you will start to see how GitHub can be part of your normal workflow. You'll see how it can be used with network configs, for building web pages, and with other popular products such as Slack.

Currently there are 4 openings left in the schedule; if you would like to present, please contact either myself or Rob Nelson (@rnelson0), or check out the 30 Days of Commitmas 2015 schedule over at GitHub, where you can also find more details about how to participate. In short, if you are new to GitHub, the goal is to get set up and commit something you learned that day to the README file in your GitHub repository (if none of that makes sense, don't worry – it will). For people with more experience, there are other challenges, such as working on other projects or contributing to open source projects, as well as helping those who are new get over hurdles.

You can sign up for the Commitmas #vBrownBag episodes at

Hope to see you there!

Ansible Role for Sensu

I am looking into various monitoring products, and since I may need to install them again, that means automation. With some help from Sarah Zelechoski and Larry Smith, I have the first pass done on an Ansible role for Sensu. There may be better ones out there, or you might just want to follow the directions manually, but so far this role gets Sensu working up through the base install with examples. This is just as much about me getting better with Ansible – don't like how I did something? PRs welcome, as that is how I will learn from those with more experience.
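
Using the role should be as simple as the playbook below – a minimal sketch, assuming the role has been cloned into your roles path under the name sensu (the host group is also just a placeholder):

  - hosts: monitoring
    become: yes
    roles:
      - sensu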

The role is still a work in progress, so there is still some cleanup to do, but it should get you going quickly on Ubuntu 14.04. Here is a screenshot of the Sensu server log:


You can grab it from


Where are all the blackbox VMware vendors?

In my last post, I wanted to show people who think of OpenStack as "too complex" how it stacks up, component for component, against a VMware stack; now, that is not to say they are equal in every way – far from it. They can, however, coexist, as I still believe they have different purposes in the data center. That post led to a conversation with Trevor Pott which basically boiled down to KVM vs. ESXi – a very interesting topic, and not something we are likely to solve on Twitter or in a blog post. But, as Trevor usually does, he made me think (shaking fist in air at Trevor – how dare you make me think!).

My position is that I would never deploy KVM to an SMB; Trevor has experience doing exactly that, and has done so successfully. He also cited vendors such as Scale Computing as a success story for deploying KVM in the SMB, which in my opinion is not an apples-to-apples comparison between KVM and ESXi.

First and foremost, when I make the decision to deploy Scale Computing, or similar vendors who have essentially hidden or abstracted KVM into a black box behind the scenes, I am generally not making the decision to deploy them BECAUSE of KVM. Rather, I would choose Scale Computing for a great many SMBs because of what Scale Computing has built: it lets me as an administrator worry about managing my virtual machines and applications, not the underlying hypervisor. There are also quite a few SMBs where even something like Scale Computing has too high a cost. Having worked at smaller VARs/MSPs, I can tell you that there are many SMBs who don't need more than a few servers to operate their business, and I can – and have – done this very easily with ESXi and no vCenter while still meeting SLAs and availability requirements.

Just to be clear here – Scale Computing is a GREAT solution, and I am not bashing them because of KVM. They have proved this model out, and I would sign off on the purchase and implementation any day. For that same company, however, I might (and probably would) not sign off on a pure KVM deployment.

Another concern I would have with deploying pure KVM at an SMB is ongoing maintenance and support. While I, as an engineer, may very well be able to support the KVM deployment, you also need to consider that you may not always be around – I mean, heck, if I won the lottery tomorrow I'd probably be taking a long vacation at the beach. So, why is that a problem? Having been Director of IT for a few different organizations, and involved in the hiring process with several others, I can tell you that finding a quality Linux operations/IT engineer is no small task – at least in Boston (which isn't exactly the equivalent of Death Valley for the tech world). Now, I have met a great many programmers well versed in Linux – but they don't want to do ops, and if we are talking about a typical, non-tech SMB, they are probably not interested in doing it full time… they'd rather be developing – and that is totally cool, but it also means the SMB will struggle to replace your skill set. (That's not to say programmers can't do IT – please don't extract that from what I have said – it's just generally not their passion, not the thing they want to do every day. Someone I consider one of my best friends is a programmer who understands and COULD do IT/infrastructure very easily; it's just not what he wants to do.)

Another area to consider is the ecosystem – how many backup vendors support KVM? Veeam? Nope; they support VMware and Hyper-V though! Unitrends? Nope; VMware and Hyper-V though! Now, before you go rip me up on Twitter: yes, KVM runs on Linux, so if you can back up Linux you can back up the related files. You can also do in-guest backups of the virtual machines, but at that point you may be making decisions to scrap previous investment and knowledge in certain tools in order to support KVM. Check out IT support forums and watch the tumbleweeds roll by when someone asks a question on pure KVM or Xen. Since I mentioned Xen, this is probably a great time to point out that before my first production deployment of VMware back in 2008, we had tested VMware, Xen, and Hyper-V. From a pure technology and performance standpoint, we actually selected Xen for the project; however, during the production POC we ran into some errors even Google hadn't heard of. We hopped on the VMware community forums, started searching for posts related to the NIC we were using (it was a NIC problem with Xen), and found plenty of similar posts and ways to fix it – we ended up switching to VMware (and this was a company that pinched EVERY penny – even the CTO saw the value in spending the money on the VMware licenses).

If KVM and OpenStack are so great, why are so many vendors making a career out of hiding or abstracting them behind some other management layer? If we go up to a full "cloud" solution, you have vendors like Mirantis also abstracting KVM and OpenStack, as well as Project Caspian, which EMC announced at EMC World. While I am sure there are some, I don't know of any vendors who have built a black box to sell to customers that completely masks the full suite of VMware products – why is that? Some might point out that Nutanix has built Acropolis, which can abstract ESXi/vCenter – and that is quite true. However, I don't think (but don't know, since I don't work for Nutanix) they built Acropolis with the intention of making ESXi/vCenter easier to manage – they built it so they would not have to rely on ESXi at all and could deploy KVM instead (that's not meant to be FUD, Josh). Acropolis gives Nutanix customers a choice of which hypervisor they want to deploy, without having to worry about maintaining and managing KVM – again, at that point KVM is a black box.

So if KVM and OpenStack are so great, why are there so many black box vendors abstracting them? And if the deployment and management of a full suite of VMware components is so awful and hard… why are there no black box vendors for that?

Before you go – please read!!!

Between this and my last post, you would think I'd hate myself. I am not saying KVM or OpenStack is bad, nor am I saying that VMware/vSphere/vRA is the only solution you should consider; but, as with everything, it comes down to both TECHNICAL AND BUSINESS requirements as to what the right solution is. The black box solutions vendors like Scale Computing produce and support are top notch, and if you are an SMB admin you should have them on your list for your next refresh, but you also need to consider the impact of deploying the pure versions of the underlying technology as well.

OpenStack is too complex, I'll stick with VMware

Over the last 6 months or so I've had to spend more time with OpenStack than I had in my entire career combined. One thing I kept hearing was how complex OpenStack was/is – that there were too many components to keep track of. As I sat down to really think about that, especially as it relates to VMware, I came up with this. Now, not everything is a perfect 1-to-1 match, so please don't tear me up on Twitter/comments, but I think you'll get the point:


So, still think VMware is less complex? Or has fewer components? Now, certainly there are differences in deployment and configuration, but around VMware's 5th birthday (circa ESX 3.0/3.5) it wasn't that easy to install either, and there certainly wasn't the range of products that we have today. Given the number of different products VMware offers for different purposes, I think, at least at a 30,000-foot view, it is clear that the number of products/programs in a VMware-based solution and in an OpenStack-based solution is very similar.

From everything I have seen so far, I still feel that OpenStack is more suited for organizations that can dedicate development resources to maintaining their infrastructure, versus developing business applications.

Also, sorry for the clickbait title :)

Configure vCenter Server Appliance 6 (VCSA6) after VMware Workstation Deployment

**Update: If, like me, you only have the 6.0.0 version of the VCSA available to download, check out this post for how to install Update 1 to get the VAMI back. Thank you, Christian, for the reminder that the VAMI has been brought back**

As some folks may recall from last year's #vDM30in30, I run my home lab on a single 8-core AMD box with all of the hosts nested. I do, however, prefer to run certain virtual machines, such as my Domain Controller and vCenter Server, in Workstation as well, so I can have those powered on without having to worry about host issues (after all, this is a lab, and I h0rked up my ESXi 5.5 hosts pretty good doing some testing).

Since the new vCenter Server Appliance (VCSA6) has a new deployment method, installing it in Workstation is not obvious; however, Florian Grehl has a great write-up on how to do just that – thanks, Florian. Once you have it deployed, there are a few things to take care of that the installer would otherwise have handled.


  • Next, navigate to Active Directory and click the Join button. Enter the details for your domain


For me, using the current latest version of the VCSA6, I did not receive an acknowledgement that the join was successful; however, if you check AD you should see the computer object created in the OU/CN specified.

  • Reboot the VCSA6 (Right click on the node in the vSphere Web Client and select Reboot). Since this is deployed in Workstation, we can monitor the progress of the restart there.


Once the VCSA6 completes the restart, you should be able to access the vSphere Web Client by its FQDN. If you navigate back to the node, you can now see that it is joined to the domain.



At this point, I would normally set the NTP server as well; however, I don't see an option to do so through the vSphere Web Client. This is where updating to Update 1 comes in handy, as the NTP server can be set through the new HTML5 VAMI (https://<vc-fqdn>:5480). See this post if you had to download the 6.0.0 version of the VCSA and need to install Update 1.

Next, you will need to configure SSO. Navigate to Administration >> Single Sign On >> Configuration. Here you can update the policies, and add AD as an identity source. Time to get my lab reconfigured!

Clean Up Active Directory Computer Objects with vRealize Automation Custom Properties and Build Profiles

Chalk this up to "I should have paid more attention when RTFD (reading the documentation)," but since I missed it tucked away in there, I thought others might have as well. vRealize Automation ships with several custom properties that you can use to delete Active Directory computer objects when destroying a virtual machine deployed through vRA. This is excellent for Windows shops who might otherwise have to build some other means of cleanup. One item to point out, however, is that this deletes the computer object immediately. If you have any type of retention period during which you might have to restore these VMs, you would then also need to restore the AD object.

This also assumes you are joining Windows-based VMs to AD when they are deployed, through a customization spec – or doing it manually, but that seems like it would be a pain (I hope that isn't your job!).

Steps to perform as a Fabric Administrator

  • Log into vRA as a fabric admin and navigate to Infrastructure >> Blueprints >> Build Profiles
  • Create a new build profile and provide a name such as ADCleanUp
  • In the Add from property set area, select ActiveDirectoryCleanupPlugin
  • Click the pencil icon next to Plugin.AdMachineCleanup.UserName and enter the username for an account to which you have delegated rights to delete computer objects – or be bad and use your domain admin account (that is bad, so don't, but if you do, I told you so!)
  • Click the pencil icon next to Plugin.AdMachineCleanup.Password, check the Encrypted checkbox, and enter the password for the account used in the previous step (the end result is sketched below)
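
For illustration, the finished build profile ends up holding a pair of properties along these lines (the service account name is purely a made-up example):

  Plugin.AdMachineCleanup.UserName: svc-adcleanup@example.local
  Plugin.AdMachineCleanup.Password: <entered with the Encrypted checkbox checked>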

Steps to perform as a Tenant Administrator

  • Log into vRA as a tenant admin and edit the blueprint you want to assign the build profile to
  • Click on the Properties tab and check the checkbox for ADCleanUp (or whatever you named yours)

Log in as a user who has access to the blueprint, deploy it, check AD, destroy it, and check AD again – poof – it's gone!


Deploying the EMC vVNX VVOLs Technical Preview

**Disclaimer: I am an EMC employee. This post was not requested or required by my employer, it is simply my experience getting to know the product**

Back in May at EMC World, support for VMware Virtual Volumes, or VVOLs, was announced for the Virtual VNX (vVNX), which was also released that May (though without VVOL support at launch). Fast forward to August, and a Technical Preview of VVOLs was released, which anyone can download. In this blog post I will be deploying the vVNX and making sure it is set up and ready to attach to a vSphere 6 cluster (future post).

Make sure you have a host that can take a VM with 12GB of RAM; I tried dropping it down to 4GB and it was unnnnnhappy. I am going to try again after this post to redeploy and start it with 8GB to see if that works. After downloading the appliance, log into the vSphere Web Client, start the Deploy OVF Template wizard, and select the local file you downloaded.

  • Accept the extra configuration options and EULA
  • Provide a name and select a folder
  • Select whether to thin, or thick provision the appliance (you will need a bit over 2GB for thin, and 84GB for thick)
  • Select the networks for the vVNX management and data interfaces
  • Provide the system name and IP information for the interfaces


Click Finish to start the deployment – do not set the virtual machine to power on after deployment. When the deployment finishes, add the additional drives the vVNX will use for storage, and then power on the virtual machine.

Open a web browser and navigate via https to the management IP address you provided, then log in with the default username and password of admin / Password123#, which will launch the initial configuration wizard.


The following are the high level steps for the vVNX initial configuration:

  • Accept the EULA
  • Change the admin password from Password123#
  • [Optional] Set a different service password (default is to make it the same)
  • Register and save your vVNX license file by providing the System UUID
  • Set the DNS & NTP servers for your network – note if there is a major time difference you will need to set NTP later
  • Create storage pools and add drives – for example I added three drives to my vVNX from my Synology, so I have created a pool called vxprt-capacity


  • Assign tiers to the drives; I opted for capacity
  • During the pool creation wizard, when prompted, check the box for Create VMware Capability Profile and assign tags
  • Add iSCSI interfaces
  • Optionally, create a NAS (I skipped this step for now)

Once the initial configuration is complete, you will be returned to the new HTML5 Unisphere client (which is pretty nice, by the way!). Next up for me is getting ESXi 6 and the vSphere 6 VCSA set up in my lab.