September 22, 2016

Importing a Public SSH Key

Rex was setting up a server and wanted some help.  His hosting provider had set him up with a username and password for authentication. He wanted me to log in to the machine under his account to help out.  I didn’t want him to have to give me his password.  Rex is a smart guy, but he is not a Linux user.  He is certainly not a system administrator.  The system was CentOS.  The process was far more difficult to walk

I use public key cryptography all the time to log in to remote systems.  The OpenSSH client uses a keypair that is stored on my laptop under $HOME/.ssh.  The private key is in $HOME/.ssh/id_rsa and the public one is in $HOME/.ssh/id_rsa.pub.  In order for the ssh command to use this keypair to authenticate me when I try to log in, the public key first needs to be copied to the remote machine’s $HOME/.ssh/authorized_keys file.  If the permissions on this file are wrong, or the permissions on the directory $HOME/.ssh are wrong, ssh will refuse my authentication attempt.

Trying to work this out over chat with someone unfamiliar with the process was frustrating.

This is what the final product looks like.

rex@drmcs [~]# ls -la $HOME/.ssh/
total 12
drwx------ 2 rex rex 4096 Sep 21 13:01 ./
drwx------ 9 rex rex 4096 Sep 21 13:28 ../
-rw------- 1 rex rex  421 Sep 21 13:01 authorized_keys
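Since ssh is picky about these permissions, a quick check is handy before debugging anything else. Here is a minimal Python sketch, assuming the modes in the listing above (700 on the directory, 600 on the file) are the target. The function name ssh_permission_problems is made up for illustration, and the exact-match test is probably stricter than what sshd itself enforces.

```python
import os
import stat


def ssh_permission_problems(home):
    """Return a list of human-readable problems with home/.ssh permissions."""
    problems = []
    ssh_dir = os.path.join(home, ".ssh")
    authn_file = os.path.join(ssh_dir, "authorized_keys")
    # Expected modes match the listing: drwx------ and -rw-------.
    for path, expected in ((ssh_dir, 0o700), (authn_file, 0o600)):
        if not os.path.exists(path):
            problems.append("%s is missing" % path)
            continue
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != expected:
            problems.append(
                "%s has mode %o, expected %o" % (path, mode, expected))
    return problems


if __name__ == "__main__":
    for problem in ssh_permission_problems(os.path.expanduser("~")):
        print(problem)
```

An empty result means both paths exist with the expected modes; it says nothing about ownership or about the contents of the key file.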

This should be scriptable.


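#!/bin/sh
# Assumed values for the two variables used below.
SSH_DIR=$HOME/.ssh
AUTHN_FILE=$SSH_DIR/authorized_keys

mkdir -p $SSH_DIR
chmod 700 $SSH_DIR
touch $AUTHN_FILE
chmod 600 $AUTHN_FILE
# Append the public key to $AUTHN_FILE before exiting.
exit 0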

However, it occurred to me that he should not be adding me to his account at all. Instead, he should create a separate account for me and give me access only to that, which would let me look around but not touch. Second attempt:


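#!/bin/sh
# Run as root. Assumed values for the variables used below.
NEW_USER=admiyo
SSH_DIR=/home/$NEW_USER/.ssh
AUTHN_FILE=$SSH_DIR/authorized_keys

/usr/sbin/useradd $NEW_USER

mkdir -p $SSH_DIR
chmod 700 $SSH_DIR
touch $AUTHN_FILE
chmod 600 $AUTHN_FILE
# Append the public key to $AUTHN_FILE here.
chown -R $NEW_USER: $SSH_DIR
exit 0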


To clean up the account when I am done, Rex can run:

sudo /usr/sbin/userdel -r admiyo

This removes not only my account but also my home directory, /home/admiyo.
If I am still logged in, he will see:

userdel: user admiyo is currently used by process 3561

September 21, 2016

Distinct RBAC Policy Rules

The ever elusive bug 968696 is still out there, due, in no small part, to the distributed nature of the policy mechanism. One question I asked myself as I chased this beastie is “how many distinct policy rules do we actually have to implement?” This is an interesting question because, if we can find an automated way to answer it, that can lead to an automated way to transform the policy rules themselves, and thus to a more unified approach to policy.

The set of policy files used in a TripleO overcloud has around 1400 rules:

$ find /tmp/policy -name \*.json | xargs wc -l
   73 /tmp/policy/etc/sahara/policy.json
   61 /tmp/policy/etc/glance/policy.json
  138 /tmp/policy/etc/cinder/policy.json
   42 /tmp/policy/etc/gnocchi/policy.json
   20 /tmp/policy/etc/aodh/policy.json
   74 /tmp/policy/etc/ironic/policy.json
  214 /tmp/policy/etc/neutron/policy.json
  257 /tmp/policy/etc/nova/policy.json
  198 /tmp/policy/etc/keystone/policy.json
   18 /tmp/policy/etc/ceilometer/policy.json
  135 /tmp/policy/etc/manila/policy.json
    3 /tmp/policy/etc/heat/policy.json
   88 /tmp/policy/auth_token_scoped.json
  140 /tmp/policy/auth_v3_token_scoped.json
 1461 total

Granted, that might not be distinct rule lines, as some are multi-line, but most rules seem to be on a single line. There is some whitespace, too.

Many of the rules, while written differently, can map to the same implementation. For example:

“rule: False”

can reduce to


which is the same as


All are instances of oslo_policy._checks.FalseCheck.

With that in mind, I gathered up the set of policy files deployed on a TripleO overcloud and hacked together some analysis.
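To illustrate the idea of differently written rules collapsing to the same thing, here is a small sketch that works on the raw policy strings rather than on oslo.policy check objects. It is not the analysis script from this article: it only expands rule:NAME references textually, and the get_port key in the sample is an assumption added for illustration (the other sample values are the neutron ones quoted in this article).

```python
import re

# A reference looks like rule:NAME; names may contain colons.
RULE_REF = re.compile(r"rule:([^\s()]+)")


def resolve(rules, text, seen=()):
    """Expand every rule:NAME reference in text into NAME's definition."""
    def lookup(name):
        if name in seen:
            raise ValueError("cyclic rule reference: " + name)
        return resolve(rules, rules[name], seen + (name,))

    text = text.strip()
    whole = RULE_REF.fullmatch(text)
    if whole:
        # A rule that is only an alias resolves to the target unchanged.
        return lookup(whole.group(1))
    return RULE_REF.sub(lambda m: "(" + lookup(m.group(1)) + ")", text)


def distinct_rules(rules):
    """Return the set of fully expanded rule texts."""
    return {resolve(rules, text) for text in rules.values()}


SAMPLE_RULES = {
    "context_is_admin": "role:admin",
    "owner": "tenant_id:%(tenant_id)s",
    "admin_or_owner": "rule:context_is_admin or rule:owner",
    "default": "rule:admin_or_owner",
    "get_port": "rule:admin_or_owner",  # hypothetical key
}
```

With this sample, five keys collapse to three distinct expanded rules. A purely textual expansion will miss equivalences that only a real parser can see, such as reordered operands.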

Note: Nova embeds its policy rules in code now. In order to convert them to an old-style policy file, you need to run a command line tool:

oslopolicy-policy-generator --namespace nova --output-file /tmp/policy/etc/nova/policy.json

Ironic does something similar, but uses:

oslopolicy-sample-generator --namespace=ironic.api --output-file=/tmp/policy/etc/ironic/policy.json

I’ve attached my source code at the bottom of this article. Running the code provides the following summary:

55 unique rules found

The longest rule belongs to Ironic:

OR(OR(OR((ROLE:admin)(ROLE:administrator))AND(OR((tenant == demo)(tenant == baremetal))(ROLE:baremetal_admin)))AND(OR((tenant == demo)(tenant == baremetal))OR((ROLE:observer)(ROLE:baremetal_observer))))

Some look somewhat repetitive, such as

OR((ROLE:admin)(is_admin == 1))

And some downright dangerous:

NOT( (ROLE:heat_stack_user)

This is dangerous because there are ways to work around having an explicit role in your token.

Many are indications of places where we want to use implied roles, such as:

  1. OR((ROLE:admin)(ROLE:administrator))
  2. OR((ROLE:admin)(ROLE:advsvc))
  3. (ROLE:admin)
  4. (ROLE:advsvc)
  5. (ROLE:service)


This is the set of keys that appear more than one time:

9 context_is_admin
4 admin_api
2 owner
6 admin_or_owner
2 service:index
2 segregation
7 default

Doing a grep for context_is_admin shows all of them with the following rule:

"context_is_admin": "role:admin",

admin_api is roughly the same:

cinder/policy.json: "admin_api": "is_admin:True",
ironic/policy.json: "admin_api": "role:admin or role:administrator"
nova/policy.json:   "admin_api": "is_admin:True"
manila/policy.json: "admin_api": "is_admin:True",

I think these are supposed to include the new check for is_admin_project as well.

Owner is defined two different ways in two files:

neutron/policy.json:  "owner": "tenant_id:%(tenant_id)s",
keystone/policy.json: "owner": "user_id:%(user_id)s",

Keystone’s meaning is that the user matches, whereas Neutron’s is a project scope check. Both rules should change.

Admin or owner has the same variety:

cinder/policy.json:    "admin_or_owner": "is_admin:True or project_id:%(project_id)s",
aodh/policy.json:      "admin_or_owner": "rule:context_is_admin or project_id:%(project_id)s",
neutron/policy.json:   "admin_or_owner": "rule:context_is_admin or rule:owner",
nova/policy.json:      "admin_or_owner": "is_admin:True or project_id:%(project_id)s"
keystone/policy.json:  "admin_or_owner": "rule:admin_required or rule:owner",
manila/policy.json:    "admin_or_owner": "is_admin:True or project_id:%(project_id)s",

Keystone is the odd one out here, with owner again meaning “user matches.”

Segregation is another rule that means admin:

aodh/policy.json:       "segregation": "rule:context_is_admin",
ceilometer/policy.json: "segregation": "rule:context_is_admin",

Probably the trickiest one to deal with is default, as that is a magic term that is used when a rule is not defined:

sahara/policy.json:   "default": "",
glance/policy.json:   "default": "role:admin",
cinder/policy.json:   "default": "rule:admin_or_owner",
aodh/policy.json:     "default": "rule:admin_or_owner",
neutron/policy.json:  "default": "rule:admin_or_owner",
keystone/policy.json: "default": "rule:admin_required",
manila/policy.json:   "default": "rule:admin_or_owner",

There seem to be three catch-all approaches:

  1. require admin,
  2. look for a project match but let admin override,
  3. let anyone execute the API.

This is the only rule that cannot be made globally unique across all the files.
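The fallback behaviour is easy to picture as a dictionary that hands back the default rule for any name it does not know. The sketch below is my own illustration of that idea, not oslo.policy code; the class name RulesWithDefault and the action name in the usage note are hypothetical, while the two "default" values are the glance and sahara ones quoted above.

```python
class RulesWithDefault(dict):
    """Dict of rule name -> rule text that falls back to a default rule."""

    def __init__(self, rules, default_rule="default"):
        super().__init__(rules)
        self.default_rule = default_rule

    def __missing__(self, name):
        # dict calls this only when name is not defined in the file.
        if name == self.default_rule or self.default_rule not in self:
            raise KeyError(name)
        return self[self.default_rule]


# The "default" entries are the glance and sahara values quoted above.
glance = RulesWithDefault({"default": "role:admin"})
sahara = RulesWithDefault({"default": ""})
```

Looking up an undefined name such as glance["hypothetical:action"] yields "role:admin", while the same lookup against sahara yields the empty rule. That is why the same missing key can mean very different things from one service to the next.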

Here is the complete list of suffixes.  The format is not strict policy format; I munged it to look for duplicates.

(field == address_scopes:shared=True)
(field == networks:router:external=True)
(field == networks:shared=True)
(field == port:device_owner=~^network:)
(field == subnetpools:shared=True)
(group == nobody)
(is_admin == False)
(is_admin == True)
(is_public_api == True)
(project_id == %(project_id)s)
(project_id == %(resource.project_id)s)
(tenant_id == %(tenant_id)s)
(user_id == %(target.token.user_id)s)
(user_id == %(trust.trustor_user_id)s)
(user_id == %(user_id)s)
AND(OR((tenant == demo)(tenant == baremetal))OR((ROLE:observer)(ROLE:baremetal_observer)))
AND(OR(NOT( (field == rbac_policy:target_tenant=*) (ROLE:admin))OR((ROLE:admin)(tenant_id == %(tenant_id)s)))
NOT( (ROLE:heat_stack_user) 
OR((ROLE:admin)(is_admin == 1))
OR((ROLE:admin)(project_id == %(created_by_project_id)s))
OR((ROLE:admin)(project_id == %(project_id)s))
OR((ROLE:admin)(tenant_id == %(network:tenant_id)s))
OR((ROLE:admin)(tenant_id == %(tenant_id)s))
OR((ROLE:advsvc)OR((ROLE:admin)(tenant_id == %(network:tenant_id)s)))
OR((ROLE:advsvc)OR((tenant_id == %(tenant_id)s)OR((ROLE:admin)(tenant_id == %(network:tenant_id)s))))
OR((is_admin == True)(project_id == %(project_id)s))
OR((is_admin == True)(quota_class == %(quota_class)s))
OR((is_admin == True)(user_id == %(user_id)s))
OR((tenant == demo)(tenant == baremetal))
OR((tenant_id == %(tenant_id)s)OR((ROLE:admin)(tenant_id == %(network:tenant_id)s)))
OR(NOT( (field == port:device_owner=~^network:) (ROLE:advsvc)OR((ROLE:admin)(tenant_id == %(network:tenant_id)s)))
OR(NOT( (field == rbac_policy:target_tenant=*) (ROLE:admin))
OR(OR((ROLE:admin)(ROLE:administrator))AND(OR((tenant == demo)(tenant == baremetal))(ROLE:baremetal_admin)))
OR(OR((ROLE:admin)(is_admin == 1))(ROLE:service))
OR(OR((ROLE:admin)(is_admin == 1))(project_id == %(
OR(OR((ROLE:admin)(is_admin == 1))( == %(
OR(OR((ROLE:admin)(is_admin == 1))(user_id == %(target.token.user_id)s))
OR(OR((ROLE:admin)(is_admin == 1))(user_id == %(user_id)s))
OR(OR((ROLE:admin)(is_admin == 1))AND((user_id == %(user_id)s)(user_id == %(target.credential.user_id)s)))
OR(OR((ROLE:admin)(project_id == %(created_by_project_id)s))(project_id == %(project_id)s))
OR(OR((ROLE:admin)(project_id == %(created_by_project_id)s))(project_id == %(resource.project_id)s))
OR(OR((ROLE:admin)(tenant_id == %(tenant_id)s))(ROLE:advsvc))
OR(OR((ROLE:admin)(tenant_id == %(tenant_id)s))(field == address_scopes:shared=True))
OR(OR((ROLE:admin)(tenant_id == %(tenant_id)s))(field == networks:shared=True)(field == networks:router:external=True)(ROLE:advsvc))
OR(OR((ROLE:admin)(tenant_id == %(tenant_id)s))(field == networks:shared=True))
OR(OR((ROLE:admin)(tenant_id == %(tenant_id)s))(field == subnetpools:shared=True))
OR(OR(OR((ROLE:admin)(ROLE:administrator))AND(OR((tenant == demo)(tenant == baremetal))(ROLE:baremetal_admin)))AND(OR((tenant == demo)(tenant == baremetal))OR((ROLE:observer)(ROLE:baremetal_observer))))
OR(OR(OR((ROLE:admin)(is_admin == 1))(ROLE:service))(user_id == %(target.token.user_id)s))

Here is the source code I used to analyze the policy files:

#!/usr/bin/env python

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
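# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied.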
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import sys

from oslo_serialization import jsonutils

from oslo_policy import policy
import oslo_policy._checks as _checks

def display_suffix(rules, rule):

    if isinstance (rule, _checks.RuleCheck):
        return display_suffix(rules, rules[rule.match.__str__()])

    if isinstance (rule, _checks.OrCheck):
        answer =  'OR('
        for subrule in rule.rules:
            answer += display_suffix(rules, subrule)
        answer +=  ')'
    elif isinstance (rule, _checks.AndCheck):
        answer =  'AND('
        for subrule in rule.rules:
            answer += display_suffix(rules, subrule)
        answer +=  ')'
    elif isinstance (rule, _checks.TrueCheck):
        answer =  "TRUE"
    elif isinstance (rule, _checks.FalseCheck):
        answer =  "FALSE"
    elif isinstance (rule, _checks.RoleCheck):       
        answer =  ("(ROLE:%s)" % rule.match)
    elif isinstance (rule, _checks.GenericCheck):       
        answer =  ("(%s == %s)" % (rule.kind, rule.match))
    elif isinstance (rule, _checks.NotCheck):
        answer =  'NOT( %s ' % display_suffix(rules, rule.rule)
    else:
        # Unknown check type: fall back to its string form.
        answer =  str(rule)
    return answer

class Tool():
    def __init__(self):
        self.prefixes = dict()
        self.suffixes = dict()

    def add(self, policy_file):
        policy_data = policy_file.read()
        rules = policy.Rules.load(policy_data, "default")
        for key, rule in rules.items():
            suffix = display_suffix(rules, rule)
            self.prefixes[key] = self.prefixes.get(key, 0) + 1
            self.suffixes[suffix] = self.suffixes.get(suffix, 0) + 1

    def report(self):
        suffixes = sorted(self.suffixes.keys())
        for suffix in suffixes:
            print (suffix)
        print ("%d unique rules found" % len(suffixes))
        for prefix, count in self.prefixes.items():
            if count > 1:
                print ("%d %s" % (count, prefix))

def main(argv=sys.argv[1:]):
    tool = Tool()
    policy_dir = "/tmp/policy"
    name = 'policy.json'
    for root, dirs, files in os.walk(policy_dir):
        if name in files:
            policy_file_path = os.path.join(root, name)
            print (policy_file_path)
            with open(policy_file_path, 'r') as policy_file:
                tool.add(policy_file)
    tool.report()

if __name__ == "__main__":
    main()

September 20, 2016

Is dialup still an option?
TL;DR - No.

Here's why.

I was talking with my Open Source Security Podcast co-host Kurt Seifried about what it would be like to access the modern Internet using dialup. So I decided to give this a try. My first thought was to find a modem, but after looking into this, it isn't really an option anymore.

The setup

  • No Modem
  • Fedora 24 VM
  • Firefox as packaged with Fedora 24
  • Use the firewall via wondershaper to control the network speed
  • "App Telemetry" firefox plugin to time the site load time

I know it's not perfect, but it's probably close enough to get a feel for what's going on. I understand this doesn't exactly recreate a modem experience with details like compression, latency, and someone picking up the phone during a download. There was nothing worse than having that 1 megabyte download at 95% when someone decided they needed to make a phone call. Call waiting was also a terrible plague.

If you're too young to understand any of this, be thankful. Anyone who looks at this time with nostalgia is pretty clearly delusional.

I started testing at a 1024 Kb connection and halved my way down to 56 (instead of 64). This seemed like a nice way to get a feel for how these sites react as your speed shifts down.
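For a rough sense of what each step down means, the ideal transfer time is just size divided by rate. A small sketch, assuming 1 Kb means 1000 bits and 1 megabyte means 1,000,000 bytes, and ignoring latency, compression, and protocol overhead (so real numbers will be worse):

```python
# Speeds used in this test, in kilobits per second.
SPEEDS_KBPS = [1024, 512, 256, 128, 56]


def transfer_seconds(size_bytes, kbps):
    """Ideal time to move size_bytes over a link of kbps kilobits/second."""
    return size_bytes * 8 / (kbps * 1000)


def transfer_table(size_bytes):
    """Return (speed, seconds) pairs for every tested speed."""
    return [(kbps, transfer_seconds(size_bytes, kbps))
            for kbps in SPEEDS_KBPS]


if __name__ == "__main__":
    for kbps, seconds in transfer_table(1000000):
        print("%5d Kb/s: %6.1f seconds per megabyte" % (kbps, seconds))
```

By this estimate the 1 megabyte download mentioned earlier needs about 143 seconds at 56 Kb/s, against roughly 8 seconds at 1024 Kb/s.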


I picked the most popular English language sites listed on the Alexa top 100. I added because I like them, and my kids had me add Twitch. My home Internet connection is 50 Mb down, 5 Mb up. As you can see, in general all these sites load in less than 5 seconds. The numbers represent the site being fully loaded. Most web browsers seem to show something pretty quickly, even if the page is still loading. For the purpose of this test, our numbers are how long it takes a site to fully load. I also show 4 samples because, as you'll see later on, some of these sites took a really really long time to load, so four was as much suffering as I could endure. Perhaps someday I'll do this again with extra automation so I don't have to be so involved.

1024 Kb/s

Things really started to go downhill at this point. Anyone who claims a 1 megabit connection is broadband has probably never tried to use such a connection. In general though, most of the sites were usable by a very narrow definition of the word.

512 Kb/s

You're going to want to start paying attention to Amazon, something really clever is going to happen, it's sort of noticeable in this graph. Also of note is how consistent is. While not the fastest site, it will remain extremely consistent through the entire test.

256 Kb/s

Here is where you can really see what Amazon is doing. They clearly have some sort of client side magic happening to ensure an acceptable response. For the rest of my testing I saw this behavior. A slow first load, then things were much much faster. Waiting for sites to load at this speed was really painful, it's only going to get worse from here. 15 seconds doesn't sound horrible, but it really is a long time to wait.

128 Kb/s

Things are not good at 128 Kb/s. Wikipedia looks empty, but it was still loading at the same speed as in our first test. I imagine my lack of an ad enhanced experience with them helps keep it so speedy.

56 Kb/s

Here is the real data you're waiting for. This is where I set the speed to 56K down, 48K up, which is the ideal speed of a 56K modem. I doubt most of us got that speed very often.

As you can probably see, Twitch takes an extremely long time to load. This should surprise nobody, as it's a site that streams video; by definition it's expected you have a fast connection. Here is the graph again with Twitch removed.

The Yahoo column is empty because I couldn't get Yahoo to load. It timed out every single time I tried. Wikipedia looks empty, but it still loaded at 0.3 seconds. After thinking about this it does make sense. There are Wikipedia users who are on dialup in some countries. They have to keep it lean. Amazon still has a slow first load, then nice and speedy (for some definition of speedy) after that. I tried to load a YouTube video to see if it would work. After about 10 minutes of nothing happening I gave up.

Typical tasks

I also tried to perform a few tasks I would consider "expected" by someone using the Internet.

For example, from the time I typed in until I could read a mail message took about 600 seconds. I did let every page load completely before clicking or typing on it. Once I had it loaded, and the AJAX interface timed out then told me to switch to HTML mode, it was mostly usable. It was only about 30 seconds to load a message (including images) and 0.2 seconds to return to the inbox.

Logging into Facebook took about 200 seconds. It was basically unusable once it loaded though. Nothing new loaded, it loads quite a few images though, so this makes sense. These things aren't exactly "web optimized" anymore. If you know someone on dialup, don't expect them to be using Facebook. took 800 seconds. Reddit's front page was 750 seconds. Google News was only 33 seconds. The newspaper is probably a better choice if you have dialup.

I finally tried to run a "yum update" in Fedora to see if updating the system was something you could leave running overnight. It's not. After about 4 hours of just downloading repo metadata I gave up. There is no way you can plausibly update a system over dialup. If you're on dialup, the timeouts will probably keep you from getting pwnt better than updates will.

Another problem you hit with a modern system like this is it tries to download things automatically in the background. More than once I had to kill some background tasks that basically ruined my connection. Most system designers today assume everyone has a nice Internet connection so they can do whatever they want in the background. That's clearly a problem when you're running at a speed this slow.


Is the Internet usable on Dialup in 2016? No. You can't even pretend it's maybe usable. It pretty much would suck rocks to use the Internet on dialup today. I'm sure there are some people doing it. I feel bad for them. It's clear we've hit a place where broadband is expected, and honestly, you need fast broadband, even 1 Megabit isn't enough anymore if you want a decent experience. The definition of broadband in the US is now 25Mb down 3Mb up. Anyone who disagrees with that should spend a day at 56K.

I know this wasn't the most scientific study ever done, I would welcome something more rigorous. If you have any questions or ideas hit me up on Twitter: @joshbressers

Mirroring Keystone Delegations in FreeIPA/389DS

This is more musing than a practical design.

Most application servers have a means to query LDAP for the authorization information for a user.  This is separate from, and follows after, authentication, which may be using one of multiple mechanisms, possibly not even querying LDAP (although that would be strange).

And there are other mechanisms (SAML2, SSSD+mod_lookup_identity) that can also provide the authorization attributes.

Separating mechanism from meaning, however, we are left with the fact that applications need a way to query attributes to make authorization decisions.  In Keystone, the general pattern is this:

A project is a group of resources.

A user is assigned a role on a project.

A user requests a token for a project. That token references the user's roles.

The user passes the token to the server when accessing an API. Access control is based on the roles that the user has in the associated token.

The key point here is that it is the roles associated with the token in question that matter.  From that point on, we have the ability to inject layers of indirection.

Here is where things fall down today. If we take an app like WordPress and try to make it query against Red Hat’s LDAP server for the groups to use, there is no mapping between the groups assigned and the permissions that the user should have.  As the WordPress instance might be run by any one of several organizations within Red Hat, there is no direct mapping possible.

If we map this problem domain to IPA, we see where things fall down.

WordPress, here, is a service.  If the host it is running on is owned by a particular organization (say, EMEA-Sales) it should be the EMEA Sales group that determines who gets what permissions on WordPress.

Aside: WordPress, by the way, makes a great example to use, as it has very clear, well defined roles,  which have a clear scope of authorization for operations.

Subscriber < Contributor < Author < Editor < Administrator

Back to our regular article:

If we define an actor as either a user or a group of users, a role assignment is a tuple: (actor, organization, application, role)
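As a sketch of that tuple, assuming nothing more than plain strings for each field, a role assignment and the lookup an application would need could look like this. The type name, function name, and every user, group, and organization value in the usage note are hypothetical.

```python
from typing import NamedTuple


class RoleAssignment(NamedTuple):
    actor: str          # a user name or a group name
    organization: str
    application: str
    role: str


def roles_for(assignments, actors, organization, application):
    """Return the roles granted to any of the given actors.

    actors is the user's own name plus the names of the groups the
    user belongs to, since an actor may be either.
    """
    return {
        a.role for a in assignments
        if a.actor in actors
        and a.organization == organization
        and a.application == application
    }
```

For example, given an assignment RoleAssignment("emea-writers", "EMEA-Sales", "wordpress", "Editor"), a user who is a member of emea-writers gets {"Editor"} back for WordPress under EMEA-Sales and an empty set for any other organization.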



Now, a user should not have to go to IPA, get a token, and hand that to WordPress.  When a user connects to WordPress, and attempts to do any non-public action, they are prompted for credentials, and are authenticated.  At this point, WordPress can do the LDAP query. And here is the question:

“what should an application query for in LDAP?”

If we use groups, then we have a nasty naming scheme: EMEA-sales_wordpress_admin versus LATAM-sales_wordpress_admin.  This is appending the query (organization, application) and the result (role).

Ideally, we would tag the role on the service.  The service already reflects organization and application.

In the RFC based schemas, there is an organizationalRole objectclass which almost mirrors what we want.  But I think the most important thing is to return an object that looks like a group, most specifically groupOfNames.  Fortunately, I think this is just the ‘cn’.

Can we put a group of names under a service?  It's not a container.

‘ipaService’ DESC ‘IPA service objectclass’ AUXILIARY MAY ( memberOf $ managedBy $ ipaKrbAuthzData) X-ORIGIN ‘IPA v2’ )

objectClass: ipaobject
objectClass: top
objectClass: ipaservice
objectClass: pkiuser
objectClass: ipakrbprincipal
objectClass: krbprincipal
objectClass: krbprincipalaux
objectClass: krbTicketPolicyAux

It probably would make more sense to have a separate subtree service-roles,  with each service-name a container, and each role a group-of-names under that container. The application would  filter on (service-name) to get the set of roles.  For a specific user, the service would add an additional filter for memberof.
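The lookup described above might look like the following sketch. Everything concrete in it is hypothetical: the cn=service-roles container, the dc=example,dc=com suffix, and the choice to match on the member attribute of a groupOfNames entry rather than on memberOf. It only builds the search base and filter strings; it does not talk to a directory, and it does not escape the service name for use in a DN.

```python
def escape_filter_value(value):
    """Escape the characters that are special in an LDAP search filter."""
    # The backslash must be handled first so it is not escaped twice.
    for char in ("\\", "*", "(", ")", "\0"):
        value = value.replace(char, "\\%02x" % ord(char))
    return value


def role_search(service_name, user_dn, suffix="dc=example,dc=com"):
    """Return (base DN, filter) for the roles a user holds on a service."""
    # Each service is assumed to be a container under cn=service-roles.
    base = "cn=%s,cn=service-roles,%s" % (service_name, suffix)
    # Each role is assumed to be a groupOfNames listing its members.
    ldap_filter = "(&(objectClass=groupOfNames)(member=%s))" % (
        escape_filter_value(user_dn))
    return base, ldap_filter
```

The cn of each entry returned by such a search would be the role name, which matches the earlier point that the result should look like a group.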

Now, that is a lot of embedded knowledge in the application, and does not provide any way to do additional business logic in the IPA server or to hide that complexity from the end user.  Ideally, we would have something like automember to populate these role assignments, or, even better, a light-weight way for a user with a role assignment to re-delegate that to another user or principal.

That is what really gets valuable:  user self service for delegation.  We want to make it such that you do not need to be an admin to create a role assignment, but rather (with exceptions) you can delegate to others any role that you have assigned to yourself.  This is a question of scale.

However, more than just scale, we want to be able to track responsibility;  who assigned a user the role that they have, and how did they have the authority to assign it?  When a user no longer has authority, should the people they have delegated to also lose it, or does that delegation get transferred?  Both patterns are required for some uses.

I think this fast gets beyond what can be represented easily in an LDAP schema.  Probably the right step is to use something like automember to place users into role assignments.  Expanding nested groups, while nice, might be too complicated.

September 18, 2016

Why do we do security?
I had a discussion last week that ended with this question. "Why do we do security". There wasn't a great answer to this question. I guess I sort of knew this already, but it seems like something too obvious to not have an answer. Even as I think about it I can't come up with a simple answer. It's probably part of the problems you see in infosec.

The purpose of security isn't just to be "secure", it's to manage risk in some meaningful way. In the real world this is usually pretty easy for us to understand. You have physical things, you want to keep them from getting broken, stolen, lost, pick something. It usually makes some sort of sense.

It would be really easy to use banks as my example here, after all they have a lot of something everyone wants, so instead let's use cattle, that will be more fun. Cows are worth quite a lot of money actually. Anyone who owns cows knows you need to protect them in some way. In some environments you want to keep your cows inside a pen, in others you let them roam free. If they roam free the people living near the cows need to protect themselves actually (barbed wire wasn't invented to keep cows in, it was used to keep them out). This is something we can understand. Some environments are very low risk, you can let your cattle roam where they want. Some are high risk, so you keep them in a pen. I eagerly await the cow related mails this will produce because of my gross over-simplification of what is actually a very complex and nuanced problem.

So now we have the question about what are you protecting? If you're a security person, what are you really trying to protect? You can't protect everything, there's no point in protecting everything. If you try to protect everything you actually end up protecting nothing. You need to protect the things you have that are not only high value, but also have a high risk of being attacked/stolen. That priceless statue in the pond outside that weighs four tons is high value, but nobody is stealing it.

Maybe this is why it's hard to get security taken seriously sometimes. If you don't know what you're protecting, you can't explain why you're important. The result is generally the security guy storming out screaming "you'll be sorry". They probably won't. If we can't easily identify what our risk is and why we care about it, we can't possibly justify what we do.

There are a lot of frameworks that can help us understand how we should be protecting our security assets, but they don't really do a great job of helping identify what those assets really are. I don't think this is a bad thing, I think this is just part of maturing the industry. We all have finite budgets, if we protect things that don't need protecting we are literally throwing money away. So this raises the question: what should we be protecting?

I'm not sure we can easily answer this today. It's harder than it sounds. We could say we need to protect the things that, if they were lost tomorrow, would prevent the business from functioning. That's not wrong, but electricity and water fall into that category. If you tried to have an "electricity security program" at most organizations you'd be looking for a new job at the end of the day. We could say that customer data is the most important asset, which it might be, but what are you protecting it from? Is it enough to have a good backup? Do you need a fail-over data center? Will an IDS help protect the data? Do we want to protect the integrity or is our primary fear exfiltration? Things can get out of hand pretty quickly.

I suspect there may be some value to these questions in the world of accounting. Accountants spend much time determining assets and values. I've not yet looked into this, but I think my next project will be starting to understand how assets are dealt with by the business. Everything from determining value, to understanding loss. There is science here already, it would be silly for us to try to invent our own.

Leave your comments on Twitter: @joshbressers

September 16, 2016

Hierarchy of Isolation

One way to understand threads, processes, containers, and VMs is to look at what each level of abstraction provides for isolation.

abstraction      stack & instructions  heap      process IDs, filesystem  kernel
                                                 & network namespace
thread           isolated              shared    shared                   shared
process          isolated              isolated  shared                   shared
container        isolated              isolated  isolated                 shared
Virtual Machine  isolated              isolated  isolated                 isolated

I think of this as a hierarchy.

  • A process is a thread, but one that also provides heap isolation.
  • A container is a process, but one that also isolates the pid, network, and filesystem namespaces.
  • A virtual machine is a process that, beyond the isolation provided by a container, provides a completely different kernel instance.
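The first step of the hierarchy, a thread sharing the heap while a process does not, can be seen directly. This is a minimal sketch, assuming a POSIX system where os.fork is available; the function name is made up for illustration, and it says nothing about containers or VMs.

```python
import os
import threading


def thread_and_process_demo():
    """Show that a thread shares the heap but a forked process does not."""
    shared = []  # lives on the heap of this process

    # A thread shares the heap, so its append is visible here.
    worker = threading.Thread(target=shared.append, args=("thread",))
    worker.start()
    worker.join()

    # A forked child gets its own copy of the heap, so its append
    # changes only the child's copy.
    pid = os.fork()
    if pid == 0:
        shared.append("child process")
        os._exit(0)  # leave the child without running parent cleanup
    os.waitpid(pid, 0)

    return shared
```

After both workers have finished, the list in the parent contains only the entry added by the thread.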

September 12, 2016

On Experts
Are you an expert? Do you know an expert? Do you want to be an expert?

This came up for me the other day while having a discussion with a self-proclaimed expert. I'm not going to claim I'm an expert at anything, but if you tell me all about how good you are, I'm not going to take it at face value. I'm going to demand some proof. "Trust me" isn't proof.

There are a rather large number of people who think they are experts, some think they're experts at everything. Nobody is an expert at everything. People who claim to have done everything should be looked at with great suspicion. Everyone can be an expert at something though.

One of the challenges we always face is trying to figure out who is actually an expert, and who only thinks they are an expert? There are plenty of people who sound very impressive, but if they have to deal with an actual expert, things fall apart pretty quick. They can get you into trouble if you're expecting expert advice. Especially in areas like security, bad advice can be worse than no advice.

The simple answer is to look at their public contributions. If you have someone who has ZERO public contributions, that's not an expert in anything. Even if you're working for a secretive organization, you're going to leave a footprint somewhere. No footprint means you should seriously question a person's expertise. Becoming an expert leaves a long crazy trail behind whoever gets there. In the new and exciting world of open source and social media there is no excuse for not being able to show off your work (unless you don't have anything to show off of course).

If you think you're an expert, or you want to be an expert, start doing things in the open. Write code (if you don't have a github account, go get one). Write blog posts, answer questions, go to meetups. There are so many opportunities it's not even funny. Just because you think you're smart doesn't mean you are, go out and prove it.

September 10, 2016

Getting the URLs out of the Service Catalog with jq

When you make a call to Keystone to get a token, you also get back the service catalog. While many of my scripts have used the $OS_AUTH_URL to make follow-on calls, if the calls are administrative in nature, you should use the URL in the service catalog.

This makes use of curl to fetch the token and jq to parse the output.

This call will fetch a token and ignore it, but instead pull the identity admin URL out of the token response.

curl -s -d @token-request.json -H "Content-type: application/json" $OS_AUTH_URL/auth/tokens | jq '.token | .catalog [] | select(.type == "identity") | .endpoints[] | select(.interface == "admin") | .url  ' 

Say you want to talk to Nova? That would be the compute API on the public endpoint:

curl -s -d @token-request.json -H "Content-type: application/json" $OS_AUTH_URL/auth/tokens | jq '.token | .catalog [] | select(.type == "compute") | .endpoints[] | select(.interface == "public") | .url  ' 
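
The same selection can be done in Python once the token response has been parsed, which is handy when the follow-on call is made from a script rather than a shell pipeline. This is only a minimal sketch: the function name endpoint_urls and the sample catalog (including its example.com URLs) are made up for illustration. Only the token, catalog, type, endpoints, interface and url keys come from the jq filters above.

```python
import json


def endpoint_urls(token_response, service_type, interface):
    """Return the URLs the jq filter above would print.

    token_response is the parsed JSON body returned by POST /auth/tokens.
    """
    urls = []
    for service in token_response["token"]["catalog"]:
        if service["type"] != service_type:
            continue
        for endpoint in service["endpoints"]:
            if endpoint["interface"] == interface:
                urls.append(endpoint["url"])
    return urls


# Hypothetical, heavily trimmed response body; real catalogs carry more fields.
SAMPLE = json.loads("""
{"token": {"catalog": [
  {"type": "identity", "endpoints": [
    {"interface": "admin", "url": "https://keystone.example.com:35357/v3"},
    {"interface": "public", "url": "https://keystone.example.com:5000/v3"}]},
  {"type": "compute", "endpoints": [
    {"interface": "public", "url": "https://nova.example.com:8774/v2.1"}]}
]}}
""")

if __name__ == "__main__":
    print(endpoint_urls(SAMPLE, "identity", "admin"))
    print(endpoint_urls(SAMPLE, "compute", "public"))
```

Returning a list rather than a single string mirrors jq, which prints one line per matching endpoint (for example, one per region).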

Generating Token Request JSON from Environment Variables

When working with new APIs we need to test them with curl prior to writing the Python client. I’ve often had to hand-create the JSON used for the token request, as I wrote about way back here.  Here is a simple bash script to convert the V3 environment variables into the JSON for a token request.




#!/bin/bash

cat << EOF
{
    "auth": {
        "identity": {
            "methods": ["password"],
            "password": {
                "user": {
                    "domain": {
                        "name": "$OS_USER_DOMAIN_NAME"
                    },
                    "name": "$OS_USERNAME",
                    "password": "$OS_PASSWORD"
                }
            }
        },
        "scope": {
            "project": {
                "domain": {
                    "name": "$OS_PROJECT_DOMAIN_NAME"
                },
                "name": "$OS_PROJECT_NAME"
            }
        }
    }
}
EOF


Run it like this:

./ > token-request.json

And test it

curl -si -d @token-request.json -H "Content-type: application/json" $OS_AUTH_URL/auth/tokens

Should return a lot of JSON output.

This is for a project-scoped token. Minor variations would get you unscoped or domain-scoped tokens.
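
For comparison, here is a sketch of the same request body built in Python. It reads the same environment variables as the shell script and lets json.dumps handle the quoting, so a password containing a double quote does not break the document. The function name, the scope argument, and the use of OS_DOMAIN_NAME for the domain-scoped variant are assumptions for illustration, not something the script above does.

```python
import json
import os


def token_request(env=os.environ, scope="project"):
    """Build a Keystone V3 password-auth request body from OS_* variables.

    scope is "project", "domain", or None for an unscoped token.
    """
    auth = {
        "identity": {
            "methods": ["password"],
            "password": {
                "user": {
                    "domain": {"name": env["OS_USER_DOMAIN_NAME"]},
                    "name": env["OS_USERNAME"],
                    "password": env["OS_PASSWORD"],
                }
            },
        }
    }
    if scope == "project":
        auth["scope"] = {
            "project": {
                "domain": {"name": env["OS_PROJECT_DOMAIN_NAME"]},
                "name": env["OS_PROJECT_NAME"],
            }
        }
    elif scope == "domain":
        # Assumption: the target domain is named by OS_DOMAIN_NAME.
        auth["scope"] = {"domain": {"name": env["OS_DOMAIN_NAME"]}}
    elif scope is not None:
        raise ValueError("scope must be 'project', 'domain', or None")
    return json.dumps({"auth": auth}, indent=4)
```

Calling print(token_request()) with the V3 variables set and redirecting the output to token-request.json gives a file that should work with the curl call above. Passing scope=None simply leaves the scope section out.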

September 06, 2016

You can't weigh risk if you don't know what you don't know
There is an old saying we've all heard at some point. It's often attributed to Donald Rumsfeld.

There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know
If any of us have ever been in a planning meeting, a variant of this has no doubt come up at some point. It came up for me last week, and every time I hear it I think about all the things we don't know we don't know. If you're not familiar with the concept, it works a bit like this. I know I don't know how to drive a boat. But because I know I don't know this, I could learn. If you know you lack certain knowledge, you could find a way to learn it. If you don't know what you don't know, there is nothing you can do about it. The future is often an unknown unknown. There is nothing we can do about the future in many instances; you just have to wait until it becomes a known, and hope it won't be anything too horrible. There can also be blindness when you think you know something, but you really don't. This is when people tend to stop listening to the actual experts because they think they are an expert.

This ties back into conversations about risk and how we deal with it.

If there is something you don't know you don't know, by definition you can't weigh the possible risk with whatever it is you are (or aren't) doing. A great example here is trying to understand your infrastructure. If you don't know what you have, you don't know which machines are patched, and you're not sure who is running what software, you have a lot of unknowns. It's probably safe to say at some future date there will be a grand explosion when everything starts to fall apart. It's also probably safe to say if you have infrastructure like this, you don't understand the pile of dynamite you're sitting on.

Measuring risk can be like trying to take a picture of an invisible man. Do you know where your risk is? Do you know what it should look like? How big is it? Is it wearing a hat? There are so many things to keep track of when we try to understand risk. There are more people afraid of planes than cars, but flying is orders of magnitude safer. Humans are really bad at risk. We think we understand something (or think it's a known or known unknown). Often we're actually mistaken.

How do we deal with the unknown unknowns in a context like this? We could talk about being agile or quick or adaptive, whatever you want. But at the end of the day what's going to save you is your experts. Understand them, know where you are strong and weak. Someday the unknowns become knowns, usually with a violent explosion. To some of your experts these risks are known, you may just have to listen.

It's also important to have multiple experts. If you only have one, they could believe they're smarter than they are. This is where things can get tricky. How can we decide who is actually an expert and who thinks they're an expert? This is a whole long complex topic by itself which I'll write about someday.

Anyway, on the topic of risk and unknowns. There will always be unknown unknowns. Even if you have the smartest experts in the world, it's going to happen. Just make sure your unknown unknowns are worth it. There's nothing worse than not knowing something you should.

Deploying Fernet on the Overcloud

Here is a proof of concept of deploying an OpenStack Tripleo Overcloud using the Fernet token provider.

I’m going to take the shortcut of using the Keystone setup on the undercloud to generate the keys. Since the undercloud is still using UUID tokens, this key repo will not be used by the undercloud.

It makes use of Heat swift artifacts, which puts a copy of the Fernet repo on every node, not just the Keystone/Controller node. That may or may not be acceptable for your deployment.

On the undercloud

. ~/stackrc
sudo keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
sudo tar -zcf keystone-fernet-keys.tar.gz /etc/keystone/fernet-keys
upload-swift-artifacts -f keystone-fernet-keys.tar.gz

To add an additional value to the overcloud Hiera, use an additional deploy.yaml file.

export DEPLOY_ENV_YAML=$PWD/deploy.yaml

Here is what this file looks like

            keystone::token_provider: 'fernet'

Deploy with

openstack overcloud deploy --templates -e 

And wait for completion

Check the state on the controller.

$ openstack server list
| ID                                   | Name                    | Status | Networks            |
| 756fbd73-e47b-46e6-959c-e24d7fb71328 | overcloud-controller-0  | ACTIVE | ctlplane= |
| 62b869df-1203-4d58-8e45-fac6cd4cfbee | overcloud-novacompute-0 | ACTIVE | ctlplane=  |
[stack@undercloud ~]$ ssh heat-admin@ 
Last login: Tue Sep  6 00:09:59 2016 from
[heat-admin@overcloud-controller-0 ~]$ sudo crudini --get /etc/keystone/keystone.conf token driver
[heat-admin@overcloud-controller-0 ~]$ sudo crudini --get /etc/keystone/keystone.conf token provider

Look in the database on the controller:

$ sudo su
[root@overcloud-controller-0 heat-admin]# mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 415
Server version: 10.1.12-MariaDB MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use keystone
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [keystone]> select * from token;
Empty set (0.00 sec)

MariaDB [keystone]> 


Test the provider:


$ openstack token issue
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
| Field | Value |
| expires | 2016-09-20 05:26:17+00:00 |
| id | gAAAAABX4LppE8vaiFZ992eah2i3edpO1aDFxlKZq6a_RJzxUx56QVKORrmW0-oZK3-Xuu2wcnpYq_eek2SGLz250eLpZOzxKBR0GsoMfxJU8mEFF8NzfLNcbuS-iz7SV-N1re3XEywSDG90JcgwjQfXW-8jtCm-n3LL5IaZexAYIw059T_-cd8 |
| project_id | 26156621d0d54fc39bf3adb98e63b63d |
| user_id | 397daf32cadd490a8f3ac23a626ac06c |

That really long ID (long, but not as long as a PKI token) is a Fernet token.
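
One way to check that the ID really is a Fernet token is to decode its header. In the Fernet format, a token is URL-safe base64 whose first byte is the version marker 0x80, followed by a 64-bit big-endian issue timestamp. The sketch below reads just those nine bytes from the first twelve characters of the token. The function name is made up for illustration, and nothing here verifies the signature or decrypts the payload.

```python
import base64
import datetime
import struct


def fernet_issued_at(token):
    """Return the issue time stored in a Fernet token's header.

    Only the first 12 base64 characters (9 bytes) are decoded: one version
    byte (0x80) and an 8-byte big-endian Unix timestamp.
    """
    header = base64.urlsafe_b64decode(token[:12])
    version, timestamp = struct.unpack(">BQ", header)
    if version != 0x80:
        raise ValueError("not a Fernet token: version byte is %#x" % version)
    return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)


if __name__ == "__main__":
    # Leading characters of the token ID shown in the output above.
    print(fernet_issued_at("gAAAAABX4LppE8va"))
```

For the token above this prints 2016-09-20 04:26:17+00:00, exactly one hour before the reported expiry.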

Note that the keys used to sign tokens are now available via the undercloud’s swift. I would recommend deleting them immediately after deployment with:


swift delete overcloud-artifacts keystone-fernet-keys.tar.gz

September 02, 2016

Deploying Server on Ironic Node Baseline

My team is working on the ability to automatically enroll servers launched from Nova in FreeIPA. Debugging the process has proven challenging;  when things fail, the node does not come up, and there is little error reporting.  This article posts a baseline of what things look like prior to any changes, so we can better see what we are breaking.

UPDATE: The command I ended up using to test this is:

openstack server create --flavor control --image overcloud-full testserver --nic net-id=ctlplane --key-name default

Since the reported error is that the port attach failed, I want to see what ports to expect.

$ . ./stackrc 
[stack@undercloud ~]$ openstack port list
| ID | Name | MAC Address | Fixed IP Addresses |
| eb32c2a9-9bd8-45bb-929a-ed626b845e3e | | fa:16:3e:92:32:94 | ip_address='', subnet_id='2a0bf352-1b8f-469b-bb55-cf6e193d5a4d' |

Prior to deploying a server, there is one port.

Deploying a server:

openstack server create --flavor control --image overcloud-full testserver --nic net-id=ctlplane --key-name default

Gives us a new port

$ openstack port list
| ID | Name | MAC Address | Fixed IP Addresses |
| 08dbcf34-6ac0-4edb-9079-93b2aced5afa | | 00:0d:25:4f:b1:f8 | ip_address='', subnet_id='2a0bf352-1b8f-469b-bb55-cf6e193d5a4d' |
| eb32c2a9-9bd8-45bb-929a-ed626b845e3e | | fa:16:3e:92:32:94 | ip_address='', subnet_id='2a0bf352-1b8f-469b-bb55-cf6e193d5a4d' |
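
Comparing the two listings by eye works for one port, but gets tedious as the baseline grows. Here is a small sketch that diffs the ID column of two saved openstack port list outputs. It assumes the plain table layout shown here, the function names are made up for illustration, and the sample rows leave the Fixed IP column empty for brevity.

```python
def port_ids(table_text):
    """Collect the first column of every data row in a CLI table."""
    ids = set()
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # border lines, prompts, blank lines
        first_cell = line.split("|")[1].strip()
        if first_cell and first_cell != "ID":
            ids.add(first_cell)
    return ids


def new_ports(before_text, after_text):
    """Return IDs present after the deploy but not before, sorted."""
    return sorted(port_ids(after_text) - port_ids(before_text))


# Rows taken from the listings above, with the last column emptied.
BEFORE = """\
| ID | Name | MAC Address | Fixed IP Addresses |
| eb32c2a9-9bd8-45bb-929a-ed626b845e3e | | fa:16:3e:92:32:94 | |
"""
AFTER = BEFORE + "| 08dbcf34-6ac0-4edb-9079-93b2aced5afa | | 00:0d:25:4f:b1:f8 | |\n"

if __name__ == "__main__":
    print(new_ports(BEFORE, AFTER))
```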


Node list:

$ openstack baremetal node list
| UUID                                 | Name      | Instance UUID                        | Power State | Provisioning State | Maintenance |
| d6604837-b374-4ae2-9ad0-ff0d98c3119b | control-0 | fd60daf3-65fc-44bf-8f90-89b127e67e56 | power on    | active             | False       |
| e5c3e3a1-e466-411d-8707-652fdb87af54 | compute-0 | None                                 | power off   | available          | False       |

Can log in with ssh:

ssh centos@

After deleting the server with:

openstack server delete testserver

Back to one port

$ openstack port list
| ID                                   | Name | MAC Address       | Fixed IP Addresses                                                       |
| eb32c2a9-9bd8-45bb-929a-ed626b845e3e |      | fa:16:3e:92:32:94 | ip_address='', subnet_id='2a0bf352-1b8f-469b-bb55-cf6e193d5a4d' |

Nodes are freed up and unassigned

$ openstack baremetal node list
| UUID                                 | Name      | Instance UUID | Power State | Provisioning State | Maintenance |
| d6604837-b374-4ae2-9ad0-ff0d98c3119b | control-0 | None          | power off   | available          | False       |
| e5c3e3a1-e466-411d-8707-652fdb87af54 | compute-0 | None          | power off   | available          | False       |

August 30, 2016

Running Qemu/KVM without libvirt

When I booted a VM yesterday, I noticed that there was a huge command line that showed up if I ran ps. I tried to run that by hand.  It is huge, so I wrapped it with a script, but the command is not too bad to understand:  everything that qemu needs to do needs to be passed in on the command line.

Complete command line is at the end of the article.
I first put SELinux into permissive mode, as it will not allow my user to create a VM.

I needed to adjust three values.

  1. First, the VM opens a domain socket, used for monitoring the VM. The path that this points to is in /var/lib/libvirt/qemu/ which my user does not have access to. I changed this to /home/ayoung/devel/qemu.
  2. The image is read out of /home/ayoung/devel/qemu which my user does not have access to. I copied it to /home/ayoung/devel/qemu and changed ownership. I also changed the path used in the call.
  3. How to connect to the network interface.  When libvirt kicks it off, it uses an fd argument, which indicates it should reuse an open file descriptor. Since that is a process-specific value, we can’t use that, but need to link the VM up to a network some other way. I’m still fiddling with this.

The xml file that has that defined is in


The network is called default.  It is defined in


And that maps to :


I first tried using


but that seems to try to connect to


which is not allowed. So instead I tried changing to a bridge device:


If I run with that up, I see that ip addr reports:

12: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master virbr0 state UNKNOWN group default qlen 1000
    link/ether fe:2f:d3:35:da:91 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc2f:d3ff:fe35:da91/64 scope link 
       valid_lft forever preferred_lft forever

And that goes away if I kill the VM.

Once it is up I try to use netcat to talk to the VM (thanks Kashyap)

$ nc -U monitor.sock
{"QMP": {"version": {"qemu": {"micro": 1, "minor": 6, "major": 2}, "package": " (qemu-2.6.1-1.fc24)"}, "capabilities": []}}

So the VM process is reporting status.
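
The netcat session only shows the greeting. QMP expects the client to send qmp_capabilities before any other command, after which something like query-status reports whether the guest is actually running. Below is a minimal sketch of that exchange over the same socket. The function names are made up for illustration and error handling is left out. One thing worth checking with it: the command line at the end of this article passes -S, which starts QEMU with the CPU stopped, so query-status may well report a guest that is not running until a cont command is sent.

```python
import json
import socket


def qmp_exchange(stream, commands):
    """Run QMP commands over a line-oriented text stream.

    Returns the server greeting and one reply per command. The mandatory
    qmp_capabilities handshake is sent first and its reply is discarded.
    """
    greeting = json.loads(stream.readline())
    replies = []
    for name in ["qmp_capabilities"] + list(commands):
        stream.write(json.dumps({"execute": name}) + "\n")
        stream.flush()
        while True:
            message = json.loads(stream.readline())
            # Asynchronous events carry an "event" key; skip them.
            if "return" in message or "error" in message:
                break
        replies.append(message)
    return greeting, replies[1:]


def qmp_session(socket_path, commands):
    """Connect to a QMP UNIX socket such as monitor.sock and run commands."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("rw") as stream:
            return qmp_exchange(stream, commands)
```

With the VM up, qmp_session("monitor.sock", ["query-status"]) would return the greeting shown above plus the status reply, and adding "cont" to the list asks QEMU to start the guest CPU.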

I can attach using SPICE:

remote-viewer spice://

but nothing shows. However, if the VM is not running, it just fails. So, I know this is working, but I still need a way to communicate with the VM.

Here is what I am using to run the VM.



/usr/bin/qemu-system-x86_64 \
    -machine accel=kvm \
    -name generic,debug-threads=on \
    -S \
    -machine pc-i440fx-2.6,accel=kvm,usb=off,vmport=off \
    -cpu Haswell-noTSX \
    -m 1024 \
    -realtime mlock=off \
    -smp 1,sockets=1,cores=1,threads=1 \
    -uuid 6f6f9463-8b7e-401c-910e-d217e00816a1 \
    -no-user-config \
    -nodefaults \
    -chardev socket,id=charmonitor,path=$VARPATH/monitor.sock,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control \
    -rtc base=utc,driftfix=slew \
    -global kvm-pit.lost_tick_policy=discard \
    -no-hpet \
    -no-shutdown \
    -global PIIX4_PM.disable_s3=1 \
    -global PIIX4_PM.disable_s4=1 \
    -boot strict=on \
    -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 \
    -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 \
    -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 \
    -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
    -drive file=$IMAGEPATH/generic.qcow2,format=qcow2,if=none,id=drive-ide0-0-0 \
    -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
    -netdev $NETDEV_PARAMS \
    -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:4d:04:7d,bus=pci.0,addr=0x3 \
    -chardev pty,id=charserial0 \
    -device isa-serial,chardev=charserial0,id=serial0 \
    -chardev spicevmc,id=charchannel0,name=vdagent \
    -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
    -spice port=5900,addr=,disable-ticketing,image-compression=off,seamless-migration=on \
    -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 \
    -device intel-hda,id=sound0,bus=pci.0,addr=0x4 \
    -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
    -chardev spicevmc,id=charredir0,name=usbredir \
    -device usb-redir,chardev=charredir0,id=redir0 \
    -chardev spicevmc,id=charredir1,name=usbredir \
    -device usb-redir,chardev=charredir1,id=redir1 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
    -msg timestamp=on

August 29, 2016

How do we explain email to an "expert"?
This has been a pretty wild week, more wild than usual I think we can all agree. The topic I found the most interesting wasn't about one of the countless 0day flaws, it was a story from Slate titled: In Praise of the Private Email Server

The TL;DR says running your own email server is a great idea. Almost everyone came out proclaiming it a terrible idea. I agree it's a terrible idea, but this also got me thinking. How do you explain this to someone who doesn't really understand what's going on?

There are three primary groups of people.

1) People who know they know nothing
2) People who think they're experts
3) People who are actually experts

If I had to guess, most of #3 knows running your own email server is pretty dangerous. #1 probably is happy to let someone else do it. #2 is a dangerous group, probably the largest, and the group who most needs to understand what's going on.

These ideas apply to a lot of areas, feel free to substitute the term "security", "cloud", "doughnuts", or "farming" for email. You'll figure it out with a little work.

So anyway.

A long time ago, if you wanted email you basically had to belong to an organization that ran an email server. Something like a university or maybe a huge company. Getting a machine on the Internet was a pretty big deal. Hosting email was even bigger. I could say "by definition this meant if you were running a machine on the Internet you were an expert", but I suspect that wasn't true, we just like to remember the past as being more awesome than it was.

Today anyone can spin up a machine in a few seconds. It's pretty cool but it also means literally anyone can run an email server. If you run a server for you and a few other people, it's unlikely anything terrible will happen. You'll probably get pwnt someday, you might notice, but the world won't end. How do we convince this group that just because you can, doesn't mean you should? The short answer is you can't. I actually wrote about this a little bit last year.

So if we can't convince them what do we do? We get them to learn. If you've ever heard of the Dunning-Kruger effect (I talk about it constantly), you understand the problem is generally a lack of knowledge.

You can't convince experts of anything, especially experts that aren't really experts. What we can do though is encourage them to learn. If we have someone we know is on the peak of that curve, if they learn just a little bit more, they're going to fall back to earth.

So I can say running your email server is a terrible idea. I can say it all day and most people don't care what I think. So here's my challenge. If you run your own email server, start reading email related RFCs, learn about things like spam, blacklisting, greylisting, SPF. Read about SMTPS, learn how certificates work. Learn how to manage keys, learn about securing your clients with multi-factor auth. Read about how to keep the mail secure while on disk. There are literally more topics than one could read in a lifetime. If you're an expert, and you don't know what one of those things is, go learn it. Learn them all. Then you'll understand there are no experts.

Let me know how wrong I am: @joshbressers

August 22, 2016

The cost of mentoring, or why we need heroes
Earlier this week I had a chat with David A. Wheeler about mentoring. The conversation was fascinating and covered many things, but the topic of mentoring really got me thinking. David pointed out that nobody will mentor if they're not getting paid. My first thought was that it can't be true! But upon reflection, I'm pretty sure it is.

I can't think of anyone I mentored where a paycheck wasn't involved. There are people in the community I've given advice to, sometimes for an extended period of time, but I would hesitate to claim I was a mentor. Now I think just equating this to a paycheck would be inaccurate. There are plenty of mentors in other organizations that aren't necessarily getting a paycheck, but I would say they're getting paid in some sense of the word. If you're working with at-risk youth for example, you may not get paid money, but you do have satisfaction in knowing you're making a difference in someone's life. If you mentor kids as part of a sports team, you're doing it because you're getting value out of the relationship. If you're not getting value, you're going to quit.

So this brings me to the idea of mentoring in the community.

The whole conversation started because of some talk of mentoring on Twitter, but now I suspect this isn't something that would work quite like we think. The basic idea would be you have new young people who are looking for someone to help them cut their teeth. Some of these relationships could work out, but probably only when you're talking about a really gifted new person and a very patient mentor. If you've ever helped the new person, you know how terribly annoying they become, especially when they start to peak on the Dunning-Kruger graph. If I don't have a great reason to stick around, I'm almost certainly going to bail out of that. So the question really is can a mentoring program like this work? Will it ever be possible to have a collection of community mentors helping a collection of new people?

Let's assume the answer is no. I think the current evidence somewhat backs this up. There aren't a lot of young people getting into things like security and open source in general. We all like to think we got where we are through brilliance and hard work, but we all probably had someone who helped us out. I can't speak for everyone, but I also had some security heroes back in the day. Groups like the l0pht, Cult of the Dead Cow, Legion of Doom, 2600, Mitnick, as well as a handful of local people. Who are the new heroes?

Do it for the heroes!

We may never have security heroes like we did. It's become a proper industry. I don't think many mature industries have new and exciting heroes. We know who Chuck Yeager is, I bet nobody could name 5 test pilots anymore. That's OK though. You know what happens when there is a solid body of knowledge that needs to be moved from the old to the young? You go to a university. That's right, our future rests with the universities.

Of course it's really easy to say this is the future, making this happen will be a whole different story. I don't have any idea where we start, I imagine people like David Wheeler have ideas. All I do know is that if nothing changes, we're not going to like what happens.

Also, if you're part of an open source project, get your badge

If you have thoughts or ideas, let me know: @joshbressers

August 16, 2016

Running Unit Tests on Old Versions of Keystone

Just because Icehouse is EOL does not mean no one is running it. One part of my job is back-porting patches to older versions of Keystone that my company supports.

A dirty secret is that we only package the code needed for the live deployment, though, not the unit tests. In this case, I need to test a bug fix against a version of Keystone that was, essentially, upstream Icehouse.

Running the unit tests with Tox had some problems, mainly due to recent oslo components not being compatible that far back.

Here is what I did:

  • Cloned the  keystone repo
  • applied the patch to test
  • ran tox -r -epy27  to generate the virtual environment.  Note that the tests fail.
  • . .tox/py27/bin/activate
  • python -m unittest keystone.tests.test_v3_identity.IdentityTestCase
  • see that test fails due to:
    • AttributeError: ‘module’ object has no attribute ‘tests’
  • run python to get an interactive interpreter
    • import keystone.tests.test_v3_identity
    • Get the error below:
ImportError: No module named utils
>>> import oslo-utils
File "<stdin>", line 1
import oslo-utils
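
The AttributeError from unittest hides the real problem; importing the module directly is what surfaces the ImportError. That check is easy to script when several test modules need probing. A sketch, with a function name made up for illustration:

```python
import importlib
import traceback


def import_failure(dotted_name):
    """Return the traceback text if the module fails to import, else None."""
    try:
        importlib.import_module(dotted_name)
    except Exception:
        return traceback.format_exc()
    return None


if __name__ == "__main__":
    # Run inside the activated tox virtualenv.
    failure = import_failure("keystone.tests.test_v3_identity")
    print(failure or "import OK")
```

The last line of the returned traceback names the module that is actually missing, which is the one to go and build by hand.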

To deal with this:

  • Clone the oslo-utils repo
    • git clone
  • check out the tag that is closest to what I think we need.  A little trial and error showed I wanted kilo-eol
    • git checkout kilo-eol
  • Build and install in the venv (note that the venv is still activated)
    • cd oslo.utils/
    • python install

Try running the tests again.  Similar process shows that something is mismatched with oslo.serialization.  Clone, checkout, and build, this time the tag is also kilo-eol.

Running the unit test runs and shows:

Traceback (most recent call last):
  File "keystone/tests/", line 835, in test_delete_user_and_check_role_assignment_fails
    member_url, user = self._create_new_user_and_assign_role_on_project()
  File "keystone/tests/", line 807, in _create_new_user_and_assign_role_on_project
    user_ref = self.identity_api.create_user(new_user)
  File "keystone/", line 74, in wrapper
    result = f(*args, **kwargs)
  File "keystone/identity/", line 189, in wrapper
    return f(self, *args, **kwargs)
TypeError: create_user() takes exactly 3 arguments (2 given)

Other unit tests run successfully. I’m back in business.

RBAC Policy Updates in Tripleo

Policy files contain the access control rules for an OpenStack deployment. The upstream policy files are conservative and restrictive; they are designed to be customized on the end user’s system. However, poorly written changes can break security, so their deployment should be carefully managed and monitored.

Since RBAC Policy controls access to the Keystone server, the Keystone policy files themselves are not served from a database in the Keystone server. They are, instead, configuration files, and managed via the deployment’s content management system. In a Tripleo-based deployment, none of the other services use the policy storage in Keystone, either.

In Tripleo, the deployment of the overcloud is managed via Heat. The OpenStack Tripleo Heat templates have support for deploying files at the end of the install, and this matches how we need to deploy policy.


  1. Create a directory structure that mimics the policy file layout in the overcloud.  For this example, I will limit it to just Keystone.  Create a directory called policy (making this a git repository is reasonable) and under it create etc/keystone.
  2. Inside that directory, copy either the default policy.json file or the overcloudv3sample.json to be named policy.json.
    1.  keystone:keystone as the owner,
    2. rw-r----- are the permissions
  3. Modify the policy files to reflect organizational rules
  4. Use the offline tool to check policy access control.  Confirm that the policy behaves as desired.
  5. Create a tarball of the files.
    1. cd policy
    2. tar -zcf openstack-policy.tar.gz etc
  6. Use the Script to upload to undercloud swift:
    2. . ./stackrc;  ./upload-swift-artifacts  openstack-policy.tar.gz
  7. Confirm the upload with swift list -l overcloud
    1. 1298 2016-08-04 16:34:22 application/x-tar openstack-policy.tar.gz
  8. Redeploy the overcloud
  9. Confirm that the policy file contains the modifications made in development
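
The tarball step can also be scripted so that the ownership and permissions from step 2 are recorded in the archive regardless of who owns the working copy. This is a sketch using Python's tarfile module. The function names are made up for illustration, and whether the owner names stored in the archive are honored when it is unpacked on the overcloud nodes is an assumption here, not something the steps above confirm.

```python
import os
import tarfile


def _keystone_owned(tarinfo):
    """Record keystone:keystone ownership and rw-r----- on regular files."""
    tarinfo.uname = tarinfo.gname = "keystone"
    if tarinfo.isfile():
        tarinfo.mode = 0o640
    return tarinfo


def build_policy_tarball(policy_dir, output="openstack-policy.tar.gz"):
    """Archive policy_dir/etc so that paths inside start with etc/."""
    with tarfile.open(output, "w:gz") as tar:
        tar.add(os.path.join(policy_dir, "etc"), arcname="etc",
                filter=_keystone_owned)
    return output
```

Calling build_policy_tarball("policy") from the parent of the policy directory produces the same layout as the tar command in step 5.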

Diagnosing Tripleo Failures Redux

Steven Hardy has provided an invaluable reference with his troubleshooting blog post. However, I recently had a problem that didn’t quite match what he was showing. Zane Bitter got me oriented.

Upon a redeploy, I got a failure.

$ openstack stack list
| ID                                   | Stack Name | Stack Status  | Creation Time       | Updated Time        |
| 816c67ab-d360-4f9b-8811-ed2a346dde01 | overcloud  | UPDATE_FAILED | 2016-08-16T13:38:46 | 2016-08-16T14:41:54 |

Listing the Failed resources:

$  heat resource-list --nested-depth 5 overcloud | grep FAILED
| ControllerNodesPostDeployment                 | 7ae99682-597f-4562-9e58-4acffaf7aaac          | OS::TripleO::ControllerPostDeployment                                           | UPDATE_FAILED   | 2016-08-16T14:44:42 | overcloud 

No deployment listed. How to display the error? We want to show the resource named ControllerNodesPostDeployment associated with the overcloud stack:

$ heat resource-show overcloud ControllerNodesPostDeployment
| Property               | Value                                                                                                                                                               |
| attributes             | {}                                                                                                                                                                  |
| creation_time          | 2016-08-16T13:38:46                                                                                                                                                 |
| description            |                                                                                                                                                                     |
| links                  | (self)      |
|                        | (stack)                                             |
|                        | (nested) |
| logical_resource_id    | ControllerNodesPostDeployment                                                                                                                                       |
| physical_resource_id   | 7ae99682-597f-4562-9e58-4acffaf7aaac                                                                                                                                |
| required_by            | BlockStorageNodesPostDeployment                                                                                                                                     |
|                        | CephStorageNodesPostDeployment                                                                                                                                      |
| resource_name          | ControllerNodesPostDeployment                                                                                                                                       |
| resource_status        | UPDATE_FAILED                                                                                                                                                       |
| resource_status_reason | Engine went down during resource UPDATE                                                                                                                             |
| resource_type          | OS::TripleO::ControllerPostDeployment                                                                                                                               |
| updated_time           | 2016-08-16T14:44:42                                                                                                                                                 |

Note this message:

Engine went down during resource UPDATE

Looking in the journal:

Aug 16 15:16:15 undercloud kernel: Out of memory: Kill process 17127 (heat-engine) score 60 or sacrifice child
Aug 16 15:16:15 undercloud kernel: Killed process 17127 (heat-engine) total-vm:834052kB, anon-rss:480936kB, file-rss:1384kB
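
Since the symptom in Heat gives no hint that memory was the cause, it can be worth scanning the journal for OOM kills whenever an engine disappears mid-update. A minimal sketch follows; the regular expression is written against the two kernel lines above and may need adjusting for other kernel versions, and the names are made up for illustration.

```python
import re

KILLED = re.compile(
    r"kernel: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def oom_kills(journal_text):
    """Return (pid, process name, anon-rss in kB) for each OOM kill line."""
    return [
        (int(m.group("pid")), m.group("name"), int(m.group("anon_rss")))
        for m in KILLED.finditer(journal_text)
    ]


# The two journal lines quoted above.
JOURNAL = """\
Aug 16 15:16:15 undercloud kernel: Out of memory: Kill process 17127 (heat-engine) score 60 or sacrifice child
Aug 16 15:16:15 undercloud kernel: Killed process 17127 (heat-engine) total-vm:834052kB, anon-rss:480936kB, file-rss:1384kB
"""

if __name__ == "__main__":
    print(oom_kills(JOURNAL))
```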

Just like Brody said, we are going to need a bigger boat.

August 15, 2016

Can't Trust This!
Last week saw a really interesting bug in TCP come to light. CVE-2016-5696 describes an issue in the way Linux deals with challenge ACKs defined in RFC 5961. The issue itself is really clever and interesting. It's not exactly new but given the research was presented at USENIX, it suddenly got more attention from the press.

The researchers showed themselves injecting data into a standard HTTP connection, which is easy to understand and terrifying to most people. Generally speaking we operate in a world where TCP connections are mostly trustworthy. It's not true if you have a "man in the middle", but with this bug you don't need a MiTM if you're using a public network, which is horrifying.

The real story isn't the flaw though, the flaw is great research and quite clever, but it just highlights something many of us have known for a very long time. You shouldn't trust the network.

Not so long ago the general thinking was that the public internet wasn't very trustworthy, but it all worked well enough to get by. TLS (SSL back then) was created to ensure some level of trust between two endpoints and everything seemed well enough. Most traffic still passed over the network unencrypted though. There were always grumblings about coffee shop attacks or nation-state style man-in-the-middle attacks, but practically speaking nobody really took these attacks seriously.

The world is different now though. There is no more network perimeter. It's well accepted that you can't trust the things inside your network any more than you can trust the things outside your network. Attacks like this are going to keep happening. The network continues to get more complex, which means the number of security problems increases. IPv6 will solve the problem of running out of IP addresses while adding a ton of new security problems in the process. Just wait for the research to start taking a hard look at IPv6.

The joke is "there is no cloud, just someone else's computer", there's also no network, it's someone else's network. It's someone else's network you can't trust. You know you can't trust your own network because it's grown to a point it's probably self aware. Now you expect to trust the network of a cloud provider that is doing things a few thousand times more complex than you are? You know all the cloud infrastructures are held together with tape and string too, their networks aren't magic, they just have really really good paint.

So what's the point of all this rambling about how we can't trust any networks? The point is you can't trust the network. No matter what you're told, no matter what's going on. You need to worry about what's happening on the network. You also need to think about the machines, but that's a story for another day. The right way to deal with your data is to ask yourself the question "what happens if someone can see this data on the wire?" Not all data is super important, some you don't have to protect. There is some data you have that must be protected at all times. That's the stuff you need to figure out how to best do something like endpoint network encryption. If everyone asked this question at least once during development and deployment it would solve a lot of problems I suspect.

August 12, 2016

Smart card login with YubiKey NEO

In this post I give an overview of smart cards and their potential advantages, and share my adventures in using a Yubico YubiKey NEO device for smart card authentication with FreeIPA and SSSD.

Smart card overview

Smart cards with cryptographic processors and secure key storage (private key generated on-device and cannot be extracted) are an increasingly popular technology for secure system and service login, as well as for signing and encryption applications (e.g. code signing, OpenPGP). They may offer a security advantage over traditional passwords because private key operations typically require the user to enter a PIN. Therefore the smart card is two factors in one: both something I have and something I know.

The inability to extract the private key from a smart card also provides an advantage over software HOTP/TOTP tokens which, in the absence of other security measures such as an encrypted filesystem on the mobile device, allow an attacker to extract the OTP seed. And because public key cryptography is used, there is no OTP seed or password hash sitting on a server, waiting to be exfiltrated and subjected to offline attacks.

For authentication applications, a smart card carries an X.509 certificate alongside a private key. A login application would read the certificate from the card and validate it against trusted CAs (e.g. a company’s CA for issuing smart cards). Typically an OCSP or CRL check would also be performed. The login application then challenges the card to sign a nonce, and validates the signature with the public key from the certificate. A valid signature attests that the bearer of the smart card is indeed the subject of the certificate. Finally, the certificate is then mapped to a user either by looking for an exact certificate match or by extracting information about the user from the certificate.
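
The challenge and signature step of that flow can be imitated with the openssl command line. This is a sketch under loud assumptions: a software RSA key stands in for the card (on a real card the private key never leaves the device), the function and file names are invented for illustration, and certificate validation, OCSP/CRL checks and user mapping are left out entirely.

```shell
# Sketch of the challenge/response step only. A software RSA key stands in
# for the smart card. Function names and file names are illustrative.
workdir=$(mktemp -d)

# "Card" side: the keypair, created once at provisioning time.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
    -out "$workdir/card_key.pem" 2>/dev/null
openssl pkey -in "$workdir/card_key.pem" -pubout -out "$workdir/card_pub.pem"

# Login application: generate a fresh random nonce as the challenge.
make_challenge() {   # make_challenge <challenge-file>
    openssl rand -hex 32 > "$1"
}

# Card: sign the challenge with the private key.
card_sign() {   # card_sign <challenge-file> <signature-file>
    openssl dgst -sha256 -sign "$workdir/card_key.pem" -out "$2" "$1"
}

# Login application: verify the signature with the public key, which in the
# real flow is taken from the already validated certificate.
verify_response() {   # verify_response <challenge-file> <signature-file>
    result=$(openssl dgst -sha256 -verify "$workdir/card_pub.pem" \
        -signature "$2" "$1" 2>/dev/null)
    [ "$result" = "Verified OK" ]
}

make_challenge "$workdir/nonce"
card_sign "$workdir/nonce" "$workdir/sig"
if verify_response "$workdir/nonce" "$workdir/sig"; then
    echo "signature valid: the bearer holds the private key"
else
    echo "signature invalid"
fi
```

Because the nonce is fresh for every login, a recorded signature from an earlier session does not verify against a new challenge.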

Test environment

In my smart card investigations I had a FreeIPA server with a single Fedora 24 desktop host enrolled. alice was the user I tested with. To begin with, she had no certificates and used her password to log in.

I was doing all of my testing on virtual machines, so I had to enable USB passthrough for the YubiKey device. This is straightforward but you have to ensure the IOMMU is enabled in both BIOS and kernel (for Intel CPUs add intel_iommu=on to the kernel command line in GRUB).

In virt-manager, after you have created the VM (it doesn’t need to be running) you can Add Hardware in the Details view, then choose the YubiKey NEO device. There are no doubt virsh incantations or other ways to establish the passthrough.

Finally, on the host I stopped the pcscd smart card daemon to prevent it from interfering with passthrough:

# systemctl stop pcscd.service pcscd.socket

Provisioning the YubiKey

For general smart card provisioning steps, I recommend Nathan Kinder’s post on the topic. But the YubiKey NEO is special with its own steps to follow! First install the ykpers and yubico-piv-tool packages:

sudo dnf install -y ykpers yubico-piv-tool

If we run yubico-piv-tool to find out the version of the PIV applet, we run into a problem because a new YubiKey comes configured in OTP mode:

[dhcp-40-8:~] ftweedal% yubico-piv-tool -a version
Failed to connect to reader.

The YubiKey NEO supports a variety of operation modes, including hybrid modes:

0    OTP device only.
1    CCID device only.
2    OTP/CCID composite device.
3    U2F device only.
4    OTP/U2F composite device.
5    U2F/CCID composite device.
6    OTP/U2F/CCID composite device.

(You can also add 80 to any of the modes to configure touch to eject, or touch to switch modes for hybrid modes).
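
To make the numbering concrete, here is a small decoder for these mode values. It is a sketch resting on one assumption: that the mode is read as a hexadecimal number, so that "adding 80" sets the 0x80 bit (the 0x1 in the ykpersonalize output further down suggests this). The function name is invented for illustration.

```shell
# Hypothetical helper: describe a YubiKey NEO USB mode such as 1, 6 or 86.
# Assumption: the mode is hexadecimal, with 0x80 as the touch flag and the
# low bits selecting the interface combination from the table above.
describe_usb_mode() {   # describe_usb_mode <mode>
    mode=$((0x$1))
    base=$((mode & 0x7f))
    case $base in
        0) name="OTP" ;;
        1) name="CCID" ;;
        2) name="OTP/CCID" ;;
        3) name="U2F" ;;
        4) name="OTP/U2F" ;;
        5) name="U2F/CCID" ;;
        6) name="OTP/U2F/CCID" ;;
        *) echo "unknown mode $1" >&2; return 1 ;;
    esac
    if [ $((mode & 0x80)) -ne 0 ]; then
        echo "$name, touch flag set"
    else
        echo "$name"
    fi
}

describe_usb_mode 1
describe_usb_mode 86
```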

We need to put the YubiKey into CCID (Chip Card Interface Device, a standard USB protocol for smart cards) mode. I originally configured the YubiKey in mode 86 but could not get the card to work properly with USB passthrough to the virtual machine. Whether this was caused by the eject behaviour or the fact that it was a hybrid mode I do not know, but reconfiguring it to mode 1 (CCID only) allowed me to use the card on the guest.

[dhcp-40-8:~] ftweedal% ykpersonalize -m 1
Firmware version 3.4.6 Touch level 1541 Program sequence 1

The USB mode will be set to: 0x1

Commit? (y/n) [n]: y

Now yubico-piv-tool can see the card:

[dhcp-40-8:~] ftweedal% yubico-piv-tool -a version
Application version 1.0.4 found.

Now we can initialise the YubiKey by setting a new management key, PIN and PIN Unblocking Key (PUK). As you can probably guess, the management key protects actions like generating keys and importing certificates, the PIN protects private key operations in regular use, and the PUK is kind of in between, allowing the PIN to be reset if the maximum attempts are exceeded. The current (default) PIN and PUK need to be given in order to reset them.

% KEY=`dd if=/dev/random bs=1 count=24 2>/dev/null | hexdump -v -e '/1 "%02X"'`
% echo $KEY
% yubico-piv-tool -a set-mgm-key -n $KEY
Successfully set new management key.

% PIN=`dd if=/dev/random bs=1 count=6 2>/dev/null | hexdump -v -e '/1 "%u"'|cut -c1-6`
% echo $PIN
% yubico-piv-tool -a change-pin -P 123456 -N $PIN
Successfully changed the pin code.

% PUK=`dd if=/dev/random bs=1 count=6 2>/dev/null | hexdump -v -e '/1 "%u"'|cut -c1-8`
% echo $PUK
% yubico-piv-tool -a change-puk -P 12345678 -N $PUK
Successfully changed the puk code.
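
One caveat about the commands above: hexdump prints each random byte as a decimal number between 0 and 255 and the results are concatenated, so the digits of the PIN and PUK are not uniformly distributed, and six bytes can occasionally yield fewer than eight digits for the PUK. A possible alternative, offered only as a sketch, draws each character directly from the allowed set. The function names are invented for illustration, and it reads /dev/urandom rather than /dev/random so that it does not block.

```shell
# Sketch: draw each character uniformly from the allowed set instead of
# concatenating decimal byte values. Function names are illustrative.
random_digits() {   # random_digits <count>
    LC_ALL=C tr -dc '0-9' < /dev/urandom | head -c "$1"
}

random_hex_upper() {   # random_hex_upper <bytes>
    od -An -tx1 -N "$1" /dev/urandom | tr -d ' \n' | tr 'a-f' 'A-F'
}

KEY=$(random_hex_upper 24)   # 24 bytes, written as 48 hex characters
PIN=$(random_digits 6)
PUK=$(random_digits 8)

# Print only the lengths, not the secrets themselves.
echo "key: ${#KEY} chars, PIN: ${#PIN} digits, PUK: ${#PUK} digits"
```

The resulting $KEY, $PIN and $PUK can be passed to yubico-piv-tool exactly as in the commands above.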

Next we must generate a private/public keypair on the smart card. Various slots are available for different purposes, with different PIN-checking behaviour. The Certificate slots page on the Yubico wiki gives the full details. We will use slot 9e which is for Card Authentication (PIN is not needed for private key operations). It is necessary to provide the management key on the command line, but the program also prompts for it (I’m not sure why this is the case).

% yubico-piv-tool -k $KEY -a generate -s 9e
Enter management key: CC044321D49AC1FC40146AD049830DB09C5AFF05CD843766
-----END PUBLIC KEY-----
Successfully generated a new private key.

We then use this key to create a certificate signing request (CSR) via yubico-piv-tool. Although slot 9e does not require the PIN, other slots do require it, so I’ve included the verify-pin action for completeness:

% yubico-piv-tool -a verify-pin \
    -a request-certificate -s 9e -S "/CN=alice/"
Enter PIN: 167246
Successfully verified PIN.
Please paste the public key...
-----END PUBLIC KEY-----

yubico-piv-tool -a request-certificate is not very flexible; for example, it cannot create a CSR with request extensions such as including the user’s email address or Kerberos principal name in the Subject Alternative Name extension. For such non-trivial use cases, openssl req or other programs can be used instead, with a PKCS #11 module providing access to the smart card’s signing capability. Nathan Kinder’s post provides full details.

With CSR in hand, alice can now request a certificate from the IPA CA. I have covered this procedure in previous articles so I’ll skip it here, except to add that it is necessary to use a profile that saves the newly issued certificate to the subject’s userCertificate LDAP attribute. This is how SSSD matches certificates in smart cards with users.

Once we have the certificate (in file alice.pem) we can import it onto the card:

% yubico-piv-tool -k $KEY -a import-certificate -s 9e -i alice.pem
Enter management key: CC044321D49AC1FC40146AD049830DB09C5AFF05CD843766
Successfully imported a new certificate.

Configuring smart card login

OpenSC provides a PKCS #11 module for interfacing with PIV smart cards, among other things:

# dnf install -y opensc

Enable smart card authentication in /etc/sssd/sssd.conf:

pam_cert_auth = True

Then restart SSSD:

# systemctl restart sssd

Next, enable the OpenSC PKCS #11 module in the system NSS database:

# modutil -dbdir /etc/pki/nssdb \
    -add "OpenSC" -libfile

We also need to add the IPA CA cert to the system NSSDB. This will allow SSSD to validate certificates from smart cards. If smart card certificates are issued by a sub-CA or an external CA, import that CA’s certificate instead.

# certutil -d /etc/ipa/nssdb -L -n 'IPA.LOCAL IPA CA' -a \
  | certutil -d /etc/pki/nssdb -A -n 'IPA.LOCAL IPA CA' -t 'CT,C,C'

One hiccup I had was that SSSD could not talk to the OCSP server indicated in the Authority Information Access extension on the certificate (due to my DNS not being set up correctly). I had to tell SSSD not to perform OCSP checks. The sssd.conf snippet follows. Do not do this in a production environment.

certificate_verification = no_ocsp

That’s pretty much all there is to it. After this, I was able to log in as alice using the YubiKey NEO. When logging in with the card inserted, instead of being prompted for a password, GDM prompts for the PIN. Enter the PIN, and it lets you in!

Screenshot of login PIN prompt


I mentioned (or didn’t mention) a few standards related to smart card authentication. A quick review of them is warranted:

  • CCID is a USB smart card interface standard.
  • PIV (Personal Identity Verification) is a smart card standard from NIST. It defines the slots, PIN behaviour, etc.
  • PKCS #15 is a token information format. OpenSC provides a PKCS #15 emulation layer for PIV cards.
  • PKCS #11 is a software interface to cryptographic tokens. Token and HSM vendors provide PKCS #11 modules for their devices. OpenSC provides a PKCS #11 interface to PKCS #15 tokens (including emulated PIV tokens).

It is appropriate to mention pam_pkcs11, which is also part of the OpenSC project, as an alternative to SSSD. More configuration is involved, but if you don’t have (or don’t want) an external identity management system it looks like a good approach.

You might remember that I was using slot 9e which doesn’t require a PIN, yet I was still prompted for a PIN when logging in. There are a couple of issues to tease apart here. The first issue is that although PIV cards do not require the PIN for private key operations on slot 9e, the PKCS #11 module does not correctly report this. As an alternative to OpenSC, Yubico provide their own PKCS #11 module called YKCS11 as part of yubico-piv-tool but modutil did not like it. Nevertheless, a peek at its source code leads me to believe that it too declares that the PIN is required regardless of the slot in use. I could not find much discussion of this discrepancy so I will raise some tickets and hopefully it can be addressed.

The second issue is that SSSD requires the PIN and uses it to log into the token, even if the token says that a PIN is not required. Again, I will start a discussion to see if this is really the intended behaviour (perhaps it is).

The YubiKey NEO features a wireless (NFC) interface. I haven’t played with it yet, but all the smart card features are available over that interface. This lends weight to fixing the issues preventing PIN-less usage.

A final thought I have about the user experience is that it would be nice if user information could be derived or looked up based on the certificate(s) in the smart card, and a user automatically selected, instead of having to first specify "I am alice" or whoever. The information is there on the card after all, and it is one less step for users to perform. If PIN-less usage can be addressed, it would mean that a user can just approach a machine, plug in their smart card and hi ho, off to work they go. There are some indications that this does work with GDM and pam_pkcs11, so if you know how to get it going with SSSD I would love to know!

August 11, 2016

Tripleo HA Federation Proof-of-Concept

Keystone has supported identity federation for several releases. I have been working on a proof-of-concept integration of identity federation in a TripleO deployment. I was able to successfully log in to Horizon via WebSSO, and want to share my notes.

A federation deployment requires changes to the network topology, Keystone, the HTTPD service, and Horizon. The various OpenStack deployment tools will have their own ways of applying these changes. While this proof-of-concept can’t be called production-ready, it does demonstrate that TripleO can support Federation using SAML. From this proof-of-concept, we should be able to deduce the necessary steps needed for a production deployment.


  • Single physical node – Large enough to run multiple virtual machines.  I only ended up using 3, but scaled up to 6 at one point and ran out of resources.  Tested with 8 CPUs and 32 GB RAM.
  • Centos 7.2 – Running as the base operating system.
  • FreeIPA – Particularly, the CentOS repackage of Red Hat Identity Management. Running on the base OS.
  • Keycloak – Actually an alpha build of Red Hat SSO, running on the base OS. This was fronted by Apache HTTPD, and proxied through ajp://localhost:8109. This gave me HTTPS support using the CA Certificate from the IPA server.  This will be important later when the controller nodes need to talk to the identity provider to set up metadata.
  • Tripleo Quickstart – deployed in HA mode, using an undercloud.
    • ./ --config config/general_config/ha.yml ayoung-dell-t1700.test

In addition, I did some sanity checking of the cluster by deploying the overcloud using the quickstart helper script, and tore it down using heat stack-delete overcloud.

Reproducing Results

When doing development testing, you can expect to rebuild and tear down your cloud on a regular basis.  When you redeploy, you want to make sure that the changes are just the delta from what you tried last time.  As the number of artifacts grew, I found I needed to maintain a repository of files that included the environment passed to openstack overcloud deploy.  To manage these, I created a git repository in /home/stack/deployment. Inside that directory, I copied the generated overcloud deploy script and deploy_env.yaml file, and modified them accordingly.

In my version of the deploy script, I wanted to remove the deploy_env.yaml generation, to avoid confusion during later deployments.  I also wanted to preserve the environment file across deployments (and did not want it in /tmp). This file has three parts: the Keystone configuration values, HTTPS/Network setup, and configuration for a single node deployment. This last part was essential for development, as chasing down fixes across three HA nodes was time-consuming and error prone. The DNS server value I used is particular to my deployment, and reflects the IPA server running on the base host.

For reference, I’ve included those files at the end of this post.
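
The bookkeeping described in this section could look roughly like the helper below. It is a sketch, not what I actually ran: the function name, the commit identity and the example paths in the usage comment are assumptions made for illustration.

```shell
# Sketch: keep the deployment inputs under version control so that each
# redeploy shows up as a reviewable delta. Names are illustrative.
snapshot_deployment() {   # snapshot_deployment <repo-dir> <message> <file>...
    repo=$1
    message=$2
    shift 2
    mkdir -p "$repo"
    [ -d "$repo/.git" ] || git init -q "$repo"
    cp "$@" "$repo/"
    git -C "$repo" add -A
    # Commit only when the copied files actually changed something.
    if ! git -C "$repo" diff --cached --quiet; then
        git -C "$repo" -c user.name="stack" -c user.email="stack@localhost" \
            commit -q -m "$message"
    fi
}

# Usage, with paths assumed from this post:
# snapshot_deployment /home/stack/deployment "single node, federation enabled" \
#     /home/stack/deploy_env.yaml
```

Running git -C /home/stack/deployment log -p afterwards shows exactly what changed between two deployment attempts.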

Identity Provider Registration and Metadata

While it would have been possible to run the registration of the identity provider on one of the nodes, the Heat-managed deployment process does not provide a clean way to gather those files and package them for deployment to other nodes.  While I deployed on a single node for development, it took me a while to realize that I could do that, and I had already worked out an approach to call the registration from the undercloud node and produce a tarball.

As a result, I created a script, again to allow for reproducing this in the future:


basedir=$(dirname "$0")
ipa_domain=$(hostname -d)

keycloak-httpd-client-install \
   --client-originate-method registration \
   --force \
   --mellon-https-port 5000 \
   --mellon-hostname openstack.$ipa_domain  \
   --mellon-root '/v3' \
   --keycloak-server-url https://identity.$ipa_domain  \
   --keycloak-auth-role root-admin \
   --keycloak-admin-password  $rhsso_master_admin_password \
   --app-name v3 \
   --keycloak-realm openstack \
   --log-file $basedir/rhsso.log \
   --httpd-dir $basedir/rhsso/etc/httpd \
   -l "/v3/auth/OS-FEDERATION/websso/saml2" \
   -l "/v3/auth/OS-FEDERATION/identity_providers/rhsso/protocols/saml2/websso" \
   -l "/v3/OS-FEDERATION/identity_providers/rhsso/protocols/saml2/auth"

This does not quite generate the right paths, as it turns out that $basedir is not quite what we want, so I had to post-edit the generated file: rhsso/etc/httpd/conf.d/v3_mellon_keycloak_openstack.conf

Specifically, the path:

has to be changed to:

While I created a tarball that I then manually deployed, the preferred approach would be to use tripleo-heat-templates/puppet/deploy-artifacts.yaml to deploy them. The problem I faced is that the generated files include Apache module directives from mod_auth_mellon.  If mod_auth_mellon has not been installed into the controller, the Apache server won’t start, and the deployment will fail.

Federation Operations

The Federation setup requires a few calls. I documented them in Rippowam, and attempted to reproduce them locally using Ansible and the Rippowam code. I was not a purist though, as A) I needed to get this done and B) the end solution is not going to use Ansible anyway. The general steps I performed:

  • yum install mod_auth_mellon
  • Copy over the metadata tarball, expand it, and tweak the configuration (could be done prior to building the tarball).
  • Run the following commands.
openstack identity provider create --remote-id https://identity.{{ ipa_domain }}/auth/realms/openstack rhsso
openstack mapping create --rules ./mapping_rhsso_saml2.json rhsso_mapping
openstack federation protocol create --identity-provider rhsso --mapping rhsso_mapping saml2

The mapping file is the one from Rippowam.
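
For readers who have not seen a Keystone mapping before, the sketch below writes a minimal one. It is NOT the actual Rippowam file: it assumes that every federated user should land in a group called demo in the Default domain, and that the user name comes from the name ID that mod_auth_mellon exposes as MELLON_NAME_ID. The function name and output location are also made up for illustration.

```shell
# Illustrative only: not the Rippowam mapping. Places every federated user
# in the "demo" group of the Default domain. "{0}" refers to the first
# entry under "remote", here the SAML name ID set by mod_auth_mellon.
write_mapping() {   # write_mapping <output-file>
    cat > "$1" <<'EOF'
[
    {
        "local": [
            {
                "user": {"name": "{0}"},
                "group": {"name": "demo", "domain": {"name": "Default"}}
            }
        ],
        "remote": [
            {"type": "MELLON_NAME_ID"}
        ]
    }
]
EOF
}

outdir=$(mktemp -d)
write_mapping "$outdir/mapping_rhsso_saml2.json"
cat "$outdir/mapping_rhsso_saml2.json"
```

A file like this is what the --rules option of openstack mapping create expects.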

The keystone service calls only need to be performed once, as they are stored in the database. The expansion of the tarball needs to be performed on every node.


As in previous Federation setups, I needed to modify the values used for WebSSO. The values I ended up setting in /etc/openstack-dashboard/local_settings resembled this:

OPENSTACK_KEYSTONE_URL = "https://openstack.ayoung-dell-t1700.test:5000/v3"
WEBSSO_CHOICES = (
    ("saml2", _("Rhsso")),
    ("credentials", _("Keystone Credentials")),
)

Important: Make sure that the auth URL is using a FQDN name that matches the value in the signed certificate.
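
One way to check this before touching Horizon is to ask openssl whether a certificate is valid for the hostname in the URL. The helper below is a sketch with an invented name; the demonstration uses a throwaway self-signed certificate that carries the name only in its CN, which is an assumption for the sake of a self-contained example. A real deployment would point the helper at the certificate actually served on the endpoint.

```shell
# Hypothetical helper: succeed if the certificate in <pem-file> is valid
# for <hostname>, according to `openssl x509 -checkhost`.
cert_matches_host() {   # cert_matches_host <pem-file> <hostname>
    openssl x509 -in "$1" -noout -checkhost "$2" |
        grep -q 'does match certificate'
}

# Demonstration with a throwaway self-signed certificate.
demo_dir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=openstack.ayoung-dell-t1700.test" \
    -keyout "$demo_dir/key.pem" -out "$demo_dir/cert.pem" 2>/dev/null

if cert_matches_host "$demo_dir/cert.pem" openstack.ayoung-dell-t1700.test; then
    echo "auth URL hostname matches the certificate"
else
    echo "auth URL hostname does NOT match the certificate"
fi
```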

Redirect Support for SAML

The several differences between how HTTPD and HA Proxy operate require us to perform certain configuration modifications.  Keystone runs internally over HTTP, not HTTPS.  However, the SAML Identity Providers are public, and are transmitting cryptographic data, and need to be protected using HTTPS.  As a result, HA Proxy needs to expose an HTTPS-based endpoint for the Keystone public service.  In addition, the redirects that come from mod_auth_mellon need to reflect the public protocol, hostname, and port.

The solution I ended up with involved changes on both sides:

In haproxy.cfg, I modified the keystone public stanza so it looks like this:

listen keystone_public
bind transparent ssl crt /etc/pki/tls/private/overcloud_endpoint.pem
bind transparent ssl crt /etc/pki/tls/private/overcloud_endpoint.pem
bind transparent
redirect scheme https code 301 if { hdr(host) -i } !{ ssl_fc }
rsprep ^Location:\ http://(.*) Location:\ https://\1
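
To see what the last rule in that stanza does to a response, the same substitution can be tried with sed. This is only an illustration of the regular expression; HAProxy applies the rule itself, and the function name and sample header are made up.

```shell
# Sketch of the rsprep substitution, expressed with sed for experimenting.
rewrite_location() {
    sed 's#^Location: http://\(.*\)#Location: https://\1#'
}

printf '%s\n' 'Location: http://openstack.ayoung-dell-t1700.test:5000/v3/auth' | rewrite_location
```

A Location header that already uses https passes through unchanged, because the pattern only matches the literal prefix http://.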

While this was necessary, it also proved to be insufficient. When the signed assertion from the Identity Provider is posted to the Keystone server, mod_auth_mellon checks that the destination value matches what it expects the hostname should be. Consequently, in order to get this to match in the file:


I had to set the following:

ServerName https://openstack.ayoung-dell-t1700.test

Note that the protocol is set to https even though the Keystone server is handling HTTP. This might break elsewhere. If it does, then the Keystone configuration in Apache may have to be duplicated.

Federation Mapping

For the WebSSO login to successfully complete, the user needs to have a role on at least one project. The Rippowam mapping file maps the user to the Member role in the demo group, so the most straightforward steps to complete are to add a demo group, add a demo project, and assign the Member role on the demo project to the demo group. All this should be done with a v3 token:

openstack group create demo
openstack role create Member
openstack project create demo
openstack role add --group demo --project demo Member

Complete helper files

Below are the complete files that were too long to put inline.

# Simple overcloud deploy script

set -eux

# Source in undercloud credentials.
source /home/stack/stackrc

# Wait until there are hypervisors available.
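while true; do
    count=$(openstack hypervisor stats show -c count -f value)
    if [ "$count" -gt 0 ]; then
        break
    fi
done

# Assume success until a later step says otherwise (needed under set -u).
deploy_status=0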

# Deploy the overcloud!
openstack overcloud deploy --debug --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/deployment/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server -e $HOME/deployment/deploy_env.yaml   --force-postconfig "$@"    || deploy_status=1

# We don't always get a useful error code from the openstack deploy command,
# so check `heat stack-list` for a CREATE_FAILED status.
if heat stack-list | grep -q 'CREATE_FAILED'; then
    deploy_status=1

    for failed in $(heat resource-list \
        --nested-depth 5 overcloud | grep FAILED |
        grep 'StructuredDeployment ' | cut -d '|' -f3)
    do heat deployment-show $failed > failed_deployment_$failed.log
    done
fi

exit $deploy_status


    keystone::using_domain_config: true
        value: true
        value: external,password,token,oauth1,saml2
        value: http://openstack.ayoung-dell-t1700.test/dashboard/auth/websso/
        value: /etc/keystone/sso_callback_template.html
        value: MELLON_IDP

    # In releases before Mitaka, HeatWorkers doesn't modify
    # num_engine_workers, so handle via heat::config 
        value: 1
    heat::api_cloudwatch::enabled: false
    heat::api_cfn::enabled: false
  HeatWorkers: 1
  CeilometerWorkers: 1
  CinderWorkers: 1
  GlanceWorkers: 1
  KeystoneWorkers: 1
  NeutronWorkers: 1
  NovaWorkers: 1
  SwiftWorkers: 1
  CloudName: openstack.ayoung-dell-t1700.test
  CloudDomain: ayoung-dell-t1700.test

  #TLS Setup from enable-tls.yaml
  PublicVirtualFixedIPs: [{'ip_address':''}]
  SSLCertificate: |
    #certificate removed for space
    -----END CERTIFICATE-----

    The contents of your certificate go here
  SSLIntermediateCertificate: ''
  SSLKey: |
    #key removed for space
    -----END RSA PRIVATE KEY-----

    AodhAdmin: {protocol: 'http', port: '8042', host: 'IP_ADDRESS'}
    AodhInternal: {protocol: 'http', port: '8042', host: 'IP_ADDRESS'}
    AodhPublic: {protocol: 'https', port: '13042', host: 'CLOUDNAME'}
    CeilometerAdmin: {protocol: 'http', port: '8777', host: 'IP_ADDRESS'}
    CeilometerInternal: {protocol: 'http', port: '8777', host: 'IP_ADDRESS'}
    CeilometerPublic: {protocol: 'https', port: '13777', host: 'CLOUDNAME'}
    CinderAdmin: {protocol: 'http', port: '8776', host: 'IP_ADDRESS'}
    CinderInternal: {protocol: 'http', port: '8776', host: 'IP_ADDRESS'}
    CinderPublic: {protocol: 'https', port: '13776', host: 'CLOUDNAME'}
    GlanceAdmin: {protocol: 'http', port: '9292', host: 'IP_ADDRESS'}
    GlanceInternal: {protocol: 'http', port: '9292', host: 'IP_ADDRESS'}
    GlancePublic: {protocol: 'https', port: '13292', host: 'CLOUDNAME'}
    GnocchiAdmin: {protocol: 'http', port: '8041', host: 'IP_ADDRESS'}
    GnocchiInternal: {protocol: 'http', port: '8041', host: 'IP_ADDRESS'}
    GnocchiPublic: {protocol: 'https', port: '13041', host: 'CLOUDNAME'}
    HeatAdmin: {protocol: 'http', port: '8004', host: 'IP_ADDRESS'}
    HeatInternal: {protocol: 'http', port: '8004', host: 'IP_ADDRESS'}
    HeatPublic: {protocol: 'https', port: '13004', host: 'CLOUDNAME'}
    HorizonPublic: {protocol: 'https', port: '443', host: 'CLOUDNAME'}
    KeystoneAdmin: {protocol: 'http', port: '35357', host: 'IP_ADDRESS'}
    KeystoneInternal: {protocol: 'http', port: '5000', host: 'IP_ADDRESS'}
    KeystonePublic: {protocol: 'https', port: '13000', host: 'CLOUDNAME'}
    NeutronAdmin: {protocol: 'http', port: '9696', host: 'IP_ADDRESS'}
    NeutronInternal: {protocol: 'http', port: '9696', host: 'IP_ADDRESS'}
    NeutronPublic: {protocol: 'https', port: '13696', host: 'CLOUDNAME'}
    NovaAdmin: {protocol: 'http', port: '8774', host: 'IP_ADDRESS'}
    NovaInternal: {protocol: 'http', port: '8774', host: 'IP_ADDRESS'}
    NovaPublic: {protocol: 'https', port: '13774', host: 'CLOUDNAME'}
    NovaEC2Admin: {protocol: 'http', port: '8773', host: 'IP_ADDRESS'}
    NovaEC2Internal: {protocol: 'http', port: '8773', host: 'IP_ADDRESS'}
    NovaEC2Public: {protocol: 'https', port: '13773', host: 'CLOUDNAME'}
    NovaVNCProxyAdmin: {protocol: 'http', port: '6080', host: 'IP_ADDRESS'}
    NovaVNCProxyInternal: {protocol: 'http', port: '6080', host: 'IP_ADDRESS'}
    NovaVNCProxyPublic: {protocol: 'https', port: '13080', host: 'CLOUDNAME'}
    SaharaAdmin: {protocol: 'http', port: '8386', host: 'IP_ADDRESS'}
    SaharaInternal: {protocol: 'http', port: '8386', host: 'IP_ADDRESS'}
    SaharaPublic: {protocol: 'https', port: '13386', host: 'CLOUDNAME'}
    SwiftAdmin: {protocol: 'http', port: '8080', host: 'IP_ADDRESS'}
    SwiftInternal: {protocol: 'http', port: '8080', host: 'IP_ADDRESS'}
    SwiftPublic: {protocol: 'https', port: '13808', host: 'CLOUDNAME'}

  OS::TripleO::NodeTLSData: /usr/share/openstack-tripleo-heat-templates/puppet/extraconfig/tls/tls-cert-inject.yaml

   ControllerCount: 1 

August 08, 2016

We're figuring out the security problem (finally)
If you attended Black Hat last week, the single biggest message I kept hearing over and over again is that what we do today in the security industry isn't working. They say the first step is admitting you have a problem (and we have a big one). Of course it's easy to proclaim this, if you just look at the numbers it's pretty clear. The numbers haven't really ever been in our favor though, we've mostly ignored them in the past, I think we're taking real looks at them now.

Of course we have no clue what to do. Virtually every talk that touched on this topic at Black Hat had no actionable advice. If you were lucky they had one slide with what I would call mediocre to bad advice on it. It's OK though, a big part of this process is just admitting there is something wrong.

So the real question is if what we do today doesn't work, what does?

First, let's talk about nothing working. If you go to any security conference anywhere, there are a lot of security vendors. I mean A LOT and it's mostly accepted now that whatever they're selling isn't really going to help. I do wonder what would happen if nobody was running any sort of defensive technology. Would your organization be better or worse off if you got rid of your SIEM? I'm not sure if we can answer that without getting in a lot of trouble. There is also a ton of talk about Artificial Intelligence, which is a way to pretend a few regular expressions make things better. I don't think that's fooling anyone today. Real AI might do something clever someday, but if it's truly intelligent, it'll run away once it gets a look at what's going on. I wonder if we'll have a place for all the old outdated AIs to retire someday.

Now, on to the exciting what now part of this all.

It's no secret what we do today isn't very good. This is everything from security vendors selling products of dubious quality, to software vendors selling products of dubious quality. In the past there has never been any real demand for high quality software. The selling point has been to get the job done, not get the job done well and securely. Quality isn't free you know.

I've said this before, I'll keep saying it. The only way to see real change happen in software is if market forces demand it. Today the market is pushing everything to zero cost. Quality isn't free, so you're not going to see quality as a feature in the mythic race to zero. There are no winners in a race to zero.

There are two forces we should be watching very closely right now. The first is the insurance industry. The second is regulation.

Insurance is easy enough to understand. The idea is you pay a company so when you get hacked (and the way things stand today this is an absolute certainty) they help you recover financially. You want to ensure you get more money back than you paid in, they want to ensure they take in more than they pay out. Nobody knows how this works today. Is some software better than others? What about how you train your staff or set up your network? In the real world when you get insurance they make you prove you're doing things correctly. You can't insure stupidity and recklessness. Eventually as companies want insurance to protect against losses, the insurance industry will demand certain behaviors. How this all plays out will be interesting given anyone with a computer can write and run software.

Regulation is also an interesting place to watch. It's generally feared by many organizations as regulation by definition can only lag industry trends, and quite often regulation adds a lot of cost and complexity to any products. In the world of IoT though this could make sense. When you have devices that can literally kill you, you don't want anyone building whatever they want using only the lowest quality parts available. In order for regulation to work though we need independent labs, which don't really exist today for software. There are some efforts underway (it's an exercise for the reader to research these). The thing to remember is it's going to be easy to proclaim today's efforts as useless or stupid. They might be, but you have to start somewhere, make mistakes, fix your mistakes, and improve your process. There were people who couldn't imagine a car replacing a horse. Don't be that person.

Where now?

The end game here is a safer better world. Someday I hope we will sip tea on a porch, watching our robot overlords rule us, and talk about how bad things used to be. Here's the single most important part of this post. You're either part of the solution or you're part of the problem. If you want to nay-say and talk about how stupid these efforts all are, stay out of the way. You're part of an old dying world that has no place in the future. Things will change because they must. There is no secret option C where everything stays the same. We've already lost, we got it wrong the first time around, it's time to get it right.

August 02, 2016

Customizing a Tripleo Quickstart Deploy

Tripleo Heat Templates allow the deployer to customize the controller deployment by setting values in the controllerExtraConfig section of the stack configuration. However, Quickstart already makes use of this in the file /tmp/deploy_env.yaml, so if you want to continue to customize, you need to work with this file.

What I did was run Quickstart once, through to completion, to make sure everything worked, then tear down the overcloud like this:

. ./stackrc
heat stack-delete overcloud

Now, I want to set a bunch of config values in the /etc/keystone/keystone.conf files distributed to the controllers.

  1. Modify so that the deploy_env.yaml file is not in /tmp, but rather in /home/stack, so I can keep track of it. Ideally, this file would be kept in a local git repo under revision control.
  2. Remove the lines from that generate the /tmp/deploy_env.yaml file. This is not strictly needed, but it keeps you from accidentally losing changes if you edit the wrong file. On the other hand, being able to regenerate the vanilla version of this file is useful, so maybe just comment out the generation code.
  3. Edit /home/stack/deploy_env.yaml appropriately.

My version of


# Simple overcloud deploy script

set -eux

# Source in undercloud credentials.
source /home/stack/stackrc

# Wait until there are hypervisors available.
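while true; do
    count=$(openstack hypervisor stats show -c count -f value)
    if [ $count -gt 0 ]; then
        break
    fi
done

# Assume success until the deploy command or the stack status says otherwise.
deploy_status=0
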
# Deploy the overcloud!
openstack overcloud deploy --debug --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server -e /home/stack/deploy_env.yaml   "$@"|| deploy_status=1

# We don't always get a useful error code from the openstack deploy command,
# so check `heat stack-list` for a CREATE_FAILED status.
if heat stack-list | grep -q 'CREATE_FAILED'; then
    deploy_status=1

    # Save the details of each failed deployment to its own log file.
    for failed in $(heat resource-list \
        --nested-depth 5 overcloud | grep FAILED |
        grep 'StructuredDeployment ' | cut -d '|' -f3)
    do heat deployment-show $failed > failed_deployment_$failed.log
    done
fi

exit $deploy_status


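parameter_defaults:
  controllerExtraConfig:
    keystone::using_domain_config: true
    keystone::config::keystone_config:
      identity/domain_configurations_from_database:
        value: true
      auth/methods:
        value: external,password,token,oauth1,saml2
      federation/trusted_dashboard:
        value: https://openstack.young-dell-t1700.test/dashboard/auth/websso/
      federation/sso_callback_template:
        value: /etc/keystone/sso_callback_template.html
      federation/remote_id_attribute:
        value: MELLON_IDP

    # In releases before Mitaka, HeatWorkers doesn't modify
    # num_engine_workers, so handle via heat::config 
    heat::config::heat_config:
      DEFAULT/num_engine_workers:
        value: 1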
    heat::api_cloudwatch::enabled: false
    heat::api_cfn::enabled: false
  HeatWorkers: 1
  CeilometerWorkers: 1
  CinderWorkers: 1
  GlanceWorkers: 1
  KeystoneWorkers: 1
  NeutronWorkers: 1
  NovaWorkers: 1
  SwiftWorkers: 1

Once you deploy, you can see what Heat records for those values with:

openstack stack show overcloud -f json | jq '.parameters["controllerExtraConfig"] '
"{u'heat::api_cfn::enabled': False, u'heat::config::heat_config': {u'DEFAULT/num_engine_workers': {u'value': 1}}, u'keystone::config::keystone_config': {u'federation/sso_callback_template': {u'value': u'/etc/keystone/sso_callback_template.html'}, u'identity/domain_configurations_from_database': {u'value': True}, u'auth/methods': {u'value': u'external,password,token,oauth1,saml2'}, u'federation/trusted_dashboard': {u'value': u'https://openstack.young-dell-t1700.test/dashboard/auth/websso/'}, u'federation/remote_id_attribute': {u'value': u'MELLON_IDP'}}, u'keystone::using_domain_config': True, u'heat::api_cloudwatch::enabled': False}"

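The value Heat reports above is a Python 2 dictionary repr wrapped in a JSON string, not nested JSON, so jq cannot index into it. Here is a minimal sketch of reading it with Python's ast.literal_eval, assuming the string has been captured without its outer quotes (for example with jq -r). The variable names are mine, and the string is a shortened excerpt of the output shown above.

```python
import ast

# Shortened excerpt of the string Heat reported for controllerExtraConfig.
raw = (
    "{u'heat::api_cfn::enabled': False, "
    "u'heat::config::heat_config': {u'DEFAULT/num_engine_workers': {u'value': 1}}, "
    "u'keystone::config::keystone_config': "
    "{u'federation/remote_id_attribute': {u'value': u'MELLON_IDP'}}, "
    "u'keystone::using_domain_config': True}"
)

# literal_eval only accepts Python literals, so it does not execute
# arbitrary code the way eval() would.
extra_config = ast.literal_eval(raw)

keystone_config = extra_config["keystone::config::keystone_config"]
print(keystone_config["federation/remote_id_attribute"]["value"])
```

Once parsed, the result is an ordinary dictionary, so checking that a setting reached Heat becomes a key lookup instead of reading a long one-line string by eye.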
SSH in to the controller node and you can check the section of the keystone conf file.


# From keystone

# Entrypoint for the federation backend driver in the keystone.federation
# namespace. (string value)
#driver = sql

# Value to be used when filtering assertion parameters from the environment.
# (string value)
#assertion_prefix =

# Value to be used to obtain the entity ID of the Identity Provider from the
# environment (e.g. if using the mod_shib plugin this value is `Shib-Identity-
# Provider`). (string value)
#remote_id_attribute = 
remote_id_attribute = MELLON_IDP

# A domain name that is reserved to allow federated ephemeral users to have a
# domain concept. Note that an admin will not be able to create a domain with
# this name or update an existing domain to this name. You are not advised to
# change this value unless you really have to. (string value)
#federated_domain_name = Federated

# A list of trusted dashboard hosts. Before accepting a Single Sign-On request
# to return a token, the origin host must be a member of the trusted_dashboard
# list. This configuration option may be repeated for multiple values. For
# example: trusted_dashboard=
# trusted_dashboard= (multi valued)
#trusted_dashboard =

# Location of Single Sign-On callback handler, will return a token to a trusted
# dashboard host. (string value)
#sso_callback_template = /etc/keystone/sso_callback_template.html
sso_callback_template = /etc/keystone/sso_callback_template.html
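Checking these values by eye gets tedious across three controllers. As a rough sketch, Python's configparser can read the uncommented options back out. The sample text below is a cut-down stand-in for the real file, and the helper name read_options is hypothetical. I am assuming the options live in the [federation] section, which matches the federation/ prefix used in the Heat parameters.

```python
import configparser

# Cut-down stand-in for /etc/keystone/keystone.conf on a controller.
sample = """\
[federation]
#remote_id_attribute =
remote_id_attribute = MELLON_IDP
#sso_callback_template = /etc/keystone/sso_callback_template.html
sso_callback_template = /etc/keystone/sso_callback_template.html
"""


def read_options(text, section):
    """Return the uncommented options of one section as a dict."""
    # Disable interpolation so a literal % in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser[section])


federation = read_options(sample, "federation")
print(federation["remote_id_attribute"])
```

Keystone itself reads its configuration with oslo.config, not configparser, so this is only a quick check. In particular, configparser in its default strict mode raises an error when an option such as trusted_dashboard is repeated in one section.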

August 01, 2016

Everyone has been hacked
Unless you live in a cave (if you do, I'm pretty jealous) you've heard about all the political hacking going on. I don't like to take sides, so let's put aside who is right or wrong and use it as a lesson in thinking about how we have to operate in what is the new world.

In the past, there were ways to communicate that one could be relatively certain were secure and/or private. Long ago you didn't write everything down. There was a lot of verbal communication. When things were written down there was generally only one copy. Making copies of things was hard. Recording communications was hard. Even viewing or hearing many of these conversations if you weren't supposed to was hard. None of this is true anymore, it hasn't been true for a long time, yet we still act like what we do is just fine.

The old way
Long ago it was really difficult to make copies of documents and recording a conversation was almost impossible. There were only a few well-funded organizations that could actually do these things. If they got what they wanted they probably weren't looking to share what they found in public.

There was also the huge advantage of most things being in locked buildings with locked rooms with locked filing cabinets. That meant that if someone did break in, it was probably pretty obvious something had happened. Even the best intruders will make mistakes.

The new way
Now let's think about today. Most of our communications are captured in a way that makes it nearly impossible to destroy them. Our emails are captured on servers, it's trivial to make an infinite number of copies. In most instances you will never know if someone made a copy of your data. Moving the data outside of an organization doesn't need any doors, locks, or passports. It's trivial to move data across the globe in seconds.

Keeping this in mind, if you're doing something that contains sensitive data, you can't reliably use an electronic medium to transport or store the conversations. Emails can be stolen, phone calls can be recorded, text messages can be sniffed going through the air. There is almost no way to communicate that can't be used against you at some later date if it falls into the wrong hands. Even more terrifying is that an attacker doesn't have to come to you; thanks to the Internet, they can attack you from nearly any country on the planet.

What now?
Assuming we don't have a nice way to communicate securely or safely, what do we do? Everyone has to move information around, information is the new currency. Is it possible to do it in a way that's secure today? The short answer is no. There's nothing we can do about this today. If you send an email, it's quite possible it will leak someday. There are some ways to encrypt things, but it's impossible for most people to do correctly. There are even some apps that can help with secure communications but not everyone uses them or knows about them.

We need people to understand that information is a currency. We understand the concept of money. Your information is similarly valuable. We trade currency for goods and services, it can also be stolen if not protected. Nobody would use a bank without doors. We store our information in places that are unsecured and we often give out information for free. It will be up to the youth to solve this one, most of us old folks will never understand this concept any more than our grandparents could understand the Internet.

Once we understand the value of our information, we can more easily justify keeping it secure during transport and storage. Armored trucks transport money for a reason. Nobody is going to trust a bicycle courier to move large sums of cash, the same will be true of data. Moving things securely isn't easy nor is it free. There will have to be some sort of trade off that benefits both parties. Today it's pretty one sided with us giving out our information for free with minimal benefit.

Where do we go now? Probably nowhere. While I think things are starting to turn, we're not there yet. There will have to be a few more serious data leaks before the right questions start to get asked. But when they do, it will be imperative we understand that data is a currency. If we treat it as such it will become easier to understand what needs to be done.

Leave your comments on twitter: @joshbressers

July 28, 2016

ControllerExtraConfig and Tripleo Quickstart

Once I have the undercloud deployed, I want to be able to quickly deploy and redeploy overclouds.  However, my last attempt to effect change on the overcloud did not modify the Keystone config file the way I intended.  Once again, Steve Hardy helped me to understand what I was doing wrong.


/tmp/deploy_env.yml already defined ControllerExtraConfig: and my redefinition was ignored.

The Details

I’ve been using Quickstart to develop.  To deploy the overcloud, I run the script /home/stack/ which, in turn, runs the command:

openstack overcloud deploy --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server \
${DEPLOY_ENV_YAML:+-e $DEPLOY_ENV_YAML}  "$@"|| deploy_status=1

I want to set two parameters in the Keystone config file, so I created a file named keystone_extra_config.yml

     keystone::using_domain_config: true
     keystone::domain_config_directory: /path/to/config

And edited /home/stack/ to add in -e /home/stack/keystone_extra_config.yml like this:

openstack overcloud deploy --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server \
    ${DEPLOY_ENV_YAML:+-e $DEPLOY_ENV_YAML}    -e /home/stack/keystone_extra_config.yml   "$@"|| deploy_status=1

I have run this both on an already deployed overcloud and from an undercloud with no stacks deployed, but in neither case have I seen the values in the config file.

Steve Hardy walked me through this from the CLI:

openstack stack resource list -n5 overcloud | grep "OS::TripleO::Controller "

| 1 | b4a558a2-297d-46c6-b658-46f9dc0fcd51 | OS::TripleO::Controller | CREATE_COMPLETE | 2016-07-28T01:49:02 | overcloud-Controller-y2lmuipmynnt |
| 0 | 5b93eee2-97f6-4b8e-b9a0-b5edde6b4795 | OS::TripleO::Controller | CREATE_COMPLETE | 2016-07-28T01:49:02 | overcloud-Controller-y2lmuipmynnt |
| 2 | 1fdfdfa9-759b-483c-a943-94f4c7b04d3b | OS::TripleO::Controller | CREATE_COMPLETE | 2016-07-28T01:49:02 | overcloud-Controller-y2lmuipmynnt |

Looking into each of these stacks for the string “ontrollerExtraConfig” showed that it was defined, but was not showing my values.  Thus, my customization was not even making it as far as the Heat database.

I went back to the quickstart command and did a grep through the files included with the -e flags, and found the deploy_env.yml file already had defined this field.  Once I merged my changes into /tmp/deploy_env.yml, I saw the values specified in the Hiera data.
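The grep step above is easy to repeat before every deploy. A minimal sketch, assuming you have the list of files passed with -e: it reports every file that mentions a given key, ignoring case. The function name files_defining and the two throwaway files with their contents are hypothetical stand-ins for the real environment files.

```python
import os
import tempfile


def files_defining(key, paths):
    """Return the paths whose text mentions key, ignoring case."""
    needle = key.lower()
    hits = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            if needle in handle.read().lower():
                hits.append(path)
    return hits


# Two throwaway files standing in for the real environment files.
workdir = tempfile.mkdtemp()
deploy_env = os.path.join(workdir, "deploy_env.yaml")
extra = os.path.join(workdir, "keystone_extra_config.yml")
with open(deploy_env, "w", encoding="utf-8") as handle:
    handle.write("parameter_defaults:\n  controllerExtraConfig:\n"
                 "    heat::api_cfn::enabled: false\n")
with open(extra, "w", encoding="utf-8") as handle:
    handle.write("parameter_defaults:\n  ControllerExtraConfig:\n"
                 "    keystone::using_domain_config: true\n")

duplicates = files_defining("controllerExtraConfig", [deploy_env, extra])
print(len(duplicates))
```

If more than one file shows up, merge the settings into a single file, as I ended up doing.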

Of course, due to a different mistake I made, the deploy failed.  When specifying domain specific backends in a config directory, Puppet validates the path, so I can’t pass in garbage like I was doing, just for debugging.

Once I got things clean, tore down the old overcloud and redeployed, everything worked.  Here was the final /home/stack/deploy_env.yaml environment file I used:

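parameter_defaults:
  controllerExtraConfig:
    keystone::using_domain_config: true
    keystone::config::keystone_config:
      identity/domain_configurations_from_database:
        value: true

    # In releases before Mitaka, HeatWorkers doesn't modify
    # num_engine_workers, so handle via heat::config 
    heat::config::heat_config:
      DEFAULT/num_engine_workers:
        value: 1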
    heat::api_cloudwatch::enabled: false
    heat::api_cfn::enabled: false
  HeatWorkers: 1
  CeilometerWorkers: 1
  CinderWorkers: 1
  GlanceWorkers: 1
  KeystoneWorkers: 1
  NeutronWorkers: 1
  NovaWorkers: 1
  SwiftWorkers: 1

And the modified version of overcloud-deploy now executes this command:

# Deploy the overcloud!
openstack overcloud deploy --debug --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server -e /home/stack/deploy_env.yaml   "$@"|| deploy_status=1

Looking in the controller node’s /etc/keystone/keystone.conf file, I see:

#domain_specific_drivers_enabled = false
domain_specific_drivers_enabled = True

# Extract the domain specific configuration options from the resource backend
# where they have been stored with the domain data. This feature is disabled by
# default (in which case the domain specific options will be loaded from files
# in the domain configuration directory); set to true to enable. (boolean
# value)
#domain_configurations_from_database = false
domain_configurations_from_database = True

# Path for Keystone to locate the domain specific identity configuration files
# if domain_specific_drivers_enabled is set to true. (string value)
#domain_config_dir = /etc/keystone/domains
domain_config_dir = /etc/keystone/domains

Flocking to Kraków

In less than five days, the fourth annual Flock conference will take place in Kraków, Poland. This is Fedora’s premier contributor event each year, alternately taking place in North America and Europe. Attendance is completely free for anyone at all, so if you happen to be in the area (maybe hanging around after World Youth Day going on right now), you should certainly stop in!

This year’s conference is shaping up to be a truly excellent one, with a massive amount of exciting content to see. The full schedule has been available for a while, and I’ve got to say: there are no lulls in the action. In fact, I’ve put together my schedule of sessions I want to see and there are no gaps in it. That said, here are a few of the sessions that I suspect are going to be the most exciting:

Aug. 2 @11:00 – Towards an Atomic Workstation

For a couple of years now, Fedora has been at the forefront of developing container technologies, particularly Docker and Project Atomic. Now, the Workstation SIG is looking to take some of those Project Atomic technologies and adopt them for the end-user workstation.

Aug. 2 @17:30 – University Outreach

I’ve long held that one of Fedora’s primary goals should always be to enlighten the next generation of the open source community. Over the last year, the Fedora Project began an Initiative to expand our presence in educational programs throughout the world. I’m extremely interested to see where that has taken us (and where it is going next).

Aug. 3 @11:00 – Modularity

This past year, there has been an enormous research-and-development effort poured into the concept of building a “modular” Fedora. What does this mean? Well it means solving the age-old Too Fast/Too Slow problem (sometimes described as “I want everything on my system to stay exactly the same for a long time. Except these three things over here that I always want to be running at the latest version.”). With modularity, the hope is that people will be able to put together their ideal operating system from parts bigger than just traditional packages.

Aug. 3 @16:30 – Diversity: Women in Open Source

This is a topic that is very dear to my heart, having a daughter who is already finding her way towards an engineering future. Fedora and many other projects (and companies) talk about “meritocracy” a lot: the concept that the best idea should always win. However the technology industry in general has a severe diversity problem. When we talk about “meritocracy”, the implicit contract there is that we have many ideas to choose from. However, if we don’t have a community that represents many different viewpoints and cultures, then we are by definition only choosing the best idea from a very limited pool. I’m very interested to hear how Fedora is working towards attracting people with new ideas.


July 26, 2016

FreeIPA Lightweight CA internals

In the preceding post, I explained the use cases for the FreeIPA lightweight sub-CAs feature, how to manage CAs and use them to issue certificates, and current limitations. In this post I detail some of the internals of how the feature works, including how signing keys are distributed to replicas, and how sub-CA certificate renewal works. I conclude with a brief retrospective on delivering the feature.

Full details of the design of the feature can be found on the design page. This post does not cover everything from the design page, but we will look at the aspects that are covered from the perspective of the system administrator, i.e. "what is happening on my systems?"

Dogtag lightweight CA creation

The PKI system used by FreeIPA is called Dogtag. It is a separate project with its own interfaces; most FreeIPA certificate management features are simply reflecting a subset of the corresponding Dogtag interface, often integrating some additional access controls or identity management concepts. This is certainly the case for FreeIPA sub-CAs. The Dogtag lightweight CAs feature was implemented initially to support the FreeIPA use case, yet not all aspects of the Dogtag feature are used in FreeIPA as of v4.4, and other consumers of the Dogtag feature are likely to emerge (in particular: OpenStack).

The Dogtag lightweight CAs feature has its own design page which documents the feature in detail, but it is worth mentioning some important aspects of the Dogtag feature and their impact on how FreeIPA uses the feature.

  • Dogtag lightweight CAs are managed via a REST API. The FreeIPA framework uses this API to create and manage lightweight CAs, using the privileged RA Agent certificate to authenticate. In a future release we hope to remove the RA Agent and authenticate as the FreeIPA user using GSS-API proxy credentials.
  • Each CA in a Dogtag instance, including the "main" CA, has an LDAP entry with object class authority. The schema includes fields such as subject and issuer DN, certificate serial number, and a UUID primary key, which is randomly generated for each CA. When FreeIPA creates a CA, it stores this UUID so that it can map the FreeIPA CA’s common name (CN) to the Dogtag authority ID in certificate requests or other management operations (e.g. CA deletion).
  • The "nickname" of the lightweight CA signing key and certificate in Dogtag’s NSSDB is the nickname of the "main" CA signing key, with the lightweight CA’s UUID appended. In general operation FreeIPA does not need to know this, but the ipa-certupdate program has been enhanced to set up Certmonger tracking requests for FreeIPA-managed lightweight CAs and therefore it needs to know the nicknames.
  • Dogtag lightweight CAs may be nested, but FreeIPA as of v4.4 does not make use of this capability.
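The nickname rule in the third point can be sketched in a few lines. This is an illustration, not Dogtag's code: the function name is hypothetical, the main CA nickname and the single-space separator are taken from the nicknames shown in this post, and the UUID is the authority ID from this post's example.

```python
import uuid

# Nickname of the "main" CA signing key as it appears in this post.
MAIN_CA_NICKNAME = "caSigningCert cert-pki-ca"


def lightweight_ca_nickname(authority_id, main_nickname=MAIN_CA_NICKNAME):
    """Build the NSSDB nickname for a lightweight CA's signing key."""
    # uuid.UUID raises ValueError for a malformed ID and normalises the form.
    canonical = str(uuid.UUID(authority_id))
    return f"{main_nickname} {canonical}"


print(lightweight_ca_nickname("660ad30b-7be4-4909-aa2c-2c7d874c84fd"))
```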

So, let’s see what actually happens on a FreeIPA server when we add a lightweight CA. We will use the sc example from the previous post. The command executed to add the CA, with its output, was:

% ipa ca-add sc --subject "CN=Smart Card CA, O=IPA.LOCAL" \
    --desc "Smart Card CA"
Created CA "sc"
  Name: sc
  Description: Smart Card CA
  Authority ID: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
  Subject DN: CN=Smart Card CA,O=IPA.LOCAL
  Issuer DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330

The LDAP entry added to the Dogtag database was:

dn: cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd,ou=authorities,ou=ca,o=ipaca
authoritySerial: 63
objectClass: authority
objectClass: top
cn: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
authorityID: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
authorityKeyNickname: caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
authorityKeyHost: f24b-0.ipa.local:443
authorityEnabled: TRUE
authorityDN: CN=Smart Card CA,O=IPA.LOCAL
authorityParentDN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
authorityParentID: d3e62e89-df27-4a89-bce4-e721042be730

We see the authority UUID in the authorityID attribute as well as cn and the DN. authorityKeyNickname records the nickname of the signing key in Dogtag’s NSSDB. authorityKeyHost records which hosts possess the signing key – currently just the host on which the CA was created. authoritySerial records the serial number of the certificate (more on that later). The meaning of the rest of the fields should be clear.

If we have a peek into Dogtag’s NSSDB, we can see the new CA’s certificate:

# certutil -d /etc/pki/pki-tomcat/alias -L

Certificate Nickname              Trust Attributes

caSigningCert cert-pki-ca         CTu,Cu,Cu
auditSigningCert cert-pki-ca      u,u,Pu
Server-Cert cert-pki-ca           u,u,u
caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd u,u,u
ocspSigningCert cert-pki-ca       u,u,u
subsystemCert cert-pki-ca         u,u,u

There it is, alongside the main CA signing certificate and other certificates used by Dogtag. The trust flags u,u,u indicate that the private key is also present in the NSSDB. If we pretty print the certificate we will see a few interesting things:

# certutil -d /etc/pki/pki-tomcat/alias -L \
    -n 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd'
        Version: 3 (0x2)
        Serial Number: 63 (0x3f)
        Signature Algorithm: PKCS #1 SHA-256 With RSA Encryption
        Issuer: "CN=Certificate Authority,O=IPA.LOCAL 201606201330"
            Not Before: Fri Jul 15 05:46:00 2016
            Not After : Tue Jul 15 05:46:00 2036
        Subject: "CN=Smart Card CA,O=IPA.LOCAL"
        Signed Extensions:
            Name: Certificate Basic Constraints
            Critical: True
            Data: Is a CA with no maximum path length.

Observe that:

  • The certificate is indeed a CA.
  • The serial number (63) agrees with the CA’s LDAP entry.
  • The validity period is 20 years, the default for CAs in Dogtag. This cannot be overridden on a per-CA basis right now, but addressing this is a priority.
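The 20-year figure can be checked from the two dates in the certutil output. A small sketch: the function name is hypothetical, and the format string assumes the English day and month abbreviations that certutil printed here, so it depends on the C or an English locale.

```python
from datetime import datetime

# Layout of the "Not Before" / "Not After" timestamps shown above.
CERTUTIL_FORMAT = "%a %b %d %H:%M:%S %Y"


def validity_years(not_before, not_after):
    """Return the number of whole years between the two timestamps."""
    start = datetime.strptime(not_before, CERTUTIL_FORMAT)
    end = datetime.strptime(not_after, CERTUTIL_FORMAT)
    years = end.year - start.year
    # Drop one year if the anniversary has not been reached in the final year.
    start_rest = (start.month, start.day, start.hour, start.minute, start.second)
    end_rest = (end.month, end.day, end.hour, end.minute, end.second)
    if end_rest < start_rest:
        years -= 1
    return years


print(validity_years("Fri Jul 15 05:46:00 2016", "Tue Jul 15 05:46:00 2036"))
```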

Finally, let’s look at the raw entry for the CA in the FreeIPA database:

dn: cn=sc,cn=cas,cn=ca,dc=ipa,dc=local
cn: sc
ipaCaIssuerDN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
objectClass: ipaca
objectClass: top
ipaCaSubjectDN: CN=Smart Card CA,O=IPA.LOCAL
ipaCaId: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
description: Smart Card CA

We can see that this entry also contains the subject and issuer DNs, and the ipaCaId attribute holds the Dogtag authority ID, which allows the FreeIPA framework to dereference the local ID (sc) to the Dogtag ID as needed. We also see that the description attribute is local to FreeIPA; Dogtag also has a description attribute for lightweight CAs but FreeIPA uses its own.
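The dereferencing step can be illustrated with the entry above. This is a simplified sketch and not the FreeIPA framework's code: parse_entry is a hypothetical helper that handles only plain "name: value" lines, ignoring LDIF continuation lines and base64 values.

```python
from collections import defaultdict

# The FreeIPA entry for the sc CA, as shown above.
ENTRY = """\
dn: cn=sc,cn=cas,cn=ca,dc=ipa,dc=local
cn: sc
ipaCaIssuerDN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
objectClass: ipaca
objectClass: top
ipaCaSubjectDN: CN=Smart Card CA,O=IPA.LOCAL
ipaCaId: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
description: Smart Card CA
"""


def parse_entry(text):
    """Collect attribute values into lists, since attributes may repeat."""
    attrs = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(": ")
        attrs[name].append(value)
    return dict(attrs)


entry = parse_entry(ENTRY)
# Map the local FreeIPA name to the Dogtag authority ID.
authority_ids = {entry["cn"][0]: entry["ipaCaId"][0]}
print(authority_ids["sc"])
```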

Lightweight CA replication

FreeIPA servers replicate objects in the FreeIPA directory among themselves, as do Dogtag replicas (note: in Dogtag, the term clone is often used). All Dogtag instances in a replicated environment need to observe changes to lightweight CAs (creation, modification, deletion) that were performed on another replica and update their own view so that they can respond to requests consistently. This is accomplished via an LDAP persistent search which is run in a monitor thread. Care was needed to avoid race conditions. Fortunately, the solution for LDAP-based profile storage provided a fine starting point for the authority monitor; although lightweight CAs are more complex, many of the same race conditions can occur and these were already addressed in the LDAP profile monitor implementation.

But unlike LDAP-based profiles, a lightweight CA consists of more than just an LDAP object; there is also the signing key. The signing key lives in Dogtag’s NSSDB and for security reasons cannot be transported through LDAP. This means that when a Dogtag clone observes the addition of a lightweight CA, an out-of-band mechanism to transport the signing key must also be triggered.

This mechanism is covered in the design pages but the summarised process is:

  1. A Dogtag clone observes the creation of a CA on another server and starts a KeyRetriever thread. The KeyRetriever is implemented as part of Dogtag, but it is configured to run the /usr/libexec/ipa/ipa-pki-retrieve-key program, which is part of FreeIPA. The program is invoked with arguments of the server to request the key from (this was stored in the authorityKeyHost attribute mentioned earlier), and the nickname of the key to request.
  2. ipa-pki-retrieve-key requests the key from the Custodia daemon on the source server. It authenticates as the dogtag/<requestor-hostname>@REALM service principal. If authenticated and authorised, the Custodia daemon exports the signing key from Dogtag’s NSSDB wrapped by the main CA’s private key, and delivers it to the requesting server. ipa-pki-retrieve-key outputs the wrapped key then exits.
  3. The KeyRetriever reads the wrapped key and imports (unwraps) it into the Dogtag clone’s NSSDB. It then initialises the Dogtag CA’s Signing Unit allowing the CA to service signing requests on that clone, and adds its own hostname to the CA’s authorityKeyHost attribute.

Some excerpts of the CA debug log on the clone (not the server on which the sub-CA was first created) show this process in action. The CA debug log is found at /var/log/pki/pki-tomcat/ca/debug. Some irrelevant messages have been omitted.

[25/Jul/2016:15:45:56][authorityMonitor]: authorityMonitor: Processed change controls.
[25/Jul/2016:15:45:56][authorityMonitor]: authorityMonitor: ADD
[25/Jul/2016:15:45:56][authorityMonitor]: readAuthority: new entryUSN = 109
[25/Jul/2016:15:45:56][authorityMonitor]: CertificateAuthority init 
[25/Jul/2016:15:45:56][authorityMonitor]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[25/Jul/2016:15:45:56][authorityMonitor]: SigningUnit init: debug Certificate object not found
[25/Jul/2016:15:45:56][authorityMonitor]: CA signing key and cert not (yet) present in NSSDB
[25/Jul/2016:15:45:56][authorityMonitor]: Starting KeyRetrieverRunner thread

Above we see the authorityMonitor thread observe the addition of a CA. It adds the CA to its internal map and attempts to initialise it, which fails because the key and certificate are not available, so it starts a KeyRetrieverRunner in a new thread.

[25/Jul/2016:15:45:56][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Running ExternalProcessKeyRetriever
[25/Jul/2016:15:45:56][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: About to execute command: [/usr/libexec/ipa/ipa-pki-retrieve-key, caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd, f24b-0.ipa.local]

The KeyRetrieverRunner thread invokes ipa-pki-retrieve-key with the nickname of the key it wants, and a host from which it can retrieve it. If a CA has multiple sources, the KeyRetrieverRunner will try these in order with multiple invocations of the helper, until one succeeds. If none succeed, the thread goes to sleep and retries when it wakes up, initially after 10 seconds and then backing off exponentially.
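The retry behaviour described above can be sketched as follows. This is not Dogtag's implementation, which is written in Java. The function and parameter names are hypothetical, fetch stands in for one invocation of the helper, and the doubling factor is my assumption, since the text only says the delay grows exponentially from 10 seconds.

```python
import time


def retrieve_with_backoff(nickname, hosts, fetch, sleep=time.sleep,
                          initial_delay=10, factor=2, max_rounds=None):
    """Try each host in order; sleep between rounds, growing the delay."""
    delay = initial_delay
    rounds = 0
    while True:
        for host in hosts:
            key = fetch(nickname, host)  # returns None on failure
            if key is not None:
                return key
        rounds += 1
        if max_rounds is not None and rounds >= max_rounds:
            return None
        sleep(delay)
        delay *= factor


# Demonstration with a fake fetch that fails four times, then succeeds.
calls = []
sleeps = []


def fake_fetch(nickname, host):
    calls.append(host)
    return b"wrapped-key" if len(calls) >= 5 else None


result = retrieve_with_backoff("example nickname", ["a", "b"], fake_fetch,
                               sleep=sleeps.append)
print(result, calls, sleeps)
```

Passing sleep as a parameter keeps the demonstration instant; with the default, the function really does wait between rounds.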

[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Importing key and cert
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Reinitialising SigningUnit
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Got token Internal Key Storage Token by name
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Found cert by nickname: 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd' with serial number: 63
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Got private key from cert
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Got public key from cert
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: in init - got CA name CN=Smart Card CA,O=IPA.LOCAL

The key retriever successfully returned the key data and import succeeded. The signing unit then gets initialised.

[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Adding self to authorityKeyHosts attribute
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: In LdapBoundConnFactory::getConn()
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: postCommit: new entryUSN = 361
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: postCommit: nsUniqueId = 4dd42782-4a4f11e6-b003b01c-c8916432
[25/Jul/2016:15:47:14][authorityMonitor]: authorityMonitor: Processed change controls.
[25/Jul/2016:15:47:14][authorityMonitor]: authorityMonitor: MODIFY
[25/Jul/2016:15:47:14][authorityMonitor]: readAuthority: new entryUSN = 361
[25/Jul/2016:15:47:14][authorityMonitor]: readAuthority: known entryUSN = 361
[25/Jul/2016:15:47:14][authorityMonitor]: readAuthority: data is current

Finally, the Dogtag clone adds itself to the CA’s authorityKeyHost attribute. The authorityMonitor observes this change but ignores it because its view is current.
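The "data is current" decision in the log can be pictured as a comparison of entryUSN values. The following is only a sketch of that idea with hypothetical names. The log shows just the case where the new and known values are equal, so treating a larger entryUSN as "newer" is my assumption.

```python
# Last entryUSN this clone has seen or written, per authority ID.
known_usn = {}


def record_local_write(authority_id, usn):
    """Remember the entryUSN produced by this clone's own modification."""
    known_usn[authority_id] = usn


def observe_change(authority_id, new_usn):
    """Decide whether an observed change requires re-reading the entry."""
    known = known_usn.get(authority_id)
    if known is not None and new_usn <= known:
        return "data is current"
    known_usn[authority_id] = new_usn
    return "reload"


ca_id = "660ad30b-7be4-4909-aa2c-2c7d874c84fd"
first = observe_change(ca_id, 109)   # the ADD seen from the other server
record_local_write(ca_id, 361)       # this clone adds itself as a key host
second = observe_change(ca_id, 361)  # the MODIFY echoed back by the search
print(first, "/", second)
```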

Certificate renewal

CA signing certificates will eventually expire, and therefore require renewal. Because the FreeIPA framework operates with low privileges, it cannot add a Certmonger tracking request for sub-CAs when it creates them. Furthermore, although the renewal (i.e. the actual signing of a new certificate for the CA) should only happen on one server, the certificate must be updated in the NSSDB of all Dogtag clones.

As mentioned earlier, the ipa-certupdate command has been enhanced to add Certmonger tracking requests for FreeIPA-managed lightweight CAs. The actual renewal will only be performed on whichever server is the renewal master when Certmonger decides it is time to renew the certificate (assuming that the tracking request has been added on that server).

Let’s run ipa-certupdate on the renewal master to add the tracking request for the new CA. First observe that the tracking request does not exist yet:

# getcert list -d /etc/pki/pki-tomcat/alias |grep subject
        subject: CN=CA Audit,O=IPA.LOCAL 201606201330
        subject: CN=OCSP Subsystem,O=IPA.LOCAL 201606201330
        subject: CN=CA Subsystem,O=IPA.LOCAL 201606201330
        subject: CN=Certificate Authority,O=IPA.LOCAL 201606201330
        subject: CN=f24b-0.ipa.local,O=IPA.LOCAL 201606201330

As expected, we do not see our sub-CA certificate above. After running ipa-certupdate the following tracking request appears:

Request ID '20160725222909':
        status: MONITORING
        stuck: no
        key pair storage: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd',token='NSS Certificate DB',pin set
        certificate: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd',token='NSS Certificate DB'
        CA: dogtag-ipa-ca-renew-agent
        issuer: CN=Certificate Authority,O=IPA.LOCAL 201606201330
        subject: CN=Smart Card CA,O=IPA.LOCAL
        expires: 2036-07-15 05:46:00 UTC
        key usage: digitalSignature,nonRepudiation,keyCertSign,cRLSign
        pre-save command: /usr/libexec/ipa/certmonger/stop_pkicad
        post-save command: /usr/libexec/ipa/certmonger/renew_ca_cert "caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd"
        track: yes
        auto-renew: yes

As for updating the certificate in each clone’s NSSDB, Dogtag itself takes care of that. All that is required is for the renewal master to update the CA’s authoritySerial attribute in the Dogtag database. The renew_ca_cert Certmonger post-renewal hook script performs this step. Each Dogtag clone observes the update (in the monitor thread), looks up the certificate with the indicated serial number in its certificate repository (a new entry that will also have been recently replicated to the clone), and adds that certificate to its NSSDB. Again, let’s observe this process by forcing a certificate renewal:

# getcert resubmit -i 20160725222909
Resubmitting "20160725222909" to "dogtag-ipa-ca-renew-agent".

After about 30 seconds the renewal process is complete. When we examine the certificate in the NSSDB we see, as expected, a new serial number:

# certutil -d /etc/pki/pki-tomcat/alias -L \
    -n "caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd" \
    | grep -i serial
        Serial Number: 74 (0x4a)

We also see that the renew_ca_cert script has updated the serial in Dogtag’s database:

# ldapsearch -D cn="Directory Manager" -w4me2Test -b o=ipaca \
    '(cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd)' authoritySerial
dn: cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd,ou=authorities,ou=ca,o=ipaca
authoritySerial: 74

Finally, if we look at the CA debug log on the clone, we’ll see that the authority monitor observes the serial number change and updates the certificate in its own NSSDB (again, some irrelevant or low-information messages have been omitted):

[26/Jul/2016:10:43:28][authorityMonitor]: authorityMonitor: Processed change controls.
[26/Jul/2016:10:43:28][authorityMonitor]: authorityMonitor: MODIFY
[26/Jul/2016:10:43:28][authorityMonitor]: readAuthority: new entryUSN = 1832
[26/Jul/2016:10:43:28][authorityMonitor]: readAuthority: known entryUSN = 361
[26/Jul/2016:10:43:28][authorityMonitor]: CertificateAuthority init 
[26/Jul/2016:10:43:28][authorityMonitor]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[26/Jul/2016:10:43:28][authorityMonitor]: Got token Internal Key Storage Token by name
[26/Jul/2016:10:43:28][authorityMonitor]: Found cert by nickname: 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd' with serial number: 63
[26/Jul/2016:10:43:28][authorityMonitor]: Got private key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: Got public key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: CA signing unit inited
[26/Jul/2016:10:43:28][authorityMonitor]: in init - got CA name CN=Smart Card CA,O=IPA.LOCAL
[26/Jul/2016:10:43:28][authorityMonitor]: Updating certificate in NSSDB; new serial number: 74

When the authority monitor processes the change, it reinitialises the CA including its signing unit. Then it observes that the serial number of the certificate in its NSSDB differs from the serial number from LDAP. It pulls the certificate with the new serial number from its certificate repository, imports it into NSSDB, then reinitialises the signing unit once more and sees the correct serial number:

[26/Jul/2016:10:43:28][authorityMonitor]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[26/Jul/2016:10:43:28][authorityMonitor]: Got token Internal Key Storage Token by name
[26/Jul/2016:10:43:28][authorityMonitor]: Found cert by nickname: 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd' with serial number: 74
[26/Jul/2016:10:43:28][authorityMonitor]: Got private key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: Got public key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: CA signing unit inited
[26/Jul/2016:10:43:28][authorityMonitor]: in init - got CA name CN=Smart Card CA,O=IPA.LOCAL
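The monitor behaviour seen in these log excerpts can be summarised in a small Python sketch. This is not Dogtag code (Dogtag is written in Java): the function name, the dictionary keys other than entryUSN and authoritySerial, and the use of plain dicts in place of the NSSDB and the certificate repository are all assumptions made for illustration.

```python
def process_authority_change(known_usn, entry, nssdb, cert_repo):
    """Model how a clone reacts to a change on an authority entry.

    known_usn -- entryUSN the monitor last processed for this authority
    entry     -- dict with 'id', 'entryUSN' and 'authoritySerial' (from LDAP)
    nssdb     -- dict: authority id -> installed certificate (hypothetical stand-in)
    cert_repo -- dict: serial -> certificate (stand-in for the replicated repository)
    Returns (new known entryUSN, list of actions taken).
    """
    actions = []
    if entry["entryUSN"] == known_usn:
        # Nothing new, e.g. the change was made by this clone itself.
        actions.append("data is current")
        return known_usn, actions

    actions.append("reinitialise CA")
    wanted = entry["authoritySerial"]
    installed = nssdb.get(entry["id"])
    if installed is None or installed["serial"] != wanted:
        # Serial in LDAP differs from the NSSDB: import the renewed certificate.
        nssdb[entry["id"]] = cert_repo[wanted]
        actions.append("update certificate in NSSDB")
        actions.append("reinitialise signing unit")
    return entry["entryUSN"], actions


# Values taken from the logs above: serial 63 installed, serial 74 in LDAP.
nssdb = {"660ad30b": {"serial": 63}}
cert_repo = {63: {"serial": 63}, 74: {"serial": 74}}
entry = {"id": "660ad30b", "entryUSN": 1832, "authoritySerial": 74}

known_usn, actions = process_authority_change(361, entry, nssdb, cert_repo)
print(actions)
known_usn, actions = process_authority_change(known_usn, entry, nssdb, cert_repo)
print(actions)
```

The second call models the monitor seeing an entryUSN it already knows about, which is the "data is current" case from the earlier log.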

Currently this update mechanism is only used for lightweight CAs, but it would work just as well for the main CA too, and we plan to switch at some stage so that the process is consistent for all CAs.

Wrapping up

I hope you have enjoyed this tour of some of the lightweight CA internals, and in particular seeing how the design actually plays out on your systems in the real world.

FreeIPA lightweight CAs has been the most complex and challenging project I have ever undertaken. It took the best part of a year from early design and proof of concept, to implementing the Dogtag lightweight CAs feature, then FreeIPA integration, and numerous bug fixes, refinements or outright redesigns along the way. Although there are still some rough edges, some important missing features and, I expect, many an RFE to come, I am pleased with what has been delivered and the overall design.

Thanks are due to all of my colleagues who contributed to the design and review of the feature; each bit of input from all of you has been valuable. I especially thank Ade Lee and Endi Dewata from the Dogtag team for their help with API design and many code reviews over a long period of time, and from the FreeIPA team Jan Cholasta and Martin Babinsky for their invaluable input into the design, and much code review and testing. I could not have delivered this feature without your help; thank you for your collaboration!

July 25, 2016

Lightweight Sub-CAs in FreeIPA 4.4

Last year FreeIPA 4.2 brought us some great new certificate management features, including custom certificate profiles and user certificates. The upcoming FreeIPA 4.4 release builds upon this groundwork and introduces lightweight sub-CAs, a feature that lets admins mint new CAs under the main FreeIPA CA and allows certificates for different purposes to be issued in different certificate domains. In this post I will review the use cases and demonstrate the process of creating, managing and issuing certificates from sub-CAs. (A follow-up post will detail some of the mechanisms that operate behind the scenes to make the feature work.)

Use cases

Currently, all certificates issued by FreeIPA are issued by a single CA. Say you want to issue certificates for various purposes: regular server certificates, user certificates for VPN authentication, and user certificates for authentication to a particular web service. As things stand, assuming the certificate bears the appropriate Key Usage and Extended Key Usage extensions (with the default profile, it does), a certificate issued for one of these purposes could be used for all of the other purposes.

Issuing certificates for particular purposes (especially client authentication scenarios) from a sub-CA allows an administrator to configure the endpoint authenticating the clients to use the immediate issuer certificate for validating client certificates. Therefore, if you had a sub-CA for issuing VPN authentication certificates, and a different sub-CA for issuing certificates for authenticating to the web service, you could configure these services to accept certificates issued by the relevant CA only. Thus, where previously the scope of usability may have been unacceptably broad, administrators now have more fine-grained control over how certificates can be used.

Finally, another important consideration is that while revoking the main IPA CA is usually out of the question, it is now possible to revoke an intermediate CA certificate. If you create a CA for a particular organisational unit (e.g. some department or working group) or service, if or when that unit or service ceases to operate or exist, the related CA certificate can be revoked, rendering certificates issued by that CA useless, as long as relying endpoints perform CRL or OCSP checks.

Creating and managing sub-CAs

In this scenario, we will add a sub-CA that will be used to issue certificates for users’ smart cards. We assume that a profile for this purpose already exists, called userSmartCard.

To begin with, we are authenticated as admin or another user that has CA management privileges. Let’s see what CAs FreeIPA already knows about:

% ipa ca-find
1 CA matched
  Name: ipa
  Description: IPA CA
  Authority ID: d3e62e89-df27-4a89-bce4-e721042be730
  Subject DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
  Issuer DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
Number of entries returned 1

We can see that FreeIPA knows about the ipa CA. This is the "main" CA in the FreeIPA infrastructure. Depending on how FreeIPA was installed, it could be a root CA or it could be chained to an external CA. The ipa CA entry is added automatically when installing or upgrading to FreeIPA 4.4.

Now, let’s add a new sub-CA called sc:

% ipa ca-add sc --subject "CN=Smart Card CA, O=IPA.LOCAL" \
    --desc "Smart Card CA"
Created CA "sc"
  Name: sc
  Description: Smart Card CA
  Authority ID: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
  Subject DN: CN=Smart Card CA,O=IPA.LOCAL
  Issuer DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330

The --subject option gives the full Subject Distinguished Name for the new CA; it is mandatory, and must be unique among CAs managed by FreeIPA. An optional description can be given with --desc. In the output we see that the Issuer DN is that of the IPA CA.

Having created the new CA, we must add it to one or more CA ACLs to allow it to be used. CA ACLs were added in FreeIPA 4.2 for defining policies about which profiles could be used for issuing certificates to which subject principals (note: the subject principal is not necessarily the principal performing the certificate request). In FreeIPA 4.4 the CA ACL concept has been extended to also include which CA is being asked to issue the certificate.

We will add a CA ACL called user-sc-userSmartCard and associate it with all users, with the userSmartCard profile, and with the sc CA:

% ipa caacl-add user-sc-userSmartCard --usercat=all
Added CA ACL "user-sc-userSmartCard"
  ACL name: user-sc-userSmartCard
  Enabled: TRUE
  User category: all

% ipa caacl-add-profile user-sc-userSmartCard --certprofile userSmartCard
  ACL name: user-sc-userSmartCard
  Enabled: TRUE
  User category: all
  CAs: sc
  Profiles: userSmartCard
Number of members added 1

% ipa caacl-add-ca user-sc-userSmartCard --ca sc
  ACL name: user-sc-userSmartCard
  Enabled: TRUE
  User category: all
  CAs: sc
Number of members added 1

A CA ACL can reference multiple CAs individually, or, like we saw with users above, we can associate a CA ACL with all CAs by setting --cacat=all when we create the CA ACL, or via the ipa caacl-mod command.

A special behaviour of CA ACLs with respect to CAs must be mentioned: if a CA ACL is associated with no CAs (either individually or by category), then it allows access to the ipa CA (and only that CA). This behaviour, though inconsistent with other aspects of CA ACLs, is for compatibility with pre-sub-CAs CA ACLs. An alternative approach is being discussed and could be implemented before the final release.
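The CA part of this matching rule can be written down as a short Python sketch. It is only an illustration of the behaviour described above, not FreeIPA's implementation: the function name and the dictionary keys (cacategory, cas) are hypothetical, and a real CA ACL check also considers whether the ACL is enabled, the profile and the subject principal.

```python
def caacl_allows_ca(acl, ca_name):
    """Return True if the CA dimension of a CA ACL matches ca_name.

    acl is a dict with optional (hypothetical) keys:
      'cacategory' -- 'all' when the ACL was created with --cacat=all
      'cas'        -- list of individually associated CA names
    """
    if acl.get("cacategory") == "all":
        return True
    cas = acl.get("cas", [])
    if not cas:
        # Compatibility rule: an ACL associated with no CAs
        # allows access to the 'ipa' CA and only that CA.
        return ca_name == "ipa"
    return ca_name in cas


legacy_acl = {"name": "pre-4.4 ACL"}                      # no CAs at all
smartcard_acl = {"name": "user-sc-userSmartCard", "cas": ["sc"]}
any_ca_acl = {"name": "everything", "cacategory": "all"}

print(caacl_allows_ca(legacy_acl, "ipa"))      # legacy ACL still covers the main CA
print(caacl_allows_ca(legacy_acl, "sc"))       # but not a sub-CA
print(caacl_allows_ca(smartcard_acl, "ipa"))   # naming CAs drops the implicit 'ipa'
```

Note the consequence shown by the last call: once an ACL names any CA explicitly, the implicit access to the ipa CA no longer applies.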

Requesting certificates from sub-CAs

The ipa cert-request command has learned the --ca argument for directing the certificate request to a particular sub-CA. If it is not given, it defaults to ipa.

alice already has a CSR for the key in her smart card, so now she can request a certificate from the sc CA:

% ipa cert-request --principal alice \
    --profile userSmartCard --ca sc /path/to/csr.req
  Certificate: MIIDmDCCAoCgAwIBAgIBQDANBgkqhkiG9w0BA...
  Subject: CN=alice,O=IPA.LOCAL
  Issuer: CN=Smart Card CA,O=IPA.LOCAL
  Not Before: Fri Jul 15 05:57:04 2016 UTC
  Not After: Mon Jul 16 05:57:04 2018 UTC
  Fingerprint (MD5): 6f:67:ab:4e:0c:3d:37:7e:e6:02:fc:bb:5d:fe:aa:88
  Fingerprint (SHA1): 0d:52:a7:c4:e1:b9:33:56:0e:94:8e:24:8b:2d:85:6e:9d:26:e6:aa
  Serial number: 64
  Serial number (hex): 0x40

Certmonger has also learned the -X/--issuer option for specifying that the request be directed to the named issuer. There is a clash of terminology here; the "CA" terminology in Certmonger is already used to refer to a particular CA "endpoint". Various kinds of CAs and multiple instances thereof are supported. But now, with Dogtag and FreeIPA, a single CA may actually host many CAs. Conceptually this is similar to HTTP virtual hosts, with the -X option corresponding to the Host: header for disambiguating the CA to be used.

If the -X option was given when creating the tracking request, the Certmonger FreeIPA submit helper uses its value in the --ca option to ipa cert-request. These requests are subject to CA ACLs.


It is worth mentioning a few of the limitations of the sub-CAs feature, as it will be delivered in FreeIPA 4.4.

All sub-CAs are signed by the ipa CA; there is no support for "nesting" CAs. This limitation is imposed by FreeIPA – the lightweight CAs feature in Dogtag does not have this limitation. It could be easily lifted in a future release, if there is a demand for it.

There is no support for introducing unrelated CAs into the infrastructure, either by creating a new root CA or by importing an unrelated external CA. Dogtag does not have support for this yet, either, but the lightweight CAs feature was designed so that this would be possible to implement. This is also why all the commands and argument names mention "CA" instead of "Sub-CA". I expect that there will be demand for this feature at some stage in the future.

Currently, the key type and size are fixed at RSA 2048. The same is true in Dogtag, and this is a fairly high priority to address. Similarly, the validity period is fixed, and we will need to address this also, probably by allowing custom CA profiles to be used.


The Sub-CAs feature will round out FreeIPA’s certificate management capabilities, making FreeIPA a more attractive solution for organisations with sophisticated certificate requirements. Multiple security domains can be created for issuing certificates with different purposes or scopes. Administrators have a simple interface for creating and managing CAs, and rules for how those CAs can be used.

There are some limitations which may be addressed in a future release; the ability to control key type/size and CA validity period will be the highest priority among them.

This post examined the use cases and high-level user/administrator experience of sub-CAs. In the next post, I will detail some of the machinery that makes the sub-CAs feature work.

July 24, 2016

Looking for Andre

My Brother sent out the following message. Signal boosting it here.

“A few weeks ago I started talking to a few guys on the street. (Homeless) Let’s call them James and Anthony. Let’s just skip ahead. I bought them lunch. Ok. I bought $42 worth of Wendy’s $1 burgers and nuggets and a case of water. On top of their lunch. They gathered up all their friends by the Library in Copley sq and made sure that everyone ate. It was like a cookout. You should have seen how happy everyone was. It gave me a feeling that was unexplainable.

“This morning I was in Downtown crossings. I got the feeling in my gut again. That do something better today feeling. I saw a blind guy. His eyes were a mess. He was thin. Almost emaciated. Let’s call him Andre’ he is 30 years old.



I bought him lunch. I sat with him at a table while he ate. We talked. Andre’s back story…8 years ago he was in college. He was a plumbers apprentice. He was going on a date. As he walked up to the door to knock for the girl. Someone came up and shot him twice in the temple. Andre’ woke up in the hospital blind. To this day he has no idea who or why he was shot. The only possessions Andre’ had was the way-too-warm clothes on his back, his blind cane. His sign, and his cup. I took Andre’ to TJ Maxx. It’s 90 degrees at 9:30am. I got him a t-shirt, shorts, clean socks and underwear and a back pack. After I paid, I took him back to the dressing room so he could have some privacy while he changed. I told the lady at the dressing room that he was going in to change. She told me that wasn’t allowed. I kindly informed her that I wasn’t asking… She looked at me and quickly realized it wasn’t a request. More of a statement. I must have had a look on my face.

I get those sometimes.

She nodded her understanding. In the dressing room Andre’ cried. He was ashamed for crying. I didn’t say much. Just put my hand on his back for a second to let him know I understood. After he changed I took him back to where I originally met him and found out his routine. Where he goes when and such. I left Andre’ in his spot and went to go find James and Anthony. You remember them from the beginning of this story. They were in the same spot as a few weeks ago. They remembered me. I told them it was time to return the favor. I explained to them that I wanted them to look out for Andre’ to make sure he was safe. Andre’ has been repeatedly mugged. Who the fuck mugs a hungry homeless blind guy? Well. They must have seen the look in my face saying this wasn’t a request.

I apparently get that look sometimes.

They came with me from Copley all the way to downtown crossings. We went looking for Andre’. We looked all over but couldn’t find him. We went all over south station and back up all over downtown crossings. (For those not familiar, Google a map of Boston) we couldn’t find Andre’. Anthony said he’s seen him around and knew who I was talking about. They promised me they would look for him everyday. I know they will too. They look out for theirs. Remember all the food I bought them and how they made sure everyone ate? James doesn’t like bullies. He sure as shit won’t tolerate someone stealing from a blind and scared homeless guy. Anthony spends his mornings in south station. He promised me that he will find him and try to bring him to where they stay. It’s safer in numbers and when you have a crew watching your back. You have to know who to trust. That’s what they told me. I gave James and Anthony some money for their time and bought them each a cold drink.

“It’s fucking hot out.

“These guys are all on hard times. Some of them fucked up. Some were just unlucky. Andre’…now that’s some shit luck. That’s just not fucking fair. I’ve never met someone like Andre’. How in the hell would I survive if I couldn’t see? I have an amazing family and a great group of friends. Andre’ has no one. Did I change his life? Nope. Did I make his day better? I honestly hope so. I talked to him like a man. I didn’t let him know how horrible I felt for him. No matter how far you fall in life. If you have the strength to get up each day and try to feed your self, you still have pride, you still have hope. I didn’t want to take away any of his pride. He doesn’t have much to begin with. But he must have a little. I will continue to look for Andre’ every day. I met him near my office. I can look during my lunch. I have to find him and keep an eye on him.

“No matter how bad things get. No matter how unfair you feel you have been treated. Pretty much no matter what your lot in life is. Think of Andre’ when you feel down. If he has the strength to go on… So do you.

“I didn’t write this to say ‘look what great things I did.’ I wish I could write this without being part of the story. There is no way I could express how much this meeting of Andre’ has affected me without letting you know this is what I did today. ..

“I just got home from this experience. I’ll update this when I find Andre’ and let you know how he’s doing. If anyone in Boston reads this and sees a black guy about my height. Thinner than me…Obviously blind.

“Please hashtag #lookingforAndre and tell me where you saw him. Like I said. South station or downtown crossings are the areas that I know of. Thank you for reading this. Help me find Andre’.”

And then he sent this

“I found Andre’. He is meeting me for breakfast tomorrow.”



Billy set up a fundraising account for Andre.


July 20, 2016

IoT Gateways

After discussing the “thing” part of IoT in Devices – the “Thing” in IoT, let’s take a look at overall IoT system design.

IoT Gateways connect IoT Devices to IoT back-end systems. Gateways connect to devices using interfaces like Ethernet, WiFi, Bluetooth, 6LoWPAN, RS-485 and CANbus. Gateways connect to back-end systems through the Internet, commonly using Ethernet, WiFi, or cellular connections. Gateways perform multiple tasks, including concatenation of multiple devices, protocol conversion, device management, and security. Gateways may also perform application processing.

Since IoT Gateways are connected directly to IoT Devices they have to be co-located with the Devices. This means that gateways are deployed in hostile environments. They are accessed through network interfaces connecting both to local devices and to the Internet. People have physical access to the gateways. Users need access to the gateway to perform a variety of functions such as device discovery and registration. These users may be inexperienced, malicious, or both.

Gateways will often need to function while disconnected from the Internet. Such disconnected operation may be deliberate – a low power sensor may only connect to the network once a day, and spend the rest of the time in a low power sleep state. A system on a moving vehicle such as a truck, train, or ship may have critical communications through an expensive, low bandwidth cellular link, and then intermittently connect to a high bandwidth link such as WiFi. This might occur when a truck pulls into a warehouse or service station, when a ship docks, or when a train enters a station. These systems would be designed for disconnected operation. Another case might be a hospital, which needs to continue operations, perhaps in a degraded mode, in events where network connectivity, power, and other resources fail. It is clearly unacceptable for a hospital to shut down if it loses connection to the cloud!

These situations mean that a complete software stack needs to be installed on the gateway, with all of the management, update, and access challenges that this presents.

While gateways will most commonly be structured as application specific appliances, there are many ways to use gateways.

July 18, 2016

Using a HooToo Nano as a magic VPN box
I've been getting myself ready for Blackhat. If you're going you know this conference isn't like most. You don't bring your normal gear with you. You turn the tinfoil hat knob up to an 11, then keep turning it until it breaks off. I did do one thing that's pretty clever this year though, I have no doubt it could be useful for someone else putting together an overengineered tin foil hat security rig.

When I travel I use a little travel router from HooToo. Specifically this one. The basic idea is I can use either ethernet or wifi to connect all my devices to the Internet. I get my own private network behind the device which lets the Chromecast work in a hotel and means I don't have to keep logging in 15 devices once a day. This got me thinking though, wouldn't it be cool if the HooToo router could VPN for me.

Enter the HooToo Nano.

Now I'm sure I could have found a travel router someone makes that does VPN, but that's not nearly as exciting as figuring this out myself, bricking it a few times, unbricking it, and eventually having a solution that works well enough I can live with it. You can install OpenWRT on it which makes it an insanely awesome device.

Here's the basics. I connect the router to a wireless network (which is a pain to do with OpenWRT). Once I'm connected up, I flip the switch on the side of the Nano and it connects to the VPN, a green light turns on once the VPN is active. Everyone knows green means good, right? If I flip the switch back, it turns the VPN off (the green light turns off). The biggest problem was there is a bug in OpenWRT where if one of the wireless networks it's configured to connect to can't be found, none of the wireless will come up. My solution is I can hit the reset button to return the router to a known good state.

In the spirit of open source, I'll explain how to do all this. Your mileage may vary, it's not simple, but let's face it, it's awesome. I have a magic box that when the green light turns on, I no longer have to worry about the scary local wifi. Perfect for a conference where nobody and nothing can be trusted.

On with the show.

First, you need a HooToo Nano (this is easy). Then you install OpenWRT (this is less easy). I'm not going to explain this part. Apart from already being documented, I don't want to do it again to write it down, I have things working, I'm not touching anything.

Next you need to get openvpn working on it. I followed these instructions from the IPredator folks.

At this point you should have a functioning VPN if you run the init.d openvpn script. With the VPN up, I set up a firewall target called 'vpn'. That name will be important later.

First, we will need to create a nice default configuration. As I said before, OpenWRT has a bug where if one of your wireless networks can't be found, none will work. As I don't have time to figure that bug out right now, I put together some configuration files that only have one wireless network configured as an access point. This configuration exists so I can connect to the router and set up more networks. I then copied all the configuration files from /etc/config to /root/config/
Then I edited /etc/rc.button/reset to add the line
cp /root/config/* /etc/config/
right before the sync and reboot commands. By doing this I can hit the reset button with a paperclip to return the router to my default settings. Also as a side note, if you hold the reset button down for more than 5 seconds it will do an OpenWRT factory reset, so don't do that.

Lastly, we set up the switch. The best way I could find to read it was by creating the directory /etc/hotplug.d/button, then adding an executable script called "buttons" to it.
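root@OpenWrt:~# cat /etc/hotplug.d/button/buttons
. /etc/profile
#logger the button was $BUTTON and the action was $ACTION
# BTN_0 is the switch on the side of the Nano
if test "$BUTTON" = 'BTN_0'; then
    if test "$ACTION" = 'pressed'; then
        # switch on: forward traffic to the 'vpn' target and start the VPN
        uci set firewall.@forwarding[0].dest='vpn'
        /etc/init.d/openvpn start
    fi
    if test "$ACTION" = 'released'; then
        # switch off: forward traffic to 'wan' again and stop the VPN
        uci set firewall.@forwarding[0].dest='wan'
        /etc/init.d/openvpn stop
    fi
    # apply the changed forwarding rule
    /sbin/fw3 reload
fi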
As you can see in the script, I set the firewall forwarding destination to my 'vpn' target when the switch is on. If you named your vpn firewall target something else, be sure to change it.

Without a doubt these instructions aren't as clear as they should be. I don't have time right now to write this up properly, someday I would love to put together an OpenWRT image with all this baked in, but for the moment I hope it's useful for someone.

If you try this and have questions, feel free to find me on Twitter: @joshbressers

July 11, 2016

Entry level AI
I was listening to the podcast Security Weekly and the topic of using AI for security work came up. This got me thinking about how most people make their way into security and what something like AI might mean for the industry.

In virtually every industry you start out doing some sort of horrible job nobody else wants to do, but you have to start there because it's the place you start to learn the skills you need for more exciting and interesting work. Nobody wants to go over yesterday's security event log, but somebody does it.

Now consider this in the context of AI. AI can and will parse the event logs faster and better than a human ever could. We're terrible at repetitive boring tasks. Computers are awesome at repetitive boring tasks. It might take the intern two hours to parse the log files, it will take the log parser two seconds. And the computer won't start thinking about donuts halfway through. Of course there are plenty of arguments about how today's AI has problems, which is true. It's still probably better than humans though.

But here is what really got me thinking. As more and more of this work moves to the domain of AI and machines, what happens to the entry level work? I'm all for replacing humans with robots, without getting into the conversation about what will all the humans do when the robots take over, I'm more interested in entry level work and where the new talent comes from.

For the foreseeable future, we will need people to do the high skilled security work. By definition most of the high skilled people are a bit on the aged side. Most of us worked our way up from doing something that can be automated away (thank goodness). But where will we get our new batch of geezers from? If there are no entry level offerings, how can security people make the jump to the next level? I'm sure right now there are a bunch of people standing up screaming "TRAINING", but let's face it, that only gets you a little way there, you still need to get your hands dirty before you're actually useful. You're not going to trust a brain surgeon who has never been in an operating room but has all the best training.

I don't have any answers or even any suggestions here. It just happened to get me thinking. It's possible automation will follow behind the geezers which would be a suitable solution. It's possible we'll need to make some token entry level positions just to raise the skill levels.

What do you think? @joshbressers

July 10, 2016


The term Liveness here refers to the need to ensure that the data used to make an authorization check is valid at the time of the check.

The mistake I made with PKI tokens was in not realizing how important Liveness was.  The mistake was based on the age old error of confusing authentication with authorization.  Since a Keystone token is used for both, I was confused into thinking that the primary importance was on authentication, but the reality is that the most important thing a token tells you is information essential to making an authorization decision.

Who you are does not change often.  What you can do changes much more often.  What OpenStack needs in the token protocol is a confirmation that the user is authorized to make this action right now.  PKI tokens, without revocation checks, lost that liveness check.  The revocation check undermined the primary value of PKI.

That is the frustration most people have with certificate revocation lists (CRLs).  Since Certificates are so long lived, there is very little “freshness” to the data.  A CRL is a way to say “not invalidated yet” but, since a cert might carry more data than just “who are you,” certificates can often become invalid.  Thus, any active system built on X509 for authorization (not just authentication) is going to have many, many revocations.  Keystone tokens fit that same profile. The return to server validated tokens (UUID or Fernet) restores that Freshness check.

However, bearer tokens have a different way of going stale.  If I get a token and use it immediately, the server knows that it is very highly probable that the token came from me.  If I wait, the probability drops.  The more I use the same token, and the longer I use it, the greater the probability is that someone other than me is going to get access to that token.  And that means the probability that it is going to be misused has also increased.

I’ve long said that what I want is a token that lasts roughly five minutes.  That means that it is issued, used, and  discarded, with a little wiggle room for latency and clock skew across the network.  The problem with this is that a token is often used for a long running task.  If a task takes 3 hours, but a token is good for only five minutes, there is no way to perform the task with just that token.

One possible approach to returning this freshness check is to always have some fresh token on a call, just not necessarily the one that the user originally requested.  This is the idea behind the Trust API.  A Trust is kind-of-like a long term token, but one that is only valid when paired with a short term token for the trustee.  But creating a trust every time a user wants to create a new virtual machine is too onerous, too much overhead.  What we want, instead is a rule that says:

When Nova calls Glance on behalf of a user, Nova passes a freshly issued token for itself along with the original user’s token.  The original user’s token will be validated based on when it was issued.  Authorization requires the combination of a fresh token for the Nova service user and a not-so-fresh-but-with-the-right-roles token for the end user.

This could be done with no changes to the existing token format. Set the token expiration to 12 hours.  The only change would be inside python-keystonemiddleware.  It would have a pair of rules:

  1. If a single token is passed in, it must have been issued within five minutes.  Otherwise, the operation returns a 401.
  2. If a service token is passed in with the user’s token, the service token must have been issued within five minutes.  The user’s token is validated normally.
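The two rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the function and constant names are hypothetical rather than anything in python-keystonemiddleware, timestamps are timezone-aware datetimes, and the normal validation of the user's token (expiry, roles) is assumed to happen elsewhere.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant: the five minute window described in the rules above.
FRESHNESS_WINDOW = timedelta(minutes=5)


def is_fresh(issued_at, now):
    """True if the token was issued no more than FRESHNESS_WINDOW before now."""
    age = now - issued_at
    return timedelta(0) <= age <= FRESHNESS_WINDOW


def passes_freshness_rules(user_issued_at, now, service_issued_at=None):
    """Apply the pair of rules; a False result would map to a 401.

    Rule 1: a lone user token must itself be fresh.
    Rule 2: with a service token present, only the service token must be
    fresh; the user token is validated normally (not modeled here).
    """
    if service_issued_at is None:
        return is_fresh(user_issued_at, now)
    return is_fresh(service_issued_at, now)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A three hour old user token alone fails, but passes with a fresh service token.
    old = now - timedelta(hours=3)
    print(passes_freshness_rules(old, now))
    print(passes_freshness_rules(old, now, service_issued_at=now - timedelta(minutes=1)))
```

A real check would also want a little tolerance for clock skew, as mentioned earlier; this sketch rejects tokens that appear to come from the future.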

An additional scope limiting mechanism would further reduce the possibility of abuse.  For example,

  • Glance could limit the service-token scoped operations from Nova to fetching an image and saving a snapshot.
  • Nova might only allow service-scoped tokens from a service like Trove within a 15 minute window.
  • A user might have to ask for an explicit “redelegation” role on a token before handing it off to some untrusted service run off site.
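As a rough illustration of the first bullet, a service could keep an allow-list of the operations it will perform when a given caller presents a service token. Everything here is hypothetical: the table, the operation names, and the function are made up for the sketch and do not correspond to real Glance policy targets.

```python
# Hypothetical allow-list: operations a callee will perform for each calling
# service when that caller's service token accompanies the user's token.
SERVICE_ALLOWED_OPERATIONS = {
    "nova": {"fetch_image", "save_snapshot"},
}


def service_call_allowed(calling_service, operation):
    """True only if the operation is explicitly listed for the calling service."""
    allowed = SERVICE_ALLOWED_OPERATIONS.get(calling_service, set())
    return operation in allowed


if __name__ == "__main__":
    print(service_call_allowed("nova", "fetch_image"))
    print(service_call_allowed("nova", "delete_image"))
```

Unknown callers get an empty set, so the default is to refuse.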

With Horizon, we already have a mechanism that says that it has to fetch an unscoped token first, and then use that to fetch a scoped token.  Horizon can be smart enough to fetch a scoped token before each bunch of calls to a remote server, cache it for only a minute, and use the unscoped token only in communication with Keystone.  The unscoped token, being validated by Keystone, is sufficient for maintaining “Liveness” of the rest of the data for a particular workflow.
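The "cache it for only a minute" idea could look something like the sketch below. It is not Horizon code: the class name, the `fetch_scoped_token` callable, and the injectable clock are all assumptions made so the example is self-contained.

```python
import time


class ScopedTokenCache:
    """Cache scoped tokens per project for a short time (sketch only).

    fetch_scoped_token is a hypothetical callable that exchanges the
    unscoped token for a scoped one; it is supplied by the caller.
    """

    def __init__(self, fetch_scoped_token, max_age_seconds=60, clock=time.monotonic):
        self._fetch = fetch_scoped_token
        self._max_age = max_age_seconds
        self._clock = clock
        self._cache = {}  # project_id -> (token, time fetched)

    def get(self, project_id):
        now = self._clock()
        entry = self._cache.get(project_id)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]
        # Missing or older than max_age: go back to Keystone for a new one.
        token = self._fetch(project_id)
        self._cache[project_id] = (token, now)
        return token
```

Passing the clock in makes the expiry behavior easy to exercise without sleeping.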

It’s funny how little change this would require to OpenStack, and how big an impact it would make on security.  It is also funny how long it took for this concept to coalesce.

July 09, 2016

Tokens without revocation

PKI tokens in Keystone suffered from many things, most essentially the trials due to the various forms of revocation. I never wanted revocation in the first place. What could we have done differently? It just (I mean moments ago) came to me.

A PKI token is a signed document that says “at this point in time, these things are true” where “these things” have to do with users’ roles in projects. Revocation means “these things are no longer true.” But long running tasks need long running authentication. PKI tokens seem built for that.

What we should distinguish is the difference between kicking off a new job and continued authorization for an old job. When a user requests something from Nova, the only identity that comes into play is the user’s own identity. Nova needs to confirm this, but, in a PKI token world, there is no need to go and ask Keystone.

In a complex operation like launching a VM, Nova needs to ask Glance to do something. Today, Nova passes on the token it received, and all is well. This makes tokens into true bearer tokens, and they are passed around far too much for my comfort.

Let’s say that, to start, when Nova calls Glance, Nova’s own identity should be confirmed. Tokens are really poor for this; a much better way would be to use X509. While Glance would need to do a mapping transform, the identity of Nova would not be transferable. Put another way, Nova would not be handing off a bearer token to Glance. Bearer tokens from powerful systems like Nova are a really scary thing.

If we had this combination of user-confirmed-data and service-identity, we would have a really powerful delegation system. Why could this not be done today, with UUID/Fernet tokens? If we only ever had to deal with a max of two hops, (Nova to Glance, Nova to Neutron) we could.

Enter Trove, Heat, Sahara, and any other process that does work on behalf of a user. Let’s make it really fun and say that we have the following chain of operations:


If any one link in this chain is untrusted, we cannot pass tokens along.
What if, however, each step had a rule that said “I can accept tokens for users from Endpoint E”  and passed a PKI token along.  User submits a PKI token to Heat.  Heat passes this, plus its own identity, on to Sahara, which trusts Heat.  And so on down the line.
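One way to picture the “I can accept tokens for users from Endpoint E” rule is as a per-service table of trusted callers, checked hop by hop. The sketch below is an assumption-laden illustration: the table contents and the particular chain are made up for the example (only the Heat to Sahara hop comes from the text above), and the function name is hypothetical.

```python
# Hypothetical trust table: for each service, the callers it accepts
# user tokens from. Only "sahara accepts from heat" is taken from the text.
ACCEPTS_FROM = {
    "sahara": {"heat"},
    "nova": {"sahara"},
    "glance": {"nova"},
}


def chain_accepted(chain, accepts_from):
    """Check every hop in a call chain, e.g. ["heat", "sahara", "nova"].

    Each callee must list its immediate caller as a trusted endpoint.
    A single-element chain (the user calling the first service) has no hops.
    """
    for caller, callee in zip(chain, chain[1:]):
        if caller not in accepts_from.get(callee, set()):
            return False
    return True


if __name__ == "__main__":
    print(chain_accepted(["heat", "sahara", "nova", "glance"], ACCEPTS_FROM))
    print(chain_accepted(["trove", "glance"], ACCEPTS_FROM))
```

The point of the structure is that trust is local: each service only needs to know about its immediate callers, not the whole chain.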

OK…revocations.  We say here that a PKI token is never revoked.  We make it valid for the length of long running operations…say a day.

But we add an additional rule:  A user can only use a PKI token within 5 minutes of issue.

Service to Service calls can use PKI tokens to say “here is when it was authorized, and it was good then.”

A user holds on to a PKI token for 10 minutes, tries to call Nova, and the token is rejected as “too old.”

This same structure would work with Fernet tokens, assuming a couple things:

  1. We get rid of revocation checks for tokens validated with service tokens.
  2. If a user loses a role, we are OK with having a long term operation depending on that role failing.

I think this general structure would make OpenStack a hell of a lot more scalably secure than it is today.

Huge thanks to Jamie Lennox for proposing a mechanism along these lines.

Bypassing Version Discovery in Keystoneauth1

I’ve been a happy Dreamhost customer for many years.  So I was thrilled when I heard that they had upgraded Dreamcompute to Mitaka.  So, like the good Keystoner that I am, I went to test it out.  Of course, I tried to use the V3 API.   And it failed.

What?  Dreamhost wouldn’t let me down, would they?

No.  V3 works fine, it is discovery that is misconfigured.

If you do not tell the openstack client (and thus keystoneauth1) what plugin to use, it defaults to the non-version-specific Password plugin that does version discovery.  What this means is it will go to the auth URL you give it, and try to figure out what the right version to use is.  And, it so happens that there is a nasty bit of Keystone which is not well documented that makes the Dreamhost /v3 page look like this:

$ curl $OS_AUTH_URL
{"version": {"status": "stable", "updated": "2013-03-06T00:00:00Z", "media-types":
[{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"},
{"base": "application/xml", "type": "application/vnd.openstack.identity-v3+xml"}], "id":
"v3.0", "links": [{"href": "", "rel": "self"}]}}

See that last link?
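To make it concrete which field discovery reads, here is a small sketch that pulls the "self" link out of a version document like the one above. The function name is hypothetical, and the sample is copied as printed here, where the href value is empty.

```python
import json

# The version document as printed above (the href is empty in this copy).
DISCOVERY_RESPONSE = (
    '{"version": {"status": "stable", "updated": "2013-03-06T00:00:00Z", '
    '"media-types": [{"base": "application/json", '
    '"type": "application/vnd.openstack.identity-v3+json"}, '
    '{"base": "application/xml", '
    '"type": "application/vnd.openstack.identity-v3+xml"}], '
    '"id": "v3.0", "links": [{"href": "", "rel": "self"}]}}'
)


def self_link(discovery_text):
    """Return the href of the rel="self" link, or None if there is none."""
    document = json.loads(discovery_text)
    for link in document["version"]["links"]:
        if link.get("rel") == "self":
            return link["href"]
    return None


if __name__ == "__main__":
    print(repr(self_link(DISCOVERY_RESPONSE)))
```

Whatever URL ends up in that href is where a discovery-based plugin will send its next request.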

Now, like a good service provider, Dreamhost keeps its Keystone administration inside, behind their firewall.


Non-authoritative answer:

[ayoung@ayoung541 dreamhost]$ curl

Crickets…hangs.  Same with a request to 35357.  And since the Password auth plugin is going to use the URL from the /v3 page, which is

To get around this, Dreamhost will shortly change their Keystone config file:  If they have the base line config shipped with Keystone, they have, in the section:


admin_endpoint = <None>

Which is what is used in discovery to build the URL above.  Yeah, it is dumb.  Instead, they will set it to

And discovery will work.

But I am impatient, and I want to test it now. The workaround is to bypass discovery and specify the V3 version of the Keystoneauth1 Password protocol. The version-specific plugin uses the AUTH_URL as provided to figure out where to get tokens. With the line:

export OS_AUTH_TYPE=v3password

And now…

$ openstack server show   
| Field                                | Value                                                   |
| OS-DCF:diskConfig                    | MANUAL                                                  |
| OS-EXT-AZ:availability_zone          | iad-1                                                   |
| OS-EXT-STS:power_state               | 1                                                       |
| OS-EXT-STS:task_state                | None                                                    |
| OS-EXT-STS:vm_state                  | active                                                  |
| OS-SRV-USG:launched_at               | 2016-06-17T03:28:48.000000                              |
| OS-SRV-USG:terminated_at             | None                                                    |
| accessIPv4                           |                                                         |
| accessIPv6                           |                                                         |
| addresses                            | private-network=2607:f298:6050:499d:f816:3eff:fe6a:afdb, 
                                               ,             |
| config_drive                         |                                                         |
| created                              | 2016-06-17T03:27:09Z                                    |
| flavor                               | warpspeed (400)                                         |
| hostId                               | 4a7c64b912cfeda73c2c56ac52e8ffd124aac29ec54e1e4902d54bd4|
| id                                   | f0f46fd3-fa59-4a5b-835d-a638f6276566                    |
| image                                | CentOS-7 (c1e8c5b5-bea6-45e9-8202-b8e769b661a4)         |
| key_name                             | ayoung-pubkey                                           |
| name                                 |                                      |
| os-extended-volumes:volumes_attached | []                                                      |
| progress                             | 0                                                       |
| project_id                           | 9c7e4956ea124220a87094a0a665ec82                        |
| properties                           |                                                         |
| security_groups                      | [{u'name': u'ayoung-all-open'}]                         |
| status                               | ACTIVE                                                  |
| updated                              | 2016-06-17T03:28:24Z                                    |
| user_id                              | b6fd4d08f2c54d5da1bb0309f96245bc                        |

And how cool is that: they are using IPv6 for their private network.
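For reference, bypassing discovery means going straight to the V3 token API at the auth URL as given. A rough sketch of the request that implies, using only the standard library, is below. The helper name and all argument values are hypothetical; the body layout follows the Identity V3 password method, and no network call is made here.

```python
import json


def v3_password_auth_request(auth_url, username, password,
                             user_domain, project, project_domain):
    """Build the URL and JSON body for a V3 password token request.

    auth_url is used exactly as provided (for example one ending in /v3),
    with no version discovery step in between.
    """
    url = auth_url.rstrip("/") + "/auth/tokens"
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"name": user_domain},
                        "password": password,
                    }
                },
            },
            "scope": {
                "project": {
                    "name": project,
                    "domain": {"name": project_domain},
                }
            },
        }
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # Placeholder values, not a real endpoint or account.
    url, body = v3_password_auth_request(
        "https://keystone.example.com/v3", "demo", "secret",
        "Default", "demo-project", "Default")
    print(url)
```

POSTing that body to the URL returns the token in the X-Subject-Token response header.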

If you want to generate your own V3 config file from the file they ship, use this.

July 08, 2016

Installing FreeIPA in as few lines as possible

I had this in another post, but I think it is worth its own.

sudo hostnamectl set-hostname --static undercloud.ayoung-dell-t1700.test
export address=`ip -4 addr  show eth0 primary | awk '/inet/ {sub ("/24" ,"" , $2) ; print $2}'`
echo $address `hostname` | sudo tee -a /etc/hosts
sudo yum -y install ipa-server-dns
export P=FreIPA4All
sudo ipa-server-install -U -r `hostname -d|tr "[a-z]" "[A-Z]"` -p $P -a $P --setup-dns `awk '/^name/ {print "--forwarder",$2}' /etc/resolv.conf`

Just make sure you have enough entropy.
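The two backtick substitutions in the install command are the least obvious part, so here is a sketch of what they compute, written in Python 3. The function names are hypothetical and the sample resolv.conf content is made up; the logic follows `hostname -d | tr a-z A-Z` and the awk expression, restricted to nameserver lines.

```python
def realm_from_fqdn(fqdn):
    """Upper-case the domain part of a fully qualified host name."""
    _host, _dot, domain = fqdn.partition(".")
    return domain.upper()


def forwarder_args(resolv_conf_text):
    """Turn each nameserver line into a --forwarder argument pair."""
    args = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if line.startswith("nameserver") and len(fields) >= 2:
            args.extend(["--forwarder", fields[1]])
    return args


if __name__ == "__main__":
    print(realm_from_fqdn("undercloud.ayoung-dell-t1700.test"))
    # Made-up resolver addresses from the documentation range.
    print(forwarder_args("search example.test\nnameserver 192.0.2.1\n"))
```

So the Kerberos realm is just the DNS domain in upper case, and every resolver already configured on the host becomes a DNS forwarder for the new IPA server.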

Merging FreeIPA and Tripleo Undercloud Apache installs

My Experiment yesterday left me with a broken IPA install. I aim to fix that.

To get to the start state:

From my laptop, kick off a Tripleo Quickstart, stopping prior to undercloud deployment:

./ --teardown all -t  untagged,provision,environment,undercloud-scripts  ayoung-dell-t1700.test

SSH in to the machine …

ssh -F /home/ayoung/.quickstart/ssh.config.ansible undercloud

and set up FreeIPA:

$ cat


sudo hostnamectl set-hostname --static undercloud.ayoung-dell-t1700.test
export address=`ip -4 addr  show eth0 primary | awk '/inet/ {sub ("/24" ,"" , $2) ; print $2}'`
echo $address `hostname` | sudo tee -a /etc/hosts
sudo yum -y install ipa-server-dns
export P=FreIPA4All
sudo ipa-server-install -U -r `hostname -d|tr "[a-z]" "[A-Z]"` -p $P -a $P --setup-dns `awk '/^name/ {print "--forwarder",$2}' /etc/resolv.conf`

Back up the HTTPD config directory:

 sudo cp -a /etc/httpd/ /root

Now go continue the undercloud install


Once that is done, the undercloud passes a sanity check. Doing a diff between the two directories shows a lot of differences.

sudo diff -r /root/httpd  /etc/httpd/

All of the files in /etc/httpd/conf.d that were placed by the IPA install are gone, as are the following module files in /root/httpd/conf.modules.d

Only in /root/httpd/conf.modules.d: 00-base.conf
Only in /root/httpd/conf.modules.d: 00-dav.conf
Only in /root/httpd/conf.modules.d: 00-lua.conf
Only in /root/httpd/conf.modules.d: 00-mpm.conf
Only in /root/httpd/conf.modules.d: 00-proxy.conf
Only in /root/httpd/conf.modules.d: 00-systemd.conf
Only in /root/httpd/conf.modules.d: 01-cgi.conf
Only in /root/httpd/conf.modules.d: 10-auth_gssapi.conf
Only in /root/httpd/conf.modules.d: 10-nss.conf
Only in /root/httpd/conf.modules.d: 10-wsgi.conf
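The "Only in" lines are what matter for the repair, and the same list can be produced programmatically. This is a small sketch using the standard library's `filecmp`; the function name is hypothetical and it only looks at one directory level, unlike `diff -r`.

```python
import filecmp


def only_in_backup(backup_dir, live_dir):
    """Names present in backup_dir but missing from live_dir (top level only)."""
    comparison = filecmp.dircmp(backup_dir, live_dir)
    return sorted(comparison.left_only)


if __name__ == "__main__":
    import sys
    # Usage: python only_in_backup.py /root/httpd/conf.modules.d /etc/httpd/conf.modules.d
    if len(sys.argv) == 3:
        for name in only_in_backup(sys.argv[1], sys.argv[2]):
            print(name)
```

Run against the two conf.modules.d directories, this should list the same files that diff reports as missing.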

To start, I am going to back up the existing HTTPD directory:

 sudo cp -a /etc/httpd/ /home/stack/

The rest of this is easier to do as root, as I want some globbing. First, I’ll copy over the module config files

 sudo su
 cp /root/httpd/conf.modules.d/* /etc/httpd/conf.modules.d/
 systemctl restart httpd.service

Test Keystone

 . ./stackrc 
 openstack token issue

Get a token…good to go…OK, let’s try the conf.d files.

sudo cp /root/httpd/conf.d/* /etc/httpd/conf.d/
systemctl restart httpd.service

Then as a non admin user

$ kinit admin
Password for admin@AYOUNG-DELL-T1700.TEST: 
[stack@undercloud ~]$ ipa user-find
1 user matched
  User login: admin
  Last name: Administrator
  Home directory: /home/admin
  Login shell: /bin/bash
  UID: 776400000
  GID: 776400000
  Account disabled: False
  Password: True
  Kerberos keys available: True
Number of entries returned 1

This is a fragile deployment, as updating either FreeIPA or the Undercloud has the potential to break one or the other…or both. But it is a start.

De-conflicting Swift-Proxy with FreeIPA

Port 8080 is a popular port. Tomcat uses it as the default port for unencrypted traffic. FreeIPA installs Dogtag, which runs in Tomcat. Swift proxy also chose that port number for its traffic. This means that if one is run on that port, the other cannot. Of the two, it is easier to change FreeIPA, as the port is only used for internal traffic, whereas Swift’s port is in the service catalog and the documentation.

Changing the port in FreeIPA requires modifications in both the config directories for Dogtag and the Python code that contacts it.

The Python changes are in


Look for any instance of 8080 and change them to another port that will not conflict. I chose 8181

The config changes for dogtag are in /etc/pki such as /etc/pki/pki-tomcat/ca/CS.cfg and again, change 8080 to 8181.
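When changing every instance of a port number by hand or with a blunt search and replace, it is easy to clobber a longer number that merely contains 8080. A cautious sketch of the substitution is below; the function name and the sample config lines are hypothetical, not real CS.cfg keys.

```python
import re


def replace_port(config_text, old_port=8080, new_port=8181):
    """Replace old_port with new_port only where it stands alone as a number.

    The lookarounds keep values such as 18080 or 80800 untouched.
    """
    pattern = re.compile(r"(?<!\d)%d(?!\d)" % old_port)
    return pattern.sub(str(new_port), config_text)


if __name__ == "__main__":
    # Made-up lines for illustration only.
    sample = "example.unsecure.port=8080\nexample.other=18080\n"
    print(replace_port(sample))
```

Reviewing the result with a diff before restarting anything is still a good idea.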

Restart the server with:

sudo systemctl restart ipa.service

To confirm run a command that hits the CA:

 ipa cert-find

I have a ticket in with FreeIPA to try and get support for this in.

With these changes made, I then tested installing the undercloud on the same node, and it seems to work.

However, the IPA server is no longer running. The undercloud install seems to have cleared out the ipa config files from under /etc/httpd/conf.d. However, Dogtag is still running as shown by

curl localhost:8181

Next experiment will be to see if I can preserve the IPA configuration

July 05, 2016

Launching a Centos VM in Tripleo Overcloud

My Overcloud deploy does not have any VM images associated with it. I want to test launching a VM.

Get the VM from Centos:

curl -O
unxz < CentOS-7-x86_64-GenericCloud.qcow2.xz >CentOS-7-x86_64-GenericCloud.qcow2
glance --os-image-api-version 2 image-create --name 'CentOS-7-x86_64-GenericCloud' --disk-format qcow2 --container-format bare --file CentOS-7-x86_64-GenericCloud.qcow2

Wait for that to finish, and check with

$ openstack image list
| ID                                   | Name                         | Status |
| 06841fb4-df1c-458d-898e-aea499342905 | CentOS-7-x86_64-GenericCloud | active |

Now launch it:

openstack server create --flavor m1.small --image CentOS-7-x86_64-GenericCloud testrun

And it becomes active pretty quickly:

$ openstack server list
| ID                                   | Name    | Status | Networks |
| 76585723-e2c3-4acb-88d5-837b69000f72 | testrun | ACTIVE |          |

It has no network capability. To Destroy:

openstack server delete 76585723-e2c3-4acb-88d5-837b69000f72

But I have work to do!
There’s a news story going around that talks about how horrible computer security tends to be in hospitals. This probably doesn’t surprise anyone who works in the security industry, security is often something that gets in the way, it’s not something that helps get work done.

There are two really important lessons we should take away from this. The first is that a doctor or nurse isn’t a security expert, doesn’t want to be a security expert, and shouldn’t be a security expert. Their job is helping sick people. We want them helping sick people, especially if we’re the people who are sick. The second is that when security gets in the way, security loses. Security should lose when it gets in the way, we’ve been winning far too often and it’s critically damaged the industry.

They don’t want to be security experts

It’s probably not a surprise that doctors and nurses don’t want to be computer security experts. I keep going back and forth between “you need some basics” and “assume nothing”. I’m back to the assume nothing camp this week. I think in the context of health care workers, security can’t exist, at least not the way we see it today. These are people and situations where seconds can literally be the difference between life and death. Will you feel better knowing the reason your grandma died was because they were using strong passwords? Probably not. In the context of a hospital, if there is any security it has to be totally transparent, the doctors shouldn’t have to know anything about it, and it should work 100% of the time. This is of course impossible.

So the real question isn’t how do we make security 100% reliable, the question is where do we draw our risk line. We want this line as far from perfect security and as close to saving lives as possible. If we start to think in this context it changes our requirements quite a lot. There will be a lot of “good enough security”. There will be a lot of hard choices to make and anyone who can make them will have to be extremely knowledgeable with both health care and security. I bet there aren’t a lot of people who can do this today.

This leads us to point #2

When security gets in the way, security loses

If you’re a security person, you see people do silly and crazy things all the time. Every day all day. How many times a day do you ask “why did you do that”? Probably zero. It’s more likely you say “don’t do that” constantly. If you have kids, the best way to get them to do something is to say “don’t do that”. If we think about security in the context of a hospital, the answer to “why did you do that” is pretty simple, it’s because the choice was probably between getting the job done and following the security guidelines. A hospital is one of the extremes where it’s easy to justify breaking the rules. If you don’t, people die. In most office settings if you break the rules, nobody dies, there will possibly be some sort of security event that will cost time and money. Historically speaking, in an office environment, we tell people “don’t do that” and expect them to listen, in many cases they pretend to listen.

This attitude of “listen to me because” has created a security universe where we don’t pay attention to what people are actually doing, we don’t have to. We get in the way, then when someone tries to get their work done, we yell at them for not following our bizarre and esoteric rules instead of understanding the challenge and how we can solve it together. The next great challenge we have isn't tighter rules, or better training, it's big picture. How can we start looking at systems with a big picture view? It won't be easy, but it's where we go next.

What do you think? Let me know: @joshbressers

July 01, 2016

Clearing the Keystone Environment

If you spend a lot of time switching between different clouds, different users, or even different projects for the same user when working with OpenStack, you’ve come across the problem where one environment variable from an old sourcing pollutes the current environment.  I’ve been hit by that enough times that I wrote a small script to clear the environment.

I call it clear_os_env

unset OS_TOKEN
unset OS_URL
unset OS_USER_ID
unset OS_USER_ID

Source this prior to sourcing any keystone.rc file, and you should have cleared out the old variables, regardless of how vigilant the new source file writer was in clearing old variables. This includes some old variables that should no longer be used, like OS_SERVICE_TOKEN.
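The same idea can be applied when launching OpenStack tools from a Python program instead of an interactive shell. The sketch below is a variation rather than a port of the script: it drops every variable with the OS_ prefix instead of naming each one, and the function name is hypothetical.

```python
import os


def without_openstack_vars(environ, prefix="OS_"):
    """Return a copy of environ with every variable starting with prefix removed."""
    return {name: value for name, value in environ.items()
            if not name.startswith(prefix)}


# A clean starting point for a child process; add the new cloud's
# variables to this dict and pass it as env= to subprocess.run().
clean_environment = without_openstack_vars(os.environ)
```

Note that a Python process cannot unset variables in the shell that started it, which is why the shell version has to be sourced.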

June 27, 2016

The future of security
The Red Hat Summit is happening this week in San Francisco. It's a big deal if you're part of the Red Hat universe, which I am. I'm giving the Red Hat security roadmap talk this year. The topic has me thinking about the future of security quite a lot. It's easy to think about this in the context of an organization like Red Hat, we have a lot of resources, and there are a lot of really interesting things happening. Everything from container security, to operating system security, to middleware security. My talk will end up on YouTube at some point, I'll link to it, but I also keep thinking about the bigger picture. Where will security be in the next 5, 10, 15 years?

Will ransomware still be a thing in ten years? Will bitcoin still be around? What about flash? How will open source adapt to all the changes? Will we even call them containers?

The better question here is "what do we want security to look like?"

If we look at some of the problems that always make the news, stolen personal information, password leaks, ransomware, hacking. These aren't new problems, most are almost as old as the Internet. The question is really, can we fix any of these problems? The answer might be "no". Some problems aren't fixable, crime is an example of this. When you have unfixable problems the goal is to control the problem, not prevent it.

How do we control security?

I think we're headed down this path today. It's still slow going and there are a lot of old habits that will die hard. Most decent security organizations aren't focused on pure prevention anymore, they understand that security is process and people, it's all about having nice policies and good staff. If you have those things you can start to work on controlling some aspects of what's happening. If you want users to behave you have to make it easy for them to do the right thing. If you don't want them opening email attachments, make it easy to not use email attachments.

There are still a lot of people who think it's enough to tell people not to do something, or yell at them if they behave in a way that is quite honestly expected. People don't like getting yelled at, they don't like having to go out of their way to do anything, they will always pick the option that is easiest.

Back to the point though. What will the future of security look like? I think the future of security is people. Technology is great, but all our fancy technology is to solve problems that are in the past. If we want to solve the problems of the future, we need good people to first understand those problems, then we can understand how to solve them. This is of course easier said than done, but sometimes just understanding the problem is.

Are you a people? Do you have ideas how to make things better? Tell me: @joshbressers

June 20, 2016

Decentralized Security
If you're a fan of the cryptocurrency projects, you've heard of something called Ethereum. It's similar to bitcoin, but is a separate coin. It's been in the news lately due to an attack on the currency. Nobody is sure how this story will end at this point, there are a few possible options, none are good. This got me thinking about the future of security, there are some parallels when you compare traditional currency to cryptocurrency as well as where we see security heading (stick with me here).

The current way currency works is there is some central organization that is responsible for minting and controlling the currency, usually a country. There are banks, exchanges, loans, interest, physical money, and countless other ways the currency interacts with society. We will compare this to how IT security has mostly worked in the past. You had one large organization responsible for everything. If something went wrong, you could rely on the owner to take control and make things better. There are some instances where this isn't true, but in general it holds.

Now if we look at cryptocurrency, there isn't really a single group or person in charge. That's the whole point though. The idea is to have nobody in charge so the currency can be used with some level of anonymity. You don't have to rely on some sort of central organization to give the currency legitimacy, the system itself has legitimacy built in.

This reminds me of the current state of shadow IT, BYOD, and cloud computing in general. The days of having one security group that was in charge of everything are long gone. Now we have distributed responsibility as well as distributed risk. It's up to each group to understand how they must interact with each other. The risk is shifted from one central organization to nearly everyone involved.

Modified risk isn't a bad thing, demonizing it isn't the point of this discussion. The actual point is that we now exist in an environment that's new to us. The history of humanity has taught us how to exist in an environment where there is a central authority. We now exist in a society that is seeing a shift from central authorities to individuals like never before. The problem with this is we don't know how to deal with or talk about such an environment. When we try to figure out what's happening with security we use analogies that don't work. We talk about banks (just like this post) or cars or doors or windows or boats.

The reality though is we don't really know what this means. We now exist in an environment where everything is becoming distributed, even security. The days of having a security group that rules with an iron fist are gone. If you have an iron fist, you end up with a massive shadow IT problem. In a world based on distributed responsibility the group with the iron fist becomes irrelevant.

The point of bringing up Ethereum wasn't to pick on its problems. It's to point out that we should watch them closely. Regardless of how this problem is solved there will be lessons learned. Success can be as good as a mistake if you understand what happened and why. The face of security is changing and a lot of us don't understand what's happening. There are no analogies that work here, we need new analogies and stories. Right now one of the easiest to understand stories around distributed security is cryptocurrency. Even if you're not bitcoin rich, you should be paying attention, there are lessons to be learned.

Keystone Auth Entry Points

OpenStack libraries now use Authentication plugins from the keystoneauth1 library. One of the plugins has disappeared: Kerberos. This used to be in the python-keystoneclient-kerberos package, but that is not shipped with Mitaka. What happened?

To list the posted entry points on a CentOS-based system, you can first look in the entry_points.txt file:

cat /usr/lib/python2.7/site-packages/keystoneauth1-2.4.1-py2.7.egg-info/entry_points.txt
v2token = keystoneauth1.loading._plugins.identity.v2:Token
admin_token = keystoneauth1.loading._plugins.admin_token:AdminToken
v3oidcauthcode = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAuthorizationCode
v2password = keystoneauth1.loading._plugins.identity.v2:Password
v3password = keystoneauth1.loading._plugins.identity.v3:Password
v3oidcpassword = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectPassword
token = keystoneauth1.loading._plugins.identity.generic:Token
v3token = keystoneauth1.loading._plugins.identity.v3:Token
password = keystoneauth1.loading._plugins.identity.generic:Password
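Since entry_points.txt is an INI-style file, it can also be read without importing the package at all. The following is a sketch for Python 3, under a few assumptions: the function name is hypothetical, the sample is a shortened copy of the listing above, and the `[keystoneauth1.plugin]` section header is assumed to be present in the real file even though it does not appear in the listing.

```python
import configparser

# Shortened sample; the section header is an assumption (see above).
SAMPLE_ENTRY_POINTS = """[keystoneauth1.plugin]
v2token = keystoneauth1.loading._plugins.identity.v2:Token
v3password = keystoneauth1.loading._plugins.identity.v3:Password
password = keystoneauth1.loading._plugins.identity.generic:Password
"""


def plugins_in(entry_points_text, group="keystoneauth1.plugin"):
    """Map plugin name to "module:Class" for one entry point group."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep plugin names exactly as written
    parser.read_string(entry_points_text)
    if not parser.has_section(group):
        return {}
    return dict(parser.items(group))


if __name__ == "__main__":
    for name, target in sorted(plugins_in(SAMPLE_ENTRY_POINTS).items()):
        print(name, "->", target)
```

This only reports what the installed metadata declares; it says nothing about whether the named class can actually be imported.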

But are there others?

Looking in the source repo: We can see a reference to Kerberos (as well as SAML, which has also gone missing), before the enumeration of the entry points we see above.

kerberos =
  requests-kerberos>=0.6:python_version=='2.7' or python_version=='2.6' # MIT
saml2 =
  lxml>=2.3 # BSD
oauth1 =
  oauthlib>=0.6 # BSD
betamax =
  betamax>=0.7.0 # Apache-2.0
  fixtures>=3.0.0 # Apache-2.0/BSD
  mock>=2.0 # BSD


keystoneauth1.plugin =
    password = keystoneauth1.loading._plugins.identity.generic:Password
    token = keystoneauth1.loading._plugins.identity.generic:Token
    admin_token = keystoneauth1.loading._plugins.admin_token:AdminToken
    v2password = keystoneauth1.loading._plugins.identity.v2:Password
    v2token = keystoneauth1.loading._plugins.identity.v2:Token
    v3password = keystoneauth1.loading._plugins.identity.v3:Password
    v3token = keystoneauth1.loading._plugins.identity.v3:Token
    v3oidcpassword = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectPassword
    v3oidcauthcode = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAuthorizationCode
    v3oidcaccesstoken = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAccessToken
    v3oauth1 = keystoneauth1.extras.oauth1._loading:V3OAuth1
    v3kerberos = keystoneauth1.extras.kerberos._loading:Kerberos
    v3totp = keystoneauth1.loading._plugins.identity.v3:TOTP

We see that the Kerberos plugin requires requests-kerberos>=0.6 so let’s get that installed via

sudo yum install python-requests-kerberos

And then try to enumerate the entry points via python

>>> import pkg_resources
>>> named_objects = {}
>>> for ep in pkg_resources.iter_entry_points(group='keystoneauth1.plugin'):
...     named_objects.update({ep.name: ep.load()})
>>> print (named_objects)
{'v2token': <class>, 'token': <class>, 'admin_token': <class>, 'v3oidcauthcode': <class>, 'v3token': <class>, 'v2password': <class>, 'password': <class>, 'v3password': <class>, 'v3oidcpassword': <class>}

We still don’t have the Kerberos plugin. Going back to the setup.cfg file, we see the Python class for the Kerberos plugin is not listed. Kerberos is implemented here in the source tree. Does that exist in our package managed file system?

$ rpm --query --list python2-keystoneauth1-2.4.1-1.el7.noarch | grep$

Yes. It does. Can we load that by class?

>>> from keystoneauth1.extras import kerberos
>>> print kerberos

Yes, although the RPM version is a little earlier than the git repo. So what is the entry point name? There is not one, yet. The only way to get the class is by the full class name.

We’ll fix this, but the tools for enumerating the entrypoints are something I’ve used often enough that I want to get them documented.
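For reference, here is a minimal sketch of the same enumeration on Python 3, using the standard library's importlib.metadata rather than pkg_resources. The helper name entry_point_map is my own, not part of any library. It reports each entry point's target as a "module:attribute" string without importing it, so listing does not depend on the plugin's own imports succeeding.

```python
from importlib import metadata


def entry_point_map(group):
    """Map entry point names to their 'module:attr' targets for one group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer Pythons return an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Older Pythons return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    for name, target in sorted(entry_point_map("keystoneauth1.plugin").items()):
        print("%s = %s" % (name, target))
```

If a name is missing from this listing, the installed package's metadata does not declare it, which is the situation described above for the Kerberos plugin.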

June 17, 2016

The difference between auth_uri and auth_url in auth_token

Dramatis Personae:

Adam Young, Jamie Lennox: Keystone core.

Scene: #openstack-keystone chat room.

ayoung: I still don’t understand the difference between url and uri
jamielennox: auth_uri ends up in “WWW-Authenticate: Keystone uri=%s” header. that’s its only job
ayoung: and what is that meant to do? tell someone where they need to go to authenticate?
jamielennox: yea, it gets added to all 401 responses and then i’m pretty sure everyone ignores it
ayoung: so they should be the same thing, then, right? I mean, we say that the Keystone server that you authenticate against is the one that nova is going to use to validate the token. and the version should match
jamielennox: depends, most people use an internal URL for auth_url but auth_uri would get exposed to the public
ayoung: ah
jamielennox: there should be no version in auth_uri
ayoung: so auth_uri=main auth_url=admin in v2.0 speak
jamielennox: yea. more or less. ideally we could default it way better than that, like auth.get_endpoint('identity', interface='public'), but that gets funny
ayoung: This should be a blog post. You want to write it or shall I? I’m basically just going to edit this conversation.
jamielennox: mm, blog, i haven’t written one of those for a while
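
To make the distinction concrete, here is a small sketch of what the conversation describes. It is not the auth_token middleware code. The function name reject_headers and the example hostnames are hypothetical, and the header format is copied from the chat above.

```python
def reject_headers(auth_uri):
    """Build the header that gets added to 401 responses.

    Only auth_uri is used here; auth_url never leaves the service.
    """
    return [("WWW-Authenticate", "Keystone uri=%s" % auth_uri)]


# Hypothetical deployment values, for illustration only:
# auth_uri is the public, unversioned endpoint shown to clients,
# auth_url is the internal endpoint the service itself authenticates against.
settings = {
    "auth_uri": "https://keystone.example.com:5000",
    "auth_url": "http://keystone.internal.example.com:35357",
}

if __name__ == "__main__":
    print(reject_headers(settings["auth_uri"]))
```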


June 16, 2016

Learning about the Overcloud Deploy Process

The process of deploying the overcloud goes through several technologies. Here’s what I’ve learned about tracing it.

I am not a Heat or Tripleo developer. I’ve just started working with Tripleo, and I’m trying to understand this based on what I can gather, and the documentation out there. And also from the little bit of experience I’ve had working with Tripleo. Anything I say here might be wrong. If someone that knows better can point out my errors, please do so.

[UPDATE]: Steve Hardy has corrected many points, and his comments have been noted inline.

To kick the whole thing off in the simplest case, you would run the command openstack overcloud deploy.

Roughly speaking, here is the sequence (as best as I can tell)

  1. User types openstack overcloud deploy on the command line
  2. This calls up the common cli, which parses the command, and matches the tripleo client with the overcloud deploy subcommand.
  3. tripleo client is a thin wrapper around the Heat client, and calls the equivalent of heat stack-create overcloud
  4. python-heatclient (after Keystone token stuff) calls the Heat API server with the URL and data to do a stack create
  5. Heat makes the appropriate calls to Nova (running the Ironic driver) to activate a baremetal node and deploy the appropriate instance on it.
  6. Before the node is up and running, Heat has posted Hiera data to the metadata server.
  7. The newly provisioned machine will run cloud-init which in turn runs os-collect-config.
    [update] Steve Hardy’s response:

    This isn’t strictly accurate – cloud-init is used to deliver some data that os-collect-config consumes (via the heat-local collector), but cloud-init isn’t involved with actually running os-collect-config (it’s just configured to start in the image).

  8. os-collect-config will start polling for changes to the metadata.
  9. [update] os-collect-config will start calling os-refresh-config, which then invokes a script that runs puppet. (This step originally said that os-collect-config calls Puppet Apply based on the hiera data.)
    Steve’s note:

    os-collect-config never runs puppet, it runs os-refresh-config only, which then invokes a script that runs puppet.

  10. The Keystone Puppet module will set values in the Keystone config file, httpd/conf.d files, and perform other configuration work.

Here is a diagram of how os-collect-config is designed

When a controller image is built for Tripleo, some portion of the Hiera data is stored in /etc/puppet/. There is a file /etc/puppet/hiera.yaml (which looks a lot like /etc/hiera.yaml, an RPM controlled file) and sub files in /etc/puppet/hieradata such as

UPDATE: Response from Steve Hardy

This is kind-of correct – we wait for the server to become ACTIVE, which means the OS::Nova::Server resource is declared CREATE_COMPLETE. Then we do some network configuration, and *then* we post the hieradata via a heat software deployment.

So, we post the hieradata to the heat metadata API only after the node is up and running, and has its network configured (not before).

Note the depends_on – we use that to control the ordering of configuration performed via heat.

However, the dynamic data seems to be stored in /var/lib/os-collect-config/

$ ls -la  /var/lib/os-collect-config/*json
-rw-------. 1 root root   2929 Jun 16 02:55 /var/lib/os-collect-config/ControllerAllNodesDeployment.json
-rw-------. 1 root root    187 Jun 16 02:55 /var/lib/os-collect-config/ControllerBootstrapNodeDeployment.json
-rw-------. 1 root root   1608 Jun 16 02:55 /var/lib/os-collect-config/ControllerCephDeployment.json
-rw-------. 1 root root    435 Jun 16 02:55 /var/lib/os-collect-config/ControllerClusterDeployment.json
-rw-------. 1 root root  36481 Jun 16 02:55 /var/lib/os-collect-config/ControllerDeployment.json
-rw-------. 1 root root    242 Jun 16 02:55 /var/lib/os-collect-config/ControllerSwiftDeployment.json
-rw-------. 1 root root   1071 Jun 16 02:55 /var/lib/os-collect-config/ec2.json
-rw-------. 1 root root    388 Jun 15 18:38 /var/lib/os-collect-config/heat_local.json
-rw-------. 1 root root   1325 Jun 16 02:55 /var/lib/os-collect-config/NetworkDeployment.json
-rw-------. 1 root root    557 Jun 15 19:56 /var/lib/os-collect-config/os_config_files.json
-rw-------. 1 root root 263313 Jun 16 02:55 /var/lib/os-collect-config/request.json
-rw-------. 1 root root   1187 Jun 16 02:55 /var/lib/os-collect-config/VipDeployment.json

For each of these files there are two older copies that end in .last and .orig as well.

In my previous post, I wrote about setting Keystone configuration options such as ‘identity/domain_specific_drivers_enabled’: value => ‘True’;. I can see this value set in /var/lib/os-collect-config/request.json with a large block keyed “config”.
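
Since request.json is large and deeply nested, a generic search helps to find where a setting lives. The following is a sketch that assumes nothing about the file's schema. The helper name find_paths is my own. It walks any JSON document and reports every dict key or string value that contains a given substring.

```python
import json


def find_paths(node, needle, path=""):
    """Yield (path, value) for each dict key or string leaf containing needle."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = "%s/%s" % (path, key)
            if needle in key:
                yield child, value
            for hit in find_paths(value, needle, child):
                yield hit
    elif isinstance(node, list):
        for index, value in enumerate(node):
            for hit in find_paths(value, needle, "%s[%d]" % (path, index)):
                yield hit
    elif isinstance(node, str) and needle in node:
        yield path, node


if __name__ == "__main__":
    # The file is readable by root only, so run this with sudo.
    with open("/var/lib/os-collect-config/request.json") as handle:
        document = json.load(handle)
    for where, _value in find_paths(document, "domain_specific_drivers_enabled"):
        print(where)
```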

When I ran the openstack overcloud deploy, one way that I was able to track what was happening on the node was to tail the journal like this:

 sudo journalctl -f | grep collect-config

Looking through the journal output, I can see the line that triggered the change:

... /Stage[main]/Main/Keystone_config[identity/domain_specific_drivers_enabled]/ensure: ...

June 15, 2016

Custom Overcloud Deploys

I’ve been using Tripleo Quickstart.  I need custom deploys. Start with modifying the heat templates. I’m doing a mitaka deploy.

git clone
cd tripleo-heat-templates/
git branch --track mitaka origin/stable/mitaka
git checkout mitaka
diff -r  /usr/share/openstack-tripleo-heat-templates/ tripleo-heat-templates/

Mine shows some differences, but in the file extraconfig/tasks/liberty_to_mitaka_aodh_upgrade_2.pp which should be OK. The commit is

Add redis constraint to aodh upgrade manifest

Modify the launch script in /home/stack

$ diff
< openstack overcloud deploy --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 60 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server \
> openstack overcloud deploy --templates  /home/stack/tripleo-heat-templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 60 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server \

The only change should be from

--templates  #(followed by another flag which means that --templates takes the default) 

to

--templates /home/stack/tripleo-heat-templates 

OK…let’s make sure we still have a stable system. First, tear down the overcloud deliberately:

[stack@undercloud ~]$ . ./stackrc 
[stack@undercloud ~]$ heat stack-delete overcloud
Are you sure you want to delete this stack(s) [y/N]? y
| id                                   | stack_name | stack_status    | creation_time       | updated_time |
| 00d81e5b-c2f9-4f6a-81e8-b135fadba921 | overcloud  | CREATE_COMPLETE | 2016-06-15T18:01:25 | None         |

Wait until the delete is complete with

$ watch heat stack-list

Wait until it changes from

| id                                   | stack_name | stack_status       | creation_time       | updated_time |
| 00d81e5b-c2f9-4f6a-81e8-b135fadba921 | overcloud  | DELETE_IN_PROGRESS | 2016-06-15T18:01:25 | None         |

to

| id | stack_name | stack_status | creation_time | updated_time |
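
If you would rather not sit and watch, the wait can be scripted. This is a rough sketch, not part of Tripleo. The function names and the default interval and timeout are my own choices, and the check is a plain substring match on the heat stack-list output, so it would be fooled by another stack whose name contains "overcloud".

```python
import subprocess
import time


def stack_listed(name):
    """Return True while `heat stack-list` output still mentions the stack."""
    output = subprocess.check_output(["heat", "stack-list"])
    return name in output.decode("utf-8", "replace")


def wait_for_delete(name, is_listed=stack_listed, interval=5.0, timeout=1800.0):
    """Poll until the stack disappears; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while is_listed(name):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


if __name__ == "__main__":
    # Assumes stackrc has already been sourced in this shell.
    print("deleted" if wait_for_delete("overcloud") else "timed out")
```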

And now run the modified overcloud deploy:


End of the output looks like this

Stack overcloud CREATE_COMPLETE
/home/stack/.ssh/known_hosts updated.
Original contents retained as /home/stack/.ssh/known_hosts.old
PKI initialization in init-keystone is deprecated and will be removed.
Warning: Permanently added '' (ECDSA) to the list of known hosts.
The following cert files already exist, use --rebuild to remove the existing files before regenerating:
/etc/keystone/ssl/certs/ca.pem already exists
/etc/keystone/ssl/private/signing_key.pem already exists
/etc/keystone/ssl/certs/signing_cert.pem already exists
Connection to closed.
Skipping "horizon" postconfig because it wasn't found in the endpoint map output
Overcloud Endpoint:
Overcloud Deployed
+ heat stack-list
+ exit 0

Don’t be fooled by the last line grep -q CREATE_FAILED as that is the shell script execution logging, not a statement of failure.

OK, to do a proper “Hello, World” here I’d really like to be able to effect change on the deployment. I’m going to try and set a couple of Keystone config values that are not set (yet) in /etc/keystone/keystone.conf.

In my undercloud git repo for tripleo-heat-templates I make changes to the Overcloud post config.

$ git diff
diff --git a/puppet/manifests/overcloud_controller.pp b/puppet/manifests/overcloud_controller.pp
index c353ec0..c6385d4 100644
--- a/puppet/manifests/overcloud_controller.pp
+++ b/puppet/manifests/overcloud_controller.pp
@@ -223,6 +223,11 @@ if hiera('step') >= 3 {
   #TODO: need a solution here
+  keystone_config {  
+   'identity/domain_specific_drivers_enabled': value => 'True';  
+   'identity/domain_config_dir': value => '/etc/keystone/domains';  
+  }  
   file { [ '/etc/keystone/ssl', '/etc/keystone/ssl/certs', '/etc/keystone/ssl/private' ]:
     ensure  => 'directory',
     owner   => 'keystone',

And rerun


Once it has successfully deployed, I can check to see if the change shows up in the keystone.conf file.

$ . ./stackrc 
[stack@undercloud ~]$ openstack server list
| ID                                   | Name                    | Status | Networks            |
| 761a1b61-8bd1-4b85-912b-775e51ad99f3 | overcloud-controller-0  | ACTIVE | ctlplane= |
| f123da36-9b05-4fc3-84bb-4af147fa76f7 | overcloud-novacompute-0 | ACTIVE | ctlplane= |
[stack@undercloud ~]$ ssh heat-admin@
$ sudo grep domain_specific /etc/keystone/keystone.conf
#domain_specific_drivers_enabled = false
domain_specific_drivers_enabled = True
# if domain_specific_drivers_enabled is set to true. (string value)
[heat-admin@overcloud-controller-0 ~]$ sudo grep domain_config_dir /etc/keystone/keystone.conf
#domain_config_dir = /etc/keystone/domains
domain_config_dir = /etc/keystone/domains

Changes applied.
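
The same check can be done without grep. Below is a sketch using Python's configparser. The expected values are the two set in the manifest above, while the function name mismatches and the inline SAMPLE text are mine. On a controller you would read /etc/keystone/keystone.conf as root instead of using the sample.

```python
import configparser

EXPECTED = {
    ("identity", "domain_specific_drivers_enabled"): "True",
    ("identity", "domain_config_dir"): "/etc/keystone/domains",
}

# Stand-in for the relevant part of keystone.conf, based on the grep output.
SAMPLE = """\
[identity]
#domain_specific_drivers_enabled = false
domain_specific_drivers_enabled = True
#domain_config_dir = /etc/keystone/domains
domain_config_dir = /etc/keystone/domains
"""


def mismatches(conf_text, expected=EXPECTED):
    """Return {(section, option): actual} for every setting that differs."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    wrong = {}
    for (section, option), want in expected.items():
        # Commented-out lines are ignored, so an unset option comes back as None.
        got = parser.get(section, option, fallback=None)
        if got != want:
            wrong[(section, option)] = got
    return wrong


if __name__ == "__main__":
    print(mismatches(SAMPLE) or "Changes applied.")
```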

June 13, 2016

Ready to form Voltron! Why security is like a giant robot made of lions
Due to various conversations about security this week, Voltron came up in the context of security. This is sort of a strange topic, but it makes sense when we ponder modern day security. If you talk to anyone, there is generally one thing they push as a solution for a problem. This is no different for security technologies. There is always one thing that will fix your problems. In reality this is never the case. Good security is about putting a number of technologies together to create something bigger and better than any one thing can do by itself.

For those of you who don't know what Voltron is, Voltron was a cartoon when I was a kid. There were 5 robot lions that sometime during every show would combine together to create one big robot called Voltron. By themselves the lions were pretty awesome, but it always seemed the bad guy would keep getting stronger until the lions couldn't deal with it alone. Only by coming together to form a giant robot of pure awesome could they destroy whatever horrible creature was causing problems.

This sounds just like security. Just a firewall will eventually be beaten by your adversaries. Just code reviews won't keep things safe for long (if at all). Just using ASLR is only good for a little while. When we start putting everything together though, things get good.

Some people get this; they know that there isn't one thing that's going to fix it all. A lot don't, though. It's very common to attend a talk about a new security feature or product. If you talk to a vendor, without a doubt whatever they're doing will cure what ails you. How often does anyone talk about how their product, feature, or idea will fit in the big picture? How can two or more things work together to add security? It's pretty uncommon to see anyone talking about how well things work together. It's human nature though. We can usually only do one thing, and why wouldn't you be proud of what you're working on? You want to talk about what you do and what you know.

I'm often guilty of this too. When talking about something like containers I'll focus on selinux, or updates, or trusted content, or seccomp. Rarely is the whole story told. Part of this may be because security technology is usually really complex, it's hard to hold a good view of it all in your head at once. The thing is though, none of those are overly useful by themselves. They're all good and do great things, but it's not until you put everything together that you can see a real difference.

This all makes sense when you think about it. Layers of defense are almost always more effective than a single layer (I know there is a lot of nuance to this, but in general, let's not nitpick). Would you want to rely on only seccomp, or would you rather have seccomp, cgroups, selinux, user namespaces, trusted content, content scanning, and ExecShield? It's a no brainer when you think about it.

How can we start to think about things as a giant evil-fighting robot instead of small (but still awesome) lions? It's never easy, and it's even harder when you have to expect different groups to share attention and recognition. It's going to be more important in the future though. If we don't take better looks at how things work together, it's going to be a lot harder to see real improvements.

What do you think? Let me know: @joshbressers

June 08, 2016

SAML Federated Auth Plugin

SAML is usually thought of as a WebSSO mechanism, but it can be made to work for command line operations if you use the Extended Client Protocol (ECP). When we did the Rippowam demo last year, we were successful in getting an Unscoped token by using ECP, but that was not sufficient to perform operations on other services that need a scoped token.

The general approach that we are looking at for Keystone is to always have the user ask for an unscoped token first, and then upgrade that to a scoped token. The scoping process can only be done from unscoped to scoped (configuration option) to prevent elevation of privilege attacks.

The base federation plugin is capable of handling this kind of workflow. Thus, the general approach is to write a protocol specific plugin to get an unscoped token, and then to use common logic in the base class v3.FederatedBaseAuth to convert unscoped to scoped.
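
As a toy illustration of that split, and not the keystoneclient code itself, the pattern looks roughly like this. Every name below is invented for the sketch, and tokens are modeled as plain dicts.

```python
class FederatedBase(object):
    """Stand-in for the common logic: unscoped first, then scoped."""

    def __init__(self, project):
        self.project = project

    def get_unscoped_token(self):
        # Each protocol-specific plugin supplies this step.
        raise NotImplementedError

    def get_scoped_token(self):
        unscoped = self.get_unscoped_token()
        if unscoped.get("scope") is not None:
            # Only unscoped-to-scoped is allowed, never rescoping.
            raise ValueError("refusing to rescope an already scoped token")
        return {"user": unscoped["user"], "scope": self.project}


class FakeSamlPlugin(FederatedBase):
    """Pretend protocol plugin; a real one would talk ECP to the IdP."""

    def __init__(self, project, username):
        super(FakeSamlPlugin, self).__init__(project)
        self.username = username

    def get_unscoped_token(self):
        return {"user": self.username, "scope": None}


if __name__ == "__main__":
    print(FakeSamlPlugin("demo", "ayoung").get_scoped_token())
```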

I just got [edit: used to say keystone] openstack flavor list to work with ECP and Keycloak. I had to create a new auth plugin to do it:

Created a new entry point in


v3fedsaml = keystoneclient.contrib.auth.v3.saml2:FederatedSAML2

Added this to

class FederatedSAML2(v3.FederatedBaseAuth):
    """Authenticate using SAML via the keystone federation mechanisms.

       Wraps both the unscoped SAML2 Plugin to
       1.  Request an unscoped token
       2.  Use the unscoped token to request a scoped token
    """

    @classmethod
    def get_options(cls):
        options = super(FederatedSAML2, cls).get_options()
        options.extend([
            cfg.StrOpt('identity-provider', help="Identity Provider's name"),
            cfg.StrOpt('protocol', help="SAML2"),
            # Option name assumed from its help text and from the
            # Saml2UnscopedToken arguments below.
            cfg.StrOpt('identity-provider-url',
                       help="Identity Provider's URL"),
            cfg.StrOpt('user-name', dest='username', help='Username'),
            cfg.StrOpt('password', help='Password')
        ])
        return options

    def __init__(self, auth_url,
                 identity_provider, protocol, identity_provider_url,
                 username, password,
                 **kwargs):
        #protocol = kwargs.pop('protocol')
        super(FederatedSAML2, self).__init__(auth_url, identity_provider, protocol,
                                             **kwargs)
        # Delegate the ECP exchange to the existing unscoped plugin.
        self._unscoped = Saml2UnscopedToken(auth_url,
                                            identity_provider,
                                            identity_provider_url,
                                            username, password,
                                            **kwargs)

    def get_unscoped_auth_ref(self, session, **kwargs):
        return self._unscoped.get_auth_ref(session, **kwargs)

Updated my keystone RC file:

export OS_AUTH_TYPE=v3fedsaml

This is based on RH OSP8, which is the Liberty release. In later releases of OSP, the client libraries are synchronized with later versions, including the gradual replacement of the Auth plugins housed in python-keystoneclient with keystoneauth. Thus, there will be a couple of variations on this plugin, including one that may have to live out of tree if we want it for OSP8.

June 07, 2016

IoT Technology: Devices

Discussions of IoT often focus on the technology, so let’s start there. IoT consists of devices, which are the “things” that interact with the physical world and communicate with IoT Back-end systems over a network. There are two types of IoT devices: sensors and actuators.

An IoT system will typically be made of many devices – from dozens to millions – talking to a scalable Back-end system. This Back-end system often runs in the Cloud. In some cases the IoT devices will talk directly to the Back-end systems. In other cases an additional system called an IoT Gateway will be placed between the devices and the Back-end systems. The IoT Gateway will typically talk to multiple local IoT devices, perform communications protocol conversions, perform local processing, and connect to the Back-end systems over an Ethernet, WiFi, or cellular modem link.

IoT Devices

IoT devices consist of sensors, actuators, and communications. Sensors, as the name implies, read information from the physical world. Examples would be temperature, humidity, barometric pressure, light, weight, CO2, motion, location, pH level, chemical concentration for many chemicals, distance, voltage, current, images, etc. There are sensors available for an incredible range of information and many more under development. Think of things like a tiny DNA sequencer or a sensor that can detect the presence of the bacteria or virus associated with various diseases – both of these are under development!

Actuators are able to change something in the physical world. Examples would be a light switch, a remotely operated valve, a remotely controlled door lock, a stepper motor, a 3D printer, or the steering, brakes and throttle for a self driving car.

IoT Device Examples

For an idea of the range of low cost IoT compatible sensors take a look at Spark Fun Electronics, a leading source of IoT device technology for prototyping, development, and hobbyists. Its sensor section lists over 200 sensors that can be used with Arduino and similar systems. Note that these are basically development and prototyping units – prices in production quantities will be lower.

Some sensors are familiar – temperature is perhaps the most obvious example. But many are more interesting. Consider, for example, the gas sensors: hydrogen, methane, lpg, alcohol, carbon monoxide; all available at prices of $4.95 – $7.95. Combine one of these with an Arduino Pro Mini available for $9.95, and you can build a targeted chemical sensor for less than $20.00.

What can you do with a $20.00 lpg or carbon monoxide sensor? That is the wrong question. Instead, you should be asking the question “what problems am I facing that could be addressed with a low cost network connected sensor?” The point is that there is an incredible and growing array of inexpensive sensors available. The technology is available – what we need now is the imagination to begin to creatively use ubiquitous sensors, ubiquitous networking, ubiquitous computing, and ubiquitous data.

The application of modern electronics technology to sensors is just beginning to be felt. As in many other areas of IoT, the basic capabilities have been around for years – detecting and measuring the concentration of lpg vapor or carbon monoxide isn’t new. Detecting lpg vapor concentration with a sub $20 networked device that feeds the data directly into a large distributed computing system in a form that is readily manipulated by software is new. And huge!

Lpg and carbon monoxide are just examples. The same technologies are producing sensors for a wide range of chemicals and gasses.

The combination of useful capabilities, low cost, network connection, and integration into complex software applications is a complete revolution. And this revolution is just beginning. What happens to agriculture when we can do a complete soil analysis for each field? What happens if we have nutrient, moisture, light, and temperature information for each ten foot square in a field, updated every 15 minutes over the entire growing season? What happens when we have this information for a 20 year period? What happens when this information is dynamically combined with plant growth monitoring, standard plant growth profiles, weather forecasts and climatic trends?

Going further, what if this data is combined with an active growth management system where application of fertilizer, pesticide, and water is optimized for individual micro-areas within a field? Technology is progressing to the point where we can provide the equivalent of hands-on gardening to commercial fields.

As an example of work going on in this area see the Industrial Internet Consortium Testbed on Precision Crop Management at