August 22, 2016

The cost of mentoring, or why we need heroes
Earlier this week I had a chat with David A. Wheeler. The conversation was fascinating and covered many things, but the topic of mentoring really got me thinking. David pointed out that nobody will mentor if they're not getting paid. My first thought was that it can't be true! But upon reflection, I'm pretty sure it is.

I can't think of anyone I mentored where a paycheck wasn't involved. There are people in the community I've given advice to, sometimes for an extended period of time, but I would hesitate to claim I was a mentor. That said, I think equating this strictly to a paycheck would be inaccurate. There are plenty of mentors in other organizations who aren't getting a paycheck, but I would say they're getting paid in some sense of the word. If you're working with at-risk youth, for example, you may not get paid money, but you do get the satisfaction of knowing you're making a difference in someone's life. If you mentor kids as part of a sports team, you're doing it because you're getting value out of the relationship. If you're not getting value, you're going to quit.

So this brings me to the idea of mentoring in the community.

The whole conversation started because of some talk of mentoring on Twitter, but now I suspect this wouldn't work quite like we think. The basic idea is that you have new, young people looking for someone to help them cut their teeth. Some of these relationships could work out, but probably only when you're talking about a really gifted newcomer and a very patient mentor. If you've ever helped a new person, you know how annoying they can become, especially when they reach the peak of the Dunning-Kruger curve. If I don't have a great reason to stick around, I'm almost certainly going to bail out. So the real question is: can a mentoring program like this work? Will it ever be possible to have a collection of community mentors helping a collection of new people?

Let's assume the answer is no. I think the current evidence somewhat backs this up: there aren't a lot of young people getting into things like security and open source in general. We all like to think we got where we are through brilliance and hard work, but we all probably had someone who helped us out. I can't speak for everyone, but I also had some security heroes back in the day: groups and people like the L0pht, Cult of the Dead Cow, Legion of Doom, 2600, Mitnick, as well as a handful of local people. Who are the new heroes?

Do it for the heroes!

We may never have security heroes like we did. It's become a proper industry, and I don't think many mature industries have new and exciting heroes. We know who Chuck Yeager is, but I bet nobody could name five test pilots today. That's OK, though. You know what happens when there is a solid body of knowledge that needs to be passed from the old to the young? You go to a university. That's right: our future rests with the universities.

Of course, it's easy to say this is the future; making it happen will be a whole different story. I don't have any idea where we start, but I imagine people like David Wheeler have ideas. All I do know is that if nothing changes, we're not going to like what happens.

Also, if you're part of an open source project, get your badge.

If you have thoughts or ideas, let me know: @joshbressers

August 16, 2016

Running Unit Tests on Old Versions of Keystone

Just because Icehouse is EOL does not mean no one is running it. One part of my job is backporting patches to older versions of Keystone that my company supports.

A dirty secret is that we only package the code needed for the live deployment, not the unit tests. In this case, I needed to test a bug fix against a version of Keystone that was, essentially, upstream Icehouse.

Running the unit tests with Tox had some problems, mainly because recent oslo components are not compatible that far back.

Here is what I did:

  • Cloned the keystone repo
  • Applied the patch to test
  • Ran tox -r -epy27 to generate the virtual environment. Note that the tests fail.
  • . .tox/py27/bin/activate
  • python -m unittest keystone.tests.test_v3_identity.IdentityTestCase
  • See that the test fails due to:
    • AttributeError: 'module' object has no attribute 'tests'
  • run python to get an interactive interpreter
    • import keystone.tests.test_v3_identity
    • Get the error below:
ImportError: No module named utils
>>> import oslo-utils
File "<stdin>", line 1
import oslo-utils
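When the interpreter reports an ImportError like this, it helps to see which of the candidate module names the virtual environment can actually import. The snippet below is a minimal sketch that runs under both Python 2.7 (as in the py27 venv) and Python 3. `probe_imports` is a hypothetical helper. The module names in the `__main__` block are assumptions: `oslo.utils` is the older namespace-package layout and `oslo_utils` is the newer flat one.

```python
from __future__ import print_function

import importlib


def probe_imports(names):
    """Return a dict mapping each module name to True if it can be imported."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:  # also catches ModuleNotFoundError on Python 3
            results[name] = False
    return results


if __name__ == "__main__":
    # Assumed candidates: old namespace layout vs. newer flat package name.
    for name, ok in sorted(probe_imports(["oslo.utils", "oslo_utils"]).items()):
        print(name, "importable" if ok else "missing")
```

Run it inside the activated venv. If only one of the two layouts is importable, that points at the kind of version mismatch the steps below work around.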

To deal with this:

  • Clone the oslo-utils repo
    • git clone
  • Check out the tag that is closest to what I think we need. A little trial and error showed I wanted kilo-eol.
    • git checkout kilo-eol
  • Build and install in the venv (note that the venv is still activated)
    • cd oslo.utils/
    • python install

Try running the tests again. The same process shows that something is mismatched with oslo.serialization. Clone, check out, and build it as well; again, the tag is kilo-eol.

Now the unit test runs, and shows:

Traceback (most recent call last):
  File "keystone/tests/", line 835, in test_delete_user_and_check_role_assignment_fails
    member_url, user = self._create_new_user_and_assign_role_on_project()
  File "keystone/tests/", line 807, in _create_new_user_and_assign_role_on_project
    user_ref = self.identity_api.create_user(new_user)
  File "keystone/", line 74, in wrapper
    result = f(*args, **kwargs)
  File "keystone/identity/", line 189, in wrapper
    return f(self, *args, **kwargs)
TypeError: create_user() takes exactly 3 arguments (2 given)

Other unit tests run successfully. I’m back in business.

RBAC Policy Updates in Tripleo

Policy files contain the access control rules for an OpenStack deployment. The upstream policy files are conservative and restrictive; they are designed to be customized on the end user's system. However, poorly written changes can break security, so their deployment should be carefully managed and monitored.

Since RBAC policy controls access to the Keystone server, the Keystone policy files themselves are not served from a database in the Keystone server. They are, instead, configuration files managed via the deployment’s content management system. In a Tripleo-based deployment, none of the other services use the policy storage in Keystone, either.

In Tripleo, the deployment of the overcloud is managed via Heat. The OpenStack Tripleo Heat templates support deploying files at the end of the install, which matches how we need to deploy policy.


  1. Create a directory structure that mimics the policy file layout in the overcloud. For this example, I will limit it to just Keystone. Create a directory called policy (making this a git repository is reasonable) and under it create etc/keystone.
  2. Inside that directory, copy either the default policy.json file or the policy.v3cloudsample.json file, naming it policy.json, with:
    1. keystone:keystone as the owner
    2. rw-r----- (0640) as the permissions
  3. Modify the policy files to reflect organizational rules
  4. Use the offline tool to check policy access control.  Confirm that the policy behaves as desired.
  5. Create a tarball of the files:
    1. cd policy
    2. tar -zcf openstack-policy.tar.gz etc
  6. Use the script to upload the tarball to the undercloud Swift:
    1. . ./stackrc; ./upload-swift-artifacts openstack-policy.tar.gz
  7. Confirm the upload with swift list -l overcloud
    1. 1298 2016-08-04 16:34:22 application/x-tar openstack-policy.tar.gz
  8. Redeploy the overcloud
  9. Confirm that the policy file contains the modifications made in development
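Steps 2 and 5 can be scripted so that the tarball always carries the intended ownership and mode. The sketch below is an illustration built on assumptions, not part of the Tripleo tooling. `build_policy_tarball` is a hypothetical helper. It checks that every .json file under policy/etc parses, then packs etc/ so that regular files are recorded as keystone:keystone with mode 0640. JSON parsing only catches syntax errors; the offline policy check in step 4 is still needed. Whether the owner names are honored on extraction depends on how the overcloud unpacks the artifact (GNU tar running as root normally maps owner names that exist on the system).

```python
import json
import os
import tarfile


def build_policy_tarball(policy_root, output_path):
    """Validate JSON policy files under policy_root/etc, then pack etc/ into a .tar.gz."""
    etc_dir = os.path.join(policy_root, "etc")
    for dirpath, _dirs, files in os.walk(etc_dir):
        for name in files:
            if name.endswith(".json"):
                with open(os.path.join(dirpath, name)) as f:
                    json.load(f)  # raises ValueError on a JSON syntax error

    def set_owner(tarinfo):
        # Assumption: every regular file in this tree belongs to Keystone.
        # Directories keep their default owner and mode so they stay traversable.
        if tarinfo.isfile():
            tarinfo.uname = tarinfo.gname = "keystone"
            tarinfo.mode = 0o640  # rw-r-----
        return tarinfo

    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(etc_dir, arcname="etc", filter=set_owner)
```

Usage, from the directory that contains policy/: `build_policy_tarball("policy", "openstack-policy.tar.gz")`.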

Diagnosing Tripleo Failures Redux

Steven Hardy has provided an invaluable reference with his troubleshooting blog post. However, I recently had a problem that didn’t quite match what he was showing. Zane Bitter got me oriented.

Upon a redeploy, I got a failure.

$ openstack stack list
| ID                                   | Stack Name | Stack Status  | Creation Time       | Updated Time        |
| 816c67ab-d360-4f9b-8811-ed2a346dde01 | overcloud  | UPDATE_FAILED | 2016-08-16T13:38:46 | 2016-08-16T14:41:54 |

Listing the Failed resources:

$  heat resource-list --nested-depth 5 overcloud | grep FAILED
| ControllerNodesPostDeployment                 | 7ae99682-597f-4562-9e58-4acffaf7aaac          | OS::TripleO::ControllerPostDeployment                                           | UPDATE_FAILED   | 2016-08-16T14:44:42 | overcloud 
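With a deep nesting level the table gets very wide, so a small filter can help. The sketch below pulls out only the failed rows as dictionaries. `failed_resources` is a hypothetical helper, not part of python-heatclient. The column order (resource_name, physical_resource_id, resource_type, resource_status, updated_time, stack_name) is an assumption about this heat client's `--nested-depth` output.

```python
# Assumed column order for `heat resource-list --nested-depth N` output.
COLUMNS = ["resource_name", "physical_resource_id", "resource_type",
           "resource_status", "updated_time", "stack_name"]


def failed_resources(table_text):
    """Return the table rows whose resource_status contains FAILED, as dicts."""
    failed = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != len(COLUMNS) or cells[0] == "resource_name":
            continue  # skip the header row and anything malformed
        row = dict(zip(COLUMNS, cells))
        if "FAILED" in row["resource_status"]:
            failed.append(row)
    return failed
```

Feed it the captured output of the resource-list command. Each row's resource_name and stack_name are exactly what `heat resource-show` needs next.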

No deployment is listed, so how do we display the error? We want to show the resource named ControllerNodesPostDeployment in the overcloud stack:

$ heat resource-show overcloud ControllerNodesPostDeployment
| Property               | Value                                                                                                                                                               |
| attributes             | {}                                                                                                                                                                  |
| creation_time          | 2016-08-16T13:38:46                                                                                                                                                 |
| description            |                                                                                                                                                                     |
| links                  | (self)      |
|                        | (stack)                                             |
|                        | (nested) |
| logical_resource_id    | ControllerNodesPostDeployment                                                                                                                                       |
| physical_resource_id   | 7ae99682-597f-4562-9e58-4acffaf7aaac                                                                                                                                |
| required_by            | BlockStorageNodesPostDeployment                                                                                                                                     |
|                        | CephStorageNodesPostDeployment                                                                                                                                      |
| resource_name          | ControllerNodesPostDeployment                                                                                                                                       |
| resource_status        | UPDATE_FAILED                                                                                                                                                       |
| resource_status_reason | Engine went down during resource UPDATE                                                                                                                             |
| resource_type          | OS::TripleO::ControllerPostDeployment                                                                                                                               |
| updated_time           | 2016-08-16T14:44:42                                                                                                                                                 |

Note this message:

Engine went down during resource UPDATE

Looking in the journal:

Aug 16 15:16:15 undercloud kernel: Out of memory: Kill process 17127 (heat-engine) score 60 or sacrifice child
Aug 16 15:16:15 undercloud kernel: Killed process 17127 (heat-engine) total-vm:834052kB, anon-rss:480936kB, file-rss:1384kB

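The undercloud ran out of memory, and the kernel's OOM killer took out heat-engine in the middle of the stack update. Just like Brody said, we are going to need a bigger boat.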
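To spot this quickly in a long journal, you can scan for the kernel's "Killed process" lines. The sketch below assumes the message format shown in the journal excerpt above, which varies between kernel versions. `oom_kills` is a hypothetical helper. Given those two journal lines, it returns a single entry for heat-engine (PID 17127) with 480936 kB, roughly 470 MiB, of anonymous memory.

```python
import re

# Assumed format: "Killed process <pid> (<name>) total-vm:...kB, anon-rss:<n>kB, ..."
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def oom_kills(journal_text):
    """Return (pid, process name, anonymous RSS in kB) for each OOM kill found."""
    return [
        (int(m.group("pid")), m.group("name"), int(m.group("anon_rss_kb")))
        for m in KILLED_RE.finditer(journal_text)
    ]
```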

August 15, 2016

Can't Trust This!
Last week saw a really interesting bug in TCP come to light. CVE-2016-5696 describes an issue in the way Linux deals with challenge ACKs defined in RFC 5961. The issue itself is really clever and interesting. It's not exactly new, but because the research was presented at USENIX, it suddenly got a lot more attention from the press.

The researchers demonstrated injecting data into a standard HTTP connection, which is easy to understand and terrifying to most people. Generally speaking, we operate in a world where TCP connections are mostly trustworthy. That's not true if you have a "man in the middle", but with this bug you don't even need a MITM if you're using a public network, which is horrifying.

The real story isn't the flaw, though. The flaw is great research and quite clever, but it just highlights something many of us have known for a very long time: you shouldn't trust the network.

Not so long ago, the general thinking was that the public internet wasn't very trustworthy, but it worked well enough in practice. TLS (SSL back then) was created to ensure some level of trust between two endpoints, and everything seemed fine. Most traffic still passed over the network unencrypted, though. There were always grumblings about coffee shop attacks or nation-state man-in-the-middle attacks, but practically speaking nobody really took these attacks seriously.

The world is different now, though. There is no more network perimeter. It's well accepted that you can't trust the things inside your network any more than you can trust the things outside it. Attacks like this are going to keep happening. The network continues to get more complex, which means the number of security problems increases. IPv6 will solve the problem of running out of IP addresses while adding a ton of new security problems in the process. Just wait for researchers to start taking a hard look at IPv6.

The joke is "there is no cloud, just someone else's computer." There's also no network, just someone else's network, and it's someone else's network you can't trust. You know you can't trust your own network because it has grown to the point that it's probably self-aware. Now you expect to trust the network of a cloud provider that is doing things a few thousand times more complex than you are? All the cloud infrastructures are held together with tape and string too; their networks aren't magic, they just have really, really good paint.

So what's the point of all this rambling about how we can't trust any networks? The point is you can't trust the network, no matter what you're told and no matter what's going on. You need to worry about what's happening on the network. You also need to think about the machines, but that's a story for another day. The right way to deal with your data is to ask yourself the question "what happens if someone can see this data on the wire?" Not all data is super important; some you don't have to protect. But there is some data that must be protected at all times, and for that data you need to figure out how best to protect it, for example with end-to-end encryption between the endpoints. I suspect that if everyone asked this question at least once during development and deployment, it would solve a lot of problems.
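Here is one minimal sketch of protecting data on the wire, assuming Python 3.7+ and a server that already speaks TLS. `send_over_tls`, the host, the port and the payload are illustrative placeholders, not part of any tool mentioned here. Because the context verifies both the certificate chain and the hostname, bytes injected into the TCP stream (as with CVE-2016-5696) fail TLS's integrity check instead of reaching the application. An attacker can still tear the connection down, though.

```python
import socket
import ssl


def make_client_context():
    """Build a TLS client context that verifies the server certificate and hostname."""
    context = ssl.create_default_context()  # CERT_REQUIRED and check_hostname=True
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


def send_over_tls(host, port, payload):
    """Send payload over a verified TLS connection and return the first chunk of the reply."""
    context = make_client_context()
    with socket.create_connection((host, port)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(payload)
            return tls_sock.recv(4096)
```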

August 12, 2016

Smart card login with YubiKey NEO

In this post I give an overview of smart cards and their potential advantages, and share my adventures in using a Yubico YubiKey NEO device for smart card authentication with FreeIPA and SSSD.

Smart card overview

Smart cards with cryptographic processors and secure key storage (private key generated on-device and cannot be extracted) are an increasingly popular technology for secure system and service login, as well as for signing and encryption applications (e.g. code signing, OpenPGP). They may offer a security advantage over traditional passwords because private key operations typically require the user to enter a PIN. Therefore the smart card is two factors in one: both something I have and something I know.

The inability to extract the private key from a smart card also provides an advantage over software HOTP/TOTP tokens, which, in the absence of other security measures such as an encrypted filesystem on the mobile device, allow an attacker to extract the OTP seed. And because public key cryptography is used, there is no OTP seed or password hash sitting on a server, waiting to be exfiltrated and subjected to offline attacks.

For authentication applications, a smart card carries an X.509 certificate alongside a private key. A login application would read the certificate from the card and validate it against trusted CAs (e.g. a company’s CA for issuing smart cards). Typically an OCSP or CRL check would also be performed. The login application then challenges the card to sign a nonce, and validates the signature with the public key from the certificate. A valid signature attests that the bearer of the smart card is indeed the subject of the certificate. Finally, the certificate is mapped to a user either by looking for an exact certificate match or by extracting information about the user from the certificate.
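To make the challenge-response step concrete, here is a toy model in Python (3.8+, standard library only). It uses textbook RSA with a deliberately tiny key built from two Mersenne primes and no padding, so it only illustrates the protocol flow. Real cards use properly sized keys with PKCS #1 or ECDSA signatures. Certificate validation and OCSP/CRL checks are left out, and all names are hypothetical.

```python
import hashlib
import secrets

# Toy key (illustration only, NOT secure): two Mersenne primes, 2**61 - 1 and 2**89 - 1.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent: stays "on the card"


def digest_as_int(message, n):
    """Hash the message with SHA-256 and reduce it into the range of modulus n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def card_sign(nonce):
    """The card's side: sign the challenge with the private key it never reveals."""
    return pow(digest_as_int(nonce, N), D, N)


def verify(nonce, signature, public_key=(N, E)):
    """The login application's side: check the signature with the certificate's public key."""
    n, e = public_key
    return pow(signature, e, n) == digest_as_int(nonce, n)


# Login flow: a fresh random nonce per attempt prevents replaying an old signature.
challenge = secrets.token_bytes(32)
print(verify(challenge, card_sign(challenge)))
```

The nonce must be fresh for every login. Otherwise a recorded signature could simply be replayed.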

Test environment

In my smart card investigations I had a FreeIPA server with a single Fedora 24 desktop host enrolled. alice was the user I tested with. To begin with, she had no certificates and used her password to log in.

I was doing all of my testing on virtual machines, so I had to enable USB passthrough for the YubiKey device. This is straightforward but you have to ensure the IOMMU is enabled in both BIOS and kernel (for Intel CPUs add intel_iommu=on to the kernel command line in GRUB).

In virt-manager, after you have created the VM (it doesn’t need to be running) you can Add Hardware in the Details view, then choose the YubiKey NEO device. There are no doubt virsh incantations or other ways to establish the passthrough.

Finally, on the host I stopped the pcscd smart card daemon to prevent it from interfering with passthrough:

# systemctl stop pcscd.service pcscd.socket

Provisioning the YubiKey

For general smart card provisioning steps, I recommend Nathan Kinder’s post on the topic. But the YubiKey NEO is special with its own steps to follow! First install the ykpers and yubico-piv-tool packages:

sudo dnf install -y ykpers yubico-piv-tool

If we run yubico-piv-tool to find out the version of the PIV applet, we run into a problem because a new YubiKey comes configured in OTP mode:

[dhcp-40-8:~] ftweedal% yubico-piv-tool -a version
Failed to connect to reader.

The YubiKey NEO supports a variety of operation modes, including hybrid modes:

0    OTP device only.
1    CCID device only.
2    OTP/CCID composite device.
3    U2F device only.
4    OTP/U2F composite device.
5    U2F/CCID composite device.
6    OTP/U2F/CCID composite device.

(You can also add 80 to any of the modes to configure touch to eject, or touch to switch modes for hybrid modes).
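The mode argument is easy to misread, so here is a small decoder. It assumes the value is hexadecimal and that "add 80" means setting the 0x80 flag, which is consistent with ykpersonalize reporting mode 1 as 0x1. `describe_mode` is a hypothetical helper, not part of ykpers. Under that reading, mode 86 decodes to the OTP/U2F/CCID composite device with the touch flag set.

```python
# USB operation modes of the YubiKey NEO, as listed above.
MODES = {
    0: "OTP device only",
    1: "CCID device only",
    2: "OTP/CCID composite device",
    3: "U2F device only",
    4: "OTP/U2F composite device",
    5: "U2F/CCID composite device",
    6: "OTP/U2F/CCID composite device",
}
EJECT_FLAG = 0x80  # assumption: the "add 80" modifier is the hex flag 0x80


def describe_mode(mode_arg):
    """Describe a mode string such as "1", "0x1" or "86", read as hexadecimal."""
    value = int(mode_arg, 16)
    base = value & ~EJECT_FLAG
    if base not in MODES:
        raise ValueError("unknown mode: %s" % mode_arg)
    description = MODES[base]
    if value & EJECT_FLAG:
        description += " (touch to eject / switch modes)"
    return description
```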

We need to put the YubiKey into CCID (Chip Card Interface Device, a standard USB protocol for smart cards) mode. I originally configured the YubiKey in mode 86 but could not get the card to work properly with USB passthrough to the virtual machine. Whether this was caused by the eject behaviour or the fact that it was a hybrid mode I do not know, but reconfiguring it to mode 1 (CCID only) allowed me to use the card on the guest.

[dhcp-40-8:~] ftweedal% ykpersonalize -m 1
Firmware version 3.4.6 Touch level 1541 Program sequence 1

The USB mode will be set to: 0x1

Commit? (y/n) [n]: y

Now yubico-piv-tool can see the card:

[dhcp-40-8:~] ftweedal% yubico-piv-tool -a version
Application version 1.0.4 found.

Now we can initialise the YubiKey by setting a new management key, PIN and PIN Unblocking Key (PUK). As you can probably guess, the management key protects actions like generating keys and importing certificates, the PIN protects private key operations in regular use, and the PUK is kind of in between, allowing the PIN to be reset if the maximum attempts are exceeded. The current (default) PIN and PUK need to be given in order to reset them.

% KEY=`dd if=/dev/random bs=1 count=24 2>/dev/null | hexdump -v -e '/1 "%02X"'`
% echo $KEY
% yubico-piv-tool -a set-mgm-key -n $KEY
Successfully set new management key.

% PIN=`dd if=/dev/random bs=1 count=6 2>/dev/null | hexdump -v -e '/1 "%u"'|cut -c1-6`
% echo $PIN
% yubico-piv-tool -a change-pin -P 123456 -N $PIN
Successfully changed the pin code.

% PUK=`dd if=/dev/random bs=1 count=8 2>/dev/null | hexdump -v -e '/1 "%u"'|cut -c1-8`
% echo $PUK
% yubico-piv-tool -a change-puk -P 12345678 -N $PUK
Successfully changed the puk code.
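For reference, the same three secrets can also be generated with Python's `secrets` module (Python 3.6+). This is an alternative sketch, not a replacement for the commands above, and the function names are hypothetical. The lengths match those commands: 24 random bytes (48 hex digits) for the management key, a 6-digit PIN and an 8-digit PUK. One design difference is that every digit here is drawn uniformly from 0-9. Concatenating decimal byte values and cutting the string instead favours leading 1s and 2s.

```python
import secrets
import string


def random_digits(length):
    """Return `length` decimal digits, each drawn uniformly from 0-9."""
    return "".join(secrets.choice(string.digits) for _ in range(length))


def new_piv_secrets():
    """Generate a 24-byte (48 hex digit) management key, a 6-digit PIN and an 8-digit PUK."""
    return {
        "management_key": secrets.token_hex(24).upper(),
        "pin": random_digits(6),
        "puk": random_digits(8),
    }
```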

Next we must generate a private/public keypair on the smart card. Various slots are available for different purposes, with different PIN-checking behaviour. The Certificate slots page on the Yubico wiki gives the full details. We will use slot 9e which is for Card Authentication (PIN is not needed for private key operations). It is necessary to provide the management key on the command line, but the program also prompts for it (I’m not sure why this is the case).

% yubico-piv-tool -k $KEY -a generate -s 9e
Enter management key: CC044321D49AC1FC40146AD049830DB09C5AFF05CD843766
-----END PUBLIC KEY-----
Successfully generated a new private key.
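The slot choice determines when the card asks for the PIN. The table below summarizes the default PIN behaviour of the four primary PIV key slots, as described on Yubico's Certificate slots page. `PIV_SLOTS` and `pin_policy` are illustrative names, and the policies shown are the defaults rather than a guarantee for every card.

```python
# Default PIN behaviour of the four primary PIV key slots (see Yubico's
# "Certificate slots" documentation); the names here are illustrative.
PIV_SLOTS = {
    "9a": ("PIV Authentication", "once per session"),
    "9c": ("Digital Signature", "every operation"),
    "9d": ("Key Management", "once per session"),
    "9e": ("Card Authentication", "never"),
}


def pin_policy(slot):
    """Return (purpose, when the PIN is checked) for a slot such as "9e"."""
    try:
        return PIV_SLOTS[slot.lower()]
    except KeyError:
        raise ValueError("not one of the four primary PIV key slots: %s" % slot) from None
```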

We then use this key to create a certificate signing request (CSR) via yubico-piv-tool. Although slot 9e does not require the PIN, other slots do require it, so I’ve included the verify-pin action for completeness:

% yubico-piv-tool -a verify-pin \
    -a request-certificate -s 9e -S "/CN=alice/"
Enter PIN: 167246
Successfully verified PIN.
Please paste the public key...
-----END PUBLIC KEY-----

yubico-piv-tool -a request-certificate is not very flexible; for example, it cannot create a CSR with request extensions such as including the user’s email address or Kerberos principal name in the Subject Alternative Name extension. For such non-trivial use cases, openssl req or other programs can be used instead, with a PKCS #11 module providing access to the smart card’s signing capability. Nathan Kinder’s post provides full details.

With CSR in hand, alice can now request a certificate from the IPA CA. I have covered this procedure in previous articles so I’ll skip it here, except to add that it is necessary to use a profile that saves the newly issued certificate to the subject’s userCertificate LDAP attribute. This is how SSSD matches certificates in smart cards with users.
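The exact-match approach can be modelled in a few lines. This is a simplified sketch of the idea, not SSSD's implementation. The input format (a dict of user name to DER-encoded certificates, standing in for each user's userCertificate values in LDAP) and the function names are hypothetical.

```python
import hashlib


def fingerprint(der_bytes):
    """SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()


def build_cert_index(user_certificates):
    """Map certificate fingerprints to user names.

    user_certificates: {username: [DER bytes, ...]}, a stand-in for the
    userCertificate values stored in LDAP (hypothetical input format).
    """
    index = {}
    for user, certs in user_certificates.items():
        for der in certs:
            index[fingerprint(der)] = user
    return index


def user_for_card_cert(card_cert_der, index):
    """Return the user whose stored certificate exactly matches the card's, or None."""
    return index.get(fingerprint(card_cert_der))
```

A lookup like this is also what would allow a user to be selected automatically from the card, as opposed to first typing a user name.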

Once we have the certificate (in file alice.pem) we can import it onto the card:

% yubico-piv-tool -k $KEY -a import-certificate -s 9e -i alice.pem
Enter management key: CC044321D49AC1FC40146AD049830DB09C5AFF05CD843766
Successfully imported a new certificate.

Configuring smart card login

OpenSC provides a PKCS #11 module for interfacing with PIV smart cards, among other things:

# dnf install -y opensc

Enable smart card authentication in the [pam] section of /etc/sssd/sssd.conf:

[pam]
pam_cert_auth = True

Then restart SSSD:

# systemctl restart sssd

Next, enable the OpenSC PKCS #11 module in the system NSS database:

# modutil -dbdir /etc/pki/nssdb \
    -add "OpenSC" -libfile

We also need to add the IPA CA cert to the system NSSDB. This will allow SSSD to validate certificates from smart cards. If smart card certificates are issued by a sub-CA or an external CA, import that CA’s certificate instead.

# certutil -d /etc/ipa/nssdb -L -n 'IPA.LOCAL IPA CA' -a \
  | certutil -d /etc/pki/nssdb -A -n 'IPA.LOCAL IPA CA' -t 'CT,C,C'

One hiccup I had was that SSSD could not talk to the OCSP server indicated in the Authority Information Access extension on the certificate (due to my DNS not being set up correctly). I had to tell SSSD not to perform OCSP checks. The sssd.conf snippet follows. Do not do this in a production environment.

certificate_verification = no_ocsp

That’s pretty much all there is to it. After this, I was able to log in as alice using the YubiKey NEO. When logging in with the card inserted, instead of being prompted for a password, GDM prompts for the PIN. Enter the PIN, and it lets you in!

Screenshot of login PIN prompt


I mentioned (or didn’t mention) a few standards related to smart card authentication. A quick review of them is warranted:

  • CCID is a USB smart card interface standard.
  • PIV (Personal Identity Verification) is a smart card standard from NIST. It defines the slots, PIN behaviour, etc.
  • PKCS #15 is a token information format. OpenSC provides a PKCS #15 emulation layer for PIV cards.
  • PKCS #11 is a software interface to cryptographic tokens. Token and HSM vendors provide PKCS #11 modules for their devices. OpenSC provides a PKCS #11 interface to PKCS #15 tokens (including emulated PIV tokens).

It is appropriate to mention pam_pkcs11, which is also part of the OpenSC project, as an alternative to SSSD. More configuration is involved, but if you don’t have (or don’t want) an external identity management system it looks like a good approach.

You might remember that I was using slot 9e which doesn’t require a PIN, yet I was still prompted for a PIN when logging in. There are a couple of issues to tease apart here. The first issue is that although PIV cards do not require the PIN for private key operations on slot 9e, the PKCS #11 module does not correctly report this. As an alternative to OpenSC, Yubico provide their own PKCS #11 module called YKCS11 as part of yubico-piv-tool but modutil did not like it. Nevertheless, a peek at its source code leads me to believe that it too declares that the PIN is required regardless of the slot in use. I could not find much discussion of this discrepancy so I will raise some tickets and hopefully it can be addressed.

The second issue is that SSSD requires the PIN and uses it to log into the token, even if the token says that a PIN is not required. Again, I will start a discussion to see if this is really the intended behaviour (perhaps it is).

The YubiKey NEO features a wireless (NFC) interface. I haven’t played with it yet, but all the smart card features are available over that interface. This lends weight to fixing the issues preventing PIN-less usage.

A final thought I have about the user experience is that it would be nice if user information could be derived or looked up based on the certificate(s) in the smart card, and a user automatically selected, instead of having to first specify "I am alice" or whoever. The information is there on the card after all, and it is one less step for users to perform. If PIN-less usage can be addressed, it would mean that a user can just approach a machine, plug in their smart card and hi ho, off to work they go. There are some indications that this does work with GDM and pam_pkcs11, so if you know how to get it going with SSSD I would love to know!

August 11, 2016

Tripleo HA Federation Proof-of-Concept

Keystone has supported identity federation for several releases. I have been working on a proof-of-concept integration of identity federation in a TripleO deployment. I was able to successfully log in to Horizon via WebSSO, and want to share my notes.

A federation deployment requires changes to the network topology, Keystone, the HTTPD service, and Horizon. The various OpenStack deployment tools will have their own ways of applying these changes. While this proof-of-concept can’t be called production-ready, it does demonstrate that TripleO can support Federation using SAML. From this proof-of-concept, we should be able to deduce the steps needed for a production deployment.


  • Single physical node – Large enough to run multiple virtual machines.  I only ended up using 3, but scaled up to 6 at one point and ran out of resources.  Tested with 8 CPUs and 32 GB RAM.
  • CentOS 7.2 – Running as the base operating system.
  • FreeIPA – Particularly, the CentOS repackage of Red Hat Identity Management. Running on the base OS.
  • Keycloak – Actually an alpha build of Red Hat SSO, running on the base OS. This was fronted by Apache HTTPD, and proxied through ajp://localhost:8109. This gave me HTTPS support using the CA Certificate from the IPA server.  This will be important later when the controller nodes need to talk to the identity provider to set up metadata.
  • Tripleo Quickstart – deployed in HA mode, using an undercloud.
    • ./ --config config/general_config/ha.yml ayoung-dell-t1700.test

In addition, I did some sanity checking of the cluster by deploying the overcloud using the quickstart helper script, and tore it down using heat stack-delete overcloud.

Reproducing Results

When doing development testing, you can expect to rebuild and tear down your cloud on a regular basis. When you redeploy, you want to make sure that the changes are just the delta from what you tried last time. As the number of artifacts grew, I found I needed to maintain a repository of files that included the environment passed to openstack overcloud deploy. To manage these, I created a git repository in /home/stack/deployment. Inside that directory, I copied the and deploy_env.yml files generated by the overcloud, and modified them accordingly.

In my version of, I wanted to remove the deploy_env.yml generation, to avoid confusion during later deployments. I also wanted to preserve the environment file across deployments (and did not want it in /tmp). This file has three parts: the Keystone configuration values, HTTPS/Network setup, and configuration for a single node deployment. This last part was essential for development, as chasing down fixes across three HA nodes was time-consuming and error-prone. The DNS server value I used is particular to my deployment, and reflects the IPA server running on the base host.

For reference, I’ve included those files at the end of this post.

Identity Provider Registration and Metadata

While it would have been possible to run the registration of the identity provider on one of the nodes, the Heat-managed deployment process does not provide a clean way to gather the resulting files and package them for deployment to other nodes. Although I ended up deploying on a single node for development, it took me a while to realize I could do that, and by then I had already worked out an approach that calls the registration from the undercloud node and produces a tarball.

As a result, I created a script, again to allow for reproducing this in the future:


#!/bin/sh
# Register the OpenStack SAML client with Keycloak/RH-SSO and write the
# generated mod_auth_mellon configuration under $basedir/rhsso/etc/httpd.
basedir=$(dirname "$0")
ipa_domain=$(hostname -d)

# The admin password must be provided by the caller's environment.
: "${rhsso_master_admin_password:?set rhsso_master_admin_password first}"

keycloak-httpd-client-install \
   --client-originate-method registration \
   --force \
   --mellon-https-port 5000 \
   --mellon-hostname openstack.$ipa_domain \
   --mellon-root '/v3' \
   --keycloak-server-url https://identity.$ipa_domain \
   --keycloak-auth-role root-admin \
   --keycloak-admin-password "$rhsso_master_admin_password" \
   --app-name v3 \
   --keycloak-realm openstack \
   --log-file "$basedir/rhsso.log" \
   --httpd-dir "$basedir/rhsso/etc/httpd" \
   -l "/v3/auth/OS-FEDERATION/websso/saml2" \
   -l "/v3/auth/OS-FEDERATION/identity_providers/rhsso/protocols/saml2/websso" \
   -l "/v3/OS-FEDERATION/identity_providers/rhsso/protocols/saml2/auth"

This does not quite generate the right paths, as it turns out that $basedir is not quite what we want, so I had to post-edit the generated file: rhsso/etc/httpd/conf.d/v3_mellon_keycloak_openstack.conf

Specifically, the path:

has to be changed to:

While I created a tarball that I then manually deployed, the preferred approach would be to use tripleo-heat-templates/puppet/deploy-artifacts.yaml to deploy them. The problem I faced is that the generated files include Apache module directives from mod_auth_mellon.  If mod_auth_mellon has not been installed into the controller, the Apache server won’t start, and the deployment will fail.

Federation Operations

The Federation setup requires a few calls. I documented them in Rippowam, and attempted to reproduce them locally using Ansible and the Rippowam code. I was not a purist though, as A) I needed to get this done and B) the end solution is not going to use Ansible anyway. The general steps I performed:

  • yum install mod_auth_mellon
  • Copy over the metadata tarball, expand it, and tweak the configuration (could be done prior to building the tarball).
  • Run the following commands.
openstack identity provider create --remote-id https://identity.{{ ipa_domain }}/auth/realms/openstack rhsso
openstack mapping create --rules ./mapping_rhsso_saml2.json rhsso_mapping
openstack federation protocol create --identity-provider rhsso --mapping rhsso_mapping saml2

The mapping file is the one from Rippowam.

The keystone service calls only need to be performed once, as they are stored in the database. The expansion of the tarball needs to be performed on every node.
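Since only the tarball expansion repeats per node, that part is easy to loop. This is a sketch rather than what I actually ran: it assumes the nodes are reachable over SSH as `heat-admin` (the usual TripleO overcloud user) with passwordless sudo, and that you supply the node addresses yourself. The function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: copy the tarball to every node given on the command
# line, install mod_auth_mellon there, and expand the tarball relative to /.
# Stops at the first node that fails.
distribute_rhsso_httpd() {
    local tarball=$1; shift
    local node name
    name=$(basename "$tarball")
    for node in "$@"; do
        scp "$tarball" "heat-admin@$node:/tmp/$name" || return 1
        ssh "heat-admin@$node" \
            "sudo yum -y install mod_auth_mellon && sudo tar -C / -xzf /tmp/$name" \
            || return 1
    done
}
```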


As in previous Federation setups, I needed to modify the values used for WebSSO. The values I ended up setting in /etc/openstack-dashboard/local_settings resembled this:

OPENSTACK_KEYSTONE_URL = "https://openstack.ayoung-dell-t1700.test:5000/v3"
WEBSSO_ENABLED = True
WEBSSO_CHOICES = (
    ("saml2", _("Rhsso")),
    ("credentials", _("Keystone Credentials")),
)

Important: Make sure that the auth URL uses a fully qualified domain name (FQDN) that matches the value in the signed certificate.

Redirect Support for SAML

Several differences in how HTTPD and HA Proxy operate require us to make certain configuration changes. Keystone runs internally over HTTP, not HTTPS. However, the SAML Identity Providers are public and transmit cryptographic data, so they need to be protected using HTTPS. As a result, HA Proxy needs to expose an HTTPS-based endpoint for the Keystone public service. In addition, the redirects that come from mod_auth_mellon need to reflect the public protocol, hostname, and port.

The solution I ended up with involved changes on both sides:

In haproxy.cfg, I modified the keystone public stanza so it looks like this:

listen keystone_public
bind transparent ssl crt /etc/pki/tls/private/overcloud_endpoint.pem
bind transparent ssl crt /etc/pki/tls/private/overcloud_endpoint.pem
bind transparent
redirect scheme https code 301 if { hdr(host) -i } !{ ssl_fc }
rsprep ^Location:\ http://(.*) Location:\ https://\1
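To see what the `rsprep` line does, the same regular expression can be applied with `sed`. This only illustrates the rewrite on a single header line; HAProxy applies it to the responses itself, and `rewrite_location` is a hypothetical name.

```bash
#!/bin/bash
# Rewrite an http:// Location header to https://, mirroring the rsprep
# rule in the keystone_public stanza. Other lines pass through unchanged.
rewrite_location() {
    sed -E 's#^Location: http://(.*)#Location: https://\1#'
}
```

For example, `echo 'Location: http://openstack.example.test/v3/mellon/login' | rewrite_location` prints the same header with `https://`.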

While this was necessary, it also proved to be insufficient. When the signed assertion from the Identity Provider is posted to the Keystone server, mod_auth_mellon checks that the destination value matches what it expects the hostname should be. Consequently, in order to get this to match in the file:


I had to set the following:

ServerName https://openstack.ayoung-dell-t1700.test

Note that the protocol is set to https even though the Keystone server is handling HTTP. This might break elsewhere. If it does, then the Keystone configuration in Apache may have to be duplicated.

Federation Mapping

For the WebSSO login to successfully complete, the user needs to have a role on at least one project. The Rippowam mapping file maps the user to the Member role in the demo group, so the most straightforward steps to complete are to add a demo group, add a demo project, and assign the Member role on the demo project to the demo group. All this should be done with a v3 token:

openstack group create demo
openstack role create Member
openstack project create demo
openstack role add --group demo --project demo Member
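If you need to rerun these steps, a guarded version avoids errors from creating objects that already exist. This is a sketch, assuming admin credentials are already sourced; `ensure_demo_access` is a hypothetical name, and exporting `OS_IDENTITY_API_VERSION=3` is one way to make the client use the v3 API.

```bash
#!/bin/bash
# Hypothetical helper: create the demo group, the Member role and the demo
# project only if each is missing, then grant Member on demo to the group.
ensure_demo_access() {
    export OS_IDENTITY_API_VERSION=3
    openstack group show demo >/dev/null 2>&1 || openstack group create demo || return 1
    openstack role show Member >/dev/null 2>&1 || openstack role create Member || return 1
    openstack project show demo >/dev/null 2>&1 || openstack project create demo || return 1
    # The assignment itself is always (re)issued.
    openstack role add --group demo --project demo Member
}
```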

Complete helper files

Below are the complete files that were too long to put inline.

#!/bin/bash
# Simple overcloud deploy script

set -eux

# Source in undercloud credentials.
source /home/stack/stackrc

# Wait until there are hypervisors available.
while true; do
    count=$(openstack hypervisor stats show -c count -f value)
    if [ "$count" -gt 0 ]; then
        break
    fi
    sleep 10
done

deploy_status=0

# Deploy the overcloud!
openstack overcloud deploy --debug --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/deployment/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server -e $HOME/deployment/deploy_env.yaml   --force-postconfig "$@"    || deploy_status=1

# We don't always get a useful error code from the openstack deploy command,
# so check `heat stack-list` for a CREATE_FAILED status.
if heat stack-list | grep -q 'CREATE_FAILED'; then
    deploy_status=1

    for failed in $(heat resource-list \
        --nested-depth 5 overcloud | grep FAILED |
        grep 'StructuredDeployment ' | cut -d '|' -f3)
    do heat deployment-show $failed > failed_deployment_$failed.log
    done
fi

exit $deploy_status


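parameter_defaults:
  controllerExtraConfig:
    keystone::using_domain_config: true
    keystone::config::keystone_config:
      identity/domain_configurations_from_database:
        value: true
      auth/methods:
        value: external,password,token,oauth1,saml2
      federation/trusted_dashboard:
        value: http://openstack.ayoung-dell-t1700.test/dashboard/auth/websso/
      federation/sso_callback_template:
        value: /etc/keystone/sso_callback_template.html
      federation/remote_id_attribute:
        value: MELLON_IDP

    # In releases before Mitaka, HeatWorkers doesn't modify
    # num_engine_workers, so handle via heat::config
    heat::config::heat_config:
      DEFAULT/num_engine_workers:
        value: 1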
    heat::api_cloudwatch::enabled: false
    heat::api_cfn::enabled: false
  HeatWorkers: 1
  CeilometerWorkers: 1
  CinderWorkers: 1
  GlanceWorkers: 1
  KeystoneWorkers: 1
  NeutronWorkers: 1
  NovaWorkers: 1
  SwiftWorkers: 1
  CloudName: openstack.ayoung-dell-t1700.test
  CloudDomain: ayoung-dell-t1700.test

  #TLS Setup from enable-tls.yaml
  PublicVirtualFixedIPs: [{'ip_address':''}]
  SSLCertificate: |
    #certificate removed for space
    -----END CERTIFICATE-----

    The contents of your certificate go here
  SSLIntermediateCertificate: ''
  SSLKey: |
    #key removed for space
    -----END RSA PRIVATE KEY-----

  EndpointMap:
    AodhAdmin: {protocol: 'http', port: '8042', host: 'IP_ADDRESS'}
    AodhInternal: {protocol: 'http', port: '8042', host: 'IP_ADDRESS'}
    AodhPublic: {protocol: 'https', port: '13042', host: 'CLOUDNAME'}
    CeilometerAdmin: {protocol: 'http', port: '8777', host: 'IP_ADDRESS'}
    CeilometerInternal: {protocol: 'http', port: '8777', host: 'IP_ADDRESS'}
    CeilometerPublic: {protocol: 'https', port: '13777', host: 'CLOUDNAME'}
    CinderAdmin: {protocol: 'http', port: '8776', host: 'IP_ADDRESS'}
    CinderInternal: {protocol: 'http', port: '8776', host: 'IP_ADDRESS'}
    CinderPublic: {protocol: 'https', port: '13776', host: 'CLOUDNAME'}
    GlanceAdmin: {protocol: 'http', port: '9292', host: 'IP_ADDRESS'}
    GlanceInternal: {protocol: 'http', port: '9292', host: 'IP_ADDRESS'}
    GlancePublic: {protocol: 'https', port: '13292', host: 'CLOUDNAME'}
    GnocchiAdmin: {protocol: 'http', port: '8041', host: 'IP_ADDRESS'}
    GnocchiInternal: {protocol: 'http', port: '8041', host: 'IP_ADDRESS'}
    GnocchiPublic: {protocol: 'https', port: '13041', host: 'CLOUDNAME'}
    HeatAdmin: {protocol: 'http', port: '8004', host: 'IP_ADDRESS'}
    HeatInternal: {protocol: 'http', port: '8004', host: 'IP_ADDRESS'}
    HeatPublic: {protocol: 'https', port: '13004', host: 'CLOUDNAME'}
    HorizonPublic: {protocol: 'https', port: '443', host: 'CLOUDNAME'}
    KeystoneAdmin: {protocol: 'http', port: '35357', host: 'IP_ADDRESS'}
    KeystoneInternal: {protocol: 'http', port: '5000', host: 'IP_ADDRESS'}
    KeystonePublic: {protocol: 'https', port: '13000', host: 'CLOUDNAME'}
    NeutronAdmin: {protocol: 'http', port: '9696', host: 'IP_ADDRESS'}
    NeutronInternal: {protocol: 'http', port: '9696', host: 'IP_ADDRESS'}
    NeutronPublic: {protocol: 'https', port: '13696', host: 'CLOUDNAME'}
    NovaAdmin: {protocol: 'http', port: '8774', host: 'IP_ADDRESS'}
    NovaInternal: {protocol: 'http', port: '8774', host: 'IP_ADDRESS'}
    NovaPublic: {protocol: 'https', port: '13774', host: 'CLOUDNAME'}
    NovaEC2Admin: {protocol: 'http', port: '8773', host: 'IP_ADDRESS'}
    NovaEC2Internal: {protocol: 'http', port: '8773', host: 'IP_ADDRESS'}
    NovaEC2Public: {protocol: 'https', port: '13773', host: 'CLOUDNAME'}
    NovaVNCProxyAdmin: {protocol: 'http', port: '6080', host: 'IP_ADDRESS'}
    NovaVNCProxyInternal: {protocol: 'http', port: '6080', host: 'IP_ADDRESS'}
    NovaVNCProxyPublic: {protocol: 'https', port: '13080', host: 'CLOUDNAME'}
    SaharaAdmin: {protocol: 'http', port: '8386', host: 'IP_ADDRESS'}
    SaharaInternal: {protocol: 'http', port: '8386', host: 'IP_ADDRESS'}
    SaharaPublic: {protocol: 'https', port: '13386', host: 'CLOUDNAME'}
    SwiftAdmin: {protocol: 'http', port: '8080', host: 'IP_ADDRESS'}
    SwiftInternal: {protocol: 'http', port: '8080', host: 'IP_ADDRESS'}
    SwiftPublic: {protocol: 'https', port: '13808', host: 'CLOUDNAME'}

  ControllerCount: 1

resource_registry:
  OS::TripleO::NodeTLSData: /usr/share/openstack-tripleo-heat-templates/puppet/extraconfig/tls/tls-cert-inject.yaml

August 08, 2016

We're figuring out the security problem (finally)
If you attended Black Hat last week, the single biggest message I kept hearing over and over again is that what we do today in the security industry isn't working. They say the first step is admitting you have a problem (and we have a big one). Of course it's easy to proclaim this; if you just look at the numbers, it's pretty clear. The numbers haven't ever really been in our favor, though. We've mostly ignored them in the past, but I think we're taking a real look at them now.

Of course, we have no clue what to do. Virtually every talk that touched on this topic at Black Hat had no actionable advice. If you were lucky, they had one slide with what I would call mediocre to bad advice on it. It's OK, though; a big part of this process is just admitting there is something wrong.

So the real question is if what we do today doesn't work, what does?

First, let's talk about nothing working. If you go to any security conference anywhere, there are a lot of security vendors. I mean A LOT, and it's mostly accepted now that whatever they're selling isn't really going to help. I do wonder what would happen if nobody was running any sort of defensive technology. Would your organization be better or worse off if you got rid of your SIEM? I'm not sure if we can answer that without getting in a lot of trouble. There is also a ton of talk about Artificial Intelligence, which is a way to pretend a few regular expressions make things better. I don't think that's fooling anyone today. Real AI might do something clever someday, but if it's truly intelligent, it'll run away once it gets a look at what's going on. I wonder if we'll have a place for all the old outdated AIs to retire someday.

Now, on to the exciting what now part of this all.

It's no secret what we do today isn't very good. This is everything from security vendors selling products of dubious quality, to software vendors selling products of dubious quality. In the past there has never been any real demand for high quality software. The selling point has been to get the job done, not get the job done well and securely. Quality isn't free you know.

I've said this before, and I'll keep saying it. The only way to see real change happen in software is if market forces demand it. Today the market is pushing everything to zero cost. Quality isn't free, so you're not going to see quality as a feature in the mythic race to zero. There are no winners in a race to zero.

There are two forces we should be watching very closely right now. The first is the insurance industry. The second is regulation.

Insurance is easy enough to understand. The idea is you pay a company so that when you get hacked (and the way things stand today, this is an absolute certainty) they help you recover financially. You want to ensure you get more money back than you paid in; they want to ensure they take in more than they pay out. Nobody knows how this works today. Is some software better than others? What about how you train your staff or set up your network? In the real world, when you get insurance they make you prove you're doing things correctly. You can't insure stupidity and recklessness. Eventually, as companies want insurance to protect against losses, the insurance industry will demand certain behaviors. How this all plays out will be interesting, given that anyone with a computer can write and run software.

Regulation is also an interesting place to watch. It's generally feared by many organizations, because regulation by definition can only lag industry trends, and quite often regulation adds a lot of cost and complexity to any products. In the world of IoT, though, this could make sense. When you have devices that can literally kill you, you don't want anyone building whatever they want using only the lowest quality parts available. In order for regulation to work, though, we need independent labs, which don't really exist today for software. There are some efforts underway (it's an exercise for the reader to research these). The thing to remember is it's going to be easy to proclaim today's efforts as useless or stupid. They might be, but you have to start somewhere, make mistakes, fix your mistakes, and improve your process. There were people who couldn't imagine a car replacing a horse. Don't be that person.

Where now?

The end game here is a safer better world. Someday I hope we will sip tea on a porch, watching our robot overlords rule us, and talk about how bad things used to be. Here's the single most important part of this post. You're either part of the solution or you're part of the problem. If you want to nay-say and talk about how stupid these efforts all are, stay out of the way. You're part of an old dying world that has no place in the future. Things will change because they must. There is no secret option C where everything stays the same. We've already lost, we got it wrong the first time around, it's time to get it right.

August 02, 2016

Customizing a Tripleo Quickstart Deploy

Tripleo Heat Templates allow the deployer to customize the controller deployment by setting values in the controllerExtraConfig section of the stack configuration. However, Quickstart already makes use of this in the file /tmp/deploy_env.yaml, so if you want to continue to customize, you need to work with this file.

What I did was run quickstart once, through to completion, to make sure everything worked, then tear down the overcloud like this:

. ./stackrc
heat stack-delete overcloud

Now, I want to set a bunch of config values in the /etc/keystone/keystone.conf files distributed to the controllers.

  1. Modify so that the deploy-env.yaml file is not in tmp, but rather in stack, so I can keep track of it. Ideally, this file would be kept in a local git repo under revision control.
  2. Remove the lines from that generate the /tmp/deploy-env.yml file. This is not strictly needed, but it keeps you from accidentally losing changes if you edit the wrong file. OTOH, being able to regenerate the vanilla version of this file is useful, so maybe just comment out the generation code.
  3. Edit /home/stack/deploy_env.yaml appropriately.

My version of


#!/bin/bash
# Simple overcloud deploy script

set -eux

# Source in undercloud credentials.
source /home/stack/stackrc

# Wait until there are hypervisors available.
while true; do
    count=$(openstack hypervisor stats show -c count -f value)
    if [ "$count" -gt 0 ]; then
        break
    fi
    sleep 10
done

deploy_status=0

# Deploy the overcloud!
openstack overcloud deploy --debug --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server -e /home/stack/deploy_env.yaml   "$@"|| deploy_status=1

# We don't always get a useful error code from the openstack deploy command,
# so check `heat stack-list` for a CREATE_FAILED status.
if heat stack-list | grep -q 'CREATE_FAILED'; then
    deploy_status=1

    for failed in $(heat resource-list \
        --nested-depth 5 overcloud | grep FAILED |
        grep 'StructuredDeployment ' | cut -d '|' -f3)
    do heat deployment-show $failed > failed_deployment_$failed.log
    done
fi

exit $deploy_status


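parameter_defaults:
  controllerExtraConfig:
    keystone::using_domain_config: true
    keystone::config::keystone_config:
      identity/domain_configurations_from_database:
        value: true
      auth/methods:
        value: external,password,token,oauth1,saml2
      federation/trusted_dashboard:
        value: https://openstack.young-dell-t1700.test/dashboard/auth/websso/
      federation/sso_callback_template:
        value: /etc/keystone/sso_callback_template.html
      federation/remote_id_attribute:
        value: MELLON_IDP

    # In releases before Mitaka, HeatWorkers doesn't modify
    # num_engine_workers, so handle via heat::config
    heat::config::heat_config:
      DEFAULT/num_engine_workers:
        value: 1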
    heat::api_cloudwatch::enabled: false
    heat::api_cfn::enabled: false
  HeatWorkers: 1
  CeilometerWorkers: 1
  CinderWorkers: 1
  GlanceWorkers: 1
  KeystoneWorkers: 1
  NeutronWorkers: 1
  NovaWorkers: 1
  SwiftWorkers: 1

Once you deploy, you can see what Heat records for those values with:

openstack stack show overcloud -f json | jq '.parameters["controllerExtraConfig"] '
"{u'heat::api_cfn::enabled': False, u'heat::config::heat_config': {u'DEFAULT/num_engine_workers': {u'value': 1}}, u'keystone::config::keystone_config': {u'federation/sso_callback_template': {u'value': u'/etc/keystone/sso_callback_template.html'}, u'identity/domain_configurations_from_database': {u'value': True}, u'auth/methods': {u'value': u'external,password,token,oauth1,saml2'}, u'federation/trusted_dashboard': {u'value': u'https://openstack.young-dell-t1700.test/dashboard/auth/websso/'}, u'federation/remote_id_attribute': {u'value': u'MELLON_IDP'}}, u'keystone::using_domain_config': True, u'heat::api_cloudwatch::enabled': False}"

SSH in to the controller node and you can check the section of the keystone conf file.


# From keystone

# Entrypoint for the federation backend driver in the keystone.federation
# namespace. (string value)
#driver = sql

# Value to be used when filtering assertion parameters from the environment.
# (string value)
#assertion_prefix =

# Value to be used to obtain the entity ID of the Identity Provider from the
# environment (e.g. if using the mod_shib plugin this value is `Shib-Identity-
# Provider`). (string value)
#remote_id_attribute = 
remote_id_attribute = MELLON_IDP

# A domain name that is reserved to allow federated ephemeral users to have a
# domain concept. Note that an admin will not be able to create a domain with
# this name or update an existing domain to this name. You are not advised to
# change this value unless you really have to. (string value)
#federated_domain_name = Federated

# A list of trusted dashboard hosts. Before accepting a Single Sign-On request
# to return a token, the origin host must be a member of the trusted_dashboard
# list. This configuration option may be repeated for multiple values. For
# example: trusted_dashboard=
# trusted_dashboard= (multi valued)
#trusted_dashboard =

# Location of Single Sign-On callback handler, will return a token to a trusted
# dashboard host. (string value)
#sso_callback_template = /etc/keystone/sso_callback_template.html
sso_callback_template = /etc/keystone/sso_callback_template.html
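Scrolling through the commented defaults gets tedious, so here is a small sketch that prints only the effective value of one option in one section, for example `remote_id_attribute` in `[federation]`. `ini_get` is a hypothetical helper built on awk; it ignores `#` and `;` comments and lets the last assignment of a key win.

```bash
#!/bin/bash
# Hypothetical helper: print the effective value of KEY in [SECTION] of an
# INI-style file such as /etc/keystone/keystone.conf. Commented lines are
# skipped; if the key is set more than once, the last assignment wins.
# Returns 1 (and prints nothing) when the key has no effective value.
ini_get() {
    local file=$1 section=$2 key=$3
    awk -v section="$section" -v key="$key" '
        # A header such as [federation]: note whether it is the wanted one.
        /^[[:space:]]*[[]/ {
            name = $0
            sub(/^[[:space:]]*[[]/, "", name)
            sub(/[]].*$/, "", name)
            in_section = (name == section)
            next
        }
        # Skip comments (# or ;) and lines without an equals sign.
        /^[[:space:]]*[#;]/ || index($0, "=") == 0 { next }
        in_section {
            eq = index($0, "=")
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", k)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
            if (k == key) { value = v; found = 1 }
        }
        END { if (found) print value; else exit 1 }
    ' "$file"
}
```

On a controller, `ini_get /etc/keystone/keystone.conf federation remote_id_attribute` should then print just the configured value.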

August 01, 2016

Everyone has been hacked
Unless you live in a cave (if you do, I'm pretty jealous), you've heard about all the political hacking going on. I don't like to take sides, so let's put aside who is right or wrong and use it as a lesson in how we have to operate in this new world.

In the past, there were ways to communicate that one could be relatively certain was secure and/or private. Long ago you didn't write everything down. There was a lot of verbal communication. When things were written down there was generally only one copy. Making copies of things was hard. Recording communications was hard. Even viewing or hearing many of these conversations if you weren't supposed to was hard. None of this is true anymore, it hasn't been true for a long time, yet we still act like what we do is just fine.

The old way
Long ago it was really difficult to make copies of documents, and recording a conversation was almost impossible. There were only a few well-funded organizations that could actually do these things. If they got what they wanted, they probably weren't looking to share what they found in public.

There was also the huge advantage of most things being in locked buildings with locked rooms and locked filing cabinets. That meant that if someone did break in, it was probably pretty obvious something had happened. Even the best intruders make mistakes.

The new way
Now let's think about today. Most of our communications are captured in a way that makes it nearly impossible to destroy them. Our emails are captured on servers, it's trivial to make an infinite number of copies. In most instances you will never know if someone made a copy of your data. Moving the data outside of an organization doesn't need any doors, locks, or passports. It's trivial to move data across the globe in seconds.

Keeping this in mind, if you're doing something that contains sensitive data, you can't reliably use an electronic medium to transport or store the conversations. Emails can be stolen, phone calls can be recorded, and text messages can be sniffed going through the air. There is almost no way to communicate that can't be used against you at some later date if it falls into the wrong hands. Even more terrifying is that an attacker doesn't have to come to you: thanks to the Internet, they can attack you from nearly any country on the planet.

What now?
Assuming we don't have a nice way to communicate securely or safely, what do we do? Everyone has to move information around, information is the new currency. Is it possible to do it in a way that's secure today? The short answer is no. There's nothing we can do about this today. If you send an email, it's quite possible it will leak someday. There are some ways to encrypt things, but it's impossible for most people to do correctly. There are even some apps that can help with secure communications but not everyone uses them or knows about them.

We need people to understand that information is a currency. We understand the concept of money. Your information is similarly valuable. We trade currency for goods and services, it can also be stolen if not protected. Nobody would use a bank without doors. We store our information in places that are unsecured and we often give out information for free. It will be up to the youth to solve this one, most of us old folks will never understand this concept any more than our grandparents could understand the Internet.

Once we understand the value of our information, we can more easily justify keeping it secure during transport and storage. Armored trucks transport money for a reason. Nobody is going to trust a bicycle courier to move large sums of cash, the same will be true of data. Moving things securely isn't easy nor is it free. There will have to be some sort of trade off that benefits both parties. Today it's pretty one sided with us giving out our information for free with minimal benefit.

Where do we go now? Probably nowhere. While I think things are starting to turn, we're not there yet. There will have to be a few more serious data leaks before the right questions start to get asked. But when they do, it will be imperative we understand that data is a currency. If we treat it as such it will become easier to understand what needs to be done.

Leave your comments on twitter: @joshbressers

July 28, 2016

ControllerExtraConfig and Tripleo Quickstart

Once I have the undercloud deployed, I want to be able to quickly deploy and redeploy overclouds. However, my last attempt to effect change on the overcloud did not modify the Keystone config file the way I intended. Once again, Steve Hardy helped me to understand what I was doing wrong.


/tmp/deploy_env.yml already defined ControllerExtraConfig, and my redefinition was ignored.

The Details

I’ve been using Quickstart to develop.  To deploy the overcloud, I run the script /home/stack/ which, in turn, runs the command:

openstack overcloud deploy --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server \
${DEPLOY_ENV_YAML:+-e $DEPLOY_ENV_YAML}  "$@"|| deploy_status=1

I want to set two parameters in the Keystone config file, so I created a file named keystone_extra_config.yml

parameter_defaults:
  controllerExtraConfig:
     keystone::using_domain_config: true
     keystone::domain_config_directory: /path/to/config

And edited /home/stack/ to add in -e /home/stack/keystone_extra_config.yml like this:

openstack overcloud deploy --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server \
    ${DEPLOY_ENV_YAML:+-e $DEPLOY_ENV_YAML}    -e /home/stack/keystone_extra_config.yml   "$@"|| deploy_status=1

I have run this both on an already deployed overcloud and from an undercloud with no stacks deployed, but in neither case have I seen the values in the config file.

Steve Hardy walked me through this from the CLI:

openstack stack resource list -n5 overcloud | grep "OS::TripleO::Controller "

| 1 | b4a558a2-297d-46c6-b658-46f9dc0fcd51 | OS::TripleO::Controller | CREATE_COMPLETE | 2016-07-28T01:49:02 | overcloud-Controller-y2lmuipmynnt |
| 0 | 5b93eee2-97f6-4b8e-b9a0-b5edde6b4795 | OS::TripleO::Controller | CREATE_COMPLETE | 2016-07-28T01:49:02 | overcloud-Controller-y2lmuipmynnt |
| 2 | 1fdfdfa9-759b-483c-a943-94f4c7b04d3b | OS::TripleO::Controller | CREATE_COMPLETE | 2016-07-28T01:49:02 | overcloud-Controller-y2lmuipmynnt

Looking into each of these stacks for the string “ontrollerExtraConfig” showed that it was defined, but was not showing my values. Thus, my customization was not even making it as far as the Heat database.
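Pulling the stack IDs out of that table by hand is error prone. Here is a sketch that does it with awk, assuming the column layout shown above, where the nested stack ID is the second column; `controller_stack_ids` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: read "openstack stack resource list -n5 overcloud"
# table rows on stdin and print the physical resource ID (the nested
# stack) of every OS::TripleO::Controller resource, one per line.
controller_stack_ids() {
    awk -F'|' '$4 ~ /^[[:space:]]*OS::TripleO::Controller[[:space:]]*$/ {
        id = $3
        gsub(/[[:space:]]/, "", id)
        print id
    }'
}
```

For example, `openstack stack resource list -n5 overcloud | controller_stack_ids` prints one ID per line, which you could then pass to `openstack stack show` to inspect each stack.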

I went back to the quickstart command and did a grep through the files included with the -e flags, and found that the deploy_env.yml file already defined this field. Once I merged my changes into /tmp/deploy_env.yml, I saw the values specified in the Hiera data.
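The grep through the `-e` files can be scripted too. This sketch walks an argument list like the one passed to `openstack overcloud deploy` and prints each environment file that mentions a given string; `find_env_files_defining` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: usage is find_env_files_defining KEY ARGS...
# Scans ARGS for "-e FILE" pairs and prints every FILE containing KEY.
# Other arguments are skipped; unreadable files are silently ignored.
find_env_files_defining() {
    local key=$1; shift
    while [ $# -gt 0 ]; do
        if [ "$1" = "-e" ] && [ $# -ge 2 ]; then
            if grep -q -- "$key" "$2" 2>/dev/null; then
                printf '%s\n' "$2"
            fi
            shift 2
        else
            shift
        fi
    done
}
```

Called with `ontrollerExtraConfig` followed by the deploy arguments, it should point at whichever environment file already defines the parameter.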

Of course, due to a different mistake I made, the deploy failed. When specifying domain-specific backends in a config directory, puppet validates the path, so you can’t pass in garbage as I was doing just for debugging.

Once I got things clean, I tore down the old overcloud and redeployed, and everything worked. Here was the final /home/stack/deploy_env.yaml environment file I used:

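parameter_defaults:
  controllerExtraConfig:
    keystone::using_domain_config: true
    keystone::config::keystone_config:
      identity/domain_configurations_from_database:
        value: true

    # In releases before Mitaka, HeatWorkers doesn't modify
    # num_engine_workers, so handle via heat::config
    heat::config::heat_config:
      DEFAULT/num_engine_workers:
        value: 1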
    heat::api_cloudwatch::enabled: false
    heat::api_cfn::enabled: false
  HeatWorkers: 1
  CeilometerWorkers: 1
  CinderWorkers: 1
  GlanceWorkers: 1
  KeystoneWorkers: 1
  NeutronWorkers: 1
  NovaWorkers: 1
  SwiftWorkers: 1

And the modified version of overcloud-deploy now executes this command:

# Deploy the overcloud!
openstack overcloud deploy --debug --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server -e /home/stack/deploy_env.yaml   "$@"|| deploy_status=1

Looking in the controller node’s /etc/keystone/keystone.conf file, I see:

#domain_specific_drivers_enabled = false
domain_specific_drivers_enabled = True

# Extract the domain specific configuration options from the resource backend
# where they have been stored with the domain data. This feature is disabled by
# default (in which case the domain specific options will be loaded from files
# in the domain configuration directory); set to true to enable. (boolean
# value)
#domain_configurations_from_database = false
domain_configurations_from_database = True

# Path for Keystone to locate the domain specific identity configuration files
# if domain_specific_drivers_enabled is set to true. (string value)
#domain_config_dir = /etc/keystone/domains
domain_config_dir = /etc/keystone/domains
Flocking to Kraków

In less than five days, the fourth annual Flock conference will take place in Kraków, Poland. This is Fedora’s premier contributor event each year, alternately taking place in North America and Europe. Attendance is completely free for anyone at all, so if you happen to be in the area (maybe hanging around after World Youth Day going on right now), you should certainly stop in!

This year’s conference is shaping up to be a truly excellent one, with a massive amount of exciting content to see. The full schedule has been available for a while, and I’ve got to say: there are no lulls in the action. In fact, I’ve put together my schedule of sessions I want to see and there are in fact no gaps in it. That said, here are a few of the sessions that I suspect are going to be the most exciting:

Aug. 2 @11:00 – Towards an Atomic Workstation

For a couple of years now, Fedora has been at the forefront of developing container technologies, particularly Docker and Project Atomic. Now, the Workstation SIG is looking to take some of those Project Atomic technologies and adopt them for the end-user workstation.

Aug. 2 @17:30 – University Outreach

I’ve long held that one of Fedora’s primary goals should always be to enlighten the next generation of the open source community. Over the last year, the Fedora Project began an Initiative to expand our presence in educational programs throughout the world. I’m extremely interested to see where that has taken us (and where it is going next).

Aug. 3 @11:00 – Modularity

This past year, there has been an enormous research-and-development effort poured into the concept of building a “modular” Fedora. What does this mean? Well it means solving the age-old Too Fast/Too Slow problem (sometimes described as “I want everything on my system to stay exactly the same for a long time. Except these three things over here that I always want to be running at the latest version.”). With modularity, the hope is that people will be able to put together their ideal operating system from parts bigger than just traditional packages.

Aug. 3 @16:30 – Diversity: Women in Open Source

This is a topic that is very dear to my heart, having a daughter who is already finding her way towards an engineering future. Fedora and many other projects (and companies) talk about “meritocracy” a lot: the concept that the best idea should always win. However the technology industry in general has a severe diversity problem. When we talk about “meritocracy”, the implicit contract there is that we have many ideas to choose from. However, if we don’t have a community that represents many different viewpoints and cultures, then we are by definition only choosing the best idea from a very limited pool. I’m very interested to hear how Fedora is working towards attracting people with new ideas.


July 26, 2016

FreeIPA Lightweight CA internals

In the preceding post, I explained the use cases for the FreeIPA lightweight sub-CAs feature, how to manage CAs and use them to issue certificates, and current limitations. In this post I detail some of the internals of how the feature works, including how signing keys are distributed to replicas, and how sub-CA certificate renewal works. I conclude with a brief retrospective on delivering the feature.

Full details of the design of the feature can be found on the design page. This post does not cover everything from the design page, but we will look at the aspects that are covered from the perspective of the system administrator, i.e. "what is happening on my systems?"

Dogtag lightweight CA creation

The PKI system used by FreeIPA is called Dogtag. It is a separate project with its own interfaces; most FreeIPA certificate management features are simply reflecting a subset of the corresponding Dogtag interface, often integrating some additional access controls or identity management concepts. This is certainly the case for FreeIPA sub-CAs. The Dogtag lightweight CAs feature was implemented initially to support the FreeIPA use case, yet not all aspects of the Dogtag feature are used in FreeIPA as of v4.4, and other consumers of the Dogtag feature are likely to emerge (in particular: OpenStack).

The Dogtag lightweight CAs feature has its own design page which documents the feature in detail, but it is worth mentioning some important aspects of the Dogtag feature and their impact on how FreeIPA uses the feature.

  • Dogtag lightweight CAs are managed via a REST API. The FreeIPA framework uses this API to create and manage lightweight CAs, using the privileged RA Agent certificate to authenticate. In a future release we hope to remove the RA Agent and authenticate as the FreeIPA user using GSS-API proxy credentials.
  • Each CA in a Dogtag instance, including the "main" CA, has an LDAP entry with object class authority. The schema includes fields such as subject and issuer DN, certificate serial number, and a UUID primary key, which is randomly generated for each CA. When FreeIPA creates a CA, it stores this UUID so that it can map the FreeIPA CA’s common name (CN) to the Dogtag authority ID in certificate requests or other management operations (e.g. CA deletion).
  • The "nickname" of the lightweight CA signing key and certificate in Dogtag’s NSSDB is the nickname of the "main" CA signing key, with the lightweight CA’s UUID appended. In general operation FreeIPA does not need to know this, but the ipa-certupdate program has been enhanced to set up Certmonger tracking requests for FreeIPA-managed lightweight CAs and therefore it needs to know the nicknames.
  • Dogtag lightweight CAs may be nested, but FreeIPA as of v4.4 does not make use of this capability.
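To make the nickname convention concrete, here is a minimal Python sketch that builds the NSSDB nickname of a lightweight CA from its authority ID. It assumes only what is stated above and in the certutil listing further down: the main CA's nickname is `caSigningCert cert-pki-ca`, and a lightweight CA's nickname is that string followed by a space and the UUID. The function name is hypothetical; this is not code taken from ipa-certupdate.

```python
import uuid

# Nickname of the "main" CA signing cert, as seen in the certutil listing.
MAIN_CA_NICKNAME = "caSigningCert cert-pki-ca"


def lightweight_ca_nickname(authority_id: str, main_nickname: str = MAIN_CA_NICKNAME) -> str:
    """Return the NSSDB nickname of a lightweight CA's signing key and cert.

    Hypothetical helper: main CA nickname + " " + authority UUID.
    """
    # Normalise (and validate) the authority ID; raises ValueError if malformed.
    canonical = str(uuid.UUID(authority_id))
    return f"{main_nickname} {canonical}"


if __name__ == "__main__":
    print(lightweight_ca_nickname("660ad30b-7be4-4909-aa2c-2c7d874c84fd"))
```

Note that the main CA itself keeps the bare `caSigningCert cert-pki-ca` nickname; only lightweight CAs get the UUID suffix.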

So, let’s see what actually happens on a FreeIPA server when we add a lightweight CA. We will use the sc example from the previous post. The command executed to add the CA, with its output, was:

% ipa ca-add sc --subject "CN=Smart Card CA, O=IPA.LOCAL" \
    --desc "Smart Card CA"
Created CA "sc"
  Name: sc
  Description: Smart Card CA
  Authority ID: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
  Subject DN: CN=Smart Card CA,O=IPA.LOCAL
  Issuer DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330

The LDAP entry added to the Dogtag database was:

dn: cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd,ou=authorities,ou=ca,o=ipaca
authoritySerial: 63
objectClass: authority
objectClass: top
cn: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
authorityID: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
authorityKeyNickname: caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d87
authorityKeyHost: f24b-0.ipa.local:443
authorityEnabled: TRUE
authorityDN: CN=Smart Card CA,O=IPA.LOCAL
authorityParentDN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
authorityParentID: d3e62e89-df27-4a89-bce4-e721042be730

We see the authority UUID in the authorityID attribute as well as cn and the DN. authorityKeyNickname records the nickname of the signing key in Dogtag’s NSSDB. authorityKeyHost records which hosts possess the signing key – currently just the host on which the CA was created. authoritySerial records the serial number of the certificate (more on that later). The meaning of the rest of the fields should be clear.

If we have a peek into Dogtag’s NSSDB, we can see the new CA’s certificate:

# certutil -d /etc/pki/pki-tomcat/alias -L

Certificate Nickname              Trust Attributes

caSigningCert cert-pki-ca         CTu,Cu,Cu
auditSigningCert cert-pki-ca      u,u,Pu
Server-Cert cert-pki-ca           u,u,u
caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd u,u,u
ocspSigningCert cert-pki-ca       u,u,u
subsystemCert cert-pki-ca         u,u,u

There it is, alongside the main CA signing certificate and other certificates used by Dogtag. The trust flags u,u,u indicate that the private key is also present in the NSSDB. If we pretty print the certificate we will see a few interesting things:

# certutil -d /etc/pki/pki-tomcat/alias -L \
    -n 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd'
        Version: 3 (0x2)
        Serial Number: 63 (0x3f)
        Signature Algorithm: PKCS #1 SHA-256 With RSA Encryption
        Issuer: "CN=Certificate Authority,O=IPA.LOCAL 201606201330"
            Not Before: Fri Jul 15 05:46:00 2016
            Not After : Tue Jul 15 05:46:00 2036
        Subject: "CN=Smart Card CA,O=IPA.LOCAL"
        Signed Extensions:
            Name: Certificate Basic Constraints
            Critical: True
            Data: Is a CA with no maximum path length.

Observe that:

  • The certificate is indeed a CA.
  • The serial number (63) agrees with the CA’s LDAP entry.
  • The validity period is 20 years, the default for CAs in Dogtag. This cannot be overridden on a per-CA basis right now, but addressing this is a priority.
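The serial number and validity checks above can be reproduced with a short Python sketch. It assumes certutil prints timestamps in the `Fri Jul 15 05:46:00 2016` form shown here and that the LDAP `authoritySerial` value is decimal; the helper names are illustrative only.

```python
from datetime import datetime

# Assumed format of certutil's pretty-printed timestamps, e.g. "Fri Jul 15 05:46:00 2016".
CERTUTIL_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_certutil_time(value: str) -> datetime:
    """Parse a certutil 'Not Before'/'Not After' value into a naive datetime."""
    return datetime.strptime(value, CERTUTIL_TIME_FORMAT)


def validity_years(not_before: str, not_after: str) -> int:
    """Return the number of whole years between two certutil timestamps."""
    start = parse_certutil_time(not_before)
    end = parse_certutil_time(not_after)
    years = end.year - start.year
    # Subtract a year if the final anniversary has not been reached yet.
    if (end.month, end.day, end.time()) < (start.month, start.day, start.time()):
        years -= 1
    return years


def serial_matches(ldap_serial: str, certutil_hex: str) -> bool:
    """Compare the decimal authoritySerial from LDAP with certutil's hex serial."""
    return int(ldap_serial) == int(certutil_hex, 16)


if __name__ == "__main__":
    print(validity_years("Fri Jul 15 05:46:00 2016", "Tue Jul 15 05:46:00 2036"))
    print(serial_matches("63", "0x3f"))
```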

Finally, let’s look at the raw entry for the CA in the FreeIPA database:

dn: cn=sc,cn=cas,cn=ca,dc=ipa,dc=local
cn: sc
ipaCaIssuerDN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
objectClass: ipaca
objectClass: top
ipaCaSubjectDN: CN=Smart Card CA,O=IPA.LOCAL
ipaCaId: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
description: Smart Card CA

We can see that this entry also contains the subject and issuer DNs, and the ipaCaId attribute holds the Dogtag authority ID, which allows the FreeIPA framework to dereference the local ID (sc) to the Dogtag ID as needed. We also see that the description attribute is local to FreeIPA; Dogtag also has a description attribute for lightweight CAs but FreeIPA uses its own.
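The dereferencing step can be pictured as a simple lookup from the FreeIPA CA name (`cn`) to the Dogtag authority ID (`ipaCaId`). This sketch hard-codes the two entries seen in this post in a hypothetical in-memory table; the real framework reads them from LDAP.

```python
from typing import Dict

# Hypothetical in-memory copy of FreeIPA CA entries: cn -> ipaCaId.
# Values are the authority IDs shown in this post.
IPA_CA_ENTRIES: Dict[str, str] = {
    "ipa": "d3e62e89-df27-4a89-bce4-e721042be730",
    "sc": "660ad30b-7be4-4909-aa2c-2c7d874c84fd",
}


def dogtag_authority_id(ca_name: str, entries: Dict[str, str] = IPA_CA_ENTRIES) -> str:
    """Map a FreeIPA CA name (e.g. 'sc') to the Dogtag authority ID."""
    try:
        return entries[ca_name]
    except KeyError:
        raise LookupError(f"unknown CA: {ca_name!r}") from None
```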

Lightweight CA replication

FreeIPA servers replicate objects in the FreeIPA directory among themselves, as do Dogtag replicas (note: in Dogtag, the term clone is often used). All Dogtag instances in a replicated environment need to observe changes to lightweight CAs (creation, modification, deletion) that were performed on another replica and update their own view so that they can respond to requests consistently. This is accomplished via an LDAP persistent search which is run in a monitor thread. Care was needed to avoid race conditions. Fortunately, the solution for LDAP-based profile storage provided a fine starting point for the authority monitor; although lightweight CAs are more complex, many of the same race conditions can occur and these were already addressed in the LDAP profile monitor implementation.

But unlike LDAP-based profiles, a lightweight CA consists of more than just an LDAP object; there is also the signing key. The signing key lives in Dogtag’s NSSDB and for security reasons cannot be transported through LDAP. This means that when a Dogtag clone observes the addition of a lightweight CA, an out-of-band mechanism to transport the signing key must also be triggered.

This mechanism is covered in the design pages but the summarised process is:

  1. A Dogtag clone observes the creation of a CA on another server and starts a KeyRetriever thread. The KeyRetriever is implemented as part of Dogtag, but it is configured to run the /usr/libexec/ipa/ipa-pki-retrieve-key program, which is part of FreeIPA. The program is invoked with two arguments: the nickname of the key to request, and the server to request it from (this was stored in the authorityKeyHost attribute mentioned earlier).
  2. ipa-pki-retrieve-key requests the key from the Custodia daemon on the source server. It authenticates as the dogtag/<requestor-hostname>@REALM service principal. If authenticated and authorised, the Custodia daemon exports the signing key from Dogtag’s NSSDB wrapped by the main CA’s private key, and delivers it to the requesting server. ipa-pki-retrieve-key outputs the wrapped key then exits.
  3. The KeyRetriever reads the wrapped key and imports (unwraps) it into the Dogtag clone’s NSSDB. It then initialises the Dogtag CA’s Signing Unit allowing the CA to service signing requests on that clone, and adds its own hostname to the CA’s authorityKeyHost attribute.
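The retrieval steps above might be modelled in Python roughly as follows. This is a sketch, not Dogtag's (Java) KeyRetriever: the argument order matches the `About to execute command` log line in the debug log excerpt, but the function names, the `import_key` hook, and the success test (exit status 0 with non-empty output) are my assumptions. Host strings are passed through unchanged, and the helper runner is injectable so the logic can be exercised without a FreeIPA server. Step 1 (the monitor noticing the new CA and starting the thread) is not modelled.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Helper path as given in the post.
HELPER = "/usr/libexec/ipa/ipa-pki-retrieve-key"

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def build_helper_argv(nickname: str, host: str) -> List[str]:
    """Argument order follows the 'About to execute command' log line."""
    return [HELPER, nickname, host]


def run_helper(argv: List[str]) -> subprocess.CompletedProcess:
    """Run the helper and capture the wrapped key it writes to stdout."""
    return subprocess.run(argv, capture_output=True, check=False)


def retrieve_wrapped_key(nickname: str, key_hosts: Sequence[str], run: Runner = run_helper) -> Optional[bytes]:
    """Step 2: ask each key host in turn until one returns the wrapped key."""
    for host in key_hosts:
        result = run(build_helper_argv(nickname, host))
        if result.returncode == 0 and result.stdout:
            return result.stdout
    return None


def key_retriever_pass(
    nickname: str,
    key_hosts: Sequence[str],
    my_hostname: str,
    import_key: Callable[[str, bytes], None],
    run: Runner = run_helper,
) -> Optional[List[str]]:
    """One pass of steps 2 and 3; returns the new authorityKeyHost list, or None."""
    wrapped = retrieve_wrapped_key(nickname, key_hosts, run)
    if wrapped is None:
        return None
    # Step 3: unwrap the key into the local NSSDB (hypothetical hook).
    import_key(nickname, wrapped)
    # Advertise this clone as an additional source of the key.
    if my_hostname in key_hosts:
        return list(key_hosts)
    return [*key_hosts, my_hostname]
```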

Some excerpts of the CA debug log on the clone (not the server on which the sub-CA was first created) show this process in action. The CA debug log is found at /var/log/pki/pki-tomcat/ca/debug. Some irrelevant messages have been omitted.

[25/Jul/2016:15:45:56][authorityMonitor]: authorityMonitor: Processed change controls.
[25/Jul/2016:15:45:56][authorityMonitor]: authorityMonitor: ADD
[25/Jul/2016:15:45:56][authorityMonitor]: readAuthority: new entryUSN = 109
[25/Jul/2016:15:45:56][authorityMonitor]: CertificateAuthority init 
[25/Jul/2016:15:45:56][authorityMonitor]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[25/Jul/2016:15:45:56][authorityMonitor]: SigningUnit init: debug Certificate object not found
[25/Jul/2016:15:45:56][authorityMonitor]: CA signing key and cert not (yet) present in NSSDB
[25/Jul/2016:15:45:56][authorityMonitor]: Starting KeyRetrieverRunner thread

Above we see the authorityMonitor thread observe the addition of a CA. It adds the CA to its internal map and attempts to initialise it, which fails because the key and certificate are not available, so it starts a KeyRetrieverRunner in a new thread.

[25/Jul/2016:15:45:56][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Running ExternalProcessKeyRetriever
[25/Jul/2016:15:45:56][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: About to execute command: [/usr/libexec/ipa/ipa-pki-retrieve-key, caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd, f24b-0.ipa.local]

The KeyRetrieverRunner thread invokes ipa-pki-retrieve-key with the nickname of the key it wants, and a host from which it can retrieve it. If a CA has multiple sources, the KeyRetrieverRunner will try these in order with multiple invocations of the helper, until one succeeds. If none succeed, the thread goes to sleep and then retries, initially after 10 seconds and then backing off exponentially.
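The retry policy can be sketched as a delay generator plus a loop. The 10 second initial delay comes from the description above; the doubling factor and the `max_attempts` bound are assumptions made for illustration (the post says only that the back-off is exponential), and `attempt` stands in for one pass over all key hosts.

```python
import itertools
from typing import Callable, Iterator, Optional


def retry_delays(initial: float = 10.0, factor: float = 2.0, maximum: Optional[float] = None) -> Iterator[float]:
    """Yield 10, 20, 40, ... seconds (optionally capped at `maximum`)."""
    delay = initial
    while True:
        yield delay if maximum is None else min(delay, maximum)
        delay *= factor


def retrieve_with_backoff(attempt: Callable[[], bool], sleep: Callable[[float], None], max_attempts: int) -> bool:
    """Call `attempt` until it succeeds, sleeping with exponential back-off in between."""
    delays = retry_delays()
    for n in range(max_attempts):
        if attempt():
            return True
        if n + 1 < max_attempts:  # no point sleeping after the final attempt
            sleep(next(delays))
    return False
```

In real use `sleep` would be `time.sleep`; injecting it keeps the sketch testable.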

[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Importing key and cert
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Reinitialising SigningUnit
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Got token Internal Key Storage Token by name
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Found cert by nickname: 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd' with serial number: 63
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Got private key from cert
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Got public key from cert
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: in init - got CA name CN=Smart Card CA,O=IPA.LOCAL

The key retriever successfully returned the key data and import succeeded. The signing unit then gets initialised.

[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Adding self to authorityKeyHosts attribute
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: In LdapBoundConnFactory::getConn()
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: postCommit: new entryUSN = 361
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: postCommit: nsUniqueId = 4dd42782-4a4f11e6-b003b01c-c8916432
[25/Jul/2016:15:47:14][authorityMonitor]: authorityMonitor: Processed change controls.
[25/Jul/2016:15:47:14][authorityMonitor]: authorityMonitor: MODIFY
[25/Jul/2016:15:47:14][authorityMonitor]: readAuthority: new entryUSN = 361
[25/Jul/2016:15:47:14][authorityMonitor]: readAuthority: known entryUSN = 361
[25/Jul/2016:15:47:14][authorityMonitor]: readAuthority: data is current

Finally, the Dogtag clone adds itself to the CA’s authorityKeyHost attribute. The authorityMonitor observes this change but ignores it because its view is current.
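The `entryUSN` bookkeeping visible in these log excerpts suggests a simple rule: reprocess an authority only when its `entryUSN` is newer than the one already known, and remember the USN of changes the clone wrote itself. The toy model below illustrates that rule; the class and method names are hypothetical, and treating an older-or-equal USN as current is my reading of the logs rather than a statement about Dogtag's code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AuthorityView:
    entry_usn: int
    attrs: Dict[str, str] = field(default_factory=dict)


class AuthorityMonitorSketch:
    """Toy model of a clone's view of lightweight CAs, keyed by authority ID."""

    def __init__(self) -> None:
        self.known: Dict[str, AuthorityView] = {}

    def record_local_commit(self, authority_id: str, entry_usn: int, attrs: Dict[str, str]) -> None:
        """Remember the entryUSN of a change this clone wrote itself (cf. 'postCommit')."""
        self.known[authority_id] = AuthorityView(entry_usn, dict(attrs))

    def on_change(self, change_type: str, authority_id: str, entry_usn: int,
                  attrs: Optional[Dict[str, str]] = None) -> str:
        """Handle one persistent-search notification and report what was done."""
        if change_type == "DELETE":
            self.known.pop(authority_id, None)
            return "removed"
        current = self.known.get(authority_id)
        if current is not None and entry_usn <= current.entry_usn:
            return "data is current"  # already seen (or written) this change
        self.known[authority_id] = AuthorityView(entry_usn, dict(attrs or {}))
        return "reinitialised"
```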

Certificate renewal

CA signing certificates will eventually expire, and therefore require renewal. Because the FreeIPA framework operates with low privileges, it cannot add a Certmonger tracking request for sub-CAs when it creates them. Furthermore, although the renewal (i.e. the actual signing of a new certificate for the CA) should only happen on one server, the certificate must be updated in the NSSDB of all Dogtag clones.

As mentioned earlier, the ipa-certupdate command has been enhanced to add Certmonger tracking requests for FreeIPA-managed lightweight CAs. The actual renewal will only be performed on whichever server is the renewal master when Certmonger decides it is time to renew the certificate (assuming that the tracking request has been added on that server).

Let’s run ipa-certupdate on the renewal master to add the tracking request for the new CA. First observe that the tracking request does not exist yet:

# getcert list -d /etc/pki/pki-tomcat/alias |grep subject
        subject: CN=CA Audit,O=IPA.LOCAL 201606201330
        subject: CN=OCSP Subsystem,O=IPA.LOCAL 201606201330
        subject: CN=CA Subsystem,O=IPA.LOCAL 201606201330
        subject: CN=Certificate Authority,O=IPA.LOCAL 201606201330
        subject: CN=f24b-0.ipa.local,O=IPA.LOCAL 201606201330

As expected, we do not see our sub-CA certificate above. After running ipa-certupdate the following tracking request appears:

Request ID '20160725222909':
        status: MONITORING
        stuck: no
        key pair storage: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd',token='NSS Certificate DB',pin set
        certificate: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd',token='NSS Certificate DB'
        CA: dogtag-ipa-ca-renew-agent
        issuer: CN=Certificate Authority,O=IPA.LOCAL 201606201330
        subject: CN=Smart Card CA,O=IPA.LOCAL
        expires: 2036-07-15 05:46:00 UTC
        key usage: digitalSignature,nonRepudiation,keyCertSign,cRLSign
        pre-save command: /usr/libexec/ipa/certmonger/stop_pkicad
        post-save command: /usr/libexec/ipa/certmonger/renew_ca_cert "caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd"
        track: yes
        auto-renew: yes

As for updating the certificate in each clone’s NSSDB, Dogtag itself takes care of that. All that is required is for the renewal master to update the CA’s authoritySerial attribute in the Dogtag database. The renew_ca_cert Certmonger post-renewal hook script performs this step. Each Dogtag clone observes the update (in the monitor thread), looks up the certificate with the indicated serial number in its certificate repository (a new entry that will also have been recently replicated to the clone), and adds that certificate to its NSSDB. Again, let’s observe this process by forcing a certificate renewal:

# getcert resubmit -i 20160725222909
Resubmitting "20160725222909" to "dogtag-ipa-ca-renew-agent".

After about 30 seconds the renewal process is complete. When we examine the certificate in the NSSDB we see, as expected, a new serial number:

# certutil -d /etc/pki/pki-tomcat/alias -L \
    -n "caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd" \
    | grep -i serial
        Serial Number: 74 (0x4a)

We also see that the renew_ca_cert script has updated the serial in Dogtag’s database:

# ldapsearch -D cn="Directory Manager" -w4me2Test -b o=ipaca \
    '(cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd)' authoritySerial
dn: cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd,ou=authorities,ou=ca,o=ipaca
authoritySerial: 74

Finally, if we look at the CA debug log on the clone, we’ll see that the authority monitor observes the serial number change and updates the certificate in its own NSSDB (again, some irrelevant or low-information messages have been omitted):

[26/Jul/2016:10:43:28][authorityMonitor]: authorityMonitor: Processed change controls.
[26/Jul/2016:10:43:28][authorityMonitor]: authorityMonitor: MODIFY
[26/Jul/2016:10:43:28][authorityMonitor]: readAuthority: new entryUSN = 1832
[26/Jul/2016:10:43:28][authorityMonitor]: readAuthority: known entryUSN = 361
[26/Jul/2016:10:43:28][authorityMonitor]: CertificateAuthority init 
[26/Jul/2016:10:43:28][authorityMonitor]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[26/Jul/2016:10:43:28][authorityMonitor]: Got token Internal Key Storage Token by name
[26/Jul/2016:10:43:28][authorityMonitor]: Found cert by nickname: 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd' with serial number: 63
[26/Jul/2016:10:43:28][authorityMonitor]: Got private key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: Got public key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: CA signing unit inited
[26/Jul/2016:10:43:28][authorityMonitor]: in init - got CA name CN=Smart Card CA,O=IPA.LOCAL
[26/Jul/2016:10:43:28][authorityMonitor]: Updating certificate in NSSDB; new serial number: 74

When the authority monitor processes the change, it reinitialises the CA including its signing unit. Then it observes that the serial number of the certificate in its NSSDB differs from the serial number from LDAP. It pulls the certificate with the new serial number from its certificate repository, imports it into NSSDB, then reinitialises the signing unit once more and sees the correct serial number:

[26/Jul/2016:10:43:28][authorityMonitor]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[26/Jul/2016:10:43:28][authorityMonitor]: Got token Internal Key Storage Token by name
[26/Jul/2016:10:43:28][authorityMonitor]: Found cert by nickname: 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd' with serial number: 74
[26/Jul/2016:10:43:28][authorityMonitor]: Got private key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: Got public key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: CA signing unit inited
[26/Jul/2016:10:43:28][authorityMonitor]: in init - got CA name CN=Smart Card CA,O=IPA.LOCAL
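The clone-side check just described reduces to comparing two serial numbers. Here is a minimal sketch with the NSSDB, certificate repository and import step passed in as callables; all of these hooks are hypothetical stand-ins for Dogtag internals.

```python
from typing import Callable, Optional


def sync_signing_cert(
    nickname: str,
    ldap_serial: int,
    nssdb_serial: Callable[[str], Optional[int]],
    fetch_from_repository: Callable[[int], bytes],
    import_cert: Callable[[str, bytes], None],
) -> bool:
    """Bring the local NSSDB in line with authoritySerial; return True if it changed."""
    if nssdb_serial(nickname) == ldap_serial:
        return False  # NSSDB already holds the certificate named in LDAP
    # The renewed certificate should already have replicated to the local repository.
    der = fetch_from_repository(ldap_serial)
    import_cert(nickname, der)
    return True
```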

Currently this update mechanism is only used for lightweight CAs, but it would work just as well for the main CA too, and we plan to switch at some stage so that the process is consistent for all CAs.

Wrapping up

I hope you have enjoyed this tour of some of the lightweight CA internals, and in particular seeing how the design actually plays out on your systems in the real world.

FreeIPA lightweight CAs has been the most complex and challenging project I have ever undertaken. It took the best part of a year from early design and proof of concept, to implementing the Dogtag lightweight CAs feature, then FreeIPA integration, and numerous bug fixes, refinements or outright redesigns along the way. Although there are still some rough edges, some important missing features and, I expect, many an RFE to come, I am pleased with what has been delivered and the overall design.

Thanks are due to all of my colleagues who contributed to the design and review of the feature; each bit of input from all of you has been valuable. I especially thank Ade Lee and Endi Dewata from the Dogtag team for their help with API design and many code reviews over a long period of time, and from the FreeIPA team Jan Cholasta and Martin Babinsky for their invaluable input into the design, and much code review and testing. I could not have delivered this feature without your help; thank you for your collaboration!

July 25, 2016

Lightweight Sub-CAs in FreeIPA 4.4

Last year FreeIPA 4.2 brought us some great new certificate management features, including custom certificate profiles and user certificates. The upcoming FreeIPA 4.4 release builds upon this groundwork and introduces lightweight sub-CAs, a feature that lets admins mint new CAs under the main FreeIPA CA and allows certificates for different purposes to be issued in different certificate domains. In this post I will review the use cases and demonstrate the process of creating, managing and issuing certificates from sub-CAs. (A follow-up post will detail some of the mechanisms that operate behind the scenes to make the feature work.)

Use cases

Currently, all certificates issued by FreeIPA are issued by a single CA. Say you want to issue certificates for various purposes: regular server certificates, user certificates for VPN authentication, and user certificates for authentication to a particular web service. Currently, assuming the certificate bore the appropriate Key Usage and Extended Key Usage extensions (with the default profile, they do), a certificate issued for one of these purposes could be used for all of the other purposes.

Issuing certificates for particular purposes (especially client authentication scenarios) from a sub-CA allows an administrator to configure the endpoint authenticating the clients to use the immediate issuer certificate for validating client certificates. Therefore, if you had a sub-CA for issuing VPN authentication certificates, and a different sub-CA for issuing certificates for authenticating to the web service, you could configure these services to accept certificates issued by the relevant CA only. Thus, where previously the scope of usability may have been unacceptably broad, administrators now have more fine-grained control over how certificates can be used.

Finally, another important consideration is that while revoking the main IPA CA is usually out of the question, it is now possible to revoke an intermediate CA certificate. If you create a CA for a particular organisational unit (e.g. some department or working group) or service, if or when that unit or service ceases to operate or exist, the related CA certificate can be revoked, rendering certificates issued by that CA useless, as long as relying endpoints perform CRL or OCSP checks.
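The two policies described above, pinning the immediate issuer and rejecting chains that contain a revoked certificate, can be illustrated with a toy chain check. This is a sketch under strong simplifying assumptions: certificates are plain records compared by DN, the `revoked` set stands in for CRL or OCSP results, and no signatures are verified, so it must not be mistaken for real path validation (which belongs in your TLS library).

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class Cert:
    """Hypothetical minimal certificate record (no keys, no signatures)."""

    subject: str
    issuer: str
    serial: int


def accept_client_chain(
    chain: Sequence[Cert],
    allowed_issuers: FrozenSet[str],
    revoked: FrozenSet[Tuple[str, int]],
) -> bool:
    """chain[0] is the client cert; later entries are its issuers up to the root."""
    if not chain:
        return False
    # Issuer pinning: only accept clients issued directly by an allowed CA.
    if chain[0].issuer not in allowed_issuers:
        return False
    # Revocation of any cert in the chain (leaf or intermediate CA) is fatal.
    # Serial numbers are unique per issuer, so (issuer, serial) identifies a cert.
    if any((cert.issuer, cert.serial) in revoked for cert in chain):
        return False
    # Structural check only: each cert must name the next one as its issuer.
    return all(child.issuer == parent.subject for child, parent in zip(chain, chain[1:]))
```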

Creating and managing sub-CAs

In this scenario, we will add a sub-CA that will be used to issue certificates for users’ smart cards. We assume that a profile for this purpose already exists, called userSmartCard.

To begin with, we are authenticated as admin or another user that has CA management privileges. Let’s see what CAs FreeIPA already knows about:

% ipa ca-find
1 CA matched
  Name: ipa
  Description: IPA CA
  Authority ID: d3e62e89-df27-4a89-bce4-e721042be730
  Subject DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
  Issuer DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
Number of entries returned 1

We can see that FreeIPA knows about the ipa CA. This is the "main" CA in the FreeIPA infrastructure. Depending on how FreeIPA was installed, it could be a root CA or it could be chained to an external CA. The ipa CA entry is added automatically when installing or upgrading to FreeIPA 4.4.

Now, let’s add a new sub-CA called sc:

% ipa ca-add sc --subject "CN=Smart Card CA, O=IPA.LOCAL" \
    --desc "Smart Card CA"
Created CA "sc"
  Name: sc
  Description: Smart Card CA
  Authority ID: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
  Subject DN: CN=Smart Card CA,O=IPA.LOCAL
  Issuer DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330

The --subject option gives the full Subject Distinguished Name for the new CA; it is mandatory, and must be unique among CAs managed by FreeIPA. An optional description can be given with --desc. In the output we see that the Issuer DN is that of the IPA CA.

Having created the new CA, we must add it to one or more CA ACLs to allow it to be used. CA ACLs were added in FreeIPA 4.2 for defining policies about which profiles could be used for issuing certificates to which subject principals (note: the subject principal is not necessarily the principal performing the certificate request). In FreeIPA 4.4 the CA ACL concept has been extended to also include which CA is being asked to issue the certificate.

We will add a CA ACL called user-sc-userSmartCard and associate it with all users, with the userSmartCard profile, and with the sc CA:

% ipa caacl-add user-sc-userSmartCard --usercat=all
Added CA ACL "user-sc-userSmartCard"
  ACL name: user-sc-userSmartCard
  Enabled: TRUE
  User category: all

% ipa caacl-add-ca user-sc-userSmartCard --ca sc
  ACL name: user-sc-userSmartCard
  Enabled: TRUE
  User category: all
  CAs: sc
Number of members added 1

% ipa caacl-add-profile user-sc-userSmartCard --certprofile userSmartCard
  ACL name: user-sc-userSmartCard
  Enabled: TRUE
  User category: all
  CAs: sc
  Profiles: userSmartCard
Number of members added 1

A CA ACL can reference multiple CAs individually, or, like we saw with users above, we can associate a CA ACL with all CAs by setting --cacat=all when we create the CA ACL, or later via the ipa caacl-mod command.

A special behaviour of CA ACLs with respect to CAs must be mentioned: if a CA ACL is associated with no CAs (either individually or by category), then it allows access to the ipa CA (and only that CA). This behaviour, though inconsistent with other aspects of CA ACLs, is for compatibility with CA ACLs created before sub-CAs existed. An alternative approach is being discussed and could be implemented before the final release.
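The CA-related part of the CA ACL rules, including the compatibility behaviour just described, can be summarised in a small model. It simplifies heavily: real CA ACLs also cover hosts, services, groups and profile categories, and the class and function names are hypothetical rather than FreeIPA's API. Defaulting to `ipa` when no CA is requested mirrors `ipa cert-request` defaulting to the ipa CA.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

DEFAULT_CA = "ipa"  # CA used when a request names no CA


@dataclass(frozen=True)
class CaAcl:
    """Simplified CA ACL: users, profiles and CAs only (hosts/services omitted)."""

    name: str
    enabled: bool = True
    user_category_all: bool = False
    users: FrozenSet[str] = frozenset()
    profiles: FrozenSet[str] = frozenset()
    ca_category_all: bool = False
    cas: FrozenSet[str] = frozenset()

    def allows_ca(self, ca: str) -> bool:
        if self.ca_category_all:
            return True
        if not self.cas:
            # Compatibility rule: an ACL with no CAs grants access to the ipa CA only.
            return ca == DEFAULT_CA
        return ca in self.cas

    def allows(self, user: str, profile: str, ca: str) -> bool:
        return (
            self.enabled
            and (self.user_category_all or user in self.users)
            and profile in self.profiles
            and self.allows_ca(ca)
        )


def request_permitted(acls: Iterable[CaAcl], user: str, profile: str, ca: str = DEFAULT_CA) -> bool:
    """A request is permitted if any enabled ACL matches user, profile and CA."""
    return any(acl.allows(user, profile, ca) for acl in acls)
```

For example, the `user-sc-userSmartCard` ACL from above would be `CaAcl("user-sc-userSmartCard", user_category_all=True, profiles=frozenset({"userSmartCard"}), cas=frozenset({"sc"}))`.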

Requesting certificates from sub-CAs

The ipa cert-request command has learned the --ca argument for directing the certificate request to a particular sub-CA. If it is not given, it defaults to ipa.

alice already has a CSR for the key in her smart card, so now she can request a certificate from the sc CA:

% ipa cert-request --principal alice \
    --profile userSmartCard --ca sc /path/to/csr.req
  Certificate: MIIDmDCCAoCgAwIBAgIBQDANBgkqhkiG9w0BA...
  Subject: CN=alice,O=IPA.LOCAL
  Issuer: CN=Smart Card CA,O=IPA.LOCAL
  Not Before: Fri Jul 15 05:57:04 2016 UTC
  Not After: Mon Jul 16 05:57:04 2018 UTC
  Fingerprint (MD5): 6f:67:ab:4e:0c:3d:37:7e:e6:02:fc:bb:5d:fe:aa:88
  Fingerprint (SHA1): 0d:52:a7:c4:e1:b9:33:56:0e:94:8e:24:8b:2d:85:6e:9d:26:e6:aa
  Serial number: 64
  Serial number (hex): 0x40

Certmonger has also learned the -X/--issuer option for specifying that the request be directed to the named issuer. There is a clash of terminology here; the "CA" terminology in Certmonger is already used to refer to a particular CA "endpoint". Various kinds of CAs and multiple instances thereof are supported. But now, with Dogtag and FreeIPA, a single CA may actually host many CAs. Conceptually this is similar to HTTP virtual hosts, with the -X option corresponding to the Host: header for disambiguating the CA to be used.

If the -X option was given when creating the tracking request, the Certmonger FreeIPA submit helper uses its value in the --ca option to ipa cert-request. These requests are subject to CA ACLs.
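The -X to --ca mapping can be illustrated by building the equivalent `ipa cert-request` command line. I have not checked how the Certmonger submit helper is implemented internally, so treat this only as a model of the option mapping at the level of the CLI shown above; the function name is hypothetical.

```python
from typing import List, Optional


def ipa_cert_request_argv(principal: str, profile: str, csr_path: str,
                          issuer: Optional[str] = None) -> List[str]:
    """Build an 'ipa cert-request' command line, forwarding a Certmonger -X issuer as --ca."""
    argv = ["ipa", "cert-request", "--principal", principal, "--profile", profile]
    if issuer:
        # Without --ca, ipa cert-request defaults to the ipa CA.
        argv += ["--ca", issuer]
    argv.append(csr_path)
    return argv
```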


Limitations

It is worth mentioning a few of the limitations of the sub-CAs feature, as it will be delivered in FreeIPA 4.4.

All sub-CAs are signed by the ipa CA; there is no support for "nesting" CAs. This limitation is imposed by FreeIPA – the lightweight CAs feature in Dogtag does not have this limitation. It could be easily lifted in a future release, if there is a demand for it.

There is no support for introducing unrelated CAs into the infrastructure, either by creating a new root CA or by importing an unrelated external CA. Dogtag does not have support for this yet, either, but the lightweight CAs feature was designed so that this would be possible to implement. This is also why all the commands and argument names mention "CA" instead of "Sub-CA". I expect that there will be demand for this feature at some stage in the future.

Currently, the key type and size are fixed at RSA 2048. The same is true in Dogtag, and this is a fairly high priority to address. Similarly, the validity period is fixed, and we will need to address this also, probably by allowing custom CA profiles to be used.


Conclusion

The Sub-CAs feature will round out FreeIPA’s certificate management capabilities, making FreeIPA a more attractive solution for organisations with sophisticated certificate requirements. Multiple security domains can be created for issuing certificates with different purposes or scopes. Administrators have a simple interface for creating and managing CAs, and rules for how those CAs can be used.

There are some limitations which may be addressed in a future release; the ability to control key type/size and CA validity period will be the highest priority among them.

This post examined the use cases and high-level user/administrator experience of sub-CAs. In the next post, I will detail some of the machinery that makes the sub-CAs feature work.

July 24, 2016

Looking for Andre

My brother sent out the following message. Signal boosting it here.

“A few weeks ago I started talking to a few guys on the street. (Homeless) Let’s call them James and Anthony. Let’s just skip ahead. I bought them lunch. Ok. I bought $42 worth of Wendy’s $1 burgers and nuggets and a case of water. On top of their lunch. They gathered up all their friends by the Library in Copley sq and made sure that everyone ate. It was like a cookout. You should have seen how happy everyone was. It gave me a feeling that was unexplainable.

“This morning I was in Downtown crossings. I got the feeling in my gut again. That do something better today feeling. I saw a blind guy. His eyes were a mess. He was thin. Almost emaciated. Let’s call him Andre’ he is 30 years old.



I bought him lunch. I sat with him at a table while he ate. We talked. Andre’s back story…8 years ago he was in college. He was a plumber’s apprentice. He was going on a date. As he walked up to the door to knock for the girl. Someone came up and shot him twice in the temple. Andre’ woke up in the hospital blind. To this day he has no idea who or why he was shot. The only possessions Andre’ had were the way-too-warm clothes on his back, his blind cane. His sign, and his cup. I took Andre’ to TJ Maxx. It’s 90 degrees at 9:30am. I got him a t-shirt, shorts, clean socks and underwear and a back pack. After I paid, I took him back to the dressing room so he could have some privacy while he changed. I told the lady at the dressing room that he was going in to change. She told me that wasn’t allowed. I kindly informed her that I wasn’t asking… She looked at me and quickly realized it wasn’t a request. More of a statement. I must have had a look on my face.

I get those sometimes.

She nodded her understanding. In the dressing room Andre’ cried. He was ashamed for crying. I didn’t say much. Just put my hand on his back for a second to let him know I understood. After he changed I took him back to where I originally met him and found out his routine. Where he goes when and such. I left Andre’ in his spot and went to go find James and Anthony. You remember them from the beginning of this story. They were in the same spot as a few weeks ago. They remembered me. I told them it was time to return the favor. I explained to them that I wanted them to look out for Andre’ to make sure he was safe. Andre’ has been repeatedly mugged. Who the fuck mugs a hungry homeless blind guy? Well. They must have seen the look in my face saying this wasn’t a request.

I apparently get that look sometimes.

They came with me from Copley all the way to Downtown Crossing. We went looking for Andre’. We looked all over but couldn’t find him. We went all over South Station and back up all over Downtown Crossing. (For those not familiar, Google a map of Boston.) We couldn’t find Andre’. Anthony said he’s seen him around and knew who I was talking about. They promised me they would look for him every day. I know they will too. They look out for theirs. Remember all the food I bought them and how they made sure everyone ate? James doesn’t like bullies. He sure as shit won’t tolerate someone stealing from a blind and scared homeless guy. Anthony spends his mornings in South Station. He promised me that he will find him and try to bring him to where they stay. It’s safer in numbers and when you have a crew watching your back. You have to know who to trust. That’s what they told me. I gave James and Anthony some money for their time and bought them each a cold drink.

“It’s fucking hot out.

“These guys are all on hard times. Some of them fucked up. Some were just unlucky. Andre’…now that’s some shit luck. That’s just not fucking fair. I’ve never met someone like Andre’. How in the hell would I survive if I couldn’t see? I have an amazing family and a great group of friends. Andre’ has no one. Did I change his life? Nope. Did I make his day better? I honestly hope so. I talked to him like a man. I didn’t let him know how horrible I felt for him. No matter how far you fall in life, if you have the strength to get up each day and try to feed yourself, you still have pride, you still have hope. I didn’t want to take away any of his pride. He doesn’t have much to begin with. But he must have a little. I will continue to look for Andre’ every day. I met him near my office. I can look during my lunch. I have to find him and keep an eye on him.

“No matter how bad things get. No matter how unfair you feel you have been treated. Pretty much no matter what your lot in life is. Think of Andre’ when you feel down. If he has the strength to go on… So do you.

“I didn’t write this to say ‘look what great things I did.’ I wish I could write this without being part of the story. There is no way I could express how much meeting Andre’ has affected me without letting you know this is what I did today.

“I just got home from this experience. I’ll update this when I find Andre’ and let you know how he’s doing. If anyone in Boston reads this and sees a black guy about my height. Thinner than me…Obviously blind.

“Please hashtag #lookingforAndre and tell me where you saw him. Like I said, South Station or Downtown Crossing are the areas that I know of. Thank you for reading this. Help me find Andre’.”

And then he sent this

“I found Andre’. He is meeting me for breakfast tomorrow.”



Billy set up a fundraising account for Andre.


July 20, 2016

IoT Gateways

After discussing the “thing” part of IoT in Devices – the “Thing” in IoT, let’s take a look at overall IoT system design.

IoT Gateways connect IoT Devices to IoT back-end systems. Gateways connect to devices using interfaces like Ethernet, WiFi, Bluetooth, 6LoWPAN, RS-485 and CANbus. Gateways connect to back-end systems through the Internet, commonly using Ethernet, WiFi, or cellular connections. Gateways perform multiple tasks, including aggregation of multiple devices, protocol conversion, device management, and security. Gateways may also perform application processing.

Since IoT Gateways are connected directly to IoT Devices they have to be co-located with the Devices. This means that gateways are deployed in hostile environments. They are accessed through network interfaces connecting both to local devices and to the Internet. People have physical access to the gateways. Users need access to the gateway to perform a variety of functions such as device discovery and registration. These users may be inexperienced, malicious, or both.

Gateways will often need to function while disconnected from the Internet. Such disconnected operation may be deliberate – a low power sensor may only connect to the network once a day, and spend the rest of the time in a low power sleep state. A system on a moving vehicle such as a truck, train, or ship may have critical communications through an expensive, low bandwidth cellular link, and then intermittently connect to a high bandwidth link such as WiFi. This might occur when a truck pulls into a warehouse or service station, when a ship docks, or when a train enters a station. These systems would be designed for disconnected operation. Another case might be a hospital, which needs to continue operations, perhaps in a degraded mode, in the event that network connectivity, power, or other resources fail. It is clearly unacceptable for a hospital to shut down if it loses connection to the cloud!
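One common way to handle this kind of disconnected operation (not something the post prescribes) is store-and-forward: the gateway always writes readings locally and uploads them whenever a link is available. The sketch below assumes one reading per line, a local spool file, and a hypothetical `SEND_CMD` uploader that exits 0 when the back end accepted a line; all of these names are illustrative.

```shell
#!/bin/sh
# Store-and-forward sketch for a gateway that is often offline.
# Assumptions (not from the post): one reading per line, a local spool
# file, and SEND_CMD is a hypothetical uploader that exits 0 when the
# back end accepted the line.

SPOOL="${SPOOL:-/var/spool/gateway/readings.txt}"
SEND_CMD="${SEND_CMD:-false}"   # placeholder; replace with a real uploader

# Record every reading locally first, so nothing is lost while disconnected.
enqueue() {
    printf '%s\n' "$1" >> "$SPOOL"
}

# Upload queued readings in order. Stop sending at the first failure and
# keep that reading and everything after it for the next connection window.
flush() {
    [ -s "$SPOOL" ] || return 0
    remaining="$SPOOL.remaining"
    : > "$remaining"
    failed=0
    while IFS= read -r line; do
        if [ "$failed" -eq 0 ] && $SEND_CMD "$line" < /dev/null; then
            continue
        fi
        failed=1
        printf '%s\n' "$line" >> "$remaining"
    done < "$SPOOL"
    mv "$remaining" "$SPOOL"
    return "$failed"
}
```

In the truck example, `enqueue` would run for every sensor reading, and `flush` would run when the truck reaches warehouse WiFi. Stopping at the first failure keeps readings in their original order.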

These situations mean that a complete software stack needs to be installed on the gateway, with all of the management, update, and access challenges that this presents.

While gateways will most commonly be structured as application-specific appliances, there are many ways to use gateways.

July 18, 2016

Using a HooToo Nano as a magic VPN box
I've been getting myself ready for Blackhat. If you're going, you know this conference isn't like most. You don't bring your normal gear with you. You turn the tinfoil hat knob up to 11, then keep turning it until it breaks off. I did do one thing that's pretty clever this year, though, and I have no doubt it could be useful for someone else putting together an overengineered tinfoil hat security rig.

When I travel I use a little travel router from HooToo. Specifically this one. The basic idea is I can use either Ethernet or WiFi to connect all my devices to the Internet. I get my own private network behind the device, which lets the Chromecast work in a hotel and means I don't have to log in 15 devices every day. This got me thinking, though: wouldn't it be cool if the HooToo router could VPN for me?

Enter the HooToo Nano.

Now I'm sure I could have found a travel router someone makes that does VPN, but that's not nearly as exciting as figuring this out myself, bricking it a few times, unbricking it, and eventually having a solution that works well enough I can live with it. You can install OpenWRT on it, which makes it an insanely awesome device.

Here are the basics. I connect the router to a wireless network (which is a pain to do with OpenWRT). Once I'm connected, I flip the switch on the side of the Nano and it connects to the VPN; a green light turns on once the VPN is active. Everyone knows green means good, right? If I flip the switch back, it turns the VPN off (and the green light turns off). The biggest problem was a bug in OpenWRT where, if one of the wireless networks it's configured to connect to can't be found, none of the wireless will come up. My solution is to hit the reset button to return the router to a known good state.

In the spirit of open source, I'll explain how to do all this. Your mileage may vary, it's not simple, but let's face it, it's awesome. I have a magic box that when the green light turns on, I no longer have to worry about the scary local wifi. Perfect for a conference where nobody and nothing can be trusted.

On with the show.

First, you need a HooToo Nano (this is easy). Then you install OpenWRT (this is less easy). I'm not going to explain this part. Apart from it already being documented, I don't want to do it again just to write it down. I have things working, and I'm not touching anything.

Next you need to get OpenVPN working on it. I followed these instructions from the IPredator folks.

At this point you should have a functioning VPN if you run the init.d openvpn script. With the VPN up, I set up a firewall target called 'vpn'. That name will be important later.

First, we will need to create a nice default configuration. As I said before, OpenWRT has a bug where if one of your wireless networks can't be found, none will work. As I don't have time to figure that bug out right now, I put together some configuration files that only have one wireless network, configured as an access point. This configuration exists so I can connect to the router and set up more networks. I then copied all the configuration files from /etc/config to /root/config/.
Then I edited /etc/rc.button/reset to add the line

cp /root/config/* /etc/config/

right before the sync and reboot commands. By doing this I can hit the reset button with a paperclip to return the router to my default settings. Also, as a side note, if you hold the reset button down for more than 5 seconds it will do an OpenWRT factory reset, so don't do that.
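The save and restore steps above can be sketched as two small functions. `save_defaults` and `restore_defaults` are hypothetical names, and the directories are parameters only so the sketch can be tried away from the router. On the Nano they would be /etc/config (live) and /root/config (saved copy), and the restore is just the one `cp` line added to the reset script.

```shell
#!/bin/sh
# Sketch of the known-good-defaults trick. save_defaults is run once by
# hand after configuring the single access point; restore_defaults is the
# same cp the post adds to /etc/rc.button/reset. Directory arguments are
# only there so the sketch can be tried off the router.

save_defaults() {
    live="${1:-/etc/config}"
    saved="${2:-/root/config}"
    mkdir -p "$saved" && cp "$live"/* "$saved"/
}

restore_defaults() {
    saved="${1:-/root/config}"
    live="${2:-/etc/config}"
    # On the router this runs just before the reset script's sync and reboot.
    cp "$saved"/* "$live"/
}
```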

Lastly, we set up the switch. The best way I could find to read it was by creating the directory /etc/hotplug.d/button, then adding an executable script called "buttons" to it.
root@OpenWrt:~# cat /etc/hotplug.d/button/buttons
. /etc/profile
#logger the button was $BUTTON and the action was $ACTION
if test "$BUTTON" = 'BTN_0'; then
    if test "$ACTION" = 'pressed'; then
        # Switch on: forward LAN traffic to the 'vpn' firewall target, start OpenVPN
        uci set firewall.@forwarding[0].dest='vpn'
        /etc/init.d/openvpn start
    fi
    if test "$ACTION" = 'released'; then
        # Switch off: forward LAN traffic back to 'wan', stop OpenVPN
        uci set firewall.@forwarding[0].dest='wan'
        /etc/init.d/openvpn stop
    fi
    # Apply the new forwarding rule
    /sbin/fw3 reload
fi
As you can see in the script, pressing the switch sets the forwarding target to my 'vpn' firewall, and releasing it sets the target back to 'wan'. If you named your VPN firewall something else, be sure to change it.

Without a doubt these instructions aren't as clear as they should be. I don't have time right now to write this up properly. Someday I would love to put together an OpenWRT image with all this baked in, but for the moment I hope it's useful for someone.

If you try this and have questions, feel free to find me on Twitter: @joshbressers

July 11, 2016

Entry level AI
I was listening to the podcast Security Weekly and the topic of using AI for security work came up. This got me thinking about how most people make their way into security and what something like AI might mean for the industry.

In virtually every industry you start out doing some sort of horrible job nobody else wants to do, but you have to start there because it's the place you start to learn the skills you need for more exciting and interesting work. Nobody wants to go over yesterday's security event log, but somebody does it.

Now consider this in the context of AI. AI can and will parse the event logs faster and better than a human ever could. We're terrible at repetitive boring tasks. Computers are awesome at repetitive boring tasks. It might take the intern two hours to parse the log files; it will take the log parser two seconds. And the computer won't start thinking about donuts halfway through. Of course, there are plenty of arguments that today's AI has problems, which is true. It's still probably better than humans, though.

But here is what really got me thinking. As more and more of this work moves to the domain of AI and machines, what happens to the entry level work? I'm all for replacing humans with robots. Without getting into the conversation about what all the humans will do when the robots take over, I'm more interested in entry level work and where the new talent comes from.

For the foreseeable future, we will need people to do the highly skilled security work. By definition most of the highly skilled people are a bit on the aged side. Most of us worked our way up from doing something that can be automated away (thank goodness). But where will we get our new batch of geezers from? If there are no entry level offerings, how can security people make the jump to the next level? I'm sure right now there are a bunch of people standing up screaming "TRAINING", but let's face it, that only gets you a little way there. You still need to get your hands dirty before you're actually useful. You're not going to trust a brain surgeon who has never been in an operating room but has all the best training.

I don't have any answers or even any suggestions here. It just happened to get me thinking. It's possible automation will follow behind the geezers which would be a suitable solution. It's possible we'll need to make some token entry level positions just to raise the skill levels.

What do you think? @joshbressers

July 10, 2016


The term Liveness here refers to the need to ensure that the data used to make an authorization check is valid at the time of the check.

The mistake I made with PKI tokens was in not realizing how important Liveness was.  The mistake was based on the age old error of confusing authentication with authorization.  Since a Keystone token is used for both, I was confused into thinking that the primary importance was on authentication, but the reality is that the most important thing a token tells you is information essential to making an authorization decision.

Who you are does not change often.  What you can do changes much more often.  What OpenStack needs in the token protocol is a confirmation that the user is authorized to make this action right now.  PKI tokens, without revocation checks, lost that liveness check.  The revocation check undermined the primary value of PKI.

That is the frustration most people have with certificate revocation lists (CRLs).  Since certificates are so long-lived, there is very little “freshness” to the data.  A CRL is a way to say “not invalidated yet,” but, since a cert might carry more data than just “who are you,” certificates can often become invalid.  Thus, any active system built on X509 for authorization (not just authentication) is going to have many, many revocations.  Keystone tokens fit that same profile.  The return to server-validated tokens (UUID or Fernet) restores that freshness check.

However, bearer tokens have a different way of going stale.  If I get a token and use it immediately, the server can be highly confident that the token came from me.  If I wait, that confidence drops.  The more I use the same token, and the longer I use it, the greater the probability that someone other than me is going to get access to that token.  And that means the probability that it is going to be misused has also increased.

I’ve long said that what I want is a token that lasts roughly five minutes.  That means that it is issued, used, and  discarded, with a little wiggle room for latency and clock skew across the network.  The problem with this is that a token is often used for a long running task.  If a task takes 3 hours, but a token is good for only five minutes, there is no way to perform the task with just that token.

One possible approach to returning this freshness check is to always have some fresh token on a call, just not necessarily the one that the user originally requested.  This is the idea behind the Trust API.  A Trust is kind-of-like a long term token, but one that is only valid when paired with a short term token for the trustee.  But creating a trust every time a user wants to create a new virtual machine is too onerous, too much overhead.  What we want instead is a rule that says:

When Nova calls Glance on behalf of a user, Nova passes a freshly issued token for itself along with the original user’s token.  The original user’s token will be validated based on when it was issued.  Authorization requires the combination of a fresh token for the Nova service user and a not-so-fresh-but-with-the-right-roles token for the end user.

This could be done with no changes to the existing token format. Set the token expiration to 12 hours.  The only change would be inside python-keystonemiddleware.  It would have a pair of rules:

  1. If a single token is passed in, it must have been issued within five minutes.  Otherwise, the operation returns a 401.
  2. If a service token is passed in with the user’s token, the service token must have been issued within five minutes.  The user’s token is validated normally.
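The two rules can be sketched as plain decision logic. This is a hypothetical illustration, not keystonemiddleware code: times are Unix epoch seconds, `authorize` is an invented name, and the issue times stand in for whatever the validated token data would report. The 12-hour lifetime and five-minute window come from the text above.

```shell
#!/bin/sh
# Hypothetical sketch of the two proposed rules (decision logic only, not
# keystonemiddleware code). Times are Unix epoch seconds; the "issued"
# values would come from the validated token data.

FRESH_WINDOW=300       # "issued within five minutes"
TOKEN_LIFETIME=43200   # "set the token expiration to 12 hours"

# usage: authorize NOW USER_ISSUED [SERVICE_ISSUED]
# Prints 200 if the call is allowed, 401 otherwise.
authorize() {
    now=$1 user_issued=$2 service_issued=${3:-}
    # Normal validation: the user's token must not have expired.
    if [ $((now - user_issued)) -gt "$TOKEN_LIFETIME" ]; then
        echo 401
    elif [ -z "$service_issued" ]; then
        # Rule 1: a token passed in on its own must itself be fresh.
        if [ $((now - user_issued)) -le "$FRESH_WINDOW" ]; then echo 200; else echo 401; fi
    else
        # Rule 2: the service token must be fresh; the user's token only
        # has to pass normal validation (checked above).
        if [ $((now - service_issued)) -le "$FRESH_WINDOW" ]; then echo 200; else echo 401; fi
    fi
}
```

For example, a user token issued ten minutes ago gets a 401 on its own, but the same token is accepted when Nova sends a service token issued a moment ago alongside it.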

An additional scope limiting mechanism would further reduce the possibility of abuse.  For example,

  • Glance could limit the service-token scoped operations from Nova to fetching an image and saving a snapshot.
  • Nova might only allow service-scoped tokens from a service like Trove within a 15 minute window.
  • A user might have to ask for an explicit “redelegation” role on a token before handing it off to some untrusted service run off site.
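The first two scope limits could be expressed as small allowlist checks. The sketch below is hypothetical: `glance_allows`, `nova_allows`, and the operation names are illustrative, not real Keystone or policy targets. It also reads "within a 15 minute window" as "the service token is at most 900 seconds old", which is an assumption.

```shell
#!/bin/sh
# Hypothetical scope-limiting checks for the first two examples above.
# Service and operation names are illustrative, not real policy targets.

# Glance: when Nova presents a user's token with its own service token,
# allow only fetching an image and saving a snapshot.
glance_allows() {
    case "$1:$2" in
        nova:fetch_image|nova:save_snapshot) return 0 ;;
        *) return 1 ;;
    esac
}

# Nova: accept service-scoped calls only from Trove, and only if the
# service token is at most 15 minutes (900 seconds) old.
nova_allows() {
    [ "$1" = trove ] && [ "$2" -le 900 ]
}
```

With this table, `glance_allows nova fetch_image` succeeds while `glance_allows nova delete_image` fails. Anything not explicitly listed is refused.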

With Horizon, we already have a mechanism that says that it has to fetch an unscoped token first, and then use that to fetch a scoped token.  Horizon can be smart enough to fetch a scoped token before each bunch of calls to a remote server, cache it for only a minute, and use the unscoped token only in communication with Keystone.  The unscoped token, being validated by Keystone, is sufficient for maintaining “Liveness” of the rest of the data for a particular workflow.

It’s funny how little change this would require to OpenStack, and how big an impact it would make on security.  It is also funny how long it took for this concept to coalesce.

July 09, 2016

Tokens without revocation

PKI tokens in Keystone suffered from many things, most essentially the trials due to the various forms of revocation. I never wanted revocation in the first place. What could we have done differently? It just (I mean moments ago) came to me.

A PKI token is a signed document that says “at this point in time, these things are true” where “these things” have to do with users’ roles in projects. Revocation means “these things are no longer true.” But long running tasks need long running authentication. PKI tokens seem built for that.

What we should distinguish is the difference between kicking off a new job and continued authorization for an old job. When a user requests something from Nova, the only identity that comes into play is the user’s own identity. Nova needs to confirm this, but, in a PKI token world, there is no need to go and ask Keystone.

In a complex operation like launching a VM, Nova needs to ask Glance to do something. Today, Nova passes on the token it received, and all is well. This makes tokens into true bearer tokens, and they are passed around far too much for my comfort.

Let’s say that, to start, when Nova calls Glance, Nova’s own identity should be confirmed. Tokens are really poor for this; a much better way would be to use X509. While Glance would need to do a mapping transform, the identity of Nova would not be transferable. Put another way, Nova would not be handing off a bearer token to Glance. Bearer tokens from powerful systems like Nova are a really scary thing.

If we had this combination of user-confirmed-data and service-identity, we would have a really powerful delegation system. Why could this not be done today, with UUID/Fernet tokens? If we only ever had to deal with a max of two hops, (Nova to Glance, Nova to Neutron) we could.

Enter Trove, Heat, Sahara, and any other process that does work on behalf of a user. Let’s make it really fun and say that we have the following chain of operations:


If any one link in this chain is untrusted, we cannot pass tokens along.
What if, however, each step had a rule that said “I can accept tokens for users from Endpoint E” and passed a PKI token along?  The user submits a PKI token to Heat.  Heat passes this, plus its own identity, on to Sahara, which trusts Heat.  And so on down the line.

OK…revocations.  We say here that a PKI token is never revoked.  We make it valid for the length of long running operations…say a day.

But we add an additional rule:  A user can only use a PKI token within 5 minutes of issue.

Service to Service calls can use PKI tokens to say “here is when it was authorized, and it was good then.”

A user holds on to a PKI token for 10 minutes, tries to call Nova, and the token is rejected as “too old.”
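Taken together, the rules above amount to a per-hop check. The sketch below is hypothetical: `accept` and `trusted_callers` are invented names, and the trust table (Sahara trusts Heat) reflects only the one example given in the text. It shows a user call rejected after 5 minutes, while a trusted service can still forward the same day-long token.

```shell
#!/bin/sh
# Hypothetical sketch of the per-hop rule "I can accept tokens for users
# from Endpoint E", combined with the 5 minute user window and a PKI token
# that stays valid for a day. The trust table is made up for illustration.

USER_WINDOW=300      # users must present a PKI token within 5 minutes
PKI_LIFETIME=86400   # the token itself is valid for a day

# Which services each service trusts to forward a user's token.
trusted_callers() {
    case "$1" in
        sahara) echo "heat" ;;
        *)      echo "" ;;
    esac
}

# usage: accept SERVICE CALLER TOKEN_AGE_SECONDS
# CALLER is "user" for a direct call, otherwise the forwarding service.
accept() {
    service=$1 caller=$2 age=$3
    [ "$age" -le "$PKI_LIFETIME" ] || return 1
    if [ "$caller" = user ]; then
        # A user holding the token for 10 minutes is rejected as "too old".
        [ "$age" -le "$USER_WINDOW" ] && return 0
        return 1
    fi
    # Service to service: "here is when it was authorized, and it was good then."
    for t in $(trusted_callers "$service"); do
        [ "$t" = "$caller" ] && return 0
    done
    return 1
}
```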

This same structure would work with Fernet tokens, assuming a couple things:

  1. We get rid of revocation checks for tokens validated with service tokens.
  2. If a user loses a role, we are OK with having a long term operation depending on that role failing.

I think this general structure would make OpenStack a hell of a lot more scalably secure than it is today.

Huge thanks to Jamie Lennox for proposing a mechanism along these lines.

Bypassing Version Discovery in Keystoneauth1

I’ve been a happy Dreamhost customer for many years.  So I was thrilled when I heard that they had upgraded Dreamcompute to Mitaka.  So, like the good Keystoner that I am, I went to test it out.  Of course, I tried to use the V3 API.   And it failed.

What?  Dreamhost wouldn’t let me down, would they?

No.  V3 works fine, it is discovery that is misconfigured.

If you do not tell the openstack client (and thus keystoneauth1) what plugin to use, it defaults to the non-version-specific Password plugin, which does version discovery.  What this means is it will go to the auth URL you give it and try to figure out what the right version to use is.  And it so happens that there is a nasty bit of Keystone, which is not well documented, that makes the Dreamhost /v3 page look like this:

$ curl $OS_AUTH_URL
{"version": {"status": "stable", "updated": "2013-03-06T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}, {"base": "application/xml", "type": "application/vnd.openstack.identity-v3+xml"}], "id": "v3.0", "links": [{"href": "", "rel": "self"}]}}

See that last link?

Now, like a good service provider, Dreamhost keeps its Keystone administration inside, behind their firewall.


Non-authoritative answer:

[ayoung@ayoung541 dreamhost]$ curl

Crickets…hangs.  Same with a request to 35357.  And since the Password auth plugin is going to use the URL from the /v3 page, which is

To get around this, Dreamhost will shortly change their Keystone config file.  If they have the baseline config shipped with Keystone, they have, in the section:


admin_endpoint = <None>

This is what discovery uses to build the URL above.  Yeah, it is dumb.  Instead, they will set it to

And discovery will work.

But I am impatient, and I want to test it now. The workaround is to bypass discovery and specify the V3 version of the Keystoneauth1 Password plugin. The version-specific plugin uses the AUTH_URL as provided to figure out where to get tokens. With the line:

export OS_AUTH_TYPE=v3password
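For context, here is a sketch of an openrc-style environment with discovery bypassed. Only `OS_AUTH_TYPE=v3password` comes from the post. The other variable names are the standard openstack client ones, and every value is a placeholder (the real Dreamhost URL is not shown here). Because the version-specific plugin uses the URL as given, the auth URL should point at the /v3 endpoint itself.

```shell
# Sketch of an environment that skips version discovery.
# Only OS_AUTH_TYPE=v3password is from the post; all values are placeholders.
export OS_AUTH_TYPE=v3password
export OS_AUTH_URL=https://keystone.example.com:5000/v3   # placeholder; keep the /v3
export OS_USERNAME=your-username
export OS_PASSWORD=your-password
export OS_PROJECT_NAME=your-project
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_IDENTITY_API_VERSION=3
```

A quick `openstack token issue` afterwards confirms that authentication works before trying heavier commands.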

And now…

$ openstack server show   
| Field                                | Value                                                   |
| OS-DCF:diskConfig                    | MANUAL                                                  |
| OS-EXT-AZ:availability_zone          | iad-1                                                   |
| OS-EXT-STS:power_state               | 1                                                       |
| OS-EXT-STS:task_state                | None                                                    |
| OS-EXT-STS:vm_state                  | active                                                  |
| OS-SRV-USG:launched_at               | 2016-06-17T03:28:48.000000                              |
| OS-SRV-USG:terminated_at             | None                                                    |
| accessIPv4                           |                                                         |
| accessIPv6                           |                                                         |
| addresses                            | private-network=2607:f298:6050:499d:f816:3eff:fe6a:afdb, 
                                               ,             |
| config_drive                         |                                                         |
| created                              | 2016-06-17T03:27:09Z                                    |
| flavor                               | warpspeed (400)                                         |
| hostId                               | 4a7c64b912cfeda73c2c56ac52e8ffd124aac29ec54e1e4902d54bd4|
| id                                   | f0f46fd3-fa59-4a5b-835d-a638f6276566                    |
| image                                | CentOS-7 (c1e8c5b5-bea6-45e9-8202-b8e769b661a4)         |
| key_name                             | ayoung-pubkey                                           |
| name                                 |                                      |
| os-extended-volumes:volumes_attached | []                                                      |
| progress                             | 0                                                       |
| project_id                           | 9c7e4956ea124220a87094a0a665ec82                        |
| properties                           |                                                         |
| security_groups                      | [{u'name': u'ayoung-all-open'}]                         |
| status                               | ACTIVE                                                  |
| updated                              | 2016-06-17T03:28:24Z                                    |
| user_id                              | b6fd4d08f2c54d5da1bb0309f96245bc                        |

And how cool is that: they are using IPv6 for their private network.

If you want to generate your own V3 config file from the file they ship, use this.

July 08, 2016

Installing FreeIPA in as few lines as possible

I had this in another post, but I think it is worth its own.

sudo hostnamectl set-hostname --static undercloud.ayoung-dell-t1700.test
export address=`ip -4 addr  show eth0 primary | awk '/inet/ {sub(/\/.*/, "", $2); print $2}'`
echo $address `hostname` | sudo tee -a /etc/hosts
sudo yum -y install ipa-server-dns
export P=FreIPA4All
sudo ipa-server-install -U -r `hostname -d|tr "[a-z]" "[A-Z]"` -p $P -a $P --setup-dns `awk '/^name/ {print "--forwarder",$2}' /etc/resolv.conf`

Just make sure you have enough entropy.
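A quick way to check is to read the kernel's entropy estimate before running the installer. /proc/sys/kernel/random/entropy_avail is a standard Linux interface. The `enough_entropy` helper and the 200-bit threshold are illustrative assumptions, not an IPA requirement.

```shell
#!/bin/sh
# Check available kernel entropy before running ipa-server-install.
# The 200-bit threshold is an arbitrary illustration, not an IPA requirement.

MIN_ENTROPY=${MIN_ENTROPY:-200}

# usage: enough_entropy [FILE] -> exit 0 if the pool looks large enough
enough_entropy() {
    avail=$(cat "${1:-/proc/sys/kernel/random/entropy_avail}") || return 1
    [ "$avail" -ge "$MIN_ENTROPY" ]
}
```

If this reports too little on a fresh VM, one common remedy is to run an entropy daemon such as haveged or rngd before starting the install.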

Merging FreeIPA and Tripleo Undercloud Apache installs

My experiment yesterday left me with a broken IPA install. I aim to fix that.

To get to the start state:

From my laptop, kick off a Tripleo Quickstart, stopping prior to undercloud deployment:

./ --teardown all -t  untagged,provision,environment,undercloud-scripts  ayoung-dell-t1700.test

SSH in to the machine …

ssh -F /home/ayoung/.quickstart/ssh.config.ansible undercloud

and set up FreeIPA:

$ cat


sudo hostnamectl set-hostname --static undercloud.ayoung-dell-t1700.test
export address=`ip -4 addr  show eth0 primary | awk '/inet/ {sub(/\/.*/, "", $2); print $2}'`
echo $address `hostname` | sudo tee -a /etc/hosts
sudo yum -y install ipa-server-dns
export P=FreIPA4All
sudo ipa-server-install -U -r `hostname -d|tr "[a-z]" "[A-Z]"` -p $P -a $P --setup-dns `awk '/^name/ {print "--forwarder",$2}' /etc/resolv.conf`

Backup the HTTPD config directory:

 sudo cp -a /etc/httpd/ /root

Now continue the undercloud install.


Once that is done, the undercloud passes a sanity check. Doing a diff between the two directories shows a lot of differences.

sudo diff -r /root/httpd  /etc/httpd/

All of the files in /etc/httpd/conf.d that were placed by the IPA install are gone, as are the following module files in /root/httpd/conf.modules.d

Only in /root/httpd/conf.modules.d: 00-base.conf
Only in /root/httpd/conf.modules.d: 00-dav.conf
Only in /root/httpd/conf.modules.d: 00-lua.conf
Only in /root/httpd/conf.modules.d: 00-mpm.conf
Only in /root/httpd/conf.modules.d: 00-proxy.conf
Only in /root/httpd/conf.modules.d: 00-systemd.conf
Only in /root/httpd/conf.modules.d: 01-cgi.conf
Only in /root/httpd/conf.modules.d: 10-auth_gssapi.conf
Only in /root/httpd/conf.modules.d: 10-nss.conf
Only in /root/httpd/conf.modules.d: 10-wsgi.conf

To start, I am going to back up the existing HTTPD directory:

 sudo cp -a /etc/httpd/ /home/stack/

The rest of this is easier to do as root, as I want some globbing. First, I’ll copy over the module config files:

 sudo su
 cp /root/httpd/conf.modules.d/* /etc/httpd/conf.modules.d/
 systemctl restart httpd.service

Test Keystone

 . ./stackrc 
 openstack token issue

Get a token…good to go…OK, let’s try the conf.d files.

sudo cp /root/httpd/conf.d/* /etc/httpd/conf.d/
systemctl restart httpd.service

Then, as a non-admin user:

$ kinit admin
Password for admin@AYOUNG-DELL-T1700.TEST: 
[stack@undercloud ~]$ ipa user-find
1 user matched
  User login: admin
  Last name: Administrator
  Home directory: /home/admin
  Login shell: /bin/bash
  UID: 776400000
  GID: 776400000
  Account disabled: False
  Password: True
  Kerberos keys available: True
Number of entries returned 1

This is a fragile deployment, as updating either FreeIPA or the Undercloud has the potential to break one or the other…or both. But it is a start.

De-conflicting Swift-Proxy with FreeIPA

Port 8080 is a popular port. Tomcat uses it as the default port for unencrypted traffic. FreeIPA installs Dogtag, which runs in Tomcat. Swift proxy also chose that port number for its traffic. This means that if one runs on that port, the other cannot. Of the two, it is easier to change FreeIPA, as the port is only used for internal traffic, whereas Swift’s port is in the service catalog and the documentation.

Changing the port in FreeIPA requires modifications in both the config directories for Dogtag and the Python code that contacts it.

The Python changes are in


Look for any instance of 8080 and change it to another port that will not conflict. I chose 8181.

The config changes for dogtag are in /etc/pki such as /etc/pki/pki-tomcat/ca/CS.cfg and again, change 8080 to 8181.
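The search-and-replace on the Dogtag side can be sketched as follows. `who_has_port` and `change_port` are hypothetical helper names. `ss` comes from iproute, and GNU grep/sed are assumed (`-w`, `-i`, `\b`). The rewrite only touches 8080 as a whole word, so strings like 18080 are left alone. It is still worth backing up /etc/pki first and reading the list of changed files, since not every 8080 there is necessarily the Dogtag HTTP port.

```shell
#!/bin/sh
# Sketch: see who is listening on a port, then rewrite 8080 -> 8181 as a
# whole word in every file under a directory. GNU grep/sed assumed.

OLD_PORT=8080
NEW_PORT=8181

# Show the listening process on the given port (run as root to see names).
who_has_port() {
    ss -ltnp "sport = :$1"
}

# usage: change_port DIR -> prints each file it modified
change_port() {
    grep -rlw "$OLD_PORT" "$1" | while IFS= read -r f; do
        sed -i "s/\b$OLD_PORT\b/$NEW_PORT/g" "$f"
        echo "updated $f"
    done
}
```

For example, run `who_has_port 8080` to confirm that Tomcat holds the port, then `change_port /etc/pki/pki-tomcat` as root, and finally restart IPA.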

Restart the server with:

sudo systemctl restart ipa.service

To confirm run a command that hits the CA:

 ipa cert-find

I have a ticket in with FreeIPA to try and get support for this in.

With these changes made, I then tested installing the undercloud on the same node, and it seems to work.

However, the IPA server is no longer running. The undercloud install seems to have cleared out the IPA config files under /etc/httpd/conf.d. However, Dogtag is still running, as shown by

curl localhost:8181

Next experiment will be to see if I can preserve the IPA configuration

July 05, 2016

Launching a Centos VM in Tripleo Overcloud

My Overcloud deploy does not have any VM images associated with it. I want to test launching a VM.

Get the VM from Centos:

curl -O
unxz < CentOS-7-x86_64-GenericCloud.qcow2.xz >CentOS-7-x86_64-GenericCloud.qcow2
glance --os-image-api-version 2 image-create --name 'CentOS-7-x86_64-GenericCloud' --disk-format qcow2 --container-format bare --file CentOS-7-x86_64-GenericCloud.qcow2

Wait for that to finish, and check with

$ openstack image list
| ID                                   | Name                         | Status |
| 06841fb4-df1c-458d-898e-aea499342905 | CentOS-7-x86_64-GenericCloud | active |

Now launch it:

openstack server create --flavor m1.small --image CentOS-7-x86_64-GenericCloud testrun

And it becomes active pretty quickly:

$ openstack server list
| ID                                   | Name    | Status | Networks |
| 76585723-e2c3-4acb-88d5-837b69000f72 | testrun | ACTIVE |          |
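Instead of re-running `openstack server list` by hand, a small polling loop can wait for the status. `openstack server show -f value -c status` prints only the status column. `wait_for_status` and the `STATUS_CMD` hook are hypothetical names, added only so the loop can be tried without a cloud.

```shell
#!/bin/sh
# Sketch: poll until a server reaches a target status. STATUS_CMD is a
# hypothetical hook; by default it asks the openstack client for just
# the status column of the named server.

STATUS_CMD=${STATUS_CMD:-"openstack server show -f value -c status"}

# usage: wait_for_status SERVER WANTED [TRIES] [DELAY_SECONDS]
wait_for_status() {
    server=$1 wanted=$2 tries=${3:-30} delay=${4:-5}
    while [ "$tries" -gt 0 ]; do
        status=$($STATUS_CMD "$server")
        [ "$status" = "$wanted" ] && return 0
        [ "$status" = ERROR ] && return 1   # give up early on a failed build
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

Usage: `wait_for_status testrun ACTIVE` returns once the server is ACTIVE, or fails after about two and a half minutes with the defaults.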

It has no network capability. To destroy it:

openstack server delete 76585723-e2c3-4acb-88d5-837b69000f72
But I have work to do!
There’s a news story going around that talks about how horrible computer security tends to be in hospitals. This probably doesn’t surprise anyone who works in the security industry. Security is often something that gets in the way, not something that helps get work done.

There are two really important lessons we should take away from this. The first is that a doctor or nurse isn’t a security expert, doesn’t want to be a security expert, and shouldn’t be a security expert. Their job is helping sick people. We want them helping sick people, especially if we’re the people who are sick. The second is that when security gets in the way, security loses. Security should lose when it gets in the way; we’ve been winning far too often, and it’s critically damaged the industry.

They don’t want to be security experts

It’s probably not a surprise that doctors and nurses don’t want to be computer security experts. I keep going back and forth between “you need some basics” and “assume nothing”. I’m back to the assume nothing camp this week. I think in the context of health care workers, security can’t exist, at least not the way we see it today. These are people and situations where seconds can literally be the difference between life and death. Will you feel better knowing the reason your grandma died was because they were using strong passwords? Probably not. In the context of a hospital, if there is any security it has to be totally transparent, the doctors shouldn’t have to know anything about it, and it should work 100% of the time. This is of course impossible.

So the real question isn’t how we make security 100% reliable; the question is where we draw our risk line. We want this line as far from perfect security and as close to saving lives as possible. If we start to think in this context, it changes our requirements quite a lot. There will be a lot of “good enough security”. There will be a lot of hard choices to make, and anyone who can make them will have to be extremely knowledgeable about both health care and security. I bet there aren’t a lot of people who can do this today.

This leads us to point #2:

When security gets in the way, security loses

If you’re a security person, you see people do silly and crazy things all the time. Every day, all day. How many times a day do you ask “why did you do that”? Probably zero. It’s more likely you say “don’t do that” constantly. If you have kids, you know the best way to get them to do something is to say “don’t do that”. If we think about security in the context of a hospital, the answer to “why did you do that” is pretty simple: the choice was probably between getting the job done and following the security guidelines. A hospital is one of the extremes where it’s easy to justify breaking the rules. If you don’t, people die. In most office settings, if you break the rules nobody dies; at worst there may be a security event that costs time and money. Historically, in an office environment, we tell people “don’t do that” and expect them to listen; in many cases they only pretend to.

This attitude of “listen to me because” has created a security universe where we don’t pay attention to what people are actually doing, because we don’t have to. We get in the way, then when someone tries to get their work done, we yell at them for not following our bizarre and esoteric rules instead of understanding the challenge and how we can solve it together. The next great challenge we have isn't tighter rules or better training; it's the big picture. How can we start looking at systems with a big-picture view? It won't be easy, but it's where we go next.

What do you think? Let me know: @joshbressers

July 01, 2016

Clearing the Keystone Environment

If you spend a lot of time switching between different clouds, different users, or even different projects for the same user when working with OpenStack, you’ve come across the problem where one environment variable from an old sourcing pollutes the current environment. I’ve been hit by that enough times that I wrote a small script to clear the environment.

I call it clear_os_env

unset OS_TOKEN
unset OS_URL
unset OS_USER_ID
unset OS_SERVICE_TOKEN

Source this prior to sourcing any keystone.rc file, and you should have cleared out the old variables, regardless of how vigilant the writer of the new file was in clearing old variables. This includes some old variables that should no longer be used, like OS_SERVICE_TOKEN.
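A hand-maintained list of `unset` lines can drift as new variables appear. One alternative, sketched here under the assumption that every variable you want gone starts with `OS_` (true of the ones listed above), is to generate the `unset` commands from the current environment. A child process cannot modify its parent shell, so the script only prints commands for the shell to evaluate; the file name `clear_os_env.py` is hypothetical:

```python
import os
import shlex


def unset_commands(environ):
    """Return one 'unset NAME' line per OS_* variable in `environ`, sorted."""
    names = sorted(name for name in environ if name.startswith("OS_"))
    return ["unset " + shlex.quote(name) for name in names]


if __name__ == "__main__":
    # Only prints the commands; the calling shell must eval them, because a
    # child process cannot change its parent's environment.
    print("\n".join(unset_commands(os.environ)))
```

From bash, `eval "$(python3 clear_os_env.py)"` before sourcing the new keystone.rc would clear every `OS_*` variable, including ones nobody thought to list.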

June 27, 2016

The future of security
The Red Hat Summit is happening this week in San Francisco. It's a big deal if you're part of the Red Hat universe, which I am. I'm giving the Red Hat security roadmap talk this year. The topic has me thinking about the future of security quite a lot. It's easy to think about this in the context of an organization like Red Hat: we have a lot of resources, and there are a lot of really interesting things happening, everything from container security, to operating system security, to middleware security. My talk will end up on YouTube at some point, and I'll link to it, but I also keep thinking about the bigger picture. Where will security be in the next 5, 10, 15 years?

Will ransomware still be a thing in ten years? Will bitcoin still be around? What about Flash? How will open source adapt to all the changes? Will we even call them containers?

The better question here is "what do we want security to look like?"

Look at some of the problems that always make the news: stolen personal information, password leaks, ransomware, hacking. These aren't new problems; most are almost as old as the Internet. The real question is, can we fix any of these problems? The answer might be "no". Some problems aren't fixable; crime is an example of this. When you have unfixable problems, the goal is to control the problem, not prevent it.

How do we control security?

I think we're headed down this path today. It's still slow going, and there are a lot of old habits that will die hard. Most decent security organizations aren't focused on pure prevention anymore; they understand that security is process and people, that it's all about having good policies and good staff. If you have those things, you can start to work on controlling some aspects of what's happening. If you want users to behave, you have to make it easy for them to do the right thing. If you don't want them opening email attachments, make it easy to not use email attachments.

There are still a lot of people who think it's enough to tell people not to do something, or yell at them if they behave in a way that is quite honestly expected. People don't like getting yelled at, they don't like having to go out of their way to do anything, they will always pick the option that is easiest.

Back to the point though. What will the future of security look like? I think the future of security is people. Technology is great, but all our fancy technology is to solve problems that are in the past. If we want to solve the problems of the future, we need good people to first understand those problems, then we can understand how to solve them. This is of course easier said than done, but sometimes just understanding the problem is.

Are you a people? Do you have ideas how to make things better? Tell me: @joshbressers

June 20, 2016

Decentralized Security
If you're a fan of the cryptocurrency projects, you've heard of something called Ethereum. It's similar to bitcoin, but is a separate coin. It's been in the news lately due to an attack on the currency. Nobody is sure how this story will end at this point; there are a few possible options, and none are good. This got me thinking about the future of security. There are some parallels when you compare traditional currency to cryptocurrency, as well as to where we see security heading (stick with me here).

The current way currency works is there is some central organization that is responsible for minting and controlling the currency, usually a country. There are banks, exchanges, loans, interest, physical money, and countless other ways the currency interacts with society. We will compare this to how IT security has mostly worked in the past. You had one large organization responsible for everything. If something went wrong, you could rely on the owner to take control and make things better. There are some instances where this isn't true, but in general it holds.

Now if we look at cryptocurrency, there isn't really a single group or person in charge. That's the whole point though. The idea is to have nobody in charge so the currency can be used with some level of anonymity. You don't have to rely on some sort of central organization to give the currency legitimacy, the system itself has legitimacy built in.

This reminds me of the current state of shadow IT, BYOD, and cloud computing in general. The days of having one security group that was in charge of everything are long gone. Now we have distributed responsibility as well as distributed risk. It's up to each group to understand how they must interact with each other. The risk is shifted from one central organization to nearly everyone involved.

Modified risk isn't a bad thing, demonizing it isn't the point of this discussion. The actual point is that we now exist in an environment that's new to us. The history of humanity has taught us how to exist in an environment where there is a central authority. We now exist in a society that is seeing a shift from central authorities to individuals like never before. The problem with this is we don't know how to deal with or talk about such an environment. When we try to figure out what's happening with security we use analogies that don't work. We talk about banks (just like this post) or cars or doors or windows or boats.

The reality though is we don't really know what this means. We now exist in an environment where everything is becoming distributed, even security. The days of having a security group that rules with an iron fist are gone. If you have an iron fist, you end up with a massive shadow IT problem. In a world based on distributed responsibility the group with the iron fist becomes irrelevant.

The point of bringing up Ethereum wasn't to pick on its problems. It's to point out that we should watch them closely. Regardless of how this problem is solved, there will be lessons learned. Success can be as good as a mistake if you understand what happened and why. The face of security is changing, and a lot of us don't understand what's happening. There are no analogies that work here; we need new analogies and stories. Right now one of the easiest to understand stories around distributed security is cryptocurrency. Even if you're not bitcoin rich, you should be paying attention; there are lessons to be learned.

Keystone Auth Entry Points

OpenStack libraries now use authentication plugins from the keystoneauth1 library. One of the plugins has disappeared: Kerberos. This used to be in the python-keystoneclient-kerberos package, but that is not shipped with Mitaka. What happened?

To list the registered entry points on a CentOS-based system, you can first look in the entry_points.txt file:

cat /usr/lib/python2.7/site-packages/keystoneauth1-2.4.1-py2.7.egg-info/entry_points.txt
v2token = keystoneauth1.loading._plugins.identity.v2:Token
admin_token = keystoneauth1.loading._plugins.admin_token:AdminToken
v3oidcauthcode = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAuthorizationCode
v2password = keystoneauth1.loading._plugins.identity.v2:Password
v3password = keystoneauth1.loading._plugins.identity.v3:Password
v3oidcpassword = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectPassword
token = keystoneauth1.loading._plugins.identity.generic:Token
v3token = keystoneauth1.loading._plugins.identity.v3:Token
password = keystoneauth1.loading._plugins.identity.generic:Password

But are there others?

Looking in the source repo's setup.cfg, we can see a reference to Kerberos (as well as SAML, which has also gone missing) before the enumeration of the entry points we saw above.

kerberos =
  requests-kerberos>=0.6:python_version=='2.7' or python_version=='2.6' # MIT
saml2 =
  lxml>=2.3 # BSD
oauth1 =
  oauthlib>=0.6 # BSD
betamax =
  betamax>=0.7.0 # Apache-2.0
  fixtures>=3.0.0 # Apache-2.0/BSD
  mock>=2.0 # BSD


keystoneauth1.plugin =
    password = keystoneauth1.loading._plugins.identity.generic:Password
    token = keystoneauth1.loading._plugins.identity.generic:Token
    admin_token = keystoneauth1.loading._plugins.admin_token:AdminToken
    v2password = keystoneauth1.loading._plugins.identity.v2:Password
    v2token = keystoneauth1.loading._plugins.identity.v2:Token
    v3password = keystoneauth1.loading._plugins.identity.v3:Password
    v3token = keystoneauth1.loading._plugins.identity.v3:Token
    v3oidcpassword = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectPassword
    v3oidcauthcode = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAuthorizationCode
    v3oidcaccesstoken = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAccessToken
    v3oauth1 = keystoneauth1.extras.oauth1._loading:V3OAuth1
    v3kerberos = keystoneauth1.extras.kerberos._loading:Kerberos
    v3totp = keystoneauth1.loading._plugins.identity.v3:TOTP
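Comparing the two lists by eye is error-prone. A small sketch, assuming you have the installed `entry_points.txt` and the repository's `setup.cfg` as text, parses both with the standard library's `configparser` and reports names declared upstream but not installed. The `[keystoneauth1.plugin]` section header in the installed file and the `[entry_points]` section in setup.cfg are assumptions based on the usual setuptools/pbr layout; the listings above omit them:

```python
import configparser


def _new_parser():
    # Only '=' separates keys from values, so the ':' in 'module:attr'
    # targets is never mistaken for a delimiter.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names exactly as written
    return parser


def _parse_entry_lines(lines):
    """Map 'name = module:attr' lines to {name: 'module:attr'}."""
    result = {}
    for line in lines:
        line = line.strip()
        if "=" in line:
            name, target = line.split("=", 1)
            result[name.strip()] = target.strip()
    return result


def installed_plugins(entry_points_txt, group="keystoneauth1.plugin"):
    """Entry points in `group` from an installed egg-info entry_points.txt."""
    parser = _new_parser()
    parser.read_string(entry_points_txt)
    if not parser.has_section(group):
        return {}
    return dict(parser.items(group))


def declared_plugins(setup_cfg, group="keystoneauth1.plugin"):
    """Entry points in `group` from the [entry_points] section of setup.cfg."""
    parser = _new_parser()
    parser.read_string(setup_cfg)
    raw = parser.get("entry_points", group, fallback="")
    return _parse_entry_lines(raw.splitlines())


def missing_plugins(entry_points_txt, setup_cfg, group="keystoneauth1.plugin"):
    """Names declared upstream but absent from the installed package."""
    declared = declared_plugins(setup_cfg, group)
    installed = installed_plugins(entry_points_txt, group)
    return sorted(set(declared) - set(installed))


if __name__ == "__main__":
    installed = """\
[keystoneauth1.plugin]
password = keystoneauth1.loading._plugins.identity.generic:Password
v3token = keystoneauth1.loading._plugins.identity.v3:Token
"""
    upstream = """\
[entry_points]
keystoneauth1.plugin =
    password = keystoneauth1.loading._plugins.identity.generic:Password
    v3token = keystoneauth1.loading._plugins.identity.v3:Token
    v3kerberos = keystoneauth1.extras.kerberos._loading:Kerberos
"""
    print(missing_plugins(installed, upstream))  # ['v3kerberos']
```

Run against the full files, this kind of diff would surface every upstream plugin name that the packaged version does not register.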

We see that the Kerberos plugin requires requests-kerberos>=0.6, so let’s get that installed via

sudo yum install python-requests-kerberos

And then try to enumerate the entry points via python

>>> import pkg_resources
>>> named_objects = {}
>>> for ep in pkg_resources.iter_entry_points(group='keystoneauth1.plugin'):
...     named_objects.update({ep.name: ep.load()})
>>> print (named_objects)
{'v2token': <class>, 'token': <class>, 'admin_token': <class>, 'v3oidcauthcode': <class>, 'v3token': <class>, 'v2password': <class>, 'password': <class>, 'v3password': <class>, 'v3oidcpassword': <class>}

We still don’t have the Kerberos plugin. Going back to the setup.cfg file, we see the Python class for the Kerberos plugin is not listed. Kerberos is implemented in the keystoneauth1.extras.kerberos package in the source tree. Does that exist in our package-managed file system?

$ rpm --query --list python2-keystoneauth1-2.4.1-1.el7.noarch | grep kerberos

Yes. It does. Can we load that by class?

>>> from keystoneauth1.extras import kerberos
>>> print kerberos

Yes, although the RPM version is a little earlier than the git repo. So what is the entry point name? There is not one, yet. The only way to get the class is by the full class name.
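Loading a plugin by its full name is essentially what an entry point does for you: import the module, then look up the attribute. A minimal sketch of that lookup using only `importlib`, assuming the `module:attr` notation used in the entry point listings above; `load_by_name` is a hypothetical helper, demonstrated on a standard-library class so it runs anywhere (with keystoneauth1 installed you would pass something like `keystoneauth1.extras.kerberos._loading:Kerberos`):

```python
import importlib


def load_by_name(spec):
    """Import and return the object named by 'package.module:Attr.sub'.

    Without a ':' the module itself is returned.
    """
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj


print(load_by_name("collections:OrderedDict"))  # <class 'collections.OrderedDict'>
```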

We’ll fix this, but the tools for enumerating the entry points are something I’ve used often enough that I want them documented.

June 17, 2016

The difference between auth_uri and auth_url in auth_token

Dramatis Personae:

Adam Young, Jamie Lennox: Keystone core.

Scene: #openstack-keystone chat room.

ayoung: I still don’t understand the difference between url and uri
jamielennox: auth_uri ends up in “WWW-Authenticate: Keystone uri=%s” header. that’s its only job
ayoung: and what is that meant to do? tell someone where they need to go to authenticate?
jamielennox: yea, it gets added to all 401 responses and then i’m pretty sure everyone ignores it
ayoung: so they should be the same thing, then, right? I mean, we say that the Keystone server that you authenticate against is the one that nova is going to use to validate the token. and the version should match
jamielennox: depends, most people use an internal URL for auth_url but auth_uri would get exposed to the public
ayoung: ah
jamielennox: there should be no version in auth_uri
ayoung: so auth_uri=main auth_url=admin in v2.0 speak
jamielennox: yea. more or less. ideally we could default it way better than that, like auth.get_endpoint('identity', interface='public'), but that gets funny
ayoung: This should be a blog post. You want to write it or shall I? I’m basically just going to edit this conversation.
jamielennox: mm, blog, i haven’t written one of those for a while
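To make the distinction concrete, here is a small sketch of the one job the conversation assigns to auth_uri: building the WWW-Authenticate value for a 401 response from a public, unversioned URL. The helper names and the example hostname are hypothetical, this is not keystonemiddleware's own code, and the header format simply follows the string quoted in the chat:

```python
import re

# A trailing API version path segment such as /v2.0 or /v3.
_VERSION_SUFFIX = re.compile(r"/v\d+(\.\d+)?$")


def unversioned(url):
    """Strip a trailing '/vN' or '/vN.M' (and any trailing slash) from a URL."""
    return _VERSION_SUFFIX.sub("", url.rstrip("/"))


def www_authenticate_header(auth_uri):
    """The (name, value) pair a 401 response would carry, per the chat above."""
    return ("WWW-Authenticate", "Keystone uri=%s" % unversioned(auth_uri))


# Hypothetical public endpoint. auth_url (often an internal URL) is only used
# by the middleware to validate tokens and never appears in this header.
auth_uri = "https://keystone.example.com:5000/v3"
print(www_authenticate_header(auth_uri))
# ('WWW-Authenticate', 'Keystone uri=https://keystone.example.com:5000')
```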


June 16, 2016

Learning about the Overcloud Deploy Process

The process of deploying the overcloud goes through several technologies. Here’s what I’ve learned about tracing it.

I am not a Heat or Tripleo developer. I’ve just started working with Tripleo, and I’m trying to understand this based on what I can gather, and the documentation out there. And also from the little bit of experience I’ve had working with Tripleo. Anything I say here might be wrong. If someone that knows better can point out my errors, please do so.

[UPDATE]: Steve Hardy has corrected many points, and his comments have been noted inline.

To kick the whole thing off in the simplest case, you would run the command openstack overcloud deploy.

Roughly speaking, here is the sequence (as best as I can tell):

  1.  User types  openstack overcloud deploy on the command line
  2. This calls up the common cli, which parses the command, and matches the tripleo client with the overcloud deploy subcommand.
  3. tripleo client is a thin wrapper around the Heat client, and calls the equivalent of heat stack-create overcloud
  4. python-heatclient (after Keystone token stuff) calls the Heat API server with the URL and data to do a stack create
  5. Heat makes the appropriate calls to Nova (running the Ironic driver) to activate a baremetal node and deploy the appropriate instance on it.
  6. Before the node is up and running, Heat has posted Hiera data to the metadata server.
  7. The newly provisioned machine will run cloud-init which in turn runs os-collect-config.
    [update] Steve Hardy’s response:

    This isn’t strictly accurate – cloud-init is used to deliver some data that os-collect-config consumes (via the heat-local collector), but cloud-init isn’t involved with actually running os-collect-config (it’s just configured to start in the image).

  8. os-collect-config will start polling for changes to the metadata.
  9. [UPDATE] os-collect-config runs os-refresh-config, which then invokes a script that runs Puppet Apply based on the Hiera data.
    Steve’s note:

    os-collect-config never runs puppet, it runs os-refresh-config only, which then invokes a script that runs puppet.

  10. The Keystone Puppet module will set values in the Keystone config file, httpd/conf.d files, and perform other configuration work.

Here is a diagram of how os-collect-config is designed

When a controller image is built for Tripleo, some portion of the Hiera data is stored in /etc/puppet/. There is a file /etc/puppet/hiera.yaml (which looks a lot like /etc/hiera.yaml, an RPM-controlled file) and subfiles in /etc/puppet/hieradata.

UPDATE: Response from Steve Hardy

This is kind-of correct – we wait for the server to become ACTIVE, which means the OS::Nova::Server resource is declared CREATE_COMPLETE. Then we do some network configuration, and *then* we post the hieradata via a heat software deployment.

So, we post the hieradata to the heat metadata API only after the node is up and running, and has its network configured (not before).

Note the depends_on – we use that to control the ordering of configuration performed via heat.

However, the dynamic data seems to be stored in /var/lib/os-collect-config/

$ ls -la  /var/lib/os-collect-config/*json
-rw-------. 1 root root   2929 Jun 16 02:55 /var/lib/os-collect-config/ControllerAllNodesDeployment.json
-rw-------. 1 root root    187 Jun 16 02:55 /var/lib/os-collect-config/ControllerBootstrapNodeDeployment.json
-rw-------. 1 root root   1608 Jun 16 02:55 /var/lib/os-collect-config/ControllerCephDeployment.json
-rw-------. 1 root root    435 Jun 16 02:55 /var/lib/os-collect-config/ControllerClusterDeployment.json
-rw-------. 1 root root  36481 Jun 16 02:55 /var/lib/os-collect-config/ControllerDeployment.json
-rw-------. 1 root root    242 Jun 16 02:55 /var/lib/os-collect-config/ControllerSwiftDeployment.json
-rw-------. 1 root root   1071 Jun 16 02:55 /var/lib/os-collect-config/ec2.json
-rw-------. 1 root root    388 Jun 15 18:38 /var/lib/os-collect-config/heat_local.json
-rw-------. 1 root root   1325 Jun 16 02:55 /var/lib/os-collect-config/NetworkDeployment.json
-rw-------. 1 root root    557 Jun 15 19:56 /var/lib/os-collect-config/os_config_files.json
-rw-------. 1 root root 263313 Jun 16 02:55 /var/lib/os-collect-config/request.json
-rw-------. 1 root root   1187 Jun 16 02:55 /var/lib/os-collect-config/VipDeployment.json

For each of these files there are two older copies that end in .last and .orig as well.

In my previous post, I wrote about setting Keystone configuration options such as ‘identity/domain_specific_drivers_enabled’: value => ‘True’;. I can see this value set in /var/lib/os-collect-config/request.json, in a large block keyed “config”.
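Since request.json is large, a small search helper makes it easier to find where a setting lands. A sketch that walks the parsed JSON and reports the path of every key or string value containing a given text; the helper names and the script name `find_in_occ.py` are hypothetical, and it assumes nothing about the internal layout of these files beyond their being JSON. The files above are mode 600 and owned by root, so it would need to run with sudo:

```python
import glob
import json
import sys


def find_text(data, needle, path=""):
    """Yield JSON paths whose key or string value contains `needle`."""
    if isinstance(data, dict):
        for key, value in data.items():
            child = f"{path}/{key}"
            if needle in str(key):
                yield child
            yield from find_text(value, needle, child)
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_text(item, needle, f"{path}[{index}]")
    elif isinstance(data, str) and needle in data:
        yield path


def search_file(filename, needle):
    """Return every matching path in one JSON file."""
    with open(filename) as handle:
        return list(find_text(json.load(handle), needle))


if __name__ == "__main__":
    needle = sys.argv[1] if len(sys.argv) > 1 else "domain_specific_drivers_enabled"
    for filename in sorted(glob.glob("/var/lib/os-collect-config/*.json")):
        for hit in search_file(filename, needle):
            print(filename, hit)
```

For example, `sudo python3 find_in_occ.py domain_config_dir` would list each file and JSON path mentioning that option.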

When I ran the openstack overcloud deploy, one way that I was able to track what was happening on the node was to tail the journal like this:

 sudo journalctl -f | grep collect-config

Looking through the journal output, I can see the line that triggered the change:

... /Stage[main]/Main/Keystone_config[identity/domain_specific_drivers_enabled]/ensure: ...
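When a deploy touches many options, pulling them out of the journal by hand gets tedious. A sketch that extracts the section and option from Puppet resource names of the `Keystone_config[section/option]` form seen in the line above; the pattern is an assumption based on that one line, and `changed_keystone_options` is a hypothetical helper you would feed with the journal lines (for example, the output of the `journalctl | grep collect-config` command above):

```python
import re

# Puppet names the resource Keystone_config[<section>/<option>].
_KEYSTONE_CONFIG = re.compile(
    r"Keystone_config\[(?P<section>[^/\]]+)/(?P<option>[^\]]+)\]")


def changed_keystone_options(lines):
    """Return (section, option) pairs for Keystone_config resources in `lines`."""
    found = []
    for line in lines:
        for match in _KEYSTONE_CONFIG.finditer(line):
            found.append((match.group("section"), match.group("option")))
    return found


sample = ("... /Stage[main]/Main/Keystone_config"
          "[identity/domain_specific_drivers_enabled]/ensure: ...")
print(changed_keystone_options([sample]))
# [('identity', 'domain_specific_drivers_enabled')]
```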

June 15, 2016

Custom Overcloud Deploys

I’ve been using Tripleo Quickstart, and I need custom deploys. I start by modifying the Heat templates. I’m doing a Mitaka deploy:

git clone
cd tripleo-heat-templates/
git branch --track mitaka origin/stable/mitaka
git checkout mitaka
diff -r  /usr/share/openstack-tripleo-heat-templates/ tripleo-heat-templates/

Mine shows some differences, but only in the file extraconfig/tasks/liberty_to_mitaka_aodh_upgrade_2.pp, which should be OK. The commit is:

Add redis constraint to aodh upgrade manifest
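The `diff -r` check can also be done from Python with the standard library's `filecmp.dircmp`, which skips the clone's `.git` directory by default (it is listed in `filecmp.DEFAULT_IGNORES`). A sketch; note that dircmp compares files shallowly, so two files with the same type, size and modification time are assumed equal without reading their contents:

```python
import filecmp
import os


def tree_differences(left, right):
    """Recursively compare two directory trees, roughly like `diff -rq`.

    Returns sorted (kind, relative_path) tuples, where kind is 'differs',
    'only in left' or 'only in right'. Names in filecmp.DEFAULT_IGNORES
    (including .git) are skipped.
    """
    found = []

    def walk(cmp, prefix):
        for name in cmp.diff_files:
            found.append(("differs", os.path.join(prefix, name)))
        for name in cmp.left_only:
            found.append(("only in left", os.path.join(prefix, name)))
        for name in cmp.right_only:
            found.append(("only in right", os.path.join(prefix, name)))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(found)
```

Called as `tree_differences("/usr/share/openstack-tripleo-heat-templates", "tripleo-heat-templates")`, it should report the same aodh upgrade manifest as the shell diff.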

Modify the launch script in /home/stack:

$ diff
< openstack overcloud deploy --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 60 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server \
> openstack overcloud deploy --templates  /home/stack/tripleo-heat-templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 60 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server \

The only change should be from

--templates  #(followed by another flag, which means that --templates takes the default)

to

--templates /home/stack/tripleo-heat-templates

OK…let’s make sure we still have a stable system. First, tear down the overcloud deliberately:

[stack@undercloud ~]$ . ./stackrc 
[stack@undercloud ~]$ heat stack-delete overcloud
Are you sure you want to delete this stack(s) [y/N]? y
| id                                   | stack_name | stack_status    | creation_time       | updated_time |
| 00d81e5b-c2f9-4f6a-81e8-b135fadba921 | overcloud  | CREATE_COMPLETE | 2016-06-15T18:01:25 | None         |

Wait until the delete is complete with

$ watch heat stack-list

Wait until it changes from

| id                                   | stack_name | stack_status       | creation_time       | updated_time |
| 00d81e5b-c2f9-4f6a-81e8-b135fadba921 | overcloud  | DELETE_IN_PROGRESS | 2016-06-15T18:01:25 | None         |

to an empty list:

| id | stack_name | stack_status | creation_time | updated_time |
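The watch-and-wait step can also be scripted. A sketch, assuming the `heat` client from the undercloud is on PATH and stackrc has been sourced; `wait_until` and `overcloud_gone` are hypothetical helpers, and treating a stack list that no longer mentions "overcloud" as "deleted" is a simplification:

```python
import subprocess
import time


def wait_until(predicate, timeout=3600, interval=15,
               sleep=time.sleep, clock=time.monotonic):
    """Call predicate() every `interval` seconds until it returns True.

    Returns True on success, False once `timeout` seconds have passed.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def overcloud_gone():
    """True once `heat stack-list` no longer lists the overcloud stack."""
    result = subprocess.run(["heat", "stack-list"],
                            capture_output=True, text=True, check=True)
    if "DELETE_FAILED" in result.stdout:
        raise RuntimeError("overcloud delete failed:\n" + result.stdout)
    return "overcloud" not in result.stdout
```

`wait_until(overcloud_gone)` then blocks until the stack is gone or an hour has passed.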

And now run the modified overcloud deploy:


The end of the output looks like this:

Stack overcloud CREATE_COMPLETE
/home/stack/.ssh/known_hosts updated.
Original contents retained as /home/stack/.ssh/known_hosts.old
PKI initialization in init-keystone is deprecated and will be removed.
Warning: Permanently added '' (ECDSA) to the list of known hosts.
The following cert files already exist, use --rebuild to remove the existing files before regenerating:
/etc/keystone/ssl/certs/ca.pem already exists
/etc/keystone/ssl/private/signing_key.pem already exists
/etc/keystone/ssl/certs/signing_cert.pem already exists
Connection to closed.
Skipping "horizon" postconfig because it wasn't found in the endpoint map output
Overcloud Endpoint:
Overcloud Deployed
+ heat stack-list
+ exit 0

Don’t be fooled by the trailing grep -q CREATE_FAILED line; that is the shell script's execution logging, not a statement of failure.

OK, to do a proper “Hello, World” here I’d really like to be able to effect a change on the deployment. I’m going to try to set a couple of Keystone config values that are not set (yet) in /etc/keystone/keystone.conf.

In my undercloud git repo for tripleo-heat-templates, I make changes to the Overcloud post config:

$ git diff
diff --git a/puppet/manifests/overcloud_controller.pp b/puppet/manifests/overcloud_controller.pp
index c353ec0..c6385d4 100644
--- a/puppet/manifests/overcloud_controller.pp
+++ b/puppet/manifests/overcloud_controller.pp
@@ -223,6 +223,11 @@ if hiera('step') >= 3 {
   #TODO: need a solution here
+  keystone_config {  
+   'identity/domain_specific_drivers_enabled': value => 'True';  
+   'identity/domain_config_dir': value => '/etc/keystone/domains';  
+  }  
   file { [ '/etc/keystone/ssl', '/etc/keystone/ssl/certs', '/etc/keystone/ssl/private' ]:
     ensure  => 'directory',
     owner   => 'keystone',

And rerun the deploy:


Once it has successfully deployed, I can check to see whether the change shows up in the keystone.conf file.

$ . ./stackrc 
[stack@undercloud ~]$ openstack server list
| ID                                   | Name                    | Status | Networks            |
| 761a1b61-8bd1-4b85-912b-775e51ad99f3 | overcloud-controller-0  | ACTIVE | ctlplane= |
| f123da36-9b05-4fc3-84bb-4af147fa76f7 | overcloud-novacompute-0 | ACTIVE | ctlplane= |
[stack@undercloud ~]$ ssh heat-admin@
$ sudo grep domain_specific /etc/keystone/keystone.conf
#domain_specific_drivers_enabled = false
domain_specific_drivers_enabled = True
# if domain_specific_drivers_enabled is set to true. (string value)
[heat-admin@overcloud-controller-0 ~]$ sudo grep domain_config_dir /etc/keystone/keystone.conf
#domain_config_dir = /etc/keystone/domains
domain_config_dir = /etc/keystone/domains

Changes applied.
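Grepping works, but it also matches the commented-out defaults. A small sketch that parses keystone.conf as INI with Python's `configparser` and reports any option that lacks the expected value; the expected values are the two set in the Puppet change above, and feeding it the text of the controller's file (for example, captured from `sudo cat /etc/keystone/keystone.conf`) is an assumption about your setup:

```python
import configparser


def check_options(conf_text, expected):
    """Compare {(section, option): value} against an INI-style config text.

    Returns {(section, option): actual} for every mismatch, where actual is
    None if the option is missing. Lines starting with '#' are comments to
    configparser, so commented-out defaults are ignored.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    mismatches = {}
    for (section, option), want in expected.items():
        got = parser.get(section, option, fallback=None)
        if got != want:
            mismatches[(section, option)] = got
    return mismatches


EXPECTED = {
    ("identity", "domain_specific_drivers_enabled"): "True",
    ("identity", "domain_config_dir"): "/etc/keystone/domains",
}
```

An empty result from `check_options(text, EXPECTED)` means both settings were applied.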

June 13, 2016

Ready to form Voltron! Why security is like a giant robot made of lions
Voltron came up in several conversations about security this week. This is sort of a strange topic, but it makes sense when we ponder modern day security. If you talk to anyone, there is generally one thing they push as the solution to a problem. Security technologies are no different. There is always one thing that will fix your problems. In reality this is never the case. Good security is about putting a number of technologies together to create something bigger and better than any one thing can do by itself.

For those of you who don't know what Voltron is, Voltron was a cartoon when I was a kid. There were 5 robot lions that sometime during every show would combine together to create one big robot called Voltron. By themselves the lions were pretty awesome, but it always seemed the bad guy would keep getting stronger until the lions couldn't deal with it alone. Only by coming together to form a giant robot of pure awesome could they destroy whatever horrible creature was causing problems.

This sounds just like security. Just a firewall will eventually be beaten by your adversaries. Just code reviews won't keep things safe for long (if at all). Just using ASLR is only good for a little while. When we start putting everything together though, things get good.

Some people get this; they know there isn't one thing that's going to fix it all. A lot don't, though. It's very common to attend a talk about a new security feature or product. If you talk to a vendor, without a doubt whatever they're doing will cure what ails you. How often does anyone talk about how their product, feature, or idea will fit in the big picture? How can two or more things work together to add security? It's pretty uncommon to see anyone talking about how well things work together. It's human nature though. We can usually only do one thing, and why wouldn't you be proud of what you're working on? You want to talk about what you do and what you know.

I'm often guilty of this too. When talking about something like containers, I'll focus on SELinux, or updates, or trusted content, or seccomp. Rarely is the whole story told. Part of this may be because security technology is usually really complex; it's hard to hold a good view of it all in your head at once. The thing is though, none of those are overly useful by themselves. They're all good and do great things, but it's not until you put everything together that you can see a real difference.

This all makes sense when you think about it. Layers of defense are almost always more effective than a single layer (I know there is a lot of nuance to this, but in general, let's not nitpick). Would you want to rely on only seccomp, or would you rather have seccomp, cgroups, SELinux, user namespaces, trusted content, content scanning, and ExecShield? It's a no-brainer when you think about it.

How can we start to think about things as a giant evil fighting robot instead of small (but still awesome) lions? It's never easy, it's even harder when you have to expect different groups to share attention and recognition. It's going to be more important in the future though. If we don't take better looks at how things work together it's going to be a lot harder to see real improvements.

What do you think? Let me know: @joshbressers

June 08, 2016

SAML Federated Auth Plugin

SAML is usually thought of as a WebSSO mechanism, but it can be made to work for command line operations if you use the Enhanced Client or Proxy (ECP) profile. When we did the Rippowam demo last year, we were successful in getting an unscoped token by using ECP, but that was not sufficient to perform operations on other services that need a scoped token.

The general approach that we are looking at for Keystone is to always have the user ask for an unscoped token first, and then upgrade that to a scoped token. The scoping process can only be done from unscoped to scoped (configuration option) to prevent elevation of privilege attacks.

The base federation plugin is capable of handling this kind of workflow. Thus, the general approach is to write a protocol specific plugin to get an unscoped token, and then to use common logic in the base class v3.FederatedBaseAuth to convert unscoped to scoped.

I just got [edit: used to say keystone] openstack flavor list to work with ECP and Keycloak. I had to create a new auth plugin to do it:

Created a new entry point in


v3fedsaml = keystoneclient.contrib.auth.v3.saml2:FederatedSAML2

Added this to

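# Lives in the module the entry point names (keystoneclient.contrib.auth.v3.saml2),
# alongside Saml2UnscopedToken; cfg is oslo_config.cfg and v3 is
# keystoneclient.auth.identity.v3.
class FederatedSAML2(v3.FederatedBaseAuth):
    """Authenticate using SAML via the keystone federation mechanisms.

       Wraps the unscoped SAML2 plugin to
       1.  Request an unscoped token
       2.  Use the unscoped token to request a scoped token
    """

    @classmethod
    def get_options(cls):
        options = super(FederatedSAML2, cls).get_options()
        options.extend([
            cfg.StrOpt('identity-provider', help="Identity Provider's name"),
            cfg.StrOpt('protocol', help="SAML2"),
            cfg.StrOpt('identity-provider-url',
                       help="Identity Provider's URL"),
            cfg.StrOpt('user-name', dest='username', help='Username',
                       deprecated_name='username'),
            cfg.StrOpt('password', help='Password')
        ])
        return options

    def __init__(self, auth_url,
                 identity_provider, protocol,
                 identity_provider_url,
                 username, password,
                 **kwargs):
        super(FederatedSAML2, self).__init__(auth_url, identity_provider, protocol,
                                             **kwargs)
        # The unscoped plugin performs the ECP exchange with the IdP.
        self._unscoped = Saml2UnscopedToken(auth_url,
                                            identity_provider,
                                            identity_provider_url,
                                            username, password,
                                            **kwargs)

    def get_unscoped_auth_ref(self, session, **kwargs):
        # The base class calls this, then rescopes the unscoped token.
        return self._unscoped.get_auth_ref(session, **kwargs)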

Updated my keystone RC file:

export OS_AUTH_TYPE=v3fedsaml

This is based on RH OSP8, which is the Liberty release. In later releases of OSP, the client libraries are synchronized with later versions, including the gradual replacement of the auth plugins housed in python-keystoneclient by keystoneauth. Thus, there will be a couple of variations on this plugin, including one that may have to live out of tree if we want it for OSP8.

June 07, 2016

IoT Technology: Devices

Discussions of IoT often focus on the technology, so let’s start there. IoT consists of devices, which are the “things” that interact with the physical world and communicate with IoT Back-end systems over a network. There are two types of IoT devices: sensors and actuators.

An IoT system will typically be made of many devices – from dozens to millions – talking to a scalable Back-end system. This Back-end system often runs in the Cloud. In some cases the IoT devices will talk directly to the Back-end systems. In other cases an additional system called an IoT Gateway will be placed between the devices and the Back-end systems. The IoT Gateway will typically talk to multiple local IoT devices, perform communications protocol conversions, perform local processing, and connect to the Back-end systems over an Ethernet, WiFi, or cellular modem link.
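The gateway role described above (collect from several local devices, process locally, forward upstream) can be sketched in a few lines of Python. This is a hypothetical illustration, not a real gateway API: the device IDs, readings and JSON payload shape are invented, and protocol conversion is assumed to have already turned each device's output into plain numbers.

```python
import json
from statistics import mean
from typing import Dict, List

# Hypothetical samples from three local sensors, already protocol-converted.
readings: Dict[str, List[float]] = {
    "temp-01": [21.4, 21.6, 21.5],
    "temp-02": [22.0, 22.2, 21.9],
    "co-01": [0.0, 0.0, 3.0],
}


def summarize(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Local processing: reduce raw samples to min/mean/max per device."""
    return {
        device: {"min": min(v), "mean": round(mean(v), 2), "max": max(v)}
        for device, v in samples.items()
    }


def to_backend_payload(gateway_id: str, samples: Dict[str, List[float]]) -> str:
    """Package the summary as JSON for the uplink to the Back-end system."""
    return json.dumps({"gateway": gateway_id, "devices": summarize(samples)},
                      sort_keys=True)


print(to_backend_payload("gw-42", readings))
```

Sending a summary instead of every raw sample is one reason to put processing at the gateway: less data crosses the (possibly cellular) uplink.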

IoT Devices

IoT devices consist of sensors, actuators, and communications. Sensors, as the name implies, read information from the physical world. Examples would be temperature, humidity, barometric pressure, light, weight, CO2, motion, location, pH level, chemical concentration for many chemicals, distance, voltage, current, images, etc. There are sensors available for an incredible range of information and many more under development. Think of things like a tiny DNA sequencer or a sensor that can detect the presence of the bacteria or virus associated with various diseases – both of these are under development!

Actuators are able to change something in the physical world. Examples would be a light switch, a remotely operated valve, a remotely controlled door lock, a stepper motor, a 3D printer, or the steering, brakes and throttle for a self driving car.

IoT Device Examples

For an idea of the range of low-cost IoT-compatible sensors, take a look at SparkFun Electronics, a leading source of IoT device technology for prototyping, development, and hobbyists. The sensor section of the SparkFun site lists over 200 sensors that can be used with Arduino and similar systems. Note that these are basically development and prototyping units – prices in production quantities will be lower.

Some sensors are familiar – temperature is perhaps the most obvious example. But many are more interesting. Consider, for example, the gas sensors: hydrogen, methane, LPG, alcohol, carbon monoxide; all available at prices of $4.95 – $7.95. Combine one of these with an Arduino Pro Mini, available for $9.95, and you can build a targeted chemical sensor for less than $20.00.

What can you do with a $20.00 LPG or carbon monoxide sensor? That is the wrong question. Instead, you should be asking “what problems am I facing that could be addressed with a low-cost, network-connected sensor?” The point is that there is an incredible and growing array of inexpensive sensors available. The technology is available – what we need now is the imagination to begin to creatively use ubiquitous sensors, ubiquitous networking, ubiquitous computing, and ubiquitous data.

The application of modern electronics technology to sensors is just beginning to be felt. As in many other areas of IoT, the basic capabilities have been around for years – detecting and measuring the concentration of LPG vapor or carbon monoxide isn’t new. Detecting LPG vapor concentration with a sub-$20 networked device that feeds the data directly into a large distributed computing system in a form that is readily manipulated by software is new. And huge!

LPG and carbon monoxide are just examples. The same technologies are producing sensors for a wide range of chemicals and gases.

The combination of useful capabilities, low cost, network connection, and integration into complex software applications is a complete revolution. And this revolution is just beginning. What happens to agriculture when we can do a complete soil analysis for each field? What happens if we have nutrient, moisture, light, and temperature information for each ten foot square in a field, updated every 15 minutes over the entire growing season? What happens when we have this information for a 20 year period? What happens when this information is dynamically combined with plant growth monitoring, standard plant growth profiles, weather forecasts and climatic trends?

Going further, what if this data is combined with an active growth management system where application of fertilizer, pesticide, and water is optimized for individual micro-areas within a field? Technology is progressing to the point where we can provide the equivalent of hands-on gardening to commercial fields.
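A back-of-the-envelope count gives a feel for the data volume in this scenario. The sketch assumes a 120-day growing season (the text does not give one) and four measurements (nutrient, moisture, light, temperature) per ten-foot square, sampled every 15 minutes; the parameters are placeholders to adjust for a real field.

```python
import math

SQ_FT_PER_ACRE = 43_560


def readings_per_season(acres: float, cell_ft: float = 10.0,
                        interval_min: int = 15, season_days: int = 120,
                        measurements: int = 4) -> int:
    """Count sensor readings for one field over one growing season.

    season_days=120 is an assumed season length, not a figure from the text.
    """
    cells = math.ceil(acres * SQ_FT_PER_ACRE / (cell_ft * cell_ft))
    per_day = (24 * 60) // interval_min  # 96 samples per day at 15 minutes
    return cells * per_day * season_days * measurements


one_acre = readings_per_season(1)
print(one_acre)       # 20090880 readings per acre per season
print(one_acre * 20)  # 401817600 readings per acre over 20 years
```

Under these assumptions a single acre produces about 20 million readings per season, before any plant growth monitoring or weather data is added.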

As an example of work going on in this area see the Industrial Internet Consortium Testbed on Precision Crop Management at

June 06, 2016

Is there a future view that isn't a security dystopia?
I recently finished reading the book Ghost Fleet. It's not a bad read if you're into what cyberwar could look like. It's not great though; I won't suggest it as the book of the summer. The biggest thing I keep thinking about is that I've yet to really see any book set in the future, with a focus on technology, that isn't a dystopian warning. Ghost Fleet is no different.

My favorite part was how everyone knew the technology was totally pwnt, yet everyone still used it. There were various drones, smart display glasses, AI to control boats, rockets, even a space laser (which every book needs). This reminds me of today to a certain degree. We all use web sites we know will be hacked. We know our identities have been stolen. We know our phones aren't secure. Our TVs record our conversations. You can even get doorbells that can stream you a video feed. We love this technology even though it's either already hacked, or will be soon. We know it and we don't care, we just keep buying broken phones, TVs, blenders, cars, anything that comes with WiFi!

Disregarding the fact that we are probably already living in the dystopian future, it really made me wonder if there are any examples of a future that isn't a security nightmare? You could maybe make the argument that Star Trek is our hopeful future, but that's pretty old these days. And even then, the android took over the ship more times than I'd be comfortable with. I think it's safe to say their security required everyone to be a decent human. If that's our only solution, we're pretty screwed.

Most everything I come across is pretty bleak and I get why. Where is our escape from all the insecure devices we pretend we hate? The only number growing faster than the number of connected devices is the number of security flaws in those devices. There aren't even bad ideas to fix this stuff, there's just nothing. The thing about bad ideas is they can often be fixed. A smart person can take a bad idea and turn it into a good idea. Bad ideas are at least something to build on. I don't see any real ideas to fix these devices. We have nothing to build on. Nothing is dangerous. No matter how many times you improve it, it's still nothing. I have no space laser, so no matter how many ideas I have to make it better, I still won't have a space laser (if anyone has one I could maybe borrow, be sure to let me know).

Back to the idea about future technology. Are there any real examples of a future based heavily on technology that isn't a horrible place? This worries me. One of the best parts about science fiction is getting to dream about a future that's better than the present. Like that computer on the space ship in 2001, that thing was awesome! It had pretty good security too ... sort of.

So here is the question we should all think about. At what point do connected devices get bad enough that people stop buying them? We're nowhere near that point today. Will we ever reach that point? Maybe people will just accept the fact that their dishwasher will send spam when it's not running and the toaster will record their kitchen conversations. I really want to live in a nice future, one where our biggest threat is an android that got taken over by a malevolent intelligence, not one where my biggest threat is my doorbell.

Do you know of any non dystopian predictions? Let me know: @joshbressers

June 03, 2016

Fun with bash, or how I wasted an hour trying to debug some SELinux test scripts.
We are working to get SELinux and Overlayfs to work well together. Currently you cannot run Docker containers with SELinux on an Overlayfs back end. You should see the patches posted to the kernel list within a week.

I have been tasked to write selinuxtestsuite tests to verify overlayfs works correctly with SELinux.
These tests will help people understand what we intended.

One of the requirements for overlayfs/SELinux is to check not only the access of the process performing the operation but also the label of the process that originally set up the overlayfs mount.

In order to do the test I created two process types, test_overlay_mounter_t and test_overlay_client_t, and then used runcon to execute a bash script in the correct context. I added code like the following to the test to make sure that the runcon command was working.

# runcon -t test_overlay_mounter_t bash <<EOF
echo "Mounting as $(id -Z)"
EOF

The problem was when I ran the tests, I saw the following:

Mounting as unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Sadly it took me an hour to diagnose what was going on, writing several test scripts and running commands by hand. Sometimes it seemed to work and other times it would not. I thought there was a problem with runcon or with my SELinux policy. Finally I took a break, came back to the problem, and realized that the problem was with bash. Because the here-document delimiter (EOF) was unquoted, the $(id -Z) was expanded by my own shell before the runcon command ever ran.

Sometimes you feel like an idiot.

Putting both versions into one script makes the difference visible:

# runcon -t test_overlay_mounter_t bash <<EOF
echo "Mounting as $(id -Z)"
echo -n "Mounting as "
id -Z
EOF

Mounting as unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Mounting as unconfined_u:unconfined_r:test_overlay_mounter_t:s0-s0:c0.c1023
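The root cause can be reproduced without SELinux. With an unquoted here-document delimiter (<<EOF), the calling shell performs parameter expansion and command substitution on the body before the inner bash reads it. The minimal Python sketch below runs two tiny bash scripts that differ only in the delimiter; the variable name WHO is made up for the demonstration, and bash must be on the PATH.

```python
import subprocess


def run_heredoc(delimiter: str) -> str:
    """Feed a here-document to an inner bash and report who expanded $WHO.

    The outer shell sets WHO=outer; the heredoc body sets WHO=inner and then
    echoes $WHO. If the outer shell expands the body first, the inner shell
    only ever sees the literal text "outer".
    """
    script = (
        "WHO=outer\n"
        f"bash <<{delimiter}\n"
        "WHO=inner\n"
        'echo "expanded by: $WHO"\n'
        "EOF\n"
    )
    result = subprocess.run(["bash", "-c", script],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(run_heredoc("EOF"))    # expanded by: outer  (unquoted delimiter)
print(run_heredoc("'EOF'"))  # expanded by: inner  (quoted delimiter)
```

Quoting the delimiter (<<'EOF') or escaping the substitution (\$(id -Z)) would likewise defer id -Z to the shell that runcon starts, with the same effect as splitting the echo and id -Z calls.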

My next blog will explain how we expect overlayfs to work with SELinux.

June 01, 2016

The Internet of Things

There are a lot of things to explore around IoT – what it is, technology, system architectures, security, implementation challenges, and many others. We will get to all of those, but a great place to start is how we got here and the implications of IoT. Rather than starting with things, let’s start with what is really important – economics.

Just what is the Internet of Things (IoT)? At the simplest level it is devices that interact with the physical world and communicate over a network. Simple, but with very significant implications. Let’s dig into these implications and see how such a simple concept can have such great impact.

The major drivers of IoT are technology, economics, software, and integration. Individually these are significant. Combined they will have a major impact on all aspects of life. Some of this impact is good, and some may be bad. As with many things, good vs. bad will often depend on the implementation and how it is used.

Is IoT New?

A common question is whether IoT is something new and revolutionary or a buzzword for old ideas. The answer is “yes”…

Much of the foundation of IoT has been around for quite a while. SCADA (Supervisory Control And Data Acquisition) systems have been managing electrical power grids, railroads, and factories since the 1950s. Machine communications over telephone lines and microwave links have been around since the 1960s. Machine control systems, starting on mainframes and minicomputers, have also been around since the 1960s.

The big changes are economics, software, and integration. Microsensors and SoC (System on a Chip) technology for CPUs and networking are driving the cost of devices down – in some cases by a factor of a thousand! Advances in networking – both networking technology as well as the availability of pervasive networking – are changing the ground rules and economics for machine-to-machine communication.

The use of standards is greatly easing integration. Advances in software, software frameworks, and development tools, as well as the availability of functional libraries for many tasks, are creating an explosion in innovative IoT products and capabilities.

But the most significant new factor in IoT is economics. Technology, pervasive networking, and cloud computing are driving the cost of IoT down – in many cases by a factor of a thousand or more! New capabilities in sensors and actuators are opening up new areas of application. Cost reductions this large are often more important than new capabilities as they vastly broaden areas of application.

Another massive change is monetization of data. Companies are increasingly aware of the value of the data captured from IoT systems, especially after extensive analysis and datamining.

Further emphasizing the importance of economics are the new business models that are emerging. For example, jet engine companies moved from selling jet engines to selling “thrust hours” – a service-based model of supplying power as a service rather than selling hardware. A key part of this is extensive use of IoT to monitor every aspect of jet engine operation to provide more effective maintenance and support of the engines. As an example, Virgin Atlantic reports that their Boeing 787 aircraft produce 500GB of data per flight.

May 31, 2016

Reviews for RDO packages

We are in the process of getting the docs straightened out for reviewing RDO packages. As we do, I want to record what I have working.

I started with

rdopkg clone openstack-keystone

But that did not give me a repo that was in sync with my Gerrit account. I ended up with a .git/config that looks like this:

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url =
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "rpm-master"]
	remote = rpm-master
	merge = refs/heads/rpm-master
[remote "patches"]
	url =
	fetch = +refs/heads/*:refs/remotes/patches/*
[remote "upstream"]
	url = git://
	fetch = +refs/heads/*:refs/remotes/upstream/*
[remote "gerrit"]
	url = ssh://
	fetch = +refs/heads/*:refs/remotes/gerrit/*
[user]
	email =

openstack/keystone-distgit.git is the RPM packaging repo for Keystone.

openstack/keystone is the main keystone code repo.

My GitHub account predates my time at Red Hat, and I’d rather not mess with that account, but the vast majority of the rest of my git work is done as, so I wanted to make a local user config.

With this setup I could make a review by editing openstack-keystone.logrotate, committing, and running git review.

May 29, 2016

Regulation can fix security, except you can't regulate security
Every time I start a discussion about how we can solve some of our security problems, it seems like the topics of professional organizations and regulation are where things end up. While I think regulations and professional organizations can fix a lot of problems in an industry, I'm not sure they work for security. First let's talk about why regulation usually works, then why it won't work for security.

What is regulation?
You may not know it, but you deal with regulated industries every day. The food we eat, the cars we drive, the buildings we use, the roads, our water, products we buy, phones, internet, banks; there are literally too many to list. The reasons for the regulation vary greatly, but at the end of the day it's a nice way to use laws to protect society. It doesn't always directly protect people, sometimes it protects the government, or maybe even a giant corporation, but the basic idea is because of the regulation society is a better place. There are plenty of corner cases but for now let's just assume the goal is to make the world a better place.

Refrigerator regulation
One of my favorite stories about regulation involves refrigerator doors. A long time ago the door to a refrigerator would lock from the outside. If someone found themselves on the inside with a closed door, they couldn't get out. Given a refrigerator is designed to be airtight, one wouldn't last very long on the inside. The government decided to do something about this and told the companies that made refrigerators that there had to be a way to get out if you're stuck inside. Of course this was seen as impossible and it was expected most companies would have to go out of business or stop making refrigerators. Given a substantial percentage of the population now owns refrigerators, it's safe to say that didn't happen. The solution was to use magnets to hold the door shut. Now the thought of using a locking door seems pretty silly, especially when the solution was elegant and better in nearly every way.

Can we regulate cybersecurity?
The short answer is no. It can't be done. I do hate claiming something can't be done, someday I might be wrong. I imagine there will be some form of regulation eventually, it probably won't really work though. Let's use the last financial crisis to explain this. The financial industry has a lot of regulation, but it also has a lot of possibility. What I mean by this is the existing regulation mostly covers bad things that were done in the past, it's nearly impossible to really regulate the future due to the nature of regulation. So here's the thing. How many people went to jail from the last financial crisis? Not many. I'd bet in a lot of cases while some people were certainly horrible humans, they weren't breaking any laws. This will be the story of security regulation. We can create rules to dictate what happened in the past, but technology, bad guys, and people move very quickly in this space. If you regulated the industry to prevent a famous breach from a few years ago (there are many to choose from), by now the whole technology landscape has changed so much many of those rules wouldn't even apply today. This gets even crazier when you think about the brand new technology being invented every day.

Modern computer systems are Turing complete
A refrigerator has one door. One door that the industry didn't think they could fix. A modern IT system can do an infinite number of operations. You can't regulate a machine that can literally do anything. This would be like saying the front fridge door can't lock when you have a fridge with infinite area on the inside. If you can't find the door, and there are millions of other doors, some which don't open, it's not a useful regulation.

This is our challenge. We have machines that can literally do anything, and we have to make them secure. If there are infinite operations, there are by definition infinite security problems. I know that's a bit overdramatic, but the numbers are big enough they're basically infinity.

The things that generally come up revolve around having security professionals, training staff, getting tools to lock things down, or better defaults. None of these things will hurt, but none really work either. Even if you have the best staff in the world, you have to work with vendors who don't. Even if you have the best policies and tools, your developers and sysadmins will make silly mistakes. Even with the best possible defaults, one little error can undo everything.

What can we do?
I'm not suggesting we should curl up in the corner and weep (I'm also not saying not to). While weeping can be less dangerous than letting the new guy configure the server, it's not very helpful long term. I'm not suggesting that tools and training and staff are wastes of time and money; they have value to a certain point. It's sort of like taking a CPR course. You can't do brain surgery, but you can possibly save a life in an emergency. The real fix is going to come from technology and process that don't exist yet. Cybersecurity is a new concept that we can't understand with old models. We need new models, tools, and ideas. They don't exist yet, but they will someday. Go invent them, I'm impatient and don't want to wait.

If you have any ideas, let me know: @joshbressers

May 27, 2016

OpenShift and SSSD Part 3: Extended LDAP Attributes


This is the third post in a series on setting up advanced authentication mechanisms with OpenShift Origin. This entry will build upon the foundation created earlier, so if you haven’t already gone through that tutorial, start here and continue here.

Configuring Extended LDAP Attributes

Prerequisites

  • SSSD 1.12.0 or later. This is available on Red Hat Enterprise Linux 7.0 and later.
  • mod_lookup_identity 0.9.4 or later. The required version is not yet available on any released version of Red Hat Enterprise Linux 7, but RPMs for this platform are available from upstream at this COPR repository until they arrive in Red Hat Enterprise Linux.

Configuring SSSD

First, we need to ask SSSD to look up attributes in LDAP that it normally doesn’t care about for simple system-login use-cases. In the OpenShift case, there’s really only one such attribute: email. So we need to modify the [domain/DOMAINNAME] section of /etc/sssd/sssd.conf on the authenticating proxy and add this attribute:

ldap_user_extra_attrs = mail

Next, we also have to tell SSSD that it’s acceptable for this attribute to be retrieved by apache, so we need to add the following two lines to the [ifp] section of /etc/sssd/sssd.conf as well:

user_attributes = +mail
allowed_uids = apache, root

Now we should be able to restart SSSD and test this configuration.

# systemctl restart sssd.service

# getent passwd <username>
username:*:12345:12345:Example User:/home/username:/usr/bin/bash

# gdbus call \
        --system \
        --dest org.freedesktop.sssd.infopipe \
        --object-path /org/freedesktop/sssd/infopipe/Users/example_2ecom/12345 \
        --method org.freedesktop.DBus.Properties.Get \
        "org.freedesktop.sssd.infopipe.Users.User" "extraAttributes"
(<{'mail': ['']}>,)
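The example_2ecom element in the object path above is the domain name example.com escaped for D-Bus, which only allows [A-Za-z0-9_] in object path elements. A minimal sketch for building the path for another domain, assuming SSSD escapes every other character as "_" plus its two-digit lowercase hex code (inferred from the "." to "_2e" example, not taken from SSSD's source) and handling ASCII input only:

```python
import string

INFOPIPE_USERS = "/org/freedesktop/sssd/infopipe/Users"
_SAFE = set(string.ascii_letters + string.digits)


def escape_path_component(value: str) -> str:
    """Escape one D-Bus object path element (assumed SSSD scheme, ASCII only).

    Every character outside [A-Za-z0-9] becomes '_' plus two lowercase hex
    digits, e.g. '.' -> '_2e' and '-' -> '_2d'.
    """
    return "".join(c if c in _SAFE else "_%02x" % ord(c) for c in value)


def user_object_path(domain: str, uid: int) -> str:
    """Build the InfoPipe object path for a user, as used with gdbus above."""
    return f"{INFOPIPE_USERS}/{escape_path_component(domain)}/{uid}"


print(user_object_path("example.com", 12345))
# /org/freedesktop/sssd/infopipe/Users/example_2ecom/12345
```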

Configuring Apache

Now that SSSD is set up and successfully serving extended attributes, we need to configure the web server to ask for them and to insert them in the correct places.

First, we need to install the mod_lookup_identity module for Apache (see the note in the “Prerequisites” section about installing it on RHEL 7):

# yum -y install mod_lookup_identity

Second, we need to enable the module so that Apache will load it. We need to modify /etc/httpd/conf.modules.d/55-lookup_identity.conf and uncomment the line:

LoadModule lookup_identity_module modules/

Next, we need to let SELinux know that it’s acceptable for Apache to connect to SSSD over D-Bus, so we’ll set an SELinux boolean:

# setsebool -P httpd_dbus_sssd on

Then we’ll edit /etc/httpd/conf.d/openshift-proxy.conf and add the following lines inside the <ProxyMatch /oauth/authorize> section (the LookupOutput, LookupUserAttr and LookupUserGECOS lines are the additions):

  <ProxyMatch /oauth/authorize>
    AuthName openshift

    LookupOutput Headers
    LookupUserAttr mail X-Remote-User-Email
    LookupUserGECOS X-Remote-User-Display-Name

    RequestHeader set X-Remote-User %{REMOTE_USER}s env=REMOTE_USER
  </ProxyMatch>

Then restart Apache to pick up the changes.

# systemctl restart httpd.service

Configuring OpenShift

The proxy is now all set, so it’s time to tell OpenShift where to find these new attributes during login. Edit the /etc/origin/master/master-config.yaml file and add the following lines to the identityProviders section (the X-Remote-User-Email and X-Remote-User-Display-Name entries are new):

  - name: sssd
    challenge: true
    login: true
    mappingMethod: claim
    provider:
      apiVersion: v1
      kind: RequestHeaderIdentityProvider
      challengeURL: "${query}"
      loginURL: "${query}"
      clientCA: /etc/origin/master/proxy/proxyca.crt
      headers:
      - X-Remote-User
      emailHeaders:
      - X-Remote-User-Email
      nameHeaders:
      - X-Remote-User-Display-Name
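Each header list tells the provider where to look for one piece of the identity. The following is a simplified, hypothetical model of that lookup, not OpenShift's implementation: it assumes the first non-empty header in each list wins and that header names compare case-insensitively, as they do in HTTP. The user name and email values are placeholders.

```python
from typing import Dict, Mapping, Optional, Sequence


def first_header(headers: Mapping[str, str], names: Sequence[str]) -> Optional[str]:
    """Return the first non-empty value among `names` (case-insensitive)."""
    lowered = {key.lower(): value for key, value in headers.items()}
    for name in names:
        value = lowered.get(name.lower(), "")
        if value:
            return value
    return None


def identity_from_headers(headers: Mapping[str, str]) -> Optional[Dict[str, Optional[str]]]:
    """Simplified model of a request-header identity provider."""
    user = first_header(headers, ["X-Remote-User"])
    if user is None:
        return None  # no identity header: treat the request as unauthenticated
    return {
        "user": user,
        "email": first_header(headers, ["X-Remote-User-Email"]),
        "name": first_header(headers, ["X-Remote-User-Display-Name"]),
    }


# Headers as mod_lookup_identity might set them for a hypothetical user.
print(identity_from_headers({
    "X-Remote-User": "username",
    "X-Remote-User-Email": "username@example.com",
    "X-Remote-User-Display-Name": "Example User",
}))
```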

Go ahead and launch OpenShift with this updated configuration and log in to the web console as a new user. You should see their full name appear in the upper-right of the screen. You can also verify with oc get identities -o yaml that both email addresses and full names are available.

Debugging Notes

OpenShift currently only saves these attributes to the user at the time of the first login and doesn’t update them again after that. So while you are testing (and only while testing), it’s advisable to run oc delete users,identities --all to clear the identities out so you can log in again.

OpenShift and SSSD Part 2: LDAP Form Authentication


This is the second post in a series on setting up advanced authentication mechanisms with OpenShift Origin. This entry will build upon the foundation created earlier, so if you haven’t already gone through that tutorial, start here. Note that some of the content on that page has changed since it was first published to ensure that this second part is easier to set up, so make sure to double-check your configuration.

Configuring Form-based Authentication

In this tutorial, I’m going to describe how to set up form-based authentication to use when signing into the OpenShift Origin web console. The first step is to prepare a login page. The OpenShift upstream repositories have a handy template for forms, so we will copy that down to our authenticating proxy on

# curl -o /var/www/html/login.html \

You may edit this login HTML however you prefer, but if you change the form field names, you will need to update those in the configuration below as well.

Next, we need to install another Apache module, this time for intercepting form-based authentication.

# yum -y install mod_intercept_form_submit

Then we need to modify /etc/httpd/conf.modules.d/55-intercept_form_submit.conf and uncomment the LoadModule line.

Next, we’ll add a new section to our openshift-proxy.conf inside the <VirtualHost *:443> block.

  <Location /login-proxy/oauth/authorize>
    # Insert your backend server name/ip here.

    InterceptFormPAMService openshift
    InterceptFormLogin httpd_username
    InterceptFormPassword httpd_password

    RewriteCond %{REQUEST_METHOD} GET
    RewriteRule ^.*$ /login.html [L]
  </Location>

This tells Apache to serve /login.html for GET requests to /login-proxy/oauth/authorize and, for POST requests to that location, to pass the submitted username and password to the openshift PAM service, just like in the challenging-proxy example in the first entry of this series. This is all we need to do on the Apache side of things, so restart the service and move back over to the OpenShift configuration.
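Because the proxy intercepts the form fields by name, the login page and the InterceptFormLogin/InterceptFormPassword directives have to agree. A minimal sketch of the POST body carrying those two fields, built with Python's standard library; the host proxy.example.com is a placeholder, and the OAuth query string OpenShift normally appends to this URL is omitted. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical proxy host; substitute your authenticating proxy.
PROXY = "https://proxy.example.com"


def build_login_request(username: str, password: str) -> Request:
    """Build (but do not send) the form POST the proxy will intercept.

    Field names must match InterceptFormLogin / InterceptFormPassword.
    """
    body = urlencode({"httpd_username": username,
                      "httpd_password": password}).encode("ascii")
    return Request(f"{PROXY}/login-proxy/oauth/authorize", data=body,
                   method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


req = build_login_request("username", "secret")
print(req.get_method(), req.full_url)
print(req.data.decode())  # httpd_username=username&httpd_password=secret
```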

In the master-config.yaml, update the identityProviders section as follows:

  - name: any_provider_name
    challenge: true
    login: true
    mappingMethod: claim
    provider:
      apiVersion: v1
      kind: RequestHeaderIdentityProvider
      challengeURL: "${query}"
      loginURL: "${query}"
      clientCA: /etc/origin/master/proxy/proxyca.crt
      headers:
      - X-Remote-User

Now restart OpenShift with the updated configuration. You should be able to browse to and use your LDAP credentials at the login form to sign in.

May 26, 2016

OpenShift and SSSD Part 1: Basic LDAP Authentication


OpenShift provides a fairly simple and straightforward authentication provider for use with LDAP setups. It has one major limitation, however: it can only connect to a single LDAP server. This can be problematic if that LDAP server becomes unavailable for any reason. When this happens, end-users get very unhappy.

Enter SSSD. Originally designed to manage local and remote authentication to the host OS, it can now be configured to provide identity, authentication and authorization services to web services like OpenShift as well. It provides a multitude of advantages over the built-in LDAP provider; in particular it has the ability to connect to any number of failover LDAP servers as well as to cache authentication attempts in case it can no longer reach any of those servers.

These advantages don’t come without a cost, of course: the setup of this configuration is somewhat more advanced, so I’m writing up this guide to help you get it set up. Rather than adding a few lines to the master-config.yaml in OpenShift and calling it a day, we are going to need to set up a separate authentication server that OpenShift will talk to. This guide will describe how to do it on a dedicated physical or virtual machine, but the concepts should also be applicable to loading up such a setup in a container as well. (And in the future, I will be looking into whether we could build such a static container right into OpenShift, but for now this document will have to suffice.) For this guide, I will use the term VM to refer to either type of machine, simply because it’s shorter to type and read.

This separate authentication server will be called the “authenticating proxy” from here on out and describes a solution that will provide a specialized httpd server that will handle the authentication challenge and return the results to the OpenShift Server. See the OpenShift documentation for security considerations around the use of an authenticating proxy.

Formatting Notes

  • If you see something in italics within a source-code block below, you should replace it with the appropriate value for your environment.
  • Source-code blocks with a leading ‘#’ character indicate a command that must be executed as the “root” user, either by logging in as root or using the sudo command.
  • Source-code blocks with a leading ‘$’ character indicate a command that may be executed by any user (privileged or otherwise). These commands are generally for testing purposes.


You will need to know the following information about your LDAP server to follow the directions below:

  • Is the directory server powered by FreeIPA, Active Directory or another LDAP solution?
  • What is the URI for the LDAP server? e.g.
  • Where is the CA certificate for the LDAP server?
  • Does the LDAP server correspond to RFC 2307 or RFC2307bis for user-groups?
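The RFC 2307 versus RFC 2307bis question matters because the two schemas record group membership differently, and SSSD's ldap_schema option has to match the server. A hypothetical Python sketch of the difference, using made-up entries: RFC 2307 posixGroup entries list member user names in memberUid, while RFC 2307bis groups list full member DNs in member. The DN handling is deliberately naive (it reads the leading uid= RDN and ignores escaped commas); a real client would look each DN up.

```python
from typing import Dict, List


def group_member_names(entry: Dict[str, List[str]], schema: str) -> List[str]:
    """Return member user names from an LDAP group entry (attribute -> values)."""
    if schema == "rfc2307":
        # posixGroup: members are plain user names.
        return list(entry.get("memberUid", []))
    if schema == "rfc2307bis":
        # Members are DNs; take the value of a leading uid= RDN (naive parse).
        names = []
        for dn in entry.get("member", []):
            attr, _, value = dn.split(",", 1)[0].partition("=")
            if attr.strip().lower() == "uid":
                names.append(value.strip())
        return names
    raise ValueError(f"unknown schema: {schema}")


# Hypothetical entries for the same group under each schema.
print(group_member_names({"memberUid": ["alice", "bob"]}, "rfc2307"))
print(group_member_names(
    {"member": ["uid=alice,ou=people,dc=example,dc=com",
                "uid=bob,ou=people,dc=example,dc=com"]}, "rfc2307bis"))
```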

Prepare VMs:

  • A VM to use as the authenticating proxy. This machine must have at least SSSD 1.12.0 available, which means a fairly recent operating system. In these examples, I will be using a clean install of Red Hat Enterprise Linux 7.2 Server.
  • A VM to use to run OpenShift

(These machines *can* be configured to run on the same system, but for the purposes of this tutorial, I am keeping them separate.)

Phase 1: Certificate Generation

In order to ensure that communication between the authenticating proxy and OpenShift is trustworthy, we need to create a set of TLS certificates that we will use during the other phases of this setup. For the purposes of this demo, we will start by using the auto-generated certificates created as part of running

# openshift start \
    --public-master= \

Among other things, this will generate /etc/origin/master/ca.{crt,key}. We will use this signing certificate to generate keys to use on the authenticating proxy.

# mkdir -p /etc/origin/proxy/
# oadm ca create-server-cert \
    --cert='/etc/origin/proxy/' \
    --key='/etc/origin/proxy/' \, \
    --signer-cert=/etc/origin/master/ca.crt \
    --signer-key='/etc/origin/master/ca.key' \

For the hostnames, ensure that any hostnames and interface IP addresses that might need to access the proxy are listed, otherwise the HTTPS connection will fail.

Next, we will generate the API client certificate that the authenticating proxy will use to prove its identity to OpenShift (this is necessary so that malicious users cannot impersonate the proxy and send fake identities). First, we will create a new CA to sign this client certificate.

# oadm ca create-signer-cert \
  --cert='/etc/origin/proxy/proxyca.crt' \
  --key='/etc/origin/proxy/proxyca.key' \
  --name="openshift-proxy-signer@`date +%s`" \

(The date +%s in that block is used to make the signer unique. You can use any name you prefer, however.)

# oadm create-api-client-config \
    --certificate-authority='/etc/origin/proxy/proxyca.crt' \
    --client-dir='/etc/origin/proxy' \
    --signer-cert='/etc/origin/proxy/proxyca.crt' \
    --signer-key='/etc/origin/proxy/proxyca.key' \
    --signer-serial='/etc/origin/proxy/proxyca.serial.txt' \
# cat /etc/origin/proxy/system\:proxy.crt \
      /etc/origin/proxy/system\:proxy.key \
      > /etc/origin/proxy/authproxy.pem

Phase 2: Authenticating Proxy Setup

Step 1: Copy certificates

From, securely copy the necessary certificates to the proxy machine:

# scp /etc/origin/master/ca.crt \

# scp /etc/origin/proxy/ \
      /etc/origin/proxy/authproxy.pem \

# scp /etc/origin/proxy/ \

Step 2: SSSD Configuration

Install a new VM with a recent operating system (in order to use the mod_lookup_identity module later, it will need to be running SSSD 1.12.0 or later). In these examples, I will be using a clean install of Red Hat Enterprise Linux 7.2 Server.

First, install all of the necessary dependencies:

# yum install -y sssd \
                 sssd-dbus \
                 realmd \
                 httpd \
                 mod_session \
                 mod_ssl \

This will give us the SSSD and the web server components we will need. The first step here will be to set up SSSD to authenticate this VM against the LDAP server. If the LDAP server in question is a FreeIPA or Active Directory environment, then realmd can be used to join this machine to the domain. This is the easiest way to get up and running.

realm join

If you aren’t running a domain, then your best option is to use the authconfig tool (or follow the many other tutorials on the internet for configuring SSSD for identity and authentication).

# authconfig --update --enablesssd --enablesssdauth \
             --enableldaptls \

This should create /etc/sssd/sssd.conf with most of the appropriate settings. (Note: RHEL 7 appears to have a bug wherein authconfig does not create the /etc/openldap/cacerts directory, so you may need to create it manually before running the above command.)

If you are interested in using SSSD to manage failover situations for LDAP, this can be configured simply by adding additional entries in /etc/sssd/sssd.conf on the ldap_uri line. Systems enrolled with FreeIPA will automatically handle failover using DNS SRV records.
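SSSD reads ldap_uri as a comma-separated list of servers in order of preference. If you manage sssd.conf from a script, a sketch using Python's configparser might look like the following. The domain name and server URIs are hypothetical, and interpolation is disabled because sssd.conf values can contain % characters.

```python
import configparser
import io
from typing import List


def set_ldap_failover(conf_text: str, domain: str, uris: List[str]) -> str:
    """Return sssd.conf text with ldap_uri in [domain/<domain>] set to `uris`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    section = f"domain/{domain}"
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section in sssd.conf")
    # First URI is preferred; SSSD fails over to the next ones in order.
    parser.set(section, "ldap_uri", ", ".join(uris))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

configparser does not preserve comments when it writes the file back out, so review the result (and keep sssd.conf at mode 0600, which SSSD requires) before restarting SSSD.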

Finally, restart SSSD to make sure that all of the changes are applied properly:

$ systemctl restart sssd.service

Now, test that the user information can be retrieved properly:

$ getent passwd <username>
username:*:12345:12345:Example User:/home/username:/usr/bin/bash
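The same NSS lookup that getent performs is available from Python's standard pwd module, which can be handy for scripted checks. This is a minimal sketch; lookup_user is a hypothetical helper name. With SSSD listed in /etc/nsswitch.conf, it should resolve LDAP users as well as local ones.

```python
import pwd
from typing import Dict, Union


def lookup_user(username: str) -> Dict[str, Union[str, int]]:
    """Resolve a user through NSS, like `getent passwd <username>`.

    Raises KeyError if no NSS source (files, sss, ...) knows the user.
    """
    entry = pwd.getpwnam(username)
    return {
        "name": entry.pw_name,
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "gecos": entry.pw_gecos,
        "home": entry.pw_dir,
        "shell": entry.pw_shell,
    }
```

Calling lookup_user with an LDAP user name should return the same fields as the getent line above. A KeyError means NSS could not find the user.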

At this point, it is wise to attempt to log into the VM as an LDAP user and confirm that the authentication is properly set up. This can be done via the local console or a remote service such as SSH. (Later, you can modify your /etc/pam.d files to disallow this access if you prefer.) If this fails, consult the SSSD troubleshooting guide.

Step 3: Apache Configuration

Now that we have the authentication pieces in place, we need to set up Apache to talk to SSSD. First, we will create a PAM stack file for use with Apache. Create the /etc/pam.d/openshift file and add the following contents:

auth required
account required

This tells PAM (Pluggable Authentication Modules) that when an authentication request is issued for the “openshift” stack, it should hand the request to SSSD to determine authentication and access control.

Next we will configure the Apache httpd.conf. (Taken from the OpenShift documentation and modified for SSSD.) For this tutorial, we’re only going to set up the challenge authentication (useful for logging in with oc login and similar automated tools). A future entry in this series will describe setup to use the web console.

First, create the new file openshift-proxy.conf in /etc/httpd/conf.d (substituting the correct hostnames where indicated):

LoadModule request_module modules/
LoadModule lookup_identity_module modules/
# Nothing needs to be served over HTTP.  This virtual host simply redirects to
# HTTPS.
<VirtualHost *:80>
  DocumentRoot /var/www/html
  RewriteEngine              On
  RewriteRule     ^(.*)$     https://%{HTTP_HOST}$1 [R,L]
</VirtualHost>

<VirtualHost *:443>
  # This needs to match the certificates you generated.  See the CN and X509v3
  # Subject Alternative Name in the output of:
  # openssl x509 -text -in /etc/pki/tls/certs/

  DocumentRoot /var/www/html
  SSLEngine on
  SSLCertificateFile /etc/pki/tls/certs/
  SSLCertificateKeyFile /etc/pki/tls/private/
  SSLCACertificateFile /etc/pki/CA/certs/ca.crt

  # Send logs to a specific location to make them easier to find
  ErrorLog logs/proxy_error_log
  TransferLog logs/proxy_access_log
  LogLevel warn
  SSLProxyEngine on
  SSLProxyCACertificateFile /etc/pki/CA/certs/ca.crt
  # It's critical to enforce client certificates on the Master.  Otherwise
  # requests could spoof the X-Remote-User header by accessing the Master's
  # /oauth/authorize endpoint directly.
  SSLProxyMachineCertificateFile /etc/pki/tls/certs/authproxy.pem

  # Send all requests to the console
  RewriteEngine              On
  RewriteRule     ^/console(.*)$     https://%{HTTP_HOST}:8443/console$1 [R,L]

  # In order to use the challenging-proxy, an X-Csrf-Token must be present.
  RewriteCond %{REQUEST_URI} ^/challenging-proxy
  RewriteCond %{HTTP:X-Csrf-Token} ^$ [NC]
  RewriteRule ^.* - [F,L]

  <Location /challenging-proxy/oauth/authorize>
    # Insert your backend server name/ip here.
    AuthType Basic
    AuthBasicProvider PAM
    AuthPAMService openshift
    Require valid-user
  </Location>

  <ProxyMatch /oauth/authorize>
    AuthName openshift
    RequestHeader set X-Remote-User %{REMOTE_USER}s
  </ProxyMatch>
</VirtualHost>

RequestHeader unset X-Remote-User


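Taken together, the CSRF rewrite rules and the two RequestHeader directives decide what the master sees. The following is a simplified model of that decision, not Apache's actual request processing. proxy_request and its parameters are hypothetical, and the 401 stands in for the Basic/PAM challenge triggered by Require valid-user.

```python
from typing import Dict, Optional, Tuple


def proxy_request(path: str, headers: Dict[str, str],
                  authenticated_user: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Simplified model of the proxy's decision for one request.

    Returns (status, headers_forwarded_to_master). `authenticated_user` is the
    name PAM accepted for the Basic credentials, or None if none was accepted.
    """
    has_csrf = any(k.lower() == "x-csrf-token" and v for k, v in headers.items())

    # RewriteCond ^/challenging-proxy + empty X-Csrf-Token -> RewriteRule [F]
    if path.startswith("/challenging-proxy") and not has_csrf:
        return 403, {}

    # <Location /challenging-proxy/oauth/authorize>: Require valid-user
    if path.startswith("/challenging-proxy/oauth/authorize") and authenticated_user is None:
        return 401, {}

    # RequestHeader unset X-Remote-User: drop any client-supplied identity
    forwarded = {k: v for k, v in headers.items() if k.lower() != "x-remote-user"}

    # <ProxyMatch /oauth/authorize>: RequestHeader set X-Remote-User %{REMOTE_USER}s
    if "/oauth/authorize" in path and authenticated_user is not None:
        forwarded["X-Remote-User"] = authenticated_user
    return 200, forwarded
```

The important property is the header filtering. Any X-Remote-User a client sends is discarded, and only the PAM-authenticated name is forwarded. The master, in turn, only trusts that header on connections that present a client certificate signed by the clientCA configured in master-config.yaml.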
Then we need to tell SELinux that it’s acceptable for Apache to contact the PAM subsystem, so we set a boolean:

# setsebool -P allow_httpd_mod_auth_pam on

At this point, we can start up Apache.

# systemctl start httpd.service

Phase 3: OpenShift Configuration

This describes how to set up an OpenShift server from scratch in an “all in one” configuration. For more complicated (and interesting) setups, consult the official OpenShift documentation.

First, we need to modify the default configuration to use the new identity provider we just created. We’ll start by modifying the /etc/origin/master/master-config.yaml file. Scan through it and locate the identityProviders section and replace it with:

  - name: any_provider_name
    challenge: true
    login: false
    mappingMethod: claim
    provider:
      apiVersion: v1
      kind: RequestHeaderIdentityProvider
      challengeURL: "${query}"
      clientCA: /etc/origin/master/proxy/proxyca.crt
      headers:
      - X-Remote-User

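As I understand the RequestHeaderIdentityProvider, when an unauthenticated client hits the master's /oauth/authorize endpoint, the master redirects it to challengeURL after replacing ${query} with the query string of the original request. The sketch below shows that substitution. The proxy hostname and the example query parameters are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode


def expand_challenge_url(template: str, query: Dict[str, str]) -> str:
    """Replace the ${query} token with the original request's query string."""
    return template.replace("${query}", urlencode(query))
```

With a template such as "https://proxy.example.com/challenging-proxy/oauth/authorize?${query}", the client lands on the proxy's challenging endpoint carrying the same OAuth parameters it originally sent to the master.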
Now we can start OpenShift with the updated configuration:

# openshift start \
    --public-master= \
    --master-config=/etc/origin/master/master-config.yaml \

Now you can test logins with:

oc login

It should now be possible to log in with only valid LDAP credentials. Stay tuned for further entries in this series where I will teach you how to set up a “login” provider for authenticating the web console, how to retrieve extended user attributes like email address and full name from LDAP, and also how to set up automatic single-sign-on for users in a FreeIPA or Active Directory domain.

 Updates 2016-05-27: There were some mistakes in the httpd.conf as originally written that made it difficult to set up Part 2. They have been retroactively corrected. Additionally, I’ve moved the incomplete configuration of extended attributes out of this entry and will reintroduce them in a further entry in this series.

May 23, 2016

Thoughts on our security bubble
Last week I spent time with a lot of normal people. Well, they were all computer folks, but not the sort one would find in a typical security circle. It really got me thinking about the bubble we live in as the security people.

There are a lot of things we take for granted. I can reference Dunning-Kruger and "turtles all the way down" and not have to explain myself. If I talk about a buffer overflow, or almost any other security term, I never have to explain what's going on. Even some of the more obscure technologies like container scanners and SCAP need only a few words of explanation. It's easy to talk to security people, or at least it's easy for security people to talk to other security people.

Sometimes it's good to get out of your comfort zone, though. Last week I spent much of my time well outside the groups I'm comfortable with, and it's a good thing for us to do. I really do think this is a big problem the security universe suffers from: a lot of us don't really get out there and see what it's really like. I know I always assume everyone else knows a lot about security. They don't know a lot about security. They usually don't even know a little about security. This puts us in a place where we think everyone else is dumb, and they think we're idiots. Do you listen to someone who appears to be a smug jerk? Of course not, nobody does. This is one of the reasons it can be hard to get our messages across.

If we want people to listen to us, they have to trust us. If we want people to trust us, we have to make them understand us. If we want people to understand us, we have to understand them first. That bit of circular Yoda logic sounds insane, but it really is true. There's nothing worse than trying to help someone only to have them ignore you, or worse, do the opposite because they can.

So here's what I want to do. I have some homework for you, assuming you made it this far, which you probably did if you're reading this. Go talk to some non security people. Don't try to educate them on anything, just listen to what they have to say, even if they're wrong, especially if they're wrong, don't correct them. Just listen. Listen and see what you can learn. I bet it will be something amazing.

Let me know what you learn: @joshbressers

May 19, 2016

Reproducing an Open vSwitch Bridge Configuration

In the previous post, I described the setup for installing FreeIPA on a VM parallel to the undercloud VM set up by Tripleo Quickstart. The network on the undercloud VM has been set up by Ironic and Neutron to listen on a network defined for the overcloud. I want to reproduce this on a second machine that is not enrolled in the undercloud. How can I reproduce the steps?


This is far more complex than necessary. All I needed to do was:

sudo ip addr add dev eth1
sudo ip link set eth1 up

To get connectivity, and persist that info in /etc/sysconfig/network-scripts/ifcfg-eth1

But the OVS “cloning” here is still interesting enough to warrant its own post.


Using Tripleo Quickstart, I see that the interface I need is created with:

sudo bash -c 'cat < /etc/sysconfig/network-scripts/ifcfg-vlan10

sudo ifup ifcfg-vlan10

But my VM does not have an OVS_BRIDGE br-ctlplane defined. How do I create that?

Using the ovs commands, I can look at the bridge definition:

$ sudo ovs-vsctl show
    Bridge br-ctlplane
        Port "vlan10"
            tag: 10
            Interface "vlan10"
                type: internal
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port "eth1"
            Interface "eth1"
    Bridge br-int
        fail_mode: secure
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
        Port br-int
            Interface br-int
                type: internal
        Port "tapacff1724-9f"
            tag: 1
            Interface "tapacff1724-9f"
                type: internal
    ovs_version: "2.5.0"

And that does not exist on the new VM. I’ve been able to deduce that the creation of this bridge happened as a side effect of running

openstack undercloud install

Since I don’t want an undercloud on my other node, I need to reproduce the OVS commands to build the bridge.

I’m in luck. These commands are all captured in /etc/openvswitch/conf.db, and I can pull them out with:

grep '^\{'  /etc/openvswitch/conf.db | jq '. | ._comment ' | sed -e 's!^\"!!g' -e's!ovs-vsctl:!!' -e 's!\"$!!'   | grep -v null >

That gets me:

 ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.12.1
 ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.5.0 \"external-ids:system-id=\\\"a9460ec6-db71-42fb-aec7-a5356bcda153\\\"\" \"system-type=\\\"CentOS\\\"\" \"system-version=\\\"7.2.1511-Core\\\"\"
 ovs-vsctl -t 10 -- --may-exist add-br br-ctlplane -- set bridge br-ctlplane other-config:hwaddr=00:59:cf:9c:84:3a -- br-set-external-id br-ctlplane bridge-id br-ctlplane
 ovs-vsctl -t 10 -- --if-exists del-port br-ctlplane eth1 -- add-port br-ctlplane eth1
 ovs-vsctl -t 10 -- --if-exists del-port br-ctlplane eth1 -- add-port br-ctlplane eth1
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-br br-int -- set Bridge br-int datapath_type=system
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set-fail-mode br-int secure
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Bridge br-int protocols=OpenFlow10
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-br br-ctlplane -- set Bridge br-ctlplane datapath_type=system
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Bridge br-ctlplane protocols=OpenFlow10
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-port br-int int-br-ctlplane -- set Interface int-br-ctlplane type=patch options:peer=nonexistent-peer
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-port br-ctlplane phy-br-ctlplane -- set Interface phy-br-ctlplane type=patch options:peer=nonexistent-peer
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Interface int-br-ctlplane options:peer=phy-br-ctlplane
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Interface phy-br-ctlplane options:peer=int-br-ctlplane
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- add-port br-int tapacff1724-9f -- set Interface tapacff1724-9f type=internal external_ids:iface-id=acff1724-9fb2-4771-a7db-8bd93e7f3833 external_ids:iface-status=active external_ids:attached-mac=fa:16:3e:f6:6d:86
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f other_config:physical_network=ctlplane other_config:net_uuid=6dd40444-6cc9-4cfa-bfbd-15b614f6e9e1 other_config:network_type=flat
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f other_config:tag=1 other_config:physical_network=ctlplane other_config:net_uuid=6dd40444-6cc9-4cfa-bfbd-15b614f6e9e1 other_config:network_type=flat
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f tag=1
 ovs-vsctl -t 10 -- --may-exist add-port br-ctlplane vlan10 tag=10 -- set Interface vlan10 type=internal
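If jq is not available, the same extraction can be sketched in Python. Like the pipeline above, this assumes conf.db interleaves "OVSDB JSON ..." header lines with one-line JSON transaction records whose optional _comment field starts with "ovs-vsctl:". Unlike the jq/sed pipeline, json.loads decodes the escaped quotes, so the output can be pasted into a shell directly (jq -r would do the same). extract_commands is a hypothetical name.

```python
import json
from typing import Iterable, List

PREFIX = "ovs-vsctl:"


def extract_commands(lines: Iterable[str]) -> List[str]:
    """Return the ovs-vsctl command lines recorded in an OVSDB file.

    Mirrors the grep/jq/sed pipeline: keep only JSON records, take their
    _comment field, drop records without one, strip the "ovs-vsctl:" prefix.
    """
    commands: List[str] = []
    for line in lines:
        if not line.startswith("{"):
            continue  # skip the "OVSDB JSON <length> <hash>" header lines
        comment = json.loads(line).get("_comment")
        if comment is None:
            continue  # transaction recorded without a comment
        if comment.startswith(PREFIX):
            comment = comment[len(PREFIX):]
        commands.append(comment.strip())
    return commands
```

Usage, as root: with open("/etc/openvswitch/conf.db") as db: print("\n".join(extract_commands(db)))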

Now I don’t want to blindly re-execute this, as there are some embedded values particular to the first machine. The MAC 00:59:cf:9c:84:3a for eth1 is reused by the bridge. The first two lines look like system specific setup. Let’s see if the new VM has anything along these lines.

Things to note:

  1. /etc/openvswitch/ is empty
  2. systemctl status openvswitch.service shows the service is not running

Let’s try starting it:
sudo systemctl start openvswitch.service

grep '^\{'  /etc/openvswitch/conf.db | jq '. | ._comment ' | sed -e 's!^\"!!g' -e's!ovs-vsctl:!!' -e 's!\"$!!'   | grep -v null 
 ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.12.1
 ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.5.0 \"external-ids:system-id=\\\"8f68fbfb-9278-4772-87f1-500bc80bb917\\\"\" \"system-type=\\\"CentOS\\\"\" \"system-version=\\\"7.2.1511-Core\\\"\"

So we can drop those two lines.

Extract the MAC for interface eth1:

ip addr show eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
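The two adjustments can be applied mechanically before executing anything: drop the per-host initialization lines (which openvswitch.service regenerates, as seen above) and substitute this VM's MAC. This is a minimal sketch. adapt_commands is a hypothetical name, and the marker strings are inferred from the two lines shown above, not from any OVS documentation.

```python
import re
from typing import List

# Substrings identifying the per-host initialization commands; inferred from
# the two lines openvswitch.service regenerated on the new VM.
HOST_SPECIFIC_MARKERS = ("-- init --", "set Open_vSwitch .")


def adapt_commands(commands: List[str], old_mac: str, new_mac: str) -> List[str]:
    """Drop host-specific init commands and swap the old MAC for the new one."""
    pattern = re.compile(re.escape(old_mac), re.IGNORECASE)
    adapted: List[str] = []
    for cmd in commands:
        if any(marker in cmd for marker in HOST_SPECIFIC_MARKERS):
            continue
        adapted.append(pattern.sub(new_mac, cmd))
    return adapted
```

Calling adapt_commands(commands, "00:59:cf:9c:84:3a", "00:59:cf:9c:84:3e") on the extracted list yields the commands to put in the script.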

OK, that is about all we can do. Execute it.

sudo ./

No complaints. What did we get?

$ sudo ovs-vsctl show
    Bridge br-int
        fail_mode: secure
        Port "tapacff1724-9f"
            tag: 1
            Interface "tapacff1724-9f"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
    Bridge br-ctlplane
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port "vlan10"
            tag: 10
            Interface "vlan10"
                type: internal
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port "eth1"
            Interface "eth1"
    ovs_version: "2.5.0"

Looks right.

One difference I notice is that on the undercloud, the bridge has an IP address:

7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:59:cf:9c:84:3a brd ff:ff:ff:ff:ff:ff
    inet brd scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::259:cfff:fe9c:843a/64 scope link 
       valid_lft forever preferred_lft forever

Let’s add one to the bridge on our new machine:

$ cat /etc/sysconfig/network-scripts/ifcfg-br-ctlplane
# This file is autogenerated by os-net-config
OVS_EXTRA="set bridge br-ctlplane other-config:hwaddr=00:59:cf:9c:84:3a -- br-set-external-id br-ctlplane bridge-id br-ctlplane"

Again, make minor edits to use the proper MAC and a different IP address. Bring it up with:

sudo ifup br-ctlplane

And we can see it:

$ ip addr show br-ctlplane
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
    inet brd scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::259:cfff:fe9c:843e/64 scope link 
       valid_lft forever preferred_lft forever

Last step: we need to bring up the eth1 interface. Again, give it a config file, this time in /etc/sysconfig/network-scripts/ifcfg-eth1
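The file's contents are not shown here. For a port attached to an OVS bridge, the openvswitch initscripts integration on RHEL/CentOS uses keys along the lines below. Treat the exact key set as an assumption and confirm it against the RHEL notes in the openvswitch documentation. render_ifcfg is a hypothetical helper that writes shell-safe KEY=value lines, since initscripts source these files with the shell.

```python
import shlex
from typing import Dict


def render_ifcfg(settings: Dict[str, str]) -> str:
    """Render ifcfg-* contents as KEY=value lines, shell-quoting values."""
    return "".join(f"{key}={shlex.quote(value)}\n" for key, value in settings.items())


# Assumed keys for eth1 as a port on the br-ctlplane OVS bridge.
ETH1_OVS_PORT = {
    "DEVICE": "eth1",
    "ONBOOT": "yes",
    "DEVICETYPE": "ovs",
    "TYPE": "OVSPort",
    "OVS_BRIDGE": "br-ctlplane",
    "BOOTPROTO": "none",
}
```

print(render_ifcfg(ETH1_OVS_PORT)) produces a body you could save as ifcfg-eth1.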


And bring it up with:

sudo ifup eth1

Make sure it is up:

$ ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP qlen 1000
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::259:cfff:fe9c:843e/64 scope link 
       valid_lft forever preferred_lft forever

And usable:

$  ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=1.41 ms
64 bytes from icmp_seq=2 ttl=64 time=0.627 ms

I’d really like to laud the Open vSwitch developers for their approach to the database. Having the commands available in the database is a fantastic tool. That is a pattern I would love to see emulated elsewhere.

Installing FreeIPA on a Tripleo undercloud

I’ve been talking about using FreeIPA to secure OpenStack since the Havana summit in Portland. I’m now working with Tripleo to install OpenStack. Installing the IPA server alongside Tripleo Quickstart requires a VM that is accessible from the Ansible playbook.

UPDATE:  This is changing rapidly.  I’ll post complete updates in a bit, but the commit below is now one in a chain, and the instructions are in the git messages for the commits.  One missing step in order to run ansible is: export ANSIBLE_CONFIG=$HOME/.quickstart/tripleo-quickstart/


Build the Identity VM

  • Apply the patch to quickstart that builds the VM
  • Run quickstart at least up to the undercloud stage. The steps below do the complete install.

Since Quickstart makes a git repo under ~/.quickstart, I’ve been using that as my repo. It avoids duplication, and makes my changes visible.

mkdir ~/.quickstart
cd ~/.quickstart
git clone
cd tripleo-quickstart
git review -d 315749
~/.quickstart/tripleo-quickstart/   -t all

If you are not set up for git review, you can pull the patch manually from Gerrit.

Set the hostname FQDN for the identity machine

ssh -F /home/ayoung/.quickstart/ssh.config.ansible identity-root hostnamectl set-hostname --static

Add variables to the inventory file ~/.quickstart/hosts


Activate the Venv:

. ~/.quickstart/bin/activate

Use Rippowam branch

cd ~/devel
git clone
cd rippowam
git checkout origin/tripleo

Run ansible

ansible-playbook -i ~/.quickstart/hosts ~/devel/rippowam/ipa.yml

Making this VM available to the overcloud requires some network wizardry. That deserves a post of its own.

May 15, 2016

Security will fix itself, eventually
If you're in the security industry these days things often don't look very good. Everywhere you look it sometimes feels like everything is on fire. The joke is there are two types of companies, those that know they've been hacked and those that don't. The world of devices looks even worse. They're all running old software, most will never see updates, most of the people building the things don't know or care about proper security, most people buying them don't know this is a problem.

I heard a TED talk by Al Gore called The case for optimism on climate change. This made me think of security in some ways. The basics of the talk are that things are getting better, we're surpassing many goals set for things like renewable energy. A few years ago the idea of renewable energy beating out something like coal seemed far fetched.

That reminded me of the current state of security. It's hard to see a very bright future sometimes. For every problem that gets fixed, at least two new ones show up. The thing that gives me optimism, though, is the same basic idea as with climate change. It has to get better because there is no alternative.

If we look back at renewable energy, the biggest force keeping it out of the market even five years ago was cost. It was really expensive to build and deploy things like solar panels. Today it's the same price or cheaper in some instances.

What happened?

The market happened. As new technology emerges and develops, it gets cheaper. This is one of the amazing things about emerging technology. Entrenched technology generally doesn't change price drastically just due to its nature. Solar power is getting better, it's not done yet, it will continue to get better for less cost. The day will come when we think about current power generation the way we think about using horses for transportation.

Now let's think about security.

If you want secure devices and a secure infrastructure it's going to cost a fortune. You're talking about very high skilled staff and extremely expensive hardware and software (assuming you can even get it in some cases). Today security is added cost in many cases, so lots of producers skip it. Bad security has cost too though. Today bad security is generally cheaper than good security. We need to flip this around, good security needs to be cheaper than bad security.

The future.

Here's my prediction, though. In the future, good security will be cheaper to build, deploy, and run than bad security. This sounds completely insane with today's technology. A statement like this sounds like some kook ten years ago telling everyone solar power is our future. Ten years ago solar wasn't a serious thing; today it is. Our challenge is figuring out what the new security future will look like. We don't really know yet. We know we can't train our way out of this, and most existing technology is a band-aid at best. If I had to guess, I'd use the worn-out "Artificial Intelligence will save us all", but who knows what the future will bring. Thanks to Al Gore, I'm now more optimistic things will get better. I'm impatient though, I don't want to wait for the future, I want it now! So all you smart folks do me a favor and start inventing the future.

What do you think? Leave your comments on twitter: @joshbressers

May 12, 2016

Lessons Learned writing a certmonger helper for Anchor

Guang Yee has been trying to get certmonger talking to Anchor — an ephemeral CA, worth a post by itself. His attitude went from “this is easy” to “I’m about to give up on certmonger” to “Got it.” Here is his post-mortem:

Finally got the basic flow working. I am now able to run Anchor and get the server certs with certmonger. Running certmonger-session in debug mode was really beneficial. Your blogs on younglogic helped out quite a bit as well. Next stop, put them all together and submit a patch for devstack.

Lessons learned so far:

  1. Documentation does not match reality. For example, the “getcert add-ca” command is not available on the version I got. I did my work on Ubuntu Trusty LTS. My understanding is that LTS releases, like RHEL, tend to carry old (but stable?) packages.
  2. There aren’t a whole lot of examples of certmonger helpers. I had to learn as I went.
  3. Certmonger-session tends to overwrite my changes in the ~/.config/certmonger/cas/ dir. I have to do “killall certmonger-session” before making any changes.
  4. Troubleshooting wasn’t easy at the beginning. There were a bunch of dbus interactions in the logs that I didn’t know what to do with. The “” logs concerned me at the beginning. I thought this was supposed to be a generic cert monitoring daemon, and I was concerned it might be making calls outside of my box.
  5. If the script fails to load, nothing shows up in syslog. The best approach is to run the script independently before hooking it up to certmonger. I screwed up the exit code, which is why I kept getting NEED_GUIDANCE status. In this case, running certmonger-session manually at debug level 15 helps a lot.
  6. I had trouble with Anchor at the beginning as I was running an outdated version of Pecan. But once I got that fixed, I did not encounter any more issues with Anchor.

We’ll take this input back to the Certmonger team. Some of these issues are due to the older version of Certmonger, which is motivation to get an updated one available for Trusty. I’d like to get a Python shell defined that other Certmonger helper apps can use as a starting point: something that deals with the Env Vars, but then allows a developer to register a class that does the CA specific code.

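Such a shell might look roughly like the sketch below. It is hypothetical: CAHelper, register, and run are made-up names, and only the SUBMIT operation is handled. CERTMONGER_OPERATION and CERTMONGER_CSR are the environment variables certmonger sets for helpers. The exit-status values reflect my reading of certmonger's helper documentation (doc/submit.txt), so check them against your certmonger version, since a wrong exit code is exactly what produced the NEED_GUIDANCE state above.

```python
from typing import Mapping, Optional, TextIO, Type

# Exit statuses, per my reading of certmonger's helper documentation
# (doc/submit.txt); verify against your certmonger version.
ISSUED = 0
REJECTED = 2
UNREACHABLE = 3
UNCONFIGURED = 4
NOT_SUPPORTED = 6


class CAHelper:
    """Subclass this and implement the CA-specific parts."""

    def submit(self, csr: str) -> str:
        """Return the issued certificate as PEM text.

        Raise ValueError if the CA rejects the request and
        ConnectionError if the CA cannot be reached.
        """
        raise NotImplementedError


_helper_class: Optional[Type[CAHelper]] = None


def register(cls: Type[CAHelper]) -> Type[CAHelper]:
    """Class decorator: make `cls` the helper that run() dispatches to."""
    global _helper_class
    _helper_class = cls
    return cls


def run(environ: Mapping[str, str], out: TextIO) -> int:
    """Translate certmonger's environment into a call on the registered helper."""
    if _helper_class is None:
        return UNCONFIGURED
    # Assumption: treat a missing CERTMONGER_OPERATION as SUBMIT.
    operation = environ.get("CERTMONGER_OPERATION", "SUBMIT")
    if operation != "SUBMIT":
        return NOT_SUPPORTED  # POLL, IDENTIFY, etc. are left out of this sketch
    csr = environ.get("CERTMONGER_CSR")
    if not csr:
        return UNCONFIGURED
    try:
        out.write(_helper_class().submit(csr))
    except NotImplementedError:
        return NOT_SUPPORTED
    except ConnectionError as exc:
        out.write(f"{exc}\n")
        return UNREACHABLE
    except ValueError as exc:
        out.write(f"{exc}\n")
        return REJECTED
    return ISSUED
```

A concrete helper would decorate its CAHelper subclass with @register, implement submit() against the CA (Anchor, in Guang's case), and end its script with sys.exit(run(os.environ, sys.stdout)). Running that script by hand with the two environment variables set, before wiring it into certmonger, is the kind of independent test recommended in item 5 above.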
Thanks to Guang for battling through this and again to Nalin Dahyabhai for helping to debug.

May 10, 2016

Certmonger logging for debugging

Certmonger is split into 3 parts

  1. getcert or a comparable helper app, which the user calls to make requests.  The request is put on dbus and sent to
  2. The certmonger binary.  This reads the request off of dbus and makes a call to
  3. The helper application, which makes calls to the remote service.

Debugging this process is much easier if you run the certmonger service from the command line and tell it to log debugging output.  Make sure no certmonger-session processes are running:

killall certmonger-session

Then explicitly start the certmonger session binary in non-daemon mode with debugging.

/usr/libexec/certmonger/certmonger-session -n -d 15

I chose 15 as a “very high number” for debugging. It worked for me.

Make sure that the dbus setup for certmonger has been set as an env var:


Then make a request in a separate terminal like:

 getcert list -s

And you should see logging from certmonger-session

2016-05-10 16:59:02 [21970] Dequeuing FD 8 for Read for 0x55c4635aba90:0x55c4635af070.
2016-05-10 16:59:02 [21970] Handling D-Bus traffic (Read) on FD 8 for 0x55c4635aba90.
2016-05-10 16:59:02 [21970] message 0x55c4635aba90(method_call)->org.fedorahosted.certmonger:/org/fedorahosted/certmonger:org.fedorahosted.certmonger.get_requests
2016-05-10 16:59:02 [21970] Pending GetConnectionUnixUser serial 105
2016-05-10 16:59:02 [21970] Pending GetConnectionUnixProcessID serial 106

And lots more.

To add a request:

getcert request -n remote   -c remote -s -d ~/certs/  -N "uid=ayoung,cn=users,cn=accounts,dc=openstack,dc=freeipa,dc=org"

And see the output.

2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'HAVE_CSR'
2016-05-10 17:00:09 [21970] Will revisit Request2('20160510210008') now.
2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'NEED_TO_SUBMIT'
2016-05-10 17:00:09 [21970] Will revisit Request2('20160510210008') now.
2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'SUBMITTING'
2016-05-10 17:00:09 [21970] Will revisit Request2('20160510210008') on traffic from 15.
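At debug level 15 the output gets long. A small filter can pull out just the state transitions for each request, which makes it easier to spot where a request stalls (for example in NEED_GUIDANCE). This is a sketch that assumes the line format shown above, which is not a documented certmonger log format; state_history is a hypothetical name.

```python
import re
from typing import Dict, Iterable, List

# Matches lines like:
# 2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'HAVE_CSR'
STATE_RE = re.compile(
    r"^(?P<date>\S+ \S+) \[(?P<pid>\d+)\] "
    r"(?P<request>\w+)\('(?P<id>[^']+)'\) moved to state '(?P<state>[A-Z_]+)'"
)


def state_history(log_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each request ID to the ordered list of states it moved through."""
    history: Dict[str, List[str]] = {}
    for line in log_lines:
        match = STATE_RE.match(line.strip())
        if match:
            history.setdefault(match["id"], []).append(match["state"])
    return history
```

Applied to the excerpt above, it reports HAVE_CSR, NEED_TO_SUBMIT, SUBMITTING for request 20160510210008.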