May 23, 2016

Thoughts on our security bubble
Last week I spent time with a lot of normal people. Well, they were all computer folks, but not the sort one would find in a typical security circle. It really got me thinking about the bubble we live in as security people.

There are a lot of things we take for granted. I can reference Dunning Kruger and "turtles all the way down" and not have to explain myself. If I talk about a buffer overflow, or most any security term I never have to explain what's going on. Even some of the more obscure technologies like container scanners and SCAP don't need but a few words to explain what happens. It's easy to talk to security people, at least it's easy for security people to talk to other security people.

Sometimes it's good to get out of your comfort zone though. Last week I spent a lot of time well outside the groups I was comfortable with, and it's a good thing for us to do. I really do think this is a big problem the security universe suffers from: there are a lot of us who don't really get out there and see what it's really like. I know I always assume everyone else knows a lot about security. They don't know a lot about security. They usually don't even know a little about security. This puts us in a place where we think everyone else is dumb, and they think we're idiots. Do you listen to someone who appears to be a smug jerk? Of course not, nobody does. This is one of the reasons it can be hard to get our messages across.

If we want people to listen to us, they have to trust us. If we want people to trust us, we have to make them understand us. If we want people to understand us, we have to understand them first. That bit of circular Yoda logic sounds insane, but it really is true. There's nothing worse than trying to help someone only to have them ignore you, or worse, do the opposite because they can.

So here's what I want to do. I have some homework for you, assuming you made it this far, which you probably did if you're reading this. Go talk to some non-security people. Don't try to educate them on anything, just listen to what they have to say. Even if they're wrong, especially if they're wrong, don't correct them. Just listen. Listen and see what you can learn. I bet it will be something amazing.

Let me know what you learn: @joshbressers

May 19, 2016

Reproducing an Open vSwitch Bridge Configuration

In the previous post, I described the setup for installing FreeIPA on a VM parallel to the undercloud VM set up by Tripleo Quickstart. The network on the undercloud VM was set up by Ironic and Neutron to listen on a network defined for the overcloud. I want to reproduce this on a second machine that is not enrolled in the undercloud. How can I reproduce the steps?


This is far more complex than necessary. All I needed to do was:

sudo ip addr add dev eth1
sudo ip link set eth1 up

That gets connectivity; to persist it, put the same information in /etc/sysconfig/network-scripts/ifcfg-eth1
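For the simple (non-OVS) case, the persisted file might look like this; it's a sketch, and the address values are placeholders since the actual IP isn't shown in the post:

```ini
# /etc/sysconfig/network-scripts/ifcfg-eth1 (sketch; IPADDR/PREFIX
# are placeholder values, not the ones used on this machine)
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.0.2.10
PREFIX=24
```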

But the OVS “cloning” here is still interesting enough to warrant its own post.


Using Tripleo Quickstart, I see that the interface I need is created with:

sudo bash -c 'cat < /etc/sysconfig/network-scripts/ifcfg-vlan10

sudo ifup ifcfg-vlan10

But my VM does not have an OVS_BRIDGE br-ctlplane defined. How do I create that?

Using the ovs commands, I can look at the bridge definition:

$ sudo ovs-vsctl show
    Bridge br-ctlplane
        Port "vlan10"
            tag: 10
            Interface "vlan10"
                type: internal
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port "eth1"
            Interface "eth1"
    Bridge br-int
        fail_mode: secure
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
        Port br-int
            Interface br-int
                type: internal
        Port "tapacff1724-9f"
            tag: 1
            Interface "tapacff1724-9f"
                type: internal
    ovs_version: "2.5.0"

And that does not exist on the new VM. I’ve been able to deduce that the creation of this bridge happened as a side effect of running

openstack undercloud install

Since I don’t want an undercloud on my other node, I need to reproduce the OVS commands to build the bridge.

I’m in luck. These commands are all captured in /etc/openvswitch/conf.db. I can pull them out with:

grep '^\{'  /etc/openvswitch/conf.db | jq '. | ._comment ' | sed -e 's!^\"!!g' -e's!ovs-vsctl:!!' -e 's!\"$!!'   | grep -v null >

That gets me:

 ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.12.1
 ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.5.0 \"external-ids:system-id=\\\"a9460ec6-db71-42fb-aec7-a5356bcda153\\\"\" \"system-type=\\\"CentOS\\\"\" \"system-version=\\\"7.2.1511-Core\\\"\"
 ovs-vsctl -t 10 -- --may-exist add-br br-ctlplane -- set bridge br-ctlplane other-config:hwaddr=00:59:cf:9c:84:3a -- br-set-external-id br-ctlplane bridge-id br-ctlplane
 ovs-vsctl -t 10 -- --if-exists del-port br-ctlplane eth1 -- add-port br-ctlplane eth1
 ovs-vsctl -t 10 -- --if-exists del-port br-ctlplane eth1 -- add-port br-ctlplane eth1
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-br br-int -- set Bridge br-int datapath_type=system
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set-fail-mode br-int secure
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Bridge br-int protocols=OpenFlow10
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-br br-ctlplane -- set Bridge br-ctlplane datapath_type=system
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Bridge br-ctlplane protocols=OpenFlow10
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-port br-int int-br-ctlplane -- set Interface int-br-ctlplane type=patch options:peer=nonexistent-peer
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-port br-ctlplane phy-br-ctlplane -- set Interface phy-br-ctlplane type=patch options:peer=nonexistent-peer
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Interface int-br-ctlplane options:peer=phy-br-ctlplane
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Interface phy-br-ctlplane options:peer=int-br-ctlplane
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- add-port br-int tapacff1724-9f -- set Interface tapacff1724-9f type=internal external_ids:iface-id=acff1724-9fb2-4771-a7db-8bd93e7f3833 external_ids:iface-status=active external_ids:attached-mac=fa:16:3e:f6:6d:86
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f other_config:physical_network=ctlplane other_config:net_uuid=6dd40444-6cc9-4cfa-bfbd-15b614f6e9e1 other_config:network_type=flat
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f other_config:tag=1 other_config:physical_network=ctlplane other_config:net_uuid=6dd40444-6cc9-4cfa-bfbd-15b614f6e9e1 other_config:network_type=flat
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f tag=1
 ovs-vsctl -t 10 -- --may-exist add-port br-ctlplane vlan10 tag=10 -- set Interface vlan10 type=internal

Now I don’t want to blindly re-execute this, as there are some embedded values particular to the first machine: the MAC 00:59:cf:9c:84:3a for eth1 is reused by the bridge, and the first two lines look like system-specific setup. Let’s see if the new VM has anything along these lines.

Things to note:

  1. /etc/openvswitch/ is empty
  2. systemctl status openvswitch.service shows the service is not running

Let’s try starting it:
sudo systemctl start openvswitch.service

grep '^\{'  /etc/openvswitch/conf.db | jq '. | ._comment ' | sed -e 's!^\"!!g' -e's!ovs-vsctl:!!' -e 's!\"$!!'   | grep -v null 
 ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.12.1
 ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.5.0 \"external-ids:system-id=\\\"8f68fbfb-9278-4772-87f1-500bc80bb917\\\"\" \"system-type=\\\"CentOS\\\"\" \"system-version=\\\"7.2.1511-Core\\\"\"

So we can drop those two lines.

Extract the MAC for interface eth1:

ip addr show eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff

OK, that is about all we can do. Execute it.

sudo ./

No complaints. What did we get?

$ sudo ovs-vsctl show
    Bridge br-int
        fail_mode: secure
        Port "tapacff1724-9f"
            tag: 1
            Interface "tapacff1724-9f"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
    Bridge br-ctlplane
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port "vlan10"
            tag: 10
            Interface "vlan10"
                type: internal
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port "eth1"
            Interface "eth1"
    ovs_version: "2.5.0"

Looks right.

One thing I notice that is different is that on the undercloud, the bridge has an IP address:

7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:59:cf:9c:84:3a brd ff:ff:ff:ff:ff:ff
    inet brd scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::259:cfff:fe9c:843a/64 scope link 
       valid_lft forever preferred_lft forever

Let’s add one to the bridge on our new machine:

$ cat /etc/sysconfig/network-scripts/ifcfg-br-ctlplane
# This file is autogenerated by os-net-config
OVS_EXTRA="set bridge br-ctlplane other-config:hwaddr=00:59:cf:9c:84:3a -- br-set-external-id br-ctlplane bridge-id br-ctlplane"

Again, minor edits: use the proper MAC and a different IP address. Bring it up with:

sudo ifup br-ctlplane

And we can see it:

$ ip addr show br-ctlplane
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
    inet brd scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::259:cfff:fe9c:843e/64 scope link 
       valid_lft forever preferred_lft forever

Last step: we need to bring up the eth1 interface. Again, give it a config file, this time in /etc/sysconfig/network-scripts/ifcfg-eth1
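The file contents aren't captured above; since eth1 is now enslaved to the OVS bridge, a minimal version (a sketch, not the exact file) just brings the link up without an address:

```ini
# /etc/sysconfig/network-scripts/ifcfg-eth1 (sketch -- eth1 carries
# no IP of its own here; the address lives on br-ctlplane)
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
```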


And bring it up with:

sudo ifup eth1

Make sure it is up:

$ ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP qlen 1000
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::259:cfff:fe9c:843e/64 scope link 
       valid_lft forever preferred_lft forever

And usable:

$  ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=1.41 ms
64 bytes from icmp_seq=2 ttl=64 time=0.627 ms

I’d really like to laud the Open vSwitch developers for their approach to the database. Having the commands available in the database is a fantastic tool. That is a pattern I would love to see emulated elsewhere.

Installing FreeIPA on a Tripleo undercloud

I’ve been talking about using FreeIPA to secure OpenStack since the Havana summit in Portland. I’m now working with Tripleo to install OpenStack. To get the IPA server installed along with Tripleo Quickstart requires a VM accessible from the Ansible playbook.

Build the Identity VM

  • Apply the patch to quickstart that builds the VM
  • Run quickstart at least up to the undercloud stage. The steps below do the complete install.

Since Quickstart makes a git repo under ~/.quickstart, I’ve been using that as my repo. It avoids duplication, and makes my changes visible.

mkdir ~/.quickstart
cd ~/.quickstart
git clone
cd tripleo-quickstart
git review -d 315749
~/.quickstart/tripleo-quickstart/   -t all

If you are not set up for git review, you can pull the patch manually from Gerrit.

Set the hostname FQDN for the identity machine

ssh -F /home/ayoung/.quickstart/ssh.config.ansible identity-root hostnamectl set-hostname --static

Add variables to the inventory file ~/.quickstart/hosts

ipa_forwarder=<these values come from resolve.conf on warp>
nameserver=<these values come from resolve.conf on warp>

Activate the Venv:

. ~/.quickstart/bin/activate

Use Rippowam branch

cd ~/devel
git clone
cd rippowam
git checkout origin/tripleo

Run ansible

ansible-playbook -i ~/.quickstart/hosts ~/devel/rippowam/ipa.yml

Making this VM available to the overcloud requires some network wizardry. That deserves a post itself.

May 15, 2016

Security will fix itself, eventually
If you're in the security industry these days things often don't look very good. Everywhere you look it sometimes feels like everything is on fire. The joke is there are two types of companies, those that know they've been hacked and those that don't. The world of devices looks even worse. They're all running old software, most will never see updates, most of the people building the things don't know or care about proper security, most people buying them don't know this is a problem.

I heard a TED talk by Al Gore called The case for optimism on climate change. This made me think of security in some ways. The basics of the talk are that things are getting better, we're surpassing many goals set for things like renewable energy. A few years ago the idea of renewable energy beating out something like coal seemed far fetched.

That reminded me of the current state of security. It's hard to see a future that's very bright sometimes. For every problem that gets fixed, at least two new ones show up. The thing that gives me optimism though is the same basic idea as climate change. It has to get better because there is no alternative.

If we look back at renewable energy, the biggest force keeping it out of the market even five years ago was cost. It was really expensive to build and deploy things like solar panels. Today it's the same price or cheaper in some instances.

What happened?

The market happened. As new technology emerges and develops, it gets cheaper. This is one of the amazing things about emerging technology. Entrenched technology generally doesn't change price drastically just due to its nature. Solar power is getting better, it's not done yet, it will continue to get better for less cost. The day will come when we think about current power generation the way we think about using horses for transportation.

Now let's think about security.

If you want secure devices and a secure infrastructure it's going to cost a fortune. You're talking about very high skilled staff and extremely expensive hardware and software (assuming you can even get it in some cases). Today security is added cost in many cases, so lots of producers skip it. Bad security has cost too though. Today bad security is generally cheaper than good security. We need to flip this around, good security needs to be cheaper than bad security.

The future.

Here's my prediction though. In the future, good security will be cheaper to build, deploy, and run than bad security. This sounds completely insane with today's technology. A statement like that sounds like some kook ten years ago telling everyone solar power is our future. Ten years ago solar wasn't a serious thing, today it is. Our challenge is figuring out what the new security future will look like. We don't really know yet. We know we can't train our way out of this; most existing technology is a band-aid at best. If I had to guess I'll use the worn out "Artificial Intelligence will save us all", but who knows what the future will bring. Thanks to Al Gore, I'm now more optimistic things will get better. I'm impatient though, I don't want to wait for the future, I want it now! So all you smart folks do me a favor and start inventing the future.

What do you think? Leave your comments on twitter: @joshbressers

May 12, 2016

Lessons Learned writing a certmonger helper for Anchor

Guang Yee has been trying to get certmonger talking to Anchor — an ephemeral CA, worth a post by itself. His attitude went from “this is easy” to “I’m about to give up on certmonger” to “Got it.” Here is his post-mortem:

Finally got the basic flow working. I am now able to run Anchor and getting the server certs with certmonger. Running certmonger-session in debug mode was really beneficial. Your blogs on younglogic helped out quite a bit as well. Next stop, put them all together and submit a patch for devstack.

Lessons learned so far:

  1. Documentation does not match reality. For example, the “getcert add-ca” command is not available in the version I got. I did my work on Ubuntu Trusty LTS. My understanding is that LTS releases, like RHEL, tend to carry old (but stable?) packages?
  2. There aren’t a whole lot of examples of certmonger helpers. I had to learn as I went.
  3. Certmonger-session tends to overwrite my changes in the ~/.config/certmonger/cas/ dir. I have to do “killall certmonger-session” before making any changes.
  4. Troubleshooting wasn’t easy at the beginning. There were a bunch of dbus interactions in the logs which I didn’t know what to do with. The “” logs concerned me at the beginning. I thought this was supposed to be a generic cert monitoring daemon, and I was concerned it might be making calls outside of my box.
  5. If the script fails to load, nothing shows up in syslog. The best way is to run the script independently before hooking it up with certmonger. I screwed up on the exit code; that’s why I kept getting NEED_GUIDANCE status. In this case, running certmonger-session manually at debug level 15 helps a lot.
  6. I had trouble with Anchor at the beginning as I was running an outdated version of Pecan. But once I got that fixed, I didn’t encounter any more issues with Anchor.

We’ll take this input back to the Certmonger team. Some of these issues are due to the older version of Certmonger, which is motivation to get an update available for Trusty. I’d like to get a Python shell defined that other Certmonger helper apps can use as a starting point: something that deals with the env vars, but then allows a developer to register a class that does the CA-specific code.

Thanks to Guang for battling through this and again to Nalin Dahyabhai for helping to debug.

May 10, 2016

Certmonger logging for debugging

Certmonger is split into 3 parts

  1. getcert, or a comparable helper app, which the user calls to make requests.  The request is put on dbus and sent to
  2. The certmonger binary.  This reads the request off of dbus and makes a call to
  3. The helper application, which makes calls to the remote service.

Debugging this process is much easier if you run the certmonger service from the command line and tell it to log debugging output.  Make sure no certmonger-session processes are running:

killall certmonger-session

Then explicitly start the certmonger session binary in non-daemon mode with debugging.

/usr/libexec/certmonger/certmonger-session -n -d 15

I chose 15 as a “very high number” for debugging. It worked for me.

Make sure that the dbus setup for certmonger has been set as an env var:


Then make a request in a separate terminal like:

 getcert list -s

And you should see logging from certmonger-session

2016-05-10 16:59:02 [21970] Dequeuing FD 8 for Read for 0x55c4635aba90:0x55c4635af070.
2016-05-10 16:59:02 [21970] Handling D-Bus traffic (Read) on FD 8 for 0x55c4635aba90.
2016-05-10 16:59:02 [21970] message 0x55c4635aba90(method_call)->org.fedorahosted.certmonger:/org/fedorahosted/certmonger:org.fedorahosted.certmonger.get_requests
2016-05-10 16:59:02 [21970] Pending GetConnectionUnixUser serial 105
2016-05-10 16:59:02 [21970] Pending GetConnectionUnixProcessID serial 106

And lots more.

To add a request:

getcert request -n remote   -c remote -s -d ~/certs/  -N "uid=ayoung,cn=users,cn=accounts,dc=openstack,dc=freeipa,dc=org"

And see the output.

2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'HAVE_CSR'
2016-05-10 17:00:09 [21970] Will revisit Request2('20160510210008') now.
2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'NEED_TO_SUBMIT'
2016-05-10 17:00:09 [21970] Will revisit Request2('20160510210008') now.
2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'SUBMITTING'
2016-05-10 17:00:09 [21970] Will revisit Request2('20160510210008') on traffic from 15.

May 09, 2016

Passing Unix Socket File Descriptors between containers processes blocked by SELinux.
SELinux controls passing of Socket file descriptors between processes.

A Fedora user posted a bugzilla complaining about SELinux blocking transfer of socket file descriptors between two docker containers.

Let's look at what happens when a socket file descriptor is created by a process.

When a process accepts a connection from a remote system, the resulting file descriptor automatically gets assigned the same label as the process creating the socket.  For example, when the docker service (docker_t) listens on /var/run/docker.sock and a client connects to the docker service, the docker service end of the connection gets labeled by default with the label of the docker process.  On my machine this is:


The client is probably running as unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023.  SELinux would then check to make sure that unconfined_t is able to connect to docker_t sockets.

If this socket descriptor is passed to another process, the new process's label has to have access to the socket's label. If it does not, SELinux will block the transfer.

In containers, even though by default all container processes have the same SELinux type, they have different MCS labels.

If I have a process labeled system_u:system_r:svirt_lxc_net_t:s0:c1,c2 and I pass that file descriptor to a process in a different container labeled system_u:system_r:svirt_lxc_net_t:s0:c4,c5, SELinux will block the access.

The bug reporter was reporting that by default he was not able to pass the descriptor, which is goodness. We would not want to allow a confined container to be able to read/write socket file descriptors from another container by default.

The reporter also figured out that he could get this to work by disabling SELinux either on the host or inside of the container.

Surprisingly he also figured out if he shared IPC namespaces between the containers, SELinux would not block.

The reason for this is that when you share the same IPC namespace, docker automatically causes the containers to share the same SELinux label.  If docker did not do this, SELinux would block processes in container A from accessing IPCs created in container B.  With a shared IPC namespace, the SELinux labels for both of the reporter's containers were the same, and SELinux allowed the passing.

How would I make two containers share the same SELinux labels?

Docker by default launches all containers with the same type field, but different MCS labels.  I told the reporter that you could cause two containers to run with the same MCS labels by using the --security-opt label:level:MCSLABEL option.

Something like this will work

docker run -it --rm --security-opt label:level:s0:c1000,c1001 --name server -v myvol:/tmp test /server
docker run -it --rm --security-opt label:level:s0:c1000,c1001 --name client -v myvol:/tmp test /client

These containers would then run with the same MCS labels, which gives the reporter the best security possible while still allowing the two containers to pass the socket between them.  These containers would still be locked down by SELinux from the host and from other containers.  They would be able to attack each other from an SELinux point of view, but the other container separation mechanisms would still be in effect to prevent such attacks.

May 08, 2016

Security isn't a feature, it's a part of everything
Almost every industry goes through a time when new novel features are sold as some sort of add on or extra product. Remember needing a TCP stack? What about having to buy a sound card for your computer, or a CD drive? (Does anyone even know what a CD is anymore?) Did you know that web browsers used to cost money? Times were crazy.

Let's think about security now. There is a lot of security that's some sort of add on, or maybe a separate product. Some of this is because it's a clever idea, some things exist because people are willing to pay for it even if it should be included. No matter what we're talking about, there is always a march toward commoditization. This is how Linux took over the universe, the operating system is a commodity now, it's all about how you put things together using things like containers and devops and cloud.

Now consider where security stands. Of all the things going on, all the products out there, all the methodologies, security is always the special snowflake. For being so special you'd think we could get more right. If everything was fine, the Red Team wouldn't win. every. single. time.

The reality is that until we stop treating security like some sort of special add on, we're not going to see things make any real improvements. Think about any product you use, there are always things that are just an expected part of it. Security should fall under this category. Imagine if your car didn't come with locks. Or if it had locks, but you had to cut your own keys before you could use them. What if every safe shipped with the same combination, if you wanted a new one you had to pay for it? There are a lot of things we just expect because they make sense.

I'm sure you get the idea I'm shooting for here. Today we treat security like something special. You have to buy a security solution if you want to be secure. Or you have to configure your product a certain way if you want it secure. If we want to really start solving security problems, we have to make sure security isn't something special we talk about later, or plan to add in version two. It has to just be a part of everything. There aren't secure options, all the options need to be what we would call "secure" today. The days of security as an optional requirement are long gone. Remember when we thought those old SSL algorithms could just stick around forever? Nobody thinks that anymore.

How are we going to fix this? That's the real trick. It's easy to talk about demanding security and voting with your pocketbook, but the reality is this isn't very possible today. Security isn't usually a big differentiator. If we expect security to just be part of everything, we also can't expect anyone to see security as a feature they look for. How do we ensure there is a demand for something that is by definition a secondary requirement? How do we get developers to care about something that isn't part of a requirement? How do we get organizations to pay for something that doesn't generate revenue?

There are some groups trying to do the right thing here. I think almost everyone is starting to understand security isn't a feature. Of course just because there's some interest and people are beginning to understand doesn't mean everything will be fixed quickly or easily. We have a long way to go still. It won't be easy, it won't be quick. It's possible everything could go off the rails. The only thing harder than security is planning for security :)

Do you think you know how to fix this mess? Impress me with your ideas: @joshbressers

May 05, 2016

Testing Fernet Tokens on Tripleo

Not the way to do it long term, but this will give you a chance to play with it.

From the controller node:

sudo keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
sudo crudini --set /etc/keystone/keystone.conf token provider fernet
sudo systemctl restart httpd.service

Test it

$ openstack  token issue -f shell

May 04, 2016

Identity work for the OpenStack Newton release

The Newton Summit is behind us, and we have six months to prepare for the next release in both upstream OpenStack and RDO. Here is my attempt to build a prioritized list of the large tasks I want to tackle in this release.

  1. Federation:  We need to test RDO against several identity providers (IdP), including Shibboleth, Keycloak, and Ipsilon.  In order to do this, we need a way to install a test version of the IdP in a virtual machine alongside the undercloud in a Tripleo deploy.  Since it looks like instack and tripleo-quickstart are converging, I’ll probably close this task out with quickstart.  The undercloud setup of quickstart assumes only a single machine (non-HA), and I want to make that setup add a second machine, visible to both the outside world and the overcloud.  I already have Ansible roles for deploying Keycloak and Ipsilon in the Rippowam repo that should be easily extensible to Shibboleth as well.
  2. Tripleo LDAP Configuration:  Continuing on the track of configuring identity from Tripleo, we want to be able to automate the steps to integrate LDAP into a Keystone server managed by Tripleo.  Like the Federation steps above and the policy work below, the main effort here is making sure that configuration changes can be preserved across redeploys and will be properly synchronized in an HA deployment.
    1. A prerequisite for domain-specific backends is that the deploy uses the V3 Keystone API everywhere. We need to test and confirm that this is the case, and fix any places where that has not been done.
  3. FreeIPA:  There is much of Tripleo that cannot be secured without an identity provider.  Most essential is to have a sound PKI strategy, but beyond that, we need a way to secure both the undercloud and the overcloud VMs, provide identity for each of them, and set proper access controls.  While FreeIPA will not be required for use with Tripleo, it will be possible to make use of a FreeIPA server. To ensure that it is trivial to make a FreeIPA server available for deployers that want one, the additional VM described above can be used to install the FreeIPA server.  The LDAP configuration above can make use of the LDAP backend, or a deploy  can use Federation via SSSD and Kerberos.
  4. Token Revocation: Convert the Keystone revocation events code to use a linear search of the list instead of the current tree code.  While the tree code was an interesting approach, it proved to be both too complex for most developers to understand and not to perform well.  In addition, the current code performs revocation for many events, such as project deactivation and user deletion, which are better checked in the objects themselves during token validation.  Removing these redundant rules should make the revocation check go very fast, and they will be performed implicitly during re-population of Fernet tokens anyway.
  5. Fernet Tokens:  While UUID tokens will not be going away this release, there seems to be little reason to keep them as the default provider when we really want people to move to Fernet.  There are still a few issues with tests that were not run when Fernet was not the default, including some time-related issues that I hope will be flushed out with the simplification of the revocation code listed above.  Fernet will be the default for Tripleo, if not for Keystone itself.
  6. Oslo messaging identities:  Before we can write ACLs on what service can speak to what, we need to be able to identify the senders and receivers on the queues and topics.  We can do this with no impact to performance by creating a Rabbit user for each service, and one rabbit user for each hypervisor.  This will need to be done for both Devstack and Tripleo.
  7. Policy Customization:  There are several reasons why we are not distributing policy files from the Keystone server.  However, deployers still need to modify policy files, and to do so in a non-surprising way.  I discussed this with the Tripleo team, and the consensus seems to be that we can deploy the policy files as “deployment artifacts” which is, essentially, a tarball with the files inside in their end-locations; e.g. /etc/keystone/policy.json for Keystone.  We should be able to describe a full cycle that applies to all of the deployment tools, not just Tripleo.  While this deserves its own post, the skeleton is:
    1. Harvesting the initial policy files from the controller nodes
    2. Packing up the policy archive
    3. Storing it where the orchestration engine can find it
    4. The operator fetching it from the store
    5. Operator customization and testing
    6. Storing the updated archive
    7. Redeploying the overcloud with the new policy
  8. Close Bug 968696:  We made some progress on this at the summit.  The most interesting aspect will be a hack that adds “is_admin_project=True” to all tokens that are requesting the “admin” role IFF the admin project is not set in the Keystone configuration.  This will allow the services to update policy files such that, when the configuration option is set, the “admin” role will be properly scoped.  Adding this option to Keystone will allow us to submit changes to all of the other service policy files.
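The deployment-artifact cycle from item 7 above can be sketched with a few shell commands. Everything here is illustrative: the sample policy rule, file names, and directory layout are placeholders, and the store/fetch steps depend on the orchestration engine in use.

```shell
# Harvest step, simulated in a scratch directory; a real cycle would copy
# /etc/keystone/policy.json down from a controller node.
mkdir -p policy-artifact/etc/keystone
echo '{"identity:list_users": "role:admin"}' > policy-artifact/etc/keystone/policy.json

# Pack up the archive with each file in its end-location
tar -czf policy-artifact.tar.gz -C policy-artifact etc

# The operator fetches the archive from the store, edits
# etc/keystone/policy.json, repacks, and stores the updated archive;
# redeploying the overcloud unpacks it relative to /.
tar -tzf policy-artifact.tar.gz
```

The key property is that the tarball's internal paths are the end-locations, so the same artifact format works for any service's policy files, not just Keystone's.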

I am supervising a couple of other efforts.

  1. Unified Delegation:  Right now, the only way a non-admin user can delegate a role to another user is via trust.  But most users should not have to execute trusts in order to perform mundane operations.  Unified delegation is a way to extend the redelegation properties of trusts to the basic role assignment process.  The work for this has been well underway for a couple releases, and we should be able to finish it up in Newton.  This work is mostly being done by Alexander Makarov.
    1. Allow a user to be able to explicitly request a single role in a token.  This will be useful for limiting exposure.  For example, a user with an Admin role will be able to request a token with only the Member role on it when talking to a third party application.
  2. Python3 LDAP:  The big thing keeping Keystone from being run on Python3 is the LDAP library. python-ldap is Python 2 only, and the maintainer does not plan on making it Python 3 compatible.  The original plan was to use ldap3, a pure Python implementation of an LDAP client, but the protocol is complex enough that we are looking instead at a fork of python-ldap called pyldap.  However, since so much of the ldap3 work is already done, we might end up maintaining and testing both implementations in parallel.  The ldap3-based code is much cleaner, but that cleanup could be applied to pyldap as well. This work is being done by Kristi Nikolla and Roxana Gherle.
  3. Federation Shadow Users:  There is a long-standing pain point that administrators must work with userids but only know usernames.  This problem is worse in Federation, where the users might not even have userids yet, as they have never authenticated.  During the design summit, we came up with a plan for a handful of new APIs.  All of these APIs would have the input values for a mapping in the payload, delivered as a dictionary.  This work is being done by Ron De Rose.
    1. Query the results of a mapping call
    2. Check to see if a Federated user is in the shadow table
    3. Initialize an entry in the Shadow user table

There are a couple of efforts that other members of my team are working on that are complementary to the above list.

  1. TLS Everywhere.  Juan Antonio Osorio Robles and Rob Crittenden.
    1. Network configuration for
      1. HTTPS between each server
      2. TLS enabled for the message broker
      3. TLS/LDAPS for LDAP
      4. TLS to the Database
    2. Establishing Certmonger as the tool to provision certificates
    3. Setting up a self-signing approach for Devstack
    4. Allowing for multiple CAs for deployments
      1. Dogtag/FreeIPA
      2. Anchor
  2. Autoregistration of VMs with an Identity provider. Rob Crittenden.  This one has proven to be a contentious issue, as the hooks inside Nova that we were depending on to implement it got deprecated… due to team members submitting bug reports.  Discussions at the summit are pointing at an approach based on modifying the metadata server.  Michael Still has an idea he is writing up for a general-purpose mechanism.

May 02, 2016

Self-Signed SSL/TLS Certificates: Why They are Terrible and a Better Alternative

A Primer on SSL/TLS Certificates

Many of my readers (being technical folks) are probably already aware of the purpose and value of certificates, but in case you are not familiar with them, here’s a quick overview of what they are and how they work.

First, we’ll discuss public-key encryption and public-key infrastructure (PKI). It was realized very early on in human history that sometimes you want to communicate with other people in a way that prevents unauthorized people from listening in. All throughout time, people have been devising mechanisms for obfuscating communication in ways that only the intended recipient of the code would be able to understand. This obfuscation is called encryption, the data being encrypted is called plaintext and the encrypted data is called ciphertext. The cipher is the mathematical transformation that is used to turn the plaintext into the ciphertext and relies upon one or more keys known only to trusted individuals to get the plaintext back.

Early forms of encryption were mainly “symmetric” encryption, meaning that the cipher used the same key for both encryption and decryption. If you’ve ever added a password to a PDF document or a ZIP file, you have been using symmetric encryption. The password is a human-understandable version of a key. For a visual metaphor, think about the key to your front door. You may have one or more such keys, but they’re all exactly alike and each one of them can both lock and unlock the door and let someone in.

Nowadays we also have forms of encryption that are “asymmetric”. What this means is that one key is used to encrypt the message and a completely different key is used to decrypt it. This is a bit harder for many people to grasp, but it works on the basic mathematical principle that some actions are much more complicated to reverse than others. (A good example I’ve heard cited is that it’s pretty easy to figure out the square of any number with a pencil and a couple minutes, but most people can’t figure out a square-root without a modern calculator). This is harder to visualize, but the general idea is that once you lock the door with one key, only the other one can unlock it. Not even the one that locked it in the first place.

So where does the “public” part of public-key infrastructure come in? What normally happens is that once an asymmetric key-pair is generated, the user will keep one of those two keys very secure and private, so that only they have access to it. The other one will be handed out freely through some mechanism to anyone at all that wants to talk to you. Then, if they want to send you a message, they simply encrypt their message using your public key and they know you are the only one who can decrypt it. On the flip side, if the user wanted to send a public message but provide assurance that it came from them, they can also sign a message with the private key, so that the message will contain a special signature that can be decrypted with their public key. Since only one person should have that key, recipients can trust it came from them.
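The sign/verify half of this flow can be made concrete with the openssl command line tool. This is a minimal sketch: the key size, hash, and file names are arbitrary choices, not anything the post prescribes.

```shell
# Generate a private key and derive its public half
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out priv.pem 2>/dev/null
openssl pkey -in priv.pem -pubout -out pub.pem

# Sign a message with the private key...
echo "hello from me" > msg.txt
openssl dgst -sha256 -sign priv.pem -out msg.sig msg.txt

# ...and anyone holding only the public key can verify who sent it
openssl dgst -sha256 -verify pub.pem -signature msg.sig msg.txt
```

The last command prints "Verified OK"; hand pub.pem to anyone, and only the holder of priv.pem can produce signatures that pass that check.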

Astute readers will see the catch here: how do users know for certain that your public key is in fact yours? The answer is that they need to have a way of verifying it. We call this establishing trust and it’s exceedingly important (and, not surprisingly, the basis for the rest of this blog entry). There are many ways to establish trust, with the most foolproof being to receive the public key directly from the other party while looking at two forms of picture identification. Obviously, that’s not convenient for the global economy, so there needs to be other mechanisms.

Let’s say the user wants to run a webserver at “”. This server might handle private user data (such as their home address), so a wise administrator will set the server up to use HTTPS (secure HTTP). This means that they need a public and private key (which in this case we call a certificate). The common way to do this is for the user to contact a well-known certificate authority and purchase a signature from them. The certificate authority will do the hard work of verifying the user’s identity and then sign their webserver certificate with the CA’s own private key, thus providing trust by way of a third-party. Many well-known certificate authorities have their public keys shipped by default in a variety of operating systems, since the manufacturers of those systems have independently verified the CAs in turn. Now everyone who comes to the site will see the nice green padlock on their URL bar that means their communications are encrypted.

A Primer on Self-Signed Certificates

One of the major drawbacks to purchasing a CA signature is that it isn’t cheap: the CAs (with the exception of Let’s Encrypt) are out there to make money. When you’re developing a new application, you’re going to want to test that everything works with encryption, but you probably aren’t going to want to shell out cash for every test server and virtual machine that you create.

The solution to this has traditionally been to create what is called a self-signed certificate. What this means is that instead of having your certificate signed by a certificate authority, you use the certificate’s own private key to sign the certificate itself. The problem with this approach is that web browsers and other clients that verify the security of the connection will be unable to verify that the server is who it says it is. In most cases, the user will be presented with a warning page that informs them that the server cannot be verified to be the one they intended to reach. When setting up a test server, this is expected. Unfortunately, however, clicking through and saying “I’m sure I want to connect” has a tendency to form bad habits in users and often results in them eventually clicking through when they shouldn’t.
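For reference, generating such a self-signed certificate is nearly a one-liner with openssl; the subject name and validity period here are placeholders.

```shell
# Create a key and a certificate signed by that very same key, valid 30 days
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout selfsigned.key -out selfsigned.crt \
    -days 30 -subj "/CN=test.example.com"

# Issuer and subject are identical -- the telltale sign of self-signing
openssl x509 -in selfsigned.crt -noout -issuer -subject
```

Because the issuer is the certificate itself, no entry in the client's trust store vouches for it, which is exactly what triggers the browser warning described above.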

It should be pretty obvious, but I’ll say it anyway: Never use a self-signed certificate for a production website.

One of the problems we need to solve is how to avoid training users to ignore those warnings. One way that people often do this is to load their self-signed certificate into their local trust store (the list of certificate authorities that are trusted, usually provided by the operating system vendor but available to be extended by the user). This can have some unexpected consequences, however. For example, if the test machine is shared by multiple users (or is breached in a malicious attack), then the private key for the certificate might fall into other hands that would then use it to sign additional (potentially malicious) sites. And your computer wouldn’t try to warn you because the site would be signed by a trusted authority!

So now it seems like we’re in a Catch-22 situation: If we load the certificate into the trusted authorities list, we run the risk of a compromised private key for that certificate tricking us into a man-in-the-middle attack somewhere and stealing valuable data. If we don’t load it into the trust store, then we are constantly bombarded by a warning page that we have to ignore (or in the case of non-browser clients, we may have to pass an option not to verify the client) in which case we could still end up in a man-in-the-middle attack, because we’re blindly trusting the connection. Neither of those seems like a great option. What’s a sensible person to do?

Two Better Solutions

So, let’s take both of the situations we just learned about and see if we can locate a middle ground somewhere. Let’s go over what we know:

  • We need to have encryption to protect our data from prying eyes.
  • Our clients need to be able to trust that they are talking to the right system at the other end of the conversation.
  • If the certificate isn’t signed by a certificate in our trust store, the browser or other clients will warn or block us, training the user to skip validation.
  • If the certificate is signed by a certificate in our trust store, then clients will silently accept it.
  • Getting a certificate signed by a well-known CA can be too expensive for an R&D project, but we don’t want to put developers’ machines at risk.

So there are two better ways to deal with this. One is to have an organization-wide certificate authority rather than a public one. This should be managed by the Information Technologies staff. Then, R&D can submit their certificates to the IT department for signing and all company systems will implicitly trust that signature. This approach is powerful, but can also be difficult to set up (particularly in companies with a bring-your-own-device policy in place). So let’s look at another solution that’s closer to the self-signed approach.

The other way to deal with it would be to create a simple site-specific certificate authority for use just in signing the development/test certificate. In other words, instead of generating a self-signed certificate, you would generate two certificates: one for the service and one to sign that certificate. Then (and this is the key point – pardon the pun), you must delete and destroy the private key for the certificate that did the signing. As a result, only the public key of that private CA will remain in existence, and it will only have ever signed a single service. Then you can provide the public key of this certificate authority to anyone who should have access to the service and they can add this one-time-use CA to their trust store.
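Sketched with openssl (all names and lifetimes here are placeholders, and this is the manual equivalent of what sscg automates), the whole dance looks something like this; the crucial step is the destruction of the CA key:

```shell
# One-time-use CA key and certificate
openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout ca.key -out ca.crt -days 365 -subj "/CN=Throwaway Dev CA"

# Service key and signing request
openssl req -newkey rsa:2048 -nodes \
    -keyout service.key -out service.csr -subj "/CN=dev.example.com"

# Sign the service certificate with the throwaway CA
openssl x509 -req -in service.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -out service.crt -days 365

# The key point: destroy the CA private key so it can never sign again
shred -u ca.key 2>/dev/null || rm -f ca.key

# ca.crt is what users add to their trust store; it vouches for exactly
# one certificate, ever
openssl verify -CAfile ca.crt service.crt
```

Even if the development box is later compromised, the attacker gets only service.key, which lets them impersonate that one dev host, never mint new trusted certificates.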

Now, I will stress that the same rule holds true here as for self-signed certificates: do not use this setup for a production system. Use a trusted signing authority for such sites. It’s far easier on your users.

A Tool and a Tale

I came up with this approach while I was working on solving some problems for the Fedora Project. Specifically, we wanted to come up with a way to ensure that we could easily and automatically generate a certificate for services that should be running on initial start-up (such as Cockpit or OpenPegasus). Historically, Fedora had been using self-signed certificates, but the downsides I listed above gnawed at me, so I put some time into it and came up with the private-CA approach.

In addition to the algorithm described above, I’ve also built a proof-of-concept tool called sscg (the Self-Signed Certificate Generator) to easily enable the creation of these certificates (and to do so in a way that never drops the CA’s private key onto a filesystem; it remains in memory). I originally wrote it in Python 3 and that version is packaged for use in Fedora today. This past week as a self-assigned exercise to improve my knowledge of Go, I rewrote the sscg in that language. It was a fun project and had the added benefit of removing the fairly heavyweight dependency on the Python 3 version. I plan to package the golang version for Fedora 25 at some point in the near future, but if you’d like to try it out, you can clone my github repository. Patches and suggestions for functionality are most welcome.

Trusting, Trusting Trust
A long time ago Ken Thompson wrote something called Reflections on Trusting Trust. If you've never read this, go read it right now. It's short and it's something everyone needs to understand. The paper basically explains how Ken backdoored the compiler on a UNIX system in such a way it was extremely hard to get rid of the backdoors (yes, more than one). His conclusion was you can only trust code you wrote. Given the nature of the world today, that's no longer an option.

Every now and then I have someone ask me about Debian's Reproducible Builds. There are other groups working on similar things, but these guys seem to be the furthest along. I want to make clear right away that this work being done is really cool and super important, but not exactly for the reasons people assume. The Debian page is good about explaining what's going on but I think it's easy to jump to some false conclusions on this one.

Firstly, the point of a reproducible build is to allow two different systems to build the exact same binary. This tells us that the resulting binary was not tampered with. It does not tell us the compiler is trustworthy or the thing we built is trustworthy. Just that the system used to build it was clean and the binary wasn't meddled with before it got to you.

A lot of people assume a reproducible build means there can't be a backdoor in the binary. There can be, due to how the supply chain works. Let's break this down into a few stages. In the universe of software creation and distribution there are literally thousands to millions of steps happening. From each commit, to releases, to builds, to consumption. It's pretty wild. We'll keep it high level.

Here are the places I will talk about. Each one of these could be a book, but I'll keep it short on purpose.
  1. Development: Creation of the code in question
  2. Release: Sending the code out into the world
  3. Build: Turning the code into a binary
  4. Compose: Including the binary in some larger project
  5. Consumption: Using the binary to do something useful
The development stage of anything is possibly the hardest to control. We have reached a point in how we build software that development is now really fast. I would expect any healthy project to have hundreds or thousands of commits every day. Even with code reviews and sign offs, bugs can sneak in. A properly managed project will catch egregious attempts to insert a backdoor.

Release: This is the stage where the project in question cuts a release and puts it somewhere it can be downloaded. A good project will include a detached signature, which almost nobody checks. This stage of the trust chain has been attacked in the past; there are many instances of hacked mirrors serving up backdoored content. The detached signature ensures the release is trustworthy. We mostly have trust solved here, which is why those signatures are so important.
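To make concrete what checking a detached signature actually buys you, here is a self-contained sketch using openssl (most projects publish gpg signatures instead, but the mechanics are the same; the key pair is generated locally only so the example runs anywhere):

```shell
# Simulate a project release: a tarball plus a detached signature,
# signed with the project's private key
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out release.key 2>/dev/null
openssl pkey -in release.key -pubout -out release.pub
echo "pretend release contents" > release.tar.gz
openssl dgst -sha256 -sign release.key -out release.tar.gz.sig release.tar.gz

# A hacked mirror tampers with the tarball...
echo "backdoor" >> release.tar.gz

# ...and a downloader who actually verifies the signature catches it
openssl dgst -sha256 -verify release.pub -signature release.tar.gz.sig release.tar.gz \
    || echo "release was tampered with"
```

The verification fails on the modified tarball, which is the whole point: the mirror can be hostile, but the signature (fetched over a trusted channel) still protects you.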

Build: This is the stage where we take the source code and turn it into a binary. This is the step that a reproducible build project injects trust into. Without a reproducible build stage, there was no real trust here. It's still sort of complicated though. If you've ever looked at the rules that trigger these builds, it wouldn't be very hard to violate trust there, so it's not bulletproof. It is a step in the right direction though.

Compose: This step is where we put a bunch of binaries together to make something useful. It's pretty rare for a single build to output the end result. I won't say it never happens, but it's a bit outside what we're worried about, so let's not dwell on it. The threat we see during this stage is the various libraries you bundle with your application. Do you know where they came from? Do they have some level of trust built in? At this point you could have a totally trustworthy chain of trust, but if you include a single bad library, it can undo everything. If you want to be as diligent as possible you won't ship things built by any 3rd parties. If you build it all yourself, you can ensure some level of trust up to this point. Of course building everything yourself generally isn't practical. I think this is the next stage where we'll end up adding more trust. Various code scanners are trying to help here.

Consumption: Here is where whatever you put together is used. In general nobody is looking for software; they want a solution to a problem they have. This stage can be the most complex and dangerous though. Even if you have done everything perfectly up to here, if whoever does the deployment makes a mistake it can open up substantial security problems. Better management tools can help this step a lot.

The point of this article isn't to try to scare anyone (even though it is pretty scary if you really think about it). The real point to this is to stress that nobody can do this alone. There was once a time when a single group could plausibly try to own their entire development stack, but those times are long gone now. What you need to do is look at the above steps and decide where you want to draw your line. Do you have a supplier you can trust all the way to consumption? Do you only trust them for development and release? If you can't draw that line, you shouldn't be using that supplier. In most cases you have to draw the line at compose. If you don't trust what your supplier does beneath that stage, you need a new supplier. Demanding they give you reproducible builds isn't going to help you; they could backdoor things during development or release. It's the old saying: Trust, but verify.

Let me know what you think. I'm @joshbressers on Twitter.

April 24, 2016

Can we train our way out of security flaws?
I had a discussion about training developers with some people I work with who are smarter than I am. The usual training suggestions came up, but at the end of the day, and this will no doubt enrage some of you, we can't train developers to write secure code.

It's OK, my twitter handle is @joshbressers, go tell me how dumb I am, I can handle it.

So anyhow, training. It's a great idea in theory. It works in many instances, but security isn't one of them. If you look at where training is really successful it's for things like how to use a new device, or how to work with a bit of software. Those are really single purpose items, that's the trick. If you have a device that really only does one thing, you can train a person how to use it; it has a finite scope. Writing software has no scope. To quote myself from this discussion:

You have a Turing complete creature, using a Turing complete machine, writing in a Turing complete language, you're going to end up with Turing complete bugs.

The problem with training in this situation is that you can't train for infinite permutations. By its very definition, training can only cover a finite amount of content. Programming by definition requires you to draw on an infinite amount of content. The two are mutually exclusive.

Since you've made it this far, let's come to an understanding. Firstly, training, even on how to write software, is not a waste of time. Just because you can't train someone to write secure software doesn't mean you can't teach them to understand the problem (or a subset of it). The tech industry is notorious for seeing everything as all or none. It's a sliding scale.

So what's the point?

My thoughts on this matter are about how we can think about the challenge in a different way. Sometimes you have to understand the problem and the tools you have in order to find better solutions. We love to worry about how to teach everyone to be more secure, when in reality it's all about many layers with small bits of security in each spot.

I hate car analogies, but this time it sort of makes sense.

We don't proclaim the way to stop people getting killed in road accidents is to train them to be better drivers. In fact I've never heard anyone claim this is the solution. We have rules that dictate how the road is to be used (which humans ignore). We have cars with lots of safety features (which humans love to disable). We have humans on the road to ensure the rules are being followed. We have safety built into lots of roads, like guard rails and rumble strips. At the end of the day even with layers of safety built in, there are accidents, lots of accidents, and almost no calls for more training.

You know what's currently the talk about how to make things safer? Self driving cars. It's ironic that software may be the solution to human safety. The point though is that every system reaches a point where the best you can ever do is marginal improvements. Cars are there, software is there. If we want to see substantial change we need new technology that changes everything.

In the meantime, we can continue to add layers of safety for software, this is where most effort seems to be today. We can leverage our existing knowledge and understanding of problems to work on making things marginally better. Some of this could be training, some of this will be technology. What we really need to do is figure out what's next though.

Just as humans are terrible drivers, we are terrible developers. We won't fix auto safety with training any more than we will fix software security with training. Of course there are basic rules everyone needs to understand, which is why some training is useful. We're not going to see any significant security improvements without some sort of new technology breakthrough. I don't know what that is, nobody does yet. What is self-driving software development going to look like?

Let me know what you think. I'm @joshbressers on Twitter.

April 22, 2016

Remotely calling certmonger's local signer

It is really hard to make remote calls securely without a minimal Public Key Infrastructure. For a single server development deployment, you can use a self-signed certificate, but once you have multiple servers that need to intercommunicate, you want to have a single signing cert used for all the services. I’m investigating an approach which chains multiple Certmonger instances together.

When Certmonger needs a certificate signed, it generates a Certificate Signing Request (CSR), and then calls a helper application.  For local signing, this executable is /usr/libexec/certmonger/local-submit.


If I want to sign a certificate without going through certmonger, I can first create a local cert database, generate a CSR, and manually sign it:

mkdir ~/certs
certutil -N -d ~/certs
certutil -R -s ", O=Younglogic, ST=MA, C=USA" -o ~/mycert.req -a -g 2048 -d ~/certs
/usr/libexec/certmonger/local-submit ~/mycert.req > mycert.pem

To get a remote machine to sign it, I used the following bash script:

#!/bin/sh -x
# Assumes $SSH is set to the ssh invocation for the signing host,
# e.g. SSH="ssh dhc-user@signer" (host name here is illustrative)
CERTMONGER_CSR=`cat ~/mycert.req`

remotedir=`$SSH mktemp -d -p /home/dhc-user`
echo "$CERTMONGER_CSR" | $SSH tee $remotedir/mycert.req
new_cert=$( $SSH /usr/libexec/certmonger/local-submit $remotedir/mycert.req )
echo "$new_cert" > ~/mycert.pem
$SSH rm $remotedir/mycert.req
$SSH rmdir $remotedir

The /usr/libexec/certmonger/local-submit executable complies with the interface for Certmonger helper apps, which means it can also accept the CSR via the environment variable CERTMONGER_CSR; as you can see, it also accepts it as an argument.  If I drop the explicit definition of this variable, my script should work as a certmonger helper app.
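The helper-app contract is simple enough to sketch with a stand-in script. This toy "signer" is purely illustrative (it does no real signing; the file names are made up), but it shows the two ways a CSR can arrive, the same contract the remote wrapper above relies on:

```shell
# fake-submit: accepts a CSR either as a file argument or via the
# CERTMONGER_CSR environment variable, like a certmonger CA helper
cat > fake-submit <<'EOF'
#!/bin/sh
if [ -n "$1" ]; then
    csr=$(cat "$1")        # CSR passed as a file argument
else
    csr="$CERTMONGER_CSR"  # CSR passed via the environment
fi
echo "would sign: $csr"
EOF
chmod +x fake-submit

echo "FAKE-CSR" > req.txt
./fake-submit req.txt                    # argument style
CERTMONGER_CSR="FAKE-CSR" ./fake-submit  # environment style
```

Both invocations produce the same result, which is why the wrapper script can drop its explicit CERTMONGER_CSR assignment and let certmonger supply it.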

In ~/.config/certmonger/cas/remote


Of course, this will not honor any of the other getcert commands. But we should be able to list the certs.

Call it with:

getcert request -n remote   -c remote -s -d ~/certs/  -N "uid=ayoung,cn=users,cn=accounts,dc=openstack,dc=freeipa,dc=org"
New signing request "20160422020445" added.

getcert list -s

Request ID '20160422020445':
	stuck: no
	key pair storage: type=NSSDB,location='/home/ayoung/certs',nickname='remote',token='NSS Certificate DB'
	certificate: type=NSSDB,location='/home/ayoung/certs',nickname='remote'
	signing request thumbprint (MD5): 5D1D5881 12952298 073F1DF6 48B10CB9
	signing request thumbprint (SHA1): A30FAEDE 1917DD4D 4FA3AAFC C704329E C7783B46
	CA: remote
	expires: unknown
	pre-save command: 
	post-save command: 
	track: yes
	auto-renew: yes

So, not yet. More on this later.

April 20, 2016

Running Keystone Unit Tests against older Versions of RDO Etc

Just because upstream is no longer supporting Essex doesn’t mean that someone out there is not running it. So, if you need to backport a patch, you might find yourself in the position of having to run unit tests against an older version of Keystone (or another project) that does not run cleanly against the files installed by tox. For example, I tried running against an Icehouse-era checkout and got a slew of errors like this:

AssertionError: Environmental variable PATH_INFO is not a string: <type> (value: u’/v2.0/tokens/e6aed0a188f1402d9ad3586bc0e35758/endpoints’)

The basic steps are:

  1. Install the packages for the version closest to the one you want to test
  2. Check out your source from git and apply your patch
  3. Install any extra RPMs required to run the tests
  4. Run the tests using python -m unittest $TESTNAME

For RDO, the main RPMS can be installed from :

You might need additional RPMs as packaged in EPEL. You don’t, however, need to use an installer; you can use yum to install just the Keystone package.

The dependencies are a little tricky to solve. Tox uses the test-requirements.txt file in the Keystone repo to install dependencies, but these names do not match up with the package names. Often the RPM will be the name of the Python package with the “python-” prefix.
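A rough first pass at that mapping can be scripted, stripping comments and version pins before adding the prefix; the results still need hand-checking, since the "python-" rule has plenty of exceptions. The requirements lines below are samples, not the real Keystone file:

```shell
# Sample pip-style requirements (illustrative, not Keystone's actual list)
cat > test-requirements.txt <<'EOF'
# testing deps
mock>=1.0
testtools>=0.9.32
EOF

# Strip comments and version specifiers, then prepend "python-"
sed -e 's/#.*//' -e 's/[<>=!].*//' -e '/^$/d' -e 's/^/python-/' \
    test-requirements.txt
```

This prints candidate RPM names (python-mock, python-testtools) that can be fed to yum, with the failures falling back to pip as described below.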

Not all of the dependencies are in Fedora, RDO, or EPEL. Many were built just for CI, and are in

For later releases, you can check out the jobs running in : and fetch the set of packages in “all_rpms.txt” but be aware that these are not the set of packages for unit tests. You might need more.

Not every package can be installed this way. For example, python-pysaml2 requires a bunch of additional RPMs that I had trouble pulling in. These can still be installed via pip.

April 17, 2016

Software end of life matters!
Anytime you work on a software project, the big events are always new releases. We love to get our update and see what sort of new and exciting things have been added. New versions are exciting, they're the result of months or years of hard work. Who doesn't love to talk about the new cool things going on?

There's a side of software that rarely gets talked about though, and honestly in the past it just wasn't all that important or exciting. That's the end of life. When is it time to kill off the old versions, or sometimes even kill an entire project? When you do, what happens to the people using it? These are hard things to decide, there aren't good answers usually, it's just not a topic we're good at yet.

I bring this up now because apparently Apple has decided that Quicktime on Windows is no longer a thing. I think everyone can agree that expecting users to find some obscure message on the Internet to know they should uninstall something is pretty far-fetched.

The conversation is way bigger than just Apple though. Google is going to brick some old Nest hardware. What about all those old tablets that still work but have no security updates? What about all those Windows XP machines still out there? I bet there are people still using Windows 95!

In some instances, the software and hardware can be decoupled. If you're running XP you can probably upgrade to something slightly better (maybe). Generally speaking though, you have some level of control. If you think about tablets or IoT style devices, the software and hardware are basically the same thing. The software will likely end of life before the hardware stops working. So what does that mean? In the case of pure software, if you need it to get work done, you're not going to uninstall it. It's all really complex unfortunately which is why nobody has figured this out yet.

In the past, you could keep most "hardware" working almost forever. There are cars out there nearly 100 years old. They still work and can be fixed. That's crazy. The thought of 100 year old software should frighten you to your core. They may have stopped making your washing machine years ago, but it still works and you can get it fixed. We've all seen the power tools our grandfathers used.

Now what happens when we decide to connect something to the Internet? Now we've chained the hardware to the software. Software has a defined lifecycle. It is born, it lives, it reaches end of life. Physical goods do not have a predetermined end of life (I know, it's complicated, let's keep it simple), they break, you get a new one. If we add software to this mix, software that creates a problem once it's hit the end of life stage, what do we do? There are two options really.

1) End the life of the hardware (brick it)
2) Let the hardware continue to run with the known bad software.

Neither is ideal. Now there are some devices you could just cut off features. A refrigerator for example. Instead of knowing when to order more pickles it reverts back to only keeping things cold. While this could create confusion in the pickle industry, at least you still have a working device. Other things would be tricky. An internet connected smart house isn't very useful if the things can't talk to each other. A tablet without internet isn't good for much.

I don't have any answers, just questions. We're still trying to sort out what this all means I suspect. If you think you know the answer I imagine you don't understand the question. This one is turtles all the way down.

What do you think? Tell me: @joshbressers

April 13, 2016

Getting Started with Puppet for Keystone

Tripleo uses Puppet to manage the resources in a deployment. Puppet has a command line tool to look at resources.

On my deployed Overcloud, I have:

ls /etc/puppet/modules/keystone/lib/puppet/provider
keystone         keystone_domain_config      keystone_paste_ini  keystone_service  keystone_user_role
keystone_config  keystone_endpoint           keystone.rb         keystone_tenant
keystone_domain  keystone_identity_provider  keystone_role       keystone_user

So I can use the puppet CLI to query the state of my system, or make changes:

To look at the config:

sudo puppet resource keystone_config
keystone_config { 'DEFAULT/admin_bind_host':
  ensure => 'present',
  value  => '',
}
keystone_config { 'DEFAULT/admin_port':
  ensure => 'present',
  value  => '35357',
}
keystone_config { 'DEFAULT/admin_token':
  ensure => 'present',
  value  => 'vtNheM6drk4mgKgbAtWQPrYJe',
}
keystone_config { 'DEFAULT/log_dir':
  ensure => 'present',
  value  => '/var/log/keystone',
}

OK, Admin Token is gross.

$ sudo puppet resource keystone_config DEFAULT/admin_token
keystone_config { 'DEFAULT/admin_token':
  ensure => 'present',
  value  => 'vtNheM6drk4mgKgbAtWQPrYJe',
}

Let’s get rid of that:

sudo puppet resource keystone_config DEFAULT/admin_token ensure=absent
Notice: /Keystone_config[DEFAULT/admin_token]/ensure: removed
keystone_config { 'DEFAULT/admin_token':
  ensure => 'absent',
}

Let’s add a user:

$ sudo puppet resource keystone_users
Error: Could not run: Could not find type keystone_users
[heat-admin@overcloud-controller-0 ~]$ 

Uh oh…what did I do?

[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_config DEFAULT/admin_token ensure=present value=vtNheM6drk4mgKgbAtWQPrYJe
Notice: /Keystone_config[DEFAULT/admin_token]/ensure: created
keystone_config { 'DEFAULT/admin_token':
  ensure => 'present',
  value  => 'vtNheM6drk4mgKgbAtWQPrYJe',
}
[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_user
keystone_user { 'admin':
  ensure  => 'present',
  email   => '',
  enabled => 'true',
  id      => '7cbc569993ae41e7b2736ed2aa727644',
}

So it looks like the Puppet modules use the Admin token to do operations.

But I really want to get rid of that admin token…

Back on the undercloud, I have created a Keystone V3 RC file. I’m going to copy that to /root/openrc on the overcloud controller.

[stack@undercloud ~]$ scp overcloudrc.v3 heat-admin@
[stack@undercloud ~]$ ssh heat-admin@
[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_config DEFAULT/admin_token ensure=absent
keystone_config { 'DEFAULT/admin_token':
  ensure => 'absent',
}
[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_user
Error: Could not run: Insufficient credentials to authenticate
[heat-admin@overcloud-controller-0 ~]$ sudo cp  overcloudrc.v3 /root/openrc
[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_user
keystone_user { 'admin':
  ensure  => 'present',
  email   => '',
  enabled => 'true',
  id      => '7cbc569993ae41e7b2736ed2aa727644',
}

Now let’s add a user:

$ sudo puppet resource keystone_user ayoung ensure=present enabled=true password=FreeIPA4All
Notice: /Keystone_user[ayoung]/ensure: created
keystone_user { 'ayoung':
  ensure  => 'present',
  email   => '',
  enabled => 'false',
}

Big Shout out to Emilien Macchi who is the Master of Keystone Puppets and taught me about the openrc file.

April 12, 2016

What happened with Badlock?
Unless you live under a rock, you've heard of the Badlock security issue. It went public on April 12. Then things got weird.

I wrote about this a bit in a previous post. I mentioned there that this better be good. If it's not, people will get grumpy. People got grumpy.

The thing is, this is a nice security flaw. Whoever found it is clearly bright, and if you look at the Samba patchset, it wasn't trivial to fix. Hats off to those two groups.
$ diffstat -s samba-4.4.0-security-2016-04-12-final.patch 
 227 files changed, 14582 insertions(+), 5037 deletions(-)
Here's the thing though. It wasn't nearly as good as the hype claimed. It probably couldn't ever be as good as the hype claimed. This is like waiting for a new Star Wars movie. You have memories from being a child and watching the first few. They were like magic back then. Nothing that ever comes out again will be as good. Your brain has created ideas and memories that are too amazing to even describe. Nothing can ever beat the reality you built in your mind.

Badlock is a similar concept.

Humans are squishy irrational creatures. When we know something is coming one of two things happen. We imagine the most amazing thing ever which nothing will ever live up to (the end result here is being disappointed). Or we imagine something stupid which almost anything will be better than (the end result here is being pleasantly surprised).

I think most of us were expecting the most amazing thing ever. We had weeks to imagine what the worst possible security flaw could be that affects Samba and Windows. Most of us can imagine some pretty amazing things. We didn't get that though. We didn't get amazing. We got a pretty good security flaw, but not one that will change the world. We expected amazing, we got OK, now we're angry. If you look at twitter, the poor guy who discovered this is probably having a bad day. Honestly, there probably wouldn't have been anything that would have lived up to the elevated expectations that were set.

All that said, I do think announcing this weeks in advance created this atmosphere. If this had all been quiet until today, we would have been impressed, even if it had a name. Hype isn't something you can usually control. Some try, but by its very nature things get out of hand quickly and easily.

I'll leave you with two bits of wisdom you should remember.

  1. Name your pets, not your security flaws
  2. Never over-hype security. Always underpromise and overdeliver.

What do you think? Tell me: @joshbressers

April 11, 2016

A TFTP Server in Rust

Rust is pedantic. I’m pedantic. We get along wonderfully. Since HTTP is way too overdone, I wanted to try something at the byte-twiddling level. I got a very, very basic TFTP server to run and fetch a larger binary file without corrupting it. Time to celebrate with a bragpost.

The code is on Github, and I went full GPL on it.

Some comments are certainly called for. Here is the main loop, that

  • reads a single packet from a UDP socket
  • extracts the OP code
  • calls the appropriate handler function
fn read_message(socket: &net::UdpSocket) {
    let mut file_streams = HashMap::new();

    let mut buf: [u8; 100] = [0; 100];
    loop {
        let result = socket.recv_from(&mut buf);

        match result {
            Ok((amt, src)) => {
                let data = Vec::from(&buf[0..amt]);
                let connection = Connection{socket: socket, src: &src};
                let mut rdr = Cursor::new(&data);

                if amt < 2 {
                    panic!("Not enough data in packet")
                }
                let opcode = rdr.read_u16::<BigEndian>().unwrap();

                match opcode {
                    1 => {
                        file_streams.insert(src, handle_read_request(
                            &mut rdr, &amt, &connection));
                    },
                    2 => println!("Write"),
                    3 => println!("Data"),
                    4 => {
                        let chunk = rdr.read_u16::<BigEndian>().unwrap() + 1;
                        file_streams.get_mut(&src).unwrap().send_chunk(&chunk, &connection);
                    },
                    5 => println!("ERROR"),
                    _ => println!("Illegal Op code"),
                }
            }
            Err(err) => panic!("Read error: {}", err),
        }
    }
}

My first hack at reading the opcode just looked at the second byte as a u8; since big-endian network order means that value was also a valid u8, matching on it worked. However, later on I need to marshal larger and larger numbers into the outgoing buffer, and the byteorder crate handles that for both reading and writing.
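As a minimal sketch of the byte-order idea, today's standard library can do this directly with from_be_bytes/to_be_bytes (this isn't the code from the repo, which uses the byteorder crate; the packet bytes here are made up for illustration):

```rust
fn main() {
    // A TFTP read request (RRQ) begins with a two-byte, big-endian opcode.
    let packet: [u8; 2] = [0x00, 0x01];

    // Decode network (big-endian) byte order into a u16.
    let opcode = u16::from_be_bytes([packet[0], packet[1]]);
    assert_eq!(opcode, 1);

    // Encoding works the same way in reverse, e.g. for a DATA block number.
    let block: u16 = 258;
    assert_eq!(block.to_be_bytes(), [0x01, 0x02]);

    println!("opcode={}", opcode);
}
```

byteorder's read_u16::<BigEndian>() performs the equivalent decode over a Cursor, which is convenient when walking through a packet sequentially.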

However, reading ASCII strings this way was not so clean:

 pub fn new(data: &mut Cursor<&Vec<u8>>, amt: &usize) -> FileStream {
        let mut index = 2;
        for x in 2..20 {
            if data.get_ref().as_slice()[x] == 0 {
                index = x;
                break;
            }
        }
        let mut full_path = String::from("/home/ayoung/tftp/");
        let filename = match str::from_utf8(&data.get_ref().as_slice()[2..index]) {
            Ok(file_name) => file_name,
            Err(why) => panic!("couldn't read filename: {}", why),
        };
        println!("filename: {}", filename);

I originally tried using the read_to_string method on the Cursor, but it did not identify the null, and I ended up with an invalid string. str::from_utf8 worked properly, once the index is set to skip the start of the buffer.
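The NUL scan can also be written with an iterator instead of a fixed 2..20 loop; this is a hypothetical reworking over a made-up request buffer, not the code from the repo:

```rust
use std::str;

fn main() {
    // A TFTP RRQ: 2-byte opcode, NUL-terminated filename, NUL-terminated mode.
    let buf: &[u8] = b"\x00\x01Minecraft.jar\x00octet\x00";

    // Find the first NUL after the opcode rather than scanning a fixed range.
    let index = 2 + buf[2..]
        .iter()
        .position(|&b| b == 0)
        .expect("no NUL terminator in packet");

    // Only the bytes between the opcode and the NUL form the filename.
    let filename = str::from_utf8(&buf[2..index]).expect("filename is not UTF-8");
    assert_eq!(filename, "Minecraft.jar");
    println!("filename: {}", filename);
}
```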

I really like the form of error handling where the success value is used for assignment, like this:

        let file = match File::open(full_path) {
            Err(err) => panic!("Can't open file: {}", err),
            Ok(file) => file,
        };
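A self-contained sketch of the same match-as-expression idiom; it substitutes a String result for the panic, and the path is a hypothetical one assumed not to exist, so the Err arm is taken:

```rust
use std::fs::File;

fn main() {
    // Hypothetical path, assumed not to exist on the system.
    let full_path = "/no/such/file/anywhere";

    // The value produced by whichever arm matches becomes the binding.
    let outcome = match File::open(full_path) {
        Err(err) => format!("Can't open file: {}", err),
        Ok(_file) => String::from("opened"),
    };

    assert!(outcome.starts_with("Can't open file"));
    println!("{}", outcome);
}
```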

This server reads files out of $HOME/tftp. To test it out, I did:

[ayoung@ayoung541 tmp]$ tftp 8888
tftp> binary
tftp> get Minecraft.jar
tftp> quit
[ayoung@ayoung541 tmp]$ diff ~/tftp/Minecraft.jar Minecraft.jar 
[ayoung@ayoung541 tmp]$ 

I think I need a syntax highlighter (for WordPress) that understands Rust.

Special thanks to this post that got me started.

April 10, 2016

Cybersecurity education isn't good, nobody is shocked
There was a news story published last week about the almost total lack of cybersecurity attention in undergraduate education. Most people in the security industry won't be surprised by this. In the majority of cases when the security folks have to talk to developers, there is a clear lack of understanding about security.

Every now and then I run across someone claiming that our training and education is going great. Sometimes I believe them for a few seconds, then I remember the state of things. Here's the thing: while there are a lot of good training and education opportunities, the ratio of competent security people to developers is without doubt going down. Software engineering positions are growing at more than double the rate of other positions, and it's significantly harder to educate a security person, so the math says there's a problem here (this disregards the fact that as an industry we do a horrible job of passing on knowledge).

While it's clear students don't care about security, the question is should they?

It's always easy to pull out an analogy here, comparing this to car safety, or maybe architects vs civil engineers. Those analogies never really work though, the rules are just too different. The fundamental problem really boils down to the fact that a 12 year old kid in his basement has access to the exact same tools and technology the guy working on his PhD at MIT does. I'm not sure there has ever been an industry with a similar situation. Generally those in large organizations had access to significant resources that a normal person doesn't. Like building a giant rocket, or a bridge.

Here is what we need to think about.

Would we expect a kid learning how to build a game on his Dad's computer to also learn security? If I was that kid, I would say no. I want to build a game, security sounds dumb.

What if you're a college kid interested in computer algorithms? Security sounds uninteresting and is probably a waste of time. Remember when they made you take that phys ed class and all the jocks laughed at you while you whispered to yourself about how they'll all be working at a gas station someday? Yeah, that's us now.

Let's assume that normal people don't care about security and don't want to care about security, what does that mean?

The simple answer would be to "fix the tools", but that's sort of chicken and egg. Developers build their own tools at a rather impressive speed these days, you can't really secure that stuff.

What if we sandbox everything? That really only protects the underlying system, most everything interesting these days is in the database, you can still steal all of that from a sandbox.

Maybe we could ... NO, just stop.

So how can we fix this?

We can't.

It's not that the problems are unfixable, it's that we don't understand them well enough. My best comparison here is when futurists wondered how New York could possibly deal with all the horse manure if the city kept growing. Clearly they were thinking only in the context of what was available to them at the time. We think in this way too. It's not that we're dumb, I'm certain we don't really understand the problems. The problems aren't insecure code or bad tools. It's something more fundamental than that. Did we expect the people cleaning up after the horses to solve the manure problem?

If we start to think about the fundamentals, what's the level below our current development models? With the above example it was really about transportation, not horses, but horses are what everyone obsessed over. Our problems aren't really developers, code, and education. It's something more fundamental. What is it though? I don't know.

Do you think you know? Tell me: @joshbressers

April 08, 2016

Its a good thing SELinux blocks access to the docker socket.
I have seen lots of SELinux bugs being reported where users are running a container that volume mounts docker.sock into the container.  The container then uses a docker client to do something with docker. While I appreciate that a lot of these containers probably need this access, I am not sure people realize that this is equivalent to giving the container full root outside of the container on the host system.  I just execute the following command and I have full root access on the host.

docker run -ti --privileged -v /:/host fedora chroot /host

SELinux definitely shows its power in this case by blocking the access.  From a security point of view, we definitely want to block all confined containers from talking to docker.sock.  Sadly the other security mechanisms on by default in containers do NOT block this access.  If a process somehow breaks out of a container and gets write access to docker.sock, your system is pwned on an SELinux-disabled system. (User Namespace, if it is enabled, will also block this access going forward.)

If you have to run a container that talks to docker.sock, you need to turn off the SELinux protection. There are two ways to do this.

You can turn off all container security separation by using the --privileged flag. Since you are giving the container full access to your system from a security point of view, you probably should just do this.

docker run --privileged -v /run/docker.sock:/run/docker.sock POWERFULLCONTAINER

If you want to just disable SELinux you can do this by using the --security-opt label:disable flag.

docker run --security-opt label:disable -v /run/docker.sock:/run/docker.sock POWERFULLCONTAINER

Note: in the future, if you are using User Namespace and have this problem, a new --userns=host flag is being developed, which will turn off user namespace within the container.

April 05, 2016


I had an interesting exchange last week. We had someone in the chatroom asking for help, morgan was doing his part, and I chimed in and proceeded to get attacked.

Me: I’m not 100% buying this, why are you using numerics?
Other: thats because you are stupid.

Um. OK. That was unexpected, but… a couple lines later there was a “;)” so I shrugged it off. I can take a bit of ribbing, and chose to respond by being self-deprecating…

Other: you dont know unix
Me: Nah, not at all. Isn’t that what happens when a guy starts singing soprano?

As a core, part of my job is to review code and change requests, and to challenge even the assumptions already baked into the code. In this case, the code in question was something that originally existed to support PKI tokens, which are on their way out. Now the code was being used for Fernet key rotation. As you can see, it was an interesting technical discussion. But in focusing on the technical side, we all let the attack go unanswered. Then later:

Me: you called me stupid. Now you need to justify yourself

It might not be apparent, but the other person had gotten under my skin, and I responded. This, too, was a mistake. However, at the time I thought I was being levelheaded. We had not had any attacks like this in our chat room yet, and I was not prepared for it. Now, if it is going to happen to anyone, I am glad it was addressed at me; I can take it. But it is not about me.

Other: you have just proved over 100 lines that you truly are stupid, I will frame it and put it on the wall in austin
Me: go for it.

Me: feel free to insult me all you want. Do not insult other people in this channel.

Yep…I’m still too cocky. And you can see that the tension in the room rises. Other people were excusing themselves from the conversation. But you can see, I am finally starting to switch into oversight mode.

The discussion turns technical again. I’m now fully engaged. The problem set has to do with getpwnam calls inside containers…a lot like the issues we had with the BProc nsswitch module. Since I was irate, I was being detail oriented. But I think that the whole conversation about user identity inside versus outside the container is an important one to have. And then this:

Other: I will seriously kill you in Austin!

This was over the line. Based on the context, I did not take it as a real threat, and I still don’t. It was an expression of frustration from someone who thinks they know the answer and is having to justify it yet again. I know that frustration. But I also know not to respond like this.

Other: seriously, are you lobotomized?

Then back to sane technical conversation, and later:

Other: what planet are you from?

To be honest, this is back to borderline acceptable ribbing to me. I responded with a link to my alma mater.

I needed to go do family things. As I left, I finally said what I should have said at the beginning.

Me: let me make one thing clear. We are all professionals here. You’ve picked on the most mellow of people to insult, which is why you have gotten away with it. The rest of the core developers here are getting very antsy at your attitude. I am willing to help, but if you keep this up, it will be a kickban, and I do have perms in this room to enforce that. Have I made myself clear?

Later, the PTL, coming back from dinner, had this to say:

I’m the keystone project lead. I’ve read the scrollback earlier and have determined that you have violated the code of conduct, and lost the privilege to be a community member. There was a threat of violence; even if it was meant as hyperbole, that is unacceptable; it chases away other community members and makes for a hostile environment. As a result I am performing a kickban. For information on how to proceed please see the code of conduct website.

Let me be clear, I am not blaming myself as the victim here. I am blaming myself as a community member that did not stop an abusive conversation at the start. It does not matter that I was the target of the abuse: I am core on the project, and it is within my role to maintain a positive environment in the chat room. I made a mistake in letting the other person off simply because it was me. I would not let someone abuse anyone else in my community that way, but didn’t realize that, in not addressing it, I was encouraging a hostile atmosphere. It is about the community, and the environment in the chat room, and the project.

So, we were all caught unprepared. Fortunately, no real harm done. If something like this happens again, here is how I think we should react.

The first time someone says something that is a personal attack, the conversation stops.

It should not be the attacked person's responsibility to respond, although they are certainly welcome to do so. The response should come from the other community members, especially those that have operator privileges in the room.

Post a link to the code of conduct. Here it is:

If I am there, and you are the person attacked, and do not feel comfortable responding, send me a private message and I will step in. If not, find the operator of the room and ask them to respond. If they don’t, please send them a link to this blog, but also send a message to the Foundation member listed on the Code of Conduct page.

It's easy to feel sorry for the person that we kickbanned. "Oh he was probably a dumb kid" or whatever, but remember, there are many, many people that get abused and just disappear. You would not take your kids to a party at a place that had a history of brawls in the parking lot. You don't want a summer intern, new to Open Source and OpenStack and programming, to witness a hostile environment.

And I am still not certain it makes sense to allow calls that take numeric user ids from inside a container when the Name Service Switch module that backs them is not available. I am willing to be proved wrong, though, and look forward to discussing it with anyone that cares to discuss it civilly.

Adding a new filename transition rule.
Way back in 2012 we added File Name Transition Rules.  These rules allow us to create content with the correct label in a directory with a different label.  Before File Name Transition Rules, administrators and tools like init scripts creating content in a directory had to remember to execute restorecon on the new content.  In a lot of cases they would forget, and we would end up with mislabeled content; in some cases this opened up a race condition where the data would be temporarily mislabeled and could cause security problems.

I recently received this email and figured I should write a blog.

Hiya everyone. I'm an SELinux noob.

I love the newish file name transition feature. I was first made aware of it some time after RHEL7 was released, probably thanks to some mention from Simon or one of the rest of you on this list. For things that can't be watched with restorecond, this feature is so awesome.

Can someone give me a quick tutorial on how I could add a custom rule? For example:

filetrans_pattern(unconfined_t, httpd_sys_content_t, httpd_sys_rw_content_t, dir, "rwstorage")

Of course the end goal is that if someone creates a dir named "rwstorage" in /var/www/html, that dir will automatically get the httpd_sys_rw_content_t type. Basically I'm trying to make a clone of the existing rule that does the same thing for "/var/www/html(/.*)?/uploads(/.*)?".
Thanks for reading.

First you need to create a source file myfiletrans.te

policy_module(myfiletrans, 1.0)

gen_require(`
    type unconfined_t, httpd_sys_content_t, httpd_sys_rw_content_t;
')

filetrans_pattern(unconfined_t, httpd_sys_content_t, httpd_sys_rw_content_t, dir, "rwstorage")

Quickly looking at the code we added: when writing policy, if you are using types like unconfined_t, httpd_sys_content_t, and httpd_sys_rw_content_t that are defined in other policy packages, you need to declare them in a gen_require block.  This is similar to declaring extern variables in "C".  Then we call the filetrans_pattern interface.  This code tells the kernel that if a process running as unconfined_t creates a dir named rwstorage in a directory labeled httpd_sys_content_t, the directory should be created as httpd_sys_rw_content_t.

Now we need to compile and install the code; note that you need to have the selinux-policy-devel package installed.

make -f /usr/share/selinux/devel/Makefile myfiletrans.pp
semodule -i myfiletrans.pp

Let's test it out.

# mkdir /var/www/html/rwstorage
# ls -ldZ /var/www/html/rwstorage
drwxr-xr-x. 2 root root unconfined_u:object_r:httpd_sys_rw_content_t:s0 4096 Apr  5 08:02 /var/www/html/rwstorage

Let's make sure the old behaviour still works.

# mkdir /var/www/html/rwstorage1
# ls -lZ /var/www/html/rwstorage1 -d
drwxr-xr-x. 2 root root unconfined_u:object_r:httpd_sys_content_t:s0 4096 Apr  5 08:04 /var/www/html/rwstorage1

This is an excellent way to customize your policy, if you continuously see content being created with the incorrect label.

April 03, 2016

Security is really about Risk vs Reward
Every now and then the conversation erupts about what security really is. There's the old saying that the only secure computer is one that's off (or fill in your favorite quote here, there are hundreds). But the thing is, security isn't a binary concept where you are either secure or insecure. That's not how anything works. Everything is a sliding scale: you are never fully secure, never fully insecure, you're somewhere in the middle. Rather than bumble around with your risk though, you need to understand what's going on and plan for it.

So this brings us to the idea of risk and reward. Rather than just thinking about security, you have to think about how everything fits together. It doesn't matter if your infrastructure is super secure if nobody can do their jobs. As we've all seen over and over, if security gets in the way, security loses. Every. Single. Time.

I think about this a lot, and I've come up with a graph that I think can explain this nicely.

Don't think in the context of secure or insecure. Think in the context of how much risk do I have? Once you understand what your risks are, you can decide if the level of risk you're taking on can be justified by what the result of that risk will be. This of course holds true for nearly all decisions, not just security, but we'll just focus on security.

The above graph puts things into 4 groups. If you have a high level of risk with minimal reward (the Why box), you're making a bad decision. Anything you have in that "Why" box probably needs to go away ASAP, you will regret it someday.

Additionally, if your sustaining operations are high risk, you're probably doing something wrong. Risk is hard and drains an organization; you should be conducting your day to day operations in a manner that poses a low risk, as the day to day is generally not where the high reward is.

The place you want to be is in the "Innovation" or "No Brainer" boxes. Accepting a high level of risk isn't always a bad thing, assuming that risk comes with significant rewards. You can imagine a situation where you are deploying a new and untested technology, but the benefits to conducting business could change everything, or perhaps using a new, untested vendor for the first time.

We have to be careful with risk. Risk can be crippling if you don't understand and manage it. It can also destroy everything you've done if you let it get out of hand. Many of us find ourselves in situations where all risk is seen as bad. Risk isn't always bad, risk is never zero. It's up to everyone to determine what their acceptable level of risk is. Never forget though, that sometimes we need to bump up our level of risk to get to the next level of reward. Just make sure you can bring that risk back under control once you start seeing the outcomes.

What do you think? Let me know: @joshbressers

FreeIPA for Tripleo

My last post showed how to allocate an additional VM for Tripleo. Now I’m going to go through the steps to deploy FreeIPA on it. However, since I went through all of the effort to write Ossipee and Rippowam, I am going to use those to do the heavy lifting.

This one is pretty grungy. I’m going to generate a punch-list from, and will continue to clean up the steps as I go, but first I want to just get it working.

To start, turn the ironic node into a server:

openstack server create  --flavor compute  --image overcloud-full  --key-name default idm

Now, in order to run Ansible, we need a custom inventory. I’ve done a small hack to Ossipee to get it to generate the appropriate inventory from the Nova servers in Tripleo’s undercloud.

Ossipee needs to use the V3 version of the Keystone API, so let’s convert the V2 stackrc into a V3 one and source that. Grab the script from and run

./ stackrc > stackrc.v3
. ./stackrc.v3 

A good way to check that you are using V3 is to do a V3 only operation, like list domains:

openstack domain list
| ID                               | Name       | Enabled | Description        |
| 33c86e573f094787adb2e808c723dcca | heat_stack | True    |                    |
| default                          | Default    | True    | The default domain |

Grab Ossipee and run the inventory generator.

git clone
python ./ossipee/ > inventory.ini

This gets an inventory file that looks roughly like the ones Ossipee created before, but uses the same host group names as the rest of Tripleo:




In addition, I think the ipa_forwarder values are deployment specific, and I have them wrong. Look in the controller's resolv.conf to see what they should be:

$ ssh heat-admin@ cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script

The nameserver value should be the IP address of the newly created idm vm.

openstack server list
| ID                                   | Name                    | Status | Networks             |
| f81e7122-5d8e-4377-b855-80c28116197d | idm                     | ACTIVE | ctlplane= |
| c1bf48cb-659f-4f9f-aa9d-1c0d4bcae06d | overcloud-controller-0  | ACTIVE | ctlplane= |
| c1e2069f-4ef1-461b-86a9-2fd2bb321a8a | overcloud-novacompute-0 | ACTIVE | ctlplane= |

Obviously, Ossipee needs some work here, but this is likely going to be replaced by Heat work shortly. Anyway, adjust the IP addresses accordingly.

Now grab Rippowam:

git clone

And Ansible from EPEL. Note that this needs to be two calls, as the first installs the repo used by the second.

sudo yum -y install epel-release
sudo yum -y install ansible

Rippowam still has the host name as ipa in the ipa playbook. You can change either the inventory or the Rippowam code to match. I changed Rippowam like this:

diff --git a/ipa.yml b/ipa.yml
index e0ea50c..c17c2b5 100644
--- a/ipa.yml
+++ b/ipa.yml
@@ -1,10 +1,10 @@
-- hosts: ipa
+- hosts: idm
   remote_user: "{{ cloud_user }}"
   tags: all
   tasks: []
-- hosts: ipa
+- hosts: idm
   sudo: yes
   remote_user: "{{ cloud_user }}"

The inventory file is set up for later when it needs to talk to the Overcloud controllers. Heat changes the cloud user to heat-admin. Create this on the idm machine.

ssh centos@ sudo useradd -m  heat-admin  -G wheel
ssh centos@ sudo mkdir /home/heat-admin/.ssh
ssh centos@ sudo chown heat-admin:heat-admin /home/heat-admin/.ssh
ssh centos@ sudo cp /home/centos/.ssh/authorized_keys /home/heat-admin/.ssh/
ssh centos@ sudo chown heat-admin:heat-admin /home/heat-admin/.ssh/authorized_keys
ssh heat-admin@ ls
ssh heat-admin@ pwd

I also manually went in and tweaked the /etc/sudoers values to let password-less sudo work for heat-admin. Not a long term approach I would suggest, but these are just my current development notes.

Make sure that ansible works:

 ansible -i inventory.ini --user heat-admin --sudo idm -m setup

Output not pasted here for brevity.

The machine needs a FQDN to deploy. I am going to continue the pattern from before, where the cluster name has some aspect of the user name. Since the baremetal host is ayoung-dell-t1700, this cluster will be ayoung-dell-t1700.test and the FQDN for this host will be idm.ayoung-dell-t1700.test

sudo vi /etc/hostname
sudo hostname `cat /etc/hostname`

Run the ipa playbook.

 ansible-playbook -i inventory.ini rippowam/ipa.yml 

Assuming that runs successfully, do the same kind of thing with the keycloak.yml playbook: edit it to change the host group to idm, and run it.

Also, it seems I have some typos in roles/keycloak/tasks/main.yml:

index 59f67c7..cce462d 100644
--- a/roles/keycloak/tasks/main.yml
+++ b/roles/keycloak/tasks/main.yml
@@ -114,7 +114,7 @@
     - keycloak
   copy: src=keycloak-proxy.conf 
-        owner=root group=rootmode="u=rw,g=r,o=r"
+        owner=root group=root mode="u=rw,g=r,o=r"
@@ -122,5 +122,5 @@
     - keycloak
   service: name=httpd
-           state=irestarted
+           state=restarted

Fix those and then:

 ansible-playbook -i inventory.ini rippowam/ipa.yml 

Ugh, that was messy. need to clean it up. But it did work.

Now, how to go look at our newly deployed servers? The best bet seems to be to use sshuttle.

From the desktop (not the undercloud)

sshuttle -e "ssh -F $HOME/.quickstart/ssh.config.ansible" -r undercloud -v

In order to point a browser at it, you need an entry in the hosts file. For me: idm.ayoung-dell-t1700.test

Keycloak needs to be initialized. Start by SSHing to the idm machine, and then:

$ cd /var/lib/keycloak/keycloak-1.9.0.Final  
$ sudo bin/ -u admin
Press ctrl-d (Unix) or ctrl-z (Windows) to exit
Added 'admin' to '/var/lib/keycloak/keycloak-1.9.0.Final/standalone/configuration/keycloak-add-user.json', restart server to load user
$ sudo systemctl restart keycloak.service

Extra node on Tripleo Quickstart

I’ve switched my Tripleo development to using tripleo quickstart. While the steps to create an additional VM for the IdM server are roughly what I posted before, it is different enough to warrant description.

When creating the undercloud, you can tell the quickstart script to use an alternative configuration. In my case, I have one based on “minimal” that has the additional node defined:

in tripleo-quickstart/playbooks/centosci/ipa.yml

control_memory: 8192
compute_memory: 8192

  - name: control_0
    flavor: control
  - name: compute_0
    flavor: compute
  - name: idm_0
    flavor: compute

# FIXME(trown) This is only temporarily set to false so we can
# change CI to use these settings without changing what is run.
# Will be changed to true in a follow-up patch.
introspect: false

extra_args: ""
tempest: false
pingtest: true

Now when kicking off the quickstart:

 ./ -c playbooks/centosci/ipa.yml -t all ayoung-dell-t1700

Note that I am using tags, and this one does the complete undercloud and overcloud deployment.

March 29, 2016

Ransomware is scary, but not for the reasons you think it is
If you've been paying any attention for the past few weeks, you know what ransomware is. It's a pretty massive pain for anyone who gets it, and in some cases, it was a matter of life and death.

It's easy to understand what makes this stuff scary, but there's another angle most haven't caught on to yet, and it's not a pleasant train of thought.

Firstly, let's consider a few things.

  1. Getting rid of malware is expensive
  2. Recovering from a compromise is even more expensive
  3. Ransomware has a clear and speedy ROI
  4. Normal people don't have a ton of important data
So let's start with #1 and #2. If you are compromised in some way, even if it's just some malware, it's going to cost a lot to clean up the mess. Probably orders of magnitude more than the current ransom. It's cheaper to pay than to clean up the mess. This will remain true, as there isn't an incentive for the authors to price themselves out of business. The ransomware universe is econ 101. If you're an economics PhD student and you want to look impressive, write your thesis about this stuff; you'll probably win some sort of award. We'll get back to the economics of this shortly.

If we think about #3 it's pretty obvious. You write some malware, it literally pays you money. This means there is going to be more and more of this showing up on the market. Regular old malware can't compete with this. Ransomware has a business model, a really good one, except for that whole being illegal and really unethical part. Non ransomware doesn't have such an impressive business model. This is a turning point in the malware industry.

To date most of the ransomware seems to have been targeted at normal people. The price was a bit too high I thought, $400 is probably more than the average person will or can pay. The last few we've heard about hit hospitals though, and they charged a higher fee. This is basic economics. A hospital has more money than a person, and the data and infrastructure means the difference between life and death. Paying the fee will cost less than hiring a security professional. And when you're in the business of keeping people alive, you'll pay that fee if it means getting back to whatever it is you do.

If the ransomware knows where it is and what sort of data it has, the price can fluctuate based on economics. Some businesses can afford a few days of downtime, some can't. The more critical the data and system is to your business, the more you'll be willing to pay. Of course there is a ceiling on this, if the cost of hiring some security folks is less than the cost of paying the ransom, anyone with a clue is going to pay the expert to clean up the mess. This is the next logical step in the evolution of this business model.
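To make the pricing argument concrete, here is a toy sketch of that calculation. The function names and all cost figures are invented for illustration; they are not data from any real incident.

```python
# Toy model of the ransom ceiling described above: a rational victim
# pays only if the ransom is below the cost of recovering without
# paying. All figures here are made up for the example.

def recovery_cost(cleanup_cost, downtime_cost_per_day, days_down):
    """What it costs to hire the experts and ride out the downtime."""
    return cleanup_cost + downtime_cost_per_day * days_down

def will_pay(ransom, cleanup_cost, downtime_cost_per_day, days_down):
    """A victim with a clue pays only if the ransom undercuts recovery."""
    return ransom < recovery_cost(cleanup_cost, downtime_cost_per_day, days_down)

# A hospital: expensive cleanup, downtime measured in lives and lawsuits.
print(will_pay(17000, 50000, 100000, 3))   # True

# A home user: a reinstall is cheap, so even $400 is a hard sell.
print(will_pay(400, 200, 0, 0))            # False
```

The interesting knob is downtime cost: it is what lets the same malware charge a hospital far more than a home user without changing anything else.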

If we keep thinking about this and bring the ransomware to its logical conclusion, the future versions are going to request a constant ongoing payment. Not a one time get out of jail free event. Why charge them once when you can charge them over and over again? Most modern infrastructures are complex enough it will be hard to impossible to remove an extremely clever bit of malware. It's going to be time for the good guys to step it up here, more thoughts on that some other day though.

There is even a silly angle that's fun to ponder. We could imagine ransomware that attacks other known malware. If the ransomware is getting a constant ongoing payment, it would be bad if anything else could remove it, from legitimate software to other ransomware. While I don't think antivirus and ransomware will ever converge on the same point, it's still fun to think about.

What do you think? Let me know: @joshbressers

March 28, 2016

Who can +2 a patch?

You are trying to push along a patch…and it dawns on you that you have no idea who to ask. The answer is out there.

Assuming it is an OpenStack project, the configuration for it is stored in the ACLs section of the project-config repo. For example, the Keystone project is managed by the keystone-core group.

Once you have the name of the group, you can look it up in Gerrit and list its members. Here is Keystone's.
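For the curious, Gerrit also exposes group membership over its REST API (GET /groups/&lt;group&gt;/members/). A sketch of handling such a response is below; note that Gerrit prefixes JSON bodies with `)]}'` to defeat cross-site script inclusion, and the member record here is a made-up sample, not a real response.

```python
import json

GERRIT_XSSI_PREFIX = ")]}'"

def parse_gerrit_json(raw):
    """Strip Gerrit's anti-XSSI prefix, then parse the JSON body."""
    if raw.startswith(GERRIT_XSSI_PREFIX):
        raw = raw[len(GERRIT_XSSI_PREFIX):]
    return json.loads(raw)

# Made-up sample of what GET /groups/keystone-core/members/ might return.
sample = """)]}'
[
  {"_account_id": 1000001,
   "name": "Example Reviewer",
   "email": "reviewer@example.com"}
]
"""

members = parse_gerrit_json(sample)
print([m["name"] for m in members])  # ['Example Reviewer']
```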

March 24, 2016

Identifying the message sender with Rabbit MQ and Kombu

Yesterday I showed how to identify a user when using the Pika library. However, Oslo Messaging still relies on the Kombu library. This, too, supports matching the user_id in the message to the username used to authenticate to the broker.

Again, a modification of an example from the documentation.

The important modification is to add the user_id to the publish call.

producer.publish(
    {'name': '/tmp/lolcat1.avi', 'size': 1301013},
    exchange=media_exchange, routing_key='video',
    declare=[video_queue], user_id=rabbit_userid)

This Kombu based code does not raise an exception if the message is rejected. However, looking at the count of messages and the dump of the properties on the consumer, only those where the usernames match are delivered.

The sender sends two messages, one that matches, one that does not. Count the messages in the video queue:

$ sudo rabbitmqctl list_queues | grep video
video	0

Queue is empty.

Send two messages, one where the name matches, one where it does not.

$ python
owned message sent
misowned message sent

Now check there is only one message in the queue:

$ sudo rabbitmqctl list_queues | grep video
video	1

And receive the message:

$ python
{u'name': u'/tmp/lolcat1.avi', u'size': 1301013}
sent by 
$ sudo rabbitmqctl list_queues | grep video
video	0

Only the one where the name matches is passed through.

Here is the full code.

from kombu import Connection, Exchange, Queue

media_exchange = Exchange('media', 'direct', durable=True)
video_queue = Queue('video', exchange=media_exchange, routing_key='video')

def process_media(body, message):
    print body

rabbit_host = ''
rabbit_userid = 'a5f56bdb395f53864a80b95f45dc395e94c546c7'
rabbit_password = '06814091f31ad50b55a3509e9e3916082cce556d'

# connections
with Connection('amqp://%s:%s@%s//' % (rabbit_userid, rabbit_password, rabbit_host)) as conn:

    # produce
    producer = conn.Producer(serializer='json')
    try:
        producer.publish({'name': '/tmp/lolcat1.avi', 'size': 1301013},
                         exchange=media_exchange, routing_key='video',
                         declare=[video_queue], user_id=rabbit_userid)
        print("owned message sent")
    except Exception as e:
        raise e
    try:
        producer.publish({'name': '/tmp/phish.avi', 'size': 1301013},
                         exchange=media_exchange, routing_key='video',
                         declare=[video_queue], user_id='fake_user')
        print("misowned message sent")
    except Exception as e:
        raise e

from kombu import Connection, Exchange, Queue

media_exchange = Exchange('media', 'direct', durable=True)
video_queue = Queue('video', exchange=media_exchange, routing_key='video')

def process_media(body, message):
    print ('recved')
    print body
    print ('sent by ')
    print message.properties.get('user_id', 'no user id in message')
    message.ack()

rabbit_host = ''
rabbit_userid = 'a5f56bdb395f53864a80b95f45dc395e94c546c7'
rabbit_password = '06814091f31ad50b55a3509e9e3916082cce556d'

# connections
with Connection('amqp://%s:%s@%s//' % (rabbit_userid, rabbit_password, rabbit_host)) as conn:

    with conn.Consumer(video_queue, callbacks=[process_media]) as consumer:
        # Process messages and handle events on all channels
        while True:
            conn.drain_events()

March 23, 2016

I'm going to do something really cool in 3 weeks! ... Probably.
If you pay attention to the security news, there is something coming called Badlock. It just set off a treasure hunt for security flaws in Samba. Rather than link to the web site (I'd rather not support this sort of behavior), let's think about this as reasonable people.

I can imagine three possible outcomes to the events that have been set in motion.
  1. On April 12 a truly impressive security flaw will be disclosed. We will all be impressed.
  2. Someone will figure this out before April 12; they have no incentive to act responsibly and will publish what they know right away, better to be first than to be right!
  3. Whatever happens on April 12 won't be nearly as interesting or exciting as we've been led to believe. The world will say a collective 'meh' and we'll go back to looking at pictures of cats.
Numbers 1 and 2 rely on the flaw being quite serious. If it is serious, I suspect there is a far greater chance of #2 happening than #1. As an industry we should hope for #3, we don't need more terrible flaws.

The really crazy thing to think about is that if the issue isn't actually serious, it probably won't be found. Everyone is looking for a giant problem. They're going to pass up minor issues (if you do find these, please report them, it's still useful work). We've been told the prize is a pot of gold, not some proverbial "the journey is the reward" nonsense.

The thing everyone always should remember in a situation like this is there are a lot of really smart people on the planet. If you think of something clever or discover something new, there are huge odds someone else did too. 3 weeks almost guarantees someone else can figure out whatever it is you found. It's especially interesting in this case since we have a name "Badlock" so we know it probably involves locking. We know it affects Samba and Windows. And we know who it was found by so we can look at which bits of Samba they've been working on lately. That's a lot of information for a clever person.

The real thing we need to think about here though is what's actually happening. There is a bigger story for us to think about around all these named issues.

If you name an issue, you are making a claim that it's very serious. There are literally thousands of security issues per year, and maybe ten get fancy names. A name suggests this is something we should care about. That this issue is special. Except that's not really the case all the time. There have been a lot of named issues that weren't very impressive.

What happens in situations like this, when there is a near constant flow of information that's not really important? People stop listening. The human brain is really good at filtering out noise. Named security issues are going to become noise at the current rate things are going. I'm not opposed to this, I think you should name your pets not your security issues.

Send your comments to Twitter: @joshbressers
Identifying the message sender with Rabbit MQ (and Pika)

When sending a message via Rabbit MQ, a sender can choose to identify itself or hide its identity, but it cannot lie.

I modified the Pika examples to work with hard coded user-id and password. Specifically, I added:

properties = pika.BasicProperties(user_id=rabbit_userid);

And used that as a parameter in:

channel.basic_publish(exchange='',
                      routing_key='hello',
                      properties=properties,
                      body='Hello World!')

On the receiving side, in the callback, make use of the properties:

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    print("Message user_id is %s " % properties.user_id)

And the output looks like this:

$ python &
[1] 5062
$  [*] Waiting for messages. To exit press CTRL+C
$ python 
 [x] Sent 'Hello World!'
 [x] Received 'Hello World!'
Message user_id is a5f56bdb395f53864a80b95f45dc395e94c546c7 

Modify the sender so the ids don’t match and:

    raise exceptions.ChannelClosed(method.reply_code, method.reply_text)
pika.exceptions.ChannelClosed: (406, "PRECONDITION_FAILED - user_id property set to 'rabbit_userid' but authenticated user was 'a5f56bdb395f53864a80b95f45dc395e94c546c7'")

Here is the complete code.

#!/usr/bin/env python
import pika

rabbit_host = ''
rabbit_userid = 'a5f56bdb395f53864a80b95f45dc395e94c546c7'
rabbit_password = '06814091f31ad50b55a3509e9e3916082cce556d'
credentials = pika.PlainCredentials(rabbit_userid, rabbit_password)

connection = pika.BlockingConnection(pika.ConnectionParameters(
    host=rabbit_host, credentials=credentials))
channel = connection.channel()
channel.queue_declare(queue='hello')
properties = pika.BasicProperties(user_id=rabbit_userid)
channel.basic_publish(exchange='',
                      routing_key='hello',
                      properties=properties,
                      body='Hello World!')
print(" [x] Sent 'Hello World!'")

#!/usr/bin/env python
import pika

rabbit_host = ''
rabbit_userid = 'a5f56bdb395f53864a80b95f45dc395e94c546c7'
rabbit_password = '06814091f31ad50b55a3509e9e3916082cce556d'
credentials = pika.PlainCredentials(rabbit_userid, rabbit_password)

connection = pika.BlockingConnection(pika.ConnectionParameters(
    host=rabbit_host, credentials=credentials))
channel = connection.channel()
channel.queue_declare(queue='hello')


def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    print("Message user_id is %s " % properties.user_id)


channel.basic_consume(callback, queue='hello', no_ack=True)

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()

March 20, 2016

Everything is fine, nothing to see here!
As anyone who reads this blog knows, I've been talking about soft skills in security for quite some time now. I'm willing to say it's one of our biggest issues at the moment, a point which I get a lot of disagreement on. I have sympathy for anyone who thinks this stuff doesn't matter, I used to be there. Until I had to start talking to people. As soon as you talk to most anyone outside the security echo chamber, you see what's actually going on, and it's not great.

I won't say the security industry is on fire, but nobody is going to disagree that many of the things we're looking after aren't in great shape. Outside of a few very large successful companies, most organizations have serious and significant security problems that could result in a massive breach; it's just that nobody has tried, yet. I see a few reasons for many of our troubles, but I always seem to come back to soft skills.

There is a skills shortage
But there's training, look at all the training, there's so much training everything is fine!

There is training. Some is good, some is bad (like anything). It's not that training in itself is bad, I would encourage anyone to go get training. It's not great though either. Most training today focuses on the symptoms of our problems. Things like pen testing, secure coding (which doesn't exist), network defense. Things that while important, aren't the real problems. I'll talk more about this in a future post, but chew on this. There are about 96,000 CISSP holders, and about 5 million security jobs. That's messed up.

Today everyone who is REALLY, I mean REALLY REALLY good at security got there through blood, sweat, and tears. Nobody taught them what they know, they learned it on their own. Many of us didn't have training when we were learning these things. Regardless of this though, if training is fantastic, why does it seem there is a constant march toward things getting worse instead of better? That tells me we're not teaching the right skills to the right people. The skills of yesterday don't help you today, and especially don't help tomorrow. By its very definition, training can only cover the topics of yesterday.

How do we skill up for the needs of today and tomorrow? The first thing we have to do is listen to the people running, building, and using the technology of today. They know things we don't just as we know things they don't. Security is still almost always an afterthought, even with everyone claiming it's the most important thing ever. This is our failing, not theirs.

We build our skills by being an industry that doesn't complain and belittle everyone who tries anything. We are notorious for being brutal to the new guys. Everyone starts somewhere, don't be a jerk. I know a lot of people who are afraid to do almost anything in the security space because they know if they're not 100% correct, they will have to deal with a torrent of negative comments. It's not worth talking to us in many instances.

As an industry we are failing our customers
Things aren't that bad, sure there are some breaches but in general everything is going pretty good!

If you read any news stories, you know things aren't OK. There are loads of breaches and high profile security issues. Totally broken devices, phones that can't be updated, light bulbs that can join a botnet. As an industry we like to stick to our echo chamber circles where we spin news and events into something that isn't our fault. We laugh at the stupid people doing stupid things. We find a person or event that can explain away the incident as a singular event, not a systematic problem. The problems are growing exponentially while our resources are growing linearly, which means that relative to the problems, our resources are actually shrinking every year.

Most organizations don't have proper security and won't even have a proper conversation until they end up on the wrong side of a major compromise. It's our fault nobody is talking about this stuff, even if the breach isn't technically our fault.

What advice are we giving people they can actually use? In almost every organization the security group is feared and hated. We're not peers, we're enemies, and they are ours. This isn't helpful to anyone. How many of you actually sit down and have honest real discussions with those you are supposed to help. Do you actually understand their problems (not our problems with them, their actual problems, the ones they have to route around security to solve). Security shouldn't be something bolted on later, we're lucky if it's even that in most cases.

Security is seen as a business prohibitor, not a business enabler
I know what needs to be done, nobody wants to listen!

We've all been here before. We suggest something to the group, they ignore us. We are the problem here, not the people we are supposed to help. We blame them for not listening when the real issue is we're not talking to them properly. We throw information at people, complex hard to understand information, then rather than hold their hand when they don't understand, we declare them stupid and go find someone who agrees with us, then we complain about how dumb everyone else is and how smart we are.

They aren't stupid.

Neither are we.

The disconnect is one of talking. We have to talk to people, we have to engage with them. We have to build a relationship. You can't expect to show up and be listened to if you're not respected. People trust those they respect. If you're not in that circle of respect, you won't be taken seriously. On a regular basis I hear security people tell me "they'll know I was right when we get hacked!" That doesn't even make sense. It's your failure for not creating a level of understanding for the issue, it's not their fault for ignoring you.

Soft skills are hard
You don't even know what you're talking about, my skills are fine!

Maybe. I won't say I'm an expert. I am constantly thinking about the state of things and how interactions go. What I do know though is the things I discuss here are based on my real world lessons. Every day is a new journey into being a new and better security person. I know how the technology works, what I don't know is how people work. It's a journey to figure this out. I'm pretty sure I'm on to something because people I respect are encouraging, yet there are some who are trying very hard to discourage this conversation. As the old saying goes, if nobody is complaining about what you're doing, you're not doing anything interesting.

Here's what I do honestly believe. You can disagree with me or anyone you want. The industry isn't solving the problems it needs to solve. Those problems will be solved eventually; there are many industry groups forming to start talking about some of these problems, but the groups mostly talk, and that's not a skill we're good at. Even then I see a lot of criticism toward those groups. Problems won't be solved quickly by doing the same thing we do today. I'm confident a big part of our future is humanizing security. Security today isn't for humans; security tomorrow needs to be. We get there by cooperating, not by arguing and insulting.

Think I'm an idiot, let me know: @joshbressers

March 19, 2016

Convert a keystone.rc from V2 to V3

Everything seems to produce V2 versions of the necessary variables for Keystone, and I am more and more dependent on the V3 setup. Converting from one to the other is trivial, especially if the setup uses the default domain.

if [ "$#" -ne 1 ]
then
    echo "Usage $0 <keystone.rc>"
    exit 1
fi

. $1

NEW_OS_AUTH_URL=`echo $OS_AUTH_URL | sed 's!v2.0!v3!'`

cat << EOF
export OS_AUTH_URL=$NEW_OS_AUTH_URL
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_IDENTITY_API_VERSION=3
EOF

And to run it:

[stack@undercloud ~]$ ./ stackrc > stackrc.v3 
[stack@undercloud ~]$ . ./stackrc.v3 
[stack@undercloud ~]$ openstack domain list
| ID                               | Name       | Enabled | Description        |
| d702b42eb3694279bdd0cc74a848a103 | heat_stack | True    |                    |
| default                          | Default    | True    | The default domain |
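The only real work in the script is the single sed substitution. The same rewrite can be sketched in Python (the endpoint below is a made-up example address, not a real deployment):

```python
# The v2.0 -> v3 rewrite the script's sed line performs. Like sed's
# s!v2.0!v3!, we replace only the first occurrence.

def v2_to_v3(auth_url):
    return auth_url.replace('v2.0', 'v3', 1)

print(v2_to_v3('http://192.0.2.10:5000/v2.0'))  # http://192.0.2.10:5000/v3
```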

March 18, 2016

Sausage Factory: Multiple Edition Handling in Fedora

First off, let me be very clear up-front: normally, I write my blog articles to be approachable by readers of varying levels of technical background (or none at all). This will not be one of those. This will be a deep dive into the very bowels of the sausage factory.

The Problem

Starting with the initiative, the Fedora Project embarked on a journey to reinvent itself. A major piece of that effort was the creation of different “editions” of Fedora that could be targeted at specific user personas. Instead of having a One-Size-Fits-Some Fedora distribution, we would produce an operating system for “doers” (Fedora Workstation Edition), for traditional infrastructure administrators (Fedora Server Edition) and for new, cloudy/DevOps folks (Fedora Cloud Edition).

We made the decision early on that we did not want to produce independent distributions of Fedora. We wanted each of these editions to draw from the same collective set of packages as the classic Fedora. There were multiple reasons for this, but the most important of them was this: Fedora is largely a volunteer effort. If we started requiring that package maintainers had to do three or four times more work to support the editions (as well as the traditional DIY deployments), we would quickly find ourselves without any maintainers left.

However, differentiating the editions solely by the set of packages that they deliver in a default install isn’t very interesting. That’s actually a problem that could have been solved simply by having a few extra choices in the Anaconda installer. We also wanted to solve some classic arguments between Fedora constituencies about what the installed configuration of the system looks like. For example, people using Fedora as a workstation or desktop environment in general do not want OpenSSH running on the system by default (since their access to the system is usually by sitting down physically in front of a keyboard and monitor) and therefore don’t want any potential external access available. On the other hand, most Fedora Server installations are “headless” (no input devices or monitor attached) and thus having SSH access is critical to functionality. Other examples include the default firewall configuration of the system: Fedora Server needs to have a very tightened default firewall allowing basically nothing in but SSH and management features, whereas a firewall that restrictive proves to be harmful to usability of a Workstation.

Creating Per-Edition Default Configuration

The first step to managing separate editions is having a stable mechanism for identifying what edition is installed. This is partly aesthetic, so that the user knows what they’re running, but it’s also an important prerequisite (as we’ll see further on) to allowing the packaging system and systemd to make certain decisions about how to operate.

The advent of systemd brought with it a new file that describes the installed system called os-release. This file is considered to be authoritative for information identifying the system. So this seemed like the obvious place for us to extend to include information about the edition that was running as well. We therefore needed a way to ensure that the different editions of Fedora would produce a unique (and correct) version of the os-release file depending on the edition being installed. We did this by expanding the os-release file to include two new values: VARIANT and VARIANT_ID. VARIANT_ID is a machine-readable unique identifier that describes which version of Fedora is installed. VARIANT is a human-readable description.
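As a sketch of what consumers of this file see, here is a minimal os-release parser. The sample content is abbreviated and only mimics what a Fedora Server install might ship; it is not a complete file.

```python
# Minimal sketch: read VARIANT and VARIANT_ID from os-release style
# key=value content. Real consumers would read /etc/os-release.

def parse_os_release(text):
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        values[key] = value.strip('"')
    return values

# Abbreviated sample mimicking a Fedora Server system.
sample = '''NAME=Fedora
VERSION_ID=23
VARIANT="Server Edition"
VARIANT_ID=server
'''

info = parse_os_release(sample)
print(info['VARIANT_ID'])  # server
```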

In Fedora, the os-release file is maintained by a special RPM package called fedora-release. The purpose of this package is to install the files onto the system that guarantee this system is Fedora. Among other things, this includes os-release, /etc/fedora-release, /etc/issue, and the systemd preset files. (All of those will become interesting shortly).

So the first thing we needed to do was modify the fedora-release package such that it included a series of subpackages for each of the individual Fedora editions. These subpackages would be required to carry their own version of os-release that would supplant the non-edition version provided by the fedora-release base package. I’ll circle back around to precisely how this is done later, but for now accept that this is true.

So now that the os-release file on the system is guaranteed to contain appropriate VARIANT_ID, we needed to design a mechanism by which individual packages could make different decisions about their default configurations based on this. The full technical details of how to do this are captured in the Fedora Packaging Guidelines, but the basic gist of it is that any package that wants to behave differently between two or more editions must read the VARIANT_ID from os-release during its %posttrans (post-transaction) phase of package installation and place a symlink to the correct default configuration file in place. This needs to be done in the %posttrans phase because, due to the way that yum/dnf processes the assorted RPMs, there is no other way to guarantee that the os-release file has the right values until that time. This is because it’s possible for a package to install and run its %post script between the time that the fedora-release and fedora-release-EDITION package gets installed.
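The decision each package makes in %posttrans can be sketched like this. The real scriptlets are shell fragments in the spec file, per the Packaging Guidelines; the package name ("myapp") and config file names here are hypothetical.

```python
# Conceptual sketch of the %posttrans choice: symlink the
# edition-specific default config if one exists, otherwise fall back
# to the non-edition default. Names are hypothetical.

def pick_config(variant_id, available):
    """Return the config file a package would symlink as its default."""
    candidate = 'myapp.conf.%s' % variant_id
    return candidate if candidate in available else 'myapp.conf.default'

available = {'myapp.conf.server',
             'myapp.conf.workstation',
             'myapp.conf.default'}

print(pick_config('server', available))  # myapp.conf.server
print(pick_config('cloud', available))   # myapp.conf.default
```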

That all assumes that the os-release file is correct, so let’s explore how that is made to happen. First of all, we created a new directory in /usr/lib called /usr/lib/os.release.d/ which will contain all of the possible alternate versions of os-release (and some other files as well, as we’ll see later). As part of the %install phase of the fedora-release package, we generate all of the os-release variants and then drop them into os.release.d. We will then later symlink the appropriate one into /usr/lib/os-release and /etc/os-release during %post.

There’s an important caveat here: the /usr/lib/os-release file must be present and valid in order for any package to run the %systemd_post scripts to set up their unit files properly. As a result, we need to take a special precaution. The fedora-release package will always install its generic (non-edition) os-release file during its %post section, to ensure that the %systemd_post scripts will not fail. Then later if a fedora-release-EDITION package is installed, it will overwrite the fedora-release one with the EDITION-specific version.

The more keen-eyed reader may have already spotted a problem with this approach as currently described: What happens if a user installs another fedora-release-EDITION package later? The short answer was that in early attempts at this: “Bad Things Happened”. We originally had considered that installation of a fedora-release-EDITION package atop a system that only had fedora-release on it previously would result in converting the system to that edition. However, that turned out to A) be problematic and B) violate the principle of least surprise for many users.

So we decided to lock the system to the edition that was first installed by adding another file: /usr/lib/variant which is essentially just a copy of the VARIANT_ID line from /etc/os-release. In the %post script of each of the fedora-release subpackages (including the base subpackage), it is checked for its contents. If it does not exist, the %post script of a fedora-release-EDITION package will create it with the appropriate value for that edition. If processing reaches all the way to the %posttrans script of the fedora-release base package (meaning no edition package was part of the transaction), then it will write the variant file at that point to lock it into the non-edition variant.

There remains a known bug with this behavior, in that if the *initial* transaction actually includes two or more fedora-release-EDITION subpackages, whichever one is processed first will “win” and write the variant. In practice, this is unlikely to happen since all of the install media are curated to include at most one fedora-release-EDITION package.

I said above that this “locks” the system into the particular edition, but that’s not strictly true. We also ship a script along with fedora-release that allows an administrator to manually convert between editions by running `/usr/sbin/convert-to-edition -e <edition>`. Effectively, this just reruns the steps that the %post of that edition would run, except that it skips the check for whether the variant file is already present.

Up to now, I’ve talked only about the os-release file, but the edition-handling also addresses several other important files on the system, including /etc/issue and the systemd presets. /etc/issue is handled identically to the os-release file, with the symlink being created by the %post scripts of the fedora-release-EDITION subpackages or the %posttrans of the fedora-release package if it gets that far.

The systemd presets are a bit of a special case, though. First of all, the edition presets do not replace the global default presets; they supplement them. What we do is symlink an edition-specific preset file into the /usr/lib/systemd/system-preset/ directory. These presets can either enable new services (as in the Server Edition, which turns on Cockpit and rolekit) or disable them (as in the Workstation Edition, which shuts off OpenSSH). However, because systemd only processes the preset files during its own %post phase, we need to force systemd to reread them after we add the new values.

We need to be careful when doing this, because we only want to apply the new presets if the current transaction is the initial installation of the fedora-release-EDITION package. Otherwise, an upgrade could override choices that the user themselves have made (such as disabling a service that defaults to enabled). This could lead to unexpected security issues, so it has to be handled carefully.

In this implementation, instead of calling the command to reprocess all presets, we parse the preset files and process only those units that are mentioned in them. (This is overcautious in case any package other than systemd is changing the default enabled state, such as a third-party RPM that might run `systemctl enable httpd.service` in its %post section.)
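The parsing step is simple enough to sketch (hypothetical Python, not the shipped scriptlet; preset lines are of the form "enable cockpit.socket" or "disable sshd.service"):

```python
# Sketch: extract only the units named in a preset file, then re-apply
# the vendor preset for just those units rather than for everything.
import re
import subprocess

PRESET_LINE = re.compile(r"^(enable|disable)\s+(\S+)")


def units_in_preset(path):
    units = []
    with open(path) as f:
        for line in f:
            m = PRESET_LINE.match(line.strip())
            if m:  # comments and blank lines don't match
                units.append(m.group(2))
    return units


def apply_preset_for_units(path):
    # `systemctl preset UNIT` re-applies the vendor preset for that one
    # unit, leaving every other unit's enable/disable state untouched.
    for unit in units_in_preset(path):
        subprocess.call(["systemctl", "preset", unit])
```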

Lastly, because we are using symlinks to manage most of this, we had to write the %post and %posttrans scripts in the built-in Lua implementation carried by RPM. This allowed us to call posix.symlink() without adding a dependency on coreutils to do it in bash (which resulted in a circular dependency and broken installations). We wrote this as a single script that is imported by the RPM during the SRPM build phase. This script is actually copied by rpmbuild into the scriptlet sections verbatim, so it must be present in the dist-git checkout on its own, not just as part of the exploded tarball. So when modifying the Lua script, it’s important to update the copy in dist-git as well as the copy upstream.

March 17, 2016

Dependency Injection in Python applied to Ossipee

I reworked my OpenStack API based cluster builder Ossipee last weekend. It now makes heavy use of dependency resolution, and breaks the old super-base class apart into properly scoped components. The worker classes are designed to be reusable components. The config and plan objects from before are merged into one, which killed the majority of the copying; it is the least cleaned up of any of the code, and I might continue to rework it. Finally there are the factories, which determine how to build the components. Python’s lack of type support is really apparent here, leading to boilerplate code.

I particularly like how the Session and client factories now work.

session factory :

def session_factory(resolver):
    parser = resolver.resolve(argparse.ArgumentParser)
    args = parser.parse_args()
    auth_plugin = ksc_auth.load_from_argparse_arguments(args)
    try:
        if not auth_plugin.auth_url:
            logging.error('OS_AUTH_URL not set.  Aborting.')
    except AttributeError:
        pass

    session = ksc_session.Session.load_from_cli_options(
        args, auth=auth_plugin)

    return session

nova client factory :

def nova_client_factory(resolver):
    session = resolver.resolve(ksc_session.Session)
    nova_client = novaclient.Client('2', session=session)
    return nova_client

They are registered like this:

depend.register(ksc_session.Session, session_factory)
depend.register(novaclient.Client, nova_client_factory)

So, the worker object to create a host declares its dependencies in the constructor.

class Server(object):
    def __init__(self, nova, neutron, spec):
        self.nova = nova
        self.neutron = neutron
        self.spec = spec

Ideally, the parameters to the __init__ function would carry documentation about their types. While that could be done with ABCs, it does not help for all the code out there that does not use ABCs. ABCs would, however, be useful in providing a means to automate dependency resolution.
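To illustrate what automated resolution could look like if constructors carried type information (this is a hypothetical sketch, not something Ossipee does; all names here are made up), annotations on __init__ parameters are enough to drive construction:

```python
# Hypothetical: resolve a class's dependencies from the type
# annotations on its constructor, instead of registering factories.
import inspect


class Registry:
    def __init__(self):
        self.instances = {}

    def resolve(self, cls):
        if cls not in self.instances:
            self.instances[cls] = self._construct(cls)
        return self.instances[cls]

    def _construct(self, cls):
        kwargs = {}
        for name, param in inspect.signature(cls.__init__).parameters.items():
            if name == "self":
                continue
            if param.annotation is not inspect.Parameter.empty:
                # recursively resolve each annotated dependency
                kwargs[name] = self.resolve(param.annotation)
        return cls(**kwargs)


class Nova:
    def __init__(self):
        pass


class Server:
    def __init__(self, nova: Nova):
        self.nova = nova
```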

I pulled the resolver code I wrote a few years ago into the tree for now, for ease of development. I’ll probably merge it back into the original project. The biggest addition is the ability to name components, in order to distinguish between two components that implement the same contract. Without this, I had subclass proliferation. Python tuples really make sense here: a factory is registered via the tuple of the class and an (optional) name, and is resolved the same way.

Mixing named and unnamed components is still a little grungy, but it makes it nice to have a component that can both be a top level worker, and a piece of another workflow.

An instance is resolved via the scope, the class, and the name. We can cheat, and pass in a string as the Class for the name of the “worker”, but I don’t think I want to encourage that.
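The (class, name) registration scheme can be sketched in a few lines (illustrative, not the actual Ossipee resolver; an unnamed component is just the name=None entry):

```python
# Minimal sketch of named-component registration: factories are keyed
# by a (class, name) tuple and results are cached per key.
class Resolver:
    def __init__(self):
        self.factories = {}
        self.cache = {}  # a single global scope, as Ossipee uses today

    def register(self, cls, factory, name=None):
        self.factories[(cls, name)] = factory

    def resolve(self, cls, name=None):
        key = (cls, name)
        if key not in self.cache:
            self.cache[key] = self.factories[key](self)
        return self.cache[key]


class Network:
    def __init__(self, label):
        self.label = label


resolver = Resolver()
resolver.register(Network, lambda r: Network("default"))
resolver.register(Network, lambda r: Network("public"), name="public")
```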

I only have a single scope for Ossipee, the global scope, as it was fairly short lived. I’d like to try the code in a web app with both request and session scope to see how well it works to organize things.

March 16, 2016

Tie Your Rabbit Down

I’ve been running the Tripleo Quickstart to set up my development deployments. While looking into the setup, I noticed that the default Rabbit deployment is wide open; I can’t see anything other than firewall port blocking in place. So I dug deeper.

All of the services use the following values to talk to the Queues

  RabbitUserName:  guest
  RabbitPassword: guest

The Access Control List (ACL) allows all powers over all queues. There is no Transport Layer Security on the network communication.

I was able to address the first issue by editing the script that Tripleo Quickstart generates. There is a heredoc section that sets many of the defaults that go into the yaml config file used as the input for openstack overcloud create. I added:

  RabbitUserName:  fubar
  RabbitPassword: fumtu

And confirmed that the cloud worked with these changes by running

git clone
tripleo-ci/scripts/  --overcloud-pingtest

As well as sshing to the controller and running

$ sudo rabbitmqctl list_users
Listing users ...
fubar	[administrator]
$ sudo grep -i rabbit_password /etc/nova/nova.conf 
# Deprecated group;name - DEFAULT;rabbit_password

While I was tempted to tackle this in Quickstart, I think it is better to leave the issue visible there and instead tackle it in the Tripleo library.

We deploy all of Rabbit in a single vhost:

$ sudo rabbitmqctl list_vhosts
Listing vhosts ...

But we do allow for the separation of the RPC mechanism from the Notifications:

In the Nova config file:

# The topic compute nodes listen on (string value)
#  (string value)


The Keystone config file only has the notifications section. All have the Rabbit Userid and Password in the clear.

The Oslo RPC call is based on creating a response Queue. I would like to permit only the intended RPC target to write to this response Queue. However, these queues are named using a random UUID:

def _get_reply_q(self):
    with self._reply_q_lock:
        if self._reply_q is not None:
            return self._reply_q

        reply_q = 'reply_' + uuid.uuid4().hex

        conn = self._get_connection(rpc_common.PURPOSE_LISTEN)

        self._waiter = ReplyWaiter(reply_q, conn,
                                   self._allowed_remote_exmods)

        self._reply_q = reply_q
        self._reply_q_conn = conn

        return self._reply_q

This makes it impossible to write a regular expression to limit the set of accessible queues.
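To see why, note that the tightest regex we could write still matches every reply queue, not just the one a particular RPC response should go to (a quick illustration, using the same naming scheme as the code above):

```python
# The reply queue name embeds a random UUID, so a pattern that admits
# our own reply queue necessarily admits everyone else's too.
import re
import uuid

REPLY_QUEUE = re.compile(r'^reply_[0-9a-f]{32}$')

mine = 'reply_' + uuid.uuid4().hex
theirs = 'reply_' + uuid.uuid4().hex
assert REPLY_QUEUE.match(mine)
assert REPLY_QUEUE.match(theirs)  # the ACL can't tell them apart
```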

What services actually have presence on the compute nodes? (some lines removed for clarity)

$ sudo lsof -i tcp:amqp
neutron-o 17236    neutron    8u  IPv4  40581      0t0  TCP overcloud-novacompute-0.localdomain:53049->overcloud-controller-0.localdomain:amqp (ESTABLISHED)
neutron-o 17236    neutron   19u  IPv4  40590      0t0  TCP overcloud-novacompute-0.localdomain:53058->overcloud-controller-0.localdomain:amqp (ESTABLISHED)
nova-comp 17269       nova    4u  IPv4  40572      0t0  TCP overcloud-novacompute-0.localdomain:53047->overcloud-controller-0.localdomain:amqp (ESTABLISHED)
nova-comp 17269       nova   19u  IPv4 130115      0t0  TCP overcloud-novacompute-0.localdomain:53157->overcloud-controller-0.localdomain:amqp (ESTABLISHED)
ceilomete 17682 ceilometer   12u  IPv4 130381      0t0  TCP overcloud-novacompute-0.localdomain:53162->overcloud-controller-0.localdomain:amqp (ESTABLISHED)

In order to trace the connections, I created rabbit users with uuidgen-based passwords:

sudo rabbitmqctl add_user overcloud-ceil-0 28d90d7c-1ebb-47a6-b58b-3df7aef1f6bf
sudo rabbitmqctl add_user overcloud-neutron-0 1290a77d-35a1-4afa-b5ea-cbc8f9387754
sudo rabbitmqctl add_user overcloud-novacompute-0 53493010-37b3-4188-bd88-b933b9322c7c
sudo rabbitmqctl add_user keystone 4810a2c6-60f0-4014-8fbb-d628ad9d52f9
sudo rabbitmqctl set_permissions overcloud-ceil-0 ".*" ".*" ".*"
sudo rabbitmqctl set_permissions overcloud-neutron-0 ".*" ".*" ".*"
sudo rabbitmqctl set_permissions overcloud-novacompute-0 ".*" ".*" ".*"
sudo rabbitmqctl set_permissions keystone ".*" ".*" ".*"

First, I tested editing the Keystone server on the controller, and was able to see the user change from guest to keystone.

Then, I used the appropriate values on the compute node for the rabbit_user_id and rabbit_password values in the files:


Then I restarted the node. After reboot, Nova and Neutron came back, but Ceilometer was not happy (even after cycling the services on both the control node and the compute node).

$ sudo lsof -i tcp:amqp
neutron-o 1680 neutron    8u  IPv4  23125      0t0  TCP overcloud-novacompute-0.localdomain:49085->overcloud-controller-0.localdomain:amqp (ESTABLISHED)
neutron-o 1680 neutron   19u  IPv4  23449      0t0  TCP overcloud-novacompute-0.localdomain:49096->overcloud-controller-0.localdomain:amqp (ESTABLISHED)
nova-comp 1682    nova    4u  IPv4  24066      0t0  TCP overcloud-novacompute-0.localdomain:49097->overcloud-controller-0.localdomain:amqp (ESTABLISHED)
nova-comp 1682    nova   20u  IPv4 487795      0t0  TCP overcloud-novacompute-0.localdomain:49582->overcloud-controller-0.localdomain:amqp (ESTABLISHED)

Going back to the controller: there is obviously a one-to-one relationship between the connections from the compute node and the entities that rabbitmqctl allows us to list:

$ sudo rabbitmqctl list_connections
keystone	43714	running
keystone	43921	running
overcloud-neutron-0	49085	running
overcloud-neutron-0	49096	running
overcloud-novacompute-0	49097	running
overcloud-novacompute-0	49582	running

With this information we should be able to put together a map of which service talks on which channel.
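A small helper sketch for building that map: parse the (user, port, state) triples that `rabbitmqctl list_connections` prints, so each broker connection can be matched against the peer port in the lsof output above. (The output format is assumed from the listings in this post.)

```python
# Map each broker connection's peer port to the rabbit user that owns
# it; join against lsof output on the port number to identify services.
def parse_connections(output):
    by_port = {}
    for line in output.strip().splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[2] != "running":
            continue  # skip "Listing connections ..." and other noise
        user, port, _state = fields
        by_port[int(port)] = user
    return by_port


sample = ("keystone\t43714\trunning\n"
          "overcloud-neutron-0\t49085\trunning\n"
          "overcloud-novacompute-0\t49097\trunning\n")
```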

This is a complex system. I’m going to do some more digging, and see if I can come up with an approach to lock things down a bit better.

March 13, 2016

Containers are like sandwiches
During the RSA conference, I was talking about containers and it occurred to me that we can think about them like sandwiches. Not so much that they're tasty, but rather: where does your container come from? I was pleased that almost all of the security people I spoke with understand the current security nightmare containers are. The challenge of course is how we explain what's going on to everyone else. Security is hard and we're bad at talking about it. They also didn't know what Red Hat was doing, which is totally our own fault, but we'll talk about that somewhere else.

But containers are sandwiches. What does that mean? Let's think about them in this context. You can pick up a sandwich. You can look at it, you can tell basically what's going on inside. Are there tomatoes? Lettuce? Ham? Turkey? It's not that hard. There can be things hiding, but for the most part you can get the big details. This is just like a container. Fedora? Red Hat? Ubuntu? It has httpd, great. What about a shell? systemd? Cool. There can be scary bits hidden in there too. Someone decided to replace /bin/sh with a python script? That's just like hiding the olives under the lettuce. What sort of monster would do such a thing!

So now that we have the image of a sandwich in our minds, let's think about a few scenarios.

Find it on a bench
If you're walking through the park and you see a sandwich just lying on a bench, what would you do? You might look around, wondering who left this tasty delight, but you're not going to eat it. Most people wouldn't even touch it. Who put it there? Where did it come from? How old is it? Does it have onions? So many questions, and you honestly can't get a decent answer. Even if someone could answer the questions, would you eat that sandwich? I certainly wouldn't.

Finding a sandwich on a bench is the public container registry. If this is all you know, you wouldn't think there's anything wrong with doing this, but like the public registry, you don't always know what you're getting. I wonder how many of those containers saw an update for the glibc flaw from a few weeks ago? It's probably easier not knowing.

Get it from a scary shop with questionable ingredients
A long time ago I was walking around in New York and decided to hop into a sandwich shop for a quick bite. As I reached for the door, there was a notice from the health department. I decided to keep walking. Even if you can get your sandwich from a shop, if the shop is scary, you could find yourself in trouble.

There are loads of containers available out there you can download that aren't trusted sources. Don't download random containers from random places. It's no different than trying to buy a sandwich from a filthy shop that has to shoo the rats out of the kitchen with a broom.

Get it from a nice shop that uses old ingredients
We've all seen those places selling sandwiches that look nice. The sign is painted, the windows are clean. When you walk in the tables are clean enough to eat off of! But then you order and it's pretty clear everything is old and dried out. You might be able to sneak out the back door before the old man putting it together notices you're not there anymore.

This is currently a huge danger in the container space. Containers are super hip right now so there are plenty of people doing work in this space. Many of these groups don't even know they have a problem. The software in your containers is a lot like sandwich meat. After a few weeks it probably will start to smell, and after a month it's going to do some serious damage to anyone who consumes it.

Be sure to ask your container supplier what they're shipping, where it came from, and how fresh it is. It would not be unreasonable to ask, "If this container were a sandwich, would you eat it?"

Get it from a nice shop that uses nice ingredients
This is the dream. You walk into a nice shop. The nice person behind the counter takes your order and using the freshest ingredients possible constructs a sandwich shaped work of art. You take pictures and post them to all your friends explaining this sandwich is what your life was always missing and you didn't know it before now.

This is why you need a partner you can trust when it comes to container content. The closer to the source you can get, the better. Ask questions about the content. Where did it come from? Who is taking care of it? How can I prove any of this? Who is updating it? Containers are a big deal; they're new and exciting. They're also very misunderstood. Only use fresh containers. If the content is more than a few months old, you're eating a sandwich off a park bench. Don't eat sandwiches off park benches. Ask hard questions. If your vendor can't answer them, you need to try the shop across the street. Part of the magic of containers is that they are the result of truly commoditizing the operating system. You can get container content from a lot of sources, so find a good one.

If we think about our infrastructure like we think about public health, you don't want to be responsible for making everyone sick. You need to know what you're using, where it came from, how fresh it is, who put it together, and what's in it. It's not enough to pretend everything is fine. Everything is not fine.

March 07, 2016

The interesting things from RSA are what didn't happen, and containers are sandwiches
The RSA conference is done. It was a very long and busy show, there were plenty of interesting people there and lots of clever ideas and things to do.

I think the best part is what didn't happen though. We love talking about the exciting things from the show, I'm going to talk about the unexciting non events I was waiting to happen (but thankfully they did not).

The DROWN issue came and went. It wasn't very exciting; it got the appropriate amount of attention. Basically SSLv2 is still broken, don't use it for any reason. If you use SSLv2, it's like licking the handrail at the airport. Nobody is going to feel bad for you.

There were keynotes by actors. The world continues to turn (pun intended). But really, these keynotes are about being entertaining, I didn't go, because well, they're actors :) But I suspect they were entertaining. No doubt this will happen more and more as there are more and more security conferences, finding good keynotes will only get harder. They should hire that guy from the Hackers movie next.

There weren't any exciting hacking events. Not that stunt hacking is a thing for RSA, I'm glad nobody tried anything new. I'm sure Blackhat will be a very different story. We shall wait and see.

And most importantly, I wasn't booed off the stage :P
I was pleased with how my talk went. Attendance was light, but that's expected on a Friday morning. The thing that made me the happiest is that they had to kick our group out of the room for the next talk, not because I rambled on but because I got everyone in the room talking to each other. It was fantastic.

On to the interesting bit of the trip though. I found the most interest when I was talking about Red Hat's concept of a trusted container registry. Today if you're using the public registry it's comparable to finding a sandwich on a bench at the park. You can look at it, you can tell it has ham and lettuce, but I mean, it's a sandwich you found on a bench. Are you going to eat that?

If you want a nice sandwich you're going to go to a sandwich shop, order a sandwich, and watch someone make it for you. You can then go and sit on the bench if you want.

The idea behind Red Hat's trusted registry is we have a container registry for Red Hat customers. We control all the content in the registry, we know exactly what it is. We know where it came from. We control the sandwich supply chain from start to finish. No mystery meats here!

All the security people I talked to know that containers are currently a bit of a security circus. None of them knew what Red Hat was doing. This is of course a great opportunity for Red Hat to spread the word. Stay tuned for more clever sandwich jokes.

March 04, 2016

What Can Talk To What on the OpenStack Message Broker

If a hypervisor is compromised, the Nova Compute instance running on that node is also compromised. If the compute instance is compromised, then its access to the Message Queue has to be considered tainted as well. What degree of risk does this pose?

I mention the compute node, but really, any service that has access to the broker is a vector for attack. This includes any third party application that listens for, say, Keystone notifications for audit purposes.

At the bottom of this article I have posted an inventory from a recent Tripleo deployment. There are a lot of exchanges and queues, and reading through them is informative.

What we need is a table showing who can read from and who can write to each of these elements.

My first hack at an ACL approach:

  • The default rule should be “read only”.
  • If a service is responsible for creating an exchange or a queue, it should get write access.
  • Beyond that, the owning service should grant explicit write access to specific services for a given queue/exchange.
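As a hypothetical illustration of that policy (the user names and regexes here are examples, not a vetted rule set), each rabbit user would carry configure/write/read regexes with a read-only default stance:

```python
# Generate per-user rabbitmqctl set_permissions commands from a small
# policy table. Entirely illustrative; regexes are examples only.
POLICY = {
    # user: (configure, write, read) regexes
    "keystone":   ("^keystone$", "^keystone$", "^$"),
    "ceilometer": ("^$",         "^$",         "^keystone$"),
}


def permission_commands(vhost="/"):
    cmds = []
    for user in sorted(POLICY):
        configure, write, read = POLICY[user]
        cmds.append("rabbitmqctl set_permissions -p %s %s '%s' '%s' '%s'"
                    % (vhost, user, configure, write, read))
    return cmds
```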

What is the start state?

$ sudo rabbitmqctl list_users
Listing users ...
guest	[administrator]
$ sudo rabbitmqctl list_permissions
Listing permissions in vhost "/" ...
guest	.*	.*	.*

So, by default, all the services connect as the same user, and have full permissions to read and write on everything.

I will state that only the Keystone server should be able to write to the keystone topic, and that, by default, only Ceilometer should be reading from it.

$ sudo rabbitmqctl list_exchanges
Listing exchanges ...
	direct	direct
amq.fanout	fanout
amq.headers	headers
amq.match	headers
amq.rabbitmq.log	topic
amq.rabbitmq.trace	topic
amq.topic	topic
ceilometer	topic
central	topic
cert_fanout	fanout
cinder	topic
cinder-scheduler_fanout	fanout
cinder-volume_fanout	fanout
compute_fanout	fanout
conductor_fanout	fanout
consoleauth_fanout	fanout
dhcp_agent_fanout	fanout
engine_fanout	fanout
glance	topic
heat	topic
heat-engine-listener_fanout	fanout
ironic	topic
keystone	topic
l3_agent_fanout	fanout
magnetodb	topic
magnum	topic
neutron	topic
neutron-vo-QosPolicy-1.0_fanout	fanout
nova	topic
openstack	topic
q-agent-notifier-dvr-update_fanout	fanout
q-agent-notifier-network-update_fanout	fanout
q-agent-notifier-port-delete_fanout	fanout
q-agent-notifier-port-update_fanout	fanout
q-agent-notifier-security_group-update_fanout	fanout
q-agent-notifier-tunnel-delete_fanout	fanout
q-agent-notifier-tunnel-update_fanout	fanout
q-l3-plugin_fanout	fanout
q-plugin_fanout	fanout
q-reports-plugin_fanout	fanout
reply_1cbc785538484554850f69dda902c537	direct
reply_748d4640dbab4284bae19fe086af14e8	direct
reply_ab42e35c548d48b48c9ba0fc3ac93ec7	direct
reply_b37538409ae84436804ccd1b1c0a3bdd	direct
reply_c6bebd23c7e24a5c9a06730b42d317cf	direct
reply_f34034fd84e347e8b6aeedc49f97282d	direct
sahara	topic
sahara-ops_fanout	fanout
scheduler_fanout	fanout
swift	topic
trove	topic
zaqar	topic

Here are the Queues

$ sudo rabbitmqctl list_queues
Listing queues ...
cert	0
cert.overcloud-controller-0.localdomain	0
cert_fanout_c8d9d81c87d84e728cb498a0d434c825	0
cinder-scheduler	0
cinder-scheduler.hostgroup	0
cinder-scheduler_fanout_7969a98120ca4f2097af3ade0ba159ef	0
cinder-volume	0
cinder-volume.hostgroup@tripleo_iscsi	0
cinder-volume_fanout_1520069c024c4c6490fdbb6f336819cc	0
compute	0
compute.overcloud-novacompute-0.localdomain	0
compute_fanout_7dc21bc0422b4d4c9addb151e9e2d8ba	0
conductor	0
conductor.overcloud-controller-0.localdomain	0
conductor_fanout_9f3ff7a1e8b146fc9b5dccb1aa80f119	0
consoleauth	0
consoleauth.overcloud-controller-0.localdomain	0
consoleauth_fanout_4b36518037784e7aad7ce7049b89d089	0
dhcp_agent	0
dhcp_agent.overcloud-controller-0.localdomain	0
dhcp_agent_fanout_8776747599464cc3b80a56b731841fd7	0
engine	0
engine.overcloud-controller-0.localdomain	0
engine_fanout_1030eeeec4644022b5a9f7259f7e0018	0
engine_fanout_2ffb137908934072af6a15d3a6b9e616	0
engine_fanout_bac7897eb7ac43f0a561a0c12c408e26	0
engine_fanout_f439912a1d80484ea38ab784a95fb656	0
heat-engine-listener	0
heat-engine-listener.31d42df9-f64f-451d-b9d6-7ef46229c929	0
heat-engine-listener.8730caa4-4104-4d71-bcc1-08ae17a41420	0
heat-engine-listener.b1dd3b6e-d085-4005-a4c9-a29b6f91c3f6	0
heat-engine-listener.ea60f788-af0c-49be-9325-8cefe60cc53a	0
heat-engine-listener_fanout_3b2879946f754cd9bd4becc6b8448071	0
heat-engine-listener_fanout_725990e3081f4ddc839a1bbf78520873	0
heat-engine-listener_fanout_aa1ec5483825470797e11b73cddaf223	0
heat-engine-listener_fanout_cb251c73b0f64d64ac3e38b529e1de30	0
l3_agent	0
l3_agent.overcloud-controller-0.localdomain	0
l3_agent_fanout_83ec229461dd4bd68d4e0debc7f9a39d	0
metering.sample	0
neutron-vo-QosPolicy-1.0	0
neutron-vo-QosPolicy-1.0.overcloud-controller-0.localdomain	0
neutron-vo-QosPolicy-1.0.overcloud-novacompute-0.localdomain	0
neutron-vo-QosPolicy-1.0_fanout_5f54eaed13cb47da8d80b82223f87e47	0
neutron-vo-QosPolicy-1.0_fanout_f57c66031cb4437ea75a23ec1698b287	0	0
notifications.sample	0
q-agent-notifier-dvr-update	0
q-agent-notifier-dvr-update.overcloud-controller-0.localdomain	0
q-agent-notifier-dvr-update.overcloud-novacompute-0.localdomain	0
q-agent-notifier-dvr-update_fanout_33c92818f86644899082458f893c6157	0
q-agent-notifier-dvr-update_fanout_82d2beb050dc4dde956a86cc6e2e5562	0
q-agent-notifier-network-update	0
q-agent-notifier-network-update.overcloud-controller-0.localdomain	0
q-agent-notifier-network-update.overcloud-novacompute-0.localdomain	0
q-agent-notifier-network-update_fanout_0ef20a72234a45718ece2328d230e2c6	0
q-agent-notifier-network-update_fanout_737fb57587f3453cb14d41b01c5fcdcc	0
q-agent-notifier-port-delete	0
q-agent-notifier-port-delete.overcloud-controller-0.localdomain	0
q-agent-notifier-port-delete.overcloud-novacompute-0.localdomain	0
q-agent-notifier-port-delete_fanout_03a026eb000c4efd89e15dc7834b8fdd	0
q-agent-notifier-port-delete_fanout_acd74597e74041abace267f898a2ce31	0
q-agent-notifier-port-update	0
q-agent-notifier-port-update.overcloud-controller-0.localdomain	0
q-agent-notifier-port-update.overcloud-novacompute-0.localdomain	0
q-agent-notifier-port-update_fanout_28a72273f7234c3b9c4cb4d4f64854c1	0
q-agent-notifier-port-update_fanout_b8ccb7d92aa64bfb9106ecd10c59cfea	0
q-agent-notifier-security_group-update	0
q-agent-notifier-security_group-update.overcloud-controller-0.localdomain	0
q-agent-notifier-security_group-update.overcloud-novacompute-0.localdomain	0
q-agent-notifier-security_group-update_fanout_008a11d67bc54f12bce4a03387a64000	0
q-agent-notifier-security_group-update_fanout_a9f578980b6f4c1ca65629e887bff76e	0
q-agent-notifier-tunnel-delete	0
q-agent-notifier-tunnel-delete.overcloud-controller-0.localdomain	0
q-agent-notifier-tunnel-delete.overcloud-novacompute-0.localdomain	0
q-agent-notifier-tunnel-delete_fanout_1769bab276d44b34a6db34498db522c8	0
q-agent-notifier-tunnel-delete_fanout_cb6f4fd56c8f40b9b2f3b0a6484b70ad	0
q-agent-notifier-tunnel-update	0
q-agent-notifier-tunnel-update.overcloud-controller-0.localdomain	0
q-agent-notifier-tunnel-update.overcloud-novacompute-0.localdomain	0
q-agent-notifier-tunnel-update_fanout_47164fceef534b298b8ea4ee34f9282b	0
q-agent-notifier-tunnel-update_fanout_be1a1e9cc37c4f94921131c3346eed48	0
q-l3-plugin	0
q-l3-plugin.overcloud-controller-0.localdomain	0
q-l3-plugin_fanout_bf639b0aebe6466dba97fb88151ee8b7	0
q-l3-plugin_fanout_eeac107aa8374f87afbddbf6aafcd65c	0
q-plugin	0
q-plugin.overcloud-controller-0.localdomain	0
q-plugin_fanout_38617a666c6c46fd91c6eada520f0303	0
q-reports-plugin	0
q-reports-plugin.overcloud-controller-0.localdomain	0
q-reports-plugin_fanout_4feee95d061f40b2906c22268c79a626	0
q-reports-plugin_fanout_c6123bf05ab24ddaa12adca88b920215	0
reply_1cbc785538484554850f69dda902c537	0
reply_748d4640dbab4284bae19fe086af14e8	0
reply_ab42e35c548d48b48c9ba0fc3ac93ec7	0
reply_b37538409ae84436804ccd1b1c0a3bdd	0
reply_c6bebd23c7e24a5c9a06730b42d317cf	0
reply_f34034fd84e347e8b6aeedc49f97282d	0
sahara-ops	0
sahara-ops.2baf790d-3cfe-42b7-b8bf-49611ecc9639	0
sahara-ops_fanout_91b35b7138284165b4f274f5221d6d89	0
scheduler	0
scheduler.overcloud-controller-0.localdomain	0
scheduler_fanout_0888632b036840849e04edc68d4df200	0

March 02, 2016

Creating an additional host for a Tripleo overcloud

I’ve been successful following the steps to get a Tripleo deployment. I now need to add another server to host the Identity Management and Federation services. Here’s the steps:

The easiest way is to start back at the environment setup and tell instack to create an extra node:

export NODE_MEM=8192
export NODE_COUNT=3

The default creates two nodes: one for the controller, one for compute. By increasing this to 3, instack will provide a third virtual machine and register it with Ironic.

I then ran through the steps to deploy Tripleo using Tripleo-common.

Note that I did not run the all-in-one. I ran each of the commands in turn, made sure that it succeeded, and moved on to the next step. Running with no parameters gives the following output:

      --repo-setup         -- Perform repository setup.
      --delorean-setup     -- Install local delorean build environment.
      --delorean-build     -- Build a delorean package locally
      --undercloud         -- Install the undercloud.
      --overcloud-images   -- Build and load overcloud images.
      --register-nodes     -- Register and configure nodes.
      --introspect-nodes   -- Introspect nodes.
      --overcloud-deploy   -- Deploy an overcloud.
      --overcloud-update   -- Update a deployed overcloud.
      --overcloud-delete   -- Delete the overcloud.
      --use-containers     -- Use a containerized compute node.
      --enable-check       -- Enable checks on update.
      --overcloud-pingtest -- Run a tenant vm, attach and ping floating IP.
      --all, -a            -- Run all of the above commands.

From this list, I ran these commands in this order:

  1. --repo-setup
  2. --undercloud
  3. --overcloud-images
  4. --register-nodes
  5. --introspect-nodes
  6. --overcloud-deploy

If anything goes wrong (usually at the overcloud deploy stage) I’ve used Steve Hardy’s blog post to troubleshoot.

To then provision an operating system on the virtual machine, we can use the undercloud.

 openstack server create  --flavor baremetal --image overcloud-full --key-name default idm

When that finished:

$ openstack server list
| ID                                   | Name                    | Status | Networks   |
| 099b0784-6591-4aba-90ad-d5d93bf78745 | idm                     | ACTIVE | ctlplane=  |
| d4ac0792-e70c-4710-9997-b932a67c500b | overcloud-controller-0  | ACTIVE | ctlplane=  |
| 45b74adf-447f-45c8-b308-c694b6d45862 | overcloud-novacompute-0 | ACTIVE | ctlplane=  |

Log in and do work:

 ssh centos@
The authenticity of host ' (' can't be established.
ECDSA key fingerprint is 28:40:4f:a0:70:94:ef:ed:31:87:d1:37:b7:eb:8b:5d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '' (ECDSA) to the list of known hosts.
[centos@idm ~]$ hostname

February 29, 2016

Let's talk about soft skills at RSA, plus some other things
It's been no secret that I think the lack of soft skills in the security space is one of our biggest problems. While I usually write all about the world's problems and how to fix them here, during RSA I'm going to take a somewhat different approach.

I'm giving a talk on Friday titled Why Won't Anyone Listen to Us?

I'm going to talk about how a security person can talk to a normal person without turning them against us. We're a group that doesn't like talking to anyone, even each other. We need to start talking to people. I'm not saying we should stand around and accept abuse, I am saying the world wants help with security. We're not really in a place to give it because we don't like people. But they need our help, most of them know it even!

We've all had countless interactions where we give someone good, real advice, and they just completely ignore us. It's infuriating sometimes. Part of the problem is we're not talking to people, we're throwing verbal information at them, and they ignore it. They listen to Oprah, if she told them about two factor auth everyone would be using it by the end of the week!

That's just it, they listen to Oprah. They're going to listen to anyone who talks to them in a way they understand. If it's not us, it will be someone else, probably someone we don't want talking about security.

I can't teach you to like people (there are limits to my abilities), but I can help teach you how to talk to them. And of course a talk like this will need to have plenty of fun sprinkled in. How to talk to someone, while very important, can also be an extremely boring topic.

I touched on some of this during my talk.

Red Hat is also putting on a Breakfast on Wednesday morning. I'm going to keynote it (I'll keep it short and sweet for those of you attending, there's nothing worse than a speaker at 8am who talks too much).

A coworker, Richard Morrell, is running a podcast from RSA called Locked Down. Be sure to give it a listen. I may even be able to convince him to let me on his show.

I have no doubt the RSA conference will be a great time. If you're there come find me, Red Hat has a booth, North Expo #N3038. Come say hi, or not if you don't like talking to people ;)

There or not, feel free to start a conversation on Twitter. I'm @joshbressers

February 24, 2016

Keystone on Port 80 For Tripleo

Many services assume that Keystone listens on ports 5000 and 35357. I’d prefer to have Keystone listen on the standard HTTP(s) ports of 80 and 443. We can’t remove the non-standard ports without a good deal of rewriting. But there is nothing preventing us from running Keystone on port 80 or 443 in addition to those ports.

I was trying to get this to work for a Tripleo deployment where I needed to ssh in and port forward through several levels. I didn’t want to have to do this for more ports than absolutely necessary.

I did need to backport one change to make this work with the current Tripleo, but I suspect that, come Milestone 3 of Mitaka, we’ll have it via a rebase of the RDO packages.

In Tripleo, Horizon is run on port 80, and shows up under the /dashboard URI. So, I put Keystone under /keystone (yeah yeah, it should have been /identity. I’ll do that next time.)

UPDATE 1: decreased threads to 1, as oslo-config complains on multiple.
UPDATE 2: changed Location from /keystone/main/ to /keystone/main and /keystone/admin/ to /keystone/admin to match WSGIDaemonProcess

in /etc/httpd/conf.d/11-keystone_wsgi_main.conf

WSGIApplicationGroup %{GLOBAL}
WSGIDaemonProcess keystone_main_11 display-name=keystone-main group=keystone processes=1 threads=1 user=keystone
WSGIProcessGroup keystone_main_11
WSGIScriptAlias /keystone/main "/var/www/cgi-bin/keystone/main"
<Location "/keystone/main">
WSGIProcessGroup keystone_main_11
</Location>

And in /etc/httpd/conf.d/11-keystone_wsgi_admin.conf

WSGIApplicationGroup %{GLOBAL}
WSGIDaemonProcess keystone_admin_11 display-name=keystone-admin group=keystone processes=1 threads=1 user=keystone
WSGIProcessGroup keystone_admin_11
WSGIScriptAlias /keystone/admin "/var/www/cgi-bin/keystone/admin"
<Location "/keystone/admin">
WSGIProcessGroup keystone_admin_11
</Location>

I have an adapted version of the overcloud rc file set up for Keystone V3:

export OS_NO_CACHE=True
export OS_CLOUDNAME=overcloud
#export OS_AUTH_URL=
export OS_AUTH_URL=
export NOVA_VERSION=1.1
export OS_USERNAME=admin
export no_proxy=,
export OS_PASSWORD=`uuidgen -r`
export PYTHONWARNINGS="ignore:Certificate has no, ignore:A true SSLContext object is not available"
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default

To Test:

$ . ./overcloudv3.rc 
[heat-admin@overcloud-controller-0 ~]$ openstack token issue
| Field      | Value                            |
| expires    | 2016-02-24T05:40:10.017354Z      |
| id         | 53c5ba8766034ee39a3918cc51082f2c |
| project_id | 42fddae694cb4bd29c0911b64c95440b |
| user_id    | 627727a981f149e2a9ae50422738e659 |

February 23, 2016

Change direction, increase speed! (or why glibc changes nothing)
The glibc issue has had me thinking. What will we learn from this?

I'm pretty sure the answer is "nothing", which then made me wonder why this is.

The conclusion I came up with is we are basically the aliens from space invaders. Change direction, increase speed! While this can give the appearance of doing something, we are all very busy all the time. It's not super useful when you really think about it. Look at Shellshock, Heartbleed, GHOST, LOGJAM, Venom, pick an issue with a fancy name. After the flurry of news stories and interviews, did anything change, or did everyone just go back to business as usual? Business as usual pretty much.

Dan Kaminsky explains glibc nicely and has some advice. But let's look at this honestly. Is anything going to change? No. Dan found a serious DNS issue back in the day. Did we fix DNS or did we bandage it up as best as we could? We bandaged it. What Dan found was without a doubt as bad or worse than this glibc issue, and nothing changed.

I've said this before, I'll say it again. I'm going to say it at RSA next week. We don't know how to fix this. We think we do, you're thinking about it right now, about how we can fix everything! We just have to do that one ... Except you can't. We don't really know what's wrong. Security bugs aren't the problem, they are the result of the problem. We can't fix all the security bugs. I'd be surprised if we've even fixed 10% of security bugs that exist. Even mitigation technologies aren't going to get us there (they are better than constantly fixing bugs, but that's a story for another day).

It's like being obsessed about your tire pressure when there is a hole in the tire. If you only worry about one detail, the tunnel vision makes you miss what's actually going on. Our tires are losing air faster than we can fix them, so we're looking for a bigger pump instead of new tires.

We say things all the time about not using C anymore, or training developers, or writing better documentation. There's nothing wrong with these ideas exactly, but the fact is they've all been tried more than once and none of them work. Even if we started the largest developer education program ever and made every developer sit through a week of training, I bet it would be optimistic to expect our bug rate to decrease by 5%. Think about that for a while.

We first have to understand our problem. We have lots of solutions, solutions to problems that don't really exist. Solutions without problems tend to turn into new problems. We need to understand our security problem. It's probably more like hundreds or thousands of problems. Every group, every company, every person has different problems. We understand none of them.

We start by listening. We're not going to fix any of this with code. We need to see what's happening, some big picture, some in the weeds. Today we show up, yell at people (if they're lucky), then we leave. We don't know what's really happening. We don't tell anyone what they need to know. We don't even know what they need to know. The people we're not talking to know what the problems are though. They don't know they know, we just have to give them time to explain it to us.

If you're at RSA next week, come talk to me. If not, hit me up on twitter @joshbressers
Thinking about glibc and Heartbleed, how to fix things
After my last blog post Change direction, increase speed! (or why glibc changes nothing) it really got me thinking about how can we start to fix some of this. The sad conclusion is that nothing can be fixed in the short term. Rather than trying to make up some nonsense about how to fix this, I want to explain what's happening and why this can't be fixed anytime soon.

Let's look at Heartbleed first.

Heartbleed was a rather foul flaw found in OpenSSL. Afterwards, the Linux Foundation collected a lot of money to help work on core infrastructure projects. If we look at the state of things, it basically hasn't changed outside of money moving around. OpenSSL cannot be fixed for a number of reasons.

  1. Old codebase
  2. Backward compatibility
  3. Difficult API
  4. It is "general purpose"
The reality is that the only way to get what could be considered a safe library would be to throw everything out and start over with some very specific ideas in mind. Things like sun-setting algorithms didn't exist when OpenSSL was created. There is no way you're going to get even a small number of projects to move from using OpenSSL to some new "better" library. It would have to be so much better they couldn't ignore it. As anyone who has ever written software knows, you don't build a library like that overnight. I think 5 years would be a conservative estimate for double digit adoption rates.

While I'm picking on OpenSSL here, the story is the same in virtually every library and application that exists. OpenSSL isn't special, it just gets a lot of attention.

Let's think about glibc.

Glibc is the C library used by most Linux distributions. If the Kernel is the heart, glibc is the soul. Nothing can even exist without this library. Glibc even strives to be POSIX compliant, for good reason. POSIX has given us years of compatibility and cooperation.

Glibc probably has plenty more horrible bugs hiding in the code. It's wicked complex and really large. If you ever need some nightmare fuel, look at the glibc source code. Everything we do in C relies on a libc being around, glibc doesn't have that luxury.

Replacing libc is probably out of the question, it's just not practical. So let's think about something like golang. What if everything was written using golang? It's not totally insane, there are substantial benefits. It's not as fast as C though, that will be the argument most use. Golang will probably never beat C, the things that make it safer also make it slower. But now if we think about replacing UNIX utilities with golang, why would we want to do that? Why not throw out all the mistakes UNIX made and do something else?

Now we're back to the legacy and compatibility arguments. Linux has more than twenty years of effort put into it. You can't just replace that. Even if you had the best team in the world I bet 10 years would be wishful thinking for having half the existing features.

So what does this mean? It means we don't know where to start yet. We are trying to solve our problems using what we know and the tools that exist. We have to solve this using new tools and new thinking. The way we fix security is by doing something nobody has ever thought of before. In 1920 the concept of the Internet didn't exist, people couldn't imagine how to even solve some of the problems we can easily solve using it. Don't try to solve our problems with what you know. Solve the problems using new ideas, find out what you don't know, that's where the solution lives.

February 19, 2016

glibc for humans
Unless you've been living under a rock, you've heard about the latest glibc issue.
CVE-2015-7547 - glibc stack-based buffer overflow in getaddrinfo()

It's always hard to understand some of these issues, so I'm going to do my best to explain it using simple language. Making security easy to understand is something I've been talking about for a long time now, it's time to do something about it.

What is it?
The fundamental problem here is that glibc has a bug that could allow a DNS response from an attacker to run the command of that attacker's choosing on your system. The final goal of course would be to become the root user.

The problem is that this glibc function is used by almost everything that talks to the network. In today's hyperconnected world, this means basically everything is vulnerable to this bug because almost everything can connect to the network. As of this writing we have not seen this attack being used on the Internet. Just because there are no known attacks is no reason to relax though, constant vigilance is key for issues like this.

Am I vulnerable?
If you run Linux (most distributions use glibc), and you haven't installed an update from your vendor, yes, you are vulnerable.
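As a rough illustration of the version arithmetic involved, here is a sketch. The bounds below follow the upstream advisory (the bug was introduced in glibc 2.9 and fixed in 2.23), but distributions backport fixes to older version numbers, so only your vendor's update information is authoritative:

```python
# Sketch: upstream affected range for CVE-2015-7547. Distro packages
# backport fixes, so this comparison is illustrative only.

def parse_version(v):
    """Turn '2.17' into a comparable tuple like (2, 17)."""
    return tuple(int(part) for part in v.split("."))

def maybe_vulnerable(glibc_version, introduced="2.9", fixed="2.23"):
    """True if an unpatched upstream version falls in the affected range."""
    v = parse_version(glibc_version)
    return parse_version(introduced) <= v < parse_version(fixed)

assert maybe_vulnerable("2.17")       # falls in the upstream affected range
assert not maybe_vulnerable("2.23")   # the upstream fix release
```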

Are there workarounds?
No, there is no way to stop this issue. You have to install an update to glibc. Even the stack protector technology that is built into gcc and glibc will not stop this bug. While it is a stack overflow bug, the stack protector checks do not run before the exploit would gain control.

What about containers, VMs, or other confinement technology?
It is possible that a container, VM, or other technology such as SELinux could limit the possible damage from this bug. However, it affects so many binaries on the system that it should be expected that an attacker able to gain access to one application could continue to exploit this bug to eventually become root and take over the entire machine.

Do I only need to be worried if I run a webserver or mailserver?
As stated previously, this bug affects virtually everything that talks to the network. Even if you think your webserver or mailserver are safe, everything from bash to your ssh client will use this library. Updating glibc is the only way to ensure you'll be OK.

What if I run my own DNS server?
This point is currently under investigation. It is thought that it may be possible for a bad DNS request to be able to make it through a DNS server to a vulnerable host. Rather than find out, you should update your glibc.

What about ...
No, just update your glibc :)

Do you have other questions? Ask me on twitter and I'll be sure to update this article if I know the answer.

February 11, 2016

Introduction to Tang and Clevis

In this post I continue the discussion of network-bound decryption and introduce Tang and Clevis, new unlock tools that supersede Deo (which was covered in an earlier post).

Deo is dead. Long live Tang.

Nathaniel McCallum discovered a key encryption protocol based on ElGamal with a desirable security characteristic: no one but the party decrypting the secret can learn the secret. It was reviewed and refined into McCallum-Relyea (MR) exchange. With Deo, the server decrypted (thus learned) the key and sent it back to the client (through an encrypted channel). McCallum-Relyea exchange avoids this. A new protocol based on MR was developed, called Tang.
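A toy model may help show why the server never learns the key. This sketch uses a multiplicative group modulo a small prime; the real protocol is defined over elliptic curves with proper key derivation, and the variable names are my own:

```python
# Toy model of a McCallum-Relyea style exchange. Illustration only:
# real deployments use elliptic-curve groups, not this tiny prime.
import secrets

p = 2**127 - 1   # a Mersenne prime, small by crypto standards
g = 5

# Server: long-lived keypair; only S is published.
s = secrets.randbelow(p - 2) + 1
S = pow(g, s, p)

# Provisioning (client): derive the key K, keep only C.
c = secrets.randbelow(p - 2) + 1
K = pow(S, c, p)            # K = g^(sc); used to encrypt the secret
C = pow(g, c, p)            # stored on disk; c and K are discarded

# Recovery (client): blind C with a fresh ephemeral e, so the server
# sees only a random-looking group element and never learns K.
e = secrets.randbelow(p - 2) + 1
X = (C * pow(g, e, p)) % p  # sent to the server
Y = pow(X, s, p)            # server's reply: X^s = K * S^e
K_recovered = (Y * pow(pow(S, e, p), -1, p)) % p  # unblind: divide by S^e

assert K_recovered == K
```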

Another perceived drawback of Deo was its use of X.509 certificates for TLS and for encryption, making it complex to deploy. The Tang protocol is simpler and avoids X.509.

I will avoid going into the details of the cryptography or the protocol in this post, but will include links at the end.


Using Tang to bind data to a network is great, but there are many other things we might want to bind our data to, such as passwords, TPM, biometrics, Bluetooth LE beacons, et cetera. It would also be nice to define policies – possibly nested – about how many of what data binders must succeed in order to decrypt or "unlock" a secret. The point here is that unlock policy should be driven by business and/or user needs, not by technology. The technology must enable but not constrain the policy.

Enter Clevis, the pluggable client-side unlock framework. Plugins, which are called pins, implement different kinds of bindings. Clevis comes with a handful of pins including pwd (password) and https (PUT and GET the secret; a kind of escrow). The tang pin is provided by Tang to avoid circular dependencies.

The sss pin provides a way to "nest" pins, and also provides k of n threshold unlocking. "SSS" stands for Shamir’s Secret Sharing, the algorithm that makes this possible.
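As a rough illustration of the k-of-n idea (not Clevis's actual implementation), Shamir's scheme hides the secret as the constant term of a random polynomial, so any k points recover it and fewer reveal nothing:

```python
# Minimal sketch of Shamir's Secret Sharing over a prime field.
# Illustrative only: real implementations handle encoding, padding,
# and use a cryptographic RNG throughout.
import random

P = 2**127 - 1  # prime field, larger than any secret shared here

def make_shares(secret, k, n):
    """Evaluate a random degree k-1 polynomial with f(0) = secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 over any k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(123456789, k=3, n=5)
assert recover(shares[:3]) == 123456789   # any 3 of the 5 shares suffice
assert recover(shares[2:]) == 123456789
```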

LUKS volume decryption, which was implemented in Deo, has not yet been implemented in Clevis, but it is a high priority.

By the way, if you were wondering about the terminology, a clevis, clevis pin and tang together form a kind of shackle.


TLS private key decryption

Let’s revisit the TLS private key decryption use case from my earlier Deo post, and update the solution to use Clevis and Tang.

Recall the encryption command, which required the user to input the TLS private key’s passphrase, then encrypted it with Deo, storing it at a location determined by convention:

# (stty -echo; read LINE; echo -n "$LINE") \
  | deo encrypt -a /etc/ipa/ca.pem deo.ipa.local \
  > /etc/httpd/deo.d/f22-4.ipa.local:443

We will continue to use the same file storage convention. Clevis, unlike Deo, does not receive a secret to be encrypted but instead generates one and tells us what it is. Let’s run clevis provision with the tang pin and see what it gives us:

# clevis provision -P '{"type": "tang", "host": "f23-1.ipa.local"}' \
  -O /etc/httpd/tang.d/f22-4.ipa.local:443

The server advertised the following signing keys:


Do you wish to trust the keys? [yn] y

Breaking down the command, the -P argument is a JSON tang pin configuration object, specifying the Tang server’s host name. The argument to -O specifies the output filename.

The program prints the signing key(s) and asks if we want to trust them. Tang is a trust on first use (TOFU) protocol. Out-of-band validation is possible but not yet implemented (there is a ticket for DNSSEC support).

Having trusted the keys, the program performs the Tang encryption, saves the metadata in the specified output file, and finally prints the secret: 709DAFCBC8ACF879D1AC386798783C7E.

We now need to update the passphrase on the TLS private key with the secret that Clevis generated:

# openssl rsa -aes128 < key.pem > newkey.pem && mv newkey.pem key.pem
Enter pass phrase:
writing RSA key
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:

OpenSSL first asks for the original passphrase to decrypt the private key, then asks (twice) for a new passphrase, which should be the secret Clevis told us.

Now we must change the helper script that unlocks the private key. Recall the definition of the Deo helper:

[ -f "$DEO_FILE" ] && deo decrypt < "$DEO_FILE" && echo && exit
exec /bin/systemd-ask-password "Enter SSL pass phrase for $1 ($2) : "

The Clevis helper is similar:

[ -f "$CLEVIS_FILE" ] && clevis acquire -I "$CLEVIS_FILE" && echo && exit
exec /bin/systemd-ask-password "Enter SSL pass phrase for $1 ($2) : "

The clevis acquire -I "$CLEVIS_FILE" invocation is the only substantive change. Now we can finally systemctl restart httpd and observe that the key is decrypted automatically, without prompting the operator.

What are the possible downsides to this approach? First, due to limitations in Apache’s passphrase acquisition, at present it is possible only to use Clevis pins that do not interact with the user or write to standard output. Second, the secret is no longer controlled by the user doing the provisioning – the TLS private key must be re-encrypted under the new passphrase generated by Clevis, and if the Tang server is unavailable, that is the passphrase that must be entered at the fallback prompt. A lot more work needs to be done to make Clevis a suitable general solution for key decryption in Apache or other network servers, but for this simple case, Clevis and Tang work very well, as long as the Tang server is available.


This has been a very quick and shallow introduction to Clevis and Tang. For a deeper overview and demonstration of Tang server deployment and more advanced Clevis policies, I recommend watching Nathaniel McCallum’s talk from 2016.

Other useful links:

February 10, 2016

OpenStack Keystone Q and A with the Boston University Distributed Systems Class Part 1

Dr. Jonathan Appavoo was kind enough to invite me to be a guest lecturer in his distributed systems class at Boston University. The students provided a list of questions, and I only got a chance to address a handful of them during the class. So, I’ll try to address the rest here.

Page 1 of 1 So far (I’ll update this line as I post the others.)

When do tokens expire? If they don’t expire, isn’t it potentially dangerous since attackers can use old tokens to gain access to privileged information?
Tokens have a set expiry. The default was originally 12 hours. We shortened that to 1 hour a couple of years back, but it turns out that some workloads use a token all the way through, and those workloads last longer than 1 hour. Those deployments either have lengthened the life of the token in the configuration or had users explicitly request tokens that last longer than an hour.
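As a sketch of the expiry arithmetic (the names here are mine, not Keystone's API), this shows why a token issued at the start of a 90 minute workload is dead before the workload finishes under the 1 hour default:

```python
# Sketch: client-side token expiry check. Names are illustrative,
# not Keystone's actual API.
from datetime import datetime, timedelta

DEFAULT_LIFETIME = timedelta(hours=1)

def token_is_valid(issued_at, now, lifetime=DEFAULT_LIFETIME):
    """A token is usable until issued_at + lifetime."""
    return now < issued_at + lifetime

issued = datetime(2016, 2, 24, 4, 0, 0)
assert token_is_valid(issued, issued + timedelta(minutes=59))
# A 90 minute workload outlives a 1 hour token:
assert not token_is_valid(issued, issued + timedelta(minutes=90))
```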

Can users share roles within the same project?
Yes, and this is the norm. A role assignment is a many to many to many association between users (or groups of users), projects, and roles. This means:

  • One user may have multiple roles on the same project
  • One user may have the same role on multiple projects
  • Multiple users may have the same role on a project
  • Any mix of the above
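One way to picture the many-to-many-to-many assignment model (a sketch, not Keystone's actual schema) is a set of (user, project, role) triples:

```python
# Sketch: role assignments as (user, project, role) triples.
# All names are hypothetical; this is not Keystone's data model.
assignments = {
    ("alice", "dev",  "admin"),
    ("alice", "dev",  "member"),   # one user, multiple roles on one project
    ("alice", "prod", "member"),   # same role on multiple projects
    ("bob",   "dev",  "member"),   # multiple users share a role on a project
}

def roles(user, project):
    """All roles a user holds on a project."""
    return {r for (u, p, r) in assignments if (u, p) == (user, project)}

assert roles("alice", "dev") == {"admin", "member"}
assert roles("bob", "dev") == {"member"}
```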

It’s interesting that one of the key components is a standardized GUI (Horizon). Wouldn’t it be more useful for there to be a handful of acceptable GUIs tailored to the service a particular set of OpenStack instances is providing?

This is the standardized GUI as you point out. While many companies have deployed OpenStack with custom GUIs, this one is the one that is designed to be the most generally applicable. Each user gets a service catalog along with their tokens, and the Horizon UI can use this to determine what services the user can see, and customize the UI displayed accordingly. So, from that perspective, the UI is tailored to the user.

The individual project teams are not composed of UI or UX folks. You really don’t want us designing UI. As soon as you realize the tough problem is getting a consistent look, feel, set of rules, and all the things that keep users from running away screaming, you realize that it really is its own effort and project.

That said, I did propose a few years back that the Keystone project should be able to render to HTML, and not just JSON (and XML). This would be the start of a progressive enhancement approach that would also make the Keystone team aware of the gaps in the coverage of the UI: it’s really easy to see when you click through. But, again, this was for test-ability, completeness, and rapid implementation of a UI for new features, not the rich user experience that is the end product. It would still be the source for a follow-on UX effort.

Since that time, the Horizon team has embraced a single-page-app effort based on a proxy (to avoid CORS issues) to all of the endpoints. The proxy converts the URLs in the JSON requests and responses to the proxy, but otherwise lets things pass unchanged. I would love to see HTML rendering as an option on this proxy.
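The proxy's URL translation described above amounts to a recursive rewrite over the JSON body; here is a minimal sketch, with hypothetical host names:

```python
# Sketch: rewrite endpoint URLs inside a JSON document so they point at
# the proxy, leaving everything else unchanged. Host names are made up.
import json

def rewrite(obj, real_base, proxy_base):
    """Recursively replace real_base with proxy_base in all strings."""
    if isinstance(obj, str):
        return obj.replace(real_base, proxy_base)
    if isinstance(obj, list):
        return [rewrite(v, real_base, proxy_base) for v in obj]
    if isinstance(obj, dict):
        return {k: rewrite(v, real_base, proxy_base) for k, v in obj.items()}
    return obj

body = json.loads('{"token": {"catalog": [{"endpoints": '
                  '[{"url": "http://keystone.example:5000/v3"}]}]}}')
out = rewrite(body, "http://keystone.example:5000",
              "http://proxy.example/identity")
assert out["token"]["catalog"][0]["endpoints"][0]["url"] == \
    "http://proxy.example/identity/v3"
```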

Can You elaborate on some examples of OpenStack being used by companies on a large scale?

Here is the official list. If you click through the Case studies, some of them have numbers.

A question about the philosophy behind open source. Do the  problems that arise in distributed systems lend themselves well to an open source approach? Or does Brooks’s Law apply?

Brooks's Law states: “adding manpower to a late software project makes it later.” Open source projects are not immune to Brooks's Law. OpenStack is not driven by any one company, and it has a “release every 6 months” rule that means that if a feature is not going to make a release, it will have another chance six months later. We have not slipped a release since I’ve been on the project, and I don’t think they did before.

The Keystone team is particularly cautious. New features happen, but they are well debated, beat on, and often deferred a release or more.  Getting code into Keystone is a serious undertaking, with lots of back and forth for even small changes, and some big changes have gone through 70+ revisions before getting merged.  I have a page and a half of links to code that I have submitted and later abandoned.

Adding more people to a project under OpenStack (like Keystone) can’t happen without the approval of the developers. I mean, anyone can submit and review code, but to be accepted as a core requires a vote of confidence from the existing cores, and that vote won’t take place if you’ve not already proved yourself. So the worst that could happen is that one company goes commit happy and gets a bunch of people to try to submit patches, and we would ignore them. It hasn’t happened yet.

Does the public source code make authentication a much more difficult project for OpenStack than it is for a close source Identity-as-a-Service?

So, the arguments for why Open Source is good for security are well established, and I won’t repeat them here. To me, there are Open Source projects, and then there is the Open Source development model. The first means that the code is free software, you can contribute, etc. But it means that the project might be run by one person or company. The Open Source development model is more distributed by default. It means that no one person can ram through changes. Even the project technical leaders can’t get away with approving code without at least another core also approving. Getting anything done is difficult. So, from that perspective, yes. But there is a lot of benefit to offset it: we get a wider array of inputs, and we get public discussion of features and bugs. We get contributions from people that are interested in solving their own problems, and, in doing so, solve ours.

Why can we not force or why has there not been more standardization for Identity Providers(IdPs) in a  federation?

Adoption of federated protocols has been happening, slow but steady. SAML has gone from “oh that’s neat” to “we have to have that” in my time on this project. SAML is a pretty good standard, and many of the IdPs are implementing it. There is a little wiggle room in the standard, as my coworker who is working on the client protocol (ECP) can tell you, but the more implementations we see, the more we can iron out the details. So, I think at least for SAML, we do have a good degree of standardization.

The other protocols are also picking up steam, and I think they will play out similarly to SAML. I suspect OpenID Connect will end up just as well standardized in implementation as SAML is starting to be. The process is really iterative, and you don’t know the issues you are going to have to deal with until you find them in a deployment.

What makes OpenStack better than other Cloud Computing services?

Short answer: I don’t know.

Too Long answer with a lot of conjecture:  I think that the relative strength of OpenStack depends on which of the other services you compare it to.

Amazon’s offerings are more mature, so there the OpenStack benefit is being Open Source, and that you can actually implement it on premise, not just have it hosted for you. Control of hardware is still a big deal. I think the Open Source aspect really helped OpenStack compete with vCloud as well.

I think that the open source development model for the Cloud mirrors the success of the Linux Kernel. The majority of the activity on the Kernel is device drivers. In OpenStack, there is a real push by vendors to support their devices. In a proprietary solution, a hardware vendor is dependent on working with that proprietary software vendor to get their device supported. In OpenStack, any device manufacturer that wants to be part of Cinder, Nova, or Neutron can get the code and make it work.

This means that even the big vendors get interested. The software becomes a marketplace of sorts, and if you can’t inter-operate, you miss out on potential sales. Thus, we have Cisco interested in Neutron and VMware interested in Nova where it might have initially appeared against their interests to have that competition.

I think part of its success was due to the choice of the Python programming language. It’s a language that system administrators don’t tend to react to negatively like they do with Java. I pick on Java because it was the language used for Eucalyptus. I personally like working in Java, but I can really see the value in Python for OpenStack. The fact that source code is essentially shipped by default overcomes the Apache license’s potential for closing off code: end users can see what is actually running on their systems. System administrators for Linux systems are likely to already have some familiarity with Python.

I think the micro project approach has also allowed OpenStack to scale.  It lets people interested in identity focus on identity, and block storage people get to focus on block storage.  The result has been an explosion of interest in contributing.

I think OpenStack got lucky with timing: the world realized it needed a cloud management solution when OpenStack got mature enough to start filling that role.

A Holla out to the Kolla devs

Devstack uses Pip to install packages, which conflict with the RPM versions on my Fedora system. Since I still need to get work done, and want to run tests on Keystone running against a live database, I’ve long wondered if I should go with a container based approach. Last week, I took the plunge and started messing around with Docker. I got the MySQL Fedora container to run, then found Lars’ Keystone container using SQLite, and was stumped. I poked around for a way to get the two containers talking to each other, and realized that we had a project dedicated to exactly that in OpenStack: Kolla. While it did not work for me right out of a git-clone, several of the Kolla devs worked with me to get it up and running. Here are my notes, distilled.

I started by reading the quickstart guide, which got me oriented (I suggest you start there, too), but I found a couple of things I needed to learn. First, I needed a patch that has not quite landed, in order to make calls as a local user, instead of as root. I still ended up creating /etc/kolla and chowning it to ayoung. That proved necessary, as the work done in that patch is “necessary but not sufficient.”

I am not super happy about this, but I needed to make docker run without a deliberate sudo. So I added the docker group, added myself to it, and restarted the docker service via systemd. I might end up doing all this as a separate developer user, not as ayoung, so at least I’d need to su - developer before the docker stuff. I may be paranoid, but that does not mean they are not out to get me.

Created a dir named ~/kolla/ and put in there:


kolla_base_distro: "centos"
kolla_install_type: "source"

# This is the interface with an ip address you want to bind mariadb and keystone too
network_interface: "enp0s25"
# Set this to an ip address that currently exists on interface "network_interface"
kolla_internal_address: ""

# Easy way to change debug to True, though not required
openstack_logging_debug: "True"

# For your information, but these default to "yes" and can technically be removed
enable_keystone: "yes"
enable_mariadb: "yes"

# Builtins that are normally yes, but we set to no
enable_glance: "no"
enable_haproxy: "no"
enable_heat: "no"
enable_memcached: "no"
enable_neutron: "no"
enable_nova: "no"
enable_rabbitmq: "no"
enable_horizon: "no"

I also copied the file ./etc/kolla/passwords.yml from the repo into that directory, as it was needed during the deploy.

To build the images, I wanted to work inside the kolla venv (I didn’t want to install pip packages on my system), so I ran:

tox -epy27

Which, along with running the unit tests, created a venv. I activated that venv for the build command:

. .tox/py27/bin/activate
./tools/ --type source keystone mariadb rsyslog kolla-toolbox

Note that I had first built the binary versions using:

./tools/ keystone mariadb rsyslog kolla-toolbox

But then I tried to deploy the source version. The source versions are downloaded from tarballs on whereas the binary versions are the Delorean RPMs, and they trail the source versions by a little bit (not a lot).

I’ve been told “if you tox gen the config you will get a kolla-build.conf config. You can change that to git instead of url and point it to a repo.” But I have not tried that yet.

I had to downgrade to the pre 2.0 version of Ansible, as I was playing around with 2.0’s support for Keystone V3 API. Kolla needs 1.9

dnf downgrade ansible

There is an SELinux issue. I worked around it for now by setting SELinux to permissive mode, but we’ll revisit that shortly. It was only needed during the deploy; once the containers were running, I was able to switch back to enforcing mode.
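The workaround itself is just the standard SELinux mode toggle; a sketch (requires root on an SELinux-enabled host, and the deploy command in the middle is a placeholder for the one shown below):

```shell
# Switch SELinux to permissive mode for the duration of the deploy.
sudo setenforce 0

# ... run the kolla-ansible deploy here ...

# Restore enforcing mode once the containers are up and running.
sudo setenforce 1
getenforce
```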

./tools/kolla-ansible --configdir /home/ayoung/kolla   deploy

Once that ran, I wanted to test Keystone. I needed a keystone RC file. To get it:

./tools/kolla-ansible post-deploy

It put it in /etc/kolla/.

. /etc/kolla/ 
[ayoung@ayoung541 kolla]$ openstack token issue
+------------+----------------------------------+
| Field      | Value                            |
+------------+----------------------------------+
| expires    | 2016-02-08T05:51:39.447112Z      |
| id         | 4a4610849e7d45fdbd710613ff0b3138 |
| project_id | fdd0b0dcf45e46398b3f9b22d2ec1ab7 |
| user_id    | 47ba89e103564db399ffe83d8351d5b8 |
+------------+----------------------------------+


I have to admit that I removed the warning:

usr/lib/python2.7/site-packages/keyring/backends/ PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import GnomeKeyring

Huge thanks to SamYaple and inc0 (Michal Jastrzebski) for their help in getting me over the learning hump.

I think Kolla is fantastic. It will be central to my development for Keystone moving forward.

February 08, 2016
I spent last week at Devconf in the Czech Republic. I didn't have time to write anything new and compelling, but I did give a talk about why everything seems to be on fire.

I explore what's going on right now, why things look like they're on fire, and how we can start to fix this. Our problem isn't technology, it's the people. We're good at technology problems; we're bad at people problems.

Give the talk a listen. Let me know what you think, I hope to peddle this message as far and wide as possible.

Join the conversation, hit me up on twitter, I'm @joshbressers

Dealing with Duplicate SSL certs from FreeIPA

I reinstalled My browser started complaining when I tried to visit it: the serial number of the TLS certificate is a duplicate. If I am seeing this, anyone else that looked at the site in the past is going to see it too, so I don’t want to just hack my browser setup to ignore it. Here’s how I fixed it:

FreeIPA uses Certmonger to request and monitor certificates. The Certmonger daemon runs on the server that owns the certificate, performs the tricky request format generation, and then waits for an answer. So, in order to update the IPA server, I am going to tell Certmonger to request a renewal of the HTTPS TLS certificate.

The tool to talk to certmonger is called getcert. First, find the certificate. We know it is going to be stored in the Apache HTTPD config directory:

sudo getcert list
Number of certificates and requests being tracked: 8.
Request ID '20160201142947':
	stuck: no
	key pair storage: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='auditSigningCert cert-pki-ca',token='NSS Certificate DB',pin set
	certificate: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='auditSigningCert cert-pki-ca',token='NSS Certificate DB'
	CA: dogtag-ipa-ca-renew-agent
	issuer: CN=Certificate Authority,O=YOUNGLOGIC.NET
	subject: CN=CA Audit,O=YOUNGLOGIC.NET
	expires: 2018-01-21 14:29:08 UTC
	key usage: digitalSignature,nonRepudiation
	pre-save command: /usr/lib64/ipa/certmonger/stop_pkicad
	post-save command: /usr/lib64/ipa/certmonger/renew_ca_cert "auditSigningCert cert-pki-ca"
	track: yes
	auto-renew: yes
Request ID '20160201143116':
	stuck: no
	key pair storage: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB',pinfile='/etc/httpd/alias/pwdfile.txt'
	certificate: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB'
	issuer: CN=Certificate Authority,O=YOUNGLOGIC.NET
	expires: 2018-02-01 14:31:15 UTC
	key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
	eku: id-kp-serverAuth,id-kp-clientAuth
	pre-save command: 
	post-save command: /usr/lib64/ipa/certmonger/restart_httpd
	track: yes
	auto-renew: yes

There are many in there, but the one we care about is the last one, with the Request ID of 20160201143116. It is in the NSS database stored in /etc/httpd/alias. To request a new certificate, use the command:

sudo ipa-getcert resubmit -i 20160201143116

While this is an IPA-specific command, it is essentially telling certmonger to renew the certificate. After running it, I can look at the list of certificates again and see that the “expires” value has been updated:

Request ID '20160201143116':
	stuck: no
	key pair storage: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB',pinfile='/etc/httpd/alias/pwdfile.txt'
	certificate: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB'
	issuer: CN=Certificate Authority,O=YOUNGLOGIC.NET
	expires: 2018-02-07 02:29:42 UTC
	principal name: HTTP/
	key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
	eku: id-kp-serverAuth,id-kp-clientAuth
	pre-save command: 
	post-save command: /usr/lib64/ipa/certmonger/restart_httpd
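The full tracking list is long; certmonger's getcert can also show just one request by its ID. I did not use this above, but I believe the -i flag filters the listing:

```shell
# Show only the HTTPD server certificate's tracking request,
# rather than all eight tracked requests.
sudo getcert list -i 20160201143116
```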

Now when I refresh my browser window, Firefox no longer complains about the repeated serial number. Now it complains that “the site administrator has incorrectly configured the Security for this site” because I am using a CA cert that it does not know about. But now I can move on and re-install the CA cert.
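As an aside, the serial number the browser is objecting to is easy to confirm from the command line; a sketch, assuming the server's certificate has been saved in PEM form to server.pem (the filename is mine, and a live server's certificate can be captured first with openssl s_client):

```shell
# Print the serial and issuer of a saved PEM certificate; two
# certificates from the same issuer sharing one serial number is
# exactly what triggers the browser's duplicate-serial error.
openssl x509 -noout -serial -issuer -in server.pem
```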