February 10, 2016

A Holla out to the Kolla devs

Devstack uses pip to install packages, which conflict with the RPM versions on my Fedora system. Since I still need to get work done, and want to run tests on Keystone against a live database, I’ve long wondered if I should go with a container-based approach. Last week, I took the plunge and started messing around with Docker. I got the MySQL Fedora container to run, then found Lars’ Keystone container using SQLite, and was stumped. I poked around for a way to get the two containers talking to each other, and realized that we had a project dedicated to exactly that in OpenStack: Kolla. While it did not work for me right out of a git clone, several of the Kolla devs worked with me to get it up and running. Here are my notes, distilled.

I started by reading the quickstart guide, which got me oriented (I suggest you start there, too), but I found a couple of things I needed to learn. First, I needed a patch that has not quite landed, in order to make calls as a local user instead of as root. I still ended up creating /etc/kolla and chowning it to ayoung. That proved necessary, as the work done in that patch is “necessary but not sufficient.”

I am not super happy about this, but I needed to make docker run without a deliberate sudo. So I added the docker group, added myself to it, and restarted the docker service via systemd. I might end up doing all this as a separate developer user, not as ayoung, so that I at least have to su - developer before doing the docker stuff. I may be paranoid, but that does not mean they are not out to get me.
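For reference, the group setup amounted to something like this (a sketch; on Fedora the docker group may already exist):

sudo groupadd docker
sudo usermod -aG docker ayoung
sudo systemctl restart docker
newgrp docker    # or log out and back in for the new group to take effect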

I created a directory named ~/kolla/ and put this file in it:

~/kolla/globals.yml

kolla_base_distro: "centos"
kolla_install_type: "source"

# This is the interface with an ip address you want to bind mariadb and keystone to
network_interface: "enp0s25"
# Set this to an ip address that currently exists on interface "network_interface"
kolla_internal_address: "10.0.0.13"

# Easy way to change debug to True, though not required
openstack_logging_debug: "True"

# For your information, but these default to "yes" and can technically be removed
enable_keystone: "yes"
enable_mariadb: "yes"

# Builtins that are normally yes, but we set to no
enable_glance: "no"
enable_haproxy: "no"
enable_heat: "no"
enable_memcached: "no"
enable_neutron: "no"
enable_nova: "no"
enable_rabbitmq: "no"
enable_horizon: "no"

I also copied the file ./etc/kolla/passwords.yml from the repo into that directory, as it was needed during the deploy.
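From the top of the kolla checkout, that is just (a sketch; adjust the path to your clone):

cp ./etc/kolla/passwords.yml ~/kolla/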

To build the images, I wanted to work inside the kolla venv (I didn’t want to install pip packages on my system), so I ran:

tox -epy27

This, along with running the unit tests, created a venv. I activated that venv for the build command:

. .tox/py27/bin/activate
./tools/build.py --type source keystone mariadb rsyslog kolla-toolbox

Note that I had first built the binary versions using:

./tools/build.py keystone mariadb rsyslog kolla-toolbox

But then I tried to deploy the source version. The source versions are downloaded from tarballs on http://tarballs.openstack.org/ whereas the binary versions are the Delorean RPMs, and they trail the source versions by a little bit (not a lot).

I’ve been told “if you tox gen the config you will get a kolla-build.conf config. You can change that to git instead of url and point it to a repo.” But I have not tried that yet.

I had to downgrade to the pre-2.0 version of Ansible, as I had been playing around with 2.0’s support for the Keystone V3 API. Kolla needs 1.9:

dnf downgrade ansible

There is an SELinux issue. I worked around it for now by setting SELinux to permissive mode; it was only needed for the deploy, and once the containers were running I was able to switch back to enforcing mode. We will deal with the issue properly later.
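In shell terms (standard SELinux commands, run as root):

setenforce 0    # permissive, for the deploy only
# ... run the deploy ...
setenforce 1    # back to enforcing
getenforce      # confirm the current mode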

./tools/kolla-ansible --configdir /home/ayoung/kolla   deploy

Once that ran, I wanted to test Keystone. I needed a keystone RC file. To get it:

./tools/kolla-ansible post-deploy

That put admin-openrc.sh in /etc/kolla/:

. /etc/kolla/admin-openrc.sh 
[ayoung@ayoung541 kolla]$ openstack token issue
+------------+----------------------------------+
| Field      | Value                            |
+------------+----------------------------------+
| expires    | 2016-02-08T05:51:39.447112Z      |
| id         | 4a4610849e7d45fdbd710613ff0b3138 |
| project_id | fdd0b0dcf45e46398b3f9b22d2ec1ab7 |
| user_id    | 47ba89e103564db399ffe83d8351d5b8 |
+------------+----------------------------------+

Success

I have to admit that I removed the warning:

/usr/lib/python2.7/site-packages/keyring/backends/Gnome.py:6: PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import GnomeKeyring

Huge thanks to SamYaple and inc0 (Michal Jastrzebski) for their help in getting me over the learning hump.

I think Kolla is fantastic. It will be central to my development for Keystone moving forward.

February 08, 2016

Devconf.cz
I spent last week at Devconf in the Czech Republic. I didn't have time to write anything new and compelling, but I did give a talk about why everything seems to be on fire.

https://www.youtube.com/watch?v=zmDm7J7V7aw

I explore what's going on right now, why things look like they're on fire, and how we can start to fix this. Our problem isn't technology, it's people. We're good at technology problems; we're bad at people problems.

Give the talk a listen. Let me know what you think, I hope to peddle this message as far and wide as possible.

Join the conversation, hit me up on twitter, I'm @joshbressers
Dealing with Duplicate SSL certs from FreeIPA

I reinstalled https://ipa.younglogic.net. My browser started complaining when I tried to visit it: the serial number of the TLS certificate is a duplicate. If I am seeing this, anyone else who looked at the site in the past is going to see it, too, so I don’t want to just hack my browser setup to ignore it. Here’s how I fixed it:

FreeIPA uses Certmonger to request and monitor certificates. The Certmonger daemon runs on the server that owns the certificate, performs the tricky request format generation, and then waits for an answer. So, in order to update the IPA server, I am going to tell Certmonger to request a renewal of the HTTPS TLS certificate.

The tool to talk to Certmonger is called getcert. First, find the certificate. We know it is going to be stored in the Apache HTTPD config directory:

sudo getcert list
Number of certificates and requests being tracked: 8.
Request ID '20160201142947':
	status: MONITORING
	stuck: no
	key pair storage: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='auditSigningCert cert-pki-ca',token='NSS Certificate DB',pin set
	certificate: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='auditSigningCert cert-pki-ca',token='NSS Certificate DB'
	CA: dogtag-ipa-ca-renew-agent
	issuer: CN=Certificate Authority,O=YOUNGLOGIC.NET
	subject: CN=CA Audit,O=YOUNGLOGIC.NET
	expires: 2018-01-21 14:29:08 UTC
	key usage: digitalSignature,nonRepudiation
	pre-save command: /usr/lib64/ipa/certmonger/stop_pkicad
	post-save command: /usr/lib64/ipa/certmonger/renew_ca_cert "auditSigningCert cert-pki-ca"
	track: yes
	auto-renew: yes
...
Request ID '20160201143116':
	status: MONITORING
	stuck: no
	key pair storage: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB',pinfile='/etc/httpd/alias/pwdfile.txt'
	certificate: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB'
	CA: IPA
	issuer: CN=Certificate Authority,O=YOUNGLOGIC.NET
	subject: CN=ipa.younglogic.net,O=YOUNGLOGIC.NET
	expires: 2018-02-01 14:31:15 UTC
	key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
	eku: id-kp-serverAuth,id-kp-clientAuth
	pre-save command: 
	post-save command: /usr/lib64/ipa/certmonger/restart_httpd
	track: yes
	auto-renew: yes

There are many in there, but the one we care about is the last one, with the Request ID of 20160201143116. It is in the NSS database stored in /etc/httpd/alias. To request a new certificate, use the command:

sudo ipa-getcert resubmit -i 20160201143116

While this is an IPA-specific command, it is essentially telling Certmonger to renew the certificate. After we run it, I can look at the certificate entry again and see that the “expires” value has been updated:
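To show just this one tracking request rather than the whole list, getcert can filter by Request ID with the -i flag:

sudo getcert list -i 20160201143116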

Request ID '20160201143116':
	status: MONITORING
	stuck: no
	key pair storage: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB',pinfile='/etc/httpd/alias/pwdfile.txt'
	certificate: type=NSSDB,location='/etc/httpd/alias',nickname='Server-Cert',token='NSS Certificate DB'
	CA: IPA
	issuer: CN=Certificate Authority,O=YOUNGLOGIC.NET
	subject: CN=ipa.younglogic.net,O=YOUNGLOGIC.NET
	expires: 2018-02-07 02:29:42 UTC
	principal name: HTTP/ipa.younglogic.net@YOUNGLOGIC.NET
	key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
	eku: id-kp-serverAuth,id-kp-clientAuth
	pre-save command: 
	post-save command: /usr/lib64/ipa/certmonger/restart_httpd

Now when I refresh my browser window, Firefox no longer complains about the repeated serial number. Instead it complains that “the site administrator has incorrectly configured the Security for this site” because I am using a CA cert that it does not know about. But now I can move on and re-install the CA cert.

February 06, 2016

Keystone Implied roles with CURL

Keystone now has Implied Roles. What does this mean? Let’s say we define the role Admin to imply the Member role. Now, if you assign someone Admin on a project, they are automatically assigned the Member role on that project implicitly.

Let’s test it out:

Since we don’t yet have client or CLI support, we’ll have to make do with curl and jq for now.

This uses the same approach as the Keystone V3 Examples.

#!/bin/sh
. ~/adminrc

# Get a scoped token; the X-Subject-Token response header carries the token ID.
export TOKEN=`curl -si -d @token-request.json -H "Content-type: application/json" $OS_AUTH_URL/auth/tokens | awk '/X-Subject-Token/ {print $2}'`

# Look up the IDs of the admin and _member_ roles.
export ADMIN_ID=`curl -H"X-Auth-Token:$TOKEN" -H "Content-type: application/json" $OS_AUTH_URL/roles?name=admin | jq --raw-output '.roles[] | {id}[]'`

export MEMBER_ID=`curl -H"X-Auth-Token:$TOKEN" -H "Content-type: application/json" $OS_AUTH_URL/roles?name=_member_ | jq --raw-output '.roles[] | {id}[]'`

# Create the inference rule: admin implies _member_.
curl -X PUT -H"X-Auth-Token:$TOKEN" -H "Content-type: application/json" $OS_AUTH_URL/roles/$ADMIN_ID/implies/$MEMBER_ID

# List all role inference rules to confirm the new rule.
curl -H"X-Auth-Token:$TOKEN" -H "Content-type: application/json" $OS_AUTH_URL/role_inferences

Now, create a new user and assign them only the admin role.

openstack user create Phred
openstack user show Phred
+-----------+----------------------------------+
| Field     | Value                            |
+-----------+----------------------------------+
| domain_id | default                          |
| enabled   | True                             |
| id        | 117c6f0055a446b19f869313e4cbfb5f |
| name      | Phred                            |
+-----------+----------------------------------+
$ openstack  user set --password-prompt Phred
User Password:
Repeat User Password:
$ openstack project list
+----------------------------------+-------+
| ID                               | Name  |
+----------------------------------+-------+
| fdd0b0dcf45e46398b3f9b22d2ec1ab7 | admin |
+----------------------------------+-------+
openstack role add --user 117c6f0055a446b19f869313e4cbfb5f --project fdd0b0dcf45e46398b3f9b22d2ec1ab7 e3b08f3ac45a49b4af77dcabcd640a66

Copy token-request.json and modify the values for the new user.
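For reference, token-request-phred.json is a standard Keystone V3 password-auth payload, roughly like this (the password below is a placeholder for your own value):

{
  "auth": {
    "identity": {
      "methods": ["password"],
      "password": {
        "user": {
          "name": "Phred",
          "domain": { "id": "default" },
          "password": "changeme"
        }
      }
    },
    "scope": {
      "project": { "id": "fdd0b0dcf45e46398b3f9b22d2ec1ab7" }
    }
  }
}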

 curl  -d @token-request-phred.json -H "Content-type: application/json" $OS_AUTH_URL/auth/tokens | jq '.token | {roles}'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1643  100  1098  100   545  14742   7317 --:--:-- --:--:-- --:--:-- 14837
{
  "roles": [
    {
      "id": "9fe2ff9ee4384b1894a90878d3e92bab",
      "name": "_member_"
    },
    {
      "id": "e3b08f3ac45a49b4af77dcabcd640a66",
      "name": "admin"
    }
  ]
}

January 31, 2016

Does the market care about security?
I had some discussions this week about security and the market. When I say "the market" I mean what sort of products people will or won't buy based on requirements centered around security. This usually ends up at a discussion about regulation. That got me wondering: are there any industries that are unregulated, have high safety requirements, and aren't completely unsafe?

After a little research, it seems SCUBA is the industry I was looking for. If you read the linked article (which you should, it's great) the SCUBA story is an important lesson for the security industry. Our industry moves fast, too fast to regulate. Regulation would either hurt innovation or be useless due to too much change. Either way it would be very expensive. SCUBA is a place where the lack of regulation has allowed for dramatic innovation over the past 50 years. The article compares the personal aircraft industry which has substantial regulation and very little innovation (but the experimental aircraft industry is innovating due to lax regulation).

I don't think all regulation is bad, it certainly has its place, but in a fast moving industry it can bring innovation to a halt. And in the context of security, what could you even regulate that would actually matter? Given the knowledge gaps we have today any regulation would just end up being a box ticking exercise.

Market forces are what have kept SCUBA safe; divers and dive shops won't use or stock bad gear. Security today has no such bar: there are lots of products that would fall under the "unsafe" category that are stocked and sold by many. Can this market driven approach work for our security industry?

It's of course not that simple for security. Security isn't exactly an industry in itself. There are security products, then there are other products. If you're writing a web app security probably takes a back seat to features. Buyers don't usually ask about security, they ask about features. People buying SCUBA gear don't ask about safety, they just assume it's OK. When you run computer software today you either know it's insecure, or you're oblivious to what's going on. There's not really a happy middle.

Even if we had an industry body everyone joined, it wouldn't make a huge difference today. There is no software that exists without security problems. It's a wide spectrum of course, there are examples that are terrible and examples that do everything right. Today both groups are rewarded equally because security isn't taken into account in many instances. Even if you do everything right, you will still have security flaws in your software.

Getting the market to drive security is going to be tricky, security isn't a product, it's part of everything. I don't think it's impossible, just really hard. SCUBA has the advantage of a known and expected use case. Imagine if that gear was expected to work underwater, in space, in a fire, in the arctic, and you have to be able to eat pizza while wearing it? Nobody would even try to build something like that. The flexibility of software is also its curse.

In the early days of SCUBA there were a lot of accidents. By moving faster than the regulators could, the industry not only made the sport extremely safe, but probably saved what we know as SCUBA today. If it had been heavily regulated I suspect much of the technology wouldn't look all that different from what was used 30+ years ago. Software regulation would probably keep things looking a lot like they do today, just with a lot of voodoo to tick boxes.

Our great challenge is how to apply this lesson from SCUBA to security. Is there a way we can start creating real positive change through market driven innovation, and avoid the regulation quagmire?

Join the conversation, hit me up on twitter, I'm @joshbressers

January 28, 2016

Remote group merging for Fedora

The Problem

One of the major features of the Fedora Server Edition is the Cockpit administrative console. This web-based interface provides administrators with a powerful set of tools for controlling their system. Cockpit relies upon low-level tools like polkit and sudo to make authorization decisions to determine what a user is permitted to do. By default, most operations on a Fedora system are granted to users in the ‘wheel’ group. People granted administrator access to Cockpit (and other tools through shell access) are generally added to the wheel group in the /etc/group file.

This works reasonably well for single-user systems or very small environments where manual edits to /etc/group are maintainable, but in larger deployments, it becomes very unwieldy to manage lots of entries in /etc/group. In these cases, most environments switch over to using some form of a domain controller (such as FreeIPA, Microsoft Active Directory or a custom LDAP setup). These domain controllers allow users to be managed centrally, allowing administrators to make changes in a single place and have this be automatically picked up by all enrolled systems.

However, there is a problem: historically, the group processing on Fedora (provided by glibc) has forced users to choose between using centrally managed groups (such as those provided by a domain and maintained by SSSD) or groups maintained on the local system in the /etc/group file. The behavior of glibc is specified in /etc/nsswitch.conf, which decides which of the two mechanisms will “win” in the event of a conflict. This means that administrators need to decide up front whether their groups must all come from the domain controller or all be managed locally.

The Solution

Over the last few months, I worked on adding a new feature to the glibc name-service functionality to enable “group merging”. The net effect is that now for all lookups of a group, glibc can be configured to check both the local files and the remote service and (if the group appears in both), combine the list of member users for both representations of the group into a single response.

Thus, it becomes possible to provide both local and central administrators into the wheel group. This can come in handy for example if an administrator wants to keep one or more local accounts available to do disaster recovery in the event that the machine loses access to the remote users (such as a bad update resulting in SSSD not starting).
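A minimal sketch of what this looks like in /etc/nsswitch.conf, using the [SUCCESS=merge] action from the patch:

# before: the first source with an answer wins
group: files sss

# after: members from files and sss are combined into one group entry
group: files [SUCCESS=merge] sss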

Of course, this functionality does not come without a cost: because all merging lookups will try both data sources, it can result in a performance hit when operating against groups that otherwise would have been answered only by the local /etc/group file. With caching services like SSSD, this impact should be minimized.

Fedora and glibc upstream

The group merging patch has been submitted to the upstream glibc project but has not yet been merged into a release. It narrowly missed the 2.23 merge window, so it is currently slated for inclusion into glibc 2.24.

However, Carlos O’Donell has taken the patch and applied it to glibc in Fedora Rawhide (which will become Fedora 24), so it will be possible to take advantage of these features in Fedora 24 before anywhere else. (For anyone interested from other distributions, the patch should apply cleanly on 2.23 and likely with minimal effort atop 2.22 as well, since little in that area changed between those releases.)


January 25, 2016

Security and Tribal Knowledge
I've noted a few times in the past that the whole security industry is run by magicians. I don't mean this in a bad way, it's just how things work. Long term this will have to change, but it's not going to be an easy path.

When I say everything is run by magicians I speak of extremely smart people who are so smart they don't need or have process (they probably don't want it either, so there's no incentive). They can do whatever needs to be done whenever it needs doing. These folks are incredibly smart, but they learned their skills on their own and don't know how to pass them on. We have no way to pass knowledge on to others; many don't even know this is a problem. Magicians can be awesome if you have one, until they quit. New industries are created by magicians, but no industry succeeds with magicians. There are a finite number of these people and an infinite number of problems.

This got me thinking a bit, and it reminded me of the Internet back in the early 90's.

If you were involved in the Internet back in the 90's, it was all magic back then. The number of people who knew how things worked was incredibly small. There were RFCs and books and product documents, but at the end of the day, it was all magic. If your magician quit, you were screwed until you could find and hire a new magician. The bar was incredibly high.

Sounds a lot like security today.

Back then if you had a web page, it was a huge deal. If you could write CGI scripts, you were amazing, and if you had root on a server you were probably a magician. A lot of sysadmins knew C (you had to), a lot of software was built from source. Keeping anything running was a lot of work, infrastructure barely held together and you had to be an expert at literally everything.

Today getting a web site, running server side scripts, or having root isn't impressive. You can get much of this for free. How did we get here? The bar used to be really high. The bar is pretty low now, but also a lot more people understand how much of this works. They're not experts, but they know enough to get things done.

How does this apply to security?

Firstly we need to lower the bar. It's not that anyone really plans to do this, it just sort of happens via better tooling. I think the Linux distribution communities helped a lot in making this happen back in the day; the tools got a lot better. If you configured a server in 1995 it was horrible, everything was done by hand. Now 80% of the work just sort of happens, and you don't need super deep knowledge. Almost all security work done today is still manual. I see things like AFL and LLVM as the start, but we have a long way to go. As of right now we don't know which tools are actually useful. There are literally thousands of security products on the market; only the useful ones will really make a difference in the long term.

The second thing we need to do is transfer relevant knowledge. What that knowledge is will take time to figure out. Does everyone need to know how a buffer overflow exploit works? Probably not, but the tools will really determine who needs to know what. Today you need to know everything. In the future you'll need to know how to use the tools, interpret the output, and fill in some of the gaps. Think of it as the tools holding 80% of the knowledge; you just need to bring the missing 20%. Only the tool writers need the deeper knowledge. Today people have either 100% or 0% of the knowledge, and that is a rough learning curve.

If you look at the Internet today, there is a combination of tons of howtos and much better software to set up and run your infrastructure. There are plenty of companies that can help you build the solution you need. It's not nearly as important to know how to configure your router anymore; there are better tools that do a lot of this for you. This is where security needs to go. We need tools and documents that are useful and helpful. Unfortunately we don't yet really know how to make useful tools, or how to transfer knowledge. We have a long way to go before we can even start that conversation.

The next step security needs to make is to create and pass on tribal knowledge. It's still a bad place to be in, but it's better than magicians. We'll talk about tribal knowledge in the future.

Join the conversation, hit me up on twitter, I'm @joshbressers

January 21, 2016

Resize disks in a Centos 7 Install

The default disk layout in a CentOS deployment may make sense for the average use case, but not for using the machine as a TripleO all-in-one development box. I have 500 GB of disk space, and the default installer puts 400 GB into /home and 50 GB into /. However, since most of the work here is going to be done in virtual machines, the majority of the /home space is wasted, and I found I filled up the 50 GB partition on / on a regular basis. So, I want to remove /home and put all the space under /.

Here is my start state.

# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   50G   18G   33G  35% /
devtmpfs                  16G     0   16G   0% /dev
tmpfs                     16G     0   16G   0% /dev/shm
tmpfs                     16G   33M   16G   1% /run
tmpfs                     16G     0   16G   0% /sys/fs/cgroup
/dev/mapper/centos-home  411G  1.9G  409G   1% /home
/dev/sda1                497M  167M  331M  34% /boot
tmpfs                    3.2G     0  3.2G   0% /run/user/0

Thus far, I only have 1.9 GB used under /home, and 18 of the 50 GB used under /, so I have enough space to work with. I start by backing up the /home subdirectories to space on the partition that holds /.

mkdir /home-alt
df -h
mv /home/stack/ /home-alt/
umount /home

Edit the filesystem table (/etc/fstab) so that /home will not be mounted in the future:

#
# /etc/fstab
# Created by anaconda on Wed Jan 20 14:27:36 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /                       xfs     defaults        0 0
UUID=3347d9ba-bb62-44cf-8dfc-1b961279f428 /boot                   xfs     defaults        0 0
#/dev/mapper/centos-home /home                   xfs     defaults        0 0
/dev/mapper/centos-swap swap                    swap    defaults        0 0

From the above, we can see that the partition for / and /home are /dev/mapper/centos-root and /dev/mapper/centos-home.

Using the pvs command, I can see one physical volume:

  PV         VG     Fmt  Attr PSize   PFree 
  /dev/sda2  centos lvm2 a--  476.45g 64.00m

Using vgs, I can see a single volume group:

  VG     #PV #LV #SN Attr   VSize   VFree 
  centos   1   3   0 wz--n- 476.45g 64.00m

And finally, using lvs I see the three logical volumes that appeared in my fstab:

  LV   VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home centos -wi-a----- 410.70g                                                    
  root centos -wi-ao----  50.00g                                                    
  swap centos -wi-ao----  15.69g 

Remove the centos-home volume:

lvremove /dev/mapper/centos-home
Do you really want to remove active logical volume home? [y/n]: y
  Logical volume "home" successfully removed

Extend the centos-root volume into the freed space (roughly 410 GB). I can resize the underlying filesystem at the same time by passing -r.

lvextend -r /dev/mapper/centos-root /dev/sda2
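An equivalent invocation that is explicit about consuming all remaining free space in the volume group (assuming nothing else needs it):

lvextend -r -l +100%FREE /dev/mapper/centos-root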

Check if it worked:

# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root  461G   20G  442G   5% /
devtmpfs                  16G     0   16G   0% /dev
tmpfs                     16G     0   16G   0% /dev/shm
tmpfs                     16G   33M   16G   1% /run
tmpfs                     16G     0   16G   0% /sys/fs/cgroup
/dev/sda1                497M  167M  331M  34% /boot
tmpfs                    3.2G     0  3.2G   0% /run/user/0

I’ll admit, that was easier than I expected.

Return the /home subdirectories to their correct positions in the directory tree.

# mv /home-alt/ayoung/ /home/
# mv /home-alt/stack/ /home/
# rmdir /home-alt/

For reference, I used:

  1. How to Extend/Reduce LVM’s (Logical Volume Management) in Linux – Part II
  2. Resize your disks on the fly with LVM
  3. and the man pages for the commands listed.

January 20, 2016

Primes, parameters and moduli

First, a brief history of Diffie-Hellman for those not familiar with it:

https://www.youtube.com/watch?v=M-0qt6tdHzk

The short version of Diffie-Hellman is that two parties (Alice and Bob) want to share a secret so they can encrypt their communications and talk securely without an eavesdropper (Eve) listening in. So Alice and Bob first share a public prime number and modulus (which Eve can see). Alice and Bob then each choose a large random number (referred to as their private key) and apply some modular arithmetic using the shared prime and modulus. If everything goes as planned Alice sends her answer to Bob, and Bob sends his answer to Alice. They each take the number sent by the other party, and using modular arithmetic, and their respective private keys are able to derive a number that will be the same for both of them, known as the shared secret. Even if Eve listens in she cannot easily derive the shared secret.
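As a toy example (numbers far too small for real use; a real modulus is hundreds of digits), the arithmetic can be followed in a shell:

# shared public values: prime p=23, generator g=5
p=23; g=5
a=6;  A=$((g**a % p))    # Alice's private key 6 gives public value A=8
b=15; B=$((g**b % p))    # Bob's private key 15 gives public value B=19
echo $((B**a % p))       # Alice derives the shared secret: 2
echo $((A**b % p))       # Bob derives the same shared secret: 2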

However, if Eve has sufficiently advanced cryptographic experts and sufficient computing power, it is conceivable that she can derive the private keys for exchanges when a sufficiently small modulus is used. Most keys today are in the range of 1024 bits or larger, meaning that the modulus is at least several hundred digits long.

Essentially if Alice and Bob agree to a poorly chosen modulus number then Eve will have a much easier time deriving the secret keys and listening in on their conversation. Poor examples of modulus numbers include numbers that are not actually prime, and modulus numbers that aren’t sufficiently large (e.g. a 1024 bit modulus provides vastly less protection than a 2048 bit modulus).

Why you need a good prime for your modulus

A prime number is needed for your modulus. For this example we’ll use 23. Is 23 prime? 23 is small enough that you can easily walk through all the possible factors (2, 3, 4, 5, 6, 7, 8, 9, 10, 11), divide 23 by them and see if there is a remainder. But much larger numbers, such as ones that are in the mid to high hundreds of digits long, are essentially impossible to check this way unless you have a lot of computational resources and some very efficient ways of testing candidate factors. Fortunately there is a simple solution: use an even larger prime number, such as a 2048, 4096 or even 16384 bit prime. But when picking such a number, how can you be sure it’s a prime and not easily factored? Ignoring the obvious give-aways (like all numbers ending in 0, 2, 4, 5, 6 and 8) there are several clever mathematical algorithms for testing the primality of numbers and for generating prime numbers.

Miller-Rabin, Shawe-Taylor and FIPS 186-4

The Miller-Rabin primality test was first proposed in 1976, and Shawe-Taylor strong prime generation was first proposed in 1986. One thing that is important to remember is that back when these algorithms were made public, the amount of computing power available to generate or factor numbers was much smaller than is available now. The Miller-Rabin test is a probabilistic primality test: you cannot conclusively prove a number is prime, but by running the Miller-Rabin test multiple times with different parameters you can be reasonably certain that the number in question is probably prime, and with enough tests your confidence can approach almost 100%. Shawe-Taylor is also probabilistic; you’re not 100% guaranteed to get a good prime, but the chances of something going wrong and getting a non-prime number are very small.

FIPS 186-4 covers the math and usage of both Miller-Rabin and Shawe-Taylor, and gives specific information on how to use them securely (e.g. how many rounds of Miller-Rabin you’ll need to use, etc.). The main difference between Miller-Rabin and Shawe-Taylor is that Shawe-Taylor generates something that is probably a prime, whereas with Miller-Rabin you generate a number that might be prime and then test it. As such you may immediately generate a good number, or it may take several tries. In testing on a 3 GHz CPU, using a single core, it took me between less than a second and over 10 minutes to generate a 2048 bit prime using the Miller-Rabin method.
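If you want to experiment, recent OpenSSL releases ship a prime subcommand that wraps this kind of probabilistic testing (a sketch; check your version’s documentation):

# generate a (probable) 2048 bit prime
openssl prime -generate -bits 2048

# run extra primality checks against a candidate number
openssl prime -checks 20 104729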

Generating primes in advance

The easiest way to deal with the time and computing resources needed to generate primes is to generate them in advance. Also because they are shared publicly during the exchange you can even distribute them in advance, which is what has happened for example with OpenSSL. Unfortunately many of the primes currently in use were generated a long time ago when the attacks available were not as well understood, and thus are not very large. Additionally, there are relatively few primes in use and it appears that there may be a literal handful of primes in wide use (especially the default ones in OpenSSL and OpenSSH for example). There is now public research to indicate that at least one large organization may have successfully attacked several of these high value prime numbers, and as computational power increases this becomes more and more likely.

To generate Diffie-Hellman primes in advance is easy, for example with OpenSSL:

openssl dhparam [bit size] -text > filename

so to generate a 2048 bit prime:

openssl dhparam 2048 -text

Or for example with OpenSSH you first generate a list of candidate primes and then test them:

ssh-keygen -G candidates -b 2048
ssh-keygen -T moduli -f candidates

Please note that OpenSSH uses a list of multiple primes so generation can take some time, especially with larger key sizes.

Defending – larger primes

The best defense against someone factoring your primes is to use really large primes. In theory every time you add a single bit to the prime you are increasing the workload significantly (assuming no major advances in math/quantum computing that we don’t know about that make factorization much easier). As such moving from a 1024 bit to 2048 bit prime is a huge improvement, and moving to something like a 4096 bit prime should be safe for a decade or more (or maybe not, I could be wrong). So why don’t we simply use very large primes? CPU power and battery power are still finite resources, and very large primes take much more computational power to use, so much so that very large primes like 16384 bit primes become impractical to use, introducing noticeable delays in connections. The best thing we can do here is set a minimum prime size such as 2048 bits now, and hopefully move to 4096 bit primes within the next few years.

Defending – diversity of primes

But what happens if you cannot use larger primes? The next best thing is to use custom generated prime numbers, this means that an attacker will have to factor your specific prime(s), increasing their workload. Please note that even if you can use large primes, prime diversity is still a good idea, but prime diversity increases the amount of work for an attacker at a much slower rate than using larger prime does. The following apps and locations contain primes you may want to replace:

OpenSSH: /etc/ssh/moduli

Apache with mod_ssl: “SSLOpenSSLConfCmd DHParameters [filename]”, or append the DH parameters to the file named by SSLCertificateFile
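For example, to generate a fresh group and wire it up (the file path here is an example):

# generate a custom 2048 bit DH group
openssl dhparam -out /etc/pki/tls/dhparams.pem 2048

# then, in the Apache TLS configuration:
# SSLOpenSSLConfCmd DHParameters "/etc/pki/tls/dhparams.pem"
# or append the parameters to the file named by SSLCertificateFile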

Some excellent articles on securing a variety of other services and clients are:

https://weakdh.org/sysadmin.html

https://www.eff.org/deeplinks/2015/10/how-to-protect-yourself-from-nsa-attacks-1024-bit-DH

Current and future state

So what should we do?

DH Param Compromise

I think the best plan for dealing with this in the short term is deploying larger primes (2048 bits minimum, ideally 4096 bits) right now wherever possible. For systems that cannot use larger primes (e.g. some are limited to 768 bits or other similarly small sizes) we should ensure that default primes are not used and instead custom primes are used, ideally for limited periods of time, replacing the primes as often as possible (which is easier since they are small and quick to generate).

In the medium term we need to ensure as many systems as possible can handle larger prime sizes, and we need to make default primes much larger, or at least provide easy mechanisms (such as firstboot scripts) to replace them.

Longer term we need to understand the size of primes needed to avoid decryption due to advances in math and quantum computing. We also need to ensure software has manageable entry points for these primes so that they can easily be replaced and rotated as needed.

Why not huge primes?

Why not simply use really large primes? Because computation is expensive, battery life matters more than ever and latency will become problems that users will not tolerate. Additionally the computation time and effort needed to find huge primes (say 16k) is difficult at best for many users and not possible for many (anyone using a system on a chip for example).

Why not test all the primes?

Why not test the DH params passed by the remote end, and refuse the connection if the primes used are too small? There is at least one program (wvstreams) that tests the DH params passed to it; however, it does not check for a minimum size, it simply tests the DH params for correctness. The problem with this is twofold: one, there would be a significant performance impact (adding time to each connection), and two, most protocols and programs don’t really support error messages from the remote end related to the DH params, so apart from dropping the connection there is not a lot you can do.

Summary

As bad as things sound there is some good news. Fixing this issue is pretty trivial, and mostly requires some simple operational changes such as using moderately sized DH Parameters (e.g. 2048 bits now, 4096 within a few years). The second main fix for this issue is to ensure any software in use that handles DH Parameters can handle larger key sizes, if this is not possible then you will need to place a proxy in front that can handle proper key sizes (so all your old Java apps will need proxies). This also has the benefit of decoupling client-server encryption from old software which will allow you to solve future problems more easily as well.

January 18, 2016

OpenSSH, security, and everyone else
If you pay attention at all, this week you heard about a security flaw in OpenSSH.

Of course nothing is going to change because of this. We didn't make any real changes after Heartbleed or Shellshock, this isn't nearly as bad, it's business as usual.

Trying to force change isn't the important part though. The important thing to think about is the context this bug exists in. The folks who work on OpenSSH are some of the brightest security minds in the world. We're talking well above average here, not just bright. If they can't avoid security mistakes, is there any hope for the normal people?

The answer is no.

What do we do now?

For the moment we will continue to operate just like we have been. Things aren't great, but they're not terrible. Part of our problem is things aren't broken enough yet, we're managing to squeak by in most situations.

The next step will be developing some sort of tribal knowledge model. It will develop in a mostly organic way. Long term, security will be a teachable and repeatable thing, but we can't just jump to that point; we have to grow into it.

If you look at most of the security conference content today it sort of falls into two camps.

  1. Look at my awesome research
  2. Everything is broken and we can't fix it

Both of these content sets are taught by magicians. They're not really teaching knowledge, they're showing off. How do we teach? Teaching is really hard to do, it's not easy to figure out.

Many people believe security can't be learned, it's just sort of something you have. This is nonsense. There are many possible levels of skill, there is a point where you have to be especially gifted to move on, but there is also a useful place a large number of people can reach.

Perhaps the best place to start is to think about the question "I want to learn security, where do I start?"

I've been asked that many times. I've never had a good answer.

If we want to move our industry forward that's what we have to figure out. If someone came to you asking how to learn security, we have to have an answer. Remember no idea is too crazy, if you have thoughts, let's start talking about it.

Join the conversation, hit me up on twitter, I'm @joshbressers

January 15, 2016

The SLOTH attack and IKE/IPsec

Executive Summary: The IKE daemons in RHEL7 (libreswan) and RHEL6 (openswan) are not vulnerable to the SLOTH attack. But the attack is still interesting to look at.

The SLOTH attack released today is a new transcript collision attack against some security protocols that use weak or broken hashes such as MD5 or SHA1. While it mostly focuses on the issues found in TLS, it also mentions weaknesses in the “Internet Key Exchange” (IKE) protocol used for IPsec VPNs. While the TLS findings are very interesting and have been assigned CVE-2015-7575, the described attacks against IKE/IPsec got close but did not result in any vulnerabilities. In the paper, the authors describe a Chosen Prefix collision attack against IKEv2 using RSA-MD5 and RSA-SHA1 to perform a Man-in-the-Middle (MITM) attack and a Generic collision attack against IKEv1 HMAC-MD5.

We looked at libreswan and openswan-2.6.32 compiled with NSS, as that is what we ship in RHEL7 and RHEL6. Upstream openswan with its custom crypto code was not evaluated. While no vulnerability was found, there was some hardening that could be done to make this attack less dangerous, and it will be added in the next upstream version of libreswan.

Specifically, the attack was prevented because:

  • The SPIs in IKE are random and part of the hash, so it requires an online attack of 2^77 – not an offline attack as suggested in the paper.
  • MD5 is not enabled per default for IKEv2.
  • Weak Diffie-Hellman groups DH22, DH23 and DH24 are not enabled per default.
  • Libreswan as a server does not re-use nonces for multiple clients.
  • Libreswan destroys nonces when an IKE exchange times out (default 60s).
  • Bogus ID payloads in IKEv1 cause the connection to fail authentication.

The rest of this article explains the IKEv2 protocol and the SLOTH attack.

The IKEv2 protocol

[Figure: sloth-ike-1]

The IKE exchange starts with an IKE_INIT packet exchange to perform the Diffie-Hellman Key Exchange. In this exchange, the initiator and responder exchange their nonces. The result of the DH exchange is that both parties now have a shared secret called SKEYSEED. This is fed into a mutually agreed PRF algorithm (which could be MD5, SHA1 or SHA2) to generate as much pseudo-random key material as needed. The first key(s) are for the IKE exchange itself (called the IKE SA  or Parent SA), followed by keys for one or more IPsec SAs (also called Child SAs).

But before the SKEYSEED can be used, both ends need to perform an authentication step. This is the second packet exchange, called IKE_AUTH. This will bind the Diffie-Hellman channel to an identity to prevent the MITM attack. Usually these are digital signatures over the session data to prove ownership of the identity’s private key. In IKE, it signs the session data. In TLS that signature is only over a hash of the session data which made TLS more vulnerable to the SLOTH attack.

The attack is to trick both parties to sign a hash which the attacker can replay to the other party to fake the authentication of both entities.

[Figure: sloth-ike-2]

They call this a “transcript collision”. To facilitate the creation of the same hash, the attacker needs to be able to insert its own data in the session to the first party so that the hash of that data will be identical to the hash of the session to the second party. It can then just pass on the signatures without needing to have private keys for the identities of the parties involved. It then needs to remain in the middle to decrypt and re-encrypt and pass on the data, while keeping a copy of the decrypted data.

The IKEv2 COOKIE

The initial IKE_INIT exchange does not have many payloads that can be used to manipulate the outcome of the hashing of the session data. The only candidate is the NOTIFY payload of type COOKIE.

Performing a Diffie-Hellman exchange is relatively expensive. An attacker could send a lot of IKE_INIT requests forcing the VPN server to use up its resources. These could all come from spoofed source IP addresses, so blacklisting such an attack is impossible. To defend against this, IKEv2 introduced the COOKIE mechanism. When the server gets too busy, instead of performing the Diffie-Hellman exchange, it calculates a cookie based on the client’s IP address, the client’s nonce and its own server secret. It hashes these and sends it as a COOKIE payload in an IKE_INIT reply to the client. It then deletes all the state for this client. If this IKE_INIT exchange was a spoofed request, nothing more will happen. If the request was a legitimate client, this client will receive the IKE_INIT reply, see the COOKIE payload and re-send the original IKE_INIT request, but this time it will include the COOKIE payload it received from the server. Once the server receives this IKE_INIT request with the COOKIE, it will calculate the cookie data (again) and if it matches, the client has proven that it contacted the server before. To avoid COOKIE replays and thwart attacks attempting to brute-force the server secret used for creating the cookies, the server is expected to regularly change its secret.
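RFC 7296 sketches the cookie construction along these lines (the exact recipe is left to the implementation):

Cookie = <VersionIDofSecret> | Hash(Ni | IPi | SPIi | <secret>)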

Abusing the COOKIE

The SLOTH attacker is the MITM between the VPN client and VPN server. It prepares an IKE_INIT request to the VPN server but waits for the VPN client to connect. Once the VPN client connects, it does some work with the received data that includes the proposals and nonce to calculate a malicious COOKIE payload and sends this COOKIE to the VPN client. The VPN client will re-send the IKE_INIT request with the COOKIE to the MITM. The MITM now sends this data to the real VPN server to perform an IKE_INIT there. It includes the COOKIE payload even though the VPN server did not ask for a COOKIE. Why does the VPN server not reject this connection? Well, the IKEv2 RFC-7296 states:

When one party receives an IKE_SA_INIT request containing a cookie whose contents do not match the value expected, that party MUST ignore the cookie and process the message as if no cookie had been included

The intention here was likely meant for a recovering server. If the server is no longer busy, it will stop sending cookies and stop requiring cookies. But a few clients that were just about to reconnect will send back the cookie they received when the server was still busy. The server shouldn’t reject these clients now, so the advice was to ignore the cookie in that case. Alternatively, the server could just remember the last used secret for a while and if it receives a cookie when it is not busy, just do the cookie validation. But that costs some resources too which can be abused by an attacker to send IKE_INIT requests with bogus cookies. Limiting the time of cookie validation from the time when the server became unbusy would mitigate this.

COOKIE size

The paper actually pointed out a common implementation error:

To implement the attack, we must first find a collision between m1 and m’1. We observe that in IKEv2 the length of the cookie is supposed to be at most 64 octets but we found that many implementations allow cookies of up to 2^16 bytes. We can use this flexibility in computing long collisions.

The text that limits the COOKIE to 64 byte is hidden deep down in the RFC when it talks about a special use case. It is not at all clearly defined:

When a responder detects a large number of half-open IKE SAs, it
SHOULD reply to IKE_SA_INIT requests with a response containing the
COOKIE notification. The data associated with this notification MUST
be between 1 and 64 octets in length (inclusive), and its generation
is described later in this section. If the IKE_SA_INIT response
includes the COOKIE notification, the initiator MUST then retry the
IKE_SA_INIT request, and include the COOKIE notification containing
the received data as the first payload, and all other payloads
unchanged.

A few implementations (including libreswan/openswan) missed this 64 byte limitation. Instead, those implementations only looked at the COOKIE value as a NOTIFY payload. These payloads have a two byte Payload Length value, so NOTIFY data can legitimately be up to 2^16 (65535) bytes. Libreswan will fix this in the next release and limit the COOKIE to 64 bytes.

Attacking the AUTH hash

Assuming the above works, the attacker needs to find a collision between m1 and m’1. The only numbers the authors claim could be feasible are when MD5 is used for the authentication step in IKE_AUTH. An offline attack could then be computed in 2^16 to 2^39 operations, which they say would take about 5 hours. As the paper states, IKEv2 implementations either don’t support MD5, or if they do, it is not part of the default proposal set. It makes a case that the weak SHA1 is widely supported in IKEv2, but admits using SHA1 will need more computing power (they listed 2^61 to 2^67, or 20 years). Note that libreswan (and openswan in RHEL) requires manual configuration to enable MD5 in IKEv2, but SHA1 is still allowed for compatibility.

The final step of the attack – Diffie-Hellman

Assuming the above succeeds, the attacker needs to ensure that g^xy’ = g^x’y. To facilitate that, they use a subgroup confinement attack, and illustrate this with an example of picking x’ = y’ = 0. Then the two shared secrets would have the value 1. In practice this does not work, according to the authors, because most IKEv2 implementations validate the received Diffie-Hellman public value to ensure that it is larger than 1 and smaller than p – 1. They did find that Diffie-Hellman groups 22 to 24 are known to have many small subgroups, and implementations tend to not validate these. This led to an interesting discussion on one of the cypherpunks mailing lists about the mysterious nature of the DH groups in RFC-5114, which are not enabled in libreswan (or openswan in RHEL) by default, and require manual configuration precisely because the origin of these groups is a mystery.

The IKEv1 attack

The paper briefly brainstorms about a variant of this attack using IKEv1. It would be interesting because MD5 is very common with IKEv1, but the article is not really clear on how that attack should work. It mentions filling the ID payload with malicious data to trigger the collision, but such an ID would never pass validation.

Counter measures

Work was already started on updating the cryptographic algorithms deemed mandatory to implement for IKE. Note that it does not state which algorithms are valid to use, or which to use per default. This work is happening at the IPsec working group at the IETF and can be found at draft-ietf-ipsecme-rfc4307bis. It is expected to go through a few more rounds of discussion and one of the topics that will be raised are the weak DH groups specified in RFC-5114.

Upstream Libreswan has hardened its cookie handling code, preventing the attacker from sending an uninvited cookie to the server without having their connection dropped.

January 10, 2016

What the lottery and security have in common

If you live in the US you can't escape the news about the Powerball lottery. The jackpot has grown to $1.3 Billion (with a capital B). Everyone is buying tickets and talking about what they'll do when they win enough money to ruin their life.

This made me realize an unfortunate truth about security that we like to ignore: humans are bad at reality. Here is how most of my conversations go.

"You won't win. The odds are zero percent"
"I might! You don't know!"
GOTO 10

I'm of course labeled as being some sort of party pooper because I'm not creating stories about how I will burn through hundreds of millions of dollars in a few short weeks.

What does this have to do with security? It's because people are bad at reality. Let's find out why.

Firstly, remember that as a species, evolution built us to survive on the African Savannah. We are good at looking for horrible beasts in the grass, and at being able to quickly notice other humans (even if they appear in toast). We are bad at things like math and science because math rarely hides in the grass and eats people. The vast majority of people live their lives unaware of this as a problem. What we call "intuition" is simply "don't get eaten by things with big teeth".

Keeping this in mind, let's use the context of the lottery. The odds are basically zero percent once you take the margin of error into account. We don't care though, we want to believe that there's a chance to win. Our brain says "anything is possible" then marketing helps back that up. Almost nobody knows how bad their odds really are and since you see a winner on TV every now and then, you know it's possible, you could be next! The lottery ticket is our magic gateway to fixing all our problems.

Now switch to security. People are bad at understanding the problems. They don't grasp any of the math involved with risk, they want to do something or buy something that is the equivalent of a lottery ticket. They want a magic ticket that will solve all their problems. There are people selling these tickets. The tickets of course don't work.

How we fix this is the question. Modern medicine is a nice example. Long ago it was all magic (literally). Then, by creating the scientific method and properly training doctors, things got better. People stopped listening to the magicians (well, most people) and now they listen to doctors who use science to make things better. There is still plenty of quack medicine though; we want to believe in the magic cures. In general, though, most of humanity goes to doctors when they're sick.

Today all security is magic. We need to find a way to create security science so methods and ideas can be taught.

Between thinking about how to best blow my lottery winnings, I'll probably find some time to think about what security science looks like. Once I win though you'll all be on your own. You've been warned!

Join the conversation, hit me up on twitter, I'm @joshbressers

January 07, 2016

Deploying Keycloak via Ansible

Keystone needs to work with multiple federation sources. Keycloak is a JBoss-based project that provides, among other things, the SAML and OpenID Connect protocols. As part of my work in getting the two integrated, I needed to deploy Keycloak. The rest of my development setup is done via Ansible, and I wanted to handle Keycloak the same way.

Unlike Ipsilon, Keycloak is not deployed via RPMs and Yum. Instead, the most common deployment method is to download and expand the tarball. This provides a great deal of flexibility to the deployer. While I am not going for a full live-deployment approach here, I did want to use best practices. Here were the decisions I made:

  • Use the System deployed Java runtime
  • Run as a non-root dedicated user named keycloak.
  • Manage the process via systemd
  • Put the majority of the files under /var/lib/keycloak.
  • Have all code and configuration owned by root and not be editable by the Keycloak user
  • Use firewalld to open only the ports necessary (8080 and 9990) to communicate with the Keycloak server itself.

Here is the roles/keycloak/tasks/main.yml file that has the majority of the logic:

---
- name: install keycloak prerequisites
  tags:
    - keycloak
  yum: name={{ item }} state=present
  with_items:
    - java-1.7.0-openjdk.x86_64
    - firewalld

- name: create keycloak user
  tags:
  - keycloak
  user: name=keycloak

- name: keycloak target directory
  tags:
  - keycloak
  file: dest={{ keycloak_dir }}
        mode=755
        owner=root
        group=root
        state=directory


- name: get Keycloak distribution tarball
  tags:
    - keycloak
  get_url: url={{ keycloak_url }}
           dest={{ keycloak_dir }}

- name: unpack keycloak
  tags:
    - keycloak
  unarchive: src={{ keycloak_dir }}/{{keycloak_archive}}
             dest={{ keycloak_dir }}
             copy=no

- name: keycloak log directory
  tags:
  - keycloak
  file: dest={{ keycloak_log_dir }}
        mode=755
        owner=keycloak
        group=keycloak
        state=directory

- name: keycloak data directory
  tags:
  - keycloak
  file: dest={{ keycloak_jboss_home }}/standalone/data
        mode=755
        owner=keycloak
        group=keycloak
        state=directory


- name: keycloak tmp directory
  tags:
    - keycloak
  file: dest={{ keycloak_jboss_home }}/standalone/tmp
        mode=755
        owner=keycloak
        group=keycloak
        state=directory

- name: make keycloak configuration directory readable
  tags:
    - keycloak
  file: dest={{ keycloak_jboss_home }}/standalone/configuration
        mode=755
        owner=keycloak
        group=keycloak
        state=directory
        recurse=yes

- name: keycloak systemd setup
  tags:
    - keycloak
  template: src=keycloak.service.j2
            dest=/etc/systemd/system/keycloak.service
            owner=root group=root mode=0644
  notify:
    - reload systemd

- name: enable firewalld
  tags:
    - keycloak
  service: enabled=yes
           state=started
           name=firewalld

- name: Open Firewall for services
  tags:
    - keycloak
  firewalld: port={{ item }}
             permanent=true
             state=enabled
             immediate=yes
  with_items:
    - 8080/tcp
    - 9990/tcp

- name: keycloak systemd service enable and start
  tags:
    - keycloak
  service: name=keycloak
           enabled=yes
           state=started
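
One note on the systemd setup task above: the notify assumes a matching "reload systemd" handler, which I have not shown. A minimal sketch of a roles/keycloak/handlers/main.yml would look something like this (Ansible 1.9 has no systemd module, so a plain command does the job):

---
- name: reload systemd
  command: systemctl daemon-reload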

It makes use of some variables that I expect to have to tweak as package versions increase. Here is the roles/keycloak/vars/main.yml file:

---
keycloak_version: 1.6.1.Final
keycloak_dir: /var/lib/keycloak
keycloak_archive: keycloak-{{ keycloak_version }}.tar.gz
keycloak_url: http://downloads.jboss.org/keycloak/{{ keycloak_version }}/{{ keycloak_archive }}
keycloak_jboss_home: "{{ keycloak_dir }}/keycloak-{{ keycloak_version }}"
keycloak_log_dir: "{{ keycloak_jboss_home }}/standalone/log"
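
Since keycloak_version drives the archive name, download URL, and install path, bumping Keycloak should only mean touching that one variable. You can also override it per run without editing the file, since extra vars take precedence over role vars; the version here is purely an example:

ansible-playbook -i inventory.ini keycloak.yml -e keycloak_version=1.7.0.Final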

For systemd I started with the configuration suggested by Jens Krämer, which I tailored to reference Keycloak explicitly and to listen on 0.0.0.0. Here is the template file roles/keycloak/templates/keycloak.service.j2:

[Unit]
Description=Jboss Application Server
After=network.target

[Service]
Type=idle
Environment=JBOSS_HOME={{ keycloak_jboss_home }} JBOSS_LOG_DIR={{ keycloak_log_dir }} "JAVA_OPTS=-Xms1024m -Xmx20480m -XX:MaxPermSize=768m"
User=keycloak
Group=keycloak
ExecStart={{ keycloak_jboss_home }}/bin/standalone.sh -b 0.0.0.0
TimeoutStartSec=600
TimeoutStopSec=600

[Install]
WantedBy=multi-user.target
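
Once the role has run, a quick sanity check looks something like this (the ports being those opened by the firewalld task above):

systemctl status keycloak
ss -tlnp | grep -E ':8080|:9990'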

The top level playbook for this is somewhat muddied by having other roles, not relevant for this post. It looks like this:

- hosts: keycloak
  remote_user: "{{ cloud_user }}"
  tags: all
  tasks: []

- hosts: keycloak
  sudo: yes
  remote_user: "{{ cloud_user }}"
  tags:
    - ipa
  roles:
    - common
    - ipaclient
    - keycloak
  vars:
    hostname: "{{ ansible_fqdn }}"
    ipa_admin_password: "{{ ipa_admin_user_password }}"

And I call it using:

 ansible-playbook -i ~/.ossipee/deployments/ayoung.os1/inventory.ini keycloak.yml

I’m tempted to split this code off into its own repository; right now I have it as part of Rippowam.

January 05, 2016

Boolean: virt_use_execmem What? Why? Why not Default?
In a recent bugzilla, the reporter was asking about the virt_use_execmem boolean.

  • What is it?

  • What did it allow?

  • Why was it not on by default?


What is it?

Well, let's first look at the AVC:

type=AVC msg=audit(1448268142.167:696): avc:  denied  { execmem } for  pid=5673 comm="qemu-system-x86" scontext=system_u:system_r:svirt_t:s0:c679,c730 tcontext=system_u:system_r:svirt_t:s0:c679,c730 tclass=process permissive=0

If you run this under audit2allow it gives you the following message:


#============= svirt_t ==============

#!!!! This avc can be allowed using the boolean 'virt_use_execmem'
allow svirt_t self:process execmem;


Setroubleshoot also tells you to turn on the virt_use_execmem boolean.

# setsebool -P virt_use_execmem 1

What does the virt_use_execmem boolean do?

# semanage boolean -l | grep virt_use_execmem
virt_use_execmem               (off  ,  off)  Allow confined virtual guests to use executable memory and executable stack


Ok, what does that mean? Uli Drepper back in 2006 added a series of memory checks to the SELinux kernel to handle common attack vectors on programs using executable memory. Basically, these memory checks allow us to stop a hacker from taking over confined applications using buffer overflow attacks.

If qemu needs this access, why is this not enabled by default?

Standard KVM VMs do not require qemu to have the execmem privilege. Blocking execmem defeats a class of buffer overflow attacks, where the hacked process is able to overwrite memory and then execute the code the hacked program wrote.

When using different qemu emulators that do not use KVM, however, the emulators require execmem to work. If you look at the AVC above, I highlighted that the user was running qemu-system-x86. In order for this emulator to work it needs execmem, so we have to loosen the policy slightly to allow the access. Turning on the virt_use_execmem boolean could allow a qemu process that is susceptible to a buffer overflow attack to be hacked; SELinux would not block this attack.

Note: lots of other SELinux blocks would still be in effect.

Since most people use KVM for their VMs, we disable it by default.
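
A quick way to check which SELinux type your emulator is actually running as, before flipping any booleans:

ps -eZ | grep qemu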



In a perfect world, libvirt would be changed to launch different emulators with different SELinux types, based on whether or not the emulator requires execmem. For example, svirt_tcg_t is defined, which allows this access.

Then you could run svirt_t kvm/qemu VMs and svirt_tcg_t qemu-system-x86 VMs on the same machine at the same time without having to lower the security. I am not sure if this is a common situation, and no one has done the work to make this happen.

January 04, 2016

A security analogy that works
Over the holiday break I spent a lot of time reading and thinking about what the security problem really is. It's really hard to describe, no analogies work, and things just seem to keep getting worse.

Until now!

Maybe.

Well, things will probably keep getting worse, but I think I've found a way to describe this that almost anyone can understand. We can't really talk about our problems today, which makes it impossible to fix anything.

Security is the same problem as World Hunger. Unfortunately we can't solve either, but in theory we can make things better. Let's look at the comparisons.

First, the problem we talk about isn't just one thing. It's really hundreds or thousands of other problems we lump together into one group and give it a simple yet mostly meaningless name. The real purpose of the name is to give humans a single idea they can relate to. It's not meant to make the problem more fixable, it just makes it so we can talk about it.

Security includes things like application security, operational security, secure development, secure documentation, pen testing, hacking, DDoS, and hundreds of other things.

World hunger includes homelessness, hunger, malnutrition, lack of education, clean water, and hundreds of other things.

Lots of little things.

Second, the name isn't really the problem. It's what we can see. It's a symptom of other problems. The other problems are what you have to fix, you can't fix the name.

What we call "security" is really other things. The real problem is rarely security itself; it's something else. Security is the symptom we can see, while the real problem is less obvious and harder to see.

In the context of world hunger the real problems are things like clean water, education, equality, corruption, crime, and the list goes on. Hunger is what we see, but to fix hunger, we have to fix those other problems.

We can give people food, but that doesn't fix the real problem, it makes things better for a day or a week. This is exactly how security works today. We run from fire to fire, fixing a single easy to see problem, then run off to the next thing. We never solve any problems, we put out fires.

So assuming this analogy holds, the sort of good news is that world hunger is slowly getting better. The bad news is progress is measured in decades. This is where my thinking starts to falter. Trade can help bring more progress to a given area. What is the equivalent in security? Are there things that can help make the situation better for a localized area? Will progress take decades?

If I had to guess, which I will, I suspect we're in the dark ages of security. We don't approach problems with a scientific mind, we try random things until something works, and then decide that spinning around while holding a chicken is what fixed that buffer overflow.

What we need is "security science". This means we need ways to approach security in a formal reproducible manner. A practice that can be taught and learned. Today it's all magic, some people have magic, most don't. Remember when the world had magicians instead of doctors? Things weren't better back then no matter what those forwards from your uncle claim.

This all leaves a lot of unanswered questions, but I think it's a starting point. Today we have no starting point, we have people complaining everything is broken, people selling magic, some have given up and assume this is how everything will just always be.

What will our Security Renaissance be? What will security science look like?

Join the conversation, hit me up on twitter, I'm @joshbressers

December 29, 2015

Security reminds me of the gym on January 2
If you have any sort of gym membership you dread the month of January. Every year, there are countless people who make a resolution to get in shape, so the gym is flooded with people for much of January. I'm in favor of everyone staying in shape and having a gym membership; my point isn't to claim how annoying the n00bs are. The point of this story is how few people stick around, and most give up because doing nothing is often easier than doing something.

What does this have to do with security?

The parallel here worries me. Let's use Heartbleed for our context.

After Heartbleed (January 1), everyone was talking about security, it was super important and everyone wanted more security (flooding the gym). After a while (February) most people stopped obsessing over security; a few stuck around, most didn't. As a species we're not really doing any better now than we were before Heartbleed. You could make some arguments, but it's a rounding error at best.

The real issue here is this is how humans work. We love running to whatever is popular, pretending we always knew it was cool, and watching for the next hip thing to pop up for us to latch on to.

Our current security problems aren't technology problems, they are human problems. We have to assume we can't change human nature. The vast majority of people will never take security seriously. They know it's important, they might even want to do it right, but at the end of the day they're not going to do anything about it.

The only solution is to make secure the default option.

This is probably harder than changing human nature.

Can this problem actually be fixed? I'm not sure. I need to think about it. I don't want to say no, but my crystal ball is pretty fuzzy here. There are a lot of weird problems all tied together in bizarre ways. I'm always happy to listen to new ideas, let me know if you have any. The more I learn the less I know seems to be the only constant.

Join the conversation, hit me up on twitter, I'm @joshbressers

December 23, 2015

DevOps On The Desktop: Containers Are Software As A Service

It seems that everyone has a metaphor to explain what containers “are”. If you want to emphasize the self-contained nature of containers and the way in which they can package a whole operating system’s worth of dependencies, you might say that they are like virtual machines. If you want to emphasize the portability of containers and their role as a distribution mechanism, you might say that they are like a platform. If you want to emphasize the dangerous state of container security nowadays, you might say that they are equivalent to root access. Each of these metaphors emphasizes one aspect of what containers “are”, and each of these metaphors is correct.

It is not an exaggeration to say that Red Hat employees have spent man-years clarifying the foggy notion invoked by the buzzword “the cloud”. We might understand cloudiness as having three dimensions: (1) irrelevant location, (2) external responsibility, and (3) the abstraction of resources. The different kinds of cloud offerings distinguish themselves from one another by their emphasis on these qualities. The location of the resources that comprise the cloud is one aspect of the cloud metaphor and the abstraction of resources is another aspect of the cloud metaphor. This understanding was Red Hat’s motivation for both its private-platform offerings and its infrastructure-as-a-service offerings (IaaS/PaaS). Though the hardware is self-hosted and administered, developers are still able to think either in terms of pools of generic computational resources that they assign to virtual machines (in the case of IaaS) or in terms of applications (in the case of PaaS).

What do containers and the cloud have in common? Software distribution. Software that is distributed via container or via statically-linked binary is essentially software-as-a-service (SaaS). The implications of this are far-reaching.

Given the three major dimensions of cloudiness, what is software as a service? It is a piece of software hosted and administered externally to you that you access mainly through a network layer (either an API or a web interface). With this definition of software as a service, we can declare that  99% of the container-distributed and statically-linked Go software is SaaS that happens to run on your own silicon powered by your own electricity. Despite being run locally, this software is still accessed through a network layer and this software is still—in practice—administered externally.

A static binary is a black box. A container is modifiable only if it was constructed as an ersatz VM. Even if the container has been constructed as an ersatz VM, it is only as flexible as (1) the underlying distribution in the container and (2) your familiarity with that distribution. Apart from basic networking, the important parts of administration must be handled by a third party: the originating vendor. For most containers, it is the originating vendor that must take responsibility for issues like Heartbleed that might be present in software’s underlying dependencies.

This trend, which shows no signs of slowing down, is a natural extension to the blurring of the distinction between development and operations. The term for this collaboration is one whose definition is even harder to pin down than “cloud”: DevOps. The DevOps movement has seen some traditional administration responsibilities—such as handling dependencies—become shared between operational personnel and developers. We have come to expect operations to consume their own bespoke containers and static binaries in order to ensure consistency and to ensure that needed runtime dependencies are always available. But now, a new trend is emerging—operational groups are now embedding the self-contained artifacts of other operational groups into their own stack. Containers and static blobs, as a result, are now emerging as a general software distribution method.

The security implications are clear. Self-contained software such as containers and static binaries must be judged as much by their vendor’s commitments to security as by their feature set because it is that vendor who will be acting as the system administrator. Like when considering the purchase of a phone, the track record for appropriate, timely, and continuous security updates is as important as any feature matrix.

Some security experts might deride the lack of local control over security that this trend represents. However, that analysis ignores economies of scale and the fact that—by definition—the average system administrator is worse than the best. Just as the semi-centralized hosting of the cloud has allowed smaller businesses to achieve previously impossible reliability for their size, so too does this trend offer the possibility of a better overall security environment.

Of course, just as the unique economic, regulatory, and feature needs of enterprise customers pushed those customers to private clouds, so too must there be offerings of more customizable containers.

Red Hat is committed to providing both “private cloud” flexibility and to helping ISVs leverage the decades of investment that we have made in system administration. We release fresh containers at a regular cadence and at the request of our security team. By curating containers in this way, we provide a balance between the containers becoming dangerously out of date and the fragility that naturally occurs when software used within a stack updates “too often”. However, just as important is our commitment to all of our containers being updatable in the ways our customers have come to expect from their servers and VMs: `yum update` for RPM based content, and zips and patches for content such as our popular JBoss products. This means that if you build a system on a RHEL-based container you can let “us” administer it by simply keeping up with the latest container releases *or* you can take control yourself using tools you already know.

Sadly, 2016 will probably not be the year of the Linux desktop, but it may well be the year of DevOps on the desktop. In the end, that may be much more exciting.

December 22, 2015

Securing email to Gmail

I’ve been working on securing my postfix configuration to enforce certificate validation and encryption on some known, higher-volume, or more sensitive connections between SMTP servers (port 25).

On many of the connections I’ve set up for secure transport there have been no problems (assuming proper TLS certificates are used).  Unfortunately Gmail™ has been a problem.  Sometimes it verifies and validates the certificate and other times it doesn’t… for days.

After conferring with Google Security I believe I’ve come up with a solution.  In my tls_policy file I’ve added the following:

gmail.com       secure match=.google.com:google.com ciphers=high protocols=TLSv1

So far this is working but I’ll continue to test.

If you run your own SMTP server and wish to maintain a secure connection with Gmail this is an easy way to enforce encryption as well as validate the certificate.  Of course this doesn’t protect the message while it’s being stored on the server or workstation (or on Google’s internal network).  To protect messages at rest (on a server) one should use GPG or S/MIME.  Using both TLS over the network between servers and GPG or S/MIME is beneficial to provide protection of the messages going over the Internet.
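
For reference, the policy table does nothing until postfix is pointed at it. The wiring looks roughly like this, assuming the table lives at /etc/postfix/tls_policy:

# in /etc/postfix/main.cf:
#   smtp_tls_policy_maps = hash:/etc/postfix/tls_policy
postmap hash:/etc/postfix/tls_policy
postfix reload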

Update

This configuration is applicable with the OpenSSL version shipped with CentOS 6/RHEL 6.  Implementing this on CentOS 7/RHEL7 or another flavor of Linux may require a different/better configuration.


December 21, 2015

A Christmas Cyber
Mallory was dead: to begin with. Bob knew he was dead, and nobody liked Bob, he was the security guy, nobody likes the security guy.

"Merry Christmas Bob!" said Alice. "Bah humbug!" was the reply. Bob had to work over Christmas protecting the network, he had no reason to be merry. As Bob opened the door to the server room he noticed the door knocker looked like Mallory, which was odd as the server room door didn't have a knocker. A closer inspection led Bob to believe his mind was playing tricks on him.

Bob sat down at the terminal and heard the door slam shut. This is of course impossible as the door has a slow closer on it so this sort of thing couldn't happen. As Bob peeked around the side of the terminal he saw the ghost of Mallory.

"How now!" said Bob, "What do you want with me?"

"Much!" said Mallory.

"Tonight you will be visited by 3 spirits, they will guide you in hopes that you can avoid my path."

Mallory walked backwards, hit his head on the door, fell down, then stood up, looked around, and snuck out as best as one can after running into a door and falling down.

"Still an idiot" thought Bob. "I can't imagine any of this is real."

Cyber Past

Later that night while reading Alice's mail instead of checking the IDS logs, Bob heard a sound that made him look up quickly. There standing before him was a woman with a ghostly appearance.

"I am the ghost of cyber past" whispered the spirit. "The ... what, wait, what? This stupid thing is real?" "I'm here to show you how you used to be, the shadows of things that once were."

Instantly Bob was transported to the server room ten years ago. He was speaking with the lead architect about how to secure the infrastructure.

"I remember him" recalled Bob. "He should have been fired for incompetence." "You weren't always like this" said the spirit "You once had hope you could change things and help them." "Well, I was a foolish youth, these people are beyond help now" Bob recalled.

The Spirit gazed at the youthful Bob. "We should create a security policy that will help keep the network secure, it's important not to get in the way too much, I have no doubt we can do this if we work together!"

Just then the scene faded and they were returned to the server room of today, a drab place that had no joy or good ideas anywhere you looked.

"Sigh, there are going to be two more of these bozos who come tonight I suppose. I probably won't get anything done. This will be worse than end of quarter."

Cyber Present

The clock struck one, which was odd given there isn't a clock in the server room. "Now why is that even needed" yelled Bob.

Bob looked up and saw another Spirit. "You're the one who will show me nobody likes me right!" The spirit looked at him and sighed. "This is why nobody likes you Bob, let's go."

The first stop was a party where Alice is talking to some friends. "Then he actually said bah humbug. I mean, who even does that. The guy is totally mental." Bob shouted "It's not like you're any better!" "She can't hear you" said the Spirit. Bob grumbled something foul to himself.

"I had hoped to show you more, but this is the only person I could find who even talked about you, seriously Bob, you need to be nicer to, well, anyone."

The scene changed to Bob's apartment. It was a disheveled room with clutter everywhere. The computer chair was the only place that didn't have a mess on it. "I have friends in World of Warcraft!" "That's a lie and you know it!" said the Spirit. "They kicked you out of the guild because you treat them all horribly."

"Really Bob, I've been doing this a long time, you're without a doubt the most unlikable person I've seen, you need to be nicer." "Maybe if they were nice to me!" "It really doesn't work that way. Stop being such a jerk." Bob looked at the Spirit "Aren't you supposed to be all mysterious and not tell me what to do?" "I've made an exception. Also, clean up this dump when you get home."

With that the room vanished and Bob was again in the server room.

"What a waste of time" he sighed. "That guy was dumber than the people I have to work with."


Cyber Future

The last Spirit was waiting for Bob as soon as he arrived. "This one is supposed to scare me" thought Bob. He looked up and saw one of his sales reps. "Oh FOR ..." "Hi Bob, shall we get going?" "I always knew there was something up with you, you actually are the devil!" "Spirit Bob, I'm a spirit." "Oh whatever, look, I'm busy, can we just assume you show me a terrible future so I can finish up?"

"No."

The server room was suddenly much brighter, it was clearly daytime at some point in the future. There were two people talking. "Will you miss him?" said the first person, Bob didn't recognize them. "Absolutely not" said the second person. "That guy was horrible. Nobody liked him, I'm amazed it took so long to fire him, what a pain". "You can't just be a tyrant, security is important, we need someone who can help, not just tell us 'no' anytime a question is asked." "Hah, that's true, all Bob ever did was say no and yell. Thank goodness they fired him."

"They fire me!" asked Bob. "They had no choice" said the Spirit. "You weren't actually helping, you just made problems worse really. Remember this is but the future that could be if you don't change your ways. There is still hope for you Bob, you can make things better instead of just being part of the problem. Tonight was all about showing you the error of your ways so you can become the security person you once thought you could be. The security person the world needs, it's important. It's time to go now, you've seen enough. I'll call you on Monday, I think your firewall it out of compliance."

With that the server room scene changed back to the present, it was dark outside and Bob was alone in the room. He shivered, it was suddenly chilly.

Bob took a deep breath, what a night. He looked up at the clock, it was almost time to head home. The future Bob saw made him nervous. "That's not how I want to go out, I'm smart enough to make things right" he thought. Bob leaned back in his chair. After thinking about what to do, Bob decided he had to change things. That's not the future he wanted; he had to build a new future. A great future, a future he deserves!

Bob grinned, grabbed a scrap of paper and started writing something down. He taped it to the door. The note read "Merry Christmas everyone, Love Bob."

"This will be a Christmas to remember" Bob said out loud.

He then shut off the power to the whole server room and left. His phone started ringing immediately and he ignored it as he walked to his car. "Nobody fires me!" he thought to himself. "I wonder if the guild will let me back in?"

December 14, 2015

Security is the new paperless office!
If you're old enough, you remember reading a lot about the coming "paperless office". It never came, but I realized there are parallels we can draw in the context of our current security problems.

Back in the 90's, everyone wanted a paperless office. It sounded neat and with the future coming, who would need paper with all the flying cars and hoverboards! It turns out paper didn't go away. Everyone keeps talking about how security is the most important thing ever; investing in the paperless office was once the most important thing ever, too.

Stage 1: Magic!

This is where security is today. Everyone knows it's neat, but nobody knows what to really do. Well some people know, but nobody listens to them. Instead we want a magic solution that will fix everything. Most of it doesn't work but who cares, it's magic, shut up and take my money!

The paperless office had tons of bizarre things from magic scanners to document systems to things that almost looked like a tablet to store all your paper. None of those things really worked well, and they weren't purchased by a lot of people. Anyone who owned an early Palm Pilot probably remembers how just keeping the thing working took at least double the time a paper book consumed. That doesn't even count the odd writing style you had to use; I'm having flashbacks just thinking about it.

Back in those days most companies had rooms to store the documents. It generally had a lock on it that was never locked, and most of the documents got filed away and were never ever looked at again. The amount of wasted paper and floor space was crazy. If there was a fire, everything got lost. The reasons to get your data out of those rooms were pretty obvious. Just like the reasons to now protect that data are obvious, but how to actually do these things is not.

Stage 2: There is no stage 2

The thing is, there wasn't ever some mega event that ushered in the paperless office, there will probably never be a paperless office. What actually happened, and is still happening, is we saw a lot of incremental change over the course of decades to bring us to where we are today. I wouldn't say we're anywhere near paperless, but we will continue to approach zero. There are some things that make life a lot nicer and things seem to keep getting better.

Most companies don't have massive document rooms anymore, they store much of that paperwork on a server somewhere. A decent system can tell you exactly who viewed what, when, and why. We do this because it's better in almost every way, but it took a long time to work out how everything fits together. I never print out maps or travel information anymore, it's all on my phone. I don't keep receipts, I just scan them. A lot of HR documents are filled out through a web browser. I pay many bills through a web browser.

There are still people who claim paper is better with a nostalgic glee. There are plenty of crazy arguments about why paper is better; these people aren't worried about utility though, they have a view of reality that isn't based on the utility of something, they like things the way they are. More on this person later though, we all know one, keep them in mind.

None of these paperless changes happened quickly or with much fanfare. It was just the slow march of progress. Security is happening the same way. There isn't going to be a singular giant event that changes everything, there will be lots of little ones. Over the course of the next decade some people will continue to make incremental improvements. Things will get better one step at a time. Security today is better than it was ten years ago, it's still bad, but it is better.

Here's the catch though: a lot of security people today are actually fighting change. It's not the way they would have done it, and instead of helping they like to complain about how nothing will work. They are going to be the people in ten years talking about how much better life was when everything was on paper in a giant warehouse. Those trees had it coming!

Stage 3: Wait, but there was no stage 2 ...

So the question now is what can we do? The question of how do we fix all this mess keeps coming up over and over again. Nobody can answer it, some people don't even understand the question. If you consider yourself a security person, just start helping. Be patient, answer questions, give good advice. As everyone learns new lessons things will improve. There isn't one fix. Regulation won't fix anything, huge corporations won't fix anything, insurance won't fix anything. Everything will slowly fix itself. The best we can do is try to go from slowest to slower.

There is a bigger issue of are the bad guys moving faster than us? I think today they are, if that will ever change is a debate for a different day.

The world is going to deal with these problems, if the experts help it will go a lot smoother, if they don't we'll still get there, it just takes longer. Don't be the guy who wishes for the good old days. Figure out how to help.

Join the conversation, hit me up on twitter, I'm @joshbressers

December 12, 2015

Rippowam

Ossipee started off as OS-IPA. As it morphed into a tool for building development clusters, I realized it was more useful to split the building of the cluster from the install and configuration of the application on that cluster. To install IPA and OpenStack, and integrate them together, we now use an ansible-playbook called Rippowam.

Ossipee generates an inventory file, populated with two hostgroups, with one host apiece, and a set of variables necessary for installing IPA and OpenStack.

Rippowam is an Ansible playbook that picks up where Ossipee leaves off.

Here is an overview of the process that Rippowam performs.

  1. Install an internal Yum repo and upgrade all packages to their latest versions
  2. Install IPA, including the DNS backend, KRA, and Ipsilon
  3. Register the OpenStack control node as an IPA client
  4. Install RDO via packstack with the QPid backend for oslo-messaging
  5. SSL enable all of the services, via HTTPD configuration for Horizon and Keystone, and HAProxy for the rest
  6. Enable Kerberos and SSSD Federation for Keystone and enable token validation via Kerberos
  7. Fetch keytabs for all of the services, convert them to validate tokens via Kerberos, and modify the service configuration files to use the V3 Keystone API
  8. Enable SAML Federation (via Ipsilon) for Keystone
  9. Enable WebSSO via both mechanisms for Horizon
  10. Enable Kerberos for authorization from all services to MySQL. We tried encryption here, too, but there is a bug.
  11. Upgrade Oslo Messaging to use Proton and AMQP 1.0 via the Proton Driver, and enable Kerberos for authorization and encryption of the oslo-messaging broker, including ACLs so that only the services can communicate via the broker.

Click on the links to see the Ansible code that implements each of these features.

The ability to execute it was linked in with Ossipee. A user could either run:

./ossipee-create --section os1 rippowam

or execute the playbook directly like this:

ansible-playbook -i ~/.ossipee/deployments/ayoung.demorack/inventory.ini ~/devel/rippowam/site.yml

One thing you will notice that we don’t do is enable the LDAP backend for Keystone to use the IPA server. The Federated services are backed by IPA. This prevents users from attempting to pass their passwords to the Keystone server; Kerberos is enabled for all authentications.

Once the deployment is up and running, end users can take advantage of IPA integration. Two things we built upon this demo to show:

  1. A Virtual machine launched via Nova gets automatically registered with the IPA server and put into a corresponding Hostgroup
  2. Volumes can be encrypted with keys managed by the KRA

This was the code that set up our demonstration at the OpenStack summit in Tokyo. This is a work-in-progress.

This has been a huge team effort. Contributors were:

Jamie Lennox: Atlas, carrying the weight of this on his shoulders. A huge part of the Ansible work, master of all things with the openstack clients.
Ade Lee: Barbican
Rich Megginson: Puppet support and autoregistration of IPA clients
Robbie Harwood: Kerberos support in MySQL and an assist on QPid
Andrew Sticher: Kerberos Support in the Proton Driver
Ken Giusti: AMPQ Driver for Oslo Messaging to include Kerberos support
Rob Crittenden and John Dennis both helped clear technical hurdles while making Ipsilon work.
Nalin Dahyabhai for guidance and troubleshooting on X509, Certmonger, and Kerberos issues.

Me. I was involved in a little of each of these.

Nate Kinder, our Fearless leader and group manager that kept the engines running, including the most critical of managerial tasks: deflecting distractors.

Don’t expect to be able to run this just yet. Rippowam still uses repositories that are internal to Red Hat, both for RHEL and for OpenStack. While we were developing it, RHEL was in nightly build status. Now that it is GA, you could change the RHEL repo to one that is valid for you. I’ll update the base OS once CentOS 7.2 is GA. The OpenStack deploy is based on RDO. Again, since this was in part a QA effort, it uses repositories that are internal to make sure we were working with the specific versions of the packages that we are going to support. That said, there is nothing proprietary about where it is headed: all the code that Rippowam deploys is already publicly available. Rippowam is confirmation of our configuration.

Rippowam is based on Packstack. While this is fine for a proof of concept deployment, the goal is to now make this work with an RDO Manager based install.

December 10, 2015

My Git Workflow

It’s been a long while since I published a new entry on this blog, and I keep meaning to improve on that. Today, I was having a conversation with one of my colleagues and discussing how I set up my local git checkouts. It occurs to me that this might be of interest, so I figure I’ll restart this blog by describing it.

This blog will describe specifically my workflow when dealing with upstreams hosted on Github. There are only minor changes to it when discussing non-Github projects (mainly in the public forking process).

Initial Setup

First, I have to find a project to get involved in. For some people, this is a difficult process involving a great deal of contemplation. For me, on the other hand, I seem to pick up new projects to get involved in like they were falling from the sky. Most open-source projects these days seem to be hosted on Github (or at least have a presence there), so my workflow has become reasonably Github-centric.

I will skip over the part where I sign up for a Github account, set up two-factor authentication and upload my public SSH key, but rest assured that I have done all of those things. (If you aren’t using two-factor authentication anywhere that is even remotely important, fix that now. I also highly recommend the use of the open-source FreeOTP as a software token for either iOS or Android devices over Google Authenticator; it works anywhere Google Authenticator does.) You may also assume that I am properly logged in to Github at this point.

I’ll use the imaginary package “bar” created by Github user “foo” as my representative example. So I would browse to https://github.com/foo/bar and then click on the “Fork” button. (Since I also belong to several Github organizations, this prompts me for which entity I am cloning to, but if you only have a personal account, it will probably skip this phase.)

Local Repository Clone

Now that I have a public fork of the “bar” project, I want to be able to work with it. This means that I need to clone my repository to the local machine so I can operate on its contents. Github provides a handy way to identify the git URL needed for the cloning operation. When cloning my personal Github fork, I will want to clone using the “SSH” URL, which allows both reading from it and pushing changes. (I’ll talk about the “HTTPS” URL in a moment). To find the “SSH” URL, look on the main toolbar of the project’s Github main page; if the URL selector shows “HTTPS”, click on it and select “SSH”. The text box to the right of it will then contain a URL that looks something like:

git@github.com:sgallagher/bar.git

Now we will open a terminal window, change to an appropriate containing directory and run:

git clone git@github.com:sgallagher/bar.git

This will pull down a copy of the repository onto the local system, ready to work with. I can make whatever local changes I want and run `git push` to submit them to my public fork. However, we are not finished. The next step will be to create an additional “git remote” that points at the original upstream repository. This I do in order to be able to track other changes that are happening upstream (particularly so I can rebase atop others’ work and ensure that my work still applies atop the upstream code). So in this case, I would do the following: first, I would browse to https://github.com/foo/bar again to grab its “HTTPS” URL, and then run:

cd bar
git remote add upstream https://github.com/foo/bar.git
git remote update

This means that I am adding a new remote name (“upstream”) and associating it with the “HTTPS” (read-only) URL for the original project. (The `git remote update` piece causes me to pull the latest bits from upstream and store them locally.)

Special Case: Upstream Committer

In the special case where I also have commit privileges to the upstream repository, I also add another git remote called “upstream-push” using the “SSH” URL. Then, when I have patches ready to go upstream, I can ready them in a proper branch and then run

git push upstream-push local_branch_name:remote_branch_name

The reason for this additional upstream is to avoid accidental pushes to master (which is very easy to do if you have created a branch from e.g. upstream/remote_branch_name).
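
With both remotes in place, my day-to-day feature work looks something like this (the branch names here are just examples):

git fetch upstream
git checkout -b fix-widget upstream/master
# ...hack, commit, rebase as needed...
git push -u origin fix-widget

From there, a pull request can be opened from my fork’s fix-widget branch against the upstream master.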

Appendix: Useful Git Aliases

In addition to my workflow convention above, I have also created a number of very useful git aliases that I store in my ~/.gitconfig file.

[alias]
 patch = format-patch -M -C --patience --full-index
 patches = format-patch -M -C --patience --full-index @{upstream}..
 up = remote update
 new = log ..@{upstream}

git patch

This handy alias is essentially a wrapper around `git format-patch`, which is useful for generating comprehensive patch files for `git send-email` (among other uses). It creates patches that auto-detect file copies and renames, diffs using the “patience” algorithm (which tends to be more human-readable than other algorithms) and displays git indices using their un-shortened forms (to avoid ambiguities).

This alias needs to be invoked with additional `git format-patch` arguments, such as a specification of which commits to generate patches for.
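
For example, to generate a patch file for just the most recent commit, or for the last three (hypothetical invocations):

git patch -1
git patch HEAD~3..

Which leads us to: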

git patches

This is a variant of the above, except it will automatically generate all patches on the branch that are not present in the upstream branch being tracked. (So if you are tracking e.g. upstream/master, this will output all of the patches atop master).

git up

This is a quick shorthand for `git remote update` to automatically pull all of the new data from every remote configured in the checkout. This is particularly useful before a `git rebase -i upstream/branch` right before submitting a patch for code-review. (Which you should always do, to make the reviewer’s life easier.)

git new

This is another quick shorthand essentially for a variant of `git log` that only shows you your own commits atop the remote branch.



December 09, 2015

HA for Tripleo

Juan Antonio Osorio Robles was instrumental in getting me up and running with TripleO. He sent me the following response, which he’s graciously allowed me to share with you.

Glad it’s working finally :) . I recommend you also try to deploy HA, which is
enabled by appending the following to the “overcloud deploy” you already do:

--control-scale 3 -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server pool.ntp.org

(EDITORIAL NOTE. The final command line I ran looked like this:

openstack overcloud deploy --control-scale 3 --templates  --libvirt-type qemu -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml --ntp-server pool.ntp.org

)

For doing changes, something I usually do is just clone the tripleo-heat-templates repo:

git clone https://review.openstack.org/openstack/tripleo-heat-templates

and start using it on the overcloud deploy command (assuming you’re in
the home directory):

openstack overcloud deploy --templates tripleo-heat-templates/ --libvirt-type qemu

If you’re doing a modification to the HA templates/manifests also add the
manifest from the repo you just cloned:

openstack overcloud deploy --templates tripleo-heat-templates/ --libvirt-type qemu \
    --control-scale 3 -e tripleo-heat-templates/environments/puppet-pacemaker.yaml \
    --ntp-server pool.ntp.org

For configuring keystone on the overcloud what you want to start looking at is the following
puppet manifests:

tripleo-heat-templates/puppet/manifests/overcloud_controller.pp
tripleo-heat-templates/puppet/manifests/overcloud_controller_pacemaker.pp

The first will be used in non-HA mode while the second will be used instead when HA is
deployed.

At some point, I was having the issue that when deploying, heat was taking the puppet
manifests from the /usr/share/openstack-tripleo-heat-templates/ directory, instead of
the ones I wanted in the git repo I cloned in my home directory. So I got the habit of
removing that directory and creating a symlink that points to the git repo in my home
directory. Not sure if that’s the right thing to do, but it has worked for me pretty well
so far.
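
(EDITORIAL NOTE. In command form, that habit looks something like this; the paths are assumed from Juan’s description:

rm -rf /usr/share/openstack-tripleo-heat-templates
ln -s ~/tripleo-heat-templates /usr/share/openstack-tripleo-heat-templates

)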

I usually test HA and non-HA (especially if you're doing changes to the manifests), and an
easy way to start is to clean up your overcloud:

heat stack-delete overcloud

and re-deploy. Though, upgrade must also be tested. So, starting from an overcloud
deployment without your changes, you should then try upgrading the deployment.
You should be able to do upgrades with the same overcloud deploy command.
Although, you can also do it with straight heat.

December 08, 2015

Asking Me Questions about Keystone

As many of you have found out, I am relatively willing to help people out with Keystone related questions. Here are a couple guidelines.

Don’t ask me support questions here on the blog. I’ll lie or make something up and you will never know it. Instead, ask in #openstack-keystone if you can find me on IRC, or on the openstack or openstack-dev mailing list with [keystone] in the title. I might not answer, but someone that knows will. Or someone that doesn’t know will answer, and I’ll correct the answer, and then someone else will correct my correction.

It is a community. I don’t have all the answers, but I try to do my part to contribute.

If you do want to ask me the question in IRC, don’t send a message like this:

ayoung ping

I will not be happy. I might even be snippy and link to Ajax’s post about naked pings.

It is AOK to mention my IRC handle so I get notified, but then ASK YOUR QUESTION STRAIGHT OUT!

And then stay on IRC.  Like, just leave your client up for 24 hours. I’ll respond, but maybe not for an hour or two. Even if your question scrolls off the screen, I can pull it out of Eavesdrop.  Nothing is more frustrating than context shifting to answer a question, only to find the questioner has disappeared.

Don’t worry about being a neophyte:  very few people know Keystone in depth, and if you have the question, someone else does, too.

A really good thing to do:  post it on the OpenStack questions site, and then you can send me a message in IRC asking me to respond.  Then it is posted for all to see.  This is by far your best option.  Worried that someone else has already asked it?  Good reason to browse the archives.


Hacky Stappering.  Happy Stacking.


December 07, 2015

Getting Started with Tripleo

OpenStack is big. I’ve been focused on my little corner of it, Keystone, for a long time. Now, it is time for me to help out with some of the more downstream aspects of configuring RDO deployments. In order to do so, I need to do an RDO deployment. Until recently, this has meant Packstack. However, Packstack really is not meant for production deployments. RDO manager is the right tool for that. So, I am gearing up on RDO manager. The upstream of RDO Manager is TripleO.

I have a Dell T1700 with 32 GB of RAM and a single NIC. I am going to run everything I need in virtual machines on this one machine. While this does not match a production install, it seems to be the minimal hardware commitment to getting work done.

I’ve installed CentOS 7.1 on it. This is the latest released version of CentOS, the version that RDO is targeting for deployment.

It has booted and gotten an IP address from DHCP. I’ve copied this IP address to /etc/hosts on my laptop and given it the name ayoung_dell_t1700.test

From a login console, I edited /etc/ssh/sshd_config to allow root login:

PermitRootLogin yes

And then to connect to the machine automatically:

ssh-copy-id root@ayoung_dell_t1700.test 

There are many ways to get things installed. I am opting for minimal effort here, which means instack, and even that is too much work, so I am using inlunch to run instack.

One small hiccup I hit was that instack needs to expose port 2200 to the outside world in order to allow ssh into the VMs running attached to the nested network. I originally just stopped firewalld, but this kills NAT, which means the VMs can’t fetch packages from the outside world. To fix it, I opened up port 2200 on the hypervisor machine.

 systemctl start firewalld.service
 firewall-cmd --permanent --zone=public --add-port=2200/tcp
 firewall-cmd --reload

To run the install:

cp answers.yml.example answers.yml
INLUNCH_FQDN=ayoung_dell_t1700.test  ./instack-virt.sh

It took some time (I should have gotten lunch…) but it seems to have succeeded in getting the undercloud installed.

$ ssh stack@ayoung-dell-t1700.test -p2200
Last login: Thu Dec  3 15:49:51 2015 from 192.168.122.1
[stack@instack ~]$ . ./stackrc 
[stack@instack ~]$ openstack image list
+--------------------------------------+------------------------+
| ID                                   | Name                   |
+--------------------------------------+------------------------+
| 2a1dbaf1-d5b3-489c-943d-5fd8e1c84459 | bm-deploy-kernel       |
| 173d20f1-c160-4a4d-bbad-e5c04df5e0be | bm-deploy-ramdisk      |
| eff2db2f-0c75-4709-953b-27ca93797e8e | overcloud-full         |
| a7b83a9b-b859-449e-852c-13ac2a563330 | overcloud-full-vmlinuz |
| 8b3ccdae-c6b8-491e-a964-e5c4defe6b30 | overcloud-full-initrd  |
+--------------------------------------+------------------------+

To install the overcloud, I ran:

. ./stackrc 
openstack overcloud deploy --templates  --libvirt-type qemu

Which is still running as I write this (I really should get lunch). To check the status as it runs, in a second ssh session:

[stack@instack ~]$ . ./stackrc 
[stack@instack ~]$ heat resource-list overcloud -n5 | grep PROG
| Compute                                   | f5060a84-eacc-4f65-9aea-bf7457635bc4          | OS::Heat::ResourceGroup                           | CREATE_IN_PROGRESS | 2015-12-03T15:38:06 | overcloud                                                                       |
| 0                                         | 3c28c035-d047-4201-9d6b-b970244231b2          | OS::TripleO::Compute                              | CREATE_IN_PROGRESS | 2015-12-03T15:38:24 | overcloud-Compute-njgvkybncrzg                                                  |
| NetworkDeployment                         | 35297ccc-065f-4dc7-bb47-8b9f616ff0b4          | OS::TripleO::SoftwareDeployment                   | CREATE_IN_PROGRESS | 2015-12-03T15:38:25 | overcloud-Compute-njgvkybncrzg-0-z3nnepjk3zfz                                   |
| UpdateDeployment                          | 313941dd-f19b-4e15-97e6-31c7bf31f47a          | OS::Heat::SoftwareDeployment                      | CREATE_IN_PROGRESS | 2015-12-03T15:38:25 | overcloud-Compute-njgvkybncrzg-0-z3nnepjk3zfz                                   |
[stack@instack ~]$ heat resource-list overcloud -n5 | grep PROG
| Compute                                   | f5060a84-eacc-4f65-9aea-bf7457635bc4          | OS::Heat::ResourceGroup                           | CREATE_IN_PROGRESS | 2015-12-03T15:38:06 | overcloud                                                                       |
| 0                                         | 3c28c035-d047-4201-9d6b-b970244231b2          | OS::TripleO::Compute                              | CREATE_IN_PROGRESS | 2015-12-03T15:38:24 | overcloud-Compute-njgvkybncrzg                                                  |
| NetworkDeployment                         | 35297ccc-065f-4dc7-bb47-8b9f616ff0b4          | OS::TripleO::SoftwareDeployment                   | CREATE_IN_PROGRESS | 2015-12-03T15:38:25 | overcloud-Compute-njgvkybncrzg-0-z3nnepjk3zfz                                   |
| UpdateDeployment                          | 313941dd-f19b-4e15-97e6-31c7bf31f47a          | OS::Heat::SoftwareDeployment                      | CREATE_IN_PROGRESS | 2015-12-03T15:38:25 | overcloud-Compute-njgvkybncrzg-0-z3nnepjk3zfz                                   |
[stack@instack ~]$ exit
[stack@instack ~]$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+
| 59761ad5-e72c-459e-b88f-e2f58ec494e9 | None | None                                 | power off   | available          | False       |
| cd73ef5a-5889-41bb-b936-6dede3fa5811 | None | None                                 | power off   | available          | False       |
| d6de3560-0e47-4add-a350-dc67f5ced804 | None | None                                 | power off   | available          | False       |
| 7f7816ff-c284-4ac7-917f-67d0c5521d56 | None | 19c4d6e1-7a3d-43f6-b8c1-b761082a5d46 | power on    | active             | False       |
| f1a7da5d-74f1-4113-8a49-aa1dd5a9ed56 | None | None                                 | power off   | available          | False       |
| 8fa69e00-559f-430c-ad5d-2438980b2f4a | None | 2ca4888d-b803-49a2-8e4e-d1f4d271f3ce | power on    | active             | False       |
+--------------------------------------+------+--------------------------------------+-------------+--------------------+-------------+

Once it is done running, there is a separate keystone resource file. Sourcing that allows the user to query the overcloud:

$ . ./overcloudrc 
[stack@instack ~]$ openstack user list
+----------------------------------+------------+
| ID                               | Name       |
+----------------------------------+------------+
| 187635147c994b35a8ed438c04f25645 | swift      |
| 39f8484d167b4f809e576b4433e62f19 | cinder     |
| 6ead6ea9f2af4ac295e8d75e0e3f960d | heat       |
| 867d4a61fba94d1382473629f662eff9 | admin      |
| 89930c33a4314fa289252f47f76f0a0e | cinderv2   |
| a69eaa0ff159403587b0f4ae6115174a | ceilometer |
| c3d5b13e8d354a83b77d12cbe2be4950 | neutron    |
| d4e09a73a9b54a7d819deac1afcbabdf | glance     |
| d64c2792a8364750bf875ee7235471f9 | nova       |
+----------------------------------+------------+
[stack@instack ~]$ openstack compute service list
+------------------+-------------------------+----------+---------+-------+----------------------------+
| Binary           | Host                    | Zone     | Status  | State | Updated At                 |
+------------------+-------------------------+----------+---------+-------+----------------------------+
| nova-cert        | overcloud-controller-0  | internal | enabled | up    | 2015-12-07T19:33:47.000000 |
| nova-consoleauth | overcloud-controller-0  | internal | enabled | up    | 2015-12-07T19:33:50.000000 |
| nova-scheduler   | overcloud-controller-0  | internal | enabled | up    | 2015-12-07T19:33:43.000000 |
| nova-conductor   | overcloud-controller-0  | internal | enabled | up    | 2015-12-07T19:33:44.000000 |
| nova-compute     | overcloud-novacompute-0 | nova     | enabled | up    | 2015-12-07T19:33:41.000000 |
+------------------+-------------------------+----------+---------+-------+----------------------------+
openstack image create --location https://launchpadlibrarian.net/170024918/cirros-0.3.2-source.tar.gz  cirros
+------------------+--------------------------------------+
| Field            | Value                                |
+------------------+--------------------------------------+
| checksum         | None                                 |
| container_format | bare                                 |
| created_at       | 2015-12-07T19:36:47.000000           |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | raw                                  |
| id               | f6690870-96fd-4b1a-80fb-dee80b0169ef |
| is_public        | False                                |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | cirros                               |
| owner            | 0018147affb84b3b946b40a105767b12     |
| properties       |                                      |
| protected        | False                                |
| size             | 429582                               |
| status           | active                               |
| updated_at       | 2015-12-07T19:36:47.000000           |
| virtual_size     | None                                 |
+------------------+--------------------------------------+
[stack@instack ~]$ openstack server create --image cirros --flavor m1.tiny  test
+--------------------------------------+-----------------------------------------------+
| Field                                | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | None                                          |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | None                                          |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000001                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | None                                          |
| OS-SRV-USG:terminated_at             | None                                          |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| addresses                            |                                               |
| adminPass                            | TmSwanMP3A4K                                  |
| config_drive                         |                                               |
| created                              | 2015-12-07T19:37:19Z                          |
| flavor                               | m1.tiny (1)                                   |
| hostId                               |                                               |
| id                                   | 67166662-64a4-4265-8e3a-11926063e8c3          |
| image                                | cirros (f6690870-96fd-4b1a-80fb-dee80b0169ef) |
| key_name                             | None                                          |
| name                                 | test                                          |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| project_id                           | 0018147affb84b3b946b40a105767b12              |
| properties                           |                                               |
| security_groups                      | [{u'name': u'default'}]                       |
| status                               | BUILD                                         |
| updated                              | 2015-12-07T19:37:19Z                          |
| user_id                              | 867d4a61fba94d1382473629f662eff9              |
+--------------------------------------+-----------------------------------------------+
[stack@instack ~]$ openstack server  list
+--------------------------------------+------+--------+----------+
| ID                                   | Name | Status | Networks |
+--------------------------------------+------+--------+----------+
| 67166662-64a4-4265-8e3a-11926063e8c3 | test | ACTIVE |          |
+--------------------------------------+------+--------+----------+

Inlunch is Jiri’s personal tool, and I thank him for sharing it. It is not intended to be a large community effort, and it may break or bit-rot in the future.

In order to try and get something that is supportable, John Trowbridge has started a comparable effort called tripleo-quickstart, which we are going to try to have ready for the Mitaka based RDO test day this cycle. The major differences between these two efforts:

  • inlunch is upstream tripleo; tripleo-quickstart is RDO based.
  • inlunch generates the VM images; tripleo-quickstart downloads them.
  • inlunch has a lot of the customization in bash scriptlets configurable from the answers.yml, which reflects its use as a developer's tool; tripleo-quickstart is much more straight Ansible.

Security lacks patience
I had a meeting with some non-security people to discuss some of the challenges around security. It's a rather popular topic these days, but nobody knows quite what it means (remember 5 years ago when everyone talked about cloud but nobody knew what that meant?). The details are irrelevant; the most important thing that came out of this meeting was someone pointing out that, as a group, we are impatient.

I sort of knew this, but I wouldn't have listed this in the top 10 of "what's wrong with us".

What does it mean to be impatient? We don't listen as well as we should. We get tired of having to explain the same thing over and over again. We don't like to talk to someone who knows less than us (which is everyone). There are plenty of other examples, I'm not going to dwell on them though. It's likely many of us have no idea we're impatient.

I think the most important aspect of this though is how we deal with new ideas. In almost every instance of someone proposing a new idea, we rarely talk to them about it; we spend the time telling them why they're wrong. There is nothing security people like to do more than tell someone why they're wrong. Being technically correct is the best kind of correct!

I was at a working group recently where a number of people suggested new ideas. In almost every case the majority of time was spent explaining to them why their ideas were stupid and would never work. This isn't a good use of time. It's the help or shut up concept. We're not patient, we don't want to engage, we just want to prove why we're right and get back to doing nothing. Don't be this person, if you don't have constructive feedback listen instead of talking. Bad ideas generally self destruct during discussion, and discussion makes good ideas great.

Has bluntly telling someone their idea is stupid ever actually worked? I bet in almost every instance they double down and will never listen to you again. This is how bad ideas become bad projects.

How do I be more patient?

Being more patient isn't all that hard in theory, but it's really hard if you're used to proving everyone wrong all the time. You just have to learn to listen. It sounds simple but for most security people it's going to be really hard, one of the hardest things you'll ever do. Let's cover some examples.

A new way to classify security flaws is proposed, you think it's dumb. Do you
  1. Tell them why they're wrong
  2. Argue over why your way is better (even though you don't really have a way)
  3. Sit there and listen, even though it feels like your insides want to jump out and start yelling
The correct answer is #3. It's really hard to listen to someone else speak if you think they're wrong. There are few feelings of satisfaction like completely destroying someone's idea because it wasn't thought all the way through. This is why nobody likes you.

You find a remote execution flaw in some code a coworker wrote. Do you
  1. Make sure everyone knows they did this and push to revoke their git access
  2. Tell them how stupid they are and demand they fix the problem without any help
  3. Teach them how to fix the problem, listening to what they say while they're trying to learn
#1 and #2 are pretty much the way things work today. It's sort of sad when you really think about it.

If you just sit and listen, people will talk. Most people don't like silence. If you say nothing, they will say something. In the above example, the person you listen to will start to talk about why they did what they did. That will give you what you need to teach them what they need to know. This is how you gain wisdom. We are smart, we are not wise.

Listening is powerful. Patience is listening. Next time you're talking to someone, no matter what the topic is, just sit and listen. Make a point not to speak. You'll learn things you never dreamt of, and you'll build trust. Listening is more powerful than talking, every time.

Join the conversation, hit me up on twitter, I'm @joshbressers

November 30, 2015

Where is the physical trust boundary?
There's a story of a toothbrush security advisory making the rounds.

This advisory is pretty funny, but it matters. The actual issue with the toothbrush isn't a huge deal; an attacker isn't going to do anything exciting with the problems. The interesting issue is that this is the first of many problems like it that we're going to see.

Today some engineers built a clever toothbrush. Tomorrow they're going to build new things, different things. Security will matter for some of them. It won't matter for most of them.

Boundaries of trust


Today when we try to decide if something is a security issue we like to ask the question "Does this cross a trust boundary?" If it does, it's probably a security issue. If no trust boundary is crossed, it's probably not an issue. There are of course lots of corner cases and nuance, but we can generally apply the rule.

Think of it this way. If a user can delete their own files, that's not crossing a trust boundary, that's just doing something silly. If a user can delete someone else's files, that's not good.
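
A toy sketch of that rule in Python (the delete_file helper and its ownership check are purely illustrative, not from any real service):

import os

def delete_file(requesting_uid, path):
    # Deleting your own file is silly but allowed; deleting a file
    # owned by someone else crosses a trust boundary.
    if os.stat(path).st_uid != requesting_uid:
        raise OSError("trust boundary crossed: not your file")
    os.remove(path)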

This starts to get weird when we think about real things though.

Boundaries of physical trust?


What happens in the physical world? What counts as a trust boundary? In the toothbrush example above an attacker could gain knowledge of how someone is using a toothbrush. That's technically a trust boundary (an attacker can gain data they're not supposed to have), but let's face it, it's not a big deal. If your credit card number was also included in the data, sure no question there.

As it stands, we're talking about data that isn't exciting. You could make an argument about tracking a user's data over time and across devices, but let's not go there right now. Let's keep the thinking small and contained.

Where do we draw the line?


If we think about physical devices, what are our lines? The concept of a trust boundary alone doesn't really work here. I can think of three lines, all of which are important, but not equally important.
  1. Safety
  2. Harm
  3. Annoyance

Safety

When I say safety I'm thinking about a device that could literally kill a person. This could be something like disabling the brakes on a car. Making a toaster start a fire. Catastrophic events. I don't think anyone would ever claim this class of issues isn't a problem. They are serious, I would expect any vendor to take these very seriously.

Harm

Harm would be where someone or something can be hurt. Nothing catastrophic. Think maybe a small burn, or a scrape. Perhaps making someone fall down when using a scooter, or burn themselves with a device. We could argue about this category for a while; the line between harm and catastrophe will get fuzzy. Some vendors will be less willing to deal with these, but I bet most get fixed quickly.

Annoyance

Annoyance is where things are going to get out of hand. This is where the toothbrush advisory lives. In the case of a toothbrush it's not going to be a huge deal. Should the vendor fix it? Probably. Should you get a new toothbrush over it? Probably not.

The nuance will be deciding which annoying problems deserve fixes and which ones don't. Some of these problems could cost you money. What if an attacker can turn up your thermostat so your furnace runs constantly? Now we have an issue that can cost real money. What if a problem with your 3D printer ruins a spool of filament? What if the oven burns the Christmas goose?

Where is our trust boundary in the world of annoying problems? You can't just draw the line at money and goods. What happens if you can ring a person's door bell and they have to keep getting up to check the door? Things start to get really weird.

Do you think a consumer will be willing to spend an extra $10 for "better security"? I doubt it. In the event a device will harm or kill a person there are government agencies to step in and stop such products. There are no agencies for leaking data and even if there were they would have limited resources. Compare "annoyance security" to all the products sold today that don't actually work, who is policing those?

As of right now our future is going to be one where everything is connected to the Internet, none of it is secure, and nobody cares.

Join the conversation, hit me up on twitter, I'm @joshbressers

November 20, 2015

If your outcome is perfect or nothing, nothing always wins
This tweet
https://twitter.com/RichFelker/status/666325066838339584

Led to this thread
http://marc.info/?t=144778171800001&r=1&w=2

The short version is that some developers from Red Hat are working on gcc, attempting to prevent ROP style attacks. More than one person has accused this work of being pointless and a waste of time. It's not; the real waste of time is arguing about why trying new things is dumb.

Here's the important thing security people always screw up.

The only waste of time is if you do nothing and complain about the people who are doing something.

It is possible the ROP work that's being done won't end up preventing anything. If that's true, the absolute worst thing that will result is a lesson learned. It's all too easy in the security space to act like this: if it's not perfect, you can make the argument it's bad. It's a common trait of a dysfunctional group.

This is, however, true in crypto: never invent your own crypto algorithm.

But in the context of humanity, this is how progress happens. First someone has an idea. It might be a terrible idea, but they work on it, then they get help, and the people helping expand and change the idea. Eventually, after people work together, the result is greater than the sum of its parts. Or, if it's a bad idea, it goes nowhere. Failure only exists if you learn nothing.

This isn't how security has worked, it's probably why everything seems so broken. The problem isn't the normal people, it's the security people. Here's how a normal security idea happens:
  1. Idea
  2. YOUR IDEA IS STUPID YOU'RE WASTING YOUR TIME AND YOU'RE STUPID!!!
  3. Give up
That's madness.

From now on, if someone has an idea and you think it's silly, say nothing. Just sit and watch. If you're right, it will go down in flames and you can run around giving hi5s. It probably won't though. If someone starts something, and others come to help, it's going to grow into something, or they'll fail and learn something. This is how humans learn and get better. It's how open source works; it's why open source won. It's why security is losing.

The current happy ending to the ROP thread is that the work is going to continue; the naysayers seem to have calmed down for now. I was a bit worried for a while, I'll admit. I have no doubt they'll be back though.

Help or shut up. That is all.

Join the conversation, hit me up on twitter, I'm @joshbressers

November 18, 2015

Translating Between RDO/RHOS and Upstream OpenStack releases

There is a straightforward mapping between Red Hat Enterprise Linux OpenStack Platform (RHOS) release numbers and the upstream and RDO releases of OpenStack. I can never keep them straight. So, I write code.

UPDATE1: missed Juno before…this is why we code review
UPDATE2: had RDO using the RHOS version numbers, but it is in sync with upstream.

#!/usr/bin/python

# Upstream release names in order. The first three (Austin, Bexar,
# Cactus) predate the RHOS product line, so RHOS version v maps to
# upstream release v + 3.
upstream = ['Austin', 'Bexar', 'Cactus', 'Diablo', 'Essex', 'Folsom',
            'Grizzly', 'Havana', 'Icehouse', 'Juno', 'Kilo', 'Liberty',
            'Mitaka', 'N', 'O', 'P', 'Q', 'R', 'S']

for v in range(0, len(upstream) - 3):
    print "RHOS Version %s = upstream %s" % (v, upstream[v + 3])

RHOS Version 0 = upstream Diablo
RHOS Version 1 = upstream Essex
RHOS Version 2 = upstream Folsom
RHOS Version 3 = upstream Grizzly
RHOS Version 4 = upstream Havana
RHOS Version 5 = upstream Icehouse
RHOS Version 6 = upstream Juno
RHOS Version 7 = upstream Kilo
RHOS Version 8 = upstream Liberty
RHOS Version 9 = upstream Mitaka
RHOS Version 10 = upstream N
RHOS Version 11 = upstream O
RHOS Version 12 = upstream P
RHOS Version 13 = upstream Q
RHOS Version 14 = upstream R
RHOS Version 15 = upstream S
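
Going the other direction uses the same offset of 3; here is a hypothetical little helper (reusing the upstream list from the script above) that maps an upstream name back to its RHOS version:

def rhos_version(release):
    # Same mapping in reverse: upstream name -> RHOS version number
    return upstream.index(release) - 3

print "Liberty ships as RHOS %s" % rhos_version('Liberty')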

I’ll update once we have names for N and O.

November 16, 2015

Your containers were built in some guy's barn!
Today containers are a bit like how cars used to work a long long long time ago. You couldn't really buy a car, you had to build it yourself or find someone who could build one for you in their barn. The parts were terrible and things would break all the time. It probably ran on steam or was pulled by a horse.

Containers aren't magic. Well, they are for most people. Almost all technology is basically magic for almost everyone. There are some who understand it but generally speaking, it's complicated. People know enough to get by, which is fine, but that also means you have to trust your supplier. Your car is probably magic to you. You put gas in a hole in the back, then you can press buttons, push pedals, and turn wheels to transport you places. I'm sure a lot of people at this point are running through the basics of how cars work in their heads to reassure themselves it's not magic and they know what's going on!

They're magic, unless you own an engine hoist (and know how to use it).

Now let's think about containers in this context. For the vast majority of container users, they get a file from somewhere, it's full of stuff that doesn't make a lot of sense. Then they run some commands they found on the internet, then some magic happens, then they repeat this twiddling things here and there until on try 47 they have a working container.

It's easy to say it doesn't matter where the container content came from, or who wrote the dockerfile, or what happens at build time. It's easy because we're still very early in the life of this technology. Most things are still fresh enough that security can squeak by. Most technology is fresh enough you don't have to worry about API or ABI issues. Most technology is new enough it mostly works.

Even as new as this technology is, we are starting to see reports of how many security flaws exist in docker images. This will only get worse, not better, if nothing changes. Almost nobody is paying attention; containers mean we don't have to care about this stuff, right!? We're at a point where we have guys building cars in their barns. Would you trust your family in a car built in some guy's barn? No, you want a car that is built with good parts and has been safety tested. Your containers are being built in some guy's barn.

If nothing changes, imagine what the future will look like. Imagine if we had had containers in 1995: there would still be people deploying Windows 95 in a container and putting it on the Internet. In 20 years, there are still going to be containers we use today being deployed. Imagine still seeing Heartbleed in 20 years if nothing changes; the thought is horrifying.

Of course I'm being a bit overdramatic about all this, but the basic premise is sound. You have to understand what your container bits are. Make sure your supplier can support them. Make sure your supplier knows what they're shipping. Demand containers built with high quality parts, not pieces of old tractors found in some barn. We need secure software supply chains; there are only a few places doing it today. Start asking questions and paying attention.

Join the conversation, hit me up on twitter, I'm @joshbressers

November 11, 2015

Is the Linux ransomware the first of many?
If you pay any attention to the news, you've no doubt seen the story of the Linux ransomware that's making the rounds. There has been much said about the technical merits of this, but there are two things I keep wondering.

Is this a singular incident, or the first of many?

You could argue this either way. It might be a one off blip, it might be the first of more to come. We shouldn't start to get worked up just yet. If there's another one of these before the year ends I'm going to stock up on coffee for the impending long nights.

Why now?

Why are we seeing this now? Linux and Apache have been running a lot of web servers for a very long time. Is there something different now that wasn't there before? Unpatched software isn't new. Ransomware is sort of new. Drive-by attacks aren't new. What is new is the amount of attention this thing is getting.

It is helpful that the author made a mistake, so the technical analysis is more interesting than it would be otherwise. I wonder if this would have been nearly as exciting without that.

If this is the first of many, 2016 could be a long year. Let's hope it's an anomaly.

Join the conversation, hit me up on twitter, I'm @joshbressers

November 10, 2015

You don't have Nixon to kick around any more!
There has been a bit of noise lately around some groups not taking security as seriously as they should. Or maybe it's that the security folks don't think they take it as seriously as they should. Someday there is going to be a security mushroom cloud! When there is, you won't have Nixon Security to kick around anymore!

Does it matter?

I keep thinking about people who predict the end of the world, there hasn't been one of these in a while now. The joke is always "someday they'll be right".

We're a bit like this when it comes to computer security. The security guys have been saying for a very long time "someday you'll wish you listened to us!" I'm not sure this will even happen though. There will be localized events of course, but I doubt there will be one singular thing, it'll likely be a long slow burn.

The future won't be packetized.

The world is different now. I don't think there will be some huge changing event, and the reason is exactly the thing we expect to trigger one: open source won. But that doesn't mean security wins next; it means security wins never.

Will there be a major security event that makes everyone start paying attention? I don't think so. If you look at history, a singular major event can cause a group to quickly change direction and unite them all. This happened to Microsoft, their SDL program got created, things like Nimda and Code Red gave them purpose and direction. But Microsoft was a single entity, one person could demand they change direction and everyone had to listen. If you didn't listen, you got a new job.

Imagine what would happen if anyone inside an open source project did this, even if they are viewed as the "leader"? It would be a circus. You would have one group claiming this is great (that's us), one claiming this is dumb (those are the armchair security goofs) and a large group who wouldn't care or change their behavior because there's no incentive.

You can't "hack" open source. A single project can be attacked or have a terrible security record. Individual projects may change how they work, but fundamentally the whole ecosystem won't drastically change. Nobody can attack everything, they can only attack small bits. Now don't think this is necessarily bad. It's how open source works and it is what it is. Open source won I dare not question the methodology.

At the end of the day, the way we start to get security to where we want it will be with a few important ideas. Once we have containers that can be secured, for example, some bugs go away. I always say there is no security silver bullet. There isn't one; there will be many. It's the only way any of this will work out. Expecting everyone to be a security expert doesn't work; expecting volunteers to care about security doesn't work.

The future of open source security lies with the integrators, the people who take lots of random projects and put them together. That's where the accountability lives; it's where it belongs. I don't know what that means yet, but I suspect we'll find out in the near future as security continues to be a hot topic.

It's a shame I'm not musical. Security Mushroom Cloud would be a great band name.

Join the conversation, hit me up on twitter, I'm @joshbressers

November 06, 2015

Leadership in Software Development Part 4

Principle #10 – Build A Team

Principle #11 – Employ Your Team In Accordance With Its Capabilities

No one owns the code. Everyone owns the code. While not everyone has the same capabilities, we all contribute to a common code base, and we all want it to be of as high a quality as possible. There are a handful of tools that are essential to distributed development: bug trackers, wikis, and etherpads are essential, but so are IRC, email, and, lately, a code review system. These are the tools by which you communicate.  Communication is key, and respectful communication is essential.

Your job as a leader is not just to communicate, but to ensure that others are communicating.  You need the right technical tools, and the right attitude.  To build a team, you need to set a vision of where the project is going, and help people realize how that vision works to their advantage.  You need to be willing to get obstacles out of developers’ way.  You need to make sure you are not that obstacle.

Inspire cross-team communication.  Keep it light, without letting humor degenerate into something hurtful.  Keep a constant eye out for bottlenecks that will discourage contributors. If two people are working on two different aspects of the same project, put them in communication with each other.  Facilitate the language choices to make sure they have a common solution, even if they have different problems to solve.

Diversity of interests helps build a team.  Some people are more detail oriented, and do wonderful code reviews.  Some people are dreamers who have grand ideas for project directions. Some people have skills in more esoteric areas like cryptography or databases.  The net total set of skills on a team increases that team’s capabilities.  Thus, balance the need for different skills against the need for a commonality of purpose.

At some level, programming is programming. But there is a different skill set in doing user interface work than in doing highly performant multiprocess number crunching. Your community coalesces around your project due to both a shared need and a common skill-set. Make sure that your project stays within its bounds.

But…that isn’t always possible. When I was a team leader for a Java based web application running on JBoss, we were affected by a Kernel scheduling issue that caused JBoss to get killed by the Out of Memory (OOM) killer. While it didn’t devolve on me to fix the kernel, it did mean that I had to understand the problem and find a work around for our company. I had enough of a Linux background at that point that I was able to get us onto a stable kernel until the problem was resolved upstream.

However, I was also aware that I was spread too thin. As a team leader, I picked up all of the tasks that had to be done, but that were too small to justify distracting a team member who was working on a strategic problem. I was the lead programmer, QA engineer, and system administrator, as well as the guy that had to do all of the management tasks that had nothing to do with coding. Something had to give, and I got my boss to hire a system administrator. Knowing what your team needs in order to succeed, and knowing when you don’t have the skill-set in house, is vital to getting a product built and shipped.

Leadership in Software Development Part 3

Principle #7 – Keep Your Team Informed

Communication is the key to any operation. In the Army, they taught that an Infantry Soldier needs to do three things in order to succeed: shoot, move, and communicate. Well, there should be very little gunfire in open source development, so shooting is less essential. Movement too, since most things happen via the network. But communication is paramount. Tell people what you are going to do. A great decision left uncommunicated is no decision. In the absence of information, people will make assumptions. It is easier to correct mistakes early, and identifying them requires review and correction.

You might not know everything that people want to know. Tell them that. Maybe you don’t have a release schedule. Knowing that there is no fixed release schedule is better than wondering when the release is going to come.

Maybe you haven’t had time to review a patch. Let the submitter know that, and you will get to it when you can. In that exchange, you might learn that it really is not a high priority issue, and you can prioritize down.

Principle #8 – Develop A Sense of Responsibility In Your Team

Tough to do this, but straightforward. Set the example. Show the people that are involved with your project that you believe in it. Some will stick, some will not. The ones that do will do so for varied reasons. But not all will have the bigger picture.

Free and Open Source Software (FOSS) carries with it a built in sense of ownership, and a corresponding responsibility. Once you realize that you “can” fix something, you often feel you “should.” The trick is to get people to “do.” This means inspiring them to do so.  Respectful communication is key.  Go back and reread “Schofield’s Definition of Discipline.”

A sense of responsibility means that they learn the full software development life cycle.  Just finishing a feature is the starting point; it needs to be tested, documented, and released, and all these things require effort as well.  They are not “fun” and they can be grinding.  If a patch conflicts with another patch, it has to be rebased and deconflicted, often manually.  This can be frustrating as well. Only if a team member “owns” the patch will they be willing to put in the effort to see it through.

A trickier aspect is to get a developer from an organization with a particular perspective to understand demands from other organizations.  We’ve seen this a lot in OpenStack.  I work for a company that distributes software that other companies need to install.  OpenStack is a product to me. My customers have a particular set of demands.  I work very closely with people from other companies that have large, public clouds.  OpenStack is something that they deploy, often right out of the upstream repository, to a limited set of deployments.  It is a key part of their revenue model, and if it breaks, they suffer.  Both perspectives are important, and can be complementary.  Features built for one model can often be critical to the other model…once they are understood.  You have to be responsible to all of the downstream deployers of your project, not just the ones that pay your paycheck, or the project suffers.

Principle #9 – Ensure Each Task is Understood, Supervised and Accomplished

Not everyone gets to do the fun stuff. But sometimes, people don’t even realize what needs to be done to get a project done. Writing the code that gets executed at run time is the focus, but there is all the other stuff: packaging and install scripting, startup and cleanup, database management, and automated QA. People need to write unit tests to go along with their features. All of this stuff is important, and as the project leader, you have to make sure it happens. For a small project, you may do it all. However, as some have noted, Linus doesn’t scale. The Linux kernel has a strong delegation culture, and a slew of companies that fill in the gaps to make sure that aspects of the Kernel are solid.

A bug tracker is key.  I am a fan of the approach where everything goes in the bug tracker as it minimizes the number of systems to check.  Bug trackers are not the best for submitting code patches, though, and so linking the patch submission process to bugs is essential.  We’ve been using Gerrit on OpenStack and it is a fairly successful approach.  FreeIPA does everything on the mailing list, which also works, so long as you periodically comb the list to cross reference submitted versus merged changes.  The larger the project, the more essential it is to automate the tracking process.

Now, not every bug is going to get fixed in every release, and not every feature will be implemented.  Prioritize and select among them, and make sure the most essential efforts get priority.  It is OK to postpone a feature to the next release, and then the one after that if the priority is just not there; often, you find a key feature emerges that obviates an old approach.

The hardest thing to do is to tell someone that the patch that they have put a lot of effort into writing is not acceptable.  This is often due to the need for testing, or because the patch is going in the wrong direction.  If this happens, make sure you approach the developer with the respect due and explain clearly why you are choosing not to include their patch.  Keep an open mind: often, a year from now, you will realize that today’s “bad” approach is what you wish you had done.  Keep the code around and retrievable, but understand that each submission has its cost.  Just adding code to a project may increase the load on testing, docs, and deployment, and you have to justify that effort.  If you are going to perform a task, you need to ensure all the specified and implied tasks that surround it are accomplished, too.

 

Leadership in Software Development Part 2

Principle #6 – Know Your Personnel and Look Out for Their Well Being

In an Open Source software project, who are “your people?” Your people are your community. Whether they are a fellow developer from your own company, the guy that pops in once every couple of months to make a typo fix, or someone that just reports bugs, they are all the people that lead to the success (or lack thereof) of your project.

Since they don’t report to you (normally) you can’t look out for their well being the same way an Army Officer is expected to take care of the Soldiers in the unit. You won’t be checking their feet for frostbite unless it is after a drunken Meetup on a winter night. Most open source developers will not meet each other face to face.

What you do need to do is be aware of the reasons that the people drawn to your project have for getting involved. The most common reason is that your project is essential to getting their “day job” done. As such, taking care of them means doing right by the project. Probably most important is to be responsive to patch submissions. If a user submits a patch, it means that they care about the feature or bug addressed by that patch. It might be essential to them putting your product into live deployment, or to shipping their own product. You have to be smart: balance stability against responsiveness. Communicate; don’t let changes sit unanswered.

As with most organizations, there are going to be different viewpoints on topics. As a leader, it is not your job to make every last decision. Part of being a grown-up is letting go of control, especially about the things that you care less about. Take input from many community members on process, code standards, and dependencies, and let consensus grow. Sometimes you need to make the big decisions; just don’t feel the need to make them all.

One of the quotes on leadership that has made the deepest impression on me is Schofield’s Definition of Discipline:

The discipline which makes the soldiers of a free country reliable in battle is not to be gained by harsh or tyrannical treatment. On the contrary, such treatment is far more likely to destroy than to make an army. It is possible to impart instruction and to give commands in such a manner and such a tone of voice as to inspire in the soldier no feeling but an intense desire to obey, while the opposite manner and tone of voice cannot fail to excite strong resentment and a desire to disobey. The one mode or the other of dealing with subordinates springs from a corresponding spirit in the breast of the commander. He who feels the respect which is due to others cannot fail to inspire in them regard for himself, while he who feels, and hence manifests, disrespect toward others, especially his inferiors, cannot fail to inspire hatred against himself.

You have to respect the people in your community, especially the ones that you disagree with the most. Your communication should be respectful. It is easy to assume the worst in someone, to get angry, and to lose your head. You will regret it. And nothing disappears from the internet, at least not quickly.

 

Leadership in Software Development Part 1

I’ve been in and out of leadership roles from High School onward. For the past decade and a half, I’ve been a professional software developer. During that time, I’ve been in a leadership position roughly a third of the time. Recently, I was asked to evaluate my Leadership Philosophy (more on that later). I’ve also had to do the annual counselling that my company formalizes.

One tool we learned in the Army was the list of Leadership Principles. As part of my evaluation, I want to see how I think they apply to what I do: Software Development in an Open Source project space. Here’s what I’ve come up with so far:

Principle #1 – Know Yourself and Seek Self Improvement
Principle #2 – Be Technically Proficient

At first blush, these may seem to be the same thing. However, this is leadership focused, and the two points emphasize different aspects of competency. It is impossible to lead in software development without knowing what you are doing technically. But Principle 1 is referring to leadership skills in particular, as well as any aspect of your life outside of coding that can impact your job. Punctuality, cleanliness, clarity of communication, focus, temper, and so forth. You might be the smartest code jockey in history, but it doesn’t mean you have all the necessary skills to lead a team.

That said, you should be able to do the job of everyone under you in your team should the need arise. Probably the most important reason for this is so that you can ensure that what each team member is doing contributes to the overall success of the team. If you cannot read SQL, you won’t be able to understand what your DBA is proposing. Code is code, and you should be comfortable with all programming languages and paradigms. What, you are not? Get studying.

Principle #3 – Seek Responsibility and Take Responsibility for Your Actions
Principle #4 – Make Sound and Timely Decisions

Sometimes a technical lead or managerial position gets thrown in your lap, but that usually happens after you have shown that you can do the job. That comes from solving problems.

Now, “Take Responsibility for Your Actions” might sound like advice to be willing to admit when you are wrong. That is only a small part of it. In reality, development is filled with thousands of tiny decisions, and not all of them are going to be optimal. Yes, sometimes you will make big mistakes, or will have to take the hit for a mistake someone on your team made despite your best efforts. That is the lesser part of taking responsibility.

The greater part is understanding that your job is to get high quality software out to your customers. You need to ensure that all of the links in that chain are covered. You might have the greatest solution to a technical problem, but if the user can’t install it, or if upgrading from a previous version will ruin the uptime of a critical system, you have more work to do. Quality Assurance takes a lot of resources, but it is essential to building quality code. You need to make the decisions that will affect not just the current code base, but the ongoing process of improving code. It is easier to make corrections closer to the decisions than down the road: make decisions, look at the impact they are having, and make continual adjustments.

Probably most important is code reviewing. Fortunately, the programming world has caught on to the fact that code reviews are good. The projects I’ve worked on these past few years have all had mandatory code review policies. However, as a senior person, your responsibility to code review is greater than ever, as you will set the tone for the people on your team. If you do cursory reviews, others will either do cursory reviews, or will shame you. Neither is good for your team.

You are also responsible for making sure that the code reviews are checking for certain standards. PEP8 under Python, Java coding conventions, and so forth are necessary but not sufficient. This is another case where technical proficiency comes in. You need to know the requirements of your code base. For example, if you are using a single-threaded dispatcher like Eventlet, make sure that none of the code you introduce blocks, or you will deadlock your application.
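
To make that concrete, here is a minimal sketch of the Eventlet trap, assuming only eventlet’s stock monkey_patch and spawn primitives:

import time
import eventlet

eventlet.monkey_patch()  # patch time.sleep and friends to yield to the hub

def worker(n):
    # With monkey_patch in place this sleep cooperatively yields;
    # an unpatched blocking call here would stall every greenthread.
    time.sleep(1)
    return n

threads = [eventlet.spawn(worker, n) for n in range(3)]
print [t.wait() for t in threads]  # completes in about 1 second, not 3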

Principle #5 – Set the Example

Leading by example can only occur if you have the skills to do something well yourself. Often, in a review, you need to provide an example to a developer of how you think they should redo a piece of code. But it also refers to that limitless list from “Know yourself.” If you show up at noon, your developers will show up at noon. If your reviews are lackadaisical, theirs will be as well. If you are rude to members of your team, they will pick up on it, and the cohesion of the unit will suffer. If you communicate poorly, the team will communicate poorly.

Of course, this being the Open Source world, it doesn’t always happen like that. Often, someone else will step up and fill the vacuum. One member of your team who believes in good communication may take it upon themselves to be the “information pump” of the group. Well-mannered project members may be better able to smooth over ruffled feathers. But often we also see communities fall apart due to rudeness and poor software development practices. Best not to leave it to chance.

A Work in Progress
(For Robin)

You keep asking me for a Melody
Something sort of Upbeat and in a Major Key
Now you know that I’m not Lazy
And I did not Forget
So, sorry my dear sister, but
Your song ain’t written yet
sorry my dear sister, but
Your song ain’t written yet

Sorry that its had to keep you waiting
The process has become a bit frustrating
the rhymes that I have written
are too awkward to accept
Its a work in progress still
Your song ain’t written yet.
Yeah, its a work in progress still
Your song ain’t written yet.

Lately you’ve been going through some tough times.
Wondering what’s the pattern in your life lines.
You know there’s so much out there
that you want to go and get
Just wait a little longer now
Your song ain’t written yet.
Just a little patience now
Your song ain’t written yet.

Soon you will perceive a brand new melody
Something sort-of upbeat, and in a major key
that will make your heart beat quicken
put that bounce back in your step
Just ’cause you haven’t heard it don’t mean
Your song ain’t written yet.
Its a work in progress still
Your song ain’t written yet.

(Copyright Adam Young, 2014, all rights reserved)

November 04, 2015

Risk report update: April to October 2015

In April 2015 we took a look at a year’s worth of branded vulnerabilities, separating out those that mattered from those that didn’t. Six months have passed, so let’s take this opportunity to update the report with the new vulnerabilities that mattered across all Red Hat products.

ABRT (April 2015) CVE-2015-3315:

ABRT (Automatic Bug Reporting Tool) is a tool to help users to detect defects in applications and to create a bug report. ABRT was vulnerable to multiple race condition and symbolic link flaws. A local attacker could use these flaws to potentially escalate their privileges on an affected system to root.

This issue affected Red Hat Enterprise Linux 7 and updates were made available. A working public exploit is available for this issue. Other products and versions of Enterprise Linux were either not affected or not vulnerable to privilege escalation.

JBoss Operations Network open APIs (April 2015) CVE-2015-0297:

Red Hat JBoss Operations Network is a middleware management solution that provides a single point of control to deploy, manage, and monitor JBoss Enterprise Middleware, applications, and services. The JBoss Operations Network server did not correctly restrict access to certain remote APIs which could allow a remote, unauthenticated attacker to execute arbitrary Java methods. We’re not aware of active exploitation of this issue. Updates were made available.

“Venom” (May 2015) CVE-2015-3456:

Venom was a branded flaw which affected QEMU. A privileged user of a guest virtual machine could use this flaw to crash the guest or, potentially, execute arbitrary code on the host with the privileges of the host’s QEMU process corresponding to the guest.

A number of Red Hat products were affected and updates were released. Red Hat products by default would block arbitrary code execution as SELinux sVirt protection confines each QEMU process.

“LogJam” (May 2015) CVE-2015-4000:

TLS connections using the Diffie-Hellman key exchange protocol were found to be vulnerable to an attack in which a man-in-the-middle attacker could downgrade vulnerable TLS connections to weak cryptography which could then be broken to decrypt the connection.

Like Poodle and Freak, this issue is hard to exploit as it requires a man in the middle attack. We’re not aware of active exploitation of this issue. Various packages providing cryptography were updated.

BIND DoS (July 2015) CVE-2015-5477:

A flaw in the Berkeley Internet Name Domain (BIND) allowed a remote attacker to cause named (functioning as an authoritative DNS server or a DNS resolver) to exit, causing a denial of service against BIND.

This issue affected the versions of BIND shipped with all versions of Red Hat Enterprise Linux. A public exploit exists for this issue. Updates were available the same day as the issue was public.

libuser privilege escalation (July 2015) CVE-2015-3246:

The libuser library implements an interface for manipulating and administering user and group accounts. Flaws in libuser could allow authenticated local users with shell access to escalate privileges to root.

Red Hat Enterprise Linux 6 and 7 were affected, and updates were available the same day the issue was public. Red Hat Enterprise Linux 5 was affected, and a mitigation was published.  A public exploit exists for this issue.

Firefox lock file stealing via PDF reader (August 2015) CVE-2015-4495:

A flaw in Mozilla Firefox could allow an attacker to access local files with the permissions of the user running Firefox. Public exploits exist for this issue, including as part of Metasploit, and targeting Linux systems.

This issue affected Firefox shipped with versions of Red Hat Enterprise Linux, and updates were available the day after the issue was public.

Firefox add-on permission warning (August 2015) CVE-2015-4498:

Mozilla Firefox normally warns a user when trying to install an add-on if initiated by a web page.  A flaw allowed this dialog to be bypassed.

This issue affected Firefox shipped with Red Hat Enterprise Linux versions and updates were available the same day as the issue was public.

Conclusion

The issues examined in this report were included because they were meaningful.  This includes issues of high severity that are likely easy to exploit (or already have a public working exploit), as well as issues that were highly visible or branded (with a name or logo), regardless of their severity.

Between 1 April 2015 and 31 October 2015, across all Red Hat products, there were 39 Critical Red Hat Security Advisories released, addressing 192 Critical vulnerabilities.  Aside from the issues in this report which were rated as having Critical security impact, all other issues with a Critical rating were part of Red Hat Enterprise Linux products and were browser-related: Firefox, Chromium, Adobe Flash, and Java (due to the browser plugin).

Our dedicated Product Security team continue to analyse threats and vulnerabilities against all our products every day, and provide relevant advice and updates through the customer portal. Customers can call on this expertise to ensure that they respond quickly to address the issues that matter.  Hear more about vulnerability handling in our upcoming virtual event: Secure Foundations for Today and Tomorrow.

CVE-2015-5602 and SELinux?

How is SELinux helpful?

That is one of the most common questions that we get when a new CVE (Common Vulnerabilities and Exposures) appears. We explain SELinux as a technology for process isolation to mitigate attacks via privilege escalation.

A real example of this attack can be seen in CVE-2015-5602, known as Unauthorized Privilege Escalation in sudo. Under certain conditions, this security issue allows a user to modify any file on the system. From there it follows that the user is able to modify the /etc/shadow file, which contains secure user account data. To demonstrate how SELinux could help here, we would point to a SELinux feature called SELinux Confined Users.

SELinux confined users

On Fedora systems, the default targeted security policy is enforced, confining commonly used applications/services to mitigate attacks on a system. With this policy, Linux users are unconfined by default, meaning there are no restrictions on attacks coming from these users; CVE-2015-5602 is such an example. Fortunately, you can configure SELinux to also confine Linux users, as described in Confining users with SELinux in RHEL and Confining Users on Fedora, as a part of process isolation for Linux users.

I personally use SELinux confined users by default, to take full advantage of process isolation for Linux users on my Fedora system.

In my case, the mgrepl Linux user is mapped to the staff_u SELinux user:

# semanage login -l |grep mgrepl

Login Name SELinux User MLS/MCS Range
mgrepl staff_u s0-s0:c0.c1023

staff_u is a SELinux login user with common administrative permissions, able to run sudo in a dedicated SELinux domain:

type_transition staff_t sudo_exec_t : process staff_sudo_t;

This rule tells me that if the staff_u SELinux user executes sudo, there is a SELinux transition to the staff_sudo_t domain. With sudoers configured, we can see:

$ sudo -e ~/test.txt
$ ps -efZ | grep sudo
staff_u:staff_r:staff_sudo_t:s0-s0:c0.c1023 root 5390 4925 0 23:04 pts/3 00:00:00 sudo -e /home/mgrepl/test.txt

CVE-2015-5602 vs. confined SELinux users

Following the steps to reproduce CVE-2015-5602, and having defined SELinux confinement for this Linux user using the semanage utility,

# semanage login -a -s staff_u usr
$ ssh usr@localhost
[usr@localhost ~]$ ln -s /etc/shadow ~/temp/test.txt
[usr@localhost ~]$ id -Z
staff_u:staff_r:staff_t:s0

we can try to edit the ~/temp/test.txt file to access /etc/shadow:

[usr@localhost ~]$ sudo -e ~/temp/test.txt
sudoedit: /home/usr/temp/test.txt: Permission denied
[usr@localhost ~]$ getenforce
Enforcing

That’s it.

SELINUX STOPS YOU!

And the following log event is generated for this denial.

type=AVC msg=audit(1446584115.930:558): avc: denied { read } for pid=3098 comm="sudoedit" name="shadow" dev="dm-1" ino=1049344 scontext=staff_u:staff_r:staff_sudo_t:s0 tcontext=system_u:object_r:shadow_t:s0 tclass=file permissive=0

Are you now thinking about SELinux confined users?

I would like to thank Daniel Kopeček <dkopecek@redhat.com> for a heads-up and co-authoring this post.


FreeIPA PKI: current plans and a future vision

FreeIPA’s X.509 PKI features (based on Dogtag Certificate System) continue to be an area of interest for users and customers. In this post I summarise recently-added PKI features in FreeIPA, work in progress, and what we plan to do in future releases. Then I will outline my personal vision for what the future of PKI in FreeIPA should look like, noting how it will address pain points and limitations of the existing architecture.

Recent changes and work in progress

In the past only a single certificate profile was supported (appropriate for TLS-enabled services) but as of FreeIPA 4.2 multiple certificate profiles are supported (including custom profiles), as are user certificates. CA ACL rules define which profiles can be used to issue certificates to particular principals (users, groups, hosts, hostgroups and/or services). The FreeIPA framework (not Dogtag) enforces CA ACLs.
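
For example, a CA ACL tying a custom profile to a group of users might be created along these lines (the tlsServer profile and webadmins group are made-up names for illustration; check ipa help caacl for the exact options):

$ ipa caacl-add tls-server-acl
$ ipa caacl-add-profile tls-server-acl --certprofiles tlsServer
$ ipa caacl-add-user tls-server-acl --groups webadmins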

Custom profiles support means that the PKI can be used for a huge number of use cases, but it is still up to the user or operator to provide a suitable PKCS #10 certificate signing request (CSR).

I am currently working on implementing support for lightweight sub-CAs in Dogtag and FreeIPA so that sub-CAs can be easily created and used to issue certificates. The CA ACLs concept will be extended to include sub-CAs so that use of certain profiles can be restricted to particular CAs.

Problems with the current architecture

To put this all in context, please study the following crappy diagram of the current FreeIPA PKI architecture:

+----------+
|   User   |
|          |  1. Generate CSR
| +------+ |     (somehow... poor user)
| | krb5 | |
| |ticket| |
+-+--|---+-+
     |                           +-----------+
     | 2. ipa cert-request       |           |
     |    (CSR payload)          |   389DS   |
     v                           |           |
+--------------------+           +-----------+
|  FreeIPA   +-------+                 ^
|            |krb5   |                 |
|            |proxy  <-----------------+
| +-------+  |ticket |   3. Validate CSR
| |RA cert|  +-------+   4. Enforce CA ACLs
+-+---|---+----------+
      |
      | 5. Dogtag cert request
      |    (CSR payload)
      v
+--------+
| Dogtag | 6. Issue certificate
+--------+

The Dogtag CA is the entity that actually issues certificates. FreeIPA requests certificates from Dogtag using the RA Agent credential (an X.509 client certificate), which gives the FreeIPA framework the authority to issue a certificate via any profile that accepts RA Agent authentication. This is a longstanding violation of an important framework design principle: the framework should only ever operate with the privileges of the authenticated principal.

Another problem is that users are burdened with the responsibility of crafting a CSR that is correct for the profile that will be used. This is a nontrivial task even for common types of certificates – it is downright painful once exotic extensions come into play. There is a lot that a user can get wrong, which may result in an invalid CSR or cause Dogtag to reject a request because it does not contain the data required by the profile. Furthermore it is reasonable to expect that any data that appear on a certificate are (or could be) stored in the directory, and could be populated into a certificate automatically according to the profile rather than by copying the data from the CSR.

On the topic of exotic extensions: although FreeIPA ensures that requested extension values of common extensions are appropriate and correspond to the subject principal’s attributes (e.g. making sure that all Subject Alternative Names are valid), no validation of uncommon extensions is performed. Nor should it be – not in the FreeIPA framework, especially; the complexity of validating extension values does not belong here, and validation is impossible if we have not yet taught FreeIPA about the extension or how to validate it, or if the validation involves custom LDAP schema. This is the problem we have with the IECUserRoles extension which we support with a profile but cannot validate – user self-service must be prohibited for profiles like this and certificate administrators must be trusted to only issue certificates with appropriate extension values.

Planned work to address (some of) these issues

The framework privilege separation (lack thereof) issue is tracked in FreeIPA ticket #5011: [RFE] Forward CA requests to Dogtag or helper by GSSAPI. This will remove the RA Agent credential and CA ACL enforcement logic from FreeIPA. Instead, the framework will obtain a proxy ticket to talk to Dogtag on behalf of the requestor principal, and Dogtag will authenticate the user, consult CA ACLs and (if all is well) continue with the certificate issuance process (which could still fail if the data in the CSR does not satisfy the profile requirements).

Implementation details for this ticket are not yet worked out but it will involve creating a service principal for Dogtag and giving Dogtag access to a keytab, performing GSSAPI authentication (probably in a Java servlet realm implementation) and providing a new profile authorisation class to read and enforce CA ACLs. Tomcat configuration and FreeIPA profile configurations will have to be updated (during upgrade) to use the new classes.

Ticket #4899: [RFE] mechanism to map principal info into certificate requests was filed to improve user experience when creating CSRs for a particular profile. An openssl req configuration file template could be stored for each profile and a command added to fill out the template and return the appropriate config for a given user, host or service. We could go further and supply config templates for other programs, or even create the whole CSR at once. Or even make it part of the cert-request command, bypassing a number of steps! The point is that there is currently a lot of busy-work around requesting certificates that is not necessary, and we can save all certificate users time and pain by improving the process.
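
As a sketch, a stored template for a TLS server profile might look something like the following standard openssl req config; the $HOSTNAME placeholders and the per-profile storage are assumptions about the proposal, not its settled format:

[ req ]
prompt = no
distinguished_name = dn
req_extensions = exts

[ dn ]
commonName = $HOSTNAME

[ exts ]
subjectAltName = DNS:$HOSTNAME
extendedKeyUsage = serverAuth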

With these enhancements, the architecture diagram changes to remove the RA certificate and provide assistance to the user in generating the CSR (which is abstracted as the user reading data from 389DS):

+----------+
|   User   | 1a. Read CSR template / attributes
|          |<--------------------------+
| +------+ |                           |
| | krb5 | |                           |
| |ticket| | 1b. Generate CSR          |
+-+--|---+-+                           |
     |                                 |
     | 2. ipa cert-request             |
     |    (CSR payload)                |
     v                                 |
+-----------+                          |
|  FreeIPA  |                          |
|           |                    +-----------+
|    +------+                    |           |
|    |krb5  |  3. Validate CSR   |   389DS   |
|    |proxy <------------------->|           |
|    |ticket|                    +-----------+
+----+--|---+                          ^
        |                              |
        | 4. Dogtag cert request       |
        |    (CSR payload)             |
        v                              |
+--------------------+                 |
|  Dogtag    +-------+                 |
|            |krb5   |                 |
|            |proxy  <-----------------+
|            |ticket |    5. Enforce CA ACLs
|            +-------+
+--------------------+
  6. Issue certificate

Future of FreeIPA PKI: my vision

There are still a number of issues that the improved architecture does not address. The data in CSRs still have to be just right. There is no way to validate exotic or unknown extension data, limiting use cases or restricting user self-service and burdening certificate issuers with the responsibility of getting it right. There is no way to pull data from custom LDAP schema into certificates, or even to automatically include data that we know is in the directory on certificates (e.g. email, KRB5PrincipalName or other kinds of alternative names).

The central concept of my vision for the future of FreeIPA’s PKI is that Dogtag should read from LDAP all the data it needs to produce a certificate according to the nominated profile (except for the subject public key which must be supplied by the requestor). This relieves the FreeIPA framework and Dogtag of most validation requirements, because we would ignore all data submitted except for the subject public key, subject principal, requestor principal and profile ID (CA ACLs would still need to be enforced).

In this architecture the PKCS #10 CSR devolves to a glorified public key format. In fact the planned CSR template feature is completely subsumed! We would undoubtedly continue to support PKCS #10 CSRs, and it would make sense to continue validating aspects of the CSR to catch obvious user errors; but this would be a UX nicety, not an essential security check.
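
To make the model concrete, here is a minimal sketch in Python using the cryptography library, purely for illustration; the real logic would live in Dogtag’s (Java) profile framework, and every name below is my invention:

import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.x509.oid import NameOID


def issue_cert(ca_key, ca_name, csr, entry, profile):
    # Take ONLY the public key from the CSR; everything else comes from
    # the subject's directory entry, as dictated by the profile.
    now = datetime.datetime.utcnow()
    builder = (
        x509.CertificateBuilder()
        .subject_name(x509.Name(
            [x509.NameAttribute(NameOID.COMMON_NAME, entry['uid'])]))
        .issuer_name(ca_name)
        .public_key(csr.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=365)))
    if 'subjectAltName' in profile:
        # The profile, not the requestor, decides what goes in the SAN.
        builder = builder.add_extension(
            x509.SubjectAlternativeName([x509.RFC822Name(entry['mail'])]),
            critical=False)
    return builder.sign(ca_key, hashes.SHA256())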

The architecture sketch now becomes:

+----------+
|   User   |
|          | 1. Generate keypair
| +------+ |
| | krb5 | |
| |ticket| |
+-+--|---+-+
     |
     | 2. ipa cert-request
     |    (PUBKEY payload)
     v
+--------------+
|   FreeIPA    |
|              |                 +-----------+
| +----------+ |                 |           |
| |krb5 proxy| |                 |   389DS   |
| |  ticket  | |                 |           |
+-+----|-----+-+                 +-----------+
       |                               ^
       | 3. Dogtag cert request        |
       |    (PUBKEY payload)           |
       v                               |
+--------------------+                 |
|  Dogtag    +-------+                 |
|            |krb5   |                 |
|            |proxy  <-----------------+
|            |ticket |    4. Enforce CA ACLs
|            +-------+    5. Read data to be included on cert
+--------------------+
  6. Issue certificate

Consider the IECUserRoles example under this new architecture and observe the following advantages:

  • The user is relieved of the difficult task of producing a CSR with exotic extension data.
  • The profile reads the needed data (assuming it exists in standard or custom schema), allowing IECUserRoles or other exotic extensions to be easily supported.
  • Because we are not accepting raw extension data that cannot be validated, user self-service can be allowed (appropriate write access controls must still exist for the attributes involved, though) and admins are relieved of crafting or verifying the correct extension values.

In terms of implementation, over and above what was already planned, this architecture will require several new Dogtag profile policy modules, and these will be more complex than existing modules (e.g. they will read data from LDAP). Pleasantly, these do not actually have to be implemented in or be formally a part of Dogtag – we can write, maintain and ship these Java classes as part of FreeIPA and easily configure Dogtag to use them.

In return we can remove a lot of validation logic from FreeIPA and profile configurations will be easier to write and understand (decide which extensions you want and trust the corresponding profile policy class to "do the right thing").

Importantly, it becomes possible for administrators to provide their own profile components implementing the relevant Java interface that read custom schema into esoteric or custom X.509 extensions, supporting any use case that we (the FreeIPA developers) don’t know about or can’t justify the effort to implement. Although this is technically possible today, moving to this approach in FreeIPA will simplify the process and provide significant prior art and expertise to help users or customers who want to do this.

Concluding thoughts

There are plans for other FreeIPA PKI features that I have not mentioned in this post, such as Let’s Encrypt / ACME support, or an interactive "profile builder" feature. The proposed architecture changes do not directly impact these features although simplifying profile configuration in any way would make the profile builder a more worthwhile / tractable feature.

The vision I have outlined here is my own at this point – although I have hinted at it over the past few months, this post is my first real effort to expound and promote it. It is a significant shift from how we are currently doing things and will be a substantial amount of work, but I hope that people will see the value in reducing user and administrator workload and being able to support new X.509 use cases without significant ongoing effort by the FreeIPA or Dogtag development teams.

Feedback on my proposal is strongly encouraged! You can leave comments here, send an email to me (ftweedal@redhat.com) or the FreeIPA development mailing list (freeipa-devel@redhat.com) or continue the discussion on IRC (#freeipa on Freenode).

November 03, 2015

Hack your meetings
I don't think I've ever sat down to a discussion about security that doesn't end with a plan to fix every problem ever, which of course means we have a rather impressive plan where failure is the only possible outcome.

Security people are terrible at scoping
I'm not entirely sure why this is, but almost every security discussion spirals out of control; topics that are totally unrelated always seem to come up, and sometimes dominate the conversation. Part of me suspects it's because there is so much to do, it's hard to know where to start.

I've recently dealt with a few meetings that had drastically different outcomes. The first got stuck on details; oceans would need to be boiled. The second meeting was fast and insanely productive. It took me a while to figure out why that meeting was fantastic. We were all social engineered, and it was glorious.

Meeting #1
The first meeting was a pretty typical security meeting. We have a bunch of problems, no idea where to even start, so we kept getting deeper and deeper, never solving anything. It wasn't a bad group, I don't think less of anyone. I was without a doubt acting just like everyone else. In fact I had more than one of these this week. I'm sure I'll have more next week.

Meeting #2
The meeting I'm calling meeting 2 was a crazy event unlike any I've ever had. We ended with a ton of actions and everyone happy with the results. It took me an hour of reflection to figure out what happened: one of the people on the call managed to social engineer everyone else. I have no idea if he knows this; it doesn't matter, because it was awesome and I'm totally stealing the technique.

A topic would come up, it would get some discussion, we'd know basically what we had to do, then we would hear "We should do X, I'll own the task". After the first ten minutes one person owned almost everything. After a while the other meeting attendees started taking tasks away, because one person had too many.

This was brilliant.

Of course I could see this backfire if you have a meeting full of people happy to let you take all the actions, but most groups don't work like this. In almost every setting everyone wants to be an important contributing member.

I'm now eager to try this technique out. I'm sure there is nuance I'm not aware of yet, but that's half the fun in making any new idea your own.

Give it a try, let me know how it goes.

Join the conversation, hit me up on twitter, I'm @joshbressers

October 27, 2015

The Third Group
Anytime you do anything, no matter how small or big, there will always be three groups of people involved. How we interact with these groups can affect the outcome of our decisions and projects. If you don't know they exist it can be detrimental to what you're working on. If you know who they are and how to deal with them, a great deal of pain can be avoided, and you will put yourself in a better position to succeed.

The first group are those who agree with whatever it is you're doing. This group is easy to deal with, as they are already in agreement. You don't have to do anything special with this group. We're not going to spend any time talking about them.

The second group is reasonable people who will listen to what you have to say. Some will come to agree with you, some won't. The ones who don't agree with you possibly won't even tell you they disagree with you. If what you're doing is a good idea you'll get almost everyone in the second group to support you, if you don't ignore them. This is the group you ignore the most, but it's where you should put most of your energy.

The third group is filled with unreasonable people. These are people to whom you can prove your point beyond a reasonable doubt, and they still won't believe you. There is absolutely nothing you can say to this group that will make a difference. These are the people who deny evidence; you can't understand why they deny the facts, and you will spend most of your time trying to bring them to your side. This group is not only disagreeable, it's dangerous to your cause. You waste your time with the third group while you alienate the second group. This is where most people incorrectly invest almost all their time and energy.

The second group will view the conversations between the first group and the third group and decide they're both insane. Members of the first and third group are generally there for some emotional reason. They're not always using facts or reality to justify their position. You cannot convince someone if they believe they have the moral high ground. So don't try.

Time spent trying to convince the third group is time not spent engaging the second group. Nobody wants to be ignored.

The Example

As always, these concepts are easier to understand with an example. Let's use climate change because the third group is really loud, but not very large.

The first group are the climate scientists. Pretty much all of them. They agree that climate change is real.

The second group is most people. Some have heard about climate change, a lot will believe it's real. Some could be a bit skeptical but with a little coddling they'll come around.

The third group are the deniers. These people are claiming that CO2 is a vegetable. They will never change their minds. No really never. I bet you just thought about how you could convince them just now. See how easy this trap is?

The first group spends huge amounts of time trying to talk to the third group. How often do you hear of debates, or rebuttals, or "conversations" between the first and third groups? How often do you hear about the scientists trying to target the second group? Even if it is happening it's not interesting, so only first-third interactions get the attention.

The second group will start to think the scientists are just as looney as the third group. Most conversations between group one and three will end in shouting. A reasonable person won't know who to believe. The only way around this is to ignore the third group completely. Any time you spend talking to the third group hurts your relationship with the second group.

What now?

Start to think about the places you see this in your own dealings. Password debates. Closed vs open source. Which language is best. The list could go on forever. How do you usually approach these? Do you focus on the people who disagree with you instead of the people who are in the middle?

The trick with security is we have no idea how to even talk to the second group. And we rather enjoy arguing with the third. While talking to the second group can be tricky, the biggest thing at this point is to just know when you're burning time and good will by engaging with the third group. Walk away, you can't win, failure is the only option if you keep arguing.

Join the conversation, hit me up on twitter, I'm @joshbressers

October 22, 2015

Red Hat CVE Database Revamp

Since 2009, Red Hat has provided details of vulnerabilities with CVE names as part of our mission to provide as much information around vulnerabilities that affect Red Hat products as possible.  These CVE pages distill information from a variety of sources to provide an overview of each flaw, including information like a description of the flaw, CVSSv2 scores, impact, public dates, and any corresponding errata that corrected the flaw in Red Hat products.

Over time this has grown to include more information, such as CWE identifiers, statements, and links to external resources that note the flaw (such as upstream advisories, etc.).  We’re pleased to note that the CVE pages have been improved yet again to provide even more information.

Beyond just a UI refresh, and deeper integration into the Red Hat Customer Portal, the CVE pages now also display specific “mitigation” information on flaws where such information is provided.  This is an area where we highlight certain steps that can be taken to prevent the exploitability of a flaw without requiring a package update. Obviously this is not applicable to all flaws, so it is noted only where it is relevant.

In addition, the CVE pages now display the “affectedness” of certain products in relation to these flaws.  For instance, in the past, you would know that an issue affected a certain product either by seeing that an erratum was available (as noted on the CVE page) or by visiting Bugzilla and trying to sort through comments and other metadata that is not easily consumable.  The CVE pages now display this information directly on the page so it is no longer required that a visitor spend time poking around in Bugzilla to see if something they are interested in is affected (but has not yet had an erratum released).

To further explain how this works, the pages will not show products that would not be affected by the flaw.  For instance, a flaw against the mutt email client would not note that JBoss EAP is unaffected because EAP does not ship, and has never shipped, the mutt email client.  However, if a flaw affected mutt on Red Hat Enterprise Linux 6, but not Red Hat Enterprise Linux 5 or 7, the CVE page might show an erratum for Red Hat Enterprise Linux 6 and show that mutt on Red Hat Enterprise Linux 5 and 7 is unaffected.  Previously, this may have been noted as part of a statement on the page, but that was by no means guaranteed.  You would have to look in Bugzilla to see if any comments or metadata noted this; now it is quite plainly noted on the pages directly.

This section of the page, entitled “Affected Packages State”, is a table that lists the affected platform, package, and a state.  This state can be:

  • “Affected”: this package is affected by this flaw on this platform
  • “Not affected”: this package, which ships on this platform, is not affected by this flaw
  • “Fix deferred”: this package is affected by this flaw on this platform, and may be fixed in the future
  • “Under investigation”: it is currently unknown whether or not this flaw affects this package on this platform, and it is under investigation
  • “Will not fix”: this package is affected by this flaw on this platform, but there is currently no intention to fix it (this would primarily be for flaws that are of Low or Moderate impact that pose no significant risk to customers)

For instance, the page for CVE-2015-5279 would look like this, noting the above affected states:

By being explicit about the state of packages on the CVE pages, visitors will know exactly what is affected by this CVE, without having to jump through hoops and spend time digging into Bugzilla comments.

Other improvements that come with the recent changes include enhanced searching capabilities.  You can now search for CVEs by keyword, so searching for all vulnerabilities that mention “openssl” or “bind” or “XSS” is now possible.  In addition, you can filter by year and impact rating.

The Red Hat CVE pages are a primary source of vulnerability information for many, a gateway of sorts that collects the most important information that visitors are often interested in, with links to further sources of information that are of interest to the vulnerability researcher.

Red Hat continues to look for ways to provide extra value to our customers.  These enhancements and changes are designed to make your jobs easier, and we believe that they will become an even greater resource for our customers and visitors.  We hope you agree!

October 20, 2015

How do we talk to normal people?
How do we talk to the regular people? What's going to motivate them? What matters to them?

You can easily make the case that business is driven by financial rewards, but what can we say or do to get normal people to understand us, to care? Money? Privacy? Donuts?

I'm not saying we're going to turn people into experts, I'm not even suggesting they will reach a point of being slightly competent. Most people can't fix their car, or wire their house, or fix their pipes. Some can, but most can't. People don't need to really know anything about security, they don't want to, so there's no point in us even trying. When we do try, they get confused and scared. So really this comes down to:

Don't talk to normal people

Talking to them really only makes things worse. What we really need is for them to trust the security people. Trust that we'll do our jobs (which we're not currently doing). Trust that the products they buy will be reasonably secure (which they're not currently). Trust that the industry has their best interest in mind (which it doesn't currently). So in summary, we are failing in every way.

Luckily for us most people don't seem to be noticing yet.

It's also important to clarify that some people will never trust us. Look at climate change denial. Ignore these people. Every denier you talk to who is convinced Google sneaks into their house at night and steals one sock is wasted time and effort. Focus on people who will listen. As humans we like to get caught up with this "third" group, thinking we can convince them. We can't, don't try. (The first group is us, the second is reasonable people, we will talk about this some other day)

So back to expectations of normal people.

I'm not sure how to even describe this. I try to think of analogies, or to compare it to existing industries. Nothing fits. Any analogy we use, every existing industry, generally has a relatively well understood model surrounding it. Safes have a physical proximity requirement, the safety of cars doesn't account for malicious actors, doors really only keep out honest people. None of these work.

We know what some of the problems are, but we don't really have a way to tell people about them. We can't use terms that are even moderately complex. Every time I work through this I keep coming back to trust. We need people to trust us. I hate saying that, blind trust is never a good thing. We have to earn it.

Trust me, I'm an expert!

So let's assume our only solution for the masses at this point is "trust". How will anyone know who to trust? Should I trust the guy in the suit? What about the guy who looks homeless? That person over there uses really big words!

Let's think about some groups that demand a certain amount of trust. You trust your bank enough to hold your money. You have to trust doctors and nurses. You probably trust engineers who build your buildings and roads. You trust your teachers.

The commonality there seems to be education and certification. You're not going to visit a doctor who has no education, nor an engineer who failed his certification exam. Would that work for us? We have some certifications, but the situation is bleak at best, and the brightest folks have zero formal qualifications.

Additionally, who is honestly going to make certifications a big deal when everything we need to know changes every 6 months?

As I write this post I find myself getting more and more confused. I wonder if there's any way to fix anything. Let's just start simple. What's important? Building trust, so here's how we're going to do it.
  1. Do not talk, only answer questions (and don't be a pedantic jerk when you do)
  2. Understand your message, know it like the back of your hand
  3. Be able to describe the issue without using any lingo (NONE)
  4. Once you think you understand their challenges, needs, and asks; GOTO 1
I'm not saying this will work, but I'm hopeful that if we start practicing some level of professionalism we can build trust. Nobody ever built real trust by talking; you build trust by listening. Maybe we've spent so much time being right we never noticed we were wrong.


Join the conversation, hit me up on twitter, I'm @joshbressers

October 19, 2015

Admin

While I tend to play up bug 968696 for dramatic effect, the reality is that we have a logical contradiction in what we mean by ‘admin’ when talking about RBAC.

In early iterations of OpenStack, roles were global. This is reflected in many of the Policy checks that only look for the global role. However, prior to the Keystone-Light rewrite, role assignments became scoped to tenants. This shows up in the Keystone git history. As this pattern got established, some people wrote policy checks that assert:

role==admin and tenant_id=resource.tenant_id

This contradicts the global-ness of the admin roles. If I assign

(‘joeuser’, ‘admin’, ‘mytenant’)

I’ve just granted joeuser the ability to perform all of the admin operations, everywhere.

Thus, today we have a situation where, unless the deployer rewrites the default policy, they can only assign the admin role to users who are trusted to be admins on the whole deployment.
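
To see the contradiction in oslo.policy terms (both rule names here are mine, purely illustrative):

{
    "global_admin": "role:admin",
    "scoped_admin": "role:admin and project_id:%(project_id)s"
}

A token carrying the admin role on any project satisfies the first rule everywhere, which is exactly the problem.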

We have a few choices.

  1. Remove Admin from the scoping for projects. Admin is a special role reserved only for system admins. Replace project scoped admins with ‘manager’ or some other comparable role. This is actually the easiest solution.
  2. Create a special project for administrative actions. Cloud admin users are assigned to this project. Communicate that project Id to the remote systems. This is what the policy.v3cloudsample.json file (http://git.openstack.org/cgit/openstack/keystone/tree/etc/policy.v3cloudsample.json) recommends.

However, 2 is really not practical without some significant engineering. For a new deployment, it would require the following steps.

  1. Every single policy file would have to be “templatized”
  2. The deployment mechanism would then have to create the admin project, get its id, and string-replace it into the policy file (sketched below).
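
For example, a templatized rule might ship looking something like this (illustrative, not the actual v3cloudsample contents):

"cloud_admin": "role:admin and project_id:ADMIN_PROJECT_ID"

with the deployment tooling substituting the real UUID for ADMIN_PROJECT_ID once the admin project exists.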

We could make this happen in Devstack. The same is true of Puppet, OSAD, and Fuel. There would be a lag before the downstream mechanisms picked it up, multiple releases down the road.

I went through this logic back when I started proposing the Dynamic Policy approach. If OpenStack managed policy deployment via an internal mechanism, then adding things like the admin_project_id becomes trivial.

While I think Dynamic Policy provides a lot of value, I concede that it is overkill for just substituting in a single value. The real reason I am backing off Dynamic Policy for the moment is that we need to better understand what part of policy should be dynamic and what part should be static; we are only just getting that clear now.

There is an additional dimension to the admin_project_id issue that several developers want solved. In larger deployments, different users should have administrative capabilities on different endpoints. Sometimes this is segregated by service (storage admins vs network admins) and sometimes by region.

Having a special project clearly communicates the intention of RBAC. But even clearer would be to have the role assignment explicitly on the catalog item itself. Which of the following statements would you think is clearer?

  1. Assign Joe the admin role on the project designated to administer endpoint 0816.
  2. Assign Joe the admin role on endpoint 0816.

I think you will agree that it is the latter. Making this happen would not be too difficult on the Keystone side, and would require fairly simple changes in the policy enforcement of the remote projects. We’ve already discussed “endpoint binding of tokens”, where an endpoint needs to know its own ID. Having a new “scope” in a token that is an endpoint_id would be fairly easy to execute.

One drawback, though, is that all of the client tooling would need to change. Horizon, openstackclient, and keystoneauth would need to handle “endpoint” as the scope. This includes third party integrations, which we do not control.

All of these constraints drive toward a solution where we link the admin project to the existing endpoint ids.

  1. Make the catalog a separate domain.
  2. Make regions, services, and endpoints projects.
  3. Use the rules of Hierarchical Multitenancy to manage the role assignments for a project.

On the enforcing side, endpoints *must* know their own ID. They would have checks that assert token.project_id == self.endpoint_id.
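
A rough sketch of what that check could look like on the enforcing side (all names here are hypothetical):

class Unauthorized(Exception):
    pass


def assert_endpoint_scope(token, my_endpoint_id):
    # The endpoint knows its own ID; any token scoped elsewhere fails.
    if token.get('project_id') != my_endpoint_id:
        raise Unauthorized(
            'token is not scoped to endpoint %s' % my_endpoint_id)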

This is the “least magic” approach. It reuses existing abstractions without radically altering them. The chance of a collision between an existing project_id and an endpoint_id is vanishingly small, and could be addressed by modifying one or the other accordingly. The biggest effort would be in updating the policy files, but this seems to be within the capability of cross project efforts.

We will be discussing this at the Cross Project session on Global Admin at the summit.

Please read this, process it, and be ready to help come to a proper conclusion of this bug.

References:
“admin”-ness not properly scoped
Original Dynamic Policy Post
Current Dynamic Policy Wiki
Endpoint_ID from URL
Catalog Scoped Roles

October 13, 2015

How do we talk to business?
How many times have you tried to get buy-in for a security idea at work, or with a client, only to have them say "no"? Even though you knew it was really important, they still made the wrong decision.

We've all seen this more times than we can count. We usually walk away grumbling about how sorry they'll be someday. Some of them will be, some won't. The reason is always the same though:

You're bad at talking to the business world

You can easily make the argument that money is a big motivator for a business. For some it's the only motivator. Businesses want to save money, prevent problems, be competitive, and stay off the front page for bad news. The business folks don't care about technical details as much as they worry about running their business. They don't worry about which TLS library is the best. They want to know how something is going to make their lives easier (or harder).

If we can't frame our arguments in this context, we have no argument; we're really just wasting time.


Making their lives easier


We need to answer the question, how can security make lives easier? Don't answer too quickly, it's complicated.

Everything has tradeoffs. If we add a security product or process, what's going to be neglected? If we purchase a security solution, what aren't we purchasing with those funds? Some businesses would compare these choices to buying food or tires. If you're hungry, you can't eat tires.

We actually have two problems to solve.
  1. Is this problem actually important
  2. How can I show the value
Deciding whether something is important is always tricky. When you're a security person, lots of things seem important but aren't really. Let's say that inside your corporate network someone wants to disable their firewall. Is that important? It could be. Is missing payroll because of the firewall more important? Yes.

First you have to decide how important is the thing you have in mind. I generally ponder if I'd be willing to get fired over this. If the answer is "no", it's probably not very important. We'll talk about how to determine what's important in the future (it's really hard to do).

Let's assume we have something that is important.

Now how do we bring this to the people in charge?

Historically I would write extremely long emails or talk to people at length about how smart I am and how great my idea is. This never works.

You should write up a business proposal. Lay out the costs, benefits, requirements, features, all of it. This is the sort of thing business people like to see. It's possible you may even figure out what you're proposing is a terrible idea before you even get it in front of someone who can write a check. Think for a minute what happens when you develop a reputation for only showing up with good well documented ideas? Right.

Here's how this usually works. Someone has an idea, then it gets debated for days or weeks. It's not uncommon to spend more time discussing an idea than it would take to implement the thing. By writing down what's going on, there is no ambiguity, there's no misunderstanding, there's no pointless discussion about ketchup.

I actually did this a while back. There was discussion about a feature; it had lasted for weeks, nobody had a good answer, and the general idea kept going back and forth. I wrote up a proper business proposal and it actually changed my mind: it was a HORRIBLE idea (I was in favor of it before that). I spent literally less than a single work day and cast our decision in stone. In about 6 hours I managed to negate hundreds of hours of debate. It was awesome.

The language of the business is one of requirements, costs, and benefits. It's not about outsmarting anyone or seeing who knows the biggest word. There's still plenty of nuance here, but for now if you're looking to make the most splash, you need to learn how to write a business plan. I'll leave how you do this as an exercise to the reader, there are plenty of examples.

Join the conversation, hit me up on twitter, I'm @joshbressers

October 06, 2015

What's filling the vacuum?
Anytime there's some sort of vacuum, something will appear to fill the gap. In this context we're going to look at what's filling the vacuum in security. There are a lot of smart people, but we're failing horribly at getting our message out.

The answer to this isn't simple. You have to look at what's getting attention that doesn't deserve it. Just because we know a product, service, or idea is hogwash doesn't mean non-security people know this. They have to find someone to trust, then listen to what that person has to say. Unfortunately, when you're talking about extremely complex and technical problems, they listen to whoever they can understand, as there's no way they can determine who is technically more correct. They're going to follow whoever sounds the smartest.

If you've never seen the musical "The Music Man" you should. This is what we're dealing with.

Rather than dwell on it and try to call out the snake oil, we should put our effort into the messaging. We'll never have a better message than this group, but we really only need to be good enough, not perfect. We always strive for our messages to be perfect, but that's an impossible goal. The goal here is to sound smarter than the con men. This is harder than it sounds, unfortunately.

We can use the crypto backdoor conversation as a good example. There are many groups claiming we should have backdoors in our crypto to keep ourselves safer. Security people know this is a bad idea, but here's what the conversation sounds like.

Them

We need crypto backdoors to stop the bad guys, trust us, we're the good guys

Us

<random nonsense>, backdoors don't work

We don't do a good job of telling people why backdoors don't work. Why should they trust us, why don't backdoors work, who will keep us safe? Our first instinct would be to frame the discussion like this:


  1. Backdoors never work
  2. Look at the TSA key fiasco
  3. Encryption is hard, there's no way to get this right

This argument won't work. The facts aren't what's important. You have to think about how you make people feel. We just confused them, so now they don't like us. Technical details are fine if you're talking to technical people, but any decent technical person probably doesn't need this explained.

We have to think about how we can make people feel bad about encryption backdoors. That's the argument we need. What can we say that gives them the feels?

I don't know if these work, they're just some ideas I have. I've yet to engage anyone on this topic.

What are things people worry about? They do value their privacy. The old "if you have nothing to fear you have nothing to hide" argument only works when it's not your neighbor who has access to your secrets.

Here's what I would ask:
Are you OK with your neighbor/wife/parent having access to your secrets?
Then see where the conversation goes. You can't get technical; we have to focus on emotions, which is super hard for most security people. If you try this out, let me know how it goes.

Join the conversation, hit me up on twitter, I'm @joshbressers

September 29, 2015

We're losing the battle for security
The security people are currently losing the battle to win the hearts and minds of the people. The war is far from over but it's not currently looking good for our team.

As with all problems, if there is a vacuum, something or someone ends up filling it. This is happening right now in security. There are a lot of really smart security people out there. We generally know what's wrong, and sometimes even know how to fix it, but the people we need to listen aren't listening. I don't blame them either; we're not telling them what they need to know.

On the other side though, we also think we understand the problems, but we don't really. Everything we know comes from an echo chamber inside a vacuum. We understand our problems, not their problems.

We have to move our conversations into the streets, the board rooms, and the CIO offices. Today all these people think we're just a bunch of nuts ranting about crazy things. The problem isn't that we're all crazy, it's that we're not talking to people correctly, which also means we're not listening either.

We have to stop talking about how nobody knows anything and start talking about how we're going to help people. Security isn't important to them, they have something they want to do, so we have to help them understand how what we do is important and will help them. We have to figure out how to talk about what we do in words they understand and will motivate them.

How many times have you tried to explain to someone why they should use a firewall and even though it should have been completely obvious, they didn't use it?

How many times have you tried to get a security bug fixed but nobody cared?

How many times have you tried to get a security feature, like stack protector, enabled by developers but nobody wanted to listen?

There are literally thousands of examples we could cover. In virtually every example we failed because we weren't telling the right story. We might have thought we were talking about security, but we really were saying "I'm going to cost more money and make your life harder".

It's time we figure out how to tell these stories. I don't have all the answers, but I'm starting to notice some patterns now that I've escaped from the institution.

There are three important things we're going to discuss in the next few posts:

  1. What's filling the vacuum?
  2. How do we talk to the business world?
  3. How do we talk to normal people?
The vacuum is currently being filled by a lot of snake oil. I'm not interested in calling specific people out, you know who they are. We'll talk about what we can learn from this group. They know how to interact with people, they're successfully getting people to buy their broken toys. This group will go away if we learn how to talk about what we do.

Then we'll talk about what motivates a business. They don't really care about security, they care about making their business successful. So how can we ensure security is part of the solution? We know what's going to happen if there's no security involved.

Lastly we'll talk about the normal people. Folks like your neighbors or parents, who don't have a clue what's going on, and never will. This group is going to be the hardest of all to talk to. I sort of covered this group in a previous post: How can we describe a buffer overflow in common terms? These are people who have to be forced to wear seat belts; it's not going to be pleasant.

If you have any good stories or examples that would make these stories better, be sure to let me know.

Join the conversation, hit me up on twitter, I'm @joshbressers

September 27, 2015

Ossipee

OpenStack is a big distributed system. FreeIPA is designed for security in distributed systems. In order to develop and test each of them, separately or together, I need a distributed system. Virtualization has been a key technology for making this kind of work possible. OpenStack is great at managing virtualization. Added to that are the benefits found when we “fly our own airplanes.” Thus, I am using OpenStack to develop OpenStack.

[Photo: Early to Rise – a 757-200 lifts off Rwy 1 at SFO at sunrise. Taken by Steve Okay while waiting for a flight to LAS. Used with permission.]

One of my tasks is to make it possible to easily reproduce my development environment. In the past, I would have done something like this with Bash. However, I have been coding in Python enough over the past 3 years that it is as easy (if not easier) for me to think in Python as in Bash. And again, there are the benefits of actually testing and working with the OpenStack client APIs.

In order to install Nova properly, I need two networks: a public one that connects to the outside world, and a private one that Nova can manage. Each network needs a subnet, and the public subnet needs a router to connect to the public network.

[Screenshot: Horizon Network Topology screen showing multiple successful Ossipee network setups.]

For development, I need two virtual machines. One will run as the IPA Server, and one will run as the OpenStack controller (all-in-one install). In future deployments, I will need multiple controllers fronted by HA proxy, so I want a pattern that will extend.

This is a development setup, which means that I will be coding, as I do, via trial and error. I need to be able to return to a clean setup with a minimum of fuss. Not just to wipe everything, but perhaps to tear down and recreate only the hosts, or a single host, or the network.

I realized I wanted a system that ran as a set of tasks. Each would run forward to create, or backward to tear down. I wanted to be able to compose individual tasks into larger tasks, and so forth.
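
The shape of that design, as a minimal sketch (class names are mine, not the actual Ossipee code):

class Task(object):
    # A unit of work that can be applied or reversed.
    def create(self):
        pass

    def teardown(self):
        pass


class TaskList(Task):
    # Composes child tasks; teardown reverses the creation order.
    def __init__(self, tasks):
        self.tasks = tasks

    def create(self):
        for task in self.tasks:
            task.create()

    def teardown(self):
        for task in reversed(self.tasks):
            task.teardown()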

There are many tools out there that I could have used. Ansible 2's OpenStack modules are based on the Shade library. There are other orchestration tools. However, for a small task, I want small code. The whole thing is a single Python module, and is understandable in a single viewing. This is personal code, tailored to my exact inseam and sleeve length. It is easy for me to see how to modify it in the future if I want it idempotent, or to handle adding random hosts to an existing network.

Well, at least that is how it started. Like most things, it has grown a bit as it is used. My whole team needs the same setups as I have. But still, this is not meant to be a shippable project; this is a software “jig” and will not be maintained once it is no longer needed.

However, the code is worth recording. There are a couple things I feel it offers. First, it shows how to use the python-nova and python-neutron clients with a Session, getting the configuration from the command line:

    @property
    def session(self):
        # Lazily build a single keystoneclient Session shared by all clients.
        if not self._session:
            # Build the auth plugin from the argparse-parsed CLI/env options.
            auth_plugin = ksc_auth.load_from_argparse_arguments(self.args)
            try:
                if not auth_plugin.auth_url:
                    logging.error('OS_AUTH_URL not set.  Aborting.')
                    sys.exit(-1)
            except AttributeError:
                # Not every auth plugin exposes an auth_url attribute.
                pass

            self._session = ksc_session.Session.load_from_cli_options(
                self.args,
                auth=auth_plugin)

        return self._session

Second, it has examples of the necessary Neutron commands, in the right order, to build a network and connect it to an external one (a condensed sketch follows the lists below).

As you can see, the order is

  1. Router
  2. Network
  3. SubNet
  4. RouterInterface

and the reversal of the steps to tear it down:

  1. RouterInterface
  2. Subnet
  3. Network
  4. Router
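
Condensed into a sketch against python-neutronclient (values are illustrative and error handling is omitted):

from neutronclient.v2_0 import client as neutron_client


def create_net(neutron, name, public_net_id, cidr):
    # 1. Router, with its gateway set to the external network.
    router = neutron.create_router(
        {'router': {'name': name + '-router',
                    'external_gateway_info': {'network_id': public_net_id}}}
    )['router']
    # 2. Network.
    network = neutron.create_network({'network': {'name': name}})['network']
    # 3. Subnet on that network.
    subnet = neutron.create_subnet(
        {'subnet': {'network_id': network['id'],
                    'ip_version': 4,
                    'cidr': cidr}})['subnet']
    # 4. Router interface connecting the subnet to the router.
    neutron.add_interface_router(router['id'], {'subnet_id': subnet['id']})
    return router, network, subnet


def teardown_net(neutron, router, network, subnet):
    # Reverse order: interface, subnet, network, router.
    neutron.remove_interface_router(router['id'], {'subnet_id': subnet['id']})
    neutron.delete_subnet(subnet['id'])
    neutron.delete_network(network['id'])
    neutron.delete_router(router['id'])

The neutron argument is a neutronclient Client, which can be built from the same Session shown above: neutron_client.Client(session=self.session).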

Beyond the virtual infrastructure necessary to run an OpenStack install, Ossipee creates a host entry for each virtual machine, and resets the OpenSSH known_hosts entry by removing old values:

subprocess.call(['ssh-keygen', '-R', ip_address])

and suppresses the unknown-host-key check for the first connection:

            subprocess.check_call(
                ['ssh',
                 '-o', 'StrictHostKeyChecking=no',
                 '-o', 'PasswordAuthentication=no',
                 '-l', self.plan.profile['cloud_user'],
                 ip_address, 'hostname'])

Finally, it generates an Ansible inventory file that can be used with our playbooks to install IPA and OpenStack. More about them later.

September 25, 2015

Keystone Unit Tests

Running the Keystone Unit tests takes a long time.

To start with a blank slate, you want to make sure you have the latest from master and a clean git repository.

cd /opt/stack/keystone
git checkout master
git rebase origin/master

git clean -xdf keystone/
time tox -r
...
  py27: commands succeeded
ERROR:   py34: commands failed
  pep8: commands succeeded
  docs: commands succeeded
  genconfig: commands succeeded

real	8m17.530s
user	33m1.851s
sys	0m56.828s

The -r option to tox recreates the tox virtual environments. Additional runs should go faster:
time tox

  py27: commands succeeded
ERROR:   py34: commands failed
  pep8: commands succeeded
  docs: commands succeeded
  genconfig: commands succeeded

real	5m52.367s
user	30m57.366s
sys	0m35.403s

To run just the py27 tests:

time tox -e py27
...

Ran: 5695 tests in 243.0000 sec.
....
  py27: commands succeeded
  congratulations :)

real	4m18.144s
user	28m51.506s
sys	0m31.286s


Not much faster, so we know where most of the time goes. It also reported the slowest tests:
keystone.tests.unit.token.test_fernet_provider.TestFernetKeyRotation.test_rotation 2.856

So we have 5000+ tests that take 4 minutes to run.

Running just a single test:

time tox -e py27  -- keystone.tests.unit.token.test_fernet_provider.TestFernetKeyRotation.test_rotation

....
======
Totals
======
Ran: 1 tests in 4.0000 sec.
....


  py27: commands succeeded
  congratulations :)

real	0m17.200s
user	0m15.802s
sys	0m1.681s

17 seconds is a little long, considering the test itself only ran for four seconds of it. Once in a while is not a problem, but if this breaks the flow of thought during coding, it is problematic.

What can we shave off? Let's see if we can avoid the discovery step, run inside the venv, and specify exactly the test we want to run:

. .tox/py27/bin/activate
 time python -m testtools.run keystone.tests.unit.token.test_fernet_provider.TestFernetKeyRotation.test_rotation
Tests running...

Ran 1 test in 2.770s
OK

real	0m3.137s
user	0m2.708s
sys	0m0.428s

That seems to have had only an overhead of a second.

OK, what about some of the end-to-end tests that set up an HTTP listener and talk to the database, such as those in keystone.tests.unit.test_v3_auth?

time python -m testtools.run keystone.tests.unit.test_v3_auth
Tests running...
Ran 329 tests in 91.925s
OK

real	1m32.459s
user	1m28.260s
sys	0m4.669s

Fast enough for a pre-commit check, but not for “run after each change.” How about a single test?

time python -m testtools.run keystone.tests.unit.test_v3_auth.TestAuth.test_disabled_default_project_domain_result_in_unscoped_token
Tests running...

Ran 1 test in 0.965s
OK

real	0m1.382s
user	0m1.308s
sys	0m0.076s

I think it is important to run the tests before you write a line of code, and to run the tests continuously. But if you don’t run the entire body of unit tests, how can you make sure you are exercising the code you wrote? One technique is to put in a break-point.

I want to work on the roles infrastructure. Specifically, I want to make the assignment of one (prior) role imply the assignment of another (inferred) role. I won’t go into the whole design, but I will start with the database structure. Role inference is a many-to-many relationship. As such, I need to implement a table which has two IDs: prior_role_id and inferred_role_id. Let's start with the database migrations for that, sketched below.
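
The migration I have in mind looks roughly like this (a sketch, not the final merged code):

import sqlalchemy as sql


def upgrade(migrate_engine):
    meta = sql.MetaData()
    meta.bind = migrate_engine
    # Each row records that holding prior_role_id implies inferred_role_id.
    implied_role = sql.Table(
        'implied_role', meta,
        sql.Column('prior_role_id', sql.String(length=64), primary_key=True),
        sql.Column('inferred_role_id', sql.String(length=64),
                   primary_key=True))
    implied_role.create(migrate_engine, checkfirst=True)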

time python -m testtools.run keystone.tests.unit.test_sql_upgrade
Tests running...

Ran 30 tests in 3.528s
OK

real	0m3.948s
user	0m3.874s
sys	0m0.075s

OK… full disclosure: I’m writing this because I did too much before writing tests, my tests were hanging, and I want to redo things slower and more controlled to find out what went wrong. I have some placeholders for migrations (a way to keep the migration number for my review from changing as other reviews get merged). They just execute:

def upgrade(migrate_engine):
    pass

So… I’m going to cherry-pick this commit and run the migration test.

migrate.exceptions.ScriptError: You can only have one Python script per version, but you have: /opt/stack/keystone/keystone/common/sql/migrate_repo/versions/075_placeholder.py and /opt/stack/keystone/keystone/common/sql/migrate_repo/versions/075_confirm_config_registration.py

Already caught up with me…

$ git mv keystone/common/sql/migrate_repo/versions/075_placeholder.py keystone/common/sql/migrate_repo/versions/078_placeholder.py 
(py27)[ayoung@ayoung541 keystone]$ time python -m testtools.run keystone.tests.unit.test_sql_upgrade
Tests running...

Ran 30 tests in 3.576s
OK

real	0m4.028s
user	0m3.951s
sys	0m0.081s

OK… let’s see what happens if I put a breakpoint in one of these tests.

def upgrade(migrate_engine):
    import pdb; pdb.set_trace()

And run

(py27)[ayoung@ayoung541 keystone]$ time python -m testtools.run keystone.tests.unit.test_sql_upgrade
Tests running...
--Return--
 /opt/stack/keystone/keystone/common/sql/migrate_repo/versions/078_placeholder.py(18)upgrade()->None
-> import pdb; pdb.set_trace()

Ctrl-C kills the test (or type cont to keep running). This may not always work; some of the more complex tests manipulate the thread libraries, and will keep the breakpoints from interrupting the debugging thread. For those cases, use rpdb and telnet, as shown below.
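
For example (rpdb listens on 127.0.0.1:4444 by default):

def upgrade(migrate_engine):
    import rpdb; rpdb.set_trace()
    # then, from another terminal: telnet 127.0.0.1 4444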

More info about running the tests in OpenStack can be found here: https://wiki.openstack.org/wiki/Testr
I wrote about using rpdb to debug here: http://adam.younglogic.com/2015/02/debugging-openstack-with-rpdb/