Fedora security Planet

Security by Isolating Insecurity

Posted by Russel Doty on July 25, 2017 10:14 PM

In my previous post I introduced “Goldilocks Security”, proposing three approaches to security.

Solution 1: Ignore Security

Safety in the crowd – with tens of millions of cameras out there, why would anyone pick mine? Odds are that the bad guys won’t pick yours – they will pick all of them! Automated search and penetration tools easily find millions of IP cameras. You will be lost in the crowd – the crowd of bots!

Solution 2: Secure the Cameras

For home and small business customers, a “secure the camera” approach simply won’t work: ease of use wins out over effective security in product design, and the camera vendors’ business model (low cost, ease of use, and access over the Internet) conspires against security. What’s left?

Solution 3: Isolation

If the IP cameras can’t be safely placed on the Internet, then isolate them from the Internet.

To do this, introduce an IoT Gateway between the cameras and all other systems. This IoT Gateway would have two network interfaces: one network interface dedicated to the cameras and the second network interface used to connect to the outside world. An application running on the IoT Gateway would talk to the IP cameras and then talk to the outside world (if needed). There would be no network connection between the IP cameras and anything other than the IoT Gateway application. The IoT Gateway would also be hardened and actively managed for best security.

How is this implemented?

  • Put the IP cameras on a dedicated network. This should be a separate physical network. At a minimum it should be a VLAN (Virtual LAN). There will typically be a relatively small number of IP cameras in use, so a dedicated network switch, probably with PoE, is cost effective.
    • Use static IP addresses. If the IP cameras are assigned static IP addresses, there is no need to have an IP gateway or DNS server on the network segment. This further reduces the ability of the IP cameras to get out on the network. You lose the convenience of DHCP-assigned addresses and gain significant security.
    • You can have multiple separate networks. For example, you might have one for external cameras, one for cameras in interior public spaces, one for manufacturing space and one for labs. With this configuration, someone gaining access to the exterior network would not be able to gain access to the lab cameras.
  • Add an IoT Gateway – a computer with a network interface connected to the camera network. In the example above, the gateway would have four network interfaces – one for each camera network. The IoT Gateway would probably also be connected to the corporate network; this would require a fifth network interface. Note that you can have multiple IoT Gateways, such as one for each camera network, one for a building management system, one for other security systems, and one that connects an entire building or campus to the Internet.
  • Use a video monitoring program such as ZoneMinder or a commercial program to receive, monitor and display the video data. Such a program can monitor multiple camera feeds, analyze the video feeds for things such as motion detection, record multiple video streams, and create events and alerts. These events and alerts can do things like trigger alarms, send emails, send text messages, or trigger other business rules. Note that the video monitoring program further isolates the cameras from the Internet – the cameras talk to the video monitoring program and the video monitoring program talks to the outside world.
  • Sandbox the video monitoring program using tools like SELinux and containers. These both protect the application and protect the rest of the system from the application – even if the application is compromised, it won’t be able to attack the rest of the system.
  • Remove any unneeded services from the IoT Gateway. This is a dedicated device performing a small set of tasks. There shouldn’t be any software on the system that is not needed to perform these tasks – no development tools, no extraneous programs, no unneeded services running.
  • Run the video monitoring program with minimal privileges. This program should not require root level access.
  • Configure strong firewall settings on the IoT Gateway. Only allow required communications. For example, only allow communications with specific IP addresses or MAC addresses (the IP cameras configured into the system) over specific ports using specific protocols. You can also configure the firewall to only allow specific applications access to the network port. These settings would keep anything other than authorized cameras from accessing the gateway and keep the authorized cameras from talking to anything other than the video monitoring application. This approach also protects the cameras. Anyone attempting to attack the cameras from the Internet would need to penetrate the IoT Gateway and then change settings such as the firewall and SELinux before they could get to the cameras.
  • Use strong access controls. Multi-factor authentication is a really good idea. Of course you have a separate account for each user, and assign each user the minimum privilege they need to do their job. Most of the time you don’t need to be logged in to the system – most video monitoring applications can display on the lock screen, allowing visual monitoring of the video streams without being able to change the system. For remote gateways interactive access isn’t needed at all; they simply process sensor data and send it to a remote system.
  • Other systems should be able to verify the identity of the IoT Gateway. A common way to do this is to install a certificate on the gateway. Each gateway should have a unique certificate, which can be provided by systems like Linux IdM or MS Active Directory. Even greater security can be provided by placing the system identity into a hardware root of trust like a TPM (Trusted Platform Module), which prevents the identity from being copied, cloned, or spoofed.
  • Encrypting communications is always a good idea for security. Encryption protects the contents of the video stream from being revealed, prevents the contents of the video stream from being modified or spoofed, and verifies the integrity of the video stream – any modifications of the encrypted traffic, either deliberate or due to network error, are detected. Further, if you configure a VPN (Virtual Private Network) between the IoT Gateway and backend systems you can force all network traffic through the VPN, thus preventing network attacks against the IoT Gateway. For security systems it is good practice to encrypt all traffic, both internal and external.
  • Proactively manage the IoT Gateway. Regularly update it to get the latest security patches and bug fixes. Scan it regularly with tools like OpenSCAP to maintain secure configuration. Monitor logfiles for anomalies that might be related to security events, hardware issues, or software issues.
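
As a rough illustration of the firewall bullet above, here is a minimal Python sketch of the "only allow required communications" rule: default deny, with an allowlist keyed on source IP, MAC address, protocol and port. It models the decision only; a real gateway would enforce this in the firewall itself. The addresses, the `CameraRule` and `is_allowed` names, and the assumption that the cameras stream over RTSP on TCP port 554 are all hypothetical and not taken from the post.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class CameraRule:
    """One authorized camera and the only traffic it may send (hypothetical)."""
    ip: str        # static IP assigned to the camera
    mac: str       # camera MAC address, lower case
    protocol: str  # "tcp" or "udp"
    port: int      # gateway port the camera may reach


# Hypothetical allowlist for a two-camera network; assumes RTSP on TCP 554.
ALLOWED = {
    CameraRule("192.168.50.11", "00:11:22:aa:bb:01", "tcp", 554),
    CameraRule("192.168.50.12", "00:11:22:aa:bb:02", "tcp", 554),
}


def is_allowed(src_ip: str, src_mac: str, protocol: str, dst_port: int) -> bool:
    """Default deny: accept only traffic that matches a rule exactly."""
    try:
        normalized_ip = str(ip_address(src_ip))  # validates the address
    except ValueError:
        return False  # malformed source address is never allowed
    candidate = CameraRule(normalized_ip, src_mac.lower(), protocol.lower(), dst_port)
    return candidate in ALLOWED
```

With this allowlist, `is_allowed("192.168.50.11", "00:11:22:AA:BB:01", "TCP", 554)` is accepted, while the same camera trying port 80, or an unknown device on the camera network, is rejected. Anything not explicitly listed is dropped, which is what keeps an added or compromised device from reaching the gateway.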

You can see how a properly configured IoT Gateway can allow you to use insecure IoT devices as part of a secure system. This approach isn’t perfect – the cameras should also be managed like the gateway – but it is a viable approach to building a reasonably secure and robust system out of insecure devices.

One issue is that the cameras are not protected from local attack. If WiFi is used the attacker only needs to be nearby. If Ethernet is used an attacker can add another device to the network. This is difficult as you would need to gain access to the network switch and find a live port on the proper network. Attacking the Ethernet cable leaves signs, including network glitches. Physically attacking a camera also leaves signs. All of this can be done, but is more challenging than a network based attack over the Internet and can be managed through physical security and good network monitoring. These are some of the reasons why I strongly prefer wired network connections over wireless network connections.

Security and privacy are the same thing

Posted by Josh Bressers on July 23, 2017 12:36 AM
Earlier today I ran across this post on Reddit
Security but not Privacy (Am I doing this right?)

The poster basically said "I care about security but not privacy".

It got me thinking about security and privacy. There's not really a difference between the two. They are two faces of the same coin but why isn't always obvious in today's information universe. If a site like Facebook or Google knows everything about you it doesn't mean you don't care about privacy, it means you're putting your trust in those sites. The same sort of trust that makes passwords private.

The first thing we need to grasp is what I'm going to call a trust boundary. I trust you understand trust already (har har har). But a trust boundary is less obvious sometimes. A security (or privacy) incident happens when there is a breach of the trust boundary. Let's just dive into some examples to better understand this.

A web site is defaced
In this example the expectation is the website owner is the only person or group that can update the website content. The attacker crossed a trust boundary that allowed them to make unwanted changes to the website.

Your credit card is used fraudulently
It's expected that only you will be using your credit card. If someone gets your number somehow and starts to make purchases with your card, how they got the card crosses a trust boundary. You could easily put this example in the "privacy" bucket if you wanted to keep them separate, it's likely your card was stolen due to lax security at one of the businesses you visited.

Your wallet is stolen
This one is tricky. The trust boundary is probably your pocket or purse. Maybe you dropped it or forgot it on a counter. Whatever happened the trust boundary is broken when you lose control of your wallet. An event like this can trickle down though. It could result in identity theft, your credit card could be used. Maybe it's just about the cash. The scary thing is you don't really know because you lost a lot of information. Some things we'd call privacy problems, some we'd call security problems.

I use a confusing last example on purpose to help prove my point. The issue is all about who do you trust with what. You can trust Facebook and give them tons of information, many of us do. You can trust Google for the same basic reasons. That doesn't mean you don't care about privacy, it just means you have put them inside a certain trust boundary. There are limits to that trust though.

What if Facebook decided to use your personal information to access your bank records? That would be a pretty substantial trust boundary abuse. What if your phone company decided to use the information they have to log into your Facebook account?

A good password isn't all that different from your credit card number. It's a bit of private information that you share with one or more other organizations. You are expecting them not to cross a trust boundary with the information you gave them.

The real challenge is to understand what trust boundaries you're comfortable with. What do you share with who? Nobody is an island, we must exist in an ecosystem of trust. We all have different boundaries of what we will share. That's quite all right. If you understand your trust boundary making good security/privacy decisions becomes a lot easier.

They say information is the new oil. If that's true then trust must be the currency.

Goldilocks Security: Bad, Won’t Work, and Plausible

Posted by Russel Doty on July 20, 2017 11:03 PM

Previous posts discussed the security challenge presented by IoT devices, using IP Video Cameras as an example. Now let’s consider some security alternatives:

Solution 1: Ignore Security

This is the most common approach to IoT security today. And, to a significant degree, it works. In the same way that ignoring fire safety usually works – only a few businesses or homes burn down each year!

Like fire safety, the risks from ignoring IoT security grow over time. Like fire safety, the cost of the relatively rare events can be catastrophic. Unlike fire safety, an IoT event can affect millions of entities at the same time.

And, unlike traditional IT security issues, IoT security issues can result in physical damage and personal injury. Needless to say, I do not recommend ignoring the issue as a viable approach to IoT security!

Solution 2: Secure the Cameras

Yes, you should secure IP cameras. They are computers sitting on your network – and should be treated like computers on your network! Best practices for IT security are well known and readily available. You should install and configure them securely, update them regularly, and monitor them continuously.

If you have a commercial implementation of an IP video security system you should have regular updates and maintenance of your system. You should be demanding strong security – both physical security and IT security – of the video security system.

You did have IT involved in selection, implementation and operation of the video security system, didn’t you? You did make security a key part of the selection process, just as you would for any other IT system, didn’t you? You are doing regular security scans of the video security system and monitoring all network traffic, aren’t you? Good, you have nothing to worry about!

If you are like many companies, you are probably feeling a bit nervous right now…

For home and small business customers, a “secure the camera” approach simply won’t work.

  • Customer ease of use expectations largely prevent effective security.
  • Customer knowledge and expertise doesn’t support secure configuration or updates to the system.
  • The IoT vendor business model doesn’t support security: Low cost, short product life, a great feature set, ease of use, and access over the Internet all conspire against security.
  • There is a demonstrated lack of demand for security. People have shown, by their actions and purchasing decisions, that effective security is not a priority. At least until there is a security breach – and then they are looking for someone to blame. And often someone to sue…

Securing the cameras is a great recommendation but generally will not work in practice. Unfortunately. Still, it should be a requirement for any Industrial IoT deployment.

Solution 3: Isolation

If ignoring the problem doesn’t work and fixing the problem isn’t viable, what is left? Isolation. If the IP cameras can’t be safely placed on the Internet, then isolate them from the Internet.

Such isolation will both protect the cameras from the Internet and protect the Internet from the cameras.

The challenge is that networked cameras have to be on the network to work.

Even though the cameras are designed to be directly connected to the Internet, they don’t have to be directly connected to the Internet. The cameras can be placed on a separate isolated network.

In my next post, I will go into detail on how to achieve this isolation using an IoT Gateway between the cameras and all the other systems.

Summer is coming

Posted by Josh Bressers on July 20, 2017 12:27 PM
I'm getting ready to attend Black Hat. Unfortunately, I will miss BSides and Defcon this year due to some personal commitments. And as I was packing up my gear, I started thinking about what these conferences have really changed. We've been doing this every summer for longer than many of us can remember now. We make our way to the desert, we attend talks by what we consider the brightest minds in our industry. We meet lots of people. Everyone has a great time. But what are the actionable events that come from these things?

The answer is nothing. They've changed nothing.

But I'm going to put an asterisk next to that.

I do think things are getting better, for some definition of better. Technology is marching forward, security is getting dragged along with a lot of it. Some things, like IoT, have some learning to do, but the real change won't come from the security universe.

Firstly we should understand that the world today has changed drastically. The skillset that mattered ten years ago doesn't have a lot of value anymore. Things like buffer overflows are far less important than they used to be. Coding in C isn't quite what it once was. There are many protections built into frameworks and languages. The cloud has taken over a great deal of infrastructure. The list can go on.

The point of such a list is to ask the question, how much of the important change that's made a real difference came from our security leaders? I'd argue not very much. The real change comes from people we've never heard of. There are people in the trenches making small changes every single day. Those small changes eventually pile up until we notice they're something big and real.

Rather than trying to fix the big problems, our time is better spent ignoring the thought leaders and just doing something small. Conferences are important, but not to listen to the leaders. Go find the vendors and attendees who are doing new and interesting things. They are the ones that will make a difference, they are literally the future. Even the smallest bug bounty, feature, or pull request can make a difference. The end goal isn't to be a noisy gasbag, instead it should be all about being useful.

New version of buildah 0.2 released to Fedora.

Posted by Dan Walsh on July 19, 2017 01:01 PM
New features and bugfixes in this release

Updated Commands
buildah run
     Add support for -- ending options parsing
     Add a way to disable PTY allocation
     Handle run without an explicit command correctly
buildah build-using-dockerfile (bud)
     Ensure volume points get created, and with perms
buildah containers
     Add a -a/--all option - Lists containers not created by buildah.
buildah add/copy
     Support for glob syntax
buildah commit
     Add flag to remove containers on commit
buildah push
     Improve man page and help information
buildah export
     Allows you to export a container image
buildah images
     Update commands
     Add JSON output option
buildah rmi
     Update commands
buildah containers
     Add JSON output option

New Commands
buildah version
     Identify version information about the buildah command
buildah export
     Allows you to export a container's image

Buildah docs: clarify --runtime-flag of run command
Update to match newer storage and image-spec APIs
Update containers/storage and containers/image versions

Episode 56 - Devil's Advocate and other fuzzy topics

Posted by Open Source Security Podcast on July 18, 2017 08:50 PM
Josh and Kurt talk about forest fires, fuzzing, old time Internet, and Net Neutrality. Listen to Kurt play the Devil's Advocate and manage to change Josh's mind about net neutrality.

Representative IoT Device: IP Video Camera

Posted by Russel Doty on July 17, 2017 09:58 PM

One of the most flexible, powerful, and useful IoT sensors is a video camera. Video streams can be used directly. They can also be analyzed using modern software and an incredible range of information extracted from the images: motion detection for eventing and alerts, automobile license recognition for parking systems and theft detection, facial recognition, manufacturing quality control, part location and orientation for robotics, local environment for autonomous vehicles, crop analysis for health and pests, and new uses that haven’t been thought of yet!

The IoT revolution for video cameras is the IP (Internet Protocol) camera – a video camera with an integrated computer that can talk directly to a network and provide video and still images in a format that can be directly manipulated by software. An IP camera is essentially a computer with an image sensor and a network interface. It is a surprisingly powerful computer that can do image processing, image analysis, image conversion, and image compression, and send multiple real-time video streams over the Internet. The IP cameras use standard processors, operating systems, and toolkits for video processing and networking.

Modern IP security cameras have high resolution – 3MP-5MP – excellent image quality, the ability to see in complete darkness, and good mechanical construction that can withstand direct exposure to the elements for many years. Many of these IP Video Cameras have enough processing power to be able to do motion detection inside the camera – a rather advanced video analysis capability! They can be connected to the network over WiFi or Ethernet. A popular capability is PoE or Power over Ethernet, which allows a camera to use a single Ethernet cable for both network and power. For ease of use these IP cameras are designed to automatically connect to back-end servers in the cloud and then to display the video stream on smartphones.

These IP cameras are available with full support and regular updates from industrial suppliers at prices ranging from several hundred to a few thousand dollars per camera. They are commonly sold in systems that include cameras, installation, monitoring and recording systems and software, integration, and service and support. There are a few actual manufacturers of the cameras, and many OEMs place their own brand names on the cameras.

These same cameras are readily available to consumers for less than $100 through unofficial, unsupported, “grey market” channels.

IP cameras need an account for setup, configuration and management. They contain an embedded webserver with full control of the camera. Virtually all cameras have a root level account with username of admin and password of admin. Some of them even recommend that you change this default password… One major brand of IP cameras also has two hardcoded maintenance accounts with root access; you can’t change the password on these accounts. And you can discover the username and password with about 15 seconds of Internet research.

The business model that allows you to purchase a high quality IP camera for <$100 does not support lifetime updates of software. It also does not support high security – ease of use and avoiding support calls is the highest priority. Software updates can easily cause problems – and the easiest way to avoid problems caused by software updates is to avoid software updates. The result is a “fire and forget” model where the software in the IP camera is never updated after the camera is installed. This means that security vulnerabilities are never addressed.

Let’s summarize:

  • IP video cameras are powerful, versatile and flexible IoT sensors that can be used for many purposes.
  • High quality IP cameras are readily available at low cost.
  • IP video cameras are powerful general purpose computers.
  • The business model for IP video cameras results in cameras that are seldom updated and are typically not configured for good security.
  • IP video cameras are easy to compromise and take over.
    • Can be used to penetrate the rest of your network.
    • Can be used to attack the Internet.
  • There are tens of millions of IP video cameras installed.

So far we have outlined the problem. The next post will begin to explore how we can address the security issues – including obvious approaches that won’t work…

Episode 55 - Good docs ruin my story

Posted by Open Source Security Podcast on July 12, 2017 01:56 PM
Josh and Kurt talk about Let's Encrypt, certificates, Kaspersky, A/V, code signing, Not Petya, self driving cars, and failures that become security problems.

How I Survived the Internet of Things

Posted by Russel Doty on July 12, 2017 12:32 AM

Working with IoT from a software architecture perspective teaches you a lot, but leaves the nagging question “how does this really work?”. Theory is great and watching other people work is relaxing, but the time comes when I have to get my hands dirty. So I decided I had to actually implement an IoT project.

The first step was to define the goals for the project:

  • Hands-on experience with Industrial IoT technologies. I’m much more interested in Industrial IoT than Consumer IoT. I am not going to have anything to do with an Internet refrigerator!
  • Accomplish a real task with IoT:
    • Something useful and worthwhile; something that makes a difference.
    • Something usable – including by non-technical people!
    • Something robust and reliable. A system that can be expected to function for a decade or longer with essentially perfect reliability.
    • “Affordable” – a reasonably low entry cost, but with a bias toward functional capabilities, low maintenance, and long life. Balance initial costs with operational costs and minimize system elements that have monthly or yearly fees.
    • Secure – including system and network security. There will be much more on this topic!
  • Learn how things really work. Engage in hand to hand combat with sensors, devices, systems, wired vs. wireless, reliability, usability, interoperability, and the myriad other factors that crop up when you actually try to make something work.
  • A bias toward using commercial components and systems rather than building things out of Raspberry Pi and sensor modules. There isn’t anything wrong with Raspberry Pi and low level integration, I just wanted to work at a higher level.
  • And, to be completely honest, to have an excuse to play with some neat toys!

Based on these goals I chose to work on home automation with a focus on security and lighting. After considering many things that could be done I chose to implement monitoring of fire, carbon monoxide, power, temperature, water intrusion, perimeter intrusion, and video monitoring. I also implemented lighting control with the goals of power savings, convenience, and having lights on when you come home. When designing and implementing the various subsystems I chose commercial grade monitoring, sensors and controls.

I sometimes get the question “do you live in a bad neighborhood?” No, I live in a great neighborhood. The main reasons for this project were safety, reduced power consumption, and an excuse to play with neat toys. Yes, I got carried away…

October 2016: Things Attack the Internet

In October 2016 several large Internet sites were subjected to a massive DDoS (Distributed Denial of Service) attack carried out by hundreds of thousands, perhaps millions, of compromised IP video cameras and home routers. These attacks were some of the highest bandwidth attacks ever observed and are hard to defend against.

In January of 2017, an estimated 70% of the security cameras in Washington DC were compromised by malware and were not able to stream video. Workers had to physically go to each individual camera and do a fresh install of the original firmware to return them to operation.

Security experts have been warning about weaknesses in IoT for years. Many of these warnings are about how easy it is to compromise and subvert IoT systems. The October 2016 attacks showed that these IoT weaknesses can also be used to directly attack key parts of the Internet. A larger attack could potentially make the Internet unusable!

Since IP cameras were used in the first major attack by IoT on the Internet and I have several of these cameras installed in my system, let’s start our case study with them.

The next article will begin exploring the capabilities, security, and business model of powerful and affordable IoT devices.

Implications of Common Name deprecation for Dogtag and FreeIPA

Posted by Fraser Tweedale on July 11, 2017 03:25 AM

Or, ERR_CERT_COMMON_NAME_INVALID, and what we are doing about it.

Google Chrome version 58, released in April 2017, removed support for the X.509 certificate Subject Common Name (CN) as a source of naming information when validating certificates. As a result, certificates that do not carry all relevant domain names in the Subject Alternative Name (SAN) extension result in validation failures.

At the time of writing this post Chrome is just the first mover, but Mozilla Firefox and other programs and libraries will follow suit. The public PKI used to secure the web and other internet communications is largely unaffected (browsers and CAs moved a long time ago to ensure that certificates issued by publicly trusted CAs carried all DNS naming information in the SAN extension), but some enterprises running internal PKIs are feeling the pain.

In this post I will provide some historical and technical context to the situation, and explain what we are doing in Dogtag and FreeIPA to ensure that we issue valid certificates.


X.509 certificates carry subject naming information in two places: the Subject Distinguished Name (DN) field, and the Subject Alternative Name extension. There are many types of attributes available in the DN, including organisation, country, and common name. The definitions of these attribute types came from X.500 (the precursor to LDAP) and all have an ASN.1 representation.

Within the X.509 standard, the CN has no special interpretation, but when certificates first entered widespread use in the SSL protocol, it was used to carry the domain name of the subject site or service. When connecting to a web server using TLS/SSL, the client would check that the CN matches the domain name they used to reach the server. If the certificate is chained to a trusted CA, the signature checks out, and the domain name matches, then the client has confidence that all is well and continues the handshake.

But there were a few problems with using the Common Name. First, what if you want a certificate to support multiple domain names? This was especially a problem for virtual hosts in the pre-SNI days where one IP address could only have one certificate associated with it. You can have multiple CNs in a Distinguished Name, but the semantics of X.500 DNs are strictly hierarchical. It is not an appropriate use of the DN to cram multiple, possibly non-hierarchical domain names into it.

Second, the CN in X.509 has a length limit of 64 characters. DNS names can be longer. The length limit is too restrictive, especially in the world of IaaS and PaaS where hosts and services are spawned and destroyed en masse by orchestration frameworks.

Third, some types of subject names do not have a corresponding X.500 attribute, including domain names. The solution to all three of these problems was the introduction of the Subject Alternative Name X.509 extension, to allow more types of names to be used in a certificate. (The SAN extension is itself extensible; apart from DNS names, other important name types include IP addresses, email addresses, URIs and Kerberos principal names.) TLS clients added support for validating SAN DNSName values in addition to the CN.

The use of the CN field to carry DNS names was never a standard. The Common Name field does not have these semantics; but using the CN in this way was an approach that worked. This interpretation was later formalised by the CA/B Forum in their Baseline Requirements for CAs, but only as a reflection of a current practice in SSL/TLS server and client implementations. Even in the Baseline Requirements the CN was a second-class citizen; they mandated that if the CN was present at all, it must reflect one of the DNSName or IP address values from the SAN extension. All public CAs had to comply with this requirement, which is why Chrome’s removal of CN support is only affecting private PKIs, not public web sites.

Why remove CN validation?

So, Common Name was not ideal for carrying DNS naming information, but given that we now have SAN, was it really necessary to deprecate it, and is it really necessary to follow through and actually stop using it, causing non-compliant certificates that were previously accepted to now be rejected?

The most important reason for deprecating CN validation is the X.509 Name Constraints extension. Name Constraints, if they appear in a CA certificate or intermediate CA certificate, constrain the valid subject names on leaf certificates. Various name types are supported including DNS names; a DNS name constraint restricts the domain of validity to the domain(s) listed and subdomains thereof. For example, if the DNS name example.com appears in a CA certificate’s Name Constraints extension, leaf certificates with a DNS name of example.com or foo.example.com could be valid, but a DNS name of foo.example.net could not be valid. Conforming X.509 implementations must enforce these constraints.

But these constraints only apply to SAN DNSName values, not to the CN. This is why accepting DNS naming information in the CN had to be deprecated – the name constraints cannot be properly enforced!
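As a rough illustration of the rule described above, the following Python sketch checks SAN DNSName values against a list of permitted DNS subtrees. It is a simplified model, not the algorithm of any particular X.509 library; the function names are hypothetical, and excluded subtrees and other name types are deliberately ignored.

```python
def dns_name_within_constraint(name: str, constraint: str) -> bool:
    """Return True if `name` equals `constraint` or is a subdomain of it.

    Simplified model of a DNS name constraint: the constraint is
    satisfied by adding zero or more labels on the left.
    """
    name_labels = name.lower().rstrip(".").split(".")
    constraint_labels = constraint.lower().rstrip(".").split(".")
    if len(name_labels) < len(constraint_labels):
        return False
    # Compare whole labels from the right, so that "badexample.com"
    # is not mistaken for a subdomain of "example.com".
    return name_labels[-len(constraint_labels):] == constraint_labels


def san_dns_names_permitted(san_dns_names, permitted_subtrees) -> bool:
    """Every SAN DNSName must fall within at least one permitted subtree."""
    return all(
        any(dns_name_within_constraint(n, c) for c in permitted_subtrees)
        for n in san_dns_names
    )


if __name__ == "__main__":
    permitted = ["example.com"]
    print(san_dns_names_permitted(["example.com", "foo.example.com"], permitted))
    print(san_dns_names_permitted(["foo.example.net"], permitted))
```

Note that the check only ever sees SAN values. A DNS name carried solely in the CN never passes through it, which is exactly the gap described above.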

So back in May 2000 the use of Common Name for carrying a DNS name was deprecated by RFC 2818. Although it deprecated the practice this RFC required clients to fall back to the Common Name if there were no SAN DNSName values on the certificate. Then in 2011 RFC 6125 removed the requirement for clients to fall back to the common name, making this optional behaviour. Over recent years, some TLS clients began emitting warnings when they encountered certificates without SAN DNSNames, or where a DNS name in the CN did not also appear in the SAN extension. Finally, Chrome has become the first widely used client to remove support.
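The progression from mandatory fallback, to optional fallback, to no fallback can be sketched in Python as follows. This is a minimal model under stated assumptions: the function names and the allow_cn_fallback switch are hypothetical, only exact (case-insensitive) matches are handled, and wildcard matching is left out.

```python
def names_to_check(san_dns_names, common_name, allow_cn_fallback):
    """Pick the reference names a client would compare the hostname against."""
    if san_dns_names:
        # If any SAN DNSName is present, the CN is never consulted.
        return list(san_dns_names)
    if allow_cn_fallback and common_name:
        # Legacy behaviour: fall back to the CN only when no SAN DNSName exists.
        return [common_name]
    return []


def hostname_matches(hostname, san_dns_names, common_name, allow_cn_fallback=False):
    """Exact, case-insensitive hostname check (wildcards not handled)."""
    host = hostname.lower().rstrip(".")
    candidates = names_to_check(san_dns_names, common_name, allow_cn_fallback)
    return any(host == name.lower().rstrip(".") for name in candidates)


if __name__ == "__main__":
    # A certificate with only a CN: accepted with fallback, rejected without.
    print(hostname_matches("www.example.com", [], "www.example.com", allow_cn_fallback=True))
    print(hostname_matches("www.example.com", [], "www.example.com", allow_cn_fallback=False))
```

With allow_cn_fallback=False the function models a client that has removed CN support, so a certificate without SAN DNSName values can never match.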

Despite more than 15 years’ notice on the deprecation of this use of Common Name, a lot of CA software and client tooling still does not have first-class support for the SAN extension. Most tools used to generate CSRs do not even ask about SAN, and require complex configuration to generate a request bearing the SAN extension. Similarly, some CA programs do not do a good job of issuing RFC-compliant certificates. Right now, this includes Dogtag and FreeIPA.

Subject Alternative Name and FreeIPA

For some years, FreeIPA (in particular, the default profile for host and service certificates, called caIPAserviceCert) has supported the SAN extension, but the client is required to submit a CSR containing the desired SAN extension data. The names in the CSR (the CN and all alternative names) get validated against the subject principal, and then the CA would issue the certificate with exactly those names. There was no way to ensure that the domain name in the CN was also present in the SAN extension.

We could add this requirement to FreeIPA’s CSR validation routine, but this imposes an unreasonable burden on the user to "get it right". Tools like OpenSSL have poor usability and complex configuration. Certmonger supports generating a CSR with the SAN extension but it must be explicitly requested. For FreeIPA’s own certificates, we have (in recent major releases) ensured that they have contained the SAN extension, but this is not the default behaviour and that is a problem.

FreeIPA 4.5 brought with it a CSR autogeneration feature that, for a given certificate profile, lets the administrator specify how to construct a CSR appropriate for that profile. This reduces the burden on the end user, but they must still opt in to this process.

Subject Alternative Name and Dogtag

Until Dogtag 10.4, there were two ways to produce a certificate with the SAN extension. One was the SubjectAltNameExtDefault profile component, which, for a given profile, supports a fixed number of names, either hard coded or based on particular request attributes (e.g. the CN, the email address of the authenticated user, etc). The other was the UserExtensionDefault which copies a given extension from the CSR to the final certificate verbatim (no validation of the data occurs). We use UserExtensionDefault in FreeIPA’s certificate profile (all names are validated by the FreeIPA framework before the request is submitted to Dogtag).

Unfortunately, SubjectAltNameExtDefault and UserExtensionDefault are not compatible with each other. If a profile uses both and the CSR contains the SAN extension, issuance will fail with an error because Dogtag tried to add two SAN extensions to the certificate.

In Dogtag 10.4 we introduced a new profile component that improves the situation, especially for dealing with the removal of client CN validation. The CommonNameToSANDefault will cause any profile that uses it to examine the Common Name, and if it looks like a DNS name, it will add it to the SAN extension (creating the extension if necessary).

Ultimately, what is needed is a way to define a certificate profile that just makes the right certificate, without placing an undue burden on the client (be it a human user or a software agent). The complexity and burden should rest with Dogtag, for the sake of all users. We are gradually making steps toward this, but it is still a long way off. I have discussed this utopian vision in a previous post.

Configuring CommonNameToSANDefault

If you have Dogtag 10.4, here is how to configure a profile to use the CommonNameToSANDefault. Add the following policy directives (the policyset and serverCertSet and index 12 are indicative only, but the index must not collide with other profile components):

policyset.serverCertSet.12.constraint.name=No Constraint
policyset.serverCertSet.12.default.name=Copy Common Name to Subject

Add the index to the list of profile policies:


Then import the modified profile configuration, and you are good to go. There are a few minor caveats to be aware of:

  • Names containing wildcards are not recognised as DNS names. The rationale is twofold; wildcard DNS names, although currently recognised by most programs, are technically a violation of the X.509 specification (RFC 5280), and they are discouraged by RFC 6125. Therefore if the CN contains a wildcard DNS name, CommonNameToSANDefault will not copy it to the SAN extension.
  • Single-label DNS names are not copied. It is unlikely that people will use Dogtag to issue certificates for top-level domains. If CommonNameToSANDefault encounters a single-label DNS name, it will assume it is actually not a DNS name at all, and will not copy it to the SAN extension.
  • The CommonNameToSANDefault policy index must come after UserExtensionDefault, SubjectAltNameExtDefault, or any other component that adds the SAN extension, otherwise an error may occur because the older components do not gracefully handle the situation where the SAN extension is already present.
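The behaviour described by the first two caveats can be approximated in a few lines of Python. This is an illustrative sketch only and not Dogtag's implementation: the function names and the exact "looks like a DNS name" heuristic (letters, digits and hyphens per label, at least two labels, no wildcard) are assumptions.

```python
import re

# One DNS label: 1-63 characters, letters/digits/hyphens, no leading or
# trailing hyphen. This heuristic is an assumption made for illustration.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def looks_like_dns_name(cn: str) -> bool:
    """Heuristic: multi-label, no wildcard, every label syntactically valid."""
    if "*" in cn:
        return False  # wildcard names are not treated as DNS names
    labels = cn.rstrip(".").split(".")
    if len(labels) < 2:
        return False  # single-label names are assumed not to be DNS names
    return all(_LABEL.fullmatch(label) for label in labels)


def copy_cn_to_san(cn: str, san_dns_names):
    """Return the SAN DNSName list with the CN appended when appropriate."""
    result = list(san_dns_names)
    already_present = cn.lower() in (name.lower() for name in result)
    if looks_like_dns_name(cn) and not already_present:
        result.append(cn)
    return result


if __name__ == "__main__":
    for cn in ["www.example.com", "*.example.com", "localhost", "Alice Example"]:
        print(cn, "->", copy_cn_to_san(cn, []))
```

Starting from an existing SAN list, rather than creating a fresh one, mirrors the ordering caveat: the copy step is meant to run after any component that has already populated the extension.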

What we are doing in FreeIPA

Updating FreeIPA profiles to use CommonNameToSANDefault is trickier – FreeIPA configures Dogtag to use LDAP-based profile storage, and mixed-version topologies are possible, so updating a profile to use the new component could break certificate requests on other CA replicas if they are not all at the new versions. We do not want this situation to occur.

The long-term fix is to develop a general, version-aware profile update mechanism that will import the best version of a profile supported by all CA replicas in the topology. I will be starting this effort soon. When it is in place we will be able to safely update the FreeIPA-defined profiles in existing deployments.

In the meantime, we will bump the Dogtag dependency and update the default profile for new installations only in the 4.5.3 point release. This will be safe to do because you can only install replicas at the same or newer versions of FreeIPA, and it will avoid the CN validation problems for all new installations.


In this post we looked at the technical reasons for deprecating and removing support for CN domain validation in X.509 certificates, and discussed the implications of this finally happening, namely: none for the public CA world, but big problems for some private PKIs and programs including FreeIPA and Dogtag. We looked at the new CommonNameToSANDefault component in Dogtag that makes it easier to produce compliant certs even when the tools to generate the CSR don’t help you much, and discussed upcoming and proposed changes in FreeIPA to improve the situation there.

One big takeaway from this is to be more proactive in dealing with deprecated features in standards, APIs or programs. It is easy to punt on the work, saying "well yes it is deprecated but all the programs still support it…" The thing is, tomorrow they may not support it anymore, and when it was deprecated for good reasons you really cannot lay the blame at Google (or whoever). On the FreeIPA team we (and especially me as PKI wonk in residence) were aware of these issues but kept putting off the work. Then one day users and customers start having problems accessing their internal services in Chrome! 15 years should have been enough time to deal with it… but we (I) did not.

Lesson learned.

Who's got your hack back?

Posted by Josh Bressers on July 09, 2017 12:22 AM
The topic of hacking back keeps coming up these days. There's an attempt to pass a bill in the US that would legalize hacking back. There are many opinions on this topic, and I'm generally not one to take a hard stand against what someone else thinks. In this case though, if you think hacking back is a good idea, you're wrong. Painfully wrong.

Everything I've seen up to this point tells me the people who think hacking back is a good idea are either mistaken about the issue or they're misleading others on purpose. Hacking back isn't self defense, it's not about being attacked, it's not about protection. It's a terrible idea that has no place in a modern society. Hacking back is some sort of stone age retribution tribal law. It has no place in our world.

Rather than break the various arguments apart, let's think about two examples that exist in the real world.

Firstly, why don't we give the people doing mall security guns? There is one really good reason I can think of here. The insurance company that holds the policy on the mall would never allow the security to carry guns. If you let security carry guns, they will use them someday. They'll probably use them in an inappropriate manner, the mall will be sued, and they will almost certainly lose. That doesn't mean the mall has to pay a massive settlement, it means the insurance company has to pay a massive settlement. They don't want to do that. Even if some crazy law claims it's not illegal to hack back, no sane insurance company will allow it. I'm not talking about cyber insurance, I'm just talking about general policies here.

The second example revolves around shoplifting. If someone is caught stealing from a store, does someone go to their house and take some of their stuff in retribution? They don't of course. Why not? Because we're not cave people anymore. That's why. Retribution style justice has no place in a modern civilization. This is how a feud starts. Nobody has ever won a feud; at best it's a draw when they all kill each other.

So this has me really thinking. Why would anyone want to hack back? There aren't many reasons that don't revolve around revenge. The way most attacks work you can't reliably know who is doing what with any sort of confidence. Hacking back isn't going to make anything better. It would make things a lot worse. Nobody wants to be stuck in the middle of a senseless feud. Well, nobody sane.

Redeploying just virt-controller for Kubevirt development

Posted by Adam Young on July 07, 2017 05:11 PM

Bottom line up front:

cluster/kubectl.sh delete -f manifests/virt-controller.yaml
cluster/kubectl.sh create -f manifests/virt-controller.yaml

When reworking code (refactoring or rewriting) you want to make sure the tests run. While Unit tests run quickly and within the code tree, functional tests require a more dedicated setup. Since the time to deploy a full live cluster is non-trivial, we want to be able to deploy only the component we’ve been working on. In the case of virt-controller, this is managed as a service, a deployment, and a single pod. All are defined by manifests/virt-controller.yaml.

To update a deployment, we need to make sure that the next time the containers run, they contain the new code. ./cluster/vagrant/sync_build.sh does a few things to make that happen. It compiles the Go code, rebuilds the containers, and uploads them to the image repositories on the vagrant machines.

All of these steps can be done using the single line:

make vagrant-deploy

but it will take a while.  I ran it using the time command and it took 1m9.724s.

make alone takes 0m5.685s.

./cluster/vagrant/sync_build.sh  takes 0m24.773s

cluster/kubectl.sh delete -f manifests/virt-controller.yaml takes 0m3.265s


cluster/kubectl.sh create -f manifests/virt-controller.yaml takes 0m0.203s.  I find that running it this way keeps me from getting distracted and losing the zone.

Running make docker is very slow, as it regenerates all of the docker containers.  If you don’t really care about all of them, you can generate just virt-controller by running:

./hack/build-docker.sh build virt-controller

Which takes 0m1.521s.

So, the gating factor seems to be the roughly 40 second deploy time for ./cluster/vagrant/sync_build.sh.  Not ideal for rapid development, but not horrible.
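For repeated runs, the steps timed above could be wrapped in a small Python helper that runs them in order and reports how long each one took. This is only a sketch: the redeploy function and its run parameter are hypothetical, the step order is an assumption based on the sequence timed in this post, and it must be started from the top of the kubevirt source tree.

```python
import subprocess
import time

# Commands taken from this post; the ordering is an assumption.
STEPS = [
    ["./cluster/vagrant/sync_build.sh"],
    ["cluster/kubectl.sh", "delete", "-f", "manifests/virt-controller.yaml"],
    ["cluster/kubectl.sh", "create", "-f", "manifests/virt-controller.yaml"],
]


def redeploy(run=subprocess.check_call):
    """Run each step in order and return (command, seconds) pairs.

    `run` is injectable so the sequence can be exercised without a cluster.
    With the default runner, a failing step raises and stops the sequence.
    """
    timings = []
    for cmd in STEPS:
        start = time.monotonic()
        run(cmd)
        timings.append((" ".join(cmd), time.monotonic() - start))
    return timings


if __name__ == "__main__":
    for command, seconds in redeploy():
        print(f"{seconds:8.3f}s  {command}")
```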


Episode 54 - Turning into an old person

Posted by Open Source Security Podcast on July 04, 2017 09:07 PM
Josh and Kurt talk about Canada Day, Not Petya, Interac goes down, Minecraft, airport security and books, then GDPR.


Sausage Factory: Advanced module building in Fedora

Posted by Stephen Gallagher on June 30, 2017 01:58 PM

First off, let me be very clear up-front: normally, I write my blog articles to be approachable by readers of varying levels of technical background (or none at all). This will not be one of those. This will be a deep dive into the very bowels of the sausage factory.

This blog post is a continuation of the Introduction to building modules in Fedora entry I wrote last month. It will assume a familiarity with all of the concepts discussed there.

Analyzing a more complicated module

Last time, we picked an extremely simple package to create. The talloc module needed to contain only a single RPM, since all the dependencies necessary both at build-time and runtime were available from the existing base-runtime, shared-userspace and common-build-dependencies packages.

This time, we will pick a slightly more complicated example that will require exploring some of the concepts around building with package dependencies. For this purpose, I am selecting the sscg package (one of my own and discussed previously on this blog in the article “Self-Signed SSL/TLS Certificates: Why they are terrible and a better alternative“).

We will start by analyzing sscg‘s dependencies. As you probably recall from the earlier post, we can do this with dnf repoquery:

dnf repoquery --requires sscg.x86_64 --resolve

Which returns with:


and then also get the build-time dependencies with:

dnf repoquery --requires --enablerepo=fedora-source --enablerepo=updates-source sscg.src --resolve

Which returns with:


So let’s start by narrowing down the set of dependencies we already have by comparing them to the three foundational modules. The base-runtime module provides gcc, glibc, openssl-libs, openssl-devel, popt, and popt-devel. The shared-userspace module provides libpath_utils and libpath_utils-devel as well, which leaves us with only libtalloc as an unsatisfied dependency. Wow, what a convenient and totally unexpected outcome when I chose this package at random! Kidding aside, in most real-world situations this would be the point at which we would start recursively going through the leftover packages and seeing what their dependencies are. In this particular case, we know from the previous article that libtalloc is self-contained, so we will only need to include sscg and libtalloc in the module.

As with the libtalloc example, we need to now clone the dist-git repositories of both packages and determine the git hash that we intend to use for building the sscg module. See the previous blog post for details on this.

Creating a module with internal dependencies

Now let’s set up our git repository for our new module:

mkdir sscg && cd sscg
touch sscg.yaml
git init
git add sscg.yaml
git commit -m "Initial setup of the module"

And then we’ll edit the sscg.yaml the same way we did for the libtalloc module:

document: modulemd
version: 1
data:
  summary: Simple SSL certificate generator
  description: A utility to aid in the creation of more secure "self-signed" certificates. The certificates created by this tool are generated in a way so as to create a CA certificate that can be safely imported into a client machine to trust the service certificate without needing to set up a full PKI environment and without exposing the machine to a risk of false signatures from the service certificate.
  stream: ''
  version: 0
  license:
    module:
    - GPLv3+
  references:
    community: https://github.com/sgallagher/sscg
    documentation: https://github.com/sgallagher/sscg/blob/master/README.md
    tracker: https://github.com/sgallagher/sscg/issues
  dependencies:
    buildrequires:
      base-runtime: f26
      shared-userspace: f26
      common-build-dependencies: f26
      perl: f26
    requires:
      base-runtime: f26
      shared-userspace: f26
  api:
    rpms:
    - sscg
  profiles:
    default:
      rpms:
      - sscg
  components:
    rpms:
      libtalloc:
        rationale: Provides a hierarchical memory allocator with destructors. Dependency of sscg.
        ref: f284a27d9aad2c16ba357aaebfd127e4f47e3eff
        buildorder: 0
      sscg:
        rationale: Purpose of this module. Provides certificate generation helpers.
        ref: d09681020cf3fd33caea33fef5a8139ec5515f7b
        buildorder: 1

There are several changes from the libtalloc example in this modulemd, so let’s go through them one at a time.

The first you may notice is the addition of perl in the buildrequires: dependencies. This is actually a workaround at the moment for a bug in the module-build-service where not all of the runtime requirements of the modules specified as buildrequires: are properly installed into the buildroot. It’s unfortunate, but it should be fixed in the near future and I will try to remember to update this blog post when it happens.

You may also notice that the api section only includes sscg and not the packages from the libtalloc component. This is intentional. For the purposes of this module, libtalloc satisfies some dependencies for sscg, but as the module owner I do not want to treat libtalloc as a feature of this module (and by extension, support its use for anything other than the portions of the library used by sscg). It remains possible for consumers of the module to link against it and use it for their own purposes, but they are doing so without any guarantee that the interfaces will remain stable or even be present on the next release of the module.

Next on the list is the addition of the entirely-new profiles section. Profiles are a way to indicate to the package manager (DNF) that some packages from this module should automatically be installed when the module is activated if a certain system profile is enabled. The ‘default’ profile will take effect if no other profile is explicitly set. So in this case, the expectation if a user did dnf module install sscg would be to activate this module and install the sscg package (along with its runtime dependencies) immediately.

Lastly, under the RPM components there is a new option, buildorder. This is used to inform the MBS that some packages are dependent upon others in the module when building. In our case, we need libtalloc to be built and added into the buildroot before we can build sscg or else the build will fail and we will be sad. By adding buildorder, we tell the MBS: it’s okay to build any of the packages with the same buildorder value concurrently, but we should not attempt to build anything with a higher buildorder value until all of those lower have completed. Once all packages in a buildorder level are complete, the MBS will generate a private buildroot repository for the next buildorder to use which includes these packages. If the buildorder value is left out of the modulemd file, it is treated as being buildorder: 0.
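To make the ordering rule concrete, here is a small Python sketch of how components could be grouped into build batches by their buildorder value. It models only the rule as described above and is not the MBS implementation; the build_batches function and the dictionary layout are hypothetical.

```python
from collections import defaultdict


def build_batches(components):
    """Group component names into batches, lowest buildorder first.

    `components` maps a component name to its modulemd settings.
    A missing buildorder is treated as 0, as described above.
    """
    groups = defaultdict(list)
    for name, settings in components.items():
        groups[settings.get("buildorder", 0)].append(name)
    # Components within one batch may build concurrently; a batch only
    # starts once every lower-numbered batch has completed.
    return [sorted(groups[order]) for order in sorted(groups)]


if __name__ == "__main__":
    components = {
        "libtalloc": {"buildorder": 0},
        "sscg": {"buildorder": 1},
    }
    print(build_batches(components))  # [['libtalloc'], ['sscg']]
```

For this module the result is two batches, so libtalloc is finished and available in the buildroot before sscg starts.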

At this point, you should be able to go ahead and commit this modulemd file to git and run mbs-build local successfully. Enjoy!

Protected: DRAFT: Sausage Factory: Advanced module building in Fedora

Posted by Stephen Gallagher on June 30, 2017 01:34 PM

This post is password protected. You must visit the website and enter the password to continue reading.

Quick Blog on Buildah.

Posted by Dan Walsh on June 30, 2017 12:39 PM
Buildah is a new tool that we released last week for building containers without requiring a container runtime daemon running. --nodockerneeded

Here is a blog that talks about some of its features.


Our main goal was to make this simple.  I was asked by a fellow engineer about a feature that docker has for copying a file out of a container onto the host.  "docker cp".  In docker this ends up being a client server operation, and required someone to code it up.  We don't have this feature in buildah.  :^(

BUT, buildah gives you the primitives you need to do simpler functionality and allows you to use the full power of bash.  If I want to copy a file out of a container, I can simply mount the container and copy it out.

# mnt=$(buildah mount CONTAINER_ID)
# buildah umount CONTAINER_ID

The beauty of this is we could use lots of tools, I could scp if I wanted to copy to another machine, or rsync, or ftp...

Once you have the container mounted, you can use any bash command on it to move files in or out.

buildah == simplicity

Running Kubevirt functional tests in Gogland

Posted by Adam Young on June 30, 2017 12:42 AM

When tests fail, as they often will, the debugger can greatly shorten the time it takes to figure out why.  The Kubevirt functional tests run essentially as a remote client.  Getting a debuggable setup is not that different from my earlier post on running virt-launcher in the debugger.

I started by trying to run the unit tests like other tests, but had a similar problem to the virt-controller setup.  What is not clear from looking at cluster/run_tests.sh is that it does a directory-based run of the tests.  Changing the config to look like this worked:

Note that I changed Kind to directory and edited the Directory field to point to where we have our unit tests.  The Program Arguments field looks like this in full:

-master= --kubeconfig=/home/ayoung/go/src/kubevirt.io/kubevirt/cluster/vagrant/.kubeconfig


What's new in José v8?

Posted by Nathaniel McCallum on June 29, 2017 01:22 PM
Wait! What’s José? José is a general purpose cryptography toolkit which uses the data formats standardized by the JOSE IETF Working Group. By analogy, José is to JOSE what GPG is to OpenPGP and OpenSSL is to X.509. José provides both a C-language library and a command line interface and is licensed under the Apache Software License version 2.0. José v8+ is available on Fedora 26+ (dnf install jose) and macOS/Homebrew (brew install jose).

Episode 53 - A plane isn't like a car

Posted by Open Source Security Podcast on June 28, 2017 12:14 PM
Josh and Kurt talk about security through obscurity, airplanes, the FAA, the Windows source code leak, and chicken sandwiches.


Wildcard SAN certificates in FreeIPA

Posted by Fraser Tweedale on June 26, 2017 12:48 PM

In an earlier post I discussed how to make a certificate profile for wildcard certificates in FreeIPA, where the wildcard name appeared in the Subject Common Name (CN) (but not the Subject Alternative Name (SAN) extension). Apart from the technical details that post also explained that wildcard certificates are deprecated, why they are deprecated, and therefore why I was not particularly interested in pursuing a way to get wildcard DNS names into the SAN extension.

But, as was portended long ago (more than 15 years, when RFC 2818 was published) DNS name assertions via the CN field are deprecated, and finally some client software removed CN name processing support. The Chrome browser is first off the rank, but it won’t be the last!

Unfortunately, programs that have typically used wildcard certificates (hosting services/platforms, PaaS, and sites with many subdomains) are mostly still using wildcard certificates, and FreeIPA still needs to support these programs. As much as I would like to say "just use Let’s Encrypt / ACME!", it is not realistic for all of these programs to update in so short a time. Some may never be updated. So for now, wildcard DNS names in SAN is more than a "nice to have" – it is a requirement for a handful of valid use cases.


Here is how to do it in FreeIPA. Most of the steps are the same as in the earlier post so I will not repeat them here. The only substantive difference is in the Dogtag profile configuration.

In the profile configuration, set the following directives (note that the key serverCertSet and the index 12 are indicative only; the index does not matter as long as it is different from the other profile policy components):

policyset.serverCertSet.12.constraint.name=No Constraint
policyset.serverCertSet.12.default.name=Subject Alternative Name Extension Default

Also be sure to add the index to the directive containing the list of profile policies:


This configuration will cause two SAN DNSName values to be added to the certificate – one using the CN from the CSR, and the other using the CN from the CSR preceded by a wildcard label.
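In other words, for a given CN the resulting SAN extension would carry a pair of names like the one produced by this small Python sketch. The wildcard_san_values function is hypothetical and only illustrates the outcome, not how Dogtag computes it.

```python
def wildcard_san_values(cn: str):
    """Return the two DNSName values described above for a given CN."""
    # The CN itself, plus the CN preceded by a wildcard label.
    return [cn, "*." + cn]


if __name__ == "__main__":
    print(wildcard_san_values("app.example.com"))
    # ['app.example.com', '*.app.example.com']
```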

Finally, be aware that because the subjectAltNameExtDefaultImpl component adds the SAN extension to a certificate, it conflicts with the userExtensionDefault component when configured to copy the SAN extension from a CSR to the new certificate. This profile component will have a configuration like the following:

policyset.serverCertSet.11.constraint.name=No Constraint
policyset.serverCertSet.11.default.name=User Supplied Extension Default

Again the numerical index is indicative only, but the OID is not; 2.5.29.17 is the OID for the SAN extension. If your starting profile configuration contains the same directives, remove them from the configuration, and remove the index from the policy list too:



The profile containing the configuration outlined above will issue certificates with a wildcard DNS name in the SAN extension, alongside the DNS name from the CN. Mission accomplished; but note the following caveats.

This configuration cannot contain the userExtensionDefaultImpl component, which copies the SAN extension from the CSR to the final certificate if present in the CSR, because any CSR that contains a SAN extension would cause Dogtag to attempt to add a second SAN extension to the certificate (this is an error). It would be better if the conflicting profile components somehow "merged" the SAN values, but this is not their current behaviour.

Because we are not copying the SAN extension from the CSR, any SAN extension in the CSR gets ignored by Dogtag – but not by FreeIPA; the FreeIPA CSR validation machinery always fully validates the subject alternative names it sees in a CSR, regardless of the Dogtag profile configuration.

If you work on software or services that currently use wildcard certificates please start planning to move away from this. CN validation was deprecated for a long time and is finally being phased out; wildcard certificates are also deprecated (RFC 6125) and they too may eventually be phased out. Look at services and technologies like Let’s Encrypt (a free, automated, publicly trusted CA) and ACME (the protocol that powers it) for acquiring all the certificates you need without administrator or operator intervention.

When in doubt, blame open source

Posted by Josh Bressers on June 26, 2017 12:54 AM
If you've not read my previous post on thought leadership, go do that now, this one builds on it. The thing that really kicked off my thinking on these matters was this article:

Security liability is coming for software: Is your engineering team ready?

The whole article is pretty silly, but the bit about liability and open source is the real treat. There's some sort of special consideration when you use open source apparently, we'll get back to that. Right now there is basically no liability of any sort when you use software. I doubt there will be anytime soon. Liability laws are tricky, but the lawyers I've spoken with have been clear that software isn't currently covered in most instances. The whole article is basically nonsense from that respect. The people they interview set the stage for liability and responsibility then seem to discuss how open source should be treated special in this context.

Nothing is special, open source is no better or worse than closed source software. If you build something why would open source need more responsibility than closed source? It doesn't of course, it's just an easy target to pick on. The real story is we don't know how to deal with this problem. Open source is an easy boogeyman. It's getting picked on because we don't know where else to point the finger.

The real problem is we don't know how to secure our software in an acceptable manner. Trying to talk about liability and responsibility is fine, nobody is going to worry about security until they have to. Using open source as a discussion point in this conversation clouds it though. We now get to shift the conversation from how do we improve security, to blaming something else for our problems. Open source is one of the tools we use to build our software. It might be the most powerful tool we've ever had. Tools are never the problem in a broken system even though they get blamed on a regular basis.

The conversation we must have revolves around incentives. There is no incentive to build secure software. Blaming open source or talking about responsibility are just attempts to skirt the real issue. We have to fix our incentives. Liability could be an incentive, regulation can be an incentive. User demand can be an incentive as well. Today the security quality of software doesn't seem to matter.

I'd like to end this saying we should make an effort to have more honest discussions about security incentives, but I don't think that will happen. As I mention in my previous blog post, our problem is a lack of leadership. Even if we fix security incentives, I don't see things getting much better under current leadership.

Running virt-controller locally

Posted by Adam Young on June 24, 2017 02:16 AM

While developing Kubevirt, I often want to step through my code. My most recent tasks have involved the virt-controller process. Here’s how I debug them.

UPDATED: See the full command line at the end.

A little background; during development, we use a vagrant based deployment of Kubernetes running in a two node cluster. We need two nodes to test VM migrations. This has been the biggest reason we have not yet moved to minikube.

The various pieces of kubevirt run in the vagrant cluster inside containers. However, the design of Kubernetes (which kubevirt follows) allows a controller to run from pretty much anywhere that has access to the API server. In this case, I want to run it from within my git repo on my laptop. I actually run it from gogland, as I like the interactive debugger, but that is not a requirement.

The first step is to bring up the cluster:

vagrant up
make vagrant-deploy

Once that is up and running (successfully), delete the virt-controller deployment.

cluster/kubectl.sh delete deployments virt-controller

One hack: I had to change the name of the file from cmd/virt-controller/virt-controller.go to cmd/virt-controller/main.go or I could not debug it. Seems to be a bug in gogland. Once I changed the file name, running it will report an error.

2017/06/23 22:09:25 invalid configuration: default cluster has no server defined

To run any of the kubernetes aware pieces you have to pass in a proper configuration environment. In the case of virt-controller, that is done by setting the parameters on the command line by adding the flag.

UPDATED: added the rest of the parameters:

--kubeconfig=/home/ayoung/go/src/kubevirt.io/kubevirt/cluster/vagrant/.kubeconfig    --launcher-image kubevirt/virt-launcher:devel --migrator-image kubevirt/virt-handler:devel --port 8182

Find the run toolbar.

Edit the configuration for main.go

Add the above line to Program Arguments.

Once you run it, you should see a successful startup in the logs:

 level=info timestamp=2017-06-24T02:10:35.858154Z pos=main.go:89 component=virt-controller service=http action=listening interface= port=8182
 level=info timestamp=2017-06-24T02:10:35.858175Z pos=vm.go:73 component=virt-controller service=http msg="Starting controller."
 level=info timestamp=2017-06-24T02:10:35.858272Z pos=migration.go:70 component=virt-controller service=http msg="Starting controller."

To run from the command line, after running make to compile all the files:

$ bin/virt-controller --kubeconfig=/home/ayoung/go/src/kubevirt.io/kubevirt/cluster/vagrant/.kubeconfig    --launcher-image kubevirt/virt-launcher:devel --migrator-image kubevirt/virt-handler:devel --port 8182

level=info timestamp=2017-06-24T02:13:03.129090Z pos=virt-controller.go:89 component=virt-controller service=http action=listening interface= port=8182
level=info timestamp=2017-06-24T02:13:03.129093Z pos=vm.go:73 component=virt-controller service=http msg="Starting controller."
level=info timestamp=2017-06-24T02:13:03.129170Z pos=migration.go:70 component=virt-controller service=http msg="Starting controller."
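For readers curious how a program might consume arguments like these, here is a minimal sketch using Go's standard flag package. The flag names are copied from the command line above; everything else (the Options struct, ParseOptions, the error text) is hypothetical, and the real virt-controller may well parse its flags differently.

```go
package main

import (
	"flag"
	"fmt"
)

// Options holds the four settings named on the command line above.
// The struct itself is an assumption made for this sketch.
type Options struct {
	Kubeconfig    string
	LauncherImage string
	MigratorImage string
	Port          int
}

// ParseOptions parses args (without the program name) into Options and
// rejects a missing kubeconfig, since without it no API server is known.
func ParseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("virt-controller-sketch", flag.ContinueOnError)
	fs.StringVar(&o.Kubeconfig, "kubeconfig", "", "path to a kubeconfig file")
	fs.StringVar(&o.LauncherImage, "launcher-image", "", "launcher image name")
	fs.StringVar(&o.MigratorImage, "migrator-image", "", "migrator image name")
	fs.IntVar(&o.Port, "port", 0, "port to listen on")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.Kubeconfig == "" {
		return o, fmt.Errorf("no kubeconfig given: API server address is unknown")
	}
	return o, nil
}

func main() {
	// Same shape as the arguments shown above, with a shortened path.
	args := []string{
		"--kubeconfig=cluster/vagrant/.kubeconfig",
		"--launcher-image", "kubevirt/virt-launcher:devel",
		"--migrator-image", "kubevirt/virt-handler:devel",
		"--port", "8182",
	}
	opts, err := ParseOptions(args)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", opts)
}
```

The standard flag package accepts both the `--name=value` and `--name value` forms, which is why the mixed style on the command line above parses cleanly in this sketch.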

What capabilities do I really need in my container?

Posted by Dan Walsh on June 20, 2017 07:40 PM

I have written previous blogs discussing using linux capabilities in containers.

Recently I gave a talk in New York, and someone in the audience asked me how to figure out what capabilities their containers require.

This person was dealing with a company that was shipping its software as a container image, but had instructed the buyer to run the container “fully privileged”.  He wanted to know what privileges the container actually needed.  I told him about a project we worked on a few years ago that we called Friendly EPERM.

Permission Denied!  WHY?

A few years ago the SELinux team realized that more and more applications were getting EPERM returns when a syscall requested some access.  Most operators understood EPERM (Permission Denied) inside of a log file to mean something was wrong with the ownership of a process or the contents it was trying to access, or the permission flags on the object were wrong.  This type of access control is called DAC (Discretionary Access Control), and under certain conditions SELinux also caused the kernel to return EPERM.  This confused operators and is one of the reasons that operators did not like SELinux. They would ask, why didn’t httpd report that permission was denied because of SELinux?  We realized that there was a growing list of other tools besides regular DAC and SELinux which could cause EPERM: things like SECCOMP, dropped capabilities, other LSMs …   The problem was that the processes getting the EPERM had no way to know why they got EPERM.  The only one that knew was the kernel, and in a lot of cases the kernel was not even logging the fact that it denied access.  At least SELinux denials usually show up in the audit log (AVCs).   The goal of Friendly EPERM was to allow the processes to figure out why they got EPERM and make it easier for admins to diagnose.

Here is the request that talks about the proposal.


The basic idea was to have something in the /proc file system which would identify why the previous EPERM happened.  You are running a process, say httpd, and it gets permission denied. Now somehow the process can get information on why it got permission denied.  One suggestion was that we enhance the libc/kernel to provide this information. The logical place for the kernel to reveal it would be in /proc/self.  But the act of httpd attempting to read the information out of /proc/self itself could give you a permission denied.  Basically we did not succeed because it would be a race condition, and the information could be wrong.

Here is a link to the discussion https://groups.google.com/forum/#!msg/fa.linux.kernel/WQyHPUdvodE/ZGTnxBQw4ioJ

Bottom line, no one has figured a way to get this information out of the kernel.


Later I received an email discussing the Friendly EPERM project and asking if there was a way to at least figure out what capabilities the application needed.

I wondered if the audit subsystem would give us anything here.  But I contacted the Audit guys at Red Hat, Steve Grubb and Paul Moore, and they informed me that there are no audit messages generated when DAC capabilities are blocked.

An interesting discussion occurred in the email chain:

DWALSH: Well I would argue most developers have no idea what capabilities their application requires.

SGRUBB: I don't think people are that naive. If you are writing a program that runs as root and then you get the idea to run as a normal user, you will immediately see your program crash. You would immediately look at where it’s having problems. It's pretty normal to look up the errno on the syscall man page to see what it says about it. They almost always list necessary capabilities for that syscall. If you are an admin restricting software you didn't write, then it’s kind of a puzzle. But the reason there's no infrastructure is because historically it’s never been a problem because the software developer had to choose to use capabilities and it’s incumbent on the developer to know what they are doing.  With new management tools offering to do this for you, I guess it’s new territory.

But here we had a vendor telling a customer that it needed full root, ALL capabilities, to run its application.

DWALSH:  This is exactly what containers are doing.  Which is why the emailer is asking.  A vendor comes to him telling him it needs all Capabilities.  The emailer does not believe them and wants to diagnose what they actually need.

DWALSH: With containers and SELinux there is a great big "TURN OFF SECURITY" button, which is too easy for software packagers to push, and then they don't have to figure out exactly what their app needs.

Paul Moore, Red Hat SELinux kernel engineer, suggested that while audit cannot record the DAC failures, SELinux also enforces the capability checks.  If we could put the processes into an SELinux type that had no capabilities by default, then run the process with full capabilities and SELinux in permissive mode, we could gather the SELinux AVC messages indicating which capabilities the application required to run.

“(Ab)using security to learn through denial messages. What could possibly go wrong?! :)”

After investigating further, it turns out the basic type used to run containers, `container_t`, can be set up to have no capabilities by turning off an SELinux boolean.

To turn off the capabilities via the boolean and put the machine into permissive mode:

setsebool virt_sandbox_use_all_caps=0

setenforce 0

Now execute the application via docker with all capabilities allowed.

docker run --cap-add all IMAGE ...

Run and test the application. This should cause SELinux to generate AVC messages about capabilities used.

grep capability /var/log/audit/audit.log

type=AVC msg=audit(1495655327.756:44343): avc:  denied  { syslog } for  pid=5246 comm="rsyslogd" capability=34  scontext=system_u:system_r:container_t:s0:c795,c887 tcontext=system_u:system_r:container_t:s0:c795,c887 tclass=capability2   


Now you know your list.
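When the log is long, picking the capability names out by eye gets tedious. Below is a small sketch, not part of any SELinux tooling, that scans audit text for AVC records whose target class is capability or capability2 and collects the names inside the braces. The function name and the regular expression are my own assumptions about the record layout, based only on the sample line above.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// avcCapability captures the permission list of an AVC record whose
// tclass is capability or capability2 (layout assumed from the sample).
var avcCapability = regexp.MustCompile(`avc:\s+denied\s+\{([^}]*)\}.*tclass=capability2?\b`)

// CapabilitiesFromAudit returns the sorted, de-duplicated capability
// names found in the given audit log text.
func CapabilitiesFromAudit(log string) []string {
	seen := make(map[string]bool)
	for _, line := range strings.Split(log, "\n") {
		m := avcCapability.FindStringSubmatch(line)
		if m == nil {
			continue // not a capability AVC record
		}
		for _, name := range strings.Fields(m[1]) {
			seen[name] = true
		}
	}
	caps := make([]string, 0, len(seen))
	for name := range seen {
		caps = append(caps, name)
	}
	sort.Strings(caps)
	return caps
}

func main() {
	sample := `type=AVC msg=audit(1495655327.756:44343): avc:  denied  { syslog } for  pid=5246 comm="rsyslogd" capability=34  scontext=system_u:system_r:container_t:s0:c795,c887 tcontext=system_u:system_r:container_t:s0:c795,c887 tclass=capability2`
	for _, c := range CapabilitiesFromAudit(sample) {
		fmt.Printf("--cap-add %s\n", c)
	}
}
```

Feeding it the whole of /var/log/audit/audit.log instead of the sample string would give one candidate --cap-add argument per capability seen during the test run.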

Turns out the application the emailer was trying to containerize was a tool which was allowed to manipulate the syslog system, and the only capability it needed was CAP_SYSLOG.  The emailer should be able to run the container by simply adding the CAP_SYSLOG capability and everything else about the container should be locked down.

docker run --cap-add syslog IMAGE ...


After writing this blog, I was pointed to

Find what capabilities an application requires to successful run in a container

Which is similar in that it finds out the capabilities needed for a container/process by using SystemTap.

Constructor Dependency Injection in Go

Posted by Adam Young on June 20, 2017 05:17 PM

Dependency Injection

Organization is essential to scale. Compare the two images of cabling a data center:

A well organized wiring approach to the data center.

One of the less egregious cabling systems.

Obviously, the top image appears much more organized. I don’t think it is accidental that the better organized approach is visible in the larger data center. In order to scale, you need organization. If you have a small number of servers, a haphazard cabling scheme is less likely to impact your ability to trace and fix network problems. Such an approach would not work for a million-node data center.

The same is true of code. Without many of the visual cues we use to navigate the real world, tracking code can be very difficult. Thus, code can degenerate into chaos as fast or faster than physical devices. Indeed, the long standing name for poorly organized code is “Spaghetti Code” which is an analogy to the same kind of linear mess we can visualize with the network cables.

Dependency injection provides a tool to help minimize the chaos. Instead of wires run across the data center direct from one machine to another, the well organized scheme routes them to intermediate switches and routers in a standardized way. Just so, dependency injection provides a mediator between components, removing the need for one component to know the approach used to create the specific instance.

The guiding rule is that dependency injection separates object use from object construction.

Constructor Dependency Injection

Of the three forms of Dependency Injection that Martin Fowler enumerates, only the constructor form enforces that an object always meets its invariants.   The idea is that, once the constructor returns the object should be valid.  Whenever I start working with a new language, or return to an old language, I try to figure out how best to do dependency injection using constructors.
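To make the invariant point concrete, here is a minimal sketch of constructor injection in plain Go. All of the names (Store, mapStore, UserService, NewUserService) are hypothetical and exist only for this illustration; the point is that the constructor refuses to hand back an object whose dependency is missing.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical dependency; any type with a Get method satisfies it.
type Store interface {
	Get(key string) (string, bool)
}

// mapStore is a trivial in-memory Store used only for this illustration.
type mapStore map[string]string

func (m mapStore) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// UserService uses a Store but does not know how the Store was built.
type UserService struct {
	store Store
}

// NewUserService is the intended way to obtain a UserService. It rejects a
// missing dependency, so a returned value always has a usable store.
func NewUserService(store Store) (*UserService, error) {
	if store == nil {
		return nil, errors.New("user service requires a store")
	}
	return &UserService{store: store}, nil
}

// Lookup uses the injected dependency.
func (s *UserService) Lookup(name string) string {
	if v, ok := s.store.Get(name); ok {
		return v
	}
	return "unknown"
}

func main() {
	svc, err := NewUserService(mapStore{"ayoung": "admin"})
	if err != nil {
		panic(err)
	}
	fmt.Println(svc.Lookup("ayoung"))
	fmt.Println(svc.Lookup("nobody"))
}
```

Object use (Lookup) and object construction (NewUserService) are separate here: swapping mapStore for another Store implementation touches only the call site that builds the service.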

I have a second design criterion, which is that I should continue to program exclusively in that language.  Using a marshaling language like XML or YAML as a way to describe how objects interact breaks a lot of the development flow, especially when working with a debugger.  Thus, I want to be able to describe my object relationships inside the programming language.

With these two goals in mind, I started looking into dependency injection in Go.


There is a common underlying form to the way I approach dependency injection.  The distinct stages are:

  1. For a given type, use the language's type management system to register a factory method that describes how to construct it.
  2. For a given type, use the language's type management system to request an instance that implements that type via a lazy load proxy that calls the factory method.
  3. When a factory method requires additional objects to fulfill dependencies, it uses the same lazy load proxies to fulfill those dependencies.

This approach works well with a language that provides the ability to program using the type system.  C++ supports this via template meta-programming.  A comparable version can be done in Java using Generics.

Go provides minimal reflection capabilities.  The above design goals push them to their limits, and perhaps a bit beyond.

Golang Reflection

The API to request the type of an object in Go is


In order to avoid creating an object just to get its type information, Go allows the following workaround:


This will return an object of reflect.Type.


Proof of Concept

Here is a very minimal Dependency Injection framework. A factory is defined with a function like this:

func createRestClient(cc dependencies.ComponentCache, _ string) (interface{}, error) {
	return kubecli.GetRESTClient() //returns two values: *rest.RESTClient, error
}

And registered with the ComponentCache via a call that references the type:

	CC = dependencies.NewComponentCache()
	CC.Register(reflect.TypeOf((*rest.RESTClient)(nil)), createRestClient)

Code that needs to Get a rest client out of the component cache uses the same form of reflection as the registration function:

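func GetRestClient(cc dependencies.ComponentCache) *rest.RESTClient {
	t, ok := cc.Fetch(reflect.TypeOf((*rest.RESTClient)(nil))).(*rest.RESTClient)
	if !ok {
		// The cached component was not of the expected type.
		panic("component is not a *rest.RESTClient")
	}
	return t
}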

Here is a rough way that the classes work together:


The rest of the code for implementing this framework is included below.


package dependencies

import "reflect"

// ComponentFactory builds one component, pulling its own dependencies from CC.
type ComponentFactory func(CC ComponentCache, which string) (interface{}, error)

// ComponentKey identifies a component by type plus an optional name.
type ComponentKey struct {
	Type  reflect.Type
	which string
}

type ComponentCache struct {
	components map[ComponentKey]interface{}
	factories  map[ComponentKey]ComponentFactory
}

func NewComponentCache() ComponentCache {
	cc := ComponentCache{
		components: make(map[ComponentKey]interface{}),
		factories:  make(map[ComponentKey]ComponentFactory),
	}
	return cc
}

// Register stores the default (unnamed) factory for a type.
func (cc ComponentCache) Register(Type reflect.Type, factory ComponentFactory) {
	var which string
	which = ""
	key := ComponentKey{Type, which}
	cc.factories[key] = factory
}

// RegisterFactory stores a named factory for a type.
func (cc ComponentCache) RegisterFactory(Type reflect.Type, which string, factory ComponentFactory) {
	key := ComponentKey{Type, which}
	cc.factories[key] = factory
}

// FetchComponent returns the cached component, building it on first use.
func (cc ComponentCache) FetchComponent(Type reflect.Type, which string) interface{} {
	key := ComponentKey{Type, which}
	var err error
	if component, ok := cc.components[key]; ok {
		return component
	} else if factory, ok := cc.factories[key]; ok {
		//IDEALLY locked on a per key basis.
		component, err = factory(cc, which)
		if err != nil {
			// A failed factory is treated as fatal in this sketch.
			panic(err)
		}
		cc.components[key] = component
		return component
	} else {
		panic("no factory registered for the requested type")
	}
}

func (cc ComponentCache) Fetch(Type reflect.Type) interface{} {
	return cc.FetchComponent(Type, "")
}

func (cc ComponentCache) Clear() {
	//Note.  I originally tried to create a new map using
	// cc.components = make(map[ComponentKey]interface{})
	// but it left the old values in place.  Thus, the brute force method below.
	// (The receiver is passed by value, so assigning a new map only changes
	// this method's copy of cc; deleting keys changes the shared map.)
	for k := range cc.components {
		delete(cc.components, k)
	}
}

This is a bit simplistic, as it does not support many of the use cases that we want for Dependency Injection, but implementing those does not require further investigation into the language.


Unlike structures, Go does not directly expose the type information of interfaces. Thus, the technique of

reflect.TypeOf((*SomeInterface)(nil))

will not return the type of the interface itself: it yields a pointer type, and passing a nil interface value directly yields nil. While I think this is a bug in the implementation of the language, it is a reality today, and requires a workaround. Thus far, I have been wrapping interface types with a structure. An example from my current work:

type TemplateServiceStruct struct {
	services.TemplateService
}

func createTemplateService(cc dependencies.ComponentCache, _ string) (interface{}, error) {
	ts, err := services.NewTemplateService(launcherImage, migratorImage)
	return &TemplateServiceStruct{
		ts,
	}, err
}

And the corresponding accessor:

func GetTemplateService(cc dependencies.ComponentCache) *TemplateServiceStruct {
	return cc.Fetch(reflect.TypeOf((*TemplateServiceStruct)(nil))).(*TemplateServiceStruct)
}

Which is then further unwrapped in the calling code:

var templateService services.TemplateService
templateService = GetTemplateService(CC).TemplateService

I hope to find a better way to handle interfaces in the future.

Follow on work

Code generation

This approach requires a lot of boilerplate code. This code could be easily generated using a go generate step. A template version would look something like this.

func Get{{ Tname }}(cc dependencies.ComponentCache) *{{ T }} {
	t, ok := cc.Fetch(reflect.TypeOf((*{{ T }})(nil))).(*{{ T }})
	if !ok {
		panic("component is not a *{{ T }}")
	}
	return t
}

func create{{ Tname }}(cc dependencies.ComponentCache, _ string) (interface{}, error) {
	{{ Tbody }}
}

Separate repository

I’ve started working on this code in the context of Kubevirt. It should be pulled out into its own repository.

Split cache from factory

The factories should not be directly linked to the cache. One set of factories should be capable of composing multiple sets of components. The Clear method should be replaced by dropping the cache completely and creating a whole new set of components.

In this implementation, a factory can be registered over a previous registration of that factory. This is usually an error, but makes replacing factories for unit tests possible. A better solution is to split the factory registration into stages, so that factories required for unit tests are mutually exclusive with factories that are required for live deployment. In this scheme, re-registering a component would raise a panic.

Pre-activate components

A cache should allow for activating all components in order to ensure that none of them would fail upon construction. This is essential to avoid panics that happen long after an application has started, triggered via uncommon code paths.

Multiple level caches

Caches and factories should be able to work at multiple levels. For example, a web framework might specify request, session, and global components. If a factory is defined at the global level, the user should still be able to access it from the request level. The resolution and creation logic is roughly:

func (cc ComponentCache) FetchComponent(Type reflect.Type, which string) interface{} {
	key := ComponentKey{Type, which}
	var err error
	if component, ok := cc.components[key]; ok {
		return component
	} else if factory, ok := cc.factories[key]; ok {
		//IDEALLY locked on a per key basis.
		component, err = factory(cc, which)
		if err != nil {
			panic(err)
		}
		cc.components[key] = component
		return component
	} else if cc.parent != nil {
		// Not known at this level: defer to the longer-lived parent cache.
		return cc.parent.FetchComponent(Type, which)
	} else {
		panic("no factory registered for the requested type")
	}
}

This allows caches to exist in a DAG structure. Object lifelines are sorted from shortest to longest: an object can point at another object either within the same cache, or of longer lifeline in the parent cache, chained up the ancestry.

Episode 52 - You could have done it right, but you didn't

Posted by Open Source Security Podcast on June 20, 2017 02:01 AM
Josh and Kurt talk about the new Stack Clash flaw, Grenfell Tower, risk management, and backwards compatibility.

Thought leaders aren't leaders

Posted by Josh Bressers on June 18, 2017 02:15 AM
For the last few weeks I've seen news stories and much lamenting on Twitter about the security skills shortage. Some say there is no shortage, some say it's horrible beyond belief. Basically there's someone arguing every possible side of this. I'm not going to debate if there is or isn't a worker shortage, that's not really the point. A lot of complaining was done by people who would call themselves leaders in the security universe. I then read the below article and changed my thinking a bit.

Our problem isn't a staff shortage. Our problem is we don't have any actual leaders. I mean people who aren't just "in charge". Real leaders aren't just in charge, they help their people grow in a way that accomplishes their vision. Virtually everyone in the security space has spent their entire careers working alone to learn new things. We are not an industry known for working together and the thing I'd never really thought about before was that if we never work together, we never really care about anyone or anything (except ourselves). The security people who are in charge of other security people aren't motivating anyone which by definition means they're not accomplishing any sort of vision. This holds true for most organizations since barely keeping the train on the track is pretty much the best case scenario.

If I were going to guess, the existing HR people look at most security groups and see the same dumpster fire we see when we look at IoT.

In the industry today virtually everyone who is seen as being some sort of security leader is what a marketing person would call a "thought leader". Thought leaders aren't leaders. Some do have talent. Some had talent. And some just own a really nice suit. It doesn't matter though. What we end up with is a situation where the only thing anyone worries about is how many Twitter followers they have instead of making a real difference. You make a real difference when you coach and motivate someone else to do great things.

Being a leader with loyal employees would be a monumental step for most organizations. We have no idea who to hire and how to teach them because the leaders don't know how to do those things. Those are skills real leaders have and real leaders develop in their people. I suspect the HR department knows what's wrong with the security groups. They also know we won't listen to them.

There is a security talent shortage, but it's a shortage of leadership talent.

Episode 51 - All about CVE

Posted by Open Source Security Podcast on June 12, 2017 12:44 PM
Josh and Kurt talk to Dan Adinolfi about CVE. Most anything you ever wanted to know about CVE is discussed.

Humanity isn't proactive

Posted by Josh Bressers on June 11, 2017 09:51 PM
I ran across this article about IoT security the other day:

The US Needs to Get Serious About Securing the Internet of Hackable Things

I find articles like this frustrating for the simple fact everyone keeps talking about security, but nobody is going to do anything. If you look at the history of humanity, we've never been proactive when dealing with problems. We wait until things can't get worse and the only actual option is to fix the problem. If you look at every problem there are at least two options. Option #1 is always "fix it". Option #2 is ignore it. There could be more options, but generally we pick #2 because it's the least amount of work in the short term. Humanity rarely cares about the long term implications of anything.

I know this isn't popular, but I'm going to say it: We aren't going to fix IoT security for a very long time.

I really wish this wasn't true, but it just is. If a senator wants to pretend they're doing something but they're really just ignoring the problem, they hold a hearing and talk about how horrible something is. If they actually want to fix it they propose legislation. I'm not blaming anyone in charge mind you. They're really just doing what they think the people want. If we want the government to fix IoT we have to tell them to do it. Most people don't really care because they don't have a reason to care.

Here's the second point that I suspect many security people won't want to hear. The reason nobody cares about IoT security isn't because they're stupid. This is the narrative we've been telling ourselves for years. They don't care because the cost of doing nothing is substantially less than fixing IoT security. We love telling scary campfire stories about how the botnet was coming from inside the house and how a pacemaker will kill grandpa, but the reality is there hasn't been enough real damage done yet from insecure IoT. I'm not saying there won't ever be, there just hasn't been enough expensive widespread damage done yet to make anyone really care.

In a world filled with insecurity, adding security to your product isn't a feature anyone really cares about. I've been doing research about topics such as pollution, mine safety, auto safety, airline safety, and a number of other problems from our past. There are no good examples where humans decided to be proactive and solve a problem before it became absolutely horrible. People need a reason to care; there isn't a reason for IoT security.


Someday something might happen that makes people start to care. As we add compute power to literally everything my security brain says there is some sort of horrible doom coming without security. But I've also been saying this for years and it's never really happened. There is a very real possibility that IoT security will just never happen if things never get bad enough.

Manually Deploying Kubevirt on OpenShift Origin

Posted by Adam Young on June 10, 2017 01:57 AM

I have been coding Kubevirt in Go.  However, unless the code gets deployed to servers, no one will use it in production.  I’ve been learning OpenShift as an integration point for Kubevirt.  Here are my notes for getting it up and running.  This is not quite production grade, but should help write a proper deployment mechanism.

Using openshift-ansible.  I originally had to apply one patch but it got merged!

Here is my inventory file for a two node deployment.





munchlax openshift_node_labels="{'region': 'infra'}"
dialga openshift_node_labels="{'region': 'infra'}"


Running the playbook like this:

ansible-playbook -i  ~/devel/local-openshift-ansible/inventory.ini  ~/devel/openshift-ansible/playbooks/byo/config.yml

I should have modified that to be able to schedule on the master node, but it can be done after the fact like this:

oadm manage-node munchlax --schedulable=true

Had to edit the manifests in kubevirt:

Move the manifests over to the master, in order to use the service account to create the various resources:

scp /home/ayoung/go/src/kubevirt.io/kubevirt/manifests/* ansible@munchlax:manifests/


Note the differences from the source:

$ diff -u virt-api.yaml.in virt-api.yaml
--- virt-api.yaml.in 2017-06-06 12:01:46.077594982 -0400
+++ virt-api.yaml 2017-06-07 10:47:03.048151082 -0400
@@ -7,7 +7,7 @@
 - port: 8183
 targetPort: virt-api
 externalIPs :
- - "{{ master_ip }}"
+ - ""
 app: virt-api
@@ -23,17 +23,18 @@
 - name: virt-api
- image: {{ docker_prefix }}/virt-api:{{ docker_tag }}
+ image: kubevirt/virt-api:latest
 imagePullPolicy: IfNotPresent
 - "/virt-api"
 - "--port"
 - "8183"
 - "--spice-proxy"
- - "{{ master_ip }}:3128"
+ - ""
 - containerPort: 8183
 name: "virt-api"
 protocol: "TCP"
- kubernetes.io/hostname: master
+ kubernetes.io/hostname: munchlax

All of the referenced images need to be “latest” instead of devel.

For both libvirt and virt-handler I use the privilegeduser service account.  The master node (munchlax) has a ~/.kubeconf file set up to allow operations on the kube-system.

#For libvirt we need a service user:
oc create serviceaccount -n default privilegeduser
oc adm policy add-scc-to-user privileged -ndefault -z privilegeduser

Starting the services in dependency order is not necessary, but I do it anyway:

kubectl create -f vm-resource.yaml
kubectl create -f migration-resource.yaml
kubectl create -f virt-api.yaml
kubectl create -f virt-controller.yaml
kubectl create -f libvirt.yaml
kubectl create -f virt-handler.yaml

As of this writing, kubevirt only supports VMs in the default namespace. The VM launches using a few iSCSI volumes.  I need to create the iSCSI volume in the same namespace as the VM:

kubectl create -f iscsi-demo-target.yaml --namespace default

I’m going to regret this but…overpowering the ayoung user to be god on the cluster.

oc create user ayoung

oadm policy add-role-to-user edit ayoung 
#Don't trust ayoung...
oadm policy add-cluster-role-to-user cluster-admin ayoung
# he can't even keep track of his car keys and cellphone, and
# you make him admin on your cluster?

oadm policy add-role-to-user admin system:serviceaccount:kube-system:default

Try to create a vm:

oc login

kubectl config set-context $(kubectl config current-context) --namespace=default

kubectl create -f ~/go/src/kubevirt.io/kubevirt/cluster/vm.yaml
#wait for a bit and then
kubectl get vms -o json | jq '. | .items[0] | .status | .phase'
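For anyone who would rather check the phase from Go than from jq, here is a small sketch that decodes the same path, .items[0].status.phase, with encoding/json. The struct mirrors only the fields the jq filter touches, and the sample document (including the "Running" value) is hypothetical rather than captured output.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// vmList mirrors only the fields that the jq filter above touches.
type vmList struct {
	Items []struct {
		Status struct {
			Phase string `json:"phase"`
		} `json:"status"`
	} `json:"items"`
}

// FirstVMPhase returns .items[0].status.phase from a JSON list document.
func FirstVMPhase(data []byte) (string, error) {
	var list vmList
	if err := json.Unmarshal(data, &list); err != nil {
		return "", err
	}
	if len(list.Items) == 0 {
		return "", errors.New("no VMs in list")
	}
	return list.Items[0].Status.Phase, nil
}

func main() {
	// Hypothetical sample; real output carries many more fields.
	sample := []byte(`{"items":[{"status":{"phase":"Running"}}]}`)
	phase, err := FirstVMPhase(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(phase)
}
```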

Still todo:  Spice, console, and migrations.

Upstream First…or Second?

Posted by Adam Young on June 07, 2017 01:58 PM

From December 2011 until December 2016, my professional life was driven by OpenStack Keystone development. As I’ve made an effort to diversify myself a bit since then, I’ve also had the opportunity to reflect on our approach, and perhaps see some things I would like to do differently in the future.

OpenStack moves slowly, and Keystone moves slower than the average OpenStack project. As a security sensitive project, it is very risk averse, and change requires overcoming a lot of inertia. Very different in pacing from the WebUI and application development I’ve done in the past. Even (mostly) internal projects like BProc had periods of rapid development. But Keystone stalled, and I think that is not healthy.

One aspect of the stall is the slow adoption of new technology, and this is, in part, due to the policy we had in place for downstream development that something has to be submitted upstream before it could be included in a midstream or downstream product. This is a very non-devops style code deployment, and I don’t blame people for being resistant to accepting code into the main product that has never been tested in anger.

When I started on Keystone, very, very little was core API. The V2 API did not even have a mechanism for changing passwords: it was all extensions. When I wrote the Trusts API, at the last minute, I was directed by several other people to make it into an extension, even though something that touched so deeply into so many parts of the code could not, realistically, be an extension. The result was only halfway done as an extension, that could neither be completely turned off nor ignored by core pieces, but that still had fragments of the namespace OS-TRUST floating around in it.

As Keystone pushed toward the V3 API, the idea of extensions was downplayed and then excised. Federation was probably the last major extension, and it has slowly been pulled into the main V3 API. There was no replacement for extensions. All features went into the main API in the next version.

Unfortunately, Keystone is lacking some pretty fundamental features. The inability to distinguish cluster level from project scoped authorization (good ole bug 968696) is very fundamentally coded into all of the remote services. Discoverability of features or URLs is very limited. Discoverability of authorization is non-existent.

How could I have worked on something for so long, felt so strongly about it, and yet had so little impact? Several things came into play, but the realization struck me recently as I was looking at OpenShift Origin.

Unlike RDO, which followed OpenStack, OpenShift as a project pre-dated the (more) upstream Kubernetes project on which it is now based. Due to the need to keep up with the state of container development, OpenShift actually shifted from a single-vendor open-source-project approach to embrace Kubernetes. In doing so, however, it had the demands of real users and the need for a transition strategy. OpenShift-specific operations show up in the discovery page, just under their own top-level URL. The oc and oadm commands are still used to perform operations that are not yet in the upstream kubectl. This ability to add on to upstream has proved to be Kubernetes’ strength.

The RBAC mechanism in Kubernetes has most of what I wanted from the RBAC mechanism in Keystone (the exception is Implied Roles). This was developed in Origin first, and then submitted to the upstream Kubernetes project without (AFAICT) any significant fanfare. One reason I think it was accepted so simply was that the downstream deployment had vetted the idea. Very little teaches software what it needs to do like having users.

PKI tokens are a pretty solid example of a feature that would have been much better with a rapid deployment and turnaround time. If I had put PKI tokens into production in a small way prior to submitting them upstream, the mechanism would have been significantly more efficient: we would have discovered the size issues with the X-AUTH_TOKEN headers and the issues with revocation, and built the mechanisms to minimize the token payload.

We probably would have used PKI token mechanism for K2K, not SAML, as originally envisioned.

We ended up with a competing implementation, Fernet, coming out of Rackspace, and that had the weight of “our operators need this.”

I understand why RDO pursued the Upstream First policy. We did not want to appear to be attempting a vendor lock-in. We wanted to be good community members. And I think we accomplished that. But having a midstream or downstream extension strategy to vet new ideas appears essential. Both upstream Keystone and midstream RDO could have worked to make this happen. It’s worth remembering for future development.

It is unlikely that a software spec will better cover the requirements than actually having code in production.

Episode 50 - This is a security podcast after all

Posted by Open Source Security Podcast on June 06, 2017 09:25 PM
Josh and Kurt discuss Futurama, tornadoes, sudo, encryption, hacking back, and something called an ombudsman. Also episode 50!


Creating a privileged container in OpenShift

Posted by Adam Young on June 05, 2017 08:51 PM

While trying to launch kubevirt containers in OpenShift, I continually tripped over problems regarding authorization.

Most looked like this:

 message: 'unable to create pods: pods "libvirt-3407864139-" is forbidden: unable
 to validate against any security context constraint: [spec.securityContext.hostNetwork:
 Invalid value: true: Host network is not allowed to be used spec.securityContext.hostPID:
 Invalid value: true: Host PID is not allowed to be used spec.securityContext.hostIPC:
 Invalid value: true: Host IPC is not allowed to be used securityContext.runAsUser:
 Invalid value: 0: UID on container libvirtd does not match required range. Found
 0, required min: 1000060000 max: 1000069999 spec.containers[0].securityContext.privileged:
 Invalid value: true: Privileged containers are not allowed spec.containers[0].securityContext.volumes[0]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.volumes[1]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.volumes[2]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.volumes[3]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.volumes[4]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[0].securityContext.hostNetwork:
 Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.containers[0].hostPort:
 Invalid value: 16509: Host ports are not allowed to be used spec.containers[0].securityContext.hostPID:
 Invalid value: true: Host PID is not allowed to be used spec.containers[0].securityContext.hostIPC:
 Invalid value: true: Host IPC is not allowed to be used securityContext.runAsUser:
 Invalid value: 0: UID on container virtlogd does not match required range. Found
 0, required min: 1000060000 max: 1000069999 spec.containers[1].securityContext.volumes[0]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[1].securityContext.volumes[1]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[1].securityContext.volumes[2]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[1].securityContext.volumes[3]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[1].securityContext.volumes[4]:
 Invalid value: "hostPath": hostPath volumes are not allowed to be used spec.containers[1].securityContext.hostNetwork:
 Invalid value: true: Host network is not allowed to be used spec.containers[1].securityContext.containers[0].hostPort:
 Invalid value: 16509: Host ports are not allowed to be used spec.containers[1].securityContext.hostPID:
 Invalid value: true: Host PID is not allowed to be used spec.containers[1].securityContext.hostIPC:
 Invalid value: true: Host IPC is not allowed to be used]'

I was scratching my head because I had granted my user the appropriate security context:


oc adm policy add-scc-to-user privileged -nkubevirt -z ayoung

But it turns out that was not the right user to specify.  What I really needed was the service user that Kubernetes uses to actually run the pod.

I could specifically create a service user and grant it the appropriate security context like this (as the admin service user):

oc create serviceaccount -n kubevirt privilegeduser
oc adm policy add-scc-to-user privileged -nkubevirt -z privilegeduser

and then add the following to the pod declaration:


Or, I could use the default user for creating pods in that domain:

(as the admin service user):

oc adm policy add-scc-to-user privileged -nkubevirt -z default

Both of these allow me to successfully launch pods where the containers use host networking and other privileged operations.
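
The pod declaration fragment for the first option did not survive in this copy of the post. As a hedged sketch only: Kubernetes pod specs have a `serviceAccountName` field, and a declaration meant to run under the `privilegeduser` account created above would presumably set it. The pod name, container name, and image below are hypothetical placeholders, and the helper function is invented for illustration.

```python
import json


def pod_with_service_account(name, image, service_account):
    """Build a minimal pod manifest that runs as the given service account."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The SCC granted with `-z privilegeduser` applies to this account,
            # not to the human user who submits the manifest.
            "serviceAccountName": service_account,
            "hostNetwork": True,
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {"privileged": True},
                }
            ],
        },
    }


# Placeholder names: substitute the real pod name and image.
manifest = pod_with_service_account(
    "example-pod", "example/image:latest", "privilegeduser"
)
print(json.dumps(manifest, indent=2))
```

The printed JSON is a plain manifest, so it could be saved to a file and submitted with `oc create -f`. When the `default` service account is granted the SCC instead, the `serviceAccountName` line can be left out.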

Free Market Security

Posted by Josh Bressers on June 04, 2017 09:10 PM
I've been thinking about the concept of free market forces this weekend. The basic idea here is that the price of a good is decided by the supply and demand of the market. If the market demands something, the price will go up if it's in short supply. This is basically why the Nintendo Switch is still selling on eBay for more than it would cost in the store. There is a demand but there isn't a supply. But back to security. Let's think about something I'm going to call "free market security". What if demand and supply were driving security? Or we can flip the question around: what if the market will never drive security?

Of course security isn't really a thing like we think of goods and services in this context. At best we could call it a feature of another product. You can't buy security to add it to your products, it's just sort of something that happens as part of a larger system.

I'm leaning in the direction of secure products. Let's pick on mobile phones because that environment is really interesting. Is the market driving security into phones? I'd say the answer today is a giant "no". Most people buy phones that will never see a security update. They don't even ask about updates or security in most instances. You could argue they don't know this is even a problem.

Apple is the leader here by a wide margin. They have invested substantially into security, but why did they do this? If we want to think about market forces and security, what's the driver? If Apple phones were less secure would the market stop buying them? I suspect the sales wouldn't change at all. I know very few people who buy an iPhone for the security. I know zero people outside of some security professionals who would ever think about this question. Why Apple decided to take these actions is a topic for another day I suspect.

Switching gears, the Android ecosystem is pretty rough in this regard. The vast majority of phones sold today are Android phones. They are competitively priced, all have similar hardware, and almost all of them are completely insecure. People still buy them though. Security is clearly not a feature that's driving anything in this market. I bought a Nexus phone because of security, that one single feature. I am clearly not the norm here though.

The whole point we should be thinking about is the idea of a free market for security. It doesn't exist, and it probably won't exist. I see it like pollution. There isn't a very large market for products that either don't pollute, or are made without polluting in some way. I know there are some people who worry about sustainability, but the vast majority of consumers don't really care. In fact nobody really cared about pollution until a river actually lit on fire. There are still some who don't, even after a river lit on fire.

I think there are many of us in security who keep waiting for demand to appear for more security. We keep watching and waiting: any day now everyone will see why this matters! It's not going to happen though. We do need security more and more each day. The way everything is heading, things aren't looking great. I'd like to think we won't have to wait for the security equivalent of a river catching on fire, but I'm pretty sure that's what it will take.

Querying Policy Rules in OpenShift using jq

Posted by Adam Young on June 01, 2017 06:26 PM

In my last post on the subject, I mentioned that I was able to use kubectl to get the same information as oc describe clusterPolicy. Here are some more details.

To see the whole cluster policy in an OpenShift deployment, you can run:

kubectl get clusterpolicy

But it produces a lot of data. I want to know what roles to assign to a user in order to get access to a specific API. I’ve come to rely on the jq tool to do this kind of work.

To see the complete list of roles, with all the outside containers removed, run:

kubectl get clusterpolicy -o json | jq '.items[] | .roles[]'

Notice that the roles are in

To get just the role names:

kubectl get clusterpolicy -o json | jq '.items[] | .roles[] | .name'
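
For readers who would rather post-process the JSON in a script, the same traversal can be sketched in Python. The sample document below is a hypothetical, heavily trimmed stand-in: its nesting (items, then roles, then name and role) is inferred from the jq filters in this post rather than copied from a real cluster, and `role_names` is a name invented for this example.

```python
import json

# Hypothetical stand-in for the output of `kubectl get clusterpolicy -o json`.
# Only the keys that the jq filters in this post touch are included.
SAMPLE = json.loads("""
{
  "items": [
    {
      "roles": [
        {"name": "view",
         "role": {"metadata": {"name": "view"},
                  "rules": [{"resources": ["buildlogs", "builds"],
                             "verbs": ["get", "list", "watch"]}]}},
        {"name": "edit",
         "role": {"metadata": {"name": "edit"},
                  "rules": [{"resources": ["imagestreams"],
                             "verbs": ["create", "get", "update"]}]}}
      ]
    }
  ]
}
""")


def role_names(policy):
    """Rough equivalent of: jq '.items[] | .roles[] | .name'"""
    return [entry["name"] for item in policy["items"] for entry in item["roles"]]


print(role_names(SAMPLE))  # ['view', 'edit']
```

Each `[]` in the jq filter becomes one `for` clause in the comprehension, which is a handy way to read the longer filters that follow.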


Let’s start simple.  Let’s see what is defined for a single role:

kubectl get clusterpolicy -o json | jq '.items[] | .roles[] | select (.name == "view")'


And that returns a lot of information.  To make sure we have the right general approach, go simpler:

$ kubectl get clusterpolicy -o json | jq '.items[] | .roles[] | select (.name == "view") | .name ' 

To see the resources:

kubectl get clusterpolicy -o json | jq '.items[] | .roles[] | select (.name == "view") | .role | .rules[] | .resources'


Again, a lot of data.  Time to dig deep.  In order to find an element inside a list, we can use the bsearch function.

$ kubectl get clusterpolicy -o json | jq '.items[] | .roles[] | .role | .rules[] | select(.resources | bsearch("buildlogs") > -1 ) '
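
One caveat on this filter: jq documents `bsearch` as a binary search, so it expects a sorted array. On a sorted array it returns the index when the element is found and `-1 - insertion_point` when it is not, which is why the `> -1` test works. The Python sketch below mimics that contract with the standard `bisect` module to show why sorted input matters. It is an illustration of binary search in general, not a reimplementation of jq, and it is an assumption that the resource lists returned by the cluster are sorted.

```python
from bisect import bisect_left


def bsearch(sorted_items, target):
    """Mimic jq's bsearch contract: index if found, else -1 - insertion point."""
    ix = bisect_left(sorted_items, target)
    if ix < len(sorted_items) and sorted_items[ix] == target:
        return ix
    return -1 - ix


resources = ["buildlogs", "builds", "imagestreams"]  # sorted
print(bsearch(resources, "buildlogs"))  # 0
print(bsearch(resources, "pods"))       # -4

# On unsorted input a binary search can miss an element that is present.
unsorted = ["imagestreams", "buildlogs", "builds"]
print(bsearch(unsorted, "imagestreams"))  # -4
print("imagestreams" in unsorted)         # True
```

If the ordering of the lists cannot be relied on, a plain membership test (`index("buildlogs")` in jq, `in` in Python) is the safer choice.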

Which returns

 "apiGroups": [
 "attributeRestrictions": null,
 "resources": [
 "verbs": [
 "apiGroups": [
 "attributeRestrictions": null,
 "resources": [
 "verbs": [
 "apiGroups": [
 "attributeRestrictions": null,
 "resources": [
 "verbs": [
 "apiGroups": [
 "attributeRestrictions": null,
 "resources": [
 "verbs": [

Which shows that the specific rule matches.  Of course, it does not have the role names in it as it is one level too low.  If we target one level higher in the structure:

kubectl get clusterpolicy -o json | jq '.items[] | .roles[] | .role | select(.rules[] | .resources | bsearch("buildlogs") > -1 ) '

We see that there is, once again, too much data.  So now we can filter it out:

$ kubectl get clusterpolicy -o json | jq '.items[] | .roles[] | select(.role | .rules[] | .resources | bsearch("buildlogs") > -1 ) | .name' 

The general pattern I followed here was to push elements from the chain into the select until I get only the object I wanted, and then append new filter values after the select.


I’ll spare you the rest of the trial and error and come to the final solution:

kubectl get clusterpolicy -o json | jq '.items[] | .roles[] |  .role |  select( (.rules[] | select (.verbs | bsearch ("create") > -1) and ( .resources | bsearch ("imagestreams") > -1 )) ) | .metadata | .name  '  
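
The intent of the final filter (which roles have a rule that grants a given verb on a given resource) can also be written out as a short Python function. As before, the sample data is hypothetical and only mirrors the keys the jq filter touches, and `roles_allowing` is a name invented for this sketch.

```python
# Hypothetical, trimmed cluster policy with just the keys the filter reads.
SAMPLE = {
    "items": [{
        "roles": [
            {"name": "view", "role": {
                "metadata": {"name": "view"},
                "rules": [
                    {"resources": ["imagestreams"], "verbs": ["get", "list"]},
                ]}},
            {"name": "edit", "role": {
                "metadata": {"name": "edit"},
                "rules": [
                    {"resources": ["builds"], "verbs": ["create"]},
                    {"resources": ["imagestreams"], "verbs": ["create", "update"]},
                ]}},
        ]
    }]
}


def roles_allowing(policy, verb, resource):
    """Names of roles with at least one rule granting `verb` on `resource`."""
    names = []
    for item in policy["items"]:
        for entry in item["roles"]:
            role = entry["role"]
            # Both conditions must hold within the same rule.
            if any(verb in rule["verbs"] and resource in rule["resources"]
                   for rule in role["rules"]):
                names.append(role["metadata"]["name"])
    return names


print(roles_allowing(SAMPLE, "create", "imagestreams"))  # ['edit']
```

Because the sketch uses `any`, each role is listed once even if several of its rules match. The jq version may print a role once per matching rule, so piping its output through `sort -u` can help.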


Episode 49 - Testing software is impossible

Posted by Open Source Security Podcast on May 30, 2017 10:39 PM
Josh and Kurt discuss Samba, FTP sites, MSDOS, regulation, and the airplane laptop travel ban.


Sausage Factory: An introduction to building modules in Fedora

Posted by Stephen Gallagher on May 30, 2017 07:00 PM

First off, let me be very clear up-front: normally, I write my blog articles to be approachable by readers of varying levels of technical background (or none at all). This will not be one of those. This will be a deep dive into the very bowels of the sausage factory.

This blog post assumes that the reader is aware of the Fedora Modularity Initiative and would like to learn how to build their very own modules for inclusion into the Fedora Project. I will guide you through the creation of a simple module built from existing Fedora Project packages on the “F26” branch.

To follow along, you will need a good working knowledge of the git source-control system (in particular, Fedora’s “dist-git”) as well as being generally comfortable around Fedora system tools such as dnf and python.

Setting up the Module Build Service

For the purposes of this blog, I am going to use Fedora 25 (the most recent stable release of Fedora) as the host platform for this demonstration and Fedora 26 (the current in-development release) as the target. To follow along, please install Fedora 25 Server on a bare-metal or virtual machine with at least four processors and 8 GiB of RAM.

First, make sure that the system is completely up-to-date with all of the latest packages. Then we will install the “module-build-service” package. We will need version 1.3.24 or later of the module-build-service RPM and version 1.2.0 or later of python2-modulemd, which at the time of this writing requires installing from the “updates-testing” repository. (EDIT 2017-06-30: version 1.3.24 requires the mock-scm package for local builds but doesn’t have a dependency on it.)

dnf install --enablerepo=updates-testing module-build-service python2-modulemd mock-scm

This may install a considerable number of dependency packages as well. Once this is installed, I recommend modifying /etc/module-build-service/config.py to change NUM_CONCURRENT_BUILDS to match the number of available processors on the system.
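
Since config.py is itself a Python file, a quick way to find the value to use is to ask Python for the processor count. This is only a convenience sketch: it prints a line to copy by hand into the existing NUM_CONCURRENT_BUILDS setting and does not edit the file.

```python
import os

# os.cpu_count() can return None when the count cannot be determined,
# so fall back to a single concurrent build in that case.
cpus = os.cpu_count() or 1
print(f"NUM_CONCURRENT_BUILDS = {cpus}")
```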

Leave the rest of the options alone at this time. The default configuration will interact with the production Fedora Project build-systems and is exactly what we want for the rest of this tutorial.

In order to perform builds locally on your machine, your local user will need to be a member of the mock group on the system. To do this, run the following command:

usermod -a -G mock <yourloginname>

Then you will need to log out of the system and back in for this to take effect (since Linux only adds group memberships at login time).

Gathering the module dependencies

So now that we have a build environment, we need something to build. For demonstration purposes, I’m going to build a module to provide the libtalloc library used by the Samba and SSSD projects. This is obviously a trivial example and would never become a full module on its own.

The first thing we need to do is figure out what runtime and build-time dependencies this package has. We can use dnf repoquery to accomplish this, starting with the runtime dependencies:

dnf repoquery --requires libtalloc.x86_64 --resolve

Which returns with:


There are two libcrypt implementations that will satisfy this dependency, so we can pick one a little later. For glibc, we only want the one that will operate on the primary architecture, so we’ll ignore the .i686 version.

Next we need to get the build-time dependencies with:

dnf repoquery --requires --enablerepo=fedora-source --enablerepo=updates-source libtalloc.src --resolve

Which returns with:


OK, that’s not bad. Similar to the runtime dependencies above, we will ignore the .i686 versions. So now we have to find out which of these packages are provided already by the base-runtime module or the shared-userspace module, so we don’t need to rebuild them. Unfortunately, we don’t have a good reference location for getting this data yet (it’s coming a little ways into the future), so for the time being we will need to look directly at the module metadata YAML files:

When reading the YAML, the section that we are interested in is the api->rpms section. This part of the metadata describes the set of packages whose interfaces are public and can be consumed directly by the end-user or other modules. So, looking through these foundational modules, we see that base-runtime provides glibc, libcrypt and python3-devel; shared-userspace provides docbook-style-xsl, libxslt and python2-devel; and common-build-dependencies provides doxygen. So in this case, all of the dependencies are satisfied by these three core modules. If they were not, we’d need to recurse through the dependencies and figure out what additional packages we would need to include in our module to support libtalloc or see if there was another module in the collection that provided it.

So, the next thing we’re going to need to do is decide which version of libtalloc we want to package. What we want to do here is check out the libtalloc module from Fedora dist-git and then find a git commit hash that matches the build we want to add to our module. We can check out the libtalloc module by doing:

fedpkg clone --anonymous rpms/libtalloc && cd libtalloc

Once we’re in this git checkout, we can use the git log command to find the commit hash that we want to include. For example:

[sgallagh@sgallagh540:libtalloc (master)]$ git log -1
commit f284a27d9aad2c16ba357aaebfd127e4f47e3eff (HEAD -> master, origin/master, origin/f26, origin/HEAD)
Author: Lukas Slebodnik <lslebodn@redhat.com>
Date: Tue Feb 28 09:03:05 2017 +0100

New upstream release - 2.1.9
 rhbz#1401225 - Rename python packages to match packaging guidelines

The string of hexadecimal characters following the word “commit” is the git commit hash. Save it somewhere; we’re going to need it in the next section.

Creating a new module

The first thing to be aware of is that the module build-service has certain constraints. The build can only be executed from a directory that has the same name as the module and will look for a file named modulename.yaml in that directory. So in our case, I’m going to name the module talloc, which means I must create a directory called talloc and a module metadata file called talloc.yaml. Additionally, the module-build-service will only work within a git checkout, so we will initialize this directory with a blank metadata file.

mkdir talloc && cd talloc
touch talloc.yaml
git init
git add talloc.yaml
git commit -m "Initial setup of the module"
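
The layout constraints just described (directory named after the module, a matching .yaml file, inside a git checkout) are easy to check before starting a long build. The following is a minimal sketch under those stated assumptions. `check_module_layout` is a hypothetical helper, not part of the module-build-service, and it treats the presence of a `.git` directory as a stand-in for "is a git checkout".

```python
import os
import tempfile


def check_module_layout(path):
    """Return a list of problems with a module directory (empty means OK)."""
    path = os.path.abspath(path)
    module = os.path.basename(path)  # the directory name doubles as the module name
    problems = []
    if not os.path.isfile(os.path.join(path, module + ".yaml")):
        problems.append(f"missing {module}.yaml")
    if not os.path.isdir(os.path.join(path, ".git")):
        problems.append("not a git checkout (no .git directory)")
    return problems


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    talloc = os.path.join(tmp, "talloc")
    os.mkdir(talloc)
    before = check_module_layout(talloc)
    open(os.path.join(talloc, "talloc.yaml"), "w").close()
    os.mkdir(os.path.join(talloc, ".git"))  # stand-in for `git init`
    after = check_module_layout(talloc)

print(before)  # ['missing talloc.yaml', 'not a git checkout (no .git directory)']
print(after)   # []
```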

Now we need to edit the module metadata file talloc.yaml and define the contents of the module. A module metadata file’s basic structure looks like this:

document: modulemd
version: 1
data:
  summary: Short description of this module
  description: Full description of this module
  license:
    module:
    - License of the module itself
  references:
    community: Website for the community that supports this module
    documentation: Documentation website for this module
    tracker: Issue-tracker website for this module
  dependencies:
    buildrequires:
      base-runtime: f26
      shared-userspace: f26
      common-build-dependencies: f26
    requires:
      base-runtime: f26
      shared-userspace: f26
  api:
    rpms:
    - rpm1
    - ...
  filter:
    rpms:
    - filteredrpm1
    - ...
  components:
    rpms:
      rpm1:
        rationale: reason to include rpm1

Let’s break this down a bit. First, the document type and version are fixed values. These determine the version of the metadata format. Next comes the “data” section, which contains all the information about this module.

The summary, description and references are described in the sample. The license field should describe the license of the module, not its contents, which carry their own licenses.

The api section is a list of binary RPMs that are built from the source RPMs in this module whose presence you want to treat as “public”. In other words, these are the RPMs in this module that others can expect to be available for their use. Other RPMs may exist in the repository (to satisfy dependencies or simply because they were built as a side-effect of generating these RPMs that you need), but these are the ones that consumers should use.

On the flip side of that, we have the filter section. This is a place to list binary RPM packages that explicitly must not appear in the final module so that no user will try to consume them. The main reason to use this would be if a package builds a subpackage that is not useful to the intended audience and requires additional dependencies which are not packaged in the module. (For example, a module might contain a package that provides a plugin for another package and we don’t want to ship that other package just for this reason).

Each of the components describes a source RPM that will be built as part of this module. The rationale is a helpful comment to explain why it is needed in this module. The ref field describes any reference in the dist-git repository that can be used to acquire these sources. It is recommended to use an exact git commit here so that the results are always repeatable, but you can also use tag or branch names.
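
To make the "exact git commit" recommendation concrete, here is a small sketch that tells a full commit hash apart from a branch or tag name. The helper name is hypothetical, and the check assumes the 40-character hexadecimal hashes that `git log` printed earlier in this post.

```python
import re

# A full git commit hash as shown by `git log`: 40 lowercase hex characters.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_pinned_commit(ref):
    """True if ref looks like a full commit hash rather than a branch or tag."""
    return FULL_SHA1.fullmatch(ref) is not None


print(is_pinned_commit("f284a27d9aad2c16ba357aaebfd127e4f47e3eff"))  # True
print(is_pinned_commit("f26"))                                       # False
```

A branch name such as f26 moves over time, so two builds of the same metadata file can pick up different sources; a full hash cannot.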

So our talloc module should look like this:

document: modulemd
version: 1
data:
  summary: The talloc library
  description: A library that implements a hierarchical allocator with destructors.
  stream: ''
  version: 0
  license:
    module:
    - LGPLv3+
  references:
    community: https://talloc.samba.org/
    documentation: https://talloc.samba.org/talloc/doc/html/libtalloc__tutorial.html
    tracker: http://bugzilla.samba.org/
  dependencies:
    buildrequires:
      base-runtime: f26
      shared-userspace: f26
      common-build-dependencies: f26
    requires:
      base-runtime: f26
  api:
    rpms:
    - libtalloc
    - libtalloc-devel
    - python-talloc
    - python-talloc-devel
    - python3-talloc
    - python3-talloc-devel
  components:
    rpms:
      libtalloc:
        rationale: Provides a hierarchical memory allocator with destructors
        ref: f284a27d9aad2c16ba357aaebfd127e4f47e3eff

You will notice I omitted the “filter” section because we want to provide all of the subpackages here to our consumers. Additionally, while most modules will require the shared-userspace module at runtime, this particular trivial example does not.

So, now we need to commit these changes to the local git repository so that the module build service will be able to see it.

git commit talloc.yaml -m "Added module metadata"

Now, we can build this module in the module build service. Just run:

mbs-build local

The build will proceed and will provide a considerable amount of output telling you what it is doing (and even more if you set LOG_LEVEL = 'debug' in the /etc/module-build-service/config.py file). The first time it runs, it will take a long time because it will need to download and cache all of the packages from the base-runtime and shared-userspace modules to perform the build. (Note: due to some storage-related issues in the Fedora infrastructure right now, you may see some of the file downloads time out, canceling the build. If you restart it, it will pick up from where it left off and retry those downloads.)

The build will run and deposit results in the ~/modulebuild/builds directory in a subdirectory named after the module and the timestamp of the git commit from which it was built. This will include mock build logs for each individual dependency, which will show you if it succeeded or failed.

When the build completes successfully, the module build service will have created a yum repository in the same results directory as the build logs containing all of the produced RPMs and repodata (after filtering out the undesired subpackages).

And there you have it! Go off and build modules!

Edit 2017-06-30: Switched references from NUM_CONSECUTIVE_BUILDS to NUM_CONCURRENT_BUILDS and updated the minimum MBS requirement to 1.3.24. Added notes about needing to be in the ‘mock’ group.

Stealing from customers

Posted by Josh Bressers on May 29, 2017 09:59 PM
I was having some security conversations last week and cybersecurity insurance came up as a topic. This isn't overly unusual as it's a pretty popular topic, but someone said something that really got me thinking.
What if the insurance covered the customers instead of the companies?
Now I understand that many cybersecurity insurance policies can cover some amount of customer damage and loss, but fundamentally the coverage is for the company that is attacked; customers who have data stolen will maybe get a year of free credit monitoring or some other token service. That's all well and good, but I couldn't help myself from thinking about this problem from another angle. Let's think about insurance in the context of shoplifting. For this thought exercise we're going to use a real store in our example, which won't be exactly correct, but the point is to think about the problem, not get all the minor details correct.

If you're in a busy store shopping and someone steals your wallet, it's generally accepted that the store is not at fault for this theft. Most would put some effort into helping you, but at the end of the day you're probably out of luck if you expect the store to repay you for anything you lost. They almost certainly won't have insurance to cover the theft of customer property in their store.

Now let's also imagine there are things taken from the store, actual merchandise gets stolen. This is called shoplifting. It has a special name and many stores even have special groups to help minimize this damage. They also have insurance to cover some of these losses. Most businesses see some shoplifting as a part of doing business. They account for some volume of this theft when doing their planning and profit calculations.

In the real world, I suspect customers being robbed while in a store isn't very common. If there is a store that gains a reputation for customers having wallets stolen, nobody will shop there. If you visit a store in a rough part of town they might even have a security guard at the door to help keep the riffraff out. This is because no shop wants to be known as a dangerous place. You can't exist as a store with that sort of reputation. Customers need to feel safe.

In the virtual world, all that can be stolen is basically information. Sometimes that information can be equated to actual money, sometimes it's just details about a person. Some will have little to no value like a very well known email address. Sometimes it can have a huge value like a tax identifier that can be used to commit identity theft. It can be very very difficult to know when information is stolen, but also the value of that information taken can vary widely. We also seem to place very little value on our information. Many people will trade it away for a trinket online worth a fraction of the information they just supplied.

Now let's think about insurance. Just like loss prevention insurance, cybersecurity insurance isn't there to protect customers. It exists to help protect the company from the losses of an attack. If customer data is stolen the customers are not really covered; in many instances there's nothing a customer can do. It could be impossible to prove your information was stolen. Even if it gets used somewhere else, can you prove it came from the business in question?

After spending some time on the question of what if insurance covered the customers, I realize how hard this problem is to deal with. While real world customer theft isn't very common and it's basically not covered, there's probably no hope for information. It's so hard to prove things beyond a reasonable doubt and many of our laws require actual harm to happen before any action can be taken. Proving this harm is very very difficult. We're almost certainly going to need new laws to deal with these situations.

Merging Kubernetes client configs at run time

Posted by Adam Young on May 26, 2017 03:20 PM

Last time I walked through the process of merging two sets of Kubernetes client configurations into one. For more ephemeral data, you might not want to munge it all into your main configuration. The KUBECONFIG environment variable lets you specify multiple configuration files and merge them into a single set of configuration data.


kubectl config --help

If $KUBECONFIG environment variable is set, then it is used [as] a list of paths (normal path delimiting rules for your system). These paths are merged. When a value is modified, it is modified in the file that defines the stanza. When a value is created, it is created in the first file that exists. If no files in the chain exist, then it creates the last file in the list.


So, let’s start with the file downloaded by the kubevirt build system yesterday.


[ayoung@ayoung541 vagrant]$ echo $PWD
[ayoung@ayoung541 vagrant]$ export KUBECONFIG=$PWD/.kubeconfig
[ayoung@ayoung541 vagrant]$ kubectl config get-contexts
* kubernetes-admin@kubernetes kubernetes kubernetes-admin 

Contrast this with what I get without the environment variable set, when I use the configuration in ~/.kube, which I synced over from my OpenShift cluster:

[ayoung@ayoung541 vagrant]$ unset KUBECONFIG
[ayoung@ayoung541 vagrant]$ kubectl config get-contexts
 default/munchlax:8443/ayoung munchlax:8443 ayoung/munchlax:8443 default
* default/munchlax:8443/system:admin munchlax:8443 system:admin/munchlax:8443 default
 kube-system/munchlax:8443/system:admin munchlax:8443 system:admin/munchlax:8443 kube-system

I want to create a new configuration for the vagrant managed machines for Kubevirt.  It turns out that the API server specified there is actually a proxy, a short-term shim we put in place as we anxiously await the Amalgamated API Server of 1.7.  However, sometimes this proxy is broken or we just need to bypass it.  The only difference between this setup and the proxied setup is the server URL.

So…I create a new file, based on the .kubeconfig file, but munged slightly.  Here is the diff:

[ayoung@ayoung541 vagrant]$ diff -Nurd .kubeconfig .kubeconfig-core 
--- .kubeconfig 2017-05-24 19:49:24.643158731 -0400
+++ .kubeconfig-core 2017-05-26 11:10:49.359955538 -0400
@@ -3,13 +3,13 @@
 - cluster:
- name: kubernetes
+ name: core
 - context:
- cluster: kubernetes
+ cluster: core
 user: kubernetes-admin
- name: kubernetes-admin@kubernetes
-current-context: kubernetes-admin@kubernetes
+ name: kubernetes-admin@core
+current-context: kubernetes-admin@core
 kind: Config
 preferences: {}

Now I have a couple choices. I can just specify this second config file on the command line:

[ayoung@ayoung541 vagrant]$ kubectl --kubeconfig=$PWD/.kubeconfig-core config get-contexts
 kubernetes-admin@core core kubernetes-admin

Or I can munge the two together and provide a flag which states which context to use.

[ayoung@ayoung541 vagrant]$ export KUBECONFIG=$PWD/.kubeconfig:$PWD/.kubeconfig-core
[ayoung@ayoung541 vagrant]$ kubectl config get-contexts
* kubernetes-admin@kubernetes kubernetes kubernetes-admin 
 kubernetes-admin@core core kubernetes-admin

Note that this gives a different current context (marked with the asterisk) than if I reverse the order of the files in the env-var:

[ayoung@ayoung541 vagrant]$ export KUBECONFIG=$PWD/.kubeconfig-core:$PWD/.kubeconfig
[ayoung@ayoung541 vagrant]$ kubectl config get-contexts
* kubernetes-admin@core core kubernetes-admin 
 kubernetes-admin@kubernetes kubernetes kubernetes-admin

Whichever one declared the default first wins.
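To make the "first one wins" behavior concrete, here is a simplified Python sketch of the merge. It assumes the files have already been parsed into dicts with the usual clusters, contexts, users, and current-context keys; merge_kubeconfigs is a name invented for this example, and the real kubectl merge handles more fields than this.

```python
def merge_kubeconfigs(configs):
    """Merge parsed kubeconfig dicts in KUBECONFIG order (simplified sketch).

    The first file to define a named entry, or to set current-context, wins.
    """
    merged = {"clusters": [], "contexts": [], "users": [], "current-context": None}
    for cfg in configs:
        for section in ("clusters", "contexts", "users"):
            seen = {entry["name"] for entry in merged[section]}
            for entry in cfg.get(section, []):
                if entry["name"] not in seen:
                    merged[section].append(entry)
                    seen.add(entry["name"])
        # Only the first file that declares a current-context gets to set it.
        if merged["current-context"] is None and cfg.get("current-context"):
            merged["current-context"] = cfg["current-context"]
    return merged


# Trimmed-down stand-ins for .kubeconfig and .kubeconfig-core
proxied = {
    "contexts": [{"name": "kubernetes-admin@kubernetes"}],
    "current-context": "kubernetes-admin@kubernetes",
}
core = {
    "contexts": [{"name": "kubernetes-admin@core"}],
    "current-context": "kubernetes-admin@core",
}

print(merge_kubeconfigs([proxied, core])["current-context"])
print(merge_kubeconfigs([core, proxied])["current-context"])
```

Swapping the order of the list swaps which current-context survives, which matches the two transcripts above.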

However, regardless of the order, I can explicitly set the context I want to use on the command line:

[ayoung@ayoung541 vagrant]$ kubectl config get-contexts
 kubernetes-admin@core core kubernetes-admin 
* kubernetes-admin@kubernetes kubernetes kubernetes-admin 
[ayoung@ayoung541 vagrant]$ kubectl --context=kubernetes-admin@core config get-contexts
* kubernetes-admin@kubernetes kubernetes kubernetes-admin 
 kubernetes-admin@core core kubernetes-admin

Again, notice the line where the asterisk specifies which context is in use.

With only two files, it might be easier to just specify the --kubeconfig option, but as the number of configs you work with grows, you might find you want to share the user data between two of them, or have a bunch of scripts that work between them, and it is easier to track which context to use than to track which file contains which set of data.

Merging two Kubernetes client configurations

Posted by Adam Young on May 25, 2017 03:22 PM

I have two distinct Kubernetes clusters I work with on a daily basis. One is a local Vagrant-based set of VMs built by the Kubevirt code base. The other is a “baremetal” install of OpenShift Origin on a pair of Fedora workstations in my office. I want to be able to switch back and forth between them.

When you run the kubectl command without specifying where the application should look for the configuration file, it defaults to looking in $HOME/.kube/config. This file maintains the configuration values for a handful of object types. Here is an abbreviated look at the one set up by origin.

apiVersion: v1
- cluster:
 api-version: v1
 certificate-authority-data: LS0...LQo=
 server: https://munchlax:8443
 name: munchlax:8443
- context:
 cluster: munchlax:8443
 namespace: default
 user: system:admin/munchlax:8443
 name: default/munchlax:8443/system:admin
- context:
 cluster: munchlax:8443
 namespace: kube-system
 user: system:admin/munchlax:8443
 name: kube-system/munchlax:8443/system:admin
current-context: kube-system/munchlax:8443/system:admin
kind: Config
preferences: {}
- name: system:admin/munchlax:8443
 client-certificate-data: LS0...tLS0K
 client-key-data: LS0...LS0tCg==

Note that I have elided the very long cryptographic entries for certificate-authority-data, client-certificate-data, and client-key-data.

First up is an array of clusters.  The minimal configuration for each here provides a servername, which is the remote URL to use, some set of certificate authority data, and a name to be used for this configuration elsewhere in this file.

At the bottom of the file, we see a chunk of data for user identification. Again, the user has a local name, with the rest of the identifying information hidden away inside the client certificate.

These two entities are pulled together in a Context entry. In addition, a context entry has a namespace field. Again, we have an array, with each entry containing a name field. The Name of the context object is going to be used in the current-context field and this is where kubectl starts its own configuration.   Here is an object diagram.

The next time I run kubectl, it will read this file.

  1. Based on the value of current-context, it will see it should use the kube-system/munchlax:8443/system:admin context.
  2. From that context, it will see it should use
    1. the system:admin/munchlax:8443 user,
    2. the kube-system namespace, and
    3. the URL https://munchlax:8443 from the munchlax:8443 cluster.
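That lookup chain can be sketched in Python. This is a rough illustration, not kubectl's code: it assumes the file has been parsed into a dict using the standard kubeconfig layout (lists named clusters, contexts, and users, each entry carrying a name), and resolve_context is a made-up helper name.

```python
def resolve_context(config, context_name=None):
    """Follow current-context to the user, namespace, and server URL."""
    name = context_name or config["current-context"]
    context = next(c["context"] for c in config["contexts"] if c["name"] == name)
    cluster = next(
        c["cluster"] for c in config["clusters"] if c["name"] == context["cluster"]
    )
    return {
        "user": context["user"],
        # Assumption: fall back to the "default" namespace when none is set.
        "namespace": context.get("namespace", "default"),
        "server": cluster["server"],
    }


config = {
    "clusters": [
        {"name": "munchlax:8443", "cluster": {"server": "https://munchlax:8443"}},
    ],
    "contexts": [
        {
            "name": "kube-system/munchlax:8443/system:admin",
            "context": {
                "cluster": "munchlax:8443",
                "namespace": "kube-system",
                "user": "system:admin/munchlax:8443",
            },
        },
    ],
    "current-context": "kube-system/munchlax:8443/system:admin",
}

resolved = resolve_context(config)
```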

Below is a similar file from the kubevirt set up, found on my machine at the path ~/go/src/kubevirt.io/kubevirt/cluster/vagrant/.kubeconfig

apiVersion: v1
- cluster:
 certificate-authority-data: LS0...LS0tLQo=
 name: kubernetes
- context:
 cluster: kubernetes
 user: kubernetes-admin
 name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
- name: kubernetes-admin
 client-certificate-data: LS0...LS0tLQo=
 client-key-data: LS0...LS0tCg==

Again, I’ve elided the long cryptographic data. This file is organized the same way as the default one. kubevirt uses it via a shell script that resolves to the following command line:

${KUBEVIRT_PATH}cluster/vagrant/.kubectl --kubeconfig=${KUBEVIRT_PATH}cluster/vagrant/.kubeconfig "$@"

which overrides the default configuration location.  What if I don’t want to use the shell script?  I’ve manually merged the two files into a single ~/.kube/config.  The resulting one has two users,

  • system:admin/munchlax:8443
  • kubernetes-admin

two clusters,

  • munchlax:8443
  • kubernetes

and three contexts.

  • default/munchlax:8443/system:admin
  • kube-system/munchlax:8443/system:admin
  • kubernetes-admin@kubernetes

With current-context: kubernetes-admin@kubernetes:

$ kubectl get pods
haproxy-686891680-k4fxp 1/1 Running 0 15h
iscsi-demo-target-tgtd-2918391489-4wxv0 1/1 Running 0 15h
kubevirt-cockpit-demo-1842943600-3fcf9 1/1 Running 0 15h
libvirt-199kq 2/2 Running 0 15h
libvirt-zj6vw 2/2 Running 0 15h
spice-proxy-2868258710-l85g2 1/1 Running 0 15h
virt-api-3813486938-zpd8f 1/1 Running 0 15h
virt-controller-1975339297-2z6lc 1/1 Running 0 15h
virt-handler-2s2kh 1/1 Running 0 15h
virt-handler-9vvk1 1/1 Running 0 15h
virt-manifest-322477288-g46l9 2/2 Running 0 15h

but with current-context: kube-system/munchlax:8443/system:admin

$ kubectl get pods
tiller-deploy-3580499742-03pbx 1/1 Running 2 8d
youthful-wolverine-testme-4205106390-82gwk 0/1 CrashLoopBackOff 30 2h

There is support in the kubectl executable for configuration:

[ayoung@ayoung541 helm-charts]$ kubectl config get-contexts
 kubernetes-admin@kubernetes kubernetes kubernetes-admin 
 default/munchlax:8443/system:admin munchlax:8443 system:admin/munchlax:8443 default
* kube-system/munchlax:8443/system:admin munchlax:8443 system:admin/munchlax:8443 kube-system
[ayoung@ayoung541 helm-charts]$ kubectl config current-context kubernetes-admin@kubernetes
[ayoung@ayoung541 helm-charts]$ kubectl config get-contexts
 default/munchlax:8443/system:admin munchlax:8443 system:admin/munchlax:8443 default
* kube-system/munchlax:8443/system:admin munchlax:8443 system:admin/munchlax:8443 kube-system
 kubernetes-admin@kubernetes kubernetes kubernetes-admin

The openshift login command can add additional configuration information.

$ oc login
Authentication required for https://munchlax:8443 (openshift)
Username: ayoung
Login successful.

You have one project on this server: "default"

Using project "default".

This added the following information to my .kube/config

under contexts:

- context:
 cluster: munchlax:8443
 namespace: default
 user: ayoung/munchlax:8443
 name: default/munchlax:8443/ayoung

under users:

- name: ayoung/munchlax:8443
 token: 24i...o8_8

This time I elided the token.

It seems that it would be pretty easy to write a tool for merging two configuration files.  The caveats I can see include:

  • don’t duplicate entries
  • ensure that two entries with the same name but different values trigger an error
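As a starting point, here is a minimal Python sketch of those two caveats applied to one section (clusters, contexts, or users) of two parsed configuration files. It is only a sketch of the idea, and merge_named is a hypothetical helper name, not an existing tool.

```python
def merge_named(first, second):
    """Merge two lists of named kubeconfig entries.

    Identical duplicates collapse into one entry; entries that share a name
    but differ in content raise an error instead of silently picking one.
    """
    merged = {entry["name"]: entry for entry in first}
    for entry in second:
        existing = merged.get(entry["name"])
        if existing is None:
            merged[entry["name"]] = entry
        elif existing != entry:
            raise ValueError(f"conflicting definitions for {entry['name']!r}")
    return list(merged.values())


origin_users = [{"name": "system:admin/munchlax:8443"}]
kubevirt_users = [{"name": "kubernetes-admin"}]
users = merge_named(origin_users, kubevirt_users)
```

A full tool would run this once per section and then decide which current-context to keep.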

Getting started with helm on OpenShift

Posted by Adam Young on May 24, 2017 05:20 PM

After sitting in on a helm-based lab at the OpenStack summit, I decided I wanted to try it out for myself on my OpenShift cluster.

Since helm is not yet part of Fedora, I used the upstream binary distribution. Inside the tarball was, among other things, a standalone binary named helm, which I moved to ~/bin (which is in my path). Once I had that in place:

$ helm init
Creating /home/ayoung/.helm 
Creating /home/ayoung/.helm/repository 
Creating /home/ayoung/.helm/repository/cache 
Creating /home/ayoung/.helm/repository/local 
Creating /home/ayoung/.helm/plugins 
Creating /home/ayoung/.helm/starters 
Creating /home/ayoung/.helm/repository/repositories.yaml 
$HELM_HOME has been configured at /home/ayoung/.helm.

Tiller (the helm server side component) has been installed into your Kubernetes Cluster.
Happy Helming!

Checking on that Tiller install:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE
default       docker-registry-2-z91cq          1/1       Running   0          23h
default       registry-console-1-g4qml         1/1       Running   0          1d
default       router-5-4w3zt                   1/1       Running   0          23h
kube-system   tiller-deploy-3210876050-8gx0w   1/1       Running   0          1m

But trying a helm command line operation fails.

$ helm list
Error: User "system:serviceaccount:kube-system:default" cannot list configmaps in project "kube-system"

This looks like an RBAC issue. I want to assign the role ‘admin’ to the user “system:serviceaccount:kube-system:tiller” on the project “kube-system”

$ oc project kube-system
Now using project "kube-system" on server "https://munchlax:8443".
[ansible@munchlax ~]$ oadm policy add-role-to-user admin system:serviceaccount:kube-system:tiller
role "admin" added: "system:serviceaccount:kube-system:tiller"
[ansible@munchlax ~]$ ./helm list
[ansible@munchlax ~]$

Now I can follow the steps outlined in the getting started guide:

[ansible@munchlax ~]$ ./helm create mychart
Creating mychart
[ansible@munchlax ~]$ rm -rf mychart/templates/
deployment.yaml  _helpers.tpl     ingress.yaml     NOTES.txt        service.yaml     
[ansible@munchlax ~]$ rm -rf mychart/templates/*.*
[ansible@munchlax ~]$ 
[ansible@munchlax ~]$ 
[ansible@munchlax ~]$ vi mychart/templates/configmap.yaml
[ansible@munchlax ~]$ ./helm install ./mychart
NAME:   esteemed-pike
LAST DEPLOYED: Wed May 24 11:46:52 2017
NAMESPACE: kube-system

==> v1/ConfigMap
NAME               DATA  AGE
mychart-configmap  1     0s
[ansible@munchlax ~]$ ./helm get manifest esteemed-pike

# Source: mychart/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
  name: mychart-configmap
  myvalue: "Hello World"
[ansible@munchlax ~]$ ./helm delete esteemed-pike
release "esteemed-pike" deleted

Exploring OpenShift RBAC

Posted by Adam Young on May 24, 2017 03:27 PM

OK, since I did it wrong last time, I’m going to try creating a user in OpenShift, and grant that user permissions to do various things.

I’m going to start by removing the ~/.kube directory on my laptop and perform operations via SSH on the master node.  From my last session I can see I still have:

$ oc get users
ayoung cca08f74-3a53-11e7-9754-1c666d8b0614 allow_all:ayoung
$ oc get identities
allow_all:ayoung allow_all ayoung ayoung cca08f74-3a53-11e7-9754-1c666d8b0614

What openshift calls projects (perhaps taking the lead from Keystone?) Kubernetes calls namespaces:

$ oc get projects
default Active
kube-system Active
logging Active
management-infra Active
openshift Active
openshift-infra Active
[ansible@munchlax ~]$ kubectl get namespaces
default Active 18d
kube-system Active 18d
logging Active 7d
management-infra Active 10d
openshift Active 18d
openshift-infra Active 18d

According to the documentation here I should be able to log in from my laptop, and all of the configuration files just get magically set up. Let’s see what happens:

$ oc login
Server [https://localhost:8443]: https://munchlax:8443 
The server uses a certificate signed by an unknown authority.
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): y

Authentication required for https://munchlax:8443 (openshift)
Username: ayoung
Login successful.

You don't have any projects. You can try to create a new project, by running

oc new-project <projectname>

Welcome! See 'oc help' to get started.

Just to make sure I sent something, I typed in the password “test” but it could have been anything. The config file now has this:

$ cat ~/.kube
.kube/ .kube.bak/ 
[ayoung@ayoung541 ~]$ cat ~/.kube/config 
apiVersion: v1
- cluster:
 insecure-skip-tls-verify: true
 server: https://munchlax:8443
 name: munchlax:8443
- context:
 cluster: munchlax:8443
 user: ayoung/munchlax:8443
 name: /munchlax:8443/ayoung
current-context: /munchlax:8443/ayoung
kind: Config
preferences: {}
- name: ayoung/munchlax:8443
 token: 4X2UAMEvy43sGgUXRAp5uU8KMyLyKiHupZg7IUp-M3Q

I’m going to resist the urge to look too closely into that token thing.
I’m going to work under the assumption that a user can be granted roles in several namespaces. Let’s see:

 $ oc get namespaces
 Error from server (Forbidden): User "ayoung" cannot list all namespaces in the cluster

Not a surprise.  But the question I have now is “which namespace am I working with?”  Let me see if I can figure it out.

$ oc get pods
Error from server (Forbidden): User "ayoung" cannot list pods in project "default"

and via kubectl

$ kubectl get pods
Error from server (Forbidden): User "ayoung" cannot list pods in project "default"

What role do I need to be able to get pods? Let’s start by looking at the head node again:

[ansible@munchlax ~]$ oc get ClusterRoles | wc -l
[ansible@munchlax ~]$ oc get Roles | wc -l
No resources found.

This seems a bit strange. ClusterRoles are not limited to a namespace, whereas Roles are. Why am I not seeing any roles defined?

Let’s start with figuring out who can list pods:

oadm policy who-can GET pods
Namespace: default
Verb:      GET
Resource:  pods

Users:  system:admin

Groups: system:cluster-admins

And why is this? What roles are permitted to list pods?

$ oc get rolebindings
NAME                   ROLE                    USERS     GROUPS                           SERVICE ACCOUNTS     SUBJECTS
system:deployer        /system:deployer                                                   deployer, deployer   
system:image-builder   /system:image-builder                                              builder, builder     
system:image-puller    /system:image-puller              system:serviceaccounts:default                        

I don’t see anything that explains why admin would be able to list pods there. And the list is a bit thin.

Another page advises I try the command

oc describe  clusterPolicy

But the output of that is voluminous. With a little trial and error, I discover I can do the same thing using the kubectl command, and get the output in JSON, to let me inspect. Here is a fragment of the output.

         "roles": [
                    "name": "admin",
                    "role": {
                        "metadata": {
                            "creationTimestamp": "2017-05-05T02:24:17Z",
                            "name": "admin",
                            "resourceVersion": "24",
                            "uid": "f063233e-3139-11e7-8169-1c666d8b0614"
                        "rules": [
                                "apiGroups": [
                                "attributeRestrictions": null,
                                "resources": [
                                "verbs": [

There are many more rules, but this one shows what I want: there is a policy role named “admin” that has a rule that provides access to the pods via the list verb, among others.
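To show how a rule like that gets read, here is a small Python sketch that checks whether any rule in a role grants a verb on a resource. The role data below is illustrative only: the real output above is elided, so the resources and verbs lists are assumed values, and role_allows is a hypothetical helper that ignores apiGroups and attributeRestrictions.

```python
def role_allows(role, verb, resource):
    """Return True if any rule in the role grants the verb on the resource."""
    for rule in role.get("rules", []):
        verbs = rule.get("verbs", [])
        resources = rule.get("resources", [])
        if (verb in verbs or "*" in verbs) and (
            resource in resources or "*" in resources
        ):
            return True
    return False


# Assumed example values, not copied from the cluster
admin_role = {
    "metadata": {"name": "admin"},
    "rules": [
        {
            "apiGroups": [""],
            "attributeRestrictions": None,
            "resources": ["pods"],
            "verbs": ["get", "list", "watch"],
        }
    ],
}

can_list_pods = role_allows(admin_role, "list", "pods")
```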

Let’s see if I can make my ayoung account into a cluster-reader by adding the role to the user directly.

On the master

$ oadm policy add-role-to-user cluster-reader ayoung
role "cluster-reader" added: "ayoung"

On my laptop

$ kubectl get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-2-z91cq    1/1       Running   3          8d
registry-console-1-g4qml   1/1       Running   3          8d
router-5-4w3zt             1/1       Running   3          8d

Back on master, we see that:

$  oadm policy who-can list pods
Namespace: default
Verb:      list
Resource:  pods

Users:  ayoung

Groups: system:cluster-admins

And now to remove the role:
On the master

$ oadm policy remove-role-from-user cluster-reader ayoung
role "cluster-reader" removed: "ayoung"

On my laptop

$ kubectl get pods
Error from server (Forbidden): User "ayoung" cannot list pods in project "default"

Fixing Bug 968696

Posted by Adam Young on May 23, 2017 03:47 AM

Bug 968696

The word Admin is used all over the place. To administer was originally something servants did to their masters. In one of the greater inversions of linguistic history, we now use Admin as a way to indicate authority. In OpenStack, the admin role is used for almost all operations that are reserved for someone with a higher level of authority. These actions are not expected to be performed by people with the plebeian Member role.

Global versus Scoped

We have some objects that are global, and some that are scoped to projects. Global objects are typically things used to run the cloud, such as the set of hypervisor machines that Nova knows about. Everyday members are not allowed to “Enable Scheduling For A Compute Service” via the HTTP Call PUT /os-services/enable.

Keystone does not have a way to do global roles. All roles are scoped to a project. This by itself is not a problem. The problem is that a resource like a hypervisor does not have a project associated with it. If keystone can only hand out tokens scoped to projects, there is still no way to match the scoped token to the unscoped resource.

So, what Nova and many other services do is just look for the Role. And thus our bug. How do we go about fixing this?

Use cases

Let me see if I can show this.

In our initial state, we have two users. Annie is the cloud admin, responsible for maintaining the overall infrastructure, such as “Enable Scheduling For A Compute Service”. Pablo is a project manager. As such, he has to do admin level things, but only with his project, such as setting the Metadata used for servers inside this project. Both operations are currently protected by the “admin” role.

Role Assignments

Let’s look at the role assignment object diagram. For this discussion, we are going to assume everything is inside a domain called “Default” which I will leave out of the diagrams to simplify them.

In both cases, our users are explicitly assigned roles on a project: Annie has the Admin role on the Infra project, and Pablo has the Admin role on the Devel project.


The API call to Add Hypervisor only checks the role on the token, and enforces that it must be “Admin.”  Thus, both Pablo and Annie’s scoped tokens will pass the policy check for the Add Hypervisor call.

How do we fix this?

Scope everything

Let’s assume, for the moment, that we were able to instantly run a migration that added a project_id to every database table that holds a resource, and to every API that manages those resources. What would we use to populate that project_id? What value would we give it?

Let’s say we add an admin project value to Keystone. When a new admin-level resource is made, it gets assigned to this admin project. All of those resources we have already should get this value, too. How would we communicate this project ID? We don’t have a keystone instance available when running the Nova Database migrations.

Turns out Nova does not need to know the actual project_id.  Nova just needs to know that Keystone considers the token valid for global resources.

Admin Projects

We’ve added a couple values to the Keystone configuration file: admin_domain_name and admin_project_name. These two values are how Keystone specifies which project represents the admin project. When these two values are set, all token validation responses contain a value for is_admin_project. If the project requested matches the domain and project name, that value is True, otherwise False.
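As a rough sketch of that comparison, assuming both options are set, the check boils down to matching the token's domain and project names against the configured pair. The function below is illustrative Python, not Keystone's implementation, and it reuses the Default, Infra, and Devel names from the example above.

```python
def is_admin_project(token_domain, token_project, admin_domain_name, admin_project_name):
    """True only when the token is scoped to the configured admin project."""
    return token_domain == admin_domain_name and token_project == admin_project_name


# Assumed configuration: admin_domain_name = Default, admin_project_name = Infra
annie = is_admin_project("Default", "Infra", "Default", "Infra")
pablo = is_admin_project("Default", "Devel", "Default", "Infra")
```

With that configuration, Annie's token scoped to Infra would carry is_admin_project set to True, and Pablo's token scoped to Devel would not.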


Instead, we want the create_cell call to use a different rule. Rather than the scope check performed by admin_or_owner, it should confirm the admin role, as it did before, and also that the token has the is_admin_project flag set.


Keystone already has support for setting is_admin_project, but none of the remote services are honoring it yet. Why? In part because, in order for it to make sense for one to do so, they all must do so. But also, because we cannot predict what project would be the admin project.

If we select a project based on name (e.g. Admin) we might be selecting a project that does not exist.

If we force that project to exist, we still do not know what users to assign to it.  We would have effectively broken their cloud, as no users could execute Global admin level tasks.

In the long run, the trick is to provide a transition plan for when the configuration options are unset.

The Hack

If no admin project is set, then every project is an admin project. This is enforced by oslo-context, which is used in policy enforcement.
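Here is a small Python sketch of how that default interacts with the stricter rule for global operations. It is not the oslo-context code; the token is modeled as a plain dict with hypothetical roles and is_admin_project keys, and a missing flag is treated as True to mirror the hack.

```python
def passes_global_admin_check(token):
    """Require the admin role plus the admin-project flag.

    A missing is_admin_project value is treated as True, mirroring the
    "every project is an admin project" default when nothing is configured.
    """
    flagged = token.get("is_admin_project", True)
    return "admin" in token.get("roles", []) and flagged


# No admin project configured: any admin token still passes
legacy = passes_global_admin_check({"roles": ["admin"]})
# Admin project configured: only tokens scoped to it pass
annie = passes_global_admin_check({"roles": ["admin"], "is_admin_project": True})
pablo = passes_global_admin_check({"roles": ["admin"], "is_admin_project": False})
```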

Yeah, that seems surprising, but it turns out that we have just codified what every deployment has already. Look at the bug description again:

Problem: Granting a user an “admin” role on ANY tenant grants them unlimited “admin”-ness throughout the system because there is no differentiation between a scoped “admin”-ness and a global “admin”-ness.

Adding in the field is a necessary precursor to solving it, but the real problem is in the enforcement in Nova, Glance, and Cinder. Until they enforce on the flag, the bug still exists.

Fixing things

There is a phased plan to fix things.

  1. Add the is_admin_project mechanism to Keystone but leave it disabled.
  2. Add is_admin_project enforcement in the policy file for all of the services
  3. Enable an actual admin_project in devstack and Tempest
  4. After a few releases, when we are sure that people are using admin_project, remove the hack from oslo-context.

This plan was discussed and agreed upon by the policy team within Keystone, and vetted by several of the developers in the other projects, but it seems it was never fully disseminated, and thus the patches have sat in a barely reviewed state for a long while…over half a year.  Meanwhile, the developers focused on this have shifted tasks.

Now’s The Time

We’ve got a renewed effort, and some new, energetic developers committed to making this happen. The changes have been rewritten with advice from earlier code reviews and resubmitted. This bug has been around for a long time: Bug #968696 was reported by Gabriel Hurley on 2012-03-29. It’s been a hard task to come up with and execute a plan to solve it. If you are a core project reviewer, please look for the reviews for your project, or, even better, talk with us on IRC (Freenode #openstack-keystone) and help us figure out how to best adjust the default policy for your service.


You know how to fix enterprise patching? Please tell me more!!!

Posted by Josh Bressers on May 22, 2017 12:54 AM
If you pay attention to Twitter at all, you've probably seen people arguing about patching your enterprise after the WannaCry malware. The short story is that Microsoft fixed a very serious security flaw a few months before the malware hit. That means there are quite a few machines on the Internet that haven't applied a critical security update. Of course, as you can imagine, there is plenty of back and forth about updates. There are two basic arguments I keep seeing.

Patching is hard and if you think I can just turn on windows update for all these computers running Windows 3.11 on token ring you've never had to deal with a real enterprise before! You out of touch hipsters don't know what it's really like here. We've seen things, like, real things. We party like it's 1995. GET OFF MY LAWN.

The other side sounds a bit like this.

How can you be running anything that's less than a few hours old? Don't you know what the Internet looks like! If everyone just applied all updates immediately and ran their business in the cloud using agile scrum based SecDevSecOps serverless development practices everything would be fine!

Of course both of these groups are wrong for basically the same reason. The world isn't simple, and whatever works for you won't work for anyone else. The tie that binds us all together is that everything is broken, all the time. All the things we use are broken, how we use them is broken, and how we manage them is broken. We can't fix them even though we try and sometimes we pretend we can fix things.

However ...

Just because everything is broken, that's no excuse to do nothing. It's easy to declare something too hard and give up. A lot of enterprises do this, and a lot of enterprise security people use this as a defense for why they can't update their infrastructure. On the other side though, sometimes moving too fast is more dangerous than moving too slow. Reckless updates are no better than no updates. Sometimes there is nothing we can do. Security as an industry is basically a big giant Kobayashi Maru test.

I have no advice to give on how to fix this problem. I think both groups are silly and wrong but why I think this is unimportant. The right way is for everyone to have civil conversations where we put ourselves in the other person's shoes. That won't happen though, it never happens even though basically every leader ever has said that sort of behavior is a good idea. I suggest you double down on whatever bad practices you've hitched your horse to. In the next few months we'll all have an opportunity to show why our way to do things is the worst way ever, and we'll also find an opportunity to mock someone else for not doing things the way we do.

In this game there are no winners and losers, just you. And you've already lost.

Episode 48 - Machine Learning: Not actually magic

Posted by Open Source Security Podcast on May 21, 2017 07:53 PM
Josh and Kurt have a guest! Mike Paquette from Elastic discusses the fundamentals and basics of Machine Learning. We also discuss how ML could have helped with WannaCry.

Unix Sockets For Auth

Posted by Robbie Harwood on May 21, 2017 04:00 AM

Let's not talk about the Pam/NSS stack and instead talk about a different weird auth thing on Linux.

So sockets aren't just for communication over the network. And by that I don't mean that one can talk to local processes on the same machine by connecting to localhost (which is correct, but goes over the "lo" network), but rather something designed for this purpose only: Unix domain sockets. Because they're restricted to local use only, their features can take advantage of both ends being managed by the same kernel.

I'm not interested in performance effects (and I doubt there are any worth writing home about), but rather what the security implications are. So of particular interest is SO_PEERCRED. With the receiving end of an AF_UNIX stream socket, if you ask getsockopt(2) nicely, it will give you back assurances about the connecting end of the socket in the form of a struct ucred. When _GNU_SOURCE is defined, this will contain pid, uid, and gid of the process on the other end.
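For a feel of what asking getsockopt(2) nicely looks like, here is a minimal Linux-only sketch in Python rather than C. It assumes struct ucred is laid out as a pid_t followed by a uid_t and a gid_t (one signed and two unsigned 32-bit integers), and it uses socketpair(2) so that both ends live in the same process; peer_credentials is a name made up for this example.

```python
import os
import socket
import struct

# Assumed layout of struct ucred: pid_t pid; uid_t uid; gid_t gid;
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) for the peer of a connected AF_UNIX socket."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


# Both ends belong to this process, so the peer credentials are our own.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(left)
left.close()
right.close()
```

In a real server the call would be made on the socket returned by accept(2), and the values describe whoever connected.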

It's worth noting that these are set while in the syscall connect(2). Which is to say that they can be changed by the process on the other end by things like dropping privileges, for instance. This isn't really a problem, though, in that it can't be exploited to gain a higher level of access, since the connector already has that access.

Anyway, the uid information is clearly useful; one can imagine filtering such that a connection came from apache, for instance (or not from apache, for that matter), or keeping per-user settings, or any number of things. The gid is less clearly useful, but I can immediately see uses in terms of policy setting, perhaps. But what about the pid?

Linux has a relative of plan9's procfs, which means there's a lot of information presented in /proc. (/proc can be locked down pretty hard by admins, but let's assume it's not.) proc(5) covers more of these than I will, but there are some really neat ones. Within /proc/[pid], the interesting ones for my purposes are:

  • cmdline shows the process's argv.

  • cwd shows the current working directory of the process.

  • environ similarly shows the process's environment.

  • exe is a symlink to the executable for the process.

  • root is a symlink to the process's root directory, which means we can tell whether it's in a chroot.

So it seems like we could use this to implement filtering by the process being run: for instance, we could do things only if the executable is /usr/bin/ssh. And indeed we can; /proc/[pid]/exe will be a symlink to the ssh binary, and everything works out.
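A minimal sketch of that check, in Python for brevity: read the exe symlink for the pid obtained from the socket and compare it against the expected binary. Both helper names are hypothetical, and this assumes /proc is readable for the target process.

```python
import os


def executable_of(pid):
    """Resolve the /proc/[pid]/exe symlink to the path of the running binary."""
    return os.readlink(f"/proc/{pid}/exe")


def peer_runs(pid, expected="/usr/bin/ssh"):
    """Return True when the process with this pid is running the expected executable."""
    return executable_of(pid) == expected


# Trying it against our own process
own_exe = executable_of(os.getpid())
```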

There's a slight snag, though: /usr/bin/ssh is a native executable (in this case, an ELF file). But we can also run non-native executables using the shebang - e.g., #!/bin/sh, or #!/usr/bin/python2, and so on. While this is convenient for scripting, it makes the /proc/[pid]/exe value much less useful, since it will just point at the interpreter.

The way the shebang is implemented causes the interpreter to be run with argv[1] set to the input file. So we can pull it out of /proc/[pid]/cmdline and everything is fine, right?

Well, no. Linux doesn't canonicalize the path to the script file, so unless it was originally invoked using a non-relative path, we don't have that information.

Maybe we can do the resolution ourselves, though. We have the process environment, so $PATH-based resolution should be doable, right? And if it's a relative path, we can use /proc/[pid]/cwd, right?

Nope. Although inspecting the behavior of shells would suggest that /proc/[pid]/cwd doesn't change, this is a shell implementation detail; the program can just modify this value if it wants.

Even if we nix relative paths, we're still not out of the woods. /proc/[pid]/environ looks like exactly what we want, as the man page specifies that even getenv(3)/setenv(3) do not modify this. However, the next paragraph indicates the syscall needed to just move what region of memory it points to, so we can't trust that value either.

There's actually a bigger problem, though. Predictably, from the way the last two went, processes can just modify argv. So: native code only.

Anyway, thanks for reading this post about a piece of gssproxy's guts. Surprise!

OpenShift Origin Default Auth

Posted by Adam Young on May 16, 2017 04:37 PM

Once I got the Ansible playbook to run, I was able to poke at the openshift setup.

The install creates a default configuration in the Ansible user’s home directory on the master node.

I can use the openshift client:

ssh ansible@munchlax oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-2-z91cq    1/1       Running   0          18h
registry-console-1-g4qml   1/1       Running   0          20h
router-5-4w3zt             1/1       Running   0          18h
ssh ansible@munchlax oc create user ayoung

Or even the kubectl executable:

$ ssh ansible@munchlax kubectl get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-2-z91cq    1/1       Running   0          18h
registry-console-1-g4qml   1/1       Running   0          20h
router-5-4w3zt             1/1       Running   0          18h

If I want to pull this over to my home machine, I can use rsync:

rsync -a  ansible@munchlax:.kube ~/
[ayoung@ayoung541 kubevirt-ansible]$ ls ~/.kube/
cache  config  munchlax_8443  schema
[ayoung@ayoung541 kubevirt-ansible]$ kubectl get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-2-z91cq    1/1       Running   0          18h
registry-console-1-g4qml   1/1       Running   0          20h
router-5-4w3zt             1/1       Running   0          18h

Although the advice I got from sdodson in IRC sounds solid:

ansible_user on the first master should have admin’s kubeconfig in ~/.kube/config The intention is that you use that to provision additional admins/users and grant them required permissions. Then they can use `oc` or the web console using whatever credentials you’ve created for them.

I can use the WebUI by requesting the following URL from the Browser.


Assuming I bypass the certificate warnings, I can see the login screen. Since the admin user is secured with a client cert, and the UI supports password login, I’ll create a user to mirror my account and log in that way, following the instructions here:

[ayoung@ayoung541 kubevirt-ansible]$ oc create user ayoung
user "ayoung" created
[ayoung@ayoung541 kubevirt-ansible]$ oc get user ayoung
NAME      UID                                    FULL NAME   IDENTITIES
ayoung    cca08f74-3a53-11e7-9754-1c666d8b0614               
[ayoung@ayoung541 kubevirt-ansible]$ oc get identities
No resources found.
[ayoung@ayoung541 kubevirt-ansible]$ oc get identity
No resources found.

Hmmm, no Identity providers seem to be configured. I see I can override this via the Ansible inventory file if I rerun.

I can see the current configuration in

sudo cat /etc/origin/master/master-config.yaml

Which has this line in it…

      kind: AllowAllPasswordIdentityProvider

Perhaps my new user will work?

From the login screen, using a password of ‘test’, which I have not set anywhere, I get logged in and see the “new project” screen.

This works for development, but I need something more serious for a live deployment in the future.

Installing OpenShift Origin via Ansible on Fedora 25

Posted by Adam Young on May 15, 2017 11:18 PM

While many people suggested that I run one of the virtualized setups of OpenShift, I wanted something on bare metal in order to eventually test out KubeVirt.  Just running

oc cluster up

as some people suggested, did not work, since it assumes prerequisites are properly set up; the docker registry was one that I tripped over.  So, I decided to give openshift-ansible a test run.  Here are my notes.

SSH and Ansible have been set up and used for upstream Kubernetes testing on this machine.  Kubernetes has been removed.  There might be artifacts left behind that are not explicitly listed here.

There is no ~/.kube directory, which I know has messed me up elsewhere in the past.

git clone https://github.com/openshift/openshift-ansible

I have two nodes for the cluster. My head node is munchlax, and dialga is the compute node.

sudo yum install python3 --best --allowerasing
sudo yum install python3-yaml

I created a local file for inventory that looks like this:





dialga openshift_node_labels="{'region': 'infra'}"



Note that, while it might seem silly to specify


for each of the groups instead of under all, specifying it for all:vars will break the deployment as it then overrides local commands and performs them via sudo, and those should not be done as root.

I’m still working on getting the version values right, but these seemed to work, with a couple of workarounds.  I’ve posted a diff at the end.

The value openshift_node_labels="{'region': 'infra'}" is used to specify where the registry is installed.

To run the install, I ran:

ansible-playbook -vvvi /home/ayoung/devel/local-openshift-ansible/inventory.ini /home/ayoung/devel/openshift-ansible/playbooks/byo/config.yml

To test the cluster:

ssh ansible@munchlax

[ansible@munchlax ~]$ kubectl get pods

docker-registry-1-deploy 0/1 Pending 0 31m
registry-console-1-g4qml 1/1 Running 0 31m
router-4-deploy 0/1 Pending 0 32m

Update: I also needed one commit from a Pull request:

commit 75da091c3e917dc3cd673d4fd201c1b2606132f2
Author: Jeff Peeler <jpeeler>
Date:   Fri May 12 18:51:26 2017 -0400

    Fix python3 error in repoquery
    Explicitly convert from bytes to string so that splitting the string is
    successful. This change works with python 2 as well.
    Closes #4182

Here are the changes from master I had to make by hand:

  1. The certificate allocation used the unsupported flag --expire-days, which I removed.
  2. The Ansible sysctl module has a known issue with Python 3.  I converted it to running the CLI.
  3. The version check between the container and RPM versions was too strict and unpassable on my system.  I commented it out.
diff --git a/roles/openshift_hosted/tasks/registry/secure.yml b/roles/openshift_hosted/tasks/registry/secure.yml
index 29c164f..5134fdd 100644
--- a/roles/openshift_hosted/tasks/registry/secure.yml
+++ b/roles/openshift_hosted/tasks/registry/secure.yml
@@ -58,7 +58,7 @@
 - "{{ docker_registry_route_hostname }}"
 cert: "{{ openshift_master_config_dir }}/registry.crt"
 key: "{{ openshift_master_config_dir }}/registry.key"
- expire_days: "{{ openshift_hosted_registry_cert_expire_days if openshift_version | oo_version_gte_3_5_or_1_5(openshift.common.deployment_type) | bool else omit }}"
+# expire_days: "{{ openshift_hosted_registry_cert_expire_days if openshift_version | oo_version_gte_3_5_or_1_5(openshift.common.deployment_type) | bool else omit }}"
 register: server_cert_out
 - name: Create the secret for the registry certificates
diff --git a/roles/openshift_node/tasks/main.yml b/roles/openshift_node/tasks/main.yml
index 656874f..e2e187b 100644
--- a/roles/openshift_node/tasks/main.yml
+++ b/roles/openshift_node/tasks/main.yml
@@ -105,7 +105,12 @@
 # startup, but if the network service is restarted this setting is
 # lost. Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1372388
 - name: Persist net.ipv4.ip_forward sysctl entry
- sysctl: name="net.ipv4.ip_forward" value=1 sysctl_set=yes state=present reload=yes
+ command: sysctl -w net.ipv4.ip_forward=1 
+- name: reload for net.ipv4.ip_forward sysctl entry
+ command: sysctl -p/etc/sysctl.conf
 - name: Start and enable openvswitch service
diff --git a/roles/openshift_version/tasks/main.yml b/roles/openshift_version/tasks/main.yml
index 2e9b4ca..cc14453 100644
--- a/roles/openshift_version/tasks/main.yml
+++ b/roles/openshift_version/tasks/main.yml
@@ -99,11 +99,11 @@
 when: not rpm_results.results.package_found
 - set_fact:
 openshift_rpm_version: "{{ rpm_results.results.versions.available_versions.0 | default('0.0', True) }}"
- - name: Fail if rpm version and docker image version are different
- fail:
- msg: "OCP rpm version {{ openshift_rpm_version }} is different from OCP image version {{ openshift_version }}"
+# - name: Fail if rpm version and docker image version are different
+# fail:
+# msg: "OCP rpm version {{ openshift_rpm_version }} is different from OCP image version {{ openshift_version }}"
 # Both versions have the same string representation
- when: openshift_rpm_version != openshift_version
+# when: openshift_rpm_version != openshift_version
 when: is_containerized | bool
 # Warn if the user has provided an openshift_image_tag but is not doing a containerized install



Episode 47 - WannaCry: Everything is basically broken

Posted by Open Source Security Podcast on May 14, 2017 05:22 PM
Josh and Kurt discuss the WannaCry worm.



Please Remove Your PRNG

Posted by Robbie Harwood on May 14, 2017 04:00 AM

The gist of this post is that if your program or library has its own PRNG, I would like you to remove it. If you are not convinced that this is a good idea, read on; if you want links on what to do instead, skip to the second section.

Why do you have this?

I believe in code re-use, in not re-inventing the wheel where necessary, and in cooperation. So if code to do something exists, and there are no strong reasons why I shouldn't use it: I use it. Formulated in this manner it sounds almost like a strict rule; in practice, the same result can be achieved for new code just by laziness. For existing code, it becomes a maintainability question. I maintain code as well as writing it (as everyone who writes code should), and while I won't deny a certain satisfaction in well-oiled machinery, less would be better. So everything we maintain should serve a purpose, and reducing unneeded size, scope, and complexity is worthwhile.

Every project is different, which means your project's reasons for having a PRNG will be different as well. Maybe it doesn't care about the quality of the pseudorandom numbers (at which point it should probably just read /dev/urandom). Or maybe it's performing cryptographic operations (use getrandom(2) or similar). But I invite you to think about whether continuing to maintain your own is worth it, or whether it might be better to use something which has been more strongly audited (e.g., the kernel CSPRNG).

To look at an example: a few months ago now, I performed this change for krb5. In our case, we had a Fortuna implementation because quality entropy used to be difficult to come by. Fortuna specifically was used due to its ability to recover from low-quality input. However, upon further examination, the time to recover is quite long (so it only really helps the server), and in the meantime, operation will appear to be normal, with low-quality random numbers. Since there are already quality random numbers available on all server platforms we support, I added the option to just use them directly. (This describes the current behavior on Fedora, as well as the behavior for all future RHEL/CentOS releases.)

What to do instead

For this post, I will be focusing on Linux. If you are not in a Linux environment, you call a different function.

Anyway, the short answer is: you should use getrandom(2).

The longer answer is just me telling you how to use getrandom(2). For that, I want to draw from this post, which contains a useful diagram about how /dev/random and /dev/urandom relate. The author points out two issues with using /dev/urandom directly on Linux (that do not occur on certain BSDs, where one just uses /dev/random instead):

  • First, that /dev/urandom is not guaranteed to be seeded. getrandom(2) actually provides seeding guarantees by default. More precisely, it will block the call until the requested number of bytes can be generated; in the case of the urandom pool, this means until the pool has been seeded. (If this behavior is not desired, one should just read directly from /dev/urandom instead.)

  • Second, that one may wish to use /dev/random despite it being slower if they're feeling especially paranoid. There's a getrandom(2) flag for this, it turns out: GRND_RANDOM.
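As a rough sketch of both points, here is what this looks like from Python, whose os.getrandom() wraps the same syscall on Linux. The function name random_bytes and its paranoid parameter are invented for this example, and the fallback to os.urandom() on platforms without os.getrandom() is an assumption of the sketch, not advice from the post referenced above.

```python
import os


def random_bytes(n, paranoid=False):
    """Return n random bytes from the kernel CSPRNG (illustrative helper)."""
    getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        # Assumption: off Linux, fall back to the platform's os.urandom().
        return os.urandom(n)
    # With flags=0 the call blocks until the urandom pool has been seeded.
    # GRND_RANDOM draws from the /dev/random source instead.
    flags = os.GRND_RANDOM if paranoid else 0
    buf = b""
    while len(buf) < n:
        # getrandom() may return fewer bytes than requested, so loop.
        buf += getrandom(n - len(buf), flags)
    return buf


key = random_bytes(32)
```

The loop matters mostly for large requests or the GRND_RANDOM case, where a short read is possible.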

There's one pitfall with this approach, which is that (for reasons that are opaque to me) glibc was slow to add a wrapper for this function. (See: rhbz, upstream.) So if you want to support older versions of glibc, you have to use syscall(2) instead, like this.

Future work

A while back, I remember reading (but can no longer find) a post which surveyed open source software's usage of rand()/srand() and related functions. There were some decidedly bizarre usages in there, but most of them were just run-of-the-mill incorrect. Anyway, inspired by that, I've been toying with the idea of writing a shim library of sorts to make these functions actually return cryptographically random numbers, discarding seeding and such. The only real pitfall I'm aware of with this is users of these functions that expect deterministic behavior, but I'm not really sure I want to care. Maybe an environment variable for configuration or something.

Why Quotas are Hard

Posted by Adam Young on May 12, 2017 02:39 AM

A quota is a numerical limit on a group of resources. Quotas have to be both recorded and enforced.

We had a session at the summit this past week about hierarchical quotas and, if I took anything away from it, it is that quotas are hard.

Keystone supports a project hierarchy. Here’s a sample one for you:

Hierarchical quotas are assigned to a parent project and applied to a child project.  This hierarchy is only 3 levels deep and only has 9 projects.  A real deployment will be much larger than this. Often, a large organization has one project per user, in addition to departmental projects like the ones shown above.

Let's assume that our local sys-admin has granted our Internal domain a quota of 100 virtual machines.  How would we enforce this?  If the user attempts to create a VM in the root project of the hierarchy (a domain IS-A project) then Nova should see that the quota for that domain is 100, and that there are currently 0 VMs, so it should create the VM.  The second time this happens, there is a remaining quota of 99, and so on.

Now, let's assume that the quota is stored in Keystone, as in the current proposal we were discussing.  When Nova asks Keystone what the quota for “Internal” is, Keystone can return 100.  Nova can then query all VMs to find out which have a project ID that matches that of “Internal” and verify that there are 2. Since 100 – 2 > 0, Nova should create the VM.

What if the user wants to create a VM in the “Sales” project?  That is where things get hierarchical.  We discussed schemes where the quota would be explicitly assigned to Sales and where the quota was assumed to come from “Internal.”  Both are tricky.

Let's say we allow the explicit allocation of quota from higher to lower.  Does this mean that the parent project is reducing its own quota while creating an explicit quota for the lower project?  Or does it mean that both quotas need to be enforced?  If the quota for Sales is set to 10, and the quotas for the three projects under it are all set to 10, is this legal or an error?

Let's assume, for a moment, that it is legal.  Under this scheme, a user with a token scoped to TestingA creates 10 VMs. As each VM is created, Nova needs to check the number of machines already created in project TestingA.  It also needs to check the number of machines in projects StagingA, ProductionA, and Sales to ensure that the quota for “Sales” has not been exceeded.  If there is an explicit quota on “Internal”, Nova needs to check the number of VMs created in that project and any project under it.  Our entire tree must be searched and counted and that count compared with the parent project.
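To show how much counting that implies, here is a small Python sketch of the check. It is not Nova or Keystone code: the PARENT and QUOTA tables, the function names, and the project ProductionB (added to fill out the nine-project tree) are all assumptions made for illustration.

```python
# Assumed project tree: child -> parent. Internal is the domain (root).
PARENT = {
    "Internal": None,
    "Sales": "Internal", "Marketing": "Internal",
    "TestingA": "Sales", "StagingA": "Sales", "ProductionA": "Sales",
    "TestingB": "Marketing", "StagingB": "Marketing", "ProductionB": "Marketing",
}

# Explicit quotas; projects without an entry have no limit of their own.
QUOTA = {"Internal": 100, "Sales": 10,
         "TestingA": 10, "StagingA": 10, "ProductionA": 10}


def children(project):
    return [p for p, parent in PARENT.items() if parent == project]


def subtree_usage(project, usage):
    """Count VMs in a project and in every project beneath it."""
    own = usage.get(project, 0)
    return own + sum(subtree_usage(c, usage) for c in children(project))


def can_create_vm(project, usage):
    """Check the project's quota and the quota of every ancestor."""
    node = project
    while node is not None:
        limit = QUOTA.get(node)
        if limit is not None and subtree_usage(node, usage) >= limit:
            return False
        node = PARENT[node]
    return True


usage = {"StagingA": 6, "ProductionA": 4}
testing_a_ok = can_create_vm("TestingA", usage)
marketing_ok = can_create_vm("Marketing", usage)
```

Here TestingA has no VMs and a quota of 10, yet the request is refused because its siblings have already used up the quota on Sales. Every create call walks up the tree and counts every subtree along the way.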

Ideally, we would only ever have to check the quota for a single project.  That only works if:

  1. Every project in the whole tree has an explicit quota
  2. Quotas can be “split” amongst child projects but never reclaimed.

If that second statement seems strong, assume the “Marketing” project with a quota of 10 chips off 9 for TestingB, creates 5 VMs, drops the quota for TestingB to 0, sets the quota for StagingB to 9, and creates 9 VMs in that project.  This leaves it with 14 VMs running but only an explicit quota of 10.
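The sequence is easier to follow when stepped through. The class below is a hypothetical model of "split" quotas where only the single project's own limit is checked on create; the name SplitQuota and its methods are invented for this sketch.

```python
class SplitQuota:
    """Toy model: a parent hands out slices of its quota to children."""

    def __init__(self, total):
        self.unallocated = total  # quota not yet handed to a child
        self.quota = {}           # child -> explicit quota
        self.usage = {}           # child -> running VMs

    def set_child_quota(self, child, new_quota):
        delta = new_quota - self.quota.get(child, 0)
        if delta > self.unallocated:
            raise ValueError("parent has no quota left to hand out")
        # Lowering a quota returns it to the parent, even if VMs still run.
        self.unallocated -= delta
        self.quota[child] = new_quota

    def create_vm(self, child):
        # Only the single project's own quota is checked.
        if self.usage.get(child, 0) >= self.quota.get(child, 0):
            raise ValueError("quota exceeded for " + child)
        self.usage[child] = self.usage.get(child, 0) + 1


marketing = SplitQuota(10)
marketing.set_child_quota("TestingB", 9)
for _ in range(5):
    marketing.create_vm("TestingB")
marketing.set_child_quota("TestingB", 0)  # reclaimed while VMs still run
marketing.set_child_quota("StagingB", 9)
for _ in range(9):
    marketing.create_vm("StagingB")

total_running = sum(marketing.usage.values())
over_quota = total_running > 10
```

Every individual check passed, yet over_quota ends up True. That is why reclaiming quota cannot be allowed without some extra step.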

The word “never” really is too strong, but it would require some form of reconciliation process, by which Nova confirmed that both projects were within the end-state limits.

Automated reconciliation is hard.  Keystone needs to know how to query random quantities on remote objects, and it probably should not even have access to those objects.  Or, Nova (and every other service using quotas) needs to provide an API for Keystone to query to confirm resources have been freed.

Manual reconciliation is probably possible, but will be labor intensive.

One possibility is that Keystone actually records the usage of quotas, as well as the freeing of actual resources.  This is also painful, as now every single call that either creates or deletes a resource requires an additional call to Keystone.  Or, if quotas are “batch” fetched by Nova, Nova needs to remember them, and store them locally.  If quotas then change in Keystone, the cache is invalid.

This is only a fragment of the whole discussion.

Quotas are hard.