July 26, 2016

FreeIPA Lightweight CA internals

In the preceding post, I explained the use cases for the FreeIPA lightweight sub-CAs feature, how to manage CAs and use them to issue certificates, and current limitations. In this post I detail some of the internals of how the feature works, including how signing keys are distributed to replicas, and how sub-CA certificate renewal works. I conclude with a brief retrospective on delivering the feature.

Full details of the design of the feature can be found on the design page. This post does not cover everything from the design page, but we will look at the aspects that are covered from the perspective of the system administrator, i.e. "what is happening on my systems?"

Dogtag lightweight CA creation

The PKI system used by FreeIPA is called Dogtag. It is a separate project with its own interfaces; most FreeIPA certificate management features are simply reflecting a subset of the corresponding Dogtag interface, often integrating some additional access controls or identity management concepts. This is certainly the case for FreeIPA sub-CAs. The Dogtag lightweight CAs feature was implemented initially to support the FreeIPA use case, yet not all aspects of the Dogtag feature are used in FreeIPA as of v4.4, and other consumers of the Dogtag feature are likely to emerge (in particular: OpenStack).

The Dogtag lightweight CAs feature has its own design page which documents the feature in detail, but it is worth mentioning some important aspects of the Dogtag feature and their impact on how FreeIPA uses the feature.

  • Dogtag lightweight CAs are managed via a REST API. The FreeIPA framework uses this API to create and manage lightweight CAs, using the privileged RA Agent certificate to authenticate (a sketch of a direct call to this API follows this list). In a future release we hope to remove the RA Agent and authenticate as the FreeIPA user using GSS-API proxy credentials.
  • Each CA in a Dogtag instance, including the "main" CA, has an LDAP entry with object class authority. The schema includes fields such as subject and issuer DN, certificate serial number, and a UUID primary key, which is randomly generated for each CA. When FreeIPA creates a CA, it stores this UUID so that it can map the FreeIPA CA’s common name (CN) to the Dogtag authority ID in certificate requests or other management operations (e.g. CA deletion).
  • The "nickname" of the lightweight CA signing key and certificate in Dogtag’s NSSDB is the nickname of the "main" CA signing key, with the lightweight CA’s UUID appended. In general operation FreeIPA does not need to know this, but the ipa-certupdate program has been enhanced to set up Certmonger tracking requests for FreeIPA-managed lightweight CAs and therefore it needs to know the nicknames.
  • Dogtag lightweight CAs may be nested, but FreeIPA as of v4.4 does not make use of this capability.
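
As an aside, the REST API mentioned in the first point can be exercised directly; the following is only a sketch, and the /ca/rest/authorities path and the certificate file locations are assumptions for illustration rather than something FreeIPA requires:

% curl -s --cacert /etc/ipa/ca.crt \
    --cert /path/to/ra-agent.pem --key /path/to/ra-agent.key \
    https://$(hostname):8443/ca/rest/authorities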

So, let’s see what actually happens on a FreeIPA server when we add a lightweight CA. We will use the sc example from the previous post. The command executed to add the CA, with its output, was:

% ipa ca-add sc --subject "CN=Smart Card CA, O=IPA.LOCAL" \
    --desc "Smart Card CA"
---------------
Created CA "sc"
---------------
  Name: sc
  Description: Smart Card CA
  Authority ID: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
  Subject DN: CN=Smart Card CA,O=IPA.LOCAL
  Issuer DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330

The LDAP entry added to the Dogtag database was:

dn: cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd,ou=authorities,ou=ca,o=ipaca
authoritySerial: 63
objectClass: authority
objectClass: top
cn: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
authorityID: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
authorityKeyNickname: caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d87
 4c84fd
authorityKeyHost: f24b-0.ipa.local:443
authorityEnabled: TRUE
authorityDN: CN=Smart Card CA,O=IPA.LOCAL
authorityParentDN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
authorityParentID: d3e62e89-df27-4a89-bce4-e721042be730

We see the authority UUID in the authorityID attribute as well as cn and the DN. authorityKeyNickname records the nickname of the signing key in Dogtag’s NSSDB. authorityKeyHost records which hosts possess the signing key – currently just the host on which the CA was created. authoritySerial records the serial number of the certificate (more on that later). The meaning of the rest of the fields should be clear.

If we have a peek into Dogtag’s NSSDB, we can see the new CA’s certificate:

# certutil -d /etc/pki/pki-tomcat/alias -L

Certificate Nickname              Trust Attributes
                                  SSL,S/MIME,JAR/XPI

caSigningCert cert-pki-ca         CTu,Cu,Cu
auditSigningCert cert-pki-ca      u,u,Pu
Server-Cert cert-pki-ca           u,u,u
caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd u,u,u
ocspSigningCert cert-pki-ca       u,u,u
subsystemCert cert-pki-ca         u,u,u

There it is, alongside the main CA signing certificate and other certificates used by Dogtag. The trust flags u,u,u indicate that the private key is also present in the NSSDB. If we pretty print the certificate we will see a few interesting things:

# certutil -d /etc/pki/pki-tomcat/alias -L \
    -n 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd'
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 63 (0x3f)
        Signature Algorithm: PKCS #1 SHA-256 With RSA Encryption
        Issuer: "CN=Certificate Authority,O=IPA.LOCAL 201606201330"
        Validity:
            Not Before: Fri Jul 15 05:46:00 2016
            Not After : Tue Jul 15 05:46:00 2036
        Subject: "CN=Smart Card CA,O=IPA.LOCAL"
        ...
        Signed Extensions:
            ...
            Name: Certificate Basic Constraints
            Critical: True
            Data: Is a CA with no maximum path length.
            ...

Observe that:

  • The certificate is indeed a CA.
  • The serial number (63) agrees with the CA’s LDAP entry.
  • The validity period is 20 years, the default for CAs in Dogtag. This cannot be overridden on a per-CA basis right now, but addressing this is a priority.

Finally, let’s look at the raw entry for the CA in the FreeIPA database:

dn: cn=sc,cn=cas,cn=ca,dc=ipa,dc=local
cn: sc
ipaCaIssuerDN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
objectClass: ipaca
objectClass: top
ipaCaSubjectDN: CN=Smart Card CA,O=IPA.LOCAL
ipaCaId: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
description: Smart Card CA

We can see that this entry also contains the subject and issuer DNs, and the ipaCaId attribute holds the Dogtag authority ID, which allows the FreeIPA framework to dereference the local ID (sc) to the Dogtag ID as needed. We also see that the description attribute is local to FreeIPA; Dogtag also has a description attribute for lightweight CAs but FreeIPA uses its own.
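
To see the same mapping through the FreeIPA API rather than raw LDAP, the ca-show command can be used; --all and --raw are the standard FreeIPA CLI flags for dumping an entry’s attributes (output omitted here):

% ipa ca-show sc --all --raw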

Lightweight CA replication

FreeIPA servers replicate objects in the FreeIPA directory among themselves, as do Dogtag replicas (note: in Dogtag, the term clone is often used). All Dogtag instances in a replicated environment need to observe changes to lightweight CAs (creation, modification, deletion) that were performed on another replica and update their own view so that they can respond to requests consistently. This is accomplished via an LDAP persistent search which is run in a monitor thread. Care was needed to avoid race conditions. Fortunately, the solution for LDAP-based profile storage provided a fine starting point for the authority monitor; although lightweight CAs are more complex, many of the same race conditions can occur and these were already addressed in the LDAP profile monitor implementation.
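
The subtree being monitored is the ou=authorities container we saw earlier; a plain search against it shows the entries the monitor receives change notifications for (a sketch, binding as the directory manager as in the renewal example later in this post):

# ldapsearch -x -D "cn=Directory Manager" -W -b ou=authorities,ou=ca,o=ipaca \
    '(objectClass=authority)' cn authorityDN authorityKeyHost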

But unlike LDAP-based profiles, a lightweight CA consists of more than just an LDAP object; there is also the signing key. The signing key lives in Dogtag’s NSSDB and for security reasons cannot be transported through LDAP. This means that when a Dogtag clone observes the addition of a lightweight CA, an out-of-band mechanism to transport the signing key must also be triggered.

This mechanism is covered in the design pages but the summarised process is:

  1. A Dogtag clone observes the creation of a CA on another server and starts a KeyRetriever thread. The KeyRetriever is implemented as part of Dogtag, but it is configured to run the /usr/libexec/ipa/ipa-pki-retrieve-key program, which is part of FreeIPA. The program is invoked with arguments of the server to request the key from (this was stored in the authorityKeyHost attribute mentioned earlier), and the nickname of the key to request.
  2. ipa-pki-retrieve-key requests the key from the Custodia daemon on the source server. It authenticates as the dogtag/<requestor-hostname>@REALM service principal. If authenticated and authorised, the Custodia daemon exports the signing key from Dogtag’s NSSDB wrapped by the main CA’s private key, and delivers it to the requesting server. ipa-pki-retrieve-key outputs the wrapped key then exits.
  3. The KeyRetriever reads the wrapped key and imports (unwraps) it into the Dogtag clone’s NSSDB. It then initialises the Dogtag CA’s Signing Unit allowing the CA to service signing requests on that clone, and adds its own hostname to the CA’s authorityKeyHost attribute.

Some excerpts of the CA debug log on the clone (not the server on which the sub-CA was first created) show this process in action. The CA debug log is found at /var/log/pki/pki-tomcat/ca/debug. Some irrelevant messages have been omitted.

[25/Jul/2016:15:45:56][authorityMonitor]: authorityMonitor: Processed change controls.
[25/Jul/2016:15:45:56][authorityMonitor]: authorityMonitor: ADD
[25/Jul/2016:15:45:56][authorityMonitor]: readAuthority: new entryUSN = 109
[25/Jul/2016:15:45:56][authorityMonitor]: CertificateAuthority init 
[25/Jul/2016:15:45:56][authorityMonitor]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[25/Jul/2016:15:45:56][authorityMonitor]: SigningUnit init: debug Certificate object not found
[25/Jul/2016:15:45:56][authorityMonitor]: CA signing key and cert not (yet) present in NSSDB
[25/Jul/2016:15:45:56][authorityMonitor]: Starting KeyRetrieverRunner thread

Above we see the authorityMonitor thread observe the addition of a CA. It adds the CA to its internal map and attempts to initialise it, which fails because the key and certificate are not available, so it starts a KeyRetrieverRunner in a new thread.

[25/Jul/2016:15:45:56][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Running ExternalProcessKeyRetriever
[25/Jul/2016:15:45:56][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: About to execute command: [/usr/libexec/ipa/ipa-pki-retrieve-key, caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd, f24b-0.ipa.local]

The KeyRetrieverRunner thread invokes ipa-pki-retrieve-key with the nickname of the key it wants, and a host from which it can retrieve it. If a CA has multiple sources, the KeyRetrieverRunner will try these in order with multiple invocations of the helper, until one succeeds. If none succeed, the thread goes to sleep and retries when it wakes up, initially after 10 seconds and backing off exponentially thereafter.

[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Importing key and cert
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Reinitialising SigningUnit
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Got token Internal Key Storage Token by name
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Found cert by nickname: 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd' with serial number: 63
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Got private key from cert
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Got public key from cert
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: in init - got CA name CN=Smart Card CA,O=IPA.LOCAL

The key retriever successfully returned the key data and import succeeded. The signing unit then gets initialised.

[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: Adding self to authorityKeyHosts attribute
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: In LdapBoundConnFactory::getConn()
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: postCommit: new entryUSN = 361
[25/Jul/2016:15:47:13][KeyRetrieverRunner-660ad30b-7be4-4909-aa2c-2c7d874c84fd]: postCommit: nsUniqueId = 4dd42782-4a4f11e6-b003b01c-c8916432
[25/Jul/2016:15:47:14][authorityMonitor]: authorityMonitor: Processed change controls.
[25/Jul/2016:15:47:14][authorityMonitor]: authorityMonitor: MODIFY
[25/Jul/2016:15:47:14][authorityMonitor]: readAuthority: new entryUSN = 361
[25/Jul/2016:15:47:14][authorityMonitor]: readAuthority: known entryUSN = 361
[25/Jul/2016:15:47:14][authorityMonitor]: readAuthority: data is current

Finally, the Dogtag clone adds itself to the CA’s authorityKeyHosts attribute. The authorityMonitor observes this change but ignores it because its view is current.

Certificate renewal

CA signing certificates will eventually expire, and therefore require renewal. Because the FreeIPA framework operates with low privileges, it cannot add Certmonger tracking requests for sub-CAs when it creates them. Furthermore, although the renewal (i.e. the actual signing of a new certificate for the CA) should only happen on one server, the certificate must be updated in the NSSDB of all Dogtag clones.

As mentioned earlier, the ipa-certupdate command has been enhanced to add Certmonger tracking requests for FreeIPA-managed lightweight CAs. The actual renewal will only be performed on whichever server is the renewal master when Certmonger decides it is time to renew the certificate (assuming that the tracking request has been added on that server).

Let’s run ipa-certupdate on the renewal master to add the tracking request for the new CA. First observe that the tracking request does not exist yet:

# getcert list -d /etc/pki/pki-tomcat/alias |grep subject
        subject: CN=CA Audit,O=IPA.LOCAL 201606201330
        subject: CN=OCSP Subsystem,O=IPA.LOCAL 201606201330
        subject: CN=CA Subsystem,O=IPA.LOCAL 201606201330
        subject: CN=Certificate Authority,O=IPA.LOCAL 201606201330
        subject: CN=f24b-0.ipa.local,O=IPA.LOCAL 201606201330

As expected, we do not see our sub-CA certificate above. After running ipa-certupdate the following tracking request appears:

Request ID '20160725222909':
        status: MONITORING
        stuck: no
        key pair storage: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd',token='NSS Certificate DB',pin set
        certificate: type=NSSDB,location='/etc/pki/pki-tomcat/alias',nickname='caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd',token='NSS Certificate DB'
        CA: dogtag-ipa-ca-renew-agent
        issuer: CN=Certificate Authority,O=IPA.LOCAL 201606201330
        subject: CN=Smart Card CA,O=IPA.LOCAL
        expires: 2036-07-15 05:46:00 UTC
        key usage: digitalSignature,nonRepudiation,keyCertSign,cRLSign
        pre-save command: /usr/libexec/ipa/certmonger/stop_pkicad
        post-save command: /usr/libexec/ipa/certmonger/renew_ca_cert "caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd"
        track: yes
        auto-renew: yes

As for updating the certificate in each clone’s NSSDB, Dogtag itself takes care of that. All that is required is for the renewal master to update the CA’s authoritySerial attribute in the Dogtag database. The renew_ca_cert Certmonger post-renewal hook script performs this step. Each Dogtag clone observes the update (in the monitor thread), looks up the certificate with the indicated serial number in its certificate repository (a new entry that will also have been recently replicated to the clone), and adds that certificate to its NSSDB. Again, let’s observe this process by forcing a certificate renewal:

# getcert resubmit -i 20160725222909
Resubmitting "20160725222909" to "dogtag-ipa-ca-renew-agent".

After about 30 seconds the renewal process is complete. When we examine the certificate in the NSSDB we see, as expected, a new serial number:

# certutil -d /etc/pki/pki-tomcat/alias -L \
    -n "caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd" \
    | grep -i serial
        Serial Number: 74 (0x4a)

We also see that the renew_ca_cert script has updated the serial in Dogtag’s database:

# ldapsearch -D cn="Directory Manager" -w4me2Test -b o=ipaca \
    '(cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd)' authoritySerial
dn: cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd,ou=authorities,ou=ca,o=ipaca
authoritySerial: 74
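
That attribute update is the renewal master’s whole contribution to propagating the new certificate; it amounts to something like the following ldapmodify (shown for illustration only, since the renew_ca_cert helper performs the modification itself):

# ldapmodify -x -D "cn=Directory Manager" -W <<EOF
dn: cn=660ad30b-7be4-4909-aa2c-2c7d874c84fd,ou=authorities,ou=ca,o=ipaca
changetype: modify
replace: authoritySerial
authoritySerial: 74
EOF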

Finally, if we look at the CA debug log on the clone, we’ll see that the authority monitor observes the serial number change and updates the certificate in its own NSSDB (again, some irrelevant or low-information messages have been omitted):

[26/Jul/2016:10:43:28][authorityMonitor]: authorityMonitor: Processed change controls.
[26/Jul/2016:10:43:28][authorityMonitor]: authorityMonitor: MODIFY
[26/Jul/2016:10:43:28][authorityMonitor]: readAuthority: new entryUSN = 1832
[26/Jul/2016:10:43:28][authorityMonitor]: readAuthority: known entryUSN = 361
[26/Jul/2016:10:43:28][authorityMonitor]: CertificateAuthority init 
[26/Jul/2016:10:43:28][authorityMonitor]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[26/Jul/2016:10:43:28][authorityMonitor]: Got token Internal Key Storage Token by name
[26/Jul/2016:10:43:28][authorityMonitor]: Found cert by nickname: 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd' with serial number: 63
[26/Jul/2016:10:43:28][authorityMonitor]: Got private key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: Got public key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: CA signing unit inited
[26/Jul/2016:10:43:28][authorityMonitor]: in init - got CA name CN=Smart Card CA,O=IPA.LOCAL
[26/Jul/2016:10:43:28][authorityMonitor]: Updating certificate in NSSDB; new serial number: 74

When the authority monitor processes the change, it reinitialises the CA including its signing unit. Then it observes that the serial number of the certificate in its NSSDB differs from the serial number from LDAP. It pulls the certificate with the new serial number from its certificate repository, imports it into NSSDB, then reinitialises the signing unit once more and sees the correct serial number:

[26/Jul/2016:10:43:28][authorityMonitor]: ca.signing Signing Unit nickname caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd
[26/Jul/2016:10:43:28][authorityMonitor]: Got token Internal Key Storage Token by name
[26/Jul/2016:10:43:28][authorityMonitor]: Found cert by nickname: 'caSigningCert cert-pki-ca 660ad30b-7be4-4909-aa2c-2c7d874c84fd' with serial number: 74
[26/Jul/2016:10:43:28][authorityMonitor]: Got private key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: Got public key from cert
[26/Jul/2016:10:43:28][authorityMonitor]: CA signing unit inited
[26/Jul/2016:10:43:28][authorityMonitor]: in init - got CA name CN=Smart Card CA,O=IPA.LOCAL

Currently this update mechanism is only used for lightweight CAs, but it would work just as well for the main CA too, and we plan to switch at some stage so that the process is consistent for all CAs.

Wrapping up

I hope you have enjoyed this tour of some of the lightweight CA internals, and in particular seeing how the design actually plays out on your systems in the real world.

FreeIPA lightweight CAs has been the most complex and challenging project I have ever undertaken. It took the best part of a year from early design and proof of concept, to implementing the Dogtag lightweight CAs feature, then FreeIPA integration, and numerous bug fixes, refinements or outright redesigns along the way. Although there are still some rough edges, some important missing features and, I expect, many an RFE to come, I am pleased with what has been delivered and the overall design.

Thanks are due to all of my colleagues who contributed to the design and review of the feature; each bit of input from all of you has been valuable. I especially thank Ade Lee and Endi Dewata from the Dogtag team for their help with API design and many code reviews over a long period of time, and from the FreeIPA team Jan Cholasta and Martin Babinsky for their invaluable input into the design, and much code review and testing. I could not have delivered this feature without your help; thank you for your collaboration!

July 25, 2016

Lightweight Sub-CAs in FreeIPA 4.4

Last year FreeIPA 4.2 brought us some great new certificate management features, including custom certificate profiles and user certificates. The upcoming FreeIPA 4.4 release builds upon this groundwork and introduces lightweight sub-CAs, a feature that lets admins mint new CAs under the main FreeIPA CA and allows certificates for different purposes to be issued in different certificate domains. In this post I will review the use cases and demonstrate the process of creating, managing and issuing certificates from sub-CAs. (A follow-up post will detail some of the mechanisms that operate behind the scenes to make the feature work.)

Use cases

Currently, all certificates issued by FreeIPA are issued by a single CA. Say you want to issue certificates for various purposes: regular server certificates, user certificates for VPN authentication, and user certificates for authenticating to a particular web service. Assuming the certificates bear the appropriate Key Usage and Extended Key Usage extensions (with the default profile, they do), a certificate issued for one of these purposes could currently be used for all of the other purposes.

Issuing certificates for particular purposes (especially client authentication scenarios) from a sub-CA allows an administrator to configure the endpoint that authenticates the clients to use the immediate issuer certificate for validating client certificates. Therefore, if you had one sub-CA for issuing VPN authentication certificates, and a different sub-CA for issuing certificates for authenticating to the web service, you could configure these services to accept certificates issued by the relevant CA only. Thus, where previously the scope of usability may have been unacceptably broad, administrators now have more fine-grained control over how certificates can be used.
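
For example, with OpenSSL-based validation the relying service can treat the sub-CA certificate itself as the trust anchor. The check looks roughly like this sketch (file names are placeholders; -partial_chain lets an intermediate CA certificate be accepted as a trust anchor):

% openssl verify -partial_chain -CAfile smart-card-ca.pem vpn-client.pem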

Finally, another important consideration is that while revoking the main IPA CA is usually out of the question, it is now possible to revoke an intermediate CA certificate. If you create a CA for a particular organisational unit (e.g. some department or working group) or service, then if or when that unit or service ceases to operate or exist, the related CA certificate can be revoked, rendering certificates issued by that CA useless, as long as relying endpoints perform CRL or OCSP checks.

Creating and managing sub-CAs

In this scenario, we will add a sub-CA that will be used to issue certificates for users’ smart cards. We assume that a profile for this purpose already exists, called userSmartCard.

To begin with, we are authenticated as admin or another user that has CA management privileges. Let’s see what CAs FreeIPA already knows about:

% ipa ca-find
------------
1 CA matched
------------
  Name: ipa
  Description: IPA CA
  Authority ID: d3e62e89-df27-4a89-bce4-e721042be730
  Subject DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
  Issuer DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330
----------------------------
Number of entries returned 1
----------------------------

We can see that FreeIPA knows about the ipa CA. This is the "main" CA in the FreeIPA infrastructure. Depending on how FreeIPA was installed, it could be a root CA or it could be chained to an external CA. The ipa CA entry is added automatically when installing or upgrading to FreeIPA 4.4.

Now, let’s add a new sub-CA called sc:

% ipa ca-add sc --subject "CN=Smart Card CA, O=IPA.LOCAL" \
    --desc "Smart Card CA"
---------------
Created CA "sc"
---------------
  Name: sc
  Description: Smart Card CA
  Authority ID: 660ad30b-7be4-4909-aa2c-2c7d874c84fd
  Subject DN: CN=Smart Card CA,O=IPA.LOCAL
  Issuer DN: CN=Certificate Authority,O=IPA.LOCAL 201606201330

The --subject option gives the full Subject Distinguished Name for the new CA; it is mandatory, and must be unique among CAs managed by FreeIPA. An optional description can be given with --desc. In the output we see that the Issuer DN is that of the IPA CA.

Having created the new CA, we must add it to one or more CA ACLs to allow it to be used. CA ACLs were added in FreeIPA 4.2 for defining policies about which profiles could be used for issuing certificates to which subject principals (note: the subject principal is not necessarily the principal performing the certificate request). In FreeIPA 4.4 the CA ACL concept has been extended to also include which CA is being asked to issue the certificate.

We will add a CA ACL called user-sc-userSmartCard and associate it with all users, with the userSmartCard profile, and with the sc CA:

% ipa caacl-add user-sc-userSmartCard --usercat=all
------------------------------------
Added CA ACL "user-sc-userSmartCard"
------------------------------------
  ACL name: user-sc-userSmartCard
  Enabled: TRUE
  User category: all

% ipa caacl-add-profile user-sc-userSmartCard --certprofile userSmartCard
  ACL name: user-sc-userSmartCard
  Enabled: TRUE
  User category: all
  CAs: sc
  Profiles: userSmartCard
-------------------------
Number of members added 1
-------------------------

% ipa caacl-add-ca user-sc-userSmartCard --ca sc
  ACL name: user-sc-userSmartCard
  Enabled: TRUE
  User category: all
  CAs: sc
-------------------------
Number of members added 1
-------------------------

A CA ACL can reference multiple CAs individually, or, like we saw with users above, we can associate a CA ACL with all CAs by setting --cacat=all when we create the CA ACL, or via the ipa caacl-mod command.
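
For example, an ACL that applies to every user and every CA could be created like this (the ACL name is just a placeholder):

% ipa caacl-add tls-all-cas --usercat=all --cacat=all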

A special behaviour of CA ACLs with respect to CAs must be mentioned: if a CA ACL is associated with no CAs (either individually or by category), then it allows access to the ipa CA (and only that CA). This behaviour, though inconsistent with other aspects of CA ACLs, is for compatibility with pre-sub-CAs CA ACLs. An alternative approach is being discussed and could be implemented before the final release.

Requesting certificates from sub-CAs

The ipa cert-request command has learned the --ca argument for directing the certificate request to a particular sub-CA. If it is not given, it defaults to ipa.

alice already has a CSR for the key in her smart card, so now she can request a certificate from the sc CA:

% ipa cert-request --principal alice \
    --profile userSmartCard --ca sc /path/to/csr.req
  Certificate: MIIDmDCCAoCgAwIBAgIBQDANBgkqhkiG9w0BA...
  Subject: CN=alice,O=IPA.LOCAL
  Issuer: CN=Smart Card CA,O=IPA.LOCAL
  Not Before: Fri Jul 15 05:57:04 2016 UTC
  Not After: Mon Jul 16 05:57:04 2018 UTC
  Fingerprint (MD5): 6f:67:ab:4e:0c:3d:37:7e:e6:02:fc:bb:5d:fe:aa:88
  Fingerprint (SHA1): 0d:52:a7:c4:e1:b9:33:56:0e:94:8e:24:8b:2d:85:6e:9d:26:e6:aa
  Serial number: 64
  Serial number (hex): 0x40

Certmonger has also learned the -X/--issuer option for specifying that the request be directed to the named issuer. There is a clash of terminology here; the "CA" terminology in Certmonger is already used to refer to a particular CA "endpoint". Various kinds of CAs and multiple instances thereof are supported. But now, with Dogtag and FreeIPA, a single CA may actually host many CAs. Conceptually this is similar to HTTP virtual hosts, with the -X option corresponding to the Host: header for disambiguating the CA to be used.

If the -X option was given when creating the tracking request, the Certmonger FreeIPA submit helper uses its value in the --ca option to ipa cert-request. These requests are subject to CA ACLs.
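
Putting that together, a tracking request directed at the sc CA might look something like the following sketch (the key and certificate paths and the principal are placeholders; -c IPA selects Certmonger’s FreeIPA CA helper):

# getcert request -c IPA -X sc -T userSmartCard -K alice@IPA.LOCAL \
    -k /path/to/alice.key -f /path/to/alice.crt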

Limitations

It is worth mentioning a few of the limitations of the sub-CAs feature, as it will be delivered in FreeIPA 4.4.

All sub-CAs are signed by the ipa CA; there is no support for "nesting" CAs. This limitation is imposed by FreeIPA – the lightweight CAs feature in Dogtag does not have this limitation. It could be easily lifted in a future release, if there is a demand for it.

There is no support for introducing unrelated CAs into the infrastructure, either by creating a new root CA or by importing an unrelated external CA. Dogtag does not have support for this yet, either, but the lightweight CAs feature was designed so that this would be possible to implement. This is also why all the commands and argument names mention "CA" instead of "Sub-CA". I expect that there will be demand for this feature at some stage in the future.

Currently, the key type and size are fixed at RSA 2048. The same is true in Dogtag, and addressing this is a fairly high priority. Similarly, the validity period is fixed, and we will need to address this also, probably by allowing custom CA profiles to be used.

Conclusion

The Sub-CAs feature will round out FreeIPA’s certificate management capabilities, making FreeIPA a more attractive solution for organisations with sophisticated certificate requirements. Multiple security domains can be created for issuing certificates with different purposes or scopes. Administrators have a simple interface for creating and managing CAs, and rules for how those CAs can be used.

There are some limitations which may be addressed in a future release; the ability to control key type/size and CA validity period will be the highest priority among them.

This post examined the use cases and high-level user/administrator experience of sub-CAs. In the next post, I will detail some of the machinery that makes the sub-CAs feature work.

July 24, 2016

Looking for Andre

My Brother sent out the following message. Signal boosting it here.

“A few weeks ago I started talking to a few guys on the street. (Homeless) Let’s call them James and Anthony. Let’s just skip ahead. I bought them lunch. Ok. I bought $42 worth of Wendy’s $1 burgers and nuggets and a case of water. On top of their lunch. They gathered up all their friends by the Library in Copley sq and made sure that everyone ate. It was like a cookout. You should have seen how happy everyone was. It gave me a feeling that was unexplainable.

“This morning I was in Downtown crossings. I got the feeling in my gut again. That do something better today feeling. I saw a blind guy. His eyes were a mess. He was thin. Almost emaciated. Let’s call him Andre’ he is 30 years old.

[Photo: Andre’]

I bought him lunch. I sat with him at a table while he ate. We talked. Andre’s back story…8 years ago he was in college. He was a plumbers apprentice. He was going on a date. As he walked up to the door to knock for the girl. Someone came up and shot him twice in the temple. Andre’ woke up in the hospital blind. To this day he has no idea who or why he was shot. The only possessions Andre’ had was the way-too-warm clothes on his back, his blind cane. His sign, and his cup. I took Andre’ to TJ Maxx. It’s 90 degrees at at 9:30am. I got him a t-shirt, shorts, clean socks and underwear and a back pack. After I paid, I took him back to the dressing room so he could have some privacy while he changed. I told the lady at the dressing room that he was going in to change. She told me that wasn’t allowed. I kindly informed her that I wasn’t asking… She looked at me and quickly realized it wasn’t a request. More of a statement. I must have had a look on my face.

I get those sometimes.

She nodded her understanding. In the dressing room Andre’ cried. He was ashamed for crying. I didn’t say much. Just put my hand on his back for a second to let him knew I understood. After he changed I took him back to where I originally met him and found out his routine. Where he goes when and such. I left Andre’ in his spot and went to go find James and Anthony. You remember them from the beginning of this story. They were in the same spot as a few weeks ago. They remembered me. I told them it was time to return the favor. I explained to them that I wanted them to look out for Andre’ to make sure he was safe. Andre’ has been repeatedly mugged. Who the fuck mugs a hungry homeless blind guy? Well. They must have seen the look in my face saying this wasn’t a request.

I apparently get that look sometimes.

They came with me from Copley all the way to downtown crossings. We went looking for Andre’. We looked all over but couldn’t find him. We went all over south station and back up all over downtown crossings. (For those not familiar, Google a map of Boston) we couldn’t find Andre’. Anthony said he’s seen him around and knew who I was talking about. They promised me they would look for him everyday. I know they will too. They look out for theirs. Remember all the food I bought them and how they made sure everyone ate? James doesn’t like bullies. He sure as shit won’t tolerate someone stealing from a blind and scared homeless guy. Anthony spends his mornings in south station. He promised me that he will find him and try to bring him to where they stay. It’s safer in numbers and when you have a crew watching your back. You have to know who to trust. That’s what they told me. I gave James and Anthony some money for their time and bought them each a cold drink.

“It’s fucking hot out.

“These guys are all on hard times. Some of them fucked up. Some were just unlucky. Andre’…now that’s some shit luck. That’s just not fucking fair. I’ve never met someone like Andre’. How in the hell would I survive if I couldn’t see? I have an amazing family and a great group of friends. Andre’ has no one. Did I change his life? Nope. Did I make his day better? I honestly hope so. I talked to him like a man. I didn’t let him know how horrible I felt for him. No matter how far you fall in life. If you have the strength to get up each day and try to feed your self, you still have pride, you still have hope. I didn’t want to take away any of his pride. He doesn’t have much to begin with. But he must have a little. I will continue to look for Andre’ every day. I met him near my office. I can look during my lunch. I have to find him and keep an eye on him.

“No matter how bad things get. No matter how unfair you feel you have been treated. Pretty much no matter what your lot in life is. Think of Andre’ when you feel down. If he has the strength to go on… So do you.

“I didn’t write this to say ‘look what great things I did.’ I wish I could write this with out being part of the story. There is no way I could express how much this meeting of Andre’ has effected me with out letting you know this is what I did today. ..

“I just got home from this experience. I’ll update this when I find Andre’ and let you know how he’s doing. If anyone in Boston reads this and sees a black guy about my height. Thinner than me…Obviously blind.

“Please hashtag ‪#‎lookingforAndre‬ and tell me where you saw him. Like I said. South station or downtown crossings are the areas that I know of. Thank you for reading this. Help me find Andre’.”

And then he sent this

“I found Andre’. He is meeting me for breakfast tomorrow.”

 

UPDATE:

Billy set up a fundraising account for Andre.

 

July 20, 2016

IoT Gateways

After discussing the “thing” part of IoT in Devices – the “Thing” in IoT, let’s take a look at overall IoT system design.

IoT Gateways connect IoT Devices to IoT back-end systems. Gateways connect to devices using interfaces like Ethernet, WiFi, Bluetooth, 6LoWPAN, RS-485 and CANbus. Gateways connect to back-end systems through the Internet, commonly using Ethernet, WiFi, or cellular connections. Gateways perform multiple tasks, including concatenation of multiple devices, protocol conversion, device management, and security. Gateways may also perform application processing.

Since IoT Gateways are connected directly to IoT Devices they have to be co-located with the Devices. This means that gateways are deployed in hostile environments. They are accessed through network interfaces connecting both to local devices and to the Internet. People have physical access to the gateways. Users need access to the gateway to perform a variety of functions such as device discovery and registration. These users may be inexperienced, malicious, or both.

Gateways will often need to function while disconnected from the Internet. Such disconnected operation may be deliberate – a low power sensor may only connect to the network once a day, and spend the rest of the time in a low power sleep state. A system on a moving vehicle such as a truck, train, or ship may have critical communications through an expensive, low bandwidth cellular link, and then intermittently connect to a high bandwidth link such as WiFi. This might occur when a truck pulls into a warehouse or service station, when a ship docks, or when a train enters a station. These systems would be designed for disconnected operation. Another case might be a hospital, which needs to continue operations, perhaps in a degraded mode, in events where network connectivity, power, and other resources fail. It is clearly unacceptable for a hospital to shut down if it loses connection to the cloud!

These situations mean that a complete software stack needs to be installed on the gateway, with all of the management, update, and access challenges that this presents.

While gateways will most commonly be structured as application-specific appliances, there are many ways to use gateways.


July 18, 2016

Using a HooToo Nano as a magic VPN box
I've been getting myself ready for Blackhat. If you're going, you know this conference isn't like most. You don't bring your normal gear with you. You turn the tinfoil hat knob up to an 11, then keep turning it until it breaks off. I did do one thing that's pretty clever this year though, and I have no doubt it could be useful for someone else putting together an overengineered tin foil hat security rig.

When I travel I use a little travel router from HooToo. Specifically this one. The basic idea is I can use either ethernet or wifi to connect all my devices to the Internet. I get my own private network behind the device, which lets the Chromecast work in a hotel and means I don't have to keep logging in 15 devices once a day. This got me thinking though: wouldn't it be cool if the HooToo router could VPN for me?

Enter the HooToo Nano.

Now I'm sure I could have found a travel router someone makes that does VPN, but that's not nearly as exciting as figuring this out myself, bricking it a few times, unbricking it, and eventually having a solution that works well enough that I can live with it. You can install OpenWRT on it, which makes it an insanely awesome device.

Here's the basics. I connect the router to a wireless network (which is a pain to do with OpenWRT). Once I'm connected up, I flip the switch on the side of the Nano and it connects to the VPN; a green light turns on once the VPN is active. Everyone knows green means good, right? If I flip the switch back, it turns the VPN off (the green light turns off). The biggest problem is that there is a bug in OpenWRT where if one of the wireless networks it's configured to connect to can't be found, none of the wireless will come up. My solution is I can hit the reset button to return the router to a known good state.

In the spirit of open source, I'll explain how to do all this. Your mileage may vary, it's not simple, but let's face it, it's awesome. I have a magic box that when the green light turns on, I no longer have to worry about the scary local wifi. Perfect for a conference where nobody and nothing can be trusted.

On with the show.

First, you need a HooToo Nano (this is easy). Then you install OpenWRT (this is less easy). I'm not going to explain this part. Apart from already being documented, I don't want to do it again to write it down, I have things working, I'm not touching anything.

Next you need to get openvpn working on it. I followed these instructions from the IPredator folks.

At this point you should have a functioning VPN if you run the init.d openvpn script. With the VPN up, I set up a firewall target called 'vpn'. That name will be important later.

First, we will need to create a nice default configuration. As I said before, OpenWRT has a bug where if one of your wireless networks can't be found, none will work. As I don't have time to figure that bug out right now, I put together some configuration files that only have one wireless network configured as an access point. This configuration exists so I can connect to the router and set up more networks. I then copied all the configuration files from /etc/config to /root/config/.

Then I edit /etc/rc.button/reset to add the line

cp /root/config/* /etc/config/

right before the sync and reboot commands. By doing this I can hit the reset button with a paperclip to return the router to my default settings. Also, as a side note, if you hold the reset button down for more than 5 seconds it will do an OpenWRT factory reset, so don't do that.
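
Pulled together, the backup step and the placement of the restore line look roughly like this sketch (only the cp lines are additions; sync and reboot stand in for the commands already present in the stock reset script):

# save the known-good configuration once everything works
mkdir -p /root/config
cp /etc/config/* /root/config/

# then, inside /etc/rc.button/reset, just before the existing lines:
cp /root/config/* /etc/config/
sync
reboot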

Lastly, we set up the switch. The best way I could find to read it was by creating the directory /etc/hotplug.d/button, then adding an executable script called "buttons" to it.
root@OpenWrt:~# cat /etc/hotplug.d/button/buttons
#!/bin/sh
. /etc/profile
#logger the button was $BUTTON and the action was $ACTION
if test "$BUTTON" = 'BTN_0'; then
    if test "$ACTION" = 'pressed'; then
        # switch flipped on: forward traffic to the vpn zone and start OpenVPN
        uci set firewall.@forwarding[0].dest='vpn'
        /etc/init.d/openvpn start
    fi
    if test "$ACTION" = 'released'; then
        # switch flipped off: forward to wan again, stop OpenVPN, reload the firewall
        uci set firewall.@forwarding[0].dest='wan'
        /etc/init.d/openvpn stop
        /sbin/fw3 reload
    fi
fi
As you can see in the script, I set the vpn firewall to my forwarding target. If you name your vpn firewall something else, be sure to change it.

Without a doubt these instructions aren't as clear as they should be. I don't have time right now to write this up properly; someday I would love to put together an OpenWRT image with all this baked in, but for the moment I hope it's useful for someone.

If you try this and have questions, feel free to find me on Twitter: @joshbressers

July 11, 2016

Entry level AI
I was listening to the podcast Security Weekly and the topic of using AI for security work came up. This got me thinking about how most people make their way into security and what something like AI might mean for the industry.

In virtually every industry you start out doing some sort of horrible job nobody else wants to do, but you have to start there because it's the place you start to learn the skills you need for more exciting and interesting work. Nobody wants to go over yesterday's security event log, but somebody does it.

Now consider this in the context of AI. AI can and will parse the event logs faster and better than a human ever could. We're terrible at repetitive boring tasks. Computers are awesome at repetitive boring tasks. It might take the intern two hours to parse the log files; it will take the log parser two seconds. And the computer won't start thinking about donuts halfway through. Of course there are plenty of arguments about how today's AI has problems, which is true. They're still probably better than humans though.

But here is what really got me thinking. As more and more of this work moves to the domain of AI and machines, what happens to the entry level work? I'm all for replacing humans with robots. Without getting into the conversation about what all the humans will do when the robots take over, I'm more interested in entry level work and where the new talent comes from.

For the foreseeable future, we will need people to do the high skilled security work. By definition most of the high skilled people are a bit on the aged side. Most of us worked our way up from doing something that can be automated away (thank goodness). But where will we get our new batch of geezers from? If there are no entry level offerings, how can security people make the jump to the next level? I'm sure right now there are a bunch of people standing up screaming "TRAINING", but let's face it, that only gets you a little way there; you still need to get your hands dirty before you're actually useful. You're not going to trust a brain surgeon who has never been in an operating room but has all the best training.

I don't have any answers or even any suggestions here. It just happened to get me thinking. It's possible automation will follow behind the geezers which would be a suitable solution. It's possible we'll need to make some token entry level positions just to raise the skill levels.

What do you think? @joshbressers

July 10, 2016

Liveness

The term Liveness here refers to the  need to ensure that the data used to make an authorization check is valid at the time of the check.

The mistake I made with PKI tokens was in not realizing how important Liveness was.  The mistake was based on the age old error of confusing authentication with authorization.  Since a Keystone token is used for both, I was confused into thinking that the primary importance was on authentication, but the reality is that the most important thing a token tells you is information essential to making an authorization decision.

Who you are does not change often.  What you can do changes much more often.  What OpenStack needs in the token protocol is a confirmation that the user is authorized to make this action right now.  PKI tokens, without revocation checks, lost that liveness check.  The revocation check undermined the primary value of PKI.

That is the frustration most people have with certificate revocation lists (CRLs). Since certificates are so long lived, there is very little “freshness” to the data. A CRL is a way to say “not invalidated yet” but, since a cert might carry more data than just “who are you”, certificates can often become invalid. Thus, any active system built on X509 for authorization (not just authentication) is going to have many, many revocations. Keystone tokens fit that same profile. The return to server-validated tokens (UUID or Fernet) restored that freshness check.

However, bearer tokens have a different way of going stale. If I get a token and use it immediately, the server knows that it is highly probable that the token came from me. If I wait, the probability drops. The more I use the same token, and the longer I use it, the greater the probability is that someone other than me is going to get access to that token. And that means the probability that it is going to be misused has also increased.

I’ve long said that what I want is a token that lasts roughly five minutes.  That means that it is issued, used, and  discarded, with a little wiggle room for latency and clock skew across the network.  The problem with this is that a token is often used for a long running task.  If a task takes 3 hours, but a token is good for only five minutes, there is no way to perform the task with just that token.

One possible approach to returning this freshness check is to always have some fresh token on a call, just not necessarily the one that the user originally requested.  This is the idea behind the Trust API.  A Trust is kind-of-like a long term token, but one that is only valid when paired with a short term token for the trustee.  But creating a trust every time a user wants to create a new virtual machine is too onerous, too much overhead.  What we want, instead is a rule that says:

When Nova calls Glance on behalf of a user, Nova passes a freshly issued token for itself along with the original users token.  The original user’s token will be validated based on when it was issued.  Authorization requires the combination of a fresh token for the Nova service user and a not-so-fresh-but-with-the-right-roles token for the end user.

This could be done with no changes to the existing token format. Set the token expiration to 12 hours. The only change would be inside python-keystonemiddleware. It would have a pair of rules (the age check they share is sketched just after this list):

  1. If a single token is passed in, it must have been issued within five minutes.  Otherwise, the operation returns a 401.
  2. If a service token is passed in with the user’s token, the service token must have been issued within five minutes.  The user’s token is validated normally.
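
A minimal sketch of the five-minute age check both rules share, assuming the token’s issued_at timestamp has already been extracted from the validation response (the variable names and threshold are illustrative):

issued_at="2016-07-10T14:00:00Z"
age=$(( $(date -u +%s) - $(date -u -d "$issued_at" +%s) ))
if [ "$age" -gt 300 ]; then
    echo "reject: token issued more than five minutes ago (would be a 401)"
fi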

An additional scope limiting mechanism would further reduce the possibility of abuse.  For example,

  • Glance could limit the service-token scoped operations from Nova to fetching an image and saving a snapshot.
  • Nova might only allow service-scoped tokens from a service like Trove within a 15 minute window.
  • A user might have to ask for an explicit “redelegation” role on a token before handing it off to some untrusted service run off site.

With Horizon, we already have a mechanism that says that it has to fetch an unscoped token first, and then use that to fetch a scoped token. Horizon can be smart enough to fetch a scoped token before each bunch of calls to a remote server, cache it for only a minute, and use the unscoped token only in communication with Keystone. The unscoped token, being validated by Keystone, is sufficient for maintaining “Liveness” of the rest of the data for a particular workflow.

It’s funny how little change this would require to OpenStack, and how big an impact it would make on security. It is also funny how long it took for this concept to coalesce.

July 09, 2016

Tokens without revocation

PKI tokens in Keystone suffered from many things, most essentially the trials due to the various forms of revocation. I never wanted revocation in the first place. What could we have done differently? It just (I mean moments ago) came to me.

A PKI token is a signed document that says “at this point in time, these things are true” where “these things” have to do with users’ roles in projects. Revocation means “these things are no longer true.” But long running tasks need long running authentication. PKI tokens seem built for that.

What we should distinguish is the difference between kicking off a new job, and continued authorization for an old job. When a user requests something from Nova, the only identity that comes into play is the user’s own identity. Nova needs to confirm this, but, in a PKI token world, there is no need to go and ask Keystone.

In a complex operation like launching a VM, Nova needs to ask Glance to do something. Today, Nova passes on the token it received, and all is well. This makes tokens into true bearer tokens, and they are passed around far too much for my comfort.

Let’s say that, to start, when Nova calls Glance, Nova’s own identity should be confirmed. Tokens are really poor for this; a much better way would be to use X509. While Glance would need to do a mapping transform, the identity of Nova would not be transferable. Put another way, Nova would not be handing off a bearer token to Glance. Bearer tokens from powerful systems like Nova are a really scary thing.

If we had this combination of user-confirmed data and service identity, we would have a really powerful delegation system. Why could this not be done today, with UUID/Fernet tokens? If we only ever had to deal with a max of two hops (Nova to Glance, Nova to Neutron), we could.

Enter Trove, Heat, Sahara, and any other process that does work on behalf of a user. Let’s make it really fun and say that we have the following chain of operations:

[Diagram: a deep delegation chain]

If any one link in this chain is untrusted, we cannot pass tokens along.
What if, however, each step had a rule that said “I can accept tokens for users from Endpoint E” and passed a PKI token along? The user submits a PKI token to Heat. Heat passes this, plus its own identity, on to Sahara, which trusts Heat. And so on down the line.

OK…revocations.  We say here that a PKI token is never revoked.  We make it valid for the length of long running operations…say a day.

But we add an additional rule:  A user can only use a PKI token within 5 minutes of issue.

Service to Service calls can use PKI tokens to say “here is when it was authorized, and it was good then.”

A user holds on to A PKI token for 10 minutes, tries to call Nova, and the token is rejected as “too old.”

This same structure would work with Fernet tokens, assuming a couple things:

  1. We get rid of revocations checks for tokens validated with service tokens.
  2. If a user loses a role, we are OK with having a long term operation depending on that role failing.

I think this general structure would make OpenStack a hell of a lot more scalably secure than it is today.

Huge thanks to Jamie Lennox for proposing a mechanism along these lines.

Bypassing Version Discovery in Keystoneauth1

I’ve been a happy Dreamhost customer for many years.  So I was thrilled when I heard that they had upgraded Dreamcompute to Mitaka.  So, like the good Keystoner that I am, I went to test it out.  Of course, I tried to use the V3 API.  And it failed.

What?  Dreamhost wouldn’t let me down, would they?

No.  V3 works fine; it is discovery that is misconfigured.

If you do not tell the openstack client (and thus keystoneauth1) what plugin to use, it defaults to the non-version-specific Password plugin that does version discovery. What this means is it will go to the auth URL you give it, and try to figure out what the right version to use is. And it so happens that there is a nasty, not-well-documented bit of Keystone configuration that makes the Dreamhost /v3 page look like this:

$ curl $OS_AUTH_URL
{"version": {"status": "stable", "updated": "2013-03-06T00:00:00Z",
  "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"},
    {"base": "application/xml", "type": "application/vnd.openstack.identity-v3+xml"}],
  "id": "v3.0", "links": [{"href": "https://keystone-admin.dream.io:35357/v3/", "rel": "self"}]}}

See that last link?

Now, like a good service provider, Dreamhost keeps its Keystone administration inside, behind their firewall.

nslookup keystone-admin.dream.io
Server: 75.75.75.75
Address: 75.75.75.75#53

Non-authoritative answer:
Name: keystone-admin.dream.io
Address: 10.64.140.19

[ayoung@ayoung541 dreamhost]$ curl keystone-admin.dream.io

Crickets…hangs.  Same with a request to port 35357.  And since the Password auth plugin is going to use the URL from the /v3 page, which is

https://keystone-admin.dream.io:35357/v3

the client hangs in just the same way.

To get around this, Dreamhost will shortly change their Keystone config file. If they have the baseline config shipped with Keystone, they have, in the section:

[DEFAULT]

admin_endpoint = <None>

This is what is used in discovery to build the URL above.  Yeah, it is dumb.  Instead, they will set it to:

https://keystone.dream.io/

And discovery will work.

But I am impatient, and I want to test it now. The workaround is to bypass discovery and specify the V3 version of the Keystoneauth1 Password plugin. The version-specific plugin uses the AUTH_URL as provided to figure out where to get tokens. With the line:

export OS_AUTH_TYPE=v3password
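
For completeness, the companion v3 settings usually exported alongside it look something like this (the variable names are the standard OpenStack client ones; the values are placeholders, not Dreamhost’s actual endpoints):

export OS_AUTH_URL=https://keystone.example.com/v3
export OS_USERNAME=myuser
export OS_PASSWORD=mypassword
export OS_PROJECT_NAME=myproject
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default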

And now…

$ openstack server show ipa.younglogic.net   
+--------------------------------------+---------------------------------------------------------+
| Field                                | Value                                                   |
+--------------------------------------+---------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                  |
| OS-EXT-AZ:availability_zone          | iad-1                                                   |
| OS-EXT-STS:power_state               | 1                                                       |
| OS-EXT-STS:task_state                | None                                                    |
| OS-EXT-STS:vm_state                  | active                                                  |
| OS-SRV-USG:launched_at               | 2016-06-17T03:28:48.000000                              |
| OS-SRV-USG:terminated_at             | None                                                    |
| accessIPv4                           |                                                         |
| accessIPv6                           |                                                         |
| addresses                            | private-network=2607:f298:6050:499d:f816:3eff:fe6a:afdb, 
                                                         10.10.10.75, 173.236.248.45             |
| config_drive                         |                                                         |
| created                              | 2016-06-17T03:27:09Z                                    |
| flavor                               | warpspeed (400)                                         |
| hostId                               | 4a7c64b912cfeda73c2c56ac52e8ffd124aac29ec54e1e4902d54bd4|
| id                                   | f0f46fd3-fa59-4a5b-835d-a638f6276566                    |
| image                                | CentOS-7 (c1e8c5b5-bea6-45e9-8202-b8e769b661a4)         |
| key_name                             | ayoung-pubkey                                           |
| name                                 | ipa.younglogic.net                                      |
| os-extended-volumes:volumes_attached | []                                                      |
| progress                             | 0                                                       |
| project_id                           | 9c7e4956ea124220a87094a0a665ec82                        |
| properties                           |                                                         |
| security_groups                      | [{u'name': u'ayoung-all-open'}]                         |
| status                               | ACTIVE                                                  |
| updated                              | 2016-06-17T03:28:24Z                                    |
| user_id                              | b6fd4d08f2c54d5da1bb0309f96245bc                        |
+--------------------------------------+---------------------------------------------------------+

And how cool is that: they are using IPv6 for their private network.

If you want to generate your own V3 config file from the file they ship, use this.

July 08, 2016

Installing FreeIPA in as few lines as possible

I had this in another post, but I think it is worth its own.

sudo hostnamectl set-hostname --static undercloud.ayoung-dell-t1700.test
export address=`ip -4 addr  show eth0 primary | awk '/inet/ {sub ("/24" ,"" , $2) ; print $2}'`
echo $address `hostname` | sudo tee -a /etc/hosts
sudo yum -y install ipa-server-dns
export P=FreIPA4All
ipa-server-install -U -r `hostname -d|tr "[a-z]" "[A-Z]"` -p $P -a $P --setup-dns `awk '/^name/ {print "--forwarder",$2}' /etc/resolv.conf`

Just make sure you have enough entropy.
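
A quick way to check is to look at the kernel’s available entropy estimate; if the number is down in the low hundreds, the install can stall waiting for randomness (this reads a standard Linux proc file):

cat /proc/sys/kernel/random/entropy_avail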

Merging FreeIPA and Tripleo Undercloud Apache installs

My experiment yesterday left me with a broken IPA install. I aim to fix that.

To get to the start state:

From my laptop, kick off a Tripleo Quickstart, stopping prior to undercloud deployment:

./quickstart.sh --teardown all -t  untagged,provision,environment,undercloud-scripts  ayoung-dell-t1700.test

SSH in to the machine …

ssh -F /home/ayoung/.quickstart/ssh.config.ansible undercloud

and set up FreeIPA:

$ cat install-ipa.sh

#!/usr/bin/bash

sudo hostnamectl set-hostname --static undercloud.ayoung-dell-t1700.test
export address=`ip -4 addr  show eth0 primary | awk '/inet/ {sub ("/24" ,"" , $2) ; print $2}'`
echo $address `hostname` | sudo tee -a /etc/hosts
sudo yum -y install ipa-server-dns
export P=FreIPA4All
sudo ipa-server-install -U -r `hostname -d|tr "[a-z]" "[A-Z]"` -p $P -a $P --setup-dns `awk '/^name/ {print "--forwarder",$2}' /etc/resolv.conf`

Backup the HTTPD config directory:

 sudo cp -a /etc/httpd/ /root

Now continue the undercloud install:

./undercloud-install.sh 

Once that is done, the undercloud passes a sanity check. Doing a diff between the two directories shows a lot of differences.

sudo diff -r /root/httpd  /etc/httpd/

All of the files in /etc/httpd/conf.d that were placed by the IPA install are gone, as are the following module files in /root/httpd/conf.modules.d

Only in /root/httpd/conf.modules.d: 00-base.conf
Only in /root/httpd/conf.modules.d: 00-dav.conf
Only in /root/httpd/conf.modules.d: 00-lua.conf
Only in /root/httpd/conf.modules.d: 00-mpm.conf
Only in /root/httpd/conf.modules.d: 00-proxy.conf
Only in /root/httpd/conf.modules.d: 00-systemd.conf
Only in /root/httpd/conf.modules.d: 01-cgi.conf
Only in /root/httpd/conf.modules.d: 10-auth_gssapi.conf
Only in /root/httpd/conf.modules.d: 10-nss.conf
Only in /root/httpd/conf.modules.d: 10-wsgi.conf

To start, I am going to back up the existing HTTPD directory:

 sudo cp -a /etc/httpd/ /home/stack/

The rest of this is easier to do as root, as I want some globbing. First, I’ll copy over the module config files:

 sudo su
 cp /root/httpd/conf.modules.d/* /etc/httpd/conf.modules.d/
 systemctl restart httpd.service

Test Keystone

 . ./stackrc 
 openstack token issue

Get a token…good to go…OK, let’s try the conf.d files.

sudo cp /root/httpd/conf.d/* /etc/httpd/conf.d/
sudo systemctl restart httpd.service

Then, as a non-root user:

$ kinit admin
Password for admin@AYOUNG-DELL-T1700.TEST: 
[stack@undercloud ~]$ ipa user-find
--------------
1 user matched
--------------
  User login: admin
  Last name: Administrator
  Home directory: /home/admin
  Login shell: /bin/bash
  UID: 776400000
  GID: 776400000
  Account disabled: False
  Password: True
  Kerberos keys available: True
----------------------------
Number of entries returned 1
----------------------------

This is a fragile deployment, as updating either FreeIPA or the Undercloud has the potential to break one or the other…or both. But it is a start.

Deconflicting Swift-Proxy with FreeIPA

Port 8080 is a popular port. Tomcat uses it as the default port for unencrypted traffic. FreeIPA installs Dogtag, which runs in Tomcat. Swift proxy also chose that port number for its traffic. This means that if one is run on that port, the other cannot be. Of the two, it is easier to change FreeIPA, as its port is only used for internal traffic, whereas Swift’s port is in the service catalog and the documentation.

Changing the port in FreeIPA requires modifications in both the config directories for Dogtag and the Python code that contacts it.

The Python changes are in

/usr/lib/python2.7/site-packages/ipaplatform/base/services.py
/usr/lib/python2.7/site-packages/ipapython/dogtag.py

Look for any instances of 8080 and change them to another port that will not conflict. I chose 8181.

The config changes for Dogtag are in /etc/pki, such as /etc/pki/pki-tomcat/ca/CS.cfg; again, change 8080 to 8181.
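
As a rough sketch of the edits, something like the following works. Be careful, though: it blindly rewrites every occurrence of 8080 in those files, so review the changes against the .bak copies before restarting, and the file list here is just the ones I touched, not necessarily exhaustive.

sudo sed -i.bak 's/8080/8181/g' \
    /usr/lib/python2.7/site-packages/ipaplatform/base/services.py \
    /usr/lib/python2.7/site-packages/ipapython/dogtag.py \
    /etc/pki/pki-tomcat/ca/CS.cfg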

Restart the server with:

sudo systemctl restart ipa.service

To confirm, run a command that hits the CA:

 ipa cert-find

I have a ticket in with FreeIPA to try and get support for this in.

With these changes made, I then tested installing the undercloud on the same node, and it seems to work.

However, the IPA server is no longer running. The undercloud install seems to have cleared out the IPA config files from under /etc/httpd/conf.d. Dogtag, however, is still running, as shown by:

curl localhost:8181

The next experiment will be to see if I can preserve the IPA configuration.

July 05, 2016

Launching a CentOS VM in a Tripleo Overcloud

My Overcloud deploy does not have any VM images associated with it. I want to test launching a VM.

Get the image from CentOS:

curl -O  http://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2.xz
unxz < CentOS-7-x86_64-GenericCloud.qcow2.xz >CentOS-7-x86_64-GenericCloud.qcow2
glance --os-image-api-version 2 image-create --name 'CentOS-7-x86_64-GenericCloud' --disk-format qcow2 --container-format bare --file CentOS-7-x86_64-GenericCloud.qcow2

Wait for that to finish, and check with

$ openstack image list
+--------------------------------------+------------------------------+--------+
| ID                                   | Name                         | Status |
+--------------------------------------+------------------------------+--------+
| 06841fb4-df1c-458d-898e-aea499342905 | CentOS-7-x86_64-GenericCloud | active |
+--------------------------------------+------------------------------+--------+

Now launch it:

openstack server create --flavor m1.small --image CentOS-7-x86_64-GenericCloud testrun

And it becomes active pretty quickly:

$ openstack server list
+--------------------------------------+---------+--------+----------+
| ID                                   | Name    | Status | Networks |
+--------------------------------------+---------+--------+----------+
| 76585723-e2c3-4acb-88d5-837b69000f72 | testrun | ACTIVE |          |
+--------------------------------------+---------+--------+----------+

It has no network capability. To destroy it:

openstack server delete 76585723-e2c3-4acb-88d5-837b69000f72

But I have work to do!
There’s a news story going around that talks about how horrible computer security tends to be in hospitals. This probably doesn’t surprise anyone who works in the security industry: security is often something that gets in the way, not something that helps get work done.

There are two really important lessons we should take away from this. The first is that a doctor or nurse isn’t a security expert, doesn’t want to be a security expert, and shouldn’t be a security expert. Their job is helping sick people. We want them helping sick people, especially if we’re the people who are sick. The second is that when security gets in the way, security loses. Security should lose when it gets in the way, we’ve been winning far too often and it’s critically damaged the industry.

They don’t want to be security experts

It’s probably not a surprise that doctors and nurses don’t want to be computer security experts. I keep going back and forth between “you need some basics” and “assume nothing”. I’m back to the assume nothing camp this week. I think in the context of health care workers, security can’t exist, at least not the way we see it today. These are people and situations where seconds can literally be the difference between life and death. Will you feel better knowing the reason your grandma died was because they were using strong passwords? Probably not. In the context of a hospital, if there is any security it has to be totally transparent, the doctors shouldn’t have to know anything about it, and it should work 100% of the time. This is of course impossible.

So the real question isn’t how do we make security 100% reliable, the question is where do we draw our risk line. We want this line as far from perfect security and as close to saving lives as possible. If we start to think in this context it changes our requirements quite a lot. There will be a lot of “good enough security”. There will be a lot of hard choices to make and anyone who can make them will have to be extremely knowledgeable with both health care and security. I bet there aren’t a lot of people who can do this today.

This leads us to point #2

When security gets in the way, security loses

If you’re a security person, you see people do silly and crazy things all the time. Every day all day. How many times a day do you ask “why did you do that”? Probably zero. It’s more likely you say “don’t do that” constantly. If you have kids, the best way to get them to do something is to say “don’t do that”. If we think about security in the context of a hospital, the answer to “why did you do that” is pretty simple, it’s because the choice was probably between getting the job done and following the security guidelines. A hospital is one of the extremes where it’s easy to justify breaking the rules. If you don’t, people die. In most office settings if you break the rules, nobody dies, there will possibly be some sort of security event that will cost time and money. Historically speaking, in an office environment, we tell people “don’t do that” and expect them to listen, in many cases they pretend to listen.

This attitude of “listen to me because” has created a security universe where we don’t pay attention to what people are actually doing, we don’t have to. We get in the way, then when someone tries to get their work done, we yell at them for not following our bizarre and esoteric rules instead of understanding the challenge and how we can solve it together. The next great challenge we have isn't tighter rules, or better training, it's big picture. How can we start looking at systems with a big picture view? It won't be easy, but it's where we go next.

What do you think? Let me know: @joshbressers

July 01, 2016

Clearing the Keystone Environment

If you spend a lot of time switching between different clouds, different users, or even different projects for the same user when working with OpenStack, you’ve come across the problem where an environment variable from an old sourcing pollutes the current environment. I’ve been hit by that enough times that I wrote a small script to clear the environment.

I call it clear_os_env

unset OS_AUTH_TYPE
unset OS_AUTH_URL
unset OS_CACERT
unset OS_COMPUTE_API_VERSION
unset OS_DEFAULT_DOMAIN
unset OS_DOMAIN_ID
unset OS_DOMAIN_NAME
unset OS_IDENTITY_API_VERSION
unset OS_IDENTITY_PROVIDER
unset OS_IDENTITY_PROVIDER_URL
unset OS_IMAGE_API_VERSION
unset OS_NETWORK_API_VERSION
unset OS_OBJECT_API_VERSION
unset OS_PASSWORD
unset OS_PROJECT_DOMAIN_ID
unset OS_PROJECT_DOMAIN_NAME
unset OS_PROJECT_ID
unset OS_PROJECT_NAME
unset OS_REGION_NAME
unset OS_SERVICE_ENDPOINT
unset OS_SERVICE_PROVIDER_ENDPOINT
unset OS_SERVICE_TOKEN
unset OS_TENANT_ID
unset OS_TENANT_NAME
unset OS_TOKEN
unset OS_TRUST_ID
unset OS_URL
unset OS_USERNAME
unset OS_USER_DOMAIN_ID
unset OS_USER_DOMAIN_NAME
unset OS_USER_ID
unset OS_VOLUME_API_VERSION

Source this prior to sourcing any keystone.rc file, and you will have cleared out the old variables, regardless of how vigilant the writer of the new RC file was about clearing them. This includes some old variables that should no longer be used, like OS_SERVICE_TOKEN.
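
For example, assuming the script lives in ~/bin and the new credentials are in ~/keystonerc_v3 (both paths are just examples), switching environments looks like this:

. ~/bin/clear_os_env
. ~/keystonerc_v3
openstack token issue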

June 27, 2016

The future of security
The Red Hat Summit is happening this week in San Francisco. It's a big deal if you're part of the Red Hat universe, which I am. I'm giving the Red Hat security roadmap talk this year. The topic has me thinking about the future of security quite a lot. It's easy to think about this in the context of an organization like Red Hat: we have a lot of resources, and there are a lot of really interesting things happening. Everything from container security, to operating system security, to middleware security. My talk will end up on YouTube at some point and I'll link to it, but I also keep thinking about the bigger picture. Where will security be in the next 5, 10, 15 years?

Will ransomware still be a thing in ten years? Will bitcoin still be around? What about flash? How will open source adapt to all the changes? Will we even call them containers?

The better question here is "what do we want security to look like?"

If we look at some of the problems that always make the news, stolen personal information, password leaks, ransomware, hacking. These aren't new problems, most are almost as old as the Internet. The question is really, can we fix any of these problems? The answer might be "no". Some problems aren't fixable, crime is an example of this. When you have unfixable problems the goal is to control the problem, not prevent it.

How do we control security?

I think we're headed down this path today. It's still slow going and there are a lot of old habits that will die hard. Most decent security organizations aren't focused on pure prevention anymore, they understand that security is process and people, it's all about having nice policies and good staff. If you have those things you can start to work on controlling some aspects of what's happening. If you want users to behave you have to make it easy for them to do the right thing. If you don't want them opening email attachments, make it easy to not use email attachments.

There are still a lot of people who think it's enough to tell people not to do something, or yell at them if they behave in a way that is quite honestly expected. People don't like getting yelled at, they don't like having to go out of their way to do anything, they will always pick the option that is easiest.

Back to the point though. What will the future of security look like? I think the future of security is people. Technology is great, but all our fancy technology is to solve problems that are in the past. If we want to solve the problems of the future, we need good people to first understand those problems, then we can understand how to solve them. This is of course easier said than done, but sometimes just understanding the problem is.

Are you a people? Do you have ideas how to make things better? Tell me: @joshbressers

June 20, 2016

Decentralized Security
If you're a fan of the cryptocurrency projects, you've heard of something called Ethereum. It's similar to bitcoin, but is a separate coin. It's been in the news lately due to an attack on the currency. Nobody is sure how this story will end at this point; there are a few possible options, and none are good. This got me thinking about the future of security: there are some parallels when you compare traditional currency to cryptocurrency, as well as where we see security heading (stick with me here).

The current way currency works is there is some central organization that is responsible for minting and controlling the currency, usually a country. There are banks, exchanges, loans, interest, physical money, and countless other ways the currency interacts with society. We will compare this to how IT security has mostly worked in the past. You had one large organization responsible for everything. If something went wrong, you could rely on the owner to take control and make things better. There are some instances where this isn't true, but in general it holds.

Now if we look at cryptocurrency, there isn't really a single group or person in charge. That's the whole point though. The idea is to have nobody in charge so the currency can be used with some level of anonymity. You don't have to rely on some sort of central organization to give the currency legitimacy, the system itself has legitimacy built in.

This reminds of the current state of shadow IT, BYOD, and cloud computing in general. The days of having one security group that was in charge of everything are long gone. Now we have distributed responsibility as well as distributed risk. It's up to each group to understand how they must interact with each other. The risk is shifted from one central organization to nearly everyone involved.

Modified risk isn't a bad thing, demonizing it isn't the point of this discussion. The actual point is that we now exist in an environment that's new to us. The history of humanity has taught us how to exist in an environment where there is a central authority. We now exist in a society that is seeing a shift from central authorities to individuals like never before. The problem with this is we don't know how to deal with or talk about such an environment. When we try to figure out what's happening with security we use analogies that don't work. We talk about banks (just like this post) or cars or doors or windows or boats.

The reality though is we don't really know what this means. We now exist in an environment where everything is becoming distributed, even security. The days of having a security group that rules with an iron fist are gone. If you have an iron fist, you end up with a massive shadow IT problem. In a world based on distributed responsibility the group with the iron fist becomes irrelevant.

The point of bringing up Ethereum wasn't to pick on its problems. It's to point out that we should watch them closely. Regardless of how this problem is solved there will be lessons learned. Success can be as good as a mistake if you understand what happened and why. The face of security is changing and a lot of us don't understand what's happening. There are no analogies that work here, we need new analogies and stories. Right now one of the easiest to understand stories around distributed security is cryptocurrency. Even if you're not bitcoin rich, you should be paying attention, there are lessons to be learned.

Keystone Auth Entry Points

OpenStack libraries now use authentication plugins from the keystoneauth1 library. One of those plugins has disappeared: Kerberos. It used to be in the python-keystoneclient-kerberos package, but that is not shipped with Mitaka. What happened?

To list the registered entry points on a CentOS-based system, you can first look in the entry_points.txt file:

cat /usr/lib/python2.7/site-packages/keystoneauth1-2.4.1-py2.7.egg-info/entry_points.txt
[keystoneauth1.plugin]
v2token = keystoneauth1.loading._plugins.identity.v2:Token
admin_token = keystoneauth1.loading._plugins.admin_token:AdminToken
v3oidcauthcode = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAuthorizationCode
v2password = keystoneauth1.loading._plugins.identity.v2:Password
v3password = keystoneauth1.loading._plugins.identity.v3:Password
v3oidcpassword = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectPassword
token = keystoneauth1.loading._plugins.identity.generic:Token
v3token = keystoneauth1.loading._plugins.identity.v3:Token
password = keystoneauth1.loading._plugins.identity.generic:Password

But are there others?

Looking in the source repo, we can see a reference to Kerberos (as well as SAML, which has also gone missing) before the enumeration of the entry points we saw above.

[extras]
kerberos =
  requests-kerberos>=0.6:python_version=='2.7' or python_version=='2.6' # MIT
saml2 =
  lxml>=2.3 # BSD
oauth1 =
  oauthlib>=0.6 # BSD
betamax =
  betamax>=0.7.0 # Apache-2.0
  fixtures>=3.0.0 # Apache-2.0/BSD
  mock>=2.0 # BSD

[entry_points]

keystoneauth1.plugin =
    password = keystoneauth1.loading._plugins.identity.generic:Password
    token = keystoneauth1.loading._plugins.identity.generic:Token
    admin_token = keystoneauth1.loading._plugins.admin_token:AdminToken
    v2password = keystoneauth1.loading._plugins.identity.v2:Password
    v2token = keystoneauth1.loading._plugins.identity.v2:Token
    v3password = keystoneauth1.loading._plugins.identity.v3:Password
    v3token = keystoneauth1.loading._plugins.identity.v3:Token
    v3oidcpassword = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectPassword
    v3oidcauthcode = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAuthorizationCode
    v3oidcaccesstoken = keystoneauth1.loading._plugins.identity.v3:OpenIDConnectAccessToken
    v3oauth1 = keystoneauth1.extras.oauth1._loading:V3OAuth1
    v3kerberos = keystoneauth1.extras.kerberos._loading:Kerberos
    v3totp = keystoneauth1.loading._plugins.identity.v3:TOTP

We see that the Kerberos plugin requires requests-kerberos>=0.6, so let’s get that installed via

sudo yum install python-requests-kerberos

And then try to enumerate the entry points via python

>>> import pkg_resources
>>> named_objects = {}
>>> for ep in pkg_resources.iter_entry_points(group='keystoneauth1.plugin'):
...     named_objects.update({ep.name: ep.load()})
... 
>>> print (named_objects)
{'v2token': <class 'keystoneauth1.loading._plugins.identity.v2.Token'>, 'token': <class 'keystoneauth1.loading._plugins.identity.generic.Token'>, 'admin_token': <class 'keystoneauth1.loading._plugins.admin_token.AdminToken'>, 'v3oidcauthcode': <class 'keystoneauth1.loading._plugins.identity.v3.OpenIDConnectAuthorizationCode'>, 'v3token': <class 'keystoneauth1.loading._plugins.identity.v3.Token'>, 'v2password': <class 'keystoneauth1.loading._plugins.identity.v2.Password'>, 'password': <class 'keystoneauth1.loading._plugins.identity.generic.Password'>, 'v3password': <class 'keystoneauth1.loading._plugins.identity.v3.Password'>, 'v3oidcpassword': <class 'keystoneauth1.loading._plugins.identity.v3.OpenIDConnectPassword'>}

We still don’t have the Kerberos plugin. Comparing against the installed entry_points.txt above, we see the entry point for the Kerberos plugin is not listed there. Kerberos is implemented here in the source tree. Does that exist in our package-managed file system?

$ rpm --query --list python2-keystoneauth1-2.4.1-1.el7.noarch | grep kerberos.py$
/usr/lib/python2.7/site-packages/keystoneauth1/extras/kerberos.py

Yes. It does. Can we load that by class?

>>> from keystoneauth1.extras import kerberos
>>> print kerberos
<module 'keystoneauth1.extras.kerberos' from '/usr/lib/python2.7/site-packages/keystoneauth1/extras/kerberos.pyc'>

Yes, although the RPM version is a little earlier than the git repo. So what is the entry point name? There is not one, yet. The only way to get the class is by the full class name.

We’ll fix this, but the tools for enumerating the entry points are something I’ve used often enough that I want to get them documented.
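
For the record, the same enumeration can be done as a shell one-liner, which is handy when poking at a fresh system; this is just the pkg_resources snippet above collapsed into one command:

python -c "import pkg_resources; print([ep.name for ep in pkg_resources.iter_entry_points('keystoneauth1.plugin')])"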

June 17, 2016

The difference between auth_uri and auth_url in auth_token

Dramatis Personae:

Adam Young, Jamie Lennox: Keystone core.

Scene: #openstack-keystone chat room.

ayoung: I still don’t understand the difference between url and uri
jamielennox: auth_uri ends up in “WWW-Authenticate: Keystone uri=%s” header. that’s its only job
ayoung: and what is that meant to do? tell someone where they need to go to authenticate?
jamielennox: yea, it gets added to all 401 responses and then i’m pretty sure everyone ignores it
ayoung: so they should be the same thing, then, right? I mean, we say that the Keystone server that you authenticate against is the one that nova is going to use to validate the token. and the version should match
jamielennox: depends, most people use an internal URL for auth_url but auth_uri would get exposed to the public
ayoung: ah
jamielennox: there should be no version in auth_uri
ayoung: so auth_uri=main auth_url=admin in v2.0 speak
jamielennox: yea. more or less. ideally we could default it way better than that, like auth.get_endpoint(‘identity’, interface=’public’), but that gets funny
ayoung: This should be a blog post. You want to write it or shall I? I’m basically just going to edit this conversation.
jamielennox: mm, blog, i haven’t written one of those for a while

(scene)
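
To make the distinction concrete, here is roughly what the two options look like in a service’s [keystone_authtoken] section. The host names below are made up; the point is that auth_uri is the public, unversioned URL advertised to clients, while auth_url is the endpoint the service itself uses to talk to Keystone, which is often internal.

[keystone_authtoken]
# advertised to clients in the WWW-Authenticate header on 401s; public, no version suffix
auth_uri = https://keystone.example.com:5000/
# used by the service itself to validate tokens; may be an internal endpoint
auth_url = https://keystone-int.example.com:35357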

June 16, 2016

Learning about the Overcloud Deploy Process

The process of deploying the overcloud goes through several technologies. Here’s what I’ve learned about tracing it.

I am not a Heat or Tripleo developer. I’ve just started working with Tripleo, and I’m trying to understand this based on what I can gather, and the documentation out there. And also from the little bit of experience I’ve had working with Tripleo. Anything I say here might be wrong. If someone that knows better can point out my errors, please do so.

[UPDATE]: Steve Hardy has corrected many points, and his comments have been noted inline.

To kick the whole thing off in the simplest case, you would run the command openstack overcloud deploy.
(Diagram: VM config changes via Heat)

Roughly speaking, here is the sequence (as best as I can tell)

  1.  User types  openstack overcloud deploy on the command line
  2. This calls up the common cli, which parses the command, and matches the tripleo client with the overcloud deploy subcommand.
  3. tripleo client is a thin wrapper around the Heat client, and calls the equivalent of heat stack-create overcloud
  4. python-heatclient (after Keystone token stuff) calls the Heat API server with the URL and data to do a stack create
  5. Heat makes the appropriate calls to Nova (running the Ironic driver) to activate a baremetal node and deploy the appropriate instance on it.
  6. Before the node is up and running, Heat has posted Hiera data to the metadata server.
  7. The newly provisioned machine will run cloud-init which in turn runs os-collect-config.
    [update] Steve Hardy’s response:

    This isn’t strictly accurate – cloud-init is used to deliver some data that os-collect-config consumes (via the heat-local collector), but cloud-init isn’t involved with actually running os-collect-config (it’s just configured to start in the image).

  8. os-collect-config will start polling for changes to the metadata.
  9. os-collect-config will start calling Puppet apply based on the hiera data. [UPDATE: it runs os-refresh-config only, which then invokes a script that runs puppet.]
    Steve’s note:

    os-collect-config never runs puppet, it runs os-refresh-config only, which then invokes a script that runs puppet.

  10. The Keystone Puppet module will set values in the Keystone config file, httpd/conf.d files, and perform other configuration work.

Here is a diagram of how os-collect-config is designed

When a controller image is built for Tripleo, some portion of the Hiera data is stored in /etc/puppet/. There is a file /etc/puppet/hiera.yaml (which looks a lot like /etc/hiera.yaml, an RPM-controlled file) and sub-files in /etc/puppet/hieradata such as
/etc/puppet/hieradata/heat_config_ControllerOvercloudServicesDeployment_Step4.json

UPDATE: Response from Steve Hardy

This is kind-of correct – we wait for the server to become ACTIVE, which means the OS::Nova::Server resource is declared CREATE_COMPLETE. Then we do some network configuration, and *then* we post the hieradata via a heat software deployment.

So, we post the hieradata to the heat metadata API only after the node is up and running, and has it’s network configured (not before).

https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/controller.yaml#L610

Note the depends_on – we use that to control the ordering of configuration performed via heat.

However, the dynamic data seems to be stored in /var/lib/os-collect-config/

$ ls -la  /var/lib/os-collect-config/*json
-rw-------. 1 root root   2929 Jun 16 02:55 /var/lib/os-collect-config/ControllerAllNodesDeployment.json
-rw-------. 1 root root    187 Jun 16 02:55 /var/lib/os-collect-config/ControllerBootstrapNodeDeployment.json
-rw-------. 1 root root   1608 Jun 16 02:55 /var/lib/os-collect-config/ControllerCephDeployment.json
-rw-------. 1 root root    435 Jun 16 02:55 /var/lib/os-collect-config/ControllerClusterDeployment.json
-rw-------. 1 root root  36481 Jun 16 02:55 /var/lib/os-collect-config/ControllerDeployment.json
-rw-------. 1 root root    242 Jun 16 02:55 /var/lib/os-collect-config/ControllerSwiftDeployment.json
-rw-------. 1 root root   1071 Jun 16 02:55 /var/lib/os-collect-config/ec2.json
-rw-------. 1 root root    388 Jun 15 18:38 /var/lib/os-collect-config/heat_local.json
-rw-------. 1 root root   1325 Jun 16 02:55 /var/lib/os-collect-config/NetworkDeployment.json
-rw-------. 1 root root    557 Jun 15 19:56 /var/lib/os-collect-config/os_config_files.json
-rw-------. 1 root root 263313 Jun 16 02:55 /var/lib/os-collect-config/request.json
-rw-------. 1 root root   1187 Jun 16 02:55 /var/lib/os-collect-config/VipDeployment.json

For each of these files there are two older copies that end in .last and .orig as well.

In my previous post, I wrote about setting Keystone configuration options such as ‘identity/domain_specific_drivers_enabled’: value => ‘True’;. I can see this value set in /var/lib/os-collect-config/request.json with a large block keyed “config”.
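
A quick way to confirm which of the collected files carries a given setting is to grep the whole directory; the option name here is just the one I happened to set, so substitute whatever you are chasing:

sudo grep -l domain_specific_drivers_enabled /var/lib/os-collect-config/*.json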

When I ran the openstack overcloud deploy, one way that I was able to track what was happening on the node was to tail the journal like this:

 sudo journalctl -f | grep collect-config

Looking through the journal output, I can see the line that triggered the change:

... /Stage[main]/Main/Keystone_config[identity/domain_specific_drivers_enabled]/ensure: ...

June 15, 2016

Custom Overcloud Deploys

I’ve been using Tripleo Quickstart. I need custom deploys. Start with modifying the Heat templates. I’m doing a Mitaka deploy.

git clone https://github.com/openstack/tripleo-heat-templates.git
cd tripleo-heat-templates/
git branch --track mitaka origin/stable/mitaka
git checkout mitaka
diff -r  /usr/share/openstack-tripleo-heat-templates/ tripleo-heat-templates/

Mine shows some differences, but only in the file extraconfig/tasks/liberty_to_mitaka_aodh_upgrade_2.pp, which should be OK. The commit is

Add redis constraint to aodh upgrade manifest

Modify the launch script in /home/stack

$ diff overcloud-deploy.sh.orig overcloud-deploy.sh
48c48
< openstack overcloud deploy --templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 60 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server pool.ntp.org \
---
> openstack overcloud deploy --templates  /home/stack/tripleo-heat-templates --libvirt-type qemu --control-flavor oooq_control --compute-flavor oooq_compute --ceph-storage-flavor oooq_ceph --timeout 60 -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml -e $HOME/network-environment.yaml --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server pool.ntp.org \

The only change should be from

--templates  #(followed by another flag which means that --templates takes the default) 

to

--templates /home/stack/tripleo-heat-templates 

OK…let’s make sure we still have a stable system. First, tear down the overcloud deliberately:

[stack@undercloud ~]$ . ./stackrc 
[stack@undercloud ~]$ heat stack-delete overcloud
Are you sure you want to delete this stack(s) [y/N]? y
+--------------------------------------+------------+-----------------+---------------------+--------------+
| id                                   | stack_name | stack_status    | creation_time       | updated_time |
+--------------------------------------+------------+-----------------+---------------------+--------------+
| 00d81e5b-c2f9-4f6a-81e8-b135fadba921 | overcloud  | CREATE_COMPLETE | 2016-06-15T18:01:25 | None         |
+--------------------------------------+------------+-----------------+---------------------+--------------+

Wait until the delete is complete with:

$ watch heat stack-list

Wait until it changes from

+--------------------------------------+------------+--------------------+---------------------+--------------+
| id                                   | stack_name | stack_status       | creation_time       | updated_time |
+--------------------------------------+------------+--------------------+---------------------+--------------+
| 00d81e5b-c2f9-4f6a-81e8-b135fadba921 | overcloud  | DELETE_IN_PROGRESS | 2016-06-15T18:01:25 | None         |
+--------------------------------------+------------+--------------------+---------------------+--------------+

To

+----+------------+--------------+---------------+--------------+
| id | stack_name | stack_status | creation_time | updated_time |
+----+------------+--------------+---------------+--------------+
+----+------------+--------------+---------------+--------------+

And now run the modified overcloud deploy:

./overcloud-deploy.sh

The end of the output looks like this:

Stack overcloud CREATE_COMPLETE
/home/stack/.ssh/known_hosts updated.
Original contents retained as /home/stack/.ssh/known_hosts.old
PKI initialization in init-keystone is deprecated and will be removed.
Warning: Permanently added '192.0.2.9' (ECDSA) to the list of known hosts.
The following cert files already exist, use --rebuild to remove the existing files before regenerating:
/etc/keystone/ssl/certs/ca.pem already exists
/etc/keystone/ssl/private/signing_key.pem already exists
/etc/keystone/ssl/certs/signing_cert.pem already exists
Connection to 192.0.2.9 closed.
Skipping "horizon" postconfig because it wasn't found in the endpoint map output
Overcloud Endpoint: http://10.0.0.4:5000/v2.0
Overcloud Deployed
+ heat stack-list
+ grep -q CREATE_FAILED
+ exit 0

Don’t be fooled by the last line grep -q CREATE_FAILED as that is the shell script execution logging, not a statement of failure.

OK, to do a proper “Hello, World” here, I’d really like to be able to effect a change on the deployment. I’m going to try to set a couple of Keystone config values that are not set (yet) in /etc/keystone/keystone.conf.

In my undercloud git repo for tripleo-heat-templates I make changes to the Overcloud post config.

$ git diff
diff --git a/puppet/manifests/overcloud_controller.pp b/puppet/manifests/overcloud_controller.pp
index c353ec0..c6385d4 100644
--- a/puppet/manifests/overcloud_controller.pp
+++ b/puppet/manifests/overcloud_controller.pp
@@ -223,6 +223,11 @@ if hiera('step') >= 3 {
 
   #TODO: need a cleanup-keystone-tokens.sh solution here
 
+  keystone_config {  
+   'identity/domain_specific_drivers_enabled': value => 'True';  
+   'identity/domain_config_dir': value => '/etc/keystone/domains';  
+  }  
+
   file { [ '/etc/keystone/ssl', '/etc/keystone/ssl/certs', '/etc/keystone/ssl/private' ]:
     ensure  => 'directory',
     owner   => 'keystone',

And rerun

./overcloud-deploy.sh

Once it has successfully deployed, I can check to see if the change shows up in the keystone.conf file.

$ . ./stackrc 
[stack@undercloud ~]$ openstack server list
+--------------------------------------+-------------------------+--------+---------------------+
| ID                                   | Name                    | Status | Networks            |
+--------------------------------------+-------------------------+--------+---------------------+
| 761a1b61-8bd1-4b85-912b-775e51ad99f3 | overcloud-controller-0  | ACTIVE | ctlplane=192.0.2.11 |
| f123da36-9b05-4fc3-84bb-4af147fa76f7 | overcloud-novacompute-0 | ACTIVE | ctlplane=192.0.2.10 |
+--------------------------------------+-------------------------+--------+---------------------+
[stack@undercloud ~]$ ssh heat-admin@192.0.2.11
$ sudo grep domain_specific /etc/keystone/keystone.conf
#domain_specific_drivers_enabled = false
domain_specific_drivers_enabled = True
# if domain_specific_drivers_enabled is set to true. (string value)
[heat-admin@overcloud-controller-0 ~]$ sudo grep domain_config_dir /etc/keystone/keystone.conf
#domain_config_dir = /etc/keystone/domains
domain_config_dir = /etc/keystone/domains

Changes applied.

June 13, 2016

Ready to form Voltron! Why security is like a giant robot made of lions
Due to various conversations about security this week, Voltron came up in the context of security. This is sort of a strange topic, but it makes sense when we ponder modern day security. If you talk to anyone, there is generally one thing they push as a solution for a problem. This is no different for security technologies. There is always one thing that will fix your problems. In reality this is never the case. Good security is about putting a number of technologies together to create something bigger and better than any one thing can do by itself.

For those of you who don't know what Voltron is, Voltron was a cartoon when I was a kid. There were 5 robot lions that sometime during every show would combine to create one big robot called Voltron. By themselves the lions were pretty awesome, but it always seemed the bad guy would keep getting stronger until the lions couldn't deal with it alone; only by coming together to form a giant robot of pure awesome could they destroy whatever horrible creature was causing problems.

This sounds just like security. Just a firewall will eventually be beaten by your adversaries. Just code reviews won't keep things safe for long (if at all). Just using ASLR is only good for a little while. When we start putting everything together though, things get good.

There are some people who get this, they know that there isn't one thing that's going to fix it all, a lot don't though. It's very common to attend a talk about a new security feature or product. If you talk to a vendor without a doubt whatever they're doing will cure what ails you. How often does anyone talk about how their product, feature, or idea will fit in the big picture? How can two or more things work together to add security? It's pretty uncommon to see anyone talking about how well things work together. It's human nature though. We can usually only do one thing, and why wouldn't you be proud of what you're working on? You want to talk about what you do and what you know.

I'm often guilty of this too. When talking about something like containers I'll focus on selinux, or updates, or trusted content, or seccomp. Rarely is the whole story told. Part of this may be because security technology is usually really complex, it's hard to hold a good view of it all in your head at once. The thing is though, none of those are overly useful by themselves. They're all good and do great things, but it's not until you put everything together that you can see a real difference.

This all makes sense when you think about it. Layers of defense are almost always more effective than a single layer (I know there is a lot of nuance to this, but in general, let's not nitpick). Would you want to rely on only seccomp, or would you rather have seccomp, cgroups, selinux, user namespaces, trusted content, content scanning, and ExecShield? It's a no brainer when you think about it.

How can we start to think about things as a giant evil fighting robot instead of small (but still awesome) lions? It's never easy, it's even harder when you have to expect different groups to share attention and recognition. It's going to be more important in the future though. If we don't take better looks at how things work together it's going to be a lot harder to see real improvements.

What do you think? Let me know: @joshbressers

June 08, 2016

SAML Federated Auth Plugin

SAML is usually thought of as a WebSSO mechanism, but it can be made to work for command line operations if you use the Enhanced Client or Proxy (ECP) profile. When we did the Rippowam demo last year, we were successful in getting an unscoped token by using ECP, but that was not sufficient to perform operations on other services that need a scoped token.

The general approach that we are looking at for Keystone is to always have the user ask for an unscoped token first, and then upgrade that to a scoped token. The scoping process can only be done from unscoped to scoped (configuration option) to prevent elevation of privilege attacks.

The base federation plugin is capable of handling this kind of workflow. Thus, the general approach is to write a protocol specific plugin to get an unscoped token, and then to use common logic in the base class v3.FederatedBaseAuth to convert unscoped to scoped.

I just got [edit: used to say keystone] openstack flavor list to work with ECP and Keycloak. I had to create a new auth plugin to do it:

Created a new entry point in

/usr/lib/python2.7/site-packages/python_keystoneclient-1.7.2-py2.7.egg-info/entry_points.txt

v3fedsaml = keystoneclient.contrib.auth.v3.saml2:FederatedSAML2

Added this to
/usr/lib/python2.7/site-packages/keystoneclient/contrib/auth/v3/saml2.py

class FederatedSAML2(v3.FederatedBaseAuth):
    """Authenticate using SAML via the keystone federation mechanisms.

       Wraps both the unscoped SAML2 Plugin to
       1.  Request an unscoped token
       2.  Use the unscoped token to request a scoped token

    """

    @classmethod
    def get_options(cls):
        options = super(FederatedSAML2, cls).get_options()
        options.extend([
            cfg.StrOpt('identity-provider', help="Identity Provider's name"),
            cfg.StrOpt('protocol', help="SAML2"),
            cfg.StrOpt('identity-provider-url',
                       help="Identity Provider's URL"),
            cfg.StrOpt('user-name', dest='username', help='Username',
                       deprecated_name='username'),
            cfg.StrOpt('password', help='Password')
        ])
        return options

    def __init__(self, auth_url,
                 identity_provider,
                 protocol,
                 identity_provider_url,
                 username, password,
                 **kwargs):
        #protocol = kwargs.pop('protocol')
        super(FederatedSAML2, self).__init__(auth_url, identity_provider, protocol,
                                             **kwargs)
        self._unscoped = Saml2UnscopedToken(auth_url,
                                            identity_provider,
                                            identity_provider_url,
                                            username, password,
                                            **kwargs)


    def get_unscoped_auth_ref(self, session, **kwargs):
        return self._unscoped.get_auth_ref(session, **kwargs)

Updated my keystone RC file:

export OS_AUTH_TYPE=v3fedsaml
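
The plugin’s remaining options follow the usual OS_<OPTION> mapping the client applies to plugin options, so the rest of the RC file ends up looking roughly like the sketch below. The IdP name, URLs, and credentials are placeholders for my Keycloak setup, and the exact variable names are an assumption based on that mapping rather than something copied from a working file.

export OS_AUTH_URL=https://openstack.example.com:5000/v3
export OS_IDENTITY_PROVIDER=keycloak
export OS_IDENTITY_PROVIDER_URL=https://idp.example.com/auth/realms/openstack/protocol/saml
export OS_PROTOCOL=saml2
export OS_USERNAME=ayoung
export OS_PASSWORD=secret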

This is based on RH OSP8, which is the Liberty release. In later releases of OSP, the client libraries are synchronized with later upstream versions, including the gradual replacement of the auth plugins housed in python-keystoneclient with keystoneauth. Thus, there will be a couple of variations on this plugin, including one that may have to live out of tree if we want it for OSP8.

June 07, 2016

IoT Technology: Devices

Discussions of IoT often focus on the technology, so let’s start there. IoT consists of devices, which are the “things” that interact with the physical world and communicate with IoT Back-end systems over a network. There are two types of IoT devices: sensors and actuators.

An IoT system will typically be made of many devices – from dozens to millions – talking to a scaleable Back-end system. This Back-end system often runs in the Cloud. In some cases the IoT devices will talk directly to the Back-end systems. In other cases an additional system called an IoT Gateway will be placed between the devices and the Back-end systems. The IoT Gateway will typically talk to multiple local IoT devices, perform communications protocol conversions, perform local processing, and connect to the Back-end systems over a Ethernet, WiFi, or cellular modem link.

IoT Devices

IoT devices consist of sensors, actuators, and communications. Sensors, as the name implies, read information from the physical world. Examples would be temperature, humidity, barometric pressure, light, weight, CO2, motion, location, Ph level, chemical concentration for many chemicals, distance, voltage, current, images, etc. There are sensors available for an incredible range of information and many more under development. Think of things like a tiny DNA sequencer or a sensor that can detect the presence of the bacteria or virus associated with various diseases – both of these are under development!

Actuators are able to change something in the physical world. Examples would be a light switch, a remotely operated valve, a remotely controlled door lock, a stepper motor, a 3D printer, or the steering, brakes and throttle for a self driving car.

IoT Device Examples

For an idea of the range of low cost IoT compatible sensors take a look at Spark Fun Electronics, a leading source of IoT device technology for prototyping, development, and hobbyists. The sensor section at https://www.sparkfun.com/categories/23 lists over 200 sensors that can be used with Arduino and similar systems. Note that these are basically development and prototyping units – prices in production quantities will be lower.

Some sensors are familiar – temperature is perhaps the most obvious example. But many are more interesting. Consider, for example, the gas sensors: hydrogen, methane, lpg, alcohol, carbon monoxide; all available at prices of $4.95 – $7.95. Combine one of these with an Arduino Pro Mini, available for $9.95, and you can build a targeted chemical sensor for less than $20.00.

What can you do with a $20.00 lpg or carbon monoxide sensor? That is the wrong question. Instead, you should be asking the question “what problems am I facing that could be addressed with a low cost network connected sensor?” The point is that there is an incredible and growing array of inexpensive sensors available. The technology is available – what we need now is the imagination to begin to creatively use ubiquitous sensors, ubiquitous networking, ubiquitous computing, and ubiquitous data.

The application of modern electronics technology to sensors is just beginning to be felt. As in many other areas of IoT, the basic capabilities have been around for years – detecting and measuring the concentration of lpg vapor or carbon monoxide isn’t new. Detecting lpg vapor concentration with a sub $20 networked device that feeds the data directly into a large distributed computing system in a form that is readily manipulated by software is new. And huge!

Lpg and carbon monoxide are just examples. The same technologies are producing sensors for a wide range of chemicals and gasses.

The combination of useful capabilities, low cost, network connection, and integration into complex software applications is a complete revolution. And this revolution is just beginning. What happens to agriculture when we can do a complete soil analysis for each field? What happens if we have nutrient, moisture, light, and temperature information for each ten foot square in a field, updated every 15 minutes over the entire growing season? What happens when we have this information for a 20 year period? What happens when this information is dynamically combined with plant growth monitoring, standard plant growth profiles, weather forecasts and climatic trends?

Going further, what if this data is combined with an active growth management system where application of fertilizer, pesticide, and water is optimized for individual micro-areas within a field? Technology is progressing to the point where we can provide the equivalent of hands-on gardening to commercial fields.

As an example of work going on in this area see the Industrial Internet Consortium Testbed on Precision Crop Management at http://www.iiconsortium.org/precision-crop-management.htm.


June 06, 2016

Is there a future view that isn't a security dystopia?
I recently finished reading the book Ghost Fleet, it's not a bad read if you're into what cyberwar could look like. It's not great though, I won't suggest it as the book of the summer. The biggest thing I keep thinking about is I've yet to really see any sort of book that takes place in the future, with a focus on technology, that isn't a dystopian warning. Ghost Fleet is no different.

My favorite part was how everyone knew the technology was totally pwnt, yet everyone still used it. There were various drones, smart display glasses, AI to control boats, rockets, even a space laser (which every book needs). This reminds me of today to a certain degree. We all use web sites we know will be hacked. We know our identities have been stolen. We know our phones aren't secure. Our TVs record our conversations. You can even get doorbells that can stream you a video feed. We love this technology even though it's either already hacked, or will be soon. We know it and we don't care, we just keep buying broken phones, TVs, blenders, cars, anything that comes with WiFi!

Disregarding the fact that we are probably already living in the dystopian future, it really made me wonder if there are any examples of a future that isn't a security nightmare? You could maybe make the argument that Star Trek is our hopeful future, but that's pretty old these days. And even then, the android took over the ship more times than I'd be comfortable with. I think it's safe to say their security required everyone to be a decent human. If that's our only solution, we're pretty screwed.

Most everything I come across is pretty bleak and I get why. Where is our escape from all the insecure devices we pretend we hate? The only number growing faster than the number of connected devices is the number of security flaws in those devices. There aren't even bad ideas to fix this stuff, there's just nothing. The thing about bad ideas is they can often be fixed. A smart person can take a bad idea and turn it into a good idea. Bad ideas are at least something to build on. I don't see any real ideas to fix these devices. We have nothing to build on. Nothing is dangerous. No matter how many times you improve it, it's still nothing. I have no space laser, so no matter how many ideas I have to make it better, I still won't have a space laser (if anyone has one I could maybe borrow, be sure to let me know).

Back to the idea about future technology. Are there any real examples of a future based heavily on technology that isn't a horrible place? This worries me. One of the best parts about science fiction is getting to dream about a future that's better than the present. Like that computer on the space ship in 2001, that thing was awesome! It had pretty good security too ... sort of.

So here is the question we should all think about. At what point do connected devices get bad enough people stop buying them? We're nowhere near that point today. Will we ever reach that point? Maybe people will just accept the fact that their dishwasher will send spam when it's not running and the toaster will record your kitchen conversations. I really want to live in a nice future, one where our biggest threat is an android that got taken over by a malevolent intelligence, not one where my biggest threat is my doorbell.

Do you know of any non dystopian predictions? Let me know: @joshbressers

June 03, 2016

Fun with bash, or how I wasted an hour trying to debug some SELinux test scripts.
We are working to get SELinux and Overlayfs to work well together.  Currently you can not run docker containers
with SELinux on an Overlayfs back end.  You should see the patches posted to the kernel list within a week.

I have been tasked to write selinuxtestsuite tests to verify overlayfs works correctly with SELinux.
These tests will help people understand what we intended.

One of the requirements for overlayfs/SELinux is to check not only the access of the task process doing some access
but also the label of the processes that originally set up the overlayfs mount.

In order to do the test I created two process types test_overlay_mounter_t and test_overlay_client_t, and then I was using
runcon to execute a bash script in the correct context.  I added code like the following to the test to make sure that the runcon command was working.

# runcon -t test_overlay_mounter_t bash <<EOF
echo "Mounting as $(id -Z)"
...
EOF


The problem was when I ran the tests, I saw the following:

Mounting as unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
...


Sadly it took me an hour to diagnose what was going on.  Writing several test scripts and running commands by hand.  Sometimes it seemed to work and other times it would not.  I thought there was a problem with runcon or with my SELinux policy.  Finally I took a break and came back to the problem realizing that the problem was with bash.  The $(id -Z) was
executed before the runcon command.

Sometimes you feel like an idiot.

runcon -t test_overlay_mounter_t bash <<EOF
echo "Mounting as $(id -Z)"
echo -n "Mounting as "
id -Z
EOF
Mounting as unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Mounting as unconfined_u:unconfined_r:test_overlay_mounter_t:s0-s0:c0.c1023
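
For what it is worth, the standard bash fix here is to quote the here-document delimiter, which suppresses the outer shell's substitution so $(id -Z) actually runs inside the shell that runcon started. A minimal sketch against the same test setup:

runcon -t test_overlay_mounter_t bash <<'EOF'
echo "Mounting as $(id -Z)"
EOF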


My next blog will explain how we expect overlayfs to work with SELinux.

June 01, 2016

The Internet of Things

There are a lot of things to explore around IoT – what it is, technology, system architectures, security, implementation challenges, and many others. We will get to all of those, but a great place to start is how we got here and the implications of IoT. Rather than starting with things, let’s start with what is really important – economics.

Just what is the Internet of Things (IoT)? At the simplest level it is devices that interact with the physical world and communicate over a network. Simple, but with very significant implications. Let’s dig into these implications and see how such a simple concept can have such great impact.

The major drivers of IoT are technology, economics, software, and integration. Individually these are significant. Combined they will have a major impact on all aspects of life. Some of this impact is good, and some may be bad. As with many things, good vs. bad will often depend on the implementation and how it is used.

Is IoT New?

A common question is whether or not IoT is something new and revolutionary or a buzzword for old ideas? The answer is “yes”…

Much of the foundation of IoT has been around for quite a while. SCADA (Supervisory Control And Data Acquisition) systems have been around since the 1950’s, managing electrical power grids, railroads, and factories. Machine communications over telephone lines and microwave links have been around since the 1960’s. Machine control systems, starting on mainframes and minicomputers, have also been around since the 1960’s.

The big changes are economics, software, and integration. Microsensors and SoC (System on a Chip) technology for CPUs and networking are driving the cost of devices down – in some cases by a factor of a thousand! Advances in networking – both networking technology as well as the availability of pervasive networking – are changing the ground rules and economics for machine to machine communication.

The use of standards is greatly easing integration. Advances in software, software frameworks, and development tools, as well as the availability of functional libraries for many tasks, is creating an explosion in innovative IoT products and capabilities.

But the most significant new factor in IoT is economics. Technology, pervasive networking, and cloud computing are driving the cost of IoT down – in many cases by a factor of a thousand or more! New capabilities in sensors and actuators are opening up new areas of application. Cost reductions this large are often more important than new capabilities as they vastly broaden areas of application.

Another massive change is monetization of data. Companies are increasingly aware of the value of the data captured from IoT systems, especially after extensive analysis and datamining.

Further emphasizing the importance of economics are the new business models that are emerging. For example, jet engine companies moved from selling jet engines to selling “thrust hours” – a service-based model of supplying power as a service rather than selling hardware. A key part of this is extensive use of IoT to monitor every aspect of jet engine operation to provide more effective maintenance and support of the engines. As an example, Virgin Atlantic reports that their Boeing 787 aircraft produce 500GB of data per flight.


May 31, 2016

Reviews for RDO packages

We are in the process of getting the docs straightened out for reviewing RDO packages. As we do, I want to record what I have working.

I started with

rdopkg clone openstack-keystone

But that did not give me a repo that was in sync with my Gerrit account. I ended up with a .git/config that looks like this:

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url = http://review.rdoproject.org/r/p/openstack/keystone-distgit.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "rpm-master"]
	remote = rpm-master
	merge = refs/heads/rpm-master
[remote "patches"]
	url = http://review.rdoproject.org/r/p/openstack/keystone.git
	fetch = +refs/heads/*:refs/remotes/patches/*
[remote "upstream"]
	url = git://git.openstack.org/openstack/keystone
	fetch = +refs/heads/*:refs/remotes/upstream/*
[remote "gerrit"]
        url = ssh://admiyo@review.rdoproject.org:29418/openstack/keystone-distgit.git
        fetch = +refs/heads/*:refs/remotes/gerrit/*
[user]
	email = adam@younglogic.com

openstack/keystone-distgit.git is the RPM packaging repo for Keystone.

openstack/keystone is the main keystone code repo.

My github account predates my time at Red Hat, and I’d rather not mess with that account, but the vast majority of the rest of my git work is done as ayoung@redhat.com, so I wanted to make a local user config.

With this setup I could make a review by editing openstack-keystone.logrotate, committing, and running git review.
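
For reference, the round trip looks roughly like this (a sketch; the branch name and commit message here are my own assumptions):

git checkout rpm-master
vi openstack-keystone.logrotate
git commit -a -m "Adjust logrotate configuration"
git review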

May 29, 2016

Regulation can fix security, except you can't regulate security
Every time I start a discussion about how we can solve some of our security problems it seems like the topics of professional organizations and regulation are where things end up. I think regulations and professional organizations can fix a lot of problems in an industry, but I'm not sure they work for security. First let's talk about why regulation usually works, then why it won't work for security.

What is regulation?
You may not know it, but you deal with regulated industries every day. The food we eat, the cars we drive, the buildings we use, the roads, our water, products we buy, phones, internet, banks; there are literally too many to list. The reasons for the regulation vary greatly, but at the end of the day it's a nice way to use laws to protect society. It doesn't always directly protect people, sometimes it protects the government, or maybe even a giant corporation, but the basic idea is because of the regulation society is a better place. There are plenty of corner cases but for now let's just assume the goal is to make the world a better place.

Refrigerator regulation
One of my favorite stories about regulation involves refrigerator doors. A long time ago the door to a refrigerator would lock from the outside. If someone found themselves on the inside with a closed door, they couldn't get out. Given a refrigerator is designed to be air tight, one wouldn't last very long on the inside. The government decided to do something about this and told the companies that made refrigerators there had to be a way to get out if you're stuck inside. Of course this was seen as impossible and it was expected most companies would have to go out of business or stop making refrigerators. Given a substantial percentage of the population now owns refrigerators, it's safe to say that didn't happen. The solution was to use magnets to hold the door shut. Now the thought of using a locking door seems pretty silly especially when the solution was elegant and better in nearly every way.

Can we regulate cybersecurity?
The short answer is no. It can't be done. I do hate claiming something can't be done, someday I might be wrong. I imagine there will be some form of regulation eventually, it probably won't really work though. Let's use the last financial crisis to explain this. The financial industry has a lot of regulation, but it also has a lot of possibility. What I mean by this is the existing regulation mostly covers bad things that were done in the past, it's nearly impossible to really regulate the future due to the nature of regulation. So here's the thing. How many people went to jail from the last financial crisis? Not many. I'd bet in a lot of cases while some people were certainly horrible humans, they weren't breaking any laws. This will be the story of security regulation. We can create rules to dictate what happened in the past, but technology, bad guys, and people move very quickly in this space. If you regulated the industry to prevent a famous breach from a few years ago (there are many to choose from), by now the whole technology landscape has changed so much many of those rules wouldn't even apply today. This gets even crazier when you think about the brand new technology being invented every day.

Modern computer systems are Turing complete
A refrigerator has one door. One door that the industry didn't think they could fix. A modern IT system can do an infinite number of operations. You can't regulate a machine that can literally do anything. This would be like saying the front fridge door can't lock when you have a fridge with infinite area on the inside. If you can't find the door, and there are millions of other doors, some which don't open, it's not a useful regulation.

This is our challenge. We have machines that can literally do anything, and we have to make them secure. If there are infinite operations, there are by definition infinite security problems. I know that's a bit over dramatic, but the numbers are big enough they're basically infinity.

The things that generally come up revolve around having security professionals, or training staff, or getting tools to lock things down, or better defaults. None of these things will hurt, but none really work either. Even if you have the best staff in the world, you have to work with vendors who don't. Even if you have the best policies and tools, your developers and sysadmins will make silly mistakes. Even with the best possible defaults, one little error can undo everything.

What can we do?
I'm not suggesting we should curl up in the corner and weep (I'm also not saying not to). Weeping can be less dangerous than letting the new guy configure the server, but it's not very helpful long term. I'm not suggesting that tools and training and staff are wastes of time and money, they have value to a certain point. It's sort of like taking a CPR course. You can't do brain surgery, but you can possibly save a life in an emergency. The real fix is going to be from technology and process that don't exist yet. Cybersecurity is a new concept that we can't use old models to understand. We need new models, tools, and ideas. They don't exist yet, but they will someday. Go invent them, I'm impatient and don't want to wait.

If you have any ideas, let me know: @joshbressers

May 27, 2016

OpenShift and SSSD Part 3: Extended LDAP Attributes

Overview

This is the third post in a series on setting up advanced authentication mechanisms with OpenShift Origin. This entry will build upon the foundation created earlier, so if you haven’t already gone through that tutorial, start here and continue here.

Configuring Extended LDAP Attributes

Prerequisites

  • SSSD 1.12.0  or later. This is available on Red Hat Enterprise Linux 7.0 and later.
  • mod_lookup_identity 0.9.4 or later. The required version is not yet available on any released version of Red Hat Enterprise Linux 7, but RPMs for this platform are available from upstream at this COPR repository until they arrive in Red Hat Enterprise Linux.

Configuring SSSD

First, we need to ask SSSD to look up attributes in LDAP that it normally doesn’t care about for simple system-login use-cases. In the OpenShift case, there’s really only one such attribute: email. So we need to modify the [domain/DOMAINNAME] section of /etc/sssd/sssd.conf on the authenticating proxy and add this attribute:

[domain/example.com]
...
ldap_user_extra_attrs = mail

Next, we also have to tell SSSD that it’s acceptable for this attribute to be retrieved by apache, so we need to add the following two lines to the [ifp] section of /etc/sssd/sssd.conf as well:

[ifp]
user_attributes = +mail
allowed_uids = apache, root

Now we should be able to restart SSSD and test this configuration.

# systemctl restart sssd.service

# getent passwd <username>
username:*:12345:12345:Example User:/home/username:/usr/bin/bash

# gdbus call \
        --system \
        --dest org.freedesktop.sssd.infopipe \
        --object-path /org/freedesktop/sssd/infopipe/Users/example_2ecom/12345 \
        --method org.freedesktop.DBus.Properties.Get \
        "org.freedesktop.sssd.infopipe.Users.User" "extraAttributes"
(<{'mail': ['username@example.com']}>,)

Configuring Apache

Now that SSSD is set up and successfully serving extended attributes, we need to configure the web server to ask for them and to insert them in the correct places.

First, we need to install and enable the mod_lookup_identity module for Apache (see the note in the “Prerequisites” section for installing on RHEL 7):

# yum -y install mod_lookup_identity

Second, we need to enable the module so that Apache will load it. We need to modify /etc/httpd/conf.modules.d/55-lookup_identity.conf and uncomment the line:

LoadModule lookup_identity_module modules/mod_lookup_identity.so

Next, we need to let SELinux know that it’s acceptable for Apache to connect to SSSD over D-BUS, so we’ll set an SELinux boolean:

# setsebool -P httpd_dbus_sssd on

Then we’ll edit /etc/httpd/conf.d/openshift-proxy.conf and add the following lines inside the <ProxyMatch /oauth/authorize> section (the three Lookup directives are the additions; the other lines carry over from the earlier setup):

  <ProxyMatch /oauth/authorize>
    AuthName openshift

    LookupOutput Headers
    LookupUserAttr mail X-Remote-User-Email
    LookupUserGECOS X-Remote-User-Display-Name

    RequestHeader set X-Remote-User %{REMOTE_USER}s env=REMOTE_USER
 </ProxyMatch>

Then restart Apache to pick up the changes.

# systemctl restart httpd.service

Configuring OpenShift

The proxy is now all set, so it’s time to tell OpenShift where to find these new attributes during login. Edit the /etc/origin/master/master-config.yaml file and add the following lines to the identityProviders section (the emailHeaders and nameHeaders entries are the additions):

  identityProviders:
  - name: sssd
    challenge: true
    login: true
    mappingMethod: claim
    provider:
      apiVersion: v1
      kind: RequestHeaderIdentityProvider
      challengeURL: "https://proxy.example.com/challenging-proxy/oauth/authorize?${query}"
      loginURL: "https://proxy.example.com/login-proxy/oauth/authorize?${query}"
      clientCA: /etc/origin/master/proxy/proxyca.crt
      headers:
      - X-Remote-User
      emailHeaders:
      - X-Remote-User-Email
      nameHeaders:
      - X-Remote-User-Display-Name

Go ahead and launch OpenShift with this updated configuration and log in to the web console as a new user. You should see their full name appear in the upper-right of the screen. You can also verify with oc get identities -o yaml that both email addresses and full names are available.

Debugging Notes

OpenShift currently only saves these attributes to the user at the time of the first login and doesn’t update them again after that. So while you are testing (and only while testing), it’s advisable to run oc delete users,identities --all to clear the identities out so you can log in again.


OpenShift and SSSD Part 2: LDAP Form Authentication

Overview

This is the second post in a series on setting up advanced authentication mechanisms with OpenShift Origin. This entry will build upon the foundation created earlier, so if you haven’t already gone through that tutorial, start here. Note that some of the content on that page has changed since it was first published to ensure that this second part is easier to set up, so make sure to double-check your configuration.

Configuring Form-based Authentication

In this tutorial, I’m going to describe how to set up form-based authentication to use when signing into the OpenShift Origin web console. The first step is to prepare a login page. The OpenShift upstream repositories have a handy template for forms, so we will copy that down to our authenticating proxy on proxy.example.com.

# curl -o /var/www/html/login.html \
    https://raw.githubusercontent.com/openshift/openshift-extras/master/misc/form_auth/login.html

You may edit this login HTML however you prefer, but if you change the form field names, you will need to update those in the configuration below as well.

Next, we need to install another Apache module, this time for intercepting form-based authentication.

# yum -y install mod_intercept_form_submit

Then we need to modify /etc/httpd/conf.modules.d/55-intercept_form_submit.conf and uncomment the LoadModule line.
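
After uncommenting, that line should look something like the following (module name as shipped by the mod_intercept_form_submit package; double-check it against the file on your system):

LoadModule intercept_form_submit_module modules/mod_intercept_form_submit.so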

Next, we’ll add a new section to our openshift-proxy.conf inside the <VirtualHost *:443> block.

  <Location /login-proxy/oauth/authorize>
    # Insert your backend server name/ip here.
    ProxyPass https://openshift.example.com:8443/oauth/authorize

    InterceptFormPAMService openshift
    InterceptFormLogin httpd_username
    InterceptFormPassword httpd_password

    RewriteCond %{REQUEST_METHOD} GET
    RewriteRule ^.*$ /login.html [L]
  </Location>

This tells Apache to listen for POST requests on the /login-proxy/oauth/authorize endpoint and pass the username and password over to the openshift PAM service, just like in the challenging-proxy example in the first entry of this series. This is all we need to do on the Apache side of things, so restart the service and move back over to the OpenShift configuration.
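
As a quick smoke test (my own suggestion, not part of the original walkthrough), a plain GET against that location should come back with the contents of login.html, since the RewriteRule above sends GET requests to the form:

$ curl -k https://proxy.example.com/login-proxy/oauth/authorize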

In the master-config.yaml, update the identityProviders section as follows (the login: true setting and the loginURL entry are the additions):

  identityProviders:
  - name: any_provider_name
    challenge: true
    login: true
    mappingMethod: claim
    provider:
      apiVersion: v1
      kind: RequestHeaderIdentityProvider
      challengeURL: "https://proxy.example.com/challenging-proxy/oauth/authorize?${query}"
      loginURL: "https://proxy.example.com/login-proxy/oauth/authorize?${query}"
      clientCA: /etc/origin/master/proxy/proxyca.crt
      headers:
      - X-Remote-User

Now restart OpenShift with the updated configuration. You should be able to browse to https://openshift.example.com:8443 and use your LDAP credentials at the login form to sign in.


May 26, 2016

OpenShift and SSSD Part 1: Basic LDAP Authentication

Overview

OpenShift provides a fairly simple and straightforward authentication provider for use with LDAP setups. It has one major limitation, however: it can only connect to a single LDAP server. This can be problematic if that LDAP server becomes unavailable for any reason. When this happens, end-users get very unhappy.

Enter SSSD. Originally designed to manage local and remote authentication to the host OS, it can now be configured to provide identity, authentication and authorization services to web services like OpenShift as well. It provides a multitude of advantages over the built-in LDAP provider; in particular it has the ability to connect to any number of failover LDAP servers as well as to cache authentication attempts in case it can no longer reach any of those servers.

These advantages don’t come without a cost, of course: the setup of this configuration is somewhat more advanced, so I’m writing up this guide to help you get it set up. Rather than adding a few lines to the master-config.yml in OpenShift and calling it a day, we are going to need to set up a separate authentication server that OpenShift will talk to. This guide will describe how to do it on a dedicated physical or virtual machine, but the concepts should also be applicable to loading up such a setup in a container as well. (And in the future, I will be looking into whether we could build such a static container right into OpenShift, but for now this document will have to suffice.) For this guide, I will use the term VM to refer to either type of machine, simply because it’s shorter to type and read.

This separate authentication server will be called the “authenticating proxy” from here on out. It is a specialized httpd server that will handle the authentication challenge and return the results to the OpenShift server. See the OpenShift documentation for security considerations around the use of an authenticating proxy.

Formatting Notes

  • If you see something in italics within a source-code block below, you should replace it with the appropriate value for your environment.
  • Source-code blocks with a leading ‘#’ character indicate a command that must be executed as the “root” user, either by logging in as root or using the sudo command.
  • Source-code blocks with a leading ‘$’ character indicate a command that may be executed by any user (privileged or otherwise). These commands are generally for testing purposes.

Prerequisites

You will need to know the following information about your LDAP server to follow the directions below:

  • Is the directory server powered by FreeIPA, Active Directory or another LDAP solution?
  • What is the URI for the LDAP server? e.g. ldap.example.com
  • Where is the CA certificate for the LDAP server?
  • Does the LDAP server use RFC 2307 or RFC 2307bis for user groups?

Prepare VMs:

  • proxy.example.com: A VM to use as the authenticating proxy. This machine must have at least SSSD 1.12.0 available, which means a fairly recent operating system. In these examples, I will be using a clean install of Red Hat Enterprise Linux 7.2 Server.
  • openshift.example.com: A VM to use to run OpenShift

(These machines *can* be configured to run on the same system, but for the purposes of this tutorial, I am keeping them separate.)

Phase 1: Certificate Generation

In order to ensure that communication between the authenticating proxy and OpenShift is trustworthy, we need to create a set of TLS certificates that we will use during the other phases of this setup. For the purposes of this demo, we will start by using the auto-generated certificates created as part of running

# openshift start \
    --public-master=https://openshift.example.com:8443 \
    --write-config=/etc/origin/

Among other things, this will generate /etc/origin/master/ca.{crt,key}. We will use this signing certificate to generate keys to use on the authenticating proxy.

# mkdir -p /etc/origin/proxy/
# oadm ca create-server-cert \
    --cert='/etc/origin/proxy/proxy.example.com.crt' \
    --key='/etc/origin/proxy/proxy.example.com.key' \
    --hostnames=proxy.example.com,1.2.3.4 \
    --signer-cert=/etc/origin/master/ca.crt \
    --signer-key='/etc/origin/master/ca.key' \
    --signer-serial='/etc/origin/master/ca.serial.txt'

For the hostnames, ensure that any hostnames and interface IP addresses that might need to access the proxy are listed, otherwise the HTTPS connection will fail.
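
To double-check that everything made it into the certificate, the subject alternative names can be inspected with openssl (the same invocation the Apache configuration later references in a comment):

# openssl x509 -text -noout -in /etc/origin/proxy/proxy.example.com.crt \
      | grep -A1 'Subject Alternative Name'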

Next, we will generate the API client certificate that the authenticating proxy will use to prove its identity to OpenShift (this is necessary so that malicious users cannot impersonate the proxy and send fake identities). First, we will create a new CA to sign this client certificate.

# oadm ca create-signer-cert \
  --cert='/etc/origin/proxy/proxyca.crt' \
  --key='/etc/origin/proxy/proxyca.key' \
  --name="openshift-proxy-signer@$(date +%s)" \
  --serial='/etc/origin/proxy/proxyca.serial.txt'

(The date +%s in that block is used to make the signer name unique. You can use any name you prefer, however.)

# oadm create-api-client-config \
    --certificate-authority='/etc/origin/proxy/proxyca.crt' \
    --client-dir='/etc/origin/proxy' \
    --signer-cert='/etc/origin/proxy/proxyca.crt' \
    --signer-key='/etc/origin/proxy/proxyca.key' \
    --signer-serial='/etc/origin/proxy/proxyca.serial.txt' \
    --user='system:proxy'
# cat /etc/origin/proxy/system\:proxy.crt \
      /etc/origin/proxy/system\:proxy.key \
      > /etc/origin/proxy/authproxy.pem

Phase 2: Authenticating Proxy Setup

Step 1: Copy certificates

From openshift.example.com, securely copy the necessary certificates to the proxy machine:

# scp /etc/origin/master/ca.crt \
      root@proxy.example.com:/etc/pki/CA/certs/

# scp /etc/origin/proxy/proxy.example.com.crt \
      /etc/origin/proxy/authproxy.pem \
      root@proxy.example.com:/etc/pki/tls/certs/

# scp /etc/origin/proxy/proxy.example.com.key \
      root@proxy.example.com:/etc/pki/tls/private/

Step 2: SSSD Configuration

Install a new VM with a recent operating system (in order to use the mod_identity_lookup module later, it will need to be running SSSD 1.12.0 or later). In these examples, I will be using a clean install of Red Hat Enterprise Linux 7.2 Server.

First thing is to install all of the necessary dependencies:

# yum install -y sssd \
                 sssd-dbus \
                 realmd \
                 httpd \
                 mod_session \
                 mod_ssl \
                 mod_authnz_pam

This will give us the SSSD and the web server components we will need. The first step here will be to set up SSSD to authenticate this VM against the LDAP server. If the LDAP server in question is a FreeIPA or Active Directory environment, then realmd can be used to join this machine to the domain. This is the easiest way to get up and running.

# realm join ldap.example.com

If you aren’t running a domain, then your best option is to use the authconfig tool (or follow the many other tutorials on the internet for configuring SSSD for identity and authentication).

# authconfig --update --enablesssd --enablesssdauth \
             --ldapserver=ldap.example.com \
             --enableldaptls \
             --ldaploadcacert=http://ldap.example.com/ca.crt

This should create /etc/sssd/sssd.conf with most of the appropriate settings. (Note: RHEL 7 appears to have a bug wherein authconfig does not create the /etc/openldap/cacerts directory, so you may need to create it manually before running the above command.)

If you are interested in using SSSD to manage failover situations for LDAP, this can be configured simply by adding additional entries in /etc/sssd/sssd.conf on the ldap_uri line. Systems enrolled with FreeIPA will automatically handle failover using DNS SRV records.
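
As a sketch (hostnames made up), ldap_uri accepts a comma-separated list and SSSD will fail over through it in order:

[domain/example.com]
...
ldap_uri = ldap://ldap1.example.com, ldap://ldap2.example.com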

Finally, restart SSSD to make sure that all of the changes are applied properly:

$ systemctl restart sssd.service

Now, test that the user information can be retrieved properly:

$ getent passwd <username>
username:*:12345:12345:Example User:/home/username:/usr/bin/bash

At this point, it is wise to attempt to log into the VM as an LDAP user and confirm that the authentication is properly set up. This can be done via the local console or a remote service such as SSH. (Later, you can modify your /etc/pam.d files to disallow this access if you prefer.) If this fails, consult the SSSD troubleshooting guide.

Step 3: Apache Configuration

Now that we have the authentication pieces in place, we need to set up Apache to talk to SSSD. First, we will create a PAM stack file for use with Apache. Create the /etc/pam.d/openshift file and add the following contents:

auth required pam_sss.so
account required pam_sss.so

This will tell PAM (the pluggable authentication module) that when an authentication request is issued for the “openshift” stack, it should use pam_sss.so to determine authentication and access-control.
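
If the pamtester utility happens to be installed (it is a separate package and not required for this setup, so treat this as an optional check), the new stack can be exercised directly before involving Apache:

# pamtester openshift <username> authenticate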

Next we will configure the Apache httpd.conf. (Taken from the OpenShift documentation and modified for SSSD.) For this tutorial, we’re only going to set up the challenge authentication (useful for logging in with oc login and similar automated tools). A future entry in this series will describe setup to use the web console.

First, create the new file openshift-proxy.conf in /etc/httpd/conf.d (substituting the correct hostnames where indicated):

LoadModule request_module modules/mod_request.so
LoadModule lookup_identity_module modules/mod_lookup_identity.so
# Nothing needs to be served over HTTP.  This virtual host simply redirects to
# HTTPS.
<VirtualHost *:80>
  DocumentRoot /var/www/html
  RewriteEngine              On
  RewriteRule     ^(.*)$     https://%{HTTP_HOST}$1 [R,L]
</VirtualHost>

<VirtualHost *:443>
  # This needs to match the certificates you generated.  See the CN and X509v3
  # Subject Alternative Name in the output of:
  # openssl x509 -text -in /etc/pki/tls/certs/proxy.example.com.crt
  ServerName proxy.example.com

  DocumentRoot /var/www/html
  SSLEngine on
  SSLCertificateFile /etc/pki/tls/certs/proxy.example.com.crt
  SSLCertificateKeyFile /etc/pki/tls/private/proxy.example.com.key
  SSLCACertificateFile /etc/pki/CA/certs/ca.crt

  # Send logs to a specific location to make them easier to find
  ErrorLog logs/proxy_error_log
  TransferLog logs/proxy_access_log
  LogLevel warn
  SSLProxyEngine on
  SSLProxyCACertificateFile /etc/pki/CA/certs/ca.crt
  # It's critical to enforce client certificates on the Master.  Otherwise
  # requests could spoof the X-Remote-User header by accessing the Master's
  # /oauth/authorize endpoint directly.
  SSLProxyMachineCertificateFile /etc/pki/tls/certs/authproxy.pem

  # Send all requests to the console
  RewriteEngine              On
  RewriteRule     ^/console(.*)$     https://%{HTTP_HOST}:8443/console$1 [R,L]

  # In order to use the challenging-proxy an X-Csrf-Token must be present.
  RewriteCond %{REQUEST_URI} ^/challenging-proxy
  RewriteCond %{HTTP:X-Csrf-Token} ^$ [NC]
  RewriteRule ^.* - [F,L]

  <Location /challenging-proxy/oauth/authorize>
    # Insert your backend server name/ip here.
    ProxyPass https://openshift.example.com:8443/oauth/authorize
    AuthType Basic
    AuthBasicProvider PAM
    AuthPAMService openshift
    Require valid-user
  </Location>

  <ProxyMatch /oauth/authorize>
    AuthName openshift
    RequestHeader set X-Remote-User %{REMOTE_USER}s
  </ProxyMatch>
</VirtualHost>

RequestHeader unset X-Remote-User


Then we need to tell SELinux that it’s acceptable for Apache to contact the PAM subsystem, so we set a boolean:

# setsebool -P allow_httpd_mod_auth_pam on

At this point, we can start up Apache.

# systemctl start httpd.service

Phase 3: OpenShift Configuration

This describes how to set up an OpenShift server from scratch in an “all in one” configuration. For more complicated (and interesting) setups, consult the official OpenShift documentation.

First, we need to modify the default configuration to use the new identity provider we just created. We’ll start by modifying the /etc/origin/master/master-config.yaml file. Scan through it and locate the identityProviders section and replace it with:

  identityProviders:
  - name: any_provider_name
    challenge: true
    login: false
    mappingMethod: claim
    provider:
      apiVersion: v1
      kind: RequestHeaderIdentityProvider
      challengeURL: "https://proxy.example.com/challenging-proxy/oauth/authorize?${query}"
      clientCA: /etc/origin/master/proxy/proxyca.crt
      headers:
      - X-Remote-User

Now we can start openshift with the updated configuration:

# openshift start \
    --public-master=https://openshift.example.com:8443 \
    --master-config=/etc/origin/master/master-config.yaml \
    --node-config=/etc/origin/node-node1.example.com/node-config.yaml

Now you can test logins with

oc login https://openshift.example.com:8443

It should now be possible to log in with only valid LDAP credentials. Stay tuned for further entries in this series where I will teach you how to set up a “login” provider for authenticating the web console, how to retrieve extended user attributes like email address and full name from LDAP, and also how to set up automatic single-sign-on for users in a FreeIPA or Active Directory domain.

 Updates 2016-05-27: There were some mistakes in the httpd.conf as originally written that made it difficult to set up Part 2. They have been retroactively corrected. Additionally, I’ve moved the incomplete configuration of extended attributes out of this entry and will reintroduce them in a further entry in this series.

May 23, 2016

Thoughts on our security bubble
Last week I spent time with a lot of normal people. Well, they were all computer folks, but not the sort one would find in a typical security circle. It really got me thinking about the bubble we live in as security people.

There are a lot of things we take for granted. I can reference Dunning Kruger and "turtles all the way down" and not have to explain myself. If I talk about a buffer overflow, or most any security term I never have to explain what's going on. Even some of the more obscure technologies like container scanners and SCAP don't need but a few words to explain what happens. It's easy to talk to security people, at least it's easy for security people to talk to other security people.

Sometimes it's good to get out of your comfort zone though. Last week I spent a lot of the week well outside groups I was comfortable with. It's a good thing for us to do this though. I really do think this is a big problem the security universe suffers from. There are a lot of us who don't really get out there and see what it's really like. I know I always assume everyone else knows a lot about security. They don't know a lot about security. They usually don't even know a little about security. This puts us in a place where we think everyone else is dumb, and they think we're idiots. Do you listen to someone who appears to be a smug jerk? Of course not, nobody does. This is one of the reasons it can be hard to get our messages across.

If we want people to listen to us, they have to trust us. If we want people to trust us, we have to make them understand us. If we want people to understand us, we have to understand them first. That bit of circular Yoda logic sounds insane, but it really is true. There's nothing worse than trying to help someone only to have them ignore you, or worse, do the opposite because they can.

So here's what I want to do. I have some homework for you, assuming you made it this far, which you probably did if you're reading this. Go talk to some non security people. Don't try to educate them on anything, just listen to what they have to say, even if they're wrong, especially if they're wrong, don't correct them. Just listen. Listen and see what you can learn. I bet it will be something amazing.

Let me know what you learn: @joshbressers

May 19, 2016

Reproducing an Open vSwitch Bridge Configuration

In the previous post, I described the setup for installing FreeIPA on a VM parallel to the undercloud VM set up by Tripleo Quickstart. The network on the undercloud VM has been set up by Ironic and Neutron to listen on a network defined for the overcloud. I want to reproduce this on a second machine that is not enrolled in the undercloud. How can I reproduce the steps?

UPDATE:

This is far more complex than necessary. All I needed to do was:

sudo ip addr add 192.0.2.29/24 dev eth1
sudo ip link set eth1 up

To get connectivity, and persist that info in /etc/sysconfig/network-scripts/ifcfg-eth1
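
For that simpler approach, the file would contain something along these lines (a sketch; note this is different from the OVS-based ifcfg-eth1 used later in this post):

DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.0.2.29
NETMASK=255.255.255.0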

But the OVS “cloning” here is still interesting enough to warrant its own post.

END UPDATE

Using Tripleo Quickstart, I see that the interface I need is created with:

sudo bash -c 'cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-vlan10
DEVICE=vlan10
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSIntPort
BOOTPROTO=static
IPADDR=10.0.0.1
NETMASK=255.255.255.0
OVS_BRIDGE=br-ctlplane
OVS_OPTIONS="tag=10"
EOF'

sudo ifup ifcfg-vlan10

But my VM does not have an OVS_BRIDGE br-ctlplane defined. How do I create that?

Using the ovs commands, I can look at the bridge definition:

$ sudo ovs-vsctl show
84640eab-ba50-452b-bfe9-615d7b254972
    Bridge br-ctlplane
        Port "vlan10"
            tag: 10
            Interface "vlan10"
                type: internal
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port "eth1"
            Interface "eth1"
    Bridge br-int
        fail_mode: secure
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
        Port br-int
            Interface br-int
                type: internal
        Port "tapacff1724-9f"
            tag: 1
            Interface "tapacff1724-9f"
                type: internal
    ovs_version: "2.5.0"

And that does not exist on the new VM. I’ve been able to deduce that the creation of this bridge happened as a side effect of running

openstack undercloud install

Since I don’t want an undercloud on my other node, I need to reproduce the OVS commands to build the bridge.

I’m in luck. These commands are all captured in /etc/openvswitch/conf.db I can pull them out with:

grep '^\{'  /etc/openvswitch/conf.db | jq '. | ._comment ' | sed -e 's!^\"!!g' -e's!ovs-vsctl:!!' -e 's!\"$!!'   | grep -v null > build-ovs.sh

That gets me:

 ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.12.1
 ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.5.0 \"external-ids:system-id=\\\"a9460ec6-db71-42fb-aec7-a5356bcda153\\\"\" \"system-type=\\\"CentOS\\\"\" \"system-version=\\\"7.2.1511-Core\\\"\"
 ovs-vsctl -t 10 -- --may-exist add-br br-ctlplane -- set bridge br-ctlplane other-config:hwaddr=00:59:cf:9c:84:3a -- br-set-external-id br-ctlplane bridge-id br-ctlplane
 ovs-vsctl -t 10 -- --if-exists del-port br-ctlplane eth1 -- add-port br-ctlplane eth1
 ovs-vsctl -t 10 -- --if-exists del-port br-ctlplane eth1 -- add-port br-ctlplane eth1
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-br br-int -- set Bridge br-int datapath_type=system
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set-fail-mode br-int secure
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Bridge br-int protocols=OpenFlow10
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-br br-ctlplane -- set Bridge br-ctlplane datapath_type=system
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Bridge br-ctlplane protocols=OpenFlow10
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-port br-int int-br-ctlplane -- set Interface int-br-ctlplane type=patch options:peer=nonexistent-peer
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-port br-ctlplane phy-br-ctlplane -- set Interface phy-br-ctlplane type=patch options:peer=nonexistent-peer
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Interface int-br-ctlplane options:peer=phy-br-ctlplane
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Interface phy-br-ctlplane options:peer=int-br-ctlplane
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- add-port br-int tapacff1724-9f -- set Interface tapacff1724-9f type=internal external_ids:iface-id=acff1724-9fb2-4771-a7db-8bd93e7f3833 external_ids:iface-status=active external_ids:attached-mac=fa:16:3e:f6:6d:86
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f other_config:physical_network=ctlplane other_config:net_uuid=6dd40444-6cc9-4cfa-bfbd-15b614f6e9e1 other_config:network_type=flat
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f other_config:tag=1 other_config:physical_network=ctlplane other_config:net_uuid=6dd40444-6cc9-4cfa-bfbd-15b614f6e9e1 other_config:network_type=flat
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f tag=1
 ovs-vsctl -t 10 -- --may-exist add-port br-ctlplane vlan10 tag=10 -- set Interface vlan10 type=internal

Now I don’t want to blindly re-execute this, as there are some embedded values particular to the first machine. The MAC 00:59:cf:9c:84:3a for eth1 is reused by the bridge. The first two lines look like system specific setup. Let’s see if the new VM has anything along these lines.

Things to note:

  1. /etc/openvswitch/ is empty
  2. systemctl status openvswitch.service shows that the service is not running

Let’s try starting it:
sudo systemctl start openvswitch.service

grep '^\{'  /etc/openvswitch/conf.db | jq '. | ._comment ' | sed -e 's!^\"!!g' -e's!ovs-vsctl:!!' -e 's!\"$!!'   | grep -v null 
 ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.12.1
 ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.5.0 \"external-ids:system-id=\\\"8f68fbfb-9278-4772-87f1-500bc80bb917\\\"\" \"system-type=\\\"CentOS\\\"\" \"system-version=\\\"7.2.1511-Core\\\"\"

So we can drop those two lines.

Extract the MAC for interface eth1:

ip addr show eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
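
Before executing the script, the undercloud's MAC has to be swapped for this VM's (both values come from the outputs above) and the file made executable. A sketch:

sed -i 's/00:59:cf:9c:84:3a/00:59:cf:9c:84:3e/' build-ovs.sh
chmod +x build-ovs.sh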

OK, that is about all we can do. Execute it.

sudo ./build-ovs.sh

No complaints. What did we get?

$ sudo ovs-vsctl show
1d0cb182-c7c7-4256-8c42-ab013628c2d1
    Bridge br-int
        fail_mode: secure
        Port "tapacff1724-9f"
            tag: 1
            Interface "tapacff1724-9f"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
    Bridge br-ctlplane
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port "vlan10"
            tag: 10
            Interface "vlan10"
                type: internal
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port "eth1"
            Interface "eth1"
    ovs_version: "2.5.0"

Looks right.

One thing I notice that is different is that on the undercloud, the bridge has an IP address:

7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:59:cf:9c:84:3a brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.1/24 brd 192.0.2.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::259:cfff:fe9c:843a/64 scope link 
       valid_lft forever preferred_lft forever

Let’s add one to the bridge on our new machine:

$ cat /etc/sysconfig/network-scripts/ifcfg-br-ctlplane
# This file is autogenerated by os-net-config
DEVICE=br-ctlplane
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
IPADDR=192.0.2.1
NETMASK=255.255.255.0
OVS_EXTRA="set bridge br-ctlplane other-config:hwaddr=00:59:cf:9c:84:3a -- br-set-external-id br-ctlplane bridge-id br-ctlplane"

Again, minor edits to use the proper MAC and a different IP address. Bring it up with:

sudo ifup br-ctlplane

And we can see it:

$ ip addr show br-ctlplane
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.3/24 brd 192.0.2.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::259:cfff:fe9c:843e/64 scope link 
       valid_lft forever preferred_lft forever

Last step: we need to bring up the eth1 interface. Again, give it a config file, this time in /etc/sysconfig/network-scripts/ifcfg-eth1

DEVICE=eth1
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=br-ctlplane
BOOTPROTO=none

And bring it up with:

sudo ifup eth1

Make sure it is up:

$ ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP qlen 1000
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::259:cfff:fe9c:843e/64 scope link 
       valid_lft forever preferred_lft forever

And usable:

$  ping 192.0.2.1
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=1.41 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=0.627 ms

I’d really like to laud the Open vSwitch developers for their approach to the database. Having the commands available in the database is a fantastic tool. That is a pattern I would love to see emulated elsewhere.

Installing FreeIPA on a Tripleo undercloud

I’ve been talking about using FreeIPA to secure OpenStack since the Havana summit in Portland. I’m now working with Tripleo to install OpenStack. To get the IPA server installed along with Tripleo Quickstart requires a VM accessible from the Ansible playbook.

UPDATE: This is changing rapidly. I’ll post complete updates in a bit, but the commit below is now one in a chain, and the instructions are in the git commit messages. One missing step in order to run ansible is: export ANSIBLE_CONFIG=$HOME/.quickstart/tripleo-quickstart/ansible.cfg


Build the Identity VM

  • Apply the patch to quickstart that builds the VM
  • Run quickstart at least up to the undercloud stage. The steps below do the complete install.

Since Quickstart makes a git repo under ~/.quickstart, I’ve been using that as my repo. It avoids duplication, and makes my changes visible.

mkdir ~/.quickstart
cd ~/.quickstart
git clone https://github.com/openstack/tripleo-quickstart
cd tripleo-quickstart
git review -d 315749
~/.quickstart/tripleo-quickstart/quickstart.sh   -t all warp.lab4.eng.bos.redhat.com

If you are not set up for git review, you can pull the patch manually from Gerrit.

Set the hostname FQDN for the identity machine

ssh -F /home/ayoung/.quickstart/ssh.config.ansible identity-root hostnamectl set-hostname --static identity.warp.lab4.eng.bos.redhat.com

Add variables to the inventory file ~/.quickstart/hosts

[vms:vars]
ipa_server_password=FreeIPA4All
ipa_domain=warp.lab4.eng.bos.redhat.com
deployment_dir=/home/ayoung/.ossipee/deployments/warp.lab4.eng.bos.redhat.com
ipa_realm=WARP.LAB4.ENG.BOS.REDHAT.COM
cloud_user=stack
ipa_admin_user_password=FreeIPA4All
ipa_forwarder=
nameserver=

Activate the Venv:

. ~/.quickstart/bin/activate

Use Rippowam branch

cd ~/devel
git clone https://github.com/admiyo/rippowam
cd rippowam
git checkout origin/tripleo

Run ansible

ansible-playbook -i ~/.quickstart/hosts ~/devel/rippowam/ipa.yml

Making this VM available to the overcloud requires some network wizardry. That deserves a post itself.

May 15, 2016

Security will fix itself, eventually
If you're in the security industry these days things often don't look very good. Everywhere you look it sometimes feels like everything is on fire. The joke is there are two types of companies, those that know they've been hacked and those that don't. The world of devices looks even worse. They're all running old software, most will never see updates, most of the people building the things don't know or care about proper security, most people buying them don't know this is a problem.

I heard a TED talk by Al Gore called The case for optimism on climate change. This made me think of security in some ways. The basics of the talk are that things are getting better, we're surpassing many goals set for things like renewable energy. A few years ago the idea of renewable energy beating out something like coal seemed far fetched.

That reminded me of the current state of security. It's hard to see a future that's very bright sometimes. For every problem that gets fixed, at least two new ones show up. The thing that gives me optimism though is the same basic idea as climate change. It has to get better because there is no alternative.

If we look back at renewable energy, the biggest force keeping it out of the market even five years ago was cost. It was really expensive to build and deploy things like solar panels. Today it's the same price or cheaper in some instances.

What happened?

The market happened. As new technology emerges and develops, it gets cheaper. This is one of the amazing things about emerging technology. Entrenched technology generally doesn't change price drastically just due to its nature. Solar power is getting better, it's not done yet, it will continue to get better for less cost. The day will come when we think about current power generation the way we think about using horses for transportation.

Now let's think about security.

If you want secure devices and a secure infrastructure it's going to cost a fortune. You're talking about very high skilled staff and extremely expensive hardware and software (assuming you can even get it in some cases). Today security is added cost in many cases, so lots of producers skip it. Bad security has cost too though. Today bad security is generally cheaper than good security. We need to flip this around, good security needs to be cheaper than bad security.

The future.

Here's my prediction though. In the future, good security will be cheaper to build, deploy, and run than bad security. This sounds completely insane with today's technology. A statement like that sounds like some kook ten years ago telling everyone solar power is our future. Ten years ago solar wasn't a serious thing, today it is. Our challenge is figuring out what the new security future will look like. We don't really know yet. We know we can't train our way out of this, most existing technology is a band-aid at best. If I had to guess I'll use the worn out "Artificial Intelligence will save us all", but who knows what the future will bring. Thanks to Al Gore, I'm now more optimistic things will get better. I'm impatient though, I don't want to wait for the future, I want it now! So all you smart folks do me a favor and start inventing the future.

What do you think? Leave your comments on twitter: @joshbressers

May 12, 2016

Lessons Learned writing a certmonger helper for Anchor

Guang Yee has been trying to get certmonger talking to Anchor — an ephemeral CA, worth a post by itself. His attitude went from “this is easy” to “I’m about to give up on certmonger” to “Got it.” Here is his post-mortem:

Finally got the basic flow working. I am now able to run Anchor and getting the server certs with certmonger. Running certmonger-session in debug mode was really beneficial. Your blogs on younglogic helped out quite a bit as well. Next stop, put them all together and submit a patch for devstack.

Lessons learned so far:

  1. Documentation does not match reality. For example, the “getcert add-ca” command is not available on the version I got. I did my work on Ubuntu Trusty LTS. My understanding is that the LTS, like RHEL, tends to carry older (but stable?) packages.
  2. There isn’t a whole lot of example on certmonger helper. I had to learn as I go.
  3. Certmonger-session tends to overwrite my changes in the ~/.config/certmonger/cas/ dir. I have to do “killall certmonger-session” before making any changes.
  4. Troubleshooting wasn’t easy at the beginning. There were a bunch of dbus interactions in the logs which I didn’t know what to do with. The “org.fedorahosted.certmonger.ca.get_nickname” logs concerned me at the beginning. I thought this was supposed to be a generic cert monitoring daemon. I was concerned it might be making calls outside of my box.
  5. If the script fails to load, nothing shows up in syslog. The best way would be to run the script independently before hooking it up with certmonger. I screwed up on the exit code; that’s why I kept getting NEED_GUIDANCE status. In this case, running certmonger-session manually at debug level 15 helps a lot.
  6. I had trouble with Anchor at the beginning as I was running an outdated version of Pecan. But once I got that fixed, I didn’t encounter any more issues with Anchor.

We’ll take this input back to the Certmonger team. Some of it is due to the older version of Certmonger, which is motivation to get an updated one available for Trusty. I’d like to get a Python shell defined that other Certmonger helper apps can use as a starting point: something that deals with the environment variables, but then allows a developer to register a class that does the CA-specific code.

Thanks to Guang for battling through this and again to Nalin Dahyabhai for helping to debug.

May 10, 2016

Certmonger logging for debugging

Certmonger is split into 3 parts

  1. getcert or a comparable helper app which the user calls to make requests.  The request is put on dbus and sent to
  2. The certmonger binary.  This reads the request off of dbus and makes a call to
  3. The helper application, which makes calls to the remote service.

Debugging this process is much easier if you run the certmonger service from the command line and tell it to log debugging output.  Make sure no certmonger-session processes are running:

killall certmonger-session

Then explicitly start the certmonger session binary in non-daemon mode with debugging.

/usr/libexec/certmonger/certmonger-session -n -d 15

I chose 15 as a “very high number” for debugging. It worked for me.

Make sure that the dbus setup for certmonger has been set as an env var:

$ echo $DBUS_SESSION_BUS_ADDRESS
unix:abstract=/tmp/dbus-bNCrVqqfu5,guid=36fe37806871d8469a484e91573145db
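
If that variable is not set in your shell, one option (my own suggestion, not from the original post) is to launch a session bus and export its address, then start certmonger-session from that same shell:

eval $(dbus-launch --sh-syntax)
echo $DBUS_SESSION_BUS_ADDRESS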

Then make a request in a separate terminal like:

 getcert list -s

And you should see logging from certmonger-session

2016-05-10 16:59:02 [21970] Dequeuing FD 8 for Read for 0x55c4635aba90:0x55c4635af070.
2016-05-10 16:59:02 [21970] Handling D-Bus traffic (Read) on FD 8 for 0x55c4635aba90.
2016-05-10 16:59:02 [21970] message 0x55c4635aba90(method_call)->org.fedorahosted.certmonger:/org/fedorahosted/certmonger:org.fedorahosted.certmonger.get_requests
2016-05-10 16:59:02 [21970] Pending GetConnectionUnixUser serial 105
2016-05-10 16:59:02 [21970] Pending GetConnectionUnixProcessID serial 106
...

And lots more.

To add a request:

getcert request -n remote   -c remote -s -d ~/certs/  -N "uid=ayoung,cn=users,cn=accounts,dc=openstack,dc=freeipa,dc=org"

And see the output.

2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'HAVE_CSR'
2016-05-10 17:00:09 [21970] Will revisit Request2('20160510210008') now.
2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'NEED_TO_SUBMIT'
2016-05-10 17:00:09 [21970] Will revisit Request2('20160510210008') now.
2016-05-10 17:00:09 [21970] Request2('20160510210008') moved to state 'SUBMITTING'
2016-05-10 17:00:09 [21970] Will revisit Request2('20160510210008') on traffic from 15.

May 09, 2016

Passing Unix Socket File Descriptors between containers processes blocked by SELinux.
SELinux controls passing of Socket file descriptors between processes.

A Fedora user posted a bugzilla complaining about SELinux blocking transfer of socket file descriptors between two docker containers.

Let's look at what happens when a socket file descriptor is created by a process.

When a process accepts a connection from a remote system, the file descriptor created by the process automatically gets assigned the same label as the process creating the socket.  For example when the docker service (docker_t) listens on /var/run/docker.sock and a client connects to the docker service, the docker service end of the connection gets labeled by default with the label of the docker process.  On my machine this is:

system_u:system_r:docker_t:s0

The client is probably running as unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023.  SELinux would then check to make sure that unconfined_t is able to connect to docker_t sockets.

If this socket descriptor is passed to another process, the new process's label has to have access to the socket's label.  If it does not, SELinux will block the transfer.

In containers, even though by default all container processes have the same SELinux type, they have different MCS labels.

If I have a process labeled system_u:system_r:svirt_lxc_net_t:s0:c1,c2 and I pass that file descriptor to a process in a different container labeled system_u:system_r:svirt_lxc_net_t:s0:c4,c5, SELinux will block the access.

The bug reporter was reporting that by default he was not able to pass the descriptor, which is goodness. We would not want to allow a confined container to be able to read/write socket file descriptors from another container by default.

The reporter also figured out that he could get this to work by disabling SELinux either on the host or inside of the container.

Surprisingly he also figured out if he shared IPC namespaces between the containers, SELinux would not block.

The reason for this is that when you share the same IPC namespace, docker automatically causes the containers to share the same SELinux label.  If docker did not do this, SELinux would block processes in container A from accessing IPCs created in container B.  With a shared IPC namespace the SELinux labels for both of the reporter's containers were the same, and SELinux allowed the passing.

How would I make two containers share the same SELinux labels?

Docker by default launches all containers with the same type field, but different MCS labels.  I told the reporter that you could cause two containers to run with the same MCS labels by using the --security-opt label:level:MCSLABEL option.

Something like this will work

docker run -it --rm --security-opt label:level:s0:c1000,c1001 --name server -v myvol:/tmp test /server
docker run -it --rm --security-opt label:level:s0:c1000,c1001 --name client -v myvol:/tmp test /client
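
To confirm the two containers really did end up with the same label, you could read the context of PID 1 in each one (a sketch; the container names match the run commands above). Both should print the same context, ending in s0:c1000,c1001.

docker exec server cat /proc/1/attr/current
docker exec client cat /proc/1/attr/current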


These containers would then run with the same MCS labels, which would give the reporter the best security possible while still allowing the two containers to pass the socket between them.  The containers would still be locked down by SELinux with respect to the host and other containers.  They would be able to attack each other from an SELinux point of view, but the other container separation mechanisms would still be in effect to limit such attacks.

May 08, 2016

Security isn't a feature, it's a part of everything
Almost every industry goes through a time when new novel features are sold as some sort of add on or extra product. Remember needing a TCP stack? What about having to buy a sound card for your computer, or a CD drive? (Does anyone even know what a CD is anymore?) Did you know that web browsers used to cost money? Times were crazy.

Let's think about security now. There is a lot of security that's some sort of add on, or maybe a separate product. Some of this is because it's a clever idea, some things exist because people are willing to pay for it even if it should be included. No matter what we're talking about, there is always a march toward commoditization. This is how Linux took over the universe, the operating system is a commodity now, it's all about how you put things together using things like containers and devops and cloud.

Now let's think about security. Of all the things going on, all the products out there, all the methodologies, security is always the special snowflake. For being so special you'd think we could get more right. If everything was fine, the Red Team wouldn't win. every. single. time.

The reality is that until we stop treating security like some sort of special add on, we're not going to see things make any real improvements. Think about any product you use, there are always things that are just an expected part of it. Security should fall under this category. Imagine if your car didn't come with locks. Or if it had locks, but you had to cut your own keys before you could use them. What if every safe shipped with the same combination, if you wanted a new one you had to pay for it? There are a lot of things we just expect because they make sense.

I'm sure you get the idea I'm shooting for here. Today we treat security like something special. You have to buy a security solution if you want to be secure. Or you have to configure your product a certain way if you want it secure. If we want to really start solving security problems, we have to make sure security isn't something special we talk about later, or plan to add in version two. It has to just be a part of everything. There aren't secure options, all the options need to be what we would call "secure" today. The days of security as an optional requirement are long gone. Remember when we thought those old SSL algorithms could just stick around forever? Nobody thinks that anymore.

How are we going to fix this? That's the real trick. It's easy to talk about demanding security and voting with your pocketbook, but the reality is this isn't very possible today. Security isn't usually a big differentiator. If we expect security to just be part of everything, we also can't expect anyone to see security as a feature they look for. How do we ensure there is a demand for something that is by definition a secondary requirement? How do we get developers to care about something that isn't part of a requirement? How do we get organizations to pay for something that doesn't generate revenue?

There are some groups trying to do the right thing here. I think almost everyone is starting to understand security isn't a feature. Of course just because there's some interest and people are beginning to understand doesn't mean everything will be fixed quickly or easily. We have a long way to go still. It won't be easy, it won't be quick. It's possible everything could go off the rails. The only thing harder than security is planning for security :)

Do you think you know how to fix this mess? Impress me with your ideas: @joshbressers

May 05, 2016

Testing Fernet Tokens on Tripleo

Not the way to do it long term, but this will give you a chance to play with it.

From the controller node:

sudo keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
sudo crudini --set /etc/keystone/keystone.conf token provider fernet
sudo systemctl restart httpd.service

Test it

$ openstack  token issue -f shell
expires="2016-05-05T05:21:44Z"
id="gAAAAABXKspYhz7Ti5ldwi0mU4D69NqTINEU_t-e8MoxqVkVhR40w1E7GOmgai-9lanr2Z6bnoyQSgNWIhD63UOm1Mlsm9_hw5oTCqVO_pWJZwTomlWM2BrG5LqTOyp6PNqYz2pZ0DIaSTOnOQPeVqKp4ot8S3B6oA4Xy1JZo3305DPiApCzOyQ"
project_id="b383d314cc344639939f2a9a381a6945"
user_id="4e154e7d166d4bd6b8199dfd3a6f2468"

May 04, 2016

Identity work for the OpenStack Newton release

The Newton Summit is behind us, and we have six months to prepare for the next release in both upstream OpenStack and RDO. Here is my attempt to build a prioritized list of the large tasks I want to tackle in this release.

  1. Federation:  We need to test RDO against several identity providers (IdPs), including Shibboleth, Keycloak, and Ipsilon.  In order to do this, we need a way to install a test version of the IdP in a virtual machine alongside the undercloud in a Tripleo deploy.  Since it looks like instack and tripleo-quickstart are converging, I’ll probably close this task out with quickstart.  The undercloud setup of quickstart assumes only a single machine (non-HA), and I want to make it set up a second machine that is visible to both the outside world and the overcloud.  I already have Ansible roles for deploying Keycloak and Ipsilon in the Rippowam repo that should be easily extensible to Shibboleth as well.
  2. Tripleo LDAP Configuration:  Continuing on the track of configuring identity from Tripleo, we want to be able to automate the steps to integrate LDAP into a Keystone server managed by Tripleo.  Like the Federation steps above and the policy work below, the main effort here is making sure that configuration changes can be preserved across redeploys and will be properly synchronized in an HA deployment.
    1. A prerequisite for domain-specific backends is that the deploy uses the V3 Keystone API everywhere. We need to test and confirm that this is the case, and fix any places where it is not.
  3. FreeIPA:  There is much of Tripleo that cannot be secured without an identity provider.  Most essential is to have a sound PKI strategy, but beyond that, we need a way to secure both the undercloud and the overcloud VMs, provide identity for each of them, and set proper access controls.  While FreeIPA will not be required for use with Tripleo, it will be possible to make use of a FreeIPA server. To ensure that it is trivial to make a FreeIPA server available for deployers that want one, the additional VM described above can be used to install the FreeIPA server.  The LDAP configuration above can make use of the LDAP backend, or a deploy  can use Federation via SSSD and Kerberos.
  4. Token Revocation: Convert the Keystone revocation events code to use a linear search of the list instead of the current Tree code.  While the Tree code was an interesting approach, it proved to be both too complex for most developers to understand and not to perform particularly well.  In addition, the current code performs revocation for many events, such as project deactivation and user deletion, which are better checked in the objects themselves during token validation.  Removing these redundant rules should make the revocation check go very fast, and they will be performed implicitly during re-population of Fernet tokens anyway.
  5. Fernet Tokens:  While UUID tokens will not be going away this release, there seems to be little reason to keep them as the default provider when we really want people to move to Fernet.  There are still a few issues with tests that were not run while Fernet was not the default, including some time-related issues that I hope will be flushed out by the simplification of the revocation code listed above.  Fernet will be the default for Tripleo, if not for Keystone itself.
  6. Oslo messaging identities:  Before we can write ACLs governing which service can speak to which, we need to be able to identify the senders and receivers on the queues and topics.  We can do this with no impact on performance by creating a Rabbit user for each service, and one Rabbit user for each hypervisor.  This will need to be done for both Devstack and Tripleo.
  7. Policy Customization:  There are several reasons why we are not distributing policy files from the Keystone server.  However, deployers still need to modify policy files, and to do so in a non-surprising way.  I discussed this with the Tripleo team, and the consensus seems to be that we can deploy the policy files as “deployment artifacts”, which is, essentially, a tarball with the files inside in their end locations; e.g. /etc/keystone/policy.json for Keystone.  We should be able to describe a full cycle that applies to all of the deployment tools, not just Tripleo.  While this deserves its own post (see the sketch after this list), the skeleton is:
    1. harvesting the initial policy files from the controller nodes,
    2. packing up the policy archive,
    3. storing it where the orchestration engine can find it,
    4. the operator fetching it from the store,
    5. the operator customizing and testing the policy,
    6. storing the updated archive,
    7. redeploying the overcloud with the new policy.
  8. Close Bug 968696:  We made some progress on this at the summit.  The most interesting aspect will be a hack that adds “is_admin_project=True” to all tokens that are requesting the “admin” role IFF the admin project is not set in the Keystone configuration.  This will allow the services to update policy files such that, when the configuration option is set, the “admin” role will be properly scoped.  Adding this option to Keystone will allow us to submit changes to all of the other service policy files.
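As a rough illustration of the policy “deployment artifact” skeleton in item 7, the harvest/pack/redeploy cycle might look something like this; it is only a sketch, and the controller hostname, paths, and archive name are placeholders:

mkdir -p policy-artifact/etc/keystone
scp heat-admin@overcloud-controller-0:/etc/keystone/policy.json policy-artifact/etc/keystone/
tar -czf policy-artifact.tar.gz -C policy-artifact .
# the operator then fetches the archive from wherever the orchestration engine
# stores it, customizes and tests the policy, re-packs it the same way, and
# redeploys the overcloud with the updated artifact so the files land back in
# /etc/keystone on the controllers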

I am supervising a couple other efforts.

  1. Unified Delegation:  Right now, the only way a non-admin user can delegate a role to another user is via trust.  But most users should not have to execute trusts in order to perform mundane operations.  Unified delegation is a way to extend the redelegation properties of trusts to the basic role assignment process.  The work for this has been well underway for a couple releases, and we should be able to finish it up in Newton.  This work is mostly being done by Alexander Makarov.
    1. Allow a user to be able to explicitly request a single role in a token.  This will be useful for limiting exposure.  For example, a user with an Admin role will be able to request a token with only the Member role on it when talking to a third party application.
  2. Python3 LDAP:  The big thing keeping Keystone from being run on Python 3 is the LDAP library. python-ldap is Python 2 only, and the maintainer does not plan on making it Python 3 compatible.  The original plan was to use ldap3, a pure-Python implementation of an LDAP client, but the protocol is complex enough that we are looking instead at using a fork of python-ldap called pyldap.  However, since so much of the ldap3 work is already done, we might end up having both implementations in parallel and testing them.  The ldap3-based code is much cleaner, but that cleanup could be applied to pyldap as well. This work is being done by Kristi Nikolla and Roxana Gherle.
  3. Federation Shadow Users:  There is a long-standing pain point that administrators must work with user IDs but only know usernames.  This problem is worse in Federation, where the users might not even have user IDs yet, as they have never authenticated.  During the design summit, we came up with a plan for a handful of new APIs.  All of these APIs would have the input values for a mapping in the payload, delivered as a dictionary.  This work is being done by Ron De Rose.
    1. Query the results of a mapping call
    2. Check to see if a Federated user is in the shadow table
    3. Initialize an entry in the Shadow user table

There are a couple of efforts that other members of my team are working on that are complementary to the above list.

  1. TLS Everywhere.   Juan Antonio Osorio Robles and Rob Crittenden.
    1. Network configuration for
      1. HTTPS between each server
      2. TLS enabled for the message broker
      3. TLS/LDAPS for LDAP
      4. TLS to the Database
    2. Establishing Certmonger as the tool to provide certificates
    3. Setting up a self-signing approach for Devstack
    4. Allowing for multiple CAs for deployments
      1. Dogtag/FreeIPA
      2. Anchor
  2. Autoregistration of VMs with an Identity provider. Rob Crittenden.  This one has proven to be a contentious issue, as the hooks inside Nova that we were depending on to implement it got deprecated… due to team members submitting bug reports.  Discussions at the summit are pointing at an approach based on modifying the metadata server.  Michael Still has an idea he is writing up for a general-purpose mechanism.

May 02, 2016

Self-Signed SSL/TLS Certificates: Why They are Terrible and a Better Alternative

A Primer on SSL/TLS Certificates

Many of my readers (being technical folks) are probably already aware of the purpose and value of certificates, but in case you are not familiar with them, here’s a quick overview of what they are and how they work.

First, we’ll discuss public-key encryption and public-key infrastructure (PKI). It was realized very early on in human history that sometimes you want to communicate with other people in a way that prevents unauthorized people from listening in. All throughout time, people have been devising mechanisms for obfuscating communication in ways that only the intended recipient of the code would be able to understand. This obfuscation is called encryption, the data being encrypted is called plaintext and the encrypted data is called ciphertext. The cipher is the mathematical transformation that is used to turn the plaintext into the ciphertext and relies upon one or more keys known only to trusted individuals to get the plaintext back.

Early forms of encryption were mainly “symmetric” encryption, meaning that the cipher used the same key for both encryption and decryption. If you’ve ever added a password to a PDF document or a ZIP file, you have been using symmetric encryption. The password is a human-understandable version of a key. For a visual metaphor, think about the key to your front door. You may have one or more such keys, but they’re all exactly alike and each one of them can both lock and unlock the door and let someone in.

Nowadays we also have forms of encryption that are “asymmetric”. What this means is that one key is used to encrypt the message and a completely different key is used to decrypt it. This is a bit harder for many people to grasp, but it works on the basic mathematical principle that some actions are much more complicated to reverse than others. (A good example I’ve heard cited is that it’s pretty easy to figure out the square of any number with a pencil and a couple minutes, but most people can’t figure out a square-root without a modern calculator). This is harder to visualize, but the general idea is that once you lock the door with one key, only the other one can unlock it. Not even the one that locked it in the first place.

So where does the “public” part of public-key infrastructure come in? What normally happens is that once an asymmetric key-pair is generated, the user will keep one of those two keys very secure and private, so that only they have access to it. The other one will be handed out freely through some mechanism to anyone at all that wants to talk to you. Then, if they want to send you a message, they simply encrypt their message using your public key and they know you are the only one who can decrypt it. On the flip side, if the user wanted to send a public message but provide assurance that it came from them, they can also sign a message with the private key, so that the message will contain a special signature that can be decrypted with their public key. Since only one person should have that key, recipients can trust it came from them.

Astute readers will see the catch here: how do users know for certain that your public key is in fact yours? The answer is that they need to have a way of verifying it. We call this establishing trust and it’s exceedingly important (and, not surprisingly, the basis for the rest of this blog entry). There are many ways to establish trust, with the most foolproof being to receive the public key directly from the other party while looking at two forms of picture identification. Obviously, that’s not convenient for the global economy, so there needs to be other mechanisms.

Let’s say the user wants to run a webserver at “www.mydomain.com”. This server might handle private user data (such as their home address), so a wise administrator will set the server up to use HTTPS (secure HTTP). This means that they need a public and private key (which in this case we call a certificate). The common way to do this is for the user to contact a well-known certificate authority and purchase a signature from them. The certificate authority will do the hard work of verifying the user’s identity and then sign their webserver certificate with the CA’s own private key, thus providing trust by way of a third-party. Many well-known certificate authorities have their public keys shipped by default in a variety of operating systems, since the manufacturers of those systems have independently verified the CAs in turn. Now everyone who comes to the site will see the nice green padlock on their URL bar that means their communications are encrypted.

A Primer on Self-Signed Certificates

One of the major drawbacks to purchasing a CA signature is that it isn’t cheap: the CAs (with the exception of Let’s Encrypt) are out there to make money. When you’re developing a new application, you’re going to want to test that everything works with encryption, but you probably aren’t going to want to shell out cash for every test server and virtual machine that you create.

The solution to this has traditionally been to create what is called a self-signed certificate. What this means is that instead of having your certificate signed by a certificate authority, you sign the certificate with its own private key. The problem with this approach is that web browsers and other clients that verify the security of the connection will be unable to verify that the server is who it says it is. In most cases, the user will be presented with a warning page that informs them that the server's identity cannot be verified. When setting up a test server, this is expected. Unfortunately, however, clicking through and saying “I’m sure I want to connect” has a tendency to form bad habits in users and often results in them eventually clicking through when they shouldn’t.
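For reference, generating such a self-signed certificate for a test box is a one-liner with openssl; the file names and subject here are only placeholders:

openssl req -x509 -newkey rsa:2048 -nodes -keyout test.key -out test.crt -days 365 -subj "/CN=test.example.com"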

It should be pretty obvious, but I’ll say it anyway: Never use a self-signed certificate for a production website.

One of the problems we need to solve is how to avoid training users to ignore those warnings. One way that people often do this is to load their self-signed certificate into their local trust store (the list of certificate authorities that are trusted, usually provided by the operating system vendor but available to be extended by the user). This can have some unexpected consequences, however. For example, if the test machine is shared by multiple users (or is breached in a malicious attack), then the private key for the certificate might fall into other hands that would then use it to sign additional (potentially malicious) sites. And your computer wouldn’t try to warn you because the site would be signed by a trusted authority!

So now it seems like we’re in a Catch-22 situation: If we load the certificate into the trusted authorities list, we run the risk of a compromised private key for that certificate tricking us into a man-in-the-middle attack somewhere and stealing valuable data. If we don’t load it into the trust store, then we are constantly bombarded by a warning page that we have to ignore (or in the case of non-browser clients, we may have to pass an option not to verify the client) in which case we could still end up in a man-in-the-middle attack, because we’re blindly trusting the connection. Neither of those seems like a great option. What’s a sensible person to do?

Two Better Solutions

So, let’s take both of the situations we just learned about and see if we can locate a middle ground somewhere. Let’s go over what we know:

  • We need to have encryption to protect our data from prying eyes.
  • Our clients need to be able to trust that they are talking to the right system at the other end of the conversation.
  • If the certificate isn’t signed by a certificate in our trust store, the browser or other clients will warn or block us, training the user to skip validation.
  • If the certificate is signed by a certificate in our trust store, then clients will silently accept it.
  • Getting a certificate signed by a well-known CA can be too expensive for an R&D project, but we don’t want to put developers’ machines at risk.

So there are two better ways to deal with this. One is to have an organization-wide certificate authority rather than a public one. This should be managed by the Information Technologies staff. Then, R&D can submit their certificates to the IT department for signing and all company systems will implicitly trust that signature. This approach is powerful, but can also be difficult to set up (particularly in companies with a bring-your-own-device policy in place). So let's look at another solution that's closer to the self-signed approach.

The other way to deal with it would be to create a simple site-specific certificate authority for use just in signing the development/test certificate. In other words, instead of generating a self-signed certificate, you would generate two certificates: one for the service and one to sign that certificate. Then (and this is the key point – pardon the pun), you must delete and destroy the private key for the certificate that did the signing. As a result, only the public key of that private CA will remain in existence, and it will only have ever signed a single service. Then you can provide the public key of this certificate authority to anyone who should have access to the service and they can add this one-time-use CA to their trust store.
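A rough openssl sketch of this approach follows; the file names and subjects are placeholders, and the final trust-store step assumes a Fedora/RHEL-style update-ca-trust setup:

# create the one-time-use CA and the service certificate request
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt -days 365 -subj "/CN=Throwaway Dev CA"
openssl req -newkey rsa:2048 -nodes -keyout service.key -out service.csr -subj "/CN=dev.example.com"

# sign the service certificate, then destroy the CA private key forever
openssl x509 -req -in service.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out service.crt -days 365
shred -u ca.key

# clients that should trust the service add only the CA public certificate
sudo cp ca.crt /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract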

Now, I will stress that the same rule holds true here as for self-signed certificates: do not use this setup for a production system. Use a trusted signing authority for such sites. It’s far easier on your users.

A Tool and a Tale

I came up with this approach while I was working on solving some problems for the Fedora Project. Specifically, we wanted to come up with a way to ensure that we could easily and automatically generate a certificate for services that should be running on initial start-up (such as Cockpit or OpenPegasus). Historically, Fedora had been using self-signed certificates, but the downsides I listed above gnawed at me, so I put some time into it and came up with the private-CA approach.

In addition to the algorithm described above, I’ve also built a proof-of-concept tool called sscg (the Self-Signed Certificate Generator) to easily enable the creation of these certificates (and to do so in a way that never drops the CA’s private key onto a filesystem; it remains in memory). I originally wrote it in Python 3 and that version is packaged for use in Fedora today. This past week as a self-assigned exercise to improve my knowledge of Go, I rewrote the sscg in that language. It was a fun project and had the added benefit of removing the fairly heavyweight dependency on the Python 3 version. I plan to package the golang version for Fedora 25 at some point in the near future, but if you’d like to try it out, you can clone my github repository. Patches and suggestions for functionality are most welcome.


Trusting, Trusting Trust
A long time ago Ken Thompson wrote something called Reflections on Trusting Trust. If you've never read this, go read it right now. It's short and it's something everyone needs to understand. The paper basically explains how Ken backdoored the compiler on a UNIX system in such a way it was extremely hard to get rid of the backdoors (yes, more than one). His conclusion was you can only trust code you wrote. Given the nature of the world today, that's no longer an option.

Every now and then I have someone ask me about Debian's Reproducible Builds. There are other groups working on similar things, but these guys seem to be the furthest along. I want to make clear right away that this work being done is really cool and super important, but not exactly for the reasons people assume. The Debian page is good about explaining what's going on but I think it's easy to jump to some false conclusions on this one.

Firstly, the point of a reproducible build is to allow two different systems to build the exact same binary. This tells us that the resulting binary was not tampered with. It does not tell us the compiler is trustworthy or the thing we built is trustworthy. Just that the system used to build it was clean and the binary wasn't meddled with before it got to you.

A lot of people assume a reproducible build means there can't be a backdoor in the binary. There can be, due to how the supply chain works. Let's break this down into a few stages. In the universe of software creation and distribution there are literally thousands to millions of steps happening. From each commit, to releases, to builds, to consumption. It's pretty wild. We'll keep it high level.

Here are the places I will talk about. Each one of these could be a book, but I'll keep it short on purpose.
  1. Development: Creation of the code in question
  2. Release: Sending the code out into the world
  3. Build: Turning the code into a binary
  4. Compose: Including the binary in some larger project
  5. Consumption: Using the binary to do something useful
Development
The development stage of anything is possibly the hardest to control. We have reached a point in how we build software that development is now really fast. I would expect any healthy project to have hundreds or thousands of commits every day. Even with code reviews and sign offs, bugs can sneak in. A properly managed project will catch egregious attempts to insert a backdoor.

Release
This is the stage where the project in question cuts a release and puts it somewhere it can be downloaded. A good project will include a detached signature, which almost nobody checks. This stage of the trust chain has been attacked in the past. There are many instances of hacked mirrors serving up backdoored content. The detached signature ensures the release is trustworthy. We mostly have trust solved at this stage, which is why those signatures are so important.

Build
This is the stage where we take the source code and turn it into a binary. This is the step that a reproducible build project injects trust into. Without a reproducible build stage, there was no real trust here. It's still sort of complicated though. If you've ever looked at the rules that trigger these builds, it wouldn't be very hard to violate trust there, so it's not bullet proof. It is a step in the right direction though.

Compose
This step is where we put a bunch of binaries together to make something useful. It's pretty rare for a single build to output the end result. I won't say it never happens, but it's a bit outside what we're worried about, so let's not dwell on it. The threat we see during this stage is the various libraries you bundle with your application. Do you know where they came from? Do they have some level of trust built in? At this point you could have a totally trustworthy chain of trust, but if you include a single bad library, it can undo everything. If you want to be as diligent as possible you won't ship things built by any 3rd parties. If you build it all yourself, you can ensure some level of trust up to this point. Of course building everything yourself generally isn't practical. I think this is the next stage where we'll end up adding more trust. Various code scanners are trying to help here.

Consumption
Here is where whatever you put together is used. In general nobody is looking for software, they want a solution to a problem they have. This stage can be the most complex and dangerous though. Even if you have done everything perfectly up to here, if whoever does the deployment makes a mistake it can open up substantial security problems. Better management tools can help this step a lot.

The point of this article isn't to try to scare anyone (even though it is pretty scary if you really think about it). The real point is to stress that nobody can do this alone. There was once a time when a single group could plausibly try to own their entire development stack, but those times are long gone now. What you need to do is look at the above steps and decide where you want to draw your line. Do you have a supplier you can trust all the way to consumption? Do you only trust them for development and release? If you can't draw that line, you shouldn't be using that supplier. In most cases you have to draw the line at compose. If you don't trust what your supplier does beneath that stage, you need a new supplier. Demanding they give you reproducible builds isn't going to help you; they could backdoor things during development or release. It's the old saying: Trust, but verify.

Let me know what you think. I'm @joshbressers on Twitter.

April 24, 2016

Can we train our way out of security flaws?
I had a discussion about training developers with some people I work with who are smarter than myself. The usual training suggestions came up, but at the end of the day, and this will no doubt enrage some of you, we can't train developers to write secure code.

It's OK, my twitter handle is @joshbressers, go tell me how dumb I am, I can handle it.

So anyhow, training. It's a great idea in theory. It works in many instances, but security isn't one of them. If you look at where training is really successful it's for things like how to use a new device, or how to work with a bit of software. Those are really single purpose items, that's the trick. If you have a device that really only does one thing, you can train a person how to use it; it has a finite scope. Writing software has no scope. To quote myself from this discussion:

You have a Turing complete creature, using a Turing complete machine, writing in a Turing complete language, you're going to end up with Turing complete bugs.

The problem with training in this situation is that you can't train for infinite permutations. By its very definition, training can only cover a finite amount of content. Programming by definition requires you to draw on an infinite amount of content. The two are mutually exclusive.

Since you've made it this far, let's come to an understanding. Firstly, training, even on how to write software, is not a waste of time. Just because you can't train someone to write secure software doesn't mean you can't teach them to understand the problem (or a subset of it). The tech industry is notorious for seeing everything as all or none. It's a sliding scale.

So what's the point?

My thoughts on this matter are about how we can think about the challenges in a different way. Sometimes you have to understand the problem and the tools you have in order to find better solutions. We love to worry about how to teach everyone to be more secure, when in reality it's all about many layers with small bits of security in each spot.

I hate car analogies, but this time it sort of makes sense.

We don't proclaim that the way to stop people getting killed in road accidents is to train them to be better drivers. In fact I've never heard anyone claim this is the solution. We have rules that dictate how the road is to be used (which humans ignore). We have cars with lots of safety features (which humans love to disable). We have humans on the road to ensure the rules are being followed. We have safety built into lots of roads, like guard rails and rumble strips. At the end of the day, even with layers of safety built in, there are accidents, lots of accidents, and almost no calls for more training.

You know what's currently the talk about how to make things safer? Self driving cars. It's ironic that software may be the solution to human safety. The point though is that every system reaches a point where the best you can ever do is marginal improvements. Cars are there, software is there. If we want to see substantial change we need new technology that changes everything.

In the meantime, we can continue to add layers of safety for software, this is where most effort seems to be today. We can leverage our existing knowledge and understanding of problems to work on making things marginally better. Some of this could be training, some of this will be technology. What we really need to do is figure out what's next though.

Just as humans are terrible drivers, we are terrible developers. We won't fix auto safety with training any more than we will fix software security with training. Of course there are basic rules everyone needs to understand, which is why some training is useful. We're not going to see any significant security improvements without some sort of new technology breakthrough. I don't know what that is; nobody does yet. What is self driving software development going to look like?

Let me know what you think. I'm @joshbressers on Twitter.

April 22, 2016

Remotely calling certmonger's local signer

It is really hard to make remote calls securely without a minimal Public Key Infrastructure. For a single server development deployment, you can use a self-signed certificate, but once you have multiple servers that need to intercommunicate, you want to have a single signing cert used for all the services. I’m investigating an approach which chains multiple Certmonger instances together.

When Certmonger needs a certificate signed, it generates a Certificate Signing Request (CSR), and then calls a helper application. For a local signing, this executable is

/usr/libexec/certmonger/local-submit

If I want to sign a certificate without going through certmonger, I can first create a local cert database, generate a CSR, and manually sign it:

mkdir ~/certs
certutil -N -d ~/certs
certutil -R -s "CN=www.younglogic.net, O=Younglogic, ST=MA, C=USA" -o ~/mycert.req -a -g 2048 -d ~/certs
/usr/libexec/certmonger/local-submit ~/mycert.req > mycert.pem
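If you want to check what the local signer handed back, the resulting PEM can be inspected with openssl (a generic check, nothing Certmonger-specific):

openssl x509 -in mycert.pem -noout -subject -issuer -dates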

To get a remote machine to sign it, I used the following bash script:

#!/bin/sh -x

REMOTE_HOST=keycloak.younglogic.net
REMOTE_USER=dhc-user
SSH="ssh $REMOTE_USER@$REMOTE_HOST"      
CERTMONGER_CSR=`cat ~/mycert.req ` 

remotedir=`$SSH mktemp -d -p /home/dhc-user`
echo "$CERTMONGER_CSR" | $SSH tee $remotedir/mycert.req 
new_cert=$( $SSH  /usr/libexec/certmonger/local-submit $remotedir/mycert.req )
echo $new_cert > ~/mycert.pem
$SSH rm $remotedir/mycert.req
$SSH rmdir $remotedir

The /usr/libexec/certmonger/local-submit helper complies with the interface for Certmonger helper apps, which means it can accept the CSR via the environment variable CERTMONGER_CSR; as you can see, it also accepts it as an argument. If I drop the explicit definition of this variable, my script should work as a Certmonger helper app.
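For illustration, the helper variant of the script might look like the sketch below; it assumes Certmonger passes the CSR in CERTMONGER_CSR and reads the issued certificate from stdout, and it reuses the same placeholder host and user as above:

#!/bin/sh
REMOTE_HOST=keycloak.younglogic.net
REMOTE_USER=dhc-user
SSH="ssh $REMOTE_USER@$REMOTE_HOST"

# CERTMONGER_CSR is provided by certmonger; copy it to the remote signer
remotedir=`$SSH mktemp -d -p /home/dhc-user`
echo "$CERTMONGER_CSR" | $SSH tee $remotedir/mycert.req > /dev/null

# print the signed certificate on stdout for certmonger to pick up
$SSH /usr/libexec/certmonger/local-submit $remotedir/mycert.req

# clean up
$SSH rm $remotedir/mycert.req
$SSH rmdir $remotedir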

In ~/.config/certmonger/cas/remote

id=remote
ca_is_default=0
ca_type=EXTERNAL
ca_external_helper=/home/ayoung/bin/remote_certmonger.sh

Of course, this will not honor any of the other getcert commands. But we should be able to list the certs.

Call it with:

getcert request -n remote   -c remote -s -d ~/certs/  -N "uid=ayoung,cn=users,cn=accounts,dc=openstack,dc=freeipa,dc=org"
New signing request "20160422020445" added.

getcert list -s

Request ID '20160422020445':
	status: SUBMITTING
	stuck: no
	key pair storage: type=NSSDB,location='/home/ayoung/certs',nickname='remote',token='NSS Certificate DB'
	certificate: type=NSSDB,location='/home/ayoung/certs',nickname='remote'
	signing request thumbprint (MD5): 5D1D5881 12952298 073F1DF6 48B10CB9
	signing request thumbprint (SHA1): A30FAEDE 1917DD4D 4FA3AAFC C704329E C7783B46
	CA: remote
	issuer: 
	subject: 
	expires: unknown
	pre-save command: 
	post-save command: 
	track: yes
	auto-renew: yes

So, not yet. More on this later.

April 20, 2016

Running Keystone Unit Tests against older Versions of RDO Etc

Just because upstream is no longer supporting Essex doesn’t mean that someone out there is not running it. So, if you need to backport a patch, you might find yourself in the position of having to run unit tests against an older version of Keystone (or another project) that does not run cleanly against the files installed by tox. For example, I tried running against an Icehouse-era checkout and got a slew of errors like this:

AssertionError: Environmental variable PATH_INFO is not a string: <type> (value: u’/v2.0/tokens/e6aed0a188f1402d9ad3586bc0e35758/endpoints’)

The basic steps are:

  1. Install the packages for the version closest to the one you want to test
  2. checkout your source from git and apply your patch
  3. install any extra rpms required to run the tests
  4. run the test using python -m unittest $TESTNAME
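Putting those steps together, the flow looks roughly like this; the release, repository URL, and test name are only illustrative:

# 1. install the packaged version closest to the target release
sudo yum install openstack-keystone

# 2. check out the matching source and apply the patch under test
git clone https://git.openstack.org/openstack/keystone
cd keystone
git checkout stable/icehouse
git am ~/my-backport.patch

# 3. install any extra RPMs the tests need (see below), then
# 4. run a single test against the system-installed libraries
python -m unittest keystone.tests.test_auth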

For RDO, the main RPMs can be installed from:

https://repos.fedorapeople.org/repos/openstack/

You might need additional RPMs as packaged in EPEL. You don’t, however, need to use an installer; you can use yum to install just the Keystone package.

The dependencies are a little tricky to solve. Tox uses the test-requirements.txt file in the Keystone repo to install them, but these names do not match up with the RPM package names. Often the RPM will be the name of the Python package with a “python-” prefix.
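For example, a couple of typical test-requirements.txt entries map to RPMs like this (an illustration only; the exact set you need depends on the release):

# "mock" and "testtools" from test-requirements.txt:
sudo yum install python-mock python-testtools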

Not all of the dependencies are in Fedora, RDO, or EPEL. Many were built just for CI, and are in https://copr.fedorainfracloud.org/coprs/abregman/.

For later releases, you can check out the jobs running in : https://ci.centos.org/view/rdo/view/promotion-pipeline/ and fetch the set of packages in “all_rpms.txt” but be aware that these are not the set of packages for unit tests. You might need more.

Not every package can be installed this way. For example, python-pysaml2 requires a bunch of additional RPMs that I had trouble pulling in. These can still be installed via pip.

April 17, 2016

Software end of life matters!
Anytime you work on a software project, the big events are always new releases. We love to get our update and see what sort of new and exciting things have been added. New versions are exciting, they're the result of months or years of hard work. Who doesn't love to talk about the new cool things going on?

There's a side of software that rarely gets talked about though, and honestly in the past it just wasn't all that important or exciting. That's the end of life. When is it time to kill off the old versions? Or sometimes even kill an entire project? When you do, what happens to the people using it? These are hard things to decide; there usually aren't good answers, and it's just not a topic we're good at yet.

I bring this up now because apparently Apple has decided that Quicktime on Windows is no longer a thing. I think everyone can agree that expecting users to find some obscure message on the Internet to know they should uninstall something is pretty far fetched.

The conversation is way bigger than just Apple though. Google is going to brick some old Nest hardware. What about all those old tablets that still work but have no security updates? What about all those Windows XP machines still out there? I bet there are people still using Windows 95!

In some instances, the software and hardware can be decoupled. If you're running XP you can probably upgrade to something slightly better (maybe). Generally speaking though, you have some level of control. If you think about tablets or IoT style devices, the software and hardware are basically the same thing. The software will likely end of life before the hardware stops working. So what does that mean? In the case of pure software, if you need it to get work done, you're not going to uninstall it. It's all really complex unfortunately which is why nobody has figured this out yet.

In the past, you could keep most "hardware" working almost forever. There are cars out there nearly 100 years old. They still work and can be fixed. That's crazy. The thought of 100 year old software should frighten you to your core. They may have stopped making your washing machine years ago, but it still works and you can get it fixed. We've all seen the power tools our grandfathers used.

Now what happens when we decide to connect something to the Internet? Now we've chained the hardware to the software. Software has a defined lifecycle. It is born, it lives, it reaches end of life. Physical goods do not have a predetermined end of life (I know, it's complicated, let's keep it simple), they break, you get a new one. If we add software to this mix, software that creates a problem once it's hit the end of life stage, what do we do? There are two options really.

1) End the life of the hardware (brick it)
2) Let the hardware continue to run with the known bad software.

Neither is ideal. Now there are some devices you could just cut off features. A refrigerator for example. Instead of knowing when to order more pickles it reverts back to only keeping things cold. While this could create confusion in the pickle industry, at least you still have a working device. Other things would be tricky. An internet connected smart house isn't very useful if the things can't talk to each other. A tablet without internet isn't good for much.

I don't have any answers, just questions. We're still trying to sort out what this all means I suspect. If you think you know the answer I imagine you don't understand the question. This one is turtles all the way down.

What do you think? Tell me: @joshbressers

April 13, 2016

Getting Started with Puppet for Keystone

Tripleo uses Puppet to manage the resources in a deployment. Puppet has a command line tool to look at resources.

On my deployed Overcloud, I have:

ls /etc/puppet/modules/keystone/lib/puppet/provider
keystone         keystone_domain_config      keystone_paste_ini  keystone_service  keystone_user_role
keystone_config  keystone_endpoint           keystone.rb         keystone_tenant
keystone_domain  keystone_identity_provider  keystone_role       keystone_user

So I can use the puppet CLI to query the state of my system, or make changes:

To look at the config:

sudo puppet resource keystone_config
keystone_config { 'DEFAULT/admin_bind_host':
  ensure => 'present',
  value  => '10.149.2.13',
}
keystone_config { 'DEFAULT/admin_port':
  ensure => 'present',
  value  => '35357',
}
keystone_config { 'DEFAULT/admin_token':
  ensure => 'present',
  value  => 'vtNheM6drk4mgKgbAtWQPrYJe',
}
keystone_config { 'DEFAULT/log_dir':
  ensure => 'present',
  value  => '/var/log/keystone',
}
...

OK, Admin Token is gross.

$ sudo puppet resource keystone_config DEFAULT/admin_token
keystone_config { 'DEFAULT/admin_token':
  ensure => 'present',
  value  => 'vtNheM6drk4mgKgbAtWQPrYJe',
}

Let’s get rid of that:

sudo puppet resource keystone_config DEFAULT/admin_token ensure=absent
Notice: /Keystone_config[DEFAULT/admin_token]/ensure: removed
keystone_config { 'DEFAULT/admin_token':
  ensure => 'absent',
}

Let’s add a user:

$ sudo puppet resource keystone_users
Error: Could not run: Could not find type keystone_users
[heat-admin@overcloud-controller-0 ~]$ 

Uh oh…what did I do?

[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_config DEFAULT/admin_token ensure=present value=vtNheM6drk4mgKgbAtWQPrYJe
Notice: /Keystone_config[DEFAULT/admin_token]/ensure: created
keystone_config { 'DEFAULT/admin_token':
  ensure => 'present',
  value  => 'vtNheM6drk4mgKgbAtWQPrYJe',
}
[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_user
keystone_user { 'admin':
  ensure  => 'present',
  email   => 'admin@example.com',
  enabled => 'true',
  id      => '7cbc569993ae41e7b2736ed2aa727644',
}
...

So it looks like the Puppet modules use the Admin token to do operations.

But I really want to get rid of that admin token…

Back on the undercloud, I have created a Keystone V3 RC file. I’m going to copy that to /root/openrc on the overcloud controller.

[stack@undercloud ~]$ scp overcloudrc.v3 heat-admin@10.149.2.13:
[stack@undercloud ~]$ ssh heat-admin@10.149.2.13
[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_config DEFAULT/admin_token ensure=absent
keystone_config { 'DEFAULT/admin_token':
  ensure => 'absent',
}
[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_user
Error: Could not run: Insufficient credentials to authenticate
[heat-admin@overcloud-controller-0 ~]$ sudo cp  overcloudrc.v3 /root/openrc
[heat-admin@overcloud-controller-0 ~]$ sudo puppet resource keystone_user
keystone_user { 'admin':
  ensure  => 'present',
  email   => 'admin@example.com',
  enabled => 'true',
  id      => '7cbc569993ae41e7b2736ed2aa727644',
}
...

Now let’s add a user:

$ sudo puppet resource keystone_user ayoung ensure=present email=ayoung@redhat.com enabled=true password=FreeIPA4All
Notice: /Keystone_user[ayoung]/ensure: created
keystone_user { 'ayoung':
  ensure  => 'present',
  email   => 'ayoung@redhat.com',
  enabled => 'false',
}
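If you want to clean up a test user created this way, the same resource interface works in reverse; a sketch following the pattern above:

$ sudo puppet resource keystone_user ayoung ensure=absent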

Big Shout out to Emilien Macchi who is the Master of Keystone Puppets and taught me about the openrc file.

April 12, 2016

What happened with Badlock?
Unless you live under a rock, you've heard of the Badlock security issue. It went public on April 12. Then things got weird.

I wrote about this a bit in a previous post. I mentioned there that this better be good. If it's not, people will get grumpy. People got grumpy.

The thing is, this is a nice security flaw. Whoever found it is clearly bright, and if you look at the Samba patchset, it wasn't trivial to fix. Hats off to those two groups.
$ diffstat -s samba-4.4.0-security-2016-04-12-final.patch 
 227 files changed, 14582 insertions(+), 5037 deletions(-)

Here's the thing though. It wasn't nearly as good as the hype claimed. It probably couldn't ever be as good as the hype claimed. This is like waiting for a new Star Wars movie. You have memories from being a child and watching the first few. They were like magic back then. Nothing that ever comes out again will be as good. Your brain has created ideas and memories that are too amazing to even describe. Nothing can ever beat the reality you built in your mind.

Badlock is a similar concept.

Humans are squishy irrational creatures. When we know something is coming one of two things happen. We imagine the most amazing thing ever which nothing will ever live up to (the end result here is being disappointed). Or we imagine something stupid which almost anything will be better than (the end result here is being pleasantly surprised).

I think most of us were expecting the most amazing thing ever. We had weeks to imagine what the worst possible security flaw affecting Samba and Windows could be. Most of us can imagine some pretty amazing things. We didn't get that though. We didn't get amazing. We got a pretty good security flaw, but not one that will change the world. We expected amazing, we got OK, now we're angry. If you look at Twitter, the poor guy who discovered this is probably having a bad day. Honestly, there probably wouldn't have been anything that could have lived up to the elevated expectations that were set.

All that said, I do think that announcing this weeks in advance created this atmosphere. If this had all stayed quiet until today, we would have been impressed, even if it had a name. Hype isn't something you can usually control. Some try, but by its very nature things get out of hand quickly and easily.

I'll leave you with two bits of wisdom you should remember.

  1. Name your pets, not your security flaws
  2. Never over-hype security. Always underpromise and overdeliver.

What do you think? Tell me: @joshbressers