Fedora Security Planet

Episode 130 - Chat with Snyk co-founder Danny Grander

Posted by Open Source Security Podcast on January 21, 2019 01:12 AM
Josh and Kurt talk to Danny Grander, one of the co-founders of Snyk, about Zip Slip: what it is, how to fix it, and how they disclosed everything. We also touch on plenty of other open source security topics, as Danny is involved in many aspects of open source security.




Show Notes


    SUSE Open Build Service cheat sheet

    Posted by William Brown on January 18, 2019 02:00 PM

    SUSE Open Build Service cheat sheet

    Part of starting at SUSE has meant that I get to learn about Open Build Service. I’ve known that the project existed for a long time but I have never had a chance to use it. So far I’m thoroughly impressed by how it works and the features it offers.

    As A Consumer

    The best part of OBS is that it’s trivial on OpenSUSE to consume content from it. Zypper can add projects with the command:

    zypper ar obs://<project name> <repo nickname>
    zypper ar obs://network:ldap network:ldap
    

    I like to make the repo nickname (your choice) the same as the project name so I know what I have enabled. Once you run this you can easily consume content from OBS.
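
    For example, after adding the network:ldap repository above, installing the 389-ds package from it could look like this (repository and package names as used elsewhere in this post):

    zypper ref network:ldap
    zypper in 389-ds
    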

    Package Management

    As someone who has started to contribute to the SUSE 389-ds package, I’ve been slowly learning how this workflow works. OBS, similar to GitHub/GitLab, allows a branching and request model.

    On OpenSUSE you will want to use the osc tool for your workflow:

    zypper in osc
    

    You can branch from an existing project to make changes with:

    osc branch <project> <package>
    osc branch network:ldap 389-ds
    

    This will branch the project to my home namespace. For me this will land in “home:firstyear:branches:network:ldap”. Now I can check out the content onto my machine to work on it.

    osc co <project>
    osc co home:firstyear:branches:network:ldap
    

    This will create the folder “home:…:ldap” in the current working directory.

    From here you can now work on the project. Some useful commands are:

    Add new files to the project (patches, new source tarballs, etc.):

    osc add <path to file>
    osc add feature.patch
    osc add new-source.tar.xz
    

    Edit the change log of the project (I think this is used in release notes?)

    osc vc
    

    Build your changes locally, matching the system you are on. Packages normally build on all/most OpenSUSE versions and architectures; this builds just for your local system and arch.

    osc build
    

    Commit your changes to the OBS server, where a complete build will be triggered:

    osc commit
    

    View the results of the last commit:

    osc results
    

    Enable people to use your branch/project as a repository. You edit the project metadata and enable repo publishing:

    osc meta prj -e <name of project>
    osc meta prj -e home:firstyear:branches:network:ldap
    
    # When your editor opens, change this section to enabled (disabled by default):
    <publish>
      <enabled />
    </publish>
    

    NOTE: In some cases, if you have the package already installed and you add the repo/update, it won’t install from your repo. This is because SUSE packages have a notion of “vendoring”: they continue to update from the same repo they were originally installed from. If you want to change this, use:

    zypper [d]up --from <repo name>
    

    You can then create a submit request to merge your branch changes back to the project origin:

    osc sr
    

    So far this is as far as I have gotten with OBS, but I already appreciate how great this workflow is for package maintainers, reviewers and consumers. It’s a pleasure to work with software this well built.

    Structuring Rust Transactions

    Posted by William Brown on January 18, 2019 02:00 PM

    Structuring Rust Transactions

    I’ve been working on a database-related project in Rust recently, which takes advantage of my concurrently readable datastructures. However, I ran into the problem of how to structure read/write transaction types that share the reader code and contain multiple inner read/write types.

    Some Constraints

    To be clear, there are some constraints. A “parent” write will only ever contain write transaction guards, and a read will only ever contain read transaction guards. This means we aren’t going to hit any deadlocks in the code, though Rust can’t protect us from mis-ordering locks. An additional requirement is that readers and a single writer must be able to proceed simultaneously - but having an rwlock-style writer-or-readers behaviour would still work here.

    Some Background

    To simplify this, imagine we have two concurrently readable datastructures. We’ll call them db_a and db_b.

    struct db_a { ... }
    
    struct db_b { ... }
    

    Now, each of db_a and db_b has their own way to protect their inner content, but they’ll return a DBReadGuard or DBWriteGuard when we call db_a.read() or db_a.write() respectively.

    impl db_a {
        pub fn read(&self) -> DBReadGuard {
            ...
        }
    
        pub fn write(&self) -> DBWriteGuard {
            ...
        }
    }
    

    Now we make a “parent” wrapper transaction such as:

    struct server {
        a: db_a,
        b: db_b,
    }
    
    struct server_read {
        a: DBReadGuard,
        b: DBReadGuard,
    }
    
    struct server_write {
        a: DBWriteGuard,
        b: DBWriteGuard,
    }
    
    impl server {
        pub fn read(&self) -> server_read {
            server_read {
                a: self.a.read(),
                b: self.b.read(),
            }
        }

        pub fn write(&self) -> server_write {
            server_write {
                a: self.a.write(),
                b: self.b.write(),
            }
        }
    }
    

    The Problem

    Now the problem is that on my server_read and server_write I want to implement a function for “search” that uses the same code. Search on a read or a write should behave identically! I also wanted to avoid the use of macros, as they can hide issues while stepping through a debugger like LLDB/GDB.

    Often the answer with Rust is “traits”: creating an interface that types adhere to. Rust also allows default trait implementations, which sounds like it could be a solution here.

    pub trait server_read_trait {
        fn search(&self) -> SomeResult {
            let result_a = self.a.search(...);
            let result_b = self.b.search(...);
            SomeResult(result_a, result_b)
        }
    }
    

    In this case, the issue is that &self in a trait is not aware of the fields in the struct - traits don’t define that fields must exist, so the compiler can’t assume they exist at all.

    Second, the type of self.a/b is unknown to the trait - because in a read it’s a “a: DBReadGuard”, and for a write it’s “a: DBWriteGuard”.

    The first problem can be solved by using get_field accessor functions in the trait. Rust will also compile these out as inlines, so the correct thing for the type system is also the optimal thing at run time. So we’ll update this to:

    pub trait server_read_trait {
        fn get_a(&self) -> ???;
    
        fn get_b(&self) -> ???;
    
        fn search(&self) -> SomeResult {
            let result_a = self.get_a().search(...); // note the change from self.a to self.get_a()
            let result_b = self.get_b().search(...);
            SomeResult(result_a, result_b)
        }
    }
    
    impl server_read_trait for server_read {
        fn get_a(&self) -> &DBReadGuard {
            &self.a
        }
        // get_b is similar, so omitted
    }
    
    impl server_read_trait for server_write {
        fn get_a(&self) -> &DBWriteGuard {
            &self.a
        }
        // get_b is similar, so omitted
    }
    

    So now we have the second problem remaining: for the server_write we have a DBWriteGuard, and for the read we have a DBReadGuard. There was a much longer experimentation process, but eventually the answer was simpler than I was expecting. Rust allows traits to declare associated types that must themselves implement a trait.

    So provided that DBReadGuard and DBWriteGuard both implement “DBReadTrait”, then we can give server_read_trait an associated type that enforces this. It looks something like:

    pub trait DBReadTrait {
        fn search(&self) -> ...;
    }
    
    impl DBReadTrait for DBReadGuard {
        fn search(&self) -> ... { ... }
    }
    
    impl DBReadTrait for DBWriteGuard {
        fn search(&self) -> ... { ... }
    }
    
    pub trait server_read_trait {
        type GuardType: DBReadTrait; // Say that GuardType must implement DBReadTrait
    
        fn get_a(&self) -> &Self::GuardType; // implementors must return that type implementing the trait.
    
        fn get_b(&self) -> &Self::GuardType;
    
        fn search(&self) -> SomeResult {
            let result_a = self.get_a().search(...);
            let result_b = self.get_b().search(...);
            SomeResult(result_a, result_b)
        }
    }
    
    impl server_read_trait for server_read {
        type GuardType = DBReadGuard;

        fn get_a(&self) -> &DBReadGuard {
            &self.a
        }
        // get_b is similar, so omitted
    }

    impl server_read_trait for server_write {
        type GuardType = DBWriteGuard;

        fn get_a(&self) -> &DBWriteGuard {
            &self.a
        }
        // get_b is similar, so omitted
    }
    

    This works! We now have a way to write a single “search” implementation for our server read and write types. In my case, the DBReadTrait also uses a similar technique to define a search type shared between the DBReadGuard and DBWriteGuard.
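
    To make the pattern concrete, here is a minimal, self-contained sketch of the approach (not the real project code - the guard types just wrap a String, and all names are illustrative):

    struct DbReadGuard {
        data: String,
    }

    struct DbWriteGuard {
        data: String,
    }

    // The shared read behaviour that both guard types provide.
    trait DbReadTrait {
        fn search(&self) -> usize;
    }

    impl DbReadTrait for DbReadGuard {
        fn search(&self) -> usize {
            self.data.len()
        }
    }

    impl DbReadTrait for DbWriteGuard {
        fn search(&self) -> usize {
            self.data.len()
        }
    }

    struct ServerRead {
        a: DbReadGuard,
        b: DbReadGuard,
    }

    struct ServerWrite {
        a: DbWriteGuard,
        b: DbWriteGuard,
    }

    // The parent trait: the associated type must implement DbReadTrait, so the
    // default search() below works for both read and write transactions.
    trait ServerReadTrait {
        type GuardType: DbReadTrait;

        fn get_a(&self) -> &Self::GuardType;
        fn get_b(&self) -> &Self::GuardType;

        fn search(&self) -> (usize, usize) {
            (self.get_a().search(), self.get_b().search())
        }
    }

    impl ServerReadTrait for ServerRead {
        type GuardType = DbReadGuard;
        fn get_a(&self) -> &DbReadGuard { &self.a }
        fn get_b(&self) -> &DbReadGuard { &self.b }
    }

    impl ServerReadTrait for ServerWrite {
        type GuardType = DbWriteGuard;
        fn get_a(&self) -> &DbWriteGuard { &self.a }
        fn get_b(&self) -> &DbWriteGuard { &self.b }
    }

    fn main() {
        let r = ServerRead {
            a: DbReadGuard { data: "hello".to_string() },
            b: DbReadGuard { data: "world".to_string() },
        };
        let w = ServerWrite {
            a: DbWriteGuard { data: "foo".to_string() },
            b: DbWriteGuard { data: "barbaz".to_string() },
        };
        // Both transaction types share the single default search() implementation.
        println!("{:?}", r.search());
        println!("{:?}", w.search());
    }
    

    Only the tiny get_a/get_b accessors differ between the two transaction types; the search logic lives in one place on the trait.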

    Security isn’t a feature

    Posted by Josh Bressers on January 15, 2019 03:48 PM

    As CES draws to a close, I’ve seen more than one security person complain that nobody at the show was talking about security. There were an incredible number of consumer devices unveiled, no doubt there is no security in any of them. I think we get caught up in the security world sometimes so we forget that the VAST majority of people don’t care if something has zero security. People want interesting features that amuse them or make their lives easier. Security is rarely either of these, generally it makes their lives worse so it’s an anti-feature to many.

    Now the first thing many security people think goes something like this: “if there’s no security they’ll be sorry when their lightbulb steals their wallet and dumps the milk on the floor!!!” The reality is that argument will convince nobody; it’s not even very funny, so they’re laughing at us, not with us. Our arguments by their very nature blame all the wrong people, and we try to scare them into listening to us. It’s never worked. Ever. That one time you think it worked, they were only pretending to care so you would go away.

    So it brings us to the idea that security isn’t a feature. Turning your lights on is a feature. Cooking you dinner is a feature. Driving your car is a feature. Not bursting into flames is not a feature. Well, it sort of is, but nobody talks about it. Security is a lot like the bursting into flames thing. Security really is about something not happening, and things not happening are the fundamental problem we have when we try to talk about all this. You can’t build a plausible story around an event that may or may not happen. Trying to build a narrative around something that may or may not happen is incredibly confusing. This isn’t how features work; features do positive things, they don’t not do negative things (I don’t even know if that’s right). Security isn’t a feature.

    So the question you should be asking then is how do we make the products being created contain more of this thing we keep calling security. The reality is we can’t make this happen given our current strategies. There are two ways products will be produced that are less insecure (see what I did there). The first is that the market demands it, which given current trends isn’t happening anytime soon; people just don’t care about security. The second is that a government creates regulations that demand it. Given the current state of the world’s governments, I’m not confident that will happen either.

    Let’s look at market demand first. If consumers decide that buying products that are horribly insecure is bad, they could start buying products with more built in security. But even the security industry can’t define what that really means. How can you measure which product has the best security? Consumers don’t have a way to know which products are safer. How to measure security could be a multi-year blog series so I won’t get into the details today.

    What if the government regulates security? We sort of end up in a similar place to consumer demand. How do we define security? It’s a bit like defining safety I suppose. We’re a hundred years into safety regulations and still get a lot wrong and I don’t think anyone would argue defining safety is much easier than defining security. Security regulation would probably follow a similar path. It will be decades before things could be good enough to create real change. It’s very possible by then the machines will have taken over (that’s the secret third way security gets fixed, perhaps a post for another day).

    So here we are again, things seem a bit doom and gloom. That’s not the intention of this post. The real purpose is to point out we have to change the way we talk about security. Yelling at vendors for building insecure devices isn’t going to ever work. We could possibly talk to consumers in a way that resonates with them, but does anyone buy the stove that promises to burst into flames the least? Nobody would ever use that as a marketing strategy. I bet it would have the opposite effect, a bit like our current behaviors and talking points I suppose.

    Complaining that companies don’t take security seriously hasn’t ever worked and never will work. They need an incentive to care, us complaining isn’t an incentive. Stay tuned for some ideas on how to frame these conversations and who the audience needs to be.

    Episode 129 - The EU bug bounty program

    Posted by Open Source Security Podcast on January 14, 2019 12:58 AM
    Josh and Kurt talk about the EU bug bounty program. There have been a fair number of people complaining it's solving the wrong problem, but it's the only way the EU has to spend money on open source today. If that doesn't change, this program will fail.




    Show Notes


      Episode 128 - Australia's encryption backdoor bill

      Posted by Open Source Security Podcast on January 07, 2019 12:41 AM
      Josh and Kurt talk about Australia's recently passed encryption bill. What is the law that was passed, what does it mean, and what are the possible outcomes? The show notes contain a flow chart of possible outcomes.



      Show Notes


        TripleO Networks from Simplest to Not-So-Simple

        Posted by Adam Young on January 03, 2019 05:27 PM

        If you read the TripleO setup for network isolation, it lists eight distinct networks. Why does TripleO need so many networks? Let’s take it from the ground up.

        WiFi to the Workstation


        I run Red Hat OpenStack Platform (OSP) Director, which is the productized version of TripleO.  Everything I say here should apply equally well to the upstream and downstream variants.

        My setup has OSP Director running in a virtual machine (VM). To get that virtual machine set up takes network connectivity. I perform this via wireless, as I move around the house with my laptop, and the workstation has a built in wireless card.

        Let’s start here: Director runs inside a virtual machine on the workstation.  It has complete access to the network interface card (NIC) via macvtap.  This NIC is attached to a Cisco Catalyst switch.  A wired cable to my laptop is also attached to the switch. This allows me to set up and test the first stage of network connectivity: SSH access to the virtual machine running in the workstation.

        Provisioning Network

        The Blue Network here is the provisioning network.  This reflects two of the networks from the TripleO document:

        • IPMI* (IPMI System controller, iLO, DRAC)
        • Provisioning* (Undercloud control plane for deployment and management)

        These two distinct roles can be served by the same network in my setup, and, in fact, they must be.  Why?  Because my Dell servers have a NIC that acts as both the IPMI endpoint and is the only NIC that supports PXE.  Thus, unless I wanted to do some serious VLAN wizardry, and get the NIC to switch both (tough to debug during the setup stage), I am better off with them both using untagged VLAN traffic.  Thus, each server is allocated two static IPv4 addresses, one to be used for IPMI, and one that will be assigned during the hardware provisioning.

        Apologies for the acronym soup.  It bothers me, too.

        Another way to think about the set of networks you need is via DHCP traffic.  Since the IPMI cards are statically assigned their IP addresses, they do not need a DHCP server.  But the hardware’s operating system will get its IP address from DHCP.  Thus, it is OK if these two functions share a network.

        This does not scale very well.  IPMI and iDRAC can both support DHCP, and that would be the better way to go in the future, but it is beyond the scope of what I am willing to mess with in my lab.

        Deploying the Overcloud

        In order to deploy the overcloud, the Director machine needs to perform two classes of network calls:

        1. SSH calls to the baremetal OS to launch the services, almost all of which are containers.  This is on the Blue network above.
        2. HTTPS calls to the services running in those containers.  These services also need to be able to talk to each other.  This is on the Yellow internal API network above.  I didn’t color code “Yellow” as you can’t read it.  Yellow.

        Internal (not) versus External

        You might notice that my diagram has an additional network; the External API network is shown in Red.

        Provisioning and calling services are two very different use cases.  The most common API call in OpenStack is POST https://identity/v3/auth/token.  This call is made prior to any other call.  The second most common is the call to validate a token.  The create token call needs to be accessible from everywhere that OpenStack is used.  The validate token call does not.  But, if the API server only listens on the same network that is used for provisioning, that means the network is wide open; people that should only be able to access the OpenStack APIs can now send network attacks against the IPMI cards.

        To split this traffic, either the network APIs need to listen on both networks, or the provisioning needs to happen on the external API network. Either way, both networks are going to be set up when the overcloud is deployed.

        Thus, the Red Server represents the API servers that are running on the controller, and the yellow server represents the internal agents that are running on the compute node.

        Some Keystone History

        When a user performs an action in the OpenStack system, they make an API call.  This request is processed by the webserver running on the appropriate controller host.  There is no difference between a Nova server requesting a token and a project member requesting a token, but these were seen as separate use cases and were put on separate network ports.  The internal traffic was on port 35357, and the project member traffic was on port 5000.

        It turns out that running on two different ports of the same IP address does not solve the real problem people were trying to solve.  They wanted to limit API access via network, not by port.  Thus, there really was no need for two different ports, but rather two different IP addresses.

        This distinction still shows up in the Keystone service catalog, where Endpoints are classified as External or Internal.

        Deploying and Using a Virtual Machine

        Now our diagram has gotten a little more complicated.  Let’s start with the newly added red laptop, attached to the External API network.  This system is used by our project member, and is used to create the new virtual machine via the compute create_server API call. In order:

        1. The API call comes from the outside world, travels over the Red external API network to the Nova server (shown in red)
        2. Nova posts messages to the queue, which are eventually picked up and processed by the compute agent (shown in yellow).
        3. The compute agent talks back to the other API servers (also shown in Red) to fetch images, create network ports, and connect to storage volumes.
        4. The new VM (shown in green) is created and connects via an internal, non-routable IP address to the metadata server to fetch configuration data.
        5. The new VM is connected to the provider network (also shown in green).

        At this point, the VM is up and running.  If an end user wants to connect to it they can do so.  Obviously, the Provider network does not run all the way through the router to the end users system, but this path is the “open for business” network pathway.

        Note that this is an instance of a provider network as Assaf defined in his post.

        Tenant Networks

        Let’s say you are not using a provider network.  How does that change the setup?  First, let’s re-label the Green network to be the “External Network.”  Notice that the virtual machines do not connect to it now.  Instead, they connect via the new, purple networks.

        Note that the Purple networks connect to the external network in the network controller node, shown in purple on the bottom server.  This service plays the role of a router, converting the internal traffic on the tenant network to the external traffic.  This is where the Floating IPs terminate, and are mapped to an address on the internal network.

        Wrap Up

        The TripleO network story has evolved to support a robust configuration that splits traffic into its component segments.  The diagrams above attempt to pass along my understanding of how they work, and why.

        I’ve left off some of the story, as I do not show the separate networks that can be used for storage.  I’ve collapsed the controllers and agents into a simple block to avoid confusing detail; my goal is accuracy, but here it sacrifices precision.  It also only shows a simple rack configuration, much like the one here in my office.  The concepts presented should allow you to understand how it would scale up to a larger deployment.  I expect to talk about that in the future as well.

        I’ll be sure to update  this article with feedback. Please let me know what I got wrong, and what I can state more clearly.

        Remotely Provisioning a Virtual Machine using Ansible and Libvirt

        Posted by Adam Young on January 03, 2019 03:15 PM

        Ansible exists to help automate the time-consuming, repeated tasks that technologists depend upon. One very common job is to create and tear down a virtual machine. While cloud technologies have made this possible to perform remotely, there are many times when I’ve needed to set up and tear down virtual machines on systems that were standalone Linux servers. In this case, the main interfaces to the machine are ssh and libvirt. I recently worked through an Ansible role to set up and tear down a virtual machine via libvirt, and I’d like to walk through it and record my reasons for some of the decisions I made.

        Constant Refactoring

        Always work from success.  Change one thing at a time, so that you know what broke when things don’t work.  Thus, when I work something out, the first iteration is hard coded.  I get it to work, and then I clean it up.  The most common refactoring is to introduce a variable.  For example, if I am working with a file such as:

         

        /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2

        I’ll use exactly that line in the Ansible play to start, such as:

        - name: push base vm image to hypervisor
          copy:
            src: /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2
            dest: /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2
            owner: qemu
            group: qemu
            mode: u=rw,g=r,o=r

        Once I get that to work, I’ll clean it up to something like:

        - name: push base vm image to hypervisor
          copy:
            src: "{{ source_image_dir }}/{{ source_image_file }}"
            dest: "{{ target_image_dir }}/{{ source_image_file }}"
            owner: qemu
            group: qemu
            mode: u=rw,g=r,o=r

        With the definition of the variables going into the role’s defaults/main.yml file.

        Customizing the VM Backing Store image

        The backing store for the virtual machine is created by copying the original VM image file to a new file, and then using virt-customize to modify the image. This is a little expensive in terms of disk space; I could, instead, clone the original file and use the qcow aspect of it to provide the same image base to all of the VMs generated from it. I might end up doing that in the long run. However, that does put cross-file dependencies in place. If I do something to the original file, I lose all of the VMs built off it. If I want to copy the VM to a remote machine, I would have to copy both files and keep them in the same directory. I may end up doing some of that in the future, if disk space becomes an issue.
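
        For reference, a minimal sketch of that backing-file alternative using qemu-img (this is not what the role currently does, and the overlay file name here is illustrative):

        # Create a thin qcow2 overlay that records only the deltas, leaving the
        # downloaded base image untouched.
        qemu-img create -f qcow2 \
            -b /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2 \
            -F qcow2 \
            /var/lib/libvirt/images/my-new-vm.qcow2
        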

        The virtual machine base image

        The above code block shows how I copy the raw image file over to the hypervisor. I find that I am often creating multiple VMs off of the same base file. While I could customize this file directly, it would then no longer match the fingerprint of the file I downloaded from the Red Hat site, and I would have no way to confirm I was using a safe image. Also, copying the file to the remote machine is one of the longest tasks in this playbook, so I do not remove it in the cleanup task list.

        Templatization of files to modify the base image

        Before I can modify the VM image, I need to copy templated files from the Ansible host to the remote system. This two-step process is necessary, as I cannot fill in a template during the call to virt-customize. Thus, all templatization is done in the template tasks. For this script, I use the /tmp directory as the interim location. This could be problematic, and I would be better off creating a deliberate subdirectory under /home/cloud-user or another known location. That would be safer, and less likely to have a conflict.

        Network card access and Static IP Addresses

        The virtual machine I am building is going to have to work with both a PXE service and also be available to physical machines outside the cluster. As such, I want it to have a network interface linked to a physical one on its host, and to assign that interface a static IP address. The physical passthrough is handled by making the device into a macvtap device. The XML Fragment for it looks like this:

            
        
            <interface type="direct">
              <mac address="52:54:00:26:29:db">
              <source dev="em1" mode="bridge">
              <model type="virtio">
              
        </interface>

        The flat naming of the variable ethernet_device will be problematic over time, and I will probably make it a dictionary value under the with_items collection.

        To assign this device a static IP address, I copied an ifcfg-eth1 file and templatized it.

        Multiple Inventories

        I have a fairly powerful laptop that is supposed to work as a portable demo machine. I want to be able to use the same playbook to deploy VMs on the laptop as I do on the workstation I’ve been testing this on. On my laptop, I typically run with sshd disabled, and only enable it when I want to run this or similar Ansible playbooks.

        Part of the constant refactoring is moving variables from the tasks, to defaults, to the inventory files.

        More and more, my inventory setup is starting to look like Ansible Tower. Eventually, I expect to have something like the template mechanism to be able to track “run this playbook with that inventory and these specific values.”

        Creating servers in a loop

        While my current task requires only a single VM, eventually I am going to want two or more. This means that I need to create the set of servers in a loop. This actually ends up flowing into all tasks that modify the base image. This is one case where constant refactoring comes in, but also where I show I can easily break the multi-inventory setup. For example, the addresses that are hard coded into the XML fragment above really need to vary per host. Thus, that fragment should look like this:

           
            <interface type="direct">
              <mac address="{{ item.mac }}">
              <source dev="em1" mode="bridge">
              <model type="virtio">
              
        </interface>

        And the ethernet configuration should look like this:

           
        TYPE=Ethernet
        PROXY_METHOD=none
        BROWSER_ONLY=no
        BOOTPROTO=none
        IPADDR={{ item.static_ip_address }}
        PREFIX=24
        GATEWAY=10.127.0.1
        DEFROUTE=yes
        IPV4_FAILURE_FATAL=no
        IPV6INIT=yes
        IPV6_AUTOCONF=yes
        IPV6_DEFROUTE=yes
        IPV6_FAILURE_FATAL=no
        IPV6_ADDR_GEN_MODE=stable-privacy
        NAME=eth1
        DEVICE=eth1
        ONBOOT=yes
        ZONE=public
        DNS1=10.127.0.7
        PEERDNS=no
        UUID={{ item.uuid }}
         

        …and that still hard codes some values. The collection that I iterate through to create the servers now needs these additional keys. Thus, my default file should look like this:

        ---
        cluster_hosts:
          - {name: passimian, uuid: 9c92fad9-6ecb-3e6c-eb4d-8a47c6f50c0, static_ip_address: 10.127.0.3, mac: 52:54:00:26:29:db }
        

        The task for copying in the network configuration currently looks like this:

        - template:
            src: ifcfg-eth1.j2
            dest: '{{ hypervisor_keystore_dir }}/ifcfg-eth1'
        

        It will have to be modified to:

        - template:
            src: ifcfg-eth1.j2
            dest: '{{ hypervisor_keystore_dir }}/{{ item.name }}-ifcfg-eth1'
          with_items: "{{ cluster_hosts }}"
        

        And the configuration of the VM image would also have to reflect this. Currently the call is:

        -command  'id -u cloud-user &>/dev/null || /usr/sbin/useradd -u 1000 cloud-user'  --ssh-inject cloud-user:file:/tmp/authorized_keys   --hostname {{ item.name }}.home.younglogic.net   --copy-in {{ hypervisor_keystore_dir }}/ifcfg-eth1:/etc/sysconfig/network-scripts  --selinux-relabel
          with_items: "{{ cluster_hosts }}"
        

        The flag would need to be updated to

         --copy-in {{ hypervisor_keystore_dir }}/{{ item.name }}-ifcfg-eth1:/etc/sysconfig/network-scripts
        

        Since I start by making the changes in defaults/main.yml, I only have to make them once. Once I push the cluster_hosts definition to the inventory files, refactoring gets harder: I cannot atomically make a change without breaking one of the configurations. Once I have more than one system using this playbook, adding parameters this way introduces a non-backwards-compatible change.

        Conclusion

        Like much system administration work, this task is going to be used before it is completed. It is perpetually a work-in-progress. This is healthy. As soon as we start getting afraid of our code, it calcifies and breaks. Even worse, we treat the code as immutable, and build layers around it, making simple things more complicated. These notes serve to remind me (and others) why things look the way they do, and where things are going. Hopefully, when time comes to change things in the future, these notes will help this code base grow to match the needs I have for it.

        G#

        Posted by Adam Young on January 02, 2019 09:52 PM

        G# is a magic note.  It takes the vanilla, banal, bland sound of a major scale and makes it into music. Here’s how.

        Listen to the first line of Fur Elise, by Beethoven. Focus on the left hand, and note where he added a G sharp.
        “Fur Elise (Opening)” by Ludwig Von Beethoven (score: https://musescore.com/user/23259686/scores/5376166/embed)

        Now read and listen to the start of Invention 13 by Bach. Again, pay attention to the G sharps, but also to the sound where he leaves them as G natural.

        “bach-invention-13-start” by J.S. Bach (score: https://musescore.com/user/23259686/scores/5376172/embed)

        Both pieces are nominally in A minor (relative to C Major) but make heavy use of modulation to G Sharp.

        Let’s start with the basic C scale:

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb1f7',], "�X:1�K:C�L:1/4

        CDEFGABc�".replace(/\x01/g,"\n"), {}, {}, {});</script>

        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb270',], "�X:1�K:C�L:1/4

        CDEFGABc�".replace(/\x01/g,"\n"), {}, {});</script>

        For every major scale, there is a relative minor. The relative minor scale is created by playing the major scale but starting and ending on the 6th note. For the Key of C major, the relative minor is A minor. It has no accidentals.

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb2df',], "�X:1�K:C�L:1/4

        ABcdefga�".replace(/\x01/g,"\n"), {}, {}, {});</script>

        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb34b',], "�X:1�K:C�L:1/4

        ABcdefga�".replace(/\x01/g,"\n"), {}, {});</script>

        The harmonic minor scale is created by lifting the G one half step:

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb3b6',], "�X:1�K:C�L:1/4

        ABcdef^ga�".replace(/\x01/g,"\n"), {}, {}, {});</script>

        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb420',], "�X:1�K:C�L:1/4�ABcdef^ga�".replace(/\x01/g,"\n"), {}, {});</script>

        By raising that note, it creates the interval of an augmented second (enharmonically a minor third) between the F and the G#. This strongly emphasizes the minor effect.

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb48a',], "�X:1�K:C�L:1/4�f4^g4�".replace(/\x01/g,"\n"), {}, {}, {});</script>
        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb4f5',], "�X:1�K:C�L:1/4

        f4^g4�".replace(/\x01/g,"\n"), {}, {});</script>

        When we start working with the relative minor, using the G sharp converts what was originally an E minor chord into an E major chord.

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb55f',], "�X:1�K:C�L:1/4�[egb]4[e^gb]4�".replace(/\x01/g,"\n"), {}, {}, {});</script>
        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb5c9',], "�X:1�K:C�L:1/4�[egb]4[e^gb]4�".replace(/\x01/g,"\n"), {}, {});</script>

        Conversion to the blues scale.

        This sound is used for so much more than just Baroque. Let’s go back to our C scale, but this time play it from D to D. This is called the Dorian mode.

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb632',], "�X:1�K:C�L:1/4�DEFGABcd�".replace(/\x01/g,"\n"), {}, {}, {});</script>
        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb69c',], "�X:1�K:C�L:1/4�DEFGABcd�".replace(/\x01/g,"\n"), {}, {});</script>

        If we drop out the E and the B, we end up with a minor pentatonic scale.

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb706',], "�X:1�K:C�L:1/4�DFGAcd�".replace(/\x01/g,"\n"), {}, {}, {});</script>
        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb76f',], "�X:1�K:C�L:1/4�DFGAcd�".replace(/\x01/g,"\n"), {}, {});</script>

        If we add in that G# again, we get a blues scale.

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb7da',], "�X:1�K:C�L:1/4�DFG^GAcd�".replace(/\x01/g,"\n"), {}, {}, {});</script>
        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb843',], "�X:1�K:C�L:1/4�DFG^GAcd�".replace(/\x01/g,"\n"), {}, {});</script>

        If we rotate back to the root position, we have a major blues scale:

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb8ad',], "�X:1�K:C�L:1/4�CDFG^GAc�".replace(/\x01/g,"\n"), {}, {}, {});</script>
        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb916',], "�X:1�K:C�L:1/4�CDFG^GAc�".replace(/\x01/g,"\n"), {}, {});</script>

        Back in the late 1930s, jazz musicians were looking for ways to get their lines of eighth notes to flow. The problem is that a major scale has 7 distinct pitches in it, but a measure has 8 spaces to fill. This means that a pattern of eighth notes does not fall out on the same downbeat after a measure. Note where the chord tones fall in the following phrase.

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bb983',], "�X:1�K:C�L:1/8

        | CDEF GABc | defg fedc |Z8|

        ".replace(/\x01/g,"\n"), {}, {}, {});</script>

        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bb9ee',], "�X:1�K:C�L:1/8

        | CDEF GABc | defg fedc |Z8|�".replace(/\x01/g,"\n"), {}, {});</script>

        For the first 7 beats, the C Maj 7 chord tones are on the downbeats: C on 1, E on 2, G on 3, and B on 4. But the C repeats on the upbeat of 4, and the downbeat of one in the second measure is, again, a non-chord tone. This is much closer to a D minor line than a C major line.

        If we add in the G sharp, the line now falls out so that all the major chord tones are on downbeats. We adjust our expectation so that A (the sixth) becomes the fourth chord tone.

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bba5a',], "�X:1�K:C�L:1/8

        | CDEF G^GAB | cdef g^gab |Z8|

        ".replace(/\x01/g,"\n"), {}, {}, {});</script>

        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bbac5',], "�X:1�K:C�L:1/8

        | CDEF G^GAB | cdef g^gab |Z8|�".replace(/\x01/g,"\n"), {}, {});</script>

        This works for the minor and seventh chords as well. A D minor riff:

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bbb33',], "�X:1�K:C�L:1/8

        | DEFG ^GABc | defg ^g=gfe |Z8|

        ".replace(/\x01/g,"\n"), {}, {}, {});</script>

        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bbba0',], "�X:1�K:C�L:1/8�| DEFG ^GABc | defg ^g=gfe |Z8|�".replace(/\x01/g,"\n"), {}, {});</script>

        A G7 riff:

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bbc0c',], "�X:1�K:C�L:1/8

        | G^GAB cdef | G^GAB cdef |Z8|

        ".replace(/\x01/g,"\n"), {}, {}, {});</script>

        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bbc76',], "�X:1�K:C�L:1/8�| G^GAB cdef |G^GAB cdef |Z8|�".replace(/\x01/g,"\n"), {}, {});</script>

        All this by adding the blues note.

        Of course, this is only for the key of C. The same relationship works in any key: you take the fifth tone and sharp it. For example, in the key of G, you sharp the D.

        <script type="text/javascript">ABCJS.renderAbc(['abc-paper-5c3fb773bbce0',], "�X:1�K:G�L:1/4�| G A B c | d^def| g4 |�".replace(/\x01/g,"\n"), {}, {}, {});</script>
        <script type="text/javascript">ABCJS.renderMidi(['abc-midi-5c3fb773bbd4a',], "�X:1�K:G�L:1/4�| G A B c | d^def| g4 |�".replace(/\x01/g,"\n"), {}, {});</script>

        I tried to illuminate this concept on the saxophone a while back. There is a gaffe here where I mixed up the keys, but it does make the point.
        Video: https://www.youtube.com/embed/XgICg4r7MdY

        Useful USG pro 4 commands and hints

        Posted by William Brown on January 01, 2019 02:00 PM

        Useful USG pro 4 commands and hints

        I’ve recently changed from a FreeBSD VM as my router to a Ubiquiti PRO USG4. It’s a solid device, with many great features, and I’m really impressed at how it “just works” in many cases. So far my only disappointment is the lack of documentation about the CLI, especially for debugging and auditing what is occurring in the system, and for troubleshooting steps. This post will aggregate some of my knowledge about the topic.

        Current config

        Show the current config with:

        mca-ctrl -t dump-cfg
        

        You can show system status with the “show” command. Pressing ? will cause the current completion options to be displayed. For example:

        # show <?>
        arp              date             dhcpv6-pd        hardware
        

        DNS

        The following commands show the DNS statistics and the DNS configuration, and allow changing the cache-size. The cache-size is measured in number of records cached, rather than KB/MB. To make this permanent, you need to apply the change to config.json in your controller’s sites folder.

        show dns forwarding statistics
        show system name-server
        set service dns forwarding cache-size 10000
        

        Logging

        You can see an aggregate of system logs with:

        show log
        

        Note that when you set firewall rules to “log on block” they go to dmesg, not syslog, so you need to check dmesg for these.

        It’s a great idea to forward your logs in the controller to a syslog server, as this allows you to aggregate and see all the events occurring in a single time series (great when I was diagnosing an issue recently).

        Interfaces

        To show the system interfaces:

        show interfaces
        

        To restart your pppoe dhcp6c:

        release dhcpv6-pd interface pppoe0
        renew dhcpv6-pd interface pppoe0
        

        There is a current issue where the firmware will start dhcp6c on eth2 and pppoe0, but the session on eth2 blocks the pppoe0 client. As a result, you need to release on eth2, then renew on pppoe0.

        If you are using a dynamic prefix rather than static, you may need to reset your dhcp6c duid.

        delete dhcpv6-pd duid
        

        To restart an interface with the vyatta tools:

        disconnect interface pppoe
        connect interface pppoe
        

        OpenVPN

        I have set up customised OpenVPN tunnels. To show these:

        show interfaces openvpn detail
        

        These are configured in config.json with:

        # Section: config.json - interfaces - openvpn
            "vtun0": {
                    "encryption": "aes256",
                    # This assigns the interface to the firewall zone relevant.
                    "firewall": {
                            "in": {
                                    "ipv6-name": "LANv6_IN",
                                    "name": "LAN_IN"
                            },
                            "local": {
                                    "ipv6-name": "LANv6_LOCAL",
                                    "name": "LAN_LOCAL"
                            },
                            "out": {
                                    "ipv6-name": "LANv6_OUT",
                                    "name": "LAN_OUT"
                            }
                    },
                    "mode": "server",
                    # By default, ubnt adds a number of parameters to the CLI, which
                    # you can see with ps | grep openvpn
                    "openvpn-option": [
                            # If you are making site to site tunnels, you need the ccd
                            # directory, with hostname for the file name and
                            # definitions such as:
                            # iroute 172.20.0.0 255.255.0.0
                            "--client-config-dir /config/auth/openvpn/ccd",
                            "--keepalive 10 60",
                            "--user nobody",
                            "--group nogroup",
                            "--proto udp",
                            "--port 1195"
                    ],
                    "server": {
                            "push-route": [
                                    "172.24.0.0/17"
                            ],
                            "subnet": "172.24.251.0/24"
                    },
                    "tls": {
                            "ca-cert-file": "/config/auth/openvpn/vps/vps-ca.crt",
                            "cert-file": "/config/auth/openvpn/vps/vps-server.crt",
                            "dh-file": "/config/auth/openvpn/dh2048.pem",
                            "key-file": "/config/auth/openvpn/vps/vps-server.key"
                    }
            },
        

        The idea of CI and Engineering

        Posted by William Brown on January 01, 2019 02:00 PM

        The idea of CI and Engineering

        In software development I see an interesting trend and push towards continuous integration, continual testing, and testing in production. These techniques are designed to allow faster feedback on errors, use real data for application testing, and to deliver features and changes faster.

        But is that really how people use software on devices? When we consider an operation like Google or Amazon, this always-online technique may work, but what happens when we apply a continuous integration and “we’ll patch it later” mindset to devices like phones or the internet of things?

        What happens in other disciplines?

        In real engineering disciplines like aviation or construction, techniques like this don’t really work. We don’t continually build bridges, then fix them when they break or collapse. There are people who provide formal analysis of materials and their characteristics. Engineers consider careful designs, constraints, loads and situations that may occur. The structure is planned, reviewed and verified mathematically. Procedures and oversight are applied to ensure correct building of the structure. Lessons are learnt from past failures and incidents and are applied to every layer of the design and construction process. Communication between engineers and many other people is critical to the process. Concerns are always addressed and managed.

        The first thing to note is that if we just built lots of scale-model bridges and continually broke them until we found their limits, this would waste many resources. Bridges are carefully planned and proven.

        So whats the point with software?

        Today we still have a mindset that continually breaking and building is a reasonable path to follow. It’s not! It means that the only way to achieve quality is to have a large test suite (which requires people and time to write), which has to be further derived from failures (and those failures can negatively affect real people), and then we have to apply large amounts of electrical energy to continually run the tests. The test suites can’t even guarantee complete coverage of all situations and occurrences!

        This puts CI techniques out of reach of many application developers due to time and energy (translated to dollars) limits. Services like Travis on GitHub certainly help to lower the energy requirement, but they don’t remove the time and test-writing requirements.

        No matter how many tests we have for a program, if that program is written in C or something else, we continually see faults and security/stability issues in that software.

        What if we CI on … a phone?

        Today we even have hardware devices that are approached as though “test in production” is a reasonable thing. It’s not! People don’t patch, telcos don’t allow updates out to users, and those that are aware have to do custom ROM deployment. This creates an odd dichotomy of “haves” and “have nots”: those with the technical know-how have a better experience, and the “have nots” have to suffer potentially insecure devices. This is especially terrifying given how deeply personal phones are.

        This is a reality of our world. People do not patch. They do not patch phones, laptops, network devices and more. Even enterprises will avoid patching if possible. Rather than trying to shift the entire culture of humans to “update always”, we need to write software that can cope in harsh conditions for the long term. We only need to look to software in aviation to see we can absolutely achieve this!

        What should we do?

        I believe that for software developers to properly become software engineers we should look to engineers in civil and aviation industries. We need to apply:

        • Regulation and ethics (Safety of people is always first)
        • Formal verification
        • Consider all software will run long term (5+ years)
        • Improve team work and collaboration on designs and development

        The reality of our world is people are deploying devices (routers, networks, phones, lights, laptops and more …) where they may never be updated or patched in their service life. Even I’m guilty (I have a modem that’s been unpatched for about 6 years, but it’s pretty locked down …). As a result we need to rely on proof that the device cannot fail, established at build time, rather than patching it later, which may never occur! Putting formal verification first, and always considering user safety and rights first, shifts a large burden to us in terms of time. But many tools (Coq, F*, Rust …) make formal verification more accessible to use in our industry. Verifying our software is a far stronger assertion of quality than “throw tests at it and hope it works”.

        Over time our industry will evolve, and it will become easier and more cost effective to formally verify than to operate and deploy CI. This doesn’t mean we don’t need tests - it means that the first line of quality should be in verification of correctness using formal techniques rather than using tests and CI to prove correct behaviour. Tests are certainly still required to assert further behavioural elements of software.

        Conclusion

        Over time our industry must evolve to put the safety of humans first. To achieve this we must look to other safety-driven cultures such as aviation and civil engineering. Only by learning from their strict disciplines and behaviours can we start to provide software that matches the behavioural and quality expectations humans have.

        Misguided misguidings over the EU bug bounty

        Posted by Josh Bressers on December 30, 2018 03:03 PM

        The EU recently announced they are going to sponsor a security bug bounty program for 14 open source projects in 2019. There has been quite a bit of buzz about this program in all the usual places. The opinions are all over the place. Some people wonder why those 14, some wonder why not more. Some think it’s great. Some think it’s a horrible idea.

        I don’t want to focus too much on the details as they are unimportant in the big picture. Which applications are part of the program doesn’t really matter. What matters is why we are here today and where this should go in the future.

        There are plenty of people claiming that a security bug bounty isn’t fair - that we need to be paying the project developers, the people who are going to fix the bugs found by the bug bounty. Why are we only paying the people who find the bugs? This is the correct question, but it’s not correct for the reasons most think it is.

        There are a lot of details to unpack about all this and I don’t want to write a novel to explain all the nuance and complication around what’s going to happen. The TL;DR is basically this: The EU doesn’t have a way to pay the projects today, but they do have a way to pay security bug bounties.

        Right now if you want to pay a particular project, who do you send the check to? In some cases like the Apache Software Foundation it’s quite clear. In other cases when it’s some person who publishes a library for fun, it’s not clear at all. It may even be illegal in some cases, sending money across borders can get complicated very quickly. I’ll give a shoutout to Tidelift here, I think they’re on the right path to make this happen. The honest truth is it’s really really hard to give money to the open source projects you use.

        Now, the world of bug bounties has figured out a lot of these problems. They’ve gotten pretty good at paying people in various countries. Making sure the people getting paid are vetted in some way. And most importantly, they give an organization one place to send the check and one place to hold accountable. They’ve given us frameworks to specify who gets paid and for what. It’s quite nice really. I wrote some musings about this a few years ago. I still mostly agree with my past self.

        So what does this all really mean is the big question. The EU is doing the only thing they can do right now. They have money to throw at the problem, the only place they can throw it today is a bug bounty, so that’s what they did. I think it’s great. Step one is basically admitting you have a problem.

        Where we go next is the real question. If nothing changes and bug bounties are the only way to spend money on open source, this will fizzle out as there isn’t going to be a massive return on investment. The projects are already overworked, they don’t need a bunch of new bugs to fix. We need a “next step” that will give the projects resources. Resources aren’t always money, sometimes it’s help, sometimes it’s gear, sometimes it’s pizza. An organization like the EU has money, they need help turning that into something useful to an open source project.

        I don’t know exactly what the next few steps will look like, but I do know the final step is going to be some framework that lets different groups fund open source projects. Some will be governments, some will be companies, some might even be random people who want to give a project a few bucks.

        Everyone is using open source everywhere. It’s woven into the fabric of our most critical infrastructures. It’s imperative we find a way to ensure it has the care and feeding it needs. Don’t bash this bug bounty program for being short sighted, praise it for being the first step of a long journey.

        On that note, if you are part of any of these projects (or any project really) and you want help dealing with security reports, get in touch, I will help you with security patches, advisories, and vulnerability coordination. I know what sort of pain you’ll have to deal with, open source security can be even less rewarding than running an open source project 🙂

        Nextcloud and badrequest filesize incorrect

        Posted by William Brown on December 30, 2018 02:00 PM

        Nextcloud and badrequest filesize incorrect

        My friend came to my house and was trying to share some large files with my nextcloud instance. Part way through the upload an error occurred.

        "Exception":"Sabre\\DAV\\Exception\\BadRequest","Message":"expected filesize 1768906752 got 1768554496"
        

        It turns out this error can be caused by many sources. It could be timeouts, bad requests, network packet loss, incorrect nextcloud configuration or more.

        We tried uploading larger files (by a factor of ten) and they worked. This eliminated timeouts as a cause, and probably network loss. Being on ethernet directly connected to the server, rather than going over the internet, also helps to rule out packet loss as a cause.

        We also knew that the server must not have been misconfigured because a larger file did upload, so no file or resource limits were being hit.

        This also indicated that the client was likely doing the right thing because larger and smaller files would upload correctly. The symptom now only affected a single file.

        At this point I realised: what if the client and server were both victims of a lower level issue? I asked my friend to ls the file and read me its length in bytes. It was 1768906752, matching what nextcloud expected.

        Then I asked him to cat that file into a new file, and to tell me the length of the new file. Cat encountered an error, but ls on the new file indeed showed a size of 1768554496. That means filesystem corruption! What could have led to this?
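
        If you want to reproduce that comparison yourself, a minimal sketch looks like this (the file names are placeholders):

        ls -l bigfile.bin              # size recorded in the filesystem metadata
        cat bigfile.bin > copy.bin     # cat stops early if a read fails
        ls -l copy.bin                 # compare against the original size
        cmp bigfile.bin copy.bin       # byte-for-byte check, reports the first difference

        A mismatch between the size the metadata reports and the number of bytes you can actually read back points at the filesystem rather than the application.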

        HFS+

        Apple’s legacy filesystem (and the reason I stopped using macs) is well known for silently eating files and corrupting content. Here we had yet another case of that damage occurring, and triggering errors elsewhere.

        Bisecting these issues and eliminating possibilities through a scientific method is always the best way to resolve the cause, and it may come from surprising places!

        2018 Christmas Special - Is Santa GDPR compliant?

        Posted by Open Source Security Podcast on December 24, 2018 01:00 AM
        Josh and Kurt talk about which articles of the GDPR apply to Santa, and if he's following the rules the way he should be (spoiler, he's probably not). Should Santa be on his own naughty list? We also create a new holiday character - George the DPO Elf!


        <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7999541/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

        Show Notes

        Identity ideas …

        Posted by William Brown on December 20, 2018 02:00 PM

        Identity ideas …

        I’ve been meaning to write this post for a long time. Taking half a year away from the 389-ds team, and exploring a lot of ideas from other projects has led me to come up with some really interesting ideas about what we do well, and what we don’t. I feel like this blog could be divisive, as I really think that for our services to stay relevant we need to make changes that really change our own identity - so that we can better represent yours.

        So strap in, this is going to be long …

        What’s currently on the market

        Right now the market for identity has two extremes. At one end we have the legacy “create your own” systems, built on technologies like LDAP and Kerberos. I’m thinking about things like 389 Directory Server, OpenLDAP, Active Directory, FreeIPA and more. These all happen to be constrained heavily by complexity, fragility, and administrative workload. You need to spend months to learn these and even still, you will make mistakes and there will be problems.

        At the other end we have hosted “Identity as a Service” options like Azure AD and Auth0. These have, very intelligently, unbound themselves from legacy, and tend to offer HTTP APIs, 2FA and other features that “just work”. But they are all in the cloud, and outside your control.

        But there is nothing in the middle. There is no option that “just works”, supports modern standards, and is unhindered by legacy that you can self deploy with minimal administrative fuss - or years of experience.

        What do I like from 389?

        • Replication

        The replication system is extremely robust, and has passed many complex tests for cases of eventual consistency correctness. It’s very rare to hear of any kind of data corruption or loss within our replication system, and that’s testament to the great work of people who spent years looking at the topic.

        • Performance

        We aren’t as fast as OpenLDAP in a single server, one-on-one comparison, but our replication scalability is much higher: in any size of MMR or read-only replica topology we have higher horizontal scaling, nearly linear with server additions. If you want to run a cloud scale replicated database, we scale to it (and people already do this!).

        • Stability

        Our server stability is well known with administrators, and honestly is a huge selling point. We see servers that only go down when administrators are performing upgrades. Our work with sanitising tools and the careful eyes of the team has ensured our code base is reliable and solid. Having extensive tests and amazing dedicated quality engineers also goes a long way.

        • Feature rich

        There are a lot of features I really like, and are really useful as an admin deploying this service. Things like memberof (which is actually a group resolution cache when you think about it …), automember, online backup, unique attribute enforcement, dereferencing, and more.

        • The team

        We have a wonderful team of really smart people, all of whom are caring and want to advance the state of identity management. Not only do they want to keep up with technical changes and excellence, they are listening to and want to improve our social awareness of identity management.

        Pain Points

        • C

        Because DS is written in C, it’s risky and difficult to make changes. People constantly make mistakes that introduce unsafety (even myself), and worse. No amount of tooling or intelligence can take away the fact that C is just hard to use, that people need to be perfect (people are not perfect!), and that today we have better tools. We cannot spend our time chasing our tails on pointless issues that C creates, when we should be doing better things.

        • Everything about dynamic admin, config, and plugins is hard and can’t scale

        Because we need to maintain consistency through operations from start to end, but we also allow changing config, plugins, and more during the server’s operation, the current locking design just doesn’t scale. It’s also not 100% safe, as the values are changed by atomics, not managed by transactions. We could use copy-on-write for this, but why? Config should be managed by tools like ansible, but today our dynamic config and plugins are both a performance overhead and an admin overhead, because we exclude best practice tools and have to spend a large amount of time maintaining consistent data when we shouldn’t need to. Fewer features means less support overhead on us, and it is simpler to test and assert quality and correct behaviour.

        • Plugins to address shortfalls, but a bit odd.

        We have all these features to address issues, but they all do it … kind of the odd way. Managed Entries creates user private groups on object creation. But the problem is “unix requires a private group” and “ldap schema doesn’t allow a user to be a group and user at the same time”. So the answer is actually to create a new objectClass that lets a user ALSO be its own UPG, not “create an object that links to the user”. (Or have a client generate the group from user attributes, but we shouldn’t shift responsibility to the client.)

        Distributed Numeric Assignment is based on the AD rid model, but it’s all about “how can we assign a value to a user that’s unique?”. We already have a way to do this, in the UUID, so why not derive the UID/GID from the UUID? This means there is no complex inter-server communication or pooling, just simple isolated functionality.
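
        As a rough sketch of that idea (purely illustrative, not how DNA or any shipping server derives ids, and the range below is an arbitrary assumption), you could hash the UUID and fold it into a UID/GID range:

        # Derive a deterministic UID from an entry UUID (example UUID, hypothetical scheme)
        uuid="c3f7e2aa-1d44-4f6e-9a2b-8b1d2e3f4a5b"
        hash=$(printf '%s' "$uuid" | sha256sum | cut -c1-8)   # first 32 bits of the digest
        uid=$(( 65536 + 16#$hash % (4294967295 - 65536) ))    # fold into a range above the system ids
        echo "$uid"

        Because the value is a pure function of the UUID, every server computes the same UID with no coordination, pooling, or replication of counters.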

        We have lots of features that just are a bit complex, and could have been made simpler, that now we have to support, and can’t change to make them better. If we rolled a new “fixed” version, we would then have to support both because projects like FreeIPA aren’t going to just change over.

        • client tools are controlled by others and complex (sssd, openldap)

        Every tool for dealing with ldap is really confusing and arcane. They all have wild (unhelpful) defaults, and generally this scares people off. It took me months of work to get a working ldap server in the past. Why? It’s 2018, things need to “just work”. Our tools should “just work”. Why should I need to hand edit pam? Why do I need to set weird options in SSSD.conf? All of this makes the whole experience poor.

        We are making client tools that can help (to an extent), but they are really limited to system administration and they aren’t “generic” tools for every possible configuration that exists. So at some point people will still find a limit where they have to touch ldap commands. A common request is a simple to use web portal for password resets, which today only really exists in FreeIPA, and that limits its application already.

        • hard to change legacy

        It’s really hard to make code changes because our surface area is so broad, and the many use cases mean that we risk breakage every time we do. I have even broken customer deployments like this. It’s almost impossible to get away from, and that holds us back: we are scared to make changes because we have to support the 1 million existing workflows. To add another is more support risk.

        Many deployments use legacy schema elements that hold us back, ranging from the inet types, to schema that enforces a first/last name, to schema that won’t express users + groups in a simple way. It’s hard to ask people to just up and migrate their data, and even if we wanted to, ldap allows too much freedom, so we are more likely to break data than migrate it correctly if we tried.

        This holds us back from technical changes, and from social representation changes. People are more likely to engage with a large migrational change than with an incremental change that disturbs their current workflow (i.e. moving from on-prem to cloud, rather than investing in smaller iterative changes to make their local solutions better).

        • ACI’s are really complex

        389’s access controls are good because they are in the tree and replicated, but bad because the syntax is awful, complex, and has lots of traps and complexity. Even I need to look up how to write them when I have to. This is not good for a project that has such deep security concerns, where your ACI’s can look correct but actually expose all your data to risks.

        • LDAP as a protocol is like a 90’s drug experience

        LDAP may be the lingua franca of authentication, but it’s complex, hard to use and hard to write implementations for. That’s why in opensource we have a monoculture of using the openldap client libraries, because no one can work out how to write a standalone library. Layer on top the complexity of the object and naming model, and we have a situation where no one wants to interact with LDAP and rather keeps it at arm’s length.

        It’s going to be extremely hard to move forward here, because the community is so fragmented and small, and the working groups so dispersed, that the idea of LDAPv4 is a dream no one should pursue, even though it’s desperately needed.

        • TLS

        TLS is great. NSS databases and tools are not.

        • GSSAPI + SSO

        GSSAPI and Kerberos are a piece of legacy that we just can’t escape from. They are almost a curse, and one we need to break away from, as the experience is completely unusable (even if what it promises is amazing). We need to do better.

        That and SSO allows loads of attacks to proceed, where we actually want isolated token auth with limited access scopes …

        What could we offer

        • Web application as a first class consumer.

        People want web portals for their clients, and they want to be able to use web applications as the consumer of authentication. The HTTP protocols must be the first class integration point for anything in identity management today. This means using things like OAUTH/OIDC.

        • Systems security as a first class consumer.

        Administrators still need to SSH to machines, and people still need their systems to have identities running on them. Having pam/nsswitch modules is a very major requirement, where those modules have to be fast, simple, and work correctly. Users should “imply” a private group, and UID/GID should be derived dynamically from the UUID (or admins can override it).

        • 2FA/u2f/TOTP.

        Multi-factor auth is here (not coming, here), and we are behind the game. We already have Apple and MS pushing for webauthn in their devices. We need to be there for these standards to work, and to support the next authentication tool after that.

        • Good RADIUS integration.

        RADIUS is not going away, and is important in education providers and business networks, so RADIUS must “just work”. Importantly, this means mschapv2 which is the universal default for all clients to operate with, which means nthash.

        However, we can make the nthash unlinked from your normal password, so you can then have a wifi password and a separate login password. We could even generate an NTHash containing the TOTP token for higher security environments.

        • better data structure (flat, defined by object types).

        The tree structure of LDAP is confusing, but a flatter structure is easier to manage and understand. We can use ideas from kubernetes like tags/labels which can be used to provide certain controls and filtering capabilities for searches and access profiles to apply to.

        • structured logging, with in built performance profiling.

        Being able to diagnose why an operation is slow is critical and having structured logs with profiling information is key to allowing admins and developers to resolve performance issues at scale. It’s also critical to have auditing of every single change made in the system, including internal changes that occur during operations.

        • access profiles with auditing capability.

        Access profiles that express what you can access, and how. Easier to audit, generate, and should be tightly linked to group membership for real RBAC style capabilities.

        • transactions by allowing batch operations.

        LDAP wants to provide a transaction system over a set of operations, but that may cause performance issues on write paths. Instead, why not allow submission of batches of changes that all must occur “at the same time” or “none”. This is faster network wise, protocol wise, and simpler for a server to implement.

        What’s next then …

        Instead of fixing what we have, why not take the best of what we have, and offer something new in parallel? Start a new front end that speaks in an accessible way, that has modern structures, and has learnt from the lessons of the past? We can build it to stand alone, or to proxy from the robust core of 389 Directory Server, allowing migration paths, but eschewing the pain of trying to bring people to the modern world. We can offer something unique, an open source identity system that’s easy to use, fast, secure, that you can run on your terms, or in the cloud.

        This parallel project seems like a good idea … I wonder what to name it …

        Episode 127 - Walled gardens, appstores, and more

        Posted by Open Source Security Podcast on December 17, 2018 12:56 AM
        Josh and Kurt talk about Mozilla pulling a paywall bypassing extension. We then turn our attention to talking about walled gardens. Are they good, are they bad? Something in the middle? There is a lot of prior art to draw on here, everything from Windows, Android, iOS, even Linux distributions.


        <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7939541/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

        Show Notes

        PXE in a VM for Baremetal

        Posted by Adam Young on December 14, 2018 11:57 PM

        One of the main reasons for a strategy of “go virtual first” is the ease of checkpointing and restoring key pieces of infrastructure. When running a PXE provisioning system, the PXE server itself is a piece of key infrastructure, and thus is a viable candidate for running in a Virtual Machine. How did I set up the network to make that possible? macvtap.

        The MacVTap device type allows us to allocate a single physical NIC on the Hypervisor to be used by the virtual machines. When creating the virtual machine to act as a PXE server, I created the network device as type MacVTap, and told it to communicate directly with em1. This essentially creates all of the required Linux Kernel abstractions to allow the virtual machine to access the NIC directly. It does not allocate the NIC to the VM (direct passthrough), so it can still be shared between multiple virtual machines on the same hypervisor.
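
        For reference, the same device can be requested straight from the virt-install command line. This is a sketch; the VM name and disk path are placeholders, and em1 is the host NIC described above:

        virt-install --name pxeserver --memory 2048 --vcpus 2 \
            --disk /var/lib/libvirt/images/pxeserver.qcow2 --import \
            --network type=direct,source=em1,source_mode=vepa,model=virtio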

        This generates a fragment of the domain XML file that looks like this:

            <interface type="direct">
              <mac address="52:54:00:bb:7f:49"/>
              <source dev="em1" mode="vepa"/>
              <model type="virtio"/>
            </interface>

        I set mine up using VEPA. However, since I potentially want multiple VMs on this host to be able to talk to each other efficiently, I might change this to Bridged in the future.

        Mapping Network Ports from Physical to Logical

        Posted by Adam Young on December 12, 2018 04:38 PM

        The Workstation on top of my server rack has 3 Ethernet ports.  One is built in to the motherboard, and two are on a card.  I want to use these three ports for different purposes.  How can I tell which is which internally? The answer lies in /sys/bus/pci/devices/.

        You can get most of what you need if you run nmcli with no arguments. 

        em1: disconnected
        "Intel 82571EB/82571GB D0/D1"
        ethernet (e1000e), 00:14:5E:75:F4:DA, hw, mtu 1500
        
        em2: disconnected
        "Intel 82571EB/82571GB D0/D1"
        ethernet (e1000e), 00:14:5E:75:F4:DB, hw, mtu 1500
        
        p3p1: disconnected
        "Realtek RTL8111/8168/8411"
        1 connection available
        ethernet (r8169), C8:1F:66:46:1A:43, hw, mtu 1500

        Some Output removed.

        One important exception is that it does not show the PCI data.

        ip a shows me this:

        $ ip a
        2: em1: <no-carrier> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
        link/ether 00:14:5e:75:f4:da brd ff:ff:ff:ff:ff:ff
        3: p3p1: <broadcast> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether c8:1f:66:46:1a:43 brd ff:ff:ff:ff:ff:ff
        4: em2: <broadcast> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 00:14:5e:75:f4:db brd ff:ff:ff:ff:ff:ff

        Again, some output removed.

        I can see the set of hardware devices from lspci.

        $ lspci | grep Ethernet
        01:00.0 Ethernet controller: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) (rev 06)
        01:00.1 Ethernet controller: Intel Corporation 82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) (rev 06)
        03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
        

        The dual socket card has the two Intel ports. The Realtek is the motherboard port.  But the three addresses from ip a seem to be all over the place.  The p3p1 naming convention implies https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/

        Or, as the first comment here explains:

        ” The convention does actually make a little sense as it now names network ports on the motherboard as interface “emX” where X is a digit starting at 0. It then names ethernet ports that are on plugin cards as pYpZ where Y is the PCI slot number and Z is the port number on that card. It’s meant to make the naming of devices more consistent and to alleviate the occurrences of people complaining that their eth0 is now eth1 etc.

        I do not understand why I have an em2 device, though.  I wonder, if I delete it and reboot, whether it will show up as a p3p* name instead?  Let’s find out.

        2: p3p1: <broadcast> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether c8:1f:66:46:1a:43 brd ff:ff:ff:ff:ff:ff
        3: em1: <no-carrier> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
        link/ether 00:14:5e:75:f4:da brd ff:ff:ff:ff:ff:ff
        4: em2: <no-carrier> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
        link/ether 00:14:5e:75:f4:db brd ff:ff:ff:ff:ff:ff

        Nope.  And…only the motherboard is plugged in right now, and that is the link showing as Up, so p3p1 is the MoBo port, and the em* ports are the Intel.  But…what if we did not have physical access, as is likely the case when working with a machine in a datacenter?

        In general, linux systems have an enumeration of devices under their bus type in the /sys directory.  If you run lspci, it is enumerating the sub-directories of

        $ ls /sys/bus/pci/devices/
        0000:00:00.0 0000:00:02.0 0000:00:14.0 0000:00:1a.0 0000:00:1c.0 0000:00:1c.4 0000:00:1f.0 0000:00:1f.3 0000:01:00.1 0000:04:00.0
        0000:00:01.0 0000:00:03.0 0000:00:16.0 0000:00:1b.0 0000:00:1c.3 0000:00:1d.0 0000:00:1f.2 0000:01:00.0 0000:03:00.0

        In much the same way, lsusb lists:

        # ls /sys/bus/usb/devices/
        1-0:1.0 1-1 1-1:1.0 2-0:1.0 2-1 2-1:1.0 3-0:1.0 3-10 3-10:1.0 3-10:1.1 3-2 3-2:1.0 3-2:1.1 3-6 3-6:1.0 3-7 3-7:1.0 4-0:1.0 usb1 usb2 usb3 usb4

        However, network devices are managed via a socket:

        [root@dialga ~]$ strace ip a 2>&1 | grep open
        open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
        open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
        open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
        open("/etc/iproute2/group", O_RDONLY) = 4
        [root@dialga ~]$ strace ip a 2>&1 | grep socket
        socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, NETLINK_ROUTE) = 3

        And this is what nmcli uses.

        What if we wanted to script this, and match on the ids? ethtool pulls out the pci id. See the bus-info field:

        $ ethtool -i p3p1
        driver: r8169
        version: 2.3LK-NAPI
        firmware-version: rtl8168g-2_0.0.1 02/06/13
        expansion-rom-version: 
        bus-info: 0000:03:00.0
        supports-statistics: yes
        supports-test: no
        supports-eeprom-access: no
        supports-register-dump: yes
        supports-priv-flags: no
        

        In this case it is 0000:03:00.0 which is the name of the subdir under /sys/bus/pci/devices/

        $ ls /sys/bus/pci/devices/0000:03:00.0
        broken_parity_status consistent_dma_mask_bits dma_mask_bits enable local_cpulist msi_bus numa_node rescan resource0 resource4_wc subsystem_vendor vpd
        class d3cold_allowed driver firmware_node local_cpus msi_irqs power reset resource2 subsystem uevent
        config device driver_override irq modalias net remove resource resource4 subsystem_device vendor

        And notice that there is a subdirectory:

        $ ls /sys/bus/pci/devices/0000:03:00.0/net
        p3p1

        However…find does not work in this subtree (the device entries here are symlinks, and find does not follow symlinks unless you pass -L):

        $ find /sys/bus/pci/devices/ -name net
        
        

        The p3p1 subdirectory has all of the Kernel exposed data about the network device that you are likely to need.
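
        If you want to script the whole mapping, a small loop over sysfs (relying only on the layout shown above) prints each PCI address next to its interface name:

        for dev in /sys/class/net/*/device; do
            iface=$(basename "$(dirname "$dev")")     # e.g. p3p1
            pci=$(basename "$(readlink -f "$dev")")   # e.g. 0000:03:00.0
            printf '%s %s\n' "$pci" "$iface"
        done

        Virtual interfaces have no device link in /sys/class/net, so only physical NICs show up.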

        Episode 126 - The not so dire future of supply chain security

        Posted by Open Source Security Podcast on December 10, 2018 01:15 AM
        Josh and Kurt continue the discussion from episode 125. We look at the possible future of software supply chains. It's far less dire than previously expected. It's likely there will be some change in the near future.


        <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7856546/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

        Show Notes

        Work around docker exec bug

        Posted by William Brown on December 08, 2018 02:00 PM

        Work around docker exec bug

        There is currently a docker exec bug in Centos/RHEL 7 that causes errors such as:

        # docker exec -i -t instance /bin/sh
        rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""
        

        As a work around you can use nsenter instead:

        PID=$(docker inspect --format '{{.State.Pid}}' <name of container>)
        nsenter --target $PID --mount --uts --ipc --net --pid /bin/sh
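
        Or, composed into a single line (the same two commands, nothing new):

        nsenter --target "$(docker inspect --format '{{.State.Pid}}' <name of container>)" --mount --uts --ipc --net --pid /bin/sh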
        

        For more information, you can see the bugreport here.

        Episode 125 - Open Source, supply chains, npm, and you

        Posted by Open Source Security Podcast on December 03, 2018 01:00 AM
        Josh and Kurt talk about how open source deals with malicious events. It's probably impossible to stop these from happening, but the open source universe deals with it in its own unique way. We start to discuss what you can do, since everyone is using open source everywhere now. There will be a second part to this episode where we discuss what the future holds for these sort of problems.

        <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7774118/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

        Show Notes

        Updated Home Network Setup

        Posted by Adam Young on December 01, 2018 01:53 AM

        OpenStack is network intensive. The setup I had previously, based around a Juniper router, did not have enough ports to reflect a real OpenStack deployment. I decided to forgo GigE speeds and get an older Cisco Catalyst 2960-WS switch. Here is the new setup.

        Updated Home Network Connectivity

        I’ve added a Dell Inspiron 3647 Workstation as a jumphost/VM Host for director. It originally only had one Ethernet port, but I added a second dual nic card, so it is up to three. Two of these are connected to the switch.

        Each of the servers also has 2 of its four ports connected to the switch. One of the connections (the black cables) is plugged into the ports that are set up for IPMI and PXE. Here is a schematic:

        Black: Power and Provisioning. Gray: Tenant, Admin, Storage. Blue: Serial Console

        The idea is that the black cables will be the power and provisioning network. IPMI is already set up on the poweredge machines:

        10.0.0.221 idrac-zygarde
        10.0.0.223 idrac-umbreon
        10.0.0.224 idrac-zubat

        The grey cables will be the tenant, storage, and admin networks, each on VLANs. Director will run on the Inspiron, in a virtual machine. I’ll be able to run one controller and two compute nodes, or three controllers with no compute nodes. Ideally, I would get 2 more PowerEdge machines so I can run 3 controllers and two compute nodes.

        Launching a VM From the virt-install command line interface

        Posted by Adam Young on November 30, 2018 03:16 PM

        I do this infrequently enough that I want to record a reminder how I do it:

        sudo cp ~/Downloads/rhel-server-7.6-x86_64-kvm.qcow2 /var/lib/libvirt/images/tower.qcow2
        sudo virt-install --vcpus=2  --name tower  --ram 4096  --import  --disk /var/lib/libvirt/images/tower.qcow2
        

        Manually Adding SSH Keys to a Cloud Image

        Posted by Adam Young on November 29, 2018 02:33 PM

        Not all of my virtual machines run on OpenStack; I have to run a fair number of virtual machines on my personal workstation via libvirt. However, I like using the cloud versions of RHEL, as they most closely match what I do run in OpenStack. The disconnect is that the cloud images are designed to accept cloud-init, which pulls the ssh public keys from a metadata web server. Without that, there are no public keys added to the cloud-user account, and the VM is inaccessible. Here is how I add the ssh keys manually.

        Start by guest-mounting the image. You can do this from and to anywhere on your system. I ran:

        sudo guestmount -a /var/lib/libvirt/images/tower --rw /mnt/vms/tower/  -m /dev/sda1
        

        To add the key:

        sudo cp /home/ayoung/.ssh/id_rsa.pub /mnt/vms/tower/home/cloud_init/.ssh/authorized_keys
        sudo chown 1000:1000 /mnt/vms/tower/home/cloud_init/.ssh/authorized_keys
        

        The .ssh directory was pre-created with the right permissions, as was the authorized_keys file. If you overwrite it, it might be necessary to chmod the file as well:

        sudo chmod 600 /mnt/vms/tower/home/cloud_init/.ssh/authorized_keys
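
        If you have the libguestfs tools installed, virt-customize can do the same key injection in one step. This is a sketch that assumes the default cloud-user account on this image; adjust the user name for other images:

        sudo virt-customize -a /var/lib/libvirt/images/tower \
            --ssh-inject cloud-user:file:/home/ayoung/.ssh/id_rsa.pub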
        

        Unmount and boot the virtual machine. To get the IP address:

        $ sudo umount /mnt/vms/tower 
        $ sudo virsh start tower
        Domain tower started
        
        $ sudo virsh list
         Id   Name    State    
        -----------------------
         2    tower   running  
        $ sudo virsh domifaddr 2
         Name       MAC address          Protocol     Address
        -------------------------------------------------------------------------------
         vnet0      52:54:00:01:cd:61    ipv4         192.168.122.252/24
        $ sudo virsh domifaddr 2
         Name       MAC address          Protocol     Address
        -------------------------------------------------------------------------------
         vnet0      52:54:00:01:cd:61    ipv4         192.168.122.252/24
        
        $ ssh cloud-user@192.168.122.252
        

        What’s up with backdoored npm packages?

        Posted by Josh Bressers on November 27, 2018 02:45 AM

        A story broke recently about a backdoor added to a Node Package Manager (NPM) package called event-stream. This package is downloaded about two million times a week by developers. That’s a pretty impressive amount, many projects would be happy with two million downloads a year.

        The Register did a pretty good writeup, so I don’t want to recap the details here. I have a different purpose, and that’s really to look at how this happens and whether we can stop it.

        Firstly, the short answer is we can’t stop it. You can stop reading now if that’s all you came for. Go tell all your friends how smart you are for only using artisan C libraries instead of filthy NPM modules.

        The long answer is, well, long.

        So the thing is, event-stream is an open source project. There are a lot of open source projects. More than we can count. Probably millions. The VAST majority of open source projects are not well funded or run by people getting paid to work on their project. A few are; Linux and Apache are easy examples. These are not the norm though.

        We’ll use event-stream as our example for obvious reasons. If we look at the contributions graph, we see a project that isn’t swimming in help or commits. This is probably a pretty normal looking set of contributions for most open source.

        So the way it works is if I want to help, I just pretty much start helping. It’s likely they’ll accept my pull requests without too much fanfare. I could probably become a committer in a few weeks if I really wanted to. People like it when someone helps them out, we like helpers, helpers are our friends. Humans evolved to generally trust each other because we’re mostly good. We don’t deal well with bad actors. It’s a good thing we don’t try to account for bad actors in our every day lives, it will drive you mad (I’m looking at you security industry).

        So basically someone asked if they could help, they were allowed to help, then they showed their true intent once they got into the building. This is not the first time this has happened. It’s happened many times before, I pretty much guarantee it’s happening right now with other open source projects. We don’t have a good way to know who has nefarious intent when they start helping a project.

        At this point if your first thought is to stop using open source you should probably slow down a little bit. Open source already won, you can’t avoid it, don’t pretend you can. Anyone who tells you different is an idiot or trying to sell you something (or both).

        As long as open source is willing to allow people to contribute, this problem will exist. If people can’t contribute to open source, it’s not very open. There are some who will say we should make sure we review every commit or run some sort of code scanner or some other equally unlikely process. The reality is a small project just can’t do this, they don’t have the resources and probably never will.

        It’s not all doom and gloom though, the real point to this story is that open source worked exactly how it is meant to work. Someone did something bad, it was noticed, and quickly fixed. There are some people who will be bitten by this package, it sucks if you’re one of them, but it’s just how things work sometimes.

        It’s a bit like public health. We can’t stop all disease, some number of people will get sick, some will even die. We can’t prevent all disease but we can control things well enough that there isn’t an epidemic that wipes out huge numbers of people. Prevention is an impossible goal, containment is not.

        This problem was found and fixed in a pretty reasonable amount of time, that’s pretty good. Our focus shouldn’t be on prevention, prevention is impossible. We should focus on containment. When something bad starts to happen, we make sure we can stop it quickly. Open source is really a story about containment, not a story about prevention. We like to focus on prevention because it sounds like the better option, but it’s really impossible. What happened with event-stream isn’t a tire fire, it’s how open source works. It will happen again. The circle will be unbroken.

        Episode 124 - Cloudflare's service workers and the economics of security

        Posted by Open Source Security Podcast on November 26, 2018 01:39 AM
        Josh and Kurt talk about Cloudflare's new Workers service. We spend a lot of time discussing how economics drives technology, not security. It's quite likely this new service is less secure than existing alternatives, but it will be cheaper and faster which will matter more than security.

        <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7665482/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

        Show Notes

        Scoped and Unscoped access policy in OpenStack

        Posted by Adam Young on November 22, 2018 02:56 PM

        Ozz did a fantastic job laying out the rules around policy. This article assumes you’ve read that. I’ll wait.

        I’d like to dig a little deeper into how policy rules should be laid out, and a bit about the realities of how OpenStack policy has evolved.

        OpenStack uses the policy mechanisms described there to limit access to various APIs. In order to make sensible decisions, the policy engine needs to know some information about the request, and about the user that is making it.

        There are two classes of APIs in OpenStack, scoped and unscoped. A scoped API is one where the resource is assigned to a project or, possibly, a domain. Since domains are only used for Keystone, we’ll focus on projects for now. The other class of APIs is where the resources are not scoped to a project or domain, but rather belong to the cloud as a whole. A good example is a Nova hypervisor.

        The general approach to accessing scoped resources is to pass two checks. The first check is that the auth-data associated with the token has one of the appropriate roles in it. The second is that the auth-data is scoped to the same project as the resource.

        For an example, let’s look at the Cinder API for volumes. The API to create a new volume is:

        POST /v3/{project_id}/volumes

        and the API to then read the volume is

        GET /v3/{project_id}/volumes/{volume_id}

        The default policy.yaml for these APIs shows as:

        # Create volume.
        # POST /volumes
        #"volume:create": ""
        
        # Show volume.
        # GET /volumes/{volume_id}
        #"volume:get": "rule:admin_or_owner"

        We’ll dig a little deeper into these in a moment.

        One thing that distinguishes Cinder from many other APIs is that it
        includes the project ID in the URL. This makes it easier to see what
        the policy is that we need to enforce. For example, if I have a
        Project ID of a226dc9813f745e19ece3d60ac5a351c and I want to create a
        volume in it, I call:

        POST https://cinderhost/v3/a226dc9813f745e19ece3d60ac5a351c/volumes

        With the appropriate payload. Since the volume does not exist yet, we
        have enough information to enforce policy right up front. If the
        token I present has the following data in it:

        {
          "token": {
            "methods": [
              "password"
            ],
            "roles": [
              {
                "id": "f03fda8f8a3249b2a70fb1f176a7b631",
                "name": "Member"
              }
            ],
            "project": {
              "id": "a226dc9813f745e19ece3d60ac5a351c",
              "domain": {
                "id": "default",
                "name": "Default"
              },
              "enabled": true,
              "description": null,
              "name": "tenant_name1"
            }
          }
        }

        Lets take another look at the policy rule to create a volume:

        "volume:create": ""
        

        There are no restrictions placed on this API. Does this mean that
        anyone can create a volume? Not quite.

        Just because oslo-policy CAN be used to enforce access does not mean
        it is the only thing that does so. Since each of the services in
        OpenStack have had long lives of their own, we find quirks like this.
        In this case, the URL structure that has the project ID in it is
        checked against the token externally to the oslo-policy check.

        It also means that no role is enforced on create. Any user, with any
        role on the project can create a volume.

        What about afterwards? The rule on the get command is

        "volume:get": "rule:admin_or_owner"
        

        Here’s another gotcha. Each service has its own definition of what is
        meant by an owner. You need to look at the service specific definition
        of the rule to see what this means.

        # Default rule for most non-Admin APIs.
        #"admin_or_owner": "is_admin:True or (role:admin and is_admin_project:True) or project_id:%(project_id)s"
        

        If you have understood the earlier two articles, you should be able to
        interpret most of this rule. Lets start with the rightmost section:

        `or project_id:%(project_id)s"`
        

        The or rule means that, even if everything before this failed, we can
        still pass if we pass only the part that follows. In this case, it is
        doing the kind of scope check I described above: that the project_id
        from the token’s auth-data matches the project_id on the volume
        object. While this is Cinder, and it is still doing the check based
        on the URL, it also checks based on the resource, in this case the
        volume. That means that this check can’t happen until Cinder fetches
        the Volume record from the database. There is no role check on this
        API. A user with any role assigned on the project will be able to
        execute the API.

        What about the earlier parts of the rule? Lets start with the part we
        can explain with the knowledge we have so far:

        `role:admin`
        

        This is a generic check that the user has the role assigned on the
        token. If we were to look at this rule a couple years ago, this would have
        been the end of the check. Instead, we see it is coupled with

        `and is_admin_project:True`
        

        This is an additional flag on the token’s auth data. It is attempting
        to mitigate one of the oldest bugs in the bug tracker.

        Bug 968696: “admin”-ness not properly scoped

        Another way to describe this bug is to say that most policy rules were
        written too permissively. A user that was assigned the `admin` role
        anywhere ended up having `admin` permissions everywhere.

        This breaks the scoping concept we discussed earlier.

        So, what this flag implies is that the project that the user’s token is scoped
        to is designated as the `admin` project in Keystone. If this is the
        case, the token will have this additional flag set.

        Essentially, the `admin` project is a magic project with elevated
        privileges.

        This provides a way to do cloud-wide administration tasks.

        What about that first rule:

        `is_admin:True`
        

        This is a value set by the code inside the Cinder service. A similar
        pattern exists in most projects in OpenStack. It is a way for cinder
        to be able to override the policy check for internal operations. Look
        in the code for places that call get_admin_context() such as:

        volume_types = db.volume_type_get_all(context.get_admin_context(),
        False)
        

        What about those unscoped APIs we were looking at earlier? It turns
        out they are mostly implemented with the first half of the Cinder
        rule. For example, the update cluster API has the policy rule

        # PUT /clusters/{cluster_id}
        "clusters:update": "rule:admin_api"
        

        which is implemented as

        # Default rule for most Admin APIs.
        "admin_api": "is_admin:True or (role:admin and is_admin_project:True)"
        

        One requirement that the operator community had was that they needed to be able to do cloud wide operations, even when the operations should have been scoped to a project. List all VMs, list all users, and other types of operations were allowed to happen with admin-scoped tokens. This really obscured the difference between global and project scoped operations.

        The is_admin_project hack works, but it is a bit esoteric. One current effort in the Keystone community is to do something a little more readable: actually have proper scoping for things that are outside of projects. We are calling this service scoping. Service scoped roles are available in the Rocky release, and can be used much like is_admin_project to mitigate bug 968696.

        Dependencies in open source

        Posted by Josh Bressers on November 19, 2018 11:41 PM

        The topic of securing your open source dependencies just seems to keep getting bigger and bigger. I always expect it to get less attention for some reason, and every year I’m wrong about what’s happening out there. I remember when I first started talking about this topic, nobody really cared about it. It’s getting a lot more traction these days, especially as we see stories about open source dependencies being wildly out of date and some even being malicious backdoors.

        So what does it really mean to have dependencies? Ignoring the topic of open source for a minute, we should clarify what a dependency is. If you develop software today, there’s no way you build everything yourself. Even if you’re writing something in a low level language there are other libraries you rely on to do certain things for you. Just printing “hello world” calls into another library to actually print the text on the screen. Nobody builds at this level outside of a few select low level projects. Because of this we use code and applications that someone else wrote. If your business is about selling something online, writing your own web server would be a massive cost. It’s far cheaper to find a web server someone else wrote. The web server would be a dependency. If the web server is open source (which is probably is), we would call that an open source dependency.

        Now that we grasp the idea of a dependency, what makes an open source dependency different? Fundamentally there is no difference. The differences revolve around availability and perception. Open source is very available. Anyone can find it and use it with almost no barrier to entry. If you have a library or application you have to purchase from another company, the availability is a very different story. It’s going to take some time and effort to find that dependency. Open source doesn’t need this level of time and effort.

        If you visit github.com to download code to include in your project, or you visit stack overflow for help, or if you find snippets using a search engine, you understand the simplicity of finding and using open source code. This is without question one of the reasons open source won. If you have a deadline of next month, are you going to use the library you can find and use right now, or spend three weeks trying to find and buy a library from another company? Even if it’s not as good, having something right now is a massive advantage.

        The perception aspect of open source is sort of a unique beast. I still see some people who wonder if this open source thing will catch on. I secretly feel bad for those people (not very bad though). There are also some who think open source is a huge free buffet of solutions where they can take whatever they want and never look at it again. Neither of these attitudes are healthy. I’m going to ignore the group that wonders if open source is a real thing. If you found this blog you’re almost certainly not one of those people. What needs to be understood is that open source isn’t free. You can’t just take things and use them without consequence; all software, including open source, has to be cared for during its life. It’s free like a puppy in that regard.

        Obviously this is a security focused blog. If you’re using any software you have to worry about security updates in your software. Security bugs are found in all software. It’s up to us to decide how and when to fix them. If you include dependencies in whatever you’re doing (and you are certainly including them) ignoring security issues is going to end badly someday. Probably not right away, but I’ve never seen that story end well.

        Something that’s not often understood about all this open source is that open source usually depends on other open source. It’s turtles all the way down! It’s very common for any bit of code to depend on something else. Think about this in the context of a complicated machine like your car. Your car has parts, lots of parts. Most of those parts are built from multiple parts, which have parts. Eventually we get to parts that are mostly bolts and pieces of metal. Software works like this to a degree. Complex pieces of software are built with less complex pieces of software.

        This post is meant to be the first part in what I suspect will be a very long series talking about what open source dependencies are, how they work, and what you need to do about them. It’s not terribly difficult to understand all this, but it’s not very obvious either.

         

        Episode 123 - Talking about Kubernetes and container security with Liz Rice

        Posted by Open Source Security Podcast on November 19, 2018 01:02 AM
        Josh and Kurt talk to Liz Rice about Kubernetes and container security. How did we get where we are today, what's new and exciting today, and where do we think things are going.

        <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7607201/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

        Show Notes

        Container Labeling

        Posted by Dan Walsh on November 12, 2018 02:01 PM

        An issue was recently raised on libpod, the github repo for Podman.

        "container_t isn't allowed to access container_var_lib_t"

        Container policy is defined in the container-selinux package. By default containers run with the SELinux type "container_t", no matter which container engine launched them: podman, cri-o, docker, buildah, moby. Most people who use SELinux with container runtimes like runc and systemd-nspawn use it also.

        By default container_t is allowed to read/execute content labeled under /usr, and to read generically labeled content in the host's /etc directory (etc_t).

        The default label for content in /var/lib/docker and /var/lib/containers is container_var_lib_t. This is not accessible by containers (container_t), whether they are running under podman, cri-o, docker, buildah ...  We specifically do not want containers to be able to read this content, because content that uses block devices like devicemapper and btrfs (I believe) is labeled container_var_lib_t when the containers are not running.

        For overlay content we need to allow containers to read/execute the content, so we use the type container_share_t for it. container_t is allowed to read/execute container_share_t files, but not write/modify them.

        Content under /var/lib/containers/overlay* and /var/lib/docker/overlay* is labeled container_share_t by default.

        $ grep overlay /etc/selinux/targeted/contexts/files/file_contexts
        /var/lib/docker/overlay(/.*)? system_u:object_r:container_share_t:s0
        /var/lib/docker/overlay2(/.*)? system_u:object_r:container_share_t:s0
        /var/lib/containers/overlay(/.*)? system_u:object_r:container_share_t:s0
        /var/lib/containers/overlay2(/.*)? system_u:object_r:container_share_t:s0
        /var/lib/docker-latest/overlay(/.*)? system_u:object_r:container_share_t:s0
        /var/lib/docker-latest/overlay2(/.*)? system_u:object_r:container_share_t:s0
        /var/lib/containers/storage/overlay(/.*)? system_u:object_r:container_share_t:s0
        /var/lib/containers/storage/overlay2(/.*)? system_u:object_r:container_share_t:s0

        The label container_file_t is the only type that is writeable by containers.  container_file_t  is used when the overlay mount is created for the upper directory  of an image. It is also used for content mounted from devicemapper and btrfs.  

        If you volume mount a directory into a container and add a :z or :Z, the container engine relabels the content under the volume to container_file_t.
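
        For example, a run like the following (the image and host path are just examples) relabels the volume privately for the container:

        # :Z gives the content a label private to this container; :z would share the label between containers
        podman run --rm -v /srv/webdata:/data:Z registry.fedoraproject.org/fedora ls -lZ /data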

        Failure to read/write/execute content labeled container_var_lib_t is expected.  

        When I see this type of AVC, I expect that this is either a volume mounted in from /var/lib/containers or /var/lib/docker, or mislabeled content under an overlay directory like /var/lib/containers/storage/overlay.

        Solution:

        To solve these, I usually recommend running 

        restorecon -R -v /var/lib/containers
        restorecon -R -v /var/lib/docker

        Or, if it is a volume mount, using the :z or :Z option.


        Episode 122 - What will Apple's T2 chip mean for the rest of us?

        Posted by Open Source Security Podcast on November 12, 2018 04:01 AM
        Josh and Kurt talk about Apple's new T2 security chip. It's not open source but we expect it to change the security landscape in the coming years.


        <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7523042/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

        Show Notes

        Episode 121 - All about the security of voting

        Posted by Open Source Security Podcast on November 05, 2018 01:01 AM
        Josh and Kurt talk about voting security. What does it mean, how does it work. What works, what doesn't work, and most importantly why we may not see secure electronic voting anytime soon.


        <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7429520/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

        Show Notes

        High Available RADVD on Linux

        Posted by William Brown on October 31, 2018 02:00 PM

        High Available RADVD on Linux

        Recently I was experimenting again with high availability router configurations, so that in the case of an outage or a failover the other router will take over and traffic is still served.

        This is usually done through protocols like VRRP, which allow virtual IPs to exist that can be failed over between hosts. However with ipv6 one still needs to allow clients to find the router, and in the case of a failure the router advertisements must continue for client renewals.

        To achieve this we need two parts. A shared Link Local address, and a special RADVD configuration.

        Because of how ipv6 routers work, all traffic (even global) is still sent to your link local router address. We can use an address like:

        fe80::1:1
        

        This doesn’t clash with any reserved or special ipv6 addresses, and it’s easy to remember. Because of how link local works, we can put this on many interfaces of the router (many vlans) with no conflict.

        So now to the two components.

        Keepalived

        Keepalived is a VRRP implementation for linux. It has extensive documentation and sometimes uses some implementation specific language, but it works well for what it does.

        Our configuration looks like:

        #  /etc/keepalived/keepalived.conf
        global_defs {
          vrrp_version 3
        }
        
        vrrp_sync_group G1 {
         group {
           ipv6_ens256
         }
        }
        
        vrrp_instance ipv6_ens256 {
           interface ens256
           virtual_router_id 62
           priority 50
           advert_int 1.0
           virtual_ipaddress {
            fe80::1:1
            2001:db8::1
           }
           nopreempt
           garp_master_delay 1
        }
        

        Note that we provide both a global address and an LL address for the failover. The global address is useful for services and DNS on the router, but you could omit it. The LL address however is critical to this configuration and must be present.

        Now you can start up vrrp, and you should see one of your two linux machines pick up the address.
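
        You can check which node currently owns the shared address with something like the following (the interface name comes from the config above):

        # Run on each router; only the active node will list the virtual link local address
        ip -6 addr show dev ens256 | grep 'fe80::1:1'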

        RADVD

        For RADVD to work, a feature of the 2.x series is required. Packaging this for el7 is out of scope of this post, but fedora ships the version required.

        The feature is that RADVD can be configured to specify which address it advertises for the router, rather than assuming the interface LL autoconf address is the address to advertise. The configuration appears as:

        # /etc/radvd.conf
        interface ens256
        {
            AdvSendAdvert on;
            MinRtrAdvInterval 30;
            MaxRtrAdvInterval 100;
            AdvRASrcAddress {
                fe80::1:1;
            };
            prefix 2001:db8::/64
            {
                AdvOnLink on;
                AdvAutonomous on;
                AdvRouterAddr off;
            };
        };
        

        Note the AdvRASrcAddress parameter? This defines a priority list of addresses to advertise, which may be available on the interface.

        Now start up radvd on your two routers, and try failing over between them while you ping from your client. Remember, to ping the LL address from a client you need something like:

        ping6 fe80::1:1%en1
        

        Where the outgoing interface of your client traffic is denoted after the ‘%’.

        Happy failover routing!

        Episode 120 - Bloomberg and hardware backdoors - it's already happening

        Posted by Open Source Security Podcast on October 29, 2018 12:01 AM
        Josh and Kurt talk about Bloomberg's story about backdoors and motherboards. The story is probably false, but this is almost certainly happening already with hardware. What does it mean if your hardware is already backdoored by one or more countries?


        Show Notes

        Targeted vs General purpose security

        Posted by Josh Bressers on October 23, 2018 01:13 PM

        There seem to be a lot of questions going around lately about how best to give out simple, actionable security advice. Goodness knows I’ve talked about this more than I can even remember at this point. The security industry is really bad at giving out actionable advice. It’s common for someone to ask what good advice looks like. They’ll get a few morsels, then someone will point out whatever corner case makes that advice bad, and the conversation will spiral into nonsense where we find ourselves trying to defend someone mostly concerned about cat pictures from being kidnapped by a foreign nation. By that point whoever asked for help quit listening long ago and decided to just keep their passwords written on a sticky note under the keyboard.

        I’m pretty sure the fundamental flaw in all this thinking is we never differentiate between a targeted attack and general purpose security. They are not the same thing. They’re incredibly different in fact. General purpose advice can be reasonable, simple, and good. If you are a target you’ve already lost, most advice won’t help you.

        General purpose security is just basic hygiene. These are the really easy concepts. Ideas like using a password manager, enabling multi-factor auth, and installing updates on your system. These are the activities anyone and everyone should be doing. One could argue these should be the default settings for any given computer or service (that’s a post for another day though). You don’t need to be a security genius to take these steps. You just have to restrain yourself from acting like a crazy person so whoever asked for help can actually get the advice they need.

        Now if you’re the target of a security operation, things are really different. Targeted security is when you’re an active target: someone has picked you out for some reason and has a specific end goal in mind. This is the sort of attack where people will send you very specific phishing mails. They will probably try to brute force your password to a given account. They might call friends and family. They might even look through your trash for clues they can use. If you are a target the goal isn’t to stop the attacker, it’s just to slow them down enough so you know you’re under attack. Once you know you’re under attack you can find a responsible adult to help.

        These two things are very different. If you try to lump them together you end up with no solution, and at best a confused audience. In reality you probably end up with no audience because you sound like a crazy person.

        Here is an example. Let’s say someone asks for some advice for people connecting to public wifi. Then you get a response about how your pen test used public wifi against an employee to steal their login credentials. That’s not a sane comparison. If you have a specific target in mind you can play off their behaviors and typical activities. You know which sites they visit, which coffee house they like, and which web browser and operating system they use. You have a level of knowledge that puts the defender in a position they can’t defend against. General security doesn’t work like that.

        The goal of general purpose advice is to be, well, general. This is like telling people to wash their hands. You don’t get into specifics about if they’ve been in contact with flesh eating bacteria and how they should be keeping some incredibly strong antiseptic on hand at all times just in case. Actual advice is to get some soap, pretty much any soap is fine, and wash your hands. That’s it. If you find yourself in the company of flesh eating bacteria in the future, go find someone who specializes in such a field. They’ll know what to actually do. Keeping special soap under your sink isn’t going to be one of the things they suggest.

        There’s nothing wrong with telling people the coffee house wifi is probably OK for many things. Don’t do banking from it, make sure you have an updated browser and operating system. Stay away from dodgy websites. If things start to feel weird, stop using the wifi. The goal isn’t to eliminate all security threats, it’s just to make things a little bit better. Progress is made one step at a time, not in one massive leap. Massive leaps are how you trip and fall.

        And if you are a specific target, you can only lose. You aren’t going to stop that attacker. Targeted attacks, given enough time, never fail.

        Episode 119 - The Google+ and Facebook incidents, it's not your data anymore

        Posted by Open Source Security Podcast on October 22, 2018 12:22 AM
        Josh and Kurt talk about the Google+ and Facebook data incidents. We don't have any control over this data anymore. The incidents didn't really affect the users because we have no idea who has access to it. We also touch on GDPR and what it could mean in this context.


        Show Notes

        Rust RwLock and Mutex Performance Oddities

        Posted by William Brown on October 18, 2018 02:00 PM

        Rust RwLock and Mutex Performance Oddities

        Recently I have been working on Rust data structures once again. In the process I wanted to test how my work performed compared to a standard library RwLock and Mutex. On my home laptop the RwLock was 5 times faster, and the Mutex 2 times faster, than my work.

        So I checked out my code on my workplace workstation and ran my benchmarks. The Mutex result was the same - 2 times faster. However, the RwLock was 4000 times slower.

        What’s a RwLock and Mutex anyway?

        In a multithreaded application, it’s important that data that needs to be shared between threads is consistent when accessed. This consistency is not just logical consistency of the data, but affects hardware consistency of the memory in cache. As a simple example, let’s examine an update to a bank account done by two threads:

        acc = 10
        deposit = 3
        withdrawl = 5
        
        [ Thread A ]            [ Thread B ]
        acc = load_balance()    acc = load_balance()
        acc = acc + deposit     acc = acc - withdrawl
        store_balance(acc)      store_balance(acc)
        

        What will the account balance be at the end? The answer is “it depends”. Because threads are working in parallel these operations could happen:

        • At the same time
        • Interleaved (various possibilities)
        • Sequentially

        This isn’t very healthy for our bank account. We could lose our deposit, or end up with invalid data. Possible outcomes are that acc ends as 13, 5, or 8 - and only one of these (8) is correct.

        A mutex protects our data in multiple ways. It provides hardware consistency operations so that our CPUs' cache state is valid. It also allows only a single thread inside the mutex at a time, so we can linearise operations. Mutex is short for “Mutual Exclusion” after all.

        So our example with a mutex now becomes:

        acc = 10
        deposit = 3
        withdrawl = 5
        
        [ Thread A ]            [ Thread B ]
        mutex.lock()            mutex.lock()
        acc = load_balance()    acc = load_balance()
        acc = acc + deposit     acc = acc - withdrawl
        store_balance(acc)      store_balance(acc)
        mutex.unlock()          mutex.unlock()
        

        Now only one thread will access our account at a time: the other thread will block until the mutex is released.
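
        As a concrete illustration (my own minimal sketch, not code from the post), the same deposit and withdrawal example written against std::sync::Mutex in Rust might look like this:

        use std::sync::{Arc, Mutex};
        use std::thread;

        fn main() {
            // The shared account balance, protected by a Mutex.
            let acc = Arc::new(Mutex::new(10i64));

            let acc_a = Arc::clone(&acc);
            let depositor = thread::spawn(move || {
                // The lock is held until the guard goes out of scope.
                let mut balance = acc_a.lock().unwrap();
                *balance += 3;
            });

            let acc_b = Arc::clone(&acc);
            let withdrawer = thread::spawn(move || {
                let mut balance = acc_b.lock().unwrap();
                *balance -= 5;
            });

            depositor.join().unwrap();
            withdrawer.join().unwrap();

            // Both updates are applied exactly once: 10 + 3 - 5 = 8.
            println!("final balance: {}", *acc.lock().unwrap());
        }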

        A RwLock is a special extension of this pattern. Where a mutex guarantees exclusive access to the data for both reads and writes, a RwLock (Read Write Lock) allows either multiple read-only views OR a single reader-writer. Importantly, when a writer wants to take the lock, all readers must complete their work and “drain”. Once the write is complete, readers can begin again. So you can imagine it as:

        Time ->
        
        T1: -- read --> x
        T2:     -- read --> x                x -- read -->
        T3:     -- read --> x                x -- read -->
        T4:                   | -- write -- |
        T5:                                  x -- read -->
        
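        To make those semantics concrete, here is a minimal sketch (my illustration, not from the post) of the standard library RwLock behaving this way:

        use std::sync::RwLock;

        fn main() {
            let lock = RwLock::new(5);

            {
                // Many read guards may exist at the same time.
                let a = lock.read().unwrap();
                let b = lock.read().unwrap();
                println!("readers see {} and {}", *a, *b);
            } // the read guards drop here, so a writer can now proceed

            {
                // Only one write guard may exist, with no readers alongside it.
                let mut w = lock.write().unwrap();
                *w += 1;
            }

            println!("value is now {}", *lock.read().unwrap());
        }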

        Test Case for the RwLock

        My test case is simple. Given a set of 12 threads, we spawn:

        • 8 readers. Take a read lock, read the value, release the read lock. If the value == target then stop the thread.
        • 4 writers. Take a write lock, read the value. Add one and write. Continue until value == target then stop.

        Other conditions:

        • The test code is identical between Mutex/RwLock (besides the locking construct)
        • --release is used for compiler optimisations
        • The test hardware is as close as possible (i7 quad core)
        • The tests are run multiple times to construct averages of the performance

        The idea is that a target number of writes must occur, while many readers contend as fast as possible on the read path. We are pressuring the system to choose between “many readers getting to read fast” and “writers getting priority to drain/block readers”.
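
        A rough sketch of this kind of benchmark in Rust is below. This is my illustration rather than the author's actual test code; the thread counts and target of 500 follow the description above, and details such as timing with Instant are assumptions.

        use std::sync::{Arc, RwLock};
        use std::thread;
        use std::time::Instant;

        fn main() {
            const TARGET: u64 = 500;
            let value = Arc::new(RwLock::new(0u64));
            let start = Instant::now();
            let mut handles = Vec::new();

            // 4 writers: take the write lock and increment until the target is reached.
            for _ in 0..4 {
                let value = Arc::clone(&value);
                handles.push(thread::spawn(move || loop {
                    let mut v = value.write().unwrap();
                    if *v >= TARGET {
                        break;
                    }
                    *v += 1;
                }));
            }

            // 8 readers: hammer the read lock until they observe the target value.
            for _ in 0..8 {
                let value = Arc::clone(&value);
                handles.push(thread::spawn(move || loop {
                    if *value.read().unwrap() >= TARGET {
                        break;
                    }
                }));
            }

            for h in handles {
                h.join().unwrap();
            }
            println!("{} writes completed in {:?}", TARGET, start.elapsed());
        }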

        On OSX, given a target of 500 writes, the RwLock version was able to complete in 0.01 seconds. (MBP 2011, 2.8GHz)

        On Linux given a target of 500 writes, this completed in 42 seconds. This is a 4000 times difference. (i7-7700 CPU @ 3.60GHz)

        All things considered, the Linux machine should have an advantage - it’s a desktop processor of a newer generation with a much faster clock speed. So why is the RwLock performance so different on Linux?

        To the source code!

        Examining the Rust source code, we see that many OS primitives come from libc. This is because they require OS support to function. RwLock is an example of this, as is Mutex and many more. The unix implementation in Rust consumes the pthread_rwlock primitive. This means we need to read man pages to understand the details of each.

        OSX uses FreeBSD userland components, so we can assume they follow the BSD man pages. In the FreeBSD man page for pthread_rwlock_rdlock we see:

        IMPLEMENTATION NOTES
        
         To prevent writer starvation, writers are favored over readers.
        

        Linux however, uses different constructs. Looking at the Linux man page:

        PTHREAD_RWLOCK_PREFER_READER_NP
          This is the default.  A thread may hold multiple read locks;
          that is, read locks are recursive.  According to The Single
          Unix Specification, the behavior is unspecified when a reader
          tries to place a lock, and there is no write lock but writers
          are waiting.  Giving preference to the reader, as is set by
          PTHREAD_RWLOCK_PREFER_READER_NP, implies that the reader will
          receive the requested lock, even if a writer is waiting.  As
          long as there are readers, the writer will be starved.
        

        Reader vs Writer Preferences?

        Due to the policy of a RwLock having multiple readers OR a single writer, a preference is given to one or the other. The preference basically boils down to the choice of:

        • Do you respond to write requests and have new readers block?
        • Do you favour readers but let writers block until reads are complete?

        The difference is that on a read heavy workload, a write will continue to be delayed so that readers can begin and complete (up until some threshold of time). However, on a writer focused workload, you allow readers to stall so that writes can complete sooner.

        On Linux, they choose a reader preference. On OSX/BSD they choose a writer preference.

        Because our test is about how fast a target number of write operations can complete, the writer preference of BSD/OSX makes this test much faster. Our readers still “read”, but give way to writers, which completes our test sooner.

        However, the Linux “reader favour” policy means that our readers (designed to create contention) are allowed to skip the queue and block writers. This causes our writers to starve. Because the test is only concerned with writer completion, the result (correctly) shows our writers are heavily delayed - even though many more readers are completing.

        If we were to track the number of reads that completed, I am sure we would see a large difference, with Linux having allowed many more readers to complete than the OSX version.

        Linux pthread_rwlock does allow you to change this policy (PTHREAD_RWLOCK_PREFER_WRITER_NP) but this isn’t exposed via Rust. This means today, you accept (and trust) the OS default. Rust is just unaware at compile time and run time that such a different policy exists.

        Conclusion

        Rust, like any language, consumes operating system primitives. Every OS implements these differently, and these differences in OS policy can cause real performance differences in applications between development and production.

        It’s well worth understanding the constructions used in programming languages and how they affect the performance of your application - and the decisions behind those tradeoffs.

        This isn’t meant to say “don’t use RwLock in Rust on Linux”. It is meant to say “choose it when it makes sense - on read-heavy loads, understanding that writers will be delayed”. For my project (a copy-on-write cell) I will likely conditionally compile RwLock on OSX but Mutex on Linux, as I require writer-favoured behaviour. There are certainly applications that will benefit from the reader priority on Linux (especially if there is low writer volume and a low penalty for delayed writes).
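
        As a sketch of that conditional approach (the WriteFavoured wrapper and its methods are my invention for illustration, not the project's actual API), the per-platform selection could look something like this:

        // Pick the lock type per platform: Mutex on Linux (writers are never
        // starved by readers), RwLock elsewhere (the pthread rwlock on
        // macOS/BSD already favours writers).
        #[cfg(target_os = "linux")]
        use std::sync::Mutex as InnerLock;
        #[cfg(not(target_os = "linux"))]
        use std::sync::RwLock as InnerLock;

        pub struct WriteFavoured<T>(InnerLock<T>);

        impl<T> WriteFavoured<T> {
            pub fn new(value: T) -> Self {
                WriteFavoured(InnerLock::new(value))
            }

            #[cfg(target_os = "linux")]
            pub fn with_read<R>(&self, f: impl FnOnce(&T) -> R) -> R {
                // Readers serialise through the mutex on Linux.
                f(&self.0.lock().unwrap())
            }

            #[cfg(not(target_os = "linux"))]
            pub fn with_read<R>(&self, f: impl FnOnce(&T) -> R) -> R {
                // Concurrent readers are fine; writers are still favoured.
                f(&self.0.read().unwrap())
            }

            #[cfg(target_os = "linux")]
            pub fn with_write<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
                f(&mut self.0.lock().unwrap())
            }

            #[cfg(not(target_os = "linux"))]
            pub fn with_write<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
                f(&mut self.0.write().unwrap())
            }
        }

        fn main() {
            let counter = WriteFavoured::new(0u64);
            counter.with_write(|v| *v += 1);
            counter.with_read(|v| println!("counter = {}", v));
        }

        The point is simply that the platform decides which primitive backs the type; callers only see the wrapper.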

        Creating a Self Trust In Keystone

        Posted by Adam Young on October 18, 2018 02:44 AM

        Let's say you are an administrator of an OpenStack cloud. This means you are pretty much all-powerful in the deployment. Now you need to perform some operation, but you don't want to give it full admin privileges. Why? Well, do you work as root on your Linux box? I hope not. Here's how to set up a self trust for a reduced set of roles on your token.

        First, get a regular token, but use --debug to see what the project ID, role ID, and your user ID actually are:

        In my case, they are … long uuids.

        I’ll trim them down, both for obscurity as well as to make things more legible. Here is the command to create the trust:

        openstack trust create --project 9417f7 --role 9fe2ff 154741 154741

        Mine returned:

        +--------------------+----------------------------------+
        | Field              | Value                            |
        +--------------------+----------------------------------+
        | deleted_at         | None                             |
        | expires_at         | None                             |
        | id                 | 26f8d2                           |
        | impersonation      | False                            |
        | project_id         | 9417f7                           |
        | redelegation_count | 0                                |
        | remaining_uses     | None                             |
        | roles              | _member_                         |
        | trustee_user_id    | 154741                           |
        | trustor_user_id    | 154741                           |
        +--------------------+----------------------------------+
        

        On my system, role ID 9fe2ff is the _member_ role.

        Note that, if you are Admin, you need to explicitly grant yourself the _member_ role, or use an implied role rule that says admin implies member.

        Now, you can get a reduced scope token. Unset the variables that are used to scope the token, since you want to scope to the trust now.

        $ unset OS_PROJECT_DOMAIN_NAME 
        $ unset OS_PROJECT_NAME 
        $ openstack token issue --os-trust-id  26f8d2eaf1404489ab8e8e5822a0195d
        +------------+----------------------------------+
        | Field      | Value                            |
        +------------+----------------------------------+
        | expires    | 2018-10-18T10:31:57+0000         |
        | id         | f16189                           |
        | project_id | 9417f7                           |
        | user_id    | 154741                           |
        +------------+----------------------------------+
        

        This still requires you to authenticate with your user ID and password. An even better mechanism is the new Application Credentials API. It works much the same way, but you use a separate, explicitly created secret. More about that next time.

        Episode 118 - Cloudflare's IPFS and onion service

        Posted by Open Source Security Podcast on October 15, 2018 01:39 AM
        Josh and Kurt talk about Cloudflare's new IPFS and Onion services. One brings distributed blockchain files to the masses, the other lets you host your site on tor easily.


        Show Notes

        Episode 117 - Will security follow Linus' lead on being nice?

        Posted by Open Source Security Podcast on October 08, 2018 12:01 AM
        Josh and Kurt talk about Linus' effort to work on his attitude. What will this mean for security and IT in general?


        Show Notes

        Imagining Go With Alternate Boards

        Posted by Adam Young on October 04, 2018 03:47 PM

        The Game of Go is still pretty much the ultimate strategy game.  No other game distills strategy to its essence, in such simplicity, and thus lets the complexity emerge. 

        The board is simplicity itself: a 19 X 19 Grid:

        Standard Go board

        One aspect of Go is that you start from the corners, build a semi-secure formation, and then grow out from there.

        For example, here is a recent game of mine in the early stages:

        Both my opponent and I have staked out positions in the corners.

        What if the board was a little different?

        Chinese Checkers has 6 Points where people start:

        The 6 corners allow 6 players a secure base to start from.

        What if we adapt this idea into a Go board?

        More Corners

        Sometimes, less is more.

        Fewer corners. Each player picks one to start.

        One interesting aspect of the Stratego board is that it has terrain features in the middle:

        Courtesy of Mark Alldrige. Stratego is copyright Milton Bradley, 1980

        What if we cut out some spaces in the middle of a Go board, like this:

        Terrain features in the middle of the battlefield

        Or combine some of these ideas:

        Corners and Center are now out of play.

        How would each of these variations modify the game? Many of the tactical patterns would stay the same, but would have a different role in the overall strategy.

        SELinux blocks podman container from talking to libvirt

        Posted by Dan Walsh on October 02, 2018 10:27 AM

        I received this bug report this week.

        "I see this when I try to use vagrant from a container using podman on Fedora 29 Beta.

        Podman version: 0.8.4

        Command to run container:

        sudo podman run -it --rm -v /run/libvirt:/run/libvirt:Z -v $(pwd):/root:Z localhost/vagrant vagrant up

        Logs:

        ...

        Sep 30 21:17:25 Home audit[22760]: AVC avc:  denied  { connectto } for  pid=22760 comm="batch_action.r*" path="/run/libvirt/libvirt-sock" scontext=system_u:system_r:container_t:s0:c57,c527 tcontext=system_u:system_r:virtd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0

        "

        This is an interesting use case of SELinux and containers.  SELinux is protecting the file system and the host from attack from inside of the container.  People who have listened to me over the years understand that SELinux is protecting the labels of files; in the case of containers, it only allows a process running as container_t to read/write/execute files labeled container_file_t.

        But the reporter of the bug thinks he did the right thing: he told podman to relabel the volumes he was mounting into the container.

        Let's look at his command to launch the container.

        sudo podman run -it --rm -v /run/libvirt:/run/libvirt:Z -v $(pwd):/root:Z localhost/vagrant vagrant up

        He told podman to go out and relabel /run/libvirt and $(pwd) with a private label generated for the container; that is what the ":Z" means: system_u:object_r:container_file_t:MCS.  Sadly this is not the right thing to do and will probably cause him issues going forward.  Since /run/libvirt is probably used by other processes outside of the container, he might have broken them.  libvirt running as virtd_t is probably not allowed to write to container_file_t.  The $(pwd) directory is probably fine, since it is not likely to be shared with other confined daemons.

        Ignoring that this was the wrong thing to do, SELinux still blocked the container. Why?

        SELinux does not only block access to files on disk.  While SELinux would allow container_t to write to a unix domain socket, "/run/libvirt/libvirt-sock", labeled container_file_t, a second SELinux check happens between the processes: SELinux also checks whether the container can talk to the daemon, libvirt, running as virtd_t.

        Since there is no allow rule for container_t to connectto virtd_t, the connection fails.

        Currently in situations like this I tell people to just disable SELinux separation inside this container, rather than fooling around with the labels.

        sudo podman run -it --security-opt label=disable  --rm -v /run/libvirt:/run/libvirt -v $(pwd):/root localhost/vagrant vagrant up

        Notice I also removed the :Z. The label=disable option causes podman to run the container as spc_t, which is an unconfined domain, and all confined domains are allowed to communicate with it.

        Since this is not a full disablement of SELinux, it does not make me cry.  :^)

        Looking ahead, Lukas Vrabek is working on a better solution, udica, where you can simply create a new type based on container_t and then run your container with it.

        udica should allow you to generate a container_vagrant_t type, which would be allowed to write to the labels on /run/libvirt and communicate with virtd_t, while still having all other SELinux confinement. Then you could execute something like this:

        sudo podman run -it --security-opt label=type:container_vagrant_t --rm -v /run/libvirt:/run/libvirt -v $(pwd):/root localhost/vagrant vagrant up


        Millions of unfixed security flaws is a lie

        Posted by Josh Bressers on October 01, 2018 01:26 PM

        On a pretty regular basis I see claims that the public CVE dataset is missing some large number of security issues. I’ve seen ranges from tens of thousands all the way up to millions. The purpose behind such statements is to show that the CVE data is woefully incomplete. Of course almost everyone making that claim has a van filled with security issues and candy they’re trying very hard to lure us into. It’s a pretty typical sales tactic as old as time itself. Whatever you have today isn’t good enough, but what I have, holy cow it’s better. It’s so much better you better come right over and see for yourself. After you pay me of course.

        If you take away any single thing from this post, make it this: There are not millions of unfixed security flaws missing from the CVE data.

        If you’re not familiar with how CVE works, I’ll give you a very short crash course. Essentially someone (anyone) requests a CVE ID, and if it’s a real security issue, a CVE gets assigned. It really is fundamentally this simple. Using some sort of advanced logic, the obvious question becomes: “why not get a CVE ID for all these untracked security flaws?”

        That’s a great question! There are two possible reasons for this. The first is the organizations in question don’t want to share what they know. The second is all the things they claim are security issues really aren’t security issues at all. The second answer is of course correct, but let’s understand why.

        The first answer assumes their security flaws are some sort of secret information only they know. This would also suggest the security issues in question are not acknowledged by the projects or vendors. If a project has any sort of security maturity, they are issuing CVE IDs (note: if you are a project that cares about security and doesn’t issue CVE IDs, talk to me, I will help you). This means that if the project knows about a security issue they will release a CVE ID for it. If they don’t know about the issue, it not only doesn’t have a CVE ID but is also unfixed. Not telling projects and vendors about security issues would be pretty weaselly. It also wouldn’t make anyone any safer. In fact it would make us all a lot less safe.

        This brings us to the next stop in our complex logical journey. If you are a company that has the ability to scan and track security issues, and you find an unknown security issue in a project, you will want to make some noise about finding it. That means you follow some sort of security process that includes getting a CVE ID for the issue in question. After all, you want to make sure your security problem is known to the public, and what better way than the largest public security dataset?

        This brings us to the logical conclusion about all these untracked security issues: they’re not really security problems. Some are just bugs. Some are nothing. Some are probably design decisions. Fundamentally, if there is a security issue that matters, it will get a CVE ID. We should all be working together to make CVE better, not trying to claim our secret data is better than everyone else’s. There are no winners and losers when it comes to security issues. We all win or we all lose.

        As with most of these fantastical claims, if it sounds too good to be true, it probably is.

        Episode 116 - The future of the CISO with Michael Piacente

        Posted by Open Source Security Podcast on October 01, 2018 12:01 AM
        Josh and Kurt talk to Michael Piacente from Hitch Partners about the past, present, and future role of the CISO in the industry.


        Show Notes

        Episode 115 - Discussion with Brian Hajost from SteelCloud

        Posted by Open Source Security Podcast on September 24, 2018 12:02 AM
        Josh and Kurt talk to Brian Hajost from SteelCloud about public sector compliance. The world of public sector compliance can be confusing and strange, but it's not that bad when it's explained by someone with experience.


        Show Notes

        Episode 114 - Review of "Click Here to Kill Everybody"

        Posted by Open Source Security Podcast on September 17, 2018 12:13 AM
        Josh and Kurt review Bruce Schneier's new book Click Here to Kill Everybody. It's a book everyone could benefit from reading. It does a nice job explaining many existing security problems in a simple manner.


        Show Notes

        Episode 113 - Actual real security advice

        Posted by Open Source Security Podcast on September 10, 2018 12:07 AM
        Josh and Kurt talk about actual real world advice. Based on a story about trying to secure political campaigns, if we had to give some security help what should it look like, who should we give it to?


        Show Notes

        • Security advice to Democrats
        • Our actual advice
          • Don’t run your own services 
          • Email - Google or Microsoft 
          • Don't use GPG 
          • Use a trusted device 
          • Use a password manager on a secure device 
          • Use 2FA 
          • Backups 

        Converting a RHEL Workstation to a Server

        Posted by Adam Young on September 08, 2018 04:07 AM

        My laptop is my demo machine.  I need to be able to run the Red Hat Cloud Suite of software on it.  I want to install this software the same way a customer would.  However, much of this software is server-side software, and my machine was registered as a workstation. This means the Red Hat content network won’t show me the server yum repositories.  Here is how I converted my machine to be a server.

        The key is to change the installed RPM from redhat-release-workstation to redhat-release-server.  These two RPMs control the set of files that tell the yum system what product is installed, and from that, the set of available yum repositories.  However, since they conflict, you cannot just install redhat-release-server.  That leads to the following errors:

        Transaction check error:
          file /etc/os-release from install of redhat-release-server-7.5-8.el7.x86_64 conflicts with file from package redhat-release-workstation-7.5-8.el7.x86_64
          file /etc/redhat-release from install of redhat-release-server-7.5-8.el7.x86_64 conflicts with file from package redhat-release-workstation-7.5-8.el7.x86_64
          file /etc/system-release-cpe from install of redhat-release-server-7.5-8.el7.x86_64 conflicts with file from package redhat-release-workstation-7.5-8.el7.x86_64
          file /usr/lib/systemd/system-preset/90-default.preset from install of redhat-release-server-7.5-8.el7.x86_64 conflicts with file from package redhat-release-workstation-7.5-8.el7.x86_64

        Here are the steps I worked out to work around this.

        First, download the redhat-release-server RPM on a server-registered machine.  Use the yum command, to make sure keys are present and the repo lets you in.

        sudo yum reinstall redhat-release-server  --downloadonly

        This will download a copy that you can find with:

         

        find /var/cache/yum/x86_64/7Server/rhel-7-server-rpms/ -name redhat-release-server-7.5-8.el7.x86_64.rpm

        Then copy it over to the target machine.  In my case:

        scp -i ~/keys/id_rsa cloud-user@128.31.26.132:/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/packages/redhat-release-server-7.5-8.el7.x86_64.rpm /home/ayoung/Downloads/

        To install it, use the yum shell to perform multiple yum commands in a single transaction:

        $ sudo yum shell
        Loaded plugins: changelog, fs-snapshot, priorities, product-id, refresh-packagekit, rpm-warm-cache, search-disabled-repos, subscription-manager, verify
        > erase redhat-release-workstation-7.5-8.el7.x86_64
        > install /home/ayoung/Downloads/redhat-release-server-7.5-8.el7.x86_64.rpm
        Examining /home/ayoung/Downloads/redhat-release-server-7.5-8.el7.x86_64.rpm: redhat-release-server-7.5-8.el7.x86_64
        Marking /home/ayoung/Downloads/redhat-release-server-7.5-8.el7.x86_64.rpm to be installed
        > run

        Assuming that runs to completion, use the command exit to return to the bash command prompt.  Update the set of repos with:

        sudo  mv /etc/yum.repos.d/redhat.repo /etc/yum.repos.d/redhat.repo.old
        sudo subscription-manager refresh

        Then list the repos, and you should see that most of the ones that had “workstation” in their names now have “server” instead.

        $ sudo subscription-manager repos --list-enabled
        +----------------------------------------------------------+
        Available Repositories in /etc/yum.repos.d/redhat.repo
        +----------------------------------------------------------+
        Repo ID: rhel-7-server-htb-rpms
        Repo Name: Red Hat Enterprise Linux 7 Server HTB (RPMs)
        Repo URL: https://cdn.redhat.com/content/htb/rhel/server/7/$basearch/os
        Enabled: 1
        
        Repo ID: rhel-7-workstation-rpms
        Repo Name: Red Hat Enterprise Linux 7 Workstation (RPMs)
        Repo URL: https://cdn.redhat.com/content/dist/rhel/workstation/7/$releasever/$basearch/os
        Enabled: 1
        
        Repo ID: rhel-7-server-rpms
        Repo Name: Red Hat Enterprise Linux 7 Server (RPMs)
        Repo URL: https://cdn.redhat.com/content/dist/rhel/server/7/$releasever/$basearch/os
        Enabled: 1

        I only want the server RPMs for now:

        $ sudo subscription-manager repos --disable rhel-7-server-htb-rpms
        Repository 'rhel-7-server-htb-rpms' is disabled for this system.
        $ sudo subscription-manager repos --disable rhel-7-workstation-rpms
        Repository 'rhel-7-workstation-rpms' is disabled for this system.

        And…

        $ sudo yum update
        Loaded plugins: changelog, fs-snapshot, priorities, product-id, refresh-packagekit, rpm-warm-cache, search-disabled-repos, subscription-manager, verify
        No packages marked for update

        I wonder what this is going to break.

        I cannot, yet, say whether this is a sane thing to do or not.  I’ll let you know.


        SELinux prevent users from executing programs, for security? Who cares.

        Posted by Dan Walsh on September 04, 2018 12:45 PM

        I recently received the following email about using SELinux to prevent users from executing programs.
         

        I just started to learn SELinux and this is nice utility if you want confine any user who interact with your system.

        A lot of information on Net about how to confine programs, but can't find about confining man's :)

        I found rbash (https://access.redhat.com/solutions/65822) which help me forbid execution any software inside and outside user home directory except few.

        As I understand correctly to do this using SELinux I need a new user domain (customuser) which by default should deny all or I can start with predefined guest_t?

        Next then for example I can enable netutils_exec_ping(customuser_t, customuser_r).

        I responded that:

        SELinux does not worry so much about executing individual programs, although it can do this.  SELinux is basically about defining the access of a process type.
        Just because a program can execute another program does not mean that this process type is going to be allowed the access that the program requires.  For example:

        A user running as guest_t can execute su and sudo, but even if the user were to discover the correct password to become root, they cannot become root on the system; SELinux would block it.  Similarly, guest_t is not allowed to connect out of the system, so being able to execute ssh or ping does not mean that the user would be able to ping another host or ssh to another system.

        This is far more powerful than just blocking access to certain programs, since the user could theoretically download those programs to their home directory and use them there.

        There are lots of Turing-complete tools the user will have access to, which would allow them to write code that does pretty much anything the applications installed on the system can do.

        Bottom line:

        Blocking access to system objects and Linux capabilities is far more powerful than blocking a user process from executing a program on disk.