Fedora security Planet

Episode 142 - Hypothetical security: what if you find a USB flash drive?

Posted by Open Source Security Podcast on April 21, 2019 11:48 PM
Josh and Kurt talk about what you could do if you find a USB drive. The context is based on the story where the Secret Service was rumored to have plugged a malicious USB drive into a computer. The purpose of the discussion is to explore how to handle a situation like this in the real world. We end the episode with a fantastic comparison of swim safety and security.



Show Notes


    Episode 141 - Timezones are hard, security is harder

    Posted by Open Source Security Podcast on April 15, 2019 12:18 AM
    Josh and Kurt talk about the difficulty of security. We look at the difficulty of the EU not observing daylight savings time, which is probably orders of magnitude easier than getting security right. We also hit on a discussion on Reddit about U2F that shows the difficulty. Security today is too hard, even for the experts.



    Show Notes


      Using Rust Generics to Enforce DB Record State

      Posted by William Brown on April 12, 2019 02:00 PM

      Using Rust Generics to Enforce DB Record State

      In a database, entries go through a lifecycle which represents what attributes they have, whether they have db record keys, and whether they have conformed to schema checking.

      I’m currently working on a (private in 2019, public in July 2019) project which is a NoSQL database written in Rust. To help us manage the correctness and lifecycle of database entries, I have been using advice from the Rust Embedded Group’s Book.

      As I have mentioned in the past, state machines are a great way to design code, so let’s plot out the state machine we have for Entries:

      Entry State Machine

      The lifecycle is:

      • A new entry is submitted by the user for creation
      • We schema check that entry
      • If it passes schema, we commit it and assign internal IDs
      • When we search the entry, we retrieve it by internal IDs
      • When we modify the entry, we need to recheck its schema before we commit it back
      • When we delete, we just remove the entry.

      This leads to a state machine of:

                          |
                   (create operation)
                          |
                          v
                  [ New + Invalid ] -(schema check)-> [ New + Valid ]
                                                            |
                                                     (send to backend)
                                                            |
                                                            v    v-------------\
      [Commited + Invalid] <-(modify operation)- [ Commited + Valid ]          |
                |                                          ^   \       (write to backend)
                \--------------(schema check)-------------/     ---------------/
      

      This is a bit rough - The version on my whiteboard was better :)

      The main observation is that we are focused only on the committability and validity of entries - not on where they are or whether the commit was a success.

      Entry Structs

      So to make these states work we have the following structs:

      struct EntryNew;
      struct EntryCommited;
      
      struct EntryValid;
      struct EntryInvalid;
      
      struct Entry<STATE, VALID> {
          state: STATE,
          valid: VALID,
          // Other db junk goes here :)
      }
      

      We can then use these to establish the lifecycle with functions similar to this:

      impl Entry<EntryNew, EntryInvalid> {
          fn new() -> Self {
              Entry {
                  state: EntryNew,
                  valid: EntryInvalid,
                  ...
              }
          }
      
      }
      
      impl<STATE> Entry<STATE, EntryInvalid> {
          fn validate(self, schema: Schema) -> Result<Entry<STATE, EntryValid>, ()> {
              if schema.check(self) {
                  Ok(Entry {
                      state: self.state,
                      valid: EntryValid,
                      ...
                  })
              } else {
                  Err(())
              }
          }
      
          fn modify(&mut self, ...) {
              // Perform any modifications on the entry you like, only works
              // on invalidated entries.
          }
      }
      
      impl<STATE> Entry<STATE, EntryValid> {
          fn seal(self) -> Entry<EntryCommited, EntryValid> {
              // Assign internal id's etc
              Entry {
                  state: EntryCommited,
                  valid: EntryValid,
              }
          }
      
          fn compare(&self, other: Entry<STATE, EntryValid>) -> ... {
              // Only allow compares on schema validated/normalised
              // entries, so that checks don't have to be schema aware
              // as the entries are already in a comparable state.
          }
      }
      
      impl Entry<EntryCommited, EntryValid> {
          fn invalidate(self) -> Entry<EntryCommited, EntryInvalid> {
              // Invalidate an entry, to allow modifications to be performed
              // note that modifications can only be applied once an entry is created!
              Entry {
                  state: self.state,
                  valid: EntryInvalid,
              }
          }
      }
      

      Importantly, this allows us to control when we apply search terms, send entries to the backend for storage, and more. The benefit is that this is all compile-time checked: you can never send an entry to a backend that is not schema checked, you can’t run comparisons or searches on entries that aren’t schema checked, and you can only modify or delete something once it has been created. For example, other parts of the code now have:

      impl BackendStorage {
          // Can only create if no db id's are assigned, IE it must be new.
          fn create(&self, ..., entry: Entry<EntryNew, EntryValid>) -> Result<...> {
          }
      
          // Can only modify IF it has been created, and is validated.
          fn modify(&self, ..., entry: Entry<EntryCommited, EntryValid>) -> Result<...> {
          }
      
          // Can only delete IF it has been created and committed.
          fn delete(&self, ..., entry: Entry<EntryCommited, EntryValid>) -> Result<...> {
          }
      }
      
      impl<STATE> Filter<STATE> {
          // Can only apply filters (searches) if the entry is schema checked. This has an
          // important behaviour, where we can schema normalise. Consider a case-insensitive
          // type, we can schema-normalise this on the entry, then our compare can simply
          // be a string.compare, because we assert both entries *must* have been through
          // the normalisation routines!
          fn apply_filter(&self, ..., entry: &Entry<STATE, EntryValid>) -> Result<bool, ...> {
          }
      }
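
      To make the compile-time guarantee concrete, here is a minimal, self-contained sketch of the same typestate idea (names simplified, schema and backend omitted, so treat it as an illustration rather than the project’s actual code): a function that only accepts a validated entry cannot be handed an invalid one.

      struct New;
      struct Valid;
      struct Invalid;

      #[allow(dead_code)]
      struct Entry<STATE, VALID> {
          state: STATE,
          valid: VALID,
      }

      impl Entry<New, Invalid> {
          fn new() -> Self {
              Entry { state: New, valid: Invalid }
          }
      }

      impl<STATE> Entry<STATE, Invalid> {
          // Pretend schema checking always succeeds in this sketch.
          fn validate(self) -> Entry<STATE, Valid> {
              Entry { state: self.state, valid: Valid }
          }
      }

      // Only a New + Valid entry may be created in the backend.
      fn create(_entry: Entry<New, Valid>) {}

      fn main() {
          let e = Entry::new();
          // create(e);          // rejected at compile time: Entry<New, Invalid>
          create(e.validate());  // accepted: Entry<New, Valid>
      }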
      

      Using this with Serde?

      I have noticed that when we serialise the entry, the valid/state fields are not compiled away - because they have to be serialised, the compiler can’t eliminate them despite their empty content.

      A future cleanup will be to have a serialised DBEntry form such as the following:

      struct DBEV1 {
          // entry data here
      }
      
      enum DBEntryVersion {
          V1(DBEV1)
      }
      
      struct DBEntry {
          data: DBEntryVersion
      }
      
      impl From<Entry<EntryNew, EntryValid>> for DBEntry {
          fn from(e: Entry<EntryNew, EntryValid>) -> Self {
              // assign db id's, and return a serialisable entry.
          }
      }
      
      impl From<Entry<EntryCommited, EntryValid>> for DBEntry {
          fn from(e: Entry<EntryCommited, EntryValid>) -> Self {
              // Just translate the entry to a serialisable form
          }
      }
      

      This way we still have the zero-cost state on Entry, but we are able to move to a versioned serialised structure, and we minimise the run time cost.

      Testing the Entry

      To help with testing, I needed to be able to shortcut and move between any state of the entry so I could quickly make fake entries, so I added some unsafe methods:

      // On impl Entry<EntryNew, EntryInvalid>: skip schema checking for test setup.
      #[cfg(test)]
      unsafe fn to_new_valid(self) -> Entry<EntryNew, EntryValid> {
          Entry {
              state: EntryNew,
              valid: EntryValid
          }
      }
      

      These allow me to set up and create small unit tests where I may not have a full backend or schema infrastructure, so I can test specific aspects of the entries and their lifecycle. It’s limited to test runs only, and marked unsafe. It’s not “technically” memory unsafe, but it’s unsafe from the view of “it could absolutely mess up your database consistency guarantees” so you have to really want it.

      Summary

      Using state machines like this really helped me to clean up my code and make stronger assertions about the correctness of entry lifecycles. It also means I have more faith that, when I and future contributors work on the code base, we’ll have compile time checks to ensure we are doing the right thing - to prevent data corruption and inconsistency.

      The security of dependencies

      Posted by Josh Bressers on April 10, 2019 01:47 AM

      So you’ve written some software. It’s full of open source dependencies. These days all software is full of open source, there’s no way around it at this point. I explain the background in my previous post.

      Now that we have all this open source, how do we keep up with it? If you’re using a lot of open source in your code there could be one or more updated dependencies per day!

      Step one is knowing what you have. There are a ton of ways to do this, but I’m going to bucket things into 3 areas.

      1. Do nothing
      2. Track things on your own
      3. Use an existing tool to track things

      Do nothing

      First up is don’t track anything. Ignore the problem.

      At first glance you may think I’m joking, but this could be a potential solution. There are two ways to think about this one.

      One is you literally ignore the dependencies. You never ever update them. Ever. This is a bad idea, there will be bugs, there will be security problems. They will affect you and someday you’ll regret this decision. I wouldn’t suggest this to anyone ever. If you do this, make sure you keep your résumé up to date.

      The non bananas way you can do this is to let things auto update. I don’t mean ignore things altogether, I mean ignore knowing exactly what you have. If you’re building a container, make sure you update the container to the latest and greatest everything during build. For example if you have a Fedora container, you would run “dnf -y upgrade” on every build. That will pull in the latest and greatest packages from Fedora. If you pull in npm dependencies, you make sure the latest and greatest npm packages are installed every time you build. If you’re operating in a very devops style environment you’re rebuilding everything constantly (right …. RIGHT!) so why not take advantage of it.
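
      For example, a containerized version of that approach (a hypothetical Fedora-based image, not from the original post) just bakes the upgrade into every build:

      FROM fedora:latest
      # Pull in the latest and greatest packages on every rebuild.
      RUN dnf -y upgrade && dnf clean all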

      Now, it should be noted that if you operate in this way, sometimes things will break. And by sometimes I mean quite often and by things I mean everything. Updated dependencies will eventually break existing functionality. The more dependencies you have, the more often things will break. It’s not a deal breaker, it’s just something you have to be prepared for.

      Track things on your own

      The next option is to track things on your own. This one is going to be pretty rough. I’ve been part of teams that have done this in the past. It’s a lot of heavy lifting. A lot. You have to keep track of everything you have, everything that gets added, how it’s used, what’s being updated. What has outstanding security problems. I would compare this to juggling 12 balls with one hand.

      Now, even though it’s extremely difficult to try to track all this on your own, you do have the opportunity to track exactly what you need and how you need it. It’s an effort that will require a generous number of people.

      I’m not going to spend any time explaining this because I think it’s a corner case now. It used to be fairly common mostly because options 1 and 3 either didn’t exist or weren’t practical. If this is something you have interest in, feel free to reach out, I’d be happy to convince you not to do it 🙂

      Use an existing tool to track things

      The last option is to use an existing tool. In the past few years there have been quite a few tools and companies to emerge with the purpose of tracking what open source you have in your products. Some have a focus on security vulnerabilities. Some focus on licensing. Some look for code that’s been copy and pasted. It’s really nice to see so many options available.

      There are two really important things you should keep in mind if this is the option you’re interested in. Firstly, understand what your goal is. If your primary concern is keeping your dependencies up to date in a node.js project, make sure you look for that. Some tools do a better job with certain languages. Some tools inspect containers and not source code for example. Some focus on git repositories. Know what you want then go find it.

      The second important thing to keep in mind is none of these tools are going to be 100% correct. You’ll probably see around 80% accuracy, maybe less depending on what you’re doing. I often say “perfect and nothing are the same thing”. There is no perfect here so don’t expect it. There are going to be false positives, there will be false negatives. This isn’t a reason to write off tools. Account for this in your planning. Things will get missed, there will be some fire-drills. If you’re prepared to deal with it, it won’t be a huge deal.

      The return on investment will be orders of magnitude greater than trying to build your own perfect tracking system. It’s best to look at a lot of these things from a return on investment perspective. Perfect isn’t realistic. Nothing isn’t realistic. Find your minimum viable security.

       

      So now that you know what you’re shipping, how does this all work, what do you do next? We’ll cover that in the near future. Stay tuned.

      Episode 140 - Good enough security is a pretty high bar

      Posted by Open Source Security Podcast on April 08, 2019 12:13 AM
      Josh and Kurt talk about identity. It's a nice example we can generally understand in the context of how much security is enough security? When we deal with identity the idea of good enough is often acceptable for the vast majority of uses. Perfect identity tracking isn't really a thing nor is it practical.



      Show Notes


        Debugging MacOS bluetooth audio stutter

        Posted by William Brown on April 07, 2019 02:00 PM

        Debugging MacOS bluetooth audio stutter

        I was noticing that audio to my bluetooth headphones from my iPhone was always flawless, but I started to notice stutter and drops from my MBP. After exhausting some basic ideas, I was stumped.

        To the duck duck go machine, where I searched for known bluetooth issues. Nothing appeared.

        However, I then decided to debug the issue - thankfully there was plenty of advice on this matter. Press shift + option while clicking bluetooth in the menu-bar, and then you have a debug menu. You can also open Console.app and search for “bluetooth” to see all the bluetooth related logs.

        I noticed that when the audio stutter occurred, the following pattern was observed.

        default     11:25:45.840532 +1000   wirelessproxd   About to scan for type: 9 - rssi: -90 - payload: <00000000 00000000 00000000 00000000 00000000 0000> - mask: <00000000 00000000 00000000 00000000 00000000 0000> - peers: 0
        default     11:25:45.840878 +1000   wirelessproxd   Scan options changed: YES
        error       11:25:46.225839 +1000   bluetoothaudiod Error sending audio packet: 0xe00002e8
        error       11:25:46.225899 +1000   bluetoothaudiod Too many outstanding packets. Drop packet of 8 frames (total drops:451 total sent:60685 percentDropped:0.737700) Outstanding:17
        

        There was always a scan, just before the stutter initiated. So what was scanning?

        I searched for the error related to packets, and there were a lot of false leads. From weird apps to dodgy headphones. In this case I could eliminate both as the headphones worked with other devices, and I don’t have many apps installed.

        So I went back and thought about what macOS services could be the problem, and I found that AirDrop would scan periodically for other devices to send and receive files. Disabling AirDrop from the Sharing menu in System Preferences cleared my audio right up.

        Supplying the supply chain

        Posted by Josh Bressers on April 02, 2019 11:53 PM

        A long time ago Marc Andreessen said “software is eating the world”. This statement ended up being quite profound in hindsight, as most profound statements are. At the time nobody really understood what he meant and it probably wasn’t until the public cloud caught on that it became something nobody could ignore. The future of technology was less about selling hardware and more about building software.

        We’re at a point now where it’s time to rethink software. Well, the rethinking happened quite some time ago, now everyone has to catch up. Today it’s a pretty safe statement to declare open source is eating the world. Open source won, it’s everywhere, you can’t not use it. It’s not always well understood. And it’s powering your supply chain, even if you don’t know it.

        In a previous post I talk about what open source dependencies are. This post is meant to explain how all these dependencies interact with each other and what you need to know about it. The topic of supply chains is coming up more and more, and it’s usually not great news. When open source comes up in the context of the supply chain it’s very common for the story to center around how dangerous open source is. Of course if you just use this one tool, or this one vendor, or this one something, you’ll be able to sleep at night. Buying solutions for problems you don’t understand is usually slightly less useful than just throwing your money directly into the fire.

        Any application depends on other software. Without getting overly detailed it’s safe to say that most of us develop software using libraries, interpreters, compilers, and operating systems from somewhere else. In most cases these are open source projects. Purely proprietary software is an endangered species. It’s probably already extinct but there are a few deniers who won’t let it go quietly into the night.

        The intent of the next few blog posts is going to be to pick apart what using open source in your supply chain means. For the rest of this particular post we’re going to put our focus on open source libraries you depend on in your project. Specifically I’m going to pick on npm and containers in my examples. They have two very different ways to deal with dependencies. Containers tend to include packaged dependencies whereas npm has a more on-demand approach. I don’t think one is right, each has drawbacks and advantages, they’re just nice examples that are widely used.

        Let’s explain Containers first.

        So in the container world we use what’s called a filesystem bundle. It’s really just a compressed archive file but that’s not important. The idea is if you need some sort of library to solve a problem, you toss it in the bundle. You can share your bundles, others can add more things on top, then ship a complete package that has all the important bits stuffed inside in one pretty package. This is mostly done because it’s far easier to deploy a complete system than it is to give someone hundreds of poorly written instructions to setup and deploy a solution. Sysadmins from the late 90’s and early 2000’s understand this pain better than anyone ever. The advantages substantially outweigh the drawbacks which is one of the reasons containers are taking over the world.

        The way something like NPM does this is a bit different. When you need a dependency for NPM, you install the dependency, then it installs whatever it needs. It’s sort of turtles all the way down with dependencies having dependencies of dependencies. Then you get to use it. The thing that’s missed sometimes is that if you install something today, then you install the exact same thing tomorrow, you could get a different set of packages and versions. If version 1.2 is released today it couldn’t have been the version you installed yesterday. This has the advantage of getting more updated packages, but has the downside of breaking things as newer packages can behave differently. You can work around this by specifying a certain version of a package at install time. It’s not uncommon to peg the version like this, but it does introduce some of the container problems with outdated dependencies.
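
        To illustrate the difference (package names chosen only as an example), a package.json can either pin an exact version or accept a semver range:

        {
          "dependencies": {
            "left-pad": "1.3.0",
            "lodash": "^4.17.0"
          }
        }

        Here "1.3.0" always installs exactly that version, while "^4.17.0" installs the newest 4.x release available at install time - the behavior described above.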

        My code is old but it works

        There are basically two tradeoffs here.

        You have old code in your project, but it works because you don’t have to worry about a newer library version changing something. While the code doesn’t change, the world around it does. There is a 100% chance security flaws and bugs will be discovered and fixed in the dependencies you rely on.

        The other option is you don’t have old libraries: you update things constantly and quickly, but you run the risk of breaking your application every time you update the dependencies. It’s also a 100% risk. At some point, something will happen that breaks your application. Sometimes it will be a one line fix, sometimes you’re going to be rewriting huge sections of a feature or installing the old library and never updating it again.

        The constant update option is the more devops style and probably the future, but we have to get ourselves to that future. It’s not practical for every project to update their dependencies at this breakneck speed.

        What now

        The purpose of this post wasn’t to solve any problems, it’s just to explain where we are today. Problem solving will come as part of the next few posts on this topic. I have future posts that will explain how to handle the dependencies in your project, and a post that explains some of the rules and expectations around handling open source security problems.

        GDB autoloads for 389 DS

        Posted by William Brown on April 02, 2019 01:00 PM

        GDB autoloads for 389 DS

        I’ve been writing a set of extensions to help debug 389-ds a bit easier. Thanks to the magic of python, writing GDB extensions is really easy.

        On OpenSUSE, when you start your DS instance under GDB, all of the extensions are automatically loaded. This will help make debugging a breeze.

        zypper in 389-ds gdb
        gdb /usr/sbin/ns-slapd
        
        GNU gdb (GDB; openSUSE Tumbleweed) 8.2
        (gdb) ds-
        ds-access-log  ds-backtrace
        (gdb) set args -d 0 -D /etc/dirsrv/slapd-<instance name>
        (gdb) run
        ...
        

        All the extensions are under the ds- namespace, so they are easy to find. There are some new ones on the way, which I’ll discuss here too:

        ds-backtrace

        As DS is a multithreaded process, it can be really hard to find the active thread involved in a problem. So we provided a command that knows how to fold duplicated stacks, and to highlight idle threads that you can (generally) skip over.

        ===== BEGIN ACTIVE THREADS =====
        Thread 37 (LWP 70054))
        Thread 36 (LWP 70053))
        Thread 35 (LWP 70052))
        Thread 34 (LWP 70051))
        Thread 33 (LWP 70050))
        Thread 32 (LWP 70049))
        Thread 31 (LWP 70048))
        Thread 30 (LWP 70047))
        Thread 29 (LWP 70046))
        Thread 28 (LWP 70045))
        Thread 27 (LWP 70044))
        Thread 26 (LWP 70043))
        Thread 25 (LWP 70042))
        Thread 24 (LWP 70041))
        Thread 23 (LWP 70040))
        Thread 22 (LWP 70039))
        Thread 21 (LWP 70038))
        Thread 20 (LWP 70037))
        Thread 19 (LWP 70036))
        Thread 18 (LWP 70035))
        Thread 17 (LWP 70034))
        Thread 16 (LWP 70033))
        Thread 15 (LWP 70032))
        Thread 14 (LWP 70031))
        Thread 13 (LWP 70030))
        Thread 12 (LWP 70029))
        Thread 11 (LWP 70028))
        Thread 10 (LWP 70027))
        #0  0x00007ffff65db03c in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
        #1  0x00007ffff66318b0 in PR_WaitCondVar () at /usr/lib64/libnspr4.so
        #2  0x00000000004220e0 in [IDLE THREAD] connection_wait_for_new_work (pb=0x608000498020, interval=4294967295) at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:970
        #3  0x0000000000425a31 in connection_threadmain () at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:1536
        #4  0x00007ffff6637484 in None () at /usr/lib64/libnspr4.so
        #5  0x00007ffff65d4fab in start_thread () at /lib64/libpthread.so.0
        #6  0x00007ffff6afc6af in clone () at /lib64/libc.so.6
        

        This example shows that there are 17 idle threads here (look at frame 2), all sharing the same trace.

        ds-access-log

        The access log is buffered before writing, so if you have a coredump, and want to see the last few events before they were written to disk, you can use this to display the content:

        (gdb) ds-access-log
        ===== BEGIN ACCESS LOG =====
        $2 = 0x7ffff3c3f800 "[03/Apr/2019:10:58:42.836246400 +1000] conn=1 fd=64 slot=64 connection from 127.0.0.1 to 127.0.0.1
        [03/Apr/2019:10:58:42.837199400 +1000] conn=1 op=0 BIND dn=\"\" method=128 version=3
        [03/Apr/2019:10:58:42.837694800 +1000] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0001200300 dn=\"\"
        [03/Apr/2019:10:58:42.838881800 +1000] conn=1 op=1 SRCH base=\"\" scope=2 filter=\"(objectClass=*)\" attrs=ALL
        [03/Apr/2019:10:58:42.839107600 +1000] conn=1 op=1 RESULT err=32 tag=101 nentries=0 etime=0.0001070800
        [03/Apr/2019:10:58:42.840687400 +1000] conn=1 op=2 UNBIND
        [03/Apr/2019:10:58:42.840749500 +1000] conn=1 op=2 fd=64 closed - U1
        ", '\276' <repeats 3470 times>
        

        At the end the line that repeats shows the log is “empty” in that segment of the buffer.

        ds-entry-print

        This command shows the in-memory entry. It can be common to see Slapi_Entry * pointers in the codebase, so being able to display these is really helpful to isolate what’s occurring on the entry. Your first argument should be the Slapi_Entry pointer.

        (gdb) ds-entry-print ec
        Display Slapi_Entry: cn=config
        cn: config
        objectClass: top
        objectClass: extensibleObject
        objectClass: nsslapdConfig
        nsslapd-schemadir: /opt/dirsrv/etc/dirsrv/slapd-standalone1/schema
        nsslapd-lockdir: /opt/dirsrv/var/lock/dirsrv/slapd-standalone1
        nsslapd-tmpdir: /tmp
        nsslapd-certdir: /opt/dirsrv/etc/dirsrv/slapd-standalone1
        ...
        

        Episode 139 - Secure voting, firefox send, and toxic comments on the internet

        Posted by Open Source Security Podcast on April 01, 2019 12:03 AM
        Josh and Kurt talk about Brexit, voting, Firefox send, and toxic comments. Is there anything we can do to slow the current trend of conversation on the Internet always seeming to spiral out of control? The answer is maybe with a lot of asterisks.



        Show Notes


          Episode 138 - Information wants to be free

          Posted by Open Source Security Podcast on March 25, 2019 12:05 AM
          Josh and Kurt talk about a prank gone wrong and the reality of what happens when your data ends up public. Once it's public you can't ever put it back. We also discuss Notepad++ no longer signing releases and what signing releases means for the world in general.



          Show Notes


            Lost in (Kerberos) service translation?

            Posted by Alexander Bokovoy on March 24, 2019 07:13 AM

            A year ago Brian J. Atkisson from Red Hat IT filed a bug against FreeIPA asking to remove a default [domain_realm] mapping section from the krb5.conf configuration file generated during installation of a FreeIPA client. The bug is still open and I’d like to use this opportunity to discuss some less known aspects of a Kerberos service principal resolution.

            When an application uses Kerberos to authenticate to a remote service, it needs to talk to a Kerberos key distribution center (KDC) to obtain a service ticket to that remote service. There are multiple ways an application could construct the name of a service, but in a simplistic view it boils down to taking the remote service’s host name and attaching it to a remote service type name. Type names are customary and really depend on an established tradition for the protocol in use. For example, browsers universally assume that a component HTTP/ is used in the service name; to authenticate to the www.example.com server they would ask a KDC for a service ticket to the HTTP/www.example.com principal. When an LDAP client talks to an LDAP server ldap.example.com and uses SASL GSSAPI authentication, it will ask the KDC for a service ticket to ldap/ldap.example.com. Sometimes these assumptions are written down in a corresponding RFC document, sometimes not, but they assume both client and server know what they are doing.

            There are, however, a few more moving parts at play. The host name part of a service principal might come from an interaction with a user. For a browser, this would be the server name from a URL entered by a user, and the browser would need to construct the target service principal from it. The host name part might be incomplete in some cases: if you only have a single DNS domain in use, server names would be unique in that domain and your users might find it handy to only use the first label of the server’s DNS name to address it. Such an approach was certainly very popular among system administrators who relied on the Kerberos library’s ability to expand the short name into a fully qualified one.

            Let’s look into that. The Kerberos configuration file, krb5.conf, allows us to say, for any application, that a hostname passed down to the library needs to be canonicalized. This option, dns_canonicalize_hostname, allows us to say “I want to connect to a server bastion” and let libkrb5 expand that to the bastion.example.com host name. While this behavior is handy, it relies on DNS. A downside of disabling canonicalization of hostnames is that short hostnames will not be canonicalized and requests to the KDC with them might not be recognized. Finally, there is a possibility of DNS hijacking. For Kerberos, cases where DNS responses are spoofed aren’t too problematic since the fake KDC or the fake service wouldn’t gain much knowledge, but even in a normal situation the latency of DNS responses might be a considerable problem.
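
            For reference, this knob lives in the [libdefaults] section of krb5.conf; a minimal example would be:

            [libdefaults]
               dns_canonicalize_hostname = false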

            Another part of the equation is to find out which Kerberos realm a specified target service principal belongs to. If you have a single Kerberos realm, it might not be an issue; by setting the default_realm option in krb5.conf we can make sure a client always assumes the only realm we have. However, if there are multiple Kerberos realms, it is important to map the target service principal to the target realm on the client side, before a request is issued to a KDC.

            There might be multiple Kerberos realms in existence at any site. For example, FreeIPA deployment provides one. If FreeIPA has established a trust to an Active Directory forest, then that forest would represent another Kerberos realm. Potentially, even more than one as each Active Directory domain in an Active Directory forest is a separate Kerberos realm in itself.

            The Kerberos protocol defines that the realm in which the application server is located must be determined by the client (RFC 4120 section 3.3.1). The specification also defines several strategies for how a client may map the hostname of the application server to the realm it believes the server belongs to.

            Domain to realm mapping

            Let us stop and think a bit at this point. A Kerberos client has full control over deciding which realm a particular application server belongs to. If it decides that the application server is from a different realm than the client itself, then it needs to ask for a cross-realm ticket granting ticket from its own KDC. Then, with the cross-realm TGT in possession, the client can ask a KDC of the application server’s realm for the actual service ticket.

            As a client, we want to be sure we are talking to the correct KDC. As mentioned earlier, relying too heavily on DNS is not always particularly secure. As a result, the krb5 library provides a way to control how a particular hostname is mapped to a realm. The search mechanism for realm mapping is pluggable and by default includes:

            • registry-based search on WIN32 (does nothing for Linux)
            • profile-based search: uses [domain_realm] section in krb5.conf to do actual mapping
            • dns-based search that can be disabled with dns_lookup_realm = false
            • domain-based search: it is disabled by default and can be enabled with realm_try_domains = ... option in krb5.conf

            The order of search is important. It is hard-coded in the krb5 library and depends on what operation is performed. For realm selection it is hard-coded that profile-based search is done before DNS-based search, and domain-based search is done last.
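
            For illustration, the configuration knobs mentioned above also live in [libdefaults] (the values here are just an example):

            [libdefaults]
               # Disable DNS-based realm search:
               dns_lookup_realm = false
               # Enable domain-based search (disabled by default); the integer controls
               # how many parent domains are tried:
               realm_try_domains = 1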

            When a [domain_realm] section exists in krb5.conf, it will be used to map a hostname of the application server to a realm. The mapping table in this section is typically built up from host and domain maps:

            [domain_realm]
               www.example.com = EXAMPLE.COM
               .dev.example.com = DEVEXAMPLE.COM
               .example.com = EXAMPLE.COM
            

            The mapping above says that www.example.com would be explicitly mapped to EXAMPLE.COM realm, all machines in DNS zone dev.example.com would be mapped to DEVEXAMPLE.COM realm and the rest of hosts in DNS zone example.com would be mapped to EXAMPLE.COM. This mapping only applies to hostnames, so a hostname foo.bar.example.com would not be mapped by this schema to any realm.

            Profile-based search is visible in the Kerberos trace output as a selection of the realm right at the beginning of a request for a service ticket to a host-based service principal:

            [root@client ~]# kinit -k
            [root@client ~]# KRB5_TRACE=/dev/stderr kvno -S cifs client.example.com
            [30798] 1552847822.721561: Getting credentials host/client.example.com@EXAMPLE.COM -> cifs/client.example.com@EXAMPLE.COM using ccache KEYRING:persistent:0:0
            ...
            

            The difference here is that for a service principal not mapped with profile-based search there will be no assumed realm and the target principal would be constructed without a realm:

            [root@client ~]# kinit -k
            [root@client ~]# KRB5_TRACE=/dev/stderr kvno -S ldap dc.ad.example.com
            [30684] 1552841274.602324: Getting credentials host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@ using ccache KEYRING:persistent:0:0
            

            DNS-based search is activated when the dns_lookup_realm option is set to true in krb5.conf and profile-based search did not return any results. The Kerberos library will do a number of DNS queries for a TXT record starting with _kerberos. This helps it discover which Kerberos realm is responsible for the DNS host of the application server. The Kerberos library will perform these searches for the hostname itself first and then for each domain component in the hostname until it finds an answer or processes all domain components.

            If we have www.example.com as a hostname, then Kerberos library would issue a DNS query for TXT record _kerberos.www.example.com to find a name of the Kerberos realm of www.example.com. If that fails, next try will be for a TXT record _kerberos.example.com and so on, until DNS components are all processed.

            It should be noted that this algorithm is only implemented in the MIT and Heimdal Kerberos libraries. Microsoft’s Active Directory implementation does not allow querying the _kerberos.$hostname DNS TXT record to find out which realm a target application server belongs to. Instead, Windows environments delegate the discovery process to their domain controllers.

            The DNS canonicalization feature (or lack of it) also affects DNS-based search, since without it we wouldn’t know which realm to map a non-fully qualified hostname to. When the dns_canonicalize_hostname option is set to false, a Kerberos client sends the request to the KDC with the default realm associated with the non-fully qualified hostname. Most likely such a service principal wouldn’t be understood by the KDC and would be reported as not found.

            To help in these situations, the FreeIPA KDC supports Kerberos principal aliases. One can use the following ipa command to add aliases to hosts. Remember that a host principal is really a host/<hostname>:

            $ ipa help host-add-principal
            Usage: ipa [global-options] host-add-principal HOSTNAME KRBPRINCIPALNAME... [options]
            
            Add new principal alias to host entry
            Options:
              -h, --help    show this help message and exit
              --all         Retrieve and print all attributes from the server. Affects
                            command output.
              --raw         Print entries as stored on the server. Only affects output
                            format.
              --no-members  Suppress processing of membership attributes.
            
            $ ipa host-add-principal bastion.example.com host/bastion
            -------------------------------------------
            Added new aliases to host "bastion.example.com"
            -------------------------------------------
              Host name: bastion.example.com
              Principal alias: host/bastion.example.com@EXAMPLE.COM, host/bastion@EXAMPLE.COM
            

            and for other Kerberos service principals the corresponding command is ipa service-add-principal:

            $ ipa help service-add-principal
            Usage: ipa [global-options] service-add-principal CANONICAL-PRINCIPAL PRINCIPAL... [options]
            
            Add new principal alias to a service
            Options:
              -h, --help    show this help message and exit
              --all         Retrieve and print all attributes from the server. Affects
                            command output.
              --raw         Print entries as stored on the server. Only affects output
                            format.
              --no-members  Suppress processing of membership attributes.
            
            $ ipa service-show HTTP/bastion.example.com
              Principal name: HTTP/bastion.example.com@EXAMPLE.COM
              Principal alias: HTTP/bastion.example.com@EXAMPLE.COM
              Keytab: False
              Managed by: bastion.example.com
              Groups allowed to create keytab: admins
            [root@nyx ~]# ipa service-add-principal HTTP/bastion.example.com HTTP/bastion
            ---------------------------------------------------------------------------------
            Added new aliases to the service principal "HTTP/bastion.example.com@EXAMPLE.COM"
            ---------------------------------------------------------------------------------
              Principal name: HTTP/bastion.example.com@EXAMPLE.COM
              Principal alias: HTTP/bastion.example.com@EXAMPLE.COM, HTTP/bastion@EXAMPLE.COM
            
            

            Finally, domain-based search is activated when realm_try_domains = ... is specified. In this case the Kerberos library will try heuristics based on the hostname of the target application server, using a specific number of its domain components depending on how many components the realm_try_domains option allows to cut off. More about that later.

            However, there is another option employed by the MIT Kerberos library. When an MIT Kerberos client is unable to find out the realm on its own, starting with MIT krb5 1.6 the client will issue a request without a known realm to its own KDC. The KDC (which must be MIT krb5 1.7 or later) can opt to match the hostname against its own [domain_realm] mapping table and choose to issue a referral to the appropriate service realm.

            The latter approach only works if the KDC has been configured to issue such referrals and if the client is asking for a host-based service. The FreeIPA KDC allows this behavior by default. For trusted Active Directory realms there is also support from SSSD on IPA masters: SSSD automatically generates [domain_realm] and [capaths] sections for all known trusted realms so that the KDC is able to respond with the referrals.

            However, care should be taken by the application itself on the client side when constructing such a Kerberos principal. For example, with the kvno utility, a request kvno -S service hostname would ask for a referral while kvno service/hostname would not. The former constructs a host-based principal while the latter does not.

            When looking at the Kerberos trace, we can see the difference. Below host/client.example.com is asking for a service ticket to ldap/dc.ad.example.com as a host-based principal, without knowing which realm the application server’s principal belongs to:

            [root@client ~]# kinit -k
            [root@client ~]# KRB5_TRACE=/dev/stderr kvno -S ldap dc.ad.example.com
            [30684] 1552841274.602324: Getting credentials host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@ using ccache KEYRING:persistent:0:0
            [30684] 1552841274.602325: Retrieving host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@ from KEYRING:persistent:0:0 with result: -1765328243/Matching credential not found
            [30684] 1552841274.602326: Retrying host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@EXAMPLE.COM with result: -1765328243/Matching credential not found
            [30684] 1552841274.602327: Server has referral realm; starting with ldap/dc.ad.example.com@EXAMPLE.COM
            [30684] 1552841274.602328: Retrieving host/client.example.com@EXAMPLE.COM -> krbtgt/EXAMPLE.COM@EXAMPLE.COM from KEYRING:persistent:0:0 with result: 0/Success
            [30684] 1552841274.602329: Starting with TGT for client realm: host/client.example.com@EXAMPLE.COM -> krbtgt/EXAMPLE.COM@EXAMPLE.COM
            [30684] 1552841274.602330: Requesting tickets for ldap/dc.ad.example.com@EXAMPLE.COM, referrals on
            [30684] 1552841274.602331: Generated subkey for TGS request: aes256-cts/A93C
            [30684] 1552841274.602332: etypes requested in TGS request: aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1, rc4-hmac, camellia128-cts, camellia256-cts
            [30684] 1552841274.602334: Encoding request body and padata into FAST request
            [30684] 1552841274.602335: Sending request (965 bytes) to EXAMPLE.COM
            [30684] 1552841274.602336: Initiating TCP connection to stream ip.ad.dr.ess:88
            [30684] 1552841274.602337: Sending TCP request to stream ip.ad.dr.ess:88
            [30684] 1552841274.602338: Received answer (856 bytes) from stream ip.ad.dr.ess:88
            [30684] 1552841274.602339: Terminating TCP connection to stream ip.ad.dr.ess:88
            [30684] 1552841274.602340: Response was from master KDC
            [30684] 1552841274.602341: Decoding FAST response
            [30684] 1552841274.602342: FAST reply key: aes256-cts/D1E2
            [30684] 1552841274.602343: Reply server krbtgt/AD.EXAMPLE.COM@EXAMPLE.COM differs from requested ldap/dc.ad.example.com@EXAMPLE.COM
            [30684] 1552841274.602344: TGS reply is for host/client.example.com@EXAMPLE.COM -> krbtgt/AD.EXAMPLE.COM@EXAMPLE.COM with session key aes256-cts/470F
            [30684] 1552841274.602345: TGS request result: 0/Success
            [30684] 1552841274.602346: Following referral TGT krbtgt/AD.EXAMPLE.COM@EXAMPLE.COM
            [30684] 1552841274.602347: Requesting tickets for ldap/dc.ad.example.com@AD.EXAMPLE.COM, referrals on
            [30684] 1552841274.602348: Generated subkey for TGS request: aes256-cts/F0C6
            [30684] 1552841274.602349: etypes requested in TGS request: aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1, rc4-hmac, camellia128-cts, camellia256-cts
            [30684] 1552841274.602351: Encoding request body and padata into FAST request
            [30684] 1552841274.602352: Sending request (921 bytes) to AD.EXAMPLE.COM
            [30684] 1552841274.602353: Sending DNS URI query for _kerberos.AD.EXAMPLE.COM.
            [30684] 1552841274.602354: No URI records found
            [30684] 1552841274.602355: Sending DNS SRV query for _kerberos._udp.AD.EXAMPLE.COM.
            [30684] 1552841274.602356: SRV answer: 0 0 88 "dc.ad.example.com."
            [30684] 1552841274.602357: Sending DNS SRV query for _kerberos._tcp.AD.EXAMPLE.COM.
            [30684] 1552841274.602358: SRV answer: 0 0 88 "dc.ad.example.com."
            [30684] 1552841274.602359: Resolving hostname dc.ad.example.com.
            [30684] 1552841274.602360: Resolving hostname dc.ad.example.com.
            [30684] 1552841274.602361: Initiating TCP connection to stream ano.ther.add.ress:88
            [30684] 1552841274.602362: Sending TCP request to stream ano.ther.add.ress:88
            [30684] 1552841274.602363: Received answer (888 bytes) from stream ano.ther.add.ress:88
            [30684] 1552841274.602364: Terminating TCP connection to stream ano.ther.add.ress:88
            [30684] 1552841274.602365: Sending DNS URI query for _kerberos.AD.EXAMPLE.COM.
            [30684] 1552841274.602366: No URI records found
            [30684] 1552841274.602367: Sending DNS SRV query for _kerberos-master._tcp.AD.EXAMPLE.COM.
            [30684] 1552841274.602368: No SRV records found
            [30684] 1552841274.602369: Response was not from master KDC
            [30684] 1552841274.602370: Decoding FAST response
            [30684] 1552841274.602371: FAST reply key: aes256-cts/10DE
            [30684] 1552841274.602372: TGS reply is for host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@AD.EXAMPLE.COM with session key aes256-cts/24D1
            [30684] 1552841274.602373: TGS request result: 0/Success
            [30684] 1552841274.602374: Received creds for desired service ldap/dc.ad.example.com@AD.EXAMPLE.COM
            [30684] 1552841274.602375: Storing host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@ in KEYRING:persistent:0:0
            [30684] 1552841274.602376: Also storing host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@AD.EXAMPLE.COM based on ticket
            [30684] 1552841274.602377: Removing host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@AD.EXAMPLE.COM from KEYRING:persistent:0:0
            ldap/dc.ad.example.com@: kvno = 28
            

            However, when not using a host-based principal in the request, we’ll fail.

            [root@client ~]# kinit -k
            [root@client ~]# KRB5_TRACE=/dev/stderr kvno ldap/dc.ad.example.com
            [30695] 1552841932.100975: Getting credentials host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@EXAMPLE.COM using ccache KEYRING:persistent:0:0
            [30695] 1552841932.100976: Retrieving host/client.example.com@EXAMPLE.COM -> ldap/dc.ad.example.com@EXAMPLE.COM from KEYRING:persistent:0:0 with result: -1765328243/Matching credential not found
            [30695] 1552841932.100977: Retrieving host/client.example.com@EXAMPLE.COM -> krbtgt/EXAMPLE.COM@EXAMPLE.COM from KEYRING:persistent:0:0 with result: 0/Success
            [30695] 1552841932.100978: Starting with TGT for client realm: host/client.example.com@EXAMPLE.COM -> krbtgt/EXAMPLE.COM@EXAMPLE.COM
            [30695] 1552841932.100979: Requesting tickets for ldap/dc.ad.example.com@EXAMPLE.COM, referrals on
            [30695] 1552841932.100980: Generated subkey for TGS request: aes256-cts/27DA
            [30695] 1552841932.100981: etypes requested in TGS request: aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1, rc4-hmac, camellia128-cts, camellia256-cts
            [30695] 1552841932.100983: Encoding request body and padata into FAST request
            [30695] 1552841932.100984: Sending request (965 bytes) to EXAMPLE.COM
            [30695] 1552841932.100985: Initiating TCP connection to stream ip.ad.dr.ess:88
            [30695] 1552841932.100986: Sending TCP request to stream ip.ad.dr.ess:88
            [30695] 1552841932.100987: Received answer (461 bytes) from stream ip.ad.dr.ess:88
            [30695] 1552841932.100988: Terminating TCP connection to stream ip.ad.dr.ess:88
            [30695] 1552841932.100989: Response was from master KDC
            [30695] 1552841932.100990: Decoding FAST response
            [30695] 1552841932.100991: TGS request result: -1765328377/Server ldap/dc.ad.example.com@EXAMPLE.COM not found in Kerberos database
            [30695] 1552841932.100992: Requesting tickets for ldap/dc.ad.example.com@EXAMPLE.COM, referrals off
            [30695] 1552841932.100993: Generated subkey for TGS request: aes256-cts/C1BF
            [30695] 1552841932.100994: etypes requested in TGS request: aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1, rc4-hmac, camellia128-cts, camellia256-cts
            [30695] 1552841932.100996: Encoding request body and padata into FAST request
            [30695] 1552841932.100997: Sending request (965 bytes) to EXAMPLE.COM
            [30695] 1552841932.100998: Initiating TCP connection to stream ip.ad.dr.ess:88
            [30695] 1552841932.100999: Sending TCP request to stream ip.ad.dr.ess:88
            [30695] 1552841932.101000: Received answer (461 bytes) from stream ip.ad.dr.ess:88
            [30695] 1552841932.101001: Terminating TCP connection to stream ip.ad.dr.ess:88
            [30695] 1552841932.101002: Response was from master KDC
            [30695] 1552841932.101003: Decoding FAST response
            [30695] 1552841932.101004: TGS request result: -1765328377/Server ldap/dc.ad.example.com@EXAMPLE.COM not found in Kerberos database
            kvno: Server ldap/dc.ad.example.com@EXAMPLE.COM not found in Kerberos database while getting credentials for ldap/dc.ad.example.com@EXAMPLE.COM
            

            As you can see, our client tried to ask for a service ticket to a non-host-based service principal from outside our realm and this was not accepted by the KDC, so the resolution failed.

            Mixed realm deployments

            The behavior above is predictable. However, client-side processing of the target realm behaves wrongly when a client needs to request a service ticket for a service principal that is located in a trusted realm but situated in a DNS zone belonging to our own realm. This might sound like a complication but it is a typical situation for deployments with FreeIPA trusting Active Directory forests. In such cases customers often want to place Linux machines right in the DNS zones associated with Active Directory domains.

            Since Microsoft’s Active Directory implementation, unlike MIT Kerberos or Heimdal, does not support per-host Kerberos realm hints, such a request from a Windows client will always fail. It will not be possible to obtain a service ticket in such a situation from Windows machines.

            However, when both realms trusting each other are MIT Kerberos, their KDCs and clients can be configured for selective realm discovery.

            As explained at FOSDEM 2018 and devconf.cz 2019, Red Hat IT moved from an old plain Kerberos realm to the FreeIPA deployment. This is a situation where we have EXAMPLE.COM and IPA.EXAMPLE.COM both trusting each other, with systems migrating to IPA.EXAMPLE.COM over a long period of time. We want to continue providing services in the example.com DNS zone but use the IPA.EXAMPLE.COM realm. Our clients are in both Kerberos realms but over time they will all eventually migrate to IPA.EXAMPLE.COM.

            Working with such a situation can be tricky. Let’s start with a simple example.

            Suppose our client’s krb5.conf has a [domain_realm] section that looks like this:

            [domain_realm]
               client.example.com = EXAMPLE.COM
               .example.com = EXAMPLE.COM
            

            If we need to ask for an HTTP/app.example.com service ticket to the application server hosted on app.example.com, the Kerberos library on the client will map HTTP/app.example.com to EXAMPLE.COM and will not attempt to request a referral from a KDC. If our application server is enrolled into the IPA.EXAMPLE.COM realm, it means a client with such a configuration will never try to discover HTTP/app.example.com@IPA.EXAMPLE.COM and will never be able to authenticate to app.example.com with Kerberos.

            There are two possible solutions here. We can either add an explicit mapping for the app.example.com host to IPA.EXAMPLE.COM in the client’s [domain_realm] section in krb5.conf, or remove the .example.com mapping entry from [domain_realm] on the client side completely and rely on KDC referrals or DNS-based search.
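
            The first option would look like this on the client (shown only to illustrate the idea; an exact host entry takes precedence over the domain suffix rule):

            [domain_realm]
               app.example.com = IPA.EXAMPLE.COM
               client.example.com = EXAMPLE.COM
               .example.com = EXAMPLE.COM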

            The first solution does not scale and is a management issue. Updating all clients when a new application server is migrated to the new realm sounds like a nightmare if the majority of your clients are laptops. You’d really want to force them to delegate to the KDC or do DNS-based search instead.

            Of course, there is a simple solution: add a _kerberos.app.example.com TXT record pointing to IPA.EXAMPLE.COM in DNS and let clients use it. This assumes that all clients do not have the .example.com = EXAMPLE.COM mapping rule.
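
            In BIND zone file syntax such a record would look roughly like this:

            _kerberos.app.example.com.  IN  TXT  "IPA.EXAMPLE.COM"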

            Unfortunately, it is more complicated. As Robbie Harwood, the Fedora and RHEL maintainer of MIT Kerberos, explained to me, the problem is what happens if there’s inadequate DNS information, e.g. the DNS-based search failed. A client would fall back to heuristics (domain-based search), and these differ depending on which MIT Kerberos version is in use. Since MIT Kerberos 1.16 the heuristics would prefer mapping HTTP/app.ipa.example.com into IPA.EXAMPLE.COM over EXAMPLE.COM, and prefer EXAMPLE.COM over failure. However, there is no way to map HTTP/app.example.com to IPA.EXAMPLE.COM with these heuristics.

            Domain-based search gives us another heuristic based on the realm. It is tunable via the realm_try_domains option, but that option also affects how the MIT Kerberos library chooses a credentials cache from a credentials cache collection (the KEYRING:, DIR:, and KCM: ccache types). This logic has been present since MIT Kerberos 1.12, but it also doesn’t help us map HTTP/app.example.com to IPA.EXAMPLE.COM.

            After some discussion, Robbie and I came to the conclusion that changing the order in which these methods are applied by the MIT Kerberos library could help. As I mentioned in the “Domain to realm mapping” section, the current order is hard-coded: for realm selection, the profile-based search is done before the DNS-based search, and the domain-based search is done last. Ideally, choosing the order of these searches would be left to administrators. However, there aren’t many reasonable orders, so perhaps allowing just two options would be enough:

            • prioritizing DNS search over a profile search
            • prioritizing a profile search over DNS search

            Until that is done, we are left with the following recommendations for domains that mix Kerberos principals from multiple realms:

            • make sure you don’t use [domain_realm] mapping for mixed realm domains
            • make sure you have a _kerberos.$hostname TXT record set per host/domain with the right realm name. Remember that a Kerberos realm name is case-sensitive and almost everywhere it is uppercase, so be sure the value of the TXT record is correct.

            Episode 137.5 - Holy cow Beto was in the cDc, this is awesome!

            Posted by Open Source Security Podcast on March 18, 2019 12:01 AM
            Josh and Kurt talk about Beto being in the Cult of the Dead Cow (cDc). This is a pretty big deal in a very good way. We hit on some history, why it's a great thing, and what we can probably expect from opponents. There's even some advice at the end on how we can all help. We need more politicians with backgrounds like this.


            <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/9037547/height/90/theme/custom/thumbnail/yes/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

            Show Notes


              Convert Docker Image Output to an HTML Table

              Posted by Adam Young on March 14, 2019 11:17 PM
              #!/bin/sh

              # Wrap the output of `docker images` in an HTML table: the REPOSITORY
              # header line becomes a row of <th> cells, and each image line
              # (matched on "MB") becomes a row of <td> cells.
              docker images | awk '
              BEGIN {print ("<table>")};
              /REPOSITORY/{
              print ("<tr><th>" $1 "</th><th>" $2 "</th><th>" $3, $4 "</th><th>" $5 "</th><th>" $6 "</th></tr>")}
              /MB/{
              print ("<tr><td>" $1 "</td><td>" $2 "</td><td>" $3 "</td><td>" $4, $5, $6 "</td><td>" $7, $8 "</td></tr>")}
              END {print ("</table>")}'

              Building the Kolla Keystone Container

              Posted by Adam Young on March 14, 2019 03:43 PM

              Kolla has become the primary source of containers for running OpenStack services. Since it has been a while since I tried deliberately running just the Keystone container, I decided to build the Kolla version from scratch and run it.

              UPDATE: Ozz wrote it already, and did it better: http://jaormx.github.io/2017/testing-containerized-openstack-services-with-kolla/

              I had a clone of the Kolla repo already, but if you need one, you can get it by cloning:

              git clone git://git.openstack.org/openstack/kolla

              All of the dependencies you need to run the build process are handled by tox. Assuming you can run tox elsewhere, you can use that here, too:

              tox -e py35
              

              That will run through all the unit tests. They do not take that long.

              To build all of the containers, activate the virtual environment and then use the build tool. That takes quite a while, since there are a lot of containers required to run OpenStack.

              $ . .tox/py35/bin/activate
              (py35) [ayoung@ayoungP40 kolla]$ tools/build.py 
              

              If you want to build just the keystone containers….

               python tools/build.py keystone
              

              Building this with no base containers cached took me 5 minutes. Delta builds should be much faster.

              Once the build is complete, you will have a bunch of container images defined on your system:

              REPOSITORY TAG IMAGE ID CREATED SIZE
              kolla/centos-binary-keystone 7.0.2 69049739bad6 33 minutes ago 800 MB
              kolla/centos-binary-keystone-fernet 7.0.2 89977265fcbb 33 minutes ago 800 MB
              kolla/centos-binary-keystone-ssh 7.0.2 4b377e854980 33 minutes ago 819 MB
              kolla/centos-binary-barbican-keystone-listener 7.0.2 6265d0acff16 33 minutes ago 732 MB
              kolla/centos-binary-keystone-base 7.0.2 b6d78b9e0769 33 minutes ago 774 MB
              kolla/centos-binary-barbican-base 7.0.2 ccd7b4ff311f 34 minutes ago 706 MB
              kolla/centos-binary-openstack-base 7.0.2 38dbb3c57448 34 minutes ago 671 MB
              kolla/centos-binary-base 7.0.2 177c786e9b01 36 minutes ago 419 MB
              docker.io/centos 7 1e1148e4cc2c 3 months ago 202 MB
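
              To sanity-check one of the freshly built images, it might be enough to start it with an interactive shell and poke around (the tag below is just the one my build produced):

              docker run -it --rm kolla/centos-binary-keystone:7.0.2 /bin/bash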

              Note that the build instructions live in the git repo under docs.

              Episode 137 - When the IoT attacks!

              Posted by Open Source Security Podcast on March 11, 2019 12:03 AM
              Josh and Kurt talk about when devices attack! It's not quite that exciting, but there have been a slew of news about physical devices causing problems for humans. We end on the note that we're getting closer to a point when lawyers and regulators will start to pay attention. We're not there yet, so we still have a horrible insecure future on the horizon.


              <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8947304/height/90/theme/custom/thumbnail/yes/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

              Show Notes


                Accessing C Arrays of String from Rust

                Posted by Adam Young on March 10, 2019 08:53 PM

                Now that I can list the group names, I want to be able to list the members of the groups.

                The first task is to add a field to the GroupEnt structure to access the member list. The structure now looks like this:

                #[repr(C)]
                pub struct GroupEnt {
                    gr_name:   *const c_char,        /* group name */
                    gr_passwd:  *const c_char,      /* group password */
                    gr_gid:    u32,         /* group ID */
                    gr_mem:    *const *const c_char         /* NULL-terminated array of pointers to names of group members */
                }
                

                Iterating through the gr_mem field is done via the offset accessor like this:

                 let member_ptr_arr: *const *const c_char  =  (*groupent).gr_mem;
                                if member_ptr_arr == ptr::null() {
                                    continue;
                                }
                                for i in 0 .. {
                                    let member_ptr: *const c_char = *(member_ptr_arr.offset(i));
                

                This code is tricky to get right:
                let member_ptr: *const c_char = *(member_ptr_arr.offset(i));
                The offset call goes inside the parens, the dereferencing * goes outside. But that matches the intention of the C code.
                The strings are accessed via the same method as the group names. If the pointer is null, we exit the loop:

                     if member_ptr != ptr::null() {
                     let member: &CStr = CStr::from_ptr(member_ptr) ;
                         println!("  {}", member.to_str().unwrap());
                     }else{
                         break; 
                     }
                

                The whole function looks like this:

                fn enumerate_groups(){
                    let mut groupent: * const GroupEnt;
                    unsafe{
                        setgrent();
                        groupent = getgrent();
                    }
                    while groupent != ptr::null(){
                        let c_str: &CStr = unsafe { CStr::from_ptr((*groupent).gr_name) };
                        println!("{}", c_str.to_str().unwrap());
                        unsafe{
                            if  (*groupent).gr_mem != ptr::null(){
                                let member_ptr_arr: *const *const c_char  =  (*groupent).gr_mem;
                                if member_ptr_arr == ptr::null() {
                                    continue;
                                }
                                for i in 0 .. {
                                    let member_ptr: *const c_char = *(member_ptr_arr.offset(i));
                                    if member_ptr != ptr::null() {
                                        let member: &CStr = CStr::from_ptr(member_ptr) ;
                                        println!("  {}", member.to_str().unwrap());
                                    }else{
                                        break;
                                    }
                                }
                            }
                        }         
                        unsafe{
                            groupent = getgrent();
                        }        
                    }       
                    unsafe{
                        endgrent();
                    }
                }
                
                
                

                Big thanks to Sebastian K (sebk) in #rust-beginners (Mozilla IRC) for helping me get this to work and understand this.
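
                As a follow-up, the pointer walking above could be wrapped in a small helper that turns the NULL-terminated array into owned Rust strings. This is just a sketch of the same logic (reusing the libc crate already declared for this program), not code from the utility itself:

                use libc::c_char;
                use std::ffi::CStr;

                /// Walk a NULL-terminated C array of C strings and collect owned Strings.
                /// Safety: `arr` must be null or point to a valid NULL-terminated array.
                unsafe fn c_string_array_to_vec(arr: *const *const c_char) -> Vec<String> {
                    let mut out = Vec::new();
                    if arr.is_null() {
                        return out;
                    }
                    for i in 0.. {
                        let ptr = *arr.offset(i);
                        if ptr.is_null() {
                            break;
                        }
                        out.push(CStr::from_ptr(ptr).to_string_lossy().into_owned());
                    }
                    out
                }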

                Iterating through an FFI API in Rust

                Posted by Adam Young on March 10, 2019 05:26 PM

                Now that I know I can read a single group, the next step is to iterate.

                Iterating over this C API requires the ability to test for the end of iteration. For this, I use std::ptr to test for a null pointer. To include this in the Rust file:

                use std::ptr; 

                To test for a null pointer in a while loop:

                while groupent != ptr::null(){

                This style of while loop requires calling the getgrent function twice. I don’t love that, but it seems to be the clearest code. Here is the whole loop:

                fn enumerate_groups(){
                    let mut groupent: * const GroupEnt;
                
                    unsafe{
                        setgrent();
                        groupent = getgrent();
                    }
                    while groupent != ptr::null(){
                        let c_str: &CStr = unsafe { CStr::from_ptr((*groupent).gr_name) };
                        println!("{}", c_str.to_str().unwrap());
                        unsafe{
                            groupent = getgrent();
                        }        
                    }       
                    unsafe{
                        endgrent();
                    }
                }
                

                The multiple unsafe blocks try to isolate the unsafe portions, but also enable refactoring as a follow-on step.

                Reading Linux groups via the Rust Foreign Function Interface

                Posted by Adam Young on March 10, 2019 05:17 PM

                The world continues to embrace Rust for its safety properties. While writing utilities in Rust, we are going to have to work with existing code to perform common tasks. I recently needed to list the set of Linux groups registered on a system and get access to the users assigned to each. Here are my notes on what I learned.

                Comparable code in C

                The three C APIs I want to call are:

                • getgrent
                • setgrent
                • endgrent

                A simple C program to enumerate the groups looks like this:

                #include <stdio.h>
                
                #define _GNU_SOURCE         /* See feature_test_macros(7) */
                #include <grp.h>
                
                
                int main(){
                  struct group * current_group;
                  printf("Hello World\n");
                  setgrent();
                  while( current_group = getgrent()){
                    if (current_group){
                      printf("ID: %6d ", current_group->gr_gid);
                      printf("Name: %20s \n", current_group->gr_name);
                    }
                  } 
                  endgrent();
                  return 0;
                }
                
                

                Steps in Rust

                In order to make these same calls from Rust, I have to do a few things:

                1. Import the functions from native code.
                2. Create a comparable structure to the struct group defined in C.
                3. Wrap the calls to the C code in an unsafe block
                4. Convert from the raw memory types to Rust types that I can use in standard rust macros like println!

                Import the functions from Native Code

                The functions I want to call are in libc. For the Cargo system to access them, I need the following dependency:

                [dependencies]
                libc = "0.2.0"
                

                Inside the Rust code itself, I have to reference the foreign library. I also need a couple of standard library items for string conversion:

                extern crate libc;
                use libc::c_char;
                use std::ffi::CStr;
                use std::str;
                

                Create a comparable structure to the struct group defined in C.

                For the group structure, I need a comparable Rust structure. Since in this iteration I am not going through the group members, I can limit myself to the first few elements of the structure:

                #[repr(C)]
                pub struct GroupEnt {
                    gr_name:   *const c_char,        /* group name */
                    gr_passwd:  *const c_char,      /* group password */
                    gr_gid:    u32         /* group ID */
                }
                
                
                To import the foreign functions, I need a block that defines them:
                extern {
                    fn setgrent();
                    fn getgrent() -> *const GroupEnt;
                    fn endgrent();
                }
                

                Wrap the calls to the C code in an unsafe block and convert from the raw memory types to Rust types

                Finally, to call the code, I need to wrap the calls in unsafe blocks.

                fn enumerate_groups(){
                    let groupent: * const GroupEnt;
                
                    unsafe{
                        setgrent();
                        groupent = getgrent();
                    }
                    let c_str: &CStr = unsafe { CStr::from_ptr((*groupent).gr_name) };
                    println!("{}", c_str.to_str().unwrap());
                
                    unsafe{
                        endgrent();
                    }
                }
                
                

                This will print the first element of the list. Next steps:

                1. Iterate through the whole list
                2. Iterate through the list of users.

                Episode 136 - How people feel is more important than being right

                Posted by Open Source Security Podcast on March 04, 2019 01:08 AM
                Josh and Kurt talk about github blocking the Deepfakes repository. There's a far bigger discussion about how people feel, and sometimes security fails to understand that making people feel happy or safer is more important than being right.


                <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8860862/height/90/theme/custom/thumbnail/yes/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                Show Notes


                  Programming Lessons and Methods

                  Posted by William Brown on February 25, 2019 01:00 PM

                  Programming Lessons and Methods

                  Everyone has their own lessons and methods that they use when approaching programming. These are the lessons I have learnt, which I think are the most important when it comes to design, testing and communication.

                  Comments and Design

                  Programming is the art of writing human-readable code that a machine will eventually run. Your program needs to be reviewed, discussed and parsed by another human. That means you need to write your program in a way they can understand first.

                  Rather than rushing into code, and hacking until it works, I find it’s great to start with comments such as:

                  fn data_access(search: Search) -> Type {
                      // First check the search is valid
                      //  * No double terms
                      //  * All schema is valid
                  
                      // Retrieve our data based on the search
                  
                      // if debug, do an un-indexed assert the search matches
                  
                      // Do any needed transforms
                  
                      // Return the data
                  }
                  

                  After that, I walk away, think about the issue, come back, maybe tweak these comments. When I eventually fill in the code in between, I leave all the comments in place. This really helps my future self understand what I was thinking, but it also helps other people understand too.

                  State Machines

                  State machines are a way to design and reason about the states a program can be in. They allow exhaustive representations of all possible outcomes of a function. A simple example is a microwave door.

                    /----\            /----- close ----\          /-----\
                    |     \          /                 v         v      |
                    |    -------------                ---------------   |
                  open   | Door Open |                | Door Closed |  close
                    |    -------------                ---------------   |
                    |    ^          ^                  /          \     |
                    \---/            \------ open ----/            \----/
                  

                  When the door is open, opening it again does nothing. Only when the door is open and we close the door (an event) does the door close (a transition). Once closed, the door cannot be closed any more (the event does nothing). It’s only when we open the door again that a state change can occur.

                  There is much more to state machines than this, but they allow us as humans to reason about our designs and model our programs to have all possible outcomes considered.
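
                  As a rough illustration of how a diagram like this might translate into code, here is a minimal Rust sketch (the names are mine, not from any real project). Each (state, event) pair is handled, and events that don’t apply simply leave the state unchanged:

                  #[derive(Debug, PartialEq)]
                  enum Door {
                      Open,
                      Closed,
                  }

                  enum Event {
                      Open,
                      Close,
                  }

                  impl Door {
                      // Consume the current state and an event, return the next state.
                      fn handle(self, event: Event) -> Door {
                          match (self, event) {
                              (Door::Open, Event::Close) => Door::Closed,
                              (Door::Closed, Event::Open) => Door::Open,
                              // opening an open door or closing a closed door does nothing
                              (state, _) => state,
                          }
                      }
                  }

                  So Door::Open.handle(Event::Close) yields Door::Closed, while Door::Open.handle(Event::Open) just hands back Door::Open.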

                  Zero, One and Infinite

                  In mathematics there are only three numbers that matter. Zero, One and Infinite. It turns out the same is true in a computer too.

                  When we are making a function, we can define limits in these terms. For example:

                  fn thing(argument: Type)
                  

                  In this case, argument is “One” thing, and must be one thing.

                  fn thing(argument: Option<Type>)
                  

                  Now we have argument as an option, so it’s “Zero” or “One”.

                  fn thing(argument: Vec<Type>)
                  

                  Now we have argument as vec (array), so it’s “Zero” to “Infinite”.

                  When we think about this, our functions have to handle these cases properly. We don’t write functions that take a vec with only two items, we write a function with two arguments where each one must exist. It’s hard to handle “two” - it’s easy to handle two cases of “one”.

                  It also is a good guide for how to handle data sets, assuming they could always be infinite in size (or at least any arbitrary size).

                  You can then apply this to tests. In a test given a function of:

                  fn test_me(a: Option<Type>, b: Vec<Type>)
                  

                  We know we need to test permutations of:

                  • a is “Zero” or “One” (Some, None)
                  • b is “Zero”, “One” or “Infinite” (.len() == 0, .len() == 1, .len() > 1); a test sketch follows the note below

                  Note: Most languages don’t have an array type that is “One to Infinite”, i.e. non-empty. If you want this condition (at least one item), you have to assert it yourself on top of the type system.
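
                  A minimal sketch of what those permutations could look like as Rust tests, using a hypothetical test_me that simply counts the values it receives:

                  fn test_me(a: Option<u32>, b: Vec<u32>) -> usize {
                      // Hypothetical function under test: count how many values were passed in.
                      a.iter().count() + b.len()
                  }

                  #[cfg(test)]
                  mod tests {
                      use super::*;

                      #[test]
                      fn a_zero_or_one() {
                          assert_eq!(test_me(None, vec![]), 0);
                          assert_eq!(test_me(Some(1), vec![]), 1);
                      }

                      #[test]
                      fn b_zero_one_or_many() {
                          assert_eq!(test_me(None, vec![]), 0);
                          assert_eq!(test_me(None, vec![1]), 1);
                          assert_eq!(test_me(None, vec![1, 2, 3]), 3);
                      }
                  }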

                  Correct, Simple, Fast

                  Finally, we can put all of the above tools together and apply a general philosophy. When writing a program, first make it correct, then simplify the program, then make it fast.

                  If you don’t do it in this order you will hit barriers - social and technical. For example, if you make something fast, then simple, then correct, you will likely have correctness issues that cannot be fixed without a decrease in performance. People don’t like it when you introduce a patch that drops performance, so as a result correctness is sacrificed. (Spectre anyone?)

                  If you make something too simple, you may never be able to make it correctly handle all cases that exist in your application - likely necessitating a future rewrite to make it correct.

                  If you do correct, fast, simple, then your program will be correct and fast, but hard for a human to understand. Because programming is the art of communicating intent to a person, sacrificing simplicity in favour of speed will make it hard to involve new people and to educate and mentor them into development of your project.

                  • Correct: Does it behave correctly, handle all states and inputs correctly?
                  • Simple: Is it easy to comprehend and follow for a human reader?
                  • Fast: Is it performant?

                  Episode 135 - Passwords, AI, and cloud strategy

                  Posted by Open Source Security Podcast on February 25, 2019 01:07 AM
                  Josh and Kurt talk about change your password day (what a terrible day). Google's password checkup (not a terrible idea), an AI finding new spice flavors we expect will one day take over the world, and we finish up on a new DoD cloud strategy. Also Josh burnt his finger, but is going to be OK.


                  <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8766359/height/90/theme/custom/thumbnail/yes/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                  Show Notes


                    Musings on Hybrid Cloud

                    Posted by Dan Walsh on February 20, 2019 08:55 PM

                    I work on the lowest levels of container runtimes and usually around process security.  My team and I work on basically everything needed to run containers on the host operating system under Kubernetes.  I also work in the OpenShift group at Red Hat.

                    I hear a lot of thoughts on Hybrid Cloud and how the goal of OpenShift is to bridge the gap between on-prem data center services and virtualization on one side and cloud services on the other.  Usually these services are provided by the big three clouds: Amazon AWS, Microsoft Azure, and Google GCE.  Maybe I should add Alibaba to this list. 

                    It is really cool that OpenShift and Kubernetes have the ability to move workloads from your in-house data centers to different clouds.  Imagine Kubernetes nodes on VMWare, OpenStack or RHEV virtualization running alongside nodes in the cloud services, all powered by OpenShift/Kubernetes. 

                    OpenShift/Kubernetes can scale up off of your in-house data centers to the cloud, basically renting capacity when demand skyrockets and then dropping back when demand slackens, saving you the rent check.  

                    I envision a world where you could get deals off of Microsoft Azure to save .05 cents per hour on your rent.  You press a button on OpenShift which moves  hundreds/thousands of nodes off of AWS and onto Azure.  (Of course to make this work customers need to make sure they don’t get tied into services on any of the big cloud vendors)

                    Big Cloud Vendors == Walmart/Amazon Retail Business

                    I have been thinking of another use case, and I like the analogy of what Walmart and Amazon did to the retail business in the world.  Over the last 20/30 years we have seen the retail world destroyed by these two behemoths.  Malls all over the US and probably all over the world are crumbling, but one type of retail has survived, and I might even say thrived: specialty stores.  I should know, because my wife drags me to them all the time. 

                    But Dan what does this have to do with Cloud?

                    Well, I look at the big cloud vendors like AWS, Azure and GCE as the "Walmarts" of the retail business. But when I look around, I am seeing some specialty clouds showing up. Here are three examples.

                    • Red Hat has been working recently with NVidia on their cloud services. What would happen if NVidia started providing their latest and greatest GPU’s in their cloud before they were available for retail sale? Would early adopters be willing to rent these services for a while?  I could see people wanting to run certain workloads in NVidia Cloud to get access to these super fast GPUs.
                    • IBM Cloud could start to offer services on their high powered main frames, Z Series?  Perhaps access to quantum computing services. Maybe Knative support for Watson?
                    • Oracle Cloud can probably do a better job of handling databases on demand than anyone else.  Imagine Cloud services with Database on demand.

                    Who knows, maybe Walmart will start offering cloud services…

                    Could we see a future where customers want to run their application front ends on the “Walmart” clouds but run their back-end services on some of the specialty clouds?

                    Some of these specialty clouds could grow rather large as well, similar to how Lowes, Home Depot and Best Buy have been able to stay alive, and even thrive, by concentrating on specialty services that the Walmart/Amazon retail businesses have difficulty competing with.

                    Conclusion

                    I believe working with hybrid/cross-cloud tools like OpenShift gives customers the best way to prevent lock-in to any of the big cloud vendors.  OpenShift will allow users to move workloads between the big cloud vendors, their private data centers and the specialty clouds: the best of local retail along with commodity retail.  Run your application where it makes sense and protect it against vendor lock-in.

                    Episode 134 - What's up with the container runc security flaw?

                    Posted by Open Source Security Podcast on February 18, 2019 01:23 AM
                    Josh and Kurt talk about the new runc container security flaw. How does the flaw work, what can you do about it, what should you do about it, and what the future of container security may look like.


                    <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8680961/height/90/theme/custom/thumbnail/yes/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                    Show Notes


                      Extract Method Refactoring in Rust

                      Posted by Adam Young on February 14, 2019 06:20 PM

                      I’m writing a simple utility for managing the /etc/hosts file. I want it in a native language so I can make it SUID, or even better, lock it down via capabilities. I also want to remember how to code in Rust. Once I get a simple bit working, I want to refactor. Here’s what I did.

                      Here is the functioning code:

                      use std::env;
                      use std::fs;
                      
                      fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
                          s.lines().map(|line| {
                              line.split_whitespace().collect()
                          }).collect()
                      }
                      
                      fn main() {
                          let args: Vec<String> = env::args().collect();
                          let filename = &args[1];
                          let _operation = &args[2];
                          let contents = fs::read_to_string(filename)
                              .expect("Something went wrong reading the file");
                          
                          let wbyl = words_by_line(&contents);
                          for i in &wbyl {
                              for j in i{
                                  print!("{} ", j);
                              }
                              println!("");
                          }
                      }
                      
                      
                      
                      

                      Build and run with

                      cargo build
                      ./target/debug/hostsman ./hosts list

                      And it spits out the contents of the local copy of /etc/hosts. We'll treat this as the unit test for now.

                      The next step is to start working towards a switch based on the _operation variable. To do this, I want to pull the loop that dumps the file out into its own function. And to do that, I need to figure out the type of the Vector. I use the hack of introducing an error to get the compiler to tell me the type. I change the assignment line to get:

                      let wbyl: u8 = words_by_line(&contents);

                      And that tells me:

                      error[E0308]: mismatched types
                        --> src/main.rs:18:20
                         |
                      18 |     let wbyl: u8 = words_by_line(&contents);
                         |                    ^^^^^^^^^^^^^^^^^^^^^^^^ expected u8, found struct `std::vec::Vec`
                         |
                         = note: expected type `u8`
                                    found type `std::vec::Vec<std::vec::Vec<&str>>`
                      
                      

                      So I convert the code to use that, build and run. Code now looks like this:

                      let wbyl: std::vec::Vec<std::vec::Vec<&str>> = words_by_line(&contents);

                      Now I create a function by copying the existing code block and using the variable type in the parameter list. It looks like this:

                      use std::env;
                      use std::fs;
                      
                      fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
                          s.lines().map(|line| {
                              line.split_whitespace().collect()
                          }).collect()
                      }
                      
                      fn list(wbyl: std::vec::Vec<std::vec::Vec<&str>>){    
                          for i in &wbyl {
                              for j in i{
                                  print!("{} ", j);
                              }
                              println!("");
                          }
                      }
                      
                      fn main() {
                          let args: Vec<String> = env::args().collect();
                          let filename = &args[1];
                          let _operation = &args[2];
                          let contents = fs::read_to_string(filename)
                              .expect("Something went wrong reading the file");
                          
                          let wbyl: std::vec::Vec<std::vec::Vec<&str>> = words_by_line(&contents);
                      
                          list(wbyl);
                      }
                      

                      Now we are prepped to continue development. Next up is to parse the command and execute a different function based on it.
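
                      A rough sketch of what that dispatch could look like, reusing the imports and functions above and a placeholder arm for a hypothetical add operation:

                      fn main() {
                          let args: Vec<String> = env::args().collect();
                          let filename = &args[1];
                          let operation = &args[2];
                          let contents = fs::read_to_string(filename)
                              .expect("Something went wrong reading the file");

                          let wbyl: std::vec::Vec<std::vec::Vec<&str>> = words_by_line(&contents);

                          // Dispatch on the requested operation.
                          match operation.as_str() {
                              "list" => list(wbyl),
                              // "add" => add(wbyl, &args[3]),   hypothetical next operation
                              _ => eprintln!("unknown operation: {}", operation),
                          }
                      }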

                      Meaningful 2fa on modern linux

                      Posted by William Brown on February 11, 2019 01:00 PM

                      Meaningful 2fa on modern linux

                      Recently I heard of someone asking the question:

                      “I have an AD environment connected with <product> IDM. I want to have 2fa/mfa to my linux machines for ssh, that works when the central servers are offline. What’s the best way to achieve this?”

                      Today I’m going to break this down - but the conclusion for the lazy is:

                      This is not realistically possible today: use ssh keys with ldap distribution, and mfa on the workstations, with full disk encryption.

                      Background

                      So there are a few parts here. AD is for all intents and purposes an LDAP server. The <product> is also an LDAP server that syncs to AD. We don’t care if that’s 389-ds, FreeIPA or a vendor solution; the results are basically the same.

                      Now, the Linux auth stack uses, and will always use, PAM for authentication and nsswitch for user ID lookups. Today we assume that most people run sssd, but PAM modules for different options are possible.

                      There are a stack of possible options, and they all have various flaws.

                      • FreeIPA + 2fa
                      • PAM TOTP modules
                      • PAM radius to a TOTP server
                      • Smartcards

                      FreeIPA + 2fa

                      Now this is the one most IDM people would throw out. The issue here is the person already has AD and a vendor product. They don’t need a third solution.

                      Next is the fact that FreeIPA stores the TOTP in the LDAP, which means FreeIPA has to be online for it to work. So this is eliminated by the “central servers offline” requirement.

                      PAM radius to TOTP server

                      Same as above: An extra product, and you have a source of truth that can go down.

                      PAM TOTP module on hosts

                      Okay, even if you can get this to scale, you need to send the private seed material of every TOTP device that could log in to the machine, to every machine. That means any compromise compromises every TOTP token on your network. A bad place to be in.

                      Smartcards

                      Are notoriously difficult to get working, let alone with SSH. Don’t bother. (That is, where the smartcard does TLS auth to the SSH server.)

                      Come on William, why are you so doom and gloom!

                      Let’s back up for a second and think about what we are trying to prevent by having MFA at all. We want to prevent a single-factor compromise from having a large impact, and we want to prevent brute force attacks. (There are probably more reasons, but these are the ones I’ll focus on.)

                      So the best answer: Use mfa on the workstation (password + totp), then use ssh keys to the hosts.

                      This means the target of the attack is small, and the workstation can be protected by things like full disk encryption and group policy. To sudo on the host you still need the password. This makes sudo to root effectively MFA, as you need something you know and something you have.

                      If you are extra conscious you can put your ssh keys on smartcards. This works on Linux and macOS workstations with YubiKeys, as far as I am aware. Apparently you can have ssh keys in a TPM, which would give you tighter hardware binding, but I don’t know how to achieve this (yet).

                      To make all this better, you can distribute your ssh public keys in LDAP, which means you gain the benefits of LDAP account locking/revocation, you can remove the keys instantly if they are breached, and you have very little admin overhead to configure this service on the Linux server side. Think about how easy onboarding is if you only need to put your ssh key in one place and it works on every server! Let alone shutting down a compromised account: lock it in one place, and it is denied access to every server.

                      SSSD as the LDAP client on the server can also cache the (hashed) passwords and the ssh public keys, which means you can still authenticate to a disconnected client.
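
                      As a rough sketch of the server-side wiring (the domain name and attribute mapping are placeholders, so treat this as an outline rather than a drop-in config): sshd asks SSSD for the keys, and SSSD fetches and caches them from LDAP.

                      # /etc/ssh/sshd_config
                      AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
                      AuthorizedKeysCommandUser nobody

                      # /etc/sssd/sssd.conf
                      [sssd]
                      services = nss, pam, ssh
                      domains = example.com

                      [domain/example.com]
                      id_provider = ldap
                      ldap_uri = ldaps://ldap.example.com
                      cache_credentials = True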

                      At this point, because you have ssh key auth working, you could even deny password auth as an option in ssh altogether, eliminating an entire class of bruteforce vectors.

                      For bonus marks: You can use AD as the generic LDAP server that stores your SSH keys. No additional vendor products needed, you already have everything required today, for free. Everyone loves free.

                      Conclusion

                      If you want strong, offline capable, distributed mfa on linux servers, the only choice today is LDAP with SSH key distribution.

                      Want to know more? This blog contains how-tos on SSH key distribution for AD, SSH keys on smartcards, and how to configure SSSD to use SSH keys from LDAP.

                      Episode 133 - Smart locks and the government hacking devices

                      Posted by Open Source Security Podcast on February 11, 2019 12:49 AM
                      Josh and Kurt talk about the fiasco hacks4pancakes described on Twitter and what the future of smart locks will look like. We then discuss what it means if the Japanese government starts hacking consumer IoT gear, is it ethical? Will it make anything better?


                      <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8593940/height/90/theme/custom/thumbnail/yes/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                      Show Notes


                        Ansible and FreeIPA Part 2

                        Posted by Adam Young on February 08, 2019 01:09 AM

                        After some discussion with Bill Nottingham I got a little further along with what it would take to integrate Ansible Tower and FreeIPA. Here are the notes from that talk.

                        FreeIPA works best when you can use SSSD to manage the users and groups of the application. Since Ansible Tower is a Django application running behind Nginx, this means using the REMOTE_USER configuration. However, Ansible Tower already provides integration with SAML and OpenIDC using Python Social Auth. If an administrator wants to enable SAML, they do so in the database layer, and that provides replication to all of the Ansible Tower instances in a cluster.

                        The Social Auth integration provides the means to map from the SAML/OpenIDC assertion to the local user and groups. An alternative based on REMOTE_USER would have the same set of mappings, but from variables exposed by the SSSD layer. The variables available would be any exposed by an Nginx module, such as those documented here.

                        Some configuration of the Base OS would be required beyond enrolling the system as an IPA client. Specifically, any variables that the user wishes to expose would be specified in /etc/sssd/sssd.conf.

                        This mirrors how I set up SSSD Federation in OpenStack Keystone. The configuration of SSSD is the same.

                        Ansible and FreeIPA Part-1

                        Posted by Adam Young on February 07, 2019 08:25 PM

                        Ansible is a workflow engine. I use it to do work on my behalf.

                        FreeIPA is an identity management system. It allows me to manage the identities of users in my organization

                        How do I get the two things to work together? The short answer is that it is trivial to do using Ansible Engine. It is harder to do using Ansible tower.

                        Edit: Second part is here. Third part is coming.

                        Engine


                        Let’s start with Engine. Let’s say that I want to execute a playbook on a remote system. Both my local and remote systems are FreeIPA clients. Thus, I can use Kerberos to authenticate when I ssh in to the remote system. This same mechanism is reused by Ansible when I connect to the system. The following two commands are roughly comparable:

                        scp myfile.txt  ayoung@hostname:  
                        
                        ansible  --user ayoung hostname -m copy -a \
                        "src=myfile.txt dest=/home/ayoung"  
                        

                        Ignoring all the extra work that the copy module does, checking hashes etc.

                        Under the covers, the ssh layer checks the various authentication mechanisms available to communicate with the remote machine. If I have run kinit (successfully) prior to executing the scp command, it will try the Kerberos credentials (via GSSAPI, don’t get me started on the acronym soup) to authenticate to the remote system.

                        This is all well and good if I am running the playbook interactively. But, what if I want to kick off the playbook from an automated system, like cron?

                        Keys

                        The most common way that people use ssh is with asymmetric keys and no certificates. On a Linux system, these keys are kept in ~/.ssh. If I am using rsa, then the private key is kept in ~/.ssh/id_rsa. I can use a passphrase to protect this file. If I want to script using that key, I need to remove the passphrase, or I need to store the passphrase in a file that automates submitting it. While there are numerous ways to handle this, a very common pattern is to have a second set of credentials, stored in a second file, and a configuration option that says to use them. For example, I have a directory ~/keys that contains an id_rsa file. I can use it with ssh like this:

                        ssh cloud-user@128.31.24.146 -i ~/keys/id_rsa

                        And with Ansible:

                         ansible -i inventory.py ayoung_resources --key-file ~/keys/id_rsa  -u cloud-user   -m ping

                        Ansible lacks knowledge of Kerberos. There is no way to say “kinit blah” prior to the playbook. While you can add this to a script, you are now providing a wrapper around Ansible.

                        Automating via Kerberos

                        Kerberos has a different way to automate credentials: You can use a keytab ( a file with symmetric keys stored in it) to get a Ticket Granting Ticket (TGT) and you can place that TGT in a special directory: /var/kerberos/krb5/user/<uid>

                        I wrote this up a few years back: https://adam.younglogic.com/2015/05/auto-kerberos-authn/
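
                        A minimal sketch of that flow, with a hypothetical keytab path (the user and realm are just the ones from this post):

                        # from cron or a script: obtain a TGT from the keytab instead of typing a password
                        kinit -k -t ~/keys/automation.keytab ayoung@DEMO1.FREEIPA.ORG

                        # anything that uses GSSAPI after this point can pick up the cached TGT
                        ansible --user ayoung hostname -m ping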

                        Let’s take this a little bit further. Let’s say that I don’t want to perform the operation as me. Specifically, I don’t want to create a TGT for my user, with all of my authority, in an automated fashion. I want to create some other, limited-scope principal (the Kerberos term for users and things that are like users that can do things) and use that.

                        Service Principals

                        I’d prefer to create a service principal from my machine. If my machine is testing.demo1.freeipa.org and I create on it a service called ansible, I’ll end up with a principal of:

                        ansible/testing.demo1.freeipa.org@DEMO1.FREEIPA.ORG

                        A user can allocate to this principal a Keytab, an X509 Certificate, or both. These credentials can be used to authenticate with a remote machine.

                        If I want to allow this service credential to get access to a host that I set up, as some specified user, I can put an entry in the file ~/.k5login that will specify which principals are allowed to log in. So I add the above principal line and now that principal can log in.

                        Let’s assume, however, that we want to limit what that user can do. Say we want to restrict it to performing git operations only. Instead of ~/.k5login, we would use ~/.k5users. This allows us to put a list of commands on the line. It would look like this:

                        ansible/testing.demo1.freeipa.org@DEMO1.FREEIPA.ORG /usr/bin/git

                        Ansible Tower

                        Now that we can set up delegations for the playbooks to use, we can turn our eyes to Ansible Tower. Today, when a user kicks off a playbook from Tower, they have to reuse a set of credentials stored in Ansible Tower. However, that means that any external identity management must be duplicated inside Tower.

                        What if we need to pass through the user that logs in to Tower in order to use that initial user’s identity for operations? We have a few tools available.

                        Lets start with the case where the user logs in to the Tower instance using Kerberos. We can make use of a mechanism that goes by the unwieldy name of Service-for-User-to-Proxy, usually reduced to S4U2Proxy. This provides a constrained delegation.

                        What if a user is capable of logging in via some mechanism that is not Kerberos? There is a second mechanism called Service-for-User-to-Self. This allows a system to convert from, say, a password based mechanism, to a Kerberos ticket.

                        Simo Sorce wrote these up a few years back.

                        https://ssimo.org/blog/id_011.html

                        And the Microsoft RFC that describe the mechanisms in detail

                        https://msdn.microsoft.com/en-us/library/cc246071.aspx

                        In the case of Ansible Tower, we’d have to specify at the playbook level what user to use when executing the template: The AWX account that runs tower, or the TGT fetched via the S4U* mechanism.

                        What would it take to extend Tower to use S4U? Tower can already use Kerberos from the original user:

                        https://docs.ansible.com/ansible-tower/latest/html/administration/kerberos_auth.html.

                        The Tower web application would then need to be able to perform the S4U transforms. Fortunately, it is Python code. The FreeIPA server has to perform these transforms itself, and the Tower transforms would be comparable.

                        Configuring the S4U mechanisms in FreeIPA is a fairly manual process, as documented by https://vda.li/en/posts/2013/07/29/Setting-up-S4U2Proxy-with-FreeIPA/ I would suggest using Ansible to automate it.

                        Wrap Up

                        Kerberos provides a distributed authentication scheme with validation that the user is still active. That is a powerful combination. Ansible should be able to take advantage of the Kerberos support in ssh to greatly streamline the authorization decisions in provisioning and orchestration.

                        Episode 132 - Bird Scooter: 0, Cory Doctorow: 1

                        Posted by Open Source Security Podcast on February 04, 2019 01:01 AM
                        Josh and Kurt talk about the Bird Scooter vs Cory Doctorow incident. We then get into some of the social norms around new technology and what lessons the security industry can take from something new like shared scooters.


                        <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8503886/height/90/theme/custom/thumbnail/yes/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>


                        Show Notes


                          Using the latest 389-ds on OpenSUSE

                          Posted by William Brown on January 29, 2019 01:00 PM

                          Using the latest 389-ds on OpenSUSE

                          Thanks to some help from my friend who works on OBS, I’ve finally got a good package in review for submission to Tumbleweed. However, if you are impatient and want to use the “latest” and greatest 389-ds version on OpenSUSE (docker anyone?), here is how:

                          docker run -i -t opensuse/tumbleweed:latest
                          zypper ar obs://network:ldap network:ldap
                          zypper in 389-ds
                          

                          Now, we still have an issue with “starting” from dsctl (we don’t really expect you to do it like this ….) so you have to make a tweak to defaults.inf:

                          vim /usr/share/dirsrv/inf/defaults.inf
                          # change the following to match:
                          with_systemd = 0
                          

                          After this, you should now be able to follow our new quickstart guide on the 389-ds website.

                          I’ll try to keep this repo up to date as much as possible, which is great for testing and early feedback to changes!

                          EDIT: Updated 2019-04-03 to change repo as changes have progressed forward.

                          Episode 131 - Windows micropatches, Google's privacy fine, and Mastercard fixes trial abuse

                          Posted by Open Source Security Podcast on January 28, 2019 01:03 AM
                          Josh and Kurt talk about non-Microsoft Windows micropatches. The days of pretending closed source matters are long gone. Google gets hit with a privacy fine, that probably won't matter. And Mastercard makes it easier for consumers to not accidentally sign up for services they don't want.



                          <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8416991/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                          Show Notes


                            Your Own, Personal, Ansible

                            Posted by Adam Young on January 25, 2019 06:22 PM

                            Me. Us. Them.

                            The story I tell when I explain the various offerings that Red Hat has based on Ansible follows the progression of Me. Us. Them.

                            Me: Get my playbook working for me on my workstation. For this, I use Ansible Engine.

                            Us: Share my playbook with my larger team. For this, I use Ansible Tower.

                            Them: Make a Self service catalog for the larger organization to consume. This is where Ansible integration into other products comes in to play. I typically talk about CloudForms integration here, but Satellite and OpenShift are also capable of making use of Ansible here.

                            Here is how I have my local setup for doing Ansible based development. This is organized roughly around how Tower will later consume the roles and playbooks I design here.

                            Git Repositories

                            Ever since I started doing the majority of my coding on a Linux system, I have kept my working code under my home directory in a sub-directory called “devel.” For my current work, I have two main git repositories under ~/devel that contain Ansible work. Rippowam, which I wrote about a while back, is my provisioning and install repository. There are multiple playbooks here, which can provision to OpenStack, Azure, and direct Libvirt. Additional playbooks and roles can:

                            • Install upstream code like FreeIPA
                            • Enroll a system with the Red Hat content delivery systems
                            • Install Red Hat products

                            My Second repository is a DevOps demo I’ve started working on. It is designed to show off the capabilities of Ansible when talking to people that are involved in development, continuous delivery, and operations. I keep this separate from the Rippowam repo in order to keep it simple and understandable.

                            ~/ansible

                            I want to organize the code that is used to call the playbooks in a similar manner to what I am going to use when I later deploy it to Tower. I keep all of the configuration information in a local subdirectory outside of git called ~/ansible. Here is the expanded directory listing:

                            ls ~/ansible/
                            bin  files  inventories  variables

                            ~/ansible/bin

                            I’ve found that the command lines to call a playbook get somewhat large for all but the most trivial of playbooks. An example:

                            ansible-playbook \
                                -i ~/ansible/inventories/localhost.ini \
                                -e @~/ansible/variables/azure.yml \
                                -e @~/ansible/variables/ghoul.yml \
                                playbooks/azure.yml 

                            This call references four distinct inputs:

                            • The inventory file
                            • the variables for defining the cluster as customized for a specific providers. These are in the azure.yml file. Yes, this could be better named.
                            • The variables specific to the application I am deploying
                            • the playbook itself.

                            To record the parameters used for a specific invocation, I keep these scripts in the bin directory. A naming scheme is starting to emerge:

                            <cloud>-<intention>-<stage>.sh

                            For example:

                            azure-ghoul-build.sh

                            Most of these shell scripts have a minimal amount of boilerplate. For instance, since I keep an ansible.cfg checked in as part of rippowam, I want to change directory to ~/devel/rippowam prior to executing the playbook. So the overall structure looks like this:

                            #!/bin/sh
                            cd ~/devel/rippowam
                            ansible-playbook  -e image_name=rhel-guest-image-7.5-1a  \
                            -e @~/ansible/variables/vault.yml \
                            -e cloudname=fsi-moc -e @~/.config/openstack/clouds.yaml  \
                            -e @~/ansible/variables/ghoul.yml  \
                            ~/devel/rippowam/playbooks/openstack-provision.yml 
                            

                            ~/ansible/inventories

                            In the past, my inventory files have been fairly thick, including many of the variables required by the playbooks. As I reuse the inventories more and more, I have been pulling those variables out into separate variables files. Thus, the files under inventories are simpler host lists.

                             In addition, I’ve been trying to get my code to run against dynamic inventories as much as possible. Both Ansible and OpenStack provide external Python scripts that build the inventories dynamically. These scripts are also placed in the inventories subdirectory.

                            ~/ansible/variables

                             When I extract variables from either a playbook or an inventory file, I put them into a YAML file under the variables subdirectory. These files also include any vaults I use for storing credentials.
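
                             As a sketch only (the variable names below are hypothetical, not lifted from my files), a variables file plus its companion vault might look like:

                             $ cat ~/ansible/variables/ghoul.yml
                             ---
                             # hypothetical application-level variables
                             app_name: ghoul
                             image_name: rhel-guest-image-7.5-1a

                             # credentials go into an encrypted vault next to the plain variable files
                             $ ansible-vault create ~/ansible/variables/vault.yml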

                            ~/ansible/files

                            Update: I didn’t explain what this is for. Sometimes, I can’t check a file in to git due to licensing or other legal reasons. Sometimes, they are too big, like VM images. This directory is for those kinds of files.

                            Working with Tower

                             Each of the files above has an analogue in Tower. The shell scripts in ~/ansible/bin become the templates. The inventories become inventories. The git repos become projects. The variables go to a few different places: templates, credentials, inventories.

                             Once the template is configured in Tower, the structure above makes it possible to continue development on my workstation. I can make and test changes locally. When I am satisfied with them, I can push them to git and run the resync process in Tower. I find this much more streamlined than editing code, checking it in to git, running it from Tower, finding what is broken, and repeating the process. I make heavy use of the Ansible --start-at-task option, and I would rather add and remove this from a shell script in an editor than add and remove it from a Template. As I said at the outset, Tower is used for shared code, and I don’t want to share my templates in non-functional form.
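
                             As a sketch of that local iteration (the task name is a placeholder for whichever task I am debugging), I temporarily add the option to one of the wrapper scripts:

                             ansible-playbook \
                                 -i ~/ansible/inventories/localhost.ini \
                                 -e @~/ansible/variables/azure.yml \
                                 -e @~/ansible/variables/ghoul.yml \
                                 --start-at-task="the task I am debugging" \
                                 playbooks/azure.yml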

                            Azure: from Portal to Ansible: part 2

                            Posted by Adam Young on January 23, 2019 03:35 PM

                            In my last post, I went from the Azure Web Portal to the command line. Time to go one step further and use Ansible.

                            Ansible Modules for Azure

                             Ansible Engine is the Red Hat supported way to run Ansible from the command line. However, we don’t support every single upstream module; there are over 2,000 modules, and the modules are typically run on the remote system. Azure falls into this category, which means the Azure modules must be installed via pip from upstream, not from the RHEL repos. And, since we are installing via pip, we need to install pip first. Pip is installed via Software Collections.

                             sudo subscription-manager repos --enable rhel-server-rhscl-7-rpms
                            sudo yum install python27-python-pip -y
                            scl enable python27 bash
                            pip install --user 'ansible[azure]'
                            

                            Note that in order to run ansible-playbooks that use the Azure modules in the future, you will have to re-run the scl enable line prior to executing the playbook.
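
                             In other words, the pattern (sketched here with a placeholder playbook name) is to open the SCL shell first and run the playbook from inside it:

                             # open a subshell with the python27 collection on the PATH
                             scl enable python27 bash
                             # then, from inside that subshell, run any playbook that uses the Azure modules
                             ansible-playbook my-azure-playbook.yml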

                            I took the identity info from my last post and put it into a yaml file:

                             $ cat ~/azure.yml
                             ---
                            "appId": "fb511363-5616-4b1b-a74e-9c7ace6887a3"
                            "displayName": "Rippowam"
                            "name": "http://Rippowam"
                            "password": "redacted"
                            "tenant": "a003ca9d-0f6b-4f3a-adc2-cd94f0ff402d"
                            

                            I have a “Hello World” playbook that creates a resource group in Azure:

                            ---
                            - hosts: localhost
                              become: no
                              vars:
                              tasks:
                                - name: Create a resource group
                                  azure_rm_resourcegroup:
                                    name: "Ossipee"
                                    location: "eastus2"
                            

                            And I can run this playbook with:

                            ansible-playbook -e @~/azure.yml devel/azure-ansible/azure.yml
                            

                            Switch to CLI

                             Now, last year I had a provisioning playbook for Azure written using Fedora 25 and the Ansible modules. This year, I need to demonstrate using RHEL 7 and Ansible Tower. However, the pip based installer requires newer versions of many Python packages, as well as some native packages, than I feel comfortable running on a RHEL 7 system, especially since some of them will likely conflict with the Ansible Tower versions. So, until we get a RHEL 7 friendly version of the Ansible modules, I have resorted to using the Ansible command module and the Azure command line. Fortunately, the command line has been idempotent everywhere I’ve tried it, and leads to straightforward roles. For example, creating virtual machines in a loop, much as I did last year, looks like this:

                             - name: create vm
                               command: >
                                 az vm create
                                 -n {{ item.name }}
                                 -g {{ az_resources }}
                                 --image RHEL
                                 --availability-set {{ az_av_set }}
                                 --nics "{{ item.name }}.nic"
                                 --admin-username cloud-user
                                 --ssh-key-value "{{ pubkey }}"
                               with_items: "{{ cluster_hosts }}"
                               register: osservers
                            

                            In my next post, I will document how I manage all of the different playbooks and their variations for different clouds, but I will give you a sneak peek here. I run this playbook from a bash script that looks like this:

                            #!/bin/sh
                            
                            cd ~/devel/rippowam
                            
                            ansible-playbook \
                                -i ~/ansible/inventories/localhost.ini \
                                -e @~/ansible/variables/azure.yml \
                                -e @~/ansible/variables/ghoul.yml \
                                playbooks/azure.yml 
                            

                             The Azure-specific login variables are in ~/ansible/variables/azure.yml. Right now, they use bad variable names, as that was what the Azure modules suggested. I plan on going back and prefixing them all with az_.

                            Azure: from Portal to Ansible: part 1

                            Posted by Adam Young on January 23, 2019 03:23 PM

                             While Azure figured prominently in my work about a year ago, I have not had much to do with it until recently, and I had to relearn everything I had set up last year. As a Keystone and FreeIPA developer, I was focused on identity. Thus, it is somewhat ironic that I had problems getting my head around the identity setup when using Ansible to manage Azure. Here are the steps I went through to go from using the Web Portal to getting Ansible to work. Part one gets through the identity stuff.


                            Portal

                             Since I burnt through my free Azure time last year, I have to pay for this, so I have set up a credit card, etc. To log in, I go to portal.azure.com, which bounces me to a single sign-on page and then back to the portal.


                            Using the UI to create a Resource Group and then a VM is pretty well documented, and I will skip over that here. Next was to use the Command Line Interface (CLI).

                            Command Line Interface

                            It turns out that the CLI is supported in Fedora. All I had to do to get it was to yum install.

                            sudo yum install azure-cli-2.0.54-1.el7.x86_64  -y

                            The first step is to perform a login. That kicks up a browser for Single Sign On:

                            $ az login
                            Note, we have launched a browser for you to login. For old experience with device code, use "az login --use-device-code"

                             I find this a little frustrating, as it is not something I would want to have happen in a scriptable environment. I realize it makes the initial workflow easy, but it makes it harder to figure out how to script Azure without human interaction.

                             Once the SSO is complete, a JSON block is displayed in the terminal.

                            [
                              {
                                "cloudName": "AzureCloud",
                                "id": "9ffc4e5a-a9c3-4c0b-b5ef-b6a7d7a90178",
                                "isDefault": true,
                                "name": "Pay-As-You-Go",
                                "state": "Enabled",
                                "tenantId": "a003ca9d-0f6b-4f3a-adc2-cd94f0ff402d",
                                "user": {
                                  "name": "adam@younglogic.com",
                                  "type": "user"
                                }
                              }
                            ]
                            

                             Next I want to figure out how to log in without using WebSSO. Passing my WebSSO credentials as command line parameters fails:

                            [ayoung@ayoungP40 azure]$ az logout
                            [ayoung@ayoungP40 azure]$ az vm list
                            Please run 'az login' to setup account.
                            [ayoung@ayoungP40 rippowam]$ az logout
                            [ayoung@ayoungP40 rippowam]$ az login -u adam@younglogic.com -p $AZ_PASSWORD
                            The user name might be invalid. For cross-check, try 'az login' to authenticate through browser.
                            

                            So I did the interactive login again, then followed the rules here:

                            $ az ad sp create-for-rbac --name Rippowam
                            Changing "Rippowam" to a valid URI of "http://Rippowam", which is the required format used for service principal names
                            {
                              "appId": "fb511363-5616-4b1b-a74e-9c7ace6887a3",
                              "displayName": "Rippowam",
                              "name": "http://Rippowam",
                              "password": "<redacted>",
                              "tenant": "a003ca9d-0f6b-4f3a-adc2-cd94f0ff402d"
                            }
                            

                            And using that data I can now do a log in:

                            $ az login --service-principal --username http://Rippowam --password $PASSWORD --tenant a003ca9d-0f6b-4f3a-adc2-cd94f0ff402d
                            
                            [
                              {
                                "cloudName": "AzureCloud",
                                "id": "9ffc4e5a-a9c3-4c0b-b5ef-b6a7d7a90178",
                                "isDefault": true,
                                "name": "Pay-As-You-Go",
                                "state": "Enabled",
                                "tenantId": "a003ca9d-0f6b-4f3a-adc2-cd94f0ff402d",
                                "user": {
                                  "name": "http://Rippowam",
                                  "type": "servicePrincipal"
                                }
                              }
                            ]
                            
                            $ az role assignment list --assignee fb511363-5616-4b1b-a74e-9c7ace6887a3 
                            [
                              {
                                "canDelegate": null,
                                "id": "/subscriptions/9ffc4e5a-a9c3-4c0b-b5ef-b6a7d7a90178/providers/Microsoft.Authorization/roleAssignments/7460d266-56be-4843-843a-53ed54e41ce0",
                                "name": "7460d266-56be-4843-843a-53ed54e41ce0",
                                "principalId": "92b12b1c-78ec-45b2-af40-4bb3130f8380",
                                "principalName": "http://Rippowam",
                                "roleDefinitionId": "/subscriptions/9ffc4e5a-a9c3-4c0b-b5ef-b6a7d7a90178/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c",
                                "roleDefinitionName": "Contributor",
                                "scope": "/subscriptions/9ffc4e5a-a9c3-4c0b-b5ef-b6a7d7a90178",
                                "type": "Microsoft.Authorization/roleAssignments"
                              }
                            ]
                            
                            

                             Since all resources live in a (resource) group, and a resource group lives in a location, I need to pick a location in which to create a resource group:

                            [ayoung@ayoungP40 azure]$ az account list-locations | jq '.[] | .name '
                            "eastasia"
                            "southeastasia"
                            "centralus"
                            "eastus"
                            "eastus2"
                            "westus"
                            "northcentralus"
                            "southcentralus"
                            "northeurope"
                            "westeurope"
                            "japanwest"
                            "japaneast"
                            "brazilsouth"
                            "australiaeast"
                            "australiasoutheast"
                            "southindia"
                            "centralindia"
                            "westindia"
                            "canadacentral"
                            "canadaeast"
                            "uksouth"
                            "ukwest"
                            "westcentralus"
                            "westus2"
                            "koreacentral"
                            "koreasouth"
                            "francecentral"
                            "francesouth"
                            "australiacentral"
                            "australiacentral2"
                            

                             So I’ll create a resource group called Rippowam in East US 2:

                            [ayoung@ayoungP40 azure]$ az group create   --name Rippowam --location "eastus2"
                            
                            {
                              "id": "/subscriptions/9ffc4e5a-a9c3-4c0b-b5ef-b6a7d7a90178/resourceGroups/Rippowam",
                              "location": "eastus2",
                              "managedBy": null,
                              "name": "Rippowam",
                              "properties": {
                                "provisioningState": "Succeeded"
                              },
                              "tags": null
                            }
                            
                            

                            Now that I can use the CLI, it is time to try Ansible. That is in my next post.

                            Episode 130 - Chat with Snyk co-founder Danny Grander

                            Posted by Open Source Security Podcast on January 21, 2019 01:12 AM
                             Josh and Kurt talk to Danny Grander, one of the co-founders of Snyk, about Zip Slip: what it is, how to fix it, and how they disclosed everything. We also touch on plenty of other open source security topics, as Danny is involved in many aspects of open source security.



                            <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8328605/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                            Show Notes


                              SUSE Open Build Service cheat sheet

                              Posted by William Brown on January 18, 2019 01:00 PM

                              SUSE Open Build Service cheat sheet

                              Part of starting at SUSE has meant that I get to learn about Open Build Service. I’ve known that the project existed for a long time but I have never had a chance to use it. So far I’m thoroughly impressed by how it works and the features it offers.

                              As A Consumer

                              The best part of OBS is that it’s trivial on OpenSUSE to consume content from it. Zypper can add projects with the command:

                              zypper ar obs://<project name> <repo nickname>
                              zypper ar obs://network:ldap network:ldap
                              

                               I like to make the repo nickname (your choice) the same as the project name, so I know what I have enabled. Once you run this, you can easily consume content from OBS.

                              Package Management

                               As someone who has started to contribute to the SUSE 389-ds package, I’ve been slowly learning how the workflow works. OBS, similar to GitHub/GitLab, uses a branching and request model.

                              On OpenSUSE you will want to use the osc tool for your workflow:

                              zypper in osc
                              # If you plan to use the "service" command
                              zypper in obs-service-tar obs-service-obs_scm obs-service-recompress obs-service-set_version obs-service-download_files
                              

                              You can branch from an existing project to make changes with:

                              osc branch <project> <package>
                              osc branch network:ldap 389-ds
                              

                               This will branch the project to my home namespace. For me, this lands in “home:firstyear:branches:network:ldap”. Now I can check out the content onto my machine to work on it.

                              osc co <project>
                              osc co home:firstyear:branches:network:ldap
                              

                              This will create the folder “home:…:ldap” in the current working directory.

                              From here you can now work on the project. Some useful commands are:

                              Add new files to the project (patches, new source tarballs etc).

                              osc add <path to file>
                              osc add feature.patch
                              osc add new-source.tar.xz
                              

                              Edit the change log of the project (I think this is used in release notes?)

                              osc vc
                              

                               To amend your changes, use:

                              osc vc -e
                              

                               Build your changes locally, matching the system you are on. Packages normally build on all/most OpenSUSE versions and architectures; this will build just for your local system and arch.

                              osc build
                              

                              Make sure you clean up files you aren’t using any more with:

                              osc rm <filename>
                               # This command removes anything untracked by osc.
                              osc clean
                              

                              Commit your changes to the OBS server, where a complete build will be triggered:

                              osc commit
                              

                              View the results of the last commit:

                              osc results
                              

                               To enable people to use your branch/project as a repository, edit the project metadata and enable repo publishing:

                              osc meta prj -e <name of project>
                              osc meta prj -e home:firstyear:branches:network:ldap
                              
                              # When your editor opens, change this section to enabled (disabled by default):
                              <publish>
                                <enabled />
                              </publish>
                              

                               NOTE: In some cases, if you already have the package installed and you then add the repo and update, it won’t install from your repo. This is because SUSE packages have a notion of “vendoring”: they continue to update from the same repo they were originally installed from. If you want to change this, use:

                              zypper [d]up --from <repo name>
                              

                              You can then create a “request” to merge your branch changes back to the project origin. This is:

                              osc sr
                              

                               A helpful maintainer will then review your changes. You can see the request with:

                              osc rq show <your request id>
                              

                              If you change your request, to submit again, use:

                              osc sr
                              

                               And it will ask if you want to replace (supersede) the previous request.

                               I was also helped by a friend who provided a “service” configuration that allows generation of tarballs from git. It’s not always appropriate to use this, but if the repo has a “_service” file, you can regenerate the tar with:

                              osc service ra
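
                               The _service file itself is a small piece of XML describing which source services to run. As a rough sketch only (the parameters and URL below are illustrative, not taken from a real package), it looks something like:

                               $ cat _service
                               <services>
                                 <service name="obs_scm">
                                   <param name="scm">git</param>
                                   <param name="url">https://github.com/example/project.git</param>
                                 </service>
                                 <service name="tar" mode="buildtime"/>
                                 <service name="recompress" mode="buildtime">
                                   <param name="file">*.tar</param>
                                   <param name="compression">xz</param>
                                 </service>
                               </services>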
                              

                               So far this is as far as I have gotten with OBS, but I already appreciate how great this workflow is for package maintainers, reviewers, and consumers. It’s a pleasure to work with software this well built.

                               As an additional piece of information, it’s a good idea to read the OBS Packaging Guidelines to be sure that you are doing the right thing!

                              Structuring Rust Transactions

                              Posted by William Brown on January 18, 2019 01:00 PM

                              Structuring Rust Transactions

                               I’ve been working on a database-related project in Rust recently, which takes advantage of my concurrently readable datastructures. However, I ran into a problem of how to structure read/write transaction structures that share the reader code and contain multiple inner read/write types.

                              Some Constraints

                               To be clear, there are some constraints. A “parent” write will only ever contain write transaction guards, and a read will only ever contain read transaction guards. This means we aren’t going to hit any deadlocks in the code - which matters, because Rust can’t protect us from mis-ordering locks. An additional requirement is that readers and a single writer must be able to proceed simultaneously - but having an rwlock style writer-or-readers behaviour would still work here.

                              Some Background

                              To simplify this, imagine we have two concurrently readable datastructures. We’ll call them db_a and db_b.

                              struct db_a { ... }
                              
                              struct db_b { ... }
                              

                               Now, each of db_a and db_b has its own way to protect its inner content, but they’ll return a DBReadGuard or DBWriteGuard when we call db_a.read() or db_a.write() respectively.

                              impl db_a {
                                  pub fn read(&self) -> DBReadGuard {
                                      ...
                                  }
                              
                                  pub fn write(&self) -> DBWriteGuard {
                                      ...
                                  }
                              }
                              

                              Now we make a “parent” wrapper transaction such as:

                              struct server {
                                  a: db_a,
                                  b: db_b,
                              }
                              
                              struct server_read {
                                  a: DBReadGuard,
                                  b: DBReadGuard,
                              }
                              
                              struct server_write {
                                  a: DBWriteGuard,
                                  b: DBWriteGuard,
                              }
                              
                               impl server {
                                   pub fn read(&self) -> server_read {
                                       server_read {
                                           a: self.a.read(),
                                           b: self.b.read(),
                                       }
                                   }
                               
                                   pub fn write(&self) -> server_write {
                                       server_write {
                                           a: self.a.write(),
                                           b: self.b.write(),
                                       }
                                   }
                               }
                              

                              The Problem

                               Now the problem is that on my server_read and server_write I want to implement a function for “search” that uses the same code. Search on a read or a write should behave identically! I also wanted to avoid the use of macros, as they can hide issues while stepping through code in a debugger like LLDB/GDB.

                              Often the answer with rust is “traits”, to create an interface that types adhere to. Rust also allows default trait implementations, which sounds like it could be a solution here.

                              pub trait server_read_trait {
                                  fn search(&self) -> SomeResult {
                                      let result_a = self.a.search(...);
                                      let result_b = self.b.search(...);
                                      SomeResult(result_a, result_b)
                                  }
                              }
                              

                              In this case, the issue is that &self in a trait is not aware of the fields in the struct - traits don’t define that fields must exist, so the compiler can’t assume they exist at all.

                              Second, the type of self.a/b is unknown to the trait - because in a read it’s a “a: DBReadGuard”, and for a write it’s “a: DBWriteGuard”.

                               The first problem can be solved by adding get_field style accessor methods to the trait. Rust will also compile these out as inline calls, so the correct thing for the type system is also the optimal thing at run time. So we’ll update this to:

                              pub trait server_read_trait {
                                  fn get_a(&self) -> ???;
                              
                                  fn get_b(&self) -> ???;
                              
                                  fn search(&self) -> SomeResult {
                                      let result_a = self.get_a().search(...); // note the change from self.a to self.get_a()
                                      let result_b = self.get_b().search(...);
                                      SomeResult(result_a, result_b)
                                  }
                              }
                              
                              impl server_read_trait for server_read {
                                  fn get_a(&self) -> &DBReadGuard {
                                      &self.a
                                  }
                                   // get_b is similar, so omitted
                              }
                              
                              impl server_read_trait for server_write {
                                  fn get_a(&self) -> &DBWriteGuard {
                                      &self.a
                                  }
                                   // get_b is similar, so omitted
                              }
                              

                               So now we have the second problem remaining: for server_write we have a DBWriteGuard, and for server_read we have a DBReadGuard. There was a much longer experimentation process, but eventually the answer was simpler than I was expecting: Rust allows traits to have associated types that are constrained by trait bounds rather than naming a concrete type.

                               So provided that DBReadGuard and DBWriteGuard both implement “DBReadTrait”, we can give server_read_trait an associated type that enforces this. It looks something like:

                              pub trait DBReadTrait {
                                  fn search(&self) -> ...;
                              }
                              
                              impl DBReadTrait for DBReadGuard {
                                  fn search(&self) -> ... { ... }
                              }
                              
                              impl DBReadTrait for DBWriteGuard {
                                  fn search(&self) -> ... { ... }
                              }
                              
                              pub trait server_read_trait {
                                  type GuardType: DBReadTrait; // Say that GuardType must implement DBReadTrait
                              
                                  fn get_a(&self) -> &Self::GuardType; // implementors must return that type implementing the trait.
                              
                                  fn get_b(&self) -> &Self::GuardType;
                              
                                  fn search(&self) -> SomeResult {
                                      let result_a = self.get_a().search(...);
                                      let result_b = self.get_b().search(...);
                                      SomeResult(result_a, result_b)
                                  }
                              }
                              
                               impl server_read_trait for server_read {
                                   type GuardType = DBReadGuard;
                               
                                   fn get_a(&self) -> &DBReadGuard {
                                       &self.a
                                   }
                                   // get_b is similar, so omitted
                               }
                               
                               impl server_read_trait for server_write {
                                   type GuardType = DBWriteGuard;
                               
                                   fn get_a(&self) -> &DBWriteGuard {
                                       &self.a
                                   }
                                   // get_b is similar, so omitted
                               }
                              

                              This works! We now have a way to write a single “search” type for our server read and write types. In my case, the DBReadTrait also uses a similar technique to define a search type shared between the DBReadGuard and DBWriteGuard.

                              Security isn’t a feature

                              Posted by Josh Bressers on January 15, 2019 03:48 PM

                              As CES draws to a close, I’ve seen more than one security person complain that nobody at the show was talking about security. There were an incredible number of consumer devices unveiled, no doubt there is no security in any of them. I think we get caught up in the security world sometimes so we forget that the VAST majority of people don’t care if something has zero security. People want interesting features that amuse them or make their lives easier. Security is rarely either of these, generally it makes their lives worse so it’s an anti-feature to many.

                               Now the first thing many security people think goes something like this: “if there’s no security they’ll be sorry when their lightbulb steals their wallet and dumps the milk on the floor!!!” The reality is that argument will convince nobody; it’s not even very funny, so they’re laughing at us, not with us. Our thoughts by their very nature blame all the wrong people, and we try to scare them into listening to us. It’s never worked. Ever. That one time you think it worked, they were only pretending to care so you would go away.

                               So it brings us to the idea that security isn’t a feature. Turning your lights on is a feature. Cooking your dinner is a feature. Driving your car is a feature. Not bursting into flames is not a feature. Well, it sort of is, but nobody talks about it. Security is a lot like the bursting into flames thing. Security really is about something not happening, and things not happening is the fundamental problem we have when we try to talk about all this. You can’t build a plausible story around an event that may or may not happen. Trying to build a narrative around something that may or may not happen is incredibly confusing. This isn’t how features work; features do positive things, they don’t not do negative things (I don’t even know if that’s right). Security isn’t a feature.

                              So the question you should be asking then is how do we make products being created contain more of this thing we keep calling security. The reality is we can’t make this happen given our current strategies. There are two ways products will be produced that are less insecure (see what I did there). Either the market demands it, which given the current trends isn’t happening anytime soon. People just don’t care about security. The second way is a government creates regulations that demand it. Given the current state of the world’s governments, I’m not confident that will happen either.

                              Let’s look at market demand first. If consumers decide that buying products that are horribly insecure is bad, they could start buying products with more built in security. But even the security industry can’t define what that really means. How can you measure which product has the best security? Consumers don’t have a way to know which products are safer. How to measure security could be a multi-year blog series so I won’t get into the details today.

                              What if the government regulates security? We sort of end up in a similar place to consumer demand. How do we define security? It’s a bit like defining safety I suppose. We’re a hundred years into safety regulations and still get a lot wrong and I don’t think anyone would argue defining safety is much easier than defining security. Security regulation would probably follow a similar path. It will be decades before things could be good enough to create real change. It’s very possible by then the machines will have taken over (that’s the secret third way security gets fixed, perhaps a post for another day).

                              So here we are again, things seem a bit doom and gloom. That’s not the intention of this post. The real purpose is to point out we have to change the way we talk about security. Yelling at vendors for building insecure devices isn’t going to ever work. We could possibly talk to consumers in a way that resonates with them, but does anyone buy the stove that promises to burst into flames the least? Nobody would ever use that as a marketing strategy. I bet it would have the opposite effect, a bit like our current behaviors and talking points I suppose.

                              Complaining that companies don’t take security seriously hasn’t ever worked and never will work. They need an incentive to care, us complaining isn’t an incentive. Stay tuned for some ideas on how to frame these conversations and who the audience needs to be.

                              Episode 129 - The EU bug bounty program

                              Posted by Open Source Security Podcast on January 14, 2019 12:58 AM
                              Josh and Kurt talk about the EU bug bounty program. There have been a fair number of people complaining it's solving the wrong problem, but it's the only way the EU has to spend money on open source today. If that doesn't change this program will fail.



                              <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8242709/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                              Show Notes


                                Episode 128 - Australia's encryption backdoor bill

                                Posted by Open Source Security Podcast on January 07, 2019 12:41 AM
                                Josh and Kurt talk about Australia's recently passed encryption bill. What is the law that was passed, what does it mean, and what are the possible outcomes? The show notes contain a flow chart of possible outcomes.


                                <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/8156204/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                                Show Notes


                                  TripleO Networks from Simplest to Not-So-Simple

                                  Posted by Adam Young on January 03, 2019 05:27 PM

                                   If you read the TripleO setup for network isolation, it lists eight distinct networks. Why does TripleO need so many networks? Let’s take it from the ground up.

                                  WiFi to the Workstation


                                  I run Red Hat OpenStack Platform (OSP) Director, which is the productized version of TripleO.  Everything I say here should apply equally well to the upstream and downstream variants.

                                   My setup has OSP Director running in a virtual machine (VM). Getting that virtual machine set up takes network connectivity. I perform this via wireless, as I move around the house with my laptop, and the workstation has a built-in wireless card.

                                   Let’s start here: Director runs inside a virtual machine on the workstation.  It has complete access to the network interface card (NIC) via macvtap.  This NIC is attached to a Cisco Catalyst switch.  A wired cable to my laptop is also attached to the switch. This allows me to set up and test the first stage of network connectivity:  SSH access to the virtual machine running on the workstation.

                                  Provisioning Network

                                  The Blue Network here is the provisioning network.  This reflects two of the networks from the Tripleo document:

                                  • IPMI* (IPMI System controller, iLO, DRAC)
                                  • Provisioning* (Undercloud control plane for deployment and management)

                                   These two distinct roles can be served by the same network in my setup, and, in fact, they must be.  Why?  Because my Dell servers have a NIC that acts as the IPMI endpoint and is also the only NIC that supports PXE.  Thus, unless I wanted to do some serious VLAN wizardry and get the NIC to switch both (tough to debug during the setup stage), I am better off with them both using untagged VLAN traffic.  Each server is therefore allocated two static IPv4 addresses, one to be used for IPMI, and one that will be assigned during hardware provisioning.

                                  Apologies for the acronym soup.  It bothers me, too.

                                   Another way to think about the set of networks you need is via DHCP traffic.  Since the IPMI cards are statically assigned their IP addresses, they do not need a DHCP server.  But the hardware’s operating system will get its IP address from DHCP.  Thus, it is OK if these two functions share a network.

                                   This does not scale very well.  IPMI and iDRAC can both support DHCP, and that would be the better way to go in the future, but it is beyond the scope of what I am willing to mess with in my lab.

                                  Deploying the Overcloud

                                  In order to deploy the overcloud, the Director machine needs to perform two classes of network calls:

                                   1. SSH calls to the baremetal OS to launch the services, almost all of which are containers.  This is on the Blue network above.
                                  2. HTTPS calls to the services running in those containers.  These services also need to be able to talk to each other.  This is on the Yellow internal API network above.  I didn’t color code “Yellow” as you can’t read it.  Yellow.

                                  Internal (not) versus External

                                  You might notice that my diagram has an additional network; the External API network is shown in Red.

                                   Provisioning and calling services are two very different use cases.  The most common API call in OpenStack is POST https://identity/v3/auth/token.  This call is made prior to any other call.  The second most common is the call to validate a token.  The create token call needs to be accessible from everywhere that OpenStack is used.  The validate token call does not.  But, if the API server only listens on the same network that is used for provisioning, that means the network is wide open; people that should only be able to access the OpenStack APIs can now send network attacks against the IPMI cards.

                                  To split this traffic, either the network APIs need to listen on both networks, or the provisioning needs to happen on the external API network. Either way, both networks are going to be set up when the overcloud is deployed.

                                  Thus, the Red Server represents the API servers that are running on the controller, and the yellow server represents the internal agents that are running on the compute node.

                                  Some Keystone History

                                   When a user performs an action in the OpenStack system, they make an API call.  This request is processed by the webserver running on the appropriate controller host.  There is no technical difference between a Nova server requesting a token and a project member requesting a token, yet these were seen as separate use cases and were put on separate network ports.  The internal traffic was on port 35357, and the project member traffic was on port 5000.

                                  It turns out that running on two different ports of the same IP address does not solve the real problem people were trying to solve.  They wanted to limit API access via network, not by port.  Thus, there really was no need for two different ports, but rather two different IP addresses.

                                  This distinction still shows up in the Keystone service catalog, where Endpoints are classified as External or Internal.
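
                                   As a rough sketch (the URLs are made up), registering the same service on two different networks looks something like this with the openstack client:

                                   # publish the identity API on the external network for end users
                                   openstack endpoint create identity public https://identity.external.example.com:5000/v3
                                   # and on the internal API network for the other OpenStack services
                                   openstack endpoint create identity internal https://identity.internal.example.com:5000/v3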

                                  Deploying and Using a Virtual Machine

                                   Now our diagram has gotten a little more complicated.  Let’s start with the newly added Red laptop, attached to the External API network.  This system is used by our project member to create the new virtual machine via the compute create_server API call. In order:

                                  1. The API call comes from the outside world, travels over the Red external API network to the Nova server (shown in red)
                                   2. Nova posts messages to the Queue, which are eventually picked up and processed by the compute agent (shown in yellow).
                                  3. The compute agent talks back to the other API servers (also shown in Red) to fetch images, create network ports, and connect to storage volumes.
                                  4. The new VM (shown in green) is created and connects via an internal, non-routable IP address to the metadata server to fetch configuration data.
                                  5. The new VM is connected to the provider network (also shown in green).

                                  At this point, the VM is up and running.  If an end user wants to connect to it they can do so.  Obviously, the Provider network does not run all the way through the router to the end users system, but this path is the “open for business” network pathway.

                                  Note that this is an instance of a provider network as Assaf defined in his post.

                                  Tenant Networks

                                   Let’s say you are not using a provider network.  How does that change the setup?  First, let’s re-label the Green network to be the “External Network.”  Notice that the virtual machines do not connect to it now.  Instead, they connect via the new, purple networks.

                                   Note that the Purple networks connect to the external network in the network controller node, shown in purple on the bottom server.  This service plays the role of a router, converting the internal traffic on the tenant network to external traffic.  This is where the Floating IPs terminate and are mapped to an address on the internal network.

                                  Wrap Up

                                  The TripleO network story has evolved to support a robust configuration that splits traffic into its component segments.  The diagrams above attempt to pass along my understanding of how they work, and why.

                                   I’ve left off some of the story, as I do not show the separate networks that can be used for storage.  I’ve collapsed the controllers and agents into a simple block to avoid confusing detail; my goal is accuracy, but here it sacrifices precision.  It also only shows a simple rack configuration, much like the one here in my office.  The concepts presented should allow you to understand how it would scale up to a larger deployment.  I expect to talk about that in the future as well.

                                  I’ll be sure to update  this article with feedback. Please let me know what I got wrong, and what I can state more clearly.

                                  Remotely Provisioning a Virtual Machine using Ansible and Libvirt

                                  Posted by Adam Young on January 03, 2019 03:15 PM

                                   Ansible exists to help automate the time-consuming, repeated tasks that technologists depend upon. One very common job is to create and tear down a virtual machine. While cloud technologies have made this possible to perform remotely, there are many times when I’ve needed to set up and tear down virtual machines on systems that were stand-alone Linux servers. In this case, the main interfaces to the machine are ssh and libvirt. I recently worked through an Ansible role to set up and tear down a virtual machine via libvirt, and I’d like to walk through it and record my reasons for some of the decisions I made.

                                  Constant Refactoring

                                   Always work from success.  Change one thing at a time, so that you know what broke when things don’t work.  Thus, when I work something out, the first iteration is hard coded.  I get it to work, and then I clean it up.  The most common refactoring is to introduce a variable.  For example, if I am working with a file such as:

                                   

                                  /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2

                                  I’ll use exactly that line in the Ansible play to start such as

                                   - name: push base vm image to hypervisor
                                     copy:
                                       src: /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2
                                       dest: /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2
                                       owner: qemu
                                       group: qemu
                                       mode: u=rw,g=r,o=r

                                  Once I get that to work, I’ll clean it up to something like:

                                   - name: push base vm image to hypervisor
                                     copy:
                                       src: "{{ source_image_dir }}/{{ source_image_file }}"
                                       dest: "{{ target_image_dir }}/{{ source_image_file }}"
                                       owner: qemu
                                       group: qemu
                                       mode: u=rw,g=r,o=r

                                   The definitions of the variables go into the role’s defaults/main.yml file.

                                  Customizing the VM Backing Store image

                                   The backing store for the virtual machine is created by copying the original VM image file to a new file and then using virt-customize to modify the image. This is a little expensive in terms of disk space; I could, instead, use the qcow2 backing-file mechanism to provide the same image base to all of the VMs generated from it. I might end up doing that in the long run. However, that does put cross-file dependencies in place. If I do something to the original file, I lose all of the VMs built off it. If I want to copy a VM to a remote machine, I would have to copy both files and keep them in the same directory. I may end up doing some of that in the future, if disk space becomes an issue.
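
                                   For reference, the difference between the two approaches looks roughly like this; the target file name is a placeholder, and the second form is the backing-file variant I have not adopted:

                                   # what the role effectively does today: a full, independent copy that virt-customize then modifies
                                   cp /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2 \
                                      /var/lib/libvirt/images/new-vm.qcow2

                                   # the qcow2 alternative: a thin overlay that stays dependent on the original image
                                   qemu-img create -f qcow2 -F qcow2 \
                                       -b /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2 \
                                       /var/lib/libvirt/images/new-vm.qcow2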

                                  The virtual machine base image

                                  The above code block shows how I copy the raw image file over to the hypervisor. I find that I am often creating multiple VMs off of the same base file. While I could customize this file directly, it would then no longer match the fingerprint of the file I downloaded from the Red Hat site, and I would have no way to confirm I was using a safe image. Also, copying the file to the remote machine is one of the longest tasks in this playbook, so I do not remove it in the cleanup task list.
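
                                   Keeping that copy pristine means I can always re-verify it against the checksum published on the download page, along these lines:

                                   # compare the output against the sha256 listed next to the download
                                   sha256sum /var/lib/libvirt/images/rhel-server-7.5-x86_64-kvm.qcow2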

                                  Templatization of files to modify the base image

                                   Before I can modify the VM image, I need to copy templated files from the Ansible host to the remote system. This two-step process is necessary, as I cannot fill in a template during the call to virt-customize. Thus, all templatization is done in the template tasks. For this script, I use the /tmp directory as the interim location. This could be problematic, and I would be better off creating a deliberate subdirectory under /home/cloud-user or another known location. That would be safer and less likely to have a conflict.

                                  Network card access and Static IP Addresses

                                  The virtual machine I am building is going to have to work with both a PXE service and also be available to physical machines outside the cluster. As such, I want it to have a network interface linked to a physical one on its host, and to assign that interface a static IP address. The physical passthrough is handled by making the device into a macvtap device. The XML Fragment for it looks like this:

                                      
                                  
                                   <interface type="direct">
                                     <mac address="52:54:00:26:29:db"/>
                                     <source dev="em1" mode="bridge"/>
                                     <model type="virtio"/>
                                   </interface>

                                  The flat naming of the variable ethernet_device will be problematic over time, and I will probably make it a dictionary value under the with_items collection.

                                  To assign this device a static IP address, I copied an ifcfg-eth1 file and templatized it.

                                  Multiple Inventories

                                   I have a fairly powerful laptop that is supposed to work as a portable demo machine. I want to be able to use the same playbook to deploy VMs on the laptop as I do on the workstation I’ve been testing this on. On my laptop, I typically run with sshd disabled, and only enable it when I want to run this or similar Ansible playbooks.
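
                                   Toggling that on the laptop is just a matter of starting and stopping the service around a run; a minimal sketch:

                                   # allow Ansible to reach the laptop over ssh only while provisioning
                                   sudo systemctl start sshd
                                   # ... run the playbook ...
                                   sudo systemctl stop sshd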

                                  Part of the constant refactoring is moving variables from the tasks, to defaults, to the inventory files.

                                   More and more, my inventory setup is starting to look like Ansible Tower. Eventually, I expect to have something like the template mechanism to be able to track “run this playbook with that inventory and these specific values.”

                                  Creating servers in a loop

                                   While my current task requires only a single VM, eventually I am going to want two or more. This means that I need to create the set of servers in a loop. This actually ends up flowing into all tasks that modify the base image. This is one case where constant refactoring comes in, but also where I show how I can easily break the multi-inventory setup. For example, the addresses that are hard coded into the XML fragment above really need to vary per host. Thus, that fragment should look like this:

                                     
                                   <interface type="direct">
                                     <mac address="{{ item.mac }}"/>
                                     <source dev="em1" mode="bridge"/>
                                     <model type="virtio"/>
                                   </interface>

                                  And the ethernet configuration should look like this:

                                     
                                  TYPE=Ethernet
                                  PROXY_METHOD=none
                                  BROWSER_ONLY=no
                                  BOOTPROTO=none
                                  IPADDR={{ item.static_ip_address }}
                                  PREFIX=24
                                  GATEWAY=10.127.0.1
                                  DEFROUTE=yes
                                  IPV4_FAILURE_FATAL=no
                                  IPV6INIT=yes
                                  IPV6_AUTOCONF=yes
                                  IPV6_DEFROUTE=yes
                                  IPV6_FAILURE_FATAL=no
                                  IPV6_ADDR_GEN_MODE=stable-privacy
                                  NAME=eth1
                                  DEVICE=eth1
                                  ONBOOT=yes
                                  ZONE=public
                                  DNS1=10.127.0.7
                                  PEERDNS=no
UUID={{ item.uuid }}
                                   

                                  …and that still hard codes some values. The collection that I iterate through to create the servers now needs these additional keys. Thus, my default file should look like this:

                                  ---
                                  cluster_hosts:
                                    - {name: passimian, uuid: 9c92fad9-6ecb-3e6c-eb4d-8a47c6f50c0, static_ip_address: 10.127.0.3, mac: 52:54:00:26:29:db }
                                  

                                  The task for copying in the network configuration currently looks like this:

                                  - template:
                                      src: ifcfg-eth1.j2
                                      dest: '{{ hypervisor_keystore_dir }}/ifcfg-eth1'
                                  

                                  It will have to be modified to:

                                  - template:
                                      src: ifcfg-eth1.j2
                                      dest: '{{ hypervisor_keystore_dir }}/{{ item.name }}-ifcfg-eth1'
                                    with_items: "{{ cluster_hosts }}"
                                  

                                  And the configuration of the VM image would also have to reflect this. Currently the call is:

                                  -command  'id -u cloud-user &>/dev/null || /usr/sbin/useradd -u 1000 cloud-user'  --ssh-inject cloud-user:file:/tmp/authorized_keys   --hostname {{ item.name }}.home.younglogic.net   --copy-in {{ hypervisor_keystore_dir }}/ifcfg-eth1:/etc/sysconfig/network-scripts  --selinux-relabel
                                    with_items: "{{ cluster_hosts }}"
                                  

                                  The flag would need to be updated to

                                   --copy-in {{ hypervisor_keystore_dir }}/{{ item.name }}-ifcfg-eth1:/etc/sysconfig/network-scripts
                                  

                                  Since I start by making the changes in default/main.yml, I only have to make them once. Once I push the cluster_hosts definition to the inventory files, refactoring gets harder: I cannot atomically make a change without breaking one of the configurations. Once I have more than one system using this playbook, adding parameters this way introduces a non-backwards compatible change.
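
One way to soften that (a sketch, not something the playbook does today) is to give a new key a fallback in the task, so an inventory that has not yet been updated keeps producing the old file name:

    # Sketch: hosts whose entry lacks a name fall back to the old, un-prefixed file
    - template:
        src: ifcfg-eth1.j2
        dest: "{{ hypervisor_keystore_dir }}/{{ (item.name ~ '-') if item.name is defined else '' }}ifcfg-eth1"
      with_items: "{{ cluster_hosts }}"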

                                  Conclusion

Like much system administration work, this task is going to be used before it is completed. It is perpetually a work-in-progress. This is healthy. As soon as we start getting afraid of our code, it calcifies and breaks. Even worse, we treat the code as immutable and build layers around it, making simple things more complicated. These notes serve to remind me (and others) why things look the way they do, and where things are going. Hopefully, when the time comes to change things in the future, these notes will help this code base grow to match the needs I have for it.

                                  G#

                                  Posted by Adam Young on January 02, 2019 09:52 PM

                                  G# is a magic note.  It takes the vanilla, banal, bland sound of a major scale and makes it into music. Here’s how.

Listen to the first line of Fur Elise, by Beethoven. Focus on the left hand, and note where he added a G sharp.
                                  <iframe allowfullscreen="allowfullscreen" frameborder="0" height="394" src="https://musescore.com/user/23259686/scores/5376166/embed" width="100%"></iframe>“Fur Elise (Opening)” by Ludwig Von Beethoven

Now read and listen to the start of Invention 13 by Bach. Again, pay attention to the G sharps, but also to how it sounds where he leaves them as G natural.

                                  <iframe allowfullscreen="allowfullscreen" frameborder="0" height="394" src="https://musescore.com/user/23259686/scores/5376172/embed" width="100%"></iframe>“bach-invention-13-start” by J.S. Bach 4

Both pieces are nominally in A minor (the relative minor of C major), but both make heavy use of the G sharp.

Let's start with the basic C major scale:

[ABC notation, K:C, L:1/4: CDEFGABc]

For every major scale, there is a relative minor. The relative minor scale is created by playing the major scale but starting and ending on the 6th note. For the key of C major, the relative minor is A minor. It has no accidentals.

[ABC notation, K:C, L:1/4: ABcdefga]

The harmonic minor scale is created by raising the G one half step:

[ABC notation, K:C, L:1/4: ABcdef^ga]

Raising that note creates the interval of a minor third between the F and the G#. This strongly emphasizes the minor effect.

[ABC notation, K:C, L:1/4: f4^g4]

When we start working with the relative minor, using the G sharp converts what was originally an E minor chord into an E major chord:

[ABC notation, K:C, L:1/4: [egb]4[e^gb]4]

                                  Conversion to the blues scale.

This sound is used for so much more than just Baroque music. Let's go back to our C scale, but this time play it from D to D. This is called the Dorian mode.

[ABC notation, K:C, L:1/4: DEFGABcd]

If we drop out the E and the B, we end up with a minor pentatonic scale.

[ABC notation, K:C, L:1/4: DFGAcd]

                                  If we add in that G# again, we get a blues scale.

[ABC notation, K:C, L:1/4: DFG^GAcd]

If we rotate back to the root position, we have a major blues scale:

[ABC notation, K:C, L:1/4: CDFG^GAc]

Back in the late 1930s, jazz musicians were looking for ways to get their lines of eighth notes to flow. The problem is that a major scale has 7 distinct pitches, but a measure has 8 eighth-note slots to fill. This means that a pattern of eighth notes does not fall out on the same downbeat after a measure. Note where the chord tones fall in the following phrase.

[ABC notation, K:C, L:1/8: | CDEF GABc | defg fedc | Z8 |]

For the first 7 beats, the C major 7 chord tones are on the downbeats: C on 1, E on 2, G on 3, and B on 4. But the C repeats on the upbeat of 4, and the downbeat of one in the second measure is, again, a non-chord tone. This is much closer to a D minor line than a C major line.

If we add in the G sharp, the line now falls out so that all the major chord tones land on downbeats. We adjust our expectation so that A (the sixth) is the fourth tone of the scale.

[ABC notation, K:C, L:1/8: | CDEF G^GAB | cdef g^gab | Z8 |]

This works for minor and seventh chords as well. A D minor riff:

[ABC notation, K:C, L:1/8: | DEFG ^GABc | defg ^g=gfe | Z8 |]

A G7 riff:

[ABC notation, K:C, L:1/8: | G^GAB cdef | G^GAB cdef | Z8 |]

                                  All this by adding the blues note.

Of course, this is only for the key of C. The same relationship works in any key: you take the fifth tone and sharp it. For example, in the key of G, you sharp the D.

[ABC notation, K:G, L:1/4: | G A B c | d^def | g4 |]

I tried to illustrate this concept on the saxophone a while back. There is a gaffe where I mixed up the keys, but it still makes the point.
                                  <iframe allowfullscreen="allowfullscreen" frameborder="0" height="315" src="https://www.youtube.com/embed/XgICg4r7MdY" width="560"></iframe>

                                  Useful USG pro 4 commands and hints

                                  Posted by William Brown on January 01, 2019 01:00 PM

                                  Useful USG pro 4 commands and hints

I've recently changed from a FreeBSD vm as my router to a Ubiquiti PRO USG4. It's a solid device, with many great features, and I'm really impressed at how it "just works" in many cases. So far my only disappointment is the lack of documentation about the CLI, especially for debugging and auditing what is occurring in the system, and for troubleshooting. This post will aggregate some of my knowledge about the topic.

                                  Current config

                                  Show the current config with:

                                  mca-ctrl -t dump-cfg
                                  

You can show system status with the "show" command. Pressing ? will cause the current completion options to be displayed. For example:

                                  # show <?>
                                  arp              date             dhcpv6-pd        hardware
                                  

                                  DNS

The following commands show the DNS statistics, show the DNS configuration, and allow changing the cache-size. The cache-size is measured in the number of records cached, rather than KB/MB. To make this permanent, you need to apply the change to config.json in your controller's sites folder.

                                  show dns forwarding statistics
                                  show system name-server
                                  set service dns forwarding cache-size 10000
                                  clear dns forwarding cache
                                  

                                  Logging

You can see an aggregate of system logs with:

                                  show log
                                  

Note that when you set firewall rules to "log on block" the messages go to dmesg, not syslog, so you need to check dmesg for these.

It's a great idea to forward your logs in the controller to a syslog server, as this allows you to aggregate and see all the events occurring in a single time series (great when I was diagnosing an issue recently).

                                  Interfaces

                                  To show the system interfaces

                                  show interfaces
                                  

                                  To restart your pppoe dhcp6c:

                                  release dhcpv6-pd interface pppoe0
                                  renew dhcpv6-pd interface pppoe0
                                  

There is a current issue where the firmware will start dhcp6c on eth2 and pppoe0, but the session on eth2 blocks the pppoe0 client. As a result, you need to release on eth2, then renew on pppoe0.

                                  If you are using a dynamic prefix rather than static, you may need to reset your dhcp6c duid.

                                  delete dhcpv6-pd duid
                                  

                                  To restart an interface with the vyatta tools:

                                  disconnect interface pppoe
                                  connect interface pppoe
                                  

                                  OpenVPN

                                  I have setup customised OpenVPN tunnels. To show these:

                                  show interfaces openvpn detail
                                  

                                  These are configured in config.json with:

                                  # Section: config.json - interfaces - openvpn
                                      "vtun0": {
                                              "encryption": "aes256",
                                              # This assigns the interface to the firewall zone relevant.
                                              "firewall": {
                                                      "in": {
                                                              "ipv6-name": "LANv6_IN",
                                                              "name": "LAN_IN"
                                                      },
                                                      "local": {
                                                              "ipv6-name": "LANv6_LOCAL",
                                                              "name": "LAN_LOCAL"
                                                      },
                                                      "out": {
                                                              "ipv6-name": "LANv6_OUT",
                                                              "name": "LAN_OUT"
                                                      }
                                              },
                                              "mode": "server",
                                              # By default, ubnt adds a number of parameters to the CLI, which
                                              # you can see with ps | grep openvpn
                                              "openvpn-option": [
                                                      # If you are making site to site tunnels, you need the ccd
                                                      # directory, with hostname for the file name and
                                                      # definitions such as:
                                                      # iroute 172.20.0.0 255.255.0.0
                                                      "--client-config-dir /config/auth/openvpn/ccd",
                                                      "--keepalive 10 60",
                                                      "--user nobody",
                                                      "--group nogroup",
                                                      "--proto udp",
                                                      "--port 1195"
                                              ],
                                              "server": {
                                                      "push-route": [
                                                              "172.24.0.0/17"
                                                      ],
                                                      "subnet": "172.24.251.0/24"
                                              },
                                              "tls": {
                                                      "ca-cert-file": "/config/auth/openvpn/vps/vps-ca.crt",
                                                      "cert-file": "/config/auth/openvpn/vps/vps-server.crt",
                                                      "dh-file": "/config/auth/openvpn/dh2048.pem",
                                                      "key-file": "/config/auth/openvpn/vps/vps-server.key"
                                              }
                                      },
                                  

                                  Netflow

Netflow allows a set of connection-tracking data to be sent to a remote host for aggregation and analysis. Sadly this process is mostly undocumented, bar some useful forum commenters. Here is the process I came up with. This is how you configure it live:

                                  set system flow-accounting interface eth3.11
                                  set system flow-accounting netflow server 172.24.10.22 port 6500
                                  set system flow-accounting netflow version 5
                                  set system flow-accounting netflow sampling-rate 1
                                  set system flow-accounting netflow timeout max-active-life 1
                                  commit
                                  

                                  To make this persistent:

                                  "system": {
                                              "flow-accounting": {
                                                      "interface": [
                                                              "eth3.11",
                                                              "eth3.12"
                                                      ],
                                                      "netflow": {
                                                              "sampling-rate": "1",
                                                              "version": "5",
                                                              "server": {
                                                                      "172.24.10.22": {
                                                                              "port": "6500"
                                                                      }
                                                              },
                                                              "timeout": {
                                                                      "max-active-life": "1"
                                                              }
                                                      }
                                              }
                                      },
                                  

                                  To show the current state of your flows:

                                  show flow-accounting
                                  

                                  The idea of CI and Engineering

                                  Posted by William Brown on January 01, 2019 01:00 PM

                                  The idea of CI and Engineering

In software development I see an interesting trend and push towards continuous integration, continuous testing, and testing in production. These techniques are designed to allow faster feedback on errors, use real data for application testing, and deliver features and changes faster.

But is that really how people use software on devices? When we consider an operation like Google or Amazon, this always-online technique may work, but what happens when we apply a continuous integration and "we'll patch it later" mindset to devices like phones or the internet of things?

                                  What happens in other disciplines?

In real engineering disciplines like aviation or construction, techniques like this don't really work. We don't continually build bridges, then fix them when they break or collapse. There are people who provide formal analysis of materials and their characteristics. Engineers consider careful designs, constraints, loads and situations that may occur. The structure is planned, reviewed and verified mathematically. Procedures and oversight are applied to ensure the structure is built correctly. Lessons are learnt from past failures and incidents and are applied to every layer of the design and construction process. Communication between engineers and many other people is critical to the process. Concerns are always addressed and managed.

The first thing to note is that if we just built lots of scale-model bridges and continually broke them until we found their limits, we would waste many resources. Bridges are carefully planned and proven instead.

So what's the point with software?

Today we still have a mindset that continually breaking and building is a reasonable path to follow. It's not! It means that the only way to achieve quality is to have a large test suite (which requires people and time to write), which has to be further derived from failures (and those failures can negatively affect real people), and then we have to apply large amounts of electrical energy to continually run the tests. The test suites can't even guarantee complete coverage of all situations and occurrences!

This puts CI techniques out of reach of many application developers due to time and energy (translated to dollars) limits. Services like Travis on GitHub certainly help to lower the energy requirement, but they don't remove the time and test-writing requirements.

                                  No matter how many tests we have for a program, if that program is written in C or something else, we continually see faults and security/stability issues in that software.

                                  What if we CI on … a phone?

Today we even have hardware devices that are approached as though "test in production" is a reasonable thing. It's not! People don't patch, telcos don't allow updates out to users, and those that are aware have to do custom ROM deployment. This creates an odd dichotomy of "haves" and "have nots": those with the technical know-how who have a better experience, and the "have nots" who have to suffer potentially insecure devices. This is especially terrifying given how deeply personal phones are.

                                  This is a reality of our world. People do not patch. They do not patch phones, laptops, network devices and more. Even enterprises will avoid patching if possible. Rather than trying to shift the entire culture of humans to “update always”, we need to write software that can cope in harsh conditions, for long term. We only need to look to software in aviation to see we can absolutely achieve this!

                                  What should we do?

                                  I believe that for software developers to properly become software engineers we should look to engineers in civil and aviation industries. We need to apply:

• Regulation and ethics (safety of people is always first)
                                  • Formal verification
                                  • Consider all software will run long term (5+ years)
                                  • Improve team work and collaboration on designs and development

The reality of our world is that people are deploying devices (routers, networks, phones, lights, laptops and more) that may never be updated or patched in their service life. Even I'm guilty (I have a modem that's been unpatched for about 6 years, but it's pretty locked down …). As a result we need to rely on proof, at build time, that the device cannot fail, rather than on patches that may never arrive! Putting formal verification first, and always considering user safety and rights first, shifts a large burden to us in terms of time. But many tools (Coq, F*, Rust …) make formal verification more accessible to use in our industry. Verifying our software is a far stronger assertion of quality than "throw tests at it and hope it works".

                                  You’re crazy William, and also wrong

Am I? Looking at "critical" systems like iPhone encryption hardware, they are running the formally verified seL4. We also heard at Kiwicon in 2018 that Microsoft and Xbox are using formal verification to design the low levels of their systems to prevent exploits from occurring in the first place.

                                  Over time our industry will evolve, and it will become easier and more cost effective to formally verify than to operate and deploy CI. This doesn’t mean we don’t need tests - it means that the first line of quality should be in verification of correctness using formal techniques rather than using tests and CI to prove correct behaviour. Tests are certainly still required to assert further behavioural elements of software.

Today, if you want to do this, you should be looking at Coq and program extraction, F* and KreMLin (Project Everest, a formally verified HTTPS stack), and Rust (which has a subset of the safe language formally proven). I'm sure there are more, but these are the ones I know off the top of my head.

                                  Conclusion

Over time our industry must evolve to put the safety of humans first. To achieve this we must look to other safety-driven cultures such as aviation and civil engineering. Only by learning from their strict disciplines and behaviours can we start to provide software that matches the behavioural and quality expectations humans have for software.

                                  Misguided misguidings over the EU bug bounty

                                  Posted by Josh Bressers on December 30, 2018 03:03 PM

                                  The EU recently announced they are going to sponsor a security bug bounty program for 14 open source projects in 2019. There has been quite a bit of buzz about this program in all the usual places. The opinions are all over the place. Some people wonder why those 14, some wonder why not more. Some think it’s great. Some think it’s a horrible idea.

I don't want to focus too much on the details as they are unimportant in the big picture. Which applications are part of the program doesn't really matter. What matters is why we are here today and where this should go in the future.

                                  There are plenty of people claiming that a security bug bounty isn’t fair, we need to be paying the project developers, the people who are going to fix the bugs found by the bug bounty. Why are we only paying the people who find the bugs? This is the correct question, but it’s not correct for the reasons most think it is.

                                  There are a lot of details to unpack about all this and I don’t want to write a novel to explain all the nuance and complication around what’s going to happen. The TL;DR is basically this: The EU doesn’t have a way to pay the projects today, but they do have a way to pay security bug bounties.

                                  Right now if you want to pay a particular project, who do you send the check to? In some cases like the Apache Software Foundation it’s quite clear. In other cases when it’s some person who publishes a library for fun, it’s not clear at all. It may even be illegal in some cases, sending money across borders can get complicated very quickly. I’ll give a shoutout to Tidelift here, I think they’re on the right path to make this happen. The honest truth is it’s really really hard to give money to the open source projects you use.

                                  Now, the world of bug bounties has figured out a lot of these problems. They’ve gotten pretty good at paying people in various countries. Making sure the people getting paid are vetted in some way. And most importantly, they give an organization one place to send the check and one place to hold accountable. They’ve given us frameworks to specify who gets paid and for what. It’s quite nice really.  I wrote some musing about this a few years ago. I still mostly agree with my past self.

So what does this all really mean? That is the big question. The EU is doing the only thing they can do right now. They have money to throw at the problem, and the only place they can throw it today is a bug bounty, so that's what they did. I think it's great. Step one is basically admitting you have a problem.

                                  Where we go next is the real question. If nothing changes and bug bounties are the only way to spend money on open source, this will fizzle out as there isn’t going to be a massive return on investment. The projects are already overworked, they don’t need a bunch of new bugs to fix. We need a “next step” that will give the projects resources. Resources aren’t always money, sometimes it’s help, sometimes it’s gear, sometimes it’s pizza. An organization like the EU has money, they need help turning that into something useful to an open source project.

                                  I don’t know exactly what the next few steps will look like, but I do know the final step is going to be some framework that lets different groups fund open source projects. Some will be governments, some will be companies, some might even be random people who want to give a project a few bucks.

                                  Everyone is using open source everywhere. It’s woven into the fabric of our most critical infrastructures. It’s imperative we find a way to ensure it has the care and feeding it needs. Don’t bash this bug bounty program for being short sighted, praise it for being the first step of a long journey.

                                  On that note, if you are part of any of these projects (or any project really) and you want help dealing with security reports, get in touch, I will help you with security patches, advisories, and vulnerability coordination. I know what sort of pain you’ll have to deal with, open source security can be even less rewarding than running an open source project 🙂

                                  Nextcloud and badrequest filesize incorrect

                                  Posted by William Brown on December 30, 2018 01:00 PM

                                  Nextcloud and badrequest filesize incorrect

                                  My friend came to my house and was trying to share some large files with my nextcloud instance. Part way through the upload an error occurred.

                                  "Exception":"Sabre\\DAV\\Exception\\BadRequest","Message":"expected filesize 1768906752 got 1768554496"
                                  

                                  It turns out this error can be caused by many sources. It could be timeouts, bad requests, network packet loss, incorrect nextcloud configuration or more.

We tried uploading larger files (by a factor of 10) and they worked. This eliminated timeouts as a cause, and probably network loss. Being on ethernet directly connected to the server also generally helps to eliminate packet loss as a cause, compared to, say, the internet.

                                  We also knew that the server must not have been misconfigured because a larger file did upload, so no file or resource limits were being hit.

                                  This also indicated that the client was likely doing the right thing because larger and smaller files would upload correctly. The symptom now only affected a single file.

At this point I realised: what if the client and server were both victims of a lower-level issue? I asked my friend to ls the file and read me its length in bytes. It was 1768906752, the size nextcloud expected.

Then I asked him to cat that file into a new file, and to tell me the length of the new one. cat encountered an error, but ls on the new file indeed showed a size of 1768554496. That means filesystem corruption! What could have led to this?

                                  HFS+

Apple's legacy filesystem (and the reason I stopped using Macs) is well known for silently eating files and corrupting content. Here we had yet another case of that damage occurring, and triggering errors elsewhere.

                                  Bisecting these issues and eliminating possibilities through a scientific method is always the best way to resolve the cause, and it may come from surprising places!

                                  2018 Christmas Special - Is Santa GDPR compliant?

                                  Posted by Open Source Security Podcast on December 24, 2018 01:00 AM
                                  Josh and Kurt talk about which articles of the GDPR apply to Santa, and if he's following the rules the way he should be (spoiler, he's probably not). Should Santa be on his own naughty list? We also create a new holiday character - George the DPO Elf!


                                  <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7999541/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                                  Show Notes

                                  Identity ideas …

                                  Posted by William Brown on December 20, 2018 01:00 PM

                                  Identity ideas …

                                  I’ve been meaning to write this post for a long time. Taking half a year away from the 389-ds team, and exploring a lot of ideas from other projects has led me to come up with some really interesting ideas about what we do well, and what we don’t. I feel like this blog could be divisive, as I really think that for our services to stay relevant we need to make changes that really change our own identity - so that we can better represent yours.

                                  So strap in, this is going to be long …

                                  What’s currently on the market

Right now the market for identity has two extremes. At one end we have the legacy "create your own" systems, built on technologies like LDAP and Kerberos. I'm thinking about things like 389 Directory Server, OpenLDAP, Active Directory, FreeIPA and more. These all happen to be constrained heavily by complexity, fragility, and administrative workload. You need to spend months to learn these and even still, you will make mistakes and there will be problems.

At the other end we have hosted "Identity as a Service" options like Azure AD and Auth0. These have, very intelligently, unbound themselves from legacy, and tend to offer HTTP APIs, 2FA and other features that "just work". But they are all in the cloud, and outside your control.

                                  But there is nothing in the middle. There is no option that “just works”, supports modern standards, and is unhindered by legacy that you can self deploy with minimal administrative fuss - or years of experience.

                                  What do I like from 389?

                                  • Replication

                                  The replication system is extremely robust, and has passed many complex tests for cases of eventual consistency correctness. It’s very rare to hear of any kind of data corruption or loss within our replication system, and that’s testament to the great work of people who spent years looking at the topic.

                                  • Performance

We aren't as fast as OpenLDAP in a one-server-versus-one-server comparison, but our replication scalability is much higher: in any size of MMR or read-only replica topology, we have higher horizontal scaling, nearly linear with server additions. If you want to run a cloud scale replicated database, we scale to it (and people already do this!).

                                  • Stability

                                  Our server stability is well known with administrators, and honestly is a huge selling point. We see servers that only go down when administrators are performing upgrades. Our work with sanitising tools and the careful eyes of the team has ensured our code base is reliable and solid. Having extensive tests and amazing dedicated quality engineers also goes a long way.

                                  • Feature rich

                                  There are a lot of features I really like, and are really useful as an admin deploying this service. Things like memberof (which is actually a group resolution cache when you think about it …), automember, online backup, unique attribute enforcement, dereferencing, and more.

                                  • The team

                                  We have a wonderful team of really smart people, all of whom are caring and want to advance the state of identity management. Not only do they want to keep up with technical changes and excellence, they are listening to and want to improve our social awareness of identity management.

                                  Pain Points

                                  • C

Because DS is written in C, it's risky and difficult to make changes. People (myself included) constantly make mistakes that introduce unsafety, and worse. No amount of tooling or intelligence can take away the fact that C is just hard to use, people need to be perfect (people are not perfect!), and today we have better tools. We cannot spend our time chasing our tails on pointless issues that C creates, when we should be doing better things.

                                  • Everything about dynamic admin, config, and plugins is hard and can’t scale

Because we need to maintain consistency through operations from start to end, but also allow changing config, plugins, and more during the server's operation, the current locking design just doesn't scale. It's also not 100% safe, because the values are changed by atomics, not managed by transactions. We could use copy-on-write for this, but why? Config should be managed by tools like ansible, but today our dynamic config and plugins are both a performance overhead and an admin overhead, because we exclude best-practice tools and have to spend a large amount of time maintaining consistent data when we shouldn't need to. Fewer features means less support overhead for us, and is simpler to test and assert quality and correct behaviour.

                                  • Plugins to address shortfalls, but a bit odd.

We have all these features to address issues, but they all do it … in kind of an odd way. Managed Entries creates user private groups on object creation. But the problem is "unix requires a private group" and "ldap schema doesn't allow a user to be a group and a user at the same time". So the answer is actually to create a new objectClass that lets a user ALSO be its own UPG, not "create an object that links to the user". (Or have a client generate the group from user attributes, but we shouldn't shift responsibility to the client.)

Distributed Numeric Assignment is based on the AD RID model, but it's all about "how can we assign a value to a user that's unique?". We already have a way to do this, in the UUID, so why not derive the UID/GID from the UUID? This means there is no complex inter-server communication or pooling, just simple isolated functionality.

                                  We have lots of features that just are a bit complex, and could have been made simpler, that now we have to support, and can’t change to make them better. If we rolled a new “fixed” version, we would then have to support both because projects like FreeIPA aren’t going to just change over.

                                  • client tools are controlled by others and complex (sssd, openldap)

Every tool for dealing with ldap is really confusing and arcane. They all have wild (unhelpful) defaults, and generally this scares people off. It took me months of work to get a working ldap server in the past. Why? It's 2018, things need to "just work". Our tools should "just work". Why should I need to hand edit pam? Why do I need to set weird options in sssd.conf? All of this makes the whole experience poor.

We are making client tools that can help (to an extent), but they are really limited to system administration and they aren't "generic" tools for every possible configuration that exists. So at some point people will still find a limit where they have to touch ldap commands. A common request is a simple-to-use web portal for password resets, which today only really exists in FreeIPA, and that already limits its application.

                                  • hard to change legacy

It's really hard to make code changes because our surface area is so broad and the many use cases mean that we risk breakage every time we do. I have even broken customer deployments like this. It's almost impossible to get away from, and it holds us back: we are scared to make changes, since we have to support the 1 million existing workflows. To add another is more support risk.

Many deployments use legacy schema elements that hold us back, ranging from the inet types, to schema that enforces a first/last name, to schema that won't express users + groups in a simple way. It's hard to ask people to just up and migrate their data, and even if we wanted to, ldap allows so much freedom that we are more likely to break data than migrate it correctly if we tried.

This holds us back from technical changes, and from social representation changes. People are more likely to engage with a large migrational change than an incremental change that disturbs their current workflow (i.e. moving from on-prem to cloud, rather than investing in smaller iterative changes to make their local solutions better).

                                  • ACI’s are really complex

389's access controls are good because they are in the tree and replicated, but bad because the syntax is awful, complex, and full of traps. Even I need to look up how to write them when I have to. This is not good for a project that has such deep security concerns, where your ACIs can look correct but actually expose all your data to risk.

• LDAP as a protocol is like a 90's drug experience

LDAP may be the lingua franca of authentication, but it's complex, hard to use and hard to write implementations for. That's why in open source we have a monoculture of using the openldap client libraries: no one can work out how to write a standalone library. Layer on top the complexity of the object and naming model, and we have a situation where no one wants to interact with LDAP, and rather keeps it at arm's length.

It's going to be extremely hard to move forward here, because the community is so fragmented and small, and the working groups so dispersed, that the idea of LDAPv4 is a dream no one should pursue, even though it's desperately needed.

                                  • TLS

                                  TLS is great. NSS databases and tools are not.

                                  • GSSAPI + SSO

GSSAPI and Kerberos are a piece of legacy that we just can't escape from. They are almost a curse, and one we need to break away from, as it's completely unusable (even if what it promises is amazing). We need to do better.

                                  That and SSO allows loads of attacks to proceed, where we actually want isolated token auth with limited access scopes …

                                  What could we offer

                                  • Web application as a first class consumer.

                                  People want web portals for their clients, and they want to be able to use web applications as the consumer of authentication. The HTTP protocols must be the first class integration point for anything in identity management today. This means using things like OAUTH/OIDC.

                                  • Systems security as a first class consumer.

Administrators still need to SSH to machines, and people still need their systems to have identities running on them. Having pam/nsswitch modules is a major requirement, and those modules have to be fast, simple, and work correctly. Users should "imply" a private group, and UID/GID should be dynamic from the UUID (or admins can override it).

                                  • 2FA/u2f/TOTP.

                                  Multi-factor auth is here (not coming, here), and we are behind the game. We already have Apple and MS pushing for webauthn in their devices. We need to be there for these standards to work, and to support the next authentication tool after that.

                                  • Good RADIUS integration.

RADIUS is not going away, and is important in education providers and business networks, so RADIUS must "just work". Importantly, this means MSCHAPv2, which is the universal default for all clients to operate with, which in turn means NTHASH.

However, we can make the NTHASH unlinked from your normal password, so you can then have a wifi password and a separate login password. We could even generate an NTHASH containing the TOTP token for higher-security environments.

                                  • better data structure (flat, defined by object types).

                                  The tree structure of LDAP is confusing, but a flatter structure is easier to manage and understand. We can use ideas from kubernetes like tags/labels which can be used to provide certain controls and filtering capabilities for searches and access profiles to apply to.

                                  • structured logging, with in built performance profiling.

                                  Being able to diagnose why an operation is slow is critical and having structured logs with profiling information is key to allowing admins and developers to resolve performance issues at scale. It’s also critical to have auditing of every single change made in the system, including internal changes that occur during operations.

                                  • access profiles with auditing capability.

                                  Access profiles that express what you can access, and how. Easier to audit, generate, and should be tightly linked to group membership for real RBAC style capabilities.

                                  • transactions by allowing batch operations.

                                  LDAP wants to provide a transaction system over a set of operations, but that may cause performance issues on write paths. Instead, why not allow submission of batches of changes that all must occur “at the same time” or “none”. This is faster network wise, protocol wise, and simpler for a server to implement.

                                  What’s next then …

                                  Instead of fixing what we have, why not take the best of what we have, and offer something new in parallel? Start a new front end that speaks in an accessible way, that has modern structures, and has learnt from the lessons of the past? We can build it to standalone, or proxy from the robust core of 389 Directory Server allowing migration paths, but eschew the pain of trying to bring people to the modern world. We can offer something unique, an open source identity system that’s easy to use, fast, secure, that you can run on your terms, or in the cloud.

                                  This parallel project seems like a good idea … I wonder what to name it …

                                  Episode 127 - Walled gardens, appstores, and more

                                  Posted by Open Source Security Podcast on December 17, 2018 12:56 AM
                                  Josh and Kurt talk about Mozilla pulling a paywall bypassing extension. We then turn our attention to talking about walled gardens. Are they good, are they bad? Something in the middle? There is a lot of prior art to draw on here, everything from Windows, Android, iOS, even Linux distributions.


                                  <iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/7939541/height/90/theme/custom/thumbnail/yes/preload/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

                                  Show Notes