Fedora Security Planet

Container Labeling

Posted by Dan Walsh on November 12, 2018 02:01 PM

An issue was recently raised on libpod, the github repo for Podman.

"container_t isn't allowed to access container_var_lib_t"

Container policy is defined in the container-selinux package. By default containers run with the SELinux type "container_t", whether the container is launched by just about any container engine: podman, cri-o, docker, buildah, moby. Most people who use SELinux with container runtimes like runc and systemd-nspawn use this type as well.

By default container_t is allowed to read/execute content under /usr, and to read generically labeled content in the host's /etc directory (etc_t).

The default label for content in /var/lib/docker and /var/lib/containers is container_var_lib_t. This is not accessible by containers (container_t), whether they are running under podman, cri-o, docker, buildah, and so on. We specifically do not want containers to be able to read this content, because content stored on block-device back ends like devicemapper and (I believe) btrfs is labeled container_var_lib_t when the containers are not running.

For overlay content we need to allow containers to read/execute it, so we use the type container_share_t for that content. container_t is allowed to read/execute container_share_t files, but not write or modify them.

Content under /var/lib/containers/overlay* and /var/lib/docker/overlay* is labeled container_share_t by default.

$ grep overlay /etc/selinux/targeted/contexts/files/file_contexts
/var/lib/docker/overlay(/.*)? system_u:object_r:container_share_t:s0
/var/lib/docker/overlay2(/.*)? system_u:object_r:container_share_t:s0
/var/lib/containers/overlay(/.*)? system_u:object_r:container_share_t:s0
/var/lib/containers/overlay2(/.*)? system_u:object_r:container_share_t:s0
/var/lib/docker-latest/overlay(/.*)? system_u:object_r:container_share_t:s0
/var/lib/docker-latest/overlay2(/.*)? system_u:object_r:container_share_t:s0
/var/lib/containers/storage/overlay(/.*)? system_u:object_r:container_share_t:s0
/var/lib/containers/storage/overlay2(/.*)? system_u:object_r:container_share_t:s0

The label container_file_t is the only type that is writable by containers. container_file_t is used when the overlay mount is created for the upper directory of an image. It is also used for content mounted from devicemapper and btrfs.

If you volume mount a directory into a container and add a :z or :Z, the container engine relabels the content under the volume to container_file_t.

Failure to read/write/execute content labeled container_var_lib_t is expected.  

When I see this type of AVC, I expect that this is either a volume mounted in from /var/lib/containers or /var/lib/docker, or mislabeled content under an overlay directory like /var/lib/containers/storage/overlay.
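
A quick way to check is ls -Z on the directories involved; the second path below is only a placeholder for whatever directory you volume mounted:

ls -dZ /var/lib/containers/storage/overlay
ls -dZ /path/to/the/volume/source

If the output shows container_var_lib_t instead of container_share_t or container_file_t, the label is the problem.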

Solution:

To solve these, I usually recommend running 

restorecon -R -v /var/lib/containers
restorecon -R -v /var/lib/docker

Or, if it is a volume mount, to use :z or :Z.
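
For example, this is roughly what a private, automatically relabeled volume mount looks like; the image name and host path are only placeholders:

podman run -it --rm -v /home/user/project:/data:Z fedora ls -lZ /data

The :Z tells podman to relabel /home/user/project with the container's private container_file_t label before the container starts.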


Episode 122 - What will Apple's T2 chip mean for the rest of us?

Posted by Open Source Security Podcast on November 12, 2018 04:01 AM
Josh and Kurt talk about Apple's new T2 security chip. It's not open source but we expect it to change the security landscape in the coming years.



Show Notes

Episode 121 - All about the security of voting

Posted by Open Source Security Podcast on November 05, 2018 01:01 AM
Josh and Kurt talk about voting security. What does it mean, how does it work. What works, what doesn't work, and most importantly why we may not see secure electronic voting anytime soon.



Show Notes

High Available RADVD on Linux

Posted by William Brown on October 31, 2018 02:00 PM

High Available RADVD on Linux

Recently I was experimenting again with high availability router configurations, so that in the case of an outage or a failover the other router will take over and traffic is still served.

This is usually done through protocols like VRRP, which allow virtual IPs to exist that can be failed over between routers. However, with ipv6 one still needs to allow clients to find the router, and in the case of a failure, the router advertisements must continue so client renewals keep working.

To achieve this we need two parts. A shared Link Local address, and a special RADVD configuration.

Because of how ipv6 routers work, all traffic (even global) is still sent to your link local router. We can use an address like:

fe80::1:1

This doesn’t clash with any reserved or special ipv6 addresses, and it’s easy to remember. Because of how link local works, we can put this on many interfaces of the router (many vlans) with no conflict.

So now to the two components.

Keepalived

Keepalived is a VRRP implementation for linux. It has extensive documentation and sometimes uses some implementation specific language, but it works well for what it does.

Our configuration looks like:

#  /etc/keepalived/keepalived.conf
global_defs {
  vrrp_version 3
}

vrrp_sync_group G1 {
 group {
   ipv6_ens256
 }
}

vrrp_instance ipv6_ens256 {
   interface ens256
   virtual_router_id 62
   priority 50
   advert_int 1.0
   virtual_ipaddress {
    fe80::1:1
    2001:db8::1
   }
   nopreempt
   garp_master_delay 1
}

Note that we provide both a global address and an LL address for the failover. The global address is useful so that services and DNS entries for the router have a stable address, but you could omit it. The LL address, however, is critical to this configuration and must be present.

Now you can start up vrrp, and you should see one of your two linux machines pick up the address.
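
One way to check which node currently owns the address, assuming the packaged systemd unit and the interface from the example config, is:

systemctl start keepalived
ip -6 addr show dev ens256 | grep 'fe80::1:1'

Only the current VRRP master should print the address.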

RADVD

For RADVD to work, a feature of the 2.x series is required. Packaging this for el7 is out of scope of this post, but fedora ships the version required.

The feature is that RADVD can be configured to specify which address it advertises for the router, rather than assuming the interface LL autoconf address is the address to advertise. The configuration appears as:

# /etc/radvd.conf
interface ens256
{
    AdvSendAdvert on;
    MinRtrAdvInterval 30;
    MaxRtrAdvInterval 100;
    AdvRASrcAddress {
        fe80::1:1;
    };
    prefix 2001:db8::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr off;
    };
};

Note the AdvRASrcAddress parameter? This defines a priority list of addresses to advertise that could be available on the interface.

Now start up radvd on your two routers, and try failing over between them while you ping from your client. Remember, to ping an LL address from a client you need something like:

ping6 fe80::1:1%en1

Where the outgoing interface of your client traffic is denoted after the ‘%’.
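
If you want to watch the advertisements themselves while failing over, radvdump (shipped with the radvd package) can be run on a client; this is just a handy check, not something the setup requires:

# prints each router advertisement as it is received
radvdump

The source address of the RAs should stay fe80::1:1 regardless of which router is currently active.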

Happy failover routing!

Episode 120 - Bloomberg and hardware backdoors - it's already happening

Posted by Open Source Security Podcast on October 29, 2018 12:01 AM
Josh and Kurt talk about Bloomberg's story about backdoors and motherboards. The story is probably false, but this is almost certainly happening already with hardware. What does it mean if your hardware is already backdoored by one or more countries?


Show Notes

Targeted vs General purpose security

Posted by Josh Bressers on October 23, 2018 01:13 PM

There seem to be a lot of questions going around lately about how best to give out simple security advice that is actionable. Goodness knows I’ve talked about this more than I can even remember at this point. The security industry is really bad at giving out actionable advice. It’s common that someone will ask what good advice looks like. They’ll get a few morsels, then someone will point out whatever corner case makes that advice bad, and the conversation will spiral into nonsense where we find ourselves trying to defend someone mostly concerned about cat pictures from being kidnapped by a foreign nation. Eventually, whoever asked for help quit listening a long time ago and decided to just keep their passwords written on a sticky note under the keyboard.

I’m pretty sure the fundamental flaw in all this thinking is we never differentiate between a targeted attack and general purpose security. They are not the same thing. They’re incredibly different in fact. General purpose advice can be reasonable, simple, and good. If you are a target you’ve already lost, most advice won’t help you.

General purpose security is just basic hygiene. These are the really easy concepts. Ideas like using a password manager, multi-factor auth, and installing updates on your system. These are the activities anyone and everyone should be doing. One could argue these should be the default settings for any given computer or service (that’s a post for another day though). You don’t need to be a security genius to take these steps. You just have to restrain yourself from acting like a crazy person so whoever asked for help can actually get the advice they need.

Now if you’re the target of a security operation, things are really different. Targeted security is when you’re an active target, someone has picked you out for some reason and has a specific end goal in mind. This is the sort of attack where people will send you very specific phishing mails. They will probably try to brute force your password to a given account. They might call friends and family. Maybe even looking through your trash for clues they can use. If you are a target the goal isn’t to stop the attacker, it’s just to slow them down enough so you know you’re under attack. Once you know you’re under attack you can find a responsible adult to help.

These two things are very different. If you try to lump them together you end up with no solution, and at best a confused audience. In reality you probably end up with no audience because you sound like a crazy person.

Here is an example. Let’s say someone asks for some advice for people connecting to public wifi. Then you get a response about how your pen test used public wifi against an employee to steal their login credentials. That’s not a sane comparison. If you have a specific target in mind you can play off their behaviors and typical activities. You knew which sites they visited, you knew which coffee house they liked. You knew which web browser and operating system they had. You had a level of knowledge that put the defender in a position they couldn’t defend against. General security doesn’t work like that.

The goal of general purpose advice is to be, well, general. This is like telling people to wash their hands. You don’t get into specifics about if they’ve been in contact with flesh eating bacteria and how they should be keeping some incredibly strong antiseptic on hand at all times just in case. Actual advice is to get some soap, pretty much any soap is fine, and wash your hands. That’s it. If you find yourself in the company of flesh eating bacteria in the future, go find someone who specializes in such a field. They’ll know what to actually do. Keeping special soap under your sink isn’t going to be one of the things they suggest.

There’s nothing wrong with telling people the coffee house wifi is probably OK for many things. Don’t do banking from it, make sure you have an updated browser and operating system. Stay away from dodgy websites. If things start to feel weird, stop using the wifi. The goal isn’t to eliminate all security threats, it’s just to make things a little bit better. Progress is made one step at a time, not in one massive leap. Massive leaps are how you trip and fall.

And if you are a specific target, you can only lose. You aren’t going to stop that attacker. Targeted attacks, given enough time, never fail.

Episode 119 - The Google+ and Facebook incidents, it's not your data anymore

Posted by Open Source Security Podcast on October 22, 2018 12:22 AM
Josh and Kurt talk about the Google+ and Facebook data incidents. We don't have any control over this data anymore. The incidents didn't really affect the users because we have no idea who has access to it. We also touch on GDPR and what it could mean in this context.


Show Notes

Rust RwLock and Mutex Performance Oddities

Posted by William Brown on October 18, 2018 02:00 PM

Rust RwLock and Mutex Performance Oddities

Recently I have been working on Rust data structures once again. In the process I wanted to test how my work performed compared to a standard library RwLock and Mutex. On my home laptop the RwLock was 5 times faster, and the Mutex 2 times faster, than my work.

So, checking out my code on my workplace workstation and running my benchmarks, I noticed the Mutex was the same - 2 times faster. However, the RwLock was 4000 times slower.

What’s a RwLock and Mutex anyway?

In a multithreaded application, it’s important that data that needs to be shared between threads is consistent when accessed. This consistency is not just logical consistency of the data, but affects hardware consistency of the memory in cache. As a simple example, let’s examine an update to a bank account done by two threads:

acc = 10
deposit = 3
withdrawl = 5

[ Thread A ]            [ Thread B ]
acc = load_balance()    acc = load_balance()
acc = acc + deposit     acc = acc - withdrawl
store_balance(acc)      store_balance(acc)

What will the account balance be at the end? The answer is “it depends”. Because threads are working in parallel these operations could happen:

  • At the same time
  • Interleaved (various possibilities)
  • Sequentially

This isn’t very healthy for our bank account. We could lose our deposit, or have invalid data. Possible outcomes at the end are that acc could be 13, 5, or 8. Only one of these is correct.

A mutex protects our data in multiple ways. It provides hardware consistency operations so that our CPU’s cache state is valid. It also allows only a single thread inside of the mutex at a time so we can linearise operations. Mutex is short for “Mutual Exclusion” after all.

So our example with a mutex now becomes:

acc = 10
deposit = 3
withdrawl = 5

[ Thread A ]            [ Thread B ]
mutex.lock()            mutex.lock()
acc = load_balance()    acc = load_balance()
acc = acc + deposit     acc = acc - withdrawl
store_balance(acc)      store_balance(acc)
mutex.unlock()          mutex.unlock()

Now only one thread will access our account at a time: The other thread will block until the mutex is released.

A RwLock is a special extension to this pattern. Where a mutex guarantees single access to the data for both reads and writes, a RwLock (Read Write Lock) allows either multiple read-only views OR a single reader/writer. Importantly, when a writer wants to access the lock, all readers must complete their work and “drain”. Once the write is complete, readers can begin again. So you can imagine it as:

Time ->

T1: -- read --> x
T2:     -- read --> x                x -- read -->
T3:     -- read --> x                x -- read -->
T4:                   | -- write -- |
T5:                                  x -- read -->

Test Case for the RwLock

My test case is simple. Given a set of 12 threads, we spawn:

  • 8 readers. Take a read lock, read the value, release the read lock. If the value == target then stop the thread.
  • 4 writers. Take a write lock, read the value. Add one and write. Continue until value == target then stop.

Other conditions:

  • The test code is identical between Mutex/RwLock (besides the locking construct)
  • --release is used for compiler optimisations
  • The test hardware is as close as possible (i7 quad core)
  • The tests are run multiple times to construct averages of the performance

The idea is that a target number (X) of writes must occur, while many readers contend as fast as possible on the read. We are pressuring the system to choose between “many readers getting to read fast” or “writers getting priority to drain/block readers”.

On OSX given a target of 500 writes, this was able to complete in 0.01 second for the RwLock. (MBP 2011, 2.8GHz)

On Linux given a target of 500 writes, this completed in 42 seconds. This is a 4000 times difference. (i7-7700 CPU @ 3.60GHz)

All things considered the Linux machine should have an advantage - it’s a desktop processor, of a newer generation, and much faster clock speed. So why is the RwLock performance so much different on Linux?

To the source code!

Examining the Rust source code, many OS primitives come from libc. This is because they require OS support to function. RwLock is an example of this, as are Mutex and many more. The unix implementation for Rust consumes the pthread_rwlock primitive. This means we need to read man pages to understand the details of each.

OSX uses FreeBSD userland components, so we can assume they follow the BSD man pages. In the FreeBSD man page for pthread_rwlock_rdlock we see:

IMPLEMENTATION NOTES

 To prevent writer starvation, writers are favored over readers.

Linux however, uses different constructs. Looking at the Linux man page:

PTHREAD_RWLOCK_PREFER_READER_NP
  This is the default.  A thread may hold multiple read locks;
  that is, read locks are recursive.  According to The Single
  Unix Specification, the behavior is unspecified when a reader
  tries to place a lock, and there is no write lock but writers
  are waiting.  Giving preference to the reader, as is set by
  PTHREAD_RWLOCK_PREFER_READER_NP, implies that the reader will
  receive the requested lock, even if a writer is waiting.  As
  long as there are readers, the writer will be starved.

Reader vs Writer Preferences?

Due to the policy of a RwLock having multiple readers OR a single writer, a preference is given to one or the other. The preference basically boils down to the choice of:

  • Do you respond to write requests and have new readers block?
  • Do you favour readers but let writers block until reads are complete?

The difference is that on a read heavy workload, a write will continue to be delayed so that readers can begin and complete (up until some threshold of time). However, on a writer focused workload, you allow readers to stall so that writes can complete sooner.

On Linux, they choose a reader preference. On OSX/BSD they choose a writer preference.

Because our test is about how fast can a target of write operations complete, the writer preference of BSD/OSX causes this test to be much faster. Our readers still “read” but are giving way to writers, which completes our test sooner.

However, the Linux “reader favour” policy means that our readers (designed for creating contention) are allowed to skip the queue and block writers. This causes our writers to starve. Because the test is only concerned with writer completion, the result is (correctly) showing our writers are heavily delayed - even though many more readers are completing.

If we were to track the number of reads that completed, I am sure we would see a large difference, with Linux having allowed many more readers to complete than the OSX version.

Linux pthread_rwlock does allow you to change this policy (PTHREAD_RWLOCK_PREFER_WRITER_NP) but this isn’t exposed via Rust. This means today, you accept (and trust) the OS default. Rust is just unaware at compile time and run time that such a different policy exists.

Conclusion

Rust like any language consumes operating system primitives. Every OS implements these differently and these differences in OS policy can cause real performance differences in applications between development and production.

It’s well worth understanding the constructions used in programming languages and how they affect the performance of your application - and the decisions behind those tradeoffs.

This isn’t meant to say “don’t use RwLock in Rust on Linux”. This is meant to say “choose it when it makes sense - on read heavy loads, understanding writers will delay”. For my project (A copy on write cell) I will likely conditionally compile rwlock on osx, but mutex on linux as I require a writer favoured behaviour. There are certainly applications that will benefit from the reader priority in linux (especially if there is low writer volume and low penalty to delayed writes).

Creating a Self Trust In Keystone

Posted by Adam Young on October 18, 2018 02:44 AM

Let’s say you are an administrator of an OpenStack cloud. This means you are pretty much all powerful in the deployment. Now you need to perform some operation, but you don’t want to do it with full admin privileges. Why? Well, do you work as root on your Linux box? I hope not. Here’s how to set up a self trust for a reduced set of roles on your token.

First, get a regular token, but use --debug to see what the project ID, role ID, and your user ID actually are:
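
The exact command isn’t shown here; with the standard python-openstackclient, something along these lines will dump the full authentication response, including those IDs, to the terminal:

openstack token issue --debug

(openstack project list, role list, and user list also work if you prefer not to dig through debug output.)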

In my case, they are … long uuids.

I’ll trim them down, both for obscurity as well as to make them more legible. Here is the command to create the trust.

openstack trust create --project 9417f7 --role 9fe2ff 154741 154741

Mine returned:

+--------------------+----------------------------------+
| Field              | Value                            |
+--------------------+----------------------------------+
| deleted_at         | None                             |
| expires_at         | None                             |
| id                 | 26f8d2                           |
| impersonation      | False                            |
| project_id         | 9417f7                           |
| redelegation_count | 0                                |
| remaining_uses     | None                             |
| roles              | _member_                         |
| trustee_user_id    | 154741                           |
| trustor_user_id    | 154741                           |
+--------------------+----------------------------------+

On my system, role_id 9fe2ff is the _member_ role.

Note that, if you are Admin, you need to explicitly grant yourself the _member_ role, or use an implied role rule that says admin implies member.

Now, you can get a reduced scope token. Unset the variables that are used to scope the token, since you want to scope to the trust now.

$ unset OS_PROJECT_DOMAIN_NAME 
$ unset OS_PROJECT_NAME 
$ openstack token issue --os-trust-id  26f8d2eaf1404489ab8e8e5822a0195d
+------------+----------------------------------+
| Field      | Value                            |
+------------+----------------------------------+
| expires    | 2018-10-18T10:31:57+0000         |
| id         | f16189                           |
| project_id | 9417f7                           |
| user_id    | 154741                           |
+------------+----------------------------------+

This still requires you to authenticate with your userid and password. An even better mechanism is the new Application Credentials API. It works much the same way, but you use an explicitly new password. More about that next time.

Episode 118 - Cloudflare's IPFS and onion service

Posted by Open Source Security Podcast on October 15, 2018 01:39 AM
Josh and Kurt talk about Cloudflare's new IPFS and Onion services. One brings distributed blockchain files to the masses, the other lets you host your site on tor easily.


Show Notes

Episode 117 - Will security follow Linus' lead on being nice?

Posted by Open Source Security Podcast on October 08, 2018 12:01 AM
Josh and Kurt talk about Linus' effort to work on his attitude. What will this mean for security and IT in general?


Show Notes

Imagining Go With Alternate Boards

Posted by Adam Young on October 04, 2018 03:47 PM

The Game of Go is still pretty much the ultimate strategy game.  No other game distills strategy to its essence, in such simplicity, and thus lets the complexity emerge. 

The board is simplicity itself: a 19 x 19 grid:

Standard Go board

One aspect of Go is that you start from the corners, build a semi-secure formation, and then grow out from there.

For example, here is a recent game of mine in the early stages:

Both my opponent and I have staked out positions in the corners.

What if the board was a little different?

Chinese Checkers has 6 Points where people start:

The 6 corners allow 6 players a secure base to start from.

What if we adapt this idea into a Go board?

More Corners

Sometimes, less is more.

Fewer corners. Each player picks one to start.

One interesting aspect of the Stratego board is that it has terrain features in the middle:

Courtesy of Mark Alldrige. Stratego is copyright Milton Bradley, 1980

What if we cut out some spaces in the middle of a Go board, like this:

Terrain features in the middle of the battlefield

Or combine some of these ideas:

Corners and Center are now out of play.

How would each of these variations modify the game? Many of the tactical patterns would stay the same, but would have a different role in the overall strategy.

SELinux blocks podman container from talking to libvirt

Posted by Dan Walsh on October 02, 2018 10:27 AM

I received this bug report this week.

"I see this when I try to use vagrant from a container using podman on Fedora 29 Beta.

Podman version: 0.8.4

Command to run container:

sudo podman run -it --rm -v /run/libvirt:/run/libvirt:Z -v $(pwd):/root:Z localhost/vagrant vagrant up

Logs:

...

Sep 30 21:17:25 Home audit[22760]: AVC avc:  denied  { connectto } for  pid=22760 comm="batch_action.r*" path="/run/libvirt/libvirt-sock" scontext=system_u:system_r:container_t:s0:c57,c527 tcontext=system_u:system_r:virtd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=0

"

This is an interesting use case of SELinux and containers. SELinux is protecting the file system and the host from attack from inside of the container. People who have listened to me over the years understand that SELinux is all about protecting the labels of files; in the case of containers, it only allows a process running as container_t to read/write/execute files labeled container_file_t.

But the reporter of the bug thinks he did the right thing: he told podman to relabel the volumes he was mounting into the container.

Let's look at his command to launch the container.

sudo podman run -it --rm -v /run/libvirt:/run/libvirt:Z -v $(pwd):/root:Z localhost/vagrant vagrant up

He told podman to go out and relabel /run/libvirt and $(pwd) with a private label generated for the container; that is what the ":Z" means: system_u:object_r:container_file_t:MCS. Sadly this is not the right thing to do and will probably cause him issues going forward. Since /run/libvirt is probably used by other processes outside of the container, he might have broken them. libvirt running as virtd_t is probably not allowed to write to container_file_t. The $(pwd) directory is probably fine, since it is not likely to be shared with other confined daemons.

Ignoring that this was the wrong thing to do, SELinux still blocked the container. Why?

SELinux does not only block access to files on disk. While SELinux would allow container_t to write to a unix domain socket, "/run/libvirt/libvirt-sock", labeled container_file_t, a second SELinux check happens between the processes: SELinux also checks whether the container can talk to the daemon, libvirt, running as virtd_t.

Since there is no allow rule for container_t to connectto virtd_t, the connection fails.
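
You can confirm that for yourself with sesearch from the setools package (my suggestion for checking, not part of the original bug report):

sesearch -A -s container_t -t virtd_t -c unix_stream_socket -p connectto

No output means the loaded policy contains no such allow rule.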

Currently in situations like this I tell people to just disable SELinux separation inside this container, rather than fooling around with the labels.

sudo podman run -it --security-opt label=disable  --rm -v /run/libvirt:/run/libvirt -v $(pwd):/root localhost/vagrant vagrant up

Notice I removed the :Z. This will cause podman to run the container as spc_t, which is an unconfined domain, and all confined domains are allowed to communicate with it.

Since this is not a full disablement of SELinux, it does not make me cry.  :^)

In the future, Lukas Vrabek is working on a better solution, udica, where you could simply create a new type based on container_t, and then run your container with it.

udica should allow you to generate a container_vagrant_t type which would be allowed to write to content with /run/libvirt's labels and communicate with virtd_t, while still having all other SELinux confinement. Then you could execute something like this:

sudo podman run -it --security-opt label=type:container_vagrant_t --rm -v /run/libvirt:/run/libvirt -v $(pwd):/root localhost/vagrant vagrant up


Millions of unfixed security flaws is a lie

Posted by Josh Bressers on October 01, 2018 01:26 PM

On a pretty regular basis I see claims that the public CVE dataset is missing some large number of security issues. I’ve seen ranges from tens of thousands all the way up to millions. The purpose behind such statements is to show that the CVE data is woefully incomplete. Of course almost everyone making that claim has a van filled with security issues and candy they’re trying very hard to lure us into. It’s a pretty typical sales tactic as old as time itself. Whatever you have today isn’t good enough, but what I have, holy cow it’s better. It’s so much better you better come right over and see for yourself. After you pay me of course.

If you take away any single thing from this post, make it this: There are not millions of unfixed security flaws missing from the CVE data.

If you’re not familiar with how CVE works, I’ll give you a very short crash course. Essentially someone (anyone) requests a CVE ID, and if it’s a real security issue, a CVE gets assigned. It really is fundamentally this simple. Using some sort of advanced logic, the obvious question becomes: “why not get a CVE ID for all these untracked security flaws?”

That’s a great question! There are two possible reasons for this. The first is the organizations in question don’t want to share what they know. The second is all the things they claim are security issues really aren’t security issues at all. The second answer is of course correct, but let’s understand why.

The first answer assumes their security flaws are some sort of secret information only they know. This would also suggest the security issues in question are not acknowledged by the projects or vendors. If a project has any sort of security maturity, they are issuing CVE IDs (note: if you are a project who cares about security and doesn’t issue CVE IDs, talk to me, I will help you). This means that if the project knows about a security issue they will release a CVE ID for it. If they don’t know about the issue, it not only doesn’t have a CVE ID but is also unfixed. Not telling projects and vendors about security issues would be pretty weaselly. It also wouldn’t make anyone any safer. In fact it would make us all a lot less safe.

This brings us to the next stop in our complex logical journey. If you are a company that has the ability to scan and track security issues, and you find an unknown security issue in a project, you will want to make some noise about finding it. That means you follow some sort of security process that includes getting a CVE ID for the issue in question. After all, you want to make sure your security problem is known to the public, and what better way than the largest public security dataset?

This brings us to the logical conclusion about all these untracked security issues: they’re not really security problems. Some are just bugs. Some are nothing. Some are probably design decisions. Fundamentally, if there is a security issue that matters, it will get a CVE ID. We should all be working together to make CVE better, not trying to claim our secret data is better than everyone else’s. There are no winners and losers when it comes to security issues. We all win or we all lose.

As most of these sort of fantastical claims tend to end, if it sounds too good to be true, it probably is.

Episode 116 - The future of the CISO with Michael Piacente

Posted by Open Source Security Podcast on October 01, 2018 12:01 AM
Josh and Kurt talk to Michael Piacente from Hitch Partners about the past, present, and future role of the CISO in the industry.


Show Notes

Episode 115 - Discussion with Brian Hajost from SteelCloud

Posted by Open Source Security Podcast on September 24, 2018 12:02 AM
Josh and Kurt talk to Brian Hajost from SteelCloud about public sector compliance. The world of public sector compliance can be confusing and strange, but it's not that bad when it's explained by someone with experience.


Show Notes

Episode 114 - Review of "Click Here to Kill Everybody"

Posted by Open Source Security Podcast on September 17, 2018 12:13 AM
Josh and Kurt review Bruce Schneier's new book Click Here to Kill Everybody. It's a book everyone could benefit from reading. It does a nice job explaining many existing security problems in a simple manner.


Show Notes

Episode 113 - Actual real security advice

Posted by Open Source Security Podcast on September 10, 2018 12:07 AM
Josh and Kurt talk about actual real world advice. Based on a story about trying to secure political campaigns, if we had to give some security help what should it look like, who should we give it to?


Show Notes

  • Security advice to Democrats
  • Our actual advice
    • Don’t run your own services 
    • Email - Google or Microsoft 
    • Don’t use GPG 
    • Use a trusted device 
    • Use a password manager on a secure device 
    • Use 2FA 
    • Backups 

Converting a RHEL Workstation to a Server

Posted by Adam Young on September 08, 2018 04:07 AM

My laptop is my demo machine. I need to be able to run the Red Hat Cloud Suite of software on it. I want to install this software the same way a customer would. However, much of this software is server-side software, and my machine was registered as a workstation. This means the Red Hat content network won’t show me the server yum repositories. Here is how I converted my machine to be a server.

The key is to change the installed RPM from redhat-release-workstation to redhat-release-server. These two RPMs control the set of files that tell the yum system what product is installed and, from that, the set of available yum repositories. However, since they conflict, you cannot just install redhat-release-server. That leads to the following errors:

Transaction check error:
  file /etc/os-release from install of redhat-release-server-7.5-8.el7.x86_64 conflicts with file from package redhat-release-workstation-7.5-8.el7.x86_64
  file /etc/redhat-release from install of redhat-release-server-7.5-8.el7.x86_64 conflicts with file from package redhat-release-workstation-7.5-8.el7.x86_64
  file /etc/system-release-cpe from install of redhat-release-server-7.5-8.el7.x86_64 conflicts with file from package redhat-release-workstation-7.5-8.el7.x86_64
  file /usr/lib/systemd/system-preset/90-default.preset from install of redhat-release-server-7.5-8.el7.x86_64 conflicts with file from package redhat-release-workstation-7.5-8.el7.x86_64

Here are the steps I worked out to work around this.

First, download the redhat-release-server RPM on a server-registered machine. Use the yum command, to make sure keys are present and the repo lets you in.

sudo yum reinstall redhat-release-server  --downloadonly

This will download a copy, that you can find with:

 

find /var/cache/yum/x86_64/7Server/rhel-7-server-rpms/ -name redhat-release-server-7.5-8.el7.x86_64.rpm

And then copy it over from that machine. In my case:

scp -i ~/keys/id_rsa cloud-user@128.31.26.132:/var/cache/yum/x86_64/7Server/rhel-7-server-rpms/packages/redhat-release-server-7.5-8.el7.x86_64.rpm /home/ayoung/Downloads/

To install it, use the yum shell to perform multiple yum commands in a single transaction:

$ sudo yum shell
Loaded plugins: changelog, fs-snapshot, priorities, product-id, refresh-packagekit, rpm-warm-cache, search-disabled-repos, subscription-manager, verify
> erase redhat-release-workstation-7.5-8.el7.x86_64
> install /home/ayoung/Downloads/redhat-release-server-7.5-8.el7.x86_64.rpm
Examining /home/ayoung/Downloads/redhat-release-server-7.5-8.el7.x86_64.rpm: redhat-release-server-7.5-8.el7.x86_64
Marking /home/ayoung/Downloads/redhat-release-server-7.5-8.el7.x86_64.rpm to be installed
> run

Assuming that runs to completion, use the exit command to return to the bash prompt. Update the set of repos with:

sudo  mv /etc/yum.repos.d/redhat.repo /etc/yum.repos.d/redhat.repo.old
sudo subscription-manager refresh

And then list them; you should see that most of the repos that had “workstation” in them before now have “server” in their names.

$ sudo subscription-manager repos --list-enabled
+----------------------------------------------------------+
Available Repositories in /etc/yum.repos.d/redhat.repo
+----------------------------------------------------------+
Repo ID: rhel-7-server-htb-rpms
Repo Name: Red Hat Enterprise Linux 7 Server HTB (RPMs)
Repo URL: https://cdn.redhat.com/content/htb/rhel/server/7/$basearch/os
Enabled: 1

Repo ID: rhel-7-workstation-rpms
Repo Name: Red Hat Enterprise Linux 7 Workstation (RPMs)
Repo URL: https://cdn.redhat.com/content/dist/rhel/workstation/7/$releasever/$basearch/os
Enabled: 1

Repo ID: rhel-7-server-rpms
Repo Name: Red Hat Enterprise Linux 7 Server (RPMs)
Repo URL: https://cdn.redhat.com/content/dist/rhel/server/7/$releasever/$basearch/os
Enabled: 1

I only want the server RPMs for now:

$ sudo subscription-manager repos --disable rhel-7-server-htb-rpms
Repository 'rhel-7-server-htb-rpms' is disabled for this system.
$ sudo subscription-manager repos --disable rhel-7-workstation-rpms
Repository 'rhel-7-workstation-rpms' is disabled for this system.

And…

$ sudo yum update
Loaded plugins: changelog, fs-snapshot, priorities, product-id, refresh-packagekit, rpm-warm-cache, search-disabled-repos, subscription-manager, verify
No packages marked for update

I wonder what this is going to break.

I cannot, yet, say whether this is a sane thing to do or not.  I’ll let you know.

 

 

SELinux prevent users from executing programs, for security? Who cares.

Posted by Dan Walsh on September 04, 2018 12:45 PM

I recently received the following email about using SELinux to prevent users from executing programs.
 

I just started to learn SELinux and this is nice utility if you want confine any user who interact with your system.

A lot of information on Net about how to confine programs, but can't find about confining man's :)

I found rbash (https://access.redhat.com/solutions/65822) which help me forbid execution any software inside and outside user home directory except few.

As I understand correctly to do this using SELinux I need a new user domain (customuser) which by default should deny all or I can start with predefined guest_t?

Next then for example I can enable netutils_exec_ping(customuser_t, customuser_r).

I responded that:

SELinux does not worry so much about executing individual programs, although it can do this.  SELinux is basically about  defining the access of a process type.  
Just because a program can execute another program does not mean  that this process type is going to be allowed the access that the program requires.  For example.  

A user running as guest_t can execute su and sudo, and even if the user might discover the correct password to become root, they can not become root on the system; SELinux would block it. Similarly, guest_t is not allowed to connect out of the system, so being able to execute ssh or ping does not mean that the user would be able to ping another host or ssh to another system.
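
As an illustration (my addition, not part of the original reply), you can ask the loaded policy whether guest_t is allowed to make outgoing TCP connections at all, using sesearch from setools:

sesearch -A -s guest_t -c tcp_socket -p name_connect

If nothing comes back, the policy simply grants guest_t no such access, no matter what binaries the user can run.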

This is far more powerful than just blocking access to certain programs, since the user could theoretically download those programs to his homedir and use them there.

There are lots of Turing complete tools that the user will get access to, that would allow them to write code to do pretty much what every application installed on the system can do.  

Bottom line:

Blocking access to system objects and Linux capabilities is far more powerful than blocking a user process from executing a program on disk.

Episode 112 - Google's Titan Key and the latest Struts issue

Posted by Open Source Security Podcast on September 03, 2018 12:02 AM
Josh and Kurt talk about the new Google Titan security key. There are some in the industry uneasy about the supply chain for the devices. We also discuss the latest Struts security issue. Struts is old and scary now, stop using it.


Show Notes


Security reviews and microservices

Posted by Josh Bressers on August 28, 2018 12:57 PM

We love to do security reviews on the projects, products, and services our companies use. Security reviews are one of those ways we can show how important security is. If those reviews didn’t get done we might end up using a service that could put our users and data at risk. Every good horror story involving dinosaurs starts with bad security reviews! It’s a lesson too few of us really take to heart.

The reality is someone picks a service, we review it, and it will probably still put our data at risk, but it went through a very rigorous review so we can show how much … review it got? I’m not really sure what that means, but we know that security reviews are really important.

These reviews are quite complex and fairly important in all seriousness. Doing any sort of review of an application takes a certain amount of knowledge and understanding. There’s a lot of value in making sure you’re not shipping or using something that is a tire fire of security problems. All these rules are going to change. The world of microservices is going to make us rethink how everything works.

One security review can take a day or more depending on what you’re looking for. If something is large enough it wouldn’t be unreasonable for someone to spend a week going over all the details you need to understand before trusting something with your most important data.

But what happens when we have to review a dozen microservices?

What happens when we have to review a thousand microservices?

We can’t review a thousand microservices. We probably can’t review a dozen in all seriousness. It’s possible some things can be grouped together in some sane and reasonable manner but we all know that’s not going to be the norm, it’s going to be the exception.

What do we do now? There are two basic paths we can take. The first is we spend some quality time crying under our desk. It’s not terribly useful but will make you feel better. The second option is to automate the heck out of this stuff.

Humans don’t scale, not even linearly. In fact adding more humans probably results in worse performance. If you need to review a thousand services you will need an incredible number of people, and anytime people are involved there are going to be a lot of mistakes made. There is no secret option three where we just staff up to get this done. Staffing up probably just means you now have two problems instead of one.

Automation is the only plausible solution.

I did some digging into this, there isn’t a ton of information on automating this sort of process. I find that interesting as it’s a pretty big problem for most everyone, yet we don’t have a nice way to simplify this process.

I did manage to find a project Google seems to use called VSAQ

It’s not exactly automation as a human from the vendor has to fill out a form. Once the details are entered you can do things with the data and results. It puts all the work on the vendor to get things right. I don’t think vendors try to purposely mislead, but mistakes happen. And if you’re using open source there is no vendor to fill out the form.

Unfortunately this blog post is going to end without any sort of actionable advice. I had hoped to spend time reviewing options in this space but I found nothing. So the call to action is two things.

Firstly, if there is something I’m missing, please let me know. Nothing would please me more than a giant “updated” section showing off some tools.

Second, if this is a problem you have let’s collaborate a bit. This would make a great open source project (I have some ideas I’m already working on, more about those in a future post). The best way is to hit me up on twitter @joshbressers

Episode 111 - The TLS 1.3 and DNS episode

Posted by Open Source Security Podcast on August 27, 2018 12:03 AM
Josh and Kurt talk about TLS 1.3 and DNS. What can we expect from the future for these, how are they related (or not related). We touch on DNSSEC and why it probably won't matter. DNS over TLS is looking pretty great though. There is also a guest appearance from quantum crypto.


Show Notes


Running software collections maven from a script

Posted by Adam Young on August 25, 2018 04:26 AM

If I want to run software collections code without enabling bash and running interactively, I have to pass the whole command on the command line like this:

scl enable rh-maven35 "mvn package"

I’ll need to use this form to run from Ansible.
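
A rough sketch of what that could look like as an Ansible ad-hoc call (the project path is a placeholder, and a playbook task would use the same command string):

ansible localhost -m shell -a 'cd /path/to/project && scl enable rh-maven35 "mvn package"'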

Networking Acronyms

Posted by Adam Young on August 25, 2018 02:04 AM

My new role has me paying attention to the Network side of cloud a lot more than I had to in the past. One thing I’ve noticed about Networking is that it has a lot of acronyms, and people that work in it tend to throw them out in context and move on. This is my collection of recent acronyms and their meanings.

I will continue to update this one as I come across additional relevant terms and acronyms.

Maven With Software Collections

Posted by Adam Young on August 25, 2018 02:04 AM

I’ve been interested in the intersection of Ansible and Java development.  To test this out, I want to build a “Hello World” maven App and use Ansible to drive the process to build, test, and deploy it.  I’m going to use the Software Collections way of installing and running Maven to build a simple Tomcat Web Application as the basis.

In a past article I enabled software collections; that is still valid and necessary.

Install the software collection

 sudo yum install rh-maven35

Enable the rh-maven35 software collection

scl enable rh-maven35 bash
[ayoung@ayoung ~]$ cd ~/devel/
[ayoung@ayoung devel]$ mkdir ghoul
[ayoung@ayoung devel]$ cd ghoul/

Then create a new WebApp with

 mvn archetype:generate -DgroupId=net.younglogic.apps -DartifactId=GhoulWebApp -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

 

(lots of output ending with)

[INFO] project created from Old (1.x) Archetype in dir: /home/ayoung/devel/ghoul/GhoulWebApp
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.751 s
[INFO] Finished at: 2018-08-24T21:18:44-04:00
[INFO] Final Memory: 15M/395M
[INFO] ------------------------------------------------------------------------
[ayoung@ayoung ghoul]$

Do some basic git setup.

cd GhoulWebApp/
[ayoung@ayoung GhoulWebApp]$ git init .
Initialized empty Git repository in /home/ayoung/devel/ghoul/GhoulWebApp/.git/
[ayoung@ayoung GhoulWebApp]$ ls
pom.xml src
[ayoung@ayoung GhoulWebApp]$ git add pom.xml src/
[ayoung@ayoung GhoulWebApp]$ git commit -m "initial project generation"
[master (root-commit) 783f3c1] initial project generation
3 files changed, 33 insertions(+)
create mode 100644 pom.xml
create mode 100644 src/main/webapp/WEB-INF/web.xml
create mode 100644 src/main/webapp/index.jsp

Compile with

mvn package

lots of output ending with

[INFO] Processing war project
[INFO] Copying webapp resources [/home/ayoung/devel/ghoul/GhoulWebApp/src/main/webapp]
[INFO] Webapp assembled in [17 msecs]
[INFO] Building war: /home/ayoung/devel/ghoul/GhoulWebApp/target/GhoulWebApp.war
[INFO] WEB-INF/web.xml already added, skipping
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.426 s
[INFO] Finished at: 2018-08-24T21:23:37-04:00
[INFO] Final Memory: 14M/376M
[INFO] ------------------------------------------------------------------------

Run with

 

mvn clean install tomcat:run

lots of output ending with

[INFO] Running war on http://localhost:8080/GhoulWebApp
[INFO] Creating Tomcat server configuration at /home/ayoung/devel/ghoul/GhoulWebApp/target/tomcat
Aug 24, 2018 9:37:06 PM org.apache.catalina.startup.Embedded start
INFO: Starting tomcat server
Aug 24, 2018 9:37:06 PM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.29
Aug 24, 2018 9:37:06 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
Aug 24, 2018 9:37:06 PM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-8080

Check in a web browser:

http://localhost:8080/GhoulWebApp/

And Stack trace with

 

org.apache.jasper.JasperException: Unable to compile class for JSP: 

An error occurred at line: 1 in the generated java file
The type java.io.ObjectInputStream cannot be resolved. It is indirectly referenced from required .class files

Known error; need to move to a newer Tomcat (7 or later). Add this to the pom.xml:

diff --git a/pom.xml b/pom.xml
index 4f73c5c..397207f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -16,6 +16,16 @@
     </dependency>
   </dependencies>
   <build>
+    <plugins>
+      <plugin>
+       <groupId>org.apache.tomcat.maven</groupId>
+         <artifactId>tomcat7-maven-plugin</artifactId>
+         <version>2.1</version>
+         <configuration>
+           <path>/</path>
+         </configuration>
+      </plugin> 
+    </plugins>
     <finalName>GhoulWebApp</finalName>
   </build>
 </project>

And run with

 

 mvn clean install tomcat7:run

Lots of output ending with

[INFO] Creating Tomcat server configuration at /home/ayoung/devel/ghoul/GhoulWebApp/target/tomcat
[INFO] create webapp with contextPath: 
Aug 24, 2018 9:52:31 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-bio-8080"]
Aug 24, 2018 9:52:31 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Tomcat
Aug 24, 2018 9:52:31 PM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.37
Aug 24, 2018 9:52:32 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-bio-8080"]

 

Look in a web browser:

Hello World!

OK, let's check this in to git:

[ayoung@ayoung GhoulWebApp]$ git status
# On branch master
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: pom.xml
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# target/
no changes added to commit (use "git add" and/or "git commit -a")
[ayoung@ayoung GhoulWebApp]$ git add pom.xml
[ayoung@ayoung GhoulWebApp]$ git commit -m "Move to tomcat7 or later"
[master 1e82829] Move to tomcat7 or later
1 file changed, 10 insertions(+)

And we have a sample app to use with Ansible.

 

 

 

 

 

Creating a Job in Ansible Tower via the REST API

Posted by Adam Young on August 23, 2018 09:59 PM

Now that we can use the REST API to list inventory, it is not a big stretch to decide we want to kick off jobs, too. Here it is in a nutshell, along with some related operations for working with jobs and templates.

To List Templates:

curl -s  -k  -u $CREDS https://$TOWER_HOST/api/v2/job_templates/ | jq '.results | .[] | .name '

"MOC Stack  Teardown Production"
"MOC Stack Update Causal"
"RDU OpenShift Build"
"RDU OpenShift Tear Down"
"RDU Stack Provision"
"RDU Stack Subscribe"
"Stack Tear Down"
"TestCred"

To Select the launch URL for a Template by name:

 curl -s  -k  -X POST -u $CREDS https://$TOWER_HOST/api/v2/job_templates/11/launch/ | jq '.url'
"/api/v2/jobs/1018/"

To Launch it (and get the resulting job URL):

curl -s  -k  -X POST -u $CREDS https://$TOWER_HOST/api/v2/job_templates/11/launch/ | jq '.url'
"/api/v2/jobs/1021/"

To figure out what happened on a job:

 curl -s  -k  -u $CREDS https://$TOWER_HOST/api/v2/jobs/1015/ | jq '.job_explanation'
"Previous Task Failed: {\"job_type\": \"project_update\", \"job_name\": \"Rippowam\", \"job_id\": \"1016\"}"

To Find the URL of Template used to launch a job:

curl -s  -k  -u $CREDS https://$TOWER_HOST/api/v2/jobs/1021/ | jq '.related | .job_template'
"/api/v2/job_templates/11/"

To list the names of templates used to kick off jobs:

curl -s  -k  -u $CREDS https://$TOWER_HOST/api/v2/jobs/ | jq '.results | .[]| .summary_fields| .job_template | .name'

To Kick Off a Job again:

export RELAUNCH_URL=$( curl -s  -k  -u $CREDS https://$TOWER_HOST/api/v2/jobs/1021/ | jq -r '.related | .relaunch' )
curl -s  -k  -X POST -u $CREDS https://$TOWER_HOST$RELAUNCH_URL

Note the -r option to jq to strip the quotes, and the lack of a / between $TOWER_HOST and $RELAUNCH_URL (the relaunch URL already starts with one).
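
Putting those pieces together, launching by name fits in a short script. This is a minimal sketch under the same assumptions as above ($CREDS and $TOWER_HOST set, jq installed); the template name is just one of the examples listed earlier, and it relies on the launch entry in the template's related collection:

#!/bin/sh
# Sketch: launch an Ansible Tower job template by name and print the new job's URL.
TEMPLATE_NAME="RDU Stack Provision"

# Find the template's launch URL by name.
LAUNCH_URL=$(curl -s -k -u $CREDS https://$TOWER_HOST/api/v2/job_templates/ | \
  jq -r ".results | .[] | select(.name == \"$TEMPLATE_NAME\") | .related.launch")

# POST to it; no extra / is needed since the related URL already starts with one.
curl -s -k -X POST -u $CREDS https://$TOWER_HOST$LAUNCH_URL | jq '.url'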

Using an Ansible Tower Inventory from Command Line Ansible

Posted by Adam Young on August 23, 2018 08:26 PM

In an earlier post, I wrote about using the OpenStack Ansible inventory helper when calling Ansible command line tools. However, when developing a playbook, often there is more information pulled from the inventory than just the set of hosts. Often, the inventory also collects variables that are used in common across multiple playbooks. For this reason, and many more, I want to be able to call an Ansible playbook or ad-hoc command from the command line, but use the inventory as defined by an Ansible Tower instance. It turns out this is fairly simple to do, using the REST API.

If you have a working Ansible Tower instance, you can find the REST API starting point by editing the URL in your browser. For example, I have a VM running on my machine listening on local IP address 192.168.122.151. If, after authenticating, I edit the URL to https://192.168.122.151/api/ I see the browsable API root page.

To navigate to this same page via curl, I run:

curl -s -k https://192.168.122.151/api/

Which returns:

{"available_versions":{"v1":"/api/v1/","v2":"/api/v2/"},"description":"AWX REST API","current_version":"/api/v2/"}

To go much deeper into the API I need to be authenticated.  I can do that here using Basic-Auth, and a little jq to make it legible:

 export CREDS='admin:password'

$ curl -s -k -u $CREDS https://192.168.122.151/api/v2/ | jq '.'

Which returns

 
{
"authtoken": "/api/v2/authtoken/",
"ping": "/api/v2/ping/",
"instances": "/api/v2/instances/",
"instance_groups": "/api/v2/instance_groups/",
"config": "/api/v2/config/",
"settings": "/api/v2/settings/",
"me": "/api/v2/me/",
"dashboard": "/api/v2/dashboard/",
"organizations": "/api/v2/organizations/",
"users": "/api/v2/users/",
"projects": "/api/v2/projects/",
"project_updates": "/api/v2/project_updates/",
"teams": "/api/v2/teams/",
"credentials": "/api/v2/credentials/",
"credential_types": "/api/v2/credential_types/",
"inventory": "/api/v2/inventories/",
"inventory_scripts": "/api/v2/inventory_scripts/",
"inventory_sources": "/api/v2/inventory_sources/",
"inventory_updates": "/api/v2/inventory_updates/",
"groups": "/api/v2/groups/",
"hosts": "/api/v2/hosts/",
"job_templates": "/api/v2/job_templates/",
"jobs": "/api/v2/jobs/",
"job_events": "/api/v2/job_events/",
"ad_hoc_commands": "/api/v2/ad_hoc_commands/",
"system_job_templates": "/api/v2/system_job_templates/",
"system_jobs": "/api/v2/system_jobs/",
"schedules": "/api/v2/schedules/",
"roles": "/api/v2/roles/",
"notification_templates": "/api/v2/notification_templates/",
"notifications": "/api/v2/notifications/",
"labels": "/api/v2/labels/",
"unified_job_templates": "/api/v2/unified_job_templates/",
"unified_jobs": "/api/v2/unified_jobs/",
"activity_stream": "/api/v2/activity_stream/",
"workflow_job_templates": "/api/v2/workflow_job_templates/",
"workflow_jobs": "/api/v2/workflow_jobs/",
"workflow_job_template_nodes": "/api/v2/workflow_job_template_nodes/",
"workflow_job_nodes": "/api/v2/workflow_job_nodes/"
}

This call lays out all of the top level objects available in Tower. The "inventory" sub-URL, "/api/v2/inventories/", enumerates the inventory objects recorded in the system. Because there is a lot of information available on each object, I've found it necessary to filter down by name:

 $ curl -s -k -u $CREDS https://192.168.122.151/api/v2/inventories/ | jq '.results | .[] | .name'
"Demo Inventory"
"Generic OCP Inventory"
"Ibis"
"MOC_RedHatFSI"
"RDU-OpenStack"

And to get access to a specific inventory:

 curl -s -k -u $CREDS https://192.168.122.151/api/v2/inventories/ | jq '.results | .[] | select (.name == "MOC_RedHatFSI") '

Again, this returns a lot of information. What I care about, however, is the URL that I can call to get an inventory JSON structure consumable by Ansible. That is in the related collection, in the script field.

$ curl -s -k -u $CREDS https://192.168.122.151/api/v2/inventories/ | jq '.results | .[] | select (.name == "MOC_RedHatFSI") |.related '

{
"created_by": "/api/v2/users/1/",
"job_templates": "/api/v2/inventories/4/job_templates/",
"variable_data": "/api/v2/inventories/4/variable_data/",
"root_groups": "/api/v2/inventories/4/root_groups/",
"object_roles": "/api/v2/inventories/4/object_roles/",
"ad_hoc_commands": "/api/v2/inventories/4/ad_hoc_commands/",
"script": "/api/v2/inventories/4/script/",
"tree": "/api/v2/inventories/4/tree/",
"access_list": "/api/v2/inventories/4/access_list/",
"activity_stream": "/api/v2/inventories/4/activity_stream/",
"instance_groups": "/api/v2/inventories/4/instance_groups/",
"hosts": "/api/v2/inventories/4/hosts/",
"groups": "/api/v2/inventories/4/groups/",
"update_inventory_sources": "/api/v2/inventories/4/update_inventory_sources/",
"inventory_sources": "/api/v2/inventories/4/inventory_sources/",
"organization": "/api/v2/organizations/1/"
}

Peeking at the source code for AWX shows just how it uses this field when one Tower instance acts as the inventory for another.

That led me to write a short script that I can call as part of the Ansible command line:

$ cat ~/bin/tower-inventory.sh 
#!/bin/sh
export TOWER_CREDS='admin:password'
export TOWER_HOST=192.168.122.151
export INVNUM=4

curl -s -k -u $TOWER_CREDS "https://$TOWER_HOST/api/v2/inventories/$INVNUM/script/?hostvars=1&towervars=1&all=1"

Which I can then use as below:

$ ansible --key-file ~/keys/id_rsa  --user cloud-user   -i ~/bin/tower-inventory.sh idm -m shell -a 'echo $USER'
idm | SUCCESS | rc=0 >>
cloud-user

This should be sufficient to get you started. The script could be made more flexible: aside from the obvious changes to consume external values for the variables, it could mimic the work I performed earlier in the article to navigate down and find an inventory by name. However, that would also slow down the processing, as there would be several more round trip calls to Tower.
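
For reference, here is a rough sketch of that name-based lookup, under the same assumptions as the script above (admin credentials, the 192.168.122.151 VM) plus an inventory actually named MOC_RedHatFSI:

#!/bin/sh
# Sketch: resolve the Tower inventory by name instead of hard-coding INVNUM.
export TOWER_CREDS='admin:password'
export TOWER_HOST=192.168.122.151
export INVENTORY_NAME='MOC_RedHatFSI'

# Pull the script URL out of the matching inventory's related collection.
SCRIPT_URL=$(curl -s -k -u $TOWER_CREDS https://$TOWER_HOST/api/v2/inventories/ | \
  jq -r ".results | .[] | select(.name == \"$INVENTORY_NAME\") | .related.script")

# Quote the URL so the & separators are not interpreted by the shell.
curl -s -k -u $TOWER_CREDS "https://$TOWER_HOST$SCRIPT_URL?hostvars=1&towervars=1&all=1"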

Actionable Advice

Posted by Josh Bressers on August 22, 2018 08:38 PM

I gave a talk at OSCON 20 about security. It's not a typical security talk though. I've given and attended a lot of what I would call "typical" security presentations. It's generally about some big security idea, there's likely some amount of blaming everyone except the security industry itself. We should make sure we throw in some analogies, maybe comparing cars to buggies or bridge safety. Blockchain is pretty hip now so that can probably solve the problem, maybe with AI. In general these presentations aren't overly exciting and tend to play to the audience. They are fun, but that's not the point this time.

The best part about getting to give a security talk at OSCON is I’m not talking to a security audience, I get to talk to developers about security. Developers, the ones who do the actual work, sometimes in spite of their security teams causing friction and slowing things down. It’s very common for security guidance to lack actionable advice. You need to use a strong password! OK, sure, but why and what does that mean? How do I write secure code? How can I fix these security problems you just told me my project has? I tried to fill my talk with actionable advice for the developers. Also bad jokes.

Actionable advice is hard. It’s very easy to point out what’s wrong with something, it’s probably ten times harder to actually fix it. Brian Kernighan has a quote that I like to use to explain this “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” The same basic thing holds true for fixing any problem, security problems included. Fixing problems can be very difficult, helping someone else understand then fix a problem is REALLY difficult.

A great example can be found with cross site scripting (XSS) security flaws. These are bugs that basically let an attacker take over the content on your website (I know this is a gross oversimplification). In many instances the developer will get a report about a XSS bug found in the website, so they fix the bug. There is literally an infinite number of these bugs on every website. Developers are adding new XSS bugs faster than anyone is fixing old ones. What if I also told you there is a way to fix all of these problems. Forever!

Well, nothing is really forever, but this is one of the examples I use during this presentation. If we look at the OWASP Top 10 we can get a sense for the most common mistakes in web applications. In the 2017 list XSS was #7. I expect it will always be on the Top 10 list. I like OWASP a lot, they’re a great group and you should get involved if you’re not already. But I do have some issues with the Top 10 list from the viewpoint of non security developers. The list doesn’t contain actionable advice in the way I would like. It treats these issues as being unrelated and often offers a number of possible solutions to each of the top ten.

If you pick a modern web framework and use it properly, you can remove about half that list! That’s pretty wild if you think about it.

There’s a little more nuance than this of course. You also have to keep your framework updated, and you better make sure it has a healthy upstream. You will also make mistakes. Everyone makes mistakes. When mistakes are made fix them fast. We love to focus on blame but that’s not very useful. What is useful is having the ability to move fast.

I could of course go on in more detail, but the basic idea for the presentation is I break the OWASP Top 10 into something closer to the top 3. Everyone can remember three things, nobody can remember ten.

One of my goals is to discuss security with everyone. While security conferences are a lot of fun, the topics are often self serving and not reaching beyond the typical security people. It’s very common to hear “security should be everyone’s job”. This statement is sort of silly if you think about it. Is electricity everyone’s job? No, it’s just something that exists and we don’t really think about unless it’s broken. Security should be like electricity or plumbing. It exists, it’s pretty easy to use correctly, and as long it’s doing what it was designed to do, nobody worries about it.

Using the OpenStack inventory helper for Ansible

Posted by Adam Young on August 20, 2018 02:27 AM

While Tower makes it easy to manage custom inventory, I still want to develop using the command line. Thus, I want to generate a smart inventory for my Ansible playbooks comparable to what I get from Tower.

The inventory helpers for Ansible are committed to a contrib subdirectory upstream. For example, the OpenStack inventory helper is contrib/inventory/openstack_inventory.py.

I cloned the git repo and checked out the devel branch (contrib is hidden in release branches) so I could run the script. Then source my keystone.rc file:

. ~/devel/openstack/moc/keystonev3.rc

And make sure I have some hosts:

$ openstack server list
+--------------------------------------+------------------------+--------+------------------------------------------------------+------------------------+-----------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------------------+--------+------------------------------------------------------+------------------------+-----------+
| 1ece60af-4fe5-4cb4-9da0-605f0e37b708 | dev1jboss.awx.devstack | ACTIVE | dev1_network=192.168.24.10, 1.1.1.134 | rhel-guest-image-7.5-1 | m1.medium |
| 737c8c9c-4a90-46ed-bf2c-16e042c8547e | datagen.awx.devstack | ACTIVE | causal_network=192.168.24.9, 1.1.1.95 | rhel-guest-image-7.5-1 | m1.xlarge |
| 4992c139-4b21-4c38-8912-3e6f8f6e1e7a | idm.awx.devstack | ACTIVE | awx-private-net_network=192.168.24.12, 1.1.1.132 | rhel-guest-image-7.5-1 | m1.medium |
+--------------------------------------+------------------------+--------+------------------------------------------------------+------------------------+-----------+

(NOTE: I modified the IP addresses for the floating, publicly accessible hosts.)

Now run the script:

 /home/ayoung/devel/ansible/contrib/inventory/openstack_inventory.py --list

And I get a lot of output:

{
  "_meta": {
    "hostvars": {
      "1ece60af-4fe5-4cb4-9da0-605f0e37b708": {
        "ansible_host": "1.1.1.134", 
        "ansible_ssh_host": "1.1.1.134", 
        "openstack": {
...
  "nova": [
    "4992c139-4b21-4c38-8912-3e6f8f6e1e7a", 
    "1ece60af-4fe5-4cb4-9da0-605f0e37b708", 
    "737c8c9c-4a90-46ed-bf2c-16e042c8547e"
  ]
}

Each server as named by Nova gets its own entry. For example, I have a server named dev1jboss.awx.devstack and I can query it via:

$/home/ayoung/devel/ansible/contrib/inventory/openstack_inventory.py --host dev1jboss.awx.devstack | jq '.name'
"dev1jboss.awx.devstack"

I can run an ad-hoc command using the inventory as a parameter:

ansible  --key-file ~/keys/id_rsa  --user cloud-user  -i /home/ayoung/devel/ansible/contrib/inventory/openstack_inventory.py dev1jboss.awx.devstack -m setup

If I have a playbook that can work with all hosts, and I want to limit it to just one, I can use the -l or --limit option. I use that, for example, for installing an application on just one host. If I want to install on only Development boxes (versus production) I can run:

ansible-playbook  --key-file ~/keys/id_rsa -e cloud_user=cloud-user  --user cloud-user  -i /home/ayoung/devel/ansible/contrib/inventory/openstack_inventory.py playbooks/jbosseap.yml --limit dev1jboss* -e @~/vault.yml

After working through this, I decided to redo my setup so that the server name was the short version (sso vs sso.awx.devstack) so the host grouping would match what the setup scripts required. I still would like to find a way to set the group information in the OpenStack layer, but it seems like the best bet is to use a hostname regex match in Tower.

Episode 110 - Review of Black Hat, Defcon, and the effect of security policies

Posted by Open Source Security Podcast on August 20, 2018 12:00 AM
Josh and Kurt talk about Black Hat and Defcon and how unexciting they have become. What happened with hotels at Defcon, and more importantly how many security policies have 2nd and 3rd level effects we often can't foresee. We end with important information about pizza, bananas, and can openers.

<iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/6944689/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

Show Notes


Resetting the Configuration of a SRX220

Posted by Adam Young on August 16, 2018 08:20 PM

I’m trying to do only the minimal amount via Minicom to get the SRX220 up and running. The goal is to then do the remainder of the work via Ansible.  These are my notes on resetting the device back to an initial configuration.

Following the guide here: https://www.juniper.net/documentation/en_US/release-independent/junos/topics/task/configuration/services-gateway-srx220-configuring-with-cli.html

To start with, I powered on and logged in with the machine in its old configuration. I pressed and held the config-reset button for 15 seconds until I got the message:

Broadcast Message from root@arceus
(no tty) at 17:26 UTC...

Config button pressed

Committing factory default configuration

And then I pressed and held the power button for 15 seconds to reboot the machine. At the end of the boot process I see:

Amnesiac (ttyu0)                                                                
                                                                                
login: 

Which indicates the hostname has been reset. A good sign. According to the docs I can sign in with root and no password.

But, it seems my old password is still set, and I can log in as the admin account.  I log in as admin, and see if the current configuration is valid.

 

admin> configure shared 
Entering configuration mode 
The configuration has been changed but not committed 

{hold:node0}[edit] 
admin# commit 
[edit] 
'system' 
Missing mandatory statement: 'root-authentication' 
error: commit failed: (missing statements) 

{hold:node0}[edit] 
admin#

So it seems it does not like what I have. The commit complains about a missing root-authentication statement, so let me set a root password:

admin# set system root-authentication plain-text-password 
New password: 
Retype new password: 

{hold:node0}[edit] 
admin# commit

And that fails with an error I expect:

[edit interfaces] 
'ge-0/0/6' 
HA management port cannot be configured 
error: configuration check-out failed 

{hold:node0}[edit]

In its default setup, it is looking for clustering support on the last two interfaces: ge-0/0/6 and ge-0/0/7.

Delete the two ge interfaces:

{hold:node0}[edit]
admin# delete interfaces ge-0/0/6

{hold:node0}[edit]
admin# delete interfaces ge-0/0/7

And then commit.  And it works.  But this seems suboptimal.  I wonder if I can recreate them.  Power cycle the machine to check the state:

I can now log in as root with the password I set above. So my changes "took." Next I apply the rest of my base configuration:

set system host-name arceus.home.younglogic.net

set system login user admin class super-user authentication plain-text-password

set system login user admin class super-user authentication ssh-rsa "ssh-rsa {key}"

set system login user ansible class super-user authentication ssh-rsa "ssh-rsa {key}"

set system services netconf ssh
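
None of these take effect until they are committed. From the same admin# configuration prompt used earlier, the usual Junos check-then-commit sequence is:

admin# commit check

admin# commit and-quit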

For communication between the router and the jump host, I am going to use the default subnet:

192.168.1.0/24

The cable that connects my Jump Host to the SRX 220 is on Port 0/0/1 (numbering starts at 0 on the left). Running:

show configuration | display set | match "set system services"

returns

set system services ssh
set system services telnet
set system services xnm-clear-text
set system services web-management http interface vlan.0
set system services web-management https system-generated-certificate
set system services web-management https interface vlan.0
set system services dhcp router 192.168.1.1
set system services dhcp pool 192.168.1.0/24 address-range low 192.168.1.2
set system services dhcp pool 192.168.1.0/24 address-range high 192.168.1.254
set system services dhcp propagate-settings ge-0/0/0.0

I have to manually set the IP address for the Jump Host Interface:

$ cat /etc/sysconfig/network-scripts/ifcfg-enp3s0 
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
#BOOTPROTO=dhcp
BOOTPROTO=static
IPADDR=192.168.1.10
PREFIX=24
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=enp3s0
UUID=81d49a60-6c61-4764-83c7-c46a5ddc3c8c
DEVICE=enp3s0
ONBOOT=yes

And I can now ping the machine.

To SSH to the machine:

[ayoung@dialga aj]$ ssh -i ~/keys/id_rsa admin@192.168.1.1 
--- JUNOS 12.1X46-D55.3 built 2016-07-08 18:46:54 UTC
{primary:node0}

Physical Home Cluster Setup

Posted by Adam Young on August 15, 2018 07:11 PM

I’ve been building a home cluster for investigative work.   Here’s what I have so far.


Hardware List

Jump host:
Hostname: Dialga

Dell inspiron with extra Intel Pro/1000 Dual port Server Adapter

3 Nodes Dell Poweredge 610 ( each with a different Solid State Drive)

Juniper SRX 220 Router

Tenda 8 port Gigabit Desktop switch

The whole thing is housed in a  StarTech.com 12U Adjustable Depth Open Frame 4 Post Server Rack

Here’s what it looks like:

3 Node Cluster plus Jump Host

Physical Network

This is a close up of the SRX 220

 

And the Jump host named Dialga:

Dialga has an onboard ethernet port and an additional NIC with 2 ports.  The Gray cable visible in the picture is connected to the onboard, and is the primary access point of the whole cluster.  Dialga is running Fedora 28.  Thus, the ports are named: enp3s0, enp1s0f0, and enp1s0f1.  Since enp3s0 is plugged in to a wall socket that connects to the company network, this is the only port that has an IP address by default:

 

2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether c8:1f:66:46:1a:43 brd ff:ff:ff:ff:ff:ff
    inet 10.18.57.195/23 brd 10.18.57.255 scope global dynamic noprefixroute enp3s0
       valid_lft 85852sec preferred_lft 85852sec
    inet6 2620:52:0:1238:7ba1:d6c3:fead:140e/64 scope global dynamic noprefixroute 
       valid_lft 2591453sec preferred_lft 604253sec
    inet6 fe80::7340:db2e:1af9:62a0/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

This is what I use to SSH in to dialga from my workstation.

Port enp1s0f0  is connected via the Blue Cable to port 0/0/1 of the Switch, and will be used for configuration.  The goal is to have SSH enabled on the SRX 220 via this port only.

Port enp1s0f1 is connected via the Black Cable to port 0/0/5 of the Switch, and will be used for Director; PXE and IPMI to start.

The console cable has an RJ45 connector on one end and a USB connector on the other, with a conversion in the middle from/to serial.  I can connect from Dialga to the SRX 220 over this using  Minicom.

 

 

 

Episode 109 - OSCon and actionable advice

Posted by Open Source Security Podcast on August 13, 2018 12:11 AM
Josh and Kurt talk about phishing training and how it doesn't really matter. Josh spoke at OSCon and comes back with some fun observations and advice. People want practical actionable advice and we're not good at that.

<iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/6916665/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

Show Notes


Optimizing R: Replace data.frame with parallel vectors

Posted by Adam Young on August 10, 2018 04:03 AM

“First make it work, then make it faster” — Brian Kernighan

However, if the program takes too long to run, it might not be possible to make it correct. Iterative development breaks down when each iteration takes several hours. Some R code I was working with was supposed to run for 1000 iterations, but with each iteration taking 75 seconds to run, we couldn't take the time to let it run to completion. The problem was the continued appending of rows to a data.frame object, referred to as the Second Circle here. That in turn points to a larger article about problems R programmers find themselves facing. How does one extricate oneself? Here were my steps.

The Problem

Since data.frames are organized as a set of parallel vectors, adding a new row requires reallocation and a copy. We tend to think of the data like this: each row contains one entry each of (A, Y, Z), represented by one of each color: Blue, Green, and Red.

When we want to add a row, we think of it like this:

But that is not accurate.  Many operations have to be performed on all of the values of a single column.  Thus, all the A (Blue) values are kept in contiguous memory.  So internally the computer lays them out like this:

And adding a row is more like two inserts and an append:

Because all the Data is laid out end to end, the Y and Z columns have to be moved.  This is expensive.

The fix is to avoid treating the data as a table until all of the data is generated. When working with individual vectors, we can append values to a single vector without moving the others.

Refactoring The Code

Get it into Git

Even if you are only working with a local repository, Git makes it much easier to revert to a known working version.

create a subdirectory, change into it, and then:

git init
git add *.R

When ever you make a change that you want to keep:

git add *.R
git commit -m "Message about the commit."

Commit early, commit often.

I also put a few entries into .gitignore. Here is what I have:

$ cat .gitignore 
*~
*Rout
.RData
.Rhistory
rprof.out

Unit Testing

Set up a unit test or two to confirm the function has not changed. In my case, the code calculates a value called bias_nowt.

[1] “bias_nowt”
[1] 0.2189567

So long as I continue to produce this number, my code behaves the same, and I can continue to restructure.

Ideally, I would be checking this value in an automated manner, but this code is not yet in the vicinity of being an R package, and so setting up Unit tests would be more work than called for. It would not take much for me to change my mind on this, though.

Remove commented out code

Make it easier to work inside your code. In the case of the code I am working with, lots of trial and error has led to much, much commented-out code. Removing it reduced the lines of code from 554 to around 300. It makes it easier to not get lost. With old code committed to git, you don't need to leave it inside the working code.

  • Remove commented out Code.
  • Run the unit tests.
  • Commit to Git.

Profiling

Add a main function

You are going to want to call your code from somewhere else. Take everything that is “top level” and put it into a function.

main <- function() {
  # all your top level code
}

Create a small wrapper that calls the main, like this:

$ cat datagen.R

Which contains

source("datagen_ipw_instrumented.R")
main()

and can be run:

 R CMD BATCH datagen.R

Run the unit tests. Commit to Git.

Add a profiling wrapper

$ cat profile-datagen.R

Which contains

source("datagen_ipw_instrumented.R")
Rprof("rprof.out")
main()
Rprof(NULL)
summaryRprof("rprof.out")

Use RProf to get actual timing numbers

When I run the code now, I see the times spent in various functions.

At the top of the output, I see:

$by.self
                        self.time self.pct total.time total.pct
"rbind"                     14.64    18.87      40.78     52.57
"[[.data.frame"              4.94     6.37      13.74     17.71

And further down

$by.total
                          total.time total.pct self.time self.pct
"main"                         77.58    100.00      0.02     0.03
"FUN"                          70.12     90.38      0.28     0.36
"lapply"                       70.12     90.38      0.02     0.03
"datagen"                      70.04     90.28      1.86     2.40
"rbind"                        40.78     52.57     14.64    18.87
"$"                            20.06     25.86      3.00     3.87

It takes a little bit of analysis of the code to realize that lapply is calling FUN, which is calling datagen, which is calling rbind. The goal is to replace the rbind calls
with something faster.

Start State

Each time the function is called, it initializes some variables, and then uses them to create each of the rows. The initialization code looks like this:

 id <- as.numeric(i)
 t0 <- 0
 U <- runif(1, 0, 1) # Uniformly distributed between 0 and 1
 L1 <- rnorm(1, mean=beta1_0+beta1_1*U, sd=sigma)
 cumavgL1 <- cumsum(L1)[1]/1 # Calculate cumavg(L1)
 L2 <- rbinom(1, 1, plogis(beta2_0+beta2_1*U+beta2_2*L1))
  A <- rbinom(1, 1, plogis(alpha0+alpha1*cumavgL1+
                             alpha2*L2+alpha3*t0))

  Y <- rbinom(1, 1, plogis(theta0+theta1*U+theta2*A+theta3*L1+theta4*L2))

A few are also passed in as parameters. Later on, it creates a data.frame:

 temp <- data.frame(Z = Z, id = id, t0 = t0, U = U, L1 = L1, L2 = L2, cavgL1 = cumavgL1,
                     A = A, Alag1 = Alag1, Y = Y_, wgt_temp = wgt_temp)

Because these are all scalar values, the data.frame is a single row.
My goal is to replace these with vectors, and call data.frame once with the completed vectors.

Introduce Vectors

I added vectors to the code and inserted data into them in parallel to the data.frame. My naming convention was to create a new variable that matched the original, but with an underbar after it. A scalar named A would be paired with a vector named A_.

Code looks like this:

 
  id <- as.numeric(i)
  id_ <- c(as.numeric(i))
  
  t0 <- 0
  t0_ <- c(0)
 
  U <- runif(1, 0, 1) # Uniformly distributed between 0 and 1
  U_ <- c(U)
 
  L1 <- rnorm(1, mean=beta1_0+beta1_1*U, sd=sigma)
  L1_ <- c(L1)
 
  cumavgL1 <- cumsum(L1)[1]/1 # Calculate cumavg(L1)
  cumavgL1_ <- c(cumavgL1)
 
  L2 <- rbinom(1, 1, plogis(beta2_0+beta2_1*U+beta2_2*L1))
  L2_ = c(L2)

  A <- rbinom(1, 1, plogis(alpha0+alpha1*cumavgL1+
                             alpha2*L2+alpha3*t0))
  A_ = c(A)

And so on.

This is a good time to run the unit tests. Nothing should change. Confirm that with the test.

If they run successfully, check the changes in to git. It is an interim change, but one that you might want to roll back to if you mess up the next step.

Add new values to the Vectors

When the loop iterates, it generates a new set of values, and appends to the array. The code looks like this:

    #  Add data to dataframe
      temp <- rbind(temp, c(Z, id, t0, U, L1star, L2star, cumavgL1,
                            Astar, tail(temp$A, 1), Ystar, NA))

I want to make sure all of these values are also captured in the vectors. I added this code at the base of the loop:

      Z_[j]         <- Z
      id_[j]        <- id
      t0_[j]        <- t0
      U_[j]         <- U
      L1_[j]        <- L1star
      L2_[j]        <- L2star
      cumavgL1_[j]  <- cumavgL1
      A_[j]         <- Astar
      Alag1_[j]     <- tail(A_, 1)
      Y_[j]         <- Ystar
      wgt_temp_[j]  <- NA

Run the unit tests. If they run successfully, check the changes in to git.

Replace data.frame usage

Once all of the vectors were in place, I started using them in places where the data.frame structure had been used previously. An example of what the code looks like to start:

        L1star <- rnorm(1, mean=beta1_0+beta1_1*U+beta1_2*temp$A[j-1]+
                          beta1_4*cumsum(temp$L1)[j-1]/(j-1)+beta1_5*temp$L2[j-1]+
                          beta1_7*t0, sd=sigma)

Note the temp$A[j-1] usage. The variable temp is the data.frame. Since we've captured the same data in the Vector, we can replace the data.frame use with the vector A_:

        L1star <- rnorm(1, mean=beta1_0+beta1_1*U+beta1_2*A_[j-1]+
                          beta1_4*cumsum(L1_)[j-1]/(j-1)+beta1_5*L2_[j-1]+
                          beta1_7*t0, sd=sigma)

After each of these, rerun the unit tests. You might be tempted to make more changes than this at once and skip running the tests (they take close to 90 seconds at this point) but don't. You will save more time in the long run by catching your errors right as you make them.

Once all of the temp$ uses have been replaced with Vectors, we can build the return data out of the Vectors instead of using the append. That looks like this:

Replace data.frame return value

  temp <- data.frame(Z = Z_, id = id_, t0 = t0_, U = U_, L1 = L1_, L2 = L2_, cavgL1 = cumavgL1_,
                     A = A_, Alag1 = Alag1_, Y = Y_, wgt_temp = wgt_temp_)

  return (temp)

Run the unit tests. Commit.

You will also want to remove the other call to data.frame and the call to rbind.

Run the unit tests. Commit.

Profile. I get:

$by.total
                          total.time total.pct self.time self.pct
"main"                         28.92    100.00      0.06     0.21
"lapply"                       21.42     74.07      0.04     0.14
"FUN"                          21.40     74.00      0.04     0.14
"datagen"                      21.34     73.79      2.14     7.40
"tail"                          8.24     28.49      1.14     3.94

and an overall time of

 > proc.time()
   user  system elapsed 
 29.400   0.343  29.734 

A reduction from 75 seconds to 29.4: that cuts the execution time by more than half. I'm sure we can make additional performance improvements, but we won't be clipping 45 seconds off the time again.

Episode 108 - Bluetooth, phishing, airgaps, and eating soup off the floor

Posted by Open Source Security Podcast on August 06, 2018 01:01 AM
Josh and Kurt talk about the latest attack on bluetooth and discuss phishing in the modern world. U2F is a great way to stop phishing, training is not. We also discuss airgaps in response to attacks on airgapped power utilities.


<iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/6890393/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

Show Notes


Photography - Why You Should Use JPG (not RAW)

Posted by William Brown on August 05, 2018 02:00 PM


When I started my modern journey into photography, I simply shot in JPG. I was happy with the results, and the images I was able to produce. It was only later that I was introduced to a now good friend and he said: “You should always shoot RAW! You can edit so much more if you do.”. It’s not hard to find many ‘beginner’ videos all touting the value of RAW for post editing, and how it’s the step from beginner to serious photographer (and editor).

Today, I would like to explore why I have turned off RAW on my camera bodies for good. This is a deeply personal decision, and I hope that my experience helps you to think about your own creative choices. If you want to stay shooting RAW and editing - good on you. If this encourages you to try turning back to JPG - good on you too.

There are two primary reasons for why I turned off RAW:

  • Colour reproduction of in-body JPG is better to the eye today.
  • Photography is about composing an image from what you have in front of you.

Colour is about experts (and detail)

I have always been unhappy with the colour output of my editing software when processing RAW images. As someone who is colour blind I did not know if it was just my perception, or if real issues existed. No one else complained so it must just be me right!

Eventually I stumbled on an article about how to develop real colour and extract camera film simulations for my editor. I was interested in both the ability to get true reflections of colour in my images, but also to use the film simulations in post (the black and white of my camera body is beautiful and soft, but my editor is harsh).

I spent a solid week testing and profiling both of my cameras. I quickly realised a great deal about what was occurring in my editor, but also my camera body.

The editor I have is attempting to generalise over the entire set of sensors that a manufacturer has created. They are also attempting to create a true colour output profile that is as reflective of reality as possible. So when I was exporting RAWs to JPG, I was seeing the differences between what my camera hardware is, vs the editor's profiles. (This was particularly bad on my older body, so I suspect the RAW profiles are designed for the newer sensor.)

I then created film simulations and quickly noticed the subtle changes. Blacks were blacker, but retained more fine detail with the simulation. Skin tone was softer. Exposure was more even across a variety of image types. How? RAW and my editor is meant to create the best image possible? Why is a film-simulation I have “extracted” creating better images?

As any good engineer would do I created sample images. A/B testing. I would provide the RAW processed by my editor, and a RAW processed with my film simulation. I would vary the left/right of the image, exposure, subject, and more. After about 10 tests across 5 people, only on one occasion did someone prefer the RAW from my editor.

At this point I realised that my camera manufacturer is hiring experts who build, live and breathe colour technology. They have tested and examined everything about the body I have, and likely calibrated it individually in the process to produce exact reproductions as they see in a lab. They are developing colour profiles that are not just broadly applicable, but also pleasing to look at (even if not accurate reproductions).

So how can my film simulations I extracted and built in a week, measure up to the experts? I decided to find out. I shot test images in JPG and RAW and began to provide A/B/C tests to people.

If the editor RAW was washed out compared to the RAW with my film simulation, the JPG from the body made them pale in comparison. Every detail was better, across a range of conditions. The features in my camera body are better than my editor. Noise reduction, dynamic range, sharpening, softening, colour saturation. I was holding in my hands a device that has thousands of hours of expert design, that could eclipse anything I built on a weekend for fun to achieve the same.

It was then I came to think about and realise …

Composition (and effects) is about you

Photography is a complex skill. It’s not having a fancy camera and just clicking the shutter, or zooming in. Photography is about taking that camera and putting it in a position to take a well composed image based on many rules (and exceptions) that I am still continually learning.

When you stop to look at an image you should always think “how can I compose the best image possible?”.

So why shoot in RAW? RAW is all about enabling editing in post. After you have already composed and taken the image. There are valid times and useful functions of editing. For example whitebalance correction and minor cropping in some cases. Both of these are easily conducted with JPG with no loss in quality compared to the RAW. I still commonly do both of these.

However RAW allows you to recover mistakes during composition (to a point). For example, the powerful base-curve fusion module allows dynamic range “after the fact”. You may even add high or low pass filters, or mask areas to filter and affect the colour to make things pop, or want that RAW data to make your vibrance control as perfect as possible. You may change the perspective, or even add filters and more. Maybe you want to optimise de-noise to make smooth high ISO images. There are so many options!

But all these things are you composing after the fact. Today, many of these functions are in your camera - and better performing. So while I'm composing I can enable dynamic range for the darker elements of the frame. I can compose and add my colour saturation (or remove it). I can sharpen, soften. I can move my own body to change perspective. All at the time I am building the image in my mind, as I compose, I am able to decide on the creative effects I want to place in that image. I'm no longer just composing within a frame, but on a canvas of potential effects.

To me this was an important distinction. I always found I was editing poorly-composed images in an attempt to “fix” them to something acceptable. Instead I should have been looking at how to compose them from the start to be great, using the tool in my hand - my camera.

Really, this is a decision that is yours. Do you spend more time now to make the image you want? Or do you spend it later editing to achieve what you want?

Conclusion

Photography is a creative process. You will have your own ideas of how that process should look, and how you want to work with it. Great! This was my experience and how I have arrived at a creative process that I am satisfied with. I hope that it provides you an alternate perspective to the generally accepted “RAW is imperative” line that many people advertise.

Episode 107 - The year of the Linux Desktop and other hardware stories

Posted by Open Source Security Podcast on July 30, 2018 03:42 AM
Josh and Kurt talk about modern hardware, how security relates to devices and actions. Everything from secure devices, to the cables we use, to thermal cameras and coat hangers. We end the conversation discussing the words we use and how they affect the way people see us and themselves.


<iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/6864275/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

Show Notes


Episode 106 - Data isn't oil, it's nuclear waste

Posted by Open Source Security Podcast on July 23, 2018 01:01 AM
Josh and Kurt talk about Cory Doctorow's piece on Facebook data privacy. It's common to call data the new oil but it's more like nuclear waste. How we fix the data problem in the future is going to require solutions we can't yet imagine as well as new ways of thinking about the problems.


<iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/6839782/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

Show Notes


Testing if a patch has test coverage

Posted by Adam Young on July 17, 2018 05:35 PM

When a user requests a code review, the reviewer is responsible for making sure that the code is tested. While the quality of the tests is a subjective matter, their presence is not; either they are there or they are not there. If they are not there, it is on the developer to explain why or why not.

Not every line of code is testable.  Not every test is intelligent.  But, at a minimum, a test should ensure that the code in a patch is run at least once, without an unexpected exception.

For Keystone and related projects, we have a tox job called cover that we can run on a git repo at a given revision.  For example, I can code review (even without git review) by pulling down a revision using the checkout link in  gerrit, and then running tox:

 

git fetch git://git.openstack.org/openstack/keystoneauth refs/changes/15/583215/2 && git checkout FETCH_HEAD
git checkout -b netloc-and-version
tox -e cover

I can look at the patch using show --stat to see what files were changed:

$ git show --stat
commit 2ac26b5e1ccdb155a4828e3e2d030b55fb8863b2
Author: wangxiyuan <wangxiyuan>
Date:   Tue Jul 17 19:43:21 2018 +0800

    Add netloc and version check for version discovery
    
    If the url netloc in the catalog and service's response
    are not the same, we should choose the catalog's and
    add the version info to it if needed.
    
    Change-Id: If78d368bd505156a5416bb9cbfaf988204925c79
    Closes-bug: #1733052

 keystoneauth1/discover.py                                 | 16 +++++++++++++++-
 keystoneauth1/tests/unit/identity/test_identity_common.py |  2 +-

and I want to skip looking at any files in keystoneauth1/tests as those are not production code. So we have 16 lines of new code. What are they?

Modifying someone else's code, I got to:

 git show | gawk 'match($0,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
   /^\+\+\+/{print;next};\
   {line=substr($0,2)};\
   /^-/{left++; next};\
   /^[+]/{print right++;next};\
   {left++; right++}'

Which gives me:

+++ b/keystoneauth1/discover.py
420
421
422
423
424
425
426
427
428
429
430
431
432
433
437
+++ b/keystoneauth1/tests/unit/identity/test_identity_common.py
332

Looking in the cover directory, I can see if a line is uncovered by its CSS class:

class="stm mis"

For example:

$ grep n432\" cover/keystoneauth1_discover_py.html | grep "class=\"stm mis\""

432

For the lines above, I can use seq to check them, since they are in order (with none missing):

for LN in `seq 420 437` ; do grep n$LN\" cover/keystoneauth1_discover_py.html ; done

Which produces:

420

421

422

423

424

425

426

427

428

429

430

431

432

433

434

435

436

437

I first drop the grep "class=\"stm mis\"" to make sure I get something for each line, then add it back in and get no output, which means none of the changed lines are marked as missed.
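
The same check can be scripted so nothing has to be eyeballed line by line. A rough sketch using the file and line range from this patch; a more general version would consume the gawk output directly:

#!/bin/sh
# Sketch: flag any changed line that carries the "stm mis" (statement missed) class.
COVER_HTML=cover/keystoneauth1_discover_py.html

for LN in $(seq 420 437) ; do
    if grep "n$LN\"" $COVER_HTML | grep -q 'class="stm mis"' ; then
        echo "line $LN is NOT covered"
    fi
done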

Episode 105 - More backdoors in open source

Posted by Open Source Security Podcast on July 16, 2018 01:08 AM
Josh and Kurt talk about some recent backdoor problems in open source packages. We touch on is open source secure, how that security works, and what it should look like in the future. This problem is never going to go away or get better, and that's probably OK.


<iframe allowfullscreen="" height="90" mozallowfullscreen="" msallowfullscreen="" oallowfullscreen="" scrolling="no" src="https://html5-player.libsyn.com/embed/episode/id/6814252/height/90/theme/custom/autoplay/no/autonext/no/thumbnail/yes/preload/no/no_addthis/no/direction/backward/render-playlist/no/custom-color/6e6a6a/" style="border: none;" webkitallowfullscreen="" width="100%"></iframe>

Show Notes


Building QGo on RHEL 7.5

Posted by Adam Young on July 12, 2018 03:06 PM

I’ve played Go for years. I’ve found that having a graphical Go client has helped me improve my game immensely. And, unlike many distractors,. I can make a move, then switch back in to work mode without really losing my train of thought.

I have always liked the QGo client. I have found it worthwhile to build and run it from the git repo. After moving to RHEL 7.5 for my desktop, I had to go through the process again. Here is the short version.

Playing Go using the QGo Client

All of the pre-reqs can come from Yum.

For the compiler and build tools, it is easiest to use a yum group:

sudo yum groupinstall "Development and Creative Workstation"

Once those packages are installed, you need some of the Qt5 development packages. At the bottom is the complete list I have. I did not install all of these directly, but instead recently installed:

qt5-devel
qt5-qtbase-devel
qt5-qttools-devel
qt5-qttranslations
qt5-qtmultimedia
qt5-qtmultimedia-devel

To run the actual qmake command, things are a bit different from the README.

/usr/bin/qmake-qt5 src
make

That puts things in ../build, which took me a moment to find.

Now I can run qgo with

/home/ayoung/devel/build/qgo

Et Voila

QGo Running on RHEL 7.5
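
One optional convenience that is not part of the build itself: a tiny wrapper script keeps the odd ../build location out of sight.

# put a wrapper on the PATH so qgo starts like any other command
cat > ~/bin/qgo <<'EOF'
#!/bin/sh
exec /home/ayoung/devel/build/qgo "$@"
EOF
chmod +x ~/bin/qgo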

The complete list of qt packages I have installed are:

qt5-qttools-libs-designer-5.9.2-1.el7.x86_64
adwaita-qt5-1.0-1.el7.x86_64
qt5-qtmultimedia-devel-5.9.2-1.el7.x86_64
qt-settings-19-23.7.el7.noarch
qt5-qtbase-devel-5.9.2-3.el7.x86_64
qt5-qttools-5.9.2-1.el7.x86_64
qt5-qtbase-5.9.2-3.el7.x86_64
qt5-rpm-macros-5.9.2-3.el7.noarch
qt5-doctools-5.9.2-1.el7.x86_64
qt5-designer-5.9.2-1.el7.x86_64
qt5-qtbase-common-5.9.2-3.el7.noarch
highcontrast-qt5-0.1-2.el7.x86_64
qt5-qtmultimedia-5.9.2-1.el7.x86_64
qt5-qttools-libs-designercomponents-5.9.2-1.el7.x86_64
qt-4.8.7-2.el7.x86_64
qt5-qtdeclarative-devel-5.9.2-1.el7.x86_64
qt5-qttools-libs-help-5.9.2-1.el7.x86_64
qt5-qtbase-gui-5.9.2-3.el7.x86_64
qt3-3.3.8b-51.el7.x86_64
qt5-qtxmlpatterns-5.9.2-1.el7.x86_64
qt5-qttools-common-5.9.2-1.el7.noarch
qt5-qttools-devel-5.9.2-1.el7.x86_64
qt5-qtdeclarative-5.9.2-1.el7.x86_64
qt5-linguist-5.9.2-1.el7.x86_64
qt5-qttranslations-5.9.2-1.el7.noarch
qt-x11-4.8.7-2.el7.x86_64

unlabeled_t type

Posted by Dan Walsh on July 12, 2018 03:02 PM

I often see bug reports or people showing AVC messages about confined domains not able to deal with unlabeled_t files.

type=AVC msg=audit(1530786314.091:639): avc:  denied  { read } for  pid=4698 comm="modprobe" name="modules.alias.bin" dev="dm-0" ino=9115100 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=file

I just saw this AVC, which shows the openvswitch domain attempting to read a file, modules.alias.bin, with modprobe.   The usual response to this is to run restorecon on the files and everything should be fine.

But the next question I get is how did this content get the label unlabeled_t, and my response is usually I don't know, you did something.

Well, let's look at how unlabeled_t files get created.

unlabeled_t really just means that the file on disk does not have an SELinux xattr indicating a file label. Here are a few ways these files can get created:

1. A file was created on a file system when the kernel was not running in SELinux mode. If you take a system that was installed without SELinux (God forbid), or someone booted the machine with SELinux disabled, then all files created will not have labels. This is why we force a relabel any time someone changes from SELinux disabled to SELinux enabled at boot time.

2. An extension of the first case is files created in the initramfs, before SELinux policy is loaded in the kernel. We have an issue in CoreOS right now where, when the system boots up, the initramfs runs `ignition`, which runs before systemd loads SELinux policy. The ignition scripts create files on the file system while SELinux is not enabled in the kernel, so those files get created as unlabeled_t. Ignition is adding a one-time systemd unit file to run restorecon on the content it created.

3. People create USB sticks with ext4 or xfs on them on a non-SELinux system, then stick them into systems with SELinux enabled and 'mv' the content onto the system. The `mv` command actually maintains the SELinux label, or lack thereof, when it moves files across file systems. If you use `mv -Z`, the mv command will relabel the target content, or you can just use restorecon.

4. The fourth way I can think of to create unlabeled_t files is to create a brand new file system on an SELinux system. When you create a new file system, the kernel creates the "/" (root) of the file system without a label. So if you mount the file system onto a mount point, the directory where you mounted it will have no label. If an unconfined domain creates files on this new file system, it will also create unlabeled_t files, since the default behaviour of the SELinux kernel is to create content based on the label of the parent directory, which in this case is unlabeled_t. I recommend running restorecon on the mount point as soon as you mount a new file system to fix this behaviour, or you can run `restorecon -R -v MOUNTPOINT` to clean up all the files.

Note: The unlabeled_t type can also show up on other objects besides file system objects.  For example on labeled networks, but this blog is only concerned with file system objects.

Bottom Line:

Unlabeled files should always be cleaned up ASAP, since they will cause confined domains lots of problems, and restorecon is your friend.
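
A quick way to spot any stragglers under a suspect mount point is something like the following sketch; /srv/newfs is just an example path, and find's -context test assumes a findutils built with SELinux support, as on Fedora/RHEL:

# list anything still carrying the unlabeled_t type
find /srv/newfs -context '*:unlabeled_t:*' -ls

# then relabel it all according to the loaded policy
restorecon -R -v /srv/newfs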

The father of modern security: B. F. Skinner

Posted by Josh Bressers on July 11, 2018 01:43 PM

A lot of what we call security is voodoo. Most of it actually.

What I mean with that statement is our security process is often based on ideas that don’t really work. As an industry we have built up a lot of ideas and processes that aren’t actually grounded in facts and science. We don’t understand why we do certain things, but we know that if we don’t do those things something bad will happen! Will it really happen? I heard something will happen. I suspect the answer is no, but it’s very difficult to explain this concept sometimes.

I’m going to start with some research B. F. Skinner did as my example here. The very short version is that Skinner did research on pigeons. He had a box that delivered food at random intervals. The birds developed rituals that they would do in order to have their food delivered. If a pigeon decided that spinning around would cause food to be delivered, it would continue to spin around, eventually the food would appear reinforcing the nonsensical behavior. The pigeon believed their ritual was affecting how often the food was delivered. The reality is nothing the pigeon did affected how often food was delivered. The pigeon of course didn’t know this, they only knew what they experienced.

My favorite example  to use next to this pigeon experiment is the password policies of old. A long time ago someone made up some rules about what a good password should look like. A good password has letters, and numbers, and special characters, and the name of a tree in it. How often we should change a password was also part of this. Everyone knows you should change passwords as often as possible. Two or three times a day is best. The more you change it the more secure it is!

Today we’ve decided that all this advice was terrible. The old advice was based on voodoo. It was our ritual that kept us safe. The advice to some people seemed like a fair idea, but there were no facts backing it up. Lots of random characters seems like a good idea, but we didn’t know why. Changing your password often seemed like a good idea, but we didn’t know why. This wasn’t much different than the pigeon spinning around to get more food. We couldn’t prove it didn’t not work, so we kept doing it because we had to do something.

Do you know why we changed all of our password advice? We changed it because someone did the research around passwords. We found out that very long passwords using real words is substantially better than a nonsense short password. We found out that people aren’t good at changing their passwords every 90 days. They end up using horrible passwords and adding a 1 to the end. We measured the effectiveness of these processes and understood they were actually doing the opposite of what we wanted them to do. Without question there are other security ideas we do today that fall into this category.

Even though we have research showing this password advice was terrible we still see a lot of organizations and people who believe the old rituals are the right way to keep passwords safe. Sometimes even when you prove something to someone they can’t believe it. They are so invested in their rituals that they are unable to imagine any other way of existing. A lot of security happens this way. How many of our rules and processes are based on bad ideas?

How to measure
Here’s where it gets real. It’s easy to pick on the password example because it’s in the past. We need to focus on the present and the future. You have an organization that’s full of policy, ideas, and stuff. How can we try to make a dent in what we have today? What matters? What doesn’t work, and what’s actually harmful?

I’m going to split everything into 3 possible categories. We’ll dive deeper into each in future posts, but we’ll talk about them briefly right now.

Things that make money
Number one is things that make money. This is something like a product you sell, or a website that customers use to interact with your company. Every company does something that generates revenue. Measuring things that fit into this category is really easy. You just ask “Will this make more, less, or the same amount of money?” If the answer is less you’re wasting your time. I wrote about this a bit a long time ago, the post isn’t great, but the graphic I made is useful, print it out and plot your features on it. You can probably start asking this question today without much excitement.

Cost of doing business
The next category is what I call cost of doing business. This would be things like compliance or being a part of a professional organization. Sending staff to conferences and meetings. Things that don’t directly generate revenue but can have a real impact on the revenue. If you don’t have PCI compliance, you can’t process payments, you have no revenue, and the company won’t last long. Measuring some of these is really hard. Does sending someone to Black Hat directly generate revenue? No. But it will create valuable connections and they will likely learn new things that will be a benefit down the road. I guess you could think of these as investments in future revenue.

My thoughts on how to measure this one is less mature. I think about these often. I’ll elaborate more in a future post.

Infrastructure
The last category I’m going to call “infrastructure”. This one is a bit harder to grasp what makes sense. It’s not unlike the previous question though. In this case we ask ourselves “If I stopped doing this what bad thing would happen?” Now I don’t mean movie plot bad thing. Yeah if you stopped using your super expensive keycard entry system a spy from a competitor could break in and steal all your secrets using an super encrypted tor enabled flash drive, but they probably won’t. This is the category where you have to consider the cost of an action vs the cost of not doing an action. Not doing things will often have a cost, but doing things also has a cost.

Return on investment is the name of the game here. Nobody likes to spend money they don’t have to. This is why cloud is disrupting everything. Why pay for servers you don’t need when you can rent only what you do need?

I have some great stories for this category, be sure to come back when I publish this followup article.

The homework for everyone now is to just start thinking about what you do and why you do it. If you don’t have a good reason, you need to change your thinking. Changing your thinking is really hard to do as a human though. Many of us like to double down on our old beliefs when presented with facts. Don’t be that person, keep an open mind.

Fun with DAC_OVERRIDE and SELinux

Posted by Dan Walsh on July 10, 2018 02:31 PM

Lately the SELinux team has been trying to remove DAC_OVERRIDE from as many SELinux domain types as possible.

man capabilities

...

       CAP_DAC_OVERRIDE

              Bypass file read, write, and execute permission checks.  (DAC is an abbreviation of "discretionary access control".)

This means a process with CAP_DAC_OVERRIDE can read or write any file on the system from a standard permissions point of view.  With SELinux it means that it can read all file types that SELinux allows it to read, even if it is running with a process UID that is not allowed to read the file.  Similarly, it is allowed to write all SELinux writable types even if the UID would not normally allow the write.

Obviously most confined domains never need to have this access, but somehow over the years lots of domains were granted it.

I recently received an email asking about syslog generating lots of AVCs.  The writer said that he understood SELinux and had set up the types for syslog to write to, and the content was getting written properly.  But the kernel was generating an AVC every time the service started.

Here is the AVC.

Jul 09 15:24:57 audit[9346]: HOSTNAME AVC avc:  denied  { dac_override }  for  pid=9346 comm=72733A6D61696E20513A526567 capability=1  scontext=system_u:system_r:syslogd_t:s0  tcontext=system_u:system_r:syslogd_t:s0 tclass=capability permissive=0

Sadly the kernel is not in full auditing mode, so we don't know the path that the syslog process was trying to read or write.

Note: You can turn on full auditing using a command like: `auditctl -w /etc/shadow`. But this could affect your system performance.

But I had a guess on what could be causing the AVC's.

What causes DAC_OVERRIDE AVCs

There are a couple of easy places where a root process needs DAC_OVERRIDE.  The first is looking at the /etc/shadow file.

 ls -l /etc/shadow
----------. 1 root root 1474 Jul  9 14:02 /etc/shadow

As you can see from the permissions, no UID is allowed to read or write /etc/shadow, so the only way to examine this file is using DAC_OVERRIDE.  But I am pretty sure syslogd is not attempting to read this file.  (Other SELinux AVCs would be screaming if it was.)

The other location that can easily cause DAC_OVERRIDE AVC's is attempting to create content in the /root directory.

 ls -ld /root
dr-xr-x---. 19 root root 4096 Jul  9 15:59 /root

On Fedora, RHEL, and CentOS boxes, the /root directory is set with permissions that do not allow any process to write to it, not even a root process, unless it uses DAC_OVERRIDE.  This is a security measure which prevents processes running as root that drop privileges from being able to write content in /root.  If a process could write content in /root, it could modify the /root/.bashrc file.  This means an admin later logging into the system as root and executing a shell would run the .bashrc script with full privileges.  By setting the permissions on the /root directory to 550, the system is a little more secure, and admins know that only processes with DAC_OVERRIDE can write to this directory.
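A quick way to see why both of these locations need DAC_OVERRIDE is to look at the permission bits directly. Here is a minimal Python sketch; the paths are the same ones discussed above:

import os
import stat

# Inspect the DAC permission bits on the two paths discussed above.
for path in ("/etc/shadow", "/root"):
    mode = stat.S_IMODE(os.stat(path).st_mode)
    print(path, oct(mode))

# On a default Fedora/RHEL install you would expect something like:
#   /etc/shadow 0o0   - no UID is granted read or write, so even a root
#                       process needs CAP_DAC_OVERRIDE to open the file.
#   /root 0o550       - no write bit is set for anyone, so creating a file
#                       under /root also requires CAP_DAC_OVERRIDE.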

Well, this causes an issue.  It turns out that when a shell like bash starts, it wants to write the .bash_history file in its home dir; if the shell is running as root, it wants to write the /root/.bash_history file.  If the file does not exist, then the shell would require DAC_OVERRIDE to create it.  Luckily bash continues working fine if it can not write this file.

But if you are running on an SELinux system and a confined application launches bash, the kernel will generate an AVC message stating that the confined domain wants DAC_OVERRIDE.

If this situation happens, I recommend just adding a dontaudit rule to the policy.  Then SELinux will be silent about the denial, but the process will still not gain that access.

audit2allow -D -i /tmp/avc.log
#============= syslogd_t ==============
dontaudit syslogd_t self:capability dac_override;

To generate a policy module:

audit2allow -M mysyslog -D -i /tmp/avc.log
******************** IMPORTANT ***********************
To make this policy package active, execute:
semodule -i mysyslog.pp

CONCLUSION

Bottom line: DAC_OVERRIDE is a fairly dangerous access to grant, and it is often granted when it is not really necessary.  So I recommend fixing the permissions on files/directories or just adding dontaudit rules.


Episode 104 - The Gentoo security incident

Posted by Open Source Security Podcast on July 09, 2018 12:14 AM
Josh and Kurt talk about the Gentoo security incident. Gentoo did a really good job being open and dealing with the incident quickly. The basic takeaway from all this is to make sure your organization is forcing users to use two-factor authentication. The long term solution is going to be all identity providers forcing everyone to use 2FA.




Converting policy.yaml to a list of dictionaries

Posted by Adam Young on July 08, 2018 03:38 AM

The policy.yaml file generated from oslo has the following format:

# Intended scope(s): system
#"identity:update_endpoint_group": "rule:admin_required"

# Delete endpoint group.
# DELETE /v3/OS-EP-FILTER/endpoint_groups/{endpoint_group_id}
# Intended scope(s): system
#"identity:delete_endpoint_group": "rule:admin_required"

This is not very useful for anything other than feeding to oslo-policy to enforce. If you want to use these values for anything else, it would be much more useful to have each rule as a dictionary, and all of the rules in a list. Here is a little bit of awk to help out:

#!/usr/bin/awk -f
BEGIN {apilines=0; print("---")}
/#"/ {
    if (api == 1){
	printf("  ")
    }else{
	printf("- ")
    }
  split ($0,array,"\"")
  print ("rule:", array[2]);
  print ("  check:", array[4]);
  rule=0
}    
/# / {api=1;}
/^$/ {api=0; apilines=0;}
api == 1 && apilines == 0 {print ("- description:" substr($0,2))}
/# GET/  || /# DELETE/ || /# PUT/ || /# POST/ || /# HEAD/ || /# PATCH/ {
     print ("  " $2 ": " $3)
}
api == 1 { apilines = apilines +1 }

I have it saved in mungepolicy.awk. I ran it like this:

cat etc/keystone.policy.yaml.sample | ./mungepolicy.awk > /tmp/keystone.access.yaml

And the output looks like this:

---
- rule: admin_required
  check: role:admin or is_admin:1
- rule: service_role
  check: role:service
- rule: service_or_admin
  check: rule:admin_required or rule:service_role
- rule: owner
  check: user_id:%(user_id)s
- rule: admin_or_owner
  check: rule:admin_required or rule:owner
- rule: token_subject
  check: user_id:%(target.token.user_id)s
- rule: admin_or_token_subject
  check: rule:admin_required or rule:token_subject
- rule: service_admin_or_token_subject
  check: rule:service_or_admin or rule:token_subject
- description: Show application credential details.
  GET: /v3/users/{user_id}/application_credentials/{application_credential_id}
  HEAD: /v3/users/{user_id}/application_credentials/{application_credential_id}
  rule: identity:get_application_credential
  check: rule:admin_or_owner
- description: List application credentials for a user.
  GET: /v3/users/{user_id}/application_credentials
  HEAD: /v3/users/{user_id}/application_credentials
  rule: identity:list_application_credentials
  check: rule:admin_or_owner

Which is valid YAML. It might be a pain to deal with the verbs as separate keys. Ideally, they would be a list, too, but this will work for starters.
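Once it is in this shape the file is easy to consume from Python. Here is a minimal sketch, assuming PyYAML is installed and the output path used above:

import yaml

# Load the list of rule dictionaries produced by mungepolicy.awk.
with open("/tmp/keystone.access.yaml") as f:
    rules = yaml.safe_load(f)

# Example: show every rule whose check string is admin_or_owner.
for entry in rules:
    if entry.get("check") == "rule:admin_or_owner":
        print(entry["rule"], entry.get("GET", ""))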

Running OpenStack components on RHEL with Software Collections

Posted by Adam Young on July 07, 2018 01:50 PM

The Python world has long since embraced Python 3.  However, the stability guarantees of RHEL have limited the base OS to Python 2.7.  Now that I am running RHEL on my laptop, I have to find a way to work with Python 3.5 in order to contribute to OpenStack.  To further constrain myself, I do not want to “pollute” the installed Python modules by using pip to mix and match between upstream and downstream.  The solution is the Software Collections version of Python 3.5.  Here’s how I got it to work.

Start by enabling the Software Collections Yum repos and refreshing:

sudo subscription-manager repos --enable rhel-workstation-rhscl-7-rpms
sudo subscription-manager refresh

Now what I need is Python 3.5.  Since I did this via trial and error, I don’t have the exact yum command I used, but I ended up with the following rpms installed, and they were sufficient.

rh-python35-python-setuptools-18.0.1-2.el7.noarch
rh-python35-python-libs-3.5.1-11.el7.x86_64
rh-python35-python-pip-7.1.0-2.el7.noarch
rh-python35-scldevel-2.0-2.el7.x86_64
rh-python35-runtime-2.0-2.el7.x86_64
rh-python35-python-six-1.10.0-1.el7.noarch
rh-python35-python-devel-3.5.1-11.el7.x86_64
rh-python35-python-3.5.1-11.el7.x86_64

To enable the software collections:

scl enable rh-python35 bash

However, thus far there is no tox installed. I can get it using pip, and I’m OK with that so long as I do a user install.  Make sure you have run the above scl enable command so that this is done for the right version of Python.

 pip install --user --upgrade tox

This puts all the code in ~/.local/ as well as appending ~/.local/bin dir to the PATH env var.  You need to restart your terminal session to pick that up on first use.

Now I can run code in the Keystone repo.  For example, to build the sample policy.json files:

tox -e genpolicy

A Git Style change management for a Database driven app.

Posted by Adam Young on July 06, 2018 07:38 PM

The policy management tool I’m working on really needs revision and change management.  Since I’ve spent so much time with Git, it shapes how I think about change management.  So here is my attempt to lay out my current thinking for implementing a git-like scheme for managing policy rules.

A policy line is composed of two chunks of data.  A Key and a Value.  The keys are in the form

  identity:create_user.

Additionally, the keys are scoped to a specific service (Keystone, Nova, etc).

The value is the check string.  These are of the form

role:admin and project_id=target.project_id

It is the check string that is most important to revision control. This lends itself to an entity diagram like this:

Whether each of these gets its own table remains to be seen.  The interesting part is the rule_name to policy_rule mapping.

Let’s state that the policy_rule table entries are immutable.  If we want to change policy, we add a new entry and leave the old ones in there.  The new entry will have a new revision value.  For now, let’s assume revisions are integers and are monotonically increasing.  So, when I first upload the Keystone policy.json file, each entry gets a revision ID of 1.  In this example, all check_strings start off as “is_admin:True”.

Now let’s assume I modify the identity:create_user rule.  I’m going to arbitrarily say that the id for this record is 68.  I want to change it to:

role:admin and domain_id:target.domain_id

So we can do some scope checking.  This entry goes into the policy_rule table like so:

 

rule_name_id | check_string                               | revision
68           | is_admin:True                              | 1
68           | role:admin and domain_id:target.domain_id  | 2

From a storage perspective this is quite nice, but from a “what does my final policy look like” perspective it is a mess.

In order to build the new view, we need SQL along the lines of

select * from policy_rule where revision = ?

Let’s call this line_query and assume that when we call it, the parameter is substituted for the question mark.  We would then need code like this pseudo-code:

doc = dict()
for revision in range(1, max_revision + 1):
    for result in line_query.execute(revision):
        index = result['rule_name_id']
        doc[index] = result['check_string']

 

This would build a dictionary layer by layer through all the revisions.

So far so good, but what happens if we decide to revert and then go a different direction? Right now, we have a revision chain like this:

And if we keep going, we have,

But what happens if 4 was a mistake? We need to revert to 3 and create a new branch.

We have two choices. First, we could be destructive and delete all of the lines in revision 4, 5, and 6. This means we can never recreate the state of 6 again.

What if we don’t know that 4 is a mistake? What if we just want to try another route, but come back to 4,5, and 6 in the future?

We want this:

 

But how will we know to take the branch when we create the new doc?

It’s a database! We put it in another table.

revision_id | revision_parent_id
2           | 1
3           | 2
4           | 3
5           | 4
6           | 5
7           | 3
8           | 7
9           | 8

In order to recreate revision 9, we use a stack. Push 9 on the stack, then find the row with revision_id 9 in the table, push the revision_parent_id on the stack, and continue until there are no more rows.  Then, pop each revision_id off the stack and execute the same kind of pseudo code I posted above.

It is a lot.  It is kind of complicated, but it is the type of complicated that Python does well.  However, databases do not do this kind of iterative querying well.  It would take a stored procedure to perform this via a single database query.
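To make the approach concrete, here is a minimal sketch using sqlite3.  The policy_rule table and its columns follow the ones above; the parent table is never named in this post, so revision_parents and the exact schema are assumptions:

import sqlite3

def build_policy_doc(conn, revision_id):
    """Rebuild the policy document for a revision by walking its parent chain."""
    conn.row_factory = sqlite3.Row

    # Walk from the requested revision back to the root, collecting the chain.
    chain = []
    current = revision_id
    while current is not None:
        chain.append(current)
        row = conn.execute(
            "SELECT revision_parent_id FROM revision_parents WHERE revision_id = ?",
            (current,),
        ).fetchone()
        current = row["revision_parent_id"] if row else None

    # Replay the revisions oldest first, so later revisions overwrite earlier ones.
    doc = {}
    for revision in reversed(chain):
        for result in conn.execute(
            "SELECT rule_name_id, check_string FROM policy_rule WHERE revision = ?",
            (revision,),
        ):
            doc[result["rule_name_id"]] = result["check_string"]
    return doc

With the parent table above, build_policy_doc(conn, 9) would collect 9, 8, 7, 3, 2, 1 and then replay those revisions oldest first, which is the stack behavior described earlier.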

Talking through this has encouraged me to take another look at using git as the backing store instead of a relational database.