Fedora summer-coding Planet

Photography - Why You Should Use JPG (not RAW)

Posted by William Brown on August 05, 2018 02:00 PM

Photography - Why You Should Use JPG (not RAW)

When I started my modern journey into photography, I simply shot in JPG. I was happy with the results, and the images I was able to produce. It was only later that I was introduced to a now good friend and he said: “You should always shoot RAW! You can edit so much more if you do.”. It’s not hard to find many ‘beginner’ videos all touting the value of RAW for post editing, and how it’s the step from beginner to serious photographer (and editor).

Today, I would like to explore why I have turned off RAW on my camera bodies for good. This is a deeply personal decision, and I hope that my experience helps you to think about your own creative choices. If you want to stay shooting RAW and editing - good on you. If this encourages you to try turning back to JPG - good on you too.

There are two primary reasons for why I turned off RAW:

  • Colour reproduction of in body JPG is better to the eye today.
  • Photography is about composing an image from what you have infront of you.

Colour is about experts (and detail)

I have always been unhappy with the colour output of my editing software when processing RAW images. As someone who is colour blind I did not know if it was just my perception, or if real issues existed. No one else complained so it must just be me right!

Eventually I stumbled on an article about how to develop real colour and extract camera film simulations for my editor. I was interested in both the ability to get true reflections of colour in my images, but also to use the film simulations in post (the black and white of my camera body is beautiful and soft, but my editor is harsh).

I spent a solid week testing and profiling both of my cameras. I quickly realised a great deal about what was occuring in my editor, but also my camera body.

The editor I have, is attempting to generalise over the entire set of sensors that a manufacturer has created. They are also attempting to create a true colour output profile, that is as reflective of reality as possible. So when I was exporting RAWs to JPG, I was seeing the differences between what my camera hardware is, vs the editors profiles. (This was particularly bad on my older body, so I suspect the RAW profiles are designed for the newer sensor).

I then created film simulations and quickly noticed the subtle changes. Blacks were blacker, but retained more fine detail with the simulation. Skin tone was softer. Exposure was more even across a variety of image types. How? RAW and my editor is meant to create the best image possible? Why is a film-simulation I have “extracted” creating better images?

As any good engineer would do I created sample images. A/B testing. I would provide the RAW processed by my editor, and a RAW processed with my film simulation. I would vary the left/right of the image, exposure, subject, and more. After about 10 tests across 5 people, only on one occasion did someone prefer the RAW from my editor.

At this point I realised that my camera manufacturer is hiring experts who build, live and breath colour technology. They have tested and examined everything about the body I have, and likely calibrated it individually in the process to make produce exact reproductions as they see in a lab. They are developing colour profiles that are not just broadly applicable, but also pleasing to look at (even if not accurate reproductions).

So how can my film simulations I extracted and built in a week, measure up to the experts? I decided to find out. I shot test images in JPG and RAW and began to provide A/B/C tests to people.

If the editor RAW was washed out compared to the RAW with my film simulation, the JPG from the body made them pale in comparison. Every detail was better, across a range of conditions. The features in my camera body are better than my editor. Noise reduction, dynamic range, sharpening, softening, colour saturation. I was holding in my hands a device that has thousands of hours of expert design, that could eclipse anything I built on a weekend for fun to achieve the same.

It was then I came to think about and realise …

Composition (and effects) is about you

Photography is a complex skill. It’s not having a fancy camera and just clicking the shutter, or zooming in. Photography is about taking that camera and putting it in a position to take a well composed image based on many rules (and exceptions) that I am still continually learning.

When you stop to look at an image you should always think “how can I compose the best image possible?”.

So why shoot in RAW? RAW is all about enabling editing in post. After you have already composed and taken the image. There are valid times and useful functions of editing. For example whitebalance correction and minor cropping in some cases. Both of these are easily conducted with JPG with no loss in quality compared to the RAW. I still commonly do both of these.

However RAW allows you to recover mistakes during composition (to a point). For example, the powerful base-curve fusion module allows dynamic range “after the fact”. You may even add high or low pass filters, or mask areas to filter and affect the colour to make things pop, or want that RAW data to make your vibrance control as perfect as possible. You may change the perspective, or even add filters and more. Maybe you want to optimise de-noise to make smooth high ISO images. There are so many options!

But all these things are you composing after. Today, many of these functions are in your camera - and better performing. So while I’m composing I can enable dynamic range for the darker elements of the frame. I can compose and add my colour saturation (or remove it). I can sharpen, soften. I can move my own body to change perspective. All at the time I am building the image in my mind, as I compose, I am able to decide on the creative effects I want to place in that image. I’m not longer just composing within a frame, but a canvas of potential effects.

To me this was an important distinction. I always found I was editing poorly-composed images in an attempt to “fix” them to something acceptable. Instead I should have been looking at how to compose them from the start to be great, using the tool in my hand - my camera.

Really, this is a decision that is yours. Do you spend more time now to make the image you want? Or do you spend it later editing to achieve what you want?


Photography is a creative process. You will have your own ideas of how that process should look, and how you want to work with it. Great! This was my experience and how I have arrived at a creative process that I am satisfied with. I hope that it provides you an alternate perspective to the generally accepted “RAW is imperative” line that many people advertise.

Dev Null Productions

Posted by Mo Morsi on July 24, 2018 01:35 PM

After my Departure from RedHat I was able to get some RnR, but quickly wanted to get a head start of my next venture. This is because I decided to put a cap on the amount of time that would be dedicated to trying to make "it" happen, and would pause at regular checkpoints to monitor progress. This is not to say I'm going to quit the endeavor at that point in the future (the timeframe of which I'm keeping private), but the intent is to drive focus and keep the ball moving forward objectively. While Omega was a great project to work on, both fun and the source of much growth and experience, I am not comfortable with the amount of time spent on it, for what was gained. All hindsight is 20/20, but every good trader knows when to cut losses.

Dev Null Productions LLC. was launched four months ago in April 2018 and we haven't looked back. Our flagship product, Wipple XRP Intelligence was launched shortly after, providing realtime access to the XRP network and high level stats and reporting. The product is under continued development and we've begun a social-media based marketing drive to promote it. Things are still early, and there is still aways to go & obstacles to overcome (not to mention the crypto-currency bear market that we've been in for the last 1/2 year), but the progress has been great, and there are many more awesome features in the queue

This Thursday, I am giving a presentation on XRP to the Syracuse Software Development Meetup, hosted at the Tech Garden, a tech incubator in Syracuse, NY. I aim to go over the XRP protocol, discussing both the history of Ripple and the technical details, as well as common use cases and gotchyas from our experiences. The event is looking very solid, and there is already a large turnout and some great momentum growing, so I'm excited to participate in it and see how it all goes. While we're still in the early phases of development, I'm hoping to drive some interest in the project, and perhaps meet collaborators who'd like to come onboard for a percentage of ultimate profits!

Be sure to stay tuned for more updates and developments, until then, keep Rippling!

Ripple moon

GSoC Fedora Happines Packets Update – Week 8th and 9th

Posted by Algogator on July 19, 2018 05:15 AM

The talk got accepted a month ago (https://pagure.io/flock/issue/55) but I didn’t know if I’d be able to attend it. I couldn’t find an appointment at the Houston consulate for July, the entire month was booked but luckily someone cancelled last week so I made a quick trip to Houston. And I got my passport back […]

The post GSoC Fedora Happines Packets Update – Week 8th and 9th appeared first on Anna Philips.

fedmsg on CentOS

Posted by Algogator on June 27, 2018 06:46 PM

Zeromq fedmsg is “a library built on ZeroMQ using the PyZMQ Python bindings”. So I thought it might help to learn a little bit more about ZeroMQ ( which is not a messaging queue). Contexts – You usually have one context per process and which manages the sockets. Socket – It can be configured and […]

The post fedmsg on CentOS appeared first on Anna Philips.

Fedora Happiness Packets on CentOS 7

Posted by Algogator on June 15, 2018 05:55 PM

I’m writing this for future me. My default OS is Ubuntu so setting up the project on CentOS was new for me. gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong –param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong –param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC […]

The post Fedora Happiness Packets on CentOS 7 appeared first on Anna Philips.

GSoC update Week 1 and 2

Posted by Algogator on June 12, 2018 07:03 PM

Week 1: Setup a dev instance of Fedora Happiness Packets on CentOS. Took some time to install all the dependencies. It was so much easier on Ubuntu. Get a dev instance from the Infra team https://pagure.io/fedora-infrastructure/issue/6690 Created my wiki user page over here Started reading and tweaking with mozilla-django-oidc to authenticate with FAS Week 2 […]

The post GSoC update Week 1 and 2 appeared first on Anna Philips.

OpenID Connect and Authenticating against Ipsilon

Posted by Algogator on June 08, 2018 02:20 PM

Step 1 – Get your credentials You need to register your app with the OP https://iddev.fedorainfracloud.org/ I used https://github.com/puiterwijk/oidc-register for that. pip install oidc-register To register you need the provider, application, and redirect(callback) URLs oidc-register https://iddev.fedorainfracloud.org/ Note: You can’t use http unless it’s with a localhost.  After you do that you should have […]

The post OpenID Connect and Authenticating against Ipsilon appeared first on Anna Philips.

GSoC 2018: Week 2

Posted by Amitosh Swain Mahapatra on May 30, 2018 06:14 AM

This is Status Report for Fedora App filled by participants on a weekly basis.

Status Report for Amitosh Swain Mahapatra (amitosh)

  • Fedora Account: amitosh
  • IRC: amitosh (found in #fedora, #fedora-dotnet, #fedora-summer-coding, #fedora-commops)
  • Fedora Wiki User Page: amitosh

This time, I am working on improving the Fedora Community App with the Fedora project. It’s been a week since we started off our coding on may 14.

The Fedora App is a central location for Fedora users and innovators to stay updated on The Fedora Project. News updates, social posts, Ask Fedora, as well as articles from Fedora Magazine are all held under this app.

Tasks Completed

Here is the summary of my work in the second week:

We now have offline capablilties in the app (#62). The app now caches the content from Fedora Magazine, FedoCal and Fedora Social. Every time we load the app, we refersh the cache from the API end points in the background. We no longer block the user from interacting with the app and the content also loads a lot faster. (#61)

It still has some rough edges, we will be addressing them in the following weeks.

There are only two hard things in Computer Science: cache invalidation and naming things.

– Phil Karlton

Two Hard Things from Martin Fowler

And in particularly JS, we shall modify it to:

There are only two hard things in Computer Science: 1> Cache invalidation
3> Async callbacks
2> Naming things.

And fortunately, RxJS provides an elegant solution to (3).

This week was particularly challenging and exciting. RxJS Observables and reactive programming patterns was really interesting to learn. Learn RxJS by @btroncone was a great resource that helped me to quickly grasp the concepts. Many thanks!

You can find the weekly report for Week 1 here.

What’s next ?

I’m working on creating unit tests for various services we use in our app and the integration tests for the different screens.

Google Summer of Code 2018 with Fedora

Posted by Algogator on May 24, 2018 03:37 PM

I got selected to work with Fedora on the Fedora Happiness Packets for GSoC 2018 😀 A shout-out to Jona and Bee for helping me with the proposal and initial PRs! About me: Hi there! My name is Anna. I go by the username Algogator on IRC and elsewhere. I study computer science at UTA. […]

The post Google Summer of Code 2018 with Fedora appeared first on Anna Philips.

GSoC 2018: Week 1

Posted by Amitosh Swain Mahapatra on May 22, 2018 05:24 PM

This is Status Report for Fedora App filled by participants on a weekly basis.

Status Report for Amitosh Swain Mahapatra (amitosh)

  • Fedora Account: amitosh
  • IRC: amitosh (found in #fedora, #fedora-dotnet, #fedora-summer-coding, #fedora-commops)
  • Fedora Wiki User Page: amitosh

This time, I am working on improving the Fedora Community App with the Fedora project. It’s been a week since we started off our coding on may 14.

The Fedora App is a central location for Fedora users and innovators to stay updated on The Fedora Project. News updates, social posts, Ask Fedora, as well as articles from Fedora Magazine are all held under this app.

Progress Report

Here is the summary of my work in the first week:

  1. We now have a system for loading configuration such as API keys, depending on the environment we are running. It also allows us to load them from external sources (#55). This would help us to remove the need to store API keys under version control (#52).
  2. The JS to TS conversion I started earlier (#4, #6) is finally complete. All of our code is now fully checked by TypeScript (TS) compiler for type safety (#54, #56). Except for JSON responses, where typing things will be a waste of time as TS does not provide run-time type safety, all other functions and services are now checked using TS interfaces. (#57).
  3. Our code is now following Angular patterns even more closely. I have standardized the Providers who use to return a callback or a Promise to return Observables. We now load network data a bit faster due to improved concurrency in the code.
  4. The documentation coverage of our code has increased. As the part of conversion, I have added TS doc comments describing the usage of various Providers, Services and Components, what the expect and what they return.
  5. The annoying white screen on launch (#16) in certain devices is now gone! (#47)
  6. After the restructuring, we no longer have any in-memory caching. I will be working on offline storage and caching implementation in this week.

What’s next ?

I am working to bring offline storage and sync to Fedora Magazine and Fedora Social sections of the app. This will both improve the usability and performance of the app. From a UX perspective, we will start syncing data rather than blocking the user from doing anything.

GSoC 2018: Kicking off the Coding

Posted by Amitosh Swain Mahapatra on May 14, 2018 03:55 PM

It’s May 14, and this is when we officially start coding for GSoC, 2018 edition. This time, I would be working on improving the Fedora Community App with the Fedora project. This marks the beginning of a journey of 3 months of coding, patching, debugging, git (mess) and the awesome discussions with my mentors and the community.

The Fedora App is a central location for Fedora users and innovators to stay updated on The Fedora Project. News updates, social posts, Ask Fedora, as well as articles from Fedora Magazine are all held under this app.

For the first month of my GSoC coding, I will be working of improving the code quality by completing the TypeScript conversion I started earlier as well as I will also be working on bring offline capabilities to Fedora Magazine reader and the Fedora Social.

Why TypeScript?

TypeScript (TS) is a super-set of JavaScript (JS) that brings optional static typing features to JS. TS is the language used by Ionic - the framework on which the Fedora Community App is built. I personally believe, in a medium to large scale project, a static type system is necessary - it especially helps to catch errors very early and allows to perform safe refactors, but I also agree that there are cases where it’s useful to fallback to dynamic typing.

TS does just this. It is implemented as a syntax extension to JavaScript which is transpiled into JS using the TS compiler. Every valid JS syntax is also valid TS syntax, so it becomes easier to port a project - even partially to use TS features, without incurring the cost of a full project rewrite.

In my earlier work in updating the source code to the latest version of Ionic, I started a conversion from JS to TS. Essential parts are already in place, but still there is a big chunk left for conversion. In my first week of coding, I will be working on this conversion.

Offline Capabilities in Fedora App

Currently, the app lacks any offline capabilities - you always require an internet connection to perform most of the actions.

My work will be to bring offline storage and sync to Fedora Magazine and Fedora Social sections of the app. This will both improve the apps usability and performance. From a UX perspective, we will start syncing data rather than blocking the user from doing anything (as it is currently). The user can continue to act on cached items, while we continue to fetch new items in the background.

Further in second month, I will be working on implementing a lightweight blog reader and caching complete offline copies of user-selected articles.

RETerm to The Terminal with a GUI

Posted by Mo Morsi on May 09, 2018 02:59 PM

When it comes to user interfaces, most (if not all) software applications can be classified into one of three categories:

  • Text Based - whether they entail one-off commands, interactive terminals (REPL), or text-based visual widgets, these saw a major rise in the 50s-80s though were usurped by GUIs in the 80s-90s
  • Graphical - GUIs, or Graphical User Interfaces, facilitate creating visual windows which the user may interact with via the mouse or keyboard. There are many different GUI frameworks available for various platforms
  • Web Based - A special type of graphical interface rendered via a web browser, many applications provide their frontend via HTML, Javascript, & CSS
Interfaces comparison

In recent years modern interface trends seem to be moving in the direction of the Web User Interfaces (WUI), with increasing numbers of apps offering their functionality primarily via HTTP. That being said GUIs and TUIs (Text User Interfaces) are still an entrenched use case for various reasons:

  • Web browsers, servers, and network access may not be available or permissable on all systems
  • Systems need mechanisms to access and interact with the underlying components, incase higher level constructs, such as graphics and network subsystems fail or are unreliable
  • Simpler text & graphical implementations can be coupled and optimized for the underlying operational environment without having to worry about portability and cross-env compatability. Clients can thus be simpler and more robust.

Finally there is a certain pleasing ascethic to simple text interfaces that you don't get with GUIs or WUIs. Of course this is a human-preference sort-of-thing, but it's often nice to return to our computational roots as we move into the future of complex gesture and voice controlled computer interactions.

Scifi terminal

When working on a recent side project (to be announced), I was exploring various concepts as to the user interface which to throw ontop of it. Because other solutions exist in the domain which I'm working in (and for other reasons), I wanted to explore something novel as far as user interaction, and decided to expirement with a text-based approach. ncurses is the goto library for this sort of thing, being available on most modern platforms, along with many widget libraries and high level wrappers


Unfortunately ncurses comes with alot of boilerplate and it made sense to seperate that from the project I intend to use this for. Thus the RETerm library was born, with the intent to provide a high level DSL to implmenent terminal interfaces and applications (... in Ruby of couse <3 !!!)

Reterm sc1

RETerm, aka the Ruby Enhanced TERMinal allows the user to incorporate high level text-based widgets into an orgnaized terminal window, with seemless standardized keyboard interactions (mouse support is on the roadmap to be added). So for example, one could define a window containing a child widget like so:

require 'reterm'
include RETerm

value = nil

init_reterm {
  win = Window.new :rows => 10,
                   :cols => 30

  slider = Components::VSlider.new
  win.component = slider
  value = slider.activate!

puts "Slider Value: #{value}"

This would result in the following interface containing a vertical slider:

Reterm sc2

RETerm ships with many built widgets including:

Text Entry

Reterm sc3

Clickable Button

Reterm sc4

Radio Switch/Rocker/Selectable List

Reterm sc5 Reterm sc6 Reterm sc7

Sliders (both horizontal and vertical)


Ascii Text (with many fonts via artii/figlet)

Reterm sc8

Images (via drawille)

Reterm sc9

RETerm is now available via rubygems. To install, simplly:

  $ gem install reterm

That's All Folks... but wait there is more!!! Afterall:

Delorian meme

For a bit of a value-add, I decided to implement a standard schema where text interfaces could be described in a JSON config file and loaded by the framework, similar to xml schemas which GTK and Android use for their interfaces. One can simply describe their interface in JSON and the framework will instantiate the corresponding text interface:

  "window" : {
    "rows"      : 10,
    "cols"      : 50,
    "border"    : true,
    "component" : {
      "type" : "Entry",
      "init" : {
        "title" : "<C>Demo",
        "label" : "Enter Text: "
Reterm sc10

To assist in generating this schema, I implemented a graphical designer, where components can be dragged and dropped into a 2D canvas to layout the interface.

That's right, you can now use a GUI based application to design a text-based interface.

Retro meme

The Designer itself can be found in the same repo as the RETerm project, loaded in the "designer/" subdir.

Reterm designer

To use if you need to install visualruby (a high level wrapper to ruby-gnome) like so:

  $ gem install visualruby

And that's it! (for real this time) This was certainly a fun side-project to a side-project (toss in a third "side-project" if you consider the designer to be its own thing!). As I to return to the project using RETerm, I aim to revisit it every so often, adding new features, widgets, etc....



Why I still choose Ruby

Posted by Mo Morsi on May 09, 2018 02:59 PM

With the plethora of languages available to developers, I wanted to do a quick follow-up post as to why given my experience in many different environments, Ruby is still the goto language for all my computational needs!

Prg mtn

While different languages offer different solutions in terms of syntax support, memory managment, runtime guarantees, and execution flows; the underlying arithmatic, logical, and I/O hardware being controlled is the same. Thus in theory, given enough time and optimization the performance differences between languages should go to 0 as computational power and capacity increases / goes to infinity (yes, yes, Moore's law and such, but lets ignore that for now).

Of course different classes of problem domains impose their own requirements,

  • real time processing depends low level optimizations that can only be done in assembly and C,
  • data crunching and process parallelization often needs minimal latency and optimized runtimes, something which you only get with compiled/static-typed languages such as C++ and Java,
  • and higher level languages such as Ruby, Python, Perl, and PHP are great for rapid development cycles and providing high level constructs where complicated algorithms can be invoked via elegant / terse means.

But given the rapid rate of hardware performance in recent years, whole classes of problems which were previously limited to 'lower-level' languages such as C and C++ are now able to be feasbily implemented in higher level languages.

Computer power


Thus we see high performance financial applications being implemented in Python, major websites with millions of users a day being implemented in Ruby and Javascript, massive data sets being crunched in R, and much more.

So putting the performance aspect of these environments aside we need to look at the syntactic nature of these languages as well as the features and tools they offer for developers. The last is the easiest to tackle as these days most notable languages come with compilers/interpreters, debuggers, task systems, test suites, documentation engines, and much more. This was not always the case though as Ruby was one of the first languages that pioneered builtin package management through rubygems, and integrated dependency solutions via gemspecs, bundler, etc. CPAN and a few other language-specific online repositories existed before, but with Ruby you got integration that was a core part of the runtime environment and community support. Ruby is still known to be on the leading front of integrated and end-to-end solutions.

Syntax differences is a much more difficult subject to objectively dicuss as much of it comes down to programmer preference, but it would be hard to object to the statement that Ruby is one of the most Object Oriented languages out there. It's not often that you can call the string conversion or type identification methods on ALL constructs, variables, constants, types, literals, primitives, etc:

  > 1.to_s
  => "1"
  > 1.class
  => Integer

Ruby also provides logical flow control constructs not seen in many other languages. For example in addition to the standard if condition then dosomething paradigm, Ruby allows the user to specify the result after the predicate, eg dosomething if condition. This simple change allows developers to express concepts in a natural manner, akin to how they would often be desrcibed between humans. In addition to this, other simple syntax conveniences include:

  • The unless keyword, simply evaluating to if not
          file = File.open("/tmp/foobar", "w")
          file.write("Hello World") unless file.exist?
  • Methods are allowed to end with ? and ! which is great for specifying immutable methods (eg. Socket.open?), mutable methods and/or methods that can thrown an exception (eg. DBRecord.save!)
  • Inclusive and exclusive ranges can be specified via parenthesis and two or three elipses. So for example:
          > (1..4).include?(4)
          => true
          > (1...4).include?(4)
          => false
  • The yield keywork makes it trivial for any method to accept and invoke a callback during the course of its lifetime
  • And much more

Expanding upon the last, blocks are a core concept in Ruby, once which the language nails right on the head. Not only can any function accept an anonymous callback block, blocks can be bound to parameters and operated on like any other data. You can check the number of parameters the callbacks accept by invoking block.arity, dynamically dispatch blocks, save them for later invokation and much more.

Due to the asynchronous nature of many software solutions (many problems can be modeled as asynchronous tasks) blocks fit into many Ruby paradigms, if not as the primary invocation mechanism, then as an optional mechanism so as to enforce various runtime guarantees:

  File.open("/tmp/foobar"){ |file|
    # do whatever with file here

  # File is guaranteed to be closed here, we didn't have to close it ourselves!

By binding block contexts, Ruby facilitates implementing tightly tailored solutions for many problem domains via DSLs. Ruby DSLs exists for web development, system orchestration, workflow management, and much more. This of course is not to mention the other frameworks, such as the massively popular Rails, as well as other widely-used technologies such as Metasploit

Finally, programming in Ruby is just fun. The language is condusive to expressing complex concepts elegantly, jives with many different programming paradigms and styles, and offers a quick prototype to production workflow that is intuitive for both novice and seasoned developers. Nothing quite scratches that itch like Ruby!

Doge ruby

Into The Unknown - My Departure from RedHat

Posted by Mo Morsi on May 09, 2018 02:59 PM

In May 2006, a young starry eyed intern walked into the large corporate lobby of RedHat's Centential Campus in Raleigh, NC, to begin what would be a 12 year journey full of ups and downs, break-throughs and setbacks, and many many memories. Flash forward to April 2018, when the "intern-turned-hardend-software-enginner" filed his resignation and ended his tenure at RedHat to venture into the risky but exciting world of self-employment / entrepreneurship... Incase you were wondering that former-intern / Software Engineer is myself, and after nearly 12 years at RedHat, I finished my last day of employment on Friday April 13th, 2018.

Overall RedHat has been a great experience, I was able to work on many ground-breaking products and technologies, with many very talented individuals from across the spectrum and globe, in a manner that facilitated maximum professional and personal growth. It wasn't all sunshine and lolipops though, there were many setbacks, including many cancelled projects and dead-ends. That being said, I felt I was always able to speak my mind without fear of reprocussion, and always strived to work on those items that mattered the most and had the furthest reaching impact.

Some (but certainly not all) of those items included:

  • The highly publicized, but now defunct, RHX project
  • The oVirt virtualization management (cloud) platform, where I was on the original development team, and helped build the first prototypes & implementation
  • The RedHat Ruby stack, which was a battle to get off the ground (given the prevalence of the Java and Python ecosystems, continuing to to this day). This is one of the items I am most proud of, we addressed the needs of both the RedHat/Fedora and Ruby communities, building the necessary bridge logic to employ Ruby and Rails solutions in many enterprise production environments. This continues to this day as the team continuously stays ontop of upstream Ruby developments and provide robust support and solutions for downstream use
  • The RedHat CloudForms projects, on which I worked on serveral interations, again including initial prototypes and standards, as well as ManageIQ integration.
  • ReFS reverse engineering and parser. The last major research topic that I explored during my tenure at RedHat, this was a very fun project where I built upon the sparse information about the filesystem internals that's out there, and was able to deduce the logic up to the point of being able to read directory lists and file contents and metadata out of a formatted filesystem. While there is plenty of work to go on this front, I'm confident that the published writeups are an excellent launching point for additional research as well as the development of formal tools to extract data from the filesystem.

My plans for the immediate future are to take a short break then file to form a LLC and explore options under that umbrella. Of particular interest are crypto-currencies, specifically Ripple. I've recently begun developing an integrated wallet, ledger and market explorer, and statistical analysis framework called Wipple which I'm planning to continue working on and if all goes according to plan, generating some revenue from. There is alot of ??? between here and there, but thats the name of the game!

Until then, I'd like to thank everyone who helped me do my thing at RedHat, from landing my initial internships and the full-time after that, to showing me the ropes, and not shooting me down when I put myself out there to work on and promote our innovation solutions and technologies!


Data &amp; Market Analysis in C++, R, and Python

Posted by Mo Morsi on May 09, 2018 02:59 PM

In recent years, since efforts on The Omega Project and The Guild sort of fizzled out, I've been exploring various areas of interest with no particular intent other than to play around with some ideas. Data & Financial Engineering was one of those domains and having spent some time diving into the subject (before once again moving on to something else altogether) I'm sharing a few findings here.

My journey down this path started not too long after the Bitcoin Barber Shop Pole was completed, and I was looking for a new project to occupy my free time (the little of it that I have). Having long since stepped down from the SIG315 board, but still renting a private office at the space, I was looking for some way to incorporate that into my next project (besides just using it as the occasional place to work). Brainstorming a bit, I settled on a data visualization idea, where data relating to any number of categories would be aggregated, geotagged, and then projected onto a virtual globe. I decided to use the Marble widget library, built ontop of the QT Framework and had great success:


The architecture behind the DataChoppa project was simple, a generic 'Data' class was implemented using smart pointers ontop of which the Facet Pattern was incorporated, allowing data to be recorded from any number of sources in a generic manner and represented via convenient high level accessors. This was all collected via synchronization and generation plugins which implement a standarized interface whose output was then fed onto a queue on which processing plugings were listening, selecting the data that they were interested in to be operated on from there. The Processors themselves could put more data onto the queue after which the whole process was repeated ad inf., allowing each plugin to satisfy one bit of data-related functionality.

Datachoppa arch

Core Generic & Data Classes

namespace DataChoppa{
  // Generic value container
  class Generic{
      Map<std::string, boost::any> values;
      Map<std::string, std::string> value_strings;

  namespace Data{
    /// Data representation using generic values
    class Data : public Generic{
        Data() = default;
        Data(const Data& data) = default;

        Data(const Generic& generic, TYPES _types, const Source* _source) :
          Generic(generic), types(_types), source(_source) {}

        bool of_type(TYPE type) const;

        Vector to_vector() const;

        TYPES types;

        const Source* source;
    }; // class Data
  }; // namespace Data
}; // namespace DataChoppa

The Process Loop

  namespace DataChoppa {
    namespace Framework{
      void Processor::process_next(){
        if(to_process.empty()) return;
        Data::Data data = to_process.first();
        Plugins::Processors::iterator plugin = plugins.begin();
        while(plugin != plugins.end()) {
          Plugins::Meta* meta = dynamic_cast<Plugins::Meta*>(*plugin);
          //LOG(debug) << "Processing " << meta->id;
          }catch(const Exceptions::Exception& e){
            LOG(warning) << "Error when processing: " << e.what()
                         << " via " << meta->id;
    }; /// namespace Framework
  }; /// namespace DataChoppa

The HTTP Plugin (abridged)

namespace DataChoppa {
  namespace Plugins{
    class HTTP : public Framework::Plugins::Syncer,
                 public Framework::Plugins::Job,
                 public Framework::Plugins::Meta {
        /// ...

        /// sync - always return data to be added to queue, even on error
        Data::Vector sync(){
          String _url = url();
          Network::HTTP::SyncRequest request(_url, request_timeout);

          for(const Network::HTTP::Header& header : headers())

          int attempted = 0;
          Network::HTTP::Response response(request);

          while(attempts == -1 || attempted &lt; attempts){


              if(attempted == attempts){
                Data::Data result = response.to_error_data();
                result.source = &source;
                return result.to_vector();

              if(attempted == attempts){
                Data::Data result = response.to_error_data();
                result.source = &source;
                return result.to_vector();

              Data::Data result = response.to_data();
              result.source = &source;
              return result.to_vector();

          /// we should never get here
          return Data::Vector();
  }; // namespace Plugins
}; // namespace DataChoppa

Overall I was pleased with the result (and perhaps I should have stopped there...). The application collected and aggregated data from many sources including RSS feeds (google news, reddit, etc), weather sources (yahoo weather, weather.com), social networks (facebook, twitter, meetup, linkedin), chat protocols (IRC, slack), financial sources, and much more. While exploring the last I discovered the world of technical analysis and began incorporating many various market indicators into a financial analysis plugin for the project.

The Market Analysis Architecture

Datachoppa extractors Datachoppa annotators

Aroon Indicator (for example)

namespace DataChoppa{
  namespace Market {
    namespace Annotators {
      class Aroon : public Annotator {
          double aroon_up(const Quote& quote, int high_offset, double range){
            return ((range-1) - high_offset) / (range-1) * 100;

          DoubleVector aroon_up(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
            return quotes.collect<DoubleVector>([extrema, range](const Quote& q, int i){
                     return aroon_up(q, extrema->high_offsets[i], range);

          double aroon_down(const Quote& quote, int low_offset, double range){
            return ((range-1) - low_offset) / (range-1) * 100;

          DoubleVector aroon_down(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
            return quotes.collect<DoubleVector>([extrema, range](const Quote& q, int i){
                     return aroon_down(q, extrema->low_offsets[i], range);

          AnnotationList annotate() const{
            const Quotes& quotes = market->quotes;
            if(quotes.size() < range) return AnnotationList();

            const Annotations::Extrema* extrema = aroon_extrema(market, range);
                    Annotations::Aroon* aroon = new Annotations::Aroon(range);
                                        aroon->upper = aroon_up(market->quotes, extrema, range);
                                        aroon->lower = aroon_down(market->quotes, extrema, range);
            return aroon->to_list();
      }; /// class Aroon
    }; /// namespace Annotators
  }; /// namespace Market
}; // namespace DataChoppa

The whole thing worked great, data was pulled in both real time and historical from yahoo finance (until they discontinued it... from then it was google finance), the indicators were run, and results were output. Of course, making $$$ is not as simple as just crunching numbers, and being rather naive I just tossed the results of the indicators into weighted "buckets" and backtested based on simple boolean flags based on the computed signals against threshold values. Thankfully I backtested though as the performance was horrible as losses greatly exceed profits :-(

At this point I should take a step back and note that my progress so far was the result of the availibilty of alot of great resources (we really live in the age of accelerated learning). Specifically the following are indispensible books & sites for those interested in this subject:

  • stockcharts.com - Information on any indicator can be found on this site with details on how it is computed and how it can be used
  • investopedia - Sort of the Wikipedia of investment knowledge, offers great high level insights into how market works and the financial world as it stands
  • Beyond Candlesticks - Though candlestick patterns have limited use, this is great intro to the subject, and provides a good into to reading charts.
  • Nerds on Wall Street - A great book detailing the history of computational finance. Definetly must read if you are new to the domain as it provides a concise high level history on how markets have worked the last few centuries and various computations techniques employed to Seek Alpha
  • High Probability Trading - Provides insights as to the mentality and common pitfalls when trading.
Beyond candlesticksNerds on wallstreetHigh prob trading

The last book is an excellent resource which conveys the importance of money and risk management, as well as the necessity to combine in all factors, or as many factors as you can, when making financial decisions. In the end, I feel this is the gist of it, it's not soley a matter of luck (though there is an aspect of that to this), but rather patience, discipline, balance, and most importantly focus (similar to Aikido but that's a topic for another time). There is no shorting it (unless you're talking about the assets themselves!), and if one does not have / take the necessary time to research and properly plan and out and execute strategies, they will most likely fail (as most do according to the numbers).

It was at this point that I decided to take a step back and restrategize, and having reflected and discussed it over with some acquaintances, I hedged my bets, cut my losses (tech-wise) and switched from C++ to another platform which would allow me prototype and execute ideas quicker. A good amount of time has gone into the C++ project and it worked great, but it did not make sense to continue via a slower development cycle when faster options are available (and afterall every engineer knows time is our most precious resource).

Python and R are the natural choices for this project domain, as there is extensive support in both languages for market analysis, backtesting, and execution. I have used Python at various, points in the past so it was easy to hit the ground running; R was new but by this time no language really poses a serious surprise, the best way I can describe it is spreadsheets on steroids (not exactly, as rather than spreadsheets, data frames and matrixes are the core components, but one can imagine R as being similar to the central execution environment behind Excel, Matlab, or other statistical-software).

I quickly picked up quantmod and prototyped some volatility, trend-following, momentum, and other analysis signal generators in R, plotting them using the provided charting interface. R is a great language for this sort of data manipulation, one can quickly load up structured data from CSV files or online resources, splice it and dice it, chunk it and dunk it, organize it and prioritize it, according to any arithmatic, statistical, or linear/non-linear means which they desire. Quickly loading a new 'view' on the data is as simply as a line of code, and operations can quickly be chained together at high performance.

Volatility indicator in R (consolidated)

quotes <- load_default_symbol("volatility")

quotes.atr <- ATR(quotes, n=ATR_RANGE)

quotes.atr$tr_atr_ratio <- quotes.atr$tr / quotes.atr$atr
quotes.atr$is_high      <- ifelse(quotes.atr$tr_atr_ratio > HIGH_LEVEL, TRUE, FALSE)

# Also Generate ratio of atr to close price
quotes.atr$atr_close_ratio <- quotes.atr$atr / Cl(quotes)

# Generate rising, falling, sideways indicators by calculating slope of ATR regression line
atr_lm       <- list()
atr_lm$df    <- data.frame(quotes.atr$atr, Time = index(quotes.atr))
atr_lm$model <- lm(atr ~ poly(Time, POLY_ORDER), data = atr_lm$df) # polynomial linear model

atr_lm$fit   <- fitted(atr_lm$model)
atr_lm$diff  <- diff(atr_lm$fit)
atr_lm$diff  <- as.xts(atr_lm$diff)

# Current ATR / Close Ratio
quotes.atr.abs_per <- median(quotes.atr$atr_close_ratio[!is.na(quotes.atr$atr_close_ratio)])

# plots
addTA(quotes.atr$tr, type="h")
addTA(as.xts(as.logical(quotes.atr$is_high), index(quotes.atr)), col=col1, on=1)

While it all works great, the R language itself offers very little syntactic sugar for operations not related to data-processing. While there are libraries for most common functionality found in many other execution environments, languages such as Ruby and Python, offer a "friendlier" experience to both novice and seasoned developers alike. Furthermore the process of data synchronization was a tedious step, I was looking for something that offered the flexability of DataChoppa to pull in and process live and historical data from a wide variety of sources, caching results on the fly, and using those results and analysis for subsequent operations.

This all led me to developing a series of Python libraries targeted towards providing a configurable high level view of the market. Intelligence Amplification (IA) as opposed to Artifical Intelligence (AI) if you will (see Nerds on Wall Street).

marketquery.py is a high level market querying library, which implements plugins used to resolve generic market queries for ticker time based data. One can used the interface to query for the lastest quotes or a specific range of them from a particular source, or allow the framework to select one for you.

Retrieve first 3 months of the last 5 years of GBPUSD data

  from marketquery.querier        import Querier
  from marketbase.query.builder   import QueryBuilder
  sym = "GBPUSD"
  first_3mo_of_last_5yr = (QueryBuilder().symbol(sym)
  querier = Querier()
  res     = querier.run(first_3mo_of_last_5yr)
  for query, dat in res.items():
      print(dat.raw[:1000] + (dat.raw[1000:] and '...'))

Retrieve last two month of hourly EURJPY data

  from marketquery.querier        import Querier
  from marketbase.query.builder   import QueryBuilder
  sym = "EURJPY"
  two_months_of_hourly = (QueryBuilder().symbol(sym)
  querier = Querier()
  res     = querier.run(two_months_of_hourly).raw()
  print(res[:1000] + (res[1000:] and '...'))

This provides a quick way to both lookup market data according to specific criteria, as well as cache it so that network resources are used effectively. All caching is configurable, and the user can define timeouts based on the target query, source, and/or data retrieved.

From there the next level up is the technical analysis is was trivial to whip up the tacache.py module which uses the marketquery.py interface to retrieve raw data before feeding it into TALib caching the results. The same caching mechanisms offering the same flexability is employed, if one needs to process a large data set and/or subsets multiple times in a specified period, computational resources are not wasted (important when running on a metered cloud)

Computing various technical indicators

  from marketquery.querier       import Querier
  from marketbase.query.builder  import QueryBuilder
  from tacache.runner            import TARunner
  from tacache.source            import Source
  from tacache.indicator         import Indicator
  from talib                     import SMA
  from talib                  import MACD
  res = Querier().run(QueryBuilder().symbol("AUDUSD")
  ta_runner = TARunner()
  analysis  = ta_runner.run(Indicator(SMA),
  analysis  = ta_runner.run(Indicator(MACD),
  macd, sig, hist = analysis.raw

Finally ontop of all this I wrote a2m.py, a high level querying interface consisting of modules reporting on market volatility and trends as well as other metrics; python scripts which I could quickly execute to report the current and historical market state, making used of the underlying cached query and technical analysis data, periodically invalidated to pull in new/recent live data.

Example using a2m to compute volatility

  sym = "EURUSD"
  self.resolver  = Resolver()
  self.ta_runner = TARunner()

  daily = (QueryBuilder().symbol(sym)

  hourly = (QueryBuilder().symbol(sym)

  current = (QueryBuilder().symbol(sym)

  daily_quotes   = resolver.run(daily)
  hourly_quotes  = resolver.run(hourly)
  current_quotes = resolver.run(current)

  daily_avg  = ta_runner.run(Indicator(talib.SMA, timeperiod=120),  query_result=daily_quotes).raw[-1]
  hourly_avg = ta_runner.run(Indicator(talib.SMA, timeperiod=30),  query_result=hourly_quotes).raw[-1]

  current_val    = current_quotes.raw()[-1]['Close']
  daily_percent  = current_val / daily_avg  if current_val &lt; daily_avg  else daily_avg  / current_val
  hourly_percent = current_val / hourly_avg if current_val &lt; hourly_avg else hourly_avg / current_val
Awesome to the max

I would go onto use this to execute some Forex trades, again not in an algorithmic / automated manner, but rather based on combined knowledge from fundamentals research, as well as the high level technical data, and what was the result...

Poor squidward

I jest, though I did lose a little $$$, it wasn't that much, and to be honest I feel this was due to lack of patience/discipline and other "novice" mistakes as discussed above. I did make about 1/2 of it back, and then lost interest. This all requires alot of focus and time, and I had already spent 2+ years worth of free time on this. With many other interests pulling my strings, I decided to sideline the project(s) alltogether and focus on my next crazy venture.


After some of consideration, I decided to release the R code I wrote under the MIT license. They are rather simple expirements though could be useful as a starting point for others new to the subject. As far as the Python modules and DataChoppa, I intended to eventually release them but aim to take a break first to focus on other efforts and then go back to the war room, to figure out the next stage of the strategy.

And that's that! Enough number crunching, time to go out for a hike!

Hiking meme

ReFS Part III - Back to the Resilience

Posted by Mo Morsi on May 09, 2018 02:59 PM

We've made some great headway on the ReFS filesystem anaylsis front to the point of being able to implement a rudimentary file extraction mechanism (complete with timestamps).

First a recap of the story so far:

  • ReFS, aka "The Resilient FileSystem" is a relatively new filesystem developed by Microsoft. First shipped in Windows Server 2012, it has since seen an increase in popularity and use, especially in enterprise and cloud environments.
  • Little is known about the ReFS internals outside of some sparse information provided by Microsoft. According to that, data is organized into pages of a fixed size, starting at a static position on the disk. The first round of analysis was to determine the boundaries of these top level organizational units to be able to scan the disk for high level structures.
  • Once top level structures, including the object table and root directory, were identified, each was analyzed in detail to determine potential parsable structures such as generic Attribute and Record entities as well as file and directory references.
  • The latest round of analysis consisted of diving into these entities in detail to try and deduce a mechanism which to extract file metadata and content

Before going into details, we should note this analysis is based on observations against ReFS disks generated locally, without extensive sequential cross-referencing and comparison of many large files with many changes. Also it is possible that some structures are oversimplified and/or not fully understood. That being said, this should provide a solid basis for additional analysis, getting us deep into the filesystem, and allowing us to poke and prod with isolated bits to identify their semantics.

Now onto the fun stuff!

- A ReFS filesystem can be identified with the following signature at the very start of the partition:

    00 00 00 52  65 46 53 00  00 00 00 00  00 00 00 00 ...ReFS.........
    46 53 52 53  XX XX XX XX  XX XX XX XX  XX XX XX XX FSRS

- The following Ruby code will tell you if a given offset in a given file contains a ReFS partition:

    # Point this to the file containing the disk image

    # Point this at the start of the partition containing the ReFS filesystem

    # FileSystem Signature we are looking for
    FS_SIGNATURE  = [0x00, 0x00, 0x00, 0x52, 0x65, 0x46, 0x53, 0x00] # ...ReFS.

    img = File.open(File.expand_path(DISK), 'rb')
    img.seek ADDRESS
    sig = img.read(FS_SIGNATURE.size).unpack('C*')
    puts "Disk #{sig == FS_SIGNATURE ? "contains" : "does not contain"} ReFS filesystem"

- ReFS pages are 0x4000 bytes in length

- On all inspected systems, the first page number is 0x1e (0x78000 bytes after the start of the partition containing the filesystem). This is inline w/ Microsoft documentation which states that the first metadata dir is at a fixed offset on the disk.

- Other pages contain various system, directory, and volume structures and tables as well as journaled versions of each page (shadow-written upon regular disk writes)

- The first byte of each page is its Page Number

- The first 0x30 bytes of every metadata page (dubbed the Page Header) seem to follow a certain pattern:

    byte  0: XX XX 00 00   00 00 00 00   YY 00 00 00   00 00 00 00
    byte 16: 00 00 00 00   00 00 00 00   ZZ ZZ 00 00   00 00 00 00
    byte 32: 01 00 00 00   00 00 00 00   00 00 00 00   00 00 00 00
  • dword 0 (XX XX) is the page number which is sequential and corresponds to the 0x4000 offset of the page
  • dword 2 (YY) is the journal number or sequence number
  • dword 6 (ZZ ZZ) is the "Virtual Page Number", which is non-sequential (eg values are in no apparent order) and seem to tie related pages together.
  • dword 8 is always 01, perhaps an "allocated" flag or other

- Multiple pages may share a virtual page number (byte 24/dword 6) but usually don't appear in sequence.

- The following Ruby code will print out the pages in a ReFS partition along w/ their shadow copies:

    # Point this to the file containing the disk image
    # Point this at the start of the partition containing the ReFS filesystem
    FIRST_PAGE = 0x1e
    img = File.open(File.expand_path(DISK), 'rb')
    page_id = FIRST_PAGE
    img.seek(ADDRESS + page_id*PAGE_SIZE)
    while contents = img.read(PAGE_SIZE)
      id = contents.unpack('S').first
      if id == page_id
        pos = img.pos
        start = ADDRESS + page_id * PAGE_SIZE
        img.seek(start + PAGE_SEQ)
        seq = img.read(4).unpack("L").first
        img.seek(start + PAGE_VIRTUAL_PAGE_NUM)
        vpn = img.read(4).unpack("L").first
        print "page: "
        print "0x#{id.to_s(16).upcase}".ljust(7)
        print " @ "
        print "0x#{start.to_s(16).upcase}".ljust(10)
        print ": Seq - "
        print "0x#{seq.to_s(16).upcase}".ljust(7)
        print "/ VPN - "
        print "0x#{vpn.to_s(16).upcase}".ljust(9)
        img.seek pos
      page_id += 1

- The object table (virtual page number 0x02) associates object ids' with the pages on which they reside. Here we an AttributeList consisting of Records of key/value pairs (see below for the specifics on these data structures). We can lookup the object id of the root directory (0x600000000) to retrieve the page on which it resides:

   50 00 00 00 10 00 10 00 00 00 20 00 30 00 00 00 - total length / key & value boundries
   00 00 00 00 00 00 00 00 00 06 00 00 00 00 00 00 - object id
   F4 0A 00 00 00 00 00 00 00 00 02 08 08 00 00 00 - page id / flags
   CE 0F 85 14 83 01 DC 39 00 00 00 00 00 00 00 00 - checksum
   08 00 00 00 08 00 00 00 04 00 00 00 00 00 00 00

^ The object table entry for the root dir, containing its page (0xAF4)

- When retrieving pages by id or virtual page number, look for the ones with the highest sequence number as those are the latest copies of the shadow-write mechanism.

- Expanding upon the previous example we can implement some logic to read and dump the object table:

    ATTR_START = 0x30
    def img
      @img ||= File.open(File.expand_path(DISK), 'rb')
    def pages
      @pages ||= begin
        _pages = {}
        page_id = FIRST_PAGE
        img.seek(ADDRESS + page_id*PAGE_SIZE)
        while contents = img.read(PAGE_SIZE)
          id = contents.unpack('S').first
          if id == page_id
            pos = img.pos
            start = ADDRESS + page_id * PAGE_SIZE
            img.seek(start + PAGE_SEQ)
            seq = img.read(4).unpack("L").first
            img.seek(start + PAGE_VIRTUAL_PAGE_NUM)
            vpn = img.read(4).unpack("L").first
            _pages[id] = {:id => id, :seq => seq, :vpn => vpn}
            img.seek pos
          page_id += 1
    def page(opts)
      if opts.key?(:id)
        return pages[opts[:id]]
      elsif opts[:vpn]
        return pages.values.select { |v|
          v[:vpn] == opts[:vpn]
        }.sort { |v1, v2| v1[:seq] <=> v2[:seq] }.last
    def obj_pages
      @obj_pages ||= begin
        obj_table = page(:vpn => 2)
        img.seek(ADDRESS + obj_table[:id] * PAGE_SIZE)
        bytes = img.read(PAGE_SIZE).unpack("C*")
        len1 = bytes[ATTR_START]
        len2 = bytes[ATTR_START+len1]
        start = ATTR_START + len1 + len2
        objs = {}
        while bytes.size > start && bytes[start] != 0
          len = bytes[start]
          id  = bytes[start+0x10..start+0x20-1].collect { |i| i.to_s(16).upcase }.reverse.join()
          tgt = bytes[start+0x20..start+0x21].collect   { |i| i.to_s(16).upcase }.reverse.join()
          objs[id] = tgt
          start += len
    obj_pages.each { |id, tgt|
      puts "Object #{id} is on page #{tgt}"

We could also implement a method to lookup a specific object's page:

    def obj_page(obj_id)

    puts page(:id => obj_page("0000006000000000").to_i(16))

This will retrieve the page containing the root directory

- Directories, from the root dir down, follow a consistent pattern. They are comprised of sequential lists of data structures whose length is given by the first word value (Attributes and Attribute Lists).

List are often prefixed with a Header Attribute defining the total length of the Attributes that follow that consititute the list. Though this is not a hard set rule as in the case where the list resides in the body of another Attribute (more on that below).

In either case, Attributes may be parsed by iterating over the bytes after the directory page header, reading and processing the first word to determine the next number of bytes to read (minus the length of the first word), and then repeating until null (0000) is encountered (being sure to process specified padding in the process)

- Various Attributes take on different semantics including references to subdirs and files as well as branches to additional pages containing more directory contents (for large directories); though not all Attributes have been identified.

The structures in a directory listing always seem to be of one of the following formats:

- Base Attribute - The simplest / base attribute consisting of a block whose length is given at the very start.

An example of a typical Attribute follows:

      a8 00 00 00  28 00 01 00  00 00 00 00  10 01 00 00  
      10 01 00 00  02 00 00 00  00 00 00 00  00 00 00 00  
      00 00 00 00  00 00 00 00  a9 d3 a4 c3  27 dd d2 01  
      5f a0 58 f3  27 dd d2 01  5f a0 58 f3  27 dd d2 01  
      a9 d3 a4 c3  27 dd d2 01  20 00 00 00  00 00 00 00  
      00 06 00 00  00 00 00 00  03 00 00 00  00 00 00 00  
      5c 9a 07 ac  01 00 00 00  19 00 00 00  00 00 00 00  
      00 00 01 00  00 00 00 00  00 00 00 00  00 00 00 00  
      00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  
      00 00 00 00  00 00 00 00  01 00 00 00  00 00 00 00  
      00 00 00 00  00 00 00 00

Here we a section of 0xA8 length containing the following four file timestamps (more on this conversion below)

       a9 d3 a4 c3  27 dd d2 01 - 2017-06-04 07:43:20
       5f a0 58 f3  27 dd d2 01 - 2017-06-04 07:44:40
       5f a0 58 f3  27 dd d2 01 - 2017-06-04 07:44:40
       a9 d3 a4 c3  27 dd d2 01 - 2017-06-04 07:43:20

It is safe to assume that either

  • one of the first fields in any given Attribute contains an identifier detailing how the attribute should be parsed _or_
  • the context is given by the Attribute's position in the list.
  • attributes corresponding to given meaning are referenced by address or identifier elsewhere

The following is a method which can be used to parse a given Attribute off disk, provided the img read position is set to its start:

    def read_attr
      pos = img.pos
      packed = img.read(4)
      return new if packed.nil?
      attr_len = packed.unpack('L').first
      return new if attr_len == 0

      img.seek pos
      value = img.read(attr_len)
      Attribute.new(:pos   => pos,
                    :bytes => value.unpack("C*"),
                    :len   => attr_len)

- Records - Key / Value pairs whose total length and key / value lengths are given in the first 0x20 bytes of the attribute. These are used to associated metadata sections with files whose names are recorded in the keys and contents are recorded in the value.

An example of a typical Record follows:

    40 04 00 00   10 00 1A 00   08 00 30 00   10 04 00 00   @.........0.....
    30 00 01 00   6D 00 6F 00   66 00 69 00   6C 00 65 00   0...m.o.f.i.l.e.
    31 00 2E 00   74 00 78 00   74 00 00 00   00 00 00 00   1...t.x.t.......
    A8 00 00 00   28 00 01 00   00 00 00 00   10 01 00 00   ¨...(...........
    10 01 00 00   02 00 00 00   00 00 00 00   00 00 00 00   ................
    00 00 00 00   00 00 00 00   A9 D3 A4 C3   27 DD D2 01   ........©Ó¤Ã'ÝÒ.
    5F A0 58 F3   27 DD D2 01   5F A0 58 F3   27 DD D2 01   _ Xó'ÝÒ._ Xó'ÝÒ.
    A9 D3 A4 C3   27 DD D2 01   20 00 00 00   00 00 00 00   ©Ó¤Ã'ÝÒ. .......
    00 06 00 00   00 00 00 00   03 00 00 00   00 00 00 00   ................
    5C 9A 07 AC   01 00 00 00   19 00 00 00   00 00 00 00   \..¬............
    00 00 01 00   00 00 00 00   00 00 00 00   00 00 00 00   ................
    00 00 00 00   00 00 00 00   00 00 00 00   00 00 00 00   ................
    00 00 00 00   00 00 00 00   01 00 00 00   00 00 00 00   ................
    00 00 00 00   00 00 00 00   20 00 00 00   A0 01 00 00   ........ ... ...
    D4 00 00 00   00 02 00 00   74 02 00 00   01 00 00 00   Ô.......t.......
    78 02 00 00   00 00 00 00 ...(cutoff)                   x.......

Here we see the Record parameters given by the first row:

  • total length - 4 bytes = 0x440
  • key offset - 2 bytes = 0x10
  • key length - 2 bytes = 0x1A
  • flags / identifer - 2 bytes = 0x08
  • value offset - 2 bytes = 0x30
  • value length - 2 bytes = 0x410

Naturally, the Record finishes after the value, 0x410 bytes after the value start at 0x30, or 0x440 bytes after the start of the Record (which lines up with the total length).

We also see that this Record corresponds to a file I created on disk as the key is the File Metadata flag (0x10030) followed by the filename (mofile1.txt).

Here the first attribute in the Record value is the simple attribute we discussed above, containing the file timestamps. The File Reference Attribute List Header follows (more on that below).

From observation Records w/ flag values of '0' or '8' are what we are looking for, while '4' occurs often, this almost always seems to indicate a Historical Record, or a Record that has since been replaced with another.

Since Records are prefixed with their total length, they can be thought of a subclass of Attribute. The following is a Ruby class that uses composition to dispatch record field lookup calls to values in the underlying Attribute:

    class Record
      attr_accessor :attribute

      def initialize(attribute)
        @attribute = attribute

      def key_offset
        @key_offset ||= attribute.words[2]

      def key_length
        @key_length ||= attribute.words[3]

      def flags
        @flags ||= attribute.words[4]

      def value_offset
        @value_offset ||= attribute.words[5]

      def value_length
        @value_offset ||= attribute.words[6]

      def key
        @key ||= begin
          ko, kl, vo, vl = boundries

      def value
        @value ||= begin
          ko, kl, vo, vl = boundries

      def value_pos
        attribute.pos + value_offset

      def key_pos
        attribute.pos + key_offset
    end # class Record

- AttributeList - These are more complicated but interesting. At first glance they are simple Attributes of length 0x20 but upon further inspection we consistently see it contains the length of a large block of Attributes (this length is inclusive, as it contains this first one). After parsing this Attribute, dubbed the 'List Header', we should read the remaining bytes in the List as well as the padding, before arriving at the next Attribute

   20 00 00 00   A0 01 00 00   D4 00 00 00   00 02 00 00 <- list header specifying total length (0x1A0) and padding (0xD4)
   74 02 00 00   01 00 00 00   78 02 00 00   00 00 00 00
   80 01 00 00   10 00 0E 00   08 00 20 00   60 01 00 00
   60 01 00 00   00 00 00 00   80 00 00 00   00 00 00 00
   88 00 00 00  ... (cutoff)

Here we see an Attribute of 0x20 length, that contains a reference to a larger block size (0x1A0) in its third word.

This can be confirmed by the next Attribute whose size (0x180) is the larger block size minute the length of the header (0x1A0 - 0x20). In this case the list only contains one item/child attribute.

In general a simple strategy to parse the entire case would be to:

  • Parse Attributes individually as normal
  • If we encounter a List Header Attribute, we calculate the size of the list (total length minus header length)
  • Then continue parsing Attributes, adding them to the list until the total length is completed.

It also seems that:

  • the padding that occurs after the list is given by header word number 5 (in this case 0xD4). After the list is parsed, we consistently see this many null bytes before the next Attribute begins (which is not part of & unrelated to the list).
  • the type of list is given by its 7th word; directory contents correspond to 0x200 while directory branches are indicated with 0x301

Here is a class that represents an AttributeList header attribute by encapsulating it in a similar manner to Record above:

    class AttributeListHeader
      attr_accessor :attribute

      def initialize(attr)
        @attribute = attr

      # From my observations this is always 0x20
      def len
        @len ||= attribute.dwords[0]

      def total_len
        @total_len ||= attribute.dwords[1]

      def body_len
        @body_len ||= total_len - len

      def padding
        @padding ||= attribute.dwords[2]

      def type
        @type ||= attribute.dwords[3]

      def end_pos
        @end_pos ||= attribute.dwords[4]

      def flags
        @flags ||= attribute.dwords[5]

      def next_pos
        @next_pos ||= attribute.dwords[6]

Here is a method to parse the actual Attribute List assuming the image read position is set to the beginning of the List Header

    def read_attribute_list
      header        = Header.new(read_attr)
      remaining_len = header.body_len
      orig_pos      = img.pos
      bytes         = img.read remaining_len
      img.seek orig_pos

      attributes = []

      until remaining_len == 0
        attributes    << read_attr
        remaining_len -= attributes.last.len

      img.seek orig_pos - header.len + header.end_pos

      AttributeList.new :header     => header,
                        :pos        => orig_pos,
                        :bytes      => bytes,
                        :attributes => attributes

Now we have most of what is needed to locate and parse individual files, but there are a few missing components including:

- Directory Tree Branches: These are Attribute Lists where each Attribute corresponds to a record whose value references a page which contains more directory contents.

Upon encountering an AttributeList header with flag value 0x301, we should

  • iterate over the Attributes in the list,
  • parse them as Records,
  • use the first dword in each value as the page to repeat the directory traversal process (recursively).

Additional files and subdirs found on the referenced pages should be appended to the list of current directory contents.

Note this is the (an?) implementation of the BTree structure in the ReFS filesystem described by Microsoft, as the record keys contain the tree leaf identifiers (based on file and subdirectory names).

This can be used for quick / efficient file and subdir lookup by name (see 'optimization' in 'next steps' below)

- SubDirectories: these are simply Records in the directory's Attribute List whose key contains the Directory Metadata flag (0x20030) as well as the subdir name.

The value of this Record is the corresponding object id which can be used to lookup the page containing the subdir in the object table.

A typical subdirectory Record

    70 00 00 00  10 00 12 00  00 00 28 00  48 00 00 00  
    30 00 02 00  73 00 75 00  62 00 64 00  69 00 72 00  <- here we see the key containing the flag (30 00 02 00) followed by the dir name ("subdir2")
    32 00 00 00  00 00 00 00  03 07 00 00  00 00 00 00  <- here we see the object id as the first qword in the value (0x730)
    00 00 00 00  00 00 00 00  14 69 60 05  28 dd d2 01  <- here we see the directory timestamps (more on those below)
    cc 87 ce 52  28 dd d2 01  cc 87 ce 52  28 dd d2 01  
    cc 87 ce 52  28 dd d2 01  00 00 00 00  00 00 00 00  
    00 00 00 00  00 00 00 00  00 00 00 10  00 00 00 00

- Files: like directories are Records whose key contains a flag (0x10030) followed by the filename.

The value is far more complicated though and while we've discovered some basic Attributes allowing us to pull timestamps and content from the fs, there is still more to be deduced as far as the semantics of this Record's value.

- The File Record value consists of multiple attributes, though they just appear one after each other, without a List Header. We can still parse them sequentially given that all Attributes are individually prefixed with their lengths and the File Record value length gives us the total size of the block.

- The first attribute contains 4 file timestamps at an offset given by the fifth byte of the attribute (though this position may be coincidental an the timestamps could just reside at a fixed location in this attribute).

In the first attribute example above we see the first timestamp is

       a9 d3 a4 c3  27 dd d2 01

This corresponds to the following date

        2017-06-04 07:43:20

And may be converted with the following algorithm:

          tsi = TIMESTAMP_BYTES.pack("C*").unpack("Q*").first
          Time.at(tsi / 10000000 - 11644473600)

Timestamps being in nanoseconds since the Windows Epoch Data (11644473600 = Jan 1, 1601 UTC)

- The second Attribute seems to be the Header of an Attribute List containing the 'File Reference' semantics. These are the Attributes that encapsulate the file length and content pointers.

I'm assuming this is an Attribute List so as to contain many of these types of Attributes for large files. What is not apparent are the full semantics of all of these fields.

But here is where it gets complicated, this List only contains a single attribute with a few child Attributes. This encapsulation seems to be in the same manner as the Attributes stored in the File Record value above, just a simple sequential collection without a Header.

In this single attribute (dubbed the 'File Reference Body') the first Attribute contains the length of the file while the second is the Header for yet another List, this one containing a Record whose value contains a reference to the page which the file contents actually reside.

      | ...                                  |
      | File Entry Record                    |
      | Key: 0x10030 [FileName]              |
      | Value:                               |
      | Attribute1: Timestamps               |
      | Attribute2:                          |
      |   File Reference List Header         |
      |   File Reference List Body(Record)   |
      |     Record Key: ?                    |
      |     Record Value:                    |
      |       File Length Attribute          |
      |       File Content List Header       |
      |       File Content Record(s)         |
      | Padding                              |
      | ...                                  |

While complicated each level can be parsed in a similar manner to all other Attributes & Records, just taking care to parse Attributes into their correct levels & structures.

As far as actual values,

  • the file length is always seen at a fixed offset within its attribute (0x3c) and
  • the content pointer seems to always reside in the second qword of the Record value. This pointer is simply a reference to the page which the file contents can be read verbatim.


And that's it! An example implementation of all this logic can be seen in our expiremental 'resilience' library found here:


The next steps would be to

  • expand upon the data structures above (verify that we have interpreted the existing structures correctly)
  • deduce full Attribute and Record semantics so as to be able to consistently parse files of any given length, with any given number of modifications out of the file system

And once we have done so robustly, we can start looking at optimization, possibly rolling out some expiremental production logic for ReFS filesystem support!

... Cha-ching $ £ ¥ ¢ ₣ ₩ !!!!

Extracting Formally Verified C with FStar and KreMLin

Posted by William Brown on April 29, 2018 02:00 PM

Extracting Formally Verified C with FStar and KreMLin

As software engineering has progressed, the correctness of software has become a major issue. However the tools that exist today to help us create correct programs have been lacking. Human’s continue to make mistakes that cause harm to others (even I do!).

Instead, tools have now been developed that allow the verification of programs and algorithms as correct. These have not gained widespread adoption due to the complexities of their tool chains or other social and cultural issues.

The Project Everest research has aimed to create a formally verified webserver and cryptography library. To achieve this they have developed a language called F* (FStar) and KreMLin as an extraction tool. This allows an FStar verified algorithm to be extracted to a working set of C source code - C source code that can be easily added to existing projects.

Setting up a FStar and KreMLin environment

Today there are a number of undocumented gotchas with opam - the OCaml package manager. Most of these are silent errors. I used the following steps to get to a working environment.

# It's important to have bzip2 here else opam silently fails!
dnf install -y rsync git patch opam bzip2 which gmp gmp-devel m4 \
        hg unzip pkgconfig redhat-rpm-config

opam init
# You need 4.02.3 else wasm will not compile.
opam switch create 4.02.3
opam switch 4.02.3
echo ". /home/Work/.opam/opam-init/init.sh > /dev/null 2> /dev/null || true" >> .bashrc
opam install batteries fileutils yojson ppx_deriving_yojson zarith fix pprint menhir process stdint ulex wasm

PATH "~/z3/bin:~/FStar/bin:~/kremlin:$PATH"
# You can get the "correct" z3 version from https://github.com/FStarLang/binaries/raw/master/z3-tested/z3-
unzip z3-
mv z3- z3

# You will need a "stable" release of FStar https://github.com/FStarLang/FStar/archive/stable.zip
unzip stable.zip
mv FStar-stable Fstar
cd ~/FStar
opam config exec -- make -C src/ocaml-output -j
opam config exec -- make -C ulib/ml

# You need a master branch of kremlin https://github.com/FStarLang/kremlin/archive/master.zip
cd ~
unzip master.zip
mv kremlin-master kremlin
cd kremlin
opam config exec -- make
opam config exec -- make test

Your first FStar extraction

Summer, Code and Fedora

Posted by Amitosh Swain Mahapatra on April 29, 2018 10:41 AM

I have been selected for Google Summer of Code (GSoC) 2018. I’ll be working on developing the backend for the Fedora Community App.

Google Summer of Code

Fedora Project

Fedora has an android app which lets a user browse Fedora Magazine, Ask Fedora, FedoCal etc within it. This app is build using the Ionic Framework, Angular and Cordova. Essentially it is a cross-platform hybrid app.

Project Description

In the current form, most of the functions in the app rely on an in-app browser to render content. This project aims to improve the existing Fedora App for Android for speed, utility and responsiveness, introduce a deeper native integration and make the app more personal for the user.

During the GSoC period I aim for the following deliverables:

  1. Refactored, and restructured code for the android app, providing native experience to the fullest.
  2. Deeper integration with system as well as various Fedora Infra apps.
  3. An Ionic hybrid app that is publishable to app stores like Google Play Store and F-Droid
  4. Providing an offline first experience – caching data from Fedora Magazine and Social and optionally, making some of the content accessible offline.

What would I be working on

The Fedora android app is developed using the Ionic framework and Cordova. It is a framework for developing cross-platform mobile apps, also called as hybrid apps using HTML, CSS and JavaScript.

Here are some of the features I plan to integrate this summer:

  1. Offine data for posts and calendar.
  2. Syncing system calendar with events from FedoCal.
  3. Fedora package search.
  4. Bookmarks and offline reading.
  5. FMN notifications.

About Google Summer of Code

Google Summer of Code (GSoC) is a yearly program by Google to help the open source communities to reach out to student contributors. Organisations pitch projects, and when selected, pick up university students to work on these projects or their own ideas related to the organisation’s project(s).

This is my second GSoC participation. Last summer (2017), I worked with the FOSSi Foundation for bringing code quality metrics for projects listed under LibreCores.org.

While, the summer this time is really really hot (I mean 40°C/104°F in April!), it is going to be interesting!

AD directory admins group setup

Posted by William Brown on April 25, 2018 02:00 PM

AD directory admins group setup

Recently I have been reading many of the Microsoft Active Directory best practices for security and hardening. These are great resources, and very well written. The major theme of the articles is “least privilege”, where accounts like Administrators or Domain Admins are over used and lead to further compromise.

A suggestion that is put forward by the author is to have a group that has no other permissions but to manage the directory service. This should be used to temporarily make a user an admin, then after a period of time they should be removed from the group.

This way you have no Administrators or Domain Admins, but you have an AD only group that can temporarily grant these permissions when required.

I want to explore how to create this and configure the correct access controls to enable this scheme.

Create our group

First, lets create a “Directory Admins” group which will contain our members that have the rights to modify or grant other privileges.

# /usr/local/samba/bin/samba-tool group add 'Directory Admins'
Added group Directory Admins

It’s a really good idea to add this to the “Denied RODC Password Replication Group” to limit the risk of these accounts being compromised during an attack. Additionally, you probably want to make your “admin storage” group also a member of this, but I’ll leave that to you.

# /usr/local/samba/bin/samba-tool group addmembers "Denied RODC Password Replication Group" "Directory Admins"

Now that we have this, lets add a member to it. I strongly advise you create special accounts just for the purpose of directory administration - don’t use your daily account for this!

# /usr/local/samba/bin/samba-tool user create da_william
User 'da_william' created successfully
# /usr/local/samba/bin/samba-tool group addmembers 'Directory Admins' da_william
Added members to group Directory Admins

Configure the permissions

Now we need to configure the correct dsacls to allow Directory Admins full control over directory objects. It could be possible to constrain this to only modification of the cn=builtin and cn=users container however, as directory admins might not need so much control for things like dns modification.

If you want to constrain these permissions, only apply the following to cn=builtins instead - or even just the target groups like Domain Admins.

First we need the objectSID of our Directory Admins group so we can build the ACE.

# /usr/local/samba/bin/samba-tool group show 'directory admins' --attributes=cn,objectsid
dn: CN=Directory Admins,CN=Users,DC=adt,DC=blackhats,DC=net,DC=au
cn: Directory Admins
objectSid: S-1-5-21-2488910578-3334016764-1009705076-1104

Now with this we can construct the ACE.


This permission grants:

  • RP: read property
  • WP: write property
  • LC: list child objects
  • LO: list objects
  • RC: read control

It could be possible to expand these rights: it depends if you want directory admins to be able to do “day to day” ad control jobs, or if you just use them for granting of privileges. That’s up to you. An expanded ACE might be:

# Same as Enterprise Admins

Now lets actually apply this and do a test:

# /usr/local/samba/bin/samba-tool dsacl set --sddl='(A;CI;RPWPLCLORC;;;S-1-5-21-2488910578-3334016764-1009705076-1104)' --objectdn='dc=adt,dc=blackhats,dc=net,dc=au'
# /usr/local/samba/bin/samba-tool group addmembers 'directory admins' administrator -U 'da_william%...'
Added members to group directory admins
# /usr/local/samba/bin/samba-tool group listmembers 'directory admins' -U 'da_william%...'
# /usr/local/samba/bin/samba-tool group removemembers 'directory admins' -U 'da_william%...'
Removed members from group directory admins
# /usr/local/samba/bin/samba-tool group listmembers 'directory admins' -U 'da_william%...'

It works!


With these steps we have created a secure account that has limited admin rights, able to temporarily promote users with privileges for administrative work - and able to remove it once the work is complete.

Understanding AD Access Control Entries

Posted by William Brown on April 19, 2018 02:00 PM

Understanding AD Access Control Entries

A few days ago I set out to work on making samba 4 my default LDAP server. In the process I was forced to learn about Active Directory Access controls. I found that while there was significant documentation around the syntax of these structures, very little existed explaining how to use them effectively.

What’s in an ACE?

If you look at the the ACL of an entry in AD you’ll see something like:


This seems very confusing and complex (and someone should write a tool to explain these … maybe me). But once you can see the structure it starts to make sense.

Most of the access controls you are viewing here are DACLs or Discrestionary Access Control Lists. These make up the majority of the output after ‘O:DAG:DAD:AI’. TODO: What does ‘O:DAG:DAD:AI’ mean completely?

After that there are many ACEs defined in SDDL or ???. The structure is as follows:


Each of these fields can take varies types. These interact to form the access control rules that allow or deny access. Thankfully, you don’t need to adjust many fields to make useful ACE entries.

MS maintains a document of these field values here.

They also maintain a list of wellknown SID values here

I want to cover some common values you may see though:


Most of the types you’ll see are “A” and “OA”. These mean the ACE allows an access by the SID.


These change the behaviour of the ACE. Common values you may want to set are CI and OI. These determine that the ACE should be inherited to child objects. As far as the MS docs say, these behave the same way.

If you see ID in this field it means the ACE has been inherited from a parent object. In this case the inherit_object_guid field will be set to the guid of the parent that set the ACE. This is great, as it allows you to backtrace the origin of access controls!


This is the important part of the ACE - it determines what access the SID has over this object. The MS docs are very comprehensive of what this does, but common values are:

  • RP: read property
  • WP: write property
  • CR: control rights
  • CC: child create (create new objects)
  • DC: delete child
  • LC: list child objects
  • LO: list objects
  • RC: read control
  • WO: write owner (change the owner of an object)
  • WD: write dac (allow writing ACE)
  • SW: self write
  • SD: standard delete
  • DT: delete tree

I’m not 100% sure of all the subtle behaviours of these, because they are not documented that well. If someone can help explain these to me, it would be great.


We will skip some fields and go straight to SID. This is the SID of the object that is allowed the rights from the rights field. This field can take a GUID of the object, or it can take a “well known” value of the SID. For example ‘AN’ means “anonymous users”, or ‘AU’ meaning authenticated users.


I won’t claim to be an AD ACE expert, but I did find the docs hard to interpret at first. Having a breakdown and explanation of the behaviour of the fields can help others, and I really want to hear from people who know more about this topic on me so that I can expand this resource to help others really understand how AD ACE’s work.

Making Samba 4 the default LDAP server

Posted by William Brown on April 17, 2018 02:00 PM

Making Samba 4 the default LDAP server

Earlier this year Andrew Bartlett set me the challenge: how could we make Samba 4 the default LDAP server in use for Linux and UNIX systems? I’ve finally decided to tackle this, and write up some simple changes we can make, and decide on some long term goals to make this a reality.

What makes a unix directory anyway?

Great question - this is such a broad topic, even I don’t know if I can single out what it means. For the purposes of this exercise I’ll treat it as “what would we need from my previous workplace”. My previous workplace had a dedicated set of 389 Directory Server machines that served lookups mainly for email routing, application authentication and more. The didn’t really process a great deal of login traffic as the majority of the workstations were Windows - thus connected to AD.

What it did show was that Linux clients and applications:

  • Want to use anonymous binds and searchs - Applications and clients are NOT domain members - they just want to do searches
  • The content of anonymous lookups should be “public safe” information. (IE nothing private)
  • LDAPS is a must for binds
  • MemberOf and group filtering is very important for access control
  • sshPublicKey and userCertificate;binary is important for 2fa/secure logins

This seems like a pretty simple list - but it’s not the model Samba 4 or AD ship with.

You’ll also want to harden a few default settings. These include:

  • Disable Guest
  • Disable 10 machine join policy

AD works under the assumption that all clients are authenticated via kerberos, and that kerberos is the primary authentication and trust provider. As a result, AD often ships with:

  • Disabled anonymous binds - All clients are domain members or service accounts
  • No anonymous content available to search
  • No LDAPS (GSSAPI is used instead)
  • no sshPublicKey or userCertificates (pkinit instead via krb)
  • Access control is much more complex topic than just “matching an ldap filter”.

As a result, it takes a bit of effort to change Samba 4 to work in a way that suits both, securely.

Isn’t anonymous binding insecure?

Let’s get this one out the way - no it’s not. In every pen test I have seen if you can get access to a domain joined machine, you probably have a good chance of taking over the domain in various ways. Domain joined systems and krb allows lateral movement and other issues that are beyond the scope of this document.

The lack of anonymous lookup is more about preventing information disclosure - security via obscurity. But it doesn’t take long to realise that this is trivially defeated (get one user account, guest account, domain member and you can search …).

As a result, in some cases it may be better to allow anonymous lookups because then you don’t have spurious service accounts, you have a clear understanding of what is and is not accessible as readable data, and you don’t need every machine on the network to be domain joined - you prevent a possible foothold of lateral movement.

So anonymous binding is just fine, as the unix world has shown for a long time. That’s why I have very few concerns about enabling it. Your safety is in the access controls for searches, not in blocking anonymous reads outright.

Installing your DC

As I run fedora, you will need to build and install samba for source so you can access the heimdal kerberos functions. Fedora’s samba 4 ships ADDC support now, but lacks some features like RODC that you may want. In the future I expect this will change though.

These documents will help guide you:


build steps

install a domain

I strongly advise you use options similar to:

/usr/local/samba/bin/samba-tool domain provision --server-role=dc --use-rfc2307 --dns-backend=SAMBA_INTERNAL --realm=SAMDOM.EXAMPLE.COM --domain=SAMDOM --adminpass=Passw0rd

Allow anonymous binds and searches

Now that you have a working domain controller, we should test you have working ldap:

/usr/local/samba/bin/samba-tool forest directory_service dsheuristics 0000002 -H ldaps://localhost --simple-bind-dn='administrator@samdom.example.com'
ldapsearch -b DC=samdom,DC=example,DC=com -H ldaps://localhost -x

You can see the domain object but nothing else. Many other blogs and sites recommend a blanket “anonymous read all” access control, but I think that’s too broad. A better approach is to add the anonymous read to only the few containers that require it.

/usr/local/samba/bin/samba-tool dsacl set --objectdn=DC=samdom,DC=example,DC=com --sddl='(A;;RPLCLORC;;;AN)' --simple-bind-dn="administrator@samdom.example.com" --password=Passw0rd
/usr/local/samba/bin/samba-tool dsacl set --objectdn=CN=Users,DC=samdom,DC=example,DC=com --sddl='(A;CI;RPLCLORC;;;AN)' --simple-bind-dn="administrator@samdom.example.com" --password=Passw0rd
/usr/local/samba/bin/samba-tool dsacl set --objectdn=CN=Builtin,DC=samdom,DC=example,DC=com --sddl='(A;CI;RPLCLORC;;;AN)' --simple-bind-dn="administrator@samdom.example.com" --password=Passw0rd

In AD groups and users are found in cn=users, and some groups are in cn=builtin. So we allow read to the root domain object, then we set a read on cn=users and cn=builtin that inherits to it’s child objects. The attribute policies are derived elsewhere, so we can assume that things like kerberos data and password material are safe with these simple changes.

Configuring LDAPS

This is a reasonable simple exercise. Given a ca cert, key and cert we can place these in the correct locations samba expects. By default this is the private directory. In a custom install, that’s /usr/local/samba/private/tls/, but for distros I think it’s /var/lib/samba/private. Simply replace ca.pem, cert.pem and key.pem with your files and restart.

Adding schema

To allow adding schema to samba 4 you need to reconfigure the dsdb config on the schema master. To show the current schema master you can use:

/usr/local/samba/bin/samba-tool fsmo show -H ldaps://localhost --simple-bind-dn='administrator@adt.blackhats.net.au' --password=Password1

Look for the value:

SchemaMasterRole owner: CN=NTDS Settings,CN=LDAPKDC,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au

And note the CN=ldapkdc = that’s the hostname of the current schema master.

On the schema master we need to adjust the smb.conf. The change you need to make is:

    dsdb:schema update allowed = yes

Now restart the instance and we can update the schema. The following LDIF should work if you replace ${DOMAINDN} with your namingContext. You can apply it with ldapmodify

dn: CN=sshPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: attributeSchema
cn: sshPublicKey
name: sshPublicKey
lDAPDisplayName: sshPublicKey
description: MANDATORY: OpenSSH Public key
oMSyntax: 4
isSingleValued: FALSE
searchFlags: 8

dn: CN=ldapPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: classSchema
cn: ldapPublicKey
name: ldapPublicKey
description: MANDATORY: OpenSSH LPK objectclass
lDAPDisplayName: ldapPublicKey
subClassOf: top
objectClassCategory: 3
defaultObjectCategory: CN=ldapPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
mayContain: sshPublicKey

dn: CN=User,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: modify
replace: auxiliaryClass
auxiliaryClass: ldapPublicKey
auxiliaryClass: posixAccount
auxiliaryClass: shadowAccount
sudo ldapmodify -f sshpubkey.ldif -D 'administrator@adt.blackhats.net.au' -w Password1 -H ldaps://localhost
adding new entry "CN=sshPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au"

adding new entry "CN=ldapPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au"

modifying entry "CN=User,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au"

To my surprise, userCertificate already exists! The reason I missed it is a subtle ad schema behaviour I missed. The ldap attribute name is stored in the lDAPDisplayName and may not be the same as the CN of the schema element. As a result, you can find this with:

ldapsearch -H ldaps://localhost -b CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au -x -D 'administrator@adt.blackhats.net.au' -W '(attributeId='

This doesn’t solve my issues: Because I am a long time user of 389-ds, that means I need some ns compat attributes. Here I add the nsUniqueId value so that I can keep some compatability.

dn: CN=nsUniqueId,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: attributeSchema
attributeID: 2.16.840.1.113730.3.1.542
cn: nsUniqueId
name: nsUniqueId
lDAPDisplayName: nsUniqueId
description: MANDATORY: nsUniqueId compatability
oMSyntax: 4
isSingleValued: TRUE
searchFlags: 9

dn: CN=nsOrgPerson,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: classSchema
governsID: 2.16.840.1.113730.3.2.334
cn: nsOrgPerson
name: nsOrgPerson
description: MANDATORY: Netscape DS compat person
lDAPDisplayName: nsOrgPerson
subClassOf: top
objectClassCategory: 3
defaultObjectCategory: CN=nsOrgPerson,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
mayContain: nsUniqueId

dn: CN=User,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: modify
replace: auxiliaryClass
auxiliaryClass: ldapPublicKey
auxiliaryClass: posixAccount
auxiliaryClass: shadowAccount
auxiliaryClass: nsOrgPerson

Now with this you can extend your users with the required data for SSH, certificates and maybe 389-ds compatability.

/usr/local/samba/bin/samba-tool user edit william  -H ldaps://localhost --simple-bind-dn='administrator@adt.blackhats.net.au'


Out of the box a number of the unix attributes are not indexed by Active Directory. To fix this you need to update the search flags in the schema.

Again, temporarily allow changes:

    dsdb:schema update allowed = yes

Now we need to add some indexes for common types. Note that in the nsUniqueId schema I already added the search flags. We also want to set that these values should be preserved if they become tombstones so we can recove them.

/usr/local/samba/bin/samba-tool schema attribute modify uid --searchflags=9
/usr/local/samba/bin/samba-tool schema attribute modify nsUniqueId --searchflags=9
/usr/local/samba/bin/samba-tool schema attribute modify uidnumber --searchflags=9
/usr/local/samba/bin/samba-tool schema attribute modify gidnumber --searchflags=9
# Preserve on tombstone but don't index
/usr/local/samba/bin/samba-tool schema attribute modify x509-cert --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify sshPublicKey --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify gecos --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify loginShell --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify home-directory --searchflags=24

AD Hardening

We want to harden a few default settings that could be considered insecure. First, let’s stop “any user from being able to domain join machines”.

/usr/local/samba/bin/samba-tool domain settings account_machine_join_quota 0 -H ldaps://localhost --simple-bind-dn='administrator@adt.blackhats.net.au'

Now let’s disable the Guest account

/usr/local/samba/bin/samba-tool user disable Guest -H ldaps://localhost --simple-bind-dn='administrator@adt.blackhats.net.au'

I plan to write a more complete samba-tool extension for auditing these and more options, so stay tuned!

SSSD configuration

Now that our directory service is configured, we need to configure our clients to utilise it correctly.

Here is my SSSD configuration, that supports sshPublicKey distribution, userCertificate authentication on workstations and SID -> uid mapping. In the future I want to explore sudo rules in LDAP with AD, and maybe even HBAC rules rather than GPO.

Please refer to my other blog posts on configuration of the userCertificates and sshKey distribution.

ignore_group_members = False

# There is a bug in SSSD where this actually means "ipv6 only".
# lookup_family_order=ipv6_first
cache_credentials = True
id_provider = ldap
auth_provider = ldap
access_provider = ldap
chpass_provider = ldap
ldap_search_base = dc=blackhats,dc=net,dc=au

# This prevents an infinite referral loop.
ldap_referrals = False
ldap_id_mapping = True
ldap_schema = ad
# Rather that being in domain users group, create a user private group
# automatically on login.
# This is very important as a security setting on unix!!!
# See this bug if it doesn't work correctly.
# https://pagure.io/SSSD/sssd/issue/3723
auto_private_groups = true

ldap_uri = ldaps://ad.blackhats.net.au
ldap_tls_reqcert = demand
ldap_tls_cacert = /etc/pki/tls/certs/bh_ldap.crt

# Workstation access
ldap_access_filter = (memberOf=CN=Workstation Operators,CN=Users,DC=blackhats,DC=net,DC=au)

ldap_user_member_of = memberof
ldap_user_gecos = cn
ldap_user_uuid = objectGUID
ldap_group_uuid = objectGUID
# This is really important as it allows SSSD to respect nsAccountLock
ldap_account_expire_policy = ad
ldap_access_order = filter, expire
# Setup for ssh keys
ldap_user_ssh_public_key = sshPublicKey
# This does not require ;binary tag with AD.
ldap_user_certificate = userCertificate
# This is required for the homeDirectory to be looked up in the sssd schema
ldap_user_home_directory = homeDirectory

services = nss, pam, ssh, sudo
config_file_version = 2
certificate_verification = no_verification

domains = blackhats.net.au
homedir_substring = /home

pam_cert_auth = True







With these simple changes we can easily make samba 4 able to perform the roles of other unix focused LDAP servers. This allows stateless clients, secure ssh key authentication, certificate authentication and more.

Some future goals to improve this include:

  • Ship samba 4 with schema templates that can be used
  • Schema querying (what objectclass takes this attribute?)
  • Group editing (same as samba-tool user edit)
  • Security auditing tools
  • user/group modification commands
  • Refactor and improve the cli tools python to be api driven - move the logic from netcmd into samdb so that samdb can be an API that python can consume easier. Prevent duplication of logic.

The goal is so that an admin never has to see an LDIF ever again.

Smartcards and You - How To Make Them Work on Fedora/RHEL

Posted by William Brown on February 26, 2018 02:00 PM

Smartcards and You - How To Make Them Work on Fedora/RHEL

Smartcards are a great way to authenticate users. They have a device (something you have) and a pin (something you know). They prevent password transmission, use strong crypto and they even come in a variety of formats. From your “card” shapes to yubikeys.

So why aren’t they used more? It’s the classic issue of usability - the setup for them is undocumented, complex, and hard to discover. Today I hope to change this.

The Goal

To authenticate a user with a smartcard to a physical linux system, backed onto LDAP. The public cert in LDAP is validated, as is the chain to the CA.

You Will Need

I’ll be focusing on the yubikey because that’s what I own.

Preparing the Smartcard

First we need to make the smartcard hold our certificate. Because of a crypto issue in yubikey firmware, it’s best to generate certificates for these externally.

I’ve documented this before in another post, but for accesibility here it is again.

Create an NSS DB, and generate a certificate signing request:

certutil -d . -N -f pwdfile.txt
certutil -d . -R -a -o user.csr -f pwdfile.txt -g 4096 -Z SHA256 -v 24 \
--keyUsage digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment --nsCertType sslClient --extKeyUsage clientAuth \
-s "CN=username,O=Testing,L=example,ST=Queensland,C=AU"

Once the request is signed, and your certificate is in “user.crt”, import this to the database.

certutil -A -d . -f pwdfile.txt -i user.crt -a -n TLS -t ",,"
certutil -A -d . -f pwdfile.txt -i ca.crt -a -n TLS -t "CT,,"

Now export that as a p12 bundle for the yubikey to import.

pk12util -o user.p12 -d . -k pwdfile.txt -n TLS

Now import this to the yubikey - remember to use slot 9a this time! As well make sure you set the touch policy NOW, because you can’t change it later!

yubico-piv-tool -s9a -i user.p12 -K PKCS12 -aimport-key -aimport-certificate -k --touch-policy=always

Setting up your LDAP user

First setup your system to work with LDAP via SSSD. You’ve done that? Good! Now it’s time to get our user ready.

Take our user.crt and convert it to DER:

openssl x509 -inform PEM -outform DER -in user.crt -out user.der

Now you need to transform that into something that LDAP can understand. In the future I’ll be adding a tool to 389-ds to make this “automatic”, but for now you can use python:

>>> import base64
>>> with open('user.der', 'r') as f:
>>>    print(base64.b64encode(f.read))

That should output a long base64 string on one line. Add this to your ldap user with ldapvi:

userCertificate;binary:: <BASE64>

Note that ‘;binary’ tag has an important meaning here for certificate data, and the ‘::’ tells ldap that this is b64 encoded, so it will decode on addition.

Setting up the system

Now that you have done that, you need to teach SSSD how to intepret that attribute.

In your various SSSD sections you’ll need to make the following changes:

auth_provider = ldap
ldap_user_certificate = userCertificate;binary

# This controls OCSP checks, you probably want this enabled!
# certificate_verification = no_verification

pam_cert_auth = True

Now the TRICK is letting SSSD know to use certificates. You need to run:

sudo touch /var/lib/sss/pubconf/pam_preauth_available

With out this, SSSD won’t even try to process CCID authentication!

Add your ca.crt to the system trusted CA store for SSSD to verify:

certutil -A -d /etc/pki/nssdb -i ca.crt -n USER_CA -t "CT,,"

Add coolkey to the database so it can find smartcards:

modutil -dbdir /etc/pki/nssdb -add "coolkey" -libfile /usr/lib64/libcoolkeypk11.so

Check that SSSD can find the certs now:

# sudo /usr/libexec/sssd/p11_child --pre --nssdb=/etc/pki/nssdb
PIN for william
CAC ID Certificate

If you get no output here you are missing something! If this doesn’t work, nothing will!

Finally, you need to tweak PAM to make sure that pam_unix isn’t getting in the way. I use the following configuration.

auth        required      pam_env.so
# This skips pam_unix if the given uid is not local (IE it's from SSSD)
auth        [default=1 ignore=ignore success=ok] pam_localuser.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 1000 quiet_success
auth        sufficient    pam_sss.so prompt_always ignore_unknown_user
auth        required      pam_deny.so

account     required      pam_unix.so
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 1000 quiet
account     [default=bad success=ok user_unknown=ignore] pam_sss.so
account     required      pam_permit.so

password    requisite     pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type=
password    sufficient    pam_unix.so sha512 shadow try_first_pass use_authtok
password    sufficient    pam_sss.so use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
-session    optional      pam_systemd.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so
session     optional      pam_sss.so

That’s it! Restart SSSD, and you should be good to go.

Finally, you may find SELinux isn’t allowing authentication. This is really sad that smartcards don’t work with SELinux out of the box and I have raised a number of bugs, but check this just in case.

Happy authentication!

Using b43 firmware on Fedora Atomic Workstation

Posted by William Brown on December 22, 2017 02:00 PM

Using b43 firmware on Fedora Atomic Workstation

My Macbook Pro has a broadcom b43 wireless chipset. This is notorious for being one of the most annoying wireless adapters on linux. When you first install Fedora you don’t even see “wifi” as an option, and unless you poke around in dmesg, you won’t find how to enable b43 to work on your platform.


The b43 driver requires proprietary firmware to be loaded else the wifi chip will not run. There are a number of steps for this process found on the linux wireless page . You’ll note that one of the steps is:

export FIRMWARE_INSTALL_DIR="/lib/firmware"
sudo b43-fwcutter -w "$FIRMWARE_INSTALL_DIR" broadcom-wl-5.100.138/linux/wl_apsta.o

So we need to be able to write and extract our firmware to /usr/lib/firmware, and then reboot and out wifi works.

Fedora Atomic Workstation

Atomic WS is similar to atomic server, that it’s a read-only ostree based deployment of fedora. This comes with a number of unique challenges and quirks but for this issue:

sudo touch /usr/lib/firmware/test
/bin/touch: cannot touch '/usr/lib/firmware/test': Read-only file system

So we can’t extract our firmware!

Normally linux also supports reading from /usr/local/lib/firmware (which on atomic IS writeable …) but for some reason fedora doesn’t allow this path.

Solution: Layered RPMs

Atomic has support for “rpm layering”. Ontop of the ostree image (which is composed of rpms) you can supply a supplemental list of packages that are “installed” at rpm-ostree update time.

This way you still have an atomic base platform, with read-only behaviours, but you gain the ability to customise your system. To achive it, it must be possible to write to locations in /usr during rpm install.

This means our problem has a simple solution: Create a b43 rpm package. Note, that you can make this for yourself privately, but you can’t distribute it for legal reasons.

Get setup on atomic to build the packages:

rpm-ostree install rpm-build createrepo

RPM specfile:


%define debug_package %{nil} Summary: Allow b43 fw to install on ostree installs due to bz1512452 Name: b43-fw Version: 1.0.0 Release: 1 License: Proprietary, DO NOT DISTRIBUTE BINARY FORMS URL: http://linuxwireless.sipsolutions.net/en/users/Drivers/b43/ Group: System Environment/Kernel

BuildRequires: b43-fwcutter

Source0: http://www.lwfinger.com/b43-firmware/broadcom-wl-5.100.138.tar.bz2

%description Broadcom firmware for b43 chips.

%prep %setup -q -n broadcom-wl-5.100.138

%build true

%install pwd mkdir -p %{buildroot}/usr/lib/firmware b43-fwcutter -w %{buildroot}/usr/lib/firmware linux/wl_apsta.o

%files %defattr(-,root,root,-) %dir %{_prefix}/lib/firmware/b43 %{_prefix}/lib/firmware/b43/*

%changelog * Fri Dec 22 2017 William Brown <william at blackhats.net.au> - 1.0.0 - Initial version

Now you can put this into a folder like so:

mkdir -p ~/rpmbuild/{SPECS,SOURCES}
<editor> ~/rpmbuild/SPECS/b43-fw.spec
wget -O ~/rpmbuild/SOURCES/broadcom-wl-5.100.138.tar.bz2 http://www.lwfinger.com/b43-firmware/broadcom-wl-5.100.138.tar.bz2

We are now ready to build!

rpmbuild -bb ~/rpmbuild/SPECS/b43-fw.spec
createrepo ~/rpmbuild/RPMS/x86_64/

Finally, we can install this. Create a yum repos file:

baseurl=file:///home/<YOUR USERNAME HERE>/rpmbuild/RPMS/x86_64
rpm-ostree install b43-fw

Now reboot and enjoy wifi on your Fedora Atomic Macbook Pro!

What am I doing ?

Posted by Mayank Jha on December 08, 2017 04:31 AM

Joining club 27 because it’s cool ?

No one knows when will someone die.
My friend died, whom I loved.
Unfortunate and sad.
I could have been one.
I could die the next moment.
I am shit scared now.
What should I do ?
The memory of my friend will stay with me forever till I die.
It’s a scar which will be lifelong.
But I cherish the moments
I had with him.
The wild explorations we had.
Let’s cherish the moments we made.
Sadly you could not stay with us any longer.


Makes us realise how transient human life is.
The one final reality which we all need to face.
The one final breath.
The one final slump into slumber and nothingness.
Complete void.
All that remains is between the start and end.
He said, in the end nothing matters.
But the thing in between is ALL that is.
You came with nothing and would go with nothing.
But in the middle is where the magic called life happens


The emotion called love, which binds us.
Love from our creators.
Love from humans around us.
I don’t know whether or not I’ll end up with a girl.
I don’t know whether I’ll end up with a guy.
I know for sure, that spreading love and passion.
My nerves might never heal.
The itch might never go.
But love shall stay.

Creating yubikey SSH and TLS certificates

Posted by William Brown on November 10, 2017 02:00 PM

Creating yubikey SSH and TLS certificates

Recently yubikeys were shown to have a hardware flaw in the way the generated private keys. This affects the use of them to provide PIV identies or SSH keys.

However, you can generate the keys externally, and load them to the key to prevent this issue.


First, we’ll create a new NSS DB on an airgapped secure machine (with disk encryption or in memory storage!)

certutil -N -d . -f pwdfile.txt

Now into this, we’ll create a self-signed cert valid for 10 years.

certutil -S -f pwdfile.txt -d . -t "C,," -x -n "SSH" -g 2048 -s "cn=william,O=ssh,L=Brisbane,ST=Queensland,C=AU" -v 120

We export this now to PKCS12 for our key to import.

pk12util -o ssh.p12 -d . -k pwdfile.txt -n SSH

Next we import the key and cert to the hardware in slot 9c

yubico-piv-tool -s9c -i ssh.p12 -K PKCS12 -aimport-key -aimport-certificate -k

Finally, we can display the ssh-key from the token.

ssh-keygen -D /usr/lib64/opensc-pkcs11.so -e

Note, we can make this always used by ssh client by adding the following into .ssh/config:

PKCS11Provider /usr/lib64/opensc-pkcs11.so

TLS Identities

The process is almost identical for user certificates.

First, create the request:

certutil -d . -R -a -o user.csr -f pwdfile.txt -g 4096 -Z SHA256 -v 24 \
--keyUsage digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment --nsCertType sslClient --extKeyUsage clientAuth \
-s "CN=username,O=Testing,L=example,ST=Queensland,C=AU"

Once the request is signed, we should have a user.crt back. Import that to our database:

certutil -A -d . -f pwdfile.txt -i user.crt -a -n TLS -t ",,"

Import our CA certificate also. Next export this to p12:

pk12util -o user.p12 -d . -k pwdfile.txt -n TLS

Now import this to the yubikey - remember to use slot 9a this time!

yubico-piv-tool -s9a -i user.p12 -K PKCS12 -aimport-key -aimport-certificate -k --touch-policy=always


What’s the problem with NUMA anyway?

Posted by William Brown on November 06, 2017 02:00 PM

What’s the problem with NUMA anyway?

What is NUMA?

Non-Uniform Memory Architecture is a method of seperating ram and memory management units to be associated with CPU sockets. The reason for this is performance - if multiple sockets shared a MMU, they will cause each other to block, delaying your CPU.

To improve this, each NUMA region has it’s own MMU and RAM associated. If a CPU can access it’s local MMU and RAM, this is very fast, and does not prevent another CPU from accessing it’s own. For example:

CPU 0   <-- QPI --> CPU 1
  |                   |
  v                   v
MMU 0               MMU 1
  |                   |
  v                   v
RAM 1               RAM 2

For example, on the following system, we can see 1 numa region:

# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 12188 MB
node 0 free: 458 MB
node distances:
node   0
  0:  10

On this system, we can see two:

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 24 25 26 27 28 29 30 31 32 33 34 35
node 0 size: 32733 MB
node 0 free: 245 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 32767 MB
node 1 free: 22793 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

This means that on the second system there is 32GB of ram per NUMA region which is accessible, but the system has total 64GB.

The problem

The problem arises when a process running on NUMA region 0 has to access memory from another NUMA region. Because there is no direct connection between CPU 0 and RAM 1, we must communicate with our neighbour CPU 1 to do this for us. IE:

CPU 0 --> CPU 1 --> MMU 1 --> RAM 1

Not only do we pay a time delay price for the QPI communication between CPU 0 and CPU 1, but now CPU 1’s processes are waiting on the MMU 1 because we are retrieving memory on behalf of CPU 0. This is very slow (and can be seen by the node distances in the numactl –hardware output).

Today’s work around

The work around today is to limit your Directory Server instance to a single NUMA region. So for our example above, we would limit the instance to NUMA region 0 or 1, and treat the instance as though it only has access to 32GB of local memory.

It’s possible to run two instances of DS on a single server, pinning them to their own regions and using replication between them to provide synchronisation. You’ll need a load balancer to fix up the TCP port changes, or you need multiple addresses on the system for listening.

The future

In the future, we’ll be adding support for better copy-on-write techniques that allow the cores to better cache content after a QPI negotiation - but we still have to pay the transit cost. We can minimise this as much as possible, but there is no way today to avoid this penalty. To use all your hardware on a single instance, there will always be a NUMA cost somewhere.

The best solution is as above: run an instance per NUMA region, and internally provide replication for them. Perhaps we’ll support an automatic configuration of this in the future.

GSoC 2017 - Mentor Report from 389 Project

Posted by William Brown on August 23, 2017 02:00 PM

GSoC 2017 - Mentor Report from 389 Project

This year I have had the pleasure of being a mentor for the Google Summer of Code program, as part of the Fedora Project organisation. I was representing the 389 Directory Server Project and offered students the oppurtunity to work on our command line tools written in python.


From the start we have a large number of really talented students apply to the project. This was one of the hardest parts of the process was to choose a student, given that I wanted to mentor all of them. Sadly I only have so many hours in the day, so we chose Ilias, a student from Greece. What really stood out was his interest in learning about the project, and his desire to really be part of the community after the project concluded.

The project

The project was very deliberately “loose” in it’s specification. Rather than giving Ilias a fixed goal of you will implement X, Y and Z, I chose to set a “broad and vague” task. Initially I asked him to investigate a single area of the code (the MemberOf plugin). As he investigated this, he started to learn more about the server, ask questions, and open doors for himself to the next tasks of the project. As these smaller questions and self discoveries stacked up, I found myself watching Ilias start to become a really complete developer, who could be called a true part of our community.

Ilias’ work was exceptional, and he has documented it in his final report here .

Since his work is complete, he is now free to work on any task that takes his interest, and he has picked a good one! He has now started to dive deep into the server internals, looking at part of our backend internals and how we dump databases from id2entry to various output formats.

What next?

I will be participating next year - Sadly, I think the python project oppurtunities may be more limited as we have to finish many of these tasks to release our new CLI toolset. This is almost a shame as the python components are a great place to start as they ease a new contributor into the broader concepts of LDAP and the project structure as a whole.

Next year I really want to give this oppurtunity to an under-represented group in tech (female, poc, etc). I personally have been really inspired by Noriko and I hope to have the oppurtunity to pass on her lessons to another aspiring student. We need more engineers like her in the world, and I want to help create that future.

Advice for future mentors

Mentoring is not for everyone. It’s not a task which you can just send a couple of emails and be done every day.

Mentoring is a process that requires engagement with the student, and communication and the relationship is key to this. What worked well was meeting early in the project, and working out what community worked best for us. We found that email questions and responses worked (given we are on nearly opposite sides of the Earth) worked well, along with irc conversations to help fix up any other questions. It would not be uncommon for me to spend at least 1 or 2 hours a day working through emails from Ilias and discussions on IRC.

A really important aspect of this communication is how you do it. You have to balance positive communication and encouragement, along with critcism that is constructive and helpful. Empathy is a super important part of this equation.

My number one piece of advice would be that you need to create an environment where questions are encouraged and welcome. You can never be dismissive of questions. If ever you dismiss a question as “silly” or “dumb”, you will hinder a student from wanting to ask more questions. If you can’t answer the question immediately, send a response saying “hey I know this is important, but I’m really busy, I’ll answer you as soon as I can”.

Over time you can use these questions to help teach lessons for the student to make their own discoveries. For example, when Ilias would ask how something worked, I would send my response structured in the way I approached the problem. I would send back links to code, my thoughts, and how I arrived at the conclusion. This not only answered the question but gave a subtle lesson in how to research our codebase to arrive at your own solutions. After a few of these emails, I’m sure that Ilias has now become self sufficent in his research of the code base.

Another valuable skill is that overtime you can help to build confidence through these questions. To start with Ilias would ask “how to implement” something, and I would answer. Over time, he would start to provide ideas on how to implement a solution, and I would say “X is the right one”. As time went on I started to answer his question with “What do you think is the right solution and why?”. These exchanges and justifications have (I hope) helped him to become more confident in his ideas, the presentation of them, and justification of his solutions. It’s led to this excellent exchange on our mailing lists, where Ilias is discussing the solutions to a problem with the broader community, and working to a really great answer.

Final thoughts

This has been a great experience for myself and Ilias, and I really look forward to helping another student next year. I’m sure that Ilias will go on to do great things, and I’m happy to have been part of his journey.

So you want to script gdb with python …

Posted by William Brown on August 03, 2017 02:00 PM

So you want to script gdb with python …

Gdb provides a python scripting interface. However the documentation is highly technical and not at a level that is easily accessible.

This post should read as a tutorial, to help you understand the interface and work toward creating your own python debuging tools to help make gdb usage somewhat “less” painful.

The problem

I have created a problem program called “naughty”. You can find it here .

You can compile this with the following command:

gcc -g -lpthread -o naughty naughty.c

When you run this program, your screen should be filled with:

thread ...
thread ...
thread ...
thread ...
thread ...
thread ...

It looks like we have a bug! Now, we could easily see the issue if we looked at the C code, but that’s not the point here - lets try to solve this with gdb.

gdb ./naughty
(gdb) run
[New Thread 0x7fffb9792700 (LWP 14467)]
thread ...

Uh oh! We have threads being created here. We need to find the problem thread. Lets look at all the threads backtraces then.

Thread 129 (Thread 0x7fffb3786700 (LWP 14616)):
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6

Thread 128 (Thread 0x7fffb3f87700 (LWP 14615)):
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6

Thread 127 (Thread 0x7fffb4788700 (LWP 14614)):
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6


We have 129 threads! Anyone of them could be the problem. We could just read these traces forever, but that’s a waste of time. Let’s try and script this with python to make our lives a bit easier.

Python in gdb

Python in gdb works by bringing in a copy of the python and injecting a special “gdb” module into the python run time. You can only access the gdb module from within python if you are using gdb. You can not have this work from a standard interpretter session.

We can access a dynamic python runtime from within gdb by simply calling python.

(gdb) python
>print("hello world")
>hello world

The python code only runs when you press Control D.

Another way to run your script is to import them as “new gdb commands”. This is the most useful way to use python for gdb, but it does require some boilerplate to start.

import gdb

class SimpleCommand(gdb.Command):
    def __init__(self):
        # This registers our class as "simple_command"
        super(SimpleCommand, self).__init__("simple_command", gdb.COMMAND_DATA)

    def invoke(self, arg, from_tty):
        # When we call "simple_command" from gdb, this is the method
        # that will be called.
        print("Hello from simple_command!")

# This registers our class to the gdb runtime at "source" time.

We can run the command as follows:

(gdb) source debug_naughty.py
(gdb) simple_command
Hello from simple_command!

Solving the problem with python

So we need a way to find the “idle threads”. We want to fold all the threads with the same frame signature into one, so that we can view anomalies.

First, let’s make a “stackfold” command, and get it to list the current program.

class StackFold(gdb.Command):
def __init__(self):
    super(StackFold, self).__init__("stackfold", gdb.COMMAND_DATA)

def invoke(self, arg, from_tty):
    # An inferior is the 'currently running applications'. In this case we only
    # have one.
    inferiors = gdb.inferiors()
    for inferior in inferiors:


To reload this in the gdb runtime, just run “source debug_naughty.py” again. Try running this: Note that we dumped a heap of output? Python has a neat trick that dir and help can both return strings for printing. This will help us to explore gdb’s internals inside of our program.

We can see from the inferiors that we have threads available for us to interact with:

class Inferior(builtins.object)
 |  GDB inferior object
 |  threads(...)
 |      Return all the threads of this inferior.

Given we want to fold the stacks from all our threads, we probably need to look at this! So lets get one thread from this, and have a look at it’s help.

inferiors = gdb.inferiors()
for inferior in inferiors:
    thread_iter = iter(inferior.threads())
    head_thread = next(thread_iter)

Now we can run this by re-running “source” on our script, and calling stackfold again, we see help for our threads in the system.

At this point it get’s a little bit less obvious. Gdb’s python integration relates closely to how a human would interact with gdb. In order to access the content of a thread, we need to change the gdb context to access the backtrace. If we were doing this by hand it would look like this:

(gdb) thread 121
[Switching to thread 121 (Thread 0x7fffb778e700 (LWP 14608))]
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6

We need to emulate this behaviour with our python calls. We can swap to the thread’s context with:

class InferiorThread(builtins.object)
 |  GDB thread object
 |  switch(...)
 |      switch ()
 |      Makes this the GDB selected thread.

Then once we are in the context, we need to take a different approach to explore the stack frames. We need to explore the “gdb” modules raw context.

inferiors = gdb.inferiors()
for inferior in inferiors:
    thread_iter = iter(inferior.threads())
    head_thread = next(thread_iter)
    # Move our gdb context to the selected thread here.

Now that we have selected our thread’s context, we can start to explore here. gdb can do a lot within the selected context - as a result, the help output from this call is really large, but it’s worth reading so you can understand what is possible to achieve. In our case we need to start to look at the stack frames.

To look through the frames we need to tell gdb to rewind to the “newest” frame (ie, frame 0). We can then step down through progressively older frames until we exhaust. From this we can print a rudimentary trace:


# Reset the gdb frame context to the "latest" frame.
# Now, work down the frames.
cur_frame = gdb.selected_frame()
while cur_frame is not None:
    # get the next frame down ....
    cur_frame = cur_frame.older()
(gdb) stackfold

Great! Now we just need some extra metadata from the thread to know what thread id it is so the user can go to the correct thread context. So lets display that too:


# These are the OS pid references.
(tpid, lwpid, tid) = head_thread.ptid
# This is the gdb thread number
gtid = head_thread.num
print("tpid %s, lwpid %s, tid %s, gtid %s" % (tpid, lwpid, tid, gtid))
# Reset the gdb frame context to the "latest" frame.
(gdb) stackfold
tpid 14485, lwpid 14616, tid 0, gtid 129

At this point we have enough information to fold identical stacks. We’ll iterate over every thread, and if we have seen the “pattern” before, we’ll just add the gdb thread id to the list. If we haven’t seen the pattern yet, we’ll add it. The final command looks like:

def invoke(self, arg, from_tty):
    # An inferior is the 'currently running applications'. In this case we only
    # have one.
    stack_maps = {}
    # This creates a dict where each element is keyed by backtrace.
    # Then each backtrace contains an array of "frames"
    inferiors = gdb.inferiors()
    for inferior in inferiors:
        for thread in inferior.threads():
            # Change to our threads context
            # Get the thread IDS
            (tpid, lwpid, tid) = thread.ptid
            gtid = thread.num
            # Take a human readable copy of the backtrace, we'll need this for display later.
            o = gdb.execute('bt', to_string=True)
            # Build the backtrace for comparison
            backtrace = []
            cur_frame = gdb.selected_frame()
            while cur_frame is not None:
                cur_frame = cur_frame.older()
            # Now we have a backtrace like ['pthread_cond_wait@@GLIBC_2.3.2', 'lazy_thread', 'start_thread', 'clone']
            # dicts can't use lists as keys because they are non-hashable, so we turn this into a string.
            # Remember, C functions can't have spaces in them ...
            s_backtrace = ' '.join(backtrace)
            # Let's see if it exists in the stack_maps
            if s_backtrace not in stack_maps:
                stack_maps[s_backtrace] = []
            # Now lets add this thread to the map.
            stack_maps[s_backtrace].append({'gtid': gtid, 'tpid' : tpid, 'bt': o} )
    # Now at this point we have a dict of traces, and each trace has a "list" of pids that match. Let's display them
    for smap in stack_maps:
        # Get our human readable form out.
        o = stack_maps[smap][0]['bt']
        for t in stack_maps[smap]:
            # For each thread we recorded
            print("Thread %s (LWP %s))" % (t['gtid'], t['tpid']))

Here is the final output.

(gdb) stackfold
Thread 129 (LWP 14485))
Thread 128 (LWP 14485))
Thread 127 (LWP 14485))
Thread 10 (LWP 14485))
Thread 9 (LWP 14485))
Thread 8 (LWP 14485))
Thread 7 (LWP 14485))
Thread 6 (LWP 14485))
Thread 5 (LWP 14485))
Thread 4 (LWP 14485))
Thread 3 (LWP 14485))
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6

Thread 2 (LWP 14485))
#0  0x00007ffff78d835b in write () from /lib64/libc.so.6
#1  0x00007ffff78524fd in _IO_new_file_write () from /lib64/libc.so.6
#2  0x00007ffff7854271 in __GI__IO_do_write () from /lib64/libc.so.6
#3  0x00007ffff7854723 in __GI__IO_file_overflow () from /lib64/libc.so.6
#4  0x00007ffff7847fa2 in puts () from /lib64/libc.so.6
#5  0x00000000004007e9 in naughty_thread (arg=0x0) at naughty.c:27
#6  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff78e936f in clone () from /lib64/libc.so.6

Thread 1 (LWP 14485))
#0  0x00007ffff7bbe90d in pthread_join () from /lib64/libpthread.so.0
#1  0x00000000004008d1 in main (argc=1, argv=0x7fffffffe508) at naughty.c:51

With our stackfold command we can easily see that threads 129 through 3 have the same stack, and are idle. We can see that tread 1 is the main process waiting on the threads to join, and finally we can see that thread 2 is the culprit writing to our display.

My solution

You can find my solution to this problem as a reference implementation here .

Time safety and Rust

Posted by William Brown on July 11, 2017 02:00 PM

Time safety and Rust

Recently I have had the great fortune to work on this ticket . This was an issue that stemmed from an attempt to make clock performance faster. Previously, a call to time or clock_gettime would involve a context switch an a system call (think solaris etc). On linux we have VDSO instead, so we can easily just swap to the use of raw time calls.

The problem

So what was the problem? And how did the engineers of the past try and solve it?

DS heavily relies on time. As a result, we call time() a lot in the codebase. But this would mean context switches.

So a wrapper was made called “current_time()”, which would cache a recent output of time(), and then provide that to the caller instead of making the costly context switch. So the code had the following:

static time_t   currenttime;
static int      currenttime_set = 0;

    if ( !currenttime_set ) {
        currenttime_set = 1;

    time( &currenttime );
    return( currenttime );

current_time( void )
    if ( currenttime_set ) {
        return( currenttime );
    } else {
        return( time( (time_t *)0 ));

In another thread, we would poll this every second to update the currenttime value:

void *
time_thread(void *nothing __attribute__((unused)))
    PRIntervalTime    interval;

    interval = PR_SecondsToInterval(1);

    while(!time_shutdown) {
        csngen_update_time ();


So what is the problem here

Besides the fact that we may not poll accurately (meaning we miss seconds but always advance), this is not thread safe. The reason is that CPU’s have register and buffers that may cache both stores and writes until a series of other operations (barriers + atomics) occur to flush back out to cache. This means the time polling thread could update the clock and unless the POLLING thread issues a lock or a barrier+atomic, there is no guarantee the new value of currenttime will be seen in any other thread. This means that the only way this worked was by luck, and no one noticing that time would jump about or often just be wrong.

Clearly this is a broken design, but this is C - we can do anything.

What if this was Rust?

Rust touts mulithread safety high on it’s list. So lets try and recreate this in rust.

First, the exact same way:

use std::time::{SystemTime, Duration};
use std::thread;

static mut currenttime: Option<SystemTime> = None;

fn read_thread() {
    let interval = Duration::from_secs(1);

    for x in 0..10 {
        let c_time = currenttime.unwrap();
        println!("reading time {:?}", c_time);

fn poll_thread() {
    let interval = Duration::from_secs(1);

    for x in 0..10 {
        currenttime = Some(SystemTime::now());
        println!("polling time");

fn main() {
    let poll = thread::spawn(poll_thread);
    let read = thread::spawn(read_thread);

Rust will not compile this code.

> rustc clock.rs
error[E0133]: use of mutable static requires unsafe function or block
  --> clock.rs:13:22
13 |         let c_time = currenttime.unwrap();
   |                      ^^^^^^^^^^^ use of mutable static

error[E0133]: use of mutable static requires unsafe function or block
  --> clock.rs:22:9
22 |         currenttime = Some(SystemTime::now());
   |         ^^^^^^^^^^^ use of mutable static

error: aborting due to 2 previous errors

Rust has told us that this action is unsafe, and that we shouldn’t be modifying a global static like this.

This alone is a great reason and demonstration of why we need a language like Rust instead of C - the compiler can tell us when actions are dangerous at compile time, rather than being allowed to sit in production code for years.

For bonus marks, because Rust is stricter about types than C, we don’t have issues like:

int c_time = time();

Which is a 2038 problem in the making :)

RetroFlix / PI Switch Followup

Posted by Mo Morsi on July 05, 2017 07:22 PM

I've been trying to dedicate some cycles to wrapping up the Raspberry PI entertainment center project mentioned a while back. I decided to abandon the PI Switch idea as the original controller which was purchased for it just did not work properly (or should I say only worked sporadically/intermitantly). It being a cheap device bought online, it wasn't worth the effort to debug (funny enough I can't find the device on Amazon anymore, perhaps other people were having issues...).

Not being able to find another suitable gamepad to use as the basis for a snap together portable device, I bought a Rii wireless controller (which works great out of the box!) and dropped the project (also partly due to lack of personal interest). But the previously designed wall mount works great, and after a bit of work the PI now functions as a seamless media center.

Unfortunately to get it there, a few workarounds were needed. These are listed below (in no particular order).

<style> #rpi_setup li{ margin-bottom: 10px; } </style>
  • To start off, increase your GPU memory. This we be needed to run games with any reasonable performance. This can be accomplished through the Raspberry PI configuration interface.

    Rpi setup1 Rpi setup2

    Here you can also overclock your PI if your model supports it (v3.0 does not as evident w/ the screenshot, though there are workarounds)

  • If you are having trouble w/ the PI output resolution being too large / small for your tv, try adjusting the aspect ratio on your set. Previously mine was set to "theater mode", cutting off the edges of the device output. Resetting it to normal resolved the issue.

    Rpi setup3 Rpi setup5 Rpi setup4
  • To get the Playstation SixAxis controller working via bluetooth required a few steps.
    • Unplug your playstation (since it will boot by default when the controller is activated)
    • On the PI, run
              sudo bluetoothctl
    • Start the controller and watch for a new devices in the bluetoothctl output. Make note of the device id
    • Still in the bluetoothctl command prompt, run
              trust [deviceid]
    • In the Raspberry PI bluetooth menu, click 'make discoverable' (this can also be accomplished via the bluetoothctl command prompt with the discoverable on command) Rpi setup6
    • Finally restart the controller and it should autoconnect!
  • To install recent versions of Ruby you will need to install and setup rbenv. The current version in the RPI repos is too old to be of use (of interest for RetroFlix, see below)
  • Using mednafen requires some config changes, notabley to disable opengl output and enable SDL. Specifically change the following line from
          video.driver opengl
          video.driver sdl
    Unfortunately after alot of effort, I was not able to get mupen64 working (while most games start, as confirmed by audio cues, all have black / blank screens)... so no N64 games on the PI for now ☹
  • But who needs N64 when you have Nethack! ♥‿♥(the most recent version of which works flawlessly). In addition to the small tweaks needed to compile the latest version on Linux, inorder to get the awesome Nevanda tileset working, update include/config.h to enable XPM graphics:
        -/* # define USE_XPM */ /* Disable if you do not have the XPM library */
        +#define USE_XPM  /* Disable if you do not have the XPM library */
    Once installed, edit your nh/install/games/lib/nethackdir/NetHack.ad config file (in ~ if you installed nethack there), to reference the newtileset:
        -NetHack.tile_file: x11tiles
        +NetHack.tile_file: /home/pi/Downloads/Nevanda.xpm

Finally RetroFlix received some tweaking & love. Most changes were visual optimizations and eye candy (including some nice retro fonts and colors), though a workers were also added so the actual downloads could be performed without blocking the UI. Overall it's simple and works great, a perfect portal to work on those high scores!

That's all for now, look for some more updates on the ReFS front in the near future!

indexed search performance for ds - the mystery of the and query

Posted by William Brown on June 25, 2017 02:00 PM

indexed search performance for ds - the mystery of the and query

Directory Server is heavily based on set mathematics - one of the few topics I enjoyed during university. Our filters really boil down to set queries:


This filter describes the intersection of sets of objects containing “attr=val1” and “attr=val2”.

One of the properties of sets is that operations on them are commutative - the sets to a union or intersection may be supplied in any order with the same results. As a result, these are equivalent:


In the past I noticed an odd behaviour: that the order of filter terms in an ldapsearch query would drastically change the performance of the search. For example:


The later query may significantly outperform the former - but 10% or greater. I have never understood the reason why though. I toyed with ideas of re-arranging queries in the optimise step to put the terms in a better order, but I didn’t know what factors affected this behaviour.

Over time I realised that if you put the “more specific” filters first over the general filters, you would see a performance increase.

What was going on?

Recently I was asked to investigate a full table scan issue with range queries. This led me into an exploration of our search internals, and yielded the answer to the issue above.

Inside of directory server, our indexes are maintained as “pre-baked” searches. Rather than trying to search every object to see if a filter matches, our indexes contain a list of entries that match a term. For example:

uid=mark: 1, 2
uid=william: 3
uid=noriko: 4

From each indexed term we construct an IDList, which is the set of entries matching some term.

On a complex query we would need to intersect these. So the algorithm would iteratively apply this:

t1 = (a, b)
t2 = (c, t1)
t3 = (d, t2)

In addition, the intersection would allocate a new IDList to insert the results into.

What would happen is that if your first terms were large, we would allocate large IDLists, and do many copies into it. This would also affect later filters as we would need to check large ID spaces to perform the final intersection.

In the above example, consider a, b, c all have 10,000 candidates. This would mean t1, t2 is at least 10,000 IDs, and we need to do at least 20,000 comparisons. If d were only 3 candidates, this means that we then throw away the majority of work and allocations when we get to t3 = (d, t2).

What is the fix?

We now wrap each term in an idl_set processing api. When we get the IDList from each AVA, we insert it to the idl_set. This tracks the “minimum” IDList, and begins our intersection from the smallest matching IDList. This means that we have the quickest reduction in set size, and results in the smallest possible IDList allocation for the results. In my tests I have seen up to 10% improvement on complex queries.

For the example above, this means we procees d first, to reduce t1 to the smallest possible candidate set we can.

t1 = (d, a)
t2 = (b, t1)
t3 = (c, t2)

This means to create t2, t3, we will do an allocation that is bounded by the size of d (aka 3, rather than 10,000), we only need to perform fewer queries to reach this point.

A benefit of this strategy is that it means if on the first operation we find t1 is empty set, we can return immediately because no other intersection will have an impact on the operation.

What is next?

I still have not improved union performance - this is still somewhat affected by the ordering of terms in a filter. However, I have a number of ideas related to either bitmask indexes or disjoin set structures that can be used to improve this performance.

Stay tuned ….

Glad to be a Mentor of Google Summer Code again!

Posted by Tong Hui on May 25, 2017 04:44 PM

This year I will mentoring in FedoraProject, and help Mandy Wang finish her GSoC program about “Migrate Plinth to Fedora Server” which raised by me.

While, why I proposed this idea? Plinth is developed by Freedombox which is a Debian based project. The Freedombox is aiming for building a 100% free software self-hosting web server to deploy social applications on small machines. It provides online communication tools respecting user privacy and data ownership, replacing services provided by third-parties that under surveillance. Plinth is the front-end of Freedombox, written in Python.

This idea mainly about migrate Plinth from Deb-based to RPM-based, and make it available for Fedora Server which will running on ARM machines.

The main goal of this idea is to make Plinth works fine in Fedora Server or Minimal flavor, due to Plinth write APT commands hard coded, so it is better to make it more adoptive for RPM. The secondary goal is to make a RPM package for Plinth from source and setup a repo for it, so that everyone who use Fedora could use Plinth. Honestly to say, it is not a easy-hand task for GSoC student although I marked it “Novice” in Skill Level.

Mandy who were selected for finishing this idea is super adopted in my mind, because she write a very clear graph for how she will do in this GSoC.

It will start this May 30. I believe we will finish this mission successfully!

TLS Authentication and FreeRADIUS

Posted by William Brown on May 24, 2017 02:00 PM

TLS Authentication and FreeRADIUS

In a push to try and limit the amount of passwords sent on my network, I’m changing my wireless to use TLS certificates for authentication.


Kerberos - why the world moved on

Posted by William Brown on May 22, 2017 02:00 PM

Kerberos - why the world moved on

For a long time I have tried to integrate and improve authentication technologies in my own environments. I have advocated the use of GSSAPI, IPA, AD, and others. However, the more I have learnt, the further I have seen the world moving away. I want to explore some of my personal experiences and views as to why this occured, and what we can do.


Custom OSTree images

Posted by William Brown on May 21, 2017 02:00 PM

Custom OSTree images

Project Atomic is in my view, one of the most promising changes to come to linux distributions in a long time. It boasts the ability to atomicupgrade and alter your OS by maintaining A/B roots of the filesystem. It is currently focused on docker and k8s runtimes, but we can use atomic in other locations.


3D Printing Fun

Posted by Mo Morsi on May 14, 2017 06:52 PM

The PI Sw1tch project took a bit of a setback recently when we discovered the 5200mAH usb battery we had wasn't sufficient to drive the PI + display for any extensive period of time. I've ordered a higher-capacity battery from Amazon, but until it arrives the working prototype was redesigned into a snappable wall mount:

Wall pi2 Wall pi1

The current implementation can be found on thingiverse for all your 3D printing needs!

Additionally, we threw together a design for a wall mount for my last smartphone, the Samsung Intercept, which has been sitting on my shelf since I upgraded to the Huawei Union. The Intercept was a great phone, and it still works well, albiet a bit slow compared to modern devices (not to mention it only runs Android 2.2). But it more than suffices for a "smart" entertainment hub, and having mounted & wired up to my stereo system, I now have easy access to all the albums in the world (...that are available via youtube...). The device supposidly can be rooted, though I was not able to accomplish that myself and don't really care to spend more time figuring that out (really wouldn't gain much). But just goes to show how a little inguinity + some design work can go along way at reducing e-waste.

Wall intercept2 Wall intercept1

Now time to figure out what to do w/ my ancient Droid (the original A855!)

Keep on hacking!

RetroFlix - A Weekend Project

Posted by Mo Morsi on May 14, 2017 06:52 PM

Now that we have the 'mount' component of the PI Sw1tch, and an awesome way to playgames through the PI on our TV, we need a collection of games to play! It goes without saying that Nethack was installed (combined w/ ssh X11 forwarding = persistent graphical nethack anywhere = epicness!!!). But I also happen to have a huge box of Retro video games (dating back to my childhood), which would be good to load onto the device. But unfortunately cloning so many games would take some time, and there are already online databases of these backups, so I opted to write a small web app to download and manage the collection.

It can be found here and you can see some screens below:

My library Game info Game previews Game list

It was built as a Sinatra web service, simply acting as a frontend to a popular emulator database, allowing the user to navigate & preview app for various systems, and download / run them locally. The RetroFlix application itself is offered as a lightweight Microservice simply acting as a proxy to the required various underlying components. It's fairly simple to setup & install (see the README), and builds upon existing emulators & components the user has locally.

As with everything else, it's still a work in progress, but it already sufficing to relive classic memories!

<iframe frameborder="0" height="315" src="https://www.youtube.com/embed/QhH_iibOJv0" width="560"></iframe>

A New Site, A Fresh Start

Posted by Mo Morsi on April 29, 2017 03:59 AM

I started this blog 10 years ago. How the world has changed... (and yet is still the same...)

Recently I noticed my site was unaccessible. No 404, no error response, just a blank page. After a brief moment of panic, I ssh'd into my host provider and breathed a sign of relief upon discovering all db & fs entities intact, including the instance of Drupal which my site was based on (a horribly outdated instance mind you, and note I said was based). In any case, I suspect my (cheap) host provider updated their version of PHP or some critical library w/ the effect of my Drupal instance not working.


Having struggled w/ PHP & Drupal many times over the years, I was finally ready to go cold turkey, and migrated the blog to Middleman, which brings the awesomeness of Rails to static site generation. I am very much in love with Middleman right now, it's the perfect tool for this problem domain, it's incredibly easy to setup a new site, use any high level templating / markup / styling language to customize your frontend, throw in any js or other framework to handle dynamic interactions (including emscripten to run C in the browser), and you're good to go. Tailoring things on the fly is a cinch due to the convenient embedded webserver sporting live-reloading, and when you're ready to push to production it's a single command to build the static html. A quick rsync -azP synchronizes it w/ your webserver and now your site is available to the world at blazing speeds!

Anyways, enough Middleman gushing (but seriously check it out!). In addition to the port, I rethemed the site, be sure to also check out the new design if your reading this via rss. Note mobile browser UI's aren't currently supported, so no angry emails if you can't read things on your phone! (I know their coming...)

Be sure to stay subscribed to github for updates. I'm hoping virtfs-refs will see some love soon if I can figure out how to extend the current fs parsing mechanisms w/ file content retrieval. We've also been prototyping designs for the PI Switch project I mentioned a while back, more updates on that soon as things progress.

Keep surfing!!!

Compiling / Playing NetHack 3.6.0 on Fedora

Posted by Mo Morsi on April 26, 2017 08:43 PM

The following are the simplest instructions required to compile NetHack 3.6.0 for Fedora 25.

Why might you want to compile NetHack from source, instead of simply installing the package (sudo dnf install nethack)? For many reasons. Applying patches for custom game mechanics. Running an alternate frontend. And more!

While the official Linux instructions are complete, they are pretty involved and must be followed exactly for things to work. To give the dev team credit, they’ve been supporting a plethora of platforms and environments for 20+ years (and the number is still increasing). While a consolidated guide was written for compiling NetHack from scratch on Ubuntu/Debian but nothing exists for Fedora… until now!

# On a fresh Fedora installation (with updates) install the dependencies:

$ sudo dnf install ncurses-devel libXt-devel libXaw-devel byacc flex

# Download the NetHack (3.6.0) source tarball from the official site and unpack it:

$ tar xzvf [download]
$ cd nethack-3.6.0/

# Run the base setup utility for Linux:

$ cd sys/unix
$ ./setup.sh hints/linux
$ cd ../..

# Edit [include/unixconf.h] to uncomment the following line…

#define LINUX

# Edit [include/config.h] to uncomment the following line…

#define X11_GRAPHICS

# Edit [src/Makefile] and update the following lines…


# …to look like so


# Edit [Makefile] to uncomment the following line

VARDATND = x11tiles NetHack.ad pet_mark.xbm pilemark.xpm rip.xpm

# In previous line, apply this bugfix by changing…


# …to


# Build and install the game

$ make all
$ make install

# Finally create [~/.nethackrc] config file and populate it with the following: OPTIONS=windowtype:x11

# To play:

$ ~/nh/install/games/nethack

Go get that Amulet!

Project Idea - PI Sw1tch

Posted by Mo Morsi on April 25, 2017 12:07 PM

While gaming is not high on my agenda anymore (... or rather at all), I have recently been mulling buying a new console, to act as much as a home entertainment center as a gaming system.

Having owned several generations PlayStation and Sega products, a few new consoles caught my eye. While the most "open" solution, the Steambox sort-of fizzled out, Nintendo's latest console Switch does seem to stand out of the crowd. The balance between power and portability looks like a good fit, and given Nintendo's previous successes, it wouldn't be surprising if it became a hit.

In addition to the separate home and mobile gaming markets, new entertainment mechanisms are needing to provide seamless integration between the two environments, as well as offer comprehensive data and information access capabilities. After all what'd be the point of a gaming tablet if you couldn't watch Youtube on it! Neal Stephenson recently touched on this at his latest TechCrunch talk, by expressing a vision of technology that is more integrated/synergized with our immediate environment. While mobile solutions these days offer a lot in terms of processing power, nothing quite offers the comfort or immersion that a console / home entertainment solution provides (not to mention mobile phones being horrendous interfaces for gaming purposes!)

Being the geek that I am, this naturally led me to thinking about developing a hybrid mechanism of my own, based on open / existing solutions, so that it could be prototyped and demonstrated quickly. Having recently bought a Raspeberry PI (after putting my Arduino to use in my last microcontroller project), and a few other odds and end pieces, I whipped up the following:

Pi sw1tch

The idea is simple, the Raspberry PI would act as the 'console', with a plethora of games and 'apps' available (via open repositories, steam, emulators, and many more... not to mention Nethack!). It would be anchorable to the wall, desk, or any other surface by using a 3D-printed mount, and made portable via a cheap wireless controller / LCD display / battery pack setup (tied together through another custom 3D printed bracket). The entire rig would be quickly assemblable and easy to use, simply snap the PI into the wall to play on your TV; remove and snap into the controller bracket to take it on the go.

I suspect the power component is going to be the most difficult to nail down, finding an affordable USB power source that is lightweight but offers sufficient juice to drive the Raspberry PI w/ LCD might be tricky. But if this is done correctly, all components will be interchangeable, and one can easily plug in a lower-power microcontroller and/or custom hardware component for a tailored experience.

If there is any interest, let me know via email. If 3 or so people commit, this could be done in a weekend! (stay tuned for updates!)

Nethack Encyclopedia Reduxd

Posted by Mo Morsi on April 24, 2017 05:23 PM

I've been working on way too many projects recently... Alas, I was able to slip in some time to update the NetHack Encyclopedia app on the Android MarketPlace (first released nearly 5 years ago!).

Version 5.3 brings several features including new useful tools. The first is the Message Searcher that allows the user to quickly query the many cryptic game messages by substring & context. Additionally the Game Tracker has been implemented, faciliting player, item, and level identification in a persistant manner. Simply enter entity attributes as they are discovered and the tracker will deduce the remaining missing information based on its internal alogrithm. This is ontop of many enhancements to the backend including the incorporation of a searchable item database.

The logic of the application has been highly refactored & cleaned up, the code has come along ways since first being written. In large, I feel pretty comfortable with the Android platform at the current time, it has its nuances, but all platorms do, and it's pretty easy to go from concept to implementation.

As far as the game itself, I have a ways to go before retrieving the Amulet! It's quite a challenge, but you learn with every replay, and thus you get closer. Ascension will be mine! (someday)

Nethack 5.3 screen1 Nethack 5.3 screen2 Nethack 5.3 screen3 Nethack 5.3 screen4

Lessons on Aikido and Life via Splix

Posted by Mo Morsi on April 24, 2017 05:23 PM

Recently, I've stumbled upon splix, a new obsession game, with simple mechanics that unfold into a complex competitive challenge requiring fast reflexes and dynamic tactics.

Splix intro

At the core the rule set is very simple: - surround territory to claim it - do not allow other players to hit your tail (you lose... game over)

Splix overextended

While in your territory you have no tail, rendering you invulnerable, but during battles territory is always changing, and you don't want to get caught deep on an attack just to be surrounded by an enemy who swaps the territory alignment to his!

Splix deception

The simple dynamic yields an unbelievable amount of strategy & tactics to excel at while at the same time requiring quick calculation and planning. A foolheardy player will just rush into enemy territory to attempt to capture squares and attack his opponent but a smart player will bait his opponent into his sphere of influence through tactful strikes and misdirections.

Splix bait

Furthermore we see age old adages such as "better to run and fight another day" and the wisdom of pitting opponents against each other. Alliances are always shifting in splix, it simply takes a single tap from any other player to end your game. So while you may be momentarily coordinating with another player to surround and obliterate a third, watch your back as the alliance may dissove at the first opportunity (not to mention the possiblity of outside players appearing anytime!)

Splix alliance

All in all, I've found careful observation and quick action to yield the most successful results on the battlefield. The ideal kill is from behind an opponent who has periously invaded your territory deeply. Beyond this, lurking at the border so as the goad the enemy into a foolheardy / reckless attack is a robust tactic provided you have built up the relfexes and coordination to quickly move in and out of territory which is constantly changing. Make sure you don't fall suspect to your own trick and overpenetrate the enemy border!

Splix bait2

Another tactic to deal w/ an overly aggressive opponent is to slightly fallback into your safe zone to quickly return to the front afterwords, perhaps at a different angle or via a different route. Often a novice opponent will see the retreat as a sign of fear or weakness and become over confident, penetrating deep into your territory in the hopes of securing a large portion quickly. By returning to the front at an unexpected moment, you will catch the opponents off guard and be able to destroy them before they have a chance to retreat to their safe zone.

Splix draw out

Of course if the opponent employs the same strategy, a player can take a calculated risk and drive a distance into the enemy territory before returning to the safe zone. By paying attention to the percentage of visible territory which the player's vulnerability zone occupies and the relative position of the opponent, they should be able to safely guage the safe distance to which they can extend so as to ensure a safe return. Taking large amounts of territory quickly is psychologically damaging to an opponent, especially one undergoing attacks on multiple fronts.

Splix draw out2

If all else fails to overcome a strong opponent, a reasonable retreat followed by an alternate attack vector may result in success. Since in splix we know that an safe zone corresponds to only one enemy, if we can guage / guess where they are, we can attempt to alter the dynamics of the battle accordingly. If we see that an opponent has stretch far beyond the mass of his safe zone via a single / thin channel, we can attempt to cut them off, preventing a retreat without crossing your sphere of influence.

Splix changing

This dynamic becomes even more pronounced if we can encircle an opponent, and start slowly reducing his control of the board. By slowly but mechanically & gradually taking enemy territory we can drive an opponent in a desired direction, perhaps towards a wall or other player.

Splix tactics2

Regardless of the situation, the true strategist will always be shuffling his tactics and actions to adapt to the board and setup the conditions for guaranteed victory. At no point should another player be underestimated or trusted. Even a new player with little territory can pose a threat to the top of the leader board given the right conditions and timing. The victorious will stay clam in the heat of the the battle, and use careful observations, timing, and quick reflexes to win the game.

(<endnote> the game *requires* a keyboard, it can be played via smartphone (swapping) but the arrow keys yields the fastest feedback</endnode>)

Search and Replace The VIM Way

Posted by Mo Morsi on April 24, 2017 04:18 PM

Did you know that it is 2017 and the VIM editor still does not have a decent multi-file search and replacement mechanism?! While you can always roll your own, it’s rather cumbersome, and even though some would say this isn’t in the spirit of an editor such as VIM, a large community has emerged around extending it in ways to behave more like a traditional IDE.

Having written about doing something similar to this via the cmd line a while back, and having refactored a large amount of code recently that involved lots of renaming, I figured it was time to write a plugin to do just that, rename strings across source files, using grep and sed

Before we begin, it should be noted that this is of most use with a ‘rooting’ plugin like vim-rooter. By using this, you will ensure vim is always running in the root directory of the project you are working on, regardless of the file being modified. Thus all search & replace commands will be run relative to the top project dir.

To install vsearch, we use Vundle. Setup & installation of that is out of scope for this article, but I highly recommend familiarizing yourself with Vundle as it’s the best Vim plugin management system (in my opinion).

Once Vundle is installed, using vsearch is as simple as adding the following to your ~/.vim/vimrc:

Plugin ‘movitto/vim-vsearch’

Restart Vim and run :PluginInstall to install vsearch from github. Now you’re good to go!

vsearch provides two commands :VSearch and :VReplace.

VSearch simply runs grep and displays the results, without interrupting the buffer you are currently editing.

VReplace runs a search in a similar manner to VSearch but also performs and in-memory string replacement using the specified args. This is displayed to the user who is prompted for comfirmation. Upon receiving it, the plugin then executes sed and reports the results.

VirtFS New Plugin Guide

Posted by Mo Morsi on April 24, 2017 03:27 PM

Having recently extracted much of the FS interface from MiQ into virtfs plugins, it was a good time to write a guide on how to write a new plugin from scratch. It is attached below.

This document details the process of writing a new VirtFS plugin from scratch.

Plugins may be written for many targets, from traditional filesystems (EXT, FAT, XFS), to filesystem-like entities, such as databases and object repositories, to things completely unrelated all together. Once written, VirtFS will use the plugin to expose the underlying component via the Ruby Filesystem API. Simply issue File & Dir calls to files under the specified mountpoint, and VirtFS will take care of the remaining details.

This guide assumes basic familiarity with the Ruby language and gem project format, in this tutorial we will be creating a new gem called virtfs-hellofs for our ‘hello’ filesystem, based on a simple JSON map.

Note, the end result can be seen at virtfs-hellofs

Initial Project Layout

Create a new working directory with the following contents:


TODO: a generator [patches are welcome!]

Required Components

The following components are required to define a full-fledged filesystem plugin:

  • A ‘mounting’ mechanism - Allows VirtFS to load your FS at the specified filesystem path / mountpoint.

  • Core File and Dir classes and class methods - VirtFS maps standard Ruby FS operations to their equivalent plugin calls

  • FS specific representations - the internal representation of filesystem constructs being implemented so as to satisfy the core class calls

Upon instantiation, a fs-specific ‘blocklike device’ is often required so as to provide block-level seek/read/write operations (such as from a physical disk, disk image, or other).

Eventually this will be implemented via a separate abstraction hierarchy, but for the time being virt-disk provides basic functionality to read simple file-based “devices”. Since we are only using a simply in-memory JSON based fs, we do not need to pull in virt_disk here.

Core functionality

First we will define the FS class providing our filesystem interface:


  module VirtFS::HelloFS
    class FS
      include DirClassMethods
      include FileClassMethods

      attr_accessor :mount_point, :superblock

      # Return bool indicating if device contains
      # a HelloFS instance
      def self.match?(device)
          Superblock.new(self, device)
          return true
        rescue => err
          return false

      # Initialze new HelloFS instance w/ the
      # specified device
      def initialize(device)
        @superblock  = Superblock.new(self, device)

      # Return root directory of the filesystem
      def root_dir

      def thin_interface?

      def umount
        @mount_point = nil
    end # class FS
  end # module VirtFS::HelloFS

Here we see a few things, particularly the inclusion of the Directory and File class methods satisfying the VirtFS API (more on those later) and the instantiation of a HelloFS specific Superblock construct.

In the #match? method, We verify the superblock of the underlying device matches that required by hellofs and we specify various core callbacks needed by VirtFS (particularly the #unmount and #thin_interface? methods, see this for more details on thin vs. thick interfaces).

The superblock class for HelloFS is simple, we implement our ‘filesystem’ through a simple json map, passed into virtfs on instantiation


module VirtFS::HelloFS
  # Top level filesystem construct.
  # In our case, we simply create a new
  # root directory from the HelloFS
  # json hash, but in most cases this
  # would parse / read top level metadata
  class Superblock
    attr_accessor :device

    def initialize(fs, device)
      @fs     = fs
      @device = device

    def root_dir
      Dir.new(self, device)
  end # class SuperBlock
end # module VirtFS::Hello


In the previous section the core fs class included two mixins, DirClassMethods and FileClassMethods implementing the VirtFS filesystem interface.


module VirtFS::HelloFS
  class FS
    # VirtFS Dir API implementation, dispatches
    # calls to underlying HelloFS constructs
    module DirClassMethods
      def dir_delete(p)

      def dir_entries(p)
        dir = get_dir(p)
        return nil if dir.nil?

      def dir_exist?(p)

      def dir_foreach(p, &block)
        r = get_dir(p).try(:glob_names)
                      .try(:each, &block)
        block.nil? ? r : nil

      def dir_mkdir(p, permissions)

      def dir_new(fs_rel_path, hash_args, _open_path, _cwd)


      def get_dir(p)
        names = p.split(/[\\\/]/)

        dir = get_dir_r(names)
        raise "Directory '#{p}' not found" if dir.nil?

      def get_dir_r(names)
        return root_dir if names.empty?

        # Check for this path in the cache.
        fname = names.join('/')

        name = names.pop
        pdir = get_dir_r(names)
        return nil if pdir.nil?

        de = pdir.find_entry(name)
        return nil if de.nil?

        Directory.new(self, superblock, de.inode)
    end # module DirClassMethods
  end # class FS
end # module VirtFS::HelloFS

This module implements the standard Ruby Dir Class operations including retrieving & modifying directory contents, and checking for file existence.

Particularly noteworthy is the get_dir method which returns the FS specific dir instance.


module VirtFS::HelloFS
  class FS
    # VirtFS file class implemention, dispatches requests
    # to underlying HelloFS constructs
    module FileClassMethods
      def file_atime(p)

      def file_blockdev?(p)

      def file_chardev?(p)

      def file_chmod(permission, p)
        raise "writes not supported"

      def file_chown(owner, group, p)
        raise "writes not supported"

      def file_ctime(p)

      def file_delete(p)

      def file_directory?(p)
        f = get_file(p)
        !f.nil? && f.dir?

      def file_executable?(p)

      def file_executable_real?(p)

      def file_exist?(p)

      def file_file?(p)
        f = get_file(p)
        !f.nil? && f.file?

      def file_ftype(p)

      def file_grpowned?(p)

      def file_identical?(p1, p2)

      def file_lchmod(permission, p)

      def file_lchown(owner, group, p)

      def file_link(p1, p2)

      def file_lstat(p)

      def file_mtime(p)

      def file_owned?(p)

      def file_pipe?(p)

      def file_readable?(p)

      def file_readable_real?(p)

      def file_readlink(p)

      def file_rename(p1, p2)

      def file_setgid?(p)

      def file_setuid?(p)

      def file_size(p)

      def file_socket?(p)

      def file_stat(p)

      def file_sticky?(p)

      def file_symlink(oname, p)

      def file_symlink?(p)

      def file_truncate(p, len)

      def file_utime(atime, mtime, p)

      def file_world_readable?(p)

      def file_world_writable?(p)

      def file_writable?(p)

      def file_writable_real?(p)

      def file_new(f, parsed_args, _open_path, _cwd)
        file = get_file(f)
        raise Errno::ENOENT, "No such file or directory" if file.nil?
        File.new(file, superblock)


        def get_file(p)
          dir, fname = VfsRealFile.split(p)

            dir_obj = get_dir(dir)
            dir_entry = dir_obj.nil? ? nil : dir_obj.find_entry(fname)
          rescue RuntimeError
    end # module FileClassMethods
  end # class FS
end # module VirtFS::HelloFS

The FileClassMethods module provides all the FS-specific funcality needed by Ruby to dispatch File Class calls (which contains a larger footprint than Dir, hence the need for more methods here).

Here we see many methods are not yet implemented. This is OK for the purposes of use in VirtFS but note any calls to the corresponding methods on a mounted filesystem will fail.

File and Dir classes

The final missing piece of the puzzle is the File and Dir classes. These provide standard interfaces which VirtFS can extract file and dir information.


module VirtFS::HelloFS
  # File class representation, responsible for
  # managing corresponding dir_entry attributes
  # and file content.
  # For HelloFS, files are simple in memory strings
  class File
    attr_accessor :superblock, :dir_entry

    def initialize(superblock, dir_entry)
      @sb        = superblock
      @dir_entry = dir_entry

    def to_h
      { :directory? => dir?,
        :file?      => file?,
        :symlink?   => false }

    def dir?

    def file?

    def fs

    def size
      dir? ? 0 : dir_entry.size

    def close
  end # class File
end # module VirtFS::HelloFS


module VirtFS::HelloFS
  # Dir class representation, responsible
  # for managing corresponding dir_entry
  # attributes
  # For HelloFS, dirs are simply nested
  # json maps
  class Dir
    attr_accessor :sb, :dir_entry

    def initialize(sb, dir_entry)
      @sb        = sb
      @dir_entry = dir_entry

    def close

    def glob_names

    def find_entry(name, type = nil)
      dir = type == :dir
      fle = type == :file

      return nil unless glob_names.include?(name)
      return nil if (dir && !dir_entry[name].is_a?(Hash)) ||
                    (fle && !dir_entry[name].is_a?(String))
      dir ? Dir.new(sb, dir_entry[name]) :
            File.new(sb, dir_entry[name])
  end # class Directory
end # module VirtFS::HelloFS

Again these are fairly straightforward, providing access to the underlying JSON map in a filesystem-like manner.


To finish, we’ll populate the project components required by every rubygem:


require "virtfs/hellofs.rb"


require "virtfs/hellofs/version"
require_relative 'hellofs/fs.rb'
require_relative 'hellofs/dir'
require_relative 'hellofs/file'
require_relative 'hellofs/superblock'


module VirtFS
  module HelloFS
    VERSION = "0.1.0"


lib = File.expand_path('../lib', __FILE__)
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
require 'virtfs/hellofs/version'

Gem::Specification.new do |spec|
  spec.name          = "virtfs-hellofs"
  spec.version       = VirtFS::HelloFS::VERSION
  spec.authors       = ["Cool Developers"]

  spec.summary       = %q{An HELLO based filesystem module for VirtFS}
  spec.description   = %q{An HELLO based filesystem module for VirtFS}
  spec.homepage      = "https://github.com/ManageIQ/virtfs-hellofs"
  spec.license       = "Apache 2.0"

  spec.files         = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
  spec.bindir        = "exe"
  spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
  spec.require_paths = ["lib"]

  spec.add_dependency "activesupport"
  spec.add_development_dependency "bundler"
  spec.add_development_dependency "rake", "~> 10.0"
  spec.add_development_dependency "rspec", "~> 3.0"
  spec.add_development_dependency "factory_girl"


source 'https://rubygems.org'

gem 'virtfs', "~> 0.0.1",
    :git => "https://github.com/ManageIQ/virtfs.git",
    :branch => "master"

# Specify your gem's dependencies in virtfs-hellofs.gemspec

group :test do
  gem 'virt_disk', "~> 0.0.1",
      :git => "https://github.com/ManageIQ/virt_disk.git",
      :branch => "initial"


require "bundler/gem_tasks"
require "rspec/core/rake_task"


task :default => :spec

Packaging It Up

Building virtfs-hellofs.gem is as simple as running:

rake build

in the project directory.

The gem will be written to the ‘pkg’ subdir and is ready for subsequent use / upload to rubygems.


To verify the plugin, create a test module which simply mounts a FS instance and dumps the directory contents:


require 'json'
require 'virtfs'
require 'virtfs/hellofs'

PATH = JSON.parse(File.read('hello.fs'))

exit 1 unless VirtFS::HelloFS::FS.match?(PATH)
fs = VirtFS::HelloFS::FS.new(PATH)

VirtFS.mount fs, '/'
puts VirtFS::VDir.entries('/')

We can create a simple JSON filesystem for testing purposes:


  "f1" : "foobar",
  "f2" : "barfoo",
  "d1" : { "sf1" : "fignewton",
           "sd1" : { "t" : "s" } }

Run the script, and if the directory contents are printed, you verified your FS!


rspec and factory_girl were added as development dependencies to the project and testing the new filesystem is as simple as adding new unit tests.

For ‘real’ filesystems, the plugin author will need to generate a ‘blocklike device’ image and populate it w/ the necessary test data.

Because large block image files are not condusive to source repository systems and automated build systems, virtfs-camcorderfs can be used to record and playback disk interactions in local dev environment, recording text based ‘cassettes’ which may be used to replicate disk interactions. See virtfs-camcorderfs for usage details.

Next Steps

We added barebones basic VirtFS functionality for our hellofs filesystem backend. From here, we can continue expanding upon this, providing read, write, and query support. Once implemented, VirtFS will use this filesystem like every other, providing seamless interchangeabilty!

Your Code Has Impact

Posted by William Brown on March 09, 2017 02:00 PM

Your Code Has Impact

As an engineer, sometimes it’s easy to forget why we are writing programs. Deep in a bug hunt, or designing a new feature it’s really easy to focus so hard on these small things you forget the bigger picture. I’ve even been there and made this mistake.


CVE-2017-2591 - DoS via OOB heap read

Posted by William Brown on February 21, 2017 02:00 PM

CVE-2017-2591 - DoS via OOB heap read

On 18 of Jan 2017, the following email found it’s way to my notifications .

This is to disclose the following CVE:

CVE-2017-2591 389 Directory Server: DoS via OOB heap read

Description :

The "attribute uniqueness" plugin did not properly NULL-terminate an array
when building up its configuration, if a so called 'old-style'
configuration, was being used (Using nsslapd-pluginarg<X> parameters) .

A attacker, authenticated, but possibly also unauthenticated, could
possibly force the plugin to read beyond allocated memory and trigger a

The crash could also possibly be triggered accidentally.

Upstream patch :
Affected versions : from

Fixed version : 1.3.6

Impact: Low
CVSS3 scoring : 3.7 -- CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:L

Upstream bug report : https://fedorahosted.org/389/ticket/48986

So I decided to pull this apart: Given I found the issue and wrote the fix, I didn’t deem it security worthy, so why was a CVE raised?


2016 – My Year in Review

Posted by Justin W. Flory on February 17, 2017 08:30 AM
2016 – My Year in Review

Before looking too far ahead to the future, it’s important to spend time to reflect over the past year’s events, identify successes and failures, and devise ways to improve. Describing my 2016 is a challenge for me to find the right words for. This post continues a habit I started last year with my 2015 Year in Review. One thing I discover nearly every day is that I’m always learning new things from various people and circumstances. Even though 2017 is already getting started, I want to reflect back on some of these experiences and opportunities of the past year.


When I started writing this in January, I read freenode‘s “Happy New Year!” announcement. Even though their recollection of the year began as a negative reflection, the freenode team did not fail to find some of the positives of this year as well. The attitude reflected in their blog post is reflective of the attitude of many others today. 2016 has brought more than its share of sadness, fear, and a bleak unknown, but the colors of radiance, happiness, and hope have not faded either. Even though some of us celebrated the end of 2016 and its tragedies, two thoughts stay in my mind.

One, it is fundamentally important for all of us to stay vigilant and aware of what is happening in the world around us. The changing political atmosphere of the world has brought a shroud of unknowing, and the changing of a number does not and will not signify the end of these doubts and fears. 2017 brings its own series of unexpected events. I don’t consider this a negative, but in order for it not to become a negative, we must constantly remain active and aware.

Secondly, despite the more bleak moments of this year, there has never been a more important time to embrace the positives of the past year. For every hardship faced, there is an equal and opposite reaction. Love is all around us and sometimes where we least expect it. Spend extra time this new year remembering the things that brought you happiness in the past year. Hold them close, but share that light of happiness with others too. You might not know how much it’s needed.

First year of university: complete!

Many things changed since I decided to pack up my life and go to a school a thousand miles away from my hometown. In May, I officially finished my first year at the Rochester Institute of Technology, finishing the full year on dean’s list. Even though it was only a single year, the changes from my decision to make the move are incomparable. Rochester exposed me to amazing, brilliant people. I’m connected to organizations and groups based on my interests like I never imagined. My courses are challenging, but interesting. If there is anything I am appreciative of in 2016, it is for the opportunities that have presented themselves to me in Rochester.

Adventures into FOSS@MAGIC

On 2016 Dec. 10th, the "FOSS Family" went to dinner at a local restaurant to celebrate the semester

On 2016 Dec. 10th, the “FOSS Family” went to dinner at a local restaurant to celebrate the semester

My involvement with the Free and Open Source Software (FOSS) community at RIT has grown exponentially since I began participating in 2015. I took my first course in the FOSS minor, Humanitarian Free and Open Source Software Development in spring 2016. In the following fall 2016 semester, I became the teaching assistant for the course. I helped show our community’s projects at Imagine RIT. I helped carry the RIT FOSS flag in California (more on that later). The FOSS@MAGIC initiative was an influencing factor for my decision to attend RIT and continues to play an impact in my life as a student.

I eagerly look forward to future opportunities for the FOSS projects and initiatives at RIT to grow and expand. Bringing open source into more students’ hands excites me!

I <3 WiC

With a new schedule, the fall 2016 semester marked the beginning of my active involvement with the Women in Computing (WiC) program at RIT, as part of the Allies committee. Together with other members of the RIT community, we work together to find issues in our community, discuss them and share experiences, and find ways to grow the WiC mission: to promote the success and advancement of women in their academic and professional careers.

WiCHacks 2016 Opening CeremonyIn spring 2016, I participated as a volunteer for WiCHacks, the annual all-female hackathon hosted at RIT. My first experience with WiCHacks left me impressed by all the hard work by the organizers and the entire atmosphere and environment of the event. After participating as a volunteer, I knew I wanted to become more involved with the organization. Fortunately, fall 2016 enabled me to become more active and engaged with the community. Even though I will be unable to attend WiCHacks 2017, I hope to help support the event in any way I can.

Also, hey! If you’re a female high school or university student in the Rochester area (or willing to do some travel), you should seriously check this out!

Google Summer of Code

Google Summer of Code, abbreviated to GSoC, is an annual program run by Google every year. Google works with open source projects to offer stipends for them to pay students to work on projects over the summer. In a last-minute decision to apply, I was accepted as a contributing student to the Fedora Project. My proposal was to work within the Fedora Infrastructure team to help automate the WordPress platforms with Ansible. My mentor, Patrick Uiterwijk, provided much of the motivation for the proposal and worked with me throughout the summer as I began learning Ansible for the first time. Over the course of the summer, my learned knowledge began to turn into practical experience.

It would be unfair for a reflection to count successes but not failures. GSoC was one of the most challenging and stressful activities I’ve ever participated in. It was a complete learning experience for me. One area I noted that I needed to improve on was communication. My failing point was not regularly communicating what I was working through or stuck on with my mentor and the rest of the Fedora GSoC community. GSoC taught me the value of asking questions often when you’re stuck, especially in an online contribution format.

On the positive side, GSoC helped formally introduce me to Ansible, and to a lesser extent, the value of automation in operations work. My work in GSoC helped enable me to become a sponsored sysadmin of Fedora, where I mostly focus my time contributing to the Badges site. Additionally, my experience in GSoC helped me when interviewing for summer internships (also more on this later).

Google Summer of Code came with many ups and downs. But I made it and passed the program. I’m happy and fortunate to have received this opportunity from the Fedora Project and Google. I learned several valuable lessons that have and will impact going forward into my career. I look forward to participating either as a mentor or organizer for GSoC 2017 with the Fedora Project this year.

Flock 2016

Group photo of all Flock 2016 attendees outside of the conference venue (Photo courtesy of Joe Brockmeier)

Group photo of all Flock 2016 attendees outside of the conference venue (Photo courtesy of Joe Brockmeier)

Towards the end of summer, in the beginning of August, I was accepted as a speaker to the annual Fedora Project contributor conference, Flock. As a speaker, my travel and accommodation were sponsored to the event venue in Kraków, Poland.

Months after Flock, I am still incredibly grateful for receiving the opportunity to attend the conference. I am appreciative and thankful to Red Hat for helping cover my costs to attend, which is something I would never be able to do on my own. Outside of the real work and productivity that happened during the conference, I am happy to have mapped names to faces. I met incredible people from all corners of the world and have made new lifelong friends (who I was fortunate to see again in 2017)! Flock introduced me in-person to the diverse and brilliant community behind the Fedora Project. It is an experience that will stay with me forever.

To read a more in-depth analysis of my time in Poland, you can read my full write-up of Flock 2016.

To Kraków for Flock with Bee, Amita, Jona, and Giannis

On a bus to the Kraków city center with Bee Padalkar, Amita Sharma, Jona Azizaj, and Giannis Konstantinidis (left to right).

Maryland (Bitcamp), Massachusetts (HackMIT), California (MINECON)

Bitcamp 2016: The Fedora Ambassadors of Bitcamp 2016

The Fedora Ambassadors at Bitcamp 2016. Left to right: Chaoyi Zha (cydrobolt), Justin W. Flory (jflory7), Mike DePaulo (mikedep333), Corey Sheldon (linuxmodder)

2016 provided me the opportunity to explore various parts of my country. Throughout the year, I attended various conferences to represent the Fedora Project, the SpigotMC project, and the RIT open source community.

There are three distinct events that stand out in my memory. For the first time, I visited the University of Maryland for Bitcamp as a Fedora Ambassador. It also provided me an opportunity to see my nation’s capitol for the first time. I also visited Boston for the first time this year as well for HackMIT, MIT’s annual hackathon event. I also participated as a Fedora Ambassador and met brilliant students from around the country (and even the world, with one student I met flying in from India for the weekend).

"Team Ubuntu" shows off their project to Charles Profitt before the project deadline for HackMIT 2016

“Team Ubuntu” shows off their project to Charles Profitt before the project deadline for HackMIT 2016

Lastly, I also took my first journey to the US west coast for MINECON 2016, the annual Minecraft convention. I attended as a staff member of the SpigotMC project and a representative of the open source community at RIT.

All three of these events have their own event reports to go with them. More info and plenty of pictures are in the full reports.

Vermont 2016 with Matt

Shortly after I arrived, Matt Coutu took me around to see the sights and find coffee

Shortly after I arrived, Matt took me around to see the sights and find coffee.

Some trips happen without prior arrangements and planning. Sometimes, the best memories are made by not saying no. I remember the phone call with one of my closest friends, Matt Coutu, at some point in October. On a sudden whim, we planned my first visit to Vermont to visit him. Some of the things he told me to expect made me excited to explore Vermont! And then in the pre-dawn hours of November 4th, I made the trek out to Vermont to see him.

50 feet up into the air atop Spruce Mountain was colder than we expected

50 feet up into the air atop Spruce Mountain was colder than we expected.

Instantly when crossing over the state border, I knew this was one of the most beautiful states I ever visited. During the weekend, the two of us did things that I think only the two of us would enjoy. We climbed a snowy mountain to reach an abandoned fire watchtower, where we endured a mini blizzard. We walked through a city without a specific destination in mind, but to go wherever the moment took us.

We visited a quiet dirt road that led to a meditation house and cavern maintained by monks, where we meditated and drank in the experience. I wouldn’t classify the trip has a high-energy or engaging trip, but for me, it was one of the most enjoyable trips I’ve embarked on yet. There are many things that I still hold on to from that weekend for remembering or reflecting back on.

A big shout-out to Matt for always supporting me with everything I do and always being there when we need each other.

Martin Bridge may not be one of your top places to visit in Vermont, but if you keep going, you'll find a one-of-a-kind view

Martin Bridge may not be one of your top places to visit in Vermont, but if you keep going, you’ll find a one-of-a-kind view.

Finally seeing NYC with Nolski

Mike Nolan and Justin W. Flory venture through New York City early on a Sunday evening

Mike Nolan and I venture through New York City early on a Sunday evening

In no short time after the Vermont trip, I purchased tickets for my favorite band, El Ten Eleven, in New York City on November 12th. What turned into a one-day trip to see the band turned into an all-weekend trip to see the band, see New York City, and spend some time catching up with two of my favorite people, Mike Nolan (nolski) and Remy DeCausemaker (decause). During the weekend, I saw the World Trade Center memorial site for the first time, tried some amazing bagels, explored virtual reality in Samsung’s HQ, and got an exclusive inside look at the Giphy office.

This was my third time in New York City, but my first time to explore the city. Another shout-out goes to Mike for letting me crash on his couch and stealing his Sunday to walk through his metaphorical backyard. Hopefully it isn’t my last time to visit the city either!

Finalizing study abroad

This may be cheating since it was taken in 2017, but this is one of my favorite photos from Dubrovnik, Croatia so far

This may be cheating since it was taken in 2017, but this is one of my favorite photos from Dubrovnik, Croatia so far. You can find more like this on my 500px gallery!

At the end of 2016, I finalized a plan that was more than a year in the making. I applied and was accepted to study abroad at the Rochester Institute of Technology campus in Dubrovnik, Croatia. RIT has a few satellite campuses across the world: two in Croatia (Zagreb and Dubrovnik) and one in Dubai, UAE. In addition to being accepted, the university provided me a grant to further my education abroad. I am fortunate to have received this opportunity and can’t wait to spend the next few months of my life in Croatia. I am currently studying in Dubrovnik since January until the end of May.

During my time here, I will be taking 12 credit hours of courses. I am taking ISTE-230 (Introduction to Database and Data Modeling), ENGL-361 (Technical Writing), ENVS-150 (Ecology of the Dalmatian Coast), and lastly, FOOD-161 (Wines of the World). The last one was a fun one that I took for myself to try broadening my experiences while abroad.

Additionally, one of my personal goals for 2017 is to practice my photography skills. During my time abroad, I have created a gallery on 500px where I upload my top photos from every week. I welcome feedback and opinions about my pictures, and if you have criticism for how I can improve, I’d love to hear about it!

Accepting my first co-op

The last big break that I had in 2016 was accepting my first co-op position. Starting in June, I will be a Production Engineering Intern at Jump Trading, LLC. I started interviewing with Jump Trading in October and even had an on-site interview that brought me to their headquarters in Chicago at the beginning of December. After meeting the people and understanding the culture of the company, I am happy to accept a place at the team. I look forward to learning from some of the best in the industry and hope to contribute to some of the fascinating projects going on there.

From June until late August, I will be starting full-time at their Chicago office. If you are in the area or ever want to say hello, let me know and I’d be happy to grab coffee, once I figure out where all the best coffee shops in Chicago are!

In summary

2015 felt like a difficult year to follow, but 2016 exceeded my expectations. I acknowledge and I’m grateful for the opportunities this year presented to me. Most importantly, I am thankful for the people who have touched my life in a unique way. I met many new people and strengthened my friendships and bonds with many old faces too. All of the great things from the past year would not be possible without the influence, mentorship, guidance, friendship, and comradery these people have given me. My mission is to always pay it forward to others in any way that I can, so that others are able to experience the same opportunities (or better).

2017 is starting off hot and moving quickly, so I hope I can keep up! I can’t wait to see what this year brings and hope that I have the chance to meet more amazing people, and also meet many of my old friends again, wherever that may be.

Keep the FOSS flag high.

The post 2016 – My Year in Review appeared first on Justin W. Flory's Blog.

Usability of software: The challenges facing projects

Posted by William Brown on January 22, 2017 02:00 PM

Usability of software: The challenges facing projects

I have always desired the usability of software like Directory Server to improve. As a former system administrator, usabilty and documentation are very important for me. Improvements to usability can eliminate load on documentation, support services and more.

Consider a microwave. No one reads the user manual. They unbox it, plug it in, and turn it on. You punch in a time and expect it to “make cold things hot”. You only consult the manual if it blows up.

Many of these principles are rooted in the field of design. Design is an important and often over looked part of software development - All the way from the design of an API to the configuration, and even the user interface of software.


The next year of Directory Server

Posted by William Brown on January 22, 2017 02:00 PM

The next year of Directory Server

Last year I wrote a post about the vision behind Directory Server and what I wanted to achieve in the project personally. My key aims were:

  • We need to modernise our tooling, and installers.
  • Setting up replication groups and masters needs to be simpler.
  • We need to get away from long lived static masters.
  • During updates, we need to start to enable smarter choices by default.
  • Out of the box we need smarter settings.
  • Web Based authentication


LCA2017 - Getting Into the Rusty Bucket

Posted by William Brown on January 22, 2017 02:00 PM

LCA2017 - Getting Into the Rusty Bucket

I spoke at Linux Conf Australia 2017 recently. I presented techniques and lessons about integrating Rust with existing C code bases. This is related to my work on Directory Server.

The recording of the talk can be found on youtube and on the Linux Australia Mirror .

You can find the git repository for the project on github .

The slides can be viewed on slides.com .

I have already had a lot of feedback on improvements to make to this system including the use of struct pointers instead of c_void, and the use of bindgen in certain places.