Fedora summer-coding Planet

The Last Two Years

Posted by Mo Morsi on November 01, 2020 04:00 AM

The last post on this blog (now mobile friendly!) was just over two years ago. Shortly after Dev Null Productions was launched, long before COVID19. That world is now foreign and as we live and grow, we transform in ways we never expected, our experiences drive us to new insights and perspectives. The past few years has seen alot of changes, which in return has led to much reflection. Meditation helps the day in and day out in this increasingly complex world. There are many great articles on the subject, I encourage all to read the one that introduced me onto the subject many years ago.

Meditation

I also advise caution to those finding themselves spending a large amout of time in The Matrix. While technical development can be fruitful and rewarding, there is alot to be said about the balance in life and human connection. Any worthwhile endeavor takes many hours and focus, especially one as complex as launching a business, each individual needs to undertake their own journey. Dev Null Productions moves forward and I'm excited for the things that are next in store.

But before that I'll be taking some personal time off, many hours were spent during the first 1/2 of the year running the NYC/XRP meetup and building and launching our latest project Zerp Tracker. I'm proud of what we delivered and the product is working beautifully with no manual intervention, but the body and mind needs to recuperate.

I've been playing alot of guitar, honing my rollerblading technique, and am planning on launching a cross-country road trip in the near future... stay tuned for photos! Unfortunately the pandemic and looming economic crisis may affect that, the extent and ramifications of which noone will be able to foresee, but the open road calls and I look forward to the sights and experiences across this great planet.

Openroad

Data & Market Analysis in C++, R, and Python

Posted by Mo Morsi on October 30, 2020 06:22 AM

In recent years, since efforts on The Omega Project and The Guild sort of fizzled out, I've been exploring various areas of interest with no particular intent other than to play around with some ideas. Data & Financial Engineering was one of those domains and having spent some time diving into the subject (before once again moving on to something else altogether) I'm sharing a few findings here.

My journey down this path started not too long after the Bitcoin Barber Shop Pole was completed, and I was looking for a new project to occupy my free time (the little of it that I have). Having long since stepped down from the SIG315 board, but still renting a private office at the space, I was looking for some way to incorporate that into my next project (besides just using it as the occasional place to work). Brainstorming a bit, I settled on a data visualization idea, where data relating to any number of categories would be aggregated, geotagged, and then projected onto a virtual globe. I decided to use the Marble widget library, built ontop of the QT Framework and had great success:

Datachoppa

The architecture behind the DataChoppa project was simple, a generic 'Data' class was implemented using smart pointers ontop of which the Facet Pattern was incorporated, allowing data to be recorded from any number of sources in a generic manner and represented via convenient high level accessors. This was all collected via synchronization and generation plugins which implement a standarized interface whose output was then fed onto a queue on which processing plugings were listening, selecting the data that they were interested in to be operated on from there. The Processors themselves could put more data onto the queue after which the whole process was repeated ad inf., allowing each plugin to satisfy one bit of data-related functionality.

Datachoppa arch

Core Generic & Data Classes

namespace DataChoppa{
  // Generic value container
  class Generic{
      Map<std::string, boost::any> values;
      Map<std::string, std::string> value_strings;
  };

  namespace Data{
    /// Data representation using generic values
    class Data : public Generic{
      public:
        Data() = default;
        Data(const Data& data) = default;

        Data(const Generic& generic, TYPES _types, const Source* _source) :
          Generic(generic), types(_types), source(_source) {}

        bool of_type(TYPE type) const;

        Vector to_vector() const;

      private:
        TYPES types;

        const Source* source;
    }; // class Data
  }; // namespace Data
}; // namespace DataChoppa

The Process Loop

  namespace DataChoppa {
    namespace Framework{
      void Processor::process_next(){
        if(to_process.empty()) return;
  
        Data::Data data = to_process.first();
        to_process.pop_front();
  
        Plugins::Processors::iterator plugin = plugins.begin();
  
        while(plugin != plugins.end()) {
          Plugins::Meta* meta = dynamic_cast<Plugins::Meta*>(*plugin);
          //LOG(debug) << "Processing " << meta->id;
  
          try{
            queue((*plugin)->process(data));
  
          }catch(const Exceptions::Exception& e){
            LOG(warning) << "Error when processing: " << e.what()
                         << " via " << meta->id;
          }
  
          plugin++;
        }
      }
    }; /// namespace Framework
  }; /// namespace DataChoppa

The HTTP Plugin (abridged)

namespace DataChoppa {
  namespace Plugins{
    class HTTP : public Framework::Plugins::Syncer,
                 public Framework::Plugins::Job,
                 public Framework::Plugins::Meta {
      public:
        /// ...

        /// sync - always return data to be added to queue, even on error
        Data::Vector sync(){
          String _url = url();
          Network::HTTP::SyncRequest request(_url, request_timeout);

          for(const Network::HTTP::Header& header : headers())
            request.header(header);

          int attempted = 0;
          Network::HTTP::Response response(request);

          while(attempts == -1 || attempted &lt; attempts){
            ++attempted;

            try{
              response.update_from(request.request(payload()));

            }catch(Exceptions::Timeout){
              if(attempted == attempts){
                Data::Data result = response.to_error_data();
                result.source = &source;
                return result.to_vector();
              }
            }

            if(response.has_error()){
              if(attempted == attempts){
                Data::Data result = response.to_error_data();
                result.source = &source;
                return result.to_vector();
              }

            }else{
              Data::Data result = response.to_data();
              result.source = &source;
              return result.to_vector();
            }
          }

          /// we should never get here
          return Data::Vector();
        }
    };
  }; // namespace Plugins
}; // namespace DataChoppa

Overall I was pleased with the result (and perhaps I should have stopped there...). The application collected and aggregated data from many sources including RSS feeds (google news, reddit, etc), weather sources (yahoo weather, weather.com), social networks (facebook, twitter, meetup, linkedin), chat protocols (IRC, slack), financial sources, and much more. While exploring the last I discovered the world of technical analysis and began incorporating many various market indicators into a financial analysis plugin for the project.

The Market Analysis Architecture

Datachoppa extractors Datachoppa annotators

Aroon Indicator (for example)

namespace DataChoppa{
  namespace Market {
    namespace Annotators {
      class Aroon : public Annotator {
        public:
          double aroon_up(const Quote& quote, int high_offset, double range){
            return ((range-1) - high_offset) / (range-1) * 100;
          }

          DoubleVector aroon_up(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
            return quotes.collect<DoubleVector>([extrema, range](const Quote& q, int i){
                     return aroon_up(q, extrema->high_offsets[i], range);
                   });
          }

          double aroon_down(const Quote& quote, int low_offset, double range){
            return ((range-1) - low_offset) / (range-1) * 100;
          }

          DoubleVector aroon_down(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
            return quotes.collect<DoubleVector>([extrema, range](const Quote& q, int i){
                     return aroon_down(q, extrema->low_offsets[i], range);
                   });
          }

          AnnotationList annotate() const{
            const Quotes& quotes = market->quotes;
            if(quotes.size() < range) return AnnotationList();

            const Annotations::Extrema* extrema = aroon_extrema(market, range);
                    Annotations::Aroon* aroon = new Annotations::Aroon(range);
                                        aroon->upper = aroon_up(market->quotes, extrema, range);
                                        aroon->lower = aroon_down(market->quotes, extrema, range);
            return aroon->to_list();
          }
      }; /// class Aroon
    }; /// namespace Annotators
  }; /// namespace Market
}; // namespace DataChoppa

The whole thing worked great, data was pulled in both real time and historical from yahoo finance (until they discontinued it... from then it was google finance), the indicators were run, and results were output. Of course, making $$$ is not as simple as just crunching numbers, and being rather naive I just tossed the results of the indicators into weighted "buckets" and backtested based on simple boolean flags based on the computed signals against threshold values. Thankfully I backtested though as the performance was horrible as losses greatly exceed profits :-(

At this point I should take a step back and note that my progress so far was the result of the availibilty of alot of great resources (we really live in the age of accelerated learning). Specifically the following are indispensible books & sites for those interested in this subject:

  • stockcharts.com - Information on any indicator can be found on this site with details on how it is computed and how it can be used
  • investopedia - Sort of the Wikipedia of investment knowledge, offers great high level insights into how market works and the financial world as it stands
  • Beyond Candlesticks - Though candlestick patterns have limited use, this is great intro to the subject, and provides a good into to reading charts.
  • Nerds on Wall Street - A great book detailing the history of computational finance. Definetly must read if you are new to the domain as it provides a concise high level history on how markets have worked the last few centuries and various computations techniques employed to Seek Alpha
  • High Probability Trading - Provides insights as to the mentality and common pitfalls when trading.
Beyond candlesticks Nerds on wallstreet High prob trading

The last book is an excellent resource which conveys the importance of money and risk management, as well as the necessity to combine in all factors, or as many factors as you can, when making financial decisions. In the end, I feel this is the gist of it, it's not soley a matter of luck (though there is an aspect of that to this), but rather patience, discipline, balance, and most importantly focus (similar to Aikido but that's a topic for another time). There is no shorting it (unless you're talking about the assets themselves!), and if one does not have / take the necessary time to research and properly plan and out and execute strategies, they will most likely fail (as most do according to the numbers).

It was at this point that I decided to take a step back and restrategize, and having reflected and discussed it over with some acquaintances, I hedged my bets, cut my losses (tech-wise) and switched from C++ to another platform which would allow me prototype and execute ideas quicker. A good amount of time has gone into the C++ project and it worked great, but it did not make sense to continue via a slower development cycle when faster options are available (and afterall every engineer knows time is our most precious resource).

Python and R are the natural choices for this project domain, as there is extensive support in both languages for market analysis, backtesting, and execution. I have used Python at various, points in the past so it was easy to hit the ground running; R was new but by this time no language really poses a serious surprise, the best way I can describe it is spreadsheets on steroids (not exactly, as rather than spreadsheets, data frames and matrixes are the core components, but one can imagine R as being similar to the central execution environment behind Excel, Matlab, or other statistical-software).

I quickly picked up quantmod and prototyped some volatility, trend-following, momentum, and other analysis signal generators in R, plotting them using the provided charting interface. R is a great language for this sort of data manipulation, one can quickly load up structured data from CSV files or online resources, splice it and dice it, chunk it and dunk it, organize it and prioritize it, according to any arithmatic, statistical, or linear/non-linear means which they desire. Quickly loading a new 'view' on the data is as simply as a line of code, and operations can quickly be chained together at high performance.

Volatility indicator in R (consolidated)

quotes <- load_default_symbol("volatility")

quotes.atr <- ATR(quotes, n=ATR_RANGE)

quotes.atr$tr_atr_ratio <- quotes.atr$tr / quotes.atr$atr
quotes.atr$is_high      <- ifelse(quotes.atr$tr_atr_ratio > HIGH_LEVEL, TRUE, FALSE)

# Also Generate ratio of atr to close price
quotes.atr$atr_close_ratio <- quotes.atr$atr / Cl(quotes)

# Generate rising, falling, sideways indicators by calculating slope of ATR regression line
atr_lm       <- list()
atr_lm$df    <- data.frame(quotes.atr$atr, Time = index(quotes.atr))
atr_lm$model <- lm(atr ~ poly(Time, POLY_ORDER), data = atr_lm$df) # polynomial linear model

atr_lm$fit   <- fitted(atr_lm$model)
atr_lm$diff  <- diff(atr_lm$fit)
atr_lm$diff  <- as.xts(atr_lm$diff)

# Current ATR / Close Ratio
quotes.atr.abs_per <- median(quotes.atr$atr_close_ratio[!is.na(quotes.atr$atr_close_ratio)])

# plots
chartSeries(quotes.atr$atr)
addLines(predict(atr_lm$model))
addTA(quotes.atr$tr, type="h")
addTA(as.xts(as.logical(quotes.atr$is_high), index(quotes.atr)), col=col1, on=1)

While it all works great, the R language itself offers very little syntactic sugar for operations not related to data-processing. While there are libraries for most common functionality found in many other execution environments, languages such as Ruby and Python, offer a "friendlier" experience to both novice and seasoned developers alike. Furthermore the process of data synchronization was a tedious step, I was looking for something that offered the flexability of DataChoppa to pull in and process live and historical data from a wide variety of sources, caching results on the fly, and using those results and analysis for subsequent operations.

This all led me to developing a series of Python libraries targeted towards providing a configurable high level view of the market. Intelligence Amplification (IA) as opposed to Artifical Intelligence (AI) if you will (see Nerds on Wall Street).

marketquery.py is a high level market querying library, which implements plugins used to resolve generic market queries for ticker time based data. One can used the interface to query for the lastest quotes or a specific range of them from a particular source, or allow the framework to select one for you.

Retrieve first 3 months of the last 5 years of GBPUSD data

  from marketquery.querier        import Querier
  from marketbase.query.builder   import QueryBuilder
  
  sym = "GBPUSD"
  
  first_3mo_of_last_5yr = (QueryBuilder().symbol(sym)
                                         .first("3months_of_year")
                                         .last("5years")
                                         .query)
  
  querier = Querier()
  res     = querier.run(first_3mo_of_last_5yr)
  
  for query, dat in res.items():
      print(query)
      print(dat.raw[:1000] + (dat.raw[1000:] and '...'))

Retrieve last two month of hourly EURJPY data

  from marketquery.querier        import Querier
  from marketbase.query.builder   import QueryBuilder
  
  sym = "EURJPY"
  
  two_months_of_hourly = (QueryBuilder().symbol(sym)
                                        .last("2months")
                                        .interval("hourly")
                                        .query)
  
  querier = Querier()
  res     = querier.run(two_months_of_hourly).raw()
  print(res[:1000] + (res[1000:] and '...'))

This provides a quick way to both lookup market data according to specific criteria, as well as cache it so that network resources are used effectively. All caching is configurable, and the user can define timeouts based on the target query, source, and/or data retrieved.

From there the next level up is the technical analysis is was trivial to whip up the tacache.py module which uses the marketquery.py interface to retrieve raw data before feeding it into TALib caching the results. The same caching mechanisms offering the same flexability is employed, if one needs to process a large data set and/or subsets multiple times in a specified period, computational resources are not wasted (important when running on a metered cloud)

Computing various technical indicators

  from marketquery.querier       import Querier
  from marketbase.query.builder  import QueryBuilder
  
  from tacache.runner            import TARunner
  from tacache.source            import Source
  from tacache.indicator         import Indicator
  from talib                     import SMA
  from talib                  import MACD
  
  ###
  
  res = Querier().run(QueryBuilder().symbol("AUDUSD")
                                    .query)
  
  ###
  
  ta_runner = TARunner()
  analysis  = ta_runner.run(Indicator(SMA),
                            query_result=res)
  print(analysis.raw)
  
  analysis  = ta_runner.run(Indicator(MACD),
                            query_result=res)
  macd, sig, hist = analysis.raw
  print(macd)

Finally ontop of all this I wrote a2m.py, a high level querying interface consisting of modules reporting on market volatility and trends as well as other metrics; python scripts which I could quickly execute to report the current and historical market state, making used of the underlying cached query and technical analysis data, periodically invalidated to pull in new/recent live data.

Example using a2m to compute volatility

  sym = "EURUSD"
  self.resolver  = Resolver()
  self.ta_runner = TARunner()

  daily = (QueryBuilder().symbol(sym)
                         .interval("daily")
                         .last("year")
                         .query)

  hourly = (QueryBuilder().symbol(sym)
                          .interval("hourly")
                          .last("3months")
                          .latest()
                          .query)

  current = (QueryBuilder().symbol(sym)
                           .latest()
                           .data_dict()
                           .query)

  daily_quotes   = resolver.run(daily)
  hourly_quotes  = resolver.run(hourly)
  current_quotes = resolver.run(current)

  daily_avg  = ta_runner.run(Indicator(talib.SMA, timeperiod=120),  query_result=daily_quotes).raw[-1]
  hourly_avg = ta_runner.run(Indicator(talib.SMA, timeperiod=30),  query_result=hourly_quotes).raw[-1]

  current_val    = current_quotes.raw()[-1]['Close']
  daily_percent  = current_val / daily_avg  if current_val &lt; daily_avg  else daily_avg  / current_val
  hourly_percent = current_val / hourly_avg if current_val &lt; hourly_avg else hourly_avg / current_val
Awesome to the max

I would go onto use this to execute some Forex trades, again not in an algorithmic / automated manner, but rather based on combined knowledge from fundamentals research, as well as the high level technical data, and what was the result...

Poor squidward

I jest, though I did lose a little $$$, it wasn't that much, and to be honest I feel this was due to lack of patience/discipline and other "novice" mistakes as discussed above. I did make about 1/2 of it back, and then lost interest. This all requires alot of focus and time, and I had already spent 2+ years worth of free time on this. With many other interests pulling my strings, I decided to sideline the project(s) alltogether and focus on my next crazy venture.

TLDR;

After some of consideration, I decided to release the R code I wrote under the MIT license. They are rather simple expirements though could be useful as a starting point for others new to the subject. As far as the Python modules and DataChoppa, I intended to eventually release them but aim to take a break first to focus on other efforts and then go back to the war room, to figure out the next stage of the strategy.

And that's that! Enough number crunching, time to go out for a hike!

Hiking meme

Why I still choose Ruby

Posted by Mo Morsi on October 30, 2020 06:17 AM

With the plethora of languages available to developers, I wanted to do a quick follow-up post as to why given my experience in many different environments, Ruby is still the goto language for all my computational needs!

Prg mtn

While different languages offer different solutions in terms of syntax support, memory managment, runtime guarantees, and execution flows; the underlying arithmatic, logical, and I/O hardware being controlled is the same. Thus in theory, given enough time and optimization the performance differences between languages should go to 0 as computational power and capacity increases / goes to infinity (yes, yes, Moore's law and such, but lets ignore that for now).

Of course different classes of problem domains impose their own requirements,

  • real time processing depends low level optimizations that can only be done in assembly and C,
  • data crunching and process parallelization often needs minimal latency and optimized runtimes, something which you only get with compiled/static-typed languages such as C++ and Java,
  • and higher level languages such as Ruby, Python, Perl, and PHP are great for rapid development cycles and providing high level constructs where complicated algorithms can be invoked via elegant / terse means.

But given the rapid rate of hardware performance in recent years, whole classes of problems which were previously limited to 'lower-level' languages such as C and C++ are now able to be feasbily implemented in higher level languages.

Computer power

(source)

Thus we see high performance financial applications being implemented in Python, major websites with millions of users a day being implemented in Ruby and Javascript, massive data sets being crunched in R, and much more.

So putting the performance aspect of these environments aside we need to look at the syntactic nature of these languages as well as the features and tools they offer for developers. The last is the easiest to tackle as these days most notable languages come with compilers/interpreters, debuggers, task systems, test suites, documentation engines, and much more. This was not always the case though as Ruby was one of the first languages that pioneered builtin package management through rubygems, and integrated dependency solutions via gemspecs, bundler, etc. CPAN and a few other language-specific online repositories existed before, but with Ruby you got integration that was a core part of the runtime environment and community support. Ruby is still known to be on the leading front of integrated and end-to-end solutions.

Syntax differences is a much more difficult subject to objectively dicuss as much of it comes down to programmer preference, but it would be hard to object to the statement that Ruby is one of the most Object Oriented languages out there. It's not often that you can call the string conversion or type identification methods on ALL constructs, variables, constants, types, literals, primitives, etc:

  > 1.to_s
  => "1"
  > 1.class
  => Integer

Ruby also provides logical flow control constructs not seen in many other languages. For example in addition to the standard if condition then dosomething paradigm, Ruby allows the user to specify the result after the predicate, eg dosomething if condition. This simple change allows developers to express concepts in a natural manner, akin to how they would often be desrcibed between humans. In addition to this, other simple syntax conveniences include:

  • The unless keyword, simply evaluating to if not
      file = File.open("/tmp/foobar", "w")
      file.write("Hello World") unless file.exist?
    
  • Methods are allowed to end with ? and ! which is great for specifying immutable methods (eg. Socket.open?), mutable methods and/or methods that can thrown an exception (eg. DBRecord.save!)
  • Inclusive and exclusive ranges can be specified via parenthesis and two or three elipses. So for example:
      > (1..4).include?(4)
      => true
      > (1...4).include?(4)
      => false
    
  • The yield keywork makes it trivial for any method to accept and invoke a callback during the course of its lifetime
  • And much more

Expanding upon the last, blocks are a core concept in Ruby, once which the language nails right on the head. Not only can any function accept an anonymous callback block, blocks can be bound to parameters and operated on like any other data. You can check the number of parameters the callbacks accept by invoking block.arity, dynamically dispatch blocks, save them for later invokation and much more.

Due to the asynchronous nature of many software solutions (many problems can be modeled as asynchronous tasks) blocks fit into many Ruby paradigms, if not as the primary invocation mechanism, then as an optional mechanism so as to enforce various runtime guarantees:

  File.open("/tmp/foobar"){ |file|
    # do whatever with file here
  }

  # File is guaranteed to be closed here, we didn't have to close it ourselves!

By binding block contexts, Ruby facilitates implementing tightly tailored solutions for many problem domains via DSLs. Ruby DSLs exists for web development, system orchestration, workflow management, and much more. This of course is not to mention the other frameworks, such as the massively popular Rails, as well as other widely-used technologies such as Metasploit

Finally, programming in Ruby is just fun. The language is condusive to expressing complex concepts elegantly, jives with many different programming paradigms and styles, and offers a quick prototype to production workflow that is intuitive for both novice and seasoned developers. Nothing quite scratches that itch like Ruby!

Doge ruby

RETerm to The Terminal with a GUI

Posted by Mo Morsi on October 30, 2020 06:11 AM

When it comes to user interfaces, most (if not all) software applications can be classified into one of three categories:

  • Text Based - whether they entail one-off commands, interactive terminals (REPL), or text-based visual widgets, these saw a major rise in the 50s-80s though were usurped by GUIs in the 80s-90s
  • Graphical - GUIs, or Graphical User Interfaces, facilitate creating visual windows which the user may interact with via the mouse or keyboard. There are many different GUI frameworks available for various platforms
  • Web Based - A special type of graphical interface rendered via a web browser, many applications provide their frontend via HTML, Javascript, & CSS
Interfaces comparison

In recent years modern interface trends seem to be moving in the direction of the Web User Interfaces (WUI), with increasing numbers of apps offering their functionality primarily via HTTP. That being said GUIs and TUIs (Text User Interfaces) are still an entrenched use case for various reasons:

  • Web browsers, servers, and network access may not be available or permissable on all systems
  • Systems need mechanisms to access and interact with the underlying components, incase higher level constructs, such as graphics and network subsystems fail or are unreliable
  • Simpler text & graphical implementations can be coupled and optimized for the underlying operational environment without having to worry about portability and cross-env compatability. Clients can thus be simpler and more robust.

Finally there is a certain pleasing ascethic to simple text interfaces that you don't get with GUIs or WUIs. Of course this is a human-preference sort-of-thing, but it's often nice to return to our computational roots as we move into the future of complex gesture and voice controlled computer interactions.

Scifi terminal

When working on a recent side project (to be announced), I was exploring various concepts as to the user interface which to throw ontop of it. Because other solutions exist in the domain which I'm working in (and for other reasons), I wanted to explore something novel as far as user interaction, and decided to expirement with a text-based approach. ncurses is the goto library for this sort of thing, being available on most modern platforms, along with many widget libraries and high level wrappers

Ncurses

Unfortunately ncurses comes with alot of boilerplate and it made sense to seperate that from the project I intend to use this for. Thus the RETerm library was born, with the intent to provide a high level DSL to implmenent terminal interfaces and applications (... in Ruby of couse <3 !!!)

Reterm sc1

RETerm, aka the Ruby Enhanced TERMinal allows the user to incorporate high level text-based widgets into an orgnaized terminal window, with seemless standardized keyboard interactions (mouse support is on the roadmap to be added). So for example, one could define a window containing a child widget like so:

require 'reterm'
include RETerm

value = nil

init_reterm {
  win = Window.new :rows => 10,
                   :cols => 30
  win.border!
  update_reterm

  slider = Components::VSlider.new
  win.component = slider
  value = slider.activate!
}

puts "Slider Value: #{value}"

This would result in the following interface containing a vertical slider:

Reterm sc2

RETerm ships with many built widgets including:

Text Entry

Reterm sc3

Clickable Button

Reterm sc4

Radio Switch/Rocker/Selectable List

Reterm sc5 Reterm sc6 Reterm sc7

Sliders (both horizontal and vertical)

Dial

Ascii Text (with many fonts via artii/figlet)

Reterm sc8

Images (via drawille)

Reterm sc9

RETerm is now available via rubygems. To install, simplly:

  $ gem install reterm

That's All Folks... but wait there is more!!! Afterall:

Delorian meme

For a bit of a value-add, I decided to implement a standard schema where text interfaces could be described in a JSON config file and loaded by the framework, similar to xml schemas which GTK and Android use for their interfaces. One can simply describe their interface in JSON and the framework will instantiate the corresponding text interface:

{
  "window" : {
    "rows"      : 10,
    "cols"      : 50,
    "border"    : true,
    "component" : {
      "type" : "Entry",
      "init" : {
        "title" : "<C>Demo",
        "label" : "Enter Text: "
      }
    }
  }
}
Reterm sc10

To assist in generating this schema, I implemented a graphical designer, where components can be dragged and dropped into a 2D canvas to layout the interface.

That's right, you can now use a GUI based application to design a text-based interface.

Retro meme

The Designer itself can be found in the same repo as the RETerm project, loaded in the "designer/" subdir.

Reterm designer

To use if you need to install visualruby (a high level wrapper to ruby-gnome) like so:

  $ gem install visualruby

And that's it! (for real this time) This was certainly a fun side-project to a side-project (toss in a third "side-project" if you consider the designer to be its own thing!). As I to return to the project using RETerm, I aim to revisit it every so often, adding new features, widgets, etc....

EOF

CLS

Into The Unknown - My Departure from RedHat

Posted by Mo Morsi on October 30, 2020 06:09 AM

In May 2006, a young starry eyed intern walked into the large corporate lobby of RedHat's Centential Campus in Raleigh, NC, to begin what would be a 12 year journey full of ups and downs, break-throughs and setbacks, and many many memories. Flash forward to April 2018, when the "intern-turned-hardend-software-enginner" filed his resignation and ended his tenure at RedHat to venture into the risky but exciting world of self-employment / entrepreneurship... Incase you were wondering that former-intern / Software Engineer is myself, and after nearly 12 years at RedHat, I finished my last day of employment on Friday April 13th, 2018.

Overall RedHat has been a great experience, I was able to work on many ground-breaking products and technologies, with many very talented individuals from across the spectrum and globe, in a manner that facilitated maximum professional and personal growth. It wasn't all sunshine and lolipops though, there were many setbacks, including many cancelled projects and dead-ends. That being said, I felt I was always able to speak my mind without fear of reprocussion, and always strived to work on those items that mattered the most and had the furthest reaching impact.

Some (but certainly not all) of those items included:

  • The highly publicized, but now defunct, RHX project
  • The oVirt virtualization management (cloud) platform, where I was on the original development team, and helped build the first prototypes & implementation
  • The RedHat Ruby stack, which was a battle to get off the ground (given the prevalence of the Java and Python ecosystems, continuing to to this day). This is one of the items I am most proud of, we addressed the needs of both the RedHat/Fedora and Ruby communities, building the necessary bridge logic to employ Ruby and Rails solutions in many enterprise production environments. This continues to this day as the team continuously stays ontop of upstream Ruby developments and provide robust support and solutions for downstream use
  • The RedHat CloudForms projects, on which I worked on serveral interations, again including initial prototypes and standards, as well as ManageIQ integration.
  • ReFS reverse engineering and parser. The last major research topic that I explored during my tenure at RedHat, this was a very fun project where I built upon the sparse information about the filesystem internals that's out there, and was able to deduce the logic up to the point of being able to read directory lists and file contents and metadata out of a formatted filesystem. While there is plenty of work to go on this front, I'm confident that the published writeups are an excellent launching point for additional research as well as the development of formal tools to extract data from the filesystem.

My plans for the immediate future are to take a short break then file to form a LLC and explore options under that umbrella. Of particular interest are crypto-currencies, specifically Ripple. I've recently begun developing an integrated wallet, ledger and market explorer, and statistical analysis framework called Wipple which I'm planning to continue working on and if all goes according to plan, generating some revenue from. There is alot of ??? between here and there, but thats the name of the game!

Until then, I'd like to thank everyone who helped me do my thing at RedHat, from landing my initial internships and the full-time after that, to showing me the ropes, and not shooting me down when I put myself out there to work on and promote our innovation solutions and technologies!

Solong

Dev Null Productions

Posted by Mo Morsi on October 30, 2020 06:09 AM

After my Departure from RedHat I was able to get some RnR, but quickly wanted to get a head start of my next venture. This is because I decided to put a cap on the amount of time that would be dedicated to trying to make "it" happen, and would pause at regular checkpoints to monitor progress. This is not to say I'm going to quit the endeavor at that point in the future (the timeframe of which I'm keeping private), but the intent is to drive focus and keep the ball moving forward objectively. While Omega was a great project to work on, both fun and the source of much growth and experience, I am not comfortable with the amount of time spent on it, for what was gained. All hindsight is 20/20, but every good trader knows when to cut losses.

Dev Null Productions LLC. was launched four months ago in April 2018 and we haven't looked back. Our flagship product, Wipple XRP Intelligence was launched shortly after, providing realtime access to the XRP network and high level stats and reporting. The product is under continued development and we've begun a social-media based marketing drive to promote it. Things are still early, and there is still aways to go & obstacles to overcome (not to mention the crypto-currency bear market that we've been in for the last 1/2 year), but the progress has been great, and there are many more awesome features in the queue

This Thursday, I am giving a presentation on XRP to the Syracuse Software Development Meetup, hosted at the Tech Garden, a tech incubator in Syracuse, NY. I aim to go over the XRP protocol, discussing both the history of Ripple and the technical details, as well as common use cases and gotchyas from our experiences. The event is looking very solid, and there is already a large turnout and some great momentum growing, so I'm excited to participate in it and see how it all goes. While we're still in the early phases of development, I'm hoping to drive some interest in the project, and perhaps meet collaborators who'd like to come onboard for a percentage of ultimate profits!

Be sure to stay tuned for more updates and developments, until then, keep Rippling!

Ripple moon

IBM To Buy RedHat &amp; Dev Null Update

Posted by Mo Morsi on October 30, 2020 06:04 AM

I had no idea about the acquisition when I left Red Hat last spring, but I'll just leave this one here:

Rht journey

In other news, Dev Null is going great, the product has more or less stabilized, we're hitting up conferences all over the country to promote it, and it's gaining some traction with the XRP community on social media. There's still along ways to go to see Wipple reach full fruition but I'm excited by the progress and eager to continue driving development.

That's all for now, I have a laundry list of topics which I'd like to blog about but until I get some time to do so, they will have to stay on the backburner!

Backburner

USG fixing avahi

Posted by William Brown on March 14, 2020 02:00 PM

USG fixing avahi

Sadly on the USG pro 4 avahi will regularly spiral out of control taking up 100% cpu. To fix this, we set an hourly restart:

sudo -s
crontab -e

Then add:

15 * * * * /usr/sbin/service avahi-daemon restart

Fedora 32 Wallpaper Submission - Story

Posted by William Brown on March 13, 2020 02:00 PM

Fedora 32 Wallpaper Submission - Story

Fedora opens submissions for wallpapers to be submitted for the next version of the release. I used fedora for a long time, so I decided to submit this photo, and write this post to talk about it:

../../../_images/20191119_184819_DSCF0043_5.jpg

This was takeing on 2019-11-19 in my home city of Adelaide, South Australia. I had traveled to see some friends over Christmas. We went to Mount Osmond to take some photos, and I took this as we walked up to the lookout.

The next day, this area was a high risk location for a possible bushfire - and many bushfires have since devastated many regions of Australia, affecting many people that I know.

I really find that the Australian landscape is so different to Europe or Asia - many tones of subtle reds, browns, and more. A dry and dusty look. The palette is such a contrast to the lush greens of Europe. Australia is a really beautiful country, in a very distinct and striking manner.

Anyway, I hope you like the photo :)

Fixing a MacBook Pro 8,2 with dead AMD GPU

Posted by William Brown on February 03, 2020 02:00 PM

Fixing a MacBook Pro 8,2 with dead AMD GPU

I’ve owned a MacBook Pro 8,2 late 2011 edition, which I used from 2011 to about 2018. It was a great piece of hardware, and honestly I’m surprised it lasted so long given how many MacOS and Fedora installs it’s seen.

I upgraded to a MacBook Pro 15,1, and I gave the 8,2 to a friend who was in need of a new computer so she could do her work. It worked really well for her until today when she’s messaged me that the machine is having a problem.

The Problem

The machine appeared to be in a bootloop, where just before swapping from the EFI GPU to the main display server, it would go black and then lock up/reboot. Booting to single user mode (boot holding cmd + s) showed the machine’s disk was intact with a clean apfs. The system.log showed corruption at the time of the fault, which didn’t instill confidence in me.

Attempting a recovery boot (boot holding cmd + r), this also yielded the bootloop. So we have potentially eliminated the installed copy of MacOS as the source of the issue.

I’ve then used the apple hardware test (boot while holding d), and it has passed the machine as a clear bill of health.

I have seen one of these machines give up in the past - my friends mother had one from the same generation and that died in almost the same way - could it be the same?

The 8,2’s cursed gpu stack

The 8,2 15” mbp has dual gpu’s - it has the on cpu Intel 3000, and an AMD radeon 6750M. The two pass through an LVDS graphics multiplexer to the main panel. The external display port however is not so clear - the DDC lines are passed through the GMUX, but the datalines directly attach to the the display port.

The machine is also able to boot with EFI rendering to either card. By default this is the AMD radeon. Which ever card is used at boot is also the first card MacOS attempts to use, but it will try to swap to the radeon later on.

This generation had a large number of the radeons develop faults in their 3d rendering capability so it would render the EFI buffer correctly, but on the initiation of 3d rendering it would fail. Sounds like what we have here!

To fix this …

Okay, so this is fixable. First, we need to tell EFI to boot primarily from the intel card. Boot to single user mode and then run.

nvram fa4ce28d-b62f-4c99-9cc3-6815686e30f9:gpu-power-prefs=%01%00%00%00

Now we need to prevent loading of the AMD drivers so that during the boot MacOS doesn’t attempt to swap from Intel to the Radeon. We can do this by hiding the drivers. System integrity protection will stop you, so you need to do this as part of recovery. Boot with cmd + r, which now works thanks to the EFI changes, then open terminal

cd /Volumes/Macintosh HD
sudo mkdir amdkext
sudo mv System/Library/Extensions/AMDRadeonX3000.kext amdkext/

Then reboot. You’ll notice the fans go crazy because the Radeon card can’t be disabled without the driver. We can post-boot load the driver to stop the fans to fix this up.

To achieve this we make a helper script:

# cat /usr/local/libexec/amd_kext_load.sh
#!/bin/sh
/sbin/kextload /amdkext/AMDRadeonX3000.kext

And a launchctl daemon

# cat /Library/LaunchDaemons/au.net.blackhats.fy.amdkext.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
        <dict>
                <key>Label</key>
                <string>au.net.blackhats.fy.amdkext</string>
                <key>Program</key>
                <string>/usr/local/libexec/amd_kext_load.sh</string>
                <key>RunAtLoad</key>
                <true/>
                <key>StandardOutPath</key>
                <string>/var/log/amd_kext_load.log</string>
        </dict>
</plist>

Now if you reboot, you’ll have a working mac, and the fans will stop properly. I’ve tested this with suspend and resume too and it works! The old beast continues to live :)

There are no root causes

Posted by William Brown on January 19, 2020 02:00 PM

There are no root causes

At Gold Coast LCA2020 I gave a lightning talk on swiss cheese. Well, maybe not really swiss cheese. But it was about the swiss cheese failure model which was proposed at the university of manchester.

Please note this will cover some of the same topics as the talk, but in more detail, and with less jokes.

An example problem

So we’ll discuss the current issues behind modern CPU isolation attacks IE spectre. Spectre is an attack that uses timing of a CPU’s speculative execution unit to retrieve information from another running process on the same physical system.

Modern computers rely on hardware features in their CPU to isolate programs from each other. This could be isolating your web-browser from your slack client, or your sibling’s login from yours.

This isolation however has been compromised by attacks like Spectre, and it looks unlikely that it can be resolved.

What is speculative execution?

In order to be “fast” modern CPU’s are far more complex than most of us have been taught. Often we believe that a CPU thread/core is executing “one instruction/operation” at a time. However this isn’t how most CPU’s work. Most work by having a pipeline of instructions that are in various stages of execution. You could imagine it like this:

let mut x = 0
let mut y = 0
x = 15 * some_input;
y = 10 * other_input;
if x > y {
    return true;
} else {
    return false;
}

This is some made up code, but in a CPU, every part of this could be in the “pipeline” at once.

let mut x = 0                   <<-- at the head of the queue and "further" along completion
let mut y = 0                   <<-- it's executed part way, but not to completion
x = 15 * some_input;
y = 10 * other_input;           <<-- all of these are in pipeline, and partially complete
if x > y {                      <<-- what happens here?
    return true;
} else {
    return false;
}

So how does this “pipeline” handle the if statement? If the pipeline is looking ahead, how can we handle a choice like an if? Can we really predict the future?

Speculative execution

At the if statement, the CPU uses past measurements to make a prediction about which branch might be taken, and it then begins to execute that path, even though ‘x > y’ has not been executed or completed yet! At this point x or y may not have even finished being computed yet!

Let’s assume for now our branch predictor thinks that ‘x > y’ is false, so we’ll start to execute the “return false” or any other content in that branch.

Now the instructions ahead catch up, and we resolve “did we really predict correctly?”. If we did, great! We have been able to advance the program state asynchronously even without knowing the answer until we get there.

If not, ohh nooo. We have to unwind what we were doing, clear some of the pipeline and try to do the correct branch.

Of course this has an impact on timing of the program. Some people found you could write a program to manipulate this predictor and using specific addresses and content, they could use these timing variations to “access memory” they are not allowed to by letting the specualative executor contribute to code they are not allowed to access before the unroll occurs. They could time this, and retrieve the memory contents from areas they are not allowed to access, breaking isolation.

Owwww my brain

Yes. Mine too.

Community Reactions

Since this has been found, a large amount of the community reaction has been about the “root cause”. ‘Clearly’ the root cause is “Intel are bad at making CPU’s” and so everyone should buy AMD instead because they “weren’t affected quite as badly” (Narrators voice: They were absolutely just as bad). We’ve had some intel CPU updates and kernel/program fixes so all good right? We addressed the root cause.

Or … did we?

Our computers are still asynchronous, and contain many out-of-order parts. It’s hard to believe we have “found” every method of exploiting this. Indeed in the last year many more ways to bypass hardware isolation due to our systems async nature have been found.

Maybe the “root cause” wasn’t addressed. Maybe … there are no ….

History

To understand how we got to this situation we need to look at how CPU’s have evolved. This is not a complete history.

The PDP11 was a system owned at bell labs, where the C programing language was developed. Back then CPU’s were very simple - A CPU and memory, executing one instruction at a time.

The C programming language gained a lot of popularity as it was able to be “quickly” ported to other CPU models to allow software to be compiled on other platforms. This led to many systems being developed in C.

Intel introduced the 8086, and many C programs were ported to run on it. Intel then released the 80486 in 1989, which had the first pipeline and cache to improve performance. In order to continue to support C, this meant the memory model could not change from the PDP11 - the cache had to be transparent, and the pipeline could not expose state.

This has of course led to computers being more important in our lives and businesses, so we expected further performance, leading to increased frequencies and async behaviours.

The limits of frequencies were really hit in the Pentium 4 era, when about 4GHz was shown to be a barrier of stability for those systems. They had very deep pipelines to improve performance, but that also had issues when branch prediction failed causing pipeline stalls. Systems had to improve their async behaviours futher to squeeze every single piece of performance possible out.

Compiler developers also wanted more performance so they started to develop ways to transform C in ways that “took advantage” of x86_64 tricks, by manipulating the environment so the CPU is “hinted” into states we “hope” it gets into.

Many businesses also started to run servers to provide to consumers, and in order to keep costs low they would put many users onto single pieces of hardware so they could share or overcommit resources.

This has created a series of positive reinforcement loops - C is ‘abi stable’ so we keep developing it due to it’s universal nature. C code can’t be changed without breaking every existing system. We can’t change the CPU memory model without breaking C, which is hugely prevalent. We improve the CPU to make C faster, transparently so that users/businesses can run more C programs and users. And then we improve compilers to make C faster given quirks of the current CPU models that exist …

Swiss cheese model

It’s hard to look at the current state of systems security and simply say “it’s the cpu vendors fault”. There are many layers that have come together to cause this situation.

This is called the “swiss cheese model”. Imagine you take a stack of swiss cheese and rotate and rearrange the slices. You will not be able to see through it. but as you continue to rotate and rearrange, eventually you may see a tunnel through the cheese where all the holes line up.

This is what has happened here - we developed many layers socially and technically that all seemed reasonable over time, and only after enough time and re-arrangements of the layers, have we now arrived at a situation where a failure has occured that permeates all of computer hardware.

To address it, we need to look beyond just “blaming hardware makers” or “software patches”. We need to help developers move away from C to other languages that can be brought onto new memory models that have manual or other cache strategies. We need hardware vendors to implement different async models. We need to educate businesses on risk analysis and how hardware works to provide proper decision making capability. We need developers to alter there behaviour to work in environments with higher performance constraints. And probably much much more.

There are no root causes

It is a very pervasive attitude in IT that every issue has a root cause. However, looking above we can see it’s never quite so simple.

Saying an issue has a root cause, prevents us from examining the social, political, economic and human factors that all become contributing factors to failure. Because we are unable to examine them, we are unable to address the various layers that have contributed to our failures.

There are no root causes. Only contributing factors.

Concurrency 2: Concurrently Readable Structures

Posted by William Brown on December 28, 2019 02:00 PM

Concurrency 2: Concurrently Readable Structures

In this post, I’ll discuss concurrently readable datastructures that exist, and ideas for future structures. Please note, this post is an inprogress design, and may be altered in the future.

Before you start, make sure you have read part 1

Concurrent Cell

The simplest form of concurrently readable structure is a concurrent cell. This is equivalent to a read-write lock, but has concurrently readable properties instead. The key mechanism to enable this is that when the writer begins, it clones the data before writing it. We trade more memory usage for a gain in concurrency.

To see an implementation, see my rust crate, concread

Concurrent Tree

The concurrent cell is good for small data, but a larger structure - like a tree - may take too long to clone on each write. A good estimate is that if your data in the cell is larger than about 512 bytes, you likely want a concurrent tree instead.

In a concurrent tree, only the branches involved in the operation are cloned. Imagine the following tree:

         --- root 1 ---
        /               \
    branch 1         branch 2
    /     \         /        \
leaf 1   leaf 2  leaf 3    leaf 4

When we attempt to change a value in leaf 4 we copy it before we begin.

      ---------------------------
     /   --- root 1 ---          \-- root 2
     v  /               \                \
    branch 1         branch 2            branch 2(c)
    /     \         /        \           /    \
leaf 1   leaf 2  leaf 3    leaf 4        |    leaf 4(c)
                    ^                    |
                    \-------------------/

In the process the pointers from the new root 2 to branch 1 are maintained. branch 2(c) also maintains a pointer to leaf 3.

This means that in this example only 3/7 nodes are copied, saving a lot of cloning. As your tree grows this saves a lot of work. Consider a tree with node-widths of 7 pointers and at height level 5. Assuming perfect layout, you only need to clone 5/~16000 nodes. A huge saving in memory copy!

The interesting part is a reader of root 1, also is unaffected by the changes to root 2 - the tree from root 1 hasn’t been changed, as all it’s pointers and nodes are still valid.

When any reader of root 1 ends, we clean up all the nodes it pointed to that no longer are needed by root 2 (this can be done with atomic reference counting, or garbage lists in transactions).

         --- root 2 ---
        /               \
    branch 1         branch 2(c)
    /     \         /        \
leaf 1   leaf 2  leaf 3    leaf 4(c)

It is through this copy-on-write (also called multi view concurrency control) that we achieve concurrent readability in the tree.

This is really excellent for databases where you have in memory structures that work in parallel to the database transactions. In kanidm an example is the in-memory schema that is used at run time but loaded from the database. They require transactional behaviours to match the database, and ACID properties so that readers of a past transaction have the “matched” schema in memory.

Future Idea - Concurrent Cache

A design I have floated in my head is a concurrently readable cache - it should have the same transactional properties as a concurrently readable structure - one writer, multiple readers with consistent views of the data. As well it should support rollbacks if a writer fails.

This scheme should work with any cache type - LRU, LRU2Q, LFU. I plan to use ARC however.

ARC was popularised by ZFS - ARC is not specific to ZFS, it’s a strategy for cache replacement.

ARC is a combination of an LRU and LFU with a set of ghost lists and a weighting factor. When an entry is “missed” it’s inserted to the LRU. When it’s accessed from the LRU a second time, it moves to the LFU.

When entries are evicted from the LRU or LFU they are added to the ghost list. When a cache miss occurs, the ghost list is consulted. If the entry “would have been” in the LRU, but was not, the LRU grows and the LFU shrinks. If the item “would have been” in the LFU but was not, the LFU is expanded.

This causes ARC to be self tuning to your workload, as well as balancing “high frequency” and “high locality” operations.

A major problem though is ARC is not designed for concurrency - LFU/LRU rely on double linked lists which is very much something that only a single thread can modify safely.

How to make ARC concurrent

To make this concurrent, I think it’s important to specify the goals.

  • Readers should be able to read and find entries in the cache
  • If a reader locates a missing entry it must be able to load it from the database
  • The reader should be able to send loaded entries to the cache so they can be used.
  • Reader access metrics should be acknowledged by the cache.
  • Multiple reader generations should exist
  • A writer should be able to load entries to the cache
  • A writer should be able to modify an entry of the cache without affecting readers
  • Writers should be able to be rolled back with low penalty

There are a lot of places to draw inspiration from, and I don’t think I can list - or remember them all.

My current “work in progress” is that we use a concurrently readable pair of trees to store the LRU and LFU. These trees are able to be read by readers, and a writer can concurrently write changes.

The ghost lists of the LRU/LFU are maintained single thread by the writer. The linked lists for both are also single threaded and use key-references from the main trees to maintain themselves. The writer maintains the recv end of an mpsc queue. Finally a writer has an always-incrementing transaction id associated.

A reader when initiated has access to the writer of the queue and the transaction id of the writer that created this generation. The reader has a an empty hash map.

Modification to ARC

A modification is that we need to retain the transaction id’s related to items. This means the LRU and LFU contain:

type Txid: usize;

struct ARC<K, Value<V>> {
    lru: LRU<K, Value<V>>,
    lfu: LFU<K, Value<V>>,
    ghost_lru: BTreeMap<K, Txid>
    ghost_lfu: BTreeMap<K, Txid>
}

struct Value<V> {
    txid: Txid,
    data: V,
}

Reader Behaviour

The reader is the simpler part of the two, so we’ll start with that.

When a reader seeks an item in the cache, it references the read-only LRU/LFU trees. If found, we queue a cache-hit marker to the channel.

If we miss, we look in our local hashmap. If found we return that.

If it is not in the local hashmap, we now seek in the database - if found, we load the entry. The entry is stored in our local hashmap.

As the reader transaction ends, we send the set of entries in our local hash map as values (see Modification to ARC), so that the data and the transaction id of the generation when we loaded is associated. This has to be kept together as the queue could be recieving items from many generations at once.

The reader attempts a “try_include” at the end of the operation, and if unable, it proceeds.

enum State<V> {
    Missed<V>
    Accessed
}

struct ChanValue<K, V> {
    txid: Txid,
    key: K,
    data: State<V>
}

Writer Behaviour

There are two major aspects to writer behaviour. The writer is responsible for maintaining a local cache of missed items, a local cache of writen (dirty) items, managing the global LRU/LFU, and responding to the reader inclusion requests.

When the writer looks up a value, it looks in the LFU/LRU. If found (and the writer is reading) we return the data to the caller, and add an “accessed” value to the local thread store.

If the writer is attempting to mutate, we clone the value and put it into the local thread store in the “dirty” state.

enum State {
    Dirty(V),
    Clean(V),
    Accessed
}

struct Value<V> {
    txid: usize,
    state: State<V>
}

If it is not found, we seek the value in the database. It is added to the cache. If this is a write, we flag the entry as dirty. Else it’s flagged clean.

If we abort, we move to the include step before we complete the operation.

If we commit, we write our clean and dirty flagged data to the LFU/LRU as required. The LRU/LFU self manages it’s lists and sets, it’s okay to the concurrent behaviours. We indicate which items have been accessed.

We the perform an “include” operation. Readers attempt this at the end of their operations if the lock can be taken, and skip if not.

We dequeue from the queue up to some limit of values. For each value that is requested, we look it up in our LRU/LFU.

  • If the value was not in the ARC, and in the ghost list, include it + it’s txid if the txid is higher than the ghost key txid
  • If the value was not in the ARC, and not in the ghost list, include it.
  • If the value is in the ARC, but a lower txid, we update the access metrics.
  • If the value is in the ARC and a higher txid, we update the access metrics and update the value to the newer version.
  • If the value is an accessed marker, and the item is in the ghost list, continue
  • If the value is an accessed marker, and the item is in the ARC, we update it’s access metrics

Questions for Future William

ARC moves from LRU -> LFU if the LRU has a hit, but this seems overly aggresive. Perhaps this should be if LRU is a hit on 2 occasions move to LFU?

A thread must wake and feed the cache if we are unable to drain the readers, as we don’t want the queue to grow without bound.

Limitations and Concerns

Cache missing is very expensive - multiple threads may load the value, the readers must queue the value, and the writer must then act on the queue. Sizing the cache to be large enough is critically important as eviction/missing will have a higher penalty than normal. Optimally the cache will be “as large or larger” than the working set.

Due to the inclusion cost, the cache may be “slow” during the warm up, so this style of cache really matters for highly concurrent software that can not tolerate locking behaviour, and for items where the normal code paths are extremely slow. IE large item deserialisation and return.

Concurrency 1: Types of Concurrency

Posted by William Brown on December 28, 2019 02:00 PM

Concurrency 1: Types of Concurrency

I want to explain different types of concurrent datastructures, so that we can explore their properties and when or why they might be useful.

As our computer systems become increasingly parallel and asynchronous, it’s important that our applications are able to work in these environments effectively. Languages like Rust help us to ensure our concurrent structures are safe.

CPU Memory Model Crash Course

In no way is this a thorough, complete, or 100% accurate representation of CPU memory. My goal is to give you a quick brief on how it works. I highly recommend you read “what every programmer should know about memory” if you want to learn more.

In a CPU we have a view of a memory space. That could be in the order of KB to TB. But it’s a single coherent view of that space.

Of course, over time systems and people have demanded more and more performance. But we also have languages like C, that won’t change from their view of a system as a single memory space, or change how they work. Of course, it turns out C is not a low level language but we like to convince ourselves it is.

To keep working with C and others, CPU’s have acquired cache’s that are transparent to the operation of the memory. You have no control of what is - or is not - in the cache. It “just happens” asynchronously. This is exactly why spectre and meltdown happened (and will continue to happen) because these async behaviours will always have the observable effect of making your CPU faster. Who knew!

Anyway, for this to work, each CPU has multiple layers of cache. At L3 the cache is shared with all the cores on the die. At L1 it is “per cpu”.

Of course it’s a single view into memory. So if address 0xff is in the CPU cache of core 1, and also in cache of core 2, what happens? Well it’s supported! Caches between cores are kept in sync via a state machine called MESI. These states are:

  • Exclusive - The cache is the only owner of this value, and it is unchanged.
  • Modified - The cache is the only owner of this value, and it has been changed.
  • Invalid - The cache holds this value but another cache has changed it.
  • Shared - This cache and maybe others are viewing this valid value.

To gloss very heavily over this topic, we want to avoid invaild. Why? That means two cpus are contending for the value, causing many attempts to keep each other in check. These contentions cause CPU’s to slow down.

We want values to either be in E/M or S. In shared, many cpu’s are able to read the value at maximum speed, all the time. In E/M, we know only this cpu is changing the value.

This cache coherency is also why mutexes and locks exist - they issue the needed CPU commands to keep the caches in the correct states for the memory we are accessing.

Keep in mind Rust’s variables are immutable, and able to share between threads, or mutable and single thread only. Sound familar? Rust is helping with concurrency by keeping our variables in the fastest possible cache states.

Data Structures

We use data structures in programming to help improve behaviours of certain tasks. Maybe we need to find values quicker, sort contents, or search for things. Data Structures are a key element of modern computer performance.

However most data structures are not thread safe. This means only a single CPU can access or change them at a time. Why? Because if a second read them, due to cache-differences in content the second CPU may see an invalid datastructure, leading to undefined behaviour.

Mutexes can be used, but this causes other CPU’s to stall and wait for the mutex to be released - not really what we want on our system. We want every CPU to be able to process data without stopping!

Thread Safe Datastructures

There exist many types of thread safe datastructures that can work on parallel systems. They often avoid mutexes to try and keep CPU’s moving as fast as possible, relying on special atomic cpu operations to keep all the threads in sync.

Multiple classes of these structures exist, which have different properties.

Mutex

I have mentioned these already, but it’s worth specifying the properties of a mutex. A mutex is a system where a single CPU exists in the mutex. It becomes one “reader/writer” and all other CPU’s must wait until the mutex is released by the current CPU holder.

Read Write Lock

Often called RWlock, these allow one writer OR multiple parallel readers. If a reader is reading then a writer request is delayed until the readers complete. If a writer is changing data, all new reads are blocked. All readers will always be reading the same data.

These are great for highly concurrent systems provided your data changes infrequently. If you have a writer changing data a lot, this causes your readers to be continually blocking. The delay on the writer is also high due to a potentially high amount of parallel readers that need to exit.

Lock Free

Lock free is a common (and popular) datastructue type. These are structures that don’t use a mutex at all, and can have multiple readers and multiple writers at the same time.

The most common and popular structure for lock free is queues, where many CPUs can append items and many can dequeue at the same time. There are also a number of lock free sets which can be updated in the same way.

An interesting part of lock free is that all CPU’s are working on the same set - if CPU 1 reads a value, then CPU 2 writes the same value, the next read from CPU 1 will show the new value. This is because these structures aren’t transactional - lock free, but not transactional. There are some times where this is really useful as a property when you need a single view of the world between all threads, and your program can tolerate data changing between reads.

Wait Free

This is a specialisation of lock free, where the reader/writer has guaranteed characteristics about the time they will wait to read or write data. This is very detailed and subtle, only affecting real time systems that have strict deadline and performance requirements.

Concurrently Readable

In between all of these is a type of structure called concurrently readable. A concurrently readable structure allows one writer and multiple parallel readers. An interesting property is that when the reader “begins” to read, the view for that reader is guaranteed not to change until the reader completes. This means that the structure is transactional.

An example being if CPU 1 reads a value, and CPU 2 writes to it, CPU 1 would NOT see the change from CPU 2 - it’s outside of the read transaction!

In this way there are a lot of read-only immutable data, and one writer mutating and changing things … sounds familar? It’s very close to how our CPU’s cache work!

These structures also naturally lend themself well to long processing or database systems where you need transactional (ACID) properties. In fact some databases use concurrent readable structures to achieve ACID semantics.

If it’s not obvious - concurrent readability is where my interest lies, and in the next post I’ll discuss some specific concurrently readable structures that exist today, and ideas for future structures.

Packaging and the Security Proposition

Posted by William Brown on December 18, 2019 02:00 PM

Packaging and the Security Proposition

As a follow up to my post on distribution packaging, it was commented by Fraser Tweedale (@hackuador) that traditionally the “security” aspects of distribution packaging was a compelling reason to use distribution packages over “upstreams”. I want to dig into this further.

Why does C need “securing”

C as a language is unsafe in every meaning of the word. The best C programmers on the planet are incapable of writing a secure program. This is because to code in C you have to express a concurrent problem, into a language that is linearised, which is compiled relying on undefined behaviour, to be executed on an asynchronous concurrent out of order CPU. What could possibly go wrong?!

There is a lot you need to hold in mind to make C work. I can tell you now that I spend a majority of my development time thinking about the code to change rather than writing C because of this!

This has led to C based applications having just about every security issue known to man.

How is C “secured”

So as C is security swiss cheese, this means we have developed processes around the language to soften this issue - for example advice like patch and update continually as new changes are continually released to resolve issues.

Distribution packages have always been the “source” of updates for these libraries and applications. These packages are maintained by humans who need to update these packages. This means when a C project releases a fix, these maintainers would apply the patch to various versions, and then release the updates. These library updates due to C’s dynamic nature means when the machine is next rebooted (yes rebooted, not application restarted) that these fixes apply to all consumers who have linked to that library - change one, fix everything. Great!

But there are some (glaring) weaknesses to this model. C historically has little to poor application testing so many of these patches and their effects can’t be reproduced. Which also subsequently means that consuming applications also aren’t re-tested adequately. It can also have impacts where a change to a shared library can impact a consuming application in a way that was unforseen as the library changed.

The Dirty Secret

The dirty secret of many of these things is that “thoughts and prayers” is often the testing strategy of choice when patches are applied. It’s only because humans carefully think about and write tiny amounts of C that we have any reliability in our applications. And we already established that it’s nearly impossible for humans to write correct C …

Why Are We Doing This?

Because C linking and interfaces are so fragile, and due to the huge scope in which C can go wrong due to being a memory unsafe language, distributions and consumers have learnt to fear version changes. So instead we patch ancient C code stacks, barely test them, and hope that our castles of sand don’t fall over, all so we can keep “the same version” of a program to avoid changing it as much as possible. Ironically this makes those stacks even worse because we’ve developed infinite numbers of bespoke barely tested packages that people rely on daily.

To add more insult to this, most of this process is manual - humans monitor mailing lists, and have to know what code needs what patch, and when in what release streams. It’s a monumental amount of human time and labour involved to keep the sand castles standing. This manual involvement is what leads to information overload, and maintainers potentially missing security updates or releases that causes many distribution packages to be outdated, missing patches, or vulnerable more often than not. In other cases packages continue to be shipped that are unmaintained or have no upstream, so any issues that may exist are unknown or unresolved.

Distribution Security

This means all of platform and distribution security comes to one factor.

A lot of manual human labour.

It’s is only because distributions have so many volunteers or paid staff, that this entire system continues to progress to give the illusion of security and reliability. When it fails, it fails silently.

Heartbleed really dragged the poor state of C security into the open , and it’s still not been addressed.

When people say “how can we secure docker/flatpak/Rust” like we do with distributions, I say: “Do we really secure distributions at all?”. We only have a veneer of best effort masquerading as a secure supply chain.

A Different Model …

So let’s look briefly at Rust and how you package it today (against distribution maintainer advice).

Because it’s staticly linked, each application must be rebuilt if a library changes. Because the code comes from a central upstream, there are automated tools to find security issues (like cargo audit). The updates are pulled from the library as a whole working tested unit, and then built into our application to to recieve further testing and verification of the application as a whole singular functional unit.

These dependencies once can then be vendored to a tar (allowing offline builds and some aspects of reproducability). This vendor.tar.gz is placed into the source rpm along with the application source, and then built.

There is a much stronger pipeline of assurances here! And to further aid Rust’s cause, because it is a memory safe language, it eliminates most of the security issues that C is afflicted by, causing security updates to be far fewer, and to often affect higher level or esoteric situations. If you don’t believe me, look at the low frequency, and low severity commits for the rust advisory-db

People have worried that because Rust is staticly linked we’ll have to rebuild it and update it continually to keep it secure - I’d say because it’s Rust we’ll have stronger guarantees at build that security issues are less likely to exist and we won’t have to ship updates nearly as often as a C stack.

Another point to make is Rust libraries don’t release patches - because of Rust’s stronger guarantees at compile time and through integrated testing, people are less afraid of updates to versions. We are very unlikely to see Rust releasing patches, rather than just shipping “updates” to libraries and expecting you to update. Because these are staticly linked, we don’t have to worry about versions for other libraries on the platform, we only need to assure the application is currently working as intended. Because of the strong typing those interfaces of those libraries has stronger compile time guarantees at build time, meaning the issues around shared object versioning and symbol/version mismatching simply don’t exist - one of the key reasons people became version change averse in the first place.

So Why Not Package All The Things?

Many distribution packagers have been demanding a C-like model for Rust and others (remember, square peg, round hole). This means every single crate (library) is packaged, and then added to a set of buildrequires for the application. When a crate updates, it triggers the application to rebuild. When a security update for a library comes out, it rebuilds etc.

This should sound familiar … because it is. It’s reinventing Cargo in a clean-room.

RPM provides a way to manage dependencies. Cargo provides a way to manage dependencies.

RPM provides a way to offline build sources. Cargo provides a way to offline build sources.

RPM provides a way to patch sources. Cargo provides a way to update them inplace - and patch if needed.

RPM provides a way to … okay you get the point.

There is also a list of what we won’t get from distribution packages - remember distribution packages are the C language packaging system

We won’t get the same level of attention to detail, innovation and support as the upstream language tooling has. Simply put, users of the language just won’t use distribution packages (or toolchains, libraries …) in their workflows.

Distribution packages can’t offer is the integration into tools like cargo-audit for scanning for security issues - that needs still needs Cargo, not RPM, meaning the RPM will need to emulate what Cargo does exactly.

Using distribution packages means you have an untested pipeline that may add more risks now. Developers won’t use distribution packages - they’ll use cargo. Remember applications work best as they are tested and developed - outside of that environment they are an unknown.

Finally, the distribution maintainers security proposition is to secure our libraries - for distributions only. That’s acting in self interest. Cargo is offering a way to secure upstream so that everyone benefits. That means less effort and less manual labour all around. And secure libraries are not the full picture. Secure applications is what matters.

The large concerning factor is the sheer amount of human effort. We would spend hundreds if not thousands of hours to reinvent a functional tool in a disengaged manner, just so that we can do things as they have always been done in C - for the benefit of distributions individually rather than languages upstream.

What is the Point

Again - as a platform our role is to provide applications that people can trust. The way we provide these applications is never going to be one size fits all. Our objective isn’t to secure “this library” or “that library”, it’s to secure applications as a functional whole. That means that companies shipping those applications, should hire maintainers to work on those applications to secure their stacks.

Today I honestly think Rust has a better security and updating story than C packages ever has, powered by automation and upstream integration. Let’s lean on that, contribute to it, and focus on shipping applications instead of reinventing tools. We need to accept our current model is focused on C, that developers have moved around distribution packaging, and that we need to change our approach to eliminate the large human risk factor that currently exists.

We can’t keep looking to the models of the past, we need to start to invest in new methods for the future.

Today, distributions should focus on supporting and distributing _applications_ and work with native language supply chains to enable this.

Which is why I’ll keep using cargo’s tooling and auditing, and use distribution packages as a delievery mechanism for those applications.

What Could it Look Like?

We have a platform that updates as a whole (Fedora Atomic comes to mind …) with known snapshots that are tested and well known. This platform has methods to run applications, and those applications are isolated from each other, have their own libraries, and security audits.

And because there are now far fewer moving parts, quality is easier to assert, understand, and security updates are far easier and faster, less risky.

It certainly sounds a lot like what macOS and iOS have been doing with a read-only base, and self-contained applications within that system.

Packaging, Vendoring, and How It’s Changing

Posted by William Brown on December 17, 2019 02:00 PM

Packaging, Vendoring, and How It’s Changing

In today’s thoughts, I was considering packaging for platforms like opensuse or other distributions and how that interacts with language based packaging tools. This is a complex and … difficult topic, so I’ll start with my summary:

Today, distributions should focus on supporting and distributing _applications_ and work with native language supply chains to enable this.

Distribution Packaging

Let’s start by clarifying what distribution packaging is. This is your linux or platforms method of distributing it’s programs libraries. For our discussion we really only care about linux so say suse or fedora here. How macOS or FreeBSD deal with this is quite different.

Now these distribution packages are built to support certain workflows and end goals. Many open source C projects release their source code in varying states, perhaps also patches to improve or fix issues. These code are then put into packages, dependencies between them established due to dynamic linking, they are signed for verification purposes and then shipped.

This process is really optimised for C applications. C has been the “system language” for many decades now, and so we can really see these features designed to promote - and fill in gaps - for these applications.

For example, C applications are dynamically linked. Because of this it encourages package maintainers to “split” applications into smaller units that can have shared elements. An example that I know is openldap which may be a single source tree, but often is packaged to multiple parts such as the libldap.so, lmdb, openldap-client applications, it’s server binary, and probably others. The package maintainer is used to taking their scalpels and carefully slicing sources into elegant packages that can minimise how many things are installed to what is “just needed”.

We also see other behaviours where C shared objects have “versions”, which means you can install multiple versions of them at once and programs declare in their headers which library versions they want to consume. This means a distribution package can have many versions of the same thing installed!

This in mind, the linking is simplistic and naive. If a shared object symbol doesn’t exist, or you don’t give it the “right arguments” via a weak-compile time contract, it’s likely bad things (tm) will happen. So for this, distribution packaging provides the stronger assertions about “this program requires that library version”.

As well, in the past the internet was a more … wild place, where TLS wasn’t really widely used. This meant that to gain strong assertions about the source of a package and that it had not been tampered, tools like GPG were used.

What about Ruby or Python?

Ruby and Python are very different languages compared to C though. They don’t have information in their programs about what versions of software they require, and how they mesh together. Both languages are interpreted, and simply “import library” by name, searching a filesystem path for a library matching regardless of it’s version. Python then just loads in that library as source straight to the running vm.

It’s already apparent how we’ll run into issues here. What if we have a library “foo” that has a different function interface between version 1 and version 2? Python applications only request access to “foo”, not the version, so what happens if the wrong version is found? What if it’s not found?

Some features here are pretty useful from the “distribution package” though. Allowing these dynamic tools to have their dependencies requested from the “package”, and having the package integrity checked for them.

But overtime, conflicts started, and issues arose. A real turning point was ruby in debian/ubuntu where debian package maintainers (who are used to C) brought out the scalpel and attempted to slice ruby down to “parts” that could be reused form a C mindset. This led to a combinations of packages that didn’t make sense (rubygems minus TLS, but rubygems requires https), which really disrupted the communities.

Another issue was these languages as they grew in popularity had different projects requiring different versions of libraries - which as before we mentioned isn’t possible beside library search path manipulations which is frankly user hostile.

These issues (and more) eventually caused these communities as a whole to stop recommending distribution packages.

So put this history in context. We have Ruby (1995) and Python (1990), which both decided to avoid distribution packages with their own tools aka rubygems (2004) and pip (2011), as well as tools to manage multiple parallel environments (rvm, virtualenv) that were per-application.

New kids on the block

Since then I would say three languages have risen to importance and learnt from the experiences of Ruby - This is Javascript (npm/node), Go and Rust.

Rust went further than Ruby and Python and embedded distribution of libraries into it’s build tools from an early date with Cargo. As Rust is staticly linked (libraries are build into the final binary, rather than being dynamicly loaded), this moves all dependency management to build time - which prevents runtime library conflict. And because Cargo is involved and controls all the paths, it can do things such as having multiple versions available in a single build for different components and coordinating all these elements.

Now to hop back to npm/js. This ecosystem introduced a new concept - micro-dependencies. This happened because javascript doesn’t have dead code elimination. So if given a large utility library, and you call one function out of 100, you still have to ship all 99 unused ones. This means they needed a way to manage and distribute hundreds, if not thousands of tiny libraries, each that did “one thing” so that you pulled in “exactly” the minimum required (that’s not how it turned out … but it was the intent).

Rust also has inherited a similar culture - not to the same extreme as npm because Rust DOES have dead code elimination, but still enough that my concread library with 3 dependencies pulls in 32 dependencies, and kanidm from it’s 30 dependencies, pulls in 365 into it’s graph.

But in a way this also doesn’t matter - Rust enforces strong typing at compile time, so changes in libraries are detected before a release (not after like in C, or dynamic languages), and those versions at build are used in production due to the static linking.

This has led to a great challenge is distribution packaging for Rust - there are so many “libraries” that to package them all would be a monumental piece of human effort, time, and work.

But once again, we see the distribution maintainers, scalpel in hand, a shine in their eyes looking and thinking “excellent, time to package 365 libraries …”. In the name of a “supply chain” and adding “security”.

We have to ask though, is there really value of spending all this time to package 365 libraries when Rust functions so differently?

What are you getting at here?

To put it clearly - distribution packaging isn’t a “higher” form of distributing software. Distribution packages are not the one-true solution to distribute software. It doesn’t magically enable “security”. Distribution Packaging is the C language source and binary distribution mechanism - and for that it works great!

Now that we can frame it like this we can see why there are so many challenges when we attempt to package Rust, Python or friends in rpms.

Rust isn’t C. We can’t think about Rust like C. We can’t secure Rust like C.

Python isn’t C. We can’t think about Python like C. We can’t secure Python like C.

These languages all have their own quirks, behaviours, flaws, benefits, and goals. They need to be distributed in unique ways appropriate to those languages.

An example of the mismatch

To help drive this home, I want to bring up FreeIPA. FreeIPA has a lot of challenges in packaging due to it’s huge number of C, Python and Java dependencies. Recently on twitter it was annouced that “FreeIPA has been packaged for debian” as the last barrier (being dogtag/java) was overcome to package the hundreds of required dependencies.

The inevitable outcome of debian now packaging FreeIPA will be:

  • FreeIPA will break in some future event as one of the python or java libraries was changed in a way that was not expected by the developers or package maintainers.
  • Other applications may be “held back” from updating at risk/fear of breaking FreeIPA which stifles innovation in the java/python ecosystems surrounding.

It won’t be the fault of FreeIPA. It won’t be the fault of the debian maintainers. It will be that we are shoving square applications through round C shaped holes and hoping it works.

So what does matter?

It doesn’t matter if it’s Kanidm, FreeIPA, or 389-ds. End users want to consume applications. How that application is developed, built and distributed is a secondary concern, and many people will go their whole lives never knowing how this process works.

We need to stop focusing on packaging libraries and start to focus on how we distribute applications.

This is why projects like docker and flatpak have surprised traditional packaging advocates. These tools are about how we ship applications, and their build and supply chains are separated from these.

This is why I have really started to advocate and say:

Today, distributions should focus on supporting and distributing _applications_ and work with native language supply chains to enable this.

Only we accept this shift, we can start to find value in distributions again as sources of trusted applications, and how we see the distribution as an application platform rather than a collection of tiny libraries.

The risk of not doing this is alienating communities (again) from being involved in our platforms.

Follow Up

There have been some major comments since:

First, there is now a C package manager named conan . I have no experience with this tool, so at a distance I can only assume it works well for what it does. However it was noted it’s not gained much popularity, likely due to the fact that distro packages are the current C language distribution channels.

The second was about the security components of distribution packaging and this - that topic is so long I’ve written another post about the topic instead, to try and keep this post focused on the topic.

Finally, The Fedora Modularity effort is trying to deal with some of these issues - that modules, aka applications have different cadences and requirements, and those modules can move more freely from the base OS.

Some of the challenges have been explored by LWN and it’s worth reading. But I think the underlying issue is that again we are approaching things in a way that may not align with reality - people are looking at modules as libraries, not applications which is causing things to go sideways. And when those modules are installed, they aren’t isolated from each other , meaning we are back to square one, with a system designed only for C. People are starting to see that but the key point is continually missed - that modularity should be about applications and their isolation not about multiple library versions.

Fixing opensuse virtual machines with resume

Posted by William Brown on December 14, 2019 02:00 PM

Fixing opensuse virtual machines with resume

Today I hit an unexpected issue - after changing a virtual machines root disk to scsi, I was unable to boot the machine.

The host is opensuse leap 15.1, and the vm is the same. What’s happening!

The first issue appears to be that opensuse 15.1 doesn’t support scsi disks from libvirt. I’m honestly not sure what’s wrong here.

The second is that by default opensuse leap configures suspend and resume to disk - by it uses the pci path instead of a swap volume UUID. So when you change the bus type, it renames the path making the volume inaccessible. This causes boot to fail.

To work around you can remove “resume=/disk/path” from the cli. Then to fix it permanently you need:

transactional-update shell
vim /etc/default/grub
# Edit this line to remove "resume"
GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,115200 resume=/dev/disk/by-path/pci-0000:00:07.0-part3 splash=silent quiet showopts"

vim /etc/default/grub_installdevice
# Edit the path to the correct swap location as by ls -al /dev/disk/by-path
/dev/disk/by-path/pci-0000:00:07.0-part3

I have reported these issues, and I hope they are resolved.

Password Quality and Badlisting in Kanidm

Posted by William Brown on December 06, 2019 02:00 PM

Password Quality and Badlisting in Kanidm

Passwords are still a required part of any IDM system. As much as I wish for Kanidm to only support webauthn and stronger authentication types, at the end of the day devices can be lost, destroyed, some people may not be able to afford them, some clients aren’t compatible with them and more.

This means the current state of the art is still multi-factor auth. Something you have and something you know.

Despite the presence of the multiple factors, it’s still important to quality check passwords. Microsoft’s Azure security team have written about passwords, and it really drives home the current situation. I would certainly trust these people at Microsoft to know what they are talking about given the scale of what they have to defend daily.

The most important take away is that trying to obscure the password from a bruteforce is a pointless exercise because passwords end up in password dumps, they get phished, keylogged, and more. MFA matters!

It’s important here to look at the “easily guessed” and “credential stuffing” category. That’s what we really want to defend against with password quality, and MFA protects us against keylogging, phising (only webauthn), and reuse.

Can we Avoid This?

Yes! Kanidm supports a “generated” password that is a long, high entropy password that should be stored in a password manager or similar tool to prevent human knowledge. This fits to our “device as authentication” philosophy. You authenticate to your device (phone, laptop etc), and then the devices stored passwords authenticate you from that point on. This has the benefit that devices and password managers generally perform better checking of the target where we enter the password, making phising less likely.

But sometimes we can’t rely on this, so we still need human-known passwords, so we still take steps to handle these.

Quality

In the darkages, we would decree that human known passwords had to have a number, symbol, roman numeral, no double letters, one capital letter, two kanji and no symbols that could be part of an sql injection because of that one legacy system we can’t patch.

This lead to people making horrid, un-rememberable passwords in leetspeak, or giving up altogether and making the excellent “Password1” (which passes AD minimum password requirements on server 2003).

What we really want is entropy, length, and memorability. Dropbox made a great library for this called zxcvbn, which has since been ported to rust . I highly recommend it.

This library is great because it focuses on entropy, and then if the password doesn’t meet these requirements, the library recommends ways to improve. This is excellent for human interaction and experience, guiding people to create better passwords that they can remember, rather than our outdated advice of the complex passwords as described above.

Badlisting

Badlisting is another great technique for improving password quality. It’s essentially a blocklist of passwords that people are not allowed to set. This way you can have corporate-specific breach lists, or the top 10k most used passwords badlisted to prevent users using them. For example, “correct horse battery staple” may be a strong password, but it’s well known thanks to xkcd.

It’s also good for preventing password reuse in your company if you are phished and the credentials privately notified to you as some of the regional CERT’s do, allowing you to block these without them being in a public breach list.

This is important as many bots will attempt to spam these passwords against accounts (rate limiting auth and soft-locking accounts also helps to delay these attack styles).

In Kanidm

In Kanidm, we chose to use both approaches. First we check the password with zxcvbn, then we ensure it’s not in a badlist.

In order to minimise the size of the badlist, the badlist uses case insensitive storage so that multiple variants of “password” and “PasSWOrD” are only listed once. We also preprocessed the badlist with zxcvbn to remove any passwords that it would have denied from being entered. The preprocessor tool will be shipped with kanidm so that administrators can preprocess their own lists before adding them to the badlist configuration.

Creating a Badlist

I decided to do some analysis on a well known set of passwords maintained on the seclists repository . Apparently this is what pentesters reach for when they want to bruteforce or credential stuff on a domain.

I analysed this in three ways. The first was as a full set of passwords (about 25 million) and a smaller but “popular” set in the “rockyou” files which is about 60,000 passwords. Finally I did an analysis of the rockyou + top 10 million most common (which combined was 1011327 unique passwords, so about 50k of the rockyou set is from top 10 million).

From the 25 million set I ran this through a preprocessor tool that I developed for kanidm. It eliminated anything less than a score of 3 and no length rule. This showed that zxcvbn was able to prevent 80% of these inputs from being allowed. If I was to ship this full list, this would contain 4.8 million badlisted passwords. It’s pretty amazing already that zxcvbn stops 80% of bad passwords that end up in breaches from being able to be used, with the remaining 20% likely to be complex passwords that just got dumped by other means.

However, for the badlist in Kanidm, I decided to start with “what’s popular” for now, and to allow sites to add extra content if they desire. This meant that I focused instead on the “rockyou” password set instead.

From the rockyou set I did more tests. zxcvbn has a concept of scores, and we can have policy to request a minimum score is present to allow the password. I did a score 3 test, a score 3 with min pw len 10 and a score 4 test. This showed the following results which has the % blocked by zxcvbn and the no. that passed which will required badlisting as zxcvbn can’t detect them (today).

TEST     | % blocked | no. passed
---------------------------------
 s3      |  98.3%    |  1004
 s3 + 10 |  98.9%    |  637
 s4      |  99.7%    |  133

Personally, it’s quite hilarious that “2fast2furious” passed the score 3 check, and “30secondstomars” and “dracomalfoy” passed the score 4 check, but who am I to judge - that’s what bad lists are for.

More seriously, I found it interesting to see the effect of the check on length - not only was the preprocessor step faster, but alone that eliminated ~400 passwords that would have “passed” on score 3.

Finally, from the rockyou + 10m set, the results show the following in the same conditions.

TEST     | % blocked | no. passed
---------------------------------
 s3      |  89.9%    |  101349
 s3 + 10 |  92.4%    |  76425
 s4      |  96.5%    |  34696

This shows that a very “easy” win is to enforce password length, in addition to entropy checkers like zxcvbn, which are effective to block 92% of the most common passwords in use on a broad set and 98% of what a pentester will look for (assuming rockyou lists). If you have a high security environment you should consider setting zxcvbn to request passwords of score 4 (the maximum), given that on the 10m set it had a 96.5% block rate.

Conclusions

You should use zxcvbn, it’s a great library, which quickly reduces a huge amount of risk from low quality passwords.

After that your next two strongest controls are password length, and being able to support badlisting.

Better yet, use MFA like Webauthn as well, and support server-side generated high-entropy passwords!

Rust 2020 - helping to get rust deployed

Posted by William Brown on November 27, 2019 02:00 PM

Rust 2020 - helping to get rust deployed

This is my contribution to Rust 2020, where community members put forward ideas on what they thing Rust should aim to achieve in 2020.

In my view, Rust has had an amazing adoption by developers, and is great if you are in a position to deploy it in your own infrastructure, but we have yet to really see Rust make it to broad low-level components (IE in a linux distro or other infrastructure).

As someone who works on “enterprise” software (389-ds) and my own IDM project (kanidm), there is a need to have software packaged and distributed. We can not ask our consumers to build and compile these tools. One could view it as a chain, where I develop software in a language, it’s packaged for a company (like SUSE), and then consumed by a customer (could be anyone!) who provides a service to others (indirect users).

Rust however has always been modeled that there is no “middle” section. You have either a developer who’s intent is to develop for other developers. This is where Rust ideas like crates.io becomes involved. Alternately, you have a larger example in firefox, where developers build a project and can “bundle” everything into a whole unit that is then distributed directly to customers.

The major difference is that in the intermediate distribution case, we have to take on different responsibilities such as security auditing, building, ensuring dependencies exist etc.

So this leads me to:

1: Cargo Vendor Needs Some Love

Cargo vendor today is quite confusing in some scenarios, and it’s not clear how to have it work for projects that require offline builds. I have raised issues about this, but largely they have been un-acted upon.

2: Cargo is Difficult to Use in Mixed Language Projects

A large value proposition of Rust is the ability to use it with FFI and C. This is great if you say have cargo compile your C code for you.

But most major existing projects don’t. They use autotools, cmake, or maybe even meson or more esoteric, waf (looking at you samba). Cargo’s extreme opinionation in this area makes it extremely difficult to integrate Rust into an existing build system reliably. It’s hard to point to one fault, as much as a broader “lack of concern” in the space, and I think cargo needs to evolve improvements to help allow Rust to be used from other build tools.

3: Rust Moves Too Fast

A lot of “slower” enterprise companies want to move slowly, including compiler versions. Admittedly, this conservative behaviour is because of the historical instability of gcc versions and how it can change or affect your code between releases. Rust doesn’t suffer this, but people are still wary of fast version changes. This means Rustc + Cargo will get pinned to some version that may be 6 months old.

However crate authors don’t consider this - they will use the latest and greatest features from stable (and sometimes still nightly … grrr) in releases. Multiple times I have found that on my development environment even with a 3 month old compiler, dependencies won’t build.

Compounding this, crates.io doesn’t distinguish a security release from a feature one. Crates also encourages continuall version bumping, rather than maintenence of versioned branches. IE version 0.4.3 of a crate with a security fix will become 0.4.4, but then a feature update to include try_from may go to 0.4.5 as it “adds” to the api, or they use it internally as a cleanup.

Part of this issue is that Rust applications need to be treated closer to docker, as static whole units where only the resulting binary is supported rather than the toolchain that built it. But that only works on pure Rust applications - any mixed C + Rust application will hit this issue due to the difference between a system Rust version and what crate dependencies publish.

So I think that from this it leads to:

3.1: Crates need to indicate a minimum supported compiler version

Rust has “toyed” with the idea of editions, but within 2018 we’ve seen new features like maybeuninit and try_from land, which within an “edition” caused crates to stop worked on older compilers.

As a result, editions I think is “too broad” and people will fear incrementing it, and Rust will add features without changing edition anyway. Instead Rust needs to consider following up on the minimum supported rust version flag RFC. Rust has made it pretty clear the only “edition” flag that matters is the rust compiler version based on crate developers and what they are releasing.

3.2: Rust Needs to Think “What’s Our End Goal”

Rust is still moving incredibly fast, and I think in a way we need to ask … when will Rust be finished? When will it as a language go from continually rapid growth to stable and known feature sets? The idea of Rust editions acts as though this has happened (saying we change only every few years) when this is clearly not the case. Rust is evolving release-on-release, every 6 weeks.

4: Zero Cost Needs to Factor in Human Cost

My final wish for Rust is that sometimes we are so obsessed with the technical desire for zero cost abstraction, that we forget the high human cost and barriers that can exist as a result making feature adoption challenging. Rust has had a great community that treats people very well, and I think sometimes we need to extend that into feature development, to really consider the human cognitive cost of a feature.

Summarised - what’s the benefit of a zero cost abstraction if people can not work out how to use it?

Summary

I want to see Rust become a major part of operating systems and how we build computer systems, but I think that we need to pace ourselves, improve our tooling, and have some better ideas around what Rust should look like.

Recovering LVM when a device is missing with a cache pool lv

Posted by William Brown on November 25, 2019 02:00 PM

Recovering LVM when a device is missing with a cache pool lv

I had a heartstopping moment today: my after running a command lvm proudly annouced it had removed an 8TB volume containing all of my virtual machine backing stores.

Everyone, A short view back to the past …

I have a home server, with the configured storage array of:

  • 2x 8TB SMR (Shingled Magnetic Recording) archive disks (backup target)
  • 2x 8TB disks (vm backing store)
  • 2x 1TB nvme SSD (os + cache)

The vm backing store also had a lvm cache segment via the nvme ssds in a raid 1 configuration. This means that the 2x8TB drives are in raid 1, and a partition on each of the nvme devices are in raid 1, then they are composed to allow the nvme to cache blocks from/to the 8TB array.

Two weeks ago I noticed one of the nvme drives was producing IO errors indicating a fault of the device. Not wanting to risk corruption or other issues from growing out of hand, I immediately shutdown the machine and identified the nvme disk with the error.

At this stage I took the precaution of imaging (dd) both the good and bad nvme devices to the archive array. Subsequently I completed a secure erase of the faulty nvme drive before returning it to the vendor for RMA.

I then left the server offline as I was away from my home for more than a week and would not need, and was unable to monitor if the other drives would produce further errors.

Returning home …

I decided to ignore William of the past (always a bad idea) and to “break” the raid on the remaining nvme device so that my server could operate allowing me some options for work related tasks.

This is an annoying process in lvm - you need to remove the missing device from the volume group as well as indicating to the array that it should no longer be in a raid state. This vgreduce is only for removing missing PV’s, it shouldn’t be doing anything else.

I initiated the raid break process on the home, root and swap devices. The steps are:

vgreduce --removemissing <vgname>
lvconvert -m0 <vgname>/<lvname>

This occured without fault due to being present on an isolated “system” volume group, so the partial lvs were untouched and left on the remaining pv in the vg.

When I then initiated this process on the “data” vg which contained the libvirt backing store, vgreduce gave me the terrifying message:

Removing logical volume "libvirt_t2".

Oh no ~

Recovery Attempts

When a logical volume is removed, it can be recovered as lvm stores backups of the LVM metadata state in /etc/lvm/archive.

My initial first reaction was that I was on a live disk, so I needed to backup this content else it would be lost on reboot. I chose to put this on the unaffected, and healthy SMR archive array.

mount /dev/mapper/archive-backup /archive
cp -a /etc/lvm /archive/lvm-backup

At this point I knew that randomly attempting commands would cause further damage and likely prevent any ability to recover.

The first step was to activate ssh so that I could work from my laptop - rather than the tty with keyboard and monitor on my floor. It also means you can copy paste, which reduces errors. Remember, I’m booted on a live usb, which is why I reset the password.

# Only needed in a live usb.
passwd
systemctl start sshd

I then formulated a plan and wrote it out. This helps to ensure that I’ve thought through the recovery process and the risks, it helps be to be fully aware of the situation.

vim recovery-plan.txt

Into this I laid out the commands I would follow. Here is the plan:

bytes 808934440960
data_00001-2096569583

dd if=/dev/zero of=/mnt/lv_temp bs=4096 count=197493760
losetup /dev/loop10 /mnt/lv_temp
pvcreate --restorefile /etc/lvm/archive/data_00001-2096569583.vg --uuid iC4G41-PSFt-6vqp-GC0y-oN6T-NHnk-ivssmg /dev/loop10
vgcfgrestore data --test --file /etc/lvm/archive/data_00001-2096569583.vg

Now to explain this: The situation we are in is:

  • We have a removed data/libvirt_t2 logical volume
  • The VG data is missing a single PV (nvme0). It still has three PVs (nvme1, sda1, sdb1).
  • We can not restore metadata unless all devices are present as per the vgcfgrestore man page.

This means, we need to make a replacement device to replace into the array, and then to restore the metadata with that.

The “bytes” section you see, is the size of the missing nvme0 partition that was a member of this array - we need to create a loopback device of the same or greater size to allow us to restore the metadata. (dd, losetup)

Once the loopback is created, we can then recreate the pv on the loopback device with the same UUID as the missing device.

Once this is present, we can now restore the metadata as documented which should contain the logical volume.

I ran these steps and it was all great until vgcfgrestore. I can not remember the exact error but it was along the lines of:

Unable to restore metadata as PV was missing for VG when last modification was performed.

Yep, the vgreduce command has changed the VG state, triggering a metadata backup, but because a device was missing at the time, we can not restore this metadata.

Options …

At this point I had to consider alternate options. I conducted research into this topic as well to see if others had encountered this case (no one has ever not been able to restore their metadata apparently in this case …). The options that I arrived at:

    1. Restore the metadata from the nvme /root as it has older (but known) states - however I had recently expanded the libvirt_t2 volume from a live disk, meaning it may not have the correct part sizes.
    1. Attempt to extract the xfs filesystem with DD from the disk to begin a data recovery.
    1. Cry in a corner
    1. Use lvcreate with the “same paramaters” and hope that it aligns the start at the same location as the former data/libvirt_t2 allowing the xfs filesystem to be accessed.

All of these weren’t good - especially not 3.

I opted to attempt solution 1, and then if that failed, I would disconnect one of the 8TB disks, attempt solution 4, then if that ALSO failed, I would then attempt 2, finally all else lost I would begin solution 3. The major risk of relying on 4 and 2 is that LVM has dynamic geometry on disk, it does not always allocate contiguously. This means that attempting 4 with lvcreate may not create with the same geometry, and it may write to incorrect locations causing dataloss. The risk of 2 was again, due to the dynamic geometry what we recover may be re-arranged and corrupt.

This mean option 1 was the best way to proceed.

I mounted the /root volume of the host and using the lvm archive I was able to now restore the metadata.

vgcfgrestore data --test --file /system/etc/lvm/archive/data_00004-xxxx.vg

Once completed I performed an lvscan to refresh what block devices were available. I was then shown that every member of the VG data had conflicting seqno, and that the metadata was corrupt and unable to proceed.

Somehow we’d made it worse :(

Successful Procedure

At this point, faced with 3 options that were all terrible, I started to do more research. I finally discovered a post describing that the lvm metadata is stored on disk in the same format as the .vg files in the archive, and it’s a ring buffer. We may be able to restore from these.

To do so, you must dd out of the disk into a file, and then manipulate the file to only contain a single metadata entry.

Remember how I made images of my disks before I sent them back? This was their time to shine.

I did do a recovery plan with these commands too, but it was more evolving due to the paramaters involved so it changed frequently with the offsets. The plan was very similar to above - use a loop device as a stand in for the missing block device, restore the metadata, and then go from there.

We know that LVM metadata occurs in the first section of the disk, just after the partition start. So to work out where this is we use gdisk to show the partitions in the backup image.

# gdisk /mnt/mion.nvme0n1.img
GPT fdisk (gdisk) version 1.0.4
...

Command (? for help): p
Disk /mnt/mion.nvme0n1.img: 2000409264 sectors, 953.9 GiB
Sector size (logical): 512 bytes
...

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         1026047   500.0 MiB   EF00
   2         1026048       420456447   200.0 GiB   8E00
   3       420456448      2000409230   753.4 GiB   8E00

It’s important to note the sector size flag, as well as the fact the output is in sectors.

The LVM header occupies 255 sectors after the start of the partition. So this in mind we can now create a dd command to extract the needed information.

dd if=/mnt/mion.nvme0n1.img of=/tmp/lvmmeta bs=512 count=255 skip=420456448

bs sets the sector size to 512, count will read from the start up to 255 sectors of size ‘bs’, and skip says to start reading after ‘skip’ * ‘sector’.

At this point, we can now copy this and edit the file:

cp /tmp/lvmmeta /archive/lvm.meta.edit

Within this file you can see the ring buffer of lvm metadata. You need to find the highest seqno that is a complete record. For example, my seqno = 20 was partial (is the lvm meta longer than 255, please contact me if you know!), but seqno=19 was complete.

Here is the region:

# ^ more data above.
}
# Generated by LVM2 version 2.02.180(2) (2018-07-19): Mon Nov 11 18:05:45 2019

contents = "Text Format Volume Group"
version = 1

description = ""

creation_host = "linux-p21s"    # Linux linux-p21s 4.12.14-lp151.28.25-default #1 SMP Wed Oct 30 08:39:59 UTC 2019 (54d7657) x86_64
creation_time = 1573459545      # Mon Nov 11 18:05:45 2019

^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@data {
id = "4t86tq-3DEW-VATS-1Q5x-nLLy-41pR-zEWwnr"
seqno = 19
format = "lvm2"

So from there you remove everything above “contents = …”, and clean up the vgname header. It should look something like this.

contents = "Text Format Volume Group"
version = 1

description = ""

creation_host = "linux-p21s"    # Linux linux-p21s 4.12.14-lp151.28.25-default #1 SMP Wed Oct 30 08:39:59 UTC 2019 (54d7657) x86_64
creation_time = 1573459545      # Mon Nov 11 18:05:45 2019

data {
id = "4t86tq-3DEW-VATS-1Q5x-nLLy-41pR-zEWwnr"
seqno = 19
format = "lvm2"

Similar, you need to then find the bottom of the segment (look for the next highest seqno) and remove everything below the line: “# Generated by LVM2 …”

Now, you can import this metadata to the loop device for the missing device. Note I had to wipe the former lvm meta segment due to the previous corruption, which caused pvcreate to refuse to touch the device.

dd if=/dev/zero of=/dev/loop10 bs=512 count=255
pvcreate --restorefile lvmmeta.orig.nvme1.edited --uuid iC4G41-PSFt-6vqp-GC0y-oN6T-NHnk-ivssmg /dev/loop10

Now you can do a dry run:

vgcfgrestore --test -f lvmmeta.orig.nvme1.edited data

And the real thing:

vgcfgrestore -f lvmmeta.orig.nvme1.edited data
lvscan

Hooray! We have volumes! Let’s check them, and ensure their filesystems are sane:

lvs
lvchange -ay data/libvirt_t2
xfs_repair -n /dev/mapper/data-libvirt_t2

If xfs_repair says no errors, then go ahead and mount!

At this point, lvm started to resync the raid, so I’ll leave that to complete before I take any further action to detach the loopback device.

How to Handle This Next Time

The cause of this issue really comes from vgreduce –removemissing removing the device when a cache member can’t be found. I plan to report this as a bug.

However another key challenge was the inability to restore the lvm metadata when the metadata archive reported a missing device. This is what stopped me from being able to restore the array in the first place, even though I had a “fake” replacement. This is also an issue I intend to raise.

Next time I would:

  • Activate the array as a partial
  • Remove the cache device first
  • Then stop the raid
  • Then perform the vgreduction

I really hope this doesn’t happen to you!

Upgrading OpenSUSE 15.0 to 15.1

Posted by William Brown on September 24, 2019 02:00 PM

Upgrading OpenSUSE 15.0 to 15.1

It’s a little bit un-obvious how to do this. You have to edit the repo files to change the release version, then refresh + update.

sed -ri 's/15\.0/15.1/' /etc/zypp/repos.d/*.repo
zypper ref
zypper dup
reboot

Note this works on a transactional host too:

sed -ri 's/15\.0/15.1/' /etc/zypp/repos.d/*.repo
transactional-update dup
reboot

It would be nice if these was an upgrade tool that would attempt the upgrade and revert the repo files, or use temporary repo files for the upgrade though. It would be a bit nicer as a user experience than sed of the repo files.

Announcing Kanidm - A new IDM project

Posted by William Brown on September 17, 2019 02:00 PM

Announcing Kanidm - A new IDM project

Today I’m starting to talk about my new project - Kanidm. Kanidm is an IDM project designed to be correct, simple and scalable. As an IDM project we should be able to store the identities and groups of people, authenticate them securely to various other infrastructure components and services, and much more.

You can find the source for kanidm on github.

For more details about what the project is planning to achieve, and what we have already implemented please see the github.

What about 389 Directory Server

I’m still part of the project, and working hard on making it the best LDAP server possible. Kanidm and 389-ds have different goals. 389 Directory Server is a globally scalable, distributed database, that can store huge amounts of data and process thousands of operations per second. 389-ds let’s you build a system ontop, in any way you want. If you want an authentication system today, use 389-ds. We are even working on a self-service web portal soon too (one of our most requested features!). Besides my self, no one on the (amazing) 389 DS team has any association with kanidm (yet?).

Kanidm is an opinionated IDM system, and has strong ideas about how authentication and users should be processed. We aim to be scalable, but that’s a long road ahead. We also want to have more web integrations, client tools and more. We’ll eventually write a kanidm to 389-ds sync tool.

Why not integrate something with 389? Why something new?

There are a lot of limitations with LDAP when it comes to modern web-focused auth processes such as webauthn. Because of this, I wanted to make something that didn’t have the same limitations, and had different ideas about data storage and apis. That’s why I wanted to make something new in parallel. It was a really hard decision to want to make something outside of 389 Directory Server (Because I really do love the project, and have great pride in the team), but I felt like it was going to be more productive to build in parallel, than ontop.

When will it be ready?

I think that a single-server deployment will be usable for small installations early 2020, and a fully fledged system with replication would be late 2020. It depends on how much time I have and what parts I implement in what order. Current rough work order (late 2019) is indexing, RADIUS integration, claims, and then self-service/web ui.

OpenSUSE leap as a virtualisation host

Posted by William Brown on September 01, 2019 02:00 PM

OpenSUSE leap as a virtualisation host

I’ve been rebuilding my network to use SUSE from CentOS, and the final server was my hypervisor. Most of the reason for this is the change in my employment, so I feel it’s right to dogfood for my workplace.

What you will need

  • Some computer parts (assembaly may be required)
  • OpenSUSE LEAP 15.1 media (dd if=opensuse.iso of=/dev/a_usb_i_hope)

What are we aiming for?

My new machine has dual NVME and dual 8TB spinning disks. The intent is to have the OS on the NVME and to have a large part of the NVME act as LVM cache for the spinning disk. The host won’t run any applications beside libvirt, and has to connect a number of vlans over a link aggregation.

Useful commands

Through out this document I’ll assume some details about your devices and partitions. To find your own, and to always check and confirm what you are doing, some command will help:

lsblk  # Shows all block storage devices, and how (if) they are mounted.
lvs  # shows all active logical volumes
vgs  # shows all active volume groups
pvs  # shows all active physical volumes
dmidecode  # show hardware information
ls -al /dev/disk/by-<ID TYPE>/  # how to resolve disk path to a uuid etc.

I’m going to assume you have devices like:

/dev/nvme0  # the first nvme of your system that you install to.
/dev/nvme1  # the second nvme, used later
/dev/sda    # Two larger block storage devices.
/dev/sdb

Install

Install and follow the prompts. Importantly when you install you install to a single NVME, and choose transactional server + lvm, then put btrfs on the /root in the lvm. You want to partition such that there is free space still in the NVME - I left about 400GB unpartitioned for this. In other words, the disk should be like:

[ /efi | pv + vg system               | pv (unused) ]
       | /root (btrfs), /boot, /home  |

Remember to leave about 1GB of freespace on the vg system to allow raid 1 metadata later!

Honestly, it may take you a try or two to get this right with YaST, and it was probably the trickiest part of the install.

You should also select that network management is via networkmanager, not wicked. You may want to enable ssh here. I disabled the firewall personally because there are no applications and it interfers with the bridging for the vms.

Packages

Because this is suse transactional we need to add packages and reboot each time. Here is what I used, but you may find you don’t need everything here:

transactional-update pkg install libvirt libvirt-daemon libvirt-daemon-qemu \
  sssd sssd-ad sssd-ldap sssd-tools docker zsh ipcalc python3-docker rdiff-backup \
  vim rsync iotop tmux fwupdate fwupdate-efi bridge-utils qemu-kvm apcupsd

Reboot, and you are ready to partition.

Partitioning - post install

First, copy your gpt from the first NVME to the second. You can do this by hand with:

gdisk /dev/nvme0
p
q

gdisk /dev/nvme1
c
<duplicate the parameters as required>

Now we’ll make your /efi at least a little redundant

mkfs.fat /dev/nvme1p0
ls -al /dev/disk/by-uuid/
# In the above, look for your new /efi fs, IE CE0A-2C1D -> ../../nvme1n1p1
# Now add a line to /etc/fstab like:
UUID=CE0A-2C1D    /boot/efi2              vfat   defaults                      0  0

Now to really make this work, because it’s transactional, you have to make a change to the /root, which is readonly! To do this run

transactional-update shell dup

This put’s you in a shell at the end. Run:

mkdir /boot/efi2

Now reboot. After the reboot your second efi should be mounted. rsync /boot/efi/* to /boot/efi2/. I leave it to the reader to decide how to sync this periodically.

Next you can setup the raid 1 mirror for /root and the system vg.

pvcreate /dev/nvme1p1
vgextend system /dev/nvme1p1

Now we have enough pvs to make a raid 1, so we convert all our volumes:

lvconvert --type raid1 --mirrors 1 system/home
lvconvert --type raid1 --mirrors 1 system/root
lvconvert --type raid1 --mirrors 1 system/boot

If this fails with “not enough space to alloc metadata”, it’s because you didn’t leave space on the vg during install. Honestly, I made this mistake twice due to confusion about things leading to two reinstalls …

Getting ready to cache

Now lets get ready to cache some data. We’ll make pvs and vgs for data:

pvcreate /dev/nvme0p2
pvcreate /dev/nvme1p2
pvcreate /dev/sda1
pvcreate /dev/sda2
vgcreate data /dev/nvme0p2 /dev/nvme1p2 /dev/sda1 /dev/sdb2

Create the larger volume

lvcreate --type raid1 --mirrors 1 -L7.5T -n libvirt_t2 data /dev/sda1 /dev/sdb1

Prepare the caches

lvcreate --type raid1 --mirrors 1 -L 4G -n libvirt_t2_meta data
lvcreate --type raid1 --mirrors 1 -L 400G -n libvirt_t2_cache data
lvconvert --type cache-pool --poolmetadata data/libvirt_t2_meta data/libvirt_t2_cache

Now put the caches in front of the disks. It’s important for you to check you have the correct cachemode at this point, because you can’t change it without removing and re-adding the cache. I choose writeback because my nvme devices are in a raid 1 mirror, and it helps to accelerate writes. You may err to use the default where the SSD’s are read cache only.

lvconvert --type cache --cachemode writeback --cachepool data/libvirt_t2_cache data/libvirt_t2
mkfs.xfs /dev/mapper/data-libvirt_t2

You can monitor the amount of “cached” data in the data column of lvs.

Now you can add this to /etc/fstab as any other xfs drive. I mounted it to /var/lib/libvirt/images.

Network Manager

Now, I have to assemble the network bridges. Network Manager has some specific steps to follow to achieve this. I have:

  • two intel gigabit ports
  • the ports are link aggregated by 802.3ad
  • there are multiple vlans ontop of the link agg
  • bridges must be built on top of the vlans

This requires a specific set of steps to layer this, because network manager sees the bridge and the lagg as seperate things that require the vlan to tie them together.

Configure the link agg, and add our two ethernet phys

nmcli conn add type bond con-name bond0 ifname bond0 mode 802.3ad ipv4.method disabled ipv6.method ignore
nmcli connection add type ethernet con-name bond0-eth1 ifname eth1 master bond0 slave-type bond
nmcli connection add type ethernet con-name bond0-eth2 ifname eth2 master bond0 slave-type bond

Add a bridge for a vlan:

nmcli connection add type bridge con-name net_18 ifname net_18 ipv4.method disabled ipv6.method ignore

Now tie together a vlan on the bond, to the bridge we created.

nmcli connection add type vlan con-name bond0.18 ifname bond0.18 dev bond0 id 18 master net_18 slave-type bridge

You will need to repeat these last two commands as required for the vlans you have.

House Keeping

Finally you need to do some house keeping. Transactional server will automatically reboot and update so you need to be ready for this. You may disable this with:

systemctl disable transactional-update.timer

You likely want to edit:

/etc/sysconfig/libvirt-guests

To be able to handle guest shutdown policy due to a UPS failure or a reboot.

Now you can enable and start libvirt:

systemctl enable libvirtd
systemctl start libvirtd

Finally you can build and import virtualmachines.

LDAP Filter Syntax Validation

Posted by William Brown on August 28, 2019 02:00 PM

LDAP Filter Syntax Validation

Today I want to do a deep-dive into a change that will be released in 389 Directory Server 1.4.2. It’s a reasonably complicated change for our server, but it has a simple user interaction for admins and developers. I want to peel back some of the layers to explain what kind of experience, thought and team work goes into a change like this.

TL;DR - just keep upgrading your 389 Directory Server instance, and our ‘correct by default’ policy will apply, and you’ll keep having the best LDAP server we can make :)

LDAP Filters and How They Work

LDAP filters are one of the primary methods of expression in LDAP, and are used in almost every aspect of the system - from finding who you are when you login, to asserting you are member of a group or have other security attributes.

For the purposes of this discussion we’ll look at this filter:

'(|(cn=william)(cn=claire))'

In order to execute these queries quickly (LDAP is designed to handle thousands of operations per second) we heavily rely on indexing. Indexing is often a topic where people believe it to be some kind of “magic” but it’s reasonably simple: indexes are pre-computed partial result sets. So why do we need these?

We’ll imagine we have two entries (invalid, and truncated for brevity).

dn: cn=william,...
cn: william

dn: cn=claire,...
cn: claire

These entries both have entry-ids - these id’s are per-server in a replication group and are integers. You can show them by requesting entryid as an attribute in 389.

dn: cn=william,...
entryid: 1
cn: william

dn: cn=claire,...
entryid: 2
cn: claire

Our entries are stored in the main-entry database in /var/lib/dirsrv/slapd-standalone1/db/userRoot in the file “id2entry.db4”. This is a key-value database where the keys are the entryid, and the value is the serialised entry itself. Roughly, it’s:

[ ID ][ Entry             ]
  1     dn: cn=william,...
        cn: william

  2     dn: cn=claire,...
        cn: claire

Now, if we had NO indexes, to evaluate our filters we have to scan every entry of id2entry to determine if the filter matches. This algorithm is:

candidate_set = []
for id in id-min to id-max:
    entry = load_entry_by_id(id)
    if apply_filter(filter, entry):
        candidate_set.append(entry)

For two entries, this may be fast, but when you have 1000, 10.000, or even millions, this is extremely slow. We call these searches full table scans or in 389 DS, ALLIDS searches.

To make our searches faster we have indexes. An index is a mapping of a partial query term to an id list (In 389 we call these IDLs). An IDL is a set of integers. Our index for these examples would likely be something like:

cn
=william: [1, ]
=claire: [2, ]

These indexes are also stored in key-value databases in userRoot - you can see this as cn.db4.

So when we have an indexed term, to evaluate the query, we’ll load the indexes, then using mathematical set operations, we then produce a candidate_id_set, and we can then load the entries that only match.

For example in psuedo python code:

# Assume query is: (cn=william)

attr = filter.get_attr_name()
with open('%s.db' % attr) as index:
    idl = index.get('=william') # from the filter :)

for id in idl:
    ... # as before.

So we can see now that when we load the idl for cn index, this would give us the set [1, ]. Even if the database had 100 million entries, as our idl is a single value, we only need to load the one entry that matches. Neat!

When we have a more complex operation such as AND and OR, we can now manipulate the idl sets. For example:

(|(uid=claire)(uid=william))
   uid =claire -> idl [2, ]
   uid =william -> idl [1, ]

candidate_idl_set = union([2, ], [1, ])
# [1, 2]

This means again, even with millions of entries, we only need to load entry 1 and 2 to uphold the query provided to us.

So we finally know enough to understand how our example query is executed. PHEW!

Unindexed Attributes

However, it’s not always so easy. When we have an attribute that isn’t indexed, we have to handle this situation. In these cases, while we operate on the idl set, we may insert an idl with the value of ALLIDS (which as previously mentioned, is the “set of all entries”). This can have various effects.

If this was an AND query, we can annotate that the filter is partially resolved. This means that if we had:

(&(cn=william)(unindexed=foo))

Because an AND condition, both filter components must be satisfied, we have a partial candidate set from cn=william of [1, ]. We can load this partial candidate set, and then apply the filter test as in the full table scan case, but as we only apply it to a single entry this is really fast.

The real problem is OR queries. If we had:

(|(cn=william)(unindexed=foo))

Because OR means that both filter components could be satisfied, we have to turn unindexd into ALLIDS, and the result of the OR as a whole is ALLIDS. So even if we have 30 indexed values in the OR, a single ALLIDS (unindexed value) will always turn that OR into a full table scan. This is not good for performance!

Missing Attributes

So as a weirder case … what if the attribute doesn’t exist in schema at all? For example we could search for Microsoft AD attributes in 389 Directory Server, or we could submit bogus filters like “(whargarble=foo)”. What happens here?

Well, historically we treated these the same as unindexed queries. Which means that any term that is not in schema, would be treated as ALLIDS. This led to a “quitely known” denial of service attack again 389 Directory Server where you could emit a large number of queries for attributes that don’t exist, causing the server to attempt many ALLIDS scans. We have some defences like the allids limit (how many entries you can full table scan before giving up). But it can still cause entry cache churn and other performance issues.

I was first made aware of this issue in 2014 while working for University of Adelaide where our VMWare service would query LDAP for MS attributes, causing a large performance issue. We resolved this by adding the MS attributes to schema and indexing them so that they would create empty indexes - now we would call this in 389 Directory Server and “idl_alloc(0)” or “empty IDL”.

When initially hired by Red Hat in 2015 I knew this issue existed but I didn’t know enough about the server core to really fix it, so it went in the back of my mind … it was rare to have a customer raise this issue, but we had the work around and so I was able to advise support services on how to mitigate this.

In 2019 however, while investigating an issue related to filter optimisation, I was made aware of an issue with FreeIPA where they were doing certmap queries that requested MS Cert attributes. However it would cause large performance issues. We now had the issue again, and in a large widely installed product so it was time to tackle it.

How to handle this?

A major issue in this is “never breaking customers”. Because we had always supported this behaviour there is a risk that any solution would cause customer queries to “silently” begin to break if we issued a fix or change. More accurately, any change to how we execute the filters could cause results of the filters to change, which would disrupt customers.

Saying this, there is also precedent that 389 Directory Server was handling this incorrectly. From the RFC for LDAP it was noted:

Any assertion about the values of such an attribute is only defined if the AttributeType is known by the evaluating mechanism, the purported AttributeValue(s) conforms to the attribute syntax defined for that attribute type, the implied or indicated matching rule is applicable to that attribute type, and (when used) a presented matchValue conforms to the syntax defined for the indicated matching rules. When these conditions are not met, the FilterItem shall evaluate to the logical value UNDEFINED. An assertion which is defined by these conditions additionally evaluates to UNDEFINED if it relates to an attribute value and the attribute type is not present in an attribute against which the assertion is being tested. An assertion which is defined by these conditions and relates to the presence of an attribute type evaluates to FALSE.

Translation: If a filter component (IE nonexist=foo) is in a filter but NOT in the schema, the result of the filter’s evaluation is an empty-set aka undefined.

It was also clear that if an engaged and active consumer like FreeIPA is making this mistake, then it must be overlooked by many others without notice. So there is sometimes value in helping to raise the standard so that everyone benefits, and highlight mistakes quicker.

The Technical Solution

This is the easy part - we add a new configuration option with three states. “on”, “off”, “warn”. “On” would enable the strictest handling of filters, rejecting them an not continuing if any attribute requested was not in the schema. “Warn” would provide the rfc compliant behaviour, mapping to empty-set index, and notifying in the logs that this occured. Finally, “off” would be the previous “silently allow” behaviour.

This was easily achieved in filter parsing, by checking the attribute of each filter component against our schema hashmap. We then tag the filter element, and depending on the current setting level reject or continue.

In the query execution code, we now check the filter tag to understand if the attribute is schema present or not. If it’s flagged as “undefined”, then we immediately shortcut to return idl_alloc(0) instead of returning ALLIDS on the failure to find the relevant index db.

We can show the performance impact of this change:

Before with non-existant attribute

Average rate:    7.40/thr

After with “warn” enabled (map to empty set)

Average rate: 4808.70/thr

This is a huge improvement, and certainly shows the risk of DOS and how effective the solution was!

The Social Solution

Of course, this is the hard part - the 389 Directory Server team are all amazingly smart people, from many countries, and all live around the world. They all care that the server is the best possible, and that our standards as a team are high. That means when introducing a change that has a risk of affecting query result sets like this, they pay attention, and ask many difficult questions about how the feature will be implemented.

The first important justification - is a change like this worth while? We can see from the performance results that the risk of DOS is reasonable, so the answer there becomes Yes from a security view. But it’s also important to consider the cost on consumers - is this change going to benefit FreeIPA for example? As I am biased being the author I want to say “yes” - by notifying or rejecting invalid filters earlier, we can help FreeIPA developers improve their code quality, without expecting them to know LDAP inside and out.

The next major question is performance - before the feature was developed there is clearly a risk of DOS, but when we implement this we are performing additional locking on the schema. Is that a risk to our standalone performance or normal operating conditions. This had to also be discussed and assessed.

A really important point that was raised by Thierry is how we communicated these errors too. Previously we would use the “notes=” field of the access log. It looks like this:

conn=1 op=4 RESULT err=0 tag=101 nentries=13 etime=0.0003795424 notes=U

The challenge with the notes= field, is that it’s easy to overlook, and unless you are familar, hard to see what this is indicating. In this case, notes=U means partially unindexed query (one filter component but not all returned ALLIDS).

We can’t change the notes field due to the risk of breaking our own scripts like logconv.pl, support tools developed by RH or SUSE, or even integrations to platforms like splunk. But clearly we need a way to detail what is happening with your filter. So Thierry suggested an extension to have details about the provided notes. Now we get:

conn=1 op=4 RESULT err=0 tag=101 nentries=13 etime=0.0003795424 notes=U details="Partially Unindexed Filter"
conn=1 op=8 RESULT err=0 tag=101 nentries=0 etime=0.0001886208 notes=F details="Filter Element Missing From Schema"

So we have extended our log message, but without breaking existing integrations.

The final question is what our defaults should be. It’s one thing to have this feature, but what should we ship with? Do we strictly reject filters? Warn? Or disable, and expect people to turn this on.

This became a long discussion with Ludwig, Thierry and I - we discussed the risk of DOS in the first place, what the impact of the levels could be, how it could break legacy applications or sites using deprecated features or with weird data imports. Many different aspects were considered. We decided to default to “warn” (non-existant becomes empty-set), and we settled on communication with support to advise them of the upcoming change, but also we considered that our “back out” plan is to change the default and ship a patch if there is a large volume of negative feedback.

Conclusion

As of today, the PR is merged, and the code on it’s way to the next release. It’s a long process but the process exists to ensure we do what’s best for our users, while we try to balance many different aspects. We have a great team of people, with decades of experience from many backgrounds which means that these discussions can be long and detailed, but in the end, we hope to give what is the best product possible to our community.

It’s also valuable to share how much thought and effort goes into projects - in your life you may only interact with 1% of our work through our configuration and system, but we have an iceberg of decisions and design process that affects you every day, where we have to be responsible and considerate in our actions.

I hope you enjoyed this exploration of this change!

References

PR#50379

Using ramdisks with Cargo

Posted by William Brown on August 25, 2019 02:00 PM

Using ramdisks with Cargo

I have a bit of a history of killing SSDs - probably because I do a bit too much compiling and management of thousands of tiny files. Plenty of developers have this problem! So while thinking one evening, I was curious if I could setup a ramdisk on my mac for my cargo work to output to.

Making the ramdisk

On Linux you’ll need to use tmpfs or some access to /dev/shm.

On OSX you need to run a script like the following:

diskutil partitionDisk $(hdiutil attach -nomount ram://4096000) 1 GPTFormat APFS 'ramdisk' '100%'

This creates and mounts a 4GB ramdisk to /Volumes/ramdisk. Make sure you have enough ram!

Asking cargo to use it

We probably don’t want to make our changes permant in Cargo.toml, so we’ll use the environment:

CARGO_TARGET_DIR=/Volumes/ramdisk/rs cargo ...

Does it work?

Yes!

Disk Build (SSD, 2018MBP)

Finished dev [unoptimized + debuginfo] target(s) in 2m 29s

4 GB APFS ramdisk

Finished dev [unoptimized + debuginfo] target(s) in 1m 53s

For me it’s more valuable to try and save those precious SSD write cycles, so I think I’ll try to stick with this setup. You can see how much rust writes by doing a clean + build. My project used the following:

Filesystem                             Size   Used  Avail Capacity    iused               ifree %iused  Mounted on
/dev/disk110s1                        2.0Gi  1.2Gi  751Mi    63%       3910 9223372036854771897    0%   /Volumes/ramdisk

Make it permanent

Put the following in /Library/LaunchDaemons/au.net.blackhats.fy.ramdisk.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
        <dict>
                <key>Label</key>
                <string>au.net.blackhats.fy.ramdisk</string>
                <key>Program</key>
                <string>/usr/local/libexec/ramdisk.sh</string>
                <key>RunAtLoad</key>
                <true/>
                <key>StandardOutPath</key>
                <string>/var/log/ramdisk.log</string>
        </dict>
</plist>

And the following into /usr/local/libexec/ramdisk.sh

#!/bin/bash
date
diskutil partitionDisk $(hdiutil attach -nomount ram://4096000) 1 GPTFormat APFS 'ramdisk' '100%'

Finally put this in your cargo file of choice

[build]
target-dir = "/Volumes/ramdisk/rs"

Future william will need to work out if there are negative consequences to multiple cargo projects sharing the same target directory … hope not!

Launchctl tips

# Init the service
launchctl load /Library/LaunchDaemons/au.net.blackhats.fy.ramdisk.plist

References

lanuchd.info

CPU atomics and orderings explained

Posted by William Brown on July 15, 2019 02:00 PM

CPU atomics and orderings explained

Sometimes the question comes up about how CPU memory orderings work, and what they do. I hope this post explains it in a really accessible way.

Short Version - I wanna code!

Summary - The memory model you commonly see is from C++ and it defines:

  • Relaxed
  • Acquire
  • Release
  • Acquire/Release (sometimes AcqRel)
  • SeqCst

There are memory orderings - every operation is “atomic”, so will work correctly, but there rules define how the memory and code around the atomic are influenced.

If in doubt - use SeqCst - it’s the strongest guarantee and prevents all re-ordering of operations and will do the right thing.

The summary is:

  • Relaxed - no ordering guarantees, just execute the atomic as is.
  • Acquire - all code after this atomic, will be executed after the atomic.
  • Release - all code before this atomic, will be executed before the atomic.
  • Acquire/Release - both Acquire and Release - ie code stays before and after.
  • SeqCst - Stronger consistency of Acquire/Release.

Long Version … let’s begin …

So why do we have memory and operation orderings at all? Let’s look at some code to explain:

let mut x = 0;
let mut y = 0;
x = x + 3;
y = y + 7;
x = x + 4;
x = y + x;

Really trivial example - now to us as a human, we read this and see a set of operations that are linear by time. That means, they execute from top to bottom, in order.

However, this is not how computers work. First, compilers will optimise your code, and optimisation means re-ordering of the operations to achieve better results. A compiler may optimise this to:

let mut x = 0;
let mut y = 0;
// Note removal of the x + 3 and x + 4, folded to a single operation.
x = x + 7
y = y + 7;
x = y + x;

Now there is a second element. Your CPU presents the illusion of running as a linear system, but it’s actually an asynchronous, out-of-order task execution engine. That means a CPU will reorder your instructions, and may even run them concurrently and asynchronously.

For example, your CPU will have both x + 7 and y + 7 in the pipeline, even though neither operation has completed - they are effectively running at the “same time” (concurrently).

When you write a single thread program, you generally won’t notice this behaviour. This is because a lot of smart people write compilers and CPU’s to give the illusion of linear ordering, even though both of them are operating very differently.

Now we want to write a multithreaded application. Suddenly this is the challenge:

We write a concurrent program, in a linear language, executed on a concurrent asynchronous machine.

This means there is a challenge is the translation between our mind (thinking about the concurrent problem), the program (which we have to express as a linear set of operations), which then runs on our CPU (an async concurrent device).

Phew. How do computers even work in this scenario?!

Why are CPU’s async?

CPU’s have to be async to be fast - remember spectre and meltdown? These are attacks based on measuring the side effects of CPU’s asynchronous behaviour. While computers are “fast” these attacks will always be possible, because to make a CPU synchronous is slow - and asynchronous behaviour will always have measurable side effects. Every modern CPU’s performance is an illusion of async black magic.

A large portion of the async behaviour comes from the interaction of the CPU, cache, and memory.

In order to provide the “illusion” of a coherent synchronous memory interface there is no seperation of your programs cache and memory. When the cpu wants to access “memory” the CPU cache is utilised transparently and will handle the request, and only on a cache miss, will we retrieve the values from RAM.

(Aside: in almost all cases more CPU cache, not frequency will make your system perform better, because a cache miss will mean your task stalls waiting on RAM. Ohh no!)

CPU -> Cache -> RAM

When you have multiple CPU’s, each CPU has it’s own L1 cache:

CPU1 -> L1 Cache -> |              |
CPU2 -> L1 Cache -> | Shared L2/L3 | -> RAM
CPU3 -> L1 Cache -> |              |
CPU4 -> L1 Cache -> |              |

Ahhh! Suddenly we can see where problems can occur - each CPU has an L1 cache, which is transparent to memory but unique to the CPU. This means that each CPU can make a change to the same piece of memory in their L1 cache without the other CPU knowing. To help explain, let’s show a demo.

CPU just trash my variables fam

We’ll assume we now have two threads - my code is in rust again, and there is a good reason for the unsafes - this code really is unsafe!

// assume global x: usize = 0; y: usize = 0;

THREAD 1                        THREAD 2

if unsafe { *x == 1 } {          unsafe {
    unsafe { *y += 1 }              *y = 10;
}                                   *x = 1;
                                }

At the end of execution, what state will X and Y be in? The answer is “it depends”:

  • What order did the threads run?
  • The state of the L1 cache of each CPU
  • The possible interleavings of the operations.
  • Compiler re-ordering

In the end the result of x will always be 1 - because x is only mutated in one thread, the caches will “eventually” (explained soon) become consistent.

The real question is y. y could be:

  • 10
  • 11
  • 1

10 - This can occur because in thread 2, x = 1 is re-ordered above y = 10, causing the thread 1 “y += 1” to execute, followed by thread 2 assign 10 directly to y. It can also occur because the check for x == 1 occurs first, so y += 1 is skipped, then thread 2 is run, causing y = 10. Two ways to achieve the same result!

11 - This occurs in the “normal” execution path - all things considered it’s a miracle :)

1 - This is the most complex one - The y = 10 in thread 2 is applied, but the result is never sent to THREAD 1’s cache, so x = 1 occurs and is made available to THREAD 1 (yes, this is possible to have different values made available to each cpu …). Then thread 1 executes y (0) += 1, which is then sent back trampling the value of y = 10 from thread 2.

If you want to know more about this and many other horrors of CPU execution, Paul McKenny is an expert in this field and has many talks at LCA and others on the topic. He can be found on twitter and is super helpful if you have questions.

So how does a CPU work at all?

Obviously your system (likely a multicore system) works today - so it must be possible to write correct concurrent software. Cache’s are kept in sync via a protocol called MESI. This is a state machine describing the states of memory and cache, and how they can be synchronised. The states are:

  • Modified
  • Exclusive
  • Shared
  • Invalid

What’s interesting about MESI is that each cache line is maintaining it’s own state machine of the memory addresses - it’s not a global state machine. To coordinate CPU’s asynchronously message each other.

A CPU can be messaged via IPC (Inter-Processor-Communication) to say that another CPU wants to “claim” exclusive ownership of a memory address, or to indicate that it has changed the content of a memory address and you should discard your version. It’s important to understand these messages are asynchronous. When a CPU modifies an address it does not immediately send the invalidation message to all other CPU’s - and when a CPU recieves the invalidation request it does not immediately act upon that message.

If CPU’s did “synchronously” act on all these messages, they would be spending so much time handling IPC traffic, they would never get anything done!

As a result, it must be possible to indicate to a CPU that it’s time to send or acknowledge these invalidations in the cache line. This is where barriers, or the memory orderings come in.

  • Relaxed - No messages are sent or acknowledged.
  • Release - flush all pending invalidations to be sent to other CPUS
  • Acquire - Acknowledge and process all invalidation requests in my queue
  • Acquire/Release - flush all outgoing invalidations, and process my incomming queue
  • SeqCst - as AcqRel, but with some other guarantees around ordering that are beyond this discussion.

Understand a Mutex

With this knowledge in place, we are finally in a position to understand the operations of a Mutex

// Assume mutex: Mutex<usize> = Mutex::new(0);

THREAD 1                            THREAD 2

{                                   {
    let guard = mutex.lock()            let guard = mutex.lock()
    *guard += 1;                        println!(*guard)
}                                   }

We know very clearly that this will print 1 or 0 - it’s safe, no weird behaviours. Let’s explain this case though:

THREAD 1

{
    let guard = mutex.lock()
    // Acquire here!
    // All invalidation handled, guard is 0.
    // Compiler is told "all following code must stay after .lock()".
    *guard += 1;
    // content of usize is changed, invalid req is queue
}
// Release here!
// Guard goes out of scope, invalidation reqs sent to all CPU's
// Compiler told all proceeding code must stay above this point.

            THREAD 2

            {
                let guard = mutex.lock()
                // Acquire here!
                // All invalidations handled - previous cache of usize discarded
                // and read from THREAD 1 cache into S state.
                // Compiler is told "all following code must stay after .lock()".
                println(*guard);
            }
            // Release here!
            // Guard goes out of scope, no invalidations sent due to
            // no modifications.
            // Compiler told all proceeding code must stay above this point.

And there we have it! How barriers allow us to define an ordering in code and a CPU, to ensure our caches and compiler outputs are correct and consistent.

Benefits of Rust

A nice benefit of Rust, and knowing these MESI states now, we can see that the best way to run a system is to minimise the number of invalidations being sent and acknowledged as this always causes a delay on CPU time. Rust variables are always mutable or immutable. These map almost directly to the E and S states of MESI. A mutable value is always exclusive to a single cache line, with no contention - and immutable values can be placed into the Shared state allowing each CPU to maintain a cache copy for higher performance.

This is one of the reasons for Rust’s amazing concurrency story is that the memory in your program map to cache states very clearly.

It’s also why it’s unsafe to mutate a pointer between two threads (a global) - because the cache of the two cpus’ won’t be coherent, and you may not cause a crash, but one threads work will absolutely be lost!

Finally, it’s important to see that this is why using the correct concurrency primitives matter - it can highly influence your cache behaviour in your program and how that affects cache line contention and performance.

For comments and more, please feel free to email me!

Shameless Plug

I’m the author and maintainer of Conc Read - a concurrently readable datastructure library for Rust. Check it out on crates.io!

I no longer recommend FreeIPA

Posted by William Brown on July 09, 2019 02:00 PM

I no longer recommend FreeIPA

It’s probably taken me a few years to write this, but I can no longer recommend FreeIPA for IDM installations.

Why not?

The FreeIPA project focused on Kerberos and SSSD, with enough other parts glued on to look like a complete IDM project. Now that’s fine, but it means that concerns in other parts of the project are largely ignored. It creates design decisions that are not scalable or robust.

Due to these decisions IPA has stability issues and scaling issues that other products do not.

To be clear: security systems like IDM or LDAP can never go down. That’s not acceptable.

What do you recommend instead?

  • Samba with AD
  • AzureAD
  • 389 Directory Server

All of these projects are very reliable, secure, scalable. We have done a lot of work into 389 to improve our out of box IDM capabilities too, but there is more to be done too. The Samba AD team have done great things too, and deserve a lot of respect for what they have done.

Is there more detail than this?

Yes - buy me a drink and I’ll talk :)

Didn’t you help?

I tried and it was not taken on board.

So what now?

Hopefully in the next year we’ll see new IDM projects for opensource released that have totally different approachs to the legacy we currently ride upon.

Using 389ds with docker

Posted by William Brown on July 04, 2019 02:00 PM

Using 389ds with docker

I’ve been wanting to containerise 389 Directory Server for a long time - it’s been a long road to get here, but I think that our container support is getting very close to a production ready and capable level. It took so long due to health issues and generally my obsession to do everything right.

Today, container support along with our new command line tools makes 389 a complete breeze to administer. So lets go through an example of a deployment now.

Please note: the container image here is a git-master build and is not production ready as of 2019-07, hopefully this changes soon.

Getting the Container

docker pull firstyear/389ds:latest

If you want to run an ephemeral instance (IE you will LOSE all your data on a restart)

docker run firstyear/389ds:latest

If you want your data to persist, you need to attach a volume at /data:

docker volume create 389ds_data
docker run -v 389ds_data:/data firstyear/389ds:latest

The image exposes ports 3389 and 3636, so you may want to consider publishing these if you want external access.

The container should now setup and run an instance! That’s it, LDAP has never been easier to deploy!

Actually Adding Some Data …

LDAP only really matters if we have some data! So we’ll create a new backend. You need to run these instructions inside the current container, so I prefix these with:

docker exec -i -t <name of container> <command>
docker exec -i -t 389inst dsconf ....

This uses the ldapi socket via /data, and authenticates you based on your process uid to map you to the LDAP administrator account - basically, it’s secure, host only admin access to your data.

Now you can choose any suffix you like, generally based on your dns name (IE I use dc=blackhats,dc=net,dc=au).

dsconf localhost backend create --suffix dc=example,dc=com --be-name userRoot
> The database was sucessfully created

Now fill in the suffix details into your configuration of the container. You’ll need to find where docker stores the volume on your host for this (docker inspect will help you). My location is listed here:

vim /var/lib/docker/volumes/389ds_data/_data/config/container.inf

--> change
# basedn ...
--> to
basedn = dc=example,dc=com

Now you can populate data into that: The dsidm command is our tool to manage users and groups of a backend, and it can provide initialised data which has best-practice aci’s, demo users and groups and starts as a great place for you to build an IDM system.

dsidm localhost initialise

That’s it! You can now see you have a user and a group!

dsidm localhost user list
> demo_user
dsidm localhost group list
> demo_group

You can create your own user:

dsidm localhost user create --uid william --cn William --displayName 'William Brown' --uidNumber 1000 --gidNumber 1000 --homeDirectory /home/william
> Successfully created william
dsidm localhost user get william

It’s trivial to add an ssh key to the user:

dsidm localhost user modify william add:nsSshPublicKey:AAA...
> Successfully modified uid=william,ou=people,dc=example,dc=com

Or to add them to a group:

dsidm localhost group add_member demo_group uid=william,ou=people,dc=example,dc=com
> added member: uid=william,ou=people,dc=example,dc=com
dsidm localhost group members demo_group
> dn: uid=william,ou=people,dc=example,dc=com

Finally, we can even generale config templates for your applications:

dsidm localhost client_config sssd.conf
dsidm localhost client_config ldap.conf
dsidm localhost client_config display

I’m happy to say, LDAP administration has never been easier - we plan to add more functionality to enabled broader ranges of administrative tasks, especially in the IDM area and management of the configuration. It’s honestly hard to beleve that in a shortlist of commands you can now have a fully functional LDAP IDM solution working.

The Case for Ethics in OpenSource

Posted by William Brown on April 27, 2019 02:00 PM

The Case for Ethics in OpenSource

For a long time there have been incidents in technology which have caused negative effects on people - from leaks of private data, to interfaces that are not accessible, to even issues like UI’s doing things that may try to subvert a persons intent. I’m sure there are many more: and we could be here all day listing the various issues that exist in technology, from small to great.

The theme however is that these issues continue to happen: we continue to make decisions in applications that can have consequences to humans.

Software is pointless without people. People create software, people deploy software, people interact with software, and even software indirectly can influence people’s lives. At every layer people exist, and all software will affect them in some ways.

I think that today, we have made a lot of progress in our communities around the deployment of code’s of conduct. These are great, and really help us to discuss the decisions and actions we take within our communities - with the people who create the software. I would like this to go further, where we can have a framework to discuss the effect of software on people that we write: the people that deploy, interact with and are influenced by our work.

Disclaimers

I’m not a specialist in ethics or morality: I’m not a registered or certified engineer in the legal sense. Finally, like all humans I am a product of my experiences which causes all my view points to be biased through the lens of my experience.

Additionally, I specialise in Identity Management software, so many of the ideas and issues I have encountered are really specific to this domain - which means I may overlook the issues in other areas. I also have a “security” mindset which also factors into my decisions too.

Regardless I hope that this is a starting point to recieve further input and advice from others, and a place where we can begin to improve.

The Problem

TODO: Discuss data handling practices

Let’s consider some issues and possible solutions in work that I’m familiar with - identity management software. Lets list a few “features”. (Please don’t email me about how these are wrong, I know they are …)

  • Storing usernames as first and last name
  • Storing passwords in cleartext.
  • Deleting an account sets a flag to mark deletion
  • Names are used as the primary key
  • We request sex on signup
  • To change account details, you have to use a command line tool

Now “technically”, none of these decisions are incorrect at all. There is literally no bad technical decision here, and everything is “technically correct” (not always the best kind of correct).

What do we want to achieve?

There are lots of different issues here, but really want to prevent harm to a person. What is harm? Well that’s a complex topic. To me, it could be emotional harm, disrespect of their person, it could be a feeling of a lack of control.

I don’t believe it’s correct to dictate a set of rules that people should follow. People will be fatigued, and will find the process too hard. We need to trust that people can learn and want to improve. Instead I believe it’s important we provide important points that people should be able to consider in a discussion around the development of software. The same way we discuss technical implementation details, we should discuss potential human impact in every change we have. To realise this, we need a short list of important factors that relate to humans.

I think the following points are important to consider when designing software. These relate to general principles which I have learnt and researched.

People should be respected to have:

  • Informed consent
  • Choice over how they are identified
  • Ability to be forgotten
  • Individual Autonomy
  • Free from Harmful Discrimination
  • Privacy
  • Ability to meaningfully access and use software

There is already some evidence in research papers to show that there are strong reasons for moral positions in software. For example, to prevent harm to come to people, to respect peoples autonomy and to conform to privacy legislation ( source ).

Let’s apply these

Given our set of “features”, lets now discuss these with the above points in mind.

  • Storing usernames as first and last name

This point clearly is in violation of the ability to choose how people are identified - some people may only have a single name, some may have multiple family names. On a different level this also violates the harmful discrimination rule due to the potential to disrespect individuals with cultures that have different name schemes compared to western/English societies.

A better way to approach this is “displayName” as a freetext UTF8 case sensitive field, and to allow substring search over the content (rather than attempting to sort by first/last name which also has a stack of issues).

  • Storing passwords in cleartext.

This one is a violation of privacy, that we risk the exposure of a password which may have been reused (we can’t really stop password reuse, we need to respect human behaviour). Not only that some people may assume we DO hash these correctly, so we actually are violating informed consent as we didn’t disclose the method of how we store these details.

A better thing here is to hash the password, or at least to disclose how it will be stored and used.

  • Deleting an account sets a flag to mark deletion

This violates the ability to be forgotten, because we aren’t really deleting the account. It also breaks informed consent, because we are being “deceptive” about what our software is actually doing compared to the intent of the users request

A better thing is to just delete the account, or if not possible, delete all user data and leave a tombstone inplace that represents “an account was here, but no details associated”.

  • Names are used as the primary key

This violates choice over identification, especially for women who have a divorce, or individuals who are transitioning or just people who want to change their name in general. The reason for the name change doesn’t matter - what matters is we need to respect peoples right to identification.

A better idea is to use UUID/ID numbers as a primary key, and have name able to be changed at any point in time.

  • We request sex on signup

Violates a privacy as a first point - we probably have no need for the data unless we are a medical application, so we should never ask for this at all. We also need to disclose why we need this data to satisfy informed consent, and potentially to allow them to opt-out of providing the data. Finally (if we really require this), to not violate self identification, we need to allow this to be a free-text field rather than a Male/Female boolean. This is not just in respect of individuals who are LGBTQI+, but the reality that there are biologically people who medically are neither. We also need to allow this to be changed at any time in the future. This in mind Sex and Gender are different concepts, so we should be careful which we request - Sex is the medical term of a person’s genetics, and Gender is who the person identifies as.

Not only this, because this is a very personal piece of information, we must disclose how we protect this information from access, who can see it, and if or how we’ll ever share it with other systems or authorities.

Generally, we probably don’t need to know, so don’t ask for it at all.

  • To change account details, you have to use a command line tool

This violates a users ability to meaningfully access and use software - remember, people come from many walks of life and all have different skill sets, but using command line tools is not something we can universally expect.

A proper solution here is at minimum a web/graphical self management portal that is easy to access and follows proper UX/UI design rules, and for a business deploying, a service desk with humans involved that can support and help people change details on their account on their behalf if the person is unable to self-support via the web service.

Proposal

I think that OpenSource should aim to have a code of ethics - the same way we have a code of conduct to guide our behaviour internally to a project, we should have a framework to promote discussion of people’s rights that use, interact with and are affected by our work. We should not focus on technical matters only, but should be promoting people at the core of all our work. Every decision we make is not just technical, but social.

I’m sure that there are more points that could be considere than what I have listed here: I’d love to hear feedback to william at blackhats.net.au. Thanks!

Implementing Webauthn - a series of complexities …

Posted by William Brown on April 27, 2019 02:00 PM

Implementing Webauthn - a series of complexities …

I have recently started to work on a rust webauthn library, to allow servers to be implemented. However, in this process I have noticed a few complexities to an API that should have so much promise for improving the state of authentication. So far I can say I have not found any cryptographic issues, but the design of the standard does raise questions about the ability for people to correctly implement Webauthn servers.

Odd structure decisions

Webauth is made up of multiple encoding standards. There is a good reason for this, which is that the json parts are for the webbrowser, and the cbor parts are for ctap and the authenticator device.

However, I quickly noticed an issue in the Attestation Object, as described here https://w3c.github.io/webauthn/#sctn-attestation . Can you see the problem?

The problem is that the Authenticator Data relies on hand-parsing bytes, and has two structures that are concatenated with no length. This means:

  • You have to hand parse bytes 0 -> 36
  • You then have to CBOR deserialise the Attested Cred Data (if present)
  • You then need to serialise the ACD back to bytes and record that length (if your library doesn’t tell you how long the amount of data is parsed was).
  • Then you need to CBOR deserialise the Extensions.

What’s more insulting about this situation is that the Authenticator Data literally is part of the AttestationObject which is already provided as CBOR! There seems to be no obvious reason for this to require hand-parsing, as the Authenticator Data which will be signature checked, has it’s byte form checked, so you could have the AttestationObject store authDataBytes, then you can CBOR decode the nested structure (allowing the hashing of the bytes later).

There are many risks here because now you have requirements to length check all the parameters which people could get wrong - when CBOR would handle this correctly for you you and provides a good level of correctness that the structure is altered. I also trust the CBOR parser authors to do proper length checks too compared to my crappy byte parsing code!

Confusing Naming Conventions and Layout

The entire standard is full of various names and structures, which are complex, arbitrarily nested and hard to see why they are designed this way. Perhaps it’s a legacy compatability issue? More likely I think it’s object-oriented programming leaking into the specification, which is a paradigm that is not universally applicable.

Regardless, it would be good if the structures were flatter, and named better. There are many confusing structure names throughout the standard, and it can sometimes be hard to identify what you require and don’t require.

Additionally, naming of fields and their use, uses abbrivations to save bandwidth, but makes it hard to follow. I did honestly get confused about the difference between rp (the relying party name) and rp_id, where the challenge provides rp, and the browser response use rp_id.

It can be easy to point fingers and say “ohh William, you’re just not reading it properly and are stupid”. Am I? Or is it that humans find it really hard to parse data like this, and our brains are better suited to other tasks? Human factors are important to consider in specification design both in naming of values, consistency of their use, and appropriate communication as to how they are used properly. I’m finding this to be a barrier to correct implementation now (especially as the signature verification section is very fragmented and hard to follow …).

Crypto Steps seem complex or too static

There are a lot of possible choices here - there are 6 attestation formats and 5 attestation types. As some formats only do some types, there are then 11 verification paths you need to implement for all possible authenticators. I think this level of complexity will lead to mistakes over a large number of possible code branch paths, or lacking support for some device types which people may not have access to.

I think it may have been better to limit the attestation format to one, well defined format, and within that to limit the attestation types available to suit a more broad range of uses.

It feels a lot like these choice are part of some internal Google/MS/Other internal decisions for high security devices, or custom deviges, which will be internally used. It’s leaked into the spec and it raises questions about the ability for people to meaningfully implement the full specification for all possible devices, let alone correctly.

Some parts even omit details in a cryptographic operation, such as here in verification step 2, it doesn’t even list what format the bytes are. (Hint: it’s DER x509).

What would I change?

  • Be more specific

There should be no assumptions about format types, what is in bytes. Be verbose, detailed and without ambiguity.

  • Use type safe, length checked structures.

I would probably make the entire thing a single CBOR structure which contains other nested structures as required. We should never have to hand-parse bytes in 2019, especially when there is a great deal of evidence to show the risks of expecting people to do this.

  • Don’t assume object orientation

I think simpler, flatter structures in the json/cbor would have helped, and been clearer to implement, rather than the really complex maze of types currently involved.

Summary

Despite these concerns, I still think webauthn is a really good standard, and I really do think it will become the future of authentication. I’m hoping to help make that a reality in opensource and I hope that in the future I can contribute to further development and promotion of webauthn.

Using Rust Generics to Enforce DB Record State

Posted by William Brown on April 12, 2019 02:00 PM

Using Rust Generics to Enforce DB Record State

In a database, entries go through a lifecycle which represents what attributes they have have, db record keys, and if they have conformed to schema checking.

I’m currently working on a (private in 2019, public in july 2019) project which is a NoSQL database writting in Rust. To help us manage the correctness and lifecycle of database entries, I have been using advice from the Rust Embedded Group’s Book.

As I have mentioned in the past, state machines are a great way to design code, so let’s plot out the state machine we have for Entries:

Entry State Machine

The lifecyle is:

  • A new entry is submitted by the user for creation
  • We schema check that entry
  • If it passes schema, we commit it and assign internal ID’s
  • When we search the entry, we retrieve it by internal ID’s
  • When we modify the entry, we need to recheck it’s schema before we commit it back
  • When we delete, we just remove the entry.

This leads to a state machine of:

                    |
             (create operation)
                    |
                    v
            [ New + Invalid ] -(schema check)-> [ New + Valid ]
                                                      |
                                               (send to backend)
                                                      |
                                                      v    v-------------\
[Commited + Invalid] <-(modify operation)- [ Commited + Valid ]          |
          |                                          ^   \       (write to backend)
          \--------------(schema check)-------------/     ---------------/

This is a bit rough - The version on my whiteboard was better :)

The main observation is that we are focused only on the commitability and validty of entries - not about where they are or if the commit was a success.

Entry Structs

So to make these states work we have the following structs:

struct EntryNew;
struct EntryCommited;

struct EntryValid;
struct EntryInvalid;

struct Entry<STATE, VALID> {
    state: STATE,
    valid: VALID,
    // Other db junk goes here :)
}

We can then use these to establish the lifecycle with functions (similar) to this:

impl Entry<EntryNew, EntryInvalid> {
    fn new() -> Self {
        Entry {
            state: EntryNew,
            valid: EntryInvalid,
            ...
        }
    }

}

impl<STATE> Entry<STATE, EntryInvalid> {
    fn validate(self, schema: Schema) -> Result<Entry<STATE, EntryValid>, ()> {
        if schema.check(self) {
            Ok(Entry {
                state: self.state,
                valid: EntryValid,
                ...
            })
        } else {
            Err(())
        }
    }

    fn modify(&mut self, ...) {
        // Perform any modifications on the entry you like, only works
        // on invalidated entries.
    }
}

impl<STATE> Entry<STATE, EntryValid> {
    fn seal(self) -> Entry<EntryCommitted, EntryValid> {
        // Assign internal id's etc
        Entry {
            state: EntryCommited,
            valid: EntryValid,
        }
    }

    fn compare(&self, other: Entry<STATE, EntryValid>) -> ... {
        // Only allow compares on schema validated/normalised
        // entries, so that checks don't have to be schema aware
        // as the entries are already in a comparable state.
    }
}

impl Entry<EntryCommited, EntryValid> {
    fn invalidate(self) -> Entry<EntryCommited, EntryInvalid> {
        // Invalidate an entry, to allow modifications to be performed
        // note that modifications can only be applied once an entry is created!
        Entry {
            state: self.state,
            valid: EntryInvalid,
        }
    }
}

What this allows us to do importantly is to control when we apply search terms, send entries to the backend for storage and more. Benefit is this is compile time checked, so you can never send an entry to a backend that is not schema checked, or run comparisons or searches on entries that aren’t schema checked, and you can even only modify or delete something once it’s created. For example other parts of the code now have:

impl BackendStorage {
    // Can only create if no db id's are assigned, IE it must be new.
    fn create(&self, ..., entry: Entry<EntryNew, EntryValid>) -> Result<...> {
    }

    // Can only modify IF it has been created, and is validated.
    fn modify(&self, ..., entry: Entry<EntryCommited, EntryValid>) -> Result<...> {
    }

    // Can only delete IF it has been created and committed.
    fn delete(&self, ..., entry: Entry<EntryCommited, EntryValid>) -> Result<...> {
    }
}

impl Filter<STATE> {
    // Can only apply filters (searches) if the entry is schema checked. This has an
    // important behaviour, where we can schema normalise. Consider a case-insensitive
    // type, we can schema-normalise this on the entry, then our compare can simply
    // be a string.compare, because we assert both entries *must* have been through
    // the normalisation routines!
    fn apply_filter(&self, ..., entry: &Entry<STATE, EntryValid>) -> Result<bool, ...> {
    }
}

Using this with Serde?

I have noticed that when we serialise the entry, that this causes the valid/state field to not be compiled away - because they have to be serialised, regardless of the empty content meaning the compiler can’t eliminate them.

A future cleanup will be to have a serialised DBEntry form such as the following:

struct DBEV1 {
    // entry data here
}

enum DBEntryVersion {
    V1(DBEV1)
}

struct DBEntry {
    data: DBEntryVersion
}

impl From<Entry<EntryNew, EntryValid>> for DBEntry {
    fn from(e: Entry<EntryNew, EntryValid>) -> Self {
        // assign db id's, and return a serialisable entry.
    }
}

impl From<Entry<EntryCommited, EntryValid>> for DBEntry {
    fn from(e: Entry<EntryCommited, EntryValid>) -> Self {
        // Just translate the entry to a serialisable form
    }
}

This way we still have the zero-cost state on Entry, but we are able to move to a versioned seralised structure, and we minimise the run time cost.

Testing the Entry

To help with testing, I needed to be able to shortcut and move between anystate of the entry so I could quickly make fake entries, so I added some unsafe methods:

#[cfg(test)]
unsafe fn to_new_valid(self, Entry<EntryNew, EntryInvalid>) -> {
    Entry {
        state: EntryNew,
        valid: EntryValid
    }
}

These allow me to setup and create small unit tests where I may not have a full backend or schema infrastructure, so I can test specific aspects of the entries and their lifecycle. It’s limited to test runs only, and marked unsafe. It’s not “technically” memory unsafe, but it’s unsafe from the view of “it could absolutely mess up your database consistency guarantees” so you have to really want it.

Summary

Using statemachines like this, really helped me to clean up my code, make stronger assertions about the correctness of what I was doing for entry lifecycles, and means that I have more faith when I and future-contributors will work on the code base that we’ll have compile time checks to ensure we are doing the right thing - to prevent data corruption and inconsistency.

Debugging MacOS bluetooth audio stutter

Posted by William Brown on April 07, 2019 02:00 PM

Debugging MacOS bluetooth audio stutter

I was noticing that audio to my bluetooth headphones from my iPhone was always flawless, but I started to noticed stutter and drops from my MBP. After exhausting some basic ideas, I was stumped.

To the duck duck go machine, and I searched for issues with bluetooth known issues. Nothing appeared.

However, I then decided to debug the issue - thankfully there was plenty of advice on this matter. Press shift + option while clicking bluetooth in the menu-bar, and then you have a debug menu. You can also open Console.app and search for “bluetooth” to see all the bluetooth related logs.

I noticed that when the audio stutter occured that the following pattern was observed.

default     11:25:45.840532 +1000   wirelessproxd   About to scan for type: 9 - rssi: -90 - payload: <00000000 00000000 00000000 00000000 00000000 0000> - mask: <00000000 00000000 00000000 00000000 00000000 0000> - peers: 0
default     11:25:45.840878 +1000   wirelessproxd   Scan options changed: YES
error       11:25:46.225839 +1000   bluetoothaudiod Error sending audio packet: 0xe00002e8
error       11:25:46.225899 +1000   bluetoothaudiod Too many outstanding packets. Drop packet of 8 frames (total drops:451 total sent:60685 percentDropped:0.737700) Outstanding:17

There was always a scan, just before the stutter initiated. So what was scanning?

I searched for the error related to packets, and there were a lot of false leads. From weird apps to dodgy headphones. In this case I could eliminate both as the headphones worked with other devices, and I don’t have many apps installed.

So I went back and thought about what macOS services could be the problem, and I found that airdrop would scan periodically for other devices to send and recieve files. Disabling Airdrop from the sharing menu in System Prefrences cleared my audio right up.

UPDATE 2019-12-20: It looks like the Airdrop sharing option in system preferences has been removed in 10.15, so I don’t believe it’s possible to resove this issue now - audio stutter forever!

GDB autoloads for 389 DS

Posted by William Brown on April 02, 2019 02:00 PM

GDB autoloads for 389 DS

I’ve been writing a set of extensions to help debug 389-ds a bit easier. Thanks to the magic of python, writing GDB extensions is really easy.

On OpenSUSE, when you start your DS instance under GDB, all of the extensions are automatically loaded. This will help make debugging a breeze.

zypper in 389-ds gdb
gdb /usr/sbin/ns-slapd
GNU gdb (GDB; openSUSE Tumbleweed) 8.2
(gdb) ds-
ds-access-log  ds-backtrace
(gdb) set args -d 0 -D /etc/dirsrv/slapd-<instance name>
(gdb) run
...

All the extensions are under the ds- namespace, so they are easy to find. There are some new ones on the way, which I’ll discuss here too:

ds-backtrace

As DS is a multithreaded process, it can be really hard to find the active thread involved in a problem. So we provided a command that knows how to fold duplicated stacks, and to highlight idle threads that you can (generally) skip over.

===== BEGIN ACTIVE THREADS =====
Thread 37 (LWP 70054))
Thread 36 (LWP 70053))
Thread 35 (LWP 70052))
Thread 34 (LWP 70051))
Thread 33 (LWP 70050))
Thread 32 (LWP 70049))
Thread 31 (LWP 70048))
Thread 30 (LWP 70047))
Thread 29 (LWP 70046))
Thread 28 (LWP 70045))
Thread 27 (LWP 70044))
Thread 26 (LWP 70043))
Thread 25 (LWP 70042))
Thread 24 (LWP 70041))
Thread 23 (LWP 70040))
Thread 22 (LWP 70039))
Thread 21 (LWP 70038))
Thread 20 (LWP 70037))
Thread 19 (LWP 70036))
Thread 18 (LWP 70035))
Thread 17 (LWP 70034))
Thread 16 (LWP 70033))
Thread 15 (LWP 70032))
Thread 14 (LWP 70031))
Thread 13 (LWP 70030))
Thread 12 (LWP 70029))
Thread 11 (LWP 70028))
Thread 10 (LWP 70027))
#0  0x00007ffff65db03c in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007ffff66318b0 in PR_WaitCondVar () at /usr/lib64/libnspr4.so
#2  0x00000000004220e0 in [IDLE THREAD] connection_wait_for_new_work (pb=0x608000498020, interval=4294967295) at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:970
#3  0x0000000000425a31 in connection_threadmain () at /home/william/development/389ds/ds/ldap/servers/slapd/connection.c:1536
#4  0x00007ffff6637484 in None () at /usr/lib64/libnspr4.so
#5  0x00007ffff65d4fab in start_thread () at /lib64/libpthread.so.0
#6  0x00007ffff6afc6af in clone () at /lib64/libc.so.6

This example shows that there are 17 idle threads (look at frame 2) here, that all share the same trace.

ds-access-log

The access log is buffered before writing, so if you have a coredump, and want to see the last few events before they were written to disk, you can use this to display the content:

(gdb) ds-access-log
===== BEGIN ACCESS LOG =====
$2 = 0x7ffff3c3f800 "[03/Apr/2019:10:58:42.836246400 +1000] conn=1 fd=64 slot=64 connection from 127.0.0.1 to 127.0.0.1
[03/Apr/2019:10:58:42.837199400 +1000] conn=1 op=0 BIND dn=\"\" method=128 version=3
[03/Apr/2019:10:58:42.837694800 +1000] conn=1 op=0 RESULT err=0 tag=97 nentries=0 etime=0.0001200300 dn=\"\"
[03/Apr/2019:10:58:42.838881800 +1000] conn=1 op=1 SRCH base=\"\" scope=2 filter=\"(objectClass=*)\" attrs=ALL
[03/Apr/2019:10:58:42.839107600 +1000] conn=1 op=1 RESULT err=32 tag=101 nentries=0 etime=0.0001070800
[03/Apr/2019:10:58:42.840687400 +1000] conn=1 op=2 UNBIND
[03/Apr/2019:10:58:42.840749500 +1000] conn=1 op=2 fd=64 closed - U1
", '\276' <repeats 3470 times>

At the end the line that repeats shows the log is “empty” in that segment of the buffer.

ds-entry-print

This command shows the in-memory entry. It can be common to see Slapi_Entry * pointers in the codebase, so being able to display these is really helpful to isolate what’s occuring on the entry. Your first argument should be the Slapi_Entry pointer.

(gdb) ds-entry-print ec
Display Slapi_Entry: cn=config
cn: config
objectClass: top
objectClass: extensibleObject
objectClass: nsslapdConfig
nsslapd-schemadir: /opt/dirsrv/etc/dirsrv/slapd-standalone1/schema
nsslapd-lockdir: /opt/dirsrv/var/lock/dirsrv/slapd-standalone1
nsslapd-tmpdir: /tmp
nsslapd-certdir: /opt/dirsrv/etc/dirsrv/slapd-standalone1
...

Programming Lessons and Methods

Posted by William Brown on February 25, 2019 02:00 PM

Programming Lessons and Methods

Everyone has their own lessons and methods that they use when they approaching programming. These are the lessons that I have learnt, which I think are the most important when it comes to design, testing and communication.

Comments and Design

Programming is the art of writing human readable code, that a machine will eventually run. Your program needs to be reviewed, discussed and parsed by another human. That means you need to write your program in a way they can understand first.

Rather than rushing into code, and hacking until it works, I find it’s great to start with comments such as:

fn data_access(search: Search) -> Type {
    // First check the search is valid
    //  * No double terms
    //  * All schema is valid

    // Retrieve our data based on the search

    // if debug, do an un-indexed assert the search matches

    // Do any need transform

    // Return the data
}

After that, I walk away, think about the issue, come back, maybe tweak these comments. When I eventually fill in the code inbetween, I leave all the comments in place. This really helps my future self understand what I was thinking, but it also helps other people understand too.

State Machines

State machines are a way to design and reason about the states a program can be in. They allow exhaustive represenations of all possible outcomes of a function. A simple example is a microwave door.

  /----\            /----- close ----\          /-----\
  |     \          /                 v         v      |
  |    -------------                ---------------   |
open   | Door Open |                | Door Closed |  close
  |    -------------                ---------------   |
  |    ^          ^                  /          \     |
  \---/            \------ open ----/            \----/

When the door is open, opening it again does nothing. Only when the door is open, and we close the door (and event), does the door close (a transition). Once closed, the door can not be closed any more (event does nothing). It’s when we open the door now, that a state change can occur.

There is much more to state machines than this, but they allow us as humans to reason about our designs and model our programs to have all possible outcomes considered.

Zero, One and Infinite

In mathematics there are only three numbers that matter. Zero, One and Infinite. It turns out the same is true in a computer too.

When we are making a function, we can define limits in these terms. For example:

fn thing(argument: Type)

In this case, argument is “One” thing, and must be one thing.

fn thing(argument: Option<Type>)

Now we have argument as an option, so it’s “Zero” or “One”.

fn thing(argument: Vec<Type>)

Now we have argument as vec (array), so it’s “Zero” to “Infinite”.

When we think about this, our functions have to handle these cases properly. We don’t write functions that take a vec with only two items, we write a function with two arguments where each one must exist. It’s hard to handle “two” - it’s easy to handle two cases of “one”.

It also is a good guide for how to handle data sets, assuming they could always be infinite in size (or at least any arbitrary size).

You can then apply this to tests. In a test given a function of:

fn test_me(a: Option<Type>, b: Vec<Type>)

We know we need to test permutations of:

  • a is “Zero” or “One” (Some, None)
  • b is “Zero”, “One” or “Infinite” (.len() == 0, .len() == 1, .len() > 0)

Note: Most languages don’t have an array type that is “One to Infinite”, IE non-empty. If you want this condition (at least one item), you have to assert it yourself ontop of the type system.

Correct, Simple, Fast

Finally, we can put all these above tools together and apply a general philosophy. When writing a program, first make it correct, then simpify the program, then make it fast.

If you don’t do it in this order you will hit barriers - social and technical. For example, if you make something fast, simple, correct, you will likely have issues that can be fixed without making a decrease in performance. People don’t like it when you introduce a patch that drops performance, so as a result correctness is now sacrificed. (Spectre anyone?)

If you make something too simple, you may never be able to make it correctly handle all cases that exist in your application - likely facilitating a future rewrite to make it correct.

If you do correct, fast, simple, then your program will be correct, and fast, but hard for a human to understand. Because programming is the art of communicating intent to a person sacrificing simplicity in favour of fast will make it hard to involve new people and educate and mentor them into development of your project.

  • Correct: Does it behave correctly, handle all states and inputs correctly?
  • Simple: Is it easy to comprehend and follow for a human reader?
  • Fast: Is it performant?

Meaningful 2fa on modern linux

Posted by William Brown on February 11, 2019 02:00 PM

Meaningful 2fa on modern linux

Recently I heard of someone asking the question:

“I have an AD environment connected with <product> IDM. I want to have 2fa/mfa to my linux machines for ssh, that works when the central servers are offline. What’s the best way to achieve this?”

Today I’m going to break this down - but the conclusion for the lazy is:

This is not realistically possible today: use ssh keys with ldap distribution, and mfa on the workstations, with full disk encryption.

Background

So there are a few parts here. AD is for intents and purposes an LDAP server. The <product> is also an LDAP server, that syncs to AD. We don’t care if that’s 389-ds, freeipa or vendor solution. The results are basically the same.

Now the linux auth stack is, and will always use pam for the authentication, and nsswitch for user id lookups. Today, we assume that most people run sssd, but pam modules for different options are possible.

There are a stack of possible options, and they all have various flaws.

  • FreeIPA + 2fa
  • PAM TOTP modules
  • PAM radius to a TOTP server
  • Smartcards

FreeIPA + 2fa

Now this is the one most IDM people would throw out. The issue here is the person already has AD and a vendor product. They don’t need a third solution.

Next is the fact that FreeIPA stores the TOTP in the LDAP, which means FreeIPA has to be online for it to work. So this is eliminated by the “central servers offline” requirement.

PAM radius to TOTP server

Same as above: An extra product, and you have a source of truth that can go down.

PAM TOTP module on hosts

Okay, even if you can get this to scale, you need to send the private seed material of every TOTP device that could login to the machine, to every machine. That means any compromise, compromises every TOTP token on your network. Bad place to be in.

Smartcards

Are notoriously difficult to have functional, let alone with SSH. Don’t bother. (Where the Smartcard does TLS auth to the SSH server this is.)

Come on William, why are you so doom and gloom!

Lets back up for a second and think about what we we are trying to prevent by having mfa at all. We want to prevent single factor compromise from having a large impact and we want to prevent brute force attacks. (There are probably more reasons, but these are the ones I’ll focus on).

So the best answer: Use mfa on the workstation (password + totp), then use ssh keys to the hosts.

This means the target of the attack is small, and the workstation can be protected by things like full disk encryption and group policy. To sudo on the host you still need the password. This makes sudo MFA to root as you need something know, and something you have.

If you are extra conscious you can put your ssh keys on smartcards. This works on linux and osx workstations with yubikeys as I am aware. Apparently you can have ssh keys in TPM, which would give you tighter hardware binding, but I don’t know how to achieve this (yet).

To make all this better, you can distributed your ssh public keys in ldap, which means you gain the benefits of LDAP account locking/revocation, you can remove the keys instantly if they are breached, and you have very little admin overhead to configuration of this service on the linux server side. Think about how easy onboarding is if you only need to put your ssh key in one place and it works on every server! Let alone shutting down a compromised account: lock it in one place, and they are denied access to every server.

SSSD as the LDAP client on the server can also cache the passwords (hashed) and the ssh public keys, which means a disconnected client will still be able to be authenticated to.

At this point, because you have ssh key auth working, you could even deny password auth as an option in ssh altogether, eliminating an entire class of bruteforce vectors.

For bonus marks: You can use AD as the generic LDAP server that stores your SSH keys. No additional vendor products needed, you already have everything required today, for free. Everyone loves free.

Conclusion

If you want strong, offline capable, distributed mfa on linux servers, the only choice today is LDAP with SSH key distribution.

Want to know more? This blog contains how-tos on SSH key distribution for AD, SSH keys on smartcards, and how to configure SSSD to use SSH keys from LDAP.

Using the latest 389-ds on OpenSUSE

Posted by William Brown on January 29, 2019 02:00 PM

Using the latest 389-ds on OpenSUSE

Thanks to some help from my friend who works on OBS, I’ve finally got a good package in review for submission to tumbleweed. However, if you are impatient and want to use the “latest” and greatest 389-ds version on OpenSUSE.

zypper ar obs://network:ldap network:ldap
zypper in 389-ds

Docker

docker run --rm -i -t registry.opensuse.org/home/firstyear/containers/389-ds-container:latest

To make it persistent:

docker run -v 389ds_data:/data <your options here ...> registry.opensuse.org/home/firstyear/containers/389-ds-container:latest

Then to run the admin tools:

docker exec -i -t <container name> /usr/sbin/dsconf ...
docker exec -i -t <container name> /usr/sbin/dsidm ...

Testing in docker?

If you are “testing” in docker (please don’t do this in production: for production see above) you’ll need to do some tweaks to get around the lack of systemd.

docker run -i -t opensuse/tumbleweed:latest
zypper ar obs://network:ldap network:ldap
zypper in 389-ds

vim /usr/share/dirsrv/inf/defaults.inf
# change the following to match:
with_systemd = 0

What next?

After this, you should now be able to follow our new quickstart guide on the 389-ds website.

If you followed the docker steps, skip to adding users and groups

The network:ldap repo and the container listed are updated when upstream makes releases so you’ll always get the latest 389-ds

EDIT: Updated 2019-04-03 to change repo as changes have progressed forward.

EDIT: Updated 2019-08-27 Improve clarity about when you need to do docker tweaks, and add docker image steps

SUSE Open Build Service cheat sheet

Posted by William Brown on January 18, 2019 02:00 PM

SUSE Open Build Service cheat sheet

Part of starting at SUSE has meant that I get to learn about Open Build Service. I’ve known that the project existed for a long time but I have never had a chance to use it. So far I’m thoroughly impressed by how it works and the features it offers.

As A Consumer

The best part of OBS is that it’s trivial on OpenSUSE to consume content from it. Zypper can add projects with the command:

zypper ar obs://<project name> <repo nickname>
zypper ar obs://network:ldap network:ldap

I like to give the repo nickname (your choice) to be the same as the project name so I know what I have enabled. Once you run this you can easily consume content from OBS.

Package Management

As someone who has started to contribute to the suse 389-ds package, I’ve been slowly learning how this work flow works. OBS similar to GitHub/Lab allows a branching and request model.

On OpenSUSE you will want to use the osc tool for your workflow:

zypper in osc
# If you plan to use the "service" command
zypper in obs-service-tar obs-service-obs_scm obs-service-recompress obs-service-set_version obs-service-download_files python-xml obs-service-format_spec_file

You can branch from an existing project to make changes with:

osc branch <project> <package>
osc branch network:ldap 389-ds

This will branch the project to my home namespace. For me this will land in “home:firstyear:branches:network:ldap”. Now I can checkout the content on to my machine to work on it.

osc co <project>
osc co home:firstyear:branches:network:ldap

This will create the folder “home:…:ldap” in the current working directory.

From here you can now work on the project. Some useful commands are:

Add new files to the project (patches, new source tarballs etc).

osc add <path to file>
osc add feature.patch
osc add new-source.tar.xz

Edit the change log of the project (I think this is used in release notes?)

osc vc

To ammend your changes, use:

osc vc -e

Build your changes locally matching the system you are on. Packages normally build on all/most OpenSUSE versions and architectures, this will build just for your local system and arch.

osc build

Make sure you clean up files you aren’t using any more with:

osc rm <filename>
# This commands removes anything untracked by osc.
osc clean

Commit your changes to the OBS server, where a complete build will be triggered:

osc commit

View the results of the last commit:

osc results

Enable people to use your branch/project as a repository. You edit the project metadata and enable repo publishing:

osc meta prj -e <name of project>
osc meta prj -e home:firstyear:branches:network:ldap

# When your editor opens, change this section to enabled (disabled by default):
<publish>
  <enabled />
</publish>

NOTE: In some cases if you have the package already installed, and you add the repo/update it won’t install from your repo. This is because in SUSE packages have a notion of “vendoring”. They continue to update from the same repo as they were originally installed from. So if you want to change this you use:

zypper [d]up --from <repo name>

You can then create a “request” to merge your branch changes back to the project origin. This is:

osc sr

A helpful maintainer will then review your changes. You can see this with.

osc rq show <your request id>

If you change your request, to submit again, use:

osc sr

And it will ask if you want to replace (supercede) the previous request.

I was also helped by a friend to provie a “service” configuration that allows generation of tar balls from git. It’s not always appropriate to use this, but if the repo has a “_service” file, you can regenerate the tar with:

osc service ra

So far this is as far as I have gotten with OBS, but I already appreciate how great this work flow is for package maintainers, reviewers and consumers. It’s a pleasure to work with software this well built.

As an additional piece of information, it’s a good idea to read the OBS Packaging Guidelines
to be sure that you are doing the right thing!
# Acts as tail
osc bl osc r -v

# How to access the meta and docker stuff

osc meta pkg -e osc meta prj -e

(will add vim to the buildroot), then you can chroot (allow editing in the build root)

osc chroot osc build -x vim

-k <dir> keeps artifacts in directory dir IE rpm outputs

oscrc buildroot variable, mount tmpfs to that location.

docker privs SYS_ADMIN, SYS_CHROOT

Multiple spec: commit second spec to pkg then:

osc linkpac prj pkg prj new-link-pkg

rpm –eval ‘%{variable}’

%setup -n “name of what the directory unpacks to, not what to rename to”

When no link exists

osc submitpac destprj deskpkg

Structuring Rust Transactions

Posted by William Brown on January 18, 2019 02:00 PM

Structuring Rust Transactions

I’ve been working on a database-related project in Rust recently, which takes advantage of my concurrently readable datastructures. However I ran into a problem of how to structure Read/Write transaction structures that shared the reader code, and container multiple inner read/write types.

Some Constraints

To be clear, there are some constraints. A “parent” write, will only ever contain write transaction guards, and a read will only ever contain read transaction guards. This means we aren’t going to hit any deadlocks in the code. Rust can’t protect us from mis-ording locks. An additional requirement is that readers and a single write must be able to proceed simultaneously - but having a rwlock style writer or readers behaviour would still work here.

Some Background

To simplify this, imagine we have two concurrently readable datastructures. We’ll call them db_a and db_b.

struct db_a { ... }

struct db_b { ... }

Now, each of db_a and db_b has their own way to protect their inner content, but they’ll return a DBWriteGuard or DBReadGuard when we call db_a.read()/write() respectively.

impl db_a {
    pub fn read(&self) -> DBReadGuard {
        ...
    }

    pub fn write(&self) -> DBWriteGuard {
        ...
    }
}

Now we make a “parent” wrapper transaction such as:

struct server {
    a: db_a,
    b: db_b,
}

struct server_read {
    a: DBReadGuard,
    b: DBReadGuard,
}

struct server_write {
    a: DBWriteGuard,
    b: DBWriteGuard,
}

impl server {
    pub fn read(&self) -> server_read {
        server_read {
            self.a.read(),
            self.b.read(),
        }
    }

    pub fn write(&self) -> server_write {
        server_read {
            self.a.write(),
            self.b.write(),
        }
    }
}

The Problem

Now the problem is that on my server_read and server_write I want to implement a function for “search” that uses the same code. Search or a read or write should behave identically! I wanted to also avoid the use of macros as the can hide issues while stepping in a debugger like LLDB/GDB.

Often the answer with rust is “traits”, to create an interface that types adhere to. Rust also allows default trait implementations, which sounds like it could be a solution here.

pub trait server_read_trait {
    fn search(&self) -> SomeResult {
        let result_a = self.a.search(...);
        let result_b = self.b.search(...);
        SomeResult(result_a, result_b)
    }
}

In this case, the issue is that &self in a trait is not aware of the fields in the struct - traits don’t define that fields must exist, so the compiler can’t assume they exist at all.

Second, the type of self.a/b is unknown to the trait - because in a read it’s a “a: DBReadGuard”, and for a write it’s “a: DBWriteGuard”.

The first problem can be solved by using a get_field type in the trait. Rust will also compile this out as an inline, so the correct thing for the type system is also the optimal thing at run time. So we’ll update this to:

pub trait server_read_trait {
    fn get_a(&self) -> ???;

    fn get_b(&self) -> ???;

    fn search(&self) -> SomeResult {
        let result_a = self.get_a().search(...); // note the change from self.a to self.get_a()
        let result_b = self.get_b().search(...);
        SomeResult(result_a, result_b)
    }
}

impl server_read_trait for server_read {
    fn get_a(&self) -> &DBReadGuard {
        &self.a
    }
    // get_b is similar, so ommitted
}

impl server_read_trait for server_write {
    fn get_a(&self) -> &DBWriteGuard {
        &self.a
    }
    // get_b is similar, so ommitted
}

So now we have the second problem remaining: for the server_write we have DBWriteGuard, and read we have a DBReadGuard. There was a much longer experimentation process, but eventually the answer was simpler than I was expecting. Rust allows traits to have Self types that enforce trait bounds rather than a concrete type.

So provided that DBReadGuard and DBWriteGuard both implement “DBReadTrait”, then we can have the server_read_trait have a self type that enforces this. It looks something like:

pub trait DBReadTrait {
    fn search(&self) -> ...;
}

impl DBReadTrait for DBReadGuard {
    fn search(&self) -> ... { ... }
}

impl DBReadTrait for DBWriteGuard {
    fn search(&self) -> ... { ... }
}

pub trait server_read_trait {
    type GuardType: DBReadTrait; // Say that GuardType must implement DBReadTrait

    fn get_a(&self) -> &Self::GuardType; // implementors must return that type implementing the trait.

    fn get_b(&self) -> &Self::GuardType;

    fn search(&self) -> SomeResult {
        let result_a = self.get_a().search(...);
        let result_b = self.get_b().search(...);
        SomeResult(result_a, result_b)
    }
}

impl server_read_trait for server_read {
    fn get_a(&self) -> &DBReadGuard {
        &self.a
    }
    // get_b is similar, so ommitted
}

impl server_read_trait for server_write {
    fn get_a(&self) -> &DBWriteGuard {
        &self.a
    }
    // get_b is similar, so ommitted
}

This works! We now have a way to write a single “search” type for our server read and write types. In my case, the DBReadTrait also uses a similar technique to define a search type shared between the DBReadGuard and DBWriteGuard.

Useful USG pro 4 commands and hints

Posted by William Brown on January 01, 2019 02:00 PM

Useful USG pro 4 commands and hints

I’ve recently changed from a FreeBSD vm as my router to a Ubiquiti PRO USG4. It’s a solid device, with many great features, and I’m really impressed at how it “just works” in many cases. So far my only disappointment is lack of documentation about the CLI, especially for debugging and auditing what is occuring in the system, and for troubleshooting steps. This post will aggregate some of my knowledge about the topic.

Current config

Show the current config with:

mca-ctrl -t dump-cfg

You can show system status with the “show” command. Pressing ? will cause the current compeletion options to be displayed. For example:

# show <?>
arp              date             dhcpv6-pd        hardware

DNS

The following commands show the DNS statistics, the DNS configuration, and allow changing the cache-size. The cache-size is measured in number of records cached, rather than KB/MB. To make this permanent, you need to apply the change to config.json in your controllers sites folder.

show dns forwarding statistics
show system name-server
set service dns forwarding cache-size 10000
clear dns forwarding cache

Logging

You can see and aggregate of system logs with

show log

Note that when you set firewall rules to “log on block” they go to dmesg, not syslog, so as a result you need to check dmesg for these.

It’s a great idea to forward your logs in the controller to a syslog server as this allows you to aggregate and see all the events occuring in a single time series (great when I was diagnosing an issue recently).

Interfaces

To show the system interfaces

show interfaces

To restart your pppoe dhcp6c:

release dhcpv6-pd interface pppoe0
renew dhcpv6-pd interface pppoe0

There is a current issue where the firmware will start dhcp6c on eth2 and pppoe0, but the session on eth2 blocks the pppoe0 client. As a result, you need to release on eth2, then renew of pppoe0

If you are using a dynamic prefix rather than static, you may need to reset your dhcp6c duid.

delete dhcpv6-pd duid

To restart an interface with the vyatta tools:

disconnect interface pppoe
connect interface pppoe

OpenVPN

I have setup customised OpenVPN tunnels. To show these:

show interfaces openvpn detail

These are configured in config.json with:

# Section: config.json - interfaces - openvpn
    "vtun0": {
            "encryption": "aes256",
            # This assigns the interface to the firewall zone relevant.
            "firewall": {
                    "in": {
                            "ipv6-name": "LANv6_IN",
                            "name": "LAN_IN"
                    },
                    "local": {
                            "ipv6-name": "LANv6_LOCAL",
                            "name": "LAN_LOCAL"
                    },
                    "out": {
                            "ipv6-name": "LANv6_OUT",
                            "name": "LAN_OUT"
                    }
            },
            "mode": "server",
            # By default, ubnt adds a number of parameters to the CLI, which
            # you can see with ps | grep openvpn
            "openvpn-option": [
                    # If you are making site to site tunnels, you need the ccd
                    # directory, with hostname for the file name and
                    # definitions such as:
                    # iroute 172.20.0.0 255.255.0.0
                    "--client-config-dir /config/auth/openvpn/ccd",
                    "--keepalive 10 60",
                    "--user nobody",
                    "--group nogroup",
                    "--proto udp",
                    "--port 1195"
            ],
            "server": {
                    "push-route": [
                            "172.24.0.0/17"
                    ],
                    "subnet": "172.24.251.0/24"
            },
            "tls": {
                    "ca-cert-file": "/config/auth/openvpn/vps/vps-ca.crt",
                    "cert-file": "/config/auth/openvpn/vps/vps-server.crt",
                    "dh-file": "/config/auth/openvpn/dh2048.pem",
                    "key-file": "/config/auth/openvpn/vps/vps-server.key"
            }
    },

Netflow

Net flows allow a set of connection tracking data to be sent to a remote host for aggregation and analysis. Sadly this process was mostly undocumented, bar some useful forum commentors. Here is the process that I came up with. This is how you configure it live:

set system flow-accounting interface eth3.11
set system flow-accounting netflow server 172.24.10.22 port 6500
set system flow-accounting netflow version 5
set system flow-accounting netflow sampling-rate 1
set system flow-accounting netflow timeout max-active-life 1
commit

To make this persistent:

"system": {
            "flow-accounting": {
                    "interface": [
                            "eth3.11",
                            "eth3.12"
                    ],
                    "netflow": {
                            "sampling-rate": "1",
                            "version": "5",
                            "server": {
                                    "172.24.10.22": {
                                            "port": "6500"
                                    }
                            },
                            "timeout": {
                                    "max-active-life": "1"
                            }
                    }
            }
    },

To show the current state of your flows:

show flow-accounting

The idea of CI and Engineering

Posted by William Brown on January 01, 2019 02:00 PM

The idea of CI and Engineering

In software development I see and interesting trend and push towards continuous integration, continually testing, and testing in production. These techniques are designed to allow faster feedback on errors, use real data for application testing, and to deliver features and changes faster.

But is that really how people use software on devices? When we consider an operation like google or amazon, this always online technique may work, but what happens when we apply a continous integration and “we’ll patch it later” mindset to devices like phones or internet of things?

What happens in other disciplines?

In real engineering disciplines like aviation or construction, techniques like this don’t really work. We don’t continually build bridges, then fix them when they break or collapse. There are people who provide formal analysis of materials, their characteristics. Engineers consider careful designs, constraints, loads and situations that may occur. The structure is planned, reviewed and verified mathematically. Procedures and oversight is applied to ensure correct building of the structure. Lessons are learnt from past failures and incidents and are applied into every layer of the design and construction process. Communication between engineers and many other people is critical to the process. Concerns are always addressed and managed.

The first thing to note is that if we just built lots of scale-model bridges and continually broke them until we found their limits, this would waste many resources to do this. Bridges are carefully planned and proven.

So whats the point with software?

Today we still have a mindset that continually breaking and building is a reasonable path to follow. It’s not! It means that the only way to achieve quality is to have a large test suite (requires people and time to write), which has to be further derived from failures (and those failures can negatively affect real people), then we have to apply large amounts of electrical energy to continually run the tests. The test suites can’t even guarantee complete coverage of all situations and occurances!

This puts CI techniques out of reach of many application developers due to time and energy (translated to dollars) limits. Services like travis on github certainly helps to lower the energy requirement, but it doesn’t stop the time and test writing requirements.

No matter how many tests we have for a program, if that program is written in C or something else, we continually see faults and security/stability issues in that software.

What if we CI on … a phone?

Today we even have hardware devices that are approached as though they “test in production” is a reasonable thing. It’s not! People don’t patch, telcos don’t allow updates out to users, and those that are aware, have to do custom rom deployment. This creates an odd dichomtemy of “haves” and “haves not”, of those in technical know how who have a better experience, and the “haves not” who have to suffer potentially insecure devices. This is especially terrifying given how deeply personal phones are.

This is a reality of our world. People do not patch. They do not patch phones, laptops, network devices and more. Even enterprises will avoid patching if possible. Rather than trying to shift the entire culture of humans to “update always”, we need to write software that can cope in harsh conditions, for long term. We only need to look to software in aviation to see we can absolutely achieve this!

What should we do?

I believe that for software developers to properly become software engineers we should look to engineers in civil and aviation industries. We need to apply:

  • Regualation and ethics (Safety of people is always first)
  • Formal verification
  • Consider all software will run long term (5+ years)
  • Improve team work and collaboration on designs and development

The reality of our world is people are deploying devices (routers, networks, phones, lights, laptops more …) where they may never be updated or patched in their service life. Even I’m guilty (I have a modem that’s been unpatched for about 6 years but it’s pretty locked down …). As a result we need to rely on proof that the device can not fail at build time, rather than patch it later which may never occur! Putting formal verification first, and always considering user safety and rights first, shifts a large burden to us in terms of time. But many tools (Coq, fstar, rust …) all make formal verification more accessible to use in our industry. Verifying our software is a far stronger assertion of quality than “throw tests at it and hope it works”.

You’re crazy William, and also wrong

Am I? Looking at “critical” systems like iPhone encryption hardware, they are running the formally verified Sel4. We also heard at Kiwicon in 2018 that Microsoft and XBox are using formal verification to design their low levels of their system to prevent exploits from occuring in the first place.

Over time our industry will evolve, and it will become easier and more cost effective to formally verify than to operate and deploy CI. This doesn’t mean we don’t need tests - it means that the first line of quality should be in verification of correctness using formal techniques rather than using tests and CI to prove correct behaviour. Tests are certainly still required to assert further behavioural elements of software.

Today, if you want to do this, you should be looking at Coq and program extraction, fstar and the kremlin (project everest, a formally verified https stack), Rust (which has a subset of the safe language formally proven). I’m sure there are more, but these are the ones I know off the top of my head.

Conclusion

Over time our industry must evolve to put the safety of humans first. To achive this we must look to other safety driven cultures such as aviation and civil engineering. Only by learning from their strict disciplines and behaviours can we start to provide software that matches behavioural and quality expectations humans have for software.

Nextcloud and badrequest filesize incorrect

Posted by William Brown on December 30, 2018 02:00 PM

Nextcloud and badrequest filesize incorrect

My friend came to my house and was trying to share some large files with my nextcloud instance. Part way through the upload an error occurred.

"Exception":"Sabre\\DAV\\Exception\\BadRequest","Message":"expected filesize 1768906752 got 1768554496"

It turns out this error can be caused by many sources. It could be timeouts, bad requests, network packet loss, incorrect nextcloud configuration or more.

We tried uploading larger files (by a factor of 10 times) and they worked. This eliminated timeouts as a cause, and probably network loss. Being on ethernet direct to the server generally also helps to eliminate packet loss as a cause compared to say internet.

We also knew that the server must not have been misconfigured because a larger file did upload, so no file or resource limits were being hit.

This also indicated that the client was likely doing the right thing because larger and smaller files would upload correctly. The symptom now only affected a single file.

At this point I realised, what if the client and server were both victims to a lower level issue? I asked my friend to ls the file and read me the number of bytes long. It was 1768906752, as expected in nextcloud.

Then I asked him to cat that file into a new file, and to tell me the length of the new file. Cat encountered an error, but ls on the new file indeed showed a size of 1768554496. That means filesystem corruption! What could have lead to this?

HFS+

Apple’s legacy filesystem (and the reason I stopped using macs) is well known for silently eating files and corrupting content. Here we had yet another case of that damage occuring, and triggering errors elsewhere.

Bisecting these issues and eliminating possibilities through a scientific method is always the best way to resolve the cause, and it may come from surprising places!

Identity ideas …

Posted by William Brown on December 20, 2018 02:00 PM

Identity ideas …

I’ve been meaning to write this post for a long time. Taking half a year away from the 389-ds team, and exploring a lot of ideas from other projects has led me to come up with some really interesting ideas about what we do well, and what we don’t. I feel like this blog could be divisive, as I really think that for our services to stay relevant we need to make changes that really change our own identity - so that we can better represent yours.

So strap in, this is going to be long …

What’s currently on the market

Right now the market for identity has two extremes. At one end we have the legacy “create your own” systems, that are build on technologies like LDAP and Kerberos. I’m thinking about things like 389 Directory Server, OpenLDAP, Active Directory, FreeIPA and more. These all happen to be constrained heavily by complexity, fragility, and administrative workload. You need to spend months to learn these and even still, you will make mistakes and there will be problems.

At the other end we have hosted “Identity as a Service” options like Azure AD and Auth0. These have very intelligently, unbound themself from legacy, and tend to offer HTTP apis, 2fa and other features that “just work”. But they are all in the cloud, and outside your control.

But there is nothing in the middle. There is no option that “just works”, supports modern standards, and is unhindered by legacy that you can self deploy with minimal administrative fuss - or years of experience.

What do I like from 389?

  • Replication

The replication system is extremely robust, and has passed many complex tests for cases of eventual consistency correctness. It’s very rare to hear of any kind of data corruption or loss within our replication system, and that’s testament to the great work of people who spent years looking at the topic.

  • Performance

We aren’t as fast as OpenLDAP is 1 vs 1 server, but our replication scalability is much higher, where in any size of MMR or read-only replica topology, we have higher horizontal scaling, nearly linear based on server additions. If you want to run a cloud scale replicated database, we scale to it (and people already do this!).

  • Stability

Our server stability is well known with administrators, and honestly is a huge selling point. We see servers that only go down when administrators are performing upgrades. Our work with sanitising tools and the careful eyes of the team has ensured our code base is reliable and solid. Having extensive tests and amazing dedicated quality engineers also goes a long way.

  • Feature rich

There are a lot of features I really like, and are really useful as an admin deploying this service. Things like memberof (which is actually a group resolution cache when you think about it …), automember, online backup, unique attribute enforcement, dereferencing, and more.

  • The team

We have a wonderful team of really smart people, all of whom are caring and want to advance the state of identity management. Not only do they want to keep up with technical changes and excellence, they are listening to and want to improve our social awareness of identity management.

Pain Points

  • C

Because DS is written in C, it’s risky and difficult to make changes. People constantly make mistakes that introduce unsafety (even myself), and worse. No amount of tooling or intelligence can take away the fact that C is just hard to use, and people need to be perfect (people are not perfect!) and today we have better tools. We can not spend our time chasing our tails on pointless issues that C creates, when we should be doing better things.

  • Everything about dynamic admin, config, and plugins is hard and can’t scale

Because we need to maintain consistency through operations from start to end but we also allow changing config, plugins, and more during the servers operation the current locking design just doesn’t scale. It’s also not 100% safe either as the values are changed by atomics, not managed by transactions. We could use copy-on-write for this, but why? Config should be managed by tools like ansible, but today our dynamic config and plugins is both a performance over head and an admin overhead because we exclude best practice tools and have to spend a large amount of time to maintain consistent data when we shouldn’t need to. Less features is less support overhead on us, and simpler to test and assert quality and correct behaviour.

  • Plugins to address shortfalls, but a bit odd.

We have all these features to address issues, but they all do it … kind of the odd way. Managed Entries creates user private groups on object creation. But the problem is “unix requires a private group” and “ldap schema doesn’t allow a user to be a group and user at the same time”. So the answer is actually to create a new objectClass that let’s a user ALSO be it’s own UPG, not “create an object that links to the user”. (Or have a client generate the group from user attributes but we shouldn’t shift responsibility to the client.)

Distributed Numeric Assignment is based on the AD rid model, but it’s all about “how can we assign a value to a user that’s unique?”. We already have a way to do this, in the UUID, so why not derive the UID/GID from the UUID. This means there is no complex inter-server communication, pooling, just simple isolated functionality.

We have lots of features that just are a bit complex, and could have been made simpler, that now we have to support, and can’t change to make them better. If we rolled a new “fixed” version, we would then have to support both because projects like FreeIPA aren’t going to just change over.

  • client tools are controlled by others and complex (sssd, openldap)

Every tool for dealing with ldap is really confusing and arcane. They all have wild (unhelpful) defaults, and generally this scares people off. I took months of work to get a working ldap server in the past. Why? It’s 2018, things need to “just work”. Our tools should “just work”. Why should I need to hand edit pam? Why do I need to set weird options in SSSD.conf? All of this makes the whole experience poor.

We are making client tools that can help (to an extent), but they are really limited to system administration and they aren’t “generic” tools for every possible configuration that exists. So at some point people will still find a limit where they have to touch ldap commands. A common request is a simple to use web portal for password resets, which today only really exists in FreeIPA, and that limits it’s application already.

  • hard to change legacy

It’s really hard to make code changes because our surface area is so broad and the many use cases means that we risk breakage every time we do. I have even broken customer deployments like this. It’s almost impossible to get away from, and that holds us back because it means we are scared to make changes because we have to support the 1 million existing work flows. To add another is more support risk.

Many deployments use legacy schema elements that holds us back, ranging from the inet types, schema that enforces a first/last name, schema that won’t express users + groups in a simple away. It’s hard to ask people to just up and migrate their data, and even if we wanted too, ldap allows too much freedom so we are more likely to break data, than migrate it correctly if we tried.

This holds us back from technical changes, and social representation changes. People are more likely to engage with a large migrational change, than an incremental change that disturbs their current workflow (IE moving from on prem to cloud, rather than invest in smaller iterative changes to make their local solutions better).

  • ACI’s are really complex

389’s access controls are good because they are in the tree and replicated, but bad because the syntax is awful, complex, and has lots of traps and complexity. Even I need to look up how to write them when I have to. This is not good for a project that has such deep security concerns, where your ACI’s can look correct but actually expose all your data to risks.

  • LDAP as a protocol is like an 90’s drug experience

LDAP may be the lingua franca of authentication, but it’s complex, hard to use and hard to write implementations for. That’s why in opensource we have a monoculture of using the openldap client libraries because no one can work out how to write a standalone library. Layer on top the complexity of the object and naming model, and we have a situation where no one wants to interact with LDAP and rather keeps it at arm length.

It’s going to be extremely hard to move forward here, because the community is so fragmented and small, and the working groups dispersed that the idea of LDAPv4 is a dream that no one should pursue, even though it’s desperately needed.

  • TLS

TLS is great. NSS databases and tools are not.

  • GSSAPI + SSO

GSSAPI and Kerberos are a piece of legacy that we just can’t escape from. They are a curse almost, and one we need to break away from as it’s completely unusable (even if it what it promises is amazing). We need to do better.

That and SSO allows loads of attacks to proceed, where we actually want isolated token auth with limited access scopes …

What could we offer

  • Web application as a first class consumer.

People want web portals for their clients, and they want to be able to use web applications as the consumer of authentication. The HTTP protocols must be the first class integration point for anything in identity management today. This means using things like OAUTH/OIDC.

  • Systems security as a first class consumer.

Administrators still need to SSH to machines, and people still need their systems to have identities running on them. Having pam/nsswitch modules is a very major requirement, where those modules have to be fast, simple, and work correctly. Users should “imply” a private group, and UID/GID should by dynamic from UUID (or admins can override it).

  • 2FA/u2f/TOTP.

Multi-factor auth is here (not coming, here), and we are behind the game. We already have Apple and MS pushing for webauthn in their devices. We need to be there for these standards to work, and to support the next authentication tool after that.

  • Good RADIUS integration.

RADIUS is not going away, and is important in education providers and business networks, so RADIUS must “just work”. Importantly, this means mschapv2 which is the universal default for all clients to operate with, which means nthash.

However, we can make the nthash unlinked from your normal password, so you can then have wifi password and a seperate loging password. We could even generate an NTHash containing the TOTP token for more high security environments.

  • better data structure (flat, defined by object types).

The tree structure of LDAP is confusing, but a flatter structure is easier to manage and understand. We can use ideas from kubernetes like tags/labels which can be used to provide certain controls and filtering capabilities for searches and access profiles to apply to.

  • structured logging, with in built performance profiling.

Being able to diagnose why an operation is slow is critical and having structured logs with profiling information is key to allowing admins and developers to resolve performance issues at scale. It’s also critical to have auditing of every single change made in the system, including internal changes that occur during operations.

  • access profiles with auditing capability.

Access profiles that express what you can access, and how. Easier to audit, generate, and should be tightly linked to group membership for real RBAC style capabilities.

  • transactions by allowing batch operations.

LDAP wants to provide a transaction system over a set of operations, but that may cause performance issues on write paths. Instead, why not allow submission of batches of changes that all must occur “at the same time” or “none”. This is faster network wise, protocol wise, and simpler for a server to implement.

What’s next then …

Instead of fixing what we have, why not take the best of what we have, and offer something new in parallel? Start a new front end that speaks in an accessible way, that has modern structures, and has learnt from the lessons of the past? We can build it to standalone, or proxy from the robust core of 389 Directory Server allowing migration paths, but eschew the pain of trying to bring people to the modern world. We can offer something unique, an open source identity system that’s easy to use, fast, secure, that you can run on your terms, or in the cloud.

This parallel project seems like a good idea … I wonder what to name it …

Work around docker exec bug

Posted by William Brown on December 08, 2018 02:00 PM

Work around docker exec bug

There is currently a docker exec bug in Centos/RHEL 7 that causes errors such as:

# docker exec -i -t instance /bin/sh
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:110: decoding init error from pipe caused \"read parent: connection reset by peer\""

As a work around you can use nsenter instead:

PID=docker inspect --format {{.State.Pid}} <name of container>
nsenter --target $PID --mount --uts --ipc --net --pid /bin/sh

For more information, you can see the bugreport here.

High Available RADVD on Linux

Posted by William Brown on October 31, 2018 02:00 PM

High Available RADVD on Linux

Recently I was experimenting again with high availability router configurations so that in the cause of an outage or a failover the other router will take over and traffic is still served.

This is usually done through protocols like VRRP to allow virtual ips to exist that can be failed between. However with ipv6 one needs to still allow clients to find the router, and in the cause of a failure, the router advertisments still must continue for client renewals.

To achieve this we need two parts. A shared Link Local address, and a special RADVD configuration.

Because of howe ipv6 routers work, all traffic (even global) is still sent to your link local router. We can use an address like:

fe80::1:1

This doesn’t clash with any reserved or special ipv6 addresses, and it’s easy to remember. Because of how link local works, we can put this on many interfaces of the router (many vlans) with no conflict.

So now to the two components.

Keepalived

Keepalived is a VRRP implementation for linux. It has extensive documentation and sometimes uses some implementation specific language, but it works well for what it does.

Our configuration looks like:

#  /etc/keepalived/keepalived.conf
global_defs {
  vrrp_version 3
}

vrrp_sync_group G1 {
 group {
   ipv6_ens256
 }
}

vrrp_instance ipv6_ens256 {
   interface ens256
   virtual_router_id 62
   priority 50
   advert_int 1.0
   virtual_ipaddress {
    fe80::1:1
    2001:db8::1
   }
   nopreempt
   garp_master_delay 1
}

Note that we provide both a global address and an LL address for the failover. This is important for services and DNS for the router to have the global, but you could omit this. The LL address however is critical to this configuration and must be present.

Now you can start up vrrp, and you should see one of your two linux machines pick up the address.

RADVD

For RADVD to work, a feature of the 2.x series is required. Packaging this for el7 is out of scope of this post, but fedora ships the version required.

The feature is that RADVD can be configured to specify which address it advertises for the router, rather than assuming the interface LL autoconf address is the address to advertise. The configuration appears as:

# /etc/radvd.conf
interface ens256
{
    AdvSendAdvert on;
    MinRtrAdvInterval 30;
    MaxRtrAdvInterval 100;
    AdvRASrcAddress {
        fe80::1:1;
    };
    prefix 2001:db8::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
        AdvRouterAddr off;
    };
};

Note the AdvRASrcAddress parameter? This defines a priority list of address to advertise that could be available on the interface.

Now start up radvd on your two routers, and try failing over between them while you ping from your client. Remember to ping LL from a client you need something like:

ping6 fe80::1:1%en1

Where the outgoing interface of your client traffic is denoted after the ‘%’.

Happy failover routing!

Rust RwLock and Mutex Performance Oddities

Posted by William Brown on October 18, 2018 02:00 PM

Rust RwLock and Mutex Performance Oddities

Recently I have been working on Rust datastructures once again. In the process I wanted to test how my work performed compared to a standard library RwLock and Mutex. On my home laptop the RwLock was 5 times faster, the Mutex 2 times faster than my work.

So checking out my code on my workplace workstation and running my bench marks I noticed the Mutex was the same - 2 times faster. However, the RwLock was 4000 times slower.

What’s a RwLock and Mutex anyway?

In a multithreaded application, it’s important that data that needs to be shared between threads is consistent when accessed. This consistency is not just logical consistency of the data, but affects hardware consistency of the memory in cache. As a simple example, let’s examine an update to a bank account done by two threads:

acc = 10
deposit = 3
withdrawl = 5

[ Thread A ]            [ Thread B ]
acc = load_balance()    acc = load_balance()
acc = acc + deposit     acc = acc - withdrawl
store_balance(acc)      store_balance(acc)

What will the account balance be at the end? The answer is “it depends”. Because threads are working in parallel these operations could happen:

  • At the same time
  • Interleaved (various possibilities)
  • Sequentially

This isn’t very healthy for our bank account. We could lose our deposit, or have invalid data. Valid outcomes at the end are that acc could be 13, 5, 8. Only one of these is correct.

A mutex protects our data in multiple ways. It provides hardware consistency operations so that our cpus cache state is valid. It also allows only a single thread inside of the mutex at a time so we can linearise operations. Mutex comes from the word “Mutual Exclusion” after all.

So our example with a mutex now becomes:

acc = 10
deposit = 3
withdrawl = 5

[ Thread A ]            [ Thread B ]
mutex.lock()            mutex.lock()
acc = load_balance()    acc = load_balance()
acc = acc + deposit     acc = acc - withdrawl
store_balance(acc)      store_balance(acc)
mutex.unlock()          mutex.unlock()

Now only one thread will access our account at a time: The other thread will block until the mutex is released.

A RwLock is a special extension to this pattern. Where a mutex guarantees single access to the data in a read and write form, a RwLock (Read Write Lock) allows multiple read-only views OR single read and writer access. Importantly when a writer wants to access the lock, all readers must complete their work and “drain”. Once the write is complete readers can begin again. So you can imagine it as:

Time ->

T1: -- read --> x
T3:     -- read --> x                x -- read -->
T3:     -- read --> x                x -- read -->
T4:                   | -- write -- |
T5:                                  x -- read -->

Test Case for the RwLock

My test case is simple. Given a set of 12 threads, we spawn:

  • 8 readers. Take a read lock, read the value, release the read lock. If the value == target then stop the thread.
  • 4 writers. Take a write lock, read the value. Add one and write. Continue until value == target then stop.

Other conditions:

  • The test code is identical between Mutex/RwLock (beside the locking costruct)
  • –release is used for compiler optimisations
  • The test hardware is as close as possible (i7 quad core)
  • The tests are run multiple time to construct averages of the performance

The idea being that X target number of writes must occur, while many readers contend as fast as possible on the read. We are pressuring the system of choice between “many readers getting to read fast” or “writers getting priority to drain/block readers”.

On OSX given a target of 500 writes, this was able to complete in 0.01 second for the RwLock. (MBP 2011, 2.8GHz)

On Linux given a target of 500 writes, this completed in 42 seconds. This is a 4000 times difference. (i7-7700 CPU @ 3.60GHz)

All things considered the Linux machine should have an advantage - it’s a desktop processor, of a newer generation, and much faster clock speed. So why is the RwLock performance so much different on Linux?

To the source code!

Examining the Rust source code , many OS primitives come from libc. This is because they require OS support to function. RwLock is an example of this as is mutex and many more. The unix implementation for Rust consumes the pthread_rwlock primitive. This means we need to read man pages to understand the details of each.

OSX uses FreeBSD userland components, so we can assume they follow the BSD man pages. In the FreeBSD man page for pthread_rwlock_rdlock we see:

IMPLEMENTATION NOTES

 To prevent writer starvation, writers are favored over readers.

Linux however, uses different constructs. Looking at the Linux man page:

PTHREAD_RWLOCK_PREFER_READER_NP
  This is the default.  A thread may hold multiple read locks;
  that is, read locks are recursive.  According to The Single
  Unix Specification, the behavior is unspecified when a reader
  tries to place a lock, and there is no write lock but writers
  are waiting.  Giving preference to the reader, as is set by
  PTHREAD_RWLOCK_PREFER_READER_NP, implies that the reader will
  receive the requested lock, even if a writer is waiting.  As
  long as there are readers, the writer will be starved.

Reader vs Writer Preferences?

Due to the policy of a RwLock having multiple readers OR a single writer, a preference is given to one or the other. The preference basically boils down to the choice of:

  • Do you respond to write requests and have new readers block?
  • Do you favour readers but let writers block until reads are complete?

The difference is that on a read heavy workload, a write will continue to be delayed so that readers can begin and complete (up until some threshold of time). However, on a writer focused workload, you allow readers to stall so that writes can complete sooner.

On Linux, they choose a reader preference. On OSX/BSD they choose a writer preference.

Because our test is about how fast can a target of write operations complete, the writer preference of BSD/OSX causes this test to be much faster. Our readers still “read” but are giving way to writers, which completes our test sooner.

However, the linux “reader favour” policy means that our readers (designed for creating conteniton) are allowed to skip the queue and block writers. This causes our writers to starve. Because the test is only concerned with writer completion, the result is (correctly) showing our writers are heavily delayed - even though many more readers are completing.

If we were to track the number of reads that completed, I am sure we would see a large factor of difference where Linux has allow many more readers to complete than the OSX version.

Linux pthread_rwlock does allow you to change this policy (PTHREAD_RWLOCK_PREFER_WRITER_NP) but this isn’t exposed via Rust. This means today, you accept (and trust) the OS default. Rust is just unaware at compile time and run time that such a different policy exists.

Conclusion

Rust like any language consumes operating system primitives. Every OS implements these differently and these differences in OS policy can cause real performance differences in applications between development and production.

It’s well worth understanding the constructions used in programming languages and how they affect the performance of your application - and the decisions behind those tradeoffs.

This isn’t meant to say “don’t use RwLock in Rust on Linux”. This is meant to say “choose it when it makes sense - on read heavy loads, understanding writers will delay”. For my project (A copy on write cell) I will likely conditionally compile rwlock on osx, but mutex on linux as I require a writer favoured behaviour. There are certainly applications that will benefit from the reader priority in linux (especially if there is low writer volume and low penalty to delayed writes).

Photography - Why You Should Use JPG (not RAW)

Posted by William Brown on August 05, 2018 02:00 PM

Photography - Why You Should Use JPG (not RAW)

When I started my modern journey into photography, I simply shot in JPG. I was happy with the results, and the images I was able to produce. It was only later that I was introduced to a now good friend and he said: “You should always shoot RAW! You can edit so much more if you do.”. It’s not hard to find many ‘beginner’ videos all touting the value of RAW for post editing, and how it’s the step from beginner to serious photographer (and editor).

Today, I would like to explore why I have turned off RAW on my camera bodies for good. This is a deeply personal decision, and I hope that my experience helps you to think about your own creative choices. If you want to stay shooting RAW and editing - good on you. If this encourages you to try turning back to JPG - good on you too.

There are two primary reasons for why I turned off RAW:

  • Colour reproduction of in body JPG is better to the eye today.
  • Photography is about composing an image from what you have infront of you.

Colour is about experts (and detail)

I have always been unhappy with the colour output of my editing software when processing RAW images. As someone who is colour blind I did not know if it was just my perception, or if real issues existed. No one else complained so it must just be me right!

Eventually I stumbled on an article about how to develop real colour and extract camera film simulations for my editor. I was interested in both the ability to get true reflections of colour in my images, but also to use the film simulations in post (the black and white of my camera body is beautiful and soft, but my editor is harsh).

I spent a solid week testing and profiling both of my cameras. I quickly realised a great deal about what was occuring in my editor, but also my camera body.

The editor I have, is attempting to generalise over the entire set of sensors that a manufacturer has created. They are also attempting to create a true colour output profile, that is as reflective of reality as possible. So when I was exporting RAWs to JPG, I was seeing the differences between what my camera hardware is, vs the editors profiles. (This was particularly bad on my older body, so I suspect the RAW profiles are designed for the newer sensor).

I then created film simulations and quickly noticed the subtle changes. Blacks were blacker, but retained more fine detail with the simulation. Skin tone was softer. Exposure was more even across a variety of image types. How? RAW and my editor is meant to create the best image possible? Why is a film-simulation I have “extracted” creating better images?

As any good engineer would do I created sample images. A/B testing. I would provide the RAW processed by my editor, and a RAW processed with my film simulation. I would vary the left/right of the image, exposure, subject, and more. After about 10 tests across 5 people, only on one occasion did someone prefer the RAW from my editor.

At this point I realised that my camera manufacturer is hiring experts who build, live and breath colour technology. They have tested and examined everything about the body I have, and likely calibrated it individually in the process to make produce exact reproductions as they see in a lab. They are developing colour profiles that are not just broadly applicable, but also pleasing to look at (even if not accurate reproductions).

So how can my film simulations I extracted and built in a week, measure up to the experts? I decided to find out. I shot test images in JPG and RAW and began to provide A/B/C tests to people.

If the editor RAW was washed out compared to the RAW with my film simulation, the JPG from the body made them pale in comparison. Every detail was better, across a range of conditions. The features in my camera body are better than my editor. Noise reduction, dynamic range, sharpening, softening, colour saturation. I was holding in my hands a device that has thousands of hours of expert design, that could eclipse anything I built on a weekend for fun to achieve the same.

It was then I came to think about and realise …

Composition (and effects) is about you

Photography is a complex skill. It’s not having a fancy camera and just clicking the shutter, or zooming in. Photography is about taking that camera and putting it in a position to take a well composed image based on many rules (and exceptions) that I am still continually learning.

When you stop to look at an image you should always think “how can I compose the best image possible?”.

So why shoot in RAW? RAW is all about enabling editing in post. After you have already composed and taken the image. There are valid times and useful functions of editing. For example whitebalance correction and minor cropping in some cases. Both of these are easily conducted with JPG with no loss in quality compared to the RAW. I still commonly do both of these.

However RAW allows you to recover mistakes during composition (to a point). For example, the powerful base-curve fusion module allows dynamic range “after the fact”. You may even add high or low pass filters, or mask areas to filter and affect the colour to make things pop, or want that RAW data to make your vibrance control as perfect as possible. You may change the perspective, or even add filters and more. Maybe you want to optimise de-noise to make smooth high ISO images. There are so many options!

But all these things are you composing after. Today, many of these functions are in your camera - and better performing. So while I’m composing I can enable dynamic range for the darker elements of the frame. I can compose and add my colour saturation (or remove it). I can sharpen, soften. I can move my own body to change perspective. All at the time I am building the image in my mind, as I compose, I am able to decide on the creative effects I want to place in that image. I’m not longer just composing within a frame, but a canvas of potential effects.

To me this was an important distinction. I always found I was editing poorly-composed images in an attempt to “fix” them to something acceptable. Instead I should have been looking at how to compose them from the start to be great, using the tool in my hand - my camera.

Really, this is a decision that is yours. Do you spend more time now to make the image you want? Or do you spend it later editing to achieve what you want?

Conclusion

Photography is a creative process. You will have your own ideas of how that process should look, and how you want to work with it. Great! This was my experience and how I have arrived at a creative process that I am satisfied with. I hope that it provides you an alternate perspective to the generally accepted “RAW is imperative” line that many people advertise.

GSoC Fedora Happines Packets Update – Week 8th and 9th

Posted by Algogator on July 19, 2018 05:15 AM

The talk got accepted a month ago (https://pagure.io/flock/issue/55) but I didn’t know if I’d be able to attend it. I couldn’t find an appointment at the Houston consulate for July, the entire month was booked but luckily someone cancelled last week so I made a quick trip to Houston. And I got my passport back […]

The post GSoC Fedora Happines Packets Update – Week 8th and 9th appeared first on Anna Philips.

fedmsg on CentOS

Posted by Algogator on June 27, 2018 06:46 PM

Zeromq fedmsg is “a library built on ZeroMQ using the PyZMQ Python bindings”. So I thought it might help to learn a little bit more about ZeroMQ ( which is not a messaging queue). Contexts – You usually have one context per process and which manages the sockets. Socket – It can be configured and […]

The post fedmsg on CentOS appeared first on Anna Philips.

Fedora Happiness Packets on CentOS 7

Posted by Algogator on June 15, 2018 05:55 PM

I’m writing this for future me. My default OS is Ubuntu so setting up the project on CentOS was new for me. gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong –param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong –param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC […]

The post Fedora Happiness Packets on CentOS 7 appeared first on Anna Philips.

GSoC update Week 1 and 2

Posted by Algogator on June 12, 2018 07:03 PM

Week 1: Setup a dev instance of Fedora Happiness Packets on CentOS. Took some time to install all the dependencies. It was so much easier on Ubuntu. Get a dev instance from the Infra team https://pagure.io/fedora-infrastructure/issue/6690 Created my wiki user page over here Started reading and tweaking with mozilla-django-oidc to authenticate with FAS Week 2 […]

The post GSoC update Week 1 and 2 appeared first on Anna Philips.

OpenID Connect and Authenticating against Ipsilon

Posted by Algogator on June 08, 2018 02:20 PM

Step 1 – Get your credentials You need to register your app with the OP https://iddev.fedorainfracloud.org/ I used https://github.com/puiterwijk/oidc-register for that. pip install oidc-register To register you need the provider, application, and redirect(callback) URLs oidc-register https://iddev.fedorainfracloud.org/ https://127.0.0.1:8443 https://127.0.0.1:8443/oidc/callback/ Note: You can’t use http unless it’s with a localhost.  After you do that you should have […]

The post OpenID Connect and Authenticating against Ipsilon appeared first on Anna Philips.