Fedora summer-coding Planet

GSoC 2018: Kicking off the Coding

Posted by Amitosh Swain Mahapatra on May 14, 2018 03:55 PM

It’s May 14, and this is when we officially start coding for GSoC, 2018 edition. This time, I would be working on improving the Fedora Community App with the Fedora project. This marks the beginning of a journey of 3 months of coding, patching, debugging, git (mess) and the awesome discussions with my mentors and the community.

The Fedora App is a central location for Fedora users and innovators to stay updated on The Fedora Project. News updates, social posts, Ask Fedora, as well as articles from Fedora Magazine are all held under this app.

For the first month of my GSoC coding, I will be working of improving the code quality by completing the TypeScript conversion I started earlier as well as I will also be working on bring offline capabilities to Fedora Magazine reader and the Fedora Social.

Why TypeScript?

TypeScript (TS) is a super-set of JavaScript (JS) that brings optional static typing features to JS. TS is the language used by Ionic - the framework on which the Fedora Community App is built. I personally believe, in a medium to large scale project, a static type system is necessary - it especially helps to catch errors very early and allows to perform safe refactors, but I also agree that there are cases where it’s useful to fallback to dynamic typing.

TS does just this. It is implemented as a syntax extension to JavaScript which is transpiled into JS using the TS compiler. Every valid JS syntax is also valid TS syntax, so it becomes easier to port a project - even partially to use TS features, without incurring the cost of a full project rewrite.

In my earlier work in updating the source code to the latest version of Ionic, I started a conversion from JS to TS. Essential parts are already in place, but still there is a big chunk left for conversion. In my first week of coding, I will be working on this conversion.

Offline Capabilities in Fedora App

Currently, the app lacks any offline capabilities - you always require an internet connection to perform most of the actions.

My work will be to bring offline storage and sync to Fedora Magazine and Fedora Social sections of the app. This will both improve the apps usability and performance. From a UX perspective, we will start syncing data rather than blocking the user from doing anything (as it is currently). The user can continue to act on cached items, while we continue to fetch new items in the background.

Further in second week, I will be working on implementing a lightweight blog reader and caching complete offline copies of user-selected articles.

Summer, Code and Fedora

Posted by Amitosh Swain Mahapatra on April 29, 2018 10:41 AM

I have been selected for Google Summer of Code (GSoC) 2018. I’ll be working on developing the backend for the Fedora Community App.

Google Summer of Code

Fedora Project

Fedora has an android app which lets a user browse Fedora Magazine, Ask Fedora, FedoCal etc within it. This app is build using the Ionic Framework, Angular and Cordova. Essentially it is a cross-platform hybrid app.

Project Description

In the current form, most of the functions in the app rely on an in-app browser to render content. This project aims to improve the existing Fedora App for Android for speed, utility and responsiveness, introduce a deeper native integration and make the app more personal for the user.

During the GSoC period I aim for the following deliverables:

  1. Refactored, and restructured code for the android app, providing native experience to the fullest.
  2. Deeper integration with system as well as various Fedora Infra apps.
  3. An Ionic hybrid app that is publishable to app stores like Google Play Store and F-Droid
  4. Providing an offline first experience – caching data from Fedora Magazine and Social and optionally, making some of the content accessible offline.

What would I be working on

The Fedora android app is developed using the Ionic framework and Cordova. It is a framework for developing cross-platform mobile apps, also called as hybrid apps using HTML, CSS and JavaScript.

Here are some of the features I plan to integrate this summer:

  1. Offine data for posts and calendar.
  2. Syncing system calendar with events from FedoCal.
  3. Fedora package search.
  4. Bookmarks and offline reading.
  5. FMN notifications.

About Google Summer of Code

Google Summer of Code (GSoC) is a yearly program by Google to help the open source communities to reach out to student contributors. Organisations pitch projects, and when selected, pick up university students to work on these projects or their own ideas related to the organisation’s project(s).

This is my second GSoC participation. Last summer (2017), I worked with the FOSSi Foundation for bringing code quality metrics for projects listed under LibreCores.org.

While, the summer this time is really really hot (I mean 40°C/104°F in April!), it is going to be interesting!

AD directory admins group setup

Posted by William Brown on April 25, 2018 02:00 PM

AD directory admins group setup

Recently I have been reading many of the Microsoft Active Directory best practices for security and hardening. These are great resources, and very well written. The major theme of the articles is “least privilege”, where accounts like Administrators or Domain Admins are over used and lead to further compromise.

A suggestion that is put forward by the author is to have a group that has no other permissions but to manage the directory service. This should be used to temporarily make a user an admin, then after a period of time they should be removed from the group.

This way you have no Administrators or Domain Admins, but you have an AD only group that can temporarily grant these permissions when required.

I want to explore how to create this and configure the correct access controls to enable this scheme.

Create our group

First, lets create a “Directory Admins” group which will contain our members that have the rights to modify or grant other privileges.

# /usr/local/samba/bin/samba-tool group add 'Directory Admins'
Added group Directory Admins

It’s a really good idea to add this to the “Denied RODC Password Replication Group” to limit the risk of these accounts being compromised during an attack. Additionally, you probably want to make your “admin storage” group also a member of this, but I’ll leave that to you.

# /usr/local/samba/bin/samba-tool group addmembers "Denied RODC Password Replication Group" "Directory Admins"

Now that we have this, lets add a member to it. I strongly advise you create special accounts just for the purpose of directory administration - don’t use your daily account for this!

# /usr/local/samba/bin/samba-tool user create da_william
User 'da_william' created successfully
# /usr/local/samba/bin/samba-tool group addmembers 'Directory Admins' da_william
Added members to group Directory Admins

Configure the permissions

Now we need to configure the correct dsacls to allow Directory Admins full control over directory objects. It could be possible to constrain this to only modification of the cn=builtin and cn=users container however, as directory admins might not need so much control for things like dns modification.

If you want to constrain these permissions, only apply the following to cn=builtins instead - or even just the target groups like Domain Admins.

First we need the objectSID of our Directory Admins group so we can build the ACE.

# /usr/local/samba/bin/samba-tool group show 'directory admins' --attributes=cn,objectsid
dn: CN=Directory Admins,CN=Users,DC=adt,DC=blackhats,DC=net,DC=au
cn: Directory Admins
objectSid: S-1-5-21-2488910578-3334016764-1009705076-1104

Now with this we can construct the ACE.


This permission grants:

  • RP: read property
  • WP: write property
  • LC: list child objects
  • LO: list objects
  • RC: read control

It could be possible to expand these rights: it depends if you want directory admins to be able to do “day to day” ad control jobs, or if you just use them for granting of privileges. That’s up to you. An expanded ACE might be:

# Same as Enterprise Admins

Now lets actually apply this and do a test:

# /usr/local/samba/bin/samba-tool dsacl set --sddl='(A;CI;RPWPLCLORC;;;S-1-5-21-2488910578-3334016764-1009705076-1104)' --objectdn='dc=adt,dc=blackhats,dc=net,dc=au'
# /usr/local/samba/bin/samba-tool group addmembers 'directory admins' administrator -U 'da_william%...'
Added members to group directory admins
# /usr/local/samba/bin/samba-tool group listmembers 'directory admins' -U 'da_william%...'
# /usr/local/samba/bin/samba-tool group removemembers 'directory admins' -U 'da_william%...'
Removed members from group directory admins
# /usr/local/samba/bin/samba-tool group listmembers 'directory admins' -U 'da_william%...'

It works!


With these steps we have created a secure account that has limited admin rights, able to temporarily promote users with privileges for administrative work - and able to remove it once the work is complete.

Understanding AD Access Control Entries

Posted by William Brown on April 19, 2018 02:00 PM

Understanding AD Access Control Entries

A few days ago I set out to work on making samba 4 my default LDAP server. In the process I was forced to learn about Active Directory Access controls. I found that while there was significant documentation around the syntax of these structures, very little existed explaining how to use them effectively.

What’s in an ACE?

If you look at the the ACL of an entry in AD you’ll see something like:


This seems very confusing and complex (and someone should write a tool to explain these … maybe me). But once you can see the structure it starts to make sense.

Most of the access controls you are viewing here are DACLs or Discrestionary Access Control Lists. These make up the majority of the output after ‘O:DAG:DAD:AI’. TODO: What does ‘O:DAG:DAD:AI’ mean completely?

After that there are many ACEs defined in SDDL or ???. The structure is as follows:


Each of these fields can take varies types. These interact to form the access control rules that allow or deny access. Thankfully, you don’t need to adjust many fields to make useful ACE entries.

MS maintains a document of these field values here.

They also maintain a list of wellknown SID values here

I want to cover some common values you may see though:


Most of the types you’ll see are “A” and “OA”. These mean the ACE allows an access by the SID.


These change the behaviour of the ACE. Common values you may want to set are CI and OI. These determine that the ACE should be inherited to child objects. As far as the MS docs say, these behave the same way.

If you see ID in this field it means the ACE has been inherited from a parent object. In this case the inherit_object_guid field will be set to the guid of the parent that set the ACE. This is great, as it allows you to backtrace the origin of access controls!


This is the important part of the ACE - it determines what access the SID has over this object. The MS docs are very comprehensive of what this does, but common values are:

  • RP: read property
  • WP: write property
  • CR: control rights
  • CC: child create (create new objects)
  • DC: delete child
  • LC: list child objects
  • LO: list objects
  • RC: read control
  • WO: write owner (change the owner of an object)
  • WD: write dac (allow writing ACE)
  • SW: self write
  • SD: standard delete
  • DT: delete tree

I’m not 100% sure of all the subtle behaviours of these, because they are not documented that well. If someone can help explain these to me, it would be great.


We will skip some fields and go straight to SID. This is the SID of the object that is allowed the rights from the rights field. This field can take a GUID of the object, or it can take a “well known” value of the SID. For example ‘AN’ means “anonymous users”, or ‘AU’ meaning authenticated users.


I won’t claim to be an AD ACE expert, but I did find the docs hard to interpret at first. Having a breakdown and explanation of the behaviour of the fields can help others, and I really want to hear from people who know more about this topic on me so that I can expand this resource to help others really understand how AD ACE’s work.

Making Samba 4 the default LDAP server

Posted by William Brown on April 17, 2018 02:00 PM

Making Samba 4 the default LDAP server

Earlier this year Andrew Bartlett set me the challenge: how could we make Samba 4 the default LDAP server in use for Linux and UNIX systems? I’ve finally decided to tackle this, and write up some simple changes we can make, and decide on some long term goals to make this a reality.

What makes a unix directory anyway?

Great question - this is such a broad topic, even I don’t know if I can single out what it means. For the purposes of this exercise I’ll treat it as “what would we need from my previous workplace”. My previous workplace had a dedicated set of 389 Directory Server machines that served lookups mainly for email routing, application authentication and more. The didn’t really process a great deal of login traffic as the majority of the workstations were Windows - thus connected to AD.

What it did show was that Linux clients and applications:

  • Want to use anonymous binds and searchs - Applications and clients are NOT domain members - they just want to do searches
  • The content of anonymous lookups should be “public safe” information. (IE nothing private)
  • LDAPS is a must for binds
  • MemberOf and group filtering is very important for access control
  • sshPublicKey and userCertificate;binary is important for 2fa/secure logins

This seems like a pretty simple list - but it’s not the model Samba 4 or AD ship with.

You’ll also want to harden a few default settings. These include:

  • Disable Guest
  • Disable 10 machine join policy

AD works under the assumption that all clients are authenticated via kerberos, and that kerberos is the primary authentication and trust provider. As a result, AD often ships with:

  • Disabled anonymous binds - All clients are domain members or service accounts
  • No anonymous content available to search
  • No LDAPS (GSSAPI is used instead)
  • no sshPublicKey or userCertificates (pkinit instead via krb)
  • Access control is much more complex topic than just “matching an ldap filter”.

As a result, it takes a bit of effort to change Samba 4 to work in a way that suits both, securely.

Isn’t anonymous binding insecure?

Let’s get this one out the way - no it’s not. In every pen test I have seen if you can get access to a domain joined machine, you probably have a good chance of taking over the domain in various ways. Domain joined systems and krb allows lateral movement and other issues that are beyond the scope of this document.

The lack of anonymous lookup is more about preventing information disclosure - security via obscurity. But it doesn’t take long to realise that this is trivially defeated (get one user account, guest account, domain member and you can search …).

As a result, in some cases it may be better to allow anonymous lookups because then you don’t have spurious service accounts, you have a clear understanding of what is and is not accessible as readable data, and you don’t need every machine on the network to be domain joined - you prevent a possible foothold of lateral movement.

So anonymous binding is just fine, as the unix world has shown for a long time. That’s why I have very few concerns about enabling it. Your safety is in the access controls for searches, not in blocking anonymous reads outright.

Installing your DC

As I run fedora, you will need to build and install samba for source so you can access the heimdal kerberos functions. Fedora’s samba 4 ships ADDC support now, but lacks some features like RODC that you may want. In the future I expect this will change though.

These documents will help guide you:


build steps

install a domain

I strongly advise you use options similar to:

/usr/local/samba/bin/samba-tool domain provision --server-role=dc --use-rfc2307 --dns-backend=SAMBA_INTERNAL --realm=SAMDOM.EXAMPLE.COM --domain=SAMDOM --adminpass=Passw0rd

Allow anonymous binds and searches

Now that you have a working domain controller, we should test you have working ldap:

/usr/local/samba/bin/samba-tool forest directory_service dsheuristics 0000002 -H ldaps://localhost --simple-bind-dn='administrator@samdom.example.com'
ldapsearch -b DC=samdom,DC=example,DC=com -H ldaps://localhost -x

You can see the domain object but nothing else. Many other blogs and sites recommend a blanket “anonymous read all” access control, but I think that’s too broad. A better approach is to add the anonymous read to only the few containers that require it.

/usr/local/samba/bin/samba-tool dsacl set --objectdn=DC=samdom,DC=example,DC=com --sddl='(A;;RPLCLORC;;;AN)' --simple-bind-dn="administrator@samdom.example.com" --password=Passw0rd
/usr/local/samba/bin/samba-tool dsacl set --objectdn=CN=Users,DC=samdom,DC=example,DC=com --sddl='(A;CI;RPLCLORC;;;AN)' --simple-bind-dn="administrator@samdom.example.com" --password=Passw0rd
/usr/local/samba/bin/samba-tool dsacl set --objectdn=CN=Builtin,DC=samdom,DC=example,DC=com --sddl='(A;CI;RPLCLORC;;;AN)' --simple-bind-dn="administrator@samdom.example.com" --password=Passw0rd

In AD groups and users are found in cn=users, and some groups are in cn=builtin. So we allow read to the root domain object, then we set a read on cn=users and cn=builtin that inherits to it’s child objects. The attribute policies are derived elsewhere, so we can assume that things like kerberos data and password material are safe with these simple changes.

Configuring LDAPS

This is a reasonable simple exercise. Given a ca cert, key and cert we can place these in the correct locations samba expects. By default this is the private directory. In a custom install, that’s /usr/local/samba/private/tls/, but for distros I think it’s /var/lib/samba/private. Simply replace ca.pem, cert.pem and key.pem with your files and restart.

Adding schema

To allow adding schema to samba 4 you need to reconfigure the dsdb config on the schema master. To show the current schema master you can use:

/usr/local/samba/bin/samba-tool fsmo show -H ldaps://localhost --simple-bind-dn='administrator@adt.blackhats.net.au' --password=Password1

Look for the value:

SchemaMasterRole owner: CN=NTDS Settings,CN=LDAPKDC,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au

And note the CN=ldapkdc = that’s the hostname of the current schema master.

On the schema master we need to adjust the smb.conf. The change you need to make is:

    dsdb:schema update allowed = yes

Now restart the instance and we can update the schema. The following LDIF should work if you replace ${DOMAINDN} with your namingContext. You can apply it with ldapmodify

dn: CN=sshPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: attributeSchema
cn: sshPublicKey
name: sshPublicKey
lDAPDisplayName: sshPublicKey
description: MANDATORY: OpenSSH Public key
oMSyntax: 4
isSingleValued: FALSE
searchFlags: 8

dn: CN=ldapPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: classSchema
cn: ldapPublicKey
name: ldapPublicKey
description: MANDATORY: OpenSSH LPK objectclass
lDAPDisplayName: ldapPublicKey
subClassOf: top
objectClassCategory: 3
defaultObjectCategory: CN=ldapPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
mayContain: sshPublicKey

dn: CN=User,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: modify
replace: auxiliaryClass
auxiliaryClass: ldapPublicKey
auxiliaryClass: posixAccount
auxiliaryClass: shadowAccount
sudo ldapmodify -f sshpubkey.ldif -D 'administrator@adt.blackhats.net.au' -w Password1 -H ldaps://localhost
adding new entry "CN=sshPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au"

adding new entry "CN=ldapPublicKey,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au"

modifying entry "CN=User,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au"

To my surprise, userCertificate already exists! The reason I missed it is a subtle ad schema behaviour I missed. The ldap attribute name is stored in the lDAPDisplayName and may not be the same as the CN of the schema element. As a result, you can find this with:

ldapsearch -H ldaps://localhost -b CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au -x -D 'administrator@adt.blackhats.net.au' -W '(attributeId='

This doesn’t solve my issues: Because I am a long time user of 389-ds, that means I need some ns compat attributes. Here I add the nsUniqueId value so that I can keep some compatability.

dn: CN=nsUniqueId,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: attributeSchema
attributeID: 2.16.840.1.113730.3.1.542
cn: nsUniqueId
name: nsUniqueId
lDAPDisplayName: nsUniqueId
description: MANDATORY: nsUniqueId compatability
oMSyntax: 4
isSingleValued: TRUE
searchFlags: 9

dn: CN=nsOrgPerson,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: add
objectClass: top
objectClass: classSchema
governsID: 2.16.840.1.113730.3.2.334
cn: nsOrgPerson
name: nsOrgPerson
description: MANDATORY: Netscape DS compat person
lDAPDisplayName: nsOrgPerson
subClassOf: top
objectClassCategory: 3
defaultObjectCategory: CN=nsOrgPerson,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
mayContain: nsUniqueId

dn: CN=User,CN=Schema,CN=Configuration,DC=adt,DC=blackhats,DC=net,DC=au
changetype: modify
replace: auxiliaryClass
auxiliaryClass: ldapPublicKey
auxiliaryClass: posixAccount
auxiliaryClass: shadowAccount
auxiliaryClass: nsOrgPerson

Now with this you can extend your users with the required data for SSH, certificates and maybe 389-ds compatability.

/usr/local/samba/bin/samba-tool user edit william  -H ldaps://localhost --simple-bind-dn='administrator@adt.blackhats.net.au'


Out of the box a number of the unix attributes are not indexed by Active Directory. To fix this you need to update the search flags in the schema.

Again, temporarily allow changes:

    dsdb:schema update allowed = yes

Now we need to add some indexes for common types. Note that in the nsUniqueId schema I already added the search flags. We also want to set that these values should be preserved if they become tombstones so we can recove them.

/usr/local/samba/bin/samba-tool schema attribute modify uid --searchflags=9
/usr/local/samba/bin/samba-tool schema attribute modify nsUniqueId --searchflags=9
/usr/local/samba/bin/samba-tool schema attribute modify uidnumber --searchflags=9
/usr/local/samba/bin/samba-tool schema attribute modify gidnumber --searchflags=9
# Preserve on tombstone but don't index
/usr/local/samba/bin/samba-tool schema attribute modify x509-cert --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify sshPublicKey --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify gecos --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify loginShell --searchflags=8
/usr/local/samba/bin/samba-tool schema attribute modify home-directory --searchflags=24

AD Hardening

We want to harden a few default settings that could be considered insecure. First, let’s stop “any user from being able to domain join machines”.

/usr/local/samba/bin/samba-tool domain settings account_machine_join_quota 0 -H ldaps://localhost --simple-bind-dn='administrator@adt.blackhats.net.au'

Now let’s disable the Guest account

/usr/local/samba/bin/samba-tool user disable Guest -H ldaps://localhost --simple-bind-dn='administrator@adt.blackhats.net.au'

I plan to write a more complete samba-tool extension for auditing these and more options, so stay tuned!

SSSD configuration

Now that our directory service is configured, we need to configure our clients to utilise it correctly.

Here is my SSSD configuration, that supports sshPublicKey distribution, userCertificate authentication on workstations and SID -> uid mapping. In the future I want to explore sudo rules in LDAP with AD, and maybe even HBAC rules rather than GPO.

Please refer to my other blog posts on configuration of the userCertificates and sshKey distribution.

ignore_group_members = False

# There is a bug in SSSD where this actually means "ipv6 only".
# lookup_family_order=ipv6_first
cache_credentials = True
id_provider = ldap
auth_provider = ldap
access_provider = ldap
chpass_provider = ldap
ldap_search_base = dc=blackhats,dc=net,dc=au

# This prevents an infinite referral loop.
ldap_referrals = False
ldap_id_mapping = True
ldap_schema = ad
# Rather that being in domain users group, create a user private group
# automatically on login.
# This is very important as a security setting on unix!!!
# See this bug if it doesn't work correctly.
# https://pagure.io/SSSD/sssd/issue/3723
auto_private_groups = true

ldap_uri = ldaps://ad.blackhats.net.au
ldap_tls_reqcert = demand
ldap_tls_cacert = /etc/pki/tls/certs/bh_ldap.crt

# Workstation access
ldap_access_filter = (memberOf=CN=Workstation Operators,CN=Users,DC=blackhats,DC=net,DC=au)

ldap_user_member_of = memberof
ldap_user_gecos = cn
ldap_user_uuid = objectGUID
ldap_group_uuid = objectGUID
# This is really important as it allows SSSD to respect nsAccountLock
ldap_account_expire_policy = ad
ldap_access_order = filter, expire
# Setup for ssh keys
ldap_user_ssh_public_key = sshPublicKey
# This does not require ;binary tag with AD.
ldap_user_certificate = userCertificate
# This is required for the homeDirectory to be looked up in the sssd schema
ldap_user_home_directory = homeDirectory

services = nss, pam, ssh, sudo
config_file_version = 2
certificate_verification = no_verification

domains = blackhats.net.au
homedir_substring = /home

pam_cert_auth = True







With these simple changes we can easily make samba 4 able to perform the roles of other unix focused LDAP servers. This allows stateless clients, secure ssh key authentication, certificate authentication and more.

Some future goals to improve this include:

  • CLI tools to manage sshPublicKeys and userCertificates easily
  • Ship samba 4 with schema templates that can be used
  • Schema querying (what objectclass takes this attribute?)
  • Group editing (same as samba-tool user edit)
  • Security auditing tools
  • user/group modification commands
  • Refactor and improve the cli tools python to be api driven - move the logic from netcmd into samdb so that samdb can be an API that python can consume easier. Prevent duplication of logic.

The goal is so that an admin never has to see an LDIF ever again.

Into The Unknown - My Departure from RedHat

Posted by Mo Morsi on April 16, 2018 04:05 PM

In May 2006, a young starry eyed intern walked into the large corporate lobby of RedHat's Centential Campus in Raleigh, NC, to begin what would be a 12 year journey full of ups and downs, break-throughs and setbacks, and many many memories. Flash forward to April 2018, when the "intern-turned-hardend-software-enginner" filed his resignation and ended his tenure at RedHat to venture into the risky but exciting world of self-employment / entrepreneurship... Incase you were wondering that former-intern / Software Engineer is myself, and after nearly 12 years at RedHat, I finished my last day of employment on Friday April 13th, 2018.

Overall RedHat has been a great experience, I was able to work on many ground-breaking products and technologies, with many very talented individuals from across the spectrum and globe, in a manner that facilitated maximum professional and personal growth. It wasn't all sunshine and lolipops though, there were many setbacks, including many cancelled projects and dead-ends. That being said, I felt I was always able to speak my mind without fear of reprocussion, and always strived to work on those items that mattered the most and had the furthest reaching impact.

Some (but certainly not all) of those items included:

  • The highly publicized, but now defunct, RHX project
  • The oVirt virtualization management (cloud) platform, where I was on the original development team, and helped build the first prototypes & implementation
  • The RedHat Ruby stack, which was a battle to get off the ground (given the prevalence of the Java and Python ecosystems, continuing to to this day). This is one of the items I am most proud of, we addressed the needs of both the RedHat/Fedora and Ruby communities, building the necessary bridge logic to employ Ruby and Rails solutions in many enterprise production environments. This continues to this day as the team continuously stays ontop of upstream Ruby developments and provide robust support and solutions for downstream use
  • The RedHat CloudForms projects, on which I worked on serveral interations, again including initial prototypes and standards, as well as ManageIQ integration.
  • ReFS reverse engineering and parser. The last major research topic that I explored during my tenure at RedHat, this was a very fun project where I built upon the sparse information about the filesystem internals that's out there, and was able to deduce the logic up to the point of being able to read directory lists and file contents and metadata out of a formatted filesystem. While there is plenty of work to go on this front, I'm confident that the published writeups are an excellent launching point for additional research as well as the development of formal tools to extract data from the filesystem.

My plans for the immediate future are to take a short break then file to form a LLC and explore options under that umbrella. Of particular interest are crypto-currencies, specifically Ripple. I've recently begun developing an integrated wallet, ledger and market explorer, and statistical analysis framework called Wipple which I'm planning to continue working on and if all goes according to plan, generating some revenue from. There is alot of ??? between here and there, but thats the name of the game!

Until then, I'd like to thank everyone who helped me do my thing at RedHat, from landing my initial internships and the full-time after that, to showing me the ropes, and not shooting me down when I put myself out there to work on and promote our innovation solutions and technologies!


Smartcards and You - How To Make Them Work on Fedora/RHEL

Posted by William Brown on February 26, 2018 02:00 PM

Smartcards and You - How To Make Them Work on Fedora/RHEL

Smartcards are a great way to authenticate users. They have a device (something you have) and a pin (something you know). They prevent password transmission, use strong crypto and they even come in a variety of formats. From your “card” shapes to yubikeys.

So why aren’t they used more? It’s the classic issue of usability - the setup for them is undocumented, complex, and hard to discover. Today I hope to change this.

The Goal

To authenticate a user with a smartcard to a physical linux system, backed onto LDAP. The public cert in LDAP is validated, as is the chain to the CA.

You Will Need

I’ll be focusing on the yubikey because that’s what I own.

Preparing the Smartcard

First we need to make the smartcard hold our certificate. Because of a crypto issue in yubikey firmware, it’s best to generate certificates for these externally.

I’ve documented this before in another post, but for accesibility here it is again.

Create an NSS DB, and generate a certificate signing request:

certutil -d . -N -f pwdfile.txt
certutil -d . -R -a -o user.csr -f pwdfile.txt -g 4096 -Z SHA256 -v 24 \
--keyUsage digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment --nsCertType sslClient --extKeyUsage clientAuth \
-s "CN=username,O=Testing,L=example,ST=Queensland,C=AU"

Once the request is signed, and your certificate is in “user.crt”, import this to the database.

certutil -A -d . -f pwdfile.txt -i user.crt -a -n TLS -t ",,"
certutil -A -d . -f pwdfile.txt -i ca.crt -a -n TLS -t "CT,,"

Now export that as a p12 bundle for the yubikey to import.

pk12util -o user.p12 -d . -k pwdfile.txt -n TLS

Now import this to the yubikey - remember to use slot 9a this time! As well make sure you set the touch policy NOW, because you can’t change it later!

yubico-piv-tool -s9a -i user.p12 -K PKCS12 -aimport-key -aimport-certificate -k --touch-policy=always

Setting up your LDAP user

First setup your system to work with LDAP via SSSD. You’ve done that? Good! Now it’s time to get our user ready.

Take our user.crt and convert it to DER:

openssl x509 -inform PEM -outform DER -in user.crt -out user.der

Now you need to transform that into something that LDAP can understand. In the future I’ll be adding a tool to 389-ds to make this “automatic”, but for now you can use python:

>>> import base64
>>> with open('user.der', 'r') as f:
>>>    print(base64.b64encode(f.read))

That should output a long base64 string on one line. Add this to your ldap user with ldapvi:

userCertificate;binary:: <BASE64>

Note that ‘;binary’ tag has an important meaning here for certificate data, and the ‘::’ tells ldap that this is b64 encoded, so it will decode on addition.

Setting up the system

Now that you have done that, you need to teach SSSD how to intepret that attribute.

In your various SSSD sections you’ll need to make the following changes:

auth_provider = ldap
ldap_user_certificate = userCertificate;binary

# This controls OCSP checks, you probably want this enabled!
# certificate_verification = no_verification

pam_cert_auth = True

Now the TRICK is letting SSSD know to use certificates. You need to run:

sudo touch /var/lib/sss/pubconf/pam_preauth_available

With out this, SSSD won’t even try to process CCID authentication!

Add your ca.crt to the system trusted CA store for SSSD to verify:

certutil -A -d /etc/pki/nssdb -i ca.crt -n USER_CA -t "CT,,"

Add coolkey to the database so it can find smartcards:

modutil -dbdir /etc/pki/nssdb -add "coolkey" -libfile /usr/lib64/libcoolkeypk11.so

Check that SSSD can find the certs now:

# sudo /usr/libexec/sssd/p11_child --pre --nssdb=/etc/pki/nssdb
PIN for william
CAC ID Certificate

If you get no output here you are missing something! If this doesn’t work, nothing will!

Finally, you need to tweak PAM to make sure that pam_unix isn’t getting in the way. I use the following configuration.

auth        required      pam_env.so
# This skips pam_unix if the given uid is not local (IE it's from SSSD)
auth        [default=1 ignore=ignore success=ok] pam_localuser.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 1000 quiet_success
auth        sufficient    pam_sss.so prompt_always ignore_unknown_user
auth        required      pam_deny.so

account     required      pam_unix.so
account     sufficient    pam_localuser.so
account     sufficient    pam_succeed_if.so uid < 1000 quiet
account     [default=bad success=ok user_unknown=ignore] pam_sss.so
account     required      pam_permit.so

password    requisite     pam_pwquality.so try_first_pass local_users_only retry=3 authtok_type=
password    sufficient    pam_unix.so sha512 shadow try_first_pass use_authtok
password    sufficient    pam_sss.so use_authtok
password    required      pam_deny.so

session     optional      pam_keyinit.so revoke
session     required      pam_limits.so
-session    optional      pam_systemd.so
session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session     required      pam_unix.so
session     optional      pam_sss.so

That’s it! Restart SSSD, and you should be good to go.

Finally, you may find SELinux isn’t allowing authentication. This is really sad that smartcards don’t work with SELinux out of the box and I have raised a number of bugs, but check this just in case.

Happy authentication!

ReFS Part III - Back to the Resilience

Posted by Mo Morsi on February 06, 2018 04:18 PM

We've made some great headway on the ReFS filesystem anaylsis front to the point of being able to implement a rudimentary file extraction mechanism (complete with timestamps).

First a recap of the story so far:

  • ReFS, aka "The Resilient FileSystem" is a relatively new filesystem developed by Microsoft. First shipped in Windows Server 2012, it has since seen an increase in popularity and use, especially in enterprise and cloud environments.
  • Little is known about the ReFS internals outside of some sparse information provided by Microsoft. According to that, data is organized into pages of a fixed size, starting at a static position on the disk. The first round of analysis was to determine the boundaries of these top level organizational units to be able to scan the disk for high level structures.
  • Once top level structures, including the object table and root directory, were identified, each was analyzed in detail to determine potential parsable structures such as generic Attribute and Record entities as well as file and directory references.
  • The latest round of analysis consisted of diving into these entities in detail to try and deduce a mechanism which to extract file metadata and content

Before going into details, we should note this analysis is based on observations against ReFS disks generated locally, without extensive sequential cross-referencing and comparison of many large files with many changes. Also it is possible that some structures are oversimplified and/or not fully understood. That being said, this should provide a solid basis for additional analysis, getting us deep into the filesystem, and allowing us to poke and prod with isolated bits to identify their semantics.

Now onto the fun stuff!

- A ReFS filesystem can be identified with the following signature at the very start of the partition:

    00 00 00 52  65 46 53 00  00 00 00 00  00 00 00 00 ...ReFS.........
    46 53 52 53  XX XX XX XX  XX XX XX XX  XX XX XX XX FSRS

- The following Ruby code will tell you if a given offset in a given file contains a ReFS partition:

    # Point this to the file containing the disk image

    # Point this at the start of the partition containing the ReFS filesystem

    # FileSystem Signature we are looking for
    FS_SIGNATURE  = [0x00, 0x00, 0x00, 0x52, 0x65, 0x46, 0x53, 0x00] # ...ReFS.

    img = File.open(File.expand_path(DISK), 'rb')
    img.seek ADDRESS
    sig = img.read(FS_SIGNATURE.size).unpack('C*')
    puts "Disk #{sig == FS_SIGNATURE ? "contains" : "does not contain"} ReFS filesystem"

- ReFS pages are 0x4000 bytes in length

- On all inspected systems, the first page number is 0x1e (0x78000 bytes after the start of the partition containing the filesystem). This is inline w/ Microsoft documentation which states that the first metadata dir is at a fixed offset on the disk.

- Other pages contain various system, directory, and volume structures and tables as well as journaled versions of each page (shadow-written upon regular disk writes)

- The first byte of each page is its Page Number

- The first 0x30 bytes of every metadata page (dubbed the Page Header) seem to follow a certain pattern:

    byte  0: XX XX 00 00   00 00 00 00   YY 00 00 00   00 00 00 00
    byte 16: 00 00 00 00   00 00 00 00   ZZ ZZ 00 00   00 00 00 00
    byte 32: 01 00 00 00   00 00 00 00   00 00 00 00   00 00 00 00
  • dword 0 (XX XX) is the page number which is sequential and corresponds to the 0x4000 offset of the page
  • dword 2 (YY) is the journal number or sequence number
  • dword 6 (ZZ ZZ) is the "Virtual Page Number", which is non-sequential (eg values are in no apparent order) and seem to tie related pages together.
  • dword 8 is always 01, perhaps an "allocated" flag or other

- Multiple pages may share a virtual page number (byte 24/dword 6) but usually don't appear in sequence.

- The following Ruby code will print out the pages in a ReFS partition along w/ their shadow copies:

    # Point this to the file containing the disk image
    # Point this at the start of the partition containing the ReFS filesystem
    FIRST_PAGE = 0x1e
    img = File.open(File.expand_path(DISK), 'rb')
    page_id = FIRST_PAGE
    img.seek(ADDRESS + page_id*PAGE_SIZE)
    while contents = img.read(PAGE_SIZE)
      id = contents.unpack('S').first
      if id == page_id
        pos = img.pos
        start = ADDRESS + page_id * PAGE_SIZE
        img.seek(start + PAGE_SEQ)
        seq = img.read(4).unpack("L").first
        img.seek(start + PAGE_VIRTUAL_PAGE_NUM)
        vpn = img.read(4).unpack("L").first
        print "page: "
        print "0x#{id.to_s(16).upcase}".ljust(7)
        print " @ "
        print "0x#{start.to_s(16).upcase}".ljust(10)
        print ": Seq - "
        print "0x#{seq.to_s(16).upcase}".ljust(7)
        print "/ VPN - "
        print "0x#{vpn.to_s(16).upcase}".ljust(9)
        img.seek pos
      page_id += 1

- The object table (virtual page number 0x02) associates object ids' with the pages on which they reside. Here we an AttributeList consisting of Records of key/value pairs (see below for the specifics on these data structures). We can lookup the object id of the root directory (0x600000000) to retrieve the page on which it resides:

   50 00 00 00 10 00 10 00 00 00 20 00 30 00 00 00 - total length / key & value boundries
   00 00 00 00 00 00 00 00 00 06 00 00 00 00 00 00 - object id
   F4 0A 00 00 00 00 00 00 00 00 02 08 08 00 00 00 - page id / flags
   CE 0F 85 14 83 01 DC 39 00 00 00 00 00 00 00 00 - checksum
   08 00 00 00 08 00 00 00 04 00 00 00 00 00 00 00

^ The object table entry for the root dir, containing its page (0xAF4)

- When retrieving pages by id or virtual page number, look for the ones with the highest sequence number as those are the latest copies of the shadow-write mechanism.

- Expanding upon the previous example we can implement some logic to read and dump the object table:

    ATTR_START = 0x30
    def img
      @img ||= File.open(File.expand_path(DISK), 'rb')
    def pages
      @pages ||= begin
        _pages = {}
        page_id = FIRST_PAGE
        img.seek(ADDRESS + page_id*PAGE_SIZE)
        while contents = img.read(PAGE_SIZE)
          id = contents.unpack('S').first
          if id == page_id
            pos = img.pos
            start = ADDRESS + page_id * PAGE_SIZE
            img.seek(start + PAGE_SEQ)
            seq = img.read(4).unpack("L").first
            img.seek(start + PAGE_VIRTUAL_PAGE_NUM)
            vpn = img.read(4).unpack("L").first
            _pages[id] = {:id => id, :seq => seq, :vpn => vpn}
            img.seek pos
          page_id += 1
    def page(opts)
      if opts.key?(:id)
        return pages[opts[:id]]
      elsif opts[:vpn]
        return pages.values.select { |v|
          v[:vpn] == opts[:vpn]
        }.sort { |v1, v2| v1[:seq] <=> v2[:seq] }.last
    def obj_pages
      @obj_pages ||= begin
        obj_table = page(:vpn => 2)
        img.seek(ADDRESS + obj_table[:id] * PAGE_SIZE)
        bytes = img.read(PAGE_SIZE).unpack("C*")
        len1 = bytes[ATTR_START]
        len2 = bytes[ATTR_START+len1]
        start = ATTR_START + len1 + len2
        objs = {}
        while bytes.size > start && bytes[start] != 0
          len = bytes[start]
          id  = bytes[start+0x10..start+0x20-1].collect { |i| i.to_s(16).upcase }.reverse.join()
          tgt = bytes[start+0x20..start+0x21].collect   { |i| i.to_s(16).upcase }.reverse.join()
          objs[id] = tgt
          start += len
    obj_pages.each { |id, tgt|
      puts "Object #{id} is on page #{tgt}"

We could also implement a method to lookup a specific object's page:

    def obj_page(obj_id)

    puts page(:id => obj_page("0000006000000000").to_i(16))

This will retrieve the page containing the root directory

- Directories, from the root dir down, follow a consistent pattern. They are comprised of sequential lists of data structures whose length is given by the first word value (Attributes and Attribute Lists).

List are often prefixed with a Header Attribute defining the total length of the Attributes that follow that consititute the list. Though this is not a hard set rule as in the case where the list resides in the body of another Attribute (more on that below).

In either case, Attributes may be parsed by iterating over the bytes after the directory page header, reading and processing the first word to determine the next number of bytes to read (minus the length of the first word), and then repeating until null (0000) is encountered (being sure to process specified padding in the process)

- Various Attributes take on different semantics including references to subdirs and files as well as branches to additional pages containing more directory contents (for large directories); though not all Attributes have been identified.

The structures in a directory listing always seem to be of one of the following formats:

- Base Attribute - The simplest / base attribute consisting of a block whose length is given at the very start.

An example of a typical Attribute follows:

      a8 00 00 00  28 00 01 00  00 00 00 00  10 01 00 00  
      10 01 00 00  02 00 00 00  00 00 00 00  00 00 00 00  
      00 00 00 00  00 00 00 00  a9 d3 a4 c3  27 dd d2 01  
      5f a0 58 f3  27 dd d2 01  5f a0 58 f3  27 dd d2 01  
      a9 d3 a4 c3  27 dd d2 01  20 00 00 00  00 00 00 00  
      00 06 00 00  00 00 00 00  03 00 00 00  00 00 00 00  
      5c 9a 07 ac  01 00 00 00  19 00 00 00  00 00 00 00  
      00 00 01 00  00 00 00 00  00 00 00 00  00 00 00 00  
      00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  
      00 00 00 00  00 00 00 00  01 00 00 00  00 00 00 00  
      00 00 00 00  00 00 00 00

Here we a section of 0xA8 length containing the following four file timestamps (more on this conversion below)

       a9 d3 a4 c3  27 dd d2 01 - 2017-06-04 07:43:20
       5f a0 58 f3  27 dd d2 01 - 2017-06-04 07:44:40
       5f a0 58 f3  27 dd d2 01 - 2017-06-04 07:44:40
       a9 d3 a4 c3  27 dd d2 01 - 2017-06-04 07:43:20

It is safe to assume that either

  • one of the first fields in any given Attribute contains an identifier detailing how the attribute should be parsed _or_
  • the context is given by the Attribute's position in the list.
  • attributes corresponding to given meaning are referenced by address or identifier elsewhere

The following is a method which can be used to parse a given Attribute off disk, provided the img read position is set to its start:

    def read_attr
      pos = img.pos
      packed = img.read(4)
      return new if packed.nil?
      attr_len = packed.unpack('L').first
      return new if attr_len == 0

      img.seek pos
      value = img.read(attr_len)
      Attribute.new(:pos   => pos,
                    :bytes => value.unpack("C*"),
                    :len   => attr_len)

- Records - Key / Value pairs whose total length and key / value lengths are given in the first 0x20 bytes of the attribute. These are used to associated metadata sections with files whose names are recorded in the keys and contents are recorded in the value.

An example of a typical Record follows:

    40 04 00 00   10 00 1A 00   08 00 30 00   10 04 00 00   @.........0.....
    30 00 01 00   6D 00 6F 00   66 00 69 00   6C 00 65 00   0...m.o.f.i.l.e.
    31 00 2E 00   74 00 78 00   74 00 00 00   00 00 00 00   1...t.x.t.......
    A8 00 00 00   28 00 01 00   00 00 00 00   10 01 00 00   ¨...(...........
    10 01 00 00   02 00 00 00   00 00 00 00   00 00 00 00   ................
    00 00 00 00   00 00 00 00   A9 D3 A4 C3   27 DD D2 01   ........©Ó¤Ã'ÝÒ.
    5F A0 58 F3   27 DD D2 01   5F A0 58 F3   27 DD D2 01   _ Xó'ÝÒ._ Xó'ÝÒ.
    A9 D3 A4 C3   27 DD D2 01   20 00 00 00   00 00 00 00   ©Ó¤Ã'ÝÒ. .......
    00 06 00 00   00 00 00 00   03 00 00 00   00 00 00 00   ................
    5C 9A 07 AC   01 00 00 00   19 00 00 00   00 00 00 00   \..¬............
    00 00 01 00   00 00 00 00   00 00 00 00   00 00 00 00   ................
    00 00 00 00   00 00 00 00   00 00 00 00   00 00 00 00   ................
    00 00 00 00   00 00 00 00   01 00 00 00   00 00 00 00   ................
    00 00 00 00   00 00 00 00   20 00 00 00   A0 01 00 00   ........ ... ...
    D4 00 00 00   00 02 00 00   74 02 00 00   01 00 00 00   Ô.......t.......
    78 02 00 00   00 00 00 00 ...(cutoff)                   x.......

Here we see the Record parameters given by the first row:

  • total length - 4 bytes = 0x440
  • key offset - 2 bytes = 0x10
  • key length - 2 bytes = 0x1A
  • flags / identifer - 2 bytes = 0x08
  • value offset - 2 bytes = 0x30
  • value length - 2 bytes = 0x410

Naturally, the Record finishes after the value, 0x410 bytes after the value start at 0x30, or 0x440 bytes after the start of the Record (which lines up with the total length).

We also see that this Record corresponds to a file I created on disk as the key is the File Metadata flag (0x10030) followed by the filename (mofile1.txt).

Here the first attribute in the Record value is the simple attribute we discussed above, containing the file timestamps. The File Reference Attribute List Header follows (more on that below).

From observation Records w/ flag values of '0' or '8' are what we are looking for, while '4' occurs often, this almost always seems to indicate a Historical Record, or a Record that has since been replaced with another.

Since Records are prefixed with their total length, they can be thought of a subclass of Attribute. The following is a Ruby class that uses composition to dispatch record field lookup calls to values in the underlying Attribute:

    class Record
      attr_accessor :attribute

      def initialize(attribute)
        @attribute = attribute

      def key_offset
        @key_offset ||= attribute.words[2]

      def key_length
        @key_length ||= attribute.words[3]

      def flags
        @flags ||= attribute.words[4]

      def value_offset
        @value_offset ||= attribute.words[5]

      def value_length
        @value_offset ||= attribute.words[6]

      def key
        @key ||= begin
          ko, kl, vo, vl = boundries

      def value
        @value ||= begin
          ko, kl, vo, vl = boundries

      def value_pos
        attribute.pos + value_offset

      def key_pos
        attribute.pos + key_offset
    end # class Record

- AttributeList - These are more complicated but interesting. At first glance they are simple Attributes of length 0x20 but upon further inspection we consistently see it contains the length of a large block of Attributes (this length is inclusive, as it contains this first one). After parsing this Attribute, dubbed the 'List Header', we should read the remaining bytes in the List as well as the padding, before arriving at the next Attribute

   20 00 00 00   A0 01 00 00   D4 00 00 00   00 02 00 00 <- list header specifying total length (0x1A0) and padding (0xD4)
   74 02 00 00   01 00 00 00   78 02 00 00   00 00 00 00
   80 01 00 00   10 00 0E 00   08 00 20 00   60 01 00 00
   60 01 00 00   00 00 00 00   80 00 00 00   00 00 00 00
   88 00 00 00  ... (cutoff)

Here we see an Attribute of 0x20 length, that contains a reference to a larger block size (0x1A0) in its third word.

This can be confirmed by the next Attribute whose size (0x180) is the larger block size minute the length of the header (0x1A0 - 0x20). In this case the list only contains one item/child attribute.

In general a simple strategy to parse the entire case would be to:

  • Parse Attributes individually as normal
  • If we encounter a List Header Attribute, we calculate the size of the list (total length minus header length)
  • Then continue parsing Attributes, adding them to the list until the total length is completed.

It also seems that:

  • the padding that occurs after the list is given by header word number 5 (in this case 0xD4). After the list is parsed, we consistently see this many null bytes before the next Attribute begins (which is not part of & unrelated to the list).
  • the type of list is given by its 7th word; directory contents correspond to 0x200 while directory branches are indicated with 0x301

Here is a class that represents an AttributeList header attribute by encapsulating it in a similar manner to Record above:

    class AttributeListHeader
      attr_accessor :attribute

      def initialize(attr)
        @attribute = attr

      # From my observations this is always 0x20
      def len
        @len ||= attribute.dwords[0]

      def total_len
        @total_len ||= attribute.dwords[1]

      def body_len
        @body_len ||= total_len - len

      def padding
        @padding ||= attribute.dwords[2]

      def type
        @type ||= attribute.dwords[3]

      def end_pos
        @end_pos ||= attribute.dwords[4]

      def flags
        @flags ||= attribute.dwords[5]

      def next_pos
        @next_pos ||= attribute.dwords[6]

Here is a method to parse the actual Attribute List assuming the image read position is set to the beginning of the List Header

    def read_attribute_list
      header        = Header.new(read_attr)
      remaining_len = header.body_len
      orig_pos      = img.pos
      bytes         = img.read remaining_len
      img.seek orig_pos

      attributes = []

      until remaining_len == 0
        attributes    << read_attr
        remaining_len -= attributes.last.len

      img.seek orig_pos - header.len + header.end_pos

      AttributeList.new :header     => header,
                        :pos        => orig_pos,
                        :bytes      => bytes,
                        :attributes => attributes

Now we have most of what is needed to locate and parse individual files, but there are a few missing components including:

- Directory Tree Branches: These are Attribute Lists where each Attribute corresponds to a record whose value references a page which contains more directory contents.

Upon encountering an AttributeList header with flag value 0x301, we should

  • iterate over the Attributes in the list,
  • parse them as Records,
  • use the first dword in each value as the page to repeat the directory traversal process (recursively).

Additional files and subdirs found on the referenced pages should be appended to the list of current directory contents.

Note this is the (an?) implementation of the BTree structure in the ReFS filesystem described by Microsoft, as the record keys contain the tree leaf identifiers (based on file and subdirectory names).

This can be used for quick / efficient file and subdir lookup by name (see 'optimization' in 'next steps' below)

- SubDirectories: these are simply Records in the directory's Attribute List whose key contains the Directory Metadata flag (0x20030) as well as the subdir name.

The value of this Record is the corresponding object id which can be used to lookup the page containing the subdir in the object table.

A typical subdirectory Record

    70 00 00 00  10 00 12 00  00 00 28 00  48 00 00 00  
    30 00 02 00  73 00 75 00  62 00 64 00  69 00 72 00  <- here we see the key containing the flag (30 00 02 00) followed by the dir name ("subdir2")
    32 00 00 00  00 00 00 00  03 07 00 00  00 00 00 00  <- here we see the object id as the first qword in the value (0x730)
    00 00 00 00  00 00 00 00  14 69 60 05  28 dd d2 01  <- here we see the directory timestamps (more on those below)
    cc 87 ce 52  28 dd d2 01  cc 87 ce 52  28 dd d2 01  
    cc 87 ce 52  28 dd d2 01  00 00 00 00  00 00 00 00  
    00 00 00 00  00 00 00 00  00 00 00 10  00 00 00 00

- Files: like directories are Records whose key contains a flag (0x10030) followed by the filename.

The value is far more complicated though and while we've discovered some basic Attributes allowing us to pull timestamps and content from the fs, there is still more to be deduced as far as the semantics of this Record's value.

- The File Record value consists of multiple attributes, though they just appear one after each other, without a List Header. We can still parse them sequentially given that all Attributes are individually prefixed with their lengths and the File Record value length gives us the total size of the block.

- The first attribute contains 4 file timestamps at an offset given by the fifth byte of the attribute (though this position may be coincidental an the timestamps could just reside at a fixed location in this attribute).

In the first attribute example above we see the first timestamp is

       a9 d3 a4 c3  27 dd d2 01

This corresponds to the following date

        2017-06-04 07:43:20

And may be converted with the following algorithm:

          tsi = TIMESTAMP_BYTES.pack("C*").unpack("Q*").first
          Time.at(tsi / 10000000 - 11644473600)

Timestamps being in nanoseconds since the Windows Epoch Data (11644473600 = Jan 1, 1601 UTC)

- The second Attribute seems to be the Header of an Attribute List containing the 'File Reference' semantics. These are the Attributes that encapsulate the file length and content pointers.

I'm assuming this is an Attribute List so as to contain many of these types of Attributes for large files. What is not apparent are the full semantics of all of these fields.

But here is where it gets complicated, this List only contains a single attribute with a few child Attributes. This encapsulation seems to be in the same manner as the Attributes stored in the File Record value above, just a simple sequential collection without a Header.

In this single attribute (dubbed the 'File Reference Body') the first Attribute contains the length of the file while the second is the Header for yet another List, this one containing a Record whose value contains a reference to the page which the file contents actually reside.

      | ...                                  |
      | File Entry Record                    |
      | Key: 0x10030 [FileName]              |
      | Value:                               |
      | Attribute1: Timestamps               |
      | Attribute2:                          |
      |   File Reference List Header         |
      |   File Reference List Body(Record)   |
      |     Record Key: ?                    |
      |     Record Value:                    |
      |       File Length Attribute          |
      |       File Content List Header       |
      |       File Content Record(s)         |
      | Padding                              |
      | ...                                  |

While complicated each level can be parsed in a similar manner to all other Attributes & Records, just taking care to parse Attributes into their correct levels & structures.

As far as actual values,

  • the file length is always seen at a fixed offset within its attribute (0x3c) and
  • the content pointer seems to always reside in the second qword of the Record value. This pointer is simply a reference to the page which the file contents can be read verbatim.


And that's it! An example implementation of all this logic can be seen in our expiremental 'resilience' library found here:


The next steps would be to

  • expand upon the data structures above (verify that we have interpreted the existing structures correctly)
  • deduce full Attribute and Record semantics so as to be able to consistently parse files of any given length, with any given number of modifications out of the file system

And once we have done so robustly, we can start looking at optimization, possibly rolling out some expiremental production logic for ReFS filesystem support!

... Cha-ching $ £ ¥ ¢ ₣ ₩ !!!!

Using b43 firmware on Fedora Atomic Workstation

Posted by William Brown on December 22, 2017 02:00 PM

Using b43 firmware on Fedora Atomic Workstation

My Macbook Pro has a broadcom b43 wireless chipset. This is notorious for being one of the most annoying wireless adapters on linux. When you first install Fedora you don’t even see “wifi” as an option, and unless you poke around in dmesg, you won’t find how to enable b43 to work on your platform.


The b43 driver requires proprietary firmware to be loaded else the wifi chip will not run. There are a number of steps for this process found on the linux wireless page . You’ll note that one of the steps is:

export FIRMWARE_INSTALL_DIR="/lib/firmware"
sudo b43-fwcutter -w "$FIRMWARE_INSTALL_DIR" broadcom-wl-5.100.138/linux/wl_apsta.o

So we need to be able to write and extract our firmware to /usr/lib/firmware, and then reboot and out wifi works.

Fedora Atomic Workstation

Atomic WS is similar to atomic server, that it’s a read-only ostree based deployment of fedora. This comes with a number of unique challenges and quirks but for this issue:

sudo touch /usr/lib/firmware/test
/bin/touch: cannot touch '/usr/lib/firmware/test': Read-only file system

So we can’t extract our firmware!

Normally linux also supports reading from /usr/local/lib/firmware (which on atomic IS writeable …) but for some reason fedora doesn’t allow this path.

Solution: Layered RPMs

Atomic has support for “rpm layering”. Ontop of the ostree image (which is composed of rpms) you can supply a supplemental list of packages that are “installed” at rpm-ostree update time.

This way you still have an atomic base platform, with read-only behaviours, but you gain the ability to customise your system. To achive it, it must be possible to write to locations in /usr during rpm install.

This means our problem has a simple solution: Create a b43 rpm package. Note, that you can make this for yourself privately, but you can’t distribute it for legal reasons.

Get setup on atomic to build the packages:

rpm-ostree install rpm-build createrepo

RPM specfile:


%define debug_package %{nil} Summary: Allow b43 fw to install on ostree installs due to bz1512452 Name: b43-fw Version: 1.0.0 Release: 1 License: Proprietary, DO NOT DISTRIBUTE BINARY FORMS URL: http://linuxwireless.sipsolutions.net/en/users/Drivers/b43/ Group: System Environment/Kernel

BuildRequires: b43-fwcutter

Source0: http://www.lwfinger.com/b43-firmware/broadcom-wl-5.100.138.tar.bz2

%description Broadcom firmware for b43 chips.

%prep %setup -q -n broadcom-wl-5.100.138

%build true

%install pwd mkdir -p %{buildroot}/usr/lib/firmware b43-fwcutter -w %{buildroot}/usr/lib/firmware linux/wl_apsta.o

%files %defattr(-,root,root,-) %dir %{_prefix}/lib/firmware/b43 %{_prefix}/lib/firmware/b43/*

%changelog * Fri Dec 22 2017 William Brown <william at blackhats.net.au> - 1.0.0 - Initial version

Now you can put this into a folder like so:

mkdir -p ~/rpmbuild/{SPECS,SOURCES}
<editor> ~/rpmbuild/SPECS/b43-fw.spec
wget -O ~/rpmbuild/SOURCES/broadcom-wl-5.100.138.tar.bz2 http://www.lwfinger.com/b43-firmware/broadcom-wl-5.100.138.tar.bz2

We are now ready to build!

rpmbuild -bb ~/rpmbuild/SPECS/b43-fw.spec
createrepo ~/rpmbuild/RPMS/x86_64/

Finally, we can install this. Create a yum repos file:

baseurl=file:///home/<YOUR USERNAME HERE>/rpmbuild/RPMS/x86_64
rpm-ostree install b43-fw

Now reboot and enjoy wifi on your Fedora Atomic Macbook Pro!

RETerm to The Terminal with a GUI

Posted by Mo Morsi on December 14, 2017 04:45 PM

When it comes to user interfaces, most (if not all) software applications can be classified into one of three categories:

  • Text Based - whether they entail one-off commands, interactive terminals (REPL), or text-based visual widgets, these saw a major rise in the 50s-80s though were usurped by GUIs in the 80s-90s
  • Graphical - GUIs, or Graphical User Interfaces, facilitate creating visual windows which the user may interact with via the mouse or keyboard. There are many different GUI frameworks available for various platforms
  • Web Based - A special type of graphical interface rendered via a web browser, many applications provide their frontend via HTML, Javascript, & CSS
Interfaces comparison

In recent years modern interface trends seem to be moving in the direction of the Web User Interfaces (WUI), with increasing numbers of apps offering their functionality primarily via HTTP. That being said GUIs and TUIs (Text User Interfaces) are still an entrenched use case for various reasons:

  • Web browsers, servers, and network access may not be available or permissable on all systems
  • Systems need mechanisms to access and interact with the underlying components, incase higher level constructs, such as graphics and network subsystems fail or are unreliable
  • Simpler text & graphical implementations can be coupled and optimized for the underlying operational environment without having to worry about portability and cross-env compatability. Clients can thus be simpler and more robust.

Finally there is a certain pleasing ascethic to simple text interfaces that you don't get with GUIs or WUIs. Of course this is a human-preference sort-of-thing, but it's often nice to return to our computational roots as we move into the future of complex gesture and voice controlled computer interactions.

Scifi terminal

When working on a recent side project (to be announced), I was exploring various concepts as to the user interface which to throw ontop of it. Because other solutions exist in the domain which I'm working in (and for other reasons), I wanted to explore something novel as far as user interaction, and decided to expirement with a text-based approach. ncurses is the goto library for this sort of thing, being available on most modern platforms, along with many widget libraries and high level wrappers


Unfortunately ncurses comes with alot of boilerplate and it made sense to seperate that from the project I intend to use this for. Thus the RETerm library was born, with the intent to provide a high level DSL to implmenent terminal interfaces and applications (... in Ruby of couse <3 !!!)

Reterm sc1

RETerm, aka the Ruby Enhanced TERMinal allows the user to incorporate high level text-based widgets into an orgnaized terminal window, with seemless standardized keyboard interactions (mouse support is on the roadmap to be added). So for example, one could define a window containing a child widget like so:

require 'reterm'
include RETerm

value = nil

init_reterm {
  win = Window.new :rows => 10,
                   :cols => 30

  slider = Components::VSlider.new
  win.component = slider
  value = slider.activate!

puts "Slider Value: #{value}"

This would result in the following interface containing a vertical slider:

Reterm sc2

RETerm ships with many built widgets including:

Text Entry

Reterm sc3

Clickable Button

Reterm sc4

Radio Switch/Rocker/Selectable List

Reterm sc5 Reterm sc6 Reterm sc7

Sliders (both horizontal and vertical)


Ascii Text (with many fonts via artii/figlet)

Reterm sc8

Images (via drawille)

Reterm sc9

RETerm is now available via rubygems. To install, simplly:

  $ gem install reterm

That's All Folks... but wait there is more!!! Afterall:

Delorian meme

For a bit of a value-add, I decided to implement a standard schema where text interfaces could be described in a JSON config file and loaded by the framework, similar to xml schemas which GTK and Android use for their interfaces. One can simply describe their interface in JSON and the framework will instantiate the corresponding text interface:

  "window" : {
    "rows"      : 10,
    "cols"      : 50,
    "border"    : true,
    "component" : {
      "type" : "Entry",
      "init" : {
        "title" : "<C>Demo",
        "label" : "Enter Text: "
Reterm sc10

To assist in generating this schema, I implemented a graphical designer, where components can be dragged and dropped into a 2D canvas to layout the interface.

That's right, you can now use a GUI based application to design a text-based interface.

Retro meme

The Designer itself can be found in the same repo as the RETerm project, loaded in the "designer/" subdir.

Reterm designer

To use if you need to install visualruby (a high level wrapper to ruby-gnome) like so:

  $ gem install visualruby

And that's it! (for real this time) This was certainly a fun side-project to a side-project (toss in a third "side-project" if you consider the designer to be its own thing!). As I to return to the project using RETerm, I aim to revisit it every so often, adding new features, widgets, etc....



What am I doing ?

Posted by Mayank Jha on December 08, 2017 04:31 AM

Joining club 27 because it’s cool ?

No one knows when will someone die.
My friend died, whom I loved.
Unfortunate and sad.
I could have been one.
I could die the next moment.
I am shit scared now.
What should I do ?
The memory of my friend will stay with me forever till I die.
It’s a scar which will be lifelong.
But I cherish the moments
I had with him.
The wild explorations we had.
Let’s cherish the moments we made.
Sadly you could not stay with us any longer.


Makes us realise how transient human life is.
The one final reality which we all need to face.
The one final breath.
The one final slump into slumber and nothingness.
Complete void.
All that remains is between the start and end.
He said, in the end nothing matters.
But the thing in between is ALL that is.
You came with nothing and would go with nothing.
But in the middle is where the magic called life happens


The emotion called love, which binds us.
Love from our creators.
Love from humans around us.
I don’t know whether or not I’ll end up with a girl.
I don’t know whether I’ll end up with a guy.
I know for sure, that spreading love and passion.
My nerves might never heal.
The itch might never go.
But love shall stay.

Creating yubikey SSH and TLS certificates

Posted by William Brown on November 10, 2017 02:00 PM

Creating yubikey SSH and TLS certificates

Recently yubikeys were shown to have a hardware flaw in the way the generated private keys. This affects the use of them to provide PIV identies or SSH keys.

However, you can generate the keys externally, and load them to the key to prevent this issue.


First, we’ll create a new NSS DB on an airgapped secure machine (with disk encryption or in memory storage!)

certutil -N -d . -f pwdfile.txt

Now into this, we’ll create a self-signed cert valid for 10 years.

certutil -S -f pwdfile.txt -d . -t "C,," -x -n "SSH" -g 2048 -s "cn=william,O=ssh,L=Brisbane,ST=Queensland,C=AU" -v 120

We export this now to PKCS12 for our key to import.

pk12util -o ssh.p12 -d . -k pwdfile.txt -n SSH

Next we import the key and cert to the hardware in slot 9c

yubico-piv-tool -s9c -i ssh.p12 -K PKCS12 -aimport-key -aimport-certificate -k

Finally, we can display the ssh-key from the token.

ssh-keygen -D /usr/lib64/opensc-pkcs11.so -e

Note, we can make this always used by ssh client by adding the following into .ssh/config:

PKCS11Provider /usr/lib64/opensc-pkcs11.so

TLS Identities

The process is almost identical for user certificates.

First, create the request:

certutil -d . -R -a -o user.csr -f pwdfile.txt -g 4096 -Z SHA256 -v 24 \
--keyUsage digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment --nsCertType sslClient --extKeyUsage clientAuth \
-s "CN=username,O=Testing,L=example,ST=Queensland,C=AU"

Once the request is signed, we should have a user.crt back. Import that to our database:

certutil -A -d . -f pwdfile.txt -i user.crt -a -n TLS -t ",,"

Import our CA certificate also. Next export this to p12:

pk12util -o user.p12 -d . -k pwdfile.txt -n TLS

Now import this to the yubikey - remember to use slot 9a this time!

yubico-piv-tool -s9a -i user.p12 -K PKCS12 -aimport-key -aimport-certificate -k --touch-policy=always


What’s the problem with NUMA anyway?

Posted by William Brown on November 06, 2017 02:00 PM

What’s the problem with NUMA anyway?

What is NUMA?

Non-Uniform Memory Architecture is a method of seperating ram and memory management units to be associated with CPU sockets. The reason for this is performance - if multiple sockets shared a MMU, they will cause each other to block, delaying your CPU.

To improve this, each NUMA region has it’s own MMU and RAM associated. If a CPU can access it’s local MMU and RAM, this is very fast, and does not prevent another CPU from accessing it’s own. For example:

CPU 0   <-- QPI --> CPU 1
  |                   |
  v                   v
MMU 0               MMU 1
  |                   |
  v                   v
RAM 1               RAM 2

For example, on the following system, we can see 1 numa region:

# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 12188 MB
node 0 free: 458 MB
node distances:
node   0
  0:  10

On this system, we can see two:

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 24 25 26 27 28 29 30 31 32 33 34 35
node 0 size: 32733 MB
node 0 free: 245 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 32767 MB
node 1 free: 22793 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

This means that on the second system there is 32GB of ram per NUMA region which is accessible, but the system has total 64GB.

The problem

The problem arises when a process running on NUMA region 0 has to access memory from another NUMA region. Because there is no direct connection between CPU 0 and RAM 1, we must communicate with our neighbour CPU 1 to do this for us. IE:

CPU 0 --> CPU 1 --> MMU 1 --> RAM 1

Not only do we pay a time delay price for the QPI communication between CPU 0 and CPU 1, but now CPU 1’s processes are waiting on the MMU 1 because we are retrieving memory on behalf of CPU 0. This is very slow (and can be seen by the node distances in the numactl –hardware output).

Today’s work around

The work around today is to limit your Directory Server instance to a single NUMA region. So for our example above, we would limit the instance to NUMA region 0 or 1, and treat the instance as though it only has access to 32GB of local memory.

It’s possible to run two instances of DS on a single server, pinning them to their own regions and using replication between them to provide synchronisation. You’ll need a load balancer to fix up the TCP port changes, or you need multiple addresses on the system for listening.

The future

In the future, we’ll be adding support for better copy-on-write techniques that allow the cores to better cache content after a QPI negotiation - but we still have to pay the transit cost. We can minimise this as much as possible, but there is no way today to avoid this penalty. To use all your hardware on a single instance, there will always be a NUMA cost somewhere.

The best solution is as above: run an instance per NUMA region, and internally provide replication for them. Perhaps we’ll support an automatic configuration of this in the future.

Why I still choose Ruby

Posted by Mo Morsi on October 08, 2017 07:45 PM

With the plethora of languages available to developers, I wanted to do a quick follow-up post as to why given my experience in many different environments, Ruby is still the goto language for all my computational needs!

Prg mtn

While different languages offer different solutions in terms of syntax support, memory managment, runtime guarantees, and execution flows; the underlying arithmatic, logical, and I/O hardware being controlled is the same. Thus in theory, given enough time and optimization the performance differences between languages should go to 0 as computational power and capacity increases / goes to infinity (yes, yes, Moore's law and such, but lets ignore that for now).

Of course different classes of problem domains impose their own requirements,

  • real time processing depends low level optimizations that can only be done in assembly and C,
  • data crunching and process parallelization often needs minimal latency and optimized runtimes, something which you only get with compiled/static-typed languages such as C++ and Java,
  • and higher level languages such as Ruby, Python, Perl, and PHP are great for rapid development cycles and providing high level constructs where complicated algorithms can be invoked via elegant / terse means.

But given the rapid rate of hardware performance in recent years, whole classes of problems which were previously limited to 'lower-level' languages such as C and C++ are now able to be feasbily implemented in higher level languages.

Computer power


Thus we see high performance financial applications being implemented in Python, major websites with millions of users a day being implemented in Ruby and Javascript, massive data sets being crunched in R, and much more.

So putting the performance aspect of these environments aside we need to look at the syntactic nature of these languages as well as the features and tools they offer for developers. The last is the easiest to tackle as these days most notable languages come with compilers/interpreters, debuggers, task systems, test suites, documentation engines, and much more. This was not always the case though as Ruby was one of the first languages that pioneered builtin package management through rubygems, and integrated dependency solutions via gemspecs, bundler, etc. CPAN and a few other language-specific online repositories existed before, but with Ruby you got integration that was a core part of the runtime environment and community support. Ruby is still known to be on the leading front of integrated and end-to-end solutions.

Syntax differences is a much more difficult subject to objectively dicuss as much of it comes down to programmer preference, but it would be hard to object to the statement that Ruby is one of the most Object Oriented languages out there. It's not often that you can call the string conversion or type identification methods on ALL constructs, variables, constants, types, literals, primitives, etc:

  > 1.to_s
  => "1"
  > 1.class
  => Integer

Ruby also provides logical flow control constructs not seen in many other languages. For example in addition to the standard if condition then dosomething paradigm, Ruby allows the user to specify the result after the predicate, eg dosomething if condition. This simple change allows developers to express concepts in a natural manner, akin to how they would often be desrcibed between humans. In addition to this, other simple syntax conveniences include:

  • The unless keyword, simply evaluating to if not
          file = File.open("/tmp/foobar", "w")
          file.write("Hello World") unless file.exist?
  • Methods are allowed to end with ? and ! which is great for specifying immutable methods (eg. Socket.open?), mutable methods and/or methods that can thrown an exception (eg. DBRecord.save!)
  • Inclusive and exclusive ranges can be specified via parenthesis and two or three elipses. So for example:
          > (1..4).include?(4)
          => true
          > (1...4).include?(4)
          => false
  • The yield keywork makes it trivial for any method to accept and invoke a callback during the course of its lifetime
  • And much more

Expanding upon the last, blocks are a core concept in Ruby, once which the language nails right on the head. Not only can any function accept an anonymous callback block, blocks can be bound to parameters and operated on like any other data. You can check the number of parameters the callbacks accept by invoking block.arity, dynamically dispatch blocks, save them for later invokation and much more.

Due to the asynchronous nature of many software solutions (many problems can be modeled as asynchronous tasks) blocks fit into many Ruby paradigms, if not as the primary invocation mechanism, then as an optional mechanism so as to enforce various runtime guarantees:

  File.open("/tmp/foobar"){ |file|
    # do whatever with file here

  # File is guaranteed to be closed here, we didn't have to close it ourselves!

By binding block contexts, Ruby facilitates implementing tightly tailored solutions for many problem domains via DSLs. Ruby DSLs exists for web development, system orchestration, workflow management, and much more. This of course is not to mention the other frameworks, such as the massively popular Rails, as well as other widely-used technologies such as Metasploit

Finally, programming in Ruby is just fun. The language is condusive to expressing complex concepts elegantly, jives with many different programming paradigms and styles, and offers a quick prototype to production workflow that is intuitive for both novice and seasoned developers. Nothing quite scratches that itch like Ruby!

Doge ruby

Data &amp; Market Analysis in C++, R, and Python

Posted by Mo Morsi on October 08, 2017 01:25 AM

In recent years, since efforts on The Omega Project and The Guild sort of fizzled out, I've been exploring various areas of interest with no particular intent other than to play around with some ideas. Data & Financial Engineering was one of those domains and having spent some time diving into the subject (before once again moving on to something else altogether) I'm sharing a few findings here.

My journey down this path started not too long after the Bitcoin Barber Shop Pole was completed, and I was looking for a new project to occupy my free time (the little of it that I have). Having long since stepped down from the SIG315 board, but still renting a private office at the space, I was looking for some way to incorporate that into my next project (besides just using it as the occasional place to work). Brainstorming a bit, I settled on a data visualization idea, where data relating to any number of categories would be aggregated, geotagged, and then projected onto a virtual globe. I decided to use the Marble widget library, built ontop of the QT Framework and had great success:


The architecture behind the DataChoppa project was simple, a generic 'Data' class was implemented using smart pointers ontop of which the Facet Pattern was incorporated, allowing data to be recorded from any number of sources in a generic manner and represented via convenient high level accessors. This was all collected via synchronization and generation plugins which implement a standarized interface whose output was then fed onto a queue on which processing plugings were listening, selecting the data that they were interested in to be operated on from there. The Processors themselves could put more data onto the queue after which the whole process was repeated ad inf., allowing each plugin to satisfy one bit of data-related functionality.

Datachoppa arch

Core Generic & Data Classes

namespace DataChoppa{
  // Generic value container
  class Generic{
      Map<std::string, boost::any> values;
      Map<std::string, std::string> value_strings;

  namespace Data{
    /// Data representation using generic values
    class Data : public Generic{
        Data() = default;
        Data(const Data& data) = default;

        Data(const Generic& generic, TYPES _types, const Source* _source) :
          Generic(generic), types(_types), source(_source) {}

        bool of_type(TYPE type) const;

        Vector to_vector() const;

        TYPES types;

        const Source* source;
    }; // class Data
  }; // namespace Data
}; // namespace DataChoppa

The Process Loop

  namespace DataChoppa {
    namespace Framework{
      void Processor::process_next(){
        if(to_process.empty()) return;
        Data::Data data = to_process.first();
        Plugins::Processors::iterator plugin = plugins.begin();
        while(plugin != plugins.end()) {
          Plugins::Meta* meta = dynamic_cast<Plugins::Meta*>(*plugin);
          //LOG(debug) << "Processing " << meta->id;
          }catch(const Exceptions::Exception& e){
            LOG(warning) << "Error when processing: " << e.what()
                         << " via " << meta->id;
    }; /// namespace Framework
  }; /// namespace DataChoppa

The HTTP Plugin (abridged)

namespace DataChoppa {
  namespace Plugins{
    class HTTP : public Framework::Plugins::Syncer,
                 public Framework::Plugins::Job,
                 public Framework::Plugins::Meta {
        /// ...

        /// sync - always return data to be added to queue, even on error
        Data::Vector sync(){
          String _url = url();
          Network::HTTP::SyncRequest request(_url, request_timeout);

          for(const Network::HTTP::Header& header : headers())

          int attempted = 0;
          Network::HTTP::Response response(request);

          while(attempts == -1 || attempted &lt; attempts){


              if(attempted == attempts){
                Data::Data result = response.to_error_data();
                result.source = &source;
                return result.to_vector();

              if(attempted == attempts){
                Data::Data result = response.to_error_data();
                result.source = &source;
                return result.to_vector();

              Data::Data result = response.to_data();
              result.source = &source;
              return result.to_vector();

          /// we should never get here
          return Data::Vector();
  }; // namespace Plugins
}; // namespace DataChoppa

Overall I was pleased with the result (and perhaps I should have stopped there...). The application collected and aggregated data from many sources including RSS feeds (google news, reddit, etc), weather sources (yahoo weather, weather.com), social networks (facebook, twitter, meetup, linkedin), chat protocols (IRC, slack), financial sources, and much more. While exploring the last I discovered the world of technical analysis and began incorporating many various market indicators into a financial analysis plugin for the project.

The Market Analysis Architecture

Datachoppa extractors Datachoppa annotators

Aroon Indicator (for example)

namespace DataChoppa{
  namespace Market {
    namespace Annotators {
      class Aroon : public Annotator {
          double aroon_up(const Quote& quote, int high_offset, double range){
            return ((range-1) - high_offset) / (range-1) * 100;

          DoubleVector aroon_up(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
            return quotes.collect<DoubleVector>([extrema, range](const Quote& q, int i){
                     return aroon_up(q, extrema->high_offsets[i], range);

          double aroon_down(const Quote& quote, int low_offset, double range){
            return ((range-1) - low_offset) / (range-1) * 100;

          DoubleVector aroon_down(const Quotes& quotes, const Annotations::Extrema* extrema, int range){
            return quotes.collect<DoubleVector>([extrema, range](const Quote& q, int i){
                     return aroon_down(q, extrema->low_offsets[i], range);

          AnnotationList annotate() const{
            const Quotes& quotes = market->quotes;
            if(quotes.size() < range) return AnnotationList();

            const Annotations::Extrema* extrema = aroon_extrema(market, range);
                    Annotations::Aroon* aroon = new Annotations::Aroon(range);
                                        aroon->upper = aroon_up(market->quotes, extrema, range);
                                        aroon->lower = aroon_down(market->quotes, extrema, range);
            return aroon->to_list();
      }; /// class Aroon
    }; /// namespace Annotators
  }; /// namespace Market
}; // namespace DataChoppa

The whole thing worked great, data was pulled in both real time and historical from yahoo finance (until they discontinued it... from then it was google finance), the indicators were run, and results were output. Of course, making $$$ is not as simple as just crunching numbers, and being rather naive I just tossed the results of the indicators into weighted "buckets" and backtested based on simple boolean flags based on the computed signals against threshold values. Thankfully I backtested though as the performance was horrible as losses greatly exceed profits :-(

At this point I should take a step back and note that my progress so far was the result of the availibilty of alot of great resources (we really live in the age of accelerated learning). Specifically the following are indispensible books & sites for those interested in this subject:

  • stockcharts.com - Information on any indicator can be found on this site with details on how it is computed and how it can be used
  • investopedia - Sort of the Wikipedia of investment knowledge, offers great high level insights into how market works and the financial world as it stands
  • Beyond Candlesticks - Though candlestick patterns have limited use, this is great intro to the subject, and provides a good into to reading charts.
  • Nerds on Wall Street - A great book detailing the history of computational finance. Definetly must read if you are new to the domain as it provides a concise high level history on how markets have worked the last few centuries and various computations techniques employed to Seek Alpha
  • High Probability Trading - Provides insights as to the mentality and common pitfalls when trading.
Beyond candlesticksNerds on wallstreetHigh prob trading

The last book is an excellent resource which conveys the importance of money and risk management, as well as the necessity to combine in all factors, or as many factors as you can, when making financial decisions. In the end, I feel this is the gist of it, it's not soley a matter of luck (though there is an aspect of that to this), but rather patience, discipline, balance, and most importantly focus (similar to Aikido but that's a topic for another time). There is no shorting it (unless you're talking about the assets themselves!), and if one does not have / take the necessary time to research and properly plan and out and execute strategies, they will most likely fail (as most do according to the numbers).

It was at this point that I decided to take a step back and restrategize, and having reflected and discussed it over with some acquaintances, I hedged my bets, cut my losses (tech-wise) and switched from C++ to another platform which would allow me prototype and execute ideas quicker. A good amount of time has gone into the C++ project and it worked great, but it did not make sense to continue via a slower development cycle when faster options are available (and afterall every engineer knows time is our most precious resource).

Python and R are the natural choices for this project domain, as there is extensive support in both languages for market analysis, backtesting, and execution. I have used Python at various, points in the past so it was easy to hit the ground running; R was new but by this time no language really poses a serious surprise, the best way I can describe it is spreadsheets on steroids (not exactly, as rather than spreadsheets, data frames and matrixes are the core components, but one can imagine R as being similar to the central execution environment behind Excel, Matlab, or other statistical-software).

I quickly picked up quantmod and prototyped some volatility, trend-following, momentum, and other analysis signal generators in R, plotting them using the provided charting interface. R is a great language for this sort of data manipulation, one can quickly load up structured data from CSV files or online resources, splice it and dice it, chunk it and dunk it, organize it and prioritize it, according to any arithmatic, statistical, or linear/non-linear means which they desire. Quickly loading a new 'view' on the data is as simply as a line of code, and operations can quickly be chained together at high performance.

Volatility indicator in R (consolidated)

quotes <- load_default_symbol("volatility")

quotes.atr <- ATR(quotes, n=ATR_RANGE)

quotes.atr$tr_atr_ratio <- quotes.atr$tr / quotes.atr$atr
quotes.atr$is_high      <- ifelse(quotes.atr$tr_atr_ratio > HIGH_LEVEL, TRUE, FALSE)

# Also Generate ratio of atr to close price
quotes.atr$atr_close_ratio <- quotes.atr$atr / Cl(quotes)

# Generate rising, falling, sideways indicators by calculating slope of ATR regression line
atr_lm       <- list()
atr_lm$df    <- data.frame(quotes.atr$atr, Time = index(quotes.atr))
atr_lm$model <- lm(atr ~ poly(Time, POLY_ORDER), data = atr_lm$df) # polynomial linear model

atr_lm$fit   <- fitted(atr_lm$model)
atr_lm$diff  <- diff(atr_lm$fit)
atr_lm$diff  <- as.xts(atr_lm$diff)

# Current ATR / Close Ratio
quotes.atr.abs_per <- median(quotes.atr$atr_close_ratio[!is.na(quotes.atr$atr_close_ratio)])

# plots
addTA(quotes.atr$tr, type="h")
addTA(as.xts(as.logical(quotes.atr$is_high), index(quotes.atr)), col=col1, on=1)

While it all works great, the R language itself offers very little syntactic sugar for operations not related to data-processing. While there are libraries for most common functionality found in many other execution environments, languages such as Ruby and Python, offer a "friendlier" experience to both novice and seasoned developers alike. Furthermore the process of data synchronization was a tedious step, I was looking for something that offered the flexability of DataChoppa to pull in and process live and historical data from a wide variety of sources, caching results on the fly, and using those results and analysis for subsequent operations.

This all led me to developing a series of Python libraries targeted towards providing a configurable high level view of the market. Intelligence Amplification (IA) as opposed to Artifical Intelligence (AI) if you will (see Nerds on Wall Street).

marketquery.py is a high level market querying library, which implements plugins used to resolve generic market queries for ticker time based data. One can used the interface to query for the lastest quotes or a specific range of them from a particular source, or allow the framework to select one for you.

Retrieve first 3 months of the last 5 years of GBPUSD data

  from marketquery.querier        import Querier
  from marketbase.query.builder   import QueryBuilder
  sym = "GBPUSD"
  first_3mo_of_last_5yr = (QueryBuilder().symbol(sym)
  querier = Querier()
  res     = querier.run(first_3mo_of_last_5yr)
  for query, dat in res.items():
      print(dat.raw[:1000] + (dat.raw[1000:] and '...'))

Retrieve last two month of hourly EURJPY data

  from marketquery.querier        import Querier
  from marketbase.query.builder   import QueryBuilder
  sym = "EURJPY"
  two_months_of_hourly = (QueryBuilder().symbol(sym)
  querier = Querier()
  res     = querier.run(two_months_of_hourly).raw()
  print(res[:1000] + (res[1000:] and '...'))

This provides a quick way to both lookup market data according to specific criteria, as well as cache it so that network resources are used effectively. All caching is configurable, and the user can define timeouts based on the target query, source, and/or data retrieved.

From there the next level up is the technical analysis is was trivial to whip up the tacache.py module which uses the marketquery.py interface to retrieve raw data before feeding it into TALib caching the results. The same caching mechanisms offering the same flexability is employed, if one needs to process a large data set and/or subsets multiple times in a specified period, computational resources are not wasted (important when running on a metered cloud)

Computing various technical indicators

  from marketquery.querier       import Querier
  from marketbase.query.builder  import QueryBuilder
  from tacache.runner            import TARunner
  from tacache.source            import Source
  from tacache.indicator         import Indicator
  from talib                     import SMA
  from talib                  import MACD
  res = Querier().run(QueryBuilder().symbol("AUDUSD")
  ta_runner = TARunner()
  analysis  = ta_runner.run(Indicator(SMA),
  analysis  = ta_runner.run(Indicator(MACD),
  macd, sig, hist = analysis.raw

Finally ontop of all this I wrote a2m.py, a high level querying interface consisting of modules reporting on market volatility and trends as well as other metrics; python scripts which I could quickly execute to report the current and historical market state, making used of the underlying cached query and technical analysis data, periodically invalidated to pull in new/recent live data.

Example using a2m to compute volatility

  sym = "EURUSD"
  self.resolver  = Resolver()
  self.ta_runner = TARunner()

  daily = (QueryBuilder().symbol(sym)

  hourly = (QueryBuilder().symbol(sym)

  current = (QueryBuilder().symbol(sym)

  daily_quotes   = resolver.run(daily)
  hourly_quotes  = resolver.run(hourly)
  current_quotes = resolver.run(current)

  daily_avg  = ta_runner.run(Indicator(talib.SMA, timeperiod=120),  query_result=daily_quotes).raw[-1]
  hourly_avg = ta_runner.run(Indicator(talib.SMA, timeperiod=30),  query_result=hourly_quotes).raw[-1]

  current_val    = current_quotes.raw()[-1]['Close']
  daily_percent  = current_val / daily_avg  if current_val &lt; daily_avg  else daily_avg  / current_val
  hourly_percent = current_val / hourly_avg if current_val &lt; hourly_avg else hourly_avg / current_val
Awesome to the max

I would go onto use this to execute some Forex trades, again not in an algorithmic / automated manner, but rather based on combined knowledge from fundamentals research, as well as the high level technical data, and what was the result...

Poor squidward

I jest, though I did lose a little $$$, it wasn't that much, and to be honest I feel this was due to lack of patience/discipline and other "novice" mistakes as discussed above. I did make about 1/2 of it back, and then lost interest. This all requires alot of focus and time, and I had already spent 2+ years worth of free time on this. With many other interests pulling my strings, I decided to sideline the project(s) alltogether and focus on my next crazy venture.


After some of consideration, I decided to release the R code I wrote under the MIT license. They are rather simple expirements though could be useful as a starting point for others new to the subject. As far as the Python modules and DataChoppa, I intended to eventually release them but aim to take a break first to focus on other efforts and then go back to the war room, to figure out the next stage of the strategy.

And that's that! Enough number crunching, time to go out for a hike!

Hiking meme

Discussion about the greenwave tool.

Posted by David Carlos on September 15, 2017 03:10 PM


Because of my work on Google Summer of Code this year, I was invited to attend to the Fedora Contributor Conference (Flock) as volunteer, helping the organization staff to record some sessions and writing about what was discussed on then. This year, the FLock Conference was in Cape Cod, Massachusetts. Was a incredible experience, allowing me to keep up with great discussions made by the Fedora developers. On this post I will make a resume of what was discussed on the session Gating on automated tests in Fedora - Greenwave, proposed by Pengfei Jia.

The Session

Bodhi is the Fedora service that permits developers to propose a package update for a Fedora distribution. Beyond several functionalities of Bodhi, one that is important for us its that it queries ResultsDB for automated test results and displays them on updates. Greenwave, tool presented on the session, is a service that Bodhi will query to decide if an update is ready to be pushed, based on its test results.

The main purpose of Greenwave is to improve the use of automated tests on Bodhi, mainly because now a day, the automated tests (executed by taskotron) serves only for visualization, not having a useful integration with Bodhi. This is a problem because the developer can release a new update, without check if the tests are passing, and this can break other packages on Fedora. Greenwave, to avoid packages with broken tests, defines policies, to enforce checking the result of some tests.

The main purpose is to make available to developers, an API, that they can define polices that Greenwave will use to check the result of specific tests, telling Bodhi (with fedmsg) what was the result of the tests. Based on Greenwave response for some package tests, Bodhi can decides if a new update can be released. On this link, you can find a example of the use of Greenwave API, and how it policies works. Greenwave will use ResultsDB to access test results. On the session, one of the participants, asked if would not be better if packagers manually check the policies, during the package development. The answer was that these policies are running for four years, and the participant was the first that proposed that, so enforcing checking these policies is something necessary during Fedora updates.

Discussion about the future of fedmsg.

Posted by David Carlos on September 15, 2017 01:10 PM


Because of my work on Google Summer of Code this year, I was invited to attend to the Fedora Contributor Conference (Flock) as volunteer, helping the organization staff to record some sessions and writing about what was discussed on then. This year, the FLock Conference was in Cape Cod, Massachusetts. Was a incredible experience, allowing me to keep up with great discussions made by the Fedora developers. On this post I will make a resume of what was discussed on the session The Future of fedmsg?, proposed by Jeremy Cline.

The Session

The Fedora infrastructure have several different services that need to talk to each other. One simple example is the AutoQA service, that listen to some events triggered by the fedpkg library. If we have only two services interacting the problem is minimal, but when several applications request and response to other several applications, the problem becomes huge. FEDerated MeSsaGe bus (fedmsg), is a python package and API defining a brokerless messaging architecture to send and receive messages to and from applications.

fedmsg does not have a broker to manage the publish/subscribe process, done by the services that interacts with it. This leads to a problem of performance and reliability, because every service that consumes a message from fedmsg have to subscribe to every existent topic. Other issue with the absence of a broker, is that it is common to lose messages, that not always get to the destination. Jeremy proposed to use a broker (or several brokers) to fix these issues, and made a demo code showing the benefits of using a broker, instead of the current architecture of fedmsg.

A great discussion emerged from this demo code, including the reflection if Fedora really needs that fedmsg be reliable. Other problem pointed by Jeremy was the documentation of fedmsg, and the existent tools to consume and publish messages (fedmsg-hub have a setup that is quite confuse). This was my review of the session, and based on my work on Google Summer of Code (I had used fedmsg to consume the Anitya events), I agree with Jeremy. Adding a broker to manage the publish/subscribe process, could improve the consume of resources of fedmsg, would facilitate to add new services consuming messages from the API and would make fedmsg more reliable.

GSoC2017 (Fedora) —— Final

Posted by Mandy Wang on August 29, 2017 10:19 AM

There is the summary about my work in Google Summer of Code during the last three months.


Plinth is a web interface to administer the functions of the FreedomBox which is a Debian based project, and the main goal of this idea is to make it available for Fedora.

My Work


  • Modifying the source code module by module to convert it to RPM-based, including replacing the apt command code with the dnf command code or fit both of them, changing the Deb-based packages into RPM-based packages which play the same roles and testing after each module finished.
  • Add the guide of RPM-based package to Plinth User Guide and create a wiki page for it in Fedora.

This is the welcome page which is run in Fedora:


To Do

  • Some packages which is needed by Plinth, but I can’t find their suitable replacement or effective solution in Fedora, except copying them from Debian directly. For example:
    • Javascript — many pages can’t be loaded perfectly because of that.
    • LDAP — we can’t complete set up because of that.
  • Make a RPM package for Plinth from source and setup a repo for it in Copr.



As why Fedora, just because Fedora is the Linux distribution I use the most, so I want to know more about it and make contributions to it, and I believe GSOC is a good chance to integrate into a community, because I had the similar experience in GNOME during Outreachy. And when I went to Taipei for the COSCUP 2017 in early August, I joined the offline meeting of Fedora Taiwan and advertised GSoC to others.

I must say the last three months in GSoC was a quite valuable experience for me. This idea is not easy as I thought, I learned more about the difference between .rpm and .deb during this period, and my VPN was blocked in the second phase. Fortunately, I dealt most problems I met under my try and my mentor’s guide.

At last, thanks to Google and Fedora for giving me this opportunity, and thanks to my mentor, our admin and the people from Fedora and Debian who had given me help.


This work by Mandy Wang is licensed under a Creative Commons Attribution-ShareAlike 4.0 International




Summer of coding on the Fedora Media Writer - Work Product Submission

Posted by squimrel on August 28, 2017 11:00 AM

Fedora Media Writer - Persistent storage

Summer of code with Google

I worked on a feature for the Fedora Media Writer that makes it possible to persistently store data while booted into a live system. The Fedora Media Writer makes the portable media device bootable using an ISO image.

The Fedora Media Writer is written in C++ and targets Linux, Mac and even Windows users.

Work done

My work can be found for each repository at the following locations (ordered by amount of work done):

All changes are also available in this patch directory.

  • In total git says that I added 7.5k lines and removed 5k lines of code.
  • I removed around 0.2k more lines from the MediaWriter than I added.

Community Bonding

In the community bonding period I looked at other well known projects that can create bootable devices with persistent storage. Most was Linux only and the rest was horrible. The approach of the projects I've looked at is to create a FAT partition on the disk, copy what's needed from the ISO image and make the disk bootable.

My mentor suggested that we should rather not change the way how the portable media device is currently made bootable with the Fedora Media Writer. Which is by copying the ISO image directly to disk (dd-like).

Manipulate an ISO 9660 image

Therefore we took an approach that I've not yet seen implemented anywhere. I manipulated the ISO image, which is supposed to be a read-only file system in-place without extracting and repacking. To do that that I wrote a library that messes with the ISO 9660 file system.

The main task was to modify a couple grub.cfg files to make persistent storage happen. Dracut does the rest for me. The difficulty was to edit grub.cfg files which are inside of an HFS+ or FAT image that is stored on the ISO 9660 image. Since I didn't find any cross-platform C or C++ library for dealing with that either and I didn't want to write a library for stuff like this again the result is a bunch of hacky code that at least works.

Create writable space for the overlay

Since ISO 9660 is a read-only file system I needed to create and format another partition behind the current one. This was trouble mainly due to the isohybrid layout but also due to the fact that this had to work on all three platforms mentioned above and that I couldn't find a library that just does this for me.

I ended up manually adding the partition to the MBR partition table using C++ and then wrote up some code that creates a FAT32 partition with an OVERLAY.IMG file that is used to persistently store data. I made some mistakes along the way which were fixed in the last week of summer of code.


Along the way I also submitted code to udisks and libblockdev since the initial idea was to add the partition through their interface at least on Linux but isohybrid stood in the way.

I tinkered a bit with isomd5sum in community bonding period. A project that is used to store the checksum of an ISO image inside the ISO image itself which is then used at boot time to check if the media is alright and it's also used by the Fedora Media Writer. It isn't a proper dependency though since its source code was simply dropped into the FMW repository at some point which is against the Fedora Packaging guidelines. To make it a proper dependency it needs to become a MinGW package and it had to be maintained a bit after I refactored it in the Community Bonding period so I also worked on that during summer of code.

What's left


This does not yet work on Windows. I started debugging Windows late but eventually I figured out that on the file descriptor _open_osfhandle gave me I have to perform 512-byte aligned operations which I don't do when dealing with persistent storage. I'm not quite sure how to fix this properly but apart from that enabling persistent storage would work on Windows too.


Still needs to appear as a MinGW package. bcl is a great developer who maintains a lot of projects so he doesn't have much time for my isomd5sum PRs. Therefore there's still a PR waiting to be merged which consists of MinGW support and a build script for it. This is important because if the code I wrote would be merged the Windows build would not work without this package.


Still needs to pass the package review process so that Fedora Media Writer can link against it.

Summer of coding on the Fedora Media Writer - Work Product Submission

Posted by squimrel on August 28, 2017 11:00 AM

Fedora Media Writer - Persistent storage

Summer of code with Google

I worked on a feature for the Fedora Media Writer that makes it possible to persistently store data while booted into a live system. The Fedora Media Writer makes the portable media device bootable using an ISO image.

The Fedora Media Writer is written in C++ and targets Linux, Mac and even Windows users.

Work done

My work can be found for each repository at the following locations (ordered by amount of work done):

All changes are also available in this patch directory.

  • In total git says that I added 7.5k lines and removed 5k lines of code.
  • I removed around 0.2k more lines from the MediaWriter than I added.

Community Bonding

In the community bonding period I looked at other well known projects that can create bootable devices with persistent storage. Most was Linux only and the rest was horrible. The approach of the projects I've looked at is to create a FAT partition on the disk, copy what's needed from the ISO image and make the disk bootable.

My mentor suggested that we should rather not change the way how the portable media device is currently made bootable with the Fedora Media Writer. Which is by copying the ISO image directly to disk (dd-like).

Manipulate an ISO 9660 image

Therefore we took an approach that I've not yet seen implemented anywhere. I manipulated the ISO image, which is supposed to be a read-only file system in-place without extracting and repacking. To do that that I wrote a library that messes with the ISO 9660 file system.

The main task was to modify a couple grub.cfg files to make persistent storage happen. Dracut does the rest for me. The difficulty was to edit grub.cfg files which are inside of an HFS+ or FAT image that is stored on the ISO 9660 image. Since I didn't find any cross-platform C or C++ library for dealing with that either and I didn't want to write a library for stuff like this again the result is a bunch of hacky code that at least works.

Create writable space for the overlay

Since ISO 9660 is a read-only file system I needed to create and format another partition behind the current one. This was trouble mainly due to the isohybrid layout but also due to the fact that this had to work on all three platforms mentioned above and that I couldn't find a library that just does this for me.

I ended up manually adding the partition to the MBR partition table using C++ and then wrote up some code that creates a FAT32 partition with an OVERLAY.IMG file that is used to persistently store data. I made some mistakes along the way which were fixed in the last week of summer of code.


Along the way I also submitted code to udisks and libblockdev since the initial idea was to add the partition through their interface at least on Linux but isohybrid stood in the way.

I tinkered a bit with isomd5sum in community bonding period. A project that is used to store the checksum of an ISO image inside the ISO image itself which is then used at boot time to check if the media is alright and it's also used by the Fedora Media Writer. It isn't a proper dependency though since its source code was simply dropped into the FMW repository at some point which is against the Fedora Packaging guidelines. To make it a proper dependency it needs to become a MinGW package and it had to be maintained a bit after I refactored it in the Community Bonding period so I also worked on that during summer of code.

What's left


This does not yet work on Windows. I started debugging Windows late but eventually I figured out that on the file descriptor _open_osfhandle gave me I have to perform 512-byte aligned operations which I don't do when dealing with persistent storage. I'm not quite sure how to fix this properly but apart from that enabling persistent storage would work on Windows too.


Still needs to appear as a MinGW package. bcl is a great developer who maintains a lot of projects so he doesn't have much time for my isomd5sum PRs. Therefore there's still a PR waiting to be merged which consists of MinGW support and a build script for it. This is important because if the code I wrote would be merged the Windows build would not work without this package.


Still needs to pass the package review process so that Fedora Media Writer can link against it.

GSoC: Final Report

Posted by David Carlos on August 27, 2017 06:20 PM

GSoC: Final Report


This is the final report of my work on Google Summer of Code program. My name is David Carlos and I am a Brazilian software engineering student, at University of Brasilia. I already work as programmer, and really love what I do for a living. When I am not working I am with my family and friends, enjoying good beer and listening to the best Brazilian music style, Samba.

Google Summer of Code 2017

The first time that I heard about GSoC was when a friend from University of Brasilia was accepted on the program, working with the Debian distribution. I already had contributed with open source projects before, but nothing compared with projects like Debian or Fedora. Participating on GSoC this year was the best programming experience that I have ever had. The felling of making part of the Fedora community and writing code that can be used by other people is the best thing that I got from the program. As my project was a new experimental tool, I think that my interaction with the community could be better. Receiving feedback of other Fedora developers, beyond my GSoC mentor, could have improved my skills as a programmer even more, as well as my interaction with the people that make Fedora happen.

Participating of GSoC this year put me on another level as programmer, and my main objective is to help the Fedora community even more, to keep delivering this great operating system.


Static analyzers are computer programs that analyze other computer programs. This is generally done by checking source code through static analysis methods. This is a good means to support software assurance, since static analysis can in theory enumerate all possible interactions in a program, having the potential to find rare occurrences that would be harder to find with automated testing.

kiskadee is a system designed to support continuous static analysis in software repositories using different static analyzers and to store this information in a database. Based on such database information, kiskadee will rank warnings reported by the different static analyzers, where warnings with the highest rank are more likely to indicate real and more critical software flaws, while warnings with the lowest rank are more likely to be false positives. In this context, a warning is a single issue produced by a static analyzer. Finally, kiskadee maps software flaws inserted in specific software versions, providing developers with a relatively small list of warnings to be investigated in a suggested order.

To accomplish the process of monitoring and analysis, we defined an architecture that allows us to download software packages source code and run different static analyzers on them, saving the results using the Firehose project. Figure 1 shows this architecture:

Drawing ..

Figure 1: Kiskadee architecture.

The ideia is to have the monitoring part of kiskadee decoupled from the analysis part, and to allow easy integration of new static analyzers. For a more complete view of why we have built kiskadee this way, check our documentation. kiskadee was developed from the ground, and the architecture presented on Figure 1 was designed with my GSoC mentor. To check all the work I put on kiskadee during GSoC, you can download this file, with all my commits (these commits not includes the user interface work, that is being hosted on another repository).

kiskadee is still in a development stage, and is an experimental project. Our long term objective, is to have kiskadee running on Fedora infrastructure, as proposed by the Static Analysis SIG. You can help us on the development of kiskadee, opening issues, fixing bugs or with new features. On this link you can find our repository, and here the documentation. The steps to run kiskadee can be found in our README.

On the last release of kiskadee we had developed an API, that exposes some endpoints that returns several informations about our static analysis. These endpoints are:

    - An endpoint to list all analyzed packages.
    - An endpoint to return the analysis of a specific package.
    - An endpoint to list the fetchers available for monitoring.

With the API we also developed an experimental user interface to consume it. This user interface is not in production, and it will probably have significant changes in the next months. Image 2 shows a list of packages analyzed by kiskadee:

Drawing ..

Figure Two: Kiskadee UI.

You can check the development of the user interface on this repository.


I would like to thank the entire Fedora community, by had accepted my proposal on GSoC. I also want to thank my mentor, Athos Ribeiro, for the support through this months. His help on the creative process of kiskadee development, and the freedom that he gave me to take technical decisions, made me a better programmer. I received a lot from Fedora, and now it's time to give back to the community my work and dedication.

Final GSoC 2017 Report

Posted by Ilias Stamatis on August 23, 2017 07:12 PM

This is a summary of what I did as GSoC participant working for the 389 Directory Server project.


The initial and main goal of this project was to develop command line tools for managing some of the existing Directory Server plugins in an easier way. This goal has been fulfilled and I additionally did some extra work on the lib389 python administration framework. Moreover, I experimented a bit with the C code base – the actual server – and worked on a few issues there as well.

What I did in sort:

  • Added dsconf support for managing the following plugins: MemberOf plugin, Referential Integrity plugin, USN plugin, Rootdn Access Control plugin.
  • Wrote code for executing server tasks through dsconf, such as Dynamic Schema Reload, MemberOf fixup task, etc.
  • Wrote command line tests and in some cases functional tests as well for testing the functionality of some plugins along with their command line behavior.
  • Worked on various other smaller issues on the lib389 codebase from fixing bugs and test cases, and adding new functionality to helping port things to python3.
  • Did some work on the 389-ds-base codebase for performing syntax checking on plugins and logging error messages/warnings when needed.


I don’t understand what you did exactly, can you show me examples? Sure. Until recently you had to apply LDIF files for configuring server plugins . You can see such an example of applying an LDIF in a previous post of mine: dsconf: Adding support for MemberOf plug-in. Or for checking a plugin’s status (whether it’s enabled or disabled) you had to make an LDAP search using some utility like the ldapsearch tool.

Let’s see a few cool things we can now do with the new command line toolkit.

If we want to examine the status of USN plugin it’s only a matter of:

~ $ dsconf localhost usn status 
USN is disabled

…then enable it because it is off:

~ $ dsconf localhost usn enable
Enabled USN

…and turn the global mode on:

~ $ dsconf localhost usn global on
USN global mode enabled

Let’s say I want to set the update the interval of Referential Integrity plugin:

~ $ dsconf localhost referint delay 60
referint-update-delay set to "60"

…and then see on which attributes referint is operating:

~ $ dsconf localhost referint attrs
referint-membership-attr: member
referint-membership-attr: uniquemember
referint-membership-attr: owner
referint-membership-attr: seeAlso

Now, let’s play with memberOf. Say I want to make it work on all backends:

~ $ dsconf localhost memberof allbackends on
memberOfAllBackends enabled successfully

…but I don’t want it to skip nested groups:

~ $ dsconf localhost memberof skipnested off
memberOfSkipNested unset successfully

I wonder if I can see the whole memberOf config entry at once. Of course!

~ $ dsconf localhost memberof show
dn: cn=MemberOf Plugin,cn=plugins,cn=config
cn: MemberOf Plugin
memberofallbackends: on
memberofattr: memberOf
memberofgroupattr: member
memberofgroupattr: uniqueMember
memberofskipnested: off
nsslapd-plugin-depends-on-type: database
nsslapd-pluginDescription: memberof plugin
nsslapd-pluginEnabled: on
nsslapd-pluginId: memberof
nsslapd-pluginInitfunc: memberof_postop_init
nsslapd-pluginPath: libmemberof-plugin
nsslapd-pluginType: betxnpostoperation
nsslapd-pluginVendor: 389 Project
objectClass: top
objectClass: nsSlapdPlugin
objectClass: extensibleObject

And what about running the MemberOf fixup task?

~ $ dsconf localhost memberof fixup -b "dc=example,dc=com"  
Attempting to add task entry... This will fail if MemberOf plug-in is not enabled.
Successfully added task entry cn=memberOf_fixup_2017_08_23_18_49_50,cn=memberOf task,cn=tasks,cn=config

How about interacting with Rootdn Access Control plugin now in order to configure IP based access control for the Directory Manager?

~ $ dsconf localhost rootdn ip allow added to rootdn-allow-ip
~ $ dsconf localhost rootdn ip allow added to rootdn-allow-ip
~ $ dsconf localhost rootdn ip deny "192.168.1.*"
192.168.1.* added to rootdn-deny-ip

And if I actually want to inspect what I did:

~ $ dsconf localhost rootdn ip 

rootdn-deny-ip: 192.168.1.*

These are just a few examples and by using the new command line toolkit you can actually do every operation that is possible by applying LDIF files or doing LDAP searches, only quicker and easier now.


So why would I prefer to use dsconf to configure the server instead of doing it the “standard” way? Well, for starters because it’s obviously easier. Applying LDIFs for performing the simplest operations can soon become very tedious.

But there’s actually an even bigger advantage. The API now becomes _discoverable_. The admin now doesn’t need to go to the docs to see what options are there for each plugin (and you do know that most admins are lazy, right?).

For example, if they want to know what they can do with the USN plugin they can type:

~ $ dsconf localhost usn -h 
usage: dsconf instance usn [-h]
 {show,enable,disable,status,global,cleanup} ...

positional arguments:
    show              display plugin configuration
    enable            enable plugin
    disable           disable plugin
    status            display plugin status
    global            get or manage global usn mode
    cleanup           run the USN tombstone cleanup task

Or if they wants to see what options there are for configuring IP based access control for the Directory Manger they can run:

~ $ dsconf localhost rootdn ip -h 
usage: dsconf instance rootdn ip [-h] {allow,deny,clear} ...

positional arguments:
 {allow,deny,clear}   action
    allow             allow IP addr or IP addr range
    deny              deny IP addr or IP addr range
    clear             reset IP-based access policy

We additionally wish to add extensive help for dsconf in the future and better command line documentation of what each attribute does and how it can be configured. The ultimate goal is to make dsconf and friends a complete one stop for admins when configuring the server.


Enough talk! Show me the code!

All lib389 commits: https://goo.gl/WWAHmH
All 389-ds-base commits: https://goo.gl/eSaS1w

Although numbers do not mean a lot in that case, according to git 3,337 LOC have been added to the lib389 code base and 132 LOC have been removed in a total of 19 commits that have been merged.

Bugs Found

While working on the project I have discovered a few bugs (and fixed some of them).

Here are some examples:

  • #49284 – DS crashes when trying to completely remove some optional memberOf attributes
  • #49309 – Referential Integrity does not perform syntax checking on referint-update-delay
  • #49341 – dbscan is broken for entryrdn
  • #49224 – prefixdir in config.log set to ‘NONE’
  • #49274 – Memberof autoaddoc attribute shall accept only specific object classes

Future Work

There are many more interesting challenges to be faced, and that’s why I plan to continue contributing to the project whenever I can and whenever I find interesting tickets to work on.

I have already started working on an issue about performing a database dump on emergency recovery scenarios that will be completed outside the GSoC period. I’m looking forward to learning new things and facing new challenges!


I can’t emphasize enough how grateful I am for having the chance to have William Brown as a mentor. I learned so much during this period and William constantly motivated me to do my best for the project and the community. I would also like to thank the DS team, and the Fedora Project in general for this great experience and collaboration.

GSoC 2017 - Mentor Report from 389 Project

Posted by William Brown on August 23, 2017 02:00 PM

GSoC 2017 - Mentor Report from 389 Project

This year I have had the pleasure of being a mentor for the Google Summer of Code program, as part of the Fedora Project organisation. I was representing the 389 Directory Server Project and offered students the oppurtunity to work on our command line tools written in python.


From the start we have a large number of really talented students apply to the project. This was one of the hardest parts of the process was to choose a student, given that I wanted to mentor all of them. Sadly I only have so many hours in the day, so we chose Ilias, a student from Greece. What really stood out was his interest in learning about the project, and his desire to really be part of the community after the project concluded.

The project

The project was very deliberately “loose” in it’s specification. Rather than giving Ilias a fixed goal of you will implement X, Y and Z, I chose to set a “broad and vague” task. Initially I asked him to investigate a single area of the code (the MemberOf plugin). As he investigated this, he started to learn more about the server, ask questions, and open doors for himself to the next tasks of the project. As these smaller questions and self discoveries stacked up, I found myself watching Ilias start to become a really complete developer, who could be called a true part of our community.

Ilias’ work was exceptional, and he has documented it in his final report here .

Since his work is complete, he is now free to work on any task that takes his interest, and he has picked a good one! He has now started to dive deep into the server internals, looking at part of our backend internals and how we dump databases from id2entry to various output formats.

What next?

I will be participating next year - Sadly, I think the python project oppurtunities may be more limited as we have to finish many of these tasks to release our new CLI toolset. This is almost a shame as the python components are a great place to start as they ease a new contributor into the broader concepts of LDAP and the project structure as a whole.

Next year I really want to give this oppurtunity to an under-represented group in tech (female, poc, etc). I personally have been really inspired by Noriko and I hope to have the oppurtunity to pass on her lessons to another aspiring student. We need more engineers like her in the world, and I want to help create that future.

Advice for future mentors

Mentoring is not for everyone. It’s not a task which you can just send a couple of emails and be done every day.

Mentoring is a process that requires engagement with the student, and communication and the relationship is key to this. What worked well was meeting early in the project, and working out what community worked best for us. We found that email questions and responses worked (given we are on nearly opposite sides of the Earth) worked well, along with irc conversations to help fix up any other questions. It would not be uncommon for me to spend at least 1 or 2 hours a day working through emails from Ilias and discussions on IRC.

A really important aspect of this communication is how you do it. You have to balance positive communication and encouragement, along with critcism that is constructive and helpful. Empathy is a super important part of this equation.

My number one piece of advice would be that you need to create an environment where questions are encouraged and welcome. You can never be dismissive of questions. If ever you dismiss a question as “silly” or “dumb”, you will hinder a student from wanting to ask more questions. If you can’t answer the question immediately, send a response saying “hey I know this is important, but I’m really busy, I’ll answer you as soon as I can”.

Over time you can use these questions to help teach lessons for the student to make their own discoveries. For example, when Ilias would ask how something worked, I would send my response structured in the way I approached the problem. I would send back links to code, my thoughts, and how I arrived at the conclusion. This not only answered the question but gave a subtle lesson in how to research our codebase to arrive at your own solutions. After a few of these emails, I’m sure that Ilias has now become self sufficent in his research of the code base.

Another valuable skill is that overtime you can help to build confidence through these questions. To start with Ilias would ask “how to implement” something, and I would answer. Over time, he would start to provide ideas on how to implement a solution, and I would say “X is the right one”. As time went on I started to answer his question with “What do you think is the right solution and why?”. These exchanges and justifications have (I hope) helped him to become more confident in his ideas, the presentation of them, and justification of his solutions. It’s led to this excellent exchange on our mailing lists, where Ilias is discussing the solutions to a problem with the broader community, and working to a really great answer.

Final thoughts

This has been a great experience for myself and Ilias, and I really look forward to helping another student next year. I’m sure that Ilias will go on to do great things, and I’m happy to have been part of his journey.

Week twelve: Summer of coding report

Posted by squimrel on August 23, 2017 09:41 AM

Fix stuff to wrap up

This is the final evaluation week of summer of code. It turns out when booting with the fat partition that was created by the code I wrote there were a lot of errors. It seemed funny to me because mounting on Linux worked fine.

FAT Specification

Since I didn’t bother to read most of the specification and learned how fat works mainly from dosfstools I forgot to maintain the cluster chain.

A fat file-system is divided into clusters which vary in size based on the fat file-system size. There’s a fat allocation table which marks which clusters have been used by files. If you store a file that uses cluster number 4 to 100 you have to mark all those clusters as used otherwise fat could be unhappy.
It felt a bit ironic to do that because there’s already a marker that “hints” where the next free cluster is and the file size and cluster location is provided by every extent. I guess it could help a bit when a lot of i/o was done on the file-system and it’s messed up and fragmented.

Also I forgot to specify how many free clusters the file-system has.
Feel free to look at the fix.

Boot with persistent storage on Mac (Flags in HFS+)

Support for that was added but it’s very ugly code. Maybe the ugliest I’ve ever written and it doesn’t work on Windows yet because it uses memmem and that’s not available on Windows even though we’re compiling with MinGW.

I broke isomd5sum for Windows

At least it seems to calculate the incorrect checksum even though I fixed some bugs I introduced. Not sure what exactly is the issue. Debugging this is hard even though I got proper debugging tools on Windows. I’ll hope I’ll find the issue in the next couple days before I submit the evaluation.

Performing a complete DB dump in LDIF format (2/3)

Posted by Ilias Stamatis on August 20, 2017 11:54 PM

In the previous post we’ve seen about how data is actually stored by the Directory Server. An extremely important feature for any directory-like service is the ability to backup and restore data as easy as possible. In our case we need to be able to export data in LDIF format which is the standard for representing LDAP directory content, and of course the opposite as well; import directory entries from an ldif file.

For this exact purpose there’s the db2ldif tool provided by 389 Directory Server which deals with exporting data from the server. The thing is though, that db2ldif relies on a fully functional and working server. That is, if the server is not working you cannot export the data! That of course is a pity. As we have seen in the previous post, all directory content is actually stored in the id2entry.db file.

That is what ticket #47567 is all about. We want to be able to do a complete database dump in LDIF format for emergency recovery scenarios where all else has been damaged / corrupted (and assuming that id2entry is still readable of course). This functionality will be added in the dbscan tool.

When doing a simple “dbscan -f” the result is not a valid LDIF format, but it is close to it. So let’s focus on the modifications we need to do in this case.

Again, here’s an example entry from id2entry.db as displayed by dbscan:

[root@fedorapc userRoot]# dbscan -K 3 -f id2entry.db
id 3
 rdn: ou=Groups
 objectClass: top
 objectClass: organizationalunit
 ou: Groups
 nsUniqueId: 88c7da02-6b3f11e7-a286de5a-31abe958
 createTimestamp: 20170717223028Z
 modifyTimestamp: 20170717223028Z
 parentid: 1
 entryid: 3
 numSubordinates: 5

First of all, the DN of the entry is missing. We have everything we need to build it though. We have the entry’s RDN and we know that DN = RDN + parent’s DN. Given that we have the parent’s id as well, we need to walk this all the way back to the root entry (identified by the absence of the parentid field) and join these to make a DN.

Second, we need to properly format the entry. That means that operational attributes such as createTimestamp etc. along with attributes like parentid, entryid need to be removed from the final result.

Third, if the purpose is to create a useful ldif file which could eventually be used for import, then formatting an entry correctly is not enough. Order of entries matters as well: parents need to come before children. So we can’t just print entries in the order that they are stored in id2entry.db. We have to care about this as well.

Regarding the first point – DN construction – you may actually ask: “Why don’t we store the DN in the entry directly in the first place to avoid all this hassle?” or at least that’s what I wondered when I was thinking about the problem. The answer is simple; modrdn. If we rename an entry, we can literally just “re-parent” it by changing the entry’s parentid. This moves the entry and all its children in a single operation.

Here is what we have discussed in the lists so far regarding this issue (it contains some possible implementation suggestions):

There’s more to be discussed however and this issue will take some time probably. In these blog posts I tried to briefly describe what we wish to do in a higher level without getting too much into implementation details. Obviously there’s not enough time to finish this as part of GSoC, but I will continue working on this issue and I’ll give updates here. In part 3 of this post I’ll come back with implementation details after there are working examples.

Next post will be a final GSoC report, as the program comes to its end, summarizing what I did this summer working in the 389 Directory Server project.

Week eleven: Summer of coding report

Posted by squimrel on August 16, 2017 12:33 AM

Persistent storage works with UEFI

Most of the work I’m doing has to do with byte fiddling since I want to develop cross-platform solutions in areas where no cross-platform libraries are available.

It might be better to develop a library for things like this but the use-cases I’m working on are so tiny that for me it’s just not worth it to write a library for them.

A quick overview of things I needed to do manually because I didn’t find a “cross-platform” C or C++ library that does them. Keep in mind that I might have done crappy research.

  • Modify a file on an ISO 9660 file system.
  • Add a partition to an MBR partition table.
  • Create a FAT file system.
  • Modify a file on a FAT file system.

It’s pretty sad that after so many years no one has found the time to write libraries for these simple tasks. Maybe it’s just not the most fun thing to do.

I solved all these using byte fiddling. Except for ISO 9660 I started to write a library that could be extend in the future to be able to do more than modify a file on it. In particular it should be noted that ISO 9660 is a read-only file system and therefore not meant to be modified.

efiboot.img byte fiddling

Making persistent storage work on UEFI is just a matter of adding a couple switches to the grub.cfg which is inside a FAT file system that is stored inside of an ISO 9660 file system as a file called efiboot.img.

UEFI uses this file because there’s an EFI partition that starts at at the same block as the efiboot.img file content.

I had to figure out where the position of the grub.cfg inside the efiboot.img is, add the corresponding switches and tell the FAT files system the new file size of the grub.cfg file.


Booting into the live system with persistent storage enabled on Mac does not work because for that I’d have to modify a file on an HFS+ file system and HFS+ which I haven’t done yet and there’s no cross-platform library for this as always.

What’s next

Since this is the last week before final evaluations I’ll spend the rest of the time debugging the application on Windows and therefore I’ll not add the missing switch to the HFS+ file system this summer.

Building for Windows works but it crashes for reasons I don’t yet know. I don’t have much experience debugging on Windows so this will be fun -.-. I’ve got gdb peda ❤ running on my Windows VM so I’ll most likely be just fine.

Performing a complete DB dump in LDIF format (1/3)

Posted by Ilias Stamatis on August 12, 2017 12:34 PM

Let’s dive a little bit deeper into the Directory Server’s internals this time. As I mentioned in my previous post I have started working lately on ticket #47567 and here I’m going to explain what it is about. I’ll split this post in 3 parts for easier reading. I’ll first explain some key concepts about how data are actually stored in Directory Server, and then I’ll talk about what we want to achieve.

Database backend & Berkeley DB

The database backend of Directory Server is implemented as a layer above the Berkeley DB storage manager (BDB). Berkeley DB takes care of lower level functions such as maintaining the B-Trees, transaction logging, page pool management, recovery and page-level locking. The sever backend on the other hand handles higher level functions such as indexing, query optimization, query execution, bulk load, archive and restore, entry caching and record-level locking. Berkeley DB is not a relational database. It stores data in a key / value format.

DB files & dbscan

The BDB files used by DS can be found under /var/lib/dirsrv/slapd-instance/db/backend.

For example here are the database files of the userRoot backend for a DS instance named localhost:

[root@fedorapc db]# ls -l /opt/dirsrv/var/lib/dirsrv/slapd-localhost/db/userRoot
total 276
-rw-------. 1 dirsrv dirsrv 16384 Jul 31 00:47 aci.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 ancestorid.db
-rw-------. 1 dirsrv dirsrv 16384 Jul 31 00:47 cn.db
-rw-------. 1 dirsrv dirsrv 51 Aug 10 14:03 DBVERSION
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 entryrdn.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 entryusn.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 id2entry.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 2 16:22 member.db
-rw-------. 1 dirsrv dirsrv 16384 Jul 31 00:47 memberOf.db
-rw-------. 1 dirsrv dirsrv 16384 Jul 31 00:47 nsuniqueid.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 numsubordinates.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 2 15:48 objectclass.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 owner.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 parentid.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 seeAlso.db
-rw-------. 1 dirsrv dirsrv 16384 Jul 31 00:47 sn.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 uid.db
-rw-------. 1 dirsrv dirsrv 16384 Aug 4 17:01 uniquemember.db

At this point, let’s introduce dbscan. dbscan is a command line tool (provided by DS) that scans a database file and dumps its contents.

DB files: index files

Now, most of the above listed files are index files, used to speed up DS searches. For example sn.db is the index file used for the “sn” attribute (surname). We can see its contents by using dbscan:

[root@fedorapc userRoot]# dbscan -f sn.db 

Its output is not that lengthy because only 2 sn attributes exist in the database so far. In a similar way, cn.db is the index file for the cn attribute (common name) etc.

DB files: id2entry.db

Let’s now see a very important file; id2entry.db. Here is where the actual data are stored. Every single directory entry is stored as key – value record in this file, by having its ID used as the key, and the actual data as the value.

One useful feature of dbscan is that it accepts many options. If we do “dbscan -f id2entry.db”, dbscan will list all directory entries. Instead, we can just display a single entry if we wish by using the -K option.

So, to print entry with ID 3:

[root@fedorapc userRoot]# dbscan -K 3 -f id2entry.db
id 3
    rdn: ou=Groups
    objectClass: top
    objectClass: organizationalunit
    ou: Groups
    nsUniqueId: 88c7da02-6b3f11e7-a286de5a-31abe958
    createTimestamp: 20170717223028Z
    modifyTimestamp: 20170717223028Z
    parentid: 1
    entryid: 3
    numSubordinates: 5

Important things to notice:

  • All real attributes are being displayed, including operational attributes.
  • The DN of an entry is not stored in the database, only its RDN.
  • Every entry (except for the root entry), has a parentid attribute wich links to its parent.
  • There’s a numSubordinates attribute, indicating how many children this entry has. If this attribute is absent, it means that the entry has no children.

We will see why these observations matter later. For now, let’s continue with another database file.

DB files: entryrdn.db

Lastly, let’s take a look at entryrdn.db. I’ll not list the whole output, but just a short part instead, in order to understand how we can read this file.

 ID: 3; RDN: "ou=Groups"; NRDN: "ou=groups"
 ID: 6; RDN: "cn=Accounting Managers"; NRDN: "cn=accounting managers"
 ID: 6; RDN: "cn=Accounting Managers"; NRDN: "cn=accounting managers"
 ID: 3; RDN: "ou=Groups"; NRDN: "ou=groups"
 ID: 7; RDN: "cn=HR Managers"; NRDN: "cn=hr managers"
 ID: 7; RDN: "cn=HR Managers"; NRDN: "cn=hr managers"

Every line displays the ID of the entry, the RDN of the entry, and the normalized RDN (all lowercase, no extra spaces etc.) of the entry. Additionally, we can see some more information. The 3 in the first line it means that the following entry is the one with id 3. C3 means that the following entry is a child of the entry with id 3. P6 means that the following entry is the parent of the entry with id 6. So in this way we can interpret the whole file.

In the second part of this post we will discuss about what we really want to achieve here; perform a complete database dump in valid LDIF format.


Part 2 is out: Performing a complete DB dump in LDIF format (2/3)

Week ten: Summer of coding report

Posted by squimrel on August 08, 2017 11:13 PM

The picture below includes all the UI changes that were made:


The portable media device which is made bootable using the MediaWriter does not support persistent storage if UEFI is enabled on the target computer or if the target computer is a Mac.

To enable support for those systems the overlay switch has to be added in a couple more files which are inside of .img files.

To be specific the efiboot.img is a FAT file system and the macboot.img is an HFS+ file system. On those two file systems the boot.cfg files need to be modified in-place. To do that the corresponding file system needs to be told the new size of that file.

Let’s see how far we’ll get until next week.

Another GSoC Update

Posted by Ilias Stamatis on August 08, 2017 09:00 AM

A small update on what I have been doing lately on the project.

Here are some tickets that I have worked on in the C code base:

  • #48185 referint-logchanges is not implemented in referint plugin
  • #49309 Referential Integrity does not perform proper syntax checking on referint-update-delay
  • #49329 Provide descriptive error message when neither suffix nor backend is specified for USN cleanup task
  • #49315 Add warning on startup if unauthenticated binds are enabled.

And here’s a few on the lib389 side:

  • #46 dsconf support for schema reload plugin — This one allows us to dynamically reload the schema while the server is running.
  • #45 dsconf support for rootdn access — With this plugin we enforce access control on the Directory Manger.

Since last week I have been working on another issue: https://pagure.io/389-ds-base/issue/47567

This last one is by far the most interesting ticket I have been working on this summer, since it has helped me to understand a lot of the server internals. It’s also a tricky one so it will take some time. As well, it has led to some interesting discussions in the list. In my next blog post I’ll write more details on this and I’ll explain what it is about.

Happy coding!

GSoC: Report of bug fixes

Posted by David Carlos on August 05, 2017 05:00 PM

This post is just a report of some bug fixes, done on the last released version of kiskadee. The version 0.2.3 , is the last release before the development of our API, and we decided to release it now, because some of the issues that we had fixed could disrupt the API development. The list of issues that we have fixed are:

  • #18 : Use the download method, inside kiskadee.util, to download stuff from the Internet.
  • #25 : sqlalchemy crash when more than two plugins are active.
  • #31 : In some analysis, the flawfinder parser gets into a infinite loop.
  • #32 : The temporary directory created by Docker, is not been removed.
  • #33 : Rename the plugin package to fetcher.
  • #35 : Execute runner and monitor as separate processes.
  • #37 : Anitya fails to transform a fedmsg event, on a python dictionary.
  • #38 : The Docker sdk for python cut the analysis results, when the result is too long.

Of these issues, what matters most is the #35 and #38. With the implementation of the issue #35, now the monitor and the runner component, runs in separate processes, allowing a better use of the resources of the OS. Being a process, now the runner component can run each analyzer concurrently, instead of sequentially, what will increase the speed with which kiskadee run the analysis. The issue #38 was a a bug that we found on the Docker sdk for python. When the output of a static analyzer was too long, the Docker sdk was cutting of the analysis result, and we were saving a incomplete analysis on the database. This was causing the #31 bug, because the flawfinder parser was not being able to parse a incomplete analysis.

Now we will start the kiskadee API, that must be released on version 0.3.0. With the API, we will be able to make available all the analysis done by kiskadee in a standard way, allowing other tools to interact with our database.

So you want to script gdb with python …

Posted by William Brown on August 03, 2017 02:00 PM

So you want to script gdb with python …

Gdb provides a python scripting interface. However the documentation is highly technical and not at a level that is easily accessible.

This post should read as a tutorial, to help you understand the interface and work toward creating your own python debuging tools to help make gdb usage somewhat “less” painful.

The problem

I have created a problem program called “naughty”. You can find it here .

You can compile this with the following command:

gcc -g -lpthread -o naughty naughty.c

When you run this program, your screen should be filled with:

thread ...
thread ...
thread ...
thread ...
thread ...
thread ...

It looks like we have a bug! Now, we could easily see the issue if we looked at the C code, but that’s not the point here - lets try to solve this with gdb.

gdb ./naughty
(gdb) run
[New Thread 0x7fffb9792700 (LWP 14467)]
thread ...

Uh oh! We have threads being created here. We need to find the problem thread. Lets look at all the threads backtraces then.

Thread 129 (Thread 0x7fffb3786700 (LWP 14616)):
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6

Thread 128 (Thread 0x7fffb3f87700 (LWP 14615)):
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6

Thread 127 (Thread 0x7fffb4788700 (LWP 14614)):
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6


We have 129 threads! Anyone of them could be the problem. We could just read these traces forever, but that’s a waste of time. Let’s try and script this with python to make our lives a bit easier.

Python in gdb

Python in gdb works by bringing in a copy of the python and injecting a special “gdb” module into the python run time. You can only access the gdb module from within python if you are using gdb. You can not have this work from a standard interpretter session.

We can access a dynamic python runtime from within gdb by simply calling python.

(gdb) python
>print("hello world")
>hello world

The python code only runs when you press Control D.

Another way to run your script is to import them as “new gdb commands”. This is the most useful way to use python for gdb, but it does require some boilerplate to start.

import gdb

class SimpleCommand(gdb.Command):
    def __init__(self):
        # This registers our class as "simple_command"
        super(SimpleCommand, self).__init__("simple_command", gdb.COMMAND_DATA)

    def invoke(self, arg, from_tty):
        # When we call "simple_command" from gdb, this is the method
        # that will be called.
        print("Hello from simple_command!")

# This registers our class to the gdb runtime at "source" time.

We can run the command as follows:

(gdb) source debug_naughty.py
(gdb) simple_command
Hello from simple_command!

Solving the problem with python

So we need a way to find the “idle threads”. We want to fold all the threads with the same frame signature into one, so that we can view anomalies.

First, let’s make a “stackfold” command, and get it to list the current program.

class StackFold(gdb.Command):
def __init__(self):
    super(StackFold, self).__init__("stackfold", gdb.COMMAND_DATA)

def invoke(self, arg, from_tty):
    # An inferior is the 'currently running applications'. In this case we only
    # have one.
    inferiors = gdb.inferiors()
    for inferior in inferiors:


To reload this in the gdb runtime, just run “source debug_naughty.py” again. Try running this: Note that we dumped a heap of output? Python has a neat trick that dir and help can both return strings for printing. This will help us to explore gdb’s internals inside of our program.

We can see from the inferiors that we have threads available for us to interact with:

class Inferior(builtins.object)
 |  GDB inferior object
 |  threads(...)
 |      Return all the threads of this inferior.

Given we want to fold the stacks from all our threads, we probably need to look at this! So lets get one thread from this, and have a look at it’s help.

inferiors = gdb.inferiors()
for inferior in inferiors:
    thread_iter = iter(inferior.threads())
    head_thread = next(thread_iter)

Now we can run this by re-running “source” on our script, and calling stackfold again, we see help for our threads in the system.

At this point it get’s a little bit less obvious. Gdb’s python integration relates closely to how a human would interact with gdb. In order to access the content of a thread, we need to change the gdb context to access the backtrace. If we were doing this by hand it would look like this:

(gdb) thread 121
[Switching to thread 121 (Thread 0x7fffb778e700 (LWP 14608))]
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6

We need to emulate this behaviour with our python calls. We can swap to the thread’s context with:

class InferiorThread(builtins.object)
 |  GDB thread object
 |  switch(...)
 |      switch ()
 |      Makes this the GDB selected thread.

Then once we are in the context, we need to take a different approach to explore the stack frames. We need to explore the “gdb” modules raw context.

inferiors = gdb.inferiors()
for inferior in inferiors:
    thread_iter = iter(inferior.threads())
    head_thread = next(thread_iter)
    # Move our gdb context to the selected thread here.

Now that we have selected our thread’s context, we can start to explore here. gdb can do a lot within the selected context - as a result, the help output from this call is really large, but it’s worth reading so you can understand what is possible to achieve. In our case we need to start to look at the stack frames.

To look through the frames we need to tell gdb to rewind to the “newest” frame (ie, frame 0). We can then step down through progressively older frames until we exhaust. From this we can print a rudimentary trace:


# Reset the gdb frame context to the "latest" frame.
# Now, work down the frames.
cur_frame = gdb.selected_frame()
while cur_frame is not None:
    # get the next frame down ....
    cur_frame = cur_frame.older()
(gdb) stackfold

Great! Now we just need some extra metadata from the thread to know what thread id it is so the user can go to the correct thread context. So lets display that too:


# These are the OS pid references.
(tpid, lwpid, tid) = head_thread.ptid
# This is the gdb thread number
gtid = head_thread.num
print("tpid %s, lwpid %s, tid %s, gtid %s" % (tpid, lwpid, tid, gtid))
# Reset the gdb frame context to the "latest" frame.
(gdb) stackfold
tpid 14485, lwpid 14616, tid 0, gtid 129

At this point we have enough information to fold identical stacks. We’ll iterate over every thread, and if we have seen the “pattern” before, we’ll just add the gdb thread id to the list. If we haven’t seen the pattern yet, we’ll add it. The final command looks like:

def invoke(self, arg, from_tty):
    # An inferior is the 'currently running applications'. In this case we only
    # have one.
    stack_maps = {}
    # This creates a dict where each element is keyed by backtrace.
    # Then each backtrace contains an array of "frames"
    inferiors = gdb.inferiors()
    for inferior in inferiors:
        for thread in inferior.threads():
            # Change to our threads context
            # Get the thread IDS
            (tpid, lwpid, tid) = thread.ptid
            gtid = thread.num
            # Take a human readable copy of the backtrace, we'll need this for display later.
            o = gdb.execute('bt', to_string=True)
            # Build the backtrace for comparison
            backtrace = []
            cur_frame = gdb.selected_frame()
            while cur_frame is not None:
                cur_frame = cur_frame.older()
            # Now we have a backtrace like ['pthread_cond_wait@@GLIBC_2.3.2', 'lazy_thread', 'start_thread', 'clone']
            # dicts can't use lists as keys because they are non-hashable, so we turn this into a string.
            # Remember, C functions can't have spaces in them ...
            s_backtrace = ' '.join(backtrace)
            # Let's see if it exists in the stack_maps
            if s_backtrace not in stack_maps:
                stack_maps[s_backtrace] = []
            # Now lets add this thread to the map.
            stack_maps[s_backtrace].append({'gtid': gtid, 'tpid' : tpid, 'bt': o} )
    # Now at this point we have a dict of traces, and each trace has a "list" of pids that match. Let's display them
    for smap in stack_maps:
        # Get our human readable form out.
        o = stack_maps[smap][0]['bt']
        for t in stack_maps[smap]:
            # For each thread we recorded
            print("Thread %s (LWP %s))" % (t['gtid'], t['tpid']))

Here is the final output.

(gdb) stackfold
Thread 129 (LWP 14485))
Thread 128 (LWP 14485))
Thread 127 (LWP 14485))
Thread 10 (LWP 14485))
Thread 9 (LWP 14485))
Thread 8 (LWP 14485))
Thread 7 (LWP 14485))
Thread 6 (LWP 14485))
Thread 5 (LWP 14485))
Thread 4 (LWP 14485))
Thread 3 (LWP 14485))
#0  0x00007ffff7bc38eb in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004007bc in lazy_thread (arg=0x7fffffffdfb0) at naughty.c:19
#2  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff78e936f in clone () from /lib64/libc.so.6

Thread 2 (LWP 14485))
#0  0x00007ffff78d835b in write () from /lib64/libc.so.6
#1  0x00007ffff78524fd in _IO_new_file_write () from /lib64/libc.so.6
#2  0x00007ffff7854271 in __GI__IO_do_write () from /lib64/libc.so.6
#3  0x00007ffff7854723 in __GI__IO_file_overflow () from /lib64/libc.so.6
#4  0x00007ffff7847fa2 in puts () from /lib64/libc.so.6
#5  0x00000000004007e9 in naughty_thread (arg=0x0) at naughty.c:27
#6  0x00007ffff7bbd3a9 in start_thread () from /lib64/libpthread.so.0
#7  0x00007ffff78e936f in clone () from /lib64/libc.so.6

Thread 1 (LWP 14485))
#0  0x00007ffff7bbe90d in pthread_join () from /lib64/libpthread.so.0
#1  0x00000000004008d1 in main (argc=1, argv=0x7fffffffe508) at naughty.c:51

With our stackfold command we can easily see that threads 129 through 3 have the same stack, and are idle. We can see that tread 1 is the main process waiting on the threads to join, and finally we can see that thread 2 is the culprit writing to our display.

My solution

You can find my solution to this problem as a reference implementation here .

Week nine: Summer of coding report

Posted by squimrel on August 02, 2017 01:31 AM

Overlay partition works. GUI stuff pending. EFI coming up.

The PR I’m working on can now create a partition with a FAT32 file system and an overlay file on POSIX compliant systems.

The FAT32 file system is well known for the 4GB maximum file size limitation. Therefore any devices that have a size greater than (iso image size + 4GB) will also only store a 4GB overlay file. Therefore the rest of the additional FAT32 file system may be used by the end user.
Note that some systems like Mac will not be able to mount that additional FAT32 partition because the partition entry was only added to the MBR partition table and the isohybrid layout also deploys a GPT partition table and an Apple partition table which will be preferred by some systems which therefore can’t see that partition.

Tomorrow I’ll get the UI display that the overlay partition is being written. There have also been thoughts on making the user able to skip the write check but probably I’ll come back to that once EFI boot works.

More information regarding to EFI will be incoming next week.

GSoC2017 (Fedora) — Week 5-8

Posted by Mandy Wang on July 27, 2017 04:31 PM

I continued to do the work about the migration of the Plinth. I summarized the packages which are needed by Plinth and their alternates in Fedora.

Sometimes there will be a little different in details between the packages which have the similar name in Fedora and Debian, I’ll check them one by one and find better solutions.

About libjs-bootstrap and libjs-modernizr, I can’t find a suitable alternate, so I extracted the .deb package in Fedora and put them under the javascript/.

As you know, most foreign websites are blocked in the Chinese mainland, which include all of the Google services, many frequently-used IM software, Wikipedia and so on, so we have to use VPN or other ways to connect to the servers located abroad to load these websites. But lots of VPN (include mine) in China are blocked unexpectedly in July because of some political reasons, I can’t update my blogs and codes until I find a new way to “over the wall”, so my work is delayed this moth.

And because my mentor also lives in China and he refuses to use the non-free software in China, we use gmail and telegram to communicate in normal times, so I losed contact with my mentor for many days. As soon as we are able to contact with each other in the last few days, I told him the work I did this month and the question I met. For this case, we are thinking about scaling down our project temporarily and delaying some unimportant work, I will puts forward concrete solving plans with my mentor as soon as possible.

Week eight: Summer coding report

Posted by squimrel on July 25, 2017 06:08 AM

Format a partition with the FAT32 filesystem

Modern file systems are not simple at all. FAT32 was introduced in 1996 and FAT is much older than that so it’s much simpler but still not very intuitive and easy to use. I couldn’t find a library that provides the functionality to format partitions which is weird. On Linux we usually use the mkfs.fat utility which is part of the dosfstools but dosfstools is not laid out to be used as a library.

There’s obviously the specification which I could implement but I only need it for one specific use case so that seemed overkill. The layout needed is basically always the same and looks like this when generated by mkfs.fat (ignoring empty space).

00000000: eb58 906d 6b66 732e 6661 7400 0208 2000  .X.mkfs.fat... .
00000010: 0200 0000 00f8 0000 3e00 f700 0098 2e00 ........>.......
00000020: 0028 c000 f82f 0000 0000 0000 0200 0000 .(.../..........
00000030: 0100 0600 0000 0000 0000 0000 0000 0000 ................
00000040: 8000 29fe caaf de4f 5645 524c 4159 2020 ..)....OVERLAY
00000050: 2020 4641 5433 3220 2020 0e1f be77 7cac FAT32 ...w|.
00000060: 22c0 740b 56b4 0ebb 0700 cd10 5eeb f032 ".t.V.......^..2
00000070: e4cd 16cd 19eb fe54 6869 7320 6973 206e .......This is n
00000080: 6f74 2061 2062 6f6f 7461 626c 6520 6469 ot a bootable di
00000090: 736b 2e20 2050 6c65 6173 6520 696e 7365 sk. Please inse
000000a0: 7274 2061 2062 6f6f 7461 626c 6520 666c rt a bootable fl
000000b0: 6f70 7079 2061 6e64 0d0a 7072 6573 7320 oppy and..press
000000c0: 616e 7920 6b65 7920 746f 2074 7279 2061 any key to try a
000000d0: 6761 696e 202e 2e2e 200d 0a00 0000 0000 gain ... .......
000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa ..............U.
00000200: 5252 6141 0000 0000 0000 0000 0000 0000 RRaA............
000003e0: 0000 0000 7272 4161 fdf8 1700 0200 0000 ....rrAa........
000003f0: 0000 0000 0000 0000 0000 0000 0000 55aa ..............U.
00000c00: eb58 906d 6b66 732e 6661 7400 0208 2000 .X.mkfs.fat... .
00000c10: 0200 0000 00f8 0000 3e00 f700 0098 2e00 ........>.......
00000c20: 0028 c000 f82f 0000 0000 0000 0200 0000 .(.../..........
00000c30: 0100 0600 0000 0000 0000 0000 0000 0000 ................
00000c40: 8000 29fe caaf de4f 5645 524c 4159 2020 ..)....OVERLAY
00000c50: 2020 4641 5433 3220 2020 0e1f be77 7cac FAT32 ...w|.
00000c60: 22c0 740b 56b4 0ebb 0700 cd10 5eeb f032 ".t.V.......^..2
00000c70: e4cd 16cd 19eb fe54 6869 7320 6973 206e .......This is n
00000c80: 6f74 2061 2062 6f6f 7461 626c 6520 6469 ot a bootable di
00000c90: 736b 2e20 2050 6c65 6173 6520 696e 7365 sk. Please inse
00000ca0: 7274 2061 2062 6f6f 7461 626c 6520 666c rt a bootable fl
00000cb0: 6f70 7079 2061 6e64 0d0a 7072 6573 7320 oppy and..press
00000cc0: 616e 7920 6b65 7920 746f 2074 7279 2061 any key to try a
00000cd0: 6761 696e 202e 2e2e 200d 0a00 0000 0000 gain ... .......
00000df0: 0000 0000 0000 0000 0000 0000 0000 55aa ..............U.
00004000: f8ff ff0f ffff ff0f f8ff ff0f 0000 0000 ................
00603000: f8ff ff0f ffff ff0f f8ff ff0f 0000 0000 ................
00c02000: 4f56 4552 4c41 5920 2020 2008 0000 0666 OVERLAY ....f
00c02010: f64a f64a 0000 0666 f64a 0000 0000 0000 .J.J...f.J......

I could go more in depth but if you’re interested you can always read the specification or the source code I wrote.

Some values vary based on the size of the partition like the number of sectors per cluster and the size of the fat data structure. It might also be smart to set a unique volume id.

Therefore I used the layout generated by mkfs.fat and only made minimal changes to it. The result is part of this PR.

GSoC: making static analysis with kiskadee

Posted by David Carlos on July 25, 2017 12:00 AM

This post will cover some static analysis made with kiskadee 0.2.2 [1], and what are our plans for the next major release. We will present two packages as example, demonstrating the use of the plugins and the analyzers available. Currently, kiskadee uses four static analyzers:

  • Cppcheck, version 1.79 [2]
  • Flawfinder, version 1.31 [3]
  • Clang-analyzer, version 3.9.1 [4]
  • Frama-c, version 1.fc25 [5]

In the production environment two plugins are running: the debian plugin and the anitya plugin. The anitya plugin is our mainly plugin, because with it we can monitor and analyze several upstream that are packed in Fedora. We already have talked deeply about this two plugins, here on the blog. The projets that we will show here was analyzed by the cppcheck and flawfinder tools. These projects are the Xe [6] project, monitored by the anitya plugin, and the acpitool [7] project, monitored by the debian plugin.

The Xe project was, initially, monitored by the Anitya service. The upstream released a new version, and this event was published on the fedmsg bus, by the Anitya service. The Figure One, shows the new release of the Xe project, and the Figure Two, shows the event published on fedmsg.

Figure One: New Xe release.

Figure Two: The new release event.

The anitya plugin behavior is presented on the Figure Four. Every time that the Anitya service, publish a new release event, the fedmsg-hub daemon will receive it and send it to the anitya plugin. If the new release is hosted in a place where the anitya plugin can retrieve it source code, a analysis will be made.

Figure Four: anitya plugin behavior.

The Figure Five shows a static analysis made by the flawfinder analyzer, on the Xe source code. This analysis was only possible because the anitya plugin can receive new release events published on fedmsg. On this post, we talk how this integration was made. The Figure Six shows a static analysis made by the cppcheck analyzer, also on the Xe source code.

Figure Five: Flawfinder analysis.

Figure Six: Cppcheck analysis.

The second project that we analyzed was the acpitool package, that was monitored by the debian plugin. The source code of this package was retrieved using the dget tool, available in the devscripts package. The Figure Seven presents part of the analysis made by cppcheck.

Figure Six: Cppcheck analysis.

All the analysis presented here, can be found on two backups of the kiskadee database, available here on the blog, on the following links:

You can download these backups, import then to a postgresql database, and check several other analysis that we already made. Note that this backups are before the architecture change, that we talked on the post Improvements in kiskadee architecture.

The next major release of kiskadee will bring something that we believe that will permit us to integrates kiskadee with other tools. We will start the development of a API, that will provide several endpoints to consume our database. This API is a new step to reach one of our objectives, that is facilitate the process of integrate static analyzers on the cycle of development of software.


GSoC: Improvements in kiskadee architecture

Posted by David Carlos on July 24, 2017 05:00 PM

Today I have released kiskadee 0.2.2. This minor release brings some architecture improvements, fix some bugs in the plugins and improve the log messages format. Initially, lets take a look in the kiskadee architecture implemented on the 0.2 release.

In this architecture we have two queues. One used by plugins, called packages_queue, to queue packages that should be analyzed. This queue is consumed by monitor component, to check if the enqueued package was not analyzed. The other queue, called analysis_queue, is consumed by the runner component, in order to receive from monitor, packages that must be analyzed. If a dequeued package not exists in the database, the monitor component will save it, and enqueue it in the analysis_queue queue. When a analysis is made, the runner component updates the package analysis in the database. Currently, kiskadee only generate analysis for projects implemented in C/C++, and this was a scope decision made by the kiskadee community. Analyze only projects implemented in this languages, makes several monitored packages not be analyzed, and this behavior, with the architecture 0.2, lead us to a serious problem: A package is saved in the database even if a analysis is not generated for it. This was making our database storing several packages without a static analysis of source code, turning on kiskadee a less useful tool for the ones that want to continuously check the quality of some projects.

The release 0.2.2 fix this architecture issue, creating a new queue used by runner component to send back to monitor, packages that was successfully analyzed. In these implementation, we remove from the runner source code all database operations, centering in monitor the responsibility of interact with the database. Only packages enqueued in the results_queue, will be saved in the database by the monitor component.

We also add a limit to all kiskadee queues, once that the rate that a plugin enqueue a package, is greater than the rate the runner run a analysis. With this limit, all queues will always have at most ten elements. This will make the volume of monitored projects proportional to the analyzed projects. The log messages was also improved, facilitating the tool debug. Some bug fix in Debian plugin was also made, and now some packages that were been missed, are been properly monitored. This architecture improvements make the behavior of kiskadee more stable, and this release is already running in a production environment.

Week seven: Summer coding report

Posted by squimrel on July 18, 2017 08:15 PM

The plan to get rid of all issues


The Virtual Disk Service can’t be used since I’m prohibited to try to pull Uuid.lib by the Fedora Packaging Guidelines.


I had a look at how to do this on macOS. The technical documentation in this area is even worse than on Windows in my humbly opinion.

The tool that works with partitions prefers to use the apple partition header if available. But I’d like to add a partition to the master boot record (mbr) partition table.


Quick refresher on why all the tools have trouble with the isohybrid layout.

For compatibility isohybrid fills three partition tables (mbr, gpt and apple) but I’d only like to add a primary partition to the mbr one. Basically it’s unclear to anyone which one to use since it’s not standard practice to use multiple partition tables.

The plan

Since there’s trouble with this on all three platforms which we target I’ve decided to manually add the partition to the mbr partition table.

This does work on Linux now but was a lot of trouble since it doesn’t integrate well with udisks and more debugging needs to be done to fix that.

It will definitely work on Windows and Mac next week if those two platforms don’t try to stop me. They always do.

I learned that writing things myself may be faster then trying to fix the world and whatever I do I should try to make the right decision early on because that saves a ton of time.

GSoC Update: Referint and More

Posted by Ilias Stamatis on July 12, 2017 03:48 PM

Last week I started working on dsconf support for referential integrity plug-in. Referential Integrity is a database mechanism that ensures relationships between related entries are maintained. For example, if a user’s entry is removed from the directory and Referential Integrity is enabled, the server also removes the user from any groups of which the user is a member.

While working on referint support, I discovered and reported a bug, as well as a previous request which was about replacing the plugin’s log file with standard DS log files. So I decided to take on those issues and delay the lib389 support until they are done. It was also my first attempt to do some work in the C codebase.

It turned out finally, that this logfile is not used for real server logging. It is actually part of how the plugin implements its asynchronous mode. So we couldn’t just simply get rid of it. After discussing this a bit with William, my mentor, I started a discussion on the mailing list about changing the implementation from using a file to using a queue. William even suggested to completely deprecate the asynchronous mode of referint. You can see the discussion here: https://lists.fedoraproject.org/archives/list/389-devel@lists.fedoraproject.org/thread/DB5YKUV4A2LVPPXP72OJ4KQC2H2B4G3W/?sort=date

Contrasting opinions were expressed in this debate and I’m waiting for a decision to be reached.

So, my goal at the moment is to try do some C work on the main codebase and move out of the lib389 framework for a while. This is not that easy though. Directory Server is pretty big; cloc returns more than half a million lines of code. Plus, because I don’t have much experience writing C, I’m now learning how to properly use gdb, ASan and other tools to make my life easier, while getting familiar with some parts of the C codebase.

At the same time, I’m also working on smaller python issues on lib389 such as:

So to conclude, at the moment I’m constantly jumping between issues, with my main goal being to start writing non-trivial C patches for the server.

Time safety and Rust

Posted by William Brown on July 11, 2017 02:00 PM

Time safety and Rust

Recently I have had the great fortune to work on this ticket . This was an issue that stemmed from an attempt to make clock performance faster. Previously, a call to time or clock_gettime would involve a context switch an a system call (think solaris etc). On linux we have VDSO instead, so we can easily just swap to the use of raw time calls.

The problem

So what was the problem? And how did the engineers of the past try and solve it?

DS heavily relies on time. As a result, we call time() a lot in the codebase. But this would mean context switches.

So a wrapper was made called “current_time()”, which would cache a recent output of time(), and then provide that to the caller instead of making the costly context switch. So the code had the following:

static time_t   currenttime;
static int      currenttime_set = 0;

    if ( !currenttime_set ) {
        currenttime_set = 1;

    time( &currenttime );
    return( currenttime );

current_time( void )
    if ( currenttime_set ) {
        return( currenttime );
    } else {
        return( time( (time_t *)0 ));

In another thread, we would poll this every second to update the currenttime value:

void *
time_thread(void *nothing __attribute__((unused)))
    PRIntervalTime    interval;

    interval = PR_SecondsToInterval(1);

    while(!time_shutdown) {
        csngen_update_time ();


So what is the problem here

Besides the fact that we may not poll accurately (meaning we miss seconds but always advance), this is not thread safe. The reason is that CPU’s have register and buffers that may cache both stores and writes until a series of other operations (barriers + atomics) occur to flush back out to cache. This means the time polling thread could update the clock and unless the POLLING thread issues a lock or a barrier+atomic, there is no guarantee the new value of currenttime will be seen in any other thread. This means that the only way this worked was by luck, and no one noticing that time would jump about or often just be wrong.

Clearly this is a broken design, but this is C - we can do anything.

What if this was Rust?

Rust touts mulithread safety high on it’s list. So lets try and recreate this in rust.

First, the exact same way:

use std::time::{SystemTime, Duration};
use std::thread;

static mut currenttime: Option<SystemTime> = None;

fn read_thread() {
    let interval = Duration::from_secs(1);

    for x in 0..10 {
        let c_time = currenttime.unwrap();
        println!("reading time {:?}", c_time);

fn poll_thread() {
    let interval = Duration::from_secs(1);

    for x in 0..10 {
        currenttime = Some(SystemTime::now());
        println!("polling time");

fn main() {
    let poll = thread::spawn(poll_thread);
    let read = thread::spawn(read_thread);

Rust will not compile this code.

> rustc clock.rs
error[E0133]: use of mutable static requires unsafe function or block
  --> clock.rs:13:22
13 |         let c_time = currenttime.unwrap();
   |                      ^^^^^^^^^^^ use of mutable static

error[E0133]: use of mutable static requires unsafe function or block
  --> clock.rs:22:9
22 |         currenttime = Some(SystemTime::now());
   |         ^^^^^^^^^^^ use of mutable static

error: aborting due to 2 previous errors

Rust has told us that this action is unsafe, and that we shouldn’t be modifying a global static like this.

This alone is a great reason and demonstration of why we need a language like Rust instead of C - the compiler can tell us when actions are dangerous at compile time, rather than being allowed to sit in production code for years.

For bonus marks, because Rust is stricter about types than C, we don’t have issues like:

int c_time = time();

Which is a 2038 problem in the making :)

Week six: Summer coding report

Posted by squimrel on July 10, 2017 01:44 PM

Where we’re at

The Linux build is stalled until libblockdev switched from libparted to libfdisk.


Doing the partitioning work in Windows is way more complicated than it should be.

Ideally Virtual Disk Service (VDS) should be used to add and format the partition but the symbols that are needed to talk to the Virtual Disk Service COM interface are not present in MinGW most likely because it doesn’t have a shared library. That’s why I decided not to use VDS at first. I’ll come back to this later.

Currently diskpart.exe is used as an alternative since diskpart uses the VDS COM Interface. But diskpart has problems of itself since it gets in the way of locking and the documentation says you need to wait 15 seconds after it quit. Also talking to it is slow and reading its response is tedious.

There’s a newer API called Windows Storage Management API but it’s only available on Windows 8 and above so I haven’t looked at it.

There’s also an older set of tools that were introduced in Windows XP. In contrast to that VDS was introduced in Windows Vista. It provides a DeviceIoControl function with which you can do the partitioning work and there’s the Volume Management API for mounting.
But Windows XP only has one function for formatting a partition. It’s called SHFormatDrive but it got its own GUI and requires user interaction. Users on the internet argued back in the day that formatting a drive without user interaction would be evil.

On top of that it’s really hard to use the Windows XP API. It took me a long time to figure out how to use it to add a partition because certain things happen which you wouldn’t expect.

The reason why I started to use the Windows XP API in the first place is that adding a partition usingdiskpart.exe got in the way of disk locking. Up until now diskpart.exe was only used for restoring the drive which is totally separate from the writing process.

Virtual Disk Service

As mentioned linking against VDS is hard because one has to link against the static library since there’s no dynamic one provided by Microsoft in this case. But I now got a working dummy program that loads VDS via COM.

To accomplish this I had to look at the vds.h provided by Microsoft and figure out how to write my own minimal one. That also seems to be the approach of MinGW to avoid the licensing issue. The reason why I had to look at the vds.h at all is that there’re some GUIDs which aren’t documented and you just can’t guess.
Also another evil part is that I needed to get the static library Uuid.Lib to link against it. I’m not sure how exactly the licensing works in that case but I guess as long as it’s not in the repository and just pulled from the internet to build the Windows target it should be fine.
I’m just not sure where to best pull it from yet. Also it seems pretty evil (security-wise) to pull byte-code from the internet and include it in another binary.

GSoC: The evolution of Kiskadee models.

Posted by David Carlos on July 08, 2017 01:50 PM

The 0.1 release of Kiskadee [1] brought a minimal set of functionalities that permitted us to monitor some targets and analysis some packages, serving more as a proof of concept than as a real monitoring system. One of the limitations of the 0.1 release was that Kiskadee was saving all the analysis in a single field on the packages table. Every time that we need to generate a new analysis for a same package version, we appended the result on the analysis field. The next two code snippet was picked up from the Kiskadee source code.

reports = []
with kiskadee.helpers.chdir(path):
        kiskadee.logger.debug('ANALYSIS: Unpacked!')
        analyzers = plugin.analyzers()
        for analyzer in analyzers:
                kiskadee.logger.debug('ANALYSIS: running %s ...' % analyzer)
                analysis = kiskadee.analyzers.run(analyzer, path)
                firehose_report = kiskadee.converter.to_firehose(analysis,
kiskadee.logger.debug('ANALYSIS: DONE running %s' % analyzer)
return reports
all_analyses = '\n'.join(reports)

Note that we generate a analysis for each plugin analyzer, and append it in the reports array. In other part of the runner code we generate, using a join function, a single string with all the analysis made. A bad implementation to a issue that should be solved by changing the Kiskadee models in order to support several analysis for a same package version, and that was what we have done with the PR #24.With this pull request, Kiskadee will be able to save different analysis from a same package version made by different analyzers. A lot of refactoring was made in the Runner component, specially because was difficult to implement new tests for it. The responsibility of save a analysis was removed from the analyze method, and moved to the _save_source_analysis.

def _save_source_analysis(source_to_analysis, analysis, analyzer, session):
if analysis is None:
        return None

source_name = source_to_analysis['name']
source_version = source_to_analysis['version']

package = (
        .filter(kiskadee.model.Package.name == source_name).first()
version_id = package.versions[-1].id
_analysis = kiskadee.model.Analysis()
        _analyzer = session.query(kiskadee.model.Analyzer).\
            filter(kiskadee.model.Analyzer.name == analyzer).first()
        _analysis.analyzer_id = _analyzer.id
        _analysis.version_id = version_id
        _analysis.raw = analysis
except Exception as err:
        "The required analyzer was not registered on Kiskadee"

With this pull request two new models was added to Kiskadee, analysis and analyzers. The Diagram 1 presents the current database structure of Kiskadee.

Diagram 1: Kiskadee database structure.

With the PR #24 we will fix some other issues:

  • Docker not run properly on the Jenkins VM. Issue #23
    • Docker and selinux are not good friends so we have to change the way that we were creating a container volume to run the analyzers.
  • CI integration. Issue #7
    • We now have a Jenkins instance running our tests for every push made to the pagure repository. In the future we want to add a continuous deploy to every code that be merged on master branch.

RetroFlix / PI Switch Followup

Posted by Mo Morsi on July 05, 2017 07:22 PM

I've been trying to dedicate some cycles to wrapping up the Raspberry PI entertainment center project mentioned a while back. I decided to abandon the PI Switch idea as the original controller which was purchased for it just did not work properly (or should I say only worked sporadically/intermitantly). It being a cheap device bought online, it wasn't worth the effort to debug (funny enough I can't find the device on Amazon anymore, perhaps other people were having issues...).

Not being able to find another suitable gamepad to use as the basis for a snap together portable device, I bought a Rii wireless controller (which works great out of the box!) and dropped the project (also partly due to lack of personal interest). But the previously designed wall mount works great, and after a bit of work the PI now functions as a seamless media center.

Unfortunately to get it there, a few workarounds were needed. These are listed below (in no particular order).

<style> #rpi_setup li{ margin-bottom: 10px; } </style>
  • To start off, increase your GPU memory. This we be needed to run games with any reasonable performance. This can be accomplished through the Raspberry PI configuration interface.

    Rpi setup1 Rpi setup2

    Here you can also overclock your PI if your model supports it (v3.0 does not as evident w/ the screenshot, though there are workarounds)

  • If you are having trouble w/ the PI output resolution being too large / small for your tv, try adjusting the aspect ratio on your set. Previously mine was set to "theater mode", cutting off the edges of the device output. Resetting it to normal resolved the issue.

    Rpi setup3 Rpi setup5 Rpi setup4
  • To get the Playstation SixAxis controller working via bluetooth required a few steps.
    • Unplug your playstation (since it will boot by default when the controller is activated)
    • On the PI, run
              sudo bluetoothctl
    • Start the controller and watch for a new devices in the bluetoothctl output. Make note of the device id
    • Still in the bluetoothctl command prompt, run
              trust [deviceid]
    • In the Raspberry PI bluetooth menu, click 'make discoverable' (this can also be accomplished via the bluetoothctl command prompt with the discoverable on command) Rpi setup6
    • Finally restart the controller and it should autoconnect!
  • To install recent versions of Ruby you will need to install and setup rbenv. The current version in the RPI repos is too old to be of use (of interest for RetroFlix, see below)
  • Using mednafen requires some config changes, notabley to disable opengl output and enable SDL. Specifically change the following line from
          video.driver opengl
          video.driver sdl
    Unfortunately after alot of effort, I was not able to get mupen64 working (while most games start, as confirmed by audio cues, all have black / blank screens)... so no N64 games on the PI for now ☹
  • But who needs N64 when you have Nethack! ♥‿♥(the most recent version of which works flawlessly). In addition to the small tweaks needed to compile the latest version on Linux, inorder to get the awesome Nevanda tileset working, update include/config.h to enable XPM graphics:
        -/* # define USE_XPM */ /* Disable if you do not have the XPM library */
        +#define USE_XPM  /* Disable if you do not have the XPM library */
    Once installed, edit your nh/install/games/lib/nethackdir/NetHack.ad config file (in ~ if you installed nethack there), to reference the newtileset:
        -NetHack.tile_file: x11tiles
        +NetHack.tile_file: /home/pi/Downloads/Nevanda.xpm

Finally RetroFlix received some tweaking & love. Most changes were visual optimizations and eye candy (including some nice retro fonts and colors), though a workers were also added so the actual downloads could be performed without blocking the UI. Overall it's simple and works great, a perfect portal to work on those high scores!

That's all for now, look for some more updates on the ReFS front in the near future!

Week five: Summer coding report

Posted by squimrel on July 04, 2017 01:35 AM

About the issue of last week we decided to tell libblockdev to use libfdisk instead of libparted since there are rumors that they would like to do that anyways but I’m not working on that for now since we’ll have to make progress with the actual project at stake.


This week I worked on porting the FMW to Windows. To do that I had to build and package iso9660io for MinGW which really was not a nice thing to do.

Since I moved isomd5sum out of the FMW projects source code I had to build and package isomd5sum for MinGW too.

I got source code on my end that should be doing most of what’s needed to get persistent storage working on Windows but it’s using diskpart and I would like to move away from that tool since it messes up a lot of things.

My mentor warned me that the Windows C APIs for these things are terrible and he was totally right.

GSoC: How integrates fedmsg with Kiskadee

Posted by David Carlos on June 29, 2017 06:36 PM

If you want to know why we are using fedmsg [1] with Kiskadee [2] you can check this post, where I explain the reasons for such integration. Now we will cover the implementation of this integration and the current status of Kiskadee's architecture.

fedmsg-hub it's a daemon used to interact with the fedmsg bus, and with it we can receive and send messages from and to applications. If you have cloned Kiskadee repository, create the fedmsg-hub configurations files and run the daemon.

sudo mkdir -p /etc/fedmsg.d/
sudo cp util/base.py util/endpoints.py  /etc/fedmsg.d/
sudo cp util/anityaconsumer.py /etc/fedmsg.d/
pip install -e .
PYTHONPATH=`pwd` fedmsg-hub

If everything goes ok, fedmsg-hub will start, and will use our AnityaConsumer class to consume the fedmsg bus. The endpoints.py file will tell to fedmsg-hub a list of addresses from which fedmsg can send messages, in our case, this endpoint will point to a Anitya server, where new project releases are published. Basically, a fedmsg-hub consumer is a class that inherits from fedmsg.consumers.FedmsgConsumer and implements a consume method.

The fedmsg-hub daemon runs in a separate process of Kiskadee, so we need some mechanism to make the consumer send the bus messages to Kiskadee. To do this we are using ZeroMQ [3] as a pub/sub library, in a way that the consumer publishes the incoming messages, and the anitya plugin on Kiskadee consumes this messages.

Once the message arrives in Kiskadee, the default life cycle of a source will occur.

  • The Anitya plugin will queue the source.
  • the monitor component will dequeue the source.
    • If the source version not exists on the database, save it, and queue the source to the runner component.
    • If the source version exists on the database, do nothing.
  • the runner component will dequeue the source.
  • the runner component will run a analysis, and save the result on the database.

On this post we have a better description of the Kiskadee architecture. Each service that publishes on fedmsg have several topics, each on related to a specific event. Here you can check a list of topics where Anitya publish messages. Our consumer have only one responsibility: When a message arrives, publish it on ZeroMQ server, on the anitya topic. Kiskadee will be listen to this topic, and will receive the message. Let's take a look on the consumer code:

class AnityaConsumer(fedmsg.consumers.FedmsgConsumer):
"""Consumer used by fedmsg-hub to subscribe to fedmsg bus."""

topic = 'org.release-monitoring.prod.anitya.project.version.update'
config_key = 'anityaconsumer'
validate_signatures = False

def __init__(self, *args, **kw):
        """Anityaconsumer constructor."""
        super().__init__(*args, *kw)
        context = zmq.Context()
        self.socket = context.socket(zmq.PUB)

def consume(self, msg):
        """Consume events from fedmsg-hub."""
        self.socket.send_string("%s %s" % ("anitya", str(msg)))

ZeroMQ is also used by fedmsg, so the topic variable can cover more than one Anitya topic. If we want to receive all messages published by Anitya, we just need to use a more generic topic value:

topic = 'org.release-monitoring.prod.anitya.project.*'

MemberOf Support Is Complete

Posted by Ilias Stamatis on June 28, 2017 10:09 PM

lib389 support for MemberOf plug-in is finally complete!

Here’s what I have implemented so far regarding this issue:

  • Code for configuring the plug-in using our LDAP ORM system.
  • The wiring in the dsconf tool so we can manage the plug-in from the command line.
  • Some generic utility functions to use for testing this and all future plug-ins.
  • Functional tests.
  • Command-line tests.
  • A new Task class for managing task entries based on the new lib389 code.
  • The fix-up task for MemberOf.

I have proudly written a total of 40 test cases; 8 functional and 32 cli tests.

All of my commits that have been merged into the project up to this point – not only related to MemberOf, but in general – can be found here: https://pagure.io/lib389/commits/master?author=stamatis.iliass%40gmail.com

As I’ve said again in a previous post, I have additionally discovered and reported a few bugs related to the C code of MemberOf. I have written reproducers for some of them too (test cases that prove the erroneous behavior).

At the same time, I was working on USN plug-in support as well. This is about tracking modifications to the database by using Update Sequence Numbers. When enabled, sequence numbers are assigned to an entry whenever a write operation is performed against the entry. This value is stored in a special operational attribute for each entry, called “entryusn”. The process for me was pretty much the same; config code, dsconf wiring, tests, etc. This work is also almost complete and hopefully it will be merged soon as well.

To conclude, during this first month of Google Summer of Code, I have worked on MemberOf and USN plug-ins integration, did code reviews on other team members’ patches, and worked on other small issues.

GSoC: Why integrates fedmsg with Kiskadee

Posted by David Carlos on June 28, 2017 06:20 PM

On this post we will talk why we decides to integrate Kiskadee [1] with fedmsg [2], and how this integration will enable us to easily monitors the source code of several different projects. The next post will explain how this integration was don.

Exists a initiative on Fedora called Anitya [3]. Anitya is a project version monitoring system, that monitor upstream releases and broadcast them on fedmsg. The registration of a project on Anitya it's quite simple. You will need to inform the homepage, which system used to host the project, and some other informations required by the system. After the registration process, Anitya will check, every day, if a new release of the project were released. If so, it will publish the new release, using a JSON format, on fedmsg. In the context of anitya, the systems used to host projects are called backends, and you can check all the supported backend on this link https://release-monitoring.org/about.

The Fedora infrastructure have several different services that need to talk to each other. One simple example is the AutoQA service, that listen to some events triggered by the fedpkg library. If we have only two services interacting the problem is minimal, but when several applications request and response to other several applications, the problem becomes huge. fedmsg (FEDerated MeSsaGe bus) is a python package and API defining a brokerless messaging architecture to send and receive messages to and from applications. Anitya uses this messaging architecture to publish on the bus the new releases of registered projects. Any application that is subscribed to the bus, can retrieve this events. Note that fedmsg is a whole architecture, so we need some mechanism to subscribe to the bus, and some mechanism to publish on the bus. fedmsg-hub it's a daemon used to interact with the fedmsg bus, and it's been used by Kiskadee to consume the new releases published by anitya.

Once Kiskadee can receive notifications that a new release of some project was made, and this project will be packed to Fedora, we can trigger a analysis without having to watch directly to Fedora repositories. Obviously this is a generic solution, that will analysis several upstream, including upstream that will be packed, but is a first step to achieve our goal that is help the QA team and the distribution to monitors the quality of the upstreams that will become Fedora packages.


Week four: Summer coding report

Posted by squimrel on June 26, 2017 11:04 PM

This was a sad week since I was too ill to work until Thursday evening and gone on the weekend starting from Friday. That being said I did not work on anything apart from a PR to allow the user to specify the partition type when talking to UDisks. (To be honest I’m still ill but I can do work :-).)

Anyways let me explain to you (again) how we’re trying to partition the disk on Linux and why we run in so much trouble doing so.

<figure><figcaption>How we’re trying to partition</figcaption></figure>

The good thing about using UDisks is that it’s a centralized daemon everyone can use over the bus so it can act as an event emitter and it can also manage all the devices and since the bottleneck when working with devices is disk i/o anyways a centralized daemon is not a bad idea.

Let’s focus on using UDisks to partition a disk and the current problem (not the problems discussed in previous reports).

The issue is that libparted thinks the disk is a mac label because of the isohybrid layout bootable ISO images are using so that every system can boot them. Instead it should treat the disk as a dos label. This is important because the maximum number of partitions on a mac label is only 3 and on a dos label it’s 4.

The issue could be fixed by:

  • Fixing libparted.
  • Telling libblockdev to use libfdisk instead of libparted.
  • Not using UDisks at all and instead using libfdisk directly.

But can we actually use the fourth partition on a system that runs MacOS or Windows natively?
This is a legit question because we don’t actually know if adding a partition breaks the isohybrid layout.
I’d guess that it doesn’t and I’d also guess that once the kernel took control the partition table is read “correctly” by Linux and therefore it should detect the fourth partition and work. But I don’t know yet.

Using the proof of concept. I did test this on a VM and on a Laptop that usually runs Linux and in both cases persistent storage worked fine.
At the moment my mentor tests this on a device that usually runs Windows and on a device that usually runs MacOS to see if it works over there and even though the results are not out yet it doesn’t look that good.

If this doesn’t work we’re in big trouble since we’ll have to take a totally different approach to creating a bootable device that has persistent storage enabled.

GSoC2017 (Fedora) — Week 3&4

Posted by Mandy Wang on June 26, 2017 05:28 PM

I went to Guizhou and Hunan in China for my after-graduation trip last week. I walked on the glass skywalk in Zhangjiajie, visited Huangguoshu waterfallss and Fenghuang Ancient City, ate a lot of delicious food at the night market in Guiyang and so on. I had a wonderful time there, welcome to China to experience these! (GNOME Asia, hold in Chongqing, in October is a good choice, Chongqing is a big city which has a lot of hot food and hot girls.)

The main work I did these days for GSoC is carding and detailing the work about establishing the environment for Plinth in Fedora. I realize it by some crude way before, such as using some packages in Debian directly, but now I will make these steps more clear, organize the useful information and write them into INSTALL file.

But my mentor and I had a problem when I tried to run firstboot, I don’t know which packages are needed when I want to debug JS in Fedora, in other words, I want to find which packages in Fedora has the same function with the libjs-bootstrap, libjs-jquery and libjs-modernizr in Debian. If you know how to deal with it, please tell me, I’d be grateful.

indexed search performance for ds - the mystery of the and query

Posted by William Brown on June 25, 2017 02:00 PM

indexed search performance for ds - the mystery of the and query

Directory Server is heavily based on set mathematics - one of the few topics I enjoyed during university. Our filters really boil down to set queries:


This filter describes the intersection of sets of objects containing “attr=val1” and “attr=val2”.

One of the properties of sets is that operations on them are commutative - the sets to a union or intersection may be supplied in any order with the same results. As a result, these are equivalent:


In the past I noticed an odd behaviour: that the order of filter terms in an ldapsearch query would drastically change the performance of the search. For example:


The later query may significantly outperform the former - but 10% or greater. I have never understood the reason why though. I toyed with ideas of re-arranging queries in the optimise step to put the terms in a better order, but I didn’t know what factors affected this behaviour.

Over time I realised that if you put the “more specific” filters first over the general filters, you would see a performance increase.

What was going on?

Recently I was asked to investigate a full table scan issue with range queries. This led me into an exploration of our search internals, and yielded the answer to the issue above.

Inside of directory server, our indexes are maintained as “pre-baked” searches. Rather than trying to search every object to see if a filter matches, our indexes contain a list of entries that match a term. For example:

uid=mark: 1, 2
uid=william: 3
uid=noriko: 4

From each indexed term we construct an IDList, which is the set of entries matching some term.

On a complex query we would need to intersect these. So the algorithm would iteratively apply this:

t1 = (a, b)
t2 = (c, t1)
t3 = (d, t2)

In addition, the intersection would allocate a new IDList to insert the results into.

What would happen is that if your first terms were large, we would allocate large IDLists, and do many copies into it. This would also affect later filters as we would need to check large ID spaces to perform the final intersection.

In the above example, consider a, b, c all have 10,000 candidates. This would mean t1, t2 is at least 10,000 IDs, and we need to do at least 20,000 comparisons. If d were only 3 candidates, this means that we then throw away the majority of work and allocations when we get to t3 = (d, t2).

What is the fix?

We now wrap each term in an idl_set processing api. When we get the IDList from each AVA, we insert it to the idl_set. This tracks the “minimum” IDList, and begins our intersection from the smallest matching IDList. This means that we have the quickest reduction in set size, and results in the smallest possible IDList allocation for the results. In my tests I have seen up to 10% improvement on complex queries.

For the example above, this means we procees d first, to reduce t1 to the smallest possible candidate set we can.

t1 = (d, a)
t2 = (b, t1)
t3 = (c, t2)

This means to create t2, t3, we will do an allocation that is bounded by the size of d (aka 3, rather than 10,000), we only need to perform fewer queries to reach this point.

A benefit of this strategy is that it means if on the first operation we find t1 is empty set, we can return immediately because no other intersection will have an impact on the operation.

What is next?

I still have not improved union performance - this is still somewhat affected by the ordering of terms in a filter. However, I have a number of ideas related to either bitmask indexes or disjoin set structures that can be used to improve this performance.

Stay tuned ….

Week three: Summer coding report

Posted by squimrel on June 19, 2017 08:22 PM

A tiny PR to libblockdev got merged! It added the feature to ignore libparted warnings. This is important for the project I’m working on since it’s using udisks which uses libblockdev to partition disks and that wasn’t working because of a parted warning as mentioned in my previous report.
Poorly it’s still does not work since udisks tells libblockdev to be smart about picking a partition type and since there’re already three partitions on the disks libblockdev tries to create an extended partition which fails due to the following error that is thrown by parted:

mac disk labels do not support extended partitions.

Let’s see what we’ll do about that. By the way all this has the upside that I got to know the udisks, libblockdev and parted source code.

Releasing and packaging squimrel/iso9660io is easy now since I automated it in a script.

Luckily I worked on isomd5sum before because I need to get a quite ugly patch through so that it can be used together with a file descriptor that uses the O_DIRECT flag.

A checkbox to enable persistent storage has been added to the UI.

So far the time to handle unexpected issues has been available but since next week is the last week before the first evaluations things should definitely work by then.