Recently I have been working to clean up the configuration file syntax and parsing in rpminspect. Several months back there were suggestions on fedora-devel for improving the configuration files. The ideas were good, so I added them to my to-do list and am now at a point where I can work on those changes. The main ideas:
- Move the configuration files out of /etc and into /usr/share. Have these be the defaults.
- Let local overrides exist in /etc.
- Allow for multiple rpminspect-data vendor packages to be concurrently installed.
In addition to the above, I was planning on implementing support for a local configuration file to be sourced last. Sort of like having pylintrc in a Python project to drive pylint. I wanted the ability to have rpminspect read a final configuration file for local package configuration. My thinking is that package maintainers could put a per-package rpminspect configuration file in the dist-git repo.
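The layering described above (defaults first, then local overrides, then a per-package file read last) amounts to "the last layer to set a key wins." Here is a minimal sketch of that idea in C. It is not rpminspect's actual code: the struct names, the function names, and the fixed-size table are all hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SETTINGS 16   /* arbitrary limit for this sketch */

struct setting {
    const char *key;
    const char *value;
};

struct config {
    struct setting items[MAX_SETTINGS];
    size_t count;
};

/* Set key to value, replacing any value stored by an earlier layer.
 * Returns 0 on success, -1 if the table is full. */
int config_set(struct config *cfg, const char *key, const char *value)
{
    for (size_t i = 0; i < cfg->count; i++) {
        if (strcmp(cfg->items[i].key, key) == 0) {
            cfg->items[i].value = value;
            return 0;
        }
    }

    if (cfg->count == MAX_SETTINGS) {
        return -1;
    }

    cfg->items[cfg->count].key = key;
    cfg->items[cfg->count].value = value;
    cfg->count++;
    return 0;
}

/* Look up a key; returns NULL if no layer has set it. */
const char *config_get(const struct config *cfg, const char *key)
{
    for (size_t i = 0; i < cfg->count; i++) {
        if (strcmp(cfg->items[i].key, key) == 0) {
            return cfg->items[i].value;
        }
    }

    return NULL;
}

/* Apply one layer of settings on top of whatever is already loaded. */
int config_apply_layer(struct config *cfg, const struct setting *layer, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (config_set(cfg, layer[i].key, layer[i].value) != 0) {
            return -1;
        }
    }

    return 0;
}
```

Calling config_apply_layer() once per file, in the order defaults, /etc overrides, per-package file, leaves each key holding the value from the last file that mentioned it, while keys that only the defaults set remain untouched.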
Picking A Parser
Before doing this rearrangement, I looked at the syntax of the configuration file, which has evolved over the past year as new features have been added. The file follows an INI style layout with ‘key = value’ syntax. This is a long-established convention for configuration files, and the syntax is widely understood and easy to follow. I have been using libiniparser in rpminspect to read the file.
This works, but it has presented a challenge for two types of settings I need to represent in the configuration file. The first is a simple list. INI syntax does not really allow for this in a well-defined way. I get around the limitation by making my lists space-delimited strings which I then tokenize in the source code. That is not ideal, because it is now difficult to have a list member with a space in it.
The second data type is a hash table. I want to capture user-defined key=value settings for a particular category. I get around this by making the setting be the section name (e.g., ‘[products]’), then reading every key and value within that section and adding them to my hash table. That convention is not obvious from the configuration file itself, and the syntax could lead to confusion. So cleaning all of this up has been on the to-do list.
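To make the list limitation concrete, here is a rough sketch of tokenizing a space-delimited value in C. This is an illustration of the general technique, not the code in rpminspect; the function name split_list() and its signature are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Split value in place on spaces and tabs, storing a pointer to each
 * token in out[]. At most max tokens are stored. Returns the number of
 * tokens found. The input string is modified. */
size_t split_list(char *value, char **out, size_t max)
{
    const char *delims = " \t";
    size_t n = 0;
    char *p = value;

    while (n < max) {
        p += strspn(p, delims);     /* skip leading delimiters */

        if (*p == '\0') {
            break;                  /* nothing left to read */
        }

        out[n++] = p;               /* start of the next token */
        p += strcspn(p, delims);    /* advance to the end of the token */

        if (*p == '\0') {
            break;                  /* token ran to the end of the string */
        }

        *p++ = '\0';                /* terminate the token and move on */
    }

    return n;
}
```

A value such as "/usr/bin /opt/My App/bin" comes back as three members ("/usr/bin", "/opt/My", and "App/bin") rather than the two that were intended, which is exactly the problem with encoding lists this way.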
What to do? The program has existed in the wild for over a year so the existing format is now established. I need to either honor the existing format or make a flag-day style change and migrate everything. The latter is possible since the configuration data for rpminspect is nearly exclusive to the vendor data packages. If I had already established the per-package configuration file functionality, this would be a harder change.
Looking at options, here’s how I broke down things:
- Continue using the INI style format, possibly switching libraries. libconfini offers a bit more on top of the INI format, but still does not get me all the way there. There are other libraries and I could extend one of the existing ones. I would want any extensions to go upstream and that may or may not happen.
- Investigate new formats and switch everything over to something else.
- Define a new format and implement a lexer and parser in rpminspect.
I spent a lot of time looking at the different INI libraries available. They all provide more or less the same type of functionality, which left me with limited or no list or hash table options. I then looked at defining a new configuration file format based on what I was already doing and implementing a parser in yacc. While this is possible, I was not really interested in going down this path because I didn’t want to run into situations where the config file format was limiting a feature for some reason and then get stuck. Basically, I don’t want to be in the business of defining a config file format. Lastly, I moved on to looking at existing options for configuration file formats. Here’s what I looked at:
- JSON - Already in use for the license database (inherited from another project). Already using the json-c library. The syntax is frustrating, which would make it a pain for a configuration file. A brief survey of applications shows that JSON is not really used in this capacity.
- XML - I have used XML for configuration data and libxml provides a reasonable API for this. But it suffers from the same problem JSON has in that it’s a pain to edit and maintain by hand.
- YAML - My experience with YAML is limited and what YAML files I have seen, I do not like. The files I’ve seen tend to be very brief and cryptic and offer no real clue as to what is a setting and what is a value. Short files that might look like this:
---
config:
  - process: yes
    when: now
  - how: you_know
  - yes: process
  - now: when
    you_know: how
What is the significance of the hyphen? What are possible options? What am I even looking at? This file is not really helpful at discovering what you can do with a program, which is one thing I expect out of configuration files.
- TOML - This looked exactly like what I wanted. It looks like INI style but adds more types, lists, and things like that. The downside here is the lack of available libraries. I found libtoml on GitHub, which may or may not completely implement the specification, and it has made no releases. I consider this specification to be evolving and may look at it again in a few months.
There are other things to consider for the configuration file format. Who are the target users? In the case of rpminspect it would be developers and package maintainers. The program runs in a CI capacity in Fedora. Of the formats above, YAML has been established for a number of scenarios, many driven by the use of Ansible. What about my concerns with YAML? I decided to look into things a bit more.
I found that YAML does allow comments, so that’s a huge win. And indentation can be more than the nearly unreadable 2 spaces that I see commonly used. Sections are denoted by indentation and hyphens are used for list members. Key=value pairs are of the form ‘key: value’. I rewrote rpminspect.conf as a YAML file and looked at the result. I kept comments and used 4 space indentation. The result was very readable to me so I decided to use this format. The libyaml library provides an entirely usable API for working with YAML data streams.
Making The Change
Because of the parser change, I decided to rename the configuration file to rpminspect.yaml. This both reflects the specification used and keeps it distinct from the existing configuration file format. I bumped the major version of rpminspect to ‘1’, as well as that of the data packages, to account for this change. The profile configuration files will also end with ‘.yaml’.
I rearranged the rpminspect.yaml file as well and broke up what used to be the [settings] section. I give each inspection its own block in the configuration file for more clarity. Some sections do not tie to a specific inspection but apply to the entire run of the program. I may move those into a larger section of their own, but I am not sure yet.
The file parsing happens in lib/init.c so that was where the bulk of the changes went. And moving to YAML meant a lot of this code could be deleted. That is always satisfying even though it’s code that I wrote in the first place.
The project also drops a dependency on the libiniparser library, so I updated the documentation and the meson.build files. With all of these changes in place, I built the program and ran the test suite. I fixed up various things until the test suite passed and pushed the commits. The first big part of this change was now complete.
These changes have been pushed to the master branch and current Copr builds now use YAML configuration files for both the main configuration file and profiles. The next steps are adjusting things to allow for concurrent data package installation and honoring an rpminspect.yaml file in the current directory.
I like the new configuration file layout. libyaml is easy to use and I like having fewer runtime dependencies. I do feel that there will come a time where we talk about using YAML for these types of files like we talk about XML for config files now. There is not a lot I can do about that though, so we will stick with YAML for now.