Fedora Quality Planet

LinuxFest Northwest report

Posted by Adam Williamson on May 10, 2017 02:17 AM

EDIT: recording link added!

Hi folks!

This weekend was LinuxFest Northwest 2017, and as usual I was down in Bellingham to attend it. Had a good time, again as usual. Luckily I got to do my talk first thing and get it out of the way. Here’s a recording, and here’s the slide deck. It was a general talk on Fedora’s past, present and future.

I saw several other good talks, including Bryan Lunduke‘s ‘Lunduke Hour Live’ featuring a great discussion with John Sullivan of the Free Software Foundation. I also saw the openSUSE 101 talk he did with James Mason – it was quite interesting to compare and contrast the openSUSE organization with Fedora’s. Together with James and an Ubuntu developer, I formed a heckler’s row at Kevin Burkeland’s Linux 102 talk on choosing a distribution; it was actually a great talk that was pretty well thought-through and had nice things to say about Fedora and openSUSE, so our heckling was sadly pre-empted.

I spent a few hours working on the booth too, but as usual the Jeffs Sandys and Fitzmaurice were the real booth heroes, so thanks once more to them.

The trivia event on Saturday night was pretty fun (and our team, The Unholy Alliance (of SUSE and Fedora folks) won with only minor cheating!). My now-traditional Sunday afternoon board gaming with Jakob Perry and co. was also fun (and I managed not to come last…)

Got to chat with Jesse Keating, Brian Lane, Laura Abbott (briefly – hope your voice is recovered by now!) and many other fine folks too. It was also really nice to hear from a whole bunch of different people that they tried out a recent Fedora release and really liked it – almost feels like we’re doing something right!

If I promised you something at the conference and I don’t get in touch by the end of this week, please do give me a poke and remind me, I probably forgot…

Test Day DNF 2.0

Posted by sumantro on May 08, 2017 11:02 AM


Tuesday, 2017-05-09, is the DNF 2.0 Test Day! As part of this planned Change for Fedora 26, we need your help to test DNF 2.0!

Why test DNF 2.0?

DNF-2 is the upstream DNF version, and the only version under active development. Upstream currently contains many user-requested features, increased compatibility with yum, and over 30 bug fixes. Backporting patches from upstream to DNF-1 is difficult, so only critical security and usability fixes will be cherry-picked into Fedora's DNF-1.

With DNF 2.0 in place, users will notice usability improvements such as better messages during resolution errors, an indication of whether a package was installed as a weak dependency, better handling of obsolete packages, fewer tracebacks, and so on. One command-line option and one configuration option changed semantics, so DNF could behave differently in some cases (these changes are compatible with yum but incompatible with DNF-1). We hope to see whether it’s working well enough and catch any remaining issues.

We need your help!

All the instructions are on the wiki page, so please read through and come help us test! As always, the event will be in #fedora-test-day on Freenode IRC.

Automated *non*-critical path update functional testing for Fedora

Posted by Adam Williamson on April 28, 2017 11:06 PM

Yep, this here is a sequel to my most recent best-seller, Automated critical path update functional testing for Fedora 🙂

When I first thought about running update tests with openQA, I wasn’t actually thinking about testing critical path packages. I just made that the first implementation because it was easy. But I first thought about doing it when we added the FreeIPA tests to openQA – it seemed pretty obvious that it’d be handy to run the same tests on FreeIPA-related updates as well as running them on the nightly development release composes. So all along, I was planning to come up with a way to do that too.

Funnily enough, right after I pushed out the critpath update testing stuff, a FreeIPA-related update that broke FreeIPA showed up, and Stephen Gallagher poked me on IRC and said “hey, it sure would be nice if we could run the openQA tests on FreeIPA-related updates!”, so I said “funny you should ask…”

I bumped the topic up my todo list a bit, and wrote it that afternoon, and now it’s deployed in production. For now, it’s pretty simple: we just have a hand-written list of packages that we want to run some of the update tests for, whenever an update shows up with one of those packages in it. Simple enough, but it works: whenever an update containing one of those packages is submitted or edited, the server update tests (including the FreeIPA tests) will get run, and the results will be visible in Bodhi.

Here’s a run on the staging instance that was triggered using the new code; since I sent it to the production instance no relevant updates have been submitted or edited, but it should work just the same there. So from now on whenever our FreeIPA-ish overlords submit an update, we’ll get an idea of whether it breaks everything right away.

We can extend this system to other packages, but I couldn’t think of any (besides postgresql, which I threw in there) which would really benefit from the current update tests but aren’t already in the critical path (all the important bits of GNOME are in the critical path, for example, so all the desktop update tests get run on all GNOME updates already). If you can think of any, go ahead and let us know.

Automated critical path update functional testing for Fedora

Posted by Adam Williamson on April 25, 2017 01:08 AM

A little thing I’ve been working on lately finally went live today…this thing:

openQA test results in Bodhi

Several weeks ago now, I adapted Fedora’s openQA to run an appropriate subset of tests on critical path updates. We originally set up our openQA deployment strictly to run tests at the distribution compose level, but I noticed that most of the post-install tests would actually also be quite useful things to test for critical path updates, too.

First, I set up a slightly different openQA workflow that starts from an existing disk image of a clean installed system, downloads the packages from a given update, sets up a local repository containing the packages, and runs dnf -y update before going ahead with the main part of the test.
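The setup step of that workflow amounts to something roughly like the following sketch. The actual implementation lives in the openQA test code; the repo id, paths and the assumption that the update's RPMs are already downloaded are all illustrative:

```shell
# Sketch of the per-update setup: the update's packages have already been
# downloaded into $REPO_DIR (path and repo id are illustrative)
REPO_DIR=/opt/update_repo

# Turn the directory of RPMs into a dnf-consumable repository
createrepo_c "$REPO_DIR"

# Register it as a local repo; gpgcheck is off because candidate update
# packages are not yet signed with the stable key
cat > /etc/yum.repos.d/advisory.repo <<EOF
[advisory]
name=Update under test
baseurl=file://$REPO_DIR
enabled=1
gpgcheck=0
EOF

# Apply the update to the clean installed system before the main test runs
dnf -y update
```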

Then, I adapted our openQA scheduler to trigger this workflow whenever a critical path update is submitted or edited, and forward the results to ResultsDB.

All of this went into production a few weeks ago, and the tests have been run on every critical path update since then. But there was a big piece missing: making the information easily available to the update submitter (and anyone else interested) by showing the results in Bodhi, alongside the Taskotron results. So I sent a patch for Bodhi, and the new Bodhi release with that change included was deployed to production today.

The last two Bodhi releases actually make some other great improvements to the display of automated test results, thanks to Ryan Lerch and Randy Barlow. The results are actually retrieved from ResultsDB by client-side Javascript every time someone views an update. Previously, this was done quite inefficiently and the results were shown at the top of the main update page, which meant they would show up piecemeal for several seconds after the page had mostly loaded, which was rather annoying especially for large updates.

Now the results are retrieved in a much more efficient manner and shown on a separate tab, where a count of the results is displayed once they’ve all been retrieved.

So with Bodhi 2.6, you should have a much more pleasant experience viewing automated test results in Bodhi’s web UI – and for critical path updates, you’ll now see results from openQA functional testing as well as Taskotron tests!

At present, the tests openQA runs fall into three main groups:

  1. Some simple ‘base’ tests, which check that SELinux is enabled, service manipulation (enabling, disabling, starting and stopping services) works, no default-enabled services fail to start, and updating the system with dnf works.

  2. Some desktop tests (currently run only on GNOME): launching and using a graphical terminal works, launching Firefox and doing some basic tests in it works, and updating the system with the graphical updater (GNOME Software in GNOME’s case) works.

  3. Some server tests: is the firewall configured and working as expected, is Cockpit enabled by default, and does it basically work, and both server and client tests for the database server (PostgreSQL) and domain controller (FreeIPA) server roles.

So if any of these fail for a critical path update, you should be able to see it. You can click any of the results to see the openQA webUI view of the test.

At present you cannot request a re-run of a single test. We’re thinking about mechanisms for allowing this at present. You can cause the entire set of openQA tests to be run again by editing the update: you don’t have to add or remove any builds, any kind of edit (just change a character in the description) will do.

If you need help interpreting any openQA test results, please ask on the test@ mailing list or drop by #fedora-qa. garretraziel or I should be available there most of the time.

Please do send along any thoughts, questions, suggestions or complaints to test@ or as a comment on this blog post. We’ll certainly be looking to extend and improve this system in future!

Fedora Media Writer Test Day (Fedora 26 edition) on Thursday 2017-04-20!

Posted by Adam Williamson on April 18, 2017 11:43 PM

It’s that time again: we have another test day coming up this week! Thursday 2017-04-20 will be another Fedora Media Writer Test Day. We’ve run these during the Fedora 24 and 25 cycles as well, but we want to make sure the tool is ready for the Fedora 26 release, and also test a major new feature it has this time around: support for writing ARM images. So please, if you have a bit of spare time and a system to test booting on – especially a Fedora-supported ARM device – come out and help us test!

The Test Day page contains all the instructions you need to run the tests and send along your results. As always, the event is in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

[Fedora Classroom] Fedora QA 101 and 102

Posted by sumantro on April 17, 2017 09:39 PM
This post sums up what we will be covering as part of the Fedora QA Classroom this season. The idea is to teach how to do things the right way and to grow the contributor base.

The topics covered will be:
1. Working with Fedora
2. Installing on VM(s)
3. Configuring and Installing Fedora
4. Fedora Release Cycle
5. Live boot and Fedora Media Writer
6. Setting up accounts
7. Types of testing
8. Finding Test Cases
9. Writing Test Cases for Packages
10. Github
11. Bugzilla
12. Release Validation Testing
13. Update Testing
14. Manual Testing
14.1 Release validation
14.2 Update Testing


The 102 session will cover automated testing and how to host your own Test Days during the release cycle.

To make the workflow smooth, we have made a book which will act as a reference even after the classrooms are over.

https://drive.google.com/file/d/0Bzphf6h7upukTDlrczVEb0l3TmM/view?usp=sharing

Fedora Media Writer Test Day 2017-04-20

Posted by sumantro on April 16, 2017 08:11 AM
Fedora Media Writer is a very handy tool for creating live USB media, and it became the primary download in Fedora 25. We ran a previous Test Day installment to check it on the three major OSes: Windows, macOS and Fedora. That event focused on writing Fedora images (Workstation/Server/spins) to a flash drive.

This installment will focus on out-of-the-box support for the ARMv7 architecture, in addition to Intel 64-bit and 32-bit. Testers can download an image of their choice, verify it by checksum, and boot it in KVM and, of course, on bare metal.
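Verifying a downloaded image by checksum is quick. Fedora publishes a CHECKSUM file alongside each image; the file names below are examples, not exact Fedora file names:

```shell
# Download the image and its matching CHECKSUM file into the same
# directory, then verify (file names here are illustrative)
sha256sum -c Fedora-Workstation-Live-x86_64-26.CHECKSUM --ignore-missing
```

If the output says `OK`, the image downloaded intact and you can go ahead and write it to your media.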

The Test Day will be held on 2017-04-20. Grab a blank SD card or USB stick; with a good internet connection it should take roughly 30 minutes to complete a test case.

Details will be published on the Fedora Community Blog and the test-announce list.

The wiki page says it all: https://fedoraproject.org/wiki/Test_Day:2017-04-20_Fedora_Media_Writer

[Test Day Report]Anaconda BlivetGUI test day

Posted by sumantro on April 10, 2017 07:26 AM
Hey Testers,

I just wanted to share the Test Day report for the Anaconda Blivet-GUI Test Day. It was a huge success: we had 28 testers, many of them new faces.

Testers: 28

Bugs filed (12):

1. https://bugzilla.redhat.com/show_bug.cgi?id=1430349
2. https://bugzilla.redhat.com/show_bug.cgi?id=1439538
3. https://bugzilla.redhat.com/show_bug.cgi?id=1439591
4. https://bugzilla.redhat.com/show_bug.cgi?id=1439744
5. https://bugzilla.redhat.com/show_bug.cgi?id=1439717
6. https://bugzilla.redhat.com/show_bug.cgi?id=1439684
7. https://bugzilla.redhat.com/show_bug.cgi?id=1439572
8. https://bugzilla.redhat.com/show_bug.cgi?id=1439111
9. https://bugzilla.redhat.com/show_bug.cgi?id=1439592
10. https://bugzilla.redhat.com/show_bug.cgi?id=1440143
11. https://bugzilla.redhat.com/show_bug.cgi?id=1440150
12. https://bugzilla.redhat.com/show_bug.cgi?id=1439729


Blog post for the test day: https://communityblog.fedoraproject.org/anaconda-blivetgui-test-day/

I would like to thank each and every tester and the change owners for helping us test this crucial feature!

If you are one of those people who couldn't make it to the Test Day, you can still grab a copy of Fedora 26 Alpha 1.7 and start installing Fedora using Blivet-GUI. If something breaks, make sure to file a bug against blivet-gui.

Thanks
Sumantro
On behalf of Fedora QA team

Fedora 26 Alpha released, and blivet-gui Anaconda Test Day on Thursday (2017-04-06)

Posted by Adam Williamson on April 05, 2017 01:57 AM

Hi again folks! Two bits of Fedora 26 news today. First off, Fedora 26 Alpha has been released! It got delayed by a couple of weeks due to rather a grab-bag of issues – mainly problems with FreeIPA and several kernel bugs – but the delays did at least mean we wound up with a really pretty solid build, according to our testing so far. Please do grab the Alpha, play around with it, and see how it works for you. Remember to read the Common Bugs page, though I’m still working on it at the moment.

Secondly, we have another Test Day coming up this Thursday, 2017-04-06! Anaconda blivet-gui Test Day will be a pretty big one. In Fedora 26, an additional partitioning interface is added to Anaconda (the Fedora installer). As well as anaconda’s own custom partitioning interface, there is now a choice to run the blivet-gui partitioning tool from anaconda. This tool is built on the same backend as anaconda itself, but provides an alternative user interface. It’s been available as a standalone tool since Fedora 21, but Fedora 26 is the first time it can be run from the installer to do install-time partitioning. The Test Day will be all about testing out this new feature and making sure it integrates properly with anaconda and works properly in various situations. Please do come along and help out if you have time!

The Test Day page contains all the instructions you need to run the tests and send along your results. As always, the event is in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

[Test Day Annoucement] Anaconda Blivet GUI

Posted by sumantro on April 04, 2017 09:44 AM

Thursday 2017-04-06 will be Anaconda Blivet-GUI Test Day!
It is part of this planned Change for Fedora 26, so this is an important Test Day!

We'll be testing the new detailed bottom-up configuration screen: it has been long requested by users, and the inclusion of blivet-gui in Anaconda finally makes it a reality. It simply adds a new option without changing the existing advanced storage configuration, so users who prefer the top-down configuration can still use it. We want to see whether it's working well enough and catch any remaining issues.

It's also pretty easy to join in: all you'll need is Alpha 1.7 (which you can grab from the wiki page).

Anaconda grew a rather important new option in F26: as well as the two existing partitioning choices (automatic, and Anaconda's own custom partitioning interface), there's now a *third* choice. You can do custom partitioning with blivet-gui run within Anaconda, as well as using Anaconda's own interface (because there just weren't enough ways for custom partitioning to go wrong already). So we'll have a Test Day for that interface, to try and shake out whatever problems it inevitably has. As always, the event will be in #fedora-test-day on Freenode IRC.

Fedora 26 crypto policy Test Day today (2017-03-30)!

Posted by Adam Williamson on March 30, 2017 05:39 PM

Sorry for the short notice, folks! Today is Fedora 26 crypto policy Test Day. This event is intended to test the Fedora 26 changes and updates to the ongoing crypto policy feature which intends to provide a centralized and unified configuration system for the various cryptography libraries commonly used in Fedora.

The Test Day page contains all the instructions you need to run the tests and send along your results. As always, the event is in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

Fedora Activity Day, Bangalore 2017

Posted by Kanika Murarka on March 13, 2017 01:53 PM

The Fedora Activity Day (FAD) is a regional event (either one-day or a multi-day) that allows Fedora contributors to gather together in order to work on specific tasks related to the Fedora Project.

On February 25th ’17, a FAD was held at one of Bangalore's admired universities, University Visvesvaraya College of Engineering (UVCE). It was not a typical “hackathon” or “DocSprint” but a series of productive and interactive sessions on different tools.

The goal of this FAD was to make students aware of Fedora so that they can test, develop and contribute. It was a one-day event, starting at 10:30 in the morning and concluding at 3 in the evening.

The first talk was on Ansible, which is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero-downtime rolling updates. The session was given by Vipul Siddharth and Prakash Mishra, who are Fedora contributors. They discussed the importance of such automation tools and gave a small demo of getting started with Ansible.

The Ansible session was followed by a session on contributing to the Linux kernel, given by our esteemed guest Vaishali Thakkar (@kernel_girl). Vaishali is a Linux kernel developer at Oracle; she works in the kernel security engineering group and is involved with open source internship programs and several community groups. Vaishali covered every aspect of the kernel one should know before contributing, and discussed the lifecycle and the how, where and when of a patch submission. The session ran for three hours with a short lunch break: the first part focused on the theory of sending your first patch to the kernel community, and the second part was a demo where she sent a patch from scratch (Slides).

The last session, on pathways to contributing to Fedora, was given by Sumantro Mukherjee (Fedora Ambassador) and me, and included a short interactive session.

The speakers were awarded t-shirts as a token of appreciation. I would like to thank Sumantro Mukherjee, the Fedora community and the IEEE sub-chapter of UVCE for making this FAD possible.


Kernel testing made easy!

Posted by sumantro on February 21, 2017 11:37 PM
Hey folks! This is a sincere effort to help people who want to stay on top of the game in terms of the bleeding edge. The most important part is to check that a new kernel version supports your system fine. If it does, that's awesome; if it doesn't, you might want to report it to the team with the proper failure logs, which may be helpful for future reference.

To get started, you need a bleeding-edge kernel. You can get the latest kernel from Bodhi.

Most new kernels arrive as updates, so you need to enable the updates-testing repository to install them: "sudo dnf config-manager --set-enabled updates-testing". Once you are done testing, you can disable it again by executing "dnf config-manager --set-disabled updates-testing". While I'm writing this, the latest kernel in updates-testing for F25 was "kernel-4.9.10-200.fc25".

Once you are done installing, it's time to check that all the vital features of your machine work smoothly. After a manual inspection, you can trigger the test suite, which will test the major parts for you.

First, you need to install some required packages, although many of you might already have all of them (see the wiki page linked below). Once you have the required packages, clone the Pagure repo: "git clone https://pagure.io/kernel-tests.git".

After cloning, switch to the cloned folder and run "cp config.example .config". Then open .config in vi (or any text editor) and set "submit=authenticated" and "username=<fas username>". Once you are done, run the tests by executing "sudo ./runtests.sh"; for performance testing, run "sudo ./runtests.sh -t performance". Both of these tests are most likely to pass; if they don't, send or update the logs and post karma on Bodhi so people can note any regressions.
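Put together, the whole flow looks roughly like this. The sed edits are just a non-interactive way to make the two .config changes (assuming the file contains `submit=` and `username=` lines; if they are commented out, uncomment them first, or simply edit the file by hand):

```shell
# Clone the test suite and create your config
git clone https://pagure.io/kernel-tests.git
cd kernel-tests
cp config.example .config

# Set the two options mentioned above; replace jdoe with your FAS username
sed -i 's/^submit=.*/submit=authenticated/' .config
sed -i 's/^username=.*/username=jdoe/' .config

# Run the default test set, then the performance tests
sudo ./runtests.sh
sudo ./runtests.sh -t performance
```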

For any changes refer to :
https://fedoraproject.org/wiki/QA:Testcase_kernel_regression

Getting started with Pagure CI

Posted by Adam Williamson on February 17, 2017 06:50 AM

I spent a few hours today setting up a couple of the projects I look after, fedfind and resultsdb_conventions, to use Pagure CI. It was surprisingly easy! Many thanks to Pingou and Lubomir for working on this, and of course Kevin for helping me out with the Jenkins side.

You really do just have to request a Jenkins project and then follow the instructions. I followed the step-by-step, submitted a pull request, and everything worked first time. So the interesting part for me was figuring out exactly what to run in the Jenkins job.

The instructions get you to the point where you’re in a checkout of the git repository with the pull request applied, and then you get to do…whatever you can given what you’re allowed to do in the Jenkins builder environment. That doesn’t include installing packages or running mock. So I figured what I’d do for my projects – which are both Python – is set up a good tox profile. With all the stuff discussed below, the actual test command in the Jenkins job – after the boilerplate from the guide that checks out and merges the pull request – is simply tox.

First things first, the infra Jenkins builders didn’t have tox installed, so Kevin kindly fixed that for me. I also convinced him to install all the variant Python version packages – python26, and the non-native Python 3 packages – on each of the Fedora builders, so I can be confident I get pretty much the same tox run no matter which of the builders the job winds up on.

Of course, one thing worth noting at this point is that tox installs all dependencies from PyPI: if something your code depends on isn’t in there (or installed on the Jenkins builders), you’ll be stuck. So another thing I got to do was start publishing fedfind on PyPI! That was pretty easy, though I did wind up cribbing a neat trick from this PyPI issue so I can keep my README in Markdown format but have setup.py convert it to rst when using it as the long_description for PyPI, so it shows up properly formatted as long as pypandoc is installed (but works even if it isn’t, so you don’t need pandoc just to install the project).

After playing with it for a bit, I figured out that what I really wanted was to have two workflows. One is to run just the core test suite, without any unnecessary dependencies, with python setup.py test – this is important when building RPM packages, to make sure the tests pass in the exact environment the package is built in (and for). And then I wanted to be able to run the tests across multiple environments, with coverage and linting, in the CI workflow. There’s no point running code coverage or a linter while building RPMs, but you certainly want to do it for code changes.

So I put the install, test and CI requirements into three separate text files in each repo – install.requires, tests.requires and tox.requires – and adjusted the setup.py files to do this in their setup():

install_requires = open('install.requires').read().splitlines(),
tests_require = open('tests.requires').read().splitlines(),

In tox.ini I started with this:

deps=-r{toxinidir}/install.requires
     -r{toxinidir}/tests.requires
     -r{toxinidir}/tox.requires

so the tox runs get the extra dependencies. I usually write pytest tests, so to start with in tox.ini I just had this command:

commands=py.test

Pytest integration for setuptools can be done in various ways, but I use this one. Add a class to setup.py:

import sys
from setuptools import setup, find_packages
from setuptools.command.test import test as TestCommand

class PyTest(TestCommand):
    user_options = [('pytest-args=', 'a', "Arguments to pass to py.test")]

    def initialize_options(self):
        TestCommand.initialize_options(self)
        self.pytest_args = ''
        self.test_suite = 'tests'

    def run_tests(self):
        #import here, cause outside the eggs aren't loaded
        import pytest
        errno = pytest.main(self.pytest_args.split())
        sys.exit(errno)

and then this line in setup():

cmdclass = {'test': PyTest},

And that’s about the basic shape of it. With an envlist, we get the core tests running both through tox and setup.py. But we can do better! Let’s add some extra deps to tox.requires:

coverage
diff-cover
pylint
pytest-cov

and tweak the commands in tox.ini:

commands=py.test --cov-report term-missing --cov-report xml --cov fedfind
         diff-cover coverage.xml --fail-under=90
         diff-quality --violations=pylint --fail-under=90

By adding a few args to our py.test call we get a coverage report for our library with the pull request applied. The subsequent commands use the neat diff_cover tool to add some more information. diff-cover basically takes the full coverage report (coverage.xml is produced by --cov-report xml) and considers only the lines that are touched by the pull request; the --fail-under arg tells it to fail if there is less than 90% coverage of the modified lines. diff-quality runs a linter (in this case, pylint) on the code and, again, considers only the lines changed by the pull request. As you might expect, --fail-under=90 tells it to fail if the ‘quality’ of the changed code is below 90% (it normalizes all the linter scores to a percentage scale, so that really means a pylint score of less than 9.0).

So without messing around with shipping all our stuff off to hosted services, we get a pretty decent indicator of the test coverage and code quality of the pull request, and it shows up as failing tests if they’re not good enough.

It’s kind of overkill to run the coverage and linter on all the tested Python environments, but it is useful to do it at least on both Python 2 and 3, since the pylint results may differ, and the code might hit different paths. Running them on every minor version isn’t really necessary, but it doesn’t take that long so I’m not going to sweat it too much.

But that does bring me to the last refinement I made, because you can vary what tox does in different environments. One thing I wanted for fedfind was to run the tests not just on Python 2.6, but with the ancient versions of several dependencies that are found in RHEL / EPEL 6. And there’s also an interesting bug in pylint which makes it crash when running on fedfind under Python 3.6. So my tox.ini really looks this:

[tox]
envlist = py26,py27,py34,py35,py36,py37
skip_missing_interpreters=true
[testenv]
deps=py27,py34,py35,py36,py37: -r{toxinidir}/install.requires
     py26: -r{toxinidir}/install.requires.py26
     py27,py34,py35,py36,py37: -r{toxinidir}/tests.requires
     py26: -r{toxinidir}/tests.requires.py26
     py27,py34,py35,py36,py37: -r{toxinidir}/tox.requires
     py26: -r{toxinidir}/tox.requires.py26
commands=py27,py34,py35,py36,py37: py.test --cov-report term-missing --cov-report xml --cov fedfind
         py26: py.test
         py27,py34,py35,py36,py37: diff-cover coverage.xml --fail-under=90
         # pylint breaks on functools imports in python 3.6+
         # https://github.com/PyCQA/astroid/issues/362
         py27,py34,py35: diff-quality --violations=pylint --fail-under=90
setenv =
    PYTHONPATH = {toxinidir}

As you can probably guess, what’s going on there is we’re installing different dependencies and running different commands in different tox ‘environments’. pip doesn’t really have a proper dependency solver, which – among other things – unfortunately means tox barfs if you try and do something like listing the same dependency twice, the first time without any version restriction, the second time with a version restriction. So I had to do a bit more duplication than I really wanted, but never mind. What the files wind up doing is telling tox to install specific, old versions of some dependencies for the py26 environment:

[install.requires.py26]
cached-property
productmd
setuptools == 0.6.rc10
six == 1.7.3

[tests.requires.py26]
pytest==2.3.5
mock==1.0.1

tox.requires.py26 is just shorter, skipping the coverage and pylint bits, because it turns out to be a pain trying to provide old enough versions of various other things to run those checks with the older pytest, and there’s no real need to run the coverage and linter on py26 as long as they run on py27 (see above). As you can see in the commands section, we just run plain py.test and skip the other two commands on py26; on py36 and py37 we skip the diff-quality run because of the pylint bug.

So now on every pull request, we check the code (and tests – it’s usually the tests that break, because I use some pytest feature that didn’t exist in 2.3.5…) still work with the ancient RHEL 6 Python, pytest, mock, setuptools and six, check it on various other Python interpreter versions, and enforce some requirements for test coverage and code quality. And the package builds can still just do python setup.py test and not require coverage or pylint. Who needs github and coveralls? 😉

Of course, after doing all this I needed a pull request to check it on. For resultsdb_conventions I just made a dumb fake one, but for fedfind, because I’m an idiot, I decided to write that better compose ID parser I’ve been meaning to do for the last week. So that took another hour and a half. And then I had to clean up the test suite…sigh.

Bluetooth in Fedora

Posted by Nathaniel McCallum on February 16, 2017 08:53 PM

So… Bluetooth. It’s everywhere now. Well, everywhere except Fedora. Fedora does, of course, support Bluetooth, but even the most common workflows are somewhat spotty. We should improve this.

To this end, I’ve enlisted the help of Don Zickus, kernel developer extraordinaire, and Adam Williamson, the inimitable Fedora QA guru. The plan is to create a set of user tests for the most common Bluetooth tasks. This plan has several goals.

First, we’d like to know when stuff is broken. For example, the recent breakage in linux-firmware. Catching this stuff early is a huge plus.

Second, we’d like to get high quality bug reports. When things do break, vague bug reports often cause things to sit in limbo for a while. Making sure we have all the debugging information up front can make reports actionable.

Third, we’d (eventually) like to block a new Fedora release if major functionality is broken. We’re obviously not ready for this step yet. But once the majority of workflows work on the hardware we care about, we need to ensure that we don’t ship a Fedora release with broken code.

To this end we are targeting three workflows which cover the most common cases:

  • Keyboards
  • Headsets
  • Mice

For more information, or to help develop the user testing, see the Fedora QA bug. Here’s to a better future!

Announcing the resultsdb-users mailing list

Posted by Adam Williamson on February 16, 2017 01:28 AM

I’ve been floating an idea around recently to people who are currently using ResultsDB in some sense – either sending reports to it, or consuming reports from it – or plan to do so. The idea was to have a group where we can discuss (and hopefully co-ordinate) use of ResultsDB – a place to talk about result metadata conventions and so forth.

It seemed to get a bit of traction, so I’ve created a new mailing list: resultsdb-users. If you’re interested, please do subscribe, through the web interface, or by sending a mail with ‘subscribe’ in the subject to this address.

If you’re not familiar with ResultsDB – well, it’s a generic storage engine for test results. It’s more or less a database with a REST API and some very minimal rules for what constitutes a ‘test result’. The only requirements really are some kind of test name plus a result, chosen from four options; results can include any other arbitrary key:value pairs you like, and a few have special meaning in the web UI, but that’s about it. This is one of the reasons for the new list: because ResultsDB is so generic, if we want to make it easily and reliably possible to find related groups of results in any given ResultsDB, we need to come up with ways to ensure related results share common metadata values, and that’s one of the things I expect we’ll be talking about on the list.
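As a concrete illustration of how minimal a result is, here is a small sketch of building one. This is hedged: the outcome names and the payload layout are assumptions based on the description above, not a copy of the real ResultsDB API — check the actual API docs before relying on them.

```python
import json

# The "choice of four" outcomes mentioned above, as assumed here for illustration:
OUTCOMES = ("PASSED", "FAILED", "INFO", "NEEDS_INSPECTION")

def make_result(testcase, item, outcome, **extra):
    """Build a minimal result: a test name, the 'item' tested, an outcome,
    plus arbitrary key:value metadata in the freeform data store."""
    if outcome not in OUTCOMES:
        raise ValueError("unknown outcome: %s" % outcome)
    data = {"item": item}
    data.update(extra)
    return {"testcase": testcase, "outcome": outcome, "data": data}

# You would then POST this as JSON to a ResultsDB instance, e.g. with
# requests.post(resultsdb_url + "/results", json=payload)
payload = make_result(
    "compose.install_default", "Fedora-26-20170207.n.0", "PASSED",
    type="compose", source="openqa")
print(json.dumps(payload, sort_keys=True))
```

The point is how little is required: everything beyond the test name, item and outcome is optional metadata, which is exactly why shared conventions for that metadata matter.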

It began life as Taskotron‘s result storage engine, but it’s pretty independent, and you could certainly get value out of a ResultsDB instance without any of the other bits of Taskotron.

Right now ResultsDB is used in production in Fedora for storing results from Taskotron, openQA and Autocloud, and an instance is also used inside Red Hat for storing results from some RH test systems.

Please note: despite the list being a fedoraproject one, the intent is to co-ordinate with folks from CentOS, Red Hat and maybe even further afield as well; we’re just using an fp.o list as it’s a quick convenient way to get a nice mailman3/hyperkitty list without having to go set up a list server on taskotron.org or something.

The future of Fedora QA

Posted by Adam Williamson on February 12, 2017 05:33 PM

Welcome to version 2.0 of this blog post! This space was previously occupied by a whole bunch of longwinded explanation about some changes that are going on in Fedoraland, and are going to be accelerating (I think) in the near future. But it was way too long. So here’s the executive summary!

First of all: if you do nothing else to get up to speed on Stuff That’s Going On, watch Ralph Bean’s Factory 2.0 talk and Adam Samalik’s Modularity talk from Devconf 2017. Stephen Gallagher’s Fedora Server talk and Dennis Gilmore’s ‘moving everyone to Rawhide’ talk are also valuable, but please at least watch Ralph’s. It’s a one-hour overview of all the big stuff that people really want to build for Fedora (and RH) soon.

To put it simply: Fedora (and RH) don’t want to be only in the business of releasing a bunch of RPMs and operating system images every X months (or years) any more. And we’re increasingly moving away from the traditional segmented development process where developers/package maintainers make the bits, then release engineering bundles them all up into ‘things’, and then QA looks at the ‘things’ and says “er, it doesn’t boot, try again”, and we do that for several months until QA is happy, then we release it and start over. There is a big project to completely overhaul the way we build and ship products, using a pipeline that involves true CI, where each proposed change to Fedora produces an immediate feedback loop of testing and the change is blocked if the testing fails. Again, watch Ralph’s talk, because what he basically does is put up a big schematic of this entire system and go into a whole bunch of detail about his vision for how it’s all going to work.

As part of this, some of the folks in RH’s Fedora QA team whose job has been to work on ‘automated testing’ – a concept that is very tied to the traditional model for building and shipping a ‘distribution’, and just means taking some of the tasks assigned to QA/QE in that model and automating them – are now instead going to be part of a new team at Red Hat whose job is to work on the infrastructure that supports this CI pipeline. That doesn’t mean they’re leaving Fedora, or we’re going to throw away all the work we’ve invested in the components of Taskotron and start all over again, but it does mean that some or all of the components of Taskotron are going to be re-envisaged as part of a modernized pipeline for building and shipping whatever it is we want to call Fedora in the future – and also, if things go according to plan, for building and shipping CentOS and Red Hat products, as part of the vision is that as many components of the pipeline as possible will be shared among many projects.

So that’s one thing that’s happening to Fedora QA: the RH team is going to get a bit smaller, but it’s for good and sensible reasons. You’re also not going to see those folks disappear into some kind of internal RH wormhole, they’ll still be right here working on Fedora, just in a somewhat different context.

Of course, all of this change has other implications for Fedora QA as well, and I reckon this is a good time for those of us still wearing ‘Fedora QA’ hats – whether we’re paid by Red Hat or not – to be reconsidering exactly what our goals and priorities ought to be. Much like with Taskotron, we really haven’t sat down and done that for several years. I’ve been thinking about it myself for a while, and I wouldn’t say I have it all figured out, but I do have some thoughts.

For a start I think we should be looking ahead to the time when we’re no longer on what the anaconda team used to call ‘the blocker treadmill’, where a large portion of our working time is eaten up by a more or less constant cycle of waking up, finding out what broke in Rawhide or Branched today, and trying to get it fixed. If the plans above come about, that should happen a lot less for a couple of reasons: firstly Fedora won’t just be a project which releases a bunch of OS images every six months any more, and secondly, distribution-level CI ought to mean that things aren’t broken all the damn time any more. In an ideal scenario, a lot of the basic fundamental breakage that, right now, is still mostly caught by QA – and that we spend a lot of our cycles on dealing with – will just no longer be our problem. In a proper CI system, it becomes truly the developers’ responsibility: developers don’t get to throw in a change that breaks everything and then wait for QA to notice and tell them about it. If they try and send a change that breaks everything, it gets rejected, and hopefully, the breakage never really ‘happens’.

Sadly (or happily, given I still have a mortgage to pay off) this probably doesn’t mean Project Colada will finally be reality and we all get to sit on the beach drinking cocktails for the rest of our lives. CI is a great process for ensuring your project basically works all the time, but ‘basically works’ is a long way from ‘perfect’. Software is still software, after all, and a CI process is never going to catch all of the bugs. Freeing QA from the blocker treadmill lets us look up and think, well, what else can we do?

To be clear, I think we’re still going to need ‘release validation’. In fact, if the bits of the plan about having more release streams than just ‘all the bits, every six months’ come off, we’ll need more release validation. But hopefully there’ll be a lot more “well, this doesn’t quite work right in this quite involved real-world scenario” and less “it doesn’t boot and I think it ate my cat” involved. For the near future, we’re going to have to keep up the treadmill: bar a few proofs of concept and stuff, Fedora 26 is still an ‘all the bits, every six months’ release, and there’s still an awful lot of “it doesn’t boot” involved. (Right now, Rawhide doesn’t even compose, let alone boot!) But it’s not too early to start thinking about how we might want to revise the ‘release validation’ concept for a world where the wheels don’t fall off the bus every five minutes. It might be a good idea to go back to the teams responsible for all the Fedora products – Server, Workstation, Atomic et al. – and see if we need to take another good look at the documents that define what those products should deliver, and the test processes we have in place to try and determine whether they deliver them.

We’re also still going to be doing ‘updates testing’ and ‘test days’, I think. In fact, the biggest consequence of a world where the CI stuff works out might be that we are free to do more of those. There may be some change in what ‘updates’ are – it may not just be RPM packages any more – but whatever interesting forms of ‘update’ we wind up shipping out to people, we’re still going to need to make sure they work properly, and manual testing is always going to be able to find things that automated tests miss there.

I think the question of to what extent we still have a role in ‘automated testing’ and what it should be is also a really interesting one. One of the angles of the ‘more collaboration between RH and Fedora’ bit here is that RH is now very interested in ‘upstreaming’ a bunch of its internal tests that it previously considered to be sort of ‘RH secret sauce’. Specifically, there’s a set of tests from RH’s ‘Platform QE’ team which currently run through a pipeline using RH’s Beaker test platform which we’d really like to have at least a subset of running on Fedora. So there’s an open question about whether and to what extent Fedora QA would have a role in adapting those tests to Fedora and overseeing their operation. The nuts and bolts of ‘make sure Fedora has the necessary systems in place to be able to run the tests at all’ is going to be the job of the new ‘infrastructure’ team, but we may well wind up being involved in the work of adapting the tests themselves to Fedora and deciding which ones we want to run and for what purposes. In general, there is likely still going to be a requirement for ‘automated testing’ that isn’t CI – it’s still going to be necessary to test the things we build at a higher level. I don’t think we can yet know exactly what requirements we’ll have there, but it’s something to think about and figure out as we move forward, and I think it’s definitely going to be part of our job.

We may also need to reconsider how Fedora QA, and indeed Fedora as a whole, decides what is really important. Right now, there’s a pretty solid process for this, but it’s quite tied to the ‘all the things, every six months’ release cycle. For each release we decide which Fedora products are ‘release blocking’, and we care about those, and the bits that go into them and the tools for building them, an awful lot more than we care about anything else. This works pretty well to focus our limited resources on what’s really important. But if we’re going to be moving to having more and more varied ‘Fedora’ products with different release streams, the binary ‘is it release blocking?’ question doesn’t really work any more. Fedora as a whole might need a better way of doing that, and QA should have a role to play in figuring that out and making sure we work out our priorities properly from it.

So there we go! I hope that was useful and thought-provoking. We’ve got a QA meeting coming up tomorrow (2017-02-13) at 1600 UTC where I’m hoping we can chew these topics over a bit, just to serve as an opportunity to get people thinking. Hope to see you there, or on the mailing list!

Fedorahosted to Pagure

Posted by Kanika Murarka on February 12, 2017 06:48 AM

Fedorahosted.org was established in late 2007 using Trac for issues and wiki pages, Fedora Account System groups for access control and source uploads, and offering a variety of Source Control Management tools (git, svn, hg, bzr). With the rise of new workflows and source repositories, fedorahosted.org has ceased to grow, adding just one new project this year and a handful the year before.

As we all know, Fedorahosted is shutting down at the end of this month, so it’s time to migrate your projects from Fedorahosted to one of the following:

  1. Pagure
  2. Hosting and managing own Trac instance on OpenShift
  3. JIRA
  4. Phabricator
  5. GitHub
  6. Taiga

Pagure is the brainchild of Pierre-Yves Chibon, a member of the Fedora Engineering team. We will be looking at migration to Pagure because it is a new, full-featured git repository service, it’s open source, and we ❤ open source.

Pagure provides a test instance where we can create projects and test importing data. Note: it is cleared out from time to time, so do not use it for anything long-term.

Here is how Pagure supports Fedorahosted projects:

| Feature area | Fedorahosted (Trac) | Pagure |
|---|---|---|
| Priorities | Can add as many priority levels as required, with weights | Same |
| Priorities | Can assign a default priority | No such option |
| Priorities | Custom priority tags | Same |
| Milestones | Can add as many milestones as we want | Same |
| Milestones | Option to add a due date | Same |
| Milestones | Keeps track of completed time | Does not record completed time |
| Milestones | Option to select a default milestone | No such option |
| Resolutions | Can add as many resolutions as we want | Same |
| Resolutions | Can set a default resolution type | By default it is closed as ‘None’ |
| Other | Separate columns for severity, component and version | Simpler: just tags |
| Navigation and searching | Difficult | Easy |
| Permissions | Different types of permission exist | Only an ‘admin’ permission exists |
| Creating and maintaining tickets | Difficult | Easy |
| Enabling plug-ins | Easy | Easy |

So, let’s try importing something into the staging Pagure repo. I will demonstrate using the Fedora QA repo, which has recently been moved from Fedorahosted to Pagure.

  1. You should have XML_RPC permission or admin rights on the fedorahosted repo.
  2. We will use Pagure-importer to do the migration.
  3. Install it using pip: python3 -m pip install pagure_importer
  4. Create a new repo, e.g. Test-fedoraqa.
  5. Go to Settings and make the repo ticket-friendly by adding new milestones and priorities.
  6. Clone the issue tracker for issues from pagure. Use: pgimport clone ssh://git@stg.pagure.io/tickets/Test-fedoraqa.git. This will clone the pagure Test-fedoraqa ticket repository into the default /tmp directory as /tmp/Test-fedoraqa.git
  7. Activate the pagure tickets hook from project settings. This step is necessary so that the pagure database also gets updated for ticket repository changes.
  8. Deactivate the pagure Fedmsg hook from project settings. This prevents the issue import from spamming the fedmsg bus. The hook can be reactivated once the import has completed.
  9. The fedorahosted command can be used to import issues from a fedorahosted project to pagure.
    $ pgimport fedorahosted --help
        Usage: pgimport fedorahosted [OPTIONS] PROJECT_URL
    
        Options:
          --tags            Import pagure tags as well.
          --private         By default make all issues private.
          --username TEXT   FAS username
          --password TEXT   FAS password
          --offset INTEGER  Number of issues in pagure before import
          --nopush          Do not push the result of pagure-importer back
          --help            Show this message and exit.
    
    
    $ pgimport fedorahosted https://fedorahosted.org/fedora-qa --tags

    This command will import all the ticket information, with all tags, into the cloned ticket repository under /tmp. If you get this error: ERROR: Error in response: {u'message': u'XML_RPC privileges are required to perform this operation', u'code': 403, u'name': u'JSONRPCError'}, it means you don’t have XML_RPC permission.

  10. You will be prompted for your FAS username and password.
  11. Go to the tmp folder: cd /tmp/
  12. Now we need to push the tickets to the new repo. The push command can be used to push a cloned pagure ticket repo back to pagure:
    $ pgimport push Test-fedoraqa.git
  13. Refresh your repo; the imported tickets will now be visible.
  14. Now you can edit the tickets in any way you want.

Stuck somewhere? Feel free to comment and get in touch. Thanks for reading! 🙂


openQA and Autocloud result submission to ResultsDB

Posted by Adam Williamson on February 07, 2017 05:18 AM

So I’ve just arrived back from a packed two weeks in Brno, and I’ll probably have some more stuff to post soon. But let’s lead with some big news!

One of the big topics at Devconf and around the RH offices was the ongoing effort to modernize both Fedora and RHEL’s overall build processes to be more flexible and involve a lot more testing (or, as some people may have put it, “CI CI CI”). A lot of folks wearing a lot of hats are involved in different bits of this effort, but one thing that seems to stay constant is that ResultsDB will play a significant role.

ResultsDB started life as the result storage engine for AutoQA, and the concept and name was preserved as AutoQA was replaced by Taskotron. Its current version, however, is designed to be a scalable, capable and generic store for test results from any test system, not just Taskotron. Up until last week, though, we’d never quite got around to hooking up any other systems to it to demonstrate this.

Well, that’s all changed now! In the course of three days, Jan Sedlak and I got both Fedora’s openQA instance and Autocloud reporting to ResultsDB. As results come out of both those systems, fedmsg consumers take the results, process them into a common format, and forward them to ResultsDB. This means there are groups with results from both systems for the same compose together, and you’ll find metadata in very similar format attached to the results from both systems. This is all deployed in production right now – the results from every daily compose from both openQA and Autocloud are being forwarded smoothly to ResultsDB.

To aid in this effort I wrote a thing we’re calling resultsdb_conventions for now. I think of it as being a code representation of some ‘conventions’ for formatting and organizing results in ResultsDB, as well as a tool for conveniently reporting results in line with those conventions. The attraction of ResultsDB is that it’s very little more than a RESTful API for a database; it enforces a pretty bare minimum in terms of required data for each result. A result must provide only a test name, an ‘item’ that was tested, and a status (‘outcome’) from a choice of four. ResultsDB allows a result to include as much more data as it likes, in the form of a freeform key:value data store, but it does not require any extra data to be provided, or impose any policy on its form.

This makes ResultsDB flexible, but also means we will need to establish conventions where appropriate to ensure related results can be conveniently located and reasoned about. resultsdb_conventions is my initial contribution to this effort, originally written just to reduce duplication between the openQA and Autocloud result submitters and ensure they used a common layout, but intended to perhaps cover far more use cases in the future.
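To make ‘conventions’ concrete, here is a toy sketch of the idea: two systems report the same kind of result with different native field names, and a small normalization layer maps both onto shared metadata keys so one query can find them all. Every field name here is invented for illustration; the real resultsdb_conventions layout may well differ.

```python
def conventional_result(source, native):
    """Normalize a native result dict from one test system into a shared
    metadata layout, so results about the same compose can be grouped."""
    if source == "openqa":
        # hypothetical openQA-ish native fields
        return {
            "item": native["build"],
            "type": "compose",
            "outcome": "PASSED" if native["result"] == "passed" else "FAILED",
            "source": "openqa",
        }
    elif source == "autocloud":
        # hypothetical Autocloud-ish native fields
        return {
            "item": native["compose_id"],
            "type": "compose",
            "outcome": "PASSED" if native["success"] else "FAILED",
            "source": "autocloud",
        }
    raise ValueError("unknown source: %s" % source)

a = conventional_result("openqa", {"build": "Fedora-26-20170207.n.0", "result": "passed"})
b = conventional_result("autocloud", {"compose_id": "Fedora-26-20170207.n.0", "success": True})
# Both results now share the same 'item' and 'type' metadata, so a single
# ResultsDB query for that compose would find results from both systems.
assert a["item"] == b["item"] and a["type"] == b["type"]
```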

Having this data in ResultsDB is likely to be practically useful either immediately or in the very near future, but we’re also hoping it acts as a demonstration that using ResultsDB to consolidate results from multiple test sources is not only possible but quite easy. And I’m hoping resultsdb_conventions can be a starting point for a discussion and some consensus around what metadata we provide, and in what format, for various types of result. If all goes well, we’re hoping to hook up manual test result submission to ResultsDB next, via the relval-ng project that’s had some discussion on the QA mailing lists. Stay tuned for more on that!

Welcome Fedora Quality Planet

Posted by Kamil Páral on January 31, 2017 10:31 AM

Hello, I’d like to introduce a new sub-planet of Fedora Planet to you, located at http://fedoraplanet.org/quality/ (you don’t need to remember the URL, there’s a sub-planet picker in the top right corner of Fedora Planet pages that allows you to switch between sub-planets).

Fedora Quality Planet will contain news and useful information about QA tools and processes present in Fedora, updates on our quality automation efforts, guides for package maintainers (and other teams) how to interact with our tools and checks or understand the reported failures, announcements about critical issues in Fedora releases, and more.

Our goal is to have a single place for you to visit (or subscribe to) and get a good overview of what’s happening in the Fedora Quality space. Of course all Fedora Quality posts should also show up in the main Fedora Planet feed, so if you’re already subscribed to that, you shouldn’t miss our posts either.

If you want to join our effort and publish some interesting quality-related posts to Fedora Quality Planet, you’re more than welcome! Please see the instructions on how to syndicate your blog. If you have any questions or need help, ask on the test mailing list or ping kparal or adamw in the #fedora-qa freenode IRC channel. Thanks!


The Tale Of The Two-Day, One-Character Patch

Posted by Adam Williamson on January 12, 2017 02:57 AM

I’m feeling like writing a very long explanation of a very small change again. Some folks have told me they enjoy my attempts to detail the entire step-by-step process of debugging some somewhat complex problem, so sit back, folks, and enjoy…The Tale Of The Two-Day, One-Character Patch!

Recently we landed Python 3.6 in Fedora Rawhide. A Python version bump like that requires all Python-dependent packages in the distribution to be rebuilt. As usually happens, several packages failed to rebuild successfully, so among other work, I’ve been helping work through the list of failed packages and fixing them up.

Two days ago, I reached python-deap. As usual, I first simply tried a mock build of the package: sometimes it turns out we already fixed whatever had previously caused the build to fail, and simply retrying will make it work. But that wasn’t the case this time.

The build failed due to build dependencies not being installable – python2-pypandoc, in this case. It turned out that this depends on pandoc-citeproc, and that wasn’t installable because a new ghc build had been done without rebuilds of the set of pandoc-related packages that must be rebuilt after a ghc bump. So I rebuilt pandoc, and ghc-aeson-pretty (an updated version was needed to build an updated pandoc-citeproc which had been committed but not built), and finally pandoc-citeproc.

With that done, I could do a successful scratch build of python-deap. I tweaked the package a bit to enable the test suites – another thing I’m doing for each package I’m fixing the build of, if possible – and fired off an official build.

Now you may notice that this looks a bit odd, because all the builds for the different arches succeeded (they’re green), but the overall ‘State’ is “failed”. What’s going on there? Well, if you click “Show result”, you’ll see this:

BuildError: The following noarch package built differently on different architectures: python-deap-doc-1.0.1-2.20160624git232ed17.fc26.noarch.rpm
rpmdiff output was:
error: cannot open Packages index using db5 - Permission denied (13)
error: cannot open Packages database in /var/lib/rpm
error: cannot open Packages database in /var/lib/rpm
removed     /usr/share/doc/python-deap/html/_images/cma_plotting_01_00.png
removed     /usr/share/doc/python-deap/html/examples/es/cma_plotting_01_00.hires.png
removed     /usr/share/doc/python-deap/html/examples/es/cma_plotting_01_00.pdf
removed     /usr/share/doc/python-deap/html/examples/es/cma_plotting_01_00.png

So, this is a good example of where background knowledge is valuable. Getting from step to step in this kind of debugging/troubleshooting process is a sort of combination of logic, knowledge and perseverance. Always try to be logical and methodical. When you start out you won’t have an awful lot of knowledge, so you’ll need a lot of perseverance; hopefully, the longer you go on, the more knowledge you’ll pick up, and thus the less perseverance you’ll need!

In this case the error is actually fairly helpful, but I also know a bit about packages (which helps) and remembered a recent mailing list discussion. Fedora allows arched packages with noarch subpackages, and this is how python-deap is set up: the main packages are arched, but there is a python-deap-docs subpackage that is noarch. We’re concerned with that package here. I recalled a recent mailing list discussion of this “built differently on different architectures” error.

As discussed in that thread, we’re failing a Koji check specific to this kind of package. If all the per-arch builds succeed individually, Koji will take the noarch subpackage(s) from each arch and compare them; if they’re not all the same, Koji will consider this an error and fail the build. After all, the point of a noarch package is that its contents are the same for all arches and so it shouldn’t matter which arch build we take the noarch subpackage from. If it comes out different on different arches, something is clearly up.

So this left me with the problem of figuring out which arch was different (it’d be nice if the Koji message actually told us…) and why. I started out just looking at the build logs for each arch and searching for ‘cma_plotting’. This is actually another important thing: one of the most important approaches to have in your toolbox for this kind of work is just ‘searching for significant-looking text strings’. That might be a grep or it might be a web search, but you’ll probably wind up doing a lot of both. Remember good searching technique: try to find the most ‘unusual’ strings you can to search for, ones for which the results will be strongly correlated with your problem. This quickly told me that the problematic arch was ppc64. The ‘removed’ files were not present in that build, but they were present in the builds for all other arches.
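As an aside, that kind of per-arch log comparison is a one-liner from a shell. Sketched here with made-up paths and minimal fake log files (the real Koji logs obviously contain far more):

```shell
# Hypothetical per-arch build logs; only the ppc64 one contains the warning.
mkdir -p /tmp/deap-logs
echo "build ok" > /tmp/deap-logs/build-x86_64.log
echo "WARNING: Exception occurred in plotting cma_plotting" > /tmp/deap-logs/build-ppc64.log

# grep -l prints just the names of the files that match:
grep -l "cma_plotting" /tmp/deap-logs/*.log
# → /tmp/deap-logs/build-ppc64.log
```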

So I started looking more deeply into the ppc64 build log. If you search for ‘cma_plotting’ in that file, you’ll see the very first result is “WARNING: Exception occurred in plotting cma_plotting”. That sounds bad! Below it is a long Python traceback – the text starting “Traceback (most recent call last):”.

So what we have here is some kind of Python thing crashing during the build. If we quickly compare with the build logs on other arches, we don’t see the same thing at all – there is no traceback in those build logs. Especially since this shows up right when the build process should be generating the files we know are the problem (the cma_plotting files, remember), we can be pretty sure this is our culprit.

Now this is a pretty big scary traceback, but we can learn some things from it quite easily. One is very important: we can see quite easily what it is that’s going wrong. If we look at the end of the traceback, we see that all the last calls involve files in /usr/lib64/python2.7/site-packages/matplotlib. This means we’re dealing with a Python module called matplotlib. We can quite easily associate that with the package python-matplotlib, and now we have our next suspect.

If we look a bit before the traceback, we can get a bit more general context of what’s going on, though it turns out not to be very important in this case. Sometimes it is, though. In this case we can see this:

+ sphinx-build-2 doc build/html
Running Sphinx v1.5.1

Again, background knowledge comes in handy here: I happen to know that Sphinx is a tool for generating documentation. But if you didn’t already know that, you should quite easily be able to find it out, by good old web search. So what’s going on is the package build process is trying to generate python-deap’s documentation, and that process uses this matplotlib library, and something is going very wrong – but only on ppc64, remember – in matplotlib when we try to generate one particular set of doc files.

So next I start trying to figure out what’s actually going wrong in matplotlib. As I mentioned, the traceback is pretty long. This is partly just because matplotlib is big and complex, but it’s more because it’s a fairly rare type of Python error – an infinite recursion. You’ll see the traceback ends with many, many repetitions of this line:

  File "/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py", line 861, in _get_glyph
    return self._get_glyph('rm', font_class, sym, fontsize)

followed by:

  File "/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py", line 816, in _get_glyph
    uniindex = get_unicode_index(sym, math)
  File "/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py", line 87, in get_unicode_index
    if symbol == '-':
RuntimeError: maximum recursion depth exceeded in cmp

What ‘recursion’ means is pretty simple: it just means that a function can call itself. A common example of where you might want to do this is if you’re trying to walk a directory tree. In Python (using pathlib) it might look a bit like this:

def read_directory(directory):
    print(directory.name)
    for entry in directory.iterdir():
        if entry.is_file():
            print(entry.name)
        elif entry.is_dir():
            # descend into the subdirectory: the function calls itself
            read_directory(entry)

To deal with directories nested in other directories, the function just calls itself. The danger is if you somehow mess up when writing code like this, and it winds up in a loop, calling itself over and over and never escaping: this is ‘infinite recursion’. Python, being a nice language, notices when this is going on, and bails after a certain number of recursions, which is what’s happening here.
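You can see Python’s guard in action with a tiny self-contained example (nothing to do with matplotlib, just demonstrating the mechanism):

```python
import sys

def runaway():
    # A deliberately broken recursive function: its 'fallback' path
    # unconditionally calls itself, so it can never terminate on its own.
    return runaway()

try:
    runaway()
except RecursionError:
    # On Python 2 this surfaced as RuntimeError("maximum recursion depth
    # exceeded..."), which is exactly what the traceback above shows.
    print("Python bailed out; recursion limit is %d" % sys.getrecursionlimit())
```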

So now we know where to look in matplotlib, and what to look for. Let’s go take a look! matplotlib, like most everything else in the universe these days, is in github, which is bad for ecosystem health but handy just for finding stuff. Let’s go look at the function from the backtrace.

Well, this is pretty long, and maybe a bit intimidating. But an interesting thing is, we don’t really need to know what this function is for – I actually still don’t know precisely (according to the name it should be returning a ‘glyph’ – a single visual representation for a specific character from a font – but it actually returns a font, the unicode index for the glyph, the name of the glyph, the font size, and whether the glyph is italicized, for some reason). What we need to concentrate on is the question of why this function is getting in a recursion loop on one arch (ppc64) but not any others.

First let’s figure out how the recursion is actually triggered – that’s vital to figuring out what the next step in our chain is. The line that triggers the loop is this one:

                return self._get_glyph('rm', font_class, sym, fontsize)

That’s where it calls itself. It’s kinda obvious that the authors expect that call to succeed – it shouldn’t run down the same logical path, but instead get to the ‘success’ path (the return font, uniindex, symbol_name, fontsize, slanted line at the end of the function) and thus break the loop. But on ppc64, for some reason, it doesn’t.

So what’s the logic path that leads us to that call, both initially and when it recurses? Well, it’s down three levels of conditionals:

    if not found_symbol:
        if self.cm_fallback:
            <other path>
        else:
            if fontname in ('it', 'regular') and isinstance(self, StixFonts):
                return self._get_glyph('rm', font_class, sym, fontsize)

So we only get to this path if found_symbol is not set by the time we reach that first if, then if self.cm_fallback is not set, then if the fontname given when the function was called was ‘it’ or ‘regular’ and the class instance this function (actually method) is part of is an instance of the StixFonts class (or a subclass). Don’t worry if that’s getting a bit too technical – I did spend a bit of time looking into those last two conditions, but ultimately they turned out not to be that significant. The important one is the first one: if not found_symbol.

By this point, I’m starting to wonder if the problem is that we’re failing to ‘find’ the symbol – in the first half of the function – when we shouldn’t be. Now there are a couple of handy logical shortcuts we can take here that turned out to be rather useful. First we look at the whole logic flow of the found_symbol variable and see that it’s a bit convoluted. From the start of the function, there are two different ways it can be set True – the if self.use_cmex block and then the ‘fallback’ if not found_symbol block after that. Then there’s another block that starts if found_symbol: where it gets set back to False again, and another lookup is done:

    if found_symbol:
    (...)
        found_symbol = False
        font = self._get_font(new_fontname)
        if font is not None:
            glyphindex = font.get_char_index(uniindex)
            if glyphindex != 0:
                found_symbol = True

At first, though, we don’t know if we’re even hitting that block, or if we’re failing to ‘find’ the symbol earlier on. It turns out, though, that it’s easy to tell – because of this earlier block:

    if not found_symbol:
        try:
            uniindex = get_unicode_index(sym, math)
            found_symbol = True
        except ValueError:
            uniindex = ord('?')
            warn("No TeX to unicode mapping for '%s'" %
                 sym.encode('ascii', 'backslashreplace'),
                 MathTextWarning)

Basically, if we don’t find the symbol there, the code logs a warning. We can see from our build log that we don’t see any such warning, so we know that the code does initially succeed in finding the symbol – that is, when we get to the if found_symbol: block, found_symbol is True. That logically means that it’s that block where the problem occurs – we have found_symbol going in, but where that block sets it back to False then looks it up again (after doing some kind of font substitution, I don’t know why, don’t care), it fails.

The other thing I noticed while poking through this code is a later warning. Remember that the infinite recursion only happens if fontname in ('it', 'regular') and isinstance(self, StixFonts)? Well, what happens if that’s not the case is interesting:

            if fontname in ('it', 'regular') and isinstance(self, StixFonts):
                return self._get_glyph('rm', font_class, sym, fontsize)
            warn("Font '%s' does not have a glyph for '%s' [U+%x]" %
                 (new_fontname,
                  sym.encode('ascii', 'backslashreplace').decode('ascii'),
                  uniindex),
                 MathTextWarning)

that is, if that condition isn’t satisfied, instead of calling itself, the next thing the function does is log a warning. So it occurred to me to go and see if there are any of those warnings in the build logs. And, whaddayaknow, there are four such warnings in the ppc64 build log:

/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:866: MathTextWarning: Font 'rm' does not have a glyph for '1' [U+1d7e3]
  MathTextWarning)
/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:867: MathTextWarning: Substituting with a dummy symbol.
  warn("Substituting with a dummy symbol.", MathTextWarning)
/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:866: MathTextWarning: Font 'rm' does not have a glyph for '0' [U+1d7e2]
  MathTextWarning)
/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:866: MathTextWarning: Font 'rm' does not have a glyph for '-' [U+2212]
  MathTextWarning)
/usr/lib64/python2.7/site-packages/matplotlib/mathtext.py:866: MathTextWarning: Font 'rm' does not have a glyph for '2' [U+1d7e4]
  MathTextWarning)

but there are no such warnings in the logs for other arches. That’s really rather interesting. It makes one possibility very unlikely: that we do reach the recursed call on all arches, but it fails on ppc64 and succeeds on the other arches. It’s looking far more likely that the problem is the “re-discovery” bit of the function – the if found_symbol: block where it looks up the symbol again – is usually working on other arches, but failing on ppc64.

So just by looking at the logical flow of the function, particularly what happens in different conditional branches, we’ve actually been able to figure out quite a lot, without knowing or even caring what the function is really for. By this point, I was really focusing in on that if found_symbol: block. And that leads us to our next suspect. The most important bit in that block is where it actually decides whether to set found_symbol to True or not, here:

        font = self._get_font(new_fontname)
        if font is not None:
            glyphindex = font.get_char_index(uniindex)
            if glyphindex != 0:
                found_symbol = True

I didn’t actually know whether it was failing because self._get_font didn’t find anything, or because font.get_char_index returned 0. I think I just played a hunch that get_char_index was the problem, but it wouldn’t be too difficult to find out by just editing the code a bit to log a message telling you whether or not font was None, and re-running the test suite.

Anyhow, I wound up looking at get_char_index, so we need to go find that. You could work backwards through the code and figure out what font is an instance of so you can find it, but that’s boring: it’s far quicker just to grep the damn code. If you do that, you get various results that are calls of it, then this:

src/ft2font_wrapper.cpp:const char *PyFT2Font_get_char_index__doc__ =
src/ft2font_wrapper.cpp:    "get_char_index()\n"
src/ft2font_wrapper.cpp:static PyObject *PyFT2Font_get_char_index(PyFT2Font *self, PyObject *args, PyObject *kwds)
src/ft2font_wrapper.cpp:    if (!PyArg_ParseTuple(args, "I:get_char_index", &ccode)) {
src/ft2font_wrapper.cpp:        {"get_char_index", (PyCFunction)PyFT2Font_get_char_index, METH_VARARGS, PyFT2Font_get_char_index__doc__},

Which is the point at which I started mentally buckling myself in, because now we’re out of Python and into C++. Glorious C++! I should note at this point that, while I’m probably a half-decent Python coder at this point, I am still pretty awful at C(++). I may be somewhat or very wrong in anything I say about it. Corrections welcome.

So I buckled myself in and went for a look at this ft2font_wrapper.cpp thing. I’ve seen this kind of thing a couple of times before, so by squinting at it a bit sideways, I could more or less see that this is what Python calls an extension module: basically, it’s a Python module written in C or C++. This gets done if you need to create a new built-in type, or for speed, or – as in this case – because the Python project wants to work directly with a system shared library (in this case, freetype), either because it doesn’t have Python bindings or because the project doesn’t want to use them for some reason.

This code pretty much provides a few classes for working with FreeType fonts. It defines a class called matplotlib.ft2font.FT2Font with a method get_char_index, and that’s what the code back up in mathtext.py is dealing with: that font we were dealing with is an FT2Font instance, and we’re using its get_char_index method to try and ‘find’ our ‘symbol’.

Fortunately, this get_char_index method is actually simple enough that even I can figure out what it’s doing:

static PyObject *PyFT2Font_get_char_index(PyFT2Font *self, PyObject *args, PyObject *kwds)
{
    FT_UInt index;
    FT_ULong ccode;

    if (!PyArg_ParseTuple(args, "I:get_char_index", &ccode)) {
        return NULL;
    }

    index = FT_Get_Char_Index(self->x->get_face(), ccode);

    return PyLong_FromLong(index);
}

(If you’re playing along at home for MEGA BONUS POINTS, you now have all the necessary information and you can try to figure out what the bug is. If you just want me to explain it, keep reading!)

There’s really not an awful lot there. It’s calling FT_Get_Char_Index with a couple of args and returning the result. Not rocket science.

In fact, this seemed like a good point to start just doing a bit of experimenting to identify the precise problem, because we’ve reduced the problem to a very small area. So this is where I stopped just reading the code and started hacking it up to see what it did.

First I tweaked the relevant block in mathtext.py to just log the values it was feeding in and getting out:

        font = self._get_font(new_fontname)
        if font is not None:
            glyphindex = font.get_char_index(uniindex)
            warn("uniindex: %s, glyphindex: %s" % (uniindex, glyphindex))
            if glyphindex != 0:
                found_symbol = True

Sidenote: how exactly to just print something out to the console when you’re building or running tests can vary quite a bit depending on the codebase in question. What I usually do is just look at how the project already does it – find some message that is being printed when you build or run the tests, and then copy that. Thus in this case we can see that the code is using this warn function (it’s actually warnings.warn), and we know those messages are appearing in our build logs, so…let’s just copy that.

Then I ran the test suite on both x86_64 and ppc64, and compared. This told me that the Python code was passing the same uniindex values to the C code on both x86_64 and ppc64, but getting different results back – that is, I got the same recorded uniindex values, but on x86_64 the resulting glyphindex value was always something larger than 0, but on ppc64, it was sometimes 0.

The next step should be pretty obvious: log the input and output values in the C code.

index = FT_Get_Char_Index(self->x->get_face(), ccode);
printf("ccode: %lu index: %u\n", ccode, index);

Another sidenote: one of the more annoying things with this particular issue was just being able to run the tests with modifications and see what happened. First, I needed an actual ppc64 environment to use; the awesome Patrick Uiterwijk of Fedora release engineering provided me with one. Then I built a .src.rpm of the python-matplotlib package, ran a mock build of it, and shelled into the mock environment. That gives you an environment with all the necessary build dependencies and the source and the tests all there and prepared already. Then I just copied the necessary build, install and test commands from the spec file.

For a simple pure-Python module this is all usually pretty easy, and you can just check the source out and do it right in your regular environment or in a virtualenv or something, but for something like matplotlib, which has this C++ extension module too, it’s more complex. The spec builds the code, then installs it, then runs the tests out of the source directory with PYTHONPATH=BUILDROOT/usr/lib64/python2.7/site-packages , so the code that was actually built and installed is used for the tests. When I wanted to modify the C part of matplotlib, I edited it in the source directory, then re-ran the ‘build’ and ‘install’ steps, then ran the tests; if I wanted to modify the Python part I just edited it directly in the BUILDROOT location and re-ran the tests.

When I ran the tests on ppc64, I noticed that several hundred of them failed with exactly the bug we’d seen in the python-deap package build – this infinite recursion problem. Several others failed due to not being able to find the glyph, without hitting the recursion. (It turned out the package maintainer had disabled the tests on ppc64, and so Fedora 24+’s python-matplotlib has been broken on ppc64 since about April.)

So anyway, with that modified C code built and used to run the test suite, I finally had a smoking gun. Running this on x86_64 and ppc64, the logged ccode values were totally different. The values logged on ppc64 were huge. But as we know from the previous logging, there was no difference in the value when the Python code passed it to the C code (the uniindex value logged in the Python code).

So now I knew: the problem lay in how the C code took the value from the Python code. At this point I started figuring out how that worked. The key line is this one:

if (!PyArg_ParseTuple(args, "I:get_char_index", &ccode)) {

That PyArg_ParseTuple function is what the C code is using to read in the value that mathtext.py calls uniindex and it calls ccode, the one that’s somehow being messed up on ppc64. So let’s read the docs!

This is one unusual example where the Python docs, which are usually awesome, are a bit difficult, because that’s a very thin description which doesn’t provide the references you usually get. But all you really need to do is read up – go back to the top of the page, and you get a much more comprehensive explanation. Reading carefully through the whole page, we can see pretty much what’s going on in this call. It basically means that args is expected to be a structure representing a single Python object, a number, which we will store into the C variable ccode. The tricky bit is that second arg, "I:get_char_index". This is the ‘format string’ that the Python page goes into a lot of helpful detail about.

As it tells us, PyArg_ParseTuple “use[s] format strings which are used to tell the function about the expected arguments…A format string consists of zero or more “format units.” A format unit describes one Python object; it is usually a single character or a parenthesized sequence of format units. With a few exceptions, a format unit that is not a parenthesized sequence normally corresponds to a single address argument to these functions.” Next we get a list of the ‘format units’, and I is one of those:

 I (integer) [unsigned int]
    Convert a Python integer to a C unsigned int, without overflow checking.

You might also notice that the list of format units include several for converting Python integers to other things, like i for ‘signed int’ and h for ‘short int’. This will become significant soon!

The :get_char_index bit threw me for a minute, but it’s explained further down:

“A few other characters have a meaning in a format string. These may not occur inside nested parentheses. They are: … : The list of format units ends here; the string after the colon is used as the function name in error messages (the “associated value” of the exception that PyArg_ParseTuple() raises).” So in our case here, we have only a single ‘format unit’ – I – and get_char_index is just a name that’ll be used in any error messages this call might produce.

So now we know what this call is doing. It’s saying “when some Python code calls this function, take the args it was called with and parse them into C structures so we can do stuff with them. In this case, we expect there to be just a single arg, which will be a Python integer, and we want to convert it to a C unsigned integer, and store it in the C variable ccode.”

(If you’re playing along at home but you didn’t get it earlier, you really should be able to get it now! Hint: read up just a few lines in the C code. If not, go refresh your memory about architectures…)

And once I understood that, I realized what the problem was. Let’s read up just a few lines in the C code:

FT_ULong ccode;

Unlike Python, C and C++ are ‘statically typed’ languages. That just means that all variables must be declared to be of a specific type, unlike Python variables, which you don’t have to declare explicitly and which can change type any time you like. This is a variable declaration: it’s simply saying “we want a variable called ccode, and it’s of type FT_ULong“.

If you know anything at all about C integer types, you should know what the problem is by now (you probably worked it out a few paragraphs back). But if you don’t, now’s a good time to learn!

There are several different types you can use for storing integers in C: short, int, long, and long long (which very old compilers may lack). This is basically all about efficiency: you can only put a small number in a short, but if you only need to store small numbers, it might be more efficient to use a short than a long. Theoretically, when you use a short the compiler will allocate less memory than when you use an int, which uses less memory again than a long, which uses less than a long long. Practically speaking some of them wind up being the same size on some platforms, but the basic idea’s there.

All the types have signed and unsigned variants. The difference there is simple: signed numbers can be negative, unsigned ones can’t. Say an int is big enough to let you store 101 different values: a signed int would let you store any number between -50 and +50, while an unsigned int would let you store any number between 0 and 100.
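A quick sketch with Python’s ctypes module (my own illustration; the sizes printed are typical for 64-bit Linux) makes both points concrete:

```python
import ctypes

# Unsigned types can't hold negative numbers: assigning -1 to a C
# unsigned int wraps around to the largest value it can represent.
print(ctypes.c_uint(-1).value)   # 4294967295
print(ctypes.c_int(-1).value)    # -1

# The integer types take different amounts of memory (sizes in bytes;
# long is 8 on 64-bit Linux, but only 4 on some other platforms).
print(ctypes.sizeof(ctypes.c_short))  # 2
print(ctypes.sizeof(ctypes.c_int))    # 4
print(ctypes.sizeof(ctypes.c_long))
```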

Now look at that ccode declaration again. What is its type? FT_ULong. That ULong…sounds a lot like unsigned long, right?

Yes it does! Here, have a cookie. C code often declares its own aliases for standard C types like this; we can find Freetype’s in its API documentation, which I found by the cunning technique of doing a web search for FT_ULong. That finds us this handy definition: “A typedef for unsigned long.”

Aaaaaaand herein lies our bug! Whew, at last. As, hopefully, you can now see, this ccode variable is declared as an unsigned long, but we’re telling PyArg_ParseTuple to convert the Python object such that we can store it as an unsigned int, not an unsigned long.

But wait, you think. Why does this seem to work OK on most arches, and only fail on ppc64? Again, some of you will already know the answer, good for you, now go read something else. 😉 For the rest of you, it’s all about this concept called ‘endianness’, which you might have come across and completely failed to understand, like I did many times! But it’s really pretty simple, at least if we skate over it just a bit.

Consider the number “forty-two”. Here is how we write it with numerals: 42. Right? At least, that’s how most humans do it, these days, unless you’re a particularly hardy survivor of the fall of Rome, or something. This means we humans are ‘big-endian’. If we were ‘little-endian’, we’d write it like this: 24. ‘Big-endian’ just means the most significant element comes ‘first’ in the representation; ‘little-endian’ means the most significant element comes last.

All the arches Fedora supports except for ppc64 are little-endian. On little-endian arches, this error doesn’t actually cause a problem: even though we used the wrong format unit, the value winds up being correct. On (64-bit) big-endian arches, however, it does cause a problem – when you tell PyArg_ParseTuple to convert to an unsigned int, but store the result into a variable that was declared as an unsigned long, you get a completely different value (it’s multiplied by 2^32). The reasons for this involve getting into a more technical understanding of little-endian vs. big-endian (we actually have to get into the icky details of how values are really represented in memory), which I’m going to skip since this post is already long enough.

But you don’t really need to understand it completely, certainly not to be able to spot problems like this. All you need to know is that there are little-endian and big-endian arches, and little-endian are far more prevalent these days, so it’s not unusual for low-level code to have weird bugs on big-endian arches. If something works fine on most arches but not on one or two, check if the ones where it fails are big-endian. If so, then keep a careful eye out for this kind of integer type mismatch problem, because it’s very, very likely to be the cause.
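You can see the effect with a quick sketch of my own using Python’s standard struct module. This just models the memory layout: four bytes written at the start of an eight-byte variable whose remaining bytes happen to be zero, which is essentially what the “I” format unit did to our unsigned long:

```python
import struct

value = 0x1d7e3  # the uniindex for '1' from the ppc64 warning above

# Little-endian: the 4 written bytes are the *low* half of the
# 8-byte value, so it survives intact.
le = struct.unpack('<Q', struct.pack('<I', value) + b'\x00' * 4)[0]

# Big-endian: the 4 written bytes are the *high* half of the
# 8-byte value, so it gets shifted up 32 bits (multiplied by 2**32).
be = struct.unpack('>Q', struct.pack('>I', value) + b'\x00' * 4)[0]

print(hex(le))  # 0x1d7e3
print(hex(be))  # 0x1d7e300000000
```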

So now all that remained to do was to fix the problem. And here we go, with our one character patch:

diff --git a/src/ft2font_wrapper.cpp b/src/ft2font_wrapper.cpp
index a97de68..c77dd83 100644
--- a/src/ft2font_wrapper.cpp
+++ b/src/ft2font_wrapper.cpp
@@ -971,7 +971,7 @@ static PyObject *PyFT2Font_get_char_index(PyFT2Font *self, PyObject *args, PyObj
     FT_UInt index;
     FT_ULong ccode;

-    if (!PyArg_ParseTuple(args, "I:get_char_index", &ccode)) {
+    if (!PyArg_ParseTuple(args, "k:get_char_index", &ccode)) {
         return NULL;
     }

There’s something I just love about a one-character change that fixes several hundred test failures. 🙂 As you can see, we simply change the I – the format unit for unsigned int – to k – the format unit for unsigned long. And with that, the bug is solved! I applied this change on both x86_64 and ppc64, re-built the code and re-ran the test suite, and observed that several hundred errors disappeared from the test suite on ppc64, while the x86_64 tests continued to pass.

So I was able to send that patch upstream, apply it to the Fedora package, and once the package build went through, I could finally build python-deap successfully, two days after I’d first tried it.

Bonus extra content: even though I’d fixed the python-deap problem, as I’m never able to leave well enough alone, it wound up bugging me that there were still several hundred other failures in the matplotlib test suite on ppc64. So I wound up looking into all the other failures, and finding several other similar issues, which got the failure count down to just two sets of problems that are too domain-specific for me to figure out, and actually also happen on aarch64 and ppc64le (they’re not big-endian issues). So to both the people running matplotlib on ppc64…you’re welcome 😉

Seriously, though, I suspect without these fixes, we might have had some odd cases where a noarch package’s documentation would suddenly get messed up if the package happened to get built on a ppc64 builder.

QA protip of the day: make sure your test runner fails properly

Posted by Adam Williamson on January 01, 2017 01:44 AM

Just when you thought you were safe…it’s time for a blog post!

For the last few days I’ve been working on fixing Rawhide packages that failed to build as part of the Python 3.6 mass rebuild. In the course of this, I’ve been enabling test suites for packages where there is one, we can plausibly run it, and we weren’t doing so before, because tests are great and running them during package builds is great. (And it’s in the guidelines).

I’ve now come across two projects which have a unittest-based test script which does something like this:

#!/usr/bin/python3

import unittest

class SomeTests(unittest.TestCase):
    [tests here]

def main():
    suite = unittest.TestLoader().loadTestsFromTestCase(SomeTests)
    unittest.TextTestRunner(verbosity=3).run(suite)

if __name__ == '__main__':
    main()

Now if you just run this script manually all the time and inspect its output, you’ll be fine, because it’ll tell you whether the tests passed or not. However, if you try and use it in any kind of automated way you’re going to have trouble, because this script will always exit 0, even if some or all the tests fail. This, of course, makes it rather useless for running during a package build, because the build will never fail even if all the tests do.
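To see the trap concretely, here’s a quick sketch of my own (the embedded test script is a hypothetical minimal example, not from either real package) that runs the broken pattern in a subprocess and checks the exit status:

```python
import subprocess
import sys
import textwrap

# A minimal script using the broken pattern, with a test that
# deliberately fails.
script = textwrap.dedent("""
    import unittest

    class SomeTests(unittest.TestCase):
        def test_fails(self):
            self.fail("deliberate failure")

    suite = unittest.TestLoader().loadTestsFromTestCase(SomeTests)
    unittest.TextTestRunner(verbosity=0).run(suite)
""")

proc = subprocess.run([sys.executable, "-c", script])
# The test failed, but the runner's result was thrown away, so the
# script still exits 0 and an automated caller sees success.
print(proc.returncode)  # 0
```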

If you’re going to write your own test script like this (which…seriously consider if you should just rely on unittest’s ‘gathering’ stuff instead, or use nose(2), or use pytest…), then it’s really a good idea to make sure your test script actually fails if any of the tests fail. Thus:

#!/usr/bin/python3

import sys
import unittest

class SomeTests(unittest.TestCase):
    [tests here]

def main():
    suite = unittest.TestLoader().loadTestsFromTestCase(SomeTests)
    ret = unittest.TextTestRunner(verbosity=3).run(suite)
    if ret.wasSuccessful():
        sys.exit()
    else:
        sys.exit("Test(s) failed!")

if __name__ == '__main__':
    main()

(note: just doing sys.exit() will exit 0; doing sys.exit('any string') prints the string and exits 1).

Packagers, look out for this kind of bear trap when packaging…if the package doesn’t use a common test pattern or system but has a custom script like this, check it and make sure it behaves sanely.

Oooh, look! A shiny thing!

Posted by Adam Williamson on November 02, 2016 01:27 AM

Hmm.

Today I was supposed to be finalizing the test cases for Thursday’s switchable graphics Test Day.

Instead, thanks to this:

(jeff) adamw: Hey, I’ve been approached by 2 different people telling me dnf system-upgrade was failing. In both cases they had to import the F24 key manually. And in both cases they were going F21->F24. Do you know if that’s a documented limitation somewhere?

I somehow wound up spending the day using yum and dnf to upgrade a virtual machine from Fedora 13 to 15, to 16, to 17, to 23, to 24. And making a bunch of improvements to:

Easily distracted?

Moi?

Oh, sorry, I saw something shiny over there…(wanders off)

Fedora 25 switchable graphics Test Day this Thursday, 2016-11-03

Posted by Adam Williamson on October 31, 2016 09:30 PM

Yep, it’s Test Day time again – most likely the final Test Day of the Fedora 25 cycle. This Thursday, 2016-11-03, will be switchable graphics Test Day!

‘Switchable graphics’ refers to the fairly common current practice of laptops having two graphics adapters, one low-power one for general purpose use, one more powerful one for use with applications that require more oomph (e.g. games or 3D rendering applications). NVIDIA brands this as ‘Optimus’, and AMD just as ‘Switchable Graphics’ or ‘Dynamic Switchable Graphics’.

There are some enhancements to Fedora’s support for such systems in Fedora 25, and part of the Test Day’s purpose is to test those enhancements. The other part of the Test Day’s purpose is to ensure that support for switchable graphics on Fedora 25 Workstation with Wayland by default is as good as it can be, and that in some cases where we know Wayland support is not sufficient, fallback to X.org works as expected.

If you’re not sure whether you have a system with switchable graphics, you can run xrandr --listproviders. If the output from this command lists more than one ‘provider’, you likely have switchable graphics. If you do, please come along to the Test Day and help us test, if you can spare the time. If you believe your system has switchable graphics, but the command only lists one provider, please come join the Test Day chat on the day – we may be able to investigate and figure out what’s going on!

The Test Day page and the test cases are still being revised and tweaked as I write this, but as the week goes along, all the instructions you need to run the tests will be present there. As always, the event will be in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

What just happened?

Posted by Adam Williamson on October 27, 2016 08:08 AM

4pm: “Well, guess it’s time to write the F25 Final blocker status mail.”

4:10pm: “Yeesh, I guess I’d better figure out which of the three iSCSI blocker bugs is actually still valid, and maybe take a quick look at what the problem is.”

1:06am: “Well, I think I’m done fixing iSCSI now. But I seem to have sprouted four new openQA action items. Blocker status mail? What blocker status mail?”

Fedora 25 Workstation Wayland-by-default Test Day report

Posted by Adam Williamson on October 14, 2016 10:30 PM

Hi folks! As yesterday’s Test Day was pretty popular and widely-covered, I thought I’d blog the report as well as emailing it out.

We had a great Test Day! 49 testers combined ran a total of 341 tests and filed or referenced 35 bugs. 9 of those have since been closed as duplicates, leaving 26 open reports:

  • #1299505 gnome-calculator prints “Currency LTL is not provided by IMF or ECB” in Financial mode
  • #1330034 [abrt] calibre: QObject::disconnect() : python2.7 killed by SIGSEGV
  • #1367846 Scrolling is way too fast in writer
  • #1376471 global menu can’t be used for 15-30 seconds, startup notification stuck, missing item in alt+tab (on wayland)
  • #1379098 [Regression] Gnome-shell crashes on switching back from tty ([abrt] gnome-shell: wl_resource_post_event() : gnome-shell killed by SIGSEGV)
  • #1383471 [abrt] WARNING: CPU: 3 PID: 12176 at ./include/linux/swap.h:276 page_cache_tree_insert+0x1cc/0x1e0
  • #1384431 activities screen shows applications and search results at the same time
  • #1384440 dragging gnome dash application to a specific workplace doesn’t open the application to this workspace (on wayland)
  • #1384489 Music not recognizing/importing files
  • #1384502 Recent tab not available
  • #1384537 Opening a new gnome-software window creates a new entry without a proper icon
  • #1384546 Removing application does not bring back the install icon immediately
  • #1384551 Printing directions using maps do not show the marked path
  • #1384560 Screenshot of gnome-maps does not show the map part at all
  • #1384569 Places dropdown search does not function if weather is open on secondary monitor
  • #1384570 gnome-initial-setup does not exit at the end
  • #1384572 Places dropdown search does not function if clocks is open on secondary monitor
  • #1384590 [abrt] gnome-photos: babl_get_name() : gnome-photos killed by SIGABRT
  • #1384596 gnome-boxes: starting fails without any feedback
  • #1384599 gnome-calculator currency conversion is hard to use
  • #1384616 thumbnail “border” seems misaligned in activities overview
  • #1384651 Selecting city should automatically be added on Clock Application
  • #1384665 [abrt] authconfig-gtk: gdk_window_enable_synchronized_configure() : python2.7 killed by SIGSEGV
  • #1384671 system-config-language does not work under Wayland
  • #1384675 system-config-users does not work under Wayland
  • #1384678 Missing top-left icon (and full application name) on Wayland

Some of these are not Wayland bugs, but it’s not a bad thing that people found some non-Wayland bugs as well while testing! We did find several new Wayland issues, but on the positive side, no really big bugs that weren’t already known and on the radar for the final release.

So the event looks like a success on all fronts: we found some new bugs to squish, but it also gives us a decent indication that the Workstation-on-Wayland experience is in a good enough condition for a first stable release. We also confirmed that the Workstation-on-X11 session is available as a fallback and that works properly, for anyone who can’t use Wayland for any reason.

Many thanks to all the testers for their hard work!

wikitcms, relval, fedfind and testdays moved to Pagure

Posted by Adam Williamson on October 14, 2016 09:54 PM

Today I moved several of my pet projects from the cgit instance on this server to Pagure. You can now find them here:

The home page URLs for each project on this server – e.g. https://www.happyassassin.net/fedfind – also now redirect to the Pagure project pages.

I also deleted some other repos that were hosted in my cgit instance entirely, because I don’t think they were any longer of interest to anyone and I didn’t want to maintain them. Those were mostly related to Fedlet, which I haven’t been working on for 2-3 years now.

For now the repos for the three main projects – wikitcms, relval and fedfind – remain in my cgit instance, containing just a single text file documenting the move to Pagure; in a month or so I will remove these repositories and decommission the cgit instance. So, update your checkouts! 🙂

This saves me maintaining the repos, provides pull request and issue mechanisms, and I think it’s a good thing in general to have all Fedora-ish code projects in Pagure.

Many thanks to pingou and everyone else who works on Pagure, it’s a great project!

Fedora Workstation Wayland Test Day: 2016-10-13!

Posted by Adam Williamson on October 12, 2016 12:23 AM

Hi folks! Time to announce another Fedora 25 Test Day: Wayland Test Day! Note, the wiki page doesn’t exist yet, as the wiki is having issues right now, but I wanted to make the announcement early enough to give people time to prepare.

You may have read that we plan to make Fedora 25 Workstation the first release to default to Wayland (rather than X11) as the graphical server. This has been in place since Fedora 25 Alpha, but to prepare for the general release, we’d like to run a test day and get some broad-based testing to ensure that Wayland is at least good enough for an initial release, and that the option to switch to X11 works properly for those cases where it might be necessary.

You’ll be able to run most of the tests from a live image (without doing a permanent installation). All the test instructions will be on the wiki page and there will be QA and developer folks around all day in the IRC channel to help you test and report any issues you find.

Just about anyone with a computer can help with this testing, and we’d like to have feedback from as many users as possible, so please, if you have a little time on Thursday, come help out! As always, the event will be in #fedora-test-day on Freenode IRC. If you don’t know how to use IRC, you can read these instructions, or just use WebIRC.

X crash during Fedora update when system has hybrid graphics and systemd-udev is in update

Posted by Adam Williamson on October 04, 2016 09:36 PM

Hi folks! This is a PSA about a fairly significant bug we’ve recently been able to pin down in Fedora 24+.

Here’s the short version: especially if your system has hybrid graphics (that is, it has an Intel video adapter and also an AMD or NVIDIA one, and it’s supposed to switch to the most appropriate one for what you’re currently doing – NVIDIA calls this ‘Optimus’), DON’T UPDATE YOUR SYSTEM BY RUNNING DNF FROM THE DESKTOP. (This also applies if you have multiple graphics adapters that aren’t strictly ‘hybrid graphics’; the bug affects any case with more than one adapter.)

Here’s the slightly longer version. If your system has more than one graphics adapter, and you update the systemd-udev package while X is running, X may well crash. So if the update process was running inside the X session, it will also crash and will not complete. This will leave you in the unfortunate situation where RPM thinks you have two versions of several packages installed at the same time (and also a bunch of package scripts that should have run will not have run).
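If you have already been bitten by this, the duplicate-package state can usually be cleaned up afterwards. This is a preview-mode sketch of my own, not from the post above: the commands are only echoed until you set RUN to empty, and dnf’s --duplicates support is an assumption (check your dnf version).

```shell
#!/bin/sh
# Preview-mode sketch (assumption: a dnf with --duplicates support).
# Commands are echoed by default; set RUN= (empty) to actually execute them.
RUN=${RUN:-echo}
$RUN dnf repoquery --duplicates    # list packages with two installed versions
$RUN sudo dnf remove --duplicates  # remove the older copy of each duplicate
```

On systems where dnf lacks --duplicates, the older `package-cleanup --cleandupes` tool from yum-utils serves the same purpose, I believe.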

The bug is actually triggered by restarting systemd-udev-trigger.service; anything which does that will cause X to crash on an affected system. So far only systems with multiple adapters are reported to be affected; not absolutely all such systems are affected, but a good percentage appear to be. It occurs when the systemd-udev package is updated because the package %postun scriptlet – which is run on update when the old version of the package is removed – restarts that service.

The safest possible way to update a Fedora system is to use the ‘offline updates’ mechanism. If you use GNOME, this is how updates work if you just wait for the notifications to appear, the ones that tell you you can reboot to install updates now. What’s actually happening there is that the system has downloaded and cached the updates, and when you click ‘reboot’, it will boot to a special state where very few things are running – just enough to run the package update – run the package update, then reboot back to the normal system. This is the safest way to apply updates. If you don’t want to wait for notifications, you can run GNOME Software, click the Updates button, and click the little circular arrow to force a refresh of available updates.

If you don’t use GNOME, you can use the offline update system via pkcon, like this:

sudo pkcon refresh force && \
sudo pkcon update --only-download && \
sudo pkcon offline-trigger && \
sudo systemctl reboot

If you don’t want to use offline updates, the second safest approach is to run the update from a virtual terminal. That is, instead of opening a terminal window in your desktop, hit ctrl-alt-f3 and you’ll get a console login screen. Log in and run the update from this console. If your system is affected by the bug, and you leave your desktop running during the update, X will still crash, but the update process will complete successfully.

If your system only has a single graphics adapter, this bug should not affect you. However, it’s still not a good idea to run system updates from inside your desktop, as any other bug which happens to cause either the terminal app, or the desktop, or X to crash will also kill the update process. Using offline updates or at least installing updates from a VT is much safer.

The bug reports for this issue are:

  • #1341327 – for the X part of the problem
  • #1378974 – for the systemd part of the problem

Updates for Fedora 24 and Fedora 25 are currently being prepared. However, the nature of the bug actually means that installing the update will trigger the bug, for the last time. The updates will ensure that subsequent updates to systemd-udev will no longer cause the problem. We are aiming to get the fix into Fedora 25 Beta, so that systems installed from Fedora 25 Beta release images will not suffer from the bug at all, but existing Fedora 25 systems will encounter the bug when installing the update.

Fedora 25 i18n test day 2016-09-28

Posted by sumantro on September 27, 2016 04:23 PM
Hey all, this is a call to action for the internationalization (i18n) Test Day, which is happening tomorrow. We would like to have people testing the keyboard layouts of different languages, and emoji.

How to test

If you test with an installed system, make sure you have all the current updates installed using the update manager, especially if you use the Alpha images. Alternatively, grab a copy of the latest nightly; you can get it here.
Once you are set up, you can start testing.

Reporting

Result Page : http://testdays.fedorainfracloud.org/events/10

Found bugs? Report them here.

Retrospection of Fedora QA Global Onboarding Call

Posted by sumantro on August 23, 2016 07:18 AM
Since the Fedora QA mailing list has started seeing loads of new contributors, we decided to kick off onboarding calls to help new joiners understand the Fedora QA process in the right way. Last Saturday (2016-08-20), after collecting votes from the new joiners, we held the Fedora QA onboarding call.

The last time we ran one of these, we faced a few roadblocks, the major one being the participant limit of Hangouts. Since we wanted the call to be more interactive, we thought it would be best to have it over Hangouts, and the call was indeed very interactive, with a load of questions and answers – but participation was restricted because we reached the maximum number of participants.

For this call we went with Hangouts on Air instead, trading interactivity for viewership. As this format has effectively no limit on participation, the communication pattern was mostly one-way, though the PiratePad chatbox helped a lot in keeping the conversation two-way.

As a silver lining, we had the recording up with zero downtime; people with lower bandwidth can always watch the video later or download it from YouTube for reference.

Adam Williamson’s post announcing the call: https://www.happyassassin.net/2016/08/19/fedora-qa-onboarding-call-tomorrow-2016-08-20-at-1700-utc/

Youtube recording : https://www.youtube.com/watch?v=ASQmkOrB_DY

Going forward, we will conduct an onboarding call whenever around 30 new contributors have joined Fedora QA.

UEFI for QEMU now in Fedora repositories

Posted by Kamil Páral on June 27, 2016 12:55 PM

I haven’t seen any announcement, but I noticed the Fedora repositories now contain the edk2-ovmf package. That is the package necessary to emulate UEFI in QEMU/KVM virtual machines. It seems all licensing issues have finally been resolved, and now you can easily run UEFI systems in your virtual machines!
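As a quick illustration, here is roughly what booting a VM with the OVMF firmware looks like. This is my own hedged sketch, not from the post: the firmware paths and the ISO name are assumptions (check where your edk2-ovmf package installs its files), and the commands are only echoed until you set RUN to empty.

```shell
#!/bin/sh
# Sketch: boot a QEMU/KVM guest with the OVMF UEFI firmware from edk2-ovmf.
# Paths and image names are assumptions; adjust them for your system.
# Commands are echoed by default; set RUN= (empty) to actually run them.
RUN=${RUN:-echo}
CODE=/usr/share/edk2/ovmf/OVMF_CODE.fd          # read-only firmware code
VARS=OVMF_VARS.fd                               # per-VM writable NVRAM copy
$RUN cp /usr/share/edk2/ovmf/OVMF_VARS.fd "$VARS"
$RUN qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive if=pflash,format=raw,readonly=on,file="$CODE" \
    -drive if=pflash,format=raw,file="$VARS" \
    -cdrom Fedora-Workstation-Live.iso
```

The two pflash drives are the usual OVMF convention: the firmware code stays read-only and shared, while each VM gets its own writable copy of the variable store.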

I have updated the Using_UEFI_with_QEMU wiki page accordingly.

Enjoy.


‘Package XXX is not signed’ error during upgrade to Fedora 24

Posted by Kamil Páral on June 22, 2016 11:54 AM

Many people hit issues like this when trying to upgrade to Fedora 24:

 Error: Package a52dec-0.7.4-19.fc24.x86_64.rpm is not signed

You can easily see that this is a very widespread issue if you look at the comments section under our upgrade guide on Fedora Magazine. In fact, this issue probably affects everyone who has the rpmfusion repository enabled (a very popular third-party repository). Usually the a52dec package is mentioned, because it comes early alphabetically, but it can be a different one (depending on what you installed from rpmfusion).

The core issue is that even though rpmfusion’s Fedora 24 repository is available, the packages in it are not signed yet – they simply have not had time to do that. However, the rpmfusion repository metadata from Fedora 23 demands that all packages be signed (which is a good thing; package signing is crucial to preventing all kinds of nasty security attacks). The outcome is that DNF rejects the transaction for being insecure.

According to rpmfusion maintainers, they are working on signing their repositories and it should be done hopefully soon. So if you’re not in a hurry with your upgrade, just wait a while and the problem will disappear soon (hopefully).

But, if you insist that you want to upgrade now, what are your options?

Some people suggest you can add the --nogpgcheck option to the command line. Please don’t do that! That completely bypasses all security checks, even for proper Fedora packages, and leaves you vulnerable to security attacks.

A much better option is to temporarily remove rpmfusion repositories:

$ sudo dnf remove 'rpmfusion-*-release'

and run the upgrade command again. You’ll likely need to add --allowerasing option, because it will probably want to remove some packages that you installed from rpmfusion (like vlc):

$ sudo dnf system-upgrade download --releasever=24 --allowerasing

This is OK; after you upgrade your system, you can enable the rpmfusion repositories again and reinstall the packages that were removed prior to the upgrade.

(I recommend really removing the rpmfusion repositories rather than just disabling them, because they manage their repos in a non-standard way, enabling and disabling their updates and updates-testing repos during the system lifecycle according to their needs. That makes it hard to know which repos to enable after the system upgrade – they are not the same as the ones enabled before it. What they are doing is really rather ugly, and it’s much better to perform a clean installation of their repos.)

After the system upgrade finishes, simply visit their website, install the repos again, and install any packages you’re missing. This way, your upgrade was performed safely. The packages installed from rpmfusion might still be installed unsafely (depending on whether they have managed to sign the repo by then), but that is much better than upgrading your whole system unsafely.
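For reference, re-adding the repos after the upgrade boils down to installing rpmfusion’s release packages again. The URL pattern below follows their usual layout, but treat it as an assumption and double-check on their website; the actual install command is left commented out since it needs root and network access.

```shell
#!/bin/sh
# Rebuild the rpmfusion release-package URLs for the new Fedora version.
# The URL pattern is an assumption based on rpmfusion's usual layout.
rel=24   # the release you upgraded to; on Fedora: rel=$(rpm -E %fedora)
for repo in free nonfree; do
    url="https://download1.rpmfusion.org/${repo}/fedora/rpmfusion-${repo}-release-${rel}.noarch.rpm"
    echo "$url"
    # sudo dnf install "$url"   # uncomment to actually install the repo
done
```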

To close this up, I’m sorry that people are hit by these complications, but it’s not something the Fedora project can directly influence (short of banning third-party repos during system upgrades completely, or some similarly drastic measure). This is in the hands of those third-party repos. Hopefully a lot of this pain will go away once we start using Flatpak.


Hosting your own Fedora Test Day

Posted by sumantro on June 07, 2016 03:08 AM
This post explains how to hold your own Test Day. Most of the time you won’t have to do all the work single-handed, but below is a draft of how you can go ahead and host your own Test Day, or at least get one started.


Procedure:

1. Decide which change you want to test.
2. Create a ticket.
3. Find out whether this type of test has run before. If it has, you can easily re-use the previous test case wiki pages; if not, you have to write a fresh Test Day wiki page and test case pages.
4. Once the wiki page(s) are set up, create the metadata page for the Test Day app.
5. The results will be shown in the Test Day app.

How to set it up:
1. For Fedora 24, you can find the change set here.

2. Create a Trac ticket. One example is below.

[screenshot: example Trac ticket]

3. Once you have picked at least one change, check whether that type of test ever ran before; if not, start off by grabbing the Test Day wiki template.

[screenshot: Test Day wiki template]

An actual Test Day wiki page:

[screenshot: example Test Day wiki page]
4. Next, and most crucial, is setting up the test case page. It contains instructions on what is to be done and how, and explains what result(s) each test case should yield. One example is below.

[screenshot: example test case page]
5. Set up the metadata page so the app works properly: create a wiki page exactly like the one shown below.

[screenshot: example metadata page]
6. Once done, add the metadata link here.

7. After that, your Test Day is live, and you can find its Test Day app results page, which will look much like this.

[screenshot: Test Day app results page]


That completes the required procedure. Needless to say, you should announce your Test Day on the test and test-announce mailing lists!

Fedora Media Writer testday [Report]

Posted by sumantro on April 21, 2016 11:06 AM
This is a proud moment, as a good number of people participated in the event.

Testers: 22, Tests: 40
Bugs filed: 16, New: 16, Duplicates: 0, Fixed: 0




Participants
  1. Ayan
  2. Karthik Subrahmanya
  3. achembar
  4. arehtykitna
  5. frantisekz
  6. kparal
  7. lbrabec
  8. pschindl
  9. priynag
  10. renault
  11. satellit
  12. sayak
  13. sumantro
  14. swarnava
  15. cpanceac
  16. deaddrift
  17. lsatenstein
  18. roger.k.wells
  19. cmurf
  20. jsedlak
  21. juliuxpigface
  22. qqqqqqq


Bugs and Github issues Reported

  1. Moving liveusb-creator to a different display runs out of memory and kills desktop - 1328452
  2. LUC doesn't reuse an already downloaded image and gets stuck - 1328369
  3. AttributeError: 'NoneType' object has no attribute 'device' - 1328337
  4. liveusb-creator: pycurl.error: cannot invoke setopt() - perform() is currently running - 1328340
  5. Crash when choosing ISO on samba server - 1328560
  6. Windows 10 install USB stick not recognized - 1328794
  7. liveusb-creator on Windows 10 opens diskpart.exe but does nothing - 1328484
  8. Partitions are not unmounted before writing image to a device, which can lead to corrupted data being written - 1328498
  9. Do not leave partially downloaded files on disk - 1328789
  10. No writing progress is displayed - 1328462
  11. USB stick light flashes continuously while LUC is running, whether it's reading/writing or not - 1328563
  12. CPU is over 100% for the liveusb-creator process while the application is doing nothing - 1318491
  13. Open button doesn't work in file selection dialog - 1328457
  14. https://github.com/lmacken/liveusb-creator/issues/43
  15. https://github.com/lmacken/liveusb-creator/issues/44
  16. https://github.com/lmacken/liveusb-creator/issues/45