Making Programs & Events Dashboard updates more reliable

Google Summer of Code projects are often a chance to focus on nagging problems: things that come up every so often, but are never quite urgent enough for more than a quick fix. For Programs & Events Dashboard, which is lately being used for more than 2000 events per year, these “every so often” issues can add up in a big way. Whether it’s rare errors that require a manual fix or user interface shortcomings that lead to confusion and questions, fixing them means saving a lot of time and frustration in the long run.

Shashwat Kathuria’s GSoC 2020 project tackled a set of these issues, and I’m delighted with what he’s accomplished. I’ll let him introduce himself and explain what he worked on.

-Sage


Hello everyone! I am Shashwat Kathuria, a third-year undergraduate pursuing a Bachelor of Technology in Computer Science & Engineering at the Indian Institute of Technology, Jodhpur. This year I was selected for the Google Summer of Code program with the Wikimedia Foundation, and I worked on the Wiki Education Dashboard with my mentor Sage Ross. My project focused on improving error tracking in the course data update process, so that we can inform users about errors and show them some update statistics, and on making improvements to the update process itself.

The error tracking part of the project will be helpful for all types of programs, and particularly for large ones that involve huge amounts of data or highly prolific editors. When the statistics for a program are incomplete or have not updated recently, it has been difficult for users to figure out what was going wrong and why they were not able to see the data that they were expecting. Now, with this feature in place, the Dashboard makes it much easier to identify errors along with other common causes of missing data. Examples include the Wikimedia replica databases lagging behind (so that missing edits show up later), or a program inadvertently set up as an “Article Scoped Course” (which only counts edits to specific assigned articles or tracked categories).

There are a few general types of scenarios in which the course data update process can lead to errors:

  • Server errors from MediaWiki and other data sources
  • Getting an unexpected format of data
  • Network connectivity problems
  • Timeouts when reading data or connecting to the server

With the error tracking system in place, the Dashboard now keeps track of any errors that occur under the hood during an update, and records which update of which program each error belongs to. Additionally, the overall update statistics are now surfaced to users, including:

  • Summary of the latest update, whether or not it succeeded, including the number of errors that occurred (if any)
  • Summary of the last 10 updates, how many failed and how many encountered errors
  • Total number of updates
  • Whether or not the course will be regularly updated in future, and how long the statistics will continue to be updated.

Click “See more” beneath the program stats to see new details about the update process.

When an error occurs during update, we now keep track of which program update caused it.
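
To make the idea concrete, here is a minimal Python sketch of per-update error tracking and the summary statistics listed above. It is an illustration only, not the Dashboard's actual code: the names `UpdateRecord`, `UpdateLog`, `run_update` and the summary keys are all hypothetical, and the exception types caught are just stand-ins for the kinds of errors described earlier (timeouts, connectivity problems, unexpected data formats).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class UpdateRecord:
    """One run of the data update for a single program (hypothetical structure)."""
    started_at: datetime
    errors: List[str] = field(default_factory=list)
    failed: bool = False  # True only if the update could not complete at all


@dataclass
class UpdateLog:
    """History of updates for one program (hypothetical structure)."""
    records: List[UpdateRecord] = field(default_factory=list)

    def start_update(self) -> UpdateRecord:
        record = UpdateRecord(started_at=datetime.now(timezone.utc))
        self.records.append(record)
        return record

    def summary(self, window: int = 10) -> Dict[str, object]:
        """Statistics of the kind shown to users: latest update, recent updates, total."""
        recent = self.records[-window:]
        latest = self.records[-1] if self.records else None
        return {
            "total_updates": len(self.records),
            "latest_succeeded": latest is not None and not latest.failed,
            "latest_error_count": len(latest.errors) if latest else 0,
            "recent_failed": sum(1 for r in recent if r.failed),
            "recent_with_errors": sum(1 for r in recent if r.errors),
        }


def run_update(log: UpdateLog, steps: List[Callable[[], None]]) -> UpdateRecord:
    """Run each step, attaching any error to this specific update's record."""
    record = log.start_update()
    for step in steps:
        try:
            step()
        except (TimeoutError, ConnectionError, ValueError) as exc:
            # Record the error and carry on with the remaining steps.
            record.errors.append(f"{type(exc).__name__}: {exc}")
    return record
```

In this sketch, an error in one step is attached to the update in which it occurred rather than stopping the whole update, which is what lets the summary distinguish updates that failed outright from updates that completed with errors.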

For the next part of the project, I worked on making data updates more reliable. This involved a technical concept called “orphan locks”. At a high level, an orphan lock stops all future updates to a program by making the Dashboard’s data update process believe that the statistics are already in the middle of an update when they are actually not being updated at all. It is usually caused when our servers run out of memory or abruptly shut down while a course data update is in progress. This was a particularly crucial problem, and previously we fixed it manually, and only after users told us that a particular course was not getting updated. I added a feature to automatically detect and remove these orphan locks, so that programs don’t get stuck indefinitely without updates. And when we deployed this feature into production, it removed 7 orphan locks that we hadn’t known about!
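
As a rough illustration of the idea, the Python sketch below treats a lock as orphaned when the job that took it is no longer running. This criterion is an assumption made for the example, as are the names `UpdateLock`, `find_orphan_locks` and `remove_orphan_locks`; the Dashboard's real detection logic may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class UpdateLock:
    """Marks a course as 'currently being updated' (hypothetical structure)."""
    course_id: int
    job_id: str  # the update job that acquired the lock


def find_orphan_locks(locks: Dict[int, UpdateLock],
                      active_job_ids: Set[str]) -> List[UpdateLock]:
    """Assumption: a lock is orphaned if its job is not among the active jobs."""
    return [lock for lock in locks.values() if lock.job_id not in active_job_ids]


def remove_orphan_locks(locks: Dict[int, UpdateLock],
                        active_job_ids: Set[str]) -> List[int]:
    """Delete orphaned locks in place and return the affected course ids."""
    orphans = find_orphan_locks(locks, active_job_ids)
    for lock in orphans:
        del locks[lock.course_id]
    return sorted(lock.course_id for lock in orphans)
```

Running a check like this periodically means a course whose update job died mid-run becomes eligible for updates again without anyone having to notice and report it.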

For the last part of my project, I made the course data update process for categories more efficient and continuous. Previously, the Dashboard updated tracked categories once a day for all courses, meaning that it could take a while before any data in a tracked category would show up. Now, category updates run regularly along with the main course update process, with a slight improvement in efficiency. I implemented the same strategy for article status, which is a heavier process in terms of time and memory, and which we think may have been the cause of major system stability problems. So far, the system has been very stable since these changes were deployed, although the time between updates for long-running programs is somewhat longer lately.

All in all, my journey with Wiki Education Dashboard has been truly amazing, and I owe a lot of that to my mentor Sage Ross, who has helped me at each step and made coding super enjoyable. I have been able to learn a lot of new technologies, practices, and techniques, and to get feedback that prompted self-reflection. I am super grateful for this, because it opened a door for me onto how things work in real-world applications and how each step is carefully analyzed before going forward.

– Shashwat


Thank you Shashwat! It’s been a great pleasure working with you over the last few months, and your work is already making a difference!