Thoughts

Long form writing about my research and the field in general. Also tangents, because I have many interests.

Designing my COVID Dashboard

My Mississippi State COVID Dashboard was designed to be an at-a-glance summary of the current detected COVID cases on my campus. Herein I discuss why and how I created it, and the choices I made in the design. I am writing this to be transparent about the design and purpose—in the midst of a pandemic in a country that at times struggles to trust science, I feel it is doubly important to properly communicate my intent.

Motivation

In early August, I discovered that the health center (LSHC) was reporting its COVID testing results broken down by students, employees, and others. In addition, it was partitioned into positive and negative tests, so that the positivity rates could be identified. However, the data was reported only in aggregate or with respect to the last week—I was more interested in the change over time (especially with regard to the week before). In addition, having an archive of the weekly cases could be useful for future data analysis and visualization courses (the exact courses I teach). Thus, I set out to design a simple, compact dashboard that would summarize a rough “State-of-State”.

The Technology

The first stage was to create something to archive the data. I built a little scraper that I run manually once a week. It uses Python to read the LSHC page and extract the results, storing them if there is a new week of data; if new data is stored, I manually upload it to GitHub. Thus, so long as the LSHC page is updated regularly, we’ll have an archive of data.
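
The scraper itself is small. Below is a minimal sketch of the idea; the URL, the table layout, and the column order are hypothetical stand-ins, since the parsing has to match whatever markup the LSHC page actually uses.

    import csv
    import requests
    from bs4 import BeautifulSoup

    PAGE_URL = "https://www.example.edu/health-center/covid"  # hypothetical
    ARCHIVE = "covid_archive.csv"  # hypothetical local archive file

    def scrape_week():
        # Fetch the page and pull the most recent week's counts.
        html = requests.get(PAGE_URL, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        # Assume the counts live in the first data row of the first table.
        row = soup.find("table").find_all("tr")[1]
        # Assumed order: week, student pos/neg, employee pos/neg, other pos/neg
        return [td.get_text(strip=True) for td in row.find_all("td")]

    def archive_if_new(row):
        # Store the row only if this week isn't already archived.
        with open(ARCHIVE, newline="") as f:
            known_weeks = {r[0] for r in csv.reader(f)}
        if row[0] not in known_weeks:
            with open(ARCHIVE, "a", newline="") as f:
                csv.writer(f).writerow(row)

    archive_if_new(scrape_week())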

The second part was to generate the embeddable dashboard. The generation is straightforward (a sketch follows the list):

  • I state how many new cases there are amongst students and employees separately (and in aggregate)
  • I calculate the current positivity rates and indicate (via color) how concerning they are based upon some guidelines by the World Health Organization and Johns Hopkins
  • Finally, I generate some small line charts to give an overview of the trend over time
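
In sketch form, the summary step looks something like the following, assuming the archive is loaded into a pandas DataFrame with per-week positive and negative counts per group (the column names here are my own invention):

    import pandas as pd

    def summarize(df):
        # df: one row per week, with columns like student_pos, student_neg, ...
        summary = {}
        for group in ("student", "employee"):
            pos = df[f"{group}_pos"]
            neg = df[f"{group}_neg"]
            rate = pos / (pos + neg)  # weekly positivity rate
            summary[group] = {
                "new_cases": int(pos.iloc[-1]),   # most recent week
                "positivity": float(rate.iloc[-1]),
                "history": rate.tolist(),         # fed to the sparklines
            }
        summary["total_new_cases"] = (summary["student"]["new_cases"]
                                      + summary["employee"]["new_cases"])
        return summary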

All of this is written in Python. It reads my archived data from GitHub directly, then generates the HTML snippet to embed. Since I’m not using GitHub Pages to store the snippet, I have to do a bit of trickery to properly embed the page in an iframe on my site.
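
Reading the archive straight from GitHub is a one-liner with pandas, since raw.githubusercontent.com serves the file contents directly (the repository path below is a hypothetical stand-in):

    import pandas as pd

    # Hypothetical raw-GitHub path to the archived weekly counts.
    ARCHIVE_URL = ("https://raw.githubusercontent.com/"
                   "someuser/covid-archive/master/covid_archive.csv")
    df = pd.read_csv(ARCHIVE_URL)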

The Design

Positivity Rates

The guidance from the WHO and Johns Hopkins linked above established two thresholds to be concerned with: 5% and 15% positivity. Thus, I color the results based upon “least”, “more”, and “most concern” thresholds. Really, anything red means “be concerned”, and bold red just emphasizes the level of concern. I am changing a quantitative dataset (a range of percentages) into a categorical/ordinal dataset (ok/bad, with two levels of bad), so I’m not committing the great sin of color mapping. Or maybe I am just trying to convince myself.1
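
In code, this amounts to nothing more than two cutoffs. A minimal sketch, with category names of my own choosing:

    def concern_level(positivity):
        # Map a positivity rate in [0, 1] to an ordinal category using
        # the 5% and 15% thresholds discussed above.
        if positivity < 0.05:
            return "least-concern"  # styled neutrally
        elif positivity < 0.15:
            return "more-concern"   # styled red
        else:
            return "most-concern"   # styled bold red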

Positivity Sparklines

As I write this post, I only have two weeks of data from the start of the Fall 2020 semester. However, in time, there should be more. Consider what happens if the positivity rate is increasing week to week. Which of these two shows the larger positivity?

Two different data sets. Which has a higher value?

The issue here is that the upper value is increasing, but the size of the visualization is not—the mapping is not preserved week-to-week. This can potentially lead to misinterpretation, as I have commented on in the past.

Now, in my case, the problem is a mapping of space, not a mapping of color, but the issue is the same—we want to preserve the mental model. In this case, I have a fixed upper percentage: so long as the positivity does not reach 50%, I assume the vertical span is 0%–50%. Thus, week-to-week, the vertical scale will be the same. Only if the positivity increases past 50% (which can happen due to a lack of tests alone) will the scale change.

Same data, now with same vertical scale.
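
With matplotlib, pinning the vertical scale is a single line. A minimal sketch of the sparkline generation under that assumption (the figure dimensions are arbitrary):

    import matplotlib.pyplot as plt

    def sparkline(rates, path, width_in=1.5, height_in=0.4):
        # rates: weekly positivity values in [0, 1]
        fig, ax = plt.subplots(figsize=(width_in, height_in))
        ax.plot(range(len(rates)), rates, linewidth=1)
        ax.set_ylim(0, 0.5)  # fixed 0%-50% span; the mapping holds week to week
        ax.axis("off")       # sparklines carry no axes, ticks, or labels
        fig.savefig(path, bbox_inches="tight", transparent=True)
        plt.close(fig)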

What about horizontal scale? I could fix it to some number of pixels per week. However, as time grows without bound but screen space does not, this suffers a problem similar to the one above. I believe that so long as the three sparklines have the same width, we will be comparing a similar scale. The science of aspect ratio seems to be undecided at the moment, best I can tell (correct me if I am wrong. Nicely), but for the moment I bank the sparklines to 45° using the simple median approach, sketched below. I just choose the one with the largest width and fix the other two to be the same.
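
For the curious, the median-slope banking works out to the computation below: pick the width so that the median segment draws at 45°, then give every sparkline the width of the widest. This is my paraphrase of the idea, not the dashboard's exact code.

    from statistics import median

    def banked_width(rates, height_px=30, y_span=0.5):
        # Bank to 45 degrees: choose the width (in pixels) at which the
        # median absolute segment slope renders at a 45-degree angle.
        slopes = [abs(b - a) for a, b in zip(rates, rates[1:])]
        px_per_unit_y = height_px / y_span  # fixed 0%-50% vertical span
        weeks = len(rates) - 1              # horizontal span in weeks
        return median(slopes) * px_per_unit_y * weeks

    # Made-up example: fix all three sparklines to the largest banked width.
    series = [[0.02, 0.04, 0.05], [0.03, 0.03, 0.06], [0.01, 0.05, 0.04]]
    width = max(banked_width(r) for r in series)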

Improvements

I put this all together over one weekend, so while functional, there are things that could be improved:

  • The scraper could be embedded as a service that runs automatically, instead of me checking once a week
  • I’m not doing anything sophisticated to verify that the data I get does not change mid-week—if LSHC changes the page beyond a date change, I don’t check for it.
  • The dashboard generation could similarly be run automatically once a week. The snippet wouldn’t need to live on GitHub then, which would make embedding easier if run from my own server.
  • As mentioned, the data is incomplete. However, neither our local hospital nor the State Health Department reports on testing directly (though MSDH has aggregate testing results).
  • My thresholds for a concerning positivity value could be tweaked. As stated, I used the WHO-recommended 5% threshold for concern. The 15% threshold I took from the aforelinked Johns Hopkins page, though I’ve seen a few other values elsewhere.
  • Similarly, I do not check the “5% or lower positivity for two weeks” guidance from the WHO, just the threshold. The sparklines can give a sense of this, though.
  • I only state if positive test rates are increasing, unchanged, or decreasing. I’m interested in the psychology of interpreting change, so I could say “strongly” increasing if someone looking at it would reasonably agree (i.e., is 1 to 2 “strongly”? 17 to 40 likely is, but what of 27 to 50 or 27 to 64?). Any pointers to literature on this are appreciated.
  • I should flag if the positivity gets greater than 50% and the vertical scale thus changes. But at that point, assuming it’s not a data glitch (e.g., one positive test out of one test given), there are likely bigger issues.
  • There are a few potential studies that could be run with regard to the tasks and designs I chose; feel free to contact me if you want to delve into them further.

  1. I discovered the New York Times’ interactive on testing after I created this project. It uses sparklines and a color scheme similar to mine, but also highlights more data in a national context. I’ll chalk it up to similar thinking and take it as a sign that I’m on the right track.