Saturday, June 01, 2019

Speed Scripting - EXIF, JSON, and OpenBadges

As OpenBadges have become more commonplace, I find that I haven't earned them all under a single identity; some were awarded through my work email address, others via my primary personal email address, and still others to yet another personal email address.  This led to a bit of confusion in a recent course of study, when I accidentally completed the prerequisite work under multiple email addresses.  To resolve that problem (and ensure that I don't do that in the future), I needed a tool to display the email address to which each of my badges had been awarded.

Now, there are a few complicating factors:
  1. This information is stored (in the badge PNG image) as metadata, namely a JSON object.  Most Linux distributions don't ship a command-line JSON parser by default (jq, for example, is a separate install), and I like to write scripts with as few dependencies as possible.
  2. The OpenBadges specification has been around long enough to go through a few versions, and the location of the information has changed from version to version; that means I can't do a straightforward "it's always in field 27" text-based extraction.
  3. For verification purposes, the recipient's email address is NOT stored as plaintext; rather, a SHA256 hash of the address is embedded.  So, I'll need to know the SHA256 hashes for each of my email addresses used in earning badges; I used this free SHA256 Hash Generator to grab the necessary hashes. 
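As an alternative to a web-based generator, the hashes can also be produced locally.  What follows is a minimal sketch, assuming that sha256sum (from GNU coreutils) is installed and that the issuer hashed the bare address with no salt; if a badge's recipient data includes a salt, it would have to be appended to the address before hashing.  The function name hash_email is a hypothetical helper, not part of any existing tool:

```shell
#!/bin/sh
# hash_email (hypothetical helper): print an address in the same
# "sha256$<hex digest>" form that appears in the badge metadata.
# Assumption: the issuer used an unsalted hash of the address as typed.
hash_email() {
  # printf avoids the trailing newline that echo would add to the input;
  # sha256sum prints "<digest>  -", so awk keeps only the first column.
  printf '%s' "$1" | sha256sum | awk '{print "sha256$" $1}'
}
```

Running hash_email with one of my addresses as its argument prints a string that can be compared directly against the "identity" value stored in a badge.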

First, I installed exiftool (which is available for most Linux distributions - check your package manager) to extract the PNG metadata.  The OpenBadges JSON object is returned as a single string element, like so:
Openbadges                      : {"@context":"https://w3id.org/openbadges/v1","type":"Assertion","id":"https://www.youracclaim.com/api/v1/obi/badge_assertions/c1396602-e7c9-4213-a769-7d2735fb0f28","uid":"c1396602-e7c9-4213-a769-7d2735fb0f28","recipient":{"type":"email","identity":"sha256$cb6c6af3883a452da960b6cb88630b221888e7afce0ff612a483351664bb7dd0","hashed":true},"image":"https://acclaim-production-app.s3.amazonaws.com/images/84ac9eff-b8a2-4683-846b-f59887a73801/Python%2B101%2BData%2BScience.png","evidence":"https://www.youracclaim.com/badges/c1396602-e7c9-4213-a769-7d2735fb0f28","issuedOn":"2017-12-12T23:57:06.000Z","badge":"https://www.youracclaim.com/api/v1/obi/badge_classes/91e77961-2bcc-4b3a-9ff8-9333921bb2c4","verify":{"type":"hosted","url":"https://www.youracclaim.com/api/v1/obi/badge_assertions/c1396602-e7c9-4213-a769-7d2735fb0f28"}}

The only piece of this that interests us (in this case) is this:
sha256$cb6c6af3883a452da960b6cb88630b221888e7afce0ff612a483351664bb7dd0
Now, this series is entitled "Speed Scripting", so I'm not about to write a full JSON parser.  *chuckle*  Given that this is basically a string of text output, I'm going to use grep to pull the Openbadges string from exiftool's output and employ awk to iterate through the fields and print only the field containing "sha256", using the double quote as my field delimiter.  (I'm using " as my field delimiter so that I don't have to go through an extra step to strip the quotes from the output string.)  The resulting string will be stored in the $hashaddr variable:
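BADGEDIR=/home/wmorgan/badges
for file in "$BADGEDIR"/*.png
do
  # pull the Openbadges line, split it on double quotes, keep the sha256 field
  # ($file is quoted so that filenames containing spaces don't break exiftool)
  hashaddr=`exiftool "$file" | grep Openbadges |
  awk -F'\"' '
     {for (i=1; i<=NF; i++) {
        if ($i ~ /sha256/) print $i;
     }}'`
done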
(Note that I used a backslash to escape the double-quote character in the -F option to awk; since the option is already inside single quotes, a bare -F'"' works as well.)
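
To see the field splitting in isolation, here is a small sketch that runs the same grep/awk logic on a shortened, made-up Openbadges line instead of real exiftool output.  The function name extract_hash and the sample string are illustrative assumptions; the delimiter is written as -F'"', which behaves the same as the escaped form inside single quotes:

```shell
#!/bin/sh
# extract_hash (hypothetical name): read exiftool-style text on stdin and
# print every double-quote-delimited field that contains "sha256".
extract_hash() {
  grep Openbadges |
  awk -F'"' '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /sha256/) print $i
    }
  }'
}

# Shortened, made-up sample line; single quotes keep the shell from
# trying to expand the $ in the hash string.
sample='Openbadges : {"recipient":{"type":"email","identity":"sha256$0123abcd","hashed":true}}'
printf '%s\n' "$sample" | extract_hash
```

Splitting on the double quote leaves the hash as a field of its own (the tenth, in this sample), so the script prints sha256$0123abcd with no quotes to strip.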

That just collects the SHA256 hash for each badge...but that's only the halfway point; I need to map those hashes to their corresponding email addresses.  So, I'll add my "known hashes" that I generated earlier (note that we must prepend "sha256$" to each hash string and, because the strings are in double quotes, escape the $ character):
MYHASHEDADDR1="sha256\$caceb56680cabe892cdc5b903b2cbaf9ec3462699dff9094cc25ad27ef824aaa"
MYHASHEDADDR2="sha256\$cb6c6af3873a452da960b6cb88630b221845e7afce0ff612a483351664bb9bd0"
MYHASHEDADDR3="sha256\$2a196380936706eac9c60c638be63409be9eb2728ed0302188ce2d899ed22afb"
and use the shell's case construct to handle the comparison and output:
  case $hashaddr in
     $MYHASHEDADDR1)
        echo wesbo@email.address.edu
        ;;
     $MYHASHEDADDR2)
        echo wessinator@second.email.org
        ;;
     $MYHASHEDADDR3)
        echo bigwes@yetanother.emailaddr.com
        ;;
     *)
        echo Unknown
        ;;
  esac
Finally, I need to print the name of the badge with its corresponding email address, so I'll extract that from the filename with basename and print it at the top of my do loop:
echo -n "$(basename "$file" .png): "
So, here's our finished product:
#!/bin/sh
#
# dispbadgeaddr - show email address to which OpenBadge was issued
# requires: exiftool
# prerequisite: MYHASHEDxxxxx contains SHA-256 hash of email address
MYHASHEDADDR1="sha256\$caceb56680cabe892cdc5b903b2cbaf9ec3462699dff9094cc25ad27ef824aaa"
MYHASHEDADDR2="sha256\$cb6c6af3873a452da960b6cb88630b221845e7afce0ff612a483351664bb9bd0"
MYHASHEDADDR3="sha256\$2a196380936706eac9c60c638be63409be9eb2728ed0302188ce2d899ed22afb"
BADGEDIR=/home/wmorgan/badges
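for file in "$BADGEDIR"/*.png
do
  # badge name (filename minus .png), with no trailing newline
  echo -n "$(basename "$file" .png): "
  # pull the sha256 field out of the Openbadges metadata
  hashaddr=`exiftool "$file" | grep Openbadges |
  awk -F'\"' '
     {for (i=1; i<=NF; i++) {
        if ($i ~ /sha256/) print $i;
     }}'`
  # map the hash to the address it was generated from
  case $hashaddr in
     $MYHASHEDADDR1)
        echo wesbo@email.address.edu
        ;;
     $MYHASHEDADDR2)
        echo wessinator@second.email.org
        ;;
     $MYHASHEDADDR3)
        echo bigwes@yetanother.emailaddr.com
        ;;
     *)
        echo Unknown
        ;;
  esac
done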
and here's a bit of its output:
ibm-clm-for-safe-level-1.1: wesbo@email.address.edu
ibm-cloud-essentials: wessinator@second.email.org
ibm-cloud-kubernetes-service: wessinator@second.email.org
If I wanted to polish this a bit, I'd rewrite the core exiftool/grep/awk logic as a standalone shell function, then add the capability to specify particular PNG files on the command line or use -a to iterate through all the badges.  For now, though, I can just use the script as is and use grep to extract info as needed...
$ ./dispbadgeaddr | grep blockchain
ibm-blockchain-consulting: wessinator@second.email.org
ibm-blockchain-essentials: wessinator@second.email.org
ibm-blockchain-scale-v1: wessinator@second.email.org
interskill-blockchain-foundations: wessinator@second.email.org
Total time - 10-15 minutes...and that's what speed scripting is all about.
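
For the curious, the polish described above (a standalone function, file arguments on the command line, and -a for all badges) might look roughly like the sketch below.  It is untested against real badges; the function names badge_hash and lookup_addr are hypothetical, and it assumes the same exiftool output format and the same three hashes used earlier:

```shell
#!/bin/sh
# Sketch of a more polished dispbadgeaddr; function names are hypothetical.
BADGEDIR=/home/wmorgan/badges

# badge_hash FILE: print the "sha256$..." recipient field from a badge PNG
badge_hash() {
  exiftool "$1" | grep Openbadges |
  awk -F'"' '{for (i = 1; i <= NF; i++) if ($i ~ /sha256/) print $i}'
}

# lookup_addr HASH: map a known hash to the address it was generated from
lookup_addr() {
  case $1 in
    'sha256$caceb56680cabe892cdc5b903b2cbaf9ec3462699dff9094cc25ad27ef824aaa')
      echo wesbo@email.address.edu ;;
    'sha256$cb6c6af3873a452da960b6cb88630b221845e7afce0ff612a483351664bb9bd0')
      echo wessinator@second.email.org ;;
    'sha256$2a196380936706eac9c60c638be63409be9eb2728ed0302188ce2d899ed22afb')
      echo bigwes@yetanother.emailaddr.com ;;
    *)
      echo Unknown ;;
  esac
}

# -a means "every badge in $BADGEDIR"; otherwise use the files named
# on the command line (with no arguments, the loop simply does nothing).
if [ "${1:-}" = "-a" ]; then
  set -- "$BADGEDIR"/*.png
fi

for file in "$@"
do
  printf '%s: %s\n' "$(basename "$file" .png)" "$(lookup_addr "$(badge_hash "$file")")"
done
```

Single-quoting the hashes inside the case patterns removes the need to escape the $ character, and keeping the lookup in its own function means a new address only has to be added in one place.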