Wednesday, August 15, 2018

Speed Scripting - Making an Animated GIF with ImageMagick

Those of you who follow me on Twitter probably know that I'm something of a geek-of-all-trades; I love reading and learning.  When my employer, IBM, entered the world of Open Badges for employee credentials, I jumped in with both feet.  We're encouraged to display our badges in email signatures, online, et cetera...but what is one to do when one has earned 50+ badges in a wide range of areas/disciplines?  Now, obviously I can't put that many badges in an email signature, and a static display can only do so much...so the thought occurred to me, "I should be able to turn these into an animated image!"  Time for some shell scripting...



Let's consider the source material:

  • Our badging provider delivers badges as 352x352 PNG images, so I don't have to worry about resizing individual images; I can treat them all the same way and resize the end product as needed.
  • I'm not worried about the PNG metadata in this particular case, so converting to another format isn't a big deal.
  • I could go with an animated PNG (APNG), but (a) not all browsers display APNGs properly, and (b) animated PNGs can be significantly larger than comparable animated GIFs.

At this point, I'm ready to start playing with ImageMagick, a wonderfully complete (and open source!) image manipulation package.  ImageMagick handles 200+ image formats, and is available as both a command-line toolbox and libraries/interfaces for a wide range of programming languages.


The actual construction of the animated GIF with ImageMagick is straightforward - it's a one-liner:

$ convert *.png -frame 5 -dispose Background -set delay 150 allmybadges.gif

The options put a frame around each image, dispose of the previous frame before displaying the next one, and set the inter-frame delay (in hundredths of a second); the result is written to the end product, allmybadges.gif.


Wait, though...oh, I don't like that.  Since many badges have similar names (e.g. task-level-1.png and task-level-2.png, ibm-iot-whatever1.png, ibm-iot-whatever2.png, etc.), my use of a wildcard puts them in alphabetical order...which is kind of boring when 2-3 consecutive badges share color schemes or other design elements.  So, what I really want to do is randomize the badges before animating them.  Well, bash provides the $RANDOM shell variable, which yields a pseudorandom integer between 0 and 32767 each time it's referenced; that should be sufficient for me to play with filenames.
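
To illustrate what that prefixing does to the sort order (filenames borrowed from above; the actual prefixes will differ on every run):

$ for f in task-level-1.png task-level-2.png; do echo "${RANDOM}$f"; done
24651task-level-1.png
3517task-level-2.png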

I wound up with this script:

#!/bin/bash
#
# badgeanim - create animated GIF of all PNG badges
#
WORKDIR=/home/badges/animate
BADGEDIR=/home/badges

cd "$WORKDIR"                        # move into working directory
rm -f "$WORKDIR"/*.png               # blow away any old PNG files

for file in "$BADGEDIR"/*.png        # grab copies of current PNGs
do
    cp "$file" ./"$RANDOM$(basename "$file")"   # prepend $RANDOM to the filename
done

convert *.png -frame 5 -fill snow2 -background snow2 \
    -dispose Background -set delay 150 -set dispose Background \
    allmybadges.gif                  # ImageMagick builds the GIF

rm -f "$WORKDIR"/*.png               # blow away leftover PNG files
exit 0


The results can be seen at the top of this article and in the "My OpenBadges" widget on the right margin of the page.  (Clicking on the animated GIF to the right takes the viewer to a static display of all badges.)  I could resize the end product for use elsewhere, like so:

$ convert allmybadges.gif -resize 200x200 allmybadges-200x200.gif

Check out ImageMagick - it's definitely worth your time if you do anything at all with images.

Saturday, August 11, 2018

First Steps in node.js - Fortune Cookies!

I've been doing a lot of learning around cloud architectures, microservices, and the like, so I thought it was time to start learning JavaScript and node.js.  I started working with IBM's SDK for node.js, but soon learned that IBM has deprecated its proprietary SDK in favor of the community SDKs. I installed the latest/greatest community SDK (v10.8.0) on my Ubuntu Linux system.

Some months ago, I wrote a Twitter bot that delivers a random fortune-cookie quote/saying every 6 hours. (It's @CollectedQuotes, if you'd like to follow it.)  Well, what better example could I adapt as a simple service in node.js?  Here we go, short and sweet...

Here's the source code:


Here's the (very simple) browser page it generates:
Here's the 404 response to any URL request other than the root:
Finally, here's the console log:

It might be clunky, but it works!  If you'd like to play with this, you can download the JavaScript source and the fortune-cookie file.
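
If you do grab them, a quick smoke test from a second terminal might look like this - assuming you saved the source as fortune.js and it listens on port 8080 (adjust both to match your copy):

$ node fortune.js &                   # start the service
$ curl http://localhost:8080/         # the root URL returns a random fortune
$ curl -i http://localhost:8080/foo   # any other URL draws the 404 response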

Wednesday, August 01, 2018

Optimizing Wireshark for HTTP Analysis/Troubleshooting

I spend quite a bit of time troubleshooting various web applications, so I've done a lot of work with Wireshark's HTTP display filters. The ones I use most frequently are:

  • http.response.code - In an HTTP response, the numeric response code (200, 404, 500, etc.)
  • http.content_length - In an HTTP response (and PUTs and POSTs), the total payload size
  • http.request.method - The type of request made by the client (e.g. GET, PUT, POST, CONNECT, etc.)
  • http.request_in - For an HTTP response, the packet # of the corresponding request
  • http.response_in - For an HTTP request, the packet # of the corresponding response
  • http.time - The elapsed time between a request and its matching response

That last filter needs a bit of explanation, because it can be computed in two different ways.  If the TCP preference "Allow subdissector to reassemble TCP streams" is enabled, http.time reflects the time between the request and the last packet of the response (i.e. the end of any data returned); if that preference is disabled, http.time reflects the time between the request and the first packet of the response (i.e. the HTTP response code).  I almost always have that TCP preference enabled, because the client/browser usually can't do anything with the response until it receives all of it!
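
If you'd like to see the difference for yourself, the same preference can be toggled from the command line - tcp.desegment_tcp_streams is the preference behind that GUI checkbox (the capture filename here is just a placeholder):

$ tshark -o tcp.desegment_tcp_streams:TRUE -Y http.time -T fields -e http.time -r capture.pcapng   # time to the END of each response
$ tshark -o tcp.desegment_tcp_streams:FALSE -Y http.time -T fields -e http.time -r capture.pcapng  # time to the FIRST packet of each response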

So, one could work through a Wireshark session, using display filters like http.response.code==404, http.content_length > 4096, or http.time > 2.0 to display various packets...but, to be honest, I'd rather not do that much typing.  So, I set out to optimize Wireshark's performance and tweak its display for HTTP analysis.

The most significant performance optimization one can implement in Wireshark is to disable analysis of irrelevant protocols.  By default, Wireshark tries to dissect every protocol it can identify...and there are hundreds of protocols in its dissection engine.  Keep in mind, though, that one needs visibility at all layers of the network stack; for most work in my environment, that means Ethernet, IPv4, TCP, UDP, ICMP, SSL/TLS, HTTP, and a handful of other protocols, so I disabled analysis of everything else.
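
As it happens, that protocol list is saved as a plain text file named disabled_protos in the profile directory, one protocol name per line, so it's easy to inspect or copy between machines.  (On Linux, recent builds keep profiles under ~/.config/wireshark/profiles; older builds used ~/.wireshark.)  A few example entries - your list will obviously differ:

$ head -4 ~/.config/wireshark/profiles/HTTP/disabled_protos
atm
bluetooth
ftp
quake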

Now, to the GUI tweaking.  First, I created filter buttons for general display filters, so that I can apply them with a single click (a sample of the saved button definitions follows the list):
  • http - apply http - display all packets identified as HTTP
  • 1xx - apply http.response.code < 200 - display responses with informational codes
  • 2xx - apply http.response.code > 199 && http.response.code < 300 - display responses with success codes
  • 3xx - apply http.response.code > 299 && http.response.code < 400 - display responses with redirection codes
  • 4xx - apply http.response.code > 399 && http.response.code < 500 - display responses with client error codes
  • 5xx - apply http.response.code > 499 - display responses with server error codes
  • >2s - apply http.time > 2.0 - display all responses that required more than 2s to complete
  • GETs - apply http.request.method=="GET" - display all GET requests
  • POSTs - apply http.request.method=="POST" - display all POST requests
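
The buttons themselves are saved as one-line entries in a file named dfilter_buttons in the profile directory, which makes them easy to share between machines.  From memory, the format is enabled-flag, label, filter, comment - double-check against a button you've saved yourself:

"TRUE","2xx","http.response.code > 199 && http.response.code < 300",""
"TRUE",">2s","http.time > 2.0",""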

Then, I added columns for use in "eyeballing" HTTP traffic and doing quick sorting (you can sort on any column in Wireshark's display); a sample of the saved column definitions follows the list:
  • Stream - display tcp.stream, the connection identifier generated by Wireshark as the file is read
  • Req # - display http.request_number, the request's sequence number within its connection (useful for HTTP-pipelined connections)
  • HTTP Req - display http.request_in for HTTP responses
  • HTTP Res - display http.response_in for HTTP requests
  • HTTP Time - display http.time
  • HTTP RC - display http.response.code for all responses
  • Payload - display http.content_length for all requests/responses with data payloads
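
The column layout is likewise just text in the profile's preferences file; the trimmed entries below are from memory (each column title paired with a %Cus: custom-field format string), so treat them as a sketch rather than gospel:

gui.column.format:
        "Stream", "%Cus:tcp.stream:0:R",
        "HTTP Time", "%Cus:http.time:0:R",
        "HTTP RC", "%Cus:http.response.code:0:R"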

Here's a sample of the results:


(You can see the "one click" filter buttons in the top right.)

From here, I can sort on any column (like, oh, HTTP Time?), match requests and responses easily, identify "red flags" at a glance (why on earth did it take 1.5s to pull down PNG files of only ~280 KB?!), and examine how "red flags" affected subsequent requests on the same connection (that 1.5s delay affected the browser's processing of the next GET as well, since they were on the same HTTP-pipelined stream), all from a single view...and Wireshark's processing time is greatly improved, to boot!

Now, this isn't perfect.  If any problem occurs in IP/TCP/SSL-TLS/HTTP reassembly (for instance, a missing or corrupted packet), you won't get full information on the affected HTTP transactions...but it's easy to right-click such a packet and use Wireshark's Follow TCP Stream command to zero in on the transaction and determine what happened.

(If you'd like to try this profile, you can download it as a ZIP file.  You'll want to unzip it in your Wireshark profiles directory; it will create a profile directory named 'HTTP'.  Within Wireshark, you can switch profiles by clicking on "Profile: Default" in the bottom-right corner.  Remember, though, that any configuration changes you make are automatically saved to the current profile; be careful!)

So, that's the Wireshark environment I'm using for HTTP analysis.  Did I miss something you consider important?  Is there a change you would make in your environment?  Let me know in the comments...

Wednesday, July 18, 2018

Today's Scripting - Extracting HTTP Performance Data from Wireshark with Python


I'm often asked what kind of data can be exported from Wireshark, especially when we're troubleshooting performance issues.  Most recently, someone said, "It would be great if we could suck this HTTP timing data into a spreadsheet." I'm not a big fan of spreadsheets, but I said to myself, "Hmm...can't be that difficult to do"...and sat down to write some code.

A traditional shell script with a few Linux/Unix utilities could do this (I did a LOT of awk and sed back in the day...), but I'm in the process of teaching myself Python, so I set out to do some snake charming.  I'm DEFINITELY a novice, so I'm sure that those more experienced with Python will offer improvements to my brute-force, trial-and-error code. Having said that...

The first step is to collect network packet data while also collecting the TLS session keys from the browser.  (If you haven't done that before, check out my brief video on the technique.)  For ease of use, we're going to name the packet capture files "testX.pcapng" and the matching TLS key logfile "testX-keys."
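
(The short version of that technique: both Firefox and Chrome will log TLS session keys to whatever file the SSLKEYLOGFILE environment variable points at, as long as the browser is launched from a shell where the variable is set.)

$ export SSLKEYLOGFILE=$PWD/testX-keys    # browsers that honor this variable log TLS session keys here
$ firefox &                               # launch the browser from this shell so it inherits the variable
$ # ...now capture the browsing session with Wireshark/tshark, saving it as testX.pcapng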

Now, we turn to Wireshark's command-line kin, tshark.  Since the timing information we need is computed by Wireshark (it isn't in the native packet data), we'll need to run a two-pass analysis in tshark.  So, our tshark command looks like this (I'll split lines for readability):

/usr/local/bin/tshark -2                    # 2 passes to pick up computed values
   -o tcp.check_checksum:FALSE              # tshark ignores packets that fail the TCP checksum, so skip that check
   -o ssl.keylog_file:testX-keys            # here's our TLS session keyfile
   -Y "http.time || http.request.full_uri"  # find packets with these fields - requests have the URI, responses have http.time
   -T fields                                # we want to output certain fields (specified with -e)
   -e frame.number                          # output frame number (from all packets)
   -e http.request.method                   # output HTTP method (GET, POST, etc.) (from requests)
   -e http.request.full_uri                 # output full URI requested (from requests)
   -e http.response_in                      # output frame # of response (from requests)
   -e http.request_in                       # output frame # of request (from responses)
   -e http.time                             # output elapsed time (from responses)
   -e http.response.code                    # output HTTP response code (from responses)
   -E separator=,                           # separate output fields with commas
   -r testX.pcapng                          # read from testX.pcapng

With these options, tshark provides an output stream of mixed requests and responses that looks like this:

2324,GET,https://apps.na.collabserv.com/,2340,,, (request - note empty fields)
[...any number of intermediate lines...]
2340,,,,2324,0.059056000,302 (response - note empty fields)

(I didn't yet know it, but this format was going to come back and bite me - stay tuned...)

So, I'm going to have to match requests and responses to aggregate the needed data into a single CSV line - let's go write some Python!  In a fit of originality, I named the script httpstats.

First, basic housekeeping.  In general use, I'm assuming that the packet capture files are <name>.pcapng, and that the TLS session key logfiles are <name>-keys...so I'll expect the user to invoke my script with "httpstats name" and go from there. After importing the Python libraries I'll need, the first steps are to validate the command-line argument and let the user know where the output is going:

#!/usr/bin/python3
import sys
import os.path
import subprocess
import csv

if len(sys.argv) != 2:
        print("Syntax: %s filestem \n  %s will look for filestem.pcapng and filestem-keys" % (sys.argv[0],sys.argv[0]))
        sys.exit(1)                     # bail out with a nonzero status on bad syntax

pcapfile = sys.argv[1] + ".pcapng"
keyfile = sys.argv[1] + "-keys"

if os.path.isfile(pcapfile) and os.access(pcapfile,os.R_OK):
        print("Processing %s" % (pcapfile))
else:
        print("ERROR: Capture file %s missing or unreadable" % (pcapfile))
        sys.exit(1)

if os.path.isfile(keyfile) and os.access(keyfile,os.R_OK):
        print("Using keyfile %s" % (keyfile))
else:
        print("ERROR: Key file %s missing or unreadable" % (keyfile))
        sys.exit(1)

output_csv = sys.argv[1] + ".csv"
print("CSV file %s will be overwritten if it exists..." % (output_csv))


If we get this far, both the capture file and its accompanying TLS keyfile are present. We're ready to set a few variables, invoke tshark and start parsing its output:

stats_list = list()

tshark_cmd = '/usr/local/bin/tshark -2 -o tcp.check_checksum:FALSE -o ssl.keylog_file:' + keyfile + ' -Y "http.time || http.request.full_uri" -T fields -e frame.number -e http.request.method -e http.request.full_uri -e http.response_in -e http.request_in -e http.time -e http.response.code -E separator=, -r ' + pcapfile

p = subprocess.Popen(tshark_cmd, shell=True, stdout=subprocess.PIPE,universal_newlines=True)

Now to parse tshark's output into Python lists:

for line in p.stdout:
        line = line.rstrip()    # get rid of trailing newlines
        line = line.split(",")

It was at this point that my first test runs blew up in my face.  The .split method simply says "make this line of data a Python list, with commas delimiting list elements"...but I had forgotten that URIs can contain commas.  As a result, what I thought would be a simple Python list of 7 elements (some empty) in every case turned into Python lists of up to 79 elements when .split encountered commas in the URI!  So, I had to catch those cases and use an on-the-fly .join method to undo what .split had done to the third field of the line AND put literal commas back into the URI data...while leaving the first two fields and last four fields intact.  Only then could I append the (corrected) list to my master list-of-lists:

        if(len(line) > 7):              # URI contained commas - rejoin the pieces
                line[2:len(line)-4] = [','.join(line[2:len(line)-4])]
        stats_list.append(line)         # append the (corrected) list to the master list

(Yeah, figuring THAT one out took a few minutes.  *laugh*)

We're ready to match up request and response data, then write our CSV data. Here's the data structure:

# stats_list[x][0] = Frame number
# stats_list[x][1] = HTTP request method (only present in requests)
# stats_list[x][2] = Full URI requested (only present in requests)
# stats_list[x][3] = Frame number containing response (only present in requests)
# stats_list[x][4] = Frame number containing request (only present in responses)
# stats_list[x][5] = HTTP response time (only present in responses)
# stats_list[x][6] = HTTP response code (only present in responses)

...so, I used nested loops to search out the request/response pairs, then did a single write that pulled elements from both entries and wrote a single CSV line.  Since each line written represents a single HTTP transaction, I also counted them and informed the user of the total number of transactions found:

outputlinecount = 0

with open(output_csv,'w+') as out_file:
        outwriter = csv.writer(out_file,delimiter=',')
        for packet in range(len(stats_list)):
                for target in range(len(stats_list)):
                        if(stats_list[target][4] == stats_list[packet][0]):
                                outwriter.writerow([stats_list[packet][0],
                                        stats_list[target][0],stats_list[target][5],
                                        stats_list[target][6],stats_list[packet][1],
                                        stats_list[packet][2]])
                                outputlinecount += 1

print("%d HTTP transactions written to %s" % (outputlinecount,output_csv))

The end result was a CSV file with entries like this:

6308,6315,0.125648,200,POST,http://www.foobieblex.com/cgi-bin/snarf

That's the packet number of the request, packet number of the response, elapsed time, HTTP return code, the HTTP method used (GET, POST, OPTIONS, etc.), and the URI requested.  (Remember those URIs with commas?  The csv.writer object automatically quotes any fields containing commas, so that didn't bite me a second time.)  This CSV file can be imported directly into any tool that accepts CSV data.  It isn't perfect - for instance, it doesn't (yet) catch HTTP requests that never completed - but it took less than an hour to write/test, and it's sufficient to the task at hand.  Here's a sample run against a 15MB packet capture containing roughly 25,000 packets:
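
In outline, a run goes like this (the messages are the script's own print statements; the transaction count is invented for illustration):

$ ./httpstats testX
Processing testX.pcapng
Using keyfile testX-keys
CSV file testX.csv will be overwritten if it exists...
247 HTTP transactions written to testX.csv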


Let me know what you think - or any Python tips/tricks to improve things - in the comments!

VIDEO: Where in the World Are Your Users? Geolocation with Wireshark

You've probably seen websites that greeted you with something like "Oh, you're in New York City? Here's our local store" or asked to "know your location".  If you've ever wondered how they do that, the answer is IP geolocation.  It's an interesting technique...and you can apply it to your own network capture data in Wireshark!

It's a neat trick; I've known mobile service providers who used it to create a dynamic map of locations they were "currently serving", and I've worked with data center operators who used it to create a dynamic heatmap of transaction loads from different parts of the world.  The best part is that - at least for simple, introductory purposes - you can start working with it for free!

In this video, I'll demonstrate how to enable IP geolocation in Wireshark, export the data in CSV format, and upload it to a mapping provider.  Basically, we'll go from a packet capture to a worldwide contact map in about 12 minutes.

As always - if you enjoy the video, please consider giving it a YouTube like and/or comment!


VIDEO: Decrypting End-User SSL/TLS Browser Sessions with Wireshark

Given that just about everyone is using HTTPS these days (and well they should!), troubleshooting web applications can be a major pain when it comes to network-layer analysis.  Fiddler is a solid tool, but its man-in-the-middle approach to capturing HTTPS sessions doesn't work in many secure environments, thanks to certificate issues.  What if you could just grab the end user's browser sessions and decrypt those?  Well, you can!

In this video, I'll demonstrate how to collect TLS session keys from Firefox/Chrome, import them into Wireshark, and work with the decrypted data.

If you enjoy the video, please consider giving it a like and/or a favorable comment...


Yeah, I'm Making Videos Now...

I finally took the plunge into creating videos.  Basically, I'll be creating short (less than 15 minutes) tutorials on various techniques I've developed in working with data networking and troubleshooting.

The videos are posted on IBM Collaboration Services' YouTube channel, ICSSupportVideos.

For those interested in such things, I'm using an off-the-shelf Logitech C270 webcam in a completely FOSS (free and open source software) environment.


Production values are minimal - I'm just learning this stuff - so be kind in your comments where aesthetics are concerned.  **chuckle**

If there's a particular topic in the networking world which interests you, let me know!