« Github Spy

October 6, 2016 • ☕️ 3 min read

GitHub is an incredible tool for contributing and collaborating with everyone online. Central to GitHub is following others: you see what they push, merge, and comment on. All of these activities are broadcast, and one can glean from the activity how productive or, in the absence of activity, how unproductive someone is (this is more a heuristic than an absolute truth).

In March 2016, I founded and launched the (boot) Coding School in Mission, TX with the support of Texas Workforce Commission and Mission EDC. I developed the curriculum and have taught Full Stack Web Development for the past 3 months. Part of my mission with the course was to establish tools that signal to the instructor whether the students are performing. While studying and watching videos are great ways to learn how to code, the proof is in the code. If you can push code, have your code reviewed, and iteratively improve, then you are on the path to becoming a software developer. Therefore, knowing when and how students were participating on GitHub became a top priority.

Github Spy is Born

You can review the code at github.com/ibolmo/github-spy.

Github Spy is just that: a script that runs periodically, reviews the activity of the unwilling participant, and publishes the activity/findings to an analytics engine. The spy is a simple Node.js script that uses mikedeboer/node-github (GitHub API client) and keen/keen-tracking (Keen.IO JS client). Think of Keen.IO as Google Analytics for Developers. For ease of deployment, I chose to use Red Hat’s OpenShift (an open source cloud PaaS), but you could use Heroku or the awesome zeit/now.

Results

The following are just random folks (and myself) that I found via the GitHub explore section. I can’t give enough credit to the wonderful folks at Keen.IO. Their product is flawless and so beautifully built. It’s so easy to use that I’ve neglected to include instructions. Let me know, though, if you’d like a separate article.

This is a simple query (lookup) done in Keen.IO. This shows the number of watch activities in the past 45 days!

Another example, this time demonstrating a lookup of code changes and the repositories they were made to.

The Code

The source code is available under the very permissive MIT license. It’s rather simple, but that’s how code should be. I hope this is helpful for other schools, or even recruiters.

Quick propaganda. I encourage you to review and consider our students for employment: codergv/boot/Students.md.

Edification

One of my objectives with this publication on Medium is to continue contributing to the community. If you’re learning to code, or some of the tech in the project is new to you, or you’re just that nerdy (awesome!), the following sections are for you. I’m only going to cover the interesting bits, however. Should you need additional help, leave a comment.

In the main script: index.js

require('dotenv').config();

This reads environment variables from the .env file (see README.md) into process.env[…]. This is a great thing. You will receive client ID and secret keys from the GitHub and Keen.IO APIs. If you included these secrets in your source code, the rest of the world would be able to pretend to be you. Keeping these keys in a separate .env file protects you.

Note: do not include .env in the repository.

# tell git to ignore .env file
echo '.env' >> .gitignore
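
To make the idea concrete, here is a minimal sketch of what loading a .env file amounts to. It is a simplified illustration, not dotenv's actual implementation, and the function names (parseEnv, loadInto) and the key names (GITHUB_CLIENT_ID, KEEN_WRITE_KEY) are hypothetical, chosen only for this example. It assumes plain KEY=VALUE lines with no quoting, and that values already present in the environment should win.

```javascript
'use strict';

// Parse KEY=VALUE lines into a plain object.
// Assumption: no quoting or multi-line values, unlike a real .env parser.
function parseEnv(text) {
  var result = {};
  text.split(/\r?\n/).forEach(function (line) {
    var trimmed = line.trim();
    // Skip blank lines and comments.
    if (trimmed === '' || trimmed.charAt(0) === '#') return;
    var eq = trimmed.indexOf('=');
    if (eq === -1) return; // ignore lines without a key/value separator
    var key = trimmed.slice(0, eq).trim();
    var value = trimmed.slice(eq + 1).trim();
    result[key] = value;
  });
  return result;
}

// Copy parsed values into an env-like object (for example process.env),
// leaving any value that is already set untouched.
function loadInto(env, text) {
  var parsed = parseEnv(text);
  Object.keys(parsed).forEach(function (key) {
    if (!(key in env)) env[key] = parsed[key];
  });
  return env;
}

// Hypothetical file contents; the real keys are listed in the README.
var sample = '# secrets stay out of git\nGITHUB_CLIENT_ID=abc123\nKEEN_WRITE_KEY=def456';
var env = loadInto({}, sample);
console.log(env.GITHUB_CLIENT_ID); // prints "abc123"
```

In the real script, require('dotenv').config() does this work for you; the point is only that the secrets live in a file that never reaches the repository.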

Main loop

There’s a lot going on here. I’ve refactored the code to make it easier to read, and I’ve included inline comments (a rare occasion). You can follow along from top to bottom, and let me explain with a picture:

(1) For each of the usernames defined in users.js, we look up their activity events. Because each lookup is asynchronous (non-blocking), we need to track which lookups (termed pages in the code) have finished fetching. As soon as a request finishes, the request handler (handleEvents) is called with the GitHub API results (events). (2) Each of these events is sent to Keen.IO using the keen-tracking Node.js client. Again, because each keen.recordEvent call is asynchronous, we need to track when those are finished. Each time a request handler is called, we delete the corresponding flag from its local store. (3) Eventually both stores will be empty, which means that we are done processing the queues. At that time, and only at that time, the script can exit.
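
The three steps above can be sketched roughly as follows. This is not the code from the repository: fetchEvents and recordEvent are hypothetical stand-ins for the GitHub and Keen.IO clients (they fake the network with timers), and the store names pendingPages and pendingRecords are my own labels for the two local stores. It sticks to ES5-style callbacks to match the constraints of the original.

```javascript
'use strict';

// Hypothetical stand-in for the GitHub lookup: returns two fake events.
function fetchEvents(username, callback) {
  setTimeout(function () {
    callback(null, [
      { type: 'PushEvent', actor: username },
      { type: 'WatchEvent', actor: username }
    ]);
  }, 10);
}

// Hypothetical stand-in for the Keen.IO write.
function recordEvent(collection, event, callback) {
  setTimeout(function () { callback(null); }, 5);
}

function spy(usernames, done) {
  var pendingPages = {};   // one flag per in-flight GitHub lookup
  var pendingRecords = {}; // one flag per in-flight analytics write
  var recorded = 0;
  var nextId = 0;
  var finished = false;

  // (3) Only when both stores are empty is the work really done.
  function finishIfIdle() {
    if (finished) return;
    if (Object.keys(pendingPages).length === 0 &&
        Object.keys(pendingRecords).length === 0) {
      finished = true;
      done(recorded);
    }
  }

  // (1) Start one lookup per user and flag it as pending.
  usernames.forEach(function (username, index) {
    var pageId = 'page-' + index;
    pendingPages[pageId] = true;

    fetchEvents(username, function handleEvents(err, events) {
      // (2) Send each event on, flagging every write as pending.
      (err ? [] : events).forEach(function (event) {
        var id = nextId++;
        pendingRecords[id] = true;
        recordEvent(event.type, event, function (recordErr) {
          if (!recordErr) recorded++;
          delete pendingRecords[id];
          finishIfIdle();
        });
      });
      // Clear the page flag only after its writes are registered,
      // otherwise both stores could look empty too early.
      delete pendingPages[pageId];
      finishIfIdle();
    });
  });

  finishIfIdle(); // covers an empty user list
}

spy(['alice', 'bob'], function (count) {
  console.log('recorded ' + count + ' events'); // prints "recorded 4 events"
});
```

The ordering inside handleEvents matters: the page flag is removed after the writes have been flagged, so there is never a moment where both stores are empty while work is still outstanding. In a real script the done callback is where you would exit the process.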

This is the beauty (and beast) of event-based languages like JavaScript. Granted, the code would be more readable if I had used async/await or Promises. Abstracting to individual components and pure functions would also help, but let’s admit it: this is a quick hack. I needed to use ES5 because OpenShift starts at NodeJS 0.60. There is a way to have Node 6.5 (which includes Promises and generators) in OpenShift, but this was a quick solution.
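
For comparison, here is roughly how the same flow might read with Promises and async/await on a newer Node.js version than the one I was targeting. Treat it as a sketch under assumptions: fetchEventsAsync and recordEventAsync are hypothetical promise-returning stand-ins for the GitHub and Keen.IO calls, not functions from either client library.

```javascript
'use strict';

// Hypothetical promise-returning stand-in for the GitHub lookup.
function fetchEventsAsync(username) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve([{ type: 'PushEvent', actor: username }]);
    }, 10);
  });
}

// Hypothetical promise-returning stand-in for the Keen.IO write.
function recordEventAsync(collection, event) {
  return new Promise(function (resolve) {
    setTimeout(resolve, 5);
  });
}

// Promise.all replaces the hand-rolled pending stores: it resolves only
// once every lookup, and every write triggered by it, has finished.
async function spyAsync(usernames) {
  const counts = await Promise.all(usernames.map(async function (username) {
    const events = await fetchEventsAsync(username);
    await Promise.all(events.map(function (event) {
      return recordEventAsync(event.type, event);
    }));
    return events.length;
  }));
  return counts.reduce(function (sum, n) { return sum + n; }, 0);
}

spyAsync(['alice', 'bob']).then(function (total) {
  console.log('recorded ' + total + ' events'); // prints "recorded 2 events"
});
```

The bookkeeping disappears because each promise carries its own "finished" state, which is exactly what the flags in the local stores were tracking by hand.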

Feedback

What would you improve or change? What’s missing?