If you haven't heard of GitSense before, it brings advanced search and code management metrics to Bitbucket, GitHub, and GitHub Enterprise. In this blog post, we'll talk about client-side routing and how GitSense uses it to support its divide-and-conquer approach to indexing and searching.
Indexing and searching at GitHub and Bitbucket's scale is extremely challenging. Creating and maintaining real-time search indexes for millions of repositories requires significant processing power and lots of fast storage, neither of which is in abundance for a bootstrapped startup like ours. Luckily for us, improving search breadth (searching across millions of repositories) is not something we are interested in; improving search depth (drilling into repositories) is.
In order for GitHub to support searches across millions of repositories, it has to take a shallow approach to indexing. This is why you can't search for commits or diffs, why you can't search forked repositories, and so forth. By focusing on search breadth, GitHub leaves a pretty big hole with regard to insight into a repository, and that's the hole we are interested in filling.
If you've read our first blog post on benchmarking low-cost platforms, you'll know our indexers are designed to be installed pretty much anywhere. It was our first post for a reason: we wanted to highlight, early on, how GitSense could be used to divide and conquer indexing and searching at a massive scale. In this blog post, we'll go over how our Chrome extension is able to support such a model.
If you haven't installed our Chrome extension, you can do so through the Chrome Web Store or manually, by downloading the source from GitHub or Bitbucket.
If you have already installed GitSense, make sure to upgrade it to the latest version (0.3), which contains client-side routing support. You may also have to re-enable the GitSense extension, since changes were made to how permissions are requested.
With GitSense installed, go to the GitSense options page by entering chrome://extensions in your browser's location bar and clicking on "Options".
With the options page loaded, you'll find it comes with two default routing rules.
The first rule matches all Bitbucket URLs. The second rule matches all GitHub URLs. And if you expand the settings for each rule, you'll find they both reference the GitSense server at https://api.gitsense.com and GitHub/Bitbucket's respective API servers.
To view the JSON for these two routing rules, refer to chrome/js/config.js and to see how they are matched, refer to the getRule function that is defined in chrome/js/utils/config.js.
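To make the matching idea concrete, here is a minimal sketch of what two default rules and a first-match lookup could look like. It is not the actual code: the field names (matches, gitsenseApi, hostApi), the Bitbucket pattern, and the findRule and patternToRegExp helpers are all assumptions made for illustration. The real JSON is in chrome/js/config.js and the real matching logic is the getRule function in chrome/js/utils/config.js.

```javascript
// Hypothetical rule shape, for illustration only.
// The real rules are defined in chrome/js/config.js.
const defaultRules = [
  {
    matches: "https://bitbucket.org/*",        // assumed pattern
    gitsenseApi: "https://api.gitsense.com",
    hostApi: "https://api.bitbucket.org"
  },
  {
    matches: "https://github.com/*",
    gitsenseApi: "https://api.gitsense.com",
    hostApi: "https://api.github.com"
  }
];

// Turn a wildcard pattern such as "https://github.com/*" into a RegExp.
// Every regex metacharacter except "*" is escaped, then "*" becomes ".*".
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Walk the rules in order and return the first one that matches the URL,
// or null when nothing matches. This is a stand-in for getRule.
function findRule(rules, url) {
  for (const rule of rules) {
    if (patternToRegExp(rule.matches).test(url)) {
      return rule;
    }
  }
  return null;
}
```

With this sketch, findRule(defaultRules, "https://github.com/gitsense/atom") returns the second rule, and a URL on any other host returns null.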
As a side note, https://api.gitsense.com is our public GitSense server, and at present we are indexing about 5,000 Git repositories. Since there are millions of repositories hosted on both GitHub and Bitbucket, we are most likely not indexing a repository that belongs to you or one that you are interested in.
We are also not planning on indexing as many repos as we can, as that's not our objective. Our objective is to empower the end user by making it insanely easy for Bitbucket, GitHub, and GitHub Enterprise users to install our technology, so they can index and search whatever repos they want.
Now back to the topic at hand. As explained earlier, forked repositories are not searchable in GitHub. If you go to https://github.com/gitsense/atom/search?type=Code, you'll find a repo that we forked from atom/atom, and on this page you'll find the following message:
If you have a forked repo in GitHub or GitHub Enterprise, you may find this search limitation a little inconvenient. Luckily, this is an easy fix for GitSense. To make this repository searchable at the branch level, we just need to let GitSense know about it.
For brevity's sake, we are not going to go over how to install and administer our GitSense server. We'll leave that for another post. You just need to know that one has been installed locally at 192.168.1.77 and has been set up to index our forked gitsense/atom repository.
With our local GitSense server in place, we'll add a new routing rule to our Chrome extension. To do this, click the "Create new rule" button on the GitSense options page.
By default, the GitSense options script (chrome/options.js) will append new rules, and if you've read the code for the getRule function in chrome/js/utils/config.js, you'll know order is important. To ensure this rule is matched before https://github.com/*, we'll move it up one. Below is what the new rule looks like, with all the required fields filled in.
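The reason order matters can be shown with a small sketch. Everything here is an assumption made for illustration: the field names, the pattern used for the new rule, the simplified trailing-wildcard matching, and the moveUp and firstMatch helpers. The extension's own behavior is defined by chrome/options.js and the getRule function.

```javascript
// Hypothetical rule list right after a new rule has been appended.
const rules = [
  { matches: "https://bitbucket.org/*", gitsenseApi: "https://api.gitsense.com" },
  { matches: "https://github.com/*", gitsenseApi: "https://api.gitsense.com" },
  // Assumed pattern for the forked repo, pointing at the local server.
  { matches: "https://github.com/gitsense/atom*", gitsenseApi: "https://192.168.1.77" }
];

// Return a copy of the list with the rule at `index` swapped one slot up.
function moveUp(list, index) {
  const copy = list.slice();
  if (index <= 0 || index >= copy.length) {
    return copy;
  }
  [copy[index - 1], copy[index]] = [copy[index], copy[index - 1]];
  return copy;
}

// Simplified first-match lookup: treats a trailing "*" as a prefix wildcard.
function firstMatch(list, url) {
  const hit = list.find((rule) => url.startsWith(rule.matches.replace(/\*$/, "")));
  return hit || null;
}

const url = "https://github.com/gitsense/atom";

// Appended last, the specific rule is shadowed by https://github.com/*.
const before = firstMatch(rules, url).gitsenseApi;

// Moved up one, the specific rule is checked first and wins.
const after = firstMatch(moveUp(rules, 2), url).gitsenseApi;
```

In this sketch, before is https://api.gitsense.com and after is https://192.168.1.77, which is why the more specific rule has to sit above the general one.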
With the new routing rule in place, we can go to https://github.com/gitsense/atom and do a quick search to see if everything works, and it does, as the screenshot below shows:
And if we bring up the Chrome debugger (Shift+Ctrl+J), we can confirm the queries are indeed being routed to our local GitSense server at https://192.168.1.77.
So just like that, GitHub+GitSense turned a non-searchable GitHub repository into a searchable one.
We hope this post was informative. If you want to learn more about our Chrome extension, you can find the source on GitHub and Bitbucket. And if you are looking to improve your Bitbucket, GitHub, or GitHub Enterprise browsing and search experience, send an email to email@example.com to register for the free GitSense Beta trial, which will be starting soon.