Add stargazers_count attribute of repositories, allow customizing sleep time to prevent API rate limit errors
simonneutert committed Feb 7, 2025
1 parent 0c3b554 commit cd3075b
Showing 3 changed files with 72 additions and 21 deletions.
59 changes: 48 additions & 11 deletions Readme.md
@@ -36,8 +36,10 @@ https://knowyourmeme.com/memes/this-is-fine
- [planned features](#planned-features)
- [Prerequisities](#prerequisities)
- [Run](#run)
- [Configuration](#configuration)
- [Sleep time](#sleep-time)
- [Run in Docker](#run-in-docker)
- [Download profiles](#download-profiles)
- [Run locally](#run-locally)
- [examples](#examples)
- [Search in result files (saved profiles)](#search-in-result-files-saved-profiles)
- [examples](#examples-1)
@@ -84,6 +86,27 @@ or have it in your `.zshrc` 🤗 or whatever your shell loads at start

## Run

Here's what you need to get the thing running.

- [babashka](https://www.babashka.org) or Docker/Podman
- Project Configuration (optional)

### Configuration

Currently, the only setting you can configure is the sleep time between request cycles.

#### Sleep time

**DEFAULT** sleep time is 30 seconds.

Increase the sleep time to avoid hitting the GitHub API rate limit.

You can customise the sleep time between cycles by setting the `SLEEP_TIME_SECONDS` environment variable.

```bash
$ SLEEP_TIME_SECONDS=15 bb scrape <location-like-city-or-country> <language>
```

### Run in Docker

All of the following should work in Docker, too.
@@ -97,17 +120,21 @@ $ docker run -it --rm git-hire

If you need to store the profiles, you can mount a docker volume, but this goes beyond the scope of this README.

### Download profiles
### Run locally

`$ bb scrape <location-like-city-or-country>`
```bash
$ bb scrape <location-like-city-or-country>
```

This saves the GitHub profiles as `.edn` files into the `profiles` directory,
**but** as GitHub support let me know:
> When using the language qualifier when searching for users, it will only return users where the majority of their repositories use the specified language. (please, see [documentation](https://docs.github.com/en/search-github/searching-on-github/searching-users#search-by-repository-language))

Narrow the search further by adding a language:

`$ bb scrape <location-like-city-or-country> <language>`
```bash
$ bb scrape <location-like-city-or-country> <language>
```

**Be warned!** This might not find a PHP dev who switched to Rust recently, as described by GitHub's Support.

@@ -120,7 +147,7 @@ After having built a pool of profiles, use
#### examples

`$ bb scrape mainz`
`$ bb scrape "Bad Schwalbach"`
`$ bb scrape "Bad Kreuznach"`
`$ bb scrape wiesbaden java`
`$ bb scrape wiesbaden php`
`$ bb scrape mainz javascript`
@@ -137,19 +164,29 @@ After having built a pool of profiles, use

you might go further, by piping to bb again, unimaginable possibilities...

`$ mkdir rails; cp $(grep -Zril rails profiles) rails`
```bash
$ mkdir rails; cp $(grep -Zril rails profiles) rails
```

and then:

`$ bb search-keyword "ios" | bb -e '(map #(str/upper-case %) *input*)'`
```bash
$ bb search-keyword "ios" | bb -e '(map #(str/upper-case %) *input*)'
```
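One caveat with `cp $(grep -Zril rails profiles) rails`: `-Z` makes grep end each file name with a NUL byte instead of a newline, and in bash `$(...)` discards NUL bytes, so several matches can end up glued into a single bogus path (other shells may behave differently). Since babashka is already a prerequisite, a minimal sketch of the same case-insensitive copy without any shell word splitting might look like this. `copy-matching-profiles` is a hypothetical helper, not part of the project, and it assumes profiles are plain text files somewhere under `profiles/`.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn copy-matching-profiles
  "Hypothetical helper: copies every regular file under `src-dir` whose
  contents contain `term` (case-insensitive, like `grep -i`) into
  `dest-dir`, keeping the file name. Returns the copied file names."
  [src-dir term dest-dir]
  (let [needle  (str/lower-case term)
        dest    (io/file dest-dir)
        matches (->> (file-seq (io/file src-dir))
                     (filter #(.isFile ^java.io.File %))
                     (filter #(str/includes? (str/lower-case (slurp %)) needle)))]
    (.mkdirs dest)                                 ; like `mkdir -p`
    (doseq [f matches]
      ;; same file name in dest; equal names from different
      ;; sub-directories overwrite each other, just as with `cp`
      (io/copy f (io/file dest (.getName ^java.io.File f))))
    (mapv #(.getName ^java.io.File %) matches)))
```

Usage from a `bb` REPL or script: `(copy-matching-profiles "profiles" "rails" "rails")`.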

### Inspect Profiles (with examples! 🤯)

`$ bb read-profile.clj simonneutert`
```bash
$ bb read-profile.clj simonneutert
```

go further, by piping
go further, by piping:

`$ bb read-profile.clj simonneutert | bb -e '(:languages *input*)'`
```bash
$ bb read-profile.clj simonneutert | bb -e '(:languages *input*)'
```

read many profiles
then read many profiles

```bash
$ bb search-keyword ruby | bb -e '(mapv #(edn/read-string (slurp %)) *input*)'
15 changes: 12 additions & 3 deletions src/git_hire/main.clj
@@ -27,6 +27,14 @@
(def user-search-path
"/search/users")

(def default-sleep-time "30")

(def sleep-time
(let [sleep-time (or
(System/getenv "SLEEP_TIME_SECONDS")
default-sleep-time)]
(* (Integer/parseInt sleep-time) 1000)))

(defn ->utf8
[s]
(URLEncoder/encode s "UTF-8"))
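Note that `sleep-time` above is already in milliseconds (`(* (Integer/parseInt sleep-time) 1000)`), while the two `Thread/sleep` call sites further down in this diff pass `(* sleep-time 1000)`, multiplying by 1000 a second time: with the default of 30 that would pause for 30,000 seconds (about 8.3 hours) instead of 30. A minimal sketch that converts exactly once and tolerates a malformed value; `parse-sleep-ms`, `sleep-ms`, and `default-sleep-seconds` are hypothetical names, and falling back to the default on bad input is an assumption (the committed code throws `NumberFormatException` instead).

```clojure
(require '[clojure.string :as str])

(def default-sleep-seconds
  "Fallback when SLEEP_TIME_SECONDS is unset or invalid."
  30)

(defn parse-sleep-ms
  "Converts a SLEEP_TIME_SECONDS value (string or nil) to milliseconds,
  exactly once. Unset, non-numeric or negative values use the default."
  [env-value]
  (let [seconds (try
                  (Long/parseLong (str/trim (or env-value "")))
                  (catch NumberFormatException _ nil))]
    (* 1000 (if (and seconds (not (neg? seconds)))
              seconds
              default-sleep-seconds))))

;; Read the environment once at load time.
(def sleep-ms
  (parse-sleep-ms (System/getenv "SLEEP_TIME_SECONDS")))

;; Call sites would then use the value as-is:
;; (Thread/sleep sleep-ms)
```

Keeping the unit in the name (`-ms`) makes a second conversion at the call site easier to spot.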
Expand Down Expand Up @@ -124,7 +132,7 @@
runs (per-page->runs total-user-count per-page)
users (:items res)]
(if (> total-user-count 1000)
(do (Thread/sleep (* 4 1000))
(do (Thread/sleep (* sleep-time 1000))
(recur location lang (+ 1 more-repos-than)))
(if (> runs 1)
(do (prn "getting users with more than " more-repos-than " repos")
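The `recur` in this hunk works around the GitHub search API, which serves at most 1,000 results per query: while the total user count exceeds 1000, the code sleeps and retries with the repository threshold raised by one. A minimal pure sketch of that loop, assuming a hypothetical `search-fn` (threshold in, total count out) and a `sleep-fn` standing in for `Thread/sleep`, both injected so the logic runs without network access:

```clojure
(def max-search-results
  "GitHub's search API only serves the first 1,000 results of a query."
  1000)

(defn find-threshold
  "Hypothetical pure version of the retry loop: starting at `min-repos`,
  raise the repository threshold by one until `search-fn` reports at most
  `max-search-results` users, calling `sleep-fn` with `sleep-ms` before
  every retry. Returns the first threshold that fits."
  [search-fn sleep-fn sleep-ms min-repos]
  (loop [threshold min-repos]
    (if (> (search-fn threshold) max-search-results)
      (do (sleep-fn sleep-ms)            ; pause to stay under the rate limit
          (recur (inc threshold)))
      threshold)))
```

For example, with a map of fake counts as `search-fn`, `(find-threshold {0 5000, 1 2500, 2 1200, 3 900} (fn [_]) 30000 0)` returns 3 after three pauses. Like the committed code, it keeps looping for as long as the count stays above 1000.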
@@ -147,7 +155,7 @@
runs (per-page->runs total-user-count per-page)
users (:items res)]
(if (> total-user-count 1000)
(do (Thread/sleep (* 4 1000))
(do (Thread/sleep (* sleep-time 1000))
(recur location (+ 1 more-repos-than)))
(do (file-path-location-all location)
(if (> runs 1)
@@ -161,7 +169,7 @@

(defn repo-slim
[user-repo]
(select-keys user-repo [:html_url :name :description :homepage :topics :language :updated_at]))
(select-keys user-repo [:html_url :name :description :homepage :topics :language :stargazers_count :updated_at]))

(defn repos-slim
[user-repos]
@@ -180,6 +188,7 @@
{:name (get-in first-repo [:owner :login])
:owner_url (get-in first-repo [:owner :html_url])
:languages (user-languages cleaned-repos)
:total-stars (reduce + (map :stargazers_count cleaned-repos))
:repositories cleaned-repos}))

(defn recursive-curl
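With `:stargazers_count` now kept by `repo-slim`, `:total-stars` is `(reduce + (map :stargazers_count cleaned-repos))`. In Clojure `(+ 10 nil)` throws, so that sum would fail should a repository map ever lack the key, and profiles saved before this commit have no `:total-stars` at all. A hedged sketch that counts missing values as 0 and then ranks profile maps by stars; `total-stars` and `rank-by-stars` are hypothetical helpers, and the profile shape (`:name`, `:total-stars`) is taken from the map built in this hunk.

```clojure
(defn total-stars
  "Sums :stargazers_count over repository maps; a missing or nil
  count is treated as 0."
  [repos]
  (transduce (map #(or (:stargazers_count %) 0)) + 0 repos))

(defn rank-by-stars
  "Orders profile maps by :total-stars, highest first; profiles saved
  before the key existed sort as if they had 0 stars."
  [profiles]
  (sort-by #(or (:total-stars %) 0) > profiles))
```

`rank-by-stars` could, for instance, be applied to the vector produced by the README's `(mapv #(edn/read-string (slurp %)) *input*)` pipeline.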
19 changes: 12 additions & 7 deletions test/git_hire/test_main.clj
@@ -1,4 +1,7 @@
(ns git-hire.test-main)
(ns git-hire.test-main
(:require
[clojure.test :as t]))

(require '[clojure.test :as t]
'[babashka.classpath :as cp]
'[git-hire.main :as main])
@@ -46,6 +49,7 @@
:homepage "www.foo.bar"
:topics ["foo" "bar"]
:language "clojure"
:stargazers_count 10
:updated_at "2020-01-01T00:00:00Z"}]
(main/repos-slim [{:name "foo"
:html_url "bar"
@@ -68,6 +72,7 @@
:homepage "www.foo.bar"
:topics ["foo" "bar"]
:language "clojure"
:stargazers_count 10
:updated_at "2020-01-01T00:00:00Z"}
(main/repo-slim {:name "foo"
:html_url "bar"
@@ -115,18 +120,18 @@
:pizza "turtles"}])))))

(t/deftest user-location-search-params-location
(t/is (= {:query-params {"per_page" 10, "q" "location:\"bad+kissingen\" repos:>=0"}}
(main/user-location-search-params-location 10 0 "Bad Kissingen")))
(t/is (= {:query-params {"per_page" 10, "q" "location:\"bad+kreuznach\" repos:>=0"}}
(main/user-location-search-params-location 10 0 "Bad Kreuznach")))
(t/is (= {:query-params {"per_page" 20, "q" "location:\"mainz\" repos:>=0"}}
(main/user-location-search-params-location 20 0 "Mainz"))))

(t/deftest file-path-location-all
(t/is (= "./profiles/mainz/all/"
(main/file-path-location-all "Mainz")))
(t/is (= "./profiles/bad kissingen/all/"
(main/file-path-location-all "Bad Kissingen"))))
(t/is (= "./profiles/bad kreuznach/all/"
(main/file-path-location-all "Bad Kreuznach"))))

(t/deftest user-location-search-params-location-lang
(t/is (= {:query-params {"per_page" 10,
"q" "location:\"bad+kissingen\" repos:>=0 language:\"clojure\""}}
(main/user-location-search-params-location-lang 10 0 "Bad Kissingen" "clojure"))))
"q" "location:\"bad+kreuznach\" repos:>=0 language:\"clojure\""}}
(main/user-location-search-params-location-lang 10 0 "Bad Kreuznach" "clojure"))))
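These tests pin down the query format: the location is lower-cased, URL-encoded (so the space becomes `+`, as `URLEncoder/encode` in `->utf8` would do), wrapped in double quotes, and followed by a `repos:>=` filter plus an optional `language:` qualifier; the profile directory is `./profiles/<lower-cased location>/all/` with the space kept. A minimal sketch reverse-engineered from these assertions only, not the project's actual implementation; all function names below are hypothetical.

```clojure
(import '(java.net URLEncoder))
(require '[clojure.string :as str])

(defn- location-query
  ;; lower-case, URL-encode (spaces become "+"), then wrap in quotes
  [location]
  (str "location:\"" (URLEncoder/encode (str/lower-case location) "UTF-8") "\""))

(defn search-params-location
  [per-page min-repos location]
  {:query-params {"per_page" per-page
                  "q" (str (location-query location) " repos:>=" min-repos)}})

(defn search-params-location-lang
  ;; reuse the location-only params and append the language qualifier
  [per-page min-repos location lang]
  (update-in (search-params-location per-page min-repos location)
             [:query-params "q"]
             str " language:\"" lang "\""))

(defn location-dir
  ;; directory name keeps the space, only lower-cased
  [location]
  (str "./profiles/" (str/lower-case location) "/all/"))
```

Building the language variant on top of the location-only map keeps the two query formats from drifting apart.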
