Down is a utility tool for streaming, flexible and safe downloading of remote files.
Full documentation can be found at https://hexdocs.pm/down
Add down
to your list of dependencies in mix.exs
:
def deps do
[
{:down, "~> 0.0.1"}
]
end
Install via mix deps.get
.
The library's main functionality is to create streams from external remote HTTP files. It can also be useful for the following reasons:
- It offers safer downloads with size limitation.
- It offers a simple API with good defaults to "just download" content.
- It offers a thin wrapper--with a normalized API--that allows you to use different HTTP backend libraries.
- Low memory consumption.
- Not extra dependencies.
If you're accepting URLs from outside or untrusted source,
it's a good policy to limit the size of the download,
because attackers are always looking to over work your servers.
Down allows you to pass a :max_size
option:
iex> Down.download("http://example.com/image.jpg", max_size: 5 * 1024 * 1024) # 5 MB
{:error, :too_large}
What is the advantage of using Down instead of simply checking the size after downloading?
Down terminates the download very early, as soon as it gets to the Content-Length
header.
When the Content-Length
header is missing,
Down terminates the download as soon as the downloaded content surpasses the :max_size
option.
:max_size
can be used in any Down operation.
Down handles redirects using the :max_redirects
option.
In the case a request produces more redirects than the given option
it returns an error:
iex> Down.download("http://example.com/100_redirects.html", max_redirects: 1)
{:error, :too_many_redirects}
The primary function is Down.download/2
which stores the remote file as a temporary file.
The returned Down.Download
struct contains more information about the request
including file_path
with the location of the temporary file:
iex> {:ok, download} = Down.download("http://localhost:6080/robots.txt")
{:ok, %Down.Download{
backend: Down.HackneyBackend,
file_path: "/tmp/f-1552846757-26630-1bzq5eq.txt",
original_filename: "robots.txt",
request: %{
backend_opts: nil,
body: nil,
headers: %{"User-Agent" => "Down/0.0.1"},
method: :get,
timeout: 5000,
url: "http://localhost:6080/robots.txt"
},
response: %{
encoding: nil,
headers: %{
"access-control-allow-credentials" => "true",
"access-control-allow-origin" => "*",
"connection" => "keep-alive",
"content-length" => "30",
"content-type" => "text/plain",
"date" => "Sun, 17 Mar 2019 18:19:17 GMT",
"server" => "gunicorn/19.9.0"
},
size: 30,
status_code: 200
},
size: 30
}}
iex> download.file_path
"/tmp/f-1552846757-26630-1bzq5eq.txt"
By default the remote file will be downloaded into a temporary location.
If you would like the file to be downloaded to a
specific location on disk, you can specify the :destination
option:
iex> Down.download("http://example.com/image.jpg", destination: "/path/to/destination")
{:ok, %Down.download{
file_path: "/path/to/destination"
}}
If the remote content is the only thing you need, Down.read/2
function is a good option.
It returns a string with the full content:
iex> Down.read("https://google.com")
{:ok,
"<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org/WebPage" <> ...}
Down has the ability to retrieve remote file content as it is being
downloaded. The Down.stream
function returns a stream which
can be used with functions in the Stream
and Enum
module functions.
The content of the remote file is streamed in binary chunks. Because streams are lazy, the chunks are only downloaded on demand. This means that if only the first chunk is requested, the rest of the file isn't downloaded and the operation won't consume more resources. This is very convenient when working with large files, or when we're only interested in some parts of the file, or when keeping low memory consumption is important.
In the following example we are downloading the Elixir webpage content, but only until the title is reached:
# This example doesn't cover all the possibilities
# and it doesn't close the connection
iex> Down.stream("https://elixir-lang.org") |> Enum.find_value(&Regex.run(~r/\<title\>(.*)\<\/title\>/i, &1, capture: :all_but_first))
["Elixir"]
Every Down operation creates a new process under a custom DynamicSupervisor
.
However, Down doesn't have any kind of pool connection or a maximum connection number limitation.
:hackney
and :ibrowser
backends use their own connection pool,
so if you're using one of these backends you're automatically using its connection pool.
There are four optional backends in Down:
The default is :httpc
which comes by default in Erlang,
so no additional dependencies are needed to use Down.
It can be changed globally using the following config:
config :down, :backend, :hackney
The backend can also be changed in runtime and per request,
simply by using the :backend
argument:
iex> Down.read("https://example.com/api.json", backend: :ibrowser)
Down
and :hackney
have proper SSL (https) support.
To run the test correctly, you need to have an httpbin server running on port 6080
.
The best way to do it is using the kennethreitz/httpbin docker image:
docker run -p 6080:80 -d kennethreitz/httpbin
Before publishing a stable version the following tasks should be done:
- Check the minimum compatible version for the backends dependencies.
- Improve tests.
- Improve docs.
- Fix timeouts
- API to close streams
- API to fetch metadata information from a stream
- Handle form data
- Handle cookies
This library is heavily inspired on the awesome Ruby gem down