Download files in the same folder from an HTTPS site
Say you have a text file listing file names, one per line, and all of those files sit under the same URL of a website. At first I expected to download all of them using curl, but it seemed curl didn't have an option for that. After some googling, it turned out that wget is the right tool for the job. The question is:
How do I let wget loop through that list of files instead of copying and pasting them one by one?
The immediate answer that came to mind was to write a script: read the text file, concatenate each file name with the base URL, then launch wget with that URL (which feels like using a sledgehammer to crack a nut, huh!). I started writing my script, and while checking wget's manual for some of its arguments I caught this:
-B URL
--base=URL
Resolves relative links using URL as the point of reference, when reading links from an HTML file specified via the -i/--input-file option
(together with --force-html, or when the input file was fetched remotely from a server describing it as HTML). This is equivalent to the
presence of a "BASE" tag in the HTML input file, with URL as the value for the "href" attribute.
For instance, if you specify <http://foo/bar/a.html> for URL, and Wget reads ../baz/b.html from the input file, it would be resolved to
<http://foo/baz/b.html>.
My mind was triggered by the words "input file", so I scrolled back a bit and checked the -i option:
-i file
--input-file=file
Read URLs from a local or external file. If - is specified as file, URLs are read from the standard input. (Use ./- to read from a file
literally named -.)
If this function is used, no URLs need be present on the command line. If there are URLs both on the command line and in an input file, those
on the command lines will be the first ones to be retrieved. If --force-html is not specified, then file should consist of a series of URLs,
one per line.
However, if you specify --force-html, the document will be regarded as html. In that case you may have problems with relative links, which you
can solve either by adding "<base href="url">" to the documents or by specifying --base=url on the command line.
If the file is an external one, the document will be automatically treated as html if the Content-Type matches text/html. Furthermore, the
file's location will be implicitly used as base href if none was specified.
Wahoo! I threw away my unfinished script, and here is what I have:
a file that contains the list of files I need to download
a URL, which is https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/
and the final wget command to get all the files I need:
$ wget -i file_list.txt -B https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/
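As a minimal sketch of the pieces above, assuming file_list.txt holds one file name per line (the three names are taken from the download output in this post, and fetch_list is a hypothetical helper name, not part of wget):

```shell
#!/bin/sh
# Base URL from this post; every name in the list is resolved against it.
BASE_URL="https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/"

# Write a small list: one file name per line, no URL prefix needed.
cat > file_list.txt <<'EOF'
sqlite-jdbc-3.46.0.0.jar
sqlite-jdbc-3.46.0.0.jar.sha1
sqlite-jdbc-3.46.0.0.pom
EOF

# fetch_list is a hypothetical wrapper:
#   -i reads the names from the file, -B supplies the base URL for them.
fetch_list() {
    wget -i "$1" -B "$2"
}

# Uncomment to download the listed files into the current directory:
# fetch_list file_list.txt "$BASE_URL"
```

Note that the option is a lowercase -i; an uppercase -I means --include-directories in wget and would not read the list.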
The final result:
--2024-07-20 15:00:52-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0-javadoc.jar
Resolving repo1.maven.org (repo1.maven.org)... 199.232.196.209, 199.232.192.209, 2a04:4e42:4c::209, ...
Connecting to repo1.maven.org (repo1.maven.org)|199.232.196.209|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 953945 (932K) [application/java-archive]
Saving to: ‘sqlite-jdbc-3.46.0.0-javadoc.jar’
sqlite-jdbc-3.46.0.0-javadoc.jar 100%[=============================================================================>] 931.59K 733KB/s in 1.3s
2024-07-20 15:00:54 (733 KB/s) - ‘sqlite-jdbc-3.46.0.0-javadoc.jar’ saved [953945/953945]
--2024-07-20 15:00:54-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0-javadoc.jar.asc
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 258 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0-javadoc.jar.asc’
sqlite-jdbc-3.46.0.0-javadoc.jar.asc 100%[=============================================================================>] 258 --.-KB/s in 0s
2024-07-20 15:00:55 (3.67 MB/s) - ‘sqlite-jdbc-3.46.0.0-javadoc.jar.asc’ saved [258/258]
--2024-07-20 15:00:55-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0-javadoc.jar.md5
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 32 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0-javadoc.jar.md5’
sqlite-jdbc-3.46.0.0-javadoc.jar.md5 100%[=============================================================================>] 32 --.-KB/s in 0s
2024-07-20 15:00:55 (466 KB/s) - ‘sqlite-jdbc-3.46.0.0-javadoc.jar.md5’ saved [32/32]
--2024-07-20 15:00:55-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0-javadoc.jar.sha1
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 40 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0-javadoc.jar.sha1’
sqlite-jdbc-3.46.0.0-javadoc.jar.sha1 100%[=============================================================================>] 40 --.-KB/s in 0s
2024-07-20 15:00:55 (630 KB/s) - ‘sqlite-jdbc-3.46.0.0-javadoc.jar.sha1’ saved [40/40]
--2024-07-20 15:00:55-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0-sources.jar
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 13539820 (13M) [application/java-archive]
Saving to: ‘sqlite-jdbc-3.46.0.0-sources.jar’
sqlite-jdbc-3.46.0.0-sources.jar 100%[=============================================================================>] 12.91M 5.68MB/s in 2.3s
2024-07-20 15:00:58 (5.68 MB/s) - ‘sqlite-jdbc-3.46.0.0-sources.jar’ saved [13539820/13539820]
--2024-07-20 15:00:58-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0-sources.jar.asc
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 258 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0-sources.jar.asc’
sqlite-jdbc-3.46.0.0-sources.jar.asc 100%[=============================================================================>] 258 --.-KB/s in 0s
2024-07-20 15:00:58 (10.6 MB/s) - ‘sqlite-jdbc-3.46.0.0-sources.jar.asc’ saved [258/258]
--2024-07-20 15:00:58-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0-sources.jar.md5
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 32 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0-sources.jar.md5’
sqlite-jdbc-3.46.0.0-sources.jar.md5 100%[=============================================================================>] 32 --.-KB/s in 0s
2024-07-20 15:00:58 (1.50 MB/s) - ‘sqlite-jdbc-3.46.0.0-sources.jar.md5’ saved [32/32]
--2024-07-20 15:00:58-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0-sources.jar.sha1
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 40 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0-sources.jar.sha1’
sqlite-jdbc-3.46.0.0-sources.jar.sha1 100%[=============================================================================>] 40 --.-KB/s in 0s
2024-07-20 15:00:59 (634 KB/s) - ‘sqlite-jdbc-3.46.0.0-sources.jar.sha1’ saved [40/40]
--2024-07-20 15:00:59-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0.jar
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 13615436 (13M) [application/java-archive]
Saving to: ‘sqlite-jdbc-3.46.0.0.jar’
sqlite-jdbc-3.46.0.0.jar 100%[=============================================================================>] 12.98M 5.36MB/s in 2.4s
2024-07-20 15:01:01 (5.36 MB/s) - ‘sqlite-jdbc-3.46.0.0.jar’ saved [13615436/13615436]
--2024-07-20 15:01:01-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0.jar.asc
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 258 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0.jar.asc’
sqlite-jdbc-3.46.0.0.jar.asc 100%[=============================================================================>] 258 --.-KB/s in 0s
2024-07-20 15:01:02 (24.1 MB/s) - ‘sqlite-jdbc-3.46.0.0.jar.asc’ saved [258/258]
--2024-07-20 15:01:02-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0.jar.md5
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 32 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0.jar.md5’
sqlite-jdbc-3.46.0.0.jar.md5 100%[=============================================================================>] 32 --.-KB/s in 0s
2024-07-20 15:01:02 (883 KB/s) - ‘sqlite-jdbc-3.46.0.0.jar.md5’ saved [32/32]
--2024-07-20 15:01:02-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0.jar.sha1
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 40 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0.jar.sha1’
sqlite-jdbc-3.46.0.0.jar.sha1 100%[=============================================================================>] 40 --.-KB/s in 0s
2024-07-20 15:01:02 (2.46 MB/s) - ‘sqlite-jdbc-3.46.0.0.jar.sha1’ saved [40/40]
--2024-07-20 15:01:02-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0.pom
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 18575 (18K) [text/xml]
Saving to: ‘sqlite-jdbc-3.46.0.0.pom’
sqlite-jdbc-3.46.0.0.pom 100%[=============================================================================>] 18.14K --.-KB/s in 0.004s
2024-07-20 15:01:02 (4.69 MB/s) - ‘sqlite-jdbc-3.46.0.0.pom’ saved [18575/18575]
--2024-07-20 15:01:02-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0.pom.asc
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 258 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0.pom.asc’
sqlite-jdbc-3.46.0.0.pom.asc 100%[=============================================================================>] 258 --.-KB/s in 0s
2024-07-20 15:01:03 (3.72 MB/s) - ‘sqlite-jdbc-3.46.0.0.pom.asc’ saved [258/258]
--2024-07-20 15:01:03-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0.pom.md5
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 32 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0.pom.md5’
sqlite-jdbc-3.46.0.0.pom.md5 100%[=============================================================================>] 32 --.-KB/s in 0s
2024-07-20 15:01:03 (785 KB/s) - ‘sqlite-jdbc-3.46.0.0.pom.md5’ saved [32/32]
--2024-07-20 15:01:03-- https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/sqlite-jdbc-3.46.0.0.pom.sha1
Reusing existing connection to repo1.maven.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 40 [text/plain]
Saving to: ‘sqlite-jdbc-3.46.0.0.pom.sha1’
sqlite-jdbc-3.46.0.0.pom.sha1 100%[=============================================================================>] 40 --.-KB/s in 0s
2024-07-20 15:01:03 (1.00 MB/s) - ‘sqlite-jdbc-3.46.0.0.pom.sha1’ saved [40/40]
FINISHED --2024-07-20 15:01:03--
Nice, huh!
But the story doesn't stop there. I kept in mind that curl is a powerful tool, and it would be ridiculous if it couldn't do a simple task like this. Digging a little more, I found out that curl can do it with the syntax below:
$ curl -O "https://www.site.com/path/{file1,file2,file3,file4}"
The command is nice, and curl seemed to download faster than wget in my run. The only thing that worries me is: what if your list has hundreds of files?
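For a long list, one possible sketch is to build the full URLs from the same file_list.txt and hand them to curl through xargs, so nothing has to be typed into the braces by hand. The helper name list_urls is hypothetical, and the base URL is the one used earlier in this post:

```shell
#!/bin/sh
BASE_URL="https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.46.0.0/"

# list_urls (hypothetical helper): print BASE_URL + name for every
# non-empty line of the file given as $1, one URL per output line.
list_urls() {
    while IFS= read -r name || [ -n "$name" ]; do
        if [ -n "$name" ]; then
            printf '%s%s\n' "$BASE_URL" "$name"
        fi
    done < "$1"
}

# Uncomment to download: xargs runs one "curl -O" per URL.
# list_urls file_list.txt | xargs -n 1 curl -O
```

Running one curl per URL is simple but opens a new connection each time; passing many URLs to a single curl with --remote-name-all should let it reuse the connection, though I have not measured the difference.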
Please let me know if you have a better and quicker solution for a case like this.