What Does Curl
cURL: What It Is, And How You Can Use It For Web Scraping
cURL is a versatile command used by programmers for data collection and data transfers. But how can you leverage cURL for web scraping? This article will help you get started.
23-Dec-2020
In this blog post you will learn:
What is cURL?
How to use cURL?
Why is cURL so popular?
Web scraping with cURL
What Is cURL?
cURL is a command-line tool that you can use to transfer data via network protocols. The name cURL stands for ‘Client URL’, and is also written as ‘curl’. This popular command uses URL syntax to transfer data to and from servers. Curl is powered by ‘libcurl’, a free and easy-to-use client-side URL transfer library.
Why is using curl advantageous?
The versatility of this command means you can use curl for a variety of use cases, including:
User authentication
HTTP posts
SSL connections
Proxy support
FTP uploads
The simplest use case for curl is downloading or uploading data, such as a web page or a file, using one of the supported protocols.
Curl protocols
While curl supports a long list of protocols, it will use HTTP by default if you don’t specify one. Supported protocols include DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTSP, SCP, SFTP, SMB, SMTP, SMTPS, TELNET, and TFTP.
Installing curl
The curl command is installed by default in most Linux distributions.
How do you check if you already have curl installed?
1. Open your Linux console.
2. Type ‘curl’ and press ‘Enter’.
3. If you already have curl installed, you will see the following message: curl: try 'curl --help' or 'curl --manual' for more information
4. If you don’t have curl installed, you will see a message such as ‘command not found’. You can then install it with your distribution’s package manager (more details below).
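The same check can be scripted. The sketch below is a minimal, non-interactive version of the steps above: it uses the shell built-in command -v to see whether a curl executable is on the PATH and, if so, prints the first line of curl --version. The variable name curl_status is only illustrative.

```shell
# Check whether curl is available without relying on its error message.
if command -v curl >/dev/null 2>&1; then
  curl_status="installed"
  # The first line of --version shows the curl and libcurl versions.
  curl --version | head -n 1
else
  curl_status="missing"
fi
echo "curl is $curl_status"
```

This form is handy at the top of a script that depends on curl, because it gives a clear yes/no answer instead of a message you have to read.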
How to use cURL
Curl’s syntax is pretty simple:
curl [options] [url]
For example, if you want to download a webpage, just run:
curl [url]
The command will then print the source code of the page in your terminal window. Keep in mind that if you don’t specify a protocol, curl will default to HTTP. To use a specific protocol, put it in front of the address:
curl [protocol]://[address]
If you leave out the protocol prefix (the part ending in ://), curl will guess the protocol you want to use.
We have only covered the basic use of the command so far; you can find the full list of options on the curl documentation site. An option tells curl what action to take, and the URL tells curl where to perform that action. You can list one or several URLs in a single command.
To download multiple URLs, prefix each URL with -O (an uppercase letter O, not a zero) followed by a space. You can do this on a single line or put each URL on its own line. You can also download several pages from the same site by listing them in braces inside the URL. For example:
curl "example.com/{page1,page4,page6}.html"
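curl can expand a list written in braces inside a URL, which is what the page list above relies on. Here is a small sketch that can be run without network access. It assumes curl is installed and uses a throwaway directory of local files read through file:// URLs (the directory layout and page names are made up for the demonstration); with a real site you would put an http:// or https:// URL in the same position.

```shell
# Build a tiny fake "site" on disk so the example needs no network.
workdir=$(mktemp -d)
mkdir "$workdir/site" "$workdir/out"
for page in page1 page4 page6; do
  echo "content of $page" > "$workdir/site/$page.html"
done

# One -O plus a brace list: curl expands the list itself and saves each
# page under its remote name in the current directory.
# The URL is quoted so the shell leaves the braces for curl to expand.
(
  cd "$workdir/out" &&
  curl -sS -O "file://$workdir/site/{page1,page4,page6}.html"
)

ls "$workdir/out"
```

Running the download in a subshell keeps the change of directory from affecting the rest of your session.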
Saving the download
You can save the content of a URL to a file using one of two methods:
1. -o method: Lets you choose the filename under which the content will be saved. This option has the following structure:
curl -o [filename] [url]
2. -O method: Here you don’t need to add a filename, since this option saves the file under the name taken from the URL. To use this option, you just need to prefix the URL with -O.
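A short sketch of the difference between the two options, assuming curl is installed. So that it runs offline it downloads from a local file:// URL; the file and directory names are invented for the example.

```shell
workdir=$(mktemp -d)
echo "<html>hello</html>" > "$workdir/index.html"
mkdir "$workdir/downloads"
cd "$workdir/downloads"

# -o: you choose the local filename.
curl -sS -o my-copy.html "file://$workdir/index.html"

# -O: curl reuses the last part of the URL path (index.html).
curl -sS -O "file://$workdir/index.html"

ls
```

Both files end up with identical content; only the way the local name is chosen differs.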
Resuming the download
Your download may stop partway through. In that case, rerun the command with the -C - option added at the beginning. The dash after -C tells curl to work out where to resume from by looking at the partially downloaded file:
curl -C - -O [url]
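The sketch below simulates an interrupted download so you can watch a resume happen. It uses -C - (the dash asks curl to work out the offset from the existing file), assumes curl is installed, and reads from a local file:// URL so that it runs offline. The file names are made up, and the "interrupted" download is faked by copying only the first 1000 bytes of the source file.

```shell
workdir=$(mktemp -d)
# A source file standing in for a large remote download.
seq 1 1000 > "$workdir/big.txt"

# Pretend the first attempt stopped after 1000 bytes.
head -c 1000 "$workdir/big.txt" > "$workdir/partial.txt"

# -C - makes curl look at the size of the existing output file and
# continue from that offset, appending the remaining bytes.
curl -sS -C - -o "$workdir/partial.txt" "file://$workdir/big.txt"

# Compare the resumed file with the source.
cmp "$workdir/partial.txt" "$workdir/big.txt" && echo "download complete"
```

Against a real server, resuming only works if the server supports range requests.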
Why is curl so popular?
Curl is really the ‘Swiss Army knife’ of commands, built to handle complex operations. However, there are alternatives, such as ‘wget’ or ‘Kurly’, that are good for simpler tasks.
Curl is a favorite among developers because it is available for almost every platform. Sometimes it is even installed by default. This means, whatever programs/jobs you are running, curl commands should work.
Also, chances are that if your OS is less than a decade old, you will have curl installed. You can read the curl documentation in a browser. If you are running a recent version of Windows, you probably already have curl installed. If you don’t, check out this post on Stack Overflow to learn how to install it.
Web Scraping with cURL
Pro tip: Be sure to abide by a website’s rules, and in general do not try to access password-protected content, which is illegal for the most part or, at the very least, frowned upon.
You can use curl to automate the repetitive parts of web scraping, helping you avoid tedious tasks. One common way to do this is from PHP, which exposes libcurl through its cURL functions.
When you use curl to scrape a webpage from PHP, there are three functions you should use:
curl_init($url) -> Initializes the session and returns a handle
curl_exec($ch) -> Executes the request on that handle
curl_close($ch) -> Closes the session
Other options you should set (with curl_setopt) include:
CURLOPT_URL -> Sets the URL you want to scrape
CURLOPT_RETURNTRANSFER -> Tells curl to return the scraped page as a string instead of printing it, so you can store it in a variable. (This enables you to extract exactly what you want from the page.)
What’s next?
In this post, we explained what curl is and what you can do with some basic commands. We also showed you how you can use curl to scrape web pages. Take advantage of this versatile tool to start collecting your target data.
Gal El Al | Head of Support at Bright Data, with a demonstrated history of working in the computer and network security industry. Specializes in billing processes, technical support, quality assurance, and account management, as well as helping customers streamline their data collection efforts while improving cost efficiency.
What is the curl command? Learning and testing APIs with …
cURL, which stands for client URL, is a command line tool that developers use to transfer data to and from a server. At the most fundamental level, cURL lets you talk to a server by specifying the location (in the form of a URL) and the data you want to send. cURL supports several different protocols, including HTTP and HTTPS, and runs on almost every platform. This makes cURL ideal for testing communication from almost any device (as long as it has a command line and network connectivity), from a local server to most edge devices.
The most basic command in curl is curl http://example.com. The curl command is followed by the URL from which we would like to retrieve some kind of data. In this case, it would return the HTML source for example.com.
Underlying the curl command is the libcurl development library, which has bindings for almost any codebase.
cURL is also the name of the software project, which encompasses both the curl command-line tool and the libcurl development library.
Prerequisites
To try out the commands in this article, you need a command shell and internet access. We are going to be using the NASA APOD (Astronomy Picture Of the Day) API to create some examples. This API is open source, but you will need to sign up for a developer key, which takes just a minute.
Why use curl?
So, why should you use cURL? Consider these benefits of this software project:
It is highly portable. It is compatible with almost every operating system and connected device.
It is useful for testing endpoints, to check if they are working.
It can be verbose, providing details of exactly what has been sent/received, which is helpful for debugging.
It has good error logging.
It can be rate limited.
Sending API requests
We can use curl to send API requests. Each request is generally made up of four main parts:
An endpoint, which is the address (URL) to which we are sending the request.
An HTTP method. The most common methods used are GET, POST, PUT and DELETE.
GET is used to retrieve a resource from a server. This could be a file, information, or an image.
POST is used to send information to the server.
PUT can be used to create or update a resource. This could be used to create or update a record in a database or update the contents of a file.
DELETE is used to delete a resource such as a database entry.
These are the recommended actions for each method, but it’s up to the API specification and implementation to define exactly what happens.
Headers, which contain metadata about the request, such as content type, user agent, and so on.
Body, which is the message body and contains the data that we want to send, if any. Generally, the body is used with POST and PUT methods.
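As a rough illustration of how those four parts map onto a curl command line, the sketch below assembles a POST request but only prints it instead of sending it. The endpoint https://api.example.com/items and the JSON body are hypothetical placeholders, not a real API.

```shell
# 1. Endpoint (hypothetical URL used only for illustration).
endpoint="https://api.example.com/items"
# 2. HTTP method.
method="POST"
# 3. Header describing the body format.
header="Content-Type: application/json"
# 4. Body (made-up JSON payload).
body='{"name": "demo"}'

# Collect the arguments as positional parameters so quoting is preserved.
set -- --request "$method" --header "$header" --data "$body" "$endpoint"

# Dry run: show the pieces in order. To send it for real: curl "$@"
preview="curl $*"
echo "$preview"
```

The printed line drops the quotes around the header and body, so it is a preview of the parts rather than something to paste back into a shell; curl "$@" is the form that keeps each argument intact.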
curl command options
There are over two hundred curl options. You can see some of them by typing curl -h in a terminal. The most commonly used command options include these:
-I returns only the HTTP headers
curl -I 'https://api.nasa.gov/planetary/apod?api_key=NASA_API_KEY&date=2020-01-01'
This command will return header fields such as Date, Content-Type, etc.
-v is the verbose option
curl --request GET 'https://api.nasa.gov/planetary/apod?api_key=NASA_API_KEY&date=2020-01-01' -v
This verbose command will show you everything that happens when you run the curl command, from the connection to the headers and any data returned. Here we also get the description of the image that is being returned by the request, along with the image URL.
-o stores the output in a file
curl --request GET 'https://api.nasa.gov/planetary/apod?api_key=NASA_API_KEY&date=2020-01-01' --output curloutput
Combining curl with other CLI commands
Combining curl with other CLI commands can be really handy when you want to use the output of one command as the input to a curl command, or vice versa.
As an example, you could see if a webpage contains a certain piece of text using curl and grep.
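A minimal sketch of that idea, assuming curl and grep are installed. To keep it runnable offline it reads a temporary local file through a file:// URL; the page content and the search text are invented, and in practice you would put the web page's URL in its place.

```shell
page=$(mktemp)
echo '<html><body>Welcome to the curl demo</body></html>' > "$page"

# -s hides the progress meter; grep -q prints nothing and only sets
# the exit status, which the if statement then tests.
if curl -s "file://$page" | grep -q "curl demo"; then
  found="yes"
else
  found="no"
fi
echo "text found: $found"
```

Because grep -q reports the result through its exit status, the same pattern drops straight into a monitoring script or a CI check.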
Here is an example of using curl and Python to extract the image link from a request to the NASA API and display it in the Preview app (macOS only). IMAGE_FILE stands for the filename under which you want to save the image:
curl --request GET "https://api.nasa.gov/planetary/apod?api_key=NASA_API_KEY&date=2020-01-01" -s | python3 -c "import sys, json; print(json.load(sys.stdin)['url'])" | xargs curl -o IMAGE_FILE && open -a Preview IMAGE_FILE
In this example we use curl to make a GET request to the NASA API endpoint. This returns JSON data, which we pass to a small Python script that extracts the URL of the image. We then use curl again to download the image and open it using Preview on the Mac.
You don’t have to use the command line curl to make API requests. You can use a number of different tools to interact with an API, such as HTTPie, Postman, and Rest Client in VS Code.
HTTPie
HTTPie is a command-line HTTP client that is touted as more friendly to users. It also includes a more expressive, color-coded UI. They have an online version, which is really neat.
Postman
Postman is a UI-based client for all things related to API development, and it’s arguably one of the most popular. Postman is very powerful.
You can generate and execute curl commands from within Postman. To generate curl commands, you can enter the request URL and parameters, and then click on the code option on the right-hand side:
A box is displayed with the option to select from a number of languages, including curl. Select curl to see the generated curl command.
Postman gives you a history of all the requests that you’ve built and even date-stamps them, which can be nice for collecting data points or references as you work with an API. Think of it as the client taking notes for you.
Rest Client in VS Code
Rest Client for VS Code is probably one of my favorite tools for executing curl commands. It’s lightweight and has good syntax highlighting. It’s a really useful add-on to do some quick curl requests from within VS Code.
You can simply type in your curl command and a ‘send request’ option will appear above.
After you click send request, another tab opens with the response.
Summary and Next Steps
In this article, we introduced you to the basic curl command and its most useful options. We also mentioned a handful of the tools that are available to help you get started with cURL. Now you can begin using cURL to test your endpoints and troubleshoot your applications.
To find out more about APIs and API Management, check out the following articles and videos available on IBM Developer:
API Fundamentals
What is a REST API
Acknowledgements
This article was originally authored by Amara Graham and published in April of 2019.
Curl Command in Linux with Examples
curl is a command-line utility for transferring data from or to a server, designed to work without user interaction. With curl, you can download or upload data using one of the supported protocols, including HTTP, HTTPS, SCP, SFTP, and FTP. curl provides a number of options that let you resume transfers, limit the bandwidth, use proxies, authenticate users, and much more.
In this tutorial, we will show you how to use the curl tool through practical examples and detailed explanations of the most common curl options.
Installing Curl
The curl package is pre-installed on most Linux distributions.
To check whether the curl package is installed on your system, open up your console, type curl, and press enter. If you have curl installed, the system will print curl: try 'curl --help' or 'curl --manual' for more information. Otherwise, you will see something like curl command not found.
If curl is not installed, you can easily install it using the package manager of your distribution.
Install Curl on Ubuntu and Debian
sudo apt update
sudo apt install curl
Install Curl on CentOS and Fedora
sudo yum install curl
How to Use Curl
The syntax for the curl command is as follows:
curl [options] [URL]
In its simplest form, when invoked without any option, curl displays the specified resource on the standard output.
For example, to retrieve a site's homepage you would run:
curl [URL]
The command will print the source code of the homepage in your terminal window.
If no protocol is specified, curl tries to guess the protocol you want to use, and it will default to HTTP.
Save the Output to a File
To save the result of the curl command, use either the -o or -O option.
Lowercase -o saves the file with a predefined filename that you supply:
curl -o [FILENAME] [URL]
Uppercase -O saves the file with its original filename:
curl -O [URL]
Download Multiple files
To download multiple files at once, use multiple -O options, each followed by the URL of the file you want to download.
In the following example we are downloading the Arch Linux and Debian iso files:
curl -O [ARCH_LINUX_ISO_URL] \
     -O [DEBIAN_ISO_URL]
Resume a Download
You can resume a download by using the -C - option.
This is useful if your connection drops during the download of a large file: instead of starting the download from scratch, you can continue the previous one.
For example, if you are downloading the Ubuntu 18.04 iso file using the following command:
curl -O [UBUNTU_ISO_URL]
and suddenly your connection drops, you can resume the download with:
curl -C - -O [UBUNTU_ISO_URL]
Get the HTTP Headers of a URL
HTTP headers are colon-separated key-value pairs containing information such as user agent, content type, and encoding. Headers are passed between the client and the server with the request or the response.
Use the -I option to fetch only the HTTP headers of the specified resource:
curl -I --http2 [URL]
Test if a Website Supports HTTP/2
To check whether a particular URL supports the HTTP/2 protocol, fetch the HTTP headers with -I along with the --http2 option:
curl -I --http2 -s [URL] | grep HTTP
The -s option tells curl to run in silent (quiet) mode, hiding the progress meter and error messages.
If the remote server supports HTTP/2, curl prints HTTP/2 200:
HTTP/2 200
Otherwise, the response is HTTP/1.1 200:
HTTP/1.1 200 OK
If you have curl version 7.47.0 or newer, you do not need to use the --http2 option because HTTP/2 is enabled by default for all HTTPS connections.
Follow Redirects
By default, curl doesn’t follow the HTTP Location headers.
If you try to retrieve the non-www version of a site that redirects to its www version, you will notice that instead of getting the source of the page you’ll be redirected to the www version:
curl [URL]
The -L option instructs curl to follow any redirect until it reaches the final destination:
curl -L [URL]
Change the User-Agent
Sometimes when downloading a file, the remote server may be set to block the curl User-Agent or to return different contents depending on the visitor device and browser.
In situations like this, to emulate a different browser, use the -A option.
For example, to emulate Firefox 60 you would use:
curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" [URL]
Specify a Maximum Transfer Rate
The --limit-rate option allows you to limit the data transfer rate. The value can be expressed in bytes, kilobytes with the k suffix, megabytes with the m suffix, and gigabytes with the g suffix.
In the following example curl will download the Go binary and limit the download speed to 1 MB per second:
curl --limit-rate 1m -O [GO_BINARY_URL]
This option is useful to prevent curl from consuming all the available bandwidth.
Transfer Files via FTP
To access a protected FTP server with curl, use the -u option and specify the username and password as shown below:
curl -u FTP_USERNAME:FTP_PASSWORD [FTP_URL]
Once logged in, the command lists all files and directories in the user’s home directory.
You can download a single file from the FTP server using the following syntax:
curl -u FTP_USERNAME:FTP_PASSWORD [FTP_URL]/[FILE]
To upload a file to the FTP server, use the -T option followed by the name of the file you want to upload:
curl -T [FILE] -u FTP_USERNAME:FTP_PASSWORD [FTP_URL]
Send Cookies
Sometimes you may need to make an HTTP request with specific cookies to access a remote resource or to debug an issue.
By default, when requesting a resource with curl, no cookies are sent or stored.
To send cookies to the server, use the -b switch followed by a filename containing the cookies or a string.
For example, to download the Oracle Java JDK rpm file, you’ll need to pass a cookie named oraclelicense with value a:
curl -L -b "oraclelicense=a" -O [JDK_RPM_URL]
Using Proxies
curl supports different types of proxies, including HTTP, HTTPS and SOCKS. To transfer data through a proxy server, use the -x (--proxy) option, followed by the proxy URL.
The following command downloads the specified resource using a proxy on 192.168.44.1 port 8888:
curl -x 192.168.44.1:8888 [URL]
If the proxy server requires authentication, use the -U (--proxy-user) option followed by the user name and password separated by a colon (user:password):
curl -U username:password -x 192.168.44.1:8888 [URL]
Conclusion
curl is a command-line tool that allows you to transfer data from or to a remote host. It is useful for troubleshooting issues, downloading files, and more.
The examples shown in this tutorial are simple, but they demonstrate the most used curl options and are meant to help you understand how the curl command works.
For more information about curl, visit the Curl Documentation page.
If you have any questions or feedback, feel free to leave a comment.
Frequently Asked Questions about what curl does
What exactly is curl?
cURL, which stands for client URL, is a command line tool that developers use to transfer data to and from a server. At the most fundamental level, cURL lets you talk to a server by specifying the location (in the form of a URL) and the data you want to send. … The most basic command in curl is curl http://example.com. (Feb 23, 2021)
What does curl do in terminal?
curl is a command-line utility for transferring data from or to a server, designed to work without user interaction. With curl, you can download or upload data using one of the supported protocols, including HTTP, HTTPS, SCP, SFTP, and FTP. (Nov 27, 2019)
What is option in curl?
The following are some of the available options used with curl and examples of their use. -a, --append. When uploading a file, this option allows you to append to the target file instead of overwriting it (FTP, SFTP). $ curl --append -T file.txt ftp://ftp.example.com/file.txt. --connect-timeout. (Jul 1, 2020)