How To Mine Data From Instagram
Scrape Data from Instagram | Octoparse
In this tutorial, we are going to scrape data from Instagram, including the post content, date, image URL, number of likes and location.
To follow along, you may want to use this URL in the tutorial:
Here are the main steps in this tutorial:
1) “Go To Web Page” – to open the targeted web page
2) Create a pagination loop – to scrape data from multiple posts
3) Extract data – to select the data for extraction
4) Customize the data field using the RegEx tool – to revise the field name (Optional)
5) Save and start extraction – to run the task and get data
1) “Go To Web Page” – to open the targeted web page
· Create the task with “Advanced Mode”.
· Paste the URL into the “Extraction URL” box and click “Save URL” to move on
· Change the default built-in browser
The default built-in browser of Octoparse 7 is incompatible with Instagram. To have our target page loaded normally, we need to modify the browser setting.
· Click “Setting”
If you are using Octoparse 7.0.2, save the task before modifying the settings.
· Switch the default built-in browser to Firefox 45.0.
· Click “Save” to apply the modified setting
2) Create a pagination loop – to scrape data from multiple posts
We can use the “>” button as the “Next page” button to go to the next post. Before creating the pagination loop, we need to go back to the first post.
· Click the first post and click the “A” tag at the bottom of “Action Tips”
When you select an item that has a URL, the selected tag will be “A”. Normally there is no need to change it, as Octoparse automatically identifies the tags of selected items. In this case, however, we need to change the tag at the bottom of “Action Tips”.
· Select “Click the link”
The first post is now open. However, because Instagram loads its content with AJAX, we should set up AJAX Load for the “Click Item” action.
· Uncheck “Auto retry when no response”
· Check “Load the page with AJAX”
· Set up “AJAX Timeout”
Now, we can create the “Pagination” loop.
· Click the “>” button
· Click “Loop click next page” on the “Action Tips”
Instagram uses AJAX on the “>” button, so we need to set up AJAX Load for the “Click to Paginate” action as well.
· Click “Load the page with AJAX” on the “Customize Action”
· Set up “AJAX timeout”
Tips!
To learn more about dealing with AJAX in Octoparse, please refer to Deal with AJAX.
We are now on the second post. When creating a “Loop Item”, we should always start with the first item on the first page. In this case, we should go back to the first post.
· Click “Go To Web Page” in the workflow
· Click “Click Item”
Octoparse will open the first post.
· Click the pagination loop in the workflow
By doing this, we can help Octoparse decide the execution order and generate the “Extract data” step at the appropriate position in the workflow.
3) Extract data – to select the data for extraction
Now, let’s start extracting data.
· Select the data you want
· Click “extract data” on the “Action Tips”
4) Customize the data field – to revise the field name (Optional)
· Revise the field name
Type a new name or select one from the predefined options.
5) Save and start extraction – to run the task and get data
· Click “Start Extraction”
· Select “Local Extraction” to start execution.
Below is the sample output.
Data Mining: Instagram Scraper (1) | by Bruce Oh | Medium
Social media has been changing continuously. Back then, most people used Facebook to share their thoughts and pictures, and Facebook was the place where people communicated with friends online. But, as in life, nothing lasts forever. At some point, the popularity of social media clearly moved from Facebook to Instagram, and as time goes by, fewer and fewer people post their ideas and pictures on Facebook. Possible reasons for this change include fatigue with the old platform, a desire for new content, or the special features that newer social media offer; various reasons play a part. It might be a bit late to discuss why people moved from Facebook to Instagram, since people started using Instagram a while ago and are already moving on to other platforms, such as Snapchat. However, I believe one thing never changes: the words and pictures users post on social media contain a lot of information about people, society, trends, and social tendencies, so we can interpret people's intentions through social media regardless of the platform. Thus, analyzing social media helps you understand the trends people currently follow. The most salient reason I focus on Instagram is that it is specialized for photos. Unlike Facebook or Twitter, Instagram concentrates on the photo, and it creates a certain type of social phenomenon based on pictures. Instagram leads people to imply their desires through photos rather than reveal them directly, which is an interesting aspect of social media. Back then, social media users wanted to say what they had and what they thought through words and photos, but nowadays they imply their intentions and want people to notice them implicitly or secretly. Driven by these desires, pictures on Instagram have become a way of signaling to others. This particular desire successfully got people moving over to Instagram.
At this point, I would like to share a tool called Instagram-Scraper for people who want to study Instagram. Instagram-Scraper is a tool that lets you get most of the information posted on Instagram, including photos and captions. For more information, you can always check out the official website:
The installation of Instagram Scraper is pretty easy. If you are on Linux, you can simply type the following command; it also works on macOS if you have already installed pip.
    $ pip install instagram-scraper
After you are done with the installation, you can simply type instagram-scraper -h to see all the functions that instagram-scraper provides:
    usage: instagram-scraper [-h] [--destination DESTINATION]
                             [--login_user LOGIN_USER] [--login_pass LOGIN_PASS]
                             [--login_only] [--filename FILENAME] [--quiet]
                             [--maximum MAXIMUM] [--retain_username]
                             [--media_metadata] [--include-location]
                             [--media_types MEDIA_TYPES [MEDIA_TYPES ...]]
                             [--latest] [--tag] [--location] [--search-location]
                             [--comments] [--verbose VERBOSE]
                             [username [username ...]]
    instagram-scraper scrapes and downloads an instagram user's photos and videos.
    positional arguments:
      username              Instagram user(s) to scrape
    optional arguments:
      -h, --help            show this help message and exit
      --destination DESTINATION, -d DESTINATION
                            Download destination
      --login_user LOGIN_USER, -u LOGIN_USER
                            Instagram login user
      --login_pass LOGIN_PASS, -p LOGIN_PASS
                            Instagram login password
      --login_only, -l      Disable anonymous fallback if login fails
      --filename FILENAME, -f FILENAME
                            Path to a file containing a list of users to scrape
      --quiet, -q           Be quiet while scraping
      --maximum MAXIMUM, -m MAXIMUM
                            Maximum number of items to scrape
      --retain_username, -n
                            Creates username subdirectory when destination flag is set
      --media_metadata      Save media metadata to json file
      --include-location    Include location data when saving media metadata
      --media_types MEDIA_TYPES [MEDIA_TYPES ...], -t MEDIA_TYPES [MEDIA_TYPES ...]
                            Specify media types to scrape
      --latest              Scrape new media since the last scrape
      --tag                 Scrape media using a hashtag
      --location            Scrape media using a location-id
      --search-location     Search for locations by name
      --comments            Save post comments to json file
      --verbose VERBOSE, -v VERBOSE
                            Logging verbosity level
    You can hide your credentials from the history, by reading your
    username from a local file:
    $ instagram-scraper user_to_scrape
    with the file looking like this:
    -u=my_username
    -p=my_password
This scraper works efficiently in various ways. If you are looking for a particular function, you had better check the official website. If you are thinking of working on data mining, you had better have a script to run it, because instagram-scraper handles one query at a time. Even though it can fetch several users' photos with a single command, you might want different options for each query.
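Because instagram-scraper runs one query per invocation, a small Python driver can give each user different options and pause between runs. The sketch below is only an illustration under assumptions: the helper names `build_user_command` and `run_queries`, the per-user options dictionary, and the usernames are hypothetical; only the flags `--destination`, `--maximum`, and `--media_metadata` come from the help text above, and the 20-second default pause simply mirrors the author's script.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_user_command(username: str, destination: Optional[str] = None,
                       maximum: Optional[int] = None,
                       media_metadata: bool = False) -> List[str]:
    """Build one instagram-scraper argument list (hypothetical helper)."""
    cmd = ["instagram-scraper"]
    if destination is not None:
        cmd += ["--destination", destination]
    if maximum is not None:
        cmd += ["--maximum", str(maximum)]
    if media_metadata:
        cmd.append("--media_metadata")
    cmd.append(username)  # the username is the positional argument, placed last
    return cmd


def run_queries(commands: List[List[str]], pause_seconds: float = 20.0,
                runner: Callable[[List[str]], object] = subprocess.run) -> list:
    """Run each command in order, sleeping between queries to avoid an IP ban."""
    results = []
    for index, cmd in enumerate(commands):
        if index > 0:
            time.sleep(pause_seconds)  # no sleep before the first query
        results.append(runner(cmd))
    return results


if __name__ == "__main__":
    # Hypothetical per-user options: every account gets its own settings.
    per_user_options = {
        "user_a": {"maximum": 50},
        "user_b": {"maximum": 300, "media_metadata": True},
    }
    commands = [build_user_command(name, **opts)
                for name, opts in per_user_options.items()]
    # Dry run: print the commands instead of executing them.
    run_queries(commands, pause_seconds=0,
                runner=lambda cmd: print(" ".join(cmd)))
```

To run the queries for real, drop the `runner` argument so that `subprocess.run` is used, and keep a non-zero pause; how long it needs to be is something you would have to find out by testing.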
So what I suggest for getting big data from Instagram is to use Python to create a script that sends multiple queries. Here's a simple Python script that finds the places named 'umami burger':
    import subprocess
    import time
    location = 'umami burger'
    # Loops forever; stop it with Ctrl+C or edit it to query other locations
    while True:
        # Run one location search and capture its standard output as text
        p = subprocess.Popen(["instagram-scraper", "--search-location", location],
                             stdout=subprocess.PIPE, universal_newlines=True)
        output, err = p.communicate()
        if output != "":
            print(output)
        # Pause between queries so Instagram does not ban your IP
        time.sleep(20)
This script searches the locations stored in Instagram's database; it should be equivalent to the search engine on the Instagram website. Feel free to edit this simple script to get the data you want, but make sure to sleep between queries. If you keep sending queries without a break, Instagram will ban your IP for a period of time. Unless Instagram officially publishes its limits, there is no reliable way to know how many queries you can send within a given time, so you might want to test how much sleep time you need. It varies with the number of pictures, comments, and search results.
Location tags on Instagram website
    location-id: 62304541, title: Umami Burger, subtitle: 432 6th Ave, city:, lat: 40.7344, lng: -73.99861
    location-id: 292698230, title: Umami Burger, subtitle: 225 Liberty St, Ste 247, city:, lat: 40.71156, lng: -74.01533
    location-id: 19002234, title: Umami Burger, subtitle: 338 S Anaheim Blvd, Anaheim, California, city: Anaheim, California, lat: 33.8324803, lng: -117.9126456
    location-id: 11265111, title: Umami Burger, subtitle: 2981 Bristol St, Ste B2, Costa Mesa, CA, city: Costa Mesa, CA, lat: 33.67915, lng: -117.88604
    location-id: 87627717, title: Umami Burger, subtitle: 1200 Franklin St, Ste 2190, city:, lat: 37.8025053, lng: -122.2706073
If you want the data from the Umami Burger located in Anaheim, California, you can simply send a query with its location-id.
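To pick the right location-id out of the search output programmatically, the printed lines can be parsed into records and filtered by city. This is a sketch that assumes each result is printed on its own line in exactly the format shown in the sample above; the regular expression and the helper names `parse_locations` and `ids_in_city` are illustrative, not part of instagram-scraper. Because subtitles and cities can themselves contain commas, the pattern anchors on the fixed key names rather than splitting on commas.

```python
import re
from typing import Dict, List

# Assumed line format, copied from the sample search output above.
LOCATION_LINE = re.compile(
    r"location-id: (?P<location_id>\d+), "
    r"title: (?P<title>.*?), "
    r"subtitle: (?P<subtitle>.*?), "
    r"city: ?(?P<city>.*?), "
    r"lat: (?P<lat>-?\d+(?:\.\d+)?), "
    r"lng: (?P<lng>-?\d+(?:\.\d+)?)"
)


def parse_locations(output: str) -> List[Dict[str, object]]:
    """Turn --search-location output into dicts (hypothetical helper)."""
    results = []
    for line in output.splitlines():
        match = LOCATION_LINE.search(line)
        if match is None:
            continue  # skip lines that are not search results
        row: Dict[str, object] = dict(match.groupdict())
        row["lat"] = float(match["lat"])
        row["lng"] = float(match["lng"])
        results.append(row)
    return results


def ids_in_city(locations: List[Dict[str, object]], city: str) -> List[str]:
    """Return the location-ids whose city field mentions the given city."""
    return [str(loc["location_id"]) for loc in locations
            if city in str(loc["city"])]


sample = """\
location-id: 62304541, title: Umami Burger, subtitle: 432 6th Ave, city:, lat: 40.7344, lng: -73.99861
location-id: 19002234, title: Umami Burger, subtitle: 338 S Anaheim Blvd, Anaheim, California, city: Anaheim, California, lat: 33.8324803, lng: -117.9126456
"""

print(ids_in_city(parse_locations(sample), "Anaheim"))  # ['19002234']
```

Several results have an empty city field, so filtering on the subtitle or on the coordinates may work better in those cases.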
However, since many people have tagged this place in their pictures, I am going to limit the download to 300 pictures and also get a metadata file. The metadata file includes the caption, the number of likes, and most of the other data a photo has.
    $ instagram-scraper --location 19002234 -m 300 --media_metadata
Even though I set the limit to 300, only 274 pictures were downloaded, which means Instagram users have tagged the Umami Burger in Anaheim, California in 274 pictures. In your folder, you can see a bunch of pictures and the metadata file. Instagram-scraper is a powerful tool for collecting data from Instagram. I have shown only how to search by location tag, but you will be able to obtain much more data with this tool. Next, I will share how to combine Instagram Scraper with Yelp!
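Because `-m 300` is only an upper bound, comparing the number of downloaded files with the limit tells you whether the search ran out of posts, as in the 274-picture case. The sketch below assumes all downloaded media sit directly in one folder and that media files use the extensions listed in `MEDIA_EXTENSIONS`; both are assumptions to adjust, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed media extensions; adjust them to what actually appears in your folder.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".mp4"}


def count_media_files(folder: str) -> int:
    """Count downloaded media files, ignoring metadata such as .json files."""
    return sum(1 for path in Path(folder).iterdir()
               if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS)


def hit_limit(folder: str, maximum: int) -> bool:
    """True if the -m limit was reached, so more tagged posts may exist."""
    return count_media_files(folder) >= maximum
```

If `hit_limit(folder, 300)` returns False, as it would with 274 files, the download most likely stopped because no more tagged posts were available rather than because of the limit.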
Terms of Use – Instagram
We are updating our Terms of Use: Our updated Terms of Use will be effective on January 19, 2013.
By using the website and Instagram service you are agreeing to be bound by the following terms and conditions (“Terms of Use”).
Basic Terms
You must be 13 years or older to use this site.
You may not post nude, partially nude, or sexually suggestive photos.
You are responsible for any activity that occurs under your screen name.
You are responsible for keeping your password secure.
You must not abuse, harass, threaten, impersonate or intimidate other Instagram users.
You may not use the Instagram service for any illegal or unauthorized purpose. International users agree to comply with all local laws regarding online conduct and acceptable content.
You are solely responsible for your conduct and any data, text, information, screen names, graphics, photos, profiles, audio and video clips, links (“Content”) that you submit, post, and display on the Instagram service.
You must not modify, adapt or hack Instagram or modify another website so as to falsely imply that it is associated with Instagram.
You must not access Instagram’s private API by any other means other than the Instagram application itself.
You must not crawl, scrape, or otherwise cache any content from Instagram including but not limited to user profiles and photos.
You must not create or submit unwanted email or comments to any Instagram members (“Spam”).
You must not use web URLs in your name without prior written consent from Instagram, inc.
You must not transmit any worms or viruses or any code of a destructive nature.
You must not, in the use of Instagram, violate any laws in your jurisdiction (including but not limited to copyright laws).
Violation of any of these agreements will result in the termination of your Instagram account. While Instagram prohibits such conduct and content on its site, you understand and agree that Instagram cannot be responsible for the Content posted on its web site and you nonetheless may be exposed to such materials and that you use the Instagram service at your own risk.
General Conditions
We reserve the right to modify or terminate the Instagram service for any reason, without notice at any time.
We reserve the right to alter these Terms of Use at any time. If the alterations constitute a material change to the Terms of Use, we will notify you via internet mail according to the preference expressed on your account. What constitutes a “material change” will be determined at our sole discretion, in good faith and using common sense and reasonable judgement.
We reserve the right to refuse service to anyone for any reason at any time.
We reserve the right to force forfeiture of any username that becomes inactive, violates trademark, or may mislead other users.
We may, but have no obligation to, remove Content and accounts containing Content that we determine in our sole discretion are unlawful, offensive, threatening, libelous, defamatory, obscene or otherwise objectionable or violates any party’s intellectual property or these Terms of Use.
We reserve the right to reclaim usernames on behalf of businesses or individuals that hold legal claim or trademark on those usernames.
Proprietary Rights in Content on Instagram
Instagram does NOT claim ANY ownership rights in the text, files, images, photos, video, sounds, musical works, works of authorship, applications, or any other materials (collectively, “Content”) that you post on or through the Instagram Services. By displaying or publishing (“posting”) any Content on or through the Instagram Services, you hereby grant to Instagram a non-exclusive, fully paid and royalty-free, worldwide, limited license to use, modify, delete from, add to, publicly perform, publicly display, reproduce and translate such Content, including without limitation distributing part or all of the Site in any media formats through any media channels, except Content not shared publicly (“private”) will not be distributed outside the Instagram Services.
Some of the Instagram Services are supported by advertising revenue and may display advertisements and promotions, and you hereby agree that Instagram may place such advertising and promotions on the Instagram Services or on, about, or in conjunction with your Content. The manner, mode and extent of such advertising and promotions are subject to change without specific notice to you.
You represent and warrant that: (i) you own the Content posted by you on or through the Instagram Services or otherwise have the right to grant the license set forth in this section, (ii) the posting and use of your Content on or through the Instagram Services does not violate the privacy rights, publicity rights, copyrights, contract rights, intellectual property rights or any other rights of any person, and (iii) the posting of your Content on the Site does not result in a breach of contract between you and a third party. You agree to pay for all royalties, fees, and any other monies owing any person by reason of Content you post on or through the Instagram Services.
The Instagram Services contain Content of Instagram (“Instagram Content”). Instagram Content is protected by copyright, trademark, patent, trade secret and other laws, and Instagram owns and retains all rights in the Instagram Content and the Instagram Services. Instagram hereby grants you a limited, revocable, nonsublicensable license to reproduce and display the Instagram Content (excluding any software code) solely for your personal use in connection with viewing the Site and using the Instagram Services.
The Instagram Services contain Content of Users and other Instagram licensors. Except as provided within this Agreement, you may not copy, modify, translate, publish, broadcast, transmit, distribute, perform, display, or sell any Content appearing on or through the Instagram Services.
Instagram performs technical functions necessary to offer the Instagram Services, including but not limited to transcoding and/or reformatting Content to allow its use throughout the Instagram Services.
Although the Site and other Instagram Services are normally available, there will be occasions when the Site or other Instagram Services will be interrupted for scheduled maintenance or upgrades, for emergency repairs, or due to failure of telecommunications links and equipment that are beyond the control of Instagram. Also, although Instagram will normally only delete Content that violates this Agreement, Instagram reserves the right to delete any Content for any reason, without prior notice. Deleted content may be stored by Instagram in order to comply with certain legal obligations and is not retrievable without a valid court order. Consequently, Instagram encourages you to maintain your own backup of your Content. In other words, Instagram is not a backup service. Instagram will not be liable to you for any modification, suspension, or discontinuation of the Instagram Services, or the loss of any Content.
Frequently Asked Questions about how to mine data from instagram
Can I mine data from Instagram?
Instagram-scraper is a powerful tool for collecting data from Instagram. I shared only how to search the location tag, but you will be able to obtain much more data with it. (Sep 10, 2017)
How do I extract data from Instagram?
On Instagram.com from a mobile browser:
1) Tap your profile picture in the bottom right to go to your profile.
2) Tap Settings in the top left.
3) Tap Privacy and Security.
4) Scroll down to Data Download and tap Request Download.
5) Enter the email address where you’d like to receive a link to your data and tap Next.
Is it legal to scrape Instagram?
You must not abuse, harass, threaten, impersonate or intimidate other Instagram users. You may not use the Instagram service for any illegal or unauthorized purpose. … You must not crawl, scrape, or otherwise cache any content from Instagram including but not limited to user profiles and photos. (Jan 19, 2013)