In this lesson you will be able to understand the basics of VBA Web scraping. And at the end you will be able to write your first web scraping program to navigate to a webpage you like. But it is not possible to cover everything about VBA web scraping from only one post like this. So please check the link at the end of this article. You will find a complete guide to develop efficient and robust VBA web scraping programs.
Today, web sites play major roles in our day today lives. From the world's largest businesses to self employed individuals use web sites to carry out their businesses. Governments, Organizations and individuals host websites for various needs. Some websites are built to show particular information to the visitors. These websites usually contain only static web pages. But some websites can interact with their visitors. Amount of data included in the websites vary from one website to another. Some websites contain a little amount of data and some contain a large amount. Sometimes business organizations and individuals want to extract that information for their needs. This is where the technology called Web scraping comes into action. VBA Web scraping is a popular technique used to collect data automatically from websites to Excel.
There are a lot of uses of web scraping. It is used
- To compare prices
- To monitor weather data
- To detect changes made to websites
- For research purposes
- To collect various other information
There is a controversy regarding the legal aspect of web scraping. But I’m not going to discuss that here. You can easily find more information regarding that online. However I advise you to get the permission from the website owner before web scraping from a particular website. And even if you get the relevant permission avoid collecting personal data. And also try to follow these ethics as well.
Follow these ethics
- Don’t develop or use a web scraping program which overloads the website.
- Don’t use collected data to damage or steal the business of that website
- Don’t try to extract data from protected pages
- Avoid copyright infringements
- Don’t extract personal data
Note that the web scraping articles on this website are for education purposes only. I recommend you to contact your lawyer if you want to get legal advice about using Web scraping programs.
Complexity of a VBA web scraping program depends on the website's coding structure and the type of data we are going to collect. If we develop a VBA web scraping software, we need to use several in-built functions to extract the data from that particular website. Here are a few essential in-built VBA functions used in web scraping programs.
Commonly used VBA functions in web scraping
Mid - This VBA function can be used to extract a string segment from it's parent string.
InStr - This function can be used to check whether certain string is included in another string and get the position of that sub string if it exists.
Replace - This VBA function replaces a character or string segments with another character or string.
So now let’s look at how we can create a simple VBA web scraping program which can open an internet explorer window and navigate to a particular webpage.
When developing VBA web scraping software, our first step is to define a variable of type object and then assign an internet explorer object to it. You can use the below line of code to create that object.
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
Then set the following properties to the created object. You can change them according to your requirements. First two properties(Top and Left) decide the position of the internet explorer window. Next two properties(Width and Height) will decide the size of the window.
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
Following property enables the visibility of the internet explorer window. You can set it to false if you want to run the program in the background.
objIE.Visible = True
So now we have created the internet explorer object and set the relevant properties. Next let's see how to navigate to a webpage. You can use the below VBA statement to navigate to the webpage you want.
objIE.Navigate ("http://www.example.com")
There is a very important fact you need to remember when developing a web scraping software. That is your program will have to run through several VBA statements after navigating to a specific webpage. However, webpages can take some time to be fully loaded due to various reasons. So when you run the program there is a chance of executing subsequent VBA statements before the webpage is fully loaded. This can adversely affect the results you get from the software. But luckily we can take preventive action to overcome this. You can use below VBA statements to tell the program to wait until the webpage is fully loaded.
Do
DoEvents
Loop Until objIE.readystate = 4
Below is the full code to create an Internet Explorer object and navigate to a webpage.
Sub NavigateToWebpage()
Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True
objIE.Navigate ("http://www.example.com")
Do
DoEvents
Loop Until objIE.readystate = 4
End Sub
Also if the website is temporarily unavailable, VBA web scraping programs can be suspended before collecting all the relevant data. So If our program tries to collect data from a website while it is down, it will halt our program. So we need to use error handling techniques to overcome temporary interruptions like this.
Here is an example of using error handling in VBA web scraping.
Dim objIE As Object
the_start:
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True
On Error Resume Next
objIE.navigate ("http://www.example.com/")
Do
DoEvents
If Err.Number <> 0 Then
objIE.Quit
Set objIE = Nothing
GoTo the_start:
End If
Loop Until objIE.readystate = 4
MsgBox "Web page has loaded"
In the first steps of the web scraping programs, often we need to assign the HTML code to a variable of type string. So the below code shows how to assign HTML code of a webpage to a variable of type string. Here HTML_Code is the variable name.
Dim HTML_Code as String
HTML_Code = objIE.document.body.innerhtml
Sometimes it is very easy to get required data from the inner-text without dealing with HTML codes. Below example shows how to assign inner text to a variable of type string.
Dim InnerText as String
InnerText = objIE.document.body.innertext
In VBA web scraping applications, we often need to get elements by their tag names. Following example shows how to get elements by < a > tags. (anchor tags)
Set Alllinks = objIE.document.getelementsbytagname("A")
For Each Hyperlink In Alllinks
MsgBox Hyperlink.innertext & "-" & Hyperlink.href
Next
Sometimes we need our VBA web scraping programs to follow the links and collect the data. To do that the program needs to click on < a > tags. Now let’s see how we can accomplish this. To achieve this the program needs to loop through all the < a > tags of the webpage and find the relevant link. Then the VBA program can click the link to navigate to that page. Let’s try to understand this using an example. In this example, the VBA web scraping program will click the link if it finds a link with the text "New Arrivals".
Set Alllinks = objIE.document.getelementsbytagname("A")
For Each Hyperlink In Alllinks
If InStr(Hyperlink.innertext, "New Arrivals") > 0 Then
Hyperlink.Click
Exit For
End If
Next
Above are the basics you need to know when developing VBA web scraping programs. Follow this complete guide if you want to become an expert in VBA web scraping.