Want to become an expert in VBA? So this is the right place for you. This blog mainly focus on teaching how to apply Visual Basic for Microsoft Excel. So improve the functionality of your excel workbooks with the aid of this blog. Also ask any questions you have regarding MS Excel and applying VBA. We are happy to assist you.

Web Scraping - Basics

Web sites contains mass amount of data. Some times people need to extract those information for their needs. Web scraping is a popular technique use to collect data automatically from websites. Web scraping is also called as Web harvesting.

There are lot of uses of web scraping. It is used

  • To compare prices
  • To monitor weather data
  • To detect changes made to websites
  • For research purposes
  • To collect contact details

Complexity of the web scraping program depends on the web site and the data we going to collect. If we develop a web scraping software from VBA we need to use several in-built functions to make our program easier. Here below are very important functions we will need.

Functions helpful in web scraping

Mid - This function can be used to extract a string segment from it's parent string.
InStr - This function can be used to check whether certain string is included in another string and get the position of that sub string if it exists.
Replace - This function replace a character or string segments with another character or string.

Create an Internet Explorer object and navigate to a webpage

First, let's see how to create an internet explorer object. You can use below line of code to create that object.

Set objIE = CreateObject("InternetExplorer.Application")

Then set following attributes to the created object. You can change them according to your requirement. First to attributions decide the position of internet explorer window. Next to attributes will decide the size of the window.

objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600

Following attribute enable the viewing of window navigation. However you can set it to false if you do not want to see it.

objIE.Visible = True 

So now we have created the internet explorer object and set the attributes. Next let's see how to navigate to a website. You can use below line to navigate to the website you want.

objIE.Navigate ("http://www.example.com/")

There is a very important thing you need to know when developing a web scraping software. Your program will have to run through several line of codes after navigating to a specific website. However website can take some time to be fully loaded due to various reasons. So when you run the program there is a chance of running the rest of the code before web site is fully loaded. This can harmfully affected to the results you get from the software. However we can take preventive action to overcome this kind of situations. For that you need to always put below three lines where ever you have navigation command.

Do
DoEvents
Loop Until objIE.readystate = 4

Following is the full code you can use to Create an Internet Explorer object and navigate to a webpage.

Set objIE = CreateObject("InternetExplorer.Application")

objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600

objIE.Visible = True

objIE.Navigate ("http://www.example.com/")

Do
DoEvents
Loop Until objIE.readystate = 4

There is an another reason which can cause our program to behave differently to the way we have intended. There are times web sites not available due to different reasons. If our program try web harvest with a web site which is not online it will halt our program. So we need to use error handling techniques wisely to overcome this kind of scenarios.

Here below is a example of using error handling in VBA web scraping.


the_start:

Set objIE = CreateObject("InternetExplorer.Application")
objIE.Top = 0
objIE.Left = 0
objIE.Width = 800
objIE.Height = 600
objIE.Visible = True 'We will see the window navigation

On Error Resume Next
objIE.navigate ("http://www.example.com/")

Do
DoEvents
    If Err.Number <> 0 Then
        objIE.Quit
        Set objIE = Nothing
        GoTo the_start:
    End If
Loop Until objIE.readystate = 4

MsgBox "webpage has loaded"

Often we need to get the html code to a text box or to a variable, in first steps of the web scraping program. So here below is a code to paste html code of a specific web page to a text box. Here txtHtml refers to a text box name.

txtHtml.Text = objIE.document.body.innerhtml

Some times it is very easier to get required data from the inner-text without dealing with html codes. So we can use below line of code to copy paste inner-text to a text box in that kind of situation.

txtHtml.Text = objIE.document.body.innertext

In web scraping, frequently we need to get elements by their tag names. Here below is an example which can get elements have the "a" tags (anchor)

Set Alllinks = objIE.document.getelementsbytagname("A")
    For Each Hyperlink In Alllinks
        MsgBox Hyperlink.innertext & "-" & Hyperlink.href
    Next

Often web scraping software need to click links inside it's procedures. Below example will click the link if it find a link with the text "New Arrivals".

Set Alllinks = objIE.document.getelementsbytagname("A")
    For Each Hyperlink In Alllinks
        If InStr(Hyperlink.innertext, "New Arrivals") > 0 Then
            Hyperlink.Click
            Exit For
        End If
    Next

Above are few basic things you need to know if you like to develop a software for Web scraping using VBA. Read these posts if you want to learn more.

Useful References for VBA Web Scraping
Web Scraping Techniques - Part 2
Web Scraping - Collecting Data From a Webpage
Web Scraping - Collect Options Inside Drop-down Lists
VBA Web Scraping - Identifying Disabled Dropdowns
How to scroll a web page number of pixels

Open and read text file

                         We need to handle text files in various situations. Sometimes we need to create, open, write  or delete text files while the program is running. So today I will explain you how to Open and read text file through VBA.
First of all we need to define the variables.

Dim myFile, text, textline as String

Variable "myFile" is used to store the file path of the text file. Each and every text lines will be stored temporarily to the variable call "textline" while program run throgh the Do loop until it reaches to the end of the file. All the text lines of the text file will be appended to variable "text".

There are two ways to assign the text file to "myFile" variable depending on your requirement.

Either you can use below line of code to assign the file path directly. This method can be used if you always use same folder path and file.

myFile = "D:\Fiverr\amjedamjed\Project 2\all - Copy (11).txt"

Or you can use below line of code to assign the file name to variable. Here user get opportunity to select the file through browsing window.

myFile = Application.GetOpenFilename()

So you should use one of the above two lines in your program according to end user requirement. However rest of the program will not affected by what you chooses from above two ways.  
Below line will open the file for reading

Open myFile For Input As #1

Below do loop will run until it reaches to the end of the text file. Inside the do loop, program read the text file line by line and assign that single line to variable "textline". Whole text of text file is appended to variable call "text" by the next line.
Do Until EOF(1)

    Line Input #1, textline
    text = text & textline
Loop

Below line of code will close the text file which was opened for reading.
Close #1


Now you have whole content of text file in variable "text". Now you can do any string manipulation to that text from here.
However I will just show the content in a message box.

MsgBox text





Here below is the full code for you to easily copy paste for your project

Dim myFile, text, textline as String

myFile = "D:\Fiverr\amjedamjed\Project 2\all - Copy (11).txt"
or
myFile = Application.GetOpenFilename()

Open myFile For Input As #1
Do Until EOF(1)
    Line Input #1, textline
    text = text & textline
Loop
Close #1
MsgBox text

Contact Form

Name

Email *

Message *

Popular Posts