Want to become an expert in VBA? So this is the right place for you. This blog mainly focus on teaching how to apply Visual Basic for Microsoft Excel. So improve the functionality of your excel workbooks with the aid of this blog. Also ask any questions you have regarding MS Excel and applying VBA. We are happy to assist you.

Web Scraping - Collecting Data From a Webpage

Web pages have various HTML tags. So the data we want to collect can be contained in any of these tags. Because of this you will need to carefully examine the HTML structure of the webpages before develop codes.

Web browsers have special tool which helps us to look at the structure of the web pages very easily. For an example, think you need to collect names of the people which is shown in a webpages. In this example each name is shown in separate page. But structure of each page is identical. So what you need to do first is, take your cursor near the name. Then write click. You will see this kind of options.

Then click on the Inspect.

You will able to see the structure of the web-page. Assume the code relate to name is look like this.

<span itemprop="full name" class="full name block">James Smith</span>

So I will show you how to collect name from this kind of structure. Here we can extract name using class name as follows.

objIE.document.getElementsByClassName("name block")(0).innerText

objIE should be defined before like

Set objIE = CreateObject("InternetExplorer.Application")

If you are not familiar with those basic web scraping, you can learn them from my earlier post.

Web Scraping - Basics

So let's now look at how to extract data from hyperlinks. Assume telephone numbers are stored in a hyperlink like below.

<a class="click-link" data-gaaction="ourbusiness" data-gaevent="pho_number" href="tel:9128602884" itemprop="phone">912-860-2884</a>

We can use below type of VBA code to get the telephone number.

Dim telephone_number As String

Set Alllinks = objIE.document.getElementsByTagName("A")

For Each Hyperlink In Alllinks
    If InStr(1, Hyperlink.href, "tel:", vbTextCompare) > 0 Then
        telephone_number = Hyperlink.innerText
        Exit For
    End If

Also we can use following technique to get text inside div tags.

Dim AllString As String
Set AllDiv = objIE.document.getElementsByTagName("div")
For Each DivTag In AllDiv
    AllString = DivTag.innerText

Then we can use functions like InStr, Split etc.. to extract relavent information from that "AllString" variable.

Contact Form


Email *

Message *

Popular Posts