Want to become an expert in VBA? Then this is the right place for you. This blog mainly focuses on teaching how to apply Visual Basic for Applications in Microsoft Excel, so you can improve the functionality of your Excel workbooks with its help. You can also ask any questions you have about MS Excel and VBA. We are happy to assist you.

Web Scraping Techniques

In this post we are going to discuss a few more techniques used in web scraping. If you are new to web scraping, please read my earlier posts listed below.

In this post I'm going to explain some more advanced techniques.

When we do data mining, sometimes we need to go to a certain website, put a value in a text box and click a "Search" button. The website then gives a list of results, sometimes spread over several pages. If you want to automate this kind of process, you can develop code that goes to that URL, puts the value in the text box, clicks the button and so on. You can learn how to write this kind of code from my earlier post.
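As a quick reminder, the form-filling approach looks something like the sketch below. The element IDs "q" and "searchBtn" are assumptions for illustration; you should replace them with the real IDs or names found in the actual page source.

Dim objIE As Object
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True
objIE.Navigate "url of the search page"

'Wait for the page to load...
Do
    DoEvents
Loop Until objIE.readyState = 4

'Put a value in the text box and click the "Search" button
'("q" and "searchBtn" are assumed IDs - check the page source)...
objIE.document.getElementById("q").Value = "search term"
objIE.document.getElementById("searchBtn").Click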

But sometimes this can be easier than you think, because it may be possible to get the results of each page just by changing the URL. For example, if the website gives results in several pages, the URL of each page often has a part like "?page=1", "?page=2" etc. So you can develop a program that gets data from each page directly. And if you are searching for a value typed into a text box, that value can also be contained in the URL. So it is always better to look carefully at the URL and try to find patterns.

Then sometimes we need to open each result in a separate page to get all of its information. Suppose the website has a hyperlink like "View" for each result. We can collect the URL of each "View" link with the following code.

Dim objIE As Object
Dim Alllinks As Object
Dim Hyperlink As Object
Dim WS As Worksheet
Dim url As String
Dim row As Long

Set objIE = CreateObject("InternetExplorer.Application")

url = "url of particular result page"
objIE.Navigate url

Set WS = ActiveSheet
row = 1

'Wait for the page to load...
Do
    DoEvents
Loop Until objIE.readyState = 4

'Collect the URL of every "View" link...
Set Alllinks = objIE.document.getElementsByTagName("A")
For Each Hyperlink In Alllinks
    If InStr(Hyperlink.innerText, "View") > 0 Then
        WS.Cells(row, 1).Value = Hyperlink.href
        row = row + 1
    End If
Next

So what this does is list the URL of each result in a worksheet. You should also have some idea of how many result pages the website will give. Then you can set a suitable upper limit on the number of pages, and use some technique to exit the loop when there are no more result pages. Below is an example of such a program.

Dim objIE As Object
Dim Alllinks As Object
Dim Hyperlink As Object
Dim WS As Worksheet
Dim i As Long, row As Long
Dim url As String, isdata As String
Dim Res As String

Res = InputBox("Please enter the number you want to search for", "")

Set objIE = CreateObject("InternetExplorer.Application")

objIE.Top = 0
objIE.Left = 0
objIE.Width = 1100
objIE.Height = 700

objIE.Visible = True

Set WS = Worksheets("DataBase")

row = 1
For i = 1 To 100
    'Generate the URL...
    url = "your website url" & "?page=" & CStr(i) & "there may be some additional parts of the url here" & CStr(Res)
    objIE.Navigate url

    'Wait for the page to load...
    Do
        DoEvents
    Loop Until objIE.readyState = 4

    'Collect the links...
    isdata = "n"

    Set Alllinks = objIE.document.getElementsByTagName("A")
    For Each Hyperlink In Alllinks
        If InStr(Hyperlink.innerText, "View") > 0 Then
            WS.Cells(row, 1).Value = Hyperlink.href
            isdata = "y"
            row = row + 1
        End If
    Next

    'No "View" links on this page means there are no more results...
    If isdata = "n" Then Exit For
Next i

objIE.Quit
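Once the "View" URLs are listed in the worksheet, the next step is to visit each one and pull out the details of that result. Here is a minimal sketch of that step, assuming the links are in column A of the "DataBase" sheet; grabbing the whole body text is just a placeholder, and in practice you would use getElementById or getElementsByClassName to pick out the specific fields of the result page.

Dim objIE As Object
Dim WS As Worksheet
Dim r As Long, lastRow As Long

Set objIE = CreateObject("InternetExplorer.Application")
Set WS = Worksheets("DataBase")

'Find the last collected link in column A...
lastRow = WS.Cells(WS.Rows.Count, 1).End(xlUp).Row

For r = 1 To lastRow
    objIE.Navigate WS.Cells(r, 1).Value

    'Wait for the result page to load...
    Do
        DoEvents
    Loop Until objIE.readyState = 4

    'The whole body text is taken here only as an example - normally
    'you would locate the specific elements you need on the page...
    WS.Cells(r, 2).Value = objIE.document.body.innerText
Next r

objIE.Quit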
