Loading
Homepage
  • Mubar BIMubar BI
  • Date:  3 April 2025
  • Blog

Learn How to Scrape Website with Power Query

Learn How to Scrape Website with Power Query

Web scraping is the process of extracting data from websites through automated means, rather than manual copy and pasting. This can be achieved using several tools, with Python being a popular choice.

The choice of web scraping tool can be influenced by the amount of data needing extraction. For large-scale scraping needs, Python libraries like BeautifulSoup and Selenium may be more robust options. But, Power Query is well-suited for limited, focused scraping tasks.

When to Use Power Query to Scraping

  • Extracting tables or datasets over 1,000 rows from websites
  • Occasional scraping needs, such as monthly/quarterly data updates
  • Scraping tasks focused on a multiple site or section of a site
  • Augmenting existing Excel or Power BI reporting and dashboards

For large-scale web scraping needs, Python libraries may be more appropriate. But for focused scraping tasks to supplement Excel or Power BI work, Power Query offers a quick and easy data extraction option.

How to Scrape a Website with Power Query

In this tutorial guide, we will be extracting a list of countries and their population size from a website. the link to the website.

Step 1: Go to your browser and search for “list of countries in the world

Step 2: Click on “List on countries in the world in alphabetical order

Step 3: Copy the URL link from the URL bar at the top of your browser. We are extracting the list of countries and population.

NB: When scraping/extracting data from website with Power Query, it can only extract data in tabular format.

Step 4: Open Power BI, then Get Data from Web

Step 5: Paste the URL link in the URL box

Step 6: A navigation box will pop up, asking for access credentials to access the website you’re about to scrape.

  • Anonymous: This option gives you free access without the need for credentials to access a website.
  • Windows: This option requests current window user authentication to access the website. Use my Current credentials: If the website is set to request Windows administrative permission,
    • you will need to provide the credentials to be able to access the website.
    • Use Alternate credentials: This means you want to access the website with another user’s credentials.
  • Basic: This option is for a website that requires you to login to your existing account (you have an existing account created/sign up for), e.g. login to your X account.
  • Web API: this option requires you to provide your API key to access the website. (API Key is a unique hidden hash code generated using your username and password when you create an account on a platform). This will allow you to access the information on the website.
  • Organization account: this is similar to Basic option of accessing the website. You will see a button icon showing “sign in” which indicate you should sign in to your organization account to be able to access the website.

NB: If you’re signed in to your organization account, it will show “signed in as [yourname]

In this aspect, you will select the “Anonymous” option since we are scraping a public website and click on Connect

Step 7: Power Query will check the credentials authentication to establish a connection to the website. If the connection was successful, you will another navigation box pop out.

In this navigation box, Power Query will scan through the website and return all possible tabular records it can find and return them in the navigation box. Here, it can only find one tabular record (table)

You can click on Table 1 to preview the data. The HTML Code and Displayed Text contains the HTLM script used to develop the website (this is useful when scraping specific keywords or labels on a website), and the Displayed Text contains all the text on the website.

A Web View feature is beside the Table View (Default view setting when importing data into Power Query) to validate the record.

Once validated, you can load the data to Power BI, if more transformation needs to be done, you can load it to Power Query.

Web scraping with Power Query offers a fast and effective way to extract data from websites for data analysis projects. Understanding how to handle pagination, extract data, and combine it across multiple pages is the key to unlocking valuable insights. I encourage you to explore these powerful techniques and make them an essential part of your data analysis toolkit.


Get Started with Power BI: https://selar.com/4m7j50

If you’re looking to become a Power BI Developer with mentor-guided training, reach out here: Become a Power BI Developer

Posted in Blog, Power BI, Power QueryTags:
Previous Next

Write a comment

© Mubar Dauda 2024. All rights reserved.

Developed by: Mubar BI