Loading
Homepage

Learn How to Scrape Website with Power Query

Web scraping is the process of extracting data from websites through automated means, rather than manual copy and pasting. This can be achieved using several tools, with Python being a popular choice.

.

The choice of web scraping tool can be influenced by the amount of data needing extraction. For large-scale scraping needs, Python libraries like BeautifulSoup and Selenium may be more robust options. But, Power Query is well-suited for limited, focused scraping tasks.

.

When to Use Power Query to Scraping

  • Extracting tables or datasets over 1,000 rows from websites
  • Occasional scraping needs, such as monthly/quarterly data updates
  • Scraping tasks focused on a multiple site or section of a site
  • Augmenting existing Excel or Power BI reporting and dashboards

.

For large-scale web scraping needs, Python libraries may be more appropriate. But for focused scraping tasks to supplement Excel or Power BI work, Power Query offers a quick and easy data extraction option.

.

How to Scrape a Website with Power Query

In this tutorial guide, we will be extracting a list of countries and their population size from a website. the link to the website.

.

Step 1: Go to your browser and search for “list of countries in the world

.

.

Step 2: Click on “List on countries in the world in alphabetical order

.

.

Step 3: Copy the URL link from the URL bar at the top of your browser. We are extracting the list of countries and population.

.

NB: When scraping/extracting data from website with Power Query, it can only extract data in tabular format.

.

.

Step 4: Open Power BI, then Get Data from Web

.

.

Step 5: Paste the URL link in the URL box

.

.

Step 6: A navigation box will pop up, asking for access credentials to access the website you’re about to scrape.

.

  • Anonymous: This option gives you free access without the need for credentials to access a website.
  • Windows: This option requests current window user authentication to access the website. Use my Current credentials: If the website is set to request Windows administrative permission,
    • you will need to provide the credentials to be able to access the website.
    • Use Alternate credentials: This means you want to access the website with another user's credentials.
  • Basic: This option is for a website that requires you to login to your existing account (you have an existing account created/sign up for), e.g. login to your X account.
  • Web API: this option requires you to provide your API key to access the website. (API Key is a unique hidden hash code generated using your username and password when you create an account on a platform). This will allow you to access the information on the website.
  • Organization account: this is similar to Basic option of accessing the website. You will see a button icon showing “sign in” which indicate you should sign in to your organization account to be able to access the website.

.

NB: If you’re signed in to your organization account, it will show “signed in as [yourname]

.

In this aspect, you will select the “Anonymous” option since we are scraping a public website and click on Connect

.

Step 7: Power Query will check the credentials authentication to establish a connection to the website. If the connection was successful, you will another navigation box pop out.

.

.

In this navigation box, Power Query will scan through the website and return all possible tabular records it can find and return them in the navigation box. Here, it can only find one tabular record (table)

.

You can click on Table 1 to preview the data. The HTML Code and Displayed Text contains the HTLM script used to develop the website (this is useful when scraping specific keywords or labels on a website), and the Displayed Text contains all the text on the website.

.

A Web View feature is beside the Table View (Default view setting when importing data into Power Query) to validate the record.

.

.

Once validated, you can load the data to Power BI, if more transformation needs to be done, you can load it to Power Query.

.

.

Web scraping with Power Query offers a fast and effective way to extract data from websites for data analysis projects. Understanding how to handle pagination, extract data, and combine it across multiple pages is the key to unlocking valuable insights. I encourage you to explore these powerful techniques and make them an essential part of your data analysis toolkit.

.


.

Get Started with Power BI: https://selar.com/4m7j50

.

Want more content like this? Follow me on LinkedIn and Twitter, or subscribe to my newsletter for weekly insights on building a profitable BI career. I help people escape low-paying, time-consuming jobs and break into the data and BI world with more income and time freedom. Learn more here

Share:

More Posts

Send Us A Message

Write a comment

Picture of Mubar Dauda

Mubar Dauda

I'm Mubar Dauda, a Data/BI Analyst and Process Improvement Consultant with over 4 years of experience turning messy data into business solutions. I'm the founder of Mubar BI Analytics, where I help people escape low-paying, time-consuming jobs and break into the data and BI world with more income and time freedom.

I work as a consultant, mentor, speaker, and creator—specializing in Power BI, data analytics, and automation. My mission is simple: help you monetize your data skills faster by sharing what actually works, not just theory. I've written dozens of technical articles, spoken at data conferences, and mentored over 3500+ aspiring analysts who've gone on to land high-paying roles and freelance gigs.

I find deep fulfillment in sharing knowledge, and when I'm not working with data or mentoring, I'm practicing karate or playing snooker.

Let's connect on LinkedIn, Twitter, or YouTube. You can also find my technical articles on Medium and here on mubardauda.com.

© Mubar Dauda 2024. All rights reserved.

Developed by: Mubar BI