Scraping In The Name Of!

Plan & Setup

As we've stated in the past, the most important part of any software project is planning the steps needed to complete the task. For this project, there are three:

  1. Devise some technique to extract information from weather.com
  2. Interpret the data and isolate anything important for display
  3. Display the information in a single interface
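
The three steps above map naturally onto three small functions. What follows is only a rough sketch, assuming PHP 7.1 or later: the function names are made up for illustration, and the `<span class="temp">` markup in the sample is a hypothetical stand-in, not weather.com's actual page structure.

```php
<?php
// Step 1: fetch the raw HTML of a page.
// Assumes allow_url_fopen is enabled; returns an empty string on failure.
function fetchPage(string $url): string
{
    $html = @file_get_contents($url);
    return $html === false ? '' : $html;
}

// Step 2: isolate the value we care about.
// NOTE: this pattern matches a made-up <span class="temp"> element.
// The real page will need its own pattern.
function extractTemperature(string $html): ?int
{
    if (preg_match('/<span class="temp">(-?\d+)/', $html, $matches)) {
        return (int) $matches[1];
    }
    return null;
}

// Step 3: format one location for display.
function renderRow(string $city, ?int $temp): string
{
    $value = $temp === null ? 'n/a' : $temp . ' F';
    return htmlspecialchars($city) . ': ' . $value;
}

// Try steps 2 and 3 on a hard-coded sample, so no network access is needed.
$sample = '<div><span class="temp">41</span></div>';
echo renderRow('Silverthorne, CO', extractTemperature($sample)), "\n";
```

Keeping the fetch, extract, and display steps separate means that if the page markup changes, only the extraction function needs to be touched.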

We'll need to determine the URLs of the pages from which we wish to extract data. For our example, we will use the mountain town of Silverthorne, Colorado, and the city of Denver. At the time of writing, the temperature difference between the two is 38 degrees Fahrenheit, making them good candidates for our project.

  1. Head over to weather.com, type in the ZIP code or city name for your first location, and save the URLs for any pages from which you want data.
    1. Silverthorne, CO
      1. Current Conditions URL: http://www.weather.com/weather/right-now/Silverthorne+CO+USCO0357
      2. Today's Forecast URL: http://www.weather.com/weather/today/Silverthorne+CO+USCO0357:1:US
  2. Type in the ZIP code or city name for your second location and save the URLs for any pages from which you want data.
    1. Denver, CO
      1. Current Conditions URL: http://www.weather.com/weather/right-now/Denver+CO+USCO0105
      2. Today's Forecast URL: http://www.weather.com/weather/today/Denver+CO+USCO0105:1:US
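
One convenient way to keep these URLs together is a nested PHP array keyed by location and page type. This is a minimal sketch: the URLs are the ones saved above, while the variable name `$locations` and the keys `current` and `forecast` are arbitrary choices made for illustration.

```php
<?php
// Pages to scrape, keyed by location and then by page type.
// The array layout and key names are illustrative, not required.
$locations = [
    'Silverthorne, CO' => [
        'current'  => 'http://www.weather.com/weather/right-now/Silverthorne+CO+USCO0357',
        'forecast' => 'http://www.weather.com/weather/today/Silverthorne+CO+USCO0357:1:US',
    ],
    'Denver, CO' => [
        'current'  => 'http://www.weather.com/weather/right-now/Denver+CO+USCO0105',
        'forecast' => 'http://www.weather.com/weather/today/Denver+CO+USCO0105:1:US',
    ],
];

// List every saved URL to confirm the structure is what we expect.
foreach ($locations as $city => $pages) {
    foreach ($pages as $type => $url) {
        echo $city, ' (', $type, '): ', $url, "\n";
    }
}
```

With this layout, adding a third location later only means adding one more entry to the array.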

Now that we have our URLs, we need a way to extract the data we want from each page using PHP. If you do not have access to a web server that can run PHP, you can either build your own or use XAMPP Portable Lite. To build your own, follow last week's article, Creating A Web Server In Linux. Otherwise, download XAMPP Portable Lite from http://www.apachefriends.org/en/xampp-windows.html#646 and follow the installation instructions provided there. There are many articles on installing and using XAMPP if you need assistance. Google is your friend!
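
Whichever server you use, it can help to confirm that PHP is running and is able to fetch remote pages before going further. The short check below is a sketch, not a required step. It assumes you will fetch pages either through PHP's URL-aware file functions (which need `allow_url_fopen`) or through the cURL extension, and the file name `check.php` is just a suggestion.

```php
<?php
// check.php: a quick sanity check of the PHP environment.
echo 'PHP version: ', PHP_VERSION, "\n";

// allow_url_fopen lets functions such as file_get_contents() open http:// URLs.
$canUseFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

// The cURL extension is an alternative way to request remote pages.
$canUseCurl = function_exists('curl_init');

echo 'allow_url_fopen: ', $canUseFopen ? 'on' : 'off', "\n";
echo 'cURL extension: ', $canUseCurl ? 'available' : 'missing', "\n";
```

If both report as unavailable, one of them will need to be enabled in your PHP configuration before the pages can be fetched.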


