Category Archives: Ajax

Experiments with Getting Web Content Using ColdFusion and PHP


So you want to pull content from another web site to use on your own. You could screen scrape a page… but a better approach is to get the info from some sort of web API.

Case in point: calling a URI from the U.S. National Weather Service to get current weather information.

http://w1.weather.gov/xml/current_obs/KMDW.xml

… This URI returns weather data for Chicago’s Midway Airport.

Using ColdFusion

Here is a ColdFusion file for getting info on the current weather in Chicago:

It works fine. I make an Ajax call to “weather.cfm” from my home page, parse out the current temperature and weather description, and display them on my home page. Nice!

 

If I call “weather.cfm” directly, I get the weather data for Chicago back:

Below you can roughly see how it displays this page in Safari:

If you were to view the source, this is approximately what you would see, XML output:

Again, I have a JavaScript routine that calls weather.cfm via AJAX and pulls data from the <weather></weather> and the <temp_f></temp_f> tags.

This was running in ColdFusion Developer on my iMac, and I can access it over my personal Wi-Fi network.

 


Using PHP

I also wanted to do the same sort of thing using PHP. I decided I would use cURL. Here is the code I used in a file called “weather.php”:

Notice that the URL is the same one as I’m using in the ColdFusion example.
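For readers following along, a bare-bones weather.php could be sketched roughly like this. It is not the original listing: the function names `basic_curl_options` and `fetch_url` and the 10-second timeout are assumptions made for illustration. Note that it sets no request headers at all.

```php
<?php
// Hypothetical sketch of a minimal weather.php; not the original file.

// Build the cURL options for a plain GET with no custom headers.
function basic_curl_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT        => 10,   // assumed value, in seconds
    ];
}

// Run the request and return the response body, or throw on a transport error.
function fetch_url(array $options): string
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (makes a live request, so it is left commented out here):
// header('Content-Type: text/xml');
// echo fetch_url(basic_curl_options('http://w1.weather.gov/xml/current_obs/KMDW.xml'));
```

Keeping the options in their own function makes it easy to add headers later without touching the request code.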

 

So what happens if I directly call this “weather.php” page I created in my Safari web browser?

Well, we get a page like the one displayed above. Bummer! This is running on a MAMP PHP server on the same iMac as the ColdFusion server. And trust me, the ColdFusion server is not calling this URL with any special permissions!

This led me to suspect that the HTTP headers sent by the ColdFusion server were different from those sent by the PHP server using cURL.

But how to figure out what ColdFusion is doing differently from PHP? Create a new PHP page and call that instead of the XML file…

 


 

HTTP Header Test Page

I decided to create a PHP page that would look at its own HTTP header values and output them to the page so I could see them!

Here it is:
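The original listing is not reproduced here, so what follows is only a sketch of what such a header-echo page might look like. It reads the headers from `$_SERVER`, where PHP exposes each request header as `HTTP_<NAME>`. The function names `request_headers` and `render_headers` are made up for this example.

```php
<?php
// Hypothetical header-echo page; not the original file.

// Pull the request headers out of a $_SERVER-style array.
// PHP stores each one as HTTP_<NAME>, with dashes turned into underscores.
function request_headers(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        if (strpos($key, 'HTTP_') === 0) {
            // HTTP_USER_AGENT -> User-Agent
            $words = ucwords(strtolower(str_replace('_', ' ', substr($key, 5))));
            $headers[str_replace(' ', '-', $words)] = $value;
        }
    }
    return $headers;
}

// Format the headers one per line, escaped for HTML output.
function render_headers(array $headers): string
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = htmlspecialchars("$name: $value");
    }
    return implode("<br>\n", $lines);
}

echo render_headers(request_headers($_SERVER));
```

Under Apache, `getallheaders()` would give a similar list with less code; `$_SERVER` is used here only because it works under most server setups.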

 

Now, if we changed weather.php to point to this URL, what do we get?

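Note that the ‘1’ at the bottom is an artifact of cURL (unless you set the CURLOPT_RETURNTRANSFER option to true). Without that option, curl_exec() prints the response itself and returns true, so echoing its return value outputs ‘1’.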

What about doing the same thing with weather.cfm ?

There is definitely a difference between the two. Both have the same value for “Host”. Not much else is the same! It could be that the PHP request has HTTP headers that the server does not like… But I’m going with the assumption that the PHP request is MISSING one or more HTTP headers that the web server (w1.weather.gov) is expecting. So let’s modify our weather.php file:

 

Notice above how I added a new block of code (lines 8 through 12). This adds three headers to our HTTP request: ‘User-Agent’, ‘Connection’, and ‘Accept-Encoding’. I saved my changes and refreshed this page.
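As a rough illustration of that kind of change (again, not the original listing), extra request headers can be passed to cURL through `CURLOPT_HTTPHEADER` as a list of “Name: value” strings. The ‘User-Agent’ value matches the one mentioned in this post. The ‘Connection’ and ‘Accept-Encoding’ values are placeholders, and the helper names are assumptions.

```php
<?php
// Sketch of adding request headers with CURLOPT_HTTPHEADER; not the original code.

// Turn ['Name' => 'value'] pairs into the "Name: value" strings cURL expects.
function header_lines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = "$name: $value";
    }
    return $lines;
}

// Build the cURL options for a GET request that sends the given headers.
function curl_options_with_headers(string $url, array $headers): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => header_lines($headers),
    ];
}

$options = curl_options_with_headers(
    'http://w1.weather.gov/xml/current_obs/KMDW.xml',
    [
        'User-Agent'      => 'ColdFusion',
        'Connection'      => 'close',    // placeholder value (assumption)
        'Accept-Encoding' => 'identity', // placeholder value (assumption)
    ]
);

// To send the request (live network call, left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, $options);
// echo curl_exec($ch);
```

Removing one entry from the array at a time is an easy way to repeat the elimination test described next.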

BINGO! IT WORKED!

But is the server looking for all three of these headers?

I remove ‘Accept-Encoding’.  I refresh the browser.  It still works.

I remove ‘Connection’.  I refresh the browser.  It still works.

And (of course) I remove ‘User-Agent’.  I refresh the browser.  And of course it fails.

So, ‘User-Agent’ is the key. Currently, in our example, it is set to the value of ‘ColdFusion’, because that is the value I got when running the ColdFusion page. But actually (of course) it is a PHP page that is requesting the info.

I change the value of ‘User-Agent’ from ‘ColdFusion’ to ‘PHP’. I refresh the browser and it works.

I wonder: is w1.weather.gov looking for specific values for this header, or just checking that the ‘User-Agent’ header is present in the HTTP request?

So, I change the value of ‘User-Agent’ to: ‘SugarBoogers’.  I refresh the page and it works! This means that the server (at least in this case) is just checking to make sure that the ‘User-Agent’ HTTP header is present and has a value… but doesn’t care WHAT the value is (I’m sure that ‘SugarBoogers’ is not a common user agent to check for!).
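Since only ‘User-Agent’ turned out to matter, a slimmer variant could use cURL’s dedicated `CURLOPT_USERAGENT` option instead of a hand-built header list. This is a sketch under that assumption. The function name is hypothetical, and the empty-string check is an extra precaution rather than something tested above.

```php
<?php
// Sketch: send only a User-Agent, using cURL's dedicated option.

// Build the cURL options for a GET request that identifies itself with $userAgent.
function user_agent_only_options(string $url, string $userAgent): array
{
    if (trim($userAgent) === '') {
        // An empty value was not tried in the experiment, so refuse it here.
        throw new InvalidArgumentException('User-Agent must not be empty');
    }
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

// Usage (live network call, left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, user_agent_only_options(
//     'http://w1.weather.gov/xml/current_obs/KMDW.xml',
//     'SugarBoogers'
// ));
// echo curl_exec($ch);
```

In practice, a value that identifies your site or script is kinder to whoever reads the server logs than a made-up one.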

Wrapping It Up

You might be able to “screen scrape” a web page without setting any custom HTTP headers. But I suspect that if you’re calling some sort of XML feed, JSON resource, or web service URI, there’s a good chance that you will need to set the ‘User-Agent’ HTTP header in order to get it to work.

Any comments? Thoughts? Let me know.

Happy Coding!

 

Resources

JavaScript: The Good Parts – A Collection of Lectures by Douglas Crockford on the JavaScript Language


A few years ago, I am not sure just when or how, I stumbled upon a set of videos on Yahoo in which programming legend Douglas Crockford gives very interesting and insightful lectures on the JavaScript language. It has always been a bit tedious to find part X of a lecture, and I kept thinking about creating a page on the web that embeds all these videos in a way that makes watching them easier. To curate this content, as Robert Scoble might say. Well, after all this time, I am doing just that in this post. I think that if you are a programmer and you use JavaScript in any fashion, this information is really useful, both for learning new things and for review.

A Little Update


Well, it’s been a long time since I’ve posted to my blog here. There have been a few storms in my life I’ve had to work through. Things have relatively settled down now and there are various projects that I am working on, some having to do with this site!

One thing of note is that due to certain changes in my life now, I am free to move out of the area I’m in (if I so choose). If I wanted to interview for a new job, would I be prepared? I’ve had a little site, http://orville.chomer.com/, which contained information about me and included my resume to download. When I checked, the site was down! I had been messing around with it months ago and forgot all about it!