RSS Feed for Aamir Khan’s blog using YQL and Pipes

by Thejesh GN · March 9, 2009

As you know, I had earlier parsed Aamir Khan’s blog to create a feed, using custom screen-scraping code. Today, after reading Anand’s blog, I did the same using YQL and Pipes. Using YQL and Pipes is much easier than writing custom code, and it is less buggy.

If you have subscribed to http://feeds.thejeshgn.com/aamirkhan, you don’t have to worry. The feed URL remains the same; only the technology behind it has changed, and it is better now. If you have not subscribed, I guess it’s a good idea to subscribe.

The rest of this post is for fellow hackers. I have tried to describe in detail the process I followed and the technologies I used.
YQL (Yahoo Query Language) can be used to query the web for data. YQL exposes a SQL-like SELECT syntax, with which we are all very familiar. To get the links for the posts from Aamir’s blog I used:

select * from  html where url="http://74.55.20.11/blog/login.php" and 
xpath="//a[contains(@href,'/blog/login.php?topicid=')]"
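If you want to see what that XPath picks out without the YQL console, here is a rough Python sketch of the same filter using only the standard library. The sample markup and the names `PostLinkCollector` and `extract_post_links` are made up for illustration; this is not what YQL runs internally.

```python
from html.parser import HTMLParser

# Substring the XPath tests for with contains(@href, ...).
POST_LINK_MARKER = "/blog/login.php?topicid="


class PostLinkCollector(HTMLParser):
    """Collects (href, text) for every <a> whose href contains the marker."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the matching <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if POST_LINK_MARKER in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_post_links(html):
    """Return the (href, text) pairs for all post links in the given HTML."""
    collector = PostLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up sidebar markup, only for illustration.
sample = """
<ul>
  <li><a href="/blog/login.php?topicid=7">Seventh post</a></li>
  <li><a href="/blog/about.php">About</a></li>
  <li><a href="/blog/login.php?topicid=6">Sixth post</a></li>
</ul>
"""
post_links = extract_post_links(sample)
print(post_links)
```

The "About" link is skipped because its href does not contain the marker, which is the same test the XPath predicate applies.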

That goes to the home page of Aamir’s blog and gets the links to all the recent posts listed in the sidebar.
To test it, go to the YQL console and run the above query. YQL gives you the result as both XML and JSON. It also gives you a RESTful URL that you can use in your own application.
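As a rough idea of what such a RESTful URL looked like, here is a small Python sketch that builds one. The endpoint and the `q` and `format` parameter names are my assumption of the public YQL endpoint, not something taken from the console output above, and Yahoo has since retired YQL, so treat this purely as an illustration. `build_yql_url` is a made-up helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed public YQL endpoint; not stated in this post and no longer in service.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(query, fmt="json"):
    """Return a GET URL that carries the YQL statement in the q parameter."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


query = (
    'select * from html where url="http://74.55.20.11/blog/login.php" '
    "and xpath=\"//a[contains(@href,'/blog/login.php?topicid=')]\""
)
url = build_yql_url(query)
print(url)

# Decoding the query string gives back the original statement unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == query)
```

The only real work here is URL-encoding the statement, since it contains quotes, brackets, and a nested URL.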

But there was a problem with this approach: it got all the URLs except that of the latest post. On his blog, Aamir lists all the posts except the one you are currently on, and since the home page shows the latest post, it has no link to it. That makes sense for web readers, but not for me. So I went to the 21st URL, got the links from there, and then truncated the results to the first 20 URLs (the 20 latest posts are more than enough for any feed).

select * from  html where 
url="http://74.55.20.11/blog/login.php?topicid=21url"
and xpath="//a[contains(@href,'/blog/login.php?topicid=')]"
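To see why querying an older post and truncating works, here is a tiny Python model of the sidebar. The topic ids and the `sidebar_links` helper are hypothetical; the only behaviour taken from the blog is that a page lists every post except the one being viewed.

```python
def sidebar_links(all_topic_ids, current_id):
    """Model of the sidebar: every post except the one being viewed."""
    return [topic for topic in all_topic_ids if topic != current_id]


# Hypothetical topic ids, newest first: 30, 29, ..., 1.
topic_ids = list(range(30, 0, -1))

# The home page shows the newest post, so the sidebar leaves it out.
from_home = sidebar_links(topic_ids, current_id=topic_ids[0])

# The page of the 21st post leaves out only the 21st post.
from_21st = sidebar_links(topic_ids, current_id=topic_ids[20])

# Truncating to 20 drops everything from the gap onwards.
latest_20 = from_21st[:20]

print(topic_ids[0] in from_home)      # the newest post is missing here
print(latest_20 == topic_ids[:20])    # but the 20 newest are all present here
```

The missing entry sits just past the cut-off, so the truncated list has no hole in it.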

The most beautiful thing about using Pipes is that YQL is built into it. So I can send the result of a module into YQL and vice versa. This makes YQL and Pipes a deadly combination.

To get the content, I looped through the list of URLs and used the Fetch Page module. For now I take the data between the first <p class="body"> and the first <br>. Yeah, they use <br> for paragraphs. I don’t want to steal readers from his blog, and hence I am getting only the first paragraph.
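Outside Pipes, the same cut can be sketched in a few lines of Python. This is only an approximation of what the module does with its two delimiters; the sample markup and the name `first_paragraph` are made up, and a regular expression like this is fine for one known page layout but not for HTML in general.

```python
import re

# Everything between the first <p class="body"> and the first <br> after it.
_FIRST_PARAGRAPH = re.compile(
    r'<p\s+class="body"\s*>(.*?)<br\s*/?>',
    re.IGNORECASE | re.DOTALL,
)


def first_paragraph(html):
    """Return the text of the first paragraph, or None if the markers are absent."""
    match = _FIRST_PARAGRAPH.search(html)
    return match.group(1).strip() if match else None


# Made-up post markup, only for illustration.
sample_post = (
    '<div><p class="body">Hi everyone, this is the opening paragraph.<br>'
    "This is the second paragraph.<br></p></div>"
)
excerpt = first_paragraph(sample_post)
print(excerpt)
```

Returning None when either marker is missing makes it easy to skip a post whose layout has changed instead of publishing a broken item.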

You can clone the pipe that I have created to experiment with it.

To do:
1. Get the date info, probably the text between spans like

<span class="graybold">Oct,09,2007</span>

and parse it into a date object.

2. Fix the bugs, if there are any. Let me know if you find any.
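For the first to-do item, the date parsing could look roughly like this in Python. It assumes every post has a span in exactly the format shown above and that month names are English abbreviations (Python's default C locale); `parse_post_date` is a made-up name, and this is not part of the pipe.

```python
import re
from datetime import datetime

# Text inside the first <span class="graybold">...</span>.
_DATE_SPAN = re.compile(r'<span class="graybold">\s*([^<]+?)\s*</span>')


def parse_post_date(html):
    """Return the post date as a datetime.date, or None if no date span is found."""
    match = _DATE_SPAN.search(html)
    if match is None:
        return None
    # "Oct,09,2007" -> abbreviated month, day, four-digit year, comma separated.
    return datetime.strptime(match.group(1), "%b,%d,%Y").date()


post_date = parse_post_date('<span class="graybold">Oct,09,2007</span>')
print(post_date.isoformat())
```

`strptime` raises a ValueError if the span holds something that is not a date in this format, which is probably what you want while the feed is still being debugged.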

Tags: Aamir, Blogging, Hack, Web, Yahoo

Prasoon says:

March 9, 2009 at 8:28 PM

Suddenly today I saw 50 new updates on the feed I had subscribed to long back, but it had only titles. All that changed again a little while ago, when I saw text below those titles. Loved it all then.

Great work thej.

Thejesh GN says:

March 9, 2009 at 8:35 PM

@Prasoon : Thanks. Now you can see the latest post too :)

sandeep says:

March 10, 2009 at 1:33 PM

I was under the impression that Aamir has a feed at “http://feeds2.feedburner.com/aamirkhan” and have been using it for my blogroll for a while now. Now, is that the one you created?

Thejesh GN says:

March 10, 2009 at 5:37 PM

@sandeep : Yup. It’s created by me :)

S Anand says:

March 11, 2009 at 7:53 PM

Neat!

I’d been using a pure XPath solution that returns just the titles. It had the ghastly URL http://www.s-anand.net/xpath?url=http%3A%2F%2F202.87.41.148%2Fdigital%2FAamirKhan%2Flogin.php%3Ftopicid%3D1&xpath=//a%5Bcontains(@href,%22login.php?topicid=%22)%5D%5Bnot(contains(@href,%22page=%22))][string-length(.)%3E2]%20title-%3E.%20link-%3E./@href

Look forward to moving to yours :-)

Vishal says:

December 10, 2009 at 11:03 PM

Thanks. This is useful.


