Filed Under (Technology) by Thejesh GN on 09-03-2009

As you may know, I had earlier parsed Aamir Khan’s Blog to create a feed, using custom screen-scraping code. Today, after reading Anand’s blog, I did the same using YQL and Pipes. Using YQL and Pipes is much easier than writing custom code, and less error-prone.

If you have subscribed to http://feeds.thejeshgn.com/aamirkhan, you don’t have to worry. The feed URL remains the same; only the technology behind it has changed, and for the better. If you have not subscribed, I guess it’s a good idea to subscribe.

The rest of this post is for fellow hackers. I have tried to describe in detail the process I followed and the technologies I used.

YQL (Yahoo Query Language) can be used to query the web for data. YQL exposes a SQL-like SELECT syntax, which we are all very familiar with. To get the links to the posts on Aamir’s blog I used:

select * from  html where url="http://74.55.20.11/blog/login.php" and
xpath="//a[contains(@href,'/blog/login.php?topicid=')]"

This goes to the home page of Aamir’s blog and gets the links to all the recent posts listed in the sidebar.
To test it, go to the YQL console and run the above query. YQL gives you both XML and JSON output. It also gives you a RESTful URL to use in your own application.
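
To see what the XPath predicate in that query selects, here is a minimal Python sketch that applies the same "href contains /blog/login.php?topicid=" test to a page locally, using only the standard library. It is not how YQL works internally; the class name, the function name and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class TopicLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose href contains `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Same idea as XPath contains(@href, '...'): a plain substring test.
        if self.needle in href:
            self.links.append(href)


def topic_links(html_text, needle="/blog/login.php?topicid="):
    """Return matching hrefs in document order."""
    parser = TopicLinkParser(needle)
    parser.feed(html_text)
    return parser.links


# Made-up sidebar markup, for illustration only.
sample = (
    '<a href="/blog/login.php?topicid=7">Post seven</a>'
    '<a href="/about.php">About</a>'
    '<a href="/blog/login.php?topicid=8">Post eight</a>'
)
print(topic_links(sample))
```

Only the two post links survive the filter; the unrelated "About" link is dropped, which is what the query relies on to ignore the rest of the page's navigation.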

But there was a problem with this approach: it returned all the URLs except that of the latest post. On his blog, Aamir lists all the posts except the one you are currently on, so the home page has no link to the latest post. That makes sense for web readers, but not for me. So I queried the page of the 21st post instead, collected the links, and truncated the results to the first 20 URLs (the 20 latest posts are more than enough for any feed).

select * from  html where
url="http://74.55.20.11/blog/login.php?topicid=21url"
and xpath="//a[contains(@href,'/blog/login.php?topicid=')]"
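
The reasoning behind querying an older post's page can be modelled in a few lines of Python. This is only a toy model under stated assumptions: the topic ids are hypothetical, the home page is assumed to display the latest post, and "21 url" is read as the page of the 21st newest post.

```python
def sidebar_links(all_topic_ids, current_id):
    """Toy model of the sidebar: every post is linked except the one being viewed."""
    return [
        f"/blog/login.php?topicid={tid}"
        for tid in all_topic_ids
        if tid != current_id
    ]


# Hypothetical topic ids, newest first; 30 stands in for the latest post.
topic_ids = list(range(30, 0, -1))

# The home page shows the latest post, so its sidebar omits that post.
home = sidebar_links(topic_ids, current_id=30)

# The page of the 21st newest post (id 10 here) links to everything else.
older = sidebar_links(topic_ids, current_id=topic_ids[20])

# Truncate to the first 20 links: exactly the 20 latest posts.
latest_20 = older[:20]

print(home[0])        # newest link visible from the home page
print(latest_20[0])   # newest link visible from the older page
print(len(latest_20))
```

Because the page being viewed is the 21st newest, the post it leaves out falls just outside the 20 that are kept, so the truncated list has no gap.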

The most beautiful thing about using Pipes is that YQL is built into it. So I can send the result of a module into YQL and vice versa. This makes YQL and Pipes a deadly combination.

To get the content, I looped through the list of URLs and used the get page module. I am now taking the data between the first <p class="body"> and the first <br>. Yes, they use <br> for paragraphs. I don’t want to steal readers from his blog, hence I am taking only the first paragraph.
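
The "between the first marker and the first <br>" rule can be sketched in Python as below. This is an illustration of the cut, not the Pipes module itself; the function name and the sample page are hypothetical, and it assumes the <br> of interest is the first one after the opening marker.

```python
def first_paragraph(page_html, start_marker='<p class="body">', end_marker="<br>"):
    """Return the text between the first start_marker and the next end_marker."""
    start = page_html.find(start_marker)
    if start == -1:
        return None  # the page has no body paragraph at all
    start += len(start_marker)
    end = page_html.find(end_marker, start)
    if end == -1:
        end = len(page_html)  # no <br> found: keep everything after the marker
    return page_html[start:end].strip()


# Made-up page fragment, for illustration only.
sample_page = '<div><p class="body">First paragraph.<br>Second paragraph.<br></p></div>'
print(first_paragraph(sample_page))
```

Everything after the first <br> is discarded, so each feed item carries only the opening paragraph of the post.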

You can clone the pipe that I have created to experiment with it.

To do:
1. Get the date info, probably from the text between the spans

<span class="graybold">Oct,09,2007</span>

and parse it into a date object.

2. Fix bugs, if there are any. Let me know if you find any.
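
For the first to-do item, one possible way to turn the span text into a date object is sketched below in Python. It assumes every date follows the "Oct,09,2007" pattern shown above and that month abbreviations are English; the function name is made up, and a regular expression is used here only because the target markup is a single fixed span.

```python
import re
from datetime import datetime

# Matches the span shown above and captures its text.
DATE_SPAN = re.compile(r'<span class="graybold">([^<]+)</span>')


def parse_post_date(page_html):
    """Return the first graybold span as a date, or None if there is none."""
    match = DATE_SPAN.search(page_html)
    if match is None:
        return None
    # %b is the abbreviated month name; it depends on the locale,
    # so this assumes an English (or C) locale.
    return datetime.strptime(match.group(1).strip(), "%b,%d,%Y").date()


print(parse_post_date('<span class="graybold">Oct,09,2007</span>'))
```

If the blog ever uses a different date layout, strptime raises ValueError, which would be a reasonable place to fall back to leaving the date out of the feed item.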



Comments:
11 Comments posted on "RSS Feed for Aamir Khan’s blog using YQL and Pipes"
thej on March 9th, 2009 at 8:19 PM

Detailed post on How I created Aamir Khan’s blog feed using Yahoo YQL and Pipes http://bitly.com/7Fqez #yql #pipes @yahoo have a look

This comment was originally posted on Twitter


Prasoon on March 9th, 2009 at 8:28 PM

Suddenly today I saw 50 new updates on the feed I had subscribed long back but it had only titles and all that changed again a little while ago I saw text below those feeds – loved it all then.

Great work thej.


Thejesh GN on March 9th, 2009 at 8:35 PM

@Prasoon : Thanks. Now you can see the latest post too :)


sandeep on March 10th, 2009 at 1:33 PM

I was under the impression aamir has a feed @ that “http://feeds2.feedburner.com/aamirkhan” and was using it for my blogroll for a while now. Now, is that one u created?


Thejesh GN on March 10th, 2009 at 5:37 PM

@sandeep : Yup. Its created by me :)


S Anand on March 11th, 2009 at 7:53 PM

MiaD on March 11th, 2009 at 10:32 PM

Liked “Thejesh GN » RSS Feed for Aamir Khan’s blog using YQL and Pipes” http://ff.im/-1seXW

This comment was originally posted on Twitter


» links for 2009-03-11 Thej Live on March 12th, 2009 at 10:43 AM

[...] Thejesh GN » RSS Feed for Aamir Khan’s blog using YQL and Pipes As you know earlier I had parsed Aamir Khan’s Blog to create a feed. It was custom screen scraping code to generate the feed.Today, after reading Anand’s blog, I did the same using YQL and Pipes. Using YQL/PIPE is much easier than writing custom code and is less buggy. (tags: aamirkhan) [...]


thej on May 11th, 2009 at 2:35 PM

The latest post of aamirkhans blog showed up on my feedreader. That shows my yahoo pipes is still working http://bit.ly/5JFYs

This comment was originally posted on Twitter


Thejesh GN » Feed for Aamir Khan’s Blog on September 23rd, 2009 at 1:53 PM

[...] MAr 09 2009 : Updated to use YQL, Read RSS Feed for Aamir Khan’s blog using YQL and Pipes for more details. Its much better now. You don’t have to worry about subscribing again. Just [...]


Vishal on December 10th, 2009 at 11:03 PM

Thanks. This is useful.

