Jalaj P. Jha Technical & Miscellaneous Ramblings

14 Jun 2010 (0 comments)




Yahoo Pipes: Build RSS for non-RSS Sites

RSS has made browsing the internet so much simpler. For each site or section that you wish to visit regularly, you just need to get the RSS feed URL and subscribe to it in a feed reader such as Google Reader. After that, keeping up with all the sites and sections you subscribed to is like reading a single page with all the new content. What more could you wish for? Maybe that all sites had RSS!

There are a number of sites that are highly popular but were built long ago in static HTML, or with WYSIWYG editors such as Dreamweaver that use templates to give every page the same look. Unfortunately they remain that way, while you are eager to read them at your convenience in a feed reader. Or worse, you actually own such a site and realize that it is missing RSS, but don’t know how to go about adding it. The answer lies in somehow building the RSS feed. If you are a site visitor, the feed lets you read the site at your convenience. If you are the owner, you can save the generated RSS files and link them from individual pages or, if you can afford the risk of rebuilding the site on a CMS or blog platform such as WordPress, just import all the RSS files.

The question is how to build the RSS feed, and the answer could be Yahoo Pipes. The text ahead assumes that you are at least able to follow the instructions given here. Our example is comedy-zone.net, a site that lists thousands of jokes. Let’s start with Blonde Jokes at http://www.comedy-zone.net/jokes/laugh/blondes/index.htm. This page lists all the jokes by title, with each title linking to an individual page that contains the full joke. To prepare the RSS feed we will first fetch the list page and then fetch each individual page to add the entire joke to the description.

From the “User inputs” section, add a “URL Input” module. We will pass the URL of the list page through this. Since all such list pages are expected to be based on the same template, a single pipe is sufficient to get feeds for different categories of jokes. Put the Blonde Jokes list page as the default URL. Use a “Fetch Page” module to get the page. If you go through the source code of this list page, you will find that the entire list sits between <td colspan="3" bgcolor="#660000"> and </table>, and that you can split it into individual jokes by using <td class="menuOption"> as the delimiter. Use these as the parameters of the Fetch Page module.
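
For readers who think better in code, the cut-and-split step can be sketched roughly in Python. This only approximates what Fetch Page does and is not its actual implementation; the function name cut_and_split and the sample HTML are made up for illustration, and only the three marker strings come from the list page.

```python
def cut_and_split(html, cut_from, cut_to, delimiter):
    """Keep the text between cut_from and cut_to, then split it on delimiter."""
    start = html.find(cut_from)
    if start == -1:
        return []                      # start marker not found: nothing to return
    start += len(cut_from)
    end = html.find(cut_to, start)
    if end == -1:
        end = len(html)                # no end marker: keep everything to the end
    return html[start:end].split(delimiter)


# Hypothetical, heavily simplified stand-in for the list page source.
sample = (
    '<table><tr><td colspan="3" bgcolor="#660000">Blonde Jokes</td></tr>'
    '<tr><td class="menuOption"><a href="http://example.com/a.htm">A</a></td>'
    '<td class="menuOption"><a href="http://example.com/b.htm">B</a></td></tr>'
    '</table>'
)

items = cut_and_split(
    sample,
    '<td colspan="3" bgcolor="#660000">',
    '</table>',
    '<td class="menuOption">',
)
```

In this sample the first item is the heading text with no link in it, just like the first entry on the real list page.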

The result is that the list is broken into multiple items, each containing the URL and title of a joke plus some other text we are not interested in. We will use a Regex module to clean each item so that only the link remains, which we will then use to fetch the full text of the joke. The first entry does not contain any link (check it yourself); to filter it out, use a Filter module that permits only items starting with http://.
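
The Regex and Filter steps might look like this in Python. It is a sketch under assumptions: the pattern below is mine, not the one used in the pipe, and extract_links is a hypothetical helper name.

```python
import re

# Assumed pattern: grab whatever sits inside the first href="..." of an item.
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def extract_links(items):
    """Reduce each item to its link and keep only items that start with http://."""
    links = []
    for item in items:
        match = HREF.search(item)
        # Regex step: replace the item content with the bare link when there is one.
        content = match.group(1) if match else item.strip()
        # Filter step: permit only items that now start with http://.
        if content.startswith("http://"):
            links.append(content)
    return links


links = extract_links([
    "Blonde Jokes</td></tr><tr>",                        # first entry, no link
    '<a href="http://example.com/a.htm">A</a></td>',
    '<a href="http://example.com/b.htm">B</a></td></tr>',
])
```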

Now we are going to fetch each joke page, but before that you may want to limit the number of items returned in the RSS feed to, say, 10. It is better to apply the limit at this point, because fetching a page and discarding it later would be a waste.

Get the number as a parameter through a “Number Input” module and pipe it to the Truncate module, which will truncate the feed we are preparing.

Now add a Loop module with a Fetch Page module inside it. Fetch each page from the URL given in item.content and limit the scraped text to what lies between <td width="95%" colspan="2" bgcolor="#FFEA97"> and <form. Assign all the results to item.loop:fetchpage.
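
To see why the truncation belongs before the loop, here is a rough Python equivalent. The names fetch_full_jokes and fake_fetch are hypothetical, and the fetch function is a stand-in so that the sketch runs without touching the network; a real version could call urllib.request.urlopen instead.

```python
def fetch_full_jokes(links, limit, fetch, cut_from, cut_to):
    """Truncate the link list first, then fetch and trim each remaining page."""
    items = []
    for link in links[:limit]:         # Truncate: pages past the limit are never fetched
        page = fetch(link)             # Loop + Fetch Page: one request per item
        start = page.find(cut_from)
        start = 0 if start == -1 else start + len(cut_from)
        end = page.find(cut_to, start)
        end = len(page) if end == -1 else end
        items.append({"content": link, "loop:fetchpage": page[start:end]})
    return items


fetched = []                           # records every URL the stand-in was asked for


def fake_fetch(url):
    """Hypothetical offline replacement for an HTTP request."""
    fetched.append(url)
    return ('<td width="95%" colspan="2" bgcolor="#FFEA97">'
            'Joke text for ' + url + '<form action="x">')


links = ["http://example.com/%d.htm" % n for n in range(25)]
items = fetch_full_jokes(
    links, 10, fake_fetch,
    '<td width="95%" colspan="2" bgcolor="#FFEA97">', '<form',
)
```

With 25 links and a limit of 10, only 10 pages are requested; truncating after the loop would have fetched all 25.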

Each of the pages that we have fetched contains the title as well as the description that we want to add to the RSS feed, and we already have the links from the list page. Now all we need to do is remove the line feed and carriage return characters, as they sometimes give unexpected results. We do this by replacing each of them with a space and then replacing every run of more than one consecutive space with a single space. Once this is done, we use the Rename module to prepare the Title, Link and Description fields. A Regex module added after it cleans the unwanted text from each field (the description from the title and vice versa). All of this results in the final output.
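
The two whitespace replacements are easy to get wrong, so here they are as a small Python sketch. The function name normalize_whitespace is hypothetical, and the patterns are my reading of the description above rather than a copy of the rules in the pipe.

```python
import re


def normalize_whitespace(text):
    """Turn CR/LF into spaces, then squeeze runs of spaces into one."""
    text = re.sub(r"[\r\n]", " ", text)    # every carriage return or line feed becomes a space
    return re.sub(r" {2,}", " ", text)     # two or more spaces in a row become one


cleaned = normalize_whitespace("A blonde\r\nwalks   into\n a bar")
```

The order matters: collapsing spaces first would leave the doubled spaces that the line-break replacement creates.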

This pipe is available at http://pipes.yahoo.com/jalaj/comedyzone. Replace the list URL with those of other joke categories from comedy-zone.net and see the same pipe work for all of them.

Want some task accomplished using Yahoo Pipes and featured on this blog? Use the comments area to ask for it.

21 Feb 2009 (2 comments)




Friendly Short URLs in Yahoo Pipes

Yahoo Pipes is the friendliest site when it comes to mashing up content such as RSS and CSV, but its URLs are not! For example, if you want to check all the pipes developed by me, the URL would be

http://pipes.yahoo.com/pipes/person.info?eyuid=5liC_Uw5uGl4hZ0ufA--

Similarly, the most popular of the pipes that I have publicly released, ‘Google Trends Tokenizer’, has the URL below.

http://pipes.yahoo.com/pipes/pipe.info?_id=OtSAI5r83BGLeywRG8evXg

How I wish there were friendlier URLs in Yahoo Pipes! Well, there is a way, though it is not well documented in the Yahoo Pipes docs or blog, except for one post that mentions the short friendly URLs without explaining how to create them. Let’s see how to create short and friendly URLs.

Log on to Yahoo Pipes and click on the “My Pipes” link. That opens your pipes page, listing all the pipes that you have created, including the unpublished ones. Click on the “edit” link beside your URL.

Clicking on “edit” opens a textbox where you can enter a unique identifier; then click on ‘Save’.

Now the URL that looked like http://pipes.yahoo.com/pipes/person.info?eyuid=5liC_Uw5uGl4hZ0ufA-- changes to a friendly short URL like http://pipes.yahoo.com/jalaj (this is the URL where you can find all my published pipes).

Now let’s take the individual pipes. Open a pipe and you will notice that its address has already become shorter. We will shorten it further by defining an identifier for each pipe. Click on “edit”, put an identifier in the textbox that opens, and your pipe address is shortened further.

This way you can give a short URL to each pipe you have developed. The four pipes that I released publicly (published) have short URLs as below

To check all the pipes that I release in future, do remember the friendly URL http://pipes.yahoo.com/jalaj

11 Feb 2009 (0 comments)




Who’s linking me?

This blog was largely inactive until late last year and thus received very little maintenance. Today I did a little tweaking here and there in the sidebar and added a section named “Referred this blog!”.

“Referred this blog!” takes its content from an RSS feed, and whether you like it or not, when you add an RSS feed using the RSS widget on WordPress, a link to the source gets created. In this case the RSS feed comes from a Yahoo Pipe named “Who’s linking me?” that I published today. This pipe can work for any blog: all you need to do is change the Blog parameter, run the pipe and use the link under ‘Get as RSS’…

Noticed the parameter “Pipe ID”? Leave it as it is… This pipe, “Who’s linking me?”, is a practical demonstration of the concepts discussed in my previous post, How to Create Private Yahoo Pipes?, and the underlying functionality lives in another pipe which is unpublished. So while you can fully use the pipe and share its URL with the whole world, the underlying pipe and its functionality cannot be cloned.

This pipe is planned for public release (i.e. removing the shield), but not before I am sure that whatever I discussed in How to Create Private Yahoo Pipes? holds true. All pipes enthusiasts are requested to look for ways to break the shield and expose the underlying source and functionality. If you are unable to break it, you get a way to safeguard your own pipes too!

Your input on how effective this pipe is at finding posts that link to your blog would be appreciated. It would help in improving the pipe.

Working on Yahoo Pipes is fun and I’m Loving it :)

10 Feb 2009 (3 comments)




How to Create Private Yahoo Pipes?

Officially there are no private Yahoo Pipes! Any pipe that you create can be seen and cloned by any other Yahoo Pipes user who knows about it, that is, who knows its address or the pipe id associated with it. So if you want to keep a pipe created at Yahoo Pipes private, you can do that by:

  • Not publishing the pipe
    You can always use your pipe even if it is not published. Publishing a pipe just means that it becomes part of the pipes directory, which all users can access. If a pipe is published, there is every chance that others will discover it through a search string, the modules used in it and so on. Using your pipe without publishing it keeps you safe on this front.
  • Embedding RSS feeds online while taking care that the pipe id is not revealed
    If you are embedding RSS feeds created by your Yahoo Pipes, make sure that the pipe id is not revealed as a link or in the source code of the page. For example, if you embed the RSS feed in the RSS module available in Blogger, your pipe id doesn’t show up anywhere, but on a wordpress.com blog an RSS link reveals the pipe URL (and thus the id), with which anybody can clone the pipe. Take care when you do that, and check the source code of the generated page too.
  • Not embedding, linking or using the pipe in other unsecured pipes
    Pipes can be embedded in other pipes, and RSS feeds from pipes can be retrieved in other pipes. Don’t do this with a pipe that you want to keep secure. If a pipe that links to it or embeds it falls into others’ hands, your private pipe is exposed. So if a pipe uses your private pipe, it too needs to be kept as secure as the private one.
  • Not sharing it with anybody else
    Your pipe is just for you. If you want to keep the source of the pipe private but share its functionality with others, you have no chance. As soon as you share your pipe with someone, you have exposed the source.

Thanks for patiently reading to this point. Now we start with the unofficial way to create a Private Yahoo Pipe. Yes there is a way!

All we need to do is create one extra pipe for each pipe that you need to publish, share or embed. We will henceforth call this extra pipe a PipeShield. The PipeShield itself can be published, shared or embedded.

Let’s say I want to secure my Google Trends Scraper pipe, which takes one user input and generates output based on it.

We will create a new Yahoo Pipe, but before that, run the above pipe and copy the RSS output URL from the ‘Get as RSS’ link.

Create a new Yahoo Pipe and insert a ‘URL Builder’ module. Paste the URL that you got in the previous step into the textbox next to the label ‘Base:’. As soon as you press the ‘Tab’ key, all the query parameters present in the URL move out of the base into separate query parameter fields.
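
What URL Builder does with the pasted address is roughly what the Python standard library does when it splits a URL. The sketch below rests on assumptions: split_url is a hypothetical helper, the pipe id is a placeholder, and the address only imitates the general shape of a ‘Get as RSS’ link.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def split_url(url):
    """Separate a URL into its base and an ordered list of query parameters."""
    parts = urlsplit(url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    params = parse_qsl(parts.query, keep_blank_values=True)
    return base, params


# Placeholder address; HYPOTHETICAL_PIPE_ID stands in for a real pipe id.
base, params = split_url(
    "http://pipes.yahoo.com/pipes/pipe.run"
    "?_id=HYPOTHETICAL_PIPE_ID&_render=rss&date=2009-02-10"
)
```

After the split, each parameter can be wired to its own input, which is what the next steps do with the date and _id fields.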

Insert a ‘Fetch Feed’ module, make it take its URL from the URL Builder module and link it to the output module.

Since the pipe that we are shielding takes a user input, ‘Date’, we need to do the same here too. So add a ‘Text Input’ module and wire it to the date field in ‘URL Builder’.

Now we have a functional pipe that forms a wrapper around the original pipe, but the original pipe is still not secure because its pipe id is visible in the ‘_id’ parameter of the URL Builder. Here is the final step.

Add a ‘Private Text Input’ module, set its default value to the value in the _id field, check the ‘Private’ checkbox, clear the _id field and make it take its value from the Private Text Input module.

Save the pipe and distribute it to everyone you know, and those you don’t, without fear of exposing the original pipe and the functions and secrets it carries. The PipeShield shows the value of the ‘Private Text Input’ only to you; everyone else sees the field blank. Others cannot run the PipeShield in edit mode and so cannot use the debugger to discover the pipe id, and if they clone the PipeShield its link with the underlying pipe breaks and the clone becomes non-functional. Others can just run the pipe and get the results! Once you have distributed your PipeShield you can rest assured that your original pipe is always safe.
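
As a loose analogy, the whole PipeShield boils down to a wrapper that accepts the public input and fills in the hidden id itself. The Python sketch below only illustrates how the address is assembled. It provides no real secrecy, and every name in it (shield_url, the placeholder id, the pipe.run address and the _render parameter) is an assumption made for illustration.

```python
from urllib.parse import urlencode

# Plays the role of the Private Text Input default; placeholder value only.
_PRIVATE_PIPE_ID = "HYPOTHETICAL_PIPE_ID"


def shield_url(date, base="http://pipes.yahoo.com/pipes/pipe.run"):
    """Build the feed address of the shielded pipe from the public input alone."""
    query = urlencode({
        "_id": _PRIVATE_PIPE_ID,   # supplied by the shield, never by the caller
        "_render": "rss",
        "date": date,              # the only value a user of the shield provides
    })
    return base + "?" + query


feed_url = shield_url("2009-02-10")
```

The caller passes only the date, just as a user of the PipeShield sees only the ‘Text Input’ field; in Yahoo Pipes it is the ‘Private’ checkbox that keeps the id itself out of view.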

You can check this pipe in edit mode here. Watch my page on Yahoo Pipes for more pipes. All posts on Yahoo Pipes on this blog will share this archive url