A Unique Problem With Link Normalisation? (Canonical URLs!)

We have hit a serious problem with rbutr, one we need to overcome if it is to reliably deliver rebuttals for arguments that have been rebutted. It has to do with link normalisation, and with covering all of the links that can be used for a single page.

An example is the best way to clarify this:

This page:
http://online.wsj.com/article/SB10001424052970204301404577171531838421366.html
Is the same page as this one:
http://online.wsj.com/article/SB10001424052970204301404577171531838421366.html?mod=WSJ_article_comments
which is the same page as this one:
http://online.wsj.com/article/SB10001424052970204301404577171531838421366.html?mod=WSJ_article_comments#articleTabs%3Dcomments

Let’s call these links Shorty, Comments and Tabs, respectively.

Currently, rbutr is set up to make rebuttal links between URLs, so when someone goes to the Comments URL and submits a rebuttal to the article they find there, that rebuttal is logged against Comments and only Comments. Anyone who visits Comments in the future will see that there is a rebuttal, but anyone who visits Shorty or Tabs will see zero rebuttals listed.
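To make the failure concrete, here is a minimal sketch of exact-match storage. It is not rbutr's actual code: the `rebuttals` dictionary, the `submit_rebuttal` and `lookup` helpers, and the example.com rebuttal address are all assumptions made up for illustration.

```python
# Hypothetical in-memory store: page URL -> list of rebuttal URLs.
rebuttals = {}

def submit_rebuttal(page_url, rebuttal_url):
    """Log a rebuttal against the exact URL the user was on."""
    rebuttals.setdefault(page_url, []).append(rebuttal_url)

def lookup(page_url):
    """Return rebuttals stored under exactly this URL string."""
    return rebuttals.get(page_url, [])

SHORTY = "http://online.wsj.com/article/SB10001424052970204301404577171531838421366.html"
COMMENTS = SHORTY + "?mod=WSJ_article_comments"

# A rebuttal submitted from Comments is invisible from Shorty.
submit_rebuttal(COMMENTS, "http://example.com/rebuttal")
print(len(lookup(COMMENTS)), len(lookup(SHORTY)))
```

Because the key is the raw URL string, any difference at all in the address produces a miss.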

So how do we solve this problem? How do we cover all possible versions of the URLs which might be used to link to a single article?

Straightforward Normalisation and Reduction

The first answer that came to mind was to attempt to find the simplest URL that can be used for the page. This involves accessing the page content, and then deconstructing or reconstructing the URL a bit at a time until we get the shortest version of the URL which returns the same page content as the submitted link. That would work fine for submitting, but it doesn’t solve the underlying problem: what about all the people who land at Comments and Tabs? The system took Comments and turned it into Shorty, and even if the system stored Comments as well as Shorty, people who land on Tabs still get no indication of a rebuttal. In some cases there can be tens or hundreds of real URLs to a single article. In practice there are effectively infinite possible URLs for any page: simply sticking a ? on the end will usually still give you the same page, but the different URL will stop rbutr from recognising it.

So that doesn’t actually solve our problem at all.
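The reduction idea can be sketched roughly as follows. This is an illustration under assumptions, not what rbutr implements: `candidate_urls` and `reduce_url` are hypothetical names, `fetch` is a caller-supplied function that returns page content for a URL, and comparing content with `==` is a simplification that real pages with ads or timestamps would often defeat.

```python
from urllib.parse import urlsplit, urlunsplit

def candidate_urls(url):
    """Yield versions of url from simplest to most specific."""
    parts = urlsplit(url)
    # Simplest form: scheme, host and path only.
    yield urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Next: keep the query string but drop the #fragment.
    if parts.query:
        yield urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    # Last resort: the URL exactly as submitted.
    yield url

def reduce_url(url, fetch):
    """Return the simplest candidate whose content matches the submitted URL."""
    original = fetch(url)
    for candidate in candidate_urls(url):
        if fetch(candidate) == original:
            return candidate
    return url
```

Called on Tabs with a `fetch` that returns the same article for all three links, this would return Shorty. Note that it costs one extra page request per candidate, and it only helps at the moment a URL is handed to it.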

Breakthrough!

After writing the first half of this post (everything prior to this heading) I went to bed for the night. While I was sleeping, Craig managed to find what we were looking for. It turns out to be canonical URLs, something I had seen for years in WordPress but never really understood the significance of. I’m not sure it is 100% our solution, but it is definitely built to solve the problem we have. The only question is how many websites actually use it, because it relies on website owners declaring a canonical URL in order for us to take advantage of it.
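For readers who have not met it before, a page declares its canonical URL with a `<link rel="canonical" href="...">` tag in its HTML head. A minimal sketch of reading that tag with Python's standard library is below; the `CanonicalFinder` class and `canonical_url` function are hypothetical names for illustration, and falling back to the page's own URL when no tag is present is an assumption about how one might handle sites that do not declare one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]

def canonical_url(html, page_url):
    """Return the declared canonical URL, or page_url if none is declared."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return page_url
    # Resolve a relative href against the URL the page was loaded from.
    return urljoin(page_url, finder.canonical)
```

If Shorty, Comments and Tabs all served a page declaring Shorty as canonical, storing and looking up rebuttals under `canonical_url(...)` would map all three to the same key.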

Anyway, Craig found a heap of information on it. Have a look at some of it if you want:

http://www.leancrew.com/all-this/2011/11/redundant-urls/
This one describes the same problem outlined above, as experienced on another website.
http://www.redirectchecker.com/canonical.htm
A short explanation of what canonical URLs are.
http://www2007.org/papers/paper194.pdf
An academic paper about “DUST”: Different URLs with Similar Text.

So we have a way forward now. We will be exploring canonical URLs and working out how to apply the concept effectively, so that our user experience is kept at the highest possible level (delivering rebuttals whenever we have them!).
