<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.1" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Evolving Datasets and the Netflix Prize</title>
	<link>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89</link>
	<description>Data Analytics- the art and science of analyzing data</description>
	<pubDate>Thu, 11 Mar 2010 08:34:21 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.1</generator>

	<item>
		<title>by: John Aitchison</title>
		<link>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-935</link>
		<pubDate>Thu, 21 Feb 2008 07:45:34 +0000</pubDate>
		<guid>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-935</guid>
					<description>small update .. yes, I am still working on it in my spare time

Whimsley has an article at http://whimsley.typepad.com/whimsley/2007/07/the-limitations.html
in which he explores the notion that improvements in the RMSE may not translate to (noticeable) improvements in the customer experience (he also has a good summary of the problems with the data)


and Yehuda Koren (KorBell, the current top team) argues somewhat to the contrary .. that is, that RMSE does matter, but he also takes a ranking approach, and his algorithm is influenced by the idea that recommender systems should predict the &quot;top K&quot;, which implicitly means placing more emphasis on predicting the &quot;5's&quot;. See http://www.netflixprize.com/community/viewtopic.php?id=828

</description>
		<content:encoded><![CDATA[<p>small update .. yes, I am still working on it in my spare time</p>
<p>Whimsley has an article at <a href='http://whimsley.typepad.com/whimsley/2007/07/the-limitations.html' rel='nofollow'>http://whimsley.typepad.com/whimsley/2007/07/the-limitations.html</a><br />
in which he explores the notion that improvements in the RMSE may not translate to (noticeable) improvements in the customer experience (he also has a good summary of the problems with the data)</p>
<p>and Yehuda Koren (KorBell, the current top team) argues somewhat to the contrary .. that is, that RMSE does matter, but he also takes a ranking approach, and his algorithm is influenced by the idea that recommender systems should predict the &#8220;top K&#8221;, which implicitly means placing more emphasis on predicting the &#8220;5&#8217;s&#8221;. See <a href='http://www.netflixprize.com/community/viewtopic.php?id=828' rel='nofollow'>http://www.netflixprize.com/community/viewtopic.php?id=828</a>
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: &#187; Taming the Google Monster–we have PageRank 5!. Cool, I guess. [ Data Sciences Analytics ]</title>
		<link>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-36</link>
		<pubDate>Fri, 25 May 2007 00:05:47 +0000</pubDate>
		<guid>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-36</guid>
					<description>[...] Who knows?. These evolving antagonist problems are intrinsically very interesting, though.  And I guess the web dataset is the biggest game in town, bigger than Netflix – sorta hard to resist. I’d quite like to build my own semi-supervised crawler/spider, but perhaps not tonight Josephine. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] Who knows?. These evolving antagonist problems are intrinsically very interesting, though.  And I guess the web dataset is the biggest game in town, bigger than Netflix – sorta hard to resist. I’d quite like to build my own semi-supervised crawler/spider, but perhaps not tonight Josephine. [&#8230;]
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Shane</title>
		<link>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-24</link>
		<pubDate>Thu, 22 Mar 2007 23:58:40 +0000</pubDate>
		<guid>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-24</guid>
					<description>Thanks for linking me, I will have to start blogging again!

How awesome that the ABS data is now free! Hopefully we will see something along the lines of what the Juice guys have done:
http://www.juiceanalytics.com/weblog/?p=119
http://www.juiceanalytics.com/weblog/?p=202
http://www.juiceanalytics.com/weblog/?p=144
http://www.juiceanalytics.com/weblog/?page_id=99</description>
		<content:encoded><![CDATA[<p>Thanks for linking me, I will have to start blogging again!</p>
<p>How awesome that the ABS data is now free! Hopefully we will see something along the lines of what the Juice guys have done:<br />
<a href='http://www.juiceanalytics.com/weblog/?p=119' rel='nofollow'>http://www.juiceanalytics.com/weblog/?p=119</a><br />
<a href='http://www.juiceanalytics.com/weblog/?p=202' rel='nofollow'>http://www.juiceanalytics.com/weblog/?p=202</a><br />
<a href='http://www.juiceanalytics.com/weblog/?p=144' rel='nofollow'>http://www.juiceanalytics.com/weblog/?p=144</a><br />
<a href='http://www.juiceanalytics.com/weblog/?page_id=99' rel='nofollow'>http://www.juiceanalytics.com/weblog/?page_id=99</a>
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: John Aitchison</title>
		<link>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-20</link>
		<pubDate>Wed, 21 Mar 2007 08:58:14 +0000</pubDate>
		<guid>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-20</guid>
					<description>Thanks, Shane, for the post and good wishes .. I see you are in Melbourne, my old stamping ground and, it seems, a bit of a hot spot for data analytics.

I have added you to my blogroll; I don't know why I did not find you before. No need to reciprocate - I see you have your blogsite set up a bit differently.

I am going to follow up on your post about Australian geocoding via Google - the ABS has its own plans for this, but maybe they have been pre-empted. Anyway, I am always interested in geodemographic data .. see, e.g., http://dsanalytics.com/dsblog/australian-census-data-is-finally-again-free_62

As for the Netflix data, it does not immediately present to my mind as something where linear correlations are entirely appropriate. There is a lot of restriction of range (most ratings are 3, 4, or 5), as is to be expected (these people did, after all, expect to like the movie when they borrowed the DVD), and the scale is at best ordinal. I think there are some limen issues here too .. what is the tipping point that tips a 3 into a 4?

So, it should be interesting.

Thanks for the comment</description>
		<content:encoded><![CDATA[<p>Thanks, Shane, for the post and good wishes .. I see you are in Melbourne, my old stamping ground and, it seems, a bit of a hot spot for data analytics.</p>
<p>I have added you to my blogroll; I don't know why I did not find you before. No need to reciprocate - I see you have your blogsite set up a bit differently.</p>
<p>I am going to follow up on your post about Australian geocoding via Google - the ABS has its own plans for this, but maybe they have been pre-empted. Anyway, I am always interested in geodemographic data .. see, e.g., <a href='http://dsanalytics.com/dsblog/australian-census-data-is-finally-again-free_62' rel='nofollow'>http://dsanalytics.com/dsblog/australian-census-data-is-finally-again-free_62</a></p>
<p>As for the Netflix data, it does not immediately present to my mind as something where linear correlations are entirely appropriate. There is a lot of restriction of range (most ratings are 3, 4, or 5), as is to be expected (these people did, after all, expect to like the movie when they borrowed the DVD), and the scale is at best ordinal. I think there are some limen issues here too .. what is the tipping point that tips a 3 into a 4?</p>
<p>So, it should be interesting.</p>
<p>Thanks for the comment
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Shane</title>
		<link>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-19</link>
		<pubDate>Wed, 21 Mar 2007 00:25:38 +0000</pubDate>
		<guid>http://dsanalytics.com/dsblog/evolving-datasets-and-the-netflix-prize_89#comment-19</guid>
					<description>Isn't it great that a company is prepared to put up $1M to see a problem like this solved!! Who cares what their motivation is!

Netflix's Cinematch is a variant of Pearson's correlation, according to the paper linked on the KDD Cup 2007 website.  So far the most popular algorithm on the forum seems to be Singular Value Decomposition, which gains about 5% on Cinematch, but others seem to have tried SlopeOne also. 

Great to hear you are going to tackle Netflix, good luck!</description>
		<content:encoded><![CDATA[<p>Isn&#8217;t it great that a company is prepared to put up $1M to see a problem like this solved!! Who cares what their motivation is!</p>
<p>Netflix&#8217;s Cinematch is a variant of Pearson&#8217;s correlation, according to the paper linked on the KDD Cup 2007 website.  So far the most popular algorithm on the forum seems to be Singular Value Decomposition, which gains about 5% on Cinematch, but others seem to have tried SlopeOne also. </p>
<p>Great to hear you are going to tackle Netflix, good luck!
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
