Terrell Russell asked for suggestions on how to improve his tool, Cloudalicious. He asked three times: once on the del.icio.us mailing list, once in a comment on my previous entry, and once on his site (link missing, as he wisely took this one off). Now I really think three times is too much, and Terrell should be heavily chastised for this. So I will write him a loong list of things that I think his tool should do, and maybe next time he will think twice before asking people how he should employ his time as a programmer. I am always happy to give ideas to people, provided a) they remember me when the idea makes them incredibly rich, and b) they do the coding.
- The site is down right now because of a WordPress error. I am totally with you that WordPress is the best blog program, but why does Cloudalicious need to be a subpage of WordPress (apart from the fact that it is cool)?
- The y axis should be on a logarithmic scale, at least as an option (this is important!). This would spread out the lower part of the graph and let us see all the tags that have been used only a few times.
- The x axis should also have the option of advancing user by user, instead of day by day. Then we could see the evolution even when a site has been very popular but only for a very few days.
- You have a great todo list; could we have comments enabled there, so that our suggestions would have a place to go?
- If $W_i$ is the weight of tag $i$, and $t_i$ is the total number of times the tag has been used up to a certain time, then at the moment you are using the formula:
$W_i = t_i / \#\text{users}$
Could we have the possibility to use the formula:
$W_i = t_i / \sum_{j=1}^{n} t_j$
This would ensure that the weights all lie on the n-dimensional simplex. That is, the sum of all the weights would be equal to 1:
$\sum_{i=1}^{n} W_i = 1$
There is another formula that I would like to suggest, but it requires a longer explanation, so we will skip that for now.
- A little graph with the average number of tags a user uses would be nice too.
- I totally agree with the idea of being able to see the graph only for a certain time period.
- Also, ça va sans dire, I hope the data are being cached, or Joshua will take both my scalp and yours.
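To make the difference between the two weighting formulas in the list concrete, here is a minimal Python sketch. The function names and the sample counts are made up for illustration; this is not how Cloudalicious is actually coded, only what the two formulas would compute.

```python
def weights_per_user(tag_counts, n_users):
    """Formula described as the current one: W_i = t_i / #users."""
    return {tag: count / n_users for tag, count in tag_counts.items()}


def weights_on_simplex(tag_counts):
    """Suggested formula: W_i = t_i / sum_j t_j, so the weights sum to 1."""
    total = sum(tag_counts.values())
    if total == 0:
        # No tags used yet: return zeros rather than dividing by zero.
        return {tag: 0.0 for tag in tag_counts}
    return {tag: count / total for tag, count in tag_counts.items()}


# Hypothetical counts for one URL, tagged by 20 users (made-up numbers).
counts = {"tools": 30, "tagging": 15, "graphs": 5}
per_user = weights_per_user(counts, 20)
simplex = weights_on_simplex(counts)
```

With the first formula the weights depend on how many users tagged the URL and need not add up to anything in particular; with the second they always add up to 1, whatever the number of users.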
I am actually grateful that Russell took some of the suggestions I gave before and immediately implemented them. If he codes all this and still keeps asking me for more suggestions, I'll be forced to invite him to produce the log-log plot of the power law we so heavily need. With automatic calculation of the steepness of the curve.
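For the steepness of the curve, one rough way to get a number is a straight-line fit on the log-log data. The sketch below assumes the plot is tag count against tag rank (the post does not say which quantities go on the axes), and the function name and the synthetic data are invented for illustration. A least-squares fit on logarithms is only a quick estimate of the exponent, not a careful statistical one.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    Assumption: counts are tag usage counts; rank 1 is the most used tag.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data following count = 1000 / rank, i.e. an exact power law
# with exponent -1, so the fitted slope should come out very close to -1.
example_counts = [1000 / rank for rank in range(1, 51)]
slope = loglog_slope(example_counts)
```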

Hi Pietro,
Thanks for all the ideas you’ve been writing about, great stuff. I suggested some ideas for tag related graphs here that you might be interested in; they are more about state than time evolution though.
I also wanted to follow up on some things you’ve mentioned: in this post you mention possible alternatives to calculating weight so as to keep the “point” corresponding to a URL on the surface of the n-simplex in tag space; what would they be? Also, you previously suggested using metrics besides the Euclidean metric in calculating “distances” between URLs in tag space; which metrics would you consider, and how do you think they would more accurately describe this distance?
Hello Adam,
sorry for answering so late, but I was away. I described the other formula at the end of the last post: tag clouds are hard to spam. Thanks for the important link. My understanding is that power laws don't always have an average, yet finite power laws always have one. And this is a fundamental difference if we don't want to overemphasize the long tail.
I carefully avoided the discussion of the various possible metrics that could replace the Euclidean one, because I know it is a big topic and deserves full attention. I would start by testing the various Ln metrics. But it is useless to think about it right now. We first need a tool that, given a URL, finds us other URLs that are near in any metric. Then switching from the Euclidean metric (aka L2) to another Ln will be easy. It could even be a variable in the search form.
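As a small illustration of the metric being a variable, here is a Python sketch. It assumes that "Ln" means the usual L_p (Minkowski) family, and that each URL is represented as a dict of tag weights; the function names, the example.org URLs and the weights are all hypothetical.

```python
def minkowski_distance(a, b, p=2.0):
    """L_p distance between two tag-weight dicts; a missing tag counts as 0."""
    if p < 1:
        raise ValueError("p must be >= 1 for this to be a metric")
    tags = set(a) | set(b)
    total = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) ** p for t in tags)
    return total ** (1.0 / p)


def nearest(target, candidates, p=2.0):
    """Return candidate URLs sorted from nearest to farthest in the L_p metric."""
    return sorted(
        candidates,
        key=lambda url: minkowski_distance(target, candidates[url], p),
    )


# Made-up tag weights, each summing to 1.
target = {"python": 0.5, "web": 0.5}
candidates = {
    "http://example.org/a": {"python": 1.0},
    "http://example.org/b": {"python": 0.4, "web": 0.6},
}
ranked_l1 = nearest(target, candidates, p=1)  # L1 metric
ranked_l2 = nearest(target, candidates, p=2)  # Euclidean metric
```

Switching metric is then just a matter of passing a different `p`, which is the kind of parameter a search form could expose.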