No more Ivory Towers

I am right now at the FET 11 conference, where I attended a session by my old friend Josh Bongard on Crowdsourcing Science. There I commented that it would already be a good thing if scientists started to make themselves available to the wider public, by publishing a timetable of when they are available to whoever wants to chat with them about science.

Original from James Stewart. http://www.flickr.com/people/jystewart/
There was a time when scientists lived in ivory towers. Now that ivory towers are starting to crumble, we should do our best to really tear them down completely. So I am here suggesting, and promoting, a new project. An open science project.
The idea is that I, and every scientist who is willing to participate in this, will donate some time to society for science.
I will be available one hour a week on Skype to discuss science with anyone who is interested.
My Skype name is “pietrosperoni” and I will be available every Tuesday from 13 GMT to 14 GMT. You must be able to speak English or Italian. I speak a bit of French so that might work too, but it’s very poor, and I cannot write it.
In this time we can discuss science. If you have an idea about my field of expertise you can come and talk to me about that. Maybe we can collaborate on developing it, and maybe turn it into a publication.
Before any collaboration I expect you to know about the scientific method and how peer-reviewed journals work. But I am willing to tell you about it. Those are basic things that need to be known when you want to do science, a bit like you need to know the rules of the road when you start driving.
If you are a colleague and you want to chat you are also of course very welcome. In fact you should join me, and start to offer 1 hour a week to help people discover your field of expertise.
You can find my interests as a scientist here. But I am willing and interested to discuss many other topics.
You also can come to me and ask about any idea you might have found on my blog.
If you are a colleague of course you can come and Skype, but you can do much more: I invite you to join me!
You can do this from your blog, or from the comment section over here. If you have a blog and you write about this, please remember to advertise it here. And (either here or in your blog) please remember to write:
Name:
Availability:
Skype name:
Field of expertise:
Other interests:
For me:
Scientists, tear down the wall!

COP15 Needs an e-Government System

This morning I received a mail from Copenhagen. It was very moving, and described a situation of chaos, strong commitment, and bravery. It told the story of people fighting with nonviolence, and shouting that they want change.

And I am afraid all this is useless. I feel once again what I felt looking at the Iran insurgency. But stronger.

Let’s focus on Copenhagen. The sensation is that there are a lot of people on the street asking for a strong carbon tax. Count me among them. But there is more. I am afraid people have ideas, and those ideas are not being heard. And then people assume the worst, and assume that the world leaders, the ministers, and everybody who is inside are on the payroll of some big corporation. And then they demand change. But now they do not focus any more on the small change, the key point. Now they want a huge change, which will not happen. And then there are rallies, and people pushing, and the police resisting. And violence. Yes, police violence should not be there. And I feel this is not the way. It is not by shouting “Shame on you” that you win the hearts of the policemen. It is not by shouting at people that you get yourself heard, just as it changes nothing whether I write this in normal letters or in CAPS LOCK. It is the content that matters. And when you are shouting, when you are polarized, you are already doing violence. This is not the way.

Now, there are people who work hard to negotiate among different positions. The Center for Nonviolent Communication is probably one of the best. It was created by Marshall B. Rosenberg, one of the students of Gandhi (or so I remember from his book; the Wikipedia page does not seem to mention it). Marshall has worked in the past as a negotiator between groups, and I am sure there are a number of very good negotiators working inside the conference to negotiate between the key people. What I don’t think there are, are negotiators between the people in the conference and the people outside.

It is as if all the effort is concentrated on getting the communication going between those big players, while no work has been done to get enough communication between the inside of the conference and the outside. The assumption seems to be that either there are no good ideas outside or it is just impractical to engage them. I think both of those assumptions are wrong. Yes, we still need to develop the tools to run an efficient brainstorm with millions of people. But the idea of having everybody write their own ideas, and vote on the ideas they like, is already a good start. Why is there no system like this to harvest the ideas from the people?

I was just looking at a YouTube CNN conference where people sent questions, and voted on the questions. Again there is the assumption that normal people are just ignorant. This is not true. Not anymore (if it ever was). Not with the internet, which lets anyone study any topic.

In all those situations we need to set up systems where people can chip in their ideas and, while it is happening, can read each other’s ideas, and where the most voted ideas emerge from the noise and reach the people who are holding the discussion.

How Twitter, Google, Wolfram|Alpha and Wikipedia are not competing at all

It seems to me that Twitter, Google, and Wolfram|Alpha are really not competing at all, but are instead providing complementary services. I would go further by adding Wikipedia (and blogs?), and suggest that the 4 services really represent the digestive process of our information society. From the first Churning to the Backbone.

Wolfram|Alpha represents the deeper part. It includes only what is really known inside out by our society. What has been fully digested. For something to be in W|A it needs to be known, semantically known, beyond doubt. And notice that I am here speaking about a deeper Wolfram|Alpha than what you have seen here. The Wolfram|Alpha as it should be, once we have learned how to interrogate it properly, and once it has expanded with the rest of the knowledge we have.

At a higher level we have Wikipedia. Wikipedia permits much more stuff to be present. You have actors, and theories, and stories, and a lot of other stuff. You also have discussions and points of view. In short you have a lot of stuff that is not being digested anymore, but is also not the bones of our society. It is more like the muscles. The limit of Wikipedia is that since it does not allow new research, by definition it is limited to what has already been discovered. Although not in as definite a way as in Wolfram|Alpha.

And then we have Google. Google is really part of the digestive process. It has new stuff coming in every few days. But it is also less clear. You need to work to get to the results using Google. But you can also find new threads. Things that are still not known. There is real food here, waiting to be digested.

And Twitter is the more superficial tool. Twitter has second to second update. It has multiple links in different forms that point to the same resource. Information is not organised in any way, shape or form. But it is information nevertheless. It represents the edge of the knowledge wave of our civilization. It is deeply alive, unpredictable, full of possibilities. You never know how it will react. It is the most alive part of the constant discussion that is going on in our civilisation. It is the civilisation equivalent to the constant chit chat that is going on in our head. Although it has memory, it is not really good with it. Anything that is in Twitter can be true, can be false, can be anything in the middle, neither or both at the same time.

If you are an alive and creative mind that wants to participate in the constant flow of creation of this society you will probably end up interacting in twitter in some ways. But if you want your creation to be grounded in reality you need to use the other levels as well. They are really not competing.

And Blogs? Blogs are the way in which we store longer personal stories. The untwittable (as Chris Anderson from TED called his). They work between the Google level and the Twitter level, letting information move between those levels, and letting complex information be churned before it is ready to go deeper. Similarly you have journal articles (and books) working to bring the information to the Wikipedia level.

Where life is and money isn’t

Some of you might remember that I wrote a post about the long tail of the ruling class. The post was in Italian and was translated into English by blogger Phil Edwards. I took the translation and copied it to my blog (with a link), but said that I did not fully agree with Phil’s understanding of my post. I didn’t go into more detail. And then there was silence, and in the silence I decided it was easier to just ignore the whole discussion. But a few days ago Nicholas Carr from Rough Type wrote a post on how the long tail lets the services that put people in touch make massive amounts of money, while leaving not much money to the people who produce the actual content. Absolutely true, and this is why you don’t see Google advertisements on my blog. But this is a very different problem from what I was discussing when I was speaking about the long tail of the Ruling Class. Mainly because I was not speaking about the ruling class but about the ‘classe dirigente’, which is not exactly the ruling class, although I still can’t find a better translation. Ruling class smells a bit too much of kings and queens and prime ministers. And I was actually speaking about the ‘classe dirigente’ as the people who have authority over a certain field.

So when Phil commented on Nick’s post:

I blogged on this last year, in response to Pietro Speroni:

I felt I had to answer. Because my post was all about a multidimensional space (all our interests), which gets mistreated as a unidimensional space (money). Poor chap! For a multidimensional space to be treated as a unidimensional one is fairly common, but never fair. And the general excuse is ‘to understand better’, or ‘to simplify a bit’. But I suspect that multidimensional spaces might take it personally, because if you treat them badly, they can become quite convoluted, if you know what I mean. Maybe I should write a long post on the importance of not making models (even mental ones) with too few dimensions. But I think I shall leave it for some time next year. And then I can say that it was long overdue.

In any case I decided to copy my comment on Nick’s post here.
Continue reading Where life is and money isn’t

The Italian blog is born: reasons and technicalities

Finally the time has come. Although I have wanted to do this for a long time, only now did I find the time and the technical knowledge to do it:
I divided the blog.
I separated all the Italian posts from the English ones. I created a new blog at http://it.pietrosperoni.it, and my Italian posts will, from now on, be posted over there. And only over there. Most of the people (3) who read me (5) read either Italian or English posts. And I am sure it must have been very confusing to scroll through a page and find some posts in English and some in Italian. Plus I always had the sensation that I could not write too much in one language, or possible readers of the other language would just assume the blog contained no information at all for them, and dismiss it. In time this made me slow down posting, as I could not always follow particular threads that would have involved posting many times in one language.
But now all this has come to an end.

Of course if you want to read entries from both blogs you should add the rss from the Italian blog too. Some topics will remain confined to this blog (like tags, for example), others will remain there (like Italian politics), while others will span both media (like diet, which is already present in both). The wiki in this case should act like a glue, creating a space where entries from both are aggregated. Plus, being a wiki, I (and whoever wants to come and play) will use it to keep notes, aggregate extra content, and generally make some pages stand out while others will only show the blog entries, the bookmarks, and the context (i.e. the links from the delicious popular page, and from technorati).

Generally it is not a smart idea to come here every time to see if I have written something. I tend to write when I have something to say, so many days might pass before I say something, and then for some days I might make one or more posts a day. The solution is to add my rss feeds to your feed reader. Bloglines is a good one. I am sure there are better ones. Feel free to suggest them (as I am always looking for ways to improve).

Now let’s get a bit more technical: making this change also meant getting my hands dirty with MySQL.
Continue reading The Italian blog is born: reasons and technicalities

Yahoo’s delicious meal!

I wanted to start this entry by congratulating Joshua on the deal. But I won’t.
The facts: the web site delicious has been sold to Yahoo!.

I personally don’t dislike Yahoo. I positively hate them. For having eaten and raped startup websites, one after the other. For being totally obscure in terms of contact with the public. For refusing to answer e-mails. For being so big that they can just claim: “we are too big to answer your e-mails”. We can ignore you, and trample on you; we will not even notice. I have had something personal against them from the moment they deleted my web page back in 2003, and with it all the material inside, which included some preprints of academic papers I wrote, some of which I had in a single copy. I hate Yahoo because they don’t get what web 2.0 is and they try hard to copy it. And when they fail in copying it, they try to buy it. As if you could buy a community. As if you could own a community. As if you could buy a language and the agreement to keep the data open.

So maybe I should congratulate Joshua for having sold something which had no price for some real and tangible money. But I still will not. Because delicious was not only a community. It was also an experiment. A place for us geeks to meet and discuss. A place where we were changing the web. Yes, WE were changing the web through our ideas. And Joshua was good at picking the best ideas. Inviting us to give more. Now do you really think this will continue under Yahoo!’s reign? Forget it! At least for my part.

But this is not the reason why I shall not congratulate Joshua. No, I shall not congratulate him because he could have made it. Because delicious was clearly, and by general recognition, the best bookmarking service on the web. And with the whole community behind it giving suggestions it was prosperous and growing. Because people have pleaded with him to start charging, or to put up advertisements, or to do something, anything, to let us pay for it. Because we knew. We knew he could not possibly pay for it all by himself. And we were happy to join in. We were happy to pay. How many services are you aware of where the customers ask to pay for them? Few indeed!

Of all the people who have commented on the deal I feel the person who best captures my feelings is Ronald Johnson, who comments:

Some lessons to learn here:

  1. Never trust a startup service to store your important data no matter how the owner seems honest to you. Sooner or later he/she will run away with the money and YOUR data.
  2. Never trust a corporate entity to continue storing your important data. Now that they stole your data, you are subjected to the user-specific ads and they abuse you no matter how strong you cry.
  3. Never act like a fanboy on services you don’t trust. Instead, invest your time and knowledge on open source projects to ensure your efforts are never sold to third party evils.

I have to add, one of the things I found most disturbing was the form in which Joshua announced it. Highlighted are the words that I found most disturbing:

We’re proud to announce that del.icio.us has joined the Yahoo! family. Together we’ll continue to improve how people discover, remember and share on the Internet, with a big emphasis on the power of community. We’re excited to be working with the Yahoo! Search team – they definitely get social systems and their potential to change the web. (We’re also excited to be joining our fraternal twin Flickr!)

We want to thank everyone who has helped us along the way – our employees, our great investors and advisors, and especially our users. We still want to get your feedback, and we look forward to bringing you new features and more servers in the future.

I look forward to continuing my vision of social and community memory, and taking it to the next level with the del.icio.us community and Yahoo!

The post stinks of corporate declaration, and has already sealed the destiny of delicious as just another piece in the Yahoo puzzle. A more honest post would have spoken of the money that changed hands. Of how they made an offer that could not be refused. Of the risks of the passage. It would still have made people upset, but we might have felt that it was coming from Joshua and not through Joshua, from the Yahoo P.R. office.

All this calls for some actions, for I really don’t want to support Yahoo; and if all I can do is passive resistance, then that’s what I shall do!

  1. I shall look for a good alternative to Yahoo, ehm, I mean del.icio.us. The folks at slashdot suggest Simpy.
  2. I want to take a better look at microformats, and in particular at rel-tag. It might be possible to install a small bookmarking service on site, and then have it send standard info to the community at large. In this way I would no longer be vulnerable to the next Yahoo! acquisition.
  3. While I am there I should also look for ways to get out of Flickr (which has been acquired by Y! too). Don’t miss the wonderful description of the mess Yahoo is making of the Flickr signup page. There I also heard that 23hq might be a good alternative. Still I would prefer something on site that speaks a common language.
  4. I have to decide what to do with the Delicious Mind Map Maker. You see, I really don’t want to support Yahoo. Not even indirectly. So I am tempted to take it offline. But if I find a better service, and it is bound to be there now that other geeks will start migrating to come out of the belly of the beast, I might just modify it to support this other service. Nothing has been decided yet.
  5. And then I might instead develop my own service or help someone else develop their service, using the tagclouds ideas I spoke about earlier.
  6. And last but not least, there is the possibility that I might develop the famous search utility I have been speaking about. Up to now, apart from the constraints on time, what really stopped me were ethical reasons. Joshua asked people not to screenscrape delicious, so I felt I should abide by his request. I surely did not want to tax the servers of a poor hacker. But now the ‘poor’ hacker has sold the hen that lays the golden eggs, and walked away with tons of cash. And I am sure Yahoo will not even notice if I start screenscraping them. At least until they start putting up all sorts of advertisements, which might make it too hard to do. Hmm, active resistance might have some attraction!

So I probably should congratulate Joshua. He sold a bunch of quite simple and useless code to Yahoo. He dangled in front of them the possibility of having a great and creative community. Now all he has to do is walk away with the cash, start a delicious clone and we will all be more than happy to join him in the new adventure. Hell! We will not even ask for our part of the booty. Although we might ask for a dinner in a good restaurant.
And I think that’s just fair.

ADDENDUM:
After reading all the comments on slashdot I found a link to a page with most bookmarking services compared. It is a bit old, so not totally updated. But yet it gives some good overviews and can be used for some good pre-screening. Also the maintainer of Simpy, Otis, wrote a long comment explaining how he might even adapt the code to make the mindmap work for that too!

Hack for gold: rss tag on wp

I suddenly realised I don’t have the time to do all the things I am interested in (and keep what remains of my mental sanity). So out of need I decided to make the following offer.

The technorati tag plugin gives the possibility to have technorati tags, and for each tag a different page. What it does not offer is the possibility to also have an rss feed for each tag. So I offer 20 € to whoever makes the necessary changes to the plugin so that, side by side with the tag page (here available at http://blog.pietrosperoni.it/tag/… ), I can have an rss feed of the entries in my blog with each tag (possibly at http://blog.pietrosperoni.it/rsstag/…).

The changes will have to be open source, so that they can then join the mainstream wp program. And be on that plugin. I know 20€ is not a lot, but I reason that for the right person this is a simple hack that might take half an hour. For me it would take ages (mainly because I need to understand how wp is working, and it’s not that trivial).
Oh, yes, I can pay via paypal. Or buy you something via Amazon, LJ, Flickr. Whatever.
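
To make the request concrete, here is a rough sketch in Python of what a per-tag feed amounts to. It is only an illustration of the idea, not the WordPress plugin change the offer asks for: the function name rss_for_tag, the shape of the posts list, and the blog.example.org address are all made up for the example.

```python
import xml.etree.ElementTree as ET


def rss_for_tag(posts, tag, blog_url):
    """Build a minimal RSS 2.0 document with only the posts carrying `tag`.

    `posts` is a hypothetical list of dicts with "title", "link" and "tags".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Posts tagged " + tag
    ET.SubElement(channel, "link").text = blog_url + "/tag/" + tag
    ET.SubElement(channel, "description").text = "Entries tagged " + tag
    for post in posts:
        if tag in post["tags"]:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = post["title"]
            ET.SubElement(item, "link").text = post["link"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    example_posts = [
        {"title": "Wikka installed",
         "link": "http://blog.example.org/wikka", "tags": ["wiki"]},
        {"title": "Tag Clouds are hard to Spam",
         "link": "http://blog.example.org/spam", "tags": ["tagcloud", "spam"]},
    ]
    print(rss_for_tag(example_posts, "tagcloud", "http://blog.example.org"))
```

The real work in the plugin would be mapping an address like the rsstag one above to such a function; that part depends on how wp routes requests and is not shown here.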

And now to comment on the above:
It’s interesting how the spreading of open source software also means the opening of a whole new market of people helping each other. I could never have made the above offer if I were using blogger, or lj.

Addendum: Since the new release of wordpress 2.0 includes the possibility to use categories as tags, the above offer is no longer valid. Sorry.

Wikka installed

I have to say that I am very impressed with Wikka. Wikka is a wiki software that I just installed on my web page. It is simple, yet full of plugins. Open source (or I would not consider it). It also lets you integrate freemind mind maps inside it. More than this: for each page the administrator (ehm, that is me!) can decide who is allowed to read, write and comment. I installed it about one week ago, and I avoided making it public until I had found a way to deal with wiki spam. I already have too much spam on this blog. Finally I found what I think is the perfect solution:

  1. Only registered users can comment and modify the wiki. It might not make it very fast, but at least I know who said what.
  2. I inserted a plugin such that to register people must write a password in the ‘registration code’ field. But the password is written on the same log in page.
  3. To write spam in the wiki they have to manually register. Which I feel is fair. I have no anger toward those that manually spam. It is the mechanical ones that ought to be stopped.
  4. If the spammers write something that automatically registers, I will change the registration code.
  5. And if they write something that automatically grabs the registration code from the page, I change the context (the phrase in which the code appears), making their software useless. I will move from:
    • registration code:”pippo pluto” to
    • registrati0n code:”pippo pluto”

Just as you cannot code something that blocks all permutations of the word “Viagra”, so you cannot code something that recognises all the permutations of the phrase “Registration Code”. Ah! And this is the revenge of the masses!

I think the idea is so brilliant that I will see if I can find a similar plugin for wordpress.
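
A small Python sketch of why changing the context works. The bot here is hypothetical: I assume it was written against the first wording of the page and looks for that exact label with a regular expression. Nothing in it comes from Wikka or from any real spam tool.

```python
import re

# Hypothetical bot: it only knows the label the page showed when it was written.
BOT_PATTERN = re.compile(r'registration code:"([^"]+)"')


def bot_grab_code(page_text):
    """Return the code a naive bot would extract, or None if its pattern misses."""
    match = BOT_PATTERN.search(page_text)
    return match.group(1) if match else None


old_page = 'To register, type the registration code:"pippo pluto" in the form.'
new_page = 'To register, type the registrati0n code:"pippo pluto" in the form.'

if __name__ == "__main__":
    print(bot_grab_code(old_page))  # the bot finds the code
    print(bot_grab_code(new_page))  # one changed character and it finds nothing
```

A human reads both pages the same way; the bot has to be rewritten for every new wording, which is the whole point.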

The next thing that impressed me in Wikka was the use of rss. It is actually very easy to integrate an rss in a page. Maybe it is the same in other wiki engines, I don’t know. But on wikka it is absolutely trivial. You just need to write {{rss url="http://the.rss.net/address.rss" cachetime="30"}} and the rss gets fetched, shown, and cached for 30 minutes. Now a 30 minute cache is what del.icio.us requires from you if you are going to connect an rss to your homepage. So now I have started to integrate all sorts of rss from delicious into my web page. Check for example my Tag Cloud page, with the rss from my personal bookmarks tagged with tagcloud, the rss from the popular page in delicious delicious/popular/tagcloud, and the rss from technorati (i.e. people who have blogged on Tag Clouds).
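
For the curious, this is roughly what a 30 minute cache amounts to, sketched in Python. It is not Wikka’s code: the class name CachedFeed and the fetch and clock parameters are invented for the illustration, and the download itself is left to whatever function you pass in.

```python
import time


class CachedFeed:
    """Keep the last downloaded copy of a feed and reuse it until it expires."""

    def __init__(self, url, fetch, cachetime_minutes=30, clock=time.time):
        self.url = url
        self.fetch = fetch                  # function taking a URL, returning the feed text
        self.ttl = cachetime_minutes * 60   # cache lifetime in seconds
        self.clock = clock                  # injectable so the cache can be tested
        self._body = None
        self._fetched_at = None

    def get(self):
        now = self.clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self.ttl
        if expired:
            # Only here does a request go out to the remote server.
            self._body = self.fetch(self.url)
            self._fetched_at = now
        return self._body
```

However many visitors load the page, the remote server sees at most one request per cache period, which is presumably why del.icio.us asks for it.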

And all this is in the floating right bar. So I still can use the rest of the page as place for me to write content, and notes…

And a note taker is what this wiki is slowly becoming. I started moving my Reading List to the wiki. And I added to the reading list the rss of popular reading lists. You see how it all comes together.

But this is not all! Wikka (and they should pay me after a post like this!) gives the possibility to set the privacy for each page. That is, for each page you can choose who can read it, who can comment on it and who can change it. In this way I can use this not only for my personal notes but also for the notes of projects that I might be sharing with other people.

Come and say hello: http://wiki.pietrosperoni.it

Ruling Class translation

A fast note to point out that Phil Edwards took the ball and translated the previous post. The post was an answer to a post from my father, and as such in Italian.

Phil uses the post as his launch pad for a very interesting one on how he sees the long tail having an effect on society and how he sees this not happening. I don’t think I agree fully with what Phil writes, but I need some time to gather my ideas and answer properly.

What follows is Phil’s translation of my post. The translation is incomplete, but correct and faithful to the original spirit (although I am not convinced I would translate classe dirigente as ruling class), and as such I am happy to copy it here.
Continue reading Ruling Class translation

Tag Clouds are hard to Spam

I think the time has come to write my third, and hopefully last, contribution to the topic of tagclouds.

I have been hearing a lot of talk on how users should not use too many tags when bookmarking a URL. I am also the maintainer of the mindmap maker, and I often look at some of the maps generated (available to everybody). There are a number of people who tend to use an average of between one and two tags per URL. Their maps are often very ordered. No clustering, no hierarchy. (Forgive me if I don’t put a link to such a map, but since I am going to bash this way of using delicious, I’d rather bash a method than a specific human being. Just go to the list of maps and open a couple; odds are one of them will be of the type I am describing.) This way of using delicious uses tags as folders, just with the modification that every now and then you can put a URL in more than one folder at the same time. A bit like a big bookstore might carry several copies of the same book, and store them in more than one place (and the Tao Te Ching ends up in New Age -God knows why- and in Religion).

Of course tags tend not to fit exactly. My Tag Clouds and Cultural Change will be under Tags or Folksonomy or Sociology… Whatever you choose you probably will not put it under Ajax. And yet most of the analysis was done studying the spreading of the term Ajax.

Let’s make a few simple calculations.
Continue reading Tag Clouds are hard to Spam

My reading list

Some of you know that I left ProtoLife. I hope in the future to keep on collaborating with the P.A.C.E. project, but for now that’s it. I want to go back to Germany, and finish my Ph.D.
Somehow this seems to have a higher priority. Besides, it is getting clearer to me each day that I am hardly a ‘company’ type of person. I’d rather do research inside the university, or nowhere at all. While I was in Venice I met the local group of Go players. Sandro, one of them, made a big impression on me, being a person of great knowledge, whose only excuse was: “I don’t watch television, I read.”

When I came back to Rome I decided that it made sense to read more. More than this, that it made sense to read ‘cum grano salis‘. With intelligence, choosing carefully what to read, and not reading any bullshit the latest friend suggested to me. I remember telling a friend, “you know, I decided to stop just following my nose on what books to read…”
His answer was quite funny: “If you don’t follow your nose, what do you follow? Other people’s noses?” And then he added: “This is actually a serious question. You might, for example, find some people that you really don’t like, ask for their suggestions, and then take the books they suggest off your reading list”. As you will see by the end of this entry, this ended up being very near the mark.

So I started asking people around me what books (or documents) they felt were most important to understand the world we are living in. An example I often gave was the acts of the Second Vatican Council (in particular Dignitatis Humanae). Since I am not Catholic (nor even Christian), by suggesting something that was not traditionally seen as a classical text, I was implicitly suggesting: the documents that are behind the world we are living in, the documents that most people refer to, but few really read.

I did not receive many lists, but here and there someone would suggest a book or two, which I would dutifully add to my notes. I then started keeping track of this list in a separate page on my blog. Since I did not publicize the page no one would read it. The list is nowhere near finished, and I feel its inadequacy knowing all the wonderful books that should be there, but I preferred to keep it small, and add new books slowly.

While I was keeping the list on the back burner, and slowly going through some of those books, I found another list, a much better one, from which I am about to fatten my list. And the story of how I found it, and how it relates to my list, is very funny, so let me tell it to you.

A right-wing newspaper, Human Events Online, asked

a panel of 15 conservative scholars and public policy leaders to help us compile a list of the Ten Most Harmful Books of the 19th and 20th Centuries.

They cross-referenced the answers of the various people and came to a list of 10 really dangerous books, and 21 ‘honorable mentions’. The list has it all; it’s the most complete list I have found of texts that were really important to understand the world we are living in. Each of those books inspired millions of people. Just to understand where those people come from, each book is important. Fundamental, I would say. You have it all: Freud, Darwin, Gramsci, Marx, Engels, Mao, the Kinsey report (the Kinsey Report! That I have wanted to read for so long). Dewey, who I have been told set the foundation for modern relativist thinking (so dear to our new pope). There is also Mein Kampf, which I am not sure I’ll have the guts to read, but I probably should. And many others, Betty Friedan (don’t you want to understand Feminism? Read it, too), Keynes, Adorno… It is a wonderful list.

Interestingly enough I was not the only one to see this as my next reading list. On the delicious page of the people who bookmarked the article the most common comment is “my next reading list”, “A very interesting list of powerful books that have changed history.”, “some good reading”, “…some of these would make my required reading list…”, “…would make an excellent library booklist.” and so on.

One thing you should not fail to notice is that each book in the top ten most dangerous books is presented with a link. A link to Amazon. But it is not just a normal link to Amazon. Amazon lets you sign an agreement so that you can advertise books on your website, and if people buy a book through your link, they get a discount and you get a percentage. So, yes, you got it: each of those books is presented in that format. If you click on those links to go to Amazon and buy the book, the right-wing journal will get a percentage. It will get a percentage out of you buying Mein Kampf and The Communist Manifesto. “Buy ‘The Kinsey report on the human male’, and help sustain the neocons’ battle.” Ah, the irony of all this.

So, to answer my friend, “no I don’t ask people I don’t like which books to read and cross them out. I ask them which books not to read, and add them to my reading list. Way more efficient!”

And if you have read all this and want to add something to my reading list, feel free to suggest:
“What books or documents would you suggest to understand the world we are living in?” And tell us why: in what way was this book so unique that reading it is a must? Now the line is yours.

Tagclouds and cultural changes

In the previous post I discussed how we can measure the relative importance of tags in a post, by calculating their weight, as

  • weight of tag t = (number of people using t) / (total number of people)
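As a minimal sketch of the formula above, assuming each person's bookmark is available as a set of tags (the function name `tag_weights` and the sample data are invented for illustration, not taken from any delicious API):

```python
from collections import Counter


def tag_weights(bookmarks):
    """Return {tag: weight} for one page.

    `bookmarks` is a list with one set of tags per person who bookmarked
    the page. weight(t) = (people using t) / (total people).
    """
    total_people = len(bookmarks)
    if total_people == 0:
        return {}
    # Count each tag at most once per person.
    users_per_tag = Counter(tag for tags in bookmarks for tag in set(tags))
    return {tag: count / total_people for tag, count in users_per_tag.items()}


# Hypothetical data: four people bookmarked the same page.
bookmarks = [{"css", "design"}, {"css"}, {"css", "rounded"}, {"design"}]
weights = tag_weights(bookmarks)
```

Here three of the four people used `css`, so its weight is 0.75; a tag used by everybody would have weight 1.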

I also said that:

Not only could we study a culture by studying the differences in the power law approximated by the tag clouds used by people of that culture, but we could even measure a cultural earthquake by measuring the difference between the tag cloud generated before a certain event and the one generated after it.

Independently, Clay Shirky was coming to a similar conclusion, although he focused more on temporal changes that seem to be the signature of a particular subgroup of people all bookmarking a site at a certain time:

During a period of about 120 users’ additions of OIO, 20 or so of them used the tag ‘ia’, putting it between #7 and #10 during that period. Now it is down to #17. This suggests that one or a few IA-oriented sites or mailing lists posted the link, and it got a flurry of attention from those taggers in a narrower window of time. This in turn suggests a conversationally tightly-knit IA community.

But let’s go back to the tag weight. Terrell Russell picked up the ball and, in one evening of programming, presented a tool to actually see how the weights change over time.

Nothing to say about the tool. It works perfectly well, and although it can be enhanced in many little ways, it already is very useful. Not bad for one evening.

More interesting, from my point of view, is how, through this tool, we can see changes in the culture we are living in. We are used to feeling those changes, but we were generally never able to measure them. Maybe now we can start to do so.
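To make the idea of weights changing over time concrete, here is a rough sketch. It is not Terrell Russell's tool; it only assumes the bookmarks are available in the order they were made, and the function names and sample data are hypothetical:

```python
def weights_over_time(bookmarks, tag):
    """Weight of `tag` after each successive bookmarker.

    `bookmarks` is a list of tag sets in bookmarking order; entry i of the
    result is the weight computed over the first i + 1 people only.
    """
    history = []
    users_with_tag = 0
    for people_so_far, tags in enumerate(bookmarks, start=1):
        if tag in tags:
            users_with_tag += 1
        history.append(users_with_tag / people_so_far)
    return history


def window_weight(bookmarks, tag, start, end):
    """Weight of `tag` among only the people in positions [start, end)."""
    window = bookmarks[start:end]
    if not window:
        return 0.0
    return sum(1 for tags in window if tag in tags) / len(window)


# Hypothetical data: an early burst of 'ia' taggers, then nobody.
ordered = [{"ia"}, {"ia", "web"}, {"web"}, {"web"}]
history = weights_over_time(ordered, "ia")
early = window_weight(ordered, "ia", 0, 2)
late = window_weight(ordered, "ia", 2, 4)
```

A tag whose history flattens out quickly belongs to the "no change" case; a large gap between the early and the late window is the kind of burst Clay Shirky describes.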

No change

Tag Clouds rapidly converging

First of all I would like to show you the graph of a part of the culture where no changes are happening:
From the site Nifty Corners, bookmarked by 1859 people so far. The values soon converge to what we can expect to be their definitive value (for the culture we are in).

Little Social-Quake

Continue reading Tagclouds and cultural changes

On Tag Clouds, Metric, Tag Sets and Power Laws

Note: This entry is connected also to a mindmap. Some people were having problems in opening the page because of that. As such the mindmap has been stored in a separate page, and can be viewed from here.

Introduction

As correctly pointed out by Jeffrey Zeldman, tag clouds are becoming more and more popular. Yet I keep seeing services that should be using tag clouds but keep on using tag sets. It is not just a problem of programming a tool that can only support tag sets, but also of programming tools that might in principle produce tag clouds, but where the users are not invited to use a tag if it already exists, and which as such don’t generate a tag cloud.

Examples of the first type of tool are Flickr, 43things, consuMating, tagsurf*; an example of the second is the tagged version of the BBC*. In all those cases a tag set is used where a tag cloud would be more appropriate. Some of the differences between a tag cloud and a tag set were explained in Vanderwal.net: Explaining and Showing Broad and Narrow Folksonomies. Let’s see them again, and see some consequences of those differences, which should clarify when it is better to use one tool and when it is better to use the other.
Continue reading On Tag Clouds, Metric, Tag Sets and Power Laws

Blog your mind map

Great results!

I finally managed to find a way to integrate my mind maps with my blog. Not just as static images or as external pages, but as living entities inside the blog. I learned it from the code in Wikka. And as always in those cases, once you find the solution, you just look at it and it seems obvious… after.

Of course it requires that the reader has Java 1.4 installed.

Please do not use http://maps.pietrosperoni.it/freemindbrowser.jar as the address for your freemind browser java code; instead copy the freemindbrowser.jar file to your own directory, and use it from there.

This is the code:
<applet code="freemind.main.FreeMindApplet.class" archive="http://maps.pietrosperoni.it/freemindbrowser.jar" width="100%" height="450">
<param name="type" value="application/x-java-applet;version=1.4" />
<param name="scriptable" value="false" />
<param name="modes" value="freemind.modes.browsemode.BrowseMode" />
<param name="browsemode_initial_map" value="http://maps.pietrosperoni.it/TaoistBooks.mm" />
<param name="initial_mode" value="Browse" />
<param name="selection_method" value="selection_method_direct" />
</applet>

And here is how it looks (on my map of all my taoist books)
Continue reading Blog your mind map

BBC backstage and News in Folksonomy form

Some things are bound to happen. And they tend to happen at the right time. We have been using tags for years now, but the momentum has built up, day after day, with more and more computer programs using them. Starting from del.icio.us and Flickr. Then 43things.com, consumating.com, tagsurf.com and all the clones of the above (BTW, if anybody can find me a small open source server program that emulates Flickr for personal use, I would be grateful). And of course Technorati tags, and GutenTag, which gives RSS feeds to Technorati tags.
But something was missing. Something that some people might have noticed. The news was not playing with tags. News was still presented in the old top-down way: politics, economics, international…
On Google News, as well as CNN. On Yahoo News, as on BBC.

But finally something is starting to move over there too.
Two services were presented at pretty much the same time: Yahoo News with tags and BBC with tags.

But there are some serious differences between the two services. Yahoo’s content is automatically indexed by a program, which imposes the tags according to the keywords found in the text. As such, Yahoo’s tagging is a top-down keyword classification of stories.

Instead (and here you can see the revolutionary spirit blowing through English news services), the BBC program is a truly bottom-up, grassroots program. A program where everybody can add any tag to any article.
The difference is not a minor one: in the first case it is the user who has to adapt to the world view of Yahoo, while in the second it is the BBC that includes the user’s world view in its own wider one. In a sense it is a case of Tagsonomy vs. Folksonomy, or narrow folksonomy vs. broad folksonomy.

Of course both programs are still in their first days. Full of bugs, and of suggestions from us on how to make them better, smoother, and nearer to our personal desires.

Of course, letting anybody add any tag to a copy of the BBC content is full of political dangers. What if stories about important politicians start to be tagged as ‘dictator’ or ‘wanker’? This is in fact inevitable, but politicians would do well to take it as an indication of their popularity, rather than as something to be changed.

At the moment anybody can add a tag on the BBC news page by logging in as ‘guest’/‘guest’. And already we have some people who have tagged some stories as ‘wanker’. But if we go to delicious we see that nearly no one has used such an epithet.

Why is that? My personal position is that people are more careful when tagging something for their own personal use. On delicious everybody has an account. And although you could have as many accounts as you like, they cost. They cost time and memory to set up. So we all tend to have just the minimum number of accounts needed. But on the BBC, at the moment, only BBC people are allowed to have their own account. We normal human beings can only be guests. And as such we might feel less responsible for what we write. So I think that, although the experiment is great, it will only work properly when everybody can set up their own account, and search their own account, or the account of another well-defined person.

Of course this also opens up all sorts of extra possibilities. After all, if anybody can tag any article with their own tags, then each article will have a set of tags defined for it. What if I want to receive (maybe on my mobile) all the articles tagged with a certain keyword? The possibilities are really endless.

And to look at those possibilities the BBC has started a whole new project, called BBC Backstage, where geeks are invited to collaborate with the staff of the BBC to develop the API that lets everybody reuse the BBC material. Cross this with the fact that much of this material is copyrighted with a copyleft copyright (copygotit?), and you see how the whole situation can positively explode.
Imagine: much of the material from the BBC, offered for free, in the way wanted by the best geeks and hackers, to produce information in any noncommercial way they please.

Already many ideas are flowing: an RSS feed for the results of sports matches. Crossing Google Maps with BBC News.

The possibility of having BBC News accept trackbacks.

And many many others.

All this would mingle the BBC with the common people. Think: all the news, mixed and remixed. Commented, trackbacked. Until you can read an article from BBC News on any device (through RSS), in any format you want (through your RSS reader), filtered any way you want (through folksonomy), and see the world’s response to that article (through trackbacks and comments).

Thank you BBC
(and no, I am not paid by BBC)

Thanks also Wired for some inspiration.

Visualizing the double hierarchical nature of entries.

I keep on being haunted by a nightmare:

Think about a post. You write a post, and it answers some other posts, some other web pages, written by someone else. And your post will often be answered by other people. In a sense no post is an island. Given a post, you can see all the posts that answered it, or reviewed it, through the trackback list. And those posts themselves have other posts that answered them. And so on. But this does not work only one way. You can also go backward in time (which in fact is what we usually do when we follow the links). You read a post, then you read the posts that post is referring to, and so on. And in my dream this is a sort of tapestry, where each post is a node that links together different threads. So each post is not just contained in a thread, but connects many threads that run through it.

Now think about a discussion group. In a discussion group each post is part of a tree. Each post can be answered by many posts, but it has only one father: the one post it is itself answering. And because of this structure it is possible, and actually easy, to generate the classical hierarchical structure that you can see pretty much everywhere in discussion groups (e.g. the Healing Dao discussion group).

But if you look closely you will notice that discussion groups do not really have a tree structure. Posts do have one father, yes, but they refer to many other posts. They might not explicitly link to all the posts they refer to, but they surely refer to many. This is because in discussion groups there usually isn’t the need to link to all the relevant posts. After all, the readers are generally a filtered group of people. Also, a person will often use one post to answer a whole bunch of other posts, especially inside a closed community, where everybody reads everything.

Yet the hierarchical way in which posts are displayed in a discussion group is really useful. You can perceive in an instant how many people answered, which threads departed from that post, etc.

Now look at a post in the blogging world. It refers to many other posts. It explicitly links to them. And if it is successful it will itself have many posts linking to it. Now forget for a moment about the upward links. Each post has posts that link to it. In a sense they are replies to it. The links to those posts are saved in the trackback list. And each of those posts will itself have certain posts that refer to it.

Are you starting to see it?
Each post is in a sense the root of a tree, whose branches are the posts that refer to it, and whose sub-branches are all the posts that refer to the branch posts. In a sense, nothing new. But now, if you see your posts in this way, you may wish to display not just the immediate trackbacks, the posts that refer to your post, but their trackbacks too.

And here is the first part of my idea. Since each post is available in feed format, it should be possible to fetch, for each post, not just the trackbacks, but the trackbacks’ trackbacks: the posts that refer to the posts that refer to your post. Which means seeing the tree starting from your post up to depth 2. And in theory it should be possible to iterate the process, and go deeper and deeper.

Why is this important? Well, when you read a discussion group, it is often useful to see the hierarchical view.

Example
Title of the post 0:
BLAH
Content of the post 0:
blah, blah, blah, blah,
blah, blah, blah, blah,
blah, blah, blah, blah,

blah
Blah.
-Trackback 1
--Trackback to the trackback 1
--Second trackback to the trackback 1
-Trackback 2
-Trackback 3
--Trackback to the trackback 3
---Trackback to the trackback to the trackback 3
-Trackback 4
… and so on.

It might seem an expensive search, but when we read a post, and it has a certain number of trackbacks, it is quite important to see which of those led to other posts and which didn’t.
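The depth-limited walk over trackbacks could be sketched roughly as follows. This is only an illustration: it assumes the trackback lists have already been fetched into a plain dictionary (the name `trackbacks`, the function `trackback_tree` and the post titles are all hypothetical), and it leaves out the feed fetching entirely.

```python
def trackback_tree(post, trackbacks, max_depth, depth=1, seen=None):
    """Return outline lines for the posts that refer to `post`.

    `trackbacks` maps a post to the list of posts that trackback to it.
    The walk stops at `max_depth`; one leading '-' is printed per level.
    """
    seen = {post} if seen is None else seen
    lines = []
    if depth > max_depth:
        return lines
    for reply in trackbacks.get(post, []):
        if reply in seen:
            continue  # posts can link in circles; show each one only once
        seen.add(reply)
        lines.append("-" * depth + reply)
        lines.extend(trackback_tree(reply, trackbacks, max_depth, depth + 1, seen))
    return lines


# Hypothetical data, already collected from the trackback lists.
trackbacks = {
    "post 0": ["trackback 1", "trackback 2", "trackback 3"],
    "trackback 1": ["reply 1a", "reply 1b"],
    "trackback 3": ["reply 3a"],
    "reply 3a": ["reply 3a-i"],
}
outline = trackback_tree("post 0", trackbacks, max_depth=2)
print("\n".join(outline))
```

With `max_depth=2` the deepest reply (`reply 3a-i`) is left out; raising the depth brings it in, at the cost of fetching more trackback lists.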

And now we go to the second part of the idea.
In a sense there is no reason why the whole tree view structure should only work one way. I mean, each post links to many other posts. Each of those posts itself links to other posts. And here we have another tree. This time a tree that goes backward in time.

So I think that for each post it should be possible to see both those views.

  • All the entries that are linked from it, and the entries that are linked to those entries, up to a specific depth.
  • All the entries that link to it, and the entries that link to those entries, up to a specific depth.
  • And maybe combine the two views, with the former entries, in the format of one entry per line, above the post, and the latter, again in the format of one entry per line, below it.

I think this view would greatly increase the ability to see the local structure of the blogosphere. Of course the siblings of a particular entry (the entries that share the same parents) should also be available on the side. As well as the entries that are generally linked from the same offspring. But this is making it unnecessarily complicated. So let’s forget it for the time being.

So, we have reached the conclusion that each post uniquely defines two trees of other posts. The tree generated by it, and the tree that generates it. And I claim that we should work to be able to visualize those trees.
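As a rough sketch of how both trees can come from the same data: if we only know which posts each post links to, inverting that mapping gives who links to whom, and the same walk then works in both directions. The names `links_out`, `invert` and `reachable`, and the sample posts, are assumptions made up for this example.

```python
from collections import defaultdict


def invert(links_out):
    """Turn {post: posts it links to} into {post: posts that link to it}."""
    links_in = defaultdict(list)
    for source, targets in links_out.items():
        for target in targets:
            links_in[target].append(source)
    return dict(links_in)


def reachable(start, graph, max_depth):
    """Map every post within `max_depth` hops of `start` to its depth."""
    depths = {}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for post in frontier:
            for neighbour in graph.get(post, []):
                if neighbour != start and neighbour not in depths:
                    depths[neighbour] = depth
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return depths


# Hypothetical blog posts: C answers B, B answers A, D and E answer C.
links_out = {"C": ["B"], "B": ["A"], "D": ["C"], "E": ["C"]}
links_in = invert(links_out)

up_in_time = reachable("C", links_out, max_depth=2)   # what C was answering
down_in_time = reachable("C", links_in, max_depth=2)  # what answered C
```

The first dictionary is the tree that generates the post, the second the tree generated by it; the combined view would list one above the post and the other below.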

Doing it on Tagsurf
So, where did the idea come to me? Essentially while working on tagsurf. Because, you see, tagsurf is maybe the first place where it would be really easy to visualize all this. You have many posts. There is the possibility (although I am not sure if it works right now) to send trackbacks from post to post. So each post need not have only one parent, but can have many. It is true that, as it is now, trackbacks are not used inside the system. The reply is a different thing from the trackback. And each post only belongs to one thread, which started with the first post that was not written as a reply to something. So there are quite a few changes to be made to let this vision take root in that system. But it is possible, and comparatively easier to do than in the blogosphere at large.

These are the changes that I think have to be made to make it possible:

  • Make sure that it is possible to send trackbacks between different posts.
  • Organize all the replies so that they also send a trackback
  • Make sure that each time a post A sends a trackback to another post B, this is also stored inside A
  • Add a ‘view down in time’ page that, for each post, gives you that post, and all the posts that reply (that is, trackback) to that post, and so on
  • Hack this page so that the posts appear in a hierarchical way, where it is very clear who is answering what. Generally the way in which livejournal handles comments is a good way
  • Since you stored all the trackbacks in both directions, organize a ‘view up in time’ page that, for that post, shows you all the posts that entry was answering. And since they were themselves sending trackbacks to other posts, add those other posts as subbranches.
  • Make it very easy, given a certain post, to use those two views, and try taking away the usual thread view. All the information should still be there.

Once the idea is in place you can then cross it with the idea of the tag. You could, for example, investigate one tagsurf entry (blog entry) and one tag. Then only the entries that contain that tag will appear in the two trees. And if an entry does not have that tag, then all its subbranches would be excluded, even if they have the tag. (Thanks Andy for this idea)
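The pruning rule can be sketched like this, under the assumption that the replies and the tags of each entry are already available as dictionaries (`children`, `tags_of` and `filtered_tree` are hypothetical names, and so are the entries):

```python
def filtered_tree(post, children, tags_of, tag, max_depth=3, depth=1):
    """Outline of the entries below `post` that carry `tag`.

    An entry without the tag is dropped together with its whole
    subtree, even if entries further down do carry the tag.
    """
    lines = []
    if depth > max_depth:
        return lines
    for child in children.get(post, []):
        if tag not in tags_of.get(child, set()):
            continue  # prune this entry and everything below it
        lines.append("-" * depth + child)
        lines.extend(filtered_tree(child, children, tags_of, tag, max_depth, depth + 1))
    return lines


# Hypothetical entries: 'b1' has the tag but hangs under the untagged 'b'.
children = {"root": ["a", "b"], "a": ["a1"], "b": ["b1"]}
tags_of = {"a": {"tao"}, "a1": {"tao"}, "b": {"python"}, "b1": {"tao"}}
tao_view = filtered_tree("root", children, tags_of, "tao")
```

Entry `b1` does not appear in `tao_view`, because its parent `b` lacks the tag. The same function works for the tree up in time if `children` holds the entries each post was answering instead.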

Doing it on Technorati
Another service that has all the information to generate those views is Technorati. Of course I would rather see it done in a decentralised way. But it would be so easy for them to do it, while doing it in a decentralised way might be such a nightmare, that I am quite hopeful they will get there first. Think about it. A Technorati page: investigate the local structure of the blogosphere. You pass a URL to this page, and the said structure appears. Up to depth… say 3.

Update: BN (in the comments) points to BlogPulse’s Conversation Tracker as a limited solution to what I was suggesting. It still has many limits, but it is surely a step in the right direction. Besides, it is good to be reminded that Technorati isn’t the only service that observes the blogosphere.

technorati tag & rss

RSS is somehow one of the best ideas. You can have your content, stripped of formatting BS, redirected all around. This gives a one-to-many structure. Now we need the opposite. We need to be able to pull the content from many sites into the same place, and check it. A many-to-one structure.
Most of you will say, “But we already have that, it’s called an aggregator. Just look at bloglines.”

Yes, and no, that’s part of it, but it’s not the whole story. We need to have a page that posts all the content from everywhere in a single page.

And again I can hear: “but we have that too: it’s called a technorati tag“.

Again I will repeat: Yes, and no, that’s part of it, but it’s not the whole story. We need to pull the information from the technorati pages to our aggregator.

This is the idea: we need an RSS feed of a technorati tag. Just as we can get the RSS feed of a del.icio.us tag, we need to have it for all the blogs. The time has passed for adding to your friend list ALL the blogs that might have information of interest. We need to be able to add that RSS feed to our bloglines.

So, either technorati will start releasing the rss, or I predict that:

  • a) other services will start competing with technorati offering that info
  • b) anonymous hackers will start scraping the info from technorati to offer this very valuable information.

See also:semanticweb, tags

Why you shouldn’t use furl

We must be stupid. I am being serious, we must be REALLY stupid.

Is it possible that, after many years of people blowing the whistle on those who collect personal information, we still fall for it? Who am I referring to? Furl, of course. Because, you see, we often act in good faith, and when someone says:

Privacy
Privacy is probably a top priority for you. It certainly is for us at Furl. When you mark an item “private,” we respect your expectation that no one else will be able to see its contents. Other members cannot see your private items when they view your archive, and Furl Search (search all archives) is restricted to public items only. We have designed the Furl system to ensure that your private items and topics are secure. We will not sell your email address or privately-stored information, nor share it except in very specific cases described in our Privacy Policy.

Access to the servers that house your archive is restricted to a very small number of employees. Procedures strictly prohibit accessing a member’s information, except when necessary to diagnose a problem or as specified in our Privacy Policy (such as when ordered by a court of law).

We’re members of Furl, too, and demand the utmost respect for privacy.

We kind of believe we are safe, right? Wrong! Let’s reread it:

We will not sell your email address or privately-stored information, nor share it except in very specific cases described in our Privacy Policy.

Again:

except in very specific cases described in our Privacy Policy.

We can put it in music:
except, except, except…
except, except, except in very specific cases described in our Privacy Policy.
And you should be thankful that this is not a podcast.

But more, at the end of the same page:

Important Note
The contents of this page do not replace, modify or supercede Furl’s Terms of Service and Privacy Policy. Please read them carefully before using Furl.

Let’s go and look at the privacy policy. After all, those people at Furl have our privacy as a top priority. Guess why?
And we don’t need to look very far to understand the true nature of the service:

Who is collecting my information?

Furl usually collects the requested information. However, Furl has chosen select partners in order to provide certain services. In order to use certain services on the Site, it may be necessary to enter information that then goes to our partner and is not kept by Furl.

We contract with Coremetrics, a service partner, to provide us with a data collection and reporting service for our Site. If you access the Site, Coremetrics may collect information about you on our behalf. For further information, including how to opt out of such data gathering, please see: http://www.coremetrics.com/info_eluminate2.html.

In other words: we don’t gather data, we let Coremetrics do it for us. And guess who Coremetrics is:

The company’s flagship product, Coremetrics Online Analytics 2004, is the industry’s only online marketing analytics platform that captures and stores all customer and visitor clickstream activity to build LIVE (Lifetime Individual Visitor Experience) profiles that serve as the foundation for all successful e-business initiatives. Through a patent-pending browser-based data collection technology, the Coremetrics Online Analytics 2004 Data Warehouse gathers and stores behavioral information directly from the visitor’s browser and records interactions in real-time to build LIVE Profiles.

It can hardly get worse than that.

But let’s keep on reading Furl’s Privacy Policy. After all, our privacy is their first thought in the morning. Or so.

How does Furl use my information?

Furl’s primary goal in collecting personal information is to provide you, the user, with a customized experience on our service. This includes, or may include in the future, personalization services, interactive communications, online shopping, and many other types of services. In order to provide services free of charge, we will serve ads using content-targeting technologies, based on the content of your archived items.

But this is not all:

The following describes some of the ways that your information may be disclosed. Please note that this is not a complete list. The ways your information may be disclosed will change from time to time.

So even the privacy policy is not complete.

Or read this:

Coremetrics: Coremetrics may store certain data that we received from visitors to Furl (which may include email addresses), so that we may access this information via their reporting service. Furl will only use information shared with Coremetrics for proprietary Furl purposes. Coremetrics does not have the right to transfer your information to any party other than LookSmart.

Business Partners: LookSmart may disclose your personal information to our business partners in order to provide you with the services on the Site. If you have questions regarding the privacy policy or data-collection practices of one of our business partners, please contact that partner directly.

We are told the information is disclosed to business partners, but we are not told to whom. Yet we are asked to look at their privacy policies to understand what use they make of this information.

They also spy on when you read their e-mails:

We may also collect information through the use of “pixel tags” included in email messages we may send to you. Pixel tags are tiny graphic files, not visible to the human eye, that are included in HTML-encoded email messages. When such a message is opened in an HTML-capable email program, the recipient’s computer will access our server to retrieve the pixel tag file, allowing us to record and store, along with the recipient’s email address, the date and time the recipient viewed the email message, that the recipient’s email program is capable of receiving HTML-encoded email, and other standard logging information. The pixel tag also may see or read cookies.

The policy goes on, and forgive me for not analysing it all. I just didn’t have the guts. I understood what I wanted, and here are my conclusions:

Conclusions
Furl collects personal information, including your e-mail address, and gives it to online partners for commercial purposes. Thus I don’t want to use Furl, and probably neither do you.

In short: Furl Sucks.
Amen.

Delicious Map Maker Available

I finally made it. These holidays in Rome have been productive. I made a tool to automagically make a clustered delicious mind map. You need to have Java installed; I’ll do the rest. It’s still pre-alpha, but it seems to be working fine so far.

I used the previous algorithm, only debugged. You see, before I started programming those maps I had never programmed in Python. So those are my first attempts. The more I learn, the more shortcuts I discover. The source code is here.

To test the program I needed a lighter account (my delicious account has 400 tags right now), so I started a new account just to bookmark all the entries in this blog, and… wow. The map looks really nice!

I also added the tool to the general page with all the various indexes of the versions of the delicious mind map.

All the maps that are completed are added to the end of the page. I think this is fair. I am really looking forward to seeing other people’s maps, too. If you run the tool more than once, your name will appear on the page more than once. Hopefully this should stop people from running the tool more than once a day… please.

I sincerely hope it will be successful without giving me massive space problems.

A house divided

As the price of houses rises, more and more people find that the best solution is to divide a house among friends. Usually each person gets a room. The problem then is: who gets which room, and how much should each person pay? Usually the total rent is fixed, and usually the rooms are not all exactly the same. Some might be bigger, some smaller. Some might have a better view, more privacy, closeness to the toilet, more silence, and so on. And what’s also important is that different people might value the various elements in different ways.

I present here two ways of splitting the rent and dividing a house. I personally favour (and have designed) the second, but while I was presenting this method to some friends to get some feedback, I was told about the other; it seemed simpler, yet interesting enough to add here. They both assume that:
a) the rent is fixed,
b) there is no favouritism among the housemates-to-be about who gets to choose first.

The ‘find the objective value first’ method.

Before the rooms are assigned, get together and agree on the objective value of each room (e.g. 20% of the rent for this one, 50% of the rent for that one). The values must of course add up to the whole rent. Then randomly select who gets which room (at the agreed price), and as a final step people are allowed to exchange rooms if they want to.
Positive element: it is simple and quite straightforward.
Negative element: it assumes that people can easily agree on the actual relative value of the rooms, and that such value does not vary from person to person.

The ‘each person gets the best room’ method.

As I said, this is the method that I love most. First of all, let each person inspect all the rooms. Then each person secretly writes the relative value of each room on a piece of paper. The sum of the values must be equal to the requested rent. The idea is to divide the house so that each person gets a room and pays for that room the value THEY wrote on the piece of paper, while the sum of the values paid by all the persons covers the requested rent in full.

Obviously, very often, the collected money would then be higher than the rent. Let’s call the collected money minus the monthly rent, the ‘extra money’.

Often there is more than one solution that leaves some extra money each month. When this happens, the solution that maximizes the extra money is chosen. The extra money is then used to pay for the electricity, any extra expenses, or whatever is needed for the house.

Sometimes there is more than one optimal solution: several solutions generate the same extra money, with everybody paying the value they wrote for their room, and all other solutions generate less. In that case the adopted solution will be one of the optimal ones, randomly chosen.

Examples, examples:
Let’s suppose we have a house with 3 rooms (a, b, and c) and 3 persons (A, B, and C). Let’s suppose the total rent is 100.

Person A might find the three rooms equivalent, so he might just write (a: 33.3, b: 33.3, c: 33.3). Person B might instead favour room b, because it is sunnier and she likes to paint, and she thinks that room a is slightly better than room c; in fact she would prefer not to be in room c at all, so she would write (a: 35, b: 40, c: 25). Person C instead does not care about the sun, but has noticed that room a has more privacy and is near the toilet, and since he likes to have his gf as a guest, he thinks that having room a would be a better deal. So he votes (a: 40, b: 30, c: 30).

Then the papers are revealed.

Generally, when one person values a room more than everybody else does, and also values that room more than all the other rooms, that room is taken by that person at the price he has chosen.

In our example we have:
A: (a: 33.3, b: 33.3, c: 33.3)
B: (a: 35, b: 40, c: 25)
C: (a: 40, b: 30, c: 30)
which would give us that A would get room ‘c’, paying one third of the rent, B would get room ‘b’, paying 40% of the rent, and C would get room ‘a’ for 40% of the rent. The collected money each month would be 33.3+40+40=113.3. The extra money would be 113.3-100=13.3 and would be used to pay for the electricity, water, gas, or whatever.
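For a house with a handful of rooms, the search for the assignment that collects the most money can be done by simply trying every possibility. The snippet below is only a minimal sketch of that idea on the example above; the names `bids`, `collected` and `allocation` are made up for the illustration, and brute force over permutations is an assumption that only makes sense for small houses.

```python
from itertools import permutations

rent = 100
rooms = ["a", "b", "c"]

# What each person secretly wrote down (values from the example above).
bids = {
    "A": {"a": 33.3, "b": 33.3, "c": 33.3},
    "B": {"a": 35, "b": 40, "c": 25},
    "C": {"a": 40, "b": 30, "c": 30},
}
people = list(bids)  # ["A", "B", "C"]


def collected(assignment):
    """Money collected if person i takes room assignment[i] at their own price."""
    return sum(bids[person][room] for person, room in zip(people, assignment))


# Try every way of handing out the rooms and keep the one that collects most.
best = max(permutations(rooms), key=collected)
allocation = dict(zip(people, best))
total = collected(best)
extra = total - rent

print(allocation, round(total, 1), round(extra, 1))
```

With these numbers the only maximum is A in room c, B in room b and C in room a, which is the allocation described above.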

It is also possible to renormalise the prices, by lowering them so that the total sum becomes exactly the cost of the rent, while the relative ratios remain the same. In our example
A: (33.3/113.3)*100=29.4
B: (40/113.3)*100=35.3
C: (40/113.3)*100=35.3
and person A would pay 29.4% of the rent (since he took the room nobody wanted)
person B would pay 35.3% of the rent (and took the sunny room)
person C would pay 35.3% of the rent (and took the room with more privacy)
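The renormalisation is just a proportional rescaling, so it fits in a couple of lines. This is a small sketch, with `renormalise` being a name invented here:

```python
def renormalise(prices, rent):
    """Scale the agreed prices so they add up to exactly the rent."""
    total = sum(prices.values())
    return {person: price * rent / total for person, price in prices.items()}


# Prices each person pays before renormalising (from the example above).
prices = {"A": 33.3, "B": 40, "C": 40}
shares = renormalise(prices, 100)

print({person: round(share, 1) for person, share in shares.items()})
```

The ratios between what each person pays stay the same; only the total changes.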

So, what if the situation is not that easy, and there isn’t a different person who prefers each room? For example you could be in a situation like:
A: (a: 45, b: 45, c: 10)
B: (a: 40, b: 40, c: 20)
C: (a: 40, b: 30, c: 30)
Well, in this case it is obvious that person A will get either room a or room b. But it is also obvious that room c will go to person C. So C gets c at 30% of the rent. Both A and B value rooms a and b equally. But once the rooms are assigned, person A will pay more than person B, so it seems fair to me that person A chooses a or b and pays 45, and person B gets the remaining room, but pays less (40).

But things can get even more complicated if some people value some rooms exactly the same:
A: (a: 45, b: 45, c: 10)
B: (a: 45, b: 45, c: 10)
C: (a: 40, b: 40, c: 20)
in which case A and B obviously have to choose randomly who gets what.

Or if the situation is symmetric among the rooms:
A: (a: 40, b: 30, c: 40)
B: (a: 40, b: 40, c: 30)
C: (a: 30, b: 40, c: 40)
In which case you randomly choose whether A gets a or c, and then the others follow obviously.
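When several assignments tie for the maximum, one way to handle it is to collect all of them and draw one at random. The sketch below does this for the symmetric example; the function name `optimal_assignments` and the small tolerance used to compare totals are assumptions of the sketch.

```python
import random
from itertools import permutations


def optimal_assignments(bids, rooms, tol=1e-9):
    """Return the best total and every assignment that reaches it."""
    people = list(bids)
    scored = [
        (sum(bids[p][r] for p, r in zip(people, perm)), dict(zip(people, perm)))
        for perm in permutations(rooms)
    ]
    best_total = max(total for total, _ in scored)
    # Keep every assignment whose total is (numerically) equal to the best one.
    ties = [assignment for total, assignment in scored if best_total - total <= tol]
    return best_total, ties


# The symmetric example above.
bids = {
    "A": {"a": 40, "b": 30, "c": 40},
    "B": {"a": 40, "b": 40, "c": 30},
    "C": {"a": 30, "b": 40, "c": 40},
}
best_total, candidates = optimal_assignments(bids, ["a", "b", "c"])
chosen = random.choice(candidates)  # the random pick between equal solutions

print(best_total, candidates)
```

Here two assignments collect 120: either A takes a (then B takes b and C takes c), or A takes c (then B takes a and C takes b).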

So here we have the first method, where everybody chooses the values together, as a special case: it is equivalent to this method when everybody agrees on the relative values:
A: (a: 35, b: 40, c: 25)
B: (a: 35, b: 40, c: 25)
C: (a: 35, b: 40, c: 25)
After which, also in this method, you would randomly pick who gets which room.

Please let me know if you have tried it and if it was successful.

Clustering Delicious Tags

I went on programming my favourite Python program: Delimind.

In short: Made a new release of the Deli Mind program. Here is the source code (just remember to change it from a .txt to a .py). Now similar tags are clustered together.

  1. Here is what it looks like.
  2. Here is what the previous version looked like.
  3. The original from Brownhen (may he live long and prosper) used to be here, although now it is missing.

All on the same data. Mine, now.
Go and enjoy.
(Later addition: while the program works well for small databases of links, like mine at the time I wrote this entry, it doesn’t scale well. It crashes for most of the people who try to use it with more than 1000 bookmarks. For this reason I was forced to change the link on the cluster example to a database with fewer nodes.)

Now the technical stuff for those who have a bit more patience.

Tags are not all the same, some are more similar than others. So, for example, the tags “September11” and “GeorgeBush” have more links in common than “GeorgeBush” and “intelligence”. The idea behind this version of DeliMind was to cluster tags that had links in common. The catch is that distance is generally not a transitive property (if I am near to you, and you are near to Jim, I am not necessarily that near to Jim), while clustering is (if you and I are in the same cluster, and you and Jim are in the same cluster, then Jim and I have to be in the same cluster… unless people belong to different clusters, but that’s a complication).

So I started by making a matrix of relations among tags (all_dict). Each tag, with respect to each other tag, could be either

  1. One contained in the other
  2. Identical
  3. Disjoint
  4. With # bookmarks in common
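If each tag is stored as the set of links filed under it, the four cases above map directly onto Python set operations. This is only a sketch of the idea: the function `relation`, its return strings and the sample links are invented here, and the real structure of `all_dict` is not shown in this post.

```python
def relation(links_a, links_b):
    """Classify how two tags relate, given the set of links under each tag."""
    common = links_a & links_b
    if links_a == links_b:
        return "identical"
    if not common:
        return "disjoint"
    if links_a < links_b:      # strict subset: every link of a is also in b
        return "a inside b"
    if links_b < links_a:
        return "b inside a"
    return "%d in common" % len(common)


# Hypothetical bookmarks, just to exercise every case.
tag_links = {
    "python": {"link1", "link2"},
    "programming": {"link1", "link2", "link3"},
    "delicious": {"link4"},
    "del.icio.us": {"link4"},
    "GeorgeBush": {"link5", "link6"},
    "September11": {"link6", "link7"},
}

print(relation(tag_links["python"], tag_links["programming"]))
print(relation(tag_links["delicious"], tag_links["del.icio.us"]))
print(relation(tag_links["GeorgeBush"], tag_links["September11"]))
print(relation(tag_links["python"], tag_links["GeorgeBush"]))
```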

Then, according to the number of links in each of the two tags and the number of links in common, I invented a measure of similarity. If #A is the number of links in tag A, #B is the number of links in tag B, and #AB is the number of links in common,
then the relative similarity (SAB) will be:
SAB= sqrt((#AB/#A)*(#AB/#B))

I actually played with various measures:
SAB= ((#AB/#A)+(#AB/#B))/2
SAB= Max(#AB/#A,#AB/#B)
They all went from 0 to 1, and were quite similar… (I am not going to discuss their relative properties.)
But the first one just seemed the one that made the most sense, and in the end the resulting map was the one closest to my personal intuition of what should be in which cluster.
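To get a feeling for how the three measures differ, here they are side by side as plain functions. The function names and the sample counts are made up for the illustration.

```python
from math import sqrt


def sim_geometric(n_a, n_b, n_ab):
    """sqrt((#AB/#A) * (#AB/#B)), the measure finally adopted."""
    return sqrt((n_ab / n_a) * (n_ab / n_b))


def sim_arithmetic(n_a, n_b, n_ab):
    """((#AB/#A) + (#AB/#B)) / 2"""
    return ((n_ab / n_a) + (n_ab / n_b)) / 2


def sim_max(n_a, n_b, n_ab):
    """Max(#AB/#A, #AB/#B)"""
    return max(n_ab / n_a, n_ab / n_b)


# Hypothetical case: a small tag (10 links) entirely contained in a big one (40 links).
n_a, n_b, n_ab = 10, 40, 10
print(sim_geometric(n_a, n_b, n_ab))   # 0.5
print(sim_arithmetic(n_a, n_b, n_ab))  # 0.625
print(sim_max(n_a, n_b, n_ab))         # 1.0
```

In this case the Max version says the two tags are as similar as possible just because one is contained in the other, while the geometric version also takes into account how much bigger the second tag is.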

Once the similarity matrix was done I started studying the clusters. Generally for each triplet of tags A, B, C I would modify
SAC:=min (previous SAC, max (SAB, SBC))
And I would continue going through all possible triplets, and then start again from the beginning, until no new changes were happening.

Why? The idea is that the similarity between two tags measures how easy it is to jump from one to the other. Visualise each tag as an island, and then you have an animal who can jump from one island to the other. But it can only jump up to a certain distance. So if it can find a succession of tags between two tags, A and B, where the similarity (the similarity is the inverse of the distance) is always above its jumping ability (that is, the distance is below its jumping ability), then the animal can move from A to B. If not, A and B are in different clusters. Effectively unreachable.

But we don’t know how far our beast can jump. So in this way we end up with a similarity number that says: somewhere between A and B it is possible to find a succession of tags such that the distance is never above x, so SAB is equal to the minimum between the original SAB and x.
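Read as a rule on distances, the triplet update above keeps, for every pair of tags, the smallest "longest jump" over all possible chains of islands. The sketch below writes the same idea directly on similarities, where the weakest jump of a chain is the min and the best chain is the max. That translation, the Floyd-Warshall style loop order (instead of sweeping the triplets until nothing changes) and the numbers are all assumptions of the sketch, not taken from the DeliMind source.

```python
def bottleneck_closure(sim):
    """For every pair, find the best chain of tags, scored by its weakest jump.

    sim: dict tag -> dict tag -> similarity in [0, 1], symmetric,
         with sim[t][t] == 1.0 for every tag.
    """
    tags = list(sim)
    closed = {a: dict(sim[a]) for a in tags}  # work on a copy
    for b in tags:                # b is the intermediate island
        for a in tags:
            for c in tags:
                via_b = min(closed[a][b], closed[b][c])  # weakest jump of a -> b -> c
                if via_b > closed[a][c]:
                    closed[a][c] = via_b
    return closed


# Hypothetical similarities between three tags.
sim = {
    "GeorgeBush":  {"GeorgeBush": 1.0, "September11": 0.6, "usa": 0.1},
    "September11": {"GeorgeBush": 0.6, "September11": 1.0, "usa": 0.5},
    "usa":         {"GeorgeBush": 0.1, "September11": 0.5, "usa": 1.0},
}
closed = bottleneck_closure(sim)
print(closed["GeorgeBush"]["usa"])  # 0.5, reached through September11
```

After the closure, "being reachable with jumps of similarity at least x" is transitive, which is exactly what is needed to turn it into clusters.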

If it does feel complicated don’t worry. I got confused a few (hundred) times programming it. And just could not understand why those damn tags were not clustering… until I got it right.

So, now you have this nice matrix, only between your main tags (the ones that are not contained in another tag, cf. the previous version), and you (or actually I) need to cluster the tags.

Note also that you don’t need to cluster the tags only once. Once you have made a clustering (for an animal which can jump d), you can still partition inside each cluster for animals that can jump less than d.
The first time I just asked it to cluster at each possible number. That is, if a number was present, assume that someone was able to jump exactly that distance. In this way I got a heavily clustered map. It was a mess, but a promising mess. I then saw that most of the interesting things were happening between distances of 0.333333 and 0.6666.

That is, it made sense to ask for the clusters generated by putting together tags that had one third of the links in common, and tags that had up to two thirds of the links in common.
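Clustering at a given threshold, and then again inside each cluster at a stricter threshold, can be sketched like this. The union-find approach, the function name `clusters_at` and all the similarity values are assumptions made up for the illustration; only the tag names and the two thresholds come from this post.

```python
def clusters_at(tags, sim, threshold):
    """Group tags linked by chains of pairs with similarity >= threshold."""
    parent = {t: t for t in tags}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), s in sim.items():
        if a in parent and b in parent and s >= threshold:
            parent[find(a)] = find(b)      # merge the two groups

    groups = {}
    for t in tags:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise similarities (pairs not listed count as 0).
sim = {
    ("porno", "sex"): 0.8,
    ("sex", "eros"): 0.7,
    ("GeorgeBush", "September11"): 0.7,
    ("September11", "politics"): 0.4,
    ("green", "sustainability"): 0.5,
    ("politics", "green"): 0.1,
}
tags = ["porno", "sex", "eros", "GeorgeBush", "September11",
        "politics", "green", "sustainability"]

coarse = clusters_at(tags, sim, 1 / 3)
# Partition again inside each coarse cluster with the stricter threshold.
fine = [clusters_at(cluster, sim, 2 / 3) for cluster in coarse]
print(coarse)
print(fine)
```

With these made-up numbers the coarse pass gives three clusters, and the fine pass splits "politics" away from the other two political tags and separates "green" from "sustainability".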

This is how I got clusters:

  • porno, sex and eros
  • GeorgeBush, September11, politics, economy, historical, terrorism, usa
  • green, sustainability

Example of the Clustered Map
Then I just applied the same process in the subtags of each tag.

Ok, I can be satisfied, I can go and have something to eat.

As always, if you find it useful drop me a line, I appreciate it.

Pietro

Hierarchical Delicious Free Mind Map

So, I just modified the deli.mind script, originally from brownhen.
The original would take the public bookmarks from delicious and make a free mind map out of them.

(For those who have no time to read the whole post, I will tell you immediately that I modified the code. The new code can be found here, and an example is here; open some nodes to see the difference!)

The program is written in Python, and I wasn’t very happy with the result. I mean, it was great to have the map, but at the same time I have so many tags that it was pretty much useless. Now the fact is that we tend to reuse tags that we have already used. This generates a positive feedback dynamic that tends to create a bunch of very common tags (even among your own tags) and many, many tags used only once or twice. I bet you could also plot them into a nice power law picture (but, alas, you need at least 1000 tags to make it statistically meaningful!). This is generally true, but it is particularly true for people who, like me, tend to store each link with around 10 different tags. This means that this long list of tags, which was using up my screen, was mainly composed of completely unimportant tags, with only a few interesting ones among them.

Not only this, but some tags tend to appear only in conjunction with other tags. For example, the tag “python” always comes with the tag “programming”. In a sense it is a “sub tag”.

Oops, we are back into hierarchy, aren’t we?

Well, not exactly. First, the same link can be present in different, non hierarchically related tags, and second, two tags can have links in common but not be completely hierarchically related (think about the tags ‘September11’ and ‘GeorgeBush’ as a good example). The last thing to note is that from time to time there are tags which have exactly the same links inside, either because they are synonymous (‘del.icio.us’ and ‘delicious’ for example) or because I had not stored enough links to differentiate between the two.

So the new program extracts the information about the relation among the tags, and uses it to build a more interesting mind map.

More precisely, two tags can be:

  • Identical,
  • One inside the other,
  • Vice versa,
  • With a non empty intersection, but with some extra links,
  • Completely disjoint.

This information is then used to create the new mind map.

With the following novelties:

  • Sub tags are shown as a sub branch of their parent tag.
  • Tags that are equivalent are shown together with a little empty branch as their parent, to connect them all.
  • A sub tag can be sub tag to more than one tag.
  • Each tag also is followed by two numbers: # of links & # of sub tags.
    So you have an idea about how big is the tree you are going to explore.
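The novelties above can be sketched starting from a dictionary that maps each tag to the set of its links. This is not the actual code of the script: `build_hierarchy`, the label format and the sample bookmarks are invented for the illustration, and the sketch lists every tag contained in another one, without distinguishing direct sub tags from deeper ones.

```python
def build_hierarchy(tag_links):
    """Return (equivalent tag groups, sub tags per tag, display labels)."""
    # Tags with exactly the same links are equivalent.
    by_links = {}
    for tag, links in tag_links.items():
        by_links.setdefault(frozenset(links), []).append(tag)
    equivalents = sorted(sorted(g) for g in by_links.values() if len(g) > 1)

    # A tag is a sub tag of every tag that strictly contains its links,
    # so the same sub tag can show up under more than one parent.
    subtags = {
        tag: sorted(other for other, other_links in tag_links.items()
                    if other_links < links)
        for tag, links in tag_links.items()
    }

    # Label each tag with its number of links and number of sub tags.
    labels = {
        tag: "%s (%d links, %d sub tags)" % (tag, len(links), len(subtags[tag]))
        for tag, links in tag_links.items()
    }
    return equivalents, subtags, labels


# Hypothetical bookmarks, identified by number.
tag_links = {
    "programming": {1, 2, 3},
    "scripting": {1, 2, 4},
    "python": {1, 2},
    "delicious": {5, 6},
    "del.icio.us": {5, 6},
}
equivalents, subtags, labels = build_hierarchy(tag_links)
print(equivalents)
print(subtags["programming"], subtags["scripting"])
print(labels["programming"])
```

In this made-up data “python” ends up as a sub tag of both “programming” and “scripting”, and the two spellings of delicious are grouped together as equivalent.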

Detail of a tag and its sub tags
You can see my “hierarchical delicious free mind map” in java format here while the code is here.

I also fixed a couple of bugs that would give some fake results (i.e. being tagged as ‘socialsoftware’ does not mean being tagged as ‘war’, etc…).

This isn’t the end, I am planning to work on this some more, when I have time.

Disclaimer: This was also my first attempt at hacking in Python. So I am sure I did plenty of things in a clumsy, slow and redundant way. But I am learning.

Acknowledgment: I am very grateful to brownhen, because if he hadn’t released the first version of the script I would not have started at all.