COP15 Needs an e-Government System

This morning I received a mail from Copenhagen. It was very moving, describing a situation of chaos, strong commitment, and bravery. It told the story of people fighting with nonviolence and shouting that they want change.

And I am afraid all this is useless. I feel once again what I felt watching the uprising in Iran. But stronger.

Let’s focus on Copenhagen. The sensation is that there are a lot of people on the street asking for a strong carbon tax. Count me among them. But there is more. I am afraid people have ideas, and those ideas are not being heard. And then people assume the worst, and assume that the world leaders, the ministers, and everybody who is inside are on the payroll of some big corporation. And then they demand change. But now they no longer focus on the small change, the key point. Now they want a huge change that will not happen. And then there are rallies, and people pushing, and the police resisting. And violence. Yes, police violence should not be there. But I feel this is not the way. It is not by shouting “Shame on you” that you win the hearts of the policemen. It is not by shouting at people that you get yourself heard, just as it changes nothing whether I write this in normal letters or in CAPS LOCK. It is the content that matters. And when you are shouting, when you are polarized, you are already doing violence. This is not the way.

Now, there are people who work hard to negotiate among different positions. The Center for Nonviolent Communication is probably one of the best. It was created by Marshall B. Rosenberg, one of the students of Gandhi (or so I remember from his book; the Wikipedia page does not seem to mention it). Marshall has worked in the past as a negotiator between groups, and I am sure there are a number of very good negotiators working inside the conference to negotiate between the key people. What I don’t think there are, are negotiators between the people in the conference and the people outside.

It is as if all the effort is concentrated on getting the communication going between those big players, while no work has been done to get enough communication between the inside of the conference and the outside. The assumption seems to be that either there are no good ideas outside, or it is just impractical to engage them. I think both of those assumptions are wrong. Yes, we still need to develop the tools to run an efficient brainstorm with millions of people. But the idea of having everybody write their own ideas and vote on the ideas they like is already a good start. Why is there no system like this to harvest the ideas from the people?

I was just looking at a CNN conference on YouTube where people sent in questions and voted on the questions. Again there is the assumption that normal people are just ignorant. This is not true. Not anymore (if it ever was). Not with the internet, which lets anyone study any topic.

In all those situations we need to set up systems where people can chip in their ideas and, while it is happening, read each other’s ideas, so that the most voted ideas emerge from the noise and reach the people who are making the discussion.
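To make the idea concrete, here is a minimal sketch of such a system. Everything in it is an assumption made for illustration: the class name `IdeaBoard`, its methods, and the rule of one vote per person per idea are mine, and no existing tool is being described.

```python
from collections import defaultdict


class IdeaBoard:
    """Toy idea board: anyone can submit an idea and vote on ideas they like."""

    def __init__(self):
        self.ideas = {}                  # idea id -> text of the idea
        self.voters = defaultdict(set)   # idea id -> names of people who voted

    def submit(self, author, text):
        idea_id = len(self.ideas) + 1
        self.ideas[idea_id] = text
        # Assumption: authors count as the first supporter of their own idea.
        self.voters[idea_id].add(author)
        return idea_id

    def vote(self, user, idea_id):
        if idea_id not in self.ideas:
            raise KeyError(idea_id)
        # Using a set means a repeated vote by the same person changes nothing.
        self.voters[idea_id].add(user)

    def top(self, n=3):
        """Return the n most voted ideas as (text, votes), oldest first on ties."""
        ranked = sorted(self.ideas, key=lambda i: (-len(self.voters[i]), i))
        return [(self.ideas[i], len(self.voters[i])) for i in ranked[:n]]
```

Counting people rather than clicks is the design choice that matters here: shouting the same vote twice does not make it louder.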

In memory of David Carradine, what would Master Po say?

I wonder how Caine would have commented on David Carradine’s death. Especially the modus operandi.

I can see it. The light dissolves, and a new scene is in front of our eyes:

Master Po walks. Young Caine is thoughtful…

-What’s the matter, Grasshopper? Continue reading In memory of David Carradine, what would Master Po say?

The new project: TagBay. Tagging e-Bay

It is now time to present the next project we have been working on: TagBay. And I say ‘we’ because in this project I am not alone. I did it with a friend of mine, Derek, who accepted, very patiently, to code some of the ideas I have been tinkering with in the last year or so. I am speaking about tags, tag clouds, distance between tags, and so on.
So, in brief, we made a web site to tag material that is being sold on e-Bay. Anybody can tag any object that is being sold. Not only can any object be tagged, but you can tag sellers, too (oh, we are not responsible for offensive tags, eh!).
Tags on objects can be made private or public, and you can search among your tags, among everybody else’s tags, and eventually (when we code it) it will be possible to search among the tags of another user, like in del.icio.us.

Now that the summary for the people who have no time is done, let’s try to explain the idea in detail for those who have a bit more time.

Tag Bay: Tagging e-Bay

Pages:
On TagBay, right now, there are 5 types of pages: e-Bay Search Page, TagBay Tag Search Page, TagBayUser Tag Search Page, Item Page, and Seller Page.

  • Search Page: From inside TagBay it is possible to search e-Bay for specific keywords. The user can then add tags to each object that comes up, and store the tags either all at once or for a single object. The same thing can be done in the Tag Search Page.
  • TagBay Tag Search Page: In this page the user gets all the results for a single tag that someone has used. Nothing fancy (for now). Items where the tag appears only as a private tag will not appear here.
  • TagBayUser Tag Search Page: In this page the user gets all the results for a single tag that a given user has used. If the user is logged in and is looking at his own tags, the items tagged privately will appear as well.
  • Item Page: Each object has its own page. From this page any user can see the public tags that other users have used for that object. They can also define their personal tags for that object, choose whether those tags are going to be private, and set the tags of the seller.
  • Seller Page: And then there is the seller page, where any user can tag any seller. The use of tags for sellers is still limited, but will be increased in the future.
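The visibility rules of the two tag search pages can be sketched in a few lines. This is only an illustration under my own assumptions: the `Tagging` record and the function names are hypothetical, and this is not the actual TagBay code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagging:
    """One user putting one tag on one item (hypothetical record layout)."""
    user: str
    item: str
    tag: str
    private: bool = False


def tag_search(taggings, tag):
    """TagBay Tag Search Page: items that carry `tag` publicly at least once."""
    return sorted({t.item for t in taggings if t.tag == tag and not t.private})


def user_tag_search(taggings, owner, tag, viewer=None):
    """TagBayUser Tag Search Page: private tags are shown only to their owner."""
    show_private = viewer == owner
    return sorted({
        t.item for t in taggings
        if t.user == owner and t.tag == tag and (show_private or not t.private)
    })
```

An item tagged only privately never shows up in `tag_search`, while `user_tag_search` returns it only when the logged-in viewer is the owner of the tag.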

The natural use of the site

  • For a seller or for a shop: A seller might want to use the site to tag all the objects that he is selling, giving each object all the related tags, thus increasing the possibility of it being found. We suggest listing the tags in order of importance, as we are soon going to use the order to weigh importance in the search page.
    Also, if a person wants to make a cool list of objects, they can tag exactly those objects with a tag they never used, and then link to the page of this tag in their directory, thus creating their lists on the spot. Sellers will want to tag their objects, and people making searches will tag objects to make lists of objects they want to follow before jumping on a transaction. We think there is more than enough material to generate interesting behaviour. It doesn’t have to be exactly the same emergent behaviour that we are used to seeing. After all, we are just exploring the possibilities of social folksonomy.
  • A shop: In addition to the possibilities above, a shop that is selling on e-Bay might want to make sure that the shop itself (remember that you can tag sellers, and not only objects) has all the tags related to the merchandise it is selling.
  • Someone buying: Our suggestion for someone who wishes to buy on e-Bay would be to first look under the tag search, to see if anybody has already tagged any object they are interested in. This need not be someone else who is buying; it can also be someone who is selling. Then they can tag the objects they are interested in themselves, to have them in their own list of objects. Then they could go to the e-Bay search page with the necessary keywords, and add the chosen tag to all the interesting objects. At that point a first selection has been made, and all the possible objects have been tagged. Now they can choose one of those objects, change its tags to private, and start bidding on it.
  • Someone suggesting: And finally, if someone is just trying to suggest some possible objects, he could search e-Bay for those objects, tag them with a unique tag, and present the URL of the list to whoever is interested.

There are many other ways to use TagBay. In a sense TagBay is a toy, and not a game. And like every good toy it can be used for many different games. We suggest here only some of them. Also, TagBay itself is rapidly evolving. We have tons of stuff we are interested in including, and if you have been reading my blog, you know that my problem is always finding people to code my ideas, more than finding the ideas. And this is why I am so happy with Derek’s work!

Difficulties that we found:
There were a number of issues that came out when we started developing this program.

  1. Public vs private tags:
    Why would someone tag an object if they are interested in buying it? After all, aren’t they making it easier for others to find it by adding those tags?
    This was a serious doubt that we had, and finally we decided to give users the possibility to tag objects privately. Yet there has to be a balance between private tags and public tags, as public tags are necessary to generate the emerging folksonomy that we wish to use. So we decided on a compromise: public tags can be added from the search page, but private tags require you to go to the specific object page. In our view (but we are ready to be proven wrong) someone would go to the search page, tag all the entries he might be interested in, then choose one, and tag that one in a private way.
  2. Limitations due to the temporary nature of the objects
    Considering that most objects exist on e-Bay for only a few weeks before being sold, wouldn’t this be too little time to build a tag cloud and let all the cool emergent properties that folksonomy induces appear?
    Maybe, but sellers can also tag the objects they are selling, thus giving a fresh start to all the objects. Also, side by side with tagging objects, we are giving the possibility to tag sellers, whose tags should eventually survive each transaction and build up an interesting tag cloud.
  3. I spoke about sellers tagging their own objects, but wouldn’t this invite people to spam your site? After all, wouldn’t it be much better for a seller to add many tags to be present in many searches?
    Ah ha! You think tag clouds can be spammed. This is false. Tag clouds cannot be spammed, and no one understands this. And we shall use this site to prove it. We have nothing against spammers; they are absolutely welcome on our site, and can spam it as much as they feel like. They can add all the tags they want to each object they sell. It will make ABSOLUTELY NO DIFFERENCE in the search page. Tag clouds are unspammable. And our engine will use tag clouds as its base. Everybody else uses tag sets, and this makes them easily spammable. So, no, we don’t fear spammers. In fact we hope that spammers will come to our TagBay site. They are just people trying to sell their stuff, and we are trying to make sellers meet with buyers. Wouldn’t it be bad to single out spammers just because they are spammers?

TagBay is obviously still in beta, and there are many things that need to be coded. If you have any idea on how to make it better, please do not hesitate to contact me. If you want to make a difference to what the final product will be, now is the time to do it. Also, every new suggestion that gets implemented will be listed on a special page with a link to the original suggester’s home page.

Sleeping patterns: when is it better to sleep

In the last weeks I have been interested in polyphasic sleep. Polyphasic sleep is a sleeping pattern where the person does not sleep in one big chunk, but in many roughly equivalent pieces throughout the whole day. The first time I heard about it was from my father, some twenty years ago. The pattern is sometimes used by solo sailors on long ocean trips. My father has always loved to sail, and read many books on the subject; so that’s how he knew about it. According to those books Leonardo da Vinci was a polyphasic sleeper, sleeping some 15 minutes every 2 hours.

But I don’t want to discuss polyphasic sleep in this post. I want to describe everything else I know about sleep, so that in a later post I can say: “And this has totally blown away everything I knew about sleep” (with a link!). It’s like when in Go you play a stone that is not that important, but such that later you can link to it. You build your framework.

And if everything I say in the rest of this post sounds like pseudoscience, it is because it mostly is. It comes out of personal observations, some lessons explained, learned, and integrated, but no scientific work that I know of (or that I searched for).

My knowledge about sleep originates from a lesson I received some sixteen years ago, at the first yoga class I went to. The teacher explained to me that not every hour is equally important for sleep. There are some moments that are definitely more important, and others less. Some time between Continue reading Sleeping patterns: when is it better to sleep

Small China

We have all heard the news that do-no-evil Google has agreed to comply with Chinese laws and ban some words from the search results (Google testimony here). More than that, China is censoring media, editors, journalists, blogs, and practically any form of free expression. According to this article this censorship is not having the effect the government desired. The only reason they give is that there are simply too many blogs.

Well, I have a different idea. I think that censorship is not merely useless as a strategy for China’s government. It is counterproductive. It is making the Chinese blogosphere stronger. Let me explain why I think so. Continue reading Small China

wikitags

I think it’s time to present what I have been doing in the last days. A number of improvements have been added to this web site. In short, I have upgraded to WordPress 2.0. I also moved to the next version of Wikka. Some of you might remember that I offered some money to whoever could write some code to get the tag plugin to generate an RSS list. I didn’t, at the time, explain why. I will now.

WordPress 2.0 gives the possibility to start categories on the fly, just by listing them. Essentially this makes categories in WordPress work like tags (or keywords, for academics). But categories in WordPress also have an RSS feed connected to them, albeit with some bugs, like linking to the whole blog and not to the particular category. So I passed most of the first of January adding the relevant tags to the entries as categories. So now I have no need of an RSS feed for the tag page, as the tag page has been replaced by the category pages.

You will also remember that I installed Wikka, the wiki engine. Now Wikka is not only open source but also easy source. It is so simple that even I could hack the code. That is very simple! So I changed the code and added the possibility of having default pages. In short: before, if you were to look for the URL http://wiki.pietrosperoni.it/someunexistingpage and there was no page in the wiki called “someunexistingpage”, the wiki would ask you to edit the page, and you would be redirected to http://wiki.pietrosperoni.it/someunexistingpage/edit.
Now it creates on the fly the page someunexistingpage with the default content. And the default content I chose was these 4 RSS feeds:

  • the feed from my blog from the category: someunexistingpage
  • the feed from my technorati from the tag: someunexistingpage
  • the feed from my delicious bookmarks from the tag…
  • and the feed from the popular pages in delicious, always from that tag
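The default-page behaviour can be sketched as follows. This is not the actual Wikka hack (which lives in the wiki engine itself); it is a small Python illustration, and the feed addresses, the `example.org` hosts, and the `{{rss url="..."}}` markup written into the page are all placeholders assumed for the example.

```python
from urllib.parse import quote

# Hypothetical URL templates: the real feed addresses are not given in this post.
FEED_TEMPLATES = [
    ("blog category", "http://blog.example.org/category/{tag}/feed/"),
    ("technorati tag", "http://technorati.example.org/tag/{tag}/rss"),
    ("my delicious bookmarks", "http://delicious.example.org/rss/me/{tag}"),
    ("popular on delicious", "http://delicious.example.org/rss/popular/{tag}"),
]


def default_page(tag):
    """Build the default content for a tag page: one line per feed."""
    safe = quote(tag, safe="")
    lines = [f"Default page for tag: {tag}", ""]
    for label, template in FEED_TEMPLATES:
        url = template.format(tag=safe)
        # Assumed markup for embedding a feed in a wiki page.
        lines.append(f'{label}: {{{{rss url="{url}"}}}}')
    return "\n".join(lines)


def get_or_create(pages, name):
    """Return the page text, writing the default content on first access."""
    if name not in pages:
        pages[name] = default_page(name)
    return pages[name]
```

Note that `get_or_create` writes the default content only once: after the first request the page is an ordinary page and can be edited freely, but it no longer follows later changes to the default.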

So for each tag I now have a wiki page with the most relevant rss appearing there. But being a wiki page I also can add other rss feeds, write definitions, comments, todo lists. In short modify it as I see fit.

Still, it is not perfect. Since it writes the page the first time it is requested, from that moment the page is set. I can delete it, but I cannot, for example, change the default content for all the pages that only contain the default content. I tried to write a plugin to do that, but I failed when I confronted the fact that I needed a plugin {{defaultpage}} that would activate other plugins, {{rss}} for example. Something that I did not know how to do.
Also, having the same string work for delicious (as tag), WordPress (as category name) and Wikka (as page name) puts some heavy constraints on what the string might contain. For example, I am already running aground with all the tags that contain a dot (aaargh, del.icio.us!) or an accented letter (aargh, dear Italian).
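A small check for such strings could look like this. The allowed character set (ASCII letters, digits, hyphen, underscore) is my assumption of a common denominator between the three systems, not a documented rule of any of them, and the function names are made up.

```python
import re
import unicodedata

# Assumed common denominator: ASCII letters, digits, hyphen and underscore.
SAFE_TAG = re.compile(r"[A-Za-z0-9_-]+")


def is_portable(tag):
    """True if the tag can be used unchanged as tag, category and page name."""
    return SAFE_TAG.fullmatch(tag) is not None


def portable_form(tag):
    """Strip accents and drop every other unsafe character (lossy)."""
    decomposed = unicodedata.normalize("NFKD", tag)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z0-9_-]", "", ascii_only)
```

The conversion is lossy, so two different tags can collapse into the same page name; it is a workaround, not a solution.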

If you want to see what the pages look like, just see the idea page. But any link from the right column (provided it has no dots or accents inside) will work fine.

Integrating browsers and feed readers

Speaking about things that should happen and have not happened, one thing that I still have not seen is a browser with a feed reader such that, when you go to the web page that is linked to a certain feed, it updates the feed and assumes that you have now read it. So, for example, I add Jim’s feeds, but for a certain period I am passing through his web page very often. Obviously, if he is updating his blog I will see it. The feed reader does not need to give me that information.

I have added the feeds of some major newspapers, but I also often go to their web pages, and the result is that Bloglines keeps on telling me that those feeds have 200 entries that I should look at.
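The behaviour I am asking for is simple enough to sketch. The class and method names below are hypothetical, and the sketch assumes that visiting the home page of a site shows you everything the feed would have shown.

```python
class FeedReader:
    """Toy feed reader that a browser could notify about visited pages."""

    def __init__(self):
        self.site_of = {}   # feed URL -> URL of the web page it belongs to
        self.unread = {}    # feed URL -> list of unread entry titles

    def subscribe(self, feed_url, site_url):
        self.site_of[feed_url] = site_url
        self.unread[feed_url] = []

    def fetch(self, feed_url, new_entries):
        """Simulate an update: new entries pile up as unread."""
        self.unread[feed_url].extend(new_entries)

    def on_page_visit(self, url):
        """Browser hook: visiting a feed's web page marks that feed as read."""
        for feed_url, site_url in self.site_of.items():
            if url.rstrip("/") == site_url.rstrip("/"):
                self.unread[feed_url].clear()

    def unread_count(self, feed_url):
        return len(self.unread[feed_url])
```

The only integration needed is the `on_page_visit` call: the browser tells the reader which URL was opened, and the reader does the rest.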

Ruling Class translation

A quick note to point out that Phil Edwards took the ball and translated the previous post. The post was an answer to a post from my father, and as such was in Italian.

Phil uses the post as the launch pad for a very interesting one on how he sees the long tail having an effect on society, and how he sees this not happening. I don’t think I fully agree with what Phil writes, but I need some time to gather my ideas and answer properly.

What follows is Phil’s translation of my post. The translation is incomplete, but correct and faithful to the original spirit (although I am not convinced I would translate classe dirigente as ruling class), and as such I am happy to copy it here. Continue reading Ruling Class translation

A new Base

Hello everybody, I’m back.
The vacations were very good, and soon I’ll put all the pictures on the moblog, with the descriptions and the embarrassing details. Now I’m back to work in Jena.
One of the things that is happening to me at the moment is that my computer is physically falling to pieces. It is an old laptop of a brand I don’t wish to name, so as not to increase their visibility, since it gave me all sorts of problems, freezing at least once a week. Having to look for a new model, I started thinking about the products around, and I reached the conclusion that we are ready for another big jump: a new product which will make a new base in the economy of PDAs. (Yes, I am looking for a laptop for me and I end up writing about PDAs. So what? Sue me.)

It seems to me that there are 4 different products on the market which really need to be integrated, and when they are integrated there will be a big jump in terms of potential. I am speaking of:

  • Smart Phones (with Camera)
  • PDA
  • GPS
  • IPod

Right now no product that I know of on the market is really all four at the same time. There are some shy attempts at the integration, but no one has really managed.

Let’s look at why those 4 instruments should all be present in one tool, and what extra we get. The smart phone gives the possibility to phone and, more important, to be in touch with the internet all the time. The more we go on, the lower the tariffs will be. In Italy it is already possible to have a flat rate of 20 euro a month that lets you connect to the internet anytime between 6 pm and 8 am and during weekends and holidays (it is through TIM, if you are looking for it). It is very good. I wish there was a similar possibility here in Germany. With time the prices will inevitably drop. What all this means is that the tool, which from now on I will call eBase, will be able to be in contact with the internet pretty much all the time.

When I was in Prague at the European Go Congress I bought a small used PDA for 20 euro. It works fine, and it let me simply record all the official games I played. If the tool had already been online I could have sent them to the internet immediately. But wait, if the tool is on the internet I can also play Go online directly from the PDA. Imagine: you are on your bus and you play online with someone far, far away, in a distant galaxy. And because more and more of the work is moving onto the internet, with del.icio.us, online calendars, online office, and so on, this would mean that you have all your data all the time. And this on a PDA, that is, on something that is big enough to actually do some work, read some web pages, and generally be useful.

And now the first criticism will be: but what about making phone calls? Will that not be uncomfortable? Of course an eBase has to be bigger than one of those miniaturized phones that are available right now. On the other hand, I see more and more people using those Bluetooth microphones to speak. You can still keep your eBase in your pocket while you speak.

Now, PDA and phone integrated are already around. And they work quite well. In fact many smart phones are in a sense a PDA plus a phone. I used to be quite skeptical about those tools, especially about how easy it is to write on the screen itself, but after my last PDA I had to change my mind. It works very well.

But now the last two elements, the iPod and the GPS. Many smart phones already have an mp3 player inside. But the real novelty in the iPod is not the mp3 player, but the memory. Having 40 GB of memory means that everybody can carry ALL the music he likes with him. No smart phone that I know of offers this. This is fundamental. If eBase also has to be a working tool it has to have a huge memory. 40 GB is the minimum. But what else would it mean to have an iPod fully integrated with a PDA plus a smart phone? Well, for one it means that you can get your podcasts directly on your phone. And since (see above) your phone is supposed to be on the internet pretty much all the time, you are getting your podcasts all the time from the internet directly on your phone. Yes, we have reinvented the radio. And since this is going to have a bigger screen than a mobile (apart from the fact that flexible screens are coming out in any case), we can even consider having video podcasts with us. Essentially, to have it we would just need to take some of the smart phones we already have around and pack them with enough memory and a flat rate connection to the internet. Nothing too incredible.

And the last one is the GPS. We already have that too. Some smart phones have GPS included, and many can have it as an external add-on. It seems that until things are integrated in the base object people don’t use them. If it was for me I would never have bought an external camera to use with my phone. But the model I needed also had a camera included, so I had to take it. And it ended up being what I use most, and have the most fun with. I think the same goes for GPS. Wait until all phones have GPS. Wait until any phone can tell you the road to anywhere you want, both in terms of streets and in terms of physical distance (3 miles in direction 121 degrees). Wait until every picture that you take comes with the exact coordinates of when and where it was taken. And then you will have the possibility to put a picture on the internet and everybody can find that exact spot (and of course you can hide that information, to protect your privacy). Wait until you can search the internet for all the pictures around a certain place. And then the integration of the internet and the human beings that wish to be part of it can be really strong. And then you can search for a post in your blog by where it was written, more than when it was written. And any document we write will not only have the date but also the location. And we will start referring to places by their coordinates, like we do now with time.
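Both the “3 miles in direction 121 degrees” answer and the search for pictures around a place come down to the same two formulas. Here is a sketch using the standard haversine distance and initial bearing on a spherical Earth; the function names and the photo tuples `(name, latitude, longitude)` are assumptions made for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, spherical approximation


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (miles) and initial bearing (degrees from north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula for the distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance = 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
    # Initial bearing, normalised to the range 0..360.
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360) % 360
    return distance, bearing


def photos_near(photos, lat, lon, radius_miles):
    """Names of the photos taken within radius_miles of the given point."""
    return [
        name for name, plat, plon in photos
        if distance_and_bearing(lat, lon, plat, plon)[0] <= radius_miles
    ]
```

A real service would use a spatial index rather than scanning every picture, but the question being asked is exactly this one.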

Think about it. Before the invention of the calendar people would refer to time by relating it to big events. Two years before the flood. 3 years after the king got into power…
Now we have some universal ways to refer to time. 5th of October 2003. We still refer to things with respect to an event (‘3 months after the twin towers’, …) but mostly we use the western coordinates. And giving each user the possibility to know what time it was (the invention of the watch) made it possible and practical for people to refer to time in a precise way. As soon as we have some simple object that is with us all the time and that tells us our spatial position, we will start to use these coordinates in the same way.

So I think those 4 objects should come together. And they will. And whoever does it will get a big share of the market (and maybe this is why it has not happened yet, as each company keeps hold of its own patents to stop the others from making the integration themselves).

I only gave here a sketch of the possibilities of such a tool. In a sense something like an eBase could be enough for a person to interact with the social community. It would be enough to let someone discuss with others and participate in the emerging democracy that we seem to have finally started creating.

I am sure this object will come out. The question is not ‘if?’, the question is ‘when?’, and ‘what should I buy in the meantime?’.

Tag Clouds are hard to Spam

I think the time has come to write my third, and hopefully last, contribution to the topic of tag clouds.

I have been hearing a lot of talk on how users should not use too many tags when tagging a URL. I am also the maintainer of the mindmap maker, and I often look at some of the maps generated (available to everybody). There are a number of people who tend to use an average of between one and two tags per URL. Their maps are often very ordered. No clustering, no hierarchy. (Forgive me if I don’t put a link to such a map, but since I am going to bash this way of using delicious, I’d rather bash a method than a specific human being. Just go to the list of maps and open a couple; odds are one of them will be of the type I am describing.) This way of using delicious uses tags as folders, just with the modification that every now and then you can put a URL in more than one folder at the same time. A bit like a big bookstore might carry several copies of the same book, and store them in more than one place (and the Tao Te Ching ends up in New Age -God knows why- and in Religion).

Of course tags tend not to fit exactly. My Tag Clouds and Cultural Change will be under Tags or Folksonomy or Sociology… Whatever you choose, you probably will not put it under Ajax. And yet most of the analysis was done studying the spreading of the term Ajax.

Let’s make a few simple calculations. Continue reading Tag Clouds are hard to Spam

My reading list

Some of you know that I left ProtoLife. I hope in the future to keep on collaborating with the P.A.C.E. project, but for now that’s it. I want to go back to Germany, and finish my Ph.D.
Somehow this seems to have a higher priority. Besides, it is getting clearer to me each day that I am hardly a ‘company’ type of person. I’d rather do research inside the university, or nowhere at all. While I was in Venice I met with the local group of Go players. Sandro, one of them, made a big impression on me, being a person of great knowledge, whose only excuse was: “I don’t watch television, I read.”

When I came back to Rome I decided that it made sense to read more. More than this, that it made sense to read ‘cum grano salis‘. With intelligence, choosing carefully what to read, and not reading whatever bullshit the latest friend suggested to me. I remember telling a friend, “you know, I decided to stop just following my nose on what books to read…”
His answer was quite funny: “If you don’t follow your nose, what do you follow? Other people’s noses?” And then he added: “This is actually a serious question. You might, for example, find some people that you really don’t like, ask for their suggestions, and then take the books they suggest off your reading list.” As you will see by the end of this entry, this ended up being very near the mark.

So I started asking people around me what were the books (or documents) they felt were most important to understand the world we are living in. The example I often gave was the acts of the Second Vatican Council (in particular Dignitatis Humanae). Since I am not Catholic (nor even Christian), by suggesting something that was not traditionally seen as a classical text, I was implicitly asking for the documents that are behind the world we are living in, the documents that most people refer to, but few really read.

I did not receive many lists, but here and there someone would suggest a book or two, that I would dutifully add to my notes. I then started keeping track of this list in a separate page on my blog. Since I did not publicize the page, no one would read it. The list is nowhere near finished, and I feel its inadequacy, knowing all the wonderful books that should be there, but I preferred to keep it small, and add new books slowly.

While I was keeping the list on the back burner, and slowly going through some of those books, I found another list, a much better one, from which I am about to fatten mine. And the story of how I found it, and how it relates to my list, is very funny, so let me tell it to you.

A right-wing newspaper, Human Events Online, asked

a panel of 15 conservative scholars and public policy leaders to help us compile a list of the Ten Most Harmful Books of the 19th and 20th Centuries.

They crossed the information between the various people and came to a list of 10 really dangerous books, and 21 ‘honorable mentions’. The list has it all; it’s the most complete list I have found of texts that were really important to understand the world we are living in. Each of those books inspired millions of people. Just to understand where those people come from, each book is important. Fundamental, I would say. You have it all: Freud, Darwin, Gramsci, Marx, Engels, Mao, the Kinsey Report (the Kinsey Report! That I have wanted to read for so long). Dewey, who I have been told set the foundation for modern relativist thinking (so dear to our new pope). There is also Mein Kampf, which I am not sure I’ll have the guts to read, but I probably should. And many others: Betty Friedan (don’t you want to understand Feminism? Read it, too), Keynes, Adorno… It is a wonderful list.

Interestingly enough, I was not the only one to see this as my next reading list. On the delicious page of the people who bookmarked the article, the most common comments are “my next reading list”, “A very interesting list of powerful books that have changed history.”, “some good reading”, “…some of these would make my required reading list…”, “…would make an excellent library booklist.” and so on.

One of the things that you should not fail to notice is that each book in the top ten most dangerous books is presented with a link. A link to Amazon. But it is not just a normal link to Amazon. Amazon lets you sign an agreement so that you can advertise some books from your website, and if people buy the book, they get a discount, and you get a percentage. So, yes, you got it, each of those books is presented in that format. If you click on those links to go to Amazon, and you buy the book, the right-wing journal will get a percentage. It will get a percentage out of you buying Mein Kampf, and The Communist Manifesto. “Buy ‘The Kinsey report in the human male’, and help sustain the neocons’ battle.” Ah, the irony of all this.

So, to answer my friend, “no I don’t ask people I don’t like which books to read and cross them out. I ask them which books not to read, and add them to my reading list. Way more efficient!”

And if you have read all this, and want to add something to my reading list, feel free to suggest:
“what books or documents would you suggest to understand the world we are living in?” And tell us why: in what way was this book so unique that reading it is a must? Now the line is yours.

Cloudalicious suggestions

Terrell Russell asked for some suggestions on how to improve his tool, Cloudalicious. He asked for them 3 times: once on the del.icio.us mailing list, once in a comment on my previous entry, and once on his site (link missing as he wisely took this one off). Now I really think three times is too much, and Terrell should be heavily chastised for this. So I will write him a loong list of things that I think his tool should do, and maybe next time he will think better before asking people how he should employ his time as a programmer. I am always happy to give ideas to people, provided a) they remember me when the idea makes them incredibly rich b) they do the coding.

Tagclouds and cultural changes

In the previous post I discussed how we can measure the relative importance of tags in a post, by calculating their weight, as

  • weight of tag t= (number of people using t)/(total number of people)
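To make the formula concrete, here is a minimal Python sketch. The function name `tag_weights` and the sample users are made up for illustration; the only thing taken from the text is the ratio itself (people using the tag over all people who bookmarked the page).

```python
from collections import Counter


def tag_weights(bookmarks):
    """Return {tag: fraction of users who used it} for one bookmarked URL.

    `bookmarks` maps a user name to the tags that user applied
    (a hypothetical in-memory structure, not a del.icio.us API).
    """
    total_users = len(bookmarks)
    if total_users == 0:
        return {}
    users_per_tag = Counter()
    for tags in bookmarks.values():
        # set() so a user who repeats a tag is still counted once
        users_per_tag.update(set(tags))
    return {tag: count / total_users for tag, count in users_per_tag.items()}


# Invented sample data: four people bookmarking the same page.
example = {
    "alice": ["css", "design"],
    "bob": ["css"],
    "carol": ["css", "webdev"],
    "dave": ["design"],
}
weights = tag_weights(example)
```

With this sample, three of the four people used ‘css’, so its weight is 0.75.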

I also said that:

Not only could we study a culture by studying the differences in the power law approximated by the tag clouds used by people of that culture. We could even measure cultural earthquakes by measuring the difference between the tag cloud generated before a certain event and the one generated after it.

Independently, Clay Shirky was coming to a similar conclusion, although he focused more on temporal changes that seem to be the signature of a particular subgroup of people all bookmarking a site at a certain time:

During a period of about 120 users’ additions of OIO, 20 or of them used the tag ‘ia’, putting it between #7 and #10 during that period. Now it is down to #17. This suggests that one or a few IA-oriented sites or mailing lists posted the link, and it got a flurry of attention from those taggers in a narrower window of time. This in turn suggests a conversationally tightly-knit IA community.

But let’s go back to the tag weight. Terrell Russell took the ball, and in one evening of programming presented a tool to actually see how the weights change in time.

Nothing to say about the tool. It works perfectly well, and although it can be enhanced in many little ways, it already is very useful. Not bad for one evening.

More interesting, from my point of view, is how through this tool we can see changes in the culture we are living in. We are used to feeling those changes, but generally we were never able to measure them. Maybe now we can start to do it.

No change

Tag Clouds rapidly converging

First of all I would like to show you the graph of a part of the culture where no changes are happening:
From the site Nifty Corners, which 1859 people have bookmarked by now. The values soon converge to what we can expect to be their definitive values (for the culture we are in).

Little Social-Quake


On Tag Clouds, Metric, Tag Sets and Power Laws

Note: This entry is also connected to a mind map. Some people were having problems opening the page because of that, so the mind map has been stored on a separate page, and can be viewed from here.

Introduction

As correctly pointed out by Jeffrey Zeldman, tag clouds are becoming more and more popular. Yet I keep seeing services which should be using tag clouds that keep on using tag sets. It is not just a problem of programming a tool which can only support tag sets, but also of programming tools which might in principle produce tag clouds, but where the users are not invited to use a tag if it already exists, and as such don’t generate a tag cloud.

Examples of the first type of tool are Flickr, 43things, consuMating and tagsurf; an example of the second is the tagged version of the BBC. In all those cases a tag set is used where a tag cloud would be more appropriate. Some of the differences between a tag cloud and a tag set were explained in Vanderwal.net: Explaining and Showing Broad and Narrow Folksonomies. Let’s see them again, and see some consequences of those differences, which should clarify when it is better to use one tool and when it is better to use the other.

Blog your mind map

Great results!

I finally managed to find a way to integrate my mind maps with my blog. Not just as static images or as external pages, but as living entities inside the blog. I learned it from the code in Wikka. And as always in those cases, once you find the solution, you just look at it, and it seems obvious… afterwards.

Of course it requires that the reader has Java 1.4 installed.

Please do not use http://maps.pietrosperoni.it/freemindbrowser.jar as the address for your freemind browser java code; copy the freemindbrowser.jar file into your own directory instead, and use it from there.

This is the code:
<applet code="freemind.main.FreeMindApplet.class" archive="http://maps.pietrosperoni.it/freemindbrowser.jar" width="100%" height="450">
<param name="type" value="application/x-java-applet;version=1.4" />
<param name="scriptable" value="false" />
<param name="modes" value="freemind.modes.browsemode.BrowseMode" />
<param name="browsemode_initial_map" value="http://maps.pietrosperoni.it/TaoistBooks.mm" />
<param name="initial_mode" value="Browse" />
<param name="selection_method" value="selection_method_direct" />
</applet>

And here is what it looks like (on my map of all my taoist books)

Visualizing the double hierarchical nature of entries.

I keep on being haunted by a nightmare:

Think about a post. You write a post, and it is an answer to some other posts, some other web pages, written by someone else. And your post will often be answered by other people. In a sense no post is an island. Given a post you can see all the posts that answered it, or reviewed it, through the trackback list. And they themselves have other posts that answered them. And so on. But this does not work only one way. You can also go backward in time (which in fact is what we usually do when we follow the links). You read a post, then you read the post that post is referring to, and so on. And in my dream this is a sort of tapestry, where each post is a node that links together different threads. So each post is not just contained in a thread, but connects many threads that run through it.

Now think about a discussion group. In a discussion group each post is part of a tree. Each post can be answered by many posts, but it has only one father: the one post it is itself answering. And because of this structure it is possible, and actually easy, to generate the classical hierarchical structure that you can see pretty much everywhere in discussion groups (e.g. the Healing Dao discussion group).

But if you look closely you will notice that discussion groups do not really have a tree structure. Posts do have one father, yes, but they refer to many other posts. They might not explicitly link to all the posts they refer to, but they surely refer to many. This is because in discussion groups there usually isn’t the need to link to all the relevant posts. After all, the readers are generally a filtered group of people. Also, a person will often use one post to answer a whole bunch of other posts, especially inside a closed community, where everybody reads everything.

Yet the hierarchical way in which posts are laid out in a discussion group is really useful. You can perceive in an instant how many people answered, what threads departed from that post, etc.

Now look at a post in the blogging world. It refers to many other posts. It explicitly links to them. And if it is successful it will itself have many posts linking to it. Now forget for a moment about the upward links. Each post has posts that link to it. In a sense they are replies to it. The links to those posts are saved in the trackback list. And each of those posts will itself have certain posts that refer to it.

Are you starting to see it?
Each post is in a sense the root of a tree, whose branches are the posts that refer to it, and whose sub-branches are all the posts that refer to the branch posts. In a sense nothing new. But now, if you see your posts in this way, you might wish to display not just the immediate trackbacks, the posts that refer to your posts, but their trackbacks too.

And here is the first part of my idea. Since each post is available in feed format, it should be possible to fetch, for each post, not just the trackbacks, but the trackbacks’ trackbacks: the posts that refer to the posts that refer to your post. Which means seeing the tree starting from your post up to depth 2. And in theory it should be possible to iterate the process, and go deeper and deeper.

Why is this important? Well, when you read a discussion group, it is often useful to see the hierarchical view.

Example
Title of the post 0:
BLAH
Content of the post 0:
blah, blah, blah, blah,
blah, blah, blah, blah,
blah, blah, blah, blah,

blah
Blah.
-Trackback 1
--Trackback to the trackback 1
--Second trackback to the trackback 1
-Trackback 2
-Trackback 3
--Trackback to the trackback 3
---Trackback to the trackback to the trackback 3
-Trackback 4
… and so on.
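A listing like the one above could be produced along these lines. This is only a sketch in Python: it assumes the trackbacks have already been fetched into a plain dictionary (fetching and parsing the feeds is left out), and the names `trackback_tree` and `trackbacks` are made up for illustration.

```python
def trackback_tree(post, trackbacks, max_depth, _depth=1, _path=frozenset()):
    """Return indented lines for the posts that trackback to `post`.

    `trackbacks` maps a post id to the list of posts that link to it.
    One dash per level, as in the example listing.
    """
    lines = []
    if _depth > max_depth:
        return lines
    path = _path | {post}
    for child in trackbacks.get(post, []):
        if child in path:
            # two posts linking to each other: do not loop forever
            continue
        lines.append("-" * _depth + child)
        lines.extend(trackback_tree(child, trackbacks, max_depth, _depth + 1, path))
    return lines


# Invented sample data mirroring the example listing.
trackbacks = {
    "post0": ["tb1", "tb2", "tb3"],
    "tb1": ["tb1a", "tb1b"],
    "tb3": ["tb3a"],
    "tb3a": ["tb3a1"],
}
depth_two = trackback_tree("post0", trackbacks, max_depth=2)
```

Raising `max_depth` is the "go deeper and deeper" part; the cost grows with every level, which is why the depth is a parameter.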

It might seem an expensive search, but when we read a post that has a certain number of trackbacks, it is quite important to see which of those led to other posts and which didn’t.

And now we go to the second part of the idea.
In a sense there is no reason why the whole tree view structure should only work one way. I mean, each post links to many other posts. Each of those posts itself links to other posts. And here we have another tree. This time a tree that goes backward in time.

So I think that for each post it should be possible to see both those views.

  • All the entries that are linked from it, and the entries that are linked to those entries, up to a specific depth.
  • All the entries that link to it, and the entries that link to those entries, up to a specific depth.
  • And maybe combine the two views, with the first set of entries, one entry per line, above the post, and the latter, again one entry per line, below it.

I think this view would greatly increase the ability to see the local structure of the blogosphere. Of course the brothers of a particular entry (the entries that share the same parents) should also be available on the side. As well as the entries that are generally linked from the same offspring. But this is making it unnecessarily complicated. So let’s forget it for the time being.

So, we have reached the conclusion that each post uniquely defines two trees of other posts: the tree generated by it, and the tree that generates it. And I claim that we should work to be able to visualize those trees.
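The two trees come from the same data read in two directions. As a rough Python sketch (all names hypothetical, and assuming we already know which posts each post links to), inverting the link table gives the "who links to me" direction, and one breadth-first walk serves both views:

```python
from collections import defaultdict


def invert_links(links_to):
    """Turn {post: posts it links to} into {post: posts that link to it}."""
    linked_from = defaultdict(list)
    for source, targets in links_to.items():
        for target in targets:
            linked_from[target].append(source)
    return dict(linked_from)


def reachable(post, graph, max_depth):
    """Map every post reachable from `post` to the depth at which it is first found."""
    depths = {}
    frontier = [post]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for current in frontier:
            for neighbour in graph.get(current, []):
                if neighbour != post and neighbour not in depths:
                    depths[neighbour] = depth
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return depths


# Invented sample: c and d answer b, b answers a, e answers d.
links_to = {"c": ["b"], "d": ["b"], "b": ["a"], "e": ["d"]}
linked_from = invert_links(links_to)

backward_in_time = reachable("b", links_to, max_depth=2)    # what b was answering
forward_in_time = reachable("b", linked_from, max_depth=2)  # what answered b
```

Storing both tables is the same idea as storing each trackback on both the sending and the receiving post.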

Doing it on Tagsurf
So, where did the idea come from? Essentially from working on tagsurf. Because, you see, tagsurf is maybe the first place where it would be really easy to visualize all this. You have many posts. There is the possibility (although I am not sure if it works right now) to send trackbacks from post to post. So each post does not need to have only one parent: it can have many. It is true that, as it is now, trackbacks are not used inside the system. The reply is a different thing from the trackback. And each post only belongs to one thread, which started with the first post that was not written as a reply to something. So there are quite a few changes to be made to ground this vision in that system. But it is possible, and comparatively easier to do than in the blogosphere at large.

Those are the changes that I see have to be made to make it possible:

  • Make sure that it is possible to send trackbacks between different posts.
  • Organize all the replies so that they also send a trackback
  • Make sure that each time a post A sends a trackback to another post B, this is also stored inside A
  • Add a “view down in time” page that, for each post, gives you that post, and all the posts that reply (that is, trackback) to that post, and so on
  • Hack this page so that the posts appear in a hierarchical way, where it is very clear who is answering what. Generally the way in which livejournal handles comments is a good way
  • Since you stored all the trackbacks in both directions, organize a “view up in time” page that, for a given post, shows you all the posts that entry was answering. And since they were themselves sending trackbacks to other posts, add those other posts as sub-branches.
  • Make it very easy, given a certain post, to use those two views, and try taking away the usual thread view. All the information should still be there.

Once this is in place you can then cross it with the idea of the tag: you could, for example, investigate one tagsurf entry (blog entry) and one tag. Then only the entries that contain that tag will appear in the two trees. And if an entry does not have that tag, then all its sub-branches are excluded, even if they have the tag. (Thanks Andy for this idea)
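The pruning rule is easy to get wrong, so here is a small Python sketch of it. The structures `replies` and `tags_of` are hypothetical stand-ins for whatever tagsurf stores; the only rule taken from the text is that an entry without the tag hides its whole subtree.

```python
def filtered_tree(post, replies, tags_of, tag, max_depth):
    """Return (post, children) keeping only branches where every entry has `tag`.

    `replies` maps a post to the posts that trackback to it;
    `tags_of` maps a post to its set of tags.
    """
    if max_depth == 0:
        return (post, [])
    children = []
    for child in replies.get(post, []):
        if tag in tags_of.get(child, ()):
            children.append(filtered_tree(child, replies, tags_of, tag, max_depth - 1))
        # else: the child and everything below it is dropped,
        # even if a deeper entry carries the tag
    return (post, children)


# Invented sample data.
replies = {"root": ["a", "b"], "a": ["a1"], "b": ["b1"]}
tags_of = {"a": {"python"}, "a1": {"python"}, "b": {"java"}, "b1": {"python"}}
tree = filtered_tree("root", replies, tags_of, "python", max_depth=3)
```

Here `b1` has the tag but never shows up, because its parent `b` does not.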

Doing it on Technorati
Another service that has all the information to generate those views is Technorati. Of course I would rather see it done in a decentralised way. But it would be so easy for them to do it, while doing it in a decentralised way might be such a nightmare, that I am quite hopeful they will get there first. Think about it. A Technorati page: investigate blogosphere local structure. You pass a url to this page, and the said structure appears. Up to depth… say 3.

Update: BN (in the comments) points to BlogPulse’s Conversation Tracker as a limited solution to what I was suggesting. It still has many limits, but it is surely a step in the right direction. Besides, it is good to be reminded that Technorati isn’t the only service observing the blogosphere.

technorati tag & rss

Rss is somehow one of the best ideas. You can have your content, stripped of formatting BS, redirected all around. This gives a one to many structure. Now we need the opposite. We need to be able to pull the content from many sites into the same place, and check it. A many to one structure.
Most of you will say, “But we already have that, it’s called an aggregator. Just look at bloglines.”

Yes, and no, that’s part of it, but it’s not the whole story. We need to have a page that posts all the content from everywhere in a single page.

And again I can hear: “but we have that too: it’s called a technorati tag”.

Again I will repeat: Yes, and no, that’s part of it, but it’s not the whole story. We need to pull the information from the technorati pages to our aggregator.

This is the idea: we need an rss feed of a technorati tag. As we can get the rss feed of a del.icio.us tag, we need to have it for all the blogs. The time has passed for adding to your friend list ALL the blogs that might have information of interest. We need to be able to add that rss to our bloglines.

So, either technorati will start releasing the rss, or I predict that:

  • a) other services will start competing with technorati offering that info
  • b) anonymous hackers will start scraping the info from technorati to offer the very valuable information.

See also: semanticweb, tags

Third Map Maker Debugging session

Did some more debugging. Now any unicode the user used in the tags should be ok. Still there is a big brick wall in terms of memory usage. And some users are not having any luck simply because their map takes up so many resources that it goes beyond the ISP limit. I could work hard and rework the whole calculation so that all variables are stored on disk, so the memory limit would never be hit, but honestly, it is not my top priority at the moment. I am here to help those users run the program on their own machine. And eventually we might solve that problem too. So, what are my top priorities?

  • Add an rss feed. I would like to add an rss feed that gets updated every time a new map is done. It wouldn’t just give the name but all sorts of data, like the list of the Main Tags. So the users could see if they might be interested in checking the new buddy’s map
  • Insert a way for users to delete their own maps. If I am going to go into the hosting business, I am not going to be one of those hosts where you can add info, but you cannot delete it. I am aware that the users’ info is ultimately adding value to my site, and as such I want users to be happy to have their map here. Not forced.
  • Insert a general log of all the maps that are started and finished. Right now such a log is absent, and there are about 200 maps completed, and more than twice as many that have been started. So about 300 have been dropped. I bet many of those users would succeed if they tried right now, after those 3 debugging sessions. Still I want something that tells me: Warning warning warning, map dropped. Bug? OutOfMemoryError?
  • Add the number of posts inside a tag. Just obvious
  • Probably add some of the MainTags as keywords to each single map. The problem is: which? All is too much. All the ones that contain more than x posts, y subtags is not flexible enough. The solution should be: if a MainTag is part of a ParetoFront of Delicious then the keyword should be there. The fact that this means writing a whole program that stores in a database the latest ParetoFront is just a small detail ;) . And before you ask: no, I will not need anybody’s password to do that, and the data will all be public.
  • Add a bookmarklet to save a map in your own delicious, with the keywords as tags
  • Change the map, so that it can run on a single tag. Useful for big complex maps like mine, and others.
  • Make it change the Title of the Map Page, to show the owner of the map. Useful if people want to add the maps to their delicious pages.

And then there are some tests I would like to make, like:

  • Check if it would make sense to show all the tags that appear with a single tag, and not the subtags.

Is there more? If you can think of other modifications, please drop a line in the comment section. Also, if you tried to run the map maker and it is not giving you satisfaction, let me know. I’ll whip it appropriately. HarHarHar. (I’ve always wanted to say that!)

Observations while Clustering Mike’s bookmarks

The first person to use the tool (presented here) was Mike Harris, for his delicious entries. Note immediately how the time needed to compute the map has little to do with the number of posts, and much to do with the number of tags.

  • WCityMike: 2029 Posts, 87 Tags and 81 Main Tags, calculated in 86.85 seconds.
  • p.s.blog: 21 Posts, 43 Tags and 17 Main Tags, calculated in 0.23 seconds.
  • pietrosperoni: 372 Posts, 400 Tags and 152 Main Tags, calculated in 377.40 seconds.

The Main Tags are the tags that will appear as main branches. And we can also see a difference between Mike’s maps and mine. In mine I tend to have about 0.4 of the tags as Main Tags, while Mike tends to have something nearer 0.9. This is probably due to the fact that I tend to apply many tags to each post (four or five are common, but sometimes more), while Mike tends to use an average of one or two.

If we look at the map we can also see that there are fewer clusters than in my map. Note for example how in the small blog map nearly everything is clustered… and those are only 21 posts and 17 Main Tags.

If we look at the source code we can see that, on the 9th line some constants are set:

distances_constant= [0.333333,0.4,0.5,1]

Those constants define the minimum distance for entries to be in the same cluster.
The 1/3 means that if one third of the posts of two tags are in common then the tags should be in the same cluster. And so on. Tags that are farther apart, but have a path of tags between them such that you can go from one to the next without ever going above that distance, are in the same cluster, too. This process is referred to in the log as making the distance tables transitive.
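To illustrate the transitive step, here is a small self-contained Python sketch. It is not the Delimind source: the function name is made up, and since "one third of the posts in common" can be read in several ways, the sketch assumes it means shared posts divided by all posts carrying either tag. The chaining of clusters is done with a simple union-find.

```python
def cluster_tags(posts_by_tag, threshold):
    """Group tags whose overlap reaches `threshold`, directly or through a chain.

    `posts_by_tag` maps a tag to the set of post ids carrying it.
    Overlap is ASSUMED to be |A & B| / |A | B|; the original only says
    'one third of the posts in common'.
    """
    tags = sorted(posts_by_tag)
    parent = {tag: tag for tag in tags}

    def find(tag):
        # follow parents up to the cluster representative, shortening the path
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            shared = posts_by_tag[a] & posts_by_tag[b]
            either = posts_by_tag[a] | posts_by_tag[b]
            if either and len(shared) / len(either) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for tag in tags:
        clusters.setdefault(find(tag), []).append(tag)
    return sorted(clusters.values())


# Invented sample: 'python' and 'code' overlap too little to join directly,
# but both overlap enough with 'programming'.
posts_by_tag = {
    "python": {1, 2, 3},
    "programming": {2, 3, 4},
    "code": {3, 4},
    "cooking": {9},
}
loose = cluster_tags(posts_by_tag, 1 / 3)
strict = cluster_tags(posts_by_tag, 1.0)
```

With the threshold at 1 only tags that sit on exactly the same posts end up together, which matches the role of the final constant.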

Those numbers have been specifically tweaked for my delicious posts (and generally my style of bookmarking). It seems obvious that for Mike the numbers should be different. Since it is less common for his posts to share a tag, the numbers should probably be lower. Something like:

distances_constant= [0.1,0.333333,0.25,0.4,1]

The last 1 is just to make sure that tags that are synonyms are shown together.

I think eventually I will modify the program so that it is possible to insert your own constants from outside. But for now I am just grateful to Mike for giving me the material to understand better how to enhance the program.

Delicious Map Maker Available

I finally made it. These holidays in Rome have been productive. I made a tool to automagically make a clustered delicious mind map. You need to have Java installed; I’ll do the rest. It’s still pre-alpha, but it seems to be working fine up to now.

I used the previous algorithm, only debugged. You see, before I started programming those maps I had never programmed in Python. So those are my first attempts. The more I learn, the more I discover shortcuts. The source code is here.

To test the program, I needed a lighter account (my delicious account has 400 tags right now), so I started a new account just to bookmark all the entries in this blog, and … wow. The map looks really nice!

I also added the tool to the general page with the indexes of all the various versions of the mind map of delicious.

All the maps that are completed are added to the end of the page. I think this is fair. I am really looking forward to seeing other people’s maps, too. If you run the map more than once, your name will appear on the page more than once. Hopefully this should stop people from running the tool more than once a day… please.

I sincerely hope it will be successful without giving me massive space problems.

A house divided

As the price of houses rises, more and more people find that the best solution is to divide a house among friends. Usually each person gets a room. The problem then is: who gets which room, and how much should each person pay? Usually the total rent is fixed, and usually the rooms are not exactly all the same. Some might be bigger, some smaller. Some might have a better view, more privacy, closeness to the toilet, more silence, and so on. And what’s also important is that different people might value the various elements in different ways.

I present here two ways of splitting the rent and dividing a house. I personally favour (and have designed) the second, but while I was presenting this method to some friends to get some feedback, I was told about the other; it seemed simpler, yet interesting enough to add. They both assume that:
a) the rent is fixed,
b) there is no favoritism among the would-be housemates over who gets to choose first.

The ‘find the objective value first’ method.

Before the rooms are assigned, get together and agree on the objective value of each room (e.g. 20% of the rent for this one, 50% of the rent for that one). The total value must of course be the whole rent. Then randomly select who gets which room (at the agreed price), and as a final step people are allowed to exchange rooms if they want to.
Positive element: it is simple and quite straightforward.
Negative element: it assumes that people can easily agree on the actual relative value of the rooms, and that this value does not change from person to person.

The ‘each person gets the best room’ method.

As I said, this is the method I love most. First of all let each person inspect all the rooms. Then each person secretly writes the relative value of each room on a piece of paper. The sum of the values must be equal to the requested rent. The idea is to divide the house so that each person gets a room and pays for that room the value THEY wrote on the piece of paper, while the sum of the values paid by all the people fully covers the requested rent.

Obviously, very often, the collected money would then be higher than the rent. Let’s call the collected money minus the monthly rent, the ‘extra money’.

Often there is more than one solution that leaves some extra money each month. When this happens, the solution that maximizes the extra money is chosen. The extra money is then used to pay for the light, any extra expenses, or whatever is needed for the house.

Sometimes there is more than one optimal solution: several solutions generate the same extra money, everybody pays the price they wrote for their room, and all other solutions generate less. In that case the adopted solution will be one of the optimal ones, randomly chosen.

Examples, examples:
Let’s suppose we have a house with 3 rooms (a, b, and c) and 3 persons (A, B, and C). Let’s suppose the total rent being 100.

Person A might find the three rooms equivalent, so he might just write (a: 33.3, b: 33.3, c: 33.3). Person B might instead favour room b, because it is sunnier and she likes to paint, and she thinks that room a is slightly better than room c; in fact she would prefer not to be in room c at all, so she would write: (a: 35, b: 40, c: 25). Person C instead does not care about the sun, but has noticed that room a has more privacy and is near the toilet, and since he likes to have his gf as a guest, he thinks that having room a would be a better deal. So he votes (a: 40, b: 30, c: 30).

Then the papers are revealed.

Generally, when one person values a room more than everybody else does, and also values it more than all the other rooms, that person takes the room at the price he has chosen.

In our example we have:
A: (a: 33.3, b: 33.3, c: 33.3)
B: (a: 35, b: 40, c: 25)
C: (a: 40, b: 30, c: 30)
which would give us that A would get room ‘c’ paying one third of the rent. B would get room ‘b’ paying 40% of the rent, and C would get room ‘a’ for 40% of the rent… and the collected money each month would be 33.3+40+40=113.3 . The extra money would be 113.3-100=13.3 and would be used to pay for the electricity, water, gas, or whatever.

It is also possible to renormalise the prices, by lowering them so that the total sum becomes exactly the cost of the rent, while the relative ratios remain the same. In our example
A: (33.3/113.3)*100=29.4
B: (40/113.3)*100=35.3
C: (40/113.3)*100=35.3
and person A would pay 29.4% of the rent (since he took the room nobody wanted),
person B would pay 35.3% of the rent (and took the sunny room),
person C would pay 35.3% of the rent (and took the room with more privacy).
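For a handful of rooms the whole procedure can be checked by brute force. The following Python sketch tries every way of handing out the rooms, keeps the assignments that collect the most money, and then renormalises the prices as above. The function names are made up for illustration, and trying every permutation is only sensible for small houses.

```python
from itertools import permutations


def best_assignments(bids):
    """Return (collected money, list of person->room assignments that maximise it).

    `bids` maps each person to {room: value written on the paper}.
    """
    people = sorted(bids)
    rooms = sorted(next(iter(bids.values())))
    best_total, best = None, []
    for order in permutations(rooms):
        total = sum(bids[person][room] for person, room in zip(people, order))
        if best_total is None or total > best_total + 1e-9:
            best_total, best = total, [dict(zip(people, order))]
        elif abs(total - best_total) <= 1e-9:
            best.append(dict(zip(people, order)))  # a tie: equally good
    return best_total, best


def renormalise(bids, assignment, rent):
    """Scale the prices down so that they add up to the rent exactly."""
    total = sum(bids[person][room] for person, room in assignment.items())
    return {
        person: round(bids[person][room] * rent / total, 1)
        for person, room in assignment.items()
    }


# The papers from the example above.
bids = {
    "A": {"a": 33.3, "b": 33.3, "c": 33.3},
    "B": {"a": 35, "b": 40, "c": 25},
    "C": {"a": 40, "b": 30, "c": 30},
}
collected, options = best_assignments(bids)
shares = renormalise(bids, options[0], rent=100)
```

When `options` holds more than one assignment, one of them is picked at random (for instance with `random.choice`), as described earlier.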

So, what if the situation is not that easy, and there isn’t a different person who prefers each room? For example you could be in a situation like:
A: (a: 45, b: 45, c: 10)
B: (a: 40, b: 40, c: 20)
C: (a: 40, b: 30, c: 30)
Well, in this case it is obvious that person A will get either room a or room b. But it is also obvious that room c will go to person C. So C gets c at 30% of the rent. Both A and B value rooms a and b equally. But once the rooms are assigned person A will pay more than person B, so it seems fair to me that person A chooses a or b and pays 45, and person B gets the remaining room, but pays less (40).

But things can get even more complicated if some people value some rooms exactly the same:
A: (a: 45, b: 45, c: 10)
B: (a: 45, b: 45, c: 10)
C: (a: 40, b: 40, c: 20)
in which case A and B obviously have to choose randomly who gets what.

Or if the situation is symmetric among the rooms:
A: (a: 40, b: 30, c: 40)
B: (a: 40, b: 40, c: 30)
C: (a: 30, b: 40, c: 40)
In which case you randomly choose whether A gets a or c, and then the others follow obviously.

So here we have the first method, where everybody chooses the values together; this is equivalent to the second method if everybody agrees on the relative values:
A: (a: 35, b: 40, c: 25)
B: (a: 35, b: 40, c: 25)
C: (a: 35, b: 40, c: 25)
After which, also in this method, you would randomly pick who gets which room.

Please let me know if you have tried it and if it was successful.

wikipedia fast search

I added an extra bookmarklet. I was in a room with 25 great minds discussing molecular dynamics in quantum fields. I couldn’t understand an iota. Luckily talks are now given in places with a wifi connection. So to try to get up to speed with what was going on I wrote a small wikipedia fast search.

Now I could just type “w molecule” in the link bar and the browser would automatically go to http://en.wikipedia.org/wiki/Molecule.

So how do you do it?
Just copy the link into your bookmarks. Copy it into the “Quick Search” directory, then edit the properties and add the keyword “w”. And voilà.
Following a talk with it is much faster.
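What the browser does with such a keyword can be mimicked in a few lines of Python. This is only a sketch of the mechanism: the `%s` placeholder and the URL template in the example are assumptions of mine, not a copy of the actual bookmark.

```python
from urllib.parse import quote


def expand_keyword(typed, keywords):
    """Expand 'w molecule' into a URL using a keyword -> template table.

    `keywords` is a hypothetical table such as
    {"w": "http://en.wikipedia.org/wiki/%s"}; returns None when the
    keyword is unknown or no search term was typed.
    """
    keyword, _, query = typed.partition(" ")
    template = keywords.get(keyword)
    query = query.strip()
    if template is None or not query:
        return None
    # Wikipedia page names use underscores instead of spaces
    return template.replace("%s", quote(query.replace(" ", "_")))


keywords = {"w": "http://en.wikipedia.org/wiki/%s"}
url = expand_keyword("w molecule", keywords)
```

Wikipedia treats the first letter of a page name as case-insensitive, so the lowercase address ends up on the same article.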

More than del.icio.us: org.asm.ic

You know, maybe because my father has been a journalist for so many years, I have always been raised to appreciate the complexity of life. And yet I still can’t understand why we need all this complex machinery.

I totally agree on the importance of the semantic web. And, boy, am I thrilled by the possibility that we might be generating the internet operating system. I am also aware of the cutting-edge problem of who owns your data.

But what I just can’t get is why we need to turn things that can actually be quite simple into this amazing complexity. I might not be getting the whole picture, and I plead ignorance rather than stupidity. But still, why do we need to build this whole house all at once? For example, del.icio.us has been amazing, and trivial at the same time. And amazing also because it was so trivial.

Now let’s expand the concept:
Instead of storing one single link let’s store two links, and a set of tags in the middle. Two links with their two titles and maybe their two descriptions. And one set of tags between them.

And people will naturally start using interesting tags.

Like:
‘explains’, ‘terrorises’, ‘defines’, ‘is’, ‘IsTerrorisedBy’, ‘embedds’, ‘uses’, …

It will also be fun.

And then get a page for all the links that use a certain URI as their first link (…/subj/…), and another for those that use a certain URI as their second link (…/obj/…).

Then you can use delicious to store those pages.

The bookmarks manager was del.icio.us? I can assure you that this will be at least org.asm.ic! And it will not cost the programmer more than 200 lines of code. LAMP, PHP, MySQL, keep it simple. And we will all use it.
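To show how little is needed, here is a toy in-memory version in Python (the real thing would sit on a database, as said above). The class, its method names and the example.org addresses are all invented for illustration.

```python
from collections import namedtuple

# One bookmark: a first link, a set of tags in the middle, a second link.
Relation = namedtuple("Relation", ["subject", "tags", "obj"])


class RelationStore:
    """A minimal store of link-tags-link bookmarks (illustrative only)."""

    def __init__(self):
        self._relations = []

    def add(self, subject, tags, obj):
        self._relations.append(Relation(subject, frozenset(tags), obj))

    def by_subject(self, uri):
        """Everything for the .../subj/... page of `uri`."""
        return [r for r in self._relations if r.subject == uri]

    def by_object(self, uri):
        """Everything for the .../obj/... page of `uri`."""
        return [r for r in self._relations if r.obj == uri]

    def by_tag(self, tag):
        return [r for r in self._relations if tag in r.tags]


store = RelationStore()
store.add("http://example.org/tutorial", ["explains"], "http://example.org/rss")
store.add("http://example.org/reader", ["uses"], "http://example.org/rss")
store.add("http://example.org/tutorial", ["uses", "embeds"], "http://example.org/player")

about_rss = store.by_object("http://example.org/rss")
```

The two lookups correspond to the two kinds of page: one per first link, one per second link.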

Partial translations

Have you ever tried Google’s translation service? I know, if you did you wish you hadn’t, unless you were bored and looking for ways to amuse yourself. But you know, translating text is a really daunting task. Generations of PhDs have been spent advancing the state of the art just a little bit at a time. I know what I am talking about: I lived with some of them in COGS, at Sussex University. I remember reading somewhere that new, better automatic translators will soon be available. Good! We are waiting for them.

In the meantime…

I had this idea:
Have you ever tried to translate a page from a language you don’t know… quite well, but are not totally ignorant of either? Something in between. Here in Europe it is quite common. And the same is true when I read posts in Portuguese, or in American, from people on the other side of the ocean.

Yes, I can try to use Google translate mechanism, but it doesn’t give me something easyto chew. Look at this post, for example:

Depois do high vem o low. É uma lei do universo.
E no low todo mundo é feio e o mundo é triste e é tudo um saco.

E eu já nem sei o que me move.
From here

Google translates it as:

es low.? a law of the universe. E in low everybody? ugly and the world? sad e? everything a bag.

E I j? nor I know what it moves me.

From my darling Alenahra.

I suppose a better translation would be:

After a high comes a low. It is a law of the universe. And in a low everybody is ugly and the world is sad and everything is empty.

And I still don’t know what it is that moves me.

And Ale’ will tell me if I got it right.

My idea is that Google, instead of providing a tentative answer, should provide all the possible translations for each word. Those translated words should appear when we point to a word with the mouse. I know it is a slow way of reading a document, one word at a time, but soon the reader will pick up the most common words, and will speed up.

What follows is an example. Move the mouse over the words to see the translations appear. I used some simple translations that I could find. Obviously the tool I envision would have to be more professional.

Depois do high vem o low. É uma lei do universo. E no low todo mundo é feio e o mundo é triste e é tudo um saco.

E eu já nem sei o que me move.
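Markup for an example like this could be generated with a few lines of Python. This is only a sketch: the GLOSSARY dictionary and the annotate function are hypothetical names of mine, the glossary covers just a handful of words, and a real tool would need a proper dictionary behind it.

```python
from html import escape

# Hypothetical mini-glossary; a real tool would query a full dictionary.
GLOSSARY = {
    "depois": ["after", "afterwards"],
    "vem": ["comes"],
    "lei": ["law"],
    "mundo": ["world"],
    "triste": ["sad"],
}


def annotate(text, glossary):
    """Wrap each known word in a <span> whose title lists its translations.

    Browsers show the title attribute as a tooltip when the mouse
    hovers over the word. Unknown words are left as they are.
    """
    out = []
    for word in text.split():
        # Look the word up without surrounding punctuation or capitals.
        key = word.strip(".,;:!?").lower()
        options = glossary.get(key)
        if options:
            title = escape(" / ".join(options), quote=True)
            out.append('<span title="%s">%s</span>' % (title, escape(word)))
        else:
            out.append(escape(word))
    return " ".join(out)
```

Calling annotate("Depois do high vem o low.", GLOSSARY) wraps "Depois" and "vem" in spans and leaves the other words untouched.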

In Italy right now more and more people are getting comfortable with English. If you had come here only 10 years ago, most people would have refused to even try to speak English, even if they had studied it in school. Now, I believe thanks to the internet, people are reading English pages daily, the dictionary often in a corner of the desk, ready to be used. It would be helpful for them to have such a system.

And I would finally learn Portuguese!

Porto Alegre, aspettami!

Special thanks to travlang.com for providing part of the translations.

Clustering Delicious Tags

I went on programming my favourite Python program: Delimind.

In short: I made a new release of the Deli Mind program. Here is the source code (just remember to rename it from .txt to .py). Now similar tags are clustered together.

  1. Here is what it looks like.
  2. Here is what the previous version looked like.
  3. The original from Brownhen (may he live long and prosper) used to be here, although now it is missing.

All on the same data. Mine, now.
Go and enjoy.
(Later addition: while the program works well for small databases of links, like mine at the time I wrote this entry, it doesn’t scale well. It crashes for most people who try to use it with more than 1000 bookmarks. For this reason I was forced to change the link on the cluster example to a database with fewer nodes.)

Now the technical stuff for those that have a bit more patience.

Tags are not all the same, some are more similar than others. So, for example, the tags “September11” and “GeorgeBush” have more links in common than “GeorgeBush” and “intelligence”. The idea behind this version of DeliMind was to cluster tags that had links in common. The catch is that distance is generally not a transitive property (if I am near to you, and you are near to Jim, I am not necessarily that near to Jim), while clustering is (if you and I are in the same cluster, and you and Jim are in the same cluster, then Jim and I have to be in the same cluster… unless people belong to several clusters, but that’s a complication).

So I started by making a matrix of relations among tags (all_dict). Each tag, with respect to each other tag, could be either:

  1. One contained in the other
  2. Identical
  3. Disjoint
  4. With # bookmarks in common
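The four cases above can be sketched with plain Python sets. This is not the DeliMind source: the function name relation and the labels it returns are assumptions of mine, and each tag is represented simply by the collection of links filed under it.

```python
def relation(links_a, links_b):
    """Classify how two tags relate, given the links filed under each.

    Returns a (label, number_of_links_in_common) pair.
    """
    a, b = set(links_a), set(links_b)
    common = len(a & b)
    if a == b:
        return ("identical", common)
    if common == 0:
        return ("disjoint", 0)
    if a < b:  # every link of the first tag is also in the second
        return ("first_in_second", common)
    if b < a:  # every link of the second tag is also in the first
        return ("second_in_first", common)
    return ("overlap", common)  # some links shared, neither contains the other
```

Running this over every pair of tags would give the raw material for a matrix like the one described above.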

Then, from the number of links in each of the two tags and the number of links they have in common, I invented a measure of similarity. If #A is the number of links in tag A, #B is the number of links in tag B, and #AB is the number of links in common,
then the relative similarity (SAB) will be:
SAB= sqrt((#AB/#A)*(#AB/#B))

I actually played with various measures:
SAB= ((#AB/#A)+(#AB/#B))/2
SAB= Max(#AB/#A,#AB/#B)
They all went from 0 to 1, and were quite similar… (I am not going to discuss their relative properties)
But the first one just seemed the one that made the most sense, and in the end the resulting map was the one closest to my personal intuition of what should be in which cluster.
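For those who want to play with them, the three measures can be written down directly. The function names are mine (they are not taken from the DeliMind source), and returning 0 for an empty tag is an assumption I added to avoid dividing by zero.

```python
from math import sqrt


def sim_geometric(n_a, n_b, n_ab):
    """sqrt((#AB/#A) * (#AB/#B)): the measure that was finally used."""
    if n_a == 0 or n_b == 0:
        return 0.0  # assumption: an empty tag is similar to nothing
    return sqrt((n_ab / n_a) * (n_ab / n_b))


def sim_mean(n_a, n_b, n_ab):
    """((#AB/#A) + (#AB/#B)) / 2"""
    if n_a == 0 or n_b == 0:
        return 0.0
    return ((n_ab / n_a) + (n_ab / n_b)) / 2


def sim_max(n_a, n_b, n_ab):
    """Max(#AB/#A, #AB/#B)"""
    if n_a == 0 or n_b == 0:
        return 0.0
    return max(n_ab / n_a, n_ab / n_b)
```

As a worked case: for a tag with 4 links and a tag with 9 links that share 3 of them, the two ratios are 3/4 and 1/3, so the first measure gives sqrt(1/4) = 0.5, the second about 0.54, and the third 0.75.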

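Once the similarity matrix was done I started studying the clusters. For each triplet of tags A, B, C I would update the entry for the pair A, C. Written in terms of the distance D (the inverse of the similarity):
DAC := min(previous DAC, max(DAB, DBC))
which in terms of similarity reads SAC := max(previous SAC, min(SAB, SBC)).
And I would continue going through all possible triplets, and then start again from the beginning, until no further changes happened.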

Why? The idea is that the similarity between two tags measures how easy it is to jump from one to the other. Visualise each tag as an island, and an animal that can jump from one island to the other, but only up to a certain distance. So if it can find a succession of tags between two tags, A and B, where the similarity (the similarity is the inverse of the distance) is always above its jumping threshold (that is, the distance is below its jumping ability), then the animal can move from A to B. If not, A and B are in different clusters, effectively unreachable.

But we don’t know how far our beast can jump. So in this way we end up with a number that says: somewhere between A and B it is possible to find a succession of tags such that the distance is never above x, so the value stored for A and B (read as a distance) is the minimum between the original value and x.
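Here is a minimal sketch of that repeated pass over all triplets. It is written in terms of similarity rather than distance, so the update becomes a max of a min (the mirror image of a min of a max on distances). The function name bottleneck_closure and the dict-of-dicts layout are my assumptions, not the actual DeliMind code.

```python
def bottleneck_closure(sim):
    """Propagate similarities along chains of tags.

    `sim` maps tag -> {other_tag: similarity}. In the result, the value
    for a pair (A, C) is the best over all chains from A to C of the
    weakest similarity met along the chain, i.e. the hardest jump the
    animal has to make on its easiest route.
    """
    tags = list(sim)
    s = {a: dict(sim[a]) for a in tags}  # work on a copy
    changed = True
    while changed:  # start again until a full pass changes nothing
        changed = False
        for a in tags:
            for b in tags:
                for c in tags:
                    if a == c:
                        continue
                    # Going from a to c through b is as easy as the
                    # harder of the two jumps a->b and b->c.
                    via_b = min(s[a].get(b, 0.0), s[b].get(c, 0.0))
                    if via_b > s[a].get(c, 0.0):
                        s[a][c] = via_b
                        changed = True
    return s
```

For example, if A and B have similarity 0.8, B and C have 0.6, and A and C only 0.1, then after the closure A and C are at 0.6: the animal reaches C through B, and the hardest jump on the way is the 0.6 one.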

If it does feel complicated don’t worry. I got confused a few (hundred) times programming it. And just could not understand why those damn tags were not clustering… until I got it right.

So, now you have this nice matrix, only between your main tags (the ones that are not contained in another tag, cf. the previous version), and you (or actually I) need to cluster the tags.

Note also that you don’t need to cluster the tags only once. Once you have made a clustering (for an animal which can jump d), you can still partition inside each cluster for animals that can jump less than d.
The first time I just asked the program to cluster at each possible number. That is, if a number was present in the matrix, assume that some animal was able to jump exactly that distance. In this way I got a heavily clustered map. It was a mess, but a promising mess. I then saw that most of the interesting things were happening between distances of 0.333333 and 0.6666.

That is, it made good sense to ask for the clusters generated by putting together tags that had one third of the links in common, and tags that had up to two thirds of the links in common.
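Cutting the matrix at a given threshold can be sketched like this. I am assuming the matrix has already been through the triplet pass described earlier, so that being in the same cluster is transitive and it is enough to compare each tag with one representative per cluster. The function name clusters_at and the numbers in the usage note are made up for illustration.

```python
def clusters_at(closed, threshold):
    """Group tags whose propagated similarity is at least `threshold`.

    `closed` maps tag -> {other_tag: similarity} and is assumed to be
    already closed under the triplet update, so comparing a tag with
    the first member of a cluster is enough.
    """
    clusters = []
    for tag in closed:
        for cluster in clusters:
            representative = cluster[0]
            if closed[representative].get(tag, 0.0) >= threshold:
                cluster.append(tag)
                break
        else:
            # No existing cluster is reachable at this threshold.
            clusters.append([tag])
    return clusters
```

With three tags where a and b are at 0.7 and both are at 0.4 from c, a threshold of one third puts all three together, while a threshold of two thirds keeps a and b together and leaves c on its own. That is the nesting described above: the finer clustering partitions the coarser one.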

This is how I got clusters:

  • porno, sex and eros
  • GeorgeBush, September11, politics, economy, historical, terrorism, usa
  • green, sustainability

Example of the Clustered Map
Then I just applied the same process to the subtags of each tag.

Ok, I can be satisfied, I can go and have something to eat.

As always, if you find it useful drop me a line, I appreciate it.

Pietro