transition done

We seem to have made it.
The website is now hosted on different servers, at dreamhost.com.
Of course, what could not be done was to resurrect the wiki, which will now have to be recreated from scratch. And, much worse, the mindmapping tool: the page from which it was possible to make a mind map of a delicious account. Right now I unfortunately have no time for much, as I am in the last months of my Ph.D., yet I hope to find at least the time to put back the program that makes the mind maps, so that people will at least be able to make the mind maps themselves. Apologies to all the spammers who sent me spam in the last 2 weeks. Knowing that all the latest comments were going to be lost anyway, I did not bother to mark them (about a thousand messages) as spam. Please don’t feel ignored, just send your message again and I shall trash it asap.

Mind Map Maker being tested

The good news is that the Mind Map Maker is finally being used and tested. The bad news is that it does not always work. Somehow it would have been easier if it never worked. I think there are two problems. One is that it requires some heavy downloads from del.icio.us. No matter that the downloads are for different accounts, they all come from the same IP, so I would not be surprised to discover that del.icio.us has bashed the program on the head more than once. I can roughly halve the requests by making the program calculate the whole list of tags itself, instead of downloading it as a separate file. I already had it on my todo list, and I think I will do it tomorrow. So, if you have requested a password and it did not appear, then fear not, just try again in half an hour or so. (Alenahra, I’m speaking to you, for example!)

But this is not the only reason why the map maker is failing. There have also been cases where the map maker made maps that are ‘perfectly acceptable’ from my point of view, but that for some reason cannot be read by the mind map software. What am I referring to? To niels77, for example, for whom the program made what seems to be a perfectly acceptable .mm file that for some reason neither the Java program nor the free mind map program on my computer seems able to read. This is the kind of mystery that is more easily unraveled in the morning.

But for the few maps that don’t make it, many do. Just go to the Make Map page and choose one, any one. Each will tell you a story, a point of view, a set of interests, and a suggestion of how that person sees the world. The more I use them the more I like them.

BTW the Make Map has also made it to the popular page. I feel so unprofessional in noting it ;)

Update: I checked how many directories have been created compared with how many maps have been completed. The ratio is about 110:70. That’s not that good. It means that if you ask for a map there is roughly a 1 in 3 chance that it will not make it. For now, just wait some time, then try again.

Clustering Delicious Tags

I went on working on my favourite Python program: Delimind.

In short: I made a new release of the Deli Mind program. Here is the source code (just remember to rename it from .txt to .py). Similar tags are now clustered together.

  1. Here is what it looks like.
  2. Here is what the previous version looked like.
  3. The original from Brownhen (may he live long and prosper) used to be here, although now it is missing.

All on the same data. Mine, now.
Go and enjoy.
(Later addition: while the program works well for small databases of links, like mine at the time I wrote this entry, it doesn’t scale well with size, and it crashes for most of the people who try to use it with more than 1000 bookmarks. For this reason I was forced to change the link on the cluster example to a database with fewer nodes.)

Now the technical stuff for those who have a bit more patience.

Tags are not all the same; some are more similar than others. So, for example, the tags “September11” and “GeorgeBush” have more links in common than “GeorgeBush” and “intelligence”. The idea behind this version of DeliMind was to cluster tags that had links in common. The catch is that distance is generally not a transitive property (if I am near to you, and you are near to Jim, I am not necessarily that near to Jim), while clustering is (if you and I are in the same cluster, and you and Jim are in the same cluster, then Jim and I have to be in the same cluster… unless people can belong to several clusters, but that’s a complication).

So I started by making a matrix of relations among tags (all_dict). Each tag, with respect to each other tag, could be:

  1. One contained in the other
  2. Identical
  3. Disjointed
  4. With # bookmarks in common
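
The relation matrix above can be sketched with plain Python sets. This is only an illustration, not the actual DeliMind code: the name `all_dict` comes from the post, but its layout here (a dict of dicts holding a label and the number of shared links), the `relation` helper and the toy data are all assumptions.

```python
def relation(links_a, links_b):
    """Classify how two tags relate, given the set of links filed under each."""
    common = links_a & links_b
    if not common:
        return ("disjoint", 0)
    if links_a == links_b:
        return ("identical", len(common))
    if links_a < links_b:          # strict subset: A is contained in B
        return ("a_in_b", len(common))
    if links_b < links_a:          # strict subset: B is contained in A
        return ("b_in_a", len(common))
    return ("overlap", len(common))

# Hypothetical toy data: tag -> set of bookmark identifiers.
tags = {
    "programming": {"u1", "u2", "u3"},
    "python": {"u1", "u2"},
    "delicious": {"u4"},
    "del.icio.us": {"u4"},
    "politics": {"u3", "u5"},
}

# Assumed layout: all_dict[a][b] describes tag a with respect to tag b.
all_dict = {
    a: {b: relation(tags[a], tags[b]) for b in tags if b != a}
    for a in tags
}
```

Keeping the number of shared links next to the label means the same structure can later feed a similarity measure without recomputing the intersections.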

Then, from the number of links in each of the two tags and the number of links they have in common, I invented a measure of similarity. Let #A be the number of links in tag A, #B the number of links in tag B, and #AB the number of links in common.
Then the relative similarity (SAB) is:
SAB = sqrt((#AB/#A)*(#AB/#B))

I actually played with various measures:
SAB= ((#AB/#A)+(#AB/#B))/2
SAB= Max(#AB/#A,#AB/#B)
They all went from 0 to 1, and were quite similar… (I am not going to discuss their relative properties.)
But the first one just seemed to make the most sense, and in the end its resulting map was the closest to my personal intuition of what should be in which cluster.
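
The three measures can be compared side by side with a few lines of Python. The function names are made up for this sketch, and the example counts (4, 9 and 3 links) are arbitrary; only the formulas come from the text.

```python
import math

# n_a, n_b: number of links in tags A and B (assumed > 0);
# n_ab: number of links the two tags have in common.

def sim_geometric(n_a, n_b, n_ab):
    """SAB = sqrt((#AB/#A) * (#AB/#B)), the measure finally chosen."""
    return math.sqrt((n_ab / n_a) * (n_ab / n_b))

def sim_mean(n_a, n_b, n_ab):
    """SAB = ((#AB/#A) + (#AB/#B)) / 2"""
    return ((n_ab / n_a) + (n_ab / n_b)) / 2

def sim_max(n_a, n_b, n_ab):
    """SAB = Max(#AB/#A, #AB/#B)"""
    return max(n_ab / n_a, n_ab / n_b)

# Arbitrary example: A has 4 links, B has 9, and they share 3.
for measure in (sim_geometric, sim_mean, sim_max):
    print(measure.__name__, round(measure(4, 9, 3), 3))
```

With these counts the geometric version gives about 0.5, the mean about 0.54 and the maximum 0.75: the geometric mean is the most conservative of the three when one tag is much larger than the other.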

Once the similarity matrix was done I started studying the clusters. For each triplet of tags A, B, C I would update
SAC := min(previous SAC, max(SAB, SBC))
and I would keep going through all possible triplets, starting again from the beginning, until no more changes occurred.

Why? The idea is that the similarity between two tags measures how easy it is to jump from one to the other. Visualise each tag as an island, and imagine an animal that can jump from one island to another, but only up to a certain distance. If it can find a succession of tags between two tags, A and B, where every step is within its jumping ability (the similarity is the inverse of the distance, so a high similarity means a short jump), then the animal can move from A to B. If not, A and B are in different clusters: effectively unreachable.

But we don’t know how far our beast can jump. So in this way we end up with a number that says: somewhere between A and B it is possible to find a succession of tags such that the distance is never above x, so SAB is equal to the minimum between the original SAB and x.
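
Here is a minimal sketch of that repeated triplet update, written on distances because that is how the island picture reads. It is not the original DeliMind code: the helper names are hypothetical, the numbers are invented, and the matrix is assumed to already hold distances (for instance 1 minus the similarity, which is an assumption of this sketch; the post only says the similarity is the inverse of the distance).

```python
from itertools import permutations

def symmetric(tags, pairs, default=1.0):
    """Build a full symmetric distance table; unlisted pairs get `default`."""
    dist = {a: {b: default for b in tags if b != a} for a in tags}
    for (a, b), d in pairs.items():
        dist[a][b] = dist[b][a] = d
    return dist

def propagate(dist):
    """Apply D_AC := min(D_AC, max(D_AB, D_BC)) over all triplets
    until a full pass changes nothing."""
    tags = list(dist)
    changed = True
    while changed:
        changed = False
        for a, b, c in permutations(tags, 3):
            via_b = max(dist[a][b], dist[b][c])   # longest jump on the route via B
            if via_b < dist[a][c]:
                dist[a][c] = via_b
                changed = True
    return dist

# Invented distances between a few tags from the post.
tags = ["politics", "GeorgeBush", "September11", "green"]
dist = propagate(symmetric(tags, {
    ("politics", "GeorgeBush"): 0.4,
    ("GeorgeBush", "September11"): 0.3,
    ("politics", "September11"): 0.8,
}))
print(dist["politics"]["September11"])
```

In this toy case the direct jump from politics to September11 is 0.8 long, but going through GeorgeBush never needs a jump longer than 0.4, so the stored value drops to 0.4. The tag green stays at distance 1.0 from everything, because no route reaches it with shorter jumps.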

If it does feel complicated don’t worry. I got confused a few (hundred) times programming it. And just could not understand why those damn tags were not clustering… until I got it right.

So, now you have this nice matrix, only between your main tags (the ones that are not contained in another tag, cf. the previous version), and you (or actually I) need to cluster the tags.

Note also that you don’t need to cluster the tags only once. Once you have made a clustering (for an animal that can jump d), you can still partition inside each cluster for animals that can jump less than d.
The first time I just asked it to cluster at each possible number. That is, if a number was present in the matrix, assume that someone was able to jump exactly that distance. In this way I got a heavily clustered map. It was a mess, but a promising mess. I then saw that most of the interesting things were happening between distances of 0.333333 and 0.6666.

That is, it made good sense to ask for the clusters generated by putting together tags that had one third of their links in common, and tags that had up to two thirds of their links in common.
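
Cutting the matrix at two levels can be sketched as follows. Everything here is illustrative rather than taken from DeliMind: the helper names and the distances are invented, and the table is assumed to have already gone through the triplet propagation, so that "reachable with jumps of at most d" splits the tags cleanly into groups.

```python
def make_dist(tags, pairs, default=1.0):
    """Build a full symmetric distance table; unlisted pairs get `default`."""
    dist = {a: {b: default for b in tags if b != a} for a in tags}
    for (a, b), d in pairs.items():
        dist[a][b] = dist[b][a] = d
    return dist

def clusters_at(dist, jump):
    """Group tags that an animal jumping at most `jump` can travel between.
    Assumes `dist` has already been propagated over all triplets."""
    remaining = set(dist)
    clusters = []
    while remaining:
        seed = min(remaining)  # alphabetical pick keeps the output deterministic
        group = {seed} | {t for t in remaining
                          if t != seed and dist[seed][t] <= jump}
        clusters.append(sorted(group))
        remaining -= group
    return clusters

# Invented, already propagated distances.
tags = ["GeorgeBush", "September11", "politics", "green", "sustainability"]
dist = make_dist(tags, {
    ("GeorgeBush", "September11"): 0.2,
    ("GeorgeBush", "politics"): 0.5,
    ("September11", "politics"): 0.5,
    ("green", "sustainability"): 0.3,
})

coarse = clusters_at(dist, 2 / 3)   # long jumps: big clusters
fine = clusters_at(dist, 1 / 3)     # short jumps: sub-clusters inside them
print(coarse)
print(fine)
```

The fine clusters always sit inside the coarse ones, which is what allows a second level of branches to be drawn inside each first-level cluster.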

This is how I got clusters:

  • porno, sex and eros
  • GeorgeBush, September11, politics, economy, historical, terrorism, usa
  • green, sustainability

Example of the Clustered Map
Then I just applied the same process to the sub tags of each tag.

Ok, I can be satisfied, I can go and have something to eat.

As always, if you find it useful drop me a line, I appreciate it.

Pietro

Hierarchical Delicious Free Mind Map

So, I just modified the deli.mind script, originally from brownhen.
The original would take the public bookmarks from delicious and make a free mind map out of them.

(For those who have no time to read the whole post, I’ll tell you immediately that I modified the code. The new code can be found here, and an example is here; open some nodes to see the difference!)

The program is written in Python, and I wasn’t very happy with the result. I mean, it was great to have the map, but I have so many tags that it was pretty much useless. The fact is that we tend to reuse tags that we have already used. This generates a positive feedback dynamic that tends to create a handful of very common tags (even among your own tags) and many, many tags used only once or twice. I bet you could also plot them into a nice power law picture (but, alas, you need at least 1000 tags to make it statistically meaningful!). This is generally true, but is particularly true for people who, like me, tend to store each link with around 10 different tags. It means that the long list of tags that was using up my screen was mainly composed of completely unimportant tags, with only a few interesting ones among them.
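
To see this skew in your own bookmarks, counting how often each tag is used is enough. The snippet below is a small sketch on invented data; the `bookmarks` layout (link identifier mapped to its list of tags) is an assumption, not the format the delicious download actually uses.

```python
from collections import Counter

# Hypothetical toy data: bookmark identifier -> tags attached to it.
bookmarks = {
    "u1": ["programming", "python", "howto"],
    "u2": ["programming", "python"],
    "u3": ["programming", "politics"],
    "u4": ["programming", "delicious"],
}

# How many links carry each tag.
counts = Counter(tag for tags in bookmarks.values() for tag in tags)

# Tags used only once or twice: the long tail that clutters the map.
rare = sorted(tag for tag, n in counts.items() if n <= 2)

print(counts.most_common(1))
print(rare)
```

Even on four bookmarks one tag covers everything while the other four are rare, which is the shape described above in miniature.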

Not only this, but some tags tend to appear only in conjunction with other tags. For example, the tag “python” always comes with the tag “programming”. In a sense it is a “sub tag”.

Oops, we are back into hierarchy, aren’t we?

Well, not exactly. First, the same link can be present in different tags that are not hierarchically related, and second, two tags can have links in common without being completely hierarchically related (think about the tags ‘September11’ and ‘GeorgeBush’ as a good example). The last thing to note is that from time to time there are tags that contain exactly the same links, either because they are synonymous (‘del.icio.us’ and ‘delicious’, for example) or because I had not stored enough links to differentiate between the two.

So the new program extracts the information about the relation among the tags, and uses it to build a more interesting mind map.

More precisely, two tags can be:

  • Identical,
  • One inside the other,
  • Vice versa,
  • With a non empty intersection, but with some extra links,
  • Completely disjoint.

This information is then used to create the new mind map.

With the following novelties:

  • Sub tags are shown as a sub branch of their parent tag.
  • Tags that are equivalent are shown together with a little empty branch as their parent, to connect them all.
  • A sub tag can be sub tag to more than one tag.
  • Each tag is also followed by two numbers: # of links & # of sub tags.
    So you have an idea of how big the tree you are going to explore is.
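
The points in this list can be illustrated with a short sketch. It is not the modified deli.mind script: the function names, the label format and the toy data are assumptions, and for simplicity every tag whose links strictly contain another tag's links counts as its parent.

```python
def describe(tags):
    """For each tag: link count, sub tags, parent tags and equivalent tags."""
    info = {}
    for tag, links in tags.items():
        info[tag] = {
            "links": len(links),
            # strict subsets of this tag's links are its sub tags
            "sub_tags": sorted(t for t, l in tags.items() if l < links),
            # strict supersets are its parents (there can be several)
            "parents": sorted(t for t, l in tags.items() if links < l),
            # other tags holding exactly the same links
            "equivalent": sorted(t for t, l in tags.items()
                                 if l == links and t != tag),
        }
    return info

def label(tag, info):
    """Node text: the tag followed by its two numbers."""
    return f"{tag} ({info[tag]['links']} links, {len(info[tag]['sub_tags'])} sub tags)"

# Hypothetical toy data: tag -> set of bookmark identifiers.
tags = {
    "programming": {"u1", "u2", "u3"},
    "python": {"u1", "u2"},
    "tutorial": {"u1", "u4"},
    "howto": {"u1"},
    "delicious": {"u5"},
    "del.icio.us": {"u5"},
}

info = describe(tags)
print(label("programming", info))
print(info["howto"]["parents"])
print(info["delicious"]["equivalent"])
```

In this toy data “howto” ends up under three different parents, and “delicious” and “del.icio.us” are reported as equivalent, so they would hang together from one small empty branch.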

Detail of a tag and its sub tags
You can see my “hierarchical delicious free mind map” in Java format here, while the code is here.

I also fixed a couple of bugs that would give some fake results (e.g. being tagged as ‘socialsoftware’ does not mean being tagged as ‘war’, etc.).

This isn’t the end, I am planning to work on this some more, when I have time.

Disclaimer: This was also my first tentative hack in python. So I am sure I did plenty of things in a clumsy, slow and redundant way. But I am learning.

Acknowledgment: I am very grateful to brownhen, because if he had not released the first version of the script I would not have started at all.