Related posts:

  1. COP15 Needs an e-Government System This morning I received a mail from Copenhagen. It was very moving, and describing a situation of chaos, strong commitment, and braveness. It told the story of people fighting with non violence, and shouting that they want change. And I am afraid all this is useless. I feel once again what I felt looking at Iran [...]...

Hierarchical Delicious Free Mind Map

So, I just modified the deli.mind script, originally from brownhen.
The original takes your public bookmarks from del.icio.us and makes a free mind map out of them.

(For those who have no time to read the whole post, I will tell you immediately that I modified the code. The new code can be found here, and an example is here. Open some nodes to see the difference!)

The program is written in Python, and I wasn’t very happy with the result. It was great to have the map, but I have so many tags that it was pretty much useless. The fact is that we tend to reuse tags that we have already used. This generates a positive feedback dynamic that creates a handful of very common tags (even among your own tags) and many, many tags used only once or twice. I bet you could also plot them into a nice power law picture (but, alas, you need at least 1000 tags to make it statistically meaningful!). This is generally true, but it is particularly true for people who, like me, tend to store each link with around 10 different tags. So the long list of tags that was using up my screen was mainly composed of completely unimportant tags, with only a few interesting ones among them.
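
A minimal sketch of how such a tag count could be made, to see the few common tags and the long tail of rare ones. The `bookmarks` dictionary and the function names are made up for illustration; they are not part of deli.mind or of any del.icio.us API.

```python
from collections import Counter

# Hypothetical sample data: each bookmarked URL with the tags stored on it.
bookmarks = {
    "http://example.org/a": ["python", "programming", "tutorial"],
    "http://example.org/b": ["python", "programming"],
    "http://example.org/c": ["programming", "delicious"],
    "http://example.org/d": ["mindmap"],
}


def tag_frequencies(bookmarks):
    """Count how many bookmarks each tag is attached to."""
    counts = Counter()
    for tags in bookmarks.values():
        counts.update(set(tags))  # set(): count a tag once per bookmark
    return counts


def rank_frequency(counts):
    """Return (rank, tag, count) tuples, most used tag first."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, tag, n) for rank, (tag, n) in enumerate(ordered, start=1)]


for rank, tag, n in rank_frequency(tag_frequencies(bookmarks)):
    print(rank, tag, n)
```

Plotting count against rank on log-log axes is the usual way to eyeball a power law, but with a sample this small it shows nothing meaningful.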

Not only this, but some tags tend to appear only in conjunction with other tags. For example, the tag “python” always comes with the tag “programming”. In a sense it is a “sub tag”.

Oops, we are back into hierarchy, aren’t we?

Well, not exactly. First, the same link can be present under different tags that are not hierarchically related. Second, two tags can have links in common without being completely hierarchically related (think about the tags ‘September11’ and ‘GeorgeBush’ as a good example). The last thing to note is that from time to time there are tags which contain exactly the same links, either because they are synonymous (‘del.icio.us’ and ‘delicious’, for example) or because I have not stored enough links to differentiate between the two.

So the new program extracts the information about the relation among the tags, and uses it to build a more interesting mind map.

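More precisely, two tags A and B can be:

  • Identical (they contain exactly the same links),
  • One inside the other (every link of A is also in B),
  • Vice versa (every link of B is also in A),
  • With a non-empty intersection, but each with some extra links,
  • Completely disjoint.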
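
The five cases above map naturally onto Python set comparisons. This is only a sketch, under the assumption that each tag is represented by the set of links it labels; the function name `relation` and the sample links are hypothetical, not taken from the actual script.

```python
def relation(links_a, links_b):
    """Classify how the links of tag A relate to the links of tag B."""
    a, b = set(links_a), set(links_b)
    if a == b:
        return "identical"
    if a < b:  # strict subset: every link of A is also in B
        return "a inside b"
    if b < a:  # strict subset the other way round
        return "b inside a"
    if a & b:  # some shared links, but each tag has extra ones
        return "overlapping"
    return "disjoint"


# Made-up link identifiers, following the examples in the post.
python = {"u1", "u2"}
programming = {"u1", "u2", "u3"}
september11 = {"u4", "u5"}
georgebush = {"u5", "u6"}

print(relation(python, programming))
print(relation(september11, georgebush))
```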

This information is then used to create the new mind map.

With the following novelties:

  • Sub tags are shown as a sub branch of their parent tag.
  • Tags that are equivalent are shown together with a little empty branch as their parent, to connect them all.
  • A sub tag can be sub tag to more than one tag.
  • Each tag is also followed by two numbers: # of links & # of sub tags.
    So you have an idea of how big the tree you are going to explore is.
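
To give an idea of how a tree like this could be put together, here is a rough sketch, again assuming each tag is a set of links. It is not the code of the modified script: the names are invented, equivalent tags are simply joined with “=” instead of hanging from an empty branch, and the second number counts only direct sub branches.

```python
from collections import defaultdict


def group_equivalent(tag_links):
    """Group tags that label exactly the same set of links."""
    groups = defaultdict(list)
    for tag, links in tag_links.items():
        groups[frozenset(links)].append(tag)
    # Map each link set to a sorted tuple of the tags sharing it.
    return {links: tuple(sorted(tags)) for links, tags in groups.items()}


def direct_children(groups):
    """For every link set, find the link sets nested directly inside it."""
    children = {links: [] for links in groups}
    for parent in groups:
        inside = [c for c in groups if c < parent]
        for child in inside:
            # Skip a child that is itself nested inside another child.
            if not any(child < other for other in inside):
                children[parent].append(child)
    return children


def render(tag_links):
    """Return the tree as indented lines: 'tag (# links, # sub branches)'."""
    groups = group_equivalent(tag_links)
    children = direct_children(groups)
    lines = []

    def visit(links, depth):
        label = " = ".join(groups[links])
        counts = f"({len(links)}, {len(children[links])})"
        lines.append("  " * depth + f"{label} {counts}")
        for child in sorted(children[links], key=lambda c: groups[c]):
            visit(child, depth + 1)

    # Roots are link sets not strictly inside any other link set.
    roots = [g for g in groups if not any(g < other for other in groups)]
    for root in sorted(roots, key=lambda g: groups[g]):
        visit(root, 0)
    return lines


# Made-up example data.
tag_links = {
    "programming": {"u1", "u2", "u3"},
    "python": {"u1", "u2"},
    "tutorial": {"u1"},
    "web": {"u1", "u4"},
    "del.icio.us": {"u4"},
    "delicious": {"u4"},
}

print("\n".join(render(tag_links)))
```

In this sample “tutorial” shows up under both “python” and “web”, which is the case of a sub tag belonging to more than one parent.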

(Image: Detail of a tag and its sub tags)

You can see my “hierarchical delicious free mind map” in Java format here, while the code is here.

I also fixed a couple of bugs that would give some fake results (e.g. being tagged as ‘socialsoftware’ does not mean being tagged as ‘war’, etc.).

This isn’t the end: I am planning to work on this some more when I have time.

Disclaimer: This was also my first attempt at hacking in Python, so I am sure I did plenty of things in a clumsy, slow and redundant way. But I am learning.

Acknowledgment: I am very grateful to brownhen, because if he hadn’t released the first version of the script I would not have started at all.

12 comments to Hierarchical Delicious Free Mind Map

  • ronnie

    Hi

    This looks very interesting. Where can I get the Python script? The link above is broken.

    Thanks
    Ronnie

  • Robb Broome

    Have you seen the 3D visual thesaurus? I think that a modified version of this would be an excellent way to depict the information in del.icio.us. Nodes = tags? Items related to tags = URLs, with each URL surrounded by all the tags it’s received (yours or the whole community’s)?

  • Robb: Actually no. But I will check it out. Pietro

  • JOYCE

    WHAT’S A URI? AND FXJ IS WHERE? THANK YOU JOYCE

  • [...] . Now similar tags are clustered together. Here is how it looks like. Here is how the previous version looked like. Here is how the original from Brownhen (may he live long and prosper) l [...]

  • A dream delicious client
    A little over a month ago I mentioned my contemplation of tag based bookmark management. Since then I have made a commitment of sorts to move my links to del.icio.us. Lately, instead of working through my bookmarks and uploading the links I want to keep…

  • [...] artial translations Taoist Books Mind Map Clustering Delicious Tags Entering protolife Hierarchical Delicious Fre [...]

  • [...] There is an additional implication of the fact that highly idiosyncratic tags like “must-read” don’t tend to dominate the distribution (I think there is a side issue that there are many different ways to be idiosyncratic .. it can be a tag only used by a particular individual, or alternatively it can be a tag used by more people, but each time in a highly individual way). If this is generally true, it shows that examples of this sort, which are often cited against the “tags as ontologies” notion, lose some of their power. These highly individual tags seem to me to take on a completely different role in tagging behavior. My feeling is that popular tags are about “collective categories” (of various sorts, to be discussed) and idiosyncratic tags are about user-centered, context dependent memory cues.

    This has at least two implications. First, we need to consider different sorts of tags .. tags are not all the same. Second, we need to find evidence that highly individual tags are even useful. They are often assumed to be, in the interest of the new, empowering, free-to-choose-as-you-like paradigm. But how many things can you label “to read” before the tag loses its meaning? Like the piles of papers rising like mountains in the corners of many of our desks, I am sure!

    In fairness, I acknowledge the sometimes made claim that user tags have an additional (or perhaps predominant?) role in pointing to similar, possibly useful URLs. According to this view tags are different to formal categories in that the latter are about locating resources in some precise manner while the former is about navigating among potentially useful sites, using tags as pointers. But even if this is true, the point remains that tags like “must-read” and “cool” will add very different amounts of value to different audiences.

    There is another interesting attempt to find patterns in tags using statistical co-occurrence. Here is an example of my tags translated into a mindmap.

    The mindmap shows two interesting patterns. First, it shows groups of tags which tend to be used for the same URL. The amount of overlap can be adjusted by a parameter, but the default is set around 60%. That is, if two tags share 60% of their URLs they are clustered together. An example on my map is [Wikepedia encyclopedia]. More than two tags can be clustered as in [emoticons messenger smiley yahoo]. Actually this is a little more complicated because the parametrically determined number of shared tags also depends on the depth of the nodes. Nodes at the leaves can be clustered even if they share much fewer than 60%.

    The meaning of the hierarchical relation is the second interesting point in this map. Any tag which appears as a sub-tag in the mindmap is one that never labels a URL which is different from the one the super-tag labels. For example on my mindmap “rss” labels two URLs with the names “RSS Readers for Linux” and “FeedXs”. In turn “rss” has the sub-tags “reader”, “feeds”, “free” and “publishing”. Of these, the first is used to tag “RSS Readers for Linux” and the last three each tag “FeedXs.” So what are the additional tags doing? One possibility is that they are in a sense redundant … rss is always free, so the two tags provide alternate routes for finding the site. In my particular folksonomy either one would have done the trick on its own, but the redundancy might help finding the resource from two different sources. Another possibility is that the additional tags refine the search. “RSS” would give two links but “rss” + “reader” gives only one. As such they act like subclasses in a formal taxonomy. Except .. they don’t. In the current example it is obvious that “reader” is not supposed to be a subclass of “rss”. Instead, I meant to have a single category “rss reader” .. but del.icio.us does not allow two-word tags! But there are other reasons for two tags to go together, apart from a design side effect and a genuine subclass.

    For example “September11” and “GeorgeBush” might go together ’til the end of time, but not because one is a subclass of the other, nor is one in any sense a refinement of the other. These relationships contain valuable information, which I haven’t really thought enough about. But it is pretty clear that a number of different patterns could emerge. One observation which is pretty clear is that individual taggers (not an aggregation now) have a selected set of tags which in some sense dominates the others. Look at some numbers on the main site again. My map has the following numbers next to it: (78, 168, 56), meaning that I have 78 unique URLs tagged with a total of 168 tags, but only 56 of those are unique. The pattern here varies widely, with some people having many more total tags than main tags (lots of hierarchical clustering) and others having hardly any hierarchical use of tags. There is clearly lots of interesting information hidden in these relationships. But I haven’t yet told you about what I think is going on with the popular tags. I think this might also help us understand the individual ones ….. [...]

  • [...] So now we need some evidence on how people use tags. I suppose the first thing to look at is how many tags individual people tend to use with individual URLs. Speroni, who has gathered many users’ information for his mindmaps, estimates this figure at around 10. This site is a very interesting read because he goes on to explain how you can use Pascal’s triangle to calculate the number of URLs that can be uniquely indexed by a combination of n out of a total of m tags. But this is an idealistic calculation which assumes that the tags are used independently (I think!). So I gathered some numbers from the users who generated mindmaps. There were 2202 maps from which I calculated the means and medians. The mean and median number of links were 179 and 103, respectively. To index 179 links, you need 10 tags used in combinations of 4 (which gives 210 distinct combinations). But the mean number of tags per user is 100! Obviously a suboptimal strategy by the users!!

    This suggests that people use many more tags than they really need for each bookmark. Many of those tags must be redundant, or even unused. Of course the result is consistent with the idea that a few, frequent tags are used as categories in a way that efficiently narrows the search. The rest of the tags are there either for redundancy, or perhaps to facilitate alternative, albeit more infrequent, access paths.

    Here is another interesting observation that supports this view. Golder and Huberman analyzed the tags assigned to individual URLs by individual users, in the order that the tags were assigned to that URL. What they found was that people tend to use the highest frequency tags first, then start using the lower frequency, more idiosyncratic tags. This is again consistent with the view that high frequency tags are like categories or folders to hold relevant items, while lower frequency tags add more personally oriented distinguishing features to each resource. [...]
