The two talks I gave this summer at the International Workshop on Challenges and Visions in the Social Sciences are now available at videolectures.net.
Not the best talks of my career, and hopefully not the last either. But the guys at VL did a great job in recording them.
One of the talks was about tags, and the second about democracy in the 21st century.
In the one about… Continue reading My first 2 talks available online: Tags & 21st Century Democracy
It is now time to present the next project we have been working on: TagBay. And I say ‘we’ because in this project I am not alone. I did it with a friend of mine, Derek, who very patiently agreed to code some of the ideas I have been tinkering with over the last year or so. I am speaking about tags, tag clouds, the distance between tags, and so on.
So, in brief, we made a web site to tag material that is being sold on e-Bay. Anybody can tag any object that is on sale. And not only objects: you can tag sellers, too (oh, we are not responsible for offensive tags, eh!).
Tags on objects can be made private or public, and you can also search among your tags, among everybody else’s tags, and eventually (when we code it) it will be possible to search among the tags of another user, like in del.icio.us.
Now that the summary for people who have no time is done, let’s explain the idea in detail for those who have a bit more time.
Pages:
On TagBay, right now, there are 5 types of pages: e-Bay Search Page, TagBay Tag Search Page, TagBayUser Tag Search Page, Item Page, and Seller Page.
- Search Page: From inside TagBay it is possible to search e-Bay for specific keywords. The user can then add tags to each object that comes up, and store all the added tags at once or store the tags of a single object. The same thing can be done in the Tag Search Page.
- TagBay Tag Search Page: In this page the user gets all the results for a single tag that someone has used. Nothing fancy (for now). Items where the tag only appears as a private tag will not appear here.
- TagBayUser Tag Search Page: In this page the user gets all the results for a single tag that a given user has used. If the user is logged in and is looking at his own tags, the items tagged privately will appear too.
- Item Page: Each object has its own page. From this page any user can see the public tags that other users have given to that object. They can also define their personal tags for the object, choose whether those tags are going to be private, and tag the seller.
- Seller Page: And then there is the seller page, where any user can tag any seller. The use of tags for sellers is still limited, but will be increased in the future.
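The visibility rules described for these pages can be sketched in a few lines of Python. This is only an illustration under assumptions, not TagBay's actual code: the `Tagging` record, the `TagStore` class and all of its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tagging:
    """One user applying one tag to one e-Bay item (hypothetical record)."""
    user: str
    item_id: str
    tag: str
    private: bool = False


@dataclass
class TagStore:
    """In-memory stand-in for the site's tag database (an assumption)."""
    taggings: list = field(default_factory=list)

    def add(self, user, item_id, tag, private=False):
        self.taggings.append(Tagging(user, item_id, tag, private))

    def search_tag(self, tag):
        """Tag Search Page: only items that carry the tag publicly."""
        return sorted({t.item_id for t in self.taggings
                       if t.tag == tag and not t.private})

    def search_user_tag(self, owner, tag, viewer=None):
        """User Tag Search Page: private taggings show only to their owner."""
        return sorted({t.item_id for t in self.taggings
                       if t.user == owner and t.tag == tag
                       and (not t.private or viewer == owner)})


store = TagStore()
store.add("anna", "item-1", "lamp")
store.add("anna", "item-2", "lamp", private=True)
store.add("bob", "item-3", "lamp")

public_hits = store.search_tag("lamp")
seen_by_others = store.search_user_tag("anna", "lamp")
seen_by_owner = store.search_user_tag("anna", "lamp", viewer="anna")
```

Here the privately tagged `item-2` shows up only in `seen_by_owner`, which is the behaviour the two tag search pages describe.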
The natural use of the site
- For a seller or for a shop: A seller might want to use the site to tag all the objects that he is selling, giving each object all the related tags, thus increasing the chances of it being found. We suggest listing the tags in order of importance, as we are soon going to use that order to weigh importance in the search page.
Also, if a person wants to make a cool list of objects, they can tag exactly those objects with a tag they have never used, and then link to the page for this tag in their directory, thus creating their list on the spot. Also sellers will want to tag their objects, and people making searches will tag objects to make lists of objects they want to follow before jumping into a transaction. We think there is more than enough material to generate interesting behaviour. It doesn’t have to be exactly the same emergent behaviour that we are used to seeing. After all we are just exploring the possibilities of social folksonomy.
- A shop: In addition to the possibilities above, a shop that is selling on e-Bay might want to make sure that the shop itself (remember that you can tag sellers, and not only objects) has all the tags related to the merchandise it is selling.
- Someone buying: Our suggestion for someone who wishes to buy on e-Bay would be to first look under the tag search, to see if anybody has already tagged any object that they are interested in. This does not necessarily have to be someone else who is buying; it could also be someone who is selling. Then they can tag the objects they are interested in themselves, to have them in their own list of objects. Then they could go to the e-Bay search page with the necessary keywords, and add the chosen tag to all the interesting objects. At that point a first selection has been made, and all the possible objects have been tagged. At this point, they could choose one of those objects, change its tags to private, and start bidding on it.
- Someone suggesting: And finally, if someone is just trying to suggest some possible objects, he could search e-Bay for those objects, tag them with a unique tag and present the url of the list to whoever is interested.
There are many other ways to use TagBay. In a sense TagBay is a toy, and not a game. And like every good toy it can be used for many different games. We suggest only some of them here. Also, TagBay itself is rapidly evolving. We have tons of stuff we are interested in including, and if you have been reading my blog, you know that my problem is always finding people to code my ideas, more than finding the ideas. And this is why I am so happy with Derek’s work!
Difficulties that we found:
There were a number of issues that came up when we started developing this program.
- Public vs private tags:
Why would someone tag an object if they are interested in buying it? After all, aren’t they making it easier for others to find it, by adding those tags?
This was a serious doubt that we had, and finally we decided to give users the possibility to tag objects privately. Yet there has to be a balance between private tags and public tags, as public tags are necessary to generate the emerging folksonomy that we wish to use. So we decided on a compromise: public tags can be added from the search page, but private tags require you to go to the specific object page. In our view (but we are ready to be proven wrong) someone would go to the search page and tag all the entries he might be interested in. Then choose one, and tag that one privately.
- Limitations due to the temporary nature of the objects
Considering that most objects exist on e-Bay only for a few weeks before being sold, wouldn’t this be too little time to build a tag cloud and let all the cool emergent properties that folksonomy induces appear?
Maybe, but sellers can also tag the objects they are selling, thus giving a fresh start to all the objects. Also, side by side with tagging objects we are giving the possibility to tag sellers. Those tags should eventually survive each transaction and build up an interesting tag cloud.
- I spoke about sellers tagging their own objects, but wouldn’t this invite people to spam your site? After all, wouldn’t it be much better for a seller to add many tags to be present in many searches?
Ah ha! You think tag clouds can be spammed. This is false. Tag clouds cannot be spammed, and no one understands this. And we shall use this site to prove it. We have nothing against spammers; they are absolutely welcome on our site and can spam it as much as they feel like. They can add all the tags they want to each object they sell. It will make ABSOLUTELY NO DIFFERENCE in the search page. Tag clouds are unspammable. And our engine will use tag clouds as its base. Everybody else uses tag sets. And this makes them easily spammable. So, no, we don’t fear spammers. In fact we hope that spammers will come to our TagBay site. They are just people trying to sell their stuff, and we are trying to make sellers meet buyers. Wouldn’t it be bad to single out spammers just because they are spammers?
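The post does not spell out the mechanism behind this claim, so what follows is only one possible reading, not the TagBay engine. It assumes that a tag set records merely whether a tag is attached to an item, while a tag cloud weights each tag by the share of the item's taggers who used it. The function names and the sample data are made up for illustration.

```python
from collections import defaultdict


def tag_set(taggings):
    """Tag set: a tag either is or is not attached to an item."""
    tags = defaultdict(set)
    for _user, item, tag in taggings:
        tags[item].add(tag)
    return tags


def tag_cloud(taggings):
    """Tag cloud: weight = share of an item's taggers who used the tag."""
    users_per_tag = defaultdict(lambda: defaultdict(set))
    users_per_item = defaultdict(set)
    for user, item, tag in taggings:
        users_per_tag[item][tag].add(user)
        users_per_item[item].add(user)
    return {item: {tag: len(users) / len(users_per_item[item])
                   for tag, users in tags.items()}
            for item, tags in users_per_tag.items()}


# (user, item, tag) triples; the seller piles unrelated tags on his socks.
taggings = [
    ("u1", "radio-1", "radio"), ("u2", "radio-1", "radio"),
    ("u3", "radio-1", "radio"),
    ("seller", "sock-1", "radio"), ("seller", "sock-1", "ipod"),
    ("seller", "sock-1", "sock"),
    ("u1", "sock-1", "sock"), ("u2", "sock-1", "sock"),
]

sets = tag_set(taggings)
cloud = tag_cloud(taggings)
```

With a tag set, both items match a search for `radio` equally. With the weighted cloud, `radio-1` scores 1.0 for `radio` while the spammed `sock-1` scores only a third, because a single account can contribute at most one vote per tag however many tags it adds.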
TagBay is obviously still in beta, and there are many things that need to be coded. If you have any idea on how to make it better please do not hesitate to contact me. If you want to make a difference to what the final product will be, now is the time to do it. Also, every suggestion that gets implemented should be listed on a special page with a link to the home page of whoever suggested it.
Some of you might remember my rant when del.icio.us was bought. And some others, who were with me from before, might remember the entries I wrote on tag clouds. Some time later I was contacted by an Italian developer, Fabio Vescarelli, who asked me for some help in developing algorithms to find the distance between users in a del.icio.us-like program. We had an exchange of emails first, and we met in chat some other time. He was building a del.icio.us clone, Smarking. But with some interesting differences. Continue reading Review: Smarking
Finally the time has come. Although I have wanted to do this for a long time, only now did I find the time and the technical knowledge to do it:
I divided the blog.
I separated all the Italian posts from the English ones. I created a new blog at http://it.pietrosperoni.it, and my Italian posts will, from now on, be posted over there. And only over there. Most of the people (3) who read me (5) read either the Italian or the English posts. And I am sure it must have been very confusing to scroll through a page and find some posts in English and some in Italian. Plus I always had the feeling that I could not write too much in one language, or possible readers of the other language would just assume the blog contained no information at all for them, and dismiss it. This in time made me slow down posting, as I could not always follow particular threads that would have involved posting many times in one language.
But now all this has come to an end.
Of course if you want to read entries from both blogs you should add the rss from the Italian blog too. Some topics will remain confined to this blog (like tags, for example), others will remain there (like Italian politics), while others will span both media (like diet, which is already present in both). The wiki in this case should act like glue, creating a space where entries from both are aggregated. Plus, being a wiki, I (and whoever wants to come and play) will use it to keep notes, aggregate extra content, and generally make some pages stand out while others will only show the blog entries, the bookmarks, and the context (i.e. the links from the delicious popular page, and from technorati).
Generally it is not a smart idea to come here every time to see if I have written something. I tend to write when I have something to say, so many days might pass before I say something, and then for some days I might make one or more posts a day. The solution is to add my rss feeds to your feed reader. Bloglines is a good one. I am sure there are better ones. Feel free to suggest them (as I am always looking for ways to improve).
Now let’s get a bit more technical: making this change also meant getting my hands dirty with MySQL Continue reading The Italian blog is born: reasons and technicalities
I think it’s time to present what I have been doing in the last few days. A number of improvements have been added to this web site. In short, I have upgraded to WordPress 2.0. I also moved to the next version of Wikka. Some of you might remember that I offered some money to whoever could write some code to get the tag plugin to generate an rss list. I didn’t, at the time, explain why. I will now.
WordPress 2.0 gives the possibility to create categories on the fly, just by listing them. Essentially this makes categories in WordPress work like tags (or keywords, for academics). But categories in WordPress also have an rss feed connected to them, albeit with some bugs, like linking to the whole blog and not to the particular category. So I spent most of the first of January adding the relevant tags to the entries as categories. So now I have no need of an rss feed for the tag page, as the tag page has been replaced by the category pages.
You will also remember that I installed Wikka, the wiki engine. Now Wikka is not only open source but also easy source. It is so simple that even I could hack the code. That is very simple! So I changed the code and added the possibility of having default pages. In short, before, if you were to look for the url http://wiki.pietrosperoni.it/someunexistingpage and there was no page in the wiki called “someunexistingpage”, the wiki would ask you to edit the page, and you would be redirected to http://wiki.pietrosperoni.it/someunexistingpage/edit.
Now it creates the page someunexistingpage on the fly with the default content. And the default content I chose was the 4 rss feeds:
- the feed from my blog from the category: someunexistingpage
- the feed from my technorati from the tag: someunexistingpage
- the feed from my delicious bookmarks from the tag…
- and the feed from the popular pages in delicious, always from that tag
So for each tag I now have a wiki page with the most relevant rss appearing there. But being a wiki page I also can add other rss feeds, write definitions, comments, todo lists. In short modify it as I see fit.
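The default-page behaviour can be pictured with a small Python sketch. It is not the Wikka hack itself (Wikka is written in PHP and its code is not shown here); the `example.org` feed addresses, the `FEED_TEMPLATES` table and the `get_page` function are all placeholders invented for illustration.

```python
# Hypothetical URL templates: the real feed addresses are not given here.
FEED_TEMPLATES = {
    "blog category": "https://blog.example.org/category/{tag}/feed",
    "technorati tag": "https://technorati.example.org/tag/{tag}/feed",
    "my delicious bookmarks": "https://delicious.example.org/rss/me/{tag}",
    "delicious popular": "https://delicious.example.org/rss/popular/{tag}",
}


def get_page(name, pages):
    """Return the stored page, creating it with default content if missing."""
    if name not in pages:
        lines = [f"Feeds for '{name}':"]
        lines += [f"- {label}: {template.format(tag=name)}"
                  for label, template in FEED_TEMPLATES.items()]
        # The default content is written once and stored like any other page.
        pages[name] = "\n".join(lines)
    return pages[name]


wiki = {"idea": "My hand-written notes about ideas."}
existing = get_page("idea", wiki)
created = get_page("folksonomy", wiki)
```

An existing page such as `idea` is returned untouched, while a missing one such as `folksonomy` is created from the four feed templates and can then be edited freely.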
Still it is not perfect. Since it writes the page the first time, from that moment on the page is set. I can delete it, but I cannot, for example, change the default content for all the pages that only contain the default content. I tried to write a plugin to do that, but I failed when I realised that I needed to write a plugin, {{defaultpage}}, which would have had to activate other plugins: {{rss}}, for example. Something that I did not know how to do.
Also, having the same string work for delicious (as a tag), WordPress (as a category name) and Wikka (as a page name) puts some heavy constraints on what the string might contain. For example I am already running aground on all the tags that contain a dot (aaargh, del.icio.us!) or an accented letter (aargh, dear Italian).
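A quick way to see which tags are at risk is to check them against a conservative character set. The sketch below assumes that plain ASCII letters, digits, hyphens and underscores are safe in all three systems; that safe set is my assumption, not something any of the three tools documents, and `is_portable` and `make_portable` are hypothetical helpers.

```python
import unicodedata

SAFE_EXTRA = "-_"  # assumed to be harmless in all three systems


def is_portable(tag):
    """True if the tag uses only ASCII letters, digits, '-' or '_'."""
    return bool(tag) and all(
        ch.isascii() and (ch.isalnum() or ch in SAFE_EXTRA) for ch in tag)


def make_portable(tag):
    """Strip accents and drop every other unsafe character."""
    decomposed = unicodedata.normalize("NFKD", tag)
    return "".join(ch for ch in decomposed
                   if ch.isascii() and (ch.isalnum() or ch in SAFE_EXTRA))


dotted = make_portable("del.icio.us")
accented = make_portable("società")
```

The normalisation is lossy: `del.icio.us` and `delicious` end up as the same string, so it only helps if such collisions are acceptable.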
If you want to see what the pages look like just see the idea page. But any link from the right column (provided it has no dots or accents inside) will work fine.
I wanted to start this entry by congratulating Joshua on the deal. But I won’t.
The facts: the web site delicious has been sold to Yahoo!.
I personally don’t dislike Yahoo. I positively hate them. For having eaten and raped startup websites, one after the other. For being totally obscure in terms of contact with the public. For refusing to answer e-mails. For being so big that they can just claim: “we are too big to answer your e-mails”. We can ignore you, and trample on you; we will not even notice. I have had something personal against them since the moment they deleted my web page back in 2003, and with it all the material inside, which included some preprints of academic papers I wrote, some of which I only had in a single copy. I hate Yahoo because they don’t get what web 2.0 is and they try hard to copy it. And when they fail in copying it, they try to buy it. As if you could buy a community. As if you could own a community. As if you could buy a language and the agreement to keep the data open.
So maybe I should congratulate Joshua for having sold something which had no price for some real and tangible money. But I still will not. Because delicious was not only a community. It was also an experiment. A place for us geeks to meet and discuss. A place where we were changing the web. Yes, WE were changing the web through our ideas. And Joshua was good at picking the best ideas. Inviting us to give more. Now do you really think this will continue under Yahoo!’s reign? Forget it! At least for my part.
But this is not the reason why I shall not congratulate Joshua. No, I shall not congratulate him because he could have made it. Because delicious was clearly, and recognisedly, the best bookmarking service on the web. And with the whole community behind it giving suggestions it was prosperous and growing. Because people pleaded with him to start charging, or put up advertisements, or do something, but let us pay for it. Because we knew. We knew he could not possibly pay for it all by himself. And we were happy to join in. We were happy to pay. How many services are you aware of where the customers ask to pay for them? Few indeed!
Of all the people who have commented on the sale, I feel the person who best captures my feelings is Ronald Johnson, who comments:
Some lessons to learn here:
- Never trust a startup service to store your important data no matter how the owner seems honest to you. Sooner or later he/she will run away with the money and YOUR data.
- Never trust a corporate entity to continue storing your important data. Now that they stole your data, you are subjected to the user-specific ads and they abuse you no matter how strong you cry.
- Never act like a fanboy on services you don’t trust. Instead, invest your time and knowledge on open source projects to ensure your efforts are never sold to third party evils.
I have to add that one of the things I found most disturbing was the form in which Joshua announced it. Here it is, with the words that I found most disturbing in evidence:
We’re proud to announce that del.icio.us has joined the Yahoo! family. Together we’ll continue to improve how people discover, remember and share on the Internet, with a big emphasis on the power of community. We’re excited to be working with the Yahoo! Search team – they definitely get social systems and their potential to change the web. (We’re also excited to be joining our fraternal twin Flickr!)
We want to thank everyone who has helped us along the way – our employees, our great investors and advisors, and especially our users. We still want to get your feedback, and we look forward to bringing you new features and more servers in the future.
I look forward to continuing my vision of social and community memory, and taking it to the next level with the del.icio.us community and Yahoo!
The post stinks of corporate declaration, and has already sealed the destiny of delicious as just another piece in the Yahoo puzzle. A more honest post would have spoken of the money that changed hands. Of how they made an offer that could not be refused. Of the risks of the passage. It would still have made people upset, but we might have felt that it was coming from Joshua and not through Joshua, from the Yahoo P.R. office.
All this calls for some actions, for I really don’t want to support Yahoo; and if all I can do is passive resistance, then that’s what I shall do!
- I shall look for a good alternative to Yahoo, ehm, I mean del.icio.us. The folks at slashdot suggest Simpy.
- I want to look better at microformats, and in particular at rel-tag. It might be possible to install a small bookmarking service on site, and then have it send standard info to the community at large. In this way I would not be vulnerable anymore to the next Yahoo! acquisition.
- While I am there I should also look for ways to get out of Flickr (which has been acquired by Y! too). Don’t miss the wonderful description of the mess Yahoo is making of the Flickr signup page. There I also heard that 23hq might be a good alternative. Still I would prefer something on site that speaks a common language.
- I have to decide what to do with the Delicious Mind Map Maker. You see, I really don’t want to support Yahoo. Not even indirectly. So I am tempted to take it offline. But if I find a better service, and it is bound to be there now that other geeks will start migrating to come out of the belly of the beast, I might just modify it to support this other service. Nothing has been decided yet.
- And then I might instead develop my own service or help someone else develop theirs, using the tag cloud ideas I spoke about earlier.
- And last but not least, there is the possibility that I might develop the famous search utility I have been speaking about. Up to now, apart from the constraints on time, what really stopped me were ethical reasons. Joshua asked people not to screenscrape delicious, so I felt I should abide by his request. I surely did not want to tax the servers of a poor hacker. But now the ‘poor’ hacker has sold the hen that lays the golden eggs, and walked away with tons of cash. And I am sure Yahoo will not even notice if I start screenscraping them. At least until they start putting up all sorts of advertisements which might make it too hard to do. Hmm, active resistance might have some attraction!
So I probably should congratulate Joshua. He sold a bunch of quite simple and useless code to Yahoo. He held out to them the prospect of having a great and creative community. Now all he has to do is walk away with the cash, start a delicious clone and we will all be more than happy to join him in the new adventure. Hell! We will not even ask for our part of the booty. Although we might ask for a dinner in a good restaurant.
And I think that’s just fair.
ADDENDUM:
After reading all the comments on slashdot I found a link to a page comparing most bookmarking services. It is a bit old, so not fully up to date. Yet it gives a good overview and can be used for some good pre-screening. Also the maintainer of Simpy, Otis, wrote a long comment explaining how he might even adapt the code to make the mindmap work for that too!
I think the time has come to write my third, and hopefully last, contribution to the topic of tag clouds.
I have been hearing a lot of talk about how users should not use too many tags when bookmarking a URL. I am also the maintainer of the mindmap maker, and I often look at some of the maps generated (available to everybody). There are a number of people who tend to use an average of between one and two tags per URL. Their maps are often very ordered. No clustering, no hierarchy. (Forgive me if I don’t put a link to such a map, but since I am going to bash this way of using delicious, I’d rather bash a method than a specific human being. Just go to the list of maps and open a couple; odds are one of them will be of the type I am describing). This way of using delicious uses tags as folders, just with the modification that every now and then you can put a URL in more than one folder at the same time. A bit like a big bookstore that might carry several copies of the same book, and store them in more than one place (and the Tao Te Ching ends up in New Age -God knows why- and in Religion).
Of course tags tend not to fit exactly. My Tag Clouds and Cultural Change will be under Tags or Folksonomy or Sociology… Whatever you choose you probably will not put it under Ajax. And yet most of the analysis was done studying the spreading of the term Ajax.
Let’s make a few simple calculations. Continue reading Tag Clouds are hard to Spam
Note: This entry is connected also to a mindmap. Some people were having problems in opening the page because of that. As such the mindmap has been stored in a separate page, and can be viewed from here.
Introduction
As correctly pointed out by Jeffrey Zeldman, tag clouds are becoming more and more popular. Yet I keep seeing services which should be using tag clouds that keep on using tag sets. It is not just a problem of programming a tool which can only support tag sets, but also of programming tools which might in principle produce tag clouds, but in which users are not invited to use a tag if it already exists, and as such don’t generate a tag cloud.
Examples of the first type of tool are Flickr, 43things, consuMating, tagsurf *; an example of the second is the tagged version of the BBC *. In all those cases a tag set is used, where a tag cloud would be more appropriate. Some of the differences between a tag cloud and a tag set were explained in Vanderwal.net: Explaining and Showing Broad and Narrow Folksonomies. Let’s see them again, and see some consequences of those differences, which should clarify when it is better to use one tool and when it is better to use the other. Continue reading On Tag Clouds, Metric, Tag Sets and Power Laws
Some things are bound to happen. And they tend to happen at the right time. We have been using tags for years now, but the momentum has built up, day after day. We keep seeing more and more computer programs using them. Starting from del.icio.us and flickr. Then 43things.com, consumating.com, tagsurf.com and all the clones of the above (BTW if anybody can find me a small open source server program that emulates Flickr for personal use, I would be grateful). And of course technorati tags, and GutenTag, which gives rss feeds for technorati tags.
But something was missing. Something that some people might have noticed. The news was not playing with tags. News was still presented in the old top down way: politics, economics, international…
On Google News, as well as CNN. On Yahoo News, as on BBC.
But finally something is starting to move over there too.
Two services, pretty much at the same time were presented: Yahoo News with tags and BBC with tags.
But there are some serious differences between the two services. Yahoo content is automatically indexed by a program, which imposes the tags according to what keywords are found in the text. As such Yahoo tags is a top-down keyword classification of stories.
Instead (and here you can see the revolutionary spirit blowing through English news services), the BBC program is a truly bottom-up grassroots program. A program where everybody can add any tag to any article.
The difference is not a minor one, as in the first case it is the user who has to adapt to the world view of Yahoo, while in the second it is the BBC that includes the user’s world view in its own wider one. In a sense it is a case of Tagsonomy vs. Folksonomy, or narrow folksonomy vs. broad folksonomy.
Of course both programs are still in their first days. Full of bugs, and of suggestions from us on how to make them better, smoother, and nearer to our personal desires.
Of course having anybody able to add any tag to a copy of the BBC content is full of political dangers. What if stories about important politicians start to be tagged as ‘dictator’ or ‘wanker’? This is in fact inevitable, but politicians would do well to use this as an indication of their popularity, rather than as something to be changed.
At the moment anybody can add a tag on the BBC news page by logging in as ‘guest’/‘guest’. And already we have some people who have tagged some stories as ‘wanker’. But if we go to delicious we see that nearly no one has used such an epithet.
Why is that? My personal position is that people are more careful when tagging something for their own personal use. On delicious everybody has an account. And although you could have as many accounts as you like, they cost. They cost time and memory to set up. So we all tend to have just the minimum number of accounts needed. But on the BBC, at the moment, only BBC people are allowed to have their own account. We normal human beings can only be guests. And as such we might feel relieved of responsibility for what we write. So I think that, although the experiment is great, it will only work properly when everybody can set up his own account, and search his account, or the account of another, well defined person.
Of course this also opens up all sorts of extra possibilities. After all, if anybody can tag any article with his own tags, then for each article a set of tags will be defined. What if I want to receive (maybe on my mobile) all the articles tagged with a certain keyword? The possibilities are really endless.
And to look at those possibilities the BBC has started a whole new project, called BBC Backstage, where geeks are invited to collaborate with the staff of the BBC to develop the API that permits everybody to reuse the BBC material. Cross this with the fact that much of this material is copyrighted with a copyleft copyright (copygotit?), and you see how the whole situation can positively explode.
Imagine, much of the material from BBC, offered for free, in the way wanted by the best geeks and hackers, to produce information in any noncommercial way they please.
Already many ideas are flowing: an RSS feed for the results of sports matches. Crossing google maps with BBC News.
Possibility to have BBC news accepting trackbacks.
And many many others.
All this would mingle the BBC with the common people. Think: all the news, mixed and remixed. Commented, trackbacked. Until you can read an article from BBC news from any device (through rss), in any format you want (through your rss reader), filtered any way you want (through folksonomy), and seeing the world’s response to that article (through trackbacks and comments).
Thank you BBC
(and no, I am not paid by BBC)
Thanks also Wired for some inspiration.
This evening I played with calendars. In particular with the calendar published by Mozilla: Sunbird. It is pretty amazing. Here too they managed to adopt an open standard with which anybody can write his own calendar. The program then lets you save it into a file or publish it on the web. You can also load calendars from other people, and they will appear superimposed on your events, so that you can see your events as well as the events from the other calendars.
Think about it, it is extremely easy, and extremely powerful. I can just write down the dates that are important for me, and people can use the info to arrange meetings, set up an ambush, or find out when the campervan is unattended. Infinite power.
More, it is possible to set up calendars for particular types of events. For example we could, at work, set up a calendar for all the conferences on artificial life, artificial chemistry, complex systems, etc. Or even a separate calendar for each type, and each person could just subscribe to the calendars that he is interested in.
The calendar is still very limited in many ways. For example events can be assigned only to one category. The whole idea of tags and folksonomy has here yet to come. For example eventually people should be able to set each event in multiple categories, or even suggest categories for events of others.
In any case, my calendar is at http://www.pietrosperoni.it/calendar/agenda.ics. If you have firefox with the calendar plugin inserted you can just see it. If not you probably need to wait until I integrate it with my blog, which will take quite some centuries.
Update: Another thing that is definitely missing is an integration between this software and smart phone technologies. What’s the point of having a cool phone that can connect to the net, so you can be everywhere wherever you are, and having that phone hold all your appointments, if you cannot let this phone speak, on the net, with your calendar? It does not seem such a hard thing to obtain, although I would not know where to start, so I would predict that within 6 months, no, no, 4 months, a program should be around that lets me integrate the 2 things. If it isn’t already there.
As I posted the previous entry, I went to technorati to check if it was being pulled. And what I discovered was that technorati was only pulling the first tag in the list.
I make quite an effort to add all the tags that I think might be relevant, both to improve visibility and to better categorise the content. I like to make a copy of the same tags in my p.s.blog delicious account, and then see the whole thing as a mindmap. But for the mindmap to really work it is necessary that if two entries share some content they should also share at least one tag. So I use many tags. And the mindmap comes out really nice.
Not only this, but I feel that each post belongs to multiple tags, and should be present in multiple pages. For example this entry belongs to both the tag ‘technorati’, and the tag ‘mindmap’, ‘delicious’ etc.
Investigating a bit further I discovered this post, where a similar problem was presented. In that case technorati was pulling the information from the list of categories in the rss feed. Now the problem is that, in wordpress (other tag!), the list of categories is defined before, while the tags are defined after. And although this might seem like a minor problem, it actually means that often we don’t add all the categories that we need. In a sense it should be possible to just ask that wordpress uses tags as categories.
And then post the tags as:
<category>firsttagname</category>
<category>secondtagname</category>
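As a rough illustration of that output, here is a minimal Python sketch that writes one <category> element per tag into a feed item. It is not WordPress code; the function name `item_with_tags` and the sample title are invented for the example.

```python
import xml.etree.ElementTree as ET


def item_with_tags(title, tags):
    """Build an RSS <item> that carries one <category> element per tag.

    Hypothetical helper: it only illustrates the desired feed output,
    it is not how WordPress builds its feed.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    for tag in tags:
        ET.SubElement(item, "category").text = tag
    return ET.tostring(item, encoding="unicode")


print(item_with_tags("My post", ["firsttagname", "secondtagname"]))
```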
So the end result of this is:
my posts are not appearing in the technorati page where they should: tag, technorati…;
my posts are appearing in the technorati pages where they shouldn’t: General, English…;
And I haven’t got a clue how to fix it.
Pietro
UPDATE:
I did send a mail to technorati, and I got this answer:
Hi Pietro,
Your tags must occur within the boundaries of a post, a div of class of storycontent in your case. Technorati should treat your Dublin Core subjects in your Atom feed as tags.
SECOND UPDATE:
After various tests, I realized that technorati does not parse the html, and I understood what the mail meant with “Technorati should treat your Dublin Core subjects in your Atom feed as tags”. Since the author of the plugin explained that he is not going to be able to fix it for a couple more months, in the meantime I downloaded another plugin: Technotag. That gives me the possibility to add <tag>tagname</tag>, and that makes a tag automagically. Let’s hope that this works!
THIRD UPDATE: it works. And as I keep on making small hacks to the plugins that I use, I slowly learn how they work
FOURTH UPDATE: correction, it only worked for the first tag. But I hacked the code a bit and now it works fine on all of them. I shall send an email to the author, to pass him the change.
Before, it would make a tag for every <tag>tagname</tag>, but all the tags would point to the same address: the one generated by the first tag. Corrected. The new code is available here.
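The plugin itself is written in PHP and its code is not reproduced here, so the following is only a Python sketch of the idea behind the fix: the address has to be computed inside the per-match callback, so that each tag gets its own link. The function name `link_tags` and the `http://technorati.com/tag/<name>` address layout are assumptions made for the example.

```python
import re
from urllib.parse import quote

# Matches <tag>name</tag>; non-greedy so several tags on one line stay separate.
TAG_PATTERN = re.compile(r"<tag>(.*?)</tag>")


def link_tags(text):
    """Replace every <tag>name</tag> with a link built from that tag's own name."""

    def make_link(match):
        name = match.group(1)
        # The address is built per match, never reused from the first tag.
        url = "http://technorati.com/tag/" + quote(name)  # assumed URL layout
        return '<a href="%s" rel="tag">%s</a>' % (url, name)

    return TAG_PATTERN.sub(make_link, text)


print(link_tags("<tag>mindmap</tag> <tag>delicious</tag>"))
```

Computing the address once, outside the callback, reproduces the bug described above: every link ends up pointing at the first tag.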
Rss is somehow one of the best ideas. You can have your content, stripped of form BS being redirected all around. This gives a one to many structure. Now we need the opposite. We need to be able to pull the content from many sites in the same place, and check it. A many to one structure.
Most of you will say, “But we already have that, it’s called an aggregator. Just look at bloglines.”
Yes, and no, that’s part of it, but it’s not the whole story. We need to have a page that posts all the content from everywhere in a single page.
And again I can hear: “But we have that too: it’s called a technorati tag”.
Again I will repeat: Yes, and no, that’s part of it, but it’s not the whole story. We need to pull the information from the technorati pages to our aggregator.
This is the idea: we need an rss feed of a technorati tag. Just as we can get the rss feed of a del.icio.us tag, we need to have it for all the blogs. The time has passed when you could add to your friend list ALL the blogs that might have information of interest. We need to be able to add that rss to our bloglines.
So, either technorati will start releasing the rss, or I predict that:
- a) other services will start competing with technorati offering that info
- b) anonymous hackers will start scraping the info from technorati to offer this very valuable information.
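To make the many-to-one idea concrete, here is a minimal Python sketch that takes several feeds and keeps only the items carrying a given tag as a <category>. Everything in it is made up for illustration (the two tiny feeds, the example.org addresses, the function name `items_with_tag`); it is not a description of how technorati or bloglines work.

```python
import xml.etree.ElementTree as ET


def items_with_tag(feed_xml_list, tag):
    """Collect (title, link) pairs of the items that list `tag` as a <category>."""
    found = []
    for feed_xml in feed_xml_list:
        root = ET.fromstring(feed_xml)
        for item in root.iter("item"):
            categories = [c.text for c in item.findall("category")]
            if tag in categories:
                found.append((item.findtext("title"), item.findtext("link")))
    return found


# Two invented feeds standing in for two different blogs.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Post one</title><link>http://a.example.org/1</link><category>tags</category></item>
<item><title>Post two</title><link>http://a.example.org/2</link><category>cooking</category></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Post three</title><link>http://b.example.org/3</link><category>tags</category><category>semanticweb</category></item>
</channel></rss>"""

for title, link in items_with_tag([FEED_A, FEED_B], "tags"):
    print(title, link)
```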
See also:semanticweb, tags
This is going to be big. It’s called tagsurf. When we were setting up the taoist discussion board, at Tao Bums, I was looking for a board that permitted me to tag individual messages with different tags. The reason is that over there we are now a group of friendly people, and every thread starts with a topic but often touches many separate ones. The board had to be in PHP for reasons known only to the web master, but that we all were happy to follow. So we started looking around, but no board with a tagging facility came up. Nothing. I had to admit that the idea was quite new, and I had not seen any such board around in any case. And then we decided on phpBB, which, being open source, would have new versions with any new cool geeky thing appearing every so often. Well. Now I have finally found the first tag-based discussion board. It’s called tagsurf. And it is very cool. You get to write messages and tag them. As a tag you can use any word, of any length. Now, the result of this is that you can tag things with the url of something. So immediately a series of utilities started appearing:
People (the first one I saw doing it was Russell Beattie) added a tagsurf button. In short, if you click on that button you get all the comments on tagsurf that use your permalink as a tag. In a sense it is outsourcing the discussion board.
Yes, I added it too, it is down near the little technorati bubble, and I just needed to add:
<a href="http://tagsurf.com/post?tag=<?php the_permalink() ?>">Tagsurf this</a>
in the template.
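For anyone building the same link outside a WordPress template, here is a small Python sketch. The only thing taken from the template line above is the `http://tagsurf.com/post?tag=` address; the function name `tagsurf_link` is invented, and percent-encoding the permalink is an assumption (the template passes it as is, and what tagsurf expects is not stated in this post).

```python
from urllib.parse import quote


def tagsurf_link(permalink):
    """Return the 'Tagsurf this' anchor for one post.

    The permalink is percent-encoded so that its own '?', '&' and '='
    cannot be mistaken for parameters of the tagsurf address.
    Whether tagsurf wants it encoded is an assumption.
    """
    url = "http://tagsurf.com/post?tag=" + quote(permalink, safe="")
    return '<a href="%s">Tagsurf this</a>' % url


print(tagsurf_link("http://example.com/?p=12"))
```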
I also went back to see how tagsurf was behaving in del.icio.us. It seems that, as often happens in other cases, the meme is 6 days old. At the beginning few people noticed it, and now it is starting to explode. I too found out because of the delicious discussion board, which I would suggest to anybody who is interested in delicious OR folksonomy.
I think this tagsurf will and can have great impact. They already have some API defined.
I also had a look at their privacy policy. It seemed simple and clear. Yet now I cannot find it anymore. I suspect that they might be working on it right now.
I also made a small bookmarklet to post an entry on tagsurf about a specific page. Just drag the word bookmarklet onto the bar and it should work. Of course for it to work you have to be logged in to tagsurf.
Great points:
- trackback: every post is an entry point for trackback. In other words anything you say can receive trackbacks from anything else. You say something here, and it gets people in the blogosphere chatting. And you can follow their conversation. This is something very important that was missing in all the bulletin boards I have been using. In a sense many discussion boards are only looking in. This one is also looking out.
- trackback 2: Every post that you make can send trackbacks to anything you want. The software to do this automagically with respect to the other posts inside tagsurf is still missing, but I can’t imagine it not appearing very soon.
- possibility to mix different threads: since each post gets as many tags as the poster wants, it is quite easy for people to join different threads of discussion.
Problems I might see coming.
- Spam, spam, spam: I receive about 30 spam trackbacks a day. And they get filtered by cool programs and finally deleted by me. Yet those programs need me to make the final judgement. Who will make the judgement for all the trackbacks in all those posts? Will the user have to? Can someone close the trackback from his own posts? I see many problems and much discussion here.
- copyright: This is another big one. Let’s say that I post a cool entry in tagsurf, who gets the copyright of it? It might be important. Imagine that someone takes it, and wants to add some extra tags. But adding tags is not allowed at the moment. So he copies the post and just reposts it with the extra tags. Do I have a say on it?
All together I think this is a wonderful piece of new technology. When technorati started its tag pages I wasn’t very impressed, but this, I think, will have some huge effects. And still I can’t see all the implications.
ADDENDUM: just as I ended this post I read fully the great and very interesting post from Russell Beattie. And I found that he had made exactly the same bookmarklet. Oops. Well, I hope he will not sue me, I haven’t copied his code. I just reinvented the wheel.
ADDENDUM to the ADDENDUM: As I was looking at all the people who were commenting on the thread on Russell’s post I noticed another post with the same bookmarklet. And I thought I would have been the first. At least I get to see if the trackback to posts over there actually works.
ADDENDUM to the ADDENDUM to the ADDENDUM: trackback does not seem to work, or the comment is being held back for security reasons.
Did some more debugging. Now any unicode the user used in the tags should be ok. Still there is a big brick wall in terms of memory usage. And some users are not having any luck simply because their map takes so many resources that it goes beyond the ISP limit. I could work hard and distribute the whole calculation so that all variables are stored on disk, so the memory limit would never be hit, but honestly, it is not my top priority at the moment. I am here to help those users run the program on their own machine. And eventually we might solve that problem too. So, what are my top priorities:
- Add an rss feed. I would like to add an rss feed that gets updated every time a new map is done. It wouldn’t just tell the name but all sorts of data, like the list of the Main Tags. So the users could see if they might be interested in checking the new buddy’s map.
- Insert a way for users to delete their own maps. If I am going to go into the hosting business, I am not going to be one of those hosts where you can add info, but you cannot delete it. I am aware that users’ info ultimately is adding value to my site; as such I want users to be happy in having their map here. Not forced.
- Insert a general log of all the maps that are being started, and ended. Right now such a log is absent, and there are about 200 maps completed, and more than twice as many that have been started. So about 300 have been dropped. I bet many of those users would have success, if they tried right now, after those 3 debugging sessions. Still I want something that tells me: Warning warning warning, map dropped. Bug? OutOfMemoryError?
- Add the number of posts inside a tag. Just obvious
- Probably add some of the MainTags as keywords to each single map. The problem is: which? All is too much. All the ones that contain more than x posts, y subtags is not flexible enough. The solution should be: if a MainTag is part of a ParetoFront of Delicious then the keyword should be there. The fact that this means writing a whole program that stores in a database the latest ParetoFront is just a small detail. And before you ask: no, I will not need anybody’s password to do that, and the data will all be public.
- Add a bookmarklet to save a map in your own delicious, with the keywords as tags
- Change the map, so that it can run on a single tag. Useful for big complex maps like mine, and others.
- Make it change the Title of the Map Page, to show the owner of the map. Useful if people want to add the maps to their delicious pages.
And then there are some tests I would like to make, like:
- Check if it would make sense to show all the tags that appear with a single tag, and not the subtags.
Is there more? If you can think of other modifications, please drop a line in the comment section. Also if you tried to run the map maker and it is not giving you satisfaction let me know. I’ll whip it appropriately. HarHarHar. (I’ve always wanted to say that!)
The first person to use the tool (presented here) was Mike Harris, for his delicious entries. Note immediately how the time needed to compute the map has little to do with the number of posts, and much to do with the number of tags.
- WCityMike: 2029 Posts, 87 Tags and 81 Main Tags, calculated in 86.85 seconds.
- p.s.blog: 21 Posts, 43 Tags and 17 Main Tags, calculated in 0.23 seconds.
- pietrosperoni: 372 Posts, 400 Tags and 152 Main Tags, calculated in 377.40 seconds.
The Main Tags are the tags that will appear as main branches. And we can also see a difference between Mike’s map and mine. In mine about 0.4 of the tags are Main Tags, while for Mike it is nearer 0.9. This is probably due to the fact that I tend to apply many tags to each post (four or five are common, but sometimes more), while Mike tends to use one or two.
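The fractions quoted above follow directly from the three lines of figures. A few lines of Python reproduce them; the function name `main_tag_ratio` and the dictionary layout are made up for the example, while the numbers are the ones listed above.

```python
def main_tag_ratio(tags, main_tags):
    """Fraction of a user's tags that end up as main branches of the map."""
    return main_tags / tags


# name: (posts, tags, main tags, seconds), copied from the list above
maps = {
    "WCityMike": (2029, 87, 81, 86.85),
    "p.s.blog": (21, 43, 17, 0.23),
    "pietrosperoni": (372, 400, 152, 377.40),
}

for name, (posts, tags, main_tags, seconds) in maps.items():
    print(name, round(main_tag_ratio(tags, main_tags), 2))
```

This gives about 0.93 for WCityMike, 0.4 for p.s.blog and 0.38 for pietrosperoni.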
If we look at the map we can also see that there are fewer clusters than in my map. Note for example how in the small blog map nearly everything is clustered… and those are only 21 posts and 17 Main Tags.
If we look at the source code we can see that, on the 9th line some constants are set:
distances_constant= [0.333333,0.4,0.5,1]
Those constants define the minimum distance for entries to be in the same cluster.
The 1/3 means that if one third of the posts of two tags are in common then the tags should be in the same cluster. And so on. Tags that are farther apart, but have a path of tags between them such that you can go from one to the next without ever going above that distance, are in the same cluster, too. This process is referred to in the log as making the distances tables transitive.
Those numbers have been specifically tweaked for my delicious posts (and generally my style of bookmarking). It seems obvious that for Mike the numbers should be different. Since it is less common for his posts to share a tag, probably the numbers should be lower. Something like:
distances_constant= [0.1,0.333333,0.25,0.4,1]
The last 1 is just to make sure that tags that are synonyms are shown together.
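To see why the choice of constants matters, here is a minimal Python sketch that groups tags at one threshold at a time. It is not taken from the Delimind source: the function name `clusters_at`, the toy similarity values and the use of `>=` for the comparison are all assumptions. Chains of tags are followed automatically, so two tags land in the same cluster as soon as a path of close-enough tags connects them.

```python
def clusters_at(similarity, tags, threshold):
    """Group tags linked by a chain of pairs whose similarity is >= threshold."""
    parent = {t: t for t in tags}

    def find(t):
        # Follow parent pointers up to the representative of t's group.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), s in similarity.items():
        if s >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for t in tags:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


# Invented values, only to show the effect of the threshold.
tags = ["delicious", "del.icio.us", "folksonomy", "python"]
similarity = {
    ("delicious", "del.icio.us"): 1.0,   # synonyms: exactly the same links
    ("delicious", "folksonomy"): 0.35,
    ("folksonomy", "python"): 0.12,
}

for threshold in [1, 0.333333, 0.1]:
    print(threshold, clusters_at(similarity, tags, threshold))
```

With these toy numbers, a threshold of 1 only joins the two synonyms, 0.333333 also pulls in folksonomy, and 0.1 puts everything in one cluster. That is the reason a user whose posts rarely share tags needs lower constants to get any clusters at all.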
I think eventually I will modify the program so that it is possible to insert your own constants from outside. But for now I am just grateful to Mike for giving me the material to understand better how to enhance the program.
I went on programming my favourite Python program: Delimind.
In short: Made a new release of the Deli Mind program. Here is the source code (just remember to change it from a .txt to a .py). Now similar tags are clustered together.
- Here is what it looks like.
- Here is what the previous version looked like.
- The original from Brownhen (may he live long and prosper) used to be here, although now it is missing.
All on the same data. Mine, now.
Go and enjoy.
(Later addition: while the program works well for small databases of links, like mine at the time in which I wrote this entry, it doesn’t scale well on size. For this reason it crashes for most of the people who try to use it with more than 1000 bookmarks. For this reason I was forced to change the link on the cluster example to a database with fewer nodes.)
Now the technical stuff for those who have a bit more patience.
Tags are not all the same, some are more similar than others. So, for example, the tags “September11” and “GeorgeBush” have more links in common than “GeorgeBush” and “intelligence”. The idea behind this version of DeliMind was to cluster tags that had links in common. The catch is that distance is generally not a transitive property (if I am near to you, and you are near to Jim, I am not necessarily that near to Jim), while clustering is (if you and I are in the same cluster, and you and Jim are in the same cluster, then Jim and I have to be in the same cluster… unless people belong to different clusters, but that’s a complication).
So I started by making a matrix of relations among tags (all_dict). Each tag, with respect to each other tag, could be either:
- One contained in the other
- Identical
- Disjointed
- With # bookmarks in common
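A minimal sketch of how one cell of such a matrix could be computed, treating each tag as the set of its links. The real layout of all_dict is not shown in this post, so the function name `relation` and the returned labels are invented for the example.

```python
def relation(links_a, links_b):
    """Classify how the links of tag A relate to the links of tag B."""
    a, b = set(links_a), set(links_b)
    if a == b:
        return "identical"
    if a < b:           # proper subset: every link of A is also in B
        return "A inside B"
    if b < a:
        return "B inside A"
    common = a & b
    if not common:
        return "disjoint"
    return "%d in common" % len(common)


print(relation({"u1", "u2"}, {"u1", "u2", "u3"}))
print(relation({"u1", "u2"}, {"u2", "u3"}))
```

The equality test has to come first: two identical sets are also contained in each other, so checking containment with `<=` before equality would misfile synonyms as sub tags.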
Then, according to the number of links in each of the two tags and the number of links in common, I invented a measure of similarity. If #A is the number of links in tag A, #B is the number of links in tag B, and #AB is the number of links in common, then the relative similarity (SAB) will be:
SAB= sqrt((#AB/#A)*(#AB/#B))
I actually played with various measures:
SAB= ((#AB/#A)+(#AB/#B))/2
SAB= Max(#AB/#A,#AB/#B)
They all went from 0 to 1, and were quite similar… (I am not going to discuss the relative properties)
But the first one just seemed the one that made the most sense, and in the end the resulting map was the one closest to my personal intuition of what should be in what cluster.
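The three measures are easy to compare side by side. This is a direct Python transcription of the formulas above, with `n_a`, `n_b` and `n_ab` standing for #A, #B and #AB; the function names are made up, and the counts are assumed to be non-zero.

```python
from math import sqrt


def sim_geometric(n_a, n_b, n_ab):
    """Geometric mean of the two overlap fractions (the measure finally chosen)."""
    return sqrt((n_ab / n_a) * (n_ab / n_b))


def sim_average(n_a, n_b, n_ab):
    """Arithmetic mean of the two overlap fractions."""
    return ((n_ab / n_a) + (n_ab / n_b)) / 2


def sim_max(n_a, n_b, n_ab):
    """Largest of the two overlap fractions."""
    return max(n_ab / n_a, n_ab / n_b)


# Example: tag A has 4 links, tag B has 9, and 3 links are shared.
for measure in (sim_geometric, sim_average, sim_max):
    print(measure.__name__, round(measure(4, 9, 3), 3))
```

For 4, 9 and 3 the values are 0.5, about 0.542 and 0.75. All three give 1 when the tags are identical and 0 when they share nothing, and they differ most when one tag is much bigger than the other.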
Once the similarity matrix was done I started studying the clusters. Generally for each triplet of tags A, B, C I would modify
SAC:=min (previous SAC, max (SAB, SBC))
And I would continue going through all possible triplets, and then start again from the beginning, until no new changes were happening.
Why? The idea is that the similarity between two tags measures how easy it is to jump from one to the other. Visualise each tag as an island, and then you have an animal that can jump from one island to the other. But it can only jump up to a certain distance. So if it can find a succession of tags between two tags, A and B, where the similarity (the similarity is the inverse of the distance) is always above its jumping ability (that is, the distance is below its jumping ability), then the animal can move from A to B. If not, A and B are in different clusters. Effectively unreachable.
But we don’t know how far our beast can jump. So in this way we end up having a similarity number that says: somewhere between A and B it is possible to find a succession of tags such that the distance is never above x, so SAB is equal to the minimum between the original SAB and x.
If it does feel complicated don’t worry. I got confused a few (hundred) times programming it. And just could not understand why those damn tags were not clustering… until I got it right.
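For readers who want to try the triplet loop, here is a minimal Python sketch. It is not the Delimind code. It is written directly on similarities, so the update keeps the larger of the current value and the weakest link of the detour through B; that is the mirror image of the min/max update, which is the form the same step takes when the table holds distances. The function name `transitive_similarity` and the toy values are assumptions.

```python
def transitive_similarity(sim):
    """Repeat the triplet update until nothing changes any more.

    `sim` is a symmetric dict of dicts with sim[a][a] == 1.0.
    After the loop, closed[a][c] is the best "weakest link" over all
    chains of tags leading from a to c. The input is left untouched.
    """
    tags = list(sim)
    closed = {a: dict(sim[a]) for a in tags}
    changed = True
    while changed:
        changed = False
        for a in tags:
            for b in tags:
                for c in tags:
                    via_b = min(closed[a][b], closed[b][c])
                    if via_b > closed[a][c]:
                        closed[a][c] = via_b
                        changed = True
    return closed


# A and C share little directly, but both are close to B.
sim = {
    "A": {"A": 1.0, "B": 0.5, "C": 0.1},
    "B": {"A": 0.5, "B": 1.0, "C": 0.4},
    "C": {"A": 0.1, "B": 0.4, "C": 1.0},
}
closed = transitive_similarity(sim)
print(closed["A"]["C"])
```

In island terms: an animal that can manage links of strength 0.4 cannot go from A to C directly (0.1), but it can go through B, so after the loop A and C are recorded at 0.4.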
So, now you have this nice matrix, only between your main tags (the one that are not contained in another tag, cfr previous version), and you (or actually I) need to cluster the tags.
Note also that you don’t need to cluster the tags only one time. Once you have made a clustering (for an animal which can jump d), you can still partition inside each cluster for animals that can jump less than d.
The first time I just asked it to cluster for each possible number. That is, if a number was present, assume that someone was able to jump exactly that distance. In this way I got a heavily clustered map. It was a mess, but a promising mess. I then saw that most of the interesting things were happening between distances of 0.333333 and 0.6666.
That is, it made quite a lot of sense to ask for the clusters generated by putting together tags that had one third of the links in common, and tags that had up to two thirds of the links in common.
This is how I got clusters:
- porno, sex and eros
- GeorgeBush, September11, politics, economy, historical, terrorism, usa
- green, sustainability
- …

Then I just applied the same process in the subtags of each tag.
Ok, I can be satisfied, I can go and have something to eat.
As always, if you find it useful drop me a line, I appreciate it.
Pietro
So, I just modified the deli.mind script, originally from brownhen.
The original would take the public bookmark from delicious and make a free mind map out of them.
(For those who have no time to read the whole post, I immediately tell you that I modified the code. The new code can be found here, and an example is here -open some nodes to see the difference!-).
The program is written in python, and I wasn’t very happy with the result. I mean it was great to have the map, but at the same time I have so many tags that it was pretty much useless. Now the fact is that we tend to reuse tags that we have already used. This generates a positive feedback dynamic, that tends to create a bunch of very common tags (even among your own tags) and many many tags used only one or two times. I bet you could also plot them into a nice power law picture (but, alas, you need at least 1000 tags to make it statistically meaningful!). This is generally true, but is particularly true for people who, like me, tend to store each link with around 10 different tags. This means that this long list of tags, that was using up my screen, was mainly composed of completely unimportant tags, with only a few interesting ones among them.
Not only this, but some tags, tend to appear only in conjunction with other tags. For example, the tag “python” comes always with the tag “programming”. In a sense it is a “sub tag”.
Oops, we are back into hierarchy, aren’t we?
Well, not exactly. First, the same link can be present in different, non hierarchically related tags; second, two tags can have links in common without being completely hierarchically related (think about the tags ‘September11’ and ‘GeorgeBush’ as a good example). The last thing to note is that from time to time there are tags which have exactly the same links inside, either because they are synonymous (‘del.icio.us’ and ‘delicious’ for example) or because I had not stored enough links to differentiate between the two.
So the new program extracts the information about the relation among the tags, and uses it to build a more interesting mind map.
More precisely two tags can be:
- Identical,
- One inside the other,
- Vice versa,
- With a non empty intersection, but with some extra links,
- Completely disjointed.
This information is then used to create the new mind map.
With the following novelties:
- Sub tags are shown as a sub branch of their parent tag.
- Tags that are equivalent are shown together with a little empty branch as their parent, to connect them all.
- A sub tag can be sub tag to more than one tag.
- Each tag also is followed by two numbers: # of links & # of sub tags.
So you have an idea about how big is the tree you are going to explore.
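Put together, the novelties listed above boil down to a few set comparisons per tag. The following Python sketch is not the modified deli.mind script; the function name `describe_tags`, the output layout and the toy bookmarks are invented to show the idea.

```python
def describe_tags(tag_links):
    """For every tag, work out its sub tags, its synonyms and its size.

    `tag_links` maps a tag name to the collection of links stored under it.
    A tag is a sub tag of every tag whose links strictly contain its own,
    so one tag may show up under several parents.
    """
    links = {tag: set(urls) for tag, urls in tag_links.items()}
    info = {}
    for tag, own in links.items():
        subtags = sorted(t for t, other in links.items() if other < own)
        synonyms = sorted(t for t, other in links.items()
                          if t != tag and other == own)
        # A main tag is one whose links are not strictly inside another tag.
        is_main = not any(own < other for other in links.values())
        info[tag] = {
            "links": len(own),
            "subtags": subtags,
            "synonyms": synonyms,
            "main": is_main,
        }
    return info


# Invented bookmarks: u1..u4 stand for four saved links.
example = {
    "programming": ["u1", "u2", "u3"],
    "python": ["u1", "u2"],
    "delicious": ["u4"],
    "del.icio.us": ["u4"],
}
for tag, data in describe_tags(example).items():
    print(tag, data["links"], len(data["subtags"]), data["synonyms"])
```

Here ‘python’ comes out as a sub tag of ‘programming’, and ‘delicious’ and ‘del.icio.us’ are reported as synonyms of each other, which is the case where the map shows them together under a little empty branch.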
You can see my “hierarchical delicious free mind map” in java format here while the code is here.
I also fixed a couple of bugs that would give some fake results (i.e. being tagged as ‘socialsoftware’ does not mean being tagged as ‘war’, etc…).
This isn’t the end, I am planning to work on this some more, when I have time.
Disclaimer: This was also my first tentative hack in python. So I am sure I did plenty of things in a clumsy, slow and redundant way. But I am learning.
Acknowledgment: I am very grateful to brownhen, because if he had not released the first version of the script I would not have started at all.