Some of you might remember that I wrote a post about the long tail of the ruling class. The post was in Italian and was translated into English by blogger Phil Edward. I copied the translation to my blog (with a link), but said that I did not fully agree with Phil’s understanding of my post. I didn’t go into more detail. And then there was silence, and in the silence I decided it was easier to just ignore the whole discussion. But a few days ago Nicholas Carr from Rough Type wrote a post on how the long tail lets the services that put people in touch make massive amounts of money, while the people who produce the actual content make very little. Absolutely true, and this is why you don’t see Google advertisements in my blog. But this is a very different problem from what I was discussing when I was speaking about the long tail of the Ruling Class. Mainly because I was not speaking about the ruling class but about the ‘classe dirigente’. Which is not exactly the ruling class, although I still can’t find a better translation. Ruling class smells a bit too much of kings and queens and prime ministers. And I was actually speaking about the ‘classe dirigente’ as the people who have authority over a certain field.
So when Phil commented on Nick’s post:
I blogged on this last year, in response to Pietro Speroni:
I felt I had to answer. Because my post was all about a multidimensional space (all our interests), which gets mistreated as a unidimensional space (money). Poor chap! For a multidimensional space to be treated as a unidimensional one is fairly common, but never fair. And the general excuse is ‘to understand better’, or ‘to simplify a bit’. But I suspect that multidimensional spaces might take it personally, because if you treat them badly, they can become quite convoluted, if you know what I mean. Maybe I should write a long post on the importance of not making models (even mental ones) with too few dimensions. But I think I shall leave it for some time next year. And then I can say that it was long overdue.
In any case I decided to copy my comment to Nick’s post here. Continue reading Where life is and money isn’t
It is now time to present the next project we have been working on: TagBay. And I say ‘we’, because in this project I am not alone. I did it with a friend of mine, Derek, who accepted, very patiently, to code some of the ideas I have been tinkering with in the last year or so. I am speaking about tags, and tag clouds, and distance between tags, and so on.
So, in brief, we made a web site to tag material that is being sold on e-Bay. Anybody can tag any object that is being sold. Not only can any object be tagged, but you can tag sellers, too (oh, we are not responsible for offensive tags, eh!).
Tags on objects can be made private or public, and you can also search among your tags, among everybody else’s tags, and eventually (when we code it) it will be possible to search among the tags of another user, like in del.icio.us.
Now that the summary for the people who have no time has been done, let’s try to explain the idea in detail for those who have a bit more time.
Pages:
On TagBay, right now, there are 5 types of pages: e-Bay Search Page, TagBay Tag Search Page, TagBayUser Tag Search Page, Item Page, and Seller Page.
- Search Page: From inside TagBay it is possible to search e-Bay for specific keywords. The user can then add tags to each object that comes up, and store the added tags all at once or for a single object. The same thing can be done in the Tag Search Page.
- TagBay Tag Search Page: In this page the user gets all the results for a single tag that someone has used. Nothing fancy (for now). Items where the tag only appears as a private tag will not appear here.
- TagBayUser Tag Search Page: In this page the user gets all the results for a single tag that a given user has used. If the user is logged in and is looking at his own tags, the items tagged privately will appear too.
- Item Page: Each object has its own page. From that page any user can see the public tags that other users have used for that object. They can also define their personal tags for that object, choose whether those tags are going to be private, and set the tags of the seller.
- Seller Page: And then there is the seller page, where any user can tag any seller. The use of tags for sellers is still limited, but will be extended in the future.
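The visibility rule for public and private tags on the two tag search pages can be sketched in a few lines of Python. This is only an illustration and not the TagBay code: the `Tagging` record, the two function names, and the sample users and items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagging:
    """One tag put on one item by one user (hypothetical record layout)."""
    user: str
    item: str
    tag: str
    private: bool = False


def tag_search(taggings, tag):
    """Site-wide tag search: only items that carry the tag publicly."""
    return sorted({t.item for t in taggings if t.tag == tag and not t.private})


def user_tag_search(taggings, owner, tag, viewer=None):
    """One user's tag search: private taggings are shown only to the owner."""
    return sorted({
        t.item for t in taggings
        if t.user == owner and t.tag == tag
        and (not t.private or viewer == owner)
    })


# Made-up sample data
taggings = [
    Tagging("anna", "item-1", "lamp"),
    Tagging("anna", "item-2", "lamp", private=True),
    Tagging("bruno", "item-3", "lamp"),
]
public_results = tag_search(taggings, "lamp")
seen_by_others = user_tag_search(taggings, "anna", "lamp")
seen_by_owner = user_tag_search(taggings, "anna", "lamp", viewer="anna")
```

Here `item-2` stays out of the site-wide search and out of what other people see in anna's list, but appears when anna looks at her own tags while logged in.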
The natural use of the site
- For a seller or for a shop: A seller might want to use the site to tag all the objects that he is selling, giving each object all the related tags. Thus increasing the possibility for it to be found. We suggest listing the tags in order of importance, as soon we are going to use the order to weigh importance in the search page.
Also, if a person wants to make a cool list of objects, they can tag exactly those objects with a tag they never used, and then link to the page for this tag in their directory. Thus creating their lists on the spot. Also sellers will want to tag their objects, and people making searches will tag objects to make lists of objects they want to follow, before jumping on a transaction. We think there is more than enough material to generate interesting behaviour. It doesn’t have to be exactly the same emergent behaviour that we are used to seeing. After all we are just exploring the possibilities of social folksonomy.
- A shop: In addition to the possibilities above, a shop that is selling on e-Bay might be interested in making sure that the shop itself (remember that you can tag sellers, and not only objects) has all the tags related to the merchandise that it is selling.
- Someone buying: Our suggestion for someone who wishes to buy on e-Bay would be to first look under the tag search, to see if anybody has already tagged any object that they are interested in. This does not necessarily have to be someone else who is buying; it could also be someone who is selling. Then tag the objects they are interested in themselves, to have them in their own list of objects. Then they could go to the e-Bay search page with the necessary keywords, and add the chosen tag to all the interesting objects. At that point a first selection has been made, and all the possible objects have been tagged. At this point, they could choose one of those objects, change the tags to private, and start bidding on it.
- Someone suggesting: And finally, if someone is just trying to suggest some possible objects, he could search e-Bay for those objects, tag them with a unique tag and present the url of the list to whoever is interested.
There are many other ways to use TagBay. In a sense TagBay is a toy, and not a game. And like every good toy it can be used for many different games. We suggest here only some of them. Also TagBay itself is rapidly evolving. We have tons of stuff we are interested in including, and if you have been reading my blog, you know how my problem is always to find people to code my ideas, more than to find the ideas. And this is why I am so happy with Derek’s work!
Difficulties that we found:
There were a number of issues that came out when we started developing this program.
- Public vs private tags:
Why would someone tag an object if they are interested in buying it? After all, aren’t they making it easier for others to find it by adding those tags?
This was a serious doubt that we had, and finally we decided to give users the possibility to tag objects privately. Yet there has to be a balance between private tags and public tags, as public tags are necessary to generate the emerging folksonomy that we wish to use. So we decided on a compromise: public tags can be added from the search page, but private tags require you to go to the specific object page. In our view (but we are ready to be proven wrong) someone would go to the search page and tag all the entries he might be interested in. Then choose one, and tag that one in a private way.
- Limitations due to the temporary nature of the objects
Considering that most objects exist on e-Bay only for a few weeks before being sold, wouldn’t this be too little time to build a tag cloud and let all the cool emergent properties that folksonomy induces appear?
Maybe, but sellers can also tag the objects they are selling, thus giving a fresh start to all the objects. Also, side by side with tagging objects, we are giving the possibility to tag sellers. Those tags should eventually survive each transaction and build up an interesting tag cloud.
- I spoke about sellers tagging their own objects, but wouldn’t this invite people to spam your site? After all, wouldn’t it be much better for a seller to add many tags to be present in many searches?
Ah ha! You think tag clouds can be spammed. This is false. Tag clouds cannot be spammed, and no one understands this. And we shall use this site to prove it. We have nothing against spammers; they are absolutely welcome on our site, and can spam it as much as they like. Let them add all the tags they want to each object they sell. It will make ABSOLUTELY NO DIFFERENCE in the search page. Tag clouds are unspammable. And our engine will use tag clouds as its base. Everybody else uses tag sets. And this makes them easily spammable. So, no, we don’t fear spammers. In fact we hope that spammers will come to our TagBay site. They are just people trying to sell their stuff, and we are trying to make sellers meet buyers. Wouldn’t it be bad to single out spammers just because they are spammers?
TagBay is obviously still in beta, and there are many things that need to be coded. If you have any idea on how to make it better please do not hesitate to contact me. If you want to make a difference to what the final product will be, now is the time to do it. Also all new suggestions implemented should be listed in a special page with links to the original suggester’s home page.
I think it’s time to present what I have been doing in the last few days. A number of improvements have been added to this web site. In short I have upgraded to wordpress 2.0. I also moved to the next version of wikka. Some of you might remember that I offered some money to whoever could write some code to get the tag plugin to generate an rss list. I didn’t, at the time, explain why. I will now.
Wordpress 2.0 gives the possibility to start categories on the fly. Just adding them, by listing them. Essentially this makes the categories in wordpress work like tags (or keywords, for academics). But categories in wordpress also have an rss feed connected to them. Albeit with some bugs, like linking to the whole blog and not to the particular category. So I spent most of the first of January adding to the entries the relative tags as categories. So now I have no need of an rss feed for the tag page, as the tag page has been replaced by the category pages.
You will also remember that I installed Wikka, the wiki engine. Now wikka is not only open source but also easy source. It is so simple that even I could hack the code. That is very simple! So I changed the code and inserted the possibility to have default pages. In short, before, if you were to look for the url http://wiki.pietrosperoni.it/someunexistingpage and there was no page in the wiki called “someunexistingpage”, the wiki would ask you to edit the page, and you would be redirected to http://wiki.pietrosperoni.it/someunexistingpage/edit.
Now it creates on the fly the page someunexistingpage with the default content. And the default content I chose was the 4 rss feeds:
- the feed from my blog from the category: someunexistingpage
- the feed from my technorati from the tag: someunexistingpage
- the feed from my delicious bookmarks from the tag…
- and the feed from the popular pages in delicious, always from that tag
So for each tag I now have a wiki page with the most relevant rss appearing there. But being a wiki page I also can add other rss feeds, write definitions, comments, todo lists. In short modify it as I see fit.
Still it is not perfect. It writes the page the first time, and from that moment the page is set. I can delete it, but I cannot, for example, change the default content for all the pages that only contain the default content. I tried to write a plugin to do that, but I failed when I confronted the fact that I needed to write a plugin {{defaultpage}} which would have to activate other plugins: {{rrs}}, for example. Something that I did not know how to do.
Also, having the same string work for delicious (as tag), wordpress (as category name) and wikka (as pagename) puts some heavy constraints on what the string might contain. For example I am already running aground on all the tags that contain a dot inside (aaargh, del.icio.us!) or an accented letter (aargh, dear italian).
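The two constraints mentioned above (no dots, no accented letters) can be checked before a tag is reused across the three systems. The helper below is a hypothetical sketch, not part of wikka, wordpress or delicious, and it only covers these two cases; each system may well have further restrictions.

```python
def is_portable_tag(tag: str) -> bool:
    """True if the tag is non-empty, has no dot, and is plain ASCII.

    Only the two problems named in the post are checked here
    (dots and accented letters); this is an assumption, not a full rule set.
    """
    return bool(tag) and "." not in tag and tag.isascii()


# Sample tags, chosen for illustration
checks = {tag: is_portable_tag(tag) for tag in ["idea", "del.icio.us", "perché"]}
```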
If you want to see what the pages look like just see the idea page. But any link from the right column (provided it has no dots or accents inside) will work fine.
Speaking about things that should happen and have not happened, one thing that I still have not seen is a browser with a feed reader such that when you go to the web page that is linked to a certain feed, it updates the feed and assumes that you have now read that feed. So for example I add Jim’s feeds, but for a certain period I am passing through his web page very often. Obviously if he is updating his blog I would see it. The feed reader does not need to give me that information.
I have added the feeds of some major newspapers, but I also often go to their web pages, and the result is that bloglines keeps on telling me that those feeds have 200 entries that I should look at.
Note: This entry is connected also to a mindmap. Some people were having problems in opening the page because of that. As such the mindmap has been stored in a separate page, and can be viewed from here.
Introduction
As correctly pointed out by Jeffrey Zeldman, tag clouds are becoming more and more popular. Yet I keep seeing services which should be using tag clouds that keep on using tag sets. It is not just a problem of programming a tool which can only support tag sets, but also of programming tools which might in principle produce tag clouds, but where the users are not invited to use a tag if one already exists, and which as such don’t generate a tag cloud.
Examples of the first type of tool are Flickr, 43things, consuMating, tagsurf * ; an example of the second is the tagged version of the BBC* . In all those cases a tag set is used where a tag cloud would be more appropriate. Some of the differences between a tag cloud and a tag set were explained in Vanderwal.net: Explaining and Showing Broad and Narrow Folksonomies. Let’s see them again, and see some consequences of those differences, which should clarify when it is better to use one tool and when it is better to use the other. Continue reading On Tag Clouds, Metric, Tag Sets and Power Laws
As the price of houses rises, more and more people find that the best solution is to divide a house among friends. Usually each person gets a room. The problem then is: who gets what room and how much should he pay. Usually the total rent is fixed, and usually the rooms are not exactly all the same. Some might be bigger, some smaller. Some might have a better view, more privacy, closeness to the toilet, more silence, and so on. And what’s also important is that different people might value the various elements in different ways.
I present here two ways of splitting the rent and dividing a house. I personally favour (and have designed) the second, but while I was presenting this method to some friends to get some feedback, I was told about the other; it seemed simpler, yet interesting enough to add. They both assume that:
a) the rent is fixed,
b) there is no favouritism among the would-be housemates on who gets to choose first.
The ‘find the objective value first’ method.
Before the rooms are assigned, get together and agree on what the objective value of each room is (i.e. 20% of the rent for this, 50% of the rent for this). The total value must of course be the whole rent. Then randomly select who gets what room (at the agreed price), and as a final action people are allowed to exchange rooms if they want to.
Positive element: it is simple and quite straightforward.
Negative element: it assumes that people can easily agree on the actual relative value of the rooms, and that this value does not change from person to person.
The ‘each person gets the best room’ method.
As I said this is the method that I love most. First of all let each person inspect all the rooms. Then each person secretly writes the relative value of each room on a piece of paper. The sum of the values must be equal to the requested rent. The idea is to divide the house so that each person gets a room, and pays for that room the value THEY wrote on the piece of paper, while the sum of the values paid by each person fully covers the requested rent.
Obviously, very often, the collected money will then be higher than the rent. Let’s call the collected money minus the monthly rent the ‘extra money’.
Often there is more than one solution that leaves some extra money each month. When this happens, the solution that maximizes the extra money is chosen. The extra money is then used to pay for the light, any extra expenses, or whatever is needed for the house.
Sometimes there is more than one optimal solution: several solutions generate the same extra money, everybody is paying the value they wrote for their room, and all other solutions are less optimal. In that case the adopted solution will be one of the optimal ones, randomly chosen.
Examples, examples:
Let’s suppose we have a house with 3 rooms (a, b, and c) and 3 persons (A, B, and C). Let’s suppose the total rent being 100.
Person A might find the three rooms equivalent, so he might just write (a: 33.3, b: 33.3, c: 33.3). Person B might instead favour room b, because it is sunnier and she likes to paint, and she thinks that room ‘a’ is slightly better than room ‘c’; in fact she would prefer not to be in room c at all, so she would write: (a: 35, b: 40, c: 25). Person C instead does not care about the sun, but has noticed that room a has more privacy, plus it is near the toilet, and since he likes to have his gf as a guest, thinks that having room a would be a better deal. So he votes (a: 40, b: 30, c: 30).
Then the papers are revealed.
Generally, when a room has a person who values it more than all the others do, and that person values that room more than all other rooms, then that room gets taken by that person at the price he has chosen.
In our example we have:
A: (a: 33.3, b: 33.3, c: 33.3)
B: (a: 35, b: 40, c: 25)
C: (a: 40, b: 30, c: 30)
which would give us that A would get room ‘c’ paying one third of the rent. B would get room ‘b’ paying 40% of the rent, and C would get room ‘a’ for 40% of the rent… and the collected money each month would be 33.3+40+40=113.3 . The extra money would be 113.3-100=13.3 and would be used to pay for the electricity, water, gas, or whatever.
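For a house with a handful of rooms, the "maximize the extra money" rule can be checked by simply trying every way of giving one room to each person. The Python below is a minimal sketch of that idea under my own naming (`best_assignment` and the `bids` layout are assumptions, not something from the original method description); it uses the three papers from the example.

```python
from itertools import permutations


def best_assignment(bids):
    """Try every person-to-room assignment and keep the one that collects most.

    bids maps each person to a dict {room: value written on the paper}.
    Returns (assignment, collected), where assignment maps person -> room.
    """
    people = list(bids)
    rooms = list(next(iter(bids.values())))
    best, best_total = None, float("-inf")
    for order in permutations(rooms):
        total = sum(bids[person][room] for person, room in zip(people, order))
        if total > best_total:
            best, best_total = dict(zip(people, order)), total
    return best, best_total


rent = 100
bids = {
    "A": {"a": 33.3, "b": 33.3, "c": 33.3},
    "B": {"a": 35, "b": 40, "c": 25},
    "C": {"a": 40, "b": 30, "c": 30},
}
assignment, collected = best_assignment(bids)
extra_money = collected - rent
```

With these papers the search gives A room c, B room b and C room a, collecting about 113.3 and leaving about 13.3 of extra money. Trying all permutations grows very quickly with the number of rooms, which is fine for a shared house but not for much more.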
It is also possible to renormalise the prices, by lowering them so that the total sum becomes exactly the cost of the rent, while the relative ratio remains the same. In our example
A: (33.3/113.3)*100=29.4
B: (40/113.3)*100=35.3
C: (40/113.3)*100=35.3
and person A would pay 29.4% of the rent (since he took the room nobody wanted)
person B would pay 35.3% of the rent (and took the sunny room)
person C would pay 35.3% of the rent (and took the room with more privacy)
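The renormalisation is a one-line rescaling. As a small sketch (the function name `renormalise` is mine), each payment is divided by the collected total and multiplied by the rent, so the ratios between payments are unchanged:

```python
def renormalise(payments, rent):
    """Scale the payments so they sum exactly to the rent, keeping their ratios."""
    collected = sum(payments.values())
    return {person: amount / collected * rent for person, amount in payments.items()}


# The payments from the example: A took room c, B room b, C room a
shares = renormalise({"A": 33.3, "B": 40, "C": 40}, rent=100)
rounded = {person: round(share, 1) for person, share in shares.items()}
```

Rounded to one decimal this reproduces the figures above, 29.4, 35.3 and 35.3.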
So, what if the situation is not that easy, and there isn’t one person who prefers each room? For example you could be in a situation like:
A: (a: 45, b: 45, c: 10)
B: (a: 40, b: 40, c: 20)
C: (a: 40, b: 30, c: 30)
Well, in this case it is obvious that person A will get either room a or room b. But it is also obvious that room c will go to person C. So C gets c at 30% of the rent. Both A and B value rooms a and b equally. But once the rooms are assigned person A will pay more than person B, so it seems fair to me that person A chooses a or b and pays 45, and person B gets the remaining room, but pays less (40).
But things can get even more complicated if some people value some rooms exactly the same:
A: (a: 45, b: 45, c: 10)
B: (a: 45, b: 45, c: 10)
C: (a: 40, b: 40, c: 20)
in which case A and B have obviously to randomly choose who gets what.
Or if the situation is symmetric among the rooms:
A: (a: 40, b: 30, c: 40)
B: (a: 40, b: 40, c: 30)
C: (a: 30, b: 40, c: 40)
In which case you randomly choose whether A gets a or c, and then the others follow.
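Ties like this one can be handled by listing every assignment that collects the maximum and then drawing one at random. The sketch below is my own illustration (the name `optimal_assignments` and the small tolerance used to compare totals are assumptions), run on the symmetric example.

```python
import random
from itertools import permutations


def optimal_assignments(bids, tol=1e-9):
    """Return every person-to-room assignment that collects the maximum total."""
    people = list(bids)
    rooms = list(next(iter(bids.values())))
    scored = [
        (sum(bids[p][r] for p, r in zip(people, order)), dict(zip(people, order)))
        for order in permutations(rooms)
    ]
    best = max(total for total, _ in scored)
    return [assignment for total, assignment in scored if best - total <= tol]


symmetric = {
    "A": {"a": 40, "b": 30, "c": 40},
    "B": {"a": 40, "b": 40, "c": 30},
    "C": {"a": 30, "b": 40, "c": 40},
}
options = optimal_assignments(symmetric)
chosen = random.choice(options)  # random pick among the equally good solutions
```

For these papers there are exactly two optimal solutions: either A takes a (then B takes b and C takes c), or A takes c (then B takes a and C takes b).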
So here we have the first method, where everybody chooses the values together; this is equivalent to the second method if everybody agrees on the relative value:
A: (a: 35, b: 40, c: 25)
B: (a: 35, b: 40, c: 25)
C: (a: 35, b: 40, c: 25)
After which, also in this method, you would randomly pick who gets which room.
Please, let me know if you have tried it and if it was successful.
Have you ever tried Google’s translate service? I know, if you did you wish you hadn’t, unless you were bored and were looking for some way to amuse yourself. But you know, translating text is a really daunting task. Generations of PhDs have been spent advancing the state of the art just a little bit every time. I know what I am speaking about, I lived with some of them in COGS, at Sussex University. I remember reading somewhere that new, better automatic translators will soon be available. Good! We are waiting for them.
In the meantime…
I had this idea:
Have you ever tried to translate a page from a language you don’t know… quite well, but are not totally ignorant of either? Something in between. Here in Europe it is quite common. And the same is true when I read posts in Portuguese, or in American from people on the other side of the ocean.
Yes, I can try to use Google’s translate mechanism, but it doesn’t give me something easy to chew. Look at this post, for example:
Depois do high vem o low. É uma lei do universo.
E no low todo mundo é feio e o mundo é triste e é tudo um saco.
E eu já nem sei o que me move.
From here
Google translates it as:
es low.? a law of the universe. E in low everybody? ugly and the world? sad e? everything a bag.
E I j? nor I know what it moves me.
From my darling Alenahra.
I suppose a better translation would be:
After a high comes a low. It is a law of the universe. And in a low everybody is ugly and the world is sad and everything is empty.
And I still don’t know what it is that moves me.
And Ale’ will tell me if I got it right.
My idea is that Google, instead of providing a tentative answer, should provide all the possible translations for each word. Those translated words should appear when we point to a word with the mouse. I know it is a slow way of reading a document, one word at a time, but soon the reader will pick up the most common words, and will speed up.
What follows is an example. Move the mouse over the words to see the title appear. I used some simple translations that I could find. Obviously the tool I envision would have to be more professional.
Depois do high vem o low. É uma lei do universo. E no low todo mundo é feio e o mundo é triste e é tudo um saco.
E eu já nem sei o que me move.
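The hover effect relies on nothing more than the HTML `title` attribute, so a page like this can be generated mechanically from a word list. Below is a minimal Python sketch of that idea. The `annotate` function and the tiny `GLOSSARY` are hypothetical and hand-made for illustration; a real tool would need a proper dictionary with all the possible translations of each word.

```python
import html
import re

# Tiny hand-made glossary (illustration only, far from complete)
GLOSSARY = {
    "depois": ["after", "afterwards"],
    "lei": ["law"],
    "universo": ["universe"],
    "mundo": ["world"],
    "feio": ["ugly"],
    "triste": ["sad"],
}


def annotate(text, glossary):
    """Wrap every known word in a span whose title lists its translations."""
    parts = []
    # Split into runs of word characters and runs of everything else
    for token in re.findall(r"\w+|\W+", text):
        meanings = glossary.get(token.lower())
        if meanings:
            title = html.escape(" / ".join(meanings), quote=True)
            parts.append(f'<span title="{title}">{html.escape(token)}</span>')
        else:
            parts.append(html.escape(token))
    return "".join(parts)


annotated = annotate("Depois do high vem o low. É uma lei do universo.", GLOSSARY)
```

Words that are not in the glossary are left untouched, so the reader still sees the whole original sentence and only gets help where help is available.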
In Italy right now more and more people are getting comfortable with English. If you were to come here only 10 years ago most people would refuse to even try to speak English, even if they had studied it in school. Now, I believe thanks to the internet, people are reading English pages daily, the dictionary often in a corner of the desk, ready to be used. It would be helpful for them to have such a system.
And I would finally learn Portuguese!
Porto Alegre, aspettami!
Special thanks to travlang.com for providing part of the translations.
I went on programming my favourite Python program: Delimind.
In short: Made a new release of the Deli Mind program. Here is the source code (just remember to change it from a .txt to a .py). Now similar tags are clustered together.
- Here is how it looks.
- Here is how the previous version looked.
- The original from Brownhen (may he live long and prosper) used to be here, although now it is missing.
All on the same data. Mine, now.
Go and enjoy.
(Later addition: while the program works well for small databases of links, like mine at the time in which I wrote this entry, it doesn’t scale well on size. For this reason it crashes for most of the people who try to use it with more than 1000 bookmarks. For this reason I was forced to change the link on the cluster example to a database with fewer nodes.)
Now the technical stuff for those who have a bit more patience.
Tags are not all the same, some are more similar than others. So, for example, the tags “September11” and “GeorgeBush” have more links in common than “GeorgeBush” and “intelligence”. The idea behind this version of DeliMind was to cluster tags that had links in common. The catch is that distance is generally not a transitive property (if I am near to you, and you are near to Jim, I am not necessarily that near to Jim), while clustering is (if you and I are in the same cluster, and you and Jim are in the same cluster, then Jim and I have to be in the same cluster… unless people belong to different clusters, but that’s a complication).
So I started by making a matrix of relations among tags (all_dict). Each tag, with respect to each other tag, could be:
- One contained in the other
- Identical
- Disjointed
- With # bookmarks in common
Then, according to the number of links in each of the two tags and the number of links in common, I invented a measure of similarity. Let #A be the number of links in tag A, #B the number of links in tag B, and #AB the number of links in common.
Then the relative similarity (SAB) will be:
SAB= sqrt((#AB/#A)*(#AB/#B))
I actually played with various measures:
SAB= ((#AB/#A)+(#AB/#B))/2
SAB= Max(#AB/#A,#AB/#B)
They all went from 0 to 1, and were quite similar… (I am not going to discuss the relative properties)
But the first one just seemed the one that made more sense, and in the end the resulting map was the one closest to my personal intuition of what should be in what cluster.
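The three measures are easy to compare side by side if each tag is represented as the set of links it was used on. This is a small sketch of the formulas above, not the Delimind source; the function name `similarities` and the sample sets are mine.

```python
from math import sqrt


def similarities(links_a, links_b):
    """Compute the three similarity measures between two tags.

    Each tag is given as a collection of the links (bookmarks) it appears on.
    """
    a, b = set(links_a), set(links_b)
    if not a or not b:
        return {"geometric": 0.0, "arithmetic": 0.0, "max": 0.0}
    common = len(a & b)
    frac_a = common / len(a)  # #AB / #A
    frac_b = common / len(b)  # #AB / #B
    return {
        "geometric": sqrt(frac_a * frac_b),    # the measure finally adopted
        "arithmetic": (frac_a + frac_b) / 2,
        "max": max(frac_a, frac_b),
    }


# Made-up example: #A = 4, #B = 9, #AB = 3
result = similarities({1, 2, 3, 4}, {2, 3, 4, 5, 6, 7, 8, 9, 10})
```

With 4 links in A, 9 in B and 3 in common, the geometric version gives 0.5, the arithmetic one about 0.54 and the maximum 0.75, so all three stay between 0 and 1 but weigh a small tag inside a big one differently.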
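Once the similarity matrix was done I started studying the clusters. Generally for each triplet of tags A, B, C I would modify (reading the numbers here as distances, the inverse of the similarity, so that smaller means closer)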
SAC:=min (previous SAC, max (SAB, SBC))
And I would continue going through all possible triplets, and then start again from the beginning until no new changes were happening.
Why? The idea is that the similarity between two tags measures how easy it is to jump from one to the other. Visualise each tag as an island, and then you have an animal that can jump from one island to the other. But it can only jump up to a certain distance. So if it can find a succession of tags between two tags, A and B, where the similarity (the similarity is the inverse of the distance) is always above its jumping ability (that is, the distance is below its jumping ability), then the animal can move from A to B. If not, A and B are in different clusters. Effectively unreachable.
But we don’t know how far our beast can jump. So in this way we end up having a number that says: somewhere between A and B it is possible to find a succession of tags such that the distance is never above x, so SAB is equal to the minimum between the original SAB and x.
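Here is how I would sketch the island-hopping pass in Python. It is not the Delimind code. The update in the post is phrased on distances (keep the minimum of the old value and the worst jump along the chain); on similarities, where larger means closer, the equivalent update keeps the maximum of the old value and the weakest link along the chain, and that is the form used below. The function name `chain_closure` and the numbers are assumptions for illustration.

```python
def chain_closure(sim):
    """Repeat the triplet update until nothing changes.

    sim is a symmetric dict of dicts: sim[a][b] is the similarity of tags a and b
    and must be defined for every pair of distinct tags.
    Returns a new dict; the input is left untouched.
    """
    tags = list(sim)
    closed = {a: dict(sim[a]) for a in tags}
    changed = True
    while changed:
        changed = False
        for a in tags:
            for b in tags:
                for c in tags:
                    if len({a, b, c}) < 3:
                        continue
                    # Going from a to c through b is as good as its weakest jump
                    via_b = min(closed[a][b], closed[b][c])
                    if via_b > closed[a][c]:
                        closed[a][c] = via_b
                        changed = True
    return closed


# Made-up similarities: A and C share little directly, but B bridges them
sim = {
    "A": {"B": 0.8, "C": 0.1},
    "B": {"A": 0.8, "C": 0.6},
    "C": {"A": 0.1, "B": 0.6},
}
closed = chain_closure(sim)
```

After the pass, A and C are rated 0.6: an animal able to make jumps of similarity 0.6 or better can get from A to C by way of B, even though the direct jump was only 0.1.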
If it does feel complicated don’t worry. I got confused a few (hundred) times programming it. And just could not understand why those damn tags were not clustering… until I got it right.
So, now you have this nice matrix, only between your main tags (the ones that are not contained in another tag, cf. the previous version), and you (or actually I) need to cluster the tags.
Note also that you don’t need to cluster the tags only one time. Once you have made a clustering (for an animal which can jump d), you can still partition inside each cluster for animals that can jump less than d.
The first time I just asked it to cluster on each possible number. That is, if a number was present, assume that someone was able to jump exactly that distance. In this way I got a heavily clustered map. It was a mess, but a promising mess. I then saw that most of the interesting things were happening between distances of 0.333333 and 0.6666.
That is, it made good sense to ask for the clusters generated by putting together tags that had one third of the links in common, and tags that had up to two thirds of the links in common.
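Cutting at a threshold and then cutting again at a stricter one can be sketched with a small union-find. This is an illustration and not the Delimind source: the function name `clusters_at` is mine and the similarity values are invented, only the tag names come from my own clusters. Joining every pair at or above the threshold and following chains of such pairs has the same effect as thresholding the chained similarities described in the post.

```python
def clusters_at(sim, threshold):
    """Group tags that are linked by a chain of similarities >= threshold.

    sim is a dict of dicts; every tag must appear as a key, pairs that are
    missing are treated as having nothing in common.
    """
    parent = {tag: tag for tag in sim}

    def find(tag):
        # Follow parents up to the representative, compressing the path
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for a in sim:
        for b, value in sim[a].items():
            if value >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for tag in sim:
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(group) for group in groups.values())


# Invented similarity values, for illustration only
sim = {
    "porno": {"sex": 0.5},
    "sex": {"porno": 0.5, "eros": 0.7},
    "eros": {"sex": 0.7},
    "green": {"sustainability": 0.4},
    "sustainability": {"green": 0.4},
}
loose = clusters_at(sim, 1 / 3)
tight = clusters_at(sim, 2 / 3)
```

At one third, these made-up numbers give two clusters (eros, porno, sex and green, sustainability); at two thirds only eros and sex stay together, which is the nesting of clusters inside clusters mentioned above.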
This is how I got clusters:
- porno, sex and eros
- GeorgeBush, September11, politics, economy, historical, terrorism, usa
- green, sustainability
- …

Then I just applied the same process in the subtags of each tag.
Ok, I can be satisfied, I can go and have something to eat.
As always, if you find it useful drop me a line, I appreciate it.
Pietro