Introduction

Affinax is a novel universal targeting technology that anticipates a person’s preferences in any domain of life, with much greater speed, without popularity bias, and using far less input about users and items than existing technologies require. Much of commerce and discovery on the internet is currently powered by relatively primitive targeting technologies. With its substantial advancements, Affinax should be able to make a significant impact on internet advertising and discovery, information filtration, and online and mobile commerce. Join us.


Jun 18, 2008

Changing the World: The Business Model

What is the business model for the next internet revolution? In this article, I review web monetization issues, especially those of web 2.0. I propose a monetization solution in which any site with users, commercial items, or even just visitors can significantly increase its revenue and reduce its marketing and advertising expenses. Our affinity targeting system monetizes itself in the process.

Traditionally, business models for web applications, communities, blogs, etc. are an afterthought. Apps and networking sites dream of reaching critical mass and then selling to Google, Microsoft, Yahoo, etc. Thus the revenue model is actually an exit strategy. This dream has been fueled by the observation that the purchase price for such sites is related to reach ("eyeballs", the size of the audience), which is reminiscent of Metcalfe's Law. Michael Arrington recently posted a more thorough analysis of the market value of social networks on TechCrunch.

A fortunate few web startup founders need not consider a business model beyond their big exit, even in the current economic climate. The new owners, however, will be forced to monetize their sexy new purchase. For the vast majority of web startup founders, the business model is important and is often considered and tested from the very start.

The default monetization method is advertising, preferred by 58% of web startups (a figure that includes affiliate marketing), according to Bizak. Of the strictly advertising sites, 54% use Google's AdSense, and I imagine this number is higher for web 2.0 social sites. Nonetheless, AdSense earnings per visitor (EPV) are the lowest among the various monetization methods. For example, Tom OKeefe writes about Mahalo's poor AdSense earnings, and Allen Stern predicts that affiliate revenue could surpass AdSense revenue for Mahalo in the long term. Decrying AdSense as "worthless", Tom OKeefe asks "What's Next?".

Many hugely popular sites are struggling to monetize better. YouTube, for example, is struggling to justify its $1.65 billion purchase price. Facebook faces a rough road ahead, with "only" $150 million in ad sales in 2007 and projections of $265 million in 2008. Aidan Henry proposes solutions to the "perennial debate surrounding Twitter's revenue model", and the CEO of Mahalo, Jason Calacanis, has even chimed in with his own Twitter business model suggestions.

This struggle may no longer be necessary. Our novel Affinity Targeting technology allows a user to be targeted to the entities they are most likely to appreciate, in any domain of life. Online communities and other sites with users can increase their earnings by adding both the site and its users to our system. Those users are then targeted to entities of interest (products, services, media, jobs, sites, other users, etc.). When targeting leads to a commercial transaction, it generates affiliate revenue, part of which is shared with the site the purchasing user came from. According to Bizak.com, affiliate earnings per visitor are 16 times greater than AdSense earnings. An affiliate model with highly specific user targeting should increase such earnings significantly.

The benefits don't end at monetizing eyeballs; sites and sellers can precisely target users to themselves and their items, thereby increasing sales and reducing costs. Communities, groups and fan clubs all seek to attract enthusiastic members. In our system, users will be targeted to the communities they are most likely to appreciate, leading to increased membership and more customers. Sellers and providers also benefit from precise targeting of users to the products and services they are most likely to purchase, which will increase sales and reduce dependence on marketing, advertising, SEO, etc. All that sellers and providers are required to do is profile their products, services, jobs, etc. for the system (in the particular way our system needs the information) and agree to our affiliate model. There are no other costs to them.

The figure above depicts a solution to several critical needs: internet sites and sellers must increase their revenue, reduce expenses, and attract ideal new users or members. In our solution, sites and sellers add their existing users (no private information is required) and/or items to the system. Users are then targeted (via the targeting engine) to three different kinds of entities (circles): other users (if they are so inclined), groups (sellers, sites, communities, etc.), and items (products, services, media, jobs, etc.). When a user is targeted to a commercial item and makes a purchase, the seller pays an affiliate fee to the system, part of which is shared with the group that brought the user to the system. Also, if a group added an affiliate item to the system that it does not sell directly (for example, an Amazon.com book), part of any affiliate fee earned from that item is shared with that group. Follow the green arrows to see the flow of money. Note that sites and sellers may contribute users and/or items, and users and/or items may also be entered independently of any site or seller.
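To make the flow of money concrete, here is a minimal Python sketch of how a single affiliate fee might be divided. The post does not state any split percentages, so `REFERRING_GROUP_SHARE`, `ITEM_GROUP_SHARE`, `Purchase`, and `split_affiliate_fee` are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical share rates: the post does not say how fees are divided.
REFERRING_GROUP_SHARE = 0.30  # assumed cut for the group that brought the buyer
ITEM_GROUP_SHARE = 0.20       # assumed cut for a group that listed an item it does not sell


@dataclass
class Purchase:
    affiliate_fee: float                    # fee the seller pays to the system
    referring_group: Optional[str] = None   # group that added the purchasing user, if any
    listing_group: Optional[str] = None     # group that added the affiliate item, if any


def split_affiliate_fee(purchase: Purchase) -> Dict[str, float]:
    """Divide one affiliate fee; whatever is not shared stays with the system."""
    payouts: Dict[str, float] = {}
    remaining = purchase.affiliate_fee
    for group, rate in ((purchase.referring_group, REFERRING_GROUP_SHARE),
                        (purchase.listing_group, ITEM_GROUP_SHARE)):
        if group is None:
            continue
        share = round(purchase.affiliate_fee * rate, 2)
        # The same group may both refer the user and list the item.
        payouts[group] = payouts.get(group, 0.0) + share
        remaining -= share
    payouts["system"] = round(remaining, 2)
    return payouts


print(split_affiliate_fee(Purchase(10.0, "music-forum", "book-blog")))
# {'music-forum': 3.0, 'book-blog': 2.0, 'system': 5.0}
```

Keeping the system's portion as "whatever remains" guarantees the payouts always add up to the fee the seller paid, whichever groups are involved.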

Our plan is to grow the system organically by bootstrapping it on Facebook and OpenSocial, in a way that keeps critical mass from ever becoming a significant issue. At a certain point, the affinity matrix of objects will be large enough to attract sites, communities, sellers and providers. At that point, we will offer our own API, a customizable web interface, or client software, so that a site and its users can interact with the system the way the site sees fit. In the beginning we will use existing affiliate and payment processors, but eventually this will likely be handled by our own systems. Our affinity engine and business model represent the ideal win-win solution for sites, sellers and users: better targeting, discovery, user satisfaction and monetization, reduced expenditures, etc. Ultimately, we see this targeting system attracting a significant fraction of online sites, communities and commercial entities.

Apr 25, 2008

The Affinity Graph

Is the Affinity Graph the anticipated Internet Singularity?

Tim Berners-Lee, the father of the World Wide Web, has been talking about this concept of the future "Internet of things." By "things" he means the people and other objects on the internet, and he argues that those things and the connections between them are the key aspects of the web. This, he argues, is the primary evolution of the walled gardens of "Web 2.0" into something far more important. He calls this evolution the Giant Global Graph, while others call it Web 3.0 or the Semantic Web.

The use of the term "Graph" has been met with some consternation by those who argue that we already have the term "network" to describe these connections. Robert Scoble describes the difference in terms of social relationships: your social network is who you know, while your Social Graph describes who you are associated with through common objects of interest (passions, concerns, politics, religion, work, school, etc.). He says: "The Social Graph is NOT my social network. My Social Network is my friends list. But the Social Graph shows a LOT more than that." A Graph, then, is not just the connections themselves, but also the types, contexts and strengths of those connections.

While the Graph will ultimately know what song is currently #3 on your iPod, some metadata about the song, and all the other people who have the same song at #3 on their iPods, one must wonder: "what's the point?" How does this help me discover that I should be a dolphin trainer, or find new people who share my way of thinking? Once the monstrous amount of data on the Graph is accessible to robots, many will apply data mining and filtering algorithms, and massive amounts of CPU, to try to generate usable information about the people and other objects on the web.

Tim Berners-Lee envisioned the ability to create "intelligent agents", sort of like advanced email filters, to perform many of the more tedious tasks more easily and quickly. I talked about a similar kind of agent in the post "Your Identity Proxy". Real progress will be achieved when technology is able to offer users a much more personalized and enjoyable experience, and of course better targeting of those users with commercial objects. In practical terms, this will require storing as much data as possible about users and their objects, so that futuristic computer programs can make sense of the identities of those users and the meanings of those objects, and also make predictions about the basic affinities between the objects and users. Some even predict that, given enough information, "the machine" will begin to transcend the metadata and attain a kind of sentience (or sapience).

This is similar to the ideas of Gary Flake, who hypothesized that continued advancements in networked information and other technologies will create a "virtuous cycle" leading to what he terms the "Internet Singularity". As with the Global Graph, our technology is far from advanced enough to see these concepts realized in the near future.

Let me propose that the Internet Singularity and the Global Graph are overlapping concepts that are largely achievable today through the Affinity Graph, a major element of this project. As of late 2007, we have had the technology to begin storing the affinity relationships, and their strengths, between users and all other objects on the internet and mobile devices. This is a much simpler abstraction, in which we store the most important kind of meaning (affinity) for the typical user. In other words, the most important benefits of the Graph or Singularity, such as search, personalization, and discovery, can be generated, stored and queried far more feasibly than is predicted for the Graph, the Semantic Web, or the Singularity.

With the Affinity Graph, the similarity in meaning of objects, including people, will be known. Universal categorization, classification, hierarchies and affinity matching will all become fairly trivial. Users will have immediate access to their future favorites in every domain of life; likewise, objects (and those who care about them) will know which users are likely to most appreciate those objects (marketers? advertisers? evangelists?). This is the point at which the Utopian dreams of internet visionaries are realized. The Affinity Graph does not make other forms of abstraction or metadata irrelevant; computer scientists remain free to set their strong AI loose on those. There are many other kinds of meaning, and computers will explore them in time.

The Singularity is here, as is the Global Graph, in ways that are most important to users.

Mar 19, 2008

Big World, Short Life

The world is big and life is short. We've solved this problem.

To restate the problem: in our short lives, we are unlikely to ever find the people and things that we would most enjoy and appreciate. This is unfortunate.

Have you been feeling the pain? Not finding your soulmate? No best buddies? Have the suspicion that the most incredible music is out there, somewhere? Feel like you never found your ideal vocation? Actually there is little chance you could have found the optimal things in life. As I mentioned in a previous post, it would take us thousands of years to meet every other human, listen to every song, read every book, evaluate every vocation, etc.

Many of us have grown to accept our mortality, and the tyranny of time. We've had to accept the limitations in the time we have to explore options and find those optimal things. This acceptance has silenced our normally inquisitive and innovative inclinations to find solutions to problems; it seems an insurmountable problem, and, frankly, dwelling on mortality is not entirely pleasant. Those who haven't accepted mortality will deny the existence of the problem and thus the need for a solution.

I didn't set out to solve the 'short life' problem. Actually, that's not entirely true - I'm a huge health and nutrition nut: I plan to be healthy to at least age 120. But in this post, and in this project, I'm not talking about extending human lifespan. It is the 'big world' problem that we are addressing, and the problem may not be so big after all. The innovation came first, and then it occurred to me that the thousands of years it would take to find your favorites could be compressed significantly.

I'll use the analogy from a previous post. Many of us receive hundreds of emails every day. Without an email filter, it would take us hours to sort through and pick out the emails we prefer. We don't have enough time to perform this task, nor would we want to. The email filter, if it works well, presents to you only those emails that you are most likely to prefer. Reviewing your emails becomes a much quicker and simpler task. Information overload is reduced.

In a similar way, our discovery engine sorts through thousands of years' worth of people, media, opportunities, ideas, causes, products, etc., and presents to you only those things you are most likely to prefer.

So you need no longer fear your own mortality. :-)

Mar 12, 2008

Your Identity Proxy

There seems to be a bit of confusion about the distinction between the terms "identity" and "identification" in popular discussion. The terms are often used interchangeably, and are used differently in different contexts. I thought I would write a bit about these and other related concepts, including a new concept that we introduce.

The terms are infused with the complexity of multiple disciplines (philosophy, psychology, sociology, neurology, religion, etc.), each with its own usage of, and take on, their meanings. To add to the complexity, identity is now an important concept with different meanings for government, commerce, and the internet.

Who are you? Are you different from your neighbor? From your identical twin? Is there something about you that distinguishes you from everybody else? The subjective versions of this are the "self-image" (a person's own model of his identity) and the identity perception of someone by others. Is it "the self" or the ego of psychology? Is this the "soul" of certain faiths? Is it the mind? The brain? What about the body? Is identity a product of nature, nurture, or both together? Many questions.

Most of us cannot be relied upon to accurately describe our identities, though sometimes best friends can get pretty close (we get closer, see below). This is the reason metadata contributed by users about themselves or their works is not considered accurate. A personal tag cloud is just an ego trip. It is highly subjective. Web page meta keywords are no longer relied upon by search engines or advertisers because they are so inaccurate. This is the source of the delay in the promised "Semantic Web" revolution.

I like to think of Identity as that mental thingy that distinguishes you from every other person. It is the objective, non-corporeal entity that is the sum of all the biology and environmental influences that constitutes what it is to be you, at this moment in your life. Despite the similarities, you have a different identity from your identical twin because your minds and bodies have had different experiences. You also have a different identity than yourself of one year ago because you've had new experiences... and of course your brain has suffered some oxidative degeneration ("vegetable oil", anybody?).

But in the real world, and for the purposes of government, commerce, and most things that make the world work, it is the corpus that counts. You are you because you are contained within the body of you. Identity equals body: the body that is recognized as you by facial recognition, and authenticated by fingerprinting, retinal or iris scanning, etc. Science fiction has enjoyed this mind-body identity confusion in numerous movies and television shows ("This body is not mine, and I have to be clever to convince my friends of my true identity").

Now, identification is the assertion that you are actually you ("I may look like a fly, but it's really me!"). Having the face of Nancy is an assertion that you are Nancy, i.e. your friends and family will identify you as being the identity they call "Nancy". Identical twins and masks can confuse the identification in opposite ways.

One can authenticate an identification through any of several available mechanisms, each providing some level of assurance. Visual similarity to your picture ID card (is ID "identity" or "identification"?) is a common form; voice recognition on the phone is another: "Hello, it's Nancy" works only if you sound like Nancy. We can authenticate the body fairly well, but the mind is more difficult ("Nancy doesn't seem like herself today. Maybe she's been taken over by an alien.").

On the internet, there are various uses for the terms identity, identification, authentication and anonymity. Your Facebook profile is a reflection of your identity, or an exhibition of your identity, most probably with identifying elements like your name and photos. In some cases you may have multiple online "identities" representing different facets of your actual identity. Those facets are sometimes identified by usernames and avatars indicative of the identity or sub-identity or idealized identity they represent.

For a new user, identity may initially not be important: an anonymous user is self-contained, requiring no identification or authentication. But as other users get to know that user, they will expect that it is consistently backed by the same identity. As it develops a reputation, the identity behind that user identification will want to maintain exclusive ownership of that identification, via some kind of authentication that ensures such exclusivity.

There are many systems for authentication, each attempting to ensure that the user instance is an active reflection of the same identity. Online banking is an example, with two levels of authentication. First, that the owner of the username is the identity called "Nancy", with these identifying personal details. Second, that the username instance (i.e. the just-logged-in identity) is also the "Nancy" identity (access management). The first is corpus related: Nancy walks into her bank and gets her login details based on corpus identity. The second is mind identity: does Nancy remember her username and password, or where she scribbled them?

Our project introduces another concept to the scene: the identity proxy. In our case, it is an objective proxy of your identity that makes choices on your behalf, likely the same choices you would make, even when you are not logged in. In a sense, it is like an email filter that follows your instructions and helps you deal with information overload by automating that small bit of your identity that prefers certain emails over others. Ours is much more powerful in reducing information overload because your identity proxy automates the filtration of all available information and options, in every domain of life. Your identity proxy is an accurate and objective reflection of your identity and it understands and automates your decision making processes. There is no greater weapon against the tyranny of choice and information overload.

Without an email filter, it would take us hours per day to delete the spam and read the relevant emails. We would quickly lose patience and only find a fraction of real emails. Likewise, it would take us thousands of years to meet every other human, listen to every song, read every book, evaluate every vocation, etc., in order to find the ones we like. It's a big world, and, sadly, life is short. The identity proxy does not live our lives for us; it makes our lives richer by allowing us to find the things we would never have found unless we lived for thousands of years.

Also, at its core, the identity proxy requires no corpus identification, i.e. no personal or demographic details are necessary in the registration process. Nobody can use the registration information to track you down (track down the corpus). Privacy is intact.

Your identity proxy is singular. Having more than one identity proxy is a waste of time because every time you register accurately the system should see you as being identical (or close) to your previous proxy. Registering inaccurately serves no purpose because the proxy will make choices that do not reflect your identity, and the choices will not be as fulfilling for you.

Feb 12, 2008

The Serendipity Revolution

Traditionally, the success of recommender systems is evaluated off-line by measuring the predictive accuracy of recommendations against existing datasets. For example, see the million-dollar Netflix Prize for a meager 10% improvement of Netflix's collaborative filtering algorithm. Netflix provided access to 100 million of its customers’ movie ratings to train and test new algorithms. In other words, an algorithm is judged more accurate the better it predicts ratings for movies the user has already seen. Recommendations based on this traditional accuracy metric are not the most useful to users.

Researchers know that the success of recommendations is better measured by recording user satisfaction: the positive emotional response at having discovered something new that one likes. But that is more difficult to measure, as it requires a community of users and a useful mechanism to compel (or at least strongly encourage) the reporting of satisfaction, its strength and perhaps its type. User satisfaction seems to increase across the following recommendation types, listed in ascending order:

  1. Low quality, low accuracy recommendation. Users obviously don't appreciate having their time wasted in evaluating something that the system should have known the user would not be likely to appreciate. These are "trust-busters"; the user will lose trust in the system.
  2. An accurate, but known recommendation. An item the user is already aware of. The user likes the item, but it is not novel. Trust is maintained because at least the system recommended something that the user already likes. Too many of these recommendations imply an excess number of false-negatives or "missed opportunities".
  3. A novel, but obvious recommendation. A novel recommendation is something new and appreciated, but something the user would have discovered on his/her own. For example, a new song from a favorite musician, or a new movie from a favorite director. The user will have a positive, though muted, reaction. Many users will suspect that there were "missed opportunities", given the huge number of unfamiliar items in any domain.
  4. A serendipitous recommendation. A serendipitous recommendation is something new, non-obvious and appreciated that the user would likely not have discovered on his/her own. For example, an unfamiliar song from an unfamiliar musician, or an unfamiliar movie from an unfamiliar director. The user will likely have a very positive reaction, though it has been argued that some users may see such recommendations as obscure and not immediately appreciate them.
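The four types can be read as a simple decision procedure over user feedback. The Python sketch below assumes three yes/no answers per recommendation (did the user like it, did they already know it, would they have found it alone); `RecType`, `classify`, and the creator-based `is_obvious` heuristic are hypothetical illustrations, not part of any described system.

```python
from enum import IntEnum
from typing import Iterable


class RecType(IntEnum):
    """The four recommendation types, ordered by expected user satisfaction."""
    INACCURATE = 1     # the user does not appreciate the item ("trust-buster")
    KNOWN = 2          # appreciated, but the user already knew it
    OBVIOUS = 3        # new and appreciated, but easy to find unaided
    SERENDIPITOUS = 4  # new, appreciated, and unlikely to be found unaided


def is_obvious(item_creator: str, favorite_creators: Iterable[str]) -> bool:
    # Hypothetical heuristic based on the examples above: an item by a
    # creator the user already favors is one they would likely find alone.
    return item_creator in set(favorite_creators)


def classify(liked: bool, already_known: bool, obvious: bool) -> RecType:
    """Map user feedback on one recommendation to the type it represents."""
    if not liked:
        return RecType.INACCURATE
    if already_known:
        return RecType.KNOWN
    if obvious:
        return RecType.OBVIOUS
    return RecType.SERENDIPITOUS


favorites = ["Miles Davis"]
print(classify(True, False, is_obvious("Nina Simone", favorites)).name)  # SERENDIPITOUS
```

Using `IntEnum` keeps the ascending satisfaction order comparable, so `RecType.SERENDIPITOUS > RecType.OBVIOUS` holds directly.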

The serendipitous recommendation is obviously the ideal for most users. The problem is that collaborative filters tend to focus on what is commonly known and popular: items the user has already heard about, or would eventually have experienced because of their "blockbuster nature". Many of the most interesting items for the user may be buried in the "long tail", so some collaborative filtering systems have tweaked their algorithms to try to maximize this type of recommendation by suppressing the more popular recommendations. Even so, recommendation diversity tends to be low in collaborative filtering systems, leading to a large number of false-negatives or "missed opportunities".

Recommendations based on a user's core identity will not focus on the popular, or items from artists or directors the user likes, or that the user's friends like. Instead, the user will be recommended items from the entire item landscape that by definition the user is most likely to appreciate based on that core identity (their "preference engine"). Thus the recommendation diversity (coverage of item space) within a domain (such as music) is as large as the diversity of items within that domain, leading to a large number of serendipitous recommendations - possibly the vast majority. Keep in mind that the number of domains in our community is also unlimited, and the same core identity can be used to recommend anything and everything in life.
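One common way to quantify the diversity described above is catalog coverage: the share of all items in a domain that the system recommends to at least one user. This Python sketch uses made-up users and items; it only measures coverage and says nothing about how the post's engine actually chooses items.

```python
from typing import Dict, Iterable, Set


def catalog_coverage(recommendations: Dict[str, Iterable[str]], catalog: Set[str]) -> float:
    """Fraction of the catalog that appears in at least one user's recommendations."""
    if not catalog:
        return 0.0
    recommended = {item for items in recommendations.values()
                   for item in items if item in catalog}
    return len(recommended) / len(catalog)


catalog = {"song-a", "song-b", "song-c", "song-d"}
popular_recs = {"u1": ["song-a", "song-b"], "u2": ["song-a", "song-b"]}  # everyone gets the hits
diverse_recs = {"u1": ["song-a", "song-b"], "u2": ["song-c", "song-d"]}
print(catalog_coverage(popular_recs, catalog))  # 0.5
print(catalog_coverage(diverse_recs, catalog))  # 1.0
```

A popularity-biased recommender keeps returning the same few items, so its coverage stays low even when each individual recommendation is accurate.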

Jan 14, 2008

One Degree of Separation

Social networks rely on your primary network (your existing friends and contacts) to introduce you to THEIR friends and contacts. Each person in the network is called a 'node', with one or more connections to other nodes. Each of those connections is sometimes called a degree of separation; a friend of a friend (FOAF) would then be two degrees of separation. The famous phrase "six degrees of separation" was based on work by psychologist Stanley Milgram, whose experiments suggested that any two Americans in the nation-wide extended network are separated by an average of about five intermediaries, i.e. six steps or degrees.
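Counting degrees of separation is a shortest-path problem. A minimal Python sketch, assuming an undirected friendship network stored as an adjacency map (the people below are invented), uses breadth-first search; note that a chain of six degrees has five intermediaries.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple


def build_network(friendships: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn undirected friendship pairs into an adjacency map."""
    network: Dict[str, Set[str]] = {}
    for a, b in friendships:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def degrees_of_separation(network: Dict[str, Set[str]], start: str, target: str) -> Optional[int]:
    """Breadth-first search: connections on the shortest chain, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, degree = queue.popleft()
        for friend in network.get(person, set()):
            if friend == target:
                return degree + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, degree + 1))
    return None


network = build_network([("alice", "bob"), ("bob", "carol"), ("carol", "dave")])
print(degrees_of_separation(network, "alice", "carol"))  # 2 (a friend of a friend)
print(degrees_of_separation(network, "alice", "dave"))   # 3, i.e. two intermediaries
```

Breadth-first search explores the network one degree at a time, which is also why tracking long chains gets expensive: the number of people reached can grow rapidly with each additional degree.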

Despite their connectedness, two people separated by so long a chain are extremely unlikely ever to meet. In fact, we usually only ever meet the friends of our friends, an extremely small fraction of the larger network. Web services like LinkedIn, the business contact network, track your chain to three degrees of separation, though I wonder how often third-degree contacts ever connect. [Friendster tracked the chain even further, and this pursuit has been credited with Friendster's downfall, as tracking long chains is computationally very difficult and has much larger hardware requirements.]

Online Social Networks are not really social, and the network - as degrees of separation - serves mostly to separate. So, if one really wants to 'kill' social nets, one needs to get rid of the 'net' (the multiple degrees of separation that separate people) in order to bring people together. Jyri Engeström argues that social networks should not be based on individual connections between people that can be counted and accumulated, rather people must be connected by shared objects. We agree and take this to the next level by making everything in the virtual community an object, where each object is connected to every other object.

The New Paradigm

As proposed in the last post, what is lacking in the current data islands and the proposed schema solutions is a way of harnessing the true power of the collective to actually reduce information overload and increase discovery. This will require a revolution in content and relationship discovery that can only arise with a completely new kind of information filtration and recommender technology.

"The social web will be powered by recommender systems".
Open Issues in Recommender Systems
John Riedl, Bilbao Recommenders School, 2006

The true power of the collective will be realized with the proper integration of social media, new universal discovery techniques, and associated detailed portable identity and personalization info. The result is a Social Web based on one degree of separation: all people and things are related to each other directly, with each such relationship differing only in type and strength. The following graphic is a representation of such a "one degree" circle of people relationships, but keep in mind that each person is also similarly related to all items, ideas, endeavors, etc. in the system as well.
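A data structure for such a "one degree" network can be very simple: a single relation, with a type and a strength, for every pair of objects. The Python sketch below assumes the strengths are already known; how they are derived (the post's "core identity" method) is not described, so `OneDegreeGraph`, `Affinity`, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Affinity:
    kind: str        # relationship type, e.g. "taste" or "friendship" (hypothetical labels)
    strength: float  # assumed scale: 0.0 (none) to 1.0 (strongest)


NO_AFFINITY = Affinity("none", 0.0)


class OneDegreeGraph:
    """Every object relates directly to every other; relations differ only in type and strength."""

    def __init__(self) -> None:
        self._objects: Set[str] = set()
        self._relations: Dict[FrozenSet[str], Affinity] = {}

    def relate(self, a: str, b: str, affinity: Affinity) -> None:
        # Scores are supplied from outside; the post does not describe how they are computed.
        self._objects.update((a, b))
        self._relations[frozenset((a, b))] = affinity

    def affinity(self, a: str, b: str) -> Affinity:
        # A pair without a stored score is still directly related, just at zero strength.
        return self._relations.get(frozenset((a, b)), NO_AFFINITY)

    def strongest(self, obj: str, k: int) -> List[Tuple[str, Affinity]]:
        """Return the k objects with the strongest affinity to obj (ties broken by name)."""
        others = [(o, self.affinity(obj, o)) for o in self._objects if o != obj]
        others.sort(key=lambda pair: (-pair[1].strength, pair[0]))
        return others[:k]


graph = OneDegreeGraph()
graph.relate("nancy", "jazz", Affinity("taste", 0.9))
graph.relate("nancy", "bob", Affinity("friendship", 0.4))
graph.relate("nancy", "golf", Affinity("taste", 0.1))
print([name for name, _ in graph.strongest("nancy", 2)])  # ['jazz', 'bob']
```

People and things share one namespace here, so the same `strongest` query ranks friends, music and hobbies alike, which mirrors the idea that only the type and strength of a relation differ.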

Critical to this new paradigm are the new universal discovery techniques that I've hinted at previously. Current recommender systems, including collaborative filters, are too primitive and limited to accomplish the task. Instead, we have applied certain bioinformatics concepts to solve the puzzle of simulating the human preference engine without requiring "strong AI". This starts with a quick determination of a person's "core identity", the internal mechanism responsible for generating appreciation, sifting through the chaos, and making choices.

Determining that "core identity" is a critical breakthrough as it allows us to quantify the relationship (strength and type) between all people, and between all people and all other things in the system. It also can yield portable data that can be used to quantify such relationships between users and items from multiple data islands, and can even be used in mobile devices and in real-world activity. This discovery system involves no collaborative filtering, psychological testing or interpretation, statistical or stochastic methods, etc.

"But there is no go-to discovery engine - yet. Building a personalized discovery mechanism will mean tapping into all the manners of expression, categorization, and opinions that exist on the Web today. It's no easy feat, but if a company can pull it off and make the formula portable so it works on your mobile phone - well, such a tool could change not just marketing, but all of commerce."
The race to create a 'smart' Google
by Jeffrey M. O'Brien, writing for Fortune Magazine

In addition to the current benefits of the social web, the integration of these universal discovery techniques will allow:

  • A brief one-page registration with no need for private information. Qualifies as 'Cold-Start' for people and also items, ideas, endeavors, etc.
  • Immediate access to promising relationships of all types, i.e. universal recommendations. These relationships are the predicted interest and affinity between a person and all other people, music, movies, books, recreation, groups, products, services, ads, travel destinations, vocations, jobs, teams, politics, religion, ideas, websites, articles, news items, games, etc.
  • Portable data that can be compared and relationships quantified. This portable data can be used between social and data islands, for mobile devices and in real-world activity.
  • No language or cultural barriers: no folksonomy or semantic constraints.
  • No need for existing relationships. Emphasis is on relationship discovery, though existing friends and contacts are revealing.
  • No need to observe history of actions and choices. A one-page registration is enough to provide significantly more information, and better information, than collaborative filters can accumulate.
  • The new system will act as a good friend who knows you well and delivers trusted recommendations of all types, both solicited and unsolicited.
  • Reduced privacy concerns as personal or demographic data is unnecessary.
  • Automatic person-level granularity. Each relationship has a strength and type.
  • Universal recommendations allow for highly successful affiliations of all types, direct sales and downloads, and highly targeted advertising, making for a diverse business model.
  • Ratio of discovery to effort is high. No need for constant messages, spam, requests, friend searches, etc.
  • Discovery is filtration, so 'information overload' and the 'tyranny of choice' are greatly reduced.
  • Enables highly personalized search engine functionality, news aggregation, and many other forms of person-level information filtration.
  • Constant excitement of discovery, so no "what's next?" reaction. No limit to novelty and interest, little boredom. No feeling of wasted time.
  • Highly useful and usable: the keys to success of any product or service.

Jan 8, 2008

Social Standardization and the Death of Social Networks

"...we’re reaching an inflection point where some fundamental conceptions of the web (and social networks) need to change".
from Stop building social networks, by Chris Messina

It seems that everybody is predicting the end of something due to something else, typically calling the latter a 'killer app'. Are VOIP and email replacing the phone and fax? Is social media replacing Google search, email, communication in general? Is IM replacing email? Well, who would have predicted that the trusty typewriter would disappear in the span of a few years? It seems many are making another prediction: Social Nets are on their way out, at least in their current configuration. In this post, I'll talk about the problems and proposed solutions.

Social Nets are hugely popular and are obviously doing something right. They were clearly a revolution in online communication and information sharing. Let's first list why people enjoy them. They allow you to:

  1. express yourself and try to look cool
  2. people-watch / voyeurism / "gawk at strangers"
  3. 'collect friends' and compete to see who has more
  4. waste time doing semi-fun alone stuff with apps, etc.
  5. keep in touch with existing friends (the primary network)
  6. make new friends, dates and business contacts (the largely unfulfilled promise of the 'network')
  7. manage your personal data
  8. exchange knowledge and information
  9. re-connect with old friends and colleagues

As for the negatives, here are some of the points mentioned on the blogosphere:

  1. 'Friend collecting' is not 'social'. No real communication takes place, and no real friends are made. Checkmarking someone as a friend is not being social. Not much relationship building going on.
  2. Information Overload is not reduced, quite the opposite: too many people, messages, spam, etc. There is a limit to our ability to absorb information: our internal filters cannot handle it.
    "There isn’t enough time in the day for any person to find value in what a 1,000 people have to say - our internal filters just won’t allow it. At some point all that information; whether it be valuable or just fluff, becomes nothing more than white noise".
    from Enough with the social crap I think I’m gonna puke, by Steven Hodson
  3. "Massive waste of time" / "It takes too much time" / 'Social Net Fatigue'
  4. Privacy concerns / 'abuse of trust'. Services track user activity on and off the service, and post some of those activities to the "friends". Combining information from multiple sources may reveal private information.
  5. Social nets are 'Walled Gardens'. They are not portable - information is trapped within the bounds of each service. New users must re-enter profile information, must search and re-add network contacts, and must reset notification and privacy preferences for each new social net joined.
  6. Social nets are by definition 'network-centric'. Most users are exposed only to friends of friends (i.e. two degrees of separation). This presents an obstacle to discovering true friends and contacts, most of the potential being outside of your network.
  7. No Business Model beyond popularity and possibly advertising. Also, because new users on social networks often misrepresent themselves and enter false personal information, demographic data for advertisers is unreliable.
  8. The "superficial emptiness"
  9. The "what's next?" phenomenon (after exhausting the novelty of the site) / Lack of Innovation
  10. Not granular enough - no ability to group friends and contacts in categories, or indicate how close or trustworthy those relationships are.
  11. Tired of having to add friends or accept friend requests in all of these networks.
  12. Use a given service only because that's where your friends are.
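Point 6 is easy to see in a toy graph. The sketch below is a minimal illustration, assuming a hypothetical friendship graph stored as an adjacency dictionary; it uses an ordinary breadth-first search capped at two hops to list everyone a user would be exposed to as friends or friends of friends, and then shows who stays invisible.

```python
from collections import deque


def within_degrees(graph: dict, start: str, max_depth: int = 2) -> dict:
    """Breadth-first search returning {person: degree} up to max_depth hops away."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue  # do not expand beyond friends of friends
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    del seen[start]
    return seen


# Hypothetical network: "you" -> friends -> friends of friends -> everyone else.
graph = {
    "you": ["ann", "ben"],
    "ann": ["you", "cat"],
    "ben": ["you", "dan"],
    "dan": ["ben", "eve"],
    "eve": ["dan", "fay"],
}

reachable = within_degrees(graph, "you")
all_users = set(graph) | {n for friends in graph.values() for n in friends}
hidden = all_users - set(reachable) - {"you"}

print(reachable)       # → {'ann': 1, 'ben': 1, 'cat': 2, 'dan': 2}
print(sorted(hidden))  # → ['eve', 'fay']
```

In a real network the "hidden" set is almost everyone, which is the obstacle to discovery described in point 6.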

Proposed Solutions:

Many feel that Identity/Info concepts like OpenID, OpenSocial, FOAF, the 'Semantic Web', and Microformats have great potential for solving a few of the above problems.

"a distributed, user-centric identity scheme would destroy almost every "walled garden" social software application on the web".
from Identity Management Will Destroy Social Software, by Brian 'Bex' Huff

The idea is that each internet user would have a single universal and portable profile that would be used and understood by all services, thereby eliminating the need to enter and configure the same information and connections on every new service. Ideally, this would have the effect of removing the walls between services, creating a single large community or 'cloud' where "relationships transcend networks/documents".

The social and data islands that dot the internet can clearly be helped by some kind of standardized profile that can be uploaded to (and modified by) each service. The burden of registration and establishing relationships would be greatly reduced. Such a profile can grow to include all the data that a person might share, including photos and information, music, movie, web site favorites, etc. As long as all services agree on standardization, this should work pretty well. As an example, browser standardization is largely successful, though differences do exist and can be frustrating for developers and surfers alike.

The Next Revolution:

Schemas, however, will not solve most of the issues mentioned above, and some are made worse (like privacy concerns). Some even argue that standardization and identity aggregation would not be entirely appreciated. To the extent that schemas depend on FOAF information, most of the problems with social networks will remain. If one really wants to 'kill' social nets, one needs to get rid of the 'net' part, i.e. the degrees of separation. What is lacking in the current data islands and the proposed schema solutions is a way of harnessing the true power of the collective to actually reduce information overload and increase discovery. The next revolution in content and relationship discovery can only arise with a completely new kind of information filtration and recommender technology.

"The social web will be powered by recommender systems".
Open Issues in Recommender Systems
John Riedl, Bilbao Recommenders School, 2006

The true power of the collective can only be realized with the proper integration of social media, new universal discovery techniques, and associated detailed portable identity and personalization info. The result is a Social Web based on one degree of separation: all people and things are related to each other directly, with each such relationship differing only in type and strength. More on this new paradigm shortly.

Jan 5, 2008

Cause vs. Effect of Human Preference

"One crucial unsolved problem for recommender systems is how best to learn about a new user".
Getting to Know You: Learning New User Preferences in Recommender Systems
Rashid et al., 2002


"Success comes from understanding both data and people"
Open Issues in Recommender Systems
John Riedl, Bilbao Recommenders School, 2006


"The problem with recommendation systems is... it measures and acts upon the effect, not the cause".
Response to “UIEtips Article: Watch and Learn: Recommendation Systems are Redefining the Web”
Adam Smith, 2006

So far, the internet has been all about effect. What other people say they like, you might also like; what you liked in the past suggests what you may like in the future. Google does it with PageRank; Amazon.com and Netflix do it with their recommender systems. They act based on your, or others', past preferences (the effect) rather than the cause of your past preferences. As you interact with the web, applications can record your actions and choices in order to create a filter with which to formulate suggestions that you might appreciate in the future.

But this is not the way the natural social process of recommendation seeking works. If you really want a good recommendation you ask someone who knows you well, as an individual. This is the way good friends do it. We accept recommendations from good friends because they understand our core identity (hopefully) and have no ulterior motives (hopefully). For example, as a single guy, I will never again go on a blind date unless the intermediary is a good friend who understands my taste and my attitudes, values, personality, etc., as well as that of the prospective date. One could make an assumption based on my past dates and relationships, but it would be an assumption based on insufficient (see below) and indirect data: the effect rather than the cause.

What is the cause? Preferences do not appear out of thin air, they are a result of your core identity: some combination of nature and nurture, your genes and your cultural and social influences, the configuration of your brain. This is the direct cause of your preferences: it is your preference engine. Unfortunately, it is a black box that we cannot really open. Possibly in the future there will be a scanning device that can capture and replicate your precise neural configuration. With this copy, and sufficient understanding of the human mind, we might be able to accurately predict your choices. In making a choice, the steps are:

  1. Core Identity + Exposure -> Preferences (e.g. Brazilian Supermodels)
  2. Preferences + Availability -> Choices (Damn!)

Current recommender systems, such as collaborative filters, attempt to simulate a filter at the second stage. What we need is a way to accurately simulate your filter at the first: not quite a copy of your brain - but close.

Your preferences are also extremely limited by your limited exposure. Take music as an example. I love music, but I have only heard a tiny fraction of a percent of all music. So how the hell can my current favorites be expected to be entirely descriptive of my true taste or ultimate favorites? I have been exposed to that which is largely popular, better marketed, in English, etc. Music recommender applications suffer from this limitation: they consider only what I have already heard, and so they receive highly skewed data about my true taste. It would be great to have a good friend who is the ultimate "long tail" DJ and can match me to music based on his knowledge of my core identity and detailed knowledge about all music and musical tastes.
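The two stages above, and the exposure problem, can be made concrete with a toy model. This is only a sketch under loudly stated assumptions: the hypothetical `affinity` scores stand in for the black-box preference engine (nobody can actually read them out of a brain), and "liking" is modeled as a simple score threshold. The point it shows is that a system recording only choices sees the output of stage 2, which is filtered first by exposure and then by availability.

```python
# Hypothetical stand-in for the black-box "preference engine": how much this
# person would like each item if exposed to it (scores invented for illustration).
affinity = {"pop_hit": 0.6, "indie_track": 0.9, "obscure_jazz": 0.95, "jingle": 0.1}


def preferences(affinity: dict, exposure: set, threshold: float = 0.5) -> set:
    """Stage 1: Core Identity + Exposure -> Preferences."""
    return {item for item in exposure if affinity.get(item, 0.0) >= threshold}


def choices(prefs: set, availability: set) -> set:
    """Stage 2: Preferences + Availability -> Choices."""
    return prefs & availability


exposure = {"pop_hit", "jingle"}  # only heavily marketed music was ever heard
availability = {"pop_hit", "indie_track", "jingle"}

observed = choices(preferences(affinity, exposure), availability)
true_favorites = {item for item, score in affinity.items() if score >= 0.5}

print(sorted(observed))                   # → ['pop_hit']
print(sorted(true_favorites - observed))  # → ['indie_track', 'obscure_jazz']
```

A recommender trained on `observed` never learns about the two items this person would like most, which is the skew described in the music example.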

"Thus, the task is not so much to see what no one yet has seen, but to think what nobody yet has thought about that which everybody sees".
– Arthur Schopenhauer

It seems obvious that far better recommendations would result from an intimate knowledge of a person's core identity. But identity is mysterious and unapproachable; better left to the fantasies of pipe-smoking psychologists. In reality, it is the chain around the elephant's leg. We all have the tools to break free from the constraints of assumption, but smart people have not previously applied themselves to the task.

Jan 2, 2008

Current Recommender Types

There are a number of types of recommender systems currently available. They vary significantly in their mode of action and ultimate user experience. In terms of results, recommender systems are expected to offer a sufficient number of good-quality recommendations ('New Favorites'). Beyond this, the quality of the results also depends on minimizing false positives ('Trust Busters') and false negatives ('Missed Opportunities'). In other words, users should neither be shown inappropriate results nor be denied appropriate ones.
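The three outcome labels map directly onto the usual true positive, false positive and false negative counts, and from there onto the standard precision and recall measures. The sketch below assumes a hypothetical set of items a system recommended and a hypothetical set of items the user would actually like; the item names are placeholders.

```python
def evaluate(recommended: set, appropriate: set) -> dict:
    """Classify recommendations using the labels from the text."""
    return {
        "new_favorites": len(recommended & appropriate),         # true positives
        "trust_busters": len(recommended - appropriate),         # false positives
        "missed_opportunities": len(appropriate - recommended),  # false negatives
    }


recommended = {"A", "B", "C", "D"}       # what the system showed the user
appropriate = {"B", "C", "E", "F", "G"}  # what the user would actually like

counts = evaluate(recommended, appropriate)
precision = counts["new_favorites"] / len(recommended)  # share of shown items that fit
recall = counts["new_favorites"] / len(appropriate)     # share of good items that were shown

print(counts)             # → {'new_favorites': 2, 'trust_busters': 2, 'missed_opportunities': 3}
print(precision, recall)  # → 0.5 0.4
```

Trust Busters lower precision; Missed Opportunities lower recall. A system can trivially raise one at the expense of the other, which is why both need to be watched.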

The quality of the user experience is also influenced by the time and effort required to give the recommender system enough information to make minimally reasonable recommendations. Users are sometimes asked to fill out lengthy questionnaires, or applications require that a user's history of choices or ratings be observed and recorded. It takes time and effort before things start working well. These days, users don't like to wait for anything and expect immediate gratification - delivering instant results upon quick registration is called 'cold start'. However, existing applications that permit a 'cold start' lack anything close to sufficient information, explicit or implicit, required to make accurate, high-quality recommendations.

There are a number of strategies that recommender systems are taking today. These include:

  1. Non-personalized: "Web 1.0" technology offering the highest rated or most popular items to all users. No intrinsic personalization, poor quality results, but immediate.
  2. Demographic: Require some knowledge about the user in order to group similar users together (e.g. by age, gender, area code, or other similar features). Poor quality recommendations, low personalization, though slightly better than the above. May require "private" information, and depending on the length of the questionnaire, registration can take time.
  3. Simple answer or ratings matching: Matches users based on explicit matching of answers, selections, ratings, etc. Makes recommendations with extremely limited scope, many missed opportunities, requires answers or observations.
  4. Heuristics, probabilistic models (Bayesian, Markov), decision trees, neural nets, etc.: An application must collect a large number of user-item preferences, or user/item features, before quality recommendations are possible. This approach attempts to identify the underlying logic of a user's choices (or, in the case of heuristics, to apply certain assumptions to them).
  5. User-based Collaborative Filtering: similarity of historical choices or actions allows the application to find highly correlated users. The assumption is that users who agreed in the past might tend to agree in the future. Limited immediate results, most items will not be rated/answered (sparsity). Users with non-typical opinions or taste (the 'long tail') may not get good recommendations.
  6. Item-based collaborative filtering: Finds items that tend to be preferred together. Limited immediate results, and users with non-typical opinions or taste may not get good recommendations.
  7. Content-Based: Find items with similar features (Keywords, author, genre, i.e. DNA) to known preferences of a user. Items must be properly and thoroughly represented as a set of features - this generally requires a large staff. Generally limited to a single domain as there may be few cross-domain features. Limited immediate results.
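To make strategy 5 concrete, here is a minimal user-based collaborative filtering sketch. It assumes a tiny hypothetical ratings table and uses Pearson correlation over co-rated items with a similarity-weighted average of the most correlated neighbors, one common textbook variant among many; it is not the method of any particular product. It also shows two limitations named above: sparsity (only co-rated items count) and the cold start (a user with no history gets no prediction at all).

```python
import math

# Hypothetical toy ratings (user -> {item: rating on a 1-5 scale}).
# Missing entries are unrated items; real rating matrices are far sparser.
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 4, "m2": 5, "m3": 2, "m4": 5},
    "carol": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
    "dave":  {},  # brand-new user with no history (cold start)
}


def pearson(u: dict, v: dict) -> float:
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0  # not enough overlap to say anything
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (v[i] - mv) for i in common)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in common)) * math.sqrt(
        sum((v[i] - mv) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(user: str, item: str, k: int = 2):
    """Similarity-weighted average of the top-k positively correlated neighbors."""
    neighbors = []
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = pearson(ratings[user], their_ratings)
        if sim > 0:
            neighbors.append((sim, other))
    neighbors.sort(reverse=True)
    top = neighbors[:k]
    if not top:
        return None  # no correlated users: no prediction possible
    total = sum(sim for sim, _ in top)
    return sum(sim * ratings[other][item] for sim, other in top) / total


print(round(pearson(ratings["alice"], ratings["carol"]), 2))  # → -1.0
print(round(predict("alice", "m4"), 2))  # → 5.0
print(predict("dave", "m4"))  # → None
```

Alice's prediction for m4 rests entirely on Bob, her one positively correlated neighbor; a user whose taste correlates with nobody (the 'long tail' case) ends up in the same position as Dave.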

There are many recommendation engines and recommender applications available on the internet and many more seem to be popping up all the time. Currently they all have severe limitations and offer mediocre to poor quality results when compared to, say, recommendations by a best friend. Examples of current applications include:

  • eHarmony requires a very lengthy questionnaire and uses a proprietary empirical heuristic to match people romantically. Its success depends on the quality of the questions and the heuristic, the person's willingness to answer truthfully, and the person's willingness to spend a few hours to register. Mixed results are reported, but there is certainly an advantage over matchmaking sites that allow daters to make their own bad choices.
  • Pandora and Last.fm both recommend music though they do so in different ways. Pandora's large staff must determine the separable features ("DNA") of a song and observe a user's choices in order to extract common features of a user's preference. Last.fm seems to work by grouping users of similar taste. Both suffer from reduced choice diversity for slightly different reasons. Both are mildly satisfactory, but also suffer from excessive false negatives and false positives, and require recording your existing preferences. Two roommates using the same account will likely see poor results.
  • Amazon.com's recommendations work by observing a user's choices and activity and grouping items (books, CDs, DVDs, etc.) that tend to be chosen or viewed by the same users. After viewing or choosing items, you are presented with: "users who liked X (the currently viewed item) also liked Y (a correlated item)". As is typical of this approach, users who buy for multiple people, such as children or friends, will likely see poor results.
  • Social DNA sounds like it works similarly to Pandora, but the granularity is significantly greater, and unlike eHarmony, there seems to be no heuristic: matching is all or nothing (i.e. explicit ratings and questions). This is expected to lead to a very high rate of false negatives and relatively few true positives. Moreover, because matches will likely be based on only a tiny fraction of the possible DNA (highly limited explicit information yields a sparse matrix), and given the complexity of human beings, most of the matches that do occur are likely to be false positives.
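The "users who liked X also liked Y" pattern in the Amazon.com example can be approximated by simple co-occurrence counting. The sketch below assumes hypothetical purchase histories and counts how often each pair of items appears in the same user's history; it illustrates the general item-grouping idea only and is not a description of Amazon.com's actual algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase histories: user -> set of items.
histories = {
    "u1": {"book_a", "book_b", "dvd_c"},
    "u2": {"book_a", "book_b"},
    "u3": {"book_a", "cd_d"},
    "u4": {"book_b", "dvd_c"},
}

# co_counts[x][y] = number of users whose history contains both x and y.
co_counts = defaultdict(Counter)
for items in histories.values():
    for x, y in combinations(sorted(items), 2):
        co_counts[x][y] += 1
        co_counts[y][x] += 1


def also_liked(item: str, n: int = 2) -> list:
    """Items most often chosen by the same users as `item` (ties keep first-seen order)."""
    return [other for other, _ in co_counts[item].most_common(n)]


print(also_liked("book_a"))  # → ['book_b', 'dvd_c']
print(also_liked("cd_d"))    # → ['book_a']
```

The weakness noted above follows directly: if one account buys a children's book as a gift, that book gains co-occurrence counts with everything else the account holder buys, and those spurious pairs are then shown to other users.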

In order to get relatively high quality and accurate recommendations, a large amount of explicit ratings/choices (and/or possibly implicit activity) must be recorded. This is extremely hard to do: users are less likely to maintain interest while the machine learns, and this will be increasingly true in the future. Currently, users must be content with mediocre results, but a trade-off will develop between accuracy/quality and user patience.

Another frequent limitation is that users can act maliciously or inappropriately to skew results. Due to the limitations of current applications, users may feel the need to modify or exaggerate their choices in order to get better results. On the other end, users who want to promote certain items to others may give or encourage false ratings, views or descriptions (called 'Shilling') through manual or automated efforts or attacks. Also, privacy becomes an issue as users may explicitly or implicitly reveal private information about themselves. Such details include demographics, personal details, taste, ratings, opinions, etc. System administrators (and possibly hackers) will have free access to this data.
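A little arithmetic shows why shilling works against systems that rely on aggregate ratings. The sketch assumes hypothetical ratings for a single item and a naive plain-average score; it is not modeled on any specific site.

```python
from statistics import mean

genuine = [2, 3, 2, 1, 3, 2]  # hypothetical honest ratings for one item (1-5 scale)
shills = [5] * 4              # hypothetical fake top ratings injected by a promoter

print(round(mean(genuine), 2))           # → 2.17
print(round(mean(genuine + shills), 2))  # → 3.3
```

Four fake ratings lift a mediocre item by more than a full point. Automated attacks can inject far more than four, and the sparser the genuine ratings, the larger the shift.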

Accurate, high quality, robust and broad scope recommendations have been the holy grail for internet futurists for quite some time, though we are still a long way from that goal. The problem is largely technical: recommendations are a really tough problem. Mathematics/statistics, clever algorithms and artificial intelligence are stretching the results to the maximum, given the poor quality data available from users during registration or interaction with the application. The solution is to get high quality data about the user's identity or individuality and match based on that, rather than matching based on a user's history. The problem is that teaching the machine about the core identity of a person is science fiction. Or is it?