Introduction

Affinax is a novel universal targeting technology that anticipates a person’s preferences in any domain of life, with much greater speed, without popularity bias, and using far less input about users and items than existing technologies require. Much of commerce and discovery on the internet is currently powered by relatively primitive targeting technologies. With its substantial advancements, Affinax should be able to make a significant impact on internet advertising and discovery, information filtration, and online and mobile commerce. Join us.


Nov 26, 2009

What is the Affinax Project?

The Affinax Project is attempting to solve one of the most complex and interesting puzzles on the internet: how to predict a person's future favorites in any domain of life (social, career, products, services, media, etc.).

We have developed a completely novel technology, a cross-pollination between Bioinformatics and the Semantic Web. If we are right, it will make such predictions from a cold start, rapidly, without tracking a user's behavior, without data mining, without a lengthy registration process, and in any domain. It should do so without many of the drawbacks of existing recommender technologies.

Due to the nature of the technology, it is not possible to perform a simulation or testing using Netflix data. We must build a live proof of concept with real users and real objects to match them to. This will require a few Facebook applications and some efficient algorithms.

We hope to evaluate the technology to academic standards in order to demonstrate that it works. Join us and help us solve this puzzle - any and all relevant skills are welcome.

Jun 18, 2008

Changing the World: The Business Model

What is the business model for the next internet revolution? In this article, I review web monetization issues, especially those of web 2.0. I propose a monetization solution where any site with users, commercial items, and even visitors can significantly increase its revenue and reduce marketing and advertising expenses. Our affinity targeting system monetizes itself in the process.

Traditionally, business models for web applications, communities, blogs, etc. are an afterthought. Apps and networking sites dream of reaching critical mass and then selling to Google, Microsoft, Yahoo, etc. Thus the revenue model is actually an exit strategy. This dream has been fueled by the observation that the purchase price for such sites is related to reach ("eyeballs", size of the audience). This is reminiscent of Metcalfe's Law. A more thorough analysis of the market value of social networks was recently posted in TechCrunch, by Michael Arrington.

A very few fortunate web startup founders do not need to consider a business model beyond their big exit, even in the current economic climate. The new owners, however, will be forced to monetize their sexy new purchase. For the vast majority of web startup founders the business model will be important and is often considered and tested from the very start.

The default monetization method is advertising, preferred by 58% of web startups (this figure includes affiliate marketing) according to Bizak. Of the strictly advertising sites, Google's AdSense is adopted by 54%. I imagine this number is higher for web 2.0 social sites. Nonetheless, AdSense earnings per visitor (EPV) are the lowest among the various monetization methods. As an example, Tom OKeefe writes about Mahalo's poor Google AdSense earnings, and Allen Stern predicts that affiliate revenue could surpass Google AdSense revenue for Mahalo in the long-term. Decrying AdSense as "worthless", Tom OKeefe asks "What's Next?".

Many of the hugely popular sites are struggling to better monetize. YouTube, for example, is struggling to justify its $1.65 billion purchase price. Facebook also faces a rough road ahead, with "only" $150 million in ad sales in 2007 and projections of $265 million in 2008. Aidan Henry proposes solutions to the "perennial debate surrounding Twitter's revenue model", and the CEO of Mahalo, Jason Calacanis, even chimes in with his own Twitter business model suggestions.

This struggle may no longer be necessary. Our novel Affinity Targeting technology allows a user to be targeted to entities they are most likely to appreciate, in any domain of life. On-line communities and sites with users can increase their earnings by adding both their site and users into our system. Those users are then targeted to entities of interest (products, services, media, jobs, sites, other users, etc.). Targeting, leading to a commercial transaction, will result in affiliate revenue, part of which is shared with the originating site of the purchasing user. According to Bizak.com, affiliate earnings per visitor are 16 times greater than AdSense earnings. An affiliate model with highly specific user targeting should increase such earnings significantly.

The benefits don't end at monetizing eyeballs; sites and sellers can precisely target users to themselves and their items, thereby increasing sales and reducing costs. Communities, groups and fan clubs all seek to attract enthusiastic members. In our system, users will be targeted to the communities they are most likely to appreciate, leading to increased membership and customers. Also, sellers and providers benefit by precise targeting of users to products and services they are most likely to purchase. This will increase sales, and reduce dependence on marketing, advertising, SEO, etc. All that sellers and providers are required to do is profile their products, services, jobs, etc. for the system (in the unique way we need the info) and agree to our affiliate model. There are no other costs to them.

The figure above depicts a solution to several critical needs: internet sites and sellers must increase their revenue, reduce expenses, and attract the most ideal new users or members. In our solution, sites and sellers add their existing users (no private information is required) and/or items into the system. Users are then targeted (via the targeting engine) to three different kinds of entities (circles): other users (if they are so inclined), groups (sellers, sites, communities, etc.), and items (products, services, media, jobs, etc.). When a user is targeted to a commercial item and makes a purchase, the seller provides an affiliate fee to the system, part of which is shared with the group that brought the user to the system. Also, if a group added an affiliate item into the system that they are not directly selling (for example, an Amazon.com book), part of any affiliate fee earned from that item is shared with that group. Follow the green arrows to see the flow of money. Note that sites and sellers may contribute users and/or items, and users and/or items may be entered independently of a site or seller.
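
The flow of money described here can be sketched in a few lines of Python. This is only an illustration: the two share fractions, the Purchase record and the split_fee function are hypothetical names and numbers chosen for the example, since the post does not say how a fee would actually be divided.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical share fractions; the actual split is not specified in the post.
REFERRER_SHARE = 0.30     # group that brought the purchasing user into the system
CONTRIBUTOR_SHARE = 0.20  # group that added an affiliate item it does not sell itself


@dataclass
class Purchase:
    affiliate_fee: float       # fee the seller pays to the system
    user_group: Optional[str]  # group that contributed the user, or None
    item_group: Optional[str]  # group that contributed the item, or None


def split_fee(purchase: Purchase) -> Dict[str, float]:
    """Return the payout per recipient for one affiliate fee."""
    payouts: Dict[str, float] = {}
    if purchase.user_group is not None:
        share = purchase.affiliate_fee * REFERRER_SHARE
        payouts[purchase.user_group] = payouts.get(purchase.user_group, 0.0) + share
    if purchase.item_group is not None:
        share = purchase.affiliate_fee * CONTRIBUTOR_SHARE
        payouts[purchase.item_group] = payouts.get(purchase.item_group, 0.0) + share
    # Whatever is not shared with a group stays with the system.
    payouts["system"] = purchase.affiliate_fee - sum(payouts.values())
    return payouts


if __name__ == "__main__":
    # A user contributed by a fan club buys a book that a blog added to the system.
    print(split_fee(Purchase(10.0, user_group="fan-club", item_group="book-blog")))
```

Users and items entered independently of any site or seller simply have no group, in which case the whole fee stays with the system.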

Our plan is to grow the system organically by bootstrapping it on Facebook and OpenSocial. We will do this in a way where critical mass never becomes a significant issue. At a certain point the affinity matrix of objects will be large enough to attract sites, communities, sellers and providers. At that point, we will offer our own API, customizable web interface, or client software, such that a site and its users can interact with the system the way the site sees fit. In the beginning we will use existing affiliate and payment processors, but eventually this will likely be done with our own systems. Our affinity engine and business model represent the ideal win-win solution for sites, sellers and users: better targeting, discovery, user satisfaction, monetization, reduced expenditures, etc. Ultimately, we see this targeting system attracting a significant fraction of on-line sites, communities and commercial entities.

Apr 25, 2008

The Affinity Graph

Is the Affinity Graph the anticipated Internet Singularity?

Tim Berners-Lee, the father of the World Wide Web, has been talking about this concept of the future "Internet of things." By "things" he means the people and other objects on the internet, and he argues that those things and the connections between them are the key aspects of the web. This, he argues, is the primary evolution of the walled gardens of "Web 2.0" into something far more important. He calls this evolution the Giant Global Graph, while others call it Web 3.0 or the Semantic Web.

The use of the term "Graph" has been met with a bit of consternation by those who argue that we already have the term "network" to describe these connections. Robert Scoble describes the difference in terms of social relationships: your social network is who you know, while your Social Graph describes who you are associated with based on common objects of interest (passions, concerns, politics, religion, work, school, etc.). He says: "The Social Graph is NOT my social network. My Social Network is my friends list. But the Social Graph shows a LOT more than that." A Graph, then, is not merely the connections themselves, but also the types, context and strengths of those connections.

While the Graph will ultimately know what is currently song #3 on your iPod, some metadata about the song, as well as all the other people who have the same song as #3 on their iPods, one must wonder "what's the point"? How does this help me discover that I should be a dolphin trainer, or to find new people that share my way of thinking? Once the monstrous amount of data on the Graph is accessible to robots, many will be applying data mining and filtering algorithms, and massive amounts of CPU, to try to generate usable information about the people and other objects on the web.

Tim Berners-Lee envisioned the ability to create "intelligent agents", sort of like advanced email filters, to perform many of the more tedious tasks more easily and quickly. I talked about a similar kind of agent in the post "Your Identity Proxy". Real progress will be achieved when technology is able to offer users a much more personalized and enjoyable experience, and of course better targeting of those users with commercial objects. In practical terms, this will require the storage of as much data as possible about users and their objects so that futuristic computer programs will be able to make sense of the identities of those users and the meanings of those objects, and also to make predictions about the basic affinities between the objects and users. Some even predict that given enough information, "the machine" will begin to transcend the metadata and attain a kind of sentience (or sapience).

This is similar to the ideas of Gary Flake who hypothesized that continued advancements in networked information and other technologies will create a "virtuous cycle" leading to what he terms the "Internet Singularity". As with the Global Graph, we are far from advanced enough technologically to see these concepts realized in the near future.

Let me propose that both the Internet Singularity and the Global Graph are overlapping concepts that are largely achievable today through the Affinity Graph, a major element of this project. As of late 2007, we have had the technology to begin to store the affinity relationships and strengths between users and all other objects on the internet and mobile devices. This is a much simpler abstraction, where we store the most important kind of meaning (affinity) for the typical user. In other words, the most important benefits of the Graph or Singularity, e.g. searching, personalization, and discovery, can be generated, stored and queried in a much more feasible way than is predicted for the Graph, Semantic Web, or Singularity.
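
To make the storage side of this concrete, here is a minimal Python sketch of one way such relationships and strengths could be kept and queried. It is not our implementation: the AffinityGraph class, its method names, the symmetric storage and the 0-to-1 strength scale are all assumptions made for the example, and it says nothing about how the strengths are generated in the first place.

```python
from collections import defaultdict


class AffinityGraph:
    """Typed, weighted relationships between any two entities (users, items, groups)."""

    def __init__(self):
        # entity -> {other entity: (relationship type, strength)}
        self._edges = defaultdict(dict)

    def set_affinity(self, a, b, kind, strength):
        # Storing the edge in both directions is an assumption of this sketch.
        self._edges[a][b] = (kind, strength)
        self._edges[b][a] = (kind, strength)

    def strongest(self, entity, n=3, kind=None):
        """Return up to n (neighbour, strength) pairs, strongest affinity first."""
        neighbours = self._edges.get(entity, {})
        candidates = [
            (other, strength)
            for other, (edge_kind, strength) in neighbours.items()
            if kind is None or edge_kind == kind
        ]
        return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]


if __name__ == "__main__":
    graph = AffinityGraph()
    # Made-up entities and strengths, purely for illustration.
    graph.set_affinity("user:nancy", "song:42", "music", 0.9)
    graph.set_affinity("user:nancy", "job:dolphin-trainer", "vocation", 0.7)
    graph.set_affinity("user:nancy", "user:bob", "person", 0.4)
    print(graph.strongest("user:nancy", n=2))
```

Because edges are stored both ways, the same query run from an item's side returns the users most likely to appreciate that item.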

With the Affinity Graph, the similarity in meaning of objects, including people, will be known. Universal categorization, classification, hierarchies and affinity matching will all be made fairly trivial. Users will have immediate access to their future favorites in every domain of life; likewise objects (and those that care about them) will know which users are likely to most appreciate those objects (marketers? advertisers? evangelists?). This is the point at which the Utopian dreams of internet visionaries are realized. The Affinity Graph does not make irrelevant other forms of abstractions or metadata upon which computer scientists are free to set loose their strong AI. There are many other kinds of meaning, and those will be explored by computers in time.

The Singularity is here, as is the Global Graph, in ways that are most important to users.

Mar 19, 2008

Big World, Short Life

The world is big and life is short. We've solved this problem.

To restate the problem: in our short lives, we are unlikely to ever find the people and things that we would most enjoy and appreciate. This is unfortunate.

Have you been feeling the pain? Not finding your soulmate? No best buddies? Have the suspicion that the most incredible music is out there, somewhere? Feel like you never found your ideal vocation? Actually there is little chance you could have found the optimal things in life. As I mentioned in a previous post, it would take us thousands of years to meet every other human, listen to every song, read every book, evaluate every vocation, etc.

Many of us have grown to accept our mortality, and the tyranny of time. We've had to accept the limitations in the time we have to explore options and find those optimal things. This acceptance has silenced our normally inquisitive and innovative inclinations to find solutions to problems; it seems an insurmountable problem, and, frankly, dwelling on mortality is not entirely pleasant. Those who haven't accepted mortality will deny the existence of the problem and thus the need for a solution.

I didn't set out to solve the 'short life' problem. Actually, that's not entirely true - I'm a huge health and nutrition nut: I plan to be healthy to at least age 120. But in this post, and in this project, I'm not talking about extending human lifespan. It is the 'big world' problem that we are addressing, and the problem may not be so big after all. The innovation came first, and then it occurred to me that the thousands of years it would take to find your favorites could be compressed significantly.

I'll use the analogy from a previous post. Many of us receive hundreds of emails every day. Without an email filter, it would take us hours to sort through and pick out the emails we prefer. We don't have enough time to perform this task, nor would we want to. The email filter, if it works well, presents to you only those emails that you are most likely to prefer. Reviewing your emails becomes a much quicker and simpler task. Information overload is reduced.

In a similar way, our discovery engine sorts through thousands of years of people, media, opportunities, ideas, causes, products, etc, and presents to you only those things you are most likely to prefer.

So you need no longer fear your own mortality. :-)

Mar 12, 2008

Your Identity Proxy

There seems to be a bit of confusion about the distinction between the terms "identity" and "identification" in popular discussion. The terms are often used interchangeably, and are used differently in different contexts. I thought I would write a bit about these and other related concepts, including a new concept that we introduce.

The terms are infused with the complexity of multiple disciplines (philosophy, psychology, sociology, neurology, religion, etc.), each with their own usage and take on the meanings. To add to the complexity, identity is now an important concept with different meanings for government, commerce, and the internet.

Who are you? Are you different from your neighbor? From your identical twin? Is there something about you that distinguishes you from everybody else? The subjective versions of this are the "self-image" (a person's own model of his identity) and the identity perception of someone by others. Is it "the self" or the ego of psychology? Is this the "soul" of certain faiths? Is it the mind? The brain? What about the body? Is identity a product of nature, nurture, or both together? Many questions.

Most of us cannot be relied upon to accurately describe our identities, though sometimes best friends can get pretty close (we get closer, see below). This is the reason metadata contributed by users about themselves or their works is not considered accurate. A personal tag cloud is just an ego trip. It is highly subjective. Web page meta keywords are no longer relied upon by search engines or advertisers because they are so inaccurate. This is the source of the delay in the promised "Semantic Web" revolution.

I like to think of Identity as that mental thingy that distinguishes you from every other person. It is the objective, non-corporeal entity that is the sum of all the biology and environmental influences that constitutes what it is to be you, at this moment in your life. Despite the similarities, you have a different identity from your identical twin because your minds and bodies have had different experiences. You also have a different identity than yourself of one year ago because you've had new experiences... and of course your brain has suffered some oxidative degeneration ("vegetable oil", anybody?).

But in the real world, and for the purposes of government, commerce, and most things that make the world work, it is the corpus that counts. You are you because you are contained within the body of you. Identity equals body. This is the body that is recognized as you by facial recognition, and authenticated by fingerprinting, retinal or corneal scanning, etc. Science fiction has enjoyed this mind-body identity confusion with numerous examples in movies and television ("This body is not mine, and I have to be clever to convince my friends of my true identity").

Now, identification is the assertion that you are actually you ("I may look like a fly, but it's really me!"). Having the face of Nancy is an assertion that you are Nancy, i.e. your friends and family will identify you as being the identity they call "Nancy". Identical twins and masks can confuse the identification in opposite ways.

One can authenticate one's identification using any available mechanism, each of which provides some level of assurance. Visual similarity to your picture ID card (is ID "identity" or "identification"?) is a common form, and voice recognition on the phone is another; "Hello, it's Nancy" works only if you sound like Nancy. We can authenticate the body fairly well, but the mind is more difficult ("Nancy doesn't seem like herself today. Maybe she's been taken over by an alien.").

On the internet, there are various uses for the terms identity, identification, authentication and anonymity. Your Facebook profile is a reflection of your identity, or an exhibition of your identity, most probably with identifying elements like your name and photos. In some cases you may have multiple online "identities" representing different facets of your actual identity. Those facets are sometimes identified by usernames and avatars indicative of the identity or sub-identity or idealized identity they represent.

For a new user, identity may initially not be important: an anonymous user is self contained, requiring no identification or authentication. But as other users get to know that user, they will expect that it is consistently backed by the same identity. As it develops a reputation, the identity behind that user identification will want to maintain exclusive ownership of that identification, via some kind of authentication that ensures such exclusivity.

There are many systems for authentication, each attempting to ensure that the user instance is an active reflection of the same identity. Online banking is an example. There are two levels of authentication here. The first establishes that the owner of the username is the identity called "Nancy" with these identifying personal details. The second establishes that the username instance (i.e. the just-logged-in identity) is also the "Nancy" identity (access management). The first is corpus related: Nancy walks into her bank and gets her login details based on corpus identity. The second is mind identity: does Nancy remember her username and password, or where she scribbled them?

Our project introduces another concept to the scene: the identity proxy. In our case, it is an objective proxy of your identity that makes choices on your behalf, likely the same choices you would make, even when you are not logged in. In a sense, it is like an email filter that follows your instructions and helps you deal with information overload by automating that small bit of your identity that prefers certain emails over others. Ours is much more powerful in reducing information overload because your identity proxy automates the filtration of all available information and options, in every domain of life. Your identity proxy is an accurate and objective reflection of your identity and it understands and automates your decision making processes. There is no greater weapon against the tyranny of choice and information overload.

Without an email filter, it would take us hours per day to delete the spam and read the relevant emails. We would quickly lose patience and only find a fraction of real emails. Likewise, it would take us thousands of years to meet every other human, listen to every song, read every book, evaluate every vocation, etc, in order to find the ones we like. It's a big world, and, sadly, life is short. The identity proxy does not live our lives for us - it makes our lives richer by allowing us to find those things that we wouldn't have found unless we lived for thousands of years.

Also, at its core, the identity proxy requires no corpus identification, i.e. no personal or demographic details are necessary in the registration process. Nobody can use the registration information to track you down (track down the corpus). Privacy is intact.

Your identity proxy is singular. Having more than one identity proxy is a waste of time because every time you register accurately the system should see you as being identical (or close) to your previous proxy. Registering inaccurately serves no purpose because the proxy will make choices that do not reflect your identity, and the choices will not be as fulfilling for you.

Feb 12, 2008

The Serendipity Revolution

Traditionally, the success of recommender systems is evaluated off-line, by measuring how accurately they predict the ratings in existing datasets. For example, see the million dollar Netflix prize for a meager 10% improvement in its collaborative filtering algorithm. Netflix provided access to 100 million of its customers’ movie ratings to train new algorithms and test them. In other words, the algorithm is judged more accurate the more it recommends movies the user has already seen. Recommendations based upon this traditional accuracy metric are not the most useful to users.
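
As a rough Python sketch of what this kind of off-line scoring looks like: the Netflix prize ranked entries by root-mean-square error (RMSE) on held-out ratings, a detail not spelled out above. The function names and the ratings below are made up for the example.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse improves on baseline_rmse (0.10 means 10%)."""
    return (baseline_rmse - new_rmse) / baseline_rmse


if __name__ == "__main__":
    # Held-out ratings that users already gave to movies they already saw.
    actual = [4, 2, 5, 3]
    baseline = [3.5, 3.0, 4.0, 3.5]
    candidate = [3.8, 2.5, 4.5, 3.2]
    gain = relative_improvement(rmse(baseline, actual), rmse(candidate, actual))
    print(f"relative improvement: {gain:.1%}")
```

Note that every score here is computed against a rating the user has already given, so the metric has nothing to say about items the user has never encountered.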

Researchers know that success of recommendations is better measured by recording user satisfaction - the positive emotional response at having discovered something new that one likes. But that is more difficult to measure, as it requires a community of users and a useful mechanism to compel (or at least strongly encourage) the reporting of satisfaction, its strength and perhaps type. Satisfaction with recommendations seems to increase in the following order of recommendation types:

  1. Low quality, low accuracy recommendation. Users obviously don't appreciate having their time wasted in evaluating something that the system should have known the user would not be likely to appreciate. These are "trust-busters"; the user will lose trust in the system.
  2. An accurate, but known recommendation. An item the user is already aware of. The user likes the item, but it is not novel. Trust is maintained because at least the system recommended something that the user already likes. Too many of these recommendations imply an excess number of false-negatives or "missed opportunities".
  3. A novel, but obvious recommendation. A novel recommendation is something new and appreciated, but something the user would have discovered on his/her own. For example, a new song from a favorite musician, or a new movie from a favorite director. The user will have a positive, though muted, reaction. Many users will suspect that there were "missed opportunities", given the huge number of unfamiliar items in any domain.
  4. A serendipitous recommendation. A serendipitous recommendation is something new, non-obvious and appreciated that the user would likely not have discovered on his/her own. For example, an unfamiliar song from an unfamiliar musician, or an unfamiliar movie from an unfamiliar director. The user will likely have a very positive reaction, though it has been argued that, in some users, such recommendations may be seen as obscure and not immediately appreciated.
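
The four types above can be told apart with three yes/no answers about each recommendation. The Python sketch below is just one way to encode that; the three flags (liked, already known, obvious) are my reading of the list, and how they would be collected from users is left open.

```python
from enum import IntEnum


class RecType(IntEnum):
    """Recommendation types, numbered in ascending order of satisfaction."""
    INACCURATE = 1     # low quality, low accuracy: a "trust-buster"
    KNOWN = 2          # accurate, but the user already knew the item
    NOVEL_OBVIOUS = 3  # new, but the user would have found it anyway
    SERENDIPITOUS = 4  # new, non-obvious and appreciated


def classify(liked: bool, already_known: bool, obvious: bool) -> RecType:
    """Map three yes/no answers about a recommendation to its type."""
    if not liked:
        return RecType.INACCURATE
    if already_known:
        return RecType.KNOWN
    if obvious:
        return RecType.NOVEL_OBVIOUS
    return RecType.SERENDIPITOUS


if __name__ == "__main__":
    # A new song from a favorite musician: liked, not yet known, but obvious.
    print(classify(liked=True, already_known=False, obvious=True).name)
```

Using an integer-valued enum keeps the ordering of the list, so two recommendations can be compared directly by type.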

The serendipitous recommendation is obviously the ideal for most users; the problem is that collaborative filters tend to focus on what is commonly known and popular - items that the user has heard about or items that the user would have experienced eventually because of their "blockbuster nature". Many of the most interesting items for the user may be buried in the "long tail", so some collaborative filtering systems have attempted to tweak their algorithms to try to maximize this type of recommendation by reducing the more popular recommendations. Even so, recommendation diversity tends to be reduced in collaborative filtering systems, leading to a large number of false-negatives or "missed opportunities".

Recommendations based on a user's core identity will not focus on the popular, or items from artists or directors the user likes, or that the user's friends like. Instead, the user will be recommended items from the entire item landscape that by definition the user is most likely to appreciate based on that core identity (their "preference engine"). Thus the recommendation diversity (coverage of item space) within a domain (such as music) is as large as the diversity of items within that domain, leading to a large number of serendipitous recommendations - possibly the vast majority. Keep in mind that the number of domains in our community is also unlimited, and the same core identity can be used to recommend anything and everything in life.
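
Coverage of item space is easy to put a number on. The Python sketch below uses a common definition, the fraction of the catalog that appears in at least one user's recommendations; the function name and the toy data are made up for the example, and this is a measurement aid, not part of the recommendation method itself.

```python
def catalog_coverage(recommendations, catalog):
    """Fraction of catalog items recommended to at least one user.

    recommendations: dict mapping each user to a list of recommended items.
    catalog: iterable of every item in the domain.
    """
    catalog = set(catalog)
    if not catalog:
        raise ValueError("catalog must not be empty")
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    # Items outside the catalog are ignored.
    return len(recommended & catalog) / len(catalog)


if __name__ == "__main__":
    catalog = ["hit-1", "hit-2", "tail-1", "tail-2"]
    popular_only = {"ann": ["hit-1", "hit-2"], "bob": ["hit-2", "hit-1"]}
    diverse = {"ann": ["hit-1", "tail-2"], "bob": ["tail-1", "hit-2"]}
    print(catalog_coverage(popular_only, catalog))
    print(catalog_coverage(diverse, catalog))
```

A system that only ever recommends the blockbusters scores low on this measure no matter how accurate it is, which is the "missed opportunities" problem in numeric form.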

Jan 14, 2008

One Degree of Separation

Social networks rely on your primary network - your existing friends and contacts - to introduce you to THEIR friends and contacts. Each person in the network is called a 'node', with one or more connections to other nodes. Each of those connections is sometimes called a degree of separation; a friend of a friend (FOAF) would then be two degrees of separation. The famous phrase "six degrees of separation" was based on work by psychologist Stanley Milgram who determined that any two Americans, connected in the nation-wide extended network, are separated by an average of five intermediaries, i.e. six steps or degrees.
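
Counting degrees of separation is a shortest-path problem, and a breadth-first search is the usual way to solve it. The Python sketch below assumes the network is given as a dictionary of friend lists; the names in it are made up.

```python
from collections import deque


def degrees_of_separation(friends, start, goal):
    """Return the number of connections on the shortest chain, or None if unconnected.

    friends: dict mapping each person to a list of direct contacts.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == goal:
                return distance + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, distance + 1))
    return None


if __name__ == "__main__":
    network = {
        "nancy": ["bob"],
        "bob": ["nancy", "carol"],
        "carol": ["bob", "dave"],
        "dave": ["carol"],
    }
    # Carol is a friend of a friend of Nancy.
    print(degrees_of_separation(network, "nancy", "carol"))
```

The search has to visit every node closer than the target before it finds the answer, which is why following chains several degrees out gets expensive on a large network.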

Despite their connectedness, two people separated by so long a chain are extremely unlikely to ever meet. In fact, we usually only ever meet the friends of our friends: an extremely small fraction of the larger network. Web services like LinkedIn, the business contact network, track your chain to three degrees of separation - though I wonder how often the third degrees ever connect. [Friendster tracked the chain even further, and this pursuit has been credited with Friendster's downfall, as tracking long chains is very difficult computationally and has much larger hardware requirements.]

Online Social Networks are not really social, and the network - as degrees of separation - serves mostly to separate. So, if one really wants to 'kill' social nets, one needs to get rid of the 'net' (the multiple degrees of separation that separate people) in order to bring people together. Jyri Engeström argues that social networks should not be based on individual connections between people that can be counted and accumulated; rather, people must be connected by shared objects. We agree and take this to the next level by making everything in the virtual community an object, where each object is connected to every other object.

The New Paradigm

As proposed in the last post, what is lacking in the current data islands and the proposed schema solutions is a way of harnessing the true power of the collective to actually reduce information overload and increase discovery. This will require a revolution in content and relationship discovery that can only arise with a completely new kind of information filtration and recommender technology.

"The social web will be powered by recommender systems".
Open Issues in Recommender Systems
John Riedl, Bilbao Recommenders School, 2006

The true power of the collective will be realized with the proper integration of social media, new universal discovery techniques, and associated detailed portable identity and personalization info. The result is a Social Web based on one degree of separation: all people and things are related to each other directly, with each such relationship differing only in type and strength. The following graphic is a representation of such a "one degree" circle of people relationships, but keep in mind that each person is also similarly related to all items, ideas, endeavors, etc. in the system as well.

Critical to this new paradigm are the new universal discovery techniques that I've hinted at previously. Current recommender systems, including collaborative filters, are too primitive and limited to accomplish the task. Instead, we have applied certain bioinformatics concepts to solve the puzzle of simulating the human preference engine without requiring "strong AI". This starts with a quick determination of a person's "core identity", that internal mechanism which is responsible for generating appreciation, and sifting through the chaos and making choices.

Determining that "core identity" is a critical breakthrough as it allows us to quantify the relationship (strength and type) between all people, and between all people and all other things in the system. It also can yield portable data that can be used to quantify such relationships between users and items from multiple data islands, and can even be used in mobile devices and in real-world activity. This discovery system involves no collaborative filtering, psychological testing or interpretation, statistical or stochastic methods, etc.

"But there is no go-to discovery engine - yet. Building a personalized discovery mechanism will mean tapping into all the manners of expression, categorization, and opinions that exist on the Web today. It's no easy feat, but if a company can pull it off and make the formula portable so it works on your mobile phone - well, such a tool could change not just marketing, but all of commerce."
The race to create a 'smart' Google
by Jeffrey M. O'Brien, writing for Fortune Magazine

In addition to the current benefits of the social web, the integration of these universal discovery techniques will allow:

  • A brief one-page registration with no need for private information. Qualifies as 'Cold-Start' for people and also items, ideas, endeavors, etc.
  • Immediate access to promising relationships of all types, i.e. universal recommendations. These relationships are the predicted interest and affinity between a person and all other people, music, movies, books, recreation, groups, products, services, ads, travel destinations, vocations, jobs, teams, politics, religion, ideas, websites, articles, news items, games, etc.
  • Portable data that can be compared and relationships quantified. This portable data can be used between social and data islands, for mobile devices and in real-world activity.
  • No language or cultural barriers: no folksonomy or semantic constraints.
  • No need for existing relationships. Emphasis is on relationship discovery, though existing friends and contacts are revealing.
  • No need to observe history of actions and choices. A one-page registration is enough to provide significantly more information, and better information, than collaborative filters can accumulate.
  • The new system will act as a good friend who knows you well and delivers trusted recommendations of all types, both solicited and unsolicited.
  • Reduced privacy concerns as personal or demographic data is unnecessary.
  • Automatic person-level granularity. Each relationship has a strength and type.
  • Universal recommendations allow for highly successful affiliations of all types, direct sales and downloads, and highly targeted advertising as the diverse business model.
  • Ratio of discovery to effort is high. No need for constant messages, spam, requests, friend searches, etc.
  • Discovery is filtration, so 'information overload' and the 'tyranny of choice' are greatly reduced.
  • Enables highly personalized search engine functionality, news aggregation, and many other forms of person-level information filtration.
  • Constant excitement of discovery, so no "what's next?" reaction. No limit to novelty and interest, little boredom. No feeling of wasted time.
  • Highly useful and usable: the keys to success of any product or service.