The Affinax Project

May 6, 2018

Is the Singularity an Emergent Property of the Virtuous Cycle iterating within a colossal Theory of Mind matrix?

I recently took a look at the original patent application and realized that Google has referenced it in ten of their patents. The application has also been referenced by Microsoft, Apple, IBM, eBay, Yahoo!, Kodak, BT, Fujitsu, Nokia, Myspace, and many more. I'm not sure how much this indicates adoption of some of these ideas, but it got me thinking about Affinax in the context of the state of recommender systems and AI today. I must admit that even after some years of working on other things I would still be very interested in collaborating with those in the field who might want to, or are currently trying to, develop and implement these ideas as revolutionary and disruptive products. So here's my attempt to summarize Affinax and some of its potential benefits and implications in hopes of starting some new conversations about it.

A good way to understand Affinax is in terms of the theory of mind (ToM). ToM is central to human social behavior and may be the key factor underlying our distinctive cognitive abilities. For example, ToM may have provided the foundation for the emergence of music and language in human evolution. Essentially, it is the capacity to explain and predict people’s observable actions in terms of unobservable mental states such as beliefs and desires. Our brains may be constructing internal representations of other people, running simulations and making predictions. I've referred to this concept in the past as a good friend who knows you well, and as your identity proxy. Affinax mirrors ToM except that it considers all things in the world (people, groups, products, other objects, ideas, etc.) to be social agents because all things in the world form relationships with humans. The mental attitudes of (or for) these agents can be reduced to mental representations (i.e. mental models) in the form of human ontology (traits, demographic data, interests, beliefs, affiliations, history, desires, genetics, etc.).

In Affinax, an agent's representation is built from its relative affinities for other agents: it takes on the ontologies of those agents' representations in proportion to its affinity for each. That representation in turn contributes its ontology to the representations of agents that have an affinity for it, and so on. A virtuous cycle is established within the increasingly dense (and accurate) matrix of representations, whose emergent property can be thought of as a universal social engine. This engine, similar to ToM in humans, is able to make immediate predictions of interest, behavior, risk, etc. that would disrupt multiple industries including marketing, recreation, career, social, medicine, insurance, etc.
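The cycle described above can be sketched in a few lines of Python. This is only an illustration, not the method of the patent application: the function name `propagate`, the `blend` weight, the synchronous update rule, and the example agents and traits are all assumptions made for the example.

```python
# Illustrative sketch only; the update rule and all names are assumptions.

def propagate(ontology, affinity, blend=0.5, rounds=10):
    """Let each agent absorb the traits of the agents it has affinity for.

    ontology: {agent: {trait: weight}}, the starting representations
    affinity: {agent: {other_agent: strength}}, strengths are >= 0
    blend:    share of each round's representation that is inherited
    """
    current = {agent: dict(traits) for agent, traits in ontology.items()}
    for _ in range(rounds):
        updated = {}
        for agent, traits in current.items():
            links = affinity.get(agent, {})
            total = sum(links.values())
            if total == 0:
                # No affinities recorded: the representation stays as it is.
                updated[agent] = dict(traits)
                continue
            trait_names = set(traits)
            for other in links:
                trait_names.update(current.get(other, {}))
            blended = {}
            for name in trait_names:
                # Affinity-weighted average of this trait across linked agents.
                inherited = sum(
                    strength * current.get(other, {}).get(name, 0.0)
                    for other, strength in links.items()
                ) / total
                blended[name] = (1 - blend) * traits.get(name, 0.0) + blend * inherited
            updated[agent] = blended
        current = updated
    return current


ontology = {
    "alice": {},
    "bob": {},
    "jazz_album": {"jazz": 1.0},
    "hiking_club": {"outdoors": 1.0},
}
affinity = {
    "alice": {"jazz_album": 0.9, "hiking_club": 0.1},
    "bob": {"alice": 1.0},  # bob only knows alice, yet inherits her traits
}
representations = propagate(ontology, affinity)
```

In this toy run `alice` ends up mostly "jazz" with a little "outdoors", and `bob`, who has expressed an affinity only for `alice`, picks up the same profile second-hand. That is the "and so on" of the paragraph above in miniature.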

The patent application describes a very simple system and example whose implications may be missed at first glance. But the expanded system evolves (develops iteratively) into a dense matrix of ontology and affinity that would represent everything in human context. All things become predictive social behavioral representations, as per ToM, which enables the development of products that benefit from precise targeting and behavioral prediction.

Other implications of this technology are reflected in the intentionally sensational title of this post. For your consideration:

Just as consciousness is an emergent property of the human brain, artificial consciousness and intelligence may result from similar emergent properties of a complex, socially informed, non-biological system.
In humans, the machinery that computes information about other people’s awareness may be the same machinery that computes information about our own awareness. ToM, which Affinax mirrors, may have been a key factor in the development of self-consciousness.
The way in which a thing relates to humanity may be considered the ultimate meaning of that thing, and therefore constitutes a reasonable building block for consciousness or sentience in a system (see below) processing the meanings of many such things, such as Affinax.
Affinax is a learning prediction engine, and prediction is a computational correlate of consciousness.
The subjective internal experience that we call consciousness may simply be the flood of ontological data associated with whatever is the focus of awareness or attention at any particular moment. Your ontological data for the color blue, based in part on the history of your personal experiences with blue, i.e. the associations, significance, emotion or "meaning" you attach to it (see above), are different from mine and that's why it is said that we see it differently. Affinax would be able to provide similar meaning to any non-biological ToM system focusing on a particular object or idea and would therefore provide it with the same internal experience of consciousness that we humans feel.

This post is not an appropriate forum to describe the above concepts in greater detail. Needless to say, there has been much intellectual product generated on the subject matter that has not been published and may be of interest to technology companies either already working on the technology, or with the resources to do so. Anybody interested in a collaboration?

Nov 26, 2009

What is the Affinax Project?

The Affinax Project is attempting to solve one of the most complex and interesting puzzles on the internet: how to predict a person's future favorites in any domain of life (social, career, products, services, media, etc.).

We have developed a completely novel technology, a cross-pollination between Bioinformatics and the Semantic Web. If we are right, it will make such predictions cold, rapidly, without tracking a user's behavior, without data mining, without a lengthy registration process, and in any domain. It should do so without many of the drawbacks of existing recommender technologies.

Due to the nature of the technology, it is not possible to perform a simulation or testing using Netflix data. We must build a live proof of concept with real users and real objects to match them to. This will require a few Facebook applications and some efficient algorithms.

We hope to evaluate the technology to academic standards in order to demonstrate that it works. Join us and help us solve this puzzle - any and all relevant skills are welcome.

Jun 18, 2008

Changing the World: The Business Model

What is the business model for the next internet revolution? In this article, I review web monetization issues, especially that of web 2.0. I propose a monetization solution where any site with users, commercial items, and even visitors, can significantly increase its revenue and reduce marketing and advertising expenses. Our affinity targeting system monetizes itself in the process.

Traditionally, business models for web applications, communities, blogs, etc. are an afterthought. Apps and networking sites dream of reaching critical mass and then selling to Google, Microsoft, Yahoo, etc. Thus the revenue model is actually an exit strategy. This dream has been fueled by the observation that the purchase price for such sites is related to reach ("eyeballs", size of the audience). This is reminiscent of Metcalfe's Law. A more thorough analysis of the market value of social networks was recently posted in TechCrunch, by Michael Arrington.

A very few fortunate web startup founders do not need to consider a business model beyond their big exit, even in the current economic climate. The new owners, however, will be forced to monetize their sexy new purchase. For the vast majority of web startup founders the business model will be important and is often considered and tested from the very start.

The default monetization method is advertising, preferred by 58% of web startups (this figure includes affiliate marketing) according to Bizak. Of the strictly advertising sites, Google's AdSense is adopted by 54%. I imagine this number is higher for web 2.0 social sites. Nonetheless, AdSense earnings per visitor (EPV) are the lowest among the various monetization methods. As an example, Tom OKeefe writes about Mahalo's poor Google AdSense earnings, and Allen Stern predicts that affiliate revenue could surpass Google AdSense revenue for Mahalo in the long-term. Decrying AdSense as "worthless", Tom OKeefe asks "What's Next?".

Many of the hugely popular sites are struggling to better monetize. YouTube, for example, is struggling to justify its $1.65 billion purchase price. Facebook faces a rough road ahead, with "only" $150 million in ad sales in 2007 and projections of $265 million in 2008. Aidan Henry proposes solutions to the "perennial debate surrounding Twitter's revenue model", and the CEO of Mahalo, Jason Calacanis, even chimes in with his own Twitter business model suggestions.

This struggle may no longer be necessary. Our novel Affinity Targeting technology allows a user to be targeted to entities they are most likely to appreciate, in any domain of life. On-line communities and sites with users can increase their earnings by adding both their site and users into our system. Those users are then targeted to entities of interest (products, services, media, jobs, sites, other users, etc.). Targeting, leading to a commercial transaction, will result in affiliate revenue, part of which is shared with the originating site of the purchasing user. According to Bizak.com, affiliate earnings per visitor are 16 times greater than AdSense earnings. An affiliate model with highly specific user targeting should increase such earnings significantly.

The benefits don't end at monetizing eyeballs; sites and sellers can precisely target users to themselves and their items, thereby increasing sales and reducing costs. Communities, groups and fan clubs all seek to attract enthusiastic members. In our system, users will be targeted to the communities they are most likely to appreciate, leading to increased membership and customers. Also, sellers and providers benefit by precise targeting of users to products and services they are most likely to purchase. This will increase sales, and reduce dependence on marketing, advertising, SEO, etc. All sellers and providers are required to do is profile their products, services, jobs, etc. for the system (in the unique way we need the info) and agree to our affiliate model. There are no other costs to them.

The figure above depicts a solution to several critical needs: internet sites and sellers must increase their revenue, reduce expenses, and attract the most ideal new users or members. In our solution, sites and sellers add their existing users (no private information is required) and/or items into the system. Users are then targeted (via the targeting engine) to three different kinds of entities (circles): other users (if they are so inclined), groups (sellers, sites, communities, etc.), and items (products, services, media, jobs, etc.). When a user is targeted to a commercial item and makes a purchase, the seller provides an affiliate fee to the system, part of which is shared with the group that brought the user to the system. Also, if a group added an affiliate item into the system that they are not directly selling (for example, an Amazon.com book), part of any affiliate fee earned from that item is shared with that group. Follow the green arrows to see the flow of money. Note that sites and sellers may contribute users and/or items, and users and/or items may be entered independently of a site or seller.
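To make the flow of money concrete, here is a rough sketch of how one affiliate fee might be divided. The post does not state any actual percentages, so the 30% and 20% shares below, along with the `Purchase` class and the `split_affiliate_fee` function, are purely hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Purchase:
    price: float
    affiliate_rate: float             # fraction of the price the seller pays as a fee
    user_group: Optional[str] = None  # group that brought the user in, if any
    item_group: Optional[str] = None  # group that added the item without selling it


def split_affiliate_fee(purchase, user_group_share=0.30, item_group_share=0.20):
    """Return (system_amount, group_payouts) for one purchase.

    The share values are hypothetical; the post only says that "part" of
    the fee goes to each contributing group.
    """
    fee = purchase.price * purchase.affiliate_rate
    group_payouts = {}
    if purchase.user_group is not None:
        group_payouts[purchase.user_group] = (
            group_payouts.get(purchase.user_group, 0.0) + fee * user_group_share
        )
    if purchase.item_group is not None:
        group_payouts[purchase.item_group] = (
            group_payouts.get(purchase.item_group, 0.0) + fee * item_group_share
        )
    # Whatever is not shared with a group stays with the system.
    system_amount = fee - sum(group_payouts.values())
    return system_amount, group_payouts


system_amount, group_payouts = split_affiliate_fee(
    Purchase(price=100.0, affiliate_rate=0.10,
             user_group="music_forum", item_group="book_blog")
)
```

A user or item that was entered independently of any site simply leaves the corresponding group field empty, in which case the system keeps that portion of the fee.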

Our plan is to grow the system organically by bootstrapping it on Facebook and OpenSocial. We will do this in a way where critical mass never becomes a significant issue. At a certain point the affinity matrix of objects will be large enough to attract sites, communities, sellers and providers. At that point, we will offer our own API, customizable web interface, or client software, such that a site and its users can interact with the system the way the site sees fit. In the beginning we will use existing affiliate and payment processors, but eventually this will likely be done with our own systems. Our affinity engine and business model represent the ideal win-win solution for sites, sellers and users: better targeting, discovery, user satisfaction, monetization, reduced expenditures, etc. Ultimately, we see this targeting system attracting a significant fraction of on-line sites, communities and commercial entities.

Apr 25, 2008

The Affinity Graph

Is the Affinity Graph the anticipated Internet Singularity?

Tim Berners-Lee, the father of the World Wide Web, has been talking about this concept of the future "Internet of things." By "things" he means the people and other objects on the internet, and he argues that those things and the connections between them are the key aspects of the web. This, he argues, is the primary evolution of the walled gardens of "Web 2.0" into something far more important. He calls this evolution the Giant Global Graph, while others call it Web 3.0 or the Semantic Web.

The use of the term "Graph" has been met with a bit of consternation by those who argue that we already have the term "network" to describe these connections. Robert Scoble describes the difference in reference to social relationships where your social network is who you know, while your Social Graph describes who you are associated with based on common objects of interest (passions, concerns, politics, religion, work, school, etc.). He says: "The Social Graph is NOT my social network. My Social Network is my friends list. But the Social Graph shows a LOT more than that." A Graph, then, is not merely the connections themselves, but the types, context and strengths of those connections.

While the Graph will ultimately know what is currently song #3 on your iPod, some metadata about the song, as well as all the other people who have the same song as #3 on their iPods, one must wonder "what's the point"? How does this help me discover that I should be a dolphin trainer, or to find new people that share my way of thinking? Once the monstrous amount of data on the Graph is accessible to robots, many will be applying data mining and filtering algorithms, and massive amounts of CPU, to try to generate usable information about the people and other objects on the web.

Tim Berners-Lee envisioned the ability to create "intelligent agents", sort of like advanced email filters, to perform many of the more tedious tasks, easier and faster. I talked about a similar kind of agent in the post "Your Identity Proxy". Real progress will be achieved when future technology will be able to offer the users a much more personalized and enjoyable experience, and of course better targeting of those users with commercial objects. In practical terms, this will require the storage of as much data as possible about users and their objects so that futuristic computer programs will be able to make sense of the identities of those users and the meanings of those objects, and also to make predictions about the basic affinities between the objects and users. Some even predict that given enough information, "the machine" will begin to transcend the metadata and attain a kind of sentience (or sapience).

This is similar to the ideas of Gary Flake who hypothesized that continued advancements in networked information and other technologies will create a "virtuous cycle" leading to what he terms the "Internet Singularity". As with the Global Graph, we are far from advanced enough technologically to see these concepts realized in the near future.

Let me propose that both the Internet Singularity and the Global Graph are overlapping concepts that are largely achievable today through the Affinity Graph, a major element of this project. As of late 2007, we have had the technology to begin to store the affinity relationships and strengths between users and all other objects on the internet and mobile devices. This is a much simpler abstraction, where we store the most important kind of meaning (affinity) for the typical user. In other words, the most important benefit of the Graph or Singularity, e.g. searching, personalization, and discovery, can be generated, stored and queried in a much more feasible way than is predicted for the Graph, Semantic Web, or Singularity.
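As a rough picture of how small this abstraction is, the following sketch stores nothing but affinity strengths between things and answers one kind of query. The class name `AffinityGraph`, its methods, the 0-to-1 strength scale and the example entries are assumptions for illustration, not a description of the actual system.

```python
from collections import defaultdict


class AffinityGraph:
    """Stores directed affinity strengths between any two things.

    A hypothetical minimal structure: users, songs, jobs and groups are
    all just string identifiers, and the only data kept is a strength.
    """

    def __init__(self):
        self._edges = defaultdict(dict)

    def set_affinity(self, source, target, strength):
        if not 0.0 <= strength <= 1.0:
            raise ValueError("strength must be between 0 and 1")
        self._edges[source][target] = strength

    def affinity(self, source, target):
        # Unknown pairs default to 0.0 (no recorded affinity).
        return self._edges.get(source, {}).get(target, 0.0)

    def favorites(self, source, k=3):
        # Strongest affinities first; ties are broken alphabetically.
        ranked = sorted(
            self._edges.get(source, {}).items(),
            key=lambda pair: (-pair[1], pair[0]),
        )
        return [target for target, _ in ranked[:k]]


graph = AffinityGraph()
graph.set_affinity("nancy", "song:blue_in_green", 0.9)
graph.set_affinity("nancy", "job:dolphin_trainer", 0.7)
graph.set_affinity("nancy", "group:chess_club", 0.2)
top_two = graph.favorites("nancy", k=2)
```

The point of the sketch is the design choice: one kind of edge (affinity with a strength) across every domain, rather than a rich schema of typed relationships.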

With the Affinity Graph, the similarity in meaning of objects, including people, will be known. Universal categorization, classification, hierarchies and affinity matching will all be made fairly trivial. Users will have immediate access to their future favorites in every domain of life; likewise objects (and those that care about them) will know which users are likely to most appreciate those objects (marketers? advertisers? evangelists?). This is the point at which the Utopian dreams of internet visionaries are realized. The Affinity Graph does not make irrelevant other forms of abstractions or metadata upon which computer scientists are free to set loose their strong AI. There are many other kinds of meaning, and those will be explored by computers in time.

The Singularity is here, as is the Global Graph, in ways that are most important to users.

Mar 19, 2008

Big World, Short Life

The world is big and life is short. We've solved this problem.

To restate the problem: in our short lives, we are unlikely to ever find the people and things that we would most enjoy and appreciate. This is unfortunate.

Have you been feeling the pain? Not finding your soulmate? No best buddies? Have the suspicion that the most incredible music is out there, somewhere? Feel like you never found your ideal vocation? Actually there is little chance you could have found the optimal things in life. As I mentioned in a previous post, it would take us thousands of years to meet every other human, listen to every song, read every book, evaluate every vocation, etc.

Many of us have grown to accept our mortality, and the tyranny of time. We've had to accept the limitations in the time we have to explore options and find those optimal things. This acceptance has silenced our normally inquisitive and innovative inclinations to find solutions to problems; it seems an insurmountable problem, and, frankly, dwelling on mortality is not entirely pleasant. Those who haven't accepted mortality will deny the existence of the problem and thus the need for a solution.

I didn't set out to solve the 'short life' problem. Actually, that's not entirely true - I'm a huge health and nutrition nut: I plan to be healthy to at least age 120. But in this post, and in this project, I'm not talking about extending human lifespan. It is the 'big world' problem that we are addressing, and the problem may not be so big after all. The innovation came first, and then it occurred to me that the thousands of years it would take to find your favorites could be compressed significantly.

I'll use the analogy from a previous post. Many of us receive hundreds of emails every day. Without an email filter, it would take us hours to sort through and pick out the emails we prefer. We don't have enough time to perform this task, nor would we want to. The email filter, if it works well, presents to you only those emails that you are most likely to prefer. Reviewing your emails becomes a much quicker and simpler task. Information overload is reduced.

In a similar way, our discovery engine sorts through thousands of years of people, media, opportunities, ideas, causes, products, etc., and presents to you only those things you are most likely to prefer.

So you need no longer fear your own mortality. :-)

Mar 12, 2008

Your Identity Proxy

There seems to be a bit of confusion about the distinction between the terms "identity" and "identification" in popular discussion. The terms are often used interchangeably, and are used differently in different contexts. I thought I would write a bit about these and other related concepts, including a new concept that we introduce.

The terms are infused with the complexity of multiple disciplines (philosophy, psychology, sociology, neurology, religion, etc.), each with their own usage and take on the meanings. To add to the complexity, identity is now an important concept with different meanings for government, commerce, and the internet.

Who are you? Are you different from your neighbor? From your identical twin? Is there something about you that distinguishes you from everybody else? The subjective versions of this are the "self-image" (a person's own model of his identity) and the identity perception of someone by others. Is it "the self" or the ego of psychology? Is this the "soul" of certain faiths? Is it the mind? The brain? What about the body? Is identity a product of nature, nurture, or both together? Many questions.

Most of us cannot be relied upon to accurately describe our identities, though sometimes best friends can get pretty close (we get closer, see below). This is the reason metadata contributed by users about themselves or their works is not considered accurate. A personal tag cloud is just an ego trip. It is highly subjective. Web page meta keywords are no longer relied upon by search engines or advertisers because they are so inaccurate. This is the source of the delay in the promised "Semantic Web" revolution.

I like to think of Identity as that mental thingy that distinguishes you from every other person. It is the objective, non-corporeal entity that is the sum of all the biology and environmental influences that constitutes what it is to be you, at this moment in your life. Despite the similarities, you have a different identity from your identical twin because your minds and bodies have had different experiences. You also have a different identity than yourself of one year ago because you've had new experiences... and of course your brain has suffered some oxidative degeneration ("vegetable oil", anybody?).

But in the real world, and for the purposes of government, commerce, and most things that make the world work, it is the corpus that counts. You are you because you are contained within the body of you. Identity equals body. The body that is recognized as you by facial recognition, and authenticated by fingerprinting, retinal or corneal scanning, etc. Science fiction has enjoyed this mind-body identity confusion with numerous examples in movies and television ("This body is not mine, and I have to be clever to convince my friends of my true identity").

Now, identification is the assertion that you are actually you ("I may look like a fly, but it's really me!"). Having the face of Nancy is an assertion that you are Nancy, i.e. your friends and family will identify you as being the identity they call "Nancy". Identical twins and masks can confuse the identification in opposite ways.

One can authenticate an identification with whatever mechanism is available, each providing some level of assurance. Visual similarity to your picture ID card (is ID "identity" or "identification"?) is a common form, voice recognition on the phone is another common one; "Hello, it's Nancy" works only if you sound like Nancy. We can authenticate the body fairly well, but the mind is more difficult ("Nancy doesn't seem like herself today. Maybe she's been taken over by an alien.").

On the internet, there are various uses for the terms identity, identification, authentication and anonymity. Your Facebook profile is a reflection of your identity, or an exhibition of your identity, most probably with identifying elements like your name and photos. In some cases you may have multiple online "identities" representing different facets of your actual identity. Those facets are sometimes identified by usernames and avatars indicative of the identity or sub-identity or idealized identity they represent.

For a new user, identity may initially not be important: an anonymous user is self contained, requiring no identification or authentication. But as other users get to know that user, they will expect that it is consistently backed by the same identity. As it develops a reputation, the identity behind that user identification will want to maintain exclusive ownership of that identification, via some kind of authentication that ensures such exclusivity.

There are many systems for authentication, each attempting to ensure that the user instance is an active reflection of the same identity. Online banking is an example. There are two levels of authentication here. First, the owner of the username is the identity called "Nancy" with these identifying personal details. Second, that the username instance (i.e. the just logged in identity) is also the "Nancy" identity (access management). The first is corpus related: Nancy walks into her bank and gets her login details based on corpus identity. The second is mind identity: does Nancy remember her username and password, or where she scribbled them?

Our project introduces another concept to the scene: the identity proxy. In our case, it is an objective proxy of your identity that makes choices on your behalf, likely the same choices you would make, even when you are not logged in. In a sense, it is like an email filter that follows your instructions and helps you deal with information overload by automating that small bit of your identity that prefers certain emails over others. Ours is much more powerful in reducing information overload because your identity proxy automates the filtration of all available information and options, in every domain of life. Your identity proxy is an accurate and objective reflection of your identity and it understands and automates your decision making processes. There is no greater weapon against the tyranny of choice and information overload.

Without an email filter, it would take us hours per day to delete the spam and read the relevant emails. We would quickly lose patience and only find a fraction of real emails. Likewise, it would take us thousands of years to meet every other human, listen to every song, read every book, evaluate every vocation, etc., in order to find the ones we like. It's a big world, and, sadly, life is short. The identity proxy does not live our lives for us - it makes our lives richer by allowing us to find those things that we wouldn't have found unless we lived for thousands of years.

Also, at its core, the identity proxy requires no corpus identification, i.e. no personal or demographic details are necessary in the registration process. Nobody can use the registration information to track you down (track down the corpus). Privacy is intact.

Your identity proxy is singular. Having more than one identity proxy is a waste of time because every time you register accurately the system should see you as being identical (or close) to your previous proxy. Registering inaccurately serves no purpose because the proxy will make choices that do not reflect your identity, and the choices will not be as fulfilling for you.

Feb 12, 2008

The Serendipity Revolution

Traditionally, the success of recommender systems is evaluated by the predictive accuracy of their recommendations, measured off-line against existing datasets. For example, see the million dollar Netflix prize for a meager 10% improvement of their collaborative filtering algorithm. Netflix provided access to 100 million of its customers’ movie ratings to train new algorithms and test them. In other words, the algorithm is judged more accurate the more it recommends movies the user has already seen. Recommendations based upon this traditional accuracy metric are not the most useful to users.
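For readers unfamiliar with this kind of off-line scoring, here is a minimal sketch using root-mean-square error (RMSE), the metric the Netflix Prize was scored on. The ratings below are made-up toy numbers and the function names are my own; the point is only that the score can be computed solely on items users have already rated.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and held-out ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Toy data: ratings that users already gave, hidden from the algorithms.
held_out = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 3]
candidate_predictions = [4, 3, 4, 2]

baseline_error = rmse(baseline_predictions, held_out)
candidate_error = rmse(candidate_predictions, held_out)
improvement = relative_improvement(baseline_error, candidate_error)
```

Nothing in this calculation asks whether a recommendation was new to the user or whether the user was pleased to discover it, which is the gap the rest of this post is about.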

Researchers know that the success of recommendations is better measured by recording user satisfaction - the positive emotional response at having discovered something new that one likes. But that is more difficult to measure, as it requires a community of users and a useful mechanism to compel (or at least strongly encourage) the reporting of satisfaction, its strength and perhaps its type. Satisfaction seems to increase in the order of the following recommendation types:

Low quality, low accuracy recommendation. Users obviously don't appreciate having their time wasted in evaluating something that the system should have known the user would not be likely to appreciate. These are "trust-busters"; the user will lose trust in the system.
An accurate, but known recommendation. An item the user is already aware of. The user likes the item, but it is not novel. Trust is maintained because at least the system recommended something that the user already likes. Too many of these recommendations imply an excess number of false-negatives or "missed opportunities".
A novel, but obvious recommendation. A novel recommendation is something new and appreciated, but something the user would have discovered on his/her own. For example, a new song from a favorite musician, or a new movie from a favorite director. The user will have a positive, though muted, reaction. Many users will suspect that there were "missed opportunities", given the huge number of unfamiliar items in any domain.
A serendipitous recommendation. A serendipitous recommendation is something new, non-obvious and appreciated that the user would likely not have discovered on his/her own. For example, an unfamiliar song from an unfamiliar musician, or an unfamiliar movie from an unfamiliar director. The user will likely have a very positive reaction, though it has been argued that, in some users, such recommendations may be seen as obscure and not immediately appreciated.
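The four types above amount to a small decision procedure, sketched here in Python. The enum names and the three yes/no inputs are my own shorthand for the distinctions in the list; in a live system those answers would have to come from user feedback, which is exactly the hard part.

```python
from enum import IntEnum


class RecommendationType(IntEnum):
    # Ordered by the user satisfaction each type tends to produce.
    TRUST_BUSTER = 1
    ACCURATE_BUT_KNOWN = 2
    NOVEL_BUT_OBVIOUS = 3
    SERENDIPITOUS = 4


def classify(liked, already_known, would_have_found_anyway):
    """Map three yes/no answers about a recommendation to its type."""
    if not liked:
        return RecommendationType.TRUST_BUSTER
    if already_known:
        return RecommendationType.ACCURATE_BUT_KNOWN
    if would_have_found_anyway:
        return RecommendationType.NOVEL_BUT_OBVIOUS
    return RecommendationType.SERENDIPITOUS


# A new song from a favorite musician: liked, not yet known, but obvious.
new_song_from_favorite = classify(
    liked=True, already_known=False, would_have_found_anyway=True
)
# An unfamiliar song from an unfamiliar musician that the user loves.
hidden_gem = classify(
    liked=True, already_known=False, would_have_found_anyway=False
)
```

Because the enum is ordered, two recommenders could be compared by the average type of what they recommend, assuming the feedback needed to classify each item is available.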

The serendipitous recommendation is obviously the ideal for most users. The problem is that collaborative filters tend to focus on what is commonly known and popular - items that the user has heard about or items that the user would have experienced eventually because of their "blockbuster nature". Many of the most interesting items for the user may be buried in the "long tail", so some collaborative filtering systems have attempted to tweak their algorithms to try to maximize this type of recommendation by reducing the more popular recommendations. Even so, recommendation diversity tends to be reduced in collaborative filtering systems, leading to a large number of false-negatives or "missed opportunities".

Recommendations based on a user's core identity will not focus on the popular, or items from artists or directors the user likes, or that the user's friends like. Instead, the user will be recommended items from the entire item landscape that by definition the user is most likely to appreciate based on that core identity (their "preference engine"). Thus the recommendation diversity (coverage of item space) within a domain (such as music) is as large as the diversity of items within that domain, leading to a large number of serendipitous recommendations - possibly the vast majority. Keep in mind that the number of domains in our community is also unlimited, and the same core identity can be used to recommend anything and everything in life.
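Recommendation diversity in the sense used above (coverage of item space) is easy to quantify. The sketch below computes the fraction of a catalog that shows up in at least one user's recommendations. The function name, the eight-item catalog and both sets of recommendation lists are invented for illustration and are not measurements of any real system.

```python
def catalog_coverage(recommendations, catalog):
    """Fraction of catalog items recommended to at least one user.

    recommendations: {user: [item, ...]}
    catalog:         iterable of every item in the domain
    """
    catalog_items = set(catalog)
    if not catalog_items:
        raise ValueError("catalog must not be empty")
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    # Ignore anything recommended that is not actually in the catalog.
    return len(recommended & catalog_items) / len(catalog_items)


catalog = ["a", "b", "c", "d", "e", "f", "g", "h"]

# Hypothetical popularity-driven lists: everyone gets the same blockbusters.
popularity_driven = {"u1": ["a", "b"], "u2": ["a", "b"], "u3": ["b", "a"]}

# Hypothetical identity-driven lists: each user draws from the whole catalog.
identity_driven = {"u1": ["a", "f"], "u2": ["c", "h"], "u3": ["d", "e"]}

popular_coverage = catalog_coverage(popularity_driven, catalog)
identity_coverage = catalog_coverage(identity_driven, catalog)
```

In this made-up example the popularity-driven lists touch a quarter of the catalog while the identity-driven lists touch three quarters of it. Coverage alone says nothing about whether users liked what they were shown, so it complements satisfaction measures rather than replacing them.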