Graphs are, quite simply, a universal method for representing relationships among entities – starting with immediate connections, then “hopping” to connections of connections of connections. The farther out they go, the broader the tree becomes.
To make sense of this, graph neural networks (GNNs) are often applied. These deep learning models are specialized for understanding graphs.
Still, when it comes to today’s social networks, GNNs are less than optimal. When applied to determine connections between friends, acquaintances and professional colleagues, they often can’t compute the nuances and complex degrees of relationships. This makes it difficult for platforms like LinkedIn, Twitter, Facebook and Instagram to make accurate recommendations – a task that is core to their mission.
To overcome such inherent challenges with GNNs and improve its own recommendation abilities, LinkedIn has created a process it calls Performance-Adaptive Sampling Strategy (PASS). PASS uses AI to select the most relevant neighbors in a graph, thus improving predictive accuracy.
After applying the new GNN model to its own recommendation engines, the professional networking platform has just released PASS to the open-source community.
“We want to benchmark our methods for other researchers’ data sets,” said Jaewon Yang, a senior staff software engineer at LinkedIn who spearheaded PASS. “We’re hoping that they can build off our networks.”
At a high level, LinkedIn uses GNNs to understand relationships among individual members, groups, skills and interests – on primary, secondary, tertiary levels and beyond – to help inform recommendations.
PASS’s unique neighbor selection AI model homes in on this, according to its creators, by looking at each neighbor’s attributes to decide whether to select it. It can also help detect whether one of these neighbors is actually a bot or a fake account by determining the authenticity of its connections. This adaptive model learns how to select the neighbors that boost predictive accuracy.
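As a rough illustration of attribute-based neighbor selection, the sketch below scores each neighbor by how closely its attribute vector matches the target member's, turns the scores into probabilities, and samples accordingly. It is not LinkedIn's implementation: the dot-product score, the two-dimensional attribute vectors, and the names `selection_probabilities` and `sample_neighbors` are all assumptions made for the example, and the scoring here is fixed rather than learned from task performance as the article says PASS's is.

```python
import math
import random


def selection_probabilities(target, neighbors):
    """Softmax over dot-product similarity between target and neighbor attributes."""
    scores = {
        name: sum(t * a for t, a in zip(target, attrs))
        for name, attrs in neighbors.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(score - top) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def sample_neighbors(target, neighbors, k, rng):
    """Draw up to k distinct neighbors, favoring higher-probability ones."""
    remaining = selection_probabilities(target, neighbors)
    chosen = []
    while remaining and len(chosen) < k:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


# Hypothetical attributes: [interest in software, interest in cooking].
target_member = [1.0, 0.0]
candidate_neighbors = {
    "colleague": [0.9, 0.1],
    "recruiter": [0.7, 0.2],
    "friend": [0.1, 0.9],
}
probabilities = selection_probabilities(target_member, candidate_neighbors)
selected = sample_neighbors(target_member, candidate_neighbors, k=2, rng=random.Random(0))
```

In this toy setup the colleague and recruiter, whose attributes resemble the target's, are more likely to be sampled than the friend from an unrelated field.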
“Sometimes people miss some other titles that may be very relevant for a job posting or other recommendation,” Yang said. “We want to accurately understand who this user is following, we want to understand what other users are following the users.”
Traditional GNNs can be difficult to scale to social networks because such networks present so many potential relationships, not all of which are relevant to a given task, Yang said. For example, one member’s connections might be personal friends working in different fields, and including them can diminish recommendation accuracy.
Meanwhile, an influencer or a prominent public figure might have connections in the hundreds of millions – flouting the “Dunbar’s number” sociological theory that any one person can only have a certain number of friends, Yang pointed out – and it is impossible to compute them all.
“These present an explosive number of data points that have to be considered,” he said. “We cannot consider all of them, we need to sample a few.”
Some existing methods have attempted to overcome scaling challenges by sampling a fixed number of “neighbors,” thereby reducing inputs to the GNN. But such samplers are not fully representative, Yang said, and do not consider what neighbors might prove most relevant.
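The fixed-size approach can be sketched in a few lines. This is a minimal example of uniform neighbor sampling, assuming a graph stored as a plain dictionary of adjacency lists; the function name `sample_fixed_neighbors` and the toy data are hypothetical and not taken from any particular library.

```python
import random


def sample_fixed_neighbors(adjacency, node, k, rng=None):
    """Return at most k neighbors of `node`, chosen uniformly at random."""
    rng = rng or random.Random()
    neighbors = adjacency.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)  # fewer than k neighbors: keep them all
    return rng.sample(neighbors, k)


# Hypothetical toy graph: member -> list of connections.
adjacency = {
    "influencer": [f"follower_{i}" for i in range(1000)],
    "member": ["friend_a", "friend_b"],
}
sampled = sample_fixed_neighbors(adjacency, "influencer", k=5, rng=random.Random(0))
few = sample_fixed_neighbors(adjacency, "member", k=5)
```

Every neighbor has the same chance of being picked, which keeps the input to the GNN small but ignores which neighbors matter for the task. That is the shortcoming Yang describes.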
Other organizations are rolling out similar platforms that attempt to boost existing GNNs. For instance, Yale University and IBM recently proposed a concept they call Kernel Graph Neural Networks (KerGNNs), which integrate graph kernels into GNN message passing, the process by which vector messages are exchanged between nodes in a graph and subsequently updated. The KerGNN method has resulted in improved model interpretability compared with conventional GNNs, according to Yale researchers.
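For readers unfamiliar with message passing, here is a minimal sketch of one generic round of it, assuming mean aggregation over plain Python lists. It is not the KerGNN method, which adds graph kernels to this step; the function name `message_passing_step` and the three-node graph are made up for illustration.

```python
def message_passing_step(features, adjacency):
    """One round: each node averages its own vector with its neighbors' vectors."""
    updated = {}
    for node, vector in features.items():
        # Collect the "messages": the current vectors of all neighbors.
        messages = [features[neighbor] for neighbor in adjacency.get(node, [])]
        stacked = [vector] + messages
        # Update the node by averaging each dimension across the stack.
        updated[node] = [sum(values) / len(stacked) for values in zip(*stacked)]
    return updated


# Hypothetical three-node graph with two-dimensional node vectors.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
updated = message_passing_step(features, adjacency)
```

Repeating the step lets information travel one hop farther each round, so two rounds give every node a view of its connections' connections.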
Similarly, Google has released TensorFlow Graph Neural Networks, a library designed to make it easier to work with graph-structured data in its TensorFlow machine learning (ML) framework. Twitter, Pinterest, Airbnb and others are also performing research and releasing tools to help tackle GNN limitations.
PASS has been shown to achieve higher prediction accuracy even as it uses fewer neighbors than other GNN models. In experiments on seven public benchmark graphs and two LinkedIn graphs, PASS outperformed GNN methods by up to 10.4%. It was also up to 3 times more accurate than baseline methods when so-called “noisy edges” were added to the graphs.
In open sourcing PASS, LinkedIn hopes that other researchers will discover novel ways to apply the platform, Yang said, making it more expressive, more flexible and easier to model, and addressing its limitations so that it can be used in a broader variety of applications.
“This technology is evolving very fast,” said Romer Rosales, senior director for AI at LinkedIn. “We are just scratching the surface in terms of all the uses that this can have. There is a lot of room for us to grow, and for the general community to grow in this space.”
LinkedIn researchers will continue to refine PASS to tackle larger and larger datasets without losing expressive power, he said. The goal is to eventually automate certain processes that still require human provisioning – such as specifying parameters around how to sample hops and whether the system should identify two hops, three hops, or further along the chain.
“This is fertile ground for trying these new ideas,” Rosales said. “We hope that other communities will also join us, and we will join other communities in trying and sharing these experiences.”
PASS is one of several projects that LinkedIn has open-sourced this year, he pointed out. Another is FastTreeSHAP, a package for the Python programming language. It helps interpret algorithm results more efficiently to improve transparency in AI, supporting explainable AI that builds trust and augments decision-making for models such as business predictive, recruiter search and job search models. It also helps modelers debug and make overall improvements.
Another project is Feathr, a feature store that makes ML feature management easier at scale and improves developer productivity. Dozens of applications use the feature store to define features, compute them for training, deploy them in production, and share them across teams. Feathr users have reported that it significantly reduces the time required to add new features to model training workflows and improves runtime performance compared with previous app-specific feature pipeline tools.
“PASS is one example in a long line of AI projects that we’ve opened to the community,” Rosales said, “in an effort to share our experience and help build more scalable, expressive, and responsible AI algorithms and tools.”