EDITOR’S NOTE: At TheBestSchools.org, we rank schools and degree programs; that’s our claim to fame. Over the years we have strived to provide insightful rankings that help prospective students make wise choices about their academic careers. That said, we have always been concerned about subjective factors in academic rankings: they can skew the results, and although they can perhaps be minimized, they seem impossible to eliminate entirely. We have also grown concerned about the increasingly common cat-and-mouse game between schools and ranking organizations, in which schools adapt themselves to popular academic rankings not to substantively improve education at their institutions but simply to “game” the rankings and rise in them. How to control for such biases has been an ongoing discussion among our staff here at TBS.
Enter InfluenceNetworks.com. Over the past months we have partnered with this Silicon Valley startup to develop a fundamentally new approach to academic rankings. This approach uses machine learning and search algorithms to characterize academic influence on the web, and thereby avoids the bias of continual human intervention that infects most academic rankings. The results of this partnership have been groundbreaking. Below is a short white paper by two of InfluenceNetworks.com’s key technologists laying out its algorithm and how it generates academic rankings. The rankings we uncover are based on the influential faculty and alums associated with schools and degree programs. We predict that this Influence Networks approach will fundamentally transform academic rankings, if for no other reason than that what ultimately ought to elevate one school or degree program above another is the comparative degree to which members of its academic community are influential in the discipline under consideration.
To appreciate the power of this new influence-based method of ranking academic programs, see our article “Influence Ranking of Top Academic Degree Programs.”
Stanford and MIT are widely considered to be among the most influential computer science programs. Academic ranking organizations like U.S. News & World Report agree that this is so. But if it is so, why is it so?
In order to answer this question, our approach at InfluenceNetworks.com is to begin by asking what it means for a particular school to be influential. A school is made up of individuals, notably the people who teach and have studied at that institution. It’s these people who determine the school’s degree of influence.
Accordingly, a school is influential not so much in its own right as through the individuals who are its lifeblood. Specifically, it has influence through the individuals who teach at the institution, as well as through those who were taught there and subsequently went on to make significant contributions to their fields.
Thus, we propose to evaluate the influence of an institution by evaluating the influence of the individuals who teach at (present tense) or were taught by (past tense) the institution. This leaves us with the data science problem of evaluating the influence of an individual on a particular topic. For instance, Donald Knuth is clearly a highly influential computer scientist, and Adam Smith an influential economist. But how can we measure the degree of their influence?
We propose to evaluate the influence of an individual by investigating the co-occurrence of the individual and the topic (over which the individual exercises influence) in web-accessible documents. Our guiding hypothesis is as follows: if a person is influential in a particular topic, such as computer science or economics, then this person should be mentioned often in discussions of that topic.
For example, the Wikipedia page on sorting networks begins, “In computer science, comparator networks are abstract devices…,” indicating that the page discusses computer science. Later the article says that “Donald Knuth describes how the comparators for binary integers can be implemented as simple, three-state electronic devices,” indicating that Donald Knuth is related to the subject of the page. Taken together, this indicates that Donald Knuth has influence in the area of computer science.
As another example, there is a list of faculty books associated with the Cowles Foundation for Research in Economics. The word “economics” appears repeatedly on the page, as would be expected, but the same page also includes the phrase, “Ever since Adam Smith, the central teaching of economics has been that free markets provide us with material well-being…” Since the page references both economics and Adam Smith, it provides an indication of Adam Smith’s influence on economics.
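The co-occurrence test behind these two examples can be sketched in a few lines. This is a minimal illustration, not the InfluenceRanker code: the helper names `count_mentions` and `co_occurs` are hypothetical, person names are matched case-sensitively and topics case-insensitively (an assumption, since topics usually appear in lowercase in running text), and the two documents are abbreviated excerpts of the passages quoted above.

```python
import re


def count_mentions(text: str, phrase: str, ignore_case: bool = False) -> int:
    """Count whole-phrase occurrences of `phrase` in `text`."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags))


def co_occurs(text: str, person: str, topic: str) -> bool:
    """True when a document names both the person and the topic.

    Person names are matched case-sensitively; topics case-insensitively.
    """
    return (count_mentions(text, person) > 0
            and count_mentions(text, topic, ignore_case=True) > 0)


# Abbreviated excerpts of the two passages quoted above.
sorting_networks = ("In computer science, comparator networks are abstract devices. "
                    "Donald Knuth describes how the comparators for binary integers "
                    "can be implemented as simple, three-state electronic devices.")
cowles_books = ("Ever since Adam Smith, the central teaching of economics has been "
                "that free markets provide us with material well-being.")

print(co_occurs(sorting_networks, "Donald Knuth", "computer science"))  # True
print(co_occurs(cowles_books, "Adam Smith", "economics"))               # True
print(co_occurs(cowles_books, "Donald Knuth", "economics"))             # False
```

A bare yes/no co-occurrence is only the starting signal; the weighting described below refines it by how important the document is and how prominently the person and topic appear.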
However, not all documents are created equal. One will find numerous pages devoted to the heterodox positions of obscure economists. This is influence of a sort, but it should not count as the same level of influence as that of economists discussed in more mainstream sources such as Wikipedia or the pages of major universities. Fortunately, the problem of identifying important pages has many proposed solutions, such as PageRank, Harmonic Centrality, or the Katz index. By employing such techniques, we can obtain a measure of the importance of a document and take it into account.
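To make the idea of document importance concrete, here is a textbook PageRank computed by power iteration over a tiny link graph. It is a sketch under stated assumptions: the graph and document names are invented, the damping factor of 0.85 is the conventional default rather than anything the white paper specifies, and harmonic centrality and the Katz index are not shown.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each document to the documents it links to. Documents with
    no outgoing links ("dangling" pages) spread their rank evenly over all pages.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        # Each page passes a damped share of its rank to every page it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical link graph: nobody links to the obscure blog post.
links = {
    "wiki/Sorting_network": ["wiki/Donald_Knuth", "univ/cs"],
    "wiki/Donald_Knuth": ["wiki/Sorting_network", "univ/cs"],
    "univ/cs": ["wiki/Donald_Knuth"],
    "blog/obscure": ["wiki/Donald_Knuth"],
}
for doc, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{doc:22s} {score:.3f}")
```

Because no page links to the blog post, it receives only the baseline teleportation share and ends up with the lowest score, which is exactly the down-weighting of marginal sources the paragraph above calls for.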
In order to assess the validity of our approach, we indexed a vast number of web-accessible sources to use as our base documents. These included the entire English Wikipedia corpus, a collection of high-profile university websites, the Notable Names Database (NNDB), and pages linked directly from the DMOZ directory. Anybody who is anybody in the academic world should, at the very least, leave a footprint in such pages. There is, of course, much more content that could have been included (and our InfluenceRanker algorithm can digest such additional information). Nevertheless, any influential academic should appear in these data. Moreover, documents from outside these sources would be expected to carry less weight, and so would affect the influence rankings only weakly.
We also require some basic data about each person. For this, we use Wikidata, which provides structured data about a wide variety of people. For example, it records the dates of birth and death, occupations, schools attended, employers, and fields of study of a wide range of academics. Given a particular academic discipline, we can identify the people whose occupations correspond to that field and who have worked for given academic institutions. These people constitute the set of academics for that field.
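The selection step can be sketched as a simple filter over person records. This is a hypothetical illustration: the `Person` record and the `FIELD_OCCUPATIONS` mapping are assumptions, with fields loosely modeled on Wikidata's "occupation" (P106) and "employer" (P108) properties; the records are typed in by hand rather than fetched from Wikidata.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    occupations: set[str] = field(default_factory=set)  # cf. Wikidata "occupation" (P106)
    employers: set[str] = field(default_factory=set)    # cf. Wikidata "employer" (P108)


# Hypothetical mapping from a discipline to the occupation labels that count toward it.
FIELD_OCCUPATIONS = {
    "computer science": {"computer scientist"},
    "economics": {"economist"},
}


def academics_for_field(people: list[Person], discipline: str,
                        institutions: set[str]) -> list[Person]:
    """People whose occupation matches the discipline and who worked at one of the institutions."""
    wanted = FIELD_OCCUPATIONS.get(discipline, set())
    return [p for p in people
            if p.occupations & wanted and p.employers & institutions]


people = [
    Person("Donald Knuth", {"computer scientist", "mathematician"}, {"Stanford University"}),
    Person("Adam Smith", {"economist", "philosopher"}, {"University of Glasgow"}),
]
schools = {"Stanford University", "University of Glasgow"}
print([p.name for p in academics_for_field(people, "computer science", schools)])
# prints ['Donald Knuth']
```

Keeping the occupation-to-discipline mapping explicit makes it easy to decide, per field, which occupation labels should count; in practice that mapping would need curating.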
Given a document, we identify potential names by looking for text in title case. “Title case” refers to text in which the first letter of each word is capitalized, except for certain key words such as “of,” “in,” “the,” etc. This will identify phrases such as “Donald Knuth,” but also “Weyland Smith Academy.” For each potential person, we check the name against the known names of academics in the field under consideration. If the name matches an academic, we add to the influence of that academic. Any other potential names are ignored. Note that this means that “Weyland Smith Academy” is not considered a reference to Weyland Smith.
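A minimal sketch of this candidate-extraction and matching step follows. The connector list, the tokenizer, and the function names are assumptions for illustration (the white paper does not spell out its tokenizer), and the sketch handles ASCII text only. Punctuation ends a phrase, lowercase connectors are allowed inside a phrase but trimmed from its end, and only exact matches against the known names are counted.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed list of lowercase words allowed inside a title-case phrase.
CONNECTORS = {"of", "in", "the", "and", "for", "at"}

# A token is either an ASCII word or a single punctuation character;
# punctuation ends the current phrase. (ASCII-only for brevity.)
TOKEN = re.compile(r"[A-Za-z][A-Za-z-]*|[^\sA-Za-z]")


def title_case_phrases(text: str) -> list[str]:
    """Return maximal runs of capitalized words, allowing connector words inside."""
    phrases: list[list[str]] = []
    current: list[str] = []
    for tok in TOKEN.findall(text):
        if tok[0].isupper():
            current.append(tok)
        elif tok in CONNECTORS and current:
            current.append(tok)
        else:
            if current:
                phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    result = []
    for words in phrases:
        while words and words[-1] in CONNECTORS:  # drop trailing "of", "the", ...
            words.pop()
        result.append(" ".join(words))
    return result


def count_academic_mentions(text: str, known_names: set[str]) -> Counter:
    """Count candidate phrases that exactly match a known academic's name."""
    return Counter(p for p in title_case_phrases(text) if p in known_names)


known = {"Donald Knuth", "Weyland Smith"}
text = "Donald Knuth spoke at Weyland Smith Academy. Later, Donald Knuth returned."
print(title_case_phrases(text))
# ['Donald Knuth', 'Weyland Smith Academy', 'Later', 'Donald Knuth']
print(count_academic_mentions(text, known))
# Counter({'Donald Knuth': 2})
```

Because matching is on the whole maximal phrase, “Weyland Smith Academy” never credits Weyland Smith, which is the behavior the paragraph above describes; stray candidates such as “Later” are simply discarded.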
The exact formula used to evaluate the influence is proprietary. It derives from the provisional patent: “METHOD FOR COMPUTING AN INFLUENCE RANKING IN BIDIRECTIONAL HYPERLINKED DATABASES WITH PERSON BIOGRAPHIES” by Erik Larson. This method considers how important a document is based on PageRank and Harmonic Centrality. Less important documents attribute less influence to the individual. It also considers how often a person or topic is mentioned on the page. The more often an individual is mentioned on a page, the more likely the page is about the person. The same holds true for a topic. If the same person is mentioned repeatedly on a page that also repeatedly mentions a topic, this strongly indicates that the person is connected to the topic. The method also considers how close to the start of a document a person is mentioned. The earlier a person is mentioned, the more likely it is that the document is about that person.
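Since the exact formula is proprietary, the function below is only a hypothetical stand-in that combines the four signals the paragraph lists: document importance, how often the person is mentioned, how often the topic is mentioned, and how early the person first appears. The log damping of mention counts and the linear position factor (1.0 at the start of the document, 0.5 at the end) are our own illustrative choices, not the patented method.

```python
import math


def mention_weight(doc_importance: float, person_mentions: int, topic_mentions: int,
                   first_mention_offset: int, doc_length: int) -> float:
    """Hypothetical stand-in for the proprietary per-document influence weight.

    doc_importance: e.g. a PageRank or harmonic-centrality score for the document.
    person_mentions / topic_mentions: how often each is named in the document.
    first_mention_offset: character offset of the person's first mention.
    """
    if person_mentions == 0 or topic_mentions == 0 or doc_length <= 0:
        return 0.0
    # Repeated mentions of both person and topic help, with diminishing returns.
    frequency = math.log1p(person_mentions) * math.log1p(topic_mentions)
    # 1.0 for a mention at the very start, falling linearly to 0.5 at the end.
    position = 1.0 - 0.5 * min(first_mention_offset, doc_length) / doc_length
    return doc_importance * frequency * position


early = mention_weight(0.4, 3, 5, first_mention_offset=20, doc_length=2000)
late = mention_weight(0.4, 3, 5, first_mention_offset=1800, doc_length=2000)
print(early > late)  # True
```

Multiplying the factors means a document contributes nothing unless both the person and the topic appear, which mirrors the co-occurrence hypothesis; any real weighting would need to be tuned against data.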
Once we attribute an influence weight to a particular individual for a particular topic in each document, we can aggregate across the entire collection of documents to obtain that individual’s influence on the topic as a whole. Individuals of the caliber of Donald Knuth or Adam Smith will be mentioned across a wide variety of pages discussing their respective disciplines. By aggregating across documents, we single out individuals who are widely discussed rather than merely mentioned a few times.
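One simple way to aggregate is to sum each person’s per-document weights for a topic. Plain summation is an assumption here (the white paper does not say how the aggregation is done), and the weights and the name “Obscure Author” below are invented for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def rank_people(per_doc_weights: list[tuple[str, str, float]],
                topic: str) -> list[tuple[str, float]]:
    """Sum each person's per-document weights for one topic, highest total first.

    per_doc_weights holds one (person, topic, weight) triple per document in
    which the person was credited.
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for person, doc_topic, weight in per_doc_weights:
        if doc_topic == topic:
            totals[person] += weight
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Invented weights: one person credited in several documents, one in a single document.
weights = [
    ("Donald Knuth", "computer science", 0.25),
    ("Donald Knuth", "computer science", 0.5),
    ("Donald Knuth", "computer science", 0.25),
    ("Obscure Author", "computer science", 0.75),
    ("Adam Smith", "economics", 0.5),
]
print(rank_people(weights, "computer science"))
# [('Donald Knuth', 1.0), ('Obscure Author', 0.75)]
```

Even though the single document crediting “Obscure Author” carries a higher weight than any one Knuth document, the sum rewards breadth of discussion, which is the effect the paragraph above is after.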
With measures of influence for individuals in hand, we can finally obtain the degree of influence of the school (or degree program) as a whole by aggregating across the individuals who taught or studied at that school. The entire process is depicted in the following figure:
Our method thus eliminates the need for surveys and other manual or subjective methods, and evaluates the relative influence of an institution directly, using a measure of relevance and importance derived from the institution’s most influential members.
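The final step, from individuals to institutions, might look like the following sketch. It assumes each person’s influence for a topic has already been computed and that affiliations (taught at or studied at) are known; the people, schools, and numbers are hypothetical. The optional `top_k` parameter, also an assumption, scores a school by its k most influential members only, which is one way to reflect “most influential members” without simply rewarding size.

```python
from __future__ import annotations

from collections import defaultdict


def institution_scores(person_influence: dict[str, float],
                       affiliations: dict[str, set[str]],
                       top_k: int | None = None) -> list[tuple[str, float]]:
    """Rank institutions by the influence of the people who taught or studied there.

    If top_k is given, only each institution's top_k most influential members count.
    """
    members: defaultdict[str, list[float]] = defaultdict(list)
    for person, influence in person_influence.items():
        for inst in affiliations.get(person, set()):
            members[inst].append(influence)
    scores = {}
    for inst, values in members.items():
        values.sort(reverse=True)
        kept = values if top_k is None else values[:top_k]
        scores[inst] = sum(kept)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical people and schools; P2 taught at one school and studied at the other.
influence = {"P1": 3.0, "P2": 1.0, "P3": 0.5}
affiliations = {"P1": {"School X"}, "P2": {"School X", "School Y"}, "P3": {"School Y"}}
print(institution_scores(influence, affiliations))
# [('School X', 4.0), ('School Y', 1.5)]
print(institution_scores(influence, affiliations, top_k=1))
# [('School X', 3.0), ('School Y', 1.0)]
```

A plain sum favors large institutions with many moderately influential members; capping at `top_k` (or normalizing by size) is a design choice that changes the ranking’s character and would need to be justified separately.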
To our knowledge, this pure data science approach to academic rankings is unique to the industry, and opens up powerful research methods that help to bring academic rankings fully into the exciting new world of machine learning and big data.
We also believe such an approach shows promise for novel commercial as well as government applications. We are currently using our InfluenceRanker algorithm to evaluate influence across a broad spectrum of topics important to the intelligence community under a DARPA SBIR Phase II contract.