First Name: Ali
Last Name: Daud
Supervisor Name: Zhou LiZhou
University: Tsinghua University
Keywords: Latent Semantics, Group Level, Directed Probabilistic Topic Models, Temporal Maven Finding, Dynamic Research Interests Finding, Academic Social Network
Publication Date: March 11, 2015
Domain: Computer Science / IT

Group Level Temporal Academic Social Network Mining through Topic Models (2010)

Social networks are growing quickly and have recently become an important part of the World Wide Web. With the emergence of the Web, one typical kind of social network, the academic social network, has gained new life. Consequently, research has become faster-paced and more challenging, while it has also become easier to find out who is working on which topics and to distinguish solved from unsolved problems in specific domains. Knowing the patterns and behaviors of researchers in academic social networks is very useful for several academic recommendation tasks, such as finding collaborators, assigning reviewers to papers, and finding program committee members for conferences. At the same time, academic social networks present several interesting knowledge discovery problems and challenges, such as community mining, temporal maven (expert) finding with minimal use of information, and analysis of the evolution of researchers' interests, which are the focus of this thesis.
Previous studies in academic social network mining can be divided into two frameworks according to the methods used. The first framework is based on link connectivity between actors, such as co-author and citation relationships, all of which create links between actors or entities. Algorithms such as PageRank and HITS essentially exploit incoming and outgoing links. Because these algorithms do not use the text of documents, they ignore the latent semantics present between documents. The second framework consists of topic modeling algorithms, such as Latent Dirichlet Allocation, which use the text of documents and thereby capture the latent semantics. However, topic modeling has so far dealt only with element (document) level structures while ignoring group (conference or journal) level structures. Group level structures are the focus of this work, together with the importance of temporal information in these dynamic academic social networks.
Topic modeling based on a latent topic layer, which exposes the semantic relations among words, helps with handling information for different problems in these networks by capturing the latent semantics. We begin by providing basic knowledge about topic modeling and its importance. Next, a variety of topic modeling techniques are proposed or introduced for solving academic social network mining tasks; they are extensions of Latent Dirichlet Allocation (LDA) that capture group level structures and take time into account in order to cope with explicit grouping structures and temporal trends.
In this thesis, we first highlight the importance of group level structures in comparison with element level structures, using conference mining tasks as an example. We show how group level topic modeling, which is simpler but has richer semantics, yields dense topics, and how these dense topics lead to more precise discovery of topically related conferences and their associations than element level topic modeling (sparse topics).
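The difference between element level and group level input can be illustrated with a minimal sketch. It is not the thesis's model, only an assumed preprocessing step: the toy records, the function names, and the use of paper titles as text are all hypothetical. It shows how a topic model's "documents" change when every paper of a conference is pooled into one bag of words.

```python
from collections import Counter, defaultdict

# Hypothetical toy records (conference, paper title); not taken from DBLP.
papers = [
    ("KDD", "mining frequent patterns in data streams"),
    ("KDD", "clustering large data sets"),
    ("SIGIR", "ranking documents for web search"),
    ("SIGIR", "query expansion for search engines"),
]

def element_level_docs(records):
    """Build one bag of words per paper (element level)."""
    return [Counter(title.split()) for _, title in records]

def group_level_docs(records):
    """Build one bag of words per conference (group level)."""
    groups = defaultdict(Counter)
    for venue, title in records:
        groups[venue].update(title.split())
    return dict(groups)

element_docs = element_level_docs(papers)
group_docs = group_level_docs(papers)

# Each group level document pools the words of all papers in that venue,
# so word co-occurrence counts are less sparse than in single papers.
for venue, bag in group_docs.items():
    print(venue, sum(bag.values()), "tokens,", len(bag), "distinct words")
```

Either collection could then be passed to an LDA implementation; under this assumption, the group level variant gives the model fewer but longer documents, one per conference.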
Second, we highlight the importance of the influence of conference-based relationships for finding experts in a totally unsupervised way while considering different time frames, by proposing the Temporal-Maven-Topic (TMT) approach. TMT can provide a promising and realistic solution to the expert finding problem without using the impact factors of publication venues, the number of students supervised, or other supervised information, which is usually difficult to collect.
Third, we highlight why dynamic research interests are important to discover: the Web is highly dynamic and authors' research interests change, because new projects keep starting on new problems in different research areas. Intuitively, the idea of the Author-Topic (AT) model for finding research interests is combined with the idea of the Topics over Time (TOT) model, which takes time into account to capture topic dynamics, to propose the Temporal-Author-Topic approach for finding dynamic research interests.
We investigate and evaluate topic models for the above-mentioned research topics on a large dataset from DBLP, a well-known academic social network. Our experiments show that the proposed approaches obtain significant improvements over previous work.
