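A Research-Interest Profile of Professor Zhi-Hua Zhou

Built with Jupyter Notebook

Starting from Professor Zhi-Hua Zhou's CV website, this article collects data related to his research interests and analyzes it to build a profile.

This document is the text-only version; to view the notebook with source code, go here.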

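A. Overview

From the homepage of his CV website: I have wide research interests, mainly including artificial intelligence, machine learning, data mining, pattern recognition, evolutionary computation and multimedia retrieval, among which machine learning and data mining are my core research areas. I am particularly interested in the problem of how to enable computing machines to handle “ambiguity”.

To summarize, his research interests are roughly:

  • Artificial intelligence
  • Machine learning
  • Data mining
  • Pattern recognition
  • Evolutionary computation
  • Multimedia retrieval

Next, we gather his papers into a dataset so that his research interests can be estimated quantitatively.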

B. Designing the Dataset

M. Xu, Y.-F. Li, and Z.-H. Zhou. Robust multi-label learning with PRO loss. IEEE Transactions on Knowledge and Data Engineering, in press.

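Each publication entry consists of four fields: authors, title, journal or conference, and year. Two more features are added, the article's category and its link, so the raw dataset has six features in total.

Sample record: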

| Category | Authors | Title | Journal or conference | Year | Link |
| --- | --- | --- | --- | --- | --- |
| Multi-Label Learning | M. Xu, Y.-F. Li, and Z.-H. Zhou. | Robust multi-label learning with PRO loss. | IEEE Transactions on Knowledge and Data Engineering, | in press. | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/tkde19proloss.pdf |
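Splitting one entry into the four fields could be sketched as follows. This is only an illustration, not the notebook's actual script: the `CITATION` pattern and the `parse_citation` helper are hypothetical names, and the pattern assumes entries shaped exactly like the sample above ("Authors. Title. Venue, year."). Entries with page numbers, locations, or periods inside the title would need extra handling.

```python
import re
from typing import Optional

# Pattern for entries shaped like "Authors. Title. Venue, year."
# Assumption: the author list ends at the first period that follows a
# surname (two or more lowercase letters), which skips initials like "Y.-F.".
CITATION = re.compile(
    r"^(?P<authors>.+?[a-z]{2,}\.)\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<venue>.+),\s+"
    r"(?P<year>\d{4}|in press)\.?$"
)


def parse_citation(line: str) -> Optional[dict]:
    """Split one citation line into authors, title, venue and year.

    Returns None when the line does not follow the assumed layout.
    """
    match = CITATION.match(line.strip())
    return match.groupdict() if match else None


entry = (
    "M. Xu, Y.-F. Li, and Z.-H. Zhou. Robust multi-label learning with "
    "PRO loss. IEEE Transactions on Knowledge and Data Engineering, in press."
)
record = parse_citation(entry)
print(record)
```

Returning `None` for lines that do not match makes it easy to list the entries that still need manual attention.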

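This layout looks reasonable, so the next step is a script that collects and organizes the remaining samples, followed by a patient round of data cleaning. (The website's formatting is very messy, so some records are still not fully clean; the table below still shows stray "</font>" tags and a missing space in "andZ.-H. Zhou".)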

| | Category | Authors | Title | Journal or conference | Year | Link |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Multi-Label Learning | M. Xu, Y.-F. Li, andZ.-H. Zhou. | Robust multi-label learning with PRO loss | IEEE Transactions on Knowledge and Data Engine... | in press | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |
| 1 | Multi-Label Learning | T.-Z. Wang, S.-J. Huang, and Z.-H. Zhou. | Towards identifying causal relation between in... | Proceedings of the 19th SIAM International Con... | 2019 | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |
| 2 | Multi-Label Learning | S.-J. Huang, W. Gao, and Z.-H. Zhou. | Fast multi-instance multi-label learning | IEEE Transactions on Pattern Analysis and Mach... | in press | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |
| 3 | Multi-Label Learning | S.-Y. Li, Y. Jiang, N. V. Chawla, andZ.-H. Zhou. | Multi-label learning from crowds | IEEE Transactions on Knowledge and Data Engine... | 2019 | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |
| 4 | Multi-Label Learning | Y. Zhu, K. M. Ting, andZ.-H. Zhou. | Multi-label learning with emerging new labels | IEEE Transactions on Knowledge and Data Engine... | 2018 | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |
| ... | ... | ... | ... | ... | ... | ... |
| 529 | Miscellaneous | Z.-H. Zhou. | Three perspectives of data mining</font> | Artificial Intelligence | 2003 | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |
| 530 | Miscellaneous | Z.-H. Zhou and S.-F. Chen. | Evolving fault-tolerant neural networks</font> | Neural Computing &amp; Applications | 2003 | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |
| 531 | Miscellaneous | Z.-H. Zhou. | Review on <i>Data Mining: Concepts and Techniq... | IEEE Transactions on Neural Networks | 2002 | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |
| 532 | Miscellaneous | Z.-H. Zhou, S.-F. Chen, and Z.-Q. Chen. | Improving tolerance of neural networks against... | Proceedings of the INNS-IEEE International Joi... | 2001 | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |
| 533 | Miscellaneous | Z.-H. Zhou, S. Chen, and Z. Chen. | FANNC: A fast adaptive neural network classifi... | Knowledge and Information Systems | 2000 | https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ... |

534 rows × 6 columns
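The leftovers visible in the table (stray tags such as `</font>`, the entity `&amp;`, and `and` glued to the next initial) could be removed with a small helper along these lines. This is a minimal sketch, not the notebook's own cleaning code; `clean_field` and the two patterns are names made up for the example.

```python
import html
import re

# Assumed patterns, chosen to match the leftovers visible in the table.
TAG = re.compile(r"</?[A-Za-z][^>]*>")        # stray tags such as </font> or <i>
GLUED_AND = re.compile(r"\band(?=[A-Z]\.)")   # "andZ.-H." should be "and Z.-H."


def clean_field(text: str) -> str:
    """Return a field with entities decoded, tags removed and spacing fixed."""
    text = html.unescape(text)          # "&amp;" becomes "&"
    text = TAG.sub("", text)            # drop leftover HTML tags
    text = GLUED_AND.sub("and ", text)  # restore the missing space
    return re.sub(r"\s+", " ", text).strip()


print(clean_field("Evolving fault-tolerant neural networks</font>"))
print(clean_field("Neural Computing &amp; Applications"))
print(clean_field("M. Xu, Y.-F. Li, andZ.-H. Zhou."))
```

Decoding entities before stripping tags means escaped tags are removed as well, which may or may not be desirable for other fields.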

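C. Feature Processing

To estimate research interests, the dataset is processed as follows.

  1. Derive an author rank from the author list (that is, which position Professor Zhou holds among the authors)
  2. Drop the link column (analysis of the article content is set aside for now)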

| | Category | Authors | Title | Journal or conference | Year | Author rank |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Multi-Label Learning | M. Xu, Y.-F. Li, andZ.-H. Zhou. | Robust multi-label learning with PRO loss | IEEE Transactions on Knowledge and Data Engine... | in press | 3 |
| 1 | Multi-Label Learning | T.-Z. Wang, S.-J. Huang, and Z.-H. Zhou. | Towards identifying causal relation between in... | Proceedings of the 19th SIAM International Con... | 2019 | 3 |
| 2 | Multi-Label Learning | S.-J. Huang, W. Gao, and Z.-H. Zhou. | Fast multi-instance multi-label learning | IEEE Transactions on Pattern Analysis and Mach... | in press | 3 |
| 3 | Multi-Label Learning | S.-Y. Li, Y. Jiang, N. V. Chawla, andZ.-H. Zhou. | Multi-label learning from crowds | IEEE Transactions on Knowledge and Data Engine... | 2019 | 4 |
| 4 | Multi-Label Learning | Y. Zhu, K. M. Ting, andZ.-H. Zhou. | Multi-label learning with emerging new labels | IEEE Transactions on Knowledge and Data Engine... | 2018 | 3 |
| ... | ... | ... | ... | ... | ... | ... |
| 529 | Miscellaneous | Z.-H. Zhou. | Three perspectives of data mining</font> | Artificial Intelligence | 2003 | 1 |
| 530 | Miscellaneous | Z.-H. Zhou and S.-F. Chen. | Evolving fault-tolerant neural networks</font> | Neural Computing &amp; Applications | 2003 | 1 |
| 531 | Miscellaneous | Z.-H. Zhou. | Review on <i>Data Mining: Concepts and Techniq... | IEEE Transactions on Neural Networks | 2002 | 1 |
| 532 | Miscellaneous | Z.-H. Zhou, S.-F. Chen, and Z.-Q. Chen. | Improving tolerance of neural networks against... | Proceedings of the INNS-IEEE International Joi... | 2001 | 1 |
| 533 | Miscellaneous | Z.-H. Zhou, S. Chen, and Z. Chen. | FANNC: A fast adaptive neural network classifi... | Knowledge and Information Systems | 2000 | 1 |

534 rows × 6 columns
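The two processing steps could be sketched like this. It is an illustration under assumptions rather than the notebook's code: `author_rank`, `SEPARATORS`, and the dictionary keys are hypothetical names, and the splitting rule only covers the author formats visible in the table (including the glued "andZ.-H." case).

```python
import re
from typing import Optional

# Split on ", and", " and " or "," (also when "and" is glued to the next name).
SEPARATORS = re.compile(r",\s*and\s*|\s+and\s+|,\s*")


def author_rank(authors: str, target: str = "Z.-H. Zhou") -> Optional[int]:
    """Return the 1-based position of `target` in the author list, or None."""
    cleaned = authors.strip().rstrip(".")
    names = [name.strip() for name in SEPARATORS.split(cleaned) if name.strip()]
    return names.index(target) + 1 if target in names else None


# Hypothetical record using the six raw features described earlier.
record = {
    "category": "Multi-Label Learning",
    "authors": "S.-Y. Li, Y. Jiang, N. V. Chawla, andZ.-H. Zhou.",
    "title": "Multi-label learning from crowds",
    "year": "2019",
    "link": "https://cs.nju.edu.cn/zhouzh/zhouzh.files/publ...",
}

# Step 1: derive the author rank; step 2: drop the link.
processed = {key: value for key, value in record.items() if key != "link"}
processed["author_rank"] = author_rank(record["authors"])
print(processed)
```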

D. Dataset Analysis

Overview

See the analysis report here.

Time dimension

Number of articles published per year

| Category | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Face Recognition | 1.0 | NaN | 1.0 | 1.0 | 5.0 | 6.0 | 10.0 | 2.0 | 3.0 | 3.0 | 2.0 | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN |
| Miscellaneous | 1.0 | 1.0 | 1.0 | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN |
| Ensemble Learning | NaN | 1.0 | 2.0 | 4.0 | 2.0 | 2.0 | 7.0 | 4.0 | 5.0 | 9.0 | 6.0 | 7.0 | 6.0 | 6.0 | 6.0 | 3.0 | NaN | 3.0 | 5.0 | 1.0 |
| Computer-Aided Medical Diagnosis | NaN | NaN | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Improving Comprehensibility | NaN | NaN | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Multi-Instance Learning | NaN | NaN | 1.0 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 1.0 | 5.0 | 1.0 | 1.0 | 6.0 | 3.0 | 4.0 | NaN | 1.0 | 5.0 | NaN | 2.0 |
| Image Retrieval | NaN | NaN | NaN | 1.0 | 1.0 | 1.0 | 4.0 | 2.0 | 1.0 | 5.0 | 1.0 | 1.0 | 1.0 | 2.0 | NaN | 1.0 | NaN | 1.0 | NaN | NaN |
| Metric Learning, Dimensionality Reduction and Feature Selection | NaN | NaN | NaN | NaN | 1.0 | 3.0 | 5.0 | 1.0 | 3.0 | 6.0 | 3.0 | 2.0 | 1.0 | 1.0 | NaN | 1.0 | 4.0 | 1.0 | NaN | 1.0 |
| Multi-View Learning | NaN | NaN | NaN | NaN | 1.0 | 2.0 | 2.0 | 3.0 | 1.0 | 4.0 | 4.0 | 3.0 | 2.0 | 2.0 | 2.0 | 3.0 | 1.0 | 1.0 | NaN | 1.0 |
| Semi-Supervised and Active Learning | NaN | NaN | NaN | NaN | 1.0 | 3.0 | 2.0 | 6.0 | 1.0 | 8.0 | 5.0 | 7.0 | 6.0 | 10.0 | 2.0 | 5.0 | 5.0 | 4.0 | 3.0 | 2.0 |
| Multi-Label Learning | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 2.0 | 2.0 | 2.0 | 3.0 | 3.0 | 7.0 | 8.0 | 3.0 | 3.0 | 1.0 | 5.0 | 5.0 | 2.0 |
| Bioinformatics | NaN | NaN | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | 3.0 | NaN | NaN | 3.0 | 1.0 | 2.0 | NaN | 1.0 | 1.0 | NaN | NaN |
| Cost-Sensitive and Class-Imbalance Learning | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 | NaN | 2.0 | 1.0 | 4.0 | 1.0 | 2.0 | 4.0 | 1.0 | 1.0 | 3.0 | NaN | NaN | NaN |
| Structure Learning and Clustering | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 3.0 | 1.0 | 3.0 | 5.0 | 9.0 | 5.0 | 5.0 | 10.0 | 5.0 | 2.0 | 1.0 | 1.0 | 1.0 |
| Theoretical Aspects of Evolutionary Computation | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 4.0 | 3.0 | 2.0 | 3.0 | NaN |
| Web Search and Mining | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 3.0 | 3.0 | 1.0 | 2.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| Software Mining | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 |
| Crowdsourcing Learning | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| Logic Learning | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | 2.0 | 1.0 |

[Figure: number of articles published per year, by category]
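A category-by-year count like the one above could be built as sketched below. The NaN cells suggest the notebook used a pandas pivot, but that is a guess; this sketch sticks to the standard library, and `count_by_category_and_year` is a hypothetical helper. The sample pairs are the ten rows shown in the dataset preview, and entries marked "in press" are skipped on the assumption that they do not belong to any year column.

```python
from collections import Counter, defaultdict


def count_by_category_and_year(records):
    """Map each category to a Counter of {year: number of articles}."""
    table = defaultdict(Counter)
    for category, year in records:
        if year.isdigit():  # skip "in press" and other non-year values
            table[category][int(year)] += 1
    return table


# (category, year) pairs for the ten rows shown in the dataset preview.
sample = [
    ("Multi-Label Learning", "in press"),
    ("Multi-Label Learning", "2019"),
    ("Multi-Label Learning", "in press"),
    ("Multi-Label Learning", "2019"),
    ("Multi-Label Learning", "2018"),
    ("Miscellaneous", "2003"),
    ("Miscellaneous", "2003"),
    ("Miscellaneous", "2002"),
    ("Miscellaneous", "2001"),
    ("Miscellaneous", "2000"),
]

counts = count_by_category_and_year(sample)
for category, per_year in counts.items():
    print(category, dict(sorted(per_year.items())))
```

Unlike the table above, a `Counter` reports 0 rather than NaN for a year with no articles.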

Ensemble Learning is highly representative, so we single it out and analyze the author rank.

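| Author rank | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | NaN | 1.0 | 2.0 | 4.0 | 1.0 | 1.0 | 3.0 | 1.0 | NaN | 5.0 | 1.0 | 1.0 | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | NaN | 1.0 |
| 2 | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | 2.0 | 1.0 | 1.0 | 3.0 | NaN | NaN | 3.0 | 1.0 | NaN | NaN | NaN | 3.0 | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | 1.0 | 3.0 | NaN | 2.0 | 3.0 | 2.0 | 4.0 | 3.0 | 2.0 | 2.0 | 3.0 | NaN | 2.0 | 1.0 | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN |
| 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 2.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
| 6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN |
| 8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |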

[Figure: author rank by year for Ensemble Learning]

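In the early years he was mostly first author; in later years second- and third-author papers became more common while first-author papers declined.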
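This shift can be checked against the author-rank table with a little arithmetic. In the sketch below the counts are transcribed from that table with NaN cells omitted. The split into 2000-2009 and 2010-2019 is an assumption made for the example, and `first_author_share` is a hypothetical helper.

```python
# Ensemble Learning counts transcribed from the author-rank table:
# rank -> {year: number of papers}; NaN cells are omitted.
RANK_BY_YEAR = {
    1: {2001: 1, 2002: 2, 2003: 4, 2004: 1, 2005: 1, 2006: 3, 2007: 1,
        2009: 5, 2010: 1, 2011: 1, 2012: 1, 2014: 1, 2017: 1, 2019: 1},
    2: {2004: 1, 2006: 1, 2007: 2, 2008: 1, 2009: 1, 2010: 3, 2013: 3,
        2014: 1, 2018: 3},
    3: {2005: 1, 2006: 3, 2008: 2, 2009: 3, 2010: 2, 2011: 4, 2012: 3,
        2013: 2, 2014: 2, 2015: 3, 2017: 2, 2018: 1},
    4: {2008: 2, 2012: 1, 2018: 1},
    5: {2007: 1, 2011: 2, 2013: 1, 2014: 1},
    6: {2014: 1},
    8: {2012: 1},
}


def first_author_share(first_year, last_year):
    """Return (first-author papers, all papers) within the given years."""
    first = sum(n for year, n in RANK_BY_YEAR[1].items()
                if first_year <= year <= last_year)
    total = sum(n for years in RANK_BY_YEAR.values()
                for year, n in years.items()
                if first_year <= year <= last_year)
    return first, total


for label, (start, end) in {"early": (2000, 2009), "late": (2010, 2019)}.items():
    first, total = first_author_share(start, end)
    print(f"{label}: {first}/{total} first-author papers ({first / total:.0%})")
```

With this split, first-author papers make up 18 of 36 Ensemble Learning papers in 2000-2009 and 6 of 43 in 2010-2019, which is consistent with the observation above.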

Interest dimension

[Figure: research interests by category]

| Category | Share of interest |
| --- | --- |
| Computer-Aided Medical Diagnosis | 0.5618% |
| Software Mining | 0.7491% |
| Logic Learning | 0.9363% |
| Crowdsourcing Learning | 1.1236% |
| Miscellaneous | 1.3109% |
| Improving Comprehensibility | 1.3109% |
| Bioinformatics | 2.4345% |
| Web Search and Mining | 2.4345% |
| Image Retrieval | 4.1199% |
| Cost-Sensitive and Class-Imbalance Learning | 4.3071% |
| Theoretical Aspects of Evolutionary Computation | 4.6816% |
| Multi-View Learning | 6.1798% |
| Metric Learning, Dimensionality Reduction and Feature Selection | 6.1798% |
| Face Recognition | 6.9288% |
| Multi-Instance Learning | 8.2397% |
| Multi-Label Learning | 9.9251% |
| Structure Learning and Clustering | 9.9251% |
| Semi-Supervised and Active Learning | 13.6704% |
| Ensemble Learning | 14.9813% |
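Each share is presumably the category's row count divided by the 534 rows of the dataset. The sketch below checks that reading for a few categories. The counts are not given in the article; they are inferred here as percentage × 534 / 100, which comes out as a whole number in each case. `share_percent` and `INFERRED_COUNTS` are hypothetical names.

```python
TOTAL_ROWS = 534  # dataset size reported with the tables above

# Inferred counts (percentage * 534 / 100), not taken from the notebook.
INFERRED_COUNTS = {
    "Ensemble Learning": 80,
    "Semi-Supervised and Active Learning": 73,
    "Multi-Label Learning": 53,
    "Logic Learning": 5,
    "Computer-Aided Medical Diagnosis": 3,
}


def share_percent(count, total=TOTAL_ROWS):
    """Share of `count` in `total`, as a percentage with four decimals."""
    return round(100 * count / total, 4)


for category, count in INFERRED_COUNTS.items():
    print(f"{category}: {share_percent(count)}%")
```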

Finally, let's draw a word cloud.

[Figure: word cloud]

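The figures above summarize interests over his entire publication history, so they need some correction to reflect Professor Zhou's current research interests. For example, Crowdsourcing Learning and Logic Learning have shown relatively high output in recent years, but because both areas are fairly new, they are underweighted in the historical statistics.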
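One simple form of such a correction is to recompute the shares over a recent window only. The sketch below sums the 2015-2019 columns of the per-year table; the choice of window is an assumption made for illustration, `recent_share` is a hypothetical helper, and papers marked "in press" are not included because they have no year column.

```python
# Papers per category for 2015-2019, summed from the per-year table
# (categories with no papers in that window are omitted).
RECENT_COUNTS = {
    "Face Recognition": 1,
    "Miscellaneous": 1,
    "Ensemble Learning": 12,
    "Multi-Instance Learning": 8,
    "Image Retrieval": 2,
    "Metric Learning, Dimensionality Reduction and Feature Selection": 7,
    "Multi-View Learning": 6,
    "Semi-Supervised and Active Learning": 19,
    "Multi-Label Learning": 16,
    "Bioinformatics": 2,
    "Cost-Sensitive and Class-Imbalance Learning": 4,
    "Structure Learning and Clustering": 10,
    "Theoretical Aspects of Evolutionary Computation": 12,
    "Software Mining": 1,
    "Crowdsourcing Learning": 6,
    "Logic Learning": 5,
}

RECENT_TOTAL = sum(RECENT_COUNTS.values())


def recent_share(category):
    """Share of a category among 2015-2019 papers, in percent."""
    return round(100 * RECENT_COUNTS.get(category, 0) / RECENT_TOTAL, 2)


for name in ("Crowdsourcing Learning", "Logic Learning", "Ensemble Learning"):
    print(f"{name}: {recent_share(name)}% of {RECENT_TOTAL} recent papers")
```

Within this window Crowdsourcing Learning rises from about 1.1% to about 5.4% and Logic Learning from about 0.9% to about 4.5%, which illustrates how much the all-time statistics understate the newer areas.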

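Possible future work:

  1. Download the articles through their links and enrich the dataset with word segmentation, term statistics, and similar processing
  2. Try time-series forecasting to estimate Professor Zhou's future research interests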
