u++の備忘録

【論文メモ】AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification

追記:20171226

論文読み会での発表資料を共有します。

===
追記終

以下の記事に記載した論文

upura.hatenablog.com

論文名

Yukihiro Tagami: AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification, KDD’17, pp.455-464, August 13–17, 2017, Halifax, NS, Canada.

どんなもの?

ラベルの種類数が膨大な場合(10^4 - 10^6)のマルチラベル分類問題に対し、高速かつ精度良く予測を行う分類器を提案

先行研究と比べてどこがすごい?

Sparse Local Embeddings for Extreme Classification (SLEEC) [4] の3つの問題点を解決

  • SLEEC only uses feature vectors and does not access label information
  • SLEEC’s optimization process for learning regressors is slightly complicated because of sparsity-induced and rank-constraint regularization terms.
  • Much faster prediction is preferable for scaling up to solve Web-scale classification problems

技術や手法のキモはどこ?

k-nearest neighbor graph (KNNG)

どうやって有効だと検証した?

Extreme Classification Repository [5]のデータを使い検証。予測時間と精度の両者の向上が確認できた。
(At the same level of accuracy, the prediction time of AnnexML was up to 58 times faster than that of SLEEC)
[5] Kush Bhatia, Himanshu Jain, Yashoteja Prabhu, and Manik Varma. 2016. The Extreme Classification Repository. (2016). Retrieved January 15, 2017 from https://manikvarma.github.io/downloads/XC/XMLRepository.html

議論はある?

More complex nonlinear models, such as kernel machines and deep neural networks に取り組みたい

次に読むべき論文は?

[4] Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, and Prateek Jain. 2015. Sparse Local Embeddings for Extreme Multi-label Classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems.