International Journal of Hybrid Intelligent Systems
Volume 1, No. 2 (2004)
A Framework for Learning from Distributed Data Using Sufficient Statistics and Its Application to Learning Decision Trees
Doina Caragea, Adrian Silvescu and Vasant Honavar
Abstract. This paper motivates and precisely formulates the problem of learning from distributed data; describes a general strategy for transforming traditional machine learning algorithms into algorithms for learning from distributed data; demonstrates the application of this strategy to devise algorithms for decision tree induction from distributed data; and identifies the conditions under which the algorithms in the distributed setting are superior to their centralized counterparts in terms of time and communication complexity. The resulting algorithms are provably exact in that the decision tree constructed from distributed data is identical to that obtained in the centralized setting. Some natural extensions leading to algorithms for learning from heterogeneous distributed data and learning under privacy constraints are outlined.
Keywords: Distributed Learning, Sufficient Statistics, Learning Agents, Decision Trees
Copyright © 2004 Advanced Knowledge International, Australia
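The exactness claim in the abstract can be illustrated with a minimal sketch. It assumes horizontally fragmented data (each site holds a subset of the rows), an entropy-based splitting criterion, and class-conditional counts as the sufficient statistics. The function names (`local_counts`, `information_gain`, `best_split`) and the toy data are hypothetical and are not taken from the paper. The sketch covers only the choice of the root split, not the full tree-induction procedure.

```python
from collections import Counter
from math import log2

def local_counts(rows, attributes, label):
    """Count (attribute, value, class) triples held at one site."""
    counts = Counter()
    for row in rows:
        # Class totals are stored under a reserved pseudo-attribute.
        counts[("__class__", None, row[label])] += 1
        for a in attributes:
            counts[(a, row[a], row[label])] += 1
    return counts

def entropy(class_counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(class_counts)
    return -sum(c / total * log2(c / total) for c in class_counts if c)

def information_gain(counts, attribute):
    """Information gain of splitting on `attribute`, from counts alone."""
    class_totals = [c for (a, _, _), c in counts.items() if a == "__class__"]
    n = sum(class_totals)
    by_value = {}
    for (a, v, _), c in counts.items():
        if a == attribute:
            by_value.setdefault(v, []).append(c)
    remainder = sum(sum(cs) / n * entropy(cs) for cs in by_value.values())
    return entropy(class_totals) - remainder

def best_split(counts, attributes):
    """Attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(counts, a))

# Hypothetical toy data, split row-wise across two sites.
ATTRS, LABEL = ["outlook", "windy"], "play"
site_a = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
]
site_b = [
    {"outlook": "rain", "windy": "yes", "play": "yes"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
]

# Each site ships only its counts; the learner adds them up.
merged = local_counts(site_a, ATTRS, LABEL) + local_counts(site_b, ATTRS, LABEL)
# Reference: counts computed as if all rows were pooled in one place.
centralized = local_counts(site_a + site_b, ATTRS, LABEL)

print(merged == centralized)
print(best_split(merged, ATTRS) == best_split(centralized, ATTRS))
```

Because counts are additive across sites, the merged counts equal the centralized counts, so any split criterion computed from them selects the same attribute. In this sketch the sites transmit counts rather than raw rows, which is where any communication savings would come from.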