Skip to main content
SHARE
Publication

Error-Bounded Graph Construction for Semi-supervised Manifold Learning...

by Christopher T Symons
Publication Type
Conference Paper
Journal Name
ACM Digital Library
Publication Date
Page Number
51
Volume
TBD
Conference Name
14th International Workshop on Mining and Learning with Graphs (held in conjunction with KDD'18)
Conference Location
London, United Kingdom
Conference Sponsor
ACM SIGKDD
Conference Date
-

Graphs are commonly used in semi-supervised learning to represent a manifold on which the data reside in a high-dimensional ambient space. The graph can then be utilized in different ways, typically via the Laplacian of the graph, in order to leverage associations among the unlabeled data to improve learning. One common way to leverage the graph Laplacian is as a regularization term, where models that would disagree with the graph are penalized. More often the spectrum of the graph Laplacian is used to find a lower dimensional embedding in which neighboring relations encoded via the graph are preserved. Most manifold-based methods of semi-supervised learning depend upon geometric structure in the ambient feature space in order to construct a graph whose edges encode similarity that should be useful in selecting a model. A critical assumption is that some standard measure of similarity applied to the ambient space can be used to construct a graph that is error-free or of low error, meaning that examples (i.e., vertices) from distinct classes are not connected. However, this assumption often precludes the use of such methods in noisy or complex feature spaces, even though such spaces often arise in problems that can most benefit from structure that might be uncovered within the unlabeled data. This paper presents a method of graph construction for manifold-based semi-supervised learning that respects the manifold assumptions underlying these methods and bounds the error on the graph itself, which then permits bounds on the overall generalization error of the learning algorithms without relying on assumptions that do not hold in many modern problem domains.