In many data science applications, the objective is to extract appropriately ordered
smooth low-dimensional data patterns from high-dimensional data sets. This is
challenging since common sorting algorithms primarily aim at finding
monotonic orderings in low-dimensional data, whereas typical dimension reduction
and feature extraction algorithms are not primarily designed for extracting smooth
low-dimensional data patterns. We show that when Euclidean smoothness is
selected as the pattern quality criterion, both of these problems (finding the
optimal “crisp” data permutation and extracting the sparse set of permuted
low-dimensional smooth patterns) can be efficiently solved numerically as one
unsupervised entropy-regularized iterative optimization problem. We formulate
and prove the conditions for monotonicity and convergence of this linearly
scalable (in dimension) numerical procedure, with the iteration cost scaling of
$\mathcal{O}(DT)$, where $T$ is the size of the data statistics and $D$ is
the feature space dimension. The efficacy of the proposed method is
demonstrated on synthetic examples as well as a real-world application
involving the identification of smooth bankruptcy-risk-minimizing transition
patterns in high-dimensional economic data.
The results confirm that the measured overall time complexity of the method
scales linearly in the dimensionality, within the specified confidence
intervals.
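
To illustrate the kind of joint procedure summarized above, the following is a minimal, hypothetical Python sketch of an entropy-regularized alternating scheme that couples a soft (doubly stochastic) ordering step, computed with Sinkhorn iterations, with a penalized least-squares update of a Euclidean-smooth low-dimensional pattern. All function names and parameter values (fit_smooth_ordering, lam, eps) are illustrative assumptions; this is not the algorithm formulated in the paper, and its per-iteration cost does not match the reported $\mathcal{O}(DT)$ scaling.

```python
# Illustrative sketch only: a generic entropy-regularized alternating scheme for
# jointly estimating a soft ordering (doubly stochastic assignment, Sinkhorn step)
# and a smooth low-dimensional pattern (penalized least-squares step).
# Names and parameters are hypothetical; this is NOT the paper's algorithm, and
# its per-iteration cost is not the O(DT) reported in the abstract.
import numpy as np


def make_second_difference(T):
    """(T-2) x T second-difference operator, used as a Euclidean smoothness penalty."""
    D2 = np.zeros((T - 2, T))
    for j in range(T - 2):
        D2[j, j], D2[j, j + 1], D2[j, j + 2] = 1.0, -2.0, 1.0
    return D2


def sinkhorn(cost, eps, n_iter=200):
    """Entropy-regularized soft assignment: returns a doubly stochastic matrix."""
    K = np.exp(-cost / eps)
    u = np.ones(cost.shape[0])
    v = np.ones(cost.shape[1])
    for _ in range(n_iter):
        u = 1.0 / (K @ v)
        v = 1.0 / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)


def fit_smooth_ordering(Y, lam=10.0, eps=0.1, n_outer=30):
    """Alternate between a Sinkhorn assignment step and a smooth-pattern update.

    Y : (T, k) low-dimensional data (e.g. a PCA projection), rows in scrambled order.
    Returns the soft assignment P (T x T) and the smooth pattern S (T, k).
    """
    T, k = Y.shape
    D2 = make_second_difference(T)
    S = Y.copy()  # initialize the pattern with the data itself
    for _ in range(n_outer):
        # cost of assigning sample t to ordered slot j (squared Euclidean distance)
        cost = ((Y[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
        P = sinkhorn(cost, eps)
        # smooth-pattern update: (diag(column sums of P) + lam * D2^T D2) S = P^T Y
        A = np.diag(P.sum(axis=0)) + lam * D2.T @ D2
        S = np.linalg.solve(A, P.T @ Y)
    return P, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 80
    t = np.linspace(0, 1, T)
    Y_true = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
    perm = rng.permutation(T)                        # scramble the ordering
    Y = Y_true[perm] + 0.05 * rng.normal(size=(T, 2))
    P, S = fit_smooth_ordering(Y)
    order = P.argmax(axis=1)                         # crisp ordering from soft assignment
    print("recovered slots for first 10 samples:", order[:10])
```

In this sketch the entropy regularization enters through the Sinkhorn step, which relaxes the crisp permutation to a doubly stochastic assignment that can be sharpened back to a crisp ordering via an argmax once the iteration has converged.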