Download this article
 Download this article For screen
For printing
Recent Issues
Volume 15, Issue 1
Volume 14, Issue 2
Volume 14, Issue 1
Volume 13, Issue 1
Volume 12, Issue 2
Volume 12, Issue 1
Volume 11, Issue 2
Volume 11, Issue 1
The Journal
About the journal
Ethics and policies
Peer-review process
 
Submission guidelines
Submission form
Editorial board
 
Subscriptions
 
ISSN (electronic): 2693-3004
ISSN (print): 2693-2997
Author Index
To Appear
 
Other MSP Journals
Tropical support vector machine and its applications to phylogenomics

Xiaoxian Tang, Houjie Wang and Ruriko Yoshida

Vol. 14 (2023), No. 2, 133–165
Abstract

Most data in genome-wide phylogenetic analysis (phylogenomics) is essentially multidimensional, posing a major challenge to human comprehension and computational analysis. Also, we cannot directly apply statistical learning models in data science to a set of phylogenetic trees since the space of phylogenetic trees is not Euclidean. In fact, the space of phylogenetic trees is a tropical Grassmannian in terms of the max-plus algebra. Therefore, to classify multilocus data sets for phylogenetic analysis, we propose tropical support vector machines (SVMs). Linear SVMs are supervised learning models that can be formulated in terms of quadratic optimization problems and that classify using hyperplanes in a high-dimensional Euclidean space. Here we study hard margin tropical SVMs introduced by Gärtner and Jaggi and define soft margin tropical SVMs in the setting of tropical geometry. Then we show that both hard margin tropical SVMs and soft margin tropical SVMs can be formulated as linear optimization problems. For hard margin tropical SVMs, we show necessary and sufficient conditions on the feasibility of the linear optimization problem and if there exists a feasible solution then we show an explicit formula for the optimal value of the feasible linear optimization problem. For soft margin tropical SVMs, we show necessary conditions of the feasibility of the linear optimization problem. Computational experiments show that our methods work well with data sets generated under the multispecies coalescent model.

Keywords
linear programming, phylogenetic tree, supervised learning, non-Euclidean data, tropical geometry
Mathematical Subject Classification
Primary: 14T90, 90C24, 92B10
Milestones
Received: 26 September 2022
Revised: 27 February 2023
Accepted: 5 March 2023
Published: 16 May 2024
Authors
Xiaoxian Tang
School of Mathematical Sciences
Beihang University
Beijing
China
Key Laboratory of Mathematics, Informatics and Behavioral Semantics
Ministry of Education
Beijing
China
State Key Laboratory of Software Development Environment
Beihang Unviersity
Beijing
China
Houjie Wang
Department of Statistics
Duke University
Durham, NC
United States
Ruriko Yoshida
Department of Operations Research
Naval Postgraduate School
Monterey, CA
United States