Analysis and Experimental Evaluation of the Needleman-Wunsch Algorithm for Trajectory Comparison

Výsledky výskumu našich pracovníkov Maroša Čavojského a Martina Drozdu boli prijaté do prestížneho časopisu Expert systems with applications. Tento časopis mal v roku 2020 Impact factor  na hodnote 6.954.

Maroš Čavojský, Martin Drozda a Zoltán Balogh : Analysis and Experimental Evaluation of the Needleman-Wunsch Algorithm for Trajectory Comparison

Needleman-Wunsch algoritmus sa štandardne používa na porovnávanie DNA a RNA sekvencií. V článku je popísaný a analyzovaný prístup na využitie tohto algoritmu pre porovnávanie trajektórií používateľov, pričom približná poloha používateľa je získaná pomocou GPS (Global positioning system). Autori výskumu predpokladajú, že približné polohy sú normálne rozdelené okolo skutočnej polohy používateľa. Tento predpoklad má za následok, že Needleman-Wunsch algoritmus pri porovnávaní trajektórií nepoužije nezhody (angl. mismatch). Táto vlastnosť je experimentálne overená. Článok sa tiež zaoberá spôsobom konverzie trajektórií získaných pomocou GPS na reťazce, ktoré sú vyžadované ako vstup pre Needleman-Wunsch algoritmus, ako aj aproximáciou chýbajúcich častí trajektórie, ktoré môžu vzniknúť v prípade, že GPS signál nemôže byť prijatý v požadovanej kvalite.

Abstrakt: We evaluate whether the Needleman-Wunsch algorithm is suitable for user trajectory comparison. The problem that we aim to solve is pair-wise user trajectory comparison. Similar user trajectories are then clustered with respect to their similarity, where clusters emerge in a non-supervised way.

We assume that user position, provided by GPS (Global positioning system), is normally distributed around user actual position. This assumption allows us to derive a model for setting score for match, penalty for gap and penalty for mismatch, which are an input to the Needleman-Wunsch algorithm. Our model implies that, in scenarios where actual user position is unknown and must be thus estimated from measured positions, the Needleman-Wunsch algorithm may be prevented from applying mismatches. In an experimental evaluation, we apply two data sets that contain recorded user positions and we show that our approach based on the Needleman-Wunsch algorithm is capable of correct classification of user trajectories into groups. Unlike in existing literature, we show that in GPS based user trajectory comparison, it is indeed not necessary to consider mismatches when applying the Needleman-Wunsch algorithms. This leads to a simplified string editing problem known as Longest Common Subsequence (LCS). We compare our approach with Edit Distance on Real sequence (EDR) in order to provide an insight into the performance of our approach.

Applying the Needleman-Wunsch algorithm has helped to solve several problems that emerge in GPS based user trajectory comparison such as interrupted GPS service due to satellite occlusion and various signal propagation phenomena such as signal reflection, fading etc. In order to improve the efficiency of the Needleman-Wunsch algorithm, we apply Move ability to identify when such detrimental conditions could occur. We also apply linear approximation in order to enhance user GPS trajectories with missing points, what further improves the efficiency of user trajectory comparison.

Celý článok nájdete na stránkach vydavateľstva Elsevier.