## Jul 26, 2012

### Sequences alignment in Python

One of the uses of the LCS algorithm is the Sequences Alignment algorithm (SAA). The SAA is useful for comparing the evolution of a sequence (a list of characteristic elements) from one state to another, and is widely used by biomedics for comparing DNA, RNA and proteins; SAA is also used for comparing two text and finding their differences, like the *nix's diff tool.

The following SC is useful for solving the second case:
```#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Recursive sequences alignment algorithm, based on LCS.
# Copyright (C) 2012  Gonzalo Exequiel Pedone
#
# This program is free software: you can redistribute it and/or modify
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with This program. If not, see <http://www.gnu.org/licenses/>.
#
# Email   : hipersayan DOT x AT gmail DOT com
# Web-Site: http://hipersayanx.blogspot.com/
#
# This sequences alignment algorithm consist in findind the LCS of two sequences
# and then resolve the alignment of the sequences before and after the LCS
# recursively.

# Auxiliary function for finding the first index in wich appears a sub-sequence.
def findSubList(l=[], sub=[]):
if len(sub) > len(l):
return -1

for i in range(len(l) - len(sub) + 1):
j = 0
eq = True

for s in sub:
if l[i + j] != s:
eq = False

break

j += 1

if eq:
return i

return -1

def alignSequences(sequence1=[], sequence2=[]):
# lcs is the Longest Common Subsequence function.
cs = lcs(sequence1, sequence2)

if cs == []:
return sequence1 + [''] * len(sequence2) , \
[''] * len(sequence1) + sequence2
else:
# Remainings non-aligned sequences in the left side.
left1 = sequence1[: findSubList(sequence1, cs)]
left2 = sequence2[: findSubList(sequence2, cs)]

# Remainings non-aligned sequences in the right side.
right1 = sequence1[findSubList(sequence1, cs) + len(cs):]
right2 = sequence2[findSubList(sequence2, cs) + len(cs):]

# Align the sequences in the left and right sides:
leftAlign = alignSequences(left1, left2)
rightAlign = alignSequences(right1, right2)

return leftAlign[0] + cs + rightAlign[0], \
leftAlign[1] + cs + rightAlign[1]

a, b = alignSequences(list('abcdfghjqz'), list('abcdefgijkrxyz'))

print(''.join(['-' if i == '' else i for i in a]))
print(''.join(['-' if j == '' else j for j in b]))
```