Given two Strings A and B. Find the length of the Longest Common Subsequence (LCS) of the given Strings.
Subsequence can contain any number of characters of a string including zero or all (subsequence containing zero characters is called as empty subsequence).
INPUT : A : JAVABLOG B : ABLGOUTPUT
APPROACH – I (RECURSIVE):
We just need to find the maximum length of a sequence containing characters
common to both the strings in the relative order same as that of the given
Strings.
In this approach we calculate
our result in bottom-up manner that is, while returning from recursion.
We keep on shortening the length of the given Strings by one character,
and once we hit the basecase, this is the point where at least one of the
two string is exhausted having size zero. So in this case we can not have
any matching character in the two strings, therefore, we return 0.
As we know we kept on removing the first character of the string in every
recursion state, now the result of the bigger string will depend on the
result of these smaller Strings.
Now to
find this, two possibilities arise :
Case 1 :
When the two characters match.
If the both the character matches then we can add 1 to the length of the LCS
found till now and can return one plus the result of shortened strings which
we will get from recursion.
that
is, 1 + LCS( shortened A, shortened B)
Case 2 :
When the two characters do not match.
If the characters do not match, then
there are chances are, that the first character of String A matches with
some other character in B or the first character of B matches with some
other character in A. But as we need our result to be maximum therefore we
return the maximum of these two calls.
that is, Maximum of (LCS of shortened A & B) and (LCA of A and shortened
B)
.
As at every point we are making two calls when the first character of the
two strings does not match, therefore, in the worst case where there is no
matching characters in either of the two strings, then in that case our
worst time complexity will be O(2^n)
, where n is the length of the shorter String.
Consider an Example for better understanding and its recursion tree,
B : nop
As we can see, in the recursion tree, we make only one call for remaining Strings whenever the first character matches and in every other case we make two calls.
package DynamicProgramming; import java.util.Scanner; public class LongestCommonSubs { public static void main(String[] args) { Scanner scn = new Scanner(System.in); String A = scn.nextLine(); String B = scn.nextLine(); System.out.println(LCS(A, B)); } public static int LCS(String A, String B) { /*if one String is exhausted or is empty then * there can not be any matching characters in * string A and B*/ if(A.length()==0||B.length()==0) { return 0; } /*As on this state of recursion we are working * on first characters of strings, in next recursion * we need to recur for remaining strings with first * characters removed */ String remA = A.substring(1); String remB = B.substring(1); /* if the current first characters of both strings * matches then we add 1 to the LCS length and then * recur to find answer for remaining Strings*/ if(A.charAt(0)==B.charAt(0)) { int remRes = LCS(remA, remB); return 1 + remRes; } else { /*if the current first characters of both strings * does not match, then maximum length of LCS of * current string will be the max LCS of * (remaining A and B) or (A and remaining B) */ int remRes = Math.max(LCS(remA, B), LCS(A, remB)); return remRes; } } }
Approach – II : Dynamic Programming:
We can clearly see the duplication of the calls represented by red colour in
the recursion tree where result of same problem is getting calculated again.
These Overlapping Subproblems are leading to an increase in Time Complexity
of this algorithm.
Also, The result of
bigger problem can be calculated with the help of results of smaller
subproblems as we discussed in the recursive approach how we calculated the
maximum length of LCS of a bigger string with the help of results of the
smaller (shortened) strings.
As this
problem has both the properties of Dynamic Programming, that is, Overlapping
subproblems and Optimal Substructure. Therefore, we can use an approach
involving dynamic programming to efficiently solve this problem.
We keep a Two – Dimensional matrix DP[][] of length (N+1) x (M+1)
where N and M are the lengths of two given Strings A and B. Any cell
DP[i][j] will represent the maximum length of LCS of two substrings, that
is, A[1...i]
& B[1...j]
, where A and B are the two given Strings.
Base cases would be the problems whose solution is known without any further
computation. In this approach,DP[0][j] = 0
and DP[i][j] = 0
wherein both the cases atleast one string has length zero and in that
case LCS would obviously be zero. Also to store this base case result,
Length of DP matrix is one plus the length of given Strings.
The final result of this problem will be on the cell DP[N][M] as this cell
represents the LCS of
strings A[1...N]
& B[1...M]
, where A and B are the two given Strings and N & M are their lengths
respectively
Discussion of Optimal Substructure :
CASE-1 :
We know, if the first character of the current strings/ substrings matches
then we make call for the shortened strings and add one to the LCS being
calculated as the current characters match.
Similarly here in this DP approach, if A[i] = B[j]
that is, character at ith position in A matches with character at jth
position in B, then,
DP[i][j] = 1 + DP[i-1][j-1]
DP[i-1][j-1]
is cell where the result of shortened string lies, that is, A[1…i-1]
and B[1…j-1]
Case-2:
if the first character of the current strings/ substrings does not match
then we made two separate calls in recursive approach and final length of
LCS is the result of Maximum of those two calls.
Similarly, in this DP approach if A[i] != B[j] then,
DP[i][j] = max (DP[i-1][j] , DP[i][j-1])
where,DP[i-1][j]
contains the maximum Length of LCS of A[1...i-1]
and B[1...j]
and,DP[i][j-1]
contains the maximum Length of LCS of A[1...i]
and B[1...j-1]
.
Discussion of time complexity :
As we use a Matrix of size N X M where N and M are the lengths of the given
two strings, and as we progress we calculate result for every cell.
That is why, the Worst time complexity of this approach for
solving this problem would be O(NM)
.
package DynamicProgramming; import java.util.Scanner; public class LongestCommonSubs { public static void main(String[] args) { Scanner scn = new Scanner(System.in); String A = scn.nextLine(); String B = scn.nextLine(); System.out.println(LCS(A, B)); } public static int LCS(String A, String B) { /* matrix for storing results of the subproblems, size is * one more than the string length to store the base case * results*/ int[][] dp = new int[A.length() + 1][B.length() + 1]; for (int i = 1; i < dp.length; i++) { for (int j = 1; j < dp[0].length; j++) { /*checking characters at index - 1 as 0th character * of given String is at 1st index in DP matrix */ if (A.charAt(i-1) == B.charAt(j-1)) { /*optimal substructure as explained in the article*/ dp[i][j] = 1 + dp[i - 1][j - 1]; } else { dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]); } } } return dp[A.length()][B.length()]; } }
That’s all about Longest common Subsequence in java