The average run time of naive method can be O(n2), which is straightforward to design. So here are the other two possible methods whose time complexity are O(nlogn) and O(n).
Let's assume the strings to be compared are a and b, and the functions are:
string comchr(string&, string&)
and
string comchr2(string&, string&)
in C++ respectively.
1. comchr(): Sort the strings first and then find the common characters - O(nlogn)
If we use merge sort to sort the two strings and then do the comparison, the average and worst time complexity of the problem will be of order O(nlogn), where n is the max(a.length(), b.length()). So here are the C++ implementation of the algorithm:
string comchr(string &a, string &b)
{
string r;
string::size_type i, j, k;
if (a.length()+b.length()<=1) return "";
if (a.compare(b)==0) return a;
{
string r;
string::size_type i, j, k;
if (a.length()+b.length()<=1) return "";
if (a.compare(b)==0) return a;
merge_sort(a, 0, a.length()-1);
merge_sort(b, 0, b.length()-1);
i=j=k=0;
while (i<a.length() && j<b.length())
{
if (a[i]==b[j])
{
if (k==0 || r[k-1]!=a[i])
{
r.push_back(a[i]);
k++;
}
i++; j++;
}
else
(a[i]<b[j]) ? i++ : j++;
return r;
}
The merge_sort() is the merge sort routine that sorts the string and contributes O(nlogn) to the complexity.merge_sort(b, 0, b.length()-1);
i=j=k=0;
while (i<a.length() && j<b.length())
{
if (a[i]==b[j])
{
if (k==0 || r[k-1]!=a[i])
{
r.push_back(a[i]);
k++;
}
i++; j++;
}
else
(a[i]<b[j]) ? i++ : j++;
return r;
}
2. comchr2(): Map the characters in string a into a set and then compare each of the characters in this set with string b - O(n)
A simple way to improve the running time is to use a direct-address access table (here called t) for insertion (O(1)) of each character of one string and for table look up (O(1)). The implementation could be:
string comchr2(string &a, string &b)
{
string r;
set<char> s; //s to remove duplicate chars
bitset<256> t; // All ASCII char
if (a.length()+b.length()<=1) return "";
if (a.length()+b.length()<=1) return "";
if (a.compare(b)==0) return a;
for (string::size_type i=0; i<a.length(); i++)
if (!t.test(a[i]))
t.set(a[i]);
for (string::size_type i=0; i<b.length(); i++)
if (t.test(b[i]))
s.insert(b[i]);
for (set<char>::iterator it=s.begin(); it!=s.end(); it++)
r.push_back(char(*it));
return r;
}
Note that the function uses a bitset t for direct-address access. Of course, the comchr2() is easier to design
A simple experiment to compare the performance of the two functions is done using test set:
a b
Test 1 abcdefghi jklmnopqr
Test 2 abcdefghi abcdefghi
Test 3 abcdefghi aaabbbccc
The same function will be called 1000000 times to scale up the time to run the problem. The average running time (in second) of each function on different strings is shown below:
The comchr2() function runs faster than the comchr() when the test strings are not 'too' similar. And both functions perform similarly on the test case 2 because the test strings are same in both cases.
4. Conclusion
In programming practice, sometimes the simplest approach may also be the 'best'! Given the scale of the problem, the computing resource or the performance requirement, we may tune the program in different ways or even have different solutions.
For example, what if the strings we compare use wide character set (e.g. Unicode)?
What if the strings are stored in separate files on disk?