Algorithm {Longest substring with 2 unique chars}


Hi,

A few days ago my friend asked me question,

How can we find longest substring which contains 2 unique characters in linear time without additional memory ?

Yeah, we spoke about algorithms :). I suggested to use 2 cursors like start and end and just get substring by these cursors. Generally idea is right, but how can we manage these cursors ? This is quite hard. Maybe not so hard but need some time to think about it.

OK, what do we have ?

1)  we have input string like next abcc;

2)  we should not use additional memory, it means we can store some indexes (like start and end cursors), maybe some additional points, but cannot clone input string or use some data – structure to manage string, I mean trie tree and/or so on.

3) complexity should be linear

OK,  we have all information about algorithm and can start development. Yes, firstly I’ve provided these implementation after that he told me about algorithm Kadane.

Yes, we should modify it, but idea is great, ok lets go to implementation:

import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;
 
/** * * @author szagriichuk */
public class LongestSubstring {
    public String subString(String data) {
 
        if (data == null || data.isEmpty()) {
            return "";
        }
        if (data.length() <= 2) {
            return data;
        }
 
        Set<Character> chars = new LinkedHashSet<Character>();
        String maxString = "";
 
        int start = 0;
        int end;
 
        for (end = 0; end < data.length(); end++) {
            chars.add(data.charAt(end));
            if (chars.size() > 2) // next letter in string
            {
                Character removed = removeFirst(chars);
                String tempString = data.substring(start, end);
                maxString = chooseMaxString(maxString, tempString);
                start = moveStart(start, removed, tempString);
            }
        }
        if (chars.size() <= 2) {
            maxString = chooseMaxString(maxString, data.substring(start, end));
        }
        return maxString;
    }
 
    private String chooseMaxString(String maxString, String tempString) {
        if (maxString.length() < tempString.length()) {
            return tempString;
        }
        return maxString;
    }
 
    private int moveStart(int start, Character removed, String data) {
        int result = data.length() - 2;
        Character lastChar = data.charAt(data.length() - 1);
        Character prevChar = data.charAt(result);
        while (prevChar.equals(lastChar)) {
            prevChar = data.charAt(--result);
        }
        return data.indexOf(removed) < 0 ? start + result : start + result + 1 ; // +1 because we took last char
    }
 
    private Character removeFirst(Set<Character> chars) {
        Character character = null;
        Iterator<Character> iterator = chars.iterator();
        if (iterator.hasNext()) {
            character = iterator.next();
            iterator.remove();
        }
        return character;
    }
}

I am using next variables:

        Set<Character> chars = new LinkedHashSet<Character>();
        String maxString = "";
 
        int start = 0;
        int end;
 

Instead of Set we can use two Characters, but it’s more readable use Set and store just 2 chars in it.

Start and End these are two cursors I’ve written above and maxString is result string of the our algorithm.

All steps in implementation are easy but I want to draw your attention to method moveStart. This implementation contains simple trick :). But I think all of you can understand it !!!

OK, testing code (maybe there are some problems in code I’ve tested on a few test data)

        LongestSubstring longestSubstring = new LongestSubstring();
        assertEquals("", longestSubstring.subString(null));
        assertEquals("", longestSubstring.subString(""));
        assertEquals("bcbbbbcccb", longestSubstring.subString("abcbbbbcccbdddadacb"));
        assertEquals("a", longestSubstring.subString("a"));
        assertEquals("ab", longestSubstring.subString("ab"));
        assertEquals("ab", longestSubstring.subString("abc"));
        assertEquals("bcc", longestSubstring.subString("abcc"));
        assertEquals("aaaaaaaaaaaaaaab", longestSubstring.subString("aaaaaaaaaaaaaaabcccccc"));
        assertEquals("aaaaaaaaaaaaaaa", longestSubstring.subString("aaaaaaaaaaaaaaa"));

Looks like working in linear time :)

Thanks.

About these ads

2 thoughts on “Algorithm {Longest substring with 2 unique chars}

  1. Wow! You are on fire right now! :) 2 days – 2 stories. Keep it up!

    Just a quick comment to let you know about some errors.
    I’ve made some aditional tests and got invalid result on this one:
    assertEquals(“acac”, longestSubstring.subString(“abacac”));

    Expected :acac
    Actual :aba

    P.S. Maybe you can share your result projects on Github or on another hosting service?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s