Strings Boot Camp

Bootcamp Strings

A string is a sequence of characters that is usually implemented as an array of character data types. Depending on the language implementation a string can be mutable or immutable. This is a very important distinction and it’s important to understand how strings are implemented in the language that you are choosing for your interview. In this article, we will discuss how this affects the runtime complexity analysis of your solution and how to avoid common pitfalls of designing algorithms to manipulate strings.

The String class in Java is the fundamental data type for representing strings, and a string in quotes (ie. “abc”) is called a string literal. String literals are stored in heap memory in an area called String Constant Pool: every time a string literal is used, the JVM looks in the string pool to check if it contains a string literal equal to the new string, and if this is the case it points the new variable to the same string literal in memory (this is OK because strings are immutable in Java so the same object can be safely shared).

Common patterns

String Manipulation algorithms are a class of algorithms that involve processing a string to find a specific pattern, count its elements or manipulate its content. Here are a few fundamental techniques and operations on strings that are used in a variety of problems. Familiarize with these concepts and practice implementing them.

Concatenating strings

In languages that implement strings as immutable objects (like Java) concatenating strings iteratively can be extremely inefficient because each concatenation requires the creation of a brand new string object. Consider the following code snippet:

String str = "";
char[] chars = new char[]{'a', 'b', 'c'};

for (char c : chars) {
  str = str + c;
}

At each concatenation, a new string is created and all the characters are copied over. In the first iteration, 1 character is copied (because str is empty so only the first character is copied). In the second iteration, 2 characters are copied, in the third iteration 3 characters are copied and so on, all the way to N (when N is the number of characters in str). The cost of this operation is then O(n^2) because 1 + 2 + 3 + ... + n = n(n+1)/2 = O(n^2). In Java, a good way to avoid this pitfall is to use StringBuilder to concatenate the string and then build the object at the end.

Counting characters

Many problems involving strings require to count the occurrence of characters in the string. Typically this is achieved using a HashMap data structure where the key is the character and the value is an int representing the occurrence of the character.

String str = "aabc";

Map<Character, Integer> counter = new HashMap<>(str.length());
for (char c : str.toCharArray()) {
    // Initialize counter of c to 0 if not seen before
    counter.putIfAbsent(c, 0);

    counter.put(c, counter.get(c)+1);
}

Alternatively, an array of integers can be used. Assuming that we are dealing with ASCII characters we could use a length-256 array where the index is the ASCII value of the character and the value is the count. A typical mistake is to think that the complexity of the above algorithm is O(n) where n == s.length(). What’s important to understand is that most of the time the alphabet is well defined and has a fixed length, which means that the running time of the above algorithm is constant O(1).

Sliding window

The sliding window technique is one of the most common patterns and consists of moving two pointers (representing the beginning and end of the “window”) and only consider the characters inside that window at any given time.

String to integer

int stringToint(String s) {
 	int res = 0;
  for (int i=0; i < s.length();  i++) {
      res += (Math.pow(10, i) * charToInt(s.charAt(s.length() - i - 1)));
  }
  return res;
}

int charToInt(char c) {
  switch (c) {
    case '0': return 0;
    case '1': return 1;
    case '2': return 2;
    case '3': return 3;
    case '4': return 4;
    case '5': return 5;
    case '6': return 6;
    case '7': return 7;
    case '8': return 8;
    case '9': return 9;
    default: return -1;
  }
}

Enumerate all substrings of a string

public static void findAllSubstrings(String s) {
  for(int i = 0; i < s.length(); i++) {
    for(int j = i+1; j <= s.length(); j++) {
      System.out.println(s.substring(i, j));
    }
  }
}

Reverse a string

public static String reverse(String s) {
  if (s.length() == 0)
    return StringUtils.EMPTY;

  char[] arr = s.toCharArray();

  for (int start=0; start < arr.length/2; start++) {
    // Note that we don't need to manipulate 2 pointers, we can
    // calculate the right pointer from the left pointer an the
    // length of the string
    int end = arr.length - start - 1;

    char tmp = arr[end];
    arr[end] = arr[start];
    arr[start] = tmp;
  }

  return new String(arr);
}

Useful string methods in Java

length(): Returns the length of the string.

equals(String s): Compares the string to s.

contains(String s): Returns true if the is string contains s.

charAt(int i): Returns the character at index i.

substring(int i, int j): Returns the string starting at index i (inclusive) and finishing at index j (exclusive).

toCharArray(): Converts the string in a sequence of characters.

Common pitfalls

  1. Many operations (like System.out.println()) take linear time on strings.
  2. Strings in Java are immutable and final
  3. Be mindful of the security implications of storing sensitive data in a string: because string literals are stored in the string pool they will stay in memory for a longer time, which means it can be available to everyone that can do a memory dump)

Fundamental algorithms to know