alexn.org

Regexp Replacement via Function in Java/Scala

This article explains how to get fine-grained control over replacements when using String.replaceAll or similar in Java or Scala. The samples are in Scala, but if you're a Java user, the snippets translate easily.

The following task should be obvious to accomplish in Scala / Java, yet I've lost 2 hours on it, because Java's standard library is aged and arcane.

Writing this article mostly for myself, to dump this somewhere 🙂

In Ruby, if you want to replace all matches of a regular expression in a string, you can use gsub:

"HelloWorld!".gsub(/(?<=[a-z0-9_])[A-Z]/, ' \0')
#=> Hello World!

This API is available in Scala/Java as well:

"HelloWorld!".replaceAll("(?<=[a-z0-9_])[A-Z]", " $0")
// res: String = Hello World!
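For Java readers, the same call looks roughly like the sketch below. The class and method names (GroupReferenceSketch, splitWords, swapPair) are made up for illustration; only String.replaceAll and its $0 / $1 group references come from the standard library.

```java
public class GroupReferenceSketch {
    // $0 in the replacement refers to the whole match
    public static String splitWords(String input) {
        return input.replaceAll("(?<=[a-z0-9_])[A-Z]", " $0");
    }

    // $1 and $2 refer to the first and second capture groups
    public static String swapPair(String input) {
        return input.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    public static void main(String[] args) {
        System.out.println(splitWords("HelloWorld!")); // Hello World!
        System.out.println(swapPair("key=value"));     // value=key
    }
}
```

Because $ has this special meaning in the replacement string, a literal dollar sign has to be escaped, which matters for the function-based approach later on.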

However, Ruby goes one step further and accepts a block as the replacement:

"HelloWorld!".gsub(/(?<=[a-z0-9_])[A-Z]/) {|ch| " " + ch.downcase }
#=> Hello world!

"Apollo 12".gsub(/\d+/) {|num| num.to_i + 1}
#=> "Apollo 13"

The Java API exposed by Pattern is a little awkward, so let's see how to do this in Scala / Java:

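import java.util.regex.{Matcher, Pattern}
import scala.collection.mutable.ListBuffer

type Index = Int
type MatchGroup = String

def replaceAll(regex: Pattern, input: String)
  (f: (Index, MatchGroup, List[MatchGroup]) => String): String = {

  val m = regex.matcher(input)
  val sb = new StringBuffer

  while (m.find()) {
    // Collect capture groups 1..groupCount (group 0 is the whole match);
    // a group that did not participate in the match is null
    val groups = {
      val buffer = ListBuffer.empty[String]
      var i = 0
      while (i < m.groupCount()) {
        buffer += m.group(i + 1)
        i += 1
      }
      buffer.toList
    }
    val replacement = f(m.start(), m.group(), groups)
    // Quote the replacement so that `\` and `$` are inserted verbatim
    m.appendReplacement(sb, Matcher.quoteReplacement(replacement))
  }

  // Append whatever follows the last match
  m.appendTail(sb)
  sb.toString
}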

What happens here is that Matcher lets you replace all occurrences of a pattern through an iterator-like protocol, using:

  • .find: finds the next occurrence, called in a while loop
  • .appendReplacement: appends the text between the previous match and the current one, plus the replacement that you've calculated for the current match
    • Note this requires the usage of quoteReplacement, because otherwise the logic in appendReplacement will treat chars like \ and $ specially (as escapes and group references), so this specifies that you want the replacement inserted verbatim
  • .appendTail: appends whatever is left after the last match
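The three steps above can be sketched in Java as follows. This is a minimal translation, not a definitive implementation: the class ReplaceLoopSketch and the methods replaceAll and priceTag are hypothetical names, and the callback receives a MatchResult instead of the (index, match, groups) triple used in the Scala version.

```java
import java.util.function.Function;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceLoopSketch {
    // Hypothetical helper: replaces every match with whatever `f` returns
    public static String replaceAll(Pattern regex, String input,
                                    Function<MatchResult, String> f) {
        Matcher m = regex.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // toMatchResult() takes an immutable snapshot of the current match
            String replacement = f.apply(m.toMatchResult());
            // Appends the text since the previous match, then the quoted replacement
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        // Appends the text after the last match
        m.appendTail(sb);
        return sb.toString();
    }

    // The replacement starts with a literal '$', so quoting is essential here
    public static String priceTag(String input) {
        return replaceAll(Pattern.compile("\\d+"), input, r -> "$" + r.group());
    }

    public static void main(String[] args) {
        System.out.println(priceTag("costs 5 and 12")); // costs $5 and $12
    }
}
```

Without the quoteReplacement call, a replacement such as "$5" would be read as a reference to capture group 5, which does not exist in this pattern, so the call would fail with an exception instead of inserting the text.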

We can now write something like this:

def camelCaseToSnakeCase(input: String): String =
  replaceAll("[A-Z](?=[a-z0-9_])".r.pattern, input) { (i, ch, _) =>
    (if (i > 0) "_" else "") + ch.toLowerCase
  }

// And usage:
camelCaseToSnakeCase("RebuildSubscribersCounts")
//=> res: String = rebuild_subscribers_counts

Or like this:

def incrementNumbersIn(input: String): String =
  replaceAll("\\d+".r.pattern, input) { (_, num, _) =>
    (num.toInt + 1).toString
  }

// And usage:
incrementNumbersIn("Apollo 12")
//=> res: String = Apollo 13
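If you're on Java 9 or later, Matcher also has a replaceAll overload that takes a Function<MatchResult, String>, which covers the same ground without the manual loop. The sketch below redoes the two examples that way. The class name FunctionReplaceSketch is made up, and it assumes the capture-group list isn't needed. The function's result is still interpreted as a replacement string, so quoteReplacement remains a good idea.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionReplaceSketch {
    public static String camelCaseToSnakeCase(String input) {
        return Pattern.compile("[A-Z](?=[a-z0-9_])")
            .matcher(input)
            // r.start() is the index of the match, r.group() the matched text
            .replaceAll(r -> Matcher.quoteReplacement(
                (r.start() > 0 ? "_" : "") + r.group().toLowerCase()));
    }

    public static String incrementNumbersIn(String input) {
        return Pattern.compile("\\d+")
            .matcher(input)
            .replaceAll(r -> Matcher.quoteReplacement(
                String.valueOf(Integer.parseInt(r.group()) + 1)));
    }

    public static void main(String[] args) {
        System.out.println(camelCaseToSnakeCase("RebuildSubscribersCounts"));
        // rebuild_subscribers_counts
        System.out.println(incrementNumbersIn("Apollo 12"));
        // Apollo 13
    }
}
```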

Enjoy ~