Regexp Replacement via Function in Java/Scala
This article explains how to get fine-grained control over replacements when using String.replaceAll or similar in Java or Scala. The samples are in Scala, but if you’re a Java user, the snippets can be translated easily.
The following task should be obvious to accomplish in Scala / Java,
yet I’ve lost 2 hours on it, because Java’s standard library is aged
and arcane.
I’m writing this article mostly for myself, to have it dumped somewhere 🙂
In Ruby, if you want to replace all matches of a regular expression in a string, you can use gsub:
"HelloWorld!".gsub(/(?<=[a-z0-9_])[A-Z]/, ' \0')
#=> Hello World!
This API is available in Scala/Java as well:
"HelloWorld!".replaceAll("(?<=[a-z0-9_])[A-Z]", " $0")
// res: String = Hello World!
However, Ruby goes one step further and accepts a block that computes the replacement:
"HelloWorld!".gsub(/(?<=[a-z0-9_])[A-Z]/) {|ch| ' ' + ch.downcase }
#=> Hello world!
"Apollo 12".gsub(/\d+/) {|num| num.to_i + 1}
#=> "Apollo 13"
The Java API exposed by Pattern and Matcher is a little awkward, so let’s see how to do this in Scala / Java:
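import java.util.regex.{Matcher, Pattern}
import scala.collection.mutable.ListBuffer

type Index = Int
type MatchGroup = String

def replaceAll(regex: Pattern, input: String)
  (f: (Index, MatchGroup, List[MatchGroup]) => String): String = {

  val m = regex.matcher(input)
  val sb = new StringBuffer

  while (m.find()) {
    // Collect capture groups 1..groupCount (group 0 is the whole match);
    // a group that did not take part in the match is null
    val groups = {
      val buffer = ListBuffer.empty[String]
      var i = 0
      while (i < m.groupCount()) {
        buffer += m.group(i + 1)
        i += 1
      }
      buffer.toList
    }
    // Compute the replacement from the match's start index, text and groups
    val replacement = f(m.start(), m.group(), groups)
    // Quote it, so that `\` and `$` are inserted verbatim
    m.appendReplacement(sb, Matcher.quoteReplacement(replacement))
  }
  // Append whatever follows the last match
  m.appendTail(sb)
  sb.toString
}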
What happens here is that Matcher lets you replace all matches through an iterator-like protocol, by using:
- .find: for finding the next occurrence, called in a while loop
- .appendReplacement: which appends the text between the previous match and the current one, plus the replacement that you’ve calculated for the current match
  - Note this requires the usage of quoteReplacement, because otherwise the logic in appendReplacement will treat certain chars like \ and $ specially (as escapes and group references), so quoting specifies that you want the replacement to be verbatim
- .appendTail: to append to the final string whatever is left after the last match
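To make the protocol above more concrete, here is a small trace of what ends up in the buffer after each step. This is only a sketch: the input string and the val names (afterFirst, afterSecond, result, unquoted, quoted) are made up for illustration.

```scala
import java.util.regex.{Matcher, Pattern}

val m  = Pattern.compile("\\d+").matcher("a1b22c")
val sb = new StringBuffer

m.find()                       // first match: "1"
m.appendReplacement(sb, "X")   // copies "a", then appends "X"
val afterFirst = sb.toString   // "aX"

m.find()                       // second match: "22"
// copies "b", then appends a literal "$0" (quoted, so not a group reference)
m.appendReplacement(sb, Matcher.quoteReplacement("$0"))
val afterSecond = sb.toString  // "aXb$0"

m.appendTail(sb)               // copies the remaining "c"
val result = sb.toString       // "aXb$0c"

// The same difference, seen through String.replaceAll:
// unquoted, "$0" expands to the matched text
val unquoted = "a1b22c".replaceAll("\\d+", "$0!")                           // "a1!b22!c"
// quoted, the replacement is inserted verbatim
val quoted   = "a1b22c".replaceAll("\\d+", Matcher.quoteReplacement("$0!")) // "a$0!b$0!c"
```

Since the replacement comes out of an arbitrary function, it may well contain a $ or a \, which is why quoting it unconditionally is the safe default.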
We can now write something like this:
def camelCaseToSnakeCase(input: String): String =
  replaceAll("[A-Z](?=[a-z0-9_])".r.pattern, input) { (i, ch, _) =>
    (if (i > 0) "_" else "") + ch.toLowerCase
  }
// And usage:
camelCaseToSnakeCase("RebuildSubscribersCounts")
//=> res: String = rebuild_subscribers_counts
Or like this:
def incrementNumbersIn(input: String): String =
  replaceAll("\\d+".r.pattern, input) { (_, num, _) =>
    (num.toInt + 1).toString
  }
// And usage:
incrementNumbersIn("Apollo 12")
//=> res: String = Apollo 13
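For comparison, if you are on Scala and don't need the Java-compatible shape, scala.util.matching.Regex has a replaceAllIn overload that takes a function from Match to String. A minimal sketch, assuming that is acceptable for your use case (the names incrementNumbersViaRegex and swapPairs are hypothetical, picked for this example). Note that the returned string is still treated as a replacement template, so it needs Regex.quoteReplacement for the same reason as above:

```scala
import scala.util.matching.Regex

// Same behaviour as incrementNumbersIn, via the Scala standard library
def incrementNumbersViaRegex(input: String): String =
  "\\d+".r.replaceAllIn(input, (m: Regex.Match) =>
    // the result is interpreted as a template, so quote it
    Regex.quoteReplacement((m.matched.toInt + 1).toString))

// Capture groups are available on the Match: turns "key=value" into "value=key"
def swapPairs(input: String): String =
  "(\\w+)=(\\w+)".r.replaceAllIn(input, (m: Regex.Match) =>
    Regex.quoteReplacement(m.group(2) + "=" + m.group(1)))

incrementNumbersViaRegex("Apollo 12") // "Apollo 13"
swapPairs("a=1, b=2")                 // "1=a, 2=b"
```

The hand-rolled replaceAll is still handy when you want the match index and the groups passed in as plain values, or when the code has to be ported to Java.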
Enjoy ~