Java and the String Odyssey: Navigating Changes from JDK 1 to JDK 21
A N M Bazlur Rehman
November 20, 2023

Java has been a major player in software engineering since its release in 1995, and it has evolved considerably over the years. One key aspect of the language is how it handles text: <span  class="teal" >String</span> is among the most heavily used types in Java programs. On average, 50% of a typical Java heap may be consumed by <span  class="pink" >String</span> objects, which is substantial.

This article explores the evolution of string handling in Java, starting from its first release up to the latest version, Java 21.

Early Java String Handling

In JDK 1, Java introduced the <span  class="pink" >String</span> class as an immutable sequence of characters, a choice made with reliability and security in mind. Immutable strings are thread-safe, so they can be shared across threads without synchronization. Their predictability and resistance to tampering also protect sensitive data such as network addresses and file paths. Immutability also enables Java's string pool, which keeps a single copy of each string literal (and of any string that is explicitly interned) and so reduces memory usage.
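The pooling behaviour can be observed with reference comparison. The following is a minimal sketch; the class and method names are made up for illustration, and <span  class="teal" >==</span> is used here only to inspect object identity, not as a recommended way to compare strings.

```java
class StringPoolDemo {
    // Returns true when both references point to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literalA = "hello";
        String literalB = "hello";
        String copy = new String("hello"); // explicitly creates a separate heap object

        // Two identical literals share one pooled instance.
        System.out.println(sameObject(literalA, literalB));
        // The explicit copy is a different object with equal content.
        System.out.println(sameObject(literalA, copy));
        // intern() returns the pooled instance for that content.
        System.out.println(sameObject(literalA, copy.intern()));
        // Content comparison should always go through equals().
        System.out.println(literalA.equals(copy));
    }
}
```

Because strings cannot change after creation, sharing one instance between unrelated parts of a program is safe.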

String concatenation in JDK 1 primarily used the <span  class="teal" >+</span> operator. For example,

 
 String greeting = "Hello, " + name + "!";

However, this method had efficiency concerns, especially for multiple concatenations. Each + operator usage in concatenation resulted in the creation of a new String object. This was particularly inefficient in scenarios like loops where concatenating a list of strings would create a new String object at every iteration, causing substantial performance overhead and increased memory usage.

 
String[] words = {"Java", "is", "cool"};
String sentence = "";

for (String word : words) {
   sentence = sentence + word + " "; // Inefficient concatenation
}

System.out.println(sentence.trim()); 

If each + produces a new String, the code above creates seven String objects (two per loop iteration, plus one for trim()) to build a single result.

JDK 1 to JDK 5: Introduction of StringBuffer and StringBuilder

To address this efficiency concern, Java provides the StringBuffer class, available since JDK 1.0, which represents a mutable sequence of characters. It makes a real difference for string manipulation, especially when a string is modified frequently. For example:

 
public class StringBufferExample {
   public static void main(String[] args) {
       StringBuffer stringBuffer = new StringBuffer();
       String[] words = {"Java", "evolves", "with", "time"};

       for (String word : words) {
           stringBuffer.append(word).append(" ");
       }

       System.out.println(stringBuffer.toString().trim());
   }
}

JDK 5 took string manipulation a step further with the introduction of StringBuilder. It was similar to <span  class="pink" >StringBuffer</span> in providing a mutable sequence of characters but differed in a crucial aspect.

The major difference between these two is that <span  class="pink" >StringBuilder</span> is a bit faster and more suitable for single-threaded scenarios as it's not thread-safe. In contrast, <span  class="pink" >StringBuffer</span> is thread-safe and slightly slower due to its synchronized methods, making it ideal for multi-threaded environments. Both offer similar APIs, allowing for easy interchangeability based on your thread safety requirements.
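For comparison, here is the same kind of loop written with <span  class="pink" >StringBuilder</span>. This is a minimal sketch for single-threaded use; the class name and the join helper are illustrative and not part of the JDK.

```java
class StringBuilderExample {
    // Joins the words with single spaces, reusing one mutable buffer.
    static String join(String[] words) {
        StringBuilder builder = new StringBuilder();
        for (String word : words) {
            if (builder.length() > 0) {
                builder.append(' '); // separator only between words
            }
            builder.append(word);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String[] words = {"Java", "evolves", "with", "time"};
        System.out.println(join(words));
    }
}
```

Since the builder is a local variable that never escapes the method, no other thread can see it, so the synchronization that StringBuffer offers would bring no benefit here.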

Enhanced String Processing

Moving on, JDK 6 and JDK 7 continued to refine string handling, focusing more on performance optimizations rather than introducing new APIs. The major leap in string manipulation came with JDK 8, which introduced lambda expressions and the Stream API, revolutionizing how developers could handle data, including strings.

With JDK 8, operations on collections of strings became more concise and expressive due to lambda expressions and streams.

 
List<String> words = Arrays.asList("Java", "is", "evolving");
String combined = words.stream()
                       .map(String::toUpperCase)
                       .collect(Collectors.joining(" "));

// Output: "JAVA IS EVOLVING"

In this example, we transform a list of strings into a single concatenated string. Each word is converted to uppercase, and then they are joined. This approach is more readable and eliminates the need for manual iteration and string concatenation.

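In addition, JEP 192, delivered in JDK 8u20, reduces the Java heap live-data set by enhancing the G1 garbage collector so that duplicate instances of String are automatically and continuously deduplicated. The feature is opt-in and is enabled with the -XX:+UseStringDeduplication flag.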

JDK 9 to 11: Compact Strings and API Enhancements

Java 9 significantly improved how string concatenation is handled at the bytecode level. Instead of compiling + into a chain of StringBuilder calls, javac now emits invokedynamic, a bytecode instruction that has existed since Java 7, and leaves the choice of concatenation strategy to the runtime.

When concatenating strings using the <span  class="pink" >+</span> operator, Java 9 and later versions use invokedynamic, which delegates the optimization responsibility to java.lang.invoke.StringConcatFactory#makeConcatWithConstants. This method is more efficient in optimizing string concatenation. Consider the following code:

 
public class StringConcatenation {
    public static void main(String[] args) {
        String[] words = {"Java", "is", "cool"};
        String sentence = "";

        for (String word : words) {
            sentence = sentence + word + " "; // Inefficient concatenation
        }

        System.out.println(sentence.trim());
    }
}

The equivalent bytecode of the above code, where concatenation happens, would be:

 
46: aload         6
48: invokedynamic #7,  0              // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
53: astore_2

This optimization is a substantial under-the-hood improvement, reducing the memory and performance overhead of string concatenation.

For those interested in the technical details of this improvement, JEP 280 offers an in-depth explanation. This Java Enhancement Proposal details the changes and optimizations brought by <span  class="pink" >invokedynamic</span> for string concatenation.

Aside from this, JDK 9 also changed the internal representation of strings by introducing compact strings (JEP 254).

In Java 8 and earlier versions, strings were represented as an array of characters (<span  class="teal" >char[]</span>), with each char occupying two bytes of memory. This representation was not always memory-efficient, especially considering that many characters in Western locales could be encoded using just one byte.

Consider the string <span  class="white">"Hello"</span>:

  • Size of a <span  class="teal">char[]</span> array object: 8 bytes (object header)
  • Size of 5 characters (<span  class="teal">char</span>): 5 * 2 bytes = 10 bytes
  • Array length (<span  class="teal">int</span>): 4 bytes
  • Total size: 8 bytes (header) + 10 bytes (characters) + 4 bytes (length) = 22 bytes, as a rough estimate (exact header sizes depend on the JVM)

To address this, compact strings store a string's characters in a byte array instead of a <span  class="teal">char</span> array, using one byte per character unless the string contains characters that need 16 bits. Hence, for text that fits in one byte per character, the character data of a string in Java 9 is roughly half the size of the same string in Java 8.
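The difference in payload size can be approximated from ordinary Java code by encoding the same text both ways. This is only an illustration of the arithmetic, not a view into the JVM's internal storage; the class and method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

class CompactStringSizes {
    // Bytes needed when every character takes one byte (Latin-1).
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Bytes needed when every char takes two bytes (UTF-16, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // True when every char fits in a single byte, the precondition for the compact form.
    static boolean fitsLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    public static void main(String[] args) {
        String hello = "Hello";
        System.out.println(latin1Bytes(hello)); // 5
        System.out.println(utf16Bytes(hello));  // 10
        System.out.println(fitsLatin1(hello));  // true

        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442"; // a word in Cyrillic
        System.out.println(fitsLatin1(cyrillic)); // false: needs 16-bit storage
    }
}
```

For strings that do not fit in Latin-1, the one-byte count is meaningless because the encoder substitutes unmappable characters, which is why the sketch checks fitsLatin1 first.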

On average, 50% of a typical Java heap may be consumed by String objects. This will vary from application to application, but on average, the heap requirement for such a program running with Java 9 is only 75% of that same program running in Java 8.

This is a huge saving.

JDK 11 then expanded the String API, introducing methods like <span  class="teal">strip()</span>, <span  class="teal" >stripLeading()</span>, <span  class="teal" >stripTrailing()</span>, <span  class="teal" >repeat()</span>, and <span  class="teal" >isBlank()</span>.

 
String text = "   Hello Java!   ";
String trimmed = text.strip();    // "Hello Java!"
String repeated = text.repeat(2); // "   Hello Java!      Hello Java!   "
boolean blank = text.isBlank();   // false

These methods made common string operations more straightforward, reducing the need for external libraries or custom utility methods.
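The one-sided variants and the difference from the older trim() are easier to see with visible delimiters. The sketch below uses a made-up helper, visible, to bracket the results; the em space (U+2003) stands in for any Unicode whitespace above U+0020.

```java
class StripDemo {
    // Wraps the text in brackets so leading and trailing spaces are visible.
    static String visible(String s) {
        return "[" + s + "]";
    }

    public static void main(String[] args) {
        String text = "   Hello Java!   ";
        System.out.println(visible(text.stripLeading()));  // [Hello Java!   ]
        System.out.println(visible(text.stripTrailing())); // [   Hello Java!]

        // trim() only removes characters up to U+0020, so the em spaces survive.
        String emSpaced = "\u2003Java\u2003";
        System.out.println(emSpaced.trim().length());  // 6
        // strip() uses Character.isWhitespace and removes them.
        System.out.println(emSpaced.strip().length()); // 4

        // isBlank() is true for empty or whitespace-only strings.
        System.out.println("  \t ".isBlank()); // true
    }
}
```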

JDK 12 to 15: Incremental Improvements

During the releases of JDK 12 to 15, Java focused on incremental improvements and refinements in string handling. These versions introduced several new methods and enhancements to the String class, making string operations more intuitive and efficient.

JDK 12 introduced new methods to the <span  class="teal">String</span> class, further simplifying common string operations.

 
String text = "Java\nEvolution";
String indentedText = text.indent(4);  // Adds four spaces to the beginning of each line
// Result: "    Java\n    Evolution\n" (indent() also ends every line with \n)

String transformed = text.transform(s -> new StringBuilder(s).reverse().toString());
// Result: "noitulovE\navaJ"

The <span  class="teal">indent()</span> method adds or removes leading whitespace on each line of the string, while <span  class="teal">transform()</span> applies a function to the string and returns the result.
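Removing indentation works by passing a negative count. A small sketch, with an illustrative helper name:

```java
class IndentDemo {
    // Removes up to `columns` leading whitespace characters from every line.
    static String outdent(String text, int columns) {
        return text.indent(-columns);
    }

    public static void main(String[] args) {
        String text = "    Java\n  Evolution";

        // Two characters removed per line; each line is terminated with \n.
        System.out.print(outdent(text, 2));  // "  Java\nEvolution\n"

        // Asking for more than a line has simply removes all of its leading whitespace.
        System.out.print(outdent(text, 10)); // "Java\nEvolution\n"
    }
}
```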

JDK 15: Text Blocks

One of the most significant additions in JDK 15 was Text Blocks, finalized after previews in JDK 13 and 14, which greatly simplify working with multi-line string literals.

 
String html = """
              <html>
                  <body>
                      <p>Hello, Java 13!</p>
                  </body>
              </html>
              """;

Text blocks simplify the creation of multi-line strings, preserving the intended formatting without the need for escape sequences.

 
String name = "John";
String greeting = """
                 Hello,
                 Dear %s,
                 Welcome to our service.
                 """.formatted(name);

Text blocks can be combined with other strings or runtime values, here through formatted(), while maintaining readability and structure.

Creating complex SQL queries in Java becomes more manageable and readable with the use of Text Blocks. Let's consider an example where we need to construct a SQL query for retrieving data from a database. This query involves multiple joins, conditions, and potentially complex logic:

 
String complexSQL = """
    SELECT 
        u.name AS UserName,
        p.title AS PostTitle,
        c.name AS CategoryName
    FROM 
        Users u
    INNER JOIN 
        Posts p ON u.id = p.user_id
    LEFT JOIN 
        Categories c ON p.category_id = c.id
    WHERE 
        u.status = 'active'
        AND p.published_date >= '2022-01-01'
        AND (
            c.name = 'Technology'
            OR c.name = 'Science'
        )
    ORDER BY 
        p.published_date DESC
    LIMIT 10;
    """;

Imagine having to write this the old way, with the + operator and explicit \n escapes.
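To make the contrast concrete, here is a shortened, hypothetical query written both ways. The class and method names are illustrative; the two methods are meant to return identical strings.

```java
class TextBlockVersusConcat {
    // The pre-text-block style: every line needs quotes, a newline escape, and a +.
    static String withConcatenation() {
        return "SELECT u.name AS UserName\n"
             + "FROM Users u\n"
             + "WHERE u.status = 'active'\n"
             + "LIMIT 10;\n";
    }

    // The same content as a text block; incidental indentation is stripped by the compiler.
    static String withTextBlock() {
        return """
            SELECT u.name AS UserName
            FROM Users u
            WHERE u.status = 'active'
            LIMIT 10;
            """;
    }

    public static void main(String[] args) {
        System.out.println(withConcatenation().equals(withTextBlock())); // true
    }
}
```

Even at four lines, the concatenated version is noisier and makes it easy to forget a newline or a space between clauses.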

JDK 21: String Template

JEP 430 introduces String Templates as a preview feature in Java 21. This enhancement aims to simplify Java programming by allowing the combination of literal text with embedded expressions and template processors. It is extremely useful for strings that include runtime-computed values or are composed of user-provided values for systems like databases.

With string templates, developers can embed expressions directly in string literals and text blocks. The feature is intended to make Java programs easier to write, to improve the readability of expressions that mix text and values, and to improve the security of programs that compose strings from user-provided values.

Let’s explore it in a bit more depth.

Template Expressions

A new kind of expression called a template expression has been introduced, allowing developers to perform string interpolation and compose strings safely and efficiently. Template expressions are programmable and extend beyond composing strings – they can convert structured text into various types of objects according to domain-specific rules.

 
String name = "Joan";
String info = STR."My name is \{name}";
assert info.equals("My name is Joan"); // true

In this example, the template is prefixed with the <span  class="pink">STR</span> processor and contains the embedded expression \{name}, which is evaluated and interpolated into the result.

Unlike traditional string interpolation, which can create security vulnerabilities, Java's template expressions always pass through a template processor, which can validate and sanitize the embedded values. A processor can thus apply template-specific rules, resulting in safer and more efficient string composition.

For example, consider this hypothetical Java code with the embedded expression <span  class="teal">${name}</span>:

 
String query = "SELECT * FROM Person p WHERE p.last_name = '${name}'";
ResultSet rs = connection.createStatement().executeQuery(query);

If <span  class="teal">name</span> had the troublesome value

 
Smith' OR p.last_name <> 'Smith

then the query string would be

 
SELECT * FROM Person p WHERE p.last_name = 'Smith' OR p.last_name <> 'Smith'

and the code would select all rows, potentially exposing confidential information.

To avoid such vulnerabilities, Java took a safer approach in which a template processor can enforce rules on the result. For example, when composing SQL statements, any quotes in the values of embedded expressions must be escaped, and the string overall must have balanced quotes.
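The problem and the escaping rule can be reproduced with plain string operations, without a database. The sketch below is illustrative only: the class and its helpers are hypothetical, and doubling single quotes is shown merely to demonstrate the rule described above. Production code would normally rely on parameterized queries such as java.sql.PreparedStatement rather than hand-written escaping.

```java
class InjectionDemo {
    // Builds the query by naive concatenation, as in the hypothetical example above.
    static String naiveQuery(String name) {
        return "SELECT * FROM Person p WHERE p.last_name = '" + name + "'";
    }

    // Hypothetical helper: doubles each single quote so it stays inside the SQL literal.
    static String escapeQuotes(String value) {
        return value.replace("'", "''");
    }

    // Same query, but with the value escaped first.
    static String escapedQuery(String name) {
        return naiveQuery(escapeQuotes(name));
    }

    public static void main(String[] args) {
        String name = "Smith' OR p.last_name <> 'Smith";
        System.out.println(naiveQuery(name));   // the injected condition becomes part of the SQL
        System.out.println(escapedQuery(name)); // the whole value remains one string literal
    }
}
```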

The STR template processor

<span  class="pink">STR</span> is a template processor defined in the Java Platform. It performs string interpolation by replacing each embedded expression in the template with the value of that expression, converted to a string.

Let's see another example:

 
String title = "My Web Page";
String text  = "Hello, world";
String html = STR."""
        <html>
          <head>
            <title>\{title}</title>
          </head>
          <body>
            <p>\{text}</p>
          </body>
        </html>
        """;

This example demonstrates how template expressions can be used to create structured HTML content safely and efficiently.

<span  class="pink">STR</span> is a <span  class="teal">public</span> <span  class="pink">static</span> <span  class="pink">final</span> field that is automatically imported into every Java source file.

The FMT Template Processor

Alongside <span  class="pink">STR</span>, Java introduces <span  class="pink">FMT</span>, another template processor with additional capabilities. Like <span  class="pink">STR</span>, <span  class="pink">FMT</span> performs interpolation, but it uniquely interprets format specifiers positioned to the left of embedded expressions. These format specifiers are consistent with those defined in java.util.Formatter, providing familiar syntax for those accustomed to Java's standard formatting utilities.

The <span  class="pink">FMT</span> processor is particularly useful for creating structured and formatted outputs, where alignment and numerical formatting are crucial.
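Because the specifiers follow java.util.Formatter, the same row layout can be sketched with String.format, which works without preview features. The class and the row helper below are illustrative, and Locale.ROOT is an assumption made here to keep the decimal separator stable across machines.

```java
import java.util.Locale;

class FormatSpecifierDemo {
    // Formats one table row: left-aligned name, then width, height, and area.
    static String row(String name, double width, double height) {
        return String.format(Locale.ROOT,
                "%-12s  %7.2f  %7.2f     %7.2f",
                name, width, height, width * height);
    }

    public static void main(String[] args) {
        // %-12s pads the name to 12 columns on the right;
        // %7.2f right-aligns a number in 7 columns with 2 decimals.
        System.out.println(row("Alfa", 17.8, 31.4));
        System.out.println(row("Bravo", 9.6, 12.4));
    }
}
```

With FMT the specifier sits immediately to the left of the embedded expression it formats, so the value and its format stay together instead of being matched up by position.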

Consider an example where we define a Rectangle record and create an array of these objects. Using <span  class="pink">FMT</span>, we can format a table that neatly displays the properties and computed area of each rectangle.

 
import static java.util.FormatProcessor.FMT;

record Rectangle(String name, double width, double height) {
    double area() {
        return width * height;
    }
}

Rectangle[] zone = new Rectangle[] {
    new Rectangle("Alfa", 17.8, 31.4),
    new Rectangle("Bravo", 9.6, 12.4),
    new Rectangle("Charlie", 7.1, 11.23),
};

String table = FMT."""
    Description     Width    Height     Area
    %-12s\{zone[0].name}  %7.2f\{zone[0].width}  %7.2f\{zone[0].height}     %7.2f\{zone[0].area()}
    %-12s\{zone[1].name}  %7.2f\{zone[1].width}  %7.2f\{zone[1].height}     %7.2f\{zone[1].area()}
    %-12s\{zone[2].name}  %7.2f\{zone[2].width}  %7.2f\{zone[2].height}     %7.2f\{zone[2].area()}
    \{" ".repeat(28)} Total %7.2f\{zone[0].area() + zone[1].area() + zone[2].area()}
    """;

// Output:
// Description     Width    Height     Area
// Alfa            17.80    31.40      558.92
// Bravo            9.60    12.40      119.04
// Charlie          7.10    11.23       79.73
//                              Total  757.69

This code snippet creates a well-structured table, demonstrating the power of  <span  class="pink">FMT</span> in handling complex string formatting scenarios.

User-Defined Template Processors

Beyond the built-in template processors <span  class="pink">STR</span> and <span  class="pink">FMT</span>, Java allows developers to create custom template processors. This flexibility opens a realm of possibilities for string manipulation tailored to specific application needs.

A template processor is essentially an instance of the functional interface StringTemplate.Processor. It implements the process method, which takes a StringTemplate and returns an object. Static fields like <span  class="pink">STR</span> simply store instances of such classes.

StringTemplate represents the template used in a template expression. It exposes the text fragments and the values of embedded expressions. These two components – fragments and values – are key to how custom template processors operate.
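The relationship between fragments and values can be modelled in plain Java, without the preview API. In the sketch below, two ordinary lists stand in for what a StringTemplate would expose; the class and method names are made up for illustration. A template with n embedded expressions has n + 1 fragments, some of which may be empty.

```java
import java.util.List;

class FragmentInterleaving {
    // Interleaves fragments and values: f0 v0 f1 v1 ... f(n-1) v(n-1) fn.
    static String interpolate(List<String> fragments, List<Object> values) {
        if (fragments.size() != values.size() + 1) {
            throw new IllegalArgumentException("expected one more fragment than values");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            sb.append(fragments.get(i)).append(values.get(i));
        }
        sb.append(fragments.get(fragments.size() - 1)); // trailing fragment
        return sb.toString();
    }

    public static void main(String[] args) {
        int x = 10, y = 20;
        // What the template "\{x} plus \{y} equals \{x + y}" would break down into.
        List<String> fragments = List.of("", " plus ", " equals ", "");
        List<Object> values = List.<Object>of(x, y, x + y);
        System.out.println(interpolate(fragments, values)); // 10 plus 20 equals 30
    }
}
```

A custom processor is free to do something other than concatenate: it can inspect, escape, or reject individual values before they ever become part of the result.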

Developers can define their own template processors, leveraging the StringTemplate interface to create specialized string composition behaviors.

 
var INTER = StringTemplate.Processor.of((StringTemplate st) -> {
    StringBuilder sb = new StringBuilder();
    Iterator<String> fragIter = st.fragments().iterator();
    for (Object value : st.values()) {
        sb.append(fragIter.next());
        sb.append(value);
    }
    sb.append(fragIter.next());
    return sb.toString();
});

int x = 10, y = 20;
String s = INTER."\{x} plus \{y} equals \{x + y}";
// Output: "10 plus 20 equals 30"

In this example, the custom processor <span  class="pink">INTER</span> alternates between appending fragments and values to construct the final string.

Let’s consider another scenario where we want to embed code snippets within a text in a way that clearly differentiates them from the surrounding text. This is particularly useful in technical writing, documentation, or educational materials.

To achieve this, we can define a custom template processor, <span  class="pink">CODE</span>, that processes a template to format and embed code snippets using a specific syntax (like backticks in Markdown).

The <span  class="pink">CODE</span> processor handles the embedding of Java class names or code snippets within a regular text, formatting them distinctly.

 
var CODE = StringTemplate.Processor.of((template) -> {
   List<Object> values = template.values();
   Iterator<String> fragIter = template.fragments().iterator();
   StringBuilder builder = new StringBuilder();
   for (Object value : values) {
       String next = fragIter.next();
       builder.append(next);
       builder.append(STR."`\{value}`");
   }
   builder.append(fragIter.next());
   return builder.toString();
});

String output = CODE."Use the \{String.class.getName()} class in Java for text manipulation.";
System.out.println(output);

// Output: Use the `java.lang.String` class in Java for text manipulation.

In this example, the <span  class="pink">CODE</span> processor wraps the <span  class="teal">String.class.getName()</span> expression within backticks, clearly marking it as a code snippet within the text.

Beyond simple string manipulation, Java's template processor API is robust enough to accommodate the creation of more complex data structures. A prime example of this is a template processor that returns instances of <span  class="teal">JSONObject</span>.

The ability to dynamically create JSON objects in a structured and safe manner is crucial in many modern applications, especially those involving data interchange and web APIs. Java's template processors can be leveraged to achieve this with great efficiency.

Here's how we can create a custom template processor that interprets the template expression to produce a <span  class="teal">JSONObject</span>:

 
import java.util.Iterator;
import org.json.JSONObject; // Assuming the use of a common JSON library

var JSON = StringTemplate.Processor.of((StringTemplate st) -> {
    JSONObject json = new JSONObject();
    Iterator<Object> valueIterator = st.values().iterator();
    for (String fragment : st.fragments()) {
        // Fragments look like "name: " and ", version: "; drop the separators to get the key
        String key = fragment.replace(",", "").replace(":", "").trim();
        if (!key.isEmpty() && valueIterator.hasNext()) {
            json.put(key, valueIterator.next());
        }
    }
    return json;
});

String name = "Java";
int version = 21;
JSONObject jsonObject = JSON."name: \{name}, version: \{version}";
System.out.println(jsonObject.toString());

// Output: {"name":"Java","version":21} (key order may vary)

In this implementation, the JSON processor extracts keys and values from the template and uses them to construct a <span  class="teal">JSONObject</span>. This approach is particularly useful for building JSON objects dynamically, with data coming from various sources in the application.

💡 It's important to note that <span  class="pink">StringTemplate</span> is currently a preview feature. Developers looking to experiment with string templates and custom template processors must enable these features explicitly. This is done by adding the <span  class="teal">--enable-preview</span> flag when compiling and when running Java applications. For instance:
<span  class="teal">javac --release 21 --enable-preview Example.java</span>
<span  class="teal">java --enable-preview Example</span>

Conclusion

The journey of string manipulation in Java, from its inception in JDK 1 to the sophisticated advancements in JDK 21, showcases a remarkable evolution. Initially focusing on immutability for security and stability, Java gradually introduced more flexible and efficient string handling mechanisms, such as <span  class="pink">StringBuffer</span>, <span  class="pink">StringBuilder</span>, and enhancements in JDK 8. The introduction of compact strings and, more recently, string templates and template expressions in JDK 21, marked significant strides towards modernization. These advancements not only simplified string manipulation but also aligned Java with contemporary programming practices, demonstrating its adaptability and responsiveness to developers' needs. As Java continues to evolve, it stands as a testament to its robustness and versatility, remaining a fundamental tool in the ever-changing landscape of software development.
