Bash Split String by Delimiter
Last updated
Splitting strings is a common need in shell scripts, for example when reading a log file line by line to extract a field such as the date. Most programming languages provide a built-in function, usually called 'split', that divides a string into parts.
Bash has no such built-in function. Instead, a string is split on a delimiter, which is a single character or a sequence of characters that separates the parts of the string.
In this tutorial, we will learn various ways to split string data using different examples.
The special variable $IFS, the Internal Field Separator, tells bash where to split a string into words. By default it contains the whitespace characters space, tab, and newline, but it can be set to any other delimiter, such as '-' or ','.
Once the delimiter is assigned, the read command can split the string, usually with two options: '-r' and '-a'. The '-r' option makes read treat a backslash (\) as a literal character rather than an escape character, while the '-a' option stores the split words in an array variable.
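To make the effect of '-r' concrete, here is a minimal sketch that splits the same string with and without it. The sample string and the array names (cooked, raw) are illustrative assumptions, not part of the original tutorial.

```bash
#!/usr/bin/env bash
# Sketch only: the sample string and variable names are assumptions.
line='a\tb c'   # single quotes keep the backslash as a literal character

# Without -r, read treats the backslash as an escape character and drops it.
IFS=' ' read -a cooked <<< "$line"

# With -r, the backslash is kept exactly as written.
IFS=' ' read -r -a raw <<< "$line"

printf 'without -r: %s\n' "${cooked[0]}"
printf 'with -r:    %s\n' "${raw[0]}"
```

Placing the IFS assignment on the same line as read limits the change to that one command, so the rest of the script keeps the default separator.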
The examples below show different ways to split string data with $IFS.
Example 1: Bash Split String by Space
The script defines the string to split and sets the delimiter to a space. The string is then split on that delimiter into an array, and a for loop prints each element.
The output lists each word of the string “Welcome to Linux OS” separately, which shows that the string was split on the space delimiter.
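The script itself is not reproduced in this text. A minimal sketch consistent with the description might look like the following; the variable names (text, words, word) are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: variable names are illustrative assumptions.
text="Welcome to Linux OS"

# Use a space as the delimiter for this read only.
# -r keeps backslashes literal; -a stores the words in the array "words".
IFS=' ' read -r -a words <<< "$text"

# Print each element of the array on its own line.
for word in "${words[@]}"; do
  printf '%s\n' "$word"
done
```

Run as written, this prints the four words Welcome, to, Linux, and OS on separate lines.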
Example 2: Bash Split String by a Particular Character
The script prompts the user for input, reads it, and sets the comma as the delimiter. It then reads the input string into an array (starr).
In the output, the user entered “John, Doe, Ohio”. The string was split on the comma delimiter and each part was printed.
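A sketch of such a script is shown below. To let it run unattended, it assigns a fixed value instead of prompting; the commented read line shows one way the prompt could be written. Apart from the array name starr, the variable names and prompt text are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only. An interactive script could obtain the value with:
#   read -r -p "Enter names separated by commas: " input
# A fixed value is used here so the example runs without a terminal.
input="John,Doe,Ohio"

# Split on commas into the array starr.
IFS=',' read -r -a starr <<< "$input"

for item in "${starr[@]}"; do
  printf '%s\n' "$item"
done
```

If the input contains a space after each comma, as in “John, Doe, Ohio”, the second and third elements keep a leading space. Setting IFS=', ' so that both characters count as delimiters is one way to avoid that.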
A string can also be split without the $IFS variable. For instance, the readarray command, which accepts the -d option in bash 4.4 and later, can split string data. Just as $IFS does, the -d option defines the separator character.
This script prompts the user to enter colon-separated input. It reads the string into an array, splitting it on the colon delimiter. A for loop then prints each value of the array.
In this example, the string “Python:Node:React” is the input. The readarray command with the -d option splits it on the colon (:) delimiter.
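The readarray approach described above could be sketched as follows. It assumes bash 4.4 or newer and uses a fixed input value rather than a prompt; the names input, parts, and part are illustrative.

```bash
#!/usr/bin/env bash
# Sketch only; readarray -d needs bash 4.4 or newer.
# An interactive script could read the value with: read -r -p "..." input
input="Python:Node:React"

# -d ':' sets the delimiter and -t strips it from each stored element.
# printf '%s' is used instead of a here-string so that no trailing
# newline is attached to the last element.
readarray -d ':' -t parts < <(printf '%s' "$input")

for part in "${parts[@]}"; do
  printf '%s\n' "$part"
done
```

Unlike read, readarray is not limited to a single line of input, so this form also works when the elements themselves contain newlines.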
The read command with the -r option reads raw input, that is, it interprets backslashes literally rather than as escape characters. With the -a option, read stores the words it reads in an array. A bash loop then prints the string in split form.
This script defines the string to split and sets the semicolon as the delimiter. A for loop then prints each value of the split string.
The output shows the parts of the string “Windows;Linux OS; Debian; Fedora” after it was split on the semicolon delimiter.
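A sketch along those lines, using the string quoted above, is given here. The variable names are assumptions, and the square brackets in the output are added only to make leading spaces visible.

```bash
#!/usr/bin/env bash
# Sketch only: variable names are illustrative assumptions.
text="Windows;Linux OS; Debian; Fedora"

# Only the semicolon is a delimiter, so spaces inside or around
# an element are preserved.
IFS=';' read -r -a systems <<< "$text"

for item in "${systems[@]}"; do
  printf '[%s]\n' "$item"
done
```

Because the space is not part of IFS here, “Linux OS” stays a single element, while “ Debian” and “ Fedora” keep their leading space.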
The tr command can be used instead of read to split the string on the delimiter. The following example shows how.
This example is much the same as the previous one, except that the script uses tr instead of read.
In this approach, the array elements are divided on both the semicolon and the space. The string “Windows; Linux OS; Debian; Fedora” produces the output shown above, in which some elements are broken into separate words. For example, ‘Linux OS’ is treated as two words: Linux and OS.
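One way this behaviour can arise is sketched below: tr turns each semicolon into a newline, and the unquoted command substitution is then split by the shell on all default whitespace. This is an assumed reconstruction, not the original listing, and the variable names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch only: an assumed reconstruction with illustrative names.
text="Windows; Linux OS; Debian; Fedora"

# tr replaces every semicolon with a newline. The unquoted $(...) is then
# word-split on the default IFS (space, tab, newline), which is why
# "Linux OS" ends up as two separate elements.
# Note: unquoted expansion also performs filename globbing, so this form
# is only safe when the text contains no *, ?, or [ characters.
items=($(printf '%s' "$text" | tr ';' '\n'))

for item in "${items[@]}"; do
  printf '%s\n' "$item"
done
```

The result has five elements (Windows, Linux, OS, Debian, Fedora) rather than four, so this approach fits best when the elements contain no spaces.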
The awk command is available on Linux and other Unix-like systems and can be called from bash or any other shell. The pipe (|) symbol passes input to awk. Awk provides a split function that creates an array from a string based on a given delimiter and returns the number of elements it created.
The syntax for the split function is split(string, array, delimiter).
The delimiter is optional; if it is omitted, awk uses its field separator, which defaults to whitespace.
Example:
In this script, no delimiter is passed to split, so the string is split on whitespace.
The string is “9 10 11”. It is split on the space delimiter and each part is printed, as shown above.
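A minimal sketch of this awk usage follows. The shell variable names (text, result) and the awk array name (parts) are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: variable and array names are illustrative assumptions.
text="9 10 11"

# split() fills the awk array "parts" and returns the element count.
# No third argument is given, so awk splits on whitespace by default.
result=$(printf '%s\n' "$text" | awk '{
  n = split($0, parts)
  for (i = 1; i <= n; i++) print parts[i]
}')

printf '%s\n' "$result"
```

Passing a third argument, for example split($0, parts, ";"), would split on that delimiter instead of whitespace.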
The cut command can also split a string on a delimiter. The -d option specifies the delimiter, and the -f option specifies which field (or fields) to extract.
The script shows how cut is used: the string is split on the semicolon delimiter.
In this example, cut splits the string “Python;React;Angular” on the semicolon delimiter. The -d option specifies the delimiter, in this case the semicolon (;). The -f option specifies that the first field (Python) should be extracted.
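The cut usage described above can be sketched as follows. The variable names are assumptions, and the second call, which selects a range of fields, is an addition for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: variable names are illustrative assumptions.
text="Python;React;Angular"

# -d ';' sets the delimiter; -f 1 selects the first field.
first=$(printf '%s\n' "$text" | cut -d ';' -f 1)
printf '%s\n' "$first"

# A range of fields can be selected as well; cut keeps the
# delimiter between the fields it prints.
rest=$(printf '%s\n' "$text" | cut -d ';' -f 2-3)
printf '%s\n' "$rest"
```

Note that cut extracts fields rather than building an array, so it is most convenient when only one or two known positions are needed.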
The sed command can also be used to split string data, by substituting the delimiter with another character. The general form of the substitution is sed 's/delimiter/replacement/g'.
Here, g stands for global: the substitution is applied to every occurrence of the delimiter, not just the first. In the example, sed splits the string using the colon as the delimiter.
The output shows that the string “Hello:World” was split into two words, Hello and World, on the colon delimiter.
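A sketch of the sed approach is shown below. It replaces each colon with a space, which works the same way in GNU and BSD sed, and then lets read collect the words; the original listing may have used a different replacement character. The variable names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: variable names and the choice of a space as the
# replacement character are assumptions.
text="Hello:World"

# s/:/ /g substitutes every colon with a space (g = all occurrences).
spaced=$(printf '%s\n' "$text" | sed 's/:/ /g')
printf '%s\n' "$spaced"

# read then splits on the spaces that sed produced.
read -r -a words <<< "$spaced"
printf '%s\n' "${words[@]}"
```

As with tr, this only works cleanly when the elements themselves contain no spaces.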
Splitting strings is useful in many scripting tasks. Unlike most programming languages, bash has no built-in split function; instead, strings are split on delimiters using the IFS variable or commands such as readarray, tr, awk, cut, and sed.