Python pattern strings

by Alex

Although the Zen of Python says “There should be one-- and preferably only one --obvious way to do it”, our favorite language has as many as four ways to format a string. That is simply how it evolved historically. This is the second lesson in a series on string formatting. The series covers:

  1. The string formatting operator (%)
  2. The format() method
  3. f-strings
  4. Template strings

In this lesson, we’re going to learn about string templates.

String templates are a tool provided by the string module of the Python standard library. To start working with them, import the Template class:


from string import Template

Templates live in a module rather than in the core syntax because they are meant for specific tasks, not for everyday formatting. Their functionality is fairly limited, but they can be fine-tuned.

How it works

String templates originally emerged as an alternative to the % formatting operator for creating and handling complex patterns.

Let’s look at an example:


from string import Template
template_string = Template('The best programming language is $lang!')
prepared_string = template_string.substitute(lang='Python')
print(prepared_string)
# Output:
The best programming language is Python!

Here is what happens:

  • We import the Template class from the string module. This is the class that provides all the magic.
  • We define the template string. “$lang” is a placeholder: the identifier lang marks the spot where a value will be substituted.
  • We perform the substitution with the “.substitute()” method, passing lang='Python' to say which value replaces the identifier. The names of the arguments passed to “.substitute()” must match the identifiers in the placeholders of the template string.
  • We print the result to the console.

We can think of this mechanism as a command: “if this string has identifiers matching the names of the arguments, substitute the values of the arguments for the identifiers.” The substitution does not depend on the type of the value passed: whatever it is, it is converted to a string with str() and inserted.
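As a small illustration of the type-independent substitution described above, here is a sketch that passes an integer, a float, a date, and a list to one template. The template text and the variable names are made up for this example.

```python
from datetime import date
from string import Template

# Hypothetical report line; each placeholder receives a value of a different type.
report = Template('$count items, total $total, shipped on $day, tags: $tags')

# Every value is converted with str() before it is inserted.
line = report.substitute(count=3, total=19.5, day=date(2021, 12, 31), tags=['a', 'b'])
print(line)
# Output:
# 3 items, total 19.5, shipped on 2021-12-31, tags: ['a', 'b']
```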

Finding an identifier and replacing it with a value is done using a regular expression, which can be overridden, but more on that later. You can learn more about regular expressions in our tutorial on this topic.

Template String

A template string can be any string that contains valid identifiers. What does “valid” mean? By default, an identifier must meet the following requirements:

  • Starts with “$”
  • Contains ASCII letters, digits, and/or the underscore character, and cannot start with a digit. In general, the requirements are the same as for ordinary variable names, except that Cyrillic and other non-ASCII characters cannot be used in template identifiers.
  • An identifier is considered complete when the first character that does not meet the requirements of the previous paragraph is encountered.
  • The identifier can be surrounded by curly braces.

Valid identifiers:

$var, ${var}, $var_var, $vAr12345, $_1234

Invalid identifiers:

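var (no leading “$”), $переменная (non-ASCII letters), $1234 (starts with a digit), $var* (“*” cannot be part of an identifier), $(var) (parentheses instead of curly braces)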
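To check these rules in practice, one option is to scan a string with the compiled regular expression that the Template class exposes as its pattern attribute. The helper name placeholder_names below is hypothetical; this is a minimal sketch that relies on the default pattern.

```python
from string import Template

def placeholder_names(text):
    """Return the identifiers that the default Template pattern finds in text."""
    names = []
    for match in Template.pattern.finditer(text):
        # A valid placeholder fills either the "named" or the "braced" group;
        # escaped delimiters and invalid placeholders fill neither.
        name = match.group('named') or match.group('braced')
        if name is not None:
            names.append(name)
    return names

print(placeholder_names('$var ${var} $var_var $vAr12345 $_1234'))
# Output: ['var', 'var', 'var_var', 'vAr12345', '_1234']
print(placeholder_names('var $1234 $(var) $$'))
# Output: []
```

Note that for “$var*” the helper returns ['var']: the identifier ends at the first character that cannot belong to it, and “*” stays in the text.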

If you need to use the “$” character outside an identifier, it must be escaped, that is, preceded by another “$”; otherwise Python will raise an exception:


from string import Template
from random import randint
template_string = Template('I want to earn $zp$!')
prepared_string = template_string.substitute(zp=randint(10, 100)**randint(10, 100))
print(prepared_string)
# Output:
Traceback (most recent call last):
..
ValueError: Invalid placeholder in string: line 1, col 19

With the “$” escaped, the same template works:

from string import Template
from random import randint
template_string = Template("I want to earn $zp$$!")
prepared_string = template_string.substitute(zp=randint(10, 100)**randint(10, 100))
print(prepared_string)
# Output:
I want to earn 504857282956046106624$!

You could even go like this:


from string import Template
from random import randint
template_string = Template("I want to earn $$$$$$$zp!")
prepared_string = template_string.substitute(zp=randint(10, 100)**randint(10, 100))
print(prepared_string)
# Output:
I want to earn $$$131621703842267671360000000000000000000!

Now let’s discuss why we need the ability to wrap an identifier in curly braces. Recall: “An identifier is considered complete when the first character that does not meet the requirements of the previous paragraph is encountered.” Most often that character is a space. But what if the text right after the placeholder must start with an ordinary letter or digit? That is where curly braces, which mark the end of the identifier, come in handy:


from string import Template
from random import randint
template_string = Template('I want to earn $zp000$$!')
prepared_string = template_string.substitute(zp=randint(10, 100))
print(prepared_string)
# Output:
Traceback (most recent call last):
..
KeyError: 'zp000'

With curly braces around the identifier:

from string import Template
from random import randint
template_string = Template("I want to earn ${zp}000$$!")
prepared_string = template_string.substitute(zp=randint(10, 100))
print(prepared_string)
# Output:
I want to earn 24000$!

You may also need to manipulate word parts:


from string import Template
def semantic_reverse(word):
    template_string = Template('${word}less')
    prepared_string = template_string.substitute(word=word)
    print(prepared_string)
semantic_reverse('brain')
semantic_reverse('limit')
# Output:
brainless
limitless

Here’s another example, closer to a real-world task. Suppose we need to dynamically build a path to a directory on the filesystem:


from string import Template
def create_path(*crumbs):
    result_path = ''
    template_string = Template('$crumb/')
    for crumb in crumbs:
        prepared_string = template_string.substitute(crumb=crumb)
        result_path += prepared_string
    return result_path
my_path = create_path('C', 'home', 'dir')
print(my_path)
# Output:
C/home/dir/

Since the slash is not a valid identifier character, the script works the way we wanted. But if the separator in a similar task is a valid identifier character, we run into a problem:


from string import Template
def create_file_name(*crumbs):
    file_name = ''
    template_string = Template('$crumb_')
    for crumb in crumbs:
        prepared_string = template_string.substitute(crumb=crumb)
        file_name += prepared_string
    file_name = file_name[:-1]
    file_name += '.xml'
    return file_name
my_path = create_file_name('2021', '12', '31')
print(my_path)
# Output:
Traceback (most recent call last):
..
KeyError: 'crumb_'

Again the curly braces come to the rescue:


from string import Template
def create_file_name(*crumbs):
    file_name = ''
    template_string = Template('${crumb}_')
    for crumb in crumbs:
        prepared_string = template_string.substitute(crumb=crumb)
        file_name += prepared_string
    file_name = file_name[:-1]
    file_name += '.xml'
    return file_name
my_path = create_file_name('2021', '12', '31')
print(my_path)
# Output:
2021_12_31.xml

Now the result is correct, because the curly braces separate the identifier from the underscore. The template string itself is stored in the template attribute of the Template instance:


from string import Template
template_string = Template('Some string with identifier $crumb')
print(template_string.template)
# output:
Some string with identifier $crumb

We can change the object’s template, but it is hard to call this good programming style:


from string import Template
template_string = Template('Some string with ID $crumb')
print(template_string.template)
print(template_string.substitute(crumb=12345))
template_string.template = 'Some new string with id $var'
print(template_string.template)
print(template_string.substitute(var='there was an id'))
# Output:
Some string with ID $crumb
Some string with ID 12345
Some new string with id $var
Some new string with id there was an id

The substitute() method

The .substitute() method substitutes values in place of identifiers. To do this, it maps the names of the arguments to the names of the identifiers. There can be several identifiers, of course:


from string import Template
template_string = Template('$a and $b sat on a pipe')
print(template_string.substitute(a='Python', b='C++'))
# Output:
Python and C++ sat on a pipe

In addition to named arguments, you can pass a dictionary to the .substitute() method, either directly as the first positional argument or, as in the example here, by dictionary unpacking (the “**” operator before the dictionary name):


from string import Template
template_string = Template('$a and $b were sitting on a pipe')
my_dict = {'a': 'Python', 'b': 'C++'}
print(template_string.substitute(**my_dict))
# Output:
Python and C++ were sitting on a pipe
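For comparison, here is a short sketch of the positional form: .substitute() also takes a mapping as its first argument, and keyword arguments given alongside it take precedence over the mapping. The dictionary name defaults and the value 'Rust' are made up for this example.

```python
from string import Template

template_string = Template('$a and $b sat on a pipe')
defaults = {'a': 'Python', 'b': 'C++'}

# The mapping is passed as is, without unpacking.
from_mapping = template_string.substitute(defaults)
print(from_mapping)
# Output: Python and C++ sat on a pipe

# A keyword argument overrides the value with the same name in the mapping.
overridden = template_string.substitute(defaults, b='Rust')
print(overridden)
# Output: Python and Rust sat on a pipe
```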

Common errors

The .substitute() method is strict and does not forgive mistakes. If the template contains more identifiers than you pass arguments, a KeyError is raised:


from string import Template
template_string = Template('$a and $b sat on a pipe')
my_dict = {'a': 'Python',}
print(template_string.substitute(**my_dict))
# Output:
Traceback (most recent call last):
..
KeyError: 'b'

The same thing happens if you pass an argument whose name does not match the identifier, because the identifier is then left without a value. If the template contains an invalid placeholder, we get a ValueError exception telling us that the placeholder is invalid.
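If a program has to survive such mistakes, both exceptions can be caught explicitly. The function name render and the returned messages below are hypothetical; this is only a sketch of one possible way to handle the two cases.

```python
from string import Template

def render(template_text, **values):
    """Substitute values into template_text and report problems as text."""
    try:
        return Template(template_text).substitute(**values)
    except KeyError as error:
        # A placeholder in the template has no matching argument.
        return f'missing value for placeholder {error}'
    except ValueError as error:
        # The template itself contains an invalid placeholder.
        return f'bad template: {error}'

print(render('$a and $b sat on a pipe', a='Python', b='C++'))
print(render('$a and $b sat on a pipe', a='Python'))
print(render('100$ for $a', a='Python'))
```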

The safe_substitute() method

The .safe_substitute() method is a less strict analog of .substitute(). It works the same way but does not raise the exceptions described above: if arguments are missing or their names do not match the identifiers, .safe_substitute() simply leaves those placeholders in the string “as is”. Let’s check with the example above:


from string import Template
template_string = Template('$a and $b were sitting on a pipe')
my_dict = {'a': 'Python',}
print(template_string.safe_substitute(**my_dict))
# Output:
Python and $b were sitting on a pipe
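The same tolerance applies to invalid placeholders. In the sketch below (the template text is invented for illustration), a lone “$” that would make .substitute() raise a ValueError is simply left in place, together with the identifier that has no value.

```python
from string import Template

# "100$," contains a "$" that does not start a valid placeholder,
# and no value is passed for $seller.
offer = Template('Price: 100$, buyer: $name, seller: $seller')

result = offer.safe_substitute(name='Alex')
print(result)
# Output:
# Price: 100$, buyer: Alex, seller: $seller
```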

Configuring the Template class

As with any class in Python, we can inherit from Template, which means we can override its attributes. This is where the most interesting features of this tool are hidden. So, let’s inherit:


from string import Template
class NewTemplate(Template):
    pass

Override delimiter

The delimiter attribute contains the character used as the initial identifier character:


from string import Template
template_string = Template('$a and $b sat on a pipe')
print(template_string.delimiter)
# output:
$

Now let’s try to override it:


from string import Template
class NewTemplate(Template):
    delimiter = 'substituting this:'
template_string = Template('substituting this:a and $b sat on a pipe')
print(template_string.safe_substitute(a='Python'))
template_string = NewTemplate('substituting this:a and $b sat on a pipe')
print(template_string.safe_substitute(a='Python'))
# Output:
substituting this:a and $b sat on a pipe
Python and $b sat on a pipe

Now you can use the NewTemplate class the same way you use the Template class, but the delimiter will be “substituting this:” rather than “$”. Why would you need this? Imagine a template with a lot of “$” characters that are part of the text, not of an identifier. If you don’t override the delimiter, you have to escape each of them manually. The situation gets even worse if the program receives the string from an external source. Overriding the delimiter is also convenient for templates that already use a delimiter from a different syntax. For example, you have the text of a T-SQL query with query parameters. It’s very simple:


from string import Template
class NewTemplate(Template):
    delimiter = '@'
query = 'select 1 from my_schema.my_table where my_table.id = @id'
template_string = NewTemplate(query)
print(template_string.safe_substitute(id=13))
# Output:
select 1 from my_schema.my_table where my_table.id = 13
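The first motivation mentioned above, a text full of literal “$” characters, can be sketched in the same way. The class name PercentTemplate and the price text are assumptions made for this example; with “%” as the delimiter, the dollar signs need no escaping.

```python
from string import Template

class PercentTemplate(Template):
    # "%" starts a placeholder, so "$" becomes an ordinary character.
    delimiter = '%'

price_list = PercentTemplate('%item costs $5, or $4 each if you buy %count')
rendered = price_list.substitute(item='Coffee', count=10)
print(rendered)
# Output:
# Coffee costs $5, or $4 each if you buy 10
```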

Override identifier mask

The idpattern attribute is a regular expression used to check the body of the identifier specified in the template string:


from string import Template
template_string = Template('$a and $b sat on a pipe')
print(template_string.idpattern)
# output:
(?a:[_a-z][_a-z0-9]*)

Of course, we can override this attribute as well. As a rule, such an override will introduce stricter naming rules for the identifier. An interesting example:


from string import Template
class NewTemplate(Template):
    delimiter = ' '
    idpattern = 'Olya'
query = 'Olya sat on a stump, Olya ate a pie'
template_string = NewTemplate(query)
# Only an "Olya" preceded by the delimiter (a space) is a placeholder,
# and the delimiter is consumed together with the identifier.
print(template_string.safe_substitute())
print(template_string.safe_substitute(Olya='Larisa Petrovna'))
# Output:
Olya sat on a stump, Olya ate a pie
Olya sat on a stump,Larisa Petrovna ate a pie
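Since an override usually introduces stricter naming rules, here is a sketch of one such rule: identifiers made of upper-case ASCII letters only. The class name UpperTemplate and the sample text are hypothetical. The sketch also sets the flags class attribute, because its default value, re.IGNORECASE, would otherwise make the pattern match lower-case names too.

```python
from string import Template

class UpperTemplate(Template):
    # Hypothetical stricter rule: upper-case letters, digits, and underscores.
    idpattern = r'[A-Z][A-Z0-9_]*'
    # The default is re.IGNORECASE; 0 makes the match case-sensitive.
    flags = 0

config = UpperTemplate('home=$HOME user=$user')

# $user no longer counts as a placeholder, so it stays in the text.
result = config.safe_substitute(HOME='/home/alex', user='alex')
print(result)
# Output:
# home=/home/alex user=$user
```

With .substitute() instead of .safe_substitute(), the same template raises a ValueError, because “$user” is now an invalid placeholder.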

The pattern attribute

If overriding the delimiter and idpattern attributes is not enough, you can override the pattern attribute. To do this, you need to provide a regular expression with four named groups:

  1. escaped – matches the escape sequence for the delimiter, for example “$$” in the default pattern.
  2. named – matches a valid identifier as in $identifier; it must not include the delimiter.
  3. braced – matches the brace-enclosed name as in ${identifier}; it must not include the delimiter or the curly braces.
  4. invalid – matches any other delimiter pattern (usually a single delimiter) and should appear last.
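The four groups above can be put together as in the following sketch. The class name BracketTemplate, the “%” delimiter, and the %[name] syntax are all assumptions chosen for illustration. The pattern is written as a string, and the Template machinery compiles it with the class flags plus re.VERBOSE, which is why whitespace and comments are allowed inside it.

```python
from string import Template

class BracketTemplate(Template):
    delimiter = '%'
    pattern = r"""
    %(?:
      (?P<escaped>%)          |  # "%%" is an escaped delimiter
      (?P<named>[a-z_]+)      |  # %name
      \[(?P<braced>[a-z_]+)\] |  # %[name]
      (?P<invalid>)              # any other use of "%"
    )
    """

message = BracketTemplate('%[n]th place goes to %name with 100%% effort')
rendered = message.substitute(n=5, name='Olya')
print(rendered)
# Output:
# 5th place goes to Olya with 100% effort
```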

Here’s how you can find out the current pattern:


from string import Template
class NewTemplate(Template):
    delimiter = ' '
    idpattern = 'Olya'
query = 'Olya sat on a stump, Olya ate a pie'
template_string = NewTemplate(query)
print(template_string.pattern)
print(template_string.pattern.pattern)
# Output (whitespace condensed; Python cuts off the repr of a long pattern):
re.compile('\n \\ (?:\n (?P<escaped>\\ ) | # Escape sequence of two delimiters\n (?P<named>Olya) | # delimiter and a Python identifier\n {(?P<braced>Ol, re.IGNORECASE|re.VERBOSE)
\ (?:
(?P<escaped>\ ) | # Escape sequence of two delimiters
(?P<named>Olya) | # delimiter and a Python identifier
{(?P<braced>Olya)} | # delimiter and a braced identifier
(?P<invalid>) # Other ill-formed delimiter exprs
)
