Working with strings in Python

by Alex

Strings in programming languages are ordered sequences of characters that are used to represent any textual information. In Python, strings are a data type in their own right, so with Python’s built-in functions you can perform operations on them and format them for output.

Creating

There are two main ways to create a string: with a literal or by calling a built-in function. Let's look at literals first. Below, the assignment operator binds the variable string to a piece of text, and the print function displays the result.

string = 'some text'
print(string)

some text

As the previous example shows, a string literal can be enclosed in single quotes. If the text itself must contain single quotes, enclose the literal in double quotes instead, as in the following snippet. The new string includes the quoted word 'new', which is printed along with the rest of the text.

string = "some 'new' text"
print(string)

some 'new' text

Sometimes a string needs to span several lines while keeping its formatting. Enclosing the literal in triple double quotes solves this problem. A string declared this way can hold text with any number of lines and paragraphs, as shown in this code.

string = """some 'new' text
with new line here"""
print(string)

some 'new' text
with new line here

Special characters

Using triple quotes to format strings is not always convenient, because it can take up too much space in your code. To control the formatting yourself, use escape sequences, which begin with a backslash, as shown in the following example. It uses the tab character \t and the line feed character \n. The print function displays the resulting string on the screen.

string = "some\ttext\nnew line here"
print(string)

some    text
new line here

Escape sequences are interpreted automatically, but sometimes this gets in the way, for example when a string holds a Windows file path. To disable them, put the prefix r immediately before the opening quotation mark of the literal, with no space in between. In such a raw string, backslashes are kept as ordinary characters instead of starting an escape sequence.

string = r"D:\dir\new"
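The effect of the r prefix can be checked with a small sketch. The path below is hypothetical and was chosen only because it contains \n and \t, which would otherwise be read as escape sequences; the variable names are illustrative.

```python
# Hypothetical path: "\n" and "\t" inside it look like escape sequences.
plain = "D:\new\table"      # \n becomes a line feed, \t becomes a tab
raw = r"D:\new\table"       # backslashes are kept as ordinary characters
escaped = "D:\\new\\table"  # doubling each backslash gives the same string

print(len(plain))      # 10, because each escape collapses into one character
print(len(raw))        # 12
print(raw == escaped)  # True
```

Doubling the backslashes is an alternative when a raw string is not suitable.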

The following table lists the main escape sequences that Python recognizes in string literals. Most of them move the output position: line feed, tabulation, or carriage return.

Symbol                Purpose
\n                    Line feed: advance to a new line
\b                    Backspace: move back one character
\f                    Form feed: advance to a new page
\r                    Carriage return: move to the beginning of the line
\t                    Horizontal tab
\v                    Vertical tab
\a                    Bell (beep)
\N{name}              Character with the given name in the Unicode database
\uxxxx, \Uxxxxxxxx    Unicode character given by 4 or 8 hexadecimal digits (16-bit and 32-bit forms)
\xhh                  Character with the hexadecimal code hh
\ooo                  Character with the octal code ooo
\0                    Null character
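As a small illustration of the numeric and named escape forms, the sketch below spells the same one-character string, the letter A, in several ways. The list name variants is illustrative only.

```python
# Each literal below is the single character "A", written with a different escape.
variants = [
    "\x41",                        # hexadecimal code
    "\101",                        # octal code
    "\u0041",                      # 4 hexadecimal digits
    "\U00000041",                  # 8 hexadecimal digits
    "\N{LATIN CAPITAL LETTER A}",  # name from the Unicode database
]

print(all(v == "A" for v in variants))  # True
print(len("\0"))                        # 1, the null character is one character
```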

The sequence \n is used most often. It performs a line feed. Let's look at an example:

print('first\nsecond')

first
second

Formatting

To insert values from the program into a string, you can apply the % operator after the literal. In the following example the literal contains not only text but also placeholders for a string and an integer. Note that each value in the parentheses must correspond to a placeholder in the literal, written as % followed by a conversion character that suits the type of the value.

string = "text"
number = 10
newString = "this is %s and digit %d" % (string, number)
print(newString)

this is text and digit 10

The following code snippet uses formatting to output a right-aligned string in a field with a total width of 10 characters.

string = "text"
newString = "%10s" % string
print(newString)

      text

This table lists the conversion characters for % formatting in Python. Each one represents a particular kind of value, numeric or textual.

Symbol        Purpose
%d, %i, %u    Integer in decimal notation
%x, %X        Integer in hexadecimal notation with lowercase or uppercase letters
%o            Integer in octal notation
%f, %F        Floating-point number
%e, %E        Floating-point number in exponent notation with a lowercase or uppercase exponent letter
%c            Single character
%s, %r        String produced by str() or repr(), respectively
%%            Percent character
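A short sketch of several of these conversion characters in use follows. The values and variable names are arbitrary and chosen only for illustration.

```python
number = 255

hex_oct = "%x %X %o" % (number, number, number)  # hexadecimal and octal
rounded = "%.2f" % 3.14159                       # two digits after the point
exponent = "%e" % 1234.5                         # exponent notation
left = "%-10s|" % "text"                         # left-aligned, width 10
both = "%s vs %r" % ("text", "text")             # %r keeps the quotes
percent = "%d%%" % 50                            # literal percent sign

print(hex_oct)   # ff FF 377
print(rounded)   # 3.14
print(exponent)  # 1.234500e+03
print(left)      # text      |
print(both)      # text vs 'text'
print(percent)   # 50%
```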

More convenient formatting is done with the format method of strings. It accepts as arguments the objects that should be included in the string, and the literal specifies where each one goes using numerical indices in curly braces, starting from zero.

string = "text"
number = 10
newString = "this is {0} and digit {1}".format(string, number)
print(newString)

this is text and digit 10
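Besides numerical indices, the format method also accepts keyword arguments, and an index can be reused. The sketch below illustrates both; the names name and age and their values are made up for the example.

```python
# The same positional argument can be referenced more than once.
repeated = "{0} and {0} again".format("text")

# Keyword arguments are referenced by name instead of by index.
named = "{name} is {age}".format(name="Alex", age=30)

# A format specification after the colon controls the presentation.
rounded = "{0:.2f}".format(3.14159)

print(repeated)  # text and text again
print(named)     # Alex is 30
print(rounded)   # 3.14
```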

The following example shows how a string can be centered by using the format method and special characters. The original text here is centered in the middle of the string, and the empty space is filled with an *.

string = "text"
newString = "{:*^10}".format(string)
print(newString)

***text***

The following table lists the format specification characters for aligning text and for controlling how the sign of positive and negative numbers is displayed.

Symbol    Purpose
'<'       Left alignment, with fill characters on the right
'>'       Right alignment, with fill characters on the left
'='       Fill characters placed after the sign of a number but before its digits
'^'       Center alignment, with fill characters on both sides
'+'       Sign shown for both positive and negative numbers
'-'       Sign shown for negative numbers only
' '       Sign shown for negative numbers and a leading space for positive numbers
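The alignment and sign characters can be tried out with a sketch like the following. The widths and values are arbitrary, and the trailing | is added only to make the padding visible.

```python
left = "{:<10}|".format("text")    # pad on the right
right = "{:>10}|".format("text")   # pad on the left
after_sign = "{:=+8}".format(42)   # padding between the sign and the digits

plus = "{:+} {:+}".format(5, -5)   # sign for every number
minus = "{:-} {:-}".format(5, -5)  # sign for negative numbers only
space = "{: } {: }".format(5, -5)  # space in place of the plus sign

print(left)        # text      |
print(right)       #       text|
print(after_sign)  # +     42
print(plus)        # +5 -5
print(minus)       # 5 -5
print(space)       #  5 -5
```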

String operations

Before we turn to functions for working with strings, you should consider basic operations on strings that allow you to quickly convert any sequence of characters. The plus sign allows you to concatenate strings, joining them together. The following example demonstrates the concatenation of this is new and text.

string = "text"
newString = "this is new " + string
print(newString)

this is new text

With the multiplication operator you can repeat a string any number of times. In this code, the word text, followed by a space, is written into the new string three times.

string = "text "
newString = string * 3
print(newString)

text text text

Just as with numbers, you can use comparison operators with strings, such as the double equals sign. The literals some text and some new text are different, so comparing string and newString and passing the result to print displays False.

string = "some text"
newString = "some new text"
print(string == newString)

False
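The other comparison operators work on strings as well. Python compares strings character by character using their code points, so the comparison is case-sensitive. A brief sketch with arbitrary example words:

```python
alphabetical = "apple" < "banana"   # compared character by character
upper_first = "Zebra" < "apple"     # uppercase letters have smaller code points
case_matters = "Text" == "text"     # equality is case-sensitive
ignore_case = "Text".lower() == "text".lower()  # normalize the case first

print(alphabetical)  # True
print(upper_first)   # True
print(case_matters)  # False
print(ignore_case)   # True
```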

Strings also let you extract substrings by slicing, just as with other sequences. In the next example you specify the index interval in square brackets, remembering that numbering starts from zero and that the end index is not included.

string = "some text"
newString = string[2:4]
print(newString)

me

A negative index addresses individual characters of a string from the end rather than from the beginning. Thus, the element at index -2 in the string some text is the letter x.

string = "some text"
print(string[-2])

x
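Slices and negative indices can be combined, and either bound of a slice can be omitted. The sketch below shows a few common forms on the same string; the variable names are illustrative.

```python
string = "some text"

head = string[:4]          # from the start up to, but not including, index 4
tail = string[5:]          # from index 5 to the end
last_four = string[-4:]    # the last four characters
every_other = string[::2]  # every second character
backwards = string[::-1]   # the whole string in reverse order

print(head)         # some
print(tail)         # text
print(last_four)    # text
print(every_other)  # sm et
print(backwards)    # txet emos
```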

Methods and Functions

The str function is very often used to convert values of other types to strings. You can also use it to create a new string from a literal passed as an argument. This example initializes the variable string with the value some text.

string = str("some text")
print(string)

some text

This function accepts values of various types, such as numbers or lists, which lets Python convert different kinds of data to strings. If you create your own class, you should define a __str__ method for it. The string this method returns is the result of calling str on an instance of the class.

Python uses the len function to get the length of a string in characters. As you can see from the following code snippet, the length of some text is 9 (spaces also count).

string = "some text"
print(len(string))

9
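Returning to the str function, here is a minimal sketch of converting different types and of a user-defined __str__ method. The Point class is hypothetical and exists only for this illustration.

```python
class Point:
    """Hypothetical class used only to illustrate __str__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # The returned string is what str(point) and print(point) produce.
        return "Point({}, {})".format(self.x, self.y)


from_int = str(10)
from_list = str([1, 2])
from_object = str(Point(1, 2))

print(from_int)     # 10
print(from_list)    # [1, 2]
print(from_object)  # Point(1, 2)
```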

The find method searches a string. It can look for a single character or an entire substring in another sequence of characters. It returns the index of the first character of the first match, counting from zero, or -1 if nothing is found.

string = "some text"
print(string.find("text"))

5
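A slightly longer sketch shows the related cases: a missing substring, a search from the right with rfind, a search that starts at a given index, and the in operator for a simple yes-or-no check. The sample string is made up for the example.

```python
string = "some text, some more text"

first = string.find("some")     # index of the first occurrence
last = string.rfind("some")     # index of the last occurrence
later = string.find("text", 6)  # start searching at index 6
missing = string.find("xyz")    # -1 when the substring is absent
present = "more" in string      # membership test, no index needed

print(first)    # 0
print(last)     # 11
print(later)    # 21
print(missing)  # -1
print(present)  # True
```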

The replace method replaces every occurrence of a given character or substring with another sequence of characters. It returns a new string and leaves the original unchanged. Pass the text to look for and its replacement as arguments, as in the following example, where spaces are replaced with '-'.

string = "some text"
print(string.replace(" ", "-"))

some-text

To split a string into several substrings using a specified delimiter, call the split method. Called without arguments, it splits on any run of whitespace. As shown in the example below, some new text is transformed into a list of strings.

string = "some new text"
strings = string.split()
print(strings)

['some', 'new', 'text']

You can perform the reverse conversion, turning a list of strings into one string, with the join method. In the next example a space serves as the separator for the new string, and the argument is a list of strings that contains some, new and text.

strings = ["some", "new", "text"]
string = " ".join(strings)
print(string)

some new text
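The split and join methods also work with delimiters other than a space. The following sketch uses a made-up comma-separated record and shows how an explicit delimiter differs from the default whitespace splitting.

```python
record = "name,age,city"  # hypothetical comma-separated data

fields = record.split(",")    # split on an explicit delimiter
dashed = "-".join(fields)     # join the parts with a different delimiter
restored = ",".join(fields)   # joining with the original delimiter round-trips

by_whitespace = "a  b   c".split()  # runs of whitespace count as one separator
by_space = "a  b".split(" ")        # an explicit space yields an empty string

print(fields)         # ['name', 'age', 'city']
print(dashed)         # name-age-city
print(restored)       # name,age,city
print(by_whitespace)  # ['a', 'b', 'c']
print(by_space)       # ['a', '', 'b']
```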

Finally, the strip method is used to strip whitespace from both sides of the string automatically, as shown in the following code snippet for the value of string.

string = " some new text "
newString = string.strip()
print(newString)

some new text

This table summarizes the functions and methods used in Python 3 for working with strings, including methods that work with character case. In the method rows, s is the string the method is called on.

Method                                                Purpose
str(obj)                                              Converts the object to string form
len(s)                                                Returns the length of the string s
s.find(sub, start, end), s.rfind(sub, start, end)     Return the index of the first or last occurrence of sub in s, searching within the range start to end, or -1 if it is not found
s.replace(old, new)                                   Returns a copy of s with each occurrence of old replaced by new
s.split(c)                                            Splits s into a list of substrings using the delimiter c
c.join(strings)                                       Combines a list of strings into one string using the delimiter c
s.strip(), s.lstrip(), s.rstrip()                     Strip whitespace from both sides of s, only the left, or only the right
s.center(num, c), s.ljust(num, c), s.rjust(num, c)    Return s centered, left-aligned, or right-aligned in a field of length num, padded with the character c
s.lower(), s.upper()                                  Convert all characters to lowercase or uppercase
s.startswith(ns), s.endswith(ns)                      Check whether s starts or ends with the substring ns
s.islower(), s.isupper()                              Check whether all cased characters in s are lowercase or uppercase
s.swapcase()                                          Reverses the case of all the characters
s.title()                                             Converts the first letter of each word to uppercase and the rest to lowercase
s.capitalize()                                        Converts the first letter to uppercase and all others to lowercase
s.isalpha()                                           Checks whether s consists of letters only
s.isdigit()                                           Checks whether s consists of digits only
s.isnumeric()                                         Checks whether s consists only of numeric characters
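A few of the methods from the table are sketched below on arbitrary sample strings. The dictionary name results is illustrative only.

```python
string = "some new text"

results = {
    "upper": string.upper(),
    "title": string.title(),
    "capitalize": string.capitalize(),
    "swapcase": "Some Text".swapcase(),
    "center": "text".center(10, "*"),
    "ljust": "text".ljust(10, "*"),
    "rjust": "text".rjust(10, "*"),
    "startswith": string.startswith("some"),
    "endswith": string.endswith("text"),
    "islower": string.islower(),
    "isalpha": string.isalpha(),  # False, because spaces are not letters
    "isdigit": "42".isdigit(),
}

for name, value in results.items():
    print(name, "->", value)
```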

Encoding

In Python 3, source files are read as UTF-8 by default. To declare the encoding explicitly, place the corresponding comment at the beginning of the code file, as in the following example, which uses utf-8. The prefix u before a literal marks it as a Unicode string; Python 3 accepts it for compatibility, although every str literal is already Unicode. The prefix b, by contrast, creates a bytes literal, whose elements are single bytes.

# coding: utf-8
string = u'some text'
newString = b'text'

The built-in encode and decode methods convert between text and bytes using a given encoding. In Python 3, encode is called on a str and returns bytes, while decode is called on bytes and returns a str. Both take the name of the encoding as an argument, as in the following code example, where the name utf8 is used.

string = string.encode('utf8')        # str -> bytes
newString = newString.decode('utf8')  # bytes -> str
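The difference between characters and bytes becomes visible with non-ASCII text. The sketch below assumes Python 3 and uses a made-up sample word; the variable names are illustrative.

```python
text = "caf\u00e9"  # four characters, the last one is e with an acute accent

data = text.encode("utf-8")      # str -> bytes
restored = data.decode("utf-8")  # bytes -> str

print(type(data).__name__)  # bytes
print(len(text))            # 4 characters
print(len(data))            # 5 bytes, the accented letter takes two
print(restored == text)     # True
```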
