Python Strings (str)

2021-02-21

This article explains Python's string type, str.

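Defining strings

You create a string by enclosing characters in single quotes ' or double quotes ".

The backslash \ (shown as a yen sign ¥ in Japanese fonts) is a special character. It is used to escape quotes and to write special characters such as newlines and tabs.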

'string A' # => string A
'string "B"' # => string "B"
'string \'C\'' # => string 'C'
"string 1" # => string 1
"string \"2\"" # => string "2"
"string '3'" # => string '3'

print("line\nbreak\r\nand tab\t, backslash \\ in a string")
## =>
# line
# break
# and tab	, backslash \ in a string

If you do not want backslashes to be treated as special characters, prefix the string with r to make it a raw string.

print(r'c:\some\ttt\nnn\path')
## => c:\some\ttt\nnn\path
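
A small sketch of what the r prefix changes. The variable names are only illustrative. One limitation worth knowing is that a raw string cannot end with a single backslash, so a trailing separator has to be appended as a normal escaped literal.

```python
# '\n' is one character (a newline); r'\n' is two characters (a backslash and n)
escaped = '\n'
raw = r'\n'
print(len(escaped), len(raw))  # 1 2

# A raw literal equals the normal literal with every backslash doubled
path = r'c:\some\ttt\nnn\path'
same_path = 'c:\\some\\ttt\\nnn\\path'
print(path == same_path)  # True

# r'c:\dir\' would be a SyntaxError (a raw string cannot end in a single backslash),
# so the trailing backslash is concatenated as an ordinary escaped literal
dir_with_sep = r'c:\dir' + '\\'
print(dir_with_sep)  # prints the path with one trailing backslash
```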

Triple quotes (""" or ''') let you write a string that spans multiple lines.

print('''A
B
C''')
# =>
# A
# B
# C

A backslash \ at the end of a line inside the string suppresses that line's newline.

print("""A\
B\
C""")
# => ABC

+ concatenates strings and * repeats them. String literals (text enclosed in quotes) written next to each other are concatenated automatically.

"A" + "B" + "C" # => ABC
"A" "B" "C" # => ABC
"py" * 2 # => pypy

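Implicit concatenation only works between literals. The sketch below (variable names are illustrative) uses it to split a long literal across lines inside parentheses, and falls back to + when a variable is involved.

```python
# Adjacent literals inside parentheses are joined into one string at compile time
message = ("Python strings "
           "can be split across lines")

prefix = "py"
# prefix "thon" would be a SyntaxError: implicit concatenation applies only to literals
combined = prefix + "thon"

separator = "-" * 10

print(message)    # Python strings can be split across lines
print(combined)   # python
print(separator)  # ----------
```
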
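String indexing

To get the character at a specific position, use a zero-based index counted from the start (the first character is 0) or a negative index counted from the end (the last character is -1).

s = "abcdefghij"  # named s rather than str so the built-in str() is not shadowed
s[0] # => a the first character
s[1] # => b index 1 from the start (the second character)
s[2] # => c index 2 from the start (the third character)
s[-1] # => j the last character (same as s[len(s) - 1])
s[-2] # => i the second character from the end (same as s[len(s) - 2])
s[-3] # => h the third character from the end (same as s[len(s) - 3])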
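
The relationship between negative and positive indexes can be checked directly. This sketch also shows that an index outside the range -len(s) to len(s) - 1 raises IndexError.

```python
s = "abcdefghij"

# A negative index -k refers to the same character as len(s) - k
for k in range(1, len(s) + 1):
    assert s[-k] == s[len(s) - k]

# Indexing one past the end raises IndexError
try:
    s[len(s)]
except IndexError as exc:
    message = str(exc)

print(message)  # string index out of range
```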

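String slicing

Write a range of indexes separated by a colon : to get a substring. The end index is exclusive, and an optional third value sets the step.

s = "abcdefghij"
s[0:3] # => abc substring at indexes 0-2
s[5:9] # => fghi substring at indexes 5-8
s[2:9:3] # => cfi every third character from index 2 up to index 8 (indexes 2, 5, 8)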

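If the start is omitted it defaults to 0. If the end is omitted it defaults to the length of the string.

s = "abcdefghij"
s[5:] # => fghij substring from index 5 onward (same as s[5:len(s)])
s[:3] # => abc substring before index 3 (same as s[0:3])
s[1:] # => bcdefghij the string with its first character removed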

Negative indexes can be used as well.

s = "abcdefghij"
s[-5:-2] # => fgh substring at indexes -5 to -3
s[:-1] # => abcdefghi the string with its last character removed
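
Slices behave a little differently from single indexes. This sketch shows that out-of-range bounds are clamped rather than raising IndexError, that start:stop:step selects the same indexes as range(start, stop, step), and that a negative step walks backwards.

```python
s = "abcdefghij"

# Unlike indexing, slicing clamps out-of-range bounds instead of raising IndexError
print(s[5:100])   # fghij
print(repr(s[20:]))  # '' (empty string)

# s[start:stop:step] picks the same indexes as range(start, stop, step)
print(s[2:9:3] == "".join(s[i] for i in range(2, 9, 3)))  # True

# A negative step walks backwards; omitting start and stop reverses the string
print(s[::-1])    # jihgfedcba
```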

String length

s = "abcdefghij"
len(s) # => 10
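
len() counts characters (Unicode code points), not bytes. The byte count depends on the encoding, as this sketch shows using the encode() method covered later in this article. The sample string is the one used in the encode() examples.

```python
text = "abcあ⭐"
print(len(text))                      # 5 code points
print(len(text.encode("utf-8")))      # 9 bytes: 3 ASCII + 3 for あ + 3 for ⭐
print(len(text.encode("utf-16-le")))  # 10 bytes: 2 per code point in this string
```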

String methods

capitalize() Uppercase the first character and lowercase the rest

str_sample = 'python snakes'
str_sample.capitalize() # => 'Python snakes'

casefold() Aggressive lowercasing, intended for caseless comparison

str_sample = 'A ß Ⅰ'
str_sample.casefold() # => 'a ss ⅰ'

center(width[, fillchar]) Center the string

str_sample = 'abc'
str_sample.center(9) # => '   abc   '
str_sample.center(9, '_') # => '___abc___'

count(sub[, start[, end]]) Number of non-overlapping occurrences of a substring

str_sample = 'abacabababaca'
str_sample.count('aba') # => 3
str_sample.count('aba', 2) # => 2
str_sample.count('aba', 2, 8) # => 1
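
Because count() does not count overlapping matches, 'aba' at index 6 in the sample (which overlaps the match at index 4) is skipped. When overlaps should be counted, find() can be called repeatedly. count_overlapping below is a hypothetical helper written for this sketch, not a str method.

```python
def count_overlapping(text, sub):
    """Hypothetical helper: count occurrences of sub in text, allowing overlaps."""
    if not sub:
        raise ValueError("sub must not be empty")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Resume one character after the previous match so overlapping matches are found
        start = text.find(sub, start + 1)
    return count

sample = 'abacabababaca'
print(sample.count('aba'))               # 3 (non-overlapping)
print(count_overlapping(sample, 'aba'))  # 4 (matches at 0, 4, 6, 8)
```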

encode(encoding="utf-8", errors="strict") Encode to bytes

str_sample = 'abcあ⭐'
str_sample.encode() #=> b'abc\xe3\x81\x82\xe2\xad\x90'
str_sample.encode(encoding='shift_jis', errors='replace') #=> b'abc\x82\xa0?'
str_sample.encode(encoding='shift_jis', errors='ignore') #=> b'abc\x82\xa0'
str_sample.encode(encoding='shift_jis', errors='xmlcharrefreplace') #=> b'abc\x82\xa0&#11088;'
str_sample.encode(encoding='shift_jis', errors='backslashreplace') #=> b'abc\x82\xa0\\u2b50'
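
bytes.decode() reverses encode(). The sketch below shows a lossless UTF-8 round trip, a lossy Shift_JIS one (errors='replace' already turned ⭐ into ?), and the default errors='strict' raising UnicodeEncodeError for a character the codec cannot represent.

```python
sample = 'abcあ⭐'

# bytes.decode() reverses str.encode(); both default to UTF-8
data = sample.encode()
print(data.decode() == sample)   # True

# errors='replace' turned the unencodable ⭐ into b'?', so this round trip loses data
sjis = sample.encode(encoding='shift_jis', errors='replace')
print(sjis.decode('shift_jis'))  # abcあ?

# With the default errors='strict', an unencodable character raises UnicodeEncodeError
try:
    sample.encode('shift_jis')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)             # True
```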

endswith(suffix[, start[, end]]) Check whether the string ends with a suffix

str_sample = 'abcdefghij'
str_sample.endswith('ij') # => True
str_sample.endswith('ij', 5) # => True
str_sample.endswith('hi', 5, 9) # => True

expandtabs(tabsize=8) Replace each tab with spaces up to the next tab stop (every tabsize columns)

str_sample = 'a\tb\tc\t\td'
str_sample.expandtabs() # => 'a       b       c               d'

find(sub[, start[, end]]) Lowest index of a substring (-1 if not found)

str_sample = 'abcabcabcabc'
str_sample.find('bc') # => 1
str_sample.find('bc', 5) # => 7
str_sample.find('bc', 5, 6) # => -1

format(*args, **kwargs) Format with positional and keyword arguments

"{0}, {1}, {key1}".format(1, 'a', key1=12.3) # => '1, a, 12.3'

dic = {"key1":12.3}
"{0}, {1}, {key1}".format(1, 'a', **dic) # => '1, a, 12.3'

format_map(mapping) Format with keys looked up in a mapping

dic = {"key1":12.3, "key2":"a"}
"{key1}, {key2}".format_map(dic) # => '12.3, a'

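format_map() uses the mapping as-is instead of copying it into a new dict. That difference is what makes the following pattern work. The sketch follows the dict-subclass example in the Python documentation; the class name Default is only a name chosen for illustration.

```python
class Default(dict):
    """dict subclass that leaves unknown placeholders untouched."""
    def __missing__(self, key):
        return '{' + key + '}'

# format_map() looks keys up in the mapping itself, so __missing__ is used for absent keys
print("{key1}, {key2}".format_map(Default(key1=12.3)))  # 12.3, {key2}

# format(**mapping) unpacks into a plain dict, so a missing key raises KeyError
try:
    "{key1}, {key2}".format(**Default(key1=12.3))
except KeyError as exc:
    missing = exc.args[0]
print(missing)  # key2
```
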
index(sub[, start[, end]]) Lowest index of a substring (raises ValueError if not found)

str_sample = 'abcabcabcabc'
str_sample.index('bc') # => 1
str_sample.index('bc', 5) # => 7
str_sample.index('bc', 5, 6) # raises ValueError
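
Choosing between find() and index() mostly comes down to how "not found" should be handled. In this sketch, position_or_none is a hypothetical helper that turns index()'s ValueError into None. The last lines show a common pitfall with find(): its return value should not be used as a truth value.

```python
def position_or_none(text, sub):
    """Hypothetical helper: first index of sub in text, or None when it is absent."""
    try:
        return text.index(sub)
    except ValueError:
        return None

sample = 'abcabcabcabc'
print(position_or_none(sample, 'bc'))  # 1
print(position_or_none(sample, 'xy'))  # None

# Pitfall: find() returns 0 (falsy) for a match at the start and -1 (truthy) when absent,
# so `if sample.find(sub):` is wrong; use `sub in sample` to test membership
print(sample.find('ab'), 'ab' in sample)  # 0 True
print(sample.find('xy'), 'xy' in sample)  # -1 False
```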

isalnum() Check that all characters are alphanumeric (False for the empty string)

"".isalnum() #=> False
"abc123".isalnum() #=> True
"-123".isalnum() #=> False

isalpha() Check that all characters are alphabetic (False for the empty string)

"".isalpha() #=> False
"abc".isalpha() #=> True
"abc123".isalpha() #=> False

isascii() Check that all characters are ASCII (True for the empty string; Python 3.7+)

"".isascii() #=> True
"abc".isascii() #=> True
"+-$\\abcABC123".isascii() #=> True
"¦あいう".isascii() #=> False

isdecimal() Check that all characters are decimal digits

"".isdecimal() #=> False
"0123456789".isdecimal() #=> True
"૦૧૨૩૪૫૬૭૮૯".isdecimal() #=> True
"123.45".isdecimal() #=> False
"abc".isdecimal() #=> False
"⒎".isdecimal() #=> False

isdigit() Check that all characters are digits (also accepts characters such as superscript and enclosed digits)

"".isdigit() #=> False
"0123456789".isdigit() #=> True
"૦૧૨૩૪૫૬૭૮૯".isdigit() #=> True
"123.45".isdigit() #=> False
"abc".isdigit() #=> False
"⒎".isdigit() #=> True

isidentifier() Check whether the string is a valid identifier

"1".isidentifier() # => False
"$abc".isidentifier() # => False
"abc".isidentifier() # => True

islower() Check that all cased characters are lowercase (at least one cased character is required)

"".islower() # => False
"abc".islower() # => True
"abC".islower() # => False
"a123".islower() # => True
"123".islower() # => False

isnumeric() Check that all characters are numeric (also accepts characters such as fractions and CJK numerals)

"".isnumeric() #=> False
"0123456789".isnumeric() #=> True
"૦૧૨૩૪૫૬૭૮૯".isnumeric() #=> True
"123.45".isnumeric() #=> False
"abc".isnumeric() #=> False
"⒎".isnumeric() #=> True

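The three numeric checks differ in which Unicode characters they accept. This sketch compares them on a few sample characters. It then uses a hypothetical helper, parse_int, to show that none of the checks reliably predicts whether int() will succeed, so attempting the conversion is usually the safer test.

```python
samples = ['7', '૭', '²', '⒎', '½', '五']
for ch in samples:
    print(repr(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())
# '7' True True True
# '૭' True True True
# '²' False True True
# '⒎' False True True
# '½' False False True
# '五' False False True

def parse_int(text):
    """Hypothetical helper: convert text with int(), returning None on failure."""
    try:
        return int(text)
    except ValueError:
        return None

print(parse_int('૧૨'))  # 12: int() accepts any Unicode decimal digits
print(parse_int('²'))   # None, although '²'.isdigit() is True
print(parse_int('-1'))  # -1, although '-1'.isdecimal() is False
```
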
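isprintable() Check that all characters are printable (True for the empty string)

"".isprintable() #=> True
"abc123".isprintable() #=> True
" ".isprintable() #=> True (the ASCII space counts as printable)
"\t".isprintable() #=> False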

isspace() Check that all characters are whitespace (False for the empty string)

"".isspace() # => False
" ".isspace() # => True
"  ".isspace() # => True
"　".isspace() # => True (full-width space U+3000)

istitle() Check for title case (uppercase letters follow only uncased characters, lowercase letters follow only cased ones)

"".istitle() # => False
"abc".istitle() # => False
"Abc".istitle() # => True
"Abc123".istitle() # => True
"AbcDef".istitle() # => False
"Abc Def".istitle() # => True

isupper() Check that all cased characters are uppercase (at least one cased character is required)

"".isupper() # => False
"abc".isupper() # => False
"abC".isupper() # => False
"ABC".isupper() # => True
"ABC123".isupper() # => True
"123".isupper() # => False

join(iterable) Join strings with the string as the separator

"-".join(str(x) for x in range(3)) # => '0-1-2'
"-".join(['a', 'b', 'c']) # => 'a-b-c'

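join() accepts only strings, so other items must be converted first. It is also the usual way to assemble a string from many pieces: collect them in a list and join once. A short sketch with illustrative names:

```python
numbers = [1, 2, 3]

# join() accepts only strings; non-str items raise TypeError
try:
    "-".join(numbers)
except TypeError:
    print("convert the items first")

print("-".join(map(str, numbers)))  # 1-2-3

# Build a string from many pieces by collecting them in a list and joining once
pieces = []
for n in range(5):
    pieces.append(f"<{n}>")
print("".join(pieces))  # <0><1><2><3><4>
```
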
ljust(width[, fillchar]) Left-justify

"a".ljust(5) # => 'a    '
"a".ljust(5, '_') # => 'a____'

lower() Convert to lowercase

"ABC123abc".lower() # => 'abc123abc'

lstrip([chars]) Strip leading characters (whitespace by default)

"  abc  ".lstrip() # => 'abc  '
"  abc  ".lstrip(" a") # => 'bc  '

str.maketrans(x[, y[, z]]) Create a translation table

t = str.maketrans({"a":"A", 98:"B"})
t # => {97: 'A', 98: 'B'}
"abc".translate(t) # => 'ABc'

t = str.maketrans("ab", "AB")
t # => {97: 65, 98: 66}
"abc".translate(t) # => 'ABc'

t = str.maketrans("ab", "AB","c")
t # => {97: 65, 98: 66, 99: None}
"abc".translate(t) # => 'AB'

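Two common uses of maketrans() with translate() are deleting a set of characters (through the third argument) and mapping one alphabet onto another (through two equal-length strings). This sketch uses the standard string module constants; the ROT13 table is only an illustration.

```python
import string

# Every character in the third argument is mapped to None, i.e. deleted
table = str.maketrans('', '', string.punctuation)
print("Hello, world! (Python)".translate(table))  # Hello world Python

# Two equal-length strings map position by position, here as a simple ROT13
lower = string.ascii_lowercase
rot13 = str.maketrans(lower, lower[13:] + lower[:13])
print("hello".translate(rot13))  # uryyb
```
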
partition(sep) Split at the first occurrence of sep into (before, sep, after)

"abc,def,ghi".partition(",") # => ('abc', ',', 'def,ghi')
"abc,def,ghi".partition("|") # => ('abc,def,ghi', '', '')
"abc,def,ghi".partition(",def,") # => ('abc', ',def,', 'ghi')

removeprefix(prefix, /) Remove a prefix (Python 3.9+)

"No.123".removeprefix("No.") # => '123'

removesuffix(suffix, /) Remove a suffix (Python 3.9+)

"SomeTest".removesuffix("Test") # => 'Some'

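removeprefix() is easy to confuse with lstrip(). lstrip() treats its argument as a set of characters, whereas removeprefix() removes the exact string once. The sample name below is illustrative.

```python
name = "test_test_case"

# lstrip() strips any run of the characters t, e, s, _ from the left
print(name.lstrip("test_"))        # case
# removeprefix() removes the exact prefix once (Python 3.9+)
print(name.removeprefix("test_"))  # test_case
# If the prefix is absent, the string is returned unchanged
print(name.removeprefix("spec_"))  # test_test_case
```
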
replace(old, new[, count]) Replace substrings (at most count times if given)

"abcabcabcabc".replace("abc", "XYZ") # => 'XYZXYZXYZXYZ'
"abcabcabcabc".replace("abc", "XYZ", 2) # => 'XYZXYZabcabc'

rfind(sub[, start[, end]]) Highest index of a substring (-1 if not found)

str_sample = 'abcabcabcabc'
str_sample.rfind('bc') # => 10
str_sample.rfind('bc', 5) # => 10
str_sample.rfind('bc', 5, 9) # => 7
str_sample.rfind('bc', 5, 8) # => -1

rindex(sub[, start[, end]]) Highest index of a substring (raises ValueError if not found)

str_sample = 'abcabcabcabc'
str_sample.rindex('bc') # => 10
str_sample.rindex('bc', 5) # => 10
str_sample.rindex('bc', 5, 9) # => 7
str_sample.rindex('bc', 5, 8) # raises ValueError

rjust(width[, fillchar]) Right-justify

"a".rjust(5) # => '    a'
"a".rjust(5, '_') # => '____a'

rpartition(sep) Split at the last occurrence of sep into (before, sep, after)

"abc,def,ghi".rpartition(",") # => ('abc,def', ',', 'ghi')
"abc,def,ghi".rpartition("|") # => ('', '', 'abc,def,ghi')
"abc,def,ghi".rpartition(",def,") # => ('abc', ',def,', 'ghi')

rsplit(sep=None, maxsplit=-1) Split the string, counting maxsplit from the right

"a,b,c,d,e".rsplit(",") # => ['a', 'b', 'c', 'd', 'e']
"a,b,c,d,e".rsplit(",", 3) # => ['a,b', 'c', 'd', 'e']
"a,b,c,d,e".rsplit(sep=",") # => ['a', 'b', 'c', 'd', 'e']
"a,b,c,d,e".rsplit(sep=",", maxsplit=3) # => ['a,b', 'c', 'd', 'e']

rstrip([chars]) Strip trailing characters (whitespace by default)

"  abc  ".rstrip() # => '  abc'
"  abc  ".rstrip("c ") # => '  ab'

split(sep=None, maxsplit=-1) Split the string, counting maxsplit from the left

"a,b,c,d,e".split(",") # => ['a', 'b', 'c', 'd', 'e']
"a,b,c,d,e".split(",", 3) # => ['a', 'b', 'c', 'd,e']
"a,b,c,d,e".split(sep=",") # => ['a', 'b', 'c', 'd', 'e']
"a,b,c,d,e".split(sep=",", maxsplit=3) # => ['a', 'b', 'c', 'd,e']

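split() behaves differently depending on whether sep is given. With no separator, runs of whitespace count as a single separator and leading or trailing whitespace is ignored. With an explicit separator, every occurrence splits, so empty strings can appear. A small sketch:

```python
text = " a  b\tc\n"

# No sep: split on runs of any whitespace and ignore leading/trailing whitespace
print(text.split())     # ['a', 'b', 'c']
# Explicit sep: every single space splits, producing empty strings
print(text.split(" "))  # ['', 'a', '', 'b\tc\n']
```
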
splitlines([keepends]) Split at line boundaries

lines = '''aaa
bbb
ccc

ddd
eee
'''

lines.splitlines() # => ['aaa', 'bbb', 'ccc', '', 'ddd', 'eee']
lines.splitlines(keepends=True) # => ['aaa\n', 'bbb\n', 'ccc\n', '\n', 'ddd\n', 'eee\n']
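
Compared with split('\n'), splitlines() also recognizes \r\n and \r (and other Unicode line breaks) and does not produce an empty item for a trailing newline. This sketch uses a sample string with mixed line endings.

```python
lines = "aaa\r\nbbb\nccc\n"

# splitlines() handles \r\n and \n alike and drops the empty piece after the final newline
print(lines.splitlines())  # ['aaa', 'bbb', 'ccc']
# split('\n') splits only on \n, keeps the \r, and yields a trailing ''
print(lines.split("\n"))   # ['aaa\r', 'bbb', 'ccc', '']
```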

startswith(prefix[, start[, end]]) Check whether the string starts with a prefix

str_sample = 'abcdefghij'
str_sample.startswith('ab') # => True
str_sample.startswith('cd', 2) # => True
str_sample.startswith('cd', 2, 3) # => False

strip([chars]) Strip leading and trailing characters (whitespace by default)

"  abc  ".strip() # => 'abc'
"  abcba  ".strip(" a") # => 'bcb'

swapcase() Swap uppercase and lowercase

"abc123ABC".swapcase() # => 'ABC123abc'

title() Capitalize the first letter of each word

"hello world! hello python!".title() # => 'Hello World! Hello Python!'

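title() starts a new "word" after any non-letter, so apostrophes produce surprising results. The standard library's string.capwords() splits on whitespace only and is one alternative. The sample sentence below follows the one used in the Python documentation.

```python
import string

text = "they're bill's friends"
# title() capitalizes the letter after the apostrophe too
print(text.title())           # They'Re Bill'S Friends
# string.capwords() splits on whitespace, capitalizes each word, and joins them again
print(string.capwords(text))  # They're Bill's Friends
```
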
translate(table) Map characters using a translation table

t = {97: 65, 98: 66, 99: None}
"abc".translate(t) # => 'AB'

upper() Convert to uppercase

"abc123ABC".upper() # => 'ABC123ABC'

zfill(width) Pad with zeros on the left up to width

"12".zfill(8) # => '00000012'
"12".zfill(1) # => '12'

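zfill() is sign-aware. It inserts the zeros after a leading + or -, whereas rjust() with "0" simply pads in front. For numbers, the format specification 0 flag gives the same result as zfill(). A short comparison:

```python
# zfill() puts the zeros after the sign; rjust() pads in front of it
print("-42".zfill(6))       # -00042
print("-42".rjust(6, "0"))  # 000-42
# A zero-padded format spec on an int matches zfill()
print(f"{-42:06d}")         # -00042
```
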
Formatted string literals (f-strings)

A string prefixed with f or F is a formatted string literal.

Inside it, write an expression such as a variable name in braces, {name}, and its value is embedded in the string. After the expression you can control the output with a conversion, {name!conversion}, or a format specification, {name:format_spec}.

text = "テスト"  # Japanese for "test"
print(f"f-string: {text}")
# f-string: テスト
print(f"f-string: {text!s}") # s => converted with str()
# f-string: テスト
print(f"f-string: {text!r}") # r => converted with repr()
# f-string: 'テスト'
print(f"f-string: {text!a}") # a => converted with ascii()
# f-string: '\u30c6\u30b9\u30c8'

number = 12345.12345
print(f"f-string: {number}") # => f-string: 12345.12345
print(f"f-string: {number:,}") # => f-string: 12,345.12345 (thousands separator)
print(f"f-string: {number:.2f}") # => f-string: 12345.12 (2 decimal places)
print(f"f-string: {number:012.3f}") # => f-string: 00012345.123 (zero-padded to width 12, 3 decimal places)

integer = 10
print(f"f-string: {integer}") # => f-string: 10
print(f"f-string: {integer:d}") # => f-string: 10 (decimal)
print(f"f-string: {integer:b}") # => f-string: 1010 (binary)
print(f"f-string: {integer:o}") # => f-string: 12 (octal)
print(f"f-string: {integer:x}") # => f-string: a (hexadecimal, lowercase)
print(f"f-string: {integer:X}") # => f-string: A (hexadecimal, uppercase)
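
A few further format-specification features, shown as a sketch. The names reuse the examples above, plus an illustrative width variable. The = specifier requires Python 3.8 or later.

```python
integer = 10
number = 12345.12345
width = 12

print(f"{integer=}")               # integer=10 (expression text plus value, Python 3.8+)
print(f"{integer:#x}")             # 0xa (# adds the base prefix)
print(f"{integer:08b}")            # 00001010 (zero-padded to 8 binary digits)
print(f"{integer * 2 + 1}")        # 21 (any expression can go inside the braces)
print(f"[{number:>{width},.1f}]")  # [    12,345.1] (width taken from another variable)
```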

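printf-style string formatting

Write conversion specifiers of the form %(name)[flags][width][.precision]type in the string, then apply the % operator with a dictionary on its right-hand side to format the string.

The conversion flags are:

# Alternate form (adds the 0o / 0x / 0X prefix for octal and hexadecimal)
0 Pad numbers with zeros
- Left-justify (overrides 0)
(space) Insert a space before positive numbers
+ Always show the sign, + or - (overrides the space flag)

The conversion types are:

d i u Signed decimal integer
o Signed octal
x Signed hexadecimal (the # flag adds a lowercase 0x prefix)
X Signed hexadecimal (the # flag adds an uppercase 0X prefix)
e Floating point in exponential notation (lowercase e)
E Floating point in exponential notation (uppercase E)
f F Floating point in decimal notation
g Floating point: lowercase exponential notation if the exponent is less than -4 or not less than the precision, decimal notation otherwise
G Same as g, but with an uppercase E
c Single character
r String (converted with repr())
s String (converted with str())
a String (converted with ascii())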

d = {'var1': "string", "var2": 9, "var3": 12345.6789}
print('%(var1)s,%(var2) 5d,%(var2) 5i,%(var2) 5o,%(var3).3e' % d)
print('%(var1)s,%(var2)#5d,%(var2)#5i,%(var2)#5o,%(var3).4E' % d)
print('%(var1)s,%(var2)05d,%(var2)05i,%(var2)05o,%(var3).3f' % d)
print('%(var1)s,%(var2)-5d,%(var2)-5i,%(var2)-5o,%(var3).4g' % d)
print('%(var1)s,%(var2)+5d,%(var2)+5i,%(var2)+5o,%(var3).5G' % d)
# =>
# string,    9,    9,   11,1.235e+04
# string,    9,    9, 0o11,1.2346E+04
# string,00009,00009,00011,12345.679
# string,9    ,9    ,11   ,1.235e+04
# string,   +9,   +9,  +11,12346
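
For comparison, the same flags exist in the format specification used by f-strings and str.format(), with - written as < for left alignment. The % operator also accepts a tuple of positional values instead of a dict. A single tuple value has to be wrapped in another tuple. A sketch reusing the dictionary above:

```python
d = {'var1': "string", "var2": 9, "var3": 12345.6789}

# f-string equivalents of the space, #, 0, - and + flags applied to var2
print(f"{d['var2']: 5d}|{d['var2']:#5o}|{d['var2']:05d}|{d['var2']:<5d}|{d['var2']:+5d}")
# =>     9| 0o11|00009|9    |   +9

# % with a tuple of positional values
print('%s,%05d,%.3f' % ("string", 9, 12345.6789))  # string,00009,12345.679

# A tuple used as a single value must be wrapped in a one-element tuple
print('%s' % ((1, 2),))  # (1, 2)
```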