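As a Python beginner, I've found that once you have a good grasp of regular expressions, you can get the computer to find almost anything you want in all kinds of text. A few days ago I saw a question on Baidu Zhidao about extracting the value that follows a given keyword. This kind of problem is very common and comes up often in everyday text searching, so below I've reposted the question along with my answers and a few takeaways.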
Original post:
https://zhidao.baidu.com/question/182278714860878444.html?entry=qb_uhome_tag
The question, reposted:
The file content looks like this:
“<2018-09-19 15:09:54,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond38,cost:1
<2018-09-19 15:09:54,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond55,cost:2
<2018-09-19 15:09:54,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond1,cost:1
<2018-09-19 15:09:54,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond43,cost:4
<2018-09-19 15:09:54,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond73,cost:5
<2018-09-19 15:09:54,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond12,cost:9”
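I need to compute the average of the values after "cost", as well as the maximum and minimum values in the file. How do I write this in Python?
I gave two answers:
Answer 1: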
import re
text = '''<2018-09-19 15:09:54,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond38,cost:1
<2018-09-19 15:09:55,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond55,cost:2
<2018-09-19 15:09:56,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond1,cost:1
<2018-09-19 15:09:57,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond43,cost:4
<2018-09-19 15:09:58,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond73,cost:5
<2018-09-19 15:09:59,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond12,cost:9'''
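# Method 1: only the capturing group (\d+) is returned by findall; note that findall returns a list of strings
num = [int(n) for n in re.findall(r'cost:(\d+)', text)]  # \d+ also handles multi-digit costs; int() makes max/min compare numerically
print('Average:', sum(num) / len(num))
print('Max:', max(num))
print('Min:', min(num))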
Output:
Average: 3.6666666666666665
Max: 9
Min: 1
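Two details are worth checking when the values go beyond a single digit. First, `\d` matches exactly one digit. Second, `findall` returns strings, and strings compare as text, not as numbers. The quick check below uses a hypothetical record with `cost:12`, which is not in the original data, to show both effects.

```python
import re

# Hypothetical sample: the second record has a two-digit cost (not in the original log)
sample = "bond38,cost:9\nbond99,cost:12"

one_digit = re.findall(r'cost:(\d)', sample)     # \d matches exactly one digit
many_digits = re.findall(r'cost:(\d+)', sample)  # \d+ matches one or more digits

print(one_digit)    # ['9', '1']: the 12 was cut down to 1
print(many_digits)  # ['9', '12']

# findall returns strings, so max() compares them character by character
print(max(many_digits))                   # prints 9, because '9' > '1' as text
print(max(int(n) for n in many_digits))   # prints 12
```

In short, use `\d+` for the number and convert the captured strings with `int()` before calling `max`/`min`.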
Answer 2:
import re
text = '''<2018-09-19 15:09:54,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond38,cost:1
<2018-09-19 15:09:55,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond55,cost:2
<2018-09-19 15:09:56,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond1,cost:1
<2018-09-19 15:09:57,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond43,cost:4
<2018-09-19 15:09:58,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond73,cost:5
<2018-09-19 15:09:59,159> (Thread-2) [INFO ] (PressureListener.java:25) - bond12,cost:9'''
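# Method 2: split on the newline (\n) between records and match one line at a time
costs = []  # avoid naming this `list`, which would shadow the built-in
for line in text.split('\n'):
    m = re.match(r'.+cost:(\d+)', line)  # match() anchors at the start of the line, hence the leading .+
    if m:  # skip any line without a cost field instead of crashing on None
        costs.append(int(m.group(1)))
print('Average:', sum(costs) / len(costs))
print('Max:', max(costs))
print('Min:', min(costs))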
The output is the same:
Average: 3.6666666666666665
Max: 9
Min: 1
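The question is about a file, but both answers work on a string pasted into the code. The following is a minimal sketch of one way to read the values line by line from a file and make the keyword a parameter. The function names `values_after`, `cost_stats` and `cost_stats_from_file` are my own hypothetical helpers. The sketch assumes every value is a non-negative integer written as `keyword:<digits>` and that the file is UTF-8. It uses `re.search` instead of `re.match`, so the `.+` prefix from Answer 2 is not needed. The average comes from `statistics.mean`.

```python
import re
import statistics


def values_after(keyword, lines):
    """Yield the integer after the first 'keyword:' on each line (assumes non-negative integers)."""
    # re.escape makes the keyword literal even if it contains regex metacharacters
    pattern = re.compile(re.escape(keyword) + r':(\d+)')
    for line in lines:
        # search() scans the whole line, so no leading '.+' is needed as with match()
        m = pattern.search(line)
        if m:  # lines without the keyword are skipped
            yield int(m.group(1))


def cost_stats(lines, keyword='cost'):
    """Return (mean, max, min) of the values following keyword in an iterable of lines."""
    values = list(values_after(keyword, lines))
    if not values:
        raise ValueError(f'no "{keyword}:" values found')
    return statistics.mean(values), max(values), min(values)


def cost_stats_from_file(path, keyword='cost'):
    """Same as cost_stats, but iterates over a log file one line at a time."""
    with open(path, encoding='utf-8') as f:  # UTF-8 is an assumption about the log file
        return cost_stats(f, keyword)
```

For example, `cost_stats_from_file('pressure.log')` returns a `(mean, max, min)` tuple. The name `pressure.log` is only a placeholder for your own log file. Iterating over the file object avoids loading the whole log into memory.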