jieba分词
去除标点符号
punct = set(u''':!),.:;?]}¢'"、。〉》」』】〕〗〞︰︱︳﹐、﹒
﹔﹕﹖﹗﹚﹜﹞!),.:;?|}︴︶︸︺︼︾﹀﹂﹄﹏、~¢
々‖•·ˇˉ―--′’”([{£¥'"‵〈《「『【〔〖([{£¥〝︵︷︹︻
︽︿﹁﹃﹙﹛﹝({“‘-—_…''')
# 对str/unicode
filterpunt = lambda s: ''.join(filter(lambda x: x not in punct, s))
# 对list
filterpuntl = lambda l: list(filter(lambda x: x not in punct, l))
最后更新于
这有帮助吗?