Fixing a FinNLP crawler bug
-
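While following the example that FinNLP provides for fetching posts from the Eastmoney (东方财富) forum, which imports the scraper as

    from finnlp.data_sources.news.eastmoney_streaming import Eastmoney_Streaming

I found that the released code no longer returns any data. The cause is that the site's HTML structure changed, so the XPath expressions in the scraper no longer match anything.

I have fixed this and filed an issue that contains the corrected code: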
https://github.com/AI4Finance-Foundation/FinNLP/issues/3

    def _gather_pages(self, stock, page):
        ....
        # gather the content of the first page
        page = etree.HTML(response.text)
        trs = page.xpath('//*[@id="mainlist"]/div/ul/li[1]/table/tbody/tr')
        have_one = False
        for item in trs:
            have_one = True
            read_amount = item.xpath("./td[1]//text()")[0]
            comments = item.xpath("./td[2]//text()")[0]
            title = item.xpath("./td[3]/div/a//text()")[0]
            content_link = item.xpath("./td[3]/div/a/@href")[0]
            author = item.xpath("./td[4]//text()")[0]
            time = item.xpath("./td[5]//text()")[0]
            tmp = pd.DataFrame([read_amount, comments, title, content_link, author, time]).T
            columns = ["read amount", "comments", "title", "content link", "author", "create time"]
            tmp.columns = columns
            self.dataframe = pd.concat([self.dataframe, tmp])
            # print(title)
        # no rows matched: tell the caller to stop paging
        if not have_one:
            return "break"
        ...
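To check the new row XPath without hitting the live site, the extraction can be tried on a small HTML fragment. The following is only a minimal sketch: `SAMPLE_HTML` is hypothetical markup that mimics the table layout the XPath expects (it is not copied from Eastmoney), and `first` and `parse_rows` are illustrative helper names that are not part of FinNLP. The helper returns a default instead of indexing `[0]` directly, so a row with a missing cell does not raise `IndexError`.

```python
from lxml import etree

# Hypothetical, simplified markup that mimics the row layout the XPath expects.
SAMPLE_HTML = """
<html><body>
<div id="mainlist"><div><ul><li>
<table><tbody>
<tr>
  <td>1024</td>
  <td>7</td>
  <td><div><a href="/news,600519,1.html">Sample post title</a></div></td>
  <td>some_user</td>
  <td>07-01 09:30</td>
</tr>
<tr>
  <td>88</td>
  <td>0</td>
  <td><div><a href="/news,600519,2.html">Another post</a></div></td>
  <td>other_user</td>
  <td>07-01 09:31</td>
</tr>
</tbody></table>
</li></ul></div></div>
</body></html>
"""

# Row XPath taken from the corrected code above.
ROW_XPATH = '//*[@id="mainlist"]/div/ul/li[1]/table/tbody/tr'


def first(node, expr, default=""):
    """Return the first XPath match as stripped text, or `default` if nothing matches."""
    found = node.xpath(expr)
    return str(found[0]).strip() if found else default


def parse_rows(html_text):
    """Parse forum rows into a list of dicts (illustrative helper, not FinNLP API)."""
    page = etree.HTML(html_text)
    rows = []
    for item in page.xpath(ROW_XPATH):
        rows.append({
            "read amount": first(item, "./td[1]//text()"),
            "comments": first(item, "./td[2]//text()"),
            "title": first(item, "./td[3]/div/a//text()"),
            "content link": first(item, "./td[3]/div/a/@href"),
            "author": first(item, "./td[4]//text()"),
            "create time": first(item, "./td[5]//text()"),
        })
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE_HTML):
        print(row)
```

An empty list from `parse_rows` plays the same role as the `"break"` return value in the corrected method: it signals that the XPath matched no rows, which is also the symptom to look for if the site layout changes again.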