{"id":1158,"date":"2020-09-05T23:36:00","date_gmt":"2020-09-05T14:36:00","guid":{"rendered":"https:\/\/dalomo.net\/blog\/?p=1158"},"modified":"2020-09-12T13:59:15","modified_gmt":"2020-09-12T04:59:15","slug":"%e3%83%87%e3%83%bc%e3%82%bf%e3%83%99%e3%83%bc%e3%82%b9%e3%82%92%e4%bd%bf%e3%81%a3%e3%81%9f%e3%82%b5%e3%82%a4%e3%83%88%e3%82%92%e4%bd%9c%e3%82%8a%e3%81%9f%e3%81%84%e2%91%a5-%e6%9c%80%e6%96%b0%e3%83%87","status":"publish","type":"post","link":"https:\/\/dalomo.net\/blog\/2020\/09\/05\/1158\/","title":{"rendered":"I want to build a database-driven site \u2465: Scraping the latest data with Python + BeautifulSoup"},"content":{"rendered":"<h1>Scraping<\/h1>\n<p>I originally built the database by scraping the data with VBA + IE. Doing that by hand every time is a chore, though, so I would like to automate it. While looking for ways to scrape on a server, I came across BeautifulSoup, a Python library. The goal is to check periodically whether the site has been updated, scrape it, extract the data I need, and update the database. For now, let's just scrape.<\/p>
\n<h2>Scraping with Beautiful Soup<\/h2>\n<p><a href=\"https:\/\/www.crummy.com\/software\/BeautifulSoup\/bs4\/doc\/\">https:\/\/www.crummy.com\/software\/BeautifulSoup\/bs4\/doc\/<\/a><\/p>\n<p>The official documentation. Is 4.9.0 the latest right now?<\/p>\n<p><a href=\"http:\/\/kondou.com\/BS4\/\">http:\/\/kondou.com\/BS4\/<\/a><\/p>\n<p>This is the Japanese translation. It seems to be based on 4.2.0. Well, on we go.<\/p>\n<h3>Installation<\/h3>\n<p>I'm on CentOS, so:<\/p>\n<pre><code class=\"shell\">$ pip install beautifulsoup4\r\nCollecting beautifulsoup4\r\n  Downloading beautifulsoup4-4.9.1-py3-none-any.whl (115 kB)\r\n     |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 115 kB 17.9 MB\/s\r\nCollecting soupsieve&gt;1.2\r\n  Downloading soupsieve-2.0.1-py3-none-any.whl (32 kB)\r\nInstalling collected packages: soupsieve, beautifulsoup4\r\nSuccessfully installed beautifulsoup4-4.9.1 soupsieve-2.0.1<\/code><\/pre>\n<p>Installed.<\/p>\n<h3>Installing an HTML parser<\/h3>\n<p>This is the component that parses the HTML. There is one that apparently ships with Python's standard library, plus several others. Of those, I'll use lxml's HTML parser. It is said to be fast.<\/p>\n<pre><code class=\"shell\">$ pip install lxml\r\nCollecting lxml\r\n  Downloading lxml-4.5.2-cp38-cp38-manylinux1_x86_64.whl (5.4 MB)\r\n     |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 5.4 MB 10.6 MB\/s\r\nInstalling collected packages: lxml\r\nSuccessfully installed lxml-4.5.2<\/code><\/pre>\n<h3>Installing Requests<\/h3>\n<p><a 
href=\"https:\/\/requests.readthedocs.io\/en\/master\/\">https:\/\/requests.readthedocs.io\/en\/master\/<\/a><\/p>\n<p>This seems to be what you use to fetch the HTML as-is. Python's standard library apparently also includes urllib, but Requests is said to be easier to use. That is a lot of \"apparently\". Then again, all I need is to fetch one fixed, publicly available page, so it may be more than the job calls for. Well, I'll install it anyway!<\/p>\n<pre><code class=\"shell\">$ pip install requests\r\nCollecting requests\r\n  Downloading requests-2.24.0-py2.py3-none-any.whl (61 kB)\r\n     |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 61 kB 727 kB\/s\r\nCollecting chardet&lt;4,&gt;=3.0.2\r\n  Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)\r\n     |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 133 kB 22.3 MB\/s\r\nCollecting certifi&gt;=2017.4.17\r\n  Downloading certifi-2020.6.20-py2.py3-none-any.whl (156 kB)\r\n     |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 156 kB 30.5 MB\/s\r\nCollecting urllib3!=1.25.0,!=1.25.1,&lt;1.26,&gt;=1.21.1\r\n  Downloading urllib3-1.25.10-py2.py3-none-any.whl (127 kB)\r\n     |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 127 kB 17.0 MB\/s\r\nCollecting idna&lt;3,&gt;=2.5\r\n  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)\r\n     |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 58 kB 6.0 MB\/s\r\nInstalling collected packages: chardet, certifi, urllib3, idna, requests\r\nSuccessfully installed certifi-2020.6.20 chardet-3.0.4 idna-2.10 requests-2.24.0 
urllib3-1.25.10<\/code><\/pre>\n<p>Done.<\/p>\n<h3>Trying it out<\/h3>\n<p>First, check that the page can actually be fetched. I wrote some code.<\/p>\n<pre><code class=\"python\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\nurl = 'https:\/\/www.shogi.or.jp\/tsume_shogi\/everyday\/'\r\nres = requests.get(url)\r\n\r\nsoup = BeautifulSoup(res.text, 'lxml')\r\nprint(soup.title)<\/code><\/pre>\n<p>Result:<\/p>\n<blockquote><p>$ python3 scrape.py<br \/>\n&lt;title&gt;a??a??a?\u226aa?!ec\u00b0a\u00b0?a\uffe1?i??ec\u00b0a\u00b0?a\uffe1?a?\u226ba\uffe2!a?Ra\uff0c\u0080a??i??a?\\a?\uffe2a\u00b0?a\uffe1?e\u0080\uffe1c??&lt;\/title&gt;<\/p><\/blockquote>\n<p>Whoa, garbled text again... Hmm, let me see.<\/p>\n<pre><code class=\"python\">print('\u3053\u3093\u306b\u3061\u308f')<\/code><\/pre>\n<p>gives<\/p>\n<pre><code class=\"shell\">$ python3 hello.py\r\n\u3053\u3093\u306b\u3061\u308f<\/code><\/pre>\n<p>so plain Japanese output displays fine.<\/p>\n<pre><code class=\"python\">import requests\r\n\r\nurl = 'https:\/\/www.shogi.or.jp\/tsume_shogi\/everyday\/'\r\nres = requests.get(url)\r\n\r\nprint(res.text)<\/code><\/pre>\n<p>Printing the raw response instead gives:<\/p>\n<blockquote><p>&lt;title&gt;a??a??a?\u226aa?!ec\u00b0a\u00b0?a\uffe1?i??ec\u00b0a\u00b0?a\uffe1?a?\u226ba\uffe2!a?Ra\uff0c\u0080a??i??a?\\a?\uffe2a\u00b0?a\uffe1?e\u0080\uffe1c??&lt;\/title&gt;<\/p><\/blockquote>\n<p>Ah, it is already garbled at this point. I searched for something like &#8220;Requests \u6587\u5b57\u5316\u3051&#8221; (Requests garbled text).<\/p>\n<p style=\"text-align: left;\"><a 
href=\"https:\/\/qiita.com\/nittyan\/items\/d3f49a7699296a58605b\">https:\/\/qiita.com\/nittyan\/items\/d3f49a7699296a58605b<\/a><\/p>\n<p>Articles like this one came up. I'll try it myself.<\/p>\n<pre><code class=\"python\">print(res.encoding)<\/code><\/pre>\n<p>I changed the print statement to this and got:<\/p>\n<pre><code class=\"shell\">$ python3 scrape.py\r\nISO-8859-1<\/code><\/pre>\n<p>Oh. Requests falls back to ISO-8859-1 when a text response declares no charset in its Content-Type header, which is presumably what happened here. The Qiita article uses <span class=\"n\">apparent_encoding<\/span>, but I already know the site is encoded in utf-8, so I set it directly:<\/p>\n<pre><code class=\"python\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\nurl = 'https:\/\/www.shogi.or.jp\/tsume_shogi\/everyday\/'\r\nres = requests.get(url)\r\nres.encoding = 'utf-8'\r\n\r\nsoup = BeautifulSoup(res.text, 'lxml')\r\nprint(soup.title)<\/code><\/pre>\n<p>This simply forces the encoding. Result:<\/p>\n<blockquote><p>$ python3 scrape.py<br \/>\n&lt;title&gt;\u307e\u3044\u306b\u3061\u8a70\u5c06\u68cb\uff5c\u8a70\u5c06\u68cb\u30fb\u6b21\u306e\u4e00\u624b\uff5c\u65e5\u672c\u5c06\u68cb\u9023\u76df&lt;\/title&gt;<\/p><\/blockquote>\n<p>Got it! Next, I'll fetch the data for the latest problem date.<\/p>\n<h3>Fetching the latest entry<\/h3>\n<p>I don't have a good grasp of CSS selectors yet, so I'll go with find and find_all. Looking at the source, the target is <span class=\"html-attribute-name\">id<\/span>=&quot;<span 
class=\"html-attribute-value\">contents<\/span>&quot;: the a and p elements in the first li element inside it look like all I need. After that, processing the text of the p element should give me the data. It ended up like this.<\/p>\n<pre><code class=\"python\">import requests\r\nfrom bs4 import BeautifulSoup\r\n\r\ntarget_url = 'https:\/\/www.shogi.or.jp\/tsume_shogi\/everyday\/'\r\nres = requests.get(target_url)\r\nres.encoding = 'utf-8'\r\n\r\nsoup = BeautifulSoup(res.text, 'lxml')\r\n\r\n# find() returns the first match, so the first a and p inside #contents\r\n# are the link and caption of the latest problem\r\ncontents = soup.find(id=\"contents\")\r\nurl = contents.find('a').get('href')\r\n\r\nprint(url)\r\n\r\ntitle = contents.find('p').text\r\n\r\nprint(title)\r\n\r\n# Cut the caption at \u65e5 and turn \u5e74 and \u6708 into hyphens: 2020\u5e749\u67085\u65e5 becomes 2020-9-5\r\nq_date = title[0:title.find('\u65e5')].replace('\u5e74','-').replace('\u6708','-')\r\n\r\nprint(q_date)\r\n\r\n# Inside the full-width parentheses: composer, then \u3001, then the move count\r\nmaker = ''\r\nsteps = 0\r\nif '\u3001' in title:\r\n  # The -1 drops the trailing \u4f5c after the composer's name\r\n  maker = title[title.find('\uff08')+1:title.find('\u3001')-1]\r\n  steps = title[title.find('\u3001')+1:title.find('\u624b\u8a70')]\r\nelse:\r\n  # No composer given, so the move count follows the parenthesis directly\r\n  steps = title[title.find('\uff08')+1:title.find('\u624b\u8a70')]\r\n\r\nprint(maker)\r\nprint(steps)<\/code><\/pre>\n<p>Result:<\/p>\n<pre><code class=\"shell\">$ python3 
scrape.py\r\nhttps:\/\/www.shogi.or.jp\/tsume_shogi\/everyday\/2020959.html\r\n2020\u5e749\u67085\u65e5\u306e\u8a70\u5c06\u68cb\uff08\u670d\u90e8\u614e\u4e00\u90ce\u4f5c\u30019\u624b\u8a70\uff09\r\n2020-9-5\r\n\u670d\u90e8\u614e\u4e00\u90ce\r\n9<\/code><\/pre>\n<p>I have only checked the 9\/5 entry so far, but it is probably fine. I wasn't sure how to write the arguments to Beautiful Soup's find, so I ended up calling find several times. The li turned out to be unnecessary. The data extraction could probably be written more neatly with regular expressions, too... I still have a long way to go. Anyway, it looks like I can get the data I want, so next I'll write the code that updates the DB with it!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Scraping I originally built the database by scraping the data with VBA + IE. Doing that by hand every time is a chore, though, so I would like to automate it. While looking for ways to scrape on a server, I came across &hellip; <a href=\"https:\/\/dalomo.net\/blog\/2020\/09\/05\/1158\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[8],"tags":[123,50,124,125,120],"class_list":["post-1158","post","type-post","status-publish","format-standard","hentry","category-8","tag-beautifulsoup","tag-python","tag-requests","tag-125","tag-120"],"_links":{"self":[{"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/posts\/1158","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/comments?post=1158"}],"version-history":[{"count":3,"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/posts\/1158\/revisions"}],"predecessor-version":[{"id":1163,"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/posts\/1158\/revisions\/1163"}],"wp:attachment":[{"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/media?parent=1158"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/categories?post=1158"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dalomo.net\/blog\/wp-json\/wp\/v2\/tags?post=1158"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}