{"id":19521,"date":"2022-09-16T04:54:46","date_gmt":"2022-09-16T02:54:46","guid":{"rendered":"http:\/\/blog.wenzlaff.de\/?p=19521"},"modified":"2022-09-16T10:49:13","modified_gmt":"2022-09-16T08:49:13","slug":"hakrawler-ein-schneller-golang-web-crawler-auf-dem-pi-im-docker","status":"publish","type":"post","link":"http:\/\/blog.wenzlaff.de\/?p=19521","title":{"rendered":"Hakrawler: a fast Go web crawler on the Pi in Docker"},"content":{"rendered":"<p>If you want to run a web crawler on the Pi, take a look at <a href=\"https:\/\/github.com\/hakluke\/hakrawler\" rel=\"noopener\" target=\"_blank\">hakrawler<\/a>, which is written in Go.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/blog.wenzlaff.de\/wp-content\/uploads\/2022\/09\/blutzelle.webp\" alt=\"Blood cell\" width=\"1920\" height=\"1080\" class=\"aligncenter size-full wp-image-19587\" srcset=\"http:\/\/blog.wenzlaff.de\/wp-content\/uploads\/2022\/09\/blutzelle.webp 1920w, http:\/\/blog.wenzlaff.de\/wp-content\/uploads\/2022\/09\/blutzelle-300x169.webp 300w, http:\/\/blog.wenzlaff.de\/wp-content\/uploads\/2022\/09\/blutzelle-1024x576.webp 1024w, http:\/\/blog.wenzlaff.de\/wp-content\/uploads\/2022\/09\/blutzelle-768x432.webp 768w, http:\/\/blog.wenzlaff.de\/wp-content\/uploads\/2022\/09\/blutzelle-1536x864.webp 1536w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/p>\n<p>After installation, we run the program in Docker (only against your own servers!), e.g. against the domain <a href=\"http:\/\/kleinhirn.eu\/\" rel=\"noopener\" target=\"_blank\">http:\/\/kleinhirn.eu\/<\/a>:<\/p>\n<p><strong>echo http:\/\/kleinhirn.eu | docker run --rm -i hakluke\/hakrawler -subs -u<\/strong><\/p>\n<p>Here is an excerpt from the dump:<\/p>\n<p><!--more--><\/p>\n<pre class=\"lang:default decode:true \" >http:\/\/blog.wenzlaff.de\/?p=6195\r\nhttp:\/\/kleinhirn.eu\/2015\/09\/01\/die-schlechtesten-airlines-der-welt-2015\/\r\nhttp:\/\/kleinhirn.eu\/wp-content\/uploads\/2015\/09\/schlechtesten-Airlines-der-Welt.png\r\nhttp:\/\/kleinhirn.eu\/wp-content\/uploads\/2015\/09\/schlechtesten-Airlines-der-Welt.pdf\r\nhttp:\/\/kleinhirn.eu\/wp-content\/uploads\/2020\/09\/ziele-setzen.png\r\nhttp:\/\/kleinhirn.eu\/2020\/09\/04\/top-7-wie-ziele-setzen\/\r\nhttp:\/\/kleinhirn.eu\/wp-content\/uploads\/2020\/01\/top-10-goals-2020.png\r\nhttp:\/\/kleinhirn.eu\/2020\/01\/10\/top-10-goals-template\/\r\nhttp:\/\/kleinhirn.eu\/wp-content\/uploads\/2020\/01\/2020_Ziele-scaled.jpg\r\nhttp:\/\/kleinhirn.eu\/2020\/01\/08\/ziele-2020\/\r\nhttp:\/\/kleinhirn.eu\/wp-content\/uploads\/2018\/12\/ziele-vorlage.png\r\nhttp:\/\/kleinhirn.eu\/2018\/12\/22\/top-10-ziele-vorlage\/\r\nhttp:\/\/kleinhirn.eu\/2015\/12\/25\/2016\/\r\nhttp:\/\/kleinhirn.eu\/wp-content\/uploads\/2015\/12\/2016.png\r\nhttp:\/\/kleinhirn.eu\/wp-content\/uploads\/2015\/12\/2016.pdf\r\nhttp:\/\/kleinhirn.eu\/wp-content\/uploads\/2021\/01\/base58.png\r\nhttp:\/\/kleinhirn.eu\/2021\/01\/30\/base58\/#more-5706<\/pre>\n<p>Here are the parameters from the help output:<\/p>\n<p><strong>docker run --rm -i hakluke\/hakrawler --help<\/strong><\/p>\n<pre class=\"lang:default decode:true \" >\r\n\r\n\r\nUsage of hakrawler:\r\n  -d int\r\n    \tDepth to crawl. (default 2)\r\n  -h string\r\n    \tCustom headers separated by two semi-colons. E.g. -h \"Cookie: foo=bar;;Referer: http:\/\/example.com\/\"\r\n  -insecure\r\n    \tDisable TLS verification.\r\n  -json\r\n    \tOutput as JSON.\r\n  -proxy string\r\n    \tProxy URL. E.g. -proxy http:\/\/127.0.0.1:8080\r\n  -s\tShow the source of URL based on where it was found. E.g. href, form, script, etc.\r\n  -size int\r\n    \tPage size limit, in KB. (default -1)\r\n  -subs\r\n    \tInclude subdomains for crawling.\r\n  -t int\r\n    \tNumber of threads to utilise. (default 8)\r\n  -timeout int\r\n    \tMaximum time to crawl each URL from stdin, in seconds. (default -1)\r\n  -u\tShow only unique urls.<\/pre>\n<p>More on <a href=\"https:\/\/github.com\/hakluke\/hakrawler\" rel=\"noopener\" target=\"_blank\">GitHub<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you want to run a web crawler on the Pi, take a look at hakrawler, which is written in Go. After installation, we run the program in Docker (only against your own servers!), e.g. against the domain http:\/\/kleinhirn.eu\/ echo http:\/\/kleinhirn.eu | docker run --rm -i hakluke\/hakrawler -subs -u Here is an excerpt from the dump:<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[220,5119,1023,4129,1319],"tags":[5281,2502,4314,1025,5282],"class_list":["post-19521","post","type-post","status-publish","format-standard","hentry","category-anleitung","category-go","category-raspberry-pi","category-raspberry-pi-4-b","category-sicherheit-2","tag-crawler","tag-docker","tag-go","tag-pi","tag-web-crawler"],"_links":{"self":[{"href":"http:\/\/blog.wenzlaff.de\/index.php?rest_route=\/wp\/v2\/posts\/19521","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/blog.wenzlaff.de\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.wenzlaff.de\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.wenzlaff.de\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.wenzlaff.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=19521"}],"version-history":[{"count":0,"href":"http:\/\/blog.wenzlaff.de\/index.php?rest_route=\/wp\/v2\/posts\/19521\/revisions"}],"wp:attachment":[{"href":"http:\/\/blog.wenzlaff.de\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=19521"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.wenzlaff.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=19521"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.wenzlaff.de\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=19521"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}