{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":447387652,"defaultBranch":"main","name":"crawler","ownerLogin":"crwlrsoft","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2022-01-12T22:20:59.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/36385739?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1711475895.0","currentOid":""},"activityList":{"items":[{"before":"356237480f217e321309ca720799c850b3da3bf2","after":"78881867d99ec49bd95576839ed844e87eb6c36c","ref":"refs/heads/feature/keep-and-sub-crawling-procedures","pushedAt":"2024-05-30T11:03:12.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Fix changelog\n\nThe new method is `outputType()`. Method `outputKey()` is an existing\nmethod.","shortMessageHtmlLink":"Fix changelog"}},{"before":"125e1367a5e69581a9a3894bf1ced6629eff8a86","after":"356237480f217e321309ca720799c850b3da3bf2","ref":"refs/heads/feature/keep-and-sub-crawling-procedures","pushedAt":"2024-05-30T10:49:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Changes after updating PHP CS Fixer\n\nAdd trailing commas in multi line function calls.","shortMessageHtmlLink":"Changes after updating PHP CS Fixer"}},{"before":"5bc581be6e96d04039d737e6c46fe4d1a7f71be5","after":"125e1367a5e69581a9a3894bf1ced6629eff8a86","ref":"refs/heads/feature/keep-and-sub-crawling-procedures","pushedAt":"2024-05-30T10:29:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Improve dealing with compression in cache 
files\n\nMake reading a compressed cache file work, even when useCompression was\nnot called on the `FileCache` instance.","shortMessageHtmlLink":"Improve dealing with compression in cache files"}},{"before":null,"after":"5bc581be6e96d04039d737e6c46fe4d1a7f71be5","ref":"refs/heads/feature/keep-and-sub-crawling-procedures","pushedAt":"2024-03-26T17:58:15.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"keep() instead of addToResult() and sub crawlers\n\nNew methods `Step::keep()`, `Step::keepAs()`, `Step::keepFromInput()`\nand `Step::keepInputAs()` as simpler alternatives for\n`Step::addToResult()`, `Step::addLaterToResult()` and\n`Step::keepInputData()` which are all deprecated now. The new keep\nmethods add data to a keep array in IO objects. Not creating a Result\nobject and potentially sharing the same Result object for a lot of child\noutputs, makes the new keep functionality less complex. No need for\nsomething like `addLaterToResult()`. Kept properties can also be used\nwith `useInputKey()` which is pretty handy.\n\nAnother cool new feature are sub crawlers. Any step can now create a\nsub crawler to fill a property. Example: you have a page about an\nauthor with multiple links to detail pages about his books. You can\nselect those links and let a sub crawler fill the author's `books`\nproperty with data from the book detail pages.\n\nFurther also introduce a new `Step::outputType()` method, that returns\nif a certain step yields outputs that are associate arrays (or objects),\nscalar values or potentially both (mixed). 
This helps reduce potential\ncritical problems during a crawler run by validating before the run and\nthrowing an exception (or log error messages).","shortMessageHtmlLink":"keep() instead of addToResult() and sub crawlers"}},{"before":"b719c8b07bdee320d57478b2f0f240dcbf9397c5","after":null,"ref":"refs/heads/bugfix/fail-soft-when-input-key-missing","pushedAt":"2024-03-19T11:39:45.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"}},{"before":"f2214f808663e9dca663000efc6739b2ca87431c","after":"eafa3b6ae70b90a1cda9cb033623dcdac67f7d3e","ref":"refs/heads/main","pushedAt":"2024-03-19T11:39:40.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Fail soft when input key is missing\n\nWhen the `useInputKey()` method is used on a step and the defined key\ndoes not exist in input, it logs a warning and does not invoke the step\ninstead of throwing an `Exception`.","shortMessageHtmlLink":"Fail soft when input key is missing"}},{"before":null,"after":"b719c8b07bdee320d57478b2f0f240dcbf9397c5","ref":"refs/heads/bugfix/fail-soft-when-input-key-missing","pushedAt":"2024-03-19T11:18:01.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Fail soft when input key is missing\n\nWhen the `useInputKey()` method is used on a step and the defined key\ndoes not exist in input, it logs a warning and does not invoke the step\ninstead of throwing an `Exception`.","shortMessageHtmlLink":"Fail soft when input key is 
missing"}},{"before":"454acfc97a52caba4bf233fcc5bdb1a125a05ef0","after":null,"ref":"refs/heads/bugfix/http-crawl-initial-response-null","pushedAt":"2024-03-11T12:46:01.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"}},{"before":"7e5874477f509e1198967e67a802a41e9a029804","after":"f2214f808663e9dca663000efc6739b2ca87431c","ref":"refs/heads/main","pushedAt":"2024-03-11T12:45:57.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Fix issue in Http::crawl() step\n\nFix a PHP error that happened when the loader returns `null` for the\ninitial request in the `Http::crawl()` step.","shortMessageHtmlLink":"Fix issue in Http::crawl() step"}},{"before":null,"after":"454acfc97a52caba4bf233fcc5bdb1a125a05ef0","ref":"refs/heads/bugfix/http-crawl-initial-response-null","pushedAt":"2024-03-11T12:34:18.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Fix issue in Http::crawl() step\n\nFix a PHP error that happened when the loader returns `null` for the\ninitial request in the `Http::crawl()` step.","shortMessageHtmlLink":"Fix issue in Http::crawl() 
step"}},{"before":"253549e1c30ee4dc7c4def65935563d57b759073","after":null,"ref":"refs/heads/feature/json-step-improvement","pushedAt":"2024-03-04T13:04:03.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"}},{"before":"95e5bdfa9bae43a21eec47d6d8a20ec34657ab11","after":"7e5874477f509e1198967e67a802a41e9a029804","ref":"refs/heads/main","pushedAt":"2024-03-04T13:03:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"JSON step improvement\n\nAllow getting the whole decoded JSON as array with the new `Json::all()`\nand also allow to get the whole decoded JSON, when using `Json::get()`,\ninside a mapping using either empty string or `*` as target.\nExample: `Json::get(['all' => '*'])`. `*` only works, when there is no\nkey `*` in the decoded data.\n\nMake it work with responses loaded by a headless browser. If decoding\nthe input string fails, it now checks if it could be HTML. If that's the\ncase, it extracts the text content of the `` and tries to decode\nthis instead.","shortMessageHtmlLink":"JSON step improvement"}},{"before":null,"after":"253549e1c30ee4dc7c4def65935563d57b759073","ref":"refs/heads/feature/json-step-improvement","pushedAt":"2024-03-04T13:01:58.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"JSON step improvement\n\nAllow getting the whole decoded JSON as array with the new `Json::all()`\nand also allow to get the whole decoded JSON, when using `Json::get()`,\ninside a mapping using either empty string or `*` as target.\nExample: `Json::get(['all' => '*'])`. 
`*` only works, when there is no\nkey `*` in the decoded data.\n\nMake it work with responses loaded by a headless browser. If decoding\nthe input string fails, it now checks if it could be HTML. If that's the\ncase, it extracts the text content of the `` and tries to decode\nthis instead.","shortMessageHtmlLink":"JSON step improvement"}},{"before":"c944ee14592014aa3bcf9c87af733d317c4f33b8","after":null,"ref":"refs/heads/bugfix/cache-filter-with-redirects","pushedAt":"2024-02-26T22:32:01.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"}},{"before":"7a1633d4e2948a34d6ffade728b6333ad3af8f81","after":"95e5bdfa9bae43a21eec47d6d8a20ec34657ab11","ref":"refs/heads/main","pushedAt":"2024-02-26T22:31:57.000Z","pushType":"pr_merge","commitsCount":3,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Add phpdoc for HttpCrawler::getLoader()\n\nFor better autocompletion in IDEs.","shortMessageHtmlLink":"Add phpdoc for HttpCrawler::getLoader()"}},{"before":"693a658f6332b7908202f42e5575b42ab8496463","after":"c944ee14592014aa3bcf9c87af733d317c4f33b8","ref":"refs/heads/bugfix/cache-filter-with-redirects","pushedAt":"2024-02-26T22:29:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Add phpdoc for HttpCrawler::getLoader()\n\nFor better autocompletion in IDEs.","shortMessageHtmlLink":"Add phpdoc for 
HttpCrawler::getLoader()"}},{"before":"6db3de44b9bb4c054decac9c42048f747db54f79","after":"693a658f6332b7908202f42e5575b42ab8496463","ref":"refs/heads/bugfix/cache-filter-with-redirects","pushedAt":"2024-02-26T22:19:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Test code improvements\n\nImprove some test code to avoid phpstan ignore line and to have better\nautocompletion.","shortMessageHtmlLink":"Test code improvements"}},{"before":null,"after":"6db3de44b9bb4c054decac9c42048f747db54f79","ref":"refs/heads/bugfix/cache-filter-with-redirects","pushedAt":"2024-02-26T21:35:10.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Fix issue with cache filters and redirects\n\nWhen using `HttpLoader::cacheOnlyWhereUrl()` and a request was\nredirected (maybe even multiple times), previously all URLs in the chain\nhad to match the filter rule. 
As this isn't really practicable, now only\none of the URLs has to match the rule.","shortMessageHtmlLink":"Fix issue with cache filters and redirects"}},{"before":"18174bbb44d24800c0876d108c43c399b6d06e85","after":null,"ref":"refs/heads/feature/allow-updating-cached-response","pushedAt":"2024-02-16T22:08:46.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"}},{"before":"33b49fbfdb688a61f2801fddb584144b925a9836","after":"7a1633d4e2948a34d6ffade728b6333ad3af8f81","ref":"refs/heads/main","pushedAt":"2024-02-16T22:08:42.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Enable updating cached responses via the Loader\n\nMake method `HttpLoader::addToCache()` public, so steps can update a\ncached response with an extended version.","shortMessageHtmlLink":"Enable updating cached responses via the Loader"}},{"before":null,"after":"18174bbb44d24800c0876d108c43c399b6d06e85","ref":"refs/heads/feature/allow-updating-cached-response","pushedAt":"2024-02-16T17:33:54.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Enable updating cached responses via the Loader\n\nMake method `HttpLoader::addToCache()` public, so steps can update a\ncached response with an extended version.","shortMessageHtmlLink":"Enable updating cached responses via the 
Loader"}},{"before":"96b86fe97875943cc33f9f64a0f2c4c242feb2ea","after":null,"ref":"refs/heads/feature/add-to-result-dot-notation","pushedAt":"2024-02-13T02:03:09.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"}},{"before":"882bc1ea9a03f6c1c5be622c09f2c71860998def","after":"33b49fbfdb688a61f2801fddb584144b925a9836","ref":"refs/heads/main","pushedAt":"2024-02-13T02:03:00.000Z","pushType":"pr_merge","commitsCount":4,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Revert unnecessary multi line method signature","shortMessageHtmlLink":"Revert unnecessary multi line method signature"}},{"before":"3b815c73a9be62f17686d3b0389c69d4a8fc3e21","after":"96b86fe97875943cc33f9f64a0f2c4c242feb2ea","ref":"refs/heads/feature/add-to-result-dot-notation","pushedAt":"2024-02-13T01:59:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Revert unnecessary multi line method signature","shortMessageHtmlLink":"Revert unnecessary multi line method signature"}},{"before":"7cbac4e37dd2d023b347bc1fa4e97cef313fec16","after":"3b815c73a9be62f17686d3b0389c69d4a8fc3e21","ref":"refs/heads/feature/add-to-result-dot-notation","pushedAt":"2024-02-13T01:52:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"New tests for the store call timing improvement","shortMessageHtmlLink":"New tests for the store call timing 
improvement"}},{"before":"a715bf1cef9d1134d4cd82eec88e99fc2c134bf4","after":"7cbac4e37dd2d023b347bc1fa4e97cef313fec16","ref":"refs/heads/feature/add-to-result-dot-notation","pushedAt":"2024-02-13T00:46:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Improve store call timing and memory usage\n\nImprovement regarding the timing when a store (`Store` class instance)\nis called by the crawler with a final crawling result. When a crawling\nstep initiates a crawling result (so, `addToResult()` was called on the\nstep instance), the crawler has to wait for all child outputs (resulting\nfrom one step-input) until it calls the store, because the child outputs\ncan all add data to the same final result object. But previously this\nwas not only the case for all child outputs starting from a step where\n`addToResult()` was called, but all children of one initial crawler\ninput. 
So with this change, in a lot of cases, the store will earlier be\ncalled with finished `Result` objects and memory usage will be lowered.","shortMessageHtmlLink":"Improve store call timing and memory usage"}},{"before":"8e26cc4b73fc1294823397447658fcb6d3a3739b","after":null,"ref":"refs/heads/bugfix/partially-revert-loader-split","pushedAt":"2024-02-12T17:24:31.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"}},{"before":null,"after":"a715bf1cef9d1134d4cd82eec88e99fc2c134bf4","ref":"refs/heads/feature/add-to-result-dot-notation","pushedAt":"2024-02-09T13:38:42.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Enable adding to result from nested output\n\n* Enable dot notation in `Step::addToResult()`, so you can get data from\nnested output, like: `$step->addToResult(['url' => 'response.url',\n'status' => 'response.status', 'foo' => 'bar'])`.\n* When a step adds output properties to the result, and the output\ncontains objects, it tries to serialize those objects to arrays, by\ncalling `__serialize()`. If you want an object to be serialized\ndifferently for that purpose, you can define a `toArrayForAddToResult()`\nmethod in that class. When that method exists, it's preferred to the\n`__serialize()` method.\n* Implemented above-mentioned `toArrayForAddToResult()` method in the\n`RespondedRequest` class, so on every step that somehow yields a\n`RespondedRequest` object, you can use the keys `url`, `uri`, `status`,\n`headers` and `body` with the `addToResult()` method. Previously this\nonly worked for `Http` steps, because it defines output key aliases\n(`HttpBase::outputKeyAliases()`). 
Now, in combination with the ability\nto use dot notation when adding data to the result, if your custom step\nreturns nested output like `['response' => RespondedRequest, 'foo' =>\n'bar']`, you can add response data to the result like this\n`$step->addToResult(['url' => 'response.url', 'body' =>\n'response.body'])`.","shortMessageHtmlLink":"Enable adding to result from nested output"}},{"before":"9f04c17ee9ccd27f5537e01a7d60fcc1083a2bf5","after":"882bc1ea9a03f6c1c5be622c09f2c71860998def","ref":"refs/heads/main","pushedAt":"2024-02-07T14:48:10.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Merge HttpBaseLoader back to HttpLoader\n\nMerge `HttpBaseLoader` again with `HttpLoader`. It's probably not a good\nidea to have multiple loaders. At least not multiple loaders just for\nHTTP. It should be enough to publicly expose the\n`HeadlessBrowserLoaderHelper` via `HttpLoader::browserHelper()` for the\nextension steps. But keep the `HttpBase` step, to share the general HTTP\nfunctionality implemented there.","shortMessageHtmlLink":"Merge HttpBaseLoader back to HttpLoader"}},{"before":null,"after":"8e26cc4b73fc1294823397447658fcb6d3a3739b","ref":"refs/heads/bugfix/partially-revert-loader-split","pushedAt":"2024-02-07T14:45:29.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"otsch","name":"otsch","path":"/otsch","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4062813?s=80&v=4"},"commit":{"message":"Merge HttpBaseLoader back to HttpLoader\n\nMerge `HttpBaseLoader` again with `HttpLoader`. It's probably not a good\nidea to have multiple loaders. At least not multiple loaders just for\nHTTP. It should be enough to publicly expose the\n`HeadlessBrowserLoaderHelper` via `HttpLoader::browserHelper()` for the\nextension steps. 
But keep the `HttpBase` step, to share the general HTTP\nfunctionality implemented there.","shortMessageHtmlLink":"Merge HttpBaseLoader back to HttpLoader"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEV-oTSAA","startCursor":null,"endCursor":null}},"title":"Activity · crwlrsoft/crawler"}