[QUESTION] Are there cases where BrowserFetcher does not fully support CSR? #224

Open
pistolcaffe opened this issue Jun 24, 2023 · 1 comment
@pistolcaffe

Describe what you want to achieve
I am creating a user guide page for my app, and I need to crawl that page from within the app. (I need to crawl certain URLs in the app as well as Notion pages.)
https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4

However, even when I use the default BrowserFetcher, I cannot get the title of the loaded page.

Please let me know if there is another way to do this.

Code Sample

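import it.skrape.core.htmlDocument
import it.skrape.fetcher.BrowserFetcher
import it.skrape.fetcher.response
import it.skrape.fetcher.skrape

fun main(args: Array<String>) {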
    skrape(BrowserFetcher) {
        request {
            url = "https://fundevstudio.notion.site/524eafbfa8f2414898d6d8d79f222c05?pvs=4"
        }

        response {
            htmlDocument {
                println("title: $titleText")
            }
        }
    }
}

[expect] title: 인사이트 플로우 가이드
[but] title: Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.

If that is not possible, please consider providing a waitUntil option similar to the one in Playwright and Puppeteer, with values such as load, networkidle, and documentLoaded.
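
To make the request concrete, here is a minimal sketch of the kind of "wait until ready" behavior being asked for, written in plain Kotlin on the JVM. The `pollUntil` helper and its parameters are hypothetical and are not part of skrape.it, Playwright, or Puppeteer. The `fetch` lambda in `main` is a stand-in that simulates a title being filled in later. Note that this only retries a read until a condition holds, so it would not help if the page's scripts never run successfully.

```kotlin
// Hypothetical helper, not part of skrape.it: re-runs `fetch` until `isReady`
// accepts the result, or returns null once the timeout has elapsed.
fun <T> pollUntil(
    timeoutMillis: Long,
    intervalMillis: Long = 200,
    fetch: () -> T,
    isReady: (T) -> Boolean,
): T? {
    val deadline = System.nanoTime() + timeoutMillis * 1_000_000
    while (true) {
        val value = fetch()
        if (isReady(value)) return value
        if (System.nanoTime() >= deadline) return null
        Thread.sleep(intervalMillis)
    }
}

fun main() {
    // Stand-in for a page whose title is replaced by client-side JavaScript
    // on the third read; a real page fetch would go here instead.
    val placeholder = "Loading"
    var reads = 0
    val title = pollUntil(
        timeoutMillis = 2_000,
        intervalMillis = 10,
        fetch = { if (++reads < 3) placeholder else "User guide" },
        isReady = { it != placeholder },
    )
    println("title: $title after $reads reads") // prints "title: User guide after 3 reads"
}
```

A built-in option would presumably hook into browser lifecycle events rather than polling, but the caller-facing contract (a condition plus a timeout) would look similar.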

pistolcaffe added the question label Jun 24, 2023
@pistolcaffe
Author

When using HtmlUnit directly, I found the following exception:

net.sourceforge.htmlunit.corejs.javascript.EvaluatorException: identifier is a reserved word: class (https://fundevstudio.notion.site/8402-8521e6e24e557272e4c0.js#1)

Since HtmlUnit uses an outdated Rhino engine, I think we may need to consider porting it to V8 or a similar engine.

Of course, it is only speculation that this engine exception is the direct cause. I will add a comment if I find more information.
