[Feature] Add support for cluster execution of arbitrary notebook code #976

Open
lukeSmth opened this issue Dec 14, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@lukeSmth

Loving the extension! Huge improvement for using engineering best practices and integrating Databricks compute with the larger ecosystem of locally executed tools.

I'd like to see support for executing arbitrary notebook code (not just Spark calls) on remote Databricks clusters. This would let local developers seamlessly take advantage of Databricks compute for heavy, non-Spark workflows (model training, for example).

Two approaches come to mind:

  1. Pipe commands to the Command Execution API, possibly using a local Jupyter kernel to interop between the notebook environment and Databricks.
  2. Connect to the driver node's Jupyter kernel over SSH.

Command Execution API

The Databricks Power Tools extension solves this by using the Command Execution API.

I don't know Rust, but as far as I can tell, the article "Connecting Jupyter with Databricks" aims to wrap the API in a local Jupyter kernel, which would allow connections from any Jupyter client.
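To make the first approach concrete, here is a rough sketch of the request flow as I understand it from the Command Execution API (version 1.2): create an execution context, submit a command, poll its status, then destroy the context. The helper names `make_http_sender` and `run_on_cluster` are mine and purely illustrative; the host, token, and cluster ID are placeholders, and this is not how Databricks Power Tools or this extension actually implements it.

```python
import json
import time
import urllib.parse
import urllib.request


def make_http_sender(host, token):
    """Return send(method, path, payload) bound to one workspace (hypothetical helper)."""
    def send(method, path, payload):
        url = host.rstrip("/") + path
        data = None
        if method == "GET":
            # GET endpoints take their arguments as query parameters.
            url += "?" + urllib.parse.urlencode(payload)
        else:
            data = json.dumps(payload).encode("utf-8")
        request = urllib.request.Request(
            url,
            data=data,
            method=method,
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return send


def run_on_cluster(send, cluster_id, code, language="python", poll_seconds=1.0):
    """Run one snippet on the cluster and return the API's `results` object."""
    context = send(
        "POST",
        "/api/1.2/contexts/create",
        {"clusterId": cluster_id, "language": language},
    )
    ids = {"clusterId": cluster_id, "contextId": context["id"]}
    try:
        command = send(
            "POST",
            "/api/1.2/commands/execute",
            {**ids, "language": language, "command": code},
        )
        while True:
            status = send(
                "GET",
                "/api/1.2/commands/status",
                {**ids, "commandId": command["id"]},
            )
            # Terminal states; anything else means the command is still in flight.
            if status["status"] in ("Finished", "Cancelled", "Error"):
                return status.get("results", {})
            time.sleep(poll_seconds)
    finally:
        # Always release the remote context, even if polling raised.
        send("POST", "/api/1.2/contexts/destroy", ids)
```

Usage would look something like `run_on_cluster(make_http_sender(host, token), cluster_id, "print(1 + 1)")`. A real kernel wrapper would keep one context alive per notebook session so that variables persist between cells, rather than creating and destroying a context for every call as this sketch does.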

SSH

This seems the most straightforward option in terms of net new code required. It also looks identical to the approach taken by the deprecated (for security reasons?) jupyterlab-integration project.
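As a rough illustration of what the SSH route involves: a Jupyter kernel listens on five ports listed in its connection file (`shell_port`, `iopub_port`, `stdin_port`, `control_port`, `hb_port`), so a local client needs each of them forwarded. The sketch below only builds the `ssh` port-forwarding command from such a connection file. The function names are hypothetical, the default port 2200 and user `ubuntu` are my assumption based on Databricks' SSH-to-driver setup, and how to obtain the kernel's connection file from the driver is left open.

```python
# The five port fields found in a Jupyter kernel connection file.
KERNEL_PORT_KEYS = (
    "shell_port",
    "iopub_port",
    "stdin_port",
    "control_port",
    "hb_port",
)


def ssh_tunnel_command(connection_info, driver_host, ssh_port=2200,
                       user="ubuntu", key_path=None):
    """Build an ssh argv that forwards every kernel port to localhost.

    ssh_port and user are assumed defaults, not confirmed for every workspace.
    """
    command = ["ssh", "-N", "-p", str(ssh_port)]
    if key_path:
        command += ["-i", key_path]
    for key in KERNEL_PORT_KEYS:
        port = connection_info[key]
        # Same port number on both ends keeps the connection file reusable.
        command += ["-L", f"{port}:127.0.0.1:{port}"]
    command.append(f"{user}@{driver_host}")
    return command


def localize_connection_info(connection_info):
    """Copy the connection info, pointing the client at the local tunnel."""
    local = dict(connection_info)
    local["ip"] = "127.0.0.1"
    return local
```

With the tunnel running, a Jupyter client could in principle be pointed at the localized connection info (same ports and signing key, with `ip` set to `127.0.0.1`).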

@lukeSmth lukeSmth changed the title Add support for cluster execution of arbitrary notebook code [Feature] Add support for cluster execution of arbitrary notebook code Dec 14, 2023
@kartikgupta-db kartikgupta-db added the enhancement New feature or request label Apr 17, 2024
@MrTeale

MrTeale commented May 2, 2024

+1 on this

@kartikgupta-db - If you have a rough understanding of what would need to change for this to be implemented and would accept a PR, I'd be willing to have a go. I just need some guidance on getting started.
