The Databricks CLI and REST API are essential tools for automating and managing tasks in the Databricks environment. They provide a programmatic interface to clusters, jobs, workspace assets, and data, bringing flexibility and efficiency to day-to-day operations. Here are five key use cases, each accompanied by example implementations using both the CLI and the REST API.
1. Cluster Management
Efficiently managing clusters is crucial for optimizing resource utilization and cost. This includes creating, starting, and stopping clusters as needed.
CLI Example:
databricks clusters create --json '{
  "cluster_name": "example-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2
}'
REST API Example (Python):
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"
CLUSTER_ID = "1234-567890-abcd123"

# Start an existing cluster by ID
response = requests.post(
    f"{DATABRICKS_URL}/api/2.0/clusters/start",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": CLUSTER_ID},
)
print(response.json())
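Stopping clusters is just as common as starting them. Below is a minimal sketch that terminates the same cluster through the Clusters API; it reuses the placeholder workspace URL, token, and cluster ID from the example above, and a terminated cluster can later be brought back with clusters/start.
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"
CLUSTER_ID = "1234-567890-abcd123"

# Terminate a running cluster (it can be restarted later with clusters/start)
response = requests.post(
    f"{DATABRICKS_URL}/api/2.0/clusters/delete",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"cluster_id": CLUSTER_ID},
)
print(response.status_code)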
2. Job Scheduling and Management
Automating job scheduling is vital for maintaining consistent data processing workflows.
CLI Example:
databricks jobs create --json '{
  "name": "example-job",
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2
  },
  "notebook_task": {
    "notebook_path": "/Users/username/example-notebook"
  }
}'
REST API Example (Python):
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"
JOB_ID = 1234

# Trigger an immediate run of an existing job
response = requests.post(
    f"{DATABRICKS_URL}/api/2.0/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": JOB_ID},
)
print(response.json())
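After triggering a run, the next step is usually to check whether it succeeded. The sketch below polls the run state once; the run ID is a placeholder standing in for the run_id returned by the run-now call above.
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"
RUN_ID = 5678  # placeholder for the run_id returned by run-now

# Fetch the current state of a job run
response = requests.get(
    f"{DATABRICKS_URL}/api/2.0/jobs/runs/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"run_id": RUN_ID},
)
print(response.json()["state"]["life_cycle_state"])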
3. Workspace Management
Managing notebooks and other workspace assets is key to organizing and maintaining project consistency.
CLI Example:
# Upload a notebook
databricks workspace import --language PYTHON --format SOURCE /path/to/local-notebook.py /Users/username/notebooks/example-notebook.py
REST API Example (Python):
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"
NOTEBOOK_PATH = "/Users/username/notebooks/example-notebook.py"

# Export a notebook as source; direct_download returns the file itself
# rather than a JSON payload with base64-encoded content
response = requests.get(
    f"{DATABRICKS_URL}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"path": NOTEBOOK_PATH, "format": "SOURCE", "direct_download": "true"},
)
with open("local-notebook.py", "wb") as f:
    f.write(response.content)
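It also helps to see what is already in a workspace folder before importing or exporting. Here is a minimal sketch that lists the contents of a directory using the Workspace API; the folder path is an assumed placeholder matching the examples above.
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"

# List notebooks and folders under a workspace directory
response = requests.get(
    f"{DATABRICKS_URL}/api/2.0/workspace/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"path": "/Users/username/notebooks"},
)
for obj in response.json().get("objects", []):
    print(obj["object_type"], obj["path"])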
4. Security and Access Control
Proper access control ensures that sensitive data and operations are secure.
CLI Example:
databricks users create --json '{
  "userName": "user@example.com"
}'
REST API Example (Python):
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"
USER_ID = "user-id"
GROUP_ID = "group-id"

# Add a user to a group with a SCIM PATCH operation
response = requests.patch(
    f"{DATABRICKS_URL}/api/2.0/preview/scim/v2/Groups/{GROUP_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [
            {"op": "add", "value": {"members": [{"value": USER_ID}]}}
        ],
    },
)
print(response.status_code)
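The SCIM IDs used above are not the same as email addresses, so it is often necessary to look them up first. A minimal sketch that resolves a user's ID by userName with a standard SCIM filter:
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"

# Look up a user's SCIM ID by userName
response = requests.get(
    f"{DATABRICKS_URL}/api/2.0/preview/scim/v2/Users",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"filter": 'userName eq "user@example.com"'},
)
for user in response.json().get("Resources", []):
    print(user["id"], user["userName"])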
5. Data Management and Operations
Handling data assets, such as uploading files to DBFS and creating or querying tables, is fundamental to data engineering tasks.
CLI Example:
# Copy a local data file into DBFS
databricks fs cp ./data/example.csv dbfs:/data/example.csv
REST API Example (Python):
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"
WAREHOUSE_ID = "<warehouse-id>"

# Create a table by running SQL through the Statement Execution API
response = requests.post(
    f"{DATABRICKS_URL}/api/2.0/sql/statements/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": WAREHOUSE_ID,
        "statement": "CREATE TABLE IF NOT EXISTS example_table (id INT, value STRING)",
    },
)
print(response.json())
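The Statement Execution API waits only briefly for a result by default, so a follow-up call may be needed to confirm the CREATE TABLE finished. The sketch below checks the statement status; the statement ID is a placeholder for the statement_id field returned by the submission above.
import requests

DATABRICKS_URL = "https://<databricks-instance>"
TOKEN = "<access-token>"
STATEMENT_ID = "<statement-id>"  # placeholder for the statement_id returned above

# Check the status of a previously submitted SQL statement
response = requests.get(
    f"{DATABRICKS_URL}/api/2.0/sql/statements/{STATEMENT_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(response.json()["status"]["state"])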
These examples show how the Databricks CLI and REST API can streamline daily tasks, making it easier to manage and automate your Databricks environment effectively.