@@ -1,12 +1,16 @@

>>> [CLI] bundle init lakeflow-pipelines --config-file ./input.json --output-dir output

Welcome to the template for Lakeflow Declarative Pipelines!

Please answer the below to tailor your project to your preferences.
You can always change your mind and change your configuration in the databricks.yml file later.

Note that [DATABRICKS_URL] is used for initialization
(see https://docs.databricks.com/dev-tools/cli/profiles.html for how to change your profile).

Your new project has been created in the 'my_lakeflow_pipelines' directory!
Your new project has been created in the 'my_lakeflow_pipelines' directory!

Refer to the README.md file for "getting started" instructions!
Please refer to the README.md file for "getting started" instructions.

>>> [CLI] bundle validate -t dev
Name: my_lakeflow_pipelines
@@ -1,7 +1,7 @@
{
"recommendations": [
"databricks.databricks",
"ms-python.vscode-pylance",
"redhat.vscode-yaml"
"redhat.vscode-yaml",
"ms-python.black-formatter"
]
}
@@ -1,19 +1,37 @@
{
"python.analysis.stubPath": ".vscode",
"databricks.python.envFile": "${workspaceFolder}/.env",
"jupyter.interactiveWindow.cellMarker.codeRegex": "^# COMMAND ----------|^# Databricks notebook source|^(#\\s*%%|#\\s*\\<codecell\\>|#\\s*In\\[\\d*?\\]|#\\s*In\\[ \\])",
"jupyter.interactiveWindow.cellMarker.default": "# COMMAND ----------",
"python.testing.pytestArgs": [
"."
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"python.analysis.extraPaths": ["resources/my_lakeflow_pipelines_pipeline"],
"files.exclude": {
"**/*.egg-info": true,
"**/__pycache__": true,
".pytest_cache": true,
"dist": true,
},
"files.associations": {
"**/.gitkeep": "markdown"
},

// Pylance settings (VS Code)
// Set typeCheckingMode to "basic" to enable type checking!
"python.analysis.typeCheckingMode": "off",
"python.analysis.extraPaths": ["src", "lib", "resources"],
Contributor

This implies that lakeflow_pipelines_etl.transformations is importable.

We should make sure that this is not the case and produces squigglies in the editor, because it won't be importable from the real pipeline either.

Contributor Author

The "resources" entry makes it so that all packages under "resources" can indeed be resolved for imports. This is important for imports across pipeline packags, e.g. the utilities package.

> We should make sure that this is not the case and produces squigglies in the editor, because it won't be importable from the real pipeline either.

I don't see any way to do that since IDEs like VS Code assume a single compilation unit in a project. We have more than one. And I don't think it's a good option to simply disallow imports in pipelines.
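To make the effect concrete, here is a minimal sketch (not part of this PR; the file and module paths are purely illustrative): with "resources" on extraPaths, Pylance treats the resources/ folder as an import root, so a cross-pipeline import like the one below resolves in the editor, while whether it works at pipeline runtime depends on the pipeline's own sys.path.

```python
# Hypothetical file: resources/another_pipeline/transformations/enriched_trips.py
# With "python.analysis.extraPaths": ["src", "lib", "resources"], the editor
# resolves packages rooted at resources/, so this import shows no squigglies:
from lakeflow_pipelines_etl.utilities import utils  # illustrative module path

# Whether this import also resolves when the pipeline actually runs depends on
# the pipeline's sys.path, which is the concern raised in this thread.
```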

"python.analysis.diagnosticMode": "workspace",
"python.analysis.stubPath": ".vscode",

// Pyright settings (Cursor)
// Set typeCheckingMode to "basic" to enable type checking!
"cursorpyright.analysis.typeCheckingMode": "off",
"cursorpyright.analysis.extraPaths": ["src", "lib", "resources"],
"cursorpyright.analysis.diagnosticMode": "workspace",
"cursorpyright.analysis.stubPath": ".vscode",

// General Python settings
"python.defaultInterpreterPath": "./.venv/bin/python",
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.formatOnSave": true,
@@ -2,38 +2,53 @@

The 'my_lakeflow_pipelines' project was generated by using the Lakeflow Pipelines template.

## Setup
* `lib/`: Python source code for this project.
* `lib/shared`: Shared source code across all jobs/pipelines/etc.
* `resources/lakeflow_pipelines_etl`: Pipeline code and assets for the lakeflow_pipelines_etl pipeline.
* `resources/`: Resource configurations (jobs, pipelines, etc.)

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/databricks-cli.html
## Getting started

2. Authenticate to your Databricks workspace, if you have not done so already:
```
$ databricks auth login
```
Choose how you want to work on this project:

(a) Directly in your Databricks workspace, see
https://docs.databricks.com/dev-tools/bundles/workspace.

3. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
https://docs.databricks.com/dev-tools/vscode-ext.html. Or the PyCharm plugin from
https://www.databricks.com/blog/announcing-pycharm-integration-databricks.
(b) Locally with an IDE like Cursor or VS Code, see
https://docs.databricks.com/vscode-ext.

(c) With command line tools, see https://docs.databricks.com/dev-tools/cli/databricks-cli.html

## Deploying resources
# Using this project with the CLI

1. To deploy a development copy of this project, type:
The Databricks workspace and IDE extensions provide a graphical interface for working
with this project. It's also possible to interact with it directly using the CLI:

1. Authenticate to your Databricks workspace, if you have not done so already:
```
$ databricks configure
```

2. To deploy a development copy of this project, type:
```
$ databricks bundle deploy --target dev
```
(Note that "dev" is the default target, so the `--target` parameter
is optional here.)

2. Similarly, to deploy a production copy, type:
```
$ databricks bundle deploy --target prod
```
This deploys everything that's defined for this project.
For example, the default template would deploy a pipeline called
`[dev yourname] lakeflow_pipelines_etl` to your workspace.
You can find that resource by opening your workspace and clicking on **Jobs & Pipelines**.

3. Use the "summary" comand to review everything that was deployed:
3. Similarly, to deploy a production copy, type:
```
$ databricks bundle summary
$ databricks bundle deploy --target prod
```
Note that the default template includes a job that runs the pipeline every day
(defined in resources/lakeflow_pipelines_etl/lakeflow_pipelines_job.job.yml). The schedule
is paused when deploying in development mode (see
https://docs.databricks.com/dev-tools/bundles/deployment-modes.html).

4. To run a job or pipeline, use the "run" command:
```
@@ -14,8 +14,6 @@ variables:
description: The catalog to use
schema:
description: The schema to use
notifications:
description: The email addresses to use for failure notifications

targets:
dev:
@@ -30,18 +28,15 @@ targets:
variables:
catalog: main
schema: ${workspace.current_user.short_name}
notifications: []

prod:
mode: production
workspace:
host: [DATABRICKS_URL]
# We explicitly deploy to /Workspace/Users/[USERNAME] to make sure we only have a single copy.
root_path: /Workspace/Users/[USERNAME]/.bundle/${bundle.name}/${bundle.target}
variables:
catalog: main
schema: prod
permissions:
- user_name: [USERNAME]
level: CAN_MANAGE
variables:
catalog: main
schema: default
notifications: [[USERNAME]]
@@ -0,0 +1,7 @@
from databricks.sdk.runtime import spark
from pyspark.sql import DataFrame


def find_all_taxis() -> DataFrame:
"""Find all taxi data."""
return spark.read.table("samples.nyctaxi.trips")
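As a usage sketch (hypothetical, not part of this diff; the import path and table name are assumptions), a transformation written in the style used elsewhere in this template could build on this helper:

```python
from pyspark import pipelines as dp

# Assumes find_all_taxis() lives in a shared module that pipeline code can
# import, e.g. somewhere under lib/ (the exact path is illustrative).
from shared.taxis import find_all_taxis


@dp.table
def taxis_over_ten_miles():
    # Materialize only the longer trips from the shared taxi view.
    return find_all_taxis().filter("trip_distance > 10")
```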
@@ -4,5 +4,7 @@ dist/
__pycache__/
*.egg-info
.venv/
scratch/**
!scratch/README.md
**/explorations/**
!**/explorations/README.md
@@ -0,0 +1 @@
This folder is reserved for Databricks Asset Bundles resource definitions.
@@ -1,11 +1,11 @@
# my_lakeflow_pipelines_pipeline
# my_lakeflow_pipelines

This folder defines all source code for the my_lakeflow_pipelines_pipeline pipeline:
This folder defines all source code for the my_lakeflow_pipelines pipeline:

- `explorations`: Ad-hoc notebooks used to explore the data processed by this pipeline.
- `transformations`: All dataset definitions and transformations.
- `utilities` (optional): Utility functions and Python modules used in this pipeline.
- `data_sources` (optional): View definitions describing the source data for this pipeline.
- `explorations/`: Ad-hoc notebooks used to explore the data processed by this pipeline.
- `transformations/`: All dataset definitions and transformations.
- `utilities/` (optional): Utility functions and Python modules used in this pipeline.
- `data_sources/` (optional): View definitions describing the source data for this pipeline.

## Getting Started

@@ -37,7 +37,7 @@
"source": [
"# !!! Before performing any data analysis, make sure to run the pipeline to materialize the sample datasets. The tables referenced in this notebook depend on that step.\n",
"\n",
"display(spark.sql(\"SELECT * FROM main.[USERNAME].my_lakeflow_pipelines\"))"
"display(spark.sql(\"SELECT * FROM main.[USERNAME].sample_trips_my_lakeflow_pipelines\"))"
]
}
],
@@ -1,12 +1,15 @@
# The main pipeline for my_lakeflow_pipelines

resources:
pipelines:
{{template `pipeline_name` .}}:
name: {{template `pipeline_name` .}}
serverless: true
channel: "PREVIEW"
lakeflow_pipelines_etl:
name: lakeflow_pipelines_etl
## Catalog is required for serverless compute
catalog: ${var.catalog}
schema: ${var.schema}
serverless: true
root_path: "."
Contributor @andersrexdb, Sep 19, 2025

Supporting databricks bundle generate with this root path will be tricky. We can't just fetch the root_path + glob as that will ignore lib/.

We are planning on using generate to support importing existing resources into a DAB in the workspace.

Maybe we need to make the CLI aware of the special conventions in this template? @pietern @andrewnester


libraries:
- glob:
include: transformations/**
@@ -1,19 +1,21 @@
# The job that triggers my_lakeflow_pipelines_pipeline.
# The job that triggers lakeflow_pipelines_etl.

resources:
jobs:
my_lakeflow_pipelines_job:
name: my_lakeflow_pipelines_job
lakeflow_pipelines_job:
name: lakeflow_pipelines_job

trigger:
# Run this job every day, exactly one day from the last run; see https://docs.databricks.com/api/workspace/jobs/create#trigger
periodic:
interval: 1
unit: DAYS

email_notifications:
on_failure: ${var.notifications}
#email_notifications:
Contributor

Is the commented code intentional?

Contributor Author

Yeah, for default-python I used this approach where there are never emails by default (since they were not effective), and there is no special variable for it (since variables come with a cost).

# on_failure:
# - [email protected]

tasks:
- task_key: refresh_pipeline
pipeline_task:
pipeline_id: ${resources.pipelines.my_lakeflow_pipelines_pipeline.id}
pipeline_id: ${resources.pipelines.lakeflow_pipelines_etl.id}
@@ -1,13 +1,12 @@
import dlt
from pyspark import pipelines as dp
from pyspark.sql.functions import col
from utilities import utils


# This file defines a sample transformation.
# Edit the sample below or add new transformations
# using "+ Add" in the file browser.


@dlt.table
@dp.table
def sample_trips_my_lakeflow_pipelines():
return spark.read.table("samples.nyctaxi.trips").withColumn("trip_distance_km", utils.distance_km(col("trip_distance")))
return spark.read.table("samples.nyctaxi.trips")
@@ -1,4 +1,4 @@
import dlt
from pyspark import pipelines as dp
from pyspark.sql.functions import col, sum


@@ -7,7 +7,7 @@
# using "+ Add" in the file browser.


@dlt.table
@dp.table
def sample_zones_my_lakeflow_pipelines():
# Read from the "sample_trips" table, then sum all the fares
return spark.read.table("sample_trips_my_lakeflow_pipelines").groupBy(col("pickup_zip")).agg(sum("fare_amount").alias("total_fare"))
return spark.read.table(f"sample_trips_my_lakeflow_pipelines").groupBy(col("pickup_zip")).agg(sum("fare_amount").alias("total_fare"))

This file was deleted.

This file was deleted.

This file was deleted.

10 changes: 7 additions & 3 deletions acceptance/bundle/templates/lakeflow-pipelines/sql/output.txt
@@ -1,12 +1,16 @@

>>> [CLI] bundle init lakeflow-pipelines --config-file ./input.json --output-dir output

Welcome to the template for Lakeflow Declarative Pipelines!

Please answer the below to tailor your project to your preferences.
You can always change your mind and change your configuration in the databricks.yml file later.

Note that [DATABRICKS_URL] is used for initialization
(see https://docs.databricks.com/dev-tools/cli/profiles.html for how to change your profile).

Your new project has been created in the 'my_lakeflow_pipelines' directory!
Your new project has been created in the 'my_lakeflow_pipelines' directory!

Refer to the README.md file for "getting started" instructions!
Please refer to the README.md file for "getting started" instructions.

>>> [CLI] bundle validate -t dev
Name: my_lakeflow_pipelines
@@ -1,7 +1,7 @@
{
"recommendations": [
"databricks.databricks",
"ms-python.vscode-pylance",
"redhat.vscode-yaml"
"redhat.vscode-yaml",
"ms-python.black-formatter"
]
}