Orchestrator course #17
Left some notes on dlt_pipeline.py. Overall there are no problems, but a few tweaks would let users follow along with the course with less friction.
github_source = rest_api_source(config)

# pipeline = dlt.pipeline(
Let's uncomment this and put it under a main check (if __name__ == "__main__"). That way I (and the users) can run the pipeline locally.
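A minimal sketch of what the suggested main check could look like. The pipeline name and the duckdb destination come from the commented-out call in the diff; the dataset name, the REST API config, and the --run switch are assumptions made for illustration and are not part of the PR. The imports are placed inside the function only so the sketch can be read and executed without dlt installed; in the actual script they would stay at the top of the file.

```python
import sys

PIPELINE_NAME = "github_repos_issues"  # from the commented-out call in the diff
DESTINATION = "duckdb"                 # from the diff
DATASET_NAME = "github_data"           # assumed, not shown in the PR


def run_pipeline() -> None:
    """Build the source and load it. Needs dlt installed and network access."""
    import dlt
    from dlt.sources.rest_api import rest_api_source

    # Assumed minimal config; the real one is defined in dlt_pipeline.py.
    config = {
        "client": {"base_url": "https://api.github.com/repos/dlt-hub/dlt/"},
        "resources": ["issues"],
    }
    github_source = rest_api_source(config)

    pipeline = dlt.pipeline(
        pipeline_name=PIPELINE_NAME,
        destination=DESTINATION,
        dataset_name=DATASET_NAME,
    )
    # Print the load summary returned by the run.
    print(pipeline.run(github_source))


def main(argv: list[str]) -> int:
    # "--run" is a hypothetical opt-in switch for this sketch; without it the
    # script only reports what it would do.
    if "--run" in argv:
        run_pipeline()
        return 0
    print(f"Would load '{PIPELINE_NAME}' into {DESTINATION}; pass --run to execute.")
    return 0


if __name__ == "__main__":
    # Executed only when the file is run directly, not when it is imported.
    main(sys.argv[1:])
```

With this layout, importing the module (for example from an orchestrator) never triggers a load, while running the file directly with --run executes the pipeline locally.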
@@ -0,0 +1,80 @@
import dlt
from dlt.sources.rest_api import RESTAPIConfig, rest_api_source
Pendulum needs to be imported
# pipeline = dlt.pipeline(
# pipeline_name="github_repos_issues",
# destination="duckdb",
We'll be using BigQuery as our destination instead of DuckDB.
@@ -0,0 +1,80 @@
import dlt
Not specific to this script, but should there also be a README with instructions for running it? It could list the installation commands and link to the dlt docs where relevant.
The script looks and works great, but it might be hard for users to navigate it without a guide.
Implemented backfill and incremental loading.
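The PR does not show how the two modes were implemented, so the following is only a conceptual sketch of how a backfill and an incremental run can share one cursor. It is plain Python, not dlt's own mechanism (dlt provides this through its incremental cursor support), and the record shape, the updated_at field, and the start date are assumptions.

```python
from datetime import datetime, timezone


def select_new_records(records, last_cursor):
    """Return the records updated after last_cursor and the advanced cursor."""
    fresh = [r for r in records if r["updated_at"] > last_cursor]
    # If nothing is new, the cursor stays where it was.
    new_cursor = max((r["updated_at"] for r in fresh), default=last_cursor)
    return fresh, new_cursor


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Assumed start of the backfill window.
BACKFILL_START = utc(2024, 1, 1)

# Hypothetical issue records standing in for API responses.
issues = [
    {"id": 1, "updated_at": utc(2024, 1, 5)},
    {"id": 2, "updated_at": utc(2024, 2, 1)},
    {"id": 3, "updated_at": utc(2024, 3, 1)},
]

# Backfill: the first run starts from a fixed date and picks up everything.
loaded, cursor = select_new_records(issues, BACKFILL_START)

# Incremental: a later run starts from the stored cursor and picks up
# only what changed since then.
issues[0]["updated_at"] = utc(2024, 3, 10)
loaded_again, cursor = select_new_records(issues, cursor)

print([r["id"] for r in loaded])        # ids loaded by the backfill
print([r["id"] for r in loaded_again])  # ids loaded by the incremental run
```

The only difference between the two modes in this sketch is where the cursor starts: a fixed date for the backfill, the last stored value for every run after it.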