Add time-zone aware timestamp normalization transformer with tests #680
Pull Request Test Coverage Report for Build 14765567445 (Coveralls)
juarezr left a comment
Would you mind making some changes to this PR?
| @@ -0,0 +1,38 @@ | |||
| from datetime import datetime | |||
| import pytz | |||
Currently, pytz is not a hard requirement for petl.
Because of this, the CI jobs running on Windows fail with the following error:
=================================== ERRORS ====================================
_______ ERROR collecting petl/test/transform/test_normalize_timezone.py _______
ImportError while importing test module 'D:\a\petl\petl\petl\test\transform\test_normalize_timezone.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
C:\hostedtoolcache\windows\Python\3.6.8\x64\lib\importlib\__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
petl\test\transform\test_normalize_timezone.py:2: in <module>
from petl.transform.normalize_timezone import normalize_timezone
petl\transform\normalize_timezone.py:2: in <module>
import pytz
E ModuleNotFoundError: No module named 'pytz'
Would you mind deferring the pytz import so that it only runs when this functionality is actually used?
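One way to do this is to move the import inside a helper, so that importing the module never touches pytz. The following is only a minimal sketch: `_require_pytz` and `to_utc` are hypothetical names, not part of this PR or of petl, and the timestamp format is assumed from the test data.

```python
from datetime import datetime


def _require_pytz():
    """Import pytz on first use so that importing this module never needs it."""
    try:
        import pytz  # optional dependency, imported lazily
    except ImportError as exc:
        raise ImportError(
            "normalize_timezone requires the optional package 'pytz'"
        ) from exc
    return pytz


def to_utc(timestamp, tz_name):
    """Hypothetical helper: localize a naive timestamp string and convert it to UTC."""
    pytz = _require_pytz()
    # Format assumed from the test data, e.g. '2023-12-01T10:00:00'
    naive = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
    aware = pytz.timezone(tz_name).localize(naive)
    return aware.astimezone(pytz.utc)
```

With this shape, test collection no longer fails on a machine without pytz; the error only appears when the conversion is actually called.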
| {'timestamp': '2023-12-01T10:00:00', 'timezone': 'America/New_York'}, | ||
| {'timestamp': '2023-12-01T15:00:00', 'timezone': 'Europe/London'} | ||
| ] | ||
| result = list(normalize_timezone(input_data)) |
It looks like this doesn't work on Python 3.6, where `datetime.fromisoformat` is not available (it was added in Python 3.7):
_________________ TestNormalizeTimezone.test_basic_conversion __________________
table = [{'timestamp': '2023-12-01T10:00:00', 'timezone': 'America/New_York'}, {'timestamp': '2023-12-01T15:00:00', 'timezone': 'Europe/London'}]
timestamp_col = 'timestamp', tz_col = 'timezone'
def normalize_timezone(table, timestamp_col='timestamp', tz_col='timezone'):
"""
Normalize timestamps to UTC while retaining original timezone.
Args:
table: petl table (iterable of rows/dicts)
timestamp_col (str): column name with timestamp strings
tz_col (str): column name with timezone name (e.g., 'America/New_York')
Yields:
Each row with two added fields: 'timestamp_utc' and 'timezone_original'
"""
for row in table:
try:
original_ts = row[timestamp_col]
original_tz = row[tz_col]
# Parse the timestamp
> naive_dt = datetime.fromisoformat(original_ts)
E AttributeError: type object 'datetime.datetime' has no attribute 'fromisoformat'
petl/transform/normalize_timezone.py:22: AttributeError
During handling of the above exception, another exception occurred:
self = <petl.test.transform.test_normalize_timezone.TestNormalizeTimezone testMethod=test_basic_conversion>
def test_basic_conversion(self):
input_data = [
{'timestamp': '2023-12-01T10:00:00', 'timezone': 'America/New_York'},
{'timestamp': '2023-12-01T15:00:00', 'timezone': 'Europe/London'}
]
> result = list(normalize_timezone(input_data))
petl/test/transform/test_normalize_timezone.py:11:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
...
Could you rework this test so that it is skipped on Python 3.6 and earlier, please?
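Since the traceback shows a `unittest`-style test case, the skip could be expressed with `unittest.skipIf`. This is a sketch under assumptions: the test body below is a stand-in that only exercises `datetime.fromisoformat`, and `NEEDS_PY37` is a hypothetical name; the real test would call `normalize_timezone`.

```python
import sys
import unittest
from datetime import datetime

# datetime.fromisoformat was added in Python 3.7
NEEDS_PY37 = unittest.skipIf(
    sys.version_info < (3, 7),
    "datetime.fromisoformat requires Python 3.7+",
)


class TestNormalizeTimezone(unittest.TestCase):

    @NEEDS_PY37
    def test_basic_conversion(self):
        # Stand-in for the real test body, which calls normalize_timezone
        parsed = datetime.fromisoformat("2023-12-01T10:00:00")
        self.assertEqual(parsed, datetime(2023, 12, 1, 10, 0, 0))
```

An alternative would be to parse with `datetime.strptime` in the transformer itself, which would remove the need for the skip.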
| tz_col (str): column name with timezone name (e.g., 'America/New_York') | ||
| | ||
| Yields: | ||
| Each row with two added fields: 'timestamp_utc' and 'timezone_original' |
A code example in this docstring would be helpful, but it is not required.
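A docstring example along these lines might work. This is an illustrative sketch, not the PR's implementation: the function body is simplified (no error handling), the ISO-formatted string in `timestamp_utc` is an assumed output format, and `strptime` is swapped in for `fromisoformat` so that it also runs on Python 3.6.

```python
from datetime import datetime


def normalize_timezone(table, timestamp_col='timestamp', tz_col='timezone'):
    """
    Normalize timestamps to UTC while retaining original timezone.

    Example (output format assumed for illustration)::

        >>> rows = [{'timestamp': '2023-12-01T10:00:00',
        ...          'timezone': 'America/New_York'}]
        >>> out = list(normalize_timezone(rows))
        >>> out[0]['timestamp_utc']
        '2023-12-01T15:00:00+00:00'
        >>> out[0]['timezone_original']
        'America/New_York'
    """
    import pytz  # optional dependency, imported when iteration starts

    for row in table:
        # strptime instead of fromisoformat, so this also works on Python 3.6
        naive_dt = datetime.strptime(row[timestamp_col], '%Y-%m-%dT%H:%M:%S')
        local_dt = pytz.timezone(row[tz_col]).localize(naive_dt)
        new_row = dict(row)  # copy so the input row is not mutated
        new_row['timestamp_utc'] = local_dt.astimezone(pytz.utc).isoformat()
        new_row['timezone_original'] = row[tz_col]
        yield new_row
```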
Purpose
Adds a reusable ETL transformation to normalize timestamps to UTC based on source time zone.
Features
Real-World Use
Essential for pipelines dealing with global log files, sensors, APIs, or multi-region data warehouses, where it ensures consistency and accurate time-series reporting.
Tests
Included unit tests:
All tests passed