Commit e837ff9

ueshin authored and HyukjinKwon committed
[SPARK-54123][PYTHON] Add timezone to make the timestamp an absolute time
### What changes were proposed in this pull request?

Adds the timezone to the timestamp in the log record JSON string so that it represents an absolute time.

### Why are the changes needed?

Without the timezone, the timestamp of each log record doesn't reflect the session timezone, which is confusing.

<details>
<summary>example</summary>

```python
>>> from pyspark.sql.functions import *
>>> import logging
>>>
>>> @udf
... def logging_test_udf(x):
...     logger = logging.getLogger("test")
...     logger.warning(f"message")
...     return str(x)
...
>>>
>>> spark.conf.set("spark.sql.pyspark.worker.logging.enabled", True)
>>>
>>> spark.range(1).select(logging_test_udf("id")).show()
...
```

</details>

- Before

```python
>>> spark.conf.get('spark.sql.session.timeZone')
'America/Los_Angeles'
>>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
+--------------------------+
|ts                        |
+--------------------------+
|2025-10-31 17:17:59.495541|
+--------------------------+

>>> spark.conf.set('spark.sql.session.timeZone', 'UTC')
>>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
+--------------------------+
|ts                        |
+--------------------------+
|2025-10-31 17:17:59.495541|
+--------------------------+
```

- After

```python
>>> spark.conf.get('spark.sql.session.timeZone')
'America/Los_Angeles'
>>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
+--------------------------+
|ts                        |
+--------------------------+
|2025-10-31 17:19:52.152868|
+--------------------------+

>>> spark.conf.set('spark.sql.session.timeZone', 'UTC')
>>> spark.sql("select ts from system.session.python_worker_logs").show(truncate=False)
+--------------------------+
|ts                        |
+--------------------------+
|2025-11-01 00:19:52.152868|
+--------------------------+
```

### Does this PR introduce _any_ user-facing change?

Yes, the timestamp of each log record is now an absolute time.

### How was this patch tested?

Manually.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52823 from ueshin/issues/SPARK-54123/timezone.

Authored-by: Takuya Ueshin <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
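The "After" output can be checked by hand with the standard library. The sketch below is only an illustration: it assumes the worker wrote the wall-clock value `2025-10-31 17:19:52.152868` followed by a `-0700` offset (the `%z`-style suffix for Pacific Daylight Time on that date); the exact string stored in the log record JSON is not shown in this commit, and `FMT` and `to_utc` are hypothetical names.

```python
from datetime import datetime, timezone

# Assumed layout: wall-clock time with microseconds plus a "%z"-style offset.
FMT = "%Y-%m-%d %H:%M:%S.%f%z"


def to_utc(ts: str) -> datetime:
    """Parse an offset-qualified timestamp and convert it to UTC."""
    return datetime.strptime(ts, FMT).astimezone(timezone.utc)


# Hypothetical record timestamp: the "After" value with an assumed -0700 offset.
local = "2025-10-31 17:19:52.152868-0700"
utc_text = to_utc(local).strftime("%Y-%m-%d %H:%M:%S.%f")
print(utc_text)  # 2025-11-01 00:19:52.152868
```

With the offset attached, the same instant renders as `17:19:52` in Los Angeles and `00:19:52` on the next day in UTC, which matches the two "After" tables. Without it, a reader has no way to tell which zone the digits refer to.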
1 parent cce32a8 commit e837ff9

1 file changed: +1 -0 lines changed

python/pyspark/logger/worker_io.py

Lines changed: 1 addition & 0 deletions
@@ -164,6 +164,7 @@ def formatTime(self, record: logging.LogRecord, datefmt: Optional[str] = None) -
 )
 elif self.default_msec_format:
 s = self.default_msec_format % (s, record.msecs)
+s = f"{s}{time.strftime('%z', ct)}"
 return s
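The added line appends the UTC offset of the converted time to the formatted timestamp. A minimal standalone sketch of the same idea follows. It is not the code in `worker_io.py`: `OffsetFormatter` and `format_timestamp` are hypothetical names, and the body mirrors the standard library's default `logging.Formatter.formatTime` logic rather than PySpark's formatter.

```python
import logging
import time
from typing import Optional


class OffsetFormatter(logging.Formatter):
    """Hypothetical formatter that appends the UTC offset to each timestamp."""

    def formatTime(self, record: logging.LogRecord, datefmt: Optional[str] = None) -> str:
        # self.converter defaults to time.localtime, so ct is a local struct_time.
        ct = self.converter(record.created)
        if datefmt:
            s = time.strftime(datefmt, ct)
        else:
            s = time.strftime(self.default_time_format, ct)
            if self.default_msec_format:
                s = self.default_msec_format % (s, record.msecs)
        # Same technique as the diff: suffix the "%z" offset of the converted time.
        return f"{s}{time.strftime('%z', ct)}"


def format_timestamp(created: float) -> str:
    """Build a throwaway record with the given epoch time and format its timestamp."""
    record = logging.LogRecord("test", logging.WARNING, "example.py", 0, "message", None, None)
    record.created = created
    record.msecs = (created - int(created)) * 1000
    return OffsetFormatter().formatTime(record)


ts = format_timestamp(time.time())
print(ts)
```

The printed value depends on the machine's clock and local timezone. On a typical Linux host it ends with a signed four-digit offset such as `+0000`, which is what lets a consumer convert the timestamp into any session timezone.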
