@cbornet cbornet commented Sep 2, 2025

Summary

This PR standardizes all text file I/O to use UTF-8. This eliminates OS-specific defaults (e.g. Windows cp1252) and ensures consistent, Unicode-safe behavior across platforms.
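As a rough illustration of the pattern the summary describes (not the PR's actual diff), passing `encoding="utf-8"` explicitly to `Path.write_text` and `Path.read_text` makes the round trip independent of the platform default. The helper name `write_then_read` and the file name are made up for this sketch.

```python
import tempfile
from pathlib import Path


def write_then_read(directory: Path, text: str) -> str:
    """Write text and read it back, naming the encoding explicitly both times."""
    target = directory / "example.txt"  # hypothetical file name
    # Without encoding=..., pathlib falls back to the locale's preferred
    # encoding, which is what varies between operating systems.
    target.write_text(text, encoding="utf-8")
    return target.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    round_tripped = write_then_read(Path(tmp), "café ✓ 日本語")
```

Because both calls name the encoding, the non-ASCII characters come back unchanged whether the script runs on Linux, macOS, or Windows.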

Breaking changes

Users whose system default encoding is not UTF-8 may see decoding errors from the following code paths when the files being read are not UTF-8 encoded:

  • langchain_core.vectorstores.in_memory.InMemoryVectorStore.load
  • langchain_core.prompts.loading.load_prompt
  • from_template in AIMessagePromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate
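
To make the failure mode concrete, here is a minimal sketch that does not call LangChain at all. It assumes a file that was previously written with cp1252 and is now read as UTF-8, which is what the code paths listed above would do after this change. The file names and the helper `fails_as_utf8` are hypothetical.

```python
import tempfile
from pathlib import Path


def fails_as_utf8(path: Path) -> bool:
    """Return True if reading the file as UTF-8 raises UnicodeDecodeError."""
    try:
        path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    # A template containing non-ASCII characters, saved with a legacy encoding.
    legacy = Path(tmp) / "legacy_prompt.txt"
    legacy.write_text("Résumé: {text}", encoding="cp1252")
    legacy_fails = fails_as_utf8(legacy)

    # A pure-ASCII file has the same bytes in cp1252 and UTF-8, so it still loads.
    ascii_only = Path(tmp) / "ascii_prompt.txt"
    ascii_only.write_text("Summary: {text}", encoding="cp1252")
    ascii_fails = fails_as_utf8(ascii_only)
```

Only files that contain non-ASCII characters in a non-UTF-8 encoding are affected; ASCII-only files load as before.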

Migration

Re-encode any files that are not UTF-8 encoded to UTF-8.
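
One possible way to do the re-encoding is sketched below. It assumes you know the file's original encoding (cp1252 is used here only because the summary mentions it); the helper `reencode_to_utf8` and the file name are illustrative and not part of LangChain.

```python
import tempfile
from pathlib import Path


def reencode_to_utf8(path: Path, source_encoding: str) -> None:
    """Rewrite a file in place as UTF-8, decoding it from source_encoding."""
    # Working on bytes avoids any newline translation during the rewrite.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    prompt_file = Path(tmp) / "prompt.txt"  # hypothetical file
    prompt_file.write_text("Résumé: {text}", encoding="cp1252")

    reencode_to_utf8(prompt_file, source_encoding="cp1252")
    migrated = prompt_file.read_text(encoding="utf-8")
```

Keep a backup before rewriting files in place, since decoding with the wrong source encoding can silently produce incorrect characters.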

@cbornet cbornet requested a review from eyurtsev as a code owner September 2, 2025 09:32

vercel bot commented Sep 2, 2025

Project Deployment Preview Comments Updated (UTC)
langchain Ready Ready Preview Comment Sep 2, 2025 9:45am

@cbornet cbornet requested a review from mdrxy September 2, 2025 09:33

codspeed-hq bot commented Sep 2, 2025

CodSpeed WallTime Performance Report

Merging #32784 will not alter performance

Comparing cbornet:plw1514 (d71acf9) with wip-v1.0 (25d5db8)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 13 untouched benchmarks

@cbornet cbornet changed the title from "Add utf-8 encoding to Path read_text/write_text" to "chore(all): add utf-8 encoding to Path read_text/write_text" on Sep 2, 2025
@cbornet cbornet changed the title from "chore(all): add utf-8 encoding to Path read_text/write_text" to "chore(core): add utf-8 encoding to Path read_text/write_text" on Sep 2, 2025

codspeed-hq bot commented Sep 2, 2025

CodSpeed Instrumentation Performance Report

Merging #32784 will not alter performance

Comparing cbornet:plw1514 (d71acf9) with wip-v1.0 (25d5db8)

Summary

✅ 14 untouched benchmarks

mdrxy commented Sep 3, 2025

https://docs.astral.sh/ruff/rules/unspecified-encoding/

Enabling this rule should ensure consistency.

Also, Windows doesn't use UTF-8 by default, so we may need to investigate remediation for those users.

cbornet commented Sep 3, 2025

NB: a missing encoding in Path.read_text/write_text is currently not detected by the ruff rule PLW1514, but it will be in the future (it is already detected with the --preview flag).

@mdrxy mdrxy changed the title chore(core): add utf-8 encoding to Path read_text/write_text chore(core): add utf-8 encoding to Path read_text/write_text Sep 3, 2025
@mdrxy mdrxy added this to the v1 milestone Sep 3, 2025
@mdrxy mdrxy added the core Related to the package `langchain-core` label Sep 3, 2025
@eyurtsev eyurtsev added the v1 Issue specific to LangChain 1.0 label Sep 8, 2025
@eyurtsev eyurtsev merged commit 083fbfb into langchain-ai:wip-v1.0 Sep 8, 2025
87 of 90 checks passed
@cbornet cbornet deleted the plw1514 branch September 8, 2025 15:45
Labels
  • breaking
  • core: Related to the package `langchain-core`
  • v1: Issue specific to LangChain 1.0