Description
It seems that using `**` in `glob` can leave the listings cache in an incorrect state, so that subsequent commands like `glob` or `ls` produce incorrect results.
I have two examples.
Missing directories
import s3fs
print(s3fs.__version__)
s3 = s3fs.S3FileSystem()
s3.glob("s3://<redacted bucket>/<redacted directory>/*/<redacted>/**/*.txt")
print(s3.ls("s3://<redacted bucket>/<redacted directory>"))
Here `s3.ls` misses several directories that are contained in `<redacted directory>`. If I comment out `s3.glob`, or replace `**` with `*` (the output of `glob` does not change, as there is only one subdirectory level in `<redacted>`), no folders are missing from the `ls` output.
Duplicated directories
I tried to reproduce the previous example with a simpler directory structure and encountered a different problem.
I have a very simple structure inside my bucket:
a/
b/somefile.pdf
c/
Then I do
import s3fs
print(s3fs.__version__)
s3 = s3fs.S3FileSystem()
s3.glob("s3://<test bucket>/**/*.pdf")
print(s3.ls("s3://<test bucket>/"))
The output:
2025.9.0
['<test bucket>/a/', '<test bucket>/b', '<test bucket>/b', '<test bucket>/b/', '<test bucket>/c/']
So directory `b` is present three times (once with a trailing `/`, the other two times without).
If I replace `**` with `*`, I get different output:
2025.9.0
['<test bucket>/a', '<test bucket>/b', '<test bucket>/c']
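The difference between the two outputs can be checked mechanically. The sketch below is a small helper that is not part of s3fs (the name `duplicated_entries` is made up for illustration); it strips trailing slashes and counts how often each entry appears, using the two listings shown above as input.

```python
from collections import Counter


def duplicated_entries(listing):
    """Return entries that occur more than once once trailing slashes are ignored."""
    counts = Counter(path.rstrip("/") for path in listing)
    return {path: n for path, n in counts.items() if n > 1}


# Listings copied from the two runs above.
after_double_star = [
    "<test bucket>/a/",
    "<test bucket>/b",
    "<test bucket>/b",
    "<test bucket>/b/",
    "<test bucket>/c/",
]
after_single_star = ["<test bucket>/a", "<test bucket>/b", "<test bucket>/c"]

print(duplicated_entries(after_double_star))  # {'<test bucket>/b': 3}
print(duplicated_entries(after_single_star))  # {}
```

Note that the `**` run also reports `a` and `c` with a trailing slash, which the `*` run does not, so the two listings differ in more than the duplicates.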
In all cases, if I initialize the filesystem with `s3 = s3fs.S3FileSystem(use_listings_cache=False)`, the problems disappear. So I believe the problem is the interaction between `**` in `glob` and the cache.
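To make the comparison between the cached and uncached filesystems repeatable, something like the following could be used. It is only a sketch: `listing_diff` is a hypothetical helper, not an s3fs function, and it assumes both objects expose `ls(path, detail=False)` returning a list of path strings, as `S3FileSystem` does.

```python
def listing_diff(cached_fs, fresh_fs, path):
    """Compare ls() output of a cached filesystem against an uncached one.

    Trailing slashes are ignored so that 'b' and 'b/' count as the same entry.
    """
    cached = [p.rstrip("/") for p in cached_fs.ls(path, detail=False)]
    fresh = [p.rstrip("/") for p in fresh_fs.ls(path, detail=False)]
    return {
        # Entries the uncached listing has but the cached one lost.
        "missing_from_cache": sorted(set(fresh) - set(cached)),
        # Entries the cached listing reports more than once.
        "duplicated_in_cache": sorted({p for p in cached if cached.count(p) > 1}),
    }


# Intended usage (needs s3fs and access to the bucket, so left as comments):
#   import s3fs
#   cached = s3fs.S3FileSystem()
#   cached.glob("s3://<test bucket>/**/*.pdf")
#   fresh = s3fs.S3FileSystem(use_listings_cache=False)
#   print(listing_diff(cached, fresh, "s3://<test bucket>/"))
```

If both lists in the result are empty, the cached listing agrees with the uncached one for that path.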
I am using s3fs version 2025.9.0 (this also reproduces with 2025.9.0+3.g2ccadeb) and Python 3.12.9 on macOS.