-
Notifications
You must be signed in to change notification settings - Fork 256
[query] Various Benchmark Suite Improvements #14761
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
e47b948 to
df2624c
Compare
33b6b9d to
54cd48a
Compare
df2624c to
b6c0201
Compare
54cd48a to
93e81d9
Compare
b6c0201 to
5828ffd
Compare
93e81d9 to
2344aed
Compare
5828ffd to
d8e0986
Compare
2344aed to
e8cb1fb
Compare
5c2b0f3 to
ada7cfa
Compare
e8cb1fb to
c4a8c24
Compare
c4a8c24 to
687437c
Compare
687437c to
c10bd00
Compare
ada7cfa to
92c63d7
Compare
c10bd00 to
578a974
Compare
92c63d7 to
4e259cd
Compare
578a974 to
624cfbb
Compare
4e259cd to
a44a1f9
Compare
624cfbb to
4ff2e4d
Compare
a44a1f9 to
908af43
Compare
4ff2e4d to
30172a1
Compare
908af43 to
71ad800
Compare
30172a1 to
4c313eb
Compare
db3c740 to
f639115
Compare
904283a to
e0980e6
Compare
e0980e6 to
ae6d51c
Compare
341f1dd to
744cbba
Compare
| # Note: the below does not work on batch due to docker/ssl problems | ||
| # dest = os.path.join(data_dir, filename) | ||
| # urlretrieve(url, dest) | ||
| subprocess.check_call(['curl', url, '-Lfs', '--output', f'{data_dir / filename}']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you're deleting this comment. Have we every diagnosed why we can't use python to download these files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No clue. urlretrieve seems to be deprecated now so I'm not going to look into it.
This is tested and seems to work - not sure if looking into a python only implementation is worth it.
hail/scripts/benchmark_in_batch.py
Outdated
| items = collect.items | ||
|
|
||
| if include is not None: | ||
| diff = set(include) - set([nodeid for nodeid, *_ in items]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a set comprehension.
| diff = set(include) - set([nodeid for nodeid, *_ in items]) | |
| diff = set(include) - {nodeid for nodeid, *_ in items} |
| b.sc.cancelAllJobs() | ||
|
|
||
| yield (max_duration, True, traceback.format_exc()) | ||
| raise TimeoutError(f'Timed out after {max_duration}s') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an analyzer warning here. Is there a reason that we aren't re-raising?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - __Timeout is an internal implementation detail. I'll supress the warning.
744cbba to
9000864
Compare
9000864 to
27fabe0
Compare

Changes include:
benchmark()decorator - a test being in 'benchmark/' is sufficient to indicate that it's a benchmark.xtimeoutmarker to gracefully handle benchmarks that exceed time limitsHAIL_BENCHMARK_DIRThis change has no impact on the GCP hail batch deployment as it is limited to refactoring of query benchmark infrastructure. Changes only affect how benchmarks are run and reported, not the actual functionality being tested.