README.md: 40 additions & 0 deletions
@@ -15,6 +15,7 @@ Table of Contents
* [bind_condor.sh](#bind_condorsh)
* [Usage](#usage-1)
* [Setting up bindings](#setting-up-bindings)
* [get_files_on_disk.py](#get_files_on_diskpy)
* [tunn](#tunn)
* [Detailed usage](#detailed-usage)
* [Web browser usage](#web-browser-usage)
@@ -214,6 +215,45 @@ In this particular case, it is necessary to upgrade `pip` because the Python ver
**NOTE**: These recipes only install the bindings for Python3. (Python2 was still the default in `CMSSW_10_6_X`.)
You will need to make sure any scripts using the bindings are compatible with Python3.

## `get_files_on_disk.py`

This script automates the process of querying Rucio to find only the files in a CMS data or MC sample that are currently hosted on disk.
(The most general form of this functionality is not currently available from other CMS database tools such as `dasgoclient`.)

There are two major use cases for this tool:
1. Finding AOD (or earlier formats such as RECO or RAW) files for testing or development. (AOD samples are not hosted on disk by default, so typically only small subsets of a sample will be transferred to disk for temporary usage.)
2. Obtaining file lists for premixed pileup samples for private MC production. (Premixed pileup input samples are no longer fully hosted on disk because of resource limitations.)

A fraction of each premixed pileup sample is subscribed to disk by the central production team, and the corresponding list of files is synced to cvmfs.
By default, this script will just copy this cached information.
This is the most stable and preferred approach, so only deviate from it if absolutely necessary.

This script should *not* be run in batch jobs, as many jobs querying at once can amount to an inadvertent distributed denial-of-service attack on the CMS data management system.
The script will actively try to prevent you from running it in batch jobs.
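
To illustrate what such a guard could look like, here is a minimal sketch that checks for environment variables set by common batch systems inside running jobs. This is not taken from `get_files_on_disk.py`; the variable list and the function names `running_in_batch` and `refuse_in_batch` are assumptions made for illustration, and the real script may use a different mechanism.

```python
import os

# Environment variables that common batch systems set inside running jobs.
# This list is an illustrative assumption, not the script's actual check.
BATCH_ENV_VARS = (
    "_CONDOR_SCRATCH_DIR",  # HTCondor
    "_CONDOR_JOB_AD",       # HTCondor
    "SLURM_JOB_ID",         # Slurm
    "LSB_JOBID",            # LSF
)


def running_in_batch(environ=None):
    """Return True if any known batch-system variable is set and non-empty."""
    env = os.environ if environ is None else environ
    return any(env.get(name) for name in BATCH_ENV_VARS)


def refuse_in_batch(environ=None):
    """Raise RuntimeError when called from inside a (detected) batch job."""
    if running_in_batch(environ):
        raise RuntimeError(
            "Refusing to query the data management system from a batch job; "
            "run locally and ship the file list with the job instead."
        )
```

A check like this is only a heuristic: it can miss batch systems that are not listed, so it complements the guidance here rather than replacing it.
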
Please run the script locally before submitting your jobs, and include the resulting file list in the job input files.
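
As a sketch of the job side of this workflow, the snippet below reads a previously saved file list and splits it into one chunk per job, so that no job needs to contact Rucio. It assumes the list was saved as plain text with one file name per line; that format, the file name `pileup_files.txt`, and the helper names `read_file_list` and `split_for_jobs` are hypothetical and not part of the script.

```python
def read_file_list(path):
    """Read one file name per line, skipping blank lines and '#' comments."""
    with open(path) as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


def split_for_jobs(files, n_jobs):
    """Split files into n_jobs contiguous chunks of nearly equal size."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(len(files), n_jobs)
    chunks = []
    start = 0
    for index in range(n_jobs):
        # The first `extra` chunks take one additional file each.
        size = base + (1 if index < extra else 0)
        chunks.append(files[start:start + size])
        start += size
    return chunks
```

For example, `split_for_jobs(read_file_list("pileup_files.txt"), 10)` would give ten lists, each of which could be written to a small text file and transferred with the corresponding job.
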