
Commit e683c53

Update README for the new package (#12)
1 parent 0977088 commit e683c53

File tree

1 file changed: 4 additions & 99 deletions


README.md

Lines changed: 4 additions & 99 deletions
@@ -1,106 +1,11 @@
-# ClusterManagers.jl
+# HTCondorClusterManager.jl

-The `ClusterManagers.jl` package implements code for different job queue systems commonly used on compute clusters.
+The `HTCondorClusterManager.jl` package implements code for HTCondor clusters.

-> [!WARNING]
-> This package is not currently being actively maintained or tested.
->
-> We are in the process of splitting this package up into multiple smaller packages, with a separate package for each job queue system.
->
-> We are seeking maintainers for these new packages. If you are an active user of any of the job queue systems listed below and are interested in being a maintainer, please open a GitHub issue - say that you are interested in being a maintainer, and specify which job queue system you use.
-
-## Available job queue systems
-
-### In this package
-
-The following managers are implemented in this package (the `ClusterManagers.jl` package):
-
-| Job queue system | Command to add processors |
-| ---------------- | ------------------------- |
-| Local manager with CPU affinity setting | `addprocs(LocalAffinityManager(;np=CPU_CORES, mode::AffinityMode=BALANCED, affinities=[]); kwargs...)` |
-
-### Implemented in external packages
-
-| Job queue system | External package | Command to add processors |
-| ---------------- | ---------------- | ------------------------- |
-| Slurm | [SlurmClusterManager.jl](https://github.com/JuliaParallel/SlurmClusterManager.jl) | `addprocs(SlurmManager(); kwargs...)` |
-| Load Sharing Facility (LSF) | [LSFClusterManager.jl](https://github.com/JuliaParallel/LSFClusterManager.jl) | `addprocs_lsf(np::Integer; bsub_flags=``, ssh_cmd=``)` or `addprocs(LSFManager(np, bsub_flags, ssh_cmd, retry_delays, throttle))` |
-| Kubernetes (K8s) | [K8sClusterManagers.jl](https://github.com/beacon-biosignals/K8sClusterManagers.jl) | `addprocs(K8sClusterManager(np; kwargs...))` |
-| Azure scale-sets | [AzManagers.jl](https://github.com/ChevronETC/AzManagers.jl) | `addprocs(vmtemplate, n; kwargs...)` |
-
-### Not currently being actively maintained
-
-> [!WARNING]
-> The following managers are not currently being actively maintained or tested.
->
-> We are seeking maintainers for the following managers. If you are an active user of any of the job queue systems listed below and are interested in being a maintainer, please open a GitHub issue - say that you are interested in being a maintainer, and specify which job queue system you use.
->
+Implemented in this package:

 | Job queue system | Command to add processors |
 | ---------------- | ------------------------- |
-| Sun Grid Engine (SGE) via `qsub` | `addprocs_sge(np::Integer; qsub_flags=``)` or `addprocs(SGEManager(np, qsub_flags))` |
-| Sun Grid Engine (SGE) via `qrsh` | `addprocs_qrsh(np::Integer; qsub_flags=``)` or `addprocs(QRSHManager(np, qsub_flags))` |
-| PBS (Portable Batch System) | `addprocs_pbs(np::Integer; qsub_flags=``)` or `addprocs(PBSManager(np, qsub_flags))` |
-| Scyld | `addprocs_scyld(np::Integer)` or `addprocs(ScyldManager(np))` |
 | HTCondor | `addprocs_htc(np::Integer)` or `addprocs(HTCManager(np))` |

-### Custom managers
-
-You can also write your own custom cluster manager; see the instructions in the [Julia manual](https://docs.julialang.org/en/v1/manual/distributed-computing/#ClusterManagers).
-
-## Notes on specific managers
-
-### Slurm: please see [SlurmClusterManager.jl](https://github.com/JuliaParallel/SlurmClusterManager.jl)
-
-For Slurm, please see the [SlurmClusterManager.jl](https://github.com/JuliaParallel/SlurmClusterManager.jl) package.
-
-### Using `LocalAffinityManager` (for pinning local workers to specific cores)
-
-- Linux-only feature.
-- Requires the Linux `taskset` command to be installed.
-- Usage: `addprocs(LocalAffinityManager(;np=CPU_CORES, mode::AffinityMode=BALANCED, affinities=[]); kwargs...)`.
-
-where
-
-- `np` is the number of workers to be started.
-- `affinities`, if specified, is a list of CPU IDs. As many workers as entries in `affinities` are launched. Each worker is pinned
-to the specified CPU ID.
-- `mode` (used only when `affinities` is not specified; can be either `COMPACT` or `BALANCED`) - `COMPACT` results in the requested number
-of workers pinned to cores in increasing order. For example, worker1 => CPU0, worker2 => CPU1, and so on. `BALANCED` tries to spread
-the workers. This is useful when there are multiple CPU sockets, each with multiple cores: `BALANCED` mode results in workers
-spread across CPU sockets. The default is `BALANCED`.
-
-### Using `ElasticManager` (dynamically adding workers to a cluster)
-
-The `ElasticManager` is useful in scenarios where we want to dynamically add workers to a cluster.
-It achieves this by listening on a known port on the master. The launched workers connect to this
-port and publish their own host/port information for other workers to connect to.
-
-On the master, you need to create an instance of `ElasticManager`. The constructors defined are:
-
-```julia
-ElasticManager(;addr=IPv4("127.0.0.1"), port=9009, cookie=nothing, topology=:all_to_all, printing_kwargs=())
-ElasticManager(port) = ElasticManager(;port=port)
-ElasticManager(addr, port) = ElasticManager(;addr=addr, port=port)
-ElasticManager(addr, port, cookie) = ElasticManager(;addr=addr, port=port, cookie=cookie)
-```
-
-You can set `addr=:auto` to automatically use the host's private IP address on the local network, which will allow other workers on this network to connect. You can also use `port=0` to let the OS choose a random free port for you (some systems may not support this). Once created, printing the `ElasticManager` object prints the command which you can run on workers to connect them to the master, e.g.:
-
-```julia
-julia> em = ElasticManager(addr=:auto, port=0)
-ElasticManager:
-  Active workers : []
-  Number of workers to be added  : 0
-  Terminated workers : []
-  Worker connect command :
-    /home/user/bin/julia --project=/home/user/myproject/Project.toml -e 'using ClusterManagers; ClusterManagers.elastic_worker("4cOSyaYpgSl6BC0C","127.0.1.1",36275)'
-```
-
-By default, the printed command uses the absolute path to the current Julia executable and activates the same project as the current session. You can change either of these defaults by passing `printing_kwargs=(absolute_exename=false, same_project=false)` to the first form of the `ElasticManager` constructor.
-
-Once workers are connected, you can print the `em` object again to see them added to the list of active workers.
-
-### Sun Grid Engine (SGE)
-
-See [`docs/sge.md`](docs/sge.md)
+The functionality in this package originally used to live in [ClusterManagers.jl](https://github.com/JuliaParallel/ClusterManagers.jl).
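
As a quick orientation for the new package, the sketch below shows how the command from the table above might be used end to end. This is a hedged example, not part of this commit: it assumes `HTCondorClusterManager` exports `addprocs_htc` as listed in the table, and the worker count and workload are illustrative only.

```julia
# Minimal sketch, assuming `addprocs_htc` is exported as shown in the table above;
# the worker count (4) and the squaring workload are illustrative only.
using Distributed
using HTCondorClusterManager

pids = addprocs_htc(4)          # submit 4 worker jobs to the HTCondor pool

@everywhere square(x) = x^2     # define the function on every worker
results = pmap(square, 1:100)   # run the work across the HTCondor workers

rmprocs(pids)                   # release the workers when finished
```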
