Commit 34949fd

Merge branch 'develop' into karansh1/batched_fetch
2 parents 4ae2b31 + 77297d8 commit 34949fd

12 files changed

+71
-251
lines changed

openfl-workspace/flower-app-pytorch/README.md

Lines changed: 10 additions & 23 deletions
Original file line number | Diff line number | Diff line change
@@ -280,31 +280,18 @@ flwr run ./src/app-pytorch
280280
```
281281
It will run another experiment. Once you are done, you can manually shut down OpenFL's `collaborator` and Flower's `SuperNode` with `CTRL+C`. This will trigger a task-completion by the task runner that'll subsequently begin the graceful shutdown process of the OpenFL and Flower components.
282282

283-
### Running in SGX Enclave
284-
Gramine does not support all Linux system calls. Flower FAB is built and installed at runtime. During this, `utime()` is called, which is an [unsupported call](https://gramine.readthedocs.io/en/latest/devel/features.html#list-of-system-calls), resulting in error or unexpected behavior. To navigate this, when running in an SGX enclave, we opt to build and install the FAB during initialization and package it alongside the OpenFL workspace. To make this work, we introduce some patches to Flower's build command, which helps circumvent the unsupported system call as well as minimize read/write access.
285-
286-
To run these patches, simply add `patch: True` to the `Connector` and `Task Runner` settings (if not already set). For the `Task Runner` also include the name of the Flower app for building and installation.
283+
### Running in Intel<sup>®</sup> SGX Enclave
284+
Intel SGX is a set of CPU extensions for creating isolated memory regions, called enclaves, that allow secure computation of sensitive data. Applications that run within enclaves are encrypted in memory and remain isolated from the rest of the system. Executing code within an enclave requires an [Intel SGX–supported Intel CPU](https://www.intel.com/content/www/us/en/support/articles/000028173/processors.html). To run this workspace in an SGX enclave, first, set `sgx_enabled: True` in the `plan.yaml` for the `task_runner`:
287285

288286
```yaml
289-
connector :
290-
defaults : plan/defaults/connector.yaml
291-
template : openfl.component.ConnectorFlower
292-
settings :
293-
automatic_shutdown : True
294-
superlink_params :
295-
insecure : True
296-
serverappio-api-address : 127.0.0.1:9091
297-
fleet-api-address : 127.0.0.1:9092
298-
exec-api-address : 127.0.0.1:9093
299-
flwr_run_params :
300-
flwr_app_name : "app-pytorch"
301-
federation_name : "local-poc"
302-
patch : True
303-
304287
task_runner :
305288
defaults : plan/defaults/task_runner.yaml
306-
template : openfl.federated.task.runner_flower.FlowerTaskRunner
289+
template : src.runner.FlowerTaskRunner
307290
settings :
308-
patch : True
309-
flwr_app_name : "app-pytorch"
310-
```
291+
flwr_app_name: app-pytorch
292+
sgx_enabled: False
293+
```
294+
295+
Then, follow the [instructions](https://openfl.readthedocs.io/en/latest/about/features_index/taskrunner.html#docker-container-approach) to build/pull a base image, dockerize your workspace, and containerize each component.
296+
297+
Enabling this flag does two things. First, it runs the Flower `ClientApp` as an isolated process (`--isolation process`). This reduces the number of processes that must be spawned in the enclave, which reduces overhead (see [multi-process workloads](https://gramine.readthedocs.io/en/stable/performance.html#multi-process-workloads)). Second, it works around a Gramine limitation: Gramine does not support all Linux system calls. The Flower FAB is normally built and installed at runtime, which calls `utime()`, an [unsupported call](https://gramine.readthedocs.io/en/latest/devel/features.html#list-of-system-calls), resulting in errors or unexpected behavior. To avoid this, when the `sgx_enabled` flag is set to `True`, the FAB is built and installed during initialization and packaged alongside the OpenFL workspace.

openfl-workspace/flower-app-pytorch/plan/plan.yaml

Lines changed: 0 additions & 1 deletion
@@ -22,7 +22,6 @@ connector :
2222
flwr_run_params :
2323
flwr_app_name : "app-pytorch"
2424
federation_name : "local-poc"
25-
sgx_enabled: False
2625

2726
collaborator :
2827
defaults : plan/defaults/collaborator.yaml

openfl-workspace/flower-app-pytorch/src/app-pytorch/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ version = "1.0.0"
88
description = ""
99
license = "Apache-2.0"
1010
dependencies = [
11-
"flwr>=1.15.0,<1.17.0",
11+
"flwr-nightly",
1212
"flwr-datasets[vision]>=0.5.0",
1313
"torch==2.5.1",
1414
"torchvision==0.20.1",

openfl-workspace/flower-app-pytorch/src/connector_flower.py

Lines changed: 1 addition & 6 deletions
@@ -154,12 +154,7 @@ def _build_flwr_run_command(self) -> list[str]:
154154
federation_name = self.flwr_run_params.get("federation_name")
155155
flwr_app_name = self.flwr_run_params.get("flwr_app_name")
156156

157-
os.environ["TMPDIR"] = os.environ["FLWR_HOME"]
158-
159-
if self.flwr_run_params.get("sgx_enabled"):
160-
command = ["python", "src/patch/flwr_run_patch.py", "run", f"./src/{flwr_app_name}"]
161-
else:
162-
command = ["flwr", "run", f"./src/{flwr_app_name}"]
157+
command = ["flwr", "run", f"./src/{flwr_app_name}"]
163158

164159
if federation_name:
165160
command.append(federation_name)
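
Reviewer note: with the SGX branch removed, the command assembly in this hunk reduces to a single path. The sketch below restates it as a standalone function for illustration only. `build_flwr_run_command` is a hypothetical name, and the real `_build_flwr_run_command` method may append further arguments that are outside this hunk.

```python
def build_flwr_run_command(flwr_run_params: dict) -> list[str]:
    """Hypothetical standalone restatement of the logic visible in the hunk."""
    federation_name = flwr_run_params.get("federation_name")
    flwr_app_name = flwr_run_params.get("flwr_app_name")

    # After this commit there is one code path: always call the stock `flwr` CLI
    # instead of the deleted src/patch/flwr_run_patch.py wrapper.
    command = ["flwr", "run", f"./src/{flwr_app_name}"]

    # The federation name is optional and is appended only when configured.
    if federation_name:
        command.append(federation_name)
    return command
```

Passing the `flwr_run_params` values from `plan.yaml` (`flwr_app_name: "app-pytorch"`, `federation_name: "local-poc"`) would produce the arguments for `flwr run ./src/app-pytorch local-poc`.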

openfl-workspace/flower-app-pytorch/src/patch/__init__.py

Lines changed: 0 additions & 3 deletions
This file was deleted.

openfl-workspace/flower-app-pytorch/src/patch/flwr_run_patch.py

Lines changed: 0 additions & 11 deletions
This file was deleted.

openfl-workspace/flower-app-pytorch/src/patch/patch_flwr_build.py

Lines changed: 0 additions & 154 deletions
This file was deleted.

openfl-workspace/flower-app-pytorch/src/runner.py

Lines changed: 12 additions & 12 deletions
@@ -5,7 +5,6 @@
55
import os
66
import numpy as np
77
from pathlib import Path
8-
import sys
98
import socket
109
from src.util import is_safe_path
1110

@@ -34,13 +33,13 @@ def __init__(self, **kwargs):
3433
"""
3534
super().__init__(**kwargs)
3635

37-
self.sgx_enabled = kwargs.get('sgx_enabled')
3836
if self.data_loader is None:
3937
flwr_app_name = kwargs.get('flwr_app_name')
40-
if self.sgx_enabled:
41-
install_flower_FAB(flwr_app_name)
38+
install_flower_FAB(flwr_app_name)
4239
return
4340

41+
self.sgx_enabled = kwargs.get('sgx_enabled')
42+
4443
self.model = None
4544
self.logger = getLogger(__name__)
4645

@@ -170,16 +169,18 @@ def install_flower_FAB(flwr_app_name):
170169
flwr_app_name (str): The name of the Flower application to patch.
171170
"""
172171
flwr_dir = os.environ["FLWR_HOME"]
173-
os.environ["TMPDIR"] = flwr_dir
172+
173+
# Change the current working directory to the Flower directory
174+
os.chdir(flwr_dir)
174175

175176
# Run the build command
176-
subprocess.check_call([
177-
sys.executable,
178-
"src/patch/flwr_run_patch.py",
177+
build_command = [
178+
"flwr",
179179
"build",
180180
"--app",
181-
f"./src/{flwr_app_name}"
182-
])
181+
os.path.join("..", "..", "src", flwr_app_name)
182+
]
183+
subprocess.check_call(build_command)
183184

184185
# List .fab files after running the build command
185186
fab_files = list(Path(flwr_dir).glob("*.fab"))
@@ -189,8 +190,7 @@ def install_flower_FAB(flwr_app_name):
189190

190191
# Run the install command using the newest .fab file
191192
subprocess.check_call([
192-
sys.executable,
193-
"src/patch/flwr_run_patch.py",
193+
"flwr",
194194
"install",
195195
str(newest_fab_file)
196196
])
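
Reviewer note: the lines between the two `install_flower_FAB` hunks are not shown in this diff, so how `newest_fab_file` is chosen is not visible here. The sketch below shows one plausible approach, assuming the newest file is selected by modification time. `newest_fab` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path
from typing import Optional


def newest_fab(flwr_dir: str) -> Optional[Path]:
    """Return the most recently modified .fab file in flwr_dir, or None.

    Assumption: "newest" means largest modification time; the actual
    selection logic in runner.py is outside the hunks shown above.
    """
    fab_files = list(Path(flwr_dir).glob("*.fab"))
    if not fab_files:
        # Nothing was built, so there is nothing to install.
        return None
    return max(fab_files, key=lambda p: p.stat().st_mtime)
```

A caller would then pass `str(newest_fab(flwr_dir))` to `flwr install`, after checking for `None`.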

openfl/component/aggregator/aggregator.py

Lines changed: 8 additions & 4 deletions
@@ -149,15 +149,17 @@ def __init__(
149149
self.quit_job_sent_to = []
150150

151151
self.tensor_db = TensorDB()
152-
if persist_checkpoint:
152+
if persist_checkpoint and not self.assigner.is_task_group_evaluation():
153153
persistent_db_path = persistent_db_path or "tensor.db"
154154
logger.info(
155155
"Persistent checkpoint is enabled, setting persistent db at path %s",
156156
persistent_db_path,
157157
)
158158
self.persistent_db = PersistentTensorDB(persistent_db_path)
159159
else:
160-
logger.info("Persistent checkpoint is disabled")
160+
logger.info(
161+
"Either persistent checkpoint is disabled or the experiment is in evaluation mode"
162+
)
161163
self.persistent_db = None
162164
# FIXME: I think next line generates an error on the second round
163165
# if it is set to 1 for the aggregator.
@@ -224,8 +226,10 @@ def __init__(
224226

225227
self.secagg = SecAggSetup(self.uuid, self.authorized_cols, self.tensor_db)
226228

227-
if self.persistent_db and self._recover():
228-
logger.info("Recovered state of aggregator")
229+
# Only recover from persistent DB if not in evaluation mode
230+
if self.persistent_db and not self.assigner.is_task_group_evaluation():
231+
if self._recover():
232+
logger.info("Recovered state of aggregator")
229233

230234
# TODO: Aggregator has no concrete notion of round_begin.
231235
# https://github.com/securefederatedai/openfl/pull/1195#discussion_r1879479537
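
Reviewer note: the two aggregator hunks apply the same rule, namely that the persistent database is used only when checkpointing is enabled and the task group is not evaluation. The sketch below isolates that predicate. `should_use_persistent_db` is a hypothetical helper name, and `is_evaluation` stands in for the result of `self.assigner.is_task_group_evaluation()`.

```python
def should_use_persistent_db(persist_checkpoint: bool, is_evaluation: bool) -> bool:
    """Hypothetical predicate mirroring the condition added in __init__."""
    # Evaluation-only runs never create or recover from the persistent DB,
    # even if persist_checkpoint is set in the plan.
    return bool(persist_checkpoint) and not is_evaluation


# Enumerate all four combinations to make the behavior explicit.
for persist in (False, True):
    for evaluation in (False, True):
        print(persist, evaluation, should_use_persistent_db(persist, evaluation))
```

If nothing else assigns `self.persistent_db` between the two hunks, it is already `None` in evaluation mode, so the second check before `_recover()` looks defensive rather than strictly required.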
