feature(zjow): envpoool env example in new pipeline #746
Closed
308 commits
eac9434
fix bug
zjowowen 6fda31b
fix dtype error
zjowowen 6b9def4
polish code
zjowowen 6f49d0a
polish code
zjowowen cdafb55
Add dqn agent
zjowowen b5788e2
merge from main
zjowowen b181597
add config
zjowowen b534448
merge from main
zjowowen 1d91b6d
add bonus/c51.py
zhangpaipai ef5f1d5
add c51 logit monitor
zhangpaipai 87822ba
add sac dqn agent
zjowowen c86d897
add sac dqn agent demo in dizoo
zjowowen 0f06aa6
merge from main
zjowowen 1973d01
polish format
zjowowen 7a832ee
pull zjow new-pipeline-agent
zhangpaipai 1c111c2
polish code
zjowowen 3a05437
polish code
zjowowen 54b1a09
fix ddpg bug
zjowowen 1caefff
merge nyz c51/dqn config and policy
zhangpaipai 646b005
merge from main
zjowowen 6a8d535
fix config
zjowowen 6fb3534
remove mutistep_trainer
zhangpaipai c54f220
fix bug
zjowowen 557102e
polish code
zjowowen 95f995c
polish code
zjowowen c5e9a52
polish code
zjowowen 01b82c7
add Hopper demo
zjowowen 0d60070
polish code
zjowowen 3f3fb68
add property best
zjowowen 49cab88
merge from main
zjowowen dc5aa8c
add a2c pipeline
zjowowen ccb2fcf
add sac halfcheetah+walker2d
zhangpaipai 84bef89
pull zjow new-pipeline-agent
zhangpaipai c937f3b
fix a2c pipeline bug
zjowowen 27ff425
fix pipeline bug
zjowowen 02bc7f0
fix bug
zjowowen 6fb854f
change config
zjowowen bbf7e2d
merge from main
zjowowen a76408c
remove IMPALA pipeline
zjowowen fd7f922
format code
zjowowen 70009ae
polish code
zjowowen 772c354
polish c51 and add ddpg halfcheetah walker2d
zhangpaipai a7513d8
pull zjow new-pipeline-agent again
zhangpaipai 12d6291
add dizoo/common for zjow to review
zhangpaipai fec830a
fix agent best method
zjowowen 471aff4
Merge branch 'new-pipeline-agent' of https://github.com/zjowowen/DI-e…
zhangpaipai 8f523e7
reset dizoo
zjowowen 0f5015e
delete common
zhangpaipai 151079c
Merge branch 'new-pipeline-agent' of https://github.com/zjowowen/DI-e…
zhangpaipai d5cdb1e
polish for zjow to review
zhangpaipai 511dfad
merge from main
zjowowen d69b165
polish code
zjowowen b95f340
polish code
zjowowen b6be677
fix bug
zjowowen 516780b
fix bug
zjowowen 98e4d46
Merge branch 'new-pipeline-agent' of https://github.com/zjowowen/DI-e…
zhangpaipai 93008aa
polish c51
zhangpaipai 83861f8
merge from main
zjowowen 883ce54
Merge branch 'new-pipeline-agent' of https://github.com/zjowowen/DI-e…
zhangpaipai c7f5ad6
add pg agent
zjowowen 36a7dfa
merge from main
zjowowen 1941951
add pendulum config
zhangpaipai af7272a
add c51_atari td3_pendulum,bipedalwalker ddpg_pendulum
zhangpaipai eafeada
polish code
zjowowen d21839c
merge from main
zjowowen 68a738e
polish code
zjowowen fdc6408
polish code
zjowowen c67622d
merge zjow
zhangpaipai 5cf69d6
add bipedalwalker_ddpg_config
zhangpaipai 995e39c
merge from main
zjowowen 7e03fc1
merge zjow
zhangpaipai f222d42
feature(zp): add c51
zjowowen ca63569
change config
zjowowen 2e8978b
change bipedalwalker config and noframeskip
zhangpaipai aa3367d
polish c51-atari name
zhangpaipai 4b7aa50
add pong spaceinvaders and qbert for dqn
ruoyuGao 134e3e5
merge from main
zjowowen 4c08017
git fetch
zjowowen efc807e
polish code
zjowowen eed925f
polish code; add env mode
zjowowen f37f65b
add rew_clip in ding_env_wrapper
zhangpaipai 59cc61b
polish dqn atari
zhangpaipai 4b2ffcd
merge from main
zjowowen 8b04a11
merge from new-pipeline-agent
zjowowen b1aab8d
add a2c continuous action space
zjowowen 0584404
add a2c continuous action space
zjowowen f651f68
add a2c continuous for mujoco
zjowowen 92bfff3
add a2c continuous for mujoco
zjowowen a72de14
add a2c continuous for mujoco
zjowowen 4e59519
add a2c mujoco config; add ppo atari config
zjowowen f104d81
add a2c mujoco config; add ppo atari config
zjowowen 308e25a
fix a2c deploy bug
zjowowen 522b0ff
Add bipedalwalker a2c
zjowowen 1e87b1d
polish code
zjowowen 06f4046
polish code
zjowowen 7fc7032
polish code
zjowowen bb74395
polish code
zjowowen 5ea9233
polish code
zjowowen 59b7080
add pendulum a2c+pg
zhangpaipai d96ce90
Merge branch 'new-pipeline-agent' of https://github.com/zjowowen/DI-e…
zhangpaipai e6e100b
add pg bipedalwalker+mujoco
zhangpaipai f69c448
polish code for wandb sweep
zjowowen 98877de
polish code for wandb sweep
zjowowen dbec6a7
polish code for wandb sweep
zjowowen d2d7e8e
polish code for a2c mujoco
zjowowen 168fd41
add pg pendulum new pipeline
zhangpaipai de2d180
fix scalar action bug in random collect
zjowowen 1a2d4dd
polish pg algorithm
zhangpaipai 498c094
Merge branch 'new-pipeline-agent' of https://github.com/zjowowen/DI-e…
zhangpaipai 1221565
add bonus pg config
zhangpaipai 1018dda
polish pg config
zhangpaipai f197844
polish config
zjowowen 2e578ad
merge from main
zjowowen 0bc6923
polish code
zjowowen 5473706
change pendulum pg config
zhangpaipai db8176b
fix continuous action dim=1 bug
zjowowen 43b0c3e
merge from main
zjowowen 44a3047
merge from origin main
zjowowen d16fa86
Add ppof lr scheduler
zjowowen eb86c63
polish config
zjowowen eab7912
fix random collect bug for dqn
zjowowen 98a9017
polish ppo qbert spaceinvader config
zjowowen b52d8f1
remove mujoco wrapper
zjowowen 8b15b52
polish a2c mujoco config; add ppo offpolicy agent pipeline
zjowowen c915f33
merge from main
zjowowen dc61317
Add wandb monitor evaluate return std
zjowowen ea5f1e7
polish deploy method
zjowowen 35a21b4
format code
zjowowen f95e8eb
polish code
zjowowen 603fa5e
polish pg pendulum+hopper config
zhangpaipai 6b874dc
Merge branch 'new-pipeline-agent' of https://github.com/zjowowen/DI-e…
zhangpaipai ddd6550
fix data shape bug
zjowowen d25228e
merge from remote
zjowowen df7963d
merge from main
zjowowen a7c3cf4
fix ppo offpolicy deploy bug
zjowowen ea979e8
fix mujoco reward action env clip bug
zjowowen ff7f639
fix mujoco reward action env clip bug
zjowowen ed5b1a3
fix deploy env mode bug
zjowowen 05f8c47
fix env reset bug for deployment and evaluation
zjowowen df20033
Add ppo offpolicy atari config
zjowowen 4cc8eac
merge from main
zjowowen 5ecc9dc
polish config
zjowowen 1e5ec1a
merge from main
zjowowen c621c35
polish config code
zjowowen 41786e3
polish code; add SQL
zjowowen ebcefb4
polish code
zjowowen 420ef72
polish code
zjowowen d9d93dd
polish code
zjowowen aa1f39d
polish code
zjowowen 653a00b
change config path
zjowowen 57e7325
add compatibility fix for nstep
zjowowen 0754dd9
polish code
zjowowen 0919f06
Add ppo offpolicy continuous policy
zjowowen d958f49
polish config
zjowowen ab0fdda
add ppo offpolicy general action modeling
zjowowen 0c1f2b6
add dependencies
zjowowen 9336a0a
polish config
zjowowen ced06f8
polish deploy
zjowowen a8822fd
Add array video helper
zjowowen 8d152e0
polish deploy
zjowowen 2e2db04
merge from main
zjowowen e063d77
polish config
zjowowen afb6355
polish setup
zjowowen 0863b0b
fix config bug
zjowowen c934ef6
polish code
zjowowen a1f3e94
polish code
zjowowen da9d2c1
polish code
zjowowen af3d101
merge from main
zjowowen 92d9504
fix bug in evaluator
zjowowen 1f0704c
polish code
zjowowen 5a08ec7
polish code
zjowowen 1774224
merge from main
zjowowen 65b9f08
Add priority in collector
zjowowen 02f90cf
merge from main
zjowowen c9e736a
polish code
zjowowen 4d125c9
Merge branch 'new-pipeline-agent' of https://github.com/zjowowen/DI-e…
zjowowen d204c95
polish example
zjowowen eea5573
polish example
zjowowen 0f807d8
add wandb logger
zjowowen 984c8ba
polish code
zjowowen 584bd7a
polish code
zjowowen b87eabc
merge from main
zjowowen 795ec5d
merge from new-agent-pipeline
zjowowen 948c99b
polish code
zjowowen 554edb2
Merge branch 'envpool-pr' of https://github.com/zjowowen/DI-engine in…
zjowowen d0047ed
change timer gpu to false
zjowowen 8c57293
Merge branch 'envpool-pr' of https://github.com/zjowowen/DI-engine in…
zjowowen c7509cb
polish config
zjowowen 9ff7d4b
add sweep main file for new pipeline
zjowowen 0a1a2cc
polish code
zjowowen b798c2e
Merge branch 'envpool-pr' of https://github.com/zjowowen/DI-engine in…
zjowowen fb5045b
polish code
zjowowen 6a4d83e
polish code
zjowowen 52ded5a
Add main file
zjowowen 9aa23f7
add test
zjowowen c6e90a4
add test
zjowowen 3addb8b
merge from main
zjowowen 1d9f9af
Merge branch 'new-pipeline-agent' of https://github.com/zjowowen/DI-e…
zjowowen ef99434
add time logger
zjowowen 27cb8bd
add new envmanager and collector
zjowowen 5a41f63
fix bug in learner
zjowowen 1b7cf2a
add nstep support for fast dqn
zjowowen aab3847
change data type
zjowowen 83ece4f
polish code
zjowowen 1980d51
add spaceinvaders envpool
zjowowen 96c0bbf
fix import bug
zjowowen 8419a38
merge file from main
zjowowen 7adbc77
merge file from main
zjowowen 7b581eb
merge file from main
zjowowen fe30fbe
merge file from main
zjowowen dc0ea3a
merge file from main
zjowowen c7b7645
merge file from main
zjowowen 2f9a41f
merge file from main
zjowowen 871fdc0
merge file from main
zjowowen e6e6828
change offline learner
zjowowen 8d79f66
add dqn policy timer
zjowowen e1c137a
polish code
zjowowen ab33001
polish code
zjowowen 340d50e
polish code
zjowowen e5af078
polish code
zjowowen 20cbef1
add shrink model
zjowowen 532b5b8
add large batch
zjowowen b31a7ca
add large batch
zjowowen 35069ae
add large learning rate; add priority
zjowowen a06bd3f
Add update per collect 5 and target update 100
zjowowen e5ea2fd
Add qbert test 6 7
zjowowen d7c4983
polish qbert test 6 7
zjowowen e22df12
polish qbert test 6 7
zjowowen c03a17b
polish qbert test 8 9
zjowowen 48c1333
polish qbert test 10~12
zjowowen dda0ffc
polish qbert test 13
zjowowen 878bbb3
polish qbert test 14 15
zjowowen 73b73dc
polish qbert test 16~18
zjowowen c068721
merge from main
zjowowen 7daf239
polish code
zjowowen ed0f490
polish code
zjowowen 8981236
polish code
zjowowen 3a1d98c
polish code
zjowowen 83ca217
polish code
zjowowen 97360c0
polish code
zjowowen 25fab56
polish code
zjowowen ab93b39
polish code
zjowowen cd762b6
polish code
zjowowen d3c9bf8
polish code
zjowowen 1bd96e0
polish pr
zjowowen 35a2c67
fix bug
zjowowen fca097f
Merge branch 'main' of https://github.com/zjowowen/DI-engine into dis…
zjowowen 3687f8b
polish code
zjowowen 4fb85b0
polish code
zjowowen 48ee6da
polish code
zjowowen
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,11 @@ | |
from easydict import EasyDict | ||
from copy import deepcopy | ||
import numpy as np | ||
import torch | ||
import treetensor.torch as ttorch | ||
import treetensor.numpy as tnp | ||
from collections import namedtuple | ||
import enum | ||
from typing import Any, Union, List, Tuple, Dict, Callable, Optional | ||
from ditk import logging | ||
try: | ||
|
@@ -17,17 +21,28 @@ | |
from ding.torch_utils import to_ndarray | ||
|
||
|
||
@ENV_MANAGER_REGISTRY.register('env_pool') | ||
class EnvState(enum.IntEnum): | ||
VOID = 0 | ||
INIT = 1 | ||
RUN = 2 | ||
RESET = 3 | ||
DONE = 4 | ||
ERROR = 5 | ||
NEED_RESET = 6 | ||
|
||
|
||
@ENV_MANAGER_REGISTRY.register('envpool') | ||
class PoolEnvManager: | ||
''' | ||
""" | ||
Overview: | ||
PoolEnvManager supports old pipeline of DI-engine. | ||
Envpool now supports Atari, Classic Control, Toy Text, ViZDoom. | ||
Here we list some commonly used env_ids as follows. | ||
For more examples, you can refer to <https://envpool.readthedocs.io/en/latest/api/atari.html>. | ||
|
||
- Atari: "Pong-v5", "SpaceInvaders-v5", "Qbert-v5" | ||
- Classic Control: "CartPole-v0", "CartPole-v1", "Pendulum-v1" | ||
''' | ||
""" | ||
|
||
@classmethod | ||
def default_config(cls) -> EasyDict: | ||
|
@@ -39,10 +54,17 @@ def default_config(cls) -> EasyDict: | |
# Async mode: batch_size < env_num | ||
env_num=8, | ||
batch_size=8, | ||
image_observation=True, | ||
episodic_life=False, | ||
reward_clip=False, | ||
gray_scale=True, | ||
stack_num=4, | ||
frame_skip=4, | ||
) | ||
|
||
def __init__(self, cfg: EasyDict) -> None: | ||
self._cfg = cfg | ||
self._cfg = self.default_config() | ||
self._cfg.update(cfg) | ||
self._env_num = cfg.env_num | ||
self._batch_size = cfg.batch_size | ||
self._ready_obs = {} | ||
|
@@ -55,6 +77,7 @@ def launch(self) -> None: | |
seed = 0 | ||
else: | ||
seed = self._seed | ||
|
||
self._envs = envpool.make( | ||
task_id=self._cfg.env_id, | ||
env_type="gym", | ||
|
@@ -65,8 +88,10 @@ def launch(self) -> None: | |
reward_clip=self._cfg.reward_clip, | ||
stack_num=self._cfg.stack_num, | ||
gray_scale=self._cfg.gray_scale, | ||
frame_skip=self._cfg.frame_skip | ||
frame_skip=self._cfg.frame_skip, | ||
) | ||
self._action_space = self._envs.action_space | ||
self._observation_space = self._envs.observation_space | ||
self._closed = False | ||
self.reset() | ||
|
||
|
@@ -77,6 +102,8 @@ def reset(self) -> None: | |
obs, _, _, info = self._envs.recv() | ||
env_id = info['env_id'] | ||
obs = obs.astype(np.float32) | ||
if self._cfg.image_observation: | ||
obs /= 255.0 | ||
self._ready_obs = deep_merge_dicts({i: o for i, o in zip(env_id, obs)}, self._ready_obs) | ||
if len(self._ready_obs) == self._env_num: | ||
break | ||
|
@@ -91,6 +118,8 @@ def step(self, action: dict) -> Dict[int, namedtuple]: | |
|
||
obs, rew, done, info = self._envs.recv() | ||
obs = obs.astype(np.float32) | ||
if self._cfg.image_observation: | ||
obs /= 255.0 | ||
rew = rew.astype(np.float32) | ||
env_id = info['env_id'] | ||
timesteps = {} | ||
|
@@ -124,3 +153,153 @@ def env_num(self) -> int: | |
@property | ||
def ready_obs(self) -> Dict[int, Any]: | ||
return self._ready_obs | ||
|
||
@property | ||
def observation_space(self) -> 'gym.spaces.Space': # noqa | ||
try: | ||
return self._observation_space | ||
except AttributeError: | ||
self.launch() | ||
self.close() | ||
return self._observation_space | ||
|
||
@property | ||
def action_space(self) -> 'gym.spaces.Space': # noqa | ||
try: | ||
return self._action_space | ||
except AttributeError: | ||
self.launch() | ||
self.close() | ||
return self._action_space | ||
|
||
|
||
@ENV_MANAGER_REGISTRY.register('envpool_v2') | ||
class PoolEnvManagerV2: | ||
""" | ||
Overview: | ||
PoolEnvManagerV2 supports new pipeline of DI-engine. | ||
Envpool now supports Atari, Classic Control, Toy Text, ViZDoom. | ||
Here we list some commonly used env_ids as follows. | ||
For more examples, you can refer to <https://envpool.readthedocs.io/en/latest/api/atari.html>. | ||
|
||
- Atari: "Pong-v5", "SpaceInvaders-v5", "Qbert-v5" | ||
- Classic Control: "CartPole-v0", "CartPole-v1", "Pendulum-v1" | ||
""" | ||
|
||
@classmethod | ||
def default_config(cls) -> EasyDict: | ||
return EasyDict(deepcopy(cls.config)) | ||
|
||
config = dict( | ||
type='envpool_v2', | ||
env_num=8, | ||
batch_size=8, | ||
image_observation=True, | ||
episodic_life=False, | ||
reward_clip=False, | ||
gray_scale=True, | ||
stack_num=4, | ||
frame_skip=4, | ||
) | ||
|
||
def __init__(self, cfg: EasyDict) -> None: | ||
super().__init__() | ||
self._cfg = self.default_config() | ||
self._cfg.update(cfg) | ||
self._env_num = cfg.env_num | ||
self._batch_size = cfg.batch_size | ||
|
||
self._closed = True | ||
self._seed = None | ||
|
||
def launch(self) -> None: | ||
assert self._closed, "Please first close the env manager" | ||
if self._seed is None: | ||
seed = 0 | ||
else: | ||
seed = self._seed | ||
|
||
self._envs = envpool.make( | ||
task_id=self._cfg.env_id, | ||
env_type="gym", | ||
num_envs=self._env_num, | ||
batch_size=self._batch_size, | ||
seed=seed, | ||
episodic_life=self._cfg.episodic_life, | ||
reward_clip=self._cfg.reward_clip, | ||
stack_num=self._cfg.stack_num, | ||
gray_scale=self._cfg.gray_scale, | ||
frame_skip=self._cfg.frame_skip, | ||
) | ||
self._action_space = self._envs.action_space | ||
self._observation_space = self._envs.observation_space | ||
self._closed = False | ||
return self.reset() | ||
|
||
def reset(self) -> None: | ||
self._envs.async_reset() | ||
ready_obs = {} | ||
while True: | ||
obs, _, _, info = self._envs.recv() | ||
env_id = info['env_id'] | ||
obs = obs.astype(np.float32) | ||
if self._cfg.image_observation: | ||
obs /= 255.0 | ||
for i in range(len(list(env_id))): | ||
ready_obs[env_id[i]] = obs[i] | ||
if len(ready_obs) == self._env_num: | ||
break | ||
self._eval_episode_return = [0. for _ in range(self._env_num)] | ||
|
||
return ready_obs | ||
|
||
def send_action(self, action, env_id) -> None: | ||
self._envs.send(action, env_id) | ||
|
||
def receive_data(self): | ||
next_obs, rew, done, info = self._envs.recv() | ||
next_obs = next_obs.astype(np.float32) | ||
if self._cfg.image_observation: | ||
next_obs /= 255.0 | ||
rew = rew.astype(np.float32) | ||
|
||
return next_obs, rew, done, info | ||
|
||
def close(self) -> None: | ||
if self._closed: | ||
return | ||
# Envpool has no `close` API | ||
self._closed = True | ||
|
||
@property | ||
def closed(self) -> bool: | ||
return self._closed | ||
|
||
def seed(self, seed: int, dynamic_seed=False) -> None: | ||
# The i-th environment seed in Envpool will be set with i+seed, so we don't do extra transformation here | ||
self._seed = seed | ||
logging.warning("envpool doesn't support dynamic_seed in different episode") | ||
|
||
@property | ||
def env_num(self) -> int: | ||
return self._env_num | ||
|
||
@property | ||
def observation_space(self) -> 'gym.spaces.Space': # noqa | ||
try: | ||
return self._observation_space | ||
except AttributeError: | ||
self.launch() | ||
self.close() | ||
self._ready_obs = {} | ||
return self._observation_space | ||
|
||
@property | ||
def action_space(self) -> 'gym.spaces.Space': # noqa | ||
try: | ||
return self._action_space | ||
except AttributeError: | ||
self.launch() | ||
self.close() | ||
self._ready_obs = {} | ||
return self._action_space | ||
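The `reset` loop in `PoolEnvManagerV2` keeps calling `recv()` until every environment has reported an observation, keyed by the `env_id` entry in `info`. The following is a minimal sketch of that gathering pattern only. `FakeAsyncEnvs` and `gather_reset_obs` are hypothetical names introduced for illustration; the fake class is a pure-Python stand-in and is not the envpool API. It delivers results in batches smaller than the number of environments, which is the async case (`batch_size < env_num`) noted in the config comment.

```python
from typing import Dict, List, Tuple


class FakeAsyncEnvs:
    """Hypothetical stand-in for an async vectorized env (not the envpool API)."""

    def __init__(self, num_envs: int, batch_size: int) -> None:
        self.num_envs = num_envs
        self.batch_size = batch_size
        self._pending: List[int] = []

    def async_reset(self) -> None:
        # Queue every env for reset; reversed order shows that arrival order is arbitrary.
        self._pending = list(reversed(range(self.num_envs)))

    def recv(self) -> Tuple[List[List[float]], List[float], List[bool], Dict[str, List[int]]]:
        # Hand back at most `batch_size` envs per call, like an async pool would.
        n = self.batch_size
        ids, self._pending = self._pending[:n], self._pending[n:]
        obs = [[float(i)] * 2 for i in ids]  # dummy observation derived from the env id
        return obs, [0.0] * len(ids), [False] * len(ids), {"env_id": ids}


def gather_reset_obs(envs: FakeAsyncEnvs, env_num: int) -> Dict[int, List[float]]:
    """Collect one observation per env, keyed by env id, across several recv() calls."""
    envs.async_reset()
    ready_obs: Dict[int, List[float]] = {}
    while len(ready_obs) < env_num:
        obs, _, _, info = envs.recv()
        for env_id, o in zip(info["env_id"], obs):
            ready_obs[env_id] = o
    return ready_obs


envs = FakeAsyncEnvs(num_envs=8, batch_size=3)
ready = gather_reset_obs(envs, env_num=8)
print(sorted(ready))
```

Keying by `env_id` rather than by position is what lets the caller route each observation (and later each action) to the right environment when batches arrive out of order.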
add envpooltest for this file
fixed
Why merge the config here? We have already merged the env manager config in the `compile_config` function.
There are two env manager configs, one for the evaluator and one for the collector. It's too complicated to use `compile_config` with `auto=True`.
I suggest using `compile_config` with `auto=False`.
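For context on the merge being discussed, here is a minimal sketch of the "copy the defaults, then update with the caller's config" pattern that the new `__init__` uses, applied once for a collector and once for an evaluator. It uses plain `dict` instead of `EasyDict` so that it runs without extra dependencies. `merge_with_defaults` and the two example configs are hypothetical and are not part of the PR; the default values are copied from the `config` dict in the diff.

```python
from copy import deepcopy

# Defaults copied from the `config` dict shown in the diff.
DEFAULT_ENV_MANAGER_CFG = dict(
    type='envpool_v2',
    env_num=8,
    batch_size=8,
    image_observation=True,
    episodic_life=False,
    reward_clip=False,
    gray_scale=True,
    stack_num=4,
    frame_skip=4,
)


def merge_with_defaults(user_cfg: dict) -> dict:
    """Hypothetical helper: return a fresh copy of the defaults overridden by user_cfg."""
    cfg = deepcopy(DEFAULT_ENV_MANAGER_CFG)
    cfg.update(user_cfg)  # shallow merge: top-level keys in user_cfg win
    return cfg


# Two independent env manager configs, as described in the reply above.
collector_cfg = merge_with_defaults(dict(env_id="Pong-v5", env_num=8, batch_size=8))
evaluator_cfg = merge_with_defaults(dict(env_id="Pong-v5", env_num=4, batch_size=4))

print(collector_cfg["env_num"], evaluator_cfg["env_num"], evaluator_cfg["frame_skip"])
```

Because `update` is a shallow merge, a nested dict in the user config would replace the default wholesale rather than being merged key by key. Also note that the diff still reads `env_num` and `batch_size` from the incoming `cfg` rather than from the merged `self._cfg`, so the defaults for those two keys would not apply if the caller omits them.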