Feedback #91

rasauq1122 · 2024-10-01T10:48:08Z

👋! GitHub Classroom created this pull request as a place for your teacher to leave feedback on your work. It will update automatically. Don’t close or merge this pull request, unless you’re instructed to do so by your teacher.
In this pull request, your teacher can leave comments and feedback on your code. Click the Subscribe button to be notified if that happens.
Click the Files changed or Commits tab to see all of the changes pushed to the default branch since the assignment started. Your teacher can see this too.

Notes for teachers

Use this PR to leave feedback. Here are some tips:

Click the Files changed tab to see all of the changes pushed to the default branch since the assignment started. To leave comments on specific lines of code, put your cursor over a line of code and click the blue + (plus sign). To learn more about comments, read “Commenting on a pull request”.
Click the Commits tab to see the commits pushed to the default branch. Click a commit to see specific changes.
If you turned on autograding, then click the Checks tab to see the results.
This page is an overview. It shows commits, line comments, and general comments. You can leave a general comment below.
For more information about this pull request, read “Leaving assignment feedback in GitHub”.

Subscribed: @swbaek97 @Uvamba @hskhyl @rasauq1122 @Now100 @privetin

test

…initialization - Instead of hardcoding the dataloader and model configurations in main.py, this commit updates the code to use a config file for defining these configurations. - The config.yaml file now includes a new "max_tokens" item to specify the maximum number of tokens for the dataloader. - The data path information is also added to the config.yaml file. - The max_epochs argument in the Trainer initialization is replaced with the value from the config file. - The model.py file is updated to import the get_loss and get_optimizer functions from separate files. - The loss.py file is added, which includes a function to get the loss function based on the provided name. - The optimizer.py file is added, which includes a function to get the optimizer based on the provided name. Refactoring the code in this way improves modularity and makes it easier to modify the dataloader, model, loss function, and optimizer configurations in the future.

#5 완료

fix: dataloader.py - args.shuffle -> self.shuffle

fix: delete backticks

feat: add EDA.ipynb

feat: data augmentation

feat: label 별 데이터 분포 및 token length 별 데이터 분포 시각화

feat: interactive한 앙상블

feat: can load .ckpt file

앙상블 모델 결정 과정에서 분석한 EDA 코드입니다.

requirements.txt 최종 업데이트

Update README.md

nonegom

전체적으로 너무 잘 작성해주신 것 같습니다. 이미 피드백이 없이도 어느정도 완성도 있는 코드라는 생각이 듭니다. 제가 드리는 피드백은 정말 간단하게 처음 보는 사람의 입장에서 작성한 것들이라, 적당히 보시고 취사선택을 해주시면 좋을 것 같습니다. 프로젝트 너무 고생 많으셨습니다.

nonegom · 2024-10-09T13:15:52Z

README.md

+| 백승우 |데이터 EDA(label 분포 분석), 데이터 전처리(불용어 제거), 데이터 증강(under sampling, copied sentence), 모델 아키텍쳐 변경 실험(embedding layer freezing, 일부 encoder layer freezing), 모델 조사 및 실험  |
+| 김정석 |데이터 분석 및 전처리 고도화, 모델 아키텍처 최적화, 앙상블 기법 개선  |
+
+## 2. 프로젝트 파일 구조


프로젝트 파일 구조를 나눠서 구분해둔 게 좋은 것 같습니다! 첫 프로젝트이실텐데 이런식으로 파일을 구조화해서 기록하시려고 시도하신 점에서 놀랐습니다.

nonegom · 2024-10-09T13:21:55Z

README.md

+- **Slack**: Github 및 WandB 봇을 활용한 협업, 의견 공유, 회의
+- **Zoom**: 실시간 소통을 통한 의견 공유 및 회의
+
+### 1.3 팀 구성 및 역할


팀 구성 및 역할은 확실히 필요한 부분이라고 생각하는데, 잘 작성해주신 것 같습니다! 이는 개인적인 의견일 수도 있습나다만, 가독성있게 읽혀지지는 않는 것 같습니다. 하신 일들에 대해서 구체적으로 작성해주신 것 같기는 하지만, 너무 세세한 부분까지 작성이 된 것 같아, 내용을 어느 정도 함축해주시고, 전체적으로 Task명을 통일 시켜주시면 더 가독성이 좋지 않을까 생각이 듭니다.

nonegom · 2024-10-09T13:25:06Z

config.yaml

@@ -0,0 +1,33 @@
+model:


config 파일을 통해서 모델의 학습 파라미터를 정리하신 부분이 되게 좋은 것 같습니다! 추후 해당 프로젝트와 관련해서 말씀을 하시게 될 때, 여기서 사용한 optimizer나 loss, scheduler, augmentation 방법들에 대해서 깊게 이해를 하고 계시면 도움이 될 것 같습니다! 어떤 방법이고, 왜 사용했는지 등에 대해 질문이 들어올 때 도움이 될 것 같습니다!

nonegom · 2024-10-09T13:29:28Z

notebooks/EDA.ipynb

notebook 폴더를 별도로 나눠, 실험 내용을 기록하시려고 하신 시도가 좋았습니다. 다만, 이와 관련해서 제목의 통일성을 조금 맞춰주고, 가장 처음 코드블록에 주석으로라도, 어떤 실험에 대한 노트북 파일이었는지 작성해주시면 더 좋을 것 같습니다. 해당 주석을 잘 작성해주시면, 처음 보는 사람뿐만이 아니라 스스로에게도 많은 도움이 됩니다. 왜냐하면, 몇 달 뒤면 저희도 모두 코드를 처음 보는 사람처럼 될 수 있기 때문입니다 ㅎㅎ...

nonegom · 2024-10-09T13:35:37Z

requirements.txt

requirements의 경우 처음 git을 통해 코드 관리를 하실 때 많이 놓칠 수도 있는 부분인데, 이를 챙겨주신 것만 해도 충분히 좋은 것 같습니다. 조금 더 좋아지길 바라는 마음에서 말씀드리면, 제가 생각했을 때 모든 패키지가 필요하지는 않을 것 같습니다. 이와 관련해 실제 main에서 활용하지 않아도 되는 패키지는 제거 후, 최소한의 패키지만을 올려두는 것이 더 좋을 것 같다는 생각이 듭니다.

nonegom · 2024-10-09T14:12:44Z

src/main.py

+            train(base_config, args.model_path)
+
+
+    elif args.mode == 'inference':


이건 개인 선호도의 차이일 수도 있지만, 위에서 train 단계에서는 깔끔하게 데이터 정의 후 학습 함수 이렇게 했는데, inference 단게에서는 상대적으로 덜 모듈화 되어있는 것 같다는 생각이 듭니다. inference도 함수로 만들어두신다면 더 가독성이 좋을 것 같습니다!

nonegom · 2024-10-09T14:19:41Z

src/model/model.py

+from model.optimizer import get_optimizer
+from model.MultiTaskLoss import MultiTaskLoss
+
+class Model(pl.LightningModule):


다른 모델과 다른 loss를 사용하시게 되는 부분을 주석으로 조금 더 설명을 추가해주시는 것도 좋을 것 같아요!

nonegom · 2024-10-09T14:20:54Z

src/model/model.py

+            loss = torch.sqrt(loss)
+        self.log("val_loss", loss)
+
+        # prog_bar=True를 통해 epoch마다 progress bar에 val_pearson 점수 출력


main으로 들어가게 되는 코드에는 사용하지 않는 코드는 제거해두면 좋을 것 같아요! 추후 사용해야 하는 코드라면, 이와 관련해 왜 주석 처리를 하는지 등에 대해서 추가적인 설명이 있으면 코드가 더 깔끔할 것 같습니다!

nonegom · 2024-10-09T14:23:30Z

src/streamlit.py

streamlit을 활용해 평가 결과를 쉽게 확인할 수 있게 만드신 시도가 신선하고, 또 좋은 것 같습니다!!

nonegom · 2024-10-09T14:26:53Z

src/utils/decorators.py

@@ -0,0 +1,6 @@
+def augment_func(name):


데코레이터 함수를 활용해서 함수의 속성을 관리하실 생각을 하신 게 좋은 것 같습니다. 만약, 이와 관련해서 질문이 들어온다면, 왜 데코레이터를 활용해서 augmentf function들을 관리하려고 했는지 추가 질문이 들어올 수도 있을 것 같습니다. (아마 이는 이미 알고 계실 것 같기는 한데, 이는 이걸 '알고 썼는지' 를 묻고 싶어서 질문이 들어올 수도 있을 것 같다는 생각이 들어, 간단한 피드백으로 작성해봤습니다. )

따로 기록된 내용이 없는 것 같아 정리용으로 남겨두겠습니다!

개인적으로 저희가 추가할 데이터 증강 기법의 수가 얼마나 늘어날지 모르는 상황에서,
새로운 증강 기법이 추가될 때마다 augmentation.py에서

함수를 import

함수의 약칭을 config.yaml에서 찾은 후 설정 값을 불러와 실행

이 과정이 추가되어야 한다는 게 귀찮기도 하고 (추가한 건 새로운 데이터 증강 방법인데, 증강 함수를 호출하기만 하면 되는 augmentation.py도 같이 수정되어야 한다는 것이) 비직관적이라고 생각했습니다.

그래서 의도하고자 한 것이 아래와 같았습니다:

특정 디렉토리에서 데이터 증강을 위한 함수(사실은 클래스)를 모두 찾아서 import하자.

import한 함수가 따로 설정해둔 alias를 확인하여 config.yaml으로부터 찾자.

이를 쉽게 할 수 있는 방법을 찾다보니 자연스럽게 데코레이터 함수에 대해 알게 되었고,
마침 제가 생각한 요구조건을 모두 만족할 수 있는 방법으로 보여 적용하였습니다.

hskhyl and others added 30 commits September 10, 2024 17:32

Create README.md

e4c930c

test

refactor: baseline 코드 클래스별로 분리하기

aff33f7

feat: config.yaml 추가 및 이를 불러오기 위한 config.py 추가

cab5038

refactor: .gitignore <- __pycache__

c235ead

refactor: config.yaml에 data path 정보 추가

4812f6a

refactor: config.yaml에 max_tokens 항목 추가

2c4ebac

refactor: config.yaml로 dataloader 정의하기

c7d1ad0

refactor: Add data and lightning_logs directories to .gitignore

08173b3

Merge pull request #6 from boostcampaitech7/pjs-issue-05

27c9e7a

#5 완료

fix: dataloader.py - args.shuffle -> self.shuffle

4202c3b

Merge pull request #8 from boostcampaitech7/pjs-issue-05

f8044b6

fix: dataloader.py - args.shuffle -> self.shuffle

fix: delete backticks

21ff8a5

Merge pull request #9 from boostcampaitech7/feature-ljb

c93ebba

fix: delete backticks

feat: add EDA.ipynb

2fa682c

feat: EDA.ipynb 문자열 길이에서 토큰 갯수로 변경

2dfe062

refactor: data_pipeline/EDA.ipynb -> notebooks/EDA.ipynb

c7cb902

Merge pull request #10 from boostcampaitech7/feature-ljb

4467a0c

feat: add EDA.ipynb

feat: data visualization w. streamlit

62cd269

feat: Add data augmentation for dataloader.py

357d2c9

feat: Add data augmentation config for config.yaml

6fd5a3a

refactor: Skip data augmentation for items with label 0

49d954b

feat: Apply data augmentation to training data in Dataloader

78335c2

Merge pull request #12 from boostcampaitech7/feature-ljb

3e77921

feat: data augmentation

feat: add visualization about token length

3132d2e

refactor: remove print statement used to debug

bd5c77a

Merge pull request #13 from boostcampaitech7/pjs-issue-06

d844a86

feat: label 별 데이터 분포 및 token length 별 데이터 분포 시각화

refactor: delete return of validation_step method

19993b1

Fix: config.yaml file has to move on to src folder. not home

49cc899

Fix: change the path of predict

dbe66ea

rasauq1122 and others added 28 commits September 26, 2024 11:48

fix: fix error occurs when sum of weights is zero. + minor fix

7c8c4d3

Merge pull request #87 from boostcampaitech7/pjs-issue-84

6aab2a4

feat: interactive한 앙상블

feat: can load .ckpt file

3d55fa3

Merge pull request #89 from boostcampaitech7/pjs-issue-88

3658455

feat: can load .ckpt file

fix: round error + path error

7deecb3

feat: ensemble.py can recognize each items of csv is all equal.

a24dbee

Feat : result EDA of ensemble

f58fbbf

앙상블 모델 결정 과정에서 분석한 EDA 코드입니다.

Update requirements.txt

860454e

requirements.txt 최종 업데이트

Update README.md

7f5856c

Update README.md

a269839

Update README.md

87ec069

Update README.md

1223462

Update README.md

6346fab

Update README.md

7638e81

Update README.md

1afaeb2

Update README.md

ded298a

Update README.md

e2670dd

Update README.md

1561d37

Update README.md

4c42ee3

Update README.md

f4fb74a

Update README.md

3f7109a

Update README.md

4b1982f

Update README.md

89b3955

Update README.md

8f40b27

Merge pull request #90 from boostcampaitech7/hskhyl-Readme-update-#1

be1fa2b

Update README.md

Update README.md

f19083c

Update config.yaml

15ac0ef

Update README.md

06146eb

nonegom approved these changes Oct 9, 2024

View reviewed changes

Update README.md

e98fa64

		train(base_config, args.model_path)


		elif args.mode == 'inference':

Uh oh!

Feedback #91

Are you sure you want to change the base?

Feedback #91

Uh oh!

Conversation

rasauq1122 commented Oct 1, 2024

Uh oh!

nonegom left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants