4 changes: 2 additions & 2 deletions docker/Dockerfile
@@ -1,5 +1,5 @@
-FROM python:3.8-slim-buster
-
+#build with --platform linux/arm64 or linux/amd64
+FROM python:latest
 RUN mkdir usr/app
 WORKDIR usr/app

16 changes: 8 additions & 8 deletions docker/requirements.txt
@@ -1,8 +1,8 @@
-Keras==2.0.7
-Keras-Applications==1.0.8
-Keras-Preprocessing==1.1.2
-numpy==1.18.5
-pandas==1.1.4
-scikit-learn==0.23.2
-tensorflow==2.3.1
-tokenizers==0.10.3
+Keras
+Keras-Applications
+Keras-Preprocessing
+numpy
+pandas
+scikit-learn
+tensorflow
+tokenizers
6 changes: 6 additions & 0 deletions lecture1/FinancialPhraseBank/License.txt
@@ -0,0 +1,6 @@
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.

If you are interested in commercial use of the data, please contact the following authors for an appropriate license:

Pekka Malo email: [email protected]
Ankur Sinha email: [email protected]
74 changes: 74 additions & 0 deletions lecture1/FinancialPhraseBank/README.txt
@@ -0,0 +1,74 @@
===========================================================

Documentation for Financial Phrase Bank v.1.0

===========================================================

Contents:

1. Introduction
2. Data
3. Acknowledgements
4. Contact Information
5. References

-----------------------------------------------------------

1. Introduction

The key arguments for the low utilization of statistical techniques in financial sentiment analysis have been the difficulty of implementation for practical applications and the lack of high-quality training data for building such models. Especially in the case of finance and economic texts, annotated collections are a scarce resource and many are reserved for proprietary use only. To resolve the missing training data problem, we present a collection of ∼5000 sentences to establish human-annotated standards for benchmarking alternative modeling techniques.

The objective of the phrase-level annotation task was to classify each example sentence into a positive, negative or neutral category by considering only the information explicitly available in the given sentence. Since the study is focused only on financial and economic domains, the annotators were asked to consider the sentences from the viewpoint of an investor only; i.e., whether the news may have a positive, negative or neutral influence on the stock price. As a result, sentences which have a sentiment that is not relevant from an economic or financial perspective are considered neutral.

-----------------------------------------------------------

2. Data

This release of the financial phrase bank covers a collection of 4840 sentences. The selected collection of phrases was annotated by 16 people with adequate background knowledge on financial markets. Three of the annotators were researchers and the remaining 13 annotators were master’s students at Aalto University School of Business with majors primarily in finance, accounting, and economics.

Given the large number of overlapping annotations (5 to 8 annotations per sentence), there are several ways to define a majority vote based gold standard. To provide an objective comparison, we have formed 4 alternative reference datasets based on the strength of majority agreement:

(i) sentences with 100% agreement [file=Sentences_AllAgree.txt];
(ii) sentences with more than 75% agreement [file=Sentences_75Agree.txt];
(iii) sentences with more than 66% agreement [file=Sentences_66Agree.txt]; and
(iv) sentences with more than 50% agreement [file=Sentences_50Agree.txt].
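
The grouping above can be illustrated with a minimal Python sketch. It assumes the per-annotator labels for one sentence are available as a list, which this release does not ship, and it reads "more than" as a strict inequality. The function names `majority_agreement` and `reference_files` are hypothetical and are not part of the dataset.

```python
from collections import Counter
from typing import List, Tuple


def majority_agreement(labels: List[str]) -> Tuple[str, int, int]:
    """Return (most common label, its vote count, total number of votes)."""
    if not labels:
        raise ValueError("at least one annotation is required")
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes, len(labels)


def reference_files(labels: List[str]) -> List[str]:
    """List the reference files a sentence would fall into (assumed strict thresholds)."""
    _, votes, total = majority_agreement(labels)
    files = []
    if votes == total:
        files.append("Sentences_AllAgree.txt")
    # Integer arithmetic avoids floating-point rounding at the thresholds.
    for percent, name in ((75, "Sentences_75Agree.txt"),
                          (66, "Sentences_66Agree.txt"),
                          (50, "Sentences_50Agree.txt")):
        if votes * 100 > percent * total:
            files.append(name)
    return files
```

Under these assumptions, a sentence with four "positive" votes and one "neutral" vote (80% agreement) would appear in the 75%, 66% and 50% files but not in the 100% file.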

All reference datasets are included in the release. The files are in a machine-readable "@"-separated format:

sentence@sentiment

where sentiment is one of "positive", "neutral" or "negative".

E.g., The operating margin came down to 2.4 % from 5.7 % .@negative
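
A minimal Python sketch of reading this format follows. It splits each line on the last "@" so that an "@" inside a sentence would not break parsing. The function names are hypothetical, and the `latin-1` default encoding is an assumption that may need adjusting for your copy of the files.

```python
from typing import Iterable, List, Tuple

VALID_LABELS = {"positive", "neutral", "negative"}


def parse_phrasebank_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse "sentence@sentiment" lines into (sentence, sentiment) pairs."""
    examples = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the LAST "@" so the label is always the final field.
        sentence, sep, label = line.rpartition("@")
        label = label.strip()
        if not sep or label not in VALID_LABELS:
            raise ValueError(f"unexpected line: {line!r}")
        examples.append((sentence.strip(), label))
    return examples


def load_phrasebank(path: str, encoding: str = "latin-1") -> List[Tuple[str, str]]:
    """Read one reference file; the default encoding is an assumption."""
    with open(path, encoding=encoding) as handle:
        return parse_phrasebank_lines(handle)
```

For example, `load_phrasebank("Sentences_AllAgree.txt")` would return a list of pairs, assuming the file sits in the current working directory.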


-----------------------------------------------------------

3. Acknowledgements

The development of the Financial Phrase Bank v.1.0 was supported by Emil Aaltonen Foundation and Academy of Finland (grant no: 253583).

-----------------------------------------------------------

4. Contact Information

In case you have any questions regarding this phrase bank, please contact
Pekka Malo or Ankur Sinha for further information.

Pekka Malo email: [email protected]
Ankur Sinha email: [email protected]

-----------------------------------------------------------

5. References

If you plan to use the dataset for research or academic purposes, please cite the following publication. For commercial or any other non-academic use of the dataset, contact us for an appropriate license.

Malo, P., Sinha, A., Korhonen, P., Wallenius, J., & Takala, P. (2014). Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 65(4), 782-796.
-----------------------------------------------------------

Pekka Malo
Ankur Sinha
Pyry Takala
Pekka Korhonen
Jyrki Wallenius