MIT-Emerging-Talent · MUSABKAYMAK · Jun 1, 2025 · Jun 3, 2025 · Jun 8, 2025 · Jun 8, 2025
diff --git a/.markdownlint.yml b/.markdownlint.yml
@@ -1,3 +1,7 @@
 ignore:
   - venv
   - .github
+
+MD013:
+  line_length: 350
+
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -122,5 +122,11 @@
       "source.fixAll.ruff": "explicit",
       "source.organizeImports.ruff": "explicit"
     }
-  }
+  },
+  "cSpell.words": [
+    "Kerth",
+    "Nazario",
+    "NLTK",
+    "stopwords"
+  ]
 }
diff --git a/0_domain_study/README.md b/0_domain_study/README.md
@@ -1 +1,90 @@
-# Domain Research
+# 🛡️ Domain Study: Phishing and Linguistic Influence on User Behavior
+
+Welcome to the `0_domain_study` folder! This section summarizes our team's research into phishing — specifically the linguistic features that affect user click-through behavior. Below you'll find a structured overview of our research domain, background, and actionable insights.
+
+---
+
+## 📌 Problem Statement (Based on Team's Personal Experiences)
+
+**Research Question:**  
+_What type of linguistic features in phishing emails influence user click-through behavior?_
+
+Phishing is a growing concern globally. Based on personal experiences:
+
+- **Meklit** (Canada) frequently encounters smishing and phishing at work. One incident involved a fake OneDrive link flagged by IT — highlighting how rushed environments reduce our vigilance.
+- **Mahdia** (Portugal) emphasized how scammers use sophisticated linguistic techniques to appear trustworthy and manipulate users.
+- **Ahmad** often receives fake IT department emails at work and prize scams in his personal life. He noticed a clear difference in tone: urgent and professional at work vs. emotional in personal life.
+- **Semira** first became intrigued by fake malware after watching what her anti-virus software found. She strengthened her cybersecurity knowledge through research and a workshop, learning to spot psychological manipulation such as fake tax threats or UPS warnings. She now verifies and reports these attempts.
+- **Musab** (USA) stressed the emotional toll of constant phishing attempts and how phishing poses both legal and financial risks.
+
+Together, we observed that phishing strategies are becoming more **emotionally manipulative**, **context-aware**, and **linguistically advanced**, requiring in-depth study of their language patterns.
+
+---
+
+## 🧠 Our Understanding of the Problem Domain (Using Systems Thinking)
+
+Phishing is a **socio-technical** problem involving three interconnected components:
+
+1. **Phishers (Attackers):**  
+   Skilled in social engineering. Use language to create urgency, trust, fear, or curiosity.
+
+2. **Communication Channels:**  
+   Mainly email, but also SMS (smishing) and voice (vishing). All channels aim to prompt the user into clicking or responding.
+
+3. **Recipients (Targets):**  
+   Everyday users or professionals. Often fall victim due to low awareness, poor digital hygiene, or stress.
+
+We particularly focus on the **linguistic layer**—how language is engineered to bypass cognitive defenses and influence behavior.
+
+---
+
+## ❓ Research Question
+
+> **"What type of linguistic features in phishing emails influence user click-through behavior?"**
+
+We later revised the research question above to:
+
+> **"How do phishing emails differ from legitimate emails in terms of common linguistic patterns and language tactics?"**
+
+Given email’s dominant role in phishing and the centrality of language in deceiving recipients, this research question aims to uncover patterns in wording, tone, and psychological triggers. The revised question compensates for the lack of data on user click-through behavior while still uncovering linguistic patterns
+in phishing and legitimate emails.
+
+## 📚 Background Review of the Domain
+
+### 1. **Human Psychology and Language Triggers**
+
+- **Emotions** like fear, urgency, or reward are widely used in phishing (Jakobsson & Myers, 2006).
+- **Users acting under pressure** are less likely to evaluate messages critically (Vishwanath et al., 2011).
+
+### 2. **Phishing Detection Tools**
+
+- Tools like email filters, browser warnings, and ML-based classifiers can detect known phishing messages (Bergholz et al., 2010).
+- However, attackers adapt quickly with new linguistic patterns to bypass these systems.
+
+### 3. **User Education**
+
+- Training and awareness programs can be effective, but their success varies.
+- **Interactive and ongoing training** is more impactful than one-off sessions (Jansson & Von Solms, 2013).
+
+### 4. **Evolving Threat Landscape**
+
+- **Spear phishing** and **smishing** are on the rise (Hong, 2012).
+- Smartphones and social platforms open new vectors.
+- Despite evolution, **email remains the most common attack method** (CISA, 2023).
+
+Click here: [Full Background Review](https://docs.google.com/document/d/1at2nE_Ladr2_HlNFqoaHtACwAhOVvcFE6qYVRcrerbg/edit?tab=t.0)
+
+### Conclusion
+
+Phishing success stems largely from **manipulating language to trigger impulsive reactions**. Understanding this manipulation can help in detection and prevention.
+
+---
+
+## 📂 Resources & References
+
+- **Bergholz et al. (2010)** – Email filtering via ML  
+- **Hong (2012)** – Evolution of phishing  
+- **Jakobsson & Myers (2006)** – Psychological manipulation in phishing  
+- **Jansson & Von Solms (2013)** – Phishing education effectiveness  
+- **Vishwanath et al. (2011)** – User susceptibility factors
+- **CISA (2023)** – Counter-Phishing Recommendations for Federal Agencies
diff --git a/0_domain_study/retrospective.md b/0_domain_study/retrospective.md
@@ -0,0 +1,93 @@
+# Domain Study Retrospective
+
+## Stop Doing
+
+- Relying solely on personal experiences without broader research validation
+- Working in isolation when researching complex technical concepts
+- Postponing documentation until research is "complete"
+
+## Continue Doing
+
+- Building from team members' personal experiences with phishing
+- Using systems thinking to understand the multi-faceted nature of phishing
+- Collaborative research approach with diverse cultural perspectives
+- Regular refinement of research questions based on data availability
+
+## Start Doing
+
+- Earlier validation of research feasibility with available datasets
+- More structured literature review process
+- Creating a shared knowledge base for domain concepts
+- Setting clearer milestones for domain research completion
+
+## Lessons Learned
+
+1. **Personal experiences are valuable starting points** - Our team's diverse encounters with phishing across different countries provided rich initial insights
+2. **Research questions evolve** - We learned to adapt our focus from "user click-through behavior" to "linguistic patterns" based on data availability
+3. **Domain complexity requires a structured approach** - Phishing operates as a socio-technical system requiring interdisciplinary understanding
+4. **Cultural diversity enhances understanding** - Different team members' geographic experiences revealed various phishing tactics and contexts
+
+---
+
+## Strategy vs. Board
+
+### What parts of your plan went as expected?
+
+- Successfully collected diverse personal experiences from team members across different countries
+- Developed a comprehensive understanding of phishing as a socio-technical problem
+- Created a solid foundation for understanding linguistic manipulation tactics
+- Established clear problem boundaries and scope
+
+### What parts of your plan did not work out?
+
+- Initial research question was too ambitious given available data constraints
+- Underestimated the time needed for thorough domain research
+- Limited access to current phishing campaign data for contemporary analysis
+
+### Did you need to add things that weren't in your strategy?
+
+- Literature review of existing phishing detection research
+- Technical feasibility assessment for linguistic analysis approaches
+- Data availability research to inform research question refinement
+- Systems thinking framework to understand phishing ecosystem
+
+### Or remove extra steps?
+
+- Removed user behavior survey component due to resource constraints
+- Simplified focus from multi-channel phishing to email-specific analysis
+- Reduced scope from real-time detection to pattern identification
+
+---
+
+## Individual Retrospectives
+
+### Meklit
+
+Contributed workplace phishing experiences from Canada, particularly around smishing and sophisticated fake OneDrive attacks. Learned about the importance of context in phishing detection and how work environments affect user vigilance.
+
+### Mahdia
+
+Provided insights from Portugal's phishing landscape and emphasized sophisticated linguistic manipulation techniques. Developed an understanding of the trust-building language patterns used by scammers.
+
+### Ahmad
+
+Shared experiences with workplace IT phishing and personal prize scams, highlighting the difference in linguistic tones across contexts. Contributed to understanding of professional vs. personal phishing approaches.
+
+### Semira
+
+Brought cybersecurity workshop knowledge and experience with fake tax/delivery scams. Developed expertise in psychological manipulation tactics and verification processes.
+
+### Musab
+
+Emphasized the emotional and legal aspects of phishing from a US perspective. Contributed to understanding the broader impact beyond technical detection alone.
+
+---
+
+## Impact on Next Milestones
+
+This domain study established a solid foundation for:
+
+- Data collection strategy focusing on email-based phishing
+- Feature engineering approach for linguistic analysis
+- Understanding which psychological manipulation tactics to detect
+- Framework for interpreting results in broader phishing context