Skip to content

Commit ebab6b3

Browse files
Qwen QA Android example using ONNX Runtime (#521)
1 parent f83e708 commit ebab6b3

File tree

74 files changed

+153720
-0
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

74 files changed

+153720
-0
lines changed
Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
*.iml
2+
.gradle
3+
/local.properties
4+
/.idea/caches
5+
/.idea/libraries
6+
/.idea/modules.xml
7+
/.idea/workspace.xml
8+
/.idea/navEditor.xml
9+
/.idea/assetWizardSettings.xml
10+
.DS_Store
11+
/build
12+
/captures
13+
.externalNativeBuild
14+
.cxx
15+
local.properties
Lines changed: 119 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,119 @@
1+
# Local Qwen LLM on Android
2+
3+
This example shows how to run Qwen2.5-0.5B-Instruct and Qwen3-0.6B entirely on an Android device using ONNX Runtime.
4+
All tokens are generated offline on the phone no network calls, no telemetry.
5+
6+
---
7+
8+
## Key features
9+
10+
- On-device inference with the official onnxruntime-android.
11+
- Tokenizer compatibility – reads the Hugging Face-standard tokenizer.json shipped with Qwen.
12+
- Prompt formatting for Qwen 2.5 and Qwen 3, including the **Thinking Mode** toggle supported by Qwen3.
13+
- Streaming generation with past-KV caching for smooth, low-latency text output (see [OnnxModel.kt](app/src/main/java/com/example/local_llm/OnnxModel.kt)).
14+
- Output supports Markdown — copy and reuse formatted answers anywhere.
15+
16+
17+
---
18+
19+
## 📸 Inference Preview
20+
21+
<p align="center">
22+
<img src="demo/Demo.gif" alt="Model Output 2" width="25%" style="margin: 1%"/>
23+
<img src="demo/Demo2.gif" alt="Input Prompt" width="25%" style="margin: 1%"/>
24+
<img src="demo/Qwen3demo.gif" alt="Input Prompt" width="25%" style="margin: 1%"/>
25+
</p>
26+
27+
<p align="center">
28+
<em>Figure: App interface showing prompt input and generated answers using the local LLM.</em>
29+
</p>
30+
31+
---
32+
33+
## Model Info
34+
35+
This app supports both **Qwen2.5-0.5B-Instruct** and **Qwen3-0.6B** — optimized for instruction-following, QA, and reasoning tasks.
36+
37+
### Option 1: Use Preconverted ONNX Model
38+
39+
Download the `model.onnx` and `tokenizer.json` from Hugging Face:
40+
41+
- 🔹 [Qwen2.5](https://huggingface.co/onnx-community/Qwen2.5-0.5B-Instruct)
42+
- 🔹 [Qwen3](https://huggingface.co/onnx-community/Qwen3-0.6B-ONNX)
43+
44+
- You can also use quantized models (e.g., `model_q4fp16.onnx`) for faster, lighter inference with minimal accuracy loss.
45+
46+
### ⚙️ Option 2: Convert Model Yourself
47+
48+
```bash
49+
pip install optimum[onnxruntime]
50+
# or
51+
python -m pip install git+https://github.com/huggingface/optimum.git
52+
```
53+
54+
Export the model:
55+
56+
```bash
57+
optimum-cli export onnx --model Qwen/Qwen2.5-0.5B-Instruct qwen2.5-0.5B-onnx/
58+
```
59+
60+
- You can also convert any fine-tuned variant by specifying the model path.
61+
- Learn more about [Optimum here](https://huggingface.co/docs/optimum/main/en/index).
62+
63+
---
64+
65+
## ⚙️ Requirements
66+
67+
- [Android Studio](https://developer.android.com/studio)
68+
- [ONNX Runtime for Android](https://github.com/microsoft/onnxruntime-genai/releases) (already included in this repo).
69+
- A physical Android device for deployment and testing, ≥ 4 GB RAM for FP16 / Q4 models, ≥ 6 GB RAM for FP32 models.
70+
- Real hardware preferred—emulators are acceptable for UI checks only.
71+
72+
---
73+
#### Choose which Qwen model to run
74+
75+
In[MainActivity.kt](app/src/main/java/com/example/local_llm/MainActivity.kt) you will find two pre-defined `ModelConfig` objects:
76+
77+
```kotlin
78+
val modelconfigqwen25 =// Qwen 2.5-0.5B
79+
val modelconfigqwen3 =// Qwen 3-0.6B
80+
````
81+
Right below them is a single line that tells the app which one to use:
82+
83+
````kotlin
84+
val config = modelconfigqwen25 // ← change to modelconfigqwen3 for Qwen 3
85+
````
86+
87+
## How to Build & Run
88+
89+
1. Open Android Studio and create a new project (Empty Activity).
90+
2. Name your app `local_llm`.
91+
3. Copy all the project files from `Qwen_QA/Android` into the appropriate folders.
92+
4. Place your `model.onnx` and `tokenizer.json` in:
93+
```
94+
app/src/main/assets/
95+
```
96+
5. Connect your Android phone using wireless debugging or USB.
97+
6. To install:
98+
- Press Run ▶️ in Android Studio, **or**
99+
- Go to **Build → Generate Signed Bundle / APK** to export the `.apk` file.
100+
7. Once installed, look for the **Pocket LLM** icon&nbsp;
101+
<img src="demo/pocket_llm_icon.png" alt="Pocket LLM icon" width="28" style="vertical-align:middle;border-radius:100%"/>
102+
on your home screen.
103+
104+
**Note**: All Kotlin files are declared in the package com.example.local_llm, and the Gradle script sets applicationId "com.example.local_llm".
105+
If you name the app (or change the package) to anything other than local_llm, you must refactor:
106+
- The directory structure in app/src/main/java/...,
107+
- Every package com.example.local_llm line, and
108+
- The applicationId in app/build.gradle.
109+
- Otherwise, Android Studio will raise “package … does not exist” errors and the project will fail to compile.
110+
----
111+
112+
## Customize Your App Experience with These
113+
- Define the assistant’s tone and role by setting defaultSystemPrompt (in your model config).
114+
- Adjust TEMPERATURE to control response randomness — lower for accuracy, higher for creativity ([OnnxModel.kt](app/src/main/java/com/example/local_llm/OnnxModel.kt)).
115+
- Use REPETITION_PENALTY to avoid repetitive answers and improve fluency ([OnnxModel.kt](app/src/main/java/com/example/local_llm/OnnxModel.kt)).
116+
- Change MAX_TOKENS to limit or expand the length of generated replies ([OnnxModel.kt](app/src/main/java/com/example/local_llm/OnnxModel.kt)).
117+
118+
### 📄 License Notice
119+
Note: These ONNX models are based on Qwen, which is licensed under the [Apache License 2.0](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct/blob/main/LICENSE).
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
/build
Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
plugins {
2+
alias(libs.plugins.android.application)
3+
alias(libs.plugins.kotlin.android)
4+
alias(libs.plugins.kotlin.compose)
5+
}
6+
7+
android {
8+
namespace = "com.example.local_llm"
9+
compileSdk = 35
10+
11+
defaultConfig {
12+
applicationId = "com.example.local_llm"
13+
minSdk = 24
14+
targetSdk = 35
15+
versionCode = 1
16+
versionName = "1.0"
17+
18+
testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
19+
}
20+
21+
buildTypes {
22+
release {
23+
isMinifyEnabled = false
24+
proguardFiles(
25+
getDefaultProguardFile("proguard-android-optimize.txt"),
26+
"proguard-rules.pro"
27+
)
28+
}
29+
}
30+
compileOptions {
31+
sourceCompatibility = JavaVersion.VERSION_11
32+
targetCompatibility = JavaVersion.VERSION_11
33+
}
34+
kotlinOptions {
35+
jvmTarget = "11"
36+
}
37+
buildFeatures {
38+
compose = true
39+
viewBinding = true
40+
}
41+
}
42+
43+
dependencies {
44+
45+
implementation(libs.androidx.core.ktx)
46+
implementation(libs.androidx.lifecycle.runtime.ktx)
47+
implementation(libs.androidx.activity.compose)
48+
implementation(platform(libs.androidx.compose.bom))
49+
implementation(libs.androidx.ui)
50+
implementation(libs.androidx.ui.graphics)
51+
implementation(libs.androidx.ui.tooling.preview)
52+
implementation(libs.androidx.material3)
53+
implementation(libs.onnxruntime.android)
54+
implementation(libs.androidx.appcompat)
55+
testImplementation(libs.junit)
56+
androidTestImplementation(libs.androidx.junit)
57+
androidTestImplementation(libs.androidx.espresso.core)
58+
androidTestImplementation(platform(libs.androidx.compose.bom))
59+
androidTestImplementation(libs.androidx.ui.test.junit4)
60+
debugImplementation(libs.androidx.ui.tooling)
61+
implementation (libs.json.json)
62+
implementation("androidx.constraintlayout:constraintlayout:2.1.4")
63+
implementation(files("libs/onnxruntime-genai-android-0.7.1.aar"))
64+
implementation("io.noties.markwon:core:4.6.2")
65+
}
Binary file not shown.
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
# Add project specific ProGuard rules here.
2+
# You can control the set of applied configuration files using the
3+
# proguardFiles setting in build.gradle.
4+
#
5+
# For more details, see
6+
# http://developer.android.com/guide/developing/tools/proguard.html
7+
8+
# If your project uses WebView with JS, uncomment the following
9+
# and specify the fully qualified class name to the JavaScript interface
10+
# class:
11+
#-keepclassmembers class fqcn.of.javascript.interface.for.webview {
12+
# public *;
13+
#}
14+
15+
# Uncomment this to preserve the line number information for
16+
# debugging stack traces.
17+
#-keepattributes SourceFile,LineNumberTable
18+
19+
# If you keep the line number information, uncomment this to
20+
# hide the original source file name.
21+
#-renamesourcefileattribute SourceFile
Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
<?xml version="1.0" encoding="utf-8"?>
2+
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
3+
xmlns:tools="http://schemas.android.com/tools">
4+
5+
<application
6+
android:allowBackup="true"
7+
android:dataExtractionRules="@xml/data_extraction_rules"
8+
android:fullBackupContent="@xml/backup_rules"
9+
android:icon="@mipmap/ic_launcher_2"
10+
android:label="@string/app_name"
11+
android:roundIcon="@mipmap/ic_launcher_2_round"
12+
android:supportsRtl="true"
13+
14+
android:theme="@style/Theme.local_llm"
15+
tools:targetApi="31">
16+
<activity
17+
android:name=".MainActivity"
18+
android:exported="true"
19+
android:label="@string/app_name"
20+
android:theme="@style/Theme.local_llm">
21+
<intent-filter>
22+
<action android:name="android.intent.action.MAIN" />
23+
24+
<category android:name="android.intent.category.LAUNCHER" />
25+
</intent-filter>
26+
</activity>
27+
</application>
28+
29+
</manifest>
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
### Add model.onnx and tokenizer.json in this folder

0 commit comments

Comments
 (0)