AidGenSE Development Documentation
Introduction
AidGenSE is a generative AI HTTP service that wraps the AidGen SDK and exposes an OpenAI-compatible HTTP API. Developers can call generative AI models over HTTP and quickly integrate them into their applications.
💡Note
All large language models supported by Model Farm achieve inference acceleration on Qualcomm NPUs through AidGen.
Support Status
Model Format and Backend Support
| Model Format | CPU | GPU | NPU |
|---|---|---|---|
| .gguf | ✅ | ✅ | ❌ |
| .bin | ❌ | ❌ | ✅ |
| .aidem | ❌ | ❌ | ✅ |
✅: Supported ❌: Not supported
Operating System Support
| Linux | Android |
|---|---|
| ✅ | 🚧 |
✅: Supported 🚧: Planned support
AidGenSE Service Installation and Operation
Installation
# Install aidgen sdk
sudo aid-pkg update
sudo aid-pkg -i aidgense
Model Query & Retrieval
# View supported models
aidllm remote-list api
Example output:
Current Soc : 8550
Name                            Url                                   CreateTime
-----                           ---------                             ---------
qwen2.5-0.5B-Instruct-8550      aplux/qwen2.5-0.5B-Instruct-8550      2025-03-05 14:52:23
qwen2.5-3B-Instruct-8550        aplux/qwen2.5-3B-Instruct-8550        2025-03-05 14:52:37
Qwen2.5-VL-3B-392x392-8550      aplux/Qwen2.5-VL-3B-392x392-8550      2025-12-02 16:48:32
Qwen2.5-VL-3B-672x672-8550      aplux/Qwen2.5-VL-3B-672x672-8550      2025-12-02 16:48:05
Qwen2.5-VL-3B-Instruct-q4_k_m   aplux/Qwen2.5-VL-3B-Instruct-q4_k_m   2026-03-10 11:00:27
...
# Download model
aidllm pull api [Url] # aplux/qwen2.5-3B-Instruct-8550
# View downloaded models
aidllm list api
# Delete downloaded model
sudo aidllm rm api [Name] # qwen2.5-3B-Instruct-8550
Starting the Service
# Start OpenAI API service for the corresponding model
aidllm start api -m <model_name>
# Check status
aidllm status api
# Stop service
aidllm stop api
# Restart service
aidllm restart api
💡Note
Default port number is 8888
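Because the service listens on port 8888 by default, a client can derive the endpoint URL and confirm that the port is reachable before sending requests. The following is a minimal sketch, not part of AidGenSE: the helper names `chat_completions_url` and `is_port_open` are hypothetical, and the `/v1/chat/completions` path is the endpoint used by the chat examples in this document.

```python
import socket

DEFAULT_PORT = 8888  # default AidGenSE API port, as stated in the note above


def chat_completions_url(host="127.0.0.1", port=DEFAULT_PORT):
    """Build the chat completions endpoint URL (hypothetical helper)."""
    return f"http://{host}:{port}/v1/chat/completions"


def is_port_open(host="127.0.0.1", port=DEFAULT_PORT, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print(chat_completions_url())
    print("service reachable:", is_port_open())
```

A successful TCP connection only shows that something is listening on the port; `aidllm status api` remains the way to check the service itself.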
Chat Testing
Using Web UI for chat testing
# Install UI frontend service
sudo aidllm install ui
# Start UI service
aidllm start ui
# Check UI service status
aidllm status ui
# Stop UI service
aidllm stop ui
💡Note
After the UI service starts, open http://ip:51104 in a browser, replacing ip with the IP address of the device running the service.
Large Language Model Chat Testing
For programmatic integration, send an HTTP POST request with a messages array to the /v1/chat/completions endpoint. Set "stream": true to enable streaming output, which returns generated tokens one by one.
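With streaming enabled, the example that follows treats each non-empty response line as a JSON chunk prefixed with `data: ` and stops at `[DONE]`. That parsing step can be sketched as a pure function over any iterable of byte lines, which makes it testable without a running service. The function name `collect_stream_text` is hypothetical, and the chunk layout (`choices[0].delta.content`) is assumed to match what the example in this section reads.

```python
import json


def collect_stream_text(lines):
    """Concatenate the content deltas from an iterable of SSE byte lines."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # skip blank separator lines
        text = raw.decode("utf-8")
        if not text.startswith("data: "):
            continue  # ignore lines that are not data events
        data = text[len("data: "):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        try:
            chunk = json.loads(data)
        except json.JSONDecodeError:
            continue  # skip malformed chunks
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


if __name__ == "__main__":
    # Hand-written sample lines standing in for a real response stream.
    sample = [
        b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
        b"",
        b'data: {"choices": [{"delta": {"content": "lo"}}]}',
        b"data: [DONE]",
    ]
    print(collect_stream_text(sample))
```

With `requests`, the same function can be fed `response.iter_lines()` when the full reply is needed as a single string rather than printed token by token.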
Python API Call Example:
import json

import requests


def stream_chat_completion(messages, model="qwen2.5-3B-Instruct-8550"):
    url = "http://127.0.0.1:8888/v1/chat/completions"
    headers = {
        "Content-Type": "application/json"
    }
    payload = {
        "model": model,
        "messages": messages,
        "stream": True  # Enable streaming
    }
    # Make request with stream=True
    response = requests.post(url, headers=headers, json=payload, stream=True)
    response.raise_for_status()
    # Read line by line and parse SSE format
    for line in response.iter_lines():
        if not line:
            continue
        line_data = line.decode('utf-8')
        # Each SSE line starts with "data: " prefix
        if line_data.startswith("data: "):
            data = line_data[len("data: "):]
            # End marker
            if data.strip() == "[DONE]":
                break
            try:
                chunk = json.loads(data)
            except json.JSONDecodeError:
                # Print and skip when parsing fails
                print("Unable to parse JSON: ", data)
                continue
            # Extract model output token
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                print(content, end="", flush=True)


if __name__ == "__main__":
    # Example conversation
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello."}
    ]
    print("Assistant:", end=" ")
    stream_chat_completion(messages)
    print()  # New line
Vision Language Model Chat Testing
AidGenSE supports Vision Language Models (VLM), enabling image understanding and description. In messages, use a content array to pass both text and images: text is represented as {"type": "text", "text": "..."}, and images are passed as {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}} with base64-encoded data.
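The content array described above can be assembled by a small helper. This is a minimal sketch under stated assumptions: the function name `build_image_message` and its parameters are hypothetical, and it takes raw image bytes so that it can be exercised without a file on disk.

```python
import base64


def build_image_message(text, image_bytes, mime="image/jpeg"):
    """Return a user message holding one text part and one base64 image part."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for real JPEG data read from disk.
    message = build_image_message("Please describe this image.", b"\xff\xd8\xff")
    print(message["content"][1]["image_url"]["url"])
```

The returned dictionary can be appended to the messages list after the system message.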
Python API Call Example:
import base64
import json

import requests


def encode_image_to_base64(image_path):
    """Encode image file to base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def stream_chat_completion(messages, model="Qwen2.5-VL-3B-392x392-8550"):
    url = "http://127.0.0.1:8888/v1/chat/completions"
    headers = {
        "Content-Type": "application/json"
    }
    payload = {
        "model": model,
        "messages": messages,
        "stream": True  # Enable streaming
    }
    # Make request with stream=True
    response = requests.post(url, headers=headers, json=payload, stream=True)
    response.raise_for_status()
    # Read line by line and parse SSE format
    for line in response.iter_lines():
        if not line:
            continue
        line_data = line.decode('utf-8')
        # Each SSE line starts with "data: " prefix
        if line_data.startswith("data: "):
            data = line_data[len("data: "):]
            # End marker
            if data.strip() == "[DONE]":
                break
            try:
                chunk = json.loads(data)
            except json.JSONDecodeError:
                # Print and skip when parsing fails
                print("Unable to parse JSON: ", data)
                continue
            # Extract model output token
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                print(content, end="", flush=True)


if __name__ == "__main__":
    # Encode image to base64
    image_path = "/path/to/your/image.jpg"
    image_base64 = encode_image_to_base64(image_path)
    # Example conversation with image
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please describe the content of this image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/jpeg;base64," + image_base64
                    }
                }
            ]
        }
    ]
    print("Assistant:", end=" ")
    stream_chat_completion(messages)
    print()  # New line
Image Format Restrictions
- MIME Types: Only image/jpeg and image/png formats are supported. For PNG format, change the MIME in the URL from image/jpeg to image/png
- Encoding: Only base64 encoding is supported, format: data:image/jpeg;base64,<base64_string>
- Image Dimensions: Maximum single side 7680 pixels, total pixels ≤ 33177600 (approx. 8K UHD resolution), minimum 1×1 pixels
- Not Supported: Automatic image download from URL
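These restrictions can be checked on the client before an image is encoded and sent. The sketch below is illustrative only: the function name `check_image` is hypothetical, and the width and height are assumed to be known already (for example, read with an image library), so the code depends on nothing beyond the limits listed above.

```python
MAX_SIDE = 7680          # maximum length of a single side, in pixels
MAX_PIXELS = 33_177_600  # maximum total pixels (7680 x 4320)
SUPPORTED_MIME = {"image/jpeg", "image/png"}


def check_image(mime, width, height):
    """Return a list of violated restrictions; an empty list means the image passes."""
    problems = []
    if mime not in SUPPORTED_MIME:
        problems.append(f"unsupported MIME type: {mime}")
    if width < 1 or height < 1:
        problems.append("each side must be at least 1 pixel")
    if max(width, height) > MAX_SIDE:
        problems.append(f"a side exceeds {MAX_SIDE} pixels")
    if width * height > MAX_PIXELS:
        problems.append(f"total pixels exceed {MAX_PIXELS}")
    return problems


if __name__ == "__main__":
    print(check_image("image/jpeg", 7680, 4320))
    print(check_image("image/gif", 8000, 100))
```

Returning every violation at once, rather than raising on the first, lets a caller report all problems with an image in a single pass.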