Concept

Here are some general concepts in the LLM ecosystem.

DFA Overview

Inference and Serving

Inference is running the model to produce outputs; serving refers to making that inference accessible as a network service, handling concerns such as request handling, batching, and scaling.
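Many serving stacks (vLLM among them) expose an OpenAI-compatible HTTP API. A minimal sketch of the request body a client would POST to such a service; the endpoint URL and model name are placeholders, and no request is actually sent here:

```python
import json

# Hypothetical local serving endpoint (placeholder URL).
ENDPOINT = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "my-model",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "max_tokens": 64,
}

# A client would POST this JSON body to ENDPOINT; here we only build it.
body = json.dumps(payload)
print(body)
```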

Session ID

LoRA

Low-Rank Adaptation (LoRA) fine-tunes a model by training small low-rank update matrices added to the frozen pretrained weights, greatly reducing the number of trainable parameters.
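The low-rank update can be sketched with plain NumPy: the frozen weight `W` is augmented with a trainable product `A @ B` scaled by `alpha / r` (the names follow the LoRA paper; the dimensions below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 16, 8, 4, 8   # illustrative sizes; r << d_in, d_out

W = rng.normal(size=(d_in, d_out))     # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01  # trainable down-projection
B = np.zeros((r, d_out))               # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus scaled low-rank update; only A and B are trained.
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(2, d_in))
# With B zero-initialized, the adapted layer matches the base layer exactly.
assert np.allclose(lora_forward(x), x @ W)
```

Because `B` starts at zero, training begins from the unmodified base model, and only `A` and `B` (far fewer parameters than `W`) receive gradient updates.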

Input Enrichment

Embedding models translate the original query into a dense vector (embedding), which can then be compared against stored document embeddings to retrieve relevant context.
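A minimal sketch of retrieval by cosine similarity. The 3-dimensional vectors and document labels below are made up for illustration; a real system would get them from an embedding model and store them in a vector database:

```python
import numpy as np

# Toy corpus embeddings (in practice produced by an embedding model).
doc_vectors = np.array([
    [0.9, 0.1, 0.0],   # doc 0: "GPU serving"
    [0.0, 0.8, 0.2],   # doc 1: "vector search"
    [0.1, 0.1, 0.9],   # doc 2: "prompt caching"
])

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(query_vec):
    # Return the index of the most similar document embedding.
    sims = [cosine_sim(query_vec, d) for d in doc_vectors]
    return int(np.argmax(sims))

query = np.array([0.0, 0.9, 0.1])  # embedding of the user's query
print(nearest(query))  # → 1
```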

Vector database

Prompt Optimization

LangChain

LLM Cache
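One simple form of LLM cache is exact-match: responses are keyed by a hash of the prompt so repeated queries skip the model call. The `call_llm` stub below is hypothetical; production caches are often semantic, matching on embedding similarity rather than exact text:

```python
import hashlib

# Hypothetical expensive model call, stubbed for illustration.
def call_llm(prompt):
    return f"response to: {prompt}"

cache = {}

def cached_llm(prompt):
    # Exact-match cache keyed by a hash of the prompt text.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in cache:
        cache[key] = call_llm(prompt)
    return cache[key]

first = cached_llm("hi")
second = cached_llm("hi")  # served from cache, no second model call
```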

Content Classifier or Filter

A content classifier or filter detects harmful or policy-violating responses and removes them before they reach the user.
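A toy sketch of the filtering step, using a keyword blocklist; production systems use trained classifiers rather than regexes, and the blocked terms here are illustrative:

```python
import re

# Illustrative blocklist; real filters use trained safety classifiers.
BLOCKED = re.compile(r"\bbomb\b", re.IGNORECASE)

def filter_response(text):
    # Replace a flagged response with a refusal instead of returning it.
    if BLOCKED.search(text):
        return "[response withheld by content filter]"
    return text

print(filter_response("The weather is nice."))
```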

Feedback

OpenTelemetry

One Production-Level Implementation Overview

```mermaid
graph TB
    User -->|HTTPS| Nginx --> api[API Gateway] --> vLLM --> Redis --> OSS
    vLLM -->|Monitor| Prometheus --> Grafana
    api -->|Auth| Keycloak(RBAC)
```

References