Define the deployment boundary first, then choose the model

Boundary before model

Model choice should follow the deployment boundary.

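A model can look strong in isolation and still be a poor fit for a real system if it cannot run where the workflow needs to live, cannot use the permitted data path, or cannot be handed over to the customer team in a maintainable way.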

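Before evaluating models, define the operating shape:

Where the model is allowed to run
Which data may enter or leave the runtime
Who may trigger, review, or maintain the workflow
What resource ceiling the environment imposes
Which evidence and handoff notes must remain after delivery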

The boundary is not paperwork. It is part of the technical design.

What the boundary includes

Data movement

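Define which materials may be copied, transformed, cached, exported, or kept local.

Source material permitted for model work
Temporary files and intermediate artifacts
Logs, prompts, outputs, and review material
Deletion, return, or retention expectations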
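The data movement rules above can be written down as a small policy table that the workflow consults before moving anything. The following is a minimal sketch, not a prescribed schema: the material names, the `Movement` values, and the rules themselves are hypothetical placeholders, and a real boundary would come from the customer agreement. Unknown materials are denied by default.

```python
from dataclasses import dataclass
from enum import Enum


class Movement(Enum):
    COPY = "copy"
    TRANSFORM = "transform"
    CACHE = "cache"
    EXPORT = "export"


@dataclass(frozen=True)
class DataRule:
    """Allowed movements for one class of material (hypothetical schema)."""
    material: str
    allowed: frozenset


# Illustrative rules only; these values are assumptions, not recommendations.
RULES = {
    "source_material": DataRule(
        "source_material", frozenset({Movement.COPY, Movement.TRANSFORM})
    ),
    "temporary_files": DataRule("temporary_files", frozenset({Movement.CACHE})),
    "logs": DataRule("logs", frozenset({Movement.COPY, Movement.EXPORT})),
}


def is_allowed(material: str, movement: Movement) -> bool:
    """Return True only if the material is known and the movement is listed."""
    rule = RULES.get(material)
    return rule is not None and movement in rule.allowed


if __name__ == "__main__":
    print(is_allowed("logs", Movement.EXPORT))
    print(is_allowed("source_material", Movement.EXPORT))
```

Keeping the rules in one table means a reviewer can read the data boundary without reading the workflow code.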

Runtime location

Define where the model and its supporting workflow can actually run.

Local workstation
On-premise machine
Customer-controlled private environment
Restricted or disconnected environment
Scoped remote runtime, if permitted

Access path

Define who may operate the workflow, and through which interface.

Operator access
Admin access
Review access
Approval points before sensitive actions
Handoff responsibility after first delivery

Resource ceiling

Before choosing a model, define the practical operating limits.

Available memory or VRAM
Acceptable latency
Context length requirements
Storage and cache limits
Batch, queue, or concurrency assumptions
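A quick arithmetic check shows how a memory ceiling narrows the options before any benchmark is consulted. The sketch below estimates memory for model weights only (parameter count times bits per weight) and compares it with the available VRAM. The 20% headroom reserved for activations, cache, and runtime overhead is an assumption made for illustration, and the function names are hypothetical; real usage depends on the runtime and context length.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_ceiling(
    params_billions: float,
    bits_per_weight: int,
    vram_gib: float,
    headroom: float = 0.2,  # assumed reserve, not a measured figure
) -> bool:
    """Check weights against the ceiling after reserving headroom."""
    usable = vram_gib * (1 - headroom)
    return weight_memory_gib(params_billions, bits_per_weight) <= usable


if __name__ == "__main__":
    # A 7-billion-parameter model on a 16 GiB device.
    print(round(weight_memory_gib(7, 16), 2))  # 13.04
    print(fits_ceiling(7, 16, 16))             # False: 13.04 > 12.8 usable
    print(fits_ceiling(7, 4, 16))              # True: about 3.26 GiB
```

Under these assumptions the 16-bit variant is ruled out by the ceiling alone, while a 4-bit quantized variant remains a candidate.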

Handoff and evidence path

Define what must remain understandable after implementation.

Evaluation examples
Run records
Known-limit notes
Rollback path
Operator notes and next-step constraints
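One way to keep this evidence understandable after handoff is to store each evaluation run as a plain, self-describing record. The sketch below is an assumption about what such a record might hold: the `RunRecord` fields and the example values are hypothetical, and a one-JSON-object-per-line log is just one convenient format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """One evaluation run, kept for handoff (hypothetical fields)."""
    model_path: str
    task_example: str
    passed: bool
    known_limits: list = field(default_factory=list)
    rollback_to: str = ""
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(record: RunRecord) -> str:
    """Serialize a record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = RunRecord(
        model_path="small-int4",          # placeholder name
        task_example="invoice-001",       # placeholder example id
        passed=True,
        known_limits=["tables over 50 rows"],
        rollback_to="previous-release",
    )
    print(to_json_line(record))
```

A record like this carries the known-limit notes and the rollback target alongside the result, so the next operator does not need the original implementer to interpret it.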

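How the boundary changes model choice

The boundary can shape the model decision more than a benchmark result does.

A model may be technically attractive, but it is not a fit if it pushes the environment past its operating limits, requires a data path the customer cannot use, or creates handoff complexity the team cannot maintain.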

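Boundary decisions affect:

Whether the model should run locally or in a scoped remote runtime
Whether quantization is needed
Whether adapter-based work is preferable to full model replacement
Whether evaluation should focus on latency, behavior, format stability, or reviewability
Whether deployment needs a rollback path in place before user-facing activation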

Choose from the models that can live inside the boundary. Do not define the boundary after choosing the model.
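The ordering matters in practice: filter by the boundary first, then rank what remains. The sketch below illustrates this with hypothetical `Boundary` and `Candidate` types. The candidate names and all numbers are invented for illustration and do not describe any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boundary:
    allow_remote: bool
    vram_gib: float
    max_latency_ms: float


@dataclass(frozen=True)
class Candidate:
    name: str
    needs_remote: bool
    memory_gib: float
    latency_ms: float
    benchmark_score: float


def shortlist(candidates, boundary):
    """Keep candidates that fit the boundary, then rank by benchmark score."""
    fitting = [
        c for c in candidates
        if (boundary.allow_remote or not c.needs_remote)
        and c.memory_gib <= boundary.vram_gib
        and c.latency_ms <= boundary.max_latency_ms
    ]
    return sorted(fitting, key=lambda c: c.benchmark_score, reverse=True)


# Invented values, for illustration only.
CANDIDATES = [
    Candidate("large-hosted", True, 0.0, 400.0, 0.90),
    Candidate("mid-fp16", False, 26.0, 900.0, 0.82),
    Candidate("small-int4", False, 4.0, 300.0, 0.74),
]
BOUNDARY = Boundary(allow_remote=False, vram_gib=16.0, max_latency_ms=500.0)

if __name__ == "__main__":
    print([c.name for c in shortlist(CANDIDATES, BOUNDARY)])  # ['small-int4']
```

In this example the highest-scoring candidate is excluded because it needs a remote runtime the boundary does not allow, and the benchmark score only decides among the models that remain.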

Review questions

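Where is the model allowed to run?
Which source materials may enter the runtime?
Which outputs, logs, or temporary files may leave the environment?
Who may trigger inference, evaluation, or adaptation?
Which hardware, memory, and latency limits matter?
What needs customer review before reuse or deployment?
If behavior changes, what is the rollback path?
Who owns operation after handoff?

If these answers are unclear, a model shortlist is premature.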
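The review questions can double as a gate: no shortlist until every question has an answer. The following sketch assumes the answers are collected as free-text strings in a dictionary. The key names are hypothetical labels for the eight questions above.

```python
# Hypothetical keys, one per review question.
REVIEW_QUESTIONS = (
    "runtime_location",
    "inbound_material",
    "outbound_material",
    "trigger_rights",
    "resource_limits",
    "customer_review",
    "rollback_path",
    "post_handoff_owner",
)


def unanswered(answers: dict) -> list:
    """Return the questions with a missing or blank answer, in order."""
    return [q for q in REVIEW_QUESTIONS if not (answers.get(q) or "").strip()]


def ready_for_shortlist(answers: dict) -> bool:
    """A shortlist is premature while any question is unanswered."""
    return not unanswered(answers)


if __name__ == "__main__":
    answers = {q: "documented" for q in REVIEW_QUESTIONS}
    answers["rollback_path"] = ""  # still open
    print(unanswered(answers))           # ['rollback_path']
    print(ready_for_shortlist(answers))  # False
```

The check does not judge the quality of an answer, only its presence. Whether an answer is adequate remains a matter for review.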

A useful pattern

1. Boundary

Define the deployment environment, data movement limits, access path, resource ceiling, and handoff expectations.

2. Candidate model path

Screen for models, adapters, runtimes, and quantization choices that fit within that boundary.

3. Evaluation evidence

Test the candidate path with task examples, known-limit cases, latency expectations, and review criteria.

This keeps model work tied to an environment that can actually be operated.

Takeaway

Do not start with "the best model".

Start from the deployment boundary:

Data movement
Runtime location
Access path
Resource ceiling
Evidence and handoff

Then choose a model path that can run, be reviewed, and be maintained within this shape.