<p>AI company Together recently disclosed that it has cut inference latency by 50-100 milliseconds in its production environment. It achieved this through quantization and smart decoding methods, which also lowered per-token costs by up to a factor of five. These improvements matter for both the performance and the cost-effectiveness of AI solutions.</p>

New GPU tactics reduce AI inference costs by 40% - latency cut by 50-100ms, per-token costs down up to 5x through quantization and decoding strategies.

Unusual Whales

Semiconductor Bull 3X - Direxion

C3.ai

S&P Software & Services ETF - SPDR

Global Tech ETF - iShares

Cloud Computing ETF - GlobalX

PHLX Semiconductor ETF - iShares

YieldMax AI Option Income Strategy ETF

S&P Semiconductor ETF - SPDR

North American Tech-Software ETF - iShares

Dynamic Semiconductors ETF - Invesco

Arrive AI

Semiconductor ETF - VanEck Vectors
