Deploy SageMaker AI inference endpoints with reserved GPU capacity using training plans

Deploying large language models (LLMs) for inference requires reliable GPU capacity, especially during critical evaluation periods, limited-duration production testing, or burst workloads. Capacity constraints can delay deployments and impact application performance. Customers can use Amazon SageMaker AI training plans to reserve compute capacity for specified time periods. Originally designed for training workloads, training plans now … Read more

Accelerating custom entity recognition with Claude tool use in Amazon Bedrock

Businesses across industries face a common challenge: how to efficiently extract valuable information from vast amounts of unstructured data. Traditional approaches often involve resource-intensive processes and inflexible models. This post introduces a game-changing solution: Claude tool use in Amazon Bedrock, which uses the power of large language models (LLMs) to perform dynamic, adaptable entity recognition … Read more
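
As a rough sketch of the tool-use pattern the post describes, a tool definition can steer the model into returning entities as structured JSON rather than free text. The tool name, schema, and entity types below are hypothetical illustrations, not taken from the post:

```python
# Hypothetical tool definition for structured entity extraction via tool use.
# Because the model is asked to "call" this tool, its output must conform
# to the JSON schema instead of arriving as free-form prose.
extract_entities_tool = {
    "toolSpec": {
        "name": "extract_entities",
        "description": "Record every named entity found in the input text.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "entities": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "text": {"type": "string"},
                                "type": {"type": "string"},  # e.g. PERSON, ORG, DATE
                            },
                            "required": ["text", "type"],
                        },
                    }
                },
                "required": ["entities"],
            }
        },
    }
}
```

With the Amazon Bedrock Converse API, a definition shaped like this is passed in the request's `toolConfig` under `"tools"`; the model's tool-call arguments then arrive already parsed against the schema.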

How Reco transforms security alerts using Amazon Bedrock

This post is cowritten by Tal Shapira and Tamir Friedman from Reco. Reco helps organizations strengthen the security of their software as a service (SaaS) applications and accelerate business without compromise. Using Anthropic Claude in Amazon Bedrock, Reco tackles the challenge of machine-readable security alerts that SOC teams struggle to quickly interpret. This implementation helps … Read more

Integrating Amazon Bedrock AgentCore with Slack

Integrating Amazon Bedrock AgentCore with Slack brings AI agents directly into your workspace. Your teams can interact with agents without jumping between applications, losing conversation history, or re-authenticating. The integration handles three technical requirements: validating Slack event requests for security, maintaining conversation context across threads, and managing responses that exceed Slack’s timeout limits. Developers typically … Read more
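
The first of those requirements, validating Slack event requests, follows Slack's published request-signing scheme: an HMAC-SHA256 over a versioned base string, compared in constant time against the `X-Slack-Signature` header. A minimal sketch, using a made-up signing secret:

```python
import hashlib
import hmac
import time

def verify_slack_request(signing_secret: str, timestamp: str, body: str, signature: str) -> bool:
    """Validate a Slack event request per Slack's request-signing spec."""
    # Reject stale requests to limit replay attacks (5-minute window).
    if abs(time.time() - int(timestamp)) > 60 * 5:
        return False
    # Slack signs "v0:<X-Slack-Request-Timestamp>:<raw request body>".
    base = f"v0:{timestamp}:{body}"
    expected = "v0=" + hmac.new(
        signing_secret.encode(), base.encode(), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison against the X-Slack-Signature header value.
    return hmac.compare_digest(expected, signature)
```

In a real handler, `timestamp` and `signature` come from the `X-Slack-Request-Timestamp` and `X-Slack-Signature` headers, and `body` must be the raw, unparsed request body.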

Overcoming LLM hallucinations in regulated industries: Artificial Genius’s deterministic models on Amazon Nova

This post is cowritten by Paul Burchard and Igor Halperin from Artificial Genius. The proliferation of large language models (LLMs) presents a significant paradox for highly regulated industries like financial services and healthcare. The ability of these models to process complex, unstructured information offers transformative potential for analytics, compliance, and risk management. However, their inherent … Read more

Run NVIDIA Nemotron 3 Super on Amazon Bedrock

Nemotron 3 Super is now available as a fully managed and serverless model on Amazon Bedrock, joining the Nemotron Nano models that are already available within the Amazon Bedrock environment. With NVIDIA Nemotron open models on Amazon Bedrock, you can accelerate innovation and deliver tangible business value without managing infrastructure complexities. You can power your … Read more

Use RAG for video generation using Amazon Bedrock and Amazon Nova Reel

Generating high-quality custom videos remains a significant challenge, because video generation models are limited to their pre-trained knowledge. This limitation affects industries such as advertising, media production, education, and gaming, where customization and control of video generation are essential. To address this, we developed a Video Retrieval Augmented Generation (VRAG) multimodal pipeline that transforms structured … Read more

Introducing V-RAG: revolutionizing AI-powered video production with Retrieval Augmented Generation

A key development in generative AI is AI-powered video generation. Before AI, creating dynamic video content required extensive resources, technical expertise, and significant manual effort. Today, AI models can generate videos from simple inputs, but organizations still face challenges like unpredictable results. This post introduces Video Retrieval-Augmented Generation (V-RAG), an approach to help improve video … Read more

Enhanced metrics for Amazon SageMaker AI endpoints: deeper visibility for better performance

Running machine learning (ML) models in production requires more than just infrastructure resilience and scaling efficiency. You need nearly continuous visibility into performance and resource utilization. When latency increases, invocations fail, or resources become constrained, you need immediate insight to diagnose and resolve issues before they impact your customers. Until now, Amazon SageMaker AI provided … Read more

Enforce data residency with Amazon Quick extensions for Microsoft Teams

Organizations with users in multiple geographies face data residency requirements such as General Data Protection Regulation (GDPR) in Europe, country-specific data sovereignty laws, and internal compliance policies. Amazon Quick with Microsoft 365 extensions supports Regional routing to meet these requirements. Amazon Quick supports multi-Region deployments so you can route users to AWS Region-specific Amazon Quick … Read more
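
Regional routing of this kind ultimately reduces to mapping a user's home geography to a Region-specific endpoint. A trivial sketch with hypothetical Region choices (a real deployment would derive the mapping from identity provider attributes, not a hardcoded table):

```python
# Hypothetical geography-to-Region routing table for data residency.
REGION_BY_GEO = {
    "EU": "eu-central-1",    # GDPR: keep EU user data in an EU Region
    "US": "us-east-1",
    "APAC": "ap-southeast-1",
}

def region_for(geo: str, default: str = "us-east-1") -> str:
    """Return the AWS Region a user's requests should be routed to."""
    return REGION_BY_GEO.get(geo, default)
```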

Kick off Nova customization experiments using Nova Forge SDK

With a wide array of Nova customization offerings, getting started with customization and transitioning between platforms has traditionally been intricate, requiring technical expertise, infrastructure setup, and considerable time investment. This disconnect between potential and practical applications is precisely what we aimed to address. Nova Forge SDK makes large language model (LLM) customization accessible, empowering teams … Read more

Introducing Nova Forge SDK, a seamless way to customize Nova models for enterprise AI

Large language models (LLMs) have transformed how we interact with AI, but one size doesn’t fit all. Out-of-the-box LLMs are trained with broad, general knowledge and optimized for a wide range of use cases, but they often fall short when it comes to domain-specific tasks, proprietary workflows, or unique business requirements. Enterprise customers increasingly … Read more

Evaluating AI agents for production: A practical guide to Strands Evals

Moving AI agents from prototypes to production surfaces a challenge that traditional testing is unable to address. Agents are flexible, adaptive, and context-aware by design, but the same qualities that make them powerful also make them difficult to evaluate systematically. Traditional software testing relies on deterministic outputs: same input, same expected output, every time. AI … Read more

Build an AI-powered A/B testing engine using Amazon Bedrock

Organizations commonly rely on A/B testing to optimize user experience, messaging, and conversion flows. However, traditional A/B testing assigns users randomly and requires weeks of traffic to reach statistical significance. While effective, this process can be slow and might not fully leverage early signals in user behavior. This post shows you how to build an … Read more
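
One common way to act on early signals instead of a fixed random split is Thompson sampling over each variant's conversion rate; traffic shifts toward the better variant as evidence accumulates. This is a generic toy sketch, not the post's implementation:

```python
import random

def choose_variant(stats: dict) -> str:
    """Pick a variant by Thompson sampling: draw from each arm's
    Beta posterior over conversion rate and serve the best draw."""
    draws = {
        name: random.betavariate(
            s["conversions"] + 1,                      # alpha: successes + 1
            s["impressions"] - s["conversions"] + 1,   # beta: failures + 1
        )
        for name, s in stats.items()
    }
    return max(draws, key=draws.get)

# Variant B converts at twice A's rate, so it wins most draws.
stats = {
    "A": {"impressions": 1000, "conversions": 30},
    "B": {"impressions": 1000, "conversions": 60},
}
picks = [choose_variant(stats) for _ in range(2000)]
```

Unlike a fixed 50/50 split, this allocation is itself adaptive: poorly performing variants receive less traffic long before a classical significance threshold is reached.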

How Bark.com and AWS collaborated to build a scalable video generation solution

This post is cowritten with Hammad Mian and Joonas Kukkonen from Bark.com. When scaling video content creation, many companies face the challenge of maintaining quality while reducing production time. This post demonstrates how Bark.com and AWS collaborated to solve this problem, showing you a replicable approach for AI-powered content generation. Bark.com used Amazon SageMaker and … Read more

Migrate from Amazon Nova 1 to Amazon Nova 2 on Amazon Bedrock

If you’re running Amazon Nova 1 models on Amazon Bedrock, you might be looking to expand your context window size, deepen reasoning capabilities, or integrate external tools for web search and code execution. Amazon Nova 2 models address these constraints while improving performance on reasoning, agentic AI, and tool use benchmarks. Amazon Nova 2 Lite … Read more

AWS AI League: Atos fine-tunes approach to AI education

This post is co-written with Mark Ross from Atos. Organizations pursuing AI transformation can face a familiar challenge: how to upskill their workforce at scale in a way that changes how teams build, deploy, and use AI. Traditional AI training approaches—online courses, certification programs, and classroom-based instruction—are necessary, but often insufficient. While they build foundational … Read more

AWS and NVIDIA deepen strategic collaboration to accelerate AI from pilot to production

AI is moving fast, and for most of our customers, the real opportunity isn’t in experimenting with it—it’s in running AI in production where it drives meaningful business outcomes. This means building systems that run reliably, perform at scale, and meet your organization’s security and compliance requirements. Today at NVIDIA GTC 2026, AWS and NVIDIA … Read more

Introducing Disaggregated Inference on AWS powered by llm-d

We thank Greg Pereira and Robert Shaw from the llm-d team for their support in bringing llm-d to AWS. In the agentic and reasoning era, large language models (LLMs) generate 10x more tokens, and consume correspondingly more compute, through complex reasoning chains than they do for single-shot replies. Agentic AI workflows also create highly variable demands and another exponential increase in … Read more

How Workhuman built multi-tenant self-service reporting using Amazon Quick Sight embedded dashboards

This post is cowritten with Ilija Subanovic and Michael Rice from Workhuman. Workhuman’s customer service and analytics team was drowning in one-time reporting requests from seven million users worldwide—a common challenge with legacy reporting tools at scale. Business intelligence (BI) admins faced mounting pressure as their teams became overwhelmed with these requests. By rebuilding their … Read more

Build an offline feature store using Amazon SageMaker Unified Studio and SageMaker Catalog

Building and managing machine learning (ML) features at scale is one of the most critical and complex challenges in modern data science workflows. Organizations often struggle with fragmented feature pipelines, inconsistent data definitions, and redundant engineering efforts across teams. Without a centralized system for storing and reusing features, models risk being trained on outdated or … Read more

P-EAGLE: Faster LLM inference with Parallel Speculative Decoding in vLLM

EAGLE is the state-of-the-art method for speculative decoding in large language model (LLM) inference, but its autoregressive drafting creates a hidden bottleneck: the more tokens you speculate, the more sequential forward passes the drafter needs, and eventually that overhead eats into your gains. P-EAGLE removes this ceiling by generating all K draft tokens in a … Read more
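
The draft-then-verify loop behind speculative decoding can be shown with toy stand-in "models" over integers: a cheap drafter proposes K tokens, and the target model checks them, keeping the longest correct prefix plus one corrected token. This is a schematic of speculative decoding in general, not of P-EAGLE itself:

```python
import random

def target_next(token: int) -> int:
    """Stand-in for the expensive target model: deterministic next-token rule."""
    return (token * 3 + 1) % 17

def draft_next(token: int) -> int:
    """Stand-in for the cheap drafter: right most of the time, sometimes wrong."""
    nxt = target_next(token)
    return nxt if random.random() < 0.8 else (nxt + 1) % 17

def speculative_step(last: int, k: int = 4) -> list[int]:
    """Draft k tokens, verify them against the target model, and emit the
    target-correct tokens up to and including the first mismatch position.
    Output is therefore always identical to ordinary decoding."""
    drafts = []
    t = last
    for _ in range(k):
        t = draft_next(t)
        drafts.append(t)
    accepted = []
    t = last
    for d in drafts:
        correct = target_next(t)
        accepted.append(correct)  # target's verdict for this position
        t = correct
        if d != correct:
            break                 # first mismatch: discard the remaining drafts
    return accepted
```

The speedup comes from verifying all k positions in one target forward pass instead of k; the sequential cost EAGLE pays is in the drafting loop, which is exactly the part P-EAGLE parallelizes.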

Improve operational visibility for inference workloads on Amazon Bedrock with new CloudWatch metrics for TTFT and Estimated Quota Consumption

As organizations scale their generative AI workloads on Amazon Bedrock, operational visibility into inference performance and resource consumption becomes critical. Teams running latency-sensitive applications must understand how quickly models begin generating responses. Teams managing high-throughput workloads must understand how their requests consume quota so they can avoid unexpected throttling. Until now, gaining this visibility required … Read more
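
Once published, metrics like these are queried through CloudWatch's `GetMetricData` API. The sketch below builds such a request; the `AWS/Bedrock` namespace follows CloudWatch convention, but the metric name `TimeToFirstToken` and the `ModelId` dimension are assumptions for illustration — check the console for the names the new metrics actually ship with:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Request the p90 time-to-first-token over the last hour in 5-minute buckets.
params = {
    "StartTime": now - timedelta(hours=1),
    "EndTime": now,
    "MetricDataQueries": [
        {
            "Id": "ttft_p90",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Bedrock",
                    "MetricName": "TimeToFirstToken",  # assumed metric name
                    "Dimensions": [{"Name": "ModelId", "Value": "my-model-id"}],
                },
                "Period": 300,   # seconds per datapoint
                "Stat": "p90",   # CloudWatch percentile statistic
            },
        }
    ],
}
# With boto3 this would be submitted as:
#   cloudwatch = boto3.client("cloudwatch")
#   response = cloudwatch.get_metric_data(**params)
```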

Secure AI agents with Policy in Amazon Bedrock AgentCore

Deploying AI agents safely in regulated industries is challenging. Without proper boundaries, agents that access sensitive data or execute transactions can pose significant security risks. Unlike traditional software, an AI agent chooses actions to achieve a goal by invoking tools, accessing data, and adapting its reasoning using data from its environment and users. This autonomy … Read more

Multimodal embeddings at scale: AI data lake for media and entertainment workloads

This post shows you how to build a scalable multimodal video search system that enables natural language search across large video datasets using Amazon Nova models and Amazon OpenSearch Service. You will learn how to move beyond manual tagging and keyword-based searches to enable semantic search that captures the full richness of video content. We … Read more
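
At its core, semantic search of this kind is nearest-neighbor lookup over embedding vectors: the query and each clip are embedded into the same space, and the closest clips win. A dependency-free sketch with made-up 3-dimensional vectors (in the post's architecture, the embeddings come from Amazon Nova models and live in OpenSearch, not a Python dict):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k clip IDs whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)
    return ranked[:k]

# Toy index with hypothetical clip IDs and fabricated embeddings.
index = {
    "clip-sunset": [0.9, 0.1, 0.0],
    "clip-meeting": [0.1, 0.9, 0.1],
    "clip-soccer": [0.0, 0.2, 0.9],
}
```

A vector store like OpenSearch replaces the exhaustive `sorted` scan with an approximate nearest-neighbor index, which is what makes this viable over large video datasets.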

Fine-tuning NVIDIA Nemotron Speech ASR on Amazon EC2 for domain adaptation

This post is a collaboration between AWS, NVIDIA, and Heidi. Automatic speech recognition (ASR), often called speech-to-text (STT), is becoming increasingly critical across industries like healthcare, customer service, and media production. While pre-trained models offer strong capabilities for general speech, fine-tuning for specific domains and use cases can enhance accuracy and performance. In this post, … Read more

Accelerate custom LLM deployment: Fine-tune with Oumi and deploy to Amazon Bedrock

This post is cowritten by David Stewart and Matthew Persons from Oumi. Fine-tuning open source large language models (LLMs) often stalls between experimentation and production. Training configurations, artifact management, and scalable deployment each require different tools, creating friction when moving from rapid experimentation to secure, enterprise-grade environments. In this post, we show how to fine-tune … Read more

Run NVIDIA Nemotron 3 Nano as a fully managed serverless model on Amazon Bedrock

This post is cowritten with Abdullahi Olaoye, Curtice Lockhart, and Nirmal Kumar Juluru from NVIDIA. We are excited to announce that NVIDIA’s Nemotron 3 Nano is now available as a fully managed and serverless model in Amazon Bedrock. This follows our earlier announcement at AWS re:Invent supporting NVIDIA Nemotron 2 Nano 9B and NVIDIA Nemotron 2 … Read more