Access Anthropic Claude models in India on Amazon Bedrock with Global cross-Region inference

Adoption of generative AI inference has increased as organizations build more operational workloads that use AI capabilities in production at scale. To help customers scale their generative AI applications, Amazon Bedrock offers cross-Region inference (CRIS) profiles. CRIS is a powerful feature that organizations can use to seamlessly distribute inference … Read more

Drive organizational growth with Amazon Lex multi-developer CI/CD pipeline

As your conversational AI initiatives evolve, developing Amazon Lex assistants becomes increasingly complex. Multiple developers working on the same shared Lex instance leads to configuration conflicts, overwritten changes, and slower iteration cycles. Scaling Amazon Lex development requires isolated environments, version control, and automated deployment pipelines. By adopting well-structured continuous integration and continuous delivery (CI/CD) practices, … Read more

Building a custom model provider for Strands Agents with LLMs hosted on SageMaker AI endpoints

Organizations increasingly deploy custom large language models (LLMs) on Amazon SageMaker AI real-time endpoints using their preferred serving frameworks—such as SGLang, vLLM, or TorchServe—to help gain greater control over their deployments, optimize costs, and align with compliance requirements. However, this flexibility introduces a critical technical challenge: response format incompatibility with Strands agents. While these custom … Read more

Embed Amazon Quick Suite chat agents in enterprise applications

Organizations face two critical challenges with conversational AI. First, users need answers where they work—in their CRM, support console, or analytics portal—not in separate tools. Second, implementing a secure embedded chat in their applications can require weeks of development to build authentication, token validation, domain security, and global distribution infrastructure. Amazon Quick Suite embedded … Read more

Unlock powerful call center analytics with Amazon Nova foundation models

Call center analytics play a crucial role in improving customer experience and operational efficiency. With foundation models (FMs), you can improve the quality and efficiency of call center operations and analytics. Organizations can use generative AI to assist human customer support agents and managers of contact center teams, so they can gain insights that are … Read more

How Ricoh built a scalable intelligent document processing solution on AWS

This post is cowritten by Jeremy Jacobson and Rado Fulek from Ricoh. This post demonstrates how enterprises can overcome document processing scaling limits by combining generative AI, serverless architecture, and standardized frameworks. Ricoh engineered a repeatable, reusable framework using the AWS GenAI Intelligent Document Processing (IDP) Accelerator. This framework reduced customer onboarding time from weeks … Read more

Building a scalable virtual try-on solution using Amazon Nova on AWS: part 1

In this first post in a two-part series, we examine how retailers can implement a virtual try-on to improve customer experience. In part 2, we will further explore real-world applications and benefits of this innovative technology. Every fourth piece of clothing bought online is returned to the retailer, feeding into America’s $890 billion returns problem … Read more

How Lendi revamped the refinance journey for its customers in 16 weeks using agentic AI on Amazon Bedrock

This post was co-written with Davesh Maheshwari from Lendi Group and Samuel Casey from Mantel Group. Most Australians don’t know whether their home loan is still competitive. Rates shift, property values move, personal circumstances change—yet for the average homeowner, staying informed of these changes is difficult. It’s often their largest financial commitment, but it’s also … Read more

How Tines enhances security analysis with Amazon Quick Suite

Organizations face challenges in quickly detecting and responding to user account security events, such as repeated login attempts from unusual locations. Although security data exists across multiple applications, manually correlating information and taking corrective action often delays effective response. With Amazon Quick Suite and Tines, you can automate the investigation and remediation process by integrating … Read more

Building specialized AI without sacrificing intelligence: Nova Forge data mixing in action

Large language models (LLMs) perform well on general tasks but struggle with specialized work that requires understanding proprietary data, internal processes, and industry-specific terminology. Supervised fine-tuning (SFT) adapts LLMs to these organizational contexts. SFT can be implemented through two distinct methodologies: Parameter-Efficient Fine-Tuning (PEFT), which updates only a subset of model parameters, offering faster training … Read more
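The PEFT idea mentioned above—updating only a subset of parameters—can be sketched with a toy LoRA-style low-rank adapter in pure Python. The shapes and values here are hypothetical illustrations, not the actual Nova Forge training setup; real adapters operate on transformer weight matrices.

```python
# Toy illustration of parameter-efficient fine-tuning (LoRA-style):
# instead of updating the full d x d weight matrix W, we train only a
# low-rank pair B (d x r) and A (r x d), so the number of trainable
# parameters drops from d*d to 2*d*r.

def matmul(X, Y):
    """Multiply two matrices given as lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def adapted_weight(W, B, A):
    """Effective weight after fine-tuning: W + B @ A."""
    delta = matmul(B, A)
    return [[W[i][j] + delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

d, r = 4, 1                       # hidden size 4, adapter rank 1 (toy values)
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base
B = [[0.5], [0.0], [0.0], [0.0]]  # trainable, d x r
A = [[0.0, 1.0, 0.0, 0.0]]        # trainable, r x d

full_params = d * d               # 16 parameters if W were fine-tuned directly
peft_params = 2 * d * r           # 8 parameters with the low-rank adapter
W_eff = adapted_weight(W, B, A)
print(full_params, peft_params)   # far fewer trainable parameters
print(W_eff[0])                   # first row picks up the 0.5 update at column 1
```

At realistic sizes (d in the thousands, r of 8 to 64) the savings are far larger than in this toy, which is why PEFT trains faster than full fine-tuning.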

Build a serverless conversational AI agent using Claude with LangGraph and managed MLflow on Amazon SageMaker AI

Customer service teams face a persistent challenge. Existing chat-based assistants frustrate users with rigid responses, while direct large language model (LLM) implementations lack the structure needed for reliable business operations. When customers need help with order inquiries, cancellations, or status updates, traditional approaches either fail to understand natural language or can’t maintain context across multistep … Read more

Build safe generative AI applications like a Pro: Best Practices with Amazon Bedrock Guardrails

Are you struggling to balance generative AI safety with accuracy, performance, and costs? Many organizations face this challenge when deploying generative AI applications to production. A guardrail that’s too strict blocks legitimate user requests, which frustrates customers. One that’s too lenient exposes your application to harmful content, prompt attacks, or unintended data exposure. Finding the … Read more

Learnings from COBOL modernization in the real world

There’s a lot of excitement right now about AI enabling mainframe application modernization. Boards are paying attention. CIOs are getting asked for a plan. AI is a genuine accelerator for COBOL modernization, but to get results, AI needs additional context that source code alone can’t provide. Here’s what we’ve learned working with 400+ enterprise customers: mainframe … Read more

Reinforcement fine-tuning for Amazon Nova: Teaching AI through feedback

Foundation models deliver impressive out-of-the-box performance for general tasks, but many organizations need models that incorporate their business knowledge. Model customization helps you bridge the gap between general-purpose AI and your specific business needs when building applications that require domain-specific expertise, enforce communication styles, optimize for specialized tasks like code generation or financial reasoning, or ensure … Read more

Large model inference container – latest capabilities and performance enhancements

Modern large language model (LLM) deployments face an escalating cost and performance challenge driven by token count growth. Token count, which is directly related to word count, image size, and other input factors, determines both computational requirements and costs. Longer contexts translate to higher expenses per inference request. This challenge has intensified as frontier models … Read more
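The relationship between token count and cost described above can be made concrete with a back-of-envelope estimator. The per-token prices below are hypothetical placeholders, not real Amazon Bedrock or SageMaker pricing:

```python
# Back-of-envelope inference cost estimate: cost scales with token count.
# Prices are assumed placeholder values for illustration only.

PRICE_PER_1K_INPUT = 0.003    # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015   # USD per 1,000 output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one inference request in USD under the assumed prices."""
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT)

short = request_cost(1_000, 500)    # short prompt
long = request_cost(100_000, 500)   # same output length, 100x the context
print(round(short, 4), round(long, 4))
```

With identical output lengths, the long-context request costs roughly 30x more here, which is the cost pressure that motivates the container optimizations the post describes.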

Efficiently serve dozens of fine-tuned models with vLLM on Amazon SageMaker AI and Amazon Bedrock

Organizations and individuals running multiple custom AI models, especially recent Mixture of Experts (MoE) model families, can face the challenge of paying for idle GPU capacity when the individual models don’t receive enough traffic to saturate a dedicated compute endpoint. To solve this problem, we have partnered with the vLLM community and developed an efficient … Read more

Building intelligent event agents using Amazon Bedrock AgentCore and Amazon Bedrock Knowledge Bases

Large conferences and events generate overwhelming amounts of information—from hundreds of sessions and workshops to speaker profiles, venue maps, and constantly updating schedules. While basic AI assistants can answer simple questions about event logistics, most fail to deliver the personalized guidance and contextual awareness that attendees need to navigate complex, multi-day conferences effectively. More importantly, … Read more

Build an intelligent photo search using Amazon Rekognition, Amazon Neptune, and Amazon Bedrock

Managing large photo collections presents significant challenges for organizations and individuals. Traditional approaches rely on manual tagging, basic metadata, and folder-based organization, which can become impractical when dealing with thousands of images containing multiple people and complex relationships. Intelligent photo search systems address these challenges by combining computer vision, graph databases, and natural language processing … Read more

Train CodeFu-7B with veRL and Ray on Amazon SageMaker Training jobs

The rapid advancement of artificial intelligence (AI) has created unprecedented demand for specialized models capable of complex reasoning tasks, particularly in competitive programming where models must generate functional code through algorithmic reasoning rather than pattern memorization. Reinforcement learning (RL) enables models to learn through trial and error by receiving rewards based on actual code execution, … Read more

Generate structured output from LLMs with Dottxt Outlines in AWS

This post is cowritten with Remi Louf, CEO and technical founder of Dottxt. Structured output in AI applications refers to AI-generated responses that conform to formats that are predefined, validated, and often strictly enforced. This can include a schema for the output or rules for how specific fields in the output should be mapped. Structured outputs are essential … Read more
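The "predefined and validated" idea can be sketched with a stdlib-only check of an LLM response against a simple, hypothetical schema. Note this is not the Outlines API: Outlines constrains generation token by token so invalid output is never produced, whereas this sketch only validates a finished response.

```python
import json

# Post-hoc structured-output validation sketch (hypothetical schema).
# Each required field maps to the Python type it must have.
SCHEMA = {"name": str, "age": int, "tags": list}

def validate(raw: str, schema: dict) -> dict:
    """Parse an LLM response as JSON and enforce the schema."""
    data = json.loads(raw)
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    return data

good = '{"name": "Ada", "age": 36, "tags": ["math"]}'
bad = '{"name": "Ada", "age": "thirty-six", "tags": []}'

print(validate(good, SCHEMA)["name"])   # valid response passes through
try:
    validate(bad, SCHEMA)
except TypeError as e:
    print("rejected:", e)               # wrong type is caught
```

Constrained decoding moves this check into the sampling loop itself, which is what makes libraries like Outlines more reliable than validate-and-retry.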

Global cross-Region inference for the latest Anthropic Claude Opus, Sonnet, and Haiku models on Amazon Bedrock in Thailand, Malaysia, Singapore, Indonesia, and Taiwan

Organizations in Thailand, Malaysia, Singapore, Indonesia, and Taiwan can now access Anthropic Claude Opus 4.6, Sonnet 4.6, and Claude Haiku 4.5 through Global cross-Region inference (CRIS) on Amazon Bedrock—delivering foundation models through a globally distributed inference architecture designed for scale. Global CRIS offers three key advantages: higher quotas, cost efficiency, and intelligent request routing … Read more

Introducing Amazon Bedrock global cross-Region inference for Anthropic’s Claude models in the Middle East Regions (UAE and Bahrain)

We’re excited to announce the availability of Anthropic’s Claude Opus 4.6, Claude Sonnet 4.6, Claude Opus 4.5, Claude Sonnet 4.5, and Claude Haiku 4.5 through Amazon Bedrock global cross-Region inference for customers operating in the Middle East. This launch enables organizations in the Middle East to access Anthropic’s latest Claude models on Amazon Bedrock while … Read more

Scaling data annotation using vision-language models to power physical AI systems

Critical labor shortages are constraining growth across manufacturing, logistics, construction, and agriculture. The problem is particularly acute in construction: nearly 500,000 positions remain unfilled in the United States, with 40% of the current workforce approaching retirement within the decade. These workforce limitations result in delayed projects, escalating costs, and deferred development plans. To address these … Read more

How Sonrai uses Amazon SageMaker AI to accelerate precision medicine trials

In precision medicine, researchers developing diagnostic tests for early disease detection face a critical challenge: datasets containing thousands of potential biomarkers but only hundreds of patient samples. This curse of dimensionality can determine the success or failure of breakthrough discoveries. Modern bioinformatics uses multiple omic modalities—genomics, lipidomics, proteomics, and metabolomics—to develop early disease detection tests. … Read more

Accelerating AI model production at Hexagon with Amazon SageMaker HyperPod

This blog post was co-authored with Johannes Maunz, Tobias Bösch Borgards, Aleksander Cisłak, and Bartłomiej Gralewicz from Hexagon. Hexagon is the global leader in measurement technologies and provides the confidence that vital industries rely on to build, navigate, and innovate. From microns to Mars, Hexagon’s solutions drive productivity, quality, safety, and sustainability across aerospace, agriculture, … Read more

Agentic AI with multi-model framework using Hugging Face smolagents on AWS

This post is cowritten by Jeff Boudier, Simon Pagezy, and Florent Gbelidji from Hugging Face. Agentic AI systems represent an evolution from conversational AI to autonomous agents capable of complex reasoning, tool usage, and code execution. Enterprise applications benefit from strategic deployment approaches tailored to specific needs. These needs include managed endpoints, which deliver auto-scaling capabilities, foundation … Read more

Amazon SageMaker AI in 2025, a year in review part 1: Flexible Training Plans and improvements to price performance for inference workloads

In 2025, Amazon SageMaker AI saw dramatic improvements to core infrastructure offerings along four dimensions: capacity, price performance, observability, and usability. In this series of posts, we discuss these various improvements and their benefits. In Part 1, we discuss capacity improvements with the launch of Flexible Training Plans. We also describe improvements to price performance … Read more

Amazon SageMaker AI in 2025, a year in review part 2: Improved observability and enhanced features for SageMaker AI model customization and hosting

In 2025, Amazon SageMaker AI made several improvements designed to help you train, tune, and host generative AI workloads. In Part 1 of this series, we discussed Flexible Training Plans and price performance improvements made to inference components. In this post, we discuss enhancements made to observability, model customization, and model hosting. These improvements facilitate … Read more

Integrate external tools with Amazon Quick Agents using Model Context Protocol (MCP)

Amazon Quick supports Model Context Protocol (MCP) integrations for action execution, data access, and AI agent integration. You can expose your application’s capabilities as MCP tools by hosting your own MCP server and configuring an MCP integration in Amazon Quick. Amazon Quick acts as an MCP client and connects to your MCP server endpoint to access … Read more
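The client/server flow described above—an MCP client discovering and invoking tools on your server—can be sketched with the two JSON-RPC methods the MCP specification defines, `tools/list` and `tools/call`. Real servers should use an MCP SDK; this stdlib-only dispatcher, with a hypothetical `create_ticket` tool, only illustrates the message shapes.

```python
import json

# Minimal sketch of an MCP-style tool dispatcher. An MCP client (such as
# Amazon Quick) sends "tools/list" to discover tools, then "tools/call"
# to invoke one. Tool name and handler below are hypothetical examples.

TOOLS = {
    "create_ticket": {
        "description": "Open a support ticket",
        "inputSchema": {"type": "object",
                        "properties": {"title": {"type": "string"}}},
        "handler": lambda args: f"ticket created: {args['title']}",
    }
}

def handle(request_json: str) -> dict:
    """Dispatch one JSON-RPC request to the registered tools."""
    req = json.loads(request_json)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": n, "description": t["description"],
                             "inputSchema": t["inputSchema"]}
                            for n, t in TOOLS.items()]}
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        text = tool["handler"](req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {"jsonrpc": "2.0", "id": req["id"],
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": req["id"], "result": result}

listing = handle('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}')
call = handle(json.dumps({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                          "params": {"name": "create_ticket",
                                     "arguments": {"title": "VPN down"}}}))
print(listing["result"]["tools"][0]["name"])
print(call["result"]["content"][0]["text"])
```

A production MCP server also handles initialization, transport (HTTP or stdio), and authentication, which is what the Amazon Quick integration configuration layers on top of this basic request/response cycle.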