Hosting NVIDIA speech NIM models on Amazon SageMaker AI: Parakeet ASR

This post was written with NVIDIA, and the authors would like to thank Adi Margolin, Eliuth Triana, and Maryam Motamedi for their collaboration. Organizations today face the challenge of processing large volumes of audio data, from customer calls and meeting recordings to podcasts and voice messages, to unlock valuable insights. Automatic Speech Recognition (ASR) is a critical … Read more

Responsible AI design in healthcare and life sciences

Generative AI has emerged as a transformative technology in healthcare, driving digital transformation in essential areas such as patient engagement and care management. It has shown potential to revolutionize how clinicians provide improved care through automated systems with diagnostic support tools that provide timely, personalized suggestions, ultimately leading to better health outcomes. For example, a … Read more

Beyond pilots: A proven framework for scaling AI to production

The era of perpetual AI pilots is over. This year, 65% of AWS Generative AI Innovation Center customer projects moved from concept to production—some launching in just 45 days, as AWS VP Swami Sivasubramanian shared on LinkedIn. These results come from insights gained across more than one thousand customer implementations. The Generative AI Innovation Center … Read more

Generate Gremlin queries using Amazon Bedrock models

Graph databases have revolutionized how organizations manage complex, interconnected data. However, specialized query languages such as Gremlin often create a barrier for teams looking to extract insights efficiently. Unlike traditional relational databases with well-defined schemas, graph databases lack a centralized schema, requiring deep technical expertise for effective querying. To address this challenge, we explore an … Read more

Incorporating responsible AI into generative AI project prioritization

Over the past two years, companies have seen an increasing need to develop a project prioritization methodology for generative AI. There is no shortage of generative AI use cases to consider. Rather, companies want to evaluate the business value against the cost, level of effort, and other concerns, for a large number of potential generative … Read more

Build scalable creative solutions for product teams with Amazon Bedrock

Creative teams and product developers are constantly seeking ways to streamline their workflows and reduce time to market while maintaining quality and brand consistency. This post demonstrates how to use AWS services, particularly Amazon Bedrock, to transform your creative processes through generative AI. You can implement a secure, scalable solution that accelerates your creative workflow, … Read more

Build a proactive AI cost management system for Amazon Bedrock – Part 2

In Part 1 of our series, we introduced a proactive cost management solution for Amazon Bedrock, featuring a robust cost sentry mechanism designed to enforce real-time token usage limits. We explored the core architecture, token tracking strategies, and initial budget enforcement techniques that help organizations control their generative AI expenses. Building upon that foundation, this … Read more

Build a proactive AI cost management system for Amazon Bedrock – Part 1

As organizations embrace generative AI powered by Amazon Bedrock, they face the challenge of managing costs associated with the token-based pricing model. Amazon Bedrock offers a pay-as-you-go pricing structure that can potentially lead to unexpected and excessive bills if usage is not carefully monitored. Traditional methods of cost monitoring, such as budget alerts and cost … Read more

Streamline code migration using Amazon Nova Premier with an agentic workflow

Many enterprises are burdened with mission-critical systems built on outdated technologies that have become increasingly difficult to maintain and extend. This post demonstrates how you can use the Amazon Bedrock Converse API with Amazon Nova Premier within an agentic workflow to systematically migrate legacy C code to modern Java/Spring framework applications. By breaking down the … Read more

Metagenomi generates millions of novel enzymes cost-effectively using AWS Inferentia

This post was written with Audra Devoto, Owen Janson, and Christopher Brown of Metagenomi, and Adam Perry of Tennex. A promising strategy to augment the extensive natural diversity of high value enzymes is to use generative AI, specifically protein language models (pLMs), trained on known enzymes to create orders of magnitude more predicted examples of … Read more

Serverless deployment for your Amazon SageMaker Canvas models

Deploying machine learning (ML) models into production can often be a complex and resource-intensive task, especially for customers without deep ML and DevOps expertise. Amazon SageMaker Canvas simplifies model building by offering a no-code interface, so you can create highly accurate ML models using your existing data sources and without writing a single line of … Read more

Building a multi-agent voice assistant with Amazon Nova Sonic and Amazon Bedrock AgentCore

Amazon Nova Sonic is a foundation model that creates natural, human-like speech-to-speech conversations for generative AI applications, allowing users to interact with AI through voice in real time, with capabilities for understanding tone, enabling natural flow, and performing actions. Multi-agent architecture offers a modular, robust, and scalable design pattern for production-level voice assistants. This blog post … Read more

Accelerate large-scale AI training with Amazon SageMaker HyperPod training operator

Large-scale AI model training faces significant challenges with failure recovery and monitoring. Traditional training requires complete job restarts when even a single training process fails, resulting in additional downtime and increased costs. As training clusters expand, identifying and resolving critical issues like stalled GPUs and numerical instabilities typically requires complex custom monitoring code. With Amazon SageMaker … Read more

How TP ICAP transformed CRM data into real-time insights with Amazon Bedrock

This post is co-written with Ross Ashworth at TP ICAP. The ability to quickly extract insights from customer relationship management systems (CRMs) and vast amounts of meeting notes can mean the difference between seizing opportunities and missing them entirely. TP ICAP faced this challenge, having thousands of vendor meeting records stored in their CRM. Using … Read more

Principal Financial Group accelerates build, test, and deployment of Amazon Lex V2 bots through automation

This guest post was written by Mulay Ahmed and Caroline Lima-Lane of Principal Financial Group. The content and opinions in this post are those of the third-party authors and AWS is not responsible for the content or accuracy of this post. With US contact centers that handle millions of customer calls annually, Principal Financial Group® … Read more

Beyond vibes: How to properly select the right LLM for the right task

Choosing the right large language model (LLM) for your use case is becoming both increasingly challenging and essential. Many teams rely on one-time (ad hoc) evaluations based on limited samples from trending models, essentially judging quality on “vibes” alone. This approach involves experimenting with a model’s responses and forming subjective opinions about its performance. However, … Read more

Splash Music transforms music generation using AWS Trainium and Amazon SageMaker HyperPod

Generative AI is rapidly reshaping the music industry, empowering creators—regardless of skill—to create studio-quality tracks with foundation models (FMs) that personalize compositions in real time. As demand for unique, instantly generated content grows and creators seek smarter, faster tools, Splash Music collaborated with AWS to develop and scale music generation FMs, making professional music creation … Read more

Iterative fine-tuning on Amazon Bedrock for strategic model improvement

Organizations often face challenges when implementing single-shot fine-tuning approaches for their generative AI models. The single-shot fine-tuning method involves selecting training data, configuring hyperparameters, and hoping the results meet expectations without the ability to make incremental adjustments. Single-shot fine-tuning frequently leads to suboptimal results and requires starting the entire process from scratch when improvements are … Read more

Voice AI-powered drive-thru ordering with Amazon Nova Sonic and dynamic menu displays

Artificial Intelligence (AI) is transforming the quick-service restaurant industry, particularly in drive-thru operations where efficiency and customer satisfaction intersect. Traditional systems create significant obstacles in service delivery, from staffing limitations and order accuracy issues to inconsistent customer experiences across locations. These challenges, combined with rising labor costs and demand fluctuations, have pushed the industry to … Read more

Optimizing document AI and structured outputs by fine-tuning Amazon Nova Models and on-demand inference

Multimodal fine-tuning represents a powerful approach for customizing vision large language models (LLMs) to excel at specific tasks that involve both visual and textual information. Although base multimodal models offer impressive general capabilities, they often fall short when faced with specialized visual tasks, domain-specific content, or output formatting requirements. Fine-tuning addresses these limitations by adapting … Read more

Transforming enterprise operations: Four high-impact use cases with Amazon Nova

Since the launch of Amazon Nova at AWS re:Invent 2024, we have seen adoption trends across industries, with notable gains in operational efficiency, compliance, and customer satisfaction. With its capabilities in secure, multimodal AI and domain customization, Nova is enhancing workflows and enabling cost efficiencies across core use cases. In this post, we share four … Read more

Building smarter AI agents: AgentCore long-term memory deep dive

Building AI agents that remember user interactions requires more than just storing raw conversations. While Amazon Bedrock AgentCore short-term memory captures immediate context, the real challenge lies in transforming these interactions into persistent, actionable knowledge that spans across sessions. This is the information that transforms fleeting interactions into meaningful, continuous relationships between users and AI … Read more

Configure and verify a distributed training cluster with AWS Deep Learning Containers on Amazon EKS

Training state-of-the-art large language models (LLMs) demands massive, distributed compute infrastructure. Meta’s Llama 3, for instance, was trained on 16,000 NVIDIA H100 GPUs and consumed over 30.84 million GPU hours. Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that simplifies the deployment, management, and scaling of Kubernetes clusters that can scale up to the ranges … Read more

Scala development in Amazon SageMaker Studio with Almond kernel

Scala stands out as a versatile programming language that combines object-oriented and functional programming approaches. By running on the Java Virtual Machine (JVM), it maintains seamless compatibility with Java libraries while offering a concise and scalable development experience. The language has distinguished itself in the realm of distributed computing and big data processing, with the … Read more

Build a device management agent with Amazon Bedrock AgentCore

The proliferation of Internet of Things (IoT) devices has transformed how we interact with our environments, from homes to industrial settings. However, as the number of connected devices grows, so does the complexity of managing them. Traditional device management interfaces often require navigating through multiple applications, each with its own UI and learning curve. This … Read more

How Amazon Bedrock Custom Model Import streamlined LLM deployment for Salesforce

This post is co-written by Salesforce’s AI Platform team members Srikanta Prasad, Utkarsh Arora, Raghav Tanaji, Nitin Surya, Gokulakrishnan Gopalakrishnan, and Akhilesh Deepak Gotmare. Salesforce’s Artificial Intelligence (AI) platform team runs customized large language models (LLMs), fine-tuned versions of Llama, Qwen, and Mistral, for agentic AI applications like Agentforce. Deploying these models creates operational overhead: teams spend … Read more

Transforming the physical world with AI: the next frontier in intelligent automation

The convergence of artificial intelligence with physical systems marks a pivotal moment in technological evolution. Physical AI, where algorithms transcend digital boundaries to perceive, understand, and manipulate the tangible world, will fundamentally transform how enterprises operate across industries. These intelligent systems bridge the gap between digital intelligence and physical reality, unlocking unprecedented opportunities for efficiency … Read more

Medical reports analysis dashboard using Amazon Bedrock, LangChain, and Streamlit

In healthcare, the ability to quickly analyze and interpret medical reports is crucial for both healthcare providers and patients. While medical reports contain valuable information, they often remain underutilized due to their complex nature and the time-intensive process of analysis. This complexity manifests in several ways: the interpretation of multiple parameters and their relationships (such … Read more

Kitsa transforms clinical trial site selection with Amazon Quick Automate

This post was written with Ajay Nyamati from Kitsa. The clinical trial industry conducts medical research studies to evaluate the safety, efficacy, and effectiveness of new drugs, treatments, or medical devices before they reach the market. The industry is a cornerstone of medical innovation, yet it continues to face a fundamental bottleneck: selection of the … Read more

Connect Amazon Quick Suite to enterprise apps and agents with MCP

Organizations need solutions for people and AI agents to securely collaborate through a single interface to the organization’s data and take actions across enterprise applications to improve productivity. The ability of an AI agent to securely and seamlessly connect with organizational knowledge bases, enterprise applications, and other AI agents is foundational to drive adoption and … Read more