Accelerating LLM inference with post-training weight and activation quantization using AWQ and GPTQ on Amazon SageMaker AI

Foundation models (FMs) and large language models (LLMs) have been rapidly scaling, often doubling in parameter count within months, leading to significant improvements in language understanding and generative capabilities. This rapid growth comes with steep costs: inference now requires enormous memory capacity, high-performance GPUs, and substantial energy consumption. This trend is evident in the open … Read more

How Beekeeper optimized user personalization with Amazon Bedrock

This post is cowritten by Mike Koźmiński from Beekeeper. Large Language Models (LLMs) are evolving rapidly, making it difficult for organizations to select the best model for each specific use case, optimize prompts for quality and cost, adapt to changing model capabilities, and personalize responses for different users. Choosing the “right” LLM and prompt isn’t … Read more

Sentiment Analysis with Text and Audio Using AWS Generative AI Services: Approaches, Challenges, and Solutions

This post is co-written by Instituto de Ciência e Tecnologia Itaú (ICTi) and AWS. Sentiment analysis has grown increasingly important in modern enterprises, providing insights into customer opinions, satisfaction levels, and potential frustrations. As interactions occur largely through text (such as social media, chat applications, and ecommerce reviews) or voice (such as call centers and … Read more

Architecting TrueLook’s AI-powered construction safety system on Amazon SageMaker AI

This post is co-written by TrueLook and AWS. TrueLook is a construction camera and jobsite intelligence company that provides real-time visibility into construction projects. Its platform combines high-resolution time-lapse cameras, live video streaming, and AI-powered insights to help teams monitor progress, improve accountability, and reduce risk across the entire project lifecycle. TrueLook used Amazon SageMaker … Read more

Scaling medical content review at Flo Health using Amazon Bedrock (Part 1)

This blog post is based on work co-developed with Flo Health. Healthcare science is rapidly advancing. Maintaining accurate and up-to-date medical content directly impacts people’s lives, health decisions, and well-being. When someone searches for health information, they are often at their most vulnerable, making accuracy not just important, but potentially life-saving. Flo Health creates thousands … Read more

Detect and redact personally identifiable information using Amazon Bedrock Data Automation and Guardrails

Organizations handle vast amounts of sensitive customer information through various communication channels. Protecting Personally Identifiable Information (PII), such as social security numbers (SSNs), driver’s license numbers, and phone numbers, has become increasingly critical for maintaining compliance with data privacy regulations and building customer trust. However, manually reviewing and redacting PII is time-consuming, error-prone, and scales … Read more

Speed meets scale: Load testing SageMaker AI endpoints with Observe.AI’s testing tool

This post is cowritten with Aashraya Sachdeva from Observe.AI. You can use Amazon SageMaker to build, train, and deploy machine learning (ML) models, including large language models (LLMs) and other foundation models (FMs). This helps you significantly reduce the time required for a range of generative AI and ML development tasks. An AI/ML development cycle … Read more

Migrate MLflow tracking servers to Amazon SageMaker AI with serverless MLflow

Operating a self-managed MLflow tracking server comes with administrative overhead, including server maintenance and resource scaling. As teams scale their ML experimentation, efficiently managing resources during peak usage and idle periods is a challenge. Organizations running MLflow on Amazon EC2 or on-premises can optimize costs and engineering resources by using Amazon SageMaker AI with serverless … Read more

Build an AI-powered website assistant with Amazon Bedrock

Businesses face a growing challenge: customers need answers fast, but support teams are overwhelmed. Support documentation like product manuals and knowledge base articles typically require users to search through hundreds of pages, and support agents often run 20–30 customer queries per day to locate specific information. This post demonstrates how to solve this challenge by … Read more

Programmatically creating an IDP solution with Amazon Bedrock Data Automation

Intelligent Document Processing (IDP) transforms how organizations handle unstructured document data, enabling automatic extraction of valuable information from invoices, contracts, and reports. Today, we explore how to programmatically create an IDP solution that uses Strands SDK, Amazon Bedrock AgentCore, Amazon Bedrock Knowledge Base, and Bedrock Data Automation (BDA). This solution is provided through a Jupyter notebook that enables users … Read more

AI agent-driven browser automation for enterprise workflow management

Enterprise organizations increasingly rely on web-based applications for critical business processes, yet many workflows remain manually intensive, creating operational inefficiencies and compliance risks. Despite significant technology investments, knowledge workers routinely navigate between eight and twelve different web applications during standard workflows, constantly switching contexts and manually transferring information between systems. Data entry and validation tasks … Read more

Agentic QA automation using Amazon Bedrock AgentCore Browser and Amazon Nova Act

Quality assurance (QA) testing has long been the backbone of software development, but traditional QA approaches haven’t kept pace with modern development cycles and complex UIs. Most organizations still rely on a hybrid approach combining manual testing with script-based automation frameworks like Selenium, Cypress, and Playwright, yet teams spend a significant amount of their time maintaining existing … Read more

Optimizing LLM inference on Amazon SageMaker AI with BentoML’s LLM-Optimizer

The rise of powerful large language models (LLMs) that can be consumed via API calls has made it remarkably straightforward to integrate artificial intelligence (AI) capabilities into applications. Yet despite this convenience, a significant number of enterprises are choosing to self-host their own models—accepting the complexity of infrastructure management, the cost of GPUs in the … Read more

AWS AI League: Model customization and agentic showdown

Building intelligent agents to handle complex, real-world tasks can be daunting. Additionally, rather than relying solely on large, pre-trained foundation models, organizations often need to fine-tune and customize smaller, more specialized models to outperform them for their specific use cases. The AWS AI League provides an innovative program to help enterprises overcome the challenges of building … Read more

Accelerate Enterprise AI Development using Weights & Biases and Amazon Bedrock AgentCore

This post is co-written by Thomas Capelle and Ray Strickland from Weights & Biases (W&B). Generative artificial intelligence (AI) adoption is accelerating across enterprises, evolving from simple foundation model interactions to sophisticated agentic workflows. As organizations transition from proof-of-concepts to production deployments, they require robust tools for development, evaluation, and monitoring of AI applications at … Read more

How dLocal automated compliance reviews using Amazon Quick Automate

dLocal, Uruguay’s first unicorn, has established itself as a pioneer in cross-border payments since its founding in 2016. Today, the company operates in over 40 emerging countries, connecting more than two billion consumers with global technology leaders. Operating at this scale requires strict and consistent compliance processes. Each month, thousands of merchant ecommerce websites are … Read more

Advancing ADHD diagnosis: How Qbtech built a mobile AI assessment Model Using Amazon SageMaker AI

This post is cowritten with Dr. Mikkel Hansen from Qbtech. The assessment and diagnosis of attention deficit hyperactivity disorder (ADHD) has traditionally relied on clinical observations and behavioral evaluations. While these methods are valuable, the process can be complex and time-intensive. Qbtech, founded in 2002 in Stockholm, Sweden, enhances ADHD diagnosis by integrating objective measurements … Read more

Accelerating your marketing ideation with generative AI – Part 1: From idea to generation with the Amazon Nova foundation models

Marketing teams face increasing pressure to create engaging campaigns quickly while maintaining brand consistency and creative quality. Traditional marketing campaign creation processes often involve multiple iterations between creative teams, stakeholders, and external agencies, leading to extended timelines and increased costs. The advent and availability of generative models (especially image and video generation ones) has opened … Read more

Introducing Visa Intelligent Commerce on AWS: Enabling agentic commerce with Amazon Bedrock AgentCore

This post is cowritten with Sangeetha Bharath and Seemal Zaman from Visa. Across every industry, agentic AI is redefining how work gets done by shifting digital experiences from manual, user-driven interactions to autonomous, outcome-driven workflows. Unlike traditional AI systems that merely answer questions or provide suggestions, agentic AI introduces intelligent agents capable of reasoning, acting, … Read more

Move Beyond Chain-of-Thought with Chain-of-Draft on Amazon Bedrock

As organizations scale their generative AI implementations, the critical challenge of balancing quality, cost, and latency becomes increasingly complex. With inference costs dominating 70–90% of large language model (LLM) operational expenses, and verbose prompting strategies inflating token volume by 3–5x, organizations are actively seeking more efficient approaches to model interaction. Traditional prompting methods, while effective, … Read more

Deploy Mistral AI’s Voxtral on Amazon SageMaker AI

Mistral AI’s Voxtral models combine text and audio processing capabilities in a single framework. The Voxtral family includes two distinct variants designed for different use cases and resource requirements. The Voxtral-Mini-3B-2507 is a compact 3-billion-parameter model optimized for efficient audio transcription and basic multimodal understanding, making it ideal for applications where speed and resource efficiency … Read more

Enhance document analytics with Strands AI Agents for the GenAI IDP Accelerator

Extracting structured information from unstructured data is a critical first step to unlocking business value. Our Generative AI Intelligent Document Processing (GenAI IDP) Accelerator has been at the forefront of this transformation, already having processed tens of millions of documents for hundreds of customers. Although organizations can use intelligent document processing (IDP) solutions to digitize … Read more

Build a multimodal generative AI assistant for root cause diagnosis in predictive maintenance using Amazon Bedrock

Predictive maintenance is a strategy that uses data from equipment sensors and advanced analytics to predict when a machine is likely to fail, ensuring maintenance can be performed proactively to prevent breakdowns. This enables industries to reduce unexpected failures, improve operational efficiency, and extend the lifespan of critical equipment. It is applicable across a wide range of components, … Read more

Introducing SOCI indexing for Amazon SageMaker Studio: Faster container startup times for AI/ML workloads

Today, we are excited to introduce a new feature for SageMaker Studio: SOCI (Seekable Open Container Initiative) indexing. SOCI supports lazy loading of container images, where only the necessary parts of an image are downloaded initially rather than the entire container. SageMaker Studio serves as a web Integrated Development Environment (IDE) for end-to-end machine learning (ML) development, … Read more

Build and deploy scalable AI agents with NVIDIA NeMo, Amazon Bedrock AgentCore, and Strands Agents

This post is co-written with Ranjit Rajan, Abdullahi Olaoye, and Abhishek Sawarkar from NVIDIA. AI’s next frontier isn’t merely smarter chat-based assistants; it’s autonomous agents that reason, plan, and execute across entire systems. But to accomplish this, enterprise developers need to move from prototypes to production-ready AI agents that scale securely. This challenge grows as … Read more

Bi-directional streaming for real-time agent interactions now available in Amazon Bedrock AgentCore Runtime

Building natural voice conversations with AI agents requires complex infrastructure and lots of code from engineering teams. Text-based agent interactions follow a turn-based pattern: a user sends a complete request, waits for the agent to process it, and receives a full response before continuing. Bi-directional streaming removes this constraint by establishing a persistent connection that … Read more

Tracking and managing assets used in AI development with Amazon SageMaker AI 

Building custom foundation models requires coordinating multiple assets across the development lifecycle such as data assets, compute infrastructure, model architecture and frameworks, lineage, and production deployments. Data scientists create and refine training datasets, develop custom evaluators to assess model quality and safety, and iterate through fine-tuning configurations to optimize performance. As these workflows scale across … Read more

Track machine learning experiments with MLflow on Amazon SageMaker using Snowflake integration

A user can conduct machine learning (ML) data experiments in data environments, such as Snowflake, using the Snowpark library. However, tracking these experiments across diverse environments can be challenging due to the difficulty in maintaining a central repository to monitor experiment metadata, parameters, hyperparameters, models, results, and other pertinent information. In this post, we demonstrate … Read more

Governance by design: The essential guide for successful AI scaling

Picture this: Your enterprise has just deployed its first generative AI application. The initial results are promising, but as you plan to scale across departments, critical questions emerge. How will you enforce consistent security, prevent model bias, and maintain control as AI applications multiply? It turns out you’re not alone. A McKinsey survey spanning 750+ … Read more