Mistral-NeMo-Instruct-2407 and Mistral-NeMo-Base-2407 are now available on SageMaker JumpStart

Today, we are excited to announce that Mistral-NeMo-Base-2407 and Mistral-NeMo-Instruct-2407—twelve billion parameter large language models from Mistral AI that excel at text generation—are available for customers through Amazon SageMaker JumpStart. You can try these models with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for … Read more

Advancing AI trust with new responsible AI tools, capabilities, and resources

As generative AI continues to drive innovation across industries and our daily lives, the need for responsible AI has become increasingly important. At AWS, we believe the long-term success of AI depends on the ability to inspire trust among users, customers, and society. This belief is at the heart of our long-standing commitment to building … Read more

Deploy RAG applications on Amazon SageMaker JumpStart using FAISS

Generative AI has empowered customers with their own information in unprecedented ways, reshaping interactions across various industries by enabling intuitive and personalized experiences. This transformation is significantly enhanced by Retrieval Augmented Generation (RAG), which is a generative AI pattern where the large language model (LLM) being used references a knowledge corpus outside of its training … Read more

Speed up your cluster procurement time with Amazon SageMaker HyperPod training plans

Today, organizations are constantly seeking ways to use advanced large language models (LLMs) for their specific needs. These organizations are engaging in both pre-training and fine-tuning massive LLMs, with parameter counts in the billions. This process aims to enhance model efficacy for a wide array of applications across diverse sectors, including healthcare, financial services, and … Read more

Amazon Bedrock Marketplace now includes NVIDIA models: Introducing NVIDIA Nemotron-4 NIM microservices

This post is co-written with Abhishek Sawarkar, Eliuth Triana, Jiahong Liu and Kshitiz Gupta from NVIDIA. At AWS re:Invent 2024, we are excited to introduce Amazon Bedrock Marketplace. This is a revolutionary new capability within Amazon Bedrock that serves as a centralized hub for discovering, testing, and implementing foundation models (FMs). It provides developers and organizations … Read more

Use Amazon Bedrock tooling with Amazon SageMaker JumpStart models

Today, we’re excited to announce a new capability that allows you to deploy over 100 open-weight and proprietary models from Amazon SageMaker JumpStart and register them with Amazon Bedrock, allowing you to seamlessly access them through the powerful Amazon Bedrock APIs. You can now use Amazon Bedrock features such as Amazon Bedrock Knowledge Bases and … Read more

A guide to Amazon Bedrock Model Distillation (preview)

When using generative AI, achieving high performance with low latency models that are cost-efficient is often a challenge, because these goals can clash with each other. With the newly launched Amazon Bedrock Model Distillation feature, you can use smaller, faster, and cost-efficient models that deliver use-case specific accuracy that is comparable to the largest and … Read more

Build generative AI applications quickly with Amazon Bedrock IDE in Amazon SageMaker Unified Studio

Building generative AI applications presents significant challenges for organizations: they require specialized ML expertise, complex infrastructure management, and careful orchestration of multiple services. To address these challenges, we introduce Amazon Bedrock IDE, an integrated environment for developing and customizing generative AI applications. Formerly known as Amazon Bedrock Studio, Amazon Bedrock IDE is now incorporated into … Read more

Scale ML workflows with Amazon SageMaker Studio and Amazon SageMaker HyperPod

Scaling machine learning (ML) workflows from initial prototypes to large-scale production deployment can be a daunting task, but the integration of Amazon SageMaker Studio and Amazon SageMaker HyperPod offers a streamlined solution to this challenge. As teams progress from proof of concept to production-ready models, they often struggle with efficiently managing growing infrastructure and storage needs. … Read more

Introducing Amazon Kendra GenAI Index – Enhanced semantic search and retrieval capabilities

Amazon Kendra is an intelligent enterprise search service that helps you search across different content repositories with built-in connectors. AWS customers use Amazon Kendra with large language models (LLMs) to quickly create secure, generative AI–powered conversational experiences on top of their enterprise content. As enterprises adopt generative AI, many are developing intelligent assistants powered by … Read more

Building Generative AI and ML solutions faster with AI apps from AWS partners using Amazon SageMaker

Organizations of every size and across every industry are looking to use generative AI to fundamentally transform the business landscape with reimagined customer experiences, increased employee productivity, new levels of creativity, and optimized business processes. A recent study by Telecom Advisory Services, a globally recognized research and consulting firm that specializes in economic impact studies, shows that … Read more

Query structured data from Amazon Q Business using Amazon QuickSight integration

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Although generative AI is fueling transformative innovations, enterprises may still experience sharply divided data silos when it comes to enterprise knowledge, in particular between unstructured content … Read more

Elevate customer experience by using the Amazon Q Business custom plugin for New Relic AI

Digital experience interruptions can harm customer satisfaction and business performance across industries. Application failures, slow load times, and service unavailability can lead to user frustration, decreased engagement, and revenue loss. The risk and impact of outages increase during peak usage periods, which vary by industry—from ecommerce sales events to financial quarter-ends or major product launches. … Read more

Amazon SageMaker launches the updated inference optimization toolkit for generative AI

Today, Amazon SageMaker is excited to announce updates to the inference optimization toolkit, providing new functionality and enhancements to help you optimize generative AI models even faster. These updates build on the capabilities introduced in the original launch of the inference optimization toolkit (to learn more, see Achieve up to ~2x higher throughput while reducing … Read more

Syngenta develops a generative AI assistant to support sales representatives using Amazon Bedrock Agents

This post was written with Zach Marston and Serg Masis from Syngenta. Syngenta and AWS collaborated to develop Cropwise AI, an innovative solution powered by Amazon Bedrock Agents, to accelerate their sales reps’ ability to place Syngenta seed products with growers across North America. Cropwise AI harnesses the power of generative AI using AWS to … Read more

Speed up your AI inference workloads with new NVIDIA-powered capabilities in Amazon SageMaker

This post is co-written with Abhishek Sawarkar, Eliuth Triana, Jiahong Liu and Kshitiz Gupta from NVIDIA. At re:Invent 2024, we are excited to announce new capabilities to speed up your AI inference workloads with NVIDIA accelerated computing and software offerings on Amazon SageMaker. These advancements build upon our collaboration with NVIDIA, which includes adding support … Read more

Unlock cost savings with the new scale down to zero feature in SageMaker Inference

Today at AWS re:Invent 2024, we are excited to announce a new feature for Amazon SageMaker inference endpoints: the ability to scale SageMaker inference endpoints to zero instances. This long-awaited capability is a game changer for our customers using the power of AI and machine learning (ML) inference in the cloud. Previously, SageMaker inference endpoints … Read more

Supercharge your auto scaling for generative AI inference – Introducing Container Caching in SageMaker Inference

Today at AWS re:Invent 2024, we are excited to announce the new Container Caching capability in Amazon SageMaker, which significantly reduces the time required to scale generative AI models for inference. This innovation allows you to scale your models faster, observing up to 56% reduction in latency when scaling a new model copy and up … Read more

Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 1

The generative AI landscape has been rapidly evolving, with large language models (LLMs) at the forefront of this transformation. These models have grown exponentially in size and complexity, with some now containing hundreds of billions of parameters and requiring hundreds of gigabytes of memory. As LLMs continue to expand, AI engineers face increasing challenges in … Read more

Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 2

In Part 1 of this series, we introduced Amazon SageMaker Fast Model Loader, a new capability in Amazon SageMaker that significantly reduces the time required to deploy and scale large language models (LLMs) for inference. We discussed how this innovation addresses one of the major bottlenecks in LLM deployment: the time required to load massive models … Read more

Fast and accurate zero-shot forecasting with Chronos-Bolt and AutoGluon

Chronos-Bolt is the newest addition to AutoGluon-TimeSeries, delivering accurate zero-shot forecasting up to 250 times faster than the original Chronos models [1]. Time series forecasting plays a vital role in guiding key business decisions across industries such as retail, energy, finance, and healthcare. Traditionally, forecasting has relied on statistical models [2] like ETS and ARIMA, … Read more

How Amazon Finance Automation built a generative AI Q&A chat assistant using Amazon Bedrock

Today, the Accounts Payable (AP) and Accounts Receivable (AR) analysts in Amazon Finance operations receive queries from customers through email, cases, internal tools, or phone. When a query arises, analysts must engage in a time-consuming process of reaching out to subject matter experts (SMEs) and going through multiple policy documents containing standard operating procedures (SOPs) … Read more

Cohere Rerank 3.5 is now available in Amazon Bedrock through Rerank API

We are excited to announce the availability of Cohere’s advanced reranking model Rerank 3.5 through our new Rerank API in Amazon Bedrock. This powerful reranking model enables AWS customers to significantly improve their search relevance and content ranking capabilities. This model is also available for Amazon Bedrock Knowledge Base users. By incorporating Cohere’s Rerank 3.5 … Read more

Easily deploy and manage hundreds of LoRA adapters with SageMaker efficient multi-adapter inference

The new efficient multi-adapter inference feature of Amazon SageMaker unlocks exciting possibilities for customers using fine-tuned models. This capability integrates with SageMaker inference components to allow you to deploy and manage hundreds of fine-tuned Low-Rank Adaptation (LoRA) adapters through SageMaker APIs. Multi-adapter inference handles the registration of fine-tuned adapters with a base model and dynamically … Read more

Improve the performance of your Generative AI applications with Prompt Optimization on Amazon Bedrock

Prompt engineering refers to the practice of writing instructions to get the desired responses from foundation models (FMs). You might have to spend months experimenting and iterating on your prompts, following the best practices for each model, to achieve your desired output. Furthermore, these prompts are specific to a model and task, and performance isn’t … Read more

Search enterprise data assets using LLMs backed by knowledge graphs

Enterprises are facing challenges in accessing their data assets scattered across various sources because of increasing complexities in managing vast amounts of data. Traditional search methods often fail to provide comprehensive and contextual results, particularly for unstructured data or complex queries. Search solutions in modern big data management must facilitate efficient and accurate search of … Read more

Embodied AI Chess with Amazon Bedrock

Generative AI continues to transform numerous industries and activities, with one such application being the enhancement of chess, a traditional human game, with sophisticated AI and large language models (LLMs). Using the Custom Model Import feature in Amazon Bedrock, you can now create engaging matches between foundation models (FMs) fine-tuned for chess gameplay, combining classical … Read more

Efficiently train models with large sequence lengths using Amazon SageMaker model parallel

Large language models (LLMs) have witnessed an unprecedented surge in popularity, with customers increasingly using publicly available models such as Llama, Stable Diffusion, and Mistral. Across diverse industries, including healthcare, finance, and marketing, organizations are now engaged in pre-training and fine-tuning these increasingly larger LLMs, which often boast billions of parameters and longer input sequence lengths. Although … Read more