Machine Learning – Page 22

Unlock cost savings with the new scale down to zero feature in SageMaker Inference

December 3, 2024 by

Today at AWS re:Invent 2024, we are excited to announce a new feature for Amazon SageMaker inference endpoints: the ability to scale SageMaker inference endpoints to zero instances. This long-awaited capability is a game changer for our customers using the power of AI and machine learning (ML) inference in the cloud. Previously, SageMaker inference endpoints … Read more

Supercharge your auto scaling for generative AI inference – Introducing Container Caching in SageMaker Inference

December 3, 2024 by

Today at AWS re:Invent 2024, we are excited to announce the new Container Caching capability in Amazon SageMaker, which significantly reduces the time required to scale generative AI models for inference. This innovation allows you to scale your models faster, observing up to 56% reduction in latency when scaling a new model copy and up … Read more

Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – part 1

December 3, 2024 by

The generative AI landscape has been rapidly evolving, with large language models (LLMs) at the forefront of this transformation. These models have grown exponentially in size and complexity, with some now containing hundreds of billions of parameters and requiring hundreds of gigabytes of memory. As LLMs continue to expand, AI engineers face increasing challenges in … Read more

Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 2

December 3, 2024 by

In Part 1 of this series, we introduced Amazon SageMaker Fast Model Loader, a new capability in Amazon SageMaker that significantly reduces the time required to deploy and scale large language models (LLMs) for inference. We discussed how this innovation addresses one of the major bottlenecks in LLM deployment: the time required to load massive models … Read more

Fast and accurate zero-shot forecasting with Chronos-Bolt and AutoGluon

December 2, 2024 by

Chronos-Bolt is the newest addition to AutoGluon-TimeSeries, delivering accurate zero-shot forecasting up to 250 times faster than the original Chronos models [1]. Time series forecasting plays a vital role in guiding key business decisions across industries such as retail, energy, finance, and healthcare. Traditionally, forecasting has relied on statistical models [2] like ETS and ARIMA, … Read more

How Amazon Finance Automation built a generative AI Q&A chat assistant using Amazon Bedrock

December 2, 2024 by

Today, the Accounts Payable (AP) and Accounts Receivable (AR) analysts in Amazon Finance operations receive queries from customers through email, cases, internal tools, or phone. When a query arises, analysts must engage in a time-consuming process of reaching out to subject matter experts (SMEs) and go through multiple policy documents containing standard operating procedures (SOPs) … Read more

Cohere Rerank 3.5 is now available in Amazon Bedrock through Rerank API

December 1, 2024 by

We are excited to announce the availability of Cohere’s advanced reranking model Rerank 3.5 through our new Rerank API in Amazon Bedrock. This powerful reranking model enables AWS customers to significantly improve their search relevance and content ranking capabilities. This model is also available for Amazon Bedrock Knowledge Base users. By incorporating Cohere’s Rerank 3.5 … Read more

AWS DeepRacer: How to master physical racing?

December 1, 2024 by

As developers gear up for re:Invent 2024, they again face the unique challenges of physical racing. What are the obstacles? Let’s have a look. In this blog post, I will look at what makes physical AWS DeepRacer racing—a real car on a real track—different to racing in the virtual world—a model in a simulated 3D … Read more

Easily deploy and manage hundreds of LoRA adapters with SageMaker efficient multi-adapter inference

November 29, 2024 by

The new efficient multi-adapter inference feature of Amazon SageMaker unlocks exciting possibilities for customers using fine-tuned models. This capability integrates with SageMaker inference components to allow you to deploy and manage hundreds of fine-tuned Low-Rank Adaptation (LoRA) adapters through SageMaker APIs. Multi-adapter inference handles the registration of fine-tuned adapters with a base model and dynamically … Read more

Improve the performance of your Generative AI applications with Prompt Optimization on Amazon Bedrock

November 29, 2024 by

Prompt engineering refers to the practice of writing instructions to get the desired responses from foundation models (FMs). You might have to spend months experimenting and iterating on your prompts, following the best practices for each model, to achieve your desired output. Furthermore, these prompts are specific to a model and task, and performance isn’t … Read more

Search enterprise data assets using LLMs backed by knowledge graphs

November 27, 2024 by

Enterprises are facing challenges in accessing their data assets scattered across various sources because of increasing complexities in managing vast amount of data. Traditional search methods often fail to provide comprehensive and contextual results, particularly for unstructured data or complex queries. Search solutions in modern big data management must facilitate efficient and accurate search of … Read more

Embodied AI Chess with Amazon Bedrock

November 27, 2024 by

Generative AI continues to transform numerous industries and activities, with one such application being the enhancement of chess, a traditional human game, with sophisticated AI and large language models (LLMs). Using the Custom Model Import feature in Amazon Bedrock, you can now create engaging matches between foundation models (FMs) fine-tuned for chess gameplay, combining classical … Read more

Efficiently train models with large sequence lengths using Amazon SageMaker model parallel

November 27, 2024 by

Large language models (LLMs) have witnessed an unprecedented surge in popularity, with customers increasingly using publicly available models such as Llama, Stable Diffusion, and Mistral. Across diverse industries—including healthcare, finance, and marketing—organizations are now engaged in pre-training and fine-tuning these increasingly larger LLMs, which often boast billions of parameters and larger input sequence length. Although … Read more

Getting started with Amazon Bedrock Agents custom orchestrator

November 27, 2024 by

Generative AI agents are designed to interact with their environment to achieve specific objectives, such as automating repetitive tasks and augmenting human capabilities. By orchestrating multistep workflows that adapt to evolving goals in real time, these agents increase productivity, reduce errors, and deliver more personalized experiences. To manage these complex workflows effectively, agents rely on … Read more

Use Amazon Bedrock Agents for code scanning, optimization, and remediation

November 27, 2024 by

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that best suits your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize … Read more

Create a generative AI assistant with Slack and Amazon Bedrock

November 27, 2024 by

Seamless integration of customer experience, collaboration tools, and relevant data is the foundation for delivering knowledge-based productivity gains. In this post, we show you how to integrate the popular Slack messaging service with AWS generative AI services to build a natural language assistant where business users can ask questions of an unstructured dataset. To demonstrate, … Read more

Unleash your Salesforce data using the Amazon Q Salesforce Online connector

November 26, 2024 by

Thousands of companies worldwide use Salesforce to manage their sales, marketing, customer service, and other business operations. The Salesforce cloud-based platform centralizes customer information and interactions across the organization, providing sales reps, marketers, and support agents with a unified 360-degree view of each customer. With Salesforce at the heart of their business, companies accumulate vast … Read more

Reducing hallucinations in large language models with custom intervention using Amazon Bedrock Agents

November 26, 2024 by

Hallucinations in large language models (LLMs) refer to the phenomenon where the LLM generates an output that is plausible but factually incorrect or made-up. This can occur when the model’s training data lacks the necessary information or when the model attempts to generate coherent responses by making logical inferences beyond its actual knowledge. Hallucinations arise … Read more

Deploy Meta Llama 3.1-8B on AWS Inferentia using Amazon EKS and vLLM

November 26, 2024 by

With the rise of large language models (LLMs) like Meta Llama 3.1, there is an increasing need for scalable, reliable, and cost-effective solutions to deploy and serve these models. AWS Trainium and AWS Inferentia based instances, combined with Amazon Elastic Kubernetes Service (Amazon EKS), provide a performant and low cost framework to run LLMs efficiently … Read more

Serving LLMs using vLLM and Amazon EC2 instances with AWS AI chips

November 26, 2024 by

The use of large language models (LLMs) and generative AI has exploded over the last year. With the release of powerful publicly available foundation models, tools for training, fine tuning and hosting your own LLM have also become democratized. Using vLLM on AWS Trainium and Inferentia makes it possible to host LLMs for high performance … Read more

Using LLMs to fortify cyber defenses: Sophos’s insight on strategies for using LLMs with Amazon Bedrock and Amazon SageMaker

November 26, 2024 by

This post is co-written with Adarsh Kyadige and Salma Taoufiq from Sophos. As a leader in cutting-edge cybersecurity, Sophos is dedicated to safeguarding over 500,000 organizations and millions of customers across more than 150 countries. By harnessing the power of threat intelligence, machine learning (ML), and artificial intelligence (AI), Sophos delivers a comprehensive range of … Read more

Enhanced observability for AWS Trainium and AWS Inferentia with Datadog

November 26, 2024 by

This post is co-written with Curtis Maher and Anjali Thatte from Datadog. This post walks you through Datadog’s new integration with AWS Neuron, which helps you monitor your AWS Trainium and AWS Inferentia instances by providing deep observability into resource utilization, model execution performance, latency, and real-time infrastructure health, enabling you to optimize machine learning … Read more

Create a virtual stock technical analyst using Amazon Bedrock Agents

November 26, 2024 by

Stock technical analysis questions can be as unique as the individual stock analyst themselves. Queries often have multiple technical indicators like Simple Moving Average (SMA), Exponential Moving Average (EMA), Relative Strength Index (RSI), and others. Answering these varied questions would mean writing complex business logic to unpack the query into parts and fetching the necessary … Read more

Apply Amazon SageMaker Studio lifecycle configurations using AWS CDK

November 26, 2024 by

This post serves as a step-by-step guide on how to set up lifecycle configurations for your Amazon SageMaker Studio domains. With lifecycle configurations, system administrators can apply automated controls to their SageMaker Studio domains and their users. We cover core concepts of SageMaker Studio and provide code examples of how to apply lifecycle configuration to … Read more

Build a read-through semantic cache with Amazon OpenSearch Serverless and Amazon Bedrock

November 26, 2024 by

In the field of generative AI, latency and cost pose significant challenges. The commonly used large language models (LLMs) often process text sequentially, predicting one token at a time in an autoregressive manner. This approach can introduce delays, resulting in less-than-ideal user experiences. Additionally, the growing demand for AI-powered applications has led to a high … Read more

Rad AI reduces real-time inference latency by 50% using Amazon SageMaker

November 26, 2024 by

This post is co-written with Ken Kao and Hasan Ali Demirci from Rad AI. Rad AI has reshaped radiology reporting, developing solutions that streamline the most tedious and repetitive tasks, and saving radiologists’ time. Since 2018, using state-of-the-art proprietary and open source large language models (LLMs), our flagship product—Rad AI Impressions— has significantly reduced the … Read more

Read graphs, diagrams, tables, and scanned pages using multimodal prompts in Amazon Bedrock

November 26, 2024 by

Large language models (LLMs) have come a long way from being able to read only text to now being able to read and understand graphs, diagrams, tables, and images. In this post, we discuss how to use LLMs from Amazon Bedrock to not only extract text, but also understand information available in images. Amazon Bedrock … Read more

How Crexi achieved ML models deployment on AWS at scale and boosted efficiency

November 26, 2024 by

This post is co-written with Isaac Smothers and James Healy-Mirkovich from Crexi. With the current demand for AI and machine learning (AI/ML) solutions, the processes to train and deploy models and scale inference are crucial to business success. Even though AI/ML and especially generative AI progress is rapid, machine learning operations (MLOps) tooling is continuously … Read more

Deploy Meta Llama 3.1 models cost-effectively in Amazon SageMaker JumpStart with AWS Inferentia and AWS Trainium

November 26, 2024 by

We’re excited to announce the availability of Meta Llama 3.1 8B and 70B inference support on AWS Trainium and AWS Inferentia instances in Amazon SageMaker JumpStart. Meta Llama 3.1 multilingual large language models (LLMs) are a collection of pre-trained and instruction tuned generative models. Trainium and Inferentia, enabled by the AWS Neuron software development kit … Read more

AWS achieves ISO/IEC 42001:2023 Artificial Intelligence Management System accredited certification

November 26, 2024 by

Amazon Web Services (AWS) is excited to be the first major cloud service provider to announce ISO/IEC 42001 accredited certification for AI services, covering: Amazon Bedrock, Amazon Q Business, Amazon Textract, and Amazon Transcribe. ISO/IEC 42001 is an international management system standard that outlines requirements and controls for organizations to promote the responsible development and use … Read more