How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

This post is co-written by Hesham Fahim from Thomson Reuters. Thomson Reuters (TR) is one of the world’s most trusted information organizations for businesses and professionals. It provides companies with the intelligence, technology, and human expertise they need to find trusted answers, enabling them to make better decisions more quickly. TR’s customers span across the … Read more

Connecting Amazon Redshift and RStudio on Amazon SageMaker

Last year, we announced the general availability of RStudio on Amazon SageMaker, the industry’s first fully managed RStudio Workbench integrated development environment (IDE) in the cloud. You can quickly launch the familiar RStudio IDE and dial up and down the underlying compute resources without interrupting your work, making it easy to build machine learning (ML) … Read more

Use machine learning to detect anomalies and predict downtime with Amazon Timestream and Amazon Lookout for Equipment

The last decade of the Industry 4.0 revolution has shown the value and importance of machine learning (ML) across verticals and environments, with more impact on manufacturing than possibly any other application. Organizations implementing a more automated, reliable, and cost-effective Operational Technology (OT) strategy have led the way, recognizing the benefits of ML in predicting … Read more

2022H2 Amazon Textract launch summary

Documents are a primary tool for record keeping, communication, collaboration, and transactions across many industries, including financial, medical, legal, and real estate. The millions of mortgage applications and hundreds of millions of W2 tax forms processed each year are just a few examples of such documents. Critical business data remains unlocked in unstructured documents such … Read more

How to redact PII data in conversation transcripts

Customer service interactions often contain personally identifiable information (PII) such as names, phone numbers, and dates of birth. As organizations incorporate machine learning (ML) and analytics into their applications, using this data can provide insights on how to create more seamless customer experiences. However, the presence of PII information often restricts the use of this … Read more

Get to production-grade data faster by using new built-in interfaces with Amazon SageMaker Ground Truth Plus

Launched at AWS re:Invent 2021, Amazon SageMaker Ground Truth Plus helps you create high-quality training datasets by removing the undifferentiated heavy lifting associated with building data labeling applications and managing the labeling workforce. All you do is share data along with labeling requirements, and Ground Truth Plus sets up and manages your data labeling workflow … Read more

Announcing the updated Salesforce connector (V2) for Amazon Kendra

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should … Read more

­­Speed ML development using SageMaker Feature Store and Apache Iceberg offline store compaction

Today, companies are establishing feature stores to provide a central repository to scale ML development across business units and data science teams. As feature data grows in size and complexity, data scientists need to be able to efficiently query these feature stores to extract datasets for experimentation, model training, and batch scoring. Amazon SageMaker Feature … Read more

Announcing the updated ServiceNow connector (V2) for Amazon Kendra

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should … Read more

Power recommendations and search using an IMDb knowledge graph – Part 2

This three-part series demonstrates how to use graph neural networks (GNNs) and Amazon Neptune to generate movie recommendations using the IMDb and Box Office Mojo Movies/TV/OTT licensable data package, which provides a wide range of entertainment metadata, including over 1 billion user ratings; credits for more than 11 million cast and crew members; 9 million … Read more

Power recommendation and search using an IMDb knowledge graph – Part 1

The IMDb and Box Office Mojo Movies/TV/OTT licensable data package provides a wide range of entertainment metadata, including over 1 billion user ratings; credits for more than 11 million cast and crew members; 9 million movie, TV, and entertainment titles; and global box office reporting data from more than 60 countries. Many AWS media and … Read more

Accelerate the investment process with AWS Low Code-No Code services

The last few years have seen a tremendous paradigm shift in how institutional asset managers source and integrate multiple data sources into their investment process. With frequent shifts in risk correlations, unexpected sources of volatility, and increasing competition from passive strategies, asset managers are employing a broader set of third-party data sources to gain a … Read more

Automatically retrain neural networks with Renate

Today we announce the general availability of Renate, an open-source Python library for automatic model retraining. The library provides continual learning algorithms able to incrementally train a neural network as more data becomes available. By open-sourcing Renate, we would like to create a venue where practitioners working on real-world machine learning systems and researchers interested … Read more

Create Amazon SageMaker models using the PyTorch Model Zoo

Deploying high-quality, trained machine learning (ML) models to perform either batch or real-time inference is a critical piece of bringing value to customers. However, the ML experimentation process can be tedious—there are a lot of approaches requiring a significant amount of time to implement. That’s why pre-trained ML models like the ones provided in the PyTorch … Read more

New performance improvements in Amazon SageMaker model parallel library

Foundation models are large deep learning models trained on a vast quantity of data at scale. They can be further fine-tuned to perform a variety of downstream tasks and form the core backbone of enabling several AI applications. The most prominent category is large-language models (LLM), including auto-regressive models such as GPT variants trained to complete … Read more

Next generation Amazon SageMaker Experiments – Organize, track, and compare your machine learning trainings at scale

Today, we’re happy to announce updates to our Amazon SageMaker Experiments capability of Amazon SageMaker that lets you organize, track, compare and evaluate machine learning (ML) experiments and model versions from any integrated development environment (IDE) using the SageMaker Python SDK or boto3, including local Jupyter Notebooks. Machine learning (ML) is an iterative process. When solving … Read more

Introducing Fortuna: A library for uncertainty quantification

Proper estimation of predictive uncertainty is fundamental in applications that involve critical decisions. Uncertainty can be used to assess the reliability of model predictions, trigger human intervention, or decide whether a model can be safely deployed in the wild. We introduce Fortuna, an open-source library for uncertainty quantification. Fortuna provides calibration methods, such as conformal … Read more

How to evaluate the quality of the synthetic data – measuring from the perspective of fidelity, utility, and privacy

In an increasingly data-centric world, enterprises must focus on gathering both valuable physical information and generating the information that they need but can’t easily capture. Data access, regulation, and compliance are an increasing source of friction for innovation in analytics and artificial intelligence (AI). For highly regulated sectors such as Financial Services, Healthcare, Life Sciences, … Read more

Augment fraud transactions using synthetic data in Amazon SageMaker

Developing and training successful machine learning (ML) fraud models requires access to large amounts of high-quality data. Sourcing this data is challenging because available datasets are sometimes not large enough or sufficiently unbiased to usefully train the ML model and may require significant cost and time. Regulation and privacy requirements further prevent data use or … Read more

LightOn Lyra-fr model is now available on Amazon SageMaker

We are thrilled to announce the availability of the LightOn Lyra-fr foundation model for customers using Amazon SageMaker. LightOn is a leader in building foundation models specializing in European languages. Lyra-fr is a state-of-the-art French language model that can be used to build conversational AI, copywriting tools, text classifiers, semantic search, and more. You can … Read more

Automatically identify languages in multi-lingual audio using Amazon Transcribe

If you operate in a country with multiple official languages or across multiple regions, your audio files can contain different languages. Participants may be speaking entirely different languages or may switch between languages. Consider a customer service call to report a problem in an area with a substantial multi-lingual population. Although the conversation could begin … Read more

Translate multiple source language documents to multiple target languages using Amazon Translate

Enterprises need to translate business-critical content such as marketing materials, instruction manuals, and product catalogs across multiple languages to communicate with a global audience of customers, partners, and stakeholders. Identifying the source language in each document before calling a translate job creates complexities and adds another step to your workflow. For example, an international product … Read more

Introducing Amazon SageMaker Data Wrangler’s new embedded visualizations

Manually inspecting data quality and cleaning data is a painful and time-consuming process that can take a huge chunk of a data scientist’s time on a project. According to a 2020 survey of data scientists conducted by Anaconda, data scientists spend approximately 66% of their time on data preparation and analysis tasks, including loading (19%), cleaning (26%), … Read more

Start your successful journey with time series forecasting with Amazon Forecast

Organizations of all sizes are striving to grow their business, improve efficiency, and serve their customers better than ever before. Even though the future is uncertain, a data-driven, science-based approach can help anticipate what lies ahead to successfully navigate through a sea of choices. Every industry uses time series forecasting to address a variety of … Read more

Chronomics detects COVID-19 test results with Amazon Rekognition Custom Labels

Chronomics is a tech-bio company that uses biomarkers—quantifiable information taken from the analysis of molecules—alongside technology to democratize the use of science and data to improve the lives of people. Their goal is to analyze biological samples and give actionable information to help you make decisions—about anything where knowing more about the unseen is important. … Read more

Image augmentation pipeline for Amazon Lookout for Vision

Amazon Lookout for Vision provides a machine learning (ML)-based anomaly detection service to identify normal images (i.e., images of objects without defects) vs anomalous images (i.e., images of objects with defects), types of anomalies (e.g., missing piece), and the location of these anomalies. Therefore, Lookout for Vision is popular among customers that look for automated … Read more

Amazon SageMaker JumpStart now offers Amazon Comprehend notebooks for custom classification and custom entity detection

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning (ML) to discover insights from text. Amazon Comprehend provides customized features, custom entity recognition, custom classification, and pre-trained APIs such as key phrase extraction, sentiment analysis, entity recognition, and more so you can easily integrate NLP into your applications. We recently added … Read more

Prepare data from Amazon EMR for machine learning using Amazon SageMaker Data Wrangler

Data preparation is a principal component of machine learning (ML) pipelines. In fact, it is estimated that data professionals spend about 80 percent of their time on data preparation. In this intensive competitive market, teams want to analyze data and extract more meaningful insights quickly. Customers are adopting more efficient and visual ways to build … Read more

Exafunction supports AWS Inferentia to unlock best price performance for machine learning inference

Across all industries, machine learning (ML) models are getting deeper, workflows are getting more complex, and workloads are operating at larger scales. Significant effort and resources are put into making these models more accurate since this investment directly results in better products and experiences. On the other hand, making these models run efficiently in production … Read more