Confronting Multimodal LLM Challenges: Reasoning Gaps and Safety Trade-offs in Phi-3-Vision

Table of Links: Abstract and 1 Introduction; 2 Technical Specifications; 3 Academic benchmarks; 4 Safety; 5 Weakness; 6 Phi-3-Vision (6.1 Technical Specifications, 6.2 Academic benchmarks, 6.3 Safety, 6.4 Weakness); References; A Example prompt for benchmarks; B Authors (alphabetical); C Acknowledgements

6.4 Weakness: Regarding the multi-modal LLM capabilities of our Phi-3-Vision, it performs admirably across various …

Benchmarking Multimodal Safety: Phi-3-Vision’s Robust RAI Performance

6.3 Safety: To ensure the integration of Phi-3-Vision aligns with Microsoft’s Responsible AI (RAI) principles, …

How Concept Frequency Affects AI Image Accuracy

Table of Links: Abstract and 1. Introduction; 2 Concepts in Pretraining Data and Quantifying Frequency; 3 Comparing Pretraining Frequency & “Zero-Shot” Performance and 3.1 Experimental Setup; 3.2 Result: Pretraining Frequency is Predictive of “Zero-Shot” Performance; 4 Stress-Testing the Concept Frequency-Performance Scaling Trend and 4.1 Controlling for Similar Samples in Pretraining and Downstream Data; 4.2 Testing …

What Sequent Calculus Teaches Us About Computation

Table of Links: Introduction; Translating To Sequent Calculus (2.1 Arithmetic Expressions, 2.2 Let Bindings, 2.3 Top-level Definitions, 2.4 Algebraic Data and Codata Types, 2.5 First-Class Functions, 2.6 Control Operators); Evaluation Within a Context (3.1 Evaluation Contexts for Fun, 3.2 Focusing on Evaluation in Core); Typing Rules (4.1 Typing Rules for Fun, 4.2 Typing Rules for …)

Across Metrics and Prompts, Frequent Concepts Outperform in Zero-Shot Learning

What 34 Vision-Language Models Reveal About Multimodal Generalization

What Codata, Control Flow, and Logic Teach Us About Programming

How Dataset Diversity Impacts AI Model Performance

Meet Session App, Winner of Startups of The Year 2024 in Zug, Switzerland

Tell us about Session. Session is a messaging app that protects user privacy by eliminating metadata exposure. It offers fully anonymous sign-up, with no phone number or email required, and uses decentralisation and onion routing to shield users from surveillance. Session is a user-friendly alternative to centralised messaging apps that remain vulnerable to tracking, third …

Did Two AIs Just Show Emergent Behavior?

My homegrown experiment might just say yes. Traditional AI systems are designed to react to user input with pre-trained responses, exhibiting no independent reasoning or self-awareness. However, emergence, where simple systems interacting in complex ways produce outcomes greater than their parts, is often seen as a necessary precursor to true artificial general intelligence …

Sequent Calculus vs CPS: A Compiler’s Perspective on Consumers and Evaluation Strategies

AI Writes Code Now—So Why Do Developers Still Matter?

Artificial intelligence has steadily become a major talking point in the tech community, fueling both excitement and anxiety about its future impact on various industries – especially software development. The potential for AI-driven tools to transform coding workflows is undeniably fascinating, yet it also prompts significant questions about our roles as developers in an increasingly …

Why Compiler Writers Care About Case-of-Case

‘Let It Wag!’ and the Limits of Machine Learning on Rare Concepts

AI Training Data Has a Long-Tail Problem

AI Models Trained on Synthetic Data Still Follow Concept Frequency Trends

Analyzing the Impact of Pretraining Frequency on Zero-Shot Performance in Multimodal Models

How AI Models Count and Match Concepts in Images and Text

What 300GB of AI Research Reveals About the True Limits of “Zero-Shot” Intelligence

Authors: (1) Vishaal Udandarao, Tubingen AI Center, University of Tubingen, University of Cambridge, and equal contribution; (2) Ameya Prabhu, Tubingen AI Center, University of Tubingen, University of Oxford, and equal contribution; (3) Adhiraj Ghosh, Tubingen AI Center, University of Tubingen; (4) Yash Sharma, Tubingen AI Center, University of Tubingen; (5) Philip H.S. Torr, University …

Grok is “Improved” According to Elon, But It’s Raising More Concerns Now Than Ever

Friday morning started with a bold claim from Elon Musk: his AI chatbot Grok has gotten a major upgrade. “We have improved @Grok significantly. You should notice a difference when you ask Grok questions,” Musk posted on X (formerly Twitter). But within hours, people were asking a very different question: What exactly did he improve? Because if …