Multilevel Profiling of Situation and Dialogue-based Deep Networks: EMTD Dataset

:::info
Authors:

(1) Dinesh Kumar Vishwakarma, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India;

(2) Mayank Jindal, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India;

(3) Ayush Mittal, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India;

(4) Aditya Sharma, Biometric Research Laboratory, Department of Information Technology, Delhi Technological University, Delhi, India.

:::

Table of Links

Abstract and Intro
Background and Related Work
EMTD Dataset
Proposed Methodology
Experiments
Conclusion and References

3. EMTD Dataset

Datasets used in previous literature lack a uniform composition of movie genres. Hence, we propose EMTD (English Movie Trailer Dataset), which consists of around 2000 unique Hollywood movie trailers downloaded from IMDB and covers 5 genres: action, comedy, horror, romance, and science fiction. The dataset is extracted from IMDB by a web scraping procedure as follows: (1) fetch the list of movie titles available on IMDB (with at least one genre in common with those mentioned above), (2) scrape the metadata corresponding to each movie title, including the trailer download link, and (3) download the trailer (.mp4) for each link into a folder, and record the information/metadata about the movie, including trailer name, description, plot, keywords, and genres, in a CSV file. In this work, the dataset is partitioned into a train set (1700 trailers) and a validation set (300 trailers), as shown in Table 1.
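The bookkeeping around the procedure above can be sketched as follows. This is a minimal illustration, not the authors' scraper: it makes no network calls, and the function names, CSV column names, `|` separator, and seeded random shuffle are all assumptions made for the example (the paper does not state how the 1700/300 partition was drawn).

```python
import csv
import io
import random

# The five EMTD genres named in the text.
TARGET_GENRES = {"action", "comedy", "horror", "romance", "science fiction"}


def has_target_genre(genres):
    """Step (1) filter: keep a title if it shares at least one target genre."""
    return any(g.strip().lower() in TARGET_GENRES for g in genres)


def write_metadata_csv(records, handle):
    """Step (3) bookkeeping: one CSV row per trailer (column names are hypothetical)."""
    fields = ["trailer_name", "description", "plot", "keywords", "genres"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        # Multi-valued fields are flattened with an assumed "|" separator.
        row["keywords"] = "|".join(rec["keywords"])
        row["genres"] = "|".join(rec["genres"])
        writer.writerow(row)


def split_dataset(items, n_train, seed=0):
    """Shuffle with a fixed seed (an assumption), then split into train/validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Hypothetical record standing in for scraped metadata.
    records = [
        {
            "trailer_name": "example_trailer.mp4",
            "description": "placeholder description",
            "plot": "placeholder plot",
            "keywords": ["placeholder", "keywords"],
            "genres": ["Action", "Comedy"],
        }
    ]
    kept = [r for r in records if has_target_genre(r["genres"])]
    buffer = io.StringIO()
    write_metadata_csv(kept, buffer)
    print(buffer.getvalue())

    # 2000 trailer ids split into 1700 train and 300 validation items.
    train_ids, val_ids = split_dataset(range(2000), n_train=1700)
    print(len(train_ids), len(val_ids))
```

Since a trailer can carry several genres, a plain random split may leave the genre proportions slightly uneven between the two sets. A stratified split would be the alternative if that balance matters.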


The study is restricted to the above genres because they are the ones most commonly observed in movies. We also want to first explore the performance of our architecture on a small set of genres, so we choose only 5 genres rather than a broad set.

:::info
This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.

:::
