Mir Lec1 Print

  • 7/24/2019 Mir Lec1 Print

    Multimedia Technology

    Lecture 1: Overview and Arrangement

    Lecturer: Dr. Wan-Lei Zhao

    Autumn Semester 2015

    Contact: [email protected]

    All rights are reserved by Wan-Lei zhao

    Outline

    1 About this Course

    2 Syllabus

    3 Course plan

    4 Brief History about IR and Web

    5 Brief History about WWW

    About this Course

    Major subjects
      - Deal with information such as text, images and video:
        text retrieval, content-based image retrieval and video retrieval
      - Focus on how to retrieve the above-mentioned information
      - Popular machine learning approaches will be covered:
        K-means, SVM and decision trees
      - Popular model fitting approaches will be covered:
        RANSAC and the Hough transform
      - Popular algorithms in computer vision will be covered:
        SIFT, BoVW and Hamming Embedding

    Objectives
      - Introduce you to this interesting topic
      - Get you familiar with basic and popular algorithms in this field
      - Be able to build a simple but workable search engine on your own
      - Be able to apply these algorithms to solve problems in your own field

    Syllabus

    Text Retrieval (42 hours)
      - Brief History about IR and Web
      - Pre-processing on Text Information
      - Three Retrieval Models: Boolean, vector and probabilistic models
      - Evaluation Measures
      - Web Search
      - Parallel Computing in IR

    Machine Learning Approaches (22 hours)
      - K-means
      - Spectral Clustering
      - Decision Tree
      - K-Nearest Neighbour
      - Support Vector Machine (SVM)

    Nearest Neighbour Search (12 hours)
      - R-Tree
      - KD-Tree
      - Locality Sensitive Hashing
      - Product Quantizer

    Syllabus

    Model Fitting
      - RANSAC
      - Hough Transform

    Image & Video Retrieval (22 hours)
      - Challenges & Trends
      - Image Features: SIFT and others
      - BoVW Framework
      - Fisher Kernel Framework
      - Challenges in Video Retrieval
      - Temporal Verification Approach

    Image Classification and Miscellaneous Topics (12 hours)
      - Challenges & Trends
      - One-against-all Framework
      - Tricks in Model Training
      - Convolutional Neural Network

    Syllabus

    Course work in the lab (32 hours)
      - Three experiments on subjects that you learn in class
      - The subjects are kept secret until the lab session
      - Each experiment is also a quiz: 10 marks for each experiment
      - NO team work!!!
      - Late submission is allowed, but with a 30% discount

    Presentation of the course project (22 hours)
      - Two course projects, implemented after class
      - Team work is encouraged, but size(team) ≤ 4
      - 15 minutes for each team to present its project
      - A hardcopy of the project report is also required

    Syllabus

    Prerequisites of this course

      - Data Structures
          You have to be familiar with them; otherwise, you are advised not to take this course
      - Good command of C/C++
          It will be used in the lab and is recommended for your course project
      - Basic knowledge about the Internet
          Internet protocols, the mechanism of the WWW, HTML and JavaScript
      - Matlab is a plus
          It will be used in the lab; if you do not know it, it does not matter, since you will learn its basics during this course

    Syllabus

    Teaching assistant for this course

    Mr. Zhihui Chen will be in charge of issues related to the course project

    Miss Haihui Liu helps proofread the course materials

    Experiment lectures are held in the Laboratory Building, Room 501

    Time slot: 2:30pm-4:20pm in the 6th, 8th and 10th weeks; I will remind you one week ahead

    Syllabus

    Course website

    Online teaching platform of XMU
      - URL: l.xmu.edu.cn; please go there and register for the course
      - Password: 007

    Syllabus

    Language in the Class

    English or Chinese?

    You might be uncomfortable at the beginning. Me too :)

    Several advantages:
      - Computer science is defined in English
      - It gets you used to English

    Syllabus

    Intersection of four disciplines

    Related (top-ranked) conferences:
      - ACM SIGIR, WWW
      - ACM MM, ACM ICMR & IEEE ICME
      - IEEE CVPR & ECCV
      - IEEE ICCV, ACCV & BMVC
      - ICML & AAAI

    Syllabus

    Related (top-ranked) journals:
      - IEEE Trans. on Knowledge and Data Engineering
      - IEEE Trans. on Pattern Analysis and Machine Intelligence
      - International Journal of Computer Vision
      - IEEE Trans. on Multimedia
      - IEEE Trans. on Image Processing
      - Computer Vision and Image Understanding

    Reference books:
      - R. Baeza-Yates et al., Modern Information Retrieval: The Concepts and Technology behind Search (2nd edition)
      - Richard Szeliski, Computer Vision: Algorithms and Applications
      - Lecture notes of Machine Learning by Dr. Andrew Ng, Stanford University

    Related papers will be assigned as reading.

    Online resources:
      - Youku
      - Wikipedia
      - Baidu Baike

    Question: can our brain understand how our brain works?
    We are going to get a taste of how tough this question is from two aspects:

    1 Computer Vision
    2 Machine Learning

    Course plan

    Evaluation: 3 lab experiments + 2 course projects
      S = 30% + 35% + 35%

    About course projects
      - Implemented in C, C++/Python or Matlab; if you do not know Python or Matlab, learn it!!
      - Sample code will be given; you only need to fill in the blanks
      - Team work is encouraged for the two course projects
      - The team leader will be marked 5 credits higher or lower depending on the performance
      - A report (only for the second project) and a presentation (for both) are required (in English if possible)

    Failure is acceptable, but no cheating or plagiarism
      - If it happens, you are OUT!!

    Any questions?
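
    A worked example of the evaluation weights, as a sketch under stated assumptions: each lab experiment is scored out of its 10 marks (so the three labs make up the 30%), each course project is scored out of 100 and weighted at 35%, and the 30% discount for a late lab submission is read as keeping 70% of that lab's marks. The late factor \lambda_i and every score below are hypothetical, and the team leader's 5-credit adjustment is left out.

```latex
S = \sum_{i=1}^{3} \lambda_i m_i + 0.35\,P_1 + 0.35\,P_2,
\qquad m_i \in [0, 10],\quad P_1, P_2 \in [0, 100],\quad
\lambda_i = \begin{cases} 1 & \text{lab } i \text{ submitted on time} \\ 0.7 & \text{lab } i \text{ submitted late} \end{cases}

% Hypothetical student: m = (10, 10, 8) with lab 3 submitted late, P_1 = 80, P_2 = 90
S = (10 + 10 + 0.7 \times 8) + 0.35 \times 80 + 0.35 \times 90 = 25.6 + 28 + 31.5 = 85.1
```

    With full marks everywhere the formula gives 30 + 35 + 35 = 100.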

    Be an Active Learner

    Level 1
      - Catch the concept

    Level 2
      - Understand the idea
      - Know how to use it

    Level 3
      - Be able to re-implement the algorithms
      - Know where they work
      - Know where they fail


    Brief History about IR and Web

    Human Languages (1)

    About 7,000 languages in the world

    90% of these languages are used by fewer than 100,000 people

    Based on your knowledge and imagination, please list the top 5 most widely used languages

    Give their ranking as well. Do it now ...

    Human Languages (1)

    Language   Speakers      Category              Region
    Mandarin   1.2 billion   isolating language    China
    English    508 million   inflecting language   UK, North America
    Hindi      497 million   inflecting language   India & Pakistan
    Spanish    392 million   inflecting language   Spain & South America
    Russian    277 million   inflecting language   Russia & Eastern Europe

    This course mainly covers retrieval on English documents, with a little about processing Chinese documents

    Human Languages (2)

    Figure: Weights of real impact on the world.

    In terms of real influence, the ranking changes (1)

    Influence: economic and political power, size of population and number of countries

    (1) Conducted by Webb.

    Distribution of World Languages

    Note that not all languages have a written form

    Evolution of Storage Media

      - Egyptian papyrus (2)
      - Babylonian clay tablets (3000 B.C.)
      - Chinese oracle bones (1400 B.C.)

    In 105 A.D., paper was invented in China

    (2) Papyrus is not paper in the real sense.

    Story of Rosetta Stone

    Written in both ancient Egyptian and Greek; discovered in 1799

    Inscribed in 196 BC on behalf of King Ptolemy V

    Key to understanding ancient Egyptian: J.-F. Champollion decoded the language

    "Library" comes from the Latin word liber, meaning book

    "Bibliothek" comes from the Greek word biblion, meaning a book written on papyrus

    Spread of ancient civilizations

    Five ancient civilizations: ancient Egypt, ancient Babylon, ancient India, ancient China, ancient Maya

    The first library (as far as we know) was established in northern Syria, around 3000 BC

    Later, the Assyrian Empire built the Library of Nineveh (near present-day Mosul), which was lost when Nineveh fell in 612 BC

    The best-known library, the Library of Alexandria in Egypt, was founded in the 3rd century BC in the city established by Alexander the Great

    In China, libraries appeared around 800 BC

    Evolution of Storage Media

    After the advent of computers

    IR in two different eras

                   Before WWW                        WWW era
    Media          text documents, TV, film & CD     electronic forms
    Publishing     months or years                   hours
    Storage        books & papers                    discs, DVDs, etc. & the web
    Indexing       title, author, keywords & date    contents
    Interface      library                           browser

    According to IBM, 90% of the world's data was created in the last two years

    Powerful IR systems are required to coordinate the distribution of information and knowledge

    Brief History about WWW

    The Birth of WWW

    1981-1991: the invention of the Web
      - In 1980, Tim Berners-Lee worked at CERN (European Organization for Nuclear Research)
      - Goal: manage information so that physicists could share it
      - In 1984, he returned to CERN
      - In 1989, he wrote a proposal for a large hypertext database
      - By Christmas 1990, he had built all the necessary elements of the Web: HTTP, HTML, a web browser and httpd (a web server)

    The growth of the World Wide Web

    Early times of growth (1991-1995)
      - Microsoft Windows got its first browser: Cello
      - Mosaic (from UIUC) was the first successful browser
      - W3C was founded by Berners-Lee in 1994 at MIT

    Commercialization (1996-1998)
      - More and more dot-coms appeared

    Boom and Bust (1999-2001)
      - More and more dot-coms appeared
      - The Internet became popular in China
      - Many currently well-known companies were established: Baidu, Alibaba
      - Search engines were born

    The growth of the World Wide Web

    Early times of growth (1991-2001)
      - The first version of Java was released in 1995
      - The first version of PHP was released in 1995
      - JavaScript was invented at Netscape in 1995
      - From the static web to the dynamic web
      - Strong support for multimedia

    WWW is everywhere

    Ubiquitous web (2002-present)
      - The introduction of Web 2.0 is the milestone
      - Wikipedia was born in 2001
      - Flickr was born in 2004
      - Facebook was born in 2004
      - YouTube was born in 2005
      - Twitter was born in 2006
      - The iPhone was released in 2007

    All technologies and media are intertwined to reshape the world

    They impact many aspects of our daily lives

    IR has become the main interface to them all

    Semantic Web

    Web 3.0 (20??), proposed by Berners-Lee (3)
      - Websites are linked by semantic metadata
      - Machines build the links automatically
      - Requires natural language understanding technology
      - Still a vague concept
      - Automatic documenting, e.g. books and recipes

    (3) Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web, in American Scientific, 2000

    Statistics on WWW

    Figure: Number of websites and users (2000-2013); x-axis: year, y-axis: number (ticks at 100M, 1B and 2B); series: number of sites, number of users

    The number of users grows much faster than the number of websites

    The number of clicks likely grows even faster

    Challenges in Modern Information Retrieval

    How can we bridge such a semantic gap?

    A word is worth a thousand pictures

    A picture is worth a thousand words

    Scalability in the age of BIG data (1)

    A glance at big data today
      - 1.1 billion websites as of Nov. 2014
      - >3,000 images uploaded to Flickr every minute (4)
      - >200,000 videos uploaded per day to YouTube (>1,000 years)
      - TV news: thousands of hours of programs broadcast each day
      - >100 billion photos on Facebook as of Jun. 2011

    Challenges: facilitating fast browsing and sharing
      - How to store?
      - How to organize?
      - How to retrieve?

    (4) Statistics collected on Apr. 28th, 2010.

    Scalability in the age of BIG data (2)

    Given the thickness of one photo: 0.2 mm
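
    A back-of-the-envelope sketch of what this thickness implies, assuming the >100 billion Facebook photos (as of Jun. 2011) quoted in the big-data statistics above are stacked one on top of another; the stacking picture is an assumption about the slide's missing figure, and N = 100 billion is only a lower bound.

```latex
h = N \times t = (100 \times 10^{9}) \times 0.2\,\text{mm}
  = 2 \times 10^{10}\,\text{mm} = 2 \times 10^{7}\,\text{m} = 20{,}000\,\text{km}
```

    That is roughly half of the Earth's equatorial circumference (about 40,075 km), for the photos of a single website.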

    Top Rank Search Engines

    Google takes the lion's share of the market

    Baidu is not in the ranking (unfortunately) (5)

    (5) Cited from: http://www.ebizmba.com/articles/search-engines

    Sketch the framework of a search engine

    Draw the framework of a search engine in 5 minutes

    Include every element you can think of. Do it now ...

    Framework of a search engine

    Observations
      - Information is highly distributed across the Internet
      - The indexer (search engine) keeps the information in a centralized manner
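
    One way to picture the centralized side of this framework is an inverted index: a map from each term to the documents that contain it, built from pages fetched from many different sites. The following is a minimal Python sketch, not the course's reference implementation; the function names, the lower-case whitespace tokenizer and the sample pages are all hypothetical, and it answers only Boolean AND queries.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term].add(doc_id)
    return dict(index)


def boolean_and(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents that contain every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting  # intersect posting lists one term at a time
    return result


if __name__ == "__main__":
    # Hypothetical pages that a crawler might have fetched from different hosts.
    pages = {
        "siteA/page1": "image retrieval with SIFT features",
        "siteB/page7": "text retrieval and the vector model",
        "siteC/page3": "video retrieval and image features",
    }
    idx = build_inverted_index(pages)
    print(sorted(boolean_and(idx, "image retrieval")))
```

    Running it prints ['siteA/page1', 'siteC/page3'], the two pages containing both "image" and "retrieval". A real indexer would also normalize terms (e.g. stemming, stop-word removal) and store term frequencies or positions so that results can be ranked rather than just matched.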

    Structure of a crawler

    Observations

      - The crawler plays a very important role
      - Experiences of using Baidu and Google
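
    The core loop of a crawler can be sketched as a breadth-first traversal with a URL frontier (a FIFO queue) and a "seen" set that prevents fetching the same page twice. This is a simplified sketch under loud assumptions: the Web is replaced by a hypothetical in-memory link graph, so there is no downloading, HTML parsing, politeness delay or robots.txt handling, and `crawl` is an illustrative name, not a real library API.

```python
from collections import deque


def crawl(seed: str, link_graph: dict[str, list[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from `seed`; returns URLs in the order they were fetched.

    `link_graph` is a hypothetical stand-in for the Web: URL -> outgoing links.
    """
    frontier = deque([seed])  # URLs waiting to be fetched (FIFO gives breadth-first order)
    seen = {seed}             # URLs already queued, so no page is fetched twice
    fetched: list[str] = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        # A real crawler would download the page here and pass it to the indexer.
        fetched.append(url)
        for link in link_graph.get(url, []):  # links "extracted" from the page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


if __name__ == "__main__":
    # Hypothetical four-page site with a cycle between a and b.
    web = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
    print(crawl("a", web))               # ['a', 'b', 'c', 'd']
    print(crawl("a", web, max_pages=2))  # ['a', 'b']
```

    The `seen` set is what keeps the cycle between a and b from looping forever. A common variation replaces the FIFO queue with a priority queue ordered by estimated page importance, since breadth-first order treats every link equally.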

    Q & A

    Thanks for your attention!
