Contextual Advertising Online

Internet advertising is growing much faster than any other advertising vertical. The technology for serving ads online is moving towards automated processes that analyze the page content and the user’s preferences and then match ads against these parameters.

The task at hand was to research methods suitable for automatically matching web documents to ads, build a prototype system, evaluate it, and suggest areas for further development. The goals for the system were high throughput, accurate ad matching, and fast response times. To keep the system scalable, human input was allowed only when ads are added to the system.

The prototype system is based on the vector space model and a tf-idf weighting scheme. The cosine coefficient is used to quantify the similarity between a web document and an ad. Stemming was also implemented, together with a clustering solution that aids ad matching in cases where few matches can be made on the keywords attached to the ads. The system was built with a threaded structure to improve throughput and scalability.

The test results show that ads can be matched accurately to a website’s content using the vector space model and the cosine coefficient. The tests also show that stemming has a positive effect on ad matching accuracy.
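
To make the matching idea concrete, the following is a minimal sketch of tf-idf weighting in a vector space model, with the cosine coefficient used to rank ads against a page. It is not the thesis's implementation: the function names (`tokenize`, `tf_idf_vectors`, `cosine`, `rank_ads`), the whitespace tokenizer, the smoothed idf formula, and the sample ads are all assumptions made for illustration, and stemming and clustering are left out.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; keep purely alphabetic tokens (assumed tokenizer)."""
    return [w for w in text.lower().split() if w.isalpha()]


def tf_idf_vectors(texts):
    """Return one sparse tf-idf vector (dict term -> weight) per input text.

    Assumption: a smoothed idf, log((1 + N) / (1 + df)) + 1, so that terms
    occurring in every text still get a non-zero weight in tiny collections.
    """
    token_lists = [tokenize(t) for t in texts]
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: count each term once per text
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append(
            {term: count * (math.log((1 + n) / (1 + df[term])) + 1.0)
             for term, count in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine coefficient between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_ads(page_text, ads):
    """Rank ads (dict ad_id -> ad text) by cosine similarity to the page, best first."""
    ad_ids = list(ads)
    vectors = tf_idf_vectors([ads[i] for i in ad_ids] + [page_text])
    page_vector = vectors[-1]
    scored = [(ad_id, cosine(page_vector, vec)) for ad_id, vec in zip(ad_ids, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical ads and page text, for illustration only.
    sample_ads = {
        "shoes": "running shoes for marathon training",
        "pasta": "fresh italian pasta and sauce",
    }
    for ad_id, score in rank_ads("tips for marathon running and training plans", sample_ads):
        print(ad_id, round(score, 3))
```

In this sketch the page is treated like one more document in the collection, so its terms contribute to the document frequencies. A production system would more plausibly compute idf over the ad inventory ahead of time and apply a minimum-score threshold before serving an ad.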

Contents

1 INTRODUCTION
1.1 OVERVIEW
1.2 BACKGROUND
1.3 GETUPDATED INTERNET MARKETING
1.4 METHOD
1.4.1 ANALYSIS
1.4.2 DESIGN
1.4.3 IMPLEMENTATION
1.4.4 TEST
1.4.5 EVALUATION
1.5 READING INSTRUCTIONS
2 PROBLEM DESCRIPTION
2.1 PURPOSE
2.2 PROBLEM DEFINITION
2.3 CONSTRAINTS
3 PROBLEM ANALYSIS
3.1 OVERVIEW
3.2 SYSTEM FUNCTIONALITY
3.3 IDENTIFIED PROBLEMS
3.4 IDENTIFIED REQUIREMENTS
3.4.1 DOCUMENT DOWNLOADING
3.4.2 DATA STRUCTURES
3.4.3 DOCUMENT ANALYSIS AND AD RANKING
3.4.4 DUPLICATE WEBSITES
3.4.5 PLATFORM
4 TECHNICAL BACKGROUND
4.1 OVERVIEW
4.2 RANKING MODELS
4.2.1 BOOLEAN MODEL
4.2.2 VECTOR SPACE MODEL
4.2.3 PROBABILISTIC MODEL
4.2.4 COMPARISON OF THE MODELS
4.3 STEMMING
4.3.1 DEFINITION
4.3.2 EXPLANATION
4.3.3 ALGORITHMS
4.4 THESAURUS
4.5 QUERY EXPANSION
4.6 CLUSTERING
4.6.1 BATCH AND ONLINE
4.6.2 CLUSTERING ALGORITHMS
4.6.3 CLUSTER REPRESENTATION
4.7 DUST DETECTION
4.7.1 DUSTBUSTER
4.7.2 SIMHASH
4.8 SIMILARITY COEFFICIENTS
4.8.1 TANIMOTO COEFFICIENT
4.8.2 COSINES COEFFICIENT
5 DESIGN CHOICES
5.1 OVERVIEW
5.2 VECTOR SPACE MODEL
5.3 DUPLICATE DOCUMENTS
5.4 STEMMING
5.5 CLUSTER DENDROGRAM
5.6 SIMILARITY COEFFICIENT
5.7 THREADED SOLUTION
5.8 DATABASE LAYER
6 IMPLEMENTATION
6.1 OVERVIEW
6.2 COMPLETE SYSTEM FLOW CHART
6.3 COMMUNICATION BETWEEN SYSTEM PARTS
6.4 THE RETRIEVER
6.4.1 PURPOSE
6.4.2 IMPLEMENTATION
6.5 THE ANALYZER
6.5.1 PURPOSE
6.5.2 IMPLEMENTATION
6.6 DB INTERACTION
6.6.1 PURPOSE
6.6.2 TABLE DIAGRAM
6.6.3 OPTIMIZATIONS
7 TEST AND EVALUATION
7.1 OVERVIEW
7.2 METHODS TESTED
7.3 TEST GROUP
7.4 TEST DATA
7.5 THE TEST SYSTEM
7.5.1 PURPOSE
7.5.2 PLATFORM
7.5.3 FLOW CHART
7.5.4 FLOW CHART EXPLAINED
7.5.5 THE INTERFACE
7.5.6 EXTRACTING TEST RESULTS
7.6 TEST RESULTS
7.6.1 AD TO DOCUMENT ACCURACY
7.6.2 MATCHING ADS PER DOCUMENT
7.6.3 AVG. WEIGHT SCORE
7.6.4 SYSTEM THROUGHPUT
8 TEST RESULTS DISCUSSION
8.1 OVERVIEW
8.2 RESULTS DISCUSSION
8.2.1 AD TO DOCUMENT ACCURACY
8.2.2 MATCHING ADS PER DOCUMENT
8.2.3 AVG. WEIGHT SCORE
8.2.4 SYSTEM THROUGHPUT
9 CONCLUSIONS AND FUTURE WORK
9.1 OVERVIEW
9.2 CONCLUSIONS
9.3 IDENTIFIED AREAS OF IMPROVEMENT
9.3.1 DENDROGRAM GENERATION
9.3.2 AD MATCHING THRESHOLDS
9.4 SUGGESTIONS FOR FURTHER DEVELOPMENT
9.4.1 WEBSITE RETRIEVAL
9.4.2 LEXICON AND WORD MATCHING
9.4.3 INTRODUCING FEEDBACK
10 GLOSSARY
11 REFERENCES
11.1 LITERATURE
11.2 WEBSITES

Author: Pettersson, Jimmie

Source: Linköping University
