N-gram Analysis & PPC: NLP, Machine Learning and Understanding N-Grams
N-gram analysis is a statistical method that advertisers can use to discover the most profitable keywords for a given ad campaign.
The definition of an N-gram comes from natural language processing (NLP), an area of applied machine learning behind technologies like voice-to-text, speech recognition, language identification, and auto-correct. An N-gram is defined as a contiguous sequence of N words from a specified sample of text or speech.
For example, we can look at a simple text sample and turn it into N-grams. Our sample can be a basic sentence:
“The quick brown fox jumps over the lazy dog”
If we were to convert this sentence to 1-grams, we would have nine of them, one per word (note that "the" appears twice):
- The
- Quick
- Brown
- Fox
- Jumps
- Over
- The
- Lazy
- Dog
If we converted the sentence to 2-grams, we would have eight of them:
- The quick
- Quick brown
- Brown fox
- Fox jumps
- Jumps over
- Over the
- The lazy
- Lazy dog
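The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `ngrams` is hypothetical, and it assumes words are separated by whitespace and that case does not matter (so everything is lowercased).

```python
def ngrams(text, n):
    """Return the contiguous n-word sequences in text, in order."""
    # Assumption: lowercase and split on whitespace; no punctuation handling.
    words = text.lower().split()
    # A sentence of W words yields W - n + 1 n-grams (none if n > W).
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "The quick brown fox jumps over the lazy dog"
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)

print(len(unigrams), unigrams)  # 9 one-word sequences
print(len(bigrams), bigrams)    # 8 two-word sequences
```

The count drops by one each time N grows by one, which is why the nine-word sentence gives nine 1-grams but only eight 2-grams.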
So, what’s the significance of reorganizing the data this way? Well, if we took a lot of query data for a single ad campaign, we would likely discover that certain keywords and phrases are present in many of the search terms that generated impressions for the campaign. N-gram analysis allows advertisers to see which of these recurring keywords or phrases correlate most strongly with positive ad performance.
When N-gram analysis is applied to PPC, advertisers look at all the keywords that generated impressions or clicks for a given ad campaign – these keywords are taken together as the text sample for the analysis. The next step is to break the keywords into N-grams of the desired length – typically one-, two- or three-word phrases. The analysis then asks three questions:
- How frequently does each N-gram appear throughout the query data?
- How many total clicks are generated by keywords that contain each of the specified N-grams?
- How many total conversions are generated by keywords that contain each of the specified N-grams?
Using this technique, advertisers can identify the specific words and phrases that are most strongly correlated (either positively or negatively) with ad performance for a given campaign.
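The three questions above can be answered with a simple aggregation. The following Python sketch is one possible way to do it, not a prescribed method: the function names, the row layout `(keyword, clicks, conversions)`, and the sample figures are all made up for illustration and do not come from a real campaign or from any ad platform's export format.

```python
from collections import defaultdict


def ngrams(query, n):
    """Return the contiguous n-word sequences in a query (lowercased)."""
    words = query.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def aggregate_by_ngram(rows, n):
    """Total up frequency, clicks and conversions for every n-gram.

    rows is an iterable of (keyword, clicks, conversions) tuples.
    Each row counts toward a given n-gram at most once, even if the
    n-gram repeats inside that keyword.
    """
    totals = defaultdict(lambda: {"queries": 0, "clicks": 0, "conversions": 0})
    for query, clicks, conversions in rows:
        for gram in set(ngrams(query, n)):
            totals[gram]["queries"] += 1          # how often the n-gram appears
            totals[gram]["clicks"] += clicks      # clicks from keywords containing it
            totals[gram]["conversions"] += conversions
    return dict(totals)


# Hypothetical sample data: (keyword, clicks, conversions)
sample_rows = [
    ("cheap running shoes", 40, 1),
    ("running shoes for women", 25, 3),
    ("free running shoes", 30, 0),
]

stats = aggregate_by_ngram(sample_rows, 2)
print(stats["running shoes"])
# {'queries': 3, 'clicks': 95, 'conversions': 4}
```

Running the same function with `n=1` on this invented data would show, for instance, that "free" brought 30 clicks and no conversions, the kind of pattern that might mark a word as negatively correlated with performance.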