Guide to scraping the web

Scraping the web for data can help you scale your email marketing program. Here are a few use cases for scraping the web and bringing that data into your email:

  • Creating a post-purchase request for review by automatically pulling in data about a specific product from your website
  • Creating a “grocery list” reminder email of all items you saved as “want to eat” on a grocery shopping website
  • Highlighting a new-in product with all its details

In this example, we’ll look at how you can bring a recipe listed online into your email to show off to your subscribers. No more tedious copying and pasting!

On the right, you have the recipe online; on the left, you have the email auto-fetching the recipe ingredients from the website!

Email and the recipe it will pull from

Scraping data with Parcel

Before pulling out recipe ingredients into our email, let’s look at the basics of how this will work.

You can scrape data from a webpage so long as you have a URL (the ultimate key for this to work). The data attribute is optional; you only need it when you want to bring in specific parts of the page. We’ll walk through this further in a bit.

<component>
<scrape name="pricing" url="https://useparcel.com/pricing" data="{features:{selectorAll: '#features tr h4',attr: 'text'},}">
<ul>
<li foreach="feature in pricing.features">${feature}</li>
</ul>
</scrape>
</component>

Using the Parcel website as an example, the component above fetches https://useparcel.com/pricing and pulls the text of every element that matches the CSS selector #features tr h4 into Parcel. The foreach loop then outputs each of these as an item in an unordered list.

Pricing item list from Parcel

By changing the <h4> in the CSS selector to another tag available on the site, we can change the data that comes in. Watch how we change <h4> to <h3> and the content of the email updates.

Swapping CSS selectors
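To make the selector's role more concrete, here is a rough sketch in Python of what "take the text of every element matching a selector" means. This is not how Parcel implements <scrape>; it is a standalone illustration using only the standard library. The sample HTML, the select_all_text function, and the SelectorAllText class are all hypothetical, and the matcher only understands tag names, #id, and .class joined by spaces, on well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def matches_simple(simple, tag, attrs):
    """Check one simple selector: '#id', '.class', or a tag name."""
    if simple.startswith("#"):
        return attrs.get("id") == simple[1:]
    if simple.startswith("."):
        return simple[1:] in (attrs.get("class") or "").split()
    return tag == simple


class SelectorAllText(HTMLParser):
    """Collect the text of every element matching a descendant selector."""

    def __init__(self, selector):
        super().__init__()
        self.chain = selector.split()  # e.g. ['#features', 'tr', 'h4']
        self.stack = []                # currently open elements: (tag, attrs)
        self.open_matches = []         # (stack depth, text parts) per open match
        self.results = []

    def _innermost_matches(self):
        *ancestors, last = self.chain
        tag, attrs = self.stack[-1]
        if not matches_simple(last, tag, attrs):
            return False
        # The remaining selectors must match ancestors in order, outermost first.
        i = 0
        for a_tag, a_attrs in self.stack[:-1]:
            if i < len(ancestors) and matches_simple(ancestors[i], a_tag, a_attrs):
                i += 1
        return i == len(ancestors)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, dict(attrs)))
        if self._innermost_matches():
            self.open_matches.append((len(self.stack), []))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        # Assumes well-formed HTML: this end tag closes the innermost element.
        if self.open_matches and self.open_matches[-1][0] == len(self.stack):
            _, parts = self.open_matches.pop()
            self.results.append("".join(parts).strip())
        self.stack.pop()

    def handle_data(self, data):
        for _, parts in self.open_matches:
            parts.append(data)


def select_all_text(html, selector):
    parser = SelectorAllText(selector)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up markup for illustration; it is not the real pricing page.
SAMPLE_HTML = """
<table id="features">
  <tr><td><h3>Plan A</h3><h4>Feature one</h4></td></tr>
  <tr><td><h3>Plan B</h3><h4>Feature two</h4></td></tr>
</table>
<h4>Outside the table</h4>
"""

print(select_all_text(SAMPLE_HTML, "#features tr h4"))
print(select_all_text(SAMPLE_HTML, "#features tr h3"))
```

Swapping h4 for h3 in the selector changes which elements are collected, which mirrors the update seen in the email. The h4 outside the table is skipped in both cases because it has no ancestor with the id features.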

Key parts of this project

  • Parcel account
  • URL you want to scrape data from
  • Email template
  • Component to specify your selector

Follow along resources

Let’s begin

We’ll start by opening up our base email template into which we want to call data.

Base template

We’ll also need the recipe we want to scrape data from. In this example, I’m pulling from Half Baked Harvest, a site run by a talented chef I’ve followed for years. Once or twice a month, I dive into one of her recipes.

Next, we’ll create a component to call on our URL:

<meta name="label" content="Scrape Ingredients" />
<fieldset></fieldset>
<component>
<scrape name="recipe" url="https://www.halfbakedharvest.com/sesame-garlic-chili-oil-noodles/" data="{ingredients:{selectorAll: '.wprm-recipe-ingredients li',attr: 'text'},}">
<p foreach="ingredient in recipe.ingredients">${ingredient}</p>
</scrape>
</component>

Our scrape component is composed of a few pieces. Let’s work through them!

The <meta> tag is self-closing and primarily serves as a description of what this component is.

<fieldset> allows you to add attributes you want to control on an individual email basis. For some use cases, being able to swap the URL out quickly would be a great way to update the email with a new recipe we’ve recently made. If I only wanted to feature Half Baked Harvest recipes in an email series, I could confidently expose the URL through the <fieldset> tag, because the ingredients on that site are all styled with the same CSS class and the same selector would keep working. But because I’m not going to use Half Baked Harvest recipes exclusively, I can’t assume that ingredients on other websites will be styled the same way. So here the URL and selector are specified directly in the <scrape> tag, and the <fieldset> is left empty.

<scrape> is where we specify the content we’ll be including in our email!

How do we isolate just the ingredients on the Half Baked Harvest recipe? We inspect them to find their selector! We can see that each ingredient is an <li> tag with the class wprm-recipe-ingredient, nested inside an element with the class wprm-recipe-ingredients. That nesting is what our selector, .wprm-recipe-ingredients li, targets.

Inspecting the ingredients

Within our <scrape> tag, we have a few fields.

  • Name - we’ve named this scrape recipe, which is how we refer to its data later (recipe.ingredients)
  • URL - the URL of the recipe we’re calling on
  • Data - here we define ingredients as every <li> tag inside the element with the class wprm-recipe-ingredients, as we saw from inspecting the website. With attr: 'text', we also specify that we want the text of each match.

<p> - in the opening <p> tag, we use the foreach attribute to loop over each of the items that were selected from the website. Each item is then added to the email content in its own paragraph.
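As a rough mental model of the foreach step, the loop takes the list of scraped strings and emits one <p> per item. The Python sketch below mimics that output; it is not Parcel's implementation. The render_foreach function is a hypothetical name, the ingredient strings are placeholders rather than the real recipe, and escaping the text with html.escape is a choice made for this sketch, not something the article states about Parcel.

```python
from html import escape


def render_foreach(items, tag="p"):
    """Return one <tag> element per item, one per line."""
    return "\n".join(f"<{tag}>{escape(item)}</{tag}>" for item in items)


# Placeholder values standing in for recipe.ingredients.
ingredients = ["noodles", "garlic", "sesame seeds"]

print(render_foreach(ingredients))
```

Passing tag="li" instead would give the list-item form used in the pricing example.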

Checking your selector

One more thing you can do to check your selector is open the dev tools on the page you are pulling from, then add a CSS rule that uses that selector and sets a background or outline.

.wprm-recipe-ingredients li {
background: red;
}

You can then see what is selected on the screen:

Highlight your selector

Bringing it all together

Now that our component is built, we can call on it in our email template!

Putting it all together

Voila, our email automatically pulls in the ingredients list, and we’re ready to send it to our subscribers!

If you liked this example and want to see others, let us know! There are a handful of ways that email creators can use the <scrape> tag to bring in data from external sources.