How to Use the Data Scraping Crawler

Slot usage

1 slot

Estimated execution time

~6 seconds per domain

What you'll need

One or more website URLs

→ See the full breakdown of all input fields in the detailed section below.

What you’ll get

Public email addresses and phone numbers found on visited pages
Social links (Facebook, Instagram, X/Twitter, LinkedIn, YouTube)

→ See the full breakdown of all output fields in the detailed section below.

Before you start

This Phantom extracts public contact details from websites, usually company-level emails such as info@ or contact@, not personal ones.
Safe usage:
- Crawling websites can be resource-intensive. Always set reasonable exit conditions to prevent the Phantom from running indefinitely on one site.
- Avoid deep crawling (a website depth of 2 or more), as it can cause very long execution times.

Step 1: Provide website inputs

Tell the Phantom which websites to crawl. You can provide inputs in any of these formats:

My Lists:
Choose a saved LinkedIn Leads list you’ve already created in PhantomBuster.
A URL:
- Paste a single website URL directly in the setup field.
- Paste the URL of a Google Sheet with your website URLs (make sure it’s shared with “Anyone with the link”).
- Or upload a CSV file with your website URLs (make sure it’s publicly accessible, and note that CSV upload is only available on paid plans).
  
  → If you’re using a spreadsheet, the Phantom defaults to the first column (A). To use a different column, enter the column’s header name in the field “Name of column containing websites.”
My Phantoms:
Use results from another Phantom as input.
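
To see why the column setting matters, here is a minimal Python sketch of the column rule described above. It assumes your sheet is exported as CSV with a header row. The sample data, the `website` header, and the `read_websites` helper are hypothetical. This is not how PhantomBuster reads your sheet internally. It only illustrates "first column by default, otherwise the named column".

```python
import csv
import io

# Hypothetical sheet exported as CSV; the "website" header is an assumption.
SAMPLE_CSV = """company,website
Acme,https://acme.example
Globex,https://globex.example
"""


def read_websites(csv_text, column_name=None):
    """Return the non-empty values of one column from a CSV export.

    With no column name, use the first column (A); otherwise use the column
    whose header matches column_name (raises ValueError if it doesn't exist).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    index = 0 if column_name is None else header.index(column_name)
    return [
        row[index].strip()
        for row in reader
        if len(row) > index and row[index].strip()
    ]


print(read_websites(SAMPLE_CSV))             # first column: company names, not URLs
print(read_websites(SAMPLE_CSV, "website"))  # the named column: the URLs
```

In this sample the URLs are in column B. Without a column name you would get the company names instead of websites.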

PhantomBuster Data Scraping Crawler step 1 providing website inputs

Step 2: Select the data to extract

Choose the type of information you want to extract from each website:

Email addresses (enabled by default)
Facebook pages
Instagram profiles
Twitter profiles
LinkedIn pages
YouTube channels
Phone numbers
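
For intuition about what each option looks for, the sketch below shows one simple way to spot such data in a page's HTML. It uses a regular expression for emails, `tel:` links for phone numbers, and link hostnames for social profiles. This is an assumption-based illustration, not the Phantom's actual detection logic. The output keys reuse the field names from the output table further down, and the patterns are deliberately simplified (real sites often show phone numbers as plain text rather than `tel:` links).

```python
import re
from urllib.parse import urlparse

# Simplified patterns for illustration only (not the Phantom's actual rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TEL_RE = re.compile(r'href="tel:([^"]+)"')
HREF_RE = re.compile(r'href="(https?://[^"]+)"')

# Output field -> hostnames that identify that platform.
SOCIAL_DOMAINS = {
    "facebookUrl": ("facebook.com",),
    "instagramUrl": ("instagram.com",),
    "twitterUrl": ("twitter.com", "x.com"),
    "linkedinUrl": ("linkedin.com",),
    "youtubeUrl": ("youtube.com",),
}


def matches_domain(host, domains):
    """True if host is one of the domains or a subdomain of one (e.g. www.)."""
    return any(host == d or host.endswith("." + d) for d in domains)


def extract_contacts(html, wanted):
    """Return the first value found for each field listed in `wanted`."""
    result = {}
    if "email" in wanted and (m := EMAIL_RE.search(html)):
        result["email"] = m.group(0)
    if "phoneNumber" in wanted and (m := TEL_RE.search(html)):
        result["phoneNumber"] = m.group(1)
    for url in HREF_RE.findall(html):
        host = (urlparse(url).hostname or "").lower()
        for field, domains in SOCIAL_DOMAINS.items():
            if field in wanted and field not in result and matches_domain(host, domains):
                result[field] = url
    return result


page = (
    '<a href="mailto:info@acme.example">Email</a> '
    '<a href="tel:+1-555-0100">Call</a> '
    '<a href="https://www.linkedin.com/company/acme">LinkedIn</a> '
    '<a href="https://x.com/acme">X</a>'
)
print(extract_contacts(page, {"email", "phoneNumber", "linkedinUrl", "twitterUrl"}))
```

Fields you don't select are simply not searched for, which mirrors unchecking an option in this step.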

PhantomBuster Data Scraping Crawler step 2 selecting the data to extract

Step 3: Define exit conditions

Set the conditions under which the Phantom should stop crawling a site. This helps prevent it from getting stuck or wasting execution time:

Website depth: How many layers of links to follow from the starting page.
- 0 = only the main URL.
- 1 = also follow links from the main page.
- 2 = follow links from those pages as well (not recommended).
When an email is found: Stop crawling once at least one email is collected.
When a phone number is found: Stop after finding a phone number.
When a social link is found: Stop after detecting at least one social profile.
After opening X pages: Stop after a defined number of subpages.
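
Website depth and the exit conditions interact like a breadth-first crawl with a depth limit. The sketch below uses a hypothetical in-memory site (no network access). It shows how depth 0, 1, and 2 change which pages get opened, and how "stop when an email is found" and "after opening X pages" cut the crawl short. Names such as `crawl` and `SITE` are illustrative. Only the email exit condition is modeled; phone and social-link conditions would work the same way.

```python
from collections import deque

# Hypothetical in-memory site: page URL -> (links on that page, emails on that page).
SITE = {
    "https://acme.example/": (["https://acme.example/about", "https://acme.example/blog"], []),
    "https://acme.example/about": (["https://acme.example/contact"], []),
    "https://acme.example/blog": ([], []),
    "https://acme.example/contact": ([], ["info@acme.example"]),
}


def crawl(start, max_depth=1, max_pages=100, stop_on_email=True):
    """Breadth-first crawl with a depth limit and two exit conditions.

    max_depth: 0 = start page only, 1 = also its direct links, and so on.
    max_pages: stop after opening this many pages ("After opening X pages").
    stop_on_email: stop as soon as at least one email has been collected.
    """
    queue = deque([(start, 0)])
    seen = {start}
    visited, emails = [], []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        links, found = SITE.get(url, ([], []))
        visited.append(url)
        emails.extend(found)
        if stop_on_email and emails:
            break  # exit condition: an email was found
        if depth < max_depth:  # only follow links while under the depth limit
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited, emails


for depth in (0, 1, 2):
    pages, emails = crawl("https://acme.example/", max_depth=depth)
    print(depth, len(pages), emails)
```

With `max_depth=1`, the contact page two links away is never reached. Raising the depth to 2 finds the email, but it also opens every page one level further out. This is why deeper crawls become slow on larger sites.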

PhantomBuster Data Scraping Crawler step 3 defining exit conditions

Step 4: Configure behavior

Number of milliseconds to wait before scraping the page (optional):

→ This setting controls how long the Phantom waits after loading a page and before extracting data from it. Adding a short delay can help mimic human behavior and improve success on sites with basic anti-bot protections.
- Default = empty (no delay is applied).
Number of websites to process per launch (optional):
- Default = 5.
- If you clear the field and leave it empty, the Phantom processes all provided websites in one run.

If your input websites contain a large number of pages, start by crawling 5–10 pages per website for smoother performance.
→ Processing too many pages at once can cause the Phantom to run for too long or fail to finish successfully.
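
To size your launches, you can combine the "~6 seconds per domain" estimate from the top of this guide with the number of websites per launch. The helper below is a rough back-of-the-envelope sketch, and its name is hypothetical. Real runs take longer when you crawl more pages per site or add a delay before scraping.

```python
import math

SECONDS_PER_DOMAIN = 6  # rough figure from "Estimated execution time" above


def plan_launches(total_websites, per_launch=5):
    """Estimate how many launches you need and roughly how long a full launch takes.

    per_launch=None (or 0) models leaving "Number of websites to process per
    launch" empty, i.e. every website in a single run.
    """
    if total_websites <= 0:
        return 0, 0
    if not per_launch or per_launch > total_websites:
        per_launch = total_websites
    launches = math.ceil(total_websites / per_launch)
    return launches, per_launch * SECONDS_PER_DOMAIN


print(plan_launches(42))        # default of 5 websites per launch
print(plan_launches(42, None))  # field left empty: one long run
```

For 42 websites, the default setting means 9 short launches of about 30 seconds each. Leaving the field empty means a single run of roughly 4 minutes.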

PhantomBuster Data Scraping Crawler step 4 configuring crawl behavior

Advanced settings (dropdown in setup)

Scrape multiple results per website (optional):
- By default, the Phantom returns only one result per input so that each website in your list matches a single result in your output file.
- If you want to capture multiple emails or social profiles from the same domain, make sure to enable this setting.
Only visit web pages that start with a particular root URL (optional):
- Limit the Phantom to pages whose URL starts with a specific prefix, such as one subdomain or one section of a site.
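
You can picture these two options as a URL filter and an output-shaping rule. The sketch below uses hypothetical helper names and a hypothetical `website` key. `allowed` keeps only URLs that start with the root prefix. `to_rows` returns either one row per website or one row per finding when multiple results are enabled. Producing a single empty row when nothing is found is an assumption made for the example.

```python
def allowed(url, root_prefix=None):
    """Keep a URL only if it starts with the configured root prefix (if one is set)."""
    return root_prefix is None or url.startswith(root_prefix)


def to_rows(website, findings, multiple=False):
    """Turn per-page findings into output rows for one input website.

    findings: list of dicts such as {"email": "..."}, in the order they were found.
    multiple=False keeps only the first finding, so one website -> one row.
    Returning one row with no data when nothing was found is an assumption.
    """
    if not findings:
        return [{"website": website}]
    chosen = findings if multiple else findings[:1]
    return [{"website": website, **finding} for finding in chosen]


pages = ["https://acme.example/contact", "https://shop.acme.example/cart"]
print([p for p in pages if allowed(p, "https://acme.example/")])

findings = [{"email": "info@acme.example"}, {"email": "sales@acme.example"}]
print(to_rows("https://acme.example", findings))
print(to_rows("https://acme.example", findings, multiple=True))
```

Note that a plain prefix check treats `https://shop.acme.example/` as a different site. It is skipped when the prefix is `https://acme.example/`.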

PhantomBuster Data Scraping Crawler step 4 advanced settings drop-down menus

Result file settings (dropdown in setup)

Name your results file (optional)
- You can customize the file name.
  
  If you rename the file between launches, the Phantom will create a new results file and start processing inputs from scratch.

Step 5: Select launch frequency

Choose how often the Phantom should run:

Launch manually: Start the Phantom yourself whenever you need.
Launch once at a specific time: Schedule a one-time run at a set date and time.
Launch repeatedly: Schedule regular runs (e.g., once per day or several times during working hours).
Launch after another Phantom: Chain automations together so this Phantom starts right after another finishes.
Advanced scheduling: Customize the exact minutes, hours, days, or months when the Phantom should run.

→ For a complete walkthrough of scheduling options, see our guide to scheduling Phantoms automatically.

PhantomBuster Data Scraping Crawler step 5 selecting launch frequency

Step 6 (Optional): Advanced settings

Advanced settings are available if you want to fine-tune how your Phantom runs, but by default they’re already optimized for most use cases.

We recommend leaving them as they are unless a guide specifically instructs you to change something.

→ For a detailed overview of all advanced options (like execution limits, retries, email notifications, proxies, webhooks, and file management), see our Advanced settings guide.

What you give (Input) and What you get (Output)

This section gives you a detailed breakdown of everything you need to provide to run this Phantom, and everything you’ll receive once it completes.

What you give (Input)

Type	Description
Websites	Website URLs

What you get (Output)

Field	Description
email	Email address
facebookUrl	Facebook URL
instagramUrl	Instagram URL
linkedinUrl	LinkedIn URL
phoneNumber	Phone number
twitterUrl	X/Twitter URL
youtubeUrl	YouTube URL
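
Once you've downloaded the CSV, you can check coverage with a few lines of Python. This sketch assumes the column names listed in the output table. Because `csv.DictReader` looks up columns by header name, extra columns or a different column order in your file won't break it. The inline sample stands in for your own results file, and the file name in the comment is hypothetical.

```python
import csv
import io

# Column names taken from the output table above.
SOCIAL_COLUMNS = ["facebookUrl", "instagramUrl", "linkedinUrl", "twitterUrl", "youtubeUrl"]


def summarize(rows):
    """Count rows with an email, a phone number, and at least one social link."""
    return {
        "with_email": sum(1 for r in rows if r.get("email")),
        "with_phone": sum(1 for r in rows if r.get("phoneNumber")),
        "with_social": sum(1 for r in rows if any(r.get(c) for c in SOCIAL_COLUMNS)),
    }


# Stand-in for a downloaded results file. With a real file you would use
# open("result.csv", newline="", encoding="utf-8") and your own file name.
sample = io.StringIO(
    "email,phoneNumber,facebookUrl,instagramUrl,linkedinUrl,twitterUrl,youtubeUrl\n"
    "info@acme.example,,,,https://www.linkedin.com/company/acme,,\n"
    ",+1-555-0100,,,,,\n"
)
rows = list(csv.DictReader(sample))
print(summarize(rows))
```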

Launch and results

When you’re ready:

Click Launch to start your Phantom.
Once it finishes, open the Results tab in the Phantom console.
Download your results as a CSV or JSON file.

→ To learn how to export your data to Google Sheets, integrate with other tools, or reuse it in more automations, check our Access and Export your Phantom Results guide.

Export and input limits on the Free plan
If you’re on the Free plan or Free trial, some features are limited:
- CSV exports include only the first 10 rows of results.
- CSV download links (for dynamic viewing in Google Sheets or integrations) are not available.
- JSON exports are not available.
- CSV upload as an input method is not supported.
To unlock all features, you’ll need to upgrade to a paid plan.

Tips and troubleshooting

Common pitfalls

Using a private spreadsheet (make sure it’s set to “Anyone with the link”).
Setting a website depth too high (causing very long or stuck runs).
Not defining any exit condition, leading to excessive crawling.
If the logs show “Page can’t be loaded”, try increasing the Number of milliseconds to wait before scraping the page to 10,000 ms (10 seconds) to give the page more time to load.
- If the logs then show “Couldn't open this website (Connection refused)”, the page can’t be extracted: the website has built-in security that blocks automated access.
Crawl stops after around 100 pages: by default, the Phantom crawls up to 100 subpages per website with a maximum depth of 1 (the main page and its direct links).
→ You can adjust these values in the setup (see Step 3 above), but higher limits can lead to longer or incomplete runs.
Results mostly show generic or support emails: This is expected behavior. The Data Scraping Crawler extracts the public contact details listed on websites, which are usually company-level emails (like info@ or contact@), not personal ones.
→ To find professional emails from company websites, combine this Phantom with the LinkedIn Company Employees Export to get a list of people working at each company, and then run the LinkedIn Profile Scraper to extract professional email addresses for those employees.
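
If you want to separate generic mailboxes from everything else before outreach, a simple prefix check is usually enough. The prefix list below is a hypothetical starting point, not something the Phantom provides. Adjust it to match what appears in your own results.

```python
# Hypothetical starting list of role-based mailbox names; adjust to your results.
GENERIC_PREFIXES = {"info", "contact", "hello", "support", "sales", "admin", "office"}


def is_generic(email):
    """True if the part before "@" is a common role-based mailbox name."""
    local_part = email.strip().split("@", 1)[0].lower()
    return local_part in GENERIC_PREFIXES


def split_emails(emails):
    """Split emails into (generic, other) lists, preserving input order."""
    generic = [e for e in emails if is_generic(e)]
    other = [e for e in emails if not is_generic(e)]
    return generic, other


print(split_emails(["info@acme.example", "Contact@globex.example", "jane.doe@acme.example"]))
```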

If you run into issues

Check how to troubleshoot your Phantom using Logs.
Browse the Fix Issues & Troubleshoot Errors section for solutions to common problems.
Review our Automation Rate Limits by Platform guide.
Check our Best Practices for Social Media Automation guide.

Suggested automations

LinkedIn Company Scraper → Use the LinkedIn company URLs found by the Data Scraping Crawler to extract LinkedIn page data and enrich your lead lists.
AI Advanced Enricher → Summarize or qualify scraped contact info for outreach.
The Data Scraping Crawler collects not just emails and phone numbers, but also links to social media profiles (Instagram, Facebook, YouTube, X/Twitter, and LinkedIn).
- You can pass those links directly into the corresponding Profile Scraper Phantoms to extract detailed information from each platform.
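
To prepare those inputs, you can split the crawler's results by platform. The sketch below groups non-empty, de-duplicated social URLs by output column and writes each group as a one-column CSV. The platform labels and the `url` header are assumptions. Check which input format the Profile Scraper Phantom you're feeding expects.

```python
import csv
import io

# Output column -> platform label (column names from the output table; labels are illustrative).
PLATFORM_COLUMNS = {
    "facebookUrl": "facebook",
    "instagramUrl": "instagram",
    "linkedinUrl": "linkedin",
    "twitterUrl": "twitter",
    "youtubeUrl": "youtube",
}


def group_by_platform(rows):
    """Collect unique, non-empty social URLs per platform, keeping first-seen order."""
    groups = {platform: [] for platform in PLATFORM_COLUMNS.values()}
    for row in rows:
        for column, platform in PLATFORM_COLUMNS.items():
            url = (row.get(column) or "").strip()
            if url and url not in groups[platform]:
                groups[platform].append(url)
    return groups


def to_csv(urls, header="url"):
    """Render one platform's URLs as a one-column CSV (the header name is an assumption)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([header])
    writer.writerows([u] for u in urls)
    return out.getvalue()


results = [
    {"linkedinUrl": "https://www.linkedin.com/company/acme", "twitterUrl": "https://x.com/acme"},
    {"linkedinUrl": "https://www.linkedin.com/company/acme", "youtubeUrl": ""},
]
groups = group_by_platform(results)
print(to_csv(groups["linkedin"]))
```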