Relate tables

In some cases, you need to scrape data from some pages into one table, then open links on those pages and scrape more content into another table. You then need to relate the two tables.

Relate tables with "URL" and "parent URL"

For example, category pages contain links to many items. You scrape "category" from the category pages into the table "categories", then open every item link on those pages and scrape each item's name, price, and description into the table "items". The item pages have no "category" value, but you want one for each item, so you must relate the tables "categories" and "items". Because each item's URL comes from a category page, the category page is the parent of the item page. Add a new "capture" node that scrapes each category page's URL into the table "categories" in a column named "url", and another "capture" node that scrapes the "parent url" into the table "items" in a column named "parent_url". The two tables are then related through "url" and "parent_url".

Steps

1. When configuring the "scrape page" action for the category page, add a new "capture content" action and change "extract type" to "page attribute" -> "url".

2. When configuring the "scrape page" action for the item page, add a new "capture content" action and change "extract type" to "page attribute" -> "parent url".
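Once both columns are captured, the relation can be used to look up the category for each item. The following is a minimal Python sketch of that lookup, run outside the scraper on exported rows. It is not part of the tool itself: the function name attach_category, the sample rows, and the example.com URLs are all hypothetical. Only the column names "url", "parent_url", and "category" come from the setup above.

```python
def attach_category(categories, items):
    """Add a "category" value to each item row.

    An item belongs to the category row whose "url" equals the
    item's "parent_url". Items with no match get None.
    """
    # Index the category rows by page URL for fast lookup.
    category_by_url = {row["url"]: row["category"] for row in categories}
    return [
        {**item, "category": category_by_url.get(item["parent_url"])}
        for item in items
    ]


# Hypothetical sample rows standing in for the two scraped tables.
categories = [
    {"category": "Books", "url": "https://example.com/books"},
    {"category": "Toys", "url": "https://example.com/toys"},
]
items = [
    {"name": "Novel", "price": "9.99", "parent_url": "https://example.com/books"},
    {"name": "Robot", "price": "19.99", "parent_url": "https://example.com/toys"},
    {"name": "Orphan", "price": "1.00", "parent_url": "https://example.com/other"},
]

related = attach_category(categories, items)
```

Here the third item has a "parent_url" that matches no category row, so its "category" is left empty rather than guessed.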

 

Relate tables with "Item unique attribute (name, ID...)"

For example, the parent pages have a "category", and you scrape the "category" and other information into "table 1". You then open the item links on these pages and scrape the item details into "table 2". If the item pages also show the "category" (for example, in the breadcrumbs), you can scrape "category" into "table 2" directly. You can then relate "table 1" and "table 2" by "category" without scraping "URL" and "parent URL".

As another example, you scrape an item page into "table 1", then open many sub pages from it and scrape their contents into "table 2", "table 3", and so on. If all of these sub pages share the same unique value (such as the item's name or ID), you can relate the tables by that value.
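To illustrate relating several tables by one shared value, here is a small Python sketch that works on exported rows. Everything in it is an assumption made for illustration: the column name "item_id", the sub-table names "reviews" and "specs", and the helper functions are hypothetical and not part of the scraper.

```python
from collections import defaultdict


def group_by_key(rows, key):
    """Group rows into lists keyed by the value of one column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


def relate_by_key(main_rows, key, **sub_tables):
    """Attach to each main row the sub-table rows sharing its key value."""
    indexes = {name: group_by_key(rows, key) for name, rows in sub_tables.items()}
    return [
        {**row, **{name: index.get(row[key], []) for name, index in indexes.items()}}
        for row in main_rows
    ]


# Hypothetical rows: table 1 holds items, tables 2 and 3 hold sub-page data.
table_1 = [{"item_id": "A1", "name": "Robot"}, {"item_id": "B2", "name": "Novel"}]
table_2 = [
    {"item_id": "A1", "review": "Great"},
    {"item_id": "A1", "review": "Too loud"},
]
table_3 = [{"item_id": "B2", "pages": "320"}]

related = relate_by_key(table_1, "item_id", reviews=table_2, specs=table_3)
```

Each item row ends up carrying a list of its matching rows from every sub table, which only works reliably when the shared value is truly unique per item.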

 

Merge two tables

The "merge_tables_with_same_column" template in "Run script to Edit Data" lets you merge two tables.
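The template's contents are not shown here, so the following is only an independent sketch of the general idea of merging two tables on a column they share. It is not the template itself. It uses Python's built-in sqlite3 module with an in-memory database, and the table names, the shared column "item_id", and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for two scraped tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (item_id TEXT, name TEXT);
    CREATE TABLE table_2 (item_id TEXT, price TEXT);
    INSERT INTO table_1 VALUES ('A1', 'Robot'), ('B2', 'Novel');
    INSERT INTO table_2 VALUES ('A1', '19.99');
""")

# LEFT JOIN keeps every row of table_1, even when table_2 has no match.
merged = conn.execute("""
    SELECT t1.item_id, t1.name, t2.price
    FROM table_1 AS t1
    LEFT JOIN table_2 AS t2 ON t1.item_id = t2.item_id
    ORDER BY t1.item_id
""").fetchall()
conn.close()
```

A left join is used so that rows without a partner in the second table are kept with an empty value. An inner join would drop them instead.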