This page is outdated; please see the new documentation.

Capture content

/media/fminercms/docsnag6/act-capture.jpg

Capture content nodes are placed inside scrape page nodes to assign which content will be captured and which column it will be saved to.

This action tells the program that you wish to extract specific data elements from the page. To do so, find a data element you'd like to extract, then right-click it and you'll see a popup of selection options that let you Capture Content. Note that as you choose the Capture Content option, the data element's XPath will show up in the Target Select XPath bar of the Attribute Panel, just below the visual designer panel. Use the Target Select tool just below the XPath bar to select one or multiple elements sharing the particular attribute.
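
As a rough illustration of what an XPath in the Target Select XPath bar does, the browser-console sketch below lists every element a capture action would match. The XPath and page structure here are hypothetical examples, not something produced by FMiner itself:

// Illustration only: an XPath that matches one element vs. many.
var xpath = "//div[@class='item']/h2/a";   // assumed page structure
var result = document.evaluate(
  xpath, document, null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null
);
for (var i = 0; i < result.snapshotLength; i++) {
  // Each matched node is one element the capture action would extract from.
  console.log(result.snapshotItem(i).textContent);
}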

Next, specify the data you would like to capture from the selected data block using the Extract Type dropdown below the XPath Target Select tool. Options include: Text, HTML, DOM Attribute, and Page Attribute.

Finally, you'll want to specify a table and field in which to save the data element. To do so, simply click in the Table text entry area. The resulting popup will let you either name a new table or select an existing one. Once you have identified the table to be used, you can name the specific columns in which you'd like each data element saved.

/media/fminercms/docsnag6/capture.jpg

target

See select target

extract type

Use the extract type selector to specify which information in the selected block is to be extracted. The default setting is Text; however, you have a number of options, which include the element's HTML, a DOM attribute, a page attribute, downloading elements, and a regular expression.

Text

Capture text content of the target(s).

Html

Capture html code of the target(s).

Dom attribute

Capture a DOM attribute of the target(s); here you should input the attribute name (e.g., href, class, ...).
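
As a rough sketch of what capturing a DOM attribute means (the selector and element below are hypothetical, not FMiner's internals), reading the href of a link looks like this in a browser console:

// Illustration only: capturing the "href" DOM attribute of a selected link
// is equivalent to reading it like this.
var link = document.querySelector('a[href^="mailto:"]');   // assumed target element
if (link) console.log(link.getAttribute('href'));          // e.g. "mailto:support@fminer.com"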

Page attribute

Capture an attribute of the page itself; a rough browser-console sketch of these values follows the list below.

  1. page title
  2. page metadata
  3. page URL
  4. parent URL
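
For orientation only, these are rough browser-console analogues of the four page attributes; FMiner gathers them itself, and the parent URL in particular is the page the scrape flow arrived from, for which document.referrer is only an approximation:

// Illustration only, not FMiner's internal API.
console.log(document.title);                                    // page title
var meta = document.querySelector('meta[name="description"]');  // one piece of page metadata (assumed tag)
console.log(meta ? meta.getAttribute('content') : null);
console.log(location.href);                                     // page URL
console.log(document.referrer);                                 // rough analogue of the parent URL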

Download

Here you should assign a folder to hold the downloaded files.

  1. Link

    Download the file from the target's link.

  2. Image

    Download image of the selected target.

  3. Wait download

    Wait download is for special situations: the program will wait until a download request occurs (e.g., a page has a button, and clicking it triggers a download).

Regular expression

Extract data from the target's HTML code with a regular expression. For example:

(\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)

will scrape email addresses.
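
As a hedged sketch of how such a pattern is applied to the target's HTML (the sample markup below is made up for the example, not output from FMiner):

var html = '<a href="mailto:support@fminer.com">Contact us</a>';   // sample HTML, made up
var re = /\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/gi;            // the pattern above, case-insensitive
console.log(html.match(re));                                       // ["support@fminer.com"]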

You can switch between these extract types to see the different results.

Static data

Save static data here; you can also set an input data value (formatted as [$table.column$] or [%variable%]; see input data) here.

save to database column

If you assigned a data table in an ancestor scrape page node, you can select a column field in which to keep this value.

adjust data with javascript (for pro version)

When this is checked, you can change the captured data with JavaScript.

The scraped data is available in a variable named "data", and the value returned by the last line of the JavaScript code becomes the result data. For example, if you captured "mailto:support@fminer.com" and the JS code is:

data.substring(7)

you will get "support@fminer.com". Another example: we scraped "price:100$" and we only need "100", so we can write JS like this:

i = data.indexOf(':')
data.slice(i + 1, -1)
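
The value of the script's last expression is what gets stored. A hypothetical, slightly more robust variant for the same price example could match the digits directly (the regular expression here is just an illustration, not something FMiner requires):

// Pull the first run of digits out of the captured text,
// falling back to the original data if there are none.
m = data.match(/\d+/)
m ? m[0] : data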