Overview

 

FMiner is a visual web data extraction tool for web scraping and screen scraping. Its intuitive user interface lets you quickly harness the software's powerful data mining engine to extract data from websites.

Its core engine is a WebKit browser (the same engine as Chrome), so it can extract data from most sites, including dynamic websites that rely on JavaScript/AJAX.

In addition to web scraping, FMiner can serve as web macro software that records and replays human actions in the web browser.

 

Steps to scrape a site with FMiner

To design a project, you can record your actions in the integrated browser, and FMiner will generate a flowchart containing each step. The common procedure is:

1. Create a project and begin "record".

2. Browse and perform actions in the internal browser; all the steps will be recorded.

3. When you reach the page you want to scrape, create a "scrape page" action and assign a table to hold the data.

4. Add some "capture content" actions and assign columns to them.

5. Finish the project, run it, then export the results.

 

How it works

When you build a project, FMiner creates a flowchart that shows how the project works.

FMiner starts from the "starting goto" action and follows the arrows to perform all actions. The blue node on the right scrapes content from the page, and the black arrow on the left is the route taken when an action fails.

When an action finishes:

  • If it fails (e.g. it can't find the target to fill text into), FMiner tries to follow the black arrow from the left joint; if there is no left arrow, execution stops.
  • If it succeeds, FMiner looks for a blue arrow from the right joint; if there is one, the program follows it to scrape content from the current page. Then, regardless of whether there is a right blue arrow, it follows the bottom red arrow to the next action.
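
The rules above can be sketched as a small Python model. This is only an illustration of the traversal logic described here, not FMiner's actual implementation or API: the `Action` class, its field names, the `execute` and `build_demo` functions, and the action names other than "starting goto" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """Hypothetical model of one flowchart node (not an FMiner class)."""
    name: str
    run: Callable[[], bool]                      # returns True on success
    next_action: Optional["Action"] = None       # bottom red arrow
    on_fail: Optional["Action"] = None           # left black arrow
    scrape: Optional[Callable[[], None]] = None  # right blue arrow


def execute(start: Action) -> List[str]:
    """Walk the flowchart from `start` and return a trace of what happened."""
    trace = []
    node = start
    while node is not None:
        if node.run():
            trace.append(node.name)
            # On success, scrape first if a blue arrow exists ...
            if node.scrape is not None:
                node.scrape()
                trace.append(f"scrape after {node.name}")
            # ... then always follow the red arrow to the next action.
            node = node.next_action
        else:
            trace.append(f"{node.name} failed")
            # On failure, follow the black arrow; None stops execution.
            node = node.on_fail
    return trace


def build_demo(logged_in: bool) -> Tuple[Action, List[str]]:
    """Build a tiny login-then-scrape flow; filling fails if already logged in."""
    rows: List[str] = []
    listing = Action("open listing", run=lambda: True,
                     scrape=lambda: rows.append("row"))
    click = Action("click login", run=lambda: True, next_action=listing)
    fill = Action("fill account", run=lambda: not logged_in,
                  next_action=click, on_fail=listing)
    start = Action("starting goto", run=lambda: True, next_action=fill)
    return start, rows


if __name__ == "__main__":
    for state in (False, True):
        start, rows = build_demo(logged_in=state)
        print(state, execute(start), rows)
```

In this sketch, a failed "fill account" action skips the login steps through its failure route and continues at the listing page, while a failed action with no failure route ends the run.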

The flowchart in this figure executes as follows:

The red numbers show the route when all actions succeed; the blue numbers show the route when the "fill" action fails (the user is already logged in, so the target for filling in the account can't be found and these steps are skipped).

For details on how to control the execution flow, see here.