FMiner update log
web scraping software version updates
Added a path for the FTP report: http://www.fminer.com/run-settingsclear-tables-run-traverse-table-rows-report-after-ru/
Fixed a bug in "Prompt input a row to table".
Fixed the issue: the program became very slow when selecting content on a big page.
Added the features:
1. Upload results to FTP. http://www.fminer.com/run-settingsclear-tables-run-traverse-table-rows-report-after-ru/
2. Create a screenshot of a page: http://www.fminer.com/capturecontent/
Fixed the bug: for the "openlinks" action, when "Links type" is "Generate URLs", the UI did not update when switching to other actions.
Fixed the bug: "open recent project" did not work in the Mac OS edition.
For the "capture content" action, added static variable types: [!current_time!], [!start_time!], [!project_name!]. http://www.fminer.com/capturecontent/
Added a loading tip that shows while the page has not finished loading.
Fixed the bug: when a table already held some data, adding a new column made the table impossible to export.
Moved the activation information of the Mac edition to a system location, so updating to a new version no longer requires activating it again.
Fixed the bug: file downloading did not work; this was a new bug in 9.76.
Fixed the bug: an error sometimes occurred when calling table.select_rows in run code. http://www.fminer.com/run-code-action/
Fixed the bug: newlines were missing when the scraped target is hidden (e.g. a hidden table).
Added a verification mechanism to prevent special characters in "table name" and "column name".
Add Downloader, a tool to download the files at the URLs in a data table.
Add a pager to data tables; use it to go to the "first", "previous", "next", or "last" page to check all data in a data table.
Fixed bug: after stopping manually once, "run multiple times with each row data of a table" no longer worked; it processed just one row and stopped, until the project was closed and opened again.
Enabled "LocalStorageEnabled" in the WebKit settings. An HTML5 site was found that can't work without LocalStorageEnabled, so it is now enabled to make such sites work.
Added "import_xls" and "import_csv" to the table object of "run code", and added a "run code" template for importing xls/csv.
Changed "remove data": clicking the button now removes all selected rows (the old version removed only the first row).
Fixed a serious bug: execution became slower and slower over a long run. This bug had little effect on short extractions.
Add an escape character for proxies.
1. On "new project", settings did not update.
2. The "click" action did not work for a site.
1. Import/export of table structure; this is important for making many projects with the same data structure.
2. Add "Generate URLs" to the "openlink(s)" action.
Fixed the bug: Captcha get/input did not work.
Fixed the problem: projects created on Windows or Mac OS sometimes could not be opened on the other system, failing with the error "can't find module copy_reg". The project format is now binary.
0. Upgraded the Qt core to the latest version, 4.8.6. It includes a new version of WebKit, so some sites that could not be opened previously, such as instagram.com, now open. http://www.fminer.com/forum/topic/248/
1. Changed the default folder for downloaded files to the project path.
2. The log window now shows only the last 100 logs, which saves resources for long-running projects.
3. Add "with class ..." to the "Change Xpath" menu.
4. Fixed the problem of "parent URL".
5. Removed the limit of 50 fields per table; you can now add as many fields as you need.
6. Fixed the "rich text" format error in "adjust data".
7. Made the scene size bigger for complex projects.
8. Fixed bug: "description" and "keywords" in meta could not be captured when they are upper case.
9.62 did not fix all the bugs of " "; this version is released to fix the issue.
Fixed a bug in 9.6: selecting a table showed all blanks as " ".
Fixed a bug in 9.60: images on pages were not shown.
1. Fixed some bugs when designing a project with iframes on pages.
2. Fixed the bug: when selecting big images on a page, the program slowed down to a freeze.
3. Add a feature "Prompt input a row to table" before run.
4. Add exporting results to "xml/html" and "json" formats.
1. Improved the font on Mac OS 10.9.
2. Changed "regular expression" capture: added a helper dialog for capturing content with a "regular expression".
3. To stop spam on the forum, added a "register forum account" menu to FMiner and disabled account registration on the site.
4. Fixed some bugs.
This version fixed the bug:
1. Can't set "max recursive level" on "openlink(s)" action.
1. Made the lines smooth.
2. Selected DOM elements on pages now change background color, not just the border, because the borders of tr elements in tables often do not show.
3. Add numbers for multiple targets on the "selection" panel.
Fixed bug: text could not be scraped from "select" elements on pages.
Add a feature to move fields "down" and "up" in the "data table".
Add "referer" to the request header when downloading files from pages.
Add a "useragent" control to the "settings" dialog; it can now be changed there instead of in the "useragent.ini" file.
Add a "clear table" template to "run code".
Changed "Batch add URLs": the program now records the old "pattern" and keywords.
Some small changes on UI.
Fixed the bug: the program sometimes crashed when removing a link on the scene.
Fixed the bug: the argument '--expert_headers' did not work.
Added a new argument '--move_errorlinks' for executing tasks externally.
Fixed a bug in "wait download": http://www.fminer.com/capture-content/
Fixed a bug in "Run multiple times with each row data of a table".
Add missing "run code" templates for the pro version.
Changed http://www.fminer.com/execute-fminer-task-externally/ : FMiner can now export data only, without the run (resume) options.
Fixed the bug in the "capture" action: the "DOM attribute" type could not scrape some attributes such as tag and innerText. This bug exists only in version 9.00.
Updated the PySide SDK from 1.11 to 1.21; the new version of PySide includes a new version of WebKit, which is more stable.
Fixed bug: the browsers' loading icons did not show.
Note: This version must be downloaded and upgraded manually.
Fixed bug: when step-running a "scrape action", the data table was not updated with new data.
Add feature: when a project runs, a project.running.pid file is created, and when the extraction is done, a project.done.pid file is created, so you can know the program status and pid.
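As a rough illustration of how an external script might use these marker files, here is a minimal Python sketch. The changelog gives only the two file names; the directory they live in, the "project" prefix used as a parameter, and the choice to check the done file first (in case the running file is not removed) are all assumptions of this sketch, not documented FMiner behavior.

```python
from pathlib import Path


def extraction_status(project_dir, project_name="project"):
    """Guess the run status from the marker files described above.

    Assumption: the files are named <project_name>.running.pid and
    <project_name>.done.pid and sit in project_dir.
    """
    base = Path(project_dir)
    done = base / f"{project_name}.done.pid"
    running = base / f"{project_name}.running.pid"
    # Check "done" first: the changelog does not say whether the
    # running file is deleted when the extraction finishes.
    if done.exists():
        return "done"
    if running.exists():
        return "running"
    return "not started"
```

A scheduler could poll `extraction_status(...)` and start post-processing once it returns "done".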
Fixed bug: the first field added to a table was repeated twice.
Changed the select action: options can now be selected by index, in a format like "#1".
Changed the position of XPath; data can now be input there.
Add a feature "Insert Data to Column"; with it, you can insert many strings or numbers into a column of a table.
Fixed bug: the program could not really be stopped with the option "Run multiple times with each row data of a table".
Fixed a bug in the core of "adblock" and "block files".
Add a new feature to the filter of the "open links" action: it can now allow/ban URLs in a column of a table. This can be used to extract incrementally. For example, you can capture the URL of a page to a column when saving data, then set the filter to ban URLs in this column; the program will then not reopen these already-scraped links when run again.
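The idea behind this incremental filter can be sketched in a few lines of Python. This is only an illustration of the "ban URLs already stored in a column" logic; the function name and arguments are hypothetical and are not part of FMiner's "run code" API.

```python
def filter_new_links(links, scraped_urls):
    """Return the links that are not in the banned (already scraped) column."""
    seen = set(scraped_urls)  # values of the URL column from earlier runs
    fresh = []
    for url in links:
        if url not in seen:
            fresh.append(url)
            seen.add(url)  # also skip duplicates found within this run
    return fresh
```

For example, `filter_new_links(["a", "b", "a", "c"], ["b"])` keeps only `"a"` and `"c"`, so a second run opens just the pages that were not saved before.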
Add a feature "Get License Number from Invoice Number". Some users do not receive the email with the license number after ordering, because the email is often discarded by a spam filter. With this feature, you can get the license number directly if you did not get the email.
Fixed bug: new users often got "Trial expires".
Fixed bug: 'runto' did not work.
Changed the price of the pro version from $398 to $248, and changed the "install limit" from "2 hosts" to "1 host".
Old pro licenses can still be used on 2 hosts, which is fair for existing users.
8.11 did not really fix the activation bug; this version fixes it.
Fixed the bug: the error message "None type object has no attribute 'strip'" sometimes appeared when activating the program.
1. This version fixes the bug in "Automatic upgrade" in version 8.00. 8.00 cannot update automatically, so if you are using 8.00, you should download and install 8.10 manually.
2. Add a feature "Run multiple times with each row data of a table" for the Pro version. With it you can easily input more than one set of data into a form on a page, without the complex operation described at http://www.fminer.com/input-every-row-data-table-1/.
1. Fixed bug of "scheduler".
2. Add a new action "scroll down page".
3. Add "Add a unique time suffix to the output file/folder name" to the export config dialog; with this option you can export data to a different file on each run.
4. Add some arguments for executing an FMiner task externally.
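To show what a unique time suffix on an output name (item 3 above) can look like, here is a small Python sketch. The timestamp format and the placement of the suffix before the file extension are assumptions made for illustration; the changelog does not state the format FMiner actually uses.

```python
from datetime import datetime
from pathlib import Path


def with_time_suffix(path, now=None):
    """Append a timestamp to a file or folder name, keeping any extension."""
    now = now or datetime.now()
    p = Path(path)
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed format, second resolution
    return str(p.with_name(f"{p.stem}_{stamp}{p.suffix}"))
```

Because the suffix changes every second, repeated exports land in separate files instead of overwriting the previous result.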
Fixed the bug: a column added to a data table was recorded only in the database file (*.db), not in the project file (.fmpx), so the columns were lost when the database file was removed and rebuilt.
1. Add a feature "Get data from file", see: http://www.fminer.com/input-data/
2. Fixed the bug: the program sometimes crashed when removing a node from the scene.
2. Add a "log" action; it can be used to control the flow, record error links, and report failed captchas to third-party captcha decoding services to save money.
1. Fixed the bug: "URLs filter" in the "openlinks" action did not work.
2. Fixed the bug: "Max recursive level" did not work correctly.
1. Changed the project file format, separating the project file from the database, so that more database formats can be added as the internal database in later versions.
2. The "Validate" action adds a "js result" option: you can write some JS code there, and it will validate the returned value.
3. Add an "open link with image" option to "change target xpath", for locating image links without text.
4. Add "not wait when done" to the actions; when it is checked, FMiner does not wait after finishing the action before doing the next one.
5. Fixed some bugs.
Add a "Save code and screenshot of warning/error pages" option to the settings dialog; you can uncheck it when running big projects to get better performance.
Add "Run scripts to edit data" for editing the scraped data.
Fixed a bug when exporting xlsx files, according to http://lsimons.wordpress.com/2011/03/17/stripping-illegal-characters-out-of-xml-in-python/
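Since xlsx files are XML inside, characters that XML 1.0 forbids (most control characters) can break an export. The Python sketch below shows one common way to strip them, using the character ranges allowed by the XML 1.0 specification. It illustrates the general technique only and is not claimed to be the exact code used in FMiner or in the linked post.

```python
import re

# Everything outside the XML 1.0 "Char" ranges:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Remove characters that are not allowed in an XML 1.0 document."""
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Running scraped cell values through such a function before writing them keeps tabs, newlines, and normal Unicode text while dropping stray control bytes.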
Fixed a bug in exporting xlsx files.
Add support for exporting and importing xlsx (Excel 2007) files.
Fixed the bug: after clicking "run to current action", FMiner did not stop on reaching the selected action.
Add "Clear up cookies footprint".
1. Removed "set and change proxy" from "before do action", because it conflicts with "rotate proxy" and is not very useful.
2. Removed the "Do follow actions on initial page" option from the "open link(s)" action, because most sites need it enabled when "Open link(s) recursively" is enabled, and it is difficult to understand. It is now enabled implicitly whenever "Open link(s) recursively" is enabled.
3. Made the "Pages" number correct when running.
Add a "Block files" option; enable it to block all additional files of pages, e.g. js, css, jpg files.
Fixed some bugs.
Add a feature "rotate proxies every some pages/minutes", which makes it easy to use a list of proxies.
Changed the format of "scraped text": the scraped text now looks like text copied from the page into Notepad, with delimiters.
This version fixed the bug where adblock often crashed the program, and reduced memory usage.
This version adds a "relatively select" button for locating targets on pages, and changes the default XPath for selected targets in the "capture content" node. Both are for positioning targets accurately when capturing content from bulk pages.
Made some optimizations for running projects: 1. disabled updating data tables when running; 2. disabled scene and attribute panel updating when running; 3. disabled the "web inspector" of the main browser when running.
This version adds "parent url" to "capture content" nodes, so the parent page's URL can be scraped. This helps associate multiple tables holding data scraped from different pages. For example, you can capture the URL and other data from a page to table 1, then open links from this page and save the scraped data and "parent url" from these child pages to table 2; the "url" in table 1 will then be the same as the "parent url" in table 2.
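The table 1 / table 2 relationship described above amounts to a join on the URL. Here is a minimal Python sketch using an in-memory SQLite database; the table names, column names, and sample rows are all made up for illustration and do not come from FMiner.

```python
import sqlite3


def join_on_parent_url():
    """Join child rows (table2) to their parent rows (table1) by URL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (url TEXT, category TEXT)")
    conn.execute("CREATE TABLE table2 (parent_url TEXT, item TEXT)")
    # Hypothetical sample data standing in for scraped results.
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [("http://example.com/books", "books"),
         ("http://example.com/music", "music")],
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?)",
        [("http://example.com/books", "item A"),
         ("http://example.com/books", "item B"),
         ("http://example.com/music", "item C")],
    )
    rows = conn.execute(
        "SELECT t1.category, t2.item FROM table1 AS t1 "
        "JOIN table2 AS t2 ON t2.parent_url = t1.url "
        "ORDER BY t2.item"
    ).fetchall()
    conn.close()
    return rows
```

Each child row picks up the data of the page it was opened from, which is the association the "parent url" field makes possible.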
This version fixed a problem when "goto" and "open link(s)" have multiple links. Newly opened browsers now share the same global variables for the "goto" and "open link(s)" nodes.
Add a "write results to variables" option to "scrape page" nodes. It can be used when writing script code in "runcode" nodes.
Limited the details of blocks on the selection widget to the top 10 for performance. This avoids page freezes when pages are big and many targets are selected.
Add a static data type to the "capture content" node, so you can save static data to tables.
Made the cursor busy while a page is loading, to avoid misuse.
Add "copy data name" to the context menus of data tables and variables, to make writing scripts easier.
Fixed bug: with more than one scrape page node and multiple data tables, the assigned tables often changed incorrectly when switching between these scrape page nodes.
Fixed bug: when a Goto node is not the starting node and has bulk links, only the first link was valid.
This version changed the welcome dialog and added some links to it to help new users.
Fixed the bug: when adding "capture content[download]" by right-clicking on the page during recording, the target was sometimes not the link.
When adding a recursive "open link(s)" node, "Do follow actions on initial page" is checked by default.
Fixed bug: targets could not be selected on a page with more than one frame. This is a major bug that stops some features from working correctly, and it is why I released the new version so quickly.
Add a max recursive level to the recursive open link(s) node, so you can set how deep the spider can go.
Add a URL filter to the open link(s) node, so you can select a bulk of links and set a filter that allows only specific links.
Add a regular expression option to the capture content node; it extracts data from the target's HTML code, so you can use it to extract special data (e.g. email addresses).
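As an example of the kind of regular expression meant here, the Python sketch below pulls email addresses out of a fragment of HTML. The pattern is a deliberately simple approximation, not a full address validator, and FMiner's own regular expression dialog may behave differently.

```python
import re

# Simple approximation of an email address; good enough for scraping,
# not a complete implementation of the address grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return all email-like strings found in the given HTML code."""
    return EMAIL_RE.findall(html)
```

For instance, `extract_emails('<a href="mailto:sales@example.com">Contact</a>')` yields the single address from the link's `href`.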
Add a Non-empty attribute to data fields: if FMiner cannot scrape content for such a field, it ignores the data and does not save it.
Add a default folder to hold downloaded files when a capturing node is of the downloading type; the default folder is "files".
Add an option "capture content [download]" to the menu when recording actions, for downloading images or files.
Fixed bug: when a "capture content" node was selected, "run to" never reached it.
Changed the model of "select target" and "group select": after you click the "select target" or "group select" button and then click a target on the page to select it, the checked button now raises automatically, so you can continue with other things (e.g. recording actions). This is convenient for recording macros and reduces the number of steps. Because this is a big change in operation, some video tutorials need to change.
Changed the model of the tip window: tip contents are shorter, the tip widget pops up at every step, and the tips include links that open the documentation for the step or the current node (action).
Add a timestamp at the end of each line of the logs.
Add the feature of running "compact database" on "clear data table", because without "compact database" the project does not shrink and grows bigger and bigger when data is added and cleared frequently.
Fixed bug: after clicking a link in the log window, logs added later were abnormal.
Fixed bug: the result URLs of "open link(s)" nodes were sometimes incorrect and shown as plain text rather than links.
Fixed bug: when clicking the "clear" button to clear all nodes, the attribute panel did not update.
Fixed the bug: when changing focus from a "validate" node to another node, the target panel sometimes disappeared.
Add a "backup the project to..." option to the file menu for backing up the project. While working on a project I accidentally clicked the "clear" button and had to build the project again, so a backup button is necessary.
FMiner now records "page screenshot", "page code", and "page url" in the logs when an action fails or scrapes nothing, which makes debugging easier.
Fixed bug: the standard version could not export to sqlite format.
Add an option "Do follow actions on initial page" to the recursive "open link(s)" node. This is an important feature: when it is checked, FMiner does all actions (scraping or others) behind the recursive node on the initial page before opening the "next" link. With it, you do not need to drag and drop links between nodes to scrape the first page.
1. Add a "Try scrape once when no blocks found" option to the settings dialog.
2. Add a feature to transfer a license to another computer.
1. Fixed the bug when exporting tables to a database.
2. Add support for sqlite export.
3. Add an "open link(s) recursively" option to open link(s) actions. This is an important improvement: there is no longer any need to drag and drop a line to open the "next link" recursively. (Some new tutorials are needed to replace the old ones and show this feature.)
This release adds a feature of automatic update. The program will detect whether a new version exists at startup and update to the latest version automatically. It can be disabled on settings dialog.