Deployment / Scraping

Obtaining content by scraping web pages, then transforming the data and content obtained and delivering it as simpler web APIs.
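
The scrape, transform, and deliver flow described above can be sketched with the Python standard library alone. This is a minimal illustration, not a production scraper: the sample markup, the `item` / `name` / `price` class names, and the helper names `scrape`, `transform`, and `to_api_response` are all assumptions made up for this example, and a real scraper would fetch the page over HTTP rather than use an inline string.

```python
import json
from html.parser import HTMLParser

# Hypothetical page markup standing in for a fetched web page.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Widget</span><span class="price">$4.50</span></li>
  <li class="item"><span class="name">Gadget</span><span class="price">$12.00</span></li>
</ul>
"""


class ItemParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> per item."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "item":
            self.items.append({})  # start a new record
        elif tag == "span" and attrs.get("class") in ("name", "price"):
            self._field = attrs["class"]

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self.items:
            self.items[-1][self._field] = data.strip()


def scrape(html):
    """Scrape step: pull raw string fields out of the markup."""
    parser = ItemParser()
    parser.feed(html)
    return parser.items


def transform(raw_items):
    """Transform step: turn the scraped price strings into numbers."""
    return [
        {"name": item["name"], "price": float(item["price"].lstrip("$"))}
        for item in raw_items
    ]


def to_api_response(items):
    """Deliver step: the JSON body a simple web API could return."""
    return json.dumps({"count": len(items), "items": items}, sort_keys=True)


response = to_api_response(transform(scrape(SAMPLE_HTML)))
print(response)
```

Keeping the three steps as separate functions means the scraping logic can change when the page layout changes without touching the JSON shape that API consumers depend on.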

Tools

Scrapy - An open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
Apache Tika - A content analysis toolkit from the Apache Software Foundation. The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.

Services

Scrapinghub - Scrapinghub is a company that provides web crawling solutions, including a platform for running crawlers, a tool for building scrapers visually, data feed providers (DaaS) and a consulting team to help startups and enterprises build and maintain their web crawling infrastructures.
ParseHub - ParseHub is a free web scraping tool. With our advanced web scraper, extracting data is as easy as clicking the data you need.
Import.io - Import.io is intuitive and highly capable. Simply point-and-click to show us the data of interest on a web page. Machine learning based, with no coding required.

Notes:

	Content Harvesting & Extraction
		Concept Extraction
		Summarization
		Entity Extraction
		Taxonomy & Classification
		Relation Extraction
		Article Extraction
		Discussion Extraction
		Date Extraction
		Author Extraction
		Product Extraction
		Related Phrases
		Pagination Extraction
		Dictionaries
	Crawling
		Seed URLs
		Pseudo-URLs
	Scripting
		Conditional Expressions
		XPath
		RegEx
		Injection
		Timeout
	Content Storage & Access
		Content Latest Index
		Historical Index
		Storage
		Search
	Automation & Orchestration
		API
		Webhooks
		Command Line Interface
	DNS
		Domain Lists
		Domain Metadata
	Document Processing
		Feed Detection
		PDF Extraction
		Word Documents
	Integrations
		Dropbox
		Amazon S3
		Google Sheets
		Plot.ly
		Silk
		Tableau
	International
		Language Detection
		Geo IP Address
	Machine Learning
		Semantic Text Analysis
		Semantic Similarity
		Sentiment Analysis
		Emotion Analysis
	Media Acquisition
		Image Extraction
		Video Extraction
		Image Tagging
		Image Color Extraction
		Face Detection
		Barcode Recognition
		License Plate Recognition
	Structured Data Extraction
		HTML Table Extraction
		Spreadsheet Extraction
		CSV Files
		JSON Files
		Microformats Parsing
		XML Extraction
	Utilities
		Proxies
		Cookies
		Headers
		User Agents
		IP Address
		Logging
		Batch Calls
		Scheduler
		Low Latency
	Analytics
		Reporting
		URL Metrics
		Spam Score
		Rankings
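
To make a few of the notes above concrete (crawling from seed URLs, with RegEx used to pull links out of pages), here is a minimal sketch in Python. It is an illustration under stated assumptions rather than how any listed tool or service works: the `PAGES` dictionary is a made-up in-memory site that replaces real HTTP requests, `fetch` and `crawl` are hypothetical helper names, and a regular expression is only a rough stand-in for a proper HTML parser.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="http://other.test/x">X</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

# RegEx that captures the value of double-quoted href attributes.
HREF_RE = re.compile(r'href="([^"]+)"')


def fetch(url):
    """Return the page body for a URL, or an empty string if unknown."""
    return PAGES.get(url, "")


def crawl(seed_urls, allowed_prefix, max_pages=10):
    """Breadth-first crawl that starts at the seed URLs.

    Only links starting with allowed_prefix are followed, and the crawl
    stops after max_pages pages have been visited.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in HREF_RE.findall(fetch(url)):
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link.startswith(allowed_prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


visited_pages = crawl(["http://example.test/"], "http://example.test/")
print(visited_pages)
```

The `seen` set keeps the crawler from revisiting pages that link back to each other, and the prefix check keeps it from wandering off the target site.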
