Creating a Chrome Plugin to Scrape A Page (using jQuery)

If you’ve played with Chrome extensions at all, you know they are super powerful. I recently wanted to visit a bunch of pages and extract some info from each one. I could run a jQuery script in the console of each page, but I wanted something quicker and repeatable. Creating a Chrome extension that includes jQuery and runs locally is pretty simple. Below are the five files you’ll need (put them all in a single folder).

After creating these and adding your code, add the extension to Chrome: open Extensions (chrome://extensions), turn on Developer mode, click Load unpacked extension, and choose your folder.

1: manifest.json

{
  "name": "Your Extension Name",
  "description": "This was easy",
  "version": "1.1",
  "background": {
    "scripts": ["jquery-3.1.1.min.js", "background.js"]
  },
  "permissions": [
    "tabs", "http://*/*", "https://*/*"
  ],
  "browser_action": {
    "default_title": "My Extension Title",
    "default_icon": "a-cool-logo.png"
  },
  "manifest_version": 2
}
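If Chrome refuses to load the folder, the manifest is the first place I look. Here is a minimal sketch (runnable with Node) that parses a manifest and lists any of the three keys Chrome requires (name, version, manifest_version) that are absent. The helper name missingManifestKeys is my own invention for illustration, not part of any Chrome API.

```javascript
// Hypothetical helper: report which required manifest keys are missing.
// "name", "version" and "manifest_version" are the keys Chrome requires.
function missingManifestKeys(manifest) {
  const required = ["name", "version", "manifest_version"];
  return required.filter((key) => !(key in manifest));
}

// JSON.parse throws on a stray comma or quote, which is the other common mistake.
const manifest = JSON.parse(`{
  "name": "Your Extension Name",
  "version": "1.1",
  "manifest_version": 2
}`);

console.log(missingManifestKeys(manifest)); // prints []
```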

2: jquery-3.1.1.min.js (get a copy from jquery.com)

3: a-cool-logo.png (16px x 16px)

4: background.js

chrome.browserAction.onClicked.addListener(function (tab) {
  // this ends up in the background page console, not the console of the tab
  chrome.extension.getBackgroundPage().console.log('your plugin gonna do something');

  // maybe see if your plugin should be allowed to run
  if (tab.url.indexOf("/maybeCheckAurl/") !== -1) {
    // inject jQuery first, then the script that depends on it
    chrome.tabs.executeScript(null, { file: "jquery-3.1.1.min.js" }, function () {
      chrome.tabs.executeScript(null, { file: "content.js" });
    });
  }
});
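The indexOf check above matches the fragment anywhere in the URL, including the query string. If that is too loose, one option is to parse the URL and look only at the path. This is a sketch of that idea, not what background.js does; shouldRunOn is a hypothetical helper name and example.com is a placeholder.

```javascript
// Hypothetical helper: decide whether the extension should act on a tab.
// Parsing the URL first means only the path is checked,
// not the host name or the query string.
function shouldRunOn(tabUrl, pathFragment) {
  try {
    return new URL(tabUrl).pathname.includes(pathFragment);
  } catch (e) {
    // strings that are not valid URLs end up here
    return false;
  }
}

console.log(shouldRunOn("https://example.com/maybeCheckAurl/page", "/maybeCheckAurl/"));
console.log(shouldRunOn("https://example.com/?next=/maybeCheckAurl/", "/maybeCheckAurl/"));
```

The first call logs true and the second logs false, because in the second URL the fragment only appears in the query string. Inside the click listener you would pass tab.url as the first argument.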

5: content.js (the magic happens here)

// check to make sure jQuery is loaded (typeof avoids a ReferenceError if it is not)
if (typeof jQuery !== "undefined") {
  jQuery(".someclass a").each(function () {
    // log the results
    console.log($(this).attr("href"));

    // maybe do something with them
    $.get("http://yourapi", function (data) {});
  });
} else {
  alert('no jq');
}
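The href values you scrape are often relative or repeated, so it can help to clean them up before sending them anywhere. Below is a minimal sketch, assuming you have already collected the raw href strings into an array. normalizeLinks is a hypothetical helper and the example.com URLs are placeholders.

```javascript
// Hypothetical helper: turn raw href values into unique absolute URLs.
function normalizeLinks(hrefs, pageUrl) {
  const seen = new Set();
  for (const href of hrefs) {
    if (!href) continue; // skip anchors without an href
    try {
      // resolve relative links against the page they were found on
      seen.add(new URL(href, pageUrl).href);
    } catch (e) {
      // ignore values that cannot be parsed as a URL
    }
  }
  return Array.from(seen);
}

const links = normalizeLinks(
  ["/a", "b.html", "/a", undefined, "https://other.example/x"],
  "https://example.com/list/"
);
console.log(links);
```

Here links holds three entries: https://example.com/a, https://example.com/list/b.html and https://other.example/x. In content.js the input array could come from jQuery(".someclass a").map(function () { return $(this).attr("href"); }).get(), with location.href as the second argument.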

Allow Local File Access in Chrome (Windows)

Sometimes it’s handy to debug and test JavaScript applications in Chrome when the page needs to read / write local files. Example: you want to use Ajax and call $.getJSON('json/somefile.json'). By default Chrome won’t allow this and will throw an error similar to:

Failed to load resource: No 'Access-Control-Allow-Origin' 
header is present on the requested resource. 
Origin 'null' is therefore not allowed access.

Or
XMLHttpRequest cannot load. No 'Access-Control-Allow-Origin' 
header is present on the requested resource. 
Origin 'null' is therefore not allowed access.
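The Origin 'null' in these messages is the giveaway: it shows up when the page itself was opened from a file:// URL rather than served over http. Here is a small sketch for spotting that case in your own code. isLocalFilePage is a hypothetical helper, and the paths are made up.

```javascript
// Hypothetical helper: true when a page URL uses the file: scheme,
// which is the situation that produces Origin 'null'.
function isLocalFilePage(pageUrl) {
  return new URL(pageUrl).protocol === "file:";
}

console.log(isLocalFilePage("file:///C:/dev/index.html"));
console.log(isLocalFilePage("http://localhost:8080/index.html"));
```

The first call logs true and the second logs false. In a page you could call isLocalFilePage(window.location.href) and log a friendlier warning before the Ajax request fails.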

Chrome does have a switch to enable this, and it’s quite easy to turn on. Make sure Chrome is closed completely, then run Chrome with the --allow-file-access-from-files flag (note the two plain hyphens), i.e.:

C:\Users\<user>\AppData\Local\Google\Chrome\Application>
chrome --allow-file-access-from-files

Or you should be able to run:

%localappdata%\google\chrome\application\chrome --allow-file-access-from-files

I’ve put the command into a .bat file that I use, in case you find it helpful:

start "chrome" %localappdata%\google\chrome\application\chrome --allow-file-access-from-files
exit

To see if the flag is set, visit chrome://version/ and look at the Command Line section; you should see --allow-file-access-from-files.
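If you would rather not eyeball it, you can paste the Command Line value into a quick check. This is just a sketch: hasSwitch is a hypothetical helper, and the sample command lines are made up. It also catches the case where the flag was copied from a web page with a long dash instead of two hyphens.

```javascript
// Hypothetical helper: look for an exact switch in a command line string
// copied from the Command Line section of chrome://version/.
function hasSwitch(commandLine, name) {
  return commandLine
    .split(/\s+/)
    .some((arg) => arg === name || arg.startsWith(name + "="));
}

const flag = "--allow-file-access-from-files";
console.log(hasSwitch('"C:\\chrome.exe" --allow-file-access-from-files', flag));
console.log(hasSwitch('"C:\\chrome.exe" \u2013allow-file-access-from-files', flag));
```

The first call logs true. The second logs false because the switch starts with an en dash (\u2013), which Chrome does not treat as a flag.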

You’ll most likely need to run this with administrator rights, and I would caution against visiting unknown sites with this setting on, as they could take advantage of it and potentially read your local files.

Update: see my newer post on using Node and http-server to create a local web server that gets around these issues:
Local web server for testing / development using Node.js and http-server
