Parsing XML with HTML Agility Pack (instead of XDocument, etc.)

If you’re looking to easily parse some XML w/ .net and finding the normal XObject (Linq) isn’t cutting it, using HtmlAgilityPack could be your answer.

Below I’ll layout the simple steps for parsing a XML file w/ HtmlAgilityPack. The XML file I’m using to test is the WordPress export XML format. It uses lots of XML features so seemed like a good test. Let’s start. I’m using Visual Studio and in my example I’m assuming you have a project already open (I’m using a simple Console app). Continue reading “Parsing XML with HTML Agility Pack (instead of XDocument, etc.)”

Parsing XML with HTML Agility Pack (instead of XDocument, etc.)

Removing Bad (Spam) Traffic From Google Analytics

fake-spamIf you use Google Analytics, you’ve probably noticed in the past few months a ton of fake traffic in your website analytics. Traffic from referrals like social-buttons.com, best-seo-offer.com or 100dollars-seo.com (and other obviously legit sites). You’ll notice this traffic has a zero session view time and only visits the root of your site. Actually, “visits” isn’t even the correct term, because to my knowledge, these site are just loading the Google Analytics javascript with your tracking code. I guess they believe it’s an easy way for people to see their site “giving” you traffic, then you visit them – and who knows what happens next.

Below are the steps I use to create a new segment in Google Analytics. You can use this segment instead of the “All Sessions” default.

 

1. Add a new segment:

1

2. Give it a name:

(I called mine “Real Traffic”, since I’m attempting to keep only actual user visit data).

2

3. Go to the “advanced” area to start adding:

3

4. Let’s add the first rule to remove data with blank hostnames:

Make sure Sessions and Include is set, then select Hostname and Matches Regex. We’re including multiple domains (your domain and others you want to include (like webcache.googleusercontent.com). Be sure to add your domain to this list! Note: the “|.*” is RegEx for and + starts with.

4

5. Second and last rule:

Now the second rule has 2 conditions and is removing the other part of the bad / spam data using referral sources. This is where the action happens and starts to really clean the bad fake traffic. Make sure to set Sessions to Exclude, and use “and” in between the two parts of the condition.

5

 

That should be it. Save your segment and see if it makes a difference. I prefer this method over filters since it doesn’t remove any data.

In some of my test sites, I’m finding 96% of the traffic is fake (below is a comparison).

ga-sample

 

For reference, here are the sites I’ve found so far to be junk and I’m excluding:

social-buttons.com
simple-share-buttons.com
free-share-buttons.com
free-social-buttons.com
event-tracking.com
Get-Free-Traffic-Now.com
buttons-for-website.com
semalt.com
best-seo-offer.com
best-seo-solution.com
buttons-for-your-website.com
makemoneyonline.com
100dollars-seo.com
dailyrank.net

Edit 7/15 - 2 more additions to add:
success-seo.com
videos-for-your-business.com

Edit 8/3 - another awesome referrer:
yourserverisdown.com

Removing Bad (Spam) Traffic From Google Analytics

Removing WordPress Spam Comments In Bulk

I’ve encountered a few WordPress blogs that had been bombarded with spam comments (some over 200k comments – taking several gigs of db space). Usually these comments accumulated over a few months or years (many using Akismet) – but if you’ve tried to delete thousands of comments through the WordPress admin, you might have noticed it taking very long and timing out.

Does your comments page look like this?
Does your comments page look like this?

 

Having these comments exist in your WordPress site only increases the size of the database, causing backups and migrations / upgrades to take longer. Unless you have plans to review thousands of comments (if so, you probably have too much time on your hands) you should be able to delete these in my opinion. The fastest option to remove the comments that I’ve found is to delete them from the database side (MySql).

Continue reading “Removing WordPress Spam Comments In Bulk”

Removing WordPress Spam Comments In Bulk

IIS / WordPress – Blocking User Agents using Rewrite Rules

wp_logo

If you run WordPress on Windows (and who doesn’t?) and have the need to block specific user agents (bots, crawlers, browsers) below is a decent way I’ve found that uses rewriting rules and works along side the needed WordPress rules:

Adding this rule to your web.config will block the request from the specified agent(s):

 <rule name="RequestBlockingRule1" stopProcessing="true">
          <match url=".*" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="agent1|agent2|agent3" />
          </conditions>
          <action type="CustomResponse" statusCode="403"
             statusReason="Forbidden: Access is denied."
             statusDescription="You do not have permission to view this page." />
        </rule>

Continue reading “IIS / WordPress – Blocking User Agents using Rewrite Rules”

IIS / WordPress – Blocking User Agents using Rewrite Rules

WordPress Multisite Windows / IIS Login Redirect / Loop Issue

wp_logoIf you have WordPress installed on Windows (2008 / IIS) with multisite + sub domains enabled you may have run into the issue of login loops / redirects when  accessing /wp-admin. Below are a few spots you may want to confirm you have the correct URLs:

In the database:

Table / Field

wp_blogs / domain
wp_options / siteurl
wp_options / home
wp_sitemeta / siteurl    (don't forget this one!)

Also check these tables in any additional sites (ie: wp_2_options).

Config Files
wp-config.php:

define('DOMAIN_CURRENT_SITE', 'www.yourdomain.com');

I also have this in my wp-config.php:

define('ADMIN_COOKIE_PATH', '/');
define('COOKIE_DOMAIN', '');
define('COOKIEPATH', '');
define('SITECOOKIEPATH', '');

Web Config
Below is my successful web.config

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="WordPress Rule 1" stopProcessing="true">
          <match url="^index\.php$" ignoreCase="false" />
          <action type="None" />
        </rule>
        <rule name="WordPress Rule 2" stopProcessing="true">
          <match url="^([_0-9a-zA-Z-]+/)?wp-admin$" ignoreCase="false" />
          <action type="Redirect" url="{R:1}wp-admin/" redirectType="Permanent" />
        </rule>
        <rule name="WordPress Rule 3" stopProcessing="true">
          <match url="^" ignoreCase="false" />
          <conditions logicalGrouping="MatchAny">
            <add input="{REQUEST_FILENAME}" matchType="IsFile" ignoreCase="false" />
            <add input="{REQUEST_FILENAME}" matchType="IsDirectory" ignoreCase="false" />
          </conditions>
          <action type="None" />
        </rule>
        <rule name="WordPress Rule 4" stopProcessing="true">
          <match url="^" ignoreCase="false" />
          <conditions logicalGrouping="MatchAny">
            <add input="{REQUEST_FILENAME}" matchType="IsFile" ignoreCase="false" />
            <add input="{REQUEST_FILENAME}" matchType="IsDirectory" ignoreCase="false" />
            <add input="{URL}" pattern="([a-zA-Z0-9\./_-]+)\.axd" />
          </conditions>
          <action type="None" />
        </rule>
        <rule name="WordPress Rule 5" stopProcessing="true">
          <match url="^[_0-9a-zA-Z-]+/(wp-(content|admin|includes).*)" ignoreCase="false" />
          <action type="Rewrite" url="{R:1}" />
        </rule>
        <rule name="WordPress Rule 6" stopProcessing="true">
          <match url="." ignoreCase="false" />
          <action type="Rewrite" url="index.php" />
        </rule>
      </rules>
    </rewrite>
    <httpRedirect enabled="false" destination="http://www.yourdomain.com" />
  </system.webServer>
</configuration>
WordPress Multisite Windows / IIS Login Redirect / Loop Issue