Sitecore analytics stops tracking

Got a weird issue recently where Sitecore analytics stopped tracking page visits on one of our client sites. The site runs Sitecore 7.2 rev 140228 and had been live for a few weeks. Looking at the ItemUrls and Visits tables, no records were being populated when the site pages were accessed, even after several visits from different browsers.

I quickly ran SQL Server Profiler against the analytics database and tried accessing a couple of pages again. There was definitely some activity against the analytics database, but none of it was inserting data into the ItemUrls and Visits tables. I tried clearing out all records in the analytics database, every table except PageEventDefinitions, TrafficTypes, and VisitorClassifications, because those tables contain default values from a fresh Sitecore install.

Still to no avail. I then wondered whether the analytics configuration values were set correctly. I opened /sitecore/admin/showconfig.aspx and confirmed the configuration was right. To pinpoint the issue I wanted to know which files are included when the analytics feature is installed; opening the zip file I found the following:

  • Sitecore.Analytics.config
  • Sitecore.Analytics.ExcludeRobots.config
  • Sitecore.Analytics.RobotDetection.config
  • Sitecore.Analytics.ldf
  • Sitecore.Analytics.mdf

Wondering whether the cause was one of these config files, I disabled Sitecore.Analytics.RobotDetection.config by renaming it to Sitecore.Analytics.RobotDetection.config.disabled (renaming the extension to anything other than .config makes Sitecore ignore it).

Aha! Analytics started working again.

In the Sitecore.Analytics.RobotDetection.config file, these two patches piqued my interest:
<acceptChanges>
  <processor patch:instead="*[@type='Sitecore.Analytics.Pipelines.AcceptChanges.Robots,Sitecore.Analytics']"
             type="Sitecore.Analytics.RobotDetection.Pipelines.AcceptChanges.Robots, Sitecore.Analytics.RobotDetection"/>
</acceptChanges>

<initializeTracker>
  <processor patch:instead="*[@type='Sitecore.Analytics.Pipelines.InitializeTracker.Robots,Sitecore.Analytics']"
             type="Sitecore.Analytics.RobotDetection.Pipelines.InitializeTracker.Robots, Sitecore.Analytics.RobotDetection"/>
</initializeTracker>

They replace the Robots processors from Sitecore.Analytics with the ones from the Sitecore.Analytics.RobotDetection assembly. Decompiling the Sitecore.Analytics.RobotDetection assembly, this is what I found.

Sitecore.Analytics.RobotDetection.Pipelines.AcceptChanges.Robots
public override void Process(AcceptChangesArgs args)
{
    Assert.ArgumentNotNull((object)args, "args");
    if (AnalyticsSettings.DisableDatabase)
        args.AbortPipeline();
    if (args.Visitor.VisitorClassification < 900 || !AnalyticsSettings.Robots.IgnoreRobots || Switcher<bool, SaveRobotsEnabler>.CurrentValue)
        return;
    args.AbortPipeline();
}

Sitecore.Analytics.RobotDetection.Pipelines.InitializeTracker.Robots
public override void Process(InitializeTrackerArgs args)
{
    Assert.ArgumentNotNull((object)args, "args");
    if (!AnalyticsSettings.Robots.AutoDetect)
        return;
    if (Tracker.CurrentVisit.VisitorVisitIndex <= 1 && Tracker.CurrentPage.VisitPageIndex <= 1 && args.CanBeRobot)
    {
        Tracker.Visitor.SetVisitorClassification(925, 925, true);
        this.SetRobotSessionTimeout();
    }
    this.UpdateVisitCookie();
}

On the first visit it sets the visitor classification to 925, which flags the visitor as Bot – Auto Detected. The acceptChanges pipeline is then aborted, so no data gets inserted into the ItemUrls and Visits tables.

The issue lies in the Sitecore.Analytics.RobotDetection.Pipelines.InitializeTracker.Robots class: it always marks the visitor as a bot on the first visit, which causes the activity to be ignored by Sitecore analytics. Compare it with what Sitecore.Analytics.Pipelines.InitializeTracker.Robots (the processor it replaces) does:
public override void Process(InitializeTrackerArgs args)
{
    Assert.ArgumentNotNull((object)args, "args");
    if (!AnalyticsSettings.Robots.AutoDetect)
        return;
    if (Tracker.CurrentVisit.VisitorVisitIndex <= 1 && Tracker.CurrentPage.VisitPageIndex <= 1)
    {
        if (args.CanBeRobot)
        {
            Tracker.Visitor.SetVisitorClassification(925, 925, true);
            this.SetRobotSessionTimeout();
        }
    }
    else if (Tracker.CurrentPage.VisitPageIndex == 2 || Tracker.CurrentVisit.VisitorVisitIndex == 2 && Tracker.CurrentPage.VisitPageIndex <= 1)
    {
        Tracker.Visitor.SetVisitorClassification(Tracker.Visitor.VisitorClassification == 925 ? 0 : Tracker.Visitor.VisitorClassification, Tracker.Visitor.OverrideVisitorClassification == 925 ? 0 : Tracker.Visitor.OverrideVisitorClassification, true);
        this.ResetSessionTimeout();
    }
    this.UpdateVisitCookie();
}

In Sitecore.Analytics.Pipelines.InitializeTracker.Robots the visitor is also marked as a bot on the first visit, but on the second request the classification is reset to Unidentified and the activity is logged to the ItemUrls and Visits tables.

[update-7-November-2014]

My colleague Cahyadi Hendrawan contacted Sitecore support and confirmed there was a breaking change in the Sitecore 7.2 robot detection logic. The solution is to add the VisitorIdentification helper to the layout's <head>:

<head>
    ….
    @Html.Sitecore().VisitorIdentification()
</head>

[/update-7-November-2014]

Sitecore – JqueryModalDialogs error when opening popup modal

I got an issue when trying to open a modal popup while logged in to Sitecore:

[image: error when trying to show the popup modal dialog]

The site had just been set up in IIS; the server runs Windows Server 2008 R2 and Sitecore 7.2. The strange thing is that another site running the same Sitecore version doesn't have this issue.

I got a clue after opening the ApplicationHost.config file in system32\inetsrv\config: it turned out that on the working site, UrlScan 3.1 had been removed from the ISAPI Filters. After removing it for the new site as well, it ran smoothly too.

Using PowerShell to install Sitecore from a zip file until we get a running website

I had a chance to play around with PowerShell to get Sitecore up and running from nothing but the Sitecore zip file. It started with a discussion with a colleague about how PowerShell can register a website in IIS and also talk to a SQL Server database, and I was curious to actually try it out.

The script basically does the following:

  • Get the Sitecore CMS and DMS zip files from a specified folder
  • Extract both zip files into a target folder
  • Restructure the folders so the DMS files go into the right place
  • Register a new application pool in IIS
  • Register a new website in IIS
  • Update the hosts file
  • Attach the database files in SQL Server
  • Start the website
  • Open a browser, and if all went OK we'll have Sitecore displayed

I got to know a few things about PowerShell in doing this; it's actually pretty powerful stuff. Feel free to check out the code at https://github.com/reyrahadian/sitecore-installer
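
As a rough illustration of the IIS registration steps above (the actual script does this in PowerShell; the snippet below is just a managed-code sketch using Microsoft.Web.Administration, with made-up site names and paths):

using Microsoft.Web.Administration; // reference Microsoft.Web.Administration.dll

using (var serverManager = new ServerManager())
{
    // Create a dedicated application pool for the new Sitecore site
    var pool = serverManager.ApplicationPools.Add("SitecoreDemoPool");
    pool.ManagedRuntimeVersion = "v4.0";

    // Register the website and bind it to a host name (values are hypothetical)
    var site = serverManager.Sites.Add(
        "SitecoreDemo", "http", "*:80:sitecoredemo.local", @"C:\inetpub\SitecoreDemo\Website");
    site.Applications["/"].ApplicationPoolName = "SitecoreDemoPool";

    serverManager.CommitChanges();
}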

Sending out Microsoft Search Server failed crawl messages

What I wanted to do was build something that sends out an email every time a crawl operation fails when crawling is triggered against a search service application. After reading about it, it seems there is no configuration in Microsoft Search Server that provides this out of the box.

Instead, I found out that you can use PowerShell to perform administrative tasks: getting the search service applications of a farm, adding them, deleting them, and so on. Read more about the available cmdlets in the documentation.

With this knowledge, I set out to build a simple script that collects the error messages from today's crawl operations and sends the results in an email. The script is triggered daily by Task Scheduler.

I had no prior experience coding in PowerShell, and luckily I found some existing scripts that I could tweak a bit to do just what I wanted.

The PowerShell script that I've tweaked:

$ver = $host | select version
if ($ver.Version.Major -gt 1) { $Host.Runspace.ThreadOptions = "ReuseThread" }
if ( (Get-PSSnapin -Name Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue) -eq $null ) {
    Add-PsSnapin Microsoft.SharePoint.PowerShell
}

#----CONFIGS
$searchServiceName = "just testing"
$logFilePath = "c:\temp\searchserverlog.txt"

function GetErrorLogs(){
    $ssa = Get-SPEnterpriseSearchServiceApplication | Where-Object {$_.Name -eq $searchServiceName}
    $ssaContent = new-object Microsoft.Office.Server.Search.Administration.Content($ssa)
    $logViewer = New-Object Microsoft.Office.Server.Search.Administration.Logviewer $ssa
    $ErrorList = $logViewer.GetAllStatusMessages() | Select ErrorId
    $crawlLogFilters = New-Object Microsoft.Office.Server.Search.Administration.CrawlLogFilters;
    $crawlLogFilters.AddFilter([Microsoft.Office.Server.Search.Administration.MessageType]"Error");
    $startNum = 0;
    $errorItems += $logViewer.GetCurrentCrawlLogData($crawlLogFilters, ([ref] $startNum));
    WHILE($startNum -ne -1){
        $crawlLogFilters.AddFilter("StartAt", $startNum);
        $startNum = 0;
        $errorItems += $logViewer.GetCurrentCrawlLogData($crawlLogFilters, ([ref] $startNum));
    }
    return $errorItems
}

function sendEmail($errorItemsParam){
    $currentDate = get-date;
    $isThereAnyErrorsToday = "false";
    $result = "<table>";
    foreach($error in $errorItemsParam){
        $date = get-date $error.LastTouchStart;
        if($date.Date -eq $currentDate.Date){
            $result += "<tr><td>DisplayUrl</td><td>"+$error.DisplayUrl+"</td></tr>";
            $result += "<tr><td>ErrorLevel</td><td>"+$error.ErrorLevel+"</td></tr>";
            $result += "<tr><td>ErrorMsg</td><td>"+$error.ErrorMsg+"</td></tr>";
            $result += "<tr><td>HResult</td><td>"+$error.HResult+"</td></tr>";
            $result += "<tr><td>ErrorDesc</td><td>"+$error.ErrorDesc+"</td></tr>";
            $result += "<tr><td>ContentSourceId</td><td>"+$error.ContentSourceId+"</td></tr>";
            $result += "<tr><td>LastTouchStart</td><td>"+$error.LastTouchStart+"</td></tr>";
            $result += "<tr><td colspan='2'></td></tr>";
            $result += "<tr><td colspan='2'></td></tr>";
            $isThereAnyErrorsToday = "true";
        }
    }
    $result += "</table>";
    #Write-Host $result
    $result > $logFilePath

    #SMTP server name
    $smtpServer = "mailserver.com"
    #Creating a Mail object
    $msg = new-object Net.Mail.MailMessage
    #Email structure
    $msg.From = "powershellscript@xxxx.com"
    $msg.ReplyTo = "powershellscript@xxxx.com"
    $msg.To.Add("recipient@xxxx.com")
    $msg.Subject = "Search server error logs"
    $msg.Body = Get-Content $logFilePath
    $msg.IsBodyHtml = "true";
    #Creating SMTP server object
    $smtp = new-object Net.Mail.SmtpClient($smtpServer)
    #Sending email
    if($isThereAnyErrorsToday -eq "true"){
        Write-Host "Sending Email"
        $smtp.Send($msg)
    }
    else{
        Write-Host "No error today"
    }
}

$messageBody = GetErrorLogs
sendEmail($messageBody)

This script is executed by a batch script, which in turn is set up to run every day by Task Scheduler.

powershell -noexit "& 'D:\search-error-in-logs.ps1'"

 

The end result looks like this (every failed URL with its crawl error details; ErrorDesc was empty for all of them):

DisplayUrl: http://kompasabc.com – ErrorLevel: 2, HResult: -2147216863, ContentSourceId: 3, LastTouchStart: 03/22/2013 00:24:01
DisplayUrl: http://ljkadjalsdkjalskdlaskjdlajd.com – ErrorLevel: 2, HResult: -2147216863, ContentSourceId: 3, LastTouchStart: 03/22/2013 00:23:53
DisplayUrl: http://kompas123.com – ErrorLevel: 2, HResult: -2147216863, ContentSourceId: 3, LastTouchStart: 03/22/2013 00:23:53
DisplayUrl: http://yayayayayayayayaya.com – ErrorLevel: 2, HResult: -2147216863, ContentSourceId: 3, LastTouchStart: 03/22/2013 00:23:53
DisplayUrl: http://kompasabc123.com – ErrorLevel: 2, HResult: -2147216863, ContentSourceId: 3, LastTouchStart: 03/22/2013 00:23:51

ErrorMsg (the same for every entry): The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use Search Administration page.

How database index works

Ever wondered why a query that took two minutes to complete suddenly runs in under a second after you apply an index suggested by the SQL Server Database Engine Tuning Advisor?

Think of it like this: on disk-based storage devices, data is stored as blocks. When you run a query against an unindexed field whose values are not unique, finding a value requires scanning the entire set of blocks (N block accesses at worst).

With an indexed field, new blocks are created to store the indexed field's values in sorted order, so a binary search can be used to find a value in the index (log2 N block accesses).

Now, for example, say you have a table with the following schema:

Person

Field              Data type     Size on disk
Id (primary key)   unsigned int  4 bytes
FirstName          char(50)      50 bytes
LastName           char(50)      50 bytes

With that schema a record takes 104 bytes on disk, so one disk block (1024 bytes) can hold 1024/104 ≈ 9 records. If you have 1,000,000 records, it takes 1,000,000/9 ≈ 111,111 disk blocks to store all of that data.

Now, depending on the type of query you run against the table, you get very different performance. For example, a search on the Id field can use a binary search (log2 N), which needs about log2 111,111 ≈ 17 block accesses. This is possible because Id is the primary key, so its values are unique and stored sorted.

Compare that with a query on the FirstName field: since FirstName is not sorted, a binary search is not possible, and in the worst case it takes all 111,111 block accesses to find a value. A huge difference.

Creating an index helps a slow query greatly, but keep in mind that an index is a new data structure that also has to be stored on disk. For example, if we create an index on the FirstName field:

Field             Data type  Size on disk
FirstName         char(50)   50 bytes
(record pointer)  special    4 bytes

With that schema each index record takes 54 bytes, so one block holds 1024/54 ≈ 18 records and 1,000,000 records need 1,000,000/18 ≈ 55,555 disk blocks. Manage your indexes wisely: the more fields an index contains, or the more indexes you create, the more disk space they take.
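
To make the arithmetic concrete, here's a small C# sketch that reproduces the numbers above (the 1024-byte block size and the record counts are the assumed values from the example):

const int blockSize = 1024;        // bytes per disk block (assumed)
const int recordCount = 1000000;

// Table record: Id (4) + FirstName (50) + LastName (50) = 104 bytes
int recordSize = 4 + 50 + 50;
int recordsPerBlock = blockSize / recordSize;          // 9
int tableBlocks = recordCount / recordsPerBlock;       // ~111,111

// Unindexed search on FirstName: worst case reads every block
int fullScanReads = tableBlocks;                       // ~111,111

// Search on the sorted primary key: binary search over the blocks
int binarySearchReads = (int)Math.Ceiling(Math.Log(tableBlocks, 2));   // ~17

// Index record: FirstName (50) + record pointer (4) = 54 bytes
int indexRecordSize = 50 + 4;
int indexRecordsPerBlock = blockSize / indexRecordSize;   // 18
int indexBlocks = recordCount / indexRecordsPerBlock;     // ~55,555

Console.WriteLine("table: {0} blocks, full scan: {1} reads, binary search: {2} reads, index: {3} blocks",
    tableBlocks, fullScanReads, binarySearchReads, indexBlocks);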

reference:

http://stackoverflow.com/questions/1108/how-does-database-indexing-work/1130#1130

Sitecore Content Tree Structure

I'm sure there are a lot of variations of content tree structure out there; here I'm sharing the structure I've grown to like.

The idea is to keep things simple and organized for the content editors, so they can find their way around the site and the way the content is structured feels natural to them.

 

[image]

We start simple: a website node has two child nodes, a home page node and a libraries node.

The home page node serves as the landing page when the site URL is accessed; the libraries node serves as a place to put the data sources of components used throughout the site.

[images]

And that's pretty much how the content tree is structured. As a convention I use lowercase and no whitespace for page item names, because the item name translates directly into the URL and you want a friendly URL to be generated. You could enforce this with a custom pipeline or event handler that changes the item name after creation and update, but the convention is good enough for me.
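
For completeness, a minimal sketch of what such enforcement could look like as an item:added event handler (the class name here is made up, and the same idea can be implemented as a pipeline processor instead):

public class EnforceItemNameConvention
{
    public void OnItemAdded(object sender, System.EventArgs args)
    {
        var item = Sitecore.Events.Event.ExtractParameter(args, 0) as Sitecore.Data.Items.Item;
        if (item == null || !item.Paths.IsContentItem)
            return;

        // lowercase, no whitespace – so the item name maps to a friendly URL
        var proposedName = item.Name.ToLowerInvariant().Replace(" ", "-");
        if (proposedName == item.Name)
            return;

        using (new Sitecore.SecurityModel.SecurityDisabler())
        {
            item.Editing.BeginEdit();
            item.Name = proposedName;
            item.Editing.EndEdit();
        }
    }
}

The handler would then be registered under the item:added event in web.config (or a patch include file).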

The pages under supplementary pages are the pages that don't belong to the site's page hierarchy/sitemap. I usually give them an alias so we can have a URL such as /404 instead of /supplementary-pages/404. It's just a matter of taste, but think about it.

Hey, what about multisite?

[image]

What about it? We replicate the same structure for the other sites, and if you notice, there's also a global node. The global node serves as a place for content shared across the sites: if the sites share the same banner or carousel data source, put it in the libraries of the global node. The same goes for shared pages, which can live under the global node's supplementary-pages.

The same structure can be found in other places as well:

[images]

 

Also keep in mind the following:

  • if you need to reference items, avoid referencing by item path and use the item ID instead, because an item path can change, for example when a content editor moves or renames the item (see the sketch after this list).
  • remember that it's bad practice to have hundreds of child nodes under a single node; use categorization to manage them – see the news page above.
  • if you need to store a huge amount of data that would result in hundreds of child nodes, don't store it in Sitecore as-is; consider whether a custom table suits the requirement better, or take a look at Sitecore Item Buckets.
  • use branch templates to specify a content structure.
  • use insert options to specify which child node types can be inserted under a node.
  • use icons to give visual information to your content editors so they can easily tell which is which, but be wise about it and don't just assign random icons.
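
A quick illustration of the first point (the GUID below is just an example):

// fragile: breaks as soon as the item is moved or renamed
var newsByPath = Sitecore.Context.Database.GetItem("/sitecore/content/website/home/news");

// stable: the ID stays the same for the lifetime of the item
var newsById = Sitecore.Context.Database.GetItem(new Sitecore.Data.ID("{1A2B3C4D-0000-0000-0000-000000000001}"));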

Debugging Salesforce Callout

Inside a Salesforce Apex class we can have code that executes an HTTP request to an external site, for example when we need to send information from Salesforce to an external system. Normally I would use a Salesforce outbound message, as it has more features, such as resending the message if the first request failed, sending bulk messages, and so on.

However, a Salesforce outbound message, being part of a workflow, can only be triggered on the create and update events of an entity. So if we want to send information about the entity being manipulated on the delete event, that's not possible.

The workaround is to use an Apex trigger that uses the HttpRequest class to POST data to an external site; it could be a web service or an aspx page, you get the idea. The downside is that there isn't an easy way to check the debug log to see whether the callout succeeded, because a method that contains a callout has to be marked with @future(callout=true) and therefore runs asynchronously. Because of this, any logging we do inside the callout method won't show up in the Developer Console log viewer. Or will it…

Using the Developer Console, we can execute a snippet of Apex code to trigger the callout, and then we can see the log just fine.

[image]

[image]

Make sure you tick Open Log so the log viewer opens after executing the code.

[image]

And there's the exception I wanted to see.

A simple example of creating an Apex trigger and executing a callout:
http://cheenath.com/?tutorial/sfdc/sample1/index.html

Check out this link if you're interested in viewing the raw HTTP data when executing the callout:
http://sfdc.arrowpointe.com/2010/02/16/endpoint-for-debugging-http-callouts/

Request format is unrecognized for URL unexpectedly ending in /methodname

I spent quite some time trying to figure out this error message:

Request format is unrecognized for URL unexpectedly ending in /methodname 

 

This happened when I tried to send an HTTP POST to a remote machine; it worked fine on my local machine. From here it says:

The .NET Framework 1.1 defines a new protocol that is named HttpPostLocalhost. By default, this new protocol is enabled. This protocol permits invoking Web services that use HTTP POST requests from applications on the same computer

The solution is to add the following to the web.config:

<configuration>
    <system.web>
    <webServices>
        <protocols>
            <add name="HttpGet"/>
            <add name="HttpPost"/>
        </protocols>
    </webServices>
    </system.web>
</configuration>

 

ref: http://stackoverflow.com/questions/657313/request-format-is-unrecognized-for-url-unexpectedly-ending-in

Stack VS Heap

There are two types of memory we should know about: the stack and the heap. What's the difference?

Stack

  • keeps track of code execution
  • uses a LIFO (last in, first out) structure
  • high-performance memory with a fixed size limit
  • local variables go on the stack
  • for reference types, the stack holds a pointer to the object on the heap

Heap

  • a large pool of operating system memory
  • used for dynamic memory allocation
  • the garbage collector removes resources that are no longer used

 

Let's take a look at how the stack and the heap work.

public void Go()
{
    MyInt x = new MyInt();
    x.MyValue = 2;

    DoSomething(x);

    Console.WriteLine(x.MyValue.ToString());
  
}

public void DoSomething(MyInt pValue)
{
    pValue.MyValue = 12345;
}
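
The MyInt type isn't shown above; for this example assume it's a plain reference type along these lines:

public class MyInt
{
    public int MyValue;
}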

When that method is called, this is what happens:

  1. The Go() method is called
  2. The local variable x is defined and goes on the stack; x is a pointer to the MyInt instance on the heap
  3. The MyValue property is set; the value is stored on the heap because that's where the MyInt object lives
  4. The DoSomething() method is called, and its parameter pValue (a copy of the reference) goes on the stack
  5. The MyValue value on the heap is changed to 12345
  6. Back in Go(), Console.WriteLine prints 12345, because x and pValue point to the same object on the heap

references:

https://www.youtube.com/watch?v=clOUdVDDzIM
http://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap
http://www.c-sharpcorner.com/UploadFile/rmcochran/csharp_memory01122006130034PM/csharp_memory.aspx

SQL Connection Pooling

Creating a SQL connection is expensive: a socket connection must be established, a handshake must occur, and the connection credentials must be checked against the list of known credentials. To optimize this, a technique called SQL connection pooling is used.

Every time a connection is required, it is requested from the connection pool; if a connection with the same connection string already exists in the pool, that connection is returned rather than a new one being created. And when we close a connection, it isn't actually closed but returned to the pool, so it can be reused later.
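
In ADO.NET this happens automatically as long as the same connection string is used; a small sketch (the server, database, and pool sizes are made-up values):

using System.Data.SqlClient;

// Pooling is on by default; Min/Max Pool Size just make it explicit here
var connectionString =
    "Data Source=myServer;Initial Catalog=myDatabase;Integrated Security=True;" +
    "Pooling=true;Min Pool Size=1;Max Pool Size=100";

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();   // taken from the pool if one is available, otherwise created

    using (var command = new SqlCommand("SELECT 1", connection))
    {
        command.ExecuteScalar();
    }
}   // Dispose/Close returns the connection to the pool instead of destroying it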

reference: http://msdn.microsoft.com/en-us/library/8xx3tyca(v=vs.80).aspx