
Sending out Microsoft Search Server failed crawl messages

What I want to do is build an app that sends out an email every time a crawl operation fails when we trigger crawling against a search service application. After reading about it, it seems there is no configuration in Microsoft Search Server that provides this out of the box.

Instead, I found out that you can use PowerShell to perform administrative tasks: getting the search service applications of a farm, adding one, deleting one, and so on. Read more about the available cmdlets in the documentation.

With this knowledge I set out to build a simple script that gets the error messages from today's crawl operations and then sends the results in an email. The script is triggered daily by the Task Scheduler.

I had no experience coding in PowerShell before, but luckily I found an existing script that I could tweak a bit to do just what I want.

Here is the PowerShell script after my tweaks:

$ver = $host | select version
if ($ver.Version.Major -gt 1) { $Host.Runspace.ThreadOptions = "ReuseThread" }
if ((Get-PSSnapin -Name Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue) -eq $null) {
    Add-PSSnapin Microsoft.SharePoint.PowerShell
}

#---- CONFIG
$searchServiceName = "just testing"
$logFilePath = "c:\temp\searchserverlog.txt"

function GetErrorLogs() {
    $ssa = Get-SPEnterpriseSearchServiceApplication | Where-Object { $_.Name -eq $searchServiceName }
    $logViewer = New-Object Microsoft.Office.Server.Search.Administration.LogViewer $ssa

    # Filter the crawl log down to Error entries only
    $crawlLogFilters = New-Object Microsoft.Office.Server.Search.Administration.CrawlLogFilters
    $crawlLogFilters.AddFilter([Microsoft.Office.Server.Search.Administration.MessageType]"Error")

    # GetCurrentCrawlLogData pages its results; keep fetching until StartAt comes back as -1
    $startNum = 0
    $errorItems = @()
    $errorItems += $logViewer.GetCurrentCrawlLogData($crawlLogFilters, ([ref] $startNum))
    while ($startNum -ne -1) {
        $crawlLogFilters.AddFilter("StartAt", $startNum)
        $startNum = 0
        $errorItems += $logViewer.GetCurrentCrawlLogData($crawlLogFilters, ([ref] $startNum))
    }
    return $errorItems
}

function sendEmail($errorItemsParam) {
    $currentDate = Get-Date
    $isThereAnyErrorsToday = $false

    # Build an HTML table containing only today's errors
    $result = "<table>"
    foreach ($error in $errorItemsParam) {
        $date = Get-Date $error.LastTouchStart
        if ($date.Date -eq $currentDate.Date) {
            $result += "<tr><td>DisplayUrl</td><td>" + $error.DisplayUrl + "</td></tr>"
            $result += "<tr><td>ErrorLevel</td><td>" + $error.ErrorLevel + "</td></tr>"
            $result += "<tr><td>ErrorMsg</td><td>" + $error.ErrorMsg + "</td></tr>"
            $result += "<tr><td>HResult</td><td>" + $error.HResult + "</td></tr>"
            $result += "<tr><td>ErrorDesc</td><td>" + $error.ErrorDesc + "</td></tr>"
            $result += "<tr><td>ContentSourceId</td><td>" + $error.ContentSourceId + "</td></tr>"
            $result += "<tr><td>LastTouchStart</td><td>" + $error.LastTouchStart + "</td></tr>"
            $result += "<tr><td colspan='2'></td></tr>"
            $isThereAnyErrorsToday = $true
        }
    }
    $result += "</table>"

    # Persist the table so it can be used as the mail body
    $result > $logFilePath

    # SMTP server name
    $smtpServer = "mailserver.com"

    # Build the mail message
    $msg = New-Object Net.Mail.MailMessage
    $msg.From = "powershellscript@xxxx.com"
    $msg.ReplyTo = "powershellscript@xxxx.com"
    $msg.To.Add("recipient@xxxx.com")
    $msg.Subject = "Search server error logs"
    $msg.Body = Get-Content $logFilePath
    $msg.IsBodyHtml = $true

    # Send the email only when today's crawl produced errors
    $smtp = New-Object Net.Mail.SmtpClient($smtpServer)
    if ($isThereAnyErrorsToday) {
        Write-Host "Sending Email"
        $smtp.Send($msg)
    }
    else {
        Write-Host "No error today"
    }
}

$messageBody = GetErrorLogs
sendEmail($messageBody)

This script is executed by a batch script, which in turn is set up to run every day by the Task Scheduler.

powershell -noexit "& 'D:\search-error-in-logs.ps1'"

 

The end result looks like this:

DisplayUrl      http://kompasabc.com
ErrorLevel      2
ErrorMsg        The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use Search Administration page.
HResult         -2147216863
ErrorDesc
ContentSourceId 3
LastTouchStart  03/22/2013 00:24:01

DisplayUrl      http://ljkadjalsdkjalskdlaskjdlajd.com
ErrorLevel      2
ErrorMsg        The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use Search Administration page.
HResult         -2147216863
ErrorDesc
ContentSourceId 3
LastTouchStart  03/22/2013 00:23:53

DisplayUrl      http://kompas123.com
ErrorLevel      2
ErrorMsg        The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use Search Administration page.
HResult         -2147216863
ErrorDesc
ContentSourceId 3
LastTouchStart  03/22/2013 00:23:53

DisplayUrl      http://yayayayayayayayaya.com
ErrorLevel      2
ErrorMsg        The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use Search Administration page.
HResult         -2147216863
ErrorDesc
ContentSourceId 3
LastTouchStart  03/22/2013 00:23:53

DisplayUrl      http://kompasabc123.com
ErrorLevel      2
ErrorMsg        The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use Search Administration page.
HResult         -2147216863
ErrorDesc
ContentSourceId 3
LastTouchStart  03/22/2013 00:23:51

How database indexes work

Ever wonder why a query that took 2 minutes to complete suddenly runs in under a second once you apply the index suggested by the SQL Server Database Engine Tuning Advisor?

Think of it like this: when data is stored on a disk-based storage device, the entirety of the data is stored as blocks. When you run a query against an unindexed field whose values are not unique, finding a value requires scanning the blocks one by one (N block accesses in the worst case).

With an indexed field, a new set of blocks is created to store the values of the indexed field in sorted order. A binary search can then be used to find a value in the indexed field (log2 N block accesses).
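To see why sorted order matters, here is a minimal Python sketch (not tied to any database engine): binary search over sorted keys halves the candidate range on every step, while unsorted data can only be scanned front to back.

```python
from bisect import bisect_left

def binary_search(sorted_keys, target):
    """Return the index of target in sorted_keys, or -1 if absent.

    Touches O(log2 N) entries instead of up to N for a linear scan.
    """
    i = bisect_left(sorted_keys, target)
    if i < len(sorted_keys) and sorted_keys[i] == target:
        return i
    return -1

# Hypothetical sorted index on a FirstName-like column
keys = ["alice", "bob", "carol", "dave", "eve"]
print(binary_search(keys, "carol"))  # → 2
print(binary_search(keys, "zed"))    # → -1
```

A real index stores these sorted keys on disk pages rather than in a Python list, but the search principle is the same.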

Now, for example, say you have a table schema like the following:

Person

Field Data type Size in disk
Id (primary key) unsigned int 4 bytes
FirstName char(50) 50 bytes
LastName char(50) 50 bytes

With that schema, storing one record on disk takes 104 bytes, so one disk block (1024 bytes) can hold 1024/104 = 9 records. If you have 1,000,000 records, it takes 1,000,000/9 ≈ 111,111 disk blocks to store all of that data.
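The arithmetic above can be checked in a few lines (a sketch using the figures from the text: 1024-byte blocks and 104-byte records):

```python
BLOCK_SIZE = 1024          # bytes per disk block
RECORD_SIZE = 4 + 50 + 50  # Id + FirstName + LastName = 104 bytes
N_RECORDS = 1_000_000

# Only whole records fit in a block, so use integer division
records_per_block = BLOCK_SIZE // RECORD_SIZE    # → 9
blocks_needed = N_RECORDS // records_per_block   # → 111111 (≈, ignoring the partial last block)
print(records_per_block, blocks_needed)
```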

Now, depending on the type of query you run against the table, you get very different performance. For example, a search query against the Id field can use a binary search (log2 N), which comes to log2 111,111 ≈ 16.76, i.e. at most 17 block accesses. This is possible because the Id field is a primary key, so its values are unique and kept sorted.

Compare that with a query against the FirstName field: since FirstName is not sorted, a binary search is not possible, so in the worst case it takes all 111,111 block accesses to find the value. A huge difference.
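Assuming one key comparison per block access, the worst-case numbers from the two paragraphs above work out as:

```python
import math

blocks = 111_111  # blocks holding 1,000,000 records at 9 per block

# Sorted primary key (Id): binary search over the blocks,
# rounding up since a partial step still costs a block access
indexed_accesses = math.ceil(math.log2(blocks))  # → 17

# Unsorted, unindexed field (FirstName): full scan of every block
scan_accesses = blocks                           # → 111111

print(indexed_accesses, scan_accesses)
```

Roughly a 6,500x difference in worst-case block accesses, which is why the Tuning Advisor's suggestion turns a 2-minute query into a sub-second one.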

Creating an index helps a slow-performing query greatly, but keep in mind that creating an index also means creating a new data structure that is stored on disk. For example, if we create an index for the FirstName field:

Field Data type Size in disk
FirstName char(50) 50 bytes
(record pointer) special 4 bytes

Based on that schema, each index record takes 54 bytes, so 1024/54 = 18 index records fit in a disk block, and 1,000,000/18 ≈ 55,555 disk blocks are needed. Manage your indexes wisely: the more fields an index contains, or the more indexes you create, the more disk space they take.
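The same back-of-the-envelope calculation for the FirstName index (same 1024-byte block assumption as above):

```python
BLOCK_SIZE = 1024
INDEX_RECORD_SIZE = 50 + 4  # FirstName + record pointer = 54 bytes
N_RECORDS = 1_000_000

index_records_per_block = BLOCK_SIZE // INDEX_RECORD_SIZE  # → 18
index_blocks = N_RECORDS // index_records_per_block        # → 55555 (≈)
print(index_records_per_block, index_blocks)
```

So this one index costs about half as much disk space as the table itself, which is the trade-off to weigh before indexing every field.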

Reference:

http://stackoverflow.com/questions/1108/how-does-database-indexing-work/1130#1130