I had actually wanted to blog a few weeks ago about the fact that we’ve released some new APIs for WDS as part of our 2.5 release. I chose to hold off on posting any comments here because I knew we had a fairly significant update in the works. So now that we’ve posted an updated set of files, let me tell you about them.
The 2.5 release was largely about internationalization for us, but we were able to squeeze in some new APIs that we’ve been hearing requests for. The new APIs provide access to a SearchManager component to simplify setup & management of protocol handlers and a SearchDesktop component to provide programmatic access to our index. Both of these components are implemented as COM objects and require WDS 2.5 or later.
The initial SDK we posted contained a single IDL file, which required compilation via Visual Studio’s MIDL utility into a .tlb file before you could do anything useful with it. And if you wanted to use the SDK from within a .NET application, you then needed to run the .tlb generated by MIDL through another utility called TLBIMP, which generates an interop assembly that can then be added as a reference to your .NET project. Easy, right? Well, to simplify both your lives and ours (we have to explain this stuff to you), we decided to do this work for you and give it to you in the form of an updated SDK.
We also talked Brandon Paddock into putting together a sample C# app to get you started in using the new APIs. I’m sure Brandon will blog about the sample, so I’ll just talk a bit about the files in the SDK and the basic principles behind the API. Be sure to check out the sample, as there’s some cool code in there that enables data binding of a query’s results to standard WinForms controls. I’m also going to save discussions about the SearchManager component for another post, as there’s enough to talk about with the SearchDesktop component.
What’s in the SDK?
Good question. Here’s a list of the files:
End user agreement:
- SDK EULA.rtf
COM interface definitions:
- wdsQuery.idl
- wdsSetup.idl
Output of MIDL:
- wdsQuery.tlb
- wdsQuery.h
- wdsQuery_i.c
- wdsQuery_p.c
- wdsSetup.tlb
- wdsSetup.h
- wdsSetup_i.c
- wdsSetup_p.c
Output of TLBIMP:
- WDSQuery.dll
- WDSSetup.dll
The EULA is mandatory, the .idl files define the COM objects we’re exposing, the files built by MIDL are the ones you’ll need if you plan to call the API from C++, and the two .dlls are the .NET interop assemblies generated after we ran the .tlbs through TLBIMP. For .NET developers the two .dlls are the only files you’ll need to make your applications work, and in most cases you’ll only need one or the other.
If you create a new C# project and add references to WDSQuery.dll & WDSSetup.dll, you’ll notice from within the Object Browser that two new namespaces are available to your project:
Microsoft.Windows.DesktopSearch.Query
Microsoft.Windows.DesktopSearch.Setup
If you browse into those namespaces you’ll notice a SearchDesktopClass class along with a bunch of ADO stuff in the Microsoft.Windows.DesktopSearch.Query namespace and a SearchManagerClass class in the Microsoft.Windows.DesktopSearch.Setup namespace. The two classes are interop wrappers around the two COM components we’ve exposed via the SDK. Like I said above, the SearchManager component is only relevant to building new protocol handlers, so we’ll save that discussion for another time.
Tell me about the classy SearchDesktopClass class…
Ok so now I’m just getting corny… The SearchDesktopClass is the class you’ll use to create a connection and issue a query to the indexer. If you look at the methods exposed by this class you’ll see it’s pretty simple:
public virtual new Microsoft.Windows.DesktopSearch.Query._Recordset ExecuteQuery ( System.String lpcwstrQuery , System.String lpcwstrColumn , System.String lpcwstrSort , System.String lpcwstrRestriction )
public virtual new Microsoft.Windows.DesktopSearch.Query._Recordset ExecuteSQLQuery ( System.String lpcwstrSQL )
Just two methods, of which you’re probably only ever going to call one: ExecuteQuery(). Why so simple? It’s a really low-level wrapper around the core component used by the UI when it calls the indexer. The UI and the SearchDesktopClass both communicate with the indexer via an OLE-DB provider exposed by the indexer. More accurately, they both communicate with the indexer’s OLE-DB provider via ADO, so you can think of the SearchDesktopClass as a thin wrapper around ADO calls to the indexer, which is why results come back as an ADO recordset.
Such low-level access to the indexer has pros and cons. The pros are that it’s fast and you can pretty much ask for anything you want. The con is that since you can ask for pretty much anything you want, you first have to know what it is that you want. :) Which leads into my explanation of the differences between ExecuteQuery() and ExecuteSQLQuery(). Let’s start with ExecuteSQLQuery().
Calling SearchDesktopClass.ExecuteSQLQuery()
The ExecuteSQLQuery() method is pretty much a direct call to the indexer’s OLE-DB provider via ADO. The only thing we’ve done is hide some connection setup details from you. It takes as its single argument the complete SQL statement that should be evaluated by the indexer. Here’s the SQL generated when you query for “test query” and don’t filter to a specific type:
SELECT DocFormat, Url, HasAttach, Characterization, PerceivedType, IsDeleted, WorkID, IsAttachment, ConversationID, FileExt, Rank, DocTitlePrefix, DocTitle, FlagText, IsFlagged, Create, DueDate, Importance, ToName, CcName, AttachmentNames, DocCompany, Location, DocCategory, DocKeywords, MusicAlbum, FileName, MusicGenre, DocAuthor, PrimaryDate, Size, FileExtDesc, DisplayFolder, PrimaryTelephone, EmailAddress, EndDate, Location, DueDate, TaskStatus, MusicAlbum, MusicGenre, DocComments, DocKeywords, AudioAvgDataRate, DocComments FROM "MyIndex"..scope() WHERE CONTAINS(*,'"test*"',1033) AND CONTAINS(*,'"query*"',1033) ORDER BY PrimaryDate DESC
Here’s another example of the same query, “test query” but filtered to only show e-mails:
SELECT DocFormat, Url, HasAttach, Characterization, PerceivedType, IsDeleted, WorkID, IsAttachment, ConversationID, FileExt, Rank, DocAuthor, FromName, DocTitlePrefix, DocTitle, Importance, IsFlagged, IsFlaggedCompleted, FlagText, DueDate, AttachmentNames, BccName, CcName, ToName, People, DocCategory, ReceivedDate, PrimaryDate, FromAddress, ToAddress, CcAddress, BccAddress, Size, DisplayFolder FROM "MyIndex"..scope() WHERE CONTAINS(*,'"test*"',1033) AND CONTAINS(*,'"query*"',1033) AND (Contains(PerceivedType,'email') RANK BY COERCION(Absolute, 1000)) ORDER BY ReceivedDate DESC
These guys look a little overwhelming but they’re not too bad once you understand them. If you look closely at these two expressions you’ll notice two differences: 1) some of the columns are different because we’ve figured out you’re querying for e-mails and we might want to show those specialized columns to you, and 2) the WHERE clause has changed to accommodate the added constraint needed to only return e-mails.
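To make the shape of those statements concrete, here is a minimal sketch that assembles one by hand for a plain query with no AQS. WdsSqlSketch and Build are hypothetical names, not part of the SDK. The sketch assumes each word of the query becomes a prefix match like the CONTAINS clauses above, rejects quote characters rather than guessing at the provider’s escaping rules, and leaves out the RANK BY COERCION part of the e-mail example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the WDS SDK: assembles a statement in the
// same shape as the two examples above, for plain queries with no AQS.
public static class WdsSqlSketch
{
    public static string Build(
        IEnumerable<string> columns,
        string queryText,
        string orderBy = null,      // e.g. "PrimaryDate DESC"; null means unsorted
        string restriction = null,  // e.g. "Contains(PerceivedType,'email')"
        int lcid = 1033)            // locale ID that appears in the examples above
    {
        string[] terms = queryText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (terms.Length == 0)
            throw new ArgumentException("Query text has no terms.", nameof(queryText));

        // Assumption: quotes are rejected instead of escaped, because the
        // provider's escaping rules are not covered in this post.
        if (terms.Any(t => t.Contains("'") || t.Contains("\"")))
            throw new ArgumentException("Quotes are not supported in this sketch.", nameof(queryText));

        // Each word becomes a prefix match: CONTAINS(*,'"word*"',1033)
        List<string> clauses = terms
            .Select(t => "CONTAINS(*,'\"" + t + "*\"'," + lcid + ")")
            .ToList();

        // The optional restriction is appended in parentheses.
        if (!string.IsNullOrEmpty(restriction))
            clauses.Add("(" + restriction + ")");

        string sql = "SELECT " + string.Join(", ", columns)
                   + " FROM \"MyIndex\"..scope() WHERE " + string.Join(" AND ", clauses);
        if (!string.IsNullOrEmpty(orderBy))
            sql += " ORDER BY " + orderBy;
        return sql;
    }
}
```

Calling WdsSqlSketch.Build(new[] { "DocTitle", "Url" }, "test query", "PrimaryDate DESC") produces a statement with the same FROM, WHERE and ORDER BY clauses as the first example, just with a shorter column list. The resulting string is what you would hand to ExecuteSQLQuery().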
So how do you know what to put in your SQL expression? Well, for the columns you just have to know which ones you want. We don’t have them documented yet because they’re not stable enough to say “We’re going to support these forever!” But check out the sample app and the examples above for a good starter list, and I’ll also try to post some additional sample SQL queries in the near future that call out additional columns. Just be aware that some of them most likely WILL CHANGE in the future. At any rate, the columns are a fairly manageable portion of the expression, but the WHERE clause is another story.
The sample WHERE clauses above are relatively tame because the original query text didn’t contain any Advanced Query Syntax (AQS). Once you start adding AQS to the mix your WHERE clauses can get messy in a hurry. To help you out of this jam we added the ExecuteQuery() method. Remember how I said that’s probably the only method you’ll ever call? :)
Calling SearchDesktopClass.ExecuteQuery()
The ExecuteQuery() method implements the proper logic for building a well-formed SQL expression before calling the indexer’s OLE-DB provider via ADO. ExecuteQuery() takes the following arguments:
- lpcwstrQuery: This is the text of the user’s query, which MAY contain AQS expressions. The query will be parsed and converted into a WHERE clause that becomes part of the SQL expression passed to the indexer.
- lpcwstrColumn: This is a list of 1 or more columns you want retrieved from the indexer. This is the SELECT portion of the SQL expression.
- lpcwstrSort: This is the column you want the results ordered by. This directly becomes the ORDER BY clause, so you’ll need to pass it in as “ReceivedDate DESC” or “ReceivedDate ASC”. This value must be explicitly set to null if you don’t want the results sorted. Due to a bug in 2.5, passing an empty string (“”) will cause the method to fail.
- lpcwstrRestriction: This is an optional constraint you want appended to the SQL expression’s WHERE clause. Why do you need this? It lets you filter the results to a particular type without having to append the AQS expression “type:email” to the user’s query. To constrain the results to email you’d pass in “Contains(PerceivedType,'email')”. This value must also be explicitly set to null if you don’t want any additional constraint applied. Due to a bug in 2.5, passing an empty string (“”) will cause the method to fail.
As you can see, calling ExecuteQuery() is a lot simpler than calling ExecuteSQLQuery() even though it takes more arguments. The only special thing you still need to know is the list of columns you want returned. But watch this and other blogs for updates to the unofficial list of columns currently supported in 2.5.
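Since the empty-string bug is easy to trip over, one option is a small guard in front of the call. What follows is a minimal sketch, not SDK code: SafeQuerySketch and Run are hypothetical names, and the delegate stands in for SearchDesktopClass.ExecuteQuery so the snippet compiles without a reference to WDSQuery.dll.

```csharp
using System;

// Hypothetical wrapper, not part of the WDS SDK. The delegate parameter stands
// in for SearchDesktopClass.ExecuteQuery (query, column, sort, restriction).
public static class SafeQuerySketch
{
    // Works around the 2.5 bug: empty strings are turned into null.
    private static string NullIfEmpty(string s) => string.IsNullOrEmpty(s) ? null : s;

    public static TResult Run<TResult>(
        Func<string, string, string, string, TResult> executeQuery,
        string query,
        string columns,
        string sort = null,
        string restriction = null)
    {
        if (executeQuery == null)
            throw new ArgumentNullException(nameof(executeQuery));

        // The column list must name at least one column.
        if (string.IsNullOrEmpty(columns))
            throw new ArgumentException("At least one column is required.", nameof(columns));

        return executeQuery(
            NullIfEmpty(query), columns, NullIfEmpty(sort), NullIfEmpty(restriction));
    }
}

// With a reference to WDSQuery.dll, a call might look like this (untested):
//   var search = new SearchDesktopClass();
//   var results = SafeQuerySketch.Run(search.ExecuteQuery,
//       "test query", "DocTitle, Url", "ReceivedDate DESC", "");
```

Taking a delegate rather than a SearchDesktopClass instance keeps the guard testable with a fake, and it leaves you free to create the instance however you like.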
Final tips…
In summary, we can’t wait to see what cool tools you guys build using the new APIs. Check out the sample and use that as a starting point. And here are a couple of final tips for now:
- Create a new SearchDesktopClass instance for every query. There shouldn’t be any leaks when reusing an existing SearchDesktopClass, but in the UI we create a new instance for every query, so I’d recommend you do too, as this will be the most tested usage pattern.
- Don’t pass empty strings (“”) in any of the arguments to ExecuteQuery() at least for 2.5. Pass null instead as there is a known bug related to passing empty strings.
- The data binding stuff in the sample is super cool! But when dealing with really large result sets it may not be super efficient, as the entire result set has to be copied to a .NET DataTable object before use. For ultimate speed you’ll have to stick with using the raw ADO recordset we return, but you should at least give the data binding stuff a shot and see if you’re happy with the perf.
- Give us feedback! Lots and lots of feedback. We are listening…
-Steve