Category Archives: Software Dev

FileLocator Pro and Large Searches

During August 2012 we quietly added a new crash reporting module to FileLocator Pro. Based on CrashRpt (an open source product hosted on Google Code), it’s one of the most useful quality control features we’ve ever added, although we hope it’s a ‘feature’ most of our users will never have cause to see.

Since then you may have noticed an increase in FileLocator Pro upgrades related to memory management. It’s not a coincidence.

We’ve had a slow trickle of crash reports over the last few months and, while a few were odd edge cases that were quick to fix, the majority have been related to memory management. It didn’t take long to see that FileLocator Pro had a problem on low-spec machines performing searches where the data was in the gigabyte range and involved millions of files. Some of the problems were simply bugs in the code, e.g. algorithms that reserved more memory than was necessary, but others were subtler, functionality-related issues.

By default FileLocator Pro will record up to 10,000 lines of text per file, and each line can be up to around 20,000 characters long. That’s not usually a problem when searching a limited set of files: rarely will a file have 10,000 hits or a line have 20,000 characters. However, when searching a very large data set with criteria that aren’t very selective (e.g. searching for the letter ‘a’ – which was the actual search phrase in one of the crash reports we received) it can be a problem. It can be compounded by searching through file types that may not have EOL (end-of-line) markers, such as EXE or DLL files, where a single ‘line’ can run to the maximum length. Finally, to make the whole thing just a little bit trickier, what might be a problem on a scrawny 512MB laptop is not necessarily a problem on a sturdy 16GB PC.
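To put rough numbers on the worst case, here is a back-of-the-envelope calculation in C++. The 10,000-line and 20,000-character limits come from the paragraph above; the assumption that results are held as 2-byte (UTF-16) characters, and the constant and function names, are illustrative only. Real per-line bookkeeping would only add to the totals.

```cpp
#include <cassert>
#include <cstdint>

// Default per-file limits quoted in the post.
constexpr std::uint64_t kMaxLinesPerFile = 10000;
constexpr std::uint64_t kMaxCharsPerLine = 20000;
// Assumption: results are stored as UTF-16 text (2 bytes per character);
// per-line overhead is ignored.
constexpr std::uint64_t kBytesPerChar = 2;

// Upper bound on the memory one file's results could occupy.
constexpr std::uint64_t worstCaseBytesPerFile() {
    return kMaxLinesPerFile * kMaxCharsPerLine * kBytesPerChar;  // 400,000,000
}

// How many worst-case files fit into a given amount of RAM.
constexpr std::uint64_t worstCaseFilesThatFit(std::uint64_t ramBytes) {
    return ramBytes / worstCaseBytesPerFile();
}

static_assert(worstCaseBytesPerFile() == 400000000, "about 381 MiB per file");
```

In other words, a single file that maxes out both limits could need roughly 381 MiB, most of a 512MB laptop’s memory, while a 16GB PC could hold about 42 such files.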

The trouble is that FileLocator Pro doesn’t know at the beginning of the search if it’ll find a few hundred files with hits on a few lines (easy), a couple of files with hits on 10,000 lines (not a problem) or a million files with each one reporting hits on 10,000 lines (problem… probably).

FileLocator Pro 6.5 introduces a pre-emptive solution. Based on the amount of memory installed on the machine, FileLocator Pro sets an upper limit for unrestricted results per search (from 20MB up to around 200MB). If that limit is reached during a search, FileLocator Pro starts restricting the results: each file’s results are reduced to around 20 lines, with a maximum of 256 characters per line, and the restriction stays in place until the search finishes. If the search still runs out of memory then, rather than crashing as it did previously, FileLocator Pro terminates the search.
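A minimal sketch of that idea follows. The 20MB floor, the 200MB ceiling and the 20-line, 256-character restricted limits are taken from the description above; the formula mapping installed RAM to a budget (one fortieth of RAM), the 2-bytes-per-character accounting and the `ResultStore` class are hypothetical, not FileLocator Pro’s actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint64_t MB = 1024ull * 1024ull;

// Per-search budget for unrestricted results. The 20MB floor and 200MB
// ceiling are from the post; scaling by 1/40th of installed RAM is an
// assumption made for illustration.
std::uint64_t unrestrictedBudget(std::uint64_t installedRamBytes) {
    return std::max(20 * MB, std::min(installedRamBytes / 40, 200 * MB));
}

// Hypothetical accumulator for one search's results.
class ResultStore {
public:
    explicit ResultStore(std::uint64_t budgetBytes) : budget_(budgetBytes) {
        assert(budgetBytes > 0);
    }

    // Records one file's matching lines, trimmed to whichever limits apply.
    void addFile(const std::vector<std::string>& hitLines) {
        const std::size_t maxLines = restricted_ ? 20 : 10000;
        const std::size_t maxChars = restricted_ ? 256 : 20000;
        const std::size_t lines = std::min(hitLines.size(), maxLines);
        for (std::size_t i = 0; i < lines; ++i)
            used_ += std::min(hitLines[i].size(), maxChars) * kBytesPerChar;
        // Once the budget is reached, restriction stays on until the search ends.
        if (used_ >= budget_)
            restricted_ = true;
    }

    bool restricted() const { return restricted_; }
    std::uint64_t usedBytes() const { return used_; }

private:
    static constexpr std::uint64_t kBytesPerChar = 2;  // assumption: UTF-16 text
    std::uint64_t budget_;
    std::uint64_t used_ = 0;
    bool restricted_ = false;
};
```

With this formula a 512MB machine gets the 20MB floor and a 16GB machine the 200MB ceiling. Checking the budget after each file, rather than after each line, keeps the bookkeeping cheap at the cost of letting the file that crosses the limit overshoot it.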

Our tests on very low-powered machines with just 512MB of RAM have shown a huge improvement in stability for very large searches, and so far we haven’t received any memory-related crash reports. Job done? Not quite, but it’s one more step in cementing FileLocator Pro’s place as the ultimate super-fast, rock-solid search and data analysis tool.

In a previous post I talked about ‘pushing a button that I think does nothing’. I hope you can see from our response to these bug reports that when you ‘Push the Button’ and send us a crash report it most certainly does something!

PST and MSG attachment searching

In 2007 Joel Spolsky wrote a blog post about gnarly problems called ‘Where there’s muck, there’s brass’. It argued that the real benefit to consumers comes from solving gnarly problems, not nice, simple, fun ones.

We’ve just had our own ‘mucky’ experience dealing with attachment searching in PST and MSG files. While the MSG format is nowhere near as complicated as the PST format, both have nasty surprises when it comes to accessing attachments.

However, once it was all up and running, it was impossible not to have a silly grin watching a demo of FileLocator Pro finding some ‘secret’ text inside a PDF, attached to an MSG file, attached to an email in a PST file, that itself was zipped up and attached to an email in another PST file. How cool is that!
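What makes that demo work is that the search keeps descending however deep the nesting goes. The sketch below models only that recursion: the `Item` type and `findNested` function are hypothetical, each item’s text is assumed to have been extracted already (which is exactly the ‘mucky’ part for PST, MSG, ZIP and PDF, and isn’t shown), and this is not FileLocator Pro’s actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: every searchable item (PST, email, MSG, ZIP, PDF...)
// carries its own extracted text plus any embedded items.
struct Item {
    std::string name;
    std::string text;
    std::vector<Item> attachments;
};

// Depth-first search that records the containment path of every item whose
// own text contains the phrase, however deeply it is nested.
void findNested(const Item& item, const std::string& phrase,
                std::vector<std::string>& path,
                std::vector<std::string>& hits) {
    path.push_back(item.name);
    if (item.text.find(phrase) != std::string::npos) {
        std::string joined;
        for (std::size_t i = 0; i < path.size(); ++i) {
            if (i > 0) joined += " > ";
            joined += path[i];
        }
        hits.push_back(joined);
    }
    for (const Item& child : item.attachments)
        findNested(child, phrase, path, hits);
    path.pop_back();
}

// Convenience overload: returns one "outer > ... > inner" path per hit.
std::vector<std::string> findNested(const Item& root, const std::string& phrase) {
    std::vector<std::string> path, hits;
    findNested(root, phrase, path, hits);
    return hits;
}
```

Built for the demo scenario (outer PST, email, ZIP, inner PST, email, MSG, PDF), it reports a single hit whose path lists all seven levels, ending with the PDF.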

In ‘Other News’, we also have a new Q&A site. It’s the same sort of thing as Stack Overflow but just for Mythicsoft products. Check it out: http://qa.mythicsoft.com

Where’s the Apple Mac version?

How would you answer the following question:

Do software developers write software for Windows because

a) Windows is REALLY cool and hip
b) Microsoft creates a great environment in which to write applications
c) 90% of all PC users use Windows
d) Microsoft understands and looks after 3rd party software developers
e) Windows users understand that paying for software provides much needed support for their favourite tools and utilities

My answer: “f) All of the above (except a)”.

What’s my point? Well, it’s a convoluted answer to numerous, very complimentary, requests to port FileLocator Pro to the Mac. Usually I reply something along the lines of “We don’t have the resources to support multiple platforms with minimal market share”. But it’s not as simple as that.


Automated Testing

Our automated test system has discovered bugs, often just before a big release, more times than I’d care to admit. It’s an invaluable tool and one recommended by most modern development methodologies. However, I’m not a fan of methodologies in general. Let me preface that with a little bit of personal history…

Back in 1994, while working for a bank in London, I was asked to come up with a specification standard that could be given to any old ‘monkey’ and would produce reliable results. Management were fed up with the disparities in code quality and productivity throughout the development teams.


Running ATL Components created in VS2008 as CLSCTX_LOCAL_SERVER

Yesterday Jasenko, one of the developers here, noticed that our newest components weren’t working in ‘Safe Mode’. Since Safe Mode simply runs the component in a separate process by specifying CLSCTX_LOCAL_SERVER instead of CLSCTX_INPROC_SERVER, this had us all quite confused; it used to work seamlessly. On closer inspection, it appears that the default settings generated by Visual Studio 2008 are to blame.

Two extra changes are now required for out-of-process activation to work:

Continue reading

Bugs can be code styling errors

I just found this while debugging an issue:

if ( isConfigured() ) 
    if ( isAllowed() ) 
        doTask(); 
else 
    noConfigurationIssue(); 

I can’t tell you how many times I looked at that before I saw the problem. My brain was so used to following the indentation to see the flow of control that it didn’t notice the actual flow was different from the visual flow. @*&^!
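The trap is C++’s dangling-else rule: an `else` binds to the nearest unmatched `if`, so here it belongs to `if ( isAllowed() )`. As a result `noConfigurationIssue()` runs when the configuration is fine but the task isn’t allowed, and nothing at all happens when the configuration is missing. The sketch below stands in two booleans for the snippet’s function calls so both readings can be compared; the `asParsed` and `asIntended` functions are illustrative only.

```cpp
#include <cassert>
#include <string>

// Each variant returns the name of the action it would have taken.

// What the compiler actually sees: the else pairs with the nearest
// unmatched if, i.e. `if (allowed)`, whatever the indentation suggests.
std::string asParsed(bool configured, bool allowed) {
    std::string action = "none";
    if (configured) {
        if (allowed)
            action = "doTask";
        else
            action = "noConfigurationIssue";
    }
    return action;
}

// What the indentation suggested: braces pin the else to the outer if.
std::string asIntended(bool configured, bool allowed) {
    std::string action = "none";
    if (configured) {
        if (allowed)
            action = "doTask";
    } else {
        action = "noConfigurationIssue";
    }
    return action;
}
```

Bracing both levels, as in `asIntended`, makes the visual flow and the real flow agree. GCC and Clang can also flag the ambiguous form with their `-Wdangling-else` warning.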

Where are dynamically typed languages heading?

As with most other kids in the 1980s, I grew up programming in BASIC. Variables weren’t strongly typed; they just held values. Type mismatches were reported at runtime with the ‘Type mismatch at line xx’ error. If you wanted to start running a different piece of code, you could simply GOTO whatever line you wanted. BASIC was a great language to learn programming in: you were in charge and the computer did its best to keep up.

Pascal, on the other hand, was something quite different. Pascal didn’t just run (at least the version we used didn’t); it needed to be compiled first. Variables had to be declared, with their types specified, before you even used them! No longer could you just jump around the code; instead you had to split the program up into functions and carefully control how they interacted. I hated Pascal. I preferred 6502 assembly (with an instruction set so limited there was no multiply instruction) to Pascal.


Managed C++ Destructors and Finalizers

I spend a lot of time moving between C++ and C#. Fortunately, the languages are different enough that it’s not too difficult switching between concepts such as stack-allocated objects in C++ and garbage-collected objects in C#. However, if I’m not concentrating I do run into trouble when coding in Managed C++, since I expect it to behave just like C++, only with access to .NET classes.

Managed C++ is really cool, but it has a couple of gotchas. The one that has bitten me more than once is the difference between destructors and finalizers. To make sure that I don’t fall into the trap again, I’m going to elaborate on what is a very long comment in one of our source code files.

In standard C++ you would write something like:

Continue reading

Compiling Boost with Visual Studio 2008 (VS2008)

I’m currently in the process of moving all our source over to Visual Studio 2008. Most of our 3rd party libraries compiled right out of the box. However, the excellent Boost library was not so simple.

Boost has quite a complex build process that automatically discovers your compiler and builds the library with almost no user interaction. Unfortunately, the current release of Boost (1.34) does not recognize VS2008, and the build process will only pick up older versions of the compiler. Fortunately, once you know which files need to change, it’s not too difficult to add a new compiler to the build process.

For anyone else who’s trying to compile Boost with VS2008, the link below contains the files I updated to fix our build process.

boost_vs2008.zip