Monday, June 09, 2008

GSoC Hoogle: Week 2

This week I submitted my PhD thesis, emptied my entire rented house of furniture, spent £96 on petrol, drove (or was driven) 400 miles, travelled a similar distance by train, visited the north of Scotland, and am now working on a borrowed Mac in London. Needless to say, it's been rather busy - but now all the excitement is over and I should be able to focus properly on Hoogle.

In the last week I've been focusing on the database, which stores all the function names and type signatures and is therefore a critical part of Hoogle. I want searching to stay fast as the number of known functions increases; the current version slows down as the database grows. For text searching, the trie data structure has this nice property, and can also deal with searching for substrings. For fuzzy type searching, things are a lot more complex. However, I think I have an algorithm which is fast (few operations), accurate (gives better matches), scalable (independent of the number of functions in the database) and lazy (returns the best results first). The idea is to have a graph of function results, and then navigate this graph to find the best match.
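To make the trie idea concrete, here is a minimal sketch of substring search over function names. It is an illustration only, not Hoogle's actual database code: the names Trie, fromNames and search are made up for this example, and it assumes a simple suffix-trie layout in which every suffix of every name is inserted and each node remembers the names that pass through it. The cost of a lookup then depends on the length of the query rather than on how many names are stored (at the price of extra memory).

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Char (toLower)
import Data.List (foldl', tails)

-- | A trie node: the names reachable through this node, plus the
--   child nodes keyed by the next character. (Hypothetical type,
--   not taken from Hoogle.)
data Trie = Trie (Set.Set String) (Map.Map Char Trie)

emptyTrie :: Trie
emptyTrie = Trie Set.empty Map.empty

-- | Record 'name' at every node along the path spelled by 'key'.
insertKey :: String -> String -> Trie -> Trie
insertKey name key (Trie names kids) = Trie (Set.insert name names) kids'
  where
    kids' = case key of
      []     -> kids
      (c:cs) -> Map.insert c (insertKey name cs (Map.findWithDefault emptyTrie c kids)) kids

-- | Index a name under all of its lower-cased suffixes, so that any
--   substring of the name is a prefix of some indexed key.
insertName :: Trie -> String -> Trie
insertName trie name =
  foldl' (\t suffix -> insertKey name suffix t) trie (tails (map toLower name))

fromNames :: [String] -> Trie
fromNames = foldl' insertName emptyTrie

-- | All names containing the query as a substring, ignoring case.
--   Walks one node per query character.
search :: String -> Trie -> [String]
search query = go (map toLower query)
  where
    go []     (Trie names _) = Set.toAscList names
    go (c:cs) (Trie _ kids)  = maybe [] (go cs) (Map.lookup c kids)
```

For example, in GHCi, search "map" (fromNames ["map", "concatMap", "filter", "mapM_"]) gives ["concatMap","map","mapM_"]. Results come back in alphabetical order here; ranking them by quality is a separate problem that this sketch does not attempt.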

Most of the database work has been theoretical, but I have done some coding. In particular, I have started on the database creation code, and polished the flag argument interaction code some more. Part of the development required the Derive tool, and in doing this work I noticed a few deficiencies. In particular, if you run Derive on Windows over a file with UNIX line endings, the tool generates a file with Windows line endings. This problem, and a few others, are now fixed.
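As an illustration of the line-ending problem, here is one way a tool could keep whatever convention the input file already uses. This is a sketch under my own assumptions and not necessarily how Derive was fixed: the names LineEnding, detect and preserving are invented for the example, and it assumes the file contents were read without newline translation (for instance in binary mode), so that any "\r\n" pairs are still visible.

```haskell
import Data.List (isInfixOf)

-- | The two conventions considered in this sketch (hypothetical type).
data LineEnding = Unix | Windows deriving (Eq, Show)

-- | Guess the convention from the raw, untranslated file contents.
detect :: String -> LineEnding
detect raw = if "\r\n" `isInfixOf` raw then Windows else Unix

-- | Normalise "\r\n" to "\n" so the transformation only sees "\n".
toUnix :: String -> String
toUnix ('\r':'\n':rest) = '\n' : toUnix rest
toUnix (c:rest)         = c : toUnix rest
toUnix []               = []

-- | Convert normalised text back to the requested convention.
fromUnix :: LineEnding -> String -> String
fromUnix Unix    = id
fromUnix Windows = concatMap (\c -> if c == '\n' then "\r\n" else [c])

-- | Apply a text transformation while keeping the input's line endings.
preserving :: (String -> String) -> String -> String
preserving transform raw = fromUnix (detect raw) (transform (toUnix raw))
```

The point of the design is that the output convention is taken from the input file rather than from the platform the tool happens to run on.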

Next week: Database creation and searching. I want text searches to work by the end of the week.

User visible changes: The --help flag prints out information on the arguments.

PS. I was looking forward to seeing some blog posts from the other Haskell summer of code students on the Haskell Planet. If any Haskell GSoC student does have a blog, please ask for it to be included!

2 comments:

Roman Cheplyaka said...

Funny. I submitted my blog to Planet Haskell several weeks ago, but only now it's included.

Anyone else?

Roman Cheplyaka said...

And, by the way, there is planet-soc.com