I'm in need of a script that does the following: it takes all the pages that come up as hits for a given Google query, and dumps them into one big file that I can then search. And it has to work for Hebrew queries and pages. The reason I need this is that I'm trying to get frequency data for various Hebrew verb forms for a paper on loan verbs, and Google hit counts aren't a good proxy because the same token can occur multiple times in a page (and show up as one hit). So I need a large corpus, and there doesn't seem to be an existing one that's suitable.
I don't know enough programming to judge how complicated this would be to write, but something tells me it might be the work of a moment for one sufficiently savvy. And some of you are plenty savvy. So if this is the case, any chance some kind soul could whip up such a thing for me? I'd be much obliged.
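For what it's worth, here is a minimal sketch of such a script in Python. It assumes access to the Google Custom Search JSON API, so the `api_key` and `cx` values are placeholders you would need to obtain from Google; the error handling and the encoding guess (UTF-8, falling back to windows-1255, a common legacy Hebrew encoding) are likewise just reasonable defaults, not a polished tool:

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0       # depth inside script/style tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    """Strip tags and collapse whitespace, keeping only visible text."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(" ".join(parser.chunks).split())


def search_urls(query, api_key, cx, pages=3):
    """Return result URLs for `query` via the Google Custom Search JSON API.

    api_key and cx are placeholders: you must register a Custom Search
    Engine with Google to get real values. Each request returns up to
    10 results; `start` pages through them.
    """
    urls = []
    for start in range(1, pages * 10, 10):
        params = urllib.parse.urlencode(
            {"key": api_key, "cx": cx, "q": query, "start": start})
        with urllib.request.urlopen(
                "https://www.googleapis.com/customsearch/v1?" + params) as r:
            data = json.load(r)
        urls.extend(item["link"] for item in data.get("items", []))
    return urls


def build_corpus(query, api_key, cx, out_path="corpus.txt"):
    """Fetch every hit page and append its visible text to one UTF-8 file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in search_urls(query, api_key, cx):
            try:
                with urllib.request.urlopen(url, timeout=20) as r:
                    raw = r.read()
            except Exception:
                continue  # skip pages that fail to load
            # Hebrew pages are usually UTF-8 but may be windows-1255.
            try:
                html = raw.decode("utf-8")
            except UnicodeDecodeError:
                html = raw.decode("windows-1255", errors="replace")
            out.write(html_to_text(html) + "\n")
```

The output file is plain UTF-8 text, so you could then count verb-form frequencies with any search tool that handles Unicode (e.g. `grep -c` or a short Python counter). Note that the API caps how many results you can page through, so this yields a sample of the hits rather than every page Google knows about.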