WWWJDIC Examples Search Function


This function allows you to search the WWWJDIC Examples file for examples which contain selected words in Japanese and/or English. This page provides some information and tips about conducting the search.

If you are looking for examples of single Japanese words, it is probably better to look up the word in the EDICT file and follow the [Ex] link.

Simple Searching

The most straightforward way to use this function is to enter some text in either English or Japanese, and see if that text is to be found in any of the examples.

For example, if you enter 日本語を勉強した you get two matching sentences:

He always wanted to study Japanese.
He studied Japanese eagerly.

Similarly, entering holiday plan results in several examples beginning with:

Our holiday plans are still in the air.
My wife and I agreed on a holiday plan.

This is the result of the strings of characters making up, for example, 日本語を勉強した and holiday plan appearing in the sentences exactly as entered. Entering something like plan holiday will result in no matches at all.

Note that the matching of English characters is not sensitive to case, so you don't have to worry about whether words have capital letters.
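
As a rough illustration of this behaviour, the following Python sketch treats the search text as a plain string that must appear in a sentence exactly as typed, ignoring case. The find_examples helper and the two-sentence list are hypothetical stand-ins; the real server searches the full examples file and is not written in Python.

```python
# Hypothetical stand-in for the examples file: two sentences quoted above.
EXAMPLES = [
    "Our holiday plans are still in the air.",
    "My wife and I agreed on a holiday plan.",
]


def find_examples(query, sentences=EXAMPLES):
    """Return the sentences containing `query` exactly as typed, ignoring case."""
    needle = query.casefold()
    return [s for s in sentences if needle in s.casefold()]


print(len(find_examples("holiday plan")))  # both sentences contain the string
print(len(find_examples("HOLIDAY PLAN")))  # capital letters make no difference
print(len(find_examples("plan holiday")))  # the reversed order is not found
```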

Note also that if you attempt to match on single Japanese characters, e.g. a single kanji, you are likely to get spurious results, as the search software used in this function is a standard Unix utility that doesn't understand multiple-byte characters. It is recommended you use strings of at least two characters.
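
The Python sketch below shows one way such a spurious match can arise when text is compared byte by byte. It assumes the text is stored in EUC-JP, where kana and kanji occupy two bytes each; this page does not say which encoding the server uses, so that choice is an assumption made purely for illustration.

```python
ENCODING = "euc_jp"  # assumption: kana and kanji take two bytes each

text = "ああ"
raw = text.encode(ENCODING)  # four bytes, two per character

# Take the two bytes that straddle the boundary between the two characters.
straddle = raw[1:3]
ghost = straddle.decode(ENCODING)  # those two bytes also form a valid character

print(ghost in text)                   # False: that character is not in the text
print(ghost.encode(ENCODING) in raw)   # True: but its bytes occur in the byte string
```

A longer search string leaves far less room for this kind of accidental overlap, which is the reason for the two-character recommendation.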

Cutting the search string back to 日本語を勉強 results in more sentences, as it also matches 勉強する, 勉強しなければ, etc. Searching just for を勉強 will get sentences about studying all sorts of things. Similarly, searching for an adjective like 新しい will only get sentences with that exact form, whereas searching for 新し will get both sentences with 新しい and sentences with 新しく, 新しくない, etc.
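
The effect of shortening the search string can be seen in a small Python sketch. The word forms are the ones mentioned in the paragraph above, standing in for whole sentences; the list itself is only an illustration.

```python
# Word forms quoted in the paragraph above, standing in for whole sentences.
forms = ["新しい", "新しく", "新しくない"]

full = [f for f in forms if "新しい" in f]  # search for the dictionary form
stem = [f for f in forms if "新し" in f]    # search for the shared stem

print(full)  # only the exact form 新しい
print(stem)  # all three forms
```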

Advanced Searching

It is possible to do more sophisticated searching by drawing on some features of the search routine used in this function. Although plan holiday results in no sentences being selected, if you try plan.*holiday you get three matched sentences:

Many young women in their 20s plan to go abroad during their summer holidays.
As yet we have not made any plans for the holidays.
We are making plans for the holidays.

Note that the words plan and holiday both appear in the sentences, in that order, but with other letters in between. The .* between plan and holiday signals that any intervening characters are to be ignored.
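
The same pattern can be tried in Python, whose re module accepts this syntax. This is only a sketch using the three sentences quoted above; the server's own matching code is not shown on this page.

```python
import re

sentences = [
    "Many young women in their 20s plan to go abroad during their summer holidays.",
    "As yet we have not made any plans for the holidays.",
    "We are making plans for the holidays.",
]

pattern = re.compile(r"plan.*holiday", re.IGNORECASE)

for s in sentences:
    match = pattern.search(s)
    print(match.group(0))  # the stretch of text from "plan" to "holiday"

# Order still matters: "holiday" must come after "plan".
print(pattern.search("My wife and I agreed on a holiday plan."))  # None
```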

If you want to find sentences containing either grey or gray you can specify gr[ae]y and the search will allow either of the letters inside the [ ] to match. Similarly if you want to allow both the "rumour" and "rumor" spellings, use rumou?r and the search will allow the "u" to be missing.
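
A short Python sketch of these two patterns follows. The word lists are made up for the test, and each includes one misspelling to show what is rejected.

```python
import re

grey = re.compile(r"gr[ae]y")    # [ae] accepts exactly one of "a" or "e"
rumour = re.compile(r"rumou?r")  # u? accepts the "u" or its absence

print([w for w in ["grey", "gray", "groy"] if grey.search(w)])
print([w for w in ["rumour", "rumor", "rumr"] if rumour.search(w)])
```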

Also, if you want sentences containing either 教師 or 教授 you can specify 教師|教授.

You can even combine these features, and have search requests such as (教師.*教授)|(教授.*教師) (can you guess what it is asking?)
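
If you would rather see the answer than guess, the Python sketch below runs both patterns over a few made-up sentences (they are not taken from the examples file). Note that Python matches whole characters here, which is not necessarily how the server treats Japanese text.

```python
import re

# Hypothetical sentences, invented for this illustration.
samples = [
    "教師になった。",
    "教授に会った。",
    "教師が教授に会った。",
    "教授が教師に会った。",
    "学生です。",
]

either = re.compile("教師|教授")
both = re.compile("(教師.*教授)|(教授.*教師)")

print([s for s in samples if either.search(s)])  # sentences with at least one word
print([s for s in samples if both.search(s)])    # sentences with both, in either order
```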

All the examples above involve what are called regular expressions, also known as "regex"s. Regexs are sets of symbols and syntactic elements used to match patterns of text. They are the full expressions of which wild-cards are but a simple example. The reason they work with this WWWJDIC function is that the WWWJDIC server uses a regular expression library module to search the example sentence file. This module is a standard part of the library for the C programming language, in which the server is written.
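
The general idea of handing a pattern to a regex library and testing it against each line of a file can be sketched in Python as follows. The search_examples function and the sample lines are hypothetical; the server itself is written in C and its code is not reproduced here.

```python
import re


def search_examples(pattern, lines):
    """Return (matches, error).

    `matches` holds the lines in which `pattern` is found, ignoring case.
    If the library cannot compile the pattern, `matches` is empty and
    `error` carries the library's own message, passed on unchanged.
    """
    try:
        regex = re.compile(pattern, re.IGNORECASE)
    except re.error as exc:
        return [], str(exc)
    return [line for line in lines if regex.search(line)], None


lines = ["A grey cat.", "A Gray dog.", "A green bird."]  # made-up sample lines
print(search_examples("gr[ae]y", lines))  # two matching lines, no error
print(search_examples("gr[ae", lines))    # no matches, plus the library's complaint
```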

Regexs are used in many text editors and in utility programs like grep (Global Regular Expression Print), and are now key components of languages and platforms such as Perl, Tcl, Python, Java and .NET.

This is not the place to explain regexs in detail. For a quick overview, I suggest:

Closing Note

Not all regex features work with, or are even relevant to, this WWWJDIC function. I have described a few of the basic ones I think are likely to be useful. Feel free to experiment, and if you come up with some nifty regexs, please let me know. HOWEVER, please don't ask me why a particular regex didn't do what you expected. I'm no expert. (If you use a regular expression which cannot be understood by the software module, you will get an error message. Don't ask what it means - I am just passing on the message from the module.)

Also, remember that the regexs used here will not handle Japanese kanji and kana as single characters. Something like 合[気氣]道 will not work because it only specifies a single byte for the [気氣]. Using 合[気氣][気氣]道 will usually work, but you may get some false matches too. (Better support for this type of search may be available at some time in the future.)
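
The Python sketch below imitates this byte-level behaviour by encoding both the pattern and the text before matching. It assumes a two-byte encoding (EUC-JP is used here), which is what the doubled [気氣] suggests, but the encoding is an assumption and the byte_regex helper is hypothetical.

```python
import re

ENCODING = "euc_jp"  # assumption: each kanji occupies two bytes


def byte_regex(pattern):
    """Compile a pattern as a byte-oriented matcher would see it."""
    return re.compile(pattern.encode(ENCODING))


target = "合気道".encode(ENCODING)

one_class = byte_regex("合[気氣]道")
two_classes = byte_regex("合[気氣][気氣]道")

print(one_class.search(target) is not None)    # False: the class covers one byte only
print(two_classes.search(target) is not None)  # True: two classes cover both bytes

# A false match: the bytes of 気 in reverse order spell neither 気 nor 氣,
# yet each byte on its own is allowed by the class.
ki = "気".encode(ENCODING)
impostor = "合".encode(ENCODING) + ki[::-1] + "道".encode(ENCODING)
print(two_classes.search(impostor) is not None)  # True
```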

Jim Breen
April 2004