Jim's Work-in-progress list.

Some Suggested Projects.

Type Suggestion Suggested by/when Comments Progress
JMdict Replace the "g_lang" attribute with "xml:lang" I forget Probably a good idea. I'll do it when I go to V1.06 of the DTD Done.
JMdict Flag glosses which are not translational equivalents Francis Bond (July 06) Can see its advantage for MT systems. Not hard to do (flag dropped from EDICT version.) None yet.
JMdict/EDICT Proper online submission/edit system Jim and many others Yes, this is an important goal. DONE! (Live July 2010!)
JMdict/EDICT For nouns marked with "vs", indicate the verb form explicitly in the English Many people. Francis Bond is keen for this. Ideally should be done. A few entries, e.g. プレゼント, have it. Very little. Much can be done automatically, but the proof-reading, etc. will be huge.
JMdict/EDICT Flag gairaigo in the <etym/> field, and perhaps move the "(trans: ....)" there too. Jean-Luc Leger Good idea. Some movement on this. The new element has been introduced.
JMdict/EDICT More and consistent use of senses Stuart McGraw Indeed. I have just completed a large sense differentiation for gairaigo, based on stuff Jean-Luc sent. Depends on people pointing out changes needed. I fix up a few every day.
JMdict/EDICT A "counter" tag for words used as counters Stuart McGraw Quite do-able. There are about 60 counters there, so it's not too massive a task. Any comments on suitable tag? ct? ctr? Done.
JMdict/EDICT Nouns and other countable things to have cross ref to appropriate counters (with notes on usage.) Stuart McGraw A massive task. Totally dependent on available data. I know of one commercial (and unavailable) source. None yet.
JMdict/EDICT Tags that indicate words that can be used with "o-" / "go-" prefixes. Stuart McGraw Possible, but the data-entry would be massive, and it's all very subjective. Again, totally dependent on available data. None yet.
JMdict/EDICT Notes providing supplementary usage info. Stuart McGraw I prefer such material in adjunct linked files. Suggest jeKai be explored, as WWWJDIC links to its entries now.
JMdict/EDICT Orders of magnitude more cross ref's. Stuart McGraw You provide-em, I'll add-em. (Depends on people providing them.) I add them quite often.
JMdict/EDICT Check the newspaper rankings for mistakes based on common proper names Charles Kelly (Sep 06) Good idea. Certainly some mistakes there. Deleted a few: 藤本, 高木, etc. More checking needed.
JMdict/EDICT About 250 entries marked "adj" are not keiyoushi. Should be marked "adj-pn"? Jean-Luc (Sep 06) Agree. "adj-pn" is probably useful for all non-keiyoushi/keiyoudoushi. Will get around to it some time.
JMdict/EDICT Add okurigana variants to compound verbs, etc. (e.g. くり返す and 繰りかえす to 繰り返す) Charles Kelly (Sep 06) Agree. Would help the Translate Words in WWWJDIC. May be amenable to a programmed attack, e.g. form possible variants and test using Google/Yahoo. Non-常用漢字 would be prime candidates. I add them as I become aware of them. No concerted programme yet.
JMdict/EDICT Add more domain/field tags, and maybe markers equivalent to Wordnet synsets Various Agree. A big task. For Wordnet synsets a programmed approach may be possible, aligning the words from the glosses with Wordnet's words. None yet.
JMdict/EDICT Extend the "counting" entries by adding the Arabic numeral versions, e.g. 1匹 as well as 一匹, 1月 as well as 一月, etc. Charles Kelly (Oct 06) Worth doing, as it will help Rikaichan et al. Some entries like 一ヶ月, 一ケ月, 一か月, 一箇月 will get very busy. None yet.
JMdict/EDICT Write a set of instructions or an FAQ about rules/guidelines for new entries. Various (Nov 06) Good idea. Should be a reasonable priority. None yet.
JMdict/EDICT Do something about showing the various inflected forms of verbs like いらっしゃる Paul Blay (Oct 06) Could be something like an info page on jeKai, which would then get a link from WWWJDIC. None yet.
JMdict/EDICT Add synonyms and antonyms to the file Various (Nov 06) I'll do it as soon as someone provides a reliable list of synonyms and antonyms 8-)} None yet.
JMdict/EDICT Revise the XML tagging of xrefs, ants, etc. Various Yes, having .... as a general element is better. None yet. Think it should wait for the database.
EDICT Allow use of /, [ and ] within glosses, perhaps by using an escape character. Harold Abilock (May 07) High time that was addressed. /, [ and ] is quite possible. Big problem is with breaking legacy software. None yet. Prospect is rather daunting.
ENAMDICT Extend the romanization to include common variants, e.g. Sato and Satoh for Satou. Charles Kelly (Sep 06) Probably useful. Possibly could be done reliably by software. None yet.
ENAMDICT GPS coordinates for place names Stuart McGraw Dream on 8-)} If they were available, they could be added or linked in WWWJDIC.
ENAMDICT Links to Google Maps Michael Engel Could be done. May get around to it.
WWWJDIC Internationalized interface (e.g. Japanese, French, German, ..) Several people; most recently Paul Blay Partly implemented for Japanese. Framework can be extended for other languages. Needs volunteers. Nothing yet
WWWJDIC Example links by senses from EDICT entries Paul Blay Rather messy to do. None yet
WWWJDIC 'reminder' display of full dictionary entry at top of first examples list screen Paul Blay Shouldn't be too hard to do. Done.
WWWJDIC "Specimen examples" for each sense available for each EDICT entry Paul Blay Main job will be to identify and mark such sentences in their index line. Done. Depends on a special code in the Tanaka index.
WWWJDIC Facility to see and review recently added/amended entries Paul Blay (Aug 06) Good idea. Done. Available since Sep 06. See here.
WWWJDIC Extend Google links for verbs and adjectives to include inflected forms Charles Kelly (Oct 06) Good idea. I want to rethink the whole process of linking to other resources. The string of links on each entry is messy. Doing, say, the most common verb and adjective forms is possible. None yet.
WWWJDIC Replace the GA/LG/NA/etc. tags with dictionary names. Ben Bullock (Dec 06) Good idea. Not too hard. None yet.
WWWJDIC Make significant use of "accesskey" navigation as an aid to keyboard use (see: Antti Tuominen (Dec 06) I've read that page and the W3C docs. I can see the use of it, although since I use X-windows, I tend to operate WWWJDIC by mouse alone (when cutting and pasting.) None yet. I am pondering using a template system for WWWJDIC and maybe it will wait until then.
WWWJDIC For entries that are marked "uk" make the ALC link go from the hiragana version. Peter Maydell (Sep 07) Seems like a good idea. I'll think about it. None yet.
Tanaka Corpus Some markup indicating edit or correction Rene A good idea. Not easy, as many have been edited or proofread. None yet.
  1. JLPT levels
  2. White Rabbit card numbers
  3. kokuji entity (XML)
Aug 06
Good idea. Todd sent the WR and JLPT info from Max Hodges. Shouldn't take long. Not yet.
Add kan/go/tou/kan'you tags to the on readings Ben Bullock (Aug 06) Good idea. David Ranvig has provided the list from Koujien. I need to work out a strategy and implement it. Not yet.
Fix up the 47 DeRoo classification codes Ben Bullock (Oct 06) Yes, should be done. Not yet.
Mark the kokuji properly, instead of just a comment in the meaning field. Jean-Luc Leger (March 2005) Yes, should be done. Not yet. I'll try and get to it.
JMnedict Fix the multiple tags when there are several names in an entry, e.g. Gerri SMcG (June 07) Yes, should be done. Later.
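
As a rough illustration of the "programmed attack" mentioned in the okurigana-variants suggestion above (form possible variants, then test them), here is a minimal Python sketch of the variant-forming step only. It assumes the word has already been split into segments with a kana reading attached to each kanji segment; that segmentation, the function name okurigana_variants and the input format are all hypothetical and are not part of JMdict/EDICT. Testing the candidates against a search engine or corpus is left out.

```python
from itertools import product

def okurigana_variants(segments):
    """Return spelling variants of a word (hypothetical helper).

    `segments` is a list of (surface, reading) pairs. For a kanji segment,
    `reading` is its kana reading; for a segment that is already kana,
    `reading` is None. The segmentation is assumed to be supplied by hand
    or by some other tool.
    """
    # A kanji segment may be written as kanji or as kana; kana segments are fixed.
    choices = [
        (surface,) if reading is None else (surface, reading)
        for surface, reading in segments
    ]
    original = "".join(surface for surface, _ in segments)
    all_kana = "".join(reading or surface for surface, reading in segments)
    variants = []
    for combo in product(*choices):
        word = "".join(combo)
        # Skip the headword itself and the all-kana form (that is just the reading).
        if word not in (original, all_kana) and word not in variants:
            variants.append(word)
    return variants

# The example from the suggestion: 繰り返す
segments = [("繰", "く"), ("り", None), ("返", "かえ"), ("す", None)]
print(okurigana_variants(segments))
```

For 繰り返す this yields the two forms named in the suggestion, 繰りかえす and くり返す. Each candidate would still need checking against real usage before being added to an entry.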

Some suggested projects for people to work on.

A. A mimesis/onomatopoeia collection to go in EDICT/JMdict

While there are quite a lot of 擬音語 and 擬態語 in JMdict/EDICT already, the coverage is clearly not complete. It would be great if one or more people could take on the task of building up the number of entries for 擬音語 and 擬態語, and also marking the existing ones with a "{mim}" tag.

Some resources are:

  1. a file compiled a few years ago by Christopher Amis.
  5. Book: "A practical guide to Japanese-English onomatopoeia and mimesis" ISBN4590-007822-3, 北星堂書店, by 尾野秀一
  6. Book: 擬態語・擬音語分類用法辞典 A Thesaurus of Japanese Mimesis and Onomatopoeia: Usage by Categories by Andrew C. Chang
  7. Book: 和英擬音語・擬態語翻訳辞典(藤田孝・秋保慎一編、金星社)
  8. Book: "Jazz Up Your Japanese With Onomatopoeia" by Hiroko Fukuda (Kodansha International)

B. Expanding the 外来語 coverage in JMdict/EDICT

There are about 20,000 外来語 in JMdict/EDICT at present, but many more than that are used in Japanese. It would be good to expand them.

A good resource is Eijiro. I have extracted a list of about 30,000 カタカナ語 which are not currently in JMdict/EDICT. They need some work, e.g.

There are also heaps of printed カタカナ語辞典 and 新語辞典, as well as several online ones.
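
For anyone wanting to build a similar candidate list from another source, the following Python sketch shows one possible way to pick out カタカナ語 that are not yet in the dictionary. It is not the method used for the Eijiro list above; the function names and the sample words are made up for illustration, and it assumes the existing headwords are available as a plain list of strings.

```python
def is_katakana_word(word):
    """True if every character lies in the Unicode Katakana block (U+30A0-U+30FF).

    The block includes the long-vowel mark ー and the middle dot ・.
    """
    return bool(word) and all("\u30a0" <= ch <= "\u30ff" for ch in word)

def new_katakana_candidates(candidates, existing):
    """Return katakana-only words from `candidates` that are not in `existing`.

    Order of first appearance is kept and duplicates are dropped.
    """
    known = set(existing)
    seen = set()
    result = []
    for word in candidates:
        if is_katakana_word(word) and word not in known and word not in seen:
            seen.add(word)
            result.append(word)
    return result

# Illustrative data only: pretend プレゼント is already a headword.
existing = ["プレゼント"]
candidates = ["プレゼント", "インターネット", "辞書", "コーヒー", "インターネット"]
print(new_katakana_candidates(candidates, existing))
```

A list produced this way would still need the manual clean-up described above before anything is submitted.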


Last update: 2007-07-25