Jim's Work-in-progress list.
Some Suggested Projects.
Type | Suggestion | Suggested by/when | Comments | Progress |
JMdict | Replace the "g_lang" attribute with "xml:lang" | I forget | Probably a good idea. I'll do it when I go to V1.06 of the DTD | Done. |
JMdict | Flag glosses which are not translational equivalents | Francis Bond July 06 | Can see its advantage for MT systems. Not hard to do (flag dropped from EDICT version). | None yet. |
JMdict/EDICT | Proper online submission/edit system | Jim and many others | Yes, this is an important goal. | DONE! (Live July 2010!) |
JMdict/EDICT | For nouns marked with "vs", indicate the verb form explicitly in the English | Many people. Francis Bond is keen for this. | Ideally should be done. A few entries, e.g. プレゼント, have it. | Very little. Much can be done automatically, but the proof-reading, etc. will be huge. |
JMdict/EDICT | Flag gairaigo in the <etym/> field, and perhaps move the "(trans: ....)" there too. | Jean-Luc Leger | Good idea | Some movement on this. The new |
JMdict/EDICT | More and consistent use of senses | Stuart McGraw | Indeed. I have just completed a large sense differentiation for gairaigo, based on stuff Jean-Luc sent. | Depends on people pointing out changes needed. I fix up a few every day. |
JMdict/EDICT | A "counter" tag for words used as counters | Stuart McGraw | Quite do-able. There are about 60 counters there, so it's not too massive a task. Any comments on suitable tag? ct? ctr? | Done. |
JMdict/EDICT | Nouns and other countable things to have cross ref to appropriate counters (with notes on usage.) | Stuart McGraw | A massive task. Totally dependent on available data. I know of one commercial (and unavailable) source. | None yet. |
JMdict/EDICT | Tags that indicate words that can be used with "o-" / "go-" prefixes. | Stuart McGraw | Possible, but the data-entry would be massive, and it's all very subjective. Again, totally dependent on available data. | None yet. |
JMdict/EDICT | Notes providing supplementary usage info. | Stuart McGraw | I prefer such material in adjunct linked files. | Suggest jeKai be explored, as WWWJDIC links to its entries now. |
JMdict/EDICT | Orders of magnitude more cross ref's. | Stuart McGraw | You provide-em, I'll add-em. (Depends on people providing them.) | I add them quite often. |
JMdict/EDICT | Check the newspaper rankings for mistakes based on common proper names | Charles Kelly Sep 06 | Good idea. Certainly some mistakes there. | Deleted a few: 藤本, 高木, etc. More checking needed. |
JMdict/EDICT | About 250 entries marked "adj" are not keiyoushi. Should be marked "adj-pn"? | Jean-Luc Sep 06 | Agree. "adj-pn" is probably useful for all non-keiyoushi/keiyoudoushi | Will get around to it some time. |
JMdict/EDICT | Add okurigana variants to compound verbs, etc. (e.g. くり返す and 繰りかえす to 繰り返す) | Charles Kelly Sep 06 | Agree. Would help the Translate Words in WWWJDIC. May be amenable to a programmed attack, e.g. form possible variants and test using Google/Yahoo. Non-常用漢字 would be prime candidates. | I add them as I become aware of them. No concerted programme yet. |
JMdict/EDICT | Add more domain/field tags, and maybe markers equivalent to Wordnet synsets | Various | Agree. A big task. For Wordnet synsets a programmed approach may be possible, aligning the words from the glosses with Wordnet's words. | None yet. |
JMdict/EDICT | Extend the "counting" entries by adding the Arabic numeral versions, e.g. 1匹 as well as 一匹, 1月 as well as 一月, etc. | Charles Kelly Oct 06 | Worth doing, as it will help Rikaichan et al. Some entries like 一ヶ月, 一ケ月, 一か月, 一箇月 will get very busy | None yet. |
JMdict/EDICT | Write a set of instructions or an FAQ about rules/guidelines for new entries. | Various Nov 06 | Good idea. Should be a reasonable priority. | None yet. |
JMdict/EDICT | Do something about showing the various inflected forms of verbs like いらっしゃる | Paul Blay Oct 06 | Could be something like an info page on jeKai, which would then get a link from WWWJDIC | None yet. |
JMdict/EDICT | Add synonyms and antonyms to the file | Various Nov 06 | I'll do it as soon as someone provides a reliable list of synonyms and antonyms 8-)} | None yet. |
JMdict/EDICT | Revise the XML tagging of xrefs, ants, etc. | Various | Yes, having | None yet. Think it should wait for the database. |
EDICT | Allow use of /, [ and ] within glosses, perhaps by using an escape character. | Harold Abilock May 07 | High time that was addressed. /, [ and ] is quite possible. Big problem is with breaking legacy software. | None yet. Prospect is rather daunting. |
ENAMDICT | Extend the romanization to include common variants, e.g. Sato and Satoh for Satou. | Charles Kelly Sep 06 | Probably useful. Possibly could be done reliably by software. | None yet. |
ENAMDICT | GPS coordinates for place names | Stuart McGraw | Dream on 8-)} | If they were available, they could be added or linked in WWWJDIC. |
ENAMDICT | Links to Google Maps | Michael Engel | Could be done | May get around to it |
WWWJDIC | Internationalized interface (e.g. Japanese, French, German, ..) | Several people; most recently Paul Blay | Partly implemented for Japanese. Framework can be extended for other languages. Needs volunteers. | Nothing yet |
WWWJDIC | Example links by senses from EDICT entries | Paul Blay | Rather messy to do. | None yet |
WWWJDIC | 'reminder' display of full dictionary entry at top of first examples list screen | Paul Blay | Shouldn't be too hard to do. | Done. |
WWWJDIC | "Specimen examples" for each sense available for each EDICT entry | Paul Blay | Main job will be to identify and mark such sentences in their index line. | Done. Depends on a special code in the Tanaka index. |
WWWJDIC | Facility to see and review recently added/amended entries | Paul Blay Aug 06 | Good idea | Done. Available since Sep 06. See here. |
WWWJDIC | Extend Google links for verbs and adjectives to include inflected forms | Charles Kelly Oct 06 | Good idea. I want to rethink the whole process of linking to other resources. The string of links on each entry is messy. Doing, say, the most common verb and adjective forms is possible. | None yet. |
WWWJDIC | Replace the GA/LG/NA/etc. tags with dictionary names. | Ben Bullock Dec 06 | Good idea. Not too hard. | None yet. |
WWWJDIC | Make significant use of "accesskey" navigation as an aid to keyboard use (see: http://www.cs.tut.fi/~jkorpela/forms/accesskey.html) | Antti Tuominen Dec 06 | I've read that page and the W3C docs. I can see the use of it, although since I use X-windows, I tend to operate WWWJDIC by mouse alone (when cutting and pasting.) | None yet. I am pondering using a template system for WWWJDIC and maybe it will wait until then. |
WWWJDIC | For entries that are marked "uk" make the ALC link go from the hiragana version. | Peter Maydell Sep 07 | Seems like a good idea. I'll think about it. | None yet. |
Tanaka Corpus | Some markup indicating edit or correction | Rene | A good idea. Not easy, as many have been edited or proofread. | None yet. |
KANJIDIC KANJIDIC2 | Add: | Todd Aug 06 | Good idea. Todd sent the WR and JLPT info from Max Hodges. Shouldn't take long. | Not yet. |
KANJIDIC KANJIDIC2 | Add kan/go/tou/kan'you tags to the on readings | Ben Bullock Aug 06 | Good idea. David Ranvig has provided the list from Koujien. I need to work out a strategy and implement it. | Not yet. |
KANJIDIC KANJIDIC2 | Fix up the 47 DeRoo classification codes | Ben Bullock Oct 06 | Yes, should be done | Not yet. |
KANJIDIC KANJIDIC2 | Mark the kokuji properly, instead of just a comment in the meaning field. | Jean-Luc Leger March 2005 | Yes, should be done | Not yet. I'll try and get to it. |
JMnedict | Fix the multiple tags when there are several names in an entry, e.g. Gerri | SMcG June 07 | Yes, should be done | Later. |
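
The okurigana-variant suggestion in the table (くり返す and 繰りかえす for 繰り返す) mentions a programmed attack that first forms possible variants and then tests them. A minimal Python sketch of the variant-forming step only might look like the following. It assumes each headword has already been split into (surface, reading) segments; that input format, the function name okurigana_variants and the sample segmentation are hypothetical and not part of any existing JMdict tooling. Testing candidates against a search engine is left out.

```python
from itertools import product


def okurigana_variants(segments):
    """Return the spellings formed by writing each kanji segment either
    as kanji or as kana.

    segments: list of (surface, reading) pairs; for kana segments the
    surface and reading are the same string (assumed input format).
    The original spelling is excluded from the result.
    """
    # For every segment, collect the distinct ways it may be written.
    choices = []
    for surface, reading in segments:
        options = [surface]
        if reading != surface:
            options.append(reading)
        choices.append(options)

    original = "".join(surface for surface, _ in segments)
    variants = []
    for combo in product(*choices):
        spelling = "".join(combo)
        if spelling != original and spelling not in variants:
            variants.append(spelling)
    return variants


# Hypothetical segmentation of 繰り返す (reading くりかえす).
segments = [("繰", "く"), ("り", "り"), ("返", "かえ"), ("す", "す")]
for variant in okurigana_variants(segments):
    print(variant)
```

The all-kana spelling is generated as well, so every candidate would still need to be checked for actual usage before being added as a variant.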
Some suggested projects for people to work on.
While there are quite a lot of 擬音語 and 擬態語 in JMdict/EDICT already, the coverage is clearly not complete. It would be great if one or more people could take on the task of building up the number of entries for 擬音語 and 擬態語, and also marking the existing ones with a "{mim}" tag.
Some resources are:
There are about 20,000 外来語 in JMdict/EDICT at present, but many more than that are used in Japanese. It would be good to expand them.
A good resource is Eijiro. I have extracted a list of about 30,000 カタカナ語 which are not currently in JMdict/EDICT. They need some work, e.g.
There are also heaps of printed カタカナ語辞典 and 新語辞典, as well as several online ones.
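
Pulling out カタカナ語 that are not yet in JMdict/EDICT, as described above, amounts to filtering a candidate list against the existing headwords. The Python sketch below shows one way this could be done; it is not the script that produced the list mentioned above. The function name missing_katakana, the in-memory word lists and the rule for what counts as a katakana word are all assumptions made for illustration.

```python
import re

# Assumed definition of a katakana word: a katakana letter followed by
# any characters from the katakana block up to the long-vowel mark ー
# (this range also admits the middle dot ・).
KATAKANA_WORD = re.compile(r"[\u30A1-\u30FA][\u30A1-\u30FC]*")


def missing_katakana(candidates, existing_headwords):
    """Return candidates that are pure katakana and not already present
    in existing_headwords, preserving order and dropping duplicates."""
    known = set(existing_headwords)
    seen = set()
    missing = []
    for word in candidates:
        if word in known or word in seen:
            continue
        if KATAKANA_WORD.fullmatch(word):
            seen.add(word)
            missing.append(word)
    return missing


# Small made-up sample; real input would come from the dictionary files.
candidates = ["プレゼント", "コンピューター", "繰り返す", "コンピューター", "ブログ"]
existing = ["プレゼント"]
print(missing_katakana(candidates, existing))
```

A filter like this only finds headwords that are absent; the resulting entries would still need the manual work noted above before being added.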
Last update: 2007-07-25