ENAMDICT/JMnedict

Japanese Proper Names Dictionary Files

Copyright (C) 2013 The Electronic Dictionary Research and Development Group

Introduction

The ENAMDICT/JMnedict files contain Japanese proper names: place-names, surnames, given names, and (some) company names and product names. These were originally included in the EDICT file, along with the non-name entries. By late 1995, the name entries outnumbered the others and the file was becoming unmanageably large, so the decision was made to split it. The ENAMDICT file is the result of that split.

The JMnedict (Japanese Multilingual Named Entity Dictionary) file is simply the ENAMDICT file reformatted as an XML file in UTF-8 encoding. It also has a small number of names which use kanji from the JIS X 0212 character set.

Format

The format of the ENAMDICT file is the same as that of the EDICT file, and the EDICT documentation should be consulted for more information.

Most software which uses the EDICT file can also handle other files. However, some software, such as MacJDic, can only handle a single file. In such cases, users can concatenate EDICT and ENAMDICT to create a single file.
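The concatenation described above can be sketched in Python as follows. This is only an illustration: the function name and the file names in the usage note are hypothetical. The files are copied byte for byte, so no character encoding is assumed, and any header lines the files carry are copied through unchanged.

```python
def concatenate_dictionaries(source_paths, output_path, chunk_size=65536):
    """Append each source file to output_path, in the order given."""
    with open(output_path, "wb") as output:
        for source_path in source_paths:
            last_chunk = b""
            with open(source_path, "rb") as source:
                while True:
                    chunk = source.read(chunk_size)
                    if not chunk:
                        break
                    output.write(chunk)
                    last_chunk = chunk
            # Make sure the next file starts on a fresh line, so the last
            # entry of one file never runs into the first entry of the next.
            if last_chunk and not last_chunk.endswith(b"\n"):
                output.write(b"\n")
```

For example, concatenate_dictionaries(["edict", "enamdict"], "edict_plus_names") would write both files into a single file called edict_plus_names (all three names are placeholders).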

Note that the tagging of names changed with the release of ENAMDICT V97-001. The old (sur), (giv), etc. tags were replaced with (mostly) single-letter codes, without the old redundancies. The codes, as of release V2000-01, are:

s - surname (138,500)
p - place-name (99,500)
u - person name, either given or surname, as-yet unclassified (139,000) 
g - given name, as-yet not classified by sex (64,600)
f - female given name (106,300)
m - male given name (14,500)
h - full (usually family plus given) name of a particular person (30,500)
pr - product name (55)
c - company name (34)
st - stations (8,254)

In addition, a number of country-names are added in parentheses after place-names.
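As an illustration of how software might use the codes listed above, the following Python sketch maps each code to its description and separates a leading code group from the rest of a string. It assumes, and this is only an assumption, that the codes appear as a comma-separated group in parentheses at the start of the English part of an entry, for example "(s) Tanaka". The function name and the sample strings are hypothetical; the EDICT documentation remains the authority on the actual layout.

```python
import re

# Codes and descriptions as listed for release V2000-01.
NAME_TYPE_CODES = {
    "s": "surname",
    "p": "place-name",
    "u": "person name, either given or surname, as-yet unclassified",
    "g": "given name, as-yet not classified by sex",
    "f": "female given name",
    "m": "male given name",
    "h": "full (usually family plus given) name of a particular person",
    "pr": "product name",
    "c": "company name",
    "st": "stations",
}

# Assumed layout: "(code)" or "(code,code,...)" at the very start.
_LEADING_CODES = re.compile(r"^\(([a-z]+(?:,[a-z]+)*)\)\s*")


def split_name_codes(text):
    """Return (codes, remainder); codes is empty if no known group leads."""
    match = _LEADING_CODES.match(text)
    if not match:
        return [], text
    codes = match.group(1).split(",")
    # Leave the text untouched if any code is not in the table, since the
    # parenthesised group may then be something other than name codes.
    if not all(code in NAME_TYPE_CODES for code in codes):
        return [], text
    return codes, text[match.end():]
```

A caller could then look up NAME_TYPE_CODES[code] for each returned code to display the full description.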

The JMnedict is structured according to its DTD, which is at the front of the file.
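Because the DTD sits at the front of the file, it can be inspected without reading the whole dictionary. The Python sketch below is a rough illustration under stated assumptions: the DTD is taken to be an internal subset that starts on a line containing "<!DOCTYPE" and ends on the first following line that ends with "]>". The function name and the max_lines limit are hypothetical.

```python
import gzip


def read_leading_dtd(path, max_lines=2000):
    """Return the DOCTYPE block at the front of the file, or None."""
    # Handle both the compressed download and an unpacked copy.
    opener = gzip.open if path.endswith(".gz") else open
    collected = []
    inside = False
    with opener(path, "rt", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= max_lines:
                break  # give up rather than scan the whole dictionary
            if not inside and "<!DOCTYPE" in line:
                inside = True
            if inside:
                collected.append(line)
                # Assumption: the internal subset closes with "]>" at the
                # end of a line.
                if line.strip().endswith("]>"):
                    return "".join(collected)
    return None
```

Calling read_leading_dtd("JMnedict.xml.gz") on a downloaded copy would return the DTD text for reading, or None if no complete DOCTYPE block is found within the first max_lines lines.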

Downloads

The files can be downloaded from the Monash FTP site: enamdict.gz and JMnedict.xml.gz

Jim Breen
jwb@csse.monash.edu.au
May 2013

APPENDIX

ENAMDICT COPYRIGHT STATEMENT

In March 2000, James William Breen assigned ownership of the copyright of the dictionary files assembled, coordinated and edited by him to The Electronic Dictionary Research and Development Group.

Information about the formal usage arrangements for ENAMDICT can be found on the Group's WWW page.

In summary, ENAMDICT can be freely used provided satisfactory acknowledgement is made, and a number of other conditions are met.