MONASH UNIVERSITY
SCHOOL OF COMPUTER SCIENCE AND SOFTWARE ENGINEERING
ELECTRONIC DICTIONARY RESEARCH AND DEVELOPMENT GROUP
GENERAL DICTIONARY LICENCE STATEMENT
Copyright (C) 2004 The Electronic Dictionary Research and Development Group, Monash University.
EDRDG Home Page
1. Introduction
In March 2000, James William Breen assigned ownership of the copyright of the dictionary files assembled, coordinated and edited by him to The Electronic Dictionary Research and Development Group at Monash University (hereafter "the Group"), on the understanding that the Group will foster the development of the dictionary files, and will utilize all monies received for use of the files for the further development of the files, and for research into computer lexicography and electronic dictionaries.
This document outlines the licence arrangement put in place by The Group for usage of the files. It replaces all previous copyright and licence statements applying to the files.
2. Application
This licence statement and copyright notice applies to the following dictionary files, the associated documentation files, and any data files which are derived from them.
* JMDICT - Japanese-Multilingual Dictionary File - the Japanese and English components (the German, French and Russian translational equivalents are covered by separate copyright held by the compilers of that material.)
* EDICT - Japanese-English Electronic DICTionary File
* ENAMDICT - Japanese Names File
* COMPDIC - Japanese-English Computing and Telecommunications Terminology File
* KANJIDIC2 - File of Information about the Kanji in JIS X 0208, JIS X 0212 and JIS X 0213 in XML format.
* KANJIDIC - File of Information about the 6,355 Kanji in the JIS X 0208 Standard (special conditions apply)
* KANJD212 - File of Information about the 5,801 Supplementary Kanji in the JIS X 0212 Standard
* EDICT-R - romanized version of the EDICT file. (NB: this file has been withdrawn from circulation, and all sites carrying it are requested to remove their copies.)
Copyright over the documents covered by this statement is held by James William BREEN and The Electronic Dictionary Research and Development Group at Monash University.
3. Usage, Copying and Distribution Permission and Restrictions
Any person or organization in possession of a copy of any of the files covered by this statement, whether they have received the copy via free distribution or purchase:
1. must as part of the receiving of the copy undertake to be bound by all the conditions in this document relating to the distribution or usage of the files.
2. must undertake not to assert copyright over any portion of the files.
3. may use the file for personal purposes such as to assist with reading texts, research, translation services, etc.
4. may supply extracts or small portions of the files to other persons or organizations in the form of written documents, electronic mail, etc.
5. may make and distribute verbatim copies of these files provided the full documentation of the files and this copyright notice and permission notice are distributed with all copies.
6. may place copies of these files on WWW, ftp and equivalent servers for subsequent distribution provided the full documentation of the files and this copyright notice and permission notice are also made available on the servers, and are given equivalent notification to potential down-loaders of the files. For WWW distribution, there must be links to either local copies of the documentation and licence files or to the locations of the files at Monash University, and acknowledgement must be made of the source of the files.
7. may make and distribute extracts or subsets of the files, or files in other formats and codings containing material selected from the files, under the same conditions applying to verbatim copies. Where a subset of a file is being made, either by reducing the number of entries or by reducing the amount of information in entries or both, the nature of the subset must be made clear in documentation accompanying the distribution of the subset.
8. may make and distribute modified versions of the files, or works derived from the files, under the same conditions applying to verbatim copies. Where a modified version or derived work is being made and distributed, the nature of the modification and derivation must be made clear in documentation accompanying the distribution of the work. (The author of the modified version or derived work is encouraged to make any additional information available for inclusion in later versions of the file(s).)
9. must either make every endeavour to ensure that the versions of the files they distribute are the most recent available, or must make the version and date clear and prominent in their documentation, WWW page, etc. and supply information as to where and how the most recent version may be obtained.
10. may translate elements of the files into other languages, and make and distribute copies of those translations under the same conditions applying to verbatim copies.
11. may use these files as part of, or in association with a software package or system. Full acknowledgement of the source of the files must be made both in the promotional material, WWW pages and software documentation, and copies of the documentation of the files and of this licence statement must be included in the distribution. Where the files play a major part in the package, e.g. in the case of the package being a dictionary system based on the files, prominent reference to the source and status of the files must be made on any GUI screens, etc. associated with using the files. In the case of the EDICT, JMdict and KANJIDIC files, the following URLs must be used or quoted:
* http://www.csse.monash.edu.au/~jwb/edict.html
* http://www.csse.monash.edu.au/~jwb/jmdict.html
* http://www.csse.monash.edu.au/~jwb/kanjidic.html
(See this page for a sample of possible acknowledgement text.)
12. may publish all or part of the files on paper or digital media such as CD-ROM, provided clear and prominent acknowledgement of the source is made in the publication.
Note that in all cases, the addition of material to the files or to extracts from the files does not remove or in any way diminish the Group's copyright over the files.
Note also that there is no restriction placed on commercial use of the files. Where use of the files results in a financial return to the user, it is suggested that the user make a donation to the Group to assist with the continued development of the files.
4. Warranty and Liability
While every effort has been made to ensure the accuracy of the information in the files, it is possible that errors may still be included. The files are made available without any warranty whatsoever as to their accuracy or suitability for a particular application.
Any individual or organization making use of the files must agree:
1. to assume all liability for the use or misuse of the files, and must agree not to hold Monash University or the Group liable for any actions or events resulting from use of the files.
2. to refrain from bringing action or suit or claim against Monash University or the Group on the basis of the use of the files, or any information included in the files.
3. to indemnify Monash University or the Group in the case of action by a third party on the basis of the use of the files, or any information included in the files.
5. Copyright
Every effort has been made in the compilation of these files to ensure that the copyright of other compilers of dictionaries and lexicographic material has not been infringed. The Group asserts its intention to rectify immediately any breach of copyright brought to its attention.
Any individual or organization in possession of copies of the files, upon becoming aware that a possible copyright infringement may be present in the files, must undertake to contact the Group immediately with details of the possible infringement.
6. Prior Permission
All permissions for use of the files granted by James William Breen prior to March 2000 will be honoured and maintained; however, the placing of the KANJD212 and EDICTH files under the GNU GPL has been withdrawn as of 25 March 2000.
7. Special Conditions for the KANJIDIC File
In addition to licensing arrangements described above, the following additional conditions apply to the KANJIDIC file.
The following people have granted permission for material for which they hold copyright to be included in the files, and distributed under the above conditions, while retaining their copyright over that material:
Jack HALPERN: The SKIP codes and Frequency codes in the KANJIDIC file.
With regard to the Frequency codes, Mr Halpern has stated as follows: "The commercial utilization of the frequency numbers is prohibited without written permission from Jack Halpern. Use by individuals and small groups for reference and research purposes is permitted, on condition that acknowledgement of the source and this notice are included."
With regard to the SKIP codes, Mr Halpern draws your attention to the statement he has prepared on the matter.
Christian WITTERN and Koichi YASUOKA: The Pinyin information in the KANJIDIC file.
Urs APP: the Four Corner codes and the Morohashi information in the KANJIDIC file.
Mark SPAHN and Wolfgang HADAMITZKY: the kanji descriptors from their dictionary.
Charles MULLER: the Korean readings.
Joseph DE ROO: the De Roo codes.
8. Enquiries
All enquiries to:
The Electronic Dictionary Research and Development Group
(Attn: Assoc. Prof. Jim Breen)
School of Computer Science and Software Engineering
Monash University
CLAYTON VIC 3168
AUSTRALIA
(jwb@csse.monash.edu.au)
KanjiDic in EPWING format - Version 1.7.4
This is a conversion of Jim Breen's KanjiDic [1] (6,355 JIS X 0208-1990 kanji) to JIS X 4081 format (an EPWING subset [2]) by Hannes Loeffler (hal@hloeffler.info). The code of the conversion script is based on work by Kazuhiko Shiozaki [3] using the FreePWING library [4]. German version by Hans-Joerg Bibiko [5], Spanish version (2,280 kanji translated, English otherwise) by Francisco Gutierrez, and Portuguese version (the 1,945 Jouyou kanji translated) by Wladimir Mendes de Carvalho. The kanji gaiji have been created from the jiskan16-1990 BDF font. The large kanji bitmaps have been created from the Kochi Mincho substitute font.
The dictionary may be searched by kanji, kana, translation, Chinese and Korean readings, kanji dictionary indexes, and JIS and Unicode text encodings (see [INFORMATION FIELDS](M2)). Conditional search (keyword search) is available for all readings together with [component elements](M5) (but see notice below), and keys starting with B, C, G, N, P, S and DR (see [INFORMATION FIELDS](M2)).
The readings are in the order On-Yomi (katakana), Kun-Yomi (hiragana), Nanori (hiragana) marked with N, radical name marked with R, Chinese reading (pinyin) marked with C, and Korean reading (romanized) marked with K. The white down triangle marks kanji variants, the white circle marks the dictionary codes (see [INFORMATION FIELDS](M2)), and the white star the kanji components.
Please note that the [249 radicals/component elements](M5) identify kanji by the visual appearance of their elements, not by the [214 classical radicals](M4). Some of them cannot be represented in JIS X 0208 (*,#). These radicals/component elements are displayed as gaiji, but searches must be done according to the following table.
(*) radical is top part: Ф Ϸ
radical is right part: ˮ ũ
radical is bottom part: ۿ
radical is left part: ˻ ٩
radical is top-left part:
radical is bottom-left part:
(#) only visual representation, real radical is different character:
(fullwidth vertical line) and katakana
Search examples (search keys in quotes):
by kanji: ""
by reading: "", "", "jian3" (Chinese), "gweon" (Korean)
by stroke count: "S11"
by frequency: "F350"
by Jouyou level: "G4"
by encoding: "366B" (JIS), "U77e9" (Unicode)
by dictionary/query code: "N793", "H3301", "I2q5.3", "MP11.0024", etc.
by keyword search: " S12", " cao2", " B18", " g6 s8 yan4", etc.
by component elements (keyword search): " ", " ", etc.
---
[1] http://www.csse.monash.edu.au/~jwb/kanjidic.html
[2] http://www.epwing.or.jp/
[3] http://openlab.ring.gr.jp/edict/
[4] http://www.sra.co.jp/people/m-kasahr/freepwing/
[5] http://www.bibiko.de/kanji/
K A N J I D I C
===============
Copyright (C) 2000 The Electronic Dictionary Research and Development
Group, Monash University.
CONTENTS: INTRODUCTION
CONTENTS & FORMAT
INFORMATION FIELDS
CURRENT USAGE
SUPPORT
TOO MUCH INFORMATION?
HISTORY
LICENCE STATEMENT AND COPYRIGHT NOTICE
APPENDIX A - JIS CODES
APPENDIX B - UNICODE
APPENDIX C - SKIP CODES
APPENDIX D - AN OVERVIEW OF THE FOUR CORNER CODING SYSTEM
APPENDIX E - RADICAL AND STROKE COUNTING RULES
APPENDIX F - CONDITIONS FOR USING SKIP DATA
APPENDIX G - DE ROO CODES
INTRODUCTION
The KANJIDIC file contains comprehensive information about Japanese kanji. It
is a text file currently 6,355 lines long, with one line for each kanji in
the two levels of the characters specified in the JIS X 0208-1990 set. (For
information about this set, see Appendix A.)
The file contains a mixture of ASCII characters and kana/kanji encoded using
the EUC (Extended Unix Code) coding.
Attention is drawn to the KANJIDIC LICENCE STATEMENT AND COPYRIGHT NOTICE
included below in this document.
A similar file, KANJD212, is available for the 5,801 supplementary kanji in
the JIS X 0212-1990 set.
CONTENTS & FORMAT
The first part of each line is of a fixed format, indicating which character
the line is for, while the rest is more free-format.
The first two bytes are the kanji itself. There is then a space, the 4-byte
ASCII representation of the hexadecimal coding of the two-byte JIS encoding,
and another space.
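The relationship between the leading kanji bytes and the JIS field can be sketched in Python. This is a minimal illustration, not part of the KANJIDIC tools: the function names are hypothetical, and the sample line is hand-made to follow the layout described above. It relies on the fact that EUC-JP stores each JIS X 0208 byte with its high bit set.

```python
def jis_hex_from_euc(kanji: str) -> str:
    """Return the 4-digit JIS hex code of a JIS X 0208 character (hypothetical helper)."""
    raw = kanji.encode("euc_jp")
    # JIS X 0208 characters occupy exactly two bytes in EUC-JP, both >= 0xA1.
    if len(raw) != 2 or raw[0] < 0xA1:
        raise ValueError("not a two-byte JIS X 0208 character in EUC-JP")
    # Clearing the high bit of each EUC byte recovers the JIS code.
    return "".join(f"{b & 0x7F:02X}" for b in raw)


def header_is_consistent(line_bytes: bytes) -> bool:
    """Check the fixed-format start of a raw line: kanji, space, JIS hex, space."""
    kanji = line_bytes[:2].decode("euc_jp")
    jis_field = line_bytes[3:7].decode("ascii")
    return (
        line_bytes[2:3] == b" "
        and line_bytes[7:8] == b" "
        and jis_hex_from_euc(kanji) == jis_field.upper()
    )


# Hand-made sample in the documented layout (illustrative only).
SAMPLE_BYTES = "亜 3021 U4e9c ".encode("euc_jp")
```

A check such as `header_is_consistent(SAMPLE_BYTES)` could be used to sanity-test a file after it has been converted between encodings.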
The rest of the line is composed of a combination of three kinds of fields
(which may be in any order and interspersed):
(a) information fields, beginning with an identifying letter and ending with
a space. See below for more information about these fields.
(b) readings (with '-' to indicate prefixes/suffixes, and '.' to separate a
reading from its okurigana). ON-yomi are in katakana and KUN-yomi are in
hiragana. There may be several classes of reading fields, with ordinary
readings first, followed by members of the other classes, if any. The current
other classes, and their tagging, are:
(i) where the kanji has special "nanori" (i.e. name) readings,
these are preceded by the marker "T1";
(ii) where the kanji is a radical, and the radical name is not already
a reading, the radical name is preceded by the marker "T2".
(Other Tn classes may be created at a later date.)
(c) English meanings. Each such field begins with an open brace '{' and ends
at the next close brace '}'.
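The line layout described above can be split into its three kinds of fields with a short Python sketch. It assumes the file has already been decoded from EUC-JP into text, and it treats any all-ASCII token as an information field and any token containing kana as a reading; that heuristic, the `Entry` class and the sample line are assumptions made for illustration, not part of KANJIDIC itself.

```python
import re
from dataclasses import dataclass, field

MEANING_RE = re.compile(r"\{([^}]*)\}")   # English meanings: {...}
CLASS_MARKER_RE = re.compile(r"T\d+")     # reading-class markers: T1, T2, ...


@dataclass
class Entry:
    kanji: str
    jis: str
    info: list = field(default_factory=list)       # information fields, in file order
    readings: dict = field(default_factory=dict)   # reading class -> list of readings
    meanings: list = field(default_factory=list)


def parse_line(line: str) -> Entry:
    """Parse one decoded KANJIDIC line (hypothetical helper)."""
    kanji, jis, rest = line.rstrip("\n").split(" ", 2)
    entry = Entry(kanji, jis)
    # Pull out the brace-delimited meanings first, since they may contain spaces.
    entry.meanings = MEANING_RE.findall(rest)
    rest = MEANING_RE.sub(" ", rest)
    reading_class = "ordinary"
    for token in rest.split():
        if CLASS_MARKER_RE.fullmatch(token):
            reading_class = token        # later readings belong to this class
        elif token.isascii():
            entry.info.append(token)     # e.g. S7, G8, P4-7-1
        else:
            entry.readings.setdefault(reading_class, []).append(token)
    return entry


# Hand-made sample line in the documented layout (values are illustrative).
SAMPLE = "亜 3021 U4e9c B1 C7 G8 S7 F1509 N43 P4-7-1 Yya4 Wa ア つ.ぐ T1 や つぎ つぐ {Asia} {rank next} {come after} "
```

Because fields may appear in any order, the sketch makes no assumption about position beyond the fixed kanji and JIS prefix.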
INFORMATION FIELDS
There are currently a variety of predefined fields (programs using KANJIDIC
should not make any assumptions about the presence or absence of any of these
fields, as KANJIDIC is certain to be extended in the future):
B -- the radical (Bushu) number. There is one per entry. As
far as possible, this is the radical number used in the Nelson "New
Japanese-English Character Dictionary". Where the classical or
historical radical number differs from this, it is present as a
separate C entry.
C -- the historical or classical radical number, as recorded
in the KangXi Zidian (where this differs from the B entry.) There
will be at most one of these.
F -- the frequency-of-use ranking. At most one per line. The
2,135 most-used characters have a ranking; those characters that lack
this field are not ranked. The frequency is a number from 1 to 2,135
that expresses the relative frequency of occurrence of a character in
modern Japanese. The data is based on statistics published by The
National Language Research Institute (Tokyo), interpreted and adapted
by Jack Halpern in a manner to make it useful to the learner. The
data is derived from the New Japanese-English Character Dictionary
(Kenkyusha, Tokyo 1990; NTC, Chicago 1993). The commercial
utilization of the frequency numbers is prohibited without written
permission from Jack Halpern. Use by individuals and small groups
for reference and research purposes is permitted, on condition that
acknowledgment of the source and this notice are included.
G -- the Jouyou grade level. At most one per line. G1 through
G6 indicate Jouyou grades 1-6. G8 indicates general-use characters.
G9 indicates Jinmeiyou ("for use in names") characters. If not
present, it is a kanji outside these categories.
H -- the index number in the New Japanese-English Character
Dictionary, edited by Jack Halpern. At most one allowed per line.
If not present, the character is not in Halpern.
N -- the index number in the Modern Reader's Japanese-English
Character Dictionary, edited by Andrew Nelson. At most one allowed
per line. If not present, the character is not in Nelson, or is
considered to be a non-standard version, in which case it may have a
cross-reference code in the form: XNnnnn. (Note that many kanji
currently used are what Nelson described as "non-standard" forms or
glyphs.)
V -- the index number in The New Nelson Japanese-English
Character Dictionary, edited by John Haig.
D -- the "D" codes will be progressively used for dictionary
based codes.
(a) DRnnnn - these are the codes developed by Father Joseph De Roo,
and published in his book "2001 Kanji" (Bojinsha). Fr De Roo has
given his permission for these codes to be included.
P -- the SKIP pattern code. The code is of the form
"Pn-n-n". The System of Kanji Indexing by Patterns
(SKIP) is a scheme for the classification and rapid retrieval of
Chinese characters on the basis of geometrical patterns. Developed
by Jack Halpern, it first appeared in the New Japanese-English
Character Dictionary (Kenkyusha, Tokyo 1990; NTC, Chicago 1993), and
is being used in a series of dictionaries and learning tools called
KIT (Kanji Integrated Tools). SKIP is protected by copyright,
copyleft and patent laws. The commercial utilization of SKIP in any
form is strictly forbidden without the written permission of Jack
Halpern, the copyright holder (jhalpern@cc.win.or.jp). (A brief
summary of the method is in Appendix C. See Appendix E for some of
the rules applied when counting strokes in some of the radicals.)
S -- the stroke count. At least one per line. If more than
one, the first is considered the accepted count, while subsequent
ones are common miscounts. (See Appendix E for some of the rules
applied when counting strokes in some of the radicals.)
U -- the Unicode encoding of the kanji. See Appendix B for
further information on this code. There is exactly one per line.
I -- the index codes in the reference books by Spahn &
Hadamitzky. These codes take two forms:
(a) for The Kanji Dictionary (Tuttle 1996), they are in the form
nxnn.n, e.g. 3k11.2, where the kanji has 3
strokes in the identifying radical, it is radical "k" in the S&H
classification system, there are 11 other strokes, and it is the 2nd
kanji in the 3k11 sequence. I am very grateful to Mark Spahn for
providing the (almost) full list of these descriptor codes for the
kanji in this file. At the time of writing some 800 kanji in the
file lack the SH descriptor. This is because the book used a
different glyph as the primary kanji. The gaps are gradually being
filled in. Where the JIS X 0208 glyph is the second kanji for a
particular descriptor code, it has a "-2" appended to the code.
(b) for the Kanji & Kana book (Tuttle), they are in the form
INnnnn, where nnnn is the number of the 1,945 kanji referenced in
that book.
Qnnnn.n -- the "Four Corner" code for that kanji. This is a code
invented by Wang Chen in 1928; it has since been widely used for
dictionaries in China and Japan. In some cases there are two of these
codes, as the system can be a little ambiguous, and Morohashi has some kanji
coded differently from their traditional Chinese codes. See Appendix
D for an overview of the Four Corner System. Christian Wittern,
who passed on these codes, comments that they are in need of
proof-reading and thus users are advised to be cautious using the
codes for serious scholarship.
MNnnnnnnn and MPnn.nnnn -- the index number and volume.page
respectively of the kanji in the 13-volume Morohashi "Daikanwajiten".
In the MNnnn field, a terminal `P`, e.g. MN4879P, indicates that it
is 4879' in the original. In some 500 cases, the number is terminated
with an `X`, to indicate that the kanji in Morohashi has a close, but
not identical, glyph to the form in the JIS X 0208 standard.
Ennnn -- the index number used in "A Guide To Remembering Japanese
Characters" by Kenneth G. Henshall. There are 1945 kanji with these
numbers (i.e. the Jouyou subset.)
Knnnn -- the index number in the Gakken Kanji Dictionary ("A New
Dictionary of Kanji Usage"). Some of the numbers relate to the list
at the back of the book, jouyou kanji not contained in the
dictionary, and various historical tables at the end.
Lnnnn -- the index number used in "Remembering The Kanji" by James
Heisig.
Onnnn -- the index number in "Japanese Names", by P.G. O'Neill.
(Weatherhill, 1972) (A warning: some of the numbers end with 'A'. This
is how they appear in the book; it is not a problem with the file.)
Wxxxx -- the romanized form of the Korean reading(s) of the kanji.
Most of these kanji have one Korean reading, a few have two or more.
The readings are in the (Republic of Korea) Ministry of Education
style of romanization.
Yxxxxx -- the "Pinyin" of each kanji, i.e. the (Mandarin or Beijing)
Chinese romanization. About 6,000 of the kanji have these. Obviously
most of the native Japanese kokuji do not have Pinyin, however at least
one does as it was taken into Chinese at a later date.
Xxxxxxx -- a cross-reference code. An entry of, say, XN1234 will mean
that the user is referred to the kanji with the (unique) Nelson index
of 1234. XJ0xxxx and XJ1xxxx are cross-references to the kanji with
the JIS hexadecimal code of xxxx. The `0' means the reference is to a
JIS X 0208 kanji, and the `1' references a JIS X 0212 kanji.
Zxxxxxx -- a mis-classification code. It means that this kanji is
sometimes mis-classified as having the xxxxxx coding. In the case of
the SKIP classifications, an extra letter code is used to indicate
the type of mis-classification. ZPPn-n-n, ZSPn-n-n and ZBPn-n-n
indicate mis-classification according to position, stroke-count and
both position and stroke-count. (ZRPn-n-n codes are where Jim Breen &
Jack Halpern are having a [hopefully temporary] disagreement over the
number of strokes.)
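Three of the conventions above (the first S field being the accepted stroke count, the XJ0/XJ1 and XN cross-references, and the SKIP mis-classification prefixes) can be interpreted as in the following Python sketch. The function names and return shapes are hypothetical; only the field syntax comes from the descriptions above.

```python
import re


def stroke_counts(info_fields):
    """Return (accepted_count, common_miscounts) from a list of information fields."""
    counts = [int(f[1:]) for f in info_fields if re.fullmatch(r"S\d+", f)]
    if not counts:
        raise ValueError("no S field found; at least one is expected per line")
    # The first S field is the accepted count; any others are common miscounts.
    return counts[0], counts[1:]


def describe_xref(xref):
    """Classify a cross-reference field such as XJ05035 or XN1234."""
    m = re.fullmatch(r"XJ([01])([0-9A-Fa-f]{4})", xref)
    if m:
        standard = "JIS X 0208" if m.group(1) == "0" else "JIS X 0212"
        return standard, m.group(2).upper()
    if xref.startswith("XN"):
        return "Nelson", xref[2:]
    return "other", xref[1:]


# SKIP mis-classification prefixes as described in the Z field entry.
SKIP_MISCLASS = {
    "ZPP": "position",
    "ZSP": "stroke count",
    "ZBP": "position and stroke count",
    "ZRP": "disputed stroke count",
}


def describe_misclass(z_field):
    """Return (kind, skip_code) for a SKIP mis-classification field, else None."""
    kind = SKIP_MISCLASS.get(z_field[:3])
    return (kind, z_field[3:]) if kind else None
```

Fields that the sketch does not recognise are passed through or reported as `None`, in keeping with the advice not to assume which fields are present.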
If the final field of a line is not an English field, there is a final space.
Each reading and information field is therefore bracketed by space characters
(which makes it convenient for searches using programs like "grep".)
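The same space-bracketing idea can be used from Python instead of grep. This is a minimal sketch with hypothetical function names and hand-made sample lines; it assumes the lines have already been decoded to text.

```python
def has_field(line: str, wanted: str) -> bool:
    """True if `wanted` occurs as a whole space-bracketed field or reading."""
    # Every field after the header is preceded and followed by a space,
    # so padding the key avoids partial matches (S1 must not match S11).
    return f" {wanted} " in line


def find_lines(lines, wanted):
    """Return the lines that contain the given field or reading."""
    return [line for line in lines if has_field(line, wanted)]


# Hand-made sample lines (illustrative values only).
SAMPLE_LINES = [
    "亜 3021 U4e9c S7 G8 ア つ.ぐ {Asia} ",
    "唖 3022 U5516 S10 S11 ア おし ",
]
```

One caveat of this plain substring test is that it would also match a key that happens to appear, surrounded by spaces, inside an English meaning.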
As far as possible all entries will have their yomikata and readings
attached, even if they are a recognized variant of another kanji. This is to
facilitate electronic searches using these fields as keys, and should not be
taken as a recommendation to use such obscure kanji.
CURRENT USAGE
KANJIDIC is used now to build the "kinfo.dat" file which is used by JDIC and
JREADER, and by Stephen Chung's JWP. "kinfo.dat" contains the identical
information, but in a compressed form and in a structure suitable for fast
indexed access.
KANJIDIC is also used in the XJDIC and MacJDic dictionary programs, and a
growing number of other programs such as KDRILL and KDIC.
SUPPORT
KANJIDIC was originally compiled, and is maintained by:
Jim Breen
(jwb@csse.monash.edu.au)
School of Computer Science & Software Engineering
Monash University, Victoria, Australia
If you have suggested changes, send diffs [not complete files] with
corrections to him.
TOO MUCH INFORMATION?
KANJIDIC is now rather large, and has information in it which is not much use
for people who are not studying and researching Japanese orthography. It is
still appropriate to maintain it as a useful freely-available compendium of
such information.
For people who only wish to use a subset of the information in KANJIDIC,
there is a program "kdfilt.c", also available as kdfilt.exe for MS-DOS, which
will strip out unwanted fields. Dan Crevier has also released a program
(kanjidicSplit) which does the same for MacJDic users. (For users of the JDIC
program, the KANJDFIX.EXE utility also strips out unwanted fields prior to
building the KINFO.DAT file.)
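For readers without access to those utilities, the idea of stripping unwanted fields can be sketched in a few lines of Python. This is not a port of kdfilt.c, kanjidicSplit or KANJDFIX.EXE, whose exact behaviour is not described here; the function name and the prefix-based selection are assumptions for illustration.

```python
import re

# A token is either a brace-delimited meaning or a run of non-space characters.
TOKEN_RE = re.compile(r"\{[^}]*\}|\S+")


def strip_fields(line: str, unwanted_prefixes) -> str:
    """Drop information fields whose code starts with one of the given prefixes."""
    tokens = TOKEN_RE.findall(line)
    kept = tokens[:2]                      # always keep the kanji and its JIS code
    prefixes = tuple(unwanted_prefixes)
    for tok in tokens[2:]:
        is_info = (
            tok.isascii()                  # readings contain kana, so are not ASCII
            and not tok.startswith("{")    # meanings are kept
            and not re.fullmatch(r"T\d+", tok)  # reading-class markers are kept
        )
        if is_info and tok.startswith(prefixes):
            continue
        kept.append(tok)
    out = " ".join(kept)
    # Keep the convention that a line not ending in a meaning ends with a space.
    return out if out.endswith("}") else out + " "
```

Note that a single-letter prefix removes every field beginning with that letter, so passing "M" drops both the MN and MP Morohashi fields.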
HISTORY
(some comments by Jim Breen)
KANJIDIC began as two files: jis1detl.lst and jis2detl.lst, which were later
merged into a single file.
The first file was compiled initially from the file "kinfo.dat" supplied by
Stephen Chung, who in turn compiled his file from a file prepared by Mike
Erickson. I originally added about 1900 "meanings" by James Heisig keyed in
by Kevin Moore from the book "Remembering The Kanji". I later added the
meanings from Rik Smoody's files, compiled when he was working for Sony in
Japan. These appear to have been based on Nelson.
The second file was compiled from a complete JIS2 list with Bushu and stroke
counts kindly supplied to me by Jon Crossley, to which I added Nelson
numbers, yomikata and meanings extracted from Rik Smoody's file.
Theresa Martin was an early assister with this file, particularly with
tracking down and correcting many mistranscribed yomikata (the old zu/dzu,
oo/ou, ji/dji, etc. problems).
Jeffrey Friedl did a major overhaul in September-October 1992, in which he
added the frequency rankings, Halpern codes, SKIP patterns, updated the
grading ("G" fields) to reflect the modern Jouyou lists, corrected radical
numbers, corrected stroke counts and readings to fall in line with modern
usage.
Magnus Halldorsson corrected some erroneous Halpern numbers, and provided
them for a lot of the radicals. He provided the list of Heisig indices,
which he originally compiled himself, then verified and expanded using lists
from Richard Walters and Antti Karttunen. He also passed on to me the list of
Gakken indices compiled by Antti Karttunen.
Lee Collins provided the Unicode mappings (see Appendix B).
Iain Sinclair has provided the yomikata, meanings and S&H indices of many of
the obscure JIS2 kanji.
Christian Wittern, a Sinologist working at Kyoto University, sent me a
monster file prepared by Dr Urs App from Hanazono College. From this I have
extracted the Four Corner and Morohashi information. Christian also provided
the original Pinyin details, which were later replaced. I am very grateful
for these significant contributions.
In March 1994 the Morohashi indices were proof-read and corrected by
Christian.
Alfredo Pinochet supplied all the Henshall numbers.
Ingar Holst has provided considerable assistance in regularizing the Bnnn and
Cnnn radical classifications to remove some errors that were in the original
JIS2 file, and to make it all conform to Nelson's classification.
In mid-1993 I withdrew the SKIP codes from the distributed file as it
appeared that their presence violated Jack Halpern's copyright on these
codes. Jeffrey Friedl contacted Jack about this, and Jack obtained permission
from his publisher for the codes to be included subject to the copyright and
usage restrictions stated in this document. In March 1994 the Halpern indices
and SKIP codes were checked against an extract from Jack's files, and the "Z"
mis-classification codes added, again from his files. Jack has also made a
lot of useful comments and suggestions about the content and format of the
file. I am most grateful to Jack for his permission and assistance, and also
to Jeffrey for making the contact.
In May 1995, a number of updates took place. Jeffrey Friedl established
contact with James Heisig, and obtained a further set of his indices. I
contacted Mark Spahn (via the "honyaku" mailing list) and he kindly provided
most of the missing S&H descriptors, and Jack Halpern released to me the SKIP
codes of the kanji not in the New Japanese-English Character Dictionary. For
all this material I am most grateful.
In August 1995, I added the O'Neill index numbers. These were compiled by
Jenny Nazak, David Rosenfeld and myself. Thanks to Jenny & David for their
assistance.
In January and February 1996 the Morohashi numbers were checked thoroughly
against two important sources: a file of Unicode-Morohashi data (Uni2Dict)
which was prepared by Koichi Yasuoka from the allocation in the JIS X 0221
standard, and the review draft of the proposed revision of the JIS X 0208
standard, which was prepared by the INSTAC Committee, and made available in
a text file, thus enabling comparisons. All the mismatches between the three
files were examined against the Morohashi text, and extensive corrections
made to all three files. I am grateful to Koichi Yasuoka and Masayuki
Toyoshima for their considerable assistance in this task.
In March 1996 the Korean readings were added. They were provided by Dr
Charles Muller of Toyo Gakuen University (acmuller@gol.com), to whom I
am most grateful. Chuck's compilation of Korean readings is extremely
thorough and scholarly, and I am pleased to be able to incorporate
them.
In April 1996 the readings of all the kanji were compared with those in the
JIS X 0208 draft, and a number of corrections and additions made.
In May 1996 I carried out a "unification" of the readings of the KANJIDIC
and KANJD212 files, wherein all the readings of the "itaiji" were brought
into line. The identification of these itaiji was drawn from a file posted
to the fj.kanji group by Taichi Kawabata (kawabata@is.s.u-tokyo.ac.jp),
which was compiled at the ETL from the itaiji identification in the
JIS X 0208 and JIS X 0212 standards. I corrected a few errors, and added
some extra sets which were indicated in the JIS X 0208-1996 draft.
In July 1996 the Pinyin details were completely replaced by a new set. The
original Pinyin were from an earlier compilation by Christian Wittern, and
contained many errors. Two more reliable sources had become available:
the Uni2Pinyin file compiled by Koichi Yasuoka, which is based in part on
the TONEPY.tit by Yongguang Zhang; and the PYCHAR set of readings of Big5
hanzi compiled by Christian Wittern. The Pinyin currently in the KANJIDIC
file is a combination of the two, following the order in the Uni2Pinyin
file.
In August 1996 I corrected a few more missing and erroneous Nelson numbers,
using a massive Nelson list prepared by Wolfgang Cronrath. He also flagged
the kokuji, so I added these to the readings fields as "{(kokuji)}".
Also in August 1996 I deleted the handful of former "XJxxxx" cross-references,
and replaced them with a much more comprehensive set, so that they now
represent all the recognized "itaiji". The file I used for this was the
corrected itaiji file mentioned above.
In April 1997 I corrected a large number of bushu codes. Many of these had
been identified as errors by Jean-Luc Leger (reiga@iria.mines.u-nancy.fr) who
analyzed and examined all the Nelson bushu. I also identified and added a large
number of missing Cnnn codes.
Also in April 1997 I added the S&H "Kanji & Kana" indices. These had been
keyed by Olivier Galibert (Olivier.Galibert@mines.u-nancy.fr). (There must
be an outbreak of kanji interest in Nancy.)
In February 1998, the long-awaited inclusion of the "New Nelson" numbers took
place. I had been waiting for the editor of the New Nelson, John Haig, to
supply a list (as he had agreed some years before), but in the meantime,
Jean-Luc Leger keyed a list, so they are now available.
Also between December 1997 and February 1998 a large number of Level 2
kanji had their stroke counts corrected to bring them into line with the
counting principles used in the Level 1 kanji. This usually aligned the
counts with those used in the New Nelson and in S&H. Appendix E of this
document was amended to reflect this. The leg-work in tracking this material
down was done by Wolfgang Cronrath.
During December 1998 & Jan 1999 I updated the stroke counts of many of the
Level 2 kanji, using an analysis of them carried out by Wolfgang Cronrath.
I also added the De Roo codes, which had been keyed by Jasmin Blanchette,
who also typed the explanatory material. I contacted Fr De Roo in Tokyo who
readily agreed to the inclusion of the codes.
KANJIDIC LICENCE STATEMENT AND COPYRIGHT NOTICE
===============================================
In March 2000, James William Breen assigned ownership of the copyright
of the dictionary files assembled, coordinated and edited by him to
The Electronic Dictionary Research and Development Group at Monash
University.
Information about the formal usage arrangement for EDICT can be found on
the Group's WWW page at: http://www.dgs.monash.edu.au/edrdg/
In summary, KANJIDIC can be used, with acknowledgement, for any free
software or server, or included in file and software distributions
at a nominal charge for the distribution medium. It is also available
under non-exclusive licence for commercial uses, subject to the
provisos below.
The following people have granted permission for material for which they hold
copyright to be included in the files, and distributed under the above
conditions, while retaining their copyright over that material:
Jack HALPERN: The SKIP codes and Frequency codes in the KANJIDIC file.
With regard to the Frequency codes, Mr Halpern stated as follows:
"The commercial utilization of the frequency numbers is prohibited
without written permission from Jack Halpern. Use by individuals and
small groups for reference and research purposes is permitted, on
condition that acknowledgment of the source and this notice are
included."
With regard to the SKIP codes, Mr Halpern draws your attention to the
statement he has prepared on the matter, which is included at Appendix F.
Christian WITTERN and Koichi YASUOKA: The Pinyin information in the KANJIDIC
file.
Urs APP: the Four Corner codes and the Morohashi information in the KANJIDIC
file.
Mark SPAHN and Wolfgang HADAMITZKY: the kanji descriptors from their
dictionary.
Charles MULLER: the Korean readings.
Joseph DE ROO: the De Roo codes.
APPENDIX A - JIS CODES
======================
For full information about JIS codes, please see Ken Lunde's "japan.inf"
file, or his book "Understanding Japanese Information Processing", O'Reilly
1993. The following is a brief extract from the "japan.inf" file.
"The Japanese character set as described in the document JIS X 0208-1990
specifies 6,879 standard characters; 6,355 kanji in 2 levels (Level 1: 2,965
kanji arranged by pronunciation; Level 2: 3,390 kanji arranged by radical),
86 katakana, 83 hiragana, 10 numerals, 52 Roman characters, 147 symbols, 66
Russian characters, 48 Greek characters, and 32 line elements (for making
charts).
This standard was first established in 1978, modified for the first time in
1983 (character position swapping, glyph changes, and four kanji appended to
JIS Level 2), and modified again in 1990 (two kanji were appended to JIS
Level 2). This character set is widely implemented on a variety of platforms.
Encoding methods for JIS X 0208-1990 include Shift-JIS, EUC, and JIS."
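As an illustration of the figures and encodings quoted above, the following
Python sketch checks that the quoted category counts add up to 6,879, and
converts a two-byte JIS code to its EUC form by setting the high bit of each
byte. The function names are hypothetical and are not part of any KANJIDIC
tool; decoding relies on Python's standard "euc_jp" codec.

```python
# Character counts as quoted from japan.inf for JIS X 0208-1990.
JIS_X_0208_COUNTS = {
    "kanji level 1": 2965,
    "kanji level 2": 3390,
    "katakana": 86,
    "hiragana": 83,
    "numerals": 10,
    "roman": 52,
    "symbols": 147,
    "russian": 66,
    "greek": 48,
    "line elements": 32,
}


def total_characters() -> int:
    """Sum the quoted per-category counts."""
    return sum(JIS_X_0208_COUNTS.values())


def jis_to_euc(jis_code: int) -> bytes:
    """Convert a two-byte JIS X 0208 code (e.g. 0x3021) to EUC bytes.

    Each JIS byte lies in 0x21..0x7E; the EUC form sets the high bit
    of both bytes.
    """
    high, low = jis_code >> 8, jis_code & 0xFF
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError(f"not a two-byte JIS code: {jis_code:#06x}")
    return bytes([high | 0x80, low | 0x80])


def jis_to_char(jis_code: int) -> str:
    """Decode a JIS X 0208 code to a character via the EUC form."""
    return jis_to_euc(jis_code).decode("euc_jp")
```

For example, total_characters() gives 6879, and jis_to_char(0x3021) gives
亜, the first Level 1 kanji.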
APPENDIX B - UNICODE
====================
The following information about Unicode was provided in 1992 by Lee
Collins at Taligent.
(The Unicode sequences are) "the final, official mapping to JIS of the
CJK-JRG's (Chinese, Japanese, Korean- Joint Research Group) "Unified
Repertoire and Ordering Version 2.0" which is the unified Han character set
of ISO 10646 and Unicode. All of the Unicode companies (Apple, IBM,
Microsoft, NeXT, Taligent, etc) are now using this mapping. There has been
some confusion because of difference in nomenclature. Unicode people call it
UniHan, the Chinese sometimes call it HCS (Han Character Set) and ISO calls
it "Ideographic CJK Character Unified Repertoire and Ordering". ISO can't use
the term "Han" character because Japan was very sensitive to this (even
though it is a direct translation of "Kanzi") and it can't be called a
character set because only ISO WG2 is empowered with the authority to encode
characters. Problems of naming aside, they are all the same thing.
The CJK-JRG was formed under the aegis of ISO in 1990 to investigate and
propose a unified Han character set for inclusion in ISO 10646. It brought
together various experts on Han characters from China, Hong Kong, Japan,
Korea, Taiwan and the United States selected by the national bodies
participating in ISO WG2.
Including the initial work in the US on Unicode and in China on GB 13000,
which were merged and became the basis for the URO, the task spanned about 4
years. The work was completed in April of this year. It contains 21,000 Han
characters from all of the major standards used in East Asia, including JIS X
0208-1990 and JIS X 0212-1990. The Unicode consortium provides a
cross-reference file for all of the source sets. To get a copy contact
Steve Greenfield
unicode-inc@HQ.M4.Metaphor.COM
For further details about the URO/UniHan, you might want to pick up a copy of
the "The Unicode Standard Version 1.0 Vol II". It's published by Addison
Wesley, ISBN 0-201-60845-6. It's been available in the USA for over a month
now. For a slightly different presentation of the characters, a copy of 10646
or of the "Ideographic CJK Character Unified Repertoire and Ordering Version
2.0" might be available through the Australian national body to ISO WG2."
APPENDIX C - SKIP CODES
=======================
S K I P - SYSTEM OF KANJI INDEXING BY PATTERNS
[This document contains the text and examples from the covers of the "New
Japanese-English Character Dictionary" edited by Jack Halpern and published
by Kenkyusha and NTC. It is reproduced with Mr Halpern's kind permission.
The text on which this is based used four patterns which are not able to be
reproduced in this document. They are referred to below as #1 through #4,
and relate to the following shapes in the NJECD:
á
#1 #2 #3 #4
LEFT- TOP- ENCLOSURE SOLID
RIGHT BOTTOM]
HOW TO LOCATE AN ENTRY
A. Determine the SKIP number of your character.
STEP 1 IDENTIFY PATTERN
Determine to which of the four PATTERNS your character belongs to get the
first part of the SKIP number (the PATTERN NUMBER).
If your character belongs to pattern #1, #2 or #3 (ꢪ#1), carry out the
steps in the left column; if it belongs to pattern #4 (#4), carry out the
steps in the right column. (REF: R4. How to Identify the Pattern)
#1 #2 #3 #4
STEP 2
DIVIDE CHARACTER OMIT
Divide the character into two parts at (Since solid characters
the first division point. [=+] cannot be divided, go to
REF: R5. How to Divide the Character STEP 3.) REF: R6. How to
Subclassify the Solid Pattern
STEP 3
COUNT STROKES OF SHADED PART DETERMINE TOTAL STROKE-COUNT
Count the strokes of the SHADED PART Determine the total stroke-count of
to get the second part of the SKIP your character to get the second part
number. [ #1 1-4-] of the SKIP number. [ #4 4-3-]
REF: Appendix 2. How to Count Strokes REF: Appendix 2. How to Count Strokes
STEP 4
COUNT STROKES OF BLANK PART IDENTIFY SOLID SUBPATTERN
Count the strokes of the BLANK PART Determine to which of the four
to get the third part of the SKIP SOLID SUBPATTERNS your character
number. [ #1 1-4-5] belongs to get the third part of the
REF: Appendix 2. How to Count Strokes SKIP number. Select from: `' 1,
`' 2, `|' 3, or `' 4. [ #4 4-3-1]
REF: R6. How to Subclassify the
Solid Pattern
After determining the SKIP number of your character, locate your character
entry in one of two ways:
1. Determine the entry number in the Pattern Index beginning on p. 1952 then
locate your character entry in the main part of the dictionary. See R3.1.2
Index Method for details.
2. Locate your character entry directly (without referring to the Pattern
Index) from its SKIP number. See R3.1.3 Direct Method for details.
NOTE: All references preceded by a section mark (R) refer to SYSTEM OF KANJI
INDEXING BY PATTERNS beginning on p. 106a
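The steps above can be sketched in Python as a small parser for the
three-part SKIP number. This is a minimal illustration only: the names
parse_skip and describe_skip are hypothetical, and the validation reflects
just the rules stated in the steps (pattern 1 to 4; for the solid pattern the
second part is the total stroke count and the third is the subpattern 1 to 4).

```python
from typing import NamedTuple

PATTERN_NAMES = {1: "left-right", 2: "top-bottom", 3: "enclosure", 4: "solid"}


class Skip(NamedTuple):
    pattern: int  # pattern number, 1..4
    second: int   # shaded-part strokes, or total strokes for pattern 4
    third: int    # blank-part strokes, or solid subpattern for pattern 4


def parse_skip(code: str) -> Skip:
    """Split a SKIP number such as '1-4-5' into its three parts."""
    parts = code.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three hyphenated parts: {code!r}")
    pattern, second, third = (int(p) for p in parts)
    if pattern not in PATTERN_NAMES:
        raise ValueError(f"unknown pattern number: {pattern}")
    if pattern == 4 and third not in (1, 2, 3, 4):
        raise ValueError(f"unknown solid subpattern: {third}")
    return Skip(pattern, second, third)


def describe_skip(code: str) -> str:
    """Explain what each part of a SKIP number stands for."""
    skip = parse_skip(code)
    if skip.pattern == 4:
        return f"solid, {skip.second} strokes in total, subpattern {skip.third}"
    name = PATTERN_NAMES[skip.pattern]
    return f"{name}, {skip.second} shaded strokes, {skip.third} blank strokes"
```

With the two examples used in the steps, parse_skip("1-4-5") yields the parts
(1, 4, 5), and describe_skip("4-3-1") reports a solid character of 3 strokes
in subpattern 1.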
HOW TO IDENTIFY THE PATTERN
DETERMINE TO WHICH OF THE FOUR PATTERNS YOUR CHARACTER BELONGS
#1 Characters that can be divided into left and right parts
RIGHT: 4-5 Ȭ 1-1 1-11 3-3
WRONG: 1-3 1-4 3-2 ¿ 3-3
#2 Characters that can be divided into top and bottom parts
RIGHT: 1-1 3-3 2-3 5-4
WRONG: 1-2 4-2 8-4 4-3
#3 Characters that can be divided by an enclosure element
RIGHT: 3-8 3-2 8-3 3-5
WRONG: 1-1 4-3 ̾ 3-3 5-4
#4 Characters that cannot be classified under patterns #1, #2, or #3
RIGHT: 8-1 ʼ 5-2 4-3 Ϳ 3-4
WRONG: 2-1 4-1 4-3
IF A CHARACTER CAN BE CLASSIFIED UNDER MORE THAN ONE PATTERN, SELECT THE ONE
THAT FOLLOWS THE NATURAL CONSTRUCTION OF THE CHARACTER
RIGHT: 2-5-2 Ȣ 2-6-9
WRONG: 1-2-5 Ȣ 1-7-8
HOW TO DIVIDE THE CHARACTER
DIVIDE THE CHARACTER INTO TWO PARTS AT THE FIRST DIVISION POINT
#1 Going from left to right, divide at the first space
RIGHT: 4-4 1-2 3-3
WRONG: 2-1 9-3
#2 Going from top to bottom, divide at the first space, horizontal line, or
frame element, whichever comes first
RIGHT: 1-2 2-8 3-4 2-3
WRONG: 2-1 6-4 2-5 1-2
#3 Going from the outside toward the inside, divide after the first enclosure
element
RIGHT: 3-6 3-8 8-3 3-2
WRONG: 7-2 11-5
DO NOT VIOLATE THE PRINCIPLE OF ELEMENT INTEGRITY
1. Never break through strokes
RIGHT: 3-2-2 WRONG: 1-1-4
2. Never break through indivisible units
RIGHT: 1-3-8 WRONG: 1-1-10
3. Never make unnatural divisions
RIGHT: 3-4-2 WRONG: 2-2-4
HOW TO SUBCLASSIFY THE SOLID PATTERN
A. DETERMINE TO WHICH OF THE FOUR SOLID SUBPATTERNS YOUR CHARACTER BELONGS
`T' 1. Characters that contain a top line
RIGHT: 8-1 3-1 6-1 8-1
WRONG: 2-1 3-2 8-1 ʼ 5-1
2. Characters that contain a bottom line
RIGHT: 3-2 ʼ 5-2 8-2
WRONG: 3-2 5-2 8-2
3. Characters that contain a through line
RIGHT: 4-3 8-3 4-3
WRONG: 4-3 3-3 4-3 7-3
4. Characters that do not contain a top line, bottom line, or through line
RIGHT: Ϳ 3-4 3-4 7-4
WRONG: 6-4 3-4 ͧ 4-4 6-4
B. IF A CHARACTER CAN BE CLASSIFIED UNDER MORE THAN ONE SUBPATTERN, THE
SUBPATTERN WITH THE SMALLEST NUMBER TAKES PRECEDENCE
RIGHT: 4-1 3-1 7-1 8-1 5-2 5-2 5-1
WRONG: 4-2 3-2 7-2 8-3 5-3 5-3 5-3
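The subclassification rules in A and B above amount to testing for the three
kinds of line in a fixed order, so that the lowest-numbered subpattern that
applies is chosen. A minimal Python sketch follows; the function name and its
boolean parameters are hypothetical, and deciding whether a character really
contains a given line is left to the reader.

```python
def solid_subpattern(has_top_line: bool,
                     has_bottom_line: bool,
                     has_through_line: bool) -> int:
    """Return the solid subpattern number (1..4).

    When more than one subpattern applies, the smallest number takes
    precedence, which is why the tests are made in ascending order.
    """
    if has_top_line:
        return 1
    if has_bottom_line:
        return 2
    if has_through_line:
        return 3
    return 4
```

For instance, a character with both a top line and a through line is
classified under subpattern 1, not 3.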
APPENDIX D - AN OVERVIEW OF THE FOUR CORNER CODING SYSTEM
=========================================================
The Four Corner System has been used for many years in China and Japan for
classifying kanji. In China it is losing popularity in favour of Pinyin
ordering. Some Japanese dictionaries, such as the Morohashi Daikanwajiten
have a Four Corner Index.
The following overview of the system has been condensed from the article "The
Four Corner System: an introduction with exercises" by Dr Urs App, which
appeared in the Electronic Bodhidharma No 2, February 1992, published by the
International Research Institute for Zen Buddhism, Hanazono College. (More
examples will be added from that article in due course.)
1. Stroke shapes are divided into ten classes:
0 LID 亠
1 HORIZONTAL LINE
2 VERTICAL LINE
3 DOT 丶
4 CROSS
5 SKEWER
6 BOX
7 ANGLE
8 HACHI 八
9 CHIISAI
2. The Four Digits are derived from the Four Corners in a Z-shaped order.
A B 7 1 7 7
for example:
C D 2 9 2 2
Some examples: 2421 2122 7121 2733 0762 Ʊ 7722 4292
3. A shape is only used once. If it fills several corners, it is counted as
zero in subsequent corners.
Some examples: 6000 8060 ʬ 8022 2003 2690 6066 0096
4. When the upper or lower half of a character consists of only one (single
or composite) shape, it is, regardless of its position, counted as a left
corner. The right corner is counted as zero.
Some examples: Ω 0010 ͳ 5060 1017 0022 0024 2090 2050
5. When there is no additional element to the four sides of the characters
, , (and sometimes ), whatever is inside these characters is taken
for the lower two corners.
Some examples: 7760 6080 Ԣ 6015 6010 7744 1060 2110
6. The analysis is based on the block-style handwritten kaisho (ܴ) shape
of characters.
(This needs attention, as is 3027, not 1027. The top stroke is treated as
a 丶.)
7. Some points to note when analyzing shapes:
o Shape 0:
When the horizontal line below a DOT shape (number 3) is connected to another
stroke at its right-hand end (as in , etc.) it is not counted as a LID
(number 0) but as a DOT.
Examples: 3040 3520 3222
o Shape 6:
Characters such as and where one of the strokes of the square extends
beyond it, are not considered to be square (number 6) shapes, but corners
(number 7).
Examples: 7710 3222 7710 8377 3010
o Shape 7:
Only the cornered end of corner shapes (number 7) is counted as 7.
Examples: 7171 7222 2762 ȿ 7124
o Shape 8:
Strokes that cross other strokes are not counted as shape number 8 (八).
Examples: 8043 7743 4003 8043 2143 9043
o Shape 9:
Shapes resembling shape 9, but featuring two strokes in the middle (as in the
top part of or ) or two strokes on one side (as in or the bottom part
of ) are not considered as 9 shapes.
Examples: 4433 3290 3214
8. Some points to note when choosing corners.
- when a corner is occupied by more than one independent or parallel stroke,
the one that extends furthest to the left or right is taken as the corner,
regardless of how high or low it is.
examples: 1111 2124 0013 0022 3421 4721
- if there is another shape above (or, at the bottom of the character, below)
the leftmost or rightmost stroke of a character, that shape is given
preference and is taken as the corner.
examples: 3090 4040 6020 4040 3521 ¶ 4480
- when two composite stroke shapes are interwoven and each could be regarded
as a corner, the shape that is higher is taken as the upper corner, and the
lower stroke as lower corner.
- when a stroke that slopes downwards to the left or right is supported by
another stroke, the latter is taken as the corner.
examples: 2740 0073 1962 4464 4410 3424
- a left slanting stroke on the upper left is taken for the left corner only;
for the right corner one takes a stroke more to the right.
examples: 2740 ̶ 2350 6752 Ū 2762 2762 2772
9. Shape variations: (Dr App includes several pages of examples)
10. The fifth corner:
In order to differentiate between the several characters with the same code,
an optional "fifth corner" is sometimes used. This is, loosely, a shape above
the fourth corner which has not been used in any other shape.
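Rules 2 and 3 above can be sketched in Python as follows. This is only an
illustration under stated assumptions: the caller has already decided which
shape occupies each corner, each corner is given as a (shape label, shape
class) pair in Z order (upper left, upper right, lower left, lower right), and
the labels are arbitrary names chosen to tell one physical shape from another.
Rules 4 to 8, which govern how corners are chosen, are not modelled, and the
function name is hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def four_corner_code(corners: Sequence[Tuple[Hashable, int]]) -> str:
    """Build a four-digit code from corners given in Z order.

    Each corner is (shape_label, shape_class), where shape_class is
    0..9. A shape that fills several corners is counted once; every
    later corner it fills is written as zero (rule 3).
    """
    if len(corners) != 4:
        raise ValueError("exactly four corners are required")
    seen = set()
    digits: List[str] = []
    for label, shape_class in corners:
        if not 0 <= shape_class <= 9:
            raise ValueError(f"shape class out of range: {shape_class}")
        if label in seen:
            digits.append("0")  # same shape already counted
        else:
            seen.add(label)
            digits.append(str(shape_class))
    return "".join(digits)
```

A character consisting of a single BOX shape that fills all four corners
gives four_corner_code([("box", 6)] * 4) == "6000", consistent with the 6000
listed among the examples for rule 3. Two separate shapes of the same class
need different labels, so both are counted.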
APPENDIX E. RADICAL AND STROKE COUNTING RULES
==============================================
These rules apply:
(a) to the stroke-counts themselves;
(b) to the stroke counts in the SKIP codes. Where this results in a SKIP
which differs from that in the NJECD, or in the non-NJECD SKIPs
provided by Jack Halpern, the Jack Halpern version is included prefixed
with "ZR"
RADICALS
The radicals listed below are ones where there are differing approaches to
the counting of radicals in the various references. The stroke counting in
this file does not strictly follow any reference, but tends to be more
closely aligned with Halpern.
1. B140 KUSA-KANMURI e.g. always counted as 3 strokes (Halpern counts
this 4 strokes for the (mostly level 2) kanji where the older form is
often printed.) Note that this has been carried through to kanji where
this element is not the indexing radical, such as ۯ.
2. B162 SHIN-NYUU e.g. or counted as 3 or 4 strokes. (Nelson and S&H
count it as 2 strokes, and Halpern as either 3 or 4.) [See Note 1 below.]
3. B163 OOZATOZUKIRI & B170 KOZATO-HEN ˮ and always counted as 3 strokes
(Nelson and S&H count it as 2, Halpern as 3.) This also applies where it
appears mid-kanji, such as in .
4. B199 MUGI always counted as 7 strokes, except for & where it
is counted as 11. (Nelson and Halpern do the same, and S&H avoid treating
it as a radical, but count it as 12 in the remainder.)
5. B113 SHIMESU e.g. , is counted as 4 strokes in that form, and 5 strokes
in its older form, . 18 kanji are in the 4-stroke form and 20 are in
the 5-stroke form. (Nelson and S&H count it as 4; Halpern counts it as 4
or 5. [See Note 1.])
6. B184 SHOKU HEN , , etc. is counted as 8 strokes in the form, and as
9 strokes in the Ҭ and forms. (Nelson and S&H count it as 8 strokes,
and Halpern as 8 or 9.) [See Note 1. below.]
7. B131 SHIN/KERAI . Counted as 7 (Nelson counts it as 6, Halpern as 7
(in the book), and S&H as both for different kanji.)
8. B136 MAI ASHI . Counted as 7 (traditionally counted as 6, in
accordance with the older writing of `'. Nelson counts as 6, S&H as
7, and Halpern as 7 for and ̾Ѵ and 6 for the rest.) Note
this is also applied to counting and for kanji with the pattern.
9. B131 SHIN or KERAI . Counted as 7 (traditionally counted as 6). Nelson
counts as 6, Halpern as 7, and S&H as 6 or 7 in different cases.
10. The ROO or OI radical (老) has a variant consisting of the top 4 strokes.
For example, it is in . Traditionally, this variant had an extra dot,
and was counted as 5 strokes. I'm counting it as 4 throughout.
OTHER STROKE PATTERNS
1. While the pattern is a 6-stroke radical, the top half of is made up
of three distinct parts totalling 8 strokes. Note that this also is the
case with տ, , and despite the simplification in the JIS glyphs.
2. (KIBA HEN) is a problem. It is classically counted as 4 strokes, but
these days has a flick that makes it effectively 5. Halpern, Nelson and
S&H usually have it as 5 strokes, so I'm standardizing on that.
3. Another little horror is (MU or NASHI), which is classically counted
as 4 strokes. The most common variant has 5 strokes, but looks like 6.
Halpern, S&H and the Classical Nelson count this as 4 strokes, and the New
Nelson as 5. I'm making it 5 too.
4. The JUU or ASHIATO radical is at the bottom of and . It is
traditionally counted as 5 strokes, although sometimes it looks like 4.
I'm using 5 throughout.
5. The pattern to the left of , which appears in several kanji, e.g.
ʾ and , has 8 strokes. (There are 3 strokes at the top as in .)
6. The "east" pattern () has 8 strokes. There is an older form in which
there are two strokes in the box (). It is counted as 8
strokes here in the form (e.g. ) and 9 in the form, as in .
7. The pattern at the bottom of is counted as 4 strokes in modern
dictionaries, although traditionally it was 5.
8. The pattern , which appears in several kanji, is counted as 9 strokes.
Several dictionaries count it as either 8 or 9.
Note 1: The JIS X 0208-1990 standard does not formally specify the precise
glyphs used for kanji, however the glyphs it uses in the published
version have become de facto standards for many font compilations. In
the published standard, for several kanji, e.g. é/, /, /Ҭ, the
JIS level one kanji use the simpler form, and the Level 2 kanji use the
older more complex form. Just to make matters worse, many fonts for
JIS X 0208 kanji are based on the bit-maps specified in JIS X 9051-1984
standard, which defines the 16x16 patterns for JIS X 0208-1983 characters.
According to Ken Lunde: "This standard was not very good, and JSA is no
longer supporting it."
Anyway, JIS X 9051-1984 had the simpler form for all these bushu in both
Levels 1 and 2, as well as having simplifications of kanji like . Thus,
as the font foundries have freedom to choose whichever glyphs they like,
what you see on your screen may well not agree with these rules. All
the rules in this appendix relate to the glyphs as published in the
JIS X 0208-1990 standard, and as appearing in font compilations based
on them.
APPENDIX F. CONDITIONS FOR USING SKIP DATA by Jack Halpern (jack@kanji.org)
Ever since my New Japanese-English Character Dictionary (NJECD) came
out (Kenkyusha 1990, NTC 1993), I have been getting inquiries asking for
permission to use SKIP (System of Kanji Indexing by Patterns) data in
software products and electronic dictionaries. Below I explain the
policy of the Kanji Dictionary Publishing Society (KDPS) on copyright
issues when distributing SKIP data or using it in a software product
or electronic dictionary.
WHAT IS SKIP?
Briefly, SKIP is an indexing system that enables the user to locate
kanji quickly and accurately. The system is extremely convenient because
it can be learned in a very short time, is easy to use, and requires very
little prior knowledge of kanji.
The central idea of SKIP is the classification of characters into four
major categories on the basis of easy-to-identify geometrical patterns:
1. Left-right
2. Up-down
3. Enclosure
4. Solid
Characters belonging to the first three categories are arranged in
ascending order of hyphenated numerals that represent the number of
strokes in the shaded part and the number of strokes in the blank part.
See http://www.kanji.org and NJECD front matter for details.
To distribute SKIP data within a group or use it in a commercial
or non-commercial product, please confirm that you agree to the following
conditions:
1. COPYRIGHT AND DISTRIBUTION
SKIP data is protected by copyright, copyleft and patent laws. The
copyright holder is Jack Halpern, chief editor of KDPS (the Kanji
Dictionary Publishing Society). The SKIP data must be protected
from illegal copying and distribution, using such measures as encryption.
The data must be encrypted if it is to be used in any kind of product,
including commercial products, software and freeware. The data, or extracts
from it, must not be distributed to a third party, must not be sold as
part of any commercial software package, and must not be incorporated
in any published dictionary or other printed document without the
specific permission of the copyright holder.
2. ACKNOWLEDGMENT OF SOURCE
The source of SKIP data shall be acknowledged in the information
screens of the product, and the following disclaimer should appear
in the documentation and/or help screens:
"SKIP (System of Kanji Indexing by Patterns) numbers are derived from
the New Japanese-English Character Dictionary (Kenkyusha 1990, NTC
1993) and The Kodansha Kanji Learner's Dictionary (Kodansha
International, 1999). SKIP is protected by copyright, copyleft and
patent laws. The commercial or non-commercial utilization of SKIP in
any form is strictly forbidden without the written permission of
Jack Halpern, the copyright holder. Such permission is normally
granted. Please contact jack@kanji.org and/or see http://www.kanji.org."
3. ROYALTIES
SKIP is a product of seven years of computer-assisted research and
experimentation on how kanji elements are intuitively perceived in
terms of their parts. Development work was financed by private funds
and research grants. To enable us to continue to develop useful data
and products, we ask for your cooperation by paying KDPS (the Kanji
Dictionary Publishing Society) a royalty of 0.5% (negotiable) if you are
using the data for a commercial product. Depending on the circumstances,
it is also possible to use SKIP data free of charge or at a lower
royalty.
Finally, please send a copy of your product to Jack Halpern
APPENDIX G - DE ROO CODES
AN OVERVIEW OF THE DE ROO SYSTEM
[This document contains the text found in the second edition of "2001 Kanji"
edited by Joseph R. De Roo and published by Bonjinsha.]
The system used in "2001 Kanji" is intended for the beginner who encounters
a kanji and wants to look it up, knowing neither its radical, pronunciation,
nor its exact number of strokes. The method consists of looking at the top
of the kanji, and then at its bottom, disregarding its other parts.
"2001 Kanji" provides drawings for all graphic elements. This information
cannot be reproduced here. However, an attempt was made to describe each
element as much as possible given the constraints of a computer text file,
and examples of characters possessing the element are always given.
Two-step visual method for locating a kanji:
1. Observe its EXTREME TOP or LEFT TOP.
There are only four possibilities: DOT (丶), VERTICAL LINE (), DIAGONAL
LINE (), HORIZONTAL LINE (). Each of these four strokes can occur either
in isolation or in connection with one or more strokes. Each of the four
groups of graphic elements correspond to the four basic strokes in their
immediate environment. Each element has a number which will become the first
half of the kanji number.
DOT (丶):
3 DOT (Ц) ɬ
4 ROOF (е)
5 DOTTED CLIFF () ģ
6 ALTAR Ƿ
7 KANA U ()
8 LID
9 HORNS
VERTICAL LINE ():
10 SMALL ON BOX Ⱦ
11 SMALL () ɹ ϧ
12 VERTICAL LINE () Ҹ
13 HAND TO THE LEFT
14 CROSS () ¸ ˮ
15 CROSS ON BOX () ī «
16 KANA KA ()
17 WOMAN ()
18 TREE ()
19 LETTER H (װ) ʦ ¶
DIAGONAL LINE ():
20 KANA NO () ˳ ë Ȭ
21 MAN TO THE LEFT ()
22 THOUSAND ()
23 MAN TO THE TOP ̵
24 COW ()
25 KANA KU () ұ
26 HILL TOP α
27 LEFT ARROW ()
28 ROOF ()
29 X ()
HORIZONTAL LINE ():
30 HORIZONTAL LINE () Ʀ
31 FOURTH () ŷ
ʿ
32 BALD (Ѻ)
33 CLIFF () ä ȿ
34 TOP-LEFT CORNER Ĺ
35 TOP-RIGHT CORNER ȯ ͽ λ
36 UPSIDE-DOWN CAN () Ʊ ð
37 MOUTH () ̱
38 SUN () ¨
39 EYE TOP
2. Observe its EXTREME BOTTOM or RIGHT BOTTOM.
There are nine possibilities: DOT (丶), LEFT HOOK (亅), VERTICAL LINE (),
RIGHT HOOK, DIAGONAL LINE (), BACK DIAGONAL LINE (), BOTTOM OF HEAD ɥ,
BOTTOM OF WATAKUSHI , HORIZONTAL LINE (). They are listed in association
with one or more strokes. The number of the bottom element will become the
second half of the kanji number.
DOT (丶):
40 FOUR DOTS ̵
41 SMALL () ;
42 WATER () ɹ
LEFT HOOK (亅):
43 KANA RI ()
44 SEAL () ʦ Ĥ
45 SWORD BOTTOM () ǵ
46 MOON () ͭ
47 DOTLESS INCH в ͷ Ϳ ð
48 INCH ()
49 MOUTH LEFT HOOK
50 BIRD BOTTOM Ļ
51 ANIMAL () ʪ
52 BOW BOTTOM ұ
53 LEFT HOOK (Э) λ
VERTICAL LINE ():
54 VERTICAL LINE () ʹ
55 CROSS ()
װ
RIGHT HOOK:
56 RIGHT HOOK ε Ҹ
57 LEGS (ѹ) Ѻ ȯ
58 HEART () ǰ
59 TASSELED SPEAR BOTTOM ɬ
DIAGONAL LINE ():
60 KANA NO () ͼ
BACK DIAGONAL LINE ():
61 SMALL PODIUM ϻ
62 BACK KANA NO ()
63 BIG () ŷ
64 TREE () « ̤
65 SMALL SPOON ι Ĺ ä
66 GOVERN (Щ) ʸ ھ
67 AGAIN ()
68 WINDY AGAIN ()
69 WOMAN ()
HEAD BOTTOM:
70 HEAD BOTTOM ɥ Ƿ
WATAKUSHI BOTTOM:
71 WATAKUSHI BOTTOM
HORIZONTAL LINE ():
72 HORIZONTAL LINE () Τ
73 STANDING BOTTOM Ω Ʀ
74 DISH BOTTOM
75 BOTTOM CORNER ľ ҹ ˴
76 MOUNTAIN () ͳ ͩ
77 MOUTH () Ϥ
78 SUN () ɴ
79 EYE ()
The number of the kanji you are looking for consists of the top number
coming first and the bottom number coming second, the two numbers being
placed side by side. E.g., 363 (3 63), 747 (7 47).
There are two rules always to keep in mind:
a. Ignore the complete enclosure and the "road" radical (as in ƻ). Look
at the top and bottom (in some cases only the bottom) of what is inside the
complete enclosure, and of what is to the upper right of "road". E.g.,
1262, 2177, ƻ 979, ˥ 2755.
b. When a part is enclosed by the "gate" radical, take the bottom or right
bottom of that part. E.g., Ʈ 3848, 1864.
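Because the top elements listed above are numbered 3 to 39 and the bottom
elements 40 to 79, a De Roo number can be split unambiguously: the last two
digits are the bottom element and whatever precedes them is the top element.
A minimal Python sketch, with a hypothetical function name:

```python
from typing import Tuple


def split_de_roo(code: str) -> Tuple[int, int]:
    """Split a De Roo number into (top element, bottom element).

    The bottom element always has two digits (40..79); the top
    element (3..39) is the one or two digits before it.
    """
    if not code.isdigit() or len(code) not in (3, 4):
        raise ValueError(f"expected a 3- or 4-digit number: {code!r}")
    top, bottom = int(code[:-2]), int(code[-2:])
    if not 3 <= top <= 39:
        raise ValueError(f"top element out of range: {top}")
    if not 40 <= bottom <= 79:
        raise ValueError(f"bottom element out of range: {bottom}")
    return top, bottom
```

Using the numbers quoted in the text, split_de_roo("363") gives (3, 63),
split_de_roo("747") gives (7, 47), and split_de_roo("1262") gives (12, 62).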