About the project

How to use the database


A. Information in the database is organized into three main categories:

  1. Articles
  2. Images
  3. Advertisements
  • In addition, the database makes extensive use of "persons" and "keywords."

B. Search options

  • Users can search the database by typing keywords or phrases in Chinese characters, pinyin, or English, either across all fields or within the following search categories:
    • Article titles
    • Image captions
    • Advertisement captions
    • Personal names
    • Keywords
    • Genres
    • Series

C. Rules for Pinyin

  1. All phrases and sentences are entered syllable by syllable, i.e. each character is entered separately (not linked up), with the following exceptions:
    • Personal names (see the following section on "Personal Names" for details)
    • Place names: Shanghai (上海), Suzhou (蘇州), Zhongguo (中國), Meiguo (美國), Deguo (德國), etc.
    • Printed Materials and book titles: Funu shibao (婦女時報), Linglong (玲瓏), Sheying huabao (攝影畫報), Shenbao (申報), etc.
    • Names for companies: Shangwu yin shu guan (商務印書館), Shanghai Hengdali yang hang (上海亨達利洋行), etc.
  2. Examples of phrases:
    • Yun dong yu Zhongguo fu nu (運動與中國婦女)
    • Shengjia feng ren nu xue xiao zhao sheng (勝家縫紉女學校招生)
    • Meiguo Tanxiangshan feng jing jue jia "hu la" wu wei gai tu ren te chang (美國檀香山風景絕佳『呼啦』舞為該土人特長)
  3. U-Umlaut “ü”: The standard pinyin rules use ü (u-Umlaut) for four syllables: nü, nüe, lü, lüe. To facilitate and broaden search results, all occurrences of “ü” were entered as “u” into the database: “fu nu” (婦女) was entered instead of “fu nü”.
    However, when searching, users can still enter “ü” in three ways: as “ü”, “v”, or “u”. Thus, searching for “nü zi”, “nu zi”, or “nv zi” returns identical hits.
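
The ü rule can be illustrated with a small sketch. This is not the database's actual search code; the function name `normalize_pinyin_query` and the regular expression are assumptions made for illustration. It folds “ü” and the “v” spelling onto the plain “u” form that the rule says is stored in the database.

```python
import re
import unicodedata

# Hypothetical helper: not part of the database software.
# Matches a standalone syllable nv, nve, lv or lve (the "v" spelling of ü).
_V_SYLLABLE = re.compile(r"\b([nlNL])([vV])(e?)\b")

def normalize_pinyin_query(query: str) -> str:
    """Fold "ü" and "v" spellings onto plain "u" in a pinyin query."""
    # Compose first so that "u" + combining diaeresis is treated like "ü".
    text = unicodedata.normalize("NFC", query)
    text = text.replace("ü", "u").replace("Ü", "U")

    def replace_v(match: re.Match) -> str:
        # Keep the case of the original "v" when substituting "u".
        u = "U" if match.group(2).isupper() else "u"
        return match.group(1) + u + match.group(3)

    # Only whole syllables are rewritten, so English words such as
    # "silver" are left unchanged.
    return _V_SYLLABLE.sub(replace_v, text)

for spelling in ("nü zi", "nv zi", "nu zi"):
    print(spelling, "->", normalize_pinyin_query(spelling))
```

Restricting the “v” replacement to whole syllables is a design choice of this sketch, made because the search box also accepts English words.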

D. Rules for personal names

  1. Names in Chinese characters are written in the conventional way, i.e. there is no space between characters.
  2. Names in Pinyin:
    • Chinese names are entered with a space between family and given names: Zhou Jinjue (周今覺), Bao Tianxiao (包天笑)
    • Pen names are entered syllable by syllable: Lu Xun (魯迅), Zhuo Dai (卓呆)
    • Chinese transliterations of foreign names: Liefu Tuoersitai (列夫•托爾斯泰) [Leo Tolstoy], Kelisituibai Pangesi (克立司推白潘葛司) [Christabel Pankhurst]


E. Rules for Chinese characters

  1. All characters are entered in traditional form. Users should enter traditional characters when searching.
  2. All characters are encoded according to the Unicode standard.

F. Page numbering

  1. Technical numbers vs. Printed numbers
    • Magazines in the database do not use a consistent system for numbering pages, and many pages have no page number. Therefore, we use Technical numbers to denote the page sequence in an issue. For every issue, the “Technical number” starts with page 1 (the cover) and is then counted in sequence until the last page (the back cover).
    • Printed numbers are page numbers that are printed on the actual pages. However, due to the inconsistency of the page numbering system in the magazines, this number should be used as a reference only. It does not denote the actual page sequence in an issue.
    • If no printed page number is available but a number was written by hand (as is often the case on image pages in Linglong), the number is recorded in square brackets, such as [625].
    • Missing pages: although we strive to find good-quality images for all pages, many pages are still missing from the various editions that we were able to obtain.
    • When citing from the database, users should give the printed number followed by the technical number in square brackets [ ].
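
As a rough illustration of the citation rule, the sketch below builds a page reference from the two numbers. The function name `format_page_citation` is hypothetical, and two details are assumptions rather than stated rules: the exact spacing ("12 [15]"), and citing the technical number alone when a page has no printed number.

```python
from typing import Optional

def format_page_citation(technical: int, printed: Optional[str] = None) -> str:
    """Return a page reference: printed number, then [technical number]."""
    if technical < 1:
        # Technical numbers start at 1 (the cover).
        raise ValueError("technical page numbers start at 1")
    if printed:
        return f"{printed} [{technical}]"
    # Assumption: with no printed number, only the technical number is given.
    return f"[{technical}]"

print(format_page_citation(15, "12"))
print(format_page_citation(1))
```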

G. Language - Wenyan (文言) vs Baihua (白話):

  1. 的 versus 之: A vernacular text will have relatively many 的, while a non-vernacular text will have very few. Connected to this, a non-vernacular text is likely to have many more occurrences of 其 (in the meaning of 他的/她的/它的). (Joan, Michel)
  2. The use of measure words (量词) seems to be almost exclusive to modern Mandarin (i.e. the vernacular). Expressions like 两个人, 三棵树, 四种想法 and so on will therefore occur rarely in non-vernacular texts. Instead, in those texts, you would find numbers linked directly to nouns, for instance 二人, or placed after nouns, as in 理由有三. Another example: Liang Qichao's writing (arguably a very "modern" type of classical Chinese) contains no measure words at all, whereas if we leaf through one or two pages of a Lu Xun story, we can find lots of them. (Michel)
  3. The use of question particles 吗 and 呢, as opposed to 乎 and other such particles, is another possible marker. (Michel)
  4. Mentioned in most of the introductions: the vernacular is largely (something like 92%) bisyllabic. By contrast, a phrase such as 我手写我口 consists of five monosyllabic words in a row, with no connecting particles, which is virtually impossible in modern Chinese. (Michel)
  5. The aspect marker 了 and negative marker 没 are very uncommon in non-vernacular. Even something like 没有 is more likely to be 不有 in classical. (Michel)
  6. Some dictionaries (such as the PC version of the ABC Dictionary) give frequency percentages for the occurrence of certain words in modern vernacular. It might be possible to draw up a list of characters/words that would normally have a very high frequency in modern Chinese. If the text you are dealing with does not have a high frequency of those words, it is likely that it is not a modern text. (Michel)
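
Some of the markers above lend themselves to simple counting. The following is only a rough sketch, not a tested classifier: the function name `marker_counts` and the two marker lists are assumptions drawn from points 1, 3, and 5, with both traditional and simplified forms included where they differ. The counts are meant as a first hint and should be read alongside the text itself.

```python
from collections import Counter
from typing import Tuple

# Markers taken from the notes above; the selection is illustrative only.
VERNACULAR_MARKERS = "的嗎吗呢了沒没"
CLASSICAL_MARKERS = "之其乎"

def marker_counts(text: str) -> Tuple[int, int]:
    """Return (vernacular marker count, classical marker count) for a text."""
    counts = Counter(text)
    vernacular = sum(counts[ch] for ch in VERNACULAR_MARKERS)
    classical = sum(counts[ch] for ch in CLASSICAL_MARKERS)
    return vernacular, classical

vernacular, classical = marker_counts("學而時習之不亦說乎")
print("vernacular markers:", vernacular, "classical markers:", classical)
```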