Help wanted digitizing Wolff's Cebuano dictionary

jhellingman's picture

For many years, I've been running a small Philippine dictionary on my website at http://www.bohol.ph/diksyunaryo.php. Although it is used quite often, it is fairly limited and of rather low quality.

Recently, I've added a large dictionary for Hiligaynon to the site, available at http://www.bohol.ph/kved.php. This is nothing less than the complete text of Kaufmann's 1934 Visayan-English dictionary, which, although it shows its age, is still very useful and includes sample sentences for most Hiligaynon words. Both a searchable interface and a downloadable PDF file are available.

Now I want to do the same for Cebuano, and I have started a project to digitize Wolff's excellent dictionary, published in 1972 (with the permission of Prof. John U. Wolff). Since this is such a large work, I would like to invite your help.

For now, we need help from volunteers and students to correct the text derived from page images using OCR (optical character recognition) software. To help, follow these steps:

1. Go to the site http://www.pgdp.net/
2. Register as a member (link at the top right of the welcome page) and sign in.
3. Carefully read the proofreading guidelines at http://www.pgdp.net/c/faq/proofreading_guidelines.php
4. Go to the P1 (Proofreading Round 1) page (link at the top left of the activity hub page).
5. Locate the dictionary in the list of available projects (search the page for "Wolff" in your browser).
6. Open the project page for the dictionary.
7. Carefully read the special instructions for this dictionary on the PGDP wiki.
8. Start proofreading, cleaning up the output of the OCR software.

The dictionary will go through no fewer than five rounds of proofreading, so in the end we will have a digital text with a very high level of accuracy. I will use this text to create a searchable interface on my site, similar to the one for Kaufmann.

Contact me if you have any questions. If I can obtain more dictionaries, I will also add them to this site.

Comments

Wolff dictionary available online

After about four years of proofreading at Distributed Proofreaders, the digital edition of John U. Wolff's Dictionary of Cebuano Visayan is now nearing completion. This dictionary of over 1200 pages was first published in 1972, and is one of the most comprehensive Cebuano dictionaries available. For two years, an experimental interface to the raw dictionary data has been available on this site, but today we are ready to show a new interface to the fully checked and tagged text version of this dictionary.
John Wolff spent about 10 years producing this dictionary from scratch. With a team of local assistants, he collected words from actual spoken conversations and print publications in the old-fashioned way: using cards to note down each word and its usage. This way, the dictionary reflects the language as it was in the sixties. Some of its strengths for foreign learners are that it includes sample sentences with most entries, uses accents to help with the correct pronunciation, and identifies plants and animals with their scientific names.
For Cebuano speakers, some aspects of this dictionary may make it a little harder to use. First of all, the orthography will take some time to get used to, as Wolff resolutely purged the e and o from the alphabet, using i and u, respectively, in their place. Second, in many of the translations, especially those of the sample sentences, the author uses American idiom in an attempt to translate not only the literal meaning but also the connotation of the usage. Reading those will actually also help you improve your English!
Finally, this dictionary also indicates, through a system of codes, the various possible uses of verbs. This system, however, requires a careful reading of the introduction of the dictionary.
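As a rough illustration of the spelling rule mentioned above (i in place of e, u in place of o), here is a minimal Python sketch that converts a conventionally spelled word into that form before looking it up. The function name to_wolff_spelling is hypothetical and is not taken from the site's own scripts. It assumes a plain letter-for-letter substitution, so it ignores the accents and may not match how every headword (for example, a loanword) actually appears in the dictionary.

```python
# Hypothetical helper: applies only the e -> i and o -> u rule described above.
# Assumption: a plain letter-for-letter substitution; accents are not handled.
_WOLFF_MAP = str.maketrans({"e": "i", "o": "u", "E": "I", "O": "U"})


def to_wolff_spelling(word: str) -> str:
    """Return the word with e replaced by i and o replaced by u, keeping case."""
    return word.translate(_WOLFF_MAP)


if __name__ == "__main__":
    for word in ["babaye", "Bohol", "tubig"]:
        print(word, "->", to_wolff_spelling(word))
```

Words that already use only a, i, and u, such as "tubig", pass through unchanged.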
This project wouldn't have been possible without the help of countless volunteers proofreading the data and, even more important, the huge effort John Wolff put into compiling this work and the generosity of his publisher in placing it in the public domain.
All raw data and scripts used to produce this dictionary are available from Google Code (search phildic).
You are all now invited to play around with the interface (http://www.bohol.ph/wced.php), use it in whatever way you find useful, and tell us about your experiences in the comments below this article. All your ideas and criticism are welcome.