Related to the Nomen Nescio Projects

MUC-6; http://www.cs.nyu.edu/cs/faculty/grishman/muc6.html (visited Oct 2003)

MUC-6, the sixth in a series of Message Understanding Conferences, was held in November 1995. This conference, like the previous five MUCs, was organized by the Naval Research and Development group (NRaD) of NCCOSC (previously NOSC). These conferences, which have involved the evaluation of information extraction systems applied to a common task, have been funded by ARPA to measure and foster progress in information extraction. NYU and NRaD worked together to develop specifications for a set of four evaluation tasks:

  1. named entity recognition
  2. coreference
  3. template elements
  4. scenario templates (traditional information extraction)

MUC-7; http://www.itl.nist.gov/iad/894.02/related_projects/muc/proceedings/muc_7_toc.html (visited Oct 2003)

For the first time, the multilingual NE evaluation was run using training and test articles from comparable domains for all languages. The domain for all languages for training was airline crashes and the domain for all languages for testing was launch events. In MUC-7, there were more international sites participating than ever before. The papers reflect interesting observations by system developers who were non-native speakers of the language of their system and system developers who were native speakers of the language of their system.

ACE - Automatic Content Extraction; http://www.itl.nist.gov/iaui/894.01/tests/ace/index.htm (visited Oct 2003)

The objective of the ACE program is to develop automatic content extraction technology to support automatic processing of human language in text form. The program is devoted to three source types: newswire, broadcast news (with text derived from ASR), and newspaper (with text derived from OCR). ACE technology R&D is aimed at supporting various classification, filtering, and selection applications by extracting and representing language content (i.e., the meaning conveyed by the data). Thus the ACE program requires the development of technologies that automatically detect and characterize this meaning.

MUSE - a MUlti-Source Entity finder; http://www.dcs.shef.ac.uk/~hamish/muse.html (visited Oct 2003)

MUSE is an Information Extraction (IE) project to find named entities appearing in many different sorts of text with minimal alteration of the IE software involved. It is a 4 person-year project running through to approx. the end of 2001. The IE technology involved must adapt to different text types and information needs, so we could have called the project Adaptive MUlti-Source Entity finder (AMUSE), but then we would have been promising something that probably won't happen. Still, let's hope at least that it will be good science.


Related to the Nomen Nescio Workshops

ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition (Sapporo, Japan, July 7-12, 2003)

Background: Named Entity (NE) Recognition systems vary widely, from high-speed bulk methods optimized for indexing, to deep semantic parsers tuned for specific domains. Optimal ways to combine statistical and symbolic models also vary, depending on applications and tasks. Is it possible to:

  1. maximize use of knowledge-rich resources (e.g. lexicons, NE grammars, parsing) while permitting corpus-based training for domain or language?
  2. acquire and share resources (including lexicons and grammars) across languages?
  3. balance performance speed with reasonable accuracy?
  4. use specific language patterns while permitting rapid transfer to another language?
  5. minimize variability in results across language types?

Topics of the Workshop:

  1. the role of the lexicon vs. dynamic processing information
  2. grammars and lexicons shared (or ported) across languages
  3. acquisition of multilingual resources (e.g. from corpora)
  4. translating NEs across multiple languages
  5. domain tuning

HLT-NAACL 2003 Workshop on the Analysis of Geographic References (Edmonton, Canada, 27 May-June 1, 2003)

Background: Effective analysis of textual references to places is a critical core technology for a wide variety of NLP applications. This workshop focuses on topics concerning the recognition, disambiguation, normalization, storage, and display of geographic references, e.g., "New York", "Nueva York", "LaGuardia Airport", "LaGuardia", "[the] Brooklyn Bridge", "a mile from downtown Manhattan", "the southern tip of Manhattan Island", "the Amazon delta", "the San Diego-Tijuana border". Many place names are lexically ambiguous with respect to their physical location -- "Orange" as a county in either California or Florida, for example -- and sometimes also with respect to type -- "New York" as city or state in the U.S. Such names are relatively straightforward to normalize, though there is room for discussion about establishing some common normalization practices. Other forms of expression are more difficult to normalize, e.g., references to vague areas such as "the Amazon delta" and relative locations such as "a mile south of the village".

To respond to the challenges of providing accurate analyses in broad domains and across languages, and useful information on subjects for which there is sparse training data, advances in core technology are needed that can bring to bear both lexical and spatial background knowledge about places worldwide. For most applications, it is not enough for the system to bracket a text string and tag it as a "LOCATION"; it is necessary to normalize the information in a way that specifically describes or even uniquely identifies the place in question. Since texts may contain place references without providing all the extra information needed to disambiguate them, the system needs background knowledge in some form or another that it can draw on to tell it about known names and their types and locations. One form of knowledge resource is a placename gazetteer, and there are large electronic gazetteers in existence that are publicly available. How can they be tailored and exploited to meet the needs of NLP?

Such resources may also contain foreign names in native script or in transliteration. These name forms are critically needed to support analysis in multilingual and cross-lingual settings. Recognition of the various ways that a given place may be referenced in one language is a challenging problem in and of itself, and issues of name translation and transliteration and special character sets multiply that problem. Map-based visualization of the results of geographic reference analysis introduces the challenge of associating places with specific locations. Coordinates that identify a centerpoint of a place may be found in some gazetteers; in some cases, more extensive coordinates may be available that approximately define the boundaries of the place. Although it's obvious that such gazetteer information enables data visualization, it's also clear that it can sometimes be useful in the process of doing geographic reference analysis, e.g., to identify the set of states that are intended by a phrase such as "the states that neighbor California" or "the states on the coast of the Gulf of Mexico".
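The gazetteer-based disambiguation discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy gazetteer, its entries, and the function name `resolve_place` are hypothetical and do not come from any real gazetteer or workshop system. Among the candidate entries for an ambiguous name, the sketch keeps those whose containing region is mentioned in the surrounding text.

```python
# Toy gazetteer: every entry here is illustrative, not taken from a real resource.
GAZETTEER = {
    "orange": [
        {"name": "Orange County", "type": "county", "container": "California"},
        {"name": "Orange County", "type": "county", "container": "Florida"},
    ],
    "london": [
        {"name": "London", "type": "city", "container": "England"},
        {"name": "London", "type": "city", "container": "Ontario"},
    ],
}


def resolve_place(name, context):
    """Return candidate entries for `name`, narrowed by text evidence.

    A candidate is kept when its containing region appears in `context`.
    If no candidate is supported by the context, all candidates are
    returned, so the caller can see that the name is still ambiguous.
    """
    candidates = GAZETTEER.get(name.lower(), [])
    context_lower = context.lower()
    supported = [c for c in candidates if c["container"].lower() in context_lower]
    return supported or candidates


hits = resolve_place("London", "The flight from Toronto landed in London, Ontario.")
print([h["container"] for h in hits])
```

A real system would weigh name, type, and location evidence rather than rely on a single substring test; the fallback to the full candidate list simply keeps unresolved ambiguity visible.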

Topics of the Workshop:

  1. Disambiguation of recognized terms and geographic references based on text evidence ("London" in Ontario v. "London" in England; "New York" as city v. "New York" as state): Special aspects of recognizing and characterizing place entities (i.e., methods, etc. that do not apply equally to processing other types of entities, e.g., persons and organizations)
  2. Usage of background knowledge (external gazetteers or other knowledge sources) to assist in analysis of textual references to places: Partial term matching; cross-gazetteer term matching; methods for weighing name, type and location evidence to identify best match
  3. Usage of results of text analysis to improve knowledge resources: detecting and filling gaps in coverage of names, types and containment/coordinate data; detecting need to update existing entries due to text-based evidence of changes in place names and/or characteristics (e.g., a commercial building that is converted to a church, a city that becomes the capital of a newly defined republic)
  4. Gazetteer localization and multilingual fill; spelling normalization and cross-language term matching
  5. Automatic categorization and description of place names: Definition of standard categorization schemes and mapping among schemes; automatic processes for coarse- and fine-grained category assignment, e.g., to distinguish at some level among functional (bridge, building, ...), geographical (cave, bay, ...) and administrative-political (county, province, ...) types of places
  6. Interpretation and representation of complex references, such as relative locations ("10 miles from Ankara"), vague areas ("the Amazon delta"), boundaries ("the San Diego-Tijuana border"), metonymies ("Green Bay" used in reference to football team)
  7. Language analysis uses for data on absolute coordinates (bounding boxes, polygons, polylines, etc.) in gazetteers, e.g., understanding containment, border and neighbor relationships via knowledge of areas and boundaries
  8. Standards for annotation and data interchange

IJCNLP 2004 Workshop on Named Entity Recognition for Natural Language Processing Applications (Sanya, Hainan, 26 March, 2004)

Named Entities (NEs) occupy a considerable proportion in natural language and have remained an important area in natural language processing (NLP). The recognition of proper names as unknown words has long been an issue in word segmentation and part-of-speech tagging, especially for non-alphabetic Asian languages and interlingual NLP involving these languages. Named entities constitute significant pieces of data in information extraction. Proper transliteration of named entities, especially proper names, is critical for the intelligibility and accuracy of machine translation output. This workshop aims at bringing researchers together to discuss the issues and advances in NE recognition and extraction, and how NE could be handled most cost-effectively in a variety of NLP applications.

Topics of the Workshop:

  1. Symbolic and statistical models for NE recognition
  2. NE recognition systems
  3. Translation of NEs across multiple languages
  4. Resources (lexicons, grammars) for NE extraction
  5. NE recognition as a subtask in NLP applications
  6. Evaluation of NE processing in NLP applications

LREC 2004 Workshop: Beyond Named Entity Recognition, Semantic Labelling for NLP Tasks (Lisbon, Portugal, 25 May, 2004)

Although it is generally assumed that improvements in language processing will be made through the integration of linguistic information and statistical techniques, the reality is that language is very diverse and looking for specific patterns of words that repeat enough to be statistically significant tends not to be a very fruitful task: sequences longer than three words are not generally repeated often enough to be statistically significant. At the same time, the identification of named entities (names, dates, places, organizations, etc.) has proved to be a very useful preliminary task in many natural language processing systems. We are interested in pursuing approaches which extend this notion by identifying and labeling other semantic information in a text, in such a way as to allow repeatable semantic patterns to emerge. Our interest is in attacking the data sparseness problem by exploring ways to collapse (semantically) related phrases which are expressed by different word sequences.

Topics of the Workshop:

  1. Methods for lexical-semantic annotation of corpora
  2. Methods and Standards for lexical semantic representation of dictionary information
  3. Lexico-semantic taxonomies
  4. Existing sources of classification: dictionaries, thesauri and computerized ontologies
  5. Corpus-driven methods for semantic disambiguation
  6. Feature selection for semantic disambiguation
  7. Lexico-semantic tagging of very large corpora
  8. Algorithms and methods for disambiguation of semantic phenomena
  9. Statistical learning models and their applications to semantic labeling
  10. Computational learning frameworks for Natural Language Learning
  11. Semi-supervised and unsupervised statistical semantic disambiguation
  12. Evaluation of semantic disambiguation

ACM SIGIR 2004 Workshop on Geographic Information Retrieval (Sheffield, UK, 29th July, 2004)

Almost all human activity can be regarded as taking place within geographic space and as a consequence there are many types of information that are geographically referenced, in the sense that they refer to somewhere on Earth. Information technology for handling geographic information has been based largely on the highly structured map-based representations of space that are used in most geographical information systems (GIS). Relatively little effort has been expended on developing facilities required to access less structured, textual information, in which geographical context may be given by place names and associated terminology for spatial relationships. Such geographical text is commonly found in web documents, but geographical terms are considered by conventional search engines no differently to other search terms. As a consequence, documents will only be retrieved if they contain exact matches with the geographical terminology in the query expression. Documents that refer to alternative versions of the query place name or to places that are in the vicinity, either nearby or even within the query place are unlikely to be found. In recent years a variety of work has looked at the potential of indexing and retrieving unstructured text from the web using geospatial location. A number of examples of geographically oriented search engines exist, and the growth of services based around location provides further impetus for attempts to develop geographical information retrieval. The purpose of this workshop is to bring together the growing community of researchers and practitioners working in the field of geographic information retrieval to discuss progress within the field and discuss future research strands. Examples of topics that are particularly relevant include, but are not confined to:

Topics of the Workshop:

  1. architectures for geographic search engines;
  2. spatial indexing of documents and images;
  3. extraction of geographical context from documents and geo-datasets;
  4. geographical annotation techniques for geo-referenced documents;
  5. design, construction, maintenance and access methods for geographical ontologies, gazetteers and geographical thesauri;
  6. geographical query interfaces for the web and geospatial libraries;
  7. visualising of the results of geographic searches;
  8. relevance ranking for geographical search;
  9. web portals to geo-information; and
  10. standards for exchange of unstructured or partially-structured geographical information.
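
The exact-match limitation described in the workshop background can be illustrated with a small Python sketch. It is a toy under stated assumptions: the lookup tables `ALTERNATIVE_NAMES` and `CONTAINED_PLACES` and the functions `expand_place` and `search` are hypothetical stand-ins for a gazetteer or geographic thesaurus, not part of any system mentioned here.

```python
# Hypothetical lookup tables standing in for a gazetteer or geographic thesaurus.
ALTERNATIVE_NAMES = {"new york": ["nueva york"]}
CONTAINED_PLACES = {"new york": ["manhattan", "brooklyn"]}


def expand_place(place):
    """Return the query place plus its alternative names and contained places."""
    key = place.lower()
    return [key] + ALTERNATIVE_NAMES.get(key, []) + CONTAINED_PLACES.get(key, [])


def search(documents, place, expand=True):
    """Return the indices of documents that mention the place.

    With expand=False this is plain substring matching on the query place,
    mimicking a search engine that treats place names like any other term.
    """
    terms = expand_place(place) if expand else [place.lower()]
    return [i for i, doc in enumerate(documents)
            if any(term in doc.lower() for term in terms)]


docs = [
    "Hotels in New York",
    "Un fin de semana en Nueva York",
    "A walking tour of downtown Manhattan",
    "Museums in Sheffield",
]
print(search(docs, "New York", expand=False))  # exact matching only
print(search(docs, "New York"))                # with geographic expansion
```

Without expansion only the first document is found; with expansion the documents using the alternative name and a contained place are retrieved as well. Handling places "in the vicinity" would additionally need coordinate or adjacency data, which this sketch leaves out.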

Nomen Nescio Bibliography

Alhaug, Gulbrand (2001). Automatisk namneattkjennar. Nytt om namn - Meldingsblad for Norsk namnlag, nr. 33, p. 28-29.
Bick, Eckhard (2004-5). A Named Entity Recognizer for Danish. In: Lino et al. (eds.), Proc. 4th International Conf. on Language Resources and Evaluation, LREC2004 (Lisbon, 2004), pp. 305-308.
Bick, Eckhard (2003-6). Multi-Level NER for Portuguese in a CG Framework. In: Nuno J. Mamede et al. (eds.) Computational Processing of the Portuguese Language (Proceedings of PROPOR2003, Faro, June 26-27, 2003), pp. 118-125. Springer
Bick, Eckhard (2003-5). Multi-Level NER in a CG framework. In Proceedings of NoDaLiDa2003, 30-31. May 2003, Reykjavik, forthcoming
Bick, Eckhard (2003-2). Named Entity Recognition for Danish. In: Henrik Holmboe (ed.), Nordic Language Technology, Årbog for Nordisk Sprogteknologisk Forskningsprogram 2000-2004 (Yearbook 2002). pp. 331-349, Copenhagen: Museum Tusculanum
Björk Jónsdóttir A. (2003). En vei ved navn Jesus? In H. Holmboe (ed.): Nordisk Sprogteknologi/Nordic Language Technology 2002. Museum Tusculanums Forlag, Københavns Universitet, København, pp. 373-378.
Björk Jónsdóttir A. (2003). ARNER - what kind of a name is that? - An automatic rule-based named entity recogniser for Norwegian. Cand.philol. thesis, University of Oslo. (146 pages)
Björk Jónsdóttir A. (2002). Named Entity Recognition for Norwegian. In M. Nissim (ed.) Proceedings of the seventh ESSLLI Student Session.
Bondi Johannessen J., Hagen K., Haaland Å., Kokkinakis D., Meurer P., Bick E., Haltrup D., Björk Jónsdóttir A. and Nøklestad A. (2005). Named Entity Recognition for the Mainland Scandinavian Languages. Literary and Linguistic Computing, Volume 20, issue 1. (12 pages)
Bondi Johannessen J. (2004). Named Entity Recognition in Scandinavian: The Nomen Nescio Project. In H. Holmboe (ed.) Nordisk Sprogteknologi 2003, Museum Tusculanums Forlag, København, 149-158.
Bondi Johannessen J. (2003). Nomen Nescio: Nettverk for en automatisk navnegjenkjenner for norsk, svensk og dansk. Nordisk Sprogteknologi 2002 (ed. Henrik Holmboe). Museum Tusculanums Forlag, Københavns universitet, 327-330.
Bondi Johannessen J., Meurer P. and Hagen K. (2003). Recognising word strings as names. Paper presented at NoDaLiDa (Nordiske datalingvistikkdager), Reykjavik, Iceland.
Bondi Johannessen J. and Meurer P. (2002). Automatisk gjenkjenning av vanskelige navn. In Moen, I., H.G. Simonsen, A. Torp and K.I. Vannebo (eds.) MONS 9, Novus forlag, Oslo, pp. 141-149.
Bondi Johannessen J. (2002). Nomen Nescio-prosjektet: Bakgrunn og aktiviteter. In H. Holmboe (ed.): Nordisk Sprogteknologi/Nordic Language Technology 2001. Museum Tusculanums Forlag, Københavns Universitet, København, pp. 133-140.
Bondi Johannessen J. (2001). En automatisk navnegjenkjenner for norsk, svensk og dansk. Paper presented at NoDaLiDa, Uppsala, Sweden.
Botolv H. (2004). Automatisk namneattkjenning - ein reiskap i namnegranskinga. Årsmelding for Seksjon for namnegransking.
Haaland Å.T.F. (2003). Classification of Names in Norwegian Text Based on Maximum Entropy Modeling. NoDaLiDa. Iceland
Haaland Å.T.F. (200?). Automatic Classification of Names in Norwegian Text Based on Maximum Entropy Modeling. Ongoing PhD Thesis, financed by the "Nordisk Ministerråds Språkteknologiprogram"
Hagen K. (2003). Automatisk sammenslåing av navn. In Nordisk Sprogteknologi 2002. København. Museum Tusculanums Forlag. Pp. 351-356
Kokkinakis D. (2002). Namnigenkänning på svenska. Nordisk Sprogteknologi - Nordic Language Technology. Årbog for Nordisk Sprogteknologisk Forskningsprogram 2000-2004. pp. 167-171. Holmboe H. (ed.). Museum Tusculanums Forlag.
Kokkinakis D. (2003). Swedish NER in the Nomen Nescio Project. Nordisk Sprogteknologi - Nordic Language Technology. Årbog for Nordisk Sprogteknologisk Forskningsprogram 2000-2004. pp. 379-398. Holmboe H. (ed.). Museum Tusculanums Forlag.
Kokkinakis D. (2004). Reducing the effect of name explosion. LREC Workshop: Beyond Named Entity Recognition Semantic labelling for NLP tasks. Lisbon, Portugal.
Kokkinakis D. (2004). Att automatiskt känna igen och kategorisera namn i svenska texter. In "Vision och Verklighet", Humanistdagarna. Humanistdag-boken nr 17. Pp. 157-163. Göteborg University.
Nøklestad, Anders (2004). Memory-based Classification of Proper Names in Norwegian. Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004).
Nøklestad, Anders (in progress). Statistical Methods for Acquiring Some Grammatical Knowledge of Norwegian. Ph.D. thesis, University of Oslo.

Related to the Nomen Nescio Bibliography

Appelt D.E. and Israel D. (1999). Introduction to Information Extraction Technology. Tutorial Presented at the International Joint Conference on Artificial Intelligence (IJCAI-99). Stockholm, Sweden.
Bikel D.M., Schwartz R. and Weischedel R.M. (1999). An Algorithm that Learns What's in a Name. Machine Learning 34 (1-3): 211-231, Kluwer Academic Publishers
Black W.J., Rinaldi F. and Mowatt D. (1998). FACILE: Description of the NE System used for MUC-7. In Proceedings of the MUC-7, Washington D.C.
Borthwick A., Sterling J., Agichtein E. and Grishman R. (1998). NYU: Description of the MENE Named Entity System as Used in MUC-7. Proceedings of the Seventh Message Understanding Conference (MUC-7). Washington D.C.
Chinchor N. (1997). MUC-7 Named Entity Definition, version 3.5. Available from: http://www.muc.saic.com/proceedings/ne_task.html.
Collins M. and Singer Y. (1999). Unsupervised models for named entity classification. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Cucchiarelli A., Luzi D., and Velardi P. (1997). Automatic Semantic Tagging of Unknown Proper Names. In Proceedings of the COLING/ACL Conference, pp. 286-292, Montreal, Canada
Douthat A. (1998). The Message Understanding Conference Scoring Software User's Manual. http://www.itl.nist.gov/iaui/894.02/related_projects/muc_sw/muc_sw_manual.html.
Freitag D. (1998). Machine Learning for Information Extraction in Informal Domains. PhD Thesis. CMU-CS-99-104, Pittsburgh, PA.
Friburger N. and Maurel D. (2002). Textual similarity based on proper names, Mathematical Formal Information Retrieval (MFIR'2002), pp. 155-167. Tampere, Finland.
Gallippi A. (1996). Learning to Recognize Names Across Languages. Proceedings of the 16th International Conference on Computational Linguistics, (COLING), vol. 1:424-429. Copenhagen, Denmark.
Grishman R. (1995). The NYU System for MUC-6 or Where's the Syntax? Proceedings of the MUC-6 workshop, Washington. November 1995.
Hobbs et al. (1996). FASTUS: A Cascaded Finite-State Transducer for Extracting Information from Natural-Language Texts. Available from: http://www.ai.sri.com/~hobbs/.
Karkaletsis V., Spyropoulos C.D. and G. Petasis. (1999). Named Entity Recognition from Greek texts: the GIE Project. In S.Tzafestas, editor, Advances in Intelligent Systems: Concepts, Tools and Applications, pages 131-142. Kluwer Academic Publishers.
Krupka G.R. and Hausman K. (1998). IsoQuest, Inc.: Description of the NetOwl Extractor System as Used for MUC-7. Proceedings of the 7th Message Understanding Conference (MUC-7). Washington D.C.
Lehrer A. (1992). Names and Naming: Why we Need Fields and Frames. Frames, Fields and Contrasts. New Essays in Semantic and Lexical Organization. Lehrer A. and Kittay E.F. (eds). Lawrence Erlbaum Associates.
Mani I. and MacMillan R. T. (1996). Identifying Unknown Proper Names in Newswire Text. In Corpus Processing for Lexical Acquisition, pp. 41-59. MIT Press. Cambridge, MA.
Maynard D. et al (2001). Named Entity Recognition from Diverse Text Types. Recent Advances in Natural Language Processing 2001 Conference, Tzigov Chark, Bulgaria.
Maynard D., Bontcheva K. and H. Cunningham. (2003). Towards a semantic extraction of named entities. Recent Advances in Natural Language Processing, Bulgaria, 2003.
McDonald D. (1996). Internal and External Evidence in the Identification and Semantic Categorisation of Proper Nouns. Corpus-Processing for Lexical Acquisition, 21-39. Pustejovsky J. and Boguraev B. (eds). MIT Press.
Mikheev A., Moens M. and Grover C. (1999). Named Entity Recognition without Gazetteers. Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 1-8. Bergen, Norway.
Nobata C., Collier N. H. and Tsujii J. (2000). Comparison between Tagged Corpora for the Named Entity Task. Proceedings of the Workshop on Comparing Corpora (at ACL'2000), Kilgarriff A. and Berber Sardinha T. (eds.), pp.20-27, Hong Kong University of Science and Technology.
Paik W., Liddy E.D., Yu E. and McKenna M. (1996). Categorizing and Standardizing Proper Nouns for Efficient Information Retrieval. Corpus Processing for Lexical Acquisition, 61-73. Boguraev B. and Pustejovsky J. (eds). Bradford.
Pastra K. (2000). Basic Semantic Element Extraction: The Rule Writing Experience. MSc Thesis. University of Manchester
Pastra K., Maynard D., Hamza O., Cunningham H. and Wilks Y. (2002). How feasible is the reuse of grammars for Named Entity Recognition? Proceedings of the 3rd Conference On Language Resources and Evaluation. Las Palmas, Spain
Poibeau T. and Kosseim L. (2001). Proper Name Extraction from Non-Journalistic Texts. Proceedings of the Eleventh Meeting of Computational Linguistics in the Netherlands (CLIN), pp. 144-157.
Ravin Y. and Wacholder N. (1997). Extracting names from natural-language text. IBM Research Report, RC 20338.
Sekine S., Grishman R. and Shinnou H. (1998). A Decision Tree Method for Finding and Classifying Names in Japanese Texts. Proceedings of the Sixth Workshop on Very Large Corpora. Charniak E. (ed), Montréal, Canada
Sekine S. (1998). NYU: Description of the Japanese NE System used for MET. In Proceedings of the MUC-7, Washington D.C.
Sekine S., Sudo K. and Nobata C. (2002). Extended Named Entity Hierarchy. Proceedings of the 3rd Conference On Language Resources and Evaluation. Las Palmas, Spain
Sheremetyeva S., Cowie J., Nirenburg S. and Zajac R. (1998). A Multilingual Onomasticon as a Multipurpose NLP Resource. In Proceedings of the 1st Language Resources and Evaluation Conference (LREC), Granada, Spain
Solorio T. and López A. (2004). Learning named entity classifiers using support vector machines. In A. Gelbukh, editor, CICLing-2004, Lecture Notes in Computer Science. Springer-Verlag.
Sun J., Gao J.F., Zhang L., Zhou M. and Huang C.N. (2002). Chinese Named Entity Identification Using Class-based Language Model. pp. 967-973. In Proceedings of the 19th International Conference on Computational Linguistics (COLING2002).
Takeuchi K. and Collier N. (2002). Use of Support Vector Machines in Extended Named Entity Recognition. The 6th Conference on Natural Language Learning.
Thompson P. and Dozier C. (1999). Name Recognition and Retrieval Performance. Natural Language Information Retrieval, 261-272. Strzalkowski T. (ed.). Kluwer Academic Publishers.
Wakao T., Gaizauskas R. and Wilks Y. (1996). Evaluation of an Algorithm for the Recognition of Proper Names. Proceedings of the 16th International Conference on Computational Linguistics, (COLING), 418-423. Copenhagen, Denmark.