Related to the Nomen Nescio Workshops
ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition; (Sapporo, Japan, July 7-12, 2003)
Background: Named Entity (NE) Recognition systems vary widely, from high-speed bulk methods optimized for indexing, to deep semantic parsers tuned for specific domains. Optimal ways to combine statistical and symbolic models also vary, depending on applications and tasks. Is it possible to:
- maximize use of knowledge-rich resources (e.g. lexicons, NE grammars, parsing) while permitting corpus-based training for domain or language?
- acquire and share resources (including lexicons and grammars) across languages?
- balance performance speed with reasonable accuracy?
- use specific language patterns while permitting rapid transfer to another language?
- minimize variability in results across language types?
Topics of the Workshop:
- the role of the lexicon vs. dynamic processing information
- grammars and lexicons shared (or ported) across languages
- acquisition of multilingual resources (e.g. from corpora)
- translating NEs across multiple languages
- domain tuning
HLT-NAACL 2003 Workshop on the Analysis of Geographic References; (Edmonton, Canada, 27 May-June 1, 2003)
Background: Effective analysis of textual references to places is a critical core
technology for a wide variety of NLP applications. This workshop focuses
on topics concerning the recognition, disambiguation, normalization,
storage, and display of geographic references, e.g., "New York", "Nueva
York", "LaGuardia Airport", "LaGuardia", "[the] Brooklyn Bridge", "a mile
from downtown Manhattan", "the southern tip of Manhattan Island", "the
Amazon delta", "the San Diego-Tijuana border".
Many place names are lexically ambiguous with respect to their physical
location -- "Orange" as a county in either California or Florida, for
example -- and sometimes also with respect to type -- "New York" as city
or state in the U.S. Such names are relatively straightforward to
normalize, though there is room for discussion about establishing some
common normalization practices. Other forms of expression are more
difficult to normalize, e.g., references to vague areas such as "the
Amazon delta" and relative locations such as "a mile south of the
village".
To respond to the challenges of providing accurate analyses in broad
domains and across languages, and useful information on subjects for which
there is sparse training data, advances in core technology are needed that
can bring to bear both lexical and spatial background knowledge about
places worldwide. For most applications, it is not enough for the system
to bracket a text string and tag it as a "LOCATION"; it is necessary to
normalize the information in a way that specifically describes or even
uniquely identifies the place in question.
Since texts may contain place references without providing all the extra
information needed to disambiguate them, the system needs background
knowledge in some form or another that it can draw on to tell it about
known names and their types and locations. One form of knowledge resource
is a placename gazetteer, and there are large electronic gazetteers in
existence that are publicly available. How can they be tailored and
exploited to meet the needs of NLP?
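As a rough illustration of how gazetteer background knowledge can support disambiguation, the sketch below looks up every entry for an ambiguous name and keeps the one whose containing region is mentioned in the surrounding text. The `GazetteerEntry` record, the tiny `GAZETTEER` list, the approximate coordinates and the matching rule are all assumptions made for this example; they do not describe any particular gazetteer or workshop system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    feature_type: str  # e.g. "county", "city"
    container: str     # region that contains the place
    lat: float         # approximate centerpoint, illustrative only
    lon: float


# Hypothetical toy gazetteer; a real one holds millions of entries.
GAZETTEER = [
    GazetteerEntry("Orange", "county", "California", 33.7, -117.8),
    GazetteerEntry("Orange", "county", "Florida", 28.5, -81.3),
    GazetteerEntry("London", "city", "England", 51.5, -0.1),
    GazetteerEntry("London", "city", "Ontario", 43.0, -81.2),
]


def candidates(name: str) -> List[GazetteerEntry]:
    """Return every gazetteer entry whose name matches, ignoring case."""
    return [e for e in GAZETTEER if e.name.lower() == name.lower()]


def disambiguate(name: str, context: str) -> Optional[GazetteerEntry]:
    """Pick the candidate whose container is mentioned in the context.

    Returns None when the context does not single out exactly one candidate.
    """
    context_lower = context.lower()
    matches = [e for e in candidates(name) if e.container.lower() in context_lower]
    return matches[0] if len(matches) == 1 else None
```

For example, `disambiguate("Orange", "Orange County, Florida, issued a storm warning")` selects the Florida entry, while a context that names neither state leaves the reference unresolved.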
Such resources may also contain foreign names in native script or in
transliteration. These name forms are critically needed to support
analysis in multilingual and cross-lingual settings. Recognition of the
various ways that a given place may be referenced in one language is a
challenging problem in and of itself, and issues of name translation and
transliteration and special character sets multiply that problem.
Map-based visualization of the results of geographic reference analysis
introduces the challenge of associating places with specific locations.
Coordinates that identify a centerpoint of a place may be found in some
gazetteers; in some cases, more extensive coordinates may be available
that approximately define the boundaries of the place. Although it's
obvious that such gazetteer information enables data visualization, it's
also clear that it can sometimes be useful in the process of doing
geographic reference analysis, e.g., to identify the set of states that
are intended by a phrase such as "the states that neighbor California" or
"the states on the coast of the Gulf of Mexico".
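To make the idea of deriving neighbor relationships from boundary coordinates concrete, here is a minimal sketch that treats two regions as neighbors when their bounding boxes touch or overlap. The region names and coordinates are invented for illustration, and the bounding-box test is a deliberate simplification: it over-generates for irregular shapes, so a real system would more likely need polygon boundaries.

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def touches_or_overlaps(a: BBox, b: BBox) -> bool:
    """True when the two boxes share at least one point (edge, corner or area)."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
            and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat)


# Hypothetical regions on an abstract grid, not real gazetteer data.
REGIONS: Dict[str, BBox] = {
    "West": BBox(0, 0, 2, 4),
    "East": BBox(2, 0, 4, 4),
    "North": BBox(0, 4, 4, 6),
    "Far": BBox(5, 0, 6, 4),
}


def neighbors(name: str) -> List[str]:
    """List the regions whose bounding box touches or overlaps the named one."""
    target = REGIONS[name]
    return sorted(
        other for other, box in REGIONS.items()
        if other != name and touches_or_overlaps(target, box)
    )
```

A phrase such as "the regions that neighbor West" could then be resolved with `neighbors("West")`, which returns the regions sharing an edge with it in this toy layout.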
Topics of the Workshop:
- Disambiguation of recognized terms and geographic references based on
text evidence ("London" in Ontario v. "London" in England; "New York" as
city v. "New York" as state): Special aspects of recognizing and
characterizing place entities (i.e., methods, etc. that do not apply
equally to processing other types of entities, e.g., persons and
organizations)
- Usage of background knowledge (external gazetteers or other knowledge
sources) to assist in analysis of textual references to places: Partial
term matching; cross-gazetteer term matching; methods for weighing name,
type and location evidence to identify best match
- Usage of results of text analysis to improve knowledge resources:
detecting and filling gaps in coverage of names, types and
containment/coordinate data; detecting need to update existing entries due
to text-based evidence of changes in place names and/or characteristics
(e.g., a commercial building that is converted to a church, a city that
becomes the capital of a newly defined republic)
- Gazetteer localization and multilingual fill; spelling normalization and
cross-language term matching
- Automatic categorization and description of place names: Definition of
standard categorization schemes and mapping among schemes; automatic
processes for coarse- and fine-grained category assignment, e.g., to
distinguish at some level among functional (bridge, building, ...),
geographical (cave, bay, ...) and administrative-political (county,
province, ...) types of places
- Interpretation and representation of complex references, such as
relative locations ("10 miles from Ankara"), vague areas ("the Amazon
delta"), boundaries ("the San Diego-Tijuana border"), metonymies ("Green
Bay" used in reference to football team)
- Language analysis uses for data on absolute coordinates (bounding boxes,
polygons, polylines, etc.) in gazetteers, e.g., understanding containment,
border and neighbor relationships via knowledge of areas and boundaries
- Standards for annotation and data interchange
IJCNLP 2004 Workshop on Named Entity Recognition for Natural Language Processing Applications; (Sanya, Hainan, 26 March, 2004)
Named Entities (NEs) make up a considerable proportion of natural language text and remain an important area of natural language processing (NLP). The recognition of proper names as unknown words has long been an issue in word segmentation and part-of-speech tagging, especially for non-alphabetic Asian languages and interlingual NLP involving these languages. Named entities constitute significant pieces of data in information extraction. Proper transliteration of named entities, especially proper names, is critical for the intelligibility and accuracy of machine translation output. This workshop aims to bring researchers together to discuss the issues and advances in NE recognition and extraction, and how NEs could be handled most cost-effectively in a variety of NLP applications.
Topics of the Workshop:
- Symbolic and statistical models for NE recognition
- NE recognition systems
- Translation of NEs across multiple languages
- Resources (lexicons, grammars) for NE extraction
- NE recognition as a subtask in NLP applications
- Evaluation of NE processing in NLP applications
LREC 2004 Workshop: Beyond Named Entity Recognition, Semantic labelling for NLP tasks; (Lisbon, Portugal, 25 May, 2004)
Although it is generally assumed that improvements in language
processing will be made through the integration of linguistic
information and statistical techniques, the reality is that language
is very diverse and looking for specific patterns of words that repeat
enough to be statistically significant tends not to be a very fruitful
task: sequences longer than three words are not generally repeated
often enough to be statistically significant. At the same time, the
identification of named entities (names, dates, places, organizations,
etc.) has proved to be a very useful preliminary task in many natural
language processing systems. We are interested in pursuing approaches
which extend this notion by identifying and labeling other semantic
information in a text, in such a way as to allow repeatable semantic
patterns to emerge. Our interest is in attacking the data sparseness
problem by exploring ways to collapse (semantically) related phrases
which are expressed by different word sequences.
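The effect of collapsing related phrases can be shown with a small sketch: once words are replaced by semantic labels, word sequences that each occur only once fall together into a single repeated pattern. The `LABELS` lexicon, the label names and the whitespace tokenization are assumptions chosen for this illustration and are not taken from the workshop description.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical semantic lexicon mapping word forms to coarse labels.
LABELS = {
    "alice": "PERSON", "bob": "PERSON",
    "paris": "PLACE", "oslo": "PLACE",
    "monday": "DAY", "friday": "DAY",
}


def label_sequence(sentence: str) -> Tuple[str, ...]:
    """Replace each known word by its semantic label; keep other words lowercased."""
    return tuple(LABELS.get(tok.lower(), tok.lower()) for tok in sentence.split())


def pattern_counts(sentences: Iterable[str]) -> Counter:
    """Count how often each labeled sequence occurs."""
    return Counter(label_sequence(s) for s in sentences)


sentences = ["Alice flew to Paris on Monday", "Bob flew to Oslo on Friday"]
raw_counts = Counter(tuple(s.lower().split()) for s in sentences)
labeled_counts = pattern_counts(sentences)
```

Here every raw six-word sequence in `raw_counts` occurs once, whereas `labeled_counts` contains a single pattern, `PERSON flew to PLACE on DAY`, seen twice.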
Topics of the Workshop:
- Methods for lexical-semantic annotation of corpora
- Methods and Standards for lexical semantic representation of
dictionary information
- Lexico-semantic taxonomies
- Existing sources of classification: dictionaries, thesauri and
computerized ontologies
- Corpus-driven methods for semantic disambiguation
- Feature selection for semantic disambiguation
- Lexico-semantic tagging of very large corpora
- Algorithms and methods for disambiguation of semantic phenomena
- Statistical learning models and their applications to semantic
labeling
- Computational learning frameworks for Natural Language Learning
- Semi-supervised and unsupervised statistical semantic disambiguation
- Evaluation of semantic disambiguation
ACM SIGIR 2004 Workshop on Geographic Information Retrieval; (Sheffield, UK, 29th July, 2004)
Almost all human activity can be regarded as taking place within geographic space and as a consequence there are many types of information that are geographically referenced, in the sense that they refer to somewhere on Earth. Information technology for handling geographic information has been based largely on the highly structured map-based representations of space that are used in most geographical information systems (GIS). Relatively little effort has been expended on developing facilities required to access less structured, textual information, in which geographical context may be given by place names and associated terminology for spatial relationships. Such geographical text is commonly found in web documents, but geographical terms are considered by conventional search engines no differently to other search terms. As a consequence, documents will only be retrieved if they contain exact matches with the geographical terminology in the query expression. Documents that refer to alternative versions of the query place name or to places that are in the vicinity, either nearby or even within the query place are unlikely to be found. In recent years a variety of work has looked at the potential of indexing and retrieving unstructured text from the web using geospatial location. A number of examples of geographically oriented search engines exist, and the growth of services based around location provides further impetus for attempts to develop geographical information retrieval. The purpose of this workshop is to bring together the growing community of researchers and practitioners working in the field of geographic information retrieval to discuss progress within the field and discuss future research strands. Examples of topics that are particularly relevant include, but are not confined to:
Topics of the Workshop:
- architectures for geographic search engines;
- spatial indexing of documents and images;
- extraction of geographical context from documents and geo-datasets;
- geographical annotation techniques for geo-referenced documents;
- design, construction, maintenance and access methods for geographical ontologies, gazetteers and geographical thesauri;
- geographical query interfaces for the web and geospatial libraries;
- visualisation of the results of geographic searches;
- relevance ranking for geographical search;
- web portals to geo-information; and
- standards for exchange of unstructured or partially-structured geographical information.
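The exact-match limitation described in the workshop background can be illustrated with a minimal sketch of geographic query expansion: the query place name is expanded with alternative names and with places it contains before documents are matched. The `ALT_NAMES` and `CONTAINS` tables stand in for a geographical thesaurus or gazetteer and are hypothetical, as is the simple substring matching.

```python
from typing import Dict, List

# Hypothetical thesaurus fragments; a real system would draw on a gazetteer.
ALT_NAMES: Dict[str, List[str]] = {"new york": ["nueva york"]}
CONTAINS: Dict[str, List[str]] = {"new york": ["manhattan", "brooklyn"]}


def expand_query(place: str) -> List[str]:
    """Return the query place plus its alternative names and contained places."""
    key = place.lower()
    return [key] + ALT_NAMES.get(key, []) + CONTAINS.get(key, [])


def retrieve(place: str, documents: List[str]) -> List[str]:
    """Return the documents that mention any term of the expanded query."""
    terms = expand_query(place)
    return [doc for doc in documents if any(term in doc.lower() for term in terms)]


docs = ["A cafe in Brooklyn", "Hotels in Nueva York", "Weather in Oslo"]
exact_hits = [doc for doc in docs if "new york" in doc.lower()]
expanded_hits = retrieve("New York", docs)
```

With these toy documents, exact matching on "New York" finds nothing, while the expanded query retrieves the Brooklyn and Nueva York documents. Relevance ranking and vicinity search are left out of this sketch.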
Nomen Nescio Bibliography
Alhaug, Gulbrand (2001). Automatisk namneattkjennar. Nytt om namn - Meldingsblad for
Norsk namnlag, nr. 33, p. 28-29.
Bick, Eckhard (2004-5). A Named Entity Recognizer for Danish. In: Lino
et al. (eds.), Proc. 4th International Conf. on Language Resources and Evaluation, LREC2004 (Lisbon, 2004), pp. 305-308.
Bick, Eckhard (2003-6). Multi-Level NER for Portuguese in a CG Framework. In: Nuno J. Mamede et al. (eds.) Computational Processing of
the Portuguese Language (Proceedings of PROPOR2003, Faro, June 26-27, 2003), pp. 118-125. Springer.
Bick, Eckhard (2003-5). Multi-Level NER in a CG framework. In
Proceedings of NoDaLiDa2003, 30-31. May 2003, Reykjavik, forthcoming
Bick, Eckhard (2003-2). Named Entity Recognition for Danish. In: Henrik
Holmboe (red.), Nordic Language Technology, Årbog for Nordisk Sprogteknologisk Forskningsprogram 2000-2004 (Yearbook 2002). p. 331-349, Copenhagen: Museum Tusculanum
Björk Jónsdóttir A. (2003). En vei ved navn Jesus? In H. Holmboe (red.):
Nordisk Sprogteknologi /Nordic Language Technology 2002. Museum
Tusculanums Forlag, Københavns Universitet, København, s.373-378.
Björk Jónsdóttir A. (2003). ARNER - what kind of a name is that? - An
automatic rule-based named entity recogniser for Norwegian. Cand.philol.
thesis, University of Oslo. (146 pages)
Björk Jónsdóttir A. (2002). Named Entity Recognition for Norwegian. In M.
Nissim (ed.) Proceedings of the seventh ESSLLI Student Session.
Bondi Johannessen J., Hagen K., Haaland Å., Kokkinakis D., Meurer P., Bick
E., Haltrup D., Björk Jónsdóttir A. and Nøklestad A. (2005). Named Entity
Recognition for the Mainland Scandinavian Languages. Literary and
Linguistic Computing, Volume 20, issue 1. (12 pages)
Bondi Johannessen J. (2004). Named Entity Recognition in Scandinavian: The
Nomen Nescio Project. In H. Holmboe (ed.) Nordisk Sprogteknologi 2003,
Museum Tusculanums Forlag, København, 149-158.
Bondi Johannessen J. (2003). Nomen Nescio: Nettverk for en automatisk
navnegjenkjenner for norsk, svensk og dansk. Nordisk Sprogteknologi 2002
(ed. Henrik Holmboe). Museum Tusculanums Forlag, Københavns universitet, 327-330.
Bondi Johannessen J.; Meurer P. and Hagen K. (2003). Recognising word
strings as names. Paper presented at NoDaLiDa (Nordiske
datalingvistikkdager), Reykjavik, Iceland.
Bondi Johannessen J. and Meurer P. (2002). Automatisk gjenkjenning av
vanskelige navn. In Moen, I., H.G. Simonsen, A. Torp og K.I. Vannebo
(red.) MONS 9, Novus forlag, Oslo, s. 141-149.
Bondi Johannessen J. (2002). Nomen Nescio-prosjektet: Bakgrunn og
aktiviteter. I H. Holmboe (red.): Nordisk Sprogteknologi/Nordic Language
Technology 2001. Museum Tusculanums Forlag, Københavns Universitet, København, s.133-140.
Bondi Johannessen J. (2001). En automatisk navnegjenkjenner for norsk,
svensk og dansk. Paper presented at NoDaLiDa, Uppsala, Sweden.
Botolv H. (2004). Automatisk namneattkjenning - ein reiskap i namnegranskinga.
Årsmelding for Seksjon for namnegransking
Haaland Å.T.F. (2003). Classification of Names in Norwegian Text Based on Maximum Entropy Modeling. NoDaLiDa. Iceland
Haaland Å.T.F. (200?). Automatic Classification of Names in Norwegian Text Based on Maximum Entropy Modeling. Ongoing PhD Thesis, financed by the "Nordisk Ministerråds Språkteknologiprogram"
Hagen K. (2003). Automatisk sammenslåing av navn. In Nordisk Sprogteknologi 2002. København. Museum Tusculanums Forlag. Pp. 351-356.
Kokkinakis D. (2002). Namnigenkänning på svenska. Nordisk Sprogteknologi -
Nordic Language Technology. Årbog for Nordisk Sprogteknologisk
Forskningsprogram 2000-2004 . pp. 167-171. Holmboe H. (ed.). Museum Tusculanums Forlag.
Kokkinakis D. (2003). Swedish NER in the Nomen Nescio Project. Nordisk
Sprogteknologi - Nordic Language Technology. Årbog for Nordisk
Sprogteknologisk Forskningsprogram 2000-2004. pp. 379-398. Holmboe H. (ed.). Museum Tusculanums Forlag.
Kokkinakis D. (2004). Reducing the effect of name explosion. LREC
Workshop: Beyond Named Entity Recognition Semantic labelling for NLP tasks. Lisbon, Portugal.
Kokkinakis D. (2004). Att automatiskt känna igen och kategorisera namn i svenska texter. In "Vision och Verklighet", Humanistdagarna.
Humanistdag-boken nr 17. Pp. 157-163. Göteborg University.
Nøklestad, Anders (2004). Memory-based Classification of Proper Names in
Norwegian. Proceedings of the 4th International Conference on Language
Resources and Evaluation (LREC-2004).
Nøklestad, Anders (in progress). Statistical Methods for Acquiring Some
Grammatical Knowledge of Norwegian. Ph.D. thesis, University of Oslo.
Related to the Nomen Nescio Bibliography
Appelt D.E. and Israel D. (1999). Introduction to Information Extraction
Technology. Tutorial Presented at the International Joint Conference on
Artificial Intelligence (IJCAI-99). Stockholm, Sweden.
Bikel D.M., Schwartz R. and Weischedel R.M. (1999). An Algorithm that
Learns What's in a Name. Machine Learning 34 (1-3): 211-231, Kluwer Academic Publishers
Black W.J., Rinaldi F. and Mowatt D. (1998). FACILE: Description of the NE
System used for MUC-7. In Proceedings of the MUC-7, Washington D.C.
Borthwick A., Sterling J., Agichtein E. and Grishman R. (1998). NYU:
Description of the MENE Named Entity System as Used in MUC-7. Proceedings
of the Seventh Message Understanding Conference (MUC-7). Washington D.C.
Chinchor N. (1997). MUC-7 Named Entity Definition, version 3.5. Available
from: http://www.muc.saic.com/proceedings/ne_task.html.
Collins M. and Singer Y. (1999). Unsupervised models for named entity classification.
In Proceedings of the Joint SIGDAT Conference on Empirical Methods in
Natural Language Processing and Very Large Corpora
Cucchiarelli A., Luzi D., and Velardi P. (1997). Automatic Semantic Tagging
of Unknown Proper Names. In Proceedings of the COLING/ACL Conference, pp.
286-292, Montreal, Canada
Douthat A. (1998). The message understanding conference scoring software user's manual.
http://www.itl.nist.gov/iaui/894.02/-related projects/muc_sw/muc_sw_manual.html.
Freitag D. (1998). Machine Learning for Information Extraction in
Informal Domains. PhD Thesis. CMU-CS-99-104, Pittsburgh, PA.
Friburger N. and Maurel D. (2002). Textual similarity based on proper names,
Mathematical Formal Information Retrieval (MFIR'2002), pp. 155-167. Tampere, Finland.
Gallippi A. (1996). Learning to Recognize Names Across Languages.
Proceedings of the 16th International Conference on Computational
Linguistics, (COLING), vol. 1:424-429. Copenhagen, Denmark.
Grishman R. (1995). The NYU System for MUC-6 or Where's the Syntax? Proceedings
of the MUC-6 workshop, Washington. November 1995.
Hobbs et al. (1996). FASTUS A Cascaded Finite-State Transducer for
Extracting Information from Natural-Language Texts. Available from:
http://www.ai.sri.com/~hobbs/.
Karkaletsis V., Spyropoulos C.D. and G. Petasis. (1999). Named Entity Recognition from Greek texts:
the GIE Project. In S.Tzafestas, editor, Advances in Intelligent Systems: Concepts, Tools and Applications,
pages 131-142. Kluwer Academic Publishers.
Krupka G.R. and Hausman K. (1998). IsoQuest, Inc.: Description of the
NetOwl Extractor System as Used for MUC-7. Proceedings of the 7th Message
Understanding Conference (MUC-7). Washington D.C.
Lehrer A. (1992). Names and Naming: Why we Need Fields and Frames. Frames,
Fields and Contrasts. New Essays in Semantic and Lexical Organization.
Lehrer A. and Kittay E.F. (eds). Lawrence Erlbaum Associates.
Mani I. and MacMillan R. T. (1996). Identifying Unknown Proper Names in Newswire
Text. In Corpus Processing for Lexical Acquisition, pp. 41-59. MIT Press. Cambridge, MA.
Maynard D. et al (2001). Named Entity Recognition from Diverse Text Types.
Recent Advances in Natural Language Processing 2001 Conference, Tzigov Chark, Bulgaria.
Maynard D., Bontcheva K. and H. Cunningham. (2003). Towards a semantic extraction of named entities.
Recent Advances in Natural Language Processing, Bulgaria, 2003.
McDonald D. (1996). Internal and External Evidence in the Identification
and Semantic Categorisation of Proper Nouns. Corpus-Processing for Lexical
Acquisition, 21-39. Pustejovsky J. and Boguraev B. (eds). MIT Press.
Mikheev A., Moens M. and Grover C. (1999). Named Entity Recognition
without Gazetteers. Proceedings of the 9th European Chapter of the
Association of Computational Linguistics (EACL), 1-8. Bergen, Norway.
Nobata C., Collier N. H. and Tsujii J. (2000). Comparison between Tagged Corpora for the Named
Entity Task. Proceedings of the Workshop on Comparing Corpora (at ACL'2000), Kilgarriff A. and Berber
Sardinha T. (eds.), pp.20-27, Hong Kong University of Science and Technology.
Paik W., Liddy E.D., Yu E. and McKenna M. (1996). Categorizing and
Standardizing Proper Nouns for Efficient Information Retrieval. Corpus
Processing for Lexical Acquisition, 61-73. Boguarev B. and Pustejovsky J.
(eds). Bradford.
Pastra K. (2000). Basic Semantic Element Extraction: The Rule Writing Experience.
MSc Thesis. University of Manchester
Pastra K., Maynard D., Hamza O., Cunningham H. and Wilks Y. (2002). How feasible is the reuse of grammars
for Named Entity Recognition? Proceedings of the 3rd Conference On Language Resources and Evaluation.
Las Palmas, Spain
Poibeau T. and Kosseim L. (2001). Proper Name Extraction from
Non-Journalistic Texts. Proceedings of the Eleventh Meeting of Computational
Linguistics in the Netherlands (CLIN), pp. 144-157.
Ravin Y. and Wacholder N. (1997). Extracting names from natural-language text.
IBM Research Report, RC 20338.
Sekine S., Grishman R. and Shinnou H. (1998). A Decision Tree Method for
Finding and Classifying Names in Japanese Texts. Proceedings of the Sixth
Workshop on Very Large Corpora. Charniak E. (ed), Montréal, Canada
Sekine S. (1998). NYU: Description of the Japanese NE System used for MET.
In Proceedings of the MUC-7, Washington D.C.
Sekine S., Sudo K. and Nobata C. (2002). Extended Named Entity Hierarchy.
Proceedings of the 3rd Conference On Language Resources and Evaluation.
Las Palmas, Spain
Sheremetyeva S., Cowie J., Nirenburg S. and Zajac R. (1998). A Multilingual
Onomasticon as a Multipurpose NLP Resource. In Proceedings of the 1st
Language Resources and Evaluation Conference (LREC), Granada, Spain
Solorio T. and López A. (2004). Learning named entity classifiers using
support vector machines. In A. Gelbukh, editor, CICLing-2004, Lecture
Notes in Computer Science. Springer-Verlag.
Sun J., Gao J.F., Zhang L., Zhou M. and Huang C.N. (2002). Chinese Named Entity
Identification Using Class-based Language Model. pp. 967-973. In Proceedings of the 19th
International Conference on Computational Linguistics (COLING2002).
Takeuchi K. and Collier N. (2002). Use of Support Vector Machines in Extended Named
Entity Recognition. The 6th Conference on Natural Language Learning.
Thompson P. and Dozier C. (1999). Name Recognition and Retrieval
Performance. Natural Language Information Retrieval, 261-272. Strzalkowski
T. (ed.). Kluwer Academic Publishers.
Wakao T., Gaizauskas R. and Wilks Y. (1996). Evaluation of an Algorithm
for the Recognition of Proper Names. Proceedings of the 16th International
Conference on Computational Linguistics, (COLING), 418-423. Copenhagen,
Denmark.