Tutorial - STIL 2011

Extracting Semantic Information from Wikipedia

Clarissa Castellã Xavier, PhD candidate

Prof. Vera Lúcia Strube de Lima

Owing to its size and its multilingual, free, and collaborative nature, Wikipedia has been recognized and exploited by the scientific community as an important resource for information extraction. This tutorial presents the state of the art in initiatives that extract semantic data from Wikipedia, whether or not in combination with other data sources. We propose to study and contrast these works, focusing on how the encyclopedia's content and structure are exploited, the methods and technologies used, the form and content of the extracted data, and the results obtained.

Topics covered:

1 – Wikipedia

Introduction

Editing

Quality

Structure

Download

2 – Concepts

3 – Perspectives on Wikipedia

4 – Extraction of semantic information

From various sources

From Wikipedia

Lines of research

5 – Databases extracted from Wikipedia

6 – Conclusions

Information extraction from Wikipedia

The future of Wikipedia

For those who want to do research


Bibliography

(Melo, 2010) Gerard de Melo and Gerhard Weikum. Untangling the Cross-Lingual Link Structure of Wikipedia. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 844–853, Uppsala, Sweden, 11-16 July 2010.

Milne, D., Medelyan, O., Witten, I.H.: Mining domain-specific thesauri from wikipedia: A case study. In: WI ’06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 442–448. IEEE Computer Society, Washington, DC, USA (2006).

Günter Neumann. Mining Meaning From Wikipedia. LT-lab, DFKI, Saarbrücken (slides)

Metke‐Jimenez, Alejandro and Raymond, Kerry and MacColl, Ian (2010) Ontologies derived from Wikipedia : a framework for comparison. In: Proceedings of the International Conference on Knowledge Engineering and Ontology Development (KEOD 2010), 25‐28 October 2010, Valencia, Spain.

Cui, G., Lu, Q., Li, W., Chen, Y.: Corpus exploitation from Wikipedia for ontology construction. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08). European Language Resources Association (ELRA), Marrakech, Morocco (2008)

Simone Paolo Ponzetto and Michael Strube. Extracting World and Linguistic Knowledge from Wikipedia. Tutorial, NAACL HLT 2009. (Slides)

Gjergji Kasneci, Maya Ramanath, Fabian Suchanek, and Gerhard Weikum. 2009. The YAGO-NAGA approach to knowledge discovery. SIGMOD Rec. 37, 4 (March 2009), 41-47.

Vivi Nastase, Michael Strube, Benjamin Börschinger, Cäcilia Zirn, and Anas Elghafari (2010). WikiNet: A very large scale multi-lingual concept network. In Proceedings of the 7th International Conference on Language Resources and Evaluation, Valletta, Malta, 17-23 May 2010.

Michael Strube and Simone Paolo Ponzetto. 2006. WikiRelate! Computing semantic relatedness using Wikipedia. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2 (AAAI'06), Anthony Cohn (Ed.), Vol. 2. AAAI Press, 1419-1424.

Gabrilovich, E. and Markovitch, S. (2007). "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis", Proceedings of The 20th International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, January 2007

Ponzetto, S. P.; Strube, M. Knowledge derived from wikipedia for computing semantic relatedness. Journal of Artificial Intelligence Research, vol. 30, 2007, pp.181-212.

Gruber, T. R. “Towards Principles for the Design of Ontologies Used for Knowledge Sharing”. International Journal of Human and Computer Studies, vol. 43(5-6), 1993, pp.907-928.

John F. Sowa (1987). "Semantic Networks". In Stuart C Shapiro. Encyclopedia of Artificial Intelligence. Retrieved 2008-04-29.

Pidcock, Woody (2003) What are the differences between a vocabulary, a taxonomy, a thesaurus, an ontology, and a meta-model?

PICKLER, Maria Elisa Valentim. Web Semântica: ontologias como ferramentas de representação do conhecimento. Perspect. ciênc. inf. [online]. 2007, vol.12, n.1 [cited 2011-10-19], pp. 65-83. Available from: <http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1413-99362007000100006&lng=en&nrm=iso>. ISSN 1413-9936. http://dx.doi.org/10.1590/S1413-99362007000100006.

Syed, Z., Finin, T.: Unsupervised techniques for discovering ontology elements from Wikipedia article links. In: Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading (FAM-LbR '10), pp. 78-86. Association for Computational Linguistics, Stroudsburg, PA, USA (2010)

Suchanek, F. M., Kasneci, G., Weikum, G.: YAGO: A large ontology from Wikipedia and WordNet. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 6(3), pp. 203-217 (2008)

Navigli R., Ponzetto S.P.: BabelNet: building a very large multilingual semantic network. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL '10). Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 216-225 (2010)

Szumlanski, S., Gomez, F.: Automatically acquiring a semantic network of related concepts. In: Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10). ACM, New York, NY, USA, pp. 19--28 (2010)

de Melo, G., Weikum, G.: MENTA: inducing multilingual taxonomies from Wikipedia. In: Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10). ACM, New York, NY, USA, pp. 1099-1108 (2010)

Fogarolli, A.: Wikipedia as a Source of Ontological Knowledge: State of the Art and Application. In: Caballé, S., Xhafa, F., Abraham, A. (eds.) Intelligent Networking, Collaborative Systems and Applications. Studies in Computational Intelligence, vol. 329, pp. 1-26. Springer Berlin / Heidelberg (2011)

Nastase, V., Strube, M., Boerschinger, B., Zirn, C., Elghafari, A. Wikinet: A very large scale multilingual concept network. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta (2010)

Wu, F.; Weld, D. S. “Autonomously semantifying Wikipedia”. In: 16th ACM Conference on Conference on information and Knowledge Management (CIKM 2007), 2007, pp.41-50.

Yu, J., Thom, J.A., Tam, A.: Ontology evaluation using Wikipedia categories for browsing. In: CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pp. 223–232. ACM, New York, NY, USA (2007). DOI http://doi.acm.org/10.1145/1321440.1321474

Max Völkel, Markus Krötzsch, Denny Vrandecic, Heiko Haller, and Rudi Studer. 2006. Semantic Wikipedia. In Proceedings of the 15th international conference on World Wide Web (WWW '06). ACM, New York, NY, USA, 585-594. DOI=10.1145/1135777.1135863 http://doi.acm.org/10.1145/1135777.1135863

Clarissa CX,
Oct 24, 2011, 11:14 AM