data provenance

Liberman on Open Access and the three-legged stool

This post over at Language Log is highly recommended. A quick excerpt:


“reproducible research” [...] requires three things: (1) the data sets that serve as input; (2) the programs needed to run the experiment; and (3) a comprehensible account of what the experiment does, why it matters, and what the results are.

This is from Mark Liberman's abstract for his talk at the Berlin 9 Open Access Conference taking place in Maryland (not Berlin).

Conference on Science and the Internet 2012

From the call for papers:

Online media have brought about numerous changes in scholarly practices, including, but not limited to gathering data, finding relevant literature, making research and results accessible, organising collaboration, communicating with colleagues and students as well as creating fruitful learning environments.

Adapting a Scientific Workflow Infrastructure to Linguistics

In Linguistics (and similar social sciences), there are no standard 'workflow workbenches' that non-programmers can use to develop, use, and share their workflows. However, as linguistics becomes an increasingly data-intensive science, computational linguists are using computational pipelines in their research to facilitate their main work.

Changing the Conduct of Science in the Information Age

The National Science Foundation has posted a workshop report entitled Changing the Conduct of Science in the Information Age. While it doesn't appear to contain direct input from linguists, many of the issues it discusses will be familiar to those interested in promoting a cyberlinguistics infrastructure.

From the executive summary:

"Linked Data in Linguistics" at DGfS 2012

Linked Data in Linguistics
Linguists from all disciplines produce more and more data and share the challenge of making this data accessible to other researchers in their field and beyond. This concerns not only the general availability of data, but also the representation of the structure of the data. Linked Data is one paradigm that can be employed to tackle this task.
We are happy to announce the workshop "Linked Data in Linguistics" at the annual meeting of the German Linguistic Society (Deutsche Gesellschaft für Sprachwissenschaft, DGfS) taking place March 7-9, 2012 in Frankfurt a.M., Germany.

Access to lexical databases: discussion

Claire Bowern has started a discussion on her blog, Anggarrgoon, about access to aggregated lexical data: how to protect the rights of the various stakeholders while encouraging as much sharing as possible. I enjoyed her tongue-in-cheek suggestion that linguist-contributors should, in game-theoretic fashion, get access to data in proportion to the data they share.

Data provenance and data aggregation

Peter Austin, over at Endangered Languages and Cultures, has initiated a discussion on citation practices (with James McElvenny also participating), and it was prompted (at least partly) by some data I have had a role in processing as part of the LEGO project.

Interdisciplinary Centre for Social and Language Documentation in Portugal

The Centro Interdisciplinar de Documentação Linguística e Social (CIDLeS) is an interdisciplinary non-profit centre dedicated to the documentation and preservation of the linguistic (and cultural) heritage in Europe. It was founded in January 2010 as a result of the work of a number of researchers at the Institute of General Linguistics and Language Typology at the University of Munich and at the Department of Portuguese Studies at the Universidade Nova de Lisboa.

Beyond the PDF?

While looking for something on this blog http://cameronneylon.net/category/blog/ (which I recommend in general), I stumbled on the fact that an interesting workshop recently took place entitled Beyond the PDF. The workshop goal is described as follows:

A Grand Challenge for Linguistics: Scaling Up and Integrating Models

In response to NSF's call for White Papers in the SBE 2020 Initiative, Jeff Good and I have submitted a paper outlining our take on Cyberinfrastructure for Linguistics, why it's necessary, and how it can come about. The abstract:
