Open data and open source. Incompatible?

16 Jul 2010

The open data movement seemingly doesn’t differ much from other “open” movements: its goal is free access to data (you can substitute articles/source code/creative content for data) without certain restrictions and mechanisms of control. Quite recently a group of usual suspects (Cameron Neylon, John Wilbanks, Peter Murray-Rust and Rufus Pollock) put together a set of principles for open data in science called the Panton Principles.

It’s easy to think about open data in a similar way to open access or open source. However, I’ve come to think that open data, especially in the life sciences, is different. The background story is that we had to remove the “open data” component from a big grant proposal because it was incompatible with the policy (aka long-term strategy) of the funder. It didn’t violate any of the requirements, but in the get-a-grant game you don’t want to lower your chances on purpose. Hence no “open data” on that project. Konrad pointed out in this FriendFeed thread that Poland is no different from other countries with respect to the compatibility of open data principles with the long-term strategy of science development. And indeed, the Wikipedia entry on open scientific data states that it can be challenged by individual institutions as well as by grant agencies:

As the term Open Data is relatively new it is difficult to collect arguments against it. Unlike Open Access where groups of publishers have stated their concerns, Open Data is normally challenged by individual institutions. Their arguments may include:

  • this is a non-profit organisation and the revenue is necessary to support other activities (e.g. learned society publishing supports the society)
  • the government gives specific legitimacy for certain organisations to recover costs (NIST in US, Ordnance Survey in UK)
  • government funding may not be used to duplicate or challenge the activities of the private sector (e.g. PubChem)

And this made me think that, from the point of view of a funder that wants some return on investment in science, open data and open source are pretty much incompatible. When open source code is abundant, data becomes an asset and is more protected. When open data is abundant, analysis methods become more valuable, and as such more protected (meaning less likely to be released under an open source license).

Current business models don’t fit well in a situation of abundance-of-everything (that is, data and software), so I don’t expect that science funders (the least innovative group in the world) can get out of the “scarcity/competitiveness” frame of thinking anytime soon. I have a feeling that our issue with open data being incompatible with the long-term strategy of science development will return some day when we try to put open source into yet another grant.

  • July 18, 2010 at 6:12 pm joergkurtwegner
    We should make a list of people working in industry and people working in academia ... especially for any comments flying in on this ;-) I am an industry person with a strong belief in some "open" aspects, clearly not all.
  • July 18, 2010 at 6:19 pm Cameron Neylon
    I'd turn it round the other way - the only way the business of science can succeed is by doing our best to provide the best possible return. The question is whether open foo is the way to achieve that. In the case of government funders I think the evidence is pretty clear that the losses from not commercialising research are much less than the losses from not effectively exploiting the outputs of that research. We need more and better evidence of that (and I agree that the timeframes and motivations are different in industry - as they should be) but if it is true (or rather, where it is true) then we have an obligation to provide the best return we can on the public investment by making data open. I think if government funders have a long term strategy of locking up data then they won't have a long term future - simply because maximising the economic return isn't served by taking that path.
  • July 18, 2010 at 6:26 pm Cameron Neylon
    Thinking about it - all these kinds of arguments hinge on an unspoken assumption - that the major returns on investment in research come from the results of that funded research. If that is shown to not be true - if for example the major economic benefit is through the generation of human capital and capacity - then all these scarcity based arguments need to be reassessed. The British science minister seems to be looking hard at that: http://www.bis.gov.uk/news/speeches/david-willetts-science-innovation-and-the-economy
  • July 18, 2010 at 6:32 pm Graham Steel
    Spinning back in time/history. A quick search on Wikipedia. Open Source:- http://en.wikipedia.org/wiki/Open_source (started in 1911), Open Access Publishing:- http://en.wikipedia.org/wiki/Open_access_%28publishing%29#History (started in the 1960's) and Open Data:- http://en.wikipedia.org/wiki/Open_science_data#History is rather interesting !!
  • July 18, 2010 at 6:39 pm Pawel Szczesny
    Cameron, actually I am arguing (talking on open science with Open Society Institute right now) that one of the benefits of opening up science in transition countries (such as Poland) is exactly building human capital and capacity. I'm not sure if that is also true for all research (closed academic or industrial) and in all countries - because of local law regulations and traditions this might differ between countries.
  • July 18, 2010 at 6:52 pm Cameron Neylon
    Agreed - one of the most interesting things about Willetts' speech was the comment that only 3% of economic returns from academic research come from patenting and spinouts - the classic scarcity based approach - and that there were bigger returns in building that "absorptive capacity", as he termed it, to apply research done in other places. The key question is whether this is true for transition countries, as the argument is that you have to be doing high level research to use high level research. Because doing this across the board is unaffordable you are then forced to try and pick winners and focus on their success, rather than that of the overall community - which is where taking a secretive approach might make more sense. Very hard to tell as the probabilities of success are unknown and big wins very rare, which makes playing the numbers game very risky with limited resources...
  • July 18, 2010 at 7:27 pm joergkurtwegner
    @Cameron - commenting on "if government funders have a long term strategy of locking up data then they won't have a long term future" ... I think it is important that people learn to be more flexible with their licensing terms and with offering multiple options on how and to which degree data can be accessed, and at which time point. If this could help prevent endless discussions and negotiation rounds then I would assume industrial partners would like the idea, too. Besides, this would also help to educate some partners to be more realistic about their negotiation terms (or not get a deal at all).
  • July 18, 2010 at 8:19 pm Pawel Szczesny
    Graham, nice comparison of dates of adoption. Joerg, I wanted to reply that maybe instead of relying on people becoming more reasonable, we should go for implementation of automated contracts (see Michael Nielsen's post http://michaelnielsen.org/blog/intellectual-property-automated-contracts-and-the-free-flow-of-information/ )? Michael Heller in "Gridlock economy" claims that "quick" resolving of Golden Rice intellectual property issues (http://en.wikipedia.org/wiki/Golden_rice#Golden_rice_and_intellectual_property_issues ) took 6 years. But because automation of negotiation of licensing terms would require all parties involved to share some potentially important information, I guess we're ending up again expecting that people become less paranoid (more reasonable again).
  • July 18, 2010 at 9:04 pm joergkurtwegner
    @Pawel, @Michael - Are there any practical examples of automated contracts, especially in a life science context?
  • July 18, 2010 at 9:11 pm Cameron Neylon
    There's not a lot out there at the moment but this is the kind of thing that Creative Commons have been at with generic patent licensing and such like I think. The aim is to have well understood terms that are designed to be compatible and predictable so that people can get on and do stuff with confidence.
  • July 18, 2010 at 9:13 pm joergkurtwegner
    It's a chicken-and-egg problem, and it would be good to see some examples of this to make the discussion easier for all parties ;-)
  • July 18, 2010 at 9:27 pm Pawel Szczesny
    I haven't seen any examples either. Automated contracts in open digital media would be something like "DRM meets Creative Commons". In life sciences we have neither DRM nor CC (SC is trying to fill the niche in the latter). It's harder to build a system from scratch.
  • July 18, 2010 at 9:30 pm joergkurtwegner
    "Do not go where the path may lead; go instead where there is no path and leave a trail." [Ralph Waldo Emerson] and yes, at some point we will get it ... http://www.joergkurtwegner.eu/?q=node/11
  • July 18, 2010 at 9:31 pm Ruchira S. Datta
    Very interesting. We've never run into such a conflict (the idea that these conflict had not even occurred to us).
  • July 19, 2010 at 12:49 am Michael Nielsen
    Joerg - high-speed algorithmic trading (on Wall St and elsewhere) relies on having an infrastructure in place that allows for automated contracts to be carried out. Many extremely interesting things become possible in that context. (I don't know of a really good single reference on what's being done, unfortunately.) Aside from that, I'm not aware of much that's been done in other fields, although I'm sure it will come. The ability to automate contracts will be incredibly powerful.
  • July 19, 2010 at 1:05 am Deepak Singh
    Not that unknown. Enterprise Service Buses are often required to fulfill and automate data contracts used in BI, other market intelligence ops, and of course for algorithmic trading. A lot of companies in the creative space and yield analytics also have similar "contracts"
  • July 19, 2010 at 3:02 am Bill Anderson
    One of the confusions in the use of the term "open source" was described by Richard Stallman in a CACM Viewpoint (http://doi.acm.org/10.1145/1516046.1516058) on open source and free software: "Open source is a [software] development methodology; free software is a social movement." "Open data" is not a data development method; it's a data sharing practice, and one that can be supported by policies and procedures. It is helpful to avoid conflating this with a software development practice.
