
Open science and the economic collapse

My colleague is interested in designing a better PCR machine and is collecting all the hardware information he can get. Today, after I forwarded him a link to OpenPCR, a project aiming to construct open source PCR hardware that anyone can build, he asked me what the point is of making biotech equipment yourself in an academic environment. My answer was that with empty pockets, academics will have no choice but to turn to DIY hardware. And this is going to be one of many changes that scientists will face within the next 18-24 months.

If you still believe that the recession is ending, look at the recent newsletter from Investors Insight. The author comes to a conclusion similar to that of other analysts I've read over the last year: no bailout is going to save several countries from economic collapse. What the collapse will look like is still an open question (a rapid crash or death by a thousand cuts), but I assume that science funding is about to be severely limited (friends in the US are already reporting a quite difficult situation in academia). And probably the only way to move research forward at a large scale will be to make it open and distributed. Citizen science, open access, DIY biotech, open source, outsourcing and crowdsourcing, rapid development, etc. – these are the songs of a future of science that is, for some places, inevitable. At a smaller scale, at the level of an individual scientist, radical openness might not work in every case. However, I would expect some level of opening of the research process to happen in almost all fields, because the spectrum of open solutions is big enough to accommodate different requirements, needs and expectations (for example, a citizen science project in which the final outcome is still subject to patenting).

Open and distributed means less expensive. If science funders assume this is true (and I'd say we have enough evidence to postulate at least a correlation), we're going to see some interesting things happening. Plus a couple of familiar names in managing positions at science-related public institutions ;).

Two metagenomics samples: are they different?

The other day I was asked to compare the taxonomic distribution of two metagenomics samples and answer the question whether they are significantly different. With my deep dislike of using statistical methods without understanding the data, the first thing I tried was to SEE whether they are different. What I did was clustering, (again) with the CLANS software. I joined the two sets of reads, ran BLASTN all against all and clustered the reads in 2D. The last step was to color the points, one sample in red, the other in blue.
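The joining step is simple enough to sketch. Below is a minimal Python version (not my actual script, and the toy reads are made up): each read ID gets prefixed with its sample of origin, so that after the all-against-all BLASTN run and the CLANS clustering, every point can be traced back to a sample and colored accordingly.

```python
def merge_reads(samples):
    """Merge reads from several samples into one FASTA string,
    prefixing each read ID with its sample name so that points
    can be colored by sample after clustering in CLANS."""
    records = []
    for sample, reads in samples.items():
        for read_id, seq in reads:
            records.append(">%s|%s\n%s" % (sample, read_id, seq))
    return "\n".join(records) + "\n"

# Toy data standing in for two 454 read sets:
merged = merge_reads({
    "red":  [("r1", "ACGTACGT"), ("r2", "TTGGCCAA")],
    "blue": [("b1", "ACGTACGA")],
})
print(merged)
```

The merged FASTA then goes into BLASTN (all against all) and the resulting hits into CLANS for the 2D clustering.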

CLANS on two metagenomics samples

Due to the high density of the points, the question is only partially answered. Detailed inspection revealed that the black spots contain reads from both samples, although that's not clear from the picture alone. What is hardly visible here is that both samples maintain the same structure of clusters, so one can assume they are highly similar. To get a more detailed picture, I clustered the reads by sequence identity at the 90% level, using a version of cd-hit written to deal with reads from the 454 sequencer (called cd-hit-454). Clustering again in CLANS confirmed the initial assessment: the two samples were basically identical in terms of taxonomic coverage (to make the picture clearer I removed the connections – the group in the middle is a single cluster of highly connected reads):


The two sets of reads were not identical in size, but if they differed in taxonomic coverage, one would expect at least two clusters.

Of course the initial question could be answered in other ways, some of which also involve fancy pictures (such as mapping the reads onto one of NCBI's databases and then visualizing the taxonomic coverage with MEGAN, which can also compare two samples visually). The advantage of using CLANS for this particular question is that it's robust, in the sense that it's independent of sample size (as you can see in the pictures above – the clusters are different, although the answer is the same).
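The "black spots contain reads from both samples" check can also be done numerically. A sketch (assuming the sample-prefixed read IDs described above, and hypothetical cluster assignments such as those parsed from a cd-hit .clstr file): count, per cluster, how many reads come from each sample.

```python
from collections import Counter, defaultdict

def cluster_composition(assignments):
    """Given (read_id, cluster_id) pairs, where read IDs carry a
    'sample|read' prefix, count reads per sample in each cluster.
    Clusters holding reads from both samples support the
    'samples are highly similar' interpretation."""
    comp = defaultdict(Counter)
    for read_id, cluster in assignments:
        sample = read_id.split("|", 1)[0]
        comp[cluster][sample] += 1
    return comp

# Hypothetical assignments for illustration:
comp = cluster_composition([
    ("red|r1", 0), ("blue|b1", 0), ("blue|b2", 0), ("red|r2", 1),
])
print(dict(comp[0]))  # cluster 0 holds reads from both samples
```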

Open Foo: sharing practice, social movement and technology


In the discussion under my recent post on incompatibilities between open source and open data, Bill Anderson pointed out the frequent confusion between "open source" and "free software". He cited Richard Stallman's essay, which argues that open source is a software development methodology, while free software is a social movement. Building on that, Bill wrote that "'Open data' is not a data development method; it's a data sharing practice (…)", which sounded quite right.

However, after reading Stallman's essay again and looking at the official definition of open source software (also linked by Stallman), I view the distinction between the two quite differently.

“Open” as sharing practice

The official definition of open source software specifies the conditions of sharing and distributing source code. As such, it's no different from the definition of open data, except that it contains some conditions very specific to software (for example, the code must be shared under a technology-neutral license). So the first layer of "open" is a non-restrictive sharing policy.

“Open” as social movement

Attaching a societal idea to "open" results in the second layer of this schema. "Free software" is one example; "making research results available to taxpayers" is another. In many cases a social movement was already in place before the sharing practice was established; in other cases (for example with open data, or open science in general) we still struggle to define the ultimate societal benefit in less than six words.

“Open” as technology

Here’s the place for viewing open source as software development method. Here’s also the place to view open science as research organization method. Most of “open” ideas result in practical advantages, such as new business models, increased sustainability, or faster growth/development. I call them “technology” because they are applied to processes. The exact benefits depend on the domain – open source and open data are comparable only in analogies.

Discussing openness

I don’t think it’s the ultimate solution, but I find such three layer model of “open” quite useful in clarifying discussions on openness. First thing is that mixing different aspects of openness results in such abstract ideas as software that has liberties (which in the end might become very important but don’t help in establishing the basics in areas that have shorter history than open source). The other thing is that it helps to divide tasks between interesting parties based on their competencies. This is how it works in case of Open Access (establishing policies, advocacy and software development are usually done by different organizations). And finally, you can easily define new or already established “open” ideas within that framework –  for example it helped me to understand differences between open politics, open government and open democracy (the differences were not that intuitive for me, as I expected).

This model could possibly be improved by adding other layers or aspects. If you have any ideas, please let me know.

Open Data citation advantage

Quite recently my colleagues published a paper on a genome-wide model of translation (in PLoS Computational Biology). They used a number of different datasets, but in one case the data wasn't available in the text or as supplementary material – they had to ask another group for certain raw numbers from their experiment on ribosome profiling in yeast (published in Science). The data was shared promptly, so my colleagues could finish the project. They cited the original paper and expressed their gratitude by mentioning the data sharing in the acknowledgements.

Because sharing data resulted in a citation, I wonder how long it will take for Open Data advocates to start using this "open data citation advantage" as an argument for sharing data.

Don’t get me wrong – I’m all for Open Foo, but I do get frustrated when “citation advantage” becomes a major or even the only point of going open. It’s obscuring debate on Open Foo and limiting it to the aspects only (some) scientists care about.

Must read: the "advantage, schmantage" post by Bill Hooker on the OA citation advantage.

Large patent portfolio: an equivalent of automated contracts?

Intellectual Ventures is a private company whose business model relies on developing a large patent portfolio and licensing it to companies with infringing products. In other words, their business model is patent trolling. Given my attitude towards openness, it's clear that I don't like their approach at all, although I must admit that some of the ideas they have developed are freaking cool. You can imagine my satisfaction when it turned out that their first venture fund, IDF I, isn't doing that well (see the embedded document below, page 7).


However, a recent tweet from Glyn Moody points to an article at TechDirt which states that the numbers (an internal rate of return of -78%) might be meaningless and that the true revenue from patent trolling is still unknown.

This news inspired a hot discussion at our small research institute (the one we started last year) about our attitude towards intellectual property protection and management. It seems clear that some form of IP protection is going to stay, at least for privately funded research and applied science. On the other hand, the price we as a society pay for the current patent management system is constantly going up. Michael Heller in "The Gridlock Economy" claims that the so-called "quick" resolution of the Golden Rice intellectual property issues took 6 years, and I don't think the situation has gotten better since then. For that reason, I'm a big fan of Michael Nielsen's idea of automated contracts and its extension to the patent system.

In that discussion I argued that, in the long run, we should stay away from holding IP rights, because of the huge investment (time and money) needed to obtain them (just after the discussion I saw an interview with Craig Venter in which he says "nobody has made any serious money off patents on human genes except patent attorneys" – worth a read for other reasons as well). Instead, I argued for building a platform for streamlining negotiations between patent holders and businesses in our area of competence (currently green and sustainable technology). Even if we don't grow beyond the local market (Poland), my feeling is that such a platform is going to be more profitable than collecting patents.

I didn’t compare our capacity for filling patents with patent portfolio of Intellectual Ventures (that would be silly). Also, despite my aversion towards IV business model, I’m not that sure their returns will never become positive. Rather I’ve argued that patent trolling might be actually profitable, but only if you can sell licenses for the whole process, or most of it. In other words, large patent portfolio might be an equivalent of automated contracts. Because the price for individual licenses is going to drop (have a look at amounts awarded for solutions over at Innocentive – there’s no way any Western university would price its services so low), small non-profit research institutions (including ours) aren’t going to earn enough from their patents to make the research sustainable.


Science as a complex system – introduction

Image via Wikipedia: the Lorenz attractor, an example of a non-linear system.

Which complex system?

Complexity theory, that is, the study of complex systems, is traced back to the 18th century and the classical political economy of the Scottish Enlightenment, although the real pioneers of the field are 20th-century philosophers, economists, mathematicians and social scientists. It's a rather young field, but it already covers quite a large number of topics (such as complex adaptive systems, chaos theory, non-linearity, emergence or self-organization) and influences other fields of science, like biology, sociology or economics. In this post, each time I mention a complex system I mean a "complex adaptive system" (CAS), which is adaptive (which is not the case for a non-linear system), non-deterministic (which is not the case for a chaotic system) and non-predictable (which is not the case for a simple or linear system). John Holland's definition of a CAS is:

A Complex Adaptive System (CAS) is a dynamic network of many agents (which may represent cells, species, individuals, firms, nations) acting in parallel, constantly acting and reacting to what the other agents are doing. The control of a CAS tends to be highly dispersed and decentralized. If there is to be any coherent behavior in the system, it has to arise from competition and cooperation among the agents themselves. The overall behavior of the system is the result of a huge number of decisions made every moment by many individual agents.

I think we can safely say that science, as a system of organized research within and outside certain institutions, exhibits a large number of properties attributed to a CAS. Therefore, let's assume that science is a complex system.

Laws vs models

It’s important to remember that behavior of a complex system may depend on a unique set of fundamental laws, but these are different from models we use for practical purposes to describe this behavior. In other words, models of complex systems do not have to be reducible to unique laws. Let me pull out another quote, this time from this recent post by Wavefunction (emphasis mine):

A molecular mechanics model of a molecule assumes the molecule to be a classical set of balls and springs, with the electrons neglected. By any definition this is a ludicrously simple model that completely ignores quantum effects (or at least takes them into consideration implicitly by getting parameters from experiment). Yet, with the right parametrization, it works well-enough to be useful. There could conceivably be many other models which could give the same results. Yet nobody would make the argument that the behavior of molecules modeled in molecular mechanics is not reducible to quantum mechanics.

So, despite some people claiming to know exactly how science operates, and that we are all wrong with our analogies, we are free to make as many models of science as we wish, and there's nothing wrong with that. Not only because laws and models are different: in many cases, emergent properties of the system cannot be derived from a set of underlying laws, so we use (often naive) models to capture these phenomena.

Models of science

How many models of science can we build? Or how many models are enough?

We could compare science to a multi-agent system, where researchers would compete for goods produced by science funders.

We could compare science to a culture, where research areas would rise and fall as a result of competition between memes. Researchers and science funders would be the agents of transmission.

We could compare science to a simple system, with linear laws (such as "more money, more papers") which become unpredictable due to inherent elements of randomness (scientific discoveries).

We could compare science to a social system, in which behaviour of researchers could be modelled by game theory.

We could compare science to a campfire, where people gather and tell stories.

We could make analogies to art, economics, sociology, or almost anything else. We could derive "laws" or "rules" from these models, which can often approximate the behavior of the system quite accurately (within certain boundaries).
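To make the first analogy above concrete, here is a toy sketch of a multi-agent model of researchers competing for funders' goods. It is an illustration only, not a validated model of science; the rich-get-richer award rule and all parameter values are my assumptions.

```python
import random

def simulate(n_researchers=20, budget=10, rounds=100, seed=1):
    """Toy multi-agent model: each round a funder awards `budget`
    grants; a researcher's chance of winning is proportional to
    past wins plus one (a rich-get-richer rule, loosely mimicking
    cumulative advantage in funding)."""
    rng = random.Random(seed)
    wins = [0] * n_researchers
    for _ in range(rounds):
        weights = [w + 1 for w in wins]  # prior success boosts odds
        for _ in range(budget):
            i = rng.choices(range(n_researchers), weights=weights)[0]
            wins[i] += 1
    return wins

wins = simulate()
print(sorted(wins, reverse=True))  # the win distribution ends up uneven
```

Even a crude rule like this one produces an unequal distribution of grants, which is the kind of emergent behavior the complex-system view is after.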

Model agnosticism

However, asking which model is the best is like asking which approximation of a molecule is the best. The answer is that it depends on the experiment. For protein structures, there's a large spectrum of approximations in use, depending on the task (rough structure comparison, structure modelling, molecular dynamics, docking of small compounds). For other complex systems the situation is quite similar – the practical purpose determines the choice of the model. This is often forgotten when you move to other fields.

There are also two other approaches – multi-model or multilevel modelling (represented roughly by multiscale modelling) and model-free approaches (represented roughly by neural networks) – but if these are chosen, it happens for practical purposes, not because they represent "reality" better.

Why “science as a complex system” (or “why such a long introduction?”)

I’ve been thinking about future of science and strategy for science for quite some time. It can be quite difficult already at a personal level (career strategy) and real hard to get at a larger level (for example, open science strategy for Poland). What I’ve learned from Michael Nielsen, is that if you want to make predictions about the future, you need to understand the present as good as possible. I don’t know any better way of understanding something than constructing model after model (and testing them, if that’s possible) .

However, if you look at the predictions made by some people, they usually focus on one or two ideas their authors like the most. People also don't test their predictions against different models, not to mention trying to combine models, or learning something from models incompatible with their own ideas.

But treating science as a complex system doesn’t mean only slight update to our methodology, that is testing different approches. It provides us with a variety  tools to build and test our models (network analysis, multi-agent modelling, pattern oriented modelling, cellular automata, game theory, and list goes on and on). And how to apply these tools to understand how science develops, will be the topic of upcoming posts.