E-Science: What and Why?

E-Science facilitates scientific research through applications of computer science. Although it has roots in high performance computing and supercomputing, which focus on number crunching, it has evolved into a much more general role in recent years. Over the last few years, E-Science has appeared in the spotlight, attracted a significant amount of funding, and drawn researchers of the caliber of Jim Gray.

Years ago, the role of computing in research was number crunching and helping scientists keep track of their data. Now, however, computing has become an indispensable part of scientific research, and almost all research disciplines have major dependencies on computer science. Let us briefly look at some of those areas and the reasons why computers play such an integral role in the sciences.

Science is said to stand on two pillars: empirical methods and analytical methods. In the first, scientists use data collected over a sufficiently long period of time to find new trends and patterns in nature. In the second, starting from formal models of the world and current knowledge, they try to derive new results through formal logic. More often than not, these two methods are used in tandem, one helping the other. With the advent of computers, however, a third pillar has arrived: computation. Since many problems arising from empirical and analytical work do not have closed-form answers, scientists often solve them through simulations. For example, partial differential equations (PDEs), which arise in many real-life calculations, often do not have closed-form solutions and have to be solved through numerical analysis. Similarly, state-of-the-art weather models predict the weather by simulating a model rather than solving it analytically. There are such examples in every field of engineering and the physical sciences. This aspect covers most HPC use cases of E-Science.
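As a concrete sketch of what "solving by simulation" means, here is a minimal numerical solution of the 1D heat equation, one of the simplest PDEs. The grid sizes and diffusion coefficient are illustrative choices, not values from any real model.

```python
import numpy as np

def solve_heat_1d(u0, alpha=1.0, dx=0.1, dt=0.004, steps=100):
    """Advance the initial temperature profile u0 in time using explicit
    finite differences for u_t = alpha * u_xx; boundaries held at zero."""
    u = np.array(u0, dtype=float)
    r = alpha * dt / dx**2  # must stay <= 0.5 for this scheme to be stable
    for _ in range(steps):
        u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
    return u

# A "hot spot" in the middle of a rod diffuses outward over time.
u0 = np.zeros(21)
u0[10] = 1.0
u = solve_heat_1d(u0)
```

No closed-form answer is consulted anywhere: the final profile is simply what the discretized model evolves into, which is exactly how large weather codes operate, at vastly greater scale.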

On the other hand, many scientific calculations are easily beyond a single computer. For example, high resolution weather predictions can easily use 1000 CPUs, and space telescopes can generate terabytes of data in a relatively short time. Handling such problems requires multiple computers and knowledge of distributed systems. Building efficient solutions that exploit the parallel nature of these problems also requires high performance computing (HPC) and parallel computing.

Furthermore, this reliance on computer science has forced scientists and graduate students in the sciences to learn computer science. Although some of those scientists have made significant contributions to computer science, this is often a roadblock that keeps many scientists from adopting IT to the fullest extent in their research. Consequently, making computing transparent (in other words, building tools that allow scientists to do science with minimal computer knowledge) is another interesting challenge being tackled by E-Science.

Moreover, efficient scientific research requires a high level of communication and collaboration among scientists. Although IT already plays a significant role in that arena, it has the potential to play a much greater one. For example, IT has greatly simplified the dissemination of scientific research and has significantly reduced the time and effort required to conduct a literature survey. However, we still lack infrastructure for collaboration on an ongoing basis, which would allow scientists to collaboratively perform large experiments. More and more grand challenges require collaboration across multiple disciplines, which increases the importance of such collaborations and, consequently, of the tools that enable them.

Finally, given the reduced cost of sensors and the ubiquity of information technology, vast amounts of data about the natural world are available to researchers. One of the challenges of our time is learning how to make sense of that data, which is more or less the goal of science itself. In the world we live in, it is much easier to obtain data than to make sense of it. Computer science can therefore play a major role in enabling and streamlining the process of getting from data to knowledge, which includes collecting raw data, generating metadata, archiving, searching, visualizing, generating information through processing, deriving knowledge from information, and preserving data for the future.

Current E-Science includes traditional computational topics like building supercomputers, high performance computing, parallel programming, and multicore and GPU programming, as well as more general topics like data-intensive computing, processing systems such as workflow systems, and large-scale data storage systems. In general, E-Science tries to facilitate scientific discovery through applications of computer science, and it tries to do so in as transparent a manner as possible, hiding the details of computer science from end users as much as it can.

Given the significant interest systems researchers have shown in E-Science, it is interesting to ask why. The answer is twofold. On one hand, the amount of funding available for pure computer science has greatly decreased, while the funding allocated for nationwide cyberinfrastructure has greatly increased. On the other hand, E-Science brings into focus very large scales, in terms of both computation and data. The resulting problems are challenging even for computer scientists, and the tools and systems we have are often inadequate to handle them. E-Science therefore continues to push the boundaries of computer science.

There are multiple E-Science initiatives in the U.S., the U.K., and Europe, each receiving millions of dollars and attracting top scientists. Microsoft Research has also made significant investments and has a major presence in E-Science. The IEEE E-Science Conference will hold its 6th annual meeting this year. Among the notable venues are the annual IEEE E-Science Conference, the annual Supercomputing Conference, the annual TeraGrid conference, and the annual Microsoft E-Science workshop.

To summarize, computing has the potential to facilitate the conduct of scientific research, enabling humanity to take giant leaps, and E-Science is a field of study whose goal is to make that a reality. It has attracted scientists from both the sciences and computer science, has received millions of dollars in funding, and currently runs many multidisciplinary research projects to build next-generation research infrastructure.

Paul on getting the maximum out of the cloud: Cloud Native

Paul has written a nice blog post on cloud nativity.
Well, what is it? When a new technology comes around, you can use it, but it is possible that you are only using it to handle your old use case. Often, new technologies come with new strengths and powerful features. To get the best out of one, you should use all its strengths, not just reimplement your old scenario.
Let's try to be concrete. With the cloud, you can move your apps to the cloud, and you might get some benefits through economies of scale, as your computing provider might be able to give you computing power more cheaply than running your own servers. But there may still be a lot of other potential benefits of the cloud you are not getting, like elasticity. It is like buying an iPhone and only using it to make calls: even though it is cool and covers my old use case (making calls), I am getting only 10% of what the iPhone can give me. In the same way, if you are going to use the cloud, you have to look beyond your current use case and be aware of the wonderful new scenarios the cloud can enable. On the flip side, if you go around saying "I bought an iPhone, but the call quality is the same," you are obviously off the mark by a lot.
On his blog, Paul tries to define some of the features middleware needs if it is to extract the best out of the cloud on your behalf: distributed/dynamically wired, elastic, multi-tenant, self-service, granularly metered and billed, and incrementally deployed and tested. Refer to Paul's blog for more details.

Fixing Vista endless reboot

Recently my Vista OS got into an endless reboot, most probably because it was restarted while installing a live update. When that happens, nothing works, including safe mode. I also figured out that if you have a Linux partition, Vista installations fail with a blue screen.

So, rule number 1: make sure you stop installing live updates automatically.

The above problem can be fixed by deleting the pending.xml file, which lists the tasks to perform when the machine boots up and is located at C:\Windows\winsxs\pending.xml. Simply boot with an Ubuntu live CD and delete it. This and this give more details.
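From the Ubuntu live CD, the steps look roughly like the sketch below. The device name /dev/sda1 and the mount point /mnt/vista are assumptions for illustration; check your actual Vista partition with `sudo fdisk -l` before mounting anything.

```shell
#!/bin/sh
# Sketch of the pending.xml fix from an Ubuntu live CD.
# /dev/sda1 is an ASSUMED device name for the Vista partition.

clear_pending() {
    # Delete pending.xml, the file Vista re-reads (and re-fails on)
    # at every boot. $1 is wherever the Vista partition is mounted.
    rm -f "$1/Windows/winsxs/pending.xml"
}

# sudo mkdir -p /mnt/vista
# sudo mount /dev/sda1 /mnt/vista   # mount the Vista partition
# clear_pending /mnt/vista
# sudo umount /mnt/vista            # then reboot into Vista
```

The mount commands are commented out because the right device varies per machine; deleting the wrong file from the wrong partition is worse than the endless reboot.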

Webinar: Making the hybrid cloud a reality

I am doing a webinar on "Making the hybrid cloud a reality" on the 12th at 10 AM PST (in an hour or so) and on the 13th at 9 AM GMT. Check here for more details.

An outline is given below.

The hybrid cloud option leverages the security of a private cloud solution and the elasticity/scalability of a public cloud. Architects designing hybrid cloud solutions need to reconcile these competing goals. With WSO2’s Cloud Services Gateway, organizations can now effectively mediate between their public and private clouds without compromising existing network firewall infrastructures.