by Sian Haynes and Stilianos Vidalis
University of Wales, Newport
Data Grids and Computational Grids are quite similar in that both are used to manage and analyse data. With technology developing at such a dramatic rate, an average computer cannot cope with the amount of data or the calculations it is asked to perform. For example, if a scientist researching cancer cells and their development used conventional methods, i.e. a single machine, the calculations could take years to complete. If a grid were used instead, its combined computational power could significantly reduce that time frame. Likewise, a complicated data set could take a standard computer a few days or even weeks to analyse, whereas a grid could take considerably less time because it would harness the computational power available across its nodes, parallelise the load and allow the calculations to be performed with a short turnaround time.
Knowledge Grids are self-explanatory; their purpose is to share knowledge. In this day and age we have come to a point where we are using computers to create vast amounts of data. The information overload is so great that human beings are not able to analyse that data in a timely manner and extract the much sought-after knowledge that will allow us to further science and better our lives. We are at a point where we now have to teach computers how to extract knowledge from raw data.
Going back to the HPC Wales Project: the project is set across a minimum of nine sites, including Swansea, Cardiff, Aberystwyth, Bangor, Glamorgan, Swansea Met, Newport and Glyndwr, along with a range of other sites. The grid will allow all of the sites to share and distribute resources freely. A number of pilot applications will be sponsored via HPC Wales to test the capabilities of the grid; in Newport, for example, we are considering G4CP.
Grid for Crime Prevention
The Grid for Crime Prevention, also known as G4CP, will stimulate, promote and develop horizontal methods and tools for strategically preventing and fighting cyber-crime and guaranteeing security and public order in the Welsh cyber-space. Furthermore, G4CP will promote a coherent Welsh strategy in the fields of cyber security through the exploitation of the project’s artefacts, and will play an active role in the establishment of global standards in the areas of cyber-crime prevention, identification and prosecution.
The centre of gravity of G4CP is to design and implement an application that promotes collaborative working, using data fusion and data mining techniques, and allows knowledge discovery from raw security incident data. The application was originally called the Inter-Organisational Intrusion Detection System (IOIDS).
Defending the European cyber-space against organised cyber-crime is a complex problem. One difficulty is that companies are afraid to report incidents because they feel their reputation will be damaged. This means that many private organisations and law enforcement agencies are forced to face cyber-crimes with next to no help from other organisations in the same supply chain. The G4CP team felt that there was a need for the defenders of the European Information Infrastructure to come together to form a number of virtual communities in order to take action collectively against the perpetrators of cyber-crimes and to promote a culture of security amongst and across the members of these communities. These communities should allow for secure information sharing and help organisations to be proactive in defending their networks against ongoing cyber attacks.
G4CP will make grid technology attractive to establishments across the cyber-crime fighting field. It will help the uptake of grid-type architectures and extend their concept from computational grids to knowledge grids.
Figure 1 – G4CP Centre of Gravity
The application developed by G4CP would effectively police cyberspace and minimise the threats against computing infrastructures. This will promote a coherent European strategy in the field of crime prevention through the exploitation of the project's products, and will play an active role in the establishment of global standards in the area of crime prosecution through the dissemination of the project's results.
G4CP raises a lot of questions around grid management. For example, if a user wanted to know about Denial of Service attacks against web servers in Wales, what would happen to the result once they had received it?
The knowledge would be stored so that, if another user needed it, the grid could simply locate it and deliver it to that user instead of using up resources running the same query again. However, if this were done for every search, the system could become overloaded with information and slow down, because the storage would quickly run out. Storing every query result would take thousands of terabytes at least. One practical solution would be to keep the information available for a short period of time and, if the query was not requested again during that time frame, to delete the stored result. If the query occurs again outside of the time frame, the result will be generated again and held on the server. Meeting the demand for information by harnessing the power of the grid to deliver information and knowledge faster is the key.
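The expiry policy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the G4CP implementation: the class name ExpiringQueryCache, the ttl_seconds parameter and the compute callback are all hypothetical, and the stored results live in an in-memory dictionary rather than on grid storage.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ExpiringQueryCache:
    """Hypothetical cache: a result is kept only while its query is still
    being requested within the chosen time frame."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # query text -> (time of the most recent request, stored result)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, query: str, compute: Callable[[str], Any]) -> Any:
        """Return the stored result, or run compute(query) if none is held."""
        now = self._clock()
        self._evict_stale(now)
        entry = self._entries.get(query)
        result = compute(query) if entry is None else entry[1]
        # Every request restarts the time frame for this query.
        self._entries[query] = (now, result)
        return result

    def _evict_stale(self, now: float) -> None:
        """Delete results that were not requested during the time frame."""
        stale = [q for q, (last_request, _) in self._entries.items()
                 if now - last_request > self.ttl_seconds]
        for q in stale:
            del self._entries[q]

    def __len__(self) -> int:
        return len(self._entries)
```

A caller would create one cache, for instance ExpiringQueryCache(ttl_seconds=3600), and pass each query together with the function that runs it on the grid. The length of the time frame is the design choice that trades storage against repeated computation.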
Managing a Knowledge Grid
In a communication network, a node is a connection point, either a redistribution point or a communication end point. However, the definition of a node depends on the network and protocol layer referred to. The main goal of grid management is to measure and publish the state of resources at a particular point in time. To be effective, monitoring must be done end to end, meaning that the entire environment and all of its components must be monitored.
Understandably, this is no easy task. Take HPC Wales, for example: it will provide G4CP with 1400 nodes set across 9 different sites. Managing all the components required to control that grid and its environment is a huge task in its own right.
There has to be some form of security and authentication on the grid to ensure that users are accessing only the material which is appropriate to them. Two methods can be used together to control and monitor users on the grid and their security: public and private key cryptography, and X.509 certificates.
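To give a feel for what "measuring and publishing the state of resources at a particular point in time" could look like, here is a minimal Python sketch. It assumes that each node reports whether it is online and how loaded it is; the NodeState record, its fields and the summarise_by_site function are hypothetical names chosen for illustration and do not come from HPC Wales or G4CP.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class NodeState:
    """Hypothetical snapshot of one node at a single point in time."""
    site: str
    node_id: str
    online: bool
    cpu_load: float  # fraction of capacity in use, 0.0 to 1.0


def summarise_by_site(states: Iterable[NodeState]) -> Dict[str, Dict[str, float]]:
    """Roll per-node snapshots up into one publishable record per site."""
    totals: Dict[str, Dict[str, float]] = {}
    for state in states:
        site = totals.setdefault(
            state.site, {"nodes": 0, "online": 0, "load_sum": 0.0})
        site["nodes"] += 1
        if state.online:
            site["online"] += 1
            site["load_sum"] += state.cpu_load

    report: Dict[str, Dict[str, float]] = {}
    for name, site in totals.items():
        # Mean load is taken over online nodes only; 0.0 if none are online.
        mean_load = site["load_sum"] / site["online"] if site["online"] else 0.0
        report[name] = {"nodes": site["nodes"],
                        "online": site["online"],
                        "mean_load": mean_load}
    return report
```

With 1400 nodes the same roll-up would simply run over a longer list of snapshots; the hard part in practice is collecting those snapshots reliably from every site, which this sketch does not attempt.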
Public and private key cryptography is used regularly in many different kinds of computing projects and environments as a secure authentication method. The main reason it is still in use is that it helps identify the true author of a piece of information. For example, if Sian wanted to send a message to Stelios, she would encrypt it with her Private Key. When Stelios received the message, he would unlock it with Sian's Public Key and read it; because only Sian's Private Key could have produced a message that her Public Key unlocks, he knows it is from Sian.
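The exchange between Sian and Stelios can be illustrated with a toy RSA signature in Python. This is a teaching sketch only: the primes are the tiny textbook values 61 and 53, the message is reduced to a small number before signing, and none of it is secure. A real grid would rely on a vetted cryptographic library and keys of thousands of bits. The function names sign and verify are chosen for illustration.

```python
import hashlib

# Toy textbook key pair for Sian (far too small to be secure).
P, Q = 61, 53
N = P * Q        # 3233, the modulus shared by both keys
E = 17           # public exponent:  (N, E) is Sian's Public Key
D = 2753         # private exponent: (N, D) is Sian's Private Key
                 # E * D = 46801, which leaves remainder 1 modulo (P-1)*(Q-1) = 3120


def digest(message: bytes) -> int:
    """Reduce the message to a number smaller than N (toy simplification)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sian transforms the digest with her Private Key."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Stelios undoes the transformation with Sian's Public Key and checks
    that it matches the digest of the message he received."""
    return pow(signature, E, N) == digest(message)


if __name__ == "__main__":
    note = b"Meet at the Newport node room"
    signature = sign(note)
    print("signature accepted:", verify(note, signature))
```

Only the holder of the private exponent can produce a value that the public exponent turns back into the message digest, which is what lets Stelios attribute the message to Sian.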
There are flaws in public and private key cryptography, as there are in all methods of authentication. Nevertheless, this method of secure authentication is put in place as a ‘contract’ of trust between the user and the manager of the grid.
To ensure that the grid is used only from appropriate applications or web browsers, an X.509 certificate could be issued to the authorities who use the grid. X.509 is a standard for public key infrastructure, used for single sign-on, and for privilege management infrastructure. It specifies (amongst other things) standard formats for public key certificates, certificate revocation lists and attribute certificates, as well as a certification path validation algorithm.
If we go back to the example of G4CP, we could manage the users on the grid via an X.509 certificate (similar to an E-Science Certificate). For example, when a user joins (or attempts to join) the G4CP, they will have to meet a member of the G4CP authority management team, who will run through an application process to identify whether they have a need to access the grid. If they are suitable, an X.509 certificate with the correct permissions for accessing the grid could be issued to them. The certificate will have permissions built in stating which sections they are allowed to access, and IP rights stating that it can only be installed on one computer. There will also be an expiration date on the certificate, so the user would have to re-apply for a certificate close to the time of expiration. Therefore, if a user no longer wishes to have access to the grid, or no longer has a need for it, there won’t be any rogue accounts left behind which could become vulnerable and be used maliciously.
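The access rules described above (permitted sections, one computer per certificate, and an expiration date) can be sketched as a simple check in Python. This does not parse a real X.509 certificate; the GridCertificate record, its field names and the may_access function are hypothetical stand-ins for whatever the issued certificate and the grid's policy engine would actually contain.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class GridCertificate:
    """Hypothetical summary of the fields the grid would read from a certificate."""
    subject: str                         # registered owner
    permitted_sections: FrozenSet[str]   # sections of the grid the owner may use
    bound_host: str                      # the one computer it may be installed on
    not_after: datetime                  # expiration date


def may_access(cert: GridCertificate, section: str, host: str,
               now: datetime) -> bool:
    """Allow access only if the certificate is unexpired, is presented from
    the computer it was issued for, and covers the requested section."""
    if now > cert.not_after:
        return False   # expired: the user must re-apply
    if host != cert.bound_host:
        return False   # installed on a different computer
    return section in cert.permitted_sections


if __name__ == "__main__":
    cert = GridCertificate(
        subject="Example User",
        permitted_sections=frozenset({"incident-reports"}),
        bound_host="workstation-01",
        not_after=datetime(2030, 1, 1, tzinfo=timezone.utc),
    )
    print(may_access(cert, "incident-reports", "workstation-01",
                     datetime.now(timezone.utc)))
```

Because an expired certificate fails the first check, an account that is never renewed stops working without anyone having to remove it by hand.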
Figure 2 – E-Science Certificate (X.509)
The above figure is a typical E-Science certificate. It shows the registered owner and the expiration date for the certificate.
The IOIDS subsystem must be connected to the underlying communication platform, G4CP, in order to allow integration with other platforms; this will also be handled by a dedicated module. A dispatcher will process the incoming messages sent over the Grid for Digital Security.
How to manage the nodes
Managing the users has been covered, but what can you do to manage the nodes? As previously mentioned, G4CP could be using up to 1400 nodes running across 9 different sites. If one of the sites goes down and there is no one there to manage it, what happens when a user needs access to a specific piece of information held there?
There are several software packages available which can be used to manage the nodes on a network. One specific software package is Conga; it is an integrated set of software components that provides centralised configuration and management of clusters and storage.
Its features include a single web interface for managing clusters and storage, automated deployment of cluster data and supporting packages, easy integration with existing clusters with no need to re-authenticate them, integration of cluster status and logs, and fine-grained control over user permissions.
This software manages the clusters and the nodes of the grid without altering the user permissions already in place. The user permissions would be set by the authority figure or manager of the grid, using the public and private key infrastructure combined with the X.509 certificates which manage the users already on the grid. The expiration date of the certificate could be set by the authority figure after an interview. The length of time could vary depending on the relevance of the knowledge grid to the user.
The primary components in Conga are luci and ricci, which can be installed separately. Luci is a server that runs on one computer and communicates with multiple clusters and computers via ricci. Ricci is an agent that runs on each computer to be managed by Conga.
Luci can manage the nodes on the grid; its administrative menu offers the following options: make a node leave or join a different cluster, fence a node, reboot a node and delete a node.
All these options would be extremely helpful in managing the grid, especially if its nodes are spread worldwide. For example, if a node broke in Germany and the main administrative authority was based in Wales, it could take anywhere from a few hours to a few days either to instruct someone on site how to fix it or to send someone to Germany to fix it. With Conga, by contrast, everything is done over the network via remote login, which means that the node can be managed from Wales (or wherever the headquarters is) almost instantaneously and with next to no disruption.
Figure 3 – Conga GUI
The above image (Figure 3) shows a screen from Conga running on Red Hat, which is a Linux-based operating system.
What does this mean?
The High Performance Computing Wales (HPC Wales) project will mainly be based in Swansea and Cardiff, which will then be connected to the remaining sites. This means that each site will have a taste of the high performance computing power that can be used for research.
Grids are developing every day and are predicted to become the ‘norm’ in the future, much as the internet is now widely used and spread across the world. Grids will give scientists and researchers the power to get results faster and to expand the knowledge they already have. The demand for information is increasing, as is the speed at which that information is expected.