A huge amount of work is being done to meet the technological requirements of the country, which demands that disaster management and disaster recovery systems be put in place at the same time. At the Roundtable Discussion on Disaster Management and Disaster Recovery, held during the 28th Skoch Summit in New Delhi in March 2012, experts from the government and the private sector shared their experiences, spoke of present and future challenges, and suggested solutions. This is a multipronged exercise: identifying the different kinds of problems that could arise while setting up data protection, backup systems for recovery and archival systems; training the manpower required; and constantly reviewing and updating systems.
With nations worldwide dependent on computerised systems, disaster management (DM) and disaster recovery (DR) have become vital. Critical operational services and real-time business continuity serve governments and people at all levels, covering essential public services such as power supplies, banking, land records, stock markets, travel reservations, and inventories. Efficient DM and DR depend on strong IT infrastructure, backups, systematic procedures, checklists, prevention and drills, data protection, recovery and application services, the use of standard best practices, security, manpower expertise, the penal aspects, and keeping the recovery services in constant use. Disasters are of many types, ranging from natural calamities and terrorist attacks to manmade mistakes and even sabotage. Hardware and software, networks and operating systems could all be at risk. Backup plans are an absolute must when there is a technology changeover to other versions and systems.
The Roundtable was chaired by N Vijayaditya, Distinguished Fellow, Skoch Development Foundation and former Director General, National Informatics Centre, who spoke of the changes that have taken place over the last decade. “We are now taking critical applications into the computerised format and all our services are probably provided through computer systems. All our data is being stored in machines and we are not keeping any manual copies of those records. In such a scenario whether your computer, your power system or your application goes down, there are going to be a lot of repercussions on the services which you provide and the business that you are running. So it’s important for each of these applications to create some sort of DR system and DM system.”
Setting up systems
About seven years ago, Vijayaditya wanted to create a computerised archival system for the government’s voluminous records and files. “The Government of India has a very good manual archival system with procedures that tell you how to manage archival records. The whole process of screening and procedures has been well laid out.” To create a similar system in a computerised format, he looked to other countries for a suitable model to replicate here. At one of the big companies in the US, he was surprised to see that all their data was stored on backup tapes in a different location. That was a time when disk devices were expensive. “When you are talking about DM or DR systems, you are only talking about the current data.” What Vijayaditya was looking for was secure storage of old transactions that constantly increase in volume, efficient transaction times both when loading and when accessing archives, a separate centre for backup storage, and frequent testing of disaster recovery systems.
|“It is important to consider disaster recovery as a subset of business continuity planning. It should not lose its business centricity. Involvement of the top management is key to any successful programme.”
— P K Chophla, General Manager - Policy Division, Department of Information Technology, Reserve Bank of India
“At the NIC we created a backup system for the power system. We had two power lines from DESU (the then Delhi Electric Supply Undertaking), two feeder lines from two independent sources, battery backup system, more than two generators, two control points, and a common control point by which it could be switched over. When there was a flood, obviously the power system from DESU failed, the generators did not start for whatever reason, and the control panel operating the two control panels of each individual system also failed. In spite of the fact that we had created a backup, the total system collapsed as the only place where we had not created a backup system was for the control point which had burnt out. It took 24 hours to replace it. So it is very important that every segment of the system is duplicated.”
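The lesson of that outage, that one unduplicated segment can bring down an otherwise redundant system, can be checked mechanically against an inventory of components. The following is a minimal sketch, not a description of the NIC's actual tooling: the inventory is a simplified, hypothetical rendering of the power system described in the quote, and the role names and the `single_points_of_failure` function are assumptions made for illustration.

```python
from collections import Counter


def single_points_of_failure(components):
    """Return the roles that have fewer than two instances.

    `components` is a list of (role, name) pairs. A role with only one
    instance has no backup and is flagged as a single point of failure.
    """
    counts = Counter(role for role, _name in components)
    return sorted(role for role, n in counts.items() if n < 2)


# Hypothetical, simplified inventory based on the power system in the quote.
power_system = [
    ("utility_feed", "DESU line A"),
    ("utility_feed", "DESU line B"),
    ("generator", "generator 1"),
    ("generator", "generator 2"),
    ("control_panel", "panel A"),
    ("control_panel", "panel B"),
    ("common_control_point", "changeover switch"),
]

spofs = single_points_of_failure(power_system)
print(spofs)
```

A count-based check like this only catches missing duplicates. It says nothing about whether the duplicates share a common cause of failure, such as the flood in this case, which is why regular switchover testing is still needed.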
Another important aspect, Vijayaditya said, is advance preparation to handle additional loads on a system: ensuring capacity and high tolerance when running important applications, such as the massive volumes generated by school board examination results, and ensuring access to a sufficient number of experts for the software in use. “The more critical the application, the faster you want the recovery, the more you have to invest…. There are technologies available today to recover and to manage your data, but the investment depends on the application, the criticality, and the recovery times that you are looking for.” He stressed the importance of choosing the application and the particular system according to the user’s needs, and of ensuring that the people who run the system have the necessary expertise, otherwise “you are going to have a major disaster”.
|“Customers should not choose technologies first, they should choose an approach and framework to put processes in place and then the technology would naturally evolve.”
— Lakshman Narayanaswamy, Vice President-Products, Sanovi Technologies
The geographical aspects of the disaster recovery centre site are crucial. Vijayaditya emphasised the need to avoid locating the disaster recovery centre and the main centre in the same geographical space, something we tend to do in this country. There is also a need to switch operations regularly from the disaster recovery centre to the main centre, and vice versa, even when your system is running beautifully. “If you don’t change over and test the system on a frequent basis, the chances of failure are higher when you need it to work…. It is also important that the processes and procedures are continually evaluated as some may have become redundant.”
Vijayaditya framed the disaster recovery policy for the NIC and was instrumental in setting up the disaster recovery centre in Hyderabad in 2005-06. The NIC, headquartered in Delhi, has a mammoth data centre that hosts and runs a large number of websites of the various ministries and government offices including critical ones such as the prime minister’s office, the office of the President of India, and the external affairs ministry.
National e-Governance Plan
As the country progresses more and more into e-governance and e-services, huge investments are required in core infrastructure, hardware, software and human resources. Renu Budhiraja, Senior Director in the Department of Electronics & Information Technology, Ministry of Communications & IT, says, “We are setting up core infrastructures across the country for countrywide implementation of the National e-Governance Plan. Every state is going to have a data centre which is state-of-the-art. With critical applications and data, the first important aspect of DM is DR. How do we ensure that once data is available, there is no loss or there is a minimum loss of data based on the type of application?” To address this, the NIC has set up four national data centres for disaster recovery: Delhi, Pune and Hyderabad, with the fourth coming up in Bhubaneswar. Critical data and operational services are handled in such a way as to ensure that customers do not suffer. The challenges being addressed include the choice of replication technology and management technology, scalability, the space available at each centre depending on its location, setting up large-capability controllers, bandwidth, and identifying criticality. The National Knowledge Network (NKN), which is coming up, is a gigabit network that will be used for DR.
She further said, “While the shared and co-location services will continue, we are bringing cloud enablement in a limited way for a basic infrastructure and test and development environment. A new challenge arises as I have a mix and match of applications and I am not going to have the same cloud solution sitting on the other side.” As part of the entire exercise, the government is preparing support systems such as DR and strategy manuals. Also being chalked out are the roles and responsibilities in crisis management operations. While the government has put the guidelines in place, it is now pushing the states to prepare their own DR plans. Some states, such as Karnataka, Gujarat and Rajasthan, already have them. All the data centre operations, including testing of the DR procedures, are subject to continuous internal and external auditing to ensure that procedures are followed.
|“Most of the times organizations go for the best of systems. Processes are also defined but there is nobody to ensure that processes are maintained and followed.”
— M Thyagraj, Advisor, Oil and Natural Gas Corporation
|“While the shared and co-location services will continue, we are bringing cloud enablement in a limited way for a basic infrastructure. A new challenge arises as I have a mix and match of applications and I may not have the same cloud solution sitting on the other side.”
— Renu Budhiraja, Senior Director, Department of Electronics & Information Technology, Ministry of Communications & IT
| “Drills should be regularly conducted because our IT applications and processes keep changing. This is necessary so that the disaster recovery sites are aligned to the processes.”
— Nityanand Phatarphod, Executive Vice President, National Securities Depository Limited
CSR Prabhu, Deputy Director General of NIC, was in charge of the disaster recovery centre in Hyderabad when it was set up in 2005-06. “We operated our email server from Hyderabad and from Delhi. The NIC email serves the entire government. With virtualisation, instead of physical servers we have virtual servers.” Prabhu, who is also the NIC’s cloud coordinator, speaks of the differences between virtualisation and cloud in terms of infrastructure and service. “Cloud has three layers: infrastructure service, platform service and software services. In infrastructure service in a plain virtualisation scenario, the system administrator himself will create virtual servers. He has a standard procedure to create a virtual server out of a single physical server and he will administer those virtual servers himself. In the cloud case, the remote user will create the virtual server at his will. We have deployed Eucalyptus, which is open source.”
Disaster Recovery Management
Sanovi Technologies, set up more than nine years ago, is a company focused on disaster recovery management. Lakshman Narayanaswamy, its Vice President, elaborates on its IT disaster recovery services. “We enable organisations to meet their continuity goals and business continuity and IT DR goals. Our flagship product, Sanovi DRM, is in use across several organisations in India and abroad. We work at various levels, enabling organisations to deploy a plan to make their IT more operational, more resilient and recovery ready. We have learnt what works in situations and what does not work.”
Business Continuity Planning (BCP) and DR really work only with everyday management involvement. They are not just about IT, as the effort then becomes technology centric - the large part is the people and the process along with the technology. The second point is that the organisation has to set its recovery goals, agreed upon across the board, and the IT department has to work out its key metrics - from a technology perspective, the recovery point and the recovery time - as solutions are designed around these. It is important to build a culture of testing and readiness with a plan. For a successful DR, Narayanaswamy pointed to the importance of automating DR rather than hoping to have the right person do the right thing. This saves a huge amount of time and cuts down on errors and dependence on people. Reporting, analysis and constant improvement help enormously. Providing resilience rather than just recovery is important as 93% of outages are manmade: tripping over a power cord, configuration errors, hardware errors, network or application failures. A server vendor could take hours to restart operations, whereas a DR site can take over almost immediately. Sanovi’s approach is to look at the entire process as a lifecycle rather than a point product, which leads to customers enjoying higher operational efficiencies, reduced outages and industry best practices. Sanovi cites the example of HDFC Bank, which has reduced business failover by 85% as a result of better planning, implementation and coordination.
|“In their disaster recovery centres, the data is synched at midnight for consistency and concurrency of data transactions. We have a four-stage disaster recovery to ensure continuous supply of petrol, diesel, LPG and other petrochemicals. Business continuity planning should be made mandatory for all essential services.”
— S Ramasamy, Executive Director, Indian Oil Corporation
| “We have our primary datacenter at Hyderabad and we have our DR at Chennai. We are running Recovery Point Objective (RPO) of about 4 to 5 minutes and a Recovery Time Objective (RTO) of about 4 hours many times and we are testing it - mostly without much notice.”
— P Srinivas, General Manager-Delhi Zone, Andhra Bank
With the amount of data generated growing by the day, storage is becoming a major challenge. According to M Lakshmi Narayana Rao, who leads Cloud Consulting Initiatives for HP in India, this growth has a direct impact on disaster recovery centres and costs. “New technologies that are coming in actually make it easier for optimal usage of storage space.” He says that when you look at what you have in your disks, data centres and DR centres, there is bound to be a great deal of redundant data; some of it is probably not recoverable and, where it is, recovery will require a whole process. “The next wave of technologies in this area is intelligent technologies that make the whole aspect of storage more intelligent, much lighter and leaner, and something which can self-recover on its own. I am calling them cloud technologies as they were deployed in a cloud environment. The technologies that have emerged in the last five years or so have hundreds of thousands of concurrent users, so it is not easy to predict the load that’s coming in…. We need to see the emergence of cloud as a DR centre.” Narayana Rao stressed that small and medium enterprises have zero DR, except perhaps a backup of some critical components. “I think the SMEs will adapt to the cloud faster.”
K Bhaskhar, Director, Office Imaging Solution Group, Canon India, reiterates that most organisations find it difficult to retrieve archived documents effectively and easily. Technologies that offer safety but are not user friendly defeat the entire purpose. However, as new technologies improve, this is being overcome. He says that documents are often stored poorly. Since safety cannot be guaranteed, many organisations opt for multiple DR sites, an expensive proposition, which is where cloud plays a major role. Deepak Rout from Microsoft spent years with the Army’s military intelligence in the cyber security area. He is of the view that most enterprises worldwide are about to create DR and Business Continuity Planning (BCP) programmes, as mission-critical applications are the lifeline of an organisation.
P K Chophla, GM of the Policy Division of the Reserve Bank of India’s IT Department, stresses the importance of enterprise risk management across all sectors, which, he says, most organisations do not address. A technology-centric approach, he adds, has a lot of pitfalls, and most organisations do not have a BCP in place. “To have an integrated document, business has to come first because if business doesn’t tell us what are its priorities, we may keep on storing data and continue to create capacities which are not required.” With regard to the banking sector, he says that RBI has gradually brought almost all its important applications to data centres, both primary and disaster recovery, and that all banks have their DR centres. “From the technology part we get comfortable results with RTGS, the most important application of the Core Banking System…. But we want to make DR drills more rigorous.”
Identical DR is an important aspect at National Securities Depository Ltd, stated Nityanand Phatarphod, its Executive Vice President. When DR was introduced about thirteen years ago, the first approach was to ensure that they could sustain it. “We created a task force which included technology as well as business people to evaluate new implementations and new technologies to continuously improve our DR processes. Checklists, mocks and drills are always conducted. We have a policy where we shift to DR at least twice a year for a minimum of a week and all our processes are tested.” Scenarios are created for live testing, not during odd hours but by shifting to the DR site intraday when the stock market is in full operation.
DM and DR are equally vital for PSU giants such as Indian Oil and ONGC. Indian Oil was the first company in India to go for business continuity certification. According to S Ramasamy, Executive Director of Indian Oil, the company transacts 30 million rupees every minute. In their DR centres, they sync the data at midnight for consistency and concurrency of data transactions. Indian Oil has a four-stage DR and BCP to ensure continual supplies of petrol, diesel, LPG and other petrochemicals. Ramasamy suggests that business continuity planning should be made mandatory for all essential services.
M Thyagraj, ONGC’s Advisor IT on real time operations, speaks of the company’s strategies. “We divided our data into three main streams: business data, real time data and scientific data.” Scientific data is gathered through surveys all over the country to prospect for and evaluate oil and gas. It is then loaded onto supercomputers and analysed to decide where to drill. In addition to its data, DR and scientific centres, the company has created EPINET, another data centre where day-to-day data is screened and quality tested before being sent to the exploration database, so that the real time data feeds into the dynamic re-evaluation of the models.
Air India introduced its DR system back in 1996. When the company faced two disasters, one a minor fire and the other due to heavy rain, neither time nor systems were lost. Its DR centres are never idle as production applications are also in place.
P Srinivas of Andhra Bank says, “We are taking appropriate steps. We have our primary datacenter at Hyderabad and we have our DR at Chennai. We are running Recovery Point Objective (RPO) of about 4 to 5 minutes and a Recovery Time Objective (RTO) of about 4 hours many times and we are testing it - mostly without much notice. The system is such that we are able to switch over from there with ease.”
In spite of a late start, India appears to have surged ahead of many countries, including some western ones, in terms of DR technology adoption. Awareness is growing, and more organisations, including smaller ones, are likely to go in for DM and DR in the near future.
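The two targets Srinivas quotes lend themselves to a simple pass/fail check after each drill: the RPO bounds how much recent data may be lost, and the RTO bounds how long service may be down. The sketch below is illustrative only. The targets are taken from the figures quoted above (rounded to 5 minutes and 4 hours), while the drill timestamps, the `DrillResult` class and the `check_drill` function are hypothetical and do not describe the bank's actual systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    last_replicated: datetime   # last transaction safely copied to the DR site
    outage_start: datetime      # moment the primary site was taken down
    service_restored: datetime  # moment the DR site began serving users


def check_drill(result, rpo, rto):
    """Compare one drill's measurements with the RPO and RTO targets."""
    data_loss_window = result.outage_start - result.last_replicated
    downtime = result.service_restored - result.outage_start
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Targets based on the figures quoted in the article.
RPO = timedelta(minutes=5)
RTO = timedelta(hours=4)

# Hypothetical measurements from a single unannounced drill.
drill = DrillResult(
    last_replicated=datetime(2012, 3, 1, 10, 57),
    outage_start=datetime(2012, 3, 1, 11, 0),
    service_restored=datetime(2012, 3, 1, 14, 30),
)

report = check_drill(drill, RPO, RTO)
print(report["rpo_met"], report["rto_met"])
```

Recording the measured windows alongside the pass/fail flags makes it possible to see, from one drill to the next, how much margin remains before a target is missed.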