A distributed workforce with centralized data presents many challenges: NFINIT gets it!
IT departments have experienced a sudden impact from the COVID-19 virus that brought about an immediate short-term challenge of providing a solution for their employees to access applications and maintain productivity from remote locations.
This has created a chain reaction that will introduce a series of long-term hurdles to address over the months to come…
Below, we drill down into each category and showcase what IT professionals will likely face as remote users attempt to return to normal levels of productivity.
Powering your applications
Compute Resources – With global supply chain and public cloud resources impacted, are you prepared to stay agile through these circumstances?
It was like a cyber shopping spree as IT departments responded to the seemingly overnight shift of workforces. As cities, states and countries put policies in place to limit the spread of COVID-19 and the end users’ homes transformed into remote offices, business as usual changed instantly.
At this point, most IT departments have a plan in place to bridge the gap of the immediate impact of this pandemic. Here at NFINIT, our focus is on anticipating what could be coming next…
The second phase of supporting the newly distributed workforce features the adjusting and fine tuning efforts required to “magically” eliminate the impact of decoupling the applications and desktop compute environments.
Through this phase, various previously unknown dependencies will begin to surface: application codependencies, bandwidth/latency requirements, equipment limitations, etc. End users are accustomed to a certain level of performance, process and ease-of-use in their daily work routines. Creating additional stress on a user base that is trying to adjust to a new sense of normalcy will further impact productivity (especially when the users are “non-technical”).
To complicate matters, COVID-19 has placed a chokehold on the global supply chain that could force IT decision makers to utilize non-optimal choices. These choices could include redeploying end of life resources, bypassing proper risk assessments, or rushing migration plans for digital transformation projects.
The lesson we have seen play out across our 30 years of experience supporting our Clients is that every plan, no matter how well thought out, requires a back-up plan. Without the proper runway to thoroughly vet the business continuity solutions, these secondary back-up plans will prove to be invaluable.
Below are some of the challenges that NFINIT is anticipating:
- File level access issues – As the user base tries to access the files that in a “normal state” they would simply access via a shared network folder or repository and quickly pull across the LAN, the distance between user and app becomes apparent.
- Large files – These file types consist of multimedia files (video), CAD, PSBs, etc. and can represent various challenges surrounding sharing and collaborating.
- Compute infrastructure availability – As mentioned above, we are experiencing a global impact to the IT supply chain (network equipment, workstations and servers). From materials to manufacturing to logistics, delivery times are subject to guesswork at best…
- Virtual Desktop Instances (VDI) and Desktop as a Service (DaaS) – Leveraging employee owned devices (BYOD) or utilizing corporate owned legacy devices powered by purpose-built backend servers provides a good alternative to sourcing mobile workstations and laptops for end users.
- Hardware refreshes and extended lifecycles – Repurposing out-of-date hardware or extending hardware refresh cycles is an effective strategy for managing capacity, provided the hardware supports a less critical workload or is backed by an effective DR solution.
- Application and end-user distancing – For certain applications, significant parts of the user experience can be attributed to the latency between the user’s workstation and the backend application. By separating the applications and the users, now connected by unpredictable and saturated residential grade networks, performance is sure to suffer.
- Remote datacenter management – Organizations performing remote-management of in-house datacenters’ underlying infrastructure are at risk. There are various tools available for monitoring and managing these facilities from a distance, but seldom do in-house datacenters have a complete portfolio in place and even fewer have appropriately tested them.
- Datacenter management systems – Tools to monitor and manage power, HVAC and access control systems are hard to manage remotely and in a corporate datacenter environment depend highly on tribal knowledge, which can easily become a single point of failure.
- Infrastructure management – The shelter in place directives have resulted in most corporate offices being closed and unstaffed. This includes IT support personnel responsible for maintaining line-of-business network and compute infrastructure, which will result in extended response times at a minimum. In these circumstances, it is imperative that the proper level of redundancies and disaster recovery plans are in place and regularly tested. The triggering failure could be as simple as a failed power supply or the lack of a working HA (high availability) design.
- Public cloud resource contention – With the combination of the global supply chain constraints and the sudden influx of workloads migrating to the public clouds, even the largest providers will be subject to capacity constraints. (the below link defines Microsoft’s initial response to the subject)
- Tier 1 Applications – Companies that leverage public cloud as their primary hosting environment should have a hybrid or Multi-cloud strategy to address potential lack of resource availability.
- Disaster Recovery (DR) – DR plans leveraging “burstable” resources in the public cloud should be under scrutiny. If there is a widespread outage during the pandemic timeframe and service restoration becomes extended (due to what many providers are considering a “Force Majeure” event) public cloud regions are likely to be overrun. This equates to your company’s data becoming stranded in a replication target without the ability to access the corresponding compute infrastructure.
- Total Cost of Ownership (TCO) – As companies were forced to prioritize expeditious plans over well-researched projects, financial pressure inside organizations will lead them to recalculate and evaluate alternative means of service delivery.
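To make the file access and application distancing items above more concrete, here is a rough back-of-the-envelope sketch in Python. The file size, link speeds, round-trip times and number of round trips are illustrative assumptions, not measurements from any real environment, and the math ignores protocol overhead and VPN encapsulation.

```python
def transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Ideal time to move a file: megabytes * 8 bits, divided by megabits per second."""
    return size_mb * 8 / link_mbps


def chatty_app_seconds(round_trips: int, rtt_ms: float) -> float:
    """Time spent waiting on the network when an application performs
    sequential request/response round trips."""
    return round_trips * rtt_ms / 1000


# Hypothetical numbers, chosen only to illustrate the effect.
CAD_FILE_MB = 500
OFFICE_LAN_MBPS = 1000   # assumed gigabit office LAN
HOME_LINK_MBPS = 20      # assumed residential connection

print(f"LAN transfer:  {transfer_seconds(CAD_FILE_MB, OFFICE_LAN_MBPS):.0f} s")
print(f"Home transfer: {transfer_seconds(CAD_FILE_MB, HOME_LINK_MBPS):.0f} s")

# A screen that needs 200 sequential queries to render (assumed).
print(f"On the LAN (1 ms RTT):  {chatty_app_seconds(200, 1):.1f} s")
print(f"From home (40 ms RTT): {chatty_app_seconds(200, 40):.1f} s")
```

Even with generous assumptions, the same file or the same screen can take many times longer once the user and the application are separated, which is why these dependencies tend to surface only after the move.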
What has changed with accessing applications
Network Impact – As users have relocated, so have your network requirements. From the ability to transfer large files that traditionally traversed across the LAN, now across a subpar home internet connection, to a lack of network resources providing access to an in-house data center, resources are going to be in contention.
The reality is companies are now learning they do not have all the pieces in place to enable this shift.
From a high-level, these are the phases and responses that have accompanied the Work From Home (WFH) shift:
- Extension of the status-quo
- Home network considerations
- Scaling and ensuring key application access
- Clouding or SaaS-ifying commodity services: email, shared drives, video conferencing, etc
- Adapting and moving specialized or legacy applications to the cloud, or providing access to them without the need of VPN
- Cost optimization or control on the newly deployed infrastructure
Extension of the status quo – As employees left office buildings for their home offices, IT administrators’ first approach was to “extend the network”: keep the same look & feel by allowing WFH users to connect over VPN to access the office network. Multiple considerations need to be addressed:
- Will the existing equipment handle the load?
- Is the equipment properly licensed?
- Do users have the necessary devices and bandwidth at home?
- How does WFH fall into the business’ security policy or framework?
- Assistance from vendors – multiple vendors that traditionally bill by the number of VPN users alleviated some of the financial pressure on their clients by temporarily providing free licenses, expedited service, or pushing existing pandemic emergency licensing:
- Network security – Network administrators should periodically reexamine the network security concessions made under emergency circumstances, focusing especially on the access criteria for home systems:
- How does the coexistence of corporate-owned devices and IoT/unsecured home devices on the same network play into corporate security guidelines?
- Is two-factor authentication still being correctly utilized?
- How is user activity monitored/audited when they are outside corporate purview?
- How do you control multiple non-corporate users on corporate devices or BYOD with access to company assets?
- Is split tunnel VPN compatible with company security policy?
- Providing remote support – A large share of IT departments’ day-to-day activity became dedicated to helping remote non-technical users configure VPN clients, along with any required application fat-clients. This has become a challenge, as not all companies have remote access software on their systems (let alone on BYODs).
Home network optimization – as home networks’ capacity became burdened by additional WFH traffic generated by multiple simultaneous users (working households and stay at home kids represent a huge surge in residential network demand), local ISPs started experiencing outages due to this additional load. To further exacerbate the issue, all these network streams combined with those from the “cord cutting” trends, overwhelmed local WiFi and/or ISP plan capacity.
- ISP aid – The preemptive removal or expansion of data caps has been a great help for households that are now streaming throughout the day (be it for work or personal use). Special low-bandwidth plans have been added at very accessible prices (most are free) for qualifying families.
- Choose the right ISP bandwidth plan – encourage users to properly size their current plan by providing minimum bandwidth requirements for WFH applications, added to their normal personal household requirements (e.g. Zoom session 1.5-3Mbps, Softphone 0.15Mbps, Netflix stream 3-5Mbps, Disney+ 5-10Mbps, etc). Once work requirements are accurately defined, subsidizing cost delta for the additional bandwidth is a comprehensive strategy.
- Expanding helpdesk support to include home WiFi networks – Most non-technical users will not have WiFi networks that are properly sized for the number of active home devices. Locations used as a home office may not have adequate coverage. Systems requiring the best performance should be hardwired. If this is not possible, hardwiring other devices to provide relief is also a valid strategy. Another option is to increase the number of radios by leveraging an easy to deploy wireless mesh system such as Eero, Nest WiFi, Velop, or Orbi. IT helpdesks will be able to help the user navigate these dark waters.
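The plan-sizing advice above can be sketched as a small calculation. The per-stream figures are the upper bounds of the ranges quoted in the list; the household mix and the 25% headroom factor are assumptions made purely for illustration.

```python
from typing import Mapping

# Peak per-stream requirements in Mbps (upper bounds of the ranges quoted above).
PEAK_MBPS = {
    "zoom": 3.0,
    "softphone": 0.15,
    "netflix": 5.0,
    "disney_plus": 10.0,
}


def required_mbps(concurrent_streams: Mapping[str, int], headroom: float = 1.25) -> float:
    """Sum the peak demand of all concurrent streams and add headroom.

    The headroom factor is an assumed safety margin, not a vendor recommendation.
    """
    total = sum(PEAK_MBPS[name] * count for name, count in concurrent_streams.items())
    return total * headroom


# Hypothetical household: two people on video calls, one softphone,
# and two entertainment streams running at the same time.
household = {"zoom": 2, "softphone": 1, "netflix": 1, "disney_plus": 1}
print(f"Suggested minimum plan: {required_mbps(household):.1f} Mbps")
```

Comparing this figure with the user's current plan gives the bandwidth delta that a subsidy would need to cover.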
Scaling connectivity at the HQ or Data center – with the distribution of end-users, traffic that used to stay on the local LAN (workstations to back end-servers), or flow through metro-e links (HQ or branch locations to datacenter) is now traversing over the Internet. The consideration here is twofold – is there enough bandwidth to handle the load, and is the existing configuration sufficiently reliable.
- Business as usual for ISPs – account executives and engineering staff have been available to process bandwidth increases for existing circuits in a quick and efficient manner.
- City permits are hard to obtain – ISP projects requiring local government intervention or right of way (ROW) permits are taking longer than usual, or not happening. Companies seeking additional redundancy are having to conduct more due diligence to confirm infrastructure is diverse and not leveraging existing carrier fiber assets.
- 5G/LTE backup – a popular option where secondary links may be too expensive or take too long to deploy is leveraging cellular networks to supplement primary links or create a failover strategy in case of outages.
- SDWAN – For enhanced visibility, return on investment, and automation, network administrators are replacing legacy equipment with SDWAN technology, leveraging products from VMware (VeloCloud), Silver Peak, Sophos, Cisco, etc.
- Network Diversity – Administrators need to audit their network architectures to validate that the necessary redundancies are in place at the ISP and device level, then test to confirm proper configuration and that all single points of failure have been eliminated.
- NFINIT Network Passport – let NFINIT provide the necessary level of redundancy, automation, and ease of use with the lowest TCO, while making ‘everything on-net’ from cloud providers, to data centers, to SaaS platforms.
Clouding or SaaS-ifying commodity services: email, shared drives, video conferencing, etc – Utilizing hosted solutions for standard office applications can help to alleviate traffic congestion to the remaining core applications. This strategy can be very effective for in-house data centers with limited network expansion from the existing providers. This also eliminates the need for VPN services for workers that only need access to these basic systems.
Adapting and moving specialized or legacy applications to the cloud, or providing access to them without the need of VPN – Multiple approaches are being taken here, as not all applications are Cloud Tolerant or can be made tolerant in the time available to administrators. A common approach to cutting down on latency, bandwidth requirements, and VPN necessity is to provide DaaS or VDI access, where an application can be delivered as if the user were still sitting ‘next to it’. Other applications have been split, with the front-end residing in public cloud infrastructure while the back-end remains in colocation or physical environments. With this approach, public cloud direct connects greatly assist in providing a consistent user experience in terms of dedicated bandwidth, stable latency and security.
Cost optimization – All of these sudden changes represent not only a technical shift of resources but also a financial one. It is important to create a baseline for the variables that each application possesses (corporate network traffic, IOPS, storage and egress) to track what the financial ramifications are. Some of the things that should be examined are:
- Public Cloud Network Connections
Distributed endpoints in a distributed threat landscape
Security – As home-based networks and employee-owned devices (BYOD) with limited security controls become the gateway to the corporate network and centralized data, what is required to maintain an acceptable level of transparency and fortitude?
The security category of this series is not only the area that we believe represents the single greatest risk to companies during the COVID-19 governmental “stay at home” mandates, but also will be the category in which the impact will be permanent…
As IT leadership worked in haste to deploy solutions to keep their workforces productive, the dark side of IT was working equally hard to exploit what corners were cut in response to this emergency. Since the beginning of the COVID-19 outbreak, there has been an increase in the quantity and sophistication of attacks that companies are facing. Unfortunately, there is no way to completely eliminate these risks, only prescriptions for helping mitigate them.
Security basics – It may go without saying, but at a minimum IT systems should align with the basic criteria outlined below:
- Users should use passwords with sufficient complexity
- Systems should leverage two-factor authentication
- Valid SSL certificates on systems
- Keep up with software updates to avoid vulnerabilities
- Leverage a “least privilege access rights” policy
- Encrypt sensitive data at rest and in flight (see SSL comment)
- Perform vulnerability and penetration tests where able
- Log as much as possible (preferably in a centralized location for event correlation)
Security awareness – This needs to be the front lines of this battle in every company. An unaware end user clicking on a suspicious e-mail claiming to have a new vaccine for COVID-19 can instantly erase all of the investments in technology and policy.
COVID-Themed Domain Trends
This graph shows the number of domains registered per day that contain a COVID-19 related term, or something closely resembling such a term (e.g., Unicode lookalikes in punycode), in their domain name. The red line is the count of such domains that are likely malicious (‘high risk’, in DomainTools’s terminology), and the blue is domains that are likely benign (‘low risk’).
- Training and testing – Security awareness training and the subsequent testing need to be a requirement in our current state and beyond.
- Training end-users – This is a tough task, made even tougher remotely. However, it is essential to understanding the potential risks and closing the gaps. This is an area that a company does not need to shoulder alone if it is not already an active practice inside of the organization. There are plenty of tools and companies that can be leveraged that cover a wide range of depth and investment. Trainings should consist of the following:
- Overview / refresher of company policies surrounding security including idiosyncrasies specific to the company.
- Analysis of common vulnerabilities and how bad actors are exploiting them (Phishing, Credential harvesting, Social engineering, etc.)
- Procedures to follow if employees notice suspicious activity or a potential breach.
- Testing methodology – As they say, “you can’t manage what you can’t measure”. It is important that a company has a way to measure its employees’ susceptibility to attacks from bad actors. This is an iterative process that, through repetition, training, and exposure, educates employees and also provides insight to company stakeholders.
- Create a baseline – Utilizing targeted email phishing campaigns to test all of the various user groups throughout a company is imperative to establishing a baseline. Useful information includes who opened the email, who clicked on the link, who entered credentials, who identified the attack and reported it to the proper entities, who failed the test and started the necessary supplemental training, and who completed the training and passed the assessment. All of these are quantifiable metrics that paint a picture of the most vulnerable employee demographics within a company. They are presented via dynamic reporting and executive dashboards that allow for trending over time to build a cultural shift.
- Targeted tests and training – Once the problem groups are identified, additional email phishing campaigns and more specific training initiatives can be focused on these individuals.
- Rinse and repeat – The bad guys are innovating, seemingly always one step ahead. As fearful employees seek to find as much COVID-19 information as possible, they have become extra vulnerable to opening articles/emails from less reputable sources. This has led to a rapidly growing number of targeted email phishing attacks and malware infiltrations, which in turn has increased the number of reported ransomware encryptions and breaches. Keep employees up to date with the latest attacks and intrusion vectors. Establish a regular cadence to perform these tests, continuing to quantify and report.
- Reap the results – NFINIT has seen great measurable impact from organizations taking on these testing/training regimens, helped by tools that provide automation, reporting, and insight from the top security professionals.
- Below are some of the tools that are available for these purposes:
- SOPHOS Phish Threat (NFINIT’s internal tool of choice)
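As a rough illustration of the baseline step described above, the sketch below aggregates simulated phishing results by user group. It is not how Sophos Phish Threat or any other product works internally; the `PhishResult` record, the group names and the sample data are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PhishResult:
    """Outcome of one simulated phishing email for one employee (hypothetical schema)."""
    user: str
    group: str
    clicked: bool
    entered_credentials: bool
    reported: bool


METRICS = ("clicked", "entered_credentials", "reported")


def rates_by_group(results):
    """Return per-group click, credential-entry and report rates (0.0 to 1.0)."""
    counts = defaultdict(lambda: dict.fromkeys(("sent",) + METRICS, 0))
    for r in results:
        c = counts[r.group]
        c["sent"] += 1
        for metric in METRICS:
            c[metric] += int(getattr(r, metric))
    return {
        group: {metric: c[metric] / c["sent"] for metric in METRICS}
        for group, c in counts.items()
    }


# Made-up campaign results, for illustration only.
campaign = [
    PhishResult("alice", "sales", clicked=True, entered_credentials=True, reported=False),
    PhishResult("bob", "sales", clicked=True, entered_credentials=False, reported=False),
    PhishResult("carol", "engineering", clicked=False, entered_credentials=False, reported=True),
    PhishResult("dave", "engineering", clicked=True, entered_credentials=False, reported=True),
]

for group, rates in sorted(rates_by_group(campaign).items()):
    print(group, {metric: f"{rate:.0%}" for metric, rate in rates.items()})
```

Running the same aggregation after each campaign gives the trend over time that the targeted training is meant to improve.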
Work/Life relevance – With employees struggling to find the balance and segregation between work life and home life, don’t fight it here. Employees are likely to be much more engaged if they feel that the company security program is something that they can put to use in their personal dealings as well. With additional household users sharing home networks and devices, it is a win-win to widen the net in terms of security awareness.
Below are some examples of relevant topics:
- Anti-virus / malware protection for privately owned devices – Just as we have been asked to protect ourselves with social distancing, masks and gloves, we should be careful when introducing our work devices into environments occupied by others that are ‘unprotected’. Practice network distancing! Various security companies are actively assisting in this by offering free licensing for clients’ employees to use on personal devices as well as enhanced network security offerings.
- Online and Social Media best practices – Provide useful real world case studies about how bad actors are using social media information to spoof identities, or provide false information.
- Updating network and IoT devices – It is always important to be on the forefront of security. For this, most device vendors will release software updates for network and IoT devices. Users should keep up with these patches/updates as much as possible as they (normally) help decrease attack vectors.
- Company provided security equipment – Depending on the criticality of certain data or employee functions, some companies should consider providing users with security appliances that will provide enterprise-level network security to the remote worker.
Access to corporate networks and data – This move to the home office adds a new dimension to keeping corporate data secure. Data that was securely stored within the corporate office is now potentially available to users at their home (along with other cohabitants that are not employees and devices that are not company owned/managed).
- To VPN or NOT to VPN (that is the question???) – As frequently mentioned throughout this blog, VPNs have been extensively leveraged during this WFH initiative. However, for companies lacking the proper protections, VPNs can cause more harm than good as they effectively create tunnels from the unsecured home environment to sensitive corporate systems. The trend here is towards a hybrid approach in which, based on employee workflow, only the users requiring access to “VPN only” systems are able to connect.
- Data deemed to be highly sensitive can be accessed remotely with various layers of authentication and security, while less critical data can flow freely.
- Data encryption – Encrypting data is not a new concept, but where it is now being decrypted (the home) is. In the cloud age, we have a motto that every company should act as if there are only three walls around their data center. This thought process pushes companies to segregate data sets by the level of sensitivity and apply the proper level of access and protection granularly.
Home office security gaps – Not many employees are running next-gen firewalls at their homes, with URL filtering, IDS/IPS or inline AV scanning functionality. Typical home network security consists of a residential router (provided by a residential grade ISP), WiFi access point, and whatever (expired) anti-virus software came with their Best Buy purchased workstation. Today, critical data flows freely through these under-protected networks.
- The new perimeter – It is paramount that companies provide the necessary tools to protect their assets. There are multiple next-gen security packages that can be installed on employee endpoints (company owned, or BYOD). Most modern approaches to endpoint security use multi-pronged approaches to identifying, stopping, reporting, and recovering from infections.
- Signature based protection – These are the traditional anti-virus tools that rely on pre-identification of threats by a central intelligence organism. Once cataloged, signatures are periodically fed to endpoints designed to automatically alert and block when encountered in the wild. In this day and age, these protections are falling short, as modern malware tends to mutate and generate a new signature as it jumps from host to host.
- Machine-learning based protection – As malware becomes more advanced, so have the detection algorithms. These tools use behavioral information and patterns to flag and block potential malware. The drawback to this approach is that it is sometimes not immediate, while also yielding false positives.
- Ransomware protection – As mentioned above, the most advanced malware may not be stopped immediately, because systems need to experience some ill behavior before it can be flagged as such. Administrators still need protections and recovery plans against potential file-level encryption, such as air-gapped data backups.
- Data loss prevention (DLP) – Once sensitive data lives in the “new perimeter”, system administrators need to have control over how this data is handled. Data Loss Prevention systems will keep track of (or stop) the movement of sensitive data as it’s moved via email, external drive, cloud upload or other methods. Thresholds and data types are configurable to allow employees to work within their established boundaries.
- File integrity monitoring (FIM) – Keep track of what files have been touched, who touched them, and when they were touched. This virtual trail of breadcrumbs provides companies with a high level of confidence that collections of data have not been tampered with and are complete.
- Network based protections – Move the next-gen firewall to the users’ endpoint, and provide local IPS/IDS protections, URL filtering, and application monitoring/cataloging locally without the need for an external appliance.
- Out of office software updates – Once remote workers are off the corporate network, make sure the same software update expectations still apply. IT administrators should have the necessary tools in place to keep track of software versions (and to push updates) on employee devices.
- Centralized reporting and management – As employees disperse, management of all these endpoints needs to be easy for the IT administrator – they are no longer able to take a quick walk to ‘the sales pit’ and utter their signature “moooove”. A complete picture of employee endpoint health needs to be dynamic and easy to manage, in case more proactive measures are required.
- Response and transparency – No modern anti-malware system can claim to be perfect. Corporate security outfits need to have response plans in place to be able to act in the event of an attack or breach. However, with the assistance of the different systems outlined above and IT forensics tools such as Sophos’ Endpoint Detection and Response (EDR), security professionals are able to prioritize and focus their efforts accordingly.
- Additional support – It is a hard ask for the IT administrator to implement and manage these security initiatives and systems on their own. Multiple security outfits offer managed services that help sift through all generated events, perform threat hunting, use logged information to determine scope and severity of threats, identify false positives, provide root cause and remediation for incidents. These services are offered in various levels, from a basic threat hunting service, to a dedicated technical account lead, to a collection of dedicated security professionals. Examples of these services are:
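Returning to the file integrity monitoring (FIM) item in the list above, the core idea can be sketched in a few lines of Python: record a cryptographic hash of every file, then compare a later snapshot against that baseline. This is a minimal illustration using only the standard library, not a description of any commercial FIM product. It only detects what changed; capturing who made a change and when would need operating system audit logs, which are out of scope here.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(baseline: dict, current: dict) -> dict:
    """Report files added, removed or modified since the baseline snapshot."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(
            p for p in set(baseline) & set(current) if baseline[p] != current[p]
        ),
    }
```

A typical use would be to store the output of `snapshot()` somewhere the endpoint cannot modify, then run `compare()` on a schedule and alert on any non-empty result.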
Be prepared for the unforeseen
Business Continuity – It appears that most companies were scrambling to test and assess their business continuity plans as COVID-19 swept across the globe. The results were mixed, with some companies finding wide gaps in the written hypothetical processes and procedures, while others only needed to make a few sudden tweaks.
The phrase business continuity is a very broad principle, but in a quick Google search, one can find a definition as follows: “Business continuity is the ability of an organization to maintain essential functions during, as well as after, a disaster has occurred”. The challenge in the real world of executing a business continuity plan is that these “essential functions” do not reside in any single department, application or data set.
Most business continuity plans are tasked to the IT department to design how the “business as usual” processes and procedures will be translated into the virtual world once a “disaster” event is declared. This makes sense as the majority of the workflows are based on technical systems and applications that are delivered and maintained by this department. Where this breaks is when a company expects the IT administrators to understand the business criticality and department based end user workflows without involving these elements in the construction of the plan. This is where the eye of the COVID storm sent businesses and their associated IT resources scrambling as the workforce shifted to remote locations.
Companies rely on Business Continuity Plans (BCP) to identify essential LoB applications, potential risks, and the necessary policies, people and procedures that need to be in place to avoid unrecoverable loss.
- BCPs are normally a product of:
- BIA – Business Impact Analysis
- Differentiates urgent and non-urgent functions/activities
- RPO and RTO assigned to each critical function
- TRA – Threat and Risk Analysis
- Each potential threat is identified and recovery steps are defined (earthquake, sabotage, hardware failure)
- BCPs will also include a Disaster Recovery Plan (DRP) that dictates the procedures to be followed in order to recuperate IT services after a disaster.
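The RPO and RTO assignments mentioned above lend themselves to a simple sanity check. RPO (recovery point objective) is the maximum tolerable data loss, and RTO (recovery time objective) is the maximum tolerable downtime. The sketch below, with entirely hypothetical functions and numbers, flags cases where the backup schedule or the last tested restore time cannot satisfy the targets assigned in the BIA.

```python
from dataclasses import dataclass


@dataclass
class CriticalFunction:
    """One line of a hypothetical BIA worksheet (all values in hours)."""
    name: str
    rpo_hours: float              # maximum tolerable data loss
    rto_hours: float              # maximum tolerable downtime
    backup_interval_hours: float  # how often data is actually protected
    tested_restore_hours: float   # restore time measured in the last DR test


def find_gaps(functions):
    """Return a human-readable finding for every objective that cannot be met."""
    findings = []
    for f in functions:
        if f.backup_interval_hours > f.rpo_hours:
            findings.append(
                f"{f.name}: backups every {f.backup_interval_hours}h "
                f"cannot meet an RPO of {f.rpo_hours}h"
            )
        if f.tested_restore_hours > f.rto_hours:
            findings.append(
                f"{f.name}: tested restore of {f.tested_restore_hours}h "
                f"exceeds the RTO of {f.rto_hours}h"
            )
    return findings


# Made-up example data.
plan = [
    CriticalFunction("order entry", rpo_hours=1, rto_hours=4,
                     backup_interval_hours=24, tested_restore_hours=6),
    CriticalFunction("email", rpo_hours=24, rto_hours=24,
                     backup_interval_hours=12, tested_restore_hours=4),
]

for finding in find_gaps(plan):
    print(finding)
```

The point of the exercise is that RPO and RTO are only meaningful when compared against what the infrastructure has actually been shown to deliver.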
Many facets to the execution of a BCP require the following items to be addressed appropriately:
- Critical application design
- Infrastructure assessment
- Workflow review
- Human resource availability
The majority of business continuity plans are based completely on the 1st step above, critical application design. When put to the test, the application may still work, but now, the end users cannot access it while maintaining proper company security standards… Now what???
Below we will drill into each of these categories to provide an overview of either what we have found to be lessons learned or secondary countermeasures across our clients’ experiences with this real world disaster.
Critical application assessment – A prerequisite to this step typically is for a formal Business Impact Analysis (BIA) to be conducted. The outcome of the BIA delivers quantifiable value of each application to the business. Based on these values, technical teams can now design the appropriate level of redundancy into the supporting ecosystem of the application. This requires a comprehensive look into all of the dependencies that feed into a critical application and prioritizing them as such.
- Incomplete application picture – Non-technical users may label unused applications as non-essential; however, they may be dependencies of other applications. As an example, users may see SQL Server installed on a workstation, but assume it is not used as they never need to directly interact with it. In reality, the LoB application uses it to store data, and it is required for the primary application to function.
- Identifying dependencies – There are various tools available that will allow you to map all of the network connections between the various VMs that support an application as well as monitor the data flows between other applications. Certain tools will even provide the throughput being utilized between these assets to capture a complete picture.
- Below are links to a few of these types of tools (Most of them offer a free trial period that will allow for a snapshot of the current environment):
Infrastructure audit – This item covers everything tangible. Understanding the capabilities of your facilities, telecommunications (data and voice), compute infrastructure and applications is imperative, as this becomes the foundation for everything else. As we mentioned in the first part of this blog series, an event can change the way we access applications as well as the support behind the infrastructure that hosts them.
- Physical security – With fewer personnel on site, we have heard from customers about break-ins at their facilities. In one particular case, the client received a monitoring alert signaling a loss of power, so they called the power company while they made their way down to the building. To their surprise, the perpetrator had flipped the breakers in an attempt to shut off the security systems. Had the customer tied their camera and physical alarm systems into their IT monitoring, the first call would have been to the local police department. We will leave unmentioned what a review of the security camera footage revealed about the perp’s use of a screwdriver. Coordinating all of the remote monitoring elements into a holistic view is necessary to get an accurate picture of what is really happening in the data center environment. This customer, in particular, is now looking into the following:
- Expanding door sensor utilization
- Remote leak detection
- Remote temperature and humidity sensors
- Adding visual and audio monitoring through additional cameras with movement alerts
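To illustrate the break-in story above, here is a minimal sketch of how tying physical sensors into IT monitoring could change the first phone call. Everything in it is an assumption made for illustration: the `Alert` type, the source names ("power", "door", "camera_motion") and the ten-minute correlation window are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    source: str          # hypothetical source name, e.g. "power", "door", "camera_motion"
    timestamp: datetime  # when the monitoring system raised the alert


# Hypothetical set of sources that indicate someone is physically on site.
PHYSICAL_SOURCES = {"door", "camera_motion"}


def first_call(alerts, window=timedelta(minutes=10)):
    """Suggest who to call first when a power alert fires.

    If a door or camera alert occurred within `window` of the earliest
    power alert, treat the outage as a possible intrusion.
    """
    power_times = [a.timestamp for a in alerts if a.source == "power"]
    if not power_times:
        return "none"
    earliest = min(power_times)
    for alert in alerts:
        if alert.source in PHYSICAL_SOURCES and abs(alert.timestamp - earliest) <= window:
            return "police"
    return "power company"


if __name__ == "__main__":
    outage = datetime(2020, 4, 1, 2, 0)
    # Power alert alone: looks like a utility problem.
    print(first_call([Alert("power", outage)]))
    # Door sensor tripped three minutes earlier: looks like a break-in.
    print(first_call([Alert("door", outage - timedelta(minutes=3)), Alert("power", outage)]))
```

A real deployment would correlate many more signals (leak, temperature, humidity), but the idea is the same: a single alert viewed in isolation can point to the wrong responder.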
- Disaster Recovery – It is important to prepare for the eventuality that the IT infrastructure upon which a business critical application is running may fail or be destroyed (cyber crime, water damage, fire, etc.). In these situations, backups may exist at a secondary location, but does that location have the appropriate compute/storage/network facilities to run the application? In order to have a complete DR strategy in place, IT administrators need to make sure that:
- Whatever replication method or application is chosen can replicate the data correctly (no data loss) and within the required RPO/RTO for the application.
- There are sufficient compute resources at the DR site to support application requirements.
- Network access to the DR site is configured as necessary (link into existing MPLS cloud, sufficient bandwidth, sufficient public IPs, etc.).
- DR environment tests or failover drills are performed frequently to validate proper function.
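As a rough sketch of how a failover drill could be scored against an application's targets, the snippet below compares what was measured during a drill with the agreed RPO and RTO. The `DrillResult` fields and the sample numbers are assumptions for illustration only; your own BIA determines the real targets.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrillResult:
    application: str
    rpo_target: timedelta     # maximum tolerable data loss
    rto_target: timedelta     # maximum tolerable downtime
    data_loss: timedelta      # age of the newest recoverable data at failover
    recovery_time: timedelta  # time taken to bring the application up at the DR site


def evaluate(drill):
    """Return the list of targets that the drill missed (empty if none)."""
    failures = []
    if drill.data_loss > drill.rpo_target:
        failures.append("RPO missed")
    if drill.recovery_time > drill.rto_target:
        failures.append("RTO missed")
    return failures


if __name__ == "__main__":
    # Hypothetical drill: replication lagged 30 minutes against a 15-minute RPO.
    drill = DrillResult(
        application="LoB application",
        rpo_target=timedelta(minutes=15),
        rto_target=timedelta(hours=4),
        data_loss=timedelta(minutes=30),
        recovery_time=timedelta(hours=2),
    )
    print(drill.application, evaluate(drill) or "all targets met")
```

Recording these numbers on every drill makes it easier to notice when growth in data volume or a change in bandwidth quietly pushes an application outside its targets.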
- Backups – These come in multiple flavors, but the recommendation is to follow the 3-2-1-0 rule: 3 copies of your data, 2 different locations, 1 read-only copy (immutable/air gapped), 0 backup errors.
- Back-end systems – can be captured via backup tools such as Veeam, Commvault, Avamar, etc. It is important to select a backup tool and method that allow proper and timely capture of data. There is no sense in taking an approach that captures your database in a dirty state or takes longer than the allotted backup window.
- Workstations – as mentioned above, workstations are being exposed to less secure networks, increasing the probability that they will experience some sort of issue. Backup agents can be installed on end-user workstations to provide periodic backups of complete OS configurations and locally saved work files. These systems can potentially allow users to self-recover individual files, or perform a complete system recovery in case of a more catastrophic issue.
- SaaS platforms – these are not impervious to failure, and your users can still make mistakes and accidentally erase important information. Not all SaaS platforms provide periodic backups (and a recovery service) for user data, so it is up to the IT administrator to control their own destiny. A commonly utilized platform that suffers from this shortcoming is Office 365 Email/SharePoint/OneDrive.
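The 3-2-1-0 rule as described above lends itself to a simple automated check. The sketch below is one possible way to express it; the `BackupCopy` fields are hypothetical, and it assumes the list passed in contains every copy of the data set you want counted toward the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str    # hypothetical label, e.g. "primary DC", "DR site", "cloud vault"
    immutable: bool  # True if the copy is read-only / air gapped
    errors: int      # errors reported by the last backup or verification job


def check_3_2_1_0(copies):
    """Evaluate each part of the 3-2-1-0 rule for one data set."""
    return {
        "3 copies": len(copies) >= 3,
        "2 locations": len({c.location for c in copies}) >= 2,
        "1 read-only copy": any(c.immutable for c in copies),
        "0 backup errors": all(c.errors == 0 for c in copies),
    }


if __name__ == "__main__":
    copies = [
        BackupCopy("primary DC", immutable=False, errors=0),
        BackupCopy("DR site", immutable=False, errors=0),
        BackupCopy("cloud vault", immutable=True, errors=2),
    ]
    for rule, passed in check_3_2_1_0(copies).items():
        print(f"{rule}: {'OK' if passed else 'FAILED'}")
```

In this sample the first three checks pass and only the error check fails, which is exactly the kind of gap that tends to go unnoticed until a restore is needed.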
Workflow mapping – Above we have discussed the need to have a comprehensive roadmap of technical application dependencies and account for how standalone applications share data. The more complicated aspect of supporting workloads in a crisis is understanding the non-technical workflows and validating that the end users have access to data sets that are not technically integrated.
- Legacy applications – An application holding data that an end user needs to complete a specific business process may not be available, causing the process to stall.
Human resource availability – The most commonly overlooked aspect of the business continuity plan is the human resources element. If a great technology plan allows an application to fail over while meeting RTO and RPO business requirements, does it take into account the possibility that the designated IT staff will not be available to play their role during a disaster?
Maintaining the business requires failover plans for employees as well. Whether the resources are internal or external, proper documentation of processes and procedures can save the day. Best practice is to identify individuals or third-party support systems and engage them in routine testing before they need to react to a real disaster scenario.
What areas of a business need to be considered?
- Technical staff – Business continuity depends highly on the availability of human resources with tribal knowledge. During this pandemic, especially in the hard-hit regions, entire workgroups can become exposed and unavailable. If a company only employs a single network engineer who can redirect the end user traffic to allow employees to connect remotely, what happens if that individual gets sick?
- Facilities personnel – These roles are instrumental in maintaining in-house data centers that have been vacated by the now-remote IT team. The support of these assets typically overlaps, and with proper training and documentation, Facilities and IT can supplement each other.
- Authorized signer – Designate additional individuals with the authority to procure the goods and services needed to recover in an emergency.
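One way to surface the "single network engineer" problem is to list each critical task alongside the people who can perform it and flag anything without a backup. The sketch below assumes a hand-maintained mapping; the task names, roles and the minimum of two people are illustrative assumptions, not a standard.

```python
def single_points_of_failure(coverage, minimum=2):
    """Return the tasks that fewer than `minimum` distinct people can perform."""
    return sorted(
        task for task, people in coverage.items()
        if len(set(people)) < minimum
    )


if __name__ == "__main__":
    # Hypothetical coverage map: critical task -> people trained to do it.
    coverage = {
        "redirect remote-access traffic": ["network engineer"],
        "restart data center cooling": ["facilities lead", "IT on-call"],
        "approve emergency purchases": ["CFO"],
    }
    for task in single_points_of_failure(coverage):
        print("No backup person for:", task)
```

Reviewing a list like this during routine BCP testing gives you a concrete place to direct cross-training and documentation efforts.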
Now that we have painted the picture of all of the challenges that could surface, you are probably waiting for the hook of how to seamlessly fix them all. Well, you are going to have to keep waiting for it…
If we were to tell you that we could solve all of these issues without in-depth knowledge of your applications and infrastructure, we would be either lying or ignorant.
What we can do is provide you access to our seasoned team of experts, who have the unique perspective of designing and supporting projects for hundreds of clients. With this engagement, our resources will roll up their sleeves to gain insight into your specific environment, and provide you with valuable input to help navigate through these uncharted waters.
Give us a call at 1-866-971-2656, or drop us an email to set up a consultation!