COW 2014-10-13 Item 4D - Powerpoint Presentation Shown at Meeting - After-Action Report for Emergency Due to IT Infrastructure Failure
IT AFTER ACTION REPORT
Committee of the Whole, October 13, 2014
After Action Report Overview & Purpose
Summarize incident outage
Identify cause(s)
Review lessons learned
Discuss next steps
IT Incident Summary
September 5th - Day of Incident
3:00 a.m. City's main Storage Area Network (SAN)
failed
IT notified of problem at 4:00 a.m.
By 6:30 a.m. outage deemed to be a major failure
Affected Systems
MobileCom - police and fire in-car computers
City Email
City Websites
Eden Online Services - online utility billing, staff time sheets, etc.
TRAKiT - permitting system
Access to Personal and Shared Directories - isolated to Police
Justice RMS - PD records management system
FileONQ - PD evidence tracking
Print Servers
Incident Command Structure (ICS)
Initiated at 11:45 a.m. on day of incident
Commander Eric Dreyer, ICS commander
Other City resources deployed to assist as needed
Affected systems prioritized for action
Communications with all department directors to
ensure full knowledge of affected services
Council informed via phone call that afternoon
Restored Systems
MobileCom up and running at 3:00 p.m. September
5th
Email and websites on September 7th
Eden Online Services on September 8th
All remaining services online after data recovery on
September 15th
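The restoration timeline above can be turned into approximate outage durations. The following is a minimal sketch, not part of the original presentation. It assumes the incident year is 2014 (inferred from the meeting date) and, where the slides give only a restoration date, measures to the start of that day, so those figures are lower bounds. The names `FAILURE`, `RESTORED`, and `outage_hours` are illustrative.

```python
from datetime import datetime

# Failure time from the incident summary: 3:00 a.m. on September 5th.
# Assumption: the year is 2014, taken from the meeting date.
FAILURE = datetime(2014, 9, 5, 3, 0)

# Restoration points from the slides. Only MobileCom has a stated time;
# for the others, midnight at the start of the stated date is an assumption,
# which makes those durations lower bounds.
RESTORED = {
    "MobileCom": datetime(2014, 9, 5, 15, 0),
    "Email and websites": datetime(2014, 9, 7),
    "Eden Online Services": datetime(2014, 9, 8),
    "All remaining services": datetime(2014, 9, 15),
}


def outage_hours(restored_at, failed_at=FAILURE):
    """Return the elapsed hours between failure and restoration."""
    return (restored_at - failed_at).total_seconds() / 3600


for system, restored_at in RESTORED.items():
    print(f"{system}: about {outage_hours(restored_at):.0f} hours")
```

Under these assumptions, MobileCom was unavailable for 12 hours; the other figures depend on the actual time of day each service returned.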
Emergency Declaration
Full cost of data recovery and legal review not
complete until Wednesday, September 10th
Emergency declaration in line with previous
infrastructure incidents such as:
April, 2013 Sanitary Sewer Collapse, Andover Pk. W.
December, 2012 Stormwater Failure, E. Marg. Wy.
March, 2011 Sanitary Sewer Collapse, Andover Pk. W.
IT Incident Causes
Incident Cause: Bottom Line
A single piece of hardware, the SAN, with built-in
redundancies, partitions, physical separation via
multiple arrays, and backup power sources, failed
Certain partitions in the SAN were being used as
temporary backup repositories for some data
during transition to a new, off-site backup system
Pre-Incident Backups
Police data was backed up on the SAN in a
separate physical location within the SAN
IT is required to run two backup systems to keep
Police data separate due to Federal CJIS policy
Police data growing at a rapid rate; temporary
backup needed while transitioning to off-site
resource
Temporary Storage on SAN
Was the most reliable storage system
Separate partitions
Physically separate arrays
Built-in scheduled backup system
Redundancy
Backup power
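The underlying risk, a backup stored on the same device as the data it protects, can be checked mechanically from an inventory. The following is a minimal sketch, not the City's tooling. The `DataSet` type, the `shared_device_risks` function, and the inventory entries are hypothetical, and it assumes the primary copy of Police data also lived on the SAN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """One data set and the devices holding its primary and backup copies."""
    name: str
    primary_device: str
    backup_device: str


def shared_device_risks(data_sets):
    """Return names of data sets whose backup sits on the same device as the primary."""
    return [d.name for d in data_sets if d.primary_device == d.backup_device]


# Hypothetical inventory loosely modeled on the slides. Separate partitions or
# arrays inside one SAN still count as the same device for this check.
inventory = [
    DataSet("Police data", primary_device="SAN", backup_device="SAN"),
    DataSet("Example data with off-site backup", primary_device="SAN",
            backup_device="off-site"),
]

print(shared_device_risks(inventory))
```

The comparison is deliberately made at the device level: internal redundancy such as partitions and arrays does not help when the whole unit fails.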
Lessons Learned
Immediately evaluate and eliminate potential single
points of failure
Physical (hardware, etc.)
Off-site backups
Physical redundancies
Personnel
Cross training
Lessons Learned
Better Communications
Ensuring city leadership receives timely and accurate
information
Immediate outreach to users to fully understand effect
of outage on customers
Lessons Learned
Develop asset replacement schedule
Physical (hardware, software, etc.)
Service agreements
Ensure process in place for regular review and
updates
Lessons Learned
Ensure City-wide knowledge of and training in
Incident Command Structure
Awareness that ICS is available for all City incidents
Allows for prioritization, communication and additional
resources when necessary
Expand ICS training to non-emergency related
personnel
Ensure role for policy makers in ICS process
Next Steps
Finish Necessary IT Purchases
Potential final system design:
[Diagram: potential final system design. Recoverable labels follow.]
To be purchased:
DotHill High Performance, 6200 City Hall
DotHill High Performance, Sabey Building
Two storage groups, each labeled DotHill1 through DotHill5 and PD Storage, and the label "Mirror"
DotHill (City Backup), Sabey Building
DotHill (PD Backup), Sabey Building
Already in use:
Falconstor Continuous Data Protection Backup, 6300 Building (City Wide Backup)
Develop & Implement IT Strategic Plan
Build on the existing baseline assessment
Develop IT strategic plan that encompasses:
Role of IT and user expectations
Physical equipment and service contracts
Personnel resources and cross training
Asset replacement schedule and process
Effective IT project management
Enhanced communications
Potential budget implications
Questions