Best Practices in Incident Response Automation

Earlier this year, we looked at the key phases of the incident response process and highlighted the importance of digital forensics and incident response (DFIR) as an essential complement to today’s detection and prevention tools. Automation in incident response is a natural evolution of automation in security and one of the keys to effective DFIR; however, the term “automation” itself is broad and contextually loaded, ranging from API calls for information-gathering to automated network containment and account lockout. There are several key use cases and best practices to consider when applying automation to an incident response process to ensure efficiency and effectiveness.

No argument automation
Automated incident response is deeper and more nuanced than simply taking an IP in an operating system change report and blocking access from or to each reported IP. Automation should start with understanding what incident responders do to accomplish their tasks and could be as simple as looking at the source and destination of an attack to add context to those data points. Is the attack aimed at a key department such as Finance, at a source code server, or even targeting the CFO or entire executive suite? Is the targeted system in fact infected? Is the attack coming from a country that you do not do business with; from a known command and control server; from an IP range on an intelligence list; or using malware that is detected by a handful or majority of antivirus tools? Manually collecting and integrating security alert source data, organizing datasets, and connecting the dots between this and other information can take hours for a single incident, and it carries a greater risk of error because it relies on subjective human analysis.
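
As a rough illustration of this kind of front-end enrichment, the Python sketch below attaches context to an alert's source and destination. Everything named here is hypothetical: the asset list, the intelligence range, the command and control address, the list of business countries, and the enrich function are placeholders for whatever asset inventory, threat-intelligence feeds, and GeoIP lookups an organization actually uses.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical reference data. In practice these would be loaded from an
# asset inventory, a threat-intelligence feed, and a GeoIP service.
CRITICAL_ASSETS = {
    "10.1.20.5": "Finance file server",
    "10.1.30.9": "source code server",
}
INTEL_RANGES = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range
KNOWN_C2 = {"198.51.100.77"}
BUSINESS_COUNTRIES = {"US", "CA", "GB"}


@dataclass
class EnrichedAlert:
    src_ip: str
    dst_ip: str
    context: list = field(default_factory=list)


def enrich(src_ip, dst_ip, src_country, av_hits, av_total):
    """Answer the basic triage questions and record each finding as context."""
    alert = EnrichedAlert(src_ip, dst_ip)
    if dst_ip in CRITICAL_ASSETS:
        alert.context.append(f"target is a critical asset: {CRITICAL_ASSETS[dst_ip]}")
    if src_ip in KNOWN_C2:
        alert.context.append("source is a known command and control server")
    addr = ipaddress.ip_address(src_ip)
    if any(addr in net for net in INTEL_RANGES):
        alert.context.append("source is in an IP range on an intelligence list")
    if src_country not in BUSINESS_COUNTRIES:
        alert.context.append(f"source country {src_country} is outside normal business")
    if av_total and av_hits:
        share = "majority" if av_hits * 2 > av_total else "handful"
        alert.context.append(
            f"malware detected by a {share} of antivirus tools ({av_hits}/{av_total})"
        )
    return alert


if __name__ == "__main__":
    for line in enrich("203.0.113.10", "10.1.20.5", "ZZ", 3, 60).context:
        print(line)
```

Keeping each finding as a separate line of context leaves the decision with the responder; the automation only removes the manual lookups.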

All of these questions can be answered by implementing automation at the front end of your incident response process. Some organizations with more extensive software development teams will move beyond the collection of information above, and will license and/or integrate graphics packages to visualize their data.  While both the collection and visualization steps can and do save hours and reduce frustration, one common pitfall is underestimating the design and implementation process for connecting the data, building out the data schema, and visualizing the right data for decision making.

Automation on existing and generated data
Incident responders not only collect and organize data around incidents; they also need to analyze that data and often recommend protective actions. Before they can make recommendations, the datasets they collect are often analyzed and processed more deeply to reduce duplicate effort, reduce false positives, identify attacker patterns, identify targeting patterns, and more. When these datasets combine targeted endpoint system data with internal and external context and intelligence, a dossier or profile of the incident can be created, delivering a situational awareness that enables priority setting and ordered responses.
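
One way to picture the dossier idea is to collapse repeated alerts for the same source and target into a single profile and then order the profiles for review. The sketch below is only an illustration: the alert fields (src, dst, signature, critical_target) and the ordering rule are assumptions, not a standard schema.

```python
from collections import defaultdict


def build_dossiers(alerts):
    """Group raw alerts by (source, target) so duplicates collapse into one profile."""
    dossiers = defaultdict(
        lambda: {"count": 0, "signatures": set(), "critical": False}
    )
    for alert in alerts:
        profile = dossiers[(alert["src"], alert["dst"])]
        profile["count"] += 1
        profile["signatures"].add(alert["signature"])
        # Hypothetical flag set by an earlier enrichment step.
        profile["critical"] = profile["critical"] or alert.get("critical_target", False)
    # Assumed priority rule: critical targets first, then higher alert volume.
    return sorted(
        dossiers.items(),
        key=lambda item: (item[1]["critical"], item[1]["count"]),
        reverse=True,
    )


if __name__ == "__main__":
    sample = [
        {"src": "203.0.113.10", "dst": "10.1.20.5", "signature": "trojan-x", "critical_target": True},
        {"src": "198.51.100.7", "dst": "10.2.0.8", "signature": "scan"},
        {"src": "198.51.100.7", "dst": "10.2.0.8", "signature": "scan"},
        {"src": "198.51.100.7", "dst": "10.2.0.8", "signature": "brute-force"},
    ]
    for key, profile in build_dossiers(sample):
        print(key, profile["count"], sorted(profile["signatures"]), profile["critical"])
```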

Two key best practices in this phase are history and visualization. History of targeted users, systems, departments, groups, and attacking IPs enables security teams to understand what resources may need hardening, additional monitoring, or even additional tools or security measures.  Data visualization and applied analysis enhanced by historical data enable more rapid decision-making.
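
The history practice can start very small. The sketch below counts how often a user, system, department, or attacking IP appears in past incident records and surfaces the ones that recur. The record fields and the threshold of three are assumptions chosen for illustration.

```python
from collections import Counter


def hardening_candidates(incidents, field_name, threshold=3):
    """Return (value, count) pairs for anything seen at least `threshold` times."""
    counts = Counter(
        incident[field_name] for incident in incidents if field_name in incident
    )
    return [(value, n) for value, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    # Hypothetical incident history; real records would come from a case system.
    history = [
        {"department": "Finance", "attacker_ip": "203.0.113.10"},
        {"department": "Finance", "attacker_ip": "203.0.113.10"},
        {"department": "Finance", "attacker_ip": "198.51.100.7"},
        {"department": "Engineering", "attacker_ip": "203.0.113.10"},
    ]
    print(hardening_candidates(history, "department"))
    print(hardening_candidates(history, "attacker_ip"))
```

The same counts are a natural input for a chart or dashboard, which is where history and visualization meet.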

Protective automation
It is often assumed that the end objective of the incident response process is to block current and future connections to any sites related to attackers and their malware in order to stop the download of new malware or malware components, cut off connections to exfiltration servers, and block any command and control communications. However, those actions are only a subset of possible responses. In addition to blocking IP addresses and URLs, it is also good practice to take other protective actions: these may include activating more verbose logging for users and systems, limiting user access to sensitive applications and systems, and quarantining user accounts.

These response actions can also be automated, but again, best practices include steps to avoid common pitfalls. Before taking automated quarantine or containment actions, take “baby steps” along the path to full automation. These include using the previous automation steps to build profiles or use cases where push-button action is appropriate. After confidence is developed in those profiles or use cases, enable automated actions that increase logging or automatically initiate additional data collection. With higher confidence in the context and event data, it makes sense to activate focused automatic enforcement that may restrict access to sensitive servers but still grant Internet access. Lastly, when confidence is higher, use those same tools to automatically activate severity-based containment.
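
The staged approach above can be sketched as a simple gate that maps confidence in a profile to the most aggressive tier of action allowed. The numeric thresholds, tier names, and action strings are all hypothetical; each team would set its own based on experience with its own profiles and use cases.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUSH_BUTTON = 1  # analyst approves every action
    COLLECT = 2      # automatically increase logging and data collection
    FOCUSED = 3      # automatically restrict access to sensitive servers
    CONTAIN = 4      # automatically apply severity-based containment


def allowed_tier(confidence):
    """Map a 0.0-1.0 confidence score to a tier (thresholds are assumptions)."""
    if confidence >= 0.9:
        return Tier.CONTAIN
    if confidence >= 0.75:
        return Tier.FOCUSED
    if confidence >= 0.5:
        return Tier.COLLECT
    return Tier.PUSH_BUTTON


def plan_response(confidence, severity):
    """Return the actions automation may take without waiting for an analyst."""
    tier = allowed_tier(confidence)
    if tier == Tier.PUSH_BUTTON:
        return ["queue for analyst approval"]
    actions = ["enable verbose logging"]
    if tier >= Tier.FOCUSED:
        # Internet access is left in place; only sensitive servers are restricted.
        actions.append("restrict access to sensitive servers")
    if tier >= Tier.CONTAIN and severity == "high":
        actions.append("quarantine host and user account")
    return actions


if __name__ == "__main__":
    for score in (0.3, 0.6, 0.8, 0.95):
        print(score, plan_response(score, "high"))
```

Because each tier only adds to the one below it, a team can raise or lower a single threshold as confidence changes without rewriting the response logic.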

Automation in incident response spans multiple tiers, most of which involve common-sense automation, integration, and analysis. The last phases of automation rest upon the success of the earlier automated tiers, where additional data capture, user and system quarantine and containment, and other actions can be triggered. Any security automation program can be successful when the tiers of automation create actionable situational awareness to drive rapid and consistent decisions and actions for the security team.