[[detection-engineering]] practical threat detection engineering:

Summary

This note list data sources for SIEM/ SOAR

Types of assets

Windows

Common windows list of data sources: - Application log - Security log - System log - Powershell log - Sysmon log

PowerShell event logs only include full command lines if PowerShell Script Block Logging is enabled. It is disabled by default and must be enabled via a Group Policy Object or the Registry.

Linux

Linux logs commonly used: - /var/log/syslog or / var/log/messages: system activty data - /var/log/auth.log or / var/log/secure: security log - /var/log/cron: scheduled task - /var/log/faillog: failed logins - /var/log/audit/audit.log: auditd rules based log

Network

Packet captures: expensive, low priority. e.g. using packetbeat
Network devices: logs from network devices, commonly used syslog to ship. e.g. log are flow data; another one is alerts from network security appliances

Application

Application logs may vary by name, but commonly app logs have these function: - Access, authentication and authorization logs: who logged in/out and when, whether it was a successful login attempt, and what operations were performed - Change log: e.g. config change, permission change - Error log - Availability log

Cloud

read more on AWS: [[aws-detect-and-response]]

Security tooling

Endpoint solution (EDR, endpoint protection) logs: alert log based off rules
Network protection (Zeek, suricata, see also [[detection-engineering-data-source#Network]]): alert based off rules. these appliances also have detection development capability, thus detection engineer usually can also tinker with the rule that fire alert

Data sources challenges

Completeness

We do not want to waste storage resources and bandwidth on data sources that won’t add value to our investigation due to the data they expose For example, if a system provides logs showing a network connection was established but there are no details on the source/destination of the connection with contextless timestamps and ambiguous time zones, there is likely not much that can be used from that to develop a quality detection.

Quality

data should be reliable and relevant. another aspect is the format of the data.

Timeliness

this covers delays of sending data, if the data is pre-process that cause delay?

Coverage

data should provide cover for what detection we aim to build

Understanding your data sources

Some points to help understand your sources: - Does the data source cover our detection needs? we should already have the technical needs for our detection when we ask these questions. Read more on [[detection-engineering#Investigate]] - What method is required to retrieve logs from the data source? API, Syslog, or something else? - What format does the data source provide logs in? - What time zone is the data source using for their timestamps? - Who is the point of contact for the data source that we need logs from? - What are the retention policies of the data source? - Does the detection engineering team have access to retrieve logs from the source? - What is the delay between an event occurring and it being received in the detection? - Can the logging configuration be modified to improve the data being received?

note about retention: sometimes the data source owner fail to understand that the goal of detection is not storing log, and this blur the line between their log retention policy with detection needs retention policy. be concise and clear to differentiate this when communicating data needs.

Related notes: