Mason Morales

Forwarding index=_audit data using UFs

10/8/2020

Working at Splunk, we have several deployments of the product internally, many of which are managed by different teams. Since my team, Splunk@Splunk, is in charge of the stack the SOC uses to monitor our security posture, we were recently asked to help collect internal and audit logs from all of those other stacks and forward them into our own indexer cluster.

So, we decided that the least intrusive way to do this (while still affording ourselves the flexibility to collect any additional logs that security might ask for later on) was to have the other teams install a separate UF on each of their Splunk instances. The UF would then check in to a deployment server owned by our team, and we'd be able to control the inputs and outputs to collect the data. Simple, right?

Well, as it turns out, not so much. I spent most of today learning about Splunk's audit logging configuration, and I'd like to share with you all what I learned. For those of you who don't know, Splunk actually has a processor outside of the normal splunkd pipeline specifically for managing audit events. It's called the AuditTrailManager, and as you might guess from the name, it feeds the audit queue, which then passes data into index=_audit with sourcetype=audittrail. Cool, right?
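If you want to see what that processor produces, a quick search along these lines (action and user are standard fields on audittrail events) summarizes the audit activity on an instance:

index=_audit sourcetype=audittrail | stats count by action, user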

So, why does it matter? If we dig into $SPLUNK_HOME/etc/system/default/inputs.conf you'll notice this stanza:
[monitor://$SPLUNK_HOME/var/log/splunk]
index = _internal


Then, if you look at $SPLUNK_HOME/etc/system/default/props.conf, you'll find this:
[source::.../var/log/splunk/audit.log(.\d+)?]
TRANSFORMS = send_to_nullqueue
sourcetype = splunk_audit


As you might guess from the stanza and transform names, this configuration causes Splunk's audit.log file, which gets picked up out of the box by that file monitor input, to be dropped (sent to the null queue). Why? Because the AuditTrailManager is already collecting these events, so if Splunk didn't drop the file, we'd end up with duplicate events in index=_audit.

At this point, you're probably thinking, "So what? It works, right?" Well, if you're trying to forward audit data that was generated by a different Splunk instance by reading its audit.log file with a monitor input, then no, it doesn't. Source has the highest precedence in props.conf, and because the above-mentioned transform will match .../var/log/splunk/audit.log regardless of where Splunk is installed, the data is going to end up getting dropped by the indexers.

Before I get into how we solved this, a quick side note: the best practice of forwarding your internal logs (as described here) should really be followed for everything that is not an indexer. If you're doing this already, awesome. It works great for everything EXCEPT the UF. Curiously, the UF does not seem to be able to successfully forward its own data for index=_audit. I've gone through all the default configurations - on both the UF and full packages - and I can't figure out why audit is being dropped. My theory is that either there's a bug that causes the AuditTrailManager to not forward data to the output queue on the UF, OR the UF sends the AuditTrailManager data to the indexers with its original source path, and when they cook it, the transform sends it to the null queue. In either case, this behavior is undesirable for audit data, so I've filed a bug ticket for it: SPL-196147.
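For reference, that best practice boils down to roughly the following outputs.conf on every non-indexer (the output group name and indexer addresses below are placeholders, so adjust them for your environment):

outputs.conf
[indexAndForward]
index = false

[tcpout]
defaultGroup = primary_indexers
forwardedindex.filter.disable = true
indexAndForward = false

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997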

Okay, so let's ignore the UF not being able to forward its own audit data for now, and focus on forwarding audit logs from another Splunk instance residing on the same host.

First, the easy part. We have to tell the UF to pick up the other Splunk instance's logs (remember, we're going after all the internal logging, not just audit data). Simple enough. We add an inputs.conf with a new monitor stanza:
[monitor:///opt/splunk/var/log/splunk]
disabled = false


Assuming file permissions are good, and that you've also already configured your outputs.conf, your UF will start reading the logs and shipping them off to the indexers... only for them to meet their doom via send_to_nullqueue at their final destination -- which is why, if you're doing this, you should configure the input last.

To prevent the indexer from dropping these audit logs, you have to update props.conf. You may be tempted to do:
[source::.../var/log/splunk/audit.log(.\d+)?]
TRANSFORMS = 


This will cause two problems. First, you'll have a duplicate copy of your indexers' audit logs, because now they're no longer discarding the audit logs picked up by their own input processor AND they're still getting those events from the AuditTrailManager process. Second, those logs will land in index=_internal, because that's the index specified in the aforementioned recursive file monitor input for $SPLUNK_HOME/var/log/splunk, and we haven't told Splunk to do otherwise.

So, first let's solve the duplicate data problem. We have two options on how to solve this:
Option 1: Configure audit.conf with the following settings on the indexers (note that this is not supported in Splunk Cloud): 
[default]
queueing = false

Option 2: Blacklist audit.log in inputs.conf on the indexers (our preference):
[monitor:///opt/splunk/var/log/splunk]
blacklist = (audit\.log)
disabled = false

Once that file is in place, we can fix both the discard and the index routing issue with the following:
props.conf
[source::.../var/log/splunk/audit.log(.\d+)?]
TRANSFORMS = set_audit_index


transforms.conf
[set_audit_index]
REGEX = .
DEST_KEY = _MetaData:Index
FORMAT = _audit

Perfect, right? Almost! If you recall from earlier, the default props for the audit log contain sourcetype = splunk_audit. I have no idea why, though, because: 1) that data is being discarded by default anyway, and 2) a splunk_audit stanza does not exist anywhere in the default configurations. If you don't believe me, try a splunk cmd btool props list splunk_audit --debug and watch it return nothing. But if you look at index=_audit, you'll see only one sourcetype: audittrail, and, you guessed it, that does have a configuration stanza out of the box with some useful settings. So, we'll want to include that in our props and transforms configurations as well:

props.conf
[source::.../var/log/splunk/audit.log(.\d+)?]
TRANSFORMS = set_audit_index, set_audittrail_st


transforms.conf
[set_audit_index]
REGEX = .
DEST_KEY = _MetaData:Index
FORMAT = _audit

[set_audittrail_st]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::audittrail


Once you've deployed your final props.conf, transforms.conf, and audit.conf to your indexers (and restarted them), you're ready to deploy the inputs.conf to the UF. This configuration will allow you to continue collecting audit logs on your indexers, as well as index audit logs that you've collected from other Splunk instances on the same host(s). As an added bonus, if you're forwarding internal logs from your UF in your outputs.conf, those logs will also be successfully indexed now.

Before anyone says anything: yes, there is an alternative to running a UF to collect logs: index and forward. However, that carries its own risks and downsides. For example, if there's an outage on your stack, the output queues can back up the splunkd pipeline on the other team's stack. You also have to go back and ask them to make additional configuration changes if you decide that you want other logs, or that you want to update the outputs. When you're working with a dozen teams, it's just easier to have them run a separate forwarder that you control.

That's all for today. Thanks for reading!

More JSON, More Problems

9/30/2020

I just wanted to make a quick post to address common issues with JSON field extractions that I've seen in Splunk over the years.

Issue #1 JSON doesn't extract in long events
Recently, we had JSON events longer than 10,000 characters, and the fields were not extracting properly (by default, automatic key-value extraction only examines the first 10,240 characters of an event). We solved that with a simple change in limits.conf:
[kv]
maxchars = 20000

Issue #2 Nested key=value pairs
Another issue I've run into is nested key=value pairs inside the JSON dictionary. To solve that, look no further than this blog post.

Issue #3 Bad dictionaries
Finally, and what I believe is a common issue, we've run into some silliness with how JSON dictionaries are being used. If you've ever seen a multi-value field named parameters{}.name with all of your keys in it, and another multi-value field named parameters{}.value containing all the values for those keys in Splunk, then you know what I'm talking about.

The raw data will usually look something like this:
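As an illustration (the field names and values here are made up, following the key/value pattern that the SED expression below operates on):

{"action": "update_user", "parameters": [{"key": "user", "value": "alice"}, {"key": "role", "value": "admin"}]}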
Solution A
If the data is coming from a scripted input, you can usually do a little ETL in Python to fix it. For example, after you get the API response back, you can loop through the dictionary and call a function to reformat the events:
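Here's a minimal sketch of that idea; the function name, field names, and sample response are assumptions for illustration:

import json

def flatten_parameters(event):
    # Collapse [{"key": k, "value": v}, ...] into a plain {k: v} dictionary
    event["parameters"] = {p["key"]: p["value"] for p in event.get("parameters", [])}
    return event

# Reformat each event from the (hypothetical) API response before emitting it
response = [{"action": "update_user", "parameters": [{"key": "user", "value": "alice"}]}]
for raw_event in response:
    print(json.dumps(flatten_parameters(raw_event)))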
That will clean up the JSON and give you a nice, clean dictionary under parameters.

Solution B
If you don't have control over the source, all hope is not lost. You can perform a SEDCMD at index time to rewrite the dictionary. You'll want to test this at search time first, so here's the general idea:
| rex field=_raw mode=sed "s/\"key\":\"([^\"]+)\",\"value\"/\"\1\"/g" | spath
Once you have your SED expression down, you can simply convert it to a props.conf configuration and deploy it onto your indexers (or onto the heavy forwarder, if the data is coming in through one).

props.conf
[my_sourcetype]
SEDCMD-fix_json_dictionary = s/"key":"([^"]+)","value"/"\1"/g

Solution C
If you don't want to modify the raw data using SEDCMD as in Solution B, there is yet another alternative. You can tell Splunk how to extract the key-value pairs at search time using props and transforms:

props.conf
[my_sourcetype]
REPORT-json_key_value_extraction = json_key_value_extraction

transforms.conf
[json_key_value_extraction]
REGEX = "key":"(?<_KEY_1>[^"]+)","value":"(?<_VAL_1>[^"]+)"
FORMAT = $1::$2
MV_ADD = 1

Issue #4 Duplicate extractions in Splunk
The last common issue that I've seen is JSON fields being extracted twice. This happens when the data is indexed with INDEXED_EXTRACTIONS = json in props.conf while the search head is using either the default KV_MODE = auto or an explicit KV_MODE = json. The field appears twice because the indexed fields automatically show up in search results, and then the search head extracts the same fields a second time.

The fix for this is simple: just set KV_MODE = none for the corresponding sourcetype on the search head in props.conf. You could also disable indexed extractions at the indexers, but you would still see duplicate field extractions for historical data (indexed prior to the change).
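A minimal sketch of that search-head side fix (the sourcetype name is a placeholder):

props.conf
[my_sourcetype]
KV_MODE = none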
I hope these solutions can help some other Splunk administrators out there with their JSON data. Feel free to comment!

Mass-Updating Knowledge Objects on Splunk Search Head Clusters

3/30/2020

Have you ever been in a situation where you needed to mass-edit a large number of knowledge objects on a search head cluster? Any Splunk admin who has ever had to redirect data to a new index knows how painful this can be. Today, I'm going to teach you the easy way to do it, without even having to restart Splunk!

Here are the steps:
  1. Find the SHC captain via Splunk Web (Settings -> Search Head Clustering) or via CLI splunk show shcluster-status
  2. SSH into the captain node and sudo to the splunk user
  3. Perform a git clone https://github.com/masonsmorales/splunk_script_update_files or copy the contents of the update_files.sh bash script and update_files.txt file from a browser. (Note: If you copy them manually, you'll need to do a chmod +x on update_files.sh)
  4. Move the two files into the topmost directory that you want to change knowledge objects for. This could be the entire $SPLUNK_HOME/etc folder, only $SPLUNK_HOME/etc/apps, or a specific app.
  5. Edit the update_files.txt contents to your liking. The file should contain a list of filenames and/or patterns that you want to perform the find/replace operation against.
  6. Edit line 7 of update_files.sh with the original text that you want to find, and line 8 with the new text you want to replace it with (see the illustrative example after this list).
  7. As a best practice, always take a backup of whatever you are going to change. e.g. tar czvf splunk_etc_bak.tar $SPLUNK_HOME/etc
  8. After you've taken a backup and completed your edits, run the script to update the configuration files on disk. e.g. ./update_files.sh
  9. Once the script has completed, you'll need to force Splunk to reload the on-disk Splunk configurations on the captain. The quickest way to do this is by restarting only the Splunk Web service. Here's the command: splunk restartss (Clarifying Note: SS = Splunk Search. This command is an alias to splunk restart splunkweb)
  10. Finally, once the previous command has completed, SSH into each of the slaves and force them to download the latest bundle from the captain by executing: splunk resync shcluster-replicated-config
  11. That’s it! All that's left is to validate your changes. From one of the slaves, cat one of the file paths that the script updated and confirm that the file contents reflect your changes.
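For illustration only (the files you touch and the strings you replace will differ in your environment), a run that repoints knowledge objects at a new index might use an update_files.txt containing:

savedsearches.conf
macros.conf
eventtypes.conf

with the find text on line 7 of update_files.sh set to index=old_security and the replace text on line 8 set to index=soc_security.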

Data On-Boarding Best Practices

11/3/2018

Just a few notes on settings that everyone should be thinking about when creating custom sourcetypes or technology add-ons in Splunk...

Data Parsing
Do you have these configurations in props.conf?
SHOULD_LINEMERGE =
LINE_BREAKER  =
MAX_TIMESTAMP_LOOKAHEAD = 
TIME_PREFIX =
TIME_FORMAT = 
TRUNCATE =

More Data Parsing...
ANNOTATE_PUNCT = false (if you don't need the punct field)
TZ = (if it's not part of the timestamp in your data)
CHARSET = UTF-8 (usually)
NO_BINARY_CHECK = true
KV_MODE = 

Check out Splunk's documentation on props.conf for help with these settings.
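Putting those settings together, here's an illustrative stanza for a hypothetical sourcetype (the timestamp format, lookahead, and truncation values are examples only; tune them to your data):

[my_custom_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S.%3N %z
MAX_TIMESTAMP_LOOKAHEAD = 30
TRUNCATE = 10000
ANNOTATE_PUNCT = false
CHARSET = UTF-8
NO_BINARY_CHECK = true
KV_MODE = none

TZ is omitted in this example because the timestamp carries its own offset via %z.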

Field Extractions
Are you extracting fields for your users at data on-boarding? You should be! Splunk tends to grow organically, and if your data isn't well-groomed when you bring it on, it may never be. Set up your users for success by identifying the fields they need and getting them extracted when you on-board their data.

Be sure to use either an EXTRACT in props.conf, or a REPORT in props.conf with a corresponding REGEX/FORMAT in transforms.conf.
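For example (the sourcetype, field names, and regexes below are made up for illustration):

props.conf
[my_custom_sourcetype]
EXTRACT-user = user=(?<user>\S+)
REPORT-action_status = action_status

transforms.conf
[action_status]
REGEX = action=(\w+)\s+status=(\d+)
FORMAT = action::$1 status::$2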

For CIM compliance, use this as a guide: http://docs.splunk.com/Documentation/CIM/4.12.0/User/Howtousethesereferencetables
 
Or, consider using the Splunk Add-on Builder.

A word on community-built/3rd party apps and addons....
  • For COTS products, be sure to check splunkbase.com and github.com for any community built technology-addons. Sometimes you have to modify them a bit to get them working with your data, but they can potentially save you some serious time.
  • Don't be afraid of customizing community-built addons, or ripping out pieces you don't want (like eventgen.conf, indexes.conf, KV stores you don't plan to use, etc.).
  • Finally, be sure to test anything you download for Splunk in a development environment before installing it in your production environment.  Setting up a test environment can be as simple as spinning up Splunk on your laptop, setting up a vagrant host to run Splunk, or even using a docker image of Splunk. All you need is a sample of your data and a test environment to see if everything will work right.
  • Once you have installed a TA in a test environment, be sure to check for startup errors (or simply run splunk cmd btool check). This will tell you if there are any syntax errors in the config files. You can also search the _internal index on your test instance for log_level=ERROR with something like data_source="*" OR data_host="*" OR data_sourcetype="*", replacing the "*" with the source, host, or sourcetype that you are trying to ingest (a fuller example follows this list). This can uncover problems with data parsing, event breaking, and other issues that you might not otherwise be aware of.
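For example, the post-install sanity check might look something like this (the source file, host, and sourcetype are placeholders):

index=_internal log_level=ERROR (data_source="*my_sample.log" OR data_host="my-test-host" OR data_sourcetype="my_custom_sourcetype")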

Securing Splunk

10/20/2018

I feel like security is an often overlooked part of being a Splunk Engineer. This blog post is all about the importance of securing Splunk and the systems that it runs on. In addition to following the Securing Splunk guide in Splunk Docs, here are some other best practices you should be thinking about...
  1. Running Splunk as a non-privileged user (i.e. not root)
  2. Forwarding your local system messages and audit logs to syslog (and then of course to Splunk)
  3. Forwarding all _internal, _audit, and _introspection logs from all non-indexing instances of Splunk to your indexers
  4. Configuring host-based firewall rules (e.g. IPTABLES, ufw, or firewalld) for both inbound and outbound connections, specific to port, protocol, and destination host/network(s)
  5. Deploying additional open-source security tools to your core Splunk servers, such as OSQuery, OSSEC, or ClamAV
  6. Splunking your bash history
  7. Disabling the REST port on forwarders when it's not needed
  8. Mitigating the POODLE attack in Splunk Web
  9. Changing default certificates used for Splunkd, Splunk Web, etc.
  10. Using encryption for Splunk to Splunk (S2S) connections
  11. Hardening SSH on your Splunk servers
  12. Enabling 2FA for SSH 
  13. Restricting who has the admin role in Splunk to only a handful of users
  14. Implementing patching and vulnerability management policies
  15. Hardening the operating system of your Splunk servers using the CIS benchmarks, along with other security controls recommended by NIST, SANS, etc.
  16. Enabling data integrity control on all of your indexes
  17. Using a common splunk.secret file so that you can securely deploy passwords via configuration files 
  18. Using different SSH key pairs for each environment and rotating them periodically
  19. Securely storing passwords, secrets, API keys, and other sensitive information using a secret manager like Vault by HashiCorp
  20. Disabling Splunk Web on hosts that don't need it (think indexers and heavy forwarders)
  21. Changing the default password for the admin account by deploying a user-seed.conf file with a pre-hashed password (see the sketch after this list)
  22. Randomly generating long passwords for service accounts, or using Splunk's new token-based authentication system for REST API access.
  23. Restricting usage of the admin account so that you can properly audit who is making changes through Splunk web or Splunk CLI
  24. Enabling SSL on Splunk Web with certificates signed by a trusted CA
  25. If you're an app developer, storing passwords using the KV store instead of in plain text contained in configuration files
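To expand on item 21: a minimal user-seed.conf sketch looks roughly like this (drop it into $SPLUNK_HOME/etc/system/local before the instance starts for the first time, and generate the hash with splunk hash-passwd):

[user_info]
USERNAME = admin
HASHED_PASSWORD = <paste the output of "splunk hash-passwd '<your password>'">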

Upgrading to Splunk 7? Read this first...

12/8/2017

I recently upgraded a Splunk cluster from v6.5.2 to v7.0.1, and there was one thing that wasn't covered in the release notes: after upgrading my first host (the master node), I couldn't execute CLI commands. Splunk threw the following error:
$ splunk enable maintenance-mode 
Couldn't complete HTTP request: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure
Splunk Support admitted that there are some SSL bugs in the new release, and that this was one of them. To work around it, you can make the following edits in server.conf:
[sslConfig] 
sslVersions = *,-ssl2 
sslVersionsForClient = *,-ssl2 
cipherSuite = TLSv1+HIGH:TLSv1.2+HIGH:@STRENGTH 

Once this is done, restart Splunk and try the CLI again. You should be back in business.

I had to update server.conf on most of my Splunk server hosts (master node, search heads, deployers, deployment server, license master, etc.), but for some reason not on my indexers. I'm not sure why, as both my indexers and search heads run the same OS and had the same OpenSSL package installed. Hopefully this helps anyone out there with a similar issue.

Creating Indexed Fields in Splunk to Identify Heavy Forwarders

3/13/2017

Do you use Heavy Forwarders in your organization? Perhaps you have one installed on your syslog server, or on a dozen syslog servers? Chances are that your host field is already being used to identify which host generated any particular event, which is exactly what it was designed to do. But, what if you need to identify where that data is coming from? That's where indexed fields can help out.

I like to call this indexed field "splunk_forwarder" because it's not one of the fields Splunk uses by default (e.g. splunk_server), and it's easy to remember.

First, we'll create a fields.conf file on our search head(s) to tell Splunk about our indexed field:

[splunk_forwarder]
INDEXED = true


Next, we'll add an inputs.conf file to our heavy forwarder that creates the new field along with its value:

[default]
_meta = splunk_forwarder::myforwarderhostname


This configuration will create a new indexed field called "splunk_forwarder" and set its value to whatever you put after the double colons. In this case, it will be assigned a value of "myforwarderhostname". I typically use the hostname of the heavy forwarder, but you could also use the IP address, FQDN, etc.

Finally, restart Splunk on your heavy forwarder and search head(s). Any new data that gets indexed will automatically have your new splunk_forwarder field!

Now, you can run cool searches like this one to quickly see which forwarders are sending what data to Splunk:

| tstats count where splunk_forwarder=* index=* by splunk_forwarder sourcetype index | stats values(index) as index values(sourcetype) as sourcetype sum(count) as count by splunk_forwarder


Learning Splunk

2/4/2017

New to Splunk? This is a list of learning resources that I've curated for new Splunk users over the years. Feel free to share this with your fellow Splunkers!
  • Splunk Quick Reference Guide
  • Splunk Answers/Community Forum
  • Exploring Splunk eBook (free)
  • Splunk Intro Course (free)
  • Splunk Education Videos
  • Splunk Documentation
  • Splunk's Global User Group Conference

    Author

    Mason Morales
    Splunk Architect
    SplunkTrust 2015-2019



Copyright © 2018 Mason Morales. All rights reserved.
