Working at Splunk, we have several deployments of the product internally, many of which are managed by different teams. As my team, Splunk@Splunk, is in charge of the stack used by the SOC to monitor our security posture, we were recently asked to help collect internal and audit logs from all of those other stacks and forward them into our own indexer cluster. We decided that the least intrusive way to do this (and still afford ourselves the flexibility of collecting any additional logs that security might ask for later on) was to have the other teams install a separate UF on each of their Splunk instances. The UF would then check in to a deployment server owned by our team, and we'd be able to control the inputs and outputs to collect the data. Simple, right? Well, as it turns out, not so much. I spent most of today learning about Splunk's audit logging configurations, and I'd like to share with you what I learned.

For those of you who don't know, Splunk actually has a processor outside of the normal splunkd pipeline specifically for managing audit events. It's called the AuditTrailManager, and as you might guess from the name, it feeds the auditqueue, which then passes data into index=_audit with sourcetype=audittrail. Cool, right? So why does it matter?

If you dig into $SPLUNK_HOME/etc/system/default/inputs.conf, you'll notice this stanza:

[monitor://$SPLUNK_HOME/var/log/splunk]
index = _internal

Then, if you look at $SPLUNK_HOME/etc/system/default/props.conf, you'll find this:

[source::.../var/log/splunk/audit.log(.\d+)?]
TRANSFORMS = send_to_nullqueue
sourcetype = splunk_audit

As you might guess from the stanza and transform names, this configuration causes Splunk's audit.log file, which gets picked up out of the box by the file monitor input above, to be dropped (sent to the null queue). Why? Because the AuditTrailManager is already collecting these events, so if Splunk didn't drop them, we'd end up with duplicate events in index=_audit.

At this point, you're probably thinking, "So what? It works, right?" Well, if you're trying to forward audit data generated by a different Splunk instance by reading its audit.log file with a monitor input, then no, it doesn't. Source has the highest precedence in props.conf, and because the transform above will match .../var/log/splunk/audit.log regardless of where Splunk is installed, that data is going to end up getting dropped by the indexers.

Before I get into how we solved this, a quick side note: the best practice of forwarding your internal logs (as described here) should really be done for everything that is not an indexer. If you're doing this already, awesome. It works great for everything EXCEPT the UF. Curiously, the UF does not seem to be able to successfully forward its own data for index=_audit. I've gone through all the default configurations, on both the UF and full packages, and I can't figure out why audit is being dropped. My theory is that either there's a bug that causes the AuditTrailManager to not forward data to the output queue on the UF, or the data from the AuditTrailManager is sent by the UF with its original source path, and then when the indexers cook it, it gets sent to the null queue by the transform. In either case, this behavior is undesirable for audit data, so I've filed a bug ticket for it: SPL-196147.

Okay, let's ignore the UF not being able to forward its own audit data for now, and focus on forwarding audit logs from another Splunk instance residing on the same host.
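Quick tip before we dive in: if you want to see those default settings for yourself, btool makes it easy. For example, assuming a standard installation and that the splunk binary is on your PATH:

$ splunk cmd btool props list "source::.../var/log/splunk/audit.log(.\d+)?" --debug
$ splunk cmd btool transforms list send_to_nullqueue --debug

The first command shows the source-based props stanza and which file it comes from; the second shows the transform that sends matching events to the null queue.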
First, the easy part. We have to tell the UF to pick up the other Splunk instance's logs (remember, we're going after all of the internal logging, not just the audit data). Simple enough. We add an inputs.conf with a new monitor stanza:

[monitor:///opt/splunk/var/log/splunk]
disabled = false

Assuming file permissions are good, and that you've already configured your outputs.conf, your UF will start reading the logs and shipping them off to the indexers... only to meet their doom by send_to_nullqueue at their final destination -- which is why, if you're doing this, you should configure the input last.

To prevent the indexers from dropping these audit logs, you have to update props.conf. You may be tempted to do this:

[source::.../var/log/splunk/audit.log(.\d+)?]
TRANSFORMS =

That will cause two problems. First, you'll have a duplicate copy of your indexers' audit logs, because now they're no longer discarding the audit logs picked up by their own file monitor input AND they're still getting the same events from the AuditTrailManager process. Second, those logs will land in index=_internal, because that's the index specified in the aforementioned recursive file monitor input for $SPLUNK_HOME/var/log/splunk, and we haven't told Splunk to do otherwise.

So, first let's solve the duplicate data problem. We have two options:

Option 1: Configure audit.conf with the following settings on the indexers (note that this is not supported in Splunk Cloud):

[default]
queueing = false

Option 2: Blacklist audit.log in inputs.conf on the indexers (our preference):

[monitor:///opt/splunk/var/log/splunk]
blacklist = (audit\.log)
disabled = false

Once that's in place, we can fix both the discard and the index routing issues with the following:

props.conf
[source::.../var/log/splunk/audit.log(.\d+)?]
TRANSFORMS = set_audit_index

transforms.conf
[set_audit_index]
REGEX = .
DEST_KEY = _MetaData:Index
FORMAT = _audit
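Before moving on, here's a quick sanity check that the index override is working (a minimal example search, assuming the UF's monitor input is already enabled and sending data):

index=_audit source=*audit.log* | stats count by host, source

If the forwarded events show up here instead of landing in index=_internal (or vanishing into the null queue), the routing is doing its job.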
Perfect, right? Almost! If you recall from earlier, the default props for the audit log also contain sourcetype = splunk_audit. I have no idea why, though, because: 1) that data is discarded by default anyway, and 2) a splunk_audit stanza does not exist anywhere in the default configurations. If you don't believe me, try a

$ splunk cmd btool props list splunk_audit --debug

and watch it return nothing. But if you look at index=_audit, you'll see only one sourcetype, audittrail, and (you guessed it) that one does have a configuration stanza out of the box with some useful settings. So we'll want to include that in our props and transforms configurations as well:

props.conf
[source::.../var/log/splunk/audit.log(.\d+)?]
TRANSFORMS = set_audit_index, set_audittrail_st

transforms.conf
[set_audit_index]
REGEX = .
DEST_KEY = _MetaData:Index
FORMAT = _audit

[set_audittrail_st]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::audittrail

Once you've deployed your final props.conf, transforms.conf, and audit.conf (or the blacklisted inputs.conf, depending on which option you chose) to your indexers and restarted them, you're ready to deploy the inputs.conf to the UF (a quick verification search is included at the end of this post). This configuration will allow you to continue collecting audit logs on your indexers, as well as index the audit logs that you've collected from other Splunk instances on the same host(s). As an added bonus, if your UF's outputs.conf is set up to forward its own internal logs, those will also be successfully indexed now.

Before anyone says anything: yes, there is an alternative to running a UF to collect these logs, namely index and forward. However, that carries its own risks and downsides. For example, if there's an outage on your stack, the output queues can back up the splunkd pipeline on the other team's stack. You also have to go back and ask them to make additional configuration changes if you decide that you want other logs, or that you want to update the outputs. When you're working with a dozen teams, it's just easier to have them run a separate forwarder that you control.

That's all for today. Thanks for reading!
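As promised, here's a quick verification to run once everything is deployed and the indexers have been restarted (a hedged example; the host values will depend on your environment):

index=_audit source=*audit.log* | stats count by host, sourcetype

Everything should now show up as sourcetype=audittrail. If you still see splunk_audit, double-check that the new props.conf and transforms.conf actually made it to the indexers and that they were restarted.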
I just wanted to make a quick post to address common issues with JSON field extractions that I've seen in Splunk over the years.

Issue #1: JSON doesn't extract in long events

Recently, we had JSON events that were over 10,000 characters long and the fields were not extracting properly. We solved that with a simple change in limits.conf:

[kv]
maxchars = 20000

Issue #2: Nested key=value pairs

Another issue I've run into is nested key=value pairs inside the JSON dictionary. To solve that, look no further than this blog post.

Issue #3: Bad dictionaries

Finally, and what I believe is a common issue, we've run into some silliness with how JSON dictionaries are being used. If you've ever seen a multi-value field named parameters{}.name with all of your keys in it and another multi-value field named parameters{}.value containing all the values for those keys in Splunk, then you know what I'm talking about. The raw data is typically a list of name/value (or key/value) dictionaries nested under a single key such as parameters, rather than a proper dictionary of key-value pairs.

Solution A

If the data is coming from a scripted input, you can usually do a little ETL in Python to fix it. For example, after you get the API response back, you can loop through the events and call a function to reformat each one (a sketch of one way to do this is included at the end of this post). That will clean up the JSON and give you a nice, clean dictionary under parameters.

Solution B

If you don't have control over the source, all hope is not lost. You can perform a SEDCMD at index time to rewrite the dictionary. You'll want to test this at search time first, so here's the general idea:

| rex field=_raw mode=sed "s/\"key\":\"([^\"]+)\",\"value\"/\"\1\"/g"
| spath

Once you have your SED expression down, you can simply convert it to a props.conf configuration and deploy it onto your indexers (or the heavy forwarder, if the data is coming in through one). Converted, it looks like this:

props.conf
[my_sourcetype]
SEDCMD-fix_parameters = s/"key":"([^"]+)","value"/"\1"/g

Solution C

If you don't want to modify the raw data using the SEDCMD in Solution B, there is yet another alternative: you can tell Splunk how to extract the key-value pairs at search time using props and transforms. The exact REGEX will depend on your data, but the general shape is something like this (the literal "key"/"value" names below match the SED example above; adjust them to your events):

props.conf
[my_sourcetype]
REPORT-json_kv = json_key_value_extraction

transforms.conf
[json_key_value_extraction]
REGEX = "key":"([^"]+)","value":"([^"]*)"
FORMAT = $1::$2
MV_ADD = true

Issue #4: Duplicate extractions in Splunk

The last common issue that I've seen is JSON fields being extracted twice. This is caused by the data being indexed with INDEXED_EXTRACTIONS = json in props.conf while the search head is using either the default of KV_MODE = auto or an explicit KV_MODE = json. The field appears twice because the indexed fields automatically show up in Splunk, and then the search head extracts the fields a second time. The fix is simple: just set KV_MODE = none for the corresponding sourcetype on the search head in props.conf. You could also disable indexed extractions at the indexers, but you will still see duplicate field extractions for historical data (prior to the change).

I hope these solutions help some other Splunk administrators out there with their JSON data. Feel free to comment!
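Here's the Solution A sketch mentioned above. It's a minimal example that assumes the API returns events with a "parameters" list of name/value dictionaries; the key names "parameters", "name", and "value" (and the sample data) are illustrative, so adjust them to match your actual response.

import json

def flatten_parameters(event, list_key="parameters", name_key="name", value_key="value"):
    # Rewrite a list of {name, value} dictionaries into a single dictionary,
    # so Splunk extracts parameters.user=mason instead of parallel
    # parameters{}.name / parameters{}.value multi-value fields.
    pairs = event.get(list_key)
    if isinstance(pairs, list):
        event[list_key] = {p[name_key]: p.get(value_key) for p in pairs if name_key in p}
    return event

# Example usage inside a scripted input: reformat each event before printing it for Splunk.
api_response = [
    {"action": "login", "parameters": [{"name": "user", "value": "mason"},
                                       {"name": "src", "value": "10.0.0.1"}]},
]
for evt in api_response:
    print(json.dumps(flatten_parameters(evt)))

With the reformatted events, spath (or KV_MODE = json) gives you fields like parameters.user and parameters.src instead of the parallel multi-value fields.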
Have you ever been in a situation where you needed to mass-edit a large number of knowledge objects on a search head cluster? Any Splunk admin who has ever had to redirect data to a new index knows how painful this can be. Today, I'm going to teach you the easy way to do it, without even having to restart Splunk!
Here are the steps:
Just a few notes on settings that everyone should be thinking about when creating custom sourcetypes or technology add-ons in Splunk...
Data Parsing

Do you have these configurations in props.conf?

SHOULD_LINEMERGE =
LINE_BREAKER =
MAX_TIMESTAMP_LOOKAHEAD =
TIME_PREFIX =
TIME_FORMAT =
TRUNCATE =

More Data Parsing...

ANNOTATE_PUNCT = false (if you don't need the punct field)
TZ = (if it's not part of the timestamp in your data)
CHARSET = UTF-8 (usually)
NO_BINARY_CHECK = true
KV_MODE =

Check out Splunk's documentation on props.conf for help with these settings. An illustrative stanza pulling most of them together is included at the end of this post.

Field Extractions

Are you extracting fields for your users at data on-boarding? You should be! Splunk tends to grow organically, and if your data isn't well-groomed when you bring it on, it may never be. Set your users up for success by identifying the fields they need and extracting them when you on-board their data. Be sure to use either an EXTRACT in props.conf, or a REPORT in props.conf with a corresponding REGEX/FORMAT in transforms.conf. For CIM compliance, use this as a guide: http://docs.splunk.com/Documentation/CIM/4.12.0/User/Howtousethesereferencetables

Or, consider using the Splunk Add-on Builder.

A word on community-built/3rd party apps and addons....
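As promised, here's an illustrative parsing stanza for a hypothetical single-line sourcetype with an ISO-8601 timestamp and no timezone in the data. None of these values are prescriptive; they're just one reasonable combination to start from.

props.conf
[my_custom_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
MAX_TIMESTAMP_LOOKAHEAD = 25
TZ = UTC
TRUNCATE = 10000
ANNOTATE_PUNCT = false
CHARSET = UTF-8
NO_BINARY_CHECK = true
KV_MODE = none

Remember that KV_MODE is a search-time setting, so it needs to be on your search heads, while the line-breaking and timestamp settings need to live wherever parsing happens (indexers or heavy forwarders).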
I feel like security is an often overlooked part of being a Splunk Engineer. This blog post is all about the importance of securing Splunk and the systems that it runs on. In addition to following the Securing Splunk guide in Splunk Docs, here are some other best practices you should be thinking about...
I recently upgraded a Splunk cluster from v6.5.2 to v7.0.1. There was one thing that wasn't covered in the release notes: after upgrading my first host (the master node), I couldn't execute CLI commands. Splunk threw the following error:

$ splunk enable maintenance-mode
Couldn't complete HTTP request: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure

Splunk Support admitted that they have some SSL bugs in the new release, and that this was one of them. To work around it, you can make the following edits in server.conf:

[sslConfig]
sslVersions = *,-ssl2
sslVersionsForClient = *,-ssl2
cipherSuite = TLSv1+HIGH:TLSv1.2+HIGH:@STRENGTH

Once this is done, restart Splunk and try the CLI again. You should be back in business.
I had to update server.conf on most of my Splunk server hosts (master node, search heads, deployers, deployment server, license master, etc.), but for some reason not on my indexers. I'm not sure why, as both my indexers and search heads run the same OS and had the same OpenSSL package installed. Hopefully this helps anyone out there with a similar issue.

Do you use Heavy Forwarders in your organization? Perhaps you have one installed on your syslog server, or on a dozen syslog servers? Chances are that your host field is already being used to identify which host generated any particular event, which is exactly what it was designed to do. But what if you need to identify which forwarder that data is coming through? That's where indexed fields can help out.
I like to call this indexed field "splunk_forwarder" because it's not one of the fields Splunk uses by default (e.g., splunk_server), and it's easy to remember.

First, we'll create a fields.conf file on our search head(s) to tell Splunk about our indexed field:

[splunk_forwarder]
INDEXED = true

Next, we'll add an inputs.conf file to our heavy forwarder that creates the new field along with its value:

[default]
_meta = splunk_forwarder::myforwarderhostname

This configuration will create a new indexed field called "splunk_forwarder" and set its value to whatever you put after the double colons; in this case, "myforwarderhostname". I typically use the hostname of the heavy forwarder, but you could also use the IP address, FQDN, etc. (See the note at the end of this post if you'd rather scope the field to specific inputs.)

Finally, restart Splunk on your heavy forwarder and search head(s). Any new data that gets indexed will automatically have your new splunk_forwarder field! Now you can run cool searches like this one to quickly see which forwarders are sending what data to Splunk:

| tstats count where splunk_forwarder=* index=* by splunk_forwarder sourcetype index
| stats values(index) as index values(sourcetype) as sourcetype sum(count) as count by splunk_forwarder

New to Splunk? This is a list of learning resources that I've curated for new Splunk users over the years. Feel free to share this with your fellow Splunkers!
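One note on the inputs.conf above, as promised: setting _meta under [default] adds the field to every input on that heavy forwarder, including its own internal logs. If you'd rather tag only specific data, set it on the individual input stanzas instead. For example (the monitor path here is just an illustration):

[monitor:///var/log/syslog]
_meta = splunk_forwarder::myforwarderhostname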