I just wanted to make a quick post to address common issues with JSON field extractions that I've seen in Splunk over the years.

Issue #1: JSON doesn't extract in long events

Recently, we had JSON events that were over 10,000 characters long, and the fields were not extracting properly. We solved that with a simple change in limits.conf:

[kv]
maxchars = 20000

Issue #2: Nested key=value pairs

Another issue I've run into is nested key=value pairs inside the JSON dictionary. To solve that, look no further than this blog post.

Issue #3: Bad dictionaries

Finally, and what I believe is a common issue, we've run into some silliness with how JSON dictionaries are being used. If you've ever seen a multi-value field named parameters{}.name with all of your keys in it and another multi-value field named parameters{}.value containing all the values for those keys in Splunk, then you know what I'm talking about. The raw data will usually look something like the sample event sketched at the end of this post.

Solution A

If the data is coming from a scripted input, you can usually do a little ETL in Python to fix it. For example, after you get the API response back, you can loop through the dictionary and call a function to reformat the events (a Python sketch is included at the end of this post). That will clean up the JSON and give you a nice clean dictionary under parameters.

Solution B

If you don't have control over the source, all hope is not lost. You can perform a SEDCMD at index time to rewrite the dictionary. You'd want to test this at search time first, so here's the general idea:

| rex field=_raw mode=sed "s/\"key\":\"([^\"]+)\",\"value\"/\"\1\"/g"
| spath

Once you have your SED expression down, you can simply convert it to a props.conf configuration and deploy it onto your indexers (or the heavy forwarder if the data is coming in from one). A sample [my_sourcetype] props.conf stanza is sketched at the end of this post.

Solution C

If you don't want to modify the raw data using SEDCMD as in Solution B, there is yet another alternative: you can tell Splunk how to extract the key-value pairs using props and transforms. A sketch of the [my_sourcetype] props.conf stanza and the [json_key_value_extraction] transforms.conf stanza is included at the end of this post.

Issue #4: Duplicate extractions in Splunk

The last common issue that I've seen is JSON fields being extracted twice. This is caused by the data being indexed with INDEXED_EXTRACTIONS = JSON in props.conf while the search head is using either the default of KV_MODE = auto or an explicit KV_MODE = json. The field appears twice because the indexed fields automatically show up in Splunk, and then the search head extracts the fields a second time. The fix for this is simple: just set KV_MODE = none for the corresponding sourcetype in props.conf on the search head (a snippet is included at the end of this post). You could also disable indexed extractions on the indexers, but you will still see duplicate field extractions for historical data (indexed prior to the change).

I hope these solutions can help some other Splunk administrators out there with their JSON data. Feel free to comment!
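Below are the sketches referenced above. First, for Issue #3, a hypothetical raw event showing the pattern (the field names and values are invented for illustration). The inner objects here use "key"/"value", which is what the SED expression in Solution B assumes; if your data uses "name"/"value" instead, as the parameters{}.name field name suggests, adjust the patterns accordingly:

{"action": "update", "parameters": [{"key": "user", "value": "jsmith"}, {"key": "src_ip", "value": "10.1.2.3"}, {"key": "status", "value": "success"}]}

When spath runs over an event like this, you end up with one multi-value field holding all the keys (user, src_ip, status) and another holding all the values, instead of proper fields like parameters.user.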
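For Solution A, a minimal Python sketch of the reformatting step, assuming the API response has already been parsed into a list of dicts and the "parameters"/"key"/"value" names match the hypothetical event above (the function and variable names are mine, not from any particular API):

import json

def reformat_event(event):
    # Collapse a list of {"key": ..., "value": ...} objects into a flat dict,
    # so Splunk extracts parameters.user, parameters.src_ip, etc.
    params = event.get("parameters", [])
    if isinstance(params, list):
        event["parameters"] = {p["key"]: p["value"] for p in params if "key" in p}
    return event

response = [{"action": "update", "parameters": [{"key": "user", "value": "jsmith"}]}]
for raw_event in response:
    # Write each cleaned event out as one JSON line for Splunk to index
    print(json.dumps(reformat_event(raw_event)))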
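For Solution B, the same SED expression converted into props.conf. This is a sketch only; the sourcetype name and the SEDCMD class name (fix_json) are placeholders:

props.conf

[my_sourcetype]
SEDCMD-fix_json = s/"key":"([^"]+)","value"/"\1"/g

Deploy this to the indexers (or the heavy forwarder the data passes through). It rewrites new events at index time; events that are already indexed are not changed.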
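For Solution C, a sketch of the props.conf and transforms.conf pairing. This is a search-time REPORT extraction, so it belongs on the search head; the regex is illustrative and assumes simple quoted string values with the same "key"/"value" naming as above:

props.conf

[my_sourcetype]
REPORT-json_key_value_extraction = json_key_value_extraction

transforms.conf

[json_key_value_extraction]
REGEX = "key":"([^"]+)","value":"([^"]*)"
FORMAT = $1::$2
MV_ADD = true

FORMAT = $1::$2 names each extracted field after the captured key, and MV_ADD = true keeps every match when the same key appears more than once in an event.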
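And for Issue #4, the search-head side of the fix, again with a placeholder sourcetype name:

props.conf (on the search head)

[my_sourcetype]
KV_MODE = none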