Mason Morales
  • Splunk Blog
  • Contact
  • About

More JSON, More Problems

9/30/2020

I just wanted to make a quick post to address common issues with JSON field extractions that I've seen in Splunk over the years.

Issue #1 JSON doesn't extract in long events
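Recently, we had JSON events that were over 10,000 characters long, and their fields were not being extracted properly. By default, search-time field extraction only looks at the first 10,240 characters of an event (the maxchars setting), so we solved it with a simple change in limits.conf on the search head: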
[kv]
maxchars = 20000
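If you aren't sure whether your events are actually crossing that limit, a quick offline check can tell you. Below is a minimal sketch in Python. It assumes you've exported a sample of raw events to a text file with one event per line (the script and file names are hypothetical) and that you're comparing against the default [kv] maxchars of 10240 characters:

```python
import sys

# Default [kv] maxchars in limits.conf; change this if you've already raised it.
KV_MAXCHARS = 10240


def events_over_limit(events, limit=KV_MAXCHARS):
    """Return (line_number, length) for each event longer than `limit` characters."""
    return [
        (number, len(event))
        for number, event in enumerate(events, start=1)
        if len(event) > limit
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_lengths.py sample_events.txt
    # The file is assumed to hold one raw event per line.
    with open(sys.argv[1], encoding="utf-8") as handle:
        lines = [line.rstrip("\n") for line in handle]
    for number, length in events_over_limit(lines):
        print(f"event {number}: {length} characters")
```

Run it as python check_lengths.py sample_events.txt; any event it reports is longer than the limit and will only be partially extracted.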

Issue #2 Nested key=value pairs
Another issue I've run into is nested key=value pairs inside the JSON dictionary. To solve that, look no further than this blog post.

Issue #3 Bad dictionaries
Finally, and what I believe is a common issue, we've run into some silliness with how JSON dictionaries are being used. If you've ever seen a multi-value field named parameters{}.key with all of your keys in it and another multi-value field named parameters{}.value containing all the values for those keys in Splunk, then you know what I'm talking about. The only thing tying each key to its value is its position in the two lists, which makes the data awkward to search.

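In the raw data, parameters is usually an array of small dictionaries, each holding one "key" entry and one "value" entry, instead of a single dictionary that maps each key directly to its value.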
Solution A
If the data is coming from a scripted input, you can usually do a little ETL in Python to fix it. For example, after you get the API response back, you can loop through the events and call a function that reformats each one. That will clean up the JSON and give you a nice clean dictionary under parameters.
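Here's a minimal sketch of what that reformatting function might look like. It assumes each event has already been parsed into a Python dict and that parameters is a list of {"key": ..., "value": ...} objects; the function name flatten_parameters and the sample event are hypothetical, not taken from any particular API:

```python
import json


def flatten_parameters(event):
    """Turn [{"key": k, "value": v}, ...] into {k: v, ...} under "parameters".

    Hypothetical helper: events without a list-valued "parameters" entry are
    returned unchanged. The event dict is modified in place and returned.
    """
    params = event.get("parameters")
    if isinstance(params, list):
        event["parameters"] = {
            item["key"]: item.get("value")
            for item in params
            if isinstance(item, dict) and "key" in item
        }
    return event


if __name__ == "__main__":
    # Hypothetical event in the problematic shape.
    raw = ('{"action": "login", "parameters": [{"key": "user", "value": "alice"}, '
           '{"key": "src_ip", "value": "10.0.0.5"}]}')
    event = flatten_parameters(json.loads(raw))
    # A scripted input indexes whatever it writes to stdout, one event per line.
    print(json.dumps(event))
```

If the same key shows up more than once in the list, the last value wins in this sketch; collect values into lists instead if you need to keep them all.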

Solution B
If you don't have control over the source, all hope is not lost. You can use a SEDCMD to rewrite the dictionary at index time. You'll want to test the expression at search time first, so here's the general idea (note that it assumes compact JSON, with no whitespace around the colons and commas):
| rex field=_raw mode=sed "s/\"key\":\"([^\"]+)\",\"value\"/\"\1\"/g" | spath
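Once you have your sed expression down, you can convert it to a props.conf configuration and deploy it onto your indexers (or onto the heavy forwarder, if the data comes in through one). Keep in mind that SEDCMD only applies to data indexed after the change; events that are already indexed are not rewritten.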

props.conf
[my_sourcetype]
SEDCMD-fix_json_dictionary = s/"key":"([^"]+)","value"/"\1"/g
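If you want to sanity-check the expression before deploying it, Python's re.sub handles the same pattern and \1 backreference the same way for a simple pattern like this, so you can run sample events through it offline. A minimal sketch, with a hypothetical sample event and a hypothetical function name:

```python
import json
import re

# Same pattern as the SEDCMD above. Like the original, it assumes compact JSON
# (no whitespace around the colons and commas).
PATTERN = re.compile(r'"key":"([^"]+)","value"')


def fix_json_dictionary(raw):
    r"""Emulate SEDCMD s/"key":"([^"]+)","value"/"\1"/g on one raw event."""
    return PATTERN.sub(r'"\1"', raw)


if __name__ == "__main__":
    # Hypothetical raw event in the problematic shape.
    raw = '{"parameters":[{"key":"user","value":"alice"},{"key":"src_ip","value":"10.0.0.5"}]}'
    fixed = fix_json_dictionary(raw)
    print(fixed)
    # The rewritten event should still parse as JSON.
    print(json.loads(fixed))
```

Note that the result is an array of single-key objects (for example {"user":"alice"}), so spath will give you fields like parameters{}.user rather than one parameters dictionary. That's still far easier to search than parallel key and value lists.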
Solution C
If you don't want to modify the raw data with SEDCMD as in Solution B, there is yet another alternative. You can tell Splunk how to extract the key/value pairs at search time using props.conf and transforms.conf (these go on the search head):

props.conf
[my_sourcetype]
REPORT-json_key_value_extraction = json_key_value_extraction
transforms.conf
[json_key_value_extraction]
REGEX = "key":"([^"]+)","value":"([^"]+)"
FORMAT = $1::$2
MV_ADD = 1
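To see what that transform does, here's a rough Python emulation: it applies the same regex to every match in the event and, like MV_ADD, keeps all values when a key repeats instead of discarding the later ones. It's only an illustration of the behavior, not how Splunk implements it, and the function name and sample event are hypothetical:

```python
import re
from collections import defaultdict

# Same REGEX as the transform above: group 1 is the field name, group 2 the value.
# Like the original, it only matches string values in compact JSON.
PAIR = re.compile(r'"key":"([^"]+)","value":"([^"]+)"')


def extract_pairs(raw):
    """Collect every key/value match; repeated keys keep all values (like MV_ADD)."""
    fields = defaultdict(list)
    for match in PAIR.finditer(raw):
        fields[match.group(1)].append(match.group(2))
    return dict(fields)


if __name__ == "__main__":
    # Hypothetical event where "tag" appears twice.
    raw = ('{"parameters":[{"key":"user","value":"alice"},'
           '{"key":"tag","value":"a"},{"key":"tag","value":"b"}]}')
    print(extract_pairs(raw))
```

As with the SEDCMD approach, this pattern only matches compact JSON with string values, so a numeric value like "value":5 won't be picked up.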
Issue #4 Duplicate extractions in Splunk
The last common issue that I've seen is JSON fields being extracted twice. This happens when the data is indexed with INDEXED_EXTRACTIONS = json in props.conf while the search head uses either the default KV_MODE = auto or an explicit KV_MODE = json. Each field appears twice because the indexed fields are already available at search time, and then the search head extracts the same fields again from the raw event.

The fix for this is simple: set KV_MODE = none for the corresponding sourcetype in props.conf on the search head. You could also disable indexed extractions where the data is parsed (for structured data like this, that is often the universal forwarder rather than the indexer), but you will still see duplicate field extractions for historical data indexed before the change.
I hope these solutions can help some other Splunk administrators out there with their JSON data. Feel free to comment!

    Author

    Mason Morales
    Splunk Architect
    SplunkTrust 2015-2019

Copyright © 2018 Mason Morales All rights reserved.
