
Conversation

@decibel (Contributor) commented Dec 29, 2015

This is an attempt to make it easier to parse JSON.sh output from within bash.

The first thing I looked at was supporting direct assignment to associative arrays (declare -A, not declare -a), since the native output format is very close to what you need for that. That change essentially amounts to piping the original output through egrep -v '^\[]' | tr '\t' =.

Despite its usefulness, I'm not a fan of that approach because, as far as I can tell, the only way to actually interpret that output in bash is with an eval, which is dangerous. If someone finds a flaw in any of this, they could inject arbitrary code, which would then get eval'd. I'll leave it to the reader to figure out what something like `eval rm -rf /` would end up doing...
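Concretely, the pipeline looks something like this (a sketch; it assumes JSON.sh's usual `["path"]<TAB>value` output and uses a flat document so no intermediate-object lines sneak in):

# The associative-array idea; note the eval that the paragraph above warns about.
json='{"a":"b","c":"d"}'
declare -A parsed
eval "parsed=( $(echo "$json" | ./JSON.sh | egrep -v '^\[]' | tr '\t' =) )"
echo "${parsed[a]}"   # prints: b  (bash's quote removal strips the JSON quotes)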

The next thing I looked at was better support for read -r key value. Thanks to the tab delimiter, that was also pretty easy: I just stripped the [] surrounding the path. This seems pretty robust, so long as you change IFS to $'\t'. That works because literal tabs aren't valid inside JSON strings, and the script seems to detect that pretty reliably (though I didn't exhaustively test that).

The part I don't like about key-value mode is that everything stays wrapped in double quotes. That's probably not a big deal for keys, but I'm worried it might cause problems for values, especially values that contain escapes. Maybe there's a clean way to deal with that; see the sketch below.
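Here's roughly what consuming that mode looks like (a sketch; the bracket-stripping is emulated in bash since the new mode's flag name isn't settled, and -l is JSON.sh's existing leaf-only option):

echo '{"a":{"b":"json value"}}' | ./JSON.sh -l \
  | while IFS=$'\t' read -r key value; do
      key=${key#\[}; key=${key%\]}   # what the new mode would do for us
      printf 'key=<%s> value=<%s>\n' "$key" "$value"
    done
# key=<"a","b"> value=<"json value">  -- note the JSON quotes are still there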

The remaining three modes are simple: key-only and value-only produce one key or value per line, as you'd expect, and the default mode retains the same behavior as today.

I also added a script of examples. It's a bit verbose and ugly, but at least it gives a solid foundation for using the script. It does lean very heavily on the function interface, though, which is probably not a good thing to promote since JSON.sh makes heavy use of globals.

@decibel (Contributor) commented Dec 31, 2015

I just looked at using xargs to de-quote things when using read; it breaks on escaped quotes. I guess that would be fixable by running valid JSON escape sequences through sed, though that leaves the question of what the output should be. For example, "bad": "=\\\"" turns into bad = =\" when processed into an array, which isn't really correct either. And tab escapes (\t) certainly aren't handled correctly.

Just to be clear, all of these problems already existed; the new format option and the usage examples in example.sh just expose them.

The only reason I think any of this matters is because right now you get different results from reading into an array vs doing key-value assignment. I think it would be best if they were at least consistent (and hopefully without resorting to eval).

Thoughts?

@dominictarr (Owner):

The keys should be converted into reasonable bash variable names; weird values should probably be converted to something without spaces, hyphens, periods, etc. Normally this will be okay. It might also be a good idea to prefix all those variables with a string provided by the user, which would also keep names valid when a path starts with a digit.

you can use bash's prefix expansion on variable names, so you could iterate over all the items in an array by doing for i in ${!foo_*}; do ... done (it's something like that, at least)

what if we replaced all non-alphanumeric characters in paths with a _, and joined each item with a double underscore __? or maybe just stripped out non-alphanumeric characters? Something like the sketch below.
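A sketch of that mangling (the helper name and the comma-separated path representation are made up here), including the prefix iteration:

mangle() {
    local prefix=$1 path=$2         # path items comma-separated, e.g. user,first-name
    path=${path//[^a-zA-Z0-9,]/_}   # replace non-alphanumerics inside items
    path=${path//,/__}              # join items with a double underscore
    printf '%s%s' "$prefix" "$path"
}

declare "$(mangle JSON_ 'user,first-name')=Jim"   # JSON_user__first_name=Jim
for i in ${!JSON_*}; do                           # every variable with the prefix
    echo "$i = ${!i}"                             # indirect expansion reads the value
done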

values should be surrounded by single quotes, because then they won't be interpreted by bash. The user could still create a security hole by not using that variable safely, but at least that wouldn't be our fault.

@dominictarr (Owner):

oh yeah, if there are single quotes in the input, they would need to be replaced with '"'"': close the current single quote, open a double quote that surrounds a literal single quote, close the double quote, and then reopen the single quote. Escaping quotes in bash is weird!
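That replacement in code (a sketch; the helper name is made up):

squote() {
    local q="'\"'\"'"            # the five-character sequence '"'"'
    printf "'%s'" "${1//\'/$q}"  # wrap in single quotes, escaping embedded ones
}
squote "it's"                    # -> 'it'"'"'s'

bash's printf %q does an equivalent job natively, if bash-flavored quoting is acceptable.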

@decibel (Contributor) commented Jan 4, 2016

On 1/2/16 8:14 PM, Dominic Tarr wrote:

> The keys should be converted into reasonable bash variable names; weird values should probably be converted to something without spaces, hyphens, periods, etc. Normally this will be okay. It might also be a good idea to prefix all those variables with a string provided by the user, which would also keep names valid when a path starts with a digit.
>
> you can use bash's prefix expansion on variable names, so you could iterate over all the items in an array by doing for i in ${!foo_*}; do ... done (it's something like that, at least)

Oh, I hadn't thought about turning each path into a variable. Is there a way to do that without eval?

> what if we replaced all non-alphanumeric characters in paths with a _, and joined each item with a double underscore __? or maybe just stripped out non-alphanumeric characters?

That would be OK most of the time, but maybe not all the time.

If you're shoving the data into an associative array (see my example.sh) it's not necessary; bash dequotes most everything for you. There are a few exceptions though, like tabs.

> values should be surrounded by single quotes, because then they won't be interpreted by bash.

Right, but the problem with using read is that the value itself contains the quotes, and it's difficult to get rid of them:

decibel@decina:[16:12]$ test_var='"json value"'
decibel@decina:[16:12]$ echo $test_var
"json value"

What I think should happen is that the variable is NOT quoted. AFAICT, bash is smart enough to understand that a variable reference is just that: a variable, and not something that should be parsed.

> the user could still create a security hole by not using that variable safely, but at least that wouldn't be our fault.

I don't think they can, unless they do something like eval, or echo it inside a command substitution $(). But I suspect the command substitution case is game over anyway, even if the variable is quoted.
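To demonstrate the assignment side of that (a sketch): a variable's contents are expanded exactly once and never re-parsed, so nothing in the value executes unless someone explicitly evals it.

payload='$(rm -rf /)'   # hostile-looking string, never executed
declare -A arr
arr[key]="$payload"     # plain assignment; no command substitution happens
echo "${arr[key]}"      # prints the literal text: $(rm -rf /)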

Anyway, now that I've thought about it a bit, I think the best option is to provide an unquote function to the user and let them run the risk of using it. That way you could do:

JSON.sh | while IFS=$'\t' read -r key value; do
    key=$(unquote "$key")
    value=$(unquote "$value")
    associative_array[$key]="$value"
done

I think that should be safe, because bash understands "$key" is a variable:

echo "$test_var"
"json value"

You could also use keys[] and values[] arrays if you don't want to mess with associative arrays...

By having this as a separate function, we leave it up to the user what they want to do. In the future we could also have the function optionally unquote all the other oddball escapes JSON supports. (I actually looked up code to handle \u, and it's not that horrible.)

For right now though, my inclination is just to produce a simple unquote() and call it good.
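Something like this, maybe (a sketch; it only strips the surrounding quotes and decodes \" and \\, punting on \t, \n, and \uXXXX for now):

unquote() {
    local s=$1
    s=${s#\"}; s=${s%\"}   # drop the surrounding JSON quotes
    s=${s//\\\"/\"}        # \" -> "  (must run before the \\ pass)
    s=${s//\\\\/\\}        # \\ -> \
    printf '%s' "$s"
}
unquote '"=\\\""'          # -> =\"  (the "bad" example from my earlier comment)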

Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

@dominictarr (Owner):

I suspect most people would run this by doing $(command) or `command`, which is probably the same thing as eval. That's what you get with a stringly-typed language like bash, though.

If I've learnt anything from doing bash, it's this: don't try to be too clever. This all gets much simpler if the keys are alphanumeric, and that is usually the case, unless some total asshole created the JSON you need to parse.

I think the most important questions are: what do you need this for, and who controls the source of the data?

@decibel (Contributor) commented Mar 1, 2016

(getting back to this...)

In my particular case the keys are generally pretty clean and the values would be provided by the user, so I could probably manage with either approach.
