Gnuplot Quickstart

siiky

2022/01/12

2022/07/14

en

I started learning Gnuplot for a PA because I didn't want to deal with Python BS or whatnot, and Gnuplot has been around since like... even before the dinosaurs were invented, so it must be super specialized for this kinda thing, and it must be pretty good right?

Hopefully I'll document well enough the things I've learned these past few days, for posterity or someone else. I've been using it to plot 2D graphs of recorded data, not functions/expressions, so my focus will be on that.

Gnuplot files don't have a "standard" file extension, but some common ones seem to be .gp, .plot, .gnu, .gnuplot, .plt. I've been using .gp and will use .gp here.

To run Gnuplot scripts, just call gnuplot script.gp. It's possible to pass arguments to the script by using the -c flag: gnuplot -c script.gp arg1 arg2 etc. And inside the script the arguments are available as the variables ARG1, ARG2, etc. As is common on other programming languages, ARG0 is the script name. I don't know if there are, or what are the limits on the number of arguments, nor how to loop through them, but I'm guessing it's possible.

Now let's get going with some Gnuplot code. I said the focus would be on plotting datafiles, so let's start with expressions:

# The output "format".
set terminal svg

# The output file.
set output "exp.svg"

# Enable gridlines.
set grid

# Where to place the lines/points/&c legend.
set key right bottom

# Legend of the XX/YY axes.
set xlabel "The passage of time..."
set ylabel "Shittiness of the web"

# Mirror or not the axes' tics -- notice the YY axis has tics on both the left
#   and right, but the XX axis has only on the bottom, not the top.
set ytics mirror
set xtics nomirror

# Use a logscale of base 7 for the XX axis -- the base is optional and defaults
#   to 10 I think.
set logscale y 7

# The actual plot: `exp(x)` is the expression to plot; `x` is "special" --
#   there are a few different variables you can use, but they seem to depend on
#   the available axes/dimensions, but I don't know details of this so RTFM.
#
# `title "..."` sets this line's legend.
plot exp(x) title "Super straight line"

Notice how it starts to grow really fucking quick after t=5 -- right after HTML was invented.

Here's another one:

set terminal svg
set output "rollercoaster.svg"

# The number of samples to use to plot the expression.
set samples 1000

# The ranges here specify the XX and YY ranges respectively.
plot [-50:50] [-5:5] x*sin(x)*cos(x)**x title "Rollercoaster"

The website, with documentation and all (including a 300+ pages PDF of all the documentation, with proper PDF index!):

The first seems to be the "official" one, but is sometimes offline? The second looks like a mirror.

You can use the help command to read the documentation inside the Gnuplot REPL too.

An important concept is that of the terminal, as seen above being set to SVG. It's nothing but an "output backend", and Gnuplot has tons of those -- run set terminal and see for yourself; there's even one to output ASCII art to the terminal! Different terminals may have different specific options -- RTFM for those.

Once you start messing around with line styles, line types, colors, and whatnot, it's helpful to know what the valid values are. For that use the test command after setting the terminal (the result of the test command varies depending on the terminal, so it's important to set it):

set terminal svg
set output "gnuplot-test.svg"
test

Variables are a thing, and you can define them just as you'd expect:

some_var = 42

To plot data from files just pass the filename to plot:

plot "/path/to/file.tsv" # ...

Gnuplot is supposed to support many different formats but I don't know details here. I've been using TSV because it makes sense. For tabular data files (TSV, CSV, ...), this may be useful:

set datafile separator tab

RTFM for details: help set datafile separator.

It's possible to define datasets inside a Gnuplot script, too, like this:

plot "-"
1 2
3 4
5 6
7 8
9 0
e

Notice the e at the end! You can even define more than one for the same plot command:

plot "-", "-"
1 2
3 4
5 6
7 8
9 0
e
2 1
4 3
6 5
8 7
0 9
e

Another arguably more useful way is to do it like so (notice the dollar!):

$SomeData << EOD
1 2
4 5
7 8
EOD
plot $SomeData # ...

This kind of inline data definition doesn't seem to work on the REPL though... At least I couldn't make it work.

For tabular data files, files may have many columns, some that you want, some that you don't, some that are in the wrong order... To solve that, you use using:

plot "-" using 1:3
1 2 3
4 5 6
7 8 9

The above uses the first and third columns of the dataset.

And with that, if you want to plot several graphs from the same dataset, you can do it like so:

plot "/path/to/file.tsv" using 1:3, "" using 1:4

Assuming the data file has at least 4 columns, the above will plot a line/w.e. using the first and third columns, and then another using the first and fourth columns. The empty string there is a shortcut to mean "the previous dataset/file".

For certain plot types, such as for errorlines or errorbars, you may want or need to use more than 2 columns of data.

And a final plot, pretty much the most advanced I can get right now. The dataset's fields are separated by tabs but your browser or something may present them as spaces, so download the file for greater €€€profit€€€.

$Dataset << EOD
NELEMS	RTIME-MEAN	RTIME-MIN	RTIME-MAX	TOTCYC-MEAN	TOTCYC-MIN	TOTCYC-MAX	TOTINS-MEAN	TOTINS-MIN	TOTINS-MAX	L1DCM-MEAN	L1DCM-MIN	L1DCM-MAX	L2DCM-MEAN	L2DCM-MIN	L2DCM-MAX
100	51.5	51	52	153107.0	151836	154378	49774.5	42768	56781	912.5	849	976	557.0	528	586
1000	386.4	373	396	1117484.4	1076454	1145864	652026.8	634123	680188	5681.0	5118	5990	1660.4	1300	1793
10000	29215.8	29061	29364	21425394.0	10119054	26280869	15941285.8	6979549	22139082	619685.8	169351	1140152	298639.6	17765	644415
100000	2852669.8	2845788	2859074	708231510.6	119757602	1080138950	1145341284.8	184498435	1753956316	93553900.2	13537085	144424582	72470004.8	2110199	120731566
200000	11472829.2	11426994	11502233	922765811.8	181797852	3298279336	1480628498.6	268678225	5359163617	120437869.6	19420767	444041994	87885551.2	2726575	385487784
300000	25821154.8	25729787	25883755	7291510319.0	393353142	9962527951	11904840316.2	594020363	16257913776	988685146.8	45337229	1351805549	907188629.2	12260881	1258022140
400000	45910144.6	45833530	46047114	680904937.0	670501982	689922089	1033095368.6	1029084389	1038083524	80272464.4	80011372	80610671	30805397.6	29580829	32181760
500000	71859779.2	71703444	72007099	11187695199.8	1031362427	25645713435	18329461121.4	1586757574	42172405522	1521273469.8	125041982	3509799991	1394315157.4	56848580	3340745801
600000	103237386.5	103196178	103278595	1492849041.0	1484306515	1501391567	2238479602.0	2234440484	2242518720	177991124.0	177535409	178446839	93661585.0	93119294	94203876
EOD

set terminal svg
set output "errorlines.svg"

# Tell Gnuplot that fields are separated by a tab, as briefly mentioned before.
set datafile separator tab

set title "Some shitty performance right here..."
set key left top

# Ask Gnuplot to use log scales for the XX, YY, and YY2 (right side) axes.
set logscale xyy2 10

set xtics nomirror
set ytics nomirror
set y2tics nomirror

set xlabel "#Elements"
set ylabel "Time (s)"
set y2label "L1 Cache Misses"

set grid

# The `($n/1000000)` syntax asks Gnuplot to divide the values of the field `n`
#   by 1000000 (in this case, the time is in microseconds, so dividing by
#   1000000 converts to seconds).
#
# `with yerrorlines` changes the style of plot, in this case lines with error
#   bars. `yerrorbars` is the same but without the connecting lines.
# Other common styles are `points` (the default?), `lines`, & `linespoints`.
# RTFM for more: `help with`.
#
# The `yerrorlines` style requires additional values. There are some different
#   alternatives (RTFM), but in this case the columns are x:y:ymin:ymax. In
#   this dataset I've used the mean for the YY, but you may use whatever you
#   wish.
#
# `title columnheader` asks Gnuplot to automatically read the given line's
#   legend from the input dataset. Note that Gnuplot supports some LaTeX-like
#   formatting syntax for text. E.g., the text "RTIME_MEAN" would be rendered
#   as "RTIMEMEAN" with the the "M" of "MEAN" in subscript.
#
# Finally, `axis x1y1` & `axis x1y2` set the axes the data should be plotted in
#   -- x1 & x2 for bottom & top XX respectively; y1 & y2 for left & right YY
#   respectively.
plot $Dataset using 1:($2/1000000):($3/1000000):($4/1000000) with yerrorlines title columnheader axis x1y1,\
     ""       using 1:11:12:13                               with yerrorlines title columnheader axis x1y2

-----

Just a couple of notes on security, especially for someone wanting to develop an interface library. These are things that may be useful when writing and running scripts directly in Gnuplot, but that are a security nightmare if left as something to think about tomorrow.

system() a la C is a thing!

And so are backticks like in shell languages! The first line of the following Gnuplot code runs the echo command, but the second one doesn't:

"`echo hello from Gnuplot`"
'`echo hello from Gnuplot`'