Introduction
In part 1 of this series I walked through the process of installing sysstat on a Linux-based SQL Server as well as some simple commands for getting data from it. The next question is … how do I get this data into our database server so that we can analyze it? Setting up a SSIS package to read and transform the data would be a great way to go, but I will not be discussing that today because it’s a lot to ask of a reader that is just starting to evaluate the use of Linux in their environment. Instead I will use this post to discuss using the bcp (Bulk Copy) command to load the data into a database without the need for development work.
Sample Database
This particular post is all about minimizing effort, so I will not be using format files for the bulk copy commands. In the name of simplicity I will also be avoiding using any tricks to work around bulk copy limitations (such as inserting into views instead of tables). That means that each type of data that we get from sar/sadf will need to go into it’s own table, that the columns in the table need to line up exactly with columns in the data files (in order … bcp pays attention to column order), and data types need to be close enough that everything can import with only implicit conversions.
I hope that most readers will be disturbed by the composite keys found in this schema. This is a consequence of the fact we are not using format files for the import – since the table columns need to match what’s in the file, there’s no way to add an identity column without doing something tricky. Stay tuned – part three of this series will add format files to the mix which will allow us to clean up the schema a bit.
CREATE DATABASE perf -- Adjust file paths, sizing, etc as needed for your environment
ON PRIMARY( NAME = N'pri', FILENAME = N'/sqldata/perf.mdf' , SIZE = 8MB, FILEGROWTH = 1MB ),
FILEGROUP PERF DEFAULT ( NAME = N'perf', FILENAME = N'/sqldata/perf.ndf', SIZE = 1GB, FILEGROWTH = 1GB )
LOG ON ( NAME = N'perf_log', FILENAME = N'/sqllog/perf_log.ldf' , SIZE = 1GB , FILEGROWTH = 128MB )
GO
ALTER DATABASE perf SET RECOVERY SIMPLE; -- Not required, but simple mode may be easier for testing
GO
USE perf
GO
CREATE SCHEMA sar AUTHORIZATION dbo;
GO
CREATE TABLE sar.cpu_data(
hostname VARCHAR(128) NOT NULL,
interval SMALLINT NOT NULL,
tstamp DATETIME2(0) NOT NULL,
cpu TINYINT NOT NULL,
pct_user DECIMAL(5,2) NOT NULL,
pct_nice DECIMAL(5,2) NOT NULL,
pct_sys DECIMAL(5,2) NOT NULL,
pct_iowait DECIMAL(5,2) NOT NULL,
pct_steal DECIMAL(5,2) NOT NULL,
pct_idle DECIMAL(5,2) NOT NULL,
CONSTRAINT PK_cpu_data PRIMARY KEY CLUSTERED( hostname, tstamp, cpu )
);
CREATE TABLE sar.mem_data(
hostname VARCHAR(128) NOT NULL,
interval SMALLINT NOT NULL,
tstamp DATETIME2(0) NOT NULL,
kbmemfree INT NOT NULL,
kbavail INT NOT NULL,
kbmemused INT NOT NULL,
pct_used DECIMAL(5,2) NOT NULL,
kbbuffers INT NOT NULL,
kbcached INT NOT NULL,
kbcommit INT NOT NULL,
pct_commit DECIMAL(5,2) NOT NULL,
kbactive INT NOT NULL,
kbinact INT NOT NULL,
kbdirty INT NOT NULL,
CONSTRAINT PK_mem_data PRIMARY KEY CLUSTERED( hostname, tstamp )
);
CREATE TABLE sar.disk_data(
hostname VARCHAR(128) NOT NULL,
interval SMALLINT NOT NULL,
tstamp DATETIME2(0) NOT NULL,
dev VARCHAR(128) NOT NULL,
tps DECIMAL(9,2) NOT NULL,
rkbs DECIMAL(9,2) NOT NULL,
wkbs DECIMAL(9,2) NOT NULL,
areq_sz DECIMAL(9,2) NOT NULL,
aqu_sz DECIMAL(9,2) NOT NULL,
await DECIMAL(9,2) NOT NULL,
svctm DECIMAL(9,2) NOT NULL,
pct_ut DECIMAL(5,2) NOT NULL,
CONSTRAINT PK_disk_data PRIMARY KEY CLUSTERED( hostname, tstamp, dev )
);
CREATE TABLE sar.net_data(
hostname VARCHAR(128) NOT NULL,
interval SMALLINT NOT NULL,
tstamp DATETIME2(0) NOT NULL,
iface VARCHAR(128) NOT NULL,
rxpcks DECIMAL(9,2) NOT NULL,
txpcks DECIMAL(9,2) NOT NULL,
rxkBs DECIMAL(9,2) NOT NULL,
txkBs DECIMAL(9,2) NOT NULL,
rxcmps DECIMAL(9,2) NOT NULL,
txcmps DECIMAL(9,2) NOT NULL,
rxxcsts DECIMAL(9,2) NOT NULL,
pct_ut DECIMAL(5,2) NOT NULL,
CONSTRAINT PK_net_data PRIMARY KEY CLUSTERED( hostname, tstamp, iface )
);
CREATE TABLE sar.page_data(
hostname VARCHAR(128) NOT NULL,
interval SMALLINT NOT NULL,
tstamp DATETIME2(0) NOT NULL,
pgins DECIMAL(9,2) NOT NULL,
pgouts DECIMAL(9,2) NOT NULL,
faults DECIMAL(9,2) NOT NULL,
mfaults DECIMAL(9,2) NOT NULL,
pgfrees DECIMAL(9,2) NOT NULL,
pgscanks DECIMAL(9,2) NOT NULL,
pgscands DECIMAL(9,2) NOT NULL,
pgsteals DECIMAL(9,2) NOT NULL,
pct_vmeff DECIMAL(5,2) NOT NULL,
CONSTRAINT PK_page_data PRIMARY KEY CLUSTERED( hostname, tstamp)
);
Sample script for daily data import
Below is a short shell script that could be scheduled on the Linux server to import a day’s worth of data. The script as written will process the previous day’s data and in most environments would probably be run shortly after midnight. Lines 6 and 7 allow us to supply the SQL username and password (this is convenient for testing, but we shouldn’t use clear text passwords if this goes to prod). The command at line 9 calls out (back quotes) to the Linux date command to get the day number (+”%d”) for the previous day (–date=”yesterday”). Line 10 uses this to construct the name of the sysstat data file (using the Ubuntu naming convention – on RedHat location will be different ).
#!/bin/sh
# NOTE - password is for testing. Please don't store password in clear text in prod ... consider
# joining server to an AD domain and using integrated security or using
# a password safe.
sqluser="username_goes_here"
sqlpass="password_goes_here"
day=`date --date="yesterday" +"%d"`
fname="/var/log/sysstat/sa$day"
# On RedHat use fname=/var/log/sa/sa$day"
sadf -P ALL $fname -e 23:59:00 -td > cpu.txt
sadf $fname -e 23:59:00 -td -- -r > mem.txt
sadf $fname -e 23:59:00 -td -- -dp > disk.txt
sadf $fname -e 23:59:00 -td -- -n DEV > net.txt
sadf $fname -e 23:59:00 -td -- -B > page.txt
/opt/mssql-tools/bin/bcp sar.cpu_data in cpu.txt -U $sqluser -P $sqlpass -d perf -S "(local)" -F 2 -c -t ";" -r "\n"
/opt/mssql-tools/bin/bcp sar.mem_data in mem.txt -U $sqluser -P $sqlpass -d perf -S "(local)" -F 2 -c -t ";" -r "\n"
/opt/mssql-tools/bin/bcp sar.disk_data in disk.txt -U $sqluser -P $sqlpass -d perf -S "(local)" -F 2 -c -t ";" -r "\n"
The calls to sadf were discussed in part one of this series but I should spend some time drilling into the bcp commands, whose syntax is timeless. In this case, by “timeless” I really just mean that the syntax not changed much in the past three decades, and hearkens back to the heyday of UNIX-y command-line utilities (I actually used to use this tool all the time around the turn of the millennium when I was working with Sybase SQL Server on Tru64 UNIX). It is absolutely worthwhile for us to work through the syntax, however, because Bulk Copy is usually the fastest way to get data from a text file into SQL Server.
Let’s decompose the first bcp call to see what all of the options do.
- /opt/mssql-tools/bin/bcp – path to bcp command, in case we haven’t added it to our path yet
- sar.cpu_data – name of the table we are working on (in this case inserting into)
- in – bcp supports both input and output. “in” specifies that we are inserting data
- cpu.txt – name of the file we are working on (inserting data from)
- -U $sqluser – SQL username (supplied at top of script). Use of the -U and -P options indicate we are using SQL authentication, if we want integrated authentication we would use -T instead. Joining a Linux server to Active Directory is out of scope for this post.
- -P $sqlpass – SQL password (supplied at top of script)
- -d perf – use the perf database
- -S “(local)” – Server name. In this case we are working locally so we can simply say “(local)”.
- -F 2 – Start at line 2 of the file (because the first line is a header)
- -c – Input file is a text file (character format)
- -t “;” – Field terminator is semi-colon (i.e. this is a semi-colon delimited file). This is a quirk of sadf.
- -r “\n” – Row terminator is a newline character (rather than a semi-colon)
At this point we have seen how to get performance data with the sadf command and then get it into a database where we can start viewing it. But we probably do not care about all of the data that we are importing, and the fact that the tables need to exactly match the layout of the text files limits our schema design choices. Wouldn’t it be nice if we had more flexibility when mapping fields from the file onto table columns? We actually can do that if we do a little extra work before importing to set up a format file – which we will do in the next part of this series.