Linux | Data FLowe Solutions

Introduction

In part 1 of this series I walked through the process of installing sysstat on a Linux-based SQL Server as well as some simple commands for getting data from it. The next question is … how do I get this data into our database server so that we can analyze it? Setting up a SSIS package to read and transform the data would be a great way to go, but I will not be discussing that today because it’s a lot to ask of a reader that is just starting to evaluate the use of Linux in their environment. Instead I will use this post to discuss using the bcp (Bulk Copy) command to load the data into a database without the need for development work.

Sample Database

This particular post is all about minimizing effort, so I will not be using format files for the bulk copy commands. In the name of simplicity I will also be avoiding using any tricks to work around bulk copy limitations (such as inserting into views instead of tables). That means that each type of data that we get from sar/sadf will need to go into it’s own table, that the columns in the table need to line up exactly with columns in the data files (in order … bcp pays attention to column order), and data types need to be close enough that everything can import with only implicit conversions.

I hope that most readers will be disturbed by the composite keys found in this schema. This is a consequence of the fact we are not using format files for the import – since the table columns need to match what’s in the file, there’s no way to add an identity column without doing something tricky. Stay tuned – part three of this series will add format files to the mix which will allow us to clean up the schema a bit.

CREATE DATABASE perf   -- Adjust file paths, sizing, etc as needed for your environment
ON PRIMARY( NAME = N'pri', FILENAME = N'/sqldata/perf.mdf' , SIZE = 8MB, FILEGROWTH = 1MB ),
FILEGROUP PERF DEFAULT ( NAME = N'perf', FILENAME = N'/sqldata/perf.ndf', SIZE = 1GB, FILEGROWTH = 1GB )
LOG ON ( NAME = N'perf_log', FILENAME = N'/sqllog/perf_log.ldf' , SIZE = 1GB , FILEGROWTH = 128MB )
GO

ALTER DATABASE perf SET RECOVERY SIMPLE;  -- Not required, but simple mode may be easier for testing
GO
USE perf
GO

CREATE SCHEMA sar AUTHORIZATION dbo;
GO

CREATE TABLE sar.cpu_data(
	hostname   VARCHAR(128) NOT NULL,
	interval   SMALLINT     NOT NULL,
	tstamp     DATETIME2(0) NOT NULL,
	cpu        TINYINT      NOT NULL,
	pct_user   DECIMAL(5,2) NOT NULL,
	pct_nice   DECIMAL(5,2) NOT NULL,
	pct_sys    DECIMAL(5,2) NOT NULL,
	pct_iowait DECIMAL(5,2) NOT NULL,
	pct_steal  DECIMAL(5,2) NOT NULL,
	pct_idle   DECIMAL(5,2) NOT NULL,
	CONSTRAINT PK_cpu_data PRIMARY KEY CLUSTERED( hostname, tstamp, cpu )
);

CREATE TABLE sar.mem_data(
	hostname   VARCHAR(128) NOT NULL,
	interval   SMALLINT     NOT NULL,
	tstamp     DATETIME2(0) NOT NULL,
	kbmemfree  INT          NOT NULL,
	kbavail    INT          NOT NULL,
	kbmemused  INT          NOT NULL,
	pct_used   DECIMAL(5,2) NOT NULL,
	kbbuffers  INT          NOT NULL,
	kbcached   INT          NOT NULL,
	kbcommit   INT          NOT NULL,
	pct_commit DECIMAL(5,2) NOT NULL,
	kbactive   INT          NOT NULL,
	kbinact    INT          NOT NULL,
	kbdirty    INT          NOT NULL,
	CONSTRAINT PK_mem_data PRIMARY KEY CLUSTERED( hostname, tstamp )
);

CREATE TABLE sar.disk_data(
	hostname VARCHAR(128) NOT NULL,
	interval SMALLINT     NOT NULL,
	tstamp   DATETIME2(0) NOT NULL,
	dev      VARCHAR(128) NOT NULL,
	tps      DECIMAL(9,2) NOT NULL,
	rkbs     DECIMAL(9,2) NOT NULL,
	wkbs     DECIMAL(9,2) NOT NULL,
	areq_sz  DECIMAL(9,2) NOT NULL,
	aqu_sz   DECIMAL(9,2) NOT NULL,
	await    DECIMAL(9,2) NOT NULL,
	svctm    DECIMAL(9,2) NOT NULL,
	pct_ut   DECIMAL(5,2) NOT NULL,
	CONSTRAINT PK_disk_data PRIMARY KEY CLUSTERED( hostname, tstamp, dev )
);

CREATE TABLE sar.net_data(
	hostname VARCHAR(128) NOT NULL,
	interval SMALLINT     NOT NULL,
	tstamp   DATETIME2(0) NOT NULL,
	iface    VARCHAR(128) NOT NULL,
	rxpcks   DECIMAL(9,2) NOT NULL,
	txpcks   DECIMAL(9,2) NOT NULL,
	rxkBs    DECIMAL(9,2) NOT NULL,
	txkBs    DECIMAL(9,2) NOT NULL,
	rxcmps   DECIMAL(9,2) NOT NULL,
	txcmps   DECIMAL(9,2) NOT NULL,
	rxxcsts  DECIMAL(9,2) NOT NULL,
	pct_ut   DECIMAL(5,2) NOT NULL,
	CONSTRAINT PK_net_data PRIMARY KEY CLUSTERED( hostname, tstamp, iface )
);

CREATE TABLE sar.page_data(
	hostname  VARCHAR(128) NOT NULL,
	interval  SMALLINT     NOT NULL,
	tstamp    DATETIME2(0) NOT NULL,
	pgins     DECIMAL(9,2) NOT NULL,
	pgouts    DECIMAL(9,2) NOT NULL,
	faults    DECIMAL(9,2) NOT NULL,
	mfaults   DECIMAL(9,2) NOT NULL,
	pgfrees   DECIMAL(9,2) NOT NULL,
	pgscanks  DECIMAL(9,2) NOT NULL,
	pgscands  DECIMAL(9,2) NOT NULL,
	pgsteals  DECIMAL(9,2) NOT NULL,
	pct_vmeff DECIMAL(5,2) NOT NULL,
	CONSTRAINT PK_page_data PRIMARY KEY CLUSTERED( hostname, tstamp)
);

Sample script for daily data import

Below is a short shell script that could be scheduled on the Linux server to import a day’s worth of data. The script as written will process the previous day’s data and in most environments would probably be run shortly after midnight. Lines 6 and 7 allow us to supply the SQL username and password (this is convenient for testing, but we shouldn’t use clear text passwords if this goes to prod). The command at line 9 calls out (back quotes) to the Linux date command to get the day number (+”%d”) for the previous day (–date=”yesterday”). Line 10 uses this to construct the name of the sysstat data file (using the Ubuntu naming convention – on RedHat location will be different ).

#!/bin/sh

# NOTE - password is for testing. Please don't store password in clear text in prod ... consider
# joining server to an AD domain and using integrated security or using
# a password safe.
sqluser="username_goes_here"
sqlpass="password_goes_here"

day=`date --date="yesterday" +"%d"`
fname="/var/log/sysstat/sa$day"
# On RedHat use fname=/var/log/sa/sa$day"

sadf -P ALL $fname -e 23:59:00 -td > cpu.txt
sadf $fname -e 23:59:00 -td -- -r > mem.txt
sadf $fname -e 23:59:00 -td -- -dp > disk.txt
sadf $fname -e 23:59:00 -td -- -n DEV > net.txt
sadf $fname -e 23:59:00 -td -- -B > page.txt

/opt/mssql-tools/bin/bcp sar.cpu_data in cpu.txt -U $sqluser -P $sqlpass -d perf -S "(local)" -F 2 -c -t ";" -r "\n"
/opt/mssql-tools/bin/bcp sar.mem_data in mem.txt -U $sqluser -P $sqlpass -d perf -S "(local)" -F 2 -c -t ";" -r "\n"
/opt/mssql-tools/bin/bcp sar.disk_data in disk.txt -U $sqluser -P $sqlpass -d perf -S "(local)" -F 2 -c -t ";" -r "\n"

The calls to sadf were discussed in part one of this series but I should spend some time drilling into the bcp commands, whose syntax is timeless. In this case, by “timeless” I really just mean that the syntax not changed much in the past three decades, and hearkens back to the heyday of UNIX-y command-line utilities (I actually used to use this tool all the time around the turn of the millennium when I was working with Sybase SQL Server on Tru64 UNIX). It is absolutely worthwhile for us to work through the syntax, however, because Bulk Copy is usually the fastest way to get data from a text file into SQL Server.

Let’s decompose the first bcp call to see what all of the options do.

/opt/mssql-tools/bin/bcp – path to bcp command, in case we haven’t added it to our path yet
sar.cpu_data – name of the table we are working on (in this case inserting into)
in – bcp supports both input and output. “in” specifies that we are inserting data
cpu.txt – name of the file we are working on (inserting data from)
-U $sqluser – SQL username (supplied at top of script). Use of the -U and -P options indicate we are using SQL authentication, if we want integrated authentication we would use -T instead. Joining a Linux server to Active Directory is out of scope for this post.
-P $sqlpass – SQL password (supplied at top of script)
-d perf – use the perf database
-S “(local)” – Server name. In this case we are working locally so we can simply say “(local)”.
-F 2 – Start at line 2 of the file (because the first line is a header)
-c – Input file is a text file (character format)
-t “;” – Field terminator is semi-colon (i.e. this is a semi-colon delimited file). This is a quirk of sadf.
-r “\n” – Row terminator is a newline character (rather than a semi-colon)

At this point we have seen how to get performance data with the sadf command and then get it into a database where we can start viewing it. But we probably do not care about all of the data that we are importing, and the fact that the tables need to exactly match the layout of the text files limits our schema design choices. Wouldn’t it be nice if we had more flexibility when mapping fields from the file onto table columns? We actually can do that if we do a little extra work before importing to set up a format file – which we will do in the next part of this series.

Data FLowe Solutions – Rick Lowe

MSSQL performance by a recovering developer and relationship counselor for DBA and dev teams.

Category Archives: Linux

Monitoring SQL Server on Linux Part 2: Getting sar Data Into SQL Server (Quick and Dirty)