
ABC, 123, Ruby, C#, SAS, SQL, TDD, VB.NET, XYZ

Sunday, November 11, 2007

Futzing with FUTS - Part I

Unit testing is a well-accepted practice in the software development community. There are many tools and articles devoted to it. Google 'unit testing' if you have any doubts.



What about those of us working in the SAS realm? Given that SAS is basically a data-oriented scripting language, is it even feasible to unit test SAS code? I would say, "Of course it is; it's code, after all!" If there is code, we can test it. There's CUnit, for heaven's sake! I've even had the pleasure of using it (and then opted to roll my own C unit tests). :)



A typical SAS program moves data around or analyzes it, and consists of data steps (cursor-style data manipulation) and/or procedures ("procs"). Thrown in are miscellaneous statements that make it all work as it should (e.g., libname statements). Here is a SAS program that creates a text file containing 100 random integers between 1 and 10.



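data _NULL_;                        /* no output dataset is created */
  file "c:\my_folder\random.txt";   /* PUT statements write to this file */
  do i = 1 to 100;
    r = 1 + int(10*ranuni(-1));     /* integer from 1 to 10; a seed <= 0 uses the clock */
    put r;
  end;
run;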


Naturally, SAS is capable of WAY more powerful things, but we must start simple.



What if I want to test my random number generator to make sure that it always and only generates numbers between 1 and 10? How would I do such a thing in SAS? Let's change gears for a minute and consider what we would do in C#. In C# we would have a solution containing three projects: a class library called RandomNumberGenerator, a console application called RandomClient, and a class library called TestHarness.



RandomNumberGenerator



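using System;

namespace UtilityLib
{
    public class RandomNumberGenerator
    {
        // One shared generator for all calls.
        private static Random generator = new Random();

        // Returns an integer from 1 to 10 (the upper bound of Next is exclusive).
        public static int Ranuni()
        {
            return generator.Next(1, 11);
        }
    }
}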


RandomClient



using System;
using System.IO;
using UtilityLib;

namespace RandomClient
{
    class Program
    {
        static void Main(string[] args)
        {
            // Write 100 random numbers, one per line. The using block
            // flushes and closes the file when it exits.
            using (StreamWriter writer = new StreamWriter(@"c:\my_folder\random.txt"))
            {
                for (int i = 0; i < 100; ++i)
                {
                    writer.WriteLine(RandomNumberGenerator.Ranuni());
                }
            }
        }
    }
}


TestHarness



using System;
using UtilityLib;
using NUnit.Framework;

namespace TestHarness
{
    [TestFixture]
    public class RandomNumberGeneratorTestSuite
    {
        // Every generated value must lie between 1 and 10 inclusive.
        [Test]
        public void Ranuni_TestBounds()
        {
            for (int i = 0; i < 100; ++i)
            {
                int r = RandomNumberGenerator.Ranuni();
                Assert.IsTrue(r >= 1 && r <= 10);
            }
        }

        // Every value from 1 to 10 must appear at least once in 100 draws.
        [Test]
        public void Ranuni_TestFullRange()
        {
            int[] counts = new int[10];   // counts[k - 1] = occurrences of value k
            for (int i = 0; i < 100; ++i)
                counts[RandomNumberGenerator.Ranuni() - 1] += 1;
            for (int i = 0; i < 10; ++i)
                Assert.IsTrue(counts[i] >= 1);
        }
    }
}


[Screenshot: NUnit Success]

Now let's get back to SAS. What can we do, similarly, to test random number generation in SAS? The trick to unit testing in SAS is to place the code you want to test (your black box, if you will) into a macro. The random number generation code therefore becomes a SAS macro like this:



RandomNumberGenerator



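/* Expands to an expression, not a statement, so there is no
   semicolon: callers write  r = %RandomNumberGenerator;  */
%macro RandomNumberGenerator;
  1 + int(10*ranuni(-1))
%mend;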


Now my client code looks like this. It just calls the RandomNumberGenerator macro 100 times to create the output file.



RandomClient



data _NULL_;
  file "c:\my_folder\random.txt";
  do i = 1 to 100;
    r = %RandomNumberGenerator;
    put r;
  end;
run;


Now, what about a test harness for RandomNumberGenerator? It is finally time for FUTS (Framework for Unit Testing SAS® programs) to make its appearance. FUTS, a free product from Thotwave, is a wonderful set of easy-to-use SAS assert macros that test for various conditions, similar to the set of NUnit asserts. Unlike NUnit, FUTS doesn't have a slick GUI; instead, it writes an error to the SAS log when an assert fails and writes nothing to the log when an assert succeeds. To run your tests, run the test harness code and check the SAS log for errors.
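Because a passing assert is silent, a clean log could also mean the asserts never ran. A quick way to confirm the setup is to run one assert that passes and one that fails on purpose. This is a minimal sketch that assumes the FUTS macros have already been loaded into your SAS session, and it reuses only the %assert_sym_compare call form shown in this post:

```sas
/* Assumes the FUTS macros are already loaded in this session.     */

/* Passes: 10 is GE 1, so FUTS should write nothing to the log.     */
%assert_sym_compare(10, 1, type=COMPARISON, operator=GE);

/* Fails on purpose: 10 is not LE 1, so FUTS should write an error  */
/* to the SAS log. Remove this line once you have seen the failure. */
%assert_sym_compare(10, 1, type=COMPARISON, operator=LE);
```

If the second call leaves the log clean, the asserts are not being executed and a "green" run of the real harness means nothing.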



To test the RandomNumberGenerator macro, I first create a temporary dataset called test1 that contains 100 random numbers. To do a lower and upper bounds check (i.e., all random numbers are between 1 and 10), I select min(r) and max(r) into macro variables and use the FUTS macro %assert_sym_compare to test that (a) minr is greater than or equal to (GE) 1 and (b) maxr is less than or equal to (LE) 10. This is equivalent to Assert.IsTrue(r >= 1 && r <= 10); in the C# test Ranuni_TestBounds() above. The second test, making sure that each number from 1 to 10 is generated at least once, is accomplished by first running proc freq (to count how many times each value appears), then reading the count for each number (1, 2, ..., 10) into a macro variable and testing, again with %assert_sym_compare, that each count is GE 1.



TestHarness



/* Generate 100 random numbers with the macro under test. */
data test1;
  do i = 1 to 100;
    r = %RandomNumberGenerator;
    output;
  end;
  drop i;
run;

/* Test 1: bounds (equivalent to Ranuni_TestBounds). */
proc sql noprint;
  select min(r), max(r) into :minr, :maxr from test1;
quit;
%assert_sym_compare(&minr, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&maxr, 10, type=COMPARISON, operator=LE);

/* Test 2: every value occurs (equivalent to Ranuni_TestFullRange). */
proc freq data=test1;
  tables r / noprint out=freqout;
run;

/* Default each count to 0: if a value never occurs, its SELECT returns
   no rows and would otherwise leave the macro variable unset (or holding
   a value from an earlier run), so the assert could not catch it. */
%let count1=0; %let count2=0; %let count3=0; %let count4=0; %let count5=0;
%let count6=0; %let count7=0; %let count8=0; %let count9=0; %let count10=0;

proc sql noprint;
  select count into :count1 from freqout where r=1;
  select count into :count2 from freqout where r=2;
  select count into :count3 from freqout where r=3;
  select count into :count4 from freqout where r=4;
  select count into :count5 from freqout where r=5;
  select count into :count6 from freqout where r=6;
  select count into :count7 from freqout where r=7;
  select count into :count8 from freqout where r=8;
  select count into :count9 from freqout where r=9;
  select count into :count10 from freqout where r=10;
quit;
%assert_sym_compare(&count1, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&count2, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&count3, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&count4, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&count5, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&count6, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&count7, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&count8, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&count9, 1, type=COMPARISON, operator=GE);
%assert_sym_compare(&count10, 1, type=COMPARISON, operator=GE);


This last bit of code is repetitive and would ideally be "macro-ized".
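One way to macro-ize the second test is a %do loop over the expected values. The macro below is a hypothetical helper, not part of FUTS: the name assert_all_values_present, its parameters, and the scratch dataset _freq_check are my own inventions for illustration. It is a sketch that assumes the FUTS macros are loaded, and it calls %assert_sym_compare exactly as the hand-written version does:

```sas
/* Hypothetical helper, not part of FUTS. Assumes the FUTS macros are
   loaded. Asserts that every integer value from FROM to TO occurs at
   least once in variable VAR of dataset DATA.                        */
%macro assert_all_values_present(data=, var=, from=1, to=10);
  %local k cnt;

  /* Tabulate how often each value of the variable occurs. */
  proc freq data=&data;
    tables &var / noprint out=_freq_check;
  run;

  %do k = &from %to &to;
    /* Default to 0: if the value never occurs, the SELECT returns no
       rows and leaves cnt unchanged, so the assert below fails.       */
    %let cnt = 0;
    proc sql noprint;
      select count into :cnt from _freq_check where &var = &k;
    quit;
    %assert_sym_compare(&cnt, 1, type=COMPARISON, operator=GE);
  %end;

  /* Remove the scratch dataset. */
  proc datasets library=work nolist;
    delete _freq_check;
  quit;
%mend assert_all_values_present;
```

With it, the proc freq step, the ten SELECT statements and the ten asserts collapse into a single call: %assert_all_values_present(data=test1, var=r, from=1, to=10); Because r is always an integer, another option is to select count(distinct r) into a macro variable and, once the bounds check has passed, assert that it is GE 10. Either way the check is probabilistic, just like Ranuni_TestFullRange in C#: with 100 draws there is at most about a 0.03% chance (10 × 0.9^100) that some value never appears, so a rare failure does not by itself prove the generator is broken.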
