INCLUDE_DATA

Wicket Homepage Redirecting

With the Wicket Java Web framework where a lot of magic happens behind the scenes, and almost no XML needs to be configured. Most of it is done with Java. It has a lot of pretty cool features. I just came across this one.

In the init setup of your class that extends WebApplication, you can “mount” your pages at paths, like this:

@Override
protected void init() {
	super.init();
	mountBookmarkablePage("/search", SearchPage.class);
}

So when you goto http://somewhere.com/somecontext/search, it gives you the SearchPage class.

You can also specify a HomePage index, like this:

public Class getHomePage() {
	return IndexPage.class;
}

Then when you goto http://somewhere.com/somecontext/, it gives you the IndexPage class.

The cool subtle thing here is what happens when you do them both; i.e. you mount SearchPage to /search, and you override getHomePage to return SearchPage.class. Do this, and when you goto http://somewhere.com/somecontext/, it will redirect to http://somewhere.com/somecontext/search!

Hello, Rule 30

Conus Pennaceus

Conus Pennaceus

Hello, Rule 30, didn’t expect to see you there! Thanks, Nature, for making the universe such an amazing place to live.

SQL Optimization: Union vs. Union All

Everyone should learn the difference between Union and Union All. Knowing it will make you a better programmer, and it’s fairly trivial to understand.

SELECT * FROM apples
UNION
SELECT * FROM oranges

When you know for a fact that there will never be any common rows between the apples table and the oranges table, this query will be slightly faster with at low cardinality, and incredibly faster at high cardinality by using “UNION ALL”

SELECT * FROM apples
UNION ALL
SELECT * FROM oranges

The difference between the two queries is this: UNION ALL will simply concatenate the two queries together into the resultset. Just using UNION will concatenate, but then remove duplicates (do a distinct sort). Leaving out this second step can vastly reduce the time it takes for your query to run.

Abusing Hash objects as a Stack in SAS Data Step

Everyone in the computer science field (should) eventually learn or realize that any recursive function can be rewritten as an iterative process with the aid of a stack. Since a SAS Data Step is iterative, it’s fairly easy to look up children of a tree node in metadata, but nearly impossible to recursively look up children of children, and so fourth, without breaking out of the Data Step loop and into macro code, because there’s no native program stack.

You can use the LINK keyword, but that just keeps the PC register in a stack to return to where you were a subroutine was called, it doesn’t recreate the data vector. Because of this, all variables in a SAS Data Step behave as global within the data step block.

You could use a _TEMPORARY_ array as a stack, but its size has to be hard-coded to a predetermined depth.

SAS 9 has a rarely used Hash object, which can be used within a Data Step. It grows as needed, so it can be used to function as a functionally infinite length array, and further abused to function as a stack within a Data Step. The following code demonstrates this:

data _null_;
  declare hash stack();
  length stackvar $100;
  length stackdepth 8.;
  stackdepth=0;
  rc=stack.defineKey('stackdepth');
  rc=stack.defineData('stackvar');
  rc=stack.defineDone();

  stackvar='a';
  stackdepth+1;
  rc=stack.add();
  put 'push' stackdepth= stackvar=; 

  stackvar='b';
  stackdepth+1;
  rc=stack.add();
  put 'push' stackdepth= stackvar=; 

  stackvar='c';
  stackdepth+1;
  rc=stack.add();
  put 'push' stackdepth= stackvar=; 

  rc=stack.find();
  rc=stack.remove();
  put 'pop' stackdepth= stackvar=;
  stackdepth+-1;

  rc=stack.find();
  rc=stack.remove();
  put 'pop' stackdepth= stackvar=;
  stackdepth+-1;

  rc=stack.find();
  rc=stack.remove();
  put 'pop' stackdepth= stackvar=;
  stackdepth+-1;

  stackvar='x';
  stackdepth+1;
  rc=stack.add();
  put 'push' stackdepth= stackvar=; 

  stackvar='y';
  stackdepth+1;
  rc=stack.add();
  put 'push' stackdepth= stackvar=; 

  stackvar='z';
  stackdepth+1;
  rc=stack.add();
  put 'push' stackdepth= stackvar=; 

  rc=stack.find();
  rc=stack.remove();
  put 'pop' stackdepth= stackvar=;
  stackdepth+-1;

  rc=stack.find();
  rc=stack.remove();
  put 'pop' stackdepth= stackvar=;
  stackdepth+-1;

  rc=stack.find();
  rc=stack.remove();
  put 'pop' stackdepth= stackvar=;
  stackdepth+-1;
run;

It outputs this:

pushstackdepth=1 stackvar=a
pushstackdepth=2 stackvar=b
pushstackdepth=3 stackvar=c
popstackdepth=3 stackvar=c
popstackdepth=2 stackvar=b
popstackdepth=1 stackvar=a
pushstackdepth=1 stackvar=x
pushstackdepth=2 stackvar=y
pushstackdepth=3 stackvar=z
popstackdepth=3 stackvar=z
popstackdepth=2 stackvar=y
popstackdepth=1 stackvar=x

By doing this, you can traverse a tree in metadata.

Neptune’s Pride: Things I Learned The Hard Way

I’ve played a few (4 and a half) games of Neptune’s Pride, now. It’s a pretty slow-paced long-term diplomacy/risk-like online flash game. Lots of people have reviewed it, some of them discussing obvious strategy.

The following are nuances regarding the game’s mechanics; things I wish I could have found on some nerd’s blog, but instead had to find out the hard way.

The game has timers on ticks

With a tick every 10 minutes by default. Every tick in the game, jump preps countdown and fleets move. ETAs are reported in approximate units of time because of this.

Payday times shift later over time

Probably because there are 144 ticks in a day, but additional time is used to compute locations before the next delay. This happens because the delay loop PROBABLY does this: compute stuff, wait 600 seconds, repeat… rather than this: check clock, compute stuff, check clock again and find difference in number of seconds, wait 600-minus-that-difference seconds.

Fleets arriving on the same tick fight as one fleet

So if you’re OCD enough, you can have fleets from two different systems timed to arrive on the same tick at a system, and will be considered one fleet.

During a tick, fleets leave before they land

This means that a fleet leaving with an ETA of ~30 minutes will evade a fleet landing in ~30 minutes, however, a fleet leaving with an ETA of ~30 minutes will not evade a fleet landing in ~20 minutes.

Fleet speeds and ETAs are recalculated mid-flight when speed tech advances

This is pretty deadly if you depend on ETAs of fleets to arrive at a system in order to form a defense.

Looping through SAS metadata objects

The following is my attempt at SAS Golf, where in a Data Step, I try to list the associations of a metadata object to the log. This is basically just taking advantage of combining SAS for loops with SAS while loops. I thought it was cool. Usually I find myself doing this with metadata_getnasn, metadata_getnprp, and metadata_getnobj

  • n – index for the nth association
  • u – uri to object
  • a – association
data;
  ...
  do n=1 by 1 while(n

UPDATE: The metadata_getnass function has been renamed to metadata_getnasn

Dramatically Increasing SAS DI Studio performance of SCD Type-2 Loader Transforms

SCD LoaderIn SAS DI Studio 3.4 (and I imagine in future versions), the prepackaged code for the SCD Type-2 Loader works like this: Does the dataset exist? If not, create an empty dataset with structure and indexes as defined from metadata. Then detect differences between it and the source dataset and the target dataset, expire any observations that are modified or deleted by setting their valid-to-date to now, and append any modified or new observations with a valid-from-date. The expire bit is done in-place with a data step modify statement, and the append is done with PROC APPEND. I assume this is done to reduce the amount of locking necessary on the dimension dataset. Because new observations are appended, the dataset never actually gets sorted by the business key, so this could lead to exponential growth over time on the expire bit; every time the transform wants to change a single observation’s valid-to-date, it scans the entire table. And it doesn’t help that compression is off by default.

In the instance below, we had an uncompressed dataset with a modest 220,000 rows. The dataset had a variable “description” defined as a 4000-length string which was usually but not always null. Most steps of the SCD Loader run in a few seconds, but the following is usually the bottleneck. On a run one night to populate a new variable, this data step ran in a little over 9 hours:

     data PRESTAGE.LOAD_SW_ENTITY_RELEASE_X_HOST;
        modify PRESTAGE.LOAD_SW_ENTITY_RELEASE_X_HOST
           work.etls_close
            (rename = (ETLS_KEY = SW_ENTITY_REL_X_HOST_ID
           ETLS_FROMDATE = VALID_FROM_DTTM))
            updatemode=nomissingcheck;
        by SW_ENTITY_REL_X_HOST_ID VALID_FROM_DTTM;
        VALID_TO_DTTM = ETLS_CLSDATE;
        if %sysrc(_SOK) eq _iorc_ then
           replace;
        _iorc_ = 0;
        _error_ = 0;
     run;
NOTE: There were 1 observations read from the data set PRESTAGE.LOAD_SW_ENTITY_RELEASE_X_HOST.
NOTE: The data set PRESTAGE.LOAD_SW_ENTITY_RELEASE_X_HOST has been updated.  There were 45924 observations rewritten, 0
     observations added and 0 observations deleted.
NOTE: There were 45924 observations read from the data set WORK.ETLS_CLOSE.
NOTE: DATA statement used (Total process time):
     real time           9:06:00.46
     cpu time            9:05:58.43

Turning compression on reduced the size of this dataset from 1.1 gigs to 4.5 megs, mostly from compressing the 4000-char string that was usually empty; with compression off, an X-length string always takes up X bytes because it’s faster to seek to a certain observation if all observations are the same size. Additionally, in metadata an index was defined on the two variables used above in the BY statement. I ran this PROC DATASETS statement to create the index by hand (the Loader would create them if the table didn’t exist, but it assumes indexes exist if the table exists).

proc datasets library=prestage nolist;
 modify load_sw_entity_release_x_host;
 index create ndx=(sw_entity_rel_x_host_id valid_from_dttm) / unique;
quit;

With this metadata in place, the SCD Loader uses the index in the datastep and generates the following datastep instead:

     data PRESTAGE.LOAD_SW_ENTITY_RELEASE_X_HOST;
        set work.etls_close(rename = (ETLS_KEY = SW_ENTITY_REL_X_HOST_ID
                                      ETLS_FROMDATE = VALID_FROM_DTTM));
        modify PRESTAGE.LOAD_SW_ENTITY_RELEASE_X_HOST
            key=ndx / unique;
        VALID_TO_DTTM = ETLS_CLSDATE;
        if %sysrc(_SOK) eq _iorc_ then
           replace;
        _iorc_ = 0;
        _error_ = 0;
     run;

By using the index, the runtime was reduced to under two seconds.

NOTE: There were 45924 observations read from the data set WORK.ETLS_CLOSE.
NOTE: The data set PRESTAGE.LOAD_SW_ENTITY_RELEASE_X_HOST has been updated.  There were 45924 observations rewritten, 0
     observations added and 0 observations deleted.
NOTE: DATA statement used (Total process time):
     real time           1.41 seconds
     cpu time            1.21 seconds

Friends in 2009

Lots of differences between now and last year. It’s interesting that now that I’m single, a good percentage of my 10 closest friends are also single. The age distribution is a lot wider now too, I expect this trend to continue as I am no longer meeting friends almost exclusively in school.

Summary of Top 10 Closest Friends at End of 2009

2009_7_virgin

Easy Inserting/Appending Libraries into FMTSEARCH path lists in SAS 9.2

In SAS 9.1.3 and prior, Options list (such as FMTSEARCH for libraries with format catalogs, and SASAUTOS for paths that contain macros shared across jobs) were annoying to work with. If you have nested code that wants to add a library or a path the list, doing so like this could potentially clobber statements executed outside of the nested block:

OPTIONS FMTSEARCH=(mylib.formats);

To really be safe, you want to add your library to the list. Doing this wasn’t very intuitive. I had always done this like this:

%let FMTSEARCH = %substr(%sysfunc(getoption(fmtsearch) ) ,2);
OPTIONS FMTSEARCH = (mylib.formats &FMTSEARCH;

But in 9.2, there are two new System Options, INSERT= and APPEND=. This is now as simple as

OPTIONS INSERT=(FMTSEARCH=mylib.formats);

Counting distinct variables in SQL with SAS

One way to get the count of distinct variables, which works in most flavors of SQL, is to use a subquery. For instance, in Oracle this is:

SELECT count(SELECT DISTINCT foo FROM table) FROM dual

In SAS, using PROC SQL, you can do that too, but you can also simply do this:

SELECT count(distinct foo) FROM table