Top 30 Advanced SAS Interview Questions

The Statistical Analysis System (SAS) dates back to the 1970s. Today, it is one of the most widely used analytics tools across organisations and companies. It is closed-source software that the corporate world relies on extensively to make strategic decisions.

1. What are some functions that the SAS performs?

SAS is a software suite for data management, advanced analytics, and reporting.

Some of the functions performed by SAS include:

1) Project management and data management

2) Statistical analysis

3) Data warehousing

4) Business planning

5) Operational research and decisional support

6) Information retrieval and quality management

2. Explain the use of double trailing @@ in Input statements.

A double trailing @@ at the end of an INPUT statement tells SAS to hold the current record across iterations of the DATA step. The next INPUT statement then continues reading from the same record instead of moving to a new one.
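
As a minimal sketch of this behaviour, the step below reads several observations from each data line. The dataset name scores and the variables name and score are illustrative and do not come from the original text.

```sas
/* Each data line holds several name/score pairs.           */
/* The trailing @@ holds the line across DATA step           */
/* iterations, so every pair becomes its own observation.    */
data scores;
   input name $ score @@;
   datalines;
Ann 90 Bob 85 Cy 77
Dee 88 Eli 92
;
run;

proc print data=scores;
run;
```

Without the @@, SAS would move to a new line on every iteration and read only the first pair from each line.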

3. Name the data types that SAS contains.

SAS has two data types: numeric and character.

4. What are the syntax rules followed in the SAS statements?

SAS programs are written in the Editor window. A program is a series of statements that must follow SAS syntax rules for SAS to interpret them. Some of these rules are:

1) The end of any statement gets marked with a semicolon (;)

2) SAS statements are not case sensitive

3) Extra whitespace before a statement is ignored

4) Multiple statements can appear on a single line, separated by semicolons

5) Comments can be written in two ways:

a) Text beginning with a forward slash and an asterisk (/*) and ending with an asterisk and a forward slash (*/)

b) A statement beginning with an asterisk (*) and ending with a semicolon (;)

5. What do you mean by PDV? State its function.

The Program Data Vector (PDV) is a logical area of memory in which SAS builds a dataset, one observation at a time. Key points about the PDV include:

1) It holds one observation at a time while the dataset is being created

2) It contains two automatic variables, _N_ and _ERROR_; the former counts the iterations of the DATA step and the latter signals that an error occurred during execution

3) The input buffer, which holds data read from an external file, is created during compilation
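
The contents of the PDV can be inspected by writing it to the log. The sketch below is only an illustration: the dataset pdv_demo and its variables are made up, and the second data line is deliberately invalid so that _ERROR_ changes.

```sas
/* PUT _ALL_ writes every variable in the PDV to the log,   */
/* including the automatic variables _N_ and _ERROR_.        */
/* The automatic variables are not written to pdv_demo.      */
data pdv_demo;
   input id amount;
   put _all_;
   datalines;
1 100
2 abc
3 300
;
run;
```

On the second iteration the non-numeric value abc cannot be read into amount, so SAS sets amount to missing and _ERROR_ to 1 for that iteration.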

6. What other SAS features do you use for error trapping and data validation?

1) Conditional statements (IF-THEN/ELSE)

2) PUT statement

3) DEBUG option
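
A small sketch of the first two techniques combined is shown here. The valid range for age (0 to 120), the dataset name checked, and the flag variable are assumptions made for the example.

```sas
data checked;
   input id age;
   /* Conditional check: flag ages outside the assumed valid range */
   if missing(age) or age < 0 or age > 120 then do;
      /* PUT writes the offending values to the SAS log */
      put 'WARNING: suspicious age ' id= age=;
      flag = 1;
   end;
   else flag = 0;
   datalines;
1 34
2 -5
3 .
;
run;
```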

7. What different options in SAS can be used to debug the Macro program?

SAS provides several system options that help debug macro problems. Their output is written automatically to the SAS log. The macro debugging options, in alphabetical order, include:

1) MEMRPT

2) MERROR
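
As a hedged illustration of how such options are switched on and off, the sketch below uses MPRINT, MLOGIC, and SYMBOLGEN. MLOGIC and SYMBOLGEN are not named in the list above; they are standard SAS system options commonly used for the same purpose. The macro show_obs is hypothetical.

```sas
/* Turn on commonly used macro debugging options */
options mprint mlogic symbolgen;

/* Hypothetical macro used only to produce some log output */
%macro show_obs(ds, n);
   proc print data=&ds(obs=&n);
   run;
%mend show_obs;

%show_obs(sashelp.class, 3)

/* Turn the options off again to keep the log short */
options nomprint nomlogic nosymbolgen;
```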

8. What is Autocall Facility?

The autocall facility lets multiple programs use the same SAS macro code by storing each macro in a designated location (an autocall library). It benefits teams of programmers by allowing better consistency and faster updates across all of their programs.
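
A minimal configuration sketch follows. The directory /shared/macros and the macro show_obs are hypothetical; the sketch assumes that directory contains a file show_obs.sas that defines %show_obs.

```sas
/* Hypothetical directory holding one macro definition per file */
filename mymacs '/shared/macros';

/* MAUTOSOURCE enables autocall; SASAUTOS lists where to search. */
/* Keeping the SASAUTOS fileref preserves the default library.   */
options mautosource sasautos=(mymacs sasautos);

/* The first call finds and compiles show_obs.sas automatically */
%show_obs(sashelp.class, 3)
```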

9. What is the use of Macros in SAS?

Macros in SAS are used when we want to run the same code, such as the same PROC step, on several datasets. They let us accomplish repetitive tasks quickly and efficiently, and a macro can be reused as many times as needed.
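
For example, the sketch below wraps one PROC step in a macro and runs it on two datasets. The macro name summarize is made up for illustration; sashelp.class and sashelp.cars are sample datasets shipped with SAS.

```sas
/* Run the same PROC step on any dataset passed in */
%macro summarize(ds);
   proc means data=&ds n mean min max;
   run;
%mend summarize;

%summarize(sashelp.class)
%summarize(sashelp.cars)
```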

10. In how many ways can we create Macro variables in the Global Symbol Table?

There are five ways to create macro variables:

1) %Let

2) %Global and %Local

3) Call Symput

4) Proc SQL into clause

5) Macro Parameters

11. Define all ways to create a macro variable.

%LET: The simplest way to define a macro variable is the %LET statement. It works much like an assignment statement in the DATA step: %LET is followed by the name of the macro variable, an equal sign (=), and the text value to assign. The syntax is %LET macro-variable-name = text-or-text-value;

%GLOBAL and %LOCAL: These statements force macro variables into a specific referencing scope (symbol table). When used, they create the macro variables with null values. For example: %global dsn;

CALL SYMPUT: The SYMPUT call routine assigns a DATA step value to a macro variable. It is not a macro-level statement but a DATA step routine, so it is used inside a DATA step and lets you assign the values of dataset variables directly to macro variables. The syntax is CALL SYMPUT(macro_varname, value);

PROC SQL INTO Clause: PROC SQL can create macro variables by writing directly to the symbol tables. For example, a SELECT with the COUNT function and an INTO clause can place in a macro variable (&CLN) the number of observations that match the WHERE clause.

Macro Parameters: There are two types of macro parameters: positional parameters and keyword (named) parameters. Positional parameters are defined by listing, in the %MACRO statement, the macro variable names that receive the parameter values. Keyword parameters are designated by following the parameter name with an equal sign (=) and, optionally, a default value.
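
The five methods can be sketched together as follows. All macro variable names (dsn, cutoff, nobs, cln) and the macro show are illustrative; sashelp.class is a sample dataset shipped with SAS.

```sas
/* 1) %LET in open code creates a global macro variable */
%let dsn = sashelp.class;

/* 2) %GLOBAL creates a global macro variable with a null value */
%global cutoff;
%let cutoff = 13;

/* 3) CALL SYMPUT assigns a DATA step value to a macro variable */
data _null_;
   set &dsn end=last;
   if last then call symput('nobs', strip(put(_n_, 8.)));
run;

/* 4) PROC SQL INTO clause stores a query result in &cln */
proc sql noprint;
   select count(*) into :cln trimmed
   from &dsn
   where age > &cutoff;
quit;

/* 5) Macro parameters: positional (ds) and keyword (n=) */
%macro show(ds, n=5);
   proc print data=&ds(obs=&n);
   run;
%mend show;

%show(&dsn, n=3)
%put nobs=&nobs cln=&cln;
```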

12. How would you create multiple observations from a single observation?

To create multiple observations from a single input record, use the double trailing @@ in the INPUT statement.

13. What is the difference between %NRSTR and %STR functions?

Both %NRSTR and %STR are macro quoting functions. They hide the normal meaning of special tokens and of logical and comparison operators so that these are treated as constant text. The only difference between them is that %NRSTR also masks the macro triggers (& and %), while %STR does not.
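
The difference can be seen in a short sketch. The macro variable names step, resolved, and text are made up for the example; SYSDATE is an automatic macro variable.

```sas
/* %STR masks the semicolons so they become part of the value */
%let step = %str(proc print; run;);

/* %STR does not mask &, so &sysdate is resolved */
%let resolved = %str(&sysdate is resolved);

/* %NRSTR also masks & and %, so &sysdate stays as text */
%let text = %nrstr(&sysdate is kept as text);

%put step=&step;
%put resolved=&resolved;
%put text=&text;
```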

14. What is the difference between Global and Local Symbol Table?

The global symbol table holds global macro variables, such as those created with %GLOBAL. A global macro variable remains accessible until the session ends, after which it is removed.

A local symbol table holds local macro variables, such as those created with %LOCAL, while the macro is executing. Once the macro finishes processing, its local variables are removed.
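
A minimal sketch of the two scopes is given below. The macro scope_demo and the variables gvar and lvar are hypothetical names, and the sketch assumes no global variable called lvar already exists in the session.

```sas
%macro scope_demo;
   %global gvar;   /* stored in the global symbol table */
   %local lvar;    /* stored in the local symbol table of scope_demo */
   %let gvar = set inside the macro;
   %let lvar = exists only while the macro runs;
   %put Inside: gvar=&gvar lvar=&lvar;
%mend scope_demo;

%scope_demo

/* gvar is still available after the macro has finished */
%put Outside: gvar=&gvar;

/* SYMEXIST returns 1 if the macro variable exists, 0 otherwise */
%put lvar exists: %symexist(lvar);
```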

Advanced SAS Interview Questions for Experienced

If you are a professional or an expert in advanced SAS and looking forward to switching your job, refer to these advanced SAS interview questions for professionals to get a gist of what’s going to come.

15. What is the meaning of Append procedure in SAS?

The APPEND procedure adds the observations from one SAS dataset to the end of another dataset. PROC APPEND does not process the observations in the first (base) dataset.
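
A short sketch of the procedure is shown here; the datasets all_sales and new_sales and their variables are invented for the example.

```sas
/* Base dataset that will receive the new observations */
data work.all_sales;
   input region $ amount;
   datalines;
East 100
West 200
;
run;

/* Dataset whose observations will be appended */
data work.new_sales;
   input region $ amount;
   datalines;
North 150
;
run;

/* Add the observations of new_sales to the end of all_sales */
proc append base=work.all_sales data=work.new_sales;
run;
```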

16. How can you convert the character variable into a numeric variable and vice versa?

In SAS programming, a character variable often has to be converted into a numeric variable and vice versa. To convert a numeric variable to a character variable, use PUT(). PUT() always returns a character value, and the format must match the type of the source variable (a numeric format for a numeric variable). For instance:

char_var= PUT( num_var, 6.);

On the other hand, to convert a character variable to a numeric variable, use INPUT(). The source variable must always be a character variable. For instance:

Num_var= INPUT(char_var,2.0);

17. Which command will you use to sort in the SAS program?

PROC SORT is used to perform sorting, whether on a single variable or on several. When the OUT= option is used, the sorted result is written to a new dataset and the original remains untouched.

The syntax for this command is:

PROC SORT DATA=original OUT=Sorted;
BY variable;
RUN;

Here:
‘Original’ refers to the original dataset
‘Sorted’ refers to the resulting sorted dataset
‘Variable’ refers to the column on which the sort is performed

Sorting can be done in ascending or descending order. Ascending is the default; place the DESCENDING keyword before a variable to sort it in descending order:

PROC SORT DATA=original OUT=Sorted;
BY DESCENDING variable;
RUN;

18. What are some ways to perform a table lookup in SAS programming?

In SAS programming, the table lookup values can get stored in the following ways:

1) Code

2) Dataset

3) Array

4) Format

5) Hash object

The following techniques are used to perform the table lookup in SAS:

1) Merge, join, KEY= Option

2) SELECT/WHEN or IF/THEN statements

3) FORMAT statement, PUT function

4) Array Index Value

5) Hash Object Key Value

Let’s consider an example that shows the Code way to perform the table lookup with the help of IF/THEN statements:
data location;
set myinfo;
if AreaCode='226' then Location='Mumbai, India';
else if AreaCode='212' then Location='Delhi, India';
else Location='Unknown';
run;
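
The same lookup can also be sketched with the Format approach, using PROC FORMAT and the PUT function. The format name $areafmt and the sample rows in myinfo are assumptions made so that the example runs on its own.

```sas
/* Hypothetical input with the same AreaCode variable as above */
data myinfo;
   input AreaCode $;
   datalines;
226
212
999
;
run;

/* Store the lookup values in a character format */
proc format;
   value $areafmt
      '226' = 'Mumbai, India'
      '212' = 'Delhi, India'
      other = 'Unknown';
run;

/* PUT applies the format to perform the lookup */
data location;
   set myinfo;
   Location = put(AreaCode, $areafmt.);
run;
```

Keeping the lookup values in a format means the mapping can be changed in one place without editing the DATA step logic.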

19. What is the purpose of the RETAIN statement?

The purpose of the RETAIN statement in SAS programming is to keep a value once it has been assigned. When the DATA step moves from one iteration to the next, RETAIN tells SAS to keep the values of the named variables instead of resetting them to missing.

To illustrate, the program below uses RETAIN to initialise h to 0 so that its output values count up from 1 across the observations.
data abc;
set xyz;
RETAIN h 0;
h = h + 1;
run;

20. State the difference between SAS functions and SAS procedures.

SAS functions expect their argument values to be supplied across a single observation, that is, from several variables in the same row. SAS procedures, on the other hand, expect one variable value per observation and work down the observations. For instance:
data average ;
set temp ;
avgtemp = mean( of T1-T24 ) ;
run ;

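Here, the arguments of the MEAN function are taken across the observation: the function calculates the average of the T1 to T24 values within one observation. In the steps below, by contrast, PROC MEANS calculates the average of avgtemp down the observations within each month.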

proc sort ;
by month ;
run ;
proc means ;
by month ;
var avgtemp ;
run ;

21. If there is an unsorted dataset, how will you read the last observation to a new dataset?

It is possible to read the last observation into a new dataset by using the END= option of the SET statement. For instance:
data work.calculus;
set work.comp end=last;
If last;
run;

22. What is Symget?

SYMGET is a DATA step function that returns the value of a macro variable to the DATA step during its execution. However, it comes with a few restrictions and is not supported by the CAS engine. The syntax is SYMGET(argument).

SYMGET returns a character value with the maximum length of a DATA step character variable. If SYMGET cannot find the macro variable identified by the argument, it returns a missing value and the program issues a message about an invalid argument.

You can use SYMGET in all SAS language programs because it resolves macro variables when the program executes.
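
A minimal sketch of SYMGET in use is given below; the macro variable region and the DATA step variable r are illustrative names.

```sas
%let region = East;

data _null_;
   /* Declare a length so r is not created at the maximum length */
   length r $8;
   /* The argument is the macro variable name, without the ampersand */
   r = symget('region');
   put r=;
run;
```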

23. What is the use of MPRINT Option in Macros?

The MPRINT option writes to the SAS log each SAS statement that a macro generates. Use MPRINT when you suspect the bug lies in code that was generated in a way you didn’t expect.

For instance, the program below generates a simple DATA step:
%macro second(param);
   %let a = %eval(&param); &a
%mend second;

%macro first(exp);
   data _null_;
      var=%second(&exp);
      put var=;
   run;
%mend first;

options mprint;
%first(1+2)

When you submit these statements with the MPRINT option on, the following lines are written to the SAS log:

MPRINT(FIRST):   DATA _NULL_;
MPRINT(FIRST):   VAR=
MPRINT(SECOND):  3
MPRINT(FIRST):  ;
MPRINT(FIRST):   PUT VAR=;
MPRINT(FIRST):   RUN;

VAR=3

24. What is the difference between SAS views and SAS datasets?

The primary difference between SAS views and SAS datasets is the place where the data values are stored:

1) A view has metadata and instructions to retrieve data but it doesn’t store the data values

2) A dataset comprises both the data values and metadata

| Feature | SAS View | SAS Dataset |
| --- | --- | --- |
| Merge Efficiency | One view can perform a multi-table join. With SAS/CONNECT, a view can join datasets that are stored on different host computers. | Multiple DATA steps are needed to merge datasets by common variables. |
| Disk Space vs Processing Speed | A view doesn’t store the underlying data, which saves disk space but can slow processing. | It stores the full data for faster processing. |
| Data Integrity | Data is dynamic: whenever you refer to a view in a PROC step, the view executes and provides the data values as they currently exist in the underlying data. | The data remains static. |
| Data Preparation | Data is processed in its existing form when the view executes. | Variables can be sorted and indexed before use. |
| Separation of Data from the Consumers’ Data | A view can offer a custom, prepackaged perspective of the underlying data. The view’s query can be altered without changing the data. | A custom perspective may require duplicating the dataset; modifying the data may require replacing the entire dataset. |

25. Define the difference between One to One Merge and Match Merge with an example.

A one-to-one merge uses a MERGE statement without a BY statement and pairs observations by their position. It applies when both datasets are sorted by ID and every observation in one dataset has a corresponding observation in the other. For instance:
data mydata1;
input id class $;
cards;
1 Sa
2 Sd
3 Rd
4 Uj
;
data mydata2;
input id class1 $;
cards;
1 Sac
2 Sdf
3 Rdd
4 Lks
;
data mymerge;
merge mydata1 mydata2;
run;

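However, if the observations don’t correspond one to one, a match merge is used. It adds a BY statement so that observations are combined according to matching values of the BY variable. For example: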
data mydata1;
input id class $;
cards;
1 Sa
2 Sd
2 Sp
3 Rd
4 Uj
;
data mydata2;
input id class1 $;
cards;
1 Sac
2 Sdf
3 Rdd
3 Lks
5 Ujf
;
data mymerge;
merge mydata1 mydata2;
by id;
run;

26. Define the parameter of the scan function.

The syntax of the SCAN function is:

scan(argument,n,delimiters)

Here, argument specifies the character variable or expression to scan, n specifies which word to read, and delimiters lists the characters that separate words, enclosed in quotation marks.
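
As a brief sketch, the step below extracts the second word from a string that uses a comma and a semicolon as delimiters. The variable names text and second are made up for the example.

```sas
data _null_;
   text = 'alpha,beta;gamma';
   /* Read the 2nd word, treating , and ; as delimiters */
   second = scan(text, 2, ',;');
   put second=;   /* writes second=beta to the log */
run;
```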

27. How can you specify both a number of iterations and a condition in a single DO loop?

To combine a fixed number of iterations with a condition, add an UNTIL (or WHILE) clause to an iterative DO statement. The loop ends when the index passes its upper bound or the condition is met, whichever comes first:
data work;
do i=1 to 20 until(Sum>=20000);
Year+1;
Sum+2000;
Sum+Sum*.10;
end;
run;

28. What is the process of deleting duplicate observations in SAS?

Through Nodups in the Procedure
Proc sort data=SAS-Dataset nodups;
by var;
run;

 

Through SQL Query in a Procedure
Proc sql;
create table nodup as
select distinct * from SAS-Dataset;
quit;

 

Through Data Cleaning
data nodup;
set SAS-Dataset; /* must already be sorted by var */
by var;
if first.var; /* keep the first observation for each value of var */
run;

 

29. Can you state the differences between using “+” operator and sum function?

The “+” operator returns a missing value if any of its operands is missing. The SUM function, on the other hand, returns the sum of all the non-missing arguments. For example:
data mydata;
input x y z;
cards;
33 3 3
24 3 4
24 3 4
. 3 2
23 . 3
54 4 .
35 4 2
;
run;
data mydata2;
set mydata;
a=sum(x,y,z);
p=x+y+z;
run;

For the fourth observation (x missing, y=3, z=2), a is 5 while p is missing.

30. Differentiate between SAS and SPSS.

| Features | SAS | SPSS |
| --- | --- | --- |
| User Interface | Highly interactive UI | Moderately interactive UI |
| Decision Making | Works along with Enterprise Miner | Possible to obtain an answer tree |
| Data Management | More advantageous than SPSS | Supports data management |
| Documentation | Huge set of technical documentation | Lack of documentation |