You can use the SUBSTR function in SAS to extract a portion of a string.
This function uses the following basic syntax:
SUBSTR(Source, Position, N)
where:
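- Source: The string to extract from
- Position: The starting position to read (the first character is position 1)
- N: The number of characters to read (optional; if omitted, SUBSTR reads through the end of the string)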
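The third argument can also be left out, in which case SUBSTR returns everything from the starting position to the end of the string. The following is a minimal sketch using a hard-coded value rather than a dataset; the variable name rest is only illustrative:

```sas
/* sketch: SUBSTR with and without the third argument */
data _null_;
    team = 'Warriors';
    first_four = substr(team, 1, 4);   /* start at position 1, read 4 characters */
    rest       = substr(team, 4);      /* N omitted: read from position 4 to the end */
    put first_four= rest=;             /* writes first_four=Warr rest=riors to the log */
run;
```

Using data _null_ with a put statement writes the values to the log without creating a dataset, which is handy for quickly checking what a given set of arguments returns.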
Here are four common ways to use this function:
Method 1: Extract First N Characters from String
data new_data;
set original_data;
first_four = substr(string_variable, 1, 4);
run;
Method 2: Extract Characters in Specific Position Range from String
data new_data;
set original_data;
two_through_five = substr(string_variable, 2, 4);
run;
Method 3: Extract Last N Characters from String
data new_data;
set original_data;
last_three = substr(string_variable, length(string_variable)-2, 3);
run;
Method 4: Create New Variable if Characters Exist in String
data new_data;
set original_data;
if substr(string_variable, 1, 4) = 'some' then new_var = 'Yes';
else new_var = 'No';
run;
The following examples show how to use each method with this dataset in SAS:
/*create dataset*/
data original_data;
input team $1-10;
datalines;
Warriors
Wizards
Rockets
Celtics
Thunder
;
run;
/*view dataset*/
proc print data=original_data;
run;
Example 1: Extract First N Characters from String
The following code shows how to extract the first 4 characters from the team variable:
/*create new dataset*/
data new_data;
set original_data;
first_four = substr(team, 1, 4);
run;
/*view new dataset*/
proc print data=new_data;
run;
Notice that the first_four variable contains the first four characters of the team variable.
Example 2: Extract Characters in Specific Position Range from String
The following code shows how to extract the characters in positions 2 through 5 from the team variable:
/*create new dataset*/
data new_data;
set original_data;
two_through_five = substr(team, 2, 4);
run;
/*view new dataset*/
proc print data=new_data;
run;
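Note that the third argument of SUBSTR is a count of characters, not an ending position. To read from one position through another, the count is end - start + 1, which is why positions 2 through 5 use a value of 4. The following is a minimal sketch of that arithmetic; start_pos and end_pos are hypothetical names used only for illustration:

```sas
/* sketch: convert a start/end position range into SUBSTR arguments */
data _null_;
    team = 'Warriors';
    start_pos = 2;   /* hypothetical: first position wanted */
    end_pos   = 5;   /* hypothetical: last position wanted */
    two_through_five = substr(team, start_pos, end_pos - start_pos + 1);
    put two_through_five=;   /* writes two_through_five=arri to the log */
run;
```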
Example 3: Extract Last N Characters from String
The following code shows how to extract the last 3 characters from the team variable:
/*create new dataset*/
data new_data;
set original_data;
last_three = substr(team, length(team)-2, 3);
run;
/*view new dataset*/
proc print data=new_data;
run;
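This approach works because the LENGTH function ignores trailing blanks, so the starting position is counted from the last non-blank character. One caveat is that a value shorter than three characters would produce a starting position below 1, which SUBSTR treats as an invalid argument. The following is one possible guard, sketched with a hypothetical short value 'Al' and the illustrative helper variables n, len, and k:

```sas
/* sketch: extract the last n characters, even when a value is shorter than n */
data _null_;
    length team $10;
    n = 3;                        /* number of trailing characters wanted */
    do team = 'Warriors', 'Al';   /* 'Al' is a made-up short value */
        len = length(team);       /* length without trailing blanks */
        k = min(n, len);          /* never ask for more characters than exist */
        last_n = substr(team, len - k + 1, k);
        put team= last_n=;
    end;
run;
```

For 'Warriors' this should write last_n=ors to the log, and for 'Al' it should write last_n=Al instead of triggering an invalid argument note.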
Example 4: Create New Variable if Characters Exist in String
The following code shows how to create a new variable called W_Team that takes a value of 'Yes' if the first character in the team name is 'W' or a value of 'No' if the first character is not 'W':
/*create new dataset*/
data new_data;
set original_data;
if substr(team, 1, 1) = 'W' then W_Team = 'Yes';
else W_Team = 'No';
run;
/*view new dataset*/
proc print data=new_data;
run;
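One detail worth knowing with this pattern is that SAS sets the length of a new character variable from its first assignment in the code. Here 'Yes' comes first, so W_Team gets a length of 3 and 'No' fits. If the branches were written in the opposite order, W_Team would get a length of 2 and 'Yes' would be stored as 'Ye'. The following is a minimal sketch that avoids relying on assignment order by declaring the length up front; it uses hard-coded values rather than the dataset above:

```sas
/* sketch: declare the length of the flag variable before assigning it */
data _null_;
    length team $10 W_Team $3;    /* reserve 3 characters for W_Team */
    do team = 'Warriors', 'Rockets';
        if substr(team, 1, 1) ne 'W' then W_Team = 'No';   /* shorter value assigned first */
        else W_Team = 'Yes';
        put team= W_Team=;
    end;
run;
```

With the LENGTH statement in place, the log should show W_Team=Yes for Warriors and W_Team=No for Rockets regardless of which branch appears first.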
Additional Resources
The following tutorials explain how to perform other common tasks in SAS:
How to Normalize Data in SAS
How to Replace Characters in a String in SAS
How to Replace Missing Values with Zero in SAS
How to Remove Duplicates in SAS