String Handling in C Language
A string in C is a character array terminated by the special character '\0', which marks the end of the string. Unlike some other high-level languages, such as BASIC, C has no built-in string data type. It also has no built-in facilities for manipulating entire arrays (copying them, comparing them, and so on), and therefore very few for manipulating strings. In fact, C's only truly built-in string handling is that it lets us write string constants (also called string literals) in our code. Whenever we write a string enclosed in double quotes, C automatically creates an array of characters containing that string, terminated by the '\0' character.
The last character of a character array that holds a string must be '\0' (called the null character, a character with the value 0) because most code that manipulates strings expects it. For example, printf uses the '\0' to detect the end of a character array when printing it with '%s'.
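The way library functions rely on the '\0' can be sketched with a small loop. This is only an illustration; the helper name count_chars is hypothetical and not part of the standard library.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters that come before the
   terminating '\0', scanning the array one element at a time. */
size_t count_chars(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')   /* stop as soon as the null character is seen */
        n++;
    return n;
}
```

Called as count_chars("kangaroo"), this sketch returns 8. If the array passed in had no '\0', the loop would run past the end of the array, which is exactly why the terminator matters.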
There are two ways to represent a string (of characters) in C:
· As an array of type char (as in char str[10] or char str[])
· As a pointer to type char (as in char *str)
Neither of these approaches alone provides a complete solution to the representation of strings in C, so in practice elements of both are often used together.
String as an array of type char
When a string is represented as an array of type char, the array must have room for the terminating character '\0' as well. For example, if a character array is defined as
char str[10];
then the longest string it can hold is 9 characters, since every string is terminated by the special character '\0'. This character is appended automatically (and implicitly) to string literals in most contexts, but if it is missing (or gets overwritten), you may see unpredictable results. For example, consider the following code:
char str1[10] = "kangaroo";
char str2[10] = "australian";
The first string, str1, contains 8 characters, so the ninth element of the array is automatically set to '\0' (as is the unused tenth element), as shown below:
k | a | n | g | a | r | o | o | \0 | \0 |
But the second string, str2, contains 10 characters, which fill the array completely, so there is no room left for the terminating '\0' and it is lost, as shown below:
a | u | s | t | r | a | l | i | a | n |
Any attempt to copy or display this string therefore causes unpredictable results. The string has no defined end, so the operation continues past "australian" through the memory locations that follow the array until a '\0' happens to be encountered.
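One way to guard against this is to check that a '\0' really lies inside the array before treating it as a string. The following is a minimal sketch; the helper name is_terminated is hypothetical, and it relies on the standard memchr function from string.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a '\0' occurs within the first
   size bytes of buf, and 0 if the array holds no terminator. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

For a 10-element array holding "kangaroo" the sketch returns 1; for a 10-element array completely filled by the letters of "australian" it returns 0.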
Initializing char array strings
To initialize a character array (or string) with a string literal (or string constant), you must do so in the declaration itself, in a single statement, and not in two statements (i.e., one statement declaring the character array and a second assigning to it). For example,
char str[] = "This is ok";   /* declaring & initializing at the same time is valid */
It is valid to declare a character array (string) first, but it is invalid to then assign a string literal to it:
char str[11];         /* declaring a character array (or string), */
str = "This is ok";   /* and then trying to assign to it is invalid */
This is invalid because an array name is not a modifiable variable, so it cannot appear on the left-hand side of an assignment. So, first declaring an array and then assigning it the string constant "This is ok" is illegal. However, we can fill in a character array after its declaration in another way: by assigning one character at a time. For example,
char str[6];     /* 5 characters + '\0' */
str[0] = 'R';    /* OK, 1st character of str set to R */
str[1] = 'a';    /* OK, 2nd character of str set to a */
str[2] = 'm';    /* OK, 3rd character of str set to m */
str[3] = 'a';    /* OK, 4th character of str set to a */
str[4] = 'n';    /* OK, 5th character of str set to n */
str[5] = '\0';   /* OK, terminating '\0' makes it a string */
strcpy and strncpy
We have just seen that we cannot copy one char array (string) into another by assignment, as shown below:
char str1[] = "string1";
char str2[8];
str2 = str1;   /* invalid! */
However, C provides two functions, declared in the header file string.h, to do this job. Their syntax is:
strcpy(string1, string2); : copies the contents of string2 to string1, including the terminating '\0'
strncpy(string1, string2, n); : copies at most n characters of string2 to string1; no '\0' is written if string2 has n or more characters
We see that the strncpy function takes one more parameter than the strcpy function: the maximum number of characters to be copied from one string to the other. This makes copying a little safer, since we can avoid putting too many characters into the array. However, strncpy does not write a terminating '\0' when string2 has n or more characters, so care must be taken to reserve the last character position for it and, if required, to add it explicitly. For example,
char str1[] = "Avinash";
char str2[8];
strncpy(str2, str1, 7);
str2[7] = '\0';
The following program illustrates the use of strncpy in safely copying a string from one array to another. Remember that a character array is treated as a string only if it is terminated with '\0'.
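#include <stdio.h>
#include <string.h>

int main(void)
{
    char str1[25];
    char str2[16];

    printf("Enter a string (up to 24 characters) :");
    /* fgets is used instead of gets, because gets cannot limit the input length */
    if (fgets(str1, sizeof str1, stdin) == NULL)
        return 1;
    str1[strcspn(str1, "\n")] = '\0';   /* remove the newline that fgets keeps */
    puts(str1);
    strncpy(str2, str1, 15);   /* strncpy will truncate any string
                                  longer than 15 characters */
    str2[15] = '\0';           /* the last character (i.e. 16th character)
                                  will be the terminating '\0' */
    puts(str2);
    return 0;
}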
Output :
Enter a string (up to 24 characters): I am going deeper in C.
I am going deeper in C.
I am going deep
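The pattern of calling strncpy and then storing the '\0' by hand can be wrapped in a small function so that the two steps are never separated. This is a sketch only; the name copy_truncated and its parameter order are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst, which has room for dstsize
   bytes, truncating if necessary and always terminating the result. */
void copy_truncated(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return;                       /* no room even for the '\0' */
    strncpy(dst, src, dstsize - 1);   /* copy at most dstsize-1 characters */
    dst[dstsize - 1] = '\0';          /* last position is reserved for '\0' */
}
```

With a 16-element destination, copy_truncated(str2, sizeof str2, "I am going deeper in C.") leaves "I am going deep" in str2, matching the output of the program above.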
When we build a string ourselves, one character at a time, we have to put in our own '\0' at the end of the string. If we want to print the line with printf, it's necessary. This code reads a line and prints the number of characters read in front of the line:
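#include <stdio.h>

int main(void)
{
    int i = 0;
    int ch;
    char line[80];

    /* read characters up to and including the newline,
       leaving room in the array for the terminating '\0' */
    while (i < 79 && (ch = getchar()) != EOF) {
        line[i++] = (char)ch;
        if (ch == '\n')
            break;
    }
    line[i] = '\0';
    printf("%d : \t%s", i, line);
    return 0;
}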
Here we increment i in the subscript itself, but only after its previous value has been used: the character is read, placed in line[i], and only then is i incremented.
String as a pointer to type char
From the above discussion, it is clear that using arrays of type char to represent strings has drawbacks: arrays cannot be assigned as a whole, and so cannot be used for some purposes (e.g., an array cannot be the return type of a function). Another way of representing strings is to use a pointer that refers to a string of characters. To point to a string, the pointer must be of type pointer to char, as shown below:
char *str;   /* now, str is a pointer to a string of characters */
The above declaration declares a pointer to char called str, which is able to point to the first character of a string of characters. It does not yet point to any string; it must first be given the address of one.
The benefit of using this approach for representing strings is that we can manipulate the pointer using the arithmetic that applies to pointers. For example, we can use a char pointer (string) to refer to an array of characters quite easily, as shown below:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char str1[10] = "Education";   /* string represented by a char array str1 */
    char *str2;                    /* string represented by char pointer str2 */

    str2 = str1;   /* ok, put address of str1 in str2, so
                      that now str2 can refer to str1 */
    puts(str2);    /* displays Education */
    return 0;
}
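Pointer arithmetic on a char pointer can be sketched as follows. The function below advances a pointer past any leading spaces and returns the new position; the name skip_leading_spaces is hypothetical and chosen only for this illustration.

```c
/* Hypothetical helper: returns a pointer to the first character of str
   that is not a space, using pointer arithmetic to step forward. */
const char *skip_leading_spaces(const char *str)
{
    while (*str == ' ')   /* the loop stops at '\0' too, since '\0' != ' ' */
        str++;            /* move the pointer to the next character */
    return str;
}
```

No characters are copied here. The returned pointer refers to a later position inside the same array, which is something an array name by itself cannot do.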
Thus, with the pointer representation of strings, we can use the assignment operator at any time after declaration to make one char pointer equal to another. The following example makes things clearer:
int main(void)
{
    char str1[10] = "Education";   /* string represented by a char array str1 */
    char str2[10] = "Literacy";    /* string represented by a char array str2 */
    char *str3 = "Institute";      /* string represented by char pointer str3 */
    char *str4 = "Schooling";      /* string represented by char pointer str4 */

    /* str2 = str1; */     /* wrong! Won't compile. Cannot assign to char array. */
    str4 = str3;           /* valid. Will compile. Both are char pointers. */
    str4 = "Graduation";   /* valid. Will compile. */
    return 0;
}
No strong medicine comes without its side effects! There is a potential problem with the char pointer representation of strings too. Using the assignment operator to make two char pointers equal copies only the address, so both pointers end up pointing to the same memory location (this is known as shallow copying).
int main(void)
{
    char *str1 = "string1";   /* declare 1st string with initial value */
    char *str2 = "string2";   /* declare 2nd string with initial value */

    str1 = str2;   /* assign str1 equal to str2 */
    return 0;
}
Thus, after assigning one char pointer to another, both pointers point to the same string, so a change made through one is visible through the other. This is not safe, and can lead to strange behavior by the program in many cases (e.g., when the memory is allocated dynamically and then freed through one of the pointers). So, for safe storage of strings, char arrays are the more reliable option.
Q. Can't we use the strcpy (or strncpy) function with char pointers (strings) to copy the data (i.e., to make a deep copy) and not the addresses?
It is possible to use the strcpy (or strncpy) function with char pointers to make a deep copy (i.e., to copy the data of one string to another string, and not the address of one string to the other), provided the destination pointer points to writable memory that is large enough. A string literal must not be written to, so in the example below str1 points to an array rather than to a literal:
char buffer[] = "Avinash";
char *str1 = buffer;
char *str2 = "Vikas";
strcpy(str1, str2);
Now, the situation of str1 and str2 is shown below:
str1 -> V | i | k | a | s | \0 |
str2 -> V | i | k | a | s | \0 |
After using strcpy(), both the char pointers (strings) have the same data, even though they point to different addresses.
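A deep copy through char pointers is often made by first allocating fresh memory for the destination. The following is a minimal sketch under that assumption; the name duplicate_string is hypothetical, and the caller is assumed to release the copy with free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated deep copy of src,
   or NULL if the memory cannot be allocated. */
char *duplicate_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);

    if (copy != NULL)
        memcpy(copy, src, size);     /* copies the characters and the '\0' */
    return copy;
}
```

Because the copy lives in its own memory, changing it does not affect the original string, and the original pointer can later be reassigned without disturbing the copy.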
But the use of strcpy can still lead to problems with char pointers (strings), because strcpy does not check the size of the destination: it simply overwrites the memory locations starting at the destination address. This can create a problem when we declare a char pointer to point to a string of one length, and then
· Copy a longer string to it, or
· Try to copy a string to a char pointer that has been declared but not made to point to any storage.
The following program, for example, may or may not give the desired output, since its behavior is undefined:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char *str1 = "Ram";
    char *str2 = "Kumar";
    char *str3;

    strcpy(str1, str2);   /* wrong! str2 is longer than str1, and str1
                             points to a string literal */
    puts(str2);           /* as a result of strcpy (str1, str2), str2 may
                             print an unpredictable string */
    strcpy(str3, str2);   /* wrong! str3 does not point anywhere yet, so
                             the result is unpredictable */
    return 0;
}
This is so because the first strcpy copies the 6 bytes of "Kumar" (5 characters plus '\0') to the place where "Ram" is stored, which has room for only 4 bytes and, being a string literal, must not be modified at all. The bytes that do not fit overwrite whatever follows "Ram" in memory, which may even be the string "Kumar" itself. The second strcpy copies str2 to the char pointer str3, which was never initialized, so the characters are written starting at whatever address str3 happens to contain. In both cases the program may crash, print garbage, or appear to work, because what lies in those memory locations is unpredictable. The situation after running the above program can be pictured as follows:
str1 -> K | u | m | a | r | \0 |   (6 bytes written where only 4 were reserved)
str3 -> ? | ? | ? | ? | ? | ? |   (unknown location)
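Both problems can be avoided by copying into an array of known size and checking the length first. This is a sketch under those assumptions; the helper name copy_if_fits and its return convention (0 for success, -1 for failure) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only when it fits in
   dstsize bytes, including the '\0'. Returns 0 on success and -1
   otherwise, in which case dst is left unchanged. */
int copy_if_fits(char *dst, size_t dstsize, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus the '\0' */

    if (dst == NULL || needed > dstsize)
        return -1;                     /* refuse rather than overflow */
    memcpy(dst, src, needed);
    return 0;
}
```

With char name[4] = "Ram";, the call copy_if_fits(name, sizeof name, "Kumar") returns -1 and leaves "Ram" untouched, instead of writing past the end of the array.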
What are the similarities and differences between strings represented by char arrays and char pointers?
The similarities and differences between the character array and character pointer representations of strings are:
· With strings, the definitions char str[] = "My string"; and char *str1 = "My string"; have a similar effect. In both cases, a string is created and its starting address is what str or str1 yields in an expression. (The array str holds its own modifiable copy of the characters, whereas str1 points to the string literal itself, which must not be modified.)
· Each individual character, in both representations of the string, can be accessed by either of the following expressions:
- str[index] or str1[index]
- *(str+index) or *(str1+index)
· We cannot assign a value to a character array string after it is declared.
char str[25];                   /* declaring a character array string, and */
str = "This can not be done";   /* then assigning it a value, is invalid. */
We can initialize it only at the time of declaration, as shown below:
char str[25] = "This can be done";   /* declaring & initializing in a single shot is valid */
On the other hand, a string represented as a char pointer can be assigned a value at the time of declaration as well as after being declared. For example,
char *str1;                       /* declaring a character pointer string, and */
str1 = "This can also be done";   /* then assigning it a value, is also valid. */
· We cannot assign one char array string to another to copy the contents of one to the other. For example,
char str1[] = "First";
char str2[6];
str2 = str1;   /* invalid, cannot copy one char array string into another */
But we can assign one char pointer string to another. This copies only the address, not the contents, so both pointers then refer to the same string. For example,
char *str1 = "First";
char *str2;
str2 = str1;   /* valid, now str2 also points to "First" */
· The most significant difference between the string representations char str[] and char *str1 is that the latter is a pointer variable, and so we can modify it. So, str1=str and str1++ are both legal. But str=str1 and str++ are illegal, because str is an array and not a pointer. When we write str in an expression, we get the starting address of the array, but str is not a modifiable variable, and therefore we cannot write str=str1 or str++.
· Char arrays are well suited to storing strings, since each array owns its characters, but they cannot be assigned or returned as a whole. A char pointer is well suited to manipulating strings, but can lead to unpredictable results in contexts where one char pointer is copied to another.
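The equivalence of the subscript form and the pointer form of access can be seen in a short sketch. The two function names below are hypothetical; each accepts either a char array or a char pointer, because an array name yields the address of its first character when passed to a function.

```c
#include <stddef.h>

/* Hypothetical helper: reads one character using the subscript form. */
char char_by_subscript(const char *str, size_t index)
{
    return str[index];
}

/* Hypothetical helper: reads the same character using pointer arithmetic. */
char char_by_pointer(const char *str, size_t index)
{
    return *(str + index);
}
```

For any valid index, both functions return the same character, whether the argument was declared as char str[] or as char *str1.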
How do we perform operations on strings?
The standard library provides a variety of string-handling functions, declared in the header file string.h, that are readily available for our use. Some of these library functions are:
1. strlen()
Syntax : size_t strlen (const char *string) ;
Finds the length of the string given by string. The number of characters before the terminating character '\0' is returned. The string may be a string constant or a string variable. size_t is an unsigned integer type whose width depends on the system: it may be equivalent to unsigned int on one system and to unsigned long on another.
2. strchr()
Syntax : char *strchr (const char *string, int ch) ;
Returns a pointer to the first occurrence of ch in string. Returns NULL if ch is not in string.
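Since strchr returns a pointer into the string, it can be called repeatedly, each time starting just after the previous match. The following sketch counts the occurrences of a character in this way; the name count_char is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how many times ch occurs in string. */
size_t count_char(const char *string, int ch)
{
    size_t count = 0;
    const char *p;

    if (ch == '\0')
        return 0;   /* the terminator is not counted as a character */
    p = strchr(string, ch);
    while (p != NULL) {
        count++;
        p = strchr(p + 1, ch);   /* continue after the previous match */
    }
    return count;
}
```

For example, count_char("australian", 'a') returns 3.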
3. strcpy()
Syntax : char *strcpy (char *string1, const char *string2) ;
Copies string2 to string1, where string2 may be a string constant or a string variable. This function effectively assigns one string to another string. The characters in string2 are copied into string1 up to and including the '\0'. The programmer must ensure that string1 points to enough space to hold the result. It returns string1.
4. strncpy()
Syntax : char *strncpy (char *string1, const char *string2, size_t n) ;
Copies at most n characters of string2 into string1, where string2 may be a string constant or a string variable. It returns string1. Also,
If n <= strlen (string2), then exactly n characters are copied and no terminating '\0' is written, but
If n > strlen (string2), then all of string2 is copied and the rest of the n positions in string1 are filled with '\0'.
5. strcmp()
Syntax : int strcmp (const char *string1, const char *string2) ;
This function takes 2 strings (string1 and string2) as its arguments and compares string1 with string2, where either of these strings may be a string constant or a string variable. It returns an integer value, depending on the relative order of the two strings, as follows:
- a negative value, if string1 is alphabetically less than string2
- a positive value, if string1 is alphabetically greater than string2
- a zero value, if string1 is identical to string2
The comparison is made character by character using character codes, so in ASCII every uppercase letter sorts before every lowercase letter.
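A common use of the sign returned by strcmp is as the ordering rule for sorting. The sketch below sorts an array of char pointers with the standard qsort function; the names compare_strings and sort_strings are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: each element of the array being
   sorted is a pointer to char, so a and b point to such pointers. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);   /* negative, zero, or positive */
}

/* Hypothetical helper: sorts count strings into ascending order. */
void sort_strings(const char *strings[], size_t count)
{
    qsort(strings, count, sizeof strings[0], compare_strings);
}
```

Given the array {"Vikas", "Avinash", "Ram"}, the sketch rearranges the pointers into the order "Avinash", "Ram", "Vikas". Only the pointers are moved; the characters themselves stay where they are.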
6. strncmp()
Syntax : int strncmp (const char *string1, const char *string2, size_t n) ;
This function takes 2 strings (string1 and string2) and an integer (n) as its arguments and compares at most the first n characters of string1 with the first n characters of string2, where either of these strings may be a string constant or a string variable. It returns an integer value, which is negative, positive, or zero, depending on whether the first substring is alphabetically less than, greater than, or identical to the second substring.
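One typical use of strncmp is to test whether a string begins with a given prefix. This is a minimal sketch; the name has_prefix is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if string starts with prefix,
   and 0 otherwise. */
int has_prefix(const char *string, const char *prefix)
{
    /* compare only as many characters as the prefix contains */
    return strncmp(string, prefix, strlen(prefix)) == 0;
}
```

Here has_prefix("Education", "Edu") returns 1, while has_prefix("Education", "edu") returns 0 because the comparison is case sensitive.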
7. strcat()
Syntax : char *strcat (char *string1, const char *string2) ;
This function takes 2 strings (string1 and string2) as arguments and concatenates string2 to string1, that is, appends string2 at the end of string1. It returns string1. The programmer must ensure that string1 points to enough space to hold the result.
8. strncat()
Syntax : char *strncat (char *string1, const char *string2, size_t n) ;
This function takes 2 strings (string1 and string2) and an integer (n) as arguments, and appends at most the first n characters of string2 to string1, followed by a terminating '\0'. It returns string1. If n >= strlen (string2), this function has the same effect as strcat (string1, string2).
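Because strncat limits the number of characters appended, it can be used to keep a concatenation within the bounds of the destination array. The sketch below assumes that dst already holds a terminated string and that dstsize is the full size of the array; the name append_bounded and its return convention are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends as much of src to dst as fits in an
   array of dstsize bytes. Returns 0 if all of src was appended and
   1 if it had to be truncated. */
int append_bounded(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);
    size_t room;

    if (used >= dstsize)
        return 1;                  /* no space left, not even for '\0' */
    room = dstsize - used - 1;     /* keep one byte for the '\0' */
    strncat(dst, src, room);       /* strncat always writes the '\0' */
    return strlen(src) > room;
}
```

With char buf[10] = "Ram";, the call append_bounded(buf, sizeof buf, " Kumar") produces "Ram Kumar" and returns 0. A further call that tries to append "!" returns 1 and leaves the string unchanged, because the array is full.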
9. strstr()
Syntax : char *strstr (const char *string1, const char *string2) ;
Returns the address of the first occurrence of string2 as a substring of string1. Returns NULL if string2 is not in string1.
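Like strchr, strstr can be called in a loop to find every occurrence of a substring. The following sketch counts non-overlapping occurrences; the name count_occurrences is hypothetical, and treating an empty search string as zero matches is a choice made here for simplicity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the non-overlapping occurrences of
   string2 inside string1. */
size_t count_occurrences(const char *string1, const char *string2)
{
    size_t count = 0;
    size_t len = strlen(string2);
    const char *p;

    if (len == 0)
        return 0;   /* an empty substring would match everywhere */
    for (p = strstr(string1, string2); p != NULL;
         p = strstr(p + len, string2))   /* resume after the match */
        count++;
    return count;
}
```

For example, count_occurrences("banana", "an") returns 2.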
10. strtok()
Syntax : char *strtok (char *string1, const char *string2) ;
Splits string1 into tokens delimited by the characters found in string2. The initial call strtok (string1, string2) returns a pointer to the first token; each successive call strtok (NULL, string2) returns a pointer to the next token found in string1, or NULL when no tokens remain. These calls change string1, replacing the delimiter that ends each token with the NUL character '\0'.
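The calling sequence for strtok can be sketched as a loop that collects the tokens into an array of pointers. The name split_tokens and its parameters are hypothetical. Since strtok modifies the string it scans, string1 must be a modifiable char array and not a string literal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits string1 at any of the characters in
   delims, stores up to max_tokens token pointers in tokens, and
   returns the number of tokens stored. */
size_t split_tokens(char *string1, const char *delims,
                    char *tokens[], size_t max_tokens)
{
    size_t count = 0;
    char *tok = strtok(string1, delims);   /* initial call */

    while (tok != NULL && count < max_tokens) {
        tokens[count++] = tok;
        tok = strtok(NULL, delims);        /* successive calls */
    }
    return count;
}
```

Given char text[] = "I am going deeper"; and a space as the delimiter, the sketch stores pointers to "I", "am", "going" and "deeper" and returns 4.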
11. memcpy()
Syntax : void *memcpy (void *string1, const void *string2, size_t n) ;
Copies the n bytes of memory beginning at string2 into the memory beginning at string1, and returns string1. The two memory areas must not overlap.
12. memmove()
Syntax : void *memmove (void *string1, const void *string2, size_t n) ;
Same as memcpy() except that the two memory areas may overlap.
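Overlapping areas arise naturally when characters are shifted within a single string. The sketch below removes the first n characters of a string in place, which is safe with memmove; the name drop_prefix is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes the first n characters of string by
   shifting the remainder to the front. Source and destination overlap,
   so memmove is used rather than memcpy. */
void drop_prefix(char *string, size_t n)
{
    size_t len = strlen(string);

    if (n > len)
        n = len;                               /* cannot drop more than exists */
    memmove(string, string + n, len - n + 1);  /* +1 moves the '\0' too */
}
```

With char s[] = "Education";, the call drop_prefix(s, 3) leaves "cation" in s.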
13. memcmp()
Syntax : int memcmp (const void *string1, const void *string2, size_t n) ;
Compares the n bytes of memory beginning at string1 with the n bytes of memory beginning at string2. Returns a negative, zero, or positive integer according to whether the first block is less than, equal to, or greater than the second block, comparing byte by byte as unsigned char values.
14. memchr()
Syntax : void *memchr (const void *string, int ch, size_t n) ;
Searches the n bytes of memory beginning at string for the character ch. If ch is found, the address of its first occurrence is returned; otherwise NULL is returned.