Wednesday, May 28, 2008

Unix: sed to change characters at a particular position

Suppose you have a big file and want to replace characters 100-110 of each line with some text, or you simply want to swap the first 4 characters with the next 4 characters on each line. In these kinds of scenarios, sed one-liners are very useful.

See a few examples below:

$cat a.tmp
20090918ARPITH2010011634
20090918ARPITH2010012050
20090905ARPITH2010011382
20090824ARPITH2010012075
20090921ARPITH2010012075
--The following sed command swaps the first 4 digits with the next 4 digits.
$sed 's:\([0-9]\{4\}\)\([0-9]\{4\}\):\2\1:' a.tmp
09182009ARPITH2010011634
09182009ARPITH2010012050
09052009ARPITH2010011382
08242009ARPITH2010012075
09212009ARPITH2010012075
Explanation:
\([0-9]\{4\}\) Matches four consecutive digits and captures them as a group. To match any 4 characters, not only digits, replace [0-9] with . like \(.\{4\}\)
\ The backslash before ( ) { and } gives them their special meaning in sed's basic regular expressions. Without it they match literally.
\( \) Define a group. Each group can be reused in the replacement as \1, \2, etc., numbered in order of occurrence.
[0-9] Matches a single digit.
\{4\} Matches exactly four occurrences of the preceding item, here [0-9].

\2\1 Outputs the second group first and then the first group.
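
As a small illustration of the points above, the following sketch swaps two 4-character fields that are not digits only. The sample line and the variable names are made up for this example. It also adds a ^ anchor, which the commands in this post do not use: without it, sed applies the substitution at the first place on the line where the pattern matches, which need not be the start of the line.

```shell
# Sample record invented for this illustration.
line='ABCD1234rest'

# ^ pins the match to the start of the line; . matches any character.
swapped=$(printf '%s\n' "$line" | sed 's:^\(.\{4\}\)\(.\{4\}\):\2\1:')

printf '%s\n' "$swapped"   # prints 1234ABCDrest
```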

The following sed command swaps the first 8 characters with the next 6 characters. It also separates the sets with spaces.
$sed 's:\(.\{8\}\)\(.\{6\}\):\2 \1 :' a.tmp
ARPITH 20090918 2010011634
ARPITH 20090918 2010012050
ARPITH 20090905 2010011382
ARPITH 20090824 2010012075
ARPITH 20090921 2010012075

The following command replaces characters 9 to 14 with the text "NIRAVB":
$sed 's:\(.\{8\}\)\(.\{6\}\):\1NIRAVB:' a.tmp
20090918NIRAVB2010011634
20090918NIRAVB2010012050
20090905NIRAVB2010011382
20090824NIRAVB2010012075
20090921NIRAVB2010012075
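
The same idea can be wrapped in a small shell function so that the position and length are not hard-coded. This is only a sketch: replace_cols is a hypothetical name, not a standard tool, and it assumes the replacement text contains no characters that are special to sed here (such as :, & or \).

```shell
# replace_cols START LEN TEXT  (hypothetical helper)
# Reads lines on stdin and replaces LEN characters beginning at 1-based
# column START with TEXT. Assumes TEXT contains no :, & or \ characters.
replace_cols() (
    start=$1
    len=$2
    text=$3
    skip=$((start - 1))          # characters to keep before the field
    sed "s:^\(.\{$skip\}\).\{$len\}:\1$text:"
)

result=$(printf '%s\n' '20090918ARPITH2010011634' | replace_cols 9 6 NIRAVB)
printf '%s\n' "$result"   # prints 20090918NIRAVB2010011634
```

The function body runs in a subshell (parentheses instead of braces) so its variables do not leak into the calling shell. Lines shorter than START + LEN - 1 characters do not match and pass through unchanged.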

Limitation: On some non-GNU sed implementations, if you specify a number greater than 255 in the curly braces "\{ \}", you will get the following error.
$sed 's:\(.\{256\}\)\(.\{6\}\):\1NIRAVB:' a.tmp
sed: Function s:\(.\{256\}\)\(.\{6\}\):\1NIRAVB: cannot be parsed.
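
To find out whether the sed on a given machine has this limit, you can probe it with a harmless one-line input. This is a sketch: the variable name big_interval is made up here, and the check only tells you whether a count of 256 is accepted, not what the exact maximum is.

```shell
# Try a repetition count of 256 on a dummy line and record whether sed
# accepts the expression. The input never matches; only the exit status matters.
if printf 'x\n' | sed 's:.\{256\}:y:' >/dev/null 2>&1; then
    big_interval=supported
else
    big_interval=unsupported
fi

printf 'counts above 255: %s\n' "$big_interval"
```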

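Solution: To work around the limit, split the count into pieces of at most 255 each, for example 335 as 255 + 80, as shown in the following example. Here the first 335 characters are kept and the next 4 (characters 336 to 339) are replaced with the text "TEST".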
$cat a.tmp
20090913476178957893478958937589078903745897123890789074589076238904758906179038759081738904715890378904518978907348907579487592763785691287589072386590827890752890475890274890768902748906892748976898000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000005
20090913476178957893478958937589078903745897123890789074589076238904758906179038759081738904715890378904518978907348907579487592763785691287589072386590827890752890475890274890768902748906892748976898000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000005
20090913476178957893478958937589078903745897123890789074589076238904758906179038759081738904715890378904518978907348907579487592763785691287589072386590827890752890475890274890768902748906892748976898000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000005

$sed 's:\(.\{335\}\)\(.\{4\}\):\1TEST:' a.tmp
sed: Function s:\(.\{335\}\)\(.\{4\}\):\1TEST: cannot be parsed.

$sed 's:\(.\{255\}\)\(.\{80\}\)\(.\{4\}\):\1\2TEST:' a.tmp

20090913476178957893478958937589078903745897123890789074589076238904758906179038759081738904715890378904518978907348907579487592763785691287589072386590827890752890475890274890768902748906892748976898000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000TEST00000000000000000000000000000000005
20090913476178957893478958937589078903745897123890789074589076238904758906179038759081738904715890378904518978907348907579487592763785691287589072386590827890752890475890274890768902748906892748976898000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000TEST00000000000000000000000000000000005
20090913476178957893478958937589078903745897123890789074589076238904758906179038759081738904715890378904518978907348907579487592763785691287589072386590827890752890475890274890768902748906892748976898000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000TEST00000000000000000000000000000000005
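
If the offset changes often, the splitting into pieces of at most 255 can be automated. The sketch below is one possible way to do it: skip_pattern is a hypothetical helper name, and the 340-character test line of zeros is generated only for this illustration.

```shell
# skip_pattern N  (hypothetical helper)
# Prints a basic regular expression that matches exactly N characters,
# built from pieces that each repeat at most 255 times.
skip_pattern() (
    n=$1
    pat=''
    while [ "$n" -gt 255 ]; do
        pat="$pat.\\{255\\}"
        n=$((n - 255))
    done
    printf '%s' "$pat.\\{$n\\}"
)

skip=$(skip_pattern 335)            # yields .\{255\}.\{80\}
line=$(printf '%0340d' 0)           # 340 zeros as sample input

# Keep the first 335 characters and replace the next 4 with TEST.
result=$(printf '%s\n' "$line" | sed "s:^\($skip\).\{4\}:\1TEST:")
printf '%s\n' "$result" | cut -c330-340   # prints 000000TEST0
```

Here a single group wraps all the pieces, so the replacement needs only \1 no matter how many pieces the offset is split into.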

4 comments:

sheila said...

THANK YOU! THANK YOU! THANK YOU!

adding that note about the 255 limit was very much appreciated. It saved me hours (and hours) of frustration.

Jel said...

same here..tnx a lot for the 255 limit workaround.

Anonymous said...

Good piece of information

Anonymous said...

Thanks so much for the workaround for the sed 255-character limitation! I had been struggling with it for some time because it is so difficult to get the Unix admin to install GNU sed on the Unix box...