
May 27, 2013

Unix Interview Questions: splitting and archiving files


Q. Can you write a Unix script that archives files older than 7 days from a folder, say /data/csv? The files must be archived in groups of 10. For example, if there are 25 *.csv files under /data/csv, 3 tar files containing 10, 10, and 5 files will be created.

A. First, define a configuration file that contains the source directory, the archive directory (the script appends today's date to it), the file age in days, and the split size. For example:

zip.cfg file

/cygdrive/c/data/csv         /cygdrive/c/data/csv/zip     +7    10


Next, the shell script zip.sh reads zip.cfg and archives the files.

#!/bin/ksh

TODAY=`date '+%Y%m%d'`
SPLITPREFIX="/cygdrive/c/temp/abcdef_split"

# Routine to split the file list into batches (of, say, 10) and archive each batch
splitandzip () {
      # -f avoids an error when no split files are left over from an earlier run
      rm -f "$SPLITPREFIX"*
      echo "cd $1; ls  | split -l $2 - $SPLITPREFIX; counter=1"
      cd "$1" || return 1
      ls | split -l "$2" - "$SPLITPREFIX"; counter=1
      ls -l "$SPLITPREFIX"*
      cat "$SPLITPREFIX"*

      # Loop over the split list files; sed removes any leading ./ from their paths
      for FILENAME in $(/usr/bin/find "$SPLITPREFIX"* | sed 's/^\.\///'); do
           cat "$FILENAME" | xargs echo
           # Create one tar file per batch, then delete the files that were archived
           cat "$FILENAME" | xargs tar -cvf "$1${TODAY}_$counter.tar"
           cat "$FILENAME" | xargs rm -f
           ((counter=counter+1))
      done
}

CFG=$1
echo "Reading config file..... "$CFG

SCRIPTDIR=`pwd`

while read SOURCEDIR TARGETDIR WAITTIME SPLITSIZE
do
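     ARCHIVEDIR="$TARGETDIR$TODAY/"
     mkdir -p "$ARCHIVEDIR"
     echo " housekeeping....." "$SOURCEDIR" "$ARCHIVEDIR" "$WAITTIME" "$SPLITSIZE"
     # Skip this config line if the source directory cannot be entered
     cd "$SOURCEDIR" || continue
     # Move the files that are older than x days into the archive directory;
     # "-type d -prune" stops find from descending into subdirectories
     find . -name . -o -type d -prune -o -type f -mtime "$WAITTIME" -exec mv {} "$ARCHIVEDIR" \;
     cd "$SCRIPTDIR"
     splitandzip "$ARCHIVEDIR" "$SPLITSIZE"
done < "$CFG"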



Finally, run the script as shown below. It uses ksh syntax such as ((counter=counter+1)), so invoke it with ksh rather than sh.

ksh zip.sh zip.cfg


The commands used above are

Split command

ls  | split -l 2 - split_prefix


-l: number of lines per output file (each line here is one file name)
 -: read the input from standard input
The last argument is the prefix for the split file names.

The "ls" command lists the file names, and split writes them 2 at a time into files named split_prefixaa, split_prefixab, split_prefixac, etc. Each of these files contains at most 2 names from the ls output. For example

File split_prefixab contents

my_file1.csv
myfile2.csv
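
The batching can be tried out safely in a scratch directory. The sketch below is only an illustration: the directory layout created with mktemp and the five sample file names are assumptions, not part of the original script.

```shell
# Scratch area: data/ holds the sample files, lists/ receives the split output
work=$(mktemp -d)
mkdir "$work/data" "$work/lists"
cd "$work/data" || exit 1

# Five empty sample files (hypothetical names)
touch a.csv b.csv c.csv d.csv e.csv

# Write the file names, two per list file, to split_prefixaa, ab and ac
ls | split -l 2 - "$work/lists/split_prefix"

batches=$(ls "$work/lists" | wc -l | tr -d ' ')
last_batch=$(cat "$work/lists/split_prefixac")
echo "$batches list files; the last one holds: $last_batch"
# prints "3 list files; the last one holds: e.csv"

# Clean up the scratch area
cd / && rm -rf "$work"
```

The list files are written outside the directory being listed, as in the script above, so that ls never picks up the split output itself.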


Find command



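find . -name . -o -type d -prune -o -type f -mtime +2 -exec mv {} /out/archive \;


-name . : matches the starting directory itself, for which nothing is done
-type d -prune: do not descend into subdirectories
-o: Boolean OR
-type f: regular files only
-mtime +2: last modified more than 2 days ago
-exec: execute the mv command for each selected file
{} : replaced with the selected file, which is moved to the archive folder /out/archive
\; : marks the end of the mv command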
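
To see which files such a find expression selects, it can be run against a scratch directory. This is a minimal sketch under stated assumptions: the directory names and file names are made up, the old timestamps are faked with touch -t, and the expression uses "-type d -prune" so that find stays at the top level.

```shell
# Scratch area with a source folder, a subfolder and an archive folder
work=$(mktemp -d)
mkdir "$work/src" "$work/src/sub" "$work/archive"
cd "$work/src" || exit 1

# Two files dated 1 Jan 2000 (one of them in a subfolder) and one new file
touch -t 200001010000 old.csv sub/nested_old.csv
touch new.csv

# Move old regular files from the top level only
find . -name . -o -type d -prune -o -type f -mtime +2 -exec mv {} "$work/archive" \;

moved=$(ls "$work/archive")
nested_kept=no
[ -f sub/nested_old.csv ] && nested_kept=yes
new_kept=no
[ -f new.csv ] && new_kept=yes
echo "moved: $moved, nested file kept: $nested_kept, new file kept: $new_kept"

# Clean up the scratch area
cd / && rm -rf "$work"
```

Only old.csv is moved: new.csv is too recent, and sub/nested_old.csv is never visited because its directory is pruned.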


Xargs Command


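ls | xargs tar -cvf test.tar


'ls' lists the file names in the current directory, and xargs appends those names as arguments to the tar command, which creates (-c option) the archive test.tar containing them. If the list is too long for one command line, xargs runs tar more than once and each run overwrites test.tar, which is another reason to keep the batches small.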
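
The xargs and tar combination can be checked in a scratch directory by listing the archive afterwards. This is a sketch only; the directory and the three sample file names are assumptions made for the illustration.

```shell
# Scratch area: the tar file is written outside the folder being listed
work=$(mktemp -d)
mkdir "$work/data"
cd "$work/data" || exit 1
touch one.csv two.csv three.csv

# xargs passes all the names read from ls to a single tar command
ls | xargs tar -cf "$work/test.tar"

# List the archive members to confirm what went in
members=$(tar -tf "$work/test.tar" | sort | tr '\n' ' ')
echo "archived: $members"

# Clean up the scratch area
cd / && rm -rf "$work"
```

Note that piping ls into xargs breaks on file names that contain spaces or newlines, so this approach suits simple names such as the *.csv files in the question.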




In the above script, xargs is used as shown below for each split file. The sed command removes any leading ./ from the paths that find returns.

      for FILENAME in $(/usr/bin/find $SPLITPREFIX* | sed 's/^\.\///'); do
           cat $FILENAME | xargs echo
           cat $FILENAME | xargs tar -cvf $1$TODAY'_'$counter'.tar'
           cat $FILENAME | xargs rm -f
           ((counter=counter+1))
      done
   


SCRIPTDIR=`pwd`


The backquotes around pwd are command substitution: the shell runs the pwd command, which prints the present working directory, and stores its output in the variable SCRIPTDIR.
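
As a small sketch, the backquote form can be compared with the equivalent $(...) form of command substitution. The variable name SAME_DIR is made up for this illustration.

```shell
# Backquote form, as used in the script
SCRIPTDIR=`pwd`

# Equivalent $(...) form, which is easier to read and to nest
SAME_DIR=$(pwd)

# Both variables hold the same directory path
echo "$SCRIPTDIR"
echo "$SAME_DIR"
```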

The following code reads each line from the zip.cfg file, which has 4 fields separated by spaces.

 while read SOURCEDIR TARGETDIR WAITTIME SPLITSIZE
 do

  #....do something here

  done < zip.cfg
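
The field splitting done by read can be tried on a throwaway config file. This is a minimal sketch: the temporary file, the second config line, and the lines counter are assumptions added for the illustration.

```shell
# Throwaway config file with two lines of four whitespace-separated fields
cfg=$(mktemp)
printf '%s\n' '/data/csv   /data/csv/zip   +7   10' \
              '/data/logs  /data/logs/zip  +30  5' > "$cfg"

lines=0
# read splits each line on whitespace and assigns the fields in order
while read SOURCEDIR TARGETDIR WAITTIME SPLITSIZE
do
    lines=$((lines + 1))
    echo "source=$SOURCEDIR target=$TARGETDIR days=$WAITTIME batch=$SPLITSIZE"
done < "$cfg"

echo "config lines read: $lines"
rm -f "$cfg"
```

Runs of spaces between fields are treated as a single separator, so the columns in the config file can be aligned freely.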


Note: If you are running on Windows, you can practice the above code by downloading MobaXterm, which is a free Unix emulator for Windows. You need to download the files MobaXterm_Personal_5.0.exe and MobaXterm.ini, and for the Korn shell, the plugin Pdksh.mxt3. Put all these files in the same folder and create a shortcut to MobaXterm_Personal_5.0.exe to start the MobaXterm window.
