Breaking down large CSV files

Today I received a 45Mb CSV file for importing into a database… Needless to say the application we were importing to didn’t seem to like the size of the file, for what ever reason… So I knocked up a quite bash script to create smaller ‘chunks’ defined as a number of lines, to make importing simpler.

I’m sure there’s many way in which is can be simplified, so if you know any I’d like the contributions!

It’s run like this:

$ ./ large-data.csv 5000

The first argument being the filename and the second argument the maximum number of lines for each ‘chunk’. From that 45Mb megalith, 38 files of around 1.2Mb were produced which didn’t seem to break the other end!

Continue reading “Breaking down large CSV files”

MySQL backup script

I’ve recently been learning more and more about bash scripting and the cool functions and features that are included, that i’ll probably never use…

I recently had to set up a cron to backup all MySQL databases but I didn’t want them in one huge file.

So I’ve written a little script, that is probably not entirely efficient, but serves it’s purpose for us:


# get todays date
DATE=`date +"%F"`;

# go to the backup directory
cd /var/backup/mysql

# export the databases, each to it's own file
# the first echo is sent to mysql, so we're basically echoing SHOW DATABASES; to mysql
# then we're chopping off the first line (sed 1d) and passing the remaining input to
# xargs which accepts \n or space delimited arguments as a list and echo's out the
# string mysqldump -u  -p % > %.sql (replacing % with the name) to
# /bin/bash
echo "SHOW DATABASES;" | mysql -u  -p | sed 1d | xargs --replace=% echo "mysqldump -u  -p % > %.sql" | /bin/bash

# compress the backups
tar czf mysql_backup_$DATE.tar.gz *.sql

# remove the large .sql files
rm *.sql

This was tested on a SuSE 11 server, might need some tweaking on other dists.