Tag Archives: utility - Page 2

find file based on its content

In my last post I mentioned a Windows tool called Agent Ransack that can search for files based on their content (and other criteria such as modification time). On Linux, there’s no need to install an additional package or application: a command-line tool that does the trick is already there, namely grep.

For example, suppose you want to find all PHP files in the current folder and its subfolders that contain the word “stat”:

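grep --color=auto -inRH --include='*.php' 'stat' .

Parameters explanation:
--color=auto : highlights the word ‘stat’ in the output
-i : ignore case in the pattern (case-insensitive; the pattern is ‘stat’)
-R : search subfolders recursively, starting from the given folder (here ‘.’, the current folder)
--include='*.php' : only search files whose names match *.php (a bare *.php argument would be expanded by the shell in the current folder only, so PHP files in subfolders would be skipped)
-nH : show the filename and the line number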

grep example output

If you want to add file modification time as another criterion, you must combine the find and grep commands.
For example, suppose you want to search all PHP files in the current folder and its subfolders that contain the word ‘stat’ and were last modified less than one day ago:

find . -iname "*.php" -mtime -1 -exec grep --color=auto -inH 'stat' {} \;

Please refer to the find man page for more options related to file modification, access, and status-change times (-mtime, -atime, -ctime).
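
As a small illustration of those time options, here is a minimal sketch that narrows the window from days to minutes. It assumes GNU find (for -mmin) and GNU grep; search_recent is a hypothetical helper name made up for this example, not part of either tool.

```shell
#!/bin/bash
# search_recent DIR MINUTES WORD
# Hypothetical helper: print the names of PHP files under DIR that were
# modified less than MINUTES minutes ago and contain WORD (case-insensitive).
search_recent() {
    local dir=$1 minutes=$2 word=$3
    # -mmin -N : last modified less than N minutes ago
    # grep -l  : print only the names of files that contain a match
    # {} +     : pass many file names to one grep call instead of one call per file
    find "$dir" -type f -iname "*.php" -mmin "-$minutes" \
        -exec grep -il -- "$word" {} +
}

# Example call (commented out so that loading this file has no side effects):
# search_recent . 60 'stat'
```

Replacing -mmin with -mtime switches the unit back to days, as in the command above.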

find out disk space used by certain type of files

It’s very easy to check how much hard disk space certain files occupy. I use this technique before moving my files to another host, to estimate how long the transfer will take. We will use the find command to do this (plus some pipelining):

cd ./tmp           # move to the directory you want to check
find . -iname "*.mp3" -exec du -b {} + | awk '{total += $1} END {print total}'

What does it mean?
The find command searches for all files matching the (case-insensitive) pattern *.mp3, starting from the current directory. The file names found are passed to “du -b”, which prints each file’s size in bytes. Then awk adds up all of the file sizes and prints the total at the end. The result is in bytes.
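
To connect the total to the transfer-time estimate mentioned earlier, here is a rough sketch. The helper names total_size_bytes and estimate_seconds are made up for this example, and the transfer rate is an assumed number you have to supply yourself. It relies on GNU du for the -b option.

```shell
#!/bin/bash
# total_size_bytes DIR PATTERN
# Hypothetical helper: sum the sizes (in bytes) of files under DIR whose
# names match PATTERN, case-insensitively.
total_size_bytes() {
    local dir=$1 pattern=$2
    find "$dir" -type f -iname "$pattern" -exec du -b {} + |
        # printf "%.0f" avoids exponent notation, which some awk
        # implementations may use when printing very large totals
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# estimate_seconds BYTES RATE
# Hypothetical helper: seconds needed to move BYTES at RATE bytes per
# second, rounded up. RATE is an assumption, not a measured value.
estimate_seconds() {
    local bytes=$1 rate=$2
    echo $(( (bytes + rate - 1) / rate ))
}

# Example (commented out): MP3 files in the current directory at 1 MB/s
# estimate_seconds "$(total_size_bytes . '*.mp3')" 1000000
```

The estimate ignores protocol overhead and per-file latency, so treat it as a lower bound rather than a promise.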

alternative to HJSplit; split and join files faster!

It’s been a few days since my last post. I’ve been really busy with offline projects, so here is a quick post.
Why do we need an alternative to HJSplit? Well, for me, HJSplit sometimes hogs memory and CPU. So I needed a replacement that works just like HJSplit but better (faster, and compatible with HJSplit). And I found this piece of work:

FFSJ: The Fastest File Splitter and Joiner


The features are just like HJSplit’s: FFSJ can split, join, and MD5-checksum your files. The one thing it is missing is the “compare” files feature, but I think that feature just compares the MD5 checksums of the two files.
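
For readers on Linux, the same split, join, and compare-by-checksum idea can be sketched with standard command-line tools. This is only an illustration under my own assumptions: it says nothing about how FFSJ or HJSplit work internally, it does not produce their part-file naming, and same_md5 is a helper name invented for this example.

```shell
#!/bin/bash
# same_md5 FILE1 FILE2
# Hypothetical helper: succeeds when both files have the same MD5 checksum.
same_md5() {
    local a b
    a=$(md5sum < "$1" | cut -d' ' -f1)
    b=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}

workdir=$(mktemp -d)
head -c 2500 /dev/urandom > "$workdir/original.bin"      # sample data to split

# Split into parts of at most 1000 bytes: part.aa, part.ab, part.ac
split -b 1000 "$workdir/original.bin" "$workdir/part."

# Join: the shell expands part.* in sorted order, which is the split order
cat "$workdir"/part.* > "$workdir/joined.bin"

if same_md5 "$workdir/original.bin" "$workdir/joined.bin"; then
    echo "joined file matches the original"
fi
rm -rf "$workdir"
```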

This is the splitting window; it offers more options than HJSplit:
faster free file splitter
As you can see, you can choose how FFSJ splits: by number of parts (HJSplit doesn’t have this) or by size (like HJSplit). You can also encrypt your files (FFSJ will ask you for the encryption password).

The “join” window, mostly the same as HJSplit:
faster join file

The “checksum” window, nothing special here (except that you can copy the md5 checksum automatically to clipboard):
file md5 checksum

Drawbacks on Windows 7 64-bit:

  • The shell menu extension doesn’t work, which means you can’t right-click on a file and then join/split/checksum it. However, when you double-click the first part of a split file, FFSJ opens it automatically.
  • FFSJ does not show up in the Windows menu list. FFSJ installs itself in c:\windows\sysWOW64\FFSJ\ffsj.exe. That is quite a hindrance when you’re trying to split a file (because of drawback #1, you can’t right-click the file and choose “split”).

faster search file based on its content

As a programmer with bad housekeeping habits, I often forget where I wrote code that I want to reuse. I’m also not an SVN/git fan, so managing my projects’ source code is always a hindrance to me. This tool helps me a lot in coping with this kind of trouble. It can find files that contain the content we want. It searches based on Boolean operators, supports regular expressions (it even has a regular expression tester), and can search inside Office 2007 and PDF documents. The best thing is: it’s totally FREE!!! I also noticed that this tool doesn’t hog CPU and memory, and it searches faster than the Windows built-in search function (at least on my system).

Tool used: Agent Ransack – Free File Searching Utility

agent ransack main view

agent ransack example usage

Once installed, Agent Ransack can be accessed via the right-click menu in Windows Explorer. Agent Ransack begins its search in the folder you selected in Explorer. As you can see in the screenshot example above, I’m searching all “*.pas” files inside a directory for the word “GIS_AREA”. After entering the search criteria/rules, click the “Start” button or the blue “play” toolbar button. Agent Ransack needed less than 1 second to do this search. After the search finishes, you can click a file in the file list (left pane) and see the word occurrences in the right pane. You can also right-click an entry in the file list to edit the file in your favorite editor or to browse its directory.

Instead of one folder, you can also search multiple folders at once. You can also tell Agent Ransack to search only the top folder, without its subdirectories.

Choose which folder to search

You can also specify a file size range, so it won’t search files that are too big or too small.

File size restriction

Even more rules: you can specify the modified date! This is very useful if you have a large code base and just want to search the files that were modified in the last few days (or the last week).

File modification date rules

More advanced users can experiment with further Agent Ransack options, such as turning regex on or off and specifying the date rules in more detail.

Advanced options: using regex

Advanced date rules settings

bash script — delete files older than specified time

This time we will create a bash script that deletes certain files that are older than a certain time. This is useful for cleaning up old temporary files. The script only deletes files that have the specified extensions. I use this script in my video converter script to clean up temporary video/audio files so the server’s hard disk doesn’t fill up quickly. I suggest running this script as a cron job so it works automatically.

Full bash script source code:

#!/bin/bash

DIR=/var/www/ytconv/tmp                         # target directory where we should do some cleanup
FILES="*.flv *.3gp *.mp4 *.avi *.mp3"           # file patterns that will be cleaned
EXPIRETIME=$((540*60))                          # expire time in seconds (540 minutes)

shopt -s nullglob                               # patterns that match no file expand to nothing

cd "$DIR" || exit 1                             # stop if the target directory is unavailable, so nothing gets deleted elsewhere

for f in $FILES                                 # unquoted on purpose, so the shell expands the patterns
do
	NOW=$(date +%s)                             # get current time (seconds since Epoch)
	FCTIME=$(stat -c %Y -- "$f") || continue    # get file last modification time; skip the file if stat fails
	AGE=$((NOW - FCTIME))
	if [[ $AGE -gt $EXPIRETIME ]] ; then
		rm -f -- "$f"                           # this file age is more than the EXPIRETIME above, we can delete it
	fi
done

A bit of explanation:
The DIR variable is the directory where the script should do the cleanup; change it as you wish. The FILES variable lists the file extensions that will be deleted once they reach the expiration time. EXPIRETIME is the maximum file age: files older than this value (in seconds) will be deleted (so, in the code above, the script deletes files older than 540 minutes, i.e. 9 hours).

The script then changes the working directory to the target directory specified above and loops over all files that match the file extensions. For each file, it takes the current time in seconds (time since the Epoch), gets the file’s last modification time (also in seconds since the Epoch), and then checks whether the file’s age exceeds EXPIRETIME to determine whether it should be deleted.
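
The same age check can also be left to find itself. The following is a sketch of that alternative, not a drop-in replacement for the script above: it assumes a find that supports -maxdepth, -mmin and -delete (GNU find does), and cleanup_old_media is a function name made up for this example.

```shell
#!/bin/bash
# cleanup_old_media DIR MINUTES
# Hypothetical helper: delete media files directly inside DIR (no
# subdirectories) that were last modified more than MINUTES minutes ago.
cleanup_old_media() {
    local dir=$1 minutes=$2
    # -maxdepth 1 : stay in DIR itself, like the loop over "$DIR"/*.ext
    # -mmin +N    : last modified more than N minutes ago
    # -delete     : must come last, after all the tests that select the files
    find "$dir" -maxdepth 1 -type f \
        \( -name "*.flv" -o -name "*.3gp" -o -name "*.mp4" \
           -o -name "*.avi" -o -name "*.mp3" \) \
        -mmin "+$minutes" -delete
}

# Example (commented out): the same 540-minute limit as the script above
# cleanup_old_media /var/www/ytconv/tmp 540
```

Before trusting it with real files, replace -delete with -print once and check that the list only contains what you expect to lose.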