
/tmp Partition Full No Visible Files

I recently ran into a problem that was alarming and confusing at first.  I logged into a Linux server to investigate a report that the /tmp partition was full.

A quick df -h confirmed the report: the /tmp partition was indeed full.  This is bad news for a production server that uses the /tmp directory to generate files, because applications that can't write their temporary files tend to fail in unexpected ways.

[[email protected] ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 2.0G 1.9G 9.8M 100% /tmp

Looking into the directory, I became a bit worried.  None of the files present came anywhere close to accounting for 1.9 GB of space.  That is definitely a problem, because if you can't see the files, how in the hell are you supposed to remove them?
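A quick way to confirm this kind of mismatch is to compare what df says is used with what du can actually find in the directory tree.  The sketch below is only an illustration, not what I ran at the time.  It assumes /tmp is its own filesystem (as it was here), and the variable names are mine.

```shell
#!/bin/sh
# Compare the space the filesystem reports as used with the space
# taken by files that are still visible in the directory tree.
# Assumption: /tmp is a separate mounted filesystem.
mount_point=/tmp

# Used space according to the filesystem, in KiB (-P forces one line per filesystem).
used_kb=$(df -Pk "$mount_point" | awk 'NR == 2 { print $3 }')

# Space taken by files reachable by name, in KiB.
# -x keeps du on this filesystem; errors for unreadable directories are hidden.
visible_kb=$(du -skx "$mount_point" 2>/dev/null | awk '{ print $1 }')

echo "df reports used:   ${used_kb} KiB"
echo "du finds in files: ${visible_kb} KiB"
echo "unaccounted for:   $((used_kb - visible_kb)) KiB"
```

A small gap is normal because of filesystem overhead.  A gap close to the size of the whole partition suggests something is holding space that no longer has a name in the directory.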

As it turns out, the space in /tmp can be held not only by the files you can see, but also by what I'll call virtual files: files that have been deleted while a running process still has them open.  The best way I can think of to describe this situation is as follows.  Say you have a process that's generating a report called report12304987.txt.  If that file is deleted while the process still has it open, the name disappears from the directory, but the OS has to keep the file's data around until the process closes it or exits.  As a result, the file no longer shows up in the directory listing, but the space it uses is still allocated.  It's almost like a recycle bin: the file is trashed, but it's not actually off your system yet.
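If you want to see this behaviour for yourself, here is a minimal sketch that reproduces it on a small scale.  The scratch file and the use of file descriptor 3 are my own choices for the demonstration and have nothing to do with the process from the incident.

```shell
#!/bin/sh
# Demonstration: a file deleted while still open keeps its data
# (and its disk space) until the last open descriptor is closed.
tmpfile=$(mktemp)                  # scratch file created just for this demo
printf 'still here\n' > "$tmpfile"

exec 3< "$tmpfile"                 # open the file on descriptor 3 in this shell
rm "$tmpfile"                      # remove the name; the data stays allocated

if [ ! -e "$tmpfile" ]; then
    echo "the name is gone from the directory"
fi

content=$(cat <&3)                 # the data is still readable through descriptor 3
echo "read through fd 3: $content"

exec 3<&-                          # closing the descriptor lets the kernel free the space
```

The process in my case was doing the same thing, just with a 1.9 GB file and no intention of closing it.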

Fortunately, there is a way to find out what is taking up the space using the lsof command and deal with it from there.  I would pair it with grep as follows: lsof | grep /tmp | grep deleted.  You should see results similar to those below.

[[email protected] ~]# lsof | grep /tmp |grep deleted
procname  23282  owner   60u   REG       8,22 1939509248         29 /tmp/fileDUtvs9.TMP (deleted)
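When there are many matches, raw lsof lines are hard to read, so it can help to boil them down to the PID, the command, the size and the path.  The function below is a hypothetical helper of my own (summarize_deleted is not a standard tool), and it assumes the ten-column layout shown above.  Some lsof versions add a TID column, in which case the field numbers need adjusting.

```shell
#!/bin/sh
# summarize_deleted: hypothetical helper that reads lsof-style lines on
# stdin and prints "PID COMMAND SIZE-in-MiB PATH" for deleted files in /tmp.
# Assumed columns: 1=COMMAND 2=PID 7=SIZE 9=NAME, last field "(deleted)".
summarize_deleted() {
    awk '$NF == "(deleted)" && $9 ~ /^\/tmp\// {
        printf "%s %s %.0f MiB %s\n", $2, $1, $7 / 1048576, $9
    }'
}

# Try it on the sample line from the output above.
sample='procname  23282  owner   60u   REG       8,22 1939509248         29 /tmp/fileDUtvs9.TMP (deleted)'
summary=$(printf '%s\n' "$sample" | summarize_deleted)
echo "$summary"
```

On a live system you would feed it real output instead, for example lsof | summarize_deleted.  lsof also documents a +L1 option for selecting open files that have been unlinked, which may be worth a look in your man page as an alternative to grepping for "deleted".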

If you would like more specific process information, you can grep for the process ID, which is the second entry in the output (23282 in this case), like so: ps aux | grep 23282.  The output will be similar to that below.

[[email protected] ~]# ps aux |grep 23282
owner  23282  5.4  0.8 53112 33544 ?       S    10:42  15:31 procname webfile=5,2223,server1_9007_TCIP

As you can see from the 1939509248 in the lsof output, the missing temp file was using ~1.9 GB of space in the /tmp partition.  This definitely accounts for our space problem.  Now that we've found it, we have three courses of action:

  1. Restart the server
  2. Restart the service/daemon responsible
  3. Kill the associated process

These all have risks which need to be evaluated in the context of your setup and your problem.  In the situation I encountered, we moved forward with killing the process via a simple kill 23282 and then validated that the space had returned.
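For anyone who wants the kill-and-verify sequence spelled out, here is a rough sketch.  It uses a throwaway sleep process as a stand-in for the real offender, so it is safe to run anywhere.  The wait step only works because the stand-in is a child of the same shell, which would not be true for a real daemon.

```shell
#!/bin/sh
# Stand-in for the process that holds the deleted file open.
sleep 300 &
pid=$!

kill "$pid"                  # plain kill sends SIGTERM, giving the process a chance to clean up
wait "$pid" 2>/dev/null      # reap the stand-in; only possible for children of this shell

# kill -0 sends no signal; it only checks whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
    state=running
else
    state=exited
fi
echo "process $pid is $state"

# Once the process is gone, check whether the space came back.
df -h /tmp
```

For a process you did not start yourself, replace the wait with a short pause before the kill -0 check, and only reach for kill -9 if the process ignores the polite request.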

[[email protected] ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 2.0G 75M 1.9G 4% /tmp

Whew, another day, another problem!  Cheers!

Torry Crass
