Monthly Archives: December 2012


Yesterday I got an email from a friend who complained that "awk is still a mystery". Not being one to ignore a cry for help with the command line, I was motivated to write up a simple introduction to the basics of awk. But where to post it? I know! We've got this little blog we're not doing anything with at the moment (er, yeah, sorry about that folks-- life's been exciting for the Command Line Kung Fu team recently)...

Lesson #1 -- It's a big loop!

The first thing you need to understand about awk is that it reads and operates on each line of input one at a time. It's as if your awk code were sitting inside a big loop:

for each line of input
# your code is here
end loop

Your code goes in curly braces. So the simplest awk program is one that just prints out every line of a file:

awk '{print}' /etc/passwd

Nothing too exciting there. It's just a more complicated way to "cat /etc/passwd". Note that you generally want to enclose your awk code in single quotes like I did in the example above. This prevents special characters in the awk script from being interpolated by your shell before they even get to awk.
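Here's a minimal sketch of why the quoting matters. The sample line and the two helper functions are made up for illustration. Inside double quotes the shell replaces $2 with its own second positional parameter (empty in this sketch) before awk ever runs, so awk ends up executing {print } and prints the whole line.

```shell
# Made-up input line for this illustration
line='alpha beta gamma'

# Single quotes: awk receives {print $2} untouched and prints the second field
quoted_single() { echo "$line" | awk '{print $2}'; }

# Double quotes: the shell expands $2 first (empty here, since the function
# is called without arguments), so awk actually runs {print }
quoted_double() { echo "$line" | awk "{print $2}"; }

quoted_single   # prints: beta
quoted_double   # prints: alpha beta gamma
```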

Lesson #2 -- awk splits the line into fields

One of the nice features of awk is that it automatically splits up each input line using whitespace as the delimiter. It doesn't matter how many spaces or tabs appear between items on the line; each chunk of whitespace in its entirety is treated as a single delimiter.

The whitespace-delimited fields are put into variables named $1, $2, and so on. Rather than just doing "print" as we did in the last example (which prints out the whole original line), you can print out any of the individual fields by number. For example, I can pull out the percentage used (field 5) and file system mount point (field 6) from df output:

$ df -h -t ext4 | awk '{print $5, $6}'
Use% Mounted
58% /
24% /boot
42% /var
81% /home
89% /usr

The comma in the "print $5, $6" expression causes awk to put a space between the two fields. If you did "print $5 $6", you'd get the two fields jammed up against each other with no space between them.
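If you want to try this without depending on what df prints on your system, here's a small sketch with a made-up input line (the sample function is just a stand-in). It shows both points: runs of spaces and tabs count as a single delimiter, and the comma is what puts the space into the output.

```shell
# Made-up input line with uneven runs of spaces and tabs between the fields
sample() { printf 'one   two\t\tthree\n'; }

sample | awk '{print $1, $3}'   # prints: one three
sample | awk '{print $1 $3}'    # prints: onethree
```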

We could use a similar strategy to pull out just the usernames from ps (field 1):

$ ps -ef | awk '{print $1}'

Not so interesting maybe, until you start combining it with other shell primitives:

$ ps -ef | awk '{print $1}' | sort | uniq -c | sort -nr
188 root
70 hal
2 www-data
2 avahi
2 108
1 syslog
1 rtkit
1 ntp
1 mysql
1 gdm
1 daemon
1 102

Once we sort all the usernames in order, we can use "uniq -c" to count the number of processes running as each user. The final "sort -nr" gives us a descending ("-r") numeric ("-n") sort of the counts.
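Since ps output differs on every machine, here's the same counting idiom as a reproducible sketch. The user list is invented and stands in for the first column of "ps -ef"; the trailing awk is only there to trim the padding that "uniq -c" puts in front of the counts.

```shell
# Hypothetical stand-in for the usernames pulled out of "ps -ef"
users() { printf '%s\n' root hal root www-data root hal; }

# Same sort | uniq -c | sort -nr pipeline, with the padding normalized at the end
count_users() { users | sort | uniq -c | sort -nr | awk '{print $1, $2}'; }

count_users
# prints:
# 3 root
# 2 hal
# 1 www-data
```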

And this is fundamentally what's interesting about awk. It's great in the middle of a shell pipeline to be able to pull out individual fields that we're interested in processing further.

Lesson #3 -- Being selective

The other cool power of awk is that you can operate on selected lines of your input and ignore the rest. Any awk statement like "{print}" can optionally be preceded by a conditional operator. If a conditional operation exists, then your awk code will only operate on lines that match the expression.

The most common conditional operator is "/.../", which does pattern matching. For example, I could pull out the process IDs of all sshd processes like this:

$ ps -ef | awk '/sshd/ {print $2}'

That output is maybe more interesting when you use it with the kill command to kick people off of your system:

# kill $(ps -ef | awk '/sshd/ {print $2}')

Of course, you'd better be on the system console when you execute that command. Otherwise, you've just locked yourself out of the box!
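One more gotcha: on many systems the awk process itself shows up in the ps listing, and its command line contains the string "sshd", so its PID can end up in the output too. A common workaround is to write the pattern as /[s]shd/, which still matches "sshd" but no longer matches awk's own command line. Here's a sketch against a made-up ps snapshot; fake_ps and the PIDs are hypothetical, and its argument is the awk command line as ps would display it.

```shell
# Hypothetical "ps -ef" snapshot; $1 is the awk command line as ps would show it
fake_ps() {
  cat <<EOF
root      1021     1  0 09:00 ?        00:00:00 /usr/sbin/sshd -D
hal       4242  1021  0 09:05 ?        00:00:00 sshd: hal@pts/0
hal       4300  4243  0 09:06 pts/0    00:00:00 awk $1
EOF
}

# Plain pattern: also catches the awk process (PID 4300)
fake_ps '/sshd/ {print $2}'   | awk '/sshd/ {print $2}'     # prints 1021, 4242, 4300

# Bracketed pattern: the text "[s]shd" does not contain "sshd", so awk skips itself
fake_ps '/[s]shd/ {print $2}' | awk '/[s]shd/ {print $2}'   # prints 1021, 4242
```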

While pattern matching tends to get used most frequently, awk has a full suite of comparison and logical operators. Returning to our df example, what if we wanted to print out only the file systems that were more than 80% full? Remember that the percent used is in field 5 and the file system mount point is field 6. If field 5 is more than 80, we want to print field 6:

$ df -h -t ext4 | awk '($5 > 80) {print $6}'

Whoops! The header line ends up getting dumped out too! We'd actually like to suppress that. I could use the tail command to strip that out, but I can also do it in our awk statement:

$ df -h -t ext4 | awk '$5 ~ /[0-9]/ && ($5 > 80) {print $6}'

"$5 ~ /[0-9]/" means do a pattern match specifically against field 5 and make sure it contains at least one digit. And then we check to make sure that field 5 is greater than 80. If both of those conditional expressions are true then we'll print out field 6. I made this more complicated than it needs to be just to show you that you can put together complicated logical expressions with "&&" (and "||" for the "or" relationship) and do pattern matching on specific fields if you want to.

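A caveat about the "$5 > 80" test: because field 5 carries a trailing "%", awk does not see it as a number, so the comparison is done as a string comparison. That is why the "Use%" header passed the test in the first place, and it also means "9%" counts as bigger than 80 while "100%" does not. Adding zero to the field ("$5+0") forces a numeric conversion and avoids all three problems. Here's a sketch with made-up df-style output; fake_df and its numbers are invented for illustration.

```shell
# Hypothetical df-style output with a 9%, an 81% and a 100% file system
fake_df() {
  cat <<'EOF'
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 1.8G 18G 9% /boot
/dev/sda2 50G 40G 10G 81% /home
/dev/sda3 10G 10G 0 100% /data
EOF
}

# String comparison: header and 9% slip through, 100% is missed
fake_df | awk '($5 > 80) {print $6}'     # prints Mounted, /boot, /home

# Numeric comparison: "Use%" converts to 0, "100%" converts to 100
fake_df | awk '($5+0 > 80) {print $6}'   # prints /home, /data
```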
Lesson #4 -- You don't have to split on whitespace

While splitting on whitespace is frequently useful, sometimes you're dealing with input that's broken up by some other character, like commas in a CSV file or colons in /etc/passwd. awk has a "-F" option that lets you specify a delimiter other than whitespace.

Here's a little trick to find out if you have any duplicate UIDs in your /etc/passwd file:

$ awk -F: '{print $3}' /etc/passwd | sort | uniq -d

Here we're merely using awk to pull the UID field (field 3) from the colon-delimited ("-F:") /etc/passwd file. Then we sort the UIDs and use "uniq -d" to tell us if there are any duplicates. You want this command to return no output, indicating no duplicates were found.
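To see what a hit looks like without touching your real password file, here's a sketch that runs the same pipeline over a few made-up passwd-style records. The fake_passwd function and the "toor" account are invented for illustration; "toor" deliberately reuses UID 0.

```shell
# Made-up passwd-style records; "toor" shares UID 0 with root
fake_passwd() {
  cat <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
toor:x:0:0:backup root:/root:/bin/sh
hal:x:1000:1000:Hal:/home/hal:/bin/bash
EOF
}

# Field 3 is the UID; uniq -d reports each value that occurs more than once
fake_passwd | awk -F: '{print $3}' | sort | uniq -d   # prints: 0
```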

The Rest is Practice

There's a lot more to awk, but this is more than enough to get you started with this useful little utility. But like any new skill, the best way to master awk is practice. So I'm going to give you a few exercises to work on. I'll post the answers on the blog in a week or so. Good luck!

  1. If you go back and look at the example where I counted the number of processes per user, you'll notice that the "UID" header from the ps command ends up being counted. How would you suppress this?

  2. Print out the usernames of all accounts with superuser privileges (UID is 0 in /etc/passwd).

  3. Print out the usernames of all accounts with null password fields in /etc/shadow.

  4. Print out process data for all commands being run as root by interactive users on the system (HINT: If the command is interactive, then the "TTY" column will have something other than a "?" in it)

  5. I mentioned that if you kill all the sshd processes while logged in via SSH, you'll be kicked out of the box (you killed your own sshd process) and unable to log back in (you've killed the master SSH daemon). Fix the awk so that it only prints out the PIDs of SSH daemon processes that (a) don't belong to you, and (b) aren't the master SSH daemon (HINT: The master SSH daemon is the one whose parent process ID is 1).

  6. Use awk to parse the output of the ifconfig command and print out the IP address of the local system.

  7. Parse the output of "lsof -nPi" and output the unique process name, PID, user ID, and port combinations for all processes that are in "LISTEN" mode on ports on the system.

EU – Commission urges industry to deliver innovative solutions for greater access to online content

The European Commission has adopted a Communication which sets out parallel tracks of action to be undertaken during this Commission's term of office to ensure that the EU's copyright framework stays fit for purpose in the digital environment. It follows the Commission's orientation debate on content in the digital economy held on 5 December 2012 on the initiative of Commission President José Manuel Barroso. A structured stakeholder dialogue, jointly led by Commissioners Michel Barnier (Internal Market and Services), Neelie Kroes (Digital Agenda) and Androulla Vassiliou (Education, Culture, Multilingualism and Youth), will be launched in 2013 to seek to deliver rapid progress in four areas through practical industry-led solutions. These areas are cross-border access and the portability of services; user-generated content and licensing for small-scale users of protected material; facilitating the deposit and online accessibility of films in the EU; and promoting efficient text and data mining for scientific research purposes.

Exposing the WiFi Password Secrets

This research article throws light on the internal password storage and encryption mechanism used for storing the WiFi account passwords. It explains where the WiFi passwords are stored on different platforms and how to decrypt them using the practical code sample.

Gartner IAM Notes

In case you missed all the live tweeting by me and others, here are some notes from this week's Gartner IAM Summit:
  • There seemed to be a common theme that the primary driver for IAM projects has shifted from operational (early) to compliance (recent) to business enablement (now).
  • Communication to the business stakeholders is key. (not new, but as important as ever)
  • IAM and IAG seem to be converging.

(from Chris Howard’s keynote)

  • The CIO’s business goals are to increase business growth, attract new customers, and reduce cost.
  • The CIO’s IT goals are to deliver solutions, manage infrastructure, reduce cost of IT, and expand analytics.

(from Jeff Wheatman’s session on DG)

  • Despite increasing requirements, less than 10% of orgs will get above maturity level 1 by 2015.
  • Solutions that help identify ownership and accountability are very immature.

Customers will look at solutions that can:

  • 3. Prevent situations (most difficult & expensive)
  • 2. Alert & Notify upon high-risk situation
  • 1. Document & Accept risk (which is OK for many – least costly)

Unstructured data remains a very big problem.

(from Lori Rowland’s session on Selling IAM with Perry Carpenter and Tom Scholtz)

ROI is impossible to demonstrate. Business cases are based on:

  • Efficiency: Any perceived time savings
  • Effectiveness: Improved audit, tracking, regulatory
  • Enablement: enhance business opps, reduce friction, integrate networks, etc.

You must continuously show value to the business by communicating success and building credibility with regular, honest feedback. You can do this by stating goals clearly up front and tracking toward them. One great example was to send a survey to stakeholders on where their pain lies. Measure their pain (1-10). Track progress on pain level improvements to show progress and success.

Roughly 45% of attendees reported that IAM was sponsored by CIO and 45% by CISO. Two things everyone has in common as drivers: Time & Money.