Monthly Archives: January 2013

Virtual Directory as Database Security

I've written plenty of posts about the various use-cases for virtual directory technology over the years. But I came across another today that I thought was pretty interesting.

Think about enterprise security from the viewpoint of the CISO. There are numerous layers of overlapping security technologies that work together to reduce risk to a point that's comfortable. Network security, endpoint security, identity management, encryption, DLP, SIEM, etc. But even when these solutions are implemented according to plan, I still see two common gaps that need to be taken more seriously.

One is control over unstructured data (file systems, SharePoint, etc.). The other is back-door access to application databases. There is a ton of sensitive information exposed through those two avenues that isn't protected by the likes of SIEM solutions or IAM suites. Even DLP solutions tend to focus on perimeter defense rather than on who has access. STEALTHbits has solutions to fill the gaps for unstructured data and for Microsoft SQL Server, so I spend a fair amount of time talking to CISOs and their teams about these issues.

While reading through some IAM industry materials today, I found an interesting write-up on how Oracle is using its virtual directory technology to solve the problem for Oracle database customers. Oracle's IAM suite leverages Oracle Virtual Directory (OVD) as an integration point with an Oracle database feature called Enterprise User Security (EUS). EUS enables database access management through an enterprise LDAP directory (as opposed to managing a spaghetti mapping of users to database accounts and the associated permissions.)

By placing OVD in front of EUS, you get instant LDAP-style management (and IAM integration) without a long, complicated migration process. Pretty compelling use-case. If you can't control direct database permissions, your application-side access controls seem less important. Essentially, you've locked the front door but left the back window wide open. Something to think about.

Alissa Torres, Drunken Security News – Episode 317 – January 24, 2013

Alissa Torres is a certified SANS Instructor and Incident Handler at Mandiant, finding evil on a daily basis. Alissa began her career in information security as a Communications Officer in the United States Marine Corps and is a graduate of the University of Virginia and the University of Maryland. She's on tonight to talk to us about Bulk Extractor.

Cisco responds to the WRT54GL Linksys router hack. They're working on a fix for people being able to remotely get a root shell, but their recommendation in the meantime? Only let friends use your router. Oh yeah, with friends like these...

Have you signed up for the SANS webinar titled "Uninstall Java? Realistic Recommendation? No. Insanity? Yes!" with John Strand, Paul Asadoorian and Eric Conrad? It's coming up, this Tuesday at 2 pm EST.

Do you have all the HTTP response codes memorized? Someone is proposing a new range of 700-level codes. Some that might be helpful: HTTP 725: It Works On My Machine. And I fear how often the Security Weekly web server will return an HTTP 767. It simply reads "Drunk".

Former Dawson College student Ahmed Al-Khabaz, who was expelled for allegedly hacking the university's infrastructure, has received multiple job offers. The guys talk about the situation with a little more detail than is often reported. He found a vulnerability and reported it. So far, so good. But then a little while later, he pointed a scanner at the vulnerability that he found, presumably setting off alarms. Even worse, the noise from the scanner pointed back to him. Once he reported the vulnerability, what's he doing going back to it, and as "evil" Jack mentions, why didn't Al-Khabaz cover his tracks better when he switched his hat color? Nonetheless, lots of weirdness abounds in this story. The university overreacted (what?!? a university overreacted? never!) instead of using this as a learning opportunity. Plus, the student may have made some mistakes along the way, yet he comes out better for it. So is the lesson here to hack your way to a job? Is that what the universities are for? Umm, no. Never go after something that you don't have explicit, written permission to hack. Plus there's Paul's suggestion of punishment here: the student should have been required to work the help desk for three months. That's enough to teach anyone a good lesson.

How do I remember strong passwords?

 

Mikko Hypponen and Sean Sullivan from F-Secure Labs recently sat down to answer some questions on online banking security from our F-Secure Community. The first question dealt with remembering strong passwords. They recommended a password manager, pass-phrases, and simpler passwords for less critical accounts.

Here’s a system we recommend to create strong passwords you can remember for your most critical accounts.

More answers are coming soon or you can treat yourself to them all now.

Performing Clean Active Directory Migrations and Consolidations


Active Directory Migration Challenges

Over the past decade, Active Directory (AD) has grown out of control. It may be due to organizational mergers or disparate Active Directory domains that sprouted up over time, but many AD administrators are now looking at dozens of Active Directory forests and even hundreds of AD domains wondering how it happened and wishing it was easier to manage on a daily basis.

One of the top drivers for AD Migrations is enablement of new technologies such as unified communications or identity and access management. Without a shared and clearly articulated security model across Active Directory domains, it’s extremely difficult to leverage AD for authentication to new business applications or to establish the related business rules that may be based on AD attributes or security group memberships.

Domain consolidation is not a simple task. Whether you're moving from one platform to another, doing some AD security remodeling, or just consolidating domains for improved management and reduced cost, there are numerous steps, lots of unknowns and an overwhelming feeling that you might be missing something. Sound familiar?

One of the biggest fears in Active Directory migration projects is that business users will lose access to their critical resources during the migration. To reduce the likelihood of that occurring, many project leaders choose to enable a dirty migration: they enable historical SIDs so that the old group memberships and resource permissions from the source domain carry over to accounts in the new domain. Unfortunately, enabling historical SIDs perpetuates one of the main challenges that drove the migration project in the first place. The dirty-migration approach preserves the various security models that have been implemented over the years, making AD difficult to manage and making it nearly impossible to understand who has what rights across the environment.

Clean Active Directory Migrations

The alternative to a dirty migration is to disallow historical SIDs and thereby enable a clean migration, where rights are applied as needed within an easy-to-manage, well-articulated security model. Security groups are applied to resources according to an intentional model that is defined up front, and permissions are limited to a least-privilege model where only those who require rights actually get them.

Not all consolidation or migration projects are the same. The motivations differ, the technologies differ, and the Active Directory organizational structures and assets differ wildly. Most solutions on the market provide point-A-to-point-B migrations of Active Directory assets. This type of migration often contributes to making the problem worse over time. There's nothing wrong with using an Active Directory tool to help you perform an AD forest or domain migration, but knowing which assets to move and how to structure, or even restructure, them in the target domain is critical.

Enabling a clean migration and transforming the Active Directory security model requires a few steps. It starts with assessment and cleanup of the source Active Directory environments. You should assess what objects are out there, how they're being used, and how they're currently organized. Are there dormant user accounts or unused computer objects? Are there groups with overlapping membership? Are there permissions that are unused or inappropriate? Are there toxic or high-risk conditions in the environment? This type of intelligence provides visibility into which objects you need to move, how they're structured, how the current domain compares to the target domain, and where differences exist in Group Policy, schema, and naming conventions. The dormant and unused objects, as well as any toxic or high-risk conditions, can be remediated so that those conditions aren't propagated to the target environment.
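As a rough illustration of that assessment step (a sketch of my own, not a description of any particular product), a first cut at dormant accounts can come straight from an LDAP query. The server name, base DN, bind options, and 90-day window below are placeholders you'd adjust for your environment:

# AD stores lastLogonTimestamp as a Windows FILETIME (100-nanosecond intervals
# since 1601-01-01), so convert a Unix cutoff date before filtering.
CUTOFF=$(( ( $(date -d '90 days ago' +%s) + 11644473600 ) * 10000000 ))

# Add your own bind options (e.g. -D/-W or -Y GSSAPI). Accounts that have never
# logged on have no lastLogonTimestamp at all and won't match this filter.
ldapsearch -LLL -H ldap://dc01.example.com -b "DC=example,DC=com" \
    "(&(objectCategory=person)(objectClass=user)(lastLogonTimestamp<=$CUTOFF))" \
    sAMAccountName lastLogonTimestamp

Keep in mind that lastLogonTimestamp can lag the most recent logon by up to about two weeks, so treat results like this as a starting point for review rather than a definitive list.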

Once the initial assessment and cleanup is complete, a gap-analysis should be performed to understand where the current state differs from the intended model. Where possible, the transformation should be automated. Security groups can be created, for example, based on historical user activity so that group membership is determined by actual need. This is a key requirement for numerous legal regulations.

The next step is to perform a deep scan into the Active Directory forests and domains that will be consolidated and look at server-level permissions and infrastructure across Active Directory, File Systems, Security Policies, SharePoint, SQL Server, and more. This enables the creation of business rules that will transform existing effective permissions into the target model while adhering to new naming conventions and group utilization. Much of this transformation should be automated to avoid human error and reduce effort.

Maintaining a Clean Active Directory

Once the migration or consolidation project is complete and adherence to the intended security model has been enforced, it’s vital that a program is in place to maintain Active Directory in its current state. There are a few capabilities that can help achieve this goal.

First, a mandatory periodic audit should be enforced. Security group owners should confirm that groups are being used as intended. Resource owners should confirm that the right people have the right level of access to their resources. Business managers should confirm that their people have access to the right resources. These reviews should be automated and tracked to ensure that they are completed thoroughly and on time.

Second, tools should be implemented that provide visibility into the environment answering questions as they come up. When a security administrator needs to see how a user is being granted rights to something they should perhaps not have, they’ll need tools that provide answers in a timely fashion.

Third, a system-wide scan should be conducted regularly to identify any toxic or high-risk conditions that occur over time. For example, if a user account becomes dormant, notification should be sent out according to business rules. Or if a group is nested within itself perhaps ten layers deep, you want an automated solution to discover that condition and provide related reporting.

Finally, to ensure adherence to Active Directory security policies, a real-time monitoring solution should be put in place to enforce rules, prevent unwanted changes via event blocking, and to maintain an audit trail of critical administrative activity.

Complete visibility across the entire Active Directory infrastructure enables a clean AD domain consolidation while making life easier for administrators, improving security, and enabling adoption of new technologies.

About the Author

Matt Flynn has been in the Identity & Access Management space for more than a decade. He’s currently a Product Manager at STEALTHbits Technologies where he focuses on Data & Access Governance solutions for many of the world’s largest, most prestigious organizations. Prior to STEALTHbits, Matt held numerous positions at NetVision, RSA, MaXware, and Unisys where he was involved in virtually every aspect of identity-related projects from hands-on technical to strategic planning. In 2011, SYS-CON Media added Matt to their list of the most powerful voices in Information Security.

Reduce Risk by Monitoring Active Directory

Active Directory (AD) plays a central role in securing networked resources. It typically serves as the front gate allowing access to the network environment only when presented with valid credentials. But Active Directory credentials also serve to grant access to numerous resources within the environment. For example, AD group memberships are commonly used to manage access to unstructured data resources such as file systems and SharePoint sites. And a growing number of enterprise applications leverage AD credentials to grant access to their resources as well.

Active Directory Event Monitoring Challenges

Monitoring and reporting on Active Directory accounts, security groups, access rights, administrative changes, and user behavior can feel like a monumental task. Event monitoring requires an understanding of which events are critical, where those events occur, what factors might indicate increased risk, and what technologies are available to capture those events.

Understanding which events to ignore is as important as knowing which are critical to capture. You don't need immediate alerts on every AD user or group change that takes place, but you do want visibility into critical, high-risk changes: Who is adding AD user accounts? ...adding a user to an administrative AD group? ...making Group Policy (GPO) changes?

Active Directory administrators face a complex challenge that requires visibility into events as well as infrastructure to ensure proper system functionality. A complete AD monitoring solution doesn't stop at user and group changes. It also looks at Domain Controller status: which services are running, disk space issues, patch levels, and similar operational and infrastructure needs. There are numerous technical requirements to get that level of detail.

AD administrators require full access in the environment which presents another set of challenges. How do you enable administrators to do their job while controlling certain high-risk activity such as snooping on sensitive data or accidentally making GPO changes to important security policies? Monitoring Active Directory effectively includes either preventing unintended activities through change blocking or deterring activities through visible monitoring and alerting.

Monitoring Active Directory Effectively

Effective audit and monitoring solutions for Active Directory address the numerous challenges discussed above by providing a flexible platform that covers typical scenarios out-of-the-box without customization but also allows extensibility to accommodate the unique requirements of the environment.

Data collection is the cornerstone of any Active Directory monitoring and audit solution. Collection must be automated, reliable, and non-intrusive on the target environment. Data that can be collected remotely without agents should be. But, when requirements call for at-the-source monitoring, for example when you want to see WHO did it, what machine they came from, capture before-and-after values, or block certain activities, a real-time agent should be available to accommodate those needs. The data collection also needs to scale to the environment’s size and performance requirements.

Once data has been collected, both batch and real-time per-event analysis are required to meet common requirements. For example, you may want an alert on changes to administrative groups but you don’t want alerts on all group changes. Or you may want a report that highlights all empty groups or groups with improper nesting conditions. This analysis should provide intelligence out-of-the-box based on industry expertise and commonly requested reporting. But it should also enable unique business questions to be answered. Every organization uses Active Directory in unique ways and custom reporting is an extremely common requirement.

Finally, once data collection and analysis phases have been completed, AD monitoring solutions should provide a flexible reporting interface that provides access to the intelligence that has been cultivated. As with collection and analysis, the reporting functionality should include commonly requested reports with no customization but should also enable report customization and extensibility. Reporting should include web-accessible reports, search and filtering, access to the raw and post-analysis data, and email or other alerting.

An effective Active Directory monitoring solution provides deep insight on all things Active Directory. It should enable user, group and GPO change detection as well as reporting on anomalies and high-risk conditions. It should also provide deep analysis on users, groups, OUs, computer objects, and Active Directory infrastructure. Because the types of reports required by different teams (such as security and operations) may differ, it may be prudent to provide slightly different interfaces or report sets for the various intended audiences.

When real-time monitoring of Active Directory Users, Groups, OUs, and other changes (including activity blocking) is important, the solution should provide advanced filtering and response on nearly all Active Directory events as well as an audit trail of changes and attempts with all relevant information.

Benefits of Active Directory Monitoring

The three most common business drivers for Active Directory monitoring are improved security, improved audit response, and simplified administration. Active Directory audit and monitoring solutions make life easier for administrators while improving security across the network environment. This is especially important as AD becomes increasingly integrated into enterprise applications.
Some common use-cases include:
  • Monitor Active Directory user accounts for create, modify, and delete events. Capture the user account making the change along with the affected account information, changed attributes, time stamp, and more. This monitoring capability acts independently of the Security Event Log and provides non-repudiation.
  • Monitor Active Directory group memberships and provide reports and/or alerts in real time when memberships change on important groups such as the Domain Admins group (a minimal scripted sketch of this idea follows this list).
  • Report on failed attempts in addition to successful attempts. Filter on specific types of events and ignore others.
  • Report on Active Directory dormant accounts, empty groups, unused groups, large groups, and other high-risk conditions to empower administrators with actionable information.
  • Automate event response based on policy with email alerts, remediation processes, or recording of the event to a file or database.
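The group-membership use-case above lends itself to a quick illustration. What follows is only a minimal scripted sketch of the idea (the server, base DN, mail recipient, and file paths are placeholders, and unlike an agent-based approach it can't tell you who made the change), but it shows how membership drift on a sensitive group can be caught between polling runs:

# Pull the current Domain Admins membership, one DN per line.
# Bind options and paths are assumptions for illustration only; note that
# ldapsearch wraps long LDIF lines, which this sketch doesn't unfold.
ldapsearch -LLL -H ldap://dc01.example.com -b "DC=example,DC=com" \
    "(&(objectClass=group)(cn=Domain Admins))" member |
    sed -n 's/^member: //p' | sort > /var/tmp/domain-admins.now

# Compare against the previous run and mail the diff if anything changed.
if [ -f /var/tmp/domain-admins.last ] && \
   ! diff -u /var/tmp/domain-admins.last /var/tmp/domain-admins.now > /tmp/da.diff; then
    mail -s "Domain Admins membership changed" secops@example.com < /tmp/da.diff
fi
mv /var/tmp/domain-admins.now /var/tmp/domain-admins.last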
Active Directory Monitoring and Reporting doesn't need to feel complicated or overwhelming. Solutions are available to simplify the process while providing increased security and reduced risk.

About the Author

Matt Flynn has been in the Identity & Access Management space for more than a decade. He’s currently a Product Manager at STEALTHbits Technologies where he focuses on Data & Access Governance solutions for many of the world’s largest, most prestigious organizations. Prior to STEALTHbits, Matt held numerous positions at NetVision, RSA, MaXware, and Unisys where he was involved in virtually every aspect of identity-related projects from hands-on technical to strategic planning. In 2011, SYS-CON Media added Matt to their list of the most powerful voices in Information Security.

Career Advice

To my great surprise, young people now somewhat frequently contact me to solicit career advice. They are usually in college or high school, and want to know what the best next steps are for a career in security or software development.

This is, honestly, a really complicated question, mostly because I’m usually concerned that the question itself might be the wrong one to be asking. What I want to say, more often than not, is something along the lines of "don’t do it"; when I got out of high school and focused on the answer to that same question, it was very nearly one of the biggest mistakes of my life.

Since I get these inquiries fairly regularly, I thought I’d write something here that I can use as a sort of canonical starting point for a response.

An AWK-ward Response

A couple of weeks ago I promised some answers to the exercises I proposed at the end of my last post. What we have here is a case of, "Better late than never!"

1. If you go back and look at the example where I counted the number of processes per user, you'll notice that the "UID" header from the ps command ends up being counted. How would you suppress this?

There are a couple of different ways you could attack this using the material I showed you in the previous post. One way would be to do a string comparison on field $1:

$ ps -ef | awk '$1 != "UID" {print $1}' | sort | uniq -c | sort -nr
178 root
58 hal
2 www-data
...

An alternative approach would be to use pattern matching to print lines that don't match the string "UID". The "!" operator means "not", so the expression "!/UID/" does what we want:

$ ps -ef | awk '!/UID/ {print $1}' | sort | uniq -c | sort -nr
178 root
57 hal
2 www-data
...

You'll notice that the "!/UID/" version counts one less process for user "hal" than the string comparison version. That's because the pattern match is matching the "UID" in the awk code and not showing you that process. So the string comparison version is slightly more accurate.
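Another option, which wasn't part of the original exercises: since the header is always the first line of the ps output, you can skip it by line number with awk's built-in NR variable and sidestep the matching question entirely. This skips only the header line, so the counts match the string-comparison version:

$ ps -ef | awk 'NR > 1 {print $1}' | sort | uniq -c | sort -nr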

2. Print out the usernames of all accounts with superuser privileges (UID is 0 in /etc/passwd).

Remember that the /etc/passwd file is colon-delimited, so we'll use awk's "-F" option to split on colons. The UID is field #3 and the username is field #1:

$ awk -F: '$3 == 0 {print $1}' /etc/passwd
root

Normally, a Unix-like OS will only have a single UID 0 account named "root". If you find other UID 0 accounts in your password file, they could be a sign that somebody's doing something naughty.
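A small variation on the same one-liner (my addition, not part of the original exercise) flags only the suspicious cases, i.e. any UID 0 account that isn't named "root". On a healthy system this should print nothing:

$ awk -F: '$3 == 0 && $1 != "root" {print $1}' /etc/passwd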

3. Print out the usernames of all accounts with null password fields in /etc/shadow.

You'll need to be root to do this one, since /etc/shadow is only readable by the superuser:

# awk -F: '$2 == "" {print $1}' /etc/shadow

Again, we use "-F:" to split the fields in /etc/shadow. We look for lines where the second field (containing the password hash) is the empty string and print the first field (the username) when this condition is true. It's really not much different from the previous /etc/passwd example.

You should get no output. There shouldn't be any entries in /etc/shadow with null password hashes!

4. Print out process data for all commands being run as root by interactive users on the system (HINT: If the command is interactive, then the "TTY" column will have something other than a "?" in it)

The "TTY" column in the "ps" output is field #6 and the username field is #1:

# ps -ef | awk '$1 == "root" && $6 != "?" {print}'
root 1422 1 0 Jan05 tty4 00:00:00 /sbin/getty -8 38400 tty4
root 1427 1 0 Jan05 tty5 00:00:00 /sbin/getty -8 38400 tty5
root 1434 1 0 Jan05 tty2 00:00:00 /sbin/getty -8 38400 tty2
root 1435 1 0 Jan05 tty3 00:00:00 /sbin/getty -8 38400 tty3
root 1438 1 0 Jan05 tty6 00:00:00 /sbin/getty -8 38400 tty6
root 1614 1523 0 Jan05 tty7 00:09:00 /usr/bin/X :0 -nr -verbose -auth ...
root 2082 1 0 Jan05 tty1 00:00:00 /sbin/getty -8 38400 tty1
root 5909 5864 0 13:42 pts/3 00:00:00 su -
root 5938 5909 0 13:42 pts/3 00:00:00 -su
root 5968 5938 0 13:47 pts/3 00:00:00 ps -ef
root 5969 5938 0 13:47 pts/3 00:00:00 awk $1 == "root" && $6 != "?" {print}

We look for the keyword "root" in the first field, and anything that's not "?" in the sixth field. If both conditions are true, then we just print out the entire line with "{print}".

Actually, "{print}" is the default action for awk. So we could shorten our code just a bit:

# ps -ef | awk '$1 == "root" && $6 != "?"'
root 1422 1 0 Jan05 tty4 00:00:00 /sbin/getty -8 38400 tty4
root 1427 1 0 Jan05 tty5 00:00:00 /sbin/getty -8 38400 tty5
root 1434 1 0 Jan05 tty2 00:00:00 /sbin/getty -8 38400 tty2
...

5. I mentioned that if you kill all the sshd processes while logged in via SSH, you'll be kicked out of the box (you killed your own sshd process) and unable to log back in (you've killed the master SSH daemon). Fix the awk so that it only prints out the PIDs of SSH daemon processes that (a) don't belong to you, and (b) aren't the master SSH daemon (HINT: The master SSH daemon is the one whose parent process ID is 1).

This one's a little tricky. Take a look at the sshd processes on my system:

# ps -ef | grep sshd
root 3394 1 0 2012 ? 00:00:00 /usr/sbin/sshd
root 13248 3394 0 Jan05 ? 00:00:00 sshd: hal [priv]
hal 13250 13248 0 Jan05 ? 00:00:02 sshd: hal@pts/0
root 25189 3394 0 08:27 ? 00:00:00 sshd: hal [priv]
hal 25191 25189 0 08:27 ? 00:00:00 sshd: hal@pts/1
root 25835 25807 0 15:33 pts/1 00:00:00 grep sshd

For modern SSH daemons with "Privilege Separation" enabled, there are actually two sshd processes per login. There's a root-owned process marked as "sshd: <user> [priv]" and a process owned by the user marked as "sshd: <user>@<tty>". Life would be a whole lot easier if both processes were identified with the associated pty, but alas things didn't work out that way. So here's what I came up with:

# ps -ef | awk '/sshd/ && !($3 == 1 || /sshd: hal[@ ]/) {print $2}'

First we eliminate all processes except for the sshd processes with "/sshd/". We only want to print the process ID if it's not the master SSH daemon ("$3 == 1" catches the process whose PPID is 1) and it's not one of my own SSH daemons ("/sshd: hal[@ ]/" matches the string "sshd: hal" followed by either "@" or a space). If both of those tests pass, we print the process ID ("{print $2}").

Frankly, that's some pretty nasty awk. I'm not sure it's something I'd come up with easily on the spur of the moment.

6. Use awk to parse the output of the ifconfig command and print out the IP address of the local system.

Here's the output from ifconfig on my system:

$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr f0:de:f1:29:c7:18
inet addr:192.168.0.14 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::f2de:f1ff:fe29:c718/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7724312 errors:0 dropped:0 overruns:0 frame:0
TX packets:13553720 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:711630936 (711.6 MB) TX bytes:17529013051 (17.5 GB)
Memory:f2500000-f2520000

So this is a reasonable first approximation:

$ ifconfig eth0 | awk '/inet addr:/ {print $2}'
addr:192.168.0.14

The only problem is the "addr:" bit that's still hanging on. awk has a number of built-in functions, including substr() which can help us in this case:

$ ifconfig eth0 | awk '/inet addr:/ {print substr($2, 6)}'
192.168.0.14

substr() takes as arguments the string we're working on (field $2 in this case) and the place in the string where you want to start (for us, that's the sixth character so we skip over the "addr:"). There's an optional third argument which is the number of characters to grab. If you leave that off, then you just get the rest of the string, which is what we want here.

There are lots of other useful built-in functions in awk. Consult the manual page for further info.
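For example, sub() can do the same cleanup by stripping the prefix off the field in place (a variation of my own, not from the original answer):

$ ifconfig eth0 | awk '/inet addr:/ {sub(/^addr:/, "", $2); print $2}'
192.168.0.14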

7. Parse the output of "lsof -nPi" and output the unique process name, PID, user ID, and port combinations for all processes that are in "LISTEN" mode on ports on the system.

Let's take a look at the "lsof -nPi" output using awk to match only the lines for "LISTEN" mode:

# lsof -nPi | awk '/LISTEN/'
sshd 1216 root 3u IPv4 5264 0t0 TCP *:22 (LISTEN)
sshd 1216 root 4u IPv6 5266 0t0 TCP *:22 (LISTEN)
mysqld 1610 mysql 10u IPv4 6146 0t0 TCP 127.0.0.1:3306 (LISTEN)
vmware-au 1804 root 8u IPv4 6440 0t0 TCP *:902 (LISTEN)
cupsd 1879 root 6u IPv6 73057 0t0 TCP [::1]:631 (LISTEN)
cupsd 1879 root 8u IPv4 73058 0t0 TCP 127.0.0.1:631 (LISTEN)
apache2 1964 root 4u IPv4 7412 0t0 TCP *:80 (LISTEN)
apache2 1964 root 5u IPv4 7414 0t0 TCP *:443 (LISTEN)
apache2 4112 www-data 4u IPv4 7412 0t0 TCP *:80 (LISTEN)
apache2 4112 www-data 5u IPv4 7414 0t0 TCP *:443 (LISTEN)
apache2 4113 www-data 4u IPv4 7412 0t0 TCP *:80 (LISTEN)
apache2 4113 www-data 5u IPv4 7414 0t0 TCP *:443 (LISTEN)
skype 5133 hal 41u IPv4 104783 0t0 TCP *:6553 (LISTEN)

Process name, PID, and process owner are fields 1-3 and the protocol and port are in fields 8-9. So that suggests the following awk:

# lsof -nPi | awk '/LISTEN/ {print $1, $2, $3, $8, $9}'
sshd 1216 root TCP *:22
sshd 1216 root TCP *:22
mysqld 1610 mysql TCP 127.0.0.1:3306
vmware-au 1804 root TCP *:902
cupsd 1879 root TCP [::1]:631
cupsd 1879 root TCP 127.0.0.1:631
apache2 1964 root TCP *:80
apache2 1964 root TCP *:443
apache2 4112 www-data TCP *:80
apache2 4112 www-data TCP *:443
apache2 4113 www-data TCP *:80
apache2 4113 www-data TCP *:443
skype 5133 hal TCP *:6553

And if we want the unique entries, then just use "sort -u":

# lsof -nPi | awk '/LISTEN/ {print $1, $2, $3, $8, $9}' | sort -u
apache2 1964 root TCP *:443
apache2 1964 root TCP *:80
apache2 4112 www-data TCP *:443
apache2 4112 www-data TCP *:80
apache2 4113 www-data TCP *:443
apache2 4113 www-data TCP *:80
cupsd 1879 root TCP 127.0.0.1:631
cupsd 1879 root TCP [::1]:631
mysqld 1610 mysql TCP 127.0.0.1:3306
skype 5133 hal TCP *:6553
sshd 1216 root TCP *:22
vmware-au 1804 root TCP *:902

Looking at the output, I'm not sure I care about all of the different apache2 instances. All I really want to know is which program is using port 80/tcp and 443/tcp. So perhaps we should just drop the PID and process owner:

# lsof -nPi | awk '/LISTEN/ {print $1, $8, $9}' | sort -u
apache2 TCP *:443
apache2 TCP *:80
cupsd TCP 127.0.0.1:631
cupsd TCP [::1]:631
mysqld TCP 127.0.0.1:3306
skype TCP *:6553
sshd TCP *:22
vmware-au TCP *:902

In the above output you see cupsd bound to both the IPv4 and IPv6 loopback address. If you just care about the port numbers, we can flash a little sed to clean things up:

# lsof -nPi | awk '/LISTEN/ {print $1, $8, $9}' | \
sed 's/[^ ]*:\([0-9]*\)/\1/' | sort -u -n -k3

sshd TCP 22
apache2 TCP 80
apache2 TCP 443
cupsd TCP 631
vmware-au TCP 902
mysqld TCP 3306
skype TCP 6553

In the sed expression I'm matching "some non-space characters followed by a colon" ("[^ ]*:") with some digits afterwards ("[0-9]*"). The digits are the port number, so we replace the matching expression with just the port number. Notice I used "\(...\)" around the "[0-9]*" to create a sub-expression that I can substitute on the right-hand side as "\1".

I've also modified the final "sort" command so that we get a numeric ("-n") sort on the port number ("-k3" for the third column). That makes the output look more natural to me.

I guess the moral of the story here is that awk is good for many things, but not necessarily for everything. Don't forget that there are other standard commands like sed and sort that can help produce the output that you're looking for.
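That said, if you'd rather stay inside awk for this particular case, the built-in split() function can peel the port off the last field (another variation of my own). It should give the same listing as the sed version above:

# lsof -nPi | awk '/LISTEN/ {n = split($9, a, ":"); print $1, $8, a[n]}' | sort -u -n -k3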

Happy awk-ing everyone!