Sunday, December 8, 2013

Advanced Analysis of Modbus Traffic

It can be said that a proper network trace (from, say, Wireshark or TCPDump) probably has the answer to the network problem we are facing.  Of course this is not always true, but at the very least it more than likely cuts the problem in half.  With proper understanding of the network protocol, we can usually figure out which end is not living up to its requirements.

By way of example, let's have a look at a trace with the following features, as captured from a mirror port at the logical 'master' end, also known as the ModbusTCP client.

The stats are (from Statistics->Summary option in Wireshark):


So we have more than a quarter of a million packets, taken over 11 minutes.  95% of them are ModbusTCP packets, so for an analysis of timing, we have quite a ways to go if we want to determine the response time of every query, as well as the time between queries.  This could take quite some time - but is possible by hand.  As an example, to find the response time for a specific query, I will filter in Wireshark with 'Follow TCP stream': the default display has the packets in chronological order, but I need to find a particular query and its matching response.  The default view:


We can see many IPs in the source and destination fields, but since we want a particular query and response which would come from the same IP address pair, and most often from the same TCP connection, I can right click on a frame in the window and select follow TCP stream:


Once we follow the TCP stream, we see only packets for this TCP connection (that is, a single IPaddress,Port <-> IPaddress,Port pair).  For this case we have


Now, in general, the queries and responses are directly related and we can use Wireshark to find some timing information for us.  Choosing a query - recall this is just a sample - I select frame 39 and right click to select Set Time Reference:


Doing this, we can directly find the response time and have Wireshark calculate it for us:


We can see here (helped by the fact that the Transaction ID is in use; the field is always present in the ModbusTCP header, but some clients leave it at zero, in which case it cannot be used for matching) that the Modbus response came out 0.038635 sec after the query.  This is the response time, about 39ms.  Great - for sure, Wireshark gave us useful information.  But here's the catch.  I want to use the timings to see the big picture.  We have 150200 total Modbus transactions (a transaction being a query/response pair, which provides a single response time measurement) across many different hosts (recall we saw different IP addresses in use before filtering to a single TCP connection).
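The manual procedure above can be sketched in a few lines of Python.  This is only a minimal sketch, not the tool described below: it assumes the packets have already been decoded into records, and the record layout (Frame) and the function name (response_times) are made up for illustration.  It also assumes the client puts a distinct Transaction ID on each outstanding query of a connection.

```python
from typing import NamedTuple

class Frame(NamedTuple):
    # Hypothetical record layout, one entry per decoded ModbusTCP packet.
    time: float      # capture timestamp in seconds
    conn: tuple      # (client_ip, client_port, server_ip, server_port)
    is_query: bool   # True for a query, False for a response
    trans_id: int    # Transaction ID from the ModbusTCP header

def response_times(frames):
    """Pair each response with its pending query and return the delays."""
    pending = {}   # (conn, trans_id) -> time the query was seen
    results = []   # (conn, trans_id, response_time_seconds)
    for f in sorted(frames, key=lambda f: f.time):
        key = (f.conn, f.trans_id)
        if f.is_query:
            pending[key] = f.time
        elif key in pending:
            # Response to a query we saw: the difference is the response time.
            results.append((f.conn, f.trans_id, f.time - pending.pop(key)))
    return results

# Hypothetical addresses; the 0.038635 sec delay mirrors the example above.
conn = ("192.168.3.10", 1025, "192.168.3.20", 502)
frames = [
    Frame(0.000000, conn, True, 7),
    Frame(0.038635, conn, False, 7),
]
for c, tid, rt in response_times(frames):
    print(f"transaction {tid}: {rt * 1000:.1f} ms")
```

Keying on the connection as well as the Transaction ID matters because different connections are free to reuse the same ID values.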

The tool developed automates these calculations and provides for CSV file format output so that the results can be sent to nearly any graphing tool that can import CSV files (comma separated values).  Excel will work, but is not optimal.  A better tool is Minitab, which has great graphs for exploratory data analysis.  The idea is to get the big picture so we can compare what normal might look like for this network compared to what is happening when a problem is indicated.

The tool is written in Perl and acts like a processor - it natively reads pcap files (the libpcap format, which both Wireshark and TCPDump can write), sorts the packets, and does the calculations for us.  Note that newer versions of Wireshark default to the pcapng format, which is not the same - so if this is in use, open the file in Wireshark and save it in pcap format.  Also note that the tool expects little-endian format, as would be the case for captures taken on Intel or other x86 processors.  If you have a packet capture from an early Mac (before they went to Intel chips) or another big-endian processor (such as SPARC on Sun, MIPS, etc.) there may be trouble due to endianness issues (these would be rare, but could happen).
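A quick way to check a capture file before feeding it to a tool like this is to look at its first four bytes.  The sketch below is not part of the Perl tool; the function names are my own.  It relies on the classic pcap magic number (0xa1b2c3d4, or 0xa1b23c4d for nanosecond timestamps) being written in the byte order of the capturing machine, and on pcapng files starting with the block type 0x0a0d0d0a.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4      # classic pcap, microsecond timestamps
PCAP_MAGIC_NS = 0xA1B23C4D   # classic pcap, nanosecond timestamps
PCAPNG_BLOCK = 0x0A0D0D0A    # pcapng Section Header Block type

def identify_capture(header: bytes) -> str:
    """Classify a capture file from its first four bytes."""
    if len(header) < 4:
        return "too short"
    (le,) = struct.unpack("<I", header[:4])   # read as little-endian
    (be,) = struct.unpack(">I", header[:4])   # read as big-endian
    if le == PCAPNG_BLOCK:   # palindromic, so byte order does not matter
        return "pcapng"
    if le in (PCAP_MAGIC, PCAP_MAGIC_NS):
        return "pcap little-endian"
    if be in (PCAP_MAGIC, PCAP_MAGIC_NS):
        return "pcap big-endian"
    return "unknown"

def identify_file(path: str) -> str:
    """Read the first four bytes of a file and classify it."""
    with open(path, "rb") as fh:
        return identify_capture(fh.read(4))
```

Calling identify_file on a trace (the path is whatever your capture is named) should return "pcap little-endian" for a file the tool can read directly; "pcapng" means it needs to be re-saved from Wireshark first.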

Executing the program we get some information but the real value is in the CSV files generated which contain the timing data we want.  Taking a look at the response time as a dot plot (similar to histogram) for ALL the data, we have:

   
Based on this, we might have three separate populations.  I expect random error in network communications - small amount of deviations based on things we have no control over.  Random error is typically Gaussian in nature so we might expect to see some type of bell curve.  At the scaling we have here, it's hard to tell.  Zooming in (remember, this is exploratory data analysis - we don't know what we will find, so we are just looking around the data set to see what it can tell us):


In this range from 0.14 to 0.28sec, it's clear it is not a single population but likely two.  Be careful: scaling has an impact on how the results are viewed.  It is possible (and recommended) to continue down this path to see what is happening, but it's also useful to have a look at what other information is provided.  These past two graphs are for all response time data that has the MasterIP (or client) as shown.  But how many servers (or slaves, in old Modbus terminology) are there?  Can we make separate plots for each one of these to compare?  The power of Minitab is in the by-variables feature - those familiar with SQL will have seen this before as it is similar to the group-by feature - so instead of plotting versus a single master (or many, though in this case there is only one), let's compare dotplots for each server:


Here we can start to see that many slave or server IPs are typical - we don't know what is correct or incorrect - we are only noting what is 'different'.  But some are not typical - some look very different from the others.  This greatly narrows down the issues.  For all those that look normal or are consistent with the majority, there is usually little need to dig through them, so they can be set aside.  We are looking for the needle in a haystack, but the haystack is getting significantly smaller, with very little effort.  Another view of the data is to look at the response times as a function of time.  Does it vary?  Are there trends or patterns?  The trace is about 11min long - is the response time at the end the same as at the beginning?  Let's plot all the response times as a function of time:



Here we have all response times for all Modbus transactions as a function of time in the trace.  Based on the dot plots, we know we will have several different modes and we can see them here.  But which specific devices are different than the others?  Using the by-variables feature again in Minitab, we have:


Immediately apparent from this is that some devices (see yellow circles on graph) have no data - there is no red data point because there is no transaction here.  This doesn't tell us why there is no data - we may need to go back into the trace, but we have narrowed the problem down from 150000+ transactions from 29 slave devices to only those transactions with four slave devices.  This becomes a much more tractable problem.
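The same "where is there no data?" question can also be asked of the timing data directly.  The following is a rough sketch, assuming the per-transaction results are available as (server IP, query time) pairs; the function name, the 60 second window, and the sample addresses are all made up for illustration and are not the tool's actual output format.

```python
import math
from collections import defaultdict

def silent_windows(rows, trace_length, window=60.0):
    """Return {server: [window_start, ...]} for windows with no transactions.

    rows is an iterable of (server_ip, query_time_seconds) pairs.  A server
    that never appears in rows at all is not reported, since nothing is
    known about it from the timing data alone.
    """
    n_windows = max(1, math.ceil(trace_length / window))
    seen = defaultdict(set)   # server -> indexes of windows that have data
    for server, t in rows:
        seen[server].add(min(int(t // window), n_windows - 1))
    gaps = {}
    for server, bins in seen.items():
        missing = [i * window for i in range(n_windows) if i not in bins]
        if missing:
            gaps[server] = missing
    return gaps

# Hypothetical data: the second server is silent during the middle minute.
rows = [
    ("10.0.0.1", 5.0), ("10.0.0.1", 65.0), ("10.0.0.1", 125.0),
    ("10.0.0.2", 5.0), ("10.0.0.2", 130.0),
]
print(silent_windows(rows, trace_length=180.0))
```

The window size is a judgment call: it should be several times the expected time between queries, otherwise normal polling gaps will show up as silence.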

The tool can also help us index into the trace.  Next time, we will use Minitab to tell us exactly which frame in the trace we should look at.



Sunday, September 22, 2013

Properties of ICS Protocols

I want to propose a method or philosophy to study ICS protocols but first I need to define some of their properties.  These properties are neither unique nor new; the point is only that when analyzing network communication, it's best if the terminology is consistent.

There are many ICS protocols.  Many are called fieldbus protocols, and some are based on the pre-Ethernet days, while some are not.  Some popular Ethernet ones are

ModbusTCP
EtherNet/IP 
Profinet
PowerLink
EtherCAT
sercosIII

and many others.  Some non-Ethernet ones are

CANopen
DeviceNet
Modbus
ASI

and many more...

My focus is on Ethernet mostly, in particular ModbusTCP and EtherNet/IP, which is CIP on Ethernet.

One particular feature of ModbusTCP is that it is usually a polling protocol.  A request is made, and then this request is fulfilled in a response.  This applies to CIP explicit messaging as well (CIP is a suite of protocols, and the explicit messaging type behaves this way).  Based on this behavior, there are several factors to note:

1. No response with updated data is usually available until a request is made.  There are exceptions, but the vast majority of these types of communications is pure polling - there is little to no report-by-exception.  The server issues responses.
2. There is typically a request time - how often a request for new data is made.  This is called many things on many different platforms, but the concept is simple: every period of time, make a request for new data.  The client makes the request.
3. The device that acts as the server has a response time.  How long does it take to answer a request that comes in?  Wireshark is a nice tool to evaluate this, if the number of samples is not too large.
4. There is a concept of data freshness - how long does it take to update data in the client?  This is obviously a function of both the request period and the response time of the server device.
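As a rough worked example of item 4, assume the client polls on a fixed period and the server samples the value at the moment the query arrives.  Under that simplification (my assumption, not something taken from a specification), the value is already one response time old when it reaches the client and then sits there for one full request period until the next response replaces it.  The numbers below are hypothetical.

```python
def worst_case_data_age(request_period, response_time):
    """Upper bound, in seconds, on the age of the client's copy of a value.

    Assumes fixed-period polling and a server that samples the value when
    the query arrives; network transit time is folded into response_time.
    """
    return request_period + response_time

# Hypothetical numbers: poll every 100 ms, server answers in 39 ms.
age = worst_case_data_age(request_period=0.100, response_time=0.039)
print(f"worst-case data age: {age * 1000:.0f} ms")
```

Shortening the request period helps only until it approaches the response time; past that point the server, not the polling rate, sets the limit.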

Some different role names used in ICS protocols:

 
and a visual on the request/response aspect of ModbusTCP:

Since we are capturing the traffic in Wireshark now, we can see the Queries going out, and then we can see the responses coming back.  It's straightforward to calculate the time between the two, and we can call that the response time.  The time between queries for a given set of data we can call Query-Query time, or what is called RepRate in the graphic above (it's not really a repetition rate, but rather a repetition period - I didn't name it!)
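The Query-Query time can be sketched the same way.  This is a minimal illustration and not the tool itself: it assumes each query has been reduced to a timestamp plus a key identifying "the same set of data", and the choice of key (for example server IP, unit ID, function code and start address) is an assumption on my part.

```python
from collections import defaultdict

def query_query_times(queries):
    """Return {key: [interval, ...]} of the time between successive queries.

    queries is an iterable of (time_seconds, key) pairs, where key
    identifies one repeated request, e.g. (server_ip, unit_id, function,
    address).
    """
    by_key = defaultdict(list)
    for t, key in queries:
        by_key[key].append(t)
    intervals = {}
    for key, times in by_key.items():
        times.sort()
        # Difference between each query and the one before it.
        intervals[key] = [b - a for a, b in zip(times, times[1:])]
    return intervals

# Hypothetical queries: "A" is polled every 0.5 s, "B" every 1.0 s.
queries = [(0.0, "A"), (0.5, "A"), (1.0, "A"), (0.25, "B"), (1.25, "B")]
print(query_query_times(queries))
```

A key with a single query produces an empty list, since one sample gives no interval.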

For CIP (EtherNet/IP) the same concepts exist for explicit messaging.  This is actually what makes it explicit - a request comes in and then it is answered.  It's object based, yada, yada, yada, but fundamentally the communications profile is the same - a request is sent, and it is answered.  Now there are advances here.  There is Implicit messaging as well (often called Class1 or IO messaging), which is very different.  It uses UDP at the transport layer, and the logical connection is set up in advance, with both sides agreeing on what data will be sent, how often, and with what options (timeouts, etc).  In this case, then, there is no request - each side, independently, just publishes data to the wire (sometimes using multicast, other times using unicast at the network layer).  But explicit is different - the specific request comes in, and then is answered, and uses TCP at the transport layer.  There is Class3 messaging, which means a CIP connection is set up beforehand and then explicit messages are sent and responded to.  There is UCMM, where the message is just sent and it's best effort - this is much like ModbusTCP, where there is no concept of an application layer connection like with Class3 messaging.  There are unconnected_sends, which use UCMM but encapsulate the actual data request.  I expect to talk more about EtherNet/IP in the future.

For analysis, Class1 messaging has a free tool from NIST available on SourceForge.  It's called IENetP, and is available at http://sourceforge.net/projects/ienetp/.  A method of analysis will be presented here for explicit type messaging, though right now it is for ModbusTCP only.  It could be easily extended to EtherNet/IP, and will be in the future.  Note that this request/response type communication protocol is similar to others: SNMP has a similar structure with gets and get-responses (and has trap type extensions which are nice additions to improve performance with a report-by-exception mechanism).    

Next time we'll introduce the tool and show some of the results.

Sunday, September 15, 2013

Industrial Control Protocols

As Ethernet takes over the Industrial Control connectivity space, I see more and more need for analysis tools to quickly and accurately benchmark ICS (Industrial Control Systems) networks and identify actual or (preferably) potential problems.

There are many Ethernet diagnostic tools, many in long use by IT groups to monitor and manage Ethernet networks.  Some have made it over to the ICS world, some have not.  Some various tools in use:

1. SNMP with network management software
2. Device webpages
3. Netflow and Openflow tools
4. Packet sniffers
5. etc.

There are many packet sniffers available, depending on the platform in use.  These sniffers are able to capture and timestamp Ethernet frames as they traverse the network - of course, putting a tap in an appropriate place is always useful, and this determines exactly which frames you will get...

Some different sniffers include tcpdump, windump, tethereal, Wireshark, etc.  One common way to capture some network data for later use is to use a CLI type tool, such as tcpdump, and then follow up with Wireshark as the analysis tool.  Other times, one can use Wireshark directly to sniff the network, and perhaps perform some analysis in real time.

Wireshark is maintained as a free tool at http://www.wireshark.org/, and a good site for following advancements is http://www.lovemytool.com/ (no, it's not that kind of site!).

So you want to capture some industrial control system traffic, like ModbusTCP, or EtherNet/IP, or... whatever.  How to do it?  Download and install Wireshark, and tell it which interface to listen on...

 
Easy.  No problem.  I want to sniff on my LAN port... to see some ModbusTCP traffic (for example).  This traffic is between two other devices on my network.  So Wireshark will try to put my network interface (LAN) into promiscuous mode so the actual packets are sent up the stack to be captured by Wireshark.  What normally happens is that the Ethernet interface knows its MAC address, so it is listening on the wire for packets destined to this MAC, like this:


In this case the Dell PC has an IP of 192.168.3.254 with the MAC shown.  As frames come from the network, the NIC is looking for this address.  Frames for other addresses will be dropped - they are not destined for this device, so why even deal with them?  There are exceptions to this, of course, namely broadcast traffic - frames that have a destination MAC address of ff:ff:ff:ff:ff:ff (which Wireshark labels as Broadcast).  There are some other exceptions to this as well, but for now, this is enough.

So we want to capture and analyze ModbusTCP traffic between two devices using this Dell PC.  Turning on Wireshark, and waiting about a minute:

  
I know there is plenty of ModbusTCP traffic, but I can't see it.  The little LED lights are flashing away...  Why?  A number of reasons:

1. I am using an Ethernet switch
2. My Firewall is blocking traffic
3. There really is no traffic

So how to get traffic?  If using a switch, this presents a problem.  Switches operate at layer-2 of the OSI 7-layer model, so filter based on MAC address, much as the Ethernet NIC in the Dell PC does.  If two devices are communicating, the switch is smart and maintains a forwarding table, so only frames destined for a given device are sent to that port.  This is precisely what distinguishes a switch from a hub - a hub sends frames to every port, every time... a switch will only forward to ports that expect the data.
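The difference can be illustrated with a toy model.  This is only a sketch of the learning behavior described above: the class and function names are hypothetical, the MAC addresses are shortened placeholders, and real switches also age out table entries and handle VLANs and multicast, none of which is modeled here.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningSwitch:
    """Toy layer-2 switch that learns which port each MAC address is on."""

    def __init__(self, n_ports):
        self.ports = list(range(n_ports))
        self.table = {}   # MAC address -> port it was last seen on

    def forward(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        self.table[src_mac] = in_port   # learn where the sender lives
        out = self.table.get(dst_mac)
        if dst_mac == BROADCAST or out is None:
            # Broadcast or unknown destination: flood, like a hub would.
            return [p for p in self.ports if p != in_port]
        return [] if out == in_port else [out]

def hub_forward(n_ports, in_port):
    """A hub repeats every frame to every other port, every time."""
    return [p for p in range(n_ports) if p != in_port]

# Device A on port 0, device B on port 1, our sniffer on port 3.
sw = LearningSwitch(4)
print(sw.forward(0, "aa", "bb"))   # B not learned yet, so the frame floods
print(sw.forward(1, "bb", "aa"))   # A is known: only port 0
print(sw.forward(0, "aa", "bb"))   # B is now known: only port 1
print(hub_forward(4, 0))           # a hub would always include port 3
```

Once both devices have spoken, the sniffer on port 3 sees nothing further of their conversation, which matches the empty capture described above.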

How to get around this switch limitation?  There are a number of ways:

1. Span or mirror port - allows for a given port on the switch to copy data over to another port, which we can then use to capture traffic.  Requires use of a managed switch that supports this feature, and the 'copy' operation is usually lower priority, so cannot guarantee that data is exact, with no delay.
2. Put a hub inline with one of the devices to be monitored.  Degrades link speed to 10Mbps/half duplex.
3. Add a special network tap inline with one of the devices to capture traffic.  Must purchase and install something, can be expensive and usually must break link to put inline.

The easiest method is to use a mirror port, with a setup that may look like this:


This says to take the traffic on port 1.15 (Source) and copy over to port 1.16 (destination).  You can also do many to one, so have multiple ports copy their traffic over to a single destination port.  Be careful: each port is likely full line rate capable, so trying to copy multiple ports into one under heavy load will saturate the destination, causing loss of data at the destination.
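The arithmetic behind that warning is simple enough to sketch.  The function name and the traffic figures below are hypothetical.  Note also that mirroring both directions of a full-duplex port can by itself offer up to twice the line rate to the destination, which can only transmit at line rate.

```python
def mirror_load(source_rates_mbps, dest_capacity_mbps):
    """Return (offered_mbps, excess_mbps) for a many-to-one mirror.

    source_rates_mbps lists the traffic copied from each mirrored port
    (both directions combined); excess is what cannot fit on the
    destination port and will be lost from the capture.
    """
    offered = sum(source_rates_mbps)
    return offered, max(0, offered - dest_capacity_mbps)

# Hypothetical: three mirrored ports feeding one 100 Mbps destination port.
offered, excess = mirror_load([40, 35, 50], dest_capacity_mbps=100)
print(f"offered {offered} Mbps, excess {excess} Mbps")
```

Averages hide bursts, so a mirror can still drop frames even when the average offered load is below the destination's capacity.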

So once we know data is being copied, we can put our sniffer device onto port 16 to monitor the communications from port 15 (obviously, make sure the communication we want to observe is actually using port 15!)

Will we be able to see this data?  Maybe.  The NIC card attached to port 16 will drop the traffic because it is destined for other devices... not here.  So the sniffer tool has to put the interface into promiscuous mode.  Wireshark will do this automatically.  Is this enough... maybe.  In many cases, it is not.  Corporate firewalls can be installed on local machines so even though the NIC will pass up the traffic because it is in promiscuous mode, the firewall will drop the traffic before it gets into the sniffer tool.  Solution: disable the firewall (just don't tell IT I told you to do that...).  Better yet, work with IT to provide a long term solution to the problem.  Good luck.

On Linux, ifconfig can be used to tell if you're in promiscuous mode (look for PROMISC in the flags line):

eth2      Link encap:Ethernet  HWaddr 66:1C:A3:12:0D:5f
          inet6 addr: fe80::6a1c:a3ff:fe12:d5f/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:468 (468.0 b)
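The same check can be scripted.  This is a small sketch, assuming a Linux system that exposes interface flags as a hexadecimal string in /sys/class/net/<iface>/flags; the function names are my own, and IFF_PROMISC (0x100) is the flag bit defined in the Linux headers.

```python
IFF_PROMISC = 0x100   # promiscuous-mode flag bit from Linux <linux/if.h>

def is_promiscuous(flags_text):
    """Decode the content of /sys/class/net/<iface>/flags, e.g. '0x1103'."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)

def interface_is_promiscuous(iface):
    """Read the flags file for an interface and test the PROMISC bit."""
    with open(f"/sys/class/net/{iface}/flags") as fh:
        return is_promiscuous(fh.read())

print(is_promiscuous("0x1103"))   # PROMISC bit set
print(is_promiscuous("0x1003"))   # PROMISC bit clear
```

On a live system, interface_is_promiscuous("eth2") would report on the interface shown above.  The flag may not reflect every way a capture tool can request promiscuous reception, so treat a False result as a hint rather than proof.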

OK... we have mirroring, we started Wireshark so the interface is in promiscuous mode, we disabled the firewall... what do we get...  
    

ModbusTCP traffic!  By ensuring we are capturing correctly, we can see traffic between two devices on the network.  This is the start of our analysis - we need to be able to capture suitable data before we can analyze it.