Accountability and Responsibility

A member of my development team asked me to clarify the difference between accountability and responsibility. It is one of those seemingly innocuous questions where you think the difference is obvious, but it is not so obvious once you have to actually verbalize it. So I thought about it for a couple of days, and decided to write down my answer.

I will limit my answer to Accountability and Responsibility in the workplace. Their definition in one's personal life is a different kettle of fish.

The Oxford English Dictionary defines these two words as follows (only the relevant definitions are shown).

responsibility

the opportunity or ability to act independently and take decisions without authorization:
     we expect individuals to take on more responsibility
[count noun] (often responsibilities) a thing which one is required to do as part of a job, role, or legal obligation:
     he will take over the responsibilities of Overseas Director

accountability

the fact or condition of being accountable; responsibility:
     lack of accountability has corroded public respect for business and political leaders

To elaborate, your responsibilities are the set of tasks which you are assigned and have the authority to execute. This is something you have agreed with your manager, and you are expected to perform as long as there are no extenuating circumstances. You are therefore accountable for your responsibilities.

My experience in large organisations has been that accountability tends to carry more weight than responsibility. Senior management roles tend to be described using the word accountable, whereas responsible is used for junior roles. The perception also tends to be that the word accountable carries the greater burden.

I suppose that is because when you are a manager, you have no direct control over the actions of your staff. You do all you can to set a good example and create the right working culture, and then hope that your staff behave responsibly. Of course you can have regular checkpoints, feedback loops, status reports etc., but ultimately you are dependent on your staff to do the right thing, and you are, to a greater or lesser extent, accountable for the results of their actions. So the risk that the manager may be held accountable for a staff member's actions puts a greater burden on the manager, which is reflected in the way these words are used. That burden was clearly demonstrated in the recent controversy at the BBC, where the director general resigned over the actions of his staff.

For additional reading, here is an article about accountability published recently in the Harvard Business Review. So what do these words mean in your organisation? And does accountability exist at all levels?

Performance tuning of DMC

We’ve recently developed a real-time charging mediation system (named DMC) for all our mobile data users on the Orange brand, replacing a system from an external vendor which was not performing very well. DMC will handle charging for the majority of our customers, and this blog post is about how we performance-tuned it.

We initially developed the code to meet all the business requirements, without much focus on performance. The idea was to get the functionality correct first, and then measure the performance. A high-level view of the system is shown below.

[Image: high-level view of the DMC system]

The basic charging flow consisted of the following messages (a sketch of a per-session handler follows the list).

  • A user profile request from the Packet Gateway (not the LTE P-GW) to determine the subscriber profile, which involved an LDAP call to the subscriber database. Sent once per session.
  • A service authorisation request from the Packet Gateway to get an allocation of data, called a tranche.
  • Subsequent service reauthorisation requests to report data usage and to request additional tranches of data.
  • A service stop request to indicate that the session has ended.
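
As a rough illustration, each session could be modelled as an Erlang process along these lines. All module, function and message names below are invented for this sketch; this is not the actual DMC code.

-module(session).
-export([start/1, loop/1]).

%% Spawned on the first user profile request; the LDAP lookup
%% happens once per session.
start(Subscriber) ->
    Profile = lookup_profile(Subscriber),
    spawn(?MODULE, loop, [Profile]).

loop(Profile) ->
    receive
        {auth_request, From, Requested} ->
            From ! {tranche, grant_tranche(Profile, Requested)},
            loop(Profile);
        {reauth_request, From, Used, Requested} ->
            record_usage(Profile, Used),
            From ! {tranche, grant_tranche(Profile, Requested)},
            loop(Profile);
        {stop_request, Used} ->
            %% Session over: record final usage and let the process die.
            record_usage(Profile, Used),
            write_cdr(Profile)
    end.

%% Stubs standing in for the real LDAP, quota and CDR logic.
lookup_profile(_Subscriber) -> profile.
grant_tranche(_Profile, Requested) -> Requested.
record_usage(_Profile, _Used) -> ok.
write_cdr(_Profile) -> ok.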

DMC had to keep a record of session related information for the lifetime of the session. To simplify things, we decided not to use replicated databases for redundancy of session information.

To handle the GTP protocol, we used the excellent open-cgf library with some refactoring to meet our specific needs. For LDAP protocol support, we used the eldap application from OTP with a few enhancements (load balancing across a named pool of servers, windowing to limit the number of concurrent requests on each connection, and support for LDAP ExtendedRequest). One day, when I get the time and figure out how to submit patches to OTP, I will contribute these enhancements.
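
To give a flavour of the windowing enhancement, here is a sketch of the idea: allow at most a fixed number of requests in flight on a connection and queue the rest in arrival order. This is an illustration only; ldap_window and its API are invented names, not the actual eldap changes.

-module(ldap_window).
-behaviour(gen_server).
-export([start_link/1, request/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(WINDOW, 10).

start_link(Conn) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Conn, []).

%% Callers block until their request has been answered.
request(Req) ->
    gen_server:call(?MODULE, {request, Req}, infinity).

init(Conn) ->
    {ok, {Conn, 0, queue:new()}}.

handle_call({request, Req}, From, {Conn, InFlight, Q}) when InFlight < ?WINDOW ->
    dispatch(Conn, Req, From),
    {noreply, {Conn, InFlight + 1, Q}};
handle_call({request, Req}, From, {Conn, InFlight, Q}) ->
    %% Window full: park the request until a slot frees up.
    {noreply, {Conn, InFlight, queue:in({Req, From}, Q)}}.

handle_info(done, {Conn, InFlight, Q}) ->
    case queue:out(Q) of
        {{value, {Req, From}}, Q2} ->
            dispatch(Conn, Req, From),   %% slot reused immediately
            {noreply, {Conn, InFlight, Q2}};
        {empty, Q2} ->
            {noreply, {Conn, InFlight - 1, Q2}}
    end.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Run the request in a worker so the server never blocks, and
%% reply to the original caller directly.
dispatch(Conn, Req, From) ->
    Self = self(),
    spawn_link(fun() ->
                   gen_server:reply(From, eldap:search(Conn, Req)),
                   Self ! done
               end).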

Performance test environment

[Image: DMC test layout]

Initial performance tests showed that the system handled 1000 reqs/sec without any explicit performance tuning, but beyond that we started to see timeouts on the client side. This is a testament to Erlang’s amazing performance characteristics, considering that the current vendor-supplied solution uses about 10 servers to handle about 2500 reqs/sec, along with a rack of servers for the Oracle database. I knew Erlang could do better, so the following performance “tweaks” were made.

Incrementing Counters

Obviously, for every data charging transaction, a CDR has to be produced, and each CDR needs a unique sequence number. We started off using mnesia:dirty_update_counter via a gen_server (the counter had a maximum value of 1000000, so access had to be serialised). This didn’t scale very well: even at moderate loads, we started to see mnesia_overload alarms. So we implemented a counter_server which prefetched blocks of counter values. For instance, if the counter started at zero, the counter_server would set the value of the counter to 1000 in the DB, and in the interim, if processes requested a new unique value, it would return one from the 0-999 block. Once this block was exhausted, it would “fetch” a new block by setting the counter value in mnesia to 2000. This cut the number of mnesia writes by a factor of 1000! It also ensured that even if the process crashed or the system shut down, we wouldn’t lose more than 1000 counter values.
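
A minimal sketch of the idea follows. The module below is illustrative, not our production counter_server; it assumes a plain mnesia table called counters, and omits the wrap-around at the maximum counter value.

-module(counter_server).
-behaviour(gen_server).
-export([start_link/0, next/0]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(BLOCK, 1000).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

next() ->
    gen_server:call(?MODULE, next).

init([]) ->
    %% Reserve the first block on startup. dirty_update_counter
    %% returns the new value, i.e. the end of our reserved block.
    Limit = mnesia:dirty_update_counter(counters, cdr_seq, ?BLOCK),
    {ok, {Limit - ?BLOCK, Limit}}.

handle_call(next, _From, {N, Limit}) when N < Limit ->
    {reply, N, {N + 1, Limit}};
handle_call(next, From, {_N, Limit}) ->
    %% Block exhausted: reserve the next block with a single write.
    NewLimit = mnesia:dirty_update_counter(counters, cdr_seq, ?BLOCK),
    handle_call(next, From, {NewLimit - ?BLOCK, NewLimit}).

handle_cast(_Msg, State) ->
    {noreply, State}.

Once a block is reserved, all values below the new high-water mark are considered used, so a crash can skip at most one block of values — exactly the trade-off described above.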

Writing CDRs

Initially, we used an ordered_set mnesia table keyed on now(), and every 15 minutes a process would walk the table and write CDRs to disk. Even at the initial load, we started to see mnesia_overload alarms, so we decided to buffer CDRs outside mnesia and get them to disk sooner: we switched to writing CDRs to an ordered_set ets table with the {write_concurrency, true} option, and a process walked the table every 5 seconds and flushed the CDRs to disk. This worked like a charm and got rid of the mnesia_overload alarms.
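
Here is a sketch of the approach, assuming each CDR arrives as an already-formatted iolist. Table, file and module names are illustrative.

-module(cdr_writer).
-export([start/0, log_cdr/1, flush_loop/1]).

start() ->
    ets:new(cdr_buffer, [ordered_set, public, named_table,
                         {write_concurrency, true}]),
    spawn_link(?MODULE, flush_loop, ["cdrs.log"]).

log_cdr(CDR) ->
    %% erlang:now/0 was unique per call on a node, making it a safe
    %% ordered_set key (on modern releases you would use
    %% erlang:unique_integer([monotonic]) instead).
    ets:insert(cdr_buffer, {now(), CDR}).

flush_loop(File) ->
    timer:sleep(5000),
    {ok, Fd} = file:open(File, [append, raw]),
    flush(Fd, ets:first(cdr_buffer)),
    ok = file:close(Fd),
    flush_loop(File).

flush(_Fd, '$end_of_table') ->
    ok;
flush(Fd, Key) ->
    [{Key, CDR}] = ets:lookup(cdr_buffer, Key),
    ok = file:write(Fd, [CDR, $\n]),
    Next = ets:next(cdr_buffer, Key),
    ets:delete(cdr_buffer, Key),
    flush(Fd, Next).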

UDP packet handling

We initially had a single UDP socket taking all the traffic from the Packet Gateway. Despite experimenting with various values for the recbuf and read_packets options of gen_udp, we couldn’t push performance beyond a certain point. The only way we could improve performance was to accept packets on multiple ports.
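
A sketch of the multi-port arrangement: one receiver process per UDP port, each handing packets off as quickly as possible. Port numbers and buffer sizes below are illustrative.

-module(udp_pool).
-export([start/1, recv_loop/1]).

start(Ports) ->
    [spawn_link(?MODULE, recv_loop, [Port]) || Port <- Ports].

recv_loop(Port) ->
    {ok, Socket} = gen_udp:open(Port, [binary, {active, once},
                                       {recbuf, 256 * 1024},
                                       {read_packets, 100}]),
    loop(Socket).

loop(Socket) ->
    receive
        {udp, Socket, FromIP, FromPort, Packet} ->
            %% Hand the packet off so the receiver can get back to
            %% the socket as quickly as possible.
            spawn(fun() -> handle_packet(FromIP, FromPort, Packet) end),
            inet:setopts(Socket, [{active, once}]),
            loop(Socket)
    end.

handle_packet(_IP, _Port, _Packet) ->
    ok. %% decode the request and dispatch to the session process

Since each socket is owned by its own process, spreading traffic across ports means packet delivery is no longer serialised through a single process mailbox.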

Results

We now have a system which consistently handles 2000 reqs/sec at 40% CPU load, running in a Solaris zone with one dedicated 2GHz quad-core processor. Put four of these into the network, two in each site, and we’ve got ourselves covered for a couple of years at least. We haven’t put this system live yet, so fingers crossed that it performs in production as it has in test!

[Image: DMC traffic graph]

Tools used

We developed our own load testing tool, which spawned processes at a configurable rate; each process simulated a session, and the tool collected performance stats on successes, failures and timeouts. We will look into using Tsung in the future; I’ve heard good things about it.
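
The core of such a tool is simple enough to sketch. The module below is an illustration, not our actual tool; the protocol exchange itself is stubbed out.

-module(loadgen).
-export([run/2]).

%% Spawn Rate sessions per second for Seconds seconds and report
%% the outcome counts.
run(Rate, Seconds) ->
    Stats = ets:new(stats, [public]),
    ets:insert(Stats, [{success, 0}, {failure, 0}, {timeout, 0}]),
    spawn_sessions(Stats, 1000 div Rate, Rate * Seconds),
    timer:sleep(5000),  %% allow stragglers to finish
    [{K, ets:lookup_element(Stats, K, 2)}
     || K <- [success, failure, timeout]].

spawn_sessions(_Stats, _Interval, 0) ->
    ok;
spawn_sessions(Stats, Interval, N) ->
    spawn(fun() -> simulate_session(Stats) end),
    timer:sleep(Interval),
    spawn_sessions(Stats, Interval, N - 1).

%% Drive one user-profile/auth/reauth/stop cycle against the system
%% under test and record the outcome; the real tool timed each
%% request as well.
simulate_session(Stats) ->
    Outcome = success,  %% placeholder for the actual protocol exchange
    ets:update_counter(Stats, Outcome, 1).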

The following configuration file was used to plot the above graph using gnuplot. All log entries for each request type were collated into a file dmc_load_timings.txt and separated by two blank lines to allow gnuplot to distinguish the different sections.

set terminal png size 1200,800
set xdata time
set timefmt "%Y-%m-%d_%H:%M:%S"
set output "dmc_graph.png"
set xlabel "Time"
set ylabel "Reqs/sec"
set title "DMC Traffic"
set datafile separator ","
plot 'dmc_load_timings.txt' using 2:1 index 0 title "UserProf" with lines, \
     'dmc_load_timings.txt' using 2:1 index 1 title "Auth" with lines, \
     'dmc_load_timings.txt' using 2:1 index 2 title "Reauth" with lines, \
     'dmc_load_timings.txt' using 2:1 index 3 title "Stop" with lines, \
     'dmc_load_timings.txt' using 2:1 index 4 title "Total" with lines

Network interfaces were monitored using the following command.

netstat -iI hme0 1

Ingredients for a high performance software development team

I’ve been developing software for the past 18 years, and currently manage a team of 15 developers building mission critical systems for a large telco in the UK. Here are what I feel are the necessary ingredients (both people and resources) for a high performance software development team.

  • A chat system to enable people to communicate easily. Email is not suitable for a quick conversation.
  • Project/Ticket tracker. To keep track of features to complete, bugs to fix.
  • Documentation system (integrated with the Ticket tracker and Version Control system). Very important! Most developers hate writing documentation, but love having access to great documents. The solution is fairly simple: pay someone to write and keep documents up to date. I often find that a lot of delays in projects happen when people cannot find information easily, so a well maintained document set is invaluable. Ideally this should be a Wiki based system so you have change history, the ability to search easily, linking to other documents, tickets etc.
  • Version Control system (Git or Subversion). For large projects, choose Git.
  • Servers. Either hosted in the office or in a cloud. Need a few of these dedicated for building software and testing.
  • Laptop. A beefy laptop which has enough oomph to allow a developer to work even when not connected to any network.
  • Large monitors. 2 per person, 27″ at least.
  • Technical writer(s). Someone who has enough smarts to speak to technical and non-technical people in the business, and produce documentation to meet the needs of everyone. One of the most important roles within the team in my opinion.
  • Tester(s). Ideally a person with some software development skills. The tester needs to understand the domain they are operating in (obvious, I know, but it isn’t always true in practice), be methodical, and have the ability to research and use open source tools. The amount of software development skill required varies with the product being tested. Relying on developers to do all the testing is a bad idea.
  • Infrastructure engineer(s). Another very important role in the team. Someone who can build servers, development machines, set up networking, maintain and administer the tools mentioned above. If such a role doesn’t exist, a lot of developer time will be wasted fixing niggling issues.
  • Designer(s). I don’t mean user interface designers here, though they do fall into this category. This is a person who understands the business, the product being developed, the capabilities of the developers and the technology being used. Designers are expected to have the long conversations with the rest of the business and act as a bridge to the development team, who should be left mostly uninterrupted while they go about their job of writing beautiful code. Ideally, these people should have some programming experience.
  • Support engineer(s). Engineers who can troubleshoot problems in existing versions of your products and monitor/maintain existing live installations. Very good knowledge of scripting required. These engineers should be working closely with the developers to improve the quality of the product.
  • Release engineer(s). Someone who co-ordinates the packaging and delivery of new software into production. This role could be performed by the support engineers.
  • Delivery manager. This person should negotiate priorities with the rest of the business and provide a queue of work to the development team.
  • Developer(s). Of course. Businesses usually tend to treat a development team as just a bunch of developers: ask for some work to be done, and not a lot of support provided. NO! Software developers are a specialist resource who need a lot of support from the rest of the business. Think of them as an expensive hi-tech machine. The machine isn’t just going to start working magic for you. You need to put it in the right environment, control access to it, maintain it so that it doesn’t fall apart, and use it within its limits!
  • Last but not least, a benevolent dictator (such as myself :-)). Someone who understands all of the above and orchestrates the whole operation. Software development experience is a must.

You know, there isn’t anything particularly radical or new in the above list, but it is amazing how many setups get this wrong. And then they moan about delays in projects, cost overruns etc. Whenever I’ve looked at an underperforming development team, it has been fairly obvious what the causes of failure were, but too often, the culture within an organisation makes it too hard (or impossible) to make the necessary changes to succeed.

Disable proxy when using apt-get

I love apt-get: most of the world’s open source software at one’s fingertips! I use a Linux laptop for all my work, both at the office behind a proxy and at home connected directly to the Internet. I set up my apt.conf to look like this so that I can still use apt-get when I am in the office, behind a proxy.


Acquire::http::proxy "http://username:password@proxy.host.name:8080/";
Acquire::https::proxy "https://username:password@proxy.host.name:8080/";
Acquire::ftp::proxy "ftp://username:password@proxy.host.name:8080/";
Acquire::socks::proxy "socks://username:password@proxy.host.name:8080/";

This works perfectly when I’m at the office. The problem was that whenever I was at home, I couldn’t use apt-get any more. It always seemed to want to connect to the proxy, even if I blitzed that config file and removed all the entries. Various searches on the Internet didn’t help, until I finally found this option.


Acquire::http::proxy=false

When I’m at home, I run apt-get like this.


sudo apt-get -o Acquire::http::proxy=false install kgraphviewer

gSTM configuration by hand

Gnome has a handy utility for managing your SSH tunnels called gSTM (Gnome SSH Tunnel Manager). But if you have to manage a large number of tunnels, the GUI becomes quite painful. Instead, you can edit the configuration files by hand. The files are stored under $HOME/.gSTM.

$ ls -l ~/.gSTM
total 12
-rw-rw-r-- 1 chandru chandru 1354 Sep  3 04:50 jumpbox1.reJBkD.gstm
-rw-rw-r-- 1 chandru chandru 2901 Sep  5 09:51 jumpbox2.0VZHUf.gstm
-rw-rw-r-- 1 chandru chandru 1212 Aug 31 14:51 jumpbox3.Fg3FFh.gstm

A sample entry in one of the configuration files looks like this.

<tunnel>
  <type>local</type>
  <port1>9015</port1>
  <host>hostname.domain</host>
  <port2>22</port2>
</tunnel>