Category Archives: Software

500 Million Lines of Code?

500 million lines of code. That’s how big the source code is for HealthCare.gov, according to this article at the New York Times. That number has since been repeated in a CNN editorial by Julianne Pepitone.

That can’t be right.

I know for a fact you can build a healthcare enrollment site in under a million lines of code. HealthCare.gov does more than just handle enrollment, but I don’t believe it does 500 times more.

According to this other New York Times article, principal software development did not begin until early this year. It doesn’t seem plausible that developers could have written 500 million lines since then.

Windows 8 is rumored to be somewhere between 30 and 80 million lines of code, and Microsoft was developing it for over 20 years. Or take a look at this handy graph by web developer Alex Marchant. The codebase for the Debian 5 release with all packages is only about 325 million lines of code, and it includes not only an entire Linux operating system distribution, but also a collection of over 17,000 open source software packages. Could HealthCare.gov really be bigger than that gigantic collection of software?

There’s also the issue of cost. Depending on which stories you believe, the site cost between $400 million and $600 million to build. At 500 million lines of code, that would work out to between $0.80 and $1.20 per line. That’s far too cheap. Lines of code is a pretty squishy metric, but production code should cost somewhere around $10 to $20 per line. If HealthCare.gov really is 500 million lines of source code, it should have cost billions.
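The per-line arithmetic is easy to check. This is a minimal sketch; the $10 to $20 per line range is the rough rule of thumb from the paragraph above, not a measured figure, and the function names are just illustrative.

```python
# Back-of-the-envelope check of the cost-per-line figures.
REPORTED_COST_USD = (400e6, 600e6)  # low and high reported build cost
CLAIMED_LINES = 500e6               # the 500-million-line claim


def cost_per_line(total_cost: float, lines: float) -> float:
    """Dollars spent per line of code."""
    return total_cost / lines


def expected_cost(lines: float, dollars_per_line: float) -> float:
    """Total cost implied by a given per-line rate."""
    return lines * dollars_per_line


for cost in REPORTED_COST_USD:
    print(f"${cost / 1e6:.0f}M -> ${cost_per_line(cost, CLAIMED_LINES):.2f} per line")

for rate in (10, 20):
    print(f"at ${rate}/line -> ${expected_cost(CLAIMED_LINES, rate) / 1e9:.0f} billion")
```

At the rule-of-thumb rates, 500 million lines would cost somewhere between $5 billion and $10 billion.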

I tried a few online COCOMO calculators to see if I could come up with a ballpark cost estimate for a 500 million line program. One of them didn’t allow me to enter a number that large, and the other two both recommended a team of 11,500 people working for 52 years. At a salary of $75,000 per year (neglecting all overhead) that would cost around $45 billion, which is around 100 times the reported cost of HealthCare.gov.
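Here is the same labor-cost arithmetic as a short sketch. It only multiplies out the staffing estimate the calculators produced; it does not reimplement COCOMO itself, and the $75,000 salary is the same simplifying assumption as above.

```python
# Labor cost implied by the COCOMO calculators' staffing estimate.
# Salary only: overhead, benefits, and facilities are deliberately ignored.
TEAM_SIZE = 11_500     # people, as reported by the calculators
DURATION_YEARS = 52    # schedule, as reported by the calculators
SALARY_USD = 75_000    # assumed annual salary per person

person_years = TEAM_SIZE * DURATION_YEARS
labor_cost = person_years * SALARY_USD
print(f"{person_years:,} person-years -> ${labor_cost / 1e9:.0f} billion")

# Compare against the low and high reported budgets.
for reported in (400e6, 600e6):
    print(f"{labor_cost / reported:.0f}x a ${reported / 1e6:.0f}M budget")
```

Against the $400 million to $600 million range, that works out to roughly 75 to 112 times the reported cost.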

So where did that number come from? The Times just attributes it to “one specialist,” who is otherwise unidentified, without explaining how he arrived at the figure. I don’t know anything more, but I can suggest a few possibilities:

(1) Someone just pulled the number out of thin air. It’s large, it’s impressive, but it’s totally meaningless.

(2) The number could include generated code. Many software development tools take specifications from a programmer and generate code. It could be that large parts of the system are written in a much more compact high-level specification language that is fed into a code generator to create all 500 million lines of code.

(3) Related to (2), the number could include static HTML web content as code. Sometimes, for performance reasons, pages that could be the result of a database query are actually generated in advance and served statically. For example, an insurance pricing system might break down the plan structure into 4 different plan levels across 10 different age bands in 75 geographic regions. Rather than generate each page on the fly as needed, the developers might generate all 3,000 possible pricing pages in advance, so they can be served more quickly. If each page is 2000 lines of HTML, that alone could count as 6 million lines of code. Do this a few more times, and it wouldn’t be too hard to get to 500 million lines of “code.”

(Either of the preceding two possibilities would be misleading, because the true system complexity — and therefore development effort — is related to the size of the input to the code generator, not the size of its output.)
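To make the input-versus-output point concrete, here is a minimal sketch of pre-rendering the pricing pages from the example. The dimension names, the one-line page template, and the 2,000-line page size are all hypothetical stand-ins; the point is how a small specification expands into millions of lines of output.

```python
from itertools import product

# Hypothetical pricing dimensions from the example: 4 plan levels,
# 10 age bands, and 75 geographic regions.
PLAN_LEVELS = [f"level-{i}" for i in range(4)]
AGE_BANDS = [f"age-band-{i}" for i in range(10)]
REGIONS = [f"region-{i}" for i in range(75)]
LINES_PER_PAGE = 2_000  # assumed size of each pre-rendered HTML page


def render_page(level: str, band: str, region: str) -> list[str]:
    """Stand-in for a template that renders one static pricing page."""
    header = f"<!-- pricing: {level} / {band} / {region} -->"
    return [header] + ["<p>...</p>"] * (LINES_PER_PAGE - 1)


def count_generated_lines() -> tuple[int, int]:
    """Render every combination and return (pages, total HTML lines)."""
    pages = 0
    lines = 0
    for combo in product(PLAN_LEVELS, AGE_BANDS, REGIONS):
        lines += len(render_page(*combo))
        pages += 1
    return pages, lines


input_values = len(PLAN_LEVELS) + len(AGE_BANDS) + len(REGIONS)
pages, lines = count_generated_lines()
print(f"{input_values} input values -> {pages:,} pages -> {lines:,} lines of HTML")
```

Eighty-nine input values produce 6 million lines of output, and the effort to build and maintain the system scales with the 89 values and the template, not with the 6 million lines.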

(4) The line count could include a lot of duplication for some reason, perhaps due to poor factoring as part of a damn-the-maintenance push to get something out now. For example, maybe each state website has to be customized. The smart and maintainable way to do it is to have all the websites share common code except for the (say) 5% that has to be customized. Thus if the base website is 10 million lines of code, then there would be 9.5 million lines of common code, and about 17 million lines of custom code (a half-million lines for each of the 33 states plus D.C. on HealthCare.gov), for a total of about 26.5 million lines of code.

However, finding the 5% of the code that has to be changed and designing the remaining 95% so it can be shared across all sites is relatively hard work. (It’s a common software engineering process, but it still takes time and effort.) It might be faster to build one website from 10 million lines of code and then fork off and customize 34 copies — one for each state that uses the federal website — for an apparent total of 350 million lines of code, even though only about 26.5 million lines required effort to develop. (But all of it will require effort to maintain.) Again, it wouldn’t be hard to get to 500 million lines this way.
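The difference between the two approaches can be sketched in a few lines, using the "(say)" figures from the example. The function names are hypothetical, and the forked count assumes the original site is counted alongside the 34 forks, which is how the 350 million figure works out.

```python
SITES = 34               # 33 states plus D.C.
BASE_LINES = 10_000_000  # one complete website
CUSTOM_PERCENT = 5       # share of each site that must be customized


def shared_design(sites: int, base: int, custom_percent: int) -> int:
    """Lines to write when the common code is factored out and written once."""
    custom_per_site = base * custom_percent // 100
    common = base - custom_per_site
    return common + custom_per_site * sites


def forked_design(sites: int, base: int) -> int:
    """Apparent line count when the original site is copied once per site.

    Counts the original plus one fork per site.
    """
    return base * (sites + 1)


print(f"shared: {shared_design(SITES, BASE_LINES, CUSTOM_PERCENT):,} lines")
print(f"forked: {forked_design(SITES, BASE_LINES):,} lines")
```

Integer arithmetic keeps the counts exact: 26.5 million lines for the shared design versus 350 million apparent lines for the forked one.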

(5) The line count could represent all the cooperating systems behind HealthCare.gov, including pre-existing ones. One of the most complicated aspects of HealthCare.gov is that it has to interact with a lot of other data sources, including (I’ve heard) the Internal Revenue Service, Homeland Security, the Social Security Administration, the Health and Human Services Department, the Treasury Department, the Department of Justice, and all the insurance carriers.

Although these external data repositories undoubtedly required some work to interact with, for the most part the software systems already exist. So perhaps someone was discussing the complexity of all the interactions, and they were asked to estimate the size of the entire interacting system — and every system it talks to. 500 million lines might be a reasonable guess for that.

The first three of these explanations are misleading at best, and at worst they are a manipulative attempt to explain away the disaster. The fourth and fifth possibilities could be misleading, but are likely the result of miscommunication rather than an attempt to mislead. Of course, there’s always another possibility: (6) I could be wrong, and the system really could have 500 million lines of code. Awful, awful code.

More Interesting News About Problems

Lena H. Sun and Scott Wilson in the Washington Post have a pretty good article about the HealthCare.gov mess that tells us more about what’s going wrong. Let’s start with the specification for the load:

CGI built the shopping and enrollment applications to accommodate 60,000 users at the same time. U.S. Chief Technology Officer Todd Park has said that the government expected to draw 50,000 to 60,000 simultaneous users but that the site was overwhelmed by up to five times as many users in the first week.

That sounds reasonable, and it sounds like the sort of meaningless transient problem I was talking about on day 1. This is day 23, however, and the system is still failing, and was apparently doomed to do so, as I speculated earlier. Sun and Wilson’s article gives us a better idea how that happened, starting with stress testing under a synthetic load:

Days before the launch of President Obama’s online health ­insurance marketplace, government officials and contractors tested a key part of the Web site to see whether it could handle tens of thousands of consumers at the same time. It crashed after a simulation in which just a few hundred people tried to log on simultaneously.

Despite the failed test, federal health officials plowed ahead.

If it never passed its stress tests, then it’s not surprising what happened when it opened for business on October 1:

When the Web site went live Oct. 1, it locked up shortly after midnight as about 2,000 users attempted to complete the first step, according to two people familiar with the project.

That’s way below the design spec of 60,000 users and potentially a sign of serious problems. On the first day, it could still be a glitch — misconfigured load sharing, caching disabled, some performance option turned off by accident — although the stress tests, had they passed, should have caught most of that. That there are still performance problems is a sign of deeper issues.

The president’s remarks reflected rising anxiety within his administration over the widening problems with the online enrollment process. “There’s no excuse for the problems,” he added, “and they are being fixed.”

To me, this sounds like a fairly routine software disaster. That’s not exactly an excuse, but this is hardly an unprecedented event. Nor was it unpredictable, given the combination of scope and deadline. And indeed, it was predicted:

The Centers for Medicare and Medicaid Services (CMS), the federal agency in charge of running the health insurance exchange in 36 states, invited about 10 insurers to give advice and help test the Web site.

About a month before the exchange opened, this testing group urged agency officials not to launch it nationwide because it was still riddled with problems, according to an insurance IT executive who was close to the rollout.

As a software engineer, probably the thing I find most damning is this:

Some key testing of the system did not take place until the week before launch, according to this person. As late as Sept. 26, there had been no tests to determine whether a consumer could complete the process from beginning to end: create an account, determine eligibility for federal subsidies and sign up for a health insurance plan, according to two sources familiar with the project.

This was the core use case for the system, the primary path for users. And no one had tested it until five days before the site opened. This is another sign of exactly the sort of late integration problem I was speculating about in my previous post. And in case you’ve ever wondered, this is exactly why so many software products still ship late: it’s better to slip the delivery date than to deliver a broken system.

But with the date set by Congress (or the implementing regulations), there wasn’t much anyone could do:

People working on the project knew that Oct. 1 was set in stone as a launch date. “We named it the tyranny of the October 1 date,” said a person close to the project.

They do seem to be working their way through the list of problems:

Initial problems centered on account registration, a function that takes place early in the process and was in part a responsibility of contractor QSSI. While that function has improved, it is not fixed, according to the person close to the project.

QSSI said that a critical component that involves identity management is “successfully handling current volumes,” said Matt Stearns, a spokesman for UnitedHealth Group, the parent company. He said the “entire federal marketplace” was overwhelmed by consumer interest at launch.

Of course, now that lots of users are making it past the first bottleneck, they are crowding into the next one:

Additional problems are now showing up in the shopping and enrollment parts of the process, applications that are largely the responsibility of CGI, the person said. Those issues would have shown up earlier if testing had been done sooner, the person said.

Yeah. That sounds about right.

This part is a little disturbing:

Obama said government officials are “doing everything we can possibly do” to repair the site, including 24-hour work from “some of the best IT talent in the country.”

(Actually, it’s a little disturbing that WaPo would put a link to one of their own stories into a quote from the President. He certainly didn’t say the link. And how do they know he was talking about the same thing their story was talking about?)

So is this:

“We are working around the clock to identify issues with the site, diagnose them and fix them,” said Joanne Peters, a spokeswoman for Health and Human Services.

If they’re talking about routine 24/7 operations staff, then they’re being deceptive. But if they’re talking about working the software development staff for long hours, then they’re pushing their people beyond the limits and are probably suffering productivity losses by now.

According to another story by Amy Goldstein, the government is bringing in extra staff to help:

The Obama administration said Sunday that it has enlisted additional computer experts from across the government and from private companies to help rewrite computer code and make other improvements to the online health insurance marketplace, which has been plagued by technical defects that have stymied many consumers since it opened nearly three weeks ago.

The additional staff may not be terribly helpful due to Brooks’s Law, a well-known observation in software development usually expressed as, “Adding manpower to a late software project makes it later.” This counterintuitive result has several causes.

First of all, many tasks can only be subdivided so far — nine women cannot make a baby in one month — so it may not be easy to find tasks for the extra staff to do.

Also, it takes time for software engineers to ramp up on a project — learn what all the existing pieces are, learn the procedural steps of the job, integrate into the teams — and teaching them these things soaks up the time of other members on the team, which is why growing a software team always slows it down at first.

Finally, any time you make a team larger, it increases the amount of overhead, producing diminishing returns to team size, which reduces the benefits of adding extra staff. Optimizing the teams may mean breaking up large teams into smaller ones, which will add to the ramp-up time as the teams adjust to their new roles. Eventually, the increased staff should make the team more productive, but it could take months.
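The overhead point comes from Fred Brooks’s own argument in The Mythical Man-Month: if every member must coordinate with every other member, the number of communication channels grows as n(n-1)/2. A minimal sketch follows; the split into sub-teams that coordinate only through one lead each is an illustrative assumption, not how any particular project was organized.

```python
def communication_paths(team_size: int) -> int:
    """Pairwise channels when every member must coordinate with every other."""
    return team_size * (team_size - 1) // 2


def split_team_paths(teams: int, members_per_team: int) -> int:
    """Channels inside each sub-team plus channels among one lead per team."""
    return teams * communication_paths(members_per_team) + communication_paths(teams)


for n in (5, 10, 20, 40):
    print(f"{n:>2} people -> {communication_paths(n):>3} channels")
print(f"40 people as 4 teams of 10 -> {split_team_paths(4, 10)} channels")
```

Doubling a team from 20 to 40 people roughly quadruples the channels (190 to 780), while breaking the 40 into four teams of 10 cuts that to 186, at the cost of the reorganization and ramp-up time described above.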

It’s possible that the new teams are being brought in for some very specific, targeted purposes, or to take on looming projects in the near future to keep the current development staff focused on the existing problems, but nobody’s saying:

Even now, administration officials are declining to disclose many details about the debugging effort. They will not say how many experts — whom they describe as “the best and the brightest” — are on the team, when the team began its work or how soon the site’s flaws might be corrected.

Uhm, no offense intended, but “the best and the brightest” probably already have other work to do. Or else they should have been brought in six months ago. Not that it would have helped with this project, which appears to have been doomed for some time. I’m sure the “best and the brightest” would have told them that the project was going to blow its schedule, scope, and budget.

Actually, “the best and the brightest” are probably no smarter than people already on the team. Calling them that is just rhetorical fluff. So I can’t help thinking it must really suck to be on the development team. They would have seen this train wreck coming for months. And we know they tried to warn officials about the problems. But now that disaster has struck, government officials are effectively calling them “mediocre and stupid.”

In Software Engineering, Sometimes Failure Is the Only Option

On day one of the roll-out, I explained that first-day glitches in a large production web site are meaningless. With only a few specialized exceptions (and some lucky ones) things always go wrong on the first day. It’s a normal part of the shakedown process, and not necessarily a reason to get upset. However, just because first-day glitches are normal, that doesn’t mean there aren’t also real problems with the site. Only time would tell us outsiders whether the early problems were roll-out issues or evidence of more serious defects.

It’s been two weeks, and now we know. HealthCare.gov is a software disaster, and insiders are starting to talk:

“These are not glitches,” said an insurance executive who has participated in many conference calls on the federal exchange. Like many people interviewed for this article, the executive spoke on the condition of anonymity, saying he did not wish to alienate the federal officials with whom he works. “The extent of the problems is pretty enormous. At the end of our calls, people say, ‘It’s awful, just awful.’ ”

That’s from an excellent short article in the New York Times by Robert Pear, Sharon LaFraniere, and Ian Austen that points to some of the reasons for the failure.

Most of the coverage of the problems at HealthCare.gov has repeated the point that the government had three years to get the website working. That may sound like a long time, but it’s not. I’ve been writing software for over 25 years, including about 8 years working on government contract projects and another 9 years helping to build healthcare enrollment websites. I think that three years is plenty of time for a large-enough team to write the software, build the website, load the data, test the system, and launch our national healthcare enrollment system.

The problem is that they didn’t really have three years.

Deadline after deadline was missed. The biggest contractor, CGI Federal, was awarded its $94 million contract in December 2011.

That leaves a little less than two years. And it’s only the beginning of the problem, because the Affordable Care Act is not a software requirements document. Much of it just established the regulatory process that would spell out the details of all aspects of the healthcare exchanges, including the software requirements. That took time. A lot of time, and maybe for some questionable reasons:

To avoid giving ammunition to Republicans opposed to the project, the administration put off issuing several major rules until after last November’s elections. The Republican-controlled House blocked funds. More than 30 states refused to set up their own exchanges, requiring the federal government to vastly expand its project in unexpected ways.

The result of all this was a very late start:

But the government was so slow in issuing specifications that the firm did not start writing software code until this spring, according to people familiar with the process. As late as the last week of September, officials were still changing features of the Web site, HealthCare.gov, and debating whether consumers should be required to register and create password-protected accounts before they could shop for health plans.

This explains a lot. Changing features during the last week before going live isn’t unusual on any website, but changing major operating rules — like whether you have to log in — is a sign of serious problems. (Unless they were aware early on that the rules might change and implemented it both ways.)

In addition to the late start and shifting requirements — problems common to many software engineering projects — large projects like these have their own distinctive problems. Historically, the most common killer of large software projects is integration, and the healthcare exchanges appear to have foundered there:

One highly unusual decision, reached early in the project, proved critical: the Medicare and Medicaid agency assumed the role of project quarterback, responsible for making sure each separately designed database and piece of software worked with the others, instead of assigning that task to a lead contractor.


While some branches of the military have large software engineering departments capable of acting as the so-called system integrator, often on medium-size weapons projects, the rest of the federal government typically does not, said Stan Soloway, the president and chief executive of the Professional Services Council, which represents 350 government contractors. CGI officials have publicly said that while their company created the system’s overall software framework, the Medicare and Medicaid agency was responsible for integrating and testing all the combined components.

These problems should have been obvious to project managers. And they were:

Confidential progress reports from the Health and Human Services Department show that senior officials repeatedly expressed doubts that the computer systems for the federal exchange would be ready on time, blaming delayed regulations, a lack of resources and other factors.


By early this year, people inside and outside the federal bureaucracy were raising red flags. “We foresee a train wreck,” an insurance executive working on information technology said in a February interview. “We don’t have the I.T. specifications. The level of angst in health plans is growing by leaps and bounds. The political people in the administration do not understand how far behind they are.”

The Government Accountability Office, an investigative arm of Congress, warned in June that many challenges had to be overcome before the Oct. 1 rollout.

“So much testing of the new system was so far behind schedule, I was not confident it would work well,” Richard S. Foster, who retired in January as chief actuary of the Medicare program, said in an interview last week.

The response from higher officials just kills me:

But [the chief website architect’s] superiors at the Department of Health and Human Services told him, in effect, that failure was not an option, according to people who have spoken with him.

Sorry, no. Software engineering just doesn’t work that way. No amount of willpower, positive thinking, or self-confidence will make a failing software project into a success. Neither will threats of unemployment. In fact, once a project is well into the development phase, decades of experience show that it’s almost impossible to turn around a project that is late and over budget.

In the early days of software engineering, we used to call this the software crisis. As computers got more powerful, and more able to communicate with each other, it became possible to run much larger software systems on them than ever before. But as the software projects got larger, more and more of them started to fail by falling behind schedule, going over budget, being riddled with defects, or all three. As I mentioned earlier, the integration phase was a big problem. Quite often what would happen is that the teams developing various parts of the software system would make good progress, but when they tried to integrate all the components into a working system, it would fall apart for reasons that were complex and hard to fix.

An even bigger problem arose if the requirements weren’t stable. The project teams would develop requirements documents, they would be approved by the customer (which might be internal), the developers would start coding the system, and then someone would discover that the requirements were incomplete, or wrong, or the customer would decide to change them. This was a huge problem: Changing requirements in the requirements document was easy, but changing requirements after coding had started was time-consuming and expensive.

A lot of effort in the early years was devoted to developing methodologies for capturing requirements and checking them for completeness and correctness, and also for developing thorough design documents that specified all the system interactions, so that software components would integrate smoothly. A few extremely well-run teams had managed to do this very well (e.g. the software team for NASA’s Space Shuttle), and the industry was focused on the idea of improving requirements and design processes as a way out of the software crisis.

This entire software development process — from requirements to design to coding to integration to testing to deployment — was referred to as the waterfall model (because of the diagrams). But if you look at all the major successful websites — Amazon, Facebook, Twitter, LinkedIn — it’s likely that none of them were developed this way.

What happened was that the dominant methodology of software development evolved into something called agile development. Software engineers decided to accept that unstable requirements are an inevitable part of the process and to eliminate the big integration step at the end. They start by building a very small piece of software that works. It is integrated, tested, and deployed (at least on a limited basis). Users get to see it and play with it right away, which allows them to give feedback, which is used to plan the next iteration of development. Each iteration of the development cycle takes somewhere between a week and a month. And at the end of each iteration, the development team has a full product that is integrated, tested, and ready to deploy.

Initially, the product is only shown to internal customers — development managers, product managers, company executives. As the developers keep iterating, it slowly acquires new functionality, piece by piece. Changing requirements are no longer a big problem, since reviewing and adjusting them is a built-in step at the end of each iteration, as developers plan for the next iteration. Bad ideas can be discovered and discarded early, and good ideas can be recognized and developed further.

At some point, the product is deemed good enough, and the software is released to the public. Often the initial release has reduced functionality and is released to a limited user group. As the product evolves through iteration after iteration and acquires new functionality, it gets released to larger and larger groups of users until eventually everyone can use it. This slow release method allows the development team to test their ideas in the real world, and it also reduces the stress of suddenly scaling up the system to full size.
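One common way to release to "larger and larger groups of users" is a percentage rollout: hash each user into a stable bucket and enable the new version for buckets below the current percentage. This is a minimal sketch of that general technique, not how any particular company does it; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib


def rollout_bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return rollout_bucket(user_id) < rollout_percent


# Widening the rollout from 1% to 10% to 100% keeps earlier users enabled,
# because each user's bucket never changes.
for percent in (1, 10, 100):
    print(percent, is_enabled("user-42", percent))
```

Because the bucket depends only on the user ID, anyone who got the new version at 10% still has it at 50%, so the user experience stays consistent while the load grows gradually.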

Unfortunately, when your product’s functionality and release date are both defined by act of Congress (and its regulatory agents), the iterative method doesn’t help much. Neither do the politics:

Nor was rolling out the system in stages or on a smaller scale, as companies like Google typically do so that problems can more easily and quietly be fixed. Former government officials say the White House, which was calling the shots, feared that any backtracking would further embolden Republican critics who were trying to repeal the health care law.

Critics have been eager to paint the disastrous roll-out as a failure of ObamaCare, but the problems are really the result of the government procurement process and of the inability of legislatively-defined software projects to benefit from modern design processes, especially in a political environment in which any major change in requirements would require the approval of a divided Congress.

Obamacare Day One Doesn’t Matter

I’m very skeptical about the effectiveness and long-term benefits of the healthcare system put in place by the Affordable Care Act (Obamacare). However, I’m not going to join in with the people who are proclaiming that Obamacare doesn’t work because they’ve visited HealthCare.gov, poked around for a bit, and found busted web sites.

It’s the first day of production operation for a highly-anticipated new website. Suddenly systems that have only ever been subject to synthetic test loads are being hit with a huge number of real-world visitors. That’s a hard thing to get right, and almost everybody screws it up a bit. (For example, Grand Theft Auto V is having a bit of a wobbly.)

Stuff just happens. Maybe somebody loads last week’s test data into the pricing tables instead of this week’s final data. Maybe the server farm that produces the drop-down list of plans for your location turns out not to scale — running the system on 10 times as many servers supports 10 times as many users, but running on 100 times as many servers only supports 35 times as many users, so the system hangs at that step, and everybody presses F5 to reload the page, doubling and tripling the load in minutes.
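That kind of sub-linear scaling can be modeled with Neil Gunther’s Universal Scalability Law, which discounts linear growth for contention (σ) and for coordination between servers (κ). The sketch below assumes no contention (σ = 0) and fits κ to the hypothetical "100 servers, 35 times the users" figure; the numbers are the illustrative ones from the paragraph, not measurements.

```python
def usl_capacity(n: float, sigma: float, kappa: float) -> float:
    """Relative capacity of n servers under the Universal Scalability Law."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


# Assume a pure coherence penalty and fit it so 100 servers give 35x.
SIGMA = 0.0
KAPPA = (100 / 35 - 1) / (100 * 99)

for n in (1, 10, 100):
    print(f"{n:>3} servers -> {usl_capacity(n, SIGMA, KAPPA):5.1f}x users")
```

With that single parameter, 10 servers still deliver almost 10 times the capacity, yet 100 servers deliver only 35 times. Under this model capacity peaks near sqrt((1 - σ)/κ), about 73 servers, so adding hardware beyond that point would make things worse, which is exactly when users start pressing F5.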

Maybe a database administrator accidentally places two highly-active database tables on the same hard drive. Maybe a jQuery graphics plugin used on only three pages turns out to load assets from a third-party server run by people who had no idea they were about to support a nationwide healthcare rollout. Maybe the batch job that should have loaded the new graphics assets stopped after only updating half the servers and nobody’s noticed because they’re all working on some other problem. Maybe the training video includes a cute kitten and Ellen DeGeneres tweets the link to 22 million followers, overloading the video servers. Those who can’t see the video press F5 to try again.

And so on. This kind of stuff just happens.

Sure, there are a few companies who get this sort of thing right, but they tend to fall into a few special categories such as (1) server farms with a business plan, such as Google or Amazon, for which a sudden surge of 30 million visitors per hour may not even be a reportable event for their automated provisioning system, (2) companies such as Microsoft and WordPress, which operate mature, well-understood systems where even the peak loads have predictable effects, and (3) companies where day one performance is critical, perhaps because the release is timed to a marketing plan and if the site isn’t there when customers come looking, they’ll give up and never ever come back.

The Obamacare sites don’t fall into any of those categories, so it shouldn’t be surprising that they’re a little glitchy on the first day. Everybody has at least three months to sign up, so there’s still plenty of time to get things right. If you’re having problems enrolling now, just try it again in about six to eight weeks. By then they should have eliminated any teething problems.

Note that I’m not saying that they will get things right. For all I know, the marketplace sites launched too early and are doomed to months or years of failure. But day one is just too soon to tell.

Google’s New Reader Mess

It looks like Google has decided to screw up Reader with a new design:

Google Reader Redesign

I guess they wanted it to have a more modern-looking design, and I suppose it looks nicer, but it’s kind of goofy from a usability standpoint. The biggest problem is that everything is, well, bigger. They’re following the modern web design trend of separating things on the page with space rather than graphical components.

The thing is, though, nobody visits a web page for its use of space. You visit a web page for its content. Now, your enjoyment of the content is affected by how it’s presented, of course, but the presentation should enhance the content, not hinder it or overwhelm it.

So what’s the content of Google Reader? Links to other content. Users of Reader want to be able to scan through dozens or hundreds of items to see what looks worth reading. That means a good page design for a feed reader should present as many links as possible, so users can scan them easily for something of interest. The new design simply doesn’t display as many links as the old one.

And here’s another thing they could do instead of filling the page with space: Let us see the full names of the blogs we’re reading. The column on the left can’t be resized, so I’m going to be left reading “Marginal Revolut…” and “Technology Liber…” I can’t remember if the old page design occasionally cut something off, but it’s certainly become more of a problem now.

Oh, and the scrollbar is slightly narrower, making it slightly harder for me to click. And the scroll thumb — the part that moves up and down — doesn’t appear until my mouse is over the scroll bar, which means I can’t position the mouse vertically until I’ve got it positioned horizontally.

It’s like one of those weird buildings, where all the architecture critics ooh and aah over how swirly and unconventional it is, and no one seems to be noticing that the offices are cramped, there aren’t enough bathroom stalls, and the roof leaks.

The Only Known Requirement

There’s an animated video making its way around the legal blogosphere in which a cynical older lawyer tries to convince an idealistic prospective law student not to go to law school. As a non-lawyer, I’m probably missing half the jokes, but it’s still pretty funny.

I was more fascinated, however, by the web site used to create the video, which provides tools for anyone to make a video in a similar style. Once you register for a free account, you can use a simple interface to create the shooting script. You can change the characters and setting, and you can annotate the script with facial expressions, animations, sound effects, and changes in camera angle. When you’re ready to see what you created, the site can generate a low-res version of the video for you to preview. Once you’ve got something you like, you can re-render a high-quality version for downloading and publication.

I had to give this a try. In keeping with the theme of the first video, I made it about what I do for a living. I don’t know what the job market is like for a beginning software developer, but I’m not nearly as cynical about my job as these lawyers are about theirs. Still, there are some annoying surprises for people just out of school. After a few years, the surprise factor is gone, and only the annoyance remains.

So here’s my video of a conversation in which Mark, a software engineer, finds out about his next software project from Ted, the boss. My clients are too smart to waste my time and their money this way, but back when I was an employee at a mid-size corporation, I had a depressing number of conversations like this. You see, a lot of people launch software projects with almost no thought to the details, except for one curiously specific requirement…

Followup On Sony Vegas Problems

When I figured out that my Sony Vegas video editing software was crashing because of a bogus file date, I filed a detailed problem report with Sony Creative Software, and I emailed a short description of the problem to Premiumbeat, the supplier of the music files (created by a guy named Styve) that had bad dates on them.

I got a reply email from Premiumbeat early the next morning:

Hello Mark,

I’m am sorry you had problems with one of our music files.

I had no idea Styve was that old!

Seriously this is a most bizarre mistake. I checked the creation date of the file and it says November 30 1979 on my end. So I don’t know what to think of it.

I have sent your comments to both the composer and to our technician. We’ll do what we can to correct this.

Thank you for taking your time to let us know about this issue with all the details.

We appreciate.


Gilles Arbour

Fast, nice, good-natured, and thankful for pointing out a problem that would discourage customers from buying more of their product.

(Although, the 1979 date on the files isn’t much of an improvement, unless Styve was really pushing the envelope with Apple II sound technology…)

Sony Creative Software took a day longer to respond to the problem, and their message was professional, but, well…

Hi Mark,

Thank you for contacting Sony Creative Software, and thank you for the update on the problem. Please let us know if you have any further questions or concerns on this issue.

If you still have a follow-up question on this particular incident, please feel free to update it. If you have a completely different question, please create a new incident.


Padraic C.

Sigh. Is it too much to expect a “thank you for sending us a problem report that described how to reproduce the problem in detail and pointed us to exactly the place where the problem was occurring”?

Why Software Testing Is Hard

Now that I’m experimenting with video for the blog, I’m learning how to use video editing software. Right now I’m trying out Sony Vegas. I like it a lot; it seems intuitive and fairly powerful.

One of the more interesting features of Vegas is a tool called Media Manager, which helps you index and track your media. That’s important because video files are huge (the original file for my drive home is 3.5 GB), and I won’t be able to store them all on my hard drive. (Although, terabyte USB drives are less than $200…) I’ll have to burn them off to DVD, and it would be nice to have a database of what’s stored where.

While I was experimenting with Media Manager, I tried to load the .WAV file that contains the music playing in the background of my last video, and Vegas crashed.

A little experimentation showed that it was only those music files that crashed Vegas, not any of the other sound files or any of the video files. Also, the problem went away when I disabled Media Manager.

Each time Vegas crashed, it popped up an error message box that said “This application has encountered a fatal exception and will close.” That wasn’t very helpful by itself, but there was a Details button, so I clicked it. With most Windows applications, this would show some ugly bit of hex code and address offsets, useful only to someone with access to the source code. But parts of Vegas (or at least Media Manager) are apparently written using the .NET Framework, because this was a standard .NET exception stack trace, and .NET has a much more helpful standard of error reporting.

Media Manager uses an embeddable desktop version of Microsoft SQL Server 2005 as its database. The library classes named in the trace made it clear that Media Manager was calling the .NET Framework’s SQL Client code to access the database, and that the call was failing because it used a date value that was not a valid SQL Server date. The .WAV files I downloaded must have had a bad date in them somewhere, and Media Manager was blowing up when it tried to use that date in a SQL query.

I poked at the .WAV files a bit, and I think I found the problem. The SQL Server datetime data type can hold a range of dates from January 1, 1753, through December 31, 9999. The .WAV files, however, were all timestamped with a creation date in the year 1601.
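Where the 1601 timestamps came from isn’t known. One plausible explanation, offered here as an assumption rather than a diagnosis, is a zeroed Windows FILETIME: that format counts 100-nanosecond intervals since January 1, 1601 (UTC), so a timestamp of 0 decodes to exactly that date. The minimal Python sketch below shows the conversion and checks the result against the SQL Server datetime range given above; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Windows FILETIME counts 100-nanosecond ticks since this epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1)

# Range of the SQL Server datetime type (its resolution is about 3 ms).
SQL_DATETIME_MIN = datetime(1753, 1, 1)
SQL_DATETIME_MAX = datetime(9999, 12, 31, 23, 59, 59, 997000)


def filetime_to_datetime(ticks: int) -> datetime:
    """Convert a FILETIME tick count (100 ns units) to a naive UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def fits_sql_datetime(value: datetime) -> bool:
    """True if value can be stored in a SQL Server datetime column."""
    return SQL_DATETIME_MIN <= value <= SQL_DATETIME_MAX


if __name__ == "__main__":
    zeroed = filetime_to_datetime(0)  # an unset (zero) FILETIME
    print(zeroed, fits_sql_datetime(zeroed))  # 1601-01-01 00:00:00 False
```

If that assumption holds, the files weren’t really stamped with a meaningful 1601 date at all; some tool simply wrote an empty timestamp, and every reader that decodes it faithfully lands 152 years before the earliest date SQL Server’s datetime type accepts.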

A lot of software bugs follow this pattern: a defect in one program (whatever was used to create the .WAV music files) generates data that exposes a defect in another program (in this case, crashing Sony Vegas).

In cases of invalid data like this, you normally don’t want the program to just crash. It should either fix the problem, report the problem to the user, or both. In this case, it’s an understandable mistake. If I were a developer or tester at Sony Creative Software, I doubt I would have given much thought to how the software should handle a file that purported to have been created 350 years before the invention of the computer.
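Here is a minimal Python sketch of the “fix it, report it, or both” behavior, assuming a program that stores file creation dates in a SQL Server datetime column. The function name `sanitize_creation_date` and the choice to store an out-of-range date as unknown (NULL, or `None` in Python) are illustrative assumptions, not a description of how Media Manager actually works.

```python
from datetime import datetime
from typing import Optional, Tuple

# Range of the SQL Server datetime type.
SQL_DATETIME_MIN = datetime(1753, 1, 1)
SQL_DATETIME_MAX = datetime(9999, 12, 31, 23, 59, 59, 997000)


def sanitize_creation_date(
    filename: str, created: datetime
) -> Tuple[Optional[datetime], Optional[str]]:
    """Return (value_to_store, warning).

    In-range dates pass through unchanged with no warning. Out-of-range
    dates are replaced with None (to be stored as NULL), and a warning
    message is returned so the caller can tell the user instead of crashing.
    """
    if SQL_DATETIME_MIN <= created <= SQL_DATETIME_MAX:
        return created, None
    warning = (
        f"{filename}: creation date {created.date().isoformat()} is outside "
        "the SQL Server datetime range; storing it as unknown."
    )
    return None, warning


if __name__ == "__main__":
    value, warning = sanitize_creation_date("music.wav", datetime(1601, 1, 1))
    print(value)    # None
    print(warning)
```

Storing the date as unknown, rather than clamping it to January 1, 1753, avoids replacing one bogus timestamp with another that merely looks legal.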