Most engineers will start with a complex task, model it, and then break it up into subroutines that execute in a linear fashion, not unlike a cook following a recipe or a farmer preparing and planting crops. The challenge with making software multi-threaded is that sometimes you simply can't parallelize many of the tasks, the same way a farmer can't prepare the soil, plant the seeds and apply the fertilizer simultaneously. Not, at least, without stepping back, abstracting the entire process and analyzing the problem from many angles, looking for creative efficiencies. After doing such an exercise the farmer might come up with a tractor that makes one pass across acres of farm, rototilling, seeding and fertilizing in one shot. The same has to be done with software. Simple routines that solve simple problems can be hacked out in no time. Multithreaded applications that tackle complex problems can take months if not years to model, abstract and design.
Yes, you will always have tasks that must be done in order. But within tasks, you will usually be able to divide up work.
In the farmer's example, let's say you have to prepare the soil, then plant the seeds, then apply fertilizer. Well, you could have 10 farmers prepare the soil (parallel!), then have the same 10 farmers plant the seeds, then the same 10 farmers could apply fertilizer. In a large field, you'll have multiple rows of crops, so the individual tasks, which must be performed sequentially, can each be broken into parallelizable chunks of work.
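To make the 10-farmers idea concrete, here's a rough Python sketch. The function names (prepare_soil, plant_seeds, apply_fertilizer, farm_field) and the row counts are made up for illustration. The three phases still run in order, but inside each phase the rows are handed out to a pool of 10 workers.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_FARMERS = 10   # hypothetical worker count
NUM_ROWS = 40      # hypothetical number of crop rows

def prepare_soil(row):
    # Stand-in for real work on one row.
    return f"row {row}: tilled"

def plant_seeds(row_state):
    return row_state + ", seeded"

def apply_fertilizer(row_state):
    return row_state + ", fertilized"

def farm_field(num_rows=NUM_ROWS, num_farmers=NUM_FARMERS):
    rows = list(range(num_rows))
    with ThreadPoolExecutor(max_workers=num_farmers) as farmers:
        # Each map() call is one phase. Rows within a phase are worked
        # on in parallel; list() waits for every row to finish before
        # the next phase starts, so the phases stay in order.
        tilled = list(farmers.map(prepare_soil, rows))
        seeded = list(farmers.map(plant_seeds, tilled))
        fertilized = list(farmers.map(apply_fertilizer, seeded))
    return fertilized

if __name__ == "__main__":
    field = farm_field()
    print(field[0])    # row 0: tilled, seeded, fertilized
    print(len(field))  # 40
```

One caveat: in standard CPython, threads won't speed up CPU-bound work because of the global interpreter lock, so for real number crunching you'd probably swap in ProcessPoolExecutor from the same module. The shape of the code stays the same.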
In video games, you have to calculate the physics, update the universe, then draw the graphics.
But the physics engine can be made parallelizable, as can the drawing of the graphics.
Most of the programs that crunch large amounts of data do so on highly parallelizable tasks. There are exceptions, but for the most part, software can be made parallelizable with the right approach. It just takes more forethought. But with multiple cores becoming the norm, this type of forethought will have to become standard practice. And once you've done it a few times, it becomes more and more natural to figure out how a seemingly serial set of tasks can be broken down into independent, parallelizable threads, with a few events being used to synchronize the parts that have to be in order.
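As a small illustration of "a few events being used to synchronize the parts that have to be in order", here's a sketch using Python's threading.Event. The run_frame function and the fake physics and render steps are invented for the example. The render thread does its independent setup right away and only blocks at the one point where it really needs the physics results.

```python
import threading

def run_frame():
    results = {}
    physics_done = threading.Event()

    def physics():
        # Stand-in for the physics step: compute new positions.
        results["positions"] = [x * 2 for x in range(5)]
        physics_done.set()   # signal that positions are ready

    def render():
        # Independent setup can overlap with the physics step.
        results["setup"] = "buffers ready"
        # The one ordering constraint: wait for physics before drawing.
        physics_done.wait()
        results["frame"] = f"drew {len(results['positions'])} objects"

    threads = [threading.Thread(target=render),
               threading.Thread(target=physics)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(run_frame()["frame"])  # drew 5 objects
```

The render thread is started first on purpose: the result is the same either way, because the event, not the start order, enforces the dependency.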