Zelaven's Programming


This page is under construction.


These are my thoughts on how I do programming.

They are subject to change, however some of these ideas I believe to be fundamentally solid.



Code Conventions

Code is what we write, and I find that every part of writing code that doesn't directly affect the semantics of the program ends up being done in as many ways as possible.

Now, what we should care about is writing, agreeing on, reading, modifying, executing, and debugging the code. Clearly, writing, reading and modifying are directly affected by code convention, but I believe agreeing on the code is what most people have issues with when it comes to this point.

I will here detail some things I believe to be useful. While they are merely my opinions, I will provide my reasoning as to why I think the way I do.

Line Length

This point is painful because our work is quite archaic. We write strange text in glorified txt files with odd file extensions.

While it would be great if we could move away from plaintext for code storage, the reality is that it is what we have to work with right now. It will probably stay around in some capacity for as long as C and other languages continue to be tightly integrated into workflows and build systems.

This is where differences in editors and their ergonomics matter greatly. Depending on your team and the tools that your team uses, this point might not be very important. Some people just hit some autoformatting button and that is the end of that. I don't have an autoformatter, nor do I intend on investing time and energy into one when some simple habits solve all my problems.


What I do is that I make a cut at the 80 character mark. While this point has been belabored to death many times, I find that 80 characters always fits half a screen with reasonable readability in simple line-wrapping editors like Vim.

People that work only within the confines of an integrated system like a big name IDE may not quite see the point in this. It's quite simple; I often work off-the-cuff and on machines other than my own, in which case it's valuable to be able to edit the code effectively without reliance on a fancy IDE.

So how do I format within 80 characters? I format over multiple lines. Procedure parameters can go on their own lines, providing a formatting that is easier to read as a list anyway. I do the same at call sites. Note that I only do this where it makes sense, as I don't want to needlessly dilute the usage of the screen space. If an expression fits within the 80 characters then I'm inclined towards one-lining it (this includes indentation, of course).
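
To make this concrete, here is a minimal sketch of how I break a declaration and a call site over multiple lines. All of the names are made up for illustration.

    /* The parameter list reads as a list and stays within 80 columns. */
    typedef struct Renderer Renderer;
    typedef struct Texture Texture;

    int renderer_draw_sprite(
        Renderer *renderer,
        Texture *texture,
        float x,
        float y,
        float rotation_radians);

    void draw_player(Renderer *renderer, Texture *player_texture)
    {
        /* The call site gets the same treatment when it would not fit. */
        int error = renderer_draw_sprite(
            renderer,
            player_texture,
            10.0f,
            20.0f,
            0.0f);
        (void)error; /* error handling omitted in this sketch */
    }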


Back to top.

Tabs Vs. Spaces

I use tabs.

When people praise spaces, they talk about how it makes your code formatting consistent and how important it is that code you write on your own machine looks the same on the machine of another person.


Firstly, any reasonable code formatting style doesn't intrinsically care about your choice of tabs or spaces for indentation.

Secondly, unless you are using an identical setup (editor/IDE, color scheme, highlighting and all other possible visual customizations), it isn't going to look the same regardless.

At my workplace we all simply use whatever editors and customizations we want in order to make the workflow of each individual as effective as possible.

At the time of writing, the number of different editors equals the number of people. This has never led to any problems.

The arguments in favor of spaces are completely void to me.


To me, a level of indentation is a logical unit. A single tab for a single level of indentation is what works for our workplace and we can all work with the code with the level of indentation that is easiest for each individual to read, prioritizing the efficiency of each of us.

Also, if reading code in another person's setup is that difficult for you then you should simply do it more. Pair programming can be very useful sometimes, and you have to read on another person's screen if you're the second person.


Back to top.

Naming

You can name types, variables, etc. however you think makes the most sense to you.

One thing that is important, however, is consistency.


If you ever read any of my code you will notice that I name all my types as Names_With_Underscores.

If you find anything that looks like that in my code you will, from the name itself, know that it is a type and not a variable or a procedure or anything else.

If you get inconsistent then your code will get mentally burdensome to read, simply because it lacks those "mental markers".


Sometimes you will have to use libraries that use other schemes, in which case it is perfectly fine to use typedefs in your headers and renaming macros in your code files to make things consistent.

Do note, however, that being able to look things up in the documentation for the library is important.

Therefore, if I use a library that has a type called foo_bar, then I would typedef it to Foo_Bar. This way most search engines should be able to find the right thing despite the difference in capitalization.

In some cases a library uses names that simply won't fit nicely into your naming scheme. In this case I think it's better to use typedefing than to use the default type name. This will incur a one-off confusion of the type being named differently, but the types won't look completely out of place.
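
As a minimal sketch of this, assuming a hypothetical library header "foo_lib.h" that declares a type foo_bar and a procedure foo_bar_create:

    /* In my own header: rename the library type to fit my convention. */
    #include "foo_lib.h"        /* hypothetical third-party header */

    typedef foo_bar Foo_Bar;    /* my convention: Names_With_Underscores */

    /* In a code file, a renaming macro can do the same for a procedure. */
    #define Foo_Bar_Create foo_bar_create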


Many programmers today appear to believe that if two programs are semantically identical, then the one that took fewer button presses to type is preferable because it will supposedly take less time to write.

That is, they will justify things like single-character variable names and the like.

Please, for the sake of anyone who has to read your code, including yourself, don't do that.

Variable, type and procedure names should be descriptive.

Sometimes an iteration variable can be reasonably named 'i' without causing any confusion, but you almost always want variable names that are at least 5-10 characters long to be able to communicate what something is.

If you have a list of entities, then it might be deducible from the type of the variable what it is, but there is no defensible reason to name it 'el' instead of 'entity_list'.

The time it takes you to type the code now will be repaid at least tenfold by the time saved understanding the code later, even if you work alone. Code is read significantly more often than it is written.


Back to top.

Practical Aspects of Programming

I have learned some things about practical programming through experience and through testing the ideas of other people, which I will share with you here.

The common thread you may pick up in this section is that these practices are all oriented around solving problems. I do not suggest people go and do things that are merely thought up to meet some abstract criterion.

What I describe here constitutes the foundational elements of how I approach programming, so what I preach here is what I practice.

Specific and General Code

Imagine for a moment a piece of code that opens a window and draws to it. That code is likely written with a library such as SDL. That is, in my opinion, a bad long-term solution.

The problem is that you have what I like to call "specificity inversion" in code like that. To explain what I mean by that, let's look at what happens.

When you call SDL to open a window, the code looks the same on every platform. This is nice in that your code is directly portable between platforms (aside from platform-specific bugs in the library, e.g. SDL 1 has keyboard issues on Linux). The library abstracts away the platform and implements a consistent, cross-platform interface for you to program against.

The problem is that you get "lowest common denominator syndrome" in the code. The library cannot in a platform-independent way expose to you platform-specific features. If a platform has a great feature for audio that isn't available on other platforms then that's either just tough, or the usage code has to handle added complexity for each platform-specific feature it may touch.

Casey Muratori shows another way to handle platform code in Handmade Hero. Instead of the game code calling a platform library, the game is structured so that the game code is the library, and the platform code, which he easily writes himself, calls the game. In this manner he both eliminates the need for abstraction and the need for shoe-horning at the same time. In fact, the shoe-horning is a direct consequence of the abstraction.
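
Here is a rough sketch of that inversion. This is only the spirit of the idea, not Casey's actual API, and every name in it is made up.

    /* game.h - the general code; it knows nothing about the platform. */
    typedef struct
    {
        int width;
        int height;
        unsigned int *pixels;   /* backbuffer memory provided by the platform */
    } Offscreen_Buffer;

    typedef struct
    {
        float move_x;
        float move_y;
    } Game_Input;

    void game_update_and_render(
        Offscreen_Buffer *buffer,
        const Game_Input *input);

    /* win32_platform.c / linux_platform.c - the specific code. Each one opens
     * a window with whatever the platform offers, gathers input, and calls
     * into the game every frame:
     *
     *     while(running)
     *     {
     *         poll_platform_events(&input);
     *         game_update_and_render(&backbuffer, &input);
     *         present_backbuffer_to_window(&backbuffer);
     *     }
     */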

Specific "Always" Calls General

All rules in programming are rules of thumb. However, this rule is quite near absolute in my experience. When code conforms to it, the usage code is offered maximum flexibility in deciding how to implement business logic while leveraging reusable code.

When general code calls specific code, it involves an abstraction designed to hide the specifics. This is because general code can only call into general interfaces, as it would otherwise no longer be general. This means that things like threading implementation, windows, audio, IPC mechanisms and everything else that is platform or environment dependent gets hidden away. To make matters worse, libraries that do this commonly violate other important practices in this pursuit, often relying on things like global locks or secretly managing a plethora of resources outside of your control.


An example offender is OpenSSL. The interface looks nice until you try to set up your own threading scheme and realize that you get a segmentation fault inside a library call. It turns out that in order to hide details, probably out of fear of usage code programmer incompetence, they do things like lazy initialization, where things are initialized on first use. While this doesn't sound that bad in and of itself, it turns out that they are lazily initializing global variables in a multithreaded environment. Instead of letting the usage code take care of threading-related problems and providing easy-to-use APIs to do what you normally want, they assume that on a UNIX system you will use pthreads, and you will segfault if you don't. If you want to do something else then you are forced to download their repository and build it yourself, with a specific flag, so that you can be graced with permission to implement some procedures that they need for the library to work.

What could OpenSSL have done instead? They could perhaps have let you initialize the random bit generator yourself and pass it to the RSA API. This is what MbedTLS does. It gives the usage code more control and invalidates the need for the immense complexity that is crammed into the library to handle multi-threaded lazy initialization. It allows the programmer the flexibility to use alternatives to the given random bit generator, which you may object to as dangerous, but may be applicable in some scenarios that nobody had thought of. Maybe some researchers or cryptographic engineers would find it useful.
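
To illustrate the shape of such an API, here is a hypothetical sketch in which the usage code owns and initializes the random bit generator and hands it to the RSA routines. None of these names are real OpenSSL or MbedTLS identifiers.

    #include <stddef.h>

    typedef struct Rng Rng;          /* caller-owned random bit generator state */
    typedef struct Rsa_Key Rsa_Key;

    int rng_init(Rng *rng, const unsigned char *seed, size_t seed_length);
    int rsa_generate_key(Rsa_Key *key, unsigned int bits, Rng *rng);
    int rsa_encrypt(
        const Rsa_Key *key,
        Rng *rng,                    /* padding needs randomness too */
        const unsigned char *plaintext,
        size_t plaintext_length,
        unsigned char *ciphertext,
        size_t ciphertext_capacity);

    /* No hidden globals and no lazy initialization; the usage code decides
     * how (or whether) the Rng is shared between threads. */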


So what should you do when writing your own code? First of all, I recommend that you watch the first few episodes of Handmade Hero. Everything else I can tell you is extrapolated from that, so go to the primary source and have a look.

Note that this section is closely related to The Client Provides the Resource and The Hierarchy Of Partial Solutions, but feel free to read in any order.

To start with a recap of how Casey does platform code: Have the game code (general, business logic) function as a library that is called by the platform code (specific, plumbing code).

Next up is the hard part; don't make abstractions that aren't absolutely necessary and don't make them so abstract that they violate The Client Provides the Resource. Take a client/server application where you have your own encrypted protocol as an example. It's tempting to abstract away the crypto library used. Why would the user of the library care anyway? It seems like a great idea, but it turns out to be painful in practice. Not only do you inherit all the problems that the libraries you try to support may have, you also inherit those problems in all the instances where a particular library doesn't have them.

Remember that OpenSSL does lazy initialization and MbedTLS doesn't? Well, if you want to support both you need to decide how you do the initialization in case the user wants to use MbedTLS. Lazy initialization adds a (small) cost to every operation as it checks for initialization, so you may get the lowest common denominator creeping in on you there. The other option is to have a mandatory init call in your API that does nothing if the user is using OpenSSL and initializes a global variable in the MbedTLS case. In any case you get nothing but problems.

It's simply better to explicitly expose to the user an API tailored to the dependencies the user selects, in which case the user can provide the resources specific to the dependencies. If the users then want to abstract the dependencies away then it's their own problem. This doesn't sound very helpful towards the user, but any specific program will rarely need to support multiple dependencies, in which case the user has to devise a strategy anyway. Making it clearer what the data flow is in the usage code seems like a much better tradeoff to me.
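
A hypothetical sketch of what that could look like for the encrypted protocol example, assuming the MbedTLS backend was selected; mbedtls_ctr_drbg_context is MbedTLS's DRBG type, everything else is made up.

    #include <mbedtls/ctr_drbg.h>

    typedef struct Secure_Channel Secure_Channel;

    /* The backend is named explicitly and the caller provides its resources. */
    int secure_channel_open_mbedtls(
        Secure_Channel *channel,
        const char *host,
        unsigned short port,
        mbedtls_ctr_drbg_context *rng);   /* seeded by the usage code */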


Back to top.

The Client Provides the Resource

This point is about affording maximum control to the usage code.

Let us consider a "create object" procedure that allocates a struct on the heap and returns the address to the user. This "create object" procedure takes care of initialization and whatever else needs to happen, including the memory allocation. Sounds good?

There is one problem: by maximizing the amount of code that lives inside the "create object" procedure, we decrease flexibility. When the loss of flexibility impacts resource management, there is a problem.

The "create object" procedure makes multiple implicit assumptions: there is never any reason to allocate the object on the stack, the extra word of memory that malloc spends on each object is insignificant in all possible applications, cache locality doesn't matter, and the possibility of memory fragmentation from using malloc for these allocations doesn't matter.

Compare this to an "init object" procedure, which does all the things "create object" does, except the user provides a pointer to the object, which the user has allocated. The flexibility afforded to the user now allows the object to be stored on the stack, in arrays, as a direct struct or union member, etc. I do not see any scenarios where this is not the desirable tradeoff.
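
A minimal sketch of the two styles, with a made-up Object type:

    #include <stdlib.h>

    typedef struct
    {
        int id;
        float position[3];
    } Object;

    /* "create object": the procedure owns the allocation. */
    Object *object_create(int id)
    {
        Object *object = malloc(sizeof(*object));
        if(object != NULL)
        {
            object->id = id;
            object->position[0] = 0.0f;
            object->position[1] = 0.0f;
            object->position[2] = 0.0f;
        }
        return object;
    }

    /* "init object": the caller provides the storage, wherever it lives -
     * on the stack, in an array, or as a struct or union member. */
    void object_init(Object *object, int id)
    {
        object->id = id;
        object->position[0] = 0.0f;
        object->position[1] = 0.0f;
        object->position[2] = 0.0f;
    }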


Letting the client provide the resource allows the user to simplify and specialize resource management tremendously, e.g. by allocating on the stack. When done pervasively, including by the user, resource management becomes consolidated close to the program entry point, which makes it much easier to get an overview compared to when resource management is scattered across the entire codebase.


In addition to these benefits, I am also of the opinion that it makes programs easier to understand. When a library manages resources itself, I find that it takes an unreasonably long time to work through the details of its API to find out which procedures need to be called, and when, to avoid leakage.

When the user provides the resources then the leaks are very visible right there in the usage code, just like any other usage code mistake. Obscuring resource management also obscures resource leaks. Therefore, resource management should be as visible as possible.

The RSA API of OpenSSL comes to mind. Which bignums are owned by whom, and when, is at least documented in this case, but if I could handle the memory management myself then it would be entirely unnecessary.

Some may object that letting the usage code of a crypto library manage its own RSA keys is a bad idea, but there are so many more ways to mess up, especially if the workings of the library are obtuse, that I think that is the least of our concerns. Besides, where did the keys come from in the first place? Probably a file that is more vulnerable to attack than recycled memory. You need to be realistic about your threat models, and if you work with anything using cryptography then you need to know how to manage your resources safely, including your cryptographic keys, without blindly relying on libraries. Depending on your application the library may not even know about all the places the key exists in some form.


Back to top.

Resource Management using Single Return

Memory management scares people away from languages like C, but I don't find it very difficult. Libraries with terrible APIs can make it a terrible job, but code that is entirely within my own control is a breeze in this regard.

By adhering to The Client Provides the Resource the amount of resource management I need to do is often relatively small, and whenever I do have to manage resources in the usual "allocate at entry and free on return" it's usually not that big a problem.

Most code I see that manages resources in this manner often has multiple levels of conditions. If there are two resource acquisitions and three possibly-failing procedure calls in there, then there is a stack of five conditions, each with their own else-cases that need to handle the specificities of each failure point, including freeing resources.

This is also what I have personally seen people point at as what makes resource management fragile in C. I also have this feeling that people attribute this to be intrinsic to C rather than intrinsic to how the code is structured, which I find fallacious. I write my resource management code differently.


What I like to do is to introduce a boolean "success" variable and declare my resource variables at the top, sometimes in an anonymous struct called "resources". I initialize the resources to some sort of "not allocated" value, e.g. NULL for pointers to allocations or -1 for file descriptors. At the end of the procedure I check the values of all these variables to determine if they should be freed.

The way I write the code that follows is in a todo-list-like form. Each block of code that needs to be executed is formatted in a column rather than nested at the end of the previous one. Each code block is guarded by "success == true", where "success" is set to false if any code block encounters an error, under the same conditions that would result in entry into an else case in the "nested conditions" style.
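
A minimal sketch of what this looks like in practice, using a made-up procedure that opens a file and reads it into a freshly allocated buffer:

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    bool process_file(const char *path, size_t buffer_size)
    {
        bool success = true;
        struct
        {
            FILE *file;
            unsigned char *buffer;
        } resources = {NULL, NULL};

        if(success)
        {
            resources.file = fopen(path, "rb");
            success = (resources.file != NULL);
        }
        if(success)
        {
            resources.buffer = malloc(buffer_size);
            success = (resources.buffer != NULL);
        }
        if(success)
        {
            size_t read_count =
                fread(resources.buffer, 1, buffer_size, resources.file);
            success = (read_count > 0);
            /* ... work on the buffer ... */
        }

        /* Single cleanup point: free whatever was actually acquired. */
        if(resources.buffer != NULL)
        {
            free(resources.buffer);
        }
        if(resources.file != NULL)
        {
            fclose(resources.file);
        }

        return success;
    }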


In my style with the single return at the end, with a failure simply skipping the remaining code blocks, it becomes easy to verify whether or not a resource is properly managed. It gives more or less the same end result that a defer statement can achieve, but with all the flexibility you may ever want. Admittedly, the flexibility only rarely becomes relevant, but with my programming style this kind of resource management only rarely becomes relevant in the first place.


Back to top.

Optimization

In programming, the word optimization immediately leads one to think of neckbeard wizardry that makes a procedure fast... somehow.

However, there is a little more to it. Of course, taking a deep dive into how many cycles are used by different sequences of machine instructions and profiling for several days is an option, but not the only one.

In the lecture Philosophies of Optimization, Casey explains that there is another, important aspect of optimization. It may sound obvious, but most programmers routinely do the opposite: don't make your solution more complicated than it needs to be. This is called non-pessimization. The lecture is only about 18 minutes and extremely valuable, so I highly recommend it.


I don't have much experience with hardcore, machine-instruction level optimization, as it hasn't proven itself necessary to me often. I have, at the time of writing, only done it once, because I found it interesting to do in relation to writing some fixed-point calculation code.

I mostly practice non-pessimization, which helps performance as well as all the other things a programmer should care about; readability, modifiability, and so on.


Back to top.

Dealing With Shoddy Type Checking

Shoddy type checkers, or the complete lack of type checking, can be quite the annoyance. It may surprise you that I will suggest Hungarian notation, but that is because you haven't heard all there is to it.

Joel Spolsky explains in a post on his blog called Making Wrong Code Look Wrong (archive.org) that there are two Hungarian notations. The original prefixes names with information of semantic importance, while the more well-known one uselessly prefixes them with their types.

If you want to know more, then read his blog post.
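
As a small taste of the original "Apps" Hungarian, here is a sketch in the spirit of the example from that post, where the prefix encodes a semantic property (unsafe, raw user input vs. safe, HTML-encoded) rather than the machine type. All of the types and procedures here are made up.

    typedef struct Request Request;
    typedef struct Response Response;

    /* Returns raw user input. */
    const char *request_get_field(Request *request, const char *name);
    /* Returns an HTML-encoded copy. */
    const char *html_encode(const char *us_text);
    void response_write(Response *response, const char *s_text);

    static void handle_comment(Request *request, Response *response)
    {
        /* "us" = unsafe, "s" = safe (already HTML-encoded). */
        const char *us_comment = request_get_field(request, "comment");
        const char *s_comment = html_encode(us_comment);

        response_write(response, s_comment);
        /* response_write(response, us_comment); would look wrong at a glance. */
    }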


Back to top.

Logic Density and Readability

This is one of those points that is a little vague and doesn't have any hard answers. Uncle Bob identifies that code can be difficult to read and understand if it is all very verbose and condensed into a single place (see his lecture "Clean Code"). However, he states that each procedure should do only one thing, by which he means that it should be impossible to split the procedure into two smaller ones, and he claims that this solves all code reading problems.

Here's the kicker in my opinion: while he's right that reading a very verbose, 3000-line procedure called "gi" can be somewhat arduous, replacing it with a mass of tiny procedures adding up to a similar scale will never be a great idea, no matter how many groupings of code you build up around it.


Let's unpack the problem of a long procedure. Let's say that it's your main procedure and you need to initialize a few subsystems, parse some configuration files and then you have some sort of primary loop that handles interaction with the user and other events.

You may already imagine that this main procedure will become lengthy. In this imaginary code, it's not difficult to also imagine that it will be difficult to reconstruct the list of things it does from skimming it.

Now, let's imagine that this same code has been atomized so that every single thing this code does is more or less deferred to the leaves of a giant call tree. Here the challenge is the inverse: finding out what the code does, concretely, to accomplish its task.


It should be obvious that I think some middle-ground approach is required. I also think it isn't possible to define any objective metric that will always work for this. What we can do is try to understand how this all works.


Looking first at code atomization, the problem is clearly that the implementation logic becomes diluted. When you break every sequence of operations into tinier pieces, it's clearly more difficult to determine what the sequence of operations is. While this may seem unimportant to some, I would argue that it's perhaps the most important thing to understand in your program. After all, bugs in your program are wrong sequences of operations, so you want to be able to determine what they are.


Now, while John Carmack suggests inlining code when possible (text available here (archive.org)), I wish to argue that moving things out into separate procedures can be very helpful even when a procedure is only called once. There are many other good points in there, so give it a read as well.

The long, verbose procedure with many details is probably difficult to get an overview of, which is likely where the "it must fit on your screen" idea came from (but what about different screen sizes?). Now, you may notice that the implementation logic is maximally visible. It's all there in sequence, after all. The thing we need to recognize here is that the initialization of two completely disjoint subsystems has no overlap. The implementation logic has a maximal density, and won't become denser by including things that are independent from each other in the same procedure. Imagine that you had put them in separate procedures that can be called in either order, in which case the program would be semantically equivalent. It should be clear that there is no utility in verbosely presenting those implementations next to each other.

By splitting out the disjoint pieces of the main program code into procedures it suddenly becomes clearer what the structure of the main program is. That is, the structural logic of the program becomes clearer.
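
A small sketch of that middle ground, where main shows the structural logic and the disjoint details live in their own procedures; all of the subsystems and procedures are made up.

    #include <stdbool.h>

    typedef struct { int placeholder; } Config;
    typedef struct { int placeholder; } Audio_System;
    typedef struct { int placeholder; } Render_System;

    void parse_configuration(Config *config, int argc, char **argv);
    void audio_system_init(Audio_System *audio, const Config *config);
    void render_system_init(Render_System *renderer, const Config *config);
    bool handle_events(Render_System *renderer, Audio_System *audio);
    void audio_system_shutdown(Audio_System *audio);
    void render_system_shutdown(Render_System *renderer);

    int main(int argc, char **argv)
    {
        Config config;
        Audio_System audio;
        Render_System renderer;

        parse_configuration(&config, argc, argv);
        audio_system_init(&audio, &config);
        render_system_init(&renderer, &config);

        bool running = true;
        while(running)
        {
            running = handle_events(&renderer, &audio);
        }

        render_system_shutdown(&renderer);
        audio_system_shutdown(&audio);
        return 0;
    }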

When deciding how to slice your program, you should maximize both implementation logic density and structural logic density as much as you can, and make tough calls whenever they are in conflict.


Back to top.

Software Architecture

This is one of the difficult issues, particularly because the best architecture depends on the problem domain. Now, my experiences are limited and there are many problem domains I have never touched. I can only really speak for my own experiences, and I am still in the learning process.

Instead of talking about specific problem domains, I will mostly explain my general approach and thoughts I think may be valuable regardless of domain.

Planning and the Development "Pipeline"

Serious programming projects are often complex enough that you don't simply sit down and type code for a few hours, and then the project is complete. More likely than not, there are going to be parts of the project that you don't have specific experience with. Of course, if your job is to make an Android Java app and you have made a few dozen of those already then that part probably won't give you too much trouble, but the business logic may still pose novel challenges to you.


Before starting on a serious project you may want to draw some things to get an idea of what it is you're going to be developing in more concrete terms than just "program that does X". However, there is a bit of a "chicken and egg" problem that I haven't seen mentioned anywhere: in order to know how to write a non-trivial program, you must have already written it. This means that you will make mistakes along the way and that the end result you produce will be flawed. There is no way around this.

This means that in an ideal world you would write the program not once, not twice, but multiple times. Each iteration on the program would be an improved, more refined version. However, this is not feasible in practice for non-trivial projects and projects with deadlines.


When starting out on a software project you need to have some idea of what it is that you are making, and some sort of plan for how it will be structured. This is significantly more important when the project involves multiple people. All artifacts of the planning process will be used to simplify and reduce the need for communication.

Worth noting is that the planning process doesn't only decide the structure of the program; it also decides what the optimal team management structure is. The more sub-teams there are, the more distinct sub-programs are involved in the structure of the overall program. Likewise, if you decide the team structure first, then you will inevitably force the structure of the codebase to conform to it as well. I recommend that you watch this video on The Only Unbreakable Law.


So, what has been attempted?

One thing that has been attempted is UML diagrams. These diagrams can be very detailed, including every single object-oriented class in the program, the fields, all procedures, and so on. This is not very useful in my estimation. The time investment in writing very large, detailed diagrams won't yield more overview of what the system should do than putting the same time investment into writing the code, in which case you can also more easily find mistakes in your mental model.

Other than UML and alternatives that honestly sound very similar to me, I know only of ad-hoc approaches, which is what I would recommend.

As important as planning is, it may sound odd that I recommend that you ad-hoc your way through it, but it's both quick and effective, can be tailored to the specific needs of your team and your project, and can be done at any combination of levels of detail and abstraction, which will be relevant at different parts of the development process of specific projects. There is no silver bullet for solving creative problems, so your planning needs to be flexible, and few things are more flexible than plotting something free-form on a whiteboard.


When planning there is another point to take into account: the more detailed your plan is, the more wrong it is.

If you can write a complete UML diagram of your entire software project then it is either embarrassingly trivial or you're omniscient. Clearly, if you could diagram the entire project without mistakes, then you could simply write the program without mistakes instead.

I recommend that you take an approach where you balance planning and experimentation. If you can make a call based on past experience, then great, but if you either don't have specific enough experience with something or there is room for learning then experimentation may prove very useful.

Experimentation may seem like busywork. After all, you will be discarding code you write that you think doesn't lead to good solutions. On the contrary however, if you don't experiment then you are really just choosing a direction based on good faith that the first idea is the best one. The strategy is to spend some time up front that will save you (and your team) much more time in the long haul.


Worshippers of contemporary object-oriented religions may scoff at these ideas. After all, all problems must be solved with more abstractions (classes).

Abstractions dilute implementation logic and are often harmful to achieving high performance, so in serious programming projects they are at most a necessary evil.

This also ties into the myth of how object-oriented programming grants you code reusability for free with no effort required. If that interests you, then read the other topics here as well.

All in all, simply inflating the plumbing code in your project doesn't solve architectural challenges for the same reason it doesn't solve code reusability: It addresses only imaginary, abstract problems humans made up. Simply put, it doesn't solve any problems at all, it only hides them.

Project Lifecycle

What I will describe here is quite basic. I know far from everything about this subject and I learn every day as I gain more experience.


First of all, let's consider the clean slate situation. A new project is being started up and the question is how to go about it.

If you get a new project landing in your lap you may find that it's hard to split up the work because you don't know what the individual pieces are yet. With a small group you can relatively easily work this out and integrate multiple perspectives into it. If you get handed a hundred developers and you are expected to give them all something useful to do, then you're clearly in trouble.

The thing is that in order to expose the parallelism of tasks in the project you need to explore it. Anything else is asking for headaches and wasted time. If the one hundred developers have nothing to do because they have to wait for you to find out what even needs to happen, then it's the fault of the managers.

The ideal situation is to have a pipeline, not of development of a single project, but a pipeline of projects. Because you can't throw enormous resources at a project in its infancy, you need to pipeline your development plan. While the hundred developers are finishing up project A, a smaller group should be picked out to start on project B and prepare it for the influx of developers that will happen when project A is complete. This pipeline can of course be whatever length is applicable for the organization.


Another avenue of parallelization in programming tasks that is embarrassingly easy is to simply work on more than one project. This is sort of what happens in the pipelining scheme, but if you have hundreds of developers then it may well be worth it to develop multiple things. After all, adding more developers to a single project will have diminishing returns as Amdahl's Law starts to rear its head.


Back to top.

Anatomy of a Program

Let's start with what programs are. Most people think of the code that goes into them, but the program really is the executable program in the end. While programs that are fed to interpreters really do consist of their source code, that isn't the case for compiled programs. Compiled programs consist of the machine instructions that make up their execution logic.

And what do they do? They manipulate data. All the meaningful operations the program performs manipulate data in one way or another. Anything else the program does is manage its own control flow.

Luckily, this level of abstraction, while important to understand, isn't necessary for our daily work. It is however a prerequisite for properly understanding this section, but we will mostly focus on the source code level.


Programs have two kinds of flow, which are important to consider both for you and your compiler. These two flows are control flow, i.e. the structure that emerges from jumping around in the code, and data flow, i.e. how data moves around in the system.

The anatomy of your program consists of these two flows. Simplified: The data operations and the structured way the program performs them.

Now, when we think in terms of software architecture we think on a different level than a compiler usually does. The bulk of the work a compiler does is in optimizing the internals of procedures. A procedure has manageable boundaries; whole-program optimization is simply a bigger ordeal. This level of detail is however not the way we want to think about the problems; we want to think about the program as a whole, because procedures are easier to redesign than the larger, composite structures are.


This is where a merit of UML diagrams and the like gets to shine, as they get something right: The programmer-view of the program is about what calls what.

I don't care if you think in objects, procedures or whatever else. Code will at times defer to another piece of code. What is important is to have a well-defined notion of what a unit of code is in your model. For me, programming in C, it's procedures.


The anatomy of a program in my case is a graph that looks like this: the nodes are procedures, and if procedure A has a call to procedure B, then (A,B) is an edge in the graph. This is referred to as a call graph, a structure commonly used in compilers.

The Hierarchy Of Partial Solutions

Considering that the source code forms a call graph, an obvious question presents itself: what are desirable properties of the call graph? It should be clear that things like how cross-cutting concerns are handled can be seen in the shape of the call graph.

Let us first consider a "bird's nest" call graph. Disregard for a moment the possible semantics of the program it represents, and think of what it really means that the call graph is crisscrossed with edges. Clearly, there are no logical subdivisions of the code in such a program.

Human-intuitive subdivisions of code impose restrictions on the shape of the call graph, which is probably a good thing as long as those restrictions do not inhibit desirable properties (e.g. hinder optimizations). Note that I have not studied that topic in detail, and I therefore cannot know of pitfalls in my suggestions in this regard. Apply your own critical thinking.


My angle on the call graph is that the source code should be well-structured and easy to work with. To this end I originally directed my attention towards trees. Your program entry point would be the root, and procedure calls always go "down" in the tree. This provides a nice mental invariant, which makes it more intuitive to navigate around. The problem is that a tree disallows code reuse.

To solve the reuse issue, a DAG (Directed Acyclic Graph) can be used. It has the same mental invariant, and it allows branches to converge on a shared procedure node, which constitutes code reuse.

I call the DAG structure the hierarchy of partial solutions.


I personally find that this structure does a few things that I like.


ADTs (Abstract Data Types) like lists, dictionaries, etc. don't necessarily benefit too much from this, but the first time I tried applying this thinking in an explicit manner was in fact for ring queues.

There were two queues where some details differed, and they were simplified greatly by making completely separate queues that each had an abstract ring queue member variable. Management of memory and queue-type-specific details could be handled without over-generalizing (over-complicating) a single implementation. It was slightly analogous to preferring composition over inheritance.
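
A hypothetical sketch of that composition idea: a generic Ring_Queue handles the index bookkeeping, and each concrete queue embeds one and owns its own storage. The names are made up; the real queues at work looked different.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct
    {
        size_t capacity;   /* number of slots               */
        size_t head;       /* index of the next element out */
        size_t count;      /* number of elements stored     */
    } Ring_Queue;

    void ring_queue_init(Ring_Queue *queue, size_t capacity)
    {
        queue->capacity = capacity;
        queue->head = 0;
        queue->count = 0;
    }

    /* Reserves the next free slot and reports its index, if there is room. */
    bool ring_queue_push_index(Ring_Queue *queue, size_t *out_index)
    {
        bool has_room = (queue->count < queue->capacity);
        if(has_room)
        {
            *out_index = (queue->head + queue->count) % queue->capacity;
            queue->count += 1;
        }
        return has_room;
    }

    /* One concrete queue with a fixed array that can live on the stack... */
    typedef struct
    {
        Ring_Queue ring;
        int items[64];
    } Int_Queue;

    /* ...and another that stores something else entirely. */
    typedef struct
    {
        Ring_Queue ring;
        struct { float x, y; } items[256];
    } Point_Queue;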


Back to top.

Dependencies

It is always useful when you can use code that has already been written to accomplish a task. The downside is that code written by arbitrary people is of arbitrary quality. Additionally, a lot of available libraries have APIs that are completely forsaken by sanity.

I have complained plenty about the API design of OpenSSL in other topics, namely the topics in the Practical Aspects of Programming section. Of course, that section should teach you plenty about what you should be doing, and what you shouldn't, in order to make your own libraries worthwhile, but it also acts as a measuring stick to evaluate libraries you find. Does it take control out of your hands? Might be fine for a prototype but definitely not for a serious end product. You get the idea.


So, considering that libraries are terrible, what should we do? First of all, have as few as possible aside from your own. Instead of pulling in twenty libraries to do all kinds of things, stick to e.g. OpenSSL and pthreads (because OpenSSL assumes that you use pthreads, e.g. by using pthread_once for lazy initialization). It is often more than reasonable to roll your own code for most things; in fact, it can be much faster to roll your own than to learn how to use a library. The added bonus is that you just made something better than what you could have gotten from internet randos, and if you really did make it better (by following my advice) then you can trivially use it in any of your projects.


One more important note: dynamic linkage is a stability and portability hazard. It simply sucks to have programs refuse to start because of a dynamic linkage issue, which I have most often experienced on the laptop I am currently writing this on (2022-12-12), where glibc is several years outdated. The insidious part of this is that when you write your program you can't see that such a problem is even possible, as evidenced by games that are listed as running on my OS version but fail to start for this very reason.

This problem isn't limited to running the final result; it can also prevent people from compiling on their own machine if you use newer versions of shared libraries than they have available.

To solve this problem you download your dependencies and put them in your source control together with the rest of your code. This provides a more stable build environment where you don't get errors stemming from missing dependencies, and with static linkage you can prevent runtime error conditions.


There are two common retorts: Safety/bug fixes and space usage.

Firstly, I have never heard of a dynamic library getting a bug fix that solves a problem except in a single case: SDL 1, an ancient dependency that no software should use in today's age, where there is a keyboard input bug that can be solved by patching the system library. The situation is that everyone who runs a specific game needs to download and manually patch their own system libraries to play the game, instead of the game incorporating the patch in its own codebase, which would solve the problem for good. The superior option should be obvious.

The space concern isn't very important either. Code simply isn't all that large, so disk space isn't exactly going to get tight on that account. Shared libraries being shared in memory is however a notable point, and may well be important in overall memory usage on systems that have a large quantity of distinct processes running concurrently. This does however appear to me as more of a systems programming issue than an application programming issue.