Programs with custom services, virtual environments, config files in different locations, programs creating data in different locations…
I know a lot of stuff runs in Docker today, but how does a sysadmin remember what they have done on their system? Is it all about documenting and keeping your docs updated? Is there any other way?
(E.g., to install calibre-web I had to create a Python venv. The venv is owned by root in /opt, but the service starting calibre-web in /etc/systemd/system needs to be executed with the User=<user> specifier because calibre-web wants to write in a user home directory. At the same time, the database folder needs to be owned by www-data because I want to read/write it from Nextcloud… So calibre-web is installed as a custom root(?) program, runs in a virtual env, can access a folder owned by someone else, but still needs to be executed by another user to store its data there…)
Leaving aside my current confusion about whether all of this is right in terms of security, syntax and ownership, there is no fucking way I will remember all this stuff a week from now… So… what do you usually do, if you do anything? Do you use flowcharts? Simple text documents? Both?
Essentially, how do you keep track?
Don’t make a mess, and make the changes you need with Ansible, effectively making its code your documentation.
Yes. Documentation. Documentation aaaalll the way.
You are right. In two months you won’t remember the shit you had to enable/disable to make things work.
Anything you do that isn’t a recurring task should be documented. Nothing crazy. A basic how-to for the setup is enough.
Common/recurring errors/situations? Document 'em.
Got a semi-permanent fix for a problem, so that it will most likely never come up again, except possibly in 5 years? Document it, fella.
You’ll kiss your past self on the head and say thanks when you have a critical ticket in 5 years and remember nothing about the fix itself except that you wrote some documentation.
It will save your ass, and you might come out as the hero of the day for having a solution right away for a super niche problem.
I’ve been keeping privately hosted documentation for the stuff, tricks and problems I learn at work.
I’ve had plenty of situations where I remembered that I had already encountered such a situation yeeeaars ago at my previous employer and that I’d written something down in my personal documentation. Bam, within a few minutes I’ve got either a really good or at least a shittysysadmin-style solution that works.
Yep, and don’t just state the what, but the why in your docs.
The why really helps with knowing if a step is still important, or if it no longer applies. This is especially important with anything cloud based, as I’ve seen weird workarounds become no longer needed due to updates, and I would never have caught it without my notes on why we had the weird workaround to begin with.
You are right. In two months you won’t remember the shit you had to enable/disable to make things work.
Tried to log in to my router that I set up again about 2 months ago… hell if I remember the password.
Follow some basic rules so as to avoid making the mess.
Only install standard packages from the distro’s repository and from PyPI via Python’s pip. For both, keep a text file with the installed package names. No compiling from source EVER. Too much hassle to maintain.
Back up config files that I changed. Not all of them.
Keep a text file to record what I did, with exact commands etc, whenever I need to go off-road. Much experience taught me that this is a chore that is very much worth the effort.
But still, the problem you point to is real. It’s the reason immutable distros exist, an idea I find quite tempting.
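The three habits listed above (a package list, backups of only the changed configs, a log of exact commands) are easy to script. Here is a minimal Python sketch, not the commenter's actual setup: the function names and file layout are made up for illustration, and the package snapshot only covers Python packages visible to the interpreter that runs it, not the distro's packages.

```python
import shutil
import time
from importlib import metadata
from pathlib import Path


def snapshot_python_packages(out_file):
    """Write one 'name==version' line per installed Python package."""
    lines = []
    for dist in metadata.distributions():
        # Skip broken installs that have no readable metadata or name.
        name = dist.metadata.get("Name") if dist.metadata else None
        if name:
            lines.append(f"{name}=={dist.version}")
    lines.sort()
    Path(out_file).write_text("".join(line + "\n" for line in lines))
    return lines


def backup_config(config_path, backup_dir):
    """Copy one changed config file into backup_dir under a time-stamped name."""
    src = Path(config_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.bak"
    shutil.copy2(src, dest)  # copy2 also tries to keep timestamps and mode bits
    return dest


def log_command(log_file, command, note=""):
    """Append a timestamped entry with the exact command and an optional note."""
    stamp = time.strftime("%Y-%m-%d %H:%M")
    entry = f"[{stamp}] $ {command}\n"
    if note:
        entry += f"    # {note}\n"
    with open(log_file, "a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry
```

Calling `log_command` right after each off-road command, with the note saying why it was needed, keeps the log useful months later.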
“Infrastructure as code” is what the strategy is typically called. You use one of the many tools for orchestrating configuration of hosts (Ansible, OpenTofu, Puppet, Saltstack, Chef, etc.). These allow you to provide configuration files and code for setting up your hosts in a central place. This place is typically a Git repo, allowing you to keep track of when which change was made.
Depending on the tool you use, you either trigger applying the configuration from your dev PC, or there’s a hosted CI/CD server which automatically rolls out the changes when a new commit is pushed.
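To make the idea concrete, here is a toy Python sketch of what these tools do at their core: you describe the desired state, and applying it is idempotent, so running it twice changes nothing the second time. This is not how Ansible or any of the other tools is actually implemented or invoked. `ensure_line`, `apply` and the `{path: [lines]}` format are hypothetical names chosen for this illustration.

```python
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed, False if it already matched.
    """
    target = Path(path)
    existing = target.read_text().splitlines() if target.exists() else []
    if line in existing:
        return False  # desired state already reached, nothing to do
    existing.append(line)
    target.write_text("\n".join(existing) + "\n")
    return True


def apply(desired_state):
    """Apply a {path: [lines]} description and return the paths that changed."""
    changed = []
    for path, lines in desired_state.items():
        # Build a list first so every line is applied (any() alone would stop early).
        results = [ensure_line(path, line) for line in lines]
        if any(results):
            changed.append(path)
    return changed
```

The `desired_state` dictionary is the part you would keep in Git: it doubles as the record of what was configured, and the commit history records when and why.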
We use a mix of FreshDesk for tickets/(some) projects/helpdesk articles and Teams/Sharepoint for documentation and distribution of info/help to techs, analysts, end users, etc.
As for the non-technical side of the answer: Basically, yeah, just document everything you can when you come across anything that needs documented.
Declarative configuration fixes this problem. You don’t really have to write down how to set something up because the configuration is the description.
I use NixOS, so in my case all the stuff you described would be defined in Nix code in a separate Calibre module. I can enable and disable such a module at will with a single option in my main config file.
I really recommend looking into immutable, declarative systems. I think NixOS is the most complete solution, but there are some others too. I have no experience with them though.
I take daily work log notes in Obsidian, then transclude chunks from those notes into topic notes and attach config files, images, context from the web, etc.
Obsidian was a game changer for me. I just paste all the stuff, make a tag and forget about it until it is needed. For those not using it, there are dozens of plugins and unlimited options for customization. I have a standard daily note with timestamps, a changelog over the whole vault, and SQLite-like queries that manage dynamic dataviews.
I use a lot of comments in config files, and in the past I’ve also used BookStack to make documentation (something I should probably do again). You’re right that Docker (especially Docker Compose) has helped with this immensely.
I keep a documentation page in my wiki for everything I set up: how I did it, what I ran into, how I fixed it, and where everything is. The reason is that when it comes time to upgrade, or I have to install it again someplace else, I can look up how I did it. Basically, every completed step gets copy-and-pasted into a page along with notes about it.
As for watching the file system, I have AIDE on all of my boxen (configured to run daily, but not configured to copy the new AIDE database over the old one automatically). That way, I can look at the output of an AIDE run and see what new files were created where (which would correspond to when I installed the new thing).
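The underlying idea (keep an old snapshot, take a new one, look at the difference) can be sketched in a few lines of Python. This is only an illustration of the principle, not AIDE's database format or rule language, and AIDE itself tracks more than content hashes. `snapshot` and `diff_snapshots` are hypothetical names.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each regular file under `root` (relative path) to its SHA-256 digest."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result


def diff_snapshots(old, new):
    """Return sorted lists of added, removed and changed paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

As in the comment, the old snapshot is only replaced deliberately, so the diff after an install shows exactly which files the new thing created or touched.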
code forges are great for management tasks. host an internal Forgejo, and create repos for your servers and services. use issues for keeping track of initial setup, config changes and upgrades. have a longer-term issue for when you just want to record a little change but are too lazy to open a full issue for it. you can also store config in the git repo, and write docs as wiki pages for things that are more stable or important aspects of your systems