Pkgupgrade

The author has written a Python program, pkgupgrade, to solve what he sees as problems in portupgrade, while keeping some of its advantages. The first idea is to avoid any state-keeping mechanism, so as not to rely on databases and the like. In this respect it is very similar to portmaster: it recomputes all necessary dependencies. In particular, pkgupgrade does not rely on the information in the package "database", which, as we have explained, has a natural tendency to suffer bitrot. Unlike portupgrade, it does not try to repair that information either. It simply ignores everything except the installed package names and their origins, which it extracts from the relevant line in +CONTENTS.

The other main idea is to prepare everything before making any modification to the installed ports, so that one knows exactly what will be removed, installed and compiled before any destructive action takes place. This is analogous to the behavior of apt-get and certainly helps avoid disastrous upgrades. Moreover, all this information is written to files, to be studied at leisure, rather than scrolling past on screen.

Unlike portmaster, pkgupgrade aims to use precompiled packages as far as possible, hence its name. Its purpose is also very different from that of portupgrade or portmaster: it is intended to be used infrequently, for massive upgrades where the majority of packages are precompiled, typically just after a FreeBSD release. However, since it is a program for FreeBSD, the possibility that some ports have to be compiled, or that one prefers to compile them, is always present. Let us repeat that the preference for precompiled packages is motivated by the desire for reliable upgrades, since there is always a non-negligible probability that a compilation fails, for example because the distfile has disappeared. A secondary motivation is that some ports take an extremely long time to compile, particularly big C++ frameworks such as KDE, OpenOffice, etc.
Nothing whatsoever is to be gained by recompiling such big software on one's own machine. Only people quite unaware of the reality of gcc optimization can expect their super mega framework to run much faster when compiled with -O6 or other nonexistent compiler flags frequently advocated by zealots. The situation is very different for server software installed on server machines, where the administrator may have chosen particular Makefile settings, best suited to his situation, to select or deselect particular features, and where frequent security upgrades may be necessary. The author does not advocate pkgupgrade in such situations; portupgrade and portmaster are very well suited to maintaining such installations. However, pkgupgrade has a feature that helps in this case: by listing ports in COMPILE, one can require that they always be compiled, while everything else is installed from binary packages. This combines the reliability and speed of pkgupgrade with the flexibility of compiling one's favorite ports.

As may already be clear, one of the aims of pkgupgrade is to run as fast as possible. On the other hand, it assumes that disk space is cheap, that people have no objection to having so-called bloatware like Python on their machine, that an Internet connection is available, and that downloading a lot of packages is not a problem. This is not a program for minimalists. Note that compiling from source requires downloading a similar quantity of distfiles. Note also that any user who has installed KDE or GNOME, and perhaps many other ports, necessarily has Python installed as a dependency. Moreover, whereas portupgrade depends on several ports besides ruby, pkgupgrade uses exclusively facilities included in the standard Python port, without any addition. It has been developed with python-2.4 and will not need any change for python-2.5.
One of the advantages of Python is that it is a very stable and mature language, with excellent backward compatibility; the situation with ruby and portupgrade is well known to be entirely different.

A prerequisite for running pkgupgrade is to update the ports collection to a state at least as recent as the release one intends to upgrade to. One possibility is simply to extract ports.tgz from the RELEASE CD-ROM and to use the packages on the same CD-ROM. This ensures that the state of the ports is exactly coherent with the state of the packages, which can only minimize problems. A more recent ports tree should make no difference, except that the ports we compile will be more recent and that some ports may have been removed in between, which pkgupgrade in principle deals with gracefully. It may also happen that the dependency relations differ between the ports tree and the RELEASE packages, so minimizing the distance between the state of the ports and that of the binary packages is always a good idea. Since one cannot rely entirely on packages, and some ports will necessarily be compiled, it is not an option to ignore the ports tree and base the dependency analysis on the packages INDEX alone.

The program proceeds as follows. First it determines all installed packages and finds their origins. Then it follows the MOVED file to discover whether each origin has moved or the port has disappeared; this yields a list of origins covering most of the installed ports. Next it runs make -V commands for each port to discover its run-time dependencies and adds their origins, repeating until the set is closed under dependencies. Simultaneously it downloads the latest INDEX from a FreeBSD ftp site for the appropriate RELEASE, e.g. FreeBSD-6.2 if uname says so, and locates in it all precompiled packages that could be installed to fulfill the above requirements.
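The first two steps can be sketched in a few lines of Python. This is a minimal illustration, not pkgupgrade's code, written for Python 3 rather than python-2.4, and the function names are hypothetical. It assumes the pkg_install conventions of the time: +CONTENTS records the origin on a line "@comment ORIGIN:category/port", and each MOVED entry reads "old|new|date|reason", with an empty new field when the port was removed.

```python
import os

ORIGIN_TAG = "@comment ORIGIN:"


def origin_from_contents(lines):
    """Return the origin recorded in a +CONTENTS file, or None if absent."""
    for line in lines:
        if line.startswith(ORIGIN_TAG):
            return line[len(ORIGIN_TAG):].strip()
    return None


def installed_origins(pkgdir="/var/db/pkg"):
    """Map each installed package name to its origin; nothing else is read."""
    origins = {}
    for name in sorted(os.listdir(pkgdir)):
        contents = os.path.join(pkgdir, name, "+CONTENTS")
        if os.path.isfile(contents):
            with open(contents, errors="replace") as f:
                origin = origin_from_contents(f)
            if origin:
                origins[name] = origin
    return origins


def read_moved(lines):
    """Parse MOVED lines ('old|new|date|reason') into (old, new) pairs, in file order."""
    entries = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def follow_moved(origin, entries):
    """Return (last_name, removed) for an origin.

    Entries are scanned once, in file (chronological) order, so a port that
    moved a -> b and later b -> a ends up as a, without looping.
    """
    current = origin
    for old, new in entries:
        if old == current:
            if not new:  # empty destination: the port was removed
                return current, True
            current = new
    return current, False
```

Scanning MOVED once, in file order, rather than chasing a dictionary of renames, is a deliberate choice: a port that moved away and later came back cannot send the lookup into a loop, and a removed port keeps its last valid name.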
Ports for which no precompiled package exists will need to be compiled, so the script extends its analysis to their build-time dependencies, for which it again tries to find precompiled packages. Optionally, one can replace the "RELEASE" set of packages by the "Latest" one, but coherency is then less guaranteed. This gives a list of ports closed under dependencies, for which we build a complete INDEX. This step is rather long, so we use threading to achieve maximum parallelism. As a by-product, we also obtain the list sorted in topological order. Moreover, we know which packages are sufficiently up to date, which ones will be upgraded from binary packages, and which ones will be compiled.

Ports can be put on hold by listing them in HOLD; they will then not be considered at all. Similarly, the list COMPILE contains ports that should only be compiled, never installed from binary packages. This accommodates people who want to tweak the build parameters of particular software by inserting appropriate options in /etc/make.conf. Note that the dependencies of such ports will still be installed from binary packages if possible.

Each port is named after the last name it gets by following the MOVED file, or after its last valid name before removal, as recorded in the same file. Precompiled and installed packages are coerced into the same naming scheme, so that they can be compared. The strategy is to consider only those installed packages for which a more recent version exists, either in binary form or to be compiled. If a suitable precompiled package exists, it is preferred systematically, even if a more recent port exists. Of course, an installed package is never replaced by a lower or equal version. This leads us to a short discussion of how package versioning works.
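Before turning to versioning, the closure and ordering steps described above can be illustrated by a short sketch. It assumes the run-time dependencies are read from the LIB_DEPENDS and RUN_DEPENDS variables, whose expanded entries have the form "target:/usr/ports/category/port"; the text does not say exactly which variables pkgupgrade queries. The helper names are hypothetical, a thread pool stands in for pkgupgrade's own worker threads (compare the nwork setting), and graphlib requires Python 3.9 or later.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+


def parse_depends(value):
    """Extract origins from an expanded *_DEPENDS value.

    Each entry looks like 'target:/usr/ports/category/port[:tgt]'; the origin
    is taken as the last two components of the second field.
    """
    origins = []
    for entry in value.split():
        fields = entry.split(":")
        if len(fields) >= 2:
            parts = fields[1].rstrip("/").split("/")
            origins.append("/".join(parts[-2:]))
    return origins


def run_depends(origin, portdir="/usr/ports"):
    """Ask make for the run-time dependencies of one port (FreeBSD only)."""
    out = subprocess.run(
        ["make", "-V", "LIB_DEPENDS", "-V", "RUN_DEPENDS"],
        cwd=os.path.join(portdir, origin),
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_depends(out)


def dependency_closure(roots, get_deps, nwork=3):
    """Return {origin: set of dependency origins}, closed under get_deps.

    Each round queries all newly discovered origins in parallel threads,
    since every real query forks a slow make process.
    """
    graph = {}
    frontier = set(roots)
    with ThreadPoolExecutor(max_workers=nwork) as pool:
        while frontier:
            batch = sorted(frontier)
            discovered = set()
            for origin, deps in zip(batch, pool.map(get_deps, batch)):
                graph[origin] = set(deps)
                discovered.update(deps)
            frontier = {o for o in discovered if o not in graph}
    return graph


def topological_order(graph):
    """Dependencies first; raises graphlib.CycleError on a dependency cycle."""
    return list(TopologicalSorter(graph).static_order())
```

In real use get_deps would be run_depends; feeding the same closure with build-time dependencies for the ports that have no binary package would cover the extension described above.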
We have already explained that a version has several components: the version proper (which usually comes from upstream), the portrevision, which concerns revisions of the FreeBSD port itself, and the portepoch, a knob added to fix version misorderings. We add here some comments on the version proper which, usually coming from the software author, suffers from severe inhomogeneities that make it difficult to determine the correct ordering. Usual version numbers are dot-separated numbers like 1.2.3, which are easy to parse. Difficulties arise, however, when mentions like "release candidate", "alpha", "beta" or "patchlevel" creep into the version. For example, we clearly want 1.0rc2 < 1.0, because the second release candidate comes before the final version. But we also want a version with a high patchlevel to come after the initial version, e.g. 1.0pl9 > 1.0. In FreeBSD the "official" order is determined by running pkg_version -t on two version strings. The code for this is in version.c, in src/usr.sbin/pkg_install/lib, and contains a large number of special cases. pkgupgrade contains a Python routine to determine this ordering, and in case it fails on some nasty version string, pkg_version can be used instead. One should be aware, however, that forking external programs in this way has a high performance cost: I measured around 6 s for 1000 executions of pkg_version on a very fast machine, while the Python version runs 200 times faster.

The end result of this analysis phase is a table listing, for the relevant ports (those requiring attention), the installed package, the binary package available in the repository, and the package that would be compiled from the ports, that is, a priori three different packages. All of them are indexed, as explained above, by the most recent origin. This is the "general case", but other situations can occur.
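Before describing those situations, here is a deliberately simplified version comparator illustrating the ordering problem above. It is an assumption-laden sketch, neither pkgupgrade's routine nor a port of version.c: the rank given to each word (alpha < beta < pre < rc < final release < unknown letters < pl or p) is our own choice, and for doubtful strings one would still fall back to pkg_version -t.

```python
import re
from itertools import zip_longest

# Simplified heuristic, NOT a port of pkg_install's version.c.
PRE_RELEASE = {"alpha": 0, "beta": 1, "pre": 2, "rc": 3}
PATCH_LEVEL = {"pl", "p"}
END = (1, 0, "")  # stands for "no more components"
TOKENS = re.compile(r"[0-9]+|[A-Za-z]+")


def token_key(token):
    """Rank one version component so that the tuples compare as intended."""
    if token.isdigit():
        return (4, int(token), "")
    word = token.lower()
    if word in PRE_RELEASE:
        return (0, PRE_RELEASE[word], "")  # before the final release
    if word in PATCH_LEVEL:
        return (3, 0, "")  # after the final release
    return (2, 0, word)  # unknown letters: after the release, alphabetically


def split_version(full):
    """Split 'version_portrevision,portepoch' into its three parts."""
    rest, _, epoch = full.partition(",")
    version, _, revision = rest.partition("_")
    return version, int(revision or 0), int(epoch or 0)


def compare_versions(a, b):
    """Return -1, 0 or 1, in the spirit of pkg_version -t's '<', '=', '>'."""
    va, ra, ea = split_version(a)
    vb, rb, eb = split_version(b)
    if ea != eb:  # the portepoch overrides everything else
        return -1 if ea < eb else 1
    ta = [token_key(t) for t in TOKENS.findall(va)]
    tb = [token_key(t) for t in TOKENS.findall(vb)]
    for ka, kb in zip_longest(ta, tb, fillvalue=END):
        if ka != kb:
            return -1 if ka < kb else 1
    if ra != rb:  # same upstream version: the portrevision decides
        return -1 if ra < rb else 1
    return 0
```

Padding the shorter version with END is what makes 1.0rc2 sort before 1.0 and 1.0pl9 after it, while 1.10 still sorts after 1.9.6 because digit runs are compared as integers.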
First, the port of an installed package may have been removed after installation and before the packages were built, or after the packages were built and before the present state of the ports (in this case we keep it). Conversely, a port may be absent from the installation but required as a dependency by more recent versions of the software, and thus be available either as a precompiled package or to be built. Finally, when some port has to be built, we also try to install precompiled packages for its build dependencies, because such build dependencies may be extremely heavy to compile, for example a new version of gcc or of java, so that using packages represents an enormous gain. We hope this brief summary gives an idea of the complexity of the situation: coping with three different states of the software, two sorts of dependencies (run time and build time), two species of software (precompiled or to be compiled), and very little normalization of software naming and versioning is a lot more than software like Debian's apt-get has to consider.

Here is an example of such a table, as it appears in UpgradeLog. The ports are listed in dependency order, so that all dependencies of a port appear before it. Only ports concerned by the upgrade are shown; those that will not be modified are hidden (in particular those on hold).

    Old installed       ==> Ports              ==> Binary pkgs
    jmf-2.1.1e          ==> jmf-2.1.1e         ==> Build
    gnu-automake-1.8.4  ==> gnu-automake-1.10  ==> gnu-automake-1.9.6
    vim-gtk2-7.0.178    ==> vim-7.0.178_3      ==> Build
    NewBuild            ==> rpm2cpio-1.2_2     ==> rpm2cpio-1.2_2
    New                 ==> p5-XML-SAX-0.15    ==> p5-XML-SAX-0.14
    ......

This shows that gnu-automake, installed at version 1.8.4, will be removed and replaced by the binary package of version 1.9.6, although compiling the port would give 1.10. On the other hand, vim will be upgraded by compilation, because no adequate binary package exists. In fact, when a port is first considered, the third field is set to 'Build'.
Afterwards, the INDEX of prebuilt packages is downloaded, and the third field is updated to the value found there. So it remains 'Build' only when the port does not appear in this INDEX, the usual reasons being that the build failed, that binary distribution is forbidden or, more rarely, that the port did not exist when the packages were built. The port jmf has in fact already been compiled on the machine and is current; it will be removed from consideration, and this will be logged. The port p5-XML-SAX is a new run-time dependency of some other port to be installed, not present before, and will be installed from a binary package at version 0.14. A peculiar case is rpm2cpio, a build dependency of a port we want to build: it is flagged NewBuild and will be installed from a binary package.

Ports already installed on the machine are recorded in the first field with their package name and version; otherwise this field contains New, or exceptionally NewBuild as above. The second field contains either the current package name computed from the ports tree, or 'Removed' if the port has been removed. In the latter case, if no prebuilt package exists, the third field is also set to 'Removed'; otherwise the binary package will be installed if it is recent enough. Finally, the ports listed in COMPILE will not be upgraded from a binary package, even if a good one exists, and are added to the list of ports to build.

In a second step, we again use threading to perform two tasks simultaneously: we back up all old packages that will be removed, and download from the ftp site all packages that will be installed. To save space and time, the backup is limited to shared libraries and configuration files, and existing backups are not redone, a useful feature when the script is run several times, e.g. on several machines sharing a backup directory.
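The rules for filling and interpreting the three columns, stated before the backup step, can be sketched as follows. The names are hypothetical, only the first two fields of each INDEX line (package name and port directory) are used, and numeric_newer is a crude stand-in for a real version comparator. One case is our own reading of the text: when the binary package is not newer than the installed one but the port is, the port is built.

```python
import re


def parse_index(lines):
    """Map origin ('category/port') -> package name from INDEX lines.

    Only the first two '|'-separated fields are used: the package name
    and the port directory.
    """
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 2 and fields[1]:
            origin = "/".join(fields[1].rstrip("/").split("/")[-2:])
            table[origin] = fields[0]
    return table


def numeric_newer(a, b):
    """Crude stand-in comparator: looks only at the digit runs of the version."""
    def key(pkgname):
        return [int(n) for n in re.findall(r"[0-9]+", pkgname.rpartition("-")[2])]
    return key(a) > key(b)


def decide(installed, port, binary, compile_only, newer=numeric_newer):
    """Choose an action for one table row.

    installed: installed package name, or None for a New port
    port: package name computed from the ports tree, or 'Removed'
    binary: package name from the packages INDEX, or None ('Build')
    compile_only: True if the origin is listed in COMPILE
    Returns 'binary', 'build', 'removed' or 'keep'.
    """
    def improves(candidate):
        # Never downgrade: the candidate must be strictly newer.
        return installed is None or newer(candidate, installed)

    if binary is not None and not compile_only and improves(binary):
        return "binary"  # preferred even if the port is more recent
    if port == "Removed":
        return "removed" if binary is None else "keep"
    if improves(port):
        return "build"
    return "keep"
```

Applied to the rows of the example table, this reproduces the outcomes described in the text: jmf is kept, gnu-automake and p5-XML-SAX come from binary packages, and vim is built.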
Packages that will end up completely removed, because their port has been removed in the MOVED file, are an exception: they are backed up in full, under a name prefixed with "REM-" so that they are easy to spot in case of need. This full backup is done unconditionally, since it concerns very few ports. Shared libraries are detected by running the command "file". The backup itself is performed by a Python script, save_pkg.py[11], written by Cyrille Szymanski. It uses some heuristics to speed up the task, such as looking for certain patterns in the path of a file or in its suffix. For example, it classifies as configuration files those whose path contains etc, conf, ... or ends in .conf or .cfg. All backups are logged to BackupLog.

These backups are kept to help fix old programs that broke during the upgrade by losing some shared library dependencies, or to configure new programs whose config files were inadvertently overwritten. They are meant to be erased after some time. Quite probably they will not be used at all; this has been the author's experience. This is why, contrary to portupgrade, we do not systematically install the saved shared libraries in a compat directory. If a problem occurs it will be obvious enough, as will its solution, whereas detecting useless libraries in the compat directory is far more difficult and error prone.

To speed up downloading, one can mount a CD-ROM, typically the second CD-ROM of a FreeBSD release: the script will look in /cdrom/packages/All for the necessary packages and create a symlink to those it finds there. Similarly, one can use a repository on an NFS share or elsewhere. The script only downloads by ftp the packages it has not found locally.
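The two file-level heuristics of this step can be sketched as below. save_pkg.py itself is not reproduced and all names are hypothetical: looks_like_config uses only the patterns named in the text (whose list is elided there), read as directory components, which is an assumption, and link_from_repos mimics the symlinking of packages found in local repositories such as /cdrom/packages/All. Some versions of file also describe position-independent executables as shared objects, so that test is approximate.

```python
import os
import subprocess

# Only the patterns named in the text; save_pkg.py's full list is elided there.
CONFIG_SUFFIXES = (".conf", ".cfg")


def looks_like_config(path):
    """Path heuristic: an 'etc' directory, 'conf' in a directory name, or a config suffix."""
    if path.endswith(CONFIG_SUFFIXES):
        return True
    dirs = path.split("/")[:-1]  # directory components only, not the file name
    return "etc" in dirs or any("conf" in d for d in dirs)


def is_shared_library(path):
    """Ask file(1), as the text describes; ELF libraries are reported as 'shared object'."""
    out = subprocess.run(["file", "-b", path], capture_output=True, text=True).stdout
    return "shared object" in out


def link_from_repos(pkgfile, repos, dest="Packages"):
    """Symlink pkgfile into dest from the first local repository holding it.

    Returns False when no repository has it, meaning it must be fetched by ftp.
    """
    target = os.path.join(dest, pkgfile)
    if os.path.lexists(target):
        return True
    for repo in repos:
        candidate = os.path.join(repo, pkgfile)
        if os.path.isfile(candidate):
            os.makedirs(dest, exist_ok=True)
            os.symlink(os.path.abspath(candidate), target)
            return True
    return False
```

Restricting the patterns to directory names avoids, for instance, treating /usr/local/bin/autoconf as a configuration file merely because its name contains "conf".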
The author has noticed that once a first machine has been upgraded, and the corresponding backups and downloads done, there is very little left to do to upgrade a second machine when the packages are kept on an NFS share: pkgupgrade usually runs in less than 5 minutes.

When all this is done, the script has exact knowledge of the packages present, and it writes a simple shell script that removes all packages flagged for upgrade (in reverse topological order, so as not to trigger complaints from pkg_delete), adds all precompiled packages in direct topological order (to keep pkg_add happy), and finally launches compilation of the ports that need to be compiled. This last step can of course error out, as with all other source-based methods like portupgrade or portmaster. In my experience, even with a recent release like FreeBSD-6.2-RELEASE, some important ports have problems: for example mplayer, one of the most sought-after ports, does not install without tweaks on a fresh 6.2 box because of problems with the win32 plugins. So in my opinion relying on source compilation to build a reliable system is a recipe for problems: the fewer ports one compiles, the higher the probability of a successful upgrade.

To run the program, the best way is to change to a clean directory, perhaps on an NFS share if several machines need upgrading, and simply launch it. There are no options, to keep it as simple as possible; there are, however, configurable settings at the beginning of pkgupgrade. There is no need to run it as root, and absolutely no destructive action occurs. Efforts have been made to make it run as fast as possible, so one can expect results quickly. Of course, if there are a lot of backups and downloads to perform, one has to wait until they are finished.
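The shape of the generated UpgradeShell can be illustrated with a small generator. The exact commands pkgupgrade writes are not given in the text, so the plain pkg_delete, pkg_add and "make install clean" lines, the package file names and the log messages below are assumptions; only the ordering (removals in reverse topological order, additions in direct order, compilations last, with outcomes appended to UpgradeLog) follows the description.

```python
import os
import shlex


def upgrade_shell(to_delete, to_add, to_build, packages_dir="Packages",
                  portdir="/usr/ports", log="UpgradeLog"):
    """Return the text of an UpgradeShell-like script.

    All three lists are in topological order (dependencies first):
    to_delete holds installed package names, to_add package file names
    found in packages_dir, to_build port origins.
    """
    q = shlex.quote
    pkgs = os.path.abspath(packages_dir)  # absolute paths: runnable from anywhere
    log = os.path.abspath(log)
    out = ["#!/bin/sh", "# Review before running as root.", ""]
    # Dependents first, so pkg_delete never removes something still required.
    out += ["pkg_delete %s" % q(name) for name in reversed(to_delete)]
    # Dependencies first, so every pkg_add finds its requirements installed.
    out += ["pkg_add %s" % q(os.path.join(pkgs, f)) for f in to_add]
    for origin in to_build:
        ok, failed = q("Build OK: " + origin), q("Build FAILED: " + origin)
        out.append("if (cd %s && make install clean); then echo %s >> %s; "
                   "else echo %s >> %s; fi"
                   % (q(os.path.join(portdir, origin)), ok, q(log), failed, q(log)))
    return "\n".join(out) + "\n"
```

A failed build only appends a line to the log and the script moves on to the next port, which matches the idea of redoing failed builds at leisure afterwards.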
Otherwise, that is when there is little to back up or download, with a fast machine and a fast Internet connection the time will be of the order of a quarter of an hour or less, even for a large set of installed packages, if one has previously mounted a FreeBSD CD-ROM. Many of the events encountered are logged to UpgradeLog, which may be helpful in case of problems. Here is the list of configurable options:

    portdir = "/usr/ports/"        # can be replaced for non standard installs
    pkgdir = "/var/db/pkg/"        # idem
    freebsd_server = "ftp.freebsd.org"
    nwork = 3                      # Number of worker threads
    delay = 0.01                   # Small delay before firing a new job.
    download_index = True          # Do we download INDEX.ftp ?
    keep_index = True              # Do we keep it after download ?
    add_deps_log = False
    pkg_release = True             # Do we use RELEASE or Latest packages?
    pkg_cmp_version = False        # Do we fork pkg_version -t to compare versions ?
    index_pristine = True
    pkg_repos = '/cdrom/packages/All:depot:/usr/ports/packages/All'
    HOLD = ['print/teTeX-texmf', ...]
    COMPILE=[]

They can be changed either in pkgupgrade itself or in /usr/local/etc/pkgupgrade.conf, which is sourced at startup. The first two accommodate installations in non-standard places (no symlinks, please: they will not work); the third one selects a FreeBSD mirror. HOLD and COMPILE have already been described. The flag pkg_release chooses between RELEASE and Latest binary packages; for the other ones, see the comments in pkgupgrade. All downloaded packages end up in the directory Packages, and backups end up in the directory Backups; the individual files saved in Backups are listed in BackupLog. Of course, there must be enough disk space to store Backups and Packages. The main products of pkgupgrade are an index of all installed ports and all their dependencies, named INDEX.ports (the index downloaded by ftp is kept as INDEX.ftp), which can be explored with a tool like show_index.py, and a shell script, UpgradeShell, which contains the upgrade instructions.
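The override mechanism (defaults in pkgupgrade, replaced by /usr/local/etc/pkgupgrade.conf when present) can be sketched as follows, assuming the configuration file uses the same Python assignment syntax as the listing above. The function names are hypothetical, only a subset of the settings is shown, and since the file is executed as code it must be as trusted as pkgupgrade itself.

```python
import os

# A subset of the settings listed above; HOLD is left empty here because
# the listing elides its contents.
DEFAULTS = {
    "portdir": "/usr/ports/",
    "pkgdir": "/var/db/pkg/",
    "freebsd_server": "ftp.freebsd.org",
    "nwork": 3,
    "pkg_release": True,
    "pkg_repos": "/cdrom/packages/All:depot:/usr/ports/packages/All",
    "HOLD": [],
    "COMPILE": [],
}


def apply_config(source, filename="<config>"):
    """Execute Python-syntax settings on top of fresh copies of DEFAULTS."""
    settings = {k: list(v) if isinstance(v, list) else v for k, v in DEFAULTS.items()}
    # Assignments in the config text land in `settings` (the locals mapping).
    exec(compile(source, filename, "exec"), {}, settings)
    return settings


def load_settings(path="/usr/local/etc/pkgupgrade.conf"):
    """Defaults, overridden by the configuration file if it exists."""
    source = ""
    if os.path.isfile(path):
        with open(path) as f:
            source = f.read()
    return apply_config(source, path)


def repo_list(settings):
    """pkg_repos is a colon-separated search path of local package directories."""
    return [p for p in settings["pkg_repos"].split(":") if p]
```

Copying the lists before executing the file means a configuration line such as HOLD.append(...) cannot silently modify the built-in defaults.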
UpgradeShell can be reviewed at leisure if one fears some problem. It is extremely simple and readable, and there is no problem editing it further if that proves useful. Using a shell script has the advantage that it runs completely independently of any infrastructure provided by the ports system and is immune to all package removals. Since no complicated logic is required at this stage, it is perfectly adequate for the job. The shell script knows the location of Packages, etc., so it can be run from anywhere, preferably from a base-system shell on the console. When compiling ports it logs success or failure in UpgradeLog, so that at the end of the procedure one gets a complete log of events in this file.

To run the shell script, of course, one needs to be root, and it performs destructive actions. Here there is no difference from other tools like portupgrade or portmaster. However, we know in advance what will be removed and can be confident about the packages that will be installed. The elements of risk are localized in the ports that will be built, which may require user attention, if only to fix configuration. All package removals and installations proceed at full speed, without wasting the user's time. In the author's experience, it took around 2 hours to run UpgradeShell with around 500 ports removed and reinstalled, out of a total of 700. Only 7 ports needed compilation, and 3 compilations failed for lack of distfiles, which was easily cured by upgrading the ports tree afterwards. As intended, all deinstallations (15 minutes) and reinstallations of binary packages went smoothly, without any user interaction. One may find that 1h45 is a lot for installing 500 packages, but this is a limitation of pkg_add: no package management tool can make pkg_add run faster. Moreover, port compilations take place on a clean machine, without the clutter of old installations, which I think reduces the risk of misbehavior.
In this respect pkgupgrade differs a lot from portupgrade or portmaster; it is much closer to the "wipe everything and reinstall" strategy, which in my experience works much better than progressive upgrading, be it on FreeBSD or on Linux distributions. Failed ports are also logged to UpgradeLog, so that one can redo the builds at leisure afterwards. If a failure was due to an inaccessible distfile, updating the port in question or the whole ports tree may be a sufficient solution. Extensive logging and the backup strategy should reduce risks to a minimum.